Musings of an Asymptotic kind

Favourite Bike Ride 1: Palo/Alto Woodside loop

National Parks in America

I was born and raised in a city. All I knew was buildings, loads of people, gravel roads, malls, theaters and so on. The din of urban life. We used to go on occasion to “exotic places” in South India on vacation, only to be barricaded within a man made structure called a “resort”, a city away from a city. I never knew what it was to see wilderness, nature working as it should be from the times of our forefathers and those before them. 

The first time I visited Yosemite, as I entered the park, I saw a huge wall of granite to my side and a huge green meadow on another. I would later know that the wall was El Capitan, a Yosemite icon known for rock climbing. As I explored the valley, I saw the towering Yosemite falls in the distance. While the valley was extremely crowded, this size of a crowd was still small compared to the din of the city. I was amazed at the natural beauty of the surroundings, the best part being that it was conserved so that I and future generations can enjoy the great outdoors in all its glory. 

I researched more in depth about Yosemite and came across John Muir’s writings. His writings had such depth and vigor about them that as I was reading his books I could feel his sense of passion for the surroundings. The idea that National parks could be a place of spirituality resonated with me completely. It is probably the only place you can be alone with your thoughts and reflect on thoughts unconsciously. It is probably the only place where you realize your place in the world, a tiny insignificant being in the larger universe. Carl Sagan’s Pale blue dot probably conveys this best.

Over time, I’ve grown to love visiting National Parks, each so varied and different from the other. So far, I’ve been to Yosemite(my favorite), Sequoia and Crater Lake, only 3 of the 59 National Parks in the US. I want to see all of them in due time. However, global warming and the increasing encroach of man have started to show its signs in forms of receding glaciers, higher sea levels, drier lakes and globally higher temperatures. I’m afraid that the waterfalls we know today may not be present tomorrow, the lakes we know today may be reservoirs of the future. We all need to step forward to make a conscious choice in conserving the environment around us. Don’t use plastic bottles, use reusable bottles and re-fill whenever possible. Commute by public transit or bike to work a couple of days a week. Don’t litter trails in parks, would you like it if someone shit in your backyard? National Parks are defenitely America’s best idea, more so than NYC’s skyline or the Golden Gate bridge in San Francisco. Lets keep it that way. 

Suffix Trees: Part II Substring search

OK, so lets recall the substring search problem that we discussed in the previous post. We are now going to see an awesome application of Tries to solve this problem.

What is a suffix Tree?

A suffix tree is a Trie of all possible suffixes of a string.

Suffix Tree of the string "test"



.

E.g. For the string “test”, the suffix tree is a trie constructed from the set of strings(suffixes) “test”,”est”,”st” and “t”.

How this solve our substring problem?

There is an interesting property to note about substrings. Every substring of a given string, is a prefix of some suffix of the string. Hence, searching for a string q in the suffix tree will take only O(|q|) time(since its a Trie). Isn’t that mind blowing? 

Implementation

My implementation of suffix trees can be found at 

https://github.com/varrunr/algo-lib/tree/master/src/stree

Drawbacks

The drawbacks of Tries apply here as well.

1.The implementation complexity is higher

2. The cost of construction of a suffix tree is of the order O(n^2) in terms of time and space. Of course the space complexity can be reduced by a few tricks, like collapsing a path containing 1-degree nodes into references to the original string. There also do exist sophisticated algorithms that can build the entire tree in linear time.(Weiner’s, Ukkonen’s, McCreight’s). I will not delve into these, they are a topic for another day. 

PS: My next couple of posts will cover some extra applications of suffix trees like finding the longest Common subsequence and longest palindrome. 

Suffix Trees: Part I : Intro, Tries

Intro

A Suffix tree is a data structure which is so ingenious in many ways. Its primary application is in string algorithms. I will primarily talk about its application in substring searching. Before going into the subject, I will give an intro to the basic problem at hand.

The Problem

Given a string x, search for the existence of a substring y. A substring is defined as a contiguous sequence of characters in the search string that matches a given string. 

Naive Solution

The obvious solution to the given problem would be to start at every character of the search string and try to match every character of the input string through a sequential scan. The algorithmic complexity for this would be O(|x|*|y|). [Why?] Its rather clear that this is not the most efficient of algorithms if my search list of strings is large. 

Tries

Before we go into suffix trees, we need to talk about a data structure which forms the basis for suffix trees, a Trie. Lets first see the problem a Trie tries to solve.

The Problem

Given a set of n strings, build and maintain a data structure to efficiently find, insert and delete the string associated with a query string q. 

What is a trie?

A trie a tree where every path from root to leaf defines one of the strings of the dictionary, where each edge along the path defines the character of a string in increasing order. The root represents a null string. Any two words in the dictionary branch off at the first distinguishing character i.e they share a common prefix. 

Here is an example of a trie of the set of string ‘a’, ‘an’, ‘the’, ‘there’ and ‘us’

Trie

As you can see in the above figure, “an” and “and” share the common prefix “an” and branch off from there. The black nodes mark the end of a string.

As you can see here, the time taken to search for a string in the dictionary can be done in O(|q|) comparisons. Insertions and deletions can also be done in O(|q|) time. 

Drawbacks of Tries

Implementing tries is not straightforward because of the complexity involved in it. Also, there is a one time overhead of constructing the trie which takes O(|n|*k), where k is the average length of the string. 

In the ensuing post, I will talk more in detail about Suffix Trees and their applications. 

References

[1] The Algorithm design manual, Steven Skiena

Intro

Hi everyone,

I am pursuing my Masters in Computer Science a Stony Brook University, NY. Following today, I plan to use this as a notepad for all the cool stuff I am learning in my course work and otherwise. I hope it may be of use to people. Please feel free to suggest any mistakes in the content, or that I add in some stuff.