Suffix Trees: Part II Substring search

OK, so lets recall the substring search problem that we discussed in the previous post. We are now going to see an awesome application of Tries to solve this problem.

What is a suffix Tree?

A suffix tree is a Trie of all possible suffixes of a string.

Suffix Tree of the string "test"



.

E.g. For the string “test”, the suffix tree is a trie constructed from the set of strings(suffixes) “test”,”est”,”st” and “t”.

How this solve our substring problem?

There is an interesting property to note about substrings. Every substring of a given string, is a prefix of some suffix of the string. Hence, searching for a string q in the suffix tree will take only O(|q|) time(since its a Trie). Isn’t that mind blowing? 

Implementation

My implementation of suffix trees can be found at 

https://github.com/varrunr/algo-lib/tree/master/src/stree

Drawbacks

The drawbacks of Tries apply here as well.

1.The implementation complexity is higher

2. The cost of construction of a suffix tree is of the order O(n^2) in terms of time and space. Of course the space complexity can be reduced by a few tricks, like collapsing a path containing 1-degree nodes into references to the original string. There also do exist sophisticated algorithms that can build the entire tree in linear time.(Weiner’s, Ukkonen’s, McCreight’s). I will not delve into these, they are a topic for another day. 

PS: My next couple of posts will cover some extra applications of suffix trees like finding the longest Common subsequence and longest palindrome.