Skip to main content

Suffix Trees , a bit complicated but really powerful data structure

Continuing with my obsession with the string matching algorithms , after 
KMP algorithm which does string matching in O(n +k) time , where n is length of the string and k is the length of the pattern. I tried exploring another data structure called as “Suffix Trees” , Suffix trees are nothing but compressed tries , where you model all the suffix present in a string in an optimized tree/trie structure.



this is how a suffix tree for word “BANANA” will look like




some of the other common use cases , where suffix trees can be used is 



  • Find the longest common substring of the strings S and T. This can be done in time O(|S| + |T|)


  • Find the longest substring of S that appears at least m > 1 times. This query can be answered in O(|S|) time.




Traverse the suffix tree labeling the branch nodes with the sum of the label lengths from the root and also with the number of information nodes in the subtrie.


Traverse the suffix tree visiting branch nodes with information node count >= m. Determine the visited branch node with longest label length.




  • Find all strings that contain a pattern P. Suppose we have a collection S1, S2, … Sk of strings and we wish to report all strings that contain a query pattern P. For example, the genome databank contains tens of thousands of strings, and when a researcher submits a query string, we are to report all databank strings that contain the query string. To answer queries of this type efficiently, we set up a compressed trie (we may call this a multiple string suffix tree) that contains the suffixes of the string S1$S2$…$Sk#, where $ and # are two different digits that do not appear in any of the strings S1, S2, …, Sk. In each node of the suffix tree, we keep a list of all strings Si that are the start point of a suffix represented by an information node in that subtrie.

i have referred to http://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english/22126672#22126672


to write an implementation for the suffix trees , the above link explains it very nicely on how to build your own suffix trees.





Comments

Popular posts from this blog

Fractals and Mandelbrot Set

While mathematics is in itself quite interesting and forms the basis of any modern day research, be it computational biology, machine learning or building complex structure, it can be quite a challenge to decide where to start.  That is why i decided to explore Fractals, thinking of it as a bridge between the nature and science. It brings in some really fascinating concepts which should be good enough for me as a gateway go deeper.  Fractals are in simple language never ending patterns which keep on repeating without an end, because fractals are never ending they have an infinite perimeter but finite area.  Since the patterns repeats indefinitely but if you draw a circle around the peremeter the area will remain finite.  It is like adding 1+0.1+0.01+0.001 and never making 2 This video explains the basic concept really well  Fractals are found everywhere nature in Trees, Rivers, Branching patterns, Hurricanes and Galaxies. It tries to bring order and understandin...
It was a great experience to talk to a huge audience in Mumbai and Delhi about how to start your ML journey at Google Cloud Summit ’18 India

Las Vegas

January has been pretty exciting for me since last three years, It marks my annual ritual trip to Las Vegas, this is my third trip this year. Almost all of my trips have been because of work. Since Las Vegas has some of the biggest hotels[1] in the world it has become a natural choice for some of the biggest American companies to host their internal and external conferences. As an example, CES hosts some 200,000 people every year [2]. This year I started my 30-hour flight journey, yes you heard it right I had a 9-hour layover at the London airport. Though it was far from ideal it was not that bad. If you can arrange priority pass [3] card from your credit card company or pay for it. Then you can use the lounge, showers and much more for free. I manage to steal the below picture from a hoarding at the airport.  Then after much rest and time pass at the airport terminal, I boarded my next flight to Las Vegas. When I landed McCarran International Airport, the first surprise for me was the...