Skip to main content

Ruby implementation of the famous Knuth–Morris–Pratt Algorithm

today just decided to write the  Knuth–Morris–Pratt string searching algorithm , implementation in ruby. 



As you can see that the algorithm is divided into two parts , one is the preprocess , which is the crux of the Algo and the actual string searching using the preprocessed metadata.


The preprocess function is run only on the patterns, and it calculates π[i] = the longest proper prefix of pattern[0..i] , which is also a suffix of pattern[0..i].


e.g. if the pattern you are searching for is 

P = [ ‘A’,'B’,'A’,'B’,'A’,'C’,'A’]


the π array will look like


π = [ 0,0,1,2,3,0,1]


which shows that longest running prefix of the pattern is ABA . Buy doing the above calculations the algo saves itself some of the comparisons which it had to do in the naive , brute force approach.



if you get the preprocessing function then the rest of the things are straight forward , the algorithm is intelligent enough to understand that if next char in the seq does not match , but if it is part of the substring which will match the entire pattern then it will not do the additional comparisons.



As you can see that the time complexity of the search function is O(n) , and that of the prepropcess should be O(k) where k is the length of the pattern.


In my case the time complexity of preprocess is more than O(K) because of the additional running_sequence method , but that could be removed if you maintained the seq to match the prefix in the preprocess loop itself.



 

Comments

Popular posts from this blog

Fractals and Mandelbrot Set

While mathematics is in itself quite interesting and forms the basis of any modern day research, be it computational biology, machine learning or building complex structure, it can be quite a challenge to decide where to start.  That is why i decided to explore Fractals, thinking of it as a bridge between the nature and science. It brings in some really fascinating concepts which should be good enough for me as a gateway go deeper.  Fractals are in simple language never ending patterns which keep on repeating without an end, because fractals are never ending they have an infinite perimeter but finite area.  Since the patterns repeats indefinitely but if you draw a circle around the peremeter the area will remain finite.  It is like adding 1+0.1+0.01+0.001 and never making 2 This video explains the basic concept really well  Fractals are found everywhere nature in Trees, Rivers, Branching patterns, Hurricanes and Galaxies. It tries to bring order and understandin...
It was a great experience to talk to a huge audience in Mumbai and Delhi about how to start your ML journey at Google Cloud Summit ’18 India

Las Vegas

January has been pretty exciting for me since last three years, It marks my annual ritual trip to Las Vegas, this is my third trip this year. Almost all of my trips have been because of work. Since Las Vegas has some of the biggest hotels[1] in the world it has become a natural choice for some of the biggest American companies to host their internal and external conferences. As an example, CES hosts some 200,000 people every year [2]. This year I started my 30-hour flight journey, yes you heard it right I had a 9-hour layover at the London airport. Though it was far from ideal it was not that bad. If you can arrange priority pass [3] card from your credit card company or pay for it. Then you can use the lounge, showers and much more for free. I manage to steal the below picture from a hoarding at the airport.  Then after much rest and time pass at the airport terminal, I boarded my next flight to Las Vegas. When I landed McCarran International Airport, the first surprise for me was the...