today just decided to write the Knuth–Morris–Pratt string searching algorithm , implementation in ruby.
As you can see that the algorithm is divided into two parts , one is the preprocess , which is the crux of the Algo and the actual string searching using the preprocessed metadata.
The preprocess function is run only on the patterns, and it calculates π[i] = the longest proper prefix of pattern[0..i] , which is also a suffix of pattern[0..i].
e.g. if the pattern you are searching for is
P = [ ‘A’,'B’,'A’,'B’,'A’,'C’,'A’]
the π array will look like
π = [ 0,0,1,2,3,0,1]
which shows that longest running prefix of the pattern is ABA . Buy doing the above calculations the algo saves itself some of the comparisons which it had to do in the naive , brute force approach.
if you get the preprocessing function then the rest of the things are straight forward , the algorithm is intelligent enough to understand that if next char in the seq does not match , but if it is part of the substring which will match the entire pattern then it will not do the additional comparisons.
As you can see that the time complexity of the search function is O(n) , and that of the prepropcess should be O(k) where k is the length of the pattern.
In my case the time complexity of preprocess is more than O(K) because of the additional running_sequence method , but that could be removed if you maintained the seq to match the prefix in the preprocess loop itself.
Comments
Post a Comment