Tell me more ×
Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development. It's 100% free, no registration required.

I have been stuck for some time on which is the fastest string search algorithm, heard many opinions, but in the end I'm not sure.

I have heard some people saying that the fastest algorithm is Boyer-Moore and some saying that Knuth-Morris-Pratt is actually faster.

I have looked up for the complexity on both of them but they mostly look the same O(n+m). I have found that in the worst case scenario Boyer-Moore has an O(nm) complexity compared to Knuth-Morris-Pratt which has O(m+2*n). Where n=length of text and m=length of pattern.

As far as I know Boyer-Moore has a linear-worst case-time if I would use the Galil Rule.

My question, Over all which is actually the fastest String search algorithm (This question includes all possible sting algorithms not just Boyer-Moore and Knuth-Morris-Pratt).

Edit: Due to this answer

What I'm exactly looking for is:

Given a text T and a pattern P I have to find all the appearances of P in T.

Also the length of P and T are from [1,2 000 000] and the program has to run under 0.15 sec.

I know that KMP and Rabin-Karp are enough to get a 100% score on the problem but I for one wanted to try and implement Boyer-Moore. Which would be best for this type of pattern search?

share|improve this question
5  
When you tested these out in your language of choice what did you find? – Walter Jan 15 at 20:54
4  
On some tests Boyer-Moore was better on other KMP was better , but I'm not sure i have the "best" implementation of them . As for the language of choice it is in the tags : C++ ( not sure if you saw that since you wrote "language of choice"). P.S. I am also not sure if i tested on the best tests. – vandamon taigi Jan 15 at 20:57
1  

1 Answer

up vote 12 down vote accepted

It depends on the kind of search you want to perform. Each of the algorithms performs particularly well for certain types of a search, but you have not stated the context of your searches.

Here are some typical thoughts on search types:

  • Boyer-Moore: works by pre-analyzing the pattern and comparing from right-to-left. If a mismatch occurs, the initial analysis is used to determine how far the pattern can be shifted w.r.t. the text being searched. This works particularly well for long search patterns. In particular, it can be sub-linear, as you do not need to read every single character of your text.

  • Knuth-Morris-Pratt: also pre-analyzes the pattern, but tries to re-use whatever was already matched in the initial part of the pattern to avoid having to rematch that. This can work quite well, if your alphabet is small (f.ex. DNA bases), as you get a higher chance that your search patterns contain reuseable subpatterns.

  • Aho-Corasick: Needs a lot of preprocessing, but does so for a number of patterns. If you know you will be looking for the same search patterns over and over again, then this is much better than the other, because you need to analyse patterns only once, not once per search.

Hence, as usual in CS, there is no definite answer to the overall best. It is rather a matter of choosing the right tool for the job at hand.

Another note on your worst-case reasoning: Do consider the kinds of searches required to create that worst-case and think thoroughly about whether these are really relevant in your case. For example, the O(mn) worst-case complexity of the Boyer-Moore algorithm stems from a search pattern and a text that each use only one character (like finding aaa in aaaaaaaaaaaaaaaaaaaaa) - do you really need to be fast for searches like that?

share|improve this answer
I have the whole english alphabet or so to use and I updated the Question , sorry for not starting with this at the begging. – vandamon taigi Jan 16 at 16:28
And yes I do need to be fast even for searches like that – vandamon taigi Jan 16 at 19:15

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.