Sign up ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};

Example:

fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false

It's very slow, though. How can I optimize that function?

share|improve this question
2  
Why is 360k operations per second slow? If your application is slow, I think the problem is not there. – Florian Margaine Mar 14 '13 at 9:02
    
The third example returns true as well. jquery.js does match j.*r. – Jan Dvorak Mar 14 '13 at 9:07

1 Answer 1

up vote 23 down vote accepted
function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};
  1. String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:

    pattern = pattern.split("").join(".*")
    
  2. The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.

    pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
    
  3. Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.

    var cache = _.memoize(function(pattern){
      return new RegExp(pattern.split("").reduce(function(a,b){
        return a+'[^'+b+']*'+b;
      })
    })
    function fuzzy_match(str,pattern){
      return cache(pattern).test(str)
    };
    
  4. If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:

    var fuzzy_match = (function(){
      var cache = _.memoize(function(str){
        return new RexEgp("^"+str.replace(/./g, function(x){
          return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
        })+"$");
      })
      return function(str, pattern){
        return cache(str).test(pattern)
      })
    })()
    

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.

share|improve this answer
    
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp? – Dokkat Mar 14 '13 at 13:27
1  
@Dokkat explanation added. Proper escaping added. – Jan Dvorak Mar 14 '13 at 13:50

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.