Faster JavaScript fuzzy string matching function?

Question

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};

Example:

fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false

It's very slow, though. How can I optimize that function?

Why is 360k operations per second slow? If your application is slow, I think the problem is not there. — Florian Margaine, Mar 14 '13 at 9:02
The third example returns true as well. jquery.js does match j.*r. — Jan Dvorak, Mar 14 '13 at 9:07

Jan Dvorak · Accepted Answer · 2013-03-14 13:48:38Z

function fuzzy_match(str,pattern){
    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
    return (new RegExp(pattern)).test(str);
};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
pattern = pattern.split("").join(".*")
```
The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
var cache = _.memoize(function(pattern){
  return new RegExp(pattern.split("").reduce(function(a,b){
    return a+'[^'+b+']*'+b;
  })
})
function fuzzy_match(str,pattern){
  return cache(pattern).test(str)
};
```
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
var fuzzy_match = (function(){
  var cache = _.memoize(function(str){
    return new RexEgp("^"+str.replace(/./g, function(x){
      return /[\-\[\]\/\{\}\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
    })+"$");
  })
  return function(str, pattern){
    return cache(str).test(pattern)
  })
})()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp? — Dokkat, Mar 14 '13 at 13:27

asked	2 years ago
viewed	5469 times
active	2 years ago

current community

your communities

more stack exchange communities

Faster JavaScript fuzzy string matching function?

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged javascript strings regex or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Faster JavaScript fuzzy string matching function?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged javascript strings regex or ask your own question.

Linked

Related

Hot Network Questions