function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce
already exists, and it's called join
:
pattern = pattern.split("").join(".*")
The regex itself can be optimised: .*
initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?
. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b
(at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b
) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RexEgp("^"+str.replace(/./g, function(x){
return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
})+"$");
})
return function(str, pattern){
return cache(str).test(pattern)
})
})()
Concerning the last regex:
Given some pattern "ace"
, the regex you build (/a.*c.*e/
) tests if the string contains the characters of the pattern in the string, in the correct order.
If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/
. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?
), and flank the regex with string boundary anchors.
true
as well.jquery.js
does matchj.*r
. – Jan Dvorak Mar 14 at 9:07