I have a bit of code that converts a Sql Like expression to a regex expression for the purposes of a Linq to objects Like
extension method. For some time I have been using this conversion.
This conversion replaces all "%" with ".*?" which works for contains patterns but is over matching for starts with or ends with patterns. So the conversion of %abc
to .*?abc
is capturing both "abcdef" and "123abcdef".
I've amended the conversion algorithm to account for starts with and ends with LIKE expressions.
Here is my code:
internal static string ConvertLikeToRegex(string pattern)
{
// Turn "off" all regular expression related syntax in the pattern string.
StringBuilder builder = new StringBuilder(Regex.Escape(pattern));
// these are needed because the .*? replacement below at the begining or end of the string is not
// accounting for cases such as LIKE '%abc' or LIKE 'abc%'
bool startsWith = pattern.StartsWith("%") && !pattern.EndsWith("%");
bool endsWith = !pattern.StartsWith("%") && pattern.EndsWith("%");
// this is a little tricky
// ends with in like is '%abc'
// in regex it's 'abc$'
// so need to tanspose
if (startsWith)
{
builder.Replace("%", "", 0, 1);
builder.Append("$");
}
// same but inverse here
if (endsWith)
{
builder.Replace("%", "", pattern.Length - 1, 1);
builder.Insert(0, "^");
}
/* Replace the SQL LIKE wildcard metacharacters with the
* equivalent regular expression metacharacters. */
builder.Replace("%", ".*?").Replace("_", ".");
/* The previous call to Regex.Escape actually turned off
* too many metacharacters, i.e. those which are recognized by
* both the regular expression engine and the SQL LIKE
* statement ([...] and [^...]). Those metacharacters have
* to be manually unescaped here. */
builder.Replace(@"\[", "[").Replace(@"\]", "]").Replace(@"\^", "^");
return builder.ToString();
}
My initial units test are passing but not being overly conversant in regex syntax I am wondering if this is the best approach or if there are gaps in the conversion I am not seeing.