I am using a regular expression to extract key-value pairs from arbitrarily long input strings, and have run into a case where a long string with a repetitive pattern causes a stack overflow.
The KV-parsing code looks something like this:
public static void parse(String input)
{
    String KV_REGEX = "((?:\"[^\"^ ]*\"|[^=,^ ])*) *= *((?:\"[^\"]*\"|[^=,^\\)^ ])*)";
    Pattern KV_PATTERN = Pattern.compile(KV_REGEX);
    Matcher matcher = KV_PATTERN.matcher(input);
    System.out.println("\nMatcher groups discovered:");
    while (matcher.find())
    {
        System.out.println(matcher.group(1) + ", " + matcher.group(2));
    }
}
Some fictitious example inputs:
String input1 = "2012-08-09 09:10:25,521 INFO com.a.package.SomeClass - Everything working fine {name=CentOS, family=Linux, category=OS, version=2.6.x}";
String input2 = "2012-08-09 blah blah 09:12:38,462 Log for the main thread, PID=5872, version=\"7.1.8.x\", build=1234567, other=done";
Calling parse(input1) produces:
{name, CentOS
family, Linux
category, OS
version, 2.6.x}
Calling parse(input2) produces:
PID, 5872
version, "7.1.8.x"
build, 1234567
other, done
This is fine (even with a bit of string processing required for the first case). However, when trying to parse a very long classpath string (over 1,000 characters), the stack overflow mentioned above occurs, with the following exception (beginning of the trace):
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$BitClass.isSatisfiedBy(Pattern.java:2927)
at java.util.regex.Pattern$8.isSatisfiedBy(Pattern.java:4783)
at java.util.regex.Pattern$8.isSatisfiedBy(Pattern.java:4783)
at java.util.regex.Pattern$8.isSatisfiedBy(Pattern.java:4783)
at java.util.regex.Pattern$8.isSatisfiedBy(Pattern.java:4783)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
...
The string is too long to put here, but it has the following, easily reproducible and repetitive structure:
java.class.path=/opt/files/any:/opt/files/any:/opt/files/any:/opt/files/any
Anyone who wants to reproduce the issue just needs to append :/opt/files/any a few dozen times to the above string. Once the classpath string contains about 90 copies of ":/opt/files/any", the stack overflow occurs.
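The reproduction described above can be sketched as follows. The class name ClasspathRepro and the helpers buildInput and overflows are hypothetical names introduced only for this illustration; the regex is the one from parse. Whether a given number of copies actually overflows depends on the JVM version and thread stack size, so the sketch reports what happens rather than assuming a result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to reproduce the problem described in the question.
final class ClasspathRepro {
    // The regex from the question, unchanged.
    static final Pattern KV_PATTERN = Pattern.compile(
        "((?:\"[^\"^ ]*\"|[^=,^ ])*) *= *((?:\"[^\"]*\"|[^=,^\\)^ ])*)");

    // Builds "java.class.path=/opt/files/any" followed by `copies` repetitions of ":/opt/files/any".
    static String buildInput(int copies) {
        StringBuilder sb = new StringBuilder("java.class.path=/opt/files/any");
        for (int i = 0; i < copies; i++) {
            sb.append(":/opt/files/any");
        }
        return sb.toString();
    }

    // Returns true if scanning the input with the original pattern throws StackOverflowError.
    static boolean overflows(String input) {
        try {
            Matcher matcher = KV_PATTERN.matcher(input);
            while (matcher.find()) {
                // Results are ignored; only the overflow behaviour is of interest here.
            }
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The threshold is environment-dependent, so several sizes are tried.
        for (int copies : new int[] {3, 90, 1000, 10000}) {
            String input = buildInput(copies);
            System.out.println(copies + " copies, " + input.length()
                + " chars, overflow: " + overflows(input));
        }
    }
}
```

Running the JVM with a smaller thread stack (for example via the -Xss option) should make the overflow appear at a lower copy count.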
Is there a generic way to modify the above KV_REGEX string so that the issue does not occur and the same results are produced?
I say "generic" deliberately, as opposed to hacks that (for instance) check for a maximum string length before parsing.
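One direction that may be worth trying, sketched here under assumptions rather than offered as a verified answer: make the two repeated groups possessive (`*+` instead of `*`). In java.util.regex, a greedy repetition over a group containing an alternation is matched recursively, one level per iteration, whereas a possessive repetition is matched in a loop. The class name PossessiveKvSketch is hypothetical, and parse here returns the pairs as strings instead of printing them so they can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the question's regex with possessive quantifiers on both groups.
final class PossessiveKvSketch {
    // Only change from the original: the two outer "*" quantifiers became "*+".
    static final Pattern KV_PATTERN = Pattern.compile(
        "((?:\"[^\"^ ]*\"|[^=,^ ])*+) *= *((?:\"[^\"]*\"|[^=,^\\)^ ])*+)");

    // Returns each match formatted as "key, value", mirroring the output of the original parse.
    static List<String> parse(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = KV_PATTERN.matcher(input);
        while (matcher.find()) {
            pairs.add(matcher.group(1) + ", " + matcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input2 = "2012-08-09 blah blah 09:12:38,462 Log for the main thread, "
            + "PID=5872, version=\"7.1.8.x\", build=1234567, other=done";
        for (String pair : parse(input2)) {
            System.out.println(pair);
        }
    }
}
```

A possessive quantifier never gives back what it has consumed, so this is not guaranteed to be equivalent to the original on every input. It should agree on the two sample inputs, but inputs where the original only matches by backtracking into a quoted segment (for example a quoted token that itself contains `=`) can yield different pairs.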
The grossest fix I could come up with, a true anti-pattern, is:
public void safeParse(String input)
{
    try
    {
        parse(input);
    }
    catch (StackOverflowError e) // Or even Throwable!
    {
        parse(input.substring(0, MAX_LENGTH));
    }
}
Funnily enough, it worked in the few runs I tried, but it is not tasteful enough to recommend. :-)
[^=,^\\)^ ]. – Keppil Aug 9 '12 at 20:36
[^=) ] (which gives you a nice little smiley as a bonus). Your example also stops at ^. – Keppil Aug 9 '12 at 20:56