
I am new to Java. I am getting a java.lang.StackOverflowError from the regex replacements on strHindiText. What should I do about it?

try {
     // This regex converts the pattern "{\fldrslt {\fcs1 \ab\af24 \fcs0 ऩ}{"
     // into "{\fldrslt {\fcs1 \ab\af24 \fcs0 ऩ}}}{"
     // strHindiText = strHindiText.replaceAll("\\{(\\\\fldrslt[ ])\\{((\\\\\\S+[ ])+)((\\s*&#\\d+;\\s*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*)+)\\}\\{","{$1{$2$4}}}{");

     // This regex converts the pattern "{\fcs0 \af0 &#2345;{ or {\fcs0 \af0 *\tab &#2345;{"
     // into "{\fcs0 \af0 &#2345; }{"
     strHindiText = strHindiText.replaceAll("\\{\\s*((\\\\\\S+[ ](\\*)?)+\\s*)(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)\\{", "{$1 $4$5 }{");

     // This regex converts the pattern "{&#2345; \fcs0 \af0 {"
     // into "{&#2345; \fcs0 \af0 }{"
     strHindiText = strHindiText.replaceAll("\\{\\s*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)[ ]*((\\\\\\S+[ ])+)\\{", "{$1 $5 }{");

} catch (StackOverflowError er) {
     System.out.println("Third try Block StackOverflowError in regex pattern to reform the rtf tags................");
     er.printStackTrace();
     // throw er;
}



Whenever strHindiText contains a large amount of data, it throws a java.lang.StackOverflowError:

java.lang.StackOverflowError
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match0(Pattern.java:3754)
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)



My strHindiText data is:

 `{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584 &#2349;&#2379;&#2346;&#2366;&#2354;&#32; &#2404; \par }\pard\plain \ltrpar\s16\ql \li0\ri0\sb100\sa100\sbauto1\saauto1\sl240\slmult0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0\pararsid13505584 \cbpat20 \rtlch\fcs1 \af0\afs24\alang1025 \ltrch\fcs0 \fs24\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\rtlch\fcs1 \ab\af1\afs18 \ltrch\fcs0 \cs21\b\f1\fs18\cf21\insrsid13505584 &#2309;&#2344;&#2381;&#2357;&#2375;&#2359;&#2339;&#32;&#2325;&#2352;&#2375;&#2306;&#32; :}{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584  \par &#2349;&#2379;&#2346;&#2366;&#2354;&#32;&#44;&#32;&#2350;&#2343;&#2381;&#2351;&#32;&#2346;&#2381;&#2352;&#2342;&#2375;&#2358;&#32;&#2325;&#2368;&#32;&#2352;&#2366;&#2332;&#2343;&#2366;&#2344;&#2368;&#32;&#2346;&#2381;&#2352;&#2366;&#2325;&#2371;&#2340;&#2367;&#2325;&#32;&#2360;&#2369;&#2306;&#2342`
8  
Your alternative paths | are probably causing recursive calls, resulting in the stackoverflow. Regex stuff is complicated in general, and your regex is big. I'm not surprised. – keyser Aug 8 '13 at 10:02
1  
I would suggest instead of alternatives (e.g a|b|c) to use the alternative notation: [abc], this should make the regex clearer, and you just need to escape the closing bracket and no other character. Also, it looks like you want to do something that regexes aren't good for - parsing - for something that isn't text but has a higher ordering. – Tassos Bassoukos Aug 8 '13 at 10:33
7  
You really shouldn't use RegEx for such enormous parsings.. it's not very performant, since the regex expression compiles every time you try to match a string. – GGrec Aug 8 '13 at 11:13
6  
Everything about your code is asking for problems. Try breaking the problem into multiple small problems rather than trying to do a bazillion things all at once with a giant regex. Based on the regexes you're using, I'd be surprised if you didn't experience memory problems. – jahroy Aug 8 '13 at 19:32
1  
I would personally recommend writing a parser for your RTF rather than attempting to cut it up with regex. Regex is meant for simple things, and I don't imagine RTF in Hindi is simple at all. – Ryan Aug 8 '13 at 20:34

Option 1 - Treat the symptoms

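Look for the parts of your regex that make the matcher recurse: repeated groups that contain alternation or nested quantifiers, such as (a|b)* or (x+y)+.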

If you are not sure where your problem lies: try a regex tester like this.

Option 2 - Treat the cause (much better)

Don't use a regex if there are better tools for your task.

In your case you could search for an RTF parsing library or write your own parser,
e.g. like the one here that jahroy pointed out in the comments.
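
To give an idea of what "write your own parser" could start from, here is a minimal sketch of a tokenizer that walks the text once in a loop, so its stack use does not grow with the input. It is not a full RTF parser: the class name RtfTokenizerSketch and the method tokenize are hypothetical, and the rules it applies (a control word is a backslash, ASCII letters, an optional signed number, and one optional delimiting space) are a simplification assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any library: a simplified RTF-like tokenizer.
class RtfTokenizerSketch {

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Splits the input into "{", "}", control words/symbols, and plain text runs. */
    static List<String> tokenize(String rtf) {
        List<String> tokens = new ArrayList<>();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == '\\') {
                int start = i++;
                while (i < n && isAsciiLetter(rtf.charAt(i))) {
                    i++;
                }
                if (i == start + 1) {
                    // Control symbol such as \* or \{ : backslash plus one character.
                    if (i < n) {
                        i++;
                    }
                    tokens.add(rtf.substring(start, i));
                } else {
                    // Control word: optional numeric parameter, possibly negative.
                    if (i + 1 < n && rtf.charAt(i) == '-' && isAsciiDigit(rtf.charAt(i + 1))) {
                        i++;
                    }
                    while (i < n && isAsciiDigit(rtf.charAt(i))) {
                        i++;
                    }
                    tokens.add(rtf.substring(start, i));
                    // One space after a control word is a delimiter, not text.
                    if (i < n && rtf.charAt(i) == ' ') {
                        i++;
                    }
                }
            } else {
                // Plain text run: everything up to the next brace or backslash.
                int start = i;
                while (i < n && "{}\\".indexOf(rtf.charAt(i)) < 0) {
                    i++;
                }
                tokens.add(rtf.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\\fcs0 \\af0 &#2345;}"));
    }
}
```

Once the text is a flat list of tokens, fixes such as "insert a missing }" become simple checks on neighbouring tokens instead of one large pattern over the whole document.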


This is not a full answer but just for your information.

In your regex:

(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)* can be written as [-,/()\";.'<>:?]*

Since this pattern occurs twice (in your first regex), this immediately shortens your regex by 40 characters and makes those sections much more readable.
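
A small sketch to check the rewrite on short inputs (the class and method names are made up for illustration). It also shows one thing to watch out for: the alternation is a capturing group and the character class is not, so if you swap it in without parentheses, the group numbers used in the replacement string ($4, $5, ...) shift.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two spellings of the punctuation pattern.
class PunctuationClassDemo {
    // Alternation form, as written in the question (one alternative per character).
    static final String ALTERNATION = "(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*";
    // Character-class form suggested above.
    static final String CHAR_CLASS = "[-,/()\";.'<>:?]*";

    /** True if both patterns agree on whether the whole input matches. */
    static boolean sameResult(String input) {
        return Pattern.matches(ALTERNATION, input) == Pattern.matches(CHAR_CLASS, input);
    }

    /** Number of capturing groups in a pattern. */
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "-,/();", "?:<>'\".", "abc", "-a"}) {
            System.out.println("\"" + s + "\" same result: " + sameResult(s));
        }
        System.out.println("groups in alternation form: " + groupCount(ALTERNATION));
        System.out.println("groups in character-class form: " + groupCount(CHAR_CLASS));
    }
}
```

If the replacement string must keep its numbering, wrap the class in a group, e.g. ([-,/()\";.'<>:?]*).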


Try this to catch the error:

public class Example {
    public static void endless() {
        endless();
    }

    public static void main(String args[]) {
        try {
            endless();
        } catch(StackOverflowError t) {
            // more general: catch(Error t)
            // anything: catch(Throwable t)
            System.out.println("Caught "+t);
            t.printStackTrace();
        }
        System.out.println("After the error...");
    }
}

More importantly, try increasing the size of the stack, or add this to your regex:

+'xss='xss

Adding the "+" symbol makes a quantifier possessive, which prevents backtracking; backtracking doesn't seem to be necessary in your case.
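
To illustrate what a possessive quantifier does, here is a minimal sketch. The class name is made up, and the last pattern is only a fragment of the question's regex with "+" appended to its quantifiers; that this is safe there is an assumption (\S can never match the space that follows it, so nothing has to be given back). Whether it removes the overflow for your full patterns and data is something you would have to test.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing greedy versus possessive quantifiers.
class PossessiveQuantifierDemo {

    // Greedy: "a+" first takes every 'a', then gives one back so the final "a" can match.
    static boolean greedyMatches(String input) {
        return Pattern.matches("a+a", input);
    }

    // Possessive: "a++" takes every 'a' and never gives any back, so the final "a" cannot match.
    static boolean possessiveMatches(String input) {
        return Pattern.matches("a++a", input);
    }

    // Fragment of the question's pattern, (\\\S+[ ])+, with possessive quantifiers added.
    static final Pattern CONTROL_WORDS = Pattern.compile("(\\\\\\S++[ ])++");

    public static void main(String[] args) {
        System.out.println("greedy a+a on aaa: " + greedyMatches("aaa"));
        System.out.println("possessive a++a on aaa: " + possessiveMatches("aaa"));
        System.out.println("control words: " + CONTROL_WORDS.matcher("\\fcs0 \\af0 ").matches());
    }
}
```

The first two lines differ because a possessive quantifier changes what can match whenever a later part of the pattern needs characters back, so check each quantifier before changing it.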

5  
He should consider using the right tool for the job rather than treating the symptoms that result from using the wrong tool... – jahroy Aug 9 '13 at 2:52
1  
chances are the overflow is coming from recursive issues not greediness from the regex. By making the operator possessive we can eliminate branching and recursive handling making this expression more efficient and allows for less memory usage. – Bmize729 Aug 9 '13 at 3:07
4  
I would either look for an RTF parsing library or write one myself. If I wrote one myself I would break up the parsing into small tasks rather than try to do everything at once. If I had to use regexes, I would keep them small and simple and make sure they only operate on small pieces of text. I would never consider feeding the entire document to a single, complicated regex. – jahroy Aug 9 '13 at 3:18
1  
It took about 5 seconds of googling to find this (maybe it will help, maybe it won't...) – jahroy Aug 9 '13 at 3:24
2  
Ok. Sorry if my comments were overly harsh. This whole "I must use regex" mentality is just so common on this site that it sometimes makes you want to scream from the top of the mountain: "not all problems must be solved with regex!" – jahroy Aug 9 '13 at 3:47
