Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

PCRE has a feature called recursive pattern, which can be used to match nested subgroups. For example, consider the "grammar"

Q -> \w | '[' A ';' Q* ','? Q* ']' | '<' A '>'
A -> (Q | ',')*
// to match ^A$.

It can be done in PCRE with the pattern

^((?:,|(\w|\[(?1);(?2)*,?(?2)*\]|<(?1)>))*)$

(Example test case: http://www.ideone.com/L4lHE)

There is no recursive pattern in .NET. Instead, it provides balancing groups for stack-based manipulation for matching simple nested patterns.

Is it possible to convert the above PCRE pattern into .NET Regex style?

(Yes I know it's better not to use regex in for this. It's just a theoretical question.)

References

share|improve this question
1  
Nice question. My guess would be "no", but I'd love to see if someone comes up with a way to do it. – EMP Jul 28 '10 at 4:51
1  
I think Perl is winning this battle. I might give it a shot at the evening, but it's too much for work :P – Kobi Jul 28 '10 at 5:32

1 Answer

The answer is (probably) Yes.

The technique is much more complex than the (?1) recursive call, but the result is almost 1-to-1 with the rules of the grammar - I worked in a such methodical way I can easily see it scripted. Basically, you match block-by-block, and use the stack to keep track of where you are. This is an almost working solution:

^(?:
    (\w(?<Q>)) # Q1
    |
    (<(?<Angle>))  #Q2 - start <
    |
    (\>(?<-Angle>)(?<-A>)?(?<Q>))  #Q2 - end >, match Q
    |
    (\[(?<Block>))  # Q3 start - [
    |
    (;(?<Semi-Block>)(?<-A>)?)  #Q3 - ; after [
    |
    (\](?<-Semi>)(?<-Q>)*(?<Q>))  #Q3 ] after ;, match Q
    |
    ((,|(?<-Q>))*(?<A>))   #Match an A group
)*$
# Post Conditions
(?(Angle)(?!))
(?(Block)(?!))
(?(Semi)(?!))

It is missing the part of allowing commas in Q->[A;Q*,?Q*], and for some reason allows [A;A], so it matches [;,,] and [abc;d,e,f]. Rest of the strings match the same as the test cases.
Another minor point is an issue with pushing to the stack with an empty capture - it doesn't. A accepts Ø, so I had to use (?<-A>)? to check if it captured.

The whole regex should look like this, but again, it is useless with the bug there.

share|improve this answer
1  
As a side note, I'm surprised I found so little material on the subject. I learned as I go, an it took a great deal of time. Anyway, if anyone finds the flaw in my code I'll be happy to hear. – Kobi Jul 28 '10 at 20:12
This question still hunts me. I know my answer is fundamentally wrong, since it doesn't take order into account, similarly to kobikobi.wordpress.com/2010/12/14/… – Kobi Feb 18 '11 at 14:43

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.