As per the title, I need to parse a string of the form string_1\string_2 (a string, followed by a backslash, followed by another string) with the following requirements:

  • if string_1 and string_2 are present, break them into two tokens: string_1 and \string_2
  • if only string_1 is present, return it
  • if \string_2 is present but nothing behind the backslash, don't match anything.

So far I've come up with this:

^([\w\s]*)((?!\\\).*)

but the last character of string_1 keeps 'leaking' into string_2, ending up right before the backslash.

Is there a way to fix that, or is there an alternative regex? The following regex does help with the leaking, but it breaks the third requirement.

^([\w\s]*).((?!\\\).*)

To make sure this question is not too localized, note that this could help parse a subset of LaTeX when you have a string coming before, say, \section{section title comes here {*}}.

  • Why not use explode? – Liam Sorsby Jul 20 '13 at 19:40
  • @LiamSorsby Because it's exactly LaTeX I'm trying to parse, and lots of explode calls would quickly turn into a nightmare. – nt.bas Jul 20 '13 at 19:41
  • How is the data being constructed so that it contains a backslash? – Liam Sorsby Jul 20 '13 at 19:44
  • @LiamSorsby I'm not sure I understand your question, but assuming I do: I have a small LaTeX file. It contains a limited number of LaTeX commands, so it happens that I get user text directly followed by a command, just like I explained in the question. Let me know if I missed your question. – nt.bas Jul 20 '13 at 19:48
  • Yes, sorry, I mistyped that question! I still don't understand why it would be a nightmare to use explode, as you should only need one explode and then check the resulting array – Liam Sorsby Jul 20 '13 at 20:15
Accepted answer (2 votes):

I think this is the regex you're looking for:

/^([^\\]+)(\\.+)?/

The first group matches one or more non-backslash characters. The optional second group matches a backslash followed by at least one more character.
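As a rough illustration of how this pattern behaves against the three requirements, here is a minimal sketch in Python (the question does not name a language beyond mentioning explode, so the choice of Python is an assumption, and the helper name split_tokens is hypothetical):

```python
import re

# The regex from this answer: one or more non-backslash characters,
# then an optional group made of a backslash plus at least one character.
PATTERN = re.compile(r"^([^\\]+)(\\.+)?")


def split_tokens(text):
    """Hypothetical helper: return a tuple of tokens, or None if nothing matches."""
    m = PATTERN.match(text)
    if m is None:
        # The input starts with a backslash (or is empty), so group 1 cannot match.
        return None
    head, tail = m.group(1), m.group(2)
    # group(2) is None when the optional second group did not take part in the match.
    return (head,) if tail is None else (head, tail)


print(split_tokens("intro text\\section{Title}"))  # both parts present
print(split_tokens("intro text"))                   # only string_1
print(split_tokens("\\section{Title}"))             # nothing before the backslash
print(split_tokens("intro\\"))                      # nothing after the backslash
```

Note that an input with a trailing backslash and nothing after it still matches, but only the first group is captured; the dangling backslash is simply left out of the match.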

  • I just noticed your requirement that if there's nothing after the \ don't return it. So change the * to a + and you're good. – Chip Camden Jul 20 '13 at 21:26
  • There, I edited it accordingly. – Chip Camden Jul 20 '13 at 23:25
