I was revisiting some old Python code that had never raised any errors before, but when I ran it this time it failed. This is the code that gives me the error:

import re

text = r"I quote \"How're you?\" to you."
double = [z.start() for z in re.finditer('(?<!\\)(?:\\\\)*(")', text)]
single = [z.start() for z in re.finditer("(?<!\\)(?:\\\\)*(')", text)]
print(double)
print(single)

The output I had hoped to get from this program was:

[]
[13]

This, however, gives me the error:

double = [z.start() for z in re.finditer('(?<!(?:\\))(?:\\\\)*(")', text)]
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 220, in finditer
return _compile(pattern, flags).finditer(string)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state))
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 722, in _parse
source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 0

It is worth mentioning that I had updated Python before running this, so maybe the update caused this error? (I am now running Python 3.5.2, but I can't remember which version I had before.)

Also, in case it helps, I was trying to find all cases of single or double quotes that were not escaped by a backslash i.e.

' and " are picked up

\' and \" are not

\\' and \\" are picked up (the backslash itself is escaped), and so on...

I was going to use this to then separate nested strings in the string from other parts of the string.

It is the negative lookbehind (?<!\\) that is causing the issue, but I cannot see what is wrong. The backslash is escaped by the one in front, so I cannot see where the missing bracket is.

Strangely, this works on regex101, so I am starting to run out of ways to debug this.

I tried different replacements for the negative lookbehind to try to get this to work:

(?<!\) #Gets the error, but that is expected

(?<!\\\\) #Same error again, same problem as the original case

(?<!\\\) #Returns [8, 20] and [13]

Clearly this last one has incorrect syntax. Python, however, is interpreting this as correct, but I have no idea what it is actually interpreting this as.

Anyway, I am aware that there is probably some simple explanation, maybe some RegEx syntax I am not aware of.

Also, if there is an alternative, less messy solution to what I am attempting, please feel free to give me that solution instead.

Thank you very much, I am nearly tearing my hair out,

EdW

regex101 automatically makes it a raw string r'...'. Maybe try that? – Patrick Haugh 25 mins ago

Simply add an r prefix to each regex string literal so that it becomes a raw string:

import re
text = r"I quote \"How're you?\" to you."
double = [z.start() for z in re.finditer(r'(?<!\\)(?:\\\\)*(")', text)]
single = [z.start() for z in re.finditer(r"(?<!\\)(?:\\\\)*(')", text)]
print(double)
print(single)

Output:

[]
[13]
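
One related detail, as a minimal sketch rather than a definitive fix: z.start() is the start of the whole match, which includes any run of escaped backslashes in front of the quote. If you want the index of the quote character itself, z.start(1) gives the start of the capture group instead. The helper name unescaped_quotes and the second sample string are my own, purely for illustration.

```python
import re


def unescaped_quotes(text, quote):
    """Return the index of every `quote` character not escaped by a backslash.

    Hypothetical helper for illustration; it reuses the pattern from the
    answer and only changes which position is reported.
    """
    # Raw string: no backslash before the quote, then zero or more
    # *pairs* of backslashes (escaped backslashes), then the quote itself.
    pattern = r'(?<!\\)(?:\\\\)*(' + re.escape(quote) + ')'
    # start(1) is the position of the captured quote, not of the
    # backslashes that may precede it.
    return [m.start(1) for m in re.finditer(pattern, text)]


text = r"I quote \"How're you?\" to you."
print(unescaped_quotes(text, '"'))
print(unescaped_quotes(text, "'"))

# Two backslashes followed by a quote: the backslash is escaped,
# so the quote is not.
print(unescaped_quotes(r'say \\" now', '"'))
```

For the sample text in the question both versions report the same positions; they only differ once an unescaped quote is preceded by escaped backslashes.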
    
To be clear, the reason this works is that it disables the escaping behavior of backslash for the string literal unless the following character is the quote character used to begin the string. When you don't do this, what you pass to finditer contains the literal characters (?<!\)(?:\\)*("); it then interprets that first \ as escaping (for regex purposes) the close paren that follows. Always use raw strings for regex; Python is "helpful", only processing defined escapes ('\d' is len 2; '\\' is len 1), but it just makes it more confusing when you aren't expecting escape processing. – ShadowRanger 18 mins ago
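
To see the escape processing described above for yourself, here is a small sketch that prints what each literal actually contains and then tries to compile both. The variable names are mine; the two patterns are the ones from the question and the answer.

```python
import re

# The same pattern text, written without and with the r prefix.
non_raw = '(?<!\\)(?:\\\\)*(")'
raw = r'(?<!\\)(?:\\\\)*(")'

# What the regex engine actually receives in each case.
print(non_raw, len(non_raw))
print(raw, len(raw))

# In the non-raw version the regex sees \) as an escaped, literal
# parenthesis, so the lookbehind group is never closed.
try:
    re.compile(non_raw)
    non_raw_compiles = True
except re.error as exc:
    non_raw_compiles = False
    print("re.error:", exc)

# The raw version compiles without complaint.
compiled = re.compile(raw)
```

Printing the pattern before handing it to re is a cheap way to check what Python did to the backslashes.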
