I am trying to do a simple regex replace on a string in Python. This is my code:

>>> s = "num1 1 num2 5"
>>> re.sub("num1 (.*?) num2 (.*?)","1 \1 2 \2",s)

I would expect an output like this, with the \numbers being replaced with their corresponding groups.

'1 1 2 5'

However, this is the output I am getting:

'1 \x01 2 \x025'

And I'm kinda stumped as to why the \x0s are there instead of the group contents I expected. Many thanks for any help.

if you just want all numbers: ' '.join(re.findall(r'\d+', 'num1 1 num2 5')) –  Gandaro May 18 '12 at 17:52

3 Answers

Accepted answer (score 5)

You need to start using raw strings (prefix the string with r):

>>> import re
>>> s = "num1 1 num2 5"
>>> re.sub(r"num1 (.*?) num2 (.*?)", r"1 \1 2 \2", s)
'1 1 2 5'

Otherwise you would need to double each backslash, so that Python's string parser passes a single literal backslash through to the regex engine, like this:

>>> re.sub("num1 (.*?) num2 (.*?)", "1 \\1 2 \\2", s)
'1 1 2 5'

(This gets really old really fast; check out the opening paragraphs of the Python regex docs.)
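
As a quick check of the two spellings above, here is a minimal sketch showing that the raw string and the doubled-backslash string are the same string, so re.sub gives the same result either way. The variable names are only for illustration.

```python
import re

s = "num1 1 num2 5"
pattern = r"num1 (.*?) num2 (.*?)"

raw_repl = r"1 \1 2 \2"        # raw string: backslashes are kept as-is
escaped_repl = "1 \\1 2 \\2"   # normal string: each \\ becomes one backslash

# Both literals produce the same 9-character string.
same = raw_repl == escaped_repl

result_raw = re.sub(pattern, raw_repl, s)
result_escaped = re.sub(pattern, escaped_repl, s)

print(same)                  # True
print(repr(result_raw))      # '1 1 2 5'
print(repr(result_escaped))  # '1 1 2 5'
```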

+1. To clarify a bit: without the r, the \1 and \2 are octal (base-8) character escapes, so "1 \1 2 \2" means a string whose third character is the ASCII character with value 1, and whose seventh character is the ASCII character with value 2. Those characters aren't printing characters, so the command-line pretty-printer replaces them with hexadecimal character escapes, \x01 and \x02. But the \x0 isn't in the actual string, it's just in what gets printed. –  ruakh May 18 '12 at 17:51
    
Thanks a lot for that, that's really cleared it up for me! –  ACarter May 18 '12 at 17:54
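
To make ruakh's point concrete, here is a small sketch that inspects the non-raw replacement string directly. It assumes nothing beyond the string from the question; the names broken and codes are just for illustration.

```python
broken = "1 \1 2 \2"   # no r prefix: \1 and \2 are octal escapes

# Character codes: the third is 1 and the seventh is 2, as described above.
codes = [ord(c) for c in broken]
print(codes)         # [49, 32, 1, 32, 50, 32, 2]
print(len(broken))   # 7

# repr() is what the interactive prompt shows: it renders the unprintable
# characters as \x01 and \x02, but each is a single character in the string.
print(repr(broken))  # '1 \x01 2 \x02'
```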

\1 and \2 are getting interpreted as octal character code escapes, rather than just getting passed to the regex engine. Using raw strings r"\1" instead of "\1" prevents this interpretation.

>>> "\17"
'\x0f'
>>> r"\17"
'\\17'
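
A short sketch of the same comparison, spelled out: octal 17 is decimal 15, so the non-raw literal is one character, while the raw literal keeps the backslash and both digits.

```python
octal_escape = "\17"   # parsed by Python as the character with octal code 17
raw = r"\17"           # backslash, '1', '7' kept literally

print(len(octal_escape), ord(octal_escape))  # 1 15
print(octal_escape == chr(0o17))             # True
print(len(raw), raw == "\\17")               # 3 True
```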

The \1 is being interpreted as an escape sequence by Python's string parser before re.sub ever sees it. So you must escape the \ with its own backslash:

>>> re.sub("num1 (.*?) num2 (.*?)", "1 \\1 2 \\2",s)
'1 1 2 5'

You can also use a raw string:

>>> re.sub("num1 (.*?) num2 (.*?)", r"1 \1 2 \2",s)
'1 1 2 5'
