I need to replace some things in a string using an array. They can look like this:

array = [3, "$x" , "$y", "$hi_buddy"]
# the first number is the number of things in the array
string = "$xena is here $x and $y."

I've got another array with the replacements for those things; let's say it's called rep_array.

rep_array = [3, "A", "B", "C"]

For the replacement I use this:

for x in range (1, array[0] + 1):
  string = string.replace(array[x], rep_array[x])

But the result is:

string = "Aena is here A and B."

But I need to match only a standalone $x, not $x inside another word. The result should look like this:

string = "$xena is here A and B."

Note that:

  • all patterns in array start with $.
  • a pattern matches if it matches the whole word after $; $xena doesn't match $x, but foo$x would match.
  • $ can be escaped with @, and then it should not be matched (for example, $x does not match @$x)
    
Do all your replacement patterns start with $? –  Martijn Pieters 23 hours ago
1  
why don't you use string formats from python, instead of reinventing the wheel? –  zmo 23 hours ago
    
@MartijnPieters: What about ` \$x ` and in the replacement array you would have ` A `? –  npinti 23 hours ago
    
@npinti: sorry, I don't follow what you mean there. –  Martijn Pieters 23 hours ago
1  
@Alfe: This is why I asked if the pattern always starts with $. –  Martijn Pieters 23 hours ago

4 Answers

up vote 3 down vote accepted

Use a regular expression that appends a \b word-boundary anchor to each escaped pattern, so that a pattern only matches when it is not followed by more word characters:

import re

for pattern, replacement in zip(array[1:], rep_array[1:]):
    pattern = r'{}\b'.format(re.escape(pattern))
    string = re.sub(pattern, replacement, string)

This uses re.escape() to ensure any regular expression meta characters in the pattern are escaped first. zip() is used to pair up your patterns and replacement values; a more pythonic alternative to your range() loop.

\b only matches at a position where a word character is followed by a non-word character (or vice versa), a word boundary. Your patterns all end in a word character, so this makes sure your patterns only match if the next character is not a word character, blocking $x from matching inside $xena.

Demo:

>>> import re
>>> array = [3, "$x" , "$y", "$hi_buddy"]
>>> rep_array = [3, "A", "B", "C"]
>>> string = "$xena is here $x and $y. foo$x matches too!"
>>> for pattern, replacement in zip(array[1:], rep_array[1:]):
...     pattern = r'{}\b'.format(re.escape(pattern))
...     string = re.sub(pattern, replacement, string)
... 
>>> print(string)
$xena is here A and B. fooA matches too!
    
Your solution almost works the way I need. I also need to be able to escape the $ with @; can you tell me what's wrong with this pattern? pattern = r'(?:([^@])|^){}\b'.format(re.escape(pattern)) –  White dracke 21 hours ago
1  
@Whitedracke: You need a look-behind: r'(?:(?<=[^@])|^){}\b' –  Martijn Pieters 21 hours ago
1  
@Whitedracke: or better still, a negative look-behind: r'(?<!@){}\b' –  Martijn Pieters 21 hours ago
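
Putting the comments above together with the accepted answer, a minimal sketch of the loop with the negative look-behind might look like this. The function name substitute and the extended sample string are made up for illustration; only re.escape, re.sub, the (?<!@) look-behind and the \b anchor come from the thread.

```python
import re

def substitute(text, patterns, replacements):
    """Replace whole-word $patterns, skipping any whose $ is preceded by @.

    `substitute` is a hypothetical helper name, not part of the original answer.
    """
    for pattern, replacement in zip(patterns, replacements):
        # (?<!@) rejects a match right after '@'; \b stops $x matching in $xena.
        regex = r'(?<!@){}\b'.format(re.escape(pattern))
        text = re.sub(regex, replacement, text)
    return text

array = [3, "$x", "$y", "$hi_buddy"]
rep_array = [3, "A", "B", "C"]
string = "$xena is here $x and $y. foo$x matches, @$x does not."

print(substitute(string, array[1:], rep_array[1:]))
# $xena is here A and B. fooA matches, @$x does not.
```

Note that the replacement is passed to re.sub as a plain string, so a backslash in a replacement value would be interpreted as an escape; that is an assumption that holds for the sample data only.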

This is not a direct answer to your question, but since I expect others will give solutions that hack around \b, I'm going to suggest a more Pythonic solution:

rep_dict = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}
string = '{xena} is here {x} and {y}'

print(string.format(**rep_dict))

Here, however, it will raise a KeyError for the missing xena key in rep_dict. That can be solved by the answers to that question, using a defaultdict or a custom formatter, whichever you prefer for your use case.
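
One possible way to avoid the KeyError, as a sketch only: a dict subclass whose __missing__ hands the placeholder back unchanged, combined with str.format_map. This assumes Python 3, and the class name KeepMissing is made up for this example.

```python
class KeepMissing(dict):
    """Hypothetical helper: unknown keys are rendered back as '{key}'."""

    def __missing__(self, key):
        # Called by dict lookup when the key is absent.
        return '{' + key + '}'

rep_dict = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}
template = '{xena} is here {x} and {y}'

# format_map uses the mapping directly, so __missing__ is honoured.
result = template.format_map(KeepMissing(rep_dict))
print(result)  # {xena} is here A and B
```

Whether leaving the placeholder untouched is the right behaviour (rather than raising, or substituting an empty string) depends on the use case.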

The problem with using $ is that it is not trivial to decide where the variable name ends. Most languages that use $ variables apply it to the next single character and require explicit delimiters for longer names (shells and makefiles), e.g. ${xena}. Languages like Perl use a grammar to define the context of a $ variable, and I guess they may use regexps in the tokenizer as well.

That's why in Python we use the {} formatting markers to delimit both ends of the variable in the string, instead of a bare $, so we do not have to deal with ambiguities (is $xena ${x}ena or ${xena}?).
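
If the $ syntax has to stay, the standard library's string.Template resolves the same ambiguity: it reads the longest identifier after $ and accepts ${x} as an explicit boundary. This is a sketch for comparison, not something the question asked for. Note that Template escapes a dollar sign with $$, not with @, so it does not cover the question's escape rule.

```python
from string import Template

values = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}

# $xena is read as the (unknown) name "xena"; safe_substitute leaves it alone.
plain = Template('$xena is here $x and $y.').safe_substitute(values)
print(plain)   # $xena is here A and B.

# Braces mark the boundary explicitly, so only "x" is substituted.
braced = Template('${x}ena is here').safe_substitute(values)
print(braced)  # Aena is here
```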

HTH

    
+1 for "hacking around \b" and the more pythonic approach! –  Jasper 23 hours ago
    
of course, I'm giving this for the OP to know and consider using that if it can be an option to him, and for future readers that may consider using a $ variable in strings for a use case that strings formats have been built for. ;-) –  zmo 22 hours ago
1  
This is the correct TOOWTDI (wiki.python.org/moin/TOOWTDI), if the OP has any power over the input strings. –  Davidmh 21 hours ago

str.replace does not know about regular expressions, so you have to use the re module (https://docs.python.org/3.4/library/re.html), namely the re.sub function:

>>> import re
>>> re.sub(r"\$x\b", "replace", r"$xenia $x")
'$xenia replace'
    
This'll match $x in foo$x too. –  Martijn Pieters 23 hours ago
    
I need foo$x to be replaced, but I don't know how the escape '\' would get into the arrays. –  White dracke 22 hours ago
    
@Whitedracke: That's an important detail; do include that in your question post! –  Martijn Pieters 22 hours ago
1  
@Whitedracke: I updated your post to include that detail, as well as the fact that all patterns start with $. It's details like that that make a huge difference in what is a proper solution and what is not. –  Martijn Pieters 22 hours ago

You can also try something like this:

import re

search = ["$x" , "$y", "$hi_buddy"]
replace = ["A", "B", "C"]
string = "$xena is here $x and $y skip$x."

repl = dict(zip(search, replace))
print(re.sub(r'\B\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string))

# result: $xena is here A and B skip$x.

\B here means "match $ only when it is preceded by a non-word character or sits at the start of the string". If you need skip$x to be replaced as well, just drop the \B:

print(re.sub(r'\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string))
# $xena is here A and B skipA.
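
One property of this single-pass approach is worth illustrating: text produced by a replacement is never scanned again, whereas a loop of successive re.sub calls can re-replace its own output. The sketch below uses made-up data in which one replacement value happens to look like another pattern.

```python
import re

# Hypothetical data: the replacement for $x is itself the pattern "$y".
repl = {"$x": "$y", "$y": "B"}
text = "$x $y"

# Sequential substitution: the "$y" produced in the first pass is replaced again.
sequential = text
for pattern in ("$x", "$y"):
    sequential = re.sub(re.escape(pattern) + r'\b', repl[pattern], sequential)

# Single pass: every $word is looked up exactly once in the dict.
single_pass = re.sub(r'\$\w+', lambda m: repl.get(m.group(0), m.group(0)), text)

print(sequential)   # B B
print(single_pass)  # $y B
```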
    
Using \B means $$x is also matched. –  Martijn Pieters 22 hours ago
    
@MartijnPieters: right, and so do !$x, ...$x and similar. I didn't understand from the question if this is a desired behavior or not. –  georg 22 hours ago