Python replace, using patterns in array

Question

I need to replace some patterns in a string using an array. They can look like this:

array = [3, "$x" , "$y", "$hi_buddy"]
# the first element is the number of patterns in the array
string = "$xena is here $x and $y."

I've got another array with the replacements for those patterns; let's say it's called rep_array.

rep_array = [3, "A", "B", "C"]

For the replacement I use this:

for x in range (1, array[0] + 1):
  string = string.replace(array[x], rep_array[x])

But the result is:

string = "Aena is here A and B."

But I need to match only a standalone $x, not a $x that is part of a longer word. The result should look like this:

string = "$xena is here A and B."

Note that:

all patterns in array start with $.
a pattern matches if it matches the whole word after $; $xena doesn't match $x, but foo$x would match.
$ can be escaped with @, and then it should not be matched (for example, the pattern $x does not match in @$x).

why don't you use string formats from python, instead of reinventing the wheel? — zmo, 23 hours ago
@MartijnPieters: What about ` \$x ` and in the replacement array you would have ` A `? — npinti, 23 hours ago
@Alfe: This is why I asked if the pattern always starts with $. — Martijn Pieters, 23 hours ago

Martijn Pieters · Accepted Answer · 2014-04-14 10:13:36Z

Use a regular expression that appends a \b word-boundary anchor to each pattern:

import re

for pattern, replacement in zip(array[1:], rep_array[1:]):
    pattern = r'{}\b'.format(re.escape(pattern))
    string = re.sub(pattern, replacement, string)

This uses re.escape() to ensure any regular expression meta characters in the pattern are escaped first. zip() is used to pair up your patterns and replacement values; a more pythonic alternative to your range() loop.

\b only matches at a position where a word character is followed by a non-word character (or vice versa), a word boundary. Your patterns all end in a word character, so this makes sure your patterns only match if the next character is not a word character, blocking $x from matching inside $xena.
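As a rough illustration of that boundary rule, the sketch below checks one pattern against three inputs. The variable names are made up for this example and are not part of the answer's code.

```python
import re

# Same construction as in the loop above: escaped pattern plus \b.
pattern = re.compile(re.escape("$x") + r"\b")

# 'x' is followed by a space (non-word character), so \b matches.
matches_standalone = bool(pattern.search("here $x and"))
# 'x' is followed by 'e' (word character), so there is no boundary.
matches_inside_word = bool(pattern.search("$xena"))
# The end of the string after a word character also counts as a boundary.
matches_at_end = bool(pattern.search("foo$x"))

print(matches_standalone, matches_inside_word, matches_at_end)
```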

Demo:

>>> import re
>>> array = [3, "$x" , "$y", "$hi_buddy"]
>>> rep_array = [3, "A", "B", "C"]
>>> string = "$xena is here $x and $y. foo$x matches too!"
>>> for pattern, replacement in zip(array[1:], rep_array[1:]):
...     pattern = r'{}\b'.format(re.escape(pattern))
...     string = re.sub(pattern, replacement, string)
... 
>>> print(string)
$xena is here A and B. fooA matches too!

Your solution almost works the way I need, but I also need to be able to escape the $ with @. Can you tell me what's wrong with this pattern? pattern = r'(?:([^@])|^){}\b'.format(re.escape(pattern)) – White dracke, 21 hours ago
@Whitedracke: You need a look-behind: r'(?:(?<=[^@])|^){}\b' — Martijn Pieters, 21 hours ago
@Whitedracke: or better still, a negative look-behind: r'(?<!@){}\b' — Martijn Pieters, 21 hours ago
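Putting the negative look-behind from the last comment into the answer's loop might look like the sketch below. The names patterns, replacements and text are chosen here for illustration, and the input string is an assumed test case rather than one from the question.

```python
import re

patterns = ["$x", "$y", "$hi_buddy"]
replacements = ["A", "B", "C"]
text = "$xena is here $x and @$y. foo$x"

for pattern, replacement in zip(patterns, replacements):
    # (?<!@) rejects a match whose preceding character is '@';
    # \b rejects a match that continues into a longer word.
    regex = r"(?<!@){}\b".format(re.escape(pattern))
    text = re.sub(regex, replacement, text)

print(text)  # $xena is here A and @$y. fooA
```

Note that this leaves the @ itself in the output; whether the escape character should be stripped afterwards is not specified in the question.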

zmo · Answer 2 · 2014-04-14 09:46:19Z

This is not a direct answer to your question, but since I expect others will give solutions that hack around \b, I'm going to suggest a more Pythonic solution:

rep_dict = {'x': 'A', 'y': 'B', 'hi_buddy': 'C'}
string = '{xena} is here {x} and {y}'

print(string.format(**rep_dict))

But here it will raise a KeyError because xena is missing from rep_dict. That can be solved as described in the answers to that question, using a defaultdict or a custom formatter, depending on your use case.
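One possible way to avoid the KeyError, sketched here under the assumption that unknown names should simply be left in place, is a dict subclass with __missing__ passed to str.format_map (Python 3.2+). The class name KeepMissing is hypothetical, and this is a close relative of the defaultdict idea rather than the exact approach from any linked answer.

```python
class KeepMissing(dict):
    """Dict that returns the original '{name}' field for unknown keys."""

    def __missing__(self, key):
        # Re-emit the placeholder unchanged instead of raising KeyError.
        return "{" + key + "}"


rep_dict = KeepMissing(x="A", y="B", hi_buddy="C")
template = "{xena} is here {x} and {y}"

# format_map uses the mapping directly, so __missing__ is honoured.
result = template.format_map(rep_dict)
print(result)  # {xena} is here A and B
```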

The problem with $ is that, without an explicit delimiter, it is not trivial to tell where the variable name ends. Languages with $ variables, such as shells and makefiles, let you mark the boundary explicitly for longer names, e.g. ${xena}; in a makefile, a bare $ applies only to the next single character. Languages like Perl use a grammar to define the context of a $ variable, and I guess they may use regexps as well in the tokenizer.

That's why in Python we use format fields, where {} marks both boundaries of the variable name in the string. There is no bare $, so we do not have to deal with ambiguities (is $xena ${x}ena or ${xena}?).

HTH

of course, I'm giving this for the OP to know and consider using that if it can be an option to him, and for future readers that may consider using a $ variable in strings for a use case that strings formats have been built for. ;-) — zmo, 22 hours ago
This is the correct TOOWTDI (wiki.python.org/moin/TOOWTDI), if the OP has any power over the input strings. — Davidmh, 21 hours ago

Jasper · Answer 3 · 2014-04-14 09:39:49Z

string.replace does not know about regular expressions, so you have to use the re module (https://docs.python.org/3.4/library/re.html), namely the re.sub function:

>>> import re
>>> re.sub(r"\$x\b", "replace", r"$xenia $x")
'$xenia replace'

This'll match $x in foo$x too. – Martijn Pieters 23 hours ago

I need foo$x to be replaced, but I don't know how the escape '\' would get into the arrays. – White dracke 22 hours ago

@Whitedracke: That's an important detail; do include that in your question post! – Martijn Pieters 22 hours ago

@Whitedracke: I updated your post to include that detail, as well as the fact that all patterns start with $. It's details like that that make a huge difference in what is a proper solution and what is not. – Martijn Pieters 22 hours ago

georg · Answer 4 · 2014-04-14 09:49:38Z

You can also try something like this:

import re

search = ["$x" , "$y", "$hi_buddy"]
replace = ["A", "B", "C"]
string = "$xena is here $x and $y skip$x."

repl = dict(zip(search, replace))
print(re.sub(r'\B\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string))

# result: $xena is here A and B skip$x.

\B here means "match $ only when it's preceded by a non-word char or sits at the start of the string". If you need skip$x to be replaced as well, just drop the \B:

print(re.sub(r'\$\w+', lambda m: repl.get(m.group(0), m.group(0)), string))
# $xena is here A and B skipA.
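The same single-pass idea can be extended to honour the @ escape from the question by adding a negative look-behind. This is only a sketch: the function name substitute and the test string are assumptions made for this example.

```python
import re

search = ["$x", "$y", "$hi_buddy"]
replace = ["A", "B", "C"]
repl = dict(zip(search, replace))


def substitute(text):
    # (?<!@) skips any "$name" that is directly preceded by '@'.
    # Names not present in repl are returned unchanged by dict.get.
    return re.sub(r"(?<!@)\$\w+",
                  lambda m: repl.get(m.group(0), m.group(0)),
                  text)


result = substitute("$xena is here $x and @$y skip$x.")
print(result)  # $xena is here A and @$y skipA.
```

Because all names are handled in one pass, text produced by one replacement is never scanned again for other patterns, which a loop of successive re.sub calls cannot guarantee.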

Using \B means $$x is also matched. – Martijn Pieters 22 hours ago

@MartijnPieters: right, and so do !$x, ...$x and similar. I didn't understand from the question if this is a desired behavior or not. – georg 22 hours ago