Using sed to find and replace complex string (preferably with regex)

Question

I have a file with the following contents:

<username><![CDATA[name]]></username>
<password><![CDATA[password]]></password>
<dbname><![CDATA[name]]></dbname>

and I need to make a script that changes the "name" in the first line to "something", the "password" on the second line to "somethingelse", and the "name" in the third line to "somethingdifferent". I can't rely on the order of these occurring in the file, so I can't simply replace the first occurrence of "name" with "something" and the second occurrence of "name" with "somethingdifferent". I actually need to search for the surrounding strings to make sure I'm finding and replacing the correct thing.

So far I have tried this command to find and replace the first "name" occurrence:

sed -i "s/<username><![CDATA[name]]><\/username>/something/g" file.xml

however it's not working so I'm thinking some of these characters might need escaping, etc.

Ideally, I'd love to be able to use regex to just match the two "username" occurrences and replace only the "name". Something like this but with sed:

<username>.+?(name).+?</username>

and replace the contents in the brackets with "something".

Is this possible?

Just note that pretty much any regexp-based solution, unless extremely contrived, will risk breaking any time the input format changes. Regexps are a poor choice for dealing with XML, SGML or derivatives (which this looks like to me). – Michael Kjörling, Jun 7 '13 at 21:57
Agreed! Consider using XQuery, for example: w3schools.com/xquery/default.asp. This is the W3C standard for retrieving and manipulating XML content. – lgeorget, Jun 7 '13 at 22:01

evilsoup · Answer 1 · 2013-06-07 22:05:08Z

4 votes

sed -e '/username/s/CDATA\[name\]/CDATA\[something\]/' \
-e '/password/s/CDATA\[password\]/CDATA\[somethingelse\]/' \
-e '/dbname/s/CDATA\[name\]/CDATA\[somethingdifferent\]/' file.txt

The /username/ before the s tells sed to only work on lines containing the string 'username'.
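
As a minimal sketch of why the address makes the order of the lines irrelevant, the same three expressions can be run on a shuffled copy of the sample. The variable names and the inline sample are assumptions made for the demonstration, not part of the original answer.

```shell
# Sample input with the three lines in a different order than in the question.
input='<dbname><![CDATA[name]]></dbname>
<username><![CDATA[name]]></username>
<password><![CDATA[password]]></password>'

# Each -e expression only runs on lines that match its /address/.
result=$(printf '%s\n' "$input" | sed \
  -e '/username/s/CDATA\[name\]/CDATA[something]/' \
  -e '/password/s/CDATA\[password\]/CDATA[somethingelse]/' \
  -e '/dbname/s/CDATA\[name\]/CDATA[somethingdifferent]/')

printf '%s\n' "$result"
```

The brackets only need a backslash in the pattern; in the replacement part they are ordinary characters.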


Elegant, efficient and perfectly fitted for the case. +1 – lgeorget Jun 7 '13 at 22:08


lgeorget · Answer 2 · 2013-06-07 21:52:27Z

2 votes

sed -i -E "s/(<username>.+)name(.+<\/username>)/\1something\2/" file.xml

This is, I think, what you're looking for.

Explanation:

Parentheses in the first part define groups (captured substrings) that can be reused in the second part.
\1, \2, etc. in the second part refer to the i-th group captured in the first part (numbering starts at 1).
-E enables extended regular expressions, so + and the grouping parentheses work without backslashes.
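
A small sketch of the groups at work, run without -i so that nothing is modified on disk. The two-line sample and the variable names are assumptions for illustration; it also assumes a sed that accepts -E, as the command above does.

```shell
input='<username><![CDATA[name]]></username>
<dbname><![CDATA[name]]></dbname>'

# Group 1 captures everything from <username> up to "name", group 2 everything
# after it up to </username>; \1 and \2 put both back around the new value.
result=$(printf '%s\n' "$input" |
  sed -E 's/(<username>.+)name(.+<\/username>)/\1something\2/')

printf '%s\n' "$result"
```

Because the pattern is anchored on the username tags, the dbname line passes through unchanged.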


This is probably not the most efficient way to do it, but with regexps the tradeoff is always the same: readability vs. efficiency! :D – lgeorget Jun 7 '13 at 21:55

+1 for the -E option – sgmart Jun 7 '13 at 22:03


sgmart · Answer 3 · 2013-06-07 22:01:03Z

To replace the word "name" with "something", use:

sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/g' file.xml

This replaces "name" wherever it directly follows <username><![CDATA[. The closing ] that the pattern consumes is written back in the replacement, so the CDATA section stays intact.

The result only goes to standard output. To save the changes to another file, redirect it:

sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/g' file.xml > anotherfile.xml

Gilles · Answer 4 · 2013-06-08 00:15:06Z

You need to quote \[.*^$/ in the regular expression part of the s command and \&/ in the replacement part, plus newlines. The regular expression is a basic regular expression, and in addition you need to quote the delimiter for the s command.
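
When the search or replacement text comes from a variable, the quoting described above can be automated. The following is only a sketch: escape_pattern and escape_replacement are hypothetical helper names, it assumes / is the delimiter of the outer s command, and it does not handle newlines inside the strings.

```shell
# Hypothetical helper: backslash-escape ] [ \ . * ^ $ and / so that the
# string is matched literally in a basic regular expression.
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Hypothetical helper: backslash-escape \ & and / for the replacement part.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

old='[name]'
new='[a/b&c]'
line='<username><![CDATA[name]]></username>'

result=$(printf '%s\n' "$line" |
  sed "s/$(escape_pattern "$old")/$(escape_replacement "$new")/")

printf '%s\n' "$result"
```

The helpers themselves use | as their delimiter so that the / inside the bracket expression does not need special treatment.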

You can pick a different delimiter to avoid having to quote /. You'll have to quote that character instead, but usually the point of changing the delimiter is to pick one that doesn't occur in either the text to replace or the replacement text.

sed -e 's~<username><!\[CDATA\[name\]\]></username>~<username><![CDATA[something]]></username>~'

You can use groups to avoid repeating some parts in the replacement text, and accommodate variation on these parts.

sed -e 's~\(<username><!\[[A-Z]*\[\)name\(\]\]></username>\)~\1something\2~'

sed -e 's~\(<username>.*[^A-Za-z]\)name\([^A-Za-z].*</username>\)~\1something\2~'
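
As a sketch of how the alternative delimiter and the groups combine for the whole task, the strict pattern can be repeated once per tag. The password and dbname expressions are extrapolated from the username one and the inline sample is an assumption; neither appears in the original answer.

```shell
input='<password><![CDATA[password]]></password>
<dbname><![CDATA[name]]></dbname>
<username><![CDATA[name]]></username>'

# ~ is the delimiter, so the / in the closing tags needs no backslash.
# \1 and \2 carry the text around the old value over into the output.
result=$(printf '%s\n' "$input" | sed \
  -e 's~\(<username><!\[[A-Z]*\[\)name\(\]\]></username>\)~\1something\2~' \
  -e 's~\(<password><!\[[A-Z]*\[\)password\(\]\]></password>\)~\1somethingelse\2~' \
  -e 's~\(<dbname><!\[[A-Z]*\[\)name\(\]\]></dbname>\)~\1somethingdifferent\2~')

printf '%s\n' "$result"
```

Each expression names both the opening and the closing tag, so a line is only changed when the whole element matches.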

manatwork · Answer 5 · 2013-06-08 17:58:59Z

If sed is not a hard requirement, it is better to use a dedicated tool instead.

If your file is valid XML (not just those 3 XML-looking tags), then you can use XMLStarlet:

xml ed -P -O -L \
  -u '//username/text()' -v 'something' \
  -u '//password/text()' -v 'somethingelse' \
  -u '//dbname/text()' -v 'somethingdifferent' file.xml

The above will also work in situations which would be difficult to solve with regular expressions:

Can replace the values of the tags without specifying their current values.
Can replace the values even if they are just escaped and not enclosed in CDATA.
Can replace the values even if the tags have attributes.
Can easily replace only specific occurrences of a tag when several tags share the same name.
Can format the modified XML by indenting it.


