I have a file with the following contents:

<username><![CDATA[name]]></username>
<password><![CDATA[password]]></password>
<dbname><![CDATA[name]]></dbname>

and I need to make a script that changes the "name" in the first line to "something", the "password" in the second line to "somethingelse", and the "name" in the third line to "somethingdifferent". I can't rely on the order in which these lines occur in the file, so I can't simply replace the first occurrence of "name" with "something" and the second occurrence of "name" with "somethingdifferent". I need to search for the surrounding strings to make sure I'm finding and replacing the correct thing.

So far I have tried this command to find and replace the first "name" occurrence:

sed -i "s/<username><![CDATA[name]]><\/username>/something/g" file.xml

However, it's not working, so I think some of these characters might need escaping.

Ideally, I'd love to be able to use regex to just match the two "username" occurrences and replace only the "name". Something like this but with sed:

<username>.+?(name).+?</username>

and replace the contents in the brackets with "something".

Is this possible?

Just note that pretty much any regexp-based solution, unless extremely contrived, risks breaking whenever the input format changes. Regexps are a poor choice for dealing with XML, SGML or derivatives (which this looks like to me). –  Michael Kjörling Jun 7 '13 at 21:57
 
Approved! Consider using XQuery for example: w3schools.com/xquery/default.asp. This is the W3C standard for retrieving and manipulating XML content. –  lgeorget Jun 7 '13 at 22:01

5 Answers

sed -e '/username/s/CDATA\[name\]/CDATA\[something\]/' \
-e '/password/s/CDATA\[password\]/CDATA\[somethingelse\]/' \
-e '/dbname/s/CDATA\[name\]/CDATA\[somethingdifferent\]/' file.txt

The /username/ before the s tells sed to only work on lines containing the string 'username'.
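
As a minimal sketch of how the address keeps each substitution on its own line, the snippet below feeds the three lines in a different order than in the question. The variable names input and output are only for illustration, and the replacement side is written with unescaped brackets, which are literal there.

```shell
# Sample input with the three lines deliberately out of order.
input='<dbname><![CDATA[name]]></dbname>
<password><![CDATA[password]]></password>
<username><![CDATA[name]]></username>'

# Each -e expression only acts on lines matching its /address/.
output=$(printf '%s\n' "$input" | sed \
  -e '/username/s/CDATA\[name\]/CDATA[something]/' \
  -e '/password/s/CDATA\[password\]/CDATA[somethingelse]/' \
  -e '/dbname/s/CDATA\[name\]/CDATA[somethingdifferent]/')

printf '%s\n' "$output"
```

Each line ends up with its own new value regardless of where it sits in the input.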

 
Elegant, efficient and perfectly fitted for the case. +1 –  lgeorget Jun 7 '13 at 22:08
sed -i -E "s/(<username>.+)name(.+<\/username>)/\1something\2/" file.xml

This is, I think, what you're looking for.

Explanation:

  • parentheses in the first part define groups (captured substrings, in fact) that can be reused in the second part
  • \1, \2, etc. in the second part are references to the i-th group captured in the first part (the numbering starts at 1)
  • -E enables extended regular expressions (needed for + and for unescaped grouping parentheses).
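
To see the two groups at work without touching a file, here is a small sketch that runs the same expression on a username line and on a dbname line. The function name replace_username and the variable names are made up for this illustration, and it assumes a sed that accepts -E.

```shell
# Two sample lines; only the first is a username element.
user_line='<username><![CDATA[name]]></username>'
db_line='<dbname><![CDATA[name]]></dbname>'

# Group 1 keeps everything before "name", group 2 everything after it.
replace_username() {
  printf '%s\n' "$1" | sed -E 's/(<username>.+)name(.+<\/username>)/\1something\2/'
}

user_result=$(replace_username "$user_line")
db_result=$(replace_username "$db_line")

printf '%s\n' "$user_result" "$db_result"
```

The dbname line comes back unchanged because the pattern only matches when the text is wrapped in the username tags.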
 
This is probably not the most efficient way to do it, but when dealing with regexps the tradeoff is always the same: readability vs. efficiency! :D –  lgeorget Jun 7 '13 at 21:55
+1 for the -E option –  sgmart Jun 7 '13 at 22:03

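To replace the word "name" with "something" inside the username element, use:

sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/g' file.xml

The group \(...\) captures everything up to and including the opening bracket in front of the value, and the replacement writes it back as \1, followed by the new value and the closing bracket that the pattern consumed. The g flag replaces every such occurrence on a line.

This only prints the result to standard output. You can use:

sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/g' file.xml > anotherfile.xml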

to save the changes to another file.
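
If the goal is to end up with the changes in the original file, one possible approach is to write to a temporary file and move it over the original only when sed succeeds. The sketch below works in a scratch directory created with mktemp; the workdir variable and the .tmp file name are just assumptions for the example.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
printf '%s\n' '<username><![CDATA[name]]></username>' > "$workdir/file.xml"

# Write the edited text to a temporary file first, then replace the
# original only if sed exited successfully.
# The replacement ends with ] so the CDATA section keeps both closing brackets.
sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/' "$workdir/file.xml" > "$workdir/file.xml.tmp" &&
  mv "$workdir/file.xml.tmp" "$workdir/file.xml"

cat "$workdir/file.xml"
```

Redirecting straight back to file.xml would truncate it before sed reads it, which is why the temporary file is needed.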


You need to quote \[.*^$/ in the regular expression part of the s command and \&/ in the replacement part, plus newlines. The regular expression is a basic regular expression, and in addition you need to quote the delimiter for the s command.

You can pick a different delimiter to avoid having to quote /. You'll have to quote that character instead, but usually the point of changing the delimiter is to pick one that doesn't occur in either the text to replace or the replacement text.

sed -e 's~<username><!\[CDATA\[name\]\]></username>~<username><![CDATA[something]]></username>~'

You can use groups to avoid repeating some parts in the replacement text, and accommodate variation on these parts.

sed -e 's~\(<username><!\[[A-Z]*\[\)name\(\]\]></username>\)~\1something\2~'

sed -e 's~\(<username>.*[^A-Za-z]\)name\([^A-Za-z].*</username>\)~\1something\2~'
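
The quoting rules at the top of this answer can also be applied mechanically when the search and replacement texts are literal strings held in variables. The following is only a sketch: the helper names escape_pattern and escape_replacement are invented here, the helpers assume single-line strings (newlines are not handled), and the final command uses / as the delimiter.

```shell
# Escape \ [ . * ^ $ / so a literal string can be used as a basic regex.
escape_pattern() {
  printf '%s\n' "$1" | sed 's~[[\.*^$/]~\\&~g'
}

# Escape \ & / so a literal string can be used as replacement text.
escape_replacement() {
  printf '%s\n' "$1" | sed 's~[\&/]~\\&~g'
}

pat=$(escape_pattern '<username><![CDATA[name]]></username>')
rep=$(escape_replacement '<username><![CDATA[something]]></username>')

line='<username><![CDATA[name]]></username>'
result=$(printf '%s\n' "$line" | sed "s/$pat/$rep/")

printf '%s\n' "$pat" "$result"
```

The helpers themselves use ~ as the delimiter so that the / inside their bracket expressions needs no special treatment.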

If sed is not a hard requirement, you are better off using a dedicated tool instead.

If your file is valid XML (not just those 3 XML-looking tags), then you can use XMLStarlet:

xml ed -P -O -L \
  -u '//username/text()' -v 'something' \
  -u '//password/text()' -v 'somethingelse' \
  -u '//dbname/text()' -v 'somethingdifferent' file.xml

The above will also work in situations which would be difficult to solve with regular expressions:

  • Can replace the values of the tags without specifying their current values.
  • Can replace the values even if they are just escaped and not enclosed in CDATA.
  • Can replace the values even if the tags have attributes.
  • Can easily replace only specific occurrences of a tag when several tags share the same name.
  • Can format the modified XML by indenting it.
