Egrep multiple strings in an XML file

Question

I have a collection of XML files in a standard format that I'd like to search to see if they match two strings.

Here is the idea:

<ELEMENT1>Dave</ELEMENT>
<DON'TCARE1>Blaa</DON'TCARE1>
<DON'TCARE2>Blaa2</DON'TCARE2>
<ELEMENT2>History</ELEMENT2>

How can I match the content of ELEMENT1 and ELEMENT2 with egrep and return the filename that contains them?

You should first read this to see why using regexps to parse HTML or XML is a bad idea: stackoverflow.com/questions/1732348/…. To look for an element in an XML file, use an XPath expression instead. — lgeorget, Mar 7 '14 at 12:43
Shouldn't it be </ELEMENT1> instead of </ELEMENT> above? — Stéphane Chazelas, Mar 7 '14 at 12:48

Stéphane Chazelas · Accepted Answer · 2014-03-07 12:49:46Z

With recent GNU grep built with recent PCRE:

grep -Po '<(ELEMENT[12]>)\K.*?(?=</\1)'
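
Note that the command above prints the text inside the elements. If what you want is the names of the files that contain both values, chaining two grep -l calls is one possibility. This is only a sketch: files_with_both is a made-up helper name, it assumes GNU grep and GNU xargs (-Z, -0 and -r are GNU extensions), and it assumes each element sits on one line with the closing tag spelled </ELEMENT1>.

```shell
# Hypothetical helper: print the names of files that contain both
# <ELEMENT1>$1</ELEMENT1> and <ELEMENT2>$2</ELEMENT2>.
# Assumes GNU grep and GNU xargs.
files_with_both() {
    first=$1 second=$2
    shift 2
    # -F: fixed strings, -l: list matching files, -Z: NUL-terminate the names
    grep -lZF -e "<ELEMENT1>$first</ELEMENT1>" -- "$@" |
        # -0: read NUL-terminated names, -r: do nothing if the list is empty
        xargs -0 -r grep -lF -e "<ELEMENT2>$second</ELEMENT2>" --
}
```

Called as files_with_both Dave History ./*.xml, it should list only the files in which both lines occur.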

lgeorget · Answer 2 · 2014-03-07 13:06:13Z

The following XQuery should give you the desired output:

for $x in (/content/element1,/content/element2)
return $x/text()

For example, with an XQuery interpreter such as XQilla and an input file like

<?xml version="1.0" ?>
<content>
   <element1>truc</element1>
   <dontcare>blah</dontcare>
   <dontcare>blah</dontcare>
   <element2>truc2</element2>
   <dontcare>blah</dontcare>
   <dontcare>blah</dontcare>
</content>

xqilla -i 1.xml 1.query outputs

truc
truc2

For your example, regexps might be sufficient, but in the general case it's a bad idea to use them for XML parsing because XML is not a regular language (that is, not a language that regular expressions can parse).
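
A small illustration of how fragile the regexp route is, using the element names from the question. XML allows whitespace before the closing > of a tag, so the two inputs below are the same element to an XML parser, yet a fixed pattern only finds the first. count_matches is a made-up helper name for this sketch.

```shell
# Hypothetical helper: count the lines of $1 that contain the exact
# string <ELEMENT1>Dave</ELEMENT1>.
count_matches() {
    # grep -c prints the number of matching lines; "|| true" hides the
    # exit status 1 that grep returns when nothing matches
    printf '%s\n' "$1" | grep -c -F '<ELEMENT1>Dave</ELEMENT1>' || true
}

compact=$(count_matches '<ELEMENT1>Dave</ELEMENT1>')
# Same element for an XML parser, with a space before each ">"
spaced=$(count_matches '<ELEMENT1 >Dave</ELEMENT1 >')

printf 'compact: %s, spaced: %s\n' "$compact" "$spaced"
# prints "compact: 1, spaced: 0"
```

An XPath or XQuery expression does not care about such layout differences, which is the reason for preferring it in the general case.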
