patterns.txt:

"BananaOpinion"
"ExitWarning"
"SomeMessage"
"Help"
"Introduction"
"MessageToUser"

Strings.xml:

<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>

Expected output:

"ExitWarning"
"SomeMessage"
"Help"

How do I print the terms in patterns.txt that are not found in Strings.xml? I can print the matched/unmatched lines in Strings.xml, but how do I print the unmatched patterns? I'm using ggrep (GNU grep) version 2.21, but am open to other tools. Apologies if this is a duplicate of another question that I couldn't find.

3 Answers

Accepted answer (score 7):

You could use grep -o to print only the matching part and use the result as patterns for a second grep -v on the original patterns.txt file:

grep -oFf patterns.txt Strings.xml | grep -vFf - patterns.txt
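
To try the pipeline without touching real files, here is a minimal sketch that recreates the question's sample data in a scratch directory and runs the two steps separately. The variable names (workdir, found, missing) are only illustrative, and the -x flag on the second grep is an addition that is not in the command above: it makes the comparison whole-line, which seems safer if one pattern could be a substring of another.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (hypothetical location).
workdir=$(mktemp -d) || exit 1

cat > "$workdir/patterns.txt" <<'EOF'
"BananaOpinion"
"ExitWarning"
"SomeMessage"
"Help"
"Introduction"
"MessageToUser"
EOF

cat > "$workdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Step 1: -o prints only the matched text, so this lists the patterns that DO occur.
found=$(grep -oFf "$workdir/patterns.txt" "$workdir/Strings.xml")

# Step 2: remove those from patterns.txt, leaving the unmatched patterns.
# -x (whole-line match) is an assumption added here, not part of the answer's command.
missing=$(printf '%s\n' "$found" | grep -vxFf - "$workdir/patterns.txt")

printf '%s\n' "$missing"
rm -rf "$workdir"
```

With the sample data this should print the three lines from the question's expected output, in the order they appear in patterns.txt.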
    
Another way without grep, if your "patterns" are always fixed strings enclosed in double quotes: join -t\" -v1 -1 2 -2 2 -o 1.1 1.2 1.3 <(sort -t \" -k 2 patterns.txt) <(sort -t \" -k 2 Strings.xml) – don_crissti

The best approach is probably what @don_crissti suggested, so here's a variation on the same theme:

$ grep -vf <(grep -Po 'name=\K.+?"' Strings.xml) patterns.txt
"ExitWarning"
"SomeMessage"
"Help"

This is essentially the inverse of @don_crissti's approach. It uses grep with Perl Compatible Regular Expressions (-P) and the -o switch to print only the matching part of the line. The regex looks for name= and discards it (\K), then matches as few characters as possible (at least one, which here is the opening quote) up to the next " (.+?"). The result is the list of quoted names present in the Strings.xml file, which is then passed as the pattern file (-f) to an inverted grep (grep -v) using process substitution (<(command)).
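
Not every grep has -P (the asker had to install GNU grep as ggrep), so here is a hedged sketch of the same extraction step done with sed instead. This swaps in a different tool, not the command from the answer; the variable names are illustrative, and the sed expression assumes at most one name="..." attribute per line, as in the sample.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (hypothetical location).
workdir=$(mktemp -d) || exit 1

cat > "$workdir/patterns.txt" <<'EOF'
"BananaOpinion"
"ExitWarning"
"SomeMessage"
"Help"
"Introduction"
"MessageToUser"
EOF

cat > "$workdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Keep only the quoted value that follows name=, quotes included.
# Assumes one name="..." per line; with several, only the last would be kept.
names=$(sed -n 's/.*name=\("[^"]*"\).*/\1/p' "$workdir/Strings.xml")

# Treat the extracted names as fixed, whole-line patterns and invert the match.
missing=$(printf '%s\n' "$names" | grep -vxFf - "$workdir/patterns.txt")

printf '%s\n' "$missing"
rm -rf "$workdir"
```

Because the names keep their surrounding quotes, they line up exactly with the lines of patterns.txt, which is why -x and -F can be used in the second step.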

I would use cut, probably. That is, if, as it appears, you know where to expect the quoted string you're looking for.

If I do:

{   cut  -sd\" -f2 |
    grep -vFf- pat
}   <<\IN
#   <string name="Introduction">One day there was an apple that went to the market.</string>
#   <string name="BananaOpinion">Bananas are great!</string>
#   <string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
IN

...after saving my own copy of your example patterns.txt in pat and running the above command the output is:

"ExitWarning"
"SomeMessage"
"Help"

cut prints to stdout only the second field (-f2) of each line, using the double quote as the field delimiter (-d\"), and suppresses (-s) every line that contains no delimiter.

What cut actually passes to grep is:

Introduction
BananaOpinion
MessageToUser

grep searches its named file operand (pat) for lines that do not match (-v) any of the fixed strings (-F) it reads as a pattern file (-f) from stdin (-).

If you can rely on the second "-delimited field being the one to match, this should be cheaper than grep's Perl mode (-P): grep only has to match short fixed strings (-F), because cut does the heavy lifting - and it does it fast.
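
Since cut strips the quotes, the strings it hands to grep are matched as substrings of the lines in the pattern list, so a name such as Help would also suppress a pattern like "HelpMe". A possible variation, sketched here under the assumption that every pattern is exactly one double-quoted name, puts the quotes back with sed and adds -x for whole-line matching. The re-quoting step, the -x flag, and the variable names are additions for illustration, not part of the command above.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (hypothetical location).
workdir=$(mktemp -d) || exit 1

cat > "$workdir/patterns.txt" <<'EOF'
"BananaOpinion"
"ExitWarning"
"SomeMessage"
"Help"
"Introduction"
"MessageToUser"
EOF

cat > "$workdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Second "-delimited field of each line; -s skips lines without any double quote.
names=$(cut -s -d '"' -f 2 "$workdir/Strings.xml")

# Re-add the quotes so each name equals a full line of patterns.txt,
# then print the patterns that match none of them.
missing=$(printf '%s\n' "$names" | sed 's/.*/"&"/' | grep -vxFf - "$workdir/patterns.txt")

printf '%s\n' "$missing"
rm -rf "$workdir"
```

The extra sed costs one more process, so whether it is worth it depends on how likely overlapping names are in the real data.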
