
I am writing a bash script which accepts a list of CSV files as arguments and outputs e-mail addresses only found in the first file. To accomplish this, for each record in the first CSV file I look up the e-mail address field and read its contents into a shell variable. Then, I use grep -iE with the following regular expression to look up the e-mail address just found in all the remaining files, making sure that it is not a substring (e.g. [email protected] is not the same as [email protected]), and allowing it to be at the beginning or end of a record:

"^(.*,)?($EMAIL_ADDRESS|\"$EMAIL_ADDRESS\")(,.*)?\$"

A problem with this approach is that e-mail addresses contain dots which have a special meaning in regular expressions. My questions are:

  1. How can I avoid this problem in an elegant way?
  2. How can I avoid this problem in a more general context, e.g. when the value to look up is not an e-mail address but some free text and might contain other special characters as well?
Use Perl instead of bash: quotemeta. –  choroba Apr 14 at 9:47
    
awk -F, '$2 == "[email protected]"'? –  Michael Homer Apr 14 at 9:47
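The field comparison suggested above avoids regular expressions altogether, because awk's == on strings is a literal comparison. A minimal sketch, assuming the address sits in a known column and that fields contain no embedded commas; the function name in_field and the VALUE environment variable are made up for illustration:

```shell
#!/usr/bin/env bash
# in_field COLUMN VALUE FILE...
# Succeeds if VALUE equals the given comma-separated column on some line,
# ignoring case and one pair of surrounding double quotes.
# Hypothetical helper; assumes fields never contain embedded commas.
in_field() {
    local col=$1 value=$2
    shift 2
    # The value is passed through the environment so awk does not
    # interpret backslash escapes in it (as -v assignments would).
    VALUE=$value awk -F, -v col="$col" '
        { f = $col; gsub(/^"|"$/, "", f) }
        tolower(f) == tolower(ENVIRON["VALUE"]) { found = 1; exit }
        END { exit !found }
    ' "$@"
}
```

Something like in_field 2 "$EMAIL_ADDRESS" second.csv third.csv could then replace the grep call in the script.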
Use a backslash in front of the dot (\.) to escape it. You probably need two backslashes (\\.) to get the shell to pass one to the program. –  Skaperen Apr 14 at 10:11
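For the more general case, the backslash escaping can be automated rather than typed by hand. A minimal sketch, assuming the value is a single line of text and the pattern is used with grep -E; the function name escape_ere is made up for illustration:

```shell
#!/usr/bin/env bash
# escape_ere STRING
# Prints STRING with every extended-regex metacharacter preceded by a
# backslash, so that grep -E treats the result as literal text.
# Hypothetical helper; assumes STRING contains no newlines.
escape_ere() {
    # The bracket expression lists ] [ \ . | $ ( ) { } ? + * ^ ;
    # "&" in the replacement stands for the matched character.
    printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

# Possible use with the variable from the question:
#   grep -iE "(^|,)$(escape_ere "$EMAIL_ADDRESS")(,|\$)" other.csv
```

Because the escaped text comes from a command substitution, the shell passes the backslashes to grep unchanged, so no doubling is needed here.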
    
You can use grep -F so that the pattern is treated as a plain string rather than a regular expression, together with the -w option (whole word), which means the pattern must match a complete "word" (so "he@" does not match "she@"). –  Costas Apr 14 at 11:25
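A minimal sketch of the grep -F -w suggestion; the function name fixed_word_match is made up for illustration. One caveat: -w only counts letters, digits and underscores as word characters, so "he@example.com" would still be found inside "s.he@example.com".

```shell
#!/usr/bin/env bash
# fixed_word_match TEXT FILE...
# Prints the lines that contain TEXT as a literal string (-F), ignoring
# case (-i), and only where the match is not directly preceded or
# followed by a letter, digit or underscore (-w).
# Hypothetical helper, not part of any existing tool.
fixed_word_match() {
    local needle=$1
    shift
    grep -Fwi -e "$needle" -- "$@"
}
```

With -F the dot is just a dot, so a line containing "he@exampleXcom" is not reported for the search text "he@example.com".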

1 Answer

In Perl-compatible regular expressions (grep -P ...), you can use \Q...\E to protect metacharacters:

grep -P "(^|,)\Q$EMAIL\E(,|$)" file.csv

where:

  • (^|,) = start of field
  • (,|$) = end of field
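The pattern above could be wrapped in a small function that also accepts the double-quoted form of the field mentioned in the question. This is only a sketch: it assumes GNU grep built with PCRE support (-P) and a value that does not itself contain the sequence \E; the function name field_match is made up for illustration.

```shell
#!/usr/bin/env bash
# field_match VALUE FILE...
# Succeeds if VALUE appears as a whole comma-separated field, bare or
# wrapped in double quotes, on some line of the given files.
# Hypothetical helper; requires GNU grep with -P and a VALUE without "\E".
field_match() {
    local value=$1
    shift
    # \Q...\E makes everything in between literal, so dots in VALUE
    # match only dots. \$ keeps the shell from touching the anchor.
    grep -qiP -e "(^|,)(\Q${value}\E|\"\Q${value}\E\")(,|\$)" -- "$@"
}

# Possible use inside the script from the question:
#   field_match "$EMAIL_ADDRESS" "${@:2}" || printf '%s\n' "$EMAIL_ADDRESS"
```

The -q option makes grep stop at the first hit and report the result only through its exit status.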
Don't you need to escape the $? Also, where are \Q and \E documented? –  Angel Tsankov Apr 15 at 10:15
    
In bash, both ...$) and ...\$) work; escaping is shell-dependent. \Q...\E is documented in any tutorial on Perl-like regular expressions. See also the "quotemeta" link in @choroba's comment. –  JJoao Apr 15 at 11:13
