I have a CSV file like this:

HEADER
"first, column"|"second "some random quotes" column"|"third ol' column"
FOOTER

and I am looking for a result like this:

HEADER
first, column|second "some random quotes" column|third ol' column

In other words: remove "FOOTER" and the quotes at the beginning of the line, at the end of the line, and around each |.

So far this code works:

sed '/FOOTER/d' csv > csv1     # remove FOOTER
sed 's/^\"//' csv1 > csv2      # remove quote at the beginning
sed 's/\"$//' csv2 > csv3      # remove quote at the end
sed 's/\"|\"/|/g' csv3 > csv4  # remove quotes around pipe

As you can see, the problem is that it creates 4 extra files.

Here is another attempt, which aims to do the same thing in a single script without creating extra files. It doesn't work very well.

#!/bin/ksh

sed '/begin/, /end/ { 
        /FOOTER/d
        s/^\"//
        s/\"$//
        s/\"|\"/|/g 
}' csv > csv4
Since you have quoted fields, you can also have newlines inside the fields. Your sed is not going to work with that, only with simplified CSV. Use a programming language with a library that can handle real CSV files (Python/Perl/Ruby). –  Anthon yesterday

2 Answers

5 votes, accepted

First of all, as Michael showed, you can just combine all of these into a single command:

sed '/^FOOTER/d; s/^\"//; s/\"$//; s/\"|\"/|/g' csv > csv1

I think some sed implementations can't cope with several commands joined by ; in one script and might need a separate -e for each:

  sed -e '/^FOOTER/d' -e 's/^\"//' -e 's/\"$//' -e 's/\"|\"/|/g' csv > csv1
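To check the combined command without creating any intermediate files, it can be run against the sample data from the question. The following is only a sketch: the function name strip_quotes is made up for illustration, and the here-document simply reproduces the three sample lines.

```shell
# Hypothetical helper (the name is an assumption): drops the FOOTER line and
# the outer quotes in a single sed call. Reads the named files, or standard
# input when no file is given.
strip_quotes() {
    sed -e '/^FOOTER/d' -e 's/^"//' -e 's/"$//' -e 's/"|"/|/g' "$@"
}

# Feed the sample from the question on standard input.
strip_quotes <<'EOF'
HEADER
"first, column"|"second "some random quotes" column"|"third ol' column"
FOOTER
EOF
# prints:
# HEADER
# first, column|second "some random quotes" column|third ol' column
```

Because the result goes to standard output, it can be redirected once to the final file (for example strip_quotes csv > csv1), so no csv2, csv3 or csv4 are needed.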

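That said, it looks like your fields are delimited by | and you just want to remove the " around each entire field, leaving those that are within the field. In that case, you could do the following (note that \| as alternation in a basic regular expression is a GNU extension):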

$ sed '/FOOTER/d; s/\(^\||\)"/\1/g; s/"\($\||\)/\1/g' csv 
HEADER
first, column|second "some random quotes" column|third ol' column

Or, with GNU sed and extended regular expressions (-r):

sed -r '/FOOTER/d; s/(^|\|)"/\1/g; s/"($|\|)/\1/g' csv 
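The difference between the two approaches only shows up when a quoted field sits next to an unquoted one. The sketch below compares them on a made-up line; the function names are hypothetical, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do).

```shell
# Hypothetical helpers for comparison; both read standard input.

# Simple version: only removes quotes at the line ends and in "|" pairs.
strip_adjacent() {
    sed -e 's/^"//' -e 's/"$//' -e 's/"|"/|/g'
}

# Field-anchored version: removes a quote that directly follows the start of
# the line or a |, and a quote that directly precedes a | or the end of line.
strip_per_field() {
    sed -E -e 's/(^|\|)"/\1/g' -e 's/"($|\|)/\1/g'
}

# Made-up input: the middle field b is not quoted.
printf '%s\n' '"a"|b|"c "d" e"' | strip_adjacent    # prints: a"|b|"c "d" e
printf '%s\n' '"a"|b|"c "d" e"' | strip_per_field   # prints: a|b|c "d" e
```

The simple version leaves stray quotes behind because "|" never appears around the unquoted field, while the field-anchored version treats each side of the delimiter separately.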

You could also use Perl:

$ perl -F'\|' -lane 'next if /FOOTER/; s/^"|"$//g for @F; print join "|", @F' csv 
HEADER
first, column|second "some random quotes" column|third ol' column

This would also work:

sed 's/^"//; s/"|"/|/g; s/""$/"/'

Example:

$ echo '"this"|" and "ths""|" and "|" this 2"|" also "this", "thi", "and th""' | 
sed 's/^"//; s/"|"/|/g; s/""$/"/'
this| and "ths"| and | this 2| also "this", "thi", "and th"

Multi-line version, with $d added to delete the last line (the footer):

sed '
s/^"//
s/"|"/|/g
s/""$/"/
$d
'
This doesn't deal with the footer. –  terdon yesterday
    
Fixed. Thanks terdon! –  Michael Durrant yesterday
But that will remove the last line no matter what its contents. If there is no FOOTER, it will remove wanted data. –  terdon yesterday
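One way to address this concern is to restrict the delete to a last line that actually is the footer. This is a sketch only, not part of either answer: the function name drop_footer is made up, and it assumes the footer line consists of exactly the word FOOTER.

```shell
# Hypothetical helper: delete the last line only if it is exactly FOOTER.
# The $ address selects the last line; the block then applies the pattern
# test, so any other last line is left untouched.
drop_footer() {
    sed -e '${' -e '/^FOOTER$/d' -e '}' "$@"
}

printf '%s\n' 'HEADER' 'data' 'FOOTER' | drop_footer   # prints HEADER and data
printf '%s\n' 'HEADER' 'data' | drop_footer            # prints HEADER and data
```

In the second call there is no footer, and the last data line is kept rather than removed.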
