Unix Truncate Column in csv file

Question

How do I truncate column "test10" to 5 characters from Unix command line?

From this

test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 eeeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a2,3,u
hyt,bb,mb,khj,R ee3ee eeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a,5,r
mbn,htr,ccc,fdf,F1ee eeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a,e,r

To this

test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 e,a2,3,u
hyt,bb,mb,khj,R ee3,a,5,r
mbn,htr,ccc,fdf,F1ee ,a,e,r

Are you actually running Unix or is it Linux? Do you have access to GNU tools? — terdon♦, 22 hours ago

terdon · Accepted Answer · 2016-12-29 17:14:42Z

If your file really is as simple as your example, you can do one of:

awk
```
$ awk -F, -vOFS=, 'NR>1{$5=substr($5,1,5)}1' file 
test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 e,a2,3,u
hyt,bb,mb,khj,R ee3,a,5,r
mbn,htr,ccc,fdf,F1ee ,a,e,r
```
Explanation

The -F, sets the input field separator to , and the -vOFS=, sets the variable OFS (the output field separator) to ,. NR is the current line number, so the script above will change the 5th field to a 5-character substring of itself. The lone 1 is awk shorthand for "print this line".
perl
```
$ perl -F, -lane '$F[4]=~s/(.{5}).*/$1/ if $.>1; print join ",", @F' file 
test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 e,a2,3,u
hyt,bb,mb,khj,R ee3,a,5,r
mbn,htr,ccc,fdf,F1ee ,a,e,r
```
Explanation

The -a makes perl act like awk and split its input lines on the character given by -F and saves them as elements of the array @F. We then remove all but the 1st 5 characters of the 5th field (they start counting at 0) and then print the resulting @F array joined with commas.
sed
```
$ sed  -E '1!s/(([^,]+,){4}[^,]{5,5})[^,]*,/\1,/' file
test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 e,a2,3,u
hyt,bb,mb,khj,R ee3,a,5,r
mbn,htr,ccc,fdf,F1ee ,a,e,r
```
Explanation

This is the substitution operator whose general format is s/original.replacement/. The 1! means "don't do this for the 1st line". The regular expression matches a set of non-, followed by a , 4 times (([^,]+,){4}), then any 5 non-, characters ([^,]{5})—these are the 1st 5 of the 5th field—and then anything else until the end of the field ([^,]+,). All this is replaced with the first part of the line, effectively truncating the field.

Kusalananda · Answer 2 · 2016-12-29 16:21:31Z

Using Awk:

$ awk -F "," 'BEGIN { OFS = FS } NR > 1 { $5 = substr($5,1,5) } { print }' data.csv

The -F flag sets the input field separator, and the BEGIN block sets the output field separator to whatever the input field separator is (a comma).

If the ordinal number of the current record (NR) is greater than one (i.e. we've passed the header line), then the substr() function will truncate the fifth field (column) to at most five characters. This avoids modifying the first line of the input data.

The lone print will print the (possibly) modified record (line) to standard output.

asked	today
viewed	159 times
active	today

current community

your communities

more stack exchange communities

Unix Truncate Column in csv file

2 Answers 2

Explanation

Explanation

Explanation

Your Answer

Not the answer you're looking for? Browse other questions tagged text-processing or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Unix Truncate Column in csv file

2 Answers 2

Explanation

Explanation

Explanation

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged text-processing or ask your own question.

Related

Hot Network Questions