How do I truncate column "test10" to 5 characters from the Unix command line?

From this

test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 eeeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a2,3,u
hyt,bb,mb,khj,R ee3ee eeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a,5,r
mbn,htr,ccc,fdf,F1ee eeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a,e,r

To this

test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 e,a2,3,u
hyt,bb,mb,khj,R ee3,a,5,r
mbn,htr,ccc,fdf,F1ee ,a,e,r
    
Are you actually running Unix or is it Linux? Do you have access to GNU tools? – terdon 22 hours ago
    
I am using Solaris 10 command line. – Mike 22 hours ago

If your file really is as simple as your example, you can do one of:

  • awk

    $ awk -F, -vOFS=, 'NR>1{$5=substr($5,1,5)}1' file 
    test1,test2,test3,test4,test10,test11,test12,test17
    rh,mbn,ccc,khj,ee3 e,a2,3,u
    hyt,bb,mb,khj,R ee3,a,5,r
    mbn,htr,ccc,fdf,F1ee ,a,e,r
    

    Explanation

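    The -F, sets the input field separator to a comma, and -vOFS=, sets the variable OFS (the output field separator) to a comma as well. NR is the current line number, so for every line after the first (NR>1) the script replaces the 5th field with its first 5 characters; a shorter field is left as it is. The lone 1 is awk shorthand for "print this line". On Solaris 10, the default /usr/bin/awk is likely too old to accept -v; nawk or /usr/xpg4/bin/awk should work.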

  • perl

    $ perl -F, -lane '$F[4]=~s/(.{5}).*/$1/ if $.>1; print join ",", @F' file 
    test1,test2,test3,test4,test10,test11,test12,test17
    rh,mbn,ccc,khj,ee3 e,a2,3,u
    hyt,bb,mb,khj,R ee3,a,5,r
    mbn,htr,ccc,fdf,F1ee ,a,e,r
    

    Explanation

    The -a makes perl act like awk: it splits each input line on the character given by -F and saves the pieces as elements of the array @F. The -l strips the trailing newline on input and adds it back when printing. For every line after the first ($.>1), we remove all but the first 5 characters of the 5th field ($F[4], since array indices start at 0), then print the resulting @F array joined with commas.

  • sed

    $ sed -E '1!s/^(([^,]+,){4}[^,]{5})[^,]*,/\1,/' file
    test1,test2,test3,test4,test10,test11,test12,test17
    rh,mbn,ccc,khj,ee3 e,a2,3,u
    hyt,bb,mb,khj,R ee3,a,5,r
    mbn,htr,ccc,fdf,F1ee ,a,e,r
    

    Explanation

    This is the substitution operator, whose general format is s/original/replacement/. The 1! means "don't do this for the 1st line". The ^ anchors the match to the start of the line, so the expression cannot accidentally start counting fields from somewhere in the middle. The regular expression then matches a run of non-, characters followed by a ,, 4 times (([^,]+,){4}), then exactly 5 non-, characters ([^,]{5}), which are the first 5 of the 5th field, and then the rest of that field up to and including the next comma ([^,]*,). The whole match is replaced with the captured first part (\1) followed by a comma, effectively truncating the field. A 5th field shorter than 5 characters does not match, so that line is left unchanged.
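
The three commands above all locate the column by its position (5). Since the question names the column, a variant that looks the position up in the header row may be useful. This is only a sketch: the function name truncate_named_column and the awk variables name, len and col are made up here, and it assumes a simple CSV (no quoted fields or embedded commas) and an awk that accepts -v.

```shell
# Hypothetical helper: truncate the column whose header equals $1
# to at most $2 characters. Reads CSV from standard input.
# Assumes a simple CSV: no quoted fields, no embedded commas.
truncate_named_column() {
    awk -F, -v OFS=, -v name="$1" -v len="$2" '
        NR == 1 {
            # Find the position of the named column in the header row.
            for (i = 1; i <= NF; i++)
                if ($i == name) col = i
        }
        # Only touch data rows, and only if the column was found.
        NR > 1 && col { $col = substr($col, 1, len) }
        { print }
    '
}

# Sample run on the header and first data row from the question.
printf '%s\n' \
    'test1,test2,test3,test4,test10,test11,test12,test17' \
    'rh,mbn,ccc,khj,ee3 eeeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a2,3,u' |
    truncate_named_column test10 5
```

For a real file, the call would be truncate_named_column test10 5 < file. If no header matches the given name, col stays unset and the input is printed unchanged.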

Using Awk:

$ awk -F "," 'BEGIN { OFS = FS } NR > 1 { $5 = substr($5,1,5) } { print }' data.csv

The -F flag sets the input field separator, and the BEGIN block sets the output field separator to whatever the input field separator is (a comma).

If the ordinal number of the current record (NR) is greater than one (i.e. we've passed the header line), then the substr() function will truncate the fifth field (column) to at most five characters. This avoids modifying the first line of the input data.

The lone print will print the (possibly) modified record (line) to standard output.
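
To illustrate the "at most five characters" point, here is a small sketch that wraps the same awk program in a shell function and feeds it one long and one short field. The function name truncate_field and the variables col and len are assumptions for illustration, not part of the command above; they are passed in with -v, so this needs an awk that supports that option.

```shell
# Hypothetical wrapper: truncate field number $1 to at most $2 characters,
# reading comma-separated data from standard input and skipping the header.
truncate_field() {
    awk -F "," -v col="$1" -v len="$2" '
        BEGIN { OFS = FS }
        NR > 1 { $col = substr($col, 1, len) }   # data rows only
        { print }
    '
}

# One field longer than 5 characters, one shorter.
printf '%s\n' 'name,id' 'abcdefgh,1' 'abc,2' | truncate_field 1 5
```

The long field comes out as abcde, while abc is printed as it is, because substr() simply returns whatever characters are available. Note that a column number larger than the number of fields would make awk append empty fields to each data row, so the sketch assumes col is valid.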
