Take the 2-minute tour ×
Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.. It's 100% free, no registration required.

I have N files like so:

file1.txt

Header1,Header2,Header3,Header4,Header5
A,B,RANDOM,1,2
C,D,RANDOM,3,4

fileN.txt

Header1,Header2,Header3,Header4,Header5
A,B,RANDOM,1,2
C,D,RANDOM,3,4

They all have the same headers. I would like to sum all of Header4 and Header5 based on Header1 and Header2. So all items with the A,B fields should sum Header4,Header5.

To print something like

A,B,2,4
C,D,6,8
share|improve this question

1 Answer 1

up vote 2 down vote accepted

Assuming ordering of output is no requirement...

awk '
  BEGIN { FS=OFS=SUBSEP="," }
  { s4[$1,$2]+=$4 ; s5[$1,$2]+=$5 }
  END { for (k in s4) print k,s4[k],s5[k] }
' file1 ... fileN

(Edited and restructured for better legibility.)

share|improve this answer
    
What happens when not all files have A,B? Also would it be possible to print the number of times A,B was found? –  Sten Kin Mar 11 at 21:42
    
@StenKin, what do you mean? This solution does not rely on "A" or "B". Do you mean that some files do not have 5 columns? Or that some fields are empty? –  glenn jackman Mar 11 at 21:44
    
@Sten Kin, in this case there's nothing to add for those files and only the other files contribute to the key "A,B". In other words, the code is robust against inhomogeneous data files of that sort. –  Janis Mar 11 at 21:45
    
Ah that makes sense! Neat solution. –  Sten Kin Mar 11 at 21:49
    
@Sten Kin, I see you had another question in your comment (but not in your original question above). To get the number of appearences of the keys you can count them, c[$1,$2]++, and print thes result (e.g. by adding c[k] as another argument in the print statement). –  Janis Mar 11 at 22:50

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.