Take the 2-minute tour ×
Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.. It's 100% free, no registration required.

I have encountered a problem that exceeds my basic unix knowledge and would really appreciate some help. I have a large file in the following format:

2 1019 0 12
2 1019 3 0
2 1021 0 2
2 1021 2 0
2 1022 4 5
2 1030 0 1
2 1030 5 0
2 1031 4 4

If the values in column two match, I want to sum the values in column 3 and 4 of both lines, else just the sum of the values in the unique line.

So the output I am hoping for would look like this:

2 1019 15
2 1021 4
2 1022 9
2 1030 6
2 1031 8

I am able to sort files according to column 2 with awk or sort and sum the last columns with awk, but only for individual lines not for two lines where column 2 matches.

share|improve this question
1  
What about column 1? –  glenn jackman Mar 18 at 11:59
    
@glennjackman: Column 1 has the same value throughout each file. It serves as an identifier for the file (I have 45 of those) and will be used for some downstream process. For my question it might as well be ignored (or deleted) and later added again. –  TomPio Mar 18 at 14:33
    
or, make $1 $2 as the key. –  glenn jackman Mar 18 at 15:04

3 Answers 3

up vote 4 down vote accepted

I would do this in Perl:

$ perl -lane '$k{"$F[0] $F[1]"}+=$F[2]+$F[3]; 
              END{print "$_ $k{$_}" for keys(%k) }' file 
2 1019 15
2 1021 4
2 1030 6
2 1031 8
2 1022 9

Or awk:

awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file 

If you want the output sorted according to the second column you could just pipe to sort:

awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file | sort -k2

Note that both solutions include the 1st column as well. The idea is to use the first and second columns as keys to a hash (in perl) or an associative array (in awk). The key in each solution is column1 column2 so if two lines have the same column two but a different column one, they will be grouped separately:

$ cat file
2 1019 2 3
2 1019 4 1
3 1019 2 2

$ awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file
3 1019 4
2 1019 10
share|improve this answer

Maybe this could help, but is column 1 always 2 and does results depend on it?

awk '{ map[$2] += $3 + $4; } END { for (i in map) { print "2", i, map[i] | "sort -t't'" } }' file

or as mentioned by glenn jackman in comments about sorting:

gawk '{ map[$2] += $3 + $4; } END { PROCINFO["sorted_in"] = "@ind_str_asc"; for (i in map) { print 2, i, map[i] } }' file
share|improve this answer
2  
If you have GNU awk, use PROCINFO["sorted_in"] = "@ind_num_asc" instead of piping to sort. ref gnu.org/software/gawk/manual/html_node/… –  glenn jackman Mar 18 at 12:01
    
@taliezin: Thanks taliezin and terdon. Both approaches worked like a charm. I really appreciate your help. –  TomPio Mar 18 at 14:39
1  
@taliezin: As I said both worked for me, I marked the terdon answers as the "correct" one. I guess that's what you intended. Thanks again. –  TomPio Mar 20 at 13:43
1  
If I understand the question you want the total unique keys, we can just add a counter and print it: awk '{ map[$2] += $3 + $4; } END { for (i in map) { print "2", i, map[i] | "sort -t'n'"; cnt++; } print "total unique: " cnt }' file –  taliezin Mar 23 at 9:19
1  
It is almost the same: awk '{ map[$2] += $3 + $4; oc[$2]++; } END { for (i in map) { print "2", i, map[i], oc[i] | "sort -t'n'"; } }', now you will see another column with occurrences. –  taliezin Mar 23 at 9:30

You could pre-sort the data and let awk handle the details:

sort -n infile | awk 'NR>1 && p!=$2 {print p,s} {s+=$3+$4} {p=$2}'

Output:

1019 15
1021 19
1022 28
1030 34

If you really want the keep the first column, do something like this:

sort -n infile | awk 'NR>1 && p!=$1FS$2 {print p,s} {s+=$3+$4} {p=$1FS$2}'

Output:

2 1019 15
2 1021 19
2 1022 28
2 1030 34

Explanation

The p variable holds the $2 value of the previous line, or $1FS$2 in the second case above. This means that the {print p,s} is triggered when $2 of the previous line is not the same as the one on the current line (p!=$2).

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.