
I have two frequency-count dictionaries. I have tried to merge them using the join and sort commands, but I always get wrong output or different frequency counts. I want to join them together: if a word already exists, add 1 to its count, and a new word gets a count of 1.

The first dictionary

  7 umslipped
  1 umslippersmouthwashand
  3 umslobagas
 35 umslopogaas
  5 (umslopogaas
 15 (umslopogaas)
  1 umslower
  6 umsmall
  2 umsnag
  2 um[snaps
 13 umsnootchie
  2 umsnow
 84 umso
 14 um-so ##

The second dictionary

1   palpating
1   palpated
1   palpate
1   palpably
1   palpable
1   palominos
1   palomino
1   palomar
1   palmyra
1   palmy
1   palmtops
1   palmtop
1   palms
1   palmolive
1   palmists
1   palmistry
1   palmist

1 Answer

You can use awk to add up the counts for each word across both files.

awk '{ arr[$2] += $1} END {for (key in arr) {printf "%4s %s\n", arr[key], key}}' file1 file2

Explanation

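  • { arr[$2] += $1} for every line, add the count in field 1 ($1) to the array element indexed by the word in field 2 ($2). An element that doesn't exist yet starts out as 0, so a word found in only one file simply keeps its own count.
  • END runs once, after every line of both files has been read.
  • {for (key in arr) {printf "%4s %s\n", arr[key], key}} loops over every word in the array and prints its total, right-aligned in a four-character column, followed by the word. The order in which for (key in arr) visits the keys is unspecified, so the output is not sorted.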
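Because the one-liner keys on $2 only, the um-so ## line comes out as plain um-so in the output further down: awk splits on whitespace and the third field is never used. A possible variant, sketched below, keeps everything after the count as the key, prints in the same count-then-word layout as the input (so the result can be merged again), and sorts by total, highest first. merge_counts is a hypothetical helper name, and the sketch assumes a POSIX sh and awk and that every non-blank line starts with an integer count.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-word counts of any number of
# "count word" files and print the totals, highest count first.
merge_counts() {
    awk 'NF {                      # skip blank lines
        count = $1
        word = $0
        # Strip the leading count and the blanks around it so that keys
        # with more than one field, such as "um-so ##", stay intact.
        sub(/^[ \t]*[0-9]+[ \t]+/, "", word)
        total[word] += count       # missing entries start at 0
    }
    END {
        # Same "count word" layout as the input, so the result can be
        # merged again with another file later.
        for (w in total) printf "%4d %s\n", total[w], w
    }' "$@" | sort -rn
}
```

Usage: merge_counts file1 file2. Any number of files can be passed, including an earlier merged result. Sorting by count is only a presentation choice; without | sort -rn the order is as arbitrary as in the one-liner.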

What I did to test it

file1

  7 umslipped
  1 umslippersmouthwashand
  3 umslobagas
 35 umslopogaas
  5 (umslopogaas
 15 (umslopogaas)
  1 umslower
  6 umsmall
  2 umsnag
  2 um[snaps
 13 umsnootchie
  2 umsnow
 84 umso
 14 um-so ##

file2

 14 um-so ##
 84 umso
  2 umsnow
 13 umsnootchie
  2 um[snaps
  2 umsnag
  6 umsmall
  1 umslower
 15 (umslopogaas)
  5 (umslopogaas
 35 umslopogaas
  3 umslobagas
  1 umslippersmouthwashand
  7 umslipped

Output

  10 (umslopogaas
  12 umsmall
   6 umslobagas
  28 um-so
   2 umslippersmouthwashand
  30 (umslopogaas)
  70 umslopogaas
  26 umsnootchie
   4 umsnag
 168 umso
   4 um[snaps
  14 umslipped
   4 umsnow
   2 umslower
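The join/sort approach from the question can probably be made to work too. Two things commonly trip it up: join prints only words that appear in both files unless -a 1 -a 2 is given, and both inputs must be sorted on the join field using the same collation that join compares with. The rough sketch below assumes a POSIX shell plus mktemp, at most one line per word in each file, and uses merge_with_join as a hypothetical helper name; like the awk one-liner, it keys on the second field only.

```shell
#!/bin/sh
# Hypothetical helper: merge two "count word" files with sort + join.
# LC_ALL=C makes sort and join agree on the byte order of the words.
# sort -b -k2,2 sorts on the word (field 2), ignoring the blanks before it.
# join -a 1 -a 2 also prints words found in only one file, and -e 0 puts
# a 0 in the missing count; -o 0,1.1,2.1 prints "word count1 count2".
merge_with_join() {
    sorted1=$(mktemp) || return 1
    LC_ALL=C sort -b -k2,2 "$1" > "$sorted1"
    LC_ALL=C sort -b -k2,2 "$2" |
        LC_ALL=C join -1 2 -2 2 -a 1 -a 2 -e 0 -o 0,1.1,2.1 "$sorted1" - |
        awk '{ printf "%4d %s\n", $2 + $3, $1 }'   # back to "count word"
    rm -f "$sorted1"
}
```

Usage: merge_with_join file1 file2. Unlike the awk version, the output comes out sorted by word, because join walks both sorted inputs in order.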
  • Putting the output in the same format as the input, rather than swapping the order around and adding a colon, would mean that the program could be used to process data multiple times.
    – icarus
    Commented Nov 19, 2016 at 2:19
  • Thank you, but this is a dataset, so the format should be the same.
    – Fuji
    Commented Nov 19, 2016 at 14:37
  • @ahmedsabir I updated the command to change the format to the desired outcome. I am on mobile, so I will update the sample output when I'm near a computer.
    Commented Nov 19, 2016 at 14:40
  • @ahmedsabir I updated the command and the output.
    Commented Nov 21, 2016 at 20:12
  • @ZacharyBrady, thanks a lot, it works like a charm.
    – Fuji
    Commented Nov 22, 2016 at 11:35
