Comparing files in unix columnwise

Question

I want to compare two files with same number of rows and columns with records in same order. Just want to highlight the differences in the column values if any.

file A:

1,kolkata,19,ab

2,delhi,89,cd

3,bangalore,56,ef

file B:

1,kolkata,21,ab

2,mumbai,89,gh

3,bangalore,11,kl

Considering column 1 as primary key, we have differences in other columns. I want to highlight those differences.

Output format may be (not sure):

record_number,  columns_with_diff
1               3

2               2,4

3               3,4

Can diff or comm solve my problem? If yes then what would be the exact command?

Are the blank lines part of your file? – terdon♦ Jun 24 at 13:10 — terdon♦, Jun 24 at 13:10

LatinSuD · Answer 1 · 2014-06-24 13:36:41Z

This would be it. Just some style issue with the extra comma at the end of each line.

awk '
     BEGIN{ FS=","; ORS="" }

     { 
       # read line from secondary file
       getline aux < "file2"
       split(aux,f2,",")

       # print current line number
       print NR" "

       # process each field in current line
       for(i=1; i<=NF; i++) {
         if ($i!=f2[i]) {
           print i","
         }
       }
       print "\n"
     }
' file1

The output:

1 3,
2 2,4,
3 3,4,

Gnouc · Answer 2 · 2014-06-25 09:17:04Z

You can do it easier with perl:

$ perl -F',' -anle '
    BEGIN{
        print "record_number,  columns_with_diff";
        $" = ",";
    }
    if (!defined($h{$.})) {
        @{$h{$.}}{0..$#F} = @F[0..$#F];
    } else {
        @diff =  grep { $h{$.}{$_} ne $F[$_] } 0..$#F;
        print "$.\t\t@{[map {$_+1} @diff]}";
    } 
    close ARGV if eof;
' file1 file2
record_number,  columns_with_diff
1       3
2       2,4
3       3,4

You should remove blank lines in your input for this works.

Explanation

In BEGIN block we just print the header of output, then set list separator to ,
@{$h{$.}}{0..$#F} = @F[0..$#F]: we create a hash of hashes with the keys of first hash are the line number, each sub hashes has the keys are index of field minus 1, and the values are values coresspond with that fields.

Here we use a hash slice to quickly assign values to hash of hashes.

if you use Data::Dumper to print the hash of hashes %h, you can see something like this:

VAR1 = {
          '2' => {
                   '2' => '89',
                   '0' => '2',
                   '1' => 'delhi',
                   '3' => 'cd'
                 },
          '3' => {
                   '1' => 'bangalore',
                   '3' => 'ef',
                   '0' => '3',
                   '2' => '56'
                 },
          '1' => {
                   '3' => 'ab',
                   '1' => 'kolkata',
                   '0' => '1',
                   '2' => '19'
                 }
        };

If we have created %h (if (!defined($h{$.}))) - meaning we finished processing file1 - we just compare each fields of current line with coresspond value in %h, storing all the index that different in array @diff. map {$_+1} @diff restore the column number, because index of array start at 0, column number start at 1.
close ARGV if eof restore $. counter.

asked	15 days ago
viewed	44 times
active	14 days ago

current community

your communities

more stack exchange communities

Comparing files in unix columnwise

2 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged files diff comm or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Comparing files in unix columnwise

2 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged files diff comm or ask your own question.

Related

Hot Network Questions