Take the 2-minute tour ×
Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.. It's 100% free, no registration required.

I want to compare two files with same number of rows and columns with records in same order. Just want to highlight the differences in the column values if any.

file A:

1,kolkata,19,ab

2,delhi,89,cd

3,bangalore,56,ef

file B:

1,kolkata,21,ab

2,mumbai,89,gh

3,bangalore,11,kl

Considering column 1 as primary key, we have differences in other columns. I want to highlight those differences.

Output format may be (not sure):

record_number,  columns_with_diff
1               3

2               2,4

3               3,4

Can diff or comm solve my problem? If yes then what would be the exact command?

share|improve this question
    
Are the blank lines part of your file? –  terdon Jun 24 at 13:10
add comment

2 Answers

This would be it. Just some style issue with the extra comma at the end of each line.

awk '
     BEGIN{ FS=","; ORS="" }

     { 
       # read line from secondary file
       getline aux < "file2"
       split(aux,f2,",")

       # print current line number
       print NR" "

       # process each field in current line
       for(i=1; i<=NF; i++) {
         if ($i!=f2[i]) {
           print i","
         }
       }
       print "\n"
     }
' file1

The output:

1 3,
2 2,4,
3 3,4,
share|improve this answer
add comment

You can do it easier with perl:

$ perl -F',' -anle '
    BEGIN{
        print "record_number,  columns_with_diff";
        $" = ",";
    }
    if (!defined($h{$.})) {
        @{$h{$.}}{0..$#F} = @F[0..$#F];
    } else {
        @diff =  grep { $h{$.}{$_} ne $F[$_] } 0..$#F;
        print "$.\t\t@{[map {$_+1} @diff]}";
    } 
    close ARGV if eof;
' file1 file2
record_number,  columns_with_diff
1       3
2       2,4
3       3,4

You should remove blank lines in your input for this works.

Explanation

  • In BEGIN block we just print the header of output, then set list separator to ,

  • @{$h{$.}}{0..$#F} = @F[0..$#F]: we create a hash of hashes with the keys of first hash are the line number, each sub hashes has the keys are index of field minus 1, and the values are values coresspond with that fields.

Here we use a hash slice to quickly assign values to hash of hashes.

if you use Data::Dumper to print the hash of hashes %h, you can see something like this:

VAR1 = {
          '2' => {
                   '2' => '89',
                   '0' => '2',
                   '1' => 'delhi',
                   '3' => 'cd'
                 },
          '3' => {
                   '1' => 'bangalore',
                   '3' => 'ef',
                   '0' => '3',
                   '2' => '56'
                 },
          '1' => {
                   '3' => 'ab',
                   '1' => 'kolkata',
                   '0' => '1',
                   '2' => '19'
                 }
        };
  • If we have created %h (if (!defined($h{$.}))) - meaning we finished processing file1 - we just compare each fields of current line with coresspond value in %h, storing all the index that different in array @diff. map {$_+1} @diff restore the column number, because index of array start at 0, column number start at 1.

  • close ARGV if eof restore $. counter.

share|improve this answer
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.