calculate sum of squares using shell script in perl/awk

Question

I have 2 files as below.

file1

0.34
0.27
0.32

file2

0.15
0.21
0.15

Now, I would like to calculate the sum of squares between each column. For example,

[(0.34 - 0.15)^2 + (0.27 - 0.21)^2 + (0.32 - 0.15)^2 ] / 3

Where 3 is the total number of lines in the file. I will be having same number of lines in both the files.

I have come up with the below bash script which works perfectly fine, but I want to know if there is some other easier way.

#! /bin/bash   
sum=0.0
while true; do
  read -r lineA <&3
  read -r lineB <&4
  if [ -z "$lineA" -o -z "$lineB" ]; then
    break
  fi
diff=$(bc <<< "scale=5; $lineA - $lineB")
square=$(bc <<< "scale=5; $diff*$diff")
sum=$(bc <<< "scale=5; $sum+$square")
done 3<file1 4<file2
filelen=`wc -l file1 | cut -f1 -d' '`
final=$(bc <<< "scale=5; $sum/$filelen")
echo "$final"

Is there a simpler way in awk or perl?

EDIT

I had 2 million rows in my input file and the input file actually contained scientific numbers like below.

3.59564e-185

My script as well as the suggested answers failed on scientific numbers. However, I could make my script in the question work when I changed the scientific numbers to 10^ notation.

I converted my input file as below.

sed -e 's/[eE]+*/\*10\^/' file1 > file1_converted
sed -e 's/[eE]+*/\*10\^/' file2 > file2_converted

Now, the suggested 2 answers failed giving me the error message as Nan. My script seemed to work but for 2 million rows it is taking a long time to execute.

Is there any efficient way to make it work?

1_CR · Answer 1 · 2014-06-30 22:54:23Z

up vote 7 down vote

One way to do it using paste since your files have the same number of lines.

paste file1 file2 | awk '{s += ($1-$2)^2}; END{print (s+0)/NR}'
0.0228667

answered yesterday

1_CR
4,9782729

it worked for general cases. But when I had 2 million rows with scientific numbers unfortunately it did not work. Even after converting to 10^ notation, this command did not work. I have modified my question with the errors am facing. – Ramesh 21 hours ago

add comment

Hauke Laging · Answer 2 · 2014-06-30 23:12:06Z

up vote 4 down vote

awk 'FNR==NR { file1[NR]=$1; next; }; { diff=$1-file1[FNR]; sum+=diff^2;}; 
  END { print sum/FNR; }' file1 file2

answered yesterday

Hauke Laging
17.5k22159

Please see my edit. Your command worked for general cases. But not for my case. – Ramesh 21 hours ago

@Ramesh My awk does recognize such numbers: echo 0.05 3.59564e-185 | awk '{print $1 * $2}'; echo 0.05 3.59564e-85 | awk '{print $1 * $2^2; }' – Hauke Laging 20 hours ago

add comment

Gnouc · Answer 3 · 2014-07-01 19:09:14Z

up vote 1 down vote

With your big float data, you can use perl with its bignum:

$ paste file1 file2 | perl -Mbignum -anle '
    $sum += ($F[0] - $F[1])**2;
    END {     
        print $sum/$.;
    }                
'
0.02286666666666666666666666666666666666667

answered 21 hours ago

Gnouc
13.2k21542

thanks. Should I convert it to 10^ notation or I can use this perl command on scientific numbers as well? – Ramesh 21 hours ago

Don't need to convert to 10^. Make sure you have enough memory and with 2 milion rows, it will take long time to run. – Gnouc 21 hours ago

thanks again. I will check it out and let you know. – Ramesh 21 hours ago

add comment

asked	yesterday
viewed	163 times
active	today

current community

your communities

more stack exchange communities

calculate sum of squares using shell script in perl/awk

3 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged awk perl or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

calculate sum of squares using shell script in perl/awk

3 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged awk perl or ask your own question.

Related

Hot Network Questions