I am testing an application I wrote and want to compare the solution my algorithm produces against a Monte Carlo solution. The process uses the hard disk a lot, and I was wondering whether there is a way to write data to files much less often, since the disk I/O is really slowing things down.
The solutions are computed on the nodes of a cluster and examined with this script (which runs on a node). Parameter $1 is an output file that the program wrote:
#!/bin/sh
file="$1"
script=/home/hefke/ov_paper/scripts
mv "$file.out" "$file.out.old"
grep "Overlapscore:" "$file.monte" > "$file.grepped"
awk '/./{print $2}' "$file.grepped" > "$file.overlap"
echo "$script/std_dev.sh $file.overlap > $file.out"   # echo, not print: 'print' is not a shell command
"$script/std_dev.sh" "$file.overlap" > "$file.out"
cat "$file.analy" >> "$file.out"
echo "DONE" >> "$file.out"   # echo, not cat: DONE is a literal string, not a file
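One way to cut the per-file disk traffic is to collapse the grep-to-temp-file and awk-to-temp-file steps into a single pipeline, so the `.grepped` and `.overlap` intermediates are never written. This is only a sketch: it assumes `std_dev.sh` takes a filename argument and can be handed `/dev/stdin` so it reads from the pipe instead of a real file, and it adds a `SCRIPTDIR` override (not in the original) purely so the path can be redirected.

```shell
#!/bin/sh
# Sketch: process one result file with a single pipeline and only one
# write (the final $file.out). Assumes std_dev.sh accepts a filename
# and works when given /dev/stdin; SCRIPTDIR is a hypothetical override
# that defaults to the original script path.
process_one() {
    file="$1"
    scriptdir="${SCRIPTDIR:-/home/hefke/ov_paper/scripts}"

    if [ -f "$file.out" ]; then
        mv "$file.out" "$file.out.old"
    fi

    # One awk pass replaces grep > .grepped plus awk > .overlap:
    # match the lines and print the second field in the same step.
    awk '/Overlapscore:/ {print $2}' "$file.monte" \
        | "$scriptdir/std_dev.sh" /dev/stdin > "$file.out"

    cat "$file.analy" >> "$file.out"
    echo "DONE" >> "$file.out"
}
```

Usage would be `process_one run1` for a `run1.monte`/`run1.analy` pair; if `std_dev.sh` refuses `/dev/stdin`, a process substitution like `<(awk ...)` under bash is an alternative.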
Here is the script that collects the data on the main node; the .analy and .monte files are my output files:
#!/bin/sh
echo "Processing output files for the mc_stdev_of_ov"
script=/home/hefke/ov_paper/scripts
curdir=$(pwd)
folder=filedata
for file in $(ls -1 "$curdir/temp_output/$folder"/*.analy | sed 's/\(.*\)\..*/\1/' | uniq)
do
    echo "$file"
    "$script/submitter.sh" "$curdir" "processonefile.sh $file.out"
done
echo "$file.out now contains what std_dev spat out."
cat "$curdir/temp_output/$folder"/*.out >> "$curdir/temp_output/tmp.out"
awk -f keys.awk "$curdir/temp_output/tmp.out" >> table.out
cat table.out
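The collector can also shed some disk work and fragility: plain shell globbing with parameter expansion (`${f%.analy}`) replaces the `ls | sed | uniq` subshell, and piping the concatenated `.out` files straight into awk avoids ever writing `tmp.out`. A minimal sketch, assuming `keys.awk` is the author's existing script and leaving the cluster-specific `submitter.sh` call as a comment since it cannot run outside the cluster:

```shell
#!/bin/sh
# Sketch: collect per-file results without ls|sed|uniq and without the
# tmp.out intermediate. The folder is passed in as $1; submitter.sh is
# the author's cluster submission script and is left commented out here.
collect() {
    folder="$1"
    for f in "$folder"/*.analy; do
        file="${f%.analy}"   # strip the extension: no sed, no subshell
        echo "$file"
        # "$script/submitter.sh" "$curdir" "processonefile.sh $file.out"
    done
    # Feed the concatenation directly to awk; tmp.out is never written.
    cat "$folder"/*.out | awk -f keys.awk >> table.out
}
```

A further speed-up along the same lines would be to have the per-node script print its result on stdout and let the collector capture it, so the per-file `.out` files disappear as well.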
How can I optimize this procedure for speed?