I have been trying to parallelize the following script, specifically each of the three FOR loop instances, using GNU Parallel, but haven't been able to. The 4 commands within the FOR loop run in series, each iteration of the loop taking around 10 minutes.

#!/bin/bash

kar='KAR5'
runList='run2 run3 run4'
mkdir normFunc
for run in $runList
do 
  fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
  fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
  fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
  fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear

  rm -f *.mat
done

Why don't you just fork (a.k.a. background) them?

foo () {
    local run=$1
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

for run in $runList; do foo "$run" & done

In case that's not clear, the significant part is here:

for run in $runList; do foo "$run" & done
                                   ^

The trailing ampersand causes the function to be executed in a forked shell in the background. That's parallel.
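
One caveat, echoed in a comment below: the script reaches its end (and may exit) while the forked jobs are still running. A bare wait at the end blocks until every background child of the shell has finished; a minimal sketch:

for run in $runList; do foo "$run" & done
wait # do not continue until all backgrounded foo calls have exited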

That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!). – Ravnoor S Gill Dec 5 '13 at 21:24
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler? – Ravnoor S Gill Dec 5 '13 at 21:27
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number of worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case they wouldn't be threads, they'd be independent processes), but with relatively few long tasks it would be pointless. The OS scheduler will take care of them. – goldilocks Dec 5 '13 at 21:50
You also might want to add a wait command at the end so the master script does not exit until all of the background jobs do. – psusi Nov 19 '15 at 0:22
I would also find it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, I swamp the server and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done with nice, but then I don't know if it'd ever finish... – naught101 Nov 26 '15 at 23:00
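
An editorial aside on the concurrency-limit question raised in these comments: one common technique, not used in the answers here, is xargs with -P, which caps the number of processes running at once. A sketch, assuming the foo function defined above:

export -f foo # each worker is a separate bash process, so the function must be exported
printf '%s\n' $runList | xargs -P 4 -I {} bash -c 'foo "$@"' _ {}

Here -P 4 keeps at most 4 runs in flight, starting a new one as each finishes.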
for stuff in things
do
( something
  with
  stuff ) &
done
wait # for all the something with stuff

Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat looks a bit prone to conflicts if it runs in parallel...
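
On the rm conflict, a sketch of a safer cleanup (the comment below reaches the same fix) that removes only the current run's matrices inside the question's loop:

rm -f "$run".*.mat # e.g. run2.norm1.mat, run2.norm2.mat, run2.norm.mat when run=run2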

This runs perfectly as well. You are right I would have to change rm *.mat to something like rm $run".mat" to get it to work without one process interfering with the other. Thank you. – Ravnoor S Gill Dec 5 '13 at 21:38
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it. – Gilles Dec 5 '13 at 23:54
+1 for wait, which I forgot. – goldilocks Dec 6 '13 at 12:13
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right? – David Doria Mar 20 '15 at 15:17
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough. – frostschutz Mar 20 '15 at 16:41
for stuff in things
do
    sem -j +0 "something with $stuff"
done
sem --wait

This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will run N+0 jobs in parallel, where N is the number of available cores).

sem --wait tells sem to wait until all the iterations in the for loop have finished before the lines of code that follow are executed.

Note: you will need "parallel" (sudo apt-get install parallel).
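
Applied to the question's loop, that might look like the following sketch, assuming the four fsl5.0 steps are wrapped in a function such as the foo from the first answer (and a bash shell; see the SHELL caveat in the comments on the last answer):

export -f foo # sem starts the command in a child shell
for run in run2 run3 run4
do
    sem -j +0 foo "$run"
done
sem --wait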

Is it possible to go past 60? Mine throws an error saying not enough file descriptors. – chovy Nov 27 '15 at 7:47

Sample task

task(){
   sleep 0.5; echo "$1";
}

Sequential runs

for thing in a b c d e f g; do 
   task "$thing"
done

Parallel runs

for thing in a b c d e f g; do 
  task "$thing" &
done

Parallel runs in N-process batches

N=4
(
for thing in a b c d e f g; do 
   ((i=i%N)); ((i++==0)) && wait    # once N jobs have been started, wait for that batch to finish
   task "$thing" & 
done
)

It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes run at the same time. But it requires more code.

N processes with a FIFO-based semaphore:

# open_sem N: back a semaphore with an anonymous FIFO on FD 3,
# pre-loaded with N three-byte tokens (one per job slot)
open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$   # hold the FIFO open read/write on FD 3...
    rm pipe-$$        # ...so its name can be removed right away
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}
# run_with_lock cmd args...: block until a token is free, run cmd in
# the background, and return a token carrying cmd's exit status
run_with_lock(){
    local x
    # 10# forces base 10, so statuses such as 008 are not misread as octal
    read -u 3 -n 3 x && ((0==10#$x)) || exit $((10#$x))
    (
        "$@"
        printf '%.3d' $? >&3
    )&
}

N=4
open_sem $N
for thing in {a..g}; do
    run_with_lock task $thing
done 
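
One hedged addition: the loop above returns as soon as the last job has been handed off, so the script can exit while up to N jobs are still running. A bare wait after the loop blocks until the remaining background children finish:

wait # all jobs were backgrounded by run_with_lock in this shell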
The line with wait in it basically lets all processes run, until it hits the nth process, then waits for all of the others to finish running, is that right? – naught101 Nov 26 '15 at 23:03
If i is zero, call wait. Increment i after the zero test. – PSkocik Nov 26 '15 at 23:08
Love the n parallel runs! Thank you. – joshperry Sep 15 at 16:31

It seems the fsl jobs depend on each other, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.

Make a bash function that handles a single run, then run that function in parallel:

#!/bin/bash

myfunc() {
    run=$1
    kar='KAR5'
    mkdir -p normFunc
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

export -f myfunc
parallel myfunc ::: run2 run3 run4

To learn more, watch the intro videos (https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1) and spend an hour walking through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). Your command line will love you for it.
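
On the earlier comment about running 8 files on 4 cores: parallel defaults to one job per CPU core, and the cap can be set explicitly with -j. A sketch with hypothetical run names for the 8-file case:

parallel -j 4 myfunc ::: run2 run3 run4 run5 run6 run7 run8 run9 # at most 4 jobs at a time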

If you're using a non-bash shell, you'll also need to export SHELL=/bin/bash before running parallel. Otherwise you'll get an error like: Unknown command 'myfunc arg' – AndrewHarvey Jul 31 '15 at 3:39
@AndrewHarvey: isn't that what the shebang is for? – naught101 Nov 26 '15 at 23:02
