Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.
Let's say I have a command accepting a single argument which is a file path:

mycommand myfile.txt

Now I want to execute this command over multiple files in parallel; more specifically, over the files matching the pattern myfile*.

Is there an easy way to achieve this?

You could use find for that, something like find . -name "*xls*" -exec ls -l {} \;. In your case: find . -name "myfile*" -exec mycommand {} \;. Although find searches for files recursively, you can tell it not to descend into subdirectories. – mnille Feb 25 at 11:32
That would run the commands sequentially, not in parallel. – the_velour_fog Feb 25 at 11:34
Oh, OK, sorry, you're right; I missed that point. – mnille Feb 25 at 12:26
Accepted answer (score 8)

With GNU xargs and a shell with support for process substitution:

xargs -r -0 -P4 -n1 -a <(printf '%s\0' myfile*) mycommand

That would run up to 4 mycommand instances in parallel.

If mycommand doesn't use its stdin, you can also do:

printf '%s\0' myfile* | xargs -r -0 -P4 -n1 mycommand

That would also work with the xargs of modern BSDs.

For a recursive search for myfile* files, replace the printf command with:

find . -name 'myfile*' -type f -print0

(-type f is for regular files only. For a glob equivalent, you need zsh and its printf '%s\0' myfile*(.)).
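Putting the two pieces together, the recursive variant of the pipeline looks like this. This is a sketch run in a throwaway directory, with echo standing in for mycommand:

```shell
# Set up a small demo tree, then feed find's NUL-separated output
# to xargs, running up to 4 stand-in commands in parallel.
cd "$(mktemp -d)" || exit 1
mkdir -p sub
touch myfile1.txt sub/myfile2.txt
find . -name 'myfile*' -type f -print0 |
  xargs -r -0 -P4 -n1 echo processing
```

Because the jobs run concurrently, the two "processing …" lines may appear in either order.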

Is the benefit of this - over a loop - that the xargs -0 and printf "%s\0" combination handles whitespace in the filenames better? – the_velour_fog Feb 25 at 11:39
@the_velour_fog, no: the loop will handle whitespace OK as long as you don't forget to quote the variables. The benefit is that you can limit the number of concurrent processes more easily, and that you can handle find's output more easily. – Stéphane Chazelas Feb 25 at 11:46
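To make the whitespace point concrete, here is a small demonstration, run in a throwaway directory with echo standing in for mycommand. NUL-delimited input lets filenames containing spaces (or even newlines) pass through xargs intact:

```shell
# Filenames with embedded spaces survive the NUL-delimited handoff.
cd "$(mktemp -d)" || exit 1
touch 'my file 1.txt' 'my file 2.txt'
printf '%s\0' my* | xargs -0 -n1 echo got:
# got: my file 1.txt
# got: my file 2.txt
```

With newline- or whitespace-delimited input, each of those names would instead be split into several arguments.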

Using a loop:

for f in myfile*; do
  mycommand "$f" &
done

wait

or using GNU parallel.
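The loop above starts every matching job at once. A sketch that caps the number of concurrent jobs, assuming bash 4.3+ for wait -n, with echo standing in for mycommand:

```shell
#!/usr/bin/env bash
# Loop variant that never runs more than 4 jobs at once.
# Assumes bash 4.3+ for `wait -n`; `echo` stands in for mycommand.
cd "$(mktemp -d)" || exit 1
touch myfile{1..8}.txt
max_jobs=4
for f in myfile*; do
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n              # block until any one background job exits
  done
  echo "$f" &            # stand-in for: mycommand "$f" &
done
wait                     # wait for the remaining jobs
```

This gets you the bounded concurrency of xargs -P in plain bash, at the cost of a few extra lines.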


Using GNU Parallel it looks like this:

parallel mycommand ::: myfile*

It will run one job per core.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: simple scheduling]

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

