Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.
Let's say I have a command accepting a single argument which is a file path:

mycommand myfile.txt

Now I want to execute this command over multiple files in parallel; more specifically, over the files matching the pattern myfile*.

Is there an easy way to achieve this?

You could use find for that, something like find . -name "*xls*" -exec ls -l {} \;. In your case: find . -name "myfile*" -exec mycommand {} \;. Although find searches for files recursively, you can tell it not to descend into subdirectories. – mnille Feb 25 at 11:32
That would run the commands sequentially, not in parallel. – the_velour_fog Feb 25 at 11:34
Oh, OK, sorry, you're right; I missed that point. – mnille Feb 25 at 12:26
Accepted answer (score 8)

With GNU xargs and a shell with support for process substitution:

xargs -r -0 -P4 -n1 -a <(printf '%s\0' myfile*) mycommand

That would run up to 4 mycommand instances in parallel.

If mycommand doesn't use its stdin, you can also do:

printf '%s\0' myfile* | xargs -r -0 -P4 -n1 mycommand

That would also work with the xargs of modern BSDs.

For a recursive search for myfile* files, replace the printf command with:

find . -name 'myfile*' -type f -print0

(-type f is for regular files only. For a glob equivalent, you need zsh and its printf '%s\0' myfile*(.)).
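Putting the two pieces together, the recursive variant of the pipeline looks like this. This is a sketch run in a throwaway directory, with echo standing in for mycommand:

```shell
# Set up a small demo tree, then feed find's NUL-separated output
# to xargs, running up to 4 stand-in commands in parallel.
cd "$(mktemp -d)" || exit 1
mkdir -p sub
touch myfile1.txt sub/myfile2.txt
find . -name 'myfile*' -type f -print0 |
  xargs -r -0 -P4 -n1 echo processing
```

Because the jobs run concurrently, the two "processing …" lines may appear in either order.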

Is the benefit of this - over a loop - that the xargs -0 and printf "%s\0" combination handles whitespace in the filenames better? – the_velour_fog Feb 25 at 11:39
@the_velour_fog, no: the loop will handle whitespace OK as long as you don't forget to quote the variables. The benefit is that you can limit the number of concurrent processes more easily, and that you can handle find's output more easily. – Stéphane Chazelas Feb 25 at 11:46
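To make the whitespace point concrete, here is a small demonstration, run in a throwaway directory with echo standing in for mycommand. NUL-delimited input lets filenames containing spaces (or even newlines) pass through xargs intact:

```shell
# Filenames with embedded spaces survive the NUL-delimited handoff.
cd "$(mktemp -d)" || exit 1
touch 'my file 1.txt' 'my file 2.txt'
printf '%s\0' my* | xargs -0 -n1 echo got:
# got: my file 1.txt
# got: my file 2.txt
```

With newline- or whitespace-delimited input, each of those names would instead be split into several arguments.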

Using a loop:

for f in myfile*; do
  mycommand "$f" &
done

wait

or using GNU parallel.
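The loop above starts every matching job at once. A sketch that caps the number of concurrent jobs, assuming bash 4.3+ for wait -n, with echo standing in for mycommand:

```shell
#!/usr/bin/env bash
# Loop variant that never runs more than 4 jobs at once.
# Assumes bash 4.3+ for `wait -n`; `echo` stands in for mycommand.
cd "$(mktemp -d)" || exit 1
touch myfile{1..8}.txt
max_jobs=4
for f in myfile*; do
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n              # block until any one background job exits
  done
  echo "$f" &            # stand-in for: mycommand "$f" &
done
wait                     # wait for the remaining jobs
```

This gets you the bounded concurrency of xargs -P in plain bash, at the cost of a few extra lines.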


Using GNU Parallel it looks like this:

parallel mycommand ::: myfile*

It will run one job per core.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: simple scheduling]

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

