shell script to read from multiple files in parallel

Question

I need to write a script that runs in parallel and searches for a string in multiple files.
I have tried many options, but they all slow my processor down.

Gilles · Answer 1 · 2012-09-06 00:26:41Z

If the files are on separate disks, run one grep command on each disk.

For files on the same disk, the bottleneck is reading from the disk. Reading from multiple files in parallel will only make the speed worse.

If the files are on a RAID-0 array, you might get a speed increase by running two grep commands at the same time. Benchmark to see if you really gain time. The low-tech way:

grep PATTERN file1 file2 file3 &
grep PATTERN file4 file5 file6
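A minimal, self-contained sketch of this low-tech approach follows. The sample files, the `needle` pattern and the variable names are illustrative assumptions, not part of the answer. Each grep writes to its own output file so that lines from the two processes cannot interleave, and `wait` blocks until the background grep has finished.

```shell
#!/bin/sh
# Hypothetical sample data so the sketch runs on its own.
dir=$(mktemp -d) || exit 1
for i in 1 2 3 4 5 6; do
    printf 'first line\nneedle in file%s\n' "$i" > "$dir/file$i"
done
pattern='needle'   # assumed search string

# First batch runs in the background; its output goes to its own file.
grep -- "$pattern" "$dir/file1" "$dir/file2" "$dir/file3" > "$dir/out1" &
bg_pid=$!

# Second batch runs in the foreground at the same time.
grep -- "$pattern" "$dir/file4" "$dir/file5" "$dir/file6" > "$dir/out2"

# Block until the background grep has finished before reading its output.
wait "$bg_pid"

result=$(cat "$dir/out1" "$dir/out2")
printf '%s\n' "$result"
rm -rf "$dir"
```

With real data, replace the sample files with your own and split them so that each batch does a similar amount of work.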

With GNU parallel:

parallel -j 2 grep PATTERN ::: file1 file2 file3 file4 file5 file6

If you're getting files from find:

find … -print0 | parallel -0 -j 2 grep PATTERN

Remember: if the files are on the same disk, a single grep command is the fastest.
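One way to run the benchmark suggested above is to time both variants on the same files. This is only a sketch: the sample files and the pattern are placeholders, `date +%s` has one-second resolution (so tiny inputs will report 0s for both), and whichever variant runs second may be flattered by files already sitting in the page cache.

```shell
#!/bin/sh
# Hypothetical sample data; replace with your real files and pattern.
dir=$(mktemp -d) || exit 1
for i in 1 2 3 4; do
    printf 'filler\nneedle %s\n' "$i" > "$dir/file$i"
done
pattern='needle'

# Variant 1: one grep over all files.
start=$(date +%s)
grep -- "$pattern" "$dir"/file[1-4] > "$dir/single"
single_secs=$(( $(date +%s) - start ))

# Variant 2: two greps at once, each writing to its own output file.
start=$(date +%s)
grep -- "$pattern" "$dir/file1" "$dir/file2" > "$dir/out1" &
grep -- "$pattern" "$dir/file3" "$dir/file4" > "$dir/out2" &
wait    # with no arguments, waits for all background jobs
parallel_secs=$(( $(date +%s) - start ))

# Sanity check: both variants should find the same number of lines.
single_count=$(wc -l < "$dir/single" | tr -d ' ')
parallel_count=$(cat "$dir/out1" "$dir/out2" | wc -l | tr -d ' ')

echo "single grep: ${single_secs}s, $single_count matching lines"
echo "two greps:   ${parallel_secs}s, $parallel_count matching lines"
rm -rf "$dir"
```

Run it several times on data large enough to take a few seconds, and compare the typical numbers instead of trusting a single run.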

Thanks. I am using parallel, but it says "command not found". I tried giving it the exact path, but no luck. Any suggestions? – helloworld0722, Sep 6 '12 at 0:31
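A "command not found" message usually means the program is not installed or is not on the PATH; on many systems GNU parallel is a separate package that has to be installed first. The sketch below checks for it before use. The helper name `have` and the fallback to plain grep are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a program is reachable through PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have parallel; then
    searcher='parallel'   # GNU parallel appears to be available
else
    searcher='grep'       # fall back to a single sequential grep
fi
echo "using: $searcher"
```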

cas · Answer 2 · 2012-09-06 01:37:00Z

I'm guessing that your files are quite large (otherwise you probably wouldn't care about parallelising the job).

The GNU parallel suggestions are good (and GNU's xargs also has a -P option for parallel execution) BUT given that grepping a file (or files) is an I/O-bound operation, not CPU-bound, you may find that running multiple greps in parallel actually slows things down because you now have multiple processes competing for disk access.

I/O speed is the limiting factor here, not CPU power. Even a single grep process is probably spending most of its time waiting for data from disk (i.e. CPU is mostly idle).

If the files are not physically close to each other on the disk, it could be MANY times slower, as the disk heads have to move around a lot more. (Of course, this would not be a problem on an SSD, on a ramdisk, or if the files are already cached.)
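For reference, the xargs -P option mentioned above could be used roughly as follows. This is a sketch under assumptions: the sample files and pattern are made up, and `-print0`, `-0` and `-P` are extensions found in GNU and BSD tools, not in every find or xargs. The caveat about I/O still applies, so measure before relying on it.

```shell
#!/bin/sh
# Hypothetical sample data so the sketch runs on its own.
dir=$(mktemp -d) || exit 1
for i in 1 2 3 4; do
    printf 'filler\nneedle %s\n' "$i" > "$dir/file$i.txt"
done
pattern='needle'

# -print0 / -0 : NUL-separated names, safe for spaces and newlines
# -n 2         : pass at most 2 files to each grep invocation
# -P 2         : run at most 2 grep processes at a time
# -H           : always print the file name, even for a one-file batch
# sort         : the order of parallel output is not deterministic
result=$(find "$dir" -type f -name '*.txt' -print0 |
    xargs -0 -n 2 -P 2 grep -H -- "$pattern" | sort)
printf '%s\n' "$result"
rm -rf "$dir"
```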

sandesh247 · Answer 3 · 2012-09-06 16:09:03Z


You might try GNU parallel:

find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}

(from http://www.gnu.org/software/parallel/man.html#example__parallel_grep)

Edit: Note that the other comments, which state that grep will run faster sequentially when I/O is the bottleneck, are correct.

edited Sep 6 '12 at 16:09

answered Sep 6 '12 at 0:01


Thanks for the reply. I didn't quite understand it after reading the link. Here is what I am trying to do: I have a command that needs to run in parallel and look for strings in multiple files located in one directory. Let's say the string I am looking for is "sandesh247" and I have 20 files. What would the resulting command be? I would appreciate your help, as I am fairly new to shell. Thanks in advance. – helloworld0722 Sep 6 '12 at 0:16

Suppose the directory containing your files is '/path/to/dir' (and you want to search all the files in it). The command is: find /path/to/dir -type f | parallel -k -j150% -n 1000 -m grep -H -n 'sandesh247' {} – sandesh247 Sep 6 '12 at 5:31

Thanks for your reply. Can parallel be used to run a command multiple times? I have a script in which I run a for loop to execute a command multiple times. Would you suggest using parallel? – helloworld0722 Sep 6 '12 at 15:18

No, a for loop and the parallel program serve different purposes. If for serves you well in your current context, I suggest you stay with it. – sandesh247 Sep 6 '12 at 16:11

@helloworld0722 look at -N0 – Ole Tange Sep 11 '12 at 23:11
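For running the same command several times concurrently without GNU parallel, a plain loop with background jobs is one option. The sketch below assumes a stand-in function named `task` in place of the real command; both the name and the run count are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the command you want to repeat.
task() {
    echo "run finished"
}

out=$(mktemp) || exit 1
for i in 1 2 3 4 5; do
    task >> "$out" &     # start each run in the background
done
wait                     # block until every background run has exited

runs=$(grep -c 'run finished' "$out")
echo "completed runs: $runs"
rm -f "$out"
```

This starts every run at once, which is fine for a handful of runs. For many runs you would want a limit on how many execute at the same time, and enforcing such a limit is what tools like GNU parallel and xargs -P are for.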


