I need to write a script that runs in parallel and searches for a string in multiple files.
I have tried a lot of options, but they all slow my processor down.
If the files are on separate disks, run one search process per disk. For files on the same disk, the bottleneck is reading from the disk, and reading from multiple files in parallel will only make things slower. If the files are on a RAID-0 array, you might get a speed increase by running two search processes in parallel.
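The one-process-per-disk idea can be sketched in plain shell. This is only an illustration: the function name `search_per_disk` is hypothetical, and it assumes each directory passed in sits on a different physical disk and that your `grep` supports `-r` (GNU and BSD grep do).

```shell
#!/bin/sh
# search_per_disk PATTERN DIR...
# Hypothetical helper. Assumption: each DIR is on a different physical disk.
search_per_disk() {
    pattern=$1
    shift
    for dir in "$@"; do
        # Start one recursive grep per disk in the background.
        grep -r -- "$pattern" "$dir" &
    done
    # Block until every background grep has finished.
    wait
}
```

Called as `search_per_disk needle /mnt/disk1 /mnt/disk2` (placeholder paths), each disk is read by exactly one process, so no disk has to serve two readers at once. Matches from the different disks can arrive in any order.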
With GNU parallel:
If you're getting files from
Remember: if the files are on the same disk, a single
I'm guessing that your files are quite large (otherwise you probably wouldn't care about parallelising the job). I/O speed is the limiting factor here, not CPU power. Even a single grep process probably spends most of its time waiting for data from disk (i.e. the CPU is mostly idle). If the files are not physically close to each other on the disk, it could be MANY times slower, as the disk heads have to move around a lot more (of course, this would not be a problem on an SSD or a ramdisk, or if the files are already cached).
You might try GNU parallel:
(from http://www.gnu.org/software/parallel/man.html#example__parallel_grep)
Edit: Note that the other comments stating that grep will run faster sequentially when the bottleneck is I/O are correct.
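If you want to compare sequential and parallel runs on your own files, here is a rough sketch. It uses `find` with `xargs -P` rather than GNU parallel, so it is not the command from the GNU parallel manual. The function name `grep_jobs` and the batch size of 100 are arbitrary choices for illustration, and `-print0`, `-0` and `-P` are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/sh
# grep_jobs JOBS PATTERN DIR
# Hypothetical helper. JOBS=1 runs the grep batches one after another;
# JOBS>1 lets up to that many grep processes run at the same time.
grep_jobs() {
    njobs=$1
    pattern=$2
    dir=$3
    # NUL-separated names keep paths with spaces or newlines intact.
    find "$dir" -type f -print0 |
        xargs -0 -P "$njobs" -n 100 grep -H -- "$pattern"
}
```

Timing `grep_jobs 1 needle /some/dir` against `grep_jobs 4 needle /some/dir` (placeholder arguments) should show whether extra processes help on your storage. Repeat runs may be served from the page cache, so the first run is the one that reflects disk speed.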