--Hi guys, --
I have about 4000 (1-50MB) files to sort.
I was thinking to have Python call the Linux sort command. And since I'm thinking this might be somewhat I/O bound, I would use the threading library.
So here's what I have but I when I run it and watch the system monitor I don't see 25 sort tasks pop up. It seems to be running one at a time? What am I doing wrong?
...
print "starting sort"
def sort_unique(file_path):
"""Run linux sort -ug on a file"""
out = commands.getoutput('sort -ug -o "%s" "%s"' % (file_path, file_path))
assert not out
pool = ThreadPool(25)
for fn in os.listdir(target_dir):
fp = os.path.join(target_dir,fn)
pool.add_task(sort_unique, fp)
pool.wait_completion()
Here's where ThreadPool comes from, perhaps that is broken?