We have a regular job that runs du summaries over a number of subdirectories, picks out the worst offenders, and uses the output to spot anything that is growing rapidly, so we can catch potential problems early. We diff against snapshots to compare runs.
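For concreteness, the job is essentially a sketch like the following (the throwaway tree built with mktemp stands in for our real top-level directory, and the file sizes are purely illustrative):

```shell
#!/bin/sh
# Build a small throwaway tree so the sketch runs anywhere;
# in practice TOP would be the real top-level directory.
TOP=$(mktemp -d)
mkdir -p "$TOP/proj-a" "$TOP/proj-b"
dd if=/dev/zero of="$TOP/proj-a/big"   bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$TOP/proj-b/small" bs=1024 count=8  2>/dev/null

snap=$(mktemp)
# Per-subdirectory totals, largest first ("worst offenders" on top).
du -s "$TOP"/*/ | sort -rn > "$snap"
cat "$snap"

# Later runs would compare against a saved snapshot, e.g.:
# diff "$old_snap" "$snap"
```

The real job keeps dated snapshot files and diffs consecutive ones to flag directories whose totals jump between runs.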
There is a top-level directory with a few hundred subdirectories, each of which may contain tens of thousands of files (or more).
A "du -s" in this context can be very IO-aggressive: it causes the server to evict its cache and then produces massive IO spikes, which is a very unwelcome side effect.
What strategy can we use to get the same data without the unwanted side effects?