There's no error with head -n 2; you can check that by removing the | and the subsequent code.
The problem is that the code between the braces is only executed once - it's not a loop. And read only reads data from a single line of input. So you need some sort of loop to print data for multiple files.
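Here's a small sketch of that difference. It uses three made-up "size path" lines in place of real find output (the sample function and its data are placeholders of mine, not part of your script). The brace group calls read once, so it consumes only the first line; the while loop keeps calling read until the input runs out.

```shell
# Hypothetical sample data standing in for find ... -printf '%s %p\n'
sample() { printf '%s\n' '300 ./a.txt' '200 ./b.txt' '100 ./c.txt'; }

# Brace group: read runs once, so only the first line is printed.
once=$(sample | { read -r size path; printf 'size: %d\n\t%s\n' "$size" "$path"; })

# while loop: read is called repeatedly until it hits end of input.
looped=$(sample | while read -r size path; do
    printf 'size: %d\n\t%s\n' "$size" "$path"
done)

printf 'brace group:\n%s\n\nwhile loop:\n%s\n' "$once" "$looped"
```

The brace group reports a single file, while the loop reports all three.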
You could use a while loop, or you could take advantage of the built-in loop of awk to read & print the data. Eg, the awk command below only prints the size info if the size of the current file is different to the size of the previous file.
awk 'BEGIN{size=-1}; {if($1!=size){size=$1; printf "size: %d\n", size}; printf "\t%s\n", $2}'
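We don't strictly need to initialize size explicitly, since an unset awk variable acts as both the empty string and zero. But it's nice to be explicit about these things, IMHO, and starting at -1 guarantees that the first size header is printed even if the first file is empty (size 0).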
That awk command replaces the
{
read -r file dir
printf "size: %d\n\t%s\n" "$file" "$dir"
}
section of your code. In other words, you can use
find "$dir" -type f -printf '%s %p\n' |
sort -n -r | head -n 2 |
awk 'BEGIN{size=-1};
{if($1!=size){size=$1; printf "size: %d\n", size};
printf "\t%s\n", $2}'
You can put it all on one line, or split it over multiple lines. It's also possible to put the awk program into its own file, but there's no need to do that for such a tiny program.
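To see what the awk part does without touching your file system, you can feed it a few canned lines. The sizes and paths below are placeholders I made up; they stand in for the already-sorted output of find, sort and head.

```shell
# Hypothetical "size path" lines, already sorted largest first,
# standing in for the output of find | sort -n -r | head.
grouped=$(printf '%s\n' '4096 ./a.bin' '4096 ./b.bin' '512 ./c.txt' |
awk 'BEGIN{size=-1};
{if($1!=size){size=$1; printf "size: %d\n", size};
printf "\t%s\n", $2}')

printf '%s\n' "$grouped"
```

The two 4096-byte files share a single size header, and the 512-byte file gets a header of its own.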
Note that you can make the -n option to head as large as you like, and the awk program will behave as expected. Also note that awk is very fast - it's much more efficient than a shell loop using read and printf.
FWIW, awk code for simple text processing is often significantly faster than equivalent Python code, so even though many consider awk to be antiquated, it's still quite popular.
To print the data for only the largest file(s) in a directory, you can do this:
find . -type f -printf '%s %p\n' |
sort -nr |
awk 'NR==1{size=$1;printf "size: %d\n", size};
$1!=size{exit};
{printf "\t%s\n", $2}'
The NR==1 says to execute the following block (the stuff inside the {}) only when the Number of the Record equals 1 - a record is just a line. So we get the size of the first file, which is the largest file (thanks to the preceding sort command), save it in the size variable, and then print the size.
$1!=size{exit} says to exit the program as soon as we read a line where the data in the 1st field doesn't match what we've saved in the size variable.
The last block {printf "\t%s\n", $2} prints the pathname for each file.
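One caveat with $2: awk splits fields on whitespace, so a pathname containing spaces gets cut off at the first space. A possible workaround, which is my own variation rather than part of the code above, is to copy the whole record, strip the leading size field, and print what's left. The input lines here are made-up placeholders, and pathnames containing newlines would still break this.

```shell
# Hypothetical input: sizes and paths are placeholders, sorted largest first.
largest=$(printf '%s\n' '900 ./my notes.txt' '900 ./b.txt' '100 ./c.txt' |
awk 'NR==1{size=$1; printf "size: %d\n", size}
$1!=size{exit}
{path=$0; sub(/^[0-9]+ /, "", path); printf "\t%s\n", path}')

printf '%s\n' "$largest"
```

Both 900-byte files are listed, with the full "my notes.txt" name intact, and the 100-byte file is skipped because awk exits as soon as the size changes.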
There are various ways to print both the largest and smallest files found by the find command. One way would be to read all the data into awk, storing it in an array, sort the array, and then print the data for the files of maximum & minimum size. But I'm going to adopt a simpler strategy here, and recycle my existing code. To do this more efficiently, I'll put the awk program into a file. Save this file to a directory in your command PATH and give it execute permission.
field1match.awk
#!/usr/bin/awk -f
# print only the records whose 1st field matches that of the 1st record
# Written by PM 2Ring 2015.05.21
NR==1{size=$1; printf "size: %d\n", size}
$1!=size{exit}
{printf "\t%s\n", $2}
And here's the command line which uses tee to duplicate the output from find and then sort it and print it using process substitution:
find "$dir" -type f -printf '%s %p\n' |
tee > >(sort -n | field1match.awk) >(sort -rn | field1match.awk)
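For comparison, the all-in-awk strategy mentioned earlier could be sketched roughly like this. Rather than sorting an array, this version simply tracks the minimum and maximum sizes in one pass and collects the paths seen for each size; that's a simplification of mine, not the tee approach above. The sample lines are placeholders for the unsorted find output, and the same $2 caveat about spaces in pathnames applies.

```shell
# Hypothetical unsorted "size path" lines standing in for find output.
minmax=$(printf '%s\n' '512 ./c.txt' '4096 ./a.bin' '16 ./d.cfg' '4096 ./b.bin' |
awk '{
    size = $1 + 0                            # force a numeric value
    files[size] = files[size] "\t" $2 "\n"   # collect paths per size
    if (NR == 1 || size < min) min = size
    if (NR == 1 || size > max) max = size
}
END {
    if (NR == 0) exit                        # no input, nothing to print
    printf "size: %d\n%s", min, files[min]
    if (max != min) printf "size: %d\n%s", max, files[max]
}')

printf '%s\n' "$minmax"
```

This avoids running sort twice, at the cost of holding every path in memory until the end, so for a very large tree the tee version may be the better trade-off.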