I have two ways of calling printf on my system:
$ type -a printf
printf is a shell builtin
printf is /usr/bin/printf
$ file /usr/bin/printf
/usr/bin/printf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=d663d220e5c2a2fc57462668d84d2f72d0563c33, stripped
So, one is a bash builtin command and the other is a proper compiled executable. I would have expected a program whose only job is printf to be much faster than the shell builtin. Granted, the builtin is already loaded into memory, but the actual execution should be faster in a dedicated program, right? It would be optimized to do one thing very well, in the best tradition of the Unix philosophy.
Apparently not:
$ >/tmp/foo; time for i in `seq 1 3000`; do printf '%s ' "$i" >> /tmp/foo; done;
real 0m0.065s
user 0m0.036s
sys 0m0.024s
$ >/tmp/foo; time for i in `seq 1 3000`; do /usr/bin/printf '%s ' "$i" >> /tmp/foo; done;
real 0m18.097s
user 0m1.048s
sys 0m7.124s
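To make the comparison easier to reproduce, here is a minimal sketch that runs the same loop against both implementations. It assumes bash and an external printf somewhere in PATH: `type -P printf` resolves the external binary (on my system that is /usr/bin/printf, as shown above), while `builtin printf` forces the builtin regardless of PATH. The loop count `n` and the output file `/tmp/foo` are just the values used above.

```shell
#!/usr/bin/env bash
# Sketch: time the same printf loop with the bash builtin and with the
# external binary. Assumes bash and an external printf somewhere in PATH.
n=3000
out=/tmp/foo
ext_printf=$(type -P printf)   # path of the external printf, e.g. /usr/bin/printf

TIMEFORMAT='builtin:  %R s real'
: > "$out"   # truncate the output file
time for ((i = 1; i <= n; i++)); do builtin printf '%s ' "$i" >> "$out"; done

TIMEFORMAT='external: %R s real'
: > "$out"
time for ((i = 1; i <= n; i++)); do "$ext_printf" '%s ' "$i" >> "$out"; done
```

Using `for ((...))` instead of `seq` keeps the loop itself from depending on an extra external command.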
A lot of this, as @Guru points out, is because of the cost of creating a new process, which is only incurred by /usr/bin/printf. If that were all, I would expect the executable to be faster than the builtin when run outside a loop. Unfortunately, /usr/bin/printf can only take an argument of limited size, so I could only test this with a relatively short string:
$ i=$(seq 1 28000 | awk '{k=k$1}END{print k}'); time /usr/bin/printf '%s ' "$i" > /dev/null;
real 0m0.035s
user 0m0.004s
sys 0m0.028s
$ i=$(seq 1 28000 | awk '{k=k$1}END{print k}'); time printf '%s ' "$i" > /dev/null;
real 0m0.008s
user 0m0.008s
sys 0m0.000s
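The size limit is not imposed by printf itself: when bash runs an external command, the arguments are handed over through execve(2), and Linux caps each single argument string at 32 pages (128 KiB with 4 KiB pages). The 28000-number string above is 128,894 bytes, just under that cap. The builtin never calls execve, so it has no such limit. A minimal sketch, assuming bash on Linux:

```shell
#!/usr/bin/env bash
# Sketch (assumes bash on Linux): a single 200,000-byte argument exceeds the
# kernel's per-argument limit for execve(2), but the builtin never execs.
printf -v big '%200000s' ''   # 200,000 spaces, built inside bash without forking
ext_printf=$(type -P printf)

# Builtin: no execve, so the whole string is printed (wc reports 200000).
builtin printf '%s' "$big" | wc -c

# External: execve fails with E2BIG, which bash reports as
# "Argument list too long", so the command never starts.
if ! "$ext_printf" '%s' "$big" > /dev/null; then
    echo "external printf could not be started"
fi
```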
The builtin is still consistently and significantly faster. To make it even clearer, let's make both start new processes:
$ time for i in `seq 1 1000`; do /usr/bin/printf '%s ' "$i" >/dev/null; done;
real 0m33.695s
user 0m0.636s
sys 0m30.628s
$ time for i in `seq 1 1000`; do bash -c "printf '%s ' $i" >/dev/null; done;
real 0m3.557s
user 0m0.380s
sys 0m0.508s
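One way to check how much of this is the cost of starting a process, rather than printf's own work, is to time a command that does nothing at all. Both `true` variants below do no work, so the difference between the two loops is roughly the per-call cost of starting a new process (fork, execve, loading the binary). A minimal sketch, assuming bash and an external `true` in PATH (usually from coreutils):

```shell
#!/usr/bin/env bash
# Sketch: measure per-call process-creation overhead with a no-op command.
# Assumes bash and an external "true" somewhere in PATH.
n=1000
ext_true=$(type -P true)   # e.g. /usr/bin/true

TIMEFORMAT='builtin true:  %R s real'
time for ((i = 0; i < n; i++)); do true; done

TIMEFORMAT='external true: %R s real'
time for ((i = 0; i < n; i++)); do "$ext_true"; done
```

If the external loop alone takes a time comparable to the /usr/bin/printf loop, then most of that time is going into process creation on this system rather than into printf.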
The only reason I can think of is that the variable being printed is internal to bash and can be passed directly to the builtin. Is that enough to explain the difference in speed? What other factors are at play?