Why is bash's printf faster than /usr/bin/printf?

Question

I have two ways of calling printf on my system:

$ type -a printf
printf is a shell builtin
printf is /usr/bin/printf
$ file /usr/bin/printf
/usr/bin/printf: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.32,
BuildID[sha1]=d663d220e5c2a2fc57462668d84d2f72d0563c33, stripped

So, one is a bash builtin command and the other is a proper compiled executable. I would have expected a program whose only job is printf to be much faster than the shell builtin. Granted, the builtin is already loaded into memory, but the actual execution time should be faster in a dedicated program, right? It would be optimized to do one thing very well, in the best of Unix philosophy.

Apparently not:

$ >/tmp/foo; time for i in `seq 1 3000`; do printf '%s ' "$i" >> /tmp/foo; done;
real    0m0.065s
user    0m0.036s
sys     0m0.024s

$ >/tmp/foo; time for i in `seq 1 3000`; do /usr/bin/printf '%s ' "$i" >> /tmp/foo; done;   
real    0m18.097s
user    0m1.048s
sys     0m7.124s

A lot of this, as @Guru points out, is because of the cost of creating processes, which is only incurred by /usr/bin/printf. If that were all, I would expect the executable to be faster than the builtin if run outside a loop. Unfortunately, there is a limit to the size of an argument that /usr/bin/printf can be given, so I could only test this with a relatively short string:

$ i=$(seq 1 28000 | awk '{k=k$1}END{print k}'); time /usr/bin/printf '%s ' "$i" > /dev/null; 

real    0m0.035s
user    0m0.004s
sys     0m0.028s

$ i=$(seq 1 28000 | awk '{k=k$1}END{print k}'); time printf '%s ' "$i" > /dev/null; 

real    0m0.008s
user    0m0.008s
sys     0m0.000s

The builtin is still consistently and significantly faster. To make it even clearer, let's make both start new processes:

$ time for i in `seq 1 1000`; do /usr/bin/printf '%s ' "$i" >/dev/null; done;   
real    0m33.695s
user    0m0.636s
sys     0m30.628s

$ time for i in `seq 1 1000`; do bash -c "printf '%s ' $i" >/dev/null; done;   

real    0m3.557s
user    0m0.380s
sys     0m0.508s

The only reason I can think of is that the variable being printed is internal to bash and can be passed directly to the builtin. Is that enough to explain the difference in speed? What other factors are at play?
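
On the size limit mentioned above: as far as I can tell it is not a property of printf itself but the kernel's cap on the length of a single argument passed through execve() (128 KiB on Linux; other systems differ, so treat that figure as an assumption about the platform). A minimal sketch of the difference, where the 200000-character length is an arbitrary value above that cap and `env printf` is just one way to force the external binary:

```shell
# Build a 200000-character string (more than 128 KiB = 131072 bytes).
big=$(printf '%200000s' '' | tr ' ' x)

# Builtin: the string is handed over inside the shell process, no execve() involved.
out=$(printf '%s' "$big")
echo "builtin printed ${#out} characters"

# External: the string has to travel through execve() as one argument.
# `env printf` is used only to force a PATH lookup instead of the builtin.
if env printf '%s' "$big" >/dev/null 2>&1; then
  external=ok
else
  external=failed   # on Linux this is typically "Argument list too long"
fi
echo "external printf: $external"
```

On a system with a larger per-argument limit the second call may well succeed, which is why the sketch reports the outcome instead of assuming it.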

I just tried your latest test, and the builtin is about 27% slower than the external. (~18 vs ~13 seconds.) This is on a fairly fast machine under OS X 10.8. I increased the iteration count to 10,000 to be sure. (The times I've given are with the higher iteration count.) — Warren Young, Sep 27 at 6:27

slm · Accepted Answer · 2013-09-27 09:45:28Z

Standalone printf

Part of the "expense" of invoking a process is that several resource-intensive things have to happen:

1. The executable has to be loaded from disk. This is slow, since the HDD has to be accessed to read the binary blob in which the executable is stored.
2. The executable is typically built against dynamic libraries, so some secondary files will also have to be loaded (i.e. more binary blob data being read from the HDD).
3. Operating system overhead. Each process that you invoke incurs overhead in the form of a process ID having to be created for it. Space in memory will also have to be carved out, both to house the binary data being loaded from the HDD in steps 1 & 2 and to populate the multiple structures that store things such as the process's environment (environment variables etc.).
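
One way to see how much of the cost is pure startup is to time an external program that does no work at all next to the external printf. This is only a rough sketch, assuming bash (it relies on the `time` keyword, `TIMEFORMAT` and `type -P`); the iteration count of 200 and the variable names are my own choices:

```shell
n=200                          # iteration count, an arbitrary choice
ext_printf=$(type -P printf)   # path of the external printf, e.g. /usr/bin/printf
ext_true=$(type -P true)       # an external program that does no work at all
TIMEFORMAT=%R                  # make `time` print elapsed seconds only

# Each loop's own output goes to /dev/null; only the report from `time` is captured.
t_builtin=$( { time for ((i = 0; i < n; i++)); do printf '%s ' "$i"; done >/dev/null; } 2>&1 )
t_external=$( { time for ((i = 0; i < n; i++)); do "$ext_printf" '%s ' "$i"; done >/dev/null; } 2>&1 )
t_noop=$( { time for ((i = 0; i < n; i++)); do "$ext_true"; done >/dev/null; } 2>&1 )

printf 'builtin printf:  %s s\n' "$t_builtin"
printf 'external printf: %s s\n' "$t_external"
printf 'external true:   %s s\n' "$t_noop"
```

If the last two figures come out close to each other, most of the time is going into creating and tearing down the process rather than into anything printf does.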

excerpt of an strace of /usr/bin/printf

$ strace /usr/bin/printf "%s\n" "hello world"
execve("/usr/bin/printf", ["/usr/bin/printf", "%s\\n", "hello world"], [/* 91 vars */]) = 0
brk(0)                                  = 0xe91000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6b000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=242452, ...}) = 0
mmap(NULL, 242452, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd155a2f000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\357!\3474\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1956608, ...}) = 0
mmap(0x34e7200000, 3781816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34e7200000
mprotect(0x34e7391000, 2097152, PROT_NONE) = 0
mmap(0x34e7591000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x191000) = 0x34e7591000
mmap(0x34e7596000, 21688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34e7596000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2e000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a2c000
arch_prctl(ARCH_SET_FS, 0x7fd155a2c720) = 0
mprotect(0x34e7591000, 16384, PROT_READ) = 0
mprotect(0x34e701e000, 4096, PROT_READ) = 0
munmap(0x7fd155a2f000, 242452)          = 0
brk(0)                                  = 0xe91000
brk(0xeb2000)                           = 0xeb2000
brk(0)                                  = 0xeb2000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd14fb9b000
close(3)                                = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd155a6a000
write(1, "hello world\n", 12hello world
)           = 12
close(1)                                = 0
munmap(0x7fd155a6a000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?

Looking through the above, you can get a sense of the additional work that /usr/bin/printf incurs because it is a standalone executable.

Builtin printf

With the builtin version of printf, all the libraries that it depends on, as well as its binary blob, were already loaded into memory when Bash was invoked, so none of that cost has to be incurred again.

Effectively, when you call one of Bash's builtin "commands", you're really making what amounts to a function call, since everything has already been loaded.
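
A small illustration of the builtin living inside the shell process, assuming bash (`type -t`, `type -P` and `printf -v` are bash features; the variable names are just for the example):

```shell
kind=$(type -t printf)      # how the bare name resolves: "builtin" in bash
ext=$(type -P printf)       # path of the external binary found on PATH, if any
echo "printf resolves to a $kind; external copy: ${ext:-none found}"

# -v stores the formatted result straight into a shell variable.
# That only makes sense for code running inside the shell process itself:
# a child process has no way to set a variable in its parent.
printf -v padded '%05d' 42
echo "$padded"   # prints 00042
```

The `-v` option has no counterpart in /usr/bin/printf, whose only way of handing a result back is through its output.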

An analogy

If you've ever worked with a programming language such as Perl, it's equivalent to calling the function system("mycmd") or using backticks (`mycmd`). When you do either of those things, you're forking a separate process with its own overhead, vs. using the functions that are offered to you through Perl's core.
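
The same trade-off shows up within shell scripts themselves. As a sketch (the path is only an example value), here are two ways of getting the last component of a path, one through an external program and one handled by the shell:

```shell
path=/usr/bin/printf

# External route: command substitution forks a subshell, which then execs basename.
via_external=$(basename "$path")

# In-shell route: parameter expansion strips everything up to the last slash.
via_shell=${path##*/}

echo "$via_external $via_shell"   # both are "printf"
```

Run once, the difference is negligible; inside a loop of thousands of iterations the first form pays the process-creation cost every time.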

Anatomy of Linux Process Management

There's a pretty good article on IBM developerWorks that breaks down the various aspects of how Linux processes are created and destroyed, along with the different C libraries involved in the process. The article is titled: Anatomy of Linux process management - Creation, management, scheduling, and destruction. It's also available as a PDF.

terdon · Answer 2 · 2013-09-27 05:42:35Z

Executing the external command /usr/bin/printf creates a new process, which a shell builtin does not. So a loop of 3000 iterations creates 3000 processes, and is hence slower.

You can check this by running them outside a loop:

answered Sep 27 at 5:01 by Guru, edited Sep 27 at 5:42 by terdon

Fair enough, I just wouldn't have expected the process creation to be that expensive, that is a huge difference. – terdon Sep 27 at 5:34

@terdon You are also saying that the executable is a C program and that it should be fast. The shell is also written in C and the implementation can be as fast (or could even be the same). – Matteo Sep 27 at 5:38

@Matteo true, but the shell is doing all sorts of other things, I would have expected the standalone, dedicated program to be faster. I'll clarify. – terdon Sep 27 at 5:43

@Guru, see updated question. – terdon Sep 27 at 5:44

"the shell is doing all sorts of other things" -> No, the shell is a single threaded process. It does only one thing at a time...except when it forks (which actually makes it two shells). "I just wouldn't have expected the process creation to be that expensive" -> I'd guess process creation is about as expensive a system call as you can find. Each one of those is a fork(). An entire new address space has to be created, etc. – goldilocks Sep 27 at 6:31

slm · Answer 3 · 2013-09-27 09:18:32Z

It has already been covered that the time needed to spawn and set up a new process, load and initialize the program and its library dependencies, and then clean up and terminate it far overshadows the actual time needed to perform the action. Here, though, are some timings with different printf implementations for one expensive action that is not overshadowed by the rest:

$ time /usr/bin/printf %2000000000s > /dev/null
/usr/bin/printf %2000000000s > /dev/null  13.72s user 1.42s system 99% cpu 15.238 total

$ time busybox printf %2000000000s > /dev/null
busybox printf %2000000000s > /dev/null  1.50s user 0.49s system 95% cpu 2.078 total


$ time bash -c 'printf %2000000000s' > /dev/null
bash -c 'printf %2000000000s' > /dev/null  4.59s user 3.35s system 84% cpu 9.375 total

$ time zsh -c 'printf %2000000000s' > /dev/null
zsh -c 'printf %2000000000s' > /dev/null  1.48s user 0.24s system 81% cpu 2.115 total

$ time ksh -c 'printf %2000000000s' > /dev/null
ksh -c 'printf %2000000000s' > /dev/null  0.48s user 0.00s system 88% cpu 0.543 total

$ time mksh -c 'printf %2000000000s' > /dev/null
mksh -c 'printf %2000000000s' > /dev/null  13.59s user 1.57s system 99% cpu 15.262 total

$ time ash -c 'printf %2000000000s' > /dev/null
ash -c 'printf %2000000000s' > /dev/null  13.74s user 1.42s system 99% cpu 15.214 total

$ time yash -c 'printf %2000000000s' > /dev/null
yash -c 'printf %2000000000s' > /dev/null  13.73s user 1.40s system 99% cpu 15.186 total

You can see that, at least in that regard, GNU printf has not been optimized for performance. There's not much point optimizing a command like printf, because for 99.999% of usages the time spent performing the action is going to be overshadowed by the execution time anyway. It makes a lot more sense to optimize commands like grep or sed that can potentially process gigabytes of data in one run.
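
For anyone wanting to repeat this on their own machine, here is a scaled-down sketch. It assumes bash as the driving shell (for the `time` keyword and `TIMEFORMAT`); the reduced width and the loop are my own choices, not part of the measurements above:

```shell
# With no argument, %Ns pads an empty string to N characters, so the test
# above makes each printf emit two billion spaces.
out=$(printf '%20s')
echo "${#out}"   # prints 20

# Scaled-down rerun across whichever of the shells happen to be installed.
width=20000000   # assumption: 100 times smaller than the original, to keep runs short
TIMEFORMAT='%R s'
tested=0
for sh in bash zsh ksh mksh ash yash; do
  command -v "$sh" >/dev/null 2>&1 || continue   # skip shells that are missing
  printf '%-6s' "$sh"
  time "$sh" -c "printf %${width}s" >/dev/null
  tested=$((tested + 1))
done
```

At this width, process startup is a larger share of each figure than in the original runs, so only the relative ordering is worth reading from it.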
