The pipe is a file opened in an in-kernel file-system and is not accessible as a regular file on disk. It is automatically buffered only up to a certain size, and a writer will eventually block when it is full. Unlike files backed by block devices, pipes behave very much like character devices, and so they generally do not support lseek(), and data read from them cannot be read again as you might do with a regular file.
The here-string is a regular file created in a conventional file-system. The shell creates the file, gets a descriptor for it, and promptly deletes it before it ever writes or reads a byte to or from the file. The kernel will maintain the space required for the file until all processes release all of their descriptors for it. If the child reading from such a descriptor has the capability to do so, the file can be rewound with lseek() and read again.
In both cases the tokens <<< and | represent file-descriptors and not necessarily the files themselves. You can get a better idea of what's going on by doing stuff like:
readlink /dev/fd/1 | cat
...or...
ls -l <<<'' /dev/fd/*
The most significant difference between the two files is that the here-string/doc is pretty much an all-at-once affair - the shell writes all of the data into it before offering the read descriptor up to the child. With a pipe, on the other hand, the shell opens the pipe on the appropriate descriptors and forks off children to manage each end - and so it is written and read concurrently at both ends.
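If you want to see that ordering for yourself, here is a minimal sketch. The reader function and the log file are just names I made up for illustration, and mktemp is assumed to be available. It records which side gets going first in each case: for the here-document the body is produced in full before the reader starts, while for the pipe the reader is already running while the writer is still busy.

```shell
#!/bin/sh
# Hypothetical demo: "reader" and "$log" are illustrative names only.
log=$(mktemp)

# Note in the log that the reader has started, then drain stdin.
reader() { echo "reader start" >>"$log"; cat >/dev/null; }

# Here-document: the shell expands the whole body (and so runs the
# command substitution) before reader is started at all.
: >"$log"
reader <<EOF
$(echo "writer done" >>"$log")
EOF
heredoc_first=$(head -n 1 "$log")

# Pipe: both ends run concurrently, so reader starts while the
# writer is still sleeping.
: >"$log"
{ sleep 1; echo "writer done" >>"$log"; } | reader
pipe_first=$(head -n 1 "$log")

rm -f "$log"
echo "here-document: first event was '$heredoc_first'"
echo "pipe:          first event was '$pipe_first'"
```

The pipe half leans on a one-second sleep to make the ordering obvious, so treat it as a demonstration rather than a proof.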
These distinctions, though, are only generally true. As far as I am aware (which isn't really all that far) this is true of pretty much every shell which handles the <<< here-string short-hand for a << here-document redirection, with the single exception of yash. yash, busybox, dash, and other ash variants do tend to back here-documents with pipes, though, and so in those shells there really is very little difference between the two after all.
Ok - two exceptions. Now that I'm thinking about it, ksh93 doesn't actually do a pipe at all for |, but rather handles the whole business with sockets - though it does do a deleted tmp file for <<< as most others do. What's more, it only puts the separate sections of a pipeline in a subshell environment, which is a sort of POSIX euphemism for "at least it acts like a subshell", and so it doesn't even do the forks.
The fact is that @PSkocik's (very useful) benchmark results here can vary widely for many reasons, and most of these are implementation-dependent. For the here-document setup the biggest factors will be the target ${TMPDIR} file-system type and the current cache configuration/availability, and still more so the amount of data to be written. For the pipe it will be the size of the shell process itself, because copies are made for the required forks. In this way bash is terrible at pipeline setup (to include $(command) substitutions) because it is big and very slow, but with ksh93 it makes hardly any difference at all.
Here's another little shell snippet to demonstrate how a shell splits off subshells for a pipeline:
pipe_who(){ echo "$$"; sh -c 'echo "$PPID"'; }
pipe_who
pipe_who | { pipe_who | cat /dev/fd/3 -; } 3<&0
32059 #bash's pid
32059 #sh's ppid
32059 #1st subshell's $$
32111 #1st subshell sh's ppid
32059 #2nd subshell's $$
32114 #2nd subshell sh's ppid
The difference there is that subshells - while separate processes - still report the parent shell's pid for $$. But sh will tell you what its actual parent pid is.