
In bash, the following two commands produce the same result:

echo 'foo' | cat

and

cat <<< 'foo'

My question is: what is the difference between these two as far as resource usage is concerned, and which one is better?

My thought is that the pipe version uses an extra process (echo) plus a pipe, while the here-string version uses only a file descriptor with cat.


2 Answers

up vote 10 down vote accepted

The pipe is a file opened in an in-kernel file system and is not accessible as a regular on-disk file. It is buffered automatically, but only up to a certain size, and a writer will block once it is full. Unlike files backed by block devices, pipes behave very much like character devices: they generally do not support lseek(), and data read from them cannot be read again as it can with a regular file.

The here-string is a regular file created in a conventional file system. The shell creates the file, gets a descriptor for it, and promptly deletes it before it ever writes or reads a byte. The kernel keeps the space required for the file until every process has released every descriptor for it. If the child reading from such a descriptor is able to, it can rewind it with lseek() and read it again.

In both cases the tokens <<< and | represent file descriptors, and not necessarily the files themselves. You can get a better idea of what is going on by running things like:

readlink /dev/fd/1 | cat

...or...

ls -l <<<'' /dev/fd/*
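
Another way to look at the same thing is to ask the shell what kind of file sits behind descriptor 0. This is only a minimal sketch: stdin_kind is a hypothetical helper name, not something from the shell itself, and it assumes bash on a system that provides /dev/stdin. What the here-string line prints depends on the shell and its version, so no result is claimed for it.

```shell
# Hypothetical helper: report what kind of file standard input is.
stdin_kind() {
    local kind=other
    if   [ -p /dev/stdin ]; then kind=pipe
    elif [ -f /dev/stdin ]; then kind='regular file'
    fi
    # Consume the input of a pipe or file so the writer is not cut off early.
    [ "$kind" = other ] || cat >/dev/null
    echo "$kind"
}

echo 'foo' | stdin_kind     # prints "pipe" in bash
stdin_kind <<< 'foo'        # result depends on how the shell backs here-strings
```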

The most significant difference between the two is that the here-string/here-document is pretty much an all-at-once affair: the shell writes all of the data into it before offering the read descriptor to the child. With a pipe, on the other hand, the shell opens the pipe on the appropriate descriptors and forks off children to handle each end, so it is written and read concurrently.
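
A rough way to watch that difference is to timestamp lines as they reach the reader. This is a sketch under assumptions: stamp and slow are hypothetical helper names, bash is assumed, and the timings only have one-second resolution, so treat the numbers as an indication and nothing more.

```shell
# Hypothetical helper: prefix each input line with the seconds elapsed
# since the reader started.
stamp() {
    local start=$SECONDS line
    while IFS= read -r line; do
        printf '%ds %s\n' "$((SECONDS - start))" "$line"
    done
}

# Hypothetical slow writer: one line now, another about a second later.
slow() { echo first; sleep 1; echo second; }

slow | stamp            # pipe: the reader gets "first" while the writer is still running
stamp <<< "$(slow)"     # here-string: the writer finishes before the reader even starts
```

With the pipe the two lines usually carry different timestamps. With the here-string they arrive together, because all of the data was written before cat (or here, stamp) got its descriptor.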

These distinctions, though, are only generally true. As far as I am aware (which isn't really all that far), this holds for pretty much every shell that supports the <<< here-string shorthand for a << here-document redirection, with the single exception of yash. yash, busybox, dash, and other ash variants tend to back here-documents with pipes, though, so in those shells there really is very little difference between the two after all.

OK, two exceptions. Now that I think about it, ksh93 doesn't actually use a pipe at all for |, but rather handles the whole business with sockets, though it does use a deleted tmp file for <<<* as most others do. What's more, it only puts the separate sections of a pipeline in a subshell environment, which is a sort of POSIX euphemism for "at least it acts like a subshell", and so it doesn't even do the forks.

The fact is that @PSkocik's benchmark results (which are very useful) can vary widely for many reasons, most of them implementation dependent. For the here-document setup the biggest factors are the file-system type behind ${TMPDIR}, the current cache configuration and availability, and, even more so, the amount of data to be written. For the pipe it is the size of the shell process itself, because copies are made for the required forks. In this respect bash is terrible at pipeline setup (including $(command) substitutions) because it is big and very slow, whereas with ksh93 it makes hardly any difference at all.

Here's another little shell snippet to demonstrate how a shell splits off subshells for a pipeline:

pipe_who(){ echo "$$"; sh -c 'echo "$PPID"'; }
pipe_who
pipe_who | { pipe_who | cat /dev/fd/3 -; } 3<&0

32059  #bash's pid
32059  #sh's ppid
32059  #1st subshell's $$
32111  #1st subshell sh's ppid
32059  #2nd subshell's $$
32114  #2nd subshell sh's ppid

The difference is that subshells, though separate processes, still report the parent shell's pid for $$, whereas sh will tell you what its actual parent pid is.
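
In bash specifically, the same thing can be seen without spawning sh by comparing $$ with $BASHPID. This is a small sketch that assumes bash 4 or later (where BASHPID exists); pids is a hypothetical helper name.

```shell
# Hypothetical helper: print $$ and $BASHPID side by side.
pids() { echo "\$\$=$$ BASHPID=$BASHPID"; }

pids            # main shell: the two numbers match
( pids )        # subshell: $$ is unchanged, BASHPID is the subshell's own pid
pids | cat      # pipeline member: behaves like the subshell case
```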

Very helpful. Is there a name for the in-kernel filesystem? Does that mean it exists in kernel space? –  utlamn 18 hours ago
@utlamn - actually, yes - simply pipefs. It's all in-kernel - but (aside from stuff like FUSE) so is all file i/o. –  mikeserv 18 hours ago

There's no substitute for benchmarking:

pskocik@ProBook:~ 
$ time (for((i=0;i<1000;i++)); do cat<<< foo >/dev/null; done  )

real    0m2.080s
user    0m0.738s
sys     0m1.439s
pskocik@ProBook:~ 
$ time (for((i=0;i<1000;i++)); do echo foo |cat >/dev/null; done  )

real    0m4.432s
user    0m2.095s
sys     0m3.927s

$ time (for((i=0;i<1000;i++)); do cat <(echo foo) >/dev/null; done  )

real    0m3.380s
user    0m1.121s
sys     0m3.423s

And for a larger amount of data:

TENMEG=$(ruby -e 'puts "A"*(10*1024*1024)')
pskocik@ProBook:~ 
$ time (for((i=0;i<100;i++)); do echo "$TENMEG" |cat >/dev/null; done  )

real    0m42.327s
user    0m38.591s
sys     0m4.226s
pskocik@ProBook:~ 
$ time (for((i=0;i<100;i++)); do cat<<< "$TENMEG" >/dev/null; done  )

real    1m26.946s
user    1m23.116s
sys     0m3.681s
pskocik@ProBook:~ 
$ time (for((i=0;i<100;i++)); do cat <(echo "$TENMEG") >/dev/null; done  )

real    0m43.910s
user    0m40.178s
sys     0m4.119s

It would appear that the pipe version has a larger setup cost but is, in the end, more efficient for large amounts of data.
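
If ruby isn't installed, the large test string can probably be built with standard tools instead. This is just a sketch: make_payload is a hypothetical helper name, and it assumes a head that accepts -c and a readable /dev/zero.

```shell
# Hypothetical helper: print N copies of the letter A (no trailing newline).
make_payload() { head -c "$1" /dev/zero | tr '\0' 'A'; }

TENMEG=$(make_payload $((10 * 1024 * 1024)))
echo "${#TENMEG}"    # 10485760
```

The resulting variable should match the ruby one, since the command substitution strips the trailing newline that puts adds.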

@mikeserv That was correct. I added a benchmark with a larger amount of data. –  PSkocik 18 hours ago
echo foo >/dev/shm/1;cat /dev/shm/1 >/dev/null seemed to be fast in both cases... –  user23013 10 hours ago
    
@user23013 That makes sense. I don't see why either echo "$longstring" or <<<"$longstring" would be tweaked for efficiency and with short strings, efficiency doesn't matter much anyway. –  PSkocik 6 hours ago
    
It is interesting that in my case (on Ubuntu 14.04, Intel quad core i7) cat <(echo foo) >/dev/null is faster than echo foo | cat >/dev/null. –  pabouk 3 hours ago
    
@pabouk That is interesting. I added benchmarks for that too. For the small data version it beats |, which would indicate that it has a lower setup cost, but for larger data it has pretty much the same profile as the regular | (it should be a pipe underneath in both cases). Thanks for mentioning it. –  PSkocik 3 hours ago
