Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems. It's 100% free, no registration required.

I need to concatenate chunks from two files:

If I needed to concatenate the whole files, I could simply do:

cat file1 file2 > output

But I need to skip the first 1 MB of the first file, and I only want the first 10 MB of the second file. Sounds like a job for dd.

dd if=file1 bs=1M count=99 skip=1 of=temp1
dd if=file2 bs=1M count=10 of=temp2
cat temp1 temp2 > final_output

Is it possible to do this in one step, i.e. without saving the intermediate results? Can I use multiple input files in dd?


dd can write to stdout too.

( dd if=file1 bs=1M count=99 skip=1
  dd if=file2 bs=1M count=10  ) > final_output
    
This is probably the best way. The output file isn't closed/reopened (like it is with oflag=append conv=notrunc), so filesystems that do delayed allocation (like XFS) are least likely to decide the file is done being written when there's still more to go. – Peter Cordes yesterday
    
@PeterCordes that's a good point, but as long as dd isn't asked to sync, delayed allocation shouldn't kick in immediately anyway (unless memory is tight in which case neither method will postpone allocation). – Stephen Kitt yesterday
    
@StephenKitt: You're probably right. I was thinking of XFS's speculative preallocation, where it does need to specially detect the close/reopen access pattern (sometimes seen for log files). – Peter Cordes yesterday
In shells like bash and mksh that don't optimize out the fork for the last command in a subshell, you can make it slightly more efficient by replacing the subshell with a command group. For other shells, it shouldn't matter, and the subshell approach might even be slightly more efficient as the shell doesn't need to save and restore stdout. – Stéphane Chazelas yesterday
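The command-group form suggested in the comment above might look like the following sketch. To keep it quick to run, it generates small sample files and uses a 1 KiB block size in place of 1M; the bs and workdir variables and the generated data are assumptions for illustration, not part of the original question.

```shell
# Illustrative sample data: file1 has 100 blocks, file2 has 20 blocks.
bs=1024   # stand-in for 1M so the sketch runs quickly
workdir=$(mktemp -d) && cd "$workdir" || exit 1
dd if=/dev/urandom of=file1 bs="$bs" count=100 2>/dev/null
dd if=/dev/urandom of=file2 bs="$bs" count=20 2>/dev/null

# Braces group the commands in the current shell instead of a subshell.
# final_output is opened once and both dd processes inherit it as stdout.
{
  dd if=file1 bs="$bs" count=99 skip=1
  dd if=file2 bs="$bs" count=10
} > final_output 2>/dev/null
```

With real data, the only changes from the subshell version are the braces in place of the parentheses.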

I don't think you can easily read multiple files in a single dd invocation, but you can append to build the output file in several steps:

dd if=file1 bs=1M count=99 skip=1 of=final_output
dd if=file2 bs=1M count=10 of=final_output oflag=append conv=notrunc

You need to specify both conv=notrunc and oflag=append. The first prevents dd from truncating the existing output file; the second makes it write at the end of that file.
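Not every dd implementation supports oflag=append. Where it is missing, one possible alternative is to position the second write explicitly with seek=, which counts in output blocks. The sketch below assumes the first dd wrote exactly 99 full blocks; the small block size, the workdir variable and the generated sample files are stand-ins for illustration.

```shell
bs=1024   # stand-in for 1M so the sketch runs quickly
workdir=$(mktemp -d) && cd "$workdir" || exit 1
dd if=/dev/urandom of=file1 bs="$bs" count=100 2>/dev/null
dd if=/dev/urandom of=file2 bs="$bs" count=20 2>/dev/null

# First chunk: creates (or truncates) final_output and writes 99 blocks.
dd if=file1 bs="$bs" count=99 skip=1 of=final_output 2>/dev/null

# Second chunk: seek=99 skips the 99 output blocks already written,
# and conv=notrunc keeps the existing contents in place.
dd if=file2 bs="$bs" count=10 of=final_output seek=99 conv=notrunc 2>/dev/null
```

If the first chunk could end up shorter than 99 blocks, the seek offset would leave a gap, so this variant only fits when the chunk sizes are known in advance.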


Bear in mind that dd is a raw interface to the read(), write() and lseek() system calls. You can only use it reliably to extract chunks of data from regular files, block devices and some character devices (like /dev/urandom), that is, files for which read(buf, size) is guaranteed to return size as long as the end of the file is not reached.

For pipes, sockets and most character devices (like ttys), you have no such guarantee unless you do read()s of size 1, or use the GNU dd extension iflag=fullblock.
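The short-read behaviour on pipes can be seen with a small experiment. This is only a sketch: slow_writer is a hypothetical helper that feeds two bytes into a pipe one second apart, and the second command assumes a dd that supports iflag=fullblock (such as GNU dd).

```shell
# Hypothetical slow producer: two bytes, one second apart.
slow_writer() { printf a; sleep 1; printf b; }

# Without fullblock, dd's single read() normally returns as soon as the
# first byte arrives, so count=1 copies a short block.
short=$(slow_writer | dd bs=2 count=1 2>/dev/null)

# With iflag=fullblock, dd keeps reading until the 2-byte block is full.
full=$(slow_writer | dd bs=2 count=1 iflag=fullblock 2>/dev/null)

printf 'without fullblock: %s\nwith fullblock:    %s\n' "$short" "$full"
```

The first result depends on timing, which is exactly the problem: the amount of data copied is decided by how the writer happens to deliver it.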

So either:

{
  gdd < file1 bs=1M iflag=fullblock count=99 skip=1
  gdd < file2 bs=1M iflag=fullblock count=10
} > final_output

Or:

M=1048576
{
  dd < file1 bs=1 count="$((99*M))" skip="$M"
  dd < file2 bs=1 count="$((10*M))"
} > final_output

Or with shells with builtin support for a seek operator like ksh93:

M=1048576
{
  command /opt/ast/bin/head -c "$((99*M))" < file1 <#((M))
  command /opt/ast/bin/head -c "$((10*M))" < file2
} > final_output

Or zsh (assuming your head supports the -c option here):

zmodload zsh/system &&
{
  sysseek 1048576 && head -c 99M &&
  head -c 10M < file2
} < file1 > final_output
    
Do you really need the quotes? Won't the result always be an integer? – Steven Penny yesterday
    
@StevenPenny, leaving the expansion unquoted asks the shell to split+glob it, which wouldn't make any sense here. The splitting is done on the current value of $IFS, irrespective of the content of the variable/expansion. See also "Security implications of forgetting to quote a variable in bash/POSIX shells". – Stéphane Chazelas 17 hours ago
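The point about $IFS in the comment above can be illustrated with a deliberately contrived value. In this sketch, IFS=0 is chosen only so that the digits of the result get split; old_ifs, unquoted_fields and quoted_fields are names made up for the example.

```shell
M=1048576
old_ifs=$IFS
IFS=0                          # contrived: makes "0" a field separator

set -- $((10*M))               # unquoted: the result 10485760 is split on "0"
unquoted_fields=$#

set -- "$((10*M))"             # quoted: always exactly one field
quoted_fields=$#

IFS=$old_ifs                   # restore the previous separator
echo "unquoted: $unquoted_fields field(s), quoted: $quoted_fields field(s)"
```

With the default $IFS an integer result would survive unquoted, but the quotes remove the dependency on whatever $IFS happens to be at that point in the script.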
    
@Stéphane Chazelas - in the first example, you are using gdd instead of dd. Is that a typo, or is it intentional? – Martin Vegter 16 hours ago

With a bashism, and a functionally "useless use of cat", but closest to the syntax the OP uses:

cat <(dd if=file1 bs=1M count=99 skip=1) \
    <(dd if=file2 bs=1M count=10) \
   > final_output

(That being said, Stephen Kitt's answer seems to be the most efficient possible method.)

Strictly speaking, <(...) is a kshism which both zsh and bash copied. – Stéphane Chazelas yesterday
