Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Elsewhere, there was a question about an elegant solution to a problem.

I'm curious whether this solution is elegant from the perspective of easy to understand for a ruby programmer (which I am no longer).

merged_file = File.open("merge_out.txt", "w")

files = ARGV.map { |filename| File.open( filename, "r") }

lines = files.map { |file| file.gets }

while lines.any?
    next_line = lines.compact.min
    file_id = lines.index( next_line )
    merged_file.print next_line
    lines[ file_id ] = files[ file_id ].gets
end

The question is not whether it is efficient, but just "how long does it take a ruby programmer to understand what this does?"

share|improve this question

3 Answers 3

up vote 7 down vote accepted

I had no trouble figuring it out what it does, but I'm not sure I'd call it "easy to understand".

All the declarations are implicit and thus if you're familiar with all the in's and outs of Array and the map method, it's pretty straight forward what the code is attempting to do. Some of the logic takes a bit of thinking about,

I did puzzle a bit over this line

file_id = lines.index( next_line)

I think the most puzzling part is the unstated requirement that the input files are already sorted. Otherwise this line doesn't work. Once I figured out that this line meant that the input files were sorted, the rest fell into place.

The exit condition is also a bit tricky to figure out.

The code really feels like a translation to ruby from a more explicitly functional language, it doesn't feel "ruby-like" at all.

share|improve this answer
  1. Took around a 100 seconds to understand. Would be twice faster if I had a commentary: # this code merges two presorted files.

  2. All is pretty ok, but what you forgot here is to close all opened files. And it is the only reason to have a first variable declared, otherwise I would mix both maps together.

  3. If you pass filenames via ARGV, there should be also the merged_filename as the first or last parameter. Or at any position and optional if you start to use some named options parsing.

So we have this:

files_to_merge = ARGV.map &File.method(:open)
lines = files_to_merge.map &:gets
merged_file = File.open "merge_out.txt", "w"
while lines.any?
    merged_file.print(next_line = lines.compact.min)
    file_id = lines.index next_line
    lines[file_id] = files_to_merge[file_id].gets
end
merged_file.close
files_to_merge.each &:close

But what if try to solve the problem of mindblowing file_id = .index?
You could put file handler and current line together into an Array or Hash, but I don't feel like it makes code better:

files_and_lines = ARGV.map(&File.method(:open)).map{ |file| {file:file, line:file.gets} }
...
loop do
    files_and_lines.select!{ |tuple| tuple[:line] }
    break if files_and_lines.empty?
    next_tuple = files_and_lines.min_by{ |tuple| tuple[:line] }
    merged_file.print next_tuple[:line]
    next_tuple[:line] = next_tuple[:file].gets
end
...

Trying to make it shorter didn't work for me: converting tuples into Hash pair file->line can't easily return file for line and line->file doesn't look better either because to edit a key you have to have some temporal array variable:

files_and_lines = Hash[ ARGV.map(&File.method(:open)).map{ |file| [file.gets, file] } ]
...
loop do
    break if files_and_lines.keep_if{ |line, file| line }.empty?
    array = files_and_lines.to_a.sort_by &:first
    merged_file.print array[0][0]
    array[0][0] = array[0][1].gets
    files_and_lines = Hash[array]
end
...

So just leave Britney alone don't touch it -- .index is ok.

share|improve this answer

It took me about five minutes to figure out, then I found that I got it wrong. At first, I thought that it would take lines from each named file in turn. Then, I thought that it would take the first paragraph from each file, before realizing that an "empty" line still consists of a "\n" and therefore wouldn't get compacted out. In the end, I decided that it's just concatenating all the files. Then I ran the code to discover that it does one round of mergesort.

share|improve this answer
    
This is exactly how I read the code, except that I never ran it and discovered that it ran just one round of merge sort. The mental disconnect for me was seeing those file opens. I expected the code below to deal with the files in their entirety, not just part of them. –  Wayne Conrad Jan 8 '14 at 1:41
    
What does "one round of mergesort" mean? The code is supposed to deal with the files in their entirety... my understanding is that it does. –  GreenAsJade Jan 8 '14 at 3:49
1  
I mean that if each of the files is already individually sorted, then the code in this question lists all of the lines in sorted order. –  200_success Jan 8 '14 at 5:35
    
Ah yes - that is what it indeed does... in their entirety. –  GreenAsJade Jan 8 '14 at 11:07
    
That some of us were so confused about what it does is an indication that, at least for us, the code is too clever. –  Wayne Conrad Jan 9 '14 at 0:35

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.