Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I am working on a Ruby gem to pull data from the Wikidata API. The data is deeply nested, as it contains a lot of repetition.

I have written a piece of Ruby (sans Rails) code that enables an array of hashes to be merged together, with any duplicate key-value pairs being stored as an array under the same key.

For example:

[{:a=>1, :b=>2}, {:b=>3, :c=>4}, {:d=>5, :a=>6}].merge_hashes

gives:

{:a=>[1, 6], :b=>[2, 3], :c=>4, :d=>5}

I began with the following:

class Array
  def merge_hashes
    raise StandardError.new "Array is not an Array of Hashes" unless self.all? {|e| e.is_a? Hash}
    self.each_with_object({}) do |el, h|
      el.each do |k, v|
        if h[k].nil?
          h[k] = v
        else
          h[k] = Array.new([h[k]]) << v
          h[k] = h[k].flatten
        end
      end
    end
  end
end

And refactored to:

class Array
def merge_hashes
    raise StandardError.new "Array is not an Array of Hashes" unless self.all? {|e| e.is_a? Hash}
    self.each_with_object({}) { |el, h| el.each { |k, v| h[k].nil? ? h[k] = v : h[k] = (Array.new([h[k]]) << v).flatten } }
  end 
end

Are there any other improvements I could make? Is that an acceptable exception to throw?

share|improve this question
up vote 0 down vote accepted

I've tried to simplify assignment condition (splat operator is used instead of #flatten):

class Array
  def merge_hashes_alt
    # ...
    self.each_with_object({}) do |el, h|
      el.each { |k, v| h[k] = h[k] ? [*h[k]] << v : v }
    end
  end
end
share|improve this answer
    
I think this is good, because it doesn't check for nil? explicitly and only has one h[k] = assignment. In addition, it gets rid of the Array.new() and the flatten, as you mentioned. – Bradley Marques yesterday

I'd say the refactor is barely readable, the first version is better. Anyway, if you use functional abstractions instead of doing a imperative processing from scratch, algorithms are more clear. In this case, it's a pity Ruby hasn't got Enumerable#map_by (a group_by variation where you can control what to accumulate), then you could write:

require 'facets'

class Array
  def merge_hashes
    flat_map(&:to_a).map_by { |k, v| [k, v] }
  end 
end

Note that this always return an array as value, instead of the scalar/list you have. Having mixed types in data structures is usually a bad idea: every time you have to work with a value, you need to check whether it's an array or not, not nice.

share|improve this answer
    
Thanks for the answer and for the link to facets; I didn't know about it. – Bradley Marques yesterday

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.