Optimizing sum_combination_for(n) code

Question

I'm working on a piece of code for calculating all the possible combinations of numbers that sum to n. For example:

sum_combination(3) = [
  [1,1,1],
  [1,2],
  [3]
]

So my ruby code is this:

class Integer
  SUM_COMBINATION = {
    1 => [[1]],
    2 => [[1,1],[2]],
    3 => [[1,1,1],[1,2],[3]]
  }

  def sum_combination
    SUM_COMBINATION[self] ||= (
      (self-1).downto((self/2.0).ceil).map do |n|
        n.sum_combination.map { |c| (c+[self-n]).sort }
      end.flatten(1).uniq + [[self]]
    )
  end
end

The code works, but it's insanely slow for numbers above 50, meaning I never had enough patience to watch it finish :(...

tokland · Accepted Answer · 2013-04-20 09:29:32Z

Notes:

Using uniq in a combinatorics problem is most likely a code smell. It means that you are generating more elements that you need (waste of space and time) and later you have to remove them (waste of more time).
Taking paper and pencil and drawing some trees, the natural way seems to be passing down which is the maximum subtraction value allowed, that way you won't repeat values in different orders.
I wouldn't extend Integer for a method like this, it does not seem a general enough abstraction.
Note that map + flatten(1) -> flat_map. This method has been around since Ruby 1.9.2 but for some reason it's still pretty much ignored.

I'd write (also in functional style, as your code):

module Combinatorics
  def self.sets_with_sum(value, max = value)
    if value == 0
      [[]]
    else
      1.upto(max).flat_map do |n|
        sets_with_sum(value - n, [value, n].min).map do |ns|
          [n] + ns
        end
      end
    end 
  end
end

p Combinatorics.sets_with_sum(4)
p Combinatorics.sets_with_sum(50).size

Note that I don't memoize anything yet the performance seems acceptable:

$ time ruby sum-combination.rb
[[4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]]
204226

real    0m8.416s

Kyle · Answer 2 · 2013-04-16 20:30:22Z

I don't have a specific solution for you yet, but I can point you in the right direction.

I threw your code into irb and ran it through RubyProf. RubyProf will tell you where the bottlenecks in your code are and help you optimize it.

I copied your code into IRB, then ran the following script:

require 'ruby-prof'

result = RubyProf.profile do
  20.sum_combination
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)

The output is:

 %self      total      self      wait     child     calls  name
 37.03      0.160     0.096     0.000     0.064    15390   Array#hash
 24.57      0.064     0.064     0.000     0.000   104648   Kernel#hash
 12.37      0.055     0.032     0.000     0.023    11754   Array#eql?
  8.71      0.023     0.023     0.000     0.000    36146   Numeric#eql?
  6.74      0.233     0.018     0.000     0.216       17   Array#uniq
  6.31      0.026     0.016     0.000     0.009       98   Array#map
  3.59      0.009     0.009     0.000     0.000     7695   Array#sort
  0.16      0.000     0.000     0.000     0.000      636   Fixnum#==
  0.15      0.000     0.000     0.000     0.000       17   Array#flatten
  0.04      0.260     0.000     0.000     0.260        1   Object#irb_binding
  0.02      0.260     0.000     0.000     0.260       99  *Integer#sum_combination
  0.01      0.000     0.000     0.000     0.000       17   Float#ceil
  0.01      0.000     0.000     0.000     0.000       17   Hash#[]=
  0.01      0.000     0.000     0.000     0.000       17   Fixnum#/
  0.01      0.200     0.000     0.000     0.200       34  *Integer#downto
  0.00      0.200     0.000     0.000     0.200       17  *Enumerator#each
  0.00      0.200     0.000     0.000     0.200       17  *Enumerable#map

As you can see, the vast majority of your time is spent in #hash and #eql?, which are called by sort, uniq, and possibly flatten. That suggests that finding ways to avoid uniquifying the data until you've completed the algorithm would be much more efficient.

If you can refactor your code to avoid calling sort, flatten, and uniq on every run, that would speed things up significantly.

I am not sure profiling is very useful to analyze an algorithm. I mean, the most efficient implementation could have a similar profile for all we know, these are only relative times. In general algorithms require a more mathematical approach to see what's going on. Though indeed is some cases it may give a hint where the problem lies.

normalocity · Answer 3 · 2013-04-16 20:33:19Z

up vote 2 down vote

Unless you need the entire set of combinations immediately, you could use a lazy enumerator, which just gets the next item in the set when asked.

This requires using Ruby 2.0, as it was only recently implemented.

Here's a great article about it iterating over an infinite range.

If using Ruby 2.0 isn't an option, then you're screwed. Your problem is inherently exponentially more costly as the value of n increases. Unlike the coin/money problem where the number of possible coins (8) never increases, in the problem you're proposing both the number of output combinations and the number of integers that can go into making them increases infinitely as you increase the value of n.

edited Apr 16 at 20:33

answered Apr 16 at 20:25

normalocity
31116

It turns out it's not exponentially, it grows much slower (it would require some analysis to determine exactly how it grows, but its does not seem trivial). – tokland Apr 18 at 12:44

I don't think the number of unique combinations ([1, 1, 2] and [2, 1, 1] being treated as non-unique) may not grow exponentially, but the number of possible arrangements of integers may (where the previous example is taken as two possible combinations instead of one) - it's an interesting question. If, in the process of calculating all possible unique combinations you have to calculate the non-unique ones as well, that's a ton of wasted time. In that scenario most of the time spent would be wasted. Maybe there's a simple away to avoid this duplication. – normalocity Apr 18 at 21:11

1

If we take permutations instead of combinations indeed we have exponential cases (2^(N-1)). But there is no need to do that, check my answer, you simply have to pass down what's the maximum value you are allowed to subtract in that branch. – tokland Apr 20 at 9:31

asked	1 month ago
viewed	84 times
active	1 month ago

Optimizing sum_combination_for(n) code

3 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged ruby or ask your own question.

Optimizing sum_combination_for(n) code

3 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged ruby or ask your own question.

Related