Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I'm currently going through some lessons on Codility. I've just spent a couple of hours with GenomicRangeQuery, which is intended to demonstrate the use of prefix sums.

The task description is here. Broadly, the idea is that given a string and a set of arbitrary cut points one should be able to return the lowest value character in a given slice. O(N*M) complexity is what we want to avoid.

My solution scores 100%, but I would still value any feedback on style and readability.

def solution(str, slice_starts, slice_ends)
    # Initiate an array to hold rolling totals of character occurrences
    prefix_sums = [0,0,0,0]]
    # Map possible characters to their positions in the prefix sum array
    chars = { "A" => 0, "C" => 1, "G" => 2, "T" => 3 }

    str.split('').each_with_index do |char, i|
        prefix_sums[i + 1] = prefix_sums[i].clone
        prefix_sums[i + 1][chars[char]] += 1
    end

    slice_starts.each_with_index.map do |slice_start, i|
        s = slice_start
        e = slice_ends[i]

        chars.each do |char, ii|
            occurrences = prefix_sums[e + 1][ii] - prefix_sums[s][ii]
            break (ii + 1) if occurrences > 0
        end
    end
end

Updated

Here's my current preferred version using tips from some of the answers below. I should probably also use an array of hashes, rather than arrays, for the prefix sums, but I'm lazy.

def solution(str, slice_starts, slice_ends)
  prefix_sums = [[0] * 4]
  chars = { "A" => 0, "C" => 1, "G" => 2, "T" => 3 }

  str.chars.each_with_index do |char, i|
      prefix_sums[i + 1] = prefix_sums[i].clone
      prefix_sums[i + 1][chars[char]] += 1
  end

  slice_starts.zip(slice_ends).map do |s, e|
    chars.each do |char, ii|
      occurrences = prefix_sums[e + 1][ii] - prefix_sums[s][ii]
      break (ii + 1) if occurrences > 0
    end
  end
end
share|improve this question

Some notes:

  • Use 2-space indentation.
  • [[0,0,0,0]]. Always a space after a comma.
  • str.split('') - >str.chars
  • prefix_sums = [[0,0,0,0]]. This kind of implicit structures hinder readability, I'd simply use a hash. A small space penalty (not in big-Oh, though), better readibility.
  • each_with_index. Use zip instead.
  • As you say, the problem is about getting partial sums (frequencies). You can implement and use a very generic abstraction (Enumerable#scanl), which can be used in more codility problems.

I'd write this purely functional solution:

def solution(input_str, slice_start, slice_end)
  impact_factors = {"A" => 1, "C" => 2, "G" => 3, "T" => 4}
  frequencies = input_str.chars.scanl(Hash.new(0)) do |freq, n|
    freq.merge(n => freq[n] + 1)
  end

  slice_start.zip(slice_end).map do |from, to|
    difference_count_by_factor = frequencies[to+1].map do |n, count| 
      [impact_factors[n], count - frequencies[from][nucleotide]]
    end.to_h
    difference_count_by_factor.reject { |factor, count| count.zero? }.keys.min
  end
end
share|improve this answer
1  
This is a great answer, thanks. My only reservation is implementing Enumerable#scanl might confuse other Rubyists, especially those like me without a functional background. I like your solution, but from my experience it perhaps doesn't feel like the most idiomatic Ruby. – Dae Aug 26 '15 at 15:23

Here's a take on the algorithm you used that takes advantage of more ruby sugar. Explanation and comments are inline:

def solution(s, p, q)
  impacts = {'A'=>1, 'C'=>2, 'G'=>3, 'T'=>4}
  initial = {'A'=>0, 'C'=>0, 'G'=>0, 'T'=>0}

  # Same as your prefix_sums
  # But using hashes to store the nucleotide counts at each index
  counts_at = s.chars.each_with_object({}).with_index do |(c,m), i|
    m[i] = (m[i-1] || initial).clone.tap {|x| x.update(c => x[c] + 1)}
  end

  # Find the necleotides guaranteed to exist in each subsequence
  # Transform them into impacts, and choose the min
  p.zip(q).map do |from, to|
    impacts.keys.reject {|dna| counts_at[from][dna] == counts_at[to][dna]}  # dna keys whose count has changed must appear
                .concat([s[from]])                                          # the first point must appear too
                .map {|dna| impacts[dna]}                                   # turn it into impacts
                .min                                                        # select the min
  end
end

s = 'CAGCCTA'
p = [2,5,0]
q = [4,5,6]

p solution(s, p, q) #=> [2, 4, 1]
share|improve this answer
    
Is #tap necessary in the prefix sum calculation step? I think counts_at may be a better variable name. And I like using hashes for the counts. I think I prefer my method of breaking early out of the #zip though. Fractionally faster and no less readable. – Dae Oct 25 '15 at 8:56
1  
You can avoid tap if you do it in two lines, where the first line ends after clone, and the 2nd line is m[i][c] += 1, which many might consider more readable, though it's a more "procedural" style. re break, you're right it's slightly faster, but i think it is slightly less readable vs a purely functional, declarative solution, but that's partly a matter of taste. – Jonah Oct 25 '15 at 12:57
    
I do like xrthdsf's solution a lot too. – Jonah Oct 25 '15 at 13:02
def solution(str, slice_starts, slice_ends)
  scores = { 'A' => 1, 'C' => 2, 'G' => 3, 'T' => 4 }
  occurences = scores.keys.map { |k| [k, [0]*(str.size+1)] }.to_h
  str.chars.each_with_index do |c, i| 
    scores.keys.each { |n| occurences[n][i+1] = occurences[n][i] + ((n == c) ? 1 : 0) }
  end
  slice_starts.zip(slice_ends).map do |from, to|
    scores.keys.each { |n| break(scores[n]) if occurences[n][to+1] - occurences[n][from] > 0 }
  end
end

By the way, I'd use double quotes only in string interpolations.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.