Prefix Sum in Ruby, Genomic Range Query from Codility

Question

I'm currently going through some lessons on Codility. I've just spent a couple of hours with GenomicRangeQuery, which is intended to demonstrate the use of prefix sums.

The task description is here. Broadly, the idea is that given a string and a set of arbitrary cut points one should be able to return the lowest value character in a given slice. O(N*M) complexity is what we want to avoid.

My solution scores 100%, but I would still value any feedback on style and readability.

def solution(str, slice_starts, slice_ends)
    # Initiate an array to hold rolling totals of character occurrences
    prefix_sums = [0,0,0,0]]
    # Map possible characters to their positions in the prefix sum array
    chars = { "A" => 0, "C" => 1, "G" => 2, "T" => 3 }

    str.split('').each_with_index do |char, i|
        prefix_sums[i + 1] = prefix_sums[i].clone
        prefix_sums[i + 1][chars[char]] += 1
    end

    slice_starts.each_with_index.map do |slice_start, i|
        s = slice_start
        e = slice_ends[i]

        chars.each do |char, ii|
            occurrences = prefix_sums[e + 1][ii] - prefix_sums[s][ii]
            break (ii + 1) if occurrences > 0
        end
    end
end

Updated

Here's my current preferred version using tips from some of the answers below. I should probably also use an array of hashes, rather than arrays, for the prefix sums, but I'm lazy.

def solution(str, slice_starts, slice_ends)
  prefix_sums = [[0] * 4]
  chars = { "A" => 0, "C" => 1, "G" => 2, "T" => 3 }

  str.chars.each_with_index do |char, i|
      prefix_sums[i + 1] = prefix_sums[i].clone
      prefix_sums[i + 1][chars[char]] += 1
  end

  slice_starts.zip(slice_ends).map do |s, e|
    chars.each do |char, ii|
      occurrences = prefix_sums[e + 1][ii] - prefix_sums[s][ii]
      break (ii + 1) if occurrences > 0
    end
  end
end

tokland · Answer 1 · 2015-08-20 12:56:23Z

Some notes:

Use 2-space indentation.
[[0,0,0,0]]. Always a space after a comma.
str.split('') - >str.chars
prefix_sums = [[0,0,0,0]]. This kind of implicit structures hinder readability, I'd simply use a hash. A small space penalty (not in big-Oh, though), better readibility.
each_with_index. Use zip instead.
As you say, the problem is about getting partial sums (frequencies). You can implement and use a very generic abstraction (Enumerable#scanl), which can be used in more codility problems.

I'd write this purely functional solution:

def solution(input_str, slice_start, slice_end)
  impact_factors = {"A" => 1, "C" => 2, "G" => 3, "T" => 4}
  frequencies = input_str.chars.scanl(Hash.new(0)) do |freq, n|
    freq.merge(n => freq[n] + 1)
  end

  slice_start.zip(slice_end).map do |from, to|
    difference_count_by_factor = frequencies[to+1].map do |n, count| 
      [impact_factors[n], count - frequencies[from][nucleotide]]
    end.to_h
    difference_count_by_factor.reject { |factor, count| count.zero? }.keys.min
  end
end

This is a great answer, thanks. My only reservation is implementing Enumerable#scanl might confuse other Rubyists, especially those like me without a functional background. I like your solution, but from my experience it perhaps doesn't feel like the most idiomatic Ruby. — Dae, Aug 26 '15 at 15:23

Jonah · Answer 2 · 2015-10-25 05:07:28Z

up vote 1 down vote

Here's a take on the algorithm you used that takes advantage of more ruby sugar. Explanation and comments are inline:

def solution(s, p, q)
  impacts = {'A'=>1, 'C'=>2, 'G'=>3, 'T'=>4}
  initial = {'A'=>0, 'C'=>0, 'G'=>0, 'T'=>0}

  # Same as your prefix_sums
  # But using hashes to store the nucleotide counts at each index
  counts_at = s.chars.each_with_object({}).with_index do |(c,m), i|
    m[i] = (m[i-1] || initial).clone.tap {|x| x.update(c => x[c] + 1)}
  end

  # Find the necleotides guaranteed to exist in each subsequence
  # Transform them into impacts, and choose the min
  p.zip(q).map do |from, to|
    impacts.keys.reject {|dna| counts_at[from][dna] == counts_at[to][dna]}  # dna keys whose count has changed must appear
                .concat([s[from]])                                          # the first point must appear too
                .map {|dna| impacts[dna]}                                   # turn it into impacts
                .min                                                        # select the min
  end
end

s = 'CAGCCTA'
p = [2,5,0]
q = [4,5,6]

p solution(s, p, q) #=> [2, 4, 1]

answered Oct 25 '15 at 5:07

Jonah

2,432413

Is #tap necessary in the prefix sum calculation step? I think counts_at may be a better variable name. And I like using hashes for the counts. I think I prefer my method of breaking early out of the #zip though. Fractionally faster and no less readable. – Dae Oct 25 '15 at 8:56

1

You can avoid tap if you do it in two lines, where the first line ends after clone, and the 2nd line is m[i][c] += 1, which many might consider more readable, though it's a more "procedural" style. re break, you're right it's slightly faster, but i think it is slightly less readable vs a purely functional, declarative solution, but that's partly a matter of taste. – Jonah Oct 25 '15 at 12:57

I do like xrthdsf's solution a lot too. – Jonah Oct 25 '15 at 13:02

add a comment |

cafe_ · Answer 3 · 2015-10-25 12:45:09Z

def solution(str, slice_starts, slice_ends)
  scores = { 'A' => 1, 'C' => 2, 'G' => 3, 'T' => 4 }
  occurences = scores.keys.map { |k| [k, [0]*(str.size+1)] }.to_h
  str.chars.each_with_index do |c, i| 
    scores.keys.each { |n| occurences[n][i+1] = occurences[n][i] + ((n == c) ? 1 : 0) }
  end
  slice_starts.zip(slice_ends).map do |from, to|
    scores.keys.each { |n| break(scores[n]) if occurences[n][to+1] - occurences[n][from] > 0 }
  end
end

By the way, I'd use double quotes only in string interpolations.

asked	1 year ago
viewed	183 times
active	1 year ago

current community

your communities

more stack exchange communities

Prefix Sum in Ruby, Genomic Range Query from Codility

Updated

3 Answers 3

Your Answer

Not the answer you're looking for? Browse other questions tagged beginner algorithm ruby programming-challenge bioinformatics or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Prefix Sum in Ruby, Genomic Range Query from Codility

Updated

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged beginner algorithm ruby programming-challenge bioinformatics or ask your own question.

Related

Hot Network Questions