Take the tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Basically, this code takes a keyword and an array of strings, then sorts them based on two things: the number of characters shared with the key, and the distance between them.(As a side note, would this properly be called a sort, not a search?)

def search(key, list)
  # Format for everything pushed to out:
  # Should this be a class?    
  # { 
  #   val: the string in question, 
  #   shared_chars: [{char: shared char of key list elem, 
  #                   index: index of shared char,
  #                   kindex: index of shared char in key
  #                   distance: how close the shared chars are}]
  # }
  out = []
  key = key.downcase

  list.each { |l|
    l.downcase!
    shared_chars = []

    keyi = 0
    key.each_char { |c|
      index = /#{c}/ =~ l

      if index != nil
        shared_chars.push char: c, index: index, kindex: keyi
      end

      # Inelegant?
      keyi += 1
    }

    # Calulate total distance between shared_chars
    distance = 0
    odistance = 0

    shared_chars.each_index { |i|
      unless i == shared_chars.length-1
        distance += (shared_chars[i+1][:index] - shared_chars[i][:index]).abs
        odistance += (shared_chars[i+1][:kindex] - shared_chars[i][:kindex]).abs 
      end
    }

    distance -= odistance
    distance = distance.abs

    out.push val: l, shared_chars: shared_chars, distance: distance
  }

  out.sort_by! { |e|
    # 100 is arbitrary
    100 + e[:distance] - e[:shared_chars].length*3
  }

  out
end

Specifically I would like to know if any part of it could be made more efficient and/or elegant, but general critique is of course welcome.

share|improve this question
 
You appear to be trying to do fuzzy text matching. Consider using an existing solution, another existing solution, or an alternative metric for string similarity. –  200_success Dec 23 '13 at 19:40
 
@200_success I mostly did this for fun and as an exercise. –  EpicPineapple Dec 23 '13 at 19:42
add comment

1 Answer

up vote 2 down vote accepted
  • Blocks longer than one line are conventionally written using doend rather than braces.
  • Code of the form

    out = []
    list.each { |l| ... out.push(something) }
    

    … would be better expressed as

    out = list.collect { |l| something }
    

    Besides being slightly more compact, there is a subtle difference in thinking: the former feels like "do this, then this, then this to build a result"; the latter says "transform each element like this", and it's easier to see that there is a one-to-one correspondence between input elements and output elements.

  • Within that block,

    out = list.collect do |l|
      # Calling l.downcase! would have the side-effect of modifying
      # strings in the original list, which is impolite.
      l = l.downcase
    
      shared_chars = []
      key.split.each_with_index do |c, keyi|
        index = l.index(c)
        # Actually, char:c is never used and could be eliminated
        shared_chars.push(char: c, index: index, kindex: keyi) if index
      end
    
      distance, kdistance = 0, 0
      shared_chars.each_cons(2) do |a, b|
        distance += (b[:index] - a[:index]).abs
        kdistance += (b[:kindex] - a[:kindex]).abs
      end
    
      { val: l, shared_chars: shared_chars, distance: (distance - odistance).abs }
    end
    
  • Furthermore,

    out.sort_by! { |e| ... }
    out
    

    could just be

    out.sort_by { |e| ... }
    

    (.sort_by is defined because an Array is an Enumerable.)

  • I think that the whole function would be better as one chained expression:

    def search(key, list)
      key = key.downcase
    
      list.collect do |l|
        ...
      end.sort_by do |e|
        # There's no point in adding 100 to each element
        e[:distance] - 3 * e[:shared_chars].length
      end
    end
    
share|improve this answer
 
Excellent answer. Note that the bad indentation only happened when I pasted to stackexchange –  EpicPineapple Dec 23 '13 at 19:16
 
do ... end can make problems because of operator precedence. –  Nakilon Dec 24 '13 at 12:05
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.