Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I came across the question about creating memory-efficient Ruby pipe class with lazy evaluation. Some code is given to effectively create a pipeline of commands using lazy enumerators. I have been messing round with it and have implemented a tee command shown below. I feel like there is a better way to do it though.

class Pipe

  class Tee

    def initialize(pipe)
      @pipe = pipe
      @buffer = []
      @pipe.source = self.to_enum
    end

    def <<(item)
      @buffer << item
      # puts "Added '#{item}' to buffer: #{@buffer.inspect}"
    end

    def to_enum
      Enumerator.new do |yielder|
        item = @buffer.shift
        yielder << item if item
      end.lazy
    end

  end

  attr_writer :source

  def initialize
    @commands = []
    @source = nil
  end

  def add(command, opts = {})
    @commands << [command, opts]
    self
  end

  def run
    enum = @source

    @commands.each do |command, options|
      enum = method(command).call(enum, options)
    end

    enum.each {}

    enum
  end

  def cat(enum, options)
    Enumerator.new do |yielder|
      enum.map { |line| yielder << line } if enum

      options[:input].tap do |ios|
        ios.each { |line| yielder << line }
      end
    end.lazy
  end

  def out(enum, options)
    Enumerator.new do |yielder|
      enum.each do |line|
        puts line
        yielder << line
      end
    end.lazy
  end

  def cut(enum, options)
    Enumerator.new do |yielder|
      enum.each do |line|
        fields = line.chomp.split(%r{#{options[:delimiter]}})

        yielder << fields[options[:field]]
      end
    end.lazy
  end

  def upcase(enum, options)
    Enumerator.new do |yielder|
      enum.each do |line|
        yielder << line.upcase
      end
    end
  end

  def tee(enum, options)

    teed = Tee.new(options.fetch(:other))
    Enumerator.new do |yielder|
      enum.each do |line|
        yielder << line
        teed << line
      end
    end
  end

  def grep(enum, options)
    Enumerator.new do |yielder|
      enum.each do |line|
        yielder << line if line.match(options[:pattern])
      end
    end.lazy
  end

end

another_pipe = Pipe.new
another_pipe.add(:grep, pattern: /testing/i)
another_pipe.add(:out)

pipe = Pipe.new
pipe.add(:cat, :input => StringIO.new("testing\na new\nline"))
# pipe.add(:grep, pattern: /line/)
pipe.add(:tee, :other => another_pipe)
pipe.add(:upcase)
pipe.add(:out)
pipe.run

puts "================================================="
another_pipe.run

Here's the output:

TESTING
A NEW
LINE
=================================================
testing

Does anyone see a smarter way of doing this?

share|improve this question
    
Is there a reason you don't call Enumerator#lazy in the pipe or tee methods when you do in all of the others? –  Jordan Oct 15 '14 at 20:08
    
@Jordan I think I left it out by mistake. They should probably both use Enumerator#lazy –  rainkinz Oct 15 '14 at 20:18

1 Answer 1

up vote 6 down vote accepted

I know it's an old question, but it's an interesting one.

First, a couple small things:

  1. In Pipe#run:

    enum = method(command).call(enum, options)
    

    Calling method has a lot of overhead, and send(command, enum, options) does the same thing.

  2. Also in Pipe#run:

    enum.each {}
    

    The purpose of this is to cause the final object to be enumerated. We can do the same thing cheaper (even an empty block takes CPU cycles) with enum.to_a, which is also aliased as enum.force.

Those things aside, the first thing I noticed is that there's a lot of repetition. grep and upcase, for example, only differ on one line. We can extract the shared code out into another method, which I'll call enum_lazy:

def upcase(enum, options)
  enum_lazy(enum) do |line, yielder|
    yielder << line.upcase
  end
end

def grep(enum, options)
  enum_lazy(enum) do |line, yielder|
    yielder << line if line.match(options[:pattern])
  end
end

private
def enum_lazy(enum, &block)
  Enumerator.new do |yielder|
    enum.each do |line|
      yield(line, yielder)
    end
  end.lazy
end

...but this made me realize that some of these methods don't need the yielder pattern at all. For example, upcase takes one element at a time, performs an operation on it, and yields one value each time—it's just a lazy map, and this is equivalent:

def upcase(enum, options)
  enum.lazy.map(&:upcase)
end

Since we called lazy, this will return an Enumerator::Lazy as expected. We can simplify cut and out in the same way:

def out(enum, options)
  enum.lazy.map do |line|
    puts line
    line
  end
end

def cut(enum, options)
  enum.lazy.map do |line|
    fields = line.chomp.split(options[:delimiter])
    fields[ options[:field] ]
  end
end

grep is different, because we only don't want output for every input line. As it turns out, though, grep is just a lazy select:

def grep(enum, options)
  enum.lazy.select {|line| line =~ options[:pattern] }
end

That just leaves cat. I have a strong inkling that there's a similar optimization we could do for cat, but I haven't yet been able to actually figure it out. Perhaps someone else can help figure it out.

Here's the final code:

class Pipe
  class Tee
    # (no changes here)
  end

  attr_writer :source

  def initialize
    @commands = []
    @source = nil
  end

  def add(command, opts = {})
    @commands << [command, opts]
    self
  end

  def run
    enum = @source

    @commands.each do |command, options|
      enum = send(command, enum, options)
    end

    enum.force
    enum
  end

  def cat(enum, options)
    Enumerator.new do |yielder|
      enum.map { |line| yielder << line } if enum

      options[:input].tap do |ios|
        ios.each { |line| yielder << line }
      end
    end.lazy
  end

  def out(enum, options)
    enum.lazy.map do |line|
      puts line
      line
    end
  end

  def cut(enum, options)
    enum.lazy.map do |line|
      fields = line.chomp.split(/#{options[:delimiter]}/)
      fields[ options[:field] ]
    end
  end

  def upcase(enum, options)
    enum.lazy.map(&:upcase)
  end

  def tee(enum, options)
    teed = Tee.new(options.fetch(:other))

    enum_lazy(enum) do |line, yielder|
      yielder << line
      teed << line
    end
  end

  def grep(enum, options)
    enum.lazy.select {|line| line =~ options[:pattern] }
  end

  private
  def enum_lazy(enum, &block)
    Enumerator.new do |yielder|
      enum.each do |line|
        yield(line, yielder)
      end
    end.lazy
  end
end
share|improve this answer
    
Since I just remembered that your real question was whether or not your Tee class could be improved... somewhat anti-climactically, I can't think of any obvious improvements. Good work. :) –  Jordan Oct 15 '14 at 22:11
    
Awesome codereview. Thanks! –  rainkinz Oct 16 '14 at 14:46

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.