
I'm using Anemone to spider a website, and then applying a set of rules specific to that website to extract certain parameters.

It feels like it should be simple enough, but every attempt I make to save the products into arrays looks very messy.

The rules are different for each site (the script simply grabs site 1 from the DB at the moment). rule.name should become the column name.

Any ideas for a good way to store this data (not in a DB)? My array pushing seems horrible.

My current way of storing it goes like this: I have two hashes (entity and product) and an array (array). I loop through the rules, building an entity, which I merge into product after each successful iteration. I then push product onto array before moving on to the next page.

As I said, it looks and feels crappy. I would like to add a Product model with a method to set variable keys on a hash, but I'm not certain.

desc "Crawl client site"
task :crawl => :environment do

  require 'anemone'

  @client = Client.find(1)
  @rules = @client.rules
  $i = 0 #just for testing
  array = []

  #Set up model/object or array to save the data.

  Anemone.crawl(@client.url) do |anemone|
    anemone.on_every_page do |page|
      #puts page.url
      #Create new instance of object or row of array?
      entity = {}
      # Every product row starts with the URL of the page it came from.
      product = { url: page.url }

      @client.rules.each do |rule|


        # if page.doc.at_css(rule.rule) != nil || !rule.required? #.text[/[0-9\.]+/]
        # xpath returns a NodeSet (never nil), so test for emptiness instead.
        if !page.doc.xpath(rule.rule).empty? || !rule.required? #.text[/[0-9\.]+/]

          entity[rule.product_attribute.name] = page.doc.xpath(rule.rule).remove

          product.merge!(entity)

        else
          #Not a product Page. Break the rules loop and move on to next page. (also delete current instance)
          product = nil
          break
        end
        #$i+= 1
      end


      array.push(product) if product

    end
  end
  #puts $i
  puts array
end
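
For reference, here is a rough sketch of the Product idea mentioned above: a thin wrapper around a hash whose keys are only known at runtime. It is not part of the working task. Product, Rule, build_product and the extract callable are all hypothetical names, and the lambda stands in for the page.doc.xpath lookup so the sketch runs without Anemone. Unlike the task above, it skips an optional rule that finds nothing rather than storing an empty result.

```ruby
# Hypothetical Product model: wraps a Hash keyed by rule name
# (the names that should later become column names).
class Product
  def initialize(url)
    @attributes = { url: url }
  end

  # Set an attribute under a key that is only known at runtime.
  def []=(name, value)
    @attributes[name.to_sym] = value
  end

  def [](name)
    @attributes[name.to_sym]
  end

  # Return a copy so callers cannot mutate the internal hash.
  def to_h
    @attributes.dup
  end
end

# Hypothetical stand-in for the ActiveRecord rule rows.
Rule = Struct.new(:name, :required)

# Builds a Product for one page, or returns nil as soon as a required
# rule finds nothing (i.e. the page is not a product page).
# `extract` is an assumed callable: rule -> String or nil.
def build_product(url, rules, extract)
  product = Product.new(url)
  rules.each do |rule|
    value = extract.call(rule)
    if value.nil?
      return nil if rule.required
      next
    end
    product[rule.name] = value
  end
  product
end

rules = [Rule.new("price", true), Rule.new("colour", false)]
found = { "price" => "9.99" } # pretend these values came from the page
product = build_product("http://example.com/p/1", rules, ->(rule) { found[rule.name] })

# Collect plain hashes, dropping pages that were not products.
products = [product].compact.map(&:to_h)
puts products.inspect
```

With something like this, the per-page block would only need to call build_product and push the result when it is not nil, instead of juggling entity and product by hand.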
Whoops, well, it is working code. At the moment, though, it simply displays the page URL of product pages. I need it to store the rule data. –  David Sigley May 14 '14 at 8:15
    
I've added my solution. It's now complete. I left it out because I think it's substandard... but I understand the rules and therefore have added it back in. –  David Sigley May 14 '14 at 13:25
I've retracted my vote, and deleted my comments. I hope you will have good reviews! –  Marc-Andre May 14 '14 at 13:27
