
I make an HTTP request that returns a string containing one JSON object per line, which I need to parse.
Example response:

{"urlkey": "com,practicingruby)/", "timestamp": "20150420004437", "status": "200", "url": "https://practicingruby.com/", "filename": "common-crawl/crawl-data/CC-MAIN-2015-18/segments/1429246644200.21/warc/CC-MAIN-20150417045724-00242-ip-10-235-10-82.ec2.internal.warc.gz", "length": "9219", "mime": "text/html", "offset": "986953615", "digest": "DOGJXRGCHRUNDTKKJMLYW2UY2BSWCSHX"}
{"urlkey": "com,practicingruby)/", "timestamp": "20150425001851", "status": "200", "url": "https://practicingruby.com/", "filename": "common-crawl/crawl-data/CC-MAIN-2015-18/segments/1429246645538.5/warc/CC-MAIN-20150417045725-00242-ip-10-235-10-82.ec2.internal.warc.gz", "length": "9218", "mime": "text/html", "offset": "935932558", "digest": "LJKP47MYZ2KEEAYWZ4HICSVIHDG7CARQ"}
{"urlkey": "com,practicingruby)/articles/ant-colony-simulation?u=5c7a967f21", "timestamp": "20150421081357", "status": "200", "url": "https://practicingruby.com/articles/ant-colony-simulation?u=5c7a967f21", "filename": "common-crawl/crawl-data/CC-MAIN-2015-18/segments/1429246641054.14/warc/CC-MAIN-20150417045721-00029-ip-10-235-10-82.ec2.internal.warc.gz", "length": "10013", "mime": "text/html", "offset": "966385301", "digest": "AWIR7EJQJCGJYUBWCQBC5UFHCJ2ZNWPQ"}

My code:

result = Net::HTTP.get(URI("http://index.commoncrawl.org/CC-MAIN-2015-18-index?url=#{url}&output=json")).split("}")

result.each do |res|
    break if res == "\n"
    #need to add back braces because we used it to split the various json hashes from the http request
    res << "}"
    to_crawl = JSON.parse(res)
    puts to_crawl
end

It works, but I'm sure there is a much better way to do it, or at least a better way to write the code.


2 Answers


Splitting the body with split("}") is doing you a disservice, as it destroys the structure of the response. Split it by lines instead:

body = Net::HTTP.get(...)
data = body.lines.map { |line| JSON.parse(line) }
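
A minimal sketch of the line-by-line approach, using a hard-coded sample body in place of the real HTTP response so that it runs offline. The helper name parse_json_lines is hypothetical, and skipping blank lines is an added assumption to tolerate a trailing newline or empty lines in the response:

```ruby
require 'json'

# Hypothetical helper: parse a body that holds one JSON object per line.
def parse_json_lines(body)
  body.each_line
      .map(&:strip)        # drop the trailing newline and stray whitespace
      .reject(&:empty?)    # assumption: blank lines carry no data
      .map { |line| JSON.parse(line) }
end

# Stand-in for the HTTP response body (shortened records from the question).
sample_body = <<~BODY
  {"urlkey": "com,practicingruby)/", "timestamp": "20150420004437", "status": "200"}
  {"urlkey": "com,practicingruby)/", "timestamp": "20150425001851", "status": "200"}

BODY

records = parse_json_lines(sample_body)
records.each { |record| puts record["timestamp"] }
```

Because each line is parsed on its own, a "}" inside a string value no longer matters, which is where splitting on "}" would go wrong.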

Use Faraday:

require 'faraday'
require 'json'

conn = Faraday.new("http://index.commoncrawl.org/") do |faraday|
  faraday.request :url_encoded             # form-encode POST params
  faraday.adapter Faraday.default_adapter  # make requests with Net::HTTP
end

response = conn.get("/CC-MAIN-2015-18-index?url=#{url}&output=json")
# The body holds one JSON object per line, so parse each line separately
parsed = response.body.lines.map { |line| JSON.parse(line) }
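
Whichever client is used, interpolating url directly into the query string can change the request when the value contains characters such as ? or &. A sketch of building the request URI with the standard library's URI.encode_www_form instead; build_index_uri is a hypothetical helper name, and the target URL is just one of the sample records from the question:

```ruby
require 'uri'

# Hypothetical helper: build the index query URI with an encoded `url` parameter.
def build_index_uri(url)
  uri = URI("http://index.commoncrawl.org/CC-MAIN-2015-18-index")
  uri.query = URI.encode_www_form(url: url, output: "json")
  uri
end

target = "practicingruby.com/articles/ant-colony-simulation?u=5c7a967f21"
uri = build_index_uri(target)
puts uri
```

The resulting URI object can be passed to Net::HTTP.get as in the question.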
    
Welcome bogem. I hope you enjoy the site. –  Legato Jul 10 at 7:17
    
@Legato hello. Yes, I really enjoy it. Thanks! –  bogem Jul 10 at 7:26
