I was given a test task for a programming position, and I failed it. I am a newbie programmer and I accept that. The only problem is that the employer never told me what was actually wrong with the code, so maybe the community can give me a few hints on how to improve it.

The task was to write code that parses a given web page, fetches all of its images, and saves them to a given directory. The page address and the directory are passed as command-line parameters. Performance was critical for this task. Here is the code:
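require 'open-uri'
require 'nokogiri'

class Grab
  # Downloads every <img> on the page at http://url into the directory path,
  # starting one thread per image.
  def runner(url, path)
    threads = []
    # URI.open is open-uri's entry point for URLs (Kernel#open no longer
    # accepts URLs as of Ruby 3.0).
    doc = Nokogiri::HTML(URI.open("http://#{url}"))
    img_srcs = doc.css('img').map { |i| i['src'] }.uniq
    img_srcs = rel_to_abs(url, img_srcs)
    img_srcs.each do |img_src|
      threads << Thread.new(img_src) do
        # File name = everything after the last "/" of an http:// URL
        # (query string included).
        name = img_src.match(/^http:\/\/.*\/(.*)$/)[1]
        image = fetch img_src
        save(image, name, path)
      end
    end
    threads.each { |thread| thread.join }
  end

  # Opens the image URL; open-uri returns a StringIO or a Tempfile.
  def fetch(img_src)
    puts "Fetching #{img_src}"
    URI.open(img_src)
  end

  # Writes the downloaded bytes to path/name in binary mode.
  def save(image, name, path)
    File.open("#{path}/#{name}", "wb") { |file| file.write(image.read) }
  end

  # Prefixes "http://url/" to every src that does not contain "http://".
  def rel_to_abs(url, img_srcs)
    img_srcs.each_with_index do |img_src, index|
      img_srcs[index] = "http://#{url}/#{img_src}" unless img_src.match(/http:\/\//)
    end
    img_srcs
  end
end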
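One possible direction for improvement, sketched under assumptions: the module name `ImageDownload`, its three methods, the default of 8 workers and the 30-second read timeout are all hypothetical choices, not part of the task. The sketch resolves `src` attributes with `URI.join`, which handles root-relative (`/img/a.png`), page-relative, protocol-relative (`//cdn...`) and `https` URLs that `rel_to_abs` and the file-name regex above do not. It derives file names from the URL path with an index prefix so that two `logo.png` files from different folders do not overwrite each other. It also downloads through a fixed-size pool of threads fed by a closed `Queue` instead of one thread per image, and it records failures instead of letting one bad URL abort the whole run.

```ruby
require 'uri'
require 'open-uri'
require 'fileutils'

# Hypothetical helper (not from the original post): URL resolution,
# file naming and a bounded download pool, using only the standard library.
module ImageDownload
  # Resolve raw src attributes against the page URL the way a browser would.
  # Handles "/root.png", "rel/path.png", "//cdn.host/x.png" and https URLs;
  # drops missing/blank srcs, unparsable ones and non-HTTP(S) schemes (data:).
  def self.resolve(page_url, srcs)
    srcs.compact.map(&:strip).reject(&:empty?).filter_map do |src|
      uri = URI.join(page_url, src)
      uri.to_s if uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
    rescue URI::Error
      nil
    end.uniq
  end

  # Local file name: last path segment (query string ignored), prefixed with
  # the URL's index so images with the same base name don't collide.
  def self.file_name(url, index)
    base = File.basename(URI(url).path)
    base = 'image' if base.empty? || base == '/'
    "#{index}_#{base}"
  end

  # Download urls into dir using at most `workers` threads. Returns an array
  # of [url, error_message] pairs for the downloads that failed.
  def self.download_all(urls, dir, workers: 8,
                        fetch: ->(url) { URI.open(url, read_timeout: 30, &:read) })
    FileUtils.mkdir_p(dir)
    queue = Queue.new
    urls.each_with_index { |url, i| queue << [url, i] }
    queue.close # once drained, pop returns nil and each worker leaves its loop

    failed = []
    lock = Mutex.new
    Array.new([workers, urls.size].min) do
      Thread.new do
        while (job = queue.pop)
          url, index = job
          begin
            File.binwrite(File.join(dir, file_name(url, index)), fetch.call(url))
          rescue StandardError => e
            lock.synchronize { failed << [url, e.message] }
          end
        end
      end
    end.each(&:join)
    failed
  end
end
```

To wire this to the command-line task, the page could still be parsed as in the original, e.g. `srcs = Nokogiri::HTML(URI.open(page_url)).css('img').map { |i| i['src'] }`, followed by `ImageDownload.download_all(ImageDownload.resolve(page_url, srcs), dir)` with `page_url` and `dir` taken from `ARGV`. Unlike the original, `page_url` here must include the scheme (e.g. `https://example.com/`). A bounded pool keeps the number of simultaneous connections predictable; the best worker count depends on the server and network, so it is worth measuring rather than assuming.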