Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
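
As a minimal sketch of the "extract specific information from a web page" half of that definition, the following Python snippet pulls every link target out of an HTML string using only the standard library's `html.parser`. The names `LinkExtractor`, `extract_links`, and `SAMPLE_PAGE` are hypothetical and exist only for illustration; a real scraper would first fetch the page over HTTP and would likely need to handle malformed markup, relative URLs, and the site's terms of use.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it encounters (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href of each anchor in the given HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, standing in for a downloaded document.
SAMPLE_PAGE = """
<html><body>
  <a href="/questions/1">First question</a>
  <a name="anchor-without-href">No link here</a>
  <a href="https://example.com/tags">Tags</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

Anchors without an `href` attribute are skipped, so only real link targets end up in the result.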


12
votes
2 answers
71 views

Nokogiri crawler

The following code works but is a mess. Being totally new to Ruby, I have had big problems trying to refactor it into something resembling clean OOP code. Could you help with this and explain what ...
1
vote
1 answer
25 views

Cheat Code Scraper

During breaks, I find myself playing Emerald version a lot, and I was tired of having to use the school's slow Wi-Fi to access the internet. I wrote a scraper to obtain cheat codes and send them to my PSP ...
3
votes
1 answer
45 views

Clean up repeated file.writes, if/elses when adding keys to a dict

I'm getting familiar with Python and I'm still learning its tricks and idioms. Is there a better way to implement print_html() without the multiple calls to ...
3
votes
1 answer
31 views

Node PSP ISO Scraper

I recently bought a PSP and wanted to know the best ISO files, so I wrote a scraper to retrieve the titles of game ISOs that received a high rating and send them to a CSV file. Any recommendations as to ...
7
votes
1 answer
96 views

Improved minimal webcrawler - why is it so slow?

I recently made a webcrawler that I submitted here for a review: Minimal webcrawler - bad structure and error handling? With that help, I've made a much cleaner and better(?) webcrawler. The only ...
10
votes
2 answers
172 views

Spliterator implementation

I'm trying to post a little tutorial on the new Spliterator class. There are many tutorials these days on using streams starting from a standard Java collection, but ...
4
votes
1 answer
86 views

Web Crawler in Java

I've written a working web crawler in Java that finds the frequencies of words on web pages. I have two issues with it. The organization of my code in WebCrawler.java is terrible. Is there a way I ...
5
votes
1 answer
59 views

Reverse-engineering with Filepicker API

I have this script to pull data out of the Filepicker API internals. It's mostly reverse-engineering, and the code seems ugly to me. How can this be improved? ...
2
votes
0 answers
68 views

Parsing a website

The following is the code I wrote to download information about the different items on a page. I have one main website which has links to different items. I parse this main page to get the list. This is ...
1
vote
0 answers
42 views

scraping and saving using Arrays or Objects

I'm using Anemone to spider a website, and I am then using a set of rules specific to that website to find certain parameters. I feel like it's simple enough, but any attempt I make to save the ...
11
votes
3 answers
761 views

Minimal webcrawler - bad structure and error handling?

I wrote this code in one day as part of a job application, where they wanted me to make a minimal webcrawler in any language. The purpose was to crawl a site, find all of the URLs on that page, and ...
1
vote
0 answers
52 views

Optimize web-scraping of Moscow grocery website

This code works fine, but I believe it has optimization problems. Please review this. Also, please keep in mind that it stops after each iteration of the loop ...
16
votes
2 answers
178 views

We'll be counting stars

Lately, I've been, I've been losing sleep Dreaming about the things that we could be But baby, I've been, I've been praying hard, Said, no more counting dollars We'll be counting stars, yeah we'll be ...
2
votes
1 answer
53 views

Scraping thefreedictionary.com

Scrape results from thefreedictionary.com ...
4
votes
1 answer
318 views

A simple little Python web crawler

The crawler is in need of a mechanism that will dispatch threads based on network latency and system load. How does one keep track of network latency in Python without using system tools like ping? ...
2
votes
0 answers
73 views

Prototype spider for indexing RSS feeds

This code is super slow. I'm looking for advice on how to improve its performance. ...
10
votes
1 answer
100 views

Is this the Clojure way to web-scrape a book cover image?

Is there a way to write this better or in a more idiomatic Clojure way? Especially the last part with with-open and the let. Should I put the ...
5
votes
1 answer
1k views

Getting data correctly from <span> tag with beautifulsoup and regex

I am scraping an online shop page, trying to get the price mentioned on that page. The price appears in the following block: ...
6
votes
3 answers
140 views

HTTP scraper not clean and straightforwardly coded?

A job application of mine has been declined because the test project I submitted was not coded in a clean and straightforward way. Fine, but that's all the feedback I got. Since I like to ...
1
vote
0 answers
415 views

Script taking too long for curl request

The script below takes the list of provided URLs, scrapes the links present in each URL, and for each scraped link fb share, ...
1
vote
1 answer
185 views

URL and source page scraper

The code does seem a bit repetitive in places such as the parenturlscraper module and the childurlscraper module. Does anyone ...
4
votes
1 answer
204 views

Web scraper for job listings

Is there any room for improvement on this code? I use mechanize to get the links of a job listing web site. There are pages with pagination (when jobs > 25) and pages without. If there is, then the ...
2
votes
2 answers
3k views

Beautifulsoup scraper for sport events

I've written a simple scraper that parses HTML using BeautifulSoup and collects the data (schedule of sports events), then clubs them together in a list of dicts. The code works just fine, but the ...
2
votes
2 answers
152 views

HNews “ask section” page scraping Python script

Here is a small script I wrote to get the HNews ask section and display it without using a web browser. I'm just looking for feedback on how to improve my style/coding logic/overall code. ...