Newest 'web-scraping' Questions

12

votes

2 answers

71 views

Nokogiri crawler

The following code works but is a mess. But being totally new to Ruby I have had big problems trying to refactor it into something resembling clean OOP code. Could you help with this and explain what ...

asked 2 days ago

Dave Gordon
1635

1

vote

1 answer

25 views

Cheat Code Scraper

During breaks, I find myself playing Emerald version a lot and was tired of having to use the school's slow wifi to access the internet. I wrote a scraper to obtain cheat codes and send them to my PSP ...

regex bash web-scraping

asked Jul 2 at 15:57

Lucien Lachance
2659

3

votes

1 answer

45 views

Clean up repeated file.writes, if/elses when adding keys to a dict

I'm getting familiar with Python and I'm still learning its tricks and idioms. Is there a better way to implement print_html() without the multiple calls to ...

python html web-scraping

asked Jun 27 at 22:19

Creek
1184

3

votes

1 answer

31 views

Node PSP ISO Scraper

I recently bought a PSP and wanted to know the best ISO files, so I wrote a scraper to retrieve the titles of game ISOs that received a high rating and send them to a CSV. Any recommendations as to ...

javascript node.js web-scraping

asked Jun 23 at 16:58

Lucien Lachance
2659

7

votes

1 answer

96 views

Improved minimal webcrawler - why is it so slow?

I recently made a webcrawler that I submitted here for a review: Minimal webcrawler - bad structure and error handling? With that help, I've made a much cleaner and better(?) webcrawler. The only ...

python http web-scraping

asked Jun 19 at 23:41

bjornasm
955

10

votes

2 answers

172 views

Spliterator implementation

I'm trying to post a little tutorial on the new Spliterator class. There are many tutorials these days on using stream starting from a standard Java collection, but ...

java stream url web-scraping

asked May 29 at 21:49

trapo
1537

4

votes

1 answer

86 views

Web Crawler in Java

I've written a working web crawler in Java that finds the frequencies of words on web pages. I have two issues with it. The organization of my code in WebCrawler.java is terrible. Is there a way I ...

java web-scraping

asked May 22 at 0:17

Kyranstar
3106

5

votes

1 answer

59 views

Reverse-engineering with Filepicker API

I have this script to pull data out of the Filepicker API internal. It's mostly reverse-engineering and the code seems to be ugly to me. How can this be improved? ...

ruby ruby-on-rails curl web-scraping

asked May 15 at 18:39

bl0b
1187

2

votes

0 answers

68 views

Parsing a website

Following is the code I wrote to download the information of different items in a page. I have one main website which has links to different items. I parse this main page to get the list. This is ...

python parsing error-handling logging web-scraping

asked May 14 at 18:44

Pranav Raj
3029

1

vote

0 answers

42 views

scraping and saving using Arrays or Objects

I'm using Anemone to spider a website, then applying a set of rules specific to that website to find certain parameters. I feel like it's simple enough, but any attempt I make to save the ...

ruby array ruby-on-rails web-scraping

asked May 13 at 15:51

David Sigley
1062

11

votes

3 answers

761 views

Minimal webcrawler - bad structure and error handling?

I did this code over one day as a part of a job application, where they wanted me to make a minimal webcrawler in any language. The purpose was to crawl a site, find all of the URLs on that page, and ...

python web-scraping

asked May 11 at 20:16

bjornasm
955

1

vote

0 answers

52 views

Optimize web-scraping of Moscow grocery website

This code works fine, but I believe it has optimization problems. Please review this. Also, please keep in mind that it stops after each iteration of the loop ...

php mysqli curl geospatial web-scraping

asked Apr 29 at 23:10

Mubin
114

16

votes

2 answers

178 views

We'll be counting stars

Lately, I've been, I've been losing sleep Dreaming about the things that we could be But baby, I've been, I've been praying hard, Said, no more counting dollars We'll be counting stars, yeah we'll be ...

python python3 stackexchange web-scraping beautiful-soup

asked Apr 26 at 19:14

Simon André Forsberg
17.6k337116

2

votes

1 answer

53 views

Scraping thefreedictionary.com

Scrape results from thefreedictionary.com ...

python python-2.7 dictionary web-scraping

asked Apr 13 at 5:26

Ricky Wilson
1305

4

votes

1 answer

318 views

A simple little Python web crawler

The crawler is in need of a mechanism that will dispatch threads based on network latency and system load. How does one keep track of network latency in Python without using system tools like ping? ...

python multithreading http web-scraping

asked Apr 12 at 9:16

Ricky Wilson
1305

2

votes

0 answers

73 views

Prototype spider for indexing RSS feeds

This code is super slow. I'm looking for advice on how to improve its performance. ...

python performance web-scraping rss

asked Apr 6 at 17:48

Ricky Wilson
1305

10

votes

1 answer

100 views

Is this the Clojure way to web-scrape a book cover image?

Is there a better or more idiomatic Clojure way to write this? Especially the last part with with-open and the let. Should I put the ...

clojure web-scraping

asked Mar 20 at 13:47

Fu86
1535

5

votes

1 answer

1k views

Getting data correctly from <span> tag with beautifulsoup and regex

I am scraping an online shop page, trying to get the price mentioned in that page. In the following block the price is mentioned: ...

python regex unicode web-scraping beautiful-soup

asked Feb 2 at 7:53

avi
3129

6

votes

3 answers

140 views

HTTP scraper not clean and straightforwardly coded?

A job application of mine has been declined because the test project I submitted was not coded in a clean and straightforward way. Fine, but that's all the feedback I got. Since I like to ...

mvc objective-c interview-questions ios web-scraping

asked Jan 9 at 14:25

Mathijs
1332

1

vote

0 answers

415 views

Script taking too long for curl request

The script below takes the list of provided URLs and scrapes the links present in each URL, and for each scraped link fb share, ...

php performance php5 web-scraping

asked Jul 29 '13 at 11:11

curious_coder
1062

1

vote

1 answer

185 views

URL and source page scraper

The code does seem a bit repetitive in places such as the parenturlscraper module and the childurlscraper module. Does anyone ...

python url web-scraping

asked Feb 13 '13 at 14:36

thefragileomen
1233

4

votes

1 answer

204 views

Web scraper for job listings

Is there any room for improvement on this code? I use mechanize to get the links of a job listing web site. There are pages with pagination (when jobs > 25) and pages without. If there is, then the ...

optimization ruby web-scraping

asked Aug 11 '12 at 10:38

Cacofonix
211

2

votes

2 answers

3k views

Beautifulsoup scraper for sport events

I've written a simple scraper that parses HTML using BeautifulSoup and collects the data (schedule of sports events), then clubs them together in a list of dicts. The code works just fine, but the ...

python web-scraping beautiful-soup

asked Jul 12 '12 at 17:43

Manish Gill
1313

2

votes

2 answers

152 views

HNews “ask section” page scraping Python script

Here is a small script I wrote to get the HNews ask section and display them without using a web browser. I'm just looking for feedback on how to improve my style/coding logic/overall code. ...

python web-scraping

asked Dec 13 '11 at 23:41

Greg Brown
1253
