Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
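Both halves of that definition can be illustrated with a minimal, dependency-free sketch. The HTML below is a made-up stand-in for a fetched page (in practice you would download it first, e.g. with `urllib.request`); the extraction step uses only the standard library's `html.parser`:

```python
from html.parser import HTMLParser

# Stand-in for a downloaded page; the URLs and markup are hypothetical.
SAMPLE_HTML = """
<html><body>
  <a href="/page1">First</a>
  <a href="/page2">Second</a>
  <p>Not a link</p>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
print(parser.links)  # ['/page1', '/page2']
```

Most of the questions below are variations on this same loop: fetch a page, walk its markup, keep the pieces you care about.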
2 votes | 0 answers | 6 views
Compressing a blog into a preview using tumblr_api_read
Here is what I currently have working. I would like to make it look more aesthetically pleasing, so that it does not cut words off mid-word, and so that two of the previews are not so much larger than the others.
...
1 vote | 1 answer | 41 views
Crawl multiple pages at once
This is an update to my last question.
I want to process multiple pages at once, pulling URLs from tier_list in the crawl_web ...
3 votes | 3 answers | 99 views
Implementing a POC Async Web Crawler
I've created a small proof-of-concept web crawler to learn more about asynchrony in .NET.
Currently, when run, it crawls Stack Overflow with a fixed number of concurrent requests (workers).
I was ...
1 vote | 2 answers | 91 views
Basic search engine
I want to improve the efficiency of this search engine. It runs in about 10 seconds for a search depth of 1, but takes 4 minutes at a depth of 2, and so on.
I tried to give straightforward comments and variable names, any ...
0 votes | 1 answer | 42 views
Improving Watir::Browser for my needs
I want to:
use Watir::Browser methods without the browser. instance prefix
expand abilities of ...
2 votes | 2 answers | 76 views
Phone Number Extracting using RegEx And HtmlAgilityPack
I've written this code to extract cell numbers from a website. It extracts numbers perfectly but very slowly, and it also hangs my Form while extracting.
...
1 vote | 0 answers | 34 views
Using URLs and RegEx for web scraper from a dictionary [closed]
I have dozens of functions which GET/POST to some URLs and extract data using RegEx. The URLs and regular expressions were hard-coded earlier but now I moved all of them to a dictionary. I then saw ...
5 votes | 0 answers | 106 views
Clojure core.async web crawler
I'm currently a beginner with Clojure, and I thought I'd try building a web crawler with core.async.
What I have works, but I am looking for feedback on the following points:
How can I avoid using ...
2 votes | 1 answer | 56 views
Web scraper running extremely slow
I am making my first web scraper in Python. It works well but runs extremely slowly. The website loads in about 10 ms, but the scraper only processes about one page every couple of seconds. There are about 4-6 million ...
2 votes | 0 answers | 55 views
Rails app that scrapes forum using Nokogiri gem
I've built a website that scrapes a guitar forum's pages and populates a Rails model. I'm using a rake task along with the Heroku scheduler to run background scrapes every hour.
On the homepage, the forum ads ...
2 votes | 1 answer | 90 views
Getting rid of certain HTML tags
This code simply returns a small section of HTML and then strips all tags except for break tags.
It seems inefficient because you cannot search and replace with a Beautiful Soup object as ...
10 votes | 2 answers | 406 views
Scrape an HTML table with python
I think I'm on the right track, but any suggestions or critiques are welcome. The program just scrapes an HTML table and prints it to stdout.
...
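The table-to-stdout pipeline that question describes can be sketched with the standard library alone. The table below is hypothetical input standing in for the scraped page (the original presumably fetches its table from a live site), and the parser accumulates cells row by row:

```python
from html.parser import HTMLParser

# Hypothetical input; a real scraper would fetch this HTML from a URL.
TABLE_HTML = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>alice</td><td>10</td></tr>
  <tr><td>bob</td><td>7</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Accumulates <td>/<th> cell text into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell belongs to the row; skip inter-tag whitespace.
        if self._in_cell:
            self._row.append(data.strip())

parser = TableParser()
parser.feed(TABLE_HTML)
for row in parser.rows:
    print("\t".join(row))
```

Printing tab-separated rows keeps the stdout output trivially greppable and pasteable into a spreadsheet.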
2 votes | 0 answers | 118 views
Scraping HTML using PHP
Because a website I need data from doesn't have an API or RSS feed for their service status, I use a web scraper I built in PHP to grab the data I need and structure it as JSON. However, I want to ...
2 votes | 1 answer | 25 views
Find and select image files from webpage
For some reason, I feel like this is a bit messy and could be cleaner. Any suggestions?
I'm selecting any image files ending in .png or ...
13 votes | 2 answers | 162 views
Nokogiri crawler
The following code works but is a mess. Being totally new to Ruby, I have had big problems trying to refactor it into something resembling clean OOP code. Could you help with this and explain what ...
2 votes | 1 answer | 37 views
Cheat Code Scraper
During breaks, I find myself playing Emerald version a lot, and I was tired of having to use the school's slow wifi to access the internet. I wrote a scraper to obtain cheat codes and send them to my PSP ...
5 votes | 3 answers | 78 views
Clean up repeated file.writes, if/elses when adding keys to a dict
I'm getting familiar with Python and I'm still learning its tricks and idioms.
Is there a better way to implement print_html() without the multiple calls to ...
3 votes | 1 answer | 46 views
Node PSP ISO Scraper
I recently bought a PSP and wanted to know the best ISO files, so I wrote a scraper to retrieve the titles of game ISOs that received a high rating and write them to a CSV. Any recommendations as to ...
7 votes | 1 answer | 121 views
Improved minimal webcrawler - why is it so slow?
I recently made a webcrawler that I submitted here for a review:
Minimal webcrawler - bad structure and error handling?
With that help, I've made a much cleaner and better(?) webcrawler.
The only ...
10 votes | 2 answers | 577 views
Spliterator implementation
I'm trying to post a little tutorial on the new Spliterator class. There are many tutorials these days on using streams starting from a standard Java collection, but ...
4 votes | 1 answer | 388 views
Web Crawler in Java
I've written a working web crawler in Java that finds the frequencies of words on web pages. I have two issues with it.
The organization of my code in WebCrawler.java is terrible. Is there a way I ...
5 votes | 1 answer | 68 views
Reverse-engineering with Filepicker API
I have this script to pull data out of the internal Filepicker API. It's mostly reverse-engineering, and the code seems ugly to me. How can it be improved?
...
2 votes | 0 answers | 86 views
Parsing a website
Following is the code I wrote to download the information of different items in a page.
I have one main website which has links to different items. I parse this main page to get the list. This is ...
1 vote | 0 answers | 64 views
scraping and saving using Arrays or Objects
I'm using Anemone to spider a website; I am then using a set of rules specific to that website to find certain parameters.
I feel like it's simple enough, but any attempt I make to save the ...
11 votes | 3 answers | 803 views
Minimal webcrawler - bad structure and error handling?
I did this code over one day as a part of a job application, where they wanted me to make a minimal webcrawler in any language. The purpose was to crawl a site, find all of the URLs on that page, and ...
1 vote | 2 answers | 200 views
Number of Google search results over a period of time, saved to database
I am writing a Python script that scrapes data from Google search results and stores it in a database. I couldn't find any Google API for this, so I am just sending an HTTP GET request to Google's main ...
7 votes | 1 answer | 117 views
Optimize web-scraping of Moscow grocery website
This code works fine, but I believe it has optimization problems. Please review this.
Also, please keep in mind that it stops after each iteration of the loop ...
16 votes | 2 answers | 232 views
We'll be counting stars
Lately, I've been, I've been losing sleep
Dreaming about the things that we could be
But baby, I've been, I've been praying hard,
Said, no more counting dollars
We'll be counting stars, yeah we'll be ...
4 votes | 1 answer | 759 views
A simple little Python web crawler
The crawler is in need of a mechanism that will dispatch threads based on network latency and system load. How does one keep track of network latency
in Python without using system tools like ping?
...
2 votes | 0 answers | 98 views
Prototype spider for indexing RSS feeds
This code is super slow. I'm looking for advice on how to improve its performance.
...
3 votes | 1 answer | 122 views
Crawling for emails on websites given by Google API
I'm trying to build an app which crawls a website to find the emails that it has and prints them. I also want to allow the user to type "false" into the console when they want to skip the website ...
10 votes | 1 answer | 193 views
Is this the Clojure way to web-scrape a book cover image?
Is there a way to write this better, or in a more Clojure-like way? Especially the last part with with-open and the let. Should I put the ...
5 votes | 1 answer | 2k views
Getting data correctly from <span> tag with beautifulsoup and regex
I am scraping an online shop page, trying to get the price mentioned in that page. In the following block the price is mentioned:
...
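The question's actual HTML block is elided above, but the BeautifulSoup-plus-regex pattern it describes is common enough to sketch. The markup, class name, and currency format below are made-up stand-ins; the sketch uses plain `re` for both steps to stay dependency-free, where the original narrows down to the tag with BeautifulSoup first:

```python
import re

# Hypothetical markup standing in for the elided block; the class name
# "price" and the currency format are assumptions.
snippet = '<span class="price">$1,299.00</span>'

# Step 1: isolate the tag's text (the original would use BeautifulSoup here).
match = re.search(r'<span class="price">([^<]+)</span>', snippet)
price_text = match.group(1)

# Step 2: strip currency formatting and convert to a number.
price = float(price_text.replace("$", "").replace(",", ""))
print(price)  # 1299.0
```

Doing the tag isolation with a real HTML parser and saving the regex for the final text-to-number step is generally more robust than running a regex over the whole page.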
7 votes | 1 answer | 259 views
AngularJs and Google Bot experiment
I have looked into the problem of optimizing an Angular app for search engines, and was frustrated that the most commonly recommended option is prerendering HTML.
After some time spent, I suggested to ...
6 votes | 3 answers | 159 views
HTTP scraper not clean and straightforwardly coded?
A job application of mine has been declined because the test project I submitted was not coded in a clean and straightforward way.
Fine, but that's all the feedback I got. Since I like to ...
2 votes | 2 answers | 1k views
Scraping HTML using Beautiful Soup
I have written a script using Beautiful Soup to scrape some HTML, do some processing, and produce HTML back. However, I am not happy with my code, and I am looking for some improvements.
Structure of ...
1 vote | 1 answer | 536 views
Script taking too long for curl request
The below script takes the list of provided URLs and scrapes the present links in each URL and for each scraped link Facebook ...
5 votes | 2 answers | 242 views
Spreadsheet function that gives the number of Google indexed pages
I've developed this spreadsheet in order to scrape a website's number of indexed pages through Google and Google Spreadsheets.
I'm not a developer, so how can I improve this code in order to have ...
4 votes | 1 answer | 339 views
Craigslist search-across-regions script
I'm a JavaScript developer. I'm pretty sure that will be immediately apparent in the below code if for no other reason than the level/depth of chaining that I'm comfortable with. However, I'm learning ...
3 votes | 4 answers | 3k views
Download an image from a webpage
I am trying to write a Python script that downloads an image from a webpage. On the webpage (I am using NASA's picture of the day page), a new picture is posted every day, with a different file name. ...
1 vote | 1 answer | 209 views
URL and source page scraper
The code does seem a bit repetitive in places such as the parenturlscraper module and the childurlscraper module.
Does anyone ...
4 votes | 1 answer | 219 views
Web scraper for job listings
Is there any room for improvement on this code?
I use mechanize to get the links of a job listing web site. There are pages with pagination (when jobs > 25) and pages without.
If there is, then the ...
2 votes | 2 answers | 2k views
Download image links posted to reddit.com
This is a Python script to save imgur pictures posted to reddit.com forums. I'm looking for an assessment on the design of this script and any web security issues that might exist.
Obvious ...
2 votes | 2 answers | 3k views
Beautifulsoup scraper for sport events
I've written a simple scraper that parses HTML using BeautifulSoup and collects the data (a schedule of sports events), then groups it together in a list of dicts.
The code works just fine, but the ...
2 votes | 2 answers | 348 views
CR Stack Exchange crawler
I am writing a program which automatically crawls code from this site.
Would you please review my code?
The required .jars: jsoup, org.apache.commons.io.
Main.java:
...
2 votes | 1 answer | 270 views
HTML downloader and parser for CR
This program downloads a Code Review HTML file and parses it.
Could you review my program?
Main.java
...
5 votes | 2 answers | 1k views
Slow web-scraping geolocator
How do I make my Python program faster?
I have 3 suspects right now for it being so slow:
Maybe my computer is just slow
Maybe my Internet is too slow (sometimes my program has to download the html ...
2 votes | 2 answers | 161 views
HNews “ask section” page scraping Python script
Here is a small script I wrote to fetch the HNews ask section and display the posts without using a web browser. I'm just looking for feedback on how to improve my style, coding logic, and overall code.
...
13 votes | 1 answer | 978 views
Simple xkcd comic downloader
I'd really appreciate some harsh/constructive criticism of what I would consider as my first program in Haskell. The program should download all of the xkcd comics into a folder in the current ...