Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
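For instance, the extraction half of that definition can be sketched with nothing but Python's standard library; the inline HTML below stands in for a page body that would normally come from `urllib.request.urlopen(url).read()`:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every anchor tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# In a real scraper the markup would be the fetched response body;
# here a small inline document stands in for it.
html = '<p><a href="/one">one</a> <a href="/two">two</a></p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/one', '/two']
```

Many of the questions below reach for third-party parsers such as Beautiful Soup or Nokogiri instead, which cope better with malformed real-world markup.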

3 votes · 5 answers · 92 views

Finding the occurrences of all words in movie scripts

I was wondering if someone could tell me things I could improve in this code. This is one of my first Python projects. This program gets the script of a movie (in this case Interstellar) and then ...
3 votes · 0 answers · 38 views

Scraping efficiently with mechanize and bs4

I have written some code that scrapes data on asteroids, but the problem is that it is super slow! I understand that it has a lot to scrape, but as of now it has been running for 5 days and is not even a ...
0 votes · 1 answer · 46 views

Program to create list of all English Wikipedia articles

This program will scrape Wikipedia to create a list of all English Wikipedia articles. How can I improve this program, as it currently performs very poorly? On my Internet connection ...
7 votes · 3 answers · 103 views

RateBeer.com scraper

This was largely an exercise in making my code more Pythonic, especially in catching errors and doing things the right way. I opted to make the PageNotFound ...
2 votes · 1 answer · 261 views

Refactoring a Crawler

I've recently ported an old project and made it object-oriented. However, I've noticed that rubocop points out the following status: ...
1 vote · 1 answer · 133 views

Utilization of Steam APIs and web-scraping

Some background info here: this is a small fun project I made utilizing Steam APIs and web-scraping. This is the first time I've ever used Python, so I'm not very familiar with the language I used ...
5 votes · 1 answer · 89 views

Getting information of countries out of a website that isn't using consistent verbiage

From this website I needed to grab the information for each country and insert it into an Excel spreadsheet. My original plan was to use my program and search each website for the text and later ...
2 votes · 0 answers · 18 views

Compressing a blog into a preview using tumblr_api_read

Here is what I have currently working. I would like to make it look more aesthetically pleasing, so that it doesn't cut words off mid-word, and so that the two previews aren't so much larger than the other. ...
1 vote · 1 answer · 85 views

Crawl multiple pages at once

This is an update to my last question. I want to process multiple pages at once, pulling URLs from tier_list in the crawl_web ...
3 votes · 3 answers · 164 views

Implementing a POC Async Web Crawler

I've created a small proof-of-concept web crawler to learn more about asynchrony in .NET. Currently, when run, it crawls Stack Overflow with a fixed number of concurrent requests (workers). I was ...
1 vote · 2 answers · 111 views

Basic search engine

I want to improve the efficiency of this search engine. It runs in about 10 seconds at a search depth of 1, but takes 4 minutes at a depth of 2, and so on. I tried to give straightforward comments and variable names; any ...
2 votes · 2 answers · 139 views

Phone Number Extracting using RegEx And HtmlAgilityPack

I've written this code to extract cell numbers from a website. It extracts numbers perfectly, but very slowly, and it also hangs my Form while extracting. ...
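The question above uses C# with HtmlAgilityPack, but the regex half of the task is language-neutral; here is a sketch in Python, where both the pattern and the sample text are illustrative and a real site would likely need a pattern tuned to its formatting:

```python
import re

# Illustrative pattern: 10-digit numbers with optional -, . or space separators.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

text = "Call 555-123-4567 or 555.987.6543; office code 12345 is not a number."
phones = PHONE_RE.findall(text)
print(phones)  # ['555-123-4567', '555.987.6543']
```

Running the pattern over extracted text nodes rather than raw markup also avoids matching digits inside attributes or scripts.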
1 vote · 0 answers · 60 views

Using URLs and RegEx for web scraper from a dictionary [closed]

I have dozens of functions which GET/POST to some URLs and extract data using RegEx. The URLs and regular expressions were hard-coded earlier but now I moved all of them to a dictionary. I then saw ...
5 votes · 0 answers · 228 views

Clojure core.async web crawler

I'm currently a beginner with Clojure and I thought I'd try building a web crawler with core.async. What I have works, but I am looking for feedback on the following points: How can I avoid using ...
2 votes · 1 answer · 88 views

Web scraper running extremely slow

I am making my first web scraper in Python. It works great, but it runs extremely slowly. The website loads in about 10 ms, but the scraper only handles about one page every couple of seconds. There are about 4-6 million ...
2 votes · 0 answers · 85 views

Rails app that scrapes forum using Nokogiri gem

I've built a website that scrapes a guitar forum's pages and populates a Rails model. I'm using a rake task along with the Heroku scheduler to run background scrapes every hour. On the homepage, the forum ads ...
2 votes · 1 answer · 131 views

Getting rid of certain HTML tags

This code simply returns a small section of HTML code and then gets rid of all tags except for break tags. It seems inefficient because you cannot search and replace with a Beautiful Soup object as ...
10 votes · 2 answers · 779 views

Scrape an HTML table with python

I think I'm on the right track, but ANY suggestions or critiques are welcome. The program just scrapes an HTML table and prints it to stdout. ...
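A table-to-stdout scrape of this kind can be sketched with the standard library's event-driven parser; the inline markup is illustrative, standing in for a fetched page:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Flattens <tr>/<td>/<th> structure into a list of rows of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

html = ("<table><tr><th>name</th><th>views</th></tr>"
        "<tr><td>scraper</td><td>92</td></tr></table>")
scraper = TableScraper()
scraper.feed(html)
for row in scraper.rows:          # print tab-separated rows to stdout
    print("\t".join(row))
```

For ragged real-world tables (nested tags inside cells, colspans), a tree-based parser is usually less fragile than this streaming approach.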
2 votes · 0 answers · 198 views

Scraping HTML using PHP

Because a website I need data from doesn't have any API or RSS feed for their service status, I use a web scraper I built using PHP to grab the data I need and structure it as JSON. However, I want to ...
2 votes · 1 answer · 29 views

Find and select image files from webpage

For some reason, I feel like this is a bit messy and could be cleaner. Any suggestions? I'm selecting any image files ending in .png or ...
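Filtering for particular image extensions tends to come down to one expression; a sketch in Python, where the list of src attributes is hypothetical:

```python
import re

# Hypothetical src attributes pulled from a page's <img> tags.
srcs = ["logo.png", "banner.jpg", "chart.svg", "photo.PNG", "style.css"]

# Case-insensitive match on the extensions of interest, anchored to the end.
image_files = [s for s in srcs if re.search(r"\.(png|jpe?g)$", s, re.IGNORECASE)]
print(image_files)  # ['logo.png', 'banner.jpg', 'photo.PNG']
```

Anchoring with `$` and using `re.IGNORECASE` avoids both mid-path false positives and missed uppercase extensions.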
13 votes · 2 answers · 183 views

Nokogiri crawler

The following code works, but it's a mess. Being totally new to Ruby, I have had big problems trying to refactor it into something resembling clean OOP code. Could you help with this and explain what ...
2 votes · 1 answer · 38 views

Cheat Code Scraper

During breaks, I find myself playing Emerald version a lot and was tired of having to use the school's slow Wi-Fi to access the internet. I wrote a scraper to obtain cheat codes and send them to my PSP ...
5 votes · 3 answers · 79 views

Clean up repeated file.writes, if/elses when adding keys to a dict

I'm getting familiar with Python and I'm still learning its tricks and idioms. Is there a better way to implement print_html() without the multiple calls to ...
3 votes · 1 answer · 48 views

Node PSP ISO Scraper

I recently bought a PSP and wanted to know the best ISO files, so I wrote a scraper to retrieve the titles of game ISOs that received a high rating and send them to a CSV. Any recommendations as to ...
7 votes · 1 answer · 133 views

Improved minimal webcrawler - why is it so slow?

I recently made a webcrawler that I submitted here for a review: Minimal webcrawler - bad structure and error handling? With that help, I've made a much cleaner and better(?) webcrawler. The only ...
10 votes · 2 answers · 882 views

Spliterator implementation

I'm trying to post a little tutorial on the new Spliterator class. There are many tutorials these days on using streams starting from a standard Java collection, but ...
4 votes · 1 answer · 670 views

Web Crawler in Java

I've written a working web crawler in Java that finds the frequencies of words on web pages. I have two issues with it. The organization of my code in WebCrawler.java is terrible. Is there a way I ...
5 votes · 1 answer · 75 views

Reverse-engineering with Filepicker API

I have this script to pull data out of the internal Filepicker API. It's mostly reverse-engineering, and the code seems ugly to me. How can this be improved? ...
2 votes · 0 answers · 92 views

Parsing a website

Following is the code I wrote to download the information of different items in a page. I have one main website which has links to different items. I parse this main page to get the list. This is ...
1 vote · 0 answers · 71 views

scraping and saving using Arrays or Objects

I'm using Anemone to spider a website; I am then using a set of rules specific to that website to find certain parameters. I feel like it's simple enough, but any attempt I make to save the ...
11 votes · 3 answers · 840 views

Minimal webcrawler - bad structure and error handling?

I did this code over one day as a part of a job application, where they wanted me to make a minimal webcrawler in any language. The purpose was to crawl a site, find all of the URLs on that page, and ...
1 vote · 2 answers · 302 views

Number of Google search results over a period of time, saved to database

I am writing a Python script that scrapes data from Google search results and stores it in a database. I couldn't find any Google API for this, so I am just sending an HTTP GET request to Google's main ...
8 votes · 1 answer · 143 views

Optimize web-scraping of Moscow grocery website

This code works fine, but I believe it has optimization problems. Please review this. Also, please keep in mind that it stops after each iteration of the loop ...
17 votes · 2 answers · 437 views

We'll be counting stars

Lately, I've been, I've been losing sleep Dreaming about the things that we could be But baby, I've been, I've been praying hard, Said, no more counting dollars We'll be counting stars, yeah we'll be ...
2 votes · 1 answer · 71 views

Scraping thefreedictionary.com

Scrape results from thefreedictionary.com ...
4 votes · 1 answer · 978 views

A simple little Python web crawler

The crawler is in need of a mechanism that will dispatch threads based on network latency and system load. How does one keep track of network latency in Python without using system tools like ping? ...
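One stdlib-only way to approximate latency without shelling out to `ping` is to time a bare TCP handshake; this is a sketch, and the measured time includes DNS resolution when a hostname (rather than an IP) is passed:

```python
import socket
import time

def tcp_latency(host, port=80, timeout=3.0):
    """Rough round-trip estimate: time a plain TCP connect to host:port.
    Less precise than ICMP ping, but needs no system tools or privileges.
    Returns seconds elapsed, or None if the host was unreachable in time."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

A dispatcher could sample this periodically per host and scale its thread count down as the measured latency climbs.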
2 votes · 0 answers · 111 views

Prototype spider for indexing RSS feeds

This code is super slow. I'm looking for advice on how to improve its performance. ...
3 votes · 1 answer · 144 views

Crawling for emails on websites given by Google API

I'm trying to build an app which crawls a website to find the emails that it has and prints them. I also want to allow the user to type "false" into the console when they want to skip the website ...
10 votes · 1 answer · 273 views

Is this the Clojure way to web-scrape a book cover image?

Is there a way to write this better, or in a more idiomatic Clojure way? Especially the last part with with-open and the let. Should I put the ...
5 votes · 1 answer · 3k views

Getting data correctly from <span> tag with beautifulsoup and regex

I am scraping an online shop page, trying to get the price mentioned in that page. In the following block the price is mentioned: ...
7 votes · 1 answer · 305 views

AngularJs and Google Bot experiment

I have studied the problem of optimizing an Angular app for search engines, and was frustrated that the most recommended option is prerendering HTML. After some time spent, I suggested to ...
6 votes · 3 answers · 165 views

HTTP scraper not clean and straightforwardly coded?

A job application of mine has been declined because the test project I submitted was not coded in a clean and straightforward way. Fine, but that's all the feedback I got. Since I like to ...
2 votes · 2 answers · 1k views

Scraping HTML using Beautiful Soup

I have written a script using Beautiful Soup to scrape some HTML and do some stuff and produce HTML back. However, I am not convinced with my code and I am looking for some improvements. Structure of ...
1 vote · 1 answer · 600 views

Script taking too long for curl request

The script below takes the list of provided URLs and scrapes the links present in each URL, and for each scraped link Facebook ...
5 votes · 2 answers · 270 views

Spreadsheet function that gives the number of Google indexed pages

I've developed this spreadsheet in order to scrape a website's number of indexed pages through Google and Google Spreadsheets. I'm not a developer, so how can I improve this code in order to have ...
4 votes · 1 answer · 345 views

Craigslist search-across-regions script

I'm a JavaScript developer. I'm pretty sure that will be immediately apparent in the below code if for no other reason than the level/depth of chaining that I'm comfortable with. However, I'm learning ...
3 votes · 4 answers · 4k views

Download an image from a webpage

I am trying to write a Python script that downloads an image from a webpage. On the webpage (I am using NASA's picture of the day page), a new picture is posted every day, with different file names. ...
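The download step itself is straightforward with the standard library; the commented-out URL is hypothetical, since the picture-of-the-day file name changes daily and a real script would first scrape the page for the current `<img>` src:

```python
import urllib.request
from pathlib import Path

def download_image(url, dest):
    """Fetch url and write the raw response body to dest; returns bytes written."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    Path(dest).write_bytes(data)
    return len(data)

# Hypothetical usage once the current image URL has been scraped from the page:
#   download_image("https://apod.nasa.gov/apod/image/<today>.jpg", "apod.jpg")
```

Writing the raw bytes rather than decoded text matters here, since image data is binary.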
1 vote · 1 answer · 222 views

URL and source page scraper

The code does seem a bit repetitive in places such as the parenturlscraper module and the childurlscraper module. Does anyone ...
4 votes · 1 answer · 224 views

Web scraper for job listings

Is there any room for improvement in this code? I use mechanize to get the links from a job-listing web site. There are pages with pagination (when jobs > 25) and pages without. If there is, then the ...
2 votes · 2 answers · 2k views

Download image links posted to reddit.com

This is a Python script to save imgur pictures posted to reddit.com forums. I'm looking for an assessment on the design of this script and any web security issues that might exist. Obvious ...