Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
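In its simplest form, a scraper fetches a page and parses the markup for the pieces it wants. A minimal sketch using only Python's standard library (the sample HTML and the link-extraction goal are illustrative assumptions, not taken from any question below — real scrapers usually reach for libraries like Beautiful Soup):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# In a real scraper the HTML would come from urllib.request.urlopen(url).read();
# a static snippet keeps the sketch self-contained and offline.
html = '<p>See <a href="/docs">docs</a> and <a href="/faq">faq</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/docs', '/faq']
```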
5 votes, 0 answers, 57 views
Python program that scrapes my CS teacher's website
I am new to programming, and I'm looking forward to seeing what I can do to improve my code.
I've been working on creating an individual final project for my Python CS class that checks my teacher's ...
4 votes, 0 answers, 29 views
Crawling and parsing meteorological data from the web into R
I am interested in collecting, directly into R, data published by the Mexican Met Office. The data are spread across several URLs, but one can start here. There I can get the names and ...
4 votes, 2 answers, 288 views
Amazon web scraper
I am trying to improve my programming and design skills (poor at the moment), so I created a small, working Amazon scraper. I would be very grateful if you could ...
2 votes, 2 answers, 64 views
Web-scraper for a larger program
I have a web scraper that I use as part of a larger program. However, I feel like my code repeats itself a lot and takes up a lot of room. Is there any way I can condense this code?
...
2 votes, 1 answer, 32 views
Scraping through product pages
I'm working through a scraping function where pages of results lead to product pages. I've added a default maximum number of results pages, and pages per set of results, to prevent a simple mistake ...
4 votes, 2 answers, 90 views
Press any login button on any site
I'm working on a script that will be able to press the login button on any site for an app I'm working on. I have it working (still a few edge cases to work out such as multiple submit buttons and ...
3 votes, 0 answers, 76 views
Pure Python script that saves an HTML page with all images
Here is a pure Python script that saves an HTML page, without CSS but with all the images on it, and replaces each href with the image's path on the hard drive.
I know that there are great libraries like ...
4 votes, 3 answers, 52 views
Searching for a string in a downloaded PDF
This code goes to the website containing the PDF, downloads the PDF, and converts it to text. Finally, it reads this whole file (over 5,000 lines) into a list, line by line, and searches for ...
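The excerpt above describes converting a PDF to text and then searching it line by line. A minimal sketch of the search step (the download and the PDF-to-text conversion, which need external tools such as pdftotext, are assumed already done — the sample text here is illustrative):

```python
def find_matches(lines, needle):
    """Yield (line_number, line) pairs for lines containing needle.

    Streaming over the lines with a generator avoids holding the
    whole converted PDF text (5,000+ lines) in memory as one list.
    """
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            yield lineno, line.rstrip("\n")

# Plain text stands in for the output of the PDF-to-text conversion.
text = "alpha\nbeta needle\ngamma\nneedle delta\n"
matches = list(find_matches(text.splitlines(True), "needle"))
print(matches)  # [(2, 'beta needle'), (4, 'needle delta')]
```

With a real file, the same generator can consume `open(path)` directly, so the file is never fully read into a list.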
4 votes, 3 answers, 53 views
Displaying sorted results of a web crawl
The issue I have with this class is that most of the methods are almost the same. I would like this code to be more Pythonic.
Note: I plan on replacing all the ...
4 votes, 2 answers, 259 views
Trivago hotels price checker
I've decided to write my first project in Python, and I would like to hear some opinions from you.
Description of the script:
Generate Trivago URLs for 5 star hotels in specified city.
Scrape these URLs ...
5 votes, 1 answer, 45 views
Print the list of winter bash 2014 hats as a list of checkboxes in GFM format
In Winter Bash 2014,
since there is no easy way to see the hats I'm missing per site,
I decided to use Gists for that.
A perhaps not so well-known feature of the GitHub Flavored Markdown (GFM) format ...
4 votes, 2 answers, 92 views
Retrieving stock prices
It takes around 5-8 seconds for me to retrieve a previously-closed stock price and a dividend rate from US Yahoo! Finance. If I wanted to retrieve 10+ stock prices, it would take me more than a minute ...
4 votes, 5 answers, 122 views
Finding the occurrences of all words in movie scripts
I was wondering if someone could tell me things I could improve in this code. This is one of my first Python projects. This program gets the script of a movie (in this case Interstellar) and then ...
3 votes, 0 answers, 101 views
Scraping efficiently with mechanize and bs4
I have written some code that scrapes data on asteroids, but the problem is that it is super slow! I understand that it has a lot to scrape, but as of now it has been running for 5 days and is not even a ...
0 votes, 1 answer, 60 views
Program to create list of all English Wikipedia articles
This program will scrape Wikipedia to create a list of all English Wikipedia articles.
How can I improve this program? It currently performs very badly; on my Internet connection ...
7 votes, 3 answers, 130 views
RateBeer.com scraper
This was largely an exercise in making my code more Pythonic, especially in catching errors and doing things the right way.
I opted to make the PageNotFound ...
3 votes, 1 answer, 1k views
Refactoring a Crawler
I've recently ported an old project and made it object-oriented. However, I've noticed that RuboCop points out the following status: ...
1 vote, 1 answer, 263 views
Utilization of Steam APIs and web-scraping
Some background info here:
This is a small fun project I made utilizing Steam APIs and web-scraping
This is the first time I've ever used Python, so I'm not very familiar with the language
I used ...
5 votes, 1 answer, 91 views
Getting information of countries out of a website that isn't using consistent verbiage
From this website I needed to grab the information for each country and insert it into an Excel spreadsheet.
My original plan was to use my program and search each website for the text and later ...
2 votes, 0 answers, 28 views
Compressing a blog into a preview using tumblr_api_read
Here is what I have currently working. I would like to make it look more aesthetically pleasing: not cutting words off in mid-word, and not having the two previews be so much larger than the other.
...
1 vote, 1 answer, 209 views
Crawl multiple pages at once
This an update to my last question.
I want to process multiple pages at once pulling URLs from tier_list in the crawl_web ...
3 votes, 3 answers, 332 views
Implementing a POC Async Web Crawler
I've created a small proof of concept web crawler to learn more about asynchrony in .NET.
Currently, when run, it crawls Stack Overflow with a fixed number of concurrent requests (workers).
I was ...
1 vote, 2 answers, 137 views
Basic search engine
I want to improve the efficiency of this search engine. It works in about 10 seconds for a search depth of 1, but takes 4 minutes at depth 2, etc.
I tried to give straightforward comments and variable names, any ...
2 votes, 2 answers, 210 views
Phone Number Extracting using RegEx And HtmlAgilityPack
I've written this code to extract cell numbers from a website. It extracts the numbers perfectly, but very slowly, and it also hangs my form while extracting.
...
6 votes, 1 answer, 366 views
Clojure core.async web crawler
I'm currently a beginner with Clojure, and I thought I'd try building a web crawler with core.async.
What I have works, but I am looking for feedback on the following points:
How can I avoid using ...
2 votes, 1 answer, 144 views
Web scraper running extremely slow
I am making my first web scraper in Python. It works great, but it runs extremely slowly. The website loads in about 10 ms, but the scraper only processes about one page every couple of seconds. There are about 4-6 million ...
2 votes, 0 answers, 133 views
Rails app that scrapes forum using Nokogiri gem
I've built a website that scrapes a guitar forum's pages and populates a Rails model. I'm using a rake task along with the Heroku scheduler to run background scrapes every hour.
On the homepage, the forum ads ...
2 votes, 1 answer, 299 views
Getting rid of certain HTML tags
This code simply returns a small section of HTML code and then gets rid of all tags except for break tags.
It seems inefficient because you cannot search and replace with a Beautiful Soup object as ...
11 votes, 2 answers, 2k views
Scrape an HTML table with python
I think I'm on the right track, but any suggestions or critiques are welcome. The program just scrapes an HTML table and prints it to stdout.
...
2 votes, 0 answers, 299 views
Scraping HTML using PHP
Because a website I need data from doesn't have an API or RSS feed for their service status, I use a web scraper I built in PHP to grab the data I need and structure it as JSON. However, I want to ...
5 votes, 1 answer, 2k views
Instagram bot script
I'm very new to Python and would like some feedback on my script. I'm fairly clueless about best practices, code correctness, etc., so if there's anything at all that looks wrong, isn't 'Pythonic', or could ...
2 votes, 1 answer, 34 views
Find and select image files from webpage
For some reason, I feel like this is a bit messy and could be cleaner. Any suggestions?
I'm selecting any image files ending in .png or ...
13 votes, 2 answers, 238 views
Nokogiri crawler
The following code works, but it is a mess. Being totally new to Ruby, I have had big problems trying to refactor it into something resembling clean OOP code. Could you help with this and explain what ...
2 votes, 1 answer, 38 views
Cheat Code Scraper
During breaks, I find myself playing Emerald version a lot, and I was tired of having to use the school's slow Wi-Fi to access the internet. I wrote a scraper to obtain cheat codes and send them to my PSP ...
5 votes, 3 answers, 81 views
Clean up repeated file.writes, if/elses when adding keys to a dict
I'm getting familiar with Python, and I'm still learning its tricks and idioms.
Is there a better way to implement print_html() without the multiple calls to ...
3 votes, 1 answer, 55 views
Node PSP ISO Scraper
I recently bought a PSP and wanted to know the best ISO files, so I wrote a scraper to retrieve the titles of game ISOs that received a high rating and send them to a CSV. Any recommendations as to ...
7 votes, 1 answer, 152 views
Improved minimal webcrawler - why is it so slow?
I recently made a webcrawler that I submitted here for a review:
Minimal webcrawler - bad structure and error handling?
With that help, I've made a much cleaner and better(?) webcrawler.
The only ...
10 votes, 2 answers, 1k views
Spliterator implementation
I'm trying to post a little tutorial on the new Spliterator class. There are many tutorials these days on using streams starting from a standard Java collection, but ...
4 votes, 1 answer, 1k views
Web Crawler in Java
I've written a working web crawler in Java that finds the frequencies of words on web pages. I have two issues with it.
The organization of my code in WebCrawler.java is terrible. Is there a way I ...
5 votes, 1 answer, 87 views
Reverse-engineering with Filepicker API
I have this script to pull data out of the internal Filepicker API. It's mostly reverse-engineering, and the code seems ugly to me. How can this be improved?
...
2 votes, 0 answers, 96 views
Parsing a website
Following is the code I wrote to download the information for different items on a page.
I have one main website which has links to different items. I parse this main page to get the list. This is ...
1 vote, 0 answers, 86 views
scraping and saving using Arrays or Objects
I'm using Anemone to spider a website, and I am then using a set of rules specific to that website to find certain parameters.
I feel like it's simple enough, but any attempt I make to save the ...
11 votes, 3 answers, 880 views
Minimal webcrawler - bad structure and error handling?
I did this code over one day as a part of a job application, where they wanted me to make a minimal webcrawler in any language. The purpose was to crawl a site, find all of the URLs on that page, and ...
1 vote, 2 answers, 462 views
Number of Google search results over a period of time, saved to database
I am writing a Python script that scrapes data from Google search results and stores it in a database. I couldn't find any Google API for this, so I am just sending an HTTP GET request to Google's main ...
8 votes, 1 answer, 165 views
Optimize web-scraping of Moscow grocery website
This code works fine, but I believe it has optimization problems. Please review this.
Also, please keep in mind that it stops after each iteration of the loop ...
26 votes, 2 answers, 536 views
We'll be counting stars
Lately, I've been, I've been losing sleep
Dreaming about the things that we could be
But baby, I've been, I've been praying hard,
Said, no more counting dollars
We'll be counting stars, yeah we'll be ...
4 votes, 1 answer, 1k views
A simple little Python web crawler
The crawler is in need of a mechanism that will dispatch threads based on network latency and system load. How does one keep track of network latency in Python without using system tools like ping?
...
2 votes, 0 answers, 127 views
Prototype spider for indexing RSS feeds
This code is super slow. I'm looking for advice on how to improve its performance.
...
3 votes, 1 answer, 162 views
Crawling for emails on websites given by Google API
I'm trying to build an app which crawls a website to find the emails that it has and prints them. I also want to allow the user to type "false" into the console when they want to skip the website ...