Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
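A minimal sketch of the extraction side of that definition, using only Python's standard-library `html.parser`; the HTML snippet and the choice of `<h2>` tags are invented for illustration:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside every <h2> tag."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.titles.append(data.strip())

html = "<h2>First post</h2><p>body</p><h2>Second post</h2>"
parser = TitleExtractor()
parser.feed(html)
print(parser.titles)  # ['First post', 'Second post']
```

A real scraper would first fetch the page (e.g. with `urllib.request`) and feed the response body to the parser.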
3 votes · 0 answers · 30 views
Web scraping from the Google Play store
I am using this R function to web-scrape data from the Google Play store. Is there a way to increase its efficiency using R? This code takes about 4 seconds for 14 URLs with my machine/internet ...
4 votes · 1 answer · 26 views
Web scraping Bing wallpapers
I wanted to scrape all the wallpapers from the Bing wallpaper gallery. This was for personal use and to learn about web scraping. The gallery progressively loads images using JavaScript as the user ...
9 votes · 2 answers · 78 views
Financial Data From Webqueries in Excel
I'm new (to CR and to programming in general). I wrote my first VBA on Monday. This is my first working project. It takes a bunch of financial data from a company called Financial Analytics and a ...
2 votes · 1 answer · 44 views
Scraping a table from Texas Dept. of Criminal Justice website
First, I'll start off with my script:
...
5 votes · 2 answers · 81 views
Downloading stock information from Yahoo! Finance
The program downloads stock information from Yahoo! Finance and displays it in the spreadsheet. On my Mac the program takes 10 minutes to get data for approximately 4000 stocks and on the PC it takes ...
2 votes · 0 answers · 21 views
OOP Web scraper using regex to grab tag contents
I'm learning about implementing the SOLID principles in PHP.
I want to create a simple content crawler/grabber for some websites. This crawler will grab the content from the website URL. Since we ...
3 votes · 2 answers · 50 views
Google News scraper to fetch links with similar stories
The following code takes either a URL or the title of an existing news article.
Searches Google News using the title.
Collects all links from search results.
...
9 votes · 1 answer · 352 views
Soup of the day: best served during election season
Community moderator elections on the Stack Exchange network are really exciting.
Alas, on the page of the primaries, I find it mildly annoying that candidates are randomly reordered on every page ...
8 votes · 2 answers · 535 views
Web crawler that filters out non-diseases
It is very messy and I lack the experience to make it eloquent, so I wanted some help with it. The processing time is also very slow.
Currently it goes into the first page and goes through the links in ...
3 votes · 3 answers · 140 views
Web crawler uses lots of memory
I am developing a web crawler application. When I run the program for more than 3 hours, the program runs out of memory. I would need to run the program for 2-3 days non-stop to get the results I ...
6 votes · 1 answer · 58 views
Wikipedia Table Scraper
I created this small script to strip the data out of tables that have hyperlinks as their <th /> elements. I was hoping to get input on code clarity and ...
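The pattern described above, pulling data out of table header cells that contain hyperlinks, can be sketched with nothing but the standard library; the table fragment below is a made-up stand-in for a real Wikipedia table:

```python
from html.parser import HTMLParser

class HeaderLinkScraper(HTMLParser):
    """Collect (text, href) pairs for <a> tags nested inside <th> cells."""
    def __init__(self):
        super().__init__()
        self.in_th = False
        self.current_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th = True
        elif tag == "a" and self.in_th:
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.links.append((data.strip(), self.current_href))

table = ('<table><tr>'
         '<th><a href="/wiki/Python">Python</a></th>'
         '<th><a href="/wiki/Ruby">Ruby</a></th>'
         '</tr></table>')
scraper = HeaderLinkScraper()
scraper.feed(table)
print(scraper.links)  # [('Python', '/wiki/Python'), ('Ruby', '/wiki/Ruby')]
```

Real Wikipedia markup is messier (nested tags, attributes on `th`), which is why libraries like BeautifulSoup or lxml are usually preferred for this job.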
5 votes · 2 answers · 152 views
Script that parses the Chase.com page for my recent payments
I wrote this code earlier to parse Chase.com's online transactions page. It's written in WinForms.
stepBtn is a button that starts this.
...
2 votes · 2 answers · 85 views
Scraper for words from Wiktionary
I wrote this code in Java using the Jaunt library. The program scrapes all words on Wiktionary in the category "English_uncountable_nouns", and then saves each word to a text file.
I am not sure that ...
5 votes · 1 answer · 142 views
YouTube Search Result Scraper
This is a program I wrote in Python using the BeautifulSoup library. The program scrapes YouTube search results for a given query and extracts data from the channels returned in the search results.
...
2 votes · 0 answers · 145 views
Web Scraping with Python + asyncio
I've been working at speeding up my web scraping with the asyncio library. I have a working solution, but am unsure as to how pythonic it is or if I am properly using the library. Any input would be ...
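A common shape for this kind of asyncio scraper is `gather` plus a semaphore to cap concurrency. This sketch fakes the network call with `asyncio.sleep`; a real version would substitute an HTTP client such as aiohttp:

```python
import asyncio

async def fetch(url, sem):
    """Placeholder fetch: a real scraper would make an HTTP request here.
    The semaphore caps how many requests are in flight at once."""
    async with sem:
        await asyncio.sleep(0.01)  # simulates network latency
        return f"<html>content of {url}</html>"

async def scrape_all(urls, max_concurrency=5):
    # One shared semaphore bounds concurrency across all tasks.
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(10)]
pages = asyncio.run(scrape_all(urls))
print(len(pages))  # 10
```

`gather` preserves input order, so `pages[i]` corresponds to `urls[i]` even though the fetches complete out of order.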
3 votes · 0 answers · 44 views
Using Nokogiri to scrape Oscars winners from Wikipedia
I am scraping a Wikipedia page, getting info from that page and instantiating a new object with the information collected:
...
1 vote · 0 answers · 49 views
BeautifulSoup web spider for driver links
The following spider will grab some driver links, the OS version, and the name.
All the info is in a table class, but some pages might differ a little in the location and number of cells in each ...
1 vote · 1 answer · 61 views
Formatting HTML for use in a locally hosted iframe
This formats HTML for use in a locally hosted iframe so that you can manipulate the content in the iframe freely, without running into cross domain issues. It uses Goutte to retrieve the HTML. I'd ...
7 votes · 1 answer · 82 views
Parsing a Wikipedia page for a country
The program should accept the name of a country as input. It should then parse the Wikipedia page for that country, find all links to the Wikipedia pages of other countries on that page, and make a ...
2 votes · 1 answer · 108 views
Web-scraping Reddit Bot
I have been working on a web-scraping Reddit bot in Python 2.7 with the premise of going to /r/eve (a game sub-reddit) finding posts that contain a link to a website hosting killmail information ...
7 votes · 1 answer · 184 views
Scraping scores from flashscore.com
I built a bot with Python to scrape scores from flashscore.com, but the data scraped from the site loads into its listbox very slowly. I am curious about the speed of Selenium, so I made a button that prints ...
1 vote · 0 answers · 133 views
Using BeautifulSoup to scrape various tables and combine in a .csv file
A page contains a table of links, each link contains a table relevant to the link (a subject). Create a list of these links to pass to the function called ...
1 vote · 1 answer · 89 views
College web-scraping LinkedIn once-off test
I'm trying to use a Ruby LinkedIn scraper for a college project where I have to demonstrate web scraping on 20 names for their name, title, etc. I've never used Ruby before, but this gem seems reasonably ...
1 vote · 1 answer · 33 views
Extracts marks of all students in class from website
This code extracts marks of all students in class and stores the result in a file results.txt, using BeautifulSoup. I'm looking for code review and suggestions.
...
2 votes · 1 answer · 299 views
Download stock data from Yahoo Finance
This Python 3.4 script downloads stock data and puts it into an Excel file.
...
1 vote · 0 answers · 96 views
Web scraping using CasperJS
Here are a few lines written for web scraping using CasperJS. The code does what it should, but how can I improve it? For example, how can I make it more reusable? It would also be nice if I could remove ...
8 votes · 1 answer · 195 views
Scraping my CS teacher's website, then emailing me when the site is updated
I've been working on creating an individual final project for my python CS class that checks my teacher's website on a daily basis and determines if he's changed any of the web pages on his website ...
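One low-memory way to answer "has the page changed?" is to store only a hash of each page and compare it on the next visit. This sketch assumes the fetching (e.g. `urllib.request`) and emailing (e.g. `smtplib`) parts are handled elsewhere; the page bytes below are invented:

```python
import hashlib

def fingerprint(page_bytes):
    """Hash the raw page so changes can be detected without storing it."""
    return hashlib.sha256(page_bytes).hexdigest()

# A real checker would fetch each page on a schedule, compare today's
# fingerprint with the stored one, and send an email when they differ.
old = fingerprint(b"<html>Homework: ch. 3</html>")
new = fingerprint(b"<html>Homework: ch. 4</html>")
print(old != new)  # True
```

A caveat: pages with dynamic content (timestamps, ad markup) will hash differently on every fetch, so it can help to hash only the extracted text rather than the raw HTML.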
4 votes · 0 answers · 145 views
Crawling and parsing meteorological data from the web into R
I am interested in collecting directly into R data published by the Mexican Met-office. The data pieces are spread through several URLs, but one can start here. There I can get the names and ...
4 votes · 2 answers · 623 views
Amazon web scraper
I am trying to improve my programming and programming design skills (poor at the moment). I created a small Amazon scraper program. It is a working program. I would be very grateful if you could ...
2 votes · 2 answers · 106 views
Web-scraper for a larger program
I have a web scraper that I use in a part of a larger program. However, I feel like I semi-repeat my code a lot and take up a lot of room. Is there any way I can condense this code?
...
2 votes · 1 answer · 67 views
Scraping through product pages
I'm working through a scraping function where pages of results lead to product pages. I've added a default maximum number of results pages, and pages per set of results, to prevent a simple mistake ...
4 votes · 2 answers · 183 views
Press any login button on any site
I'm working on a script that will be able to press the login button on any site for an app I'm working on. I have it working (still a few edge cases to work out such as multiple submit buttons and ...
6 votes · 2 answers · 490 views
Pure Python script that saves an HTML page with all images
Here is a pure Python script that saves an HTML page without CSS but with all images on it and replaces all hrefs with a path of an image on the hard drive.
I know that there are great libraries like ...
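One piece of that task, mapping each image URL found on a page to a local file name, can be sketched as below; the downloading and href rewriting are omitted, and the sample page and folder name are illustrative:

```python
import os
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

class ImageCollector(HTMLParser):
    """Record the src attribute of every <img> tag."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)

def local_name(url, folder="images"):
    """Map a remote image URL to a path on the local disk."""
    return os.path.join(folder, posixpath.basename(urlparse(url).path))

page = '<img src="http://example.com/a/logo.png"><img src="/pics/photo.jpg">'
collector = ImageCollector()
collector.feed(page)
mapping = {src: local_name(src) for src in collector.srcs}
print(mapping)
```

The full script would then download each URL in `mapping`, save it under the local name, and rewrite the `src` attributes accordingly.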
4 votes · 3 answers · 164 views
Searching for a string in a downloaded PDF
This code goes to the website containing the PDF, downloads the PDF, then it converts this PDF to text. Finally, it reads this whole file (Over 5000 lines) into a list, line by line, and searches for ...
4 votes · 3 answers · 58 views
Displaying sorted results of a web crawl
The issue I have with this class is that most of the methods are almost the same. I would like for this code to be more pythonic.
Note: I plan on replacing all the ...
4 votes · 2 answers · 459 views
Trivago hotels price checker
I've decided to write my first project in Python. I would like to hear some opinions from you.
Description of the script:
Generate Trivago URLs for 5-star hotels in a specified city.
Scrape these URLs ...
5 votes · 1 answer · 49 views
Print the list of Winter Bash 2014 hats as a list of checkboxes in GFM format
In Winter Bash 2014,
since there is no easy way to see the hats I'm missing per site,
I decided to use Gists for that.
A perhaps not so well-known feature of GitHub Flavored Markdown (GFM) format ...
6 votes · 2 answers · 191 views
Retrieving stock prices
It takes around 5-8 seconds for me to retrieve a previously-closed stock price and a dividend rate from US Yahoo! Finance. If I wanted to retrieve 10+ stock prices, it would take me more than a minute ...
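When most of that time is spent waiting on the network, fetching quotes in parallel with a thread pool usually helps. This sketch replaces the real HTTP call with a stub (`fetch_quote` and its fixed price are placeholders, not a real quote API):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_quote(symbol):
    """Stand-in for the real HTTP request to a quote service."""
    return symbol, 100.0  # a real version would return the parsed price

symbols = ["AAPL", "MSFT", "GOOG", "IBM"]
# Threads overlap the network waits, so 10+ symbols take roughly as
# long as the single slowest request rather than the sum of all of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    quotes = dict(pool.map(fetch_quote, symbols))
print(quotes["AAPL"])  # 100.0
```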
5 votes · 5 answers · 228 views
Finding the occurrences of all words in movie scripts
I was wondering if someone could tell me things I could improve in this code. This is one of my first Python projects. This program gets the script of a movie (in this case Interstellar) and then ...
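The core counting step of such a program can be sketched with `collections.Counter`; the tokenizing regex and the sample text are arbitrary choices, not taken from the question:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count case-insensitive word occurrences in a script."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

script = ("Do not go gentle into that good night. "
          "Rage, rage against the dying of the light.")
freq = word_frequencies(script)
print(freq["rage"], freq["the"])  # 2 2
```

`Counter.most_common(n)` then gives the top-n words directly, which covers the usual "most frequent words" output of this kind of project.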
5 votes · 1 answer · 246 views
Scraping efficiently with mechanize and bs4
I have written some code that scrapes data on asteroids, but the problem is that it is super slow! I understand that it has a lot to scrape, but as of now it has been running for 5 days and is not even a ...
0 votes · 1 answer · 80 views
Program to create list of all English Wikipedia articles
This program will scrape Wikipedia to create a list of all English Wikipedia articles.
How can I improve this program as it currently performs very badly performance-wise? On my Internet connection ...
7 votes · 3 answers · 199 views
RateBeer.com scraper
This was largely an exercise in making my code more Pythonic, especially in catching errors and doing things the right way.
I opted to make the PageNotFound ...
6 votes · 1 answer · 4k views
Refactoring a Crawler
I've recently ported an old project and made it object-oriented. However, I've noticed that rubocop points out the following status: ...
1 vote · 1 answer · 475 views
Utilization of Steam APIs and web-scraping
Some background info here:
This is a small fun project I made utilizing Steam APIs and web-scraping
This is the first time I've ever used Python, so I'm not very familiar with the language
I used ...
5 votes · 1 answer · 98 views
Getting information of countries out of a website that isn't using consistent verbiage
From this website I needed to grab the information for each country and insert it into an Excel spreadsheet.
My original plan was to use my program and search each website for the text and later ...
2 votes · 0 answers · 73 views
Compressing a blog into a preview using tumblr_api_read
Here is what I have currently working. I would like to make it look more aesthetically pleasing, so it doesn't cut words off mid-word. I'd also rather the two previews not be so much larger than the others.
...
1 vote · 1 answer · 458 views
Crawl multiple pages at once
This is an update to my last question.
I want to process multiple pages at once, pulling URLs from tier_list in the crawl_web ...
3 votes · 3 answers · 765 views
Implementing a POC Async Web Crawler
I've created a small proof-of-concept web crawler to learn more about asynchrony in .NET.
Currently, when run, it crawls Stack Overflow with a fixed number of concurrent requests (workers).
I was ...
1 vote · 2 answers · 192 views
Basic search engine
I want to improve the efficiency of this search engine. It runs in about 10 seconds at a search depth of 1, but 4 minutes at depth 2, etc.
I tried to give straightforward comments and variable names; any ...
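The blow-up with depth is typical of breadth-first crawling, since each extra level multiplies the number of pages fetched. A depth-limited BFS over a toy link graph (the graph and page names here are invented; a real engine would fetch each URL and extract its outgoing links) looks like:

```python
from collections import deque

# Toy link graph standing in for real pages.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}

def crawl(seed, max_depth):
    """Breadth-first crawl; max_depth bounds how far from the seed we go,
    and the seen set prevents re-fetching pages that link to each other."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand links beyond the depth limit
        for nxt in LINKS.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen

print(sorted(crawl("a", 1)))  # ['a', 'b', 'c']
print(sorted(crawl("a", 2)))  # ['a', 'b', 'c', 'd', 'e']
```

The seen set keeps each page fetched at most once, so the runtime grows with the number of distinct pages reachable within the depth limit rather than the raw number of links.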