Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
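A minimal sketch of the extraction side of that definition, using only Python's standard-library `html.parser`; the HTML snippet and the choice of `<h2>` tags are invented for illustration:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside every <h2> tag."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.titles.append(data.strip())

html = "<h2>First post</h2><p>body</p><h2>Second post</h2>"
parser = TitleExtractor()
parser.feed(html)
print(parser.titles)  # ['First post', 'Second post']
```

A real scraper would first fetch the page (e.g. with `urllib.request`) and feed the response body to the parser.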
3 votes · 0 answers · 30 views
Web scraping from the Google Play store
I am using this R function to web-scrape data from the Google Play store. Is there a way to increase its efficiency using R? This code takes about 4 seconds for 14 URLs with my machine/internet ...
4 votes · 1 answer · 26 views
Web scraping Bing wallpapers
I wanted to scrape all the wallpapers from the Bing wallpaper gallery. This was for personal use and to learn about web scraping. The gallery progressively loads images using JavaScript as the user ...
9 votes · 2 answers · 78 views
Financial Data From Webqueries in Excel
I'm new (to CR and to programming in general). I wrote my first VBA on Monday. This is my first working project. It takes a bunch of financial data from a company called Financial Analytics and a ...
2 votes · 1 answer · 44 views
Scraping a table from Texas Dept. of Criminal Justice website
First, I'll start off with my script:
...
5 votes · 2 answers · 81 views
Downloading stock information from Yahoo! Finance
The program downloads stock information from Yahoo! Finance and displays it in the spreadsheet. On my Mac the program takes 10 minutes to get data for approximately 4000 stocks and on the PC it takes ...
2 votes · 0 answers · 21 views
OOP Web scraper using regex to grab tag contents
I'm learning about implementing the SOLID principles in PHP.
I want to create a simple content crawler/grabber for some websites. This crawler will grab the content from the website URL. Since we ...
3 votes · 2 answers · 50 views
Google News scraper to fetch links with similar stories
The following code takes either a URL or the title of an existing news article.
Searches Google News using the title.
Collects all links from search results.
...
9 votes · 1 answer · 352 views
Soup of the day: best served during election season
Community moderator elections on the Stack Exchange network are really exciting.
Alas, on the page of the primaries, I find it mildly annoying that candidates are randomly reordered on every page ...
8 votes · 2 answers · 535 views
Web crawler that filters out non-diseases
It is very messy and I lack the experience to make it eloquent, so I wanted some help with it. The processing time is also very slow.
Currently it goes into the first page and goes through the links in ...
3 votes · 3 answers · 140 views
Web crawler uses lots of memory
I am developing a web crawler application. When I run the program for more than 3 hours, the program runs out of memory. I would need to run the program for 2-3 days non-stop to get the results I ...
6 votes · 1 answer · 58 views
Wikipedia Table Scraper
I created this small script to strip the data out of tables that have hyperlinks as their <th /> elements. I was hoping to get input on code clarity and ...
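The pattern described above, pulling data out of table header cells that contain hyperlinks, can be sketched with nothing but the standard library; the table fragment below is a made-up stand-in for a real Wikipedia table:

```python
from html.parser import HTMLParser

class HeaderLinkScraper(HTMLParser):
    """Collect (text, href) pairs for <a> tags nested inside <th> cells."""
    def __init__(self):
        super().__init__()
        self.in_th = False
        self.current_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th = True
        elif tag == "a" and self.in_th:
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.links.append((data.strip(), self.current_href))

table = ('<table><tr>'
         '<th><a href="/wiki/Python">Python</a></th>'
         '<th><a href="/wiki/Ruby">Ruby</a></th>'
         '</tr></table>')
scraper = HeaderLinkScraper()
scraper.feed(table)
print(scraper.links)  # [('Python', '/wiki/Python'), ('Ruby', '/wiki/Ruby')]
```

Real Wikipedia markup is messier (nested tags, attributes on `th`), which is why libraries like BeautifulSoup or lxml are usually preferred for this job.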
5 votes · 2 answers · 152 views
Script that parses the Chase.com page for my recent payments
I wrote this code earlier to parse Chase.com's online transactions page. It's written in WinForms.
stepBtn is a button that starts this.
...
2 votes · 2 answers · 85 views
Scraper for words from Wiktionary
I wrote this code in Java using the Jaunt library. The program scrapes all words on Wiktionary in the category "English_uncountable_nouns", and then saves each word to a text file.
I am not sure that ...
5 votes · 1 answer · 142 views
YouTube Search Result Scraper
This is a program I wrote in Python using the BeautifulSoup library. The program scrapes YouTube search results for a given query and extracts data from the channels returned in the search results.
...
2 votes · 0 answers · 145 views
Web Scraping with Python + asyncio
I've been working at speeding up my web scraping with the asyncio library. I have a working solution, but am unsure as to how pythonic it is or if I am properly using the library. Any input would be ...
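A common shape for this kind of asyncio scraper is `gather` plus a semaphore to cap concurrency. This sketch fakes the network call with `asyncio.sleep`; a real version would substitute an HTTP client such as aiohttp:

```python
import asyncio

async def fetch(url, sem):
    """Placeholder fetch: a real scraper would make an HTTP request here.
    The semaphore caps how many requests are in flight at once."""
    async with sem:
        await asyncio.sleep(0.01)  # simulates network latency
        return f"<html>content of {url}</html>"

async def scrape_all(urls, max_concurrency=5):
    # One shared semaphore bounds concurrency across all tasks.
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(10)]
pages = asyncio.run(scrape_all(urls))
print(len(pages))  # 10
```

`gather` preserves input order, so `pages[i]` corresponds to `urls[i]` even though the fetches complete out of order.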
3 votes · 0 answers · 44 views
Using Nokogiri to scrape Oscars winners from Wikipedia
I am scraping a Wikipedia page, getting info from that page and instantiating a new object with the information collected:
...
1 vote · 0 answers · 49 views
BeautifulSoup web spider for driver links
The following spider will grab some driver links, the OS version, and the name.
All the info is in a table class, but some pages might differ a little in the location and number of cells in each ...
1 vote · 1 answer · 61 views
Formatting HTML for use in a locally hosted iframe
This formats HTML for use in a locally hosted iframe so that you can manipulate the content in the iframe freely, without running into cross domain issues. It uses Goutte to retrieve the HTML. I'd ...
7 votes · 1 answer · 82 views
Parsing a Wikipedia page for a country
The program should accept the name of a country as input. It should then parse the Wikipedia page for that country, find all links to the Wikipedia pages of other countries on that page, and make a ...
2 votes · 1 answer · 108 views
Web-scraping Reddit Bot
I have been working on a web-scraping Reddit bot in Python 2.7 with the premise of going to /r/eve (a game sub-reddit) finding posts that contain a link to a website hosting killmail information ...
7 votes · 1 answer · 184 views
Scraping scores from flashscore.com
I built a bot with Python to scrape scores from flashscore.com, but the data scraped from the site loads into its listbox very slowly. I am curious about the speed of Selenium, so I made a button that prints ...
1 vote · 0 answers · 133 views
Using BeautifulSoup to scrape various tables and combine in a .csv file
A page contains a table of links, each link contains a table relevant to the link (a subject). Create a list of these links to pass to the function called ...
1 vote · 1 answer · 89 views
College web-scraping LinkedIn once-off test
I'm trying to use a Ruby LinkedIn scraper for a college project where I have to demonstrate web scraping on 20 names for their name, title, etc. I've never used Ruby before, but this gem seems reasonably ...
1 vote · 1 answer · 33 views
Extracts marks of all students in class from website
This code extracts marks of all students in class and stores the result in a file results.txt, using BeautifulSoup. I'm looking for code review and suggestions.
...
2 votes · 1 answer · 299 views
Download stock data from Yahoo Finance
This Python 3.4 script downloads stock data and puts it into an Excel file.
...
1 vote · 0 answers · 96 views
Web scraping using CasperJS
Here are a few lines written for web scraping using CasperJS. The code does what it should, but how can I improve it? For example, how can I make it more reusable? It would also be nice if I could remove ...
8 votes · 1 answer · 195 views
Scraping my CS teacher's website, then emailing me when the site is updated
I've been working on creating an individual final project for my python CS class that checks my teacher's website on a daily basis and determines if he's changed any of the web pages on his website ...
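One low-memory way to answer "has the page changed?" is to store only a hash of each page and compare it on the next visit. This sketch assumes the fetching (e.g. `urllib.request`) and emailing (e.g. `smtplib`) parts are handled elsewhere; the page bytes below are invented:

```python
import hashlib

def fingerprint(page_bytes):
    """Hash the raw page so changes can be detected without storing it."""
    return hashlib.sha256(page_bytes).hexdigest()

# A real checker would fetch each page on a schedule, compare today's
# fingerprint with the stored one, and send an email when they differ.
old = fingerprint(b"<html>Homework: ch. 3</html>")
new = fingerprint(b"<html>Homework: ch. 4</html>")
print(old != new)  # True
```

A caveat: pages with dynamic content (timestamps, ad markup) will hash differently on every fetch, so it can help to hash only the extracted text rather than the raw HTML.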
4 votes · 0 answers · 145 views
Crawling and parsing meteorological data from the web into R
I am interested in collecting directly into R data published by the Mexican Met-office. The data pieces are spread through several URLs, but one can start here. There I can get the names and ...
4 votes · 2 answers · 623 views
Amazon web scraper
I am trying to improve my programming and programming design skills (poor at the moment). I created a small Amazon scraper program. It is a working program. I would be very grateful if you could ...
2 votes · 2 answers · 106 views
Web-scraper for a larger program
I have a web scraper that I use in a part of a larger program. However, I feel like I semi-repeat my code a lot and take up a lot of room. Is there any way I can condense this code?
...
2 votes · 1 answer · 67 views
Scraping through product pages
I'm working through a scraping function where pages of results lead to product pages. I've added a default maximum number of results pages, and pages per set of results, to prevent a simple mistake ...
4 votes · 2 answers · 183 views
Press any login button on any site
I'm working on a script that will be able to press the login button on any site for an app I'm working on. I have it working (still a few edge cases to work out such as multiple submit buttons and ...
6 votes · 2 answers · 490 views
Pure Python script that saves an HTML page with all images
Here is a pure Python script that saves an HTML page without CSS but with all images on it and replaces all hrefs with a path of an image on the hard drive.
I know that there are great libraries like ...
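One piece of that task, mapping each image URL found on a page to a local file name, can be sketched as below; the downloading and href rewriting are omitted, and the sample page and folder name are illustrative:

```python
import os
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

class ImageCollector(HTMLParser):
    """Record the src attribute of every <img> tag."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)

def local_name(url, folder="images"):
    """Map a remote image URL to a path on the local disk."""
    return os.path.join(folder, posixpath.basename(urlparse(url).path))

page = '<img src="http://example.com/a/logo.png"><img src="/pics/photo.jpg">'
collector = ImageCollector()
collector.feed(page)
mapping = {src: local_name(src) for src in collector.srcs}
print(mapping)
```

The full script would then download each URL in `mapping`, save it under the local name, and rewrite the `src` attributes accordingly.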
4 votes · 3 answers · 164 views
Searching for a string in a downloaded PDF
This code goes to the website containing the PDF, downloads the PDF, then it converts this PDF to text. Finally, it reads this whole file (Over 5000 lines) into a list, line by line, and searches for ...
4 votes · 3 answers · 58 views
Displaying sorted results of a web crawl
The issue I have with this class is that most of the methods are almost the same. I would like for this code to be more pythonic.
Note: I plan on replacing all the ...
4 votes · 2 answers · 459 views
Trivago hotels price checker
I've decided to write my first project in Python. I would like to hear some opinions from you.
Description of the script:
Generate Trivago URLs for 5-star hotels in a specified city.
Scrape these URLs ...
5 votes · 1 answer · 49 views
Print the list of Winter Bash 2014 hats as a list of checkboxes in GFM format
In Winter Bash 2014,
since there is no easy way to see the hats I'm missing per site,
I decided to use Gists for that.
A perhaps not so well-known feature of GitHub Flavored Markdown (GFM) format ...
6 votes · 2 answers · 191 views
Retrieving stock prices
It takes around 5-8 seconds for me to retrieve a previously-closed stock price and a dividend rate from US Yahoo! Finance. If I wanted to retrieve 10+ stock prices, it would take me more than a minute ...
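When most of that time is spent waiting on the network, fetching quotes in parallel with a thread pool usually helps. This sketch replaces the real HTTP call with a stub (`fetch_quote` and its fixed price are placeholders, not a real quote API):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_quote(symbol):
    """Stand-in for the real HTTP request to a quote service."""
    return symbol, 100.0  # a real version would return the parsed price

symbols = ["AAPL", "MSFT", "GOOG", "IBM"]
# Threads overlap the network waits, so 10+ symbols take roughly as
# long as the single slowest request rather than the sum of all of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    quotes = dict(pool.map(fetch_quote, symbols))
print(quotes["AAPL"])  # 100.0
```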
5 votes · 5 answers · 228 views
Finding the occurrences of all words in movie scripts
I was wondering if someone could tell me things I could improve in this code. This is one of my first Python projects. This program gets the script of a movie (in this case Interstellar) and then ...
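The core counting step of such a program can be sketched with `collections.Counter`; the tokenizing regex and the sample text are arbitrary choices, not taken from the question:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count case-insensitive word occurrences in a script."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

script = ("Do not go gentle into that good night. "
          "Rage, rage against the dying of the light.")
freq = word_frequencies(script)
print(freq["rage"], freq["the"])  # 2 2
```

`Counter.most_common(n)` then gives the top-n words directly, which covers the usual "most frequent words" output of this kind of project.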
5 votes · 1 answer · 246 views
Scraping efficiently with mechanize and bs4
I have written some code that scrapes data on asteroids, but the problem is that it is super slow! I understand that it has a lot to scrape, but as of now it has been running for 5 days and is not even a ...
0 votes · 1 answer · 80 views
Program to create list of all English Wikipedia articles
This program will scrape Wikipedia to create a list of all English Wikipedia articles.
How can I improve this program as it currently performs very badly performance-wise? On my Internet connection ...
7 votes · 3 answers · 199 views
RateBeer.com scraper
This was largely an exercise in making my code more Pythonic, especially in catching errors and doing things the right way.
I opted to make the PageNotFound ...
6 votes · 1 answer · 4k views
Refactoring a Crawler
I've recently ported an old project and made it object-oriented. However, I've noticed that rubocop points out the following status: ...
1 vote · 1 answer · 475 views
Utilization of Steam APIs and web-scraping
Some background info here:
This is a small fun project I made utilizing Steam APIs and web-scraping
This is the first time I've ever used Python, so I'm not very familiar with the language
I used ...
5 votes · 1 answer · 98 views
Getting information of countries out of a website that isn't using consistent verbiage
From this website I needed to grab the information for each country and insert it into an Excel spreadsheet.
My original plan was to use my program and search each website for the text and later ...
2 votes · 0 answers · 73 views
Compressing a blog into a preview using tumblr_api_read
Here is what I have currently working. I would like to make it look more aesthetically pleasing, so it doesn't cut words off mid-word. I'd also rather the two previews not be so much larger than the others.
...
1 vote · 1 answer · 458 views
Crawl multiple pages at once
This is an update to my last question.
I want to process multiple pages at once, pulling URLs from tier_list in the crawl_web ...
3 votes · 3 answers · 765 views
Implementing a POC Async Web Crawler
I've created a small proof-of-concept web crawler to learn more about asynchrony in .NET.
Currently, when run, it crawls Stack Overflow with a fixed number of concurrent requests (workers).
I was ...
1 vote · 2 answers · 192 views
Basic search engine
I want to improve the efficiency of this search engine. It runs in about 10 seconds at a search depth of 1, but 4 minutes at depth 2, etc.
I tried to give straightforward comments and variable names; any ...
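The blow-up with depth is typical of breadth-first crawling, since each extra level multiplies the number of pages fetched. A depth-limited BFS over a toy link graph (the graph and page names here are invented; a real engine would fetch each URL and extract its outgoing links) looks like:

```python
from collections import deque

# Toy link graph standing in for real pages.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}

def crawl(seed, max_depth):
    """Breadth-first crawl; max_depth bounds how far from the seed we go,
    and the seen set prevents re-fetching pages that link to each other."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand links beyond the depth limit
        for nxt in LINKS.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen

print(sorted(crawl("a", 1)))  # ['a', 'b', 'c']
print(sorted(crawl("a", 2)))  # ['a', 'b', 'c', 'd', 'e']
```

The seen set keeps each page fetched at most once, so the runtime grows with the number of distinct pages reachable within the depth limit rather than the raw number of links.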