Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.


1 vote · 0 answers · 35 views

Scraping the date of most recent post from various social media services

I have a large spreadsheet (csvToRead in my code). Each line includes (among other things I don't care about): the URL of a social media account (not always ...

2 votes · 0 answers · 30 views

Finding shortest paths in a Wikipedia article graph using Java

I have this sort of a web crawler that asks for two (English) Wikipedia article titles (the source and the target), and proceeds to compute the shortest path between the two. My code is as follows: ...
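
The shortest-path search this question describes is, at heart, a breadth-first search over the article link graph. A minimal sketch in Python (not the asker's Java code), using a toy `graph` and an injected `get_links` function as stand-ins for real Wikipedia requests:

```python
from collections import deque

def shortest_path(source, target, get_links):
    """Breadth-first search from source to target.

    get_links(title) must return the titles linked from a page;
    it is injected here so the search logic stays testable offline.
    """
    if source == target:
        return [source]
    queue = deque([source])
    parent = {source: None}
    while queue:
        page = queue.popleft()
        for link in get_links(page):
            if link in parent:
                continue  # already visited
            parent[link] = page
            if link == target:
                # walk the parent chain back to the source
                path = [link]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(link)
    return None  # target unreachable

# toy link graph standing in for live Wikipedia requests
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_path("A", "D", lambda t: graph.get(t, [])))  # ['A', 'B', 'D']
```

Because links are followed level by level, the first time the target is reached the path is guaranteed to be shortest in number of hops.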

7 votes · 0 answers · 89 views

The YouTube crawler

I have coded a program to scrape YouTube data (for educational purposes). When the link of a channel is entered, it scrapes the channel name, the description of the channel, the videos posted by the ...

1 vote · 2 answers · 42 views

Crawling SPOJ through cURL and C++

I am trying to write industry-standard code. https://www.quora.com/How-do-I-follow-a-user-on-Spoj-for-solving-problems-Refer-Details Someone gave me this A2A, and I wrote this code for it ...

2 votes · 1 answer · 38 views

Get data from a web page, with client code, using F#

I want to extract a price for a single index fund, the price of which is available on dynamic web pages. Being new to this, my original idea was to download the single page of static HTML and get ...

1 vote · 0 answers · 24 views

Scrape Google Play apps page and store application details in a database

Below is a Python script which scrapes a specific Google Play apps URL, for example https://play.google.com/store/apps/details?id=com.mojang.minecraftpe, and saves the ...

3 votes · 2 answers · 433 views

Web crawler in F#

I have been writing a web crawler in F# that downloads pages with stylesheets and scripts. Can somebody give me suggestions on improving this code, please? Would appreciate any feedback that could ...

2 votes · 1 answer · 32 views

Web scraping with VBA

I have this code that fetches rates from a website called X-Rates and outputs to Excel the monthly averages of a chosen country. The code runs quite fast, but I still think there are improvements to ...

6 votes · 2 answers · 193 views

Fast(er) web scraping with VBA

I have code that fetches rates from a website called X-Rates and outputs to Excel the monthly averages of a chosen country. The code runs quite fast, but I still think I could improve the code a ...

0 votes · 0 answers · 15 views

SQL vulnerability pentesting tool written in Ruby

I've written a program that scrapes websites for SQL vulnerabilities (it only searches; it does not exploit). I would like some critique on what I've done. Is there a way I can write the ...

5 votes · 2 answers · 65 views

Steam Community Market Strange Part Scraper

A while back I wrote a simple little PHP script that searches the Steam community market for any TF2 strange weapons with strange parts on the first page of results for that weapon type. It works by ...

6 votes · 0 answers · 77 views

Node.js parallel file download, the ES6 way

I wrote a script that downloads all PDFs found on the web page of a particular government agency. I would have chosen bash for such a task, but I want the script to ...

0 votes · 0 answers · 36 views

Web-scraping library

Here are two functions that make a request for a given URL, take the response body (HTML), and load it into the cheerio library: scrapeListing.js ...

2 votes · 2 answers · 66 views

Java web scraping robots

I am developing an application that goes through 2 websites and gets all the articles, but my code is identical in most parts. Is there a way to optimize this code? (TL and DN are the naming ...

2 votes · 1 answer · 36 views

Wikipedia indexer and shortest link finder

I have the following code, how can I make it more efficient? Also, it doesn't always find the shortest route. (See Cat -> Tree) ...

0 votes · 0 answers · 69 views

Web crawler with Python and the asyncio library

I am trying to experiment with Python 3.5 async/await and the whole asyncio library. I tried ...

3 votes · 2 answers · 77 views

Parsing HTML from multiple webpages simultaneously

My friend wrote a scraper in Go that takes the results from a house listing webpage and finds listings for houses that he's interested in. The initial search returns listings, they are filtered by ...
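
Fetching several pages at once, as this scraper does, maps naturally onto a thread pool. A sketch in Python (not the asker's Go) using `concurrent.futures`, with a placeholder `fetch` standing in for real HTTP requests:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder fetch: a real version would use urllib.request or a
    # similar HTTP client; returning a fake body keeps the sketch
    # runnable offline.
    return "<html>%s</html>" % url

def fetch_all(urls, workers=8):
    """Fetch several pages concurrently; results come back in the
    same order as urls, even though requests overlap in time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))

pages = fetch_all(["http://example.com/1", "http://example.com/2"])
```

Threads suit this workload because each worker spends most of its time blocked on network I/O rather than computing.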

0 votes · 1 answer · 121 views

CRAP index (56) in Web Scraper Engine

I am working on a Web Scraper for the first time using Test-Driven Development; however, I have ended up with a huge CRAP (Change Risk Anti-Patterns) index (56) and I cannot seem to find a ...

4 votes · 0 answers · 70 views

Web scraping with Nokogiri

At work we need to know which printers are running dangerously low on toner, paper, etc. So I've created a program that pulls the printer information off the websites the ...

0 votes · 0 answers · 40 views

Email a notification when detecting changes on a website - follow-up

I read through other questions here and improved the code and added a new feature. The old question can be found at: Email a notification when detecting changes on a website The improvements that are ...

19 votes · 5 answers · 960 views

An OEIS Lookup tool in Python

I'm from PPCG so I was making an esolang and I decided to write it in Python. Eventually it went from an esolang to an OEIS (Online Encyclopedia of Integer Sequences) lookup tool. I'm very new to ...

0 votes · 0 answers · 25 views

Extracting articles mentioned in comments in three GET requests

I need to make several HTTP GET requests and do the following: when all of them have completed, I need to parse each HTML response; after HTML parsing, or in case of any error, I need to set a flag ...

5 votes · 3 answers · 281 views

Email a notification when detecting changes on a website

The text of a website is checked at a given time interval. If there are any changes, a mail is sent. There is an option to show/mail the new parts of the website. What could be improved? ...
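
The core of such a change detector is comparing a stored fingerprint of the page text against a fresh one. A minimal sketch of the hashing step only; a real version would fetch the page over HTTP and send mail via `smtplib`:

```python
import hashlib

def fingerprint(text):
    """Stable digest of a page's text; store it between runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def has_changed(old_digest, new_text):
    """True when the page text no longer matches the stored digest."""
    return fingerprint(new_text) != old_digest

stored = fingerprint("price: 10 EUR")
print(has_changed(stored, "price: 10 EUR"))  # False
print(has_changed(stored, "price: 12 EUR"))  # True
```

Storing only the digest (rather than the full page) keeps state small, at the cost of not being able to show *which* parts changed; showing the new parts requires keeping the previous text and diffing.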

5 votes · 1 answer · 104 views

Newest Reddit submissions grabber

My program does exactly what I want it to do and it works well. However, I feel like it's very clunky. I'd like my code to be more efficient. By that I mean, I'd like it to accomplish what it already ...

2 votes · 2 answers · 86 views

PHP crawler to collect comments on articles

I have code that parses through web pages, finds comments, and saves the comment info in a DB. I have an array where all the necessary pages are stored. I iterate through all these pages one by one and ...

1 vote · 0 answers · 38 views

Checking paginated website for new entries

I'm interested in determining the best way to check a paginated website for new entries. I want to be able to scrape pages 1, 2, 3, ... as necessary to get all updates. However, the scraping is fairly ...
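
One common strategy for this problem is to walk pages in order, collecting entries until the newest already-seen entry appears, so only as many pages are scraped as necessary. A sketch with a hypothetical injected `fetch_page` that returns entry ids, keeping the stopping logic testable offline:

```python
def new_entries(fetch_page, last_seen_id):
    """Walk pages 1, 2, 3, ... collecting entries until we hit the
    newest entry we already know about, or run out of pages."""
    found, page = [], 1
    while True:
        entries = fetch_page(page)  # injected: newest entries first
        if not entries:
            return found  # no more pages
        for entry in entries:
            if entry == last_seen_id:
                return found  # everything older is already known
            found.append(entry)
        page += 1

# fake paginated site: newest entries first, three per page
site = {1: [9, 8, 7], 2: [6, 5, 4]}
print(new_entries(lambda p: site.get(p, []), 6))  # [9, 8, 7]
```

After each run, the caller would store the first id it collected as the new `last_seen_id` for the next check.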

3 votes · 0 answers · 69 views

Scraping links from the first page of Google using Kivy

I'm making a scraper/web crawler using Kivy. When I run the code it works, but I'm not sure whether what I'm doing is Pythonic, because all the documentation I can find is about using the Kivy library. I'm unsure ...

3 votes · 1 answer · 69 views

Factory pattern in F# for a web scraper

I'm trying to learn F# by creating a little web scraper that will do custom scraping based on the URL domain. For this, I need to create and select the correct kind of scraper. I figured I would use a ...

0 votes · 1 answer · 46 views

Implementation of bridge design pattern for a web scraping app - follow-up

Earlier today I tried to implement an example of the bridge design pattern, but I ended up misinterpreting it. I made a lot of changes: ...

0 votes · 1 answer · 53 views

Implementation of Bridge Design Pattern

I made an implementation of the Bridge pattern to handle the ever-changing crawler APIs that I'm using in my app. ...

2 votes · 0 answers · 33 views

HTML Scraper for Plex downloads page

I have written a scraper in Python 3 using Beautiful Soup 4 to retrieve the latest version of Plex Media Server from https://plex.tv, and I'd like some feedback on how to improve it. The HTML the ...

4 votes · 2 answers · 200 views

Page Scraper and DOM manipulator

This code is a page scraper using HtmlAgilityPack that creates a DOM document upon construction and allows for node manipulation afterward. HtmlAgilityPack uses XPath Selectors for selecting nodes. ...

6 votes · 1 answer · 38 views

Helper functions to extract SEDE query results into more user-friendly format

This Python module contains helper functions to download the result page of SEDE queries and extract columns from it, most prominently: ...

14 votes · 2 answers · 168 views

OLog Userscript - Logging messages, planets and researches

For the online text-based browser game OGame, I am working on an application whose aim is to assist the users where possible. For this I have a server-side part and a client-side part, the respective ...

4 votes · 1 answer · 115 views

Web crawler that charts stock ticker data using matplotlib

I've built a web crawler using the BeautifulSoup library that pulls stock ticker data from CSV files on Yahoo finance, and charts the data using ...

14 votes · 2 answers · 117 views

Getting to Wikipedia's “Philosophy” article using Python

On Wikipedia, if you click the first non-italicised internal link in the main text of an article that's not within parentheses, and then repeat the process, you usually end up on the "Philosophy" ...
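
The "first link" walk described above is a simple iteration with cycle detection: follow one link per page, stop at the target, a dead end, a repeat, or a hop limit. A sketch using a toy link map in place of parsing live Wikipedia pages:

```python
def chain_to(start, target, first_link, limit=100):
    """Follow first_link(title) repeatedly until target, a dead end,
    a cycle, or the hop limit is reached; returns the visited chain."""
    chain, seen = [start], {start}
    while chain[-1] != target and len(chain) < limit:
        nxt = first_link(chain[-1])
        if nxt is None or nxt in seen:
            break  # dead end or loop
        chain.append(nxt)
        seen.add(nxt)
    return chain

# toy first-link map standing in for parsed Wikipedia pages
links = {"Cat": "Animal", "Animal": "Science", "Science": "Philosophy"}
print(chain_to("Cat", "Philosophy", links.get))
# ['Cat', 'Animal', 'Science', 'Philosophy']
```

The `seen` set is what prevents the crawler from spinning forever when two articles' first links point at each other.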

6 votes · 1 answer · 109 views

Java batch movie downloader

The idea is to batch download a list of movies (torrents) off a torrent site and add them to your server. I have a little bit of Java experience (sophomore in college), so I'm looking for things that ...

6 votes · 1 answer · 34 views

Scraping SEDE query results with caching

I use this script to scrape the results of a SEDE page and return as a BeautifulSoup object. A small twist is that if I don't use a SEDE query manually in the browser for a few days, then ...

6 votes · 2 answers · 42 views

Scraping columns from SEDE results

I use the following script to download the result of a SEDE query and scrape a specific column from it using BeautifulSoup: ...

9 votes · 2 answers · 232 views

Web Scraping with VBA

I wrote this to scrape album review data from AOTY into a spreadsheet. Check it out and let me know what I could've done better. ...

6 votes · 2 answers · 130 views

Web Scraper in Python

So, this is my first web scraper (or part of it at least), and I'm looking for things that I may have done wrong, or things that could be improved, so I can learn from my mistakes. I made a few short ...
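
For a first scraper, the standard library alone can already pull structure out of HTML. A sketch using `html.parser` to collect link targets; the HTML snippet fed to it here is purely illustrative:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<p><a href="/one">1</a> <a href="/two">2</a></p>')
print(parser.links)  # ['/one', '/two']
```

Third-party parsers such as Beautiful Soup are more forgiving of malformed markup, but this shows the core idea without any dependencies.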

2 votes · 1 answer · 418 views

Multithreaded Webcrawler in Java

I am working on a multi-threaded webcrawling program in Java. Each WebCrawler starts at a root page and repeatedly extracts new links and writes them to a database. ...

2 votes · 0 answers · 184 views

Movie torrent-site web scraper with IMDb info and streaming

I'm completely new to JavaScript and Node.js and functional programming in general. The code below scrapes a torrent website containing movies, gets info about each movie from the OMDb API, and lets a ...

3 votes · 2 answers · 74 views

Program to retrieve key/message from a multiple times used one time pad

I wrote a program to retrieve the key/messages from 10 different ciphertexts which were all encrypted with the same key using an XOR one-time-pad method, via crib dragging. To do this, I wrote a Python ...
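
The attack this question relies on works because XOR-ing two ciphertexts encrypted under the same pad cancels the key, leaving only the XOR of the two plaintexts, which crib dragging then exploits. A small demonstration; the key and messages here are made up:

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

key = b"\x13\x37\x42\x99\x01"          # illustrative 5-byte pad
c1 = xor_bytes(b"hello", key)          # first message, encrypted
c2 = xor_bytes(b"world", key)          # second message, same pad (the mistake)

# Reusing the pad means c1 XOR c2 == m1 XOR m2: the key cancels out.
assert xor_bytes(c1, c2) == xor_bytes(b"hello", b"world")
```

With ten ciphertexts, guessing a plausible word ("crib") in one message and XOR-ing it in immediately reveals the aligned fragment of another, which is exactly what crib dragging automates.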

6 votes · 3 answers · 238 views

IP and router connections

How can I make my code more Pythonic? I definitely think there is a way to make this code a lot more readable, clear, and shorter, but I haven't found an effective way. Any techniques I can use to ...

1 vote · 2 answers · 346 views

Scrapy spider for products on a site

I recently submitted a code sample for a web scraping project and was rejected without feedback as to what they didn't like. The prompt, while I cannot give it here verbatim, basically stated that I ...

5 votes · 1 answer · 79 views

Mixed scripting language API to determine file upload location and scrape city government website to find corresponding government official

I wrote this script as an NYC-specific-API for file upload for a mobile app. Users upload a video file and also their geographic coordinates. I then use an external API to get the corresponding ...

4 votes · 3 answers · 629 views

Multithreaded web scraper with proxy and user agent switching

I am trying to improve the performance of my scraper and plug up any possible security leaks (identifying information being revealed). Ideally, I would like to achieve a performance of 10 pages per ...
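
User-agent and proxy switching of the kind described here is often just round-robin rotation over two pools. A sketch with `itertools.cycle`; the pool contents are illustrative placeholders, not real proxies:

```python
from itertools import cycle

# illustrative pools; a real scraper would load these from config
USER_AGENTS = cycle([
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0)",
])
PROXIES = cycle(["http://proxy-a:8080", "http://proxy-b:8080"])

def next_identity():
    """Return the next (headers, proxy) pair in round-robin order,
    so consecutive requests do not present the same fingerprint."""
    return {"User-Agent": next(USER_AGENTS)}, next(PROXIES)

headers, proxy = next_identity()  # first identity in the rotation
```

Each request would then pass `headers` and `proxy` to the HTTP client; combining rotation with per-thread sessions is what lets a multithreaded scraper spread its requests across identities.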

2 votes · 0 answers · 351 views

A web crawler for scraping images from stock photo websites

I created a web crawler that uses Beautiful Soup to crawl images from a website and scrape them to a database. In order to use it, you have to create a class that inherits from Crawler and implements 4 ...

3 votes · 1 answer · 125 views

Web crawlers for three image sites

I'm very new to Python and only vaguely remember OOP from doing some Java a few years ago, so I don't know what the best way to do this is. I've built a bunch of classes that represent a crawler that ...