Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
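For readers new to the topic, a minimal sketch of the extraction half of that definition, using only Python's standard-library `html.parser` (the HTML snippet and tag choice are illustrative; a real scraper would fetch the markup over HTTP first):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# In a real scraper this HTML would come from an HTTP response body.
html = '<p>See <a href="/page1">one</a> and <a href="/page2">two</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/page1', '/page2']
```

Most of the questions below swap this standard-library parser for a dedicated library such as Beautiful Soup or cheerio, but the fetch-then-extract shape is the same.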
1 vote · 0 answers · 35 views
Scraping the date of most recent post from various social media services
I have a large spreadsheet (csvToRead in my code). Each line includes (among other things I don't care about):
The URL of a social media account (not always ...
2 votes · 0 answers · 30 views
Finding shortest paths in a Wikipedia article graph using Java
I have this sort of a web crawler that asks for two (English) Wikipedia article titles (the source and the target), and proceeds to compute the shortest path between the two. My code is as follows:
...
7 votes · 0 answers · 89 views
The YouTube crawler
I have coded a program to scrape YouTube data (for educational purposes). When the link of the channel is entered, it scrapes the channel name, the description of the channel, the videos posted by the ...
1 vote · 2 answers · 42 views
Crawling SPOJ through cURL and C++
I am trying to write industry-standard code.
https://www.quora.com/How-do-I-follow-a-user-on-Spoj-for-solving-problems-Refer-Details
Someone gave me this A2A.
And I wrote this code for it
...
2 votes · 1 answer · 38 views
Get data from a web page, with client code, using F#
I want to extract a price for a single index fund, the price of which is available on dynamic web pages.
Being new to this, my original idea was to download the single page of static HTML and get ...
1 vote · 0 answers · 24 views
Scrape google apps page and store application details in database
Below is a Python script which scrapes a specific Google apps URL, for example https://play.google.com/store/apps/details?id=com.mojang.minecraftpe, and saves the ...
3 votes · 2 answers · 433 views
Web crawler in F#
I have been writing a web crawler in F# that downloads pages with stylesheets and scripts.
Can somebody give me suggestions on improving this code, please?
Would appreciate any feedback that could ...
2 votes · 1 answer · 32 views
Web scraping with VBA
I have this code that fetches rates from a website called X-Rates and outputs to Excel the monthly averages of a chosen country.
The code runs quite fast, but I still think there are improvements to ...
6 votes · 2 answers · 193 views
Fast(er) web scraping with VBA
I have code that fetches rates from a website called X-Rates and outputs to Excel the monthly averages of a chosen country.
The code runs quite fast, but I still think I could improve the code a ...
0 votes · 0 answers · 15 views
Ruby written SQL vulnerability pentesting tool
I've written a program that scrapes websites for SQL vulnerabilities (IT DOES NOT EXPLOIT, JUST SEARCHES). I would like some critique on what I've done; is there a way I can write the ...
5 votes · 2 answers · 65 views
Steam Community Market Strange Part Scraper
A while back I wrote a simple little PHP script that searches the Steam community market for any TF2 strange weapons with strange parts on the first page of results for that weapon type. It works by ...
6 votes · 0 answers · 77 views
Node.js parallel file download, the ES6 way
I wrote a script that downloads all PDFs found on the web page of a particular government agency. I would have chosen bash for such a task, but I want the script to ...
0 votes · 0 answers · 36 views
Web-scraping library
Here are two functions that make a request for a given URL and then take the response body (HTML) and load it into the cheerio library:
scrapeListing.js
...
2 votes · 2 answers · 66 views
Java web scraping robots
I am developing an application that goes through two websites and gets all the articles, but my code is identical in most parts. Is there a way to optimize this code? (TL and DN are the naming ...
2 votes · 1 answer · 36 views
Wikipedia indexer and shortest link finder
I have the following code, how can I make it more efficient? Also, it doesn't always find the shortest route. (See Cat -> Tree)
...
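A common cause of "doesn't always find the shortest route" in link-graph searches is depth-first or greedy exploration; breadth-first search guarantees a shortest path in an unweighted graph. A minimal sketch on a toy link graph (the article names and links here are invented for illustration):

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search: guaranteed to find a shortest path
    in an unweighted link graph, unlike depth-first exploration."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # target unreachable from source

# Toy link graph (article -> articles it links to); names are illustrative.
links = {
    "Cat": ["Mammal", "Tree"],
    "Mammal": ["Animal"],
    "Tree": ["Plant"],
    "Plant": ["Life"],
    "Animal": ["Life"],
}
print(shortest_path(links, "Cat", "Life"))
```

A real crawler would replace the dictionary lookup with a fetch of the article's outgoing links, ideally cached so each page is downloaded at most once.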
0 votes · 0 answers · 69 views
Web crawler with Python and the asyncio library
I am trying to experiment with Python 3.5 async/await and the whole asyncio library. I tried ...
3 votes · 2 answers · 77 views
Parsing HTML from multiple webpages simultaneously
My friend wrote a scraper in Go that takes the results from a house listing webpage and finds listings for houses that he's interested in. The initial search returns listings, which are then filtered by ...
0 votes · 1 answer · 121 views
CRAP index (56) in Web Scraper Engine
I am working on a web scraper for the first time using test-driven development. However, I have ended up with a huge CRAP (Change Risk Anti-Patterns) index (56) and I cannot seem to find a ...
4 votes · 0 answers · 70 views
Web scraping with Nokogiri
At work we need to know which printers are getting dangerously low on toner, their paper consumption, etc.
So I've created a program that pulls the printer information off the websites the ...
0 votes · 0 answers · 40 views
Email a notification when detecting changes on a website - follow-up
I read through other questions here and improved the code and added a new feature. The old question can be found at: Email a notification when detecting changes on a website
The improvements that are ...
19 votes · 5 answers · 960 views
An OEIS Lookup tool in Python
I'm from PPCG so I was making an esolang and I decided to write it in Python. Eventually it went from an esolang to an OEIS (Online Encyclopedia of Integer Sequences) lookup tool. I'm very new to ...
0 votes · 0 answers · 25 views
Extracting articles mentioned in comments in three GET requests
I need to make several HTTP GET requests and do the following stuff:
When all of them have completed, I need to parse each HTML response.
After HTML parsing, or in case of any error, I need to set a flag ...
5 votes · 3 answers · 281 views
Email a notification when detecting changes on a website
The text of a website is checked at a given time interval. If there are any changes, a mail is sent. There is an option to show/mail the new parts of the website. What could be improved?
...
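The core of a checker like this is comparing a fingerprint of the fetched text between runs. A minimal sketch using `hashlib`; the fetching and mailing steps are stand-ins and not part of the example:

```python
import hashlib

def digest(text):
    """Fingerprint of the page text; store it and compare between runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def has_changed(previous_digest, current_text):
    """True if the freshly fetched text differs from the stored digest."""
    return digest(current_text) != previous_digest

# Hypothetical two successive fetches of the same page:
old = "Welcome! Nothing new today."
new = "Welcome! Big announcement inside."
print(has_changed(digest(old), old))  # False
print(has_changed(digest(old), new))  # True -> would trigger the mail
```

A digest only tells you *that* something changed; to show *which* parts changed, as the question mentions, you would also keep the previous text and diff it (e.g. with the standard-library `difflib`).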
5 votes · 1 answer · 104 views
Newest Reddit submissions grabber
My program does exactly what I want it to do and it works well. However, I feel like it's very clunky.
I'd like my code to be more efficient. By that I mean, I'd like it to accomplish what it already ...
2 votes · 2 answers · 86 views
PHP crawler to collect comments on articles
I have code that parses through web pages, finds comments and saves comment info in a DB. I have an array where all necessary pages are stored. I iterate through all these pages one by one and ...
1 vote · 0 answers · 38 views
Checking paginated website for new entries
I'm interested in determining the best way to check a paginated website for new entries. I want to be able to scrape pages 1, 2, 3, ... as necessary to get all updates. However the scraping is fairly ...
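One common approach, assuming the site lists entries newest-first, is to walk pages 1, 2, 3, ... and stop at the first entry you have already seen. A minimal sketch with a stand-in for the HTTP fetch (the page data here is invented):

```python
def fetch_page(page_number, pages):
    """Stand-in for an HTTP request; returns the entry IDs on that page."""
    return pages.get(page_number, [])

def new_entries(pages, seen):
    """Walk pages 1, 2, 3, ... collecting entries until we hit one we
    have already seen (or run out of pages), assuming newest-first order."""
    found = []
    page = 1
    while True:
        entries = fetch_page(page, pages)
        if not entries:
            return found  # ran off the end of the site
        for entry in entries:
            if entry in seen:
                return found  # everything after this is already known
            found.append(entry)
        page += 1

# Hypothetical paginated site, newest entries first:
site = {1: ["e9", "e8"], 2: ["e7", "e6"], 3: ["e5", "e4"]}
print(new_entries(site, seen={"e6", "e5", "e4"}))  # ['e9', 'e8', 'e7']
```

The early stop is what keeps the per-run cost proportional to the number of *new* entries rather than the size of the whole site.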
3 votes · 0 answers · 69 views
Scraping links from the first page of Google using Kivy
I'm making a scraper/web crawler using Kivy. When I run the code it works, but I'm not sure if what I'm doing is Pythonic, because all the guidance I can find is about using the Kivy library. I'm unsure ...
3 votes · 1 answer · 69 views
Factory pattern in F# for a web scraper
I'm trying to learn F# by creating a little web scraper that will do custom scraping based on the URL domain. For this, I need to create and select the correct kind of scraper. I figure I would use a ...
0 votes · 1 answer · 46 views
Implementation of bridge design pattern for a web scraping app - follow-up
Earlier today I tried to implement an example of the bridge design pattern, but I ended up misinterpreting it.
I made a lot of changes:
...
0 votes · 1 answer · 53 views
Implementation of Bridge Design Pattern
I made an implementation of the Bridge pattern to handle the ever-changing crawler APIs that I'm using in my app.
...
2 votes · 0 answers · 33 views
HTML Scraper for Plex downloads page
I have written a scraper in Python 3 using Beautiful Soup 4 to retrieve the latest version of Plex Media Server from https://plex.tv, and I'd like some feedback on how to improve it.
The HTML the ...
4 votes · 2 answers · 200 views
Page Scraper and DOM manipulator
This code is a page scraper using HtmlAgilityPack that creates a DOM document upon construction and allows for node manipulation afterward.
HtmlAgilityPack uses XPath Selectors for selecting nodes.
...
6 votes · 1 answer · 38 views
Helper functions to extract SEDE query results into more user-friendly format
This Python module contains helper functions to download the result page of SEDE queries and extract columns from it, most prominently:
...
14 votes · 2 answers · 168 views
OLog Userscript - Logging messages, planets and researches
For the online text-based browser game OGame, I am working on an application whose aim is to assist the users where possible. For this I have a server-side part and a client-side part; the respective ...
4 votes · 1 answer · 115 views
Web crawler that charts stock ticker data using matplotlib
I've built a web crawler using the BeautifulSoup library that pulls stock ticker data from CSV files on Yahoo finance, and charts the data using ...
14 votes · 2 answers · 117 views
Getting to Wikipedia's “Philosophy” article using Python
On Wikipedia, if you click the first non-italicised internal link in the main text of an article that's not within parentheses, and then repeat the process, you usually end up on the "Philosophy" ...
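The chase described above reduces to iterating a "first eligible link" function until you reach the target, hit a dead end, or detect a loop. A minimal sketch on a toy article-to-first-link mapping (the mapping is invented; a real version would parse each page's HTML to find that first link):

```python
def chase_first_links(first_link, start, target="Philosophy"):
    """Follow each article's first eligible link until we reach the
    target, hit a dead end, or detect a loop (both return None)."""
    trail = [start]
    visited = {start}
    current = start
    while current != target:
        current = first_link.get(current)
        if current is None or current in visited:
            return None  # dead end or loop
        visited.add(current)
        trail.append(current)
    return trail

# Toy mapping: article -> its first non-italicised, non-parenthesised link.
first_link = {
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Animal": "Biology",
    "Biology": "Science",
    "Science": "Knowledge",
    "Knowledge": "Philosophy",
}
print(chase_first_links(first_link, "Cat"))
```

The `visited` set is essential: without it, the occasional pair of articles whose first links point at each other would loop forever.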
6 votes · 1 answer · 109 views
Java batch movie downloader
The idea is to batch download a list of movies (torrents) off a torrent site and add them to your server.
I have a little bit of Java experience (sophomore in college), so I'm looking for things that ...
6 votes · 1 answer · 34 views
Scraping SEDE query results with caching
I use this script to scrape the results of a SEDE page and return as a BeautifulSoup object.
A small twist is that if I don't use a SEDE query manually in the browser for a few days, then ...
6 votes · 2 answers · 42 views
Scraping columns from SEDE results
I use the following script to download the result of a SEDE query and scrape a specific column from it using BeautifulSoup:
...
9 votes · 2 answers · 232 views
Web Scraping with VBA
I wrote this to scrape album review data from AOTY into a spreadsheet. Check it out and let me know what I could've done better.
...
6 votes · 2 answers · 130 views
Web Scraper in Python
So, this is my first web scraper (or part of it at least), and I'm looking for things that I may have done wrong, or things that could be improved, so I can learn from my mistakes.
I made a few short ...
2 votes · 1 answer · 418 views
Multithreaded Webcrawler in Java
I am working on a multi-threaded webcrawling program in Java. Each WebCrawler starts at a root page and repeatedly extracts new links and writes them to a database. ...
2 votes · 0 answers · 184 views
Movie torrent-site web scraper with IMDb info and streaming
I'm completely new to JavaScript and Node.js and functional programming in general. The code below scrapes a torrent website containing movies, gets info about the movie from the OMDb API and lets a ...
3 votes · 2 answers · 74 views
Program to retrieve key/message from a multiple times used one time pad
I wrote a program to retrieve the key/messages from 10 different ciphers which were all encrypted with the same key with an xor one-time-pad method via crib dragging.
To do this, I wrote a python ...
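For context, crib dragging works because XOR-ing two ciphertexts that reused a key cancels the key: c1 ⊕ c2 = p1 ⊕ p2. Sliding a guessed plaintext fragment (the crib) along that XOR reveals fragments of the other message wherever the guess is right. A minimal sketch (messages and key are invented; the attacker is assumed to know only the ciphertexts):

```python
def xor_bytes(a, b):
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

def drag_crib(c1, c2, crib):
    """Slide a guessed plaintext fragment along c1 XOR c2.
    Since the key cancels, each window XOR crib is a candidate
    fragment of the *other* plaintext; readable output = good guess."""
    combined = xor_bytes(c1, c2)  # equals p1 XOR p2
    results = []
    for offset in range(len(combined) - len(crib) + 1):
        window = combined[offset:offset + len(crib)]
        results.append((offset, xor_bytes(window, crib)))
    return results

# Illustrative messages and (reused) key:
key = b"\x13\x37\x42\xab\xcd\xef\x01\x23\x45\x67\x89"
p1, p2 = b"attack dawn", b"defend dusk"
c1, c2 = xor_bytes(p1, key), xor_bytes(p2, key)

# Dragging the crib b"attack" at offset 0 recovers part of p2:
print(drag_crib(c1, c2, b"attack")[0])  # (0, b'defend')
```

In practice the attacker scans all offsets of all ciphertext pairs for candidates that look like natural language, then extends the recovered fragments iteratively.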
6 votes · 3 answers · 238 views
IP and router connections
How can I make my code more Pythonic? I definitely think there is a way to make this code a lot more readable, clearer, and shorter...
But I haven't found an effective way. Any techniques I can use to ...
1 vote · 2 answers · 346 views
Scrapy spider for products on a site
I recently submitted a code sample for a web scraping project and was rejected without feedback as to what they didn't like. The prompt, while I cannot give it here verbatim, basically stated that I ...
5 votes · 1 answer · 79 views
Mixed scripting language API to determine file upload location and scrape city government website to find corresponding government official
I wrote this script as an NYC-specific API for file upload for a mobile app. Users upload a video file and also their geographic coordinates.
I then use an external API to get the corresponding ...
4 votes · 3 answers · 629 views
Multithreaded web scraper with proxy and user agent switching
I am trying to improve the performance of my scraper and plug up any possible security leaks (identifying information being revealed).
Ideally, I would like to achieve a performance of 10 pages per ...
2 votes · 0 answers · 351 views
A web crawler for scraping images from stock photo websites
I created a web crawler that uses Beautiful Soup to crawl images from a website and scrape them into a database. In order to use it, you have to create a class that inherits from Crawler and implements 4 ...
3 votes · 1 answer · 125 views
Web crawlers for three image sites
I'm very new to Python and only vaguely remember OOP from doing some Java a few years ago, so I don't know what the best way to do this is.
I've built a bunch of classes that represent a crawler that ...