Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
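As a minimal sketch of the "extract specific information from a web page" half of that definition, the snippet below pulls every link target out of an HTML string using only Python's standard library. The `LinkExtractor` class, the `extract_links` helper, and the sample markup are illustrative assumptions, not taken from any question listed here.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all anchors in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = '<p><a href="/questions/1">First</a> and <a href="/questions/2">second</a></p>'
print(extract_links(SAMPLE_HTML))
```

A real scraper would obtain the HTML from an HTTP response first; that step is left out so the sketch stays self-contained.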
0
votes
0answers
36 views
Whitewidow, SQL vulnerability web scraper
I believe it was yesterday that I posted a program here that scrapes Google for SQL-vulnerable web pages. I never got an answer, so I went all out and made it look a little prettier. It still only ...
3
votes
0answers
43 views
SQL vulnerability web scraper
I've created a program that will be used for pentesting. It scrapes the first page of Google out of an array of searches. It then attempts to find an SQL error within the site by adding an apostrophe ...
5
votes
0answers
49 views
Node.js parallel file download, the ES6 way
I wrote a script that downloads all PDFs found on the web page of a particular government agency. I would have chosen bash for such a task, but I want the script to ...
0
votes
0answers
32 views
Web-scraping library
Here are two functions that make a request for a given URL, take the response body (HTML), and load it into the cheerio library:
scrapeListing.js
...
2
votes
2answers
55 views
Java web scraping robots
I am developing an application that goes through 2 websites and gets all the articles, but my code is identical in most parts. Is there a way to optimize this code? (TL and DN are the naming ...
2
votes
1answer
33 views
Wikipedia indexer and shortest link finder
I have the following code, how can I make it more efficient? Also, it doesn't always find the shortest route. (See Cat -> Tree)
...
0
votes
0answers
47 views
Web crawler with Python and the asyncio library
I am trying to experiment with Python 3.5 async/await and the whole asyncio library. I tried ...
3
votes
2answers
75 views
Parsing HTML from multiple webpages simultaneously
My friend wrote a scraper in Go that takes the results from a house listing webpage and finds listings for houses that he's interested in. The initial search returns listings, they are filtered by ...
0
votes
1answer
118 views
CRAP index (56) in Web Scraper Engine
I am working on a Web Scraper for the first time using Test Driven Development. However, I have ended up with a high CRAP (Change Risk Anti-Patterns) index (56) and I cannot seem to find a ...
4
votes
0answers
61 views
Web scraping with Nokogiri
At work we need to know which printers are getting dangerously low on toner, what their paper consumption is, etc.
So I've created a program that pulls the printer information off the websites the ...
0
votes
0answers
27 views
Email a notification when detecting changes on a website - follow-up
I read through other questions here and improved the code and added a new feature. The old question can be found at: Email a notification when detecting changes on a website
The improvements that are ...
19
votes
5answers
930 views
An OEIS Lookup tool in Python
I'm from PPCG so I was making an esolang and I decided to write it in Python. Eventually it went from an esolang to an OEIS (Online Encyclopedia of Integer Sequences) lookup tool. I'm very new to ...
0
votes
0answers
25 views
Extracting articles mentioned in comments in three GET requests
I need to make several HTTP GET requests and do the following:
When all of them have completed, I need to parse each HTML response
After HTML parsing, or in case of any error, I need to set a flag ...
5
votes
3answers
250 views
Email a notification when detecting changes on a website
The text of a website is checked at a given time interval. If there are any changes, an email is sent. There is an option to show/mail the new parts of the website. What could be improved?
...
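The check described in this excerpt can be sketched roughly as follows, assuming the previous and current page text are already available as strings. The function names are hypothetical, and using a SHA-256 fingerprint plus `difflib` is one possible approach rather than the question author's method; sending the email is omitted.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest used to detect any change cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_lines(old_text, new_text):
    """Return lines that the diff marks as added in new_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes added lines with "+ "; strip that two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]


def check_for_changes(old_text, new_text):
    """Return None when nothing changed, else the list of added lines."""
    if fingerprint(old_text) == fingerprint(new_text):
        return None
    return new_lines(old_text, new_text)


# Hypothetical snapshots of the same page taken at two points in time.
before = "Opening hours\nMonday 9-17"
after = "Opening hours\nMonday 9-17\nTuesday 9-17"
print(check_for_changes(before, after))
```

Storing only the fingerprint between runs is enough to detect a change; the full previous text is needed only if the new parts are to be shown or mailed.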
5
votes
1answer
94 views
Newest Reddit submissions grabber
My program does exactly what I want it to do and it works well. However, I feel like it's very clunky.
I'd like my code to be more efficient. By that I mean, I'd like it to accomplish what it already ...
2
votes
2answers
74 views
PHP crawler to collect comments on articles
I have code that parses web pages, finds comments, and saves the comment info in a DB. I have an array where all necessary pages are stored. I iterate through all these pages one by one and ...
1
vote
0answers
30 views
Checking paginated website for new entries
I'm interested in determining the best way to check a paginated website for new entries. I want to be able to scrape pages 1, 2, 3, ... as necessary to get all updates. However the scraping is fairly ...
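One way to sketch the "scrape pages 1, 2, 3, ... as necessary" idea is to stop at the first entry that has already been seen. This assumes entries are listed newest first and that each entry has a stable identifier; `fetch_page`, `seen_ids`, and `max_pages` are hypothetical names introduced for illustration, and the question itself may use a different strategy.

```python
def collect_new_entries(fetch_page, seen_ids, max_pages=50):
    """Walk pages in order and stop at the first already-known entry.

    fetch_page is a hypothetical callable: page number -> list of entry ids,
    assumed to be ordered newest first. max_pages is a safety limit.
    """
    new_entries = []
    for page_number in range(1, max_pages + 1):
        entries = fetch_page(page_number)
        if not entries:
            break  # ran past the last page
        for entry_id in entries:
            if entry_id in seen_ids:
                # Everything after this point was seen on an earlier run.
                return new_entries
            new_entries.append(entry_id)
    return new_entries


# Fake paginated site used in place of real HTTP requests.
FAKE_PAGES = {1: [9, 8, 7], 2: [6, 5, 4], 3: [3, 2, 1]}
print(collect_new_entries(lambda n: FAKE_PAGES.get(n, []), seen_ids={5}))
```

Because the loop returns as soon as a known entry appears, later pages are never requested, which matters when each request is slow or rate limited.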
3
votes
0answers
58 views
Scraping links from the first page of Google using Kivy
I'm making a scraper/web crawler using Kivy. When I run the code it works, but I'm not sure if what I'm doing is Pythonic, because all the language I can find is about using the Kivy library. I'm unsure ...
3
votes
1answer
53 views
Factory pattern in F# for a web scraper
I'm trying to learn F# by creating a little web scraper that will do custom scraping based on the url domain. For this, I need to create and select the correct kind of scraper. I figure I would use a ...
0
votes
1answer
43 views
Implementation of bridge design pattern for a web scraping app - follow-up
Earlier today I tried to implement an example of the bridge design pattern, but I ended up misinterpreting it.
I made a lot of changes:
...
0
votes
1answer
50 views
Implementation of Bridge Design Pattern
I made an implementation of the Bridge Pattern to handle the ever-changing crawler APIs that I'm using in my app.
...
2
votes
0answers
22 views
HTML Scraper for Plex downloads page
I have written a scraper in Python 3 using Beautiful Soup 4 to retrieve the latest version of Plex Media Server from https://plex.tv, and I'd like some feedback on how to improve it.
The HTML the ...
4
votes
2answers
192 views
Page Scraper and DOM manipulator
This code is a page scraper using HtmlAgilityPack that creates a DOM document upon construction and allows for node manipulation afterward.
HtmlAgilityPack uses XPath Selectors for selecting nodes.
...
6
votes
1answer
35 views
Helper functions to extract SEDE query results into more user-friendly format
This Python module contains helper functions to download the result page of SEDE queries and extract columns from it, most prominently:
...
14
votes
2answers
163 views
OLog Userscript - Logging messages, planets and researches
For the online text-based browser game OGame, I am working on an application that aims to assist users where possible. For this I have a server-side part and a client-side part, the respective ...
4
votes
1answer
93 views
Web crawler that charts stock ticker data using matplotlib
I've built a web crawler using the BeautifulSoup library that pulls stock ticker data from CSV files on Yahoo Finance, and charts the data using ...
14
votes
2answers
109 views
Getting to Wikipedia's “Philosophy” article using Python
On Wikipedia, if you click the first non-italicised internal link in the main text of an article that's not within parentheses, and then repeat the process, you usually end up on the "Philosophy" ...
6
votes
1answer
97 views
Java batch movie downloader
The idea is to batch download a list of movies (torrents) off a torrent site and add them to your server.
I have a little bit of Java experience (sophomore in college), so I'm looking for things that ...
6
votes
1answer
32 views
Scraping SEDE query results with caching
I use this script to scrape the results of a SEDE page and return as a BeautifulSoup object.
A small twist is that if I don't use a SEDE query manually in the browser for a few days, then ...
6
votes
2answers
40 views
Scraping columns from SEDE results
I use the following script to download the result of a SEDE query and scrape a specific column from it using BeautifulSoup:
...
9
votes
2answers
207 views
Web Scraping with VBA
I wrote this to scrape album review data from AOTY into a spreadsheet. Check it out and let me know what I could've done better.
...
6
votes
2answers
121 views
Web Scraper in Python
So, this is my first web scraper (or part of it at least) and I'm looking for things that I may have done wrong, or things that could be improved, so I can learn from my mistakes.
I made a few short ...
2
votes
1answer
293 views
Multithreaded Webcrawler in Java
I am working on a multi-threaded webcrawling program in Java. Each WebCrawler starts at a root page and repeatedly extracts new links and writes them to a database. ...
2
votes
0answers
160 views
Movie torrent-site web scraper with IMDb info and streaming
I'm completely new to JavaScript, Node.js, and functional programming in general. The code below scrapes a torrent website containing movies, gets info about the movie from the OMDb API and lets a ...
3
votes
2answers
71 views
Program to retrieve key/message from a multiple times used one time pad
I wrote a program to retrieve the key/messages from 10 different ciphers which were all encrypted with the same key with an xor one-time-pad method via crib dragging.
To do this, I wrote a python ...
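Crib dragging relies on the fact that XORing two ciphertexts encrypted with the same key cancels the key and leaves the XOR of the two plaintexts. A minimal sketch, assuming two equal-length byte strings and a guessed word (the crib); the function names and sample messages are made up for illustration and are not the question's code.

```python
def xor_bytes(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def crib_drag(ciphertext_a, ciphertext_b, crib):
    """Slide a guessed word over c_a XOR c_b and keep printable results.

    If the crib matches one plaintext at some offset, the fragment at
    that offset is the corresponding piece of the other plaintext.
    """
    combined = xor_bytes(ciphertext_a, ciphertext_b)  # equals m_a XOR m_b
    candidates = []
    for offset in range(len(combined) - len(crib) + 1):
        fragment = xor_bytes(combined[offset:offset + len(crib)], crib)
        # Keep only fragments made of printable ASCII as plausible text.
        if all(32 <= byte < 127 for byte in fragment):
            candidates.append((offset, fragment))
    return candidates


# Hypothetical messages encrypted with the same (reused) key.
key = bytes(range(1, 15))
cipher_a = xor_bytes(b"attack at dawn", key)
cipher_b = xor_bytes(b"the secret key", key)
for offset, fragment in crib_drag(cipher_a, cipher_b, b"the"):
    print(offset, fragment)
```

The printable-ASCII filter is only a rough heuristic: it narrows the candidates, and a human (or a dictionary check) still has to judge which fragments look like real text.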
6
votes
3answers
233 views
IP and router connections
How can I make my code more Pythonic? I definitely think there is a way to make this code a lot more readable, clearer, and shorter...
But I haven't found an effective way. Any techniques I can use to ...
1
vote
2answers
300 views
Scrapy spider for products on a site
I recently submitted a code sample for a web scraping project and was rejected without feedback as to what they didn't like. The prompt, while I cannot give it here verbatim, basically stated that I ...
5
votes
1answer
77 views
Mixed scripting language API to determine file upload location and scrape city government website to find corresponding government official
I wrote this script as an NYC-specific-API for file upload for a mobile app. Users upload a video file and also their geographic coordinates.
I then use an external API to get the corresponding ...
4
votes
3answers
468 views
Multithreaded web scraper with proxy and user agent switching
I am trying to improve the performance of my scraper and plug up any possible security leaks (identifying information being revealed).
Ideally, I would like to achieve a performance of 10 pages per ...
2
votes
0answers
280 views
A web crawler for scraping images from stock photo websites
I created a web crawler that uses Beautiful Soup to crawl images from a website and scrape them to a database. In order to use it you have to create a class that inherits from Crawler and implements 4 ...
3
votes
1answer
106 views
Web crawlers for three image sites
I'm very new to Python and only vaguely remember OOP from doing some Java a few years ago, so I don't know what the best way to do this is.
I've built a bunch of classes that represent a crawler that ...
3
votes
0answers
186 views
Web scraping from the Google Play store
I am using this R function to scrape data from the Google Play store. Is there a way to increase its efficiency using R? This code takes about 4 seconds for 14 URLs with my machine/internet ...
4
votes
1answer
48 views
Webscraping Bing wallpapers
I wanted to scrape all the wallpapers from the Bing wallpaper gallery. This was for personal use and to learn about web scraping. The gallery progressively gets images using JavaScript as the user ...
10
votes
2answers
152 views
Financial Data From Webqueries in Excel
I'm new (to CR and to programming in general). I wrote my first VBA on Monday. This is my first working project. It takes a bunch of financial data from a company called Financial Analytics and a ...
3
votes
1answer
81 views
Scraping a table from Texas Dept. of Criminal Justice website
The script scrapes a table from the website mentioned, looks at the last 2 columns, takes that information, and then sorts it (and then returns the largest county, and the set of races and their ...
5
votes
2answers
472 views
Downloading stock information from Yahoo! Finance
The program downloads stock information from Yahoo! Finance and displays it in the spreadsheet. On my Mac the program takes 10 minutes to get data for approximately 4000 stocks and on the PC it takes ...
2
votes
0answers
65 views
OOP Web scraper using regex to grab tag contents
I'm learning about implementing the SOLID principles in PHP.
I want to create a simple content crawler/grabber for some websites. This crawler will grab the content from the website URL. Since we ...
3
votes
2answers
195 views
Google News scraper to fetch links with similar stories
The following code takes either a URL or the title to an existing news article.
Searches Google News using the title.
Collects all links from search results.
...
2
votes
1answer
91 views
Reducing execution time of an HTML parsing script
The script is intended to return an array with texts containing specific words in English and the equivalent texts in Polish from EUR-Lex - a website with EU documents.
The script downloads the page ...