Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
5
votes
1answer
87 views
Simple recursive web crawler
I wrote a simple web crawler. I know there are many better ones out there; I did this just for learning purposes.
I think there are some things I could improve here. I commented the ...
6
votes
1answer
90 views
Basic IMDb scraper and movie generator
I just built my first scraper and I'd like to get your thoughts on the structure and the way I went about it. The basic premise of the script:
Get a random movie from IMDb's Top 250
Ask the user ...
2
votes
1answer
65 views
Scrapy Username Spider
This currently starts out at a speed of 2000 pages/min, but shortly after starting it slows down to about 200 pages/min. Why is this happening? How can I improve this scraper?
<...
3
votes
2answers
204 views
Scraping HTML via async controller & classes + HTML agility pack
I've developed a simple application to grab golfer index scores from a website that has no API. The application works but is very slow: updating 6 users takes 60 seconds. I've tried ...
0
votes
0answers
145 views
Python Scrapy code using Selenium
I have written some Python code that uses Scrapy and Selenium to scrape restaurant names and addresses from a website. I needed to use Selenium because the button to show more restaurants on a page is ...
5
votes
1answer
95 views
Webscraping calendar events using Python 3, with or without BeautifulSoup
I'm trying to find out why my web-scraping code with BeautifulSoup (BS) is slower than my code without BS. I would think that BS code would be faster than the other code - so, maybe I'm doing ...
5
votes
1answer
70 views
Scraping names of directors from a website
I am scraping the names of the directors from a website using Python / Scrapy. I am very new to coding (under a year, and only after work), so any views would be appreciated.
The reason I have a ...
4
votes
2answers
65 views
Scrape 4chan for alive images
I've been trying to learn Clojure recently and I thought writing a simple web app would be a good way to dive in.
This function gets the list of alive threads from the API and reduces, filters and maps ...
0
votes
3answers
103 views
Image hosting site image downloader using requests and BeautifulSoup
I went about this the way I have because Selenium is slow and inconvenient (opens browser) and I was unable to find the href attribute in the link element using ...
2
votes
1answer
88 views
Scrape an infinite-scroll page
My algorithm scrapes an infinite-scroll page but it takes too long. It scrolls three times but I'm wondering if there is a way to do a ScrollBottom() so no need of ...
4
votes
2answers
855 views
Simple Python job vacancies downloader
I have created a BeautifulSoup vacancies parser, which works, but I do not like how it looks. Therefore, I'd be very happy if somebody could suggest any improvements.
...
3
votes
1answer
82 views
VBA - XMLHTTP web scraping
I navigate with IE, do various things, then select all results option from a list and fire on click event. Once all results have been listed, I loop through their URLs, using the following code to ...
0
votes
0answers
63 views
Scraping websites and saving to MySQL
I have the following piece of code which scrapes websites and saves some information back to MySQL.
At the moment it is consuming all the memory on my machine every time it runs.
I've refactored the ...
4
votes
1answer
174 views
Web scraping VBA - Internet Explorer
The code below extracts data from one web page - I emulate search, select all results from the list and when the list appears (42000 items) I loop through these items.
I get an id value from their ...
1
vote
1answer
152 views
Basic web scrape project written in NodeJS
Here is a short web scraping program written in Node.js. I'm just getting to grips with Node and this is the first thing I've written with it. I'm liking it so far though I guess I'm kinda ...
1
vote
0answers
109 views
GUI in Tkinter to log events for a web-scraper
I'm creating a GUI with tkinter that will handle starting, stopping, and logging events for a web-scraper (scraper not created yet).
The current code is working... but I've been gathering my ...
2
votes
1answer
103 views
Sample scraping Project Gutenberg using Beautiful Soup and requests
I am trying to learn web scraping in Python using Beautiful Soup and requests. My program goes to the book page on Project Gutenberg with the given book number (Example). It then finds the link for ...
2
votes
1answer
59 views
Downloading and saving news articles
To be honest, I am pretty new to coding. I want to analyse the articles my local newspaper publishes on their website. To do that, I have a list of URLs and from that, I want to download and save all ...
4
votes
1answer
43 views
Haikuifier (Or at least Haiku Identifier)
All the usual stuff. Style, substance, algorithm please! I'm vaguely considering plugging it into a bot, hence the lacklustre catching of exceptions right now.
...
2
votes
1answer
108 views
Translating text using Google Translate mobile site
I have this code to translate text using the Google Translate mobile site. Currently, text size is limited by the request method.
Everything else seems to work just fine.
I am also about to post this on ...
1
vote
0answers
139 views
Google Searching Bot with Proxy support
I have been asked by a client to program a bot which searches Google and shows how many results I get.
Note: I know about Google Custom Search API and it will not produce the exact output ...
0
votes
2answers
103 views
Movie data scraping
I enter the IMDb link and YouTube trailer link for a movie on the command line, and the first main program loads all the info about the movie. The second main program uses an IMDb link to the movie ...
3
votes
0answers
25 views
VIM colors downloader in Python, using multiprocessing
I recently posted this script:
VIM colors downloader in Python
But since I'm not allowed to update the code there, I wanted to get feedback on this version, which uses multiprocessing:
...
8
votes
2answers
107 views
Finding words that rhyme
Preface
I was trying to review this question on the same topic, but in the end many points I wanted to make were excellently explained by @ferada so I felt that posting my code and explaining the ...
14
votes
2answers
725 views
VIM colors downloader in Python
Recently, I wanted to change my vim colors to something new. So I went to the vim colors website and then I decided that I wanted to download ALL the colors.
So I ...
3
votes
1answer
159 views
Python Document Downloader
This is a Python document (PDF) downloader I made to download some question papers automatically. However, it is very slow.
Any better way to do this?
The code:
...
9
votes
2answers
915 views
Simple Python username scraper
I started learning Python recently and I really like it, so I decided to share one of my first projects mainly in hopes of someone telling me what I can do to make it run faster (threading/...
5
votes
2answers
182 views
Multithreaded webcrawler
I've been trying to learn Java for the last day or two. This is the first project I am working on, so please bear with me. I worked on a multithreaded web crawler. It is fairly simple but I'd like to ...
-1
votes
1answer
127 views
Selenium Kijiji web scraper
I have this script working pretty well but I know that there must be many things that I could do better to make it more efficient.
...
5
votes
1answer
75 views
Python Politico API attempt
I love politics, and I love programming, so I figured why not try and combine the two for something to do? I'm making a work-in-progress (but runnable at this stage) Politico API that I call "...
1
vote
0answers
19 views
Wikipath stack in Java - Part II/IV - The implicit Wikipedia article graph
This question is the continuation of the Wikipath stack series: the two classes that, given a Wikipedia article $A$, return the lists of neighbour articles. The forward node expander returns the ...
0
votes
1answer
65 views
1
vote
1answer
615 views
Downloading yahoo finance stock historical data as CSV using C++
This post is a continuation of my previous post where I used jsoncpp package to fetch exchange rates from fixer.io. In this post I have reused the above code and used it to fetch stock historical ...
3
votes
1answer
88 views
Fetching specific foreign exchange rates from fixer using curl and jsoncpp in C++
I am trying to create my own algorithmic trading system using C++. I have searched the web for a nice tutorial for such systems and I didn't find any. Then I started to learn about ...
2
votes
1answer
69 views
Optimizing Java HTML parser
I wrote a program that goes through a webpage and returns regex matches. I used it on my letterboxd.com account to go through all of my movies (over 900 entries) and then find the genres field for each ...
6
votes
2answers
119 views
Crawl site getting URL and status code
I wrote a crawler that collects the status code of every page it visits.
Below is my solution. Can this code be optimized?
...
5
votes
1answer
218 views
Session handling using Python Requests client
I'm using this code to log in to an experimental login system I created for this purpose.
...
5
votes
1answer
116 views
Web scraping VBA and VB Script
I am working on a project in VBA where the objective is to have a "program" that fetches rates from a website called X-Rates, and outputs to Excel the monthly averages of a chosen country.
Initially ...
7
votes
0answers
64 views
Regex-guided crawler that downloads regex-matching images up to a crawling level
This is a simple crawler that downloads images from websites. The URL of each website to be crawled must match the regex, as must the URL of any image to download.
(Also, I know, I made my own thread ...
6
votes
1answer
103 views
Simple image scraping
I wrote this code over the last few days and I learned a lot, and it works as I expect. However I suspect it's woefully inefficient:
...
10
votes
1answer
963 views
Scraping after login using Scrapy
I just finished a scraper in Python using Scrapy. The scraper logs in to a certain page and then scrapes a list of other pages using the authenticated session.
It retrieves the title of these pages ...
6
votes
1answer
43 views
Script to download sequentially named files, rename them, and delete smaller files
I've written a little script to download sequentially named files, rename them, and delete files smaller than a certain number of kilobytes. I came up with this but I'm not too happy. Any advice for ...
2
votes
1answer
405 views
PHP web crawler
I'm working on a "nice" crawler that starts with one URL and finds the other URLs to process each page, a kind of "Google" crawler, to index pages.
I worked hard on this crawler to respect many points ...
4
votes
2answers
66 views
I'll visit the 18th
I wrote this program, whose purpose is to visit the 18th link on the list of links and then, on the new page, visit the 18th link again.
This program works as intended, but it's a little repetitive and ...
4
votes
1answer
102 views
Recursive Web Crawler in Go
This is probably my third Go application. It essentially takes one or two command line arguments of Wikipedia articles and pulls every /wiki/ link that isn't a special page, memoizes them to avoid ...
0
votes
0answers
57 views
Finding shortest paths in a Wikipedia article graph using Java - second attempt
I have improved Finding shortest paths in a Wikipedia article graph using Java.
Now I have this:
AbstractWikipediaShortestPathFinder.java:
...
9
votes
3answers
239 views
Scraping the date of most recent post from various social media services
Task
I have a large spreadsheet where each line should include:
The URL of a social media account
A field indicating whether the account is "active"
A name and UID number for each account
I have to ...
5
votes
1answer
108 views
Finding shortest paths in a Wikipedia article graph using Java
(See also Finding shortest paths in a Wikipedia article graph using Java - second attempt.)
I have this sort of a web crawler that asks for two (English) Wikipedia article titles (the source and the ...
10
votes
1answer
585 views
The YouTube crawler
I have coded a program to scrape YouTube data (for educational purposes). When the link of the channel is entered, it scrapes the channel name, description of the channel, the videos posted by the ...
1
vote
2answers
64 views
Crawling SPOJ through cURL and C++
I am trying to write industry-standard code.
https://www.quora.com/How-do-I-follow-a-user-on-Spoj-for-solving-problems-Refer-Details
Someone gave me this A2A.
And I wrote this code for it
...