All Questions
Tagged with python web-scraping
46
questions with no upvoted or accepted answers
7
votes
0 answers
150 views
Get a movie directory, rename it and save imdb info as a webpage
I have my movie folders formatted:
~: no sub included
{}: already watched
nothing: sub included, ready to watch
the program asks for the movie directory; extracts the name and searches IMDb for the ...
6
votes
0 answers
73 views
Booking an East London Tennis Court
Description
I'm not sure if it's Covid-19, but lately it is impossible to book a tennis court in my area on time. It's always full, or maybe I just don't check enough :)
To beat the queue and get notified ...
6
votes
2 answers
132 views
Web scraper for driver's license test times
I have created a small Selenium script that checks for available times to write a test for a driver's license. The program runs every minute and takes approximately 50 seconds to run. I have noticed that it's ...
6
votes
0 answers
849 views
Parsing different categories using Scrapy from a webpage
I've written a script in Python Scrapy to parse different "model", "country" and "year" of various bikes from a webpage. There are several subcategories to track to reach ...
4
votes
0 answers
60 views
Scraping a hiring website using python's requests and BeautifulSoup
I'm designing a scraping application using python, requests and BeautifulSoup4.
I decided to divide the logic into two classes:
Spider : gets the base url ...
4
votes
0 answers
88 views
List all links in a website
I wrote this code as part of a project I'm working on. It's supposed to get all links from a website and then perform black-box tests on them. How can I improve this code to be faster and more ...
4
votes
0 answers
812 views
Speeding web-scraping up Python 3
I want to get information from two websites and display it in real time in the console.
To get the information from the website I am using BeautifulSoup 4. I have read that the bottleneck of ...
3
votes
0 answers
60 views
Scraping OddsPortal with requests only
This is a scraper written to do most of what had been attempted by another user in this question:
How can I optimise this webscraping code
I did the rewrite because I felt bad that the new user didn't ...
3
votes
0 answers
47 views
Speed Up API Requests & Overall Python Code
I'm not asking for help solving a problem, but rather for possible ways to improve the speed of my program.
Essentially what this does is:
Tracks market data by pulling the data from ...
3
votes
0 answers
68 views
Scraping reddit using Python
My objective is to find out which other subreddits users from r/(subreddit) are posting on; you can see my code below. It works pretty well, but I am curious to know if I could improve it by:
First, ...
3
votes
0 answers
45 views
Scraping local news sites
This is my first Python web scraper (and overall my first Python project). I am also relatively new to OOP but do understand its core fundamentals. The script below scrapes two local news sites for ...
3
votes
0 answers
299 views
Python web scraper
This is my first attempt at a sizeable amount of code in Python and I've made an attempt to standardize a script into a library so that it can be reused.
However, ...
3
votes
0 answers
159 views
Web scrape data for a list of stocks
I wrote a script that scrapes data for a list of stocks. The scraper has to get the data from 2 separate pages, so 2 different pages must be scraped for each stock symbol. If I run the process on a ...
3
votes
0 answers
181 views
Scrape a stock value and add it to a csv file along with other data. [Python 3]
Any and all suggestions are welcome, mainly looking for suggestions on clean up, and making it more readable as I feel that it is inefficient and not very clean.
Some specific things:
A better way ...
3
votes
0 answers
51 views
VIM colors downloader in Python, using multiprocessing
I recently posted this script:
VIM colors downloader in Python
But since I'm not allowed to update the code there, I wanted to get feedback on this version, which uses multiprocessing:
...
3
votes
0 answers
203 views
Checking paginated website for new entries
I'm interested in determining the best way to check a paginated website for new entries. I want to be able to scrape pages 1, 2, 3, ... as necessary to get all updates. However, the scraping is fairly ...
3
votes
0 answers
562 views
Scraping links from the first page of Google using Kivy
I'm making a scraper/web crawler using Kivy. When I run the code it works, but I'm not sure if what I'm doing is Pythonic, because all the material I can find is about using the Kivy library. I'm unsure ...
2
votes
0answers
64 views
Python script to scrape google maps without API
I wrote a Python script to scrape Google Maps for my app. I really want my code to be readable and have tried to follow PEP-8 wherever I could, so I have come to you all for guidance.
It uses selenium ...
2
votes
0 answers
32 views
requests vs selenium vs scrapy
This is a follow-up of my question over here.
I have been working on my web-scraping techniques by trying out different approaches and rewriting code for a handful of online databases.
I've recently ...
2
votes
0 answers
278 views
Discord Bot Python. Selenium screenshots
I wrote my own Discord bot which takes a screenshot of a specific website. This script is very simple and I thought it should work fast, but it doesn't. I read a lot, and I think I'm not able to improve ...
2
votes
0 answers
42 views
Scraping Forum Tables w/ Links Using Beautiful Soup
This code scrapes post links (among other information) from a table on a forum. While the current code works, I would like to know if there is a better/simpler way of writing it (maybe not as many for-...
2
votes
0 answers
834 views
Downloading multiple urls with aiohttp Python 3
I am trying to use the aiohttp library in Python to download information from URLs. I have about 300,000 URLs. They are saved in the file "my_file.txt". When I get a web page, I extract pairs of a question and ...
2
votes
0 answers
298 views
Wrapping a web scraper in a RESTful API
The problem I am looking to solve is wrapping a web scraper in a RESTful API such that it can be called programmatically from another application, frontend or microservice. The overall goal is that ...
2
votes
0 answers
968 views
Grabbing information traversing multiple pages
I've written a script in Python in combination with Selenium to parse different information from a webpage and store the collected data in a CSV file. The data are coming through correctly. The email ...
2
votes
0 answers
831 views
Recursively scrape links from web pages and check them
I'm new to programming and especially new to object oriented programming. I have built a web scraper using functional programming and am trying to build another using OOP principles.
The overall idea ...
2
votes
0 answers
158 views
Handling IndexError using lambda function within scrapy
I've written a script using Python's Scrapy library to parse some fields from Craigslist. The spider I've created here is far more basic than what is usually considered ideal for review. However, I'...
2
votes
0 answers
169 views
Football Web Scraper Part 2
A revised version of the code in this question.
Things I have done so far:
Adjusted some formatting things like constant & variable naming and indentation
Wrapped most of the functions into a ...
2
votes
0 answers
53 views
River Flood Warning System v2.1 - Those Pesky NoneTypes
Below is my River Flood Warning System version 2 build 1. After following the help and advice given for version 1 the whole program is looking and behaving much better. The original code would only ...
2
votes
0 answers
1k views
Google Searching Bot with Proxy support
I have been asked by a client to program a bot which searches Google and shows how many results I get.
Note: I know about Google Custom Search API and it will not produce the exact output ...
2
votes
0 answers
2k views
A web crawler for scraping images from stock photo websites
I created a web crawler that uses Beautiful Soup to crawl images from a website and scrape them to a database. In order to use it you have to create a class that inherits from Crawler and implements 4 ...
2
votes
0 answers
266 views
BeautifulSoup web spider for driver links
The following spider will grab some driver links, OS version, and the name.
All the info is in a table class, but some pages might be a little different in the location and number of cells in each row ...
2
votes
0 answers
329 views
Prototype spider for indexing RSS feeds
This code is super slow. I'm looking for advice on how to improve its performance.
...
1
vote
0 answers
16 views
Organizing things together to form a minimum viable Scraper App (part 2)
This is a follow-up of my question over here.
Response to @Reinderien's answer:
I have corrected the more trivial issues highlighted in @Reinderien's answer below as follows. ...
1
vote
0 answers
52 views
Optimising this web-scraping code
A member of SO who I immensely respect just told me that the code below makes him uncomfortable.
...
1
vote
0 answers
56 views
Efficient Multithreading in Python 3 - Webscraping behind a login
I'm a total beginner at Python and I would like someone to rate my code and give me tips about communicating between threads in Python. I don't know if I need this many threads for this script, but by ...
1
vote
0 answers
37 views
Webscraping code to import logs from a website which is about to die
We all know that the famous Twitch log site OverRustleLogs is getting shut down. So I decided to do some web scraping to download my favourite streamer's logs using BeautifulSoup. How can I make this ...
1
vote
0 answers
142 views
Web scraping dynamic content
Hi, I'm fairly new to coding and would appreciate some feedback on the code. This is for a site that has a dynamic login page and infinite scrolling.
I wanted to use scrapy not necessarily because it ...
1
vote
0 answers
44 views
bs4 HTML cleaner function with repetitive nested if clauses
I made a little web scraper that downloads the source HTML files as well. Now, for the sake of saving storage space, I wrote a small function to delete quite a bit of stuff in the HTML file (specific ...
1
vote
0 answers
111 views
Scraping and printing titles from Craigslist
I've written a very tiny script using a class to scrape some product titles from Craigslist. My intention is to make use of the __str__() method so that my script can ...
1
vote
0 answers
62 views
Python3.x Download(async) + Process(bs4) + Save(EPUB)
I had a simple webscraper with Beautiful Soup 4 which downloaded novel chapters from a website and converted them to an EPUB file. It was straight and simple imperative programming.
Then I thought, ...
1
vote
0 answers
237 views
Extracting data from a used car sales site
I am developing code for extracting data from a used car sales site. There are 4 sites in total. In 3 of them I use requests and beautifulsoup. The time taken to extract data from these sites was ...
1
vote
0 answers
435 views
Scraping web data using asynchronous request
I've written a script using python to grab different categories from a webpage. I used "grequests" in my scraper to perform the activity. My intention here was to perform the action swiftly making ...
1
vote
0 answers
1k views
GUI in Tkinter to log events for a web-scraper
I'm creating a GUI with tkinter that will handle starting, stopping, and logging events for a web scraper (scraper not created yet).
The current code is working... but I've been gathering my ...
1
vote
0 answers
2k views
Email a notification when detecting changes on a website - follow-up
I read through other questions here and improved the code and added a new feature. The old question can be found at: Email a notification when detecting changes on a website
The improvements that are ...
0
votes
0 answers
25 views
Adding page iteration capability to Requests scraper
I am trying to build on @Reinderien's answer to my previous question over here to add page iteration functionality to the code:
...
0
votes
0 answers
71 views
Asynchronous web scraping
This is my solution to a "vacancy test" task.
I'm not sure at all if I have correctly implemented the task, but here is my solution.
Goals of code:
Parse rows of table from a URL and ...