All Questions
Tagged with web-scraping python
429 questions
3 votes, 1 answer, 106 views
Multi-Page Web Scraping Code Using Selenium with Multithreading
I have written a web scraping script using Selenium to crawl blog content from multiple URLs. The script processes URLs in batches of 1000 and uses multithreading with the ThreadPoolExecutor to ...
5 votes, 2 answers, 687 views
Readability and error handling improvements for Python web scraping class
Description
I recently wrote a Python script to download files from the Library of Congress (LOC) based on a search query. The code fetches metadata, extracts file ...
2 votes, 1 answer, 80 views
Scrapy Spider for fetching product data from multiple pages of a website
I have written a Scrapy spider to scrape product data from a website. The spider navigates through multiple pages to reach a specific product and extracts details such as the product name, price, ...
3 votes, 2 answers, 93 views
Validating a web crawler's page visits with a decorator
I am writing a crawler that is going to end up in production and I was trying to come up with a way to validate its page visits. It scrapes ASP.NET pages, so each scraping process involves a few ...
5 votes, 3 answers, 839 views
Code format and steps for web scraping using Beautiful Soup
I've done some simple web scraping and want to make sure all my steps are correct. Is it considered clean code? Is there a better way to use the multi-page scraping feature? ...
0 votes, 2 answers, 158 views
Drayage Webscraper: Limited to table structure
This is my first working scraper. I'm sure a lot can be improved. My biggest question is how can I better specify what data to pull? All the data I'm currently grabbing is needed, but I couldn't ...
2 votes, 1 answer, 72 views
A Selenium web scraper to package NBA data
I'm building a Selenium web scraper for basketball-reference.com that takes a player name and returns data as either JSON or a pandas DataFrame object. The class in question is one of many that ...
5 votes, 1 answer, 196 views
Scraping the Divar.ir
I've written code to scrape Divar, which is the Iranian equivalent of eBay. I have a few questions:
Am I doing the error handling and logging ok?
Is there a better way to optimize this code? (note ...
1 vote, 2 answers, 186 views
Web scraping spider
I'm currently working on my first web scraping project and I need to scrape a lot of websites. With my current code it takes more than a day, but for my project I need to scan the same websites every 5 ...
0 votes, 1 answer, 116 views
Poetry Web Scraping in Python [closed]
I have a script that obtains URLs that lead to a specific poem. The code currently works and uses multiprocessing pools. I am currently being restricted or blocked in some way by the website that I ...
3 votes, 1 answer, 72 views
HTTP scraper for Python Package
I'm trying to make my first Python package as a learning experience. There are a lot of things that I suspect I am doing poorly, but this post is specifically about my HttpRequest class. I made this ...
3 votes, 1 answer, 72 views
URL link scraper and analyser
I recently wrote a testing tool (called plink) for retrieving all the links from a website (and then retrieving links from the linked pages, and so on).
Essentially,...
4 votes, 2 answers, 266 views
Test generator I made for practice
I made this generator to practice importing from other modules and writing more readable code. What could I have done better and what did I do wrong?
File called test_generator.py
...
2 votes, 1 answer, 116 views
Search Stack Overflow and GitHub for code in a specified language
This code is designed to scrape Stack Overflow and GitHub, pulling information based on a user-specified programming language and processing the data into a format for AI learning.
It uses a number of ...
3 votes, 1 answer, 229 views
A simple web scraper for nature.com news articles
I have created a simple web scraper that fetches news article previews from nature.com and saves each article to a file containing the article preview text.
I am learning independently, so I would ...