All Questions
Tagged with python web-scraping
47
questions
13
votes
3
answers
2k
views
AniPop - The anime downloader
Note: The topics of performance and Selenium/BS4 have not yet been addressed,
so this question can still receive a better answer!
Chat Room: https://chat.stackexchange.com/rooms/100275/anipop-...
5
votes
1
answer
590
views
Instagram scraper Posts (Videos and Photos)
I wrote this code which has the ability to download images and videos from a specific Instagram profile.
Using multiprocessing and threading I managed to speed up the extraction of data.
My goal is ...
3
votes
1
answer
3k
views
Scraping Instagram with selenium, extract URLs, download posts
I made a very simple Instagram Bot that can download images and videos of the user, like Gallery with photos or videos. It saves the data in the folder.
How it works:
Creating directory for saving ...
6
votes
1
answer
524
views
Web scraper that extracts urls from Amazon and eBay
Description: This is a simple script for scraping Amazon and eBay category, sub-category and product URLs and saving contents to files. In case of previously saved files, the files will be read and no ...
6
votes
1
answer
2k
views
Scraping Instagram - Download posts, photos - videos
Python script that can download images and videos from public and private profiles, like Gallery with photos or videos. It saves the data in the folder.
How it works:
Log in to Instagram using Selenium ...
5
votes
0
answers
1k
views
Scraping OddsPortal with requests only
This is a scraper written to do most of what had been attempted by another user in this question:
How can I optimise this webscraping code
I did the rewrite because I felt bad that the new user didn't ...
3
votes
1
answer
737
views
Web scraping using selenium, multiprocessing, InstagramBot
An Instagram Bot which downloads the posts from profile
I have to mention my previous posts:
Instagram scraper Posts (Videos and Photos)
Scraping Instagram with selenium, extract URLs, download ...
2
votes
1
answer
630
views
Instagram Scraping Using Selenium
Python script that can download images and videos of the user, like Gallery with photos or videos. It saves the data in the folder.
How it works:
Log in to Instagram using Selenium and navigate to ...
2
votes
1
answer
307
views
Download pictures (or videos) from Instagram using Selenium
Python script that can download images and videos from public and private profiles, like Gallery with photos or videos. It saves the data in the folder.
How it works:
Log in to Instagram using Selenium ...
2
votes
1
answer
1k
views
Instagram Scraping Posts Using Selenium
Python script that can download images and videos of the user, like Gallery with photos or videos. It saves the data in the folder.
How it works:
Log in to Instagram using Selenium and navigate to ...
1
vote
1
answer
718
views
Instagram Bot, selenium, web scraping
I made some changes in my code from the previous post.
The changes that I made:
I put all the functions into the class
I moved all the global arrays into the class too
Created ...
44
votes
3
answers
2k
views
We'll be counting stars
Lately, I've been, I've been losing sleep
Dreaming about the things that we could be
But baby, I've been, I've been praying hard,
Said, no more counting dollars
We'll be counting stars, yeah we'll be ...
20
votes
4
answers
10k
views
Web scraping the titles and descriptions of trending YouTube videos
This scrapes the titles and descriptions of trending YouTube videos and writes them to a CSV file. What improvements can I make?
...
14
votes
2
answers
840
views
VIM colors downloader in Python
Recently, I wanted to change my vim colors to something new. So I went to the vim colors website and then I decided that I wanted to download ALL the colors.
So I ...
12
votes
3
answers
1k
views
Minimal webcrawler - bad structure and error handling?
I did this code over one day as a part of a job application, where they wanted me to make a minimal webcrawler in any language. The purpose was to crawl a site, find all of the URLs on that page, and ...
10
votes
1
answer
636
views
River Flood Warning system in Python
This code represents my first real Python 3 program. It retrieves flood data from the NWS weather center for the river near my home and posts a warning to my Facebook page whenever certain flood ...
10
votes
1
answer
832
views
Let's read a random Goodreads book in an optimal way
I have made the following program to gather data on random books from Goodreads, via their random books feature.
...
9
votes
1
answer
2k
views
A library for interacting with Pinnacle Sports Bets API
My code provides the following functionality for interacting with Pinnacle Bets API:
retrieving betting history
retrieving fixtures (future events)
retrieving odds for the given leagues (competitions)...
8
votes
2
answers
2k
views
Image downloader for a website v2
This code takes a website and downloads all .jpg images in the webpage. It supports only websites that have the img element and src contains a .jpg link. The previous version can be found here
...
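A minimal sketch of the filtering rule described in this excerpt, not the poster's code: it collects the `src` of every `img` element that points at a .jpg file. The names `JpgImageParser` and `find_jpg_urls` are hypothetical, the standard-library `html.parser` is swapped in for whatever parser the question uses, and "contains a .jpg link" is assumed to mean that `src` ends in `.jpg`. Downloading the files is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JpgImageParser(HTMLParser):
    """Collects absolute URLs of <img> elements whose src ends in .jpg."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.jpg_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Assumption: only a src ending in ".jpg" counts as a .jpg link.
        if src and src.lower().endswith(".jpg"):
            # Resolve relative paths against the page URL.
            self.jpg_urls.append(urljoin(self.base_url, src))


def find_jpg_urls(html, base_url):
    """Hypothetical helper: return all .jpg image URLs found in html."""
    parser = JpgImageParser(base_url)
    parser.feed(html)
    return parser.jpg_urls


sample_html = '<p><img src="/a.jpg"><img src="b.png"><img alt="no src"></p>'
print(find_jpg_urls(sample_html, "https://example.com/gallery/"))
```

Images without a `src`, or with a different extension, are skipped, which matches the limitation the excerpt states.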
8
votes
2
answers
2k
views
Parsing Wikipedia table with Python
I am new to Python and recently started exploring web crawling. The code below parses the S&P 500 List Wikipedia page and writes the data of a specific table into a database.
While this script is ...
8
votes
2
answers
4k
views
Finding words that rhyme
Preface
I was trying to review this question on the same topic, but in the end many points I wanted to make were excellently explained by @ferada so I felt that posting my code and explaining the ...
6
votes
3
answers
692
views
Scraping a webpage copying with the logic of scrapy
Today, I came across a tutorial made by ScrapingHub on Scrapy about how it usually deals with a webpage while scraping its content. I could see that the same logic applied in Scrapy can be ...
6
votes
3
answers
9k
views
Email a notification when detecting changes on a website
The text of a website is checked in a given time period. If there are any changes, a mail is sent. There is an option to show/mail the new parts in the website. What could be improved?
...
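The check described in this excerpt could look roughly like the sketch below. It is not the poster's code: `text_fingerprint` and `new_lines` are hypothetical names, the page text is assumed to have been fetched already, and the mail step is omitted. Two snapshots are compared by hash, and the "new parts" are taken to be the lines that `difflib.ndiff` marks as added.

```python
import difflib
import hashlib


def text_fingerprint(text):
    """Hash the page text so two snapshots can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_lines(old_text, new_text):
    """Return the lines that ndiff reports as added in new_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes added lines with "+ "; strip that two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]


# Hypothetical snapshots standing in for two fetches of the same page.
old = "Welcome\nOpening hours: 9-5"
new = "Welcome\nOpening hours: 9-6\nClosed on Sunday"

if text_fingerprint(old) != text_fingerprint(new):
    # This is the point where a notification mail would be sent.
    print(new_lines(old, new))
```

Comparing hashes first means only the fingerprint of the previous snapshot needs to be kept when the new parts are not wanted; the full previous text is needed only for the diff.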
5
votes
1
answer
545
views
Ultra fast Amazon scraper multi-threaded
This is a follow up to the code here: Web scraper that extracts urls from Amazon and eBay
A multi-threaded modification to the previous version that is Amazon focused and most of the necessary ...
5
votes
1
answer
210
views
Richelieu - product scraper
I wanted to see how I would deal with a large amount of data being scraped and written into a CSV file, so I decided to get the info out of a random website.
First off, I found a way to search for ...
4
votes
1
answer
884
views
Cleaner way of appending data to List in BeautifulSoup
So I've been experimenting with various ways to get data from a variety of websites, such as using JSON or BeautifulSoup. Currently, I have written a scraper to collect data such as <...
4
votes
0
answers
124
views
The anime downloader [duplicate]
NOTE: Here's the latest version of this program, since this question idled out.
This is a recreational script made to update my home server w/ the latest anime from HorribleSubs. I'd like to know if ...
4
votes
2
answers
10k
views
Scraping HTML using Beautiful Soup
I have written a script using Beautiful Soup to scrape some HTML, do some processing, and produce HTML back. However, I am not convinced by my code and I am looking for some improvements.
Structure of ...
4
votes
1
answer
2k
views
Web scraper for Football (Soccer) data with BeautifulSoup and Requests
I wrote a web scraper to get football scores from here. I'm getting the data for all seasons for the three major German leagues. It all works at the moment, but I'm sure it's possible to make it a lot ...
4
votes
1
answer
697
views
Scraping a dynamic website with Scrapy (or Requests) and Selenium
I am trying to use Scrapy for one of the sites I've scraped before using Selenium over here.
Because the search field for this site is dynamically generated and requires the user to hover the cursor ...
4
votes
1
answer
333
views
Beginner web scraper for Nagios
I am attempting to learn Python. It was suggested to me to try a web scraper, so I thought to get myself to look at multiple Nagios instances. I have not programmed in Python before, but learned from ...
3
votes
2
answers
2k
views
Instagram Scraping Using Selenium - Download Posts - Photos - Videos
Python script that can download images and videos from public and private profiles, like Gallery with photos or videos. It saves the data in the folder.
How it works:
Log in to Instagram using Selenium ...
3
votes
1
answer
84
views
3
votes
0
answers
119
views
requests vs selenium vs scrapy
This is a follow-up of my question over here.
I have been working on my web-scraping techniques by trying out different approaches and rewriting code for a handful of online databases.
I've recently ...
3
votes
1
answer
478
views
Scraping current day Counter-Strike match results from a website
As a fan of competitive Counter-Strike, I like to keep up with who is currently winning and who is losing. There is a website that provides me with just that. I thought it would be cool if I could ...
3
votes
1
answer
147
views
Reaching the philosophy wiki page - Follow Up
This is a follow up to my original post:
I've written a class that will start from a random Wikipedia page, then choose the first link in the main body, and then navigate following the links until ...
3
votes
3
answers
2k
views
Image downloader for a website
This code takes a website and downloads all .jpg images in the webpage. It supports only websites that have the <img> element and ...
3
votes
1
answer
116
views
Reaching the philosophy wiki page
I've written a class that will start from a random Wikipedia page, then choose the first link in the main body, and then navigate following the links until it finds the Philosophy page. When I run the ...
3
votes
1
answer
589
views
Web crawlers for three image sites
I'm very new to Python and only vaguely remember OOP from doing some Java a few years ago, so I don't know what the best way to do this is.
I've built a bunch of classes that represent a crawler that ...
2
votes
1
answer
113
views
Take information from a webpage and compare to previous request
After making some improvements based on my previous code review, I have used that knowledge to upgrade my code and become a better coder, but now I'm here again asking for a code review where I think it could ...
2
votes
0
answers
2k
views
A web crawler for scraping images from stock photo websites
I created a web crawler that uses Beautiful Soup to crawl images from a website and scrape them to a database. In order to use it, you have to create a class that inherits from Crawler and implements 4 ...
1
vote
0
answers
20
views
Organizing things together to form a minimum viable Scraper App (part 2)
This is a follow-up of my question over here.
Response to @Reinderien's answer:
I have corrected the more trivial issues highlighted in @Reinderien's answer below as follows. ...
1
vote
2
answers
1k
views
Basic search engine
I want to improve efficiency of this search engine. It works in about 10 seconds for a search depth of 1, but 4 minutes at 2 etc.
I tried to give straightforward comments and variable names, any ...
1
vote
1
answer
5k
views
Crawl multiple pages at once
This is an update to my last question.
I want to process multiple pages at once pulling URLs from tier_list in the crawl_web ...
1
vote
0
answers
2k
views
Email a notification when detecting changes on a website - follow-up
I read through other questions here and improved the code and added a new feature. The old question can be found at: Email a notification when detecting changes on a website
The improvements that are ...
1
vote
1
answer
13k
views
Extract html content based on tags, specifically headers
I want the function to take as input a JSON file containing html_body with its corresponding URL and return a list of tuples containing headers and their corresponding URL (so could be tuple with one ...
1
vote
1
answer
118
views
Organizing things together to form a minimum viable Scraper App
This is a follow-up of my group of scraper questions starting from here.
I have thus far, with the help of @Reinderien, written 4 separate "modules" that expose a ...