Newest 'web-scraping+html' Questions

2 votes

0 answers

68 views

Simplified HTML parsing for LEGO features

The goal is to extract the Features section from a Lego product page. In the Features section, usually there's a header (...

alvas

709

asked Apr 7, 2023 at 7:29

1 vote

1 answer

54 views

Extracting information from HTML with XSLT 3.0 when data is grouped visually as siblings in a td separated by blank lines

I have a work-in-progress where I'm using XSLT 3 to extract information from some preprocessed archaic HTML. I'd like to produce JSON showing the relationships between the various entities for further ...

Forensic_07

135

asked Feb 1, 2022 at 2:24

2 votes

1 answer

2k views

Parsing scraped data from html table

I've written a simple Python web scraper that parses text from an HTML table and stores the scraped data in a list of dictionaries. The code works and doesn't seem to have any glaring issues performance-...

loremIpsum1771

469

asked Jul 17, 2018 at 2:47

2 votes

1 answer

1k views

Python script to scrape titles of public Youtube playlist

Just started in Python; wrote a script to get the names of all the titles in a public Youtube playlist given as input, but it got messier than it might have to be. I looked around online and found ...

Honest Escape

23

asked Feb 2, 2018 at 5:05

3 votes

2 answers

1k views

Performance and Readability Improvements for HTML Parser with BeautifulSoup

This function takes as an argument a JSON file (could contain anything in JSON format, since I scrape hundreds of random pages) and returns a list of dictionaries where a URL is mapped to its ...

oba2311

197

asked Jun 29, 2017 at 12:03

1 vote

1 answer

14k views

Extract html content based on tags, specifically headers

I want the function to take as input a JSON file containing html_body with its corresponding URL and return a list of tuples containing headers and their corresponding URL (so could be tuple with one ...

oba2311

197

asked Jun 26, 2017 at 13:36

3 votes

1 answer

749 views

Scraping data from a table in python

I'm new to python, and after doing a few tutorials, some about scraping, I've been trying some simple scraping on my own. Using BeautifulSoup I manage to get data from web pages where everything has ...

Pablo

33

asked Mar 30, 2017 at 19:15

2 votes

1 answer

160 views

Optimizing Java HTML parser

I wrote a program that goes through a webpage and returns matches of regex. I used it on my letterboxd.com account to go through all of my movies (over 900 entries) and then find genres field for each ...

mlukas

21

asked Jul 22, 2016 at 17:55

4 votes

1 answer

291 views

HTML Scraper for Plex downloads page

I have written a scraper in Python 3 using Beautiful Soup 4 to retrieve the latest version of Plex Media Server from https://plex.tv, and I'd like some feedback on how to improve it. The HTML the ...

Jack Wilsdon

1,661

asked Jan 19, 2016 at 19:40

3 votes

0 answers

146 views

Using Nokogiri to scrape Oscars winners from Wikipedia

I am scraping a Wikipedia page, getting info from that page and instantiating a new object with the information collected: ...

Cyzanfar

223

asked May 24, 2015 at 18:33

5 votes

2 answers

289 views

Press any login button on any site

I'm working on a script that will be able to press the login button on any site for an app I'm working on. I have it working (still a few edge cases to work out such as multiple submit buttons and ...

Levi Fuller

163

asked Feb 1, 2015 at 18:44

1 vote

1 answer

173 views

Program to create list of all English Wikipedia articles

This program will scrape Wikipedia to create a list of all English Wikipedia articles. How can I improve this program as it currently performs very badly performance-wise? On my Internet connection ...

Dominik Schmidt

143

asked Nov 20, 2014 at 19:37

2 votes

0 answers

139 views

Compressing a blog into a preview using tumblr_api_read

Here is what I currently have working. I would like to make it look more aesthetically pleasing, so that it does not cut off words mid-word. I would also like the two previews not to be so much larger than the other. ...

raney24

21

asked Oct 14, 2014 at 21:47

5 votes

3 answers

111 views

Clean up repeated file.writes, if/elses when adding keys to a dict

I'm getting familiar with Python and I'm still learning its tricks and idioms. Is there a better way to implement print_html() without the multiple calls to <...

Creek

305

asked Jun 27, 2014 at 22:19

4 votes

2 answers

10k views

Scraping HTML using Beautiful Soup

I have written a script using Beautiful Soup to scrape some HTML and do some stuff and produce HTML back. However, I am not convinced with my code and I am looking for some improvements. Structure of ...

avi

993

asked Sep 9, 2013 at 12:05
