All Questions

2 votes
0 answers
68 views

Simplified HTML parsing for LEGO features

The goal is to extract the Features section from a LEGO product page. In the Features section, there is usually a header (...
alvas (709 reputation)
1 vote
1 answer
54 views

Extracting information from HTML with XSLT 3.0 when data is grouped visually as siblings in a td separated by blank lines

I have a work-in-progress where I'm using XSLT 3 to extract information from some preprocessed archaic HTML. I'd like to produce JSON showing the relationships between the various entities for further ...
Forensic_07
2 votes
1 answer
2k views

Parsing scraped data from an HTML table

I've written a simple Python web scraper that parses text from an HTML table and stores the scraped data in a list of dictionaries. The code works and doesn't seem to have any glaring issues performance-...
loremIpsum1771
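The table-to-dictionaries step described in this question can be sketched with the standard-library `html.parser` module. This is a swapped-in technique, not the asker's scraper (which is not shown here). The class `TableParser`, the helper `table_to_dicts`, and the sample table are hypothetical, and the sketch assumes the first row of the table holds the column headers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the <tr> currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep it only inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_dicts(html):
    """Use the first row as headers and map every later row onto them."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


sample = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>4.50</td></tr>
</table>
"""
print(table_to_dicts(sample))
# [{'Name': 'Widget', 'Price': '9.99'}, {'Name': 'Gadget', 'Price': '4.50'}]
```

Zipping each body row onto the header row is what turns flat rows into one dictionary per row; a row shorter than the header simply yields fewer keys, because `zip` stops at the shorter input.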
2 votes
1 answer
1k views

Python script to scrape the titles of a public YouTube playlist

I just started with Python and wrote a script that gets all the titles in a public YouTube playlist given as input, but it got messier than it needs to be. I looked around online and found ...
Honest Escape
3 votes
2 answers
1k views

Performance and Readability Improvements for HTML Parser with BeautifulSoup

This function takes a JSON file as an argument (it could contain anything in JSON format, since I scrape hundreds of random pages) and returns a list of dictionaries where a URL is mapped to its ...
oba2311 (197 reputation)
1 vote
1 answer
14k views

Extract HTML content based on tags, specifically headers

I want the function to take as input a JSON file containing html_body with its corresponding URL and return a list of tuples containing headers and their corresponding URL (so it could be a tuple with one ...
oba2311 (197 reputation)
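A minimal sketch of the header extraction this question describes, using the standard-library `html.parser` instead of BeautifulSoup. It assumes (hypothetically) that the JSON file holds a list of objects with `url` and `html_body` keys, and it emits one `(header, url)` tuple per `<h1>` through `<h6>` element; the names `HeaderParser`, `extract_headers`, and `extract_headers_from_file` are illustrative, not taken from the question.

```python
import json
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderParser(HTMLParser):
    """Collect the text content of every <h1>..<h6> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._parts = None  # text fragments of the header being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._parts = []

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS and self._parts is not None:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            if text:
                self.headers.append(text)
            self._parts = None

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) is still collected.
        if self._parts is not None:
            self._parts.append(data)


def extract_headers(records):
    """records: iterable of dicts with 'url' and 'html_body' keys (assumed layout)."""
    result = []
    for record in records:
        parser = HeaderParser()
        parser.feed(record["html_body"])
        parser.close()
        result.extend((header, record["url"]) for header in parser.headers)
    return result


def extract_headers_from_file(path):
    """Load the (assumed) JSON list of records from disk and extract headers."""
    with open(path, encoding="utf-8") as f:
        return extract_headers(json.load(f))


pages = [{"url": "https://example.com/a",
          "html_body": "<h1>Intro</h1><p>text</p><h2>Details <em>here</em></h2>"}]
print(extract_headers(pages))
# [('Intro', 'https://example.com/a'), ('Details here', 'https://example.com/a')]
```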
3 votes
1 answer
749 views

Scraping data from a table in Python

I'm new to Python, and after doing a few tutorials, some of them about scraping, I've been trying some simple scraping on my own. Using BeautifulSoup, I can get data from web pages where everything has ...
Pablo (33 reputation)
2 votes
1 answer
160 views

Optimizing Java HTML parser

I wrote a program that goes through a web page and returns regex matches. I used it on my letterboxd.com account to go through all of my movies (over 900 entries) and then find the genres field for each ...
mlukas (21 reputation)
4 votes
1 answer
291 views

HTML Scraper for Plex downloads page

I have written a scraper in Python 3 using Beautiful Soup 4 to retrieve the latest version of Plex Media Server from https://plex.tv, and I'd like some feedback on how to improve it. The HTML the ...
Jack Wilsdon (1,661 reputation)
3 votes
0 answers
146 views

Using Nokogiri to scrape Oscars winners from Wikipedia

I am scraping a Wikipedia page, getting info from that page and instantiating a new object with the information collected: ...
Cyzanfar (223 reputation)
5 votes
2 answers
289 views

Press any login button on any site

I'm working on a script that will be able to press the login button on any site for an app I'm working on. I have it working (still a few edge cases to work out such as multiple submit buttons and ...
Levi Fuller
1 vote
1 answer
173 views

Program to create list of all English Wikipedia articles

This program scrapes Wikipedia to create a list of all English Wikipedia articles. How can I improve it, given that its performance is currently very poor? On my Internet connection ...
Dominik Schmidt
2 votes
0 answers
139 views

Compressing a blog into a preview using tumblr_api_read

Here is what I currently have working. I would like to make it look more aesthetically pleasing, so that the previews don't cut off words mid-word. I also don't want the two previews to be so much larger than the other. ...
raney24 (21 reputation)
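Cutting a preview at a word boundary, as this question asks, can be sketched as follows. This is not the asker's `tumblr_api_read` code; the function name `preview`, the 200-character default, and the `...` suffix are assumptions. Applying the same `limit` to every post also keeps long previews to a similar length.

```python
def preview(text, limit=200, suffix="..."):
    """Return at most `limit` characters of text (plus suffix), never splitting a word."""
    text = " ".join(text.split())  # normalise runs of whitespace/newlines first
    if len(text) <= limit:
        return text
    # Look one character past the limit: if that character is a space,
    # the word ending exactly at `limit` is complete and can be kept.
    cut = text[:limit + 1]
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]      # drop the partial last word
    else:
        cut = text[:limit]     # one very long word: fall back to a hard cut
    return cut.rstrip(" ,;:.") + suffix


print(preview("The quick brown fox jumps over the lazy dog", 20))
# The quick brown fox...
```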
5 votes
3 answers
111 views

Clean up repeated file.writes, if/elses when adding keys to a dict

I'm getting familiar with Python and I'm still learning its tricks and idioms. Is there a better way to implement print_html() without the multiple calls to <...
Creek (305 reputation)
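One common way to address both halves of this question, sketched under assumptions: `collections.defaultdict` removes the "is the key already present?" if/else, and building the output as a list of lines that is joined and written once replaces the repeated `file.write` calls. The data layout (a dict mapping headings to lists of strings) and the helper names `group_by_key` and `render_html` are hypothetical; only the name `print_html` comes from the question, and the body given for it here is illustrative.

```python
from collections import defaultdict
from html import escape


def group_by_key(pairs):
    """Group (key, value) pairs without an if/else check for existing keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # a missing key starts as an empty list
    return dict(groups)


def render_html(groups):
    """Build the whole document as a list of lines, then join it once."""
    lines = ["<html>", "<body>"]
    for key, values in groups.items():
        lines.append(f"<h2>{escape(key)}</h2>")
        lines.append("<ul>")
        lines.extend(f"<li>{escape(v)}</li>" for v in values)
        lines.append("</ul>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


def print_html(groups, path):
    """Illustrative version: a single write instead of many file.write calls."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_html(groups))


groups = group_by_key([("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")])
print(render_html(groups))
```

Keeping rendering (`render_html`) separate from writing (`print_html`) also makes the HTML easy to check without touching the file system.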
4 votes
2 answers
10k views

Scraping HTML using Beautiful Soup

I have written a script using Beautiful Soup to scrape some HTML, process it, and produce HTML back. However, I am not satisfied with my code and am looking for some improvements. Structure of ...
avi (993 reputation)
