100
votes
6 answers
16k views

What are the pros and cons of the leading Java HTML parsers?

Searching SO and Google, I've found that there are a few Java HTML parsers which are consistently recommended by various parties. Unfortunately it's hard to find any information on the strengths and ...
139
votes
8 answers
253k views

How to send an HTTP request in Java?

I want to compose an HTTP request message in Java and then send it to an HTTP web server. I also want the document content of the page, which I would have received if I had sent the same ...
120
votes
18 answers
122k views

Removing HTML from a Java String

Is there a good way to remove HTML from a Java string? A simple regex like replaceAll("\\<.*?>","") will work, but things like &amp; won't be converted correctly and non-HTML between ...
37
votes
2 answers
57k views

Java: How to decode HTML character entities in Java like HttpUtility.HtmlDecode?

Basically, I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" -> " ", "&gt;" -> ">". In .NET we can make use of HttpUtility.HtmlDecode. What's ...
39
votes
11 answers
58k views

Java HTML Parsing [closed]

I'm working on an app which scrapes data from a website and I was wondering how I should go about getting the data. Specifically I need data contained in a number of div tags which use a specific CSS ...
35
votes
11 answers
21k views

Converting HTML files to PDF [closed]

I need to automatically generate a PDF file from an existing (X)HTML document. The input files (reports) use a rather simple, table-based layout, so support for really fancy JavaScript/CSS stuff is ...
12
votes
10 answers
33k views

Convert Word doc to HTML programmatically in Java

I need to convert a Word document into HTML file(s) in Java. The function will take a Word document as input and output HTML file(s) based on the number of pages the Word document has, i.e. ...
47
votes
3 answers
32k views

Which HTML parser is best? [closed]

I code a lot of parsers. Until now, I have been using the HtmlUnit headless browser for parsing and browser automation. Now I want to separate the two tasks. As 80% of my work involves just parsing, I ...
51
votes
6 answers
50k views

Recommended method for escaping HTML in Java

Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other than manually doing the following, that is). String source = "The less than ...
5
votes
7 answers
2k views

Swing HTML drawString

I'm trying to create a special-purpose component on which I need to draw an HTML string. Here's some sample code: public class MyComponent extends JComponent{ public ...
1
vote
1 answer
517 views

Inconsistent performance applying ForegroundActions in a JEditorPane when reading HTML

I'm building an HTML editor using JEditorPane, but I'm getting some inconsistent performance issues with Foreground Actions. I have a simplified version of my editor below that has three actions: ...
19
votes
10 answers
12k views

How to “scan” a website (or page) for info, and bring it into my program?

Well, I'm pretty much trying to figure out how to pull information from a webpage, and bring it into my program (in Java). For example, if I know the exact page I want info from, for the sake of ...
21
votes
7 answers
12k views

HTML/XML Parser for Java [closed]

What HTML parsers have the following features: fast; thread-safe; reliable and bug-free; parses HTML and XML; handles erroneous HTML; has a DOM implementation; supports HTML4, JavaScript, and CSS tags ...
5
votes
4 answers
3k views

Loading images from jars for Swing HTML

While this answer works to load images from Jar files for ImageIcons, I cannot seem to get the right path for images referenced in Swing HTML. This displays an image in the Swing HTML when the ...
11
votes
3 answers
2k views

What HTML parsing libraries do you recommend in Java? [closed]

I want to parse some HTML in order to find the values of some attributes/tags etc. What HTML parsers do you recommend? Any pros and cons?
