Tagged Questions
100
votes
6 answers
16k views
What are the pros and cons of the leading Java HTML parsers?
Searching SO and Google, I've found that there are a few Java HTML parsers which are consistently recommended by various parties. Unfortunately it's hard to find any information on the strengths and ...
139
votes
8 answers
253k views
How to send HTTP request in java?
I want to compose an HTTP request message in Java and then send it to an HTTP web server.
I also want the document content of the page received, which I would have received if I had sent the same ...
120
votes
18 answers
122k views
Removing HTML from a Java String
Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\<.*?>","")
will work, but things like
&amp;
won't be converted correctly and non-HTML between ...
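To make the concern in this excerpt concrete, here is a minimal sketch of the regex approach and where it falls short. The class name `StripTagsSketch`, the method name `stripTags`, and the sample input are assumptions made for illustration; they do not come from the question.

```java
public class StripTagsSketch {

    // Naive approach: delete everything that looks like a tag.
    // The reluctant quantifier stops at the first '>' after each '<'.
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input containing an entity and bare '<' / '>' characters.
        String input = "<p>Fish &amp; chips, 1 < 2 and 3 > 2</p>";

        // The tags are gone, but "&amp;" is left undecoded, and the plain
        // text between the bare '<' and '>' is deleted along with them.
        System.out.println(stripTags(input));
        // prints: Fish &amp; chips, 1  2
    }
}
```

Both shortcomings come from treating HTML as flat text. A real HTML parser would be the usual way around them, though which one fits is outside what this excerpt covers.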
37
votes
2 answers
57k views
Java: How to decode HTML character entities in Java like HttpUtility.HtmlDecode?
Basically I would like to decode a given HTML document and replace all special chars, such as "&nbsp;" -> " ", "&gt;" -> ">".
In .NET we can make use of HttpUtility.HtmlDecode.
What's ...
39
votes
11 answers
58k views
Java HTML Parsing [closed]
I'm working on an app which scrapes data from a website and I was wondering how I should go about getting the data. Specifically I need data contained in a number of div tags which use a specific CSS ...
35
votes
11 answers
21k views
Converting HTML files to PDF [closed]
I need to automatically generate a PDF file from an existing (X)HTML document. The input files (reports) use a rather simple, table-based layout, so support for really fancy JavaScript/CSS stuff is ...
12
votes
10 answers
33k views
Convert Word doc to HTML programmatically in Java
I need to convert a Word document into HTML file(s) in Java. The function will take a Word document as input, and the output will be HTML file(s) based on the number of pages the Word document has, i.e. ...
47
votes
3 answers
32k views
Which Html Parser is best? [closed]
I code a lot of parsers. Until now, I have been using the HtmlUnit headless browser for both parsing and browser automation.
Now I want to separate the two tasks.
As 80% of my work involves just parsing, I ...
51
votes
6 answers
50k views
Recommended method for escaping HTML in Java
Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other than manually doing the following, that is).
String source = "The less than ...
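For reference, the manual approach the question alludes to could look something like the sketch below. The question's own snippet is truncated above, so this is not a reconstruction of it. The class name `EscapeHtmlSketch` and method name `escapeHtml` are hypothetical, and only the four characters named in the question are handled.

```java
public class EscapeHtmlSketch {

    // Replace the four characters named in the question with their entities.
    // Working character by character means '&' is never escaped twice.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '&': out.append("&amp;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("1 < 2 & \"x\" > y"));
        // prints: 1 &lt; 2 &amp; &quot;x&quot; &gt; y
    }
}
```

A single pass is used instead of chained `replace` calls because chaining is order-sensitive: replacing `&` after `<` would turn `&lt;` into `&amp;lt;`.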
5
votes
7 answers
2k views
Swing HTML drawString
I'm trying to create a special component for a specific purpose. On that component I need to draw an HTML string. Here's some sample code:
public class MyComponent extends JComponent{
public ...
1
vote
1 answer
517 views
Inconsistent performance applying ForegroundActions in a JEditorPane when reading HTML
I'm building an HTML editor using JEditorPane, but I'm getting some inconsistent performance issues with Foreground Actions. I have a simplified version of my editor below that has three actions: ...
19
votes
10 answers
12k views
How to “scan” a website (or page) for info, and bring it into my program?
Well, I'm pretty much trying to figure out how to pull information from a webpage, and bring it into my program (in Java).
For example, if I know the exact page I want info from, for the sake of ...
21
votes
7 answers
12k views
HTML/XML Parser for Java [closed]
What HTML parsers have the following features:
Fast
Thread-safe
Reliable and bug-free
Parses HTML and XML
Handles erroneous HTML
Has a DOM implementation
Supports HTML4, JavaScript, and CSS tags
...
5
votes
4 answers
3k views
Loading images from jars for Swing HTML
While this answer works to load images from Jar files for ImageIcons, I cannot seem to get the right path for images referenced in Swing HTML.
This displays an image in the Swing HTML when the ...
11
votes
3 answers
2k views
What HTML parsing libraries do you recommend in Java [closed]
I want to parse some HTML in order to find the values of some attributes/tags etc.
What HTML parsers do you recommend? Any pros and cons?