Parsing HTML using JavaScript

Question

I'm working on a page that needs to fetch information from some other pages and then display parts of that information on the current page.

I have the HTML source code that I need to parse in a string, and I'm looking for a library that makes this easy. I just need to extract specific tags and the text they contain. The HTML is well-formed (all closing tags are present).

I've looked at some options, but all of them have been extremely difficult to work with, for various reasons.

I've tried the following solutions:

jkl-parsexml library (the library's JS file itself throws HTTPError 101)
jQuery.parseXML utility (I didn't find much documentation or many examples to figure out what to do)
XPath (the execute statement is not working, but the JS error console shows no errors)
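On the XPath route: once the string has been parsed into a Document (in a browser, for example with `new DOMParser().parseFromString(html, "text/html")`), `document.evaluate` returns an `XPathResult` object rather than an array, so the matches have to be read out of it. The following is a minimal sketch; `xpathAll` is a hypothetical helper name, and the literal `7` stands in for `XPathResult.ORDERED_NODE_SNAPSHOT_TYPE` so the snippet does not touch browser globals when it loads.

```javascript
// Hypothetical helper (not a library API): run an XPath expression against an
// already-parsed Document and return the matching nodes as a plain array.
// 7 === XPathResult.ORDERED_NODE_SNAPSHOT_TYPE: nodes in document order, as a
// static snapshot that does not change if the DOM is modified afterwards.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

function xpathAll(doc, expression, context = doc) {
  // evaluate(expression, contextNode, namespaceResolver, resultType, result)
  const result = doc.evaluate(expression, context, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

For example, `xpathAll(doc, "//div/p/span").map((n) => n.textContent)` collects the text of every span that is a child of a p that is itself a child of a div. Keep in mind that the HTML parser inserts implied elements such as `<tbody>`, so a path written against the raw source may need adjusting.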

So I'm looking for a more user-friendly library, or anything (tutorials, books, references, documentation) that would let me use the tools above more easily and efficiently.

An ideal solution would be something like BeautifulSoup in Python.

You could add it to the DOM, hide it, then access your elements with plain JS or jQuery. That's actually letting the browser parse it for you, and using JS to traverse the DOM. – bfavaretto, Sep 11 '12 at 22:53
The HTML I have is heavily nested (10-12 levels deep) and lacks class, name and id attributes, i.e. getElementById and similar functions are effectively useless. So recovering the required data would be a real bother that way. – ffledgling, Sep 11 '12 at 22:56
Hm. Take a look at jQuery selectors. They should be powerful enough. Something like "div p span" will find all spans located inside a div and then inside a p. "div>p>span" will do the same, but now the p must be a direct child of the div, and the span a direct child of that p. And there are a lot of other helpful selectors and functions in jQuery. – Viktor S., Sep 11 '12 at 23:00
@bfavaretto I can't say for sure that a custom parser will make the job easier, but this was the first approach I tried and it was extremely time-consuming. I was hoping that the parser would give me nested dictionaries which I could loop through more easily. – ffledgling, Sep 11 '12 at 23:03
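The last comment asks for nested dictionaries. A minimal sketch of that idea, assuming the page has already been parsed by the browser (for example with `DOMParser`), is to walk the DOM once and copy it into plain objects. `toTree` and its `{ tag, attrs, children }` shape are hypothetical, not any library's format: text nodes become trimmed strings, while whitespace-only text, comments and other node types are dropped.

```javascript
// Hypothetical helper (not a library API): copy a DOM subtree into nested
// plain objects so it can be looped over like nested dictionaries.
const ELEMENT_NODE = 1; // Node.ELEMENT_NODE
const TEXT_NODE = 3;    // Node.TEXT_NODE

function toTree(node) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    return text === "" ? null : text; // skip indentation/whitespace-only text
  }
  if (node.nodeType !== ELEMENT_NODE) return null; // skip comments etc.

  // node.attributes is a NamedNodeMap; each entry has .name and .value
  const attrs = {};
  for (const { name, value } of Array.from(node.attributes)) {
    attrs[name] = value;
  }

  const children = [];
  for (const child of Array.from(node.childNodes)) {
    const converted = toTree(child);
    if (converted !== null) children.push(converted);
  }
  // tagName is upper-case for HTML elements, so normalise it
  return { tag: node.tagName.toLowerCase(), attrs, children };
}
```

In a browser, `toTree(new DOMParser().parseFromString(pagetext, "text/html").body)` gives an object that can be walked with ordinary loops, with no need for ids or classes. Dropping whitespace-only text keeps child positions independent of how the source happens to be indented.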

Elliot Bonneville · Accepted Answer · 2012-09-11 22:56:08Z

Using jQuery, it would be as simple as `$(HTMLstring)`, which creates a jQuery object containing the DOM nodes parsed from the string (this DOM is disconnected from your document). From there it's very easy to do whatever you want with it, and traversing the loaded data is, of course, a cinch with jQuery.

I'm not sure if this is a problem with my code or the HTML itself, but I get "Error: Invalid XML" when I try this. Here is the code I used: `var htmlDoc = $.parseXML(pagetext); var $html = $(htmlDoc); $html.find("body");` – ffledgling Sep 11 '12 at 23:07

@Ayos: I would guess it's because you're passing something into `.parseXML` that is not valid XML. What are the contents of `pagetext`? – Elliot Bonneville Sep 11 '12 at 23:09

The page contains HTML with CSS in the head and JavaScript within <script> tags. It's basically the entire source code of a website, obtained via XHR's responseText. – ffledgling Sep 11 '12 at 23:13

Try `var $html = $(pagetext)` directly, then. – Elliot Bonneville Sep 11 '12 at 23:13
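The "Invalid XML" error is likely expected here: `$.parseXML` runs a strict XML parser, and real pages commonly contain things XML rejects, such as `&nbsp;` or a bare `<` inside a script. `$(pagetext)` avoids that, but jQuery's documentation notes that browsers may filter out elements such as `<html>`, `<title>` and `<head>` when parsing a string this way, so `.find("body")` on a whole page may come back empty. In the browser, the standard `DOMParser` API with `"text/html"` parses a full page into a separate Document instead, and scripts in it are not executed. A minimal sketch follows; `extractText` and `parseHtmlDocument` are hypothetical names, and the `parse` parameter exists only so the helper can be exercised outside a browser.

```javascript
// Hypothetical helpers (not a library API).
function parseHtmlDocument(html) {
  // "text/html" selects the forgiving HTML parser rather than the strict
  // XML one, so &nbsp;, <br> and inline scripts do not cause parse errors.
  return new DOMParser().parseFromString(html, "text/html");
}

// Parse a whole page (e.g. an XHR responseText) and return the trimmed text
// of every element matching a CSS selector such as "div p span" or
// "div > p > span".
function extractText(html, selector, parse = parseHtmlDocument) {
  const doc = parse(html);
  return Array.from(doc.querySelectorAll(selector), (el) => el.textContent.trim());
}
```

Usage would look like `extractText(xhr.responseText, "div > p > span")`, where `xhr` stands for whatever request object fetched the page. Because the result is a real Document, `doc.body` and `doc.head` are available too, unlike the wrapper-less node list that `$(pagetext)` produces.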

Viktor S. · Answer 2 · 2012-09-11 22:56:52Z

You can do something like this:

$("string with html here").find("jquery selector")

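`$("string with html here")` parses your HTML into detached DOM nodes (jQuery builds them in a document fragment), and `find` then searches for matching elements inside those nodes only. None of this is inserted into the page's DOM. Note that `find` only looks at descendants of the top-level parsed elements, so a selector that should match a top-level element itself needs `filter` instead.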
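Without jQuery, a comparable detached fragment can be built with a `<template>` element: the browser parses the snippet into `template.content`, a DocumentFragment whose scripts do not run and whose images are not loaded, and `querySelectorAll` on that fragment matches elements at any depth, including the snippet's top-level ones. This is a minimal sketch suited to snippets rather than whole pages; `queryFragment` is a hypothetical name, and the `doc` parameter is there only so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper (not a library API): parse an HTML snippet into an
// inert DocumentFragment and return the elements matching a CSS selector.
// Nothing is inserted into the page.
function queryFragment(html, selector, doc = globalThis.document) {
  const template = doc.createElement("template");
  template.innerHTML = html; // parsed into template.content, not into the page
  return Array.from(template.content.querySelectorAll(selector));
}
```

For instance, `queryFragment("<div><p><span>a</span></p></div>", "div > p > span")` should return the single span, which `.textContent` then reads as `"a"`.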
