I'm trying to parse an HTML page with XPathDocument, but it throws an error because the HTML is not well-formed XML. Is there a way to do this?

check here: stackoverflow.com/questions/56107/… – pinichi Oct 15 '10 at 7:26

2 Answers


You should use HtmlAgilityPack. It's still the best option for this.


Use something like Html Agility Pack, which can load your HTML into a DOM object that you can then traverse with, for example, XPath queries.

Unless your HTML is in fact XHTML, it is usually not well-formed XML: opening tags are often left without matching closing tags.
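To make the point above concrete, here is a minimal sketch of a strict XML parser rejecting ordinary HTML while accepting the equivalent XHTML. It uses Python's standard library purely as a self-contained illustration (the question itself is about .NET's XPathDocument, which is likewise a strict XML parser); the sample markup and the `parses_as_xml` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Typical "tag soup" HTML: <br> and <p> are never closed.
html_soup = "<html><body><p>Hello<br>world</body></html>"

# The same content written as well-formed XHTML.
xhtml = "<html><body><p>Hello<br/>world</p></body></html>"


def parses_as_xml(markup: str) -> bool:
    """Hypothetical helper: True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(html_soup))  # False: </body> arrives while <br> is still open
print(parses_as_xml(xhtml))      # True

# Once the markup is well-formed, XPath-style queries work as expected.
root = ET.fromstring(xhtml)
print([p.text for p in root.findall(".//p")])  # ['Hello']
```

A tolerant HTML parser such as Html Agility Pack exists precisely to handle the first case, building a DOM from markup that a strict XML parser refuses.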

I would like to upvote this answer, but HtmlAgilityPack does not work with the document I'm giving it: the LoadFile() method does not have a return value and does not throw an exception either. The document also doesn't return anything when I query it, so I'm assuming the code has "silently failed" when this happens? – Conrad B Jan 2 at 15:46
Hi @ConradB, have you tried the sample at htmlagilitypack.codeplex.com/wikipage?title=Examples? Load is not supposed to return anything, but after calling it you should be able to loop over the nodes and run selections. – Mikael Svenson Jan 2 at 20:29
