C#: parse HTML using XPathDocument

Question

I'm trying to parse an HTML page with XPathDocument, but it throws an error because the HTML is not valid XML. Is there a way to do this?

pinichi · Accepted Answer · 2010-10-15 07:25:09Z


You should use HtmlAgilityPack. It's still the best option!


Mikael Svenson · Answer 2 · 2010-10-15 07:25:50Z


Use something like Html Agility Pack, which loads your HTML into a DOM object that you can traverse with, for example, XPath queries.
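A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced. The class name `HapXPathDemo`, the helper `ExtractHrefs`, and the sample markup are made up for illustration; `HtmlDocument.LoadHtml`, `DocumentNode.SelectNodes`, and `GetAttributeValue` are the library calls being shown.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class HapXPathDemo
{
    // Hypothetical helper: collects the href of every <a> element that has one.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // for a file on disk, doc.Load(path) can be used instead

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return hrefs;
        }

        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", ""));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Deliberately sloppy markup: the <p> and <br> tags are never closed,
        // which an XML parser would reject.
        const string html =
            "<html><body><p>First<br><p>Second <a href='/next'>next</a></body></html>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

The null check matters: iterating the result of `SelectNodes` directly throws a `NullReferenceException` when the query matches nothing.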

Unless your HTML is in fact XHTML, it is usually not well-formed XML with correctly matched opening and closing tags.
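To illustrate the difference, here is a small sketch that uses only the .NET base class library. The class `XPathDocumentProbe`, the helper `TryLoad`, and both sample strings are hypothetical; the point is that `XPathDocument` accepts well-formed markup and throws an `XmlException` on typical tag soup.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class XPathDocumentProbe
{
    // Returns true if XPathDocument can parse the markup.
    // On failure, 'error' carries the XML parser's message.
    public static bool TryLoad(string markup, out string error)
    {
        try
        {
            using (var reader = new StringReader(markup))
            {
                var doc = new XPathDocument(reader);
                // Run a trivial query to show the document is usable.
                XPathNodeIterator paragraphs = doc.CreateNavigator().Select("//p");
                Console.WriteLine("Parsed OK, <p> elements: " + paragraphs.Count);
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // Well-formed: every tag is closed, so this is valid XML.
        string wellFormed = "<html><body><p>one</p><br /></body></html>";
        // Tag soup: <p> and <br> are left open, which is common in real HTML.
        string tagSoup = "<html><body><p>one<br></body></html>";

        foreach (string markup in new[] { wellFormed, tagSoup })
        {
            string error;
            if (!TryLoad(markup, out error))
            {
                Console.WriteLine("XPathDocument rejected the markup: " + error);
            }
        }
    }
}
```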


	I would like to upvote this answer, but HtmlAgilityPack does not work with the document I'm giving it. The LoadFile() method has no return value and does not throw an exception either. The document also returns nothing when I query it, so I assume the code has "silently failed"? – Conrad B Jan 2 at 15:46
	Hi @ConradB, have you tried the sample at htmlagilitypack.codeplex.com/wikipage?title=Examples? Load should not return anything, but it should let you loop over nodes and run selections. – Mikael Svenson Jan 2 at 20:29
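One way to investigate the "silent failure" described in the comments is sketched below. It assumes the `LoadFile()` mentioned above corresponds to `HtmlDocument.Load(path)`, and the file name `page.html`, the class `HapDiagnostics`, and the helper `CountElements` are placeholders. The idea is to print what the parser recorded in `ParseErrors` and to check whether any elements were parsed at all, since an XPath query that matches nothing yields `null` instead of raising an error.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class HapDiagnostics
{
    // Hypothetical helper: number of elements in the parsed document (0 if none).
    public static int CountElements(HtmlDocument doc)
    {
        // SelectNodes returns null when the query matches nothing.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*");
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main(string[] args)
    {
        // Placeholder path; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "page.html";

        var doc = new HtmlDocument();
        doc.Load(path); // returns void by design

        // The parser is lenient, so malformed markup is recorded here
        // instead of being thrown as an exception.
        foreach (HtmlParseError err in doc.ParseErrors)
        {
            Console.WriteLine("line " + err.Line + ", col " + err.LinePosition + ": " + err.Reason);
        }

        Console.WriteLine("Elements parsed: " + CountElements(doc));
    }
}
```

If the element count is greater than zero but a specific query still returns nothing, the XPath expression is the more likely culprit than the load step.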
