I have a web scraper developed using C#, Windows Forms and the HTML Agility Pack.
I had it all working great until the site changed its code and broke it. I know this happens often with web scrapers, but now I am having trouble figuring out how to fix it.
At the moment my scraper loops through multiple URLs and scrapes data from each page.
The problem I am running into is that the site randomly serves some pages with a newer template, which does not use the same HTML classes and IDs that I have defined in the program. What I am trying to do is run a simple if that checks whether a single node is null and, if it is, runs a separate set of code for the new template.
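One way to structure "check one node, then run a separate set of code" is to detect the template once per page and dispatch on the result. The sketch below assumes that a page uses the old template exactly when it contains the question's `//h1[@class='producttitle']` node; `PageTemplate`, `TemplateDetector` and `ScrapeWith` are hypothetical names, and only `HtmlDocument`, `HtmlNode` and `SelectSingleNode` come from the HTML Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

// Which page layout a downloaded document uses (hypothetical enum).
public enum PageTemplate
{
    Old,
    New
}

public static class TemplateDetector
{
    // Marker node from the original template, taken from the question's XPath.
    // Assumption: pages served with the newer template never contain it.
    private const string OldTemplateMarker = "//h1[@class='producttitle']";

    // Decide once per page which set of scraping code to run.
    public static PageTemplate Detect(HtmlDocument doc)
    {
        // Test the node reference itself for null; .InnerText is never read here.
        HtmlNode marker = doc.DocumentNode.SelectSingleNode(OldTemplateMarker);
        return marker != null ? PageTemplate.Old : PageTemplate.New;
    }

    // Run the scraping delegate that matches the detected template.
    public static T ScrapeWith<T>(HtmlDocument doc,
                                  Func<HtmlDocument, T> scrapeOld,
                                  Func<HtmlDocument, T> scrapeNew)
    {
        return Detect(doc) == PageTemplate.Old ? scrapeOld(doc) : scrapeNew(doc);
    }
}
```

Inside the URL loop this might be called as `TemplateDetector.ScrapeWith(doc, ScrapeOldTemplate, ScrapeNewTemplate)`, where the two methods are hypothetical placeholders for the existing code and the code written for the new layout. Detecting once per page keeps each scraping method free of per-field template checks.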
The problem I am having is that my program throws a NullReferenceException during this null check.
Here is the statement I am using to check if it is null:
var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']").InnerText;
if (varitem == null) MessageBox.Show("no titles");
It throws the exception on the first line, where varitem is assigned, and never reaches the if statement.
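The exception comes from reading `.InnerText` before the null check: `SelectSingleNode` returns null when nothing matches, so the null test has to be applied to the node, not to its text. A minimal corrected sketch of the two lines above follows; the class and method names are hypothetical, and the `MessageBox` call is left to the caller so the logic does not depend on Windows Forms.

```csharp
using HtmlAgilityPack;

public static class TitleScraper
{
    // Returns the product title, or null when the page has no
    // <h1 class="producttitle"> (e.g. it was served with the newer template).
    public static string GetProductTitle(HtmlDocument doc)
    {
        // Keep the node itself: SelectSingleNode returns null when nothing matches.
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']");
        if (titleNode == null)
        {
            return null; // caller can show "no titles" or switch to the new-template code
        }

        // Safe: the node is known to exist at this point.
        return titleNode.InnerText;
    }
}
```

In the form, the original check then becomes `var varitem = TitleScraper.GetProductTitle(doc); if (varitem == null) MessageBox.Show("no titles");`.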
Any advice appreciated!
doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']") returns null on pages without that element. When it does, reading .InnerText on the result (effectively null.InnerText) throws the NullReferenceException. – I4V Jun 8 '13 at 17:15
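When several fields differ between the two templates, a small helper that tries a list of XPaths in order can replace one null check per field. The sketch below uses the null-conditional operator `?.` (C# 6 or later, so newer than this thread), which yields null instead of throwing when `SelectSingleNode` finds nothing. `NodeText` and `FirstText` are hypothetical names, and the new template's selector in the usage note is an assumed placeholder, since the question does not give it.

```csharp
using HtmlAgilityPack;

public static class NodeText
{
    // Returns the InnerText of the first XPath that matches, or null if none do.
    // Lets one field be read from either template without a separate code path.
    public static string FirstText(HtmlDocument doc, params string[] xpaths)
    {
        foreach (string xpath in xpaths)
        {
            // ?. short-circuits to null when SelectSingleNode returns null,
            // so .InnerText is only read on a node that exists.
            string text = doc.DocumentNode.SelectSingleNode(xpath)?.InnerText;
            if (text != null)
            {
                return text;
            }
        }
        return null;
    }
}
```

For example, `NodeText.FirstText(doc, "//h1[@class='producttitle']", "//h1[@class='product-name']")` would read the title from either layout, assuming (hypothetically) that the new template uses `product-name`.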