Using XpathBuilder I can construct a simple search engine query and pull data out of the search results using XPath. I have some simple examples in a Google Docs spreadsheet here, which run the query "XPath tutorial" on various search engines and attempt to pull out the number of results each search engine returns.

The code in that Google Doc is as follows:

=importxml("http://www.google.com/search?q="xpath+tutorial"&num=30&pws=0", 
           "//div[@id='resultStats']")
=importxml("http://www.bing.com/search?q=xpath+tutorial&count=30", 
           "//span[@class='sb_count']")
=importxml("http://search.yahoo.com/search?p=xpath+tutorial&n=30", 
           "//span[@id='resultCount']")

There are some oddities about this that I don't understand. Firstly, the Google search does not return any results, but the XPath query looks OK. Indeed, there are a number of online tutorials which recommend exactly what I have done here.

The Yahoo query returns the correct result; it is the only one that does.

The number of results found by the Bing XPath query does not match the number shown on the Bing web page, even though there is only one XML node which matches the XPath query. More details are on the spreadsheet here.

Where did it all go so wrong?

Downvoting because no code is shown. – Michael Kay Jul 13 '12 at 10:05
The code was all in the Google Doc which also showed the results and gave some details of the source code from the search engines. I've added some of this to the question now. – snim2 Jul 13 '12 at 11:12

3 Answers

up vote 1 down vote accepted

Try this....

=importxml("http://www.google.com/search?q='xpath+tutorial&num=30&pws=0'", "//div[@id='resultStats']")
Wow, well, that worked. It seems like the only difference between my code and yours is the extra quotes around the key/value pairs in the URL. Thanks. – snim2 Aug 21 '12 at 21:57

The devil is in the detail - if you don't show us your code, we can't find your bugs for you.

However, XPath is defined to run against XML whereas you appear to be running it against HTML. So the confusion may be in the way HTML is mapped to XML: for example by the addition of implicit nodes such as tbody, by case-folding, or by namespace handling.
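
To illustrate the case-folding point, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is only an illustration of how XPath name tests behave against XML; it says nothing about how importxml parses pages internally. The document string and the result text in it are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a results page.
# The element name is upper-case, as it might be in hand-written HTML.
doc = ET.fromstring(
    "<html><body>"
    "<DIV id='resultStats'>About 1,000 results</DIV>"
    "</body></html>"
)

# XPath name tests are case-sensitive when run against XML,
# so the lower-case query finds nothing here.
lower = doc.findall(".//div[@id='resultStats']")

# The same query with the element name in matching case succeeds.
upper = doc.findall(".//DIV[@id='resultStats']")

print(len(lower), len(upper))
```

If an HTML-to-XML mapping folds element names to a different case than the one used in the query, the query can silently match nothing, as the lower-case lookup does here.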

The code is visible in the Google Doc that I linked to. – snim2 Jul 13 '12 at 11:05
The point about HTML is an interesting one. I suspect that's why the Google query doesn't work, but it doesn't explain the result from Yahoo. – snim2 Jul 13 '12 at 11:55

The Google one probably doesn’t work due to the unencoded double-quotation marks in the URL. Since the importxml string delimiter character is a double-quotation mark, that’s probably not going to work. Encode the double-quotation marks to %22.
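
As a rough sketch of that encoding step, the following uses Python's standard urllib.parse.urlencode to build the query string, so the double-quotation marks around the phrase come out as %22. The helper name build_search_url is hypothetical; the base URL and parameters are the ones from the question.

```python
from urllib.parse import urlencode


def build_search_url(base, params):
    """Return base followed by '?' and the percent-encoded query string."""
    return base + "?" + urlencode(params)


# The quoted phrase is passed as a plain string; urlencode turns each
# double-quotation mark into %22 and the space into '+'.
url = build_search_url(
    "http://www.google.com/search",
    {"q": '"xpath tutorial"', "num": 30, "pws": 0},
)
print(url)
```

The resulting URL contains no literal double-quotation marks, so it can be pasted between the string delimiters of the importxml call without ending the string early.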

Not sure about Bing. Best guess is that your XPath is working but Bing is returning different results to you and Google Docs for some reason.

OK, I've taken the quotation marks off so now the statement reads =importxml("http://www.google.com/search?q=xpath+tutorial&num=30&pws=0", "//div[@id='resultStats']") and the output in the spreadsheet is still the same :( – snim2 Jul 15 '12 at 19:33
@snim2 The only thing I can think of is that the Google search results page sent to Google Docs (which may be different from what you see in your browser) does not contain a div with an id attribute of “resultStats”. – Alex Bishop Jul 20 '12 at 18:16
It's possible, but I can't see why that would be. Also, this clearly /used/ to work as there are a bunch of tutorials that recommend the method I've used! – snim2 Jul 22 '12 at 19:01
