XPath query in Google docs does not match HTML source

Question

Using XpathBuilder I can construct a simple search engine query and pull data out of the search results using XPath. I have some simple examples in a Google Doc spreadsheet here, which runs the query "XPath tutorial" on various search engines and attempts to pull out the number of results each search engine returns.

The code in that Google Doc is as follows:

=importxml("http://www.google.com/search?q="xpath+tutorial"&num=30&pws=0", 
           "//div[@id='resultStats']")
=importxml("http://www.bing.com/search?q=xpath+tutorial&count=30", 
           "//span[@class='sb_count']")
=importxml("http://search.yahoo.com/search?p=xpath+tutorial&n=30", 
           "//span[@id='resultCount']")

There are some oddities about this that I don't understand. Firstly, the Google search does not return any results, but the XPath query looks OK. Indeed, there are a number of online tutorials which recommend exactly what I have done here.

The Yahoo query is the only one that returns the correct result.

The number of results found by the Bing XPath query does not match the number shown on the Bing web page, even though only one XML node matches the XPath query. More details are on the spreadsheet here.

Where did it all go so wrong?

The code was all in the Google Doc, which also showed the results and gave some details of the source code from the search engines. I've now added some of this to the question.

j0k · Accepted Answer · 2012-08-21 16:35:19Z


Try this:

=importxml("http://www.google.com/search?q='xpath+tutorial&num=30&pws=0'", "//div[@id='resultStats']")

edited Aug 21 '12 at 16:35

j0k

answered Aug 21 '12 at 16:22

Bruce

Wow, well, that worked. It seems the only difference between my code and yours is the extra quotes around the key/value pairs in the URL. Thanks. – snim2 Aug 21 '12 at 21:57

Michael Kay · Answer 2 · 2012-07-13 10:05:33Z


The devil is in the detail - if you don't show us your code, we can't find your bugs for you.

However, XPath is defined to run against XML whereas you appear to be running it against HTML. So the confusion may be in the way HTML is mapped to XML: for example by the addition of implicit nodes such as tbody, by case-folding, or by namespace handling.
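To make the namespace and case-folding points concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The markup is a made-up snippet, not the real search-engine page, and importxml's own HTML-to-XML mapping may well behave differently; the sketch only shows how the same query can match or miss depending on how element names end up in the XML tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: identical content, without and with the XHTML namespace.
plain = "<html><body><div id='resultStats'>About 1,000 results</div></body></html>"
namespaced = (
    "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
    "<div id='resultStats'>About 1,000 results</div></body></html>"
)

query = ".//div[@id='resultStats']"

# No namespace: the plain query finds the div.
plain_hits = ET.fromstring(plain).findall(query)

# Default namespace declared: the element is now named
# {http://www.w3.org/1999/xhtml}div, so the unprefixed query misses it.
ns_hits = ET.fromstring(namespaced).findall(query)

# Binding a prefix to the namespace makes the query match again.
ns = {"x": "http://www.w3.org/1999/xhtml"}
ns_qualified_hits = ET.fromstring(namespaced).findall(
    ".//x:div[@id='resultStats']", ns
)

# XML names are case-sensitive: DIV is not div.
upper_hits = ET.fromstring(plain).findall(".//DIV[@id='resultStats']")

print(len(plain_hits), len(ns_hits), len(ns_qualified_hits), len(upper_hits))
```

If a query that looks right against the page source returns nothing, checking for a default namespace or differently-cased element names in the parsed tree is a reasonable first step.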

answered Jul 13 '12 at 10:05

Michael Kay

	The code is visible in the Google Doc that I linked to. – snim2 Jul 13 '12 at 11:05
	The point about HTML is an interesting one. I suspect that's why the Google query doesn't work, but it doesn't explain the result from Yahoo. – snim2 Jul 13 '12 at 11:55

Alex Bishop · Answer 3 · 2012-07-13 23:40:19Z


The Google one probably doesn’t work because of the unencoded double-quotation marks in the URL. importxml uses the double-quotation mark as its string delimiter, so the first one inside the URL ends the string early. Encode the double-quotation marks as %22.
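As a rough illustration of the encoding step, here is a small Python sketch that builds the Google URL with the phrase quotes percent-encoded. It only constructs the string (no request is sent); the variable names are mine, and the parameters are the ones from the question.

```python
from urllib.parse import quote_plus, urlencode

# The search phrase, including the literal double quotes for a phrase search.
phrase = '"xpath tutorial"'

# quote_plus turns each double quote into %22 and the space into +.
encoded = quote_plus(phrase)

# urlencode applies the same encoding to every value in the query string.
url = "http://www.google.com/search?" + urlencode(
    {"q": phrase, "num": 30, "pws": 0}
)

# The URL now contains no bare double quotes, so it can sit inside
# the double-quoted string argument of importxml.
formula = f'=importxml("{url}", "//div[@id=\'resultStats\']")'

print(encoded)
print(url)
print(formula)
```

The resulting URL can be pasted into the importxml call in place of the one with raw quotation marks.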

Not sure about Bing. Best guess is that your XPath is working but Bing is returning different results to you and Google Docs for some reason.

edited Jul 13 '12 at 23:40

answered Jul 13 '12 at 23:34

Alex Bishop

	OK, I've taken the quotation marks off so now the statement reads `=importxml("http://www.google.com/search?q=xpath+tutorial&num=30&pws=0", "//div[@id='resultStats']")` and the output in the spreadsheet is still the same :( – snim2 Jul 15 '12 at 19:33
	@snim2 The only thing I can think of is that the Google search results page sent to Google Docs (which may be different from what you see in your browser) does not contain a div with an id attribute of “resultStats”. – Alex Bishop Jul 20 '12 at 18:16
	It's possible, but I can't see why that would be. Also, this clearly /used/ to work as there are a bunch of tutorials that recommend the method I've used! – snim2 Jul 22 '12 at 19:01

asked	10 months ago
viewed	401 times
active	9 months ago
