
I'm scraping real estate data. On sites generated with JavaScript, Selenium does a splendid job: you find the tags that hold the relevant information and loop over all of them with

driver.find_elements_by...

But on this site, the listings are produced by AngularJS. I tried the same approach:

for article in driver.find_elements_by_css_selector("div.property.ng-scope"):
    pass  # do something with each listing

I figured that I had to make my WebDriver (PhantomJS) click the link leading to each individual listing's page:

linkbase = article.find_element_by_css_selector("div.info.clear.ng-scope")
link = linkbase.find_element_by_tag_name('a')
link.click()

The WebDriver is then pointed at that page, and I can get all the information I want for that listing.

As soon as the first pass through the loop ends, I get the following error:

> Message: {"errorMessage":"Element does not exist in cache","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"142","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:56577","User-Agent":"Python-urllib/3.4"},"httpVersion":"1.1","method":"POST","post":"{\"sessionId\": \"f9ec2c10-dfd9-11e5-9d4c-3bbe8f5bf7c0\", \"using\": \"css selector\", \"id\": \":wdc:1456856343349\", \"value\": \"div.info.clear.ng-scope\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/f9ec2c10-dfd9-11e5-9d4c-3bbe8f5bf7c0/element/:wdc:1456856343349/element"}}

The element containing the link on the page is:

<a ng-href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532" ng-click="beforeOpen(i.iterator, i.regionTip)" class="title" href="/detail/prodej/dum/rodinny/jemnice-jemnice-/3800125532">
<span class="name ng-binding"> ... </a>

It is just the title of each listing. I did set a user agent, following this answer, even though it doesn't appear in the error. I also wait until the surrounding element is loaded:

wait = WebDriverWait(driver, getSearchResults_CZ.waiting)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.content")))
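Waiting for `div.content` to be present does not necessarily mean the listing links are ready. With `ng-href`, AngularJS writes the real `href` attribute only after it has interpolated the expression, so reading links too early can return `None`. That may not be what happened here (the EDIT below points to a selector problem instead), but a custom wait condition covers the timing case. A minimal sketch, assuming the `div.property div.info a` selector from the accepted answer; `links_have_href` is a hypothetical helper name:

```python
def links_have_href(css_selector):
    """Build a condition for WebDriverWait.until().

    The returned callable receives the driver on each poll. It returns the
    list of hrefs once at least one element matches and every match has a
    non-empty href; otherwise it returns False so the wait keeps polling.
    """
    def _condition(driver):
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        elements = driver.find_elements("css selector", css_selector)
        hrefs = [el.get_attribute("href") for el in elements]
        if hrefs and all(hrefs):
            return hrefs
        return False
    return _condition
```

It would be used as `links = WebDriverWait(driver, 10).until(links_have_href("div.property div.info a"))`; `until()` returns whatever the condition returns, here the list of hrefs. If AngularJS re-renders the list while polling, passing `ignored_exceptions=[StaleElementReferenceException]` (from `selenium.common.exceptions`) to the `WebDriverWait` constructor keeps a single stale read from aborting the wait.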

What I want is to parse all these property elements, save their links to a list, and then loop through the list, opening each link with driver.get(). I know that clicking a link changes the driver's URL, but I thought that once the list of articles had been established with find_elements_by, it would serve as a stable reference point. Accessing the link by searching for the "a" tag and calling get_attribute('href') didn't work in this case with the AngularJS framework. What am I not seeing?
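The collect-first, visit-later plan described above can be sketched as follows. This is a minimal sketch, not a tested scraper for this site: `scrape_listings` and `scrape_detail` are hypothetical names (the latter a callback that extracts whatever you need from a detail page), and the selector is the one from the accepted answer.

```python
def scrape_listings(driver, link_selector, scrape_detail):
    """Collect every listing URL first, then visit each one with driver.get().

    Only plain strings survive navigation; WebElement references found on
    the results page belong to that page and go stale once it is left.
    """
    # Phase 1: read the hrefs while still on the results page
    urls = [el.get_attribute("href")
            for el in driver.find_elements("css selector", link_selector)]
    # Drop empty values and duplicates while keeping the original order
    urls = list(dict.fromkeys(u for u in urls if u))

    # Phase 2: navigate to each detail page; no element from phase 1 is reused
    results = []
    for url in urls:
        driver.get(url)
        results.append(scrape_detail(driver))
    return results
```

The design point is that only strings cross the navigation boundary: no `WebElement` from the results page is touched after the first `driver.get()`.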

EDIT: As answered, get_attribute without .click() is the right way to go. My original error was related to the CSS selector: I had been using "div[class^='property']" and got a completely different link. It must have matched another element I hadn't noticed.
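The two selectors in the EDIT match differently. `.property` matches an element whose class list contains the token `property`, while `[class^='property']` compares the raw `class` string character by character, so it also matches a class such as `property-list` (a hypothetical name here) and misses `ng-scope property`. The sketch below emulates only these two rules in plain Python to make the difference visible; it is not a CSS engine:

```python
def matches_class_token(class_attr, name):
    """Emulate `.name`: true if `name` is one of the whitespace-separated classes."""
    return name in class_attr.split()


def matches_class_prefix(class_attr, prefix):
    """Emulate `[class^='prefix']`: true if the raw attribute value starts with `prefix`.

    Per the CSS selectors spec, an empty prefix matches nothing.
    """
    return bool(prefix) and class_attr.startswith(prefix)


# "property ng-scope" is the listing class from the question;
# the other two values are hypothetical, chosen to show where the rules differ.
for class_attr in ["property ng-scope", "property-list", "ng-scope property"]:
    print(repr(class_attr),
          ".property:", matches_class_token(class_attr, "property"),
          "[class^='property']:", matches_class_prefix(class_attr, "property"))
```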

Accepted answer:

Wait for at least one "property" to be visible and then grab the links:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://www.sreality.cz/hledani/prodej/domy?region=jemnice")
driver.maximize_window()

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "property")))

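# find_elements(By..., ...) works in Selenium 3 and 4; the find_elements_by_* helpers were removed in later Selenium 4 releases
links = [link.get_attribute("href") for link in driver.find_elements(By.CSS_SELECTOR, "div.property div.info a")]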
print(links)

driver.quit()  # quit() ends the whole session; close() only closes the current window

Works for me.

As it does for me. Not clicking is the right way to go; otherwise Selenium loses the web elements it is supposed to loop over. – Thanados Mar 2 at 15:01
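If clicking were ever unavoidable, the usual workaround for "Element does not exist in cache" is to re-locate the list after each navigation and pick the element by index, instead of reusing references found on an earlier page load. A sketch under stated assumptions: `click_each_listing`, `scrape_detail` and `wait_for_list` are hypothetical names, and on an AngularJS page `wait_for_list` would likely need to wait for the list to re-render after `driver.back()`:

```python
def click_each_listing(driver, link_selector, scrape_detail, wait_for_list=None):
    """Click through listings one by one without reusing stale WebElements."""
    count = len(driver.find_elements("css selector", link_selector))
    results = []
    for index in range(count):
        if wait_for_list is not None:
            wait_for_list(driver)  # e.g. a WebDriverWait on the results list
        # Re-locate on every pass: elements found before the last navigation
        # belong to a page load that no longer exists in the browser.
        links = driver.find_elements("css selector", link_selector)
        if index >= len(links):
            break  # the list shrank after navigating back; stop rather than crash
        links[index].click()
        results.append(scrape_detail(driver))
        driver.back()
    return results
```

Collecting the hrefs first, as in the accepted answer, avoids the back-and-forth entirely and is usually faster.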
