I'm trying to scrape some data from a flight-search web page. The content is probably generated with JavaScript. I've tried many approaches but nothing works, so I've decided to try Selenium.

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.pelikan.sk/sk/flights/list?dfc=CVIE%20BUD%20BTS&dtc=CMAD&rfc=CMAD&rtc=CVIE%20BUD%20BTS&dd=2015-07-09&rd=2015-07-14&px=1000&ns=0&prc=&rng=1&rbd=0&ct=0')
print(driver.page_source)

I thought it would return the final JavaScript-generated HTML, but I can't find the strings that appear on the page when I open it in a browser.

What could be the problem? What should I do to get those flights?

EDIT: I forgot to mention that the page is continually loading new flights. When you open it in a browser, it shows some flights while it is still loading others.


The page is highly dynamic, so you need to wait for it to finish loading. Pick an element whose state indicates that the page and the search results have loaded. For instance, wait until the loading image (with a pelican) becomes invisible:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get("https://www.pelikan.sk/sk/flights/list?dfc=CVIE%20BUD%20BTS&dtc=CMAD&rfc=CMAD&rtc=CVIE%20BUD%20BTS&dd=2015-07-09&rd=2015-07-14&px=1000&ns=0&prc=&rng=1&rbd=0&ct=0")

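# Give the page up to 60 seconds to satisfy each condition
wait = WebDriverWait(driver, 60)
# The main loading image
wait.until(EC.invisibility_of_element_located((By.XPATH, '//img[contains(@src, "loading")]')))
# The image shown next to the "still searching for more flights" message
wait.until(EC.invisibility_of_element_located((By.XPATH, u'//div[. = "Poprosíme o trpezlivosť, hľadáme pre Vás ešte viac letov"]/preceding-sibling::img')))

print(driver.page_source)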

Here we are waiting for two pelicans to disappear: a bigger one and a smaller one.
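To make the idea behind the explicit wait more concrete, here is a minimal sketch of the polling loop it relies on: call a condition repeatedly until it returns something truthy or a timeout expires. This is not Selenium's actual implementation; `wait_until`, `WaitTimeout` and the fake `checks` sequence are hypothetical names used only for illustration, and the real `WebDriverWait` additionally passes the driver to the condition and ignores certain exceptions while polling.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=60.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for a page whose loading indicator is gone on the third check.
checks = iter([False, False, True])
loaded = wait_until(lambda: next(checks), timeout=5.0, poll_interval=0.01)
```

With a real driver, the condition would be something like "the loading image is no longer displayed", which is what `EC.invisibility_of_element_located` checks in the code above.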

Thanks, great advice, now it works! – Milano Jun 18 '15 at 13:48
