I'm trying to scrape some data from TripAdvisor using Selenium with the Python bindings.

The reviews on the page sometimes have a 'More' button at the bottom that displays the full review text when clicked. It is actually a span element with an onclick JavaScript handler attached to it.

What I want to achieve is to load the page, find the 'More' links and click them, so that every review is fully expanded before scraping begins.

So far, I've tried the following code with no luck, and I can't make sense of the errors shown in the stack trace.

import os
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.tripadvisor.ca/Attraction_Review-g304138-d317476-Reviews-Temple_of_the_Tooth_Sri_Dalada_Maligawa-Kandy_Central_Province.html#REVIEWS");

more = [];
more = driver.find_elements_by_class_name('moreLink')
print(len(more));
for x in range(0,len(more)):
    if more[x].is_displayed():
        more[x].click();
        print("clicked");

These are the error logs that I'm getting in the console.

3

Traceback (most recent call last):
  File "C:\Users\**\workspace\ReviewScraper\src\scraper\test3.py", line 13, in <module>
    more[x].click();
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 75, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 454, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 102, in check_response
    value = json.loads(value_json)
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any help is highly appreciated.

1 Answer

I managed to get this done by reverting to Selenium 2.48.0 and by logging into TripAdvisor before scraping the reviews, every time. Once logged in, you can click the 'More' button and extract the full reviews easily.
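For the clicking step, something along these lines might help. It is only a minimal sketch, not the exact code I ran: click_all_displayed is a hypothetical helper name, and it assumes nothing more than that each element has the is_displayed() and click() methods already used in the question. A link that fails to click is skipped instead of aborting the whole loop.

```python
def click_all_displayed(elements):
    """Click every element that reports itself as displayed.

    `elements` is any iterable of objects exposing is_displayed() and
    click(), such as the list of 'moreLink' elements from the question.
    Returns the number of clicks that succeeded.
    """
    clicked = 0
    for element in elements:
        try:
            if element.is_displayed():
                element.click()
                clicked += 1
        except Exception as exc:
            # Assumption: a failed click (stale or covered element, driver
            # error) should not stop the remaining links from being tried.
            print("skipped one link:", exc)
    return clicked
```

With the driver from the question, and after logging in, it could be called as click_all_displayed(driver.find_elements_by_class_name('moreLink')). Printing the returned count next to len(more) shows how many links were actually expanded.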
