getting dynamic data using python


I'm new to Python and got interested in writing scripts. I'm currently building a crawler that visits a page and extracts the copy (text) from its tags. Right now I can only list tags; I'm having trouble getting the text out of them, and I'm not sure exactly why. I'm also using BeautifulSoup and PyQt4 to get dynamic data (this might need a separate question).

Based on the code below, I should be getting the "Images" copy from the Google homepage, or at least the span tag itself. Instead, `None` is returned.

I tried reading the docs for BeautifulSoup and it was a little overwhelming. I'm still reading it, but I think I keep going down a rabbit hole. I can print all anchor tags or all divs, but targeting a specific one is where I'm struggling.
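For reference, BeautifulSoup's `soup.find(name, class_=...)` returns the first matching tag or `None`, and the text of a found tag is available through `tag.get_text()`. The sketch below mimics that lookup using only the standard library's `html.parser`, to show both halves of the problem: matching one specific element by tag name and CSS class, then collecting the text inside it. `FirstTagFinder` and `find_first` are hypothetical helpers, not part of BeautifulSoup, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class FirstTagFinder(HTMLParser):
    """Hypothetical helper: record the first tag matching a name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = None   # attributes of the matching tag, or None
        self.text = []      # text pieces collected inside the matching tag
        self._depth = 0     # nesting depth of `tag` while inside the match

    def handle_starttag(self, tag, attrs):
        if self.found is None and tag == self.tag:
            # The class attribute may hold several space-separated classes.
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.found = dict(attrs)
                self._depth = 1
                return
        if self._depth and tag == self.tag:
            self._depth += 1

    def handle_endtag(self, tag):
        # Assumes the target tag has a closing tag (true for <a>).
        if self._depth and tag == self.tag:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text.append(data)


def find_first(html, tag, css_class):
    """Return (attrs, text) for the first match, or None, like soup.find()."""
    finder = FirstTagFinder(tag, css_class)
    finder.feed(html)
    if finder.found is None:
        return None
    return finder.found, "".join(finder.text).strip()


page = '<div><a class="gb_P" href="/imghp">Images</a></div>'
print(find_first(page, "a", "gb_P"))     # ({'class': 'gb_P', 'href': '/imghp'}, 'Images')
print(find_first(page, "a", "missing"))  # None
```

A `None` result therefore only means "no such element in the HTML that was parsed", which is worth keeping in mind when the page is built partly by JavaScript.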

```python
import sys
import urllib

from bs4 import BeautifulSoup
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage


class Render(QWebPage):
    """Load a URL in an off-screen QWebPage so that its JavaScript runs."""

    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # block until _loadFinished() quits the event loop

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()


url = 'http://google.com'
# urllib only downloads the raw HTML; it does not execute any JavaScript
source = urllib.urlopen(url).read()
soup = BeautifulSoup(source, 'html.parser')
js_test = soup.find("a", class_="gb_P")  # first <a class="gb_P">, or None
print js_test
```

asked 18 hours ago

whyeven

First check the page sent by Google. It may send an empty page, or one with a CAPTCHA, if it thinks you are a bot/spammer. – furas 18 hours ago
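One way to do that check, sketched with Python 3's standard library: download the page, keep exactly what the server returned, and apply a rough heuristic for a block page. The `looks_blocked` rule (non-200 status, empty body, or the word "captcha") and the User-Agent string are assumptions for illustration, not anything Google documents.

```python
import urllib.request


def fetch(url, user_agent="Mozilla/5.0"):
    """Download a page and return (HTTP status, decoded HTML)."""
    # The User-Agent value is an assumption; some sites serve different
    # markup (or a block page) depending on it.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.status, resp.read().decode(charset, errors="replace")


def looks_blocked(status, html):
    """Rough, assumed heuristic: bad status, empty body, or a CAPTCHA mention."""
    return status != 200 or not html.strip() or "captcha" in html.lower()
```

Usage: `status, html = fetch("http://google.com")`, then write `html` to a file and open it in a browser; that file is exactly what BeautifulSoup is parsing.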

Google can use JavaScript to add images to the page, and you use only BS, which doesn't run JavaScript. I don't see you using QtWebKit, Selenium, PhantomJS, etc. – furas 18 hours ago
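To illustrate the point: an HTML parser only sees the markup the server sent. If a page builds an element with JavaScript, that element exists in the downloaded HTML only as text inside a `<script>`, so a lookup such as `soup.find("a", class_="gb_P")` comes back `None`. A minimal stdlib sketch with a made-up page (not Google's real markup):

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect (tag, class) pairs for every element the parser actually sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs).get("class")))


# Hypothetical page: the link would only be created by the script at runtime
# in a browser. Nothing here executes that script.
STATIC_HTML = """
<html><body>
  <div id="gb"></div>
  <script>
    var a = document.createElement("a");
    a.className = "gb_P";
    a.textContent = "Images";
    document.getElementById("gb").appendChild(a);
  </script>
</body></html>
"""

collector = ClassCollector()
collector.feed(STATIC_HTML)
# The script body is treated as raw text, so no <a class="gb_P"> element exists.
print(("a", "gb_P") in collector.seen)  # False
```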

How would I go about checking? Also, I did use QtWebKit, following a YouTube tutorial on getting dynamic content. I'm guessing I didn't use it properly... I'll look into that some more. As for PhantomJS, would I need to use it in a separate file, or can I use it in the .py file? – whyeven 18 hours ago

You import QtWebKit and create the class Render, but you never use it, so you get the data with BS alone. – furas 18 hours ago
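In other words, the HTML given to the parser has to come from the renderer rather than from `urllib`. A sketch of that wiring, with the fetch step passed in as a function so that a JavaScript-capable renderer can be plugged in. `scrape`, `LinkTextFinder` and `fake_renderer` are hypothetical names, the stdlib parser stands in for BeautifulSoup to keep the example self-contained, and `fake_renderer` only imitates a real browser engine. With the question's PyQt4 class, the real renderer would presumably be something like `lambda url: unicode(Render(url).frame.toHtml())` (untested here).

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collect the text of <a> elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.texts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def scrape(url, get_html, css_class):
    """Get HTML via `get_html` (static fetch or renderer), then extract link text."""
    finder = LinkTextFinder(css_class)
    finder.feed(get_html(url))
    return [t.strip() for t in finder.texts]


def fake_renderer(url):
    # Hypothetical stand-in for a JavaScript-capable renderer: it returns the
    # DOM *after* scripts have run, so the link now exists as a real element.
    return '<div id="gb"><a class="gb_P" href="/imghp">Images</a></div>'


print(scrape("http://google.com", fake_renderer, "gb_P"))  # ['Images']
```

Keeping the fetch step as a parameter means the parsing code stays the same whether the HTML comes from `urllib`, QtWebKit, or a tool such as Selenium.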

Browse other questions tagged javascript python or ask your own question.

asked	today
viewed	27 times
