I'm new to Python and got interested in writing scripts. I'm currently building a crawler that goes to a page and extracts copy from tags. Right now I can only list tags; I'm having trouble getting the text out of them, and I'm not sure why. I'm also using BeautifulSoup and PyQt4 to get dynamic data (this might need a new question).

So based on the code below, I should be getting the "Images" copy from the Google homepage, or at least the span tag itself. Instead, I'm getting None back.

I tried reading the BeautifulSoup docs and they were a little overwhelming. I'm still reading them, but I think I keep going down a rabbit hole. I can print all anchor tags or all divs, but targeting a specific one is where I'm struggling.

import urllib
import re
from bs4 import BeautifulSoup, Comment

import sys  
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *  

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

url = 'http://google.com' 
source = urllib.urlopen(url).read()
soup = BeautifulSoup(source, 'html.parser')
js_test =  soup.find("a", class_="gb_P")
print js_test
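
For comparison, here is a minimal sketch of the same `find` call run against a fixed HTML string, written for Python 3 rather than the Python 2 used above. The markup and the `link_text` helper are made up for illustration and are not Google's actual page. The point is only that `find` returns `None` when no tag matches, and that `get_text` pulls the text out when one does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that already contains the link.
html = (
    '<div>'
    '<a class="gb_P" href="/imghp">Images</a>'
    '<a class="other" href="/">Home</a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")


def link_text(soup, css_class):
    """Return the text of the first <a> with the given class, or None."""
    tag = soup.find("a", class_=css_class)
    if tag is None:
        # No matching tag in the parsed markup.
        return None
    return tag.get_text(strip=True)


print(link_text(soup, "gb_P"))     # Images
print(link_text(soup, "missing"))  # None
```

If the same query works on a fixed string but prints `None` for a downloaded page, the likely difference is the HTML that was actually downloaded, not the query.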
    
First check the page sent by Google. It may send an empty page, or a page with a captcha, if it thinks you are a bot or spammer. – furas 18 hours ago
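
One way to run that check, sketched here for Python 3 with only the standard library (the question's own code is Python 2): collect every class name present in the downloaded HTML and see whether `gb_P` is among them. The names `ClassCollector` and `classes_in` and the sample string are hypothetical. In practice the input would be whatever `urlopen(url).read()` returned, decoded to text.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(html):
    """Return the set of class names found in an HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical stand-in for the HTML the server actually returned.
sample = '<div class="header"><a class="gb_P link" href="#">Images</a></div>'
print("gb_P" in classes_in(sample))
```

If the class is absent from the raw HTML, `soup.find` has nothing to match and returns `None` no matter how the query is written.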
    
Google may use JavaScript to add elements to the page, and you are using only BS, which doesn't run JavaScript. I don't see you using QtWebKit, Selenium, PhantomJS, etc. – furas 18 hours ago
    
How would I go about checking? Also, I used QtWebKit and followed a tutorial on YouTube to find dynamic content. I'm guessing I didn't use it properly... I'll look into that some more. For PhantomJS, would I need to use that in a separate file, or can I use it in the .py file? – whyeven 18 hours ago
    
You import QtWebKit and create the class Render, but you never use it, so you are using only BS to get the data. – furas 18 hours ago
