I'm learning web scraping from a series of tutorials by Chris Reeves. Really great stuff; you should check it out.
I ran into an issue with the example from tutorial no. 10, where Chris explains how to connect to a MySQL database. At first my values were not being committed to the table in the database. Then, in the comments, I discovered that I was missing conn.commit(),
which the author of the video does not include in his program. I've added this line to my program and it works great; now it looks like this:
from threading import Thread
import urllib
import re
import MySQLdb
conn = MySQLdb.connect(host="127.0.0.1",port=3307,user="root",passwd="root",db="stock_data")
query = "INSERT INTO tutorial (symbol) values('AAPL')"
x = conn.cursor()
x.execute(query)
conn.commit()
row = x.fetchall()
It connects to my local database and successfully adds AAPL to the table tutorial under the column symbol.
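Why the missing conn.commit() mattered: DB-API 2.0 (PEP 249) connections, MySQLdb included, start with autocommit off, so an INSERT stays in an open transaction until commit() is called, and closing the connection without committing discards it (this assumes a transactional table engine such as InnoDB). The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server; sqlite3 uses `?` placeholders where MySQLdb uses `%s`. It also passes the values as query parameters instead of concatenating them into the SQL string. The `insert_prices` and `count_rows` helpers and the placeholder price are hypothetical.

```python
import os
import sqlite3
import tempfile


def insert_prices(db_path, prices, commit):
    """Insert (symbol, last) rows from a dict; commit only if asked (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS tutorial (symbol TEXT, last TEXT)")
    conn.commit()  # keep the table even if the INSERT below is not committed
    # Values travel as parameters; the driver handles quoting.
    # (With MySQLdb the placeholders would be "%s" instead of "?".)
    cur.executemany(
        "INSERT INTO tutorial (symbol, last) VALUES (?, ?)",
        sorted(prices.items()),
    )
    if commit:
        conn.commit()
    conn.close()  # close() does not commit: uncommitted rows are rolled back


def count_rows(db_path):
    conn = sqlite3.connect(db_path)
    (n,) = conn.execute("SELECT COUNT(*) FROM tutorial").fetchone()
    conn.close()
    return n


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stock_data.db")
    insert_prices(path, {"AAPL": "101.50"}, commit=False)
    print(count_rows(path))  # 0: the INSERT was never committed
    insert_prices(path, {"AAPL": "101.50"}, commit=True)
    print(count_rows(path))  # 1
```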
My problems started in the second part of Chris's tutorial, where you are supposed to add a multithreaded piece of code that reads 4-letter symbols from an external .txt file and adds everything to the same database.
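The file-reading step can be sketched as a small function. It assumes `symbols.txt` holds tickers separated by commas, whitespace, or newlines. With only `.replace(" ", "").split(",")`, a trailing newline stays glued to the last symbol (e.g. `"MSFT\n"`) and ends up in both the URL and the regex, so this version splits on any run of commas or whitespace and drops empty entries. `parse_symbols` is a hypothetical name.

```python
import re


def parse_symbols(text):
    """Split on commas and/or any whitespace (including newlines); skip empty entries."""
    return [s for s in re.split(r"[,\s]+", text) if s]


if __name__ == "__main__":
    raw = "AAPL, GOOG,\nMSFT\n"
    print(raw.replace(" ", "").split(","))  # ['AAPL', 'GOOG', '\nMSFT\n']
    print(parse_symbols(raw))               # ['AAPL', 'GOOG', 'MSFT']
```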
Now, when my program looks like this:
from threading import Thread
import urllib
import re
import MySQLdb

gmap = {}

def th(ur):
    base = "http://finance.yahoo.com/q?s="+ur
    regex = '<span id="yfs_l84_'+ur.lower()+'">(.+?)</span>'
    pattern = re.compile(regex)
    htmltext = urllib.urlopen(base).read()
    results = re.findall(pattern,htmltext)
    try:
        gmap[ur] = results[0]
    except:
        print "got an error"

symbolslist = open("threads/symbols.txt").read()
symbolslist = symbolslist.replace(" ","").split(",")
print symbolslist

threadlist = []
for u in symbolslist:
    t = Thread(target=th,args=(u,))
    t.start()
    threadlist.append(t)

for b in threadlist:
    b.join()

conn = MySQLdb.connect(host="127.0.0.1",port=3307,user="root",passwd="root",db="stock_data")
for key in gmap.keys():
    print key,gmap[key]
    query = "INSERT INTO tutorial (symbol,last) values("
    query = query+"'"+key+"',"+gmap[key]+")"
    x = conn.cursor()
    x.execute(query)
    conn.commit()
    row = x.fetchall()
which is almost exactly like Chris's example (except that I put the login data directly in the code instead of using external login data, but that is not the problem), I'm getting errors from the threads, and they all look like this:
Exception in thread Thread-474:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "threads/threads2.py", line 12, in th
    htmltext = urllib.urlopen(base).read()
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 345, in open_http
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 829, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 791, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 571, in create_connection
    raise err
IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
And this is, as I said, just one error, for Thread-474; in the IDE I get the same error for many other threads as well, such as Thread-441, Thread-390, Thread-391, etc.
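Thread names such as Thread-474 indicate that the script starts one thread per symbol, so hundreds of HTTP requests go out at the same moment. One plausible (unconfirmed) reason for the timeouts is that the remote server or the local network does not answer that many simultaneous connections in time; capping concurrency is a simple way to test that. The sketch below is written for Python 3 and swaps the tutorial's raw `Thread` objects for `concurrent.futures.ThreadPoolExecutor`: it runs a caller-supplied `fetch(symbol)` on a fixed number of worker threads and records each failure per symbol instead of letting it surface as a thread traceback. `fetch_all` and the default of 8 workers are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(symbols, fetch, max_workers=8):
    """Call fetch(symbol) for each symbol on at most max_workers threads.

    Returns (results, errors), two dicts keyed by symbol. A failure for one
    symbol is recorded in errors instead of killing that thread with a traceback.
    """

    def worker(symbol):
        try:
            return symbol, fetch(symbol), None
        except Exception as exc:  # e.g. a socket timeout raised by urlopen
            return symbol, None, exc

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields in input order; the dicts are filled here, in the
        # calling thread, so the workers never write to shared state.
        for symbol, value, exc in pool.map(worker, symbols):
            if exc is None:
                results[symbol] = value
            else:
                errors[symbol] = exc
    return results, errors


if __name__ == "__main__":
    # Stand-in fetcher for illustration only; a real one would do the HTTP request.
    def fake_fetch(symbol):
        if symbol == "BAD":
            raise IOError("simulated timeout")
        return symbol.lower()

    ok, failed = fetch_all(["AAPL", "BAD", "GOOG"], fake_fetch, max_workers=2)
    print(ok)  # {'AAPL': 'aapl', 'GOOG': 'goog'}
    print(sorted(failed))  # ['BAD']
```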
What am I missing? Is it something in the code or in my MySQL server setup? According to everything in Chris's example, it should work.
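The traceback itself narrows this down: the exception is raised inside `th` at `urllib.urlopen(base).read()`, and in this script `MySQLdb.connect` runs only after every thread has been joined, so Errno 10060 (the Windows "connection timed out" socket error) comes from the HTTP request to Yahoo, before any MySQL code runs. The sketch below shows a per-symbol fetch that bounds each request with a timeout and returns the error instead of raising it. It is written for Python 3's `urllib.request.urlopen`, which accepts a `timeout` argument (Python 2's `urllib.urlopen` does not; `urllib2.urlopen` does). `fetch_last_price` is a hypothetical helper, the span-id pattern is the one from the tutorial's regex (the live page may no longer use it), and the demo reads a synthetic local page through a `file://` URL so it runs offline.

```python
import os
import pathlib
import re
import tempfile
import urllib.request


def fetch_last_price(symbol, url, timeout=10):
    """Fetch url and extract symbol's price. Returns (price, error)."""
    # Same span-id pattern as the tutorial's regex (assumed page markup).
    pattern = re.compile(
        r'<span id="yfs_l84_' + re.escape(symbol.lower()) + r'">(.+?)</span>'
    )
    try:
        # timeout bounds how long an unresponsive server can block this call.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses: a
        # network-level failure, independent of the database.
        return None, exc
    match = pattern.search(html)
    if match is None:
        return None, ValueError("price span not found for " + symbol)
    return match.group(1), None


if __name__ == "__main__":
    # Offline demo: a synthetic page served from a local file:// URL.
    page = os.path.join(tempfile.mkdtemp(), "quote.html")
    with open(page, "w") as f:
        f.write('<span id="yfs_l84_aapl">101.50</span>')
    print(fetch_last_price("AAPL", pathlib.Path(page).as_uri()))  # ('101.50', None)
```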
Help anyone?