I wanted to find out which states and cities the players on the USA hockey team are from, but I didn't want to count them manually from the roster site here.
I'm really interested to see if someone has a more elegant way to do what I've done (which feels like glue and duct tape) for future purposes. I read about 12 different Stack Overflow questions to get here.
from bs4 import BeautifulSoup
from collections import Counter
import urllib2

url='http://olympics.usahockey.com/page/show/1067902-roster'
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

locations = []
city = []
state = []
counter = 0

tables = soup.findAll("table", { "class" : "dataTable" })

for table in tables:
    rows = table.findAll("tr")
    for row in rows:
        entries = row.findAll("td")
        for entry in entries:
            counter = counter + 1
            if counter == 7:
                locations.append(entry.get_text().encode('ascii'))
                counter = 0

for i in locations:
    splitter = i.split(", ")
    city.append(splitter[0])
    state.append(splitter[1])

print Counter(state)
print Counter(city)
I essentially wrote a three-tier loop (table -> tr -> td), then used a counter to grab the 7th column and append it to a list. Next I iterated through that list, splitting each entry on ", " so that the city goes into one list and the state into another. Finally I ran both lists through Counter to print the cities and states. I have a hunch this could be done a lot more simply, and I'm curious to hear opinions.
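For comparison, here is a rough sketch of one way the counter could be avoided: index the 7th cell of each row directly instead of counting cells as they go by. It assumes Python 3 rather than the Python 2 used above, and the function names `fetch_locations` and `count_cities_and_states` are made up for illustration. It also assumes the page layout is as described (rows inside `table.dataTable`, with the hometown in the 7th `td`), which I have not verified.

```python
from collections import Counter


def fetch_locations(url, column=6):
    """Return the text of one column (0-based index) from each dataTable row."""
    # Imported here so the counting helper below can be used without
    # network access or Beautiful Soup installed.
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(urlopen(url).read(), "html.parser")
    locations = []
    for row in soup.select("table.dataTable tr"):
        cells = row.find_all("td")
        # Skip rows without enough <td> cells (for example, header rows).
        if len(cells) > column:
            locations.append(cells[column].get_text(strip=True))
    return locations


def count_cities_and_states(locations):
    """Split 'City, State' strings and tally cities and states separately."""
    cities, states = Counter(), Counter()
    for location in locations:
        # partition() never raises; an entry without ", " gets an empty state.
        city, _, state = location.partition(", ")
        cities[city] += 1
        states[state] += 1
    return cities, states


# Hypothetical usage with the URL from the question:
# url = "http://olympics.usahockey.com/page/show/1067902-roster"
# cities, states = count_cities_and_states(fetch_locations(url))
# print(states)
# print(cities)
```

Indexing `cells[column]` removes the running counter and the need to reset it, and keeping the counting in its own function means it can be checked against a hand-written list of strings without hitting the site.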