How to parse an XML feed using Python?

Question

I am trying to parse this XML feed (http://www.reddit.com/r/videos/top/.rss) and am having trouble. I want to save the YouTube links from each of the items, but I can't get past the "channel" child node. How do I get to this level so I can then iterate through the items?

import urllib2
from xml.etree import ElementTree as etree

# reddit parse
reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss')
# convert to string:
reddit_data = reddit_file.read()
# close file because we don't need it anymore:
reddit_file.close()

# entire feed
reddit_root = etree.fromstring(reddit_data)
channel = reddit_root.findall('{http://purl.org/dc/elements/1.1/}channel')
print channel

reddit_feed = []
for entry in channel:
    # get description, url, and thumbnail
    desc = None  # not sure how to get this

    reddit_feed.append([desc])

Community · Accepted Answer · 2013-04-25 06:49:09Z

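Try findall('channel/item'). The channel and item elements of an RSS feed are not in a namespace, so the Dublin Core prefix in your findall() call matches nothing. The path is relative to the root rss element: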

import urllib2
from xml.etree import ElementTree as etree

# reddit parse
reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss')
# convert to string:
reddit_data = reddit_file.read()
print reddit_data
# close file because we don't need it anymore:
reddit_file.close()

# entire feed
reddit_root = etree.fromstring(reddit_data)
# every <item> under <channel>, relative to the root <rss> element
item = reddit_root.findall('channel/item')
print item

reddit_feed = []
for entry in item:
    # get the description text of each item
    desc = entry.findtext('description')
    reddit_feed.append([desc])
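
To get from the description to the YouTube links the question asks for, something like the following may work. It is only a sketch in Python 3, run against a hand-written sample instead of the live feed. The sample layout (an escaped HTML anchor inside each description) and the YOUTUBE_HREF pattern are assumptions for illustration, not something confirmed from the real feed.

```python
import re
from xml.etree import ElementTree as etree

# Hand-written stand-in for the live feed; the real item layout is an assumption.
SAMPLE_RSS = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>sample feed</title>
    <item>
      <title>First video</title>
      <description>&lt;a href="http://www.youtube.com/watch?v=abc123"&gt;[link]&lt;/a&gt;</description>
    </item>
    <item>
      <title>Second video</title>
      <description>&lt;a href="http://example.com/not-a-video"&gt;[link]&lt;/a&gt;</description>
    </item>
  </channel>
</rss>"""

# Illustrative pattern: href values that point at youtube.com or youtu.be.
YOUTUBE_HREF = re.compile(
    r'href="(https?://(?:www\.)?(?:youtube\.com|youtu\.be)/[^"]+)"'
)

root = etree.fromstring(SAMPLE_RSS)

# <channel> is not in a namespace, so the namespace-qualified lookup is empty.
namespaced = root.findall('{http://purl.org/dc/elements/1.1/}channel')

youtube_links = []
for entry in root.findall('channel/item'):
    # findtext() returns the unescaped description, i.e. the raw HTML snippet.
    desc = entry.findtext('description', default='')
    youtube_links.extend(YOUTUBE_HREF.findall(desc))

print(namespaced)
print(youtube_links)
```

A regular expression is enough for a quick scrape like this; for descriptions with more varied markup, an HTML parser would be the more robust choice.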

sputnick · Answer 2 · 2012-10-14 05:26:06Z

Here is a version using XPath expressions with lxml (tested successfully):

from lxml import etree
import urllib2

headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://www.reddit.com/r/videos/top/.rss', None, headers)
reddit_file = urllib2.urlopen(req).read()

reddit = etree.fromstring(reddit_file)

for item in reddit.xpath('/rss/channel/item'):
    print "title =", item.xpath("./title/text()")[0]
    print "description =", item.xpath("./description/text()")[0]
    print "thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0]
    print "link =", item.xpath("./link/text()")[0]
    print "-" * 100
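
Note that each [0] above raises IndexError when an item lacks that element. If lxml is not available, the thumbnail lookup can also be sketched with the standard library by passing a prefix-to-namespace mapping instead of using local-name(). This is a Python 3 sketch on a hand-written sample; the Media RSS namespace URI, the sample data, and the thumbnails() helper are assumptions for illustration.

```python
from xml.etree import ElementTree as etree

# Assumed namespace for <media:thumbnail> (Media RSS); check the feed's root tag.
NS = {'media': 'http://search.yahoo.com/mrss/'}

# Hand-written sample; the second item deliberately has no thumbnail.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First video</title>
      <link>http://example.com/1</link>
      <media:thumbnail url="http://example.com/thumb1.jpg" />
    </item>
    <item>
      <title>No thumbnail</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def thumbnails(xml_text):
    """Return (title, thumbnail URL or None) for every item in the feed."""
    root = etree.fromstring(xml_text)
    result = []
    for item in root.findall('channel/item'):
        # find() returns None when the element is missing, so nothing raises.
        thumb = item.find('media:thumbnail', NS)
        url = thumb.get('url') if thumb is not None else None
        result.append((item.findtext('title'), url))
    return result


for title, url in thumbnails(SAMPLE_RSS):
    print("title =", title, "| thumbnail =", url)
```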

asked	7 months ago
viewed	159 times
active	1 month ago
