I am trying to extract data from RSS documents, and I am using the following code to parse the XML.

It does not work for this document, though: http://www.mediafire.com/?hptptj8847awnn1 . Please help!

# Import an easy-to-use XML parser called minidom:
import xml.dom.minidom as minidom


def getTags(xml):
    """
    Print out all titles found in xml
    """
    doc = minidom.parse(xml)
    items = doc.getElementsByTagName("item")

    # Collect the first <title> element of each <item>.
    titles = []
    for item in items:
        titleObj = item.getElementsByTagName("title")[0]
        titles.append(titleObj)

    print(len(titles))

    # A title's text may be stored as a CDATA section or as a plain text node.
    for title in titles:
        for node in title.childNodes:
            if node.nodeType in (node.CDATA_SECTION_NODE, node.TEXT_NODE):
                titletxt = node.data
                print(titletxt)


if __name__ == "__main__":
    document = 'D2B0918.xml'
    getTags(document)
    
Define "won't work". – Michael Petrotta Nov 8 '11 at 4:47
    
Getting this error:
  line 10, in getTags
    doc = minidom.parse(xml)
  File "C:\Python26\lib\xml\dom\minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "C:\Python26\lib\xml\dom\expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "C:\Python26\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: not well-formed (invalid token): line 2, column 573
– ISGAL Nov 8 '11 at 4:49
    
Ok. What's on line 2, column 573? – Michael Petrotta Nov 8 '11 at 4:52
    
A character xAE – ISGAL Nov 8 '11 at 4:57

If you want to parse RSS in particular, I'll just humbly point you towards the excellent feedparser library, which probably does what you want and then some.

http://code.google.com/p/feedparser/
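
If you would rather stay with minidom, the error in the comments hints at one possible cause. A lone 0xAE byte is not valid UTF-8, which is what expat assumes when the file declares no encoding, but it is the ® sign in Latin-1. The sketch below assumes that is what is happening here (I have not inspected the file itself). The helper name parse_with_fallback and the latin-1 default are my own illustrative choices, not part of minidom.

```python
import xml.dom.minidom as minidom
from xml.parsers.expat import ExpatError


def parse_with_fallback(raw_bytes, fallback_encoding="latin-1"):
    """Parse XML bytes; if expat rejects them, retry assuming another encoding.

    Hypothetical helper: fallback_encoding is a guess about the feed,
    not something minidom detects for you.
    """
    try:
        # First attempt: let expat decode the bytes itself.
        return minidom.parseString(raw_bytes)
    except ExpatError:
        # Assume the bytes are really in fallback_encoding, then hand
        # expat clean UTF-8 instead.
        text = raw_bytes.decode(fallback_encoding)
        return minidom.parseString(text.encode("utf-8"))


if __name__ == "__main__":
    # Small stand-in for the feed: a title containing a raw 0xAE byte.
    sample = b"<rss><channel><item><title>Acme\xae news</title></item></channel></rss>"
    doc = parse_with_fallback(sample)
    title = doc.getElementsByTagName("title")[0]
    print(repr(title.firstChild.data))
```

For a file on disk, read it in binary mode (open(path, "rb").read()) and pass the bytes in. If the file carries an XML declaration that names some other encoding, the re-encoded bytes would contradict it, so this only fits feeds with no declaration or one that says UTF-8.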

