
This is my first time trying to parse XML with Python, so the answer may be simple, but I can't figure it out.

I'm using ElementTree to parse an XML file. The problem is that I can't get any results from the tree when the root element has this attribute:

<package xmlns="http://apple.com/itunes/importer" version="software5.1">

When I remove this attribute, everything works. To be clear, if I change the first line of the XML file to:

<package>

then everything works fine.

What am I doing wrong?

Here is my code:

import xml.etree.ElementTree as ET

tree = ET.parse('metadataCopy.xml')
root = tree.getroot()

# Look for the first <interval> inside an <intervals> element anywhere under the root
p = root.find(".//intervals/interval")

print(p)
for interval in root.iterfind(".//intervals/interval"):
    start_date = interval.find('start_date').text
    end_date = interval.find('end_date').text
    print(start_date, end_date)
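A minimal, self-contained reproduction of the behavior described above may help. The two inline documents are hypothetical trimmed-down excerpts standing in for metadataCopy.xml; the only difference between them is the xmlns declaration on the root.

```python
import xml.etree.ElementTree as ET

# Two trimmed-down documents that differ only in the default namespace declaration
WITH_NS = """<package xmlns="http://apple.com/itunes/importer" version="software5.1">
  <intervals><interval><start_date>2013-08-25</start_date></interval></intervals>
</package>"""

WITHOUT_NS = """<package>
  <intervals><interval><start_date>2013-08-25</start_date></interval></intervals>
</package>"""

results = {}
for label, text in (("with xmlns", WITH_NS), ("without xmlns", WITHOUT_NS)):
    root = ET.fromstring(text)
    # The same unqualified path used in the code above
    results[label] = root.find(".//intervals/interval")
    print(label, "->", results[label])
```

With the xmlns declaration the lookup prints None; without it, an Element for interval is found.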

Please help. Thanks!

UPDATE: The XML file:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://apple.com/itunes/importer" version="software5.1">
<metadata_token>TOKEN</metadata_token>
<provider>Provider Name</provider>
<team_id>Team_ID_Here</team_id>
<software>
    <!--Apple ID: 01234567-->
    <vendor_id>vendorSKU</vendor_id>
    <read_only_info>
        <read_only_value key="apple-id">01234567</read_only_value>
    </read_only_info>
    <software_metadata>
        <versions>
            <version string="1.0">
                <locales>
                    <locale name="en-US">
                        <title>title text</title>
                        <description>Description text</description>
                        <keywords>
                            <keyword>key1</keyword>
                            <keyword>key2</keyword>
                        </keywords>
                        <version_whats_new>New things here</version_whats_new>
                        <support_url>http://someurl.com</support_url>
                        <software_screenshots>
                            <software_screenshot display_target="iOS-3.5-in" position="1">

                            </software_screenshot>
                            <software_screenshot display_target="iOS-4-in" position="1">

                            </software_screenshot>
                        </software_screenshots>
                    </locale>
                </locales>
            </version>
        </versions>
        <products>
            <product>
                <territory>WW</territory>
                <cleared_for_sale>true</cleared_for_sale>
                <sales_start_date>2013-01-05</sales_start_date>
                <intervals>
                    <interval>
                        <start_date>2013-08-25</start_date>
                        <end_date>2014-09-01</end_date>
                        <wholesale_price_tier>5</wholesale_price_tier>
                    </interval>
                    <interval>
                        <start_date>2014-09-01</start_date>
                        <wholesale_price_tier>6</wholesale_price_tier>
                    </interval>
                </intervals>
                <allow_volume_discount>true</allow_volume_discount>
            </product>
        </products>
    </software_metadata>
</software>

1 Answer

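This happens because of the default namespace declared by xmlns="http://apple.com/itunes/importer": it applies to the root element and to every element inside it. ElementTree keeps the namespace as part of each tag name, so <interval> is stored as {http://apple.com/itunes/importer}interval, and a plain path like .//intervals/interval matches nothing. ElementTree does not apply a default namespace to search paths for you, so each element name in the path has to be qualified with a prefix that is mapped to the namespace URI.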
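To see what ElementTree actually stores, here is a small sketch on an inline excerpt (an assumption standing in for metadataCopy.xml). It lists every tag and shows that the bare name finds nothing, while both a prefix mapping and the URI spelled out inline in {uri}tag form find the element.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://apple.com/itunes/importer"
DOC = """<package xmlns="http://apple.com/itunes/importer">
  <intervals><interval/></intervals>
</package>"""

root = ET.fromstring(DOC)

# Every tag carries the namespace URI in {uri}localname form
tags = [el.tag for el in root.iter()]
print(tags)

bare = root.find(".//interval")                          # bare name: no match
prefixed = root.find(".//pns:interval", {"pns": NS_URI})  # prefix mapped to the URI
clark = root.find(".//{%s}interval" % NS_URI)             # URI written into the path
print(bare, prefixed, clark)
```

The prefix name ("pns" here) is arbitrary; only the URI it maps to has to match the one in the document.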

import xml.etree.ElementTree as ET

# Map a prefix of our choosing ("pns") to the URI declared by xmlns
namespaces = {"pns": "http://apple.com/itunes/importer"}
tree = ET.parse('metadataCopy.xml')
root = tree.getroot()

p = root.find(".//pns:intervals/pns:interval", namespaces=namespaces)

print(p)
for interval in root.iterfind(".//pns:intervals/pns:interval", namespaces=namespaces):
    start_date = interval.find('pns:start_date', namespaces=namespaces)
    end_date = interval.find('pns:end_date', namespaces=namespaces)
    # The second <interval> has no <end_date>, so find() can return None
    st_text = end_text = None
    if start_date is not None:
        st_text = start_date.text
    if end_date is not None:
        end_text = end_date.text
    print(st_text, end_text)

The XML file you shared is not well-formed: it is missing the closing </package> tag at the end. After adding it, the program prints:

<Element '{http://apple.com/itunes/importer}interval' at 0x178b350>
2013-08-25 2014-09-01
2014-09-01 None
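A quick way to check the well-formedness point: ElementTree refuses to parse a document whose root element is never closed. A minimal sketch, using a hypothetical truncated string in place of the posted file; the helper name is_well_formed is illustrative only.

```python
import xml.etree.ElementTree as ET

# Root element opened but never closed, like the file as posted
TRUNCATED = '<package xmlns="http://apple.com/itunes/importer"><software></software>'
FIXED = TRUNCATED + "</package>"

def is_well_formed(text):
    """Return True if ElementTree can parse the text, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        print("ParseError:", exc)
        return False
    return True

print(is_well_formed(TRUNCATED))
print(is_well_formed(FIXED))
```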

If you can switch libraries, consider lxml, which has good support for working with namespaces. See the short tutorial section here: http://lxml.de/tutorial.html#namespaces

  • I get this error:
        Traceback (most recent call last):
          File "XMLParser.py", line 75, in <module>
            p = root.find(".//pns:intervals/pns:interval", namespaces=namespaces)
        TypeError: find() takes no keyword arguments
    Commented Sep 1, 2013 at 11:45
  • Can you please also share the contents of the XML file? Commented Sep 1, 2013 at 11:54
  • Updated the answer. Please check. Commented Sep 1, 2013 at 12:14
