
I want to parse XML content and return a dictionary that contains only the name attributes and their values, with nested elements represented as dictionaries (or lists). For example:

<ecmaarray>
    <number name="xyz1">123.456</number>
    <ecmaarray name="xyz2">
        <string name="str1">aaa</string>
        <number name="num1">55</number>
    </ecmaarray>
    <strictarray name="xyz3">
        <string>aaa</string>
        <number>55</number>
    </strictarray>
</ecmaarray>

The output should be a dictionary something like this:

{'xyz1': 123.456,
 'xyz2': {'str1': 'aaa', 'num1': '55'},
 'xyz3': ['aaa', '55']}

Can anyone suggest a recursive solution for this?

  • xmltodict is ideal for this use case, though it likely won't generate exactly that dictionary (by default, anyway).
    – Brian Cain
    Commented Jul 17, 2013 at 12:20
  • Thanks, Jakob Bowyer and Brian Cain, for the quick response. Could you provide more detailed help?
    – Aryan
    Commented Jul 17, 2013 at 12:23
  • An xml2dict function will parse the XML into a Python dictionary in which elements are keys. You could then easily change the keys so that the dictionary has the format you desire. lxml lets you write XPath expressions to extract what you need. You could also write a complete parser with start_tag and end_tag functions, but I wouldn't recommend that over the other two suggestions. Try one of the above methods, then post back with your code if you have problems.
    – ChrisP
    Commented Jul 17, 2013 at 12:31

1 Answer


Assuming a situation like this:

<strictarray name="xyz4">
    <string>aaa</string>
    <number name="num1">55</number>
</strictarray>

is not possible, here's some sample code using lxml:

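from lxml import etree


tree = etree.parse('test.xml')

result = {}
# Iterate over the direct children of the root <ecmaarray>
for element in tree.xpath('/ecmaarray/*'):
    name = element.attrib["name"]
    text = element.text
    children = list(element)  # getchildren() is deprecated; list(element) is equivalent

    if not children:
        # Leaf element such as <number> or <string>: keep its text
        result[name] = text
    else:
        child_dict = {}
        child_list = []
        for child in children:
            child_name = child.attrib.get('name')
            child_text = child.text
            if child_name:
                # Named children go into a dict (e.g. the nested <ecmaarray>)
                child_dict[child_name] = child_text
            else:
                # Unnamed children go into a list (e.g. <strictarray>)
                child_list.append(child_text)

        if child_dict:
            result[name] = child_dict
        else:
            result[name] = child_list


print(result)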

prints:

{'xyz3': ['aaa', '55'], 
 'xyz2': {'str1': 'aaa', 'num1': '55'}, 
 'xyz1': '123.456'}

You may want to improve the code; it's just a hint of where to go.

Hope that helps.

  • Is there a recursive solution for this?
    – Aryan
    Commented Jul 19, 2013 at 9:19
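A minimal recursive sketch for the follow-up question: it swaps in the standard library's xml.etree.ElementTree for lxml (so it runs without extra dependencies) and dispatches on the tag name. The mapping is an assumption inferred only from the question's example: <ecmaarray> becomes a dict keyed by each child's name attribute, <strictarray> becomes a list, and every other element is a leaf whose text is kept as a string. The function name convert is hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's sample document, embedded as a string for a self-contained example
XML = """<ecmaarray>
    <number name="xyz1">123.456</number>
    <ecmaarray name="xyz2">
        <string name="str1">aaa</string>
        <number name="num1">55</number>
    </ecmaarray>
    <strictarray name="xyz3">
        <string>aaa</string>
        <number>55</number>
    </strictarray>
</ecmaarray>"""


def convert(element):
    """Recursively convert an element into a Python value (assumed mapping)."""
    if element.tag == "strictarray":
        # Array of unnamed values: keep child order, ignore names
        return [convert(child) for child in element]
    if element.tag == "ecmaarray":
        # Associative array: key each converted child by its name attribute
        return {child.get("name"): convert(child) for child in element}
    # Leaf such as <number> or <string>: return its text as a string
    return element.text


root = ET.fromstring(XML)
result = convert(root)
print(result)
# {'xyz1': '123.456', 'xyz2': {'str1': 'aaa', 'num1': '55'}, 'xyz3': ['aaa', '55']}
```

Because convert calls itself for every child, arrays nested to any depth are handled, for example a <strictarray> that contains an <ecmaarray>. Leaf values stay strings, matching the output of the lxml answer; if you want 123.456 as a float, one option is an extra branch for the number tag that returns float(element.text).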
