python xml parsing

Question

I need to delete particular tags from an XML file. A sample is below:

    <data>
      <tag:action/>
    </data>

I want to delete everything between <data> and </data>.

I can do this with the remove() method of Python's ElementTree XML parser. After deleting the element, I write the modified contents to a new file:

tree.write('new.xml')

The problem is that the namespace prefixes on all the tag names in the original XML file are renamed to ns0, ns1 and so on in new.xml.

Is there any way to modify the XML file while keeping all other contents intact?

That looks like an incomplete XML file to me. How would lxml know what namespace to associate with tag? — Anthon, May 8 at 5:25

kitekat75 · Answer 1 · 2014-05-09 06:50:04Z

You can use Beautiful Soup to do the job:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import bs4

content = '''
<people>

  <person born="1975">
    <name>
      <first_name>John</first_name>
      <last_name>Doe</last_name>
    </name>
    <profession>computer scientist</profession>
    <homepage href="http://www.example.com/johndoe"/>
  </person>

  <person born="1977">
    <name>
      <first_name>Jane</first_name>
      <last_name>Doe</last_name>
    </name>
    <profession>computer scientist</profession>
    <homepage href="http://www.example.com/janedoe"/>
  </person>

</people>
'''

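from bs4 import BeautifulSoup

# Name the parser explicitly. "lxml" (requires the lxml package) is an HTML
# parser, which is why the result is wrapped in <html><body>.
soup = BeautifulSoup(content, "lxml")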

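# soup('name') is shorthand for soup.find_all('name');
# extract() detaches each matching tag from the tree.
for s in soup('name'):
    s.extract()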

print(soup)

It produces the following result:

<html><body><people>
<person born="1975">

<profession>computer scientist</profession>
<homepage href="http://www.example.com/johndoe"></homepage>
</person>
<person born="1977">

<profession>computer scientist</profession>
<homepage href="http://www.example.com/janedoe"></homepage>
</person>
</people>
</body></html>

With namespaces:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import bs4

content = '''<people xmlns:h="http://www.example.com/to/">

  <h:person born="1975">
    <h:name>
      <h:first_name>John</h:first_name>
      <h:last_name>Doe</h:last_name>
    </h:name>
    <h:profession>computer scientist</h:profession>
    <h:homepage href="http://www.example.com/johndoe"/>
  </h:person>

  <h:person born="1977">
    <h:name>
      <h:first_name>Jane</h:first_name>
      <h:last_name>Doe</h:last_name>
    </h:name>
    <h:profession>computer scientist</h:profession>
    <h:homepage href="http://www.example.com/janedoe"/>
  </h:person>

</people>
'''

from bs4 import BeautifulSoup

# Same explicit parser as before; .people selects the <people> element.
soup = BeautifulSoup(content, "lxml").people

for s in soup('h:name'):
    s.extract()

print(soup)

I added .people to keep the enclosing <html><body> and </body></html> tags out of the result:

<people xmlns:h="http://www.example.com/to/">
<h:person born="1975">

<h:profession>computer scientist</h:profession>
<h:homepage href="http://www.example.com/johndoe"></h:homepage>
</h:person>
<h:person born="1977">

<h:profession>computer scientist</h:profession>
<h:homepage href="http://www.example.com/janedoe"></h:homepage>
</h:person>
</people>
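
If you would rather stay with ElementTree, the ns0/ns1 renaming described in the question can usually be avoided by registering each prefix before serializing. This is a minimal sketch, not a tested fix for the asker's file: the question does not show the namespace declaration, so the URI, the root element, and the tag:other element below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; use the one from the xmlns:tag declaration
# in the real file.
NS_URI = "http://www.example.com/tag"

# Hypothetical sample document built around the snippet from the question.
SAMPLE = '''<root xmlns:tag="http://www.example.com/tag">
  <data>
    <tag:action/>
  </data>
  <tag:other>kept</tag:other>
</root>'''

# Register the prefix before writing. Without this, ElementTree invents
# prefixes such as ns0 and ns1 when it serializes the tree.
ET.register_namespace("tag", NS_URI)

root = ET.fromstring(SAMPLE)

# Remove every child of each <data> element. Iterate over a copy of the
# child list because the element is modified inside the loop.
for data in root.iter("data"):
    for child in list(data):
        data.remove(child)

result = ET.tostring(root, encoding="unicode")
print(result)
```

For a file on disk, the same approach should work with ET.parse() and tree.write('new.xml'). Note that register_namespace() applies to the whole process, and it only helps if the prefix and URI match the declaration in the source file.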

Thank you for the answer. I got it working with Beautiful Soup. — Akhitha, May 8 at 11:30
Thank you for the answer. I got it working with Beautiful Soup. But there are namespaces in the XML tags. How can I search for a particular tag if a namespace is present? I used find and find_all, but they are not returning the values. — Akhitha, May 8 at 13:23
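
On searching when namespaces are present: the answer's soup('h:name') call is already a find_all on the prefixed name. As an alternative, the sketch below swaps in the standard library's ElementTree (not Beautiful Soup), where a prefix-to-URI mapping is passed to find() and findall(). It reuses the answer's sample data, shortened; the NS dictionary is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

CONTENT = '''<people xmlns:h="http://www.example.com/to/">
  <h:person born="1975">
    <h:name>
      <h:first_name>John</h:first_name>
      <h:last_name>Doe</h:last_name>
    </h:name>
    <h:profession>computer scientist</h:profession>
  </h:person>
  <h:person born="1977">
    <h:name>
      <h:first_name>Jane</h:first_name>
      <h:last_name>Doe</h:last_name>
    </h:name>
    <h:profession>computer scientist</h:profession>
  </h:person>
</people>'''

# Map the prefix used in the search paths to the namespace URI. The prefix
# here does not have to match the one in the document; only the URI matters.
NS = {"h": "http://www.example.com/to/"}

root = ET.fromstring(CONTENT)

# Search at any depth using the prefix mapping.
first_names = [e.text for e in root.findall(".//h:first_name", NS)]

# Equivalent lookup with the URI spelled out in braces, no mapping needed.
professions = root.findall(".//{http://www.example.com/to/}profession")

# Direct children of the root; the born attribute carries no namespace.
born = [p.get("born") for p in root.findall("h:person", NS)]

print(first_names, len(professions), born)
```

A search for a bare name such as "name" finds nothing in this document, because every element except the root lives in the h namespace. That may be what happened with find and find_all, though the comment does not give enough detail to say for certain.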

