Parsing XML to send data into several files

Question

My code works for the most part. (The problems are on the html side not the python side) however I feel it could run quicker or more efficiently. Due to the size of the XML document it parses it's always going to take a little while though.

The script parses the XML file and finds the names of several things, estates, symbols and ticktypes. I then create text files storing these items and for estates also use the information in an html file which will be expanded and improved later.

How could I make my code a bit sleeker and more efficient?

from xml.dom import minidom
from os import remove


#THIS IS THE LIVE XML FILE
xmldoc = minidom.parse('xxx.xml')

"""
THIS IS THE TEST XML FILE
xmldoc = minidom.parse('xxx.xml')
"""

#Establishing Lists of Items for use in defined Functions
symlist = xmldoc.getElementsByTagName('Symbol')
EstateList = xmldoc.getElementsByTagName('Estate')
ticklist = xmldoc.getElementsByTagName('Ticktype')
tickset = set(s.firstChild.data for s in ticklist)


#ESTATES
def ESTATES():
    with open('estate.txt', 'w') as f:
        f.write("There are currently %d Data Estates \n \n" % len(EstateList))
        for s in EstateList:
             f.write(s.attributes['EstateName'].value + "\n")

#TICK TYPES
def TICKS():
    with open('ticktypes.txt', 'w') as e:
        e.write("There are currently %d unique Tick Types \n \n" % len(tickset))
        for s in tickset:
            e.write(s + "\n")

#SYMBOLS
def SYMS():
    with open('symbollist.txt', 'w') as g:
        g.write("There are currently %d unique Symbols \n \n" % len(symlist))
        for s in symlist:
            g.write(s.attributes['SymbolName'].value + "\n")

 def uniquelines(lineslist):
     unique = {}
     result = []
    for item in lineslist:
        if item.strip() in unique: continue
        unique[item.strip()] = 1
        result.append(item)
    return result


with open('Z:\Joe\\temp.txt', 'w') as z:
    for element in EstateList:
        EstateStr = element.getAttribute('EstateName')
        asset = EstateStr.split('.', 3)[2]
        z.write(asset + "\n")


with open('Z:\Joe\\temp.txt','r+') as f:
    filelines = f.readlines()


with open("Z:\Joe\\assets.txt", "w") as output:
output.write("There are currently %d unique Assets \n \n" %len(uniquelines(filelines)))
output.writelines(uniquelines(filelines))

remove('Z:\Joe\\temp.txt')

ESTATES()
TICKS()
SYMS()

print "There are currently %d Symbols, %d Unique Tick Types, %d Data Estates and %d unique assets" % (len(symlist), len(tickset), len(EstateList), len(uniquelines(filelines)))


print "Compiling data into html pages"


with open("C:\code\jsdd\Dropdown\home.html", "w") as f:
    f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> \n')
    f.write('<html xmlns="http://www.w3.org/1999/xhtml"> \n')
    f.write('<head> \n')
    f.write('<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> \n')
    f.write('<title>JavaScript Dropdown</title> \n')
    f.write('<link rel="stylesheet" href="design.css" type="text/css" /> \n')
    f.write('<script type="text/javascript" src="drop.js"></script> \n')
    f.write('</head> \n')
    f.write('<body> \n')
    f.write('\n')
    f.write('<dl class="dropdown"> \n')
    f.write('\t <dt id="one-ddheader" onmouseover="ddMenu(\'one\',1)" onmouseout="ddMenu(\'one\',-1)">Data Estates</dt> \n')
    f.write('\t <dd id="one-ddcontent" onmouseover="cancelHide(\'one\')" onmouseout="ddMenu(\'one\',-1)"> \n')
    f.write('\t \t <ul> \n')
    for estate in EstateList:
        f.write('\t \t  <li><a href="#" class="seperator">' + (estate.attributes['EstateName'].value) + '</a></li> \n')
    f.write('\t \t </ul> \n')
    f.write('\t </dd> \n')
    f.write('\t </dl> \n')
    f.write('</dl> \n')
    f.write('<div style="clear:both" /> \n')
    f.write('</body> \n')
    f.write('</html> \n')

I have changed some file names and hidden some as this is for work so don't want to risk breaching confidentiality.

jcollado · Accepted Answer · 2014-07-31 08:13:13Z

up vote 1 down vote accepted

Not having any example xml file, I cannot test the code provided, but I can see that it relies on xml.dom.minidom.getElementsByTagName. My suggestion here is to give a try xml.sax and see if it's more efficient because:

There's no need to keep in memory the whole xml document
You can get results about the three things you're interested in in a single pass

Some comments regarding the style of the code provided:

Use # for comments. Note that triple quotes is just a way to create a string object and drop it immediately unless is assigned to a variable.
Function names should be in lower case letters
Avoid short names for variables like e and g (I don't have a problem with f though if it's a file).
Use a main function in the script to make clearer what is the intended execution path
Write docstrings (PEP257)
Use a template engine to generate the HTML document (I like jinja, but there are others like mako or cheetah).

edited Jul 31 '14 at 8:13

answered Jul 30 '14 at 14:02

jcollado

1,28847

I would've liked to add an example XML file but because it deals with financial data I didn't want to upload the actual thing. It also contains 90,000+ of the symbols I mention in the script, so to edit it to hide the data would've taken far far too long. I shall take a look at xml.sax to see if it improves things, thanks for the recommendation – Joe Smart Jul 30 '14 at 14:04

@JoeSmart Ok, I understand. You can also try to generate random data. I've added some more comments regarding the code style, please have a look at them and, if you don't have much time, at least try to use a template engine, you won't regret. – jcollado Jul 30 '14 at 14:13

I've followed up some of those and modified my script a bit to better fit pythonic convention. However I am unsure what you mean by main method to make the script clearer. I am pretty new to python (only learnt it over the last week) and I've not really heard of doing that before. @jcollado – Joe Smart Jul 31 '14 at 8:05

Regarding the main function (sorry, I said method in my original response), have a look at this old article from Guido van Rossum. – jcollado Jul 31 '14 at 8:12

Brilliant, I see now how that could be very useful when writing a script that includes functions purely for the purpose of importing them into another script. However I'm still struggling to see how that would enhance my script. It's always going to be run as a standalone script and not imported. Is a main method still useful and do I just do if __name__ == "__main__": for each function I call? @jcollado – Joe Smart Jul 31 '14 at 8:30

| show 2 more comments

asked	2 years ago
viewed	202 times
active	2 years ago

current community

your communities

more stack exchange communities

Parsing XML to send data into several files

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged python html parsing xml file-system or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Parsing XML to send data into several files

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python html parsing xml file-system or ask your own question.

Related

Hot Network Questions