How to optimize this program in Python?

Question

I have one file with indexes. For example:

1    Fruit
2    Meat
3    Fish        
4    Salmon
5    Pork
6    Apple

I also have a dictionary, because I want to match entries that I choose. For example:

similar = {'1' : '6', '2' : '5'}

Since they are in the same file, my program scans the file for the '1', for example, and then re-scans from the beginning looking for the '6'. It does the same for every number, re-scanning each time.

It's a very large file with a lot of numbers.

This is some of the code:

similar = { '4719' : '4720c01' }

for aline in invFormatted:
    lines = aline.split("\t") #Splits the CSV file on every tab.
    lotID = lines[0]
    partNO = lines[1] #Splits and gets the part that would be the "index"
    if similar.has_key(partNO):
        for alines in invFormatted:
            liness = alines.split("\t")
            lotIDmatch = liness[0]
            partNOmatch = liness[1]
            if liness[1] == similar[partNO]:

Is there a way to make it scan the file only once?

Or are there any other ideas for improving this program?

Cheers!

For fear of falling into the TL;DR category or the "you want me to do this code for you" category, I didn't post the full thing, but since some people asked, here it is (don't kill me).

There is a text file formatted this way.

51689299    4719    Medium Azure    Riding Cycle
51689345    4720c01 Trans-Clear Wheel & Tire Assembly

In real life, it would have thousands of entries.

Then, my Python file 'knows' that part number 4719 matches part number 4720c01.

similar = { '4719' : '4720c01' }

Now, what it does. (I think!)

invFormatted = open('export.txt', 'r') #Opens the file with the part numbers

with open ('upload.txt', 'a') as upload:

    upload.write("<INVENTORY>") #Something that would be on the export

    for aline in invFormatted:
        lines = aline.split("\t") #Splits the text file on every tab
        lotID = lines[0] #The lot ID is the first word
        partNO = lines[1] #Part number the second one
        if partNO in similar: #Sees if the part number in question has a match, if YES
            for alines in invFormatted: #Searches my inventory file again to see if the match is in my inventory
                liness = alines.split("\t")
                lotIDmatch = liness[0]
                partNOmatch = liness[1]
                if liness[1] == similar[partNO]: #if YES, prints out what I want it to print, and goes back to the first step
                    upload.write("blabla")

    invFormatted.close()

    upload.write("\n</INVENTORY>")

There it is, thank you!
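One thing worth checking in the code above: a file object is its own iterator, so an inner loop over the same handle does not restart from the beginning. It consumes the remaining lines, and the outer loop then ends. The following is a minimal sketch of that behavior, not the original program. It assumes an in-memory io.StringIO as a stand-in for export.txt; the third sample row and the helper name outer_lines_seen are hypothetical.

```python
import io

# Stand-in for export.txt (tab-separated). The third row is a made-up
# example, and the column split of the second row is an assumption.
sample = (
    "51689299\t4719\tMedium Azure\tRiding Cycle\n"
    "51689345\t4720c01\tTrans-Clear\tWheel & Tire Assembly\n"
    "51689400\t3001\tRed\tBrick 2 x 4\n"
)


def outer_lines_seen(handle):
    """Return the part numbers the outer loop visits when an inner
    loop iterates over the same file handle."""
    seen = []
    for line in handle:
        seen.append(line.split("\t")[1])
        for _ in handle:  # shares one iterator with the outer loop...
            pass          # ...so it consumes every remaining line
    return seen


visited = outer_lines_seen(io.StringIO(sample))
print(visited)  # only the first part number is ever visited
```

If a genuine re-scan is wanted, the handle would have to be rewound (for example with seek(0)) or the file reopened, which is exactly the repeated work the question is trying to avoid.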

This is what I have so far; I think it's close to as much optimization as I can manage. Maybe someone can think of something else?

parts = {}

infile = open("export.txt")
for line in infile:
    line = line.strip()
    xml = [p.strip() for p in line.split("\t")]
    parts[xml[1]] = (xml[0], xml[2], xml[3])

for aline in invFormatted:
    lines = aline.split("\t") #Splits the CSV file on every ","
    lotID = lines[0]
    partNO = lines[1]
    if partNO in similar:
        similarPart = similar[partNO]
        if partNO in parts:
                upload.write("\n <ITEM>\n  <LOTID>" + lotID + "</LOTID>\n  <DESCRIPTION>To be used with &amp;lt;a href=\"/storeDetail.asp?b=-16205034&amp;h=314137&amp;q=" + similarPart + "\"&amp;gt;" + similarPart + "&amp;lt;/a&amp;gt;</DESCRIPTION>\n" + " </ITEM>")

#Splits the CSV file on every "," - no it doesn't. It splits the lines on every tab. Also, if this is supposed to be CSV, there's a module for that. — user2357112, Jan 6 at 9:31
Your example code doesn't demonstrate the double-scanning your question is about. Could you provide a different example? — user2357112, Jan 6 at 9:32
Of course, that comment was actually from another file I had, where it split the lines on commas. There's always a module for that. :-) — Brick Top, Jan 6 at 9:32
The simple thing to do is to read the whole file into memory as a dict, and do your lookup there. If the file is too big to fit in memory, you can use pytables with an indexed column. Or even simpler: if the numbers always count up continuously from 1, you can simply seek to the correct line in the file. — Eelco Hoogendoorn, Jan 6 at 9:39
@EelcoHoogendoorn, so, something like this? stackoverflow.com/questions/4803999/python-file-to-dictionary PS the file actually has 3 variables, can you do a 3 variable dict? — Brick Top, Jan 6 at 9:44
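
The suggestions in the comments above (read the file once into a dict, use the csv module, keep more than one value per key) could be combined roughly as follows. This is a sketch under assumptions, written for Python 3: the four-column layout, the column names, and the helper names load_parts and find_matches are illustrative, and an io.StringIO stands in for export.txt.

```python
import csv
import io

# Stand-in for export.txt; the column split of the second row is assumed.
sample = (
    "51689299\t4719\tMedium Azure\tRiding Cycle\n"
    "51689345\t4720c01\tTrans-Clear\tWheel & Tire Assembly\n"
)

similar = {'4719': '4720c01'}


def load_parts(handle):
    """Read the file once, mapping part number -> (lot ID, color, description).

    A tuple as the dict value is one way to keep several fields per key.
    """
    parts = {}
    for row in csv.reader(handle, delimiter="\t"):
        lot_id, part_no, color, description = row
        parts[part_no] = (lot_id, color, description)
    return parts


def find_matches(parts, similar):
    """Return (lot ID, matching part number) pairs for every part whose
    declared match is also present in the inventory."""
    matches = []
    for part_no, (lot_id, _color, _description) in parts.items():
        match = similar.get(part_no)
        if match is not None and match in parts:
            matches.append((lot_id, match))
    return matches


parts = load_parts(io.StringIO(sample))
matches = find_matches(parts, similar)
print(matches)
```

Each lookup in parts is a dict access rather than another scan of the file. One caveat of this layout: if the same part number appears in several lots, later rows overwrite earlier ones, so a list per key would be needed in that case.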

volcano · Accepted Answer · 2014-01-06 09:40:23Z

If you rebuild your similar dictionary to be two-way:

complement = {v:k for k, v in similar.iteritems()}
similar.update(complement)

You may skip the second pass (BTW, drop has_key; it is the old form and was removed in Python 3):

if part_no in similar:
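
For readers on Python 3, where iteritems() no longer exists, the same idea might be sketched as below. The helper name make_two_way and the list of part numbers are hypothetical, chosen only to show the membership test working in a single pass.

```python
similar = {'4719': '4720c01'}


def make_two_way(pairs):
    """Return a copy of pairs with every value -> key mapping added."""
    two_way = dict(pairs)
    two_way.update({v: k for k, v in pairs.items()})
    return two_way


two_way = make_two_way(similar)

# Part numbers in the order a single pass over the file would meet them
# (hypothetical values; '3001' has no declared counterpart).
part_numbers = ['4719', '4720c01', '3001']
has_counterpart = [p for p in part_numbers if p in two_way]
print(has_counterpart)
```

Note that the membership test only shows that a counterpart is declared in the dictionary; it does not by itself confirm that the counterpart appears in the file.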

migrated from stackoverflow.com Jan 15 at 14:14
