My first useful projects as a programmer have been Python scripts that parse relevant information out of log files and do some analysis. I've bumped around and found my way to some working solutions, but I have a sneaking suspicion there are more efficient approaches.
I will outline my current process in four basic steps:
- Clean up the source data: In the more involved scenarios I have a text file that is generally some sort of CSV variant. "Generally" because it might require a first pass to clean up outlier lines before I can effectively use the `csv` module.
- Write clean data to temporary text file: After cleaning up each line, I write the line to a fresh text file.
- Read the cleaned temp file back in using the `csv` module: I've assumed that reading the data in with the standard `csv` module would be a reasonably efficient method, and ideal because I can then easily extract values from specific columns in each line.
- Extract relevant values: Now I can easily traverse the whole file, grabbing relevant data. I append the data to lists, which I then use for the actual analysis.
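The four steps above can be sketched roughly like this (the `clean_line` fix-ups and the column indices are hypothetical placeholders; the real outlier handling depends on the log format):

```python
import csv
import os
import tempfile

def clean_line(line):
    # Placeholder cleanup; substitute the format-specific outlier fixes here.
    return line.strip()

def extract_columns(path, col_indices):
    """Steps 1-4: clean, write a temp file, re-read it with csv, extract."""
    # Steps 1 + 2: clean each line and write it to a temporary file.
    with open(path) as src, tempfile.NamedTemporaryFile(
            mode='w', suffix='.csv', delete=False) as tmp:
        for line in src:
            tmp.write(clean_line(line) + '\n')
        tmp_path = tmp.name

    # Steps 3 + 4: read the cleaned file back and collect the wanted columns.
    columns = [[] for _ in col_indices]
    with open(tmp_path, newline='') as f:
        for row in csv.reader(f):
            for out, i in zip(columns, col_indices):
                out.append(row[i])
    os.remove(tmp_path)
    return columns
```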
The big red flag for me is that I'm traversing all of my data so many times. Maybe I should spend more time finding patterns in the data so I can extract the important values on the first pass? Also, with larger logs (20,000+ lines) one of my scripts takes 15-30 seconds, which seems rather slow.
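One way to collapse the multiple passes into one: `csv.reader` accepts any iterable of strings, so a generator of cleaned lines can feed it directly, and the data is cleaned, parsed, and extracted in a single traversal with no temporary file. A minimal sketch (again, `clean` is a placeholder for the real fix-ups):

```python
import csv

def cleaned_lines(path):
    """Lazily yield cleaned lines; nothing is written back to disk."""
    with open(path) as src:
        for line in src:
            # Placeholder cleanup; substitute the real outlier fixes here.
            yield line.strip()

def analyze(path, col_indices):
    """Single pass: csv.reader consumes the generator directly."""
    columns = [[] for _ in col_indices]
    for row in csv.reader(cleaned_lines(path)):
        for out, i in zip(columns, col_indices):
            out.append(row[i])
    return columns
```

Because everything is lazy, memory use stays flat even on large logs, and each line is touched exactly once.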
What are the areas for optimization? Be it a modification of the current design or a completely different approach.
`cleanup file.txt | analyze` to avoid the temporary file. – U2EF1 Feb 9 '14 at 1:00
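The commenter's pipeline works if the analysis script reads from standard input instead of a named file. A minimal sketch of the `analyze` side (the script names come from the comment; the row-counting "analysis" here is purely illustrative):

```python
import csv
import sys

def analyze_stream(stream):
    """Consume already-cleaned CSV from a stream, e.g. piped-in stdin."""
    totals = {}
    for row in csv.reader(stream):
        # Illustrative analysis: count rows per value of the first column.
        totals[row[0]] = totals.get(row[0], 0) + 1
    return totals

if __name__ == '__main__':
    # Invoked as:  cleanup file.txt | analyze
    for key, count in analyze_stream(sys.stdin).items():
        print(key, count)
```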