I have a bunch of .csv files that I need to read and search for data. Each .csv file has the following format:

A row of data I will ignore
State,County,City
WA,king,seattle
WA,pierce,tacoma

The order of the columns is not consistent across the csv files. For example, in csv1 the order can be State,County,City, while in csv2 it can be City,County,State. I am interested in the State and County columns: given a county, I want to find out which State it is in. I am ignoring the fact that the same county name can exist in multiple States. This is my current approach:

with open(‘file.csv’) as f:
    data = f.read()

# convert the data to iterable, skip the first line
reader = csv.DictReader(data.splitlines(1)[1:])
lines = list(reader)
counties = {k: v for (k, v) in ((line[‘county’], line[‘State’]) for line in lines)}

Is there a better approach to this?

1 Answer

You're on the right track, using a with block to open the file and csv.DictReader() to parse it.

Your list handling is a bit clumsy, though. To skip a line, use next(f). Avoid making a list of the entire file's data, if you can process the file line by line. The dict comprehension has an unnecessary complication as well.

import csv

with open('file.csv') as f:
    _ = next(f)  # skip the first row, which is not part of the table
    reader = csv.DictReader(f)  # the next row is used as the header
    counties = {line['County']: line['State'] for line in reader}

Your sample file had County as the header, whereas your code looked for line[‘county’]. I assume that the curly quotes are an artifact of copy-pasting, but you should pay attention to the capitalization.
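If the capitalization of the headers cannot be relied on across files, one option is to normalize the header names before looking them up. The following is only a minimal sketch, assuming Python 3; the helper name county_to_state and the decision to lowercase the headers are illustrative and not part of the original code:

```python
import csv
import io


def county_to_state(f):
    """Map county -> state from an open CSV file, ignoring header case.

    Hypothetical helper: assumes the first row is to be ignored and the
    second row is the header, as in the sample file in the question.
    """
    next(f)  # skip the row that is not part of the table
    reader = csv.DictReader(f)
    # Reading .fieldnames consumes the header row; store a lowercased copy
    # so that 'County', 'county', and 'COUNTY' all become 'county'.
    reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    return {row['county']: row['state'] for row in reader}


# Sample data with the columns in a different order than in the question.
sample = io.StringIO(
    "A row of data I will ignore\n"
    "City,County,State\n"
    "seattle,king,WA\n"
    "tacoma,pierce,WA\n"
)
counties = county_to_state(sample)
```

Because the lookup goes through column names rather than positions, the column order in each file does not matter.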

I am actually getting the data from an S3 bucket, but I didn't want to complicate the example. I get the key from the bucket and then call data = key.get_contents_as_string(), so I am not really reading from a file; the contents of the key are the string representation of the csv file. I like the way you eliminated the list and cleaned up the dict comprehension. Is there a way to avoid data.splitlines(1)[1:] when I create the reader, since I already have the data in a string (and I need to ignore the first row)? –  Mark Nov 26 '14 at 3:39
With all due respect, if you choose to strip out key relevant details in the code you submit for review, you should be prepared to accept an answer that addresses the code you submitted, not the code you had in mind. We would have been quite happy to review the code you actually wrote, had you submitted that instead. –  200_success Nov 26 '14 at 4:15
I apologize, I should have stated that earlier. Thank you for your help, I do appreciate it –  Mark Nov 26 '14 at 14:10
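
Regarding the follow-up about data that is already held in a string: one possible approach is to wrap the string in io.StringIO, which gives a file-like object that supports next() just like an open file. This is a sketch under stated assumptions, not a tested S3 solution: it assumes Python 3 and that data is a str (if it is bytes, it would need to be decoded first), and the helper name county_to_state_from_string is made up for illustration:

```python
import csv
import io


def county_to_state_from_string(data):
    """Hypothetical helper: 'data' is the whole CSV file as a str."""
    f = io.StringIO(data)  # file-like view of the string
    next(f)  # skip the first row, as with a real file
    reader = csv.DictReader(f)  # the next row is used as the header
    return {row['County']: row['State'] for row in reader}


# Stand-in for the string obtained from the bucket.
data = (
    "A row of data I will ignore\n"
    "State,County,City\n"
    "WA,king,seattle\n"
    "WA,pierce,tacoma\n"
)
counties = county_to_state_from_string(data)
```

This avoids data.splitlines(1)[1:] and the intermediate list, and the rest of the code stays the same as in the file-based version.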
