I have a CSV file that I need to rearrange and re-encode. I'd like to run

line = line.decode('windows-1250').encode('utf-8')

on each line before it is parsed and split by the CSV reader. Alternatively, I'd like to iterate over the lines myself, run the re-encoding, and use the CSV library to parse one line at a time, but with the same reader instance.

Is there a way to do that nicely?

Can you use Python 3? – Tim Pietzcker Feb 25 '10 at 15:24
No, but is there any difference? – WooYek Feb 27 '10 at 10:55

3 Answers

Looping over the lines of a file can be done this way:

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        print line  # here you can convert the encoding and save the lines

But if you want to convert the encoding of a whole file, you can also call:

$ iconv -f Windows-1250 -t UTF-8 < file.csv > file.utf8.csv

Edit: So where is the problem?

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        line = line.decode('windows-1250').encode('utf-8')
        elements = line.split(",")
I do not want to read/write the file twice. The iconv solution is lame; I want it done in code, not by some tool. I need to create a tool that will prepare files in one process, not instructions on how to do that. – WooYek Feb 25 '10 at 14:31
Again, no support for CSV parsing at the same time, line splitting just won't cut it. – WooYek Feb 25 '10 at 14:39
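To illustrate the point in the comment above, here is a minimal sketch (written for Python 3, with a made-up sample row) of why a plain split is not a substitute for CSV parsing: a quoted field may contain the delimiter itself.

```python
import csv
import io

# Hypothetical sample row: the second field is quoted and contains the delimiter.
line = 'Kowalski;"Łódź; Polska";42\n'

# Naive splitting cuts the quoted field in two and leaves the quote characters in.
naive = line.rstrip("\n").split(";")

# csv.reader honours the quoting rules and returns three clean fields.
parsed = next(csv.reader(io.StringIO(line), delimiter=";", quotechar='"'))

print(naive)
print(parsed)
```

The naive result has four elements, while the parsed result has the three fields the row actually contains.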
up vote 1 down vote accepted

Thanks for the answers. The wrapping one gave me an idea:

import csv

def reencode(file):
    # Re-encode each line lazily, just before csv.reader consumes it
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

# In Python 2, files passed to the csv module should be opened in binary mode
csv_writer = csv.writer(open(outfilepath, 'wb'), delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_reader = csv.reader(reencode(open(filepath, 'rb')), delimiter=';', quotechar='"')
for c in csv_reader:
    l = c  # rearrange columns here
    csv_writer.writerow(l)

That's exactly what I was going for: re-encoding a line just before it gets parsed by the csv_reader.
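As for the question in the comments about whether Python 3 makes a difference: there the files can be opened in text mode with an explicit encoding, so no wrapping generator is needed. The following is only a sketch under that assumption; the function name `convert` and its `rearrange` parameter are hypothetical and not part of the code above.

```python
import csv

def convert(src_path, dst_path, rearrange=lambda row: row):
    """Read a ';'-delimited windows-1250 file and write it as ','-delimited UTF-8."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, encoding="windows-1250", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter=";", quotechar='"')
        writer = csv.writer(dst, delimiter=",", quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            # rearrange is a caller-supplied function that reorders the columns
            writer.writerow(rearrange(row))
```

A call such as `convert('input.csv', 'output.csv', rearrange=lambda row: row[::-1])` would reverse the column order while re-encoding, still in a single pass over the file.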


At the very bottom of the csv documentation is a set of classes (UnicodeReader and UnicodeWriter) that implement Unicode support for csv:

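# UnicodeReader and UnicodeWriter are recipes from the csv documentation;
# they are not part of the csv module and must be copied into your code.
rfile = open('input.csv', 'rb')
wfile = open('output.csv', 'wb')
csv_reader = UnicodeReader(rfile, encoding='windows-1250')
csv_writer = UnicodeWriter(wfile, encoding='utf-8')
for c in csv_reader:
    # process Unicode lines
    csv_writer.writerow(c)
rfile.close()
wfile.close()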