I don't have enough reputation to add comments, so I'll write up some of my findings about this annoying TypeError here:
Basically, I think it's a bug in the json.dump() function in Python 2 only: it can't dump Python data (a dictionary or list) containing non-ASCII characters, even if you open the file with the encoding='utf-8' parameter (i.e. no matter what you do). However, json.dumps() works on both Python 2 and 3.
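As a small sketch of what json.dumps() does with such data: it returns a string instead of writing to a file, and ensure_ascii decides whether non-ASCII characters are escaped. This assumes Python 3, and the variable names escaped and raw are only for illustration.

```python
import json

data = {u'\u0430\u0431\u0432\u0433\u0434': 1}  # {u'абвгд': 1}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False: the characters are kept as they are in the result.
raw = json.dumps(data, ensure_ascii=False)

print(escaped)
# Both strings decode back to the same data.
print(json.loads(escaped) == json.loads(raw))
```

Either form is valid JSON; the difference is only in how the text looks.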
To illustrate this, following up on phihag's answer: the code in his answer breaks in Python 2 with the exception TypeError: must be unicode, not str if data contains non-ASCII characters (Python 2.7.6, Debian):
import json
data = {u'\u0430\u0431\u0432\u0433\u0434': 1} #{u'абвгд': 1}
with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)
It works fine in Python 3, however.
Antony Hatchkins's first solution works in Python 2, but as a follow-up to his answer: the unicode part is not necessary (and IMHO, it's wrong). The following works on my system (ensure_ascii=False is also not strictly needed for the code to work; it only affects the output format):
import io, json
with io.open('data.txt', 'w', encoding='utf-8') as f:
    f.write(json.dumps(data, ensure_ascii=False))
In fact, the above code works in both Python 2 and 3 (the original code with the unicode() call won't work in Python 3, because the unicode() function no longer exists there). I guess the reason both versions (with or without the unicode() call) work in Python 2 is that Python 2 uses bytes when writing to the file handle, and unicode is converted to bytes implicitly before writing.
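One way to check the result is to read the file back and compare it with the original data. This is a minimal sketch, assuming Python 3; the temporary directory and the names path, loaded and raw_bytes are made up for the example.

```python
import io
import json
import os
import tempfile

data = {u'\u0430\u0431\u0432\u0433\u0434': 1}  # {u'абвгд': 1}

# Hypothetical location for the example; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# Write the JSON text through a UTF-8 text-mode file object.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(data, ensure_ascii=False))

# Read it back as text and parse it again.
with io.open(path, 'r', encoding='utf-8') as f:
    loaded = json.load(f)

# Read the raw bytes to see what was actually stored on disk.
with io.open(path, 'rb') as f:
    raw_bytes = f.read()

print(loaded == data)
```

With ensure_ascii=False the file holds the UTF-8 bytes of the characters themselves rather than \uXXXX escapes, so it stays readable in a text editor.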
json docs – Martin Thoma Oct 11 '15 at 14:44