I was trying to open a file on Ubuntu in Python using:
open('<unicode_string>', "wb")
unicode_string is '\u9879\u76ee\u7ba1\u7406', which is Chinese text. But I get the following error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
I am trying to understand what limits this behavior. I went through some links, and I understand that the OS's filesystem is responsible for encoding the string. On Windows, the 'mbcs' encoding possibly handles every character. What could be the problem on Linux?
- Does not fail for all linux setups. What should I be checking?
- `locale` (type that command, show its output)? Notably the `LANG` and `LC_ALL` environment variables? – Basile Starynkevitch Mar 23 at 20:53
- `mbcs` does not handle every character; it would limit you to characters in the ANSI code page for that machine. However, Python has special support for accepting pathnames as Unicode strings and passing those directly to Win32-specific APIs instead of using the C standard library calls with `mbcs` encoding. – bobince Mar 24 at 11:41
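
The check suggested in the comments can also be done from inside Python. The following is a minimal sketch, not a definitive diagnosis: `describe_environment` and `can_encode` are hypothetical helper names introduced here for illustration, and the sketch assumes the failure comes from a locale whose filesystem encoding cannot represent the Chinese characters (as the 'latin-1' in the error message suggests).

```python
import locale
import os
import sys

# The file name from the question (the u prefix keeps it a Unicode
# string on Python 2 as well as Python 3).
name = u'\u9879\u76ee\u7ba1\u7406'


def describe_environment():
    """Collect the settings that influence how Python encodes file names."""
    return {
        'filesystem_encoding': sys.getfilesystemencoding(),
        'preferred_encoding': locale.getpreferredencoding(),
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
    }


def can_encode(text, encoding):
    """Return True if every character of text fits in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Print the relevant settings, one per line.
for key, value in sorted(describe_environment().items()):
    print('{0}: {1}'.format(key, value))

# Compare two candidate encodings against the file name.
for encoding in ('utf-8', 'latin-1'):
    print('{0} can encode the name: {1}'.format(encoding, can_encode(name, encoding)))
```

If the reported filesystem encoding is not UTF-8, the setup probably differs from the working machines in `LANG` or `LC_ALL`. On Linux, one possible workaround is to pass the name as UTF-8 bytes, for example `open(name.encode('utf-8'), 'wb')`, since the filesystem stores file names as raw bytes; whether that is appropriate depends on how other programs on the machine interpret those names.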