Unicode
Most Java program text consists of ASCII characters, but Unicode letters and digits can be used in identifier names, and any Unicode character can appear in comments and in character and string literals. For example, π (the Greek lowercase letter pi) is a valid Java identifier:
Code section 3.100: Pi.
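A minimal sketch of π used as an identifier. It assumes the source file is saved and compiled as UTF-8 (for example with `javac -encoding UTF-8`); the class name `PiIdentifier` and the method `circleArea` are illustrative only.

```java
class PiIdentifier {
    // The constant is named with the Greek letter itself.
    static final double π = Math.PI;

    // Area of a circle, written with the non-ASCII identifier.
    static double circleArea(double radius) {
        return π * radius * radius;
    }

    public static void main(String[] args) {
        System.out.println(circleArea(1.0));
    }
}
```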
and in a string literal:
Code section 3.101: Pi literal.
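A short sketch of the same character inside a string literal, again assuming a UTF-8 source file; the names `PiLiteral` and `piLiteral` are hypothetical.

```java
class PiLiteral {
    // The literal holds one character, the Greek letter pi.
    static String piLiteral() {
        String pi = "π";
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(piLiteral());
    }
}
```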
Unicode escape sequences
Unicode characters can also be expressed through Unicode escape sequences. A Unicode escape sequence may appear anywhere in a Java source file (including inside identifiers, comments, and string literals).
Unicode escape sequences consist of
- a backslash '\' (ASCII character 92, hex 0x5c),
- a 'u' (ASCII 117, hex 0x75),
- optionally one or more additional 'u' characters, and
- four hexadecimal digits (the characters '0' through '9', 'a' through 'f', or 'A' through 'F').
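The parts listed above can be tried out in a small sketch. The class and method names are made up for illustration; each method returns a character written in a different but equivalent form.

```java
class EscapeForms {
    // The letter written directly.
    static char plain() {
        return 'a';
    }

    // The same letter as an escape with a single u marker.
    static char oneMarker() {
        return '\u0061';
    }

    // Additional u markers are permitted and change nothing.
    static char threeMarkers() {
        return '\uuu0061';
    }

    // Hexadecimal digits may be upper case ...
    static char upperHex() {
        return '\u00E9';
    }

    // ... or lower case.
    static char lowerHex() {
        return '\u00e9';
    }

    public static void main(String[] args) {
        System.out.println(plain() == oneMarker());
        System.out.println(oneMarker() == threeMarkers());
        System.out.println(upperHex() == lowerHex());
    }
}
```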
Each such sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. A single escape cannot express a character beyond U+FFFF; such a character has to be written as a surrogate pair, that is, as two consecutive escapes.[1]
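As a sketch of the surrogate pair case, the character U+1F600 (chosen here only as an example of a code point above U+FFFF) can be written as two escapes. The class name `SurrogatePair` is hypothetical; the `String` and `Character` methods used are part of the standard library.

```java
class SurrogatePair {
    // High surrogate D83D followed by low surrogate DE00 encodes U+1F600.
    static final String GRINNING = "\uD83D\uDE00";

    public static void main(String[] args) {
        // Two UTF-16 code units ...
        System.out.println(GRINNING.length());
        // ... but only one Unicode code point.
        System.out.println(GRINNING.codePointCount(0, GRINNING.length()));
        System.out.println(Integer.toHexString(GRINNING.codePointAt(0)));
    }
}
```

Note that `length()` counts code units, so it reports 2 for this single character, while `codePointCount` reports 1.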
Every character in a program may be written as a Unicode escape sequence, but such programs are neither compact nor readable to anyone except the Java compiler.
One can find a full list of the characters here.
π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:
Code section 3.102: Unicode escape sequences for Pi.
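A minimal sketch of such a declaration follows. The surrounding class `PiEscape` and the method `value` are illustrative; because the identifier is written purely as an escape, the file contains only ASCII characters and compiles under any source encoding.

```java
class PiEscape {
    static double value() {
        // The variable is named pi (the Greek letter), spelled as an escape.
        double \u03C0 = Math.PI;
        return \u03C0;
    }

    public static void main(String[] args) {
        System.out.println(value());
    }
}
```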
The following demonstrates the use of Unicode escape sequences in other Java syntax:
Code section 3.103: Unicode escape sequences in a string literal.
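One way this might look, as a sketch with invented names: escapes appear inside two string literals and, to show that they are not limited to literals, in the first letter of a keyword.

```java
class EscapeInLiteral {
    // A string containing the Greek letter pi.
    static String pi() {
        return "\u03C0";
    }

    // A string containing an apostrophe, which needs no escaping in a string.
    static String apostrophe() {
        return "\u0027";
    }

    // The keyword int with its first letter written as an escape.
    static int viaEscapedKeyword() {
        \u0069nt n = 42;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(pi() + " " + apostrophe() + " " + viaEscapedKeyword());
    }
}
```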
Note that a Unicode escape sequence behaves exactly like the character it stands for, because the compiler translates escapes before it analyses the rest of the source. For example, \u0022 (double quote, ") must be escaped inside a string literal just like a literal ".
Code section 3.104: Double quote.
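A hedged sketch of the safe forms, with hypothetical names. Placing the escape for U+0022 directly between the quotes of a string literal would end the literal early and fail to compile, so the string version uses the ordinary backslash escape instead.

```java
class DoubleQuote {
    // A one-character string holding a double quote, written with
    // the usual backslash escape for string literals.
    static String asString() {
        return "\"";
    }

    // In a character literal the double quote needs no escaping,
    // so the Unicode escape can be used directly.
    static char asChar() {
        return '\u0022';
    }

    public static void main(String[] args) {
        System.out.println(asString().charAt(0) == asChar());
    }
}
```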
International language support
The language distinguishes between bytes and characters. A char is a 16-bit value that originally corresponded to a UCS-2 character; as of J2SE 5.0 it is treated as a UTF-16 code unit, and characters beyond U+FFFF are supported through surrogate pairs. Java program source may therefore contain any Unicode character.
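The difference between bytes and characters can be seen in a small sketch. The class and method names are illustrative; `String.getBytes(Charset)` and `StandardCharsets` are standard library APIs.

```java
import java.nio.charset.StandardCharsets;

class BytesVersusChars {
    // Number of char values (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes needed to encode the string as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String pi = "\u03C0";
        // One character, but two bytes once encoded as UTF-8.
        System.out.println(charCount(pi));
        System.out.println(utf8ByteCount(pi));
    }
}
```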
The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal:
Code listing 3.50: 哈嘍世界.java
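A sketch of what such a file might contain, assuming it is saved and compiled as UTF-8. Only the class name comes from the listing title; the method name 問候 and the variable name 文本 are invented for illustration. The class is left package-private here; it could be declared public if the file is named 哈嘍世界.java.

```java
class 哈嘍世界 {
    // Returns the greeting held in a variable with a Chinese name.
    static String 問候() {
        String 文本 = "哈嘍世界";
        return 文本;
    }

    public static void main(String[] args) {
        System.out.println(問候());
    }
}
```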