Java Programming/Syntax/Unicode Escape Sequences

Most characters in a Java source program are ASCII characters, although any Unicode character can be used in comments and in character and string literals. Unicode characters can also be written as Unicode escape sequences.

A Unicode escape sequence may appear anywhere in a Java source file, including inside identifiers, comments, and string literals, because the compiler translates escapes before it splits the source into tokens.
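As a rough illustration, the sketch below uses an escape inside an identifier, a comment, and a string literal. The class and member names (EscapeAnywhere, apple, readApple, greeting) are invented for this example.

```java
public class EscapeAnywhere {
    // In a comment an escape such as \u0061 is also translated (here to 'a').
    // The field below is spelled with an escape, so it actually declares "apple".
    static int \u0061pple = 3;

    static int readApple() {
        return apple; // the same variable, spelled directly
    }

    static String greeting() {
        return "Hell\u006f"; // the escape stands for 'o'
    }

    public static void main(String[] args) {
        System.out.println(readApple()); // prints 3
        System.out.println(greeting());  // prints Hello
    }
}
```

Spelling an identifier with escapes is legal but rarely a good idea; it is shown here only to make the "anywhere" rule concrete.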

A Unicode escape sequence consists of:

a backslash '\' (ASCII character 92, hex 0x5c),
a 'u' (ASCII 117, hex 0x75),
optionally one or more additional 'u' characters, and
four hexadecimal digits (the characters '0' through '9' or 'a' through 'f' or 'A' through 'F').
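The parts listed above can be checked with a small sketch. The class and method names are made up for illustration; the comparisons rely only on ordinary char literals.

```java
public class EscapeForms {
    // Additional 'u' characters after the first one do not change the meaning.
    static boolean extraUsAreAllowed() {
        return '\u0061' == '\uuu0061';
    }

    // The four hexadecimal digits may be written in lower or upper case.
    static boolean hexCaseIsIrrelevant() {
        return '\u00e9' == '\u00E9';
    }

    public static void main(String[] args) {
        System.out.println(extraUsAreAllowed());   // prints true
        System.out.println(hexCaseIsIrrelevant()); // prints true
    }
}
```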

Each such sequence represents one UTF-16 code unit; for example, 'a' is equivalent to '\u0061'. A single escape therefore cannot represent a character beyond U+FFFF; such a character has to be written as a surrogate pair, that is, two consecutive escapes.^[1]
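To make the surrogate-pair case concrete, here is a minimal sketch using U+1F600, a character chosen for this example rather than taken from the text. It needs two escapes and therefore occupies two char values in a String. The class and member names are hypothetical.

```java
public class SurrogatePairDemo {
    // U+1F600 lies beyond U+FFFF, so it is written as a high and a low surrogate.
    static final String GRINNING_FACE = "\uD83D\uDE00";

    // Number of UTF-16 code units (char values) in the string.
    static int charCount() {
        return GRINNING_FACE.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePointCount() {
        return GRINNING_FACE.codePointCount(0, GRINNING_FACE.length());
    }

    // The code point reassembled from the two surrogates.
    static int codePoint() {
        return GRINNING_FACE.codePointAt(0);
    }

    // Character.toChars performs the opposite conversion, code point to surrogates.
    static boolean matchesToChars() {
        return GRINNING_FACE.equals(new String(Character.toChars(0x1F600)));
    }

    public static void main(String[] args) {
        System.out.println(charCount());                      // prints 2
        System.out.println(codePointCount());                 // prints 1
        System.out.println(Integer.toHexString(codePoint())); // prints 1f600
        System.out.println(matchesToChars());                 // prints true
    }
}
```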

Any character in a program may be written as a Unicode escape sequence, but such programs are hard for anyone except the Java compiler to read, and they are not compact either.
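The loss of compactness is easy to see with a small helper that rewrites text entirely in escapes. This is only a sketch; toEscaped is a hypothetical method written for this example, not part of any Java library.

```java
public class EscapeEverything {
    // Hypothetical helper: rewrites every UTF-16 code unit of the input
    // as a six-character Unicode escape sequence.
    static String toEscaped(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            // The doubled backslash stops the compiler from treating this
            // format string as an escape sequence itself.
            out.append(String.format("\\u%04x", (int) source.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = toEscaped("int");
        System.out.println(escaped);          // the keyword "int" spelled in escapes
        System.out.println(escaped.length()); // prints 18
    }
}
```

For the three-character keyword int the helper produces \u0069\u006e\u0074, which is six times as long and much harder to recognize.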

A full list of the characters can be found in the Unicode chart.

References

1. "3.1 Unicode", The Java™ Language Specification, Java SE 7 Edition, pp. 15-16.
