Java Programming/Syntax/Unicode Escape Sequences

From Wikibooks, open books for an open world

Most characters in a Java source program are ASCII characters, although any Unicode character can be used in comments and in character and string literals, and Unicode letters and digits can be used in identifiers. Unicode characters can also be written as Unicode escape sequences.

A Unicode escape sequence may appear anywhere in a Java source file, including inside identifiers, comments, and string literals, because the compiler translates the escapes before it splits the source into tokens.

Unicode escape sequences consist of

  1. a backslash '\' (ASCII character 92, hex 0x5c),
  2. a 'u' (ASCII character 117, hex 0x75),
  3. optionally one or more additional 'u' characters, and
  4. four hexadecimal digits (the characters '0' through '9' or 'a' through 'f' or 'A' through 'F').
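
The four parts listed above can be seen in a short program. This is only a minimal sketch: the class name `UnicodeEscapeDemo` and its field names are made up for illustration. It places escapes in an identifier, a string literal, and character literals, and uses both the repeated-'u' form and mixed-case hex digits.

```java
public class UnicodeEscapeDemo {
    // An escape inside an identifier: the compiler reads this field name as "apple".
    static int \u0061pple = 3;

    // An escape inside a string literal: the value is the three characters "abc".
    static String word = "\u0061bc";

    // Extra 'u' characters are allowed; both literals denote the character 'a'.
    static char single = '\u0061';
    static char multiple = '\uuu0061';

    // Hex digits may be lower or upper case; both literals denote U+00E9.
    static char lower = '\u00e9';
    static char upper = '\u00E9';

    public static void main(String[] args) {
        System.out.println(apple);               // the plain spelling names the same field
        System.out.println(word.equals("abc"));
        System.out.println(single == 'a');
        System.out.println(multiple == single);
        System.out.println(lower == upper);
    }
}
```

Running it should print 3 followed by four lines of true, since every escaped spelling denotes the same character as its plain counterpart.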

Each such sequence represents one UTF-16 code unit; for example, 'a' is equivalent to '\u0061'. A single escape therefore cannot express a character beyond U+FFFF; such a character has to be written as a surrogate pair, that is, two consecutive escapes.[1]
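
As an illustration of the surrogate-pair case, the sketch below writes U+1F600 (a character chosen here only as an example of one beyond U+FFFF) as two escapes. The class name `SurrogatePairDemo` and the constant name are hypothetical; the methods called are standard `String` and `Character` methods.

```java
public class SurrogatePairDemo {
    // U+1F600 lies beyond U+FFFF, so it is spelled as a high and a low surrogate.
    static final String GRINNING_FACE = "\uD83D\uDE00";

    public static void main(String[] args) {
        int units = GRINNING_FACE.length();
        int codePoints = GRINNING_FACE.codePointCount(0, units);
        int value = GRINNING_FACE.codePointAt(0);

        System.out.println(units);                        // 2 UTF-16 code units
        System.out.println(codePoints);                   // 1 Unicode character
        System.out.println(Integer.toHexString(value));   // 1f600

        // Building the same character from its code point gives the same pair.
        String rebuilt = new String(Character.toChars(0x1F600));
        System.out.println(rebuilt.equals(GRINNING_FACE)); // true
    }
}
```

The string has length 2 because `length()` counts UTF-16 code units, while `codePointCount` reports the single character that the two escapes encode together.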

Any character in a program may be written as a Unicode escape sequence, but a program written that way is hard for anyone but the Java compiler to read. It is not compact either: every character takes at least six characters of source text.
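
To make the loss of readability and compactness concrete, here is a small sketch that rewrites a piece of source text entirely as escapes. The class `EscapeEverything` and the helper `toUnicodeEscapes` are hypothetical names introduced for this example and are not part of any Java library.

```java
public class EscapeEverything {
    // Hypothetical helper: rewrites every UTF-16 code unit of the input
    // as a backslash, a 'u', and four hexadecimal digits.
    static String toUnicodeEscapes(String source) {
        StringBuilder out = new StringBuilder();
        for (char c : source.toCharArray()) {
            // The doubled backslash is a string escape for one backslash,
            // so the format produces the text of an escape sequence.
            out.append(String.format("\\u%04x", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String plain = "int a;";
        String escaped = toUnicodeEscapes(plain);
        System.out.println(escaped);
        System.out.println(plain.length() + " -> " + escaped.length());
    }
}
```

For the six-character declaration used here, the escaped form is 36 characters long, and the compiler would accept it in place of the original text.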

A full list of the characters can be found in the Unicode chart.

References

  1. "3.1 Unicode", The Java™ Language Specification, Java SE 7 Edition, pp. 15-16.