Suppose I press the A key in a text editor and this inserts the character a
in the document and displays it on the screen. I know the editor application isn't directly communicating with the hardware (there's a kernel and stuff in between), so what is going on inside my computer?
|
|||
|
There are several different scenarios; I'll describe the most common ones. The successive macroscopic events are:
GUI applicationsThe de facto standard graphical user interface of unix systems is the X Window System, often called X11 because it stabilized in the 11th version of its core protocol between applications and the display server. A program called the X server sits between the operating system kernel and the applications; it provides services including displaying windows on the screen and transmitting key presses to the window that has the focus. Input
First, information about the key press and key release is transmitted from the keyboard to the computer and inside the computer. The details depend on the type of hardware. I won't dwell more on this part because the information remains the same throughout this part of the chain: a certain key was pressed or released.
When a hardware event happens, the CPU triggers an interrupt, which causes some code in the kernel to execute. This code detects that the hardware event is a key press or key release coming from a keyboard and records the scan code which identifies the key. The X server reads input events through a device file, for example X calls the scan code that it reads a keycode. The X server maintains a table that translates key codes into keysyms (short for “key symbol”). Keycodes are numeric, whereas keysyms are names such as There are two mechanisms to configure the mapping from keycodes to keysyms:
Applications connect to the X server and receive a notification when a key is pressed while a window of that application has the focus. The notification indicates that a certain keysym was pressed or released as well as what modifiers are currently pressed. You can see keysyms by running the program In a typical configuration, when you press the key labeled A with no modifiers, this sends the keysym Relationship of keyboard layout and xmodmap goes into more detail on keyboard input. How do mouse events work in linux? gives an overview of mouse input at the lower levels. Output
There are two ways to display a character.
See What are the purposes of the different types of XWindows fonts? for a discussion of client-side and server-side text rendering under X11. What happens between the X server and the Graphics Processing Unit (the processor on the video card) is very hardware-dependent. Simple systems have the X server draw in a memory region called a framebuffer, which the GPU picks up for display. Advanced systems such as found on any 21st century PC or smartphone allow the GPU to perform some operations directly for better performance. Ultimately, the GPU transmits the screen content pixel by pixel every fraction of a second to the monitor. Text mode application, running in a terminalIf your text editor is a text mode application running in a terminal, then it is the terminal which is the application for the purpose of the section above. In this section, I explain the interface between the text mode application and the terminal. First I describe the case of a terminal emulator running under X11. What is the exact difference between a 'terminal', a 'shell', a 'tty' and a 'console'? may be useful background here. After reading this, you may want to read the far more detailed What are the responsibilities of each Pseudo-Terminal (PTY) component (software, master side, slave side)? Input
The terminal emulator receives events like “ Printable characters outside the ASCII range are transmitted as one or more byte depending on the character and encoding. For example, in the UTF-8 encoding of the Unicode character set, characters in the ASCII range are encoded as a single bytes, while characters outside that range are encoded as multiple bytes. Key presses that correspond to a function key or a printable character with modifiers such as Ctrl or Alt are sent as an escape sequence. Escape sequences typically consist of the character escape (byte value 27 = 0x1B = Different terminals send different escape sequences for a given key or key combination. Fortunately, the converse is not true: given a sequence, there is in practice at most one key combination that it encodes. The one exception is the character 127 = 0x7f = In a terminal, if you type Ctrl+V followed by a key combination, this inserts the first byte of the escape sequence from the key combination literally. Since escape sequences normally consist only of printable characters after the first one, this inserts the whole escape sequence literally. See key bindings table? for a discussion of zsh in this context. The terminal may transmit the same escape sequence for some modifier combinations (e.g. many terminals transmit a space character for both Space and Shift+Space). A few keys are not transmitted at all, for example modifier keys or keys that trigger a binding of the terminal emulator (e.g. a copy or paste command). It is up to the application to translate escape sequences into symbolic key names if it so desires. Output
Output is rather simpler than input. If the application outputs a character to the pty device file, the terminal emulator displays it at the current cursor position. (The terminal emulator maintains a cursor position, and scrolls if the cursor would fall under the bottom of the screen.) The application can also output escape sequences (mostly beginning with Escape sequences supported by the terminal emulator are described in the termcap or terminfo database. Most terminal emulator nowadays are fairly closely aligned with xterm. See Documentation on LESS_TERMCAP_* variables? for a longer discussion of terminal capability information databases, and How to stop cursor from blinking and Can I set my local machine's terminal colors to use those of the machine I ssh into? for some usage examples. Application running in a text consoleIf the application is running directly in a text console, i.e. a terminal provided by the kernel rather than by a terminal emulator application, the same principles apply. The interface between the terminal and the application is still a byte stream which transmits characters, with special keys and commands encoded as escape sequences. Remote application, accessed over the networkRemote text applicationIf you run a program on a remote machine, e.g. over SSH, then the network communication protocol relays data at the pty level.
This is mostly transparent, except that sometimes the remote terminal database may not know all the capabilities of the local terminal. Remote X11 applicationThe communication protocol between applications an the server is itself a byte stream that can be sent over a network protocol such as SSH.
This is mostly transparent, except that some acceleration features such as movie decoding and 3D rendering that require direct communication between the application and the display are not available. |
|||||||||
|
If you want to see this in a Unix system that is small enough to be understandable, dig into Xv6. It is more or less the mythical Unix 6th Edition that became the base of John Lion's famous comentary, long circulated as samizdat. Its code was reworked to compile under ANSI C and taking modern developments, like multiprocessors, in count. |
|||
|