Let Σ be a non-empty, finite set of symbols, called an alphabet. Then Σ* is the countable infinite set of finite words that can be formed by concatenating zero or more symbols from Σ. Any well-defined subset L ⊆ Σ* is a language.
Let's apply this to XML. Its alphabet is the Unicode character set U, which is non-empty and finite. Not every concatenation of zero or more Unicode characters is a well-formed XML document, for example, the string
<tag> soup &; not <//good>
is clearly not. The subset XML ⊂ U* that forms well-formed XML documents is decidable (or “recursive”). There exists a machine (algorithm or computer program) that takes as input any word w ∈ U* and after a finite amount of time, outputs either 1 if w ∈ XML and 0 otherwise. Such an algorithm is a sub-routine of any XML processing software. Not all languages are decidable. For example, the set of valid C programs that terminate in a finite amount of time, is not (this is known as the halting problem). When one designs a new language, an important decision to make is whether it should be as powerful as possible or whether the expressiveness would better be restricted in favor of decidability.
Schemata are an addition to XML that allow refining the set of well-formed documents. A well-formed document that follows a certain schema is called valid according to that schema. For example, the string
<?xml version="1.0" encoding="utf-8" ?>
<root>all evil</root>
is a well-formed XML document but not a valid XHTML document. There exists schemata for XHTML, SVG, XSLT and what not else. Schema validation can also be done by an algorithm that is guaranteed to halt after finite amount of steps for every input. Such a program is called a validator or a validating parser.
Because you can define your own schemata, XML is called an extensible language, which is the origin of the “X” in “XML”.
You can define a set of rules that gives XML documents an interpretation as descriptions of computer programs. XSLT, mentioned earlier, is an example of such a programming language built with XML. More generally, you can serialize the abstract syntax tree of almost any programming language quite naturally into XML, if this is what you want.