The xml
data type can be used to
store XML data. Its advantage over storing XML data in a
text
field is that it checks the input
values for well-formedness, and there are support functions to
perform type-safe operations on it; see Section 9.14. Use of this
data type requires the installation to have been built with
configure --with-libxml
.
The xml
type can store well-formed
“documents”, as defined by the XML standard,
as well as “content” fragments, which are defined by
the production XMLDecl? content
in
the XML standard. Roughly, this means that content fragments can
have more than one top-level element or character node. The
expression
can
be used to evaluate whether a particular xmlvalue
IS DOCUMENTxml
value is a full document or only a content
fragment.
To produce a value of type xml
from character data, use the function xmlparse
:
XMLPARSE ( { DOCUMENT | CONTENT } value
)
Examples:
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>') XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:
xml '<foo>bar</foo>' '<foo>bar</foo>'::xml
can also be used.
The xml
type does not validate
input values against a document type declaration (DTD), even when the input
value specifies a DTD. There is also currently no built-in
support for validating against other XML schema languages such
as XML Schema.
The inverse operation, producing a character string value
from xml
, uses the function
xmlserialize
:
XMLSERIALIZE ( { DOCUMENT | CONTENT }value
AStype
)
type
can be
character
, character varying
, or text
(or an alias for one of those). Again,
according to the SQL standard, this is the only way to convert
between type xml
and character types,
but PostgreSQL also allows you to simply cast the value.
When a character string value is cast to or from type
xml
without going through
XMLPARSE
or XMLSERIALIZE
, respectively, the choice of
DOCUMENT
versus CONTENT
is determined by the “XML option” session
configuration parameter, which can be set using the standard
command:
SET XML OPTION { DOCUMENT | CONTENT };
or the more PostgreSQL-like syntax
SET xmloption TO { DOCUMENT | CONTENT };
The default is CONTENT
, so all
forms of XML data are allowed.
With the default XML option setting, you cannot directly
cast character strings to type xml
if they contain a document type declaration, because the
definition of XML content fragment does not accept them. If
you need to do that, either use XMLPARSE
or change the XML option.
Care must be taken when dealing with multiple character
encodings on the client, server, and in the XML data passed
through them. When using the text mode to pass queries to the
server and query results to the client (which is the normal
mode), PostgreSQL converts all character data passed between
the client and the server and vice versa to the character
encoding of the respective end; see Section 23.3. This
includes string representations of XML values, such as in the
above examples. This would ordinarily mean that encoding
declarations contained in XML data can become invalid as the
character data is converted to other encodings while traveling
between client and server, because the embedded encoding
declaration is not changed. To cope with this behavior,
encoding declarations contained in character strings presented
for input to the xml
type are
ignored, and content is
assumed to be in the current server encoding. Consequently, for
correct processing, character strings of XML data must be sent
from the client in the current client encoding. It is the
responsibility of the client to either convert documents to the
current client encoding before sending them to the server, or
to adjust the client encoding appropriately. On output, values
of type xml
will not have an encoding
declaration, and clients should assume all data is in the
current client encoding.
When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.
Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.
Some XML-related functions may not work at all on
non-ASCII data when the server encoding is not UTF-8. This is
known to be an issue for xmltable()
and xpath()
in particular.
The xml
data type is unusual in
that it does not provide any comparison operators. This is
because there is no well-defined and universally useful
comparison algorithm for XML data. One consequence of this is
that you cannot retrieve rows by comparing an xml
column against a search value. XML values
should therefore typically be accompanied by a separate key
field such as an ID. An alternative solution for comparing XML
values is to convert them to character strings first, but note
that character string comparison has little to do with a useful
XML comparison method.
Since there are no comparison operators for the xml
data type, it is not possible to create an
index directly on a column of this type. If speedy searches in
XML data are desired, possible workarounds include casting the
expression to a character string type and indexing that, or
indexing an XPath expression. Of course, the actual query would
have to be adjusted to search by the indexed expression.
The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.
If you see anything in the documentation that is not correct, does not match your experience with the particular feature or requires further clarification, please use this form to report a documentation issue.