XML data type in SQL Server 2005 vs. VARCHAR (MAX)
As a database administrator, I tend to look closely at performance issues and how to make sure
the use of XML does not affect SQL Server performance. In this tip, I'll take you through an
example using two tables, one inserted and queried with XML and the other with the VARCHAR (MAX)
data type. Take a look at the storage, CPU and I/O results and make the best choice for your SQL
Server environment.
Note: The tests used here use only base tables, with no indexes.
If you're interested in a comparison of performance implications using
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
This was first published in January 2008
T-SQL commands
and XML AUTO in SQL Server, check out my previous tip.
The XML data type
The XML data type logically differs from the VARCHAR (MAX) in the checks done by SQL Server to make
sure the content is indeed valid XML.
Testing environment description For my testing, I will use XML I copied from an *.rdl
file (Reporting Services Report), which is 265KB long. I will create two tables with the same
structure, except one table uses XML data type and the other uses VARCHAR (MAX) data type for the
MyXML field:
create table TryVACRCHARDatatype(
id int identity not null,
MyXML VARCHAR(MAX) null
)
go
create table TryXMLDatatype (
id int identity not null,
MyXML XML null
) Go
I will insert the XML data the same way to each table:
set statistics io on
Go
declare @XML XML
-- My big XML (for space reasons I am not including all of it
here) SET @XML = '<?xml
version="1.0" encoding="utf-8"?>
………
'
insert into TryXMLDatatype (MyXML) values
(@XML)
go
declare @Varch VARCHAR(MAX)
-- My big XML (for space reasons I am not including all of it
here)
SET @Varch = '<?xml version="1.0"
encoding="utf-8"?>
………
'
insert into TryXMLDatatype (MyXML) values
(@Varch)
Go
Insert XML data into the tables
I monitored the above insert commands in SQL Profiler and with statistics I/O, running them
twice.
Statistics I/O results:
Table 'TryXMLDatatype'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob
logical reads 18, lob physical reads 0, lob
read-ahead reads 0.
Table 'TryVACRCHARDatatype'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads
0, lob logical reads 90, lob physical reads 0, lob
read-ahead reads 0.
Execution plans show the exact same plan for both commands.
Profiler results (two executions of each insert):
Note: The XML insertion takes more CPU but less reads and writes. Also, inserting the XML
data type has more duration. I will analyze this behavior in the next section.
Querying the tables
I ran SELECT * FROM in each table and monitored with Profiler and Statistics I/O:
SELECT * FROM
TryXMLDatatype
Go
SELECT * FROM TryVACRCHARDatatype
Go
Results:
Note: The values in the fields are different!
Statistics I/O results:
Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob
logical reads 56, lob physical reads 0, lob
read-ahead reads 18. Table 'TryVACRCHARDatatype'.
Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 200, lob physical reads 0, lob read-ahead reads 0.
Less I/O is issued with the XML data type.
Again, execution plans show the exact same plan for both commands.
Why is there such a difference between the reads of the two tables?
Note: The duration of extracting the XML is higher. The duration is subjected to many
different factors depending on the activity on the machine. I'll ignore this difference, mainly
because no CPU is shown. If CPU activity was involved, it might explain the difference in duration,
but this is not the case.
Let's query the length of the columns in each table:
SELECT datalength(MyXML)
FROM TryXMLDatatype
Go
SELECT datalength(MyXML) FROM TryVACRCHARDatatype
Go
Surprisingly, the XML field has far fewer characters than the other. Here is the
explanation:
XML -- less I/O:
When inserting the XML data type, the "extra" data such as the "
and tabs
are removed from the field too. The result is a much more economical storage.
I tried to copy the value from the XML column to the VARCHAR(MAX) column as follows:
truncate table TryVARCHARDatatype
Go
insert into TryVARCHARDatatype (MyXML)
select convert(varchar(max),MyXML) from TryXMLDatatype
Go
The result:
Looks the same, but…
SELECT datalength(MyXML)
FROM TryXMLDatatype
Go
SELECT
datalength(MyXML)
FROM
TryVACRCHARDatatype
Go
This is better than before (136,268 instead of 271,210), but still not the same. If I convert
the XML to a VARCHAR(MAX) and calculate the length:
SELECT
datalength(convert(varchar(max),MyXML)) FROM TryXMLDatatype
go
SELECT datalength(MyXML)FROM TryVARCHARDatatype
Go
Same length.
This means that the XML datatype is stored in a more efficient way. And now that both tables
contain the same data, let's view the Profiler trace of the select * from both tables (three
executions):
There are about 20% fewer reads for the XML datatype.
 |
More on improving SQL Server performance: |
|
|
|
 |
 |
The statistics I/O:
Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob
logical reads 56, lob physical reads 0, lob
read-ahead reads 18.
Table 'TryVARCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0,
lob logical reads 100, lob physical reads 0, lob
read-ahead reads 0.
Again, it shows a more efficient read operation with the XML datatype.
What about querying the XML converted to VARCHAR(MAX):
SELECT convert(varchar(max),MyXML) FROM TryXMLDatatype
Go
SELECT MyXML FROM
TryVARCHARDatatype
Go
Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob
logical reads 20, lob physical reads 0, lob
read-ahead reads 9.
Table 'TryVARCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0,
lob logical reads 100, lob physical reads 0, lob
read-ahead reads 0.
The only metric with a constant behavior is the reads I/O. Furthermore, it is more likely for
the XML convertion query to use CPU for the conversion -- that's normal.
Conclusion
The XML data type is meant to store pure XML data, excluding unnecessary headers and trailing
characters. This results in a more cost-effective storage by means of I/O, yet there is still the
CPU cost for checking that the data is valid XML.
ABOUT THE AUTHOR
Michelle Gutzait works as a senior database consultant for ITERGY International Inc., an IT consulting firm
specializing in the design, implementation, security and support of Microsoft products in the
enterprise. Gutzait has been involved in IT for 20 years as a developer, business analyst and
database consultant. For the last 10 years, she has worked exclusively with SQL Server. Her skills
include SQL Server infrastructure design, database design, performance tuning, security, high
availability, VLDBs, replication, T-SQL/packages coding, and more.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation