How to maintain SQL Server indexes for query optimization
Matthew Schroeder, Contributor
Maintaining SQL Server indexes is an uncommon practice. If a query stops using indexes,
oftentimes a new non-clustered index is created that simply holds a different combination of
columns or the same columns. Why SQL Server is ignoring the existing indexes is rarely analyzed
in detail.
This was first published in March 2008.
Let's take a look at how clustered and non-clustered indexes are selected and why query optimizer
might choose a table scan instead of a
non-clustered index. In this tip, you'll learn how page splits, fragmented indexes, table
partitions and statistics updates affect the use of indexes. Ultimately, you'll find out how to
maintain SQL Server indexes so that query optimizer uses these indexes, and so these indexes are
searched quickly.
Index selection
Clustered indexes are by far the easiest to understand in the area of index selection. Clustered
indexes are basically keys that reference each row uniquely. Even if you define a clustered index
and do not declare it as unique, SQL Server still makes the clustered key unique behind the
scenes by adding a 4-byte "uniqueifier" to rows whose key values are duplicated. The additional
"uniqueifier" increases the width of the clustered key, which causes increased maintenance time
and slower searches. Since the clustered key is what identifies each row, and every non-clustered
index stores it as its row locator, it is involved in almost every query.
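As a minimal sketch of the point about the "uniqueifier", the T-SQL below declares the clustered index as unique up front. The table dbo.OrderLines, its columns and the index name are hypothetical, and the sketch assumes that (OrderId, LineNumber) really is unique in your data.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.OrderLines (
    OrderId    int NOT NULL,
    LineNumber int NOT NULL,
    Quantity   int NOT NULL
);

-- (OrderId, LineNumber) is assumed to be unique per row, so the index can be
-- declared UNIQUE and SQL Server has no need to add a uniqueifier.
CREATE UNIQUE CLUSTERED INDEX CIX_OrderLines
    ON dbo.OrderLines (OrderId, LineNumber);
```

If the key cannot be guaranteed unique, the CREATE statement fails on duplicate data, which is a useful early warning in itself.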
When we start talking about non-clustered indexes, things get confusing. Queries can ignore
non-clustered indexes for the following reasons:
- High fragmentation – If an index is fragmented over 40%, the optimizer will probably ignore the
index because it's more costly to search a fragmented index than to perform a table scan.
- Uniqueness – If the optimizer determines that a non-clustered index is not very unique (that
is, not selective), it may decide that a table scan is faster than trying to use the
non-clustered index. For example: If a query references a bit column (where bit = 1) and the
statistics on the column say that 75% of the rows are 1, then the optimizer will probably decide
that a table scan will get the results faster than going through a non-clustered index.
- Outdated statistics – If the statistics on a column are out of date, then SQL Server can
misjudge the benefit of a non-clustered index. Automatic statistics updates don't just slow
down your data modification scripts; over time the statistics can also drift out of sync with
the real distribution of the rows. Occasionally it's a good idea to run sp_updatestats or
UPDATE STATISTICS.
- Function usage – SQL Server is unable to seek on an index if the indexed column is wrapped in
a function in the criteria. If you're referencing a non-clustered index column, but you're using
a function such as convert(varchar, Col1_Year) = 2004, then SQL Server cannot seek on the index
on Col1_Year. Comparing the bare column instead (Col1_Year = 2004) keeps the index usable.
- Wrong columns – If a non-clustered index is defined on (col1, col2, col3) and your query has a
where clause, such as "where col2 = 'somevalue'", that index won't be used for a seek. A
non-clustered index can only be used for a seek if the first column in the index is referenced
within the where clause. A where clause, such as "where col3 = 'someval'", would not use the
index, but a where clause, like "where col1 = 'someval'" or "where col1 = 'someval' and
col3 = 'someval2'" would pick up the index.
The index would not use col3 for its seek, since that column does not come directly after col1 in
the index definition. If you wanted col3 to have a seek occur in situations such as this, then it
is best if you define two separate non-clustered indexes, one on col1 and the other on col3.
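The "function usage" and "wrong columns" cases above can be sketched in T-SQL as follows. The table dbo.Orders, its columns and the index name are hypothetical, made up for illustration. The plan SQL Server actually picks also depends on row counts and statistics, so treat the comments as the expected tendency rather than a guarantee.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE dbo.Orders (
    OrderId    int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    OrderYear  int         NOT NULL,
    CustomerId int         NOT NULL,
    Status     varchar(20) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_Orders_Year_Customer_Status
    ON dbo.Orders (OrderYear, CustomerId, Status);

-- Function wrapped around the indexed column: no seek on OrderYear.
SELECT OrderId
FROM dbo.Orders
WHERE CONVERT(varchar(4), OrderYear) = '2004';

-- Same filter against the bare column: a seek on OrderYear is possible.
SELECT OrderId
FROM dbo.Orders
WHERE OrderYear = 2004;

-- Leading column present, middle column skipped: the seek uses OrderYear only,
-- and Status is checked afterwards as a residual filter.
SELECT OrderId
FROM dbo.Orders
WHERE OrderYear = 2004 AND Status = 'Shipped';

-- Leading column absent: this index cannot be used for a seek.
SELECT OrderId
FROM dbo.Orders
WHERE CustomerId = 42;
```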
Page splits
To store data, SQL Server uses pages, which are 8 KB data blocks. How full those pages are when
an index is built is controlled by the fill factor: the higher the fill factor, the fuller each
8 KB page is. A higher fill factor means fewer pages will be required, resulting in less IO/CPU/RAM usage. At
this point, you might want to set all your indexes to 100% fill factor; however, here is the
gotcha: Once the pages fill up and a value comes in that fits within a filled-up index range, then
SQL Server will make room in an index by doing a "page split." In essence, SQL Server takes the
full page and splits it into two separate pages, which have substantially more room at that point.
You can account for this issue by setting a fill-factor of 70% or so. This allows 30% free space
for incoming values. The problem with this approach is that you continually have to "re-index" the
index so that it maintains a free space percentage of 30%.
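A minimal sketch of setting the fill factor in T-SQL is shown below. It assumes a hypothetical table dbo.Orders with a CustomerId column already exists; the index name is made up as well. The value 70 simply mirrors the figure used in the text.

```sql
-- Assumes a hypothetical table dbo.Orders (with a CustomerId column) already exists.

-- Build the index with pages about 70% full, leaving about 30% free for new values.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    WITH (FILLFACTOR = 70);

-- The fill factor is only applied when an index is created or rebuilt,
-- so the free space has to be restored by rebuilding periodically.
ALTER INDEX IX_Orders_CustomerId
    ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 70);
```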
Clustered index maintenance
Clustered indexes that are static or "ever-increasing" should have a fill factor of 100%. Since
the values are always increasing, pages will just be added to the end of the index and virtually no
fragmentation will occur. For a more detailed explanation, see part 1 of this series, SQL Server
clustered index design for performance. This index category does not need to be re-indexed
because it doesn't fragment.
Clustered indexes that are neither static nor "ever-increasing" will experience fragmentation
and page splits as the data rows move around within the data pages. The indexes in this category
have to be re-indexed in order to keep fragmentation low and allow queries to efficiently use the
index.
When you re-index these clustered indexes, you have to
decide what the fill factor should be. Normally this is 70% to 80%, giving you 20% to 30% empty
space for new records coming into the page. The optimal settings for your environment will depend
on how often records shift around, how many records are inserted and how often re-indexing occurs.
The goal is to set a fill factor low enough so that by the time you reach your next maintenance
cycle, the pages are around 95% full, but not yet splitting, which happens when they hit the 100%
limit.
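To judge whether a chosen fill factor lands near that 95% target by the next maintenance cycle, page fullness and fragmentation can be measured. The query below is a sketch that assumes a hypothetical table dbo.Orders in the current database; it uses the sys.dm_db_index_physical_stats function available in SQL Server 2005 and later.

```sql
-- Assumes a hypothetical table dbo.Orders exists in the current database.
SELECT
    i.name                             AS index_name,
    ps.index_type_desc,
    ps.partition_number,
    ps.avg_fragmentation_in_percent,   -- logical fragmentation of the index
    ps.avg_page_space_used_in_percent, -- how full the pages are (NULL in LIMITED mode)
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'SAMPLED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

If avg_page_space_used_in_percent is already close to 100 well before the next maintenance window, the fill factor is probably too high for that index; if it stays far below 95, it is probably lower than it needs to be.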
Non-clustered index maintenance
Non-clustered indexes will always have data shifting around the pages. It's not quite as big an
issue as it is with clustered indexes -- the actual row data shifts with clustered indexes,
whereas only row pointers shift with non-clustered indexes. That said, the same rules apply to
non-clustered indexes as far as fill factors go. Again, the goal is to set a fill factor low enough
so that by the time you reach your next maintenance cycle, the pages are only around 95% full.
Non-clustered indexes will always fragment, and to avoid this you must constantly monitor and
maintain them.
Partitioned table index considerations
Partitioned tables allow data to be segregated into different partitions, depending on the data
in a column. Many tables are partitioned based on date ranges. Let's say your order table is
partitioned into years. Assuming the clustered index is aligned (see part 1 of this series), then you could re-index the non-clustered indexes
for, say, year 2000 at 100% fill factor, since that data, technically, won't be shifting around. In
this scenario, the year 2008 partition may have a fill factor of 70% on non-clustered indexes to
allow for data shifts, but the year 2000 will not have any shifts and can be re-indexed at 100%
fill factor so you optimize index seeks.
The same concept would apply to clustered indexes that are neither static nor ever-increasing.
Clustered indexes with shifting data might be set to 70% fill factor for the year 2008 partition
and 100% fill factor for the year 2000.
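Re-indexing a single partition can be sketched as below. The partition function pfOrderYear, the table dbo.Orders and the index name are all hypothetical, and the partition number 1 is an assumption standing in for whatever the first query returns. One caveat worth verifying on your version: in the documented ALTER INDEX syntax, FILLFACTOR is an option of a whole-index rebuild and is not listed among the options accepted together with PARTITION = n, so the per-partition fill factors described above may not be settable this directly.

```sql
-- Assumes a hypothetical partition function pfOrderYear (on a datetime column)
-- and a hypothetical partitioned table dbo.Orders with an aligned index.

-- Find the partition number that holds rows for a date in the year 2000.
SELECT $PARTITION.pfOrderYear('20000601') AS partition_number;

-- Rebuild only that partition (1 is assumed to be the number returned above).
-- No FILLFACTOR is given here: see the caveat in the text.
ALTER INDEX IX_Orders_CustomerId
    ON dbo.Orders
    REBUILD PARTITION = 1;
```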
SQL Server statistics
Statistics are maintained on columns and indexes and they help SQL Server determine how "unique"
(selective) some value may be -- i.e., if statistics say a value will match approximately 80% of
the rows, SQL Server will do a table scan instead of an index seek. If statistics say a value will
probably match around 10% of the rows, then the query optimizer will opt for a seek to minimize
database impact.
SQL Server statistics can be maintained automatically or you can update them manually. Since
re-indexing changes the statistics results, I recommend that after re-indexing, you manually run
sp_updatestats or the T-SQL UPDATE STATISTICS command. The statistics histogram is only built on
the first column of any compound index; the other columns contribute only overall density
figures, so the "uniqueness" of individual values in those columns cannot be determined.
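The statistics commands mentioned above look roughly like this in T-SQL. The table dbo.Orders and the index IX_Orders_Year_Customer_Status are hypothetical names used for illustration.

```sql
-- Assumes a hypothetical table dbo.Orders with an index
-- named IX_Orders_Year_Customer_Status.

-- See when each statistics object on the table was last updated.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');

-- Refresh all statistics on one table by reading every row.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh statistics across the whole current database.
EXEC sp_updatestats;

-- Inspect the header, density vector and histogram for one index.
-- The histogram covers only the first index column (OrderYear here).
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_Year_Customer_Status);
```

FULLSCAN gives the most accurate picture but reads the entire table, so on large tables a sampled update may be the more practical choice.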
Summary
Index maintenance is critical to ensure that queries continue to benefit from index use and to
reduce IO/RAM/CPU, which reduces blocking as well.
Run your queries with the option "show execution plan" turned on. If the query is not using your
index, then check the following:
- Run dbcc showcontig ('tablename') to see if the table is fragmented.
- Check your "where clause" to see if it references the first column in the index.
- Ensure that your "where clause" does not apply a function to the first column of the index.
- Update the statistics just in case they are out of date. If the table is fragmented, then run
this step after re-indexing.
- Make sure the criteria you are using are selective enough that SQL Server will see a benefit in
using the index to search the data.
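The first steps of this checklist can be run from a query window roughly as follows. This is a sketch against a hypothetical table dbo.Orders with an OrderYear column. GO is the batch separator used by tools such as Management Studio and sqlcmd, and SET SHOWPLAN_TEXT has to be the only statement in its batch.

```sql
-- Assumes a hypothetical table dbo.Orders with an OrderYear column.

-- Show the estimated plan as text instead of executing the query.
SET SHOWPLAN_TEXT ON;
GO
SELECT OrderId
FROM dbo.Orders
WHERE OrderYear = 2004;
GO
SET SHOWPLAN_TEXT OFF;
GO

-- Report fragmentation for the table and all of its indexes.
DBCC SHOWCONTIG ('dbo.Orders') WITH ALL_INDEXES;
GO
```

In the plan output, look for an Index Seek on the index you expected rather than a Table Scan or Clustered Index Scan. Note that DBCC SHOWCONTIG is deprecated in later versions in favor of sys.dm_db_index_physical_stats.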
SQL Server clustered and non-clustered index design
Part 1: SQL Server clustered index design for performance
Part 2: Designing SQL Server non-clustered indexes
Part 3: How to maintain SQL Server indexes
ABOUT THE AUTHOR
Matthew Schroeder is a senior software engineer who works on SQL Server database
systems ranging in size from 2 GB to 3+ TB, with between 2k and 40+k trans/sec. He specializes in
OLTP/OLAP DBMS systems as well as highly scalable processing systems written in .NET. Matthew is a
Microsoft certified MCITP, Database Developer, has a master's degree in computer science and more
than 12 years of experience in SQL Server/Oracle. He can be reached at [email protected].