Need for reaching data through clustered index with a non-clustered index

Question

I have found that when a table has both clustered and non-clustered indexes (on different columns), the leaf level non-clustered pages, instead of pointing to the data row, point to the node of the clustered index, from where another search is instituted to find the data row. What is the point of this extra level of indirection? If the clustered index has, say, 8 levels, then the indirection from the NCI leaf page to the CI root would have to traverse through these 8 levels to reach the data. Why not store the normal RID in the NCI leaf page so that we can access the data at once without going through the CI index structure?

See Why Doesn’t SQL Server Use the RID for Lookups from a Nonclustered Index into a Clustered Table? - Kim Tripp is the Queen of Indexing for SQL Server. If you don't want to believe me - fine - but read her article and embrace what she says - she really knows what she's talking about!

jcolebrand · Accepted Answer · 2012-09-04 03:48:59Z

up vote 7 down vote accepted

The reason for this is that the "fixed" physical location of your row - the RID (or row identifier) might (and will!) change over time - think page splits that occur when a row needs to be inserted into a table on a page that's already full.

Updating those RIDs in all the nonclustered indices that exist on a given table is quickly becoming both a hassle, and a huge performance killer. You might have 5, 10, 20 nonclustered indices on your table, and SQL Server would have to scan all those indices (basically scanning the whole index, all rows in the index, and that 10, 20 times) and update all RIDs.... that's just not practical - very quickly so.

If you store the value of the clustering index as a "row pointer" instead, that value should typically never change - and most definitely it doesn't need to be updated every time a page is split. Yes, it does involve a second index seek operation - the key lookup - but for simple scenarios, retrieving a single row or a few rows, that's still much more efficient than anything else.

edited Sep 4 '12 at 3:48

jcolebrand♦
3,64411647

answered Sep 1 '12 at 12:37

marc_s
3,02711123

Ok, got that. So basically you are saying that for a NCI, multiple index rows will have to be updated with the pointers to the newly fragmented page if a page split occurs (these may be huge if the number of rows in a page on average is large), whereas for a CI index, if a fragmentation occurs, only the next address of the old page and the newly formed page have to be updated. – Cupidvogel Sep 1 '12 at 12:44

1

If you NCI stored the physical address of the actual data row, then anytime the data row moves (e.g. through a page split), ALL NCI on that table would have to be updated. When a page split occurs, the CI value does NOT CHANGE and no update is needed in the NCI. HUGE improvement! – marc_s Sep 1 '12 at 12:49

Two questions, when a page split occurs, why should all the NIC's need to be updated? Only the RID's in the parent pages that point to the now fragmented pages should be updated, no? And for a CI, if a page split occurs, won't the addresses stored in the doubly linked list for that page need to be updated to point to the newly created page? – Cupidvogel Sep 1 '12 at 12:57

@Cupidvogel: if you have 10 NCI on a table, each one of them will basically point to any of the data pages. So if that page holding the data is split, all NCI will need to be updated. Agreed? And if the page is split, then yes - the index structure of the clustered index will need to be updated - but only few pages - the parent page, the previous and next sibling pages - that's all - no NCI need to be updated. Also: there's only one single CI - there cannot be 10 of those to update! – marc_s Sep 1 '12 at 13:18

Oh, you meant that all NIC's have to be updated. Sure. I though that you meant all index rows within the NIC have to be updated. That's why I said only the relevant index rows have to be updated. But these will of course add up cumulatively. Another question, the article you pointed to, says that if the NCIs hold row-ids, that's space wastage, for it's useless data. But when the NCIs are pointing to to the node, they all are containing a pointer to the node-id anyway, so where is the economy of space? – Cupidvogel Sep 1 '12 at 13:33

show 4 more comments

gbn · Answer 2 · 2012-09-01 12:37:21Z

In simple terms, it involves less processing and movement of NC index entries when data in the clustered index physically moves (row forwarding, page splits, INSERTs etc).

Mostly the clustered index entries only need changed: not the NC index pointers. By using RIDs, you'd need to do a lot more work on the NC indexes.

To minimize this lookup in a query, you'd make the NC indexes "covering". See Which is better: Bookmark/Key Lookup or Index Scan for example

asked	8 months ago
viewed	130 times
active	8 months ago

Need for reaching data through clustered index with a non-clustered index

migrated from stackoverflow.com Sep 1 '12 at 15:53

2 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged sql-server clustered-index nonclustered-index or ask your own question.

Linked

Need for reaching data through clustered index with a non-clustered index

migrated from stackoverflow.com Sep 1 '12 at 15:53

2 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged sql-server clustered-index nonclustered-index or ask your own question.

Linked

Related