I would have thought that databases would know enough about what they encounter often, and would be able to respond to the demands they're placed under, that they could decide to add indexes to highly requested data.
Index design is more of an art than a science. The RDBMS isn't smart enough to take common workloads and design a smart indexing strategy from them; it is up to human intervention (read: a DBA) to analyze the workload and determine the best approach. If there were no penalty for having indexes, you could take a shotgun approach and just add an unlimited number of them. But because data modifications (INSERTs, UPDATEs, and DELETEs) have to maintain every enabled index on a table, each index carries a variable overhead. It takes human design and strategy to create indexes that maximize read performance while incurring the least data-modification overhead.
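To make that trade-off concrete, here is a minimal T-SQL sketch (the table, columns, and index names are made up for illustration): the index makes one read pattern cheaper, but every write against the table must now maintain it as well.

```sql
-- Hypothetical table: every index added to it must be maintained by
-- every INSERT/UPDATE/DELETE against it.
CREATE TABLE dbo.Orders (
    OrderId     int IDENTITY PRIMARY KEY,
    CustomerId  int           NOT NULL,
    OrderDate   datetime2     NOT NULL,
    Total       decimal(10,2) NOT NULL
);

-- Helps queries that filter by customer...
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId) INCLUDE (OrderDate, Total);

-- ...so this read gets cheaper:
SELECT OrderDate, Total
FROM   dbo.Orders
WHERE  CustomerId = 42;

-- ...but this write now has to update the base table AND the index,
-- which is exactly the data-modification overhead described above.
INSERT INTO dbo.Orders (CustomerId, OrderDate, Total)
VALUES (42, SYSDATETIME(), 99.95);
```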
Some databases do already (kind of) create indexes automatically. In SQL Server the execution plan can sometimes include an Index Spool operator, where the RDBMS dynamically creates an indexed copy of the data. However, this spool is not a persistent part of the database kept in sync with the source data, and it cannot be shared between query executions, meaning execution of such plans may end up repeatedly creating and dropping temporary indexes on the same data.

Perhaps in the future RDBMSs will have the capacity to dynamically drop and create persistent indexes according to workload. The process of index optimisation is, in the end, just a cost-benefit analysis. Whilst it is true that humans may have more information about the relative importance of queries in a workload, in principle there is no reason why this information could not be made available to the optimiser. SQL Server already has a Resource Governor that allows sessions to be classified into different workload groups with different resource allocations according to priority.

The missing index DMVs mentioned by Kenneth are not intended to be implemented blindly, as they only consider the benefit to a specific query and make no attempt to take account of the cost of the potential index to other queries. Nor do they consolidate similar missing indexes: e.g. the output of these DMVs may report several overlapping missing indexes on the same table rather than one consolidated recommendation. (A sketch of a query over these DMVs appears below.)

Some current issues with the idea are:
It is probably reasonable to expect the accuracy of costing models to improve over time, but point 2 looks trickier to solve and point 3 is inherently insoluble. Nevertheless, probably the vast majority of installs are not in this idealised situation, with skilled staff who continuously monitor, diagnose, and anticipate (or at least react to) changes in workloads.
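For reference, this is roughly how those missing index DMVs can be inspected in SQL Server. It is only a sketch, and (per the caveats above) its output is not meant to be implemented blindly:

```sql
-- Missing index suggestions recorded by the optimiser since the last restart.
-- The benefit figures only reflect the queries that triggered each suggestion;
-- they say nothing about the cost of the index to other statements.
SELECT  d.statement             AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_total_user_cost,
        s.avg_user_impact       -- estimated % improvement for the triggering queries
FROM    sys.dm_db_missing_index_details     AS d
JOIN    sys.dm_db_missing_index_groups      AS g ON g.index_handle = d.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
ORDER BY s.avg_total_user_cost * s.avg_user_impact * s.user_seeks DESC;
```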
The AutoAdmin project at Microsoft Research has been running since 1996. The project home page lists several intriguing projects; one is particularly relevant to the question here:
The authors state:
The paper introduces an algorithm:
The implementation of the algorithm allows for throttling in response to changes in server load, and it can also abort index creation if, during creation, the workload changes and the expected benefit falls below the point at which it is deemed worthwhile. The authors' conclusion on the topic of online versus traditional physical tuning:
The conclusions here are similar to those in another paper, Autonomous Query-driven Index Tuning.
In fact, there are some databases that do this. For example, Google's BigTable and Amazon's SimpleDB automatically create indices (though neither is an RDBMS). There is also at least one MySQL engine that does this. SQL Server also keeps track of indices it thinks you should create, though it doesn't go so far as actually creating them.

The problem is surprisingly difficult to get right, so it's no wonder that most databases don't create indices automatically (BigTable/SimpleDB get away with it because they don't allow arbitrary joins, which makes things significantly easier). Also, creating indices on the fly is a time-consuming process that requires exclusive access to the entire table - definitely not something you want happening while the table is online.

However, given the number of LAMP web applications out there that were written by amateurs who don't even know what an index is, I still think this feature could be beneficial, at least for MySQL.
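To illustrate that locking cost, a small T-SQL sketch (the table and index names are made up; the ONLINE option is only available in certain SQL Server editions):

```sql
-- Default (offline) build: the table is locked for the duration of the build,
-- blocking writes - and, for a clustered index, reads as well.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);

-- Online build (an Enterprise/Azure feature): concurrent reads and writes are
-- allowed, at the cost of extra overhead and a longer build time.
CREATE NONCLUSTERED INDEX IX_Orders_Total
    ON dbo.Orders (Total)
    WITH (ONLINE = ON);
```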
While there are some extensive answers already, they seem to skirt around the real answer: indexes aren't always desirable. To use the car analogy mentioned in the comments, you'd be better off asking: why aren't all cars fitted with extreme sports packages? Partly it's expense, but it's also down to the fact that a lot of people don't need or want low-profile tires and rock-hard suspension; it's unnecessarily uncomfortable.

So maybe you have 1,000 reads for every insert - why not have an auto-created index? If the table is wide and the queries are varied, why not have several? Maybe the commit is time-critical and the reads aren't; in those circumstances it might be unacceptable to slow down your insert. Maybe you're working with limited disk space and you can't afford to have additional indexes eating into the space you've got.

The point is, indexes aren't created automatically because they aren't the answer to everything. Designing indexes isn't simply a case of saying "hey, this will speed up my reads"; there are other factors to consider.
They can analyse past queries and suggest or create indexes; however, this doesn't work optimally, because indexes strike a balance (speeding up what you want optimised, at a cost) and the server cannot know your intentions.
UNIQUE constraints. – dan04 Jun 4 at 16:20