Why does Python use hash table to implement dict, but not Red-Black Tree?
What is the key? Performance?
Why does Python use hash table to implement dict, but not Red-Black Tree? What is the key? Performance? |
|||||||||||||
closed as too broad by Martijn Pieters, MichaelT, gnat, World Engineer♦ Apr 8 at 0:10There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs.If this question can be reworded to fit the rules in the help center, please edit the question. |
|||||||||||||
|
This is a general, non-Python-specific answer. Algorithmic complexity comparison
The problem with hash tables is that hashes can collide. There are various mechanisms to resolve collisions, e.g. open addressing or separate chaining. The absolute worst case is that all keys have the same hash code, in which case a hash table will degrade into a linked list. In all other cases, a hash table is a great data structure that's easy to implement and delivers good performance. A downside is that implementations that can quickly grow the table and redistribute their entries will likely waste nearly as much memory as is actually being used. RB-Trees are self-balancing and do not change their algorithmic complexity in a worst case. However, they are more difficult to implement. Their average complexities are also worse than that of a hash table. Restrictions on keysAll keys in a hash table must be hashable and comparable for equality among each other. This is especially easy for strings or integers, but is also fairly straightforward to extend to user-defined types. In some languages like Java these properties are guaranteed by definition. Keys in an RB-Tree must have a total order: each key must be comparable with any other key, and the two keys must either compare smaller, greater, or equal. This sorting equality must be equivalent to actual equality. This is straightforward for integers and other numbers, also fairly easy for strings (the order need only be consistent and not externally observable, so the order does not need to consider locales), but difficult for other types that have no inherent order. It is absolutely impossible to have keys of different types unless some comparison between them is possible. One might think that a cheap solution for RB-Tree keys would be to first test for equality, then compare identity (i.e. compare pointers). However, this ordering would not be transitive: If Other Aspects
|
|||||||||
|
There are a whole range of reasons that might be true, but the key ones are likely to be:
Easier to write/maintain, and a performance winner in typical use cases? Sign me up, please! |
|||
|