I've been doing a bit of research on the subject. I know unordered_sets are hash tables, where the key and value are one and the same. What I'd like to know is how the compiler figures out where in the table each object belongs.
1 Answer
An unordered_set
is specified quite a bit more "tightly" than most people seem to realize, so it's difficult (if possible at all) to implement it as anything except a hash table that resolves hash collisions by chaining.
In particular, the standard requires that the hash table consist of buckets, that each bucket contain items that hashed to the same value, that the size of a bucket be obtainable with constant complexity, and that traversing the items in a bucket be possible with linear complexity.
Those requirements are trivial to meet if you use collision chaining--and difficult or impossible to meet otherwise.
As such, the basic data structure would typically look something like this (ignoring some extra template parameters we don't care about right now):
template <class T>
class unordered_set {
std::vector<std::vector<T> > data;
// ...
The insert
member might look vaguely like this:
pair<iterator, bool>
insert(T t) {
auto raw_pos = hash(t); // hash the input
auto pos = raw_pos % data.size(); // reduce the range to that of the table
auto &bucket = data[pos];
// try to find existing item:
auto existing = std::find(bucket.begin(), bucket.end(), t);
// if there was no existing item, add the new one
if (existing == bucket.end()) {
bucket.push_back(t);
return {bucket.back(), true};
}
// if there was an existing item, return iterator and false to indicate we didn't add
return {existing, false};
}
Note that this definitely isn't an attempt at anything similar to production code--just a general sketch of basic logic. And when I say "basic", I mean it--a real implementation needs considerably more logic, such as to re-size the table in case the new addition increases the load factor beyond the specified limit. In fact, some of the code isn't just simplified, it's technically wrong as it is right now (but fixing it would make the code more complex without really adding much).
-
This is a very nice explanation. My only remark is that the iterator you return doesn't work for traversing the entire set but rather a single bucket. And
bucket.back()
gives you a reference and no iterator at all. I don't know how to fix that without making the answer needlessly complicated, though. Maybe just say it.5gon12eder– 5gon12eder2016-01-09 04:20:51 +00:00Commented Jan 9, 2016 at 4:20 -
Yeah, probably worth mentioning that it's technically wrong.Jerry Coffin– Jerry Coffin2016-01-09 05:02:10 +00:00Commented Jan 9, 2016 at 5:02
unordered_set
source (really, this) too