How are objects stored in unordered_sets?

Question

I've been doing a bit of research on the subject. I know unordered_sets are hash tables, where the key and value are one and the same. What I'd like to know is how the compiler figures out where in the table each object belongs.

By hashing the key/value. I believe the precise mechanism of mapping hash to bucket is implementation-dependent but I haven't looked into it. What are you trying to accomplish? — Lightness Races in Orbit
– Lightness Races in Orbit, Commented Jan 8, 2016 at 20:09
It sounds like you're asking how hash tables work in general, rather than unordered_sets specifically. Is that the case? Also, "the key and the value are one and the same" doesn't sound right; what do you mean by that? — Ixrec
– Ixrec, Commented Jan 8, 2016 at 20:12
@moonman239 that sort of question may be best suited to the relevant GCC mailing list then - could try to decode the libstdc++ unordered_set source (really, this) too — Lightness Races in Orbit
– Lightness Races in Orbit, Commented Jan 8, 2016 at 20:25
@LightnessRacesinOrbit I've changed my mind - I actually use Clang in my day-to-day work. — moonman239
– moonman239, Commented Jan 8, 2016 at 20:30

Jerry Coffin · Accepted Answer · 2016-01-09 04:57:42Z

An unordered_set is specified quite a bit more "tightly" than most people seem to realize, so it's difficult (if possible at all) to implement it as anything except a hash table that resolves hash collisions by chaining.

In particular, the standard requires that the hash table consist of buckets, that each bucket contain items that hashed to the same value, that the size of a bucket be obtainable with constant complexity, and that traversing the items in a bucket be possible with linear complexity.

Those requirements are trivial to meet if you use collision chaining--and difficult or impossible to meet otherwise.

As such, the basic data structure would typically look something like this (ignoring some extra template parameters we don't care about right now):

template <class T>
class unordered_set {
    std::vector<std::vector<T> > data;

    // ...

The insert member might look vaguely like this:

pair<iterator, bool>
insert(T t) { 
    auto raw_pos = hash(t); // hash the input
    auto pos = raw_pos % data.size(); // reduce the range to that of the table

    auto &bucket = data[pos];

    // try to find existing item:
    auto existing = std::find(bucket.begin(), bucket.end(), t);

    // if there was no existing item, add the new one
    if (existing == bucket.end()) {
        bucket.push_back(t);
        return {bucket.back(), true};
    }
    // if there was an existing item, return iterator and false to indicate we didn't add
    return {existing, false};
}

Note that this definitely isn't an attempt at anything similar to production code--just a general sketch of basic logic. And when I say "basic", I mean it--a real implementation needs considerably more logic, such as to re-size the table in case the new addition increases the load factor beyond the specified limit. In fact, some of the code isn't just simplified, it's technically wrong as it is right now (but fixing it would make the code more complex without really adding much).

This is a very nice explanation. My only remark is that the iterator you return doesn't work for traversing the entire set but rather a single bucket. And bucket.back() gives you a reference and no iterator at all. I don't know how to fix that without making the answer needlessly complicated, though. Maybe just say it. — 5gon12eder
– 5gon12eder, Commented Jan 9, 2016 at 4:20
Yeah, probably worth mentioning that it's technically wrong. — Jerry Coffin
– Jerry Coffin, Commented Jan 9, 2016 at 5:02

Stack Exchange Network

How are objects stored in unordered_sets?

1 Answer 1

Your Answer

Hot Network Questions

How are objects stored in unordered_sets?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions