Background
Hashing is a very important concept in computer programming and a very popular concept for interview questions. Two important data structures related to hashing are HashMap and HashSet which is exactly what we are going to look at in this post.
Overview
If you are from a computer science background you would know HashMap is a data structure which stores key value pairs where as a HashSet stores unique data. In HashMap we have kind of buckets and each data added to a HashMap falls into one of the buckets depending on the hash value of it. Also you must have heard that adding and retrieving objects in HashMap happen in time complexity O(1).
But still there are open end question like -
- What happens when two objects added to HashMap have same hash (code) value ? - a situation typically known as collision.
- If above is handled how get and put work in hashmap?.... and so on.
Understanding how HashMap works
You can visualize HashMap as follows-
So you have an Array and each array position is essentially a Linked List. As you know you don't have to specify size of the HashMap. It increases dynamically like ArrayList. The main data structure is essentially an array.
/** * The table, resized as necessary. Length MUST Always be a power of two. */ transient Entry[] table;
When you create a HashMap you either choose to provide initial capacity or when you don't a default value is used.
/** * The default initial capacity - MUST be a power of two. */ static final int DEFAULT_INITIAL_CAPACITY = 16;
When table is created it is created with either this initial default capacity or the capacity you provide in the constructor.
Next important thing is when should out table we resized to meet the dynamically changing data size of HashMap. Answer depends on a threshold that is determined by load factor.
/** * The load factor used when none specified in constructor. */ static final float DEFAULT_LOAD_FACTOR = 0.75f;
HashMap provides a constructor to initialize load factor as well. So when your data size equals
- threshold = table capacity * load factor
then the HashMap table is resized. Constructors are as follows -
/** * Constructs an empty <tt>HashMap</tt> with the specified initial * capacity and load factor. * * @param initialCapacity the initial capacity * @param loadFactor the load factor * @throws IllegalArgumentException if the initial capacity is negative * or the load factor is nonpositive */ public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); // Find a power of 2 >= initialCapacity int capacity = 1; while (capacity < initialCapacity) capacity <<= 1; this.loadFactor = loadFactor; threshold = (int)(capacity * loadFactor); table = new Entry[capacity]; init(); } /** * Constructs an empty <tt>HashMap</tt> with the specified initial * capacity and the default load factor (0.75). * * @param initialCapacity the initial capacity. * @throws IllegalArgumentException if the initial capacity is negative. */ public HashMap(int initialCapacity) { this(initialCapacity, DEFAULT_LOAD_FACTOR); } /** * Constructs an empty <tt>HashMap</tt> with the default initial capacity * (16) and the default load factor (0.75). */ public HashMap() { this.loadFactor = DEFAULT_LOAD_FACTOR; threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR); table = new Entry[DEFAULT_INITIAL_CAPACITY]; init(); }
How is data with keys generating same hash code stored in HashMap?
As mentioned earlier each index has reference to object of type Entry which is a Linked List. It has a next pointer. So if an entry is added to hash map it's hash code is computed which determines the index at which it should be put. If that index has an entry then new entry is added to the start of the linked list and existing linked list is appended to next of it.
public V put(K key, V value) { if (key == null) return putForNullKey(value); int hash = hash(key.hashCode()); int i = indexFor(hash, table.length); for (Entry<K,V> e = table[i]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(hash, key, value, i); return null; }
void addEntry(int hash, K key, V value, int bucketIndex) { Entry<K,V> e = table[bucketIndex]; table[bucketIndex] = new Entry<K,V>(hash, key, value, e); if (size++ >= threshold) resize(2 * table.length); }
Also note how null is handled. Yes null is an acceptable key in HashMap.
So how is data retrieved if two keys generate same has code?
Same hash code will make searched for both keys data land on same index in the table. From there the each Entry object is iterated over and it's key compared with the search key. On successful match corresponding value is returned.
public V get(Object key) { if (key == null) return getForNullKey(); int hash = hash(key.hashCode()); for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } return null; }
Lastly lets see HashSet.
Understanding HashSet
Well if you are still guessing the data structure of HashSet then following would be a surprise -
Yup HashSet stores data as keys of HashMap with dummy value. Constructor is equivalent to that of HashMap -
Add and Remove methods are as follows -
private transient HashMap<E,Object> map; // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object();
Yup HashSet stores data as keys of HashMap with dummy value. Constructor is equivalent to that of HashMap -
/** * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). */ public HashSet() { map = new HashMap<E,Object>(); } public HashSet(int initialCapacity, float loadFactor) { map = new HashMap<E,Object>(initialCapacity, loadFactor); } public HashSet(int initialCapacity) { map = new HashMap<E,Object>(initialCapacity); }
Add and Remove methods are as follows -
public boolean add(E e) { return map.put(e, PRESENT)==null; } public boolean remove(Object o) { return map.remove(o)==PRESENT; }
Related Links
- Interview Question #12 Differences between HashMap and Hashtable?
- Hashtable example in Java.
- How to iterate over a Map in Java?
- Simple Map example in Java.Simple Map example in Java.
- How to sort a Map on the values in Java?
- How HashMap works in Java (Java Revisited)
- Interview Question #10 How ArrayList works internally in java?Simple Map example in Java.