I am going to assume that the reason you're supposed to be doing this is to better understand the costs and benefits of hash tables/dictionaries generally. Therefore I think that the key issue is that instead of:
[None] * size
your storage should be:
[[] for _ in range(size)]
(see here for why you can't use the same syntax. TL;DR: mutability.)
Note also that Python's dictionaries don't hash their keys; they ask the keys for their hash. To reflect that, here is a simple key-value pair class:
class KeyValuePair:
"""An container to store things in a HashTable."""
def __init__(self, key, value=None):
self.key = key
self.value = value
self._hash()
def __repr__(self):
return "{0.key!r}: {0.value!r}".format(self)
def __eq__(self, other):
return self.key == other.key
def _hash(self):
"""Calculate the hash of the key.
Notes:
Currently only works for strings.
Raises:
TypeError: If the key isn't supported.
"""
if isinstance(self.key, str):
self.hash = sum(map(ord, self.key))
else:
raise TypeError("Can't hash {!r}.".format(self.key))
(Note that this isn't what your spec asks for, but you can adapt these ideas into your own code.)
And here is a hash table, with just set
and get
implemented:
class HashTable:
"""A very rough approximation of a Python dictionary."""
def __init__(self, size=10):
self.buckets = [[] for _ in range(size)]
self.size = size
def __repr__(self):
return "\n".join(["{}: {!r}".format(index, bucket) for
index, bucket in enumerate(self.buckets)])
def set(self, key, value=None):
"""Add a new key-value pair to the hash table.
Notes:
The key must be hashable by KeyValuePair._hash.
Args:
key: The key to use.
value (optional): The value to store (defaults to None).
"""
keyval = KeyValuePair(key, value)
bucket = self.buckets[keyval.hash % self.size]
for kvp in bucket:
if kvp == keyval:
bucket.remove(kvp)
break
bucket.append(keyval)
def get(self, key):
"""Retrive the value for the specified key.
Notes:
The key must be hashable by KeyValuePair._hash.
Args:
key: The key to use.
Raises:
KeyError: If the key can't be found.
"""
keyval = KeyValuePair(key)
bucket = self.buckets[keyval.hash % self.size]
for kvp in bucket:
if kvp == keyval:
return kvp.value
raise KeyError("Key {!r} not found.".format(key))
Now in use:
>>> table = HashTable(8)
>>> for python in ("John Cleese", "Graham Chapman", "Terry Gilliam",
"Eric Idle", "Terry Jones", "Michael Palin"):
first, last = python.split()
table.set(last, first)
>>> table
0: ['Chapman': 'Graham']
1: ['Cleese': 'John']
2: []
3: []
4: ['Palin': 'Michael']
5: []
6: ['Idle': 'Eric']
7: ['Gilliam': 'Terry', 'Jones': 'Terry']
>>> table.get("Gilliam")
'Terry'
>>> table.set(1, "foo")
Traceback (most recent call last):
File "<pyshell#21>", line 1, in <module>
table.set(1, "foo")
File "<pyshell#5>", line 12, in set
keyval = KeyValuePair(key, value)
File "<pyshell#15>", line 6, in __init__
self._hash()
File "<pyshell#15>", line 27, in _hash
raise TypeError("Can't hash {!r}.".format(self.key))
TypeError: Can't hash 1.
>>> table.get("Smith")
Traceback (most recent call last):
File "<pyshell#23>", line 1, in <module>
table.get("Smith")
File "<pyshell#5>", line 26, in get
raise KeyError("Key {!r} not found.".format(key))
KeyError: "Key 'Smith' not found."
So what's important here?
- Not every bucket has anything in - we're reserving more space than we strictly need (space vs. speed is a classic computing trade-off).
- Using a hash to determine the bucket means we can quickly (hopefully
O(1)
) find out where we should be looking, rather than searching through the whole table. This is why:
- hash speed is important - if it takes ages to calculate the hash, we lose the advantage; and
- hash stability is important - if you get a different hash for the same key, you can't find it again;
- Some buckets have more than one thing in, in which case we fall back to a linear search
O(len(bucket))
. This is why hash distribution is important - if everything ended up in the same bucket (e.g. def _hash(self): self.hash = 42
, this is known as hash collision) we're back to a O(n)
search through everything.
Note the following general points:
- Lots of docstrings to explain what's going on (your code has none);
- Compliance with the style guide (your code lacks e.g. whitespace after commas, and (despite your specification!) it should really be e.g.
_get_hash
); and
- Raising errors rather than returning
None
when something goes wrong - None
can be a valid return value!
If you're interested in learning more about dictionaries, see The Mighty Dictionary.