Python Hash Table Implementation

Question

I'm attempting to implement a basic hash table in Python using only lists. Any tips would be appreciated (including better hash functions). I intend this to handle collisions with separate chaining.

Are there any features I didn't implement that are standard in hash tables?
Are there any things I handled incorrectly or could have implemented in a better way?

My implementation:

class HashTable(object):

    table = [None] * 256

    def get_value(self, key):
        total = 0
        for i in range(len(key)):
            total += ord(key[i]) * (7**i)
        return (len(key) * total) % 256

    def insert(self, key):
        val = self.get_value(key)
        if self.table[val] == None:
            self.table[val] = key
        else:
            if type(self.table[val]) == list:
                self.table[val].append(key)
            else:
                self.table[val] = [self.table[val], key]

    def delete(self, key):
        val = self.get_value(key)
        if self.table[val] != None:
            if type(self.table[val]) == list:
                i = self.table[val].index(key)
                self.table[val][i] = None
            else:
                self.table[val] = None
        else:
            KeyError()

    def lookup(self, key):
        found = False
        val = self.get_value(key)
        if type(self.table[val]) == list:
            found = key in self.table[val]
        else:
            found = self.table[val] == key
        return found

@Quill This was written in Python 2.7, although I'm not sure I used any features that aren't in Python3 either. — drexx, Jan 28 at 5:48
@drexx Generally people will tag both Python-3.x and Python-2.7 if they're looking to make code suitable for both cases, it's worth also saying that's a goal in the question so that people know to examine that. — SuperBiasedMan, Feb 1 at 9:57

slallum · Accepted Answer · 2016-01-28 08:03:34Z

I'm pretty sure you implemented the necessary features in a standard hash table. If you want the hash table to populate only unique values then you need to change your insert method by looking up the value before inserting it.

Here are some things you did incorrectly or can do better in my opinion:

table = [None] * 256

table is static right now, which means that any instance of the class will have the same table variable. You need to initiate it in __init__.

def get_value(self, key):

get_value is a method that should not be called by the user of the class. I recommend making is private by changing its name to _get_value.

def insert(self, key):
    val = self.get_value(key)
    if self.table[val] == None:
        self.table[val] = key
    else:
        if type(self.table[val]) == list:
            self.table[val].append(key)
        else:
            self.table[val] = [self.table[val], key]

Why starting with a single value and only after that changing to a list? I recommend starting with a list right from the start. As said by python's module this:

Special cases aren't special enough to break the rules.

That way, you can start with a table of empty lists right from the start, this will simplify your insert and lookup methods.

def delete(self, key):
    val = self.get_value(key)
    if self.table[val] != None:
        if type(self.table[val]) == list:
            i = self.table[val].index(key)
            self.table[val][i] = None

Making the value None can be dangerous - what if the user inserts lot's of values and then removes all of them? the lookup method will then take much more time than required. Although it takes more time, I still think list.remove is the right thing to do here.

    ...
    else:
        KeyError()

You need to raise the KeyError. Also - it's better to put in the error the right message. something like: raise KeyError("key {key} can't be found.".format(key=key)

SuperBiasedMan · Answer 2 · 2016-02-01 10:03:52Z

I agree with slallum that you ought to just use lists from the start, but I would also note that this is not how you test for a list:

if type(self.table[val]) == list:

In general type(var) == type is discouraged, because it wont accept inheritance. The most obvious example is that a DefaultDict is a special type of dictionary. It has some extra functionality but mostly is the same as a regular dictionary. Yet if you tried this test, you would be told that they're different types. The way to compare type directly is to use the function isinstance. Which will test if a variable is an instance of a type, or any type that inherits from it:

if isinstance(self.table[val], list):

But generally, it's more Pythonic to test something based on whether or not it can do what you need. For instance, you might want to attempt to append to the value, and if it cannot take an append then it must not be a list and you should instead create one. Like so:

try:
    self.table[val].append(key)
except AttributeError:
    self.table[val] = [self.table[val], key]

This will attempt to append to the list, but if it cannot that means that it's not a list and will instantiate one there.

oliverpool · Answer 3 · 2016-01-28 08:05:28Z

A first step could be to add docstrings to your methods: it helps your readers understand the goal of your methods and it help yourself know what those methods should and should not do.

Then you should check the user input: I suppose, that the key variable is a string. You should enforce it and raise a TypeError when it is not the case (for example a list of strings should not be accepted).

I would rename the get_value method to get_hash, to show that it computes a hash between 0 and 255.

You currently store the actual values either directly or in a list. I think you should stick to one data structure: a list, or even better a set (it has add and remove method just as you need).

You could also use the same conventions as the set for your public method naming: add, remove (and I would suggest has instead of lookup).

You could also implement some of the python magic methods like __contains__(self, item) and __delitem__(self, key)

set is implemented by hash table, so it would be "cheating" using set instead of regular list. — slallum, Jan 28 at 8:12
Fair enough. But one could also use a list as a hash table: elt in list, list.remove(elt) (with some methods to prevent duplication of the elements). I think we should add the [reinventing-the-wheel] tag ;-) — oliverpool, Jan 28 at 8:17

ppperry · Answer 4 · 2016-06-02 00:00:03Z

Python already has a built-in hash function, so you can simplify your get_value method to this:

def get_value(self, key):
     return hash(key)%256

As a bonus, your hash table now works with more than just strings.

The get_value method does not do anything with the HashTable instance, so it should be moved out of the class:

def get_value(key):
    return hash(key)%256
class HashTable(object):
     ...

The built in dict type in Python will automatically get bigger as you add more stuff, unlike your HashTable class.

asked	10 months ago
viewed	2625 times
active	6 months ago

current community

your communities

more stack exchange communities

Python Hash Table Implementation

4 Answers 4

Your Answer

Not the answer you're looking for? Browse other questions tagged python python-3.x python-2.7 reinventing-the-wheel hash-table or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Python Hash Table Implementation

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python python-3.x python-2.7 reinventing-the-wheel hash-table or ask your own question.

Linked

Related

Hot Network Questions