Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I'm attempting to implement a basic hash table in Python using only lists. Any tips would be appreciated (including better hash functions). I intend this to handle collisions with separate chaining.

  1. Are there any features I didn't implement that are standard in hash tables?

  2. Are there any things I handled incorrectly or could have implemented in a better way?

My implementation:

class HashTable(object):

    table = [None] * 256

    def get_value(self, key):
        total = 0
        for i in range(len(key)):
            total += ord(key[i]) * (7**i)
        return (len(key) * total) % 256

    def insert(self, key):
        val = self.get_value(key)
        if self.table[val] == None:
            self.table[val] = key
        else:
            if type(self.table[val]) == list:
                self.table[val].append(key)
            else:
                self.table[val] = [self.table[val], key]

    def delete(self, key):
        val = self.get_value(key)
        if self.table[val] != None:
            if type(self.table[val]) == list:
                i = self.table[val].index(key)
                self.table[val][i] = None
            else:
                self.table[val] = None
        else:
            KeyError()

    def lookup(self, key):
        found = False
        val = self.get_value(key)
        if type(self.table[val]) == list:
            found = key in self.table[val]
        else:
            found = self.table[val] == key
        return found
share|improve this question
    
Is this Python 2 or 3? – Quill Jan 28 at 5:47
    
@Quill This was written in Python 2.7, although I'm not sure I used any features that aren't in Python3 either. – drexx Jan 28 at 5:48
    
Fair enough, then – Quill Jan 28 at 5:49
    
@drexx Generally people will tag both Python-3.x and Python-2.7 if they're looking to make code suitable for both cases, it's worth also saying that's a goal in the question so that people know to examine that. – SuperBiasedMan Feb 1 at 9:57
up vote 4 down vote accepted

I'm pretty sure you implemented the necessary features in a standard hash table. If you want the hash table to populate only unique values then you need to change your insert method by looking up the value before inserting it.

Here are some things you did incorrectly or can do better in my opinion:

table = [None] * 256

table is static right now, which means that any instance of the class will have the same table variable. You need to initiate it in __init__.

def get_value(self, key):

get_value is a method that should not be called by the user of the class. I recommend making is private by changing its name to _get_value.

def insert(self, key):
    val = self.get_value(key)
    if self.table[val] == None:
        self.table[val] = key
    else:
        if type(self.table[val]) == list:
            self.table[val].append(key)
        else:
            self.table[val] = [self.table[val], key]

Why starting with a single value and only after that changing to a list? I recommend starting with a list right from the start. As said by python's module this:

Special cases aren't special enough to break the rules.

That way, you can start with a table of empty lists right from the start, this will simplify your insert and lookup methods.

def delete(self, key):
    val = self.get_value(key)
    if self.table[val] != None:
        if type(self.table[val]) == list:
            i = self.table[val].index(key)
            self.table[val][i] = None

Making the value None can be dangerous - what if the user inserts lot's of values and then removes all of them? the lookup method will then take much more time than required. Although it takes more time, I still think list.remove is the right thing to do here.

    ...
    else:
        KeyError()

You need to raise the KeyError. Also - it's better to put in the error the right message. something like: raise KeyError("key {key} can't be found.".format(key=key)

share|improve this answer

I agree with slallum that you ought to just use lists from the start, but I would also note that this is not how you test for a list:

if type(self.table[val]) == list:

In general type(var) == type is discouraged, because it wont accept inheritance. The most obvious example is that a DefaultDict is a special type of dictionary. It has some extra functionality but mostly is the same as a regular dictionary. Yet if you tried this test, you would be told that they're different types. The way to compare type directly is to use the function isinstance. Which will test if a variable is an instance of a type, or any type that inherits from it:

if isinstance(self.table[val], list):

But generally, it's more Pythonic to test something based on whether or not it can do what you need. For instance, you might want to attempt to append to the value, and if it cannot take an append then it must not be a list and you should instead create one. Like so:

try:
    self.table[val].append(key)
except AttributeError:
    self.table[val] = [self.table[val], key]

This will attempt to append to the list, but if it cannot that means that it's not a list and will instantiate one there.

share|improve this answer

A first step could be to add docstrings to your methods: it helps your readers understand the goal of your methods and it help yourself know what those methods should and should not do.

Then you should check the user input: I suppose, that the key variable is a string. You should enforce it and raise a TypeError when it is not the case (for example a list of strings should not be accepted).

I would rename the get_value method to get_hash, to show that it computes a hash between 0 and 255.

You currently store the actual values either directly or in a list. I think you should stick to one data structure: a list, or even better a set (it has add and remove method just as you need).

You could also use the same conventions as the set for your public method naming: add, remove (and I would suggest has instead of lookup).

You could also implement some of the python magic methods like __contains__(self, item) and __delitem__(self, key)

share|improve this answer
    
set is implemented by hash table, so it would be "cheating" using set instead of regular list. – slallum Jan 28 at 8:12
1  
Fair enough. But one could also use a list as a hash table: elt in list, list.remove(elt) (with some methods to prevent duplication of the elements). I think we should add the [reinventing-the-wheel] tag ;-) – oliverpool Jan 28 at 8:17

Python already has a built-in hash function, so you can simplify your get_value method to this:

def get_value(self, key):
     return hash(key)%256

As a bonus, your hash table now works with more than just strings.

The get_value method does not do anything with the HashTable instance, so it should be moved out of the class:

def get_value(key):
    return hash(key)%256
class HashTable(object):
     ...

The built in dict type in Python will automatically get bigger as you add more stuff, unlike your HashTable class.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.