Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

Background information:

In my project I'm applying Reinforcement Learning (RL) to the Mario domain. For my state representation I chose to use a hashtable with custom objects as keys. My custom objects are immutable and have overwritten the .equals() and the .hashcode() (which were generated by the IntelliJ IDE).

This is the resulting .hashcode(), I've added the possible values in comments as extra information:

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // 2 possible values: 0, 1
    result = 31 * result + (facing ? 1 : 0);     // 2 possible values: 0, 1 
    result = 31 * result + marioMode;            // 3 possible values: 0, 1, 2
    result = 31 * result + (onGround ? 1 : 0);   // 2 possible values: 0, 1 
    result = 31 * result + (canJump ? 1 : 0);    // 2 possible values: 0, 1 
    result = 31 * result + (wallNear ? 1 : 0);   // 2 possible values: 0, 1 
    result = 31 * result + nearestEnemyX;        // 33 possible values: - 16 to 16
    result = 31 * result + nearestEnemyY;        // 33 possible values: - 16 to 16

    return result;
}

The Problem:

The problem here is that the result in the above code can exceed Integer.MAX_VALUE. I've read online this doesn't have to be a problem, but in my case it is. This is partly due to algorithm used which is Q-Learning (an RL method) and depends on the correct Q-values stored inside the hashtable. Basically I cannot have conflicts when retrieving values. When running my experiments I see that the results are not good at all and I'm 95% certain the problem lies with the retrieval of the Q-values from the hashtable. (If needed I can expand on why I'm certain about this, but this requires some extra information on the project which isn't relevant for the question.)

The Question:

Is there a way to avoid the integer overflow, maybe I'm overlooking something here? Or is there another way (perhaps another datastructure) to get reasonably fast the values given my custom-key?

Remark:

After reading some comments I do realise that my choice for using a HashTable wasn't maybe the best one as I want unique keys that do not cause collisions. If I still want to use the HashTable I will probably need a proper encoding.

share|improve this question
1  
If you have such a special use of hashes, why do you use Object's .hashCode() to begin with? Why not use a dedicated hash? You can for instance use Guava's HashFunctions –  fge May 13 '14 at 14:23
    
@fge That's Object's hashCode()? I thought the normal implementation of hashCode() was a native method? –  user3580294 May 13 '14 at 14:24
    
@user3580294 if this class is immutable, then the code above will generate a non-changing hash code. Anyway, that's not the main point in the question. –  Luiggi Mendoza May 13 '14 at 14:25
1  
@user3580294 it looks to me like the OP wants to generate unique keys for each object or something; the problem, of course, is that hashCode() makes no guarantees about that –  fge May 13 '14 at 14:28
1  
Floris: HashMaps are already pretty darn fast, but the Java implementation does have some behind-the-scenes stuff like extra hashing that might slow things down. Also, it probably would have helped to say you needed perfect hashes instead of worrying about the range of numbers returned by hashCode(). If you're using the objects as keys into a HashMap the range of values returned by hashCode() doesn't matter so long as the method obeys the contract for hashCode() –  user3580294 May 13 '14 at 14:37

4 Answers 4

up vote 5 down vote accepted

You need a dedicated Key Field to guarantee uniqueness

.hashCode() isn't designed for what you are using it for

.hashCode() is designed to give good general results in bucketing algorithms, which can tolerate minor collisions. It is not designed to provide a unique key. The default algorithm is a trade off of time and space and minor collisions, it isn't supposed to guarantee uniqueness.

Perfect Hash

What you need to implement is a perfect hash or some other unique key based on the contents of the object. This is possible within the boundries of an int but I wouldn't use .hashCode() for this representation. I would use an explicit key field on the object.

Unique Hashing

One way to use use SHA1 hashing that is built into the standard library which has an extremely low chance of collisions for small data sets. You don't have a huge combinational explosion in the values you posts to SHA1 will work.

You should be able to calculate a way to generate a minimal perfect hash with the limited values that you are showing in your question.

A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers—usually [0..n−1] or [1..n]. A more formal way of expressing this is: Let j and k be elements of some finite set K. F is a minimal perfect hash function iff F(j) =F(k) implies j=k (injectivity) and there exists an integer a such that the range of F is a..a+|K|−1. It has been proved that a general purpose minimal perfect hash scheme requires at least 1.44 bits/key.2 The best currently known minimal perfect hashing schemes use around 2.6 bits/key.[3]

A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, ..., an and for any keys aj and ak, j

A minimal perfect hash function F is monotone if it preserves the lexicographical order of the keys. In this case, the function value is just the position of each key in the sorted ordering of all of the keys. If the keys to be hashed are themselves stored in a sorted array, it is possible to store a small number of additional bits per key in a data structure that can be used to compute hash values quickly.[6]

Solution

Note where it talks about a URL it can be any byte[] representation of any String that you calculate from your object.

I usually override the toString() method to make it generate something unique, and then feed that into the UUID.nameUUIDFromBytes() method.

Type 3 UUID can be just as useful as well UUID.nameUUIDFromBytes()

Version 3 UUIDs use a scheme deriving a UUID via MD5 from a URL, a fully qualified domain name, an object identifier, a distinguished name (DN as used in Lightweight Directory Access Protocol), or on names in unspecified namespaces. Version 3 UUIDs have the form xxxxxxxx-xxxx-3xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B.

To determine the version 3 UUID of a given name, the UUID of the namespace (e.g., 6ba7b810-9dad-11d1-80b4-00c04fd430c8 for a domain) is transformed to a string of bytes corresponding to its hexadecimal digits, concatenated with the input name, hashed with MD5 yielding 128 bits. Six bits are replaced by fixed values, four of these bits indicate the version, 0011 for version 3. Finally, the fixed hash is transformed back into the hexadecimal form with hyphens separating the parts relevant in other UUID versions.

My preferred solution is Type 5 UUID ( SHA version of Type 3)

Version 5 UUIDs use a scheme with SHA-1 hashing; otherwise it is the same idea as in version 3. RFC 4122 states that version 5 is preferred over version 3 name based UUIDs, as MD5's security has been compromised. Note that the 160 bit SHA-1 hash is truncated to 128 bits to make the length work out. An erratum addresses the example in appendix B of RFC 4122.

Key objects should be immutable

That way you can calculate toString(), .hashCode() and generate a unique primary key inside the Constructor and set them once and not calculate them over and over.

Here is a straw man example of an idiomatic immutable object and calculating a unique key based on the contents of the object.

package com.stackoverflow;

import javax.annotation.Nonnull;
import java.util.Date;
import java.util.UUID;

public class Q23633894
{

    public static class Person
    {
        private final String firstName;
        private final String lastName;
        private final Date birthday;
        private final UUID key;
        private final String strRep;

        public Person(@Nonnull final String firstName, @Nonnull final String lastName, @Nonnull final Date birthday)
        {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthday = birthday;
            this.strRep = String.format("%s%s%d", firstName, lastName, birthday.getTime());
            this.key = UUID.nameUUIDFromBytes(this.strRep.getBytes());
        }

        @Nonnull
        public UUID getKey()
        {
            return this.key;
        }

        // Other getter/setters omitted for brevity

        @Override
        @Nonnull
        public String toString()
        {
            return this.strRep;
        }

        @Override
        public boolean equals(final Object o)
        {
            if (this == o) { return true; }
            if (o == null || getClass() != o.getClass()) { return false; }
            final Person person = (Person) o;
            return key.equals(person.key);
        }

        @Override
        public int hashCode()
        {
            return key.hashCode();
        }
    }
}
share|improve this answer
    
There's no such a thing like "perfect hash", because a collision may occur anyway. By the way, you'll really need to hack the things to get the same hashsum –  Dmitry Ginzburg May 13 '14 at 14:30
    
you can create a perfect hash for anything given enough bits to cover the set of possible values –  Jarrod Roberson May 13 '14 at 14:31
    
@JarrodRoberson Are you even sure OP's looking for a perfect hash? He said he's using a hash table, which shouldn't care about the hash values that much unless he's rolling his own –  user3580294 May 13 '14 at 14:33
    
@JarodRoberson that's truth in the mathematical sense, but hashCode() method returns an integer, which is bounded. –  Dmitry Ginzburg May 13 '14 at 14:33
1  
@JarrodRoberson I accepted your answer because of how informative it is, but in the the answer of Victor-Philipp Negoescu and Peter Lawrey also helped me. –  Floris Devriendt May 21 '14 at 22:22

For a unique representation of your object's state, you would need 19 bits in total. Thus, it is possible to represent it by a "perfect hash" integer value (which can have up to 32 bits):

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0); // needs 1 bit (2 possible values)
    result += (facing ? 1 : 0) << 1; // needs 1 bit (2 possible values)
    result += marioMode << 2; // needs 2 bits (3 possible values)
    result += (onGround ? 1 : 0) << 4; // needs 1 bit (2 possible values)
    result += (canJump ? 1 : 0) << 5; // needs 1 bit (2 possible values)
    result += (wallNear ? 1 : 0) << 6; // needs 1 bit (2 possible values)
    result += (nearestEnemyX + 16) << 7; // needs 6 bits (33 possible values)
    result += (nearestEnemyY + 16) << 13; // needs 6 bits (33 possible values)
}
share|improve this answer
    
Might it be even better to calculate the hashCode() once and cache it because the object is immutable? –  user3580294 May 13 '14 at 14:40
1  
Of course, but that was not the question. :-) Ok, it was part of the question. But calculation of such a hash code is somewhat simple. You would need to store 2^19 integers (all possible hash codes) which would need at least 2^19 * 32 bit = 2 MB. –  Victor-Philipp Negoescu May 13 '14 at 14:40
    
Yeah, true. Just thought it might help –  user3580294 May 13 '14 at 14:42
2  
@user3580294 to be honest the whole operation is going to take 10 ns or so on a recent cpu so it is unlikely that this is where the program is going to spend some time. –  assylias May 13 '14 at 14:42
1  
@assylias The math might be fast, true. Depends on how frequently the map is accessed. –  user3580294 May 13 '14 at 14:46

Instead of using 31 as a your magic number, you need to use the number of possibilities (normalised to 0)

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // 2 possible values: 0, 1
    result = 2 * result + (facing ? 1 : 0);      // 2 possible values: 0, 1 
    result = 3 * result + marioMode;             // 3 possible values: 0, 1, 2
    result = 2 * result + (onGround ? 1 : 0);    // 2 possible values: 0, 1 
    result = 2 * result + (canJump ? 1 : 0);     // 2 possible values: 0, 1 
    result = 2 * result + (wallNear ? 1 : 0);    // 2 possible values: 0, 1 
    result = 33 * result + (16 + nearestEnemyX); // 33 possible values: - 16 to 16
    result = 33 * result + (16 + nearestEnemyY); // 33 possible values: - 16 to 16

    return result;
}

This will give you 104544 possible hashCodes() BTW you can reverse this process to get the original values from the code by using a series of / and %

share|improve this answer
1  
Thanks for the top on my nearestEnemyX & nearestEnemyY by adding the 16 to make the numbers positive. This helped me as well, just as the accepted answer. Unfortunately I can only accept one. –  Floris Devriendt May 21 '14 at 22:24

Try Guava's hashCode() method or JDK7's Objects.hash(). It's way better than writing your own. Don't repeat code yourself (and anyone else when you can use out of box solution):

share|improve this answer
    
java.util doesn't have a hashCode() method... And Object#hashCode() only works for reference equality, not object equality –  user3580294 May 13 '14 at 14:31
1  
@user3375746 What you propose won't solve the issue of getting negative integers. –  assylias May 13 '14 at 14:43
2  
@assylias I don't think that the negative integers are really the issue here (negative integers are valid hashCodes AFAIK) but the uniqueness which wouldn't be solved by this answer either. –  Puce May 13 '14 at 15:07

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.