Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

This program takes an arbitrary array of strings and group anagrams into arrays and return them in a array. Any string without any anagrams is still put in an array.

This code works perfectly and I would appreciate any comments on making this more efficient in terms of computation time.

import java.util.Arrays;
import java.util.HashMap;
import java.util.ArrayList;

public class AnagramSort{

    public static void main( String[] args ){
        HashMap< Integer, ArrayList< String >> hm = new HashMap();

        groupAnagrams( args, hm );
        System.out.println( hm );
    }


    public static void groupAnagrams( String[] list, HashMap< Integer, ArrayList< String >> hm ){

        for( int x=0; x<list.length; x++ ){
            if( list[ x ] == null ) continue;

            String curX    = list[ x ];
            int    hashX   = primeHash( curX );

            hm.put( hashX, new ArrayList( Arrays.asList( curX )));

            for( int y=x+1; y<list.length; y++ ){

                String curY    = list[ y ];
                int    hashY   = primeHash( curY );

                if( curY == null || curY.length() != curX.length())  continue;

                if( hashX == hashY ){
                    hm.get( hashX ).add( curY );

                    list[ y ] = null; // if its an anagram null it out to avoid checking again
                }
            }
        }
    }


// Utility Mehthods

    public static int primeHash( String word ){
        int productOfPrimes = 1;
        int prime[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 
            37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101 };

        for( char ch : word.toCharArray() ){
            productOfPrimes *= prime[ (int) ch - (int) 'a' ];
        }
        return productOfPrimes;
    }

}

Sample Input:

[ mother, mothre, dad, add, gift, gender ]

Output:

[ [mother, mothre], [dad,add], [gift], [gender] ]
share|improve this question
up vote 6 down vote accepted

Technical

   groupAnagrams( args );

That should be:

    groupAnagrams( args, hm );

Bug in copy/paste somehow?

1-liner statements, even simple ones like this:

      if( list[ x ] == null ) continue;

should be braced like:

       if( list[ x ] == null ) {
           continue;
       }

This prevents maintenance problems later, and makes revision history easier to follow.

Algorithm

Your algorithm is taking each member of the array, and comparing to each subsequent member, removing anagram matches.

The primeHash() method is interesting, but ultimately it is a red herring of sorts... and it only works for lower-case input words. You have obviously invested some thought in to it, but there's a simpler solution to that problem:

private static final String anagramKey(String word) {
    word = word.toLowerCase();
    char[] chars = word.toCharArray();
    Arrays.sort(chars);
    return new String(chars);
}

Take all letters, sort them, return a String. All anagrams of the same letters will have the same keys.

With that key system, the basic code can be come simpler with:

HashMap<String,List<String>> matchMap = new HashMap<>();
for (String word : args) {
    String key = anagramKey(word);
    if (!matchMap.containsKey(key)) {
        matchMap.put(key, new ArrayList<String>());
    }
    matchMap.get(key).add(word);
}

System.out.println(matchMap);

That reduces your problem to an \$O(n)\$ one.

share|improve this answer
    
Thanks for the answer. I used the code you suggested and it works really well. However, I realised that the primeHash method I used has a better worse case than sorting(merge sort in this case), O(n) So I combined primeHash with your loop construct and had a really good run time. – T. Rex Feb 18 '15 at 15:25

Just a few minor things on top of @rolfl's answer.

Don't use raw types

Using raw types, like in new HashMap() is a bad practice:

HashMap< Integer, ArrayList< String >> hm = new HashMap();

In Java6 you should write this as:

HashMap< Integer, ArrayList< String >> hm = new HashMap<Integer, ArrayList<String>>();

In Java7 and above you can use the diamond operator <> to write simpler:

HashMap< Integer, ArrayList< String >> hm = new HashMap<>();

Use interface types when possible

Whenever appropriate, use interface types instead of implementations. For example given this code:

HashMap< Integer, ArrayList< String >> hm = new HashMap<>();

The hm variable (poorly named, btw), doesn't really need to be a HashMap. Your algorithm will work just fine with a TreeMap too, or WhatEverTheHeckMap too, as long as it's a Map. So use a Map:

Map<Integer, List<String>> hm = new HashMap<>();

Note that I also changed ArrayList to List, for the same reason.

Do this way everywhere, for example in the groupAnagrams method too.

Code organization

Why pass a Map to the void method groupAnagrams? Why not pass only the input parameters and make it return a Collection (or List) of the results?

One big reason to do this is that callers of the groupAnagrams shouldn't need a Map at all. The fact that groupAnagrams uses a Map in its algorithm is just an implementation detail. The caller really needs only a collection (or list) of the results.

Unit testing

To verify that your algorithm actually works, it's good to have unit tests.

@Test
public void test_mother_mothre_dad_add_gift_gender() {
    Map<Integer, List<String>> map = new HashMap<>();
    AnagramSort.groupAnagrams(new String[]{"mother", "mothre", "dad", "add", "gift", "gender"}, map);
    assertEquals("[[gender], [dad, add], [gift], [mother, mothre]]", map.values().toString());
}

Once you have this (and hopefully more) test cases, you can refactor your algorithm like @rolfl suggested, and when your done you can simply re-run the tests with one simple click, and you'll know immediately if the implementation works or not.

Readability

Your coding style is very different from the way an IDE like Eclipse/IntelliJ would auto-format the code, for example instead of:

public static void main( String[] args ){
    HashMap< Integer, ArrayList< String >> hm = new HashMap();

    groupAnagrams( args, hm );
    System.out.println( hm );
}

Like this:

public static void main(String[] args) {
    HashMap<Integer, ArrayList<String>> hm = new HashMap();

    groupAnagrams(args, hm);
    System.out.println(hm);
}

Once you adopt this style, life gets simpler. If somebody gives you code in a different format, you can just use your IDE to reformat it to a familiar style.

Other minor things

You don't need to cast the char values to (int):

productOfPrimes *= prime[(int) ch - (int) 'a'];

This works too and simpler:

productOfPrimes *= prime[ch - 'a'];
share|improve this answer
    
As there is only one output value, returning it instead of modifying an output parameter might seem more intuitive. – Adrian Leonhard Feb 16 '15 at 21:19
    
@AdrianLeonhard That was also my point in the "Code organization" section above – janos Feb 16 '15 at 21:39
1  
That's a very elaborate answer :) I couldn't have asked for better. Thanks. For my formating choices, I do that because I feel it makes the code more readable. Not sure how harmful that it is though! – T. Rex Feb 18 '15 at 15:33

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.