Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Problem: Remove duplicates from a given array of integers.

I want this method to work without using data sets except an array.

My code is currently \$O(n^2 + n) = O(n^2)\$. How do I shorten this? My main goal in writing this code was in the mindset that I am in an interview and that I am given only 20 minutes to solve this problem. Thus, I tried to find the quickest way of solving the problem. Advice on how better to approach these problems would also be very helpful!

public int[] removeDuplicate(int[] arr)
{
    int temp[] = new int[arr.length];
    boolean[] check = new boolean[arr.length];
    int index = 0;
    for(int i = 0; i < arr.length; i++)
    {
        if(check[i] != true)
            temp[index++] = arr[i];
        else
            continue;

        int var = arr[i];
        for(int j = i+1; j < arr.length; j++)
        {
            if(var == arr[j])
                check[j] = true;
        }
    }
    int returnArray[] = new int[index];

    for(int i = 0; i < returnArray.length; i++)
        returnArray[i] = temp[i];

    return returnArray;
}
share|improve this question
2  
Wouldn't traversing the array once, adding each item to a hashset as you go (i.e. not adding dupes), be O(n) already? – Mat's Mug yesterday
    
Actually, I am trying to do this method without the use of any other data sets. I added this specification to the question. But I would love to see a suggested answer using an hashset. – Jonathan Smit yesterday
    
@Mat'sMug the complexity of theadd method of HashSet is O(logM) where M is the number of elements in the HashSet. So using it will give a complexity of O(N logN) and not O(N) – webNeat 8 hours ago
up vote 5 down vote accepted

Your algorithm is relatively clear to see, and it makes sense, and it works. In an interview, these are good things.

Your algorithm is an \$O(n^2)\$ algorithm in time complexity - for every value in the array, you effectively check every other value for duplication.

There are a couple of things I would note here, though.

The check array is unnecessary. Instead of marking all 'future' duplicates of the value, why don't you check all processed values so far? Consider this "inner" loop:

boolean dup = false;
for (int j = 0; j < index, j++) {
    if (temp[j] == value) {
        dup = true;
        break;
    }
}
if (!dup) {
    temp[index++] = value;
}

The above loop also allows you to "enhance" the outer for-loop:

for (int value : arr) {
    ....
}

The above algorithm is also an \$O(n^2)\$ one, but it will likely run faster for 2 reasons:

  1. It breaks early if it finds a duplicate
  2. The inner loop only processes unique values, not all values.

As an aside, if the order of the output values is not important, I would consider keeping the temp array as a sorted array.... and using Arrays.binarySearch(...). This will make the algorithm complexity a bit more complex ... something in between \$O(n \log n)\$ and \$O(n^2)\$.... profiling would be required to see if it is an actual improvement.....

for (int value : arr) {
    int ip = Arrays.binarySearch(temp, 0, index, value);
    if ip >= 0 {
        // duplicate
        continue;
    }

    // insert value in to temp in the sorted order.
    ip = -ip - 1;
    System.arrayCopy(temp, ip, temp, ip + 1, index - ip);
    temp[ip] = value;
    index++;
}

The binary search is a \$O(\log n)\$ operation, and the array-copy is a hard thing to quantify in terms of n because it "depends".

Anyway, food for thought.

Of course, you could just sort the array, and then skip duplicates that way... it would make it a more apparent \$O(n \log n)\$ operation.

share|improve this answer

Problem: Remove duplicates from a given array of integers.

An important part of programming interviews is asking clarifying questions, and try to not assume anything. For example:

  • What kind of values are we talking about?
  • What is the range of possible values?
  • Is the array sorted?
  • How many values are there?
  • How many duplicate values are there?
  • Do you need to preserve the order of the elements that are not removed?

Depending on the answers, you may have a very different problem to solve. For example if the range of values is limited to [1:100], or if the array is sorted.

My gut reaction is to assume that the values are in the full range of valid integers, the array is not sorted, and there may be any number of duplicate values. However, during an interview it's important to not assume anything. Whether the candidate asks clarifying questions or just quietly makes assumptions can make a big difference. (How many times do programmers think they know what system they are supposed to build but they don't really know? Lots of times!)

My main goal in writing this code was in the mindset that I am in an interview and that I am given only 20 minutes to solve this problem.

It's usually ok to make something work the easiest possible way and then gradually make it better. In this example the easiest solution seems to be to iterate into a set and then convert that to an array.

public int[] removeDuplicate(int[] arr) {
    return toIntArray(toSet(arr));
}

private Set<Integer> toSet(int[] arr) {
    Set<Integer> set = new LinkedHashSet<>();
    for (int num : arr) {
        set.add(num);
    }
    return set;
}

private int[] toIntArray(Set<Integer> set) {
    int[] result = new int[set.size()];
    int i = 0;
    for (int num : set) {
        result[i++] = num;
    }
    return result;
}

Code review

Using the boolean literals true and false in conditions like this can be a red flag:

if (check[i] != true)

Use boolean expressions directly:

if (!check[i])

This is manual array copy:

    int returnArray[] = new int[index];

    for(int i = 0; i < returnArray.length; i++)
        returnArray[i] = temp[i];

    return returnArray;

It's recommended to use System.arraycopy or Arrays.copyOf instead. In this example you can replace the above lines with this one:

    return Arrays.copyOf(temp, index);

temp is a very poor name. Almost anything would be better, how about unique. With index renamed to uniqueCount, you end up with something quite natural:

    return Arrays.copyOf(unique, uniqueCount);
share|improve this answer

This version does not need a temp array for storing the result temporary, but I think its a bit more readable (potentially an opinion). The downside is that it will transverse the input array when populating the result array, while yours only iterates returnArray.length

public static int[] removeDups(int[] arg) {
    boolean[] isDup = new boolean[arg.length];
    int uniquesLength = arg.length;

    for(int i = 0; i < arg.length; i++) {
        if(!isDup[i]) {
            for (int k = i + 1; k < arg.length; k++) {
                if(!isDup[k] && arg[i] == arg[k]) {
                    isDup[k] = true;
                    uniquesLength--;    
                }
            }
        }
    }
    int[] uniques = new int[uniquesLength];
    int uniquesIndex = 0;

    for(int r = 0; r < arg.length; r++) {
        if(!isDup[r]) {
            uniques[uniquesIndex++] = arg[r];
        }   
    }
    return uniques;
}

I don't mean to say this code is better, because since you are concerned with time complexity, my version is worse (in that sense) because it will transverse the argument array one more time, while your version only transverses the (possibly) smaller result arrays length.

It can be done with less memory, and still fall within the same time complexity order.

On having better readability, I think not having a temp array makes the code clearer. And other posters already pointed out the importance of having meaningful variable names (which also made me correct my own variable names).

share|improve this answer
1  
While you have said that this code is more readable, it's more helpful to go into detail about it, say how, for example, clearer names improve readability. The feedback and suggestions are more important than just posting better code. – SuperBiasedMan 11 hours ago

Although you said this method should work without Set, you also mentionned that you would love to see one solution with a HashSet.

In Java 8, very concise but not very efficient (behind the hoods, there is a boxing/unboxing operation for each element).

public int[] removeDuplicate(int[] arr) {
    return Arrays.stream(arr)
            .distinct()
            .toArray();
}
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.