Finding pair of sum in sorted array

Question

In a sorted array, I am trying to find pairs that sum up to a certain value. I was wondering if anyone could help me improve my code in performance or memory. I believe my code is O(n). If there are other method for finding pairs the sum up to a certain value that is much more efficient, I am open to these methods!

static int[] intArray = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};

public static ArrayList<String> pairSum(int[] j, int sum) {
    int len = j.length;
    int pointer1;
    int pointer2;
    ArrayList<String> storage = new ArrayList<String>();

    for (int i = 0; i < len; i++) {
        pointer1 = j[i];
        for (int k = len - 1; k >= 0; k--) {
            pointer2 = j[k];
            if (pointer1 + pointer2 == sum && pointer1 != pointer2) {
                i++;
                String pair = Integer.toString(pointer1) + " and " + Integer.toString(pointer2) ;
                storage.add(pair);
            }
        }
    }
    return storage;
}

If I ran this code, say pairSum(intArray,16), I would get:

[0 and 16, 2 and 14, 4 and 12, 6 and 10, 9 and 7, 11 and 5, 13 and 3, 15 and 1]

If I remove the i++, the output would be:

[0 and 16, 1 and 15, 2 and 14, 3 and 13, 4 and 12, 5 and 11, 6 and 10, 7 and 9, 9 and 7, 10 and 6, 11 and 5, 12 and 4, 13 and 3, 14 and 2, 15 and 1, 16 and 0]

The i++; produces the wrong result. I'd recommend you remove it from your question, since the code under review out to run and produce correct results. Why is it there? — Wayne Conrad, yesterday

JakeP · Accepted Answer · 2014-01-10 16:27:55Z

There are two solutions which come to mind for your problem, but first, let's look through your code:

you have the variables pointer1 and pointer2. These are very descriptive names, which is great, but unfortunately they are not pointers. They are the values at the i and k positions in the array. In other words, i and k are the actual pointers. I would call pointer1 and pointer2 something like val1 and val2. But, in fact, I would drop the pointers entirely, and just use an index in to the data (why copy the data when you don't need to....? )
In most languages the variable name i is 'reserved' for indexed looping through data, just like you have for (int i = 0; i < len; i++) {...}. It is standard to use j and k as the variable names for any inner/nested loops you need. i is the top level, j is the first nested level, k is the subsequent level, and, if you need more than that, you should be calling a function/method (Never use l as a variable name, it is too easy to confuse with 1). You are breaking this convention by using j as an array, and going immediately to k inside your loop. My suggestion is to rename j to be data, and to rename k to be j.
as a general rule, when you have a loop constrained by an index, like you have for (int i = 0; i < len; i++) { ..., you should never modify the index inside the loop. I cannot think of a reason why it should ever be necessary, and, certainly, it should never be conditionally modified (inside an if-statement inside another loop). This will lead to bugs. In fact, the simple fact that you have that inner i++ tells me you do not understand your own code's behaviour.
Your method should return the interface type List<String> instead of the concrete class ArrayList<String> since there is no reason for outsiders to know the physical implementation.
storage is not a great name for a variable.... everything could be called storage since all variables store things... I would choose something like: results.
Your inner if condition makes sure that the sums are a match, but also checks pointer1 != pointer2. This second check is a problem, because it should not be testing the values, but the indexes.... For example what if the data is [4,4] and the target sum is 8?
To get your result pair as a String you do: String pair = Integer.toString(pointer1) + " and " + Integer.toString(pointer2).... this is serious overkill, try this instead: String pair = pointer1 + " and " + pointer2. In Java the + String operator will implicitly convert the integers to a String... no need to do it yourself.

So, you have some style problems, and some concerns about your inner workings.

Now, let's talk about your algorithm..... It loops through each value. For each value, it then compares it against every other value (except itself). So, you loop n times, and, for each n you do m checks. Your complexity is O(n x m), but, in reality, n and m are the same, so your complexity is O(n²)

Now, about that i++ inside the k loop.... the reason you need it is because your k loop is starting from the wrong place. your algorithm is wrong. Let me explain.

If you have the data [ 0, 1, 2, 3, 4, 5 ] , you are using your i index to loop through all the data. Your first data will be 0. You sum this with 5 (Note, you have summed 0 + 5 now), and check the result, then with 4, and so on. When you are done, you move i to the value 1, and you keep moving i until you get to the value 5 for i. You then compare **this value 5 to all the other values, including 0. This is *the second time you sum the items 0 + 5 (but it is now 5 + 0) *.

You need to change the algorithm to only scan those values that have not yet been scanned. The easiest way to do this is to modify your inner/nested loop. Consider your code:

   for (int i = 0; i < len; i++) {
       ...
       for (int k = len - 1; k >= 0; k--) {
           ...

Now, we change the inner loop as follows:

for (int i = 0; i < len; i++) {
    ....
    for (int j = i + 1; j < len; j++) {

(note, I have renamed k to be j). We start our j index at i + 1, because we don't need to sum values before i since that sum was done already when i was smaller.....

Now, our loops only calculate the sum value once for each pair of values.

Unfortunately, our complexity is still O(n²) because even though our inner loop is only testing half (on average) the values, we have not actually changed the way the algorithm scales (if you double the input data, you with quadruple the execution time).

Still, even with this algorithm change, and some variable renaming, and some bug fixing, your code will look like:

public static List<String> pairSum(int[] data, int sum) {
    int len = data.length;
    int val1;
    int val2;
    List<String> results = new ArrayList<String>();

    for (int i = 0; i < len; i++) {
        val1 = data[i];
        for (int j = i + 1; j < len; j++) {
            val2 = data[j];
            if (val1 + val2 == sum) {
                String pair = val1 + " and " + val2;
                results.add(pair);
            }
        }
    }
    return results;
}

This is OK, but my preference would be to strip all the val1 variables entirely, and you can simplify some other things too. Consider the following more 'compact' code:

public static List<String> pairSum(int[] data, int sum) {
    List<String> results = new ArrayList<String>();

    for (int i = 0; i < data.length; i++) {
        for (int j = i + 1; j < data.length; j++) {
            if (data[i] + data[j] == sum) {
                results.add(data[i] + " and " + data[j]);
            }
        }
    }

    return results;
}

That is what I would recommend for a simple, easy-to-read option that is O(n²). For small data sets (say less than 50 or so), this solution will be great.

But, if you have more data, a better solution would be more complicated, but faster.

In your example code, you have an input test array that is all positive, sorted, and the values are unique. Let's assume that all the data will be this way (we can modify this assumption later).

Consider the algorithm that does:

take all values that are less than half the target `sum`.
find a value that is the difference

It would look like:

public static List<String> pairSumFast(int[] data, int sum) {
    List<String> results = new ArrayList<String>();

    int limit = sum / 2;
    for (int i = 0; i < data.length && data[i] <= limit; i++) {
        int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
        if (match >= 0) {
            results.add(data[i] + " and " + data[match]);
        }
    }

    return results;
}

This has the complexity of O( nlog(n) ) for the loop, and the binary-search respectively.

If your data set is large, and the values are unique, but the data is unsorted, then I would recommend the change:

public static List<String> pairSumFast(int[] indata, int sum) {
    List<String> results = new ArrayList<String>();

    int[] data = Arrays.copyOf(indata, indata.length);
    Arrays.sort(data);

    int limit = sum / 2;
    for (int i = 0; i < data.length && data[i] <= limit; i++) {
        int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
        if (match >= 0) {
            results.add(data[i] + " and " + data[match]);
        }
    }

    return results;
}

This is O(n log(n) ) for the complexity still... (it scales this way, even through it does more computation....)

Finally, if the data is not unique, then you will get duplicate output values. To accommodate that, I would add an inner loop to the match....:

public static List<String> pairSumFast(int[] data, int sum) {
    List<String> results = new ArrayList<String>();

    int limit = sum / 2;
    for (int i = 0; i < data.length && data[i] <= limit; i++) {
        int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
        if (match >= 0) {
            String pair = data[i] + " and " + data[match];
            results.add(pair);
            // binary search will return the index of one match, look around to find others.
            int more = match - 1;
            while (more >=0 && data[more] == data[match]) {
                results.add(pair);
                more--;
            }
            more = match + 1;
            while (more < data.length && data[more] == data[match]) {
                results.add(pair);
                more++;
            }
        }
    }

    return results;
}

Much more comprehensive than my answer. I hope the OP gives this one the checkmark. — Wayne Conrad, yesterday
Yes thank you so much for taking the time to write this ! I really appreciate it! I learn a lot!! @WayneConrad thank you for your answer as well! — Liondancer, yesterday
Thanks! I was 2-thirds through the answer when I saw your's pop up .... and I figured I covered things you had not, so I posted.... That's why there's a lot of duplication between our answers, sorry ;-) — rolfl, yesterday
@rolfi, One of the things I like about codereview is that the fastest gun doesn't always win. I've been out of the Java game for a long time, so I'm glad someone who knows it well was able to cover what I missed. — Wayne Conrad, yesterday
A few things I saw, the results declaration should be List<String>, int more can be directly set and there's a stray space at more --. — Bobby, yesterday

Wayne Conrad · Answer 2 · 2014-01-10 14:28:43Z

Code improvements

Before addressing efficiency, let's see if there are any improvements that could be made to the code.

I suspect some of the names could be improved.

pointer1 -> value1
pointer2 -> value2
j -> values
storage -> value_pairs
pairSum -> pairsThatAddUpToASum

I would consider getting rid of this temporary:

int len = j.length;

as it's not carrying its weight. Using j.length inline is not burdensome, and the compiler/runtime are likely to produce adequately fast code without it.

Lastly, I wonder about the decision to return the pairs as strings. This depends upon how the function is being used, but I would often rather return the pairs of numbers, and let the caller worry about formatting. This decouples the function from I/O.

A bug?

This term in the if:

pointer1 != pointer2

Might be a bug. If the list is guaranteed to contain no duplicates, it will work fine (but might be more communicative as i != k). If, however, the list may contain duplicates, and it is the intent of the code to return the resulting duplicate pairs, then this must be i != k.

Efficiency.

In order to be O(n), the algorithm must visit each element a constant number of times. We typically think of a O(n) algorithm as one that visits each element once. In this algorithm, both the outer and inner loop visit each element. That makes this an O(n^2) algorithm.

A google search for "efficiently finding pairs that add up to a sum" finds, not surprisingly, several Stack Overflow questions. Here's one of them that may be useful to you: http://stackoverflow.com/questions/1494130/design-an-algorithm-to-find-all-pairs-of-integers-within-an-array-which-sum-to-a

Thanks for this analysis! My code is O(n^2) didnt realize this! — Liondancer, yesterday

asked	yesterday
viewed	323 times
active	yesterday

Explore our sites

Finding pair of sum in sorted array

2 Answers

Code improvements

A bug?

Efficiency.

Your Answer

Not the answer you're looking for? Browse other questions tagged java performance or ask your own question.

Hot Network Questions

Explore our sites

Finding pair of sum in sorted array

2 Answers

Did you find this question interesting? Try our newsletter

Code improvements

A bug?

Efficiency.

Your Answer

Sign up or login

Post as a guest

Not the answer you're looking for? Browse other questions tagged java performance or ask your own question.

Related

Hot Network Questions