There are two solutions which come to mind for your problem, but first, let's look through your code:
- you have the variables
pointer1
and pointer2
. These are very descriptive names, which is great, but unfortunately they are not pointers. They are the values at the i
and k
positions in the array. In other words, i
and k
are the actual pointers. I would call pointer1
and pointer2
something like val1
and val2
. But, in fact, I would drop the pointers entirely, and just use an index in to the data (why copy the data when you don't need to....? )
- In most languages the variable name
i
is 'reserved' for indexed looping through data, just like you have for (int i = 0; i < len; i++) {...}
. It is standard to use j
and k
as the variable names for any inner/nested loops you need. i
is the top level, j
is the first nested level, k
is the subsequent level, and, if you need more than that, you should be calling a function/method (Never use l
as a variable name, it is too easy to confuse with 1
). You are breaking this convention by using j
as an array, and going immediately to k
inside your loop. My suggestion is to rename j
to be data
, and to rename k
to be j
.
- as a general rule, when you have a loop constrained by an index, like you have
for (int i = 0; i < len; i++) { ...
, you should never modify the index inside the loop. I cannot think of a reason why it should ever be necessary, and, certainly, it should never be conditionally modified (inside an if-statement inside another loop). This will lead to bugs. In fact, the simple fact that you have that inner i++
tells me you do not understand your own code's behaviour.
- Your method should return the interface type
List<String>
instead of the concrete class ArrayList<String>
since there is no reason for outsiders to know the physical implementation.
storage
is not a great name for a variable.... everything could be called storage
since all variables store things... I would choose something like: results
.
- Your inner
if
condition makes sure that the sums are a match, but also checks pointer1 != pointer2
. This second check is a problem, because it should not be testing the values, but the indexes.... For example what if the data is [4,4]
and the target sum is 8
?
- To get your result pair as a String you do:
String pair = Integer.toString(pointer1) + " and " + Integer.toString(pointer2)
.... this is serious overkill, try this instead: String pair = pointer1 + " and " + pointer2
. In Java the +
String operator will implicitly convert the integers to a String... no need to do it yourself.
So, you have some style problems, and some concerns about your inner workings.
Now, let's talk about your algorithm..... It loops through each value. For each value, it then compares it against every other value (except itself). So, you loop n
times, and, for each n
you do m
checks. Your complexity is O(n x m), but, in reality, n
and m
are the same, so your complexity is O(n2)
Now, about that i++
inside the k
loop.... the reason you need it is because your k
loop is starting from the wrong place. your algorithm is wrong. Let me explain.
If you have the data [ 0, 1, 2, 3, 4, 5 ]
, you are using your i
index to loop through all the data. Your first data will be 0
. You sum this with 5
(Note, you have summed 0 + 5
now), and check the result, then with 4
, and so on. When you are done, you move i
to the value 1
, and you keep moving i
until you get to the value 5
for i
. You then compare **this value 5
to all the other values, including 0
. This is *the second time you sum the items 0 + 5
(but it is now 5 + 0
) *.
You need to change the algorithm to only scan those values that have not yet been scanned. The easiest way to do this is to modify your inner/nested loop. Consider your code:
for (int i = 0; i < len; i++) {
...
for (int k = len - 1; k >= 0; k--) {
...
Now, we change the inner loop as follows:
for (int i = 0; i < len; i++) {
....
for (int j = i + 1; j < len; j++) {
(note, I have renamed k
to be j
). We start our j
index at i + 1
, because we don't need to sum values before i
since that sum was done already when i
was smaller.....
Now, our loops only calculate the sum
value once for each pair of values.
Unfortunately, our complexity is still O(n2) because even though our inner loop is only testing half (on average) the values, we have not actually changed the way the algorithm scales (if you double the input data, you with quadruple the execution time).
Still, even with this algorithm change, and some variable renaming, and some bug fixing, your code will look like:
public static List<String> pairSum(int[] data, int sum) {
int len = data.length;
int val1;
int val2;
List<String> results = new ArrayList<String>();
for (int i = 0; i < len; i++) {
val1 = data[i];
for (int j = i + 1; j < len; j++) {
val2 = data[j];
if (val1 + val2 == sum) {
String pair = val1 + " and " + val2;
results.add(pair);
}
}
}
return results;
}
This is OK, but my preference would be to strip all the val1
variables entirely, and you can simplify some other things too. Consider the following more 'compact' code:
public static List<String> pairSum(int[] data, int sum) {
List<String> results = new ArrayList<String>();
for (int i = 0; i < data.length; i++) {
for (int j = i + 1; j < data.length; j++) {
if (data[i] + data[j] == sum) {
results.add(data[i] + " and " + data[j]);
}
}
}
return results;
}
That is what I would recommend for a simple, easy-to-read option that is O(n2). For small data sets (say less than 50 or so), this solution will be great.
But, if you have more data, a better solution would be more complicated, but faster.
In your example code, you have an input test array that is all positive, sorted, and the values are unique. Let's assume that all the data will be this way (we can modify this assumption later).
Consider the algorithm that does:
take all values that are less than half the target `sum`.
find a value that is the difference
It would look like:
public static List<String> pairSumFast(int[] data, int sum) {
List<String> results = new ArrayList<String>();
int limit = sum / 2;
for (int i = 0; i < data.length && data[i] <= limit; i++) {
int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
if (match >= 0) {
results.add(data[i] + " and " + data[match]);
}
}
return results;
}
This has the complexity of O( nlog(n) ) for the loop, and the binary-search respectively.
If your data set is large, and the values are unique, but the data is unsorted, then I would recommend the change:
public static List<String> pairSumFast(int[] indata, int sum) {
List<String> results = new ArrayList<String>();
int[] data = Arrays.copyOf(indata, indata.length);
Arrays.sort(data);
int limit = sum / 2;
for (int i = 0; i < data.length && data[i] <= limit; i++) {
int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
if (match >= 0) {
results.add(data[i] + " and " + data[match]);
}
}
return results;
}
This is O(n log(n) ) for the complexity still... (it scales this way, even through it does more computation....)
Finally, if the data is not unique, then you will get duplicate output values. To accommodate that, I would add an inner loop to the match....:
public static List<String> pairSumFast(int[] data, int sum) {
List<String> results = new ArrayList<String>();
int limit = sum / 2;
for (int i = 0; i < data.length && data[i] <= limit; i++) {
int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
if (match >= 0) {
String pair = data[i] + " and " + data[match];
results.add(pair);
// binary search will return the index of one match, look around to find others.
int more = match - 1;
while (more >=0 && data[more] == data[match]) {
results.add(pair);
more--;
}
more = match + 1;
while (more < data.length && data[more] == data[match]) {
results.add(pair);
more++;
}
}
}
return results;
}
i++;
produces the wrong result. I'd recommend you remove it from your question, since the code under review out to run and produce correct results. Why is it there? – Wayne Conrad yesterday