Introduction
Rabindranath Tagore said:
"যদি তোর ডাক শুনে কেউ না আসে তবে একলা চলো রে..."
Translated: "If no one responds to your call, then go your own way alone..." 1
After waiting for two whole days in vain, I've decided to answer my own question. In these two days, I've tried several methods and made certain interesting observations. Here, I shall present them in a logical order.
First, the Euclidean algorithm
Here is the implementation of the "Modern Euclidean Algorithm" (Algorithm A, TAOCP Vol - 2, Pg - 337, Section - 4.5.2) in Java:
/**
* Returns the GCD (Greatest Common Divisor, also known as the GCF -
* Greatest Common Factor, or the HCF - Highest Common Factor) of the two
* (signed, long) integers given.
* <p>
* It calculates the GCD using the Euclidean GCD algorithm.
*
* @param u The first integer (preferably the larger one).
* @param v The second integer (preferably the smaller one).
* @return The GCD (or GCF or HCF) of {@code u} and {@code v}.
*/
public static long euclideanGCD(long u, long v) {
// Corner cases
if (u < 0) u = -u;
if (v < 0) v = -v;
if (u == 0) return v;
if (v == 0) return u;
// Correction
if (u < v) {
long t = u;
u = v;
v = t;
}
// A1
while (v != 0) {
// A2
long r = u % v;
u = v;
v = r;
}
return u;
}
In order to test it, I wrote this class:
class Test {
public static void main(String[] args) {
// Warm-up
int g = 0;
while (++g > 0)
Math.sqrt(g);
// Timing starts
long time = System.nanoTime();
long sum = 0;
for (int i = 0; i < 20000; i++) {
for (int j = 0; j < i; j++) {
sum += GCD.euclideanGCD(i, j); // Over here
}
}
time = System.nanoTime() - time;
System.out.println("sum = " + sum);
System.out.println("time = " + time);
}
}
Here is its output:
sum = 1352820356
time = 35306053968
So, it took 35.3 seconds on my Windows 7 Laptop with a 2.1 Gz Intel Pentium processor. (If we run the test again, the time will of course be different, but it will be in the range: 34-36 seconds). We shall use this time as the reference point (and the sum to test the correctness of the other methods). Now, let's try and improve upon this. Here are a few points to note:
- This method is horribly slow because of the divisions (modulo operations) that it has to perform. We can improve upon this by making every effort to make the divisors smaller.
- It also has so many (rather unnecessary) assignment operations. This is also a factor that reduces the speed.
Keeping these points in mind, I wrote the method that I had provided in the question. For convenience, here it is again (now well documented):
The "Better" Euclidean algorithm
/**
* Returns the GCD (Greatest Common Divisor, also known as the GCF -
* Greatest Common Factor, or the HCF - Highest Common Factor) of the two
* (signed, long) integers given.
* <p>
* It calculates the GCD using a modified form of the Euclidean GCD algorithm.
*
* @param u The first integer.
* @param v The second integer.
* @return The GCD (or GCF or HCF) of {@code u} and {@code v}.
*/
public static long euclideanGCD2(long u, long v) {
// Corner cases
if (u < 0) u = -u;
if (v < 0) v = -v;
int a0 = Long.numberOfTrailingZeros(u);
int b0 = Long.numberOfTrailingZeros(v);
// Make both of them odd.
u >>>= a0;
v >>>= b0;
// t is the common power of 2 (may be 0)
int t = a0 < b0 ? a0 : b0;
while ((u & v) != 0) { // i.e. both are non-zero
if (u > v) {
u %= v;
u >>>= Long.numberOfTrailingZeros(u);
} else {
v %= u;
v >>>= Long.numberOfTrailingZeros(v);
}
}
return (u | v) << t; // {the non-zero value of the two} * 2^t
}
We test again by changing the line marked withe the comment to sum += GCD.euclideanGCD2(i, j);
Here is the output:
sum = 1352820356
time = 26400006940
This is a definite improvement (of around 25.2%), and this is about as far as we can go without any other help. But can go further if we use the Binary GCD algorithm. So here it is:
The binary GCD algorithm
/**
* Returns the GCD (Greatest Common Divisor, also known as the GCF -
* Greatest Common Factor, or the HCF - Highest Common Factor) of the two
* (signed, long) integers given.
* <p>
* This uses the Binary GCD algorithm.
*
* @param a The first integer.
* @param b The second integer.
* @return The GCD (or GCF or HCF) of {@code a} and {@code b}.
*/
public static long binaryGCD(long a, long b) {
// Corner cases
if (a < 0) a = -a;
if (b < 0) b = -b;
if (a == 0) return b;
if (b == 0) return a;
int a0 = Long.numberOfTrailingZeros(a);
int b0 = Long.numberOfTrailingZeros(b);
int t = a0 < b0 ? a0 : b0;
a >>>= a0;
b >>>= b0;
while (a != b) {
if (a > b) {
a -= b;
a >>>= Long.numberOfTrailingZeros(a);
} else {
b -= a;
b >>>= Long.numberOfTrailingZeros(b);
}
}
return a << t;
}
The output is:
sum = 1352820356
time = 13381525510
So, the improvement is quite dramatic (around 60% compared to the original). Now, we know that this algorithm slows down when \$u\$ and \$v\$ have very different bit lengths. So, let's try and improve this:
The "Hybrid" GCD algorithm
/**
* Returns the GCD (Greatest Common Divisor, also known as the GCF -
* Greatest Common Factor, or the HCF - Highest Common Factor) of the two
* (signed, long) integers given.
* <p>
* It uses a hybrid of Euclidean and the binary GCD algorithm.
*
* @param a The first integer.
* @param b The second integer.
* @return The GCD (or GCF or HCF) of {@code a} and {@code b}.
*/
public static long hybridGCD(long a, long b) {
// Corner cases
if (a < 0) a = -a;
if (b < 0) b = -b;
if (a == 0) return b;
if (b == 0) return a;
int a0 = Long.numberOfTrailingZeros(a);
int b0 = Long.numberOfTrailingZeros(b);
int t = a0 < b0 ? a0 : b0;
a >>>= a0;
b >>>= b0;
while ((a & b) != 0) {
// if 'a' and 'b' have almost same bit length
if (Math.abs(Long.numberOfLeadingZeros(a)
- Long.numberOfLeadingZeros(b)) < 2) {
return binaryGCD(a, b) << t;
}
// use Euclidean algorithm
if (a > b) {
a %= b;
a >>>= Long.numberOfTrailingZeros(a);
} else {
b %= a;
b >>>= Long.numberOfTrailingZeros(b);
}
}
return (a | b) << t;
}
The output is:
sum = 1352820356
time = 19178926693
Well, that's interesting! :-) It seems that our "optimized" algorithm is actually slower than the un-optimized version of the binary algorithm. This occurs because the difference in bit-length is not such a big factor in case of the binary algorithm(i.e. 64 bits max in case of long
). Sure enough, if we change the length-checking value from 2
to 4
, we see that the running time is ~16 seconds.
Conclusion
Here, we have seen that:
- Division-based algorithms should generally be avoided.
- Premature optimization seldom pays off.
- Shifts, additions and subtractions are the way to go in a binary environment.
Hence, the answers are:
- Yes, but there can be more.
- Many, many improvements... For starters, try reducing the absolute values of the remainders.
- If the library supports integers which can have huge differences in bit-length. Otherwise, I'd use the Binary GCD algorithm for
long
.
1 See this article for more info on the song.