Incremental Steepest Descent (gradient descent) Algorithm

Question

Here is the code I wrote to find the minimum of a complicated function. It uses the incremental steepest descent algorithm, which uses the gradient to find the direction of steepest descent and a heuristic step-size rule to find the minimum along that line. The algorithm should zigzag down the function to a local minimum. Running it many times from random starting points usually finds the global minimum as well.

/*******************************************************************************
* Incremental Steepest Descent Algorithm
*
* Grant Williams
*
* Version 1.0.0
* Feb 10, 2016
*
* Implementation of the incremental steepest descent algorithm
*
* To Compile Please use icc -std=c++11 if using intel or g++ -std=c++11 if using GCC.
*
*
*******************************************************************************/

#include <iostream>
#include <cmath>
#include <vector>


double get_rand(double HI, double LO){

    double num = LO + static_cast <double> (rand()) / (RAND_MAX / (HI - LO));

    return num;
}

double f1(double x, double y) {

    // Beale's Function
    // minimum is 0 at (3,0.5)
    // boundaries are [-4.5, 4.5] for x & y
    return (1.5 - x + x * y) * (1.5 - x + x * y) + (2.25 - x + x * y * y) * (2.25 - x + x * y * y) + (2.625 - x + x * y * y * y) * (2.625 - x + x * y * y * y);
}

double x_partial(double x, double y) {

    double x_prime = (f1(x + 0.000001,y) - f1(x - 0.000001, y)) / 0.000002;
    return x_prime;
}

double y_partial(double x, double y) {

    double y_prime = (f1(x, y + 0.000001) - f1(x, y - 0.000001)) / 0.000002;
    return y_prime;
}

void seed_rand(){
    // seed random number
    srand (static_cast <unsigned> (time(0)));
}

double isd(){
    // Declare Variables
    double tol = 0.0000001; // tolerance for convergence
    int iter = 0;
    int max_iter = 100000; // maximum number of iterations


    // coefficients for gradient
    double const alpha = 1.1; // expansion
    double const beta = 0.5; // contraction
    double ds = 0.5; // gradient variable
    double x, y, grad, gradx, grady, coeff;
    double dx, dy;
    double last_fit, fit;

    // boundaries for variables
    double low = -4.5;
    double high = 4.5;

    // get initial guess
    double x0 = get_rand(low, high);
    double y0 = get_rand(low, high);


    bool constraint = true;

    /* begin actual ISD algorithm */



    last_fit = f1(x0,y0); // get initial fitness
    fit = last_fit;

    //begin main loop
    for (iter = 0; iter < max_iter; iter++){

        gradx = -1 * x_partial(x0, y0);
        grady = -1 * y_partial(x0, y0);
        grad = std::sqrt(gradx * gradx + grady * grady);

        if (grad == 0){
            //std::cout << "grad == 0 \n";
            return fit;
        }

        coeff = ds / grad; // get cauchy coefficient

        // advance x and y by coefficient
        x = x0 + coeff * gradx;
        y = y0 + coeff * grady;

        if (x < low || x > high || y < low || y > high){
            constraint = false;
        }

        //get new fitness
        fit = f1(x,y);

        if (std::abs(fit-last_fit)<= tol){
            //std::cout << "fit: " << fit << " lastfit: " << last_fit << "\n";
            return fit;
        }

        dx = x - x0;
        dy = y - y0;

        if (std::abs(dx) <= tol && std::abs(dy) <= tol){
            //std::cout << "dx, dy\n";
            //std::cout << "x: " << x << " dx: " << dx << " y: " << y << " dy: " << dy << "\n";
            return fit;
        }


        // cauchy step was too big
        if (fit > last_fit || !constraint){

            ds *= beta;
        }else{

            ds *= alpha;
            last_fit = fit;
            x0 = x;
            y0 = y;
        }


    }

    if (iter == max_iter){ // the for loop leaves iter == max_iter when no convergence test fired
        std::cout << "Solution did not converge quickly enough \n";
    }else{
        //std::cout << "Gen: " << iter << " Min: " << fit << " x: " << x << " y: " << y << "\n";
        return fit;
    }

    return fit; // return our best value i guess

}


int main()
{

    //create seed for random numbers
    seed_rand();
    const int trials = 10000;
    std::vector<double> mins;

    // for stats printing
    double best = 100000;
    double avg = 0.0; // must start at zero before the += accumulation below
    double avg_time;

    // start timing trials
    std::clock_t start;
    start = std::clock();

    // run all trials
    for (int i = 0; i < trials; i++){
        mins.push_back(isd());
    }

    // finish timing trials
    avg_time = ( std::clock() - start) / (double) CLOCKS_PER_SEC;

    // figure stats on our runs
    for (int j = 0; j < trials; j++){
        best = best < mins[j] ? best : mins[j];
        avg += mins[j];
    }

    avg /= trials;

    std::cout << "Absolute minimum is: 0 and is found at: (3,0.5)\n-----------------------------------------\n";
    std::cout << "The best minimum was: " << best << "\nThe average minimum was: " << avg;
    std::cout << "\nThe total computation time was: " << avg_time << "\nThe average time was: " << avg_time / trials << "\n\n";

    return 0;
}

And a sample output on my machine looks something like:

Absolute minimum is: 0 and is found at: (3,0.5)
-----------------------------------------
The best minimum was: 4.00079e-08
The average minimum was: 0.936557
The total computation time was: 0.618151
The average time was: 6.18151e-05

I'm hoping to get feedback on either ways to increase the algorithmic efficiency, ways to make the coding style better, or criticism on things that I did in a stupid way.

Jerry Coffin · Accepted Answer · 2016-02-17 06:50:29Z

Include necessary headers

You're using time and clock, but haven't included ctime.
You're using srand and rand, but haven't included cstdlib.

...but see below--you should probably include different headers and use different functions/classes instead of these.

Don't use `rand` or `srand`

Modern C++ includes the <random> header, with superior random number generation facilities. This includes distribution classes to generate random numbers in a range (without the bias that your get_rand introduces).
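As a rough sketch of what that could look like, the snippet below draws a starting point with `std::mt19937` and `std::uniform_real_distribution`. The names `random_in_range` and `make_engine` are hypothetical, chosen only for illustration. The original `get_rand` takes no engine argument, so the signature here is an assumption about how you might restructure it.

```cpp
#include <cassert>
#include <random>

// Hypothetical replacement for get_rand(); the name and the extra engine
// parameter are illustrative and not part of the original code.
double random_in_range(double low, double high, std::mt19937 &engine) {
    assert(low < high);
    // The distribution maps engine output onto the half-open range [low, high).
    std::uniform_real_distribution<double> dist(low, high);
    return dist(engine);
}

// Hypothetical helper: build one engine, seeded once from std::random_device.
std::mt19937 make_engine() {
    std::random_device rd;
    return std::mt19937(rd());
}
```

You would create the engine once (for example in `main`) and pass it to `isd`, so that `double x0 = random_in_range(low, high, engine);` replaces the `get_rand` call. Creating a fresh engine on every call would be slow and would defeat the purpose of seeding once.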

Don't use `clock`

Modern C++ includes the <chrono> header with superior timing facilities.
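A minimal sketch of timing the trials with `<chrono>` might look like the following. The helper name `time_trials` is hypothetical. Note one behavioral difference from the original: `std::clock` reports processor time, while `std::chrono::steady_clock` reports elapsed wall-clock time, so the numbers are not directly comparable.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls `work` `trials` times and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double time_trials(int trials, Work work) {
    assert(trials > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < trials; ++i) {
        work();
    }
    auto stop = std::chrono::steady_clock::now();
    // duration<double> uses seconds as its default period.
    return std::chrono::duration<double>(stop - start).count();
}
```

With this in place, something like `double total = time_trials(trials, [&] { mins.push_back(isd()); });` would replace the `std::clock` bookkeeping, and `total / trials` gives the average time per run.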

Do use applicable algorithms

For example, your loop:

for (int i = 0; i < trials; i++) {
    mins.push_back(isd());
}

...would be better written (in my opinion, anyway), as:

std::generate_n(std::back_inserter(mins), trials, isd);
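To make that line compile you need `<algorithm>` and `<iterator>`. The sketch below puts it in context and, as an extension of the same idea that goes beyond the one loop quoted above, also replaces the statistics loop with `std::min_element` and `std::accumulate`. The function names and the `fake_trial` stand-in for `isd` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in for isd(): returns 0.5, 1.0, 1.5, ... on successive calls.
double fake_trial() {
    static int calls = 0;
    return ++calls * 0.5;
}

// Collect the result of `trials` calls to `trial` into a vector.
template <typename Generator>
std::vector<double> run_trials(int trials, Generator trial) {
    assert(trials >= 0);
    std::vector<double> mins;
    std::generate_n(std::back_inserter(mins), trials, trial);
    return mins;
}

// Smallest value found across all trials.
double best_of(const std::vector<double> &mins) {
    assert(!mins.empty());
    return *std::min_element(mins.begin(), mins.end());
}

// Mean of all trial results; the 0.0 initial value keeps the sum in double.
double average_of(const std::vector<double> &mins) {
    assert(!mins.empty());
    return std::accumulate(mins.begin(), mins.end(), 0.0) / mins.size();
}
```

In your program the call would be `run_trials(trials, isd)`. Using `std::accumulate` with an explicit initial value also sidesteps the need to remember to initialize a running total by hand.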

Improve names

Right now, you have a fair number of names like tol, fit and grad that could be easily changed to tolerance, fitness, and gradient respectively to make the code a lot easier to read without going back to the comments to see an explanation of what each really is/means (though of these, I'd say grad is the least crucial).

constraint is somewhat the same, but strikes me as even worse. It apparently really means "within constraints" (or something similar), but doesn't really say that.

Remove magic numbers

For a couple of obvious examples, in x_partial and y_partial you have 0.000001 in a couple of places apiece. I'd rewrite these as something like:

double x_partial(double x, double y) {

    static const double step = 0.000001;

    double x_prime = (f1(x + step, y) - f1(x - step, y)) / (2 * step);
    return x_prime;
}

Consider using scientific notation

Well, it's not really scientific notation, but C++'s approximation of it:

static const double step = 1e-6;

static const double tol = 1e-7;    // tolerance for convergence

Use meaningful comments

Most of the content of your file header (for the most glaring example) should be handled by any half-way decent version control system.

Some of the others are basically redundant:

void seed_rand(){
    // seed random number
    srand (static_cast <unsigned> (time(0)));
}

I've actually just learned of <chrono> and <random> this week, so I'll make sure to update those! I haven't heard of generate before this. Is there a reason to use it over a for loop besides legibility? Does it produce more efficient code, or is it just easier to understand? The names and magic numbers come from the algorithm notes we were given in my optimization class, but I'll make sure to update those. Finally, I had no idea C++ had 1e10 notation at all; it was one of the things I missed from when I learned programming with Matlab. -- Grant Williams, Feb 17 at 6:48
@GrantWilliams: Algorithms don't always produce more efficient code, but it is possible (and soon we'll have parallel algorithms that often will, at least for code that's easy to execute concurrently). -- Jerry Coffin, Feb 17 at 6:52
So why would you personally choose to use std::generate_n over a looping method? I'm still quite unfamiliar with C++11, so I'm trying to learn how and when to use it correctly! -- Grant Williams, Feb 17 at 6:54
@GrantWilliams: I'd personally choose it because it immediately gives a more specific idea of the intent of the code, without having to read through the body of the loop to see what it's doing in each iteration. Easier conversion to parallel execution is icing on the cake. -- Jerry Coffin, Feb 17 at 6:55
That makes perfect sense! How difficult would it be to parallelize the runs in C++? I'm quite familiar with how it would be done in Matlab, but I'm not sure where to start in C++. Any function names I can google to get a start on the literature? -- Grant Williams, Feb 17 at 7:00

