Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I've been trying to optimize this piece of code:

void detect_optimized(int width, int height, int threshold)
{
  int x, y, z;
  int tmp;

  for (y = 1; y < width-1; y++)
    for (x = 1; x < height-1; x++)
      for (z = 0; z < 3; z++)
      {
    tmp = mask_product(mask,a,x,y,z);
    if (tmp>255)
          tmp = 255;
        if (tmp<threshold)
          tmp = 0;
        c[x][y][z] = 255-tmp;
      }
  return;
}

So far I've tried "Blocking" and and a few other things, but can't seem to get it to run any faster.

Blocking resulted in:

  for(yy = 1; yy<height-1; yy+=4){
    for(xx = 1; xx<width -1; xx+=4){
        for (y = yy; y < 4+yy; y++){
            for (x = xx; x < 4+xx; x++){
                for (z = 0; z < 3; z++)
                      {
                         tmp = mask_product(mask,a,x,y,z);
                         if (tmp>255)
                          tmp = 255;
                         if (tmp<threshold)
                          tmp = 0;
                         c[x][y][z] = 255-tmp;
                      }}}}}

Which ran at the same speed as the original program.

Any suggestions would be great.

mask_function cannot be changed, but here is its code:

int mask_product(int m[3][3], byte bitmap[MAX_ROW][MAX_COL][NUM_COLORS], int x, int y, int z)
{
  int tmp[9];
  int i, sum;

// ADDED THIS LINE (sum = 0) TO FIX THE BUG
  sum = 0;

  tmp[0] = m[0][0]*bitmap[x-1][y-1][z];
  tmp[1] = m[1][0]*bitmap[x][y-1][z];
  tmp[2] = m[2][0]*bitmap[x+1][y-1][z];
  tmp[3] = m[0][1]*bitmap[x-1][y][z];
  tmp[4] = m[1][1]*bitmap[x][y][z];
  tmp[5] = m[2][1]*bitmap[x+1][y][z];
  tmp[6] = m[0][2]*bitmap[x-1][y+1][z];
  tmp[7] = m[1][2]*bitmap[x][y+1][z];
  tmp[8] = m[2][2]*bitmap[x+1][y+1][z];
  for (i=0; i<9; i++)
    sum = sum + tmp[i];
  return sum;
}
share|improve this question
    
The indentation after tmp = 255; indicates that you want that if to be wrapped into the other one's body. Is that correct? If so, then you are missing parenthesis. If not, then the indentation should be fixed. –  Nobody yesterday
    
Is mask_product some standard function that you may not change? If not then please provide its code as well. –  Nobody yesterday
    
The format must have changed when I copied it. Fixed it. –  user3311482 yesterday
1  
Did you try optimized compilation? Most of the advisable optimizations can be done automatically by the compiler so you can concentrate on getting the code right (instead of obfuscating it with "optimizations"). –  Nobody yesterday
add comment

2 Answers

Do not expect much:

void detect_optimized(int width, int height, int threshold)
{
  int x, y, z;
  int tmp;
  int widthM1= width-1;
  int heightM1=height-1;

  for (y = 1; y < widthM1; y++){
    for (x = 1; x < heightM1; x++){
      for (z = 0; z < 3; z++){
        tmp = mask_product(mask,a,x,y,z);
        if (tmp>255)
          c[x][y][z] = 0;
        else if (tmp<threshold)
          c[x][y][z] = 255;
        else
          c[x][y][z] = 255 ^ tmp;  // in this case xor is the same as -
      }
    } 
  }
  return;
}

You can also unroll the z- loop, by copying the inner body 2 more times.

If you can manage to change the mask_function:

int mask_product(int m[3][3], byte bitmap[MAX_ROW][MAX_COL][NUM_COLORS], int x, int y, int z)
{
  int xp1=x+1;
  int xm1=x-1;
  int yp1=y+1;
  int ym1=y-1;
  return  m[0][0]*bitmap[xm1][ym1][z];
    + m[1][0]*bitmap[x][ym1][z];
    + m[2][0]*bitmap[xp1][ym1][z];
    + m[0][1]*bitmap[xm1][y][z];
    + m[1][1]*bitmap[x][y][z];
    + m[2][1]*bitmap[xp1][y][z];
    + m[0][2]*bitmap[xm1][yp1][z];
    + m[1][2]*bitmap[x][yp1][z];
    + m[2][2]*bitmap[xp1][yp1][z];
 }

Also have a look if you can make cour compiler inline the mask_product method.

share|improve this answer
add comment

I am afraid I can't do much about the performance but here are some other tips:

  • The z loop bound is not checked against NUM_COLORS (which I believe is the size of the array for this index).
  • z should be renamed to something like colorIndex.
  • Using prefix increments (++x) might give a speedup
  • Precalculating loop bounds to avoid repeated recalculation might give another speedup: maxY = width - 1; and then for (y = 1; y < maxY; ++y)

  • tmp should have a better name (what does it actually contain/represent?)

  • place the ifs into an own function (or macro) clamp (which is a widely used name and maybe has a standard implementation) and use it to make the code cleaner: tmp = clamp(tmp, 0, 255);

  • do not use global variables instead of function parameters, they introduce hard to debug side effects

  • Although you cannot change the mask function I would recommend avoiding the tmp array and instead directly summing up into sum.

You could gain some speedup by replacing the inner loop over the colors by MMX functionality which allows to do the same calculation on all colors at once. However, I am no expert in this and it likely requires inline assembly.

share|improve this answer
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.