
Code

I have the following piece of code

        #define equidIND(divid, recip) (size_t)((divid) * (recip) + 0.5)
        ...
        // Get index for average velocity
        //idx_velo_avg    = (size_t)(velo_avg * funmaps_velo->map_velo_step_recip + 0.5);
        idx_velo_avg = equidIND(velo_avg, funmaps_velo->map_velo_step_recip);
        // Get effective velocity
        velo_eff = sqrt(velo_pos_k * velo_pos_k + accel * delta_s);
        // Get index for effective velocity (HOTSPOT)
        idx_velo_eff = equidIND(velo_eff, funmaps_velo->map_velo_step_recip);

Explanation

The snippet contains three statements:

  • First line - idx_velo_avg = ... gets an index from a value by rounding (nearest-neighbor interpolation):

    1. Mapping the value to an index; the division is turned into a multiplication with the reciprocal:
      velo_avg * funmaps_velo->map_velo_step_recip
    2. Adding 0.5 to the resulting double
    3. Truncating using a cast to size_t. The values are always > 0, by the way.
  • The second line then computes the effective velocity with a call to sqrt.

  • The third line does exactly the same as the first, but with a different input value (velo_avg in the first line, velo_eff in the third).
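The three steps of the index computation can also be sketched as a small function. This is only an illustration: the name equid_index and the assertion are assumptions and not part of the original code, which uses the equidIND macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function form of the equidIND macro.
 * value:      the quantity to look up (e.g. a velocity)
 * step_recip: reciprocal of the map's grid step (1 / step) */
static inline size_t equid_index(double value, double step_recip)
{
    /* 1. Division by the step, written as a multiplication with the reciprocal. */
    double scaled = value * step_recip;

    /* Adding 0.5 and truncating rounds to nearest only for non-negative
     * values, and an out-of-range conversion to size_t is undefined. */
    assert(scaled >= 0.0);

    /* 2. Add 0.5, 3. truncate with a cast. */
    return (size_t)(scaled + 0.5);
}
```

For example, with a step of 0.5 (step_recip = 2.0), a value of 1.3 maps to index 3 and a value of 1.2 maps to index 2.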

Profiling

Very Sleepy and AMD CodeXL give similar profiling results

[Image: profiler snapshot of the code]
The snapshot is taken from AMD CodeXL.

Question(s)

What I wonder is:

  1. Is this way of computing an index a good, fast and secure one?
  2. Why is the third line apparently more than 4 times slower than the first? At least this is how I interpret the percentage values in CodeXL, and Very Sleepy more or less confirms this interpretation. Does it have to do with the intermediate call to sqrt?

ASM

As additional information, the assembly view shows the following (link to image)


1 Answer

These two statements are functionally identical:

idx_velo_avg = (size_t)((velo_avg) * (funmaps_velo->map_velo_step_recip) + 0.5);

idx_velo_eff = (size_t)((velo_eff) * (funmaps_velo->map_velo_step_recip) + 0.5);

The only difference is the variable on the left side of the multiplication. Since velo_eff is calculated just before the second of these statements, we can safely assume that the slowdown isn't caused by a cache miss.

So I think the most likely explanation is that the processor has to wait for the result of the square root calculation to emerge from the processing pipeline before it can start calculating idx_velo_eff. The waiting time is then attributed to the line that consumes the result, not to the sqrt line itself.

Try calculating the square root first, and see if that helps:

velo_eff = sqrt(velo_pos_k * velo_pos_k + accel * delta_s);
idx_velo_avg = equidIND(velo_avg, funmaps_velo->map_velo_step_recip);
idx_velo_eff = equidIND(velo_eff, funmaps_velo->map_velo_step_recip);

Also, check your compiler optimization settings. A decent compiler should be able to do this sort of thing for you.
