Two index computations differing in execution time

Question

Code

I have the following piece of code:

        #define equidIND(divid, recip) ((size_t)((divid) * (recip) + 0.5))
        ...
        // Get index for average velocity
        //idx_velo_avg    = (size_t)(velo_avg * funmaps_velo->map_velo_step_recip + 0.5);
        idx_velo_avg = equidIND(velo_avg, funmaps_velo->map_velo_step_recip);
        // Get effective velocity
        velo_eff = sqrt(velo_pos_k * velo_pos_k + accel * delta_s);
        // Get index for effective velocity (HOTSPOT)
        idx_velo_eff = equidIND(velo_eff, funmaps_velo->map_velo_step_recip);

Explanation

It contains three lines with operations.

First line - idx_velo_avg = ... gets an index from a value by rounding (nearest-neighbor interpolation):
1. Map the value to an index. The division by the step size is turned into a multiplication with its reciprocal:
  velo_avg * funmaps_velo->map_velo_step_recip
2. Add 0.5 to the resulting double.
3. Truncate using a cast to size_t. The values are always > 0, by the way.
The second line then computes the effective velocity, using a call to sqrt.
The third line does exactly the same as the first, but with a different base value (velo_eff instead of velo_avg).
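
The three steps above can be sketched as a small function. This is only an illustration: the name equid_index and the sample values are assumptions and do not appear in the original code, and the assert encodes the stated precondition that the inputs are never negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function form of the equidIND macro (name is an assumption).
   Precondition taken from the question: inputs are never negative. */
static size_t equid_index(double value, double step_recip)
{
    assert(value >= 0.0 && step_recip >= 0.0);
    /* 1. scale the value into index units (multiply by 1/step)
       2. add 0.5
       3. truncate via the cast, which rounds half up for non-negative input */
    return (size_t)(value * step_recip + 0.5);
}
```

With an assumed step of 0.25 (reciprocal 4.0), a value of 1.1 scales to 4.4 and maps to index 4, while 1.125 scales to exactly 4.5 and maps to index 5.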

Profiling

Very Sleepy and AMD CodeXL give similar profiling results

[Image: profiler snapshot of the code, taken from AMD CodeXL]

Question(s)

What I wonder is:

Is this way of computing an index a good, fast and secure one?
Why is the third line apparently more than 4 times slower than the first? At least this is how I interpret the percentage values in CodeXL, and Very Sleepy more or less confirms this interpretation. Does it have to do with the intermediate call to sqrt?

ASM

As additional information, the assembly view shows the following (link to image)


squeamish ossifrage · Answer 1 · 2015-01-30 13:29:29Z

These two statements are functionally identical:

        idx_velo_avg = (size_t)((velo_avg) * (funmaps_velo->map_velo_step_recip) + 0.5);

        idx_velo_eff = (size_t)((velo_eff) * (funmaps_velo->map_velo_step_recip) + 0.5);

The only difference is the variable on the left side of the multiplication. Since velo_eff is calculated just before the second of these statements, we can safely assume that the slowdown isn't caused by a cache miss.

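So I think the only possible explanation is that the processor has to wait for the result of the square root calculation to emerge from the processing pipeline before it can start calculating idx_velo_eff. A sampling profiler tends to charge that waiting time to the dependent line rather than to the sqrt call itself.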

Try calculating the square root first, and see if that helps:

        velo_eff = sqrt(velo_pos_k * velo_pos_k + accel * delta_s);
        idx_velo_avg = equidIND(velo_avg, funmaps_velo->map_velo_step_recip);
        idx_velo_eff = equidIND(velo_eff, funmaps_velo->map_velo_step_recip);

Also, check your compiler optimization settings. A decent compiler should be able to do this sort of thing for you.

