|
Code compilation issues
The code didn't compile cleanly for me.
There was a missing right paren on this line:
memcpy(&dest[dest_offset], &bitmapS[nBitmap][src_offset], sizeof(pixel);
I got a warning that memset was used without a prototype. I added #include <string.h> to fix that.
The code was missing the BMP and DIB structures. I substituted 14 for sizeof(BMP) and 40 for sizeof(DIB) to compensate.
Code style/formatting
Keeping in mind that style and formatting are somewhat subjective, here are some issues I spotted:
I didn't like the use of a nested for loop at the same indentation. It confused me a little bit:
for(i = 0; i < src_width; i++)
for(j = 0; j < src_height; j++)
I didn't like the "Windows-ization" of the types. Personally, I don't like seeing ddword , dword , and word because I really don't have any idea what those are. I prefer to just #include <stdint.h> and use uint64_t , uint32_t , and uint15_t . Plus, the stdint types are safer to use. Your dword type, for example, is presumably supposed to be 32 bits wide, but you typedef'd it to unsigned long . On a 64-bit machine, unsigned long is likely to be 64 bits wide.
Currently, bmp_rotate() takes an index as an argument, and then indexes into a global array of bitmaps. The code would be more reusable if the function took one bitmap as an argument, like this:
buffer bmp_rotate(const buffer bitmap, float angle);
Then it could be used from anywhere without being entangled with the global variable.
There is dead code in the inner loop. This code doesn't do anything:
deltaX = i + midX;
deltaY = j + midY;
Bugs in the code
The rotation is wrong. For example, if you pass in 0 for the angle, you should wind up with the same bitmap. Instead, you get a bitmap that is rotated 90 degrees to the right and mirror imaged. To fix it, I changed your rotation computation to this:
rotX = (dword)(midX + deltaX * cos(angle) + deltaY * sin(angle));
rotY = (dword)(midY - deltaX * sin(angle) + deltaY * cos(angle));
There are uninitialized pixels in the rotated bitmap. The destination bitmap is created like this:
buffer dest = malloc( src_pixels + src_nraw );
memcpy(dest, bitmapS[nBitmap], src_pixels);
This allocates a bitmap of the same size of the original and copies the header information from the original to the new bitmap. However, the pixels in the new bitmap are all uninitialized. Later, when doing the rotation, not all pixels in the new bitmap are set (because some are not mapped to any original pixel). So those pixels end up with a random value. You can fix this by simply using calloc() to allocate the new bitmap.
The offset of the pixels may not be right. Currently, the offset from the start of the bmp data to the first pixel is calculated like this:
const dword offset = sizeof(BMP) + sizeof(DIB);
I'm not an expert on bmp files, but I believe that the DIB part is a variable length part. So to get the right offset, you should be using the value you call src_pixels as your offset instead.
Speed issues covered up by the compiler
There are a couple of things that I spotted that you could have done better speedwise, but that the compiler (gcc -O4 ) seemed to fix for you.
Instead of using sin(angle) and cos(angle) in your inner loop, you could compute those once at the start of the function, and then use the precomputed versions later on:
float sin_angle = sin(angle);
float cos_angle = cos(angle);
...
rotX = (dword)(midX + deltaX * cos_angle + deltaY * sin_angle);
rotY = (dword)(midY - deltaX * sin_angle + deltaY * cos_angle);
Apparently, gcc is smart enough to know that sin() and cos() are pure functions, so it was able to optimize away the calls in the inner loop. Still, it's good practice to do this kind of optimization yourself. If instead of sin() and cos() you were calling your own function, then the compiler might not know the optimization was possible.
In bmp_find_xy() , all of the lines up to the computation of pixAddress are the same for a given sized bitmap. So you could do an optimization where you compute offset , rowsize , and single just once, and then use those precomputed values in the inner loop. Just as with #1, gcc was smart enough to do all that already.
Speed issues not covered by the compiler
The way that you order your nested loop, on every iteration, you move from one destination pixel to the one below it. If you swapped the two loops, you would instead move from one pixel to the one to its right. This is a better ordering, because it is more cache friendly. In fact, when I swapped the two loops, I was able to get a 5% speedup on a 45 degree rotation.
Oftentimes with loops, you can often simplify the calculation within an inner loop if you compute an initial value and then increment the value instead of recomputing the whole thing. For example, let's look at a sample loop:
for (i=0;i<n;i++) {
offset = i * ROW_WIDTH + startOffset;
do_something(offset);
}
You can simplify this to:
offset = 0 * ROW_WIDTH + startOffset;
for (i=0;i<n;i++) {
do_something(offset);
offset += ROW_WIDTH;
}
Using this technique, I simplified your main loop to this:
for(j = 0; j < src_height; j++) {
ddword dest_offset = src_pixels + j * rowsize;
float deltaY = j - midY;
float deltaX = 0 - midX;
float x = midX + deltaX * cos_angle + deltaY * sin_angle;
float y = midY - deltaX * sin_angle + deltaY * cos_angle;
for(i = 0; i < src_width; i++) {
dword rotX = x;
dword rotY = y;
if(rotX >= 0 && rotX < src_width && rotY >= 0 && rotY < src_height)
{
ddword src_offset = src_pixels + rotY * rowsize + rotX * single;
memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel));
}
x += cos_angle;
y -= sin_angle;
dest_offset += single;
}
}
Notice how the inner loop has replaced three complicated expressions with three simple additions. This change resulted in a 140% speedup (it was 2.4x faster)!
Memcpy isn't so great for only 3 bytes. I replaced your call to memcpy with some inlined code:
// Old code:
// memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel));
// New code:
dest[dest_offset+0] = bitmap[src_offset+0];
dest[dest_offset+1] = bitmap[src_offset+1];
dest[dest_offset+2] = bitmap[src_offset+2];
This resulted in a 9% speedup.
|