Code Review Stack Exchange is a question and answer site for peer programmer code reviews.

I've written the following unsafe C# method to convert a byte array to Base64 encoding. It works, but it runs significantly slower than the built-in Convert.ToBase64String method.

    // Assumed definition of base64Table (the standard Base64 alphabet);
    // the question only says it is a char[] of the Base64 characters.
    private static readonly char[] base64Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();

    // Requires compiling with unsafe code allowed (/unsafe, AllowUnsafeBlocks).
    public static unsafe string From(byte[] data)
    {
        int div = data.Length / 3;   // number of complete 3-byte groups
        int mod = data.Length % 3;   // leftover bytes (0, 1 or 2) that need '=' padding
        int length = data.Length;
        int b64Length = div * 4 + (mod == 0 ? 0 : 4);

        int c = 0;
        char[] r = new char[b64Length];
        fixed (char* tblPointer = base64Table)
        fixed (char* rPointer = r)
        fixed (byte* dPointer = data)
        {
            // Each 3-byte group becomes four 6-bit indices into the table.
            for (int i = 0; i < div * 3; i += 3)
            {
                rPointer[c] = tblPointer[(dPointer[i] & 0xfc) >> 2];
                rPointer[c + 1] = tblPointer[((dPointer[i] & 0x03) << 4) | ((dPointer[i + 1] & 0xf0) >> 4)];
                rPointer[c + 2] = tblPointer[((dPointer[i + 1] & 0x0f) << 2) | ((dPointer[i + 2] & 0xc0) >> 6)];
                rPointer[c + 3] = tblPointer[((dPointer[i + 2]) & 0x3f)];
                c += 4;
            }
            // Encode the 1 or 2 leftover bytes and pad with '='.
            switch (mod)
            {
                case 1:
                    rPointer[c] = tblPointer[(dPointer[length - 1] & 0xfc) >> 2];
                    rPointer[c + 1] = tblPointer[((dPointer[length - 1] & 0x03) << 4)];
                    rPointer[c + 2] = '=';
                    rPointer[c + 3] = '=';
                    c += 4;
                    break;
                case 2:
                    rPointer[c] = tblPointer[(dPointer[length - 2] & 0xfc) >> 2];
                    rPointer[c + 1] = tblPointer[((dPointer[length - 2] & 0x03) << 4) | ((dPointer[length - 1] & 0xf0) >> 4)];
                    rPointer[c + 2] = tblPointer[((dPointer[length - 1] & 0x0f) << 2)];
                    rPointer[c + 3] = '=';
                    c += 4;
                    break;
            }
        }
        return new string(r);
    }

I looked at the Reference Source for the .NET method and found that my code is already very similar. Is there something I'm missing, or is there some optimization in the built-in method that I don't know about?

The variable base64Table in the code is simply a char[] with the relevant base64 characters.

The built-in method took 31 ticks and my method took 2230 ticks, measured with the System.Diagnostics.Stopwatch class.

    
Are you timing it in Debug or Release mode, and are you timing it on Any CPU, x86 or x64? – EBrown 4 hours ago
    
@EBrown Ahh I didn't even think of that! Was testing in Debug on Any CPU. I should try Release on 64 bit yes? – Luke Park 4 hours ago
    
If you have a 64-bit system, yes. Release -> x64 -> Build -> Open Folder -> Run. Then, for a proper benchmark, you should consider a benchmark tool (search for BenchmarkDotNet), but what you should do is loop, say, 10 times on each, discard those first 10 results, then loop 10 more times and take the averages for your comparison. (Right now you're likely also measuring JIT compilation of your code.) – EBrown 4 hours ago
    
@EBrown Hmmmm just tested and the performance gap is about the same, if anything, worse. – Luke Park 4 hours ago
I'll write an answer after I actually review the code. :) You're not getting away that easily. – EBrown 4 hours ago

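First and foremost: testing in Debug / Any CPU is bad. Debug builds carry a lot of extra overhead (the compiler emits unoptimized IL and the JIT skips most of its optimizations), and Any CPU with "Prefer 32-bit" enabled may run your code as a 32-bit process even on a 64-bit machine. Both can distort your measurements.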
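To confirm which conditions a timing run actually used, a small helper can be printed alongside the results. This is a sketch; `BenchmarkEnvironment` and `Describe` are hypothetical names. `#if DEBUG` reflects how the assembly containing the helper was compiled, while `Environment.Is64BitProcess` (.NET 4.0+) and `Debugger.IsAttached` are standard .NET APIs. An attached debugger matters because Visual Studio, by default, suppresses JIT optimizations when launching under the debugger, even for Release builds.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkEnvironment
{
    // Summarizes the build configuration, process bitness and debugger state
    // so that a Debug build or a 32-bit process is easy to spot in the output.
    public static string Describe()
    {
#if DEBUG
        const string build = "Debug";
#else
        const string build = "Release";
#endif
        return string.Format("build={0}, 64-bit process={1}, debugger attached={2}",
            build, Environment.Is64BitProcess, Debugger.IsAttached);
    }
}
```

Calling `Console.WriteLine(BenchmarkEnvironment.Describe());` before the timing loop makes it obvious when a run was made under the wrong configuration.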

Next, when you do your testing you should consider (I say consider because when doing simple comparisons you don't need to, but when trying to prove something you should definitely do so) using a proper benchmark tool like BenchmarkDotNet. (It's in NuGet so it's simple to install.)

However, if you choose not to (I'm not going to judge you for that), you should not measure the first execution of your code. Loop it a few times (I usually use 10-128 iterations, depending on how fast the code is) and discard those results, then loop again and keep those results. Take the average as your metric.

Why?

The first time you execute your code, the JIT (Just-In-Time compiler) compiles it from IL to native code. This adds substantial overhead to the first execution, sometimes a lot of it, and can (and will) skew your results.
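The warm-up-then-measure approach described above can be sketched with `System.Diagnostics.Stopwatch`. This is a minimal illustration, not a substitute for BenchmarkDotNet; the class name `WarmTimer` and its parameters are hypothetical. Note that `Stopwatch.ElapsedTicks` counts Stopwatch ticks (see `Stopwatch.Frequency`), not `TimeSpan` ticks.

```csharp
using System;
using System.Diagnostics;

public static class WarmTimer
{
    // Calls `action` `warmup` times without timing it (so JIT compilation and
    // other first-call costs are excluded), then times `runs` further calls
    // and returns the mean elapsed Stopwatch ticks per call.
    public static double MeanTicks(Action action, int warmup = 10, int runs = 10)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        for (int i = 0; i < warmup; i++)
            action(); // results discarded: these calls include the JIT cost

        var sw = new Stopwatch();
        long total = 0;
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            total += sw.ElapsedTicks;
        }
        return (double)total / runs;
    }
}
```

For example, `WarmTimer.MeanTicks(() => Convert.ToBase64String(data))` can be compared with the same call wrapping the question's `From` method, using the same `data` array for both. Very short calls are close to the timer's resolution, so a larger input or more runs gives steadier numbers.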


Now that the lecture is over, let's look over the code quick:

You have a lot of "magic numbers" here, most notably 4 and 3. You should consider giving them const identifiers. (Consider what 4 and 3 mean in each instance, and what they would mean if you were writing a Base32 converter instead of Base64.)
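As a sketch of what named constants could look like (the names `Base64Sizes`, `BytesPerGroup`, `CharsPerGroup` and `EncodedLength` are hypothetical), here is the question's output-length calculation rewritten with them. Base64 packs 3 bytes (24 bits) into 4 characters of 6 bits each; a Base32 encoder would instead group 5 bytes (40 bits) into 8 characters of 5 bits.

```csharp
using System;

public static class Base64Sizes
{
    // 3 input bytes = 24 bits = 4 output characters of 6 bits each.
    public const int BytesPerGroup = 3;
    public const int CharsPerGroup = 4;

    // Padded output length: every started group of input bytes produces a full
    // group of output characters. Same formula as the question's
    // `div * 4 + (mod == 0 ? 0 : 4)`, with the magic numbers named.
    public static int EncodedLength(int byteCount)
    {
        if (byteCount < 0) throw new ArgumentOutOfRangeException(nameof(byteCount));
        int div = byteCount / BytesPerGroup;
        int mod = byteCount % BytesPerGroup;
        return div * CharsPerGroup + (mod == 0 ? 0 : CharsPerGroup);
    }
}
```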

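Your for loop (for (int i = 0; i < div * 3; i += 3)) evaluates div * 3 in every condition check as written; you could compute it once in a local before the loop. (You can't simply use length as the bound, though: when length isn't a multiple of 3, the loop body would read dPointer[i + 1] and dPointer[i + 2] past the end of the array.) Likewise, you should assign length first, then use it instead of data.Length everywhere. I.e.: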

    int div = data.Length / 3;
    int mod = data.Length % 3;

Should instead be:

    int div = length / 3;
    int mod = length % 3;
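A safe-code sketch of the main loop with its bound computed once before the loop. `Base64FullGroups`, `EncodeFullGroups` and `fullLength` are hypothetical names, and the table is assumed to be the standard Base64 alphabet. The bound stops at the last complete 3-byte group, so the `i + 1` and `i + 2` reads never go past the end of the array.

```csharp
using System;

public static class Base64FullGroups
{
    private static readonly char[] Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();

    // Encodes only the complete 3-byte groups of `data` into `output`
    // and returns the number of characters written.
    public static int EncodeFullGroups(byte[] data, char[] output)
    {
        int length = data.Length;
        // Index just past the last complete group, computed once.
        // Using `length` as the bound instead would read data[i + 1] and
        // data[i + 2] out of range whenever length % 3 != 0.
        int fullLength = length - length % 3;

        int c = 0;
        for (int i = 0; i < fullLength; i += 3)
        {
            output[c]     = Table[(data[i] & 0xfc) >> 2];
            output[c + 1] = Table[((data[i] & 0x03) << 4) | ((data[i + 1] & 0xf0) >> 4)];
            output[c + 2] = Table[((data[i + 1] & 0x0f) << 2) | ((data[i + 2] & 0xc0) >> 6)];
            output[c + 3] = Table[data[i + 2] & 0x3f];
            c += 4;
        }
        return c;
    }
}
```

With the 4-byte input "Many", only the first group ("Man") is encoded, giving "TWFu"; the leftover byte is left for the tail handling.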

Since you use dPointer[length - 1] and dPointer[length - 2] so frequently, you should consider extracting them into local variables for an extra micro-performance boost.
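Putting the review points together, here is a safe-code sketch (no `unsafe`, no pointers) with named constants, a precomputed loop bound, and the tail bytes read once into locals. `Base64Sketch.Encode` is a hypothetical name and the standard Base64 alphabet is assumed. Per the comment below, the locals mainly help readability; the JIT may well produce similar code either way.

```csharp
using System;

public static class Base64Sketch
{
    private const int BytesPerGroup = 3;
    private const int CharsPerGroup = 4;
    private static readonly char[] Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();

    public static string Encode(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        int length = data.Length;
        int mod = length % BytesPerGroup;
        int fullLength = length - mod; // end of the last complete group
        var r = new char[(length / BytesPerGroup) * CharsPerGroup + (mod == 0 ? 0 : CharsPerGroup)];

        int c = 0;
        for (int i = 0; i < fullLength; i += BytesPerGroup)
        {
            r[c]     = Table[(data[i] & 0xfc) >> 2];
            r[c + 1] = Table[((data[i] & 0x03) << 4) | ((data[i + 1] & 0xf0) >> 4)];
            r[c + 2] = Table[((data[i + 1] & 0x0f) << 2) | ((data[i + 2] & 0xc0) >> 6)];
            r[c + 3] = Table[data[i + 2] & 0x3f];
            c += CharsPerGroup;
        }

        // Tail: read each leftover byte once into a named local.
        if (mod == 1)
        {
            byte last = data[length - 1];
            r[c]     = Table[(last & 0xfc) >> 2];
            r[c + 1] = Table[(last & 0x03) << 4];
            r[c + 2] = '=';
            r[c + 3] = '=';
        }
        else if (mod == 2)
        {
            byte first = data[length - 2];
            byte second = data[length - 1];
            r[c]     = Table[(first & 0xfc) >> 2];
            r[c + 1] = Table[((first & 0x03) << 4) | ((second & 0xf0) >> 4)];
            r[c + 2] = Table[(second & 0x0f) << 2];
            r[c + 3] = '=';
        }
        return new string(r);
    }
}
```

Comparing its output against `Convert.ToBase64String` for random inputs of several lengths (including 0, 1 and 2 leftover bytes) is a quick way to check a refactor like this before timing it.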

Those are the only complaints I have, good work!

    
+1 One thing though: storing things in variables (div * 3, length, dPointer[length - 1]) won't improve performance. While a naive compilation would pointlessly repeat things like array range checks, in practice the compiler will certainly eliminate common subexpressions if they're worth eliminating. Readability should be the OP's main concern here. – Nathan Cooper 31 mins ago
    
Do you have anything to back up your claim that AnyCPU has an impact on performance? – linac 12 mins ago
