Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

This is some fairly heavily optimized C#. Aside from reading like C++ code due to all of the pointers, does this pass sanity checks, and is there any way to improve it, while keeping its speed (or perhaps a performant way to fix the weighting issue)? For reference, it takes ~170ns to produce each 16 character string when averaged out over 1 million iterations.

Requirements were that it produce Base36 (ie. all capital letters and numbers) strings of specific sizes (typically 16 in length), with a fairly even distribution (ie. avoiding collisions). I'm aware of the slight weighting towards the lowerbound characters (specifically the first 15 in the constant due to ushort not lining up with a multiple of 36)

    private const int PoolSize = 4096;
    private const string Base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ‏";
    private static readonly Lazy<RNGCryptoServiceProvider> Crypto = new Lazy<RNGCryptoServiceProvider>(() => new RNGCryptoServiceProvider());
    private static readonly ThreadLocal<byte[]> BytePoolProvider = new ThreadLocal<byte[]>(() => new byte[PoolSize]);
    private static ThreadLocal<int> currentIndex = new ThreadLocal<int>(() => PoolSize);

    public static unsafe string Create(int size)
    {
        if (size > 1024) return CreateLarge(size); //not included in this post, essentially the same but allocates a new byte array every time
        //Get a reference to the byte pool for our thread.
        var bytePool = BytePoolProvider.Value;
        //Get a reference to the current index for our thread.
        var index = currentIndex.Value;

        //Allocate a new string of length size.
        var newString = new string('\0', size);
        //Get a pointer to the first character of newString and Base36
        fixed (char* buffer = newString, b36 = Base36)
        {
            //reset the byte pool if we'd go out of bounds on it.
            if (index + size * 2 > PoolSize)
            {
                Crypto.Value.GetBytes(bytePool);
                index = 0;
            }
            //Get a pointer to the byte pool at our current index
            fixed (byte* pbyte = &bytePool[index])
            {
                for (var cnt = 0; cnt < size; cnt++)
                {
                    //Now here's where the real "magic" begins.
                    //&pbyte[cnt * 2] obtains a pointer to the byte by current step, skipping every other byte because we're casting to ushort
                    //*((ushort*) lets us use the data at the pointer (and its next byte) as a ushort, which is then truncated to 36 via modulous,
                    //and then used to index on the pointer to the base36 constant string.
                    buffer[cnt] = b36[(*((ushort*) &pbyte[cnt*2]))%36];
                }
                //Increment the index for this thread.
                currentIndex.Value = index + size*2;
                //After we've done all that, we want the string, right?
                return newString;
            }
        }
    }
share|improve this question
    
This is a really nice piece of code. One question Crypto.Value.GetBytes(bytePool); fills the byte array with random values -- have you considered doing the work yourself with a fixed value? I would assume that it might be a tiny bit more performant since you don't have to generate the random values and the values by themselves don't matter anyway. – Jeroen Vannevel Aug 28 '15 at 18:25
    
@JeroenVannevel: The values do matter, they're being used to index on the Base36 string for the random characters. They need to be not just pseudorandom, but have a good bit of entropy to avoid collisions. – willaien Aug 28 '15 at 18:33
up vote 1 down vote accepted

After some review, I've ended up coming up with the following, which satisfies my request for performance enhancements (~2x faster than the code in my original question), and accuracy (shouldn't have any problems with weighting towards certain characters) by taking a few shortcuts (took out the modulus operator by expanding the Base36 constant, etc.) and simply using a single byte from the pool per character by arbitrarily skipping any above 251. I'll keep the question open for a while to see if anyone else comes along with better ideas.

    private const int PoolSize = 4096;
    private const string Base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ‏";
    private static readonly Lazy<RNGCryptoServiceProvider> Crypto = new Lazy<RNGCryptoServiceProvider>(() => new RNGCryptoServiceProvider());
    private static readonly ThreadLocal<byte[]> BytePoolProvider = new ThreadLocal<byte[]>(() =>
    {
        var pool = new byte[PoolSize];
        Crypto.Value.GetBytes(pool);
        return pool;
    });
    private static ThreadLocal<int> currentIndex = new ThreadLocal<int>(() => 0);

    public static unsafe string Create(int size)
    {
        //Get a reference to the byte pool for our thread.
        var bytePool = BytePoolProvider.Value;
        //Get a reference to the current index for our thread.
        var index = currentIndex.Value;

        //Allocate a new string of length size.
        var newString = new string('\0', size);
        //Get a pointer to the first character of newString and Base36
        fixed (char* buffer = newString, b36 = Base36)
        {
            //Get a pointer to the byte pool
            fixed (byte* pbyte = bytePool)
            {
                for (var cnt = 0; cnt < size; cnt++)
                {
                    //Bounds check the pool, make sure we're not going over.
                    if (++index >= PoolSize)
                    {
                        Crypto.Value.GetBytes(bytePool);
                        index = 0;
                    }
                    byte currentValue;
                    while ((currentValue = pbyte[index]) >= 252) //Skip bytes of value greater than 251, as they will skew our results.
                    {
                        if (++index >= PoolSize)
                        {
                            Crypto.Value.GetBytes(bytePool);
                            index = 0;
                        }
                    }
                    //Yay for unsafe. These are all pinned, and bounds checked before we get to here, so, we can skip the bounds check.
                    buffer[cnt] = b36[currentValue];
                }
                //Increment the index for this thread.
                currentIndex.Value = index;
                //After we've done all that, we want the string, right?
                return newString;
            }
        }
    }
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.