
I stumbled upon this piece of code in .NET's List source code:

// Following trick can reduce the range check by one
if ((uint) index >= (uint)_size) {
  ThrowHelper.ThrowArgumentOutOfRangeException();
}

Apparently this is more efficient (?) than if (index < 0 || index >= _size)

I am curious about the rationale behind the trick. Is a single branch instruction really more expensive than two conversions to uint? Or is there some other optimization going on that will make this code faster than an additional numeric comparison?

To address the elephant in the room: yes, this is micro optimization, no I do not intend to use this everywhere in my code – I'm just curious ;)
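
For a quick sanity check of the claimed equivalence, here is a minimal console sketch (illustrative only; it assumes _size is non-negative, which always holds for List<T>):

using System;

int _size = 4;
foreach (int index in new[] { int.MinValue, -1, 0, 3, 4, int.MaxValue })
{
    bool classic = index < 0 || index >= _size;   // two comparisons
    bool trick   = (uint)index >= (uint)_size;    // one unsigned comparison
    Console.WriteLine($"{index,11}: classic={classic}, trick={trick}");
}
// Both columns agree for every index precisely because _size >= 0.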

Curiosity in this case is easily satisfied with a few lines of code. Test it. –  Sam Axe 14 hours ago
@SamAxe a test would only confirm that casts are faster (if they are), not explain why. –  enzi 14 hours ago
A peek at the resultant IL would answer the why. –  Sam Axe 14 hours ago
The two "conversions" to uint are free - the same bit patterns will exist in the same register (or in memory, but if you're lucky, in a register) –  Damien_The_Unbeliever 14 hours ago
Related article(which also includes a cast from int to uint): codeproject.com/Articles/8052/… –  Tim Schmelter 14 hours ago

5 Answers

Accepted answer (21 votes)

From MS Partition I, section 12.1 (Supported data types):

The signed integer types (int8, int16, int32, int64, and native int) and their corresponding unsigned integer types (unsigned int8, unsigned int16, unsigned int32, unsigned int64, and native unsigned int) differ only in how the bits of the integer are interpreted. For those operations in which an unsigned integer is treated differently from a signed integer (e.g., in comparisons or arithmetic with overflow) there are separate instructions for treating an integer as unsigned (e.g., cgt.un and add.ovf.un).

That is, the conversion from an int to a uint is merely a matter of book-keeping - from that point on, the value on the stack/in a register is simply known to be an unsigned int rather than an int.

So the two conversions should be "free" once the code is JITted, and then the unsigned comparison operation can be performed.
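
To make the quoted point concrete, the same 32-bit pattern gives different answers depending on whether the signed or the unsigned comparison instruction is used (an illustrative sketch, not from the spec):

using System;

int a = -1;   // bit pattern 0xFFFFFFFF
int b = 1;

Console.WriteLine(a < b);              // True:  signed compare (clt/blt), -1 < 1
Console.WriteLine((uint)a < (uint)b);  // False: unsigned compare (clt.un/blt.un), 4294967295 < 1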

I'm a bit surprised that the compiler doesn't implement this optimization automatically. (Or does it?) –  Ilmari Karonen 12 hours ago
@IlmariKaronen How could it? It doesn't know the meaning of the values. C# and .NET are both very specifically defined (unlike, say, C++), and this is exactly the kind of "optimization" that would be pretty unsafe to do in general. Not to mention that the JIT compiler doesn't really have enough time to look for these kinds of "optimizations", and the C# compiler itself doesn't really do a whole lot of optimizations. Just write clear code unless you can prove the performance benefits (in a place where you actually care about performance). –  Luaan 12 hours ago
@Luaan: Ah, yes, I see... the problem, presumably, is that the compiler isn't smart enough to know that _size cannot be negative, so it cannot safely apply the optimization (since it's only valid when (int)_size >= 0). –  Ilmari Karonen 11 hours ago
The two conversions are free before the code is jitted; since they're only ever used as locals the difference is between blt or blt.s and blt.un or blt.un.s, there's no need for the CIL produced from the C# to involve any actual conversion at all. –  Jon Hanna 10 hours ago
@Luaan: It couldn't safely do it in that case, either, since the optimization also breaks if _size > int.MaxValue. The compiler could do the optimization if _size was a non-negative constant, or if it could infer from earlier code that the value of _size was always between 0 and int.MaxValue inclusive. And yes, modern compilers do commonly perform that kind of data flow analysis, although obviously there are limits to how much of it they can do (since the full problem is Turing-complete). –  Ilmari Karonen 9 hours ago
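
To make the comments' point concrete, here is a hypothetical case where the two checks disagree (List<T>'s _size is never negative, but a compiler would have to prove that before applying the rewrite):

using System;

int _size = -5;   // hypothetical: a compiler cannot rule this out without data-flow analysis
int index = 2;

Console.WriteLine(index < 0 || index >= _size);   // True:  original check would throw
Console.WriteLine((uint)index >= (uint)_size);    // False: 2 >= 4294967291 is false, no throw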

Note that this trick won't work if your project is compiled as checked instead of unchecked. In the best case it will be slower (because each cast has to be checked against overflow), or at least no faster; in the worst case you will get an OverflowException if you try to pass -1 as the index (instead of the expected exception).

If you want to write it "correctly", in a way that will surely work, you should put an

unchecked
{
    // the test from the question, guaranteed to wrap instead of throwing:
    if ((uint)index >= (uint)_size)
        ThrowHelper.ThrowArgumentOutOfRangeException();
}

all around the test.
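
To illustrate the difference between the two contexts (a small illustrative sketch, not part of the original answer):

using System;

int index = -1;

// In an unchecked context (the C# default) the cast just reinterprets the bits:
unchecked { Console.WriteLine((uint)index); }        // 4294967295

// In a checked context the very same cast throws instead:
try { checked { Console.WriteLine((uint)index); } }
catch (OverflowException) { Console.WriteLine("checked cast threw OverflowException"); }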

Checking for overflow generally doesn't slow anything down, considering how overflow checking happens on the chip. It will of course indeed throw if done in a checked context rather than the normal unchecked. This doesn't really answer the question though. –  Jon Hanna 10 hours ago
@JonHanna Yep... But it was too long for a comment, and a good response was already up. And on speed: if you look at the disassembly of a checked cast (Release mode, so optimized code), I see a cmp dword ptr [rsp+64h],0 / jl 00000000000000A2, so the cast explicitly does a comparison (if the int is < 0, then jump). –  xanatos 9 hours ago

Assuming _size is an integer, private to the list, and index is the argument to this function whose validity needs to be tested.

Assuming further that _size is always >= 0.

Then the original test would have been:

if(index < 0 || index >= _size) throw exception

The optimized version

if((uint)index >= (uint)_size) throw exception

has one comparison (as opposed to two in the former example). Because the cast just reinterprets the bits and makes the >= in fact an unsigned comparison, no additional CPU cycles are used for it.

Why does it work?

The results are trivially the same as long as index >= 0.

If index < 0, (uint)index will turn it into a very large number:

Example: 0xFFFFFFFF is -1 as an int, but 4294967295 as a uint, thus

(uint)-1 >= (uint)x 

is always true when x is a non-negative int.

In a checked context you get an OverflowException. In an unchecked context you cannot rely on the result: "The result of the conversion is an unspecified value of the destination type." stackoverflow.com/questions/22757239/… –  Tim Schmelter 13 hours ago
@Tim the quote seems to be about "For a conversion from float or double to an integral type, the processing depends on the overflow checking context", so not int -> uint –  xanatos 12 hours ago
@xanatos: but the rules are the same, if the cast fails you get an OverflowException with checked context and T-MaxValue in unchecked context. So this returns uint.MaxValue: unchecked { uint ui = (uint)-1; };. But it is not guaranteed. If you try that with checked you get a compiler error with the -1 constant and, if you use a variable, an OverflowException at runtime. Btw, I was referring to "If index < 0 the (uint)index will turn it into a very large number:...." –  Tim Schmelter 12 hours ago
@TimSchmelter: Just to clarify, while unchecked (uint)-1 equals uint.MaxValue, unchecked (uint)-2 doesn't -- it equals uint.MaxValue-1, instead. Both are "very large" -- in fact, strictly larger than int.MaxValue -- though. –  Ilmari Karonen 11 hours ago
@TimSchmelter the meaning of this cast is indeed defined and exactly what is needed here. You're quoting an answer that is quoting the wrong part of §6.2.1. The relevant case here is mentioned a couple of paragraphs before that quote begins, giving the result of "then the source value is treated as a value of the destination type.". –  Jon Hanna 9 hours ago

Let's say we have:

public void TestIndex1(int index)
{
  if(index < 0 || index >= _size)
    ThrowHelper.ThrowArgumentOutOfRangeException();
}
public void TestIndex2(int index)
{
  if((uint)index >= (uint)_size)
    ThrowHelper.ThrowArgumentOutOfRangeException();
}

Let's compile these and look at the output in ILSpy:

.method public hidebysig 
    instance void TestIndex1 (
        int32 index
    ) cil managed 
{
    IL_0000: ldarg.1
    IL_0001: ldc.i4.0
    IL_0002: blt.s IL_000d
    IL_0004: ldarg.1
    IL_0005: ldarg.0
    IL_0006: ldfld int32 TempTest.TestClass::_size
    IL_000b: blt.s IL_0012
    IL_000d: call void TempTest.ThrowHelper::ThrowArgumentOutOfRangeException()
    IL_0012: ret
}

.method public hidebysig 
    instance void TestIndex2 (
        int32 index
    ) cil managed 
{
    IL_0000: ldarg.1
    IL_0001: ldarg.0
    IL_0002: ldfld int32 TempTest.TestClass::_size
    IL_0007: blt.un.s IL_000e
    IL_0009: call void TempTest.ThrowHelper::ThrowArgumentOutOfRangeException()
    IL_000e: ret
}

It's easy to see that the second has less code, with one less branch.

Really, there's no cast at all; the choice is whether to use two signed branches (blt.s) or a single unsigned branch (blt.un.s), where the latter treats the integers passed as unsigned while the former treats them as signed.

(Note for those not familiar with CIL, since this is a C# question with a CIL answer: blt.s and blt.un.s are the "short" versions of blt and blt.un respectively. blt pops two values off the stack and branches if the first is less than the second when considering them as signed values, while blt.un pops two values off the stack and branches if the first is less than the second when considering them as unsigned values.)

It's utterly a micro-optimization, but there are times when micro-optimizations are worth doing. Consider further that, combined with the rest of the code in the method body, this could mean the difference between the method falling inside the jitter's limits for inlining or not. And if they're bothering to have a helper for throwing out-of-range exceptions, they're probably trying to ensure inlining happens wherever possible, so the extra 4 bytes could make all the difference.

Indeed, it's quite likely that the inlining difference matters far more than the removal of one branch. There aren't many occasions when going out of your way to ensure inlining happens is worth it, but a core method of a class as heavily used as List<T> would certainly be one of them.
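
As a sketch of the shape being described (TinyList, its _items field and the local ThrowHelper below are illustrative stand-ins, not the real List<T> source): the throw lives in a separate helper, so the indexer body stays small, and the cheaper check shrinks it further, improving its odds of being inlined.

using System;

internal static class ThrowHelper
{
    // Centralising the throw keeps the callers' method bodies small.
    public static void ThrowArgumentOutOfRangeException() =>
        throw new ArgumentOutOfRangeException("index");
}

public class TinyList<T>
{
    private readonly T[] _items = new T[4];
    private int _size;

    public T this[int index]
    {
        get
        {
            // One unsigned compare covers both bounds; the body stays tiny.
            if ((uint)index >= (uint)_size)
                ThrowHelper.ThrowArgumentOutOfRangeException();
            return _items[index];
        }
    }
}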

I wish languages would include a construct to test whether a variable is within a certain range, so one wouldn't have to guess what optimizers will or won't do. If a micro-optimization like that could double the speed of a tight loop on some systems (entirely possible if it nudges an "in-lining" decision threshold), it may be very worth doing; if it won't offer any real speed-up on any systems, however, one should use the more readable form. I find it irksome when the most readable form of an expression might be comparable to the fastest form, but might not. –  supercat 9 hours ago
@supercat I agree in general, though here it requires wider knowledge to know that because _size is already guaranteed to be greater than 0 then (index < 0 || index < _size) == ((uint)index >= (uint)_size). Of course, a compiler that could use code-contracts as part of its optimisation decision-making could certainly do something like this, but then even optimising to beat the (moving target of the) inlining limit is a peculiar case in itself in some ways. –  Jon Hanna 9 hours ago

Yes, this is more efficient. The JIT does the same trick when range-checking array accesses.

The transformation and the reasoning behind it are as follows:

i >= 0 && i < array.Length becomes (uint)i < (uint)array.Length, because array.Length <= int.MaxValue, so array.Length has the same value as (uint)array.Length. If i happens to be negative, then (uint)i > int.MaxValue >= array.Length, and the check fails.
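
A hand-written guard equivalent to that transformation might look like this (an illustrative sketch of the pattern, not the JIT's actual output):

using System;

static int GetChecked(int[] array, int i)
{
    // Single unsigned compare rejects both i < 0 and i >= array.Length.
    if ((uint)i >= (uint)array.Length)
        throw new IndexOutOfRangeException();
    return array[i];
}

Console.WriteLine(GetChecked(new[] { 10, 20, 30 }, 2));   // 30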

