(preface - boring stuff, feel free to skip down to the implementation details)
I need to add exception handling to a language I am working on. It "compiles" to a subset of C, and since I don't want to make C++ a dependency, and found the few available C libraries rather stiff and lacking, the only way forward is to come up with an implementation of my own.
Naturally, I'd like it to be as efficient as possible. Exception handling schemes always seem to come with overhead: in some cases drastic performance hits, in others (almost) zero cost, at least when executing the non-throwing path, but even then there is overhead in terms of code size, memory usage and CPU time when exceptions are actually thrown.
Going down to the assembly level to take control of the stack is not an option either: although I could cook up a basic compiler that targets assembly for a platform or two, I am in no capacity to produce a professional-grade compiler, with all its optimizations and such, that can target as many platforms as, say, GCC. So I am stuck with C.
Which is not necessarily a bad thing. How would one go about handling errors in C? Check return values, check array bounds, check pointers before use, check for 0 before division - the good old tried and true ways.
But not necessarily convenient ones. Exceptions are intended to be more "coarse-grained", covering deep trees of operations that may fail in different configurations, which makes it an arduous task (to put it mildly) to propagate such a naive error checking scheme through all the code.
Implementation details
Luckily, the language I am working on has advanced meta capabilities, one of which is context-aware code that can modify itself and generate extra code. This means that tasks which would otherwise be considered impossibly tedious to do manually can be automated fairly easily. Thus my exception handling strategy begins to take shape.
Functions and operators that may fail are marked by a failSafe specifier, and as such are given an implicit parameter - essentially an integer, where 0 means all is good and nonzero means something went wrong. Internally, they do the good old fine-grained error checking and use the integer to signal the error, which is the actual exception throwing.
Functions can have both an unsafe and a fail-safe flavor. Functions that do not fail themselves, but contain invocations of functions that may fail, are also implicitly fail-safe.
A try block essentially creates that error code integer and forces the compiler to "roll out" all the code nested in the try block, switching to the overloads that propagate the integer through every stack frame containing code which may fail. Every explicitly fail-safe function gets a check inserted after its invocation; in case of an error, the caller does not resume but returns, hitting another such check all the way down to the try block, essentially unwinding the stack and collecting all the locals along the way.
It may seem that all the checks which reveal no error could be omitted in favor of an "escape" code path, entered on the first error and normally skipped on success, but saving the state needed to make all those jumps will likely cost more than skipping the checks of successful operations would save, on top of the extra complexity.
Another useful feature that is fairly easy to implement is the ability to filter what kinds of errors you want protection against. A try block may be set to protect only against specific errors likely or known to occur in that scenario, reducing the overhead from error checking and code duplication.
A 32-bit integer ought to be efficient enough to pass around, while still providing an ample 4+ billion error codes. Negative values are reserved for language and library use, positive ones for user exceptions. The exception's error string can be retrieved from a global enum which accumulates the available exception types transparently for the user - cryptic integer values are not a concern, since both the error and the exception itself are available as readable text. There is also no need for any extra allocations for the exception. Destructors are forbidden from throwing exceptions so as not to interfere with the stack unwinding. Lastly, nested exceptions are possible, in the sense that there can be nested try blocks, directly or indirectly, and exceptions which are not caught propagate down through the try blocks until the root try block. The application does not "crash" if an exception is not handled; the operation attempted in the root try block simply fails and program execution continues.
Unfortunately, it will inevitably lead to some code duplication, having both unsafe and exception-safe overloads, although I am not sufficiently aware of the details of "zero cost" implementations to really compare the amount of code they generate. The unwinding of the stack itself seems like it will be a little more efficient, since it does not involve loading any exception handling data, progress lookups or extra jumps - just call destructors, if any, and return to the previous stack frame. It requires no extras such as a particular well-defined ABI, no exception frames/tables/decoding, no PC lookups or any of the complexity typically associated with exception handling implementations, and it requires no compiler support (from the C compiler, that is, not from the compiler for my language). It works as if you had actually gone and painstakingly propagated error checking over the whole call chain.
A couple of things to consider in the context of CPU time overhead:
My compiler does a lot of inlining; practically all trivial calls, such as accessors and operators, are eliminated, since they cost more to execute as function calls than inline. On top of that, the C compiler is free to do even more inlining and optimization. Over 90% of the calls are eliminated (stats from the library code that accompanies the language) even before the code reaches the C compiler. This means a lot of the error code propagation is eliminated as well, and the variable will spend most of its time in a register, requiring only a single clock cycle per check.
Error handling is typically used around resource creation and allocation, involving high-latency operations such as memory allocation or random memory access, which cost a dozen to dozens of cycles each. I expect this to mask and mitigate the overhead somewhat.
Having outlined the design, here is what I am interested in:
- is it a sound design, does it make sense?
- am I missing something important?
- is there room for improvement?
- are there any downsides, flaws or limitations?
- how does it compare to existing implementations in regards to memory and CPU time overheads and capabilities (no concrete numbers naturally)?
Since several people mentioned profiling, it is testing time, using GCC 4.9.2 with -O3:
int datacount, callcount, repeatcount, limit;
int * data = 0;
void foo1();
void foo2();
void foo3();
void foo4();
bool bar1();
bool bar2();
bool bar3();
bool bar4();
void __attribute__ ((noinline)) foo1() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] += 1;
if ((callcount < (limit / 10)) && !(callcount % 3)) foo2();
if ((callcount < (limit / 5)) && !(callcount % 2)) foo3();
if (callcount < limit) foo4();
return;
}
bool __attribute__ ((noinline)) bar1() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] += 1;
if ((callcount < (limit / 10)) && !(callcount % 3)) if (!bar2()) return false;
if ((callcount < (limit / 5)) && !(callcount % 2)) if (!bar3()) return false;
if (callcount < limit) if (!bar4()) return false;
return callcount;
}
void __attribute__ ((noinline)) foo2() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] -= 1;
if ((callcount < (limit / 10)) && !(callcount % 3)) foo3();
if ((callcount < (limit / 5)) && !(callcount % 2)) foo4();
if (callcount < limit) foo1();
return;
}
bool __attribute__ ((noinline)) bar2() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] -= 1;
if ((callcount < (limit / 10)) && !(callcount % 3)) if (!bar3()) return false;
if ((callcount < (limit / 5)) && !(callcount % 2)) if (!bar4()) return false;
if (callcount < limit) if (!bar1()) return false;
return callcount;
}
void __attribute__ ((noinline)) foo3() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] *= 2;
if ((callcount < (limit / 10)) && !(callcount % 3)) foo4();
if ((callcount < (limit / 5)) && !(callcount % 2)) foo1();
if (callcount < limit) foo2();
return;
}
bool __attribute__ ((noinline)) bar3() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] *= 2;
if ((callcount < (limit / 10)) && !(callcount % 3)) if (!bar4()) return false;
if ((callcount < (limit / 5)) && !(callcount % 2)) if (!bar1()) return false;
if (callcount < limit) if (!bar2()) return false;
return callcount;
}
void __attribute__ ((noinline)) foo4() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] /= 2;
if ((callcount < (limit / 10)) && !(callcount % 3)) foo1();
if ((callcount < (limit / 5)) && !(callcount % 2)) foo2();
if (callcount < limit) foo3();
return;
}
bool __attribute__ ((noinline)) bar4() {
++callcount;
for (int i = 0; i < datacount; ++i) data[i] /= 2;
if ((callcount < (limit / 10)) && !(callcount % 3)) if (!bar1()) return false;
if ((callcount < (limit / 5)) && !(callcount % 2)) if (!bar2()) return false;
if (callcount < limit) if (!bar3()) return false;
return callcount;
}
Generated assembly:
foo4():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L2
movq data(%rip), %rdx
xorl %ecx, %ecx
.L3:
movl (%rdx), %eax
addl $1, %ecx
addq $4, %rdx
movl %eax, %esi
shrl $31, %esi
addl %esi, %eax
sarl %eax
movl %eax, -4(%rdx)
cmpl %ecx, datacount(%rip)
jg .L3
movl callcount(%rip), %ecx
.L2:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L4
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L10
.L4:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L5
testb $1, %cl
je .L11
.L5:
cmpl %esi, %ecx
jl .L12
addq $8, %rsp
ret
.L12:
addq $8, %rsp
jmp foo3()
.L11:
call foo2()
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L5
.L10:
call foo1()
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jmp .L4
foo1():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L14
movq data(%rip), %rax
xorl %edx, %edx
.L15:
addl $1, (%rax)
addl $1, %edx
addq $4, %rax
cmpl %edx, datacount(%rip)
jg .L15
movl callcount(%rip), %ecx
.L14:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L16
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L21
.L16:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L17
testb $1, %cl
je .L22
.L17:
cmpl %esi, %ecx
jl .L23
addq $8, %rsp
ret
.L23:
addq $8, %rsp
jmp foo4()
.L22:
call foo3()
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L17
.L21:
call foo2()
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jmp .L16
foo2():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L25
movq data(%rip), %rax
xorl %edx, %edx
.L26:
subl $1, (%rax)
addl $1, %edx
addq $4, %rax
cmpl %edx, datacount(%rip)
jg .L26
movl callcount(%rip), %ecx
.L25:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L27
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L32
.L27:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L28
testb $1, %cl
je .L33
.L28:
cmpl %esi, %ecx
jl .L34
addq $8, %rsp
ret
.L34:
addq $8, %rsp
jmp foo1()
.L33:
call foo4()
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L28
.L32:
call foo3()
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jmp .L27
foo3():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L36
movq data(%rip), %rax
xorl %edx, %edx
.L37:
sall (%rax)
addl $1, %edx
addq $4, %rax
cmpl %edx, datacount(%rip)
jg .L37
movl callcount(%rip), %ecx
.L36:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L38
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L43
.L38:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L39
testb $1, %cl
je .L44
.L39:
cmpl %esi, %ecx
jl .L45
addq $8, %rsp
ret
.L45:
addq $8, %rsp
jmp foo2()
.L44:
call foo1()
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L39
.L43:
call foo4()
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jmp .L38
bar4():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L47
movq data(%rip), %rdx
xorl %ecx, %ecx
.L48:
movl (%rdx), %eax
addl $1, %ecx
addq $4, %rdx
movl %eax, %esi
shrl $31, %esi
addl %esi, %eax
sarl %eax
movl %eax, -4(%rdx)
cmpl %ecx, datacount(%rip)
jg .L48
movl callcount(%rip), %ecx
.L47:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L49
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L63
.L49:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jg .L64
.L52:
cmpl %esi, %ecx
jl .L65
.L54:
testl %ecx, %ecx
setne %al
addq $8, %rsp
ret
.L64:
testb $1, %cl
jne .L52
call bar2()
testb %al, %al
je .L53
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L52
.L65:
call bar3()
testb %al, %al
je .L53
movl callcount(%rip), %ecx
jmp .L54
.L63:
call bar1()
testb %al, %al
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jne .L49
.L53:
xorl %eax, %eax
addq $8, %rsp
ret
bar1():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L67
movq data(%rip), %rax
xorl %edx, %edx
.L68:
addl $1, (%rax)
addl $1, %edx
addq $4, %rax
cmpl %edx, datacount(%rip)
jg .L68
movl callcount(%rip), %ecx
.L67:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L69
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L83
.L69:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jg .L84
.L72:
cmpl %esi, %ecx
jl .L85
.L74:
testl %ecx, %ecx
setne %al
addq $8, %rsp
ret
.L84:
testb $1, %cl
jne .L72
call bar3()
testb %al, %al
je .L73
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L72
.L85:
call bar4()
testb %al, %al
je .L73
movl callcount(%rip), %ecx
jmp .L74
.L83:
call bar2()
testb %al, %al
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jne .L69
.L73:
xorl %eax, %eax
addq $8, %rsp
ret
bar2():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L87
movq data(%rip), %rax
xorl %edx, %edx
.L88:
subl $1, (%rax)
addl $1, %edx
addq $4, %rax
cmpl %edx, datacount(%rip)
jg .L88
movl callcount(%rip), %ecx
.L87:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L89
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L103
.L89:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jg .L104
.L92:
cmpl %esi, %ecx
jl .L105
.L94:
testl %ecx, %ecx
setne %al
addq $8, %rsp
ret
.L104:
testb $1, %cl
jne .L92
call bar4()
testb %al, %al
je .L93
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L92
.L105:
call bar1()
testb %al, %al
je .L93
movl callcount(%rip), %ecx
jmp .L94
.L103:
call bar3()
testb %al, %al
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jne .L89
.L93:
xorl %eax, %eax
addq $8, %rsp
ret
bar3():
subq $8, %rsp
movl callcount(%rip), %eax
leal 1(%rax), %ecx
movl datacount(%rip), %eax
movl %ecx, callcount(%rip)
testl %eax, %eax
jle .L107
movq data(%rip), %rax
xorl %edx, %edx
.L108:
sall (%rax)
addl $1, %edx
addq $4, %rax
cmpl %edx, datacount(%rip)
jg .L108
movl callcount(%rip), %ecx
.L107:
movl limit(%rip), %esi
movl $1717986919, %edx
movl %esi, %eax
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl $2, %edx
subl %eax, %edx
cmpl %ecx, %edx
jle .L109
movl %ecx, %eax
movl $1431655766, %edx
imull %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
leal (%rdx,%rdx,2), %eax
cmpl %eax, %ecx
je .L123
.L109:
movl %esi, %eax
movl $1717986919, %edx
imull %edx
movl %esi, %eax
sarl $31, %eax
sarl %edx
subl %eax, %edx
cmpl %ecx, %edx
jg .L124
.L112:
cmpl %esi, %ecx
jl .L125
.L114:
testl %ecx, %ecx
setne %al
addq $8, %rsp
ret
.L124:
testb $1, %cl
jne .L112
call bar1()
testb %al, %al
je .L113
movl callcount(%rip), %ecx
movl limit(%rip), %esi
jmp .L112
.L125:
call bar2()
testb %al, %al
je .L113
movl callcount(%rip), %ecx
jmp .L114
.L123:
call bar4()
testb %al, %al
movl limit(%rip), %esi
movl callcount(%rip), %ecx
jne .L109
.L113:
xorl %eax, %eax
addq $8, %rsp
ret
I ran the test against an input parameter data set loaded from a file, to avoid any potential optimizations. The test ran 5 times in a row, for a total run time of about 38 minutes.
Initially foo is considerably faster - by 13.1% - but soon enough both equalize, with a peak of 13.5% in favor of bar in the last quarter of the test. All in all, over the duration of the entire test, bar actually takes the lead with an advantage of 3.36%.
I made a bunch of charts, hoping to find some correlation between the input parameters and the results, but so far I don't seem to detect any.
The following set of charts shows the relation between the result and the input parameters for every test run:
Total calls are complementary to the data size, since I aimed for fairly uniform running times. The time does fluctuate, though, from a little under 1 second to over 25 seconds.
Here is another set of charts, in which the test runs and results are sorted from best for foo to best for bar, overlaid with the test input parameters. Again, no correlation between parameters and results is visible to me:
Performance vs. test run time, which for some reason I forgot about earlier:
Finally, this mess: all of the charts from the last set laid over one another without any scale, in the hope of revealing a correlation: