The misunderstanding of floating point arithmetic and its shortcomings is a major cause of surprise and confusion in programming (consider the number of Stack Overflow questions about "numbers not adding correctly"). Since many programmers have yet to grasp its implications, it has the potential to introduce many subtle bugs, especially into financial software. What can programming languages do to avoid its pitfalls for those unfamiliar with the concepts, while still offering its speed, when accuracy is not critical, to those who do understand them?
---
You say "especially for financial software", which brings up one of my pet peeves: money is not a float, it's an int. Sure, it looks like a float: it has a decimal point in there. But that's just because you're used to units that confuse the issue. Money always comes in integer quantities. In America, it's cents. (In certain contexts I think it can be mills, but ignore that for now.) So when you say $1.23, that's really 123 cents. Always, always, always do your math in those terms, and you will be fine.
Answering the question directly: programming languages should just include a Money type as a reasonable primitive. Update: OK, I should have only said "always" twice rather than three times. Money is indeed always an int; those who think otherwise are welcome to try sending me 0.3 cents and showing me the result on your bank statement. But as commenters point out, there are rare exceptions when you need to do floating-point math on money-like numbers, e.g. certain kinds of prices or interest calculations. Even then, those should be treated as exceptions. Money comes in and goes out as integer quantities, so the closer your system hews to that, the saner it will be.
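The integer-cents approach can be sketched in a few lines (the helper names here are illustrative, not from any real library):

```python
# Sketch of the "money is an int" approach: every amount is an integer
# count of cents, so addition is exact integer arithmetic.

def to_cents(dollars: int, cents: int) -> int:
    """Combine dollars and cents into a single integer number of cents."""
    return dollars * 100 + cents

def format_cents(total: int) -> str:
    """Render integer cents as a dollar string, only at the boundary."""
    return f"${total // 100}.{total % 100:02d}"

price = to_cents(1, 23)   # $1.23 stored as 123 cents
tax   = to_cents(0, 7)    # $0.07 stored as 7 cents
print(format_cents(price + tax))  # prints "$1.30"
```

The division by 100 happens only when formatting for display; all arithmetic stays in exact integers.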
---
Providing support for a decimal type helps in many cases. Many languages have one, but it is underused. Understanding the approximation that occurs when working with representations of real numbers is important whichever type you use. Providing an "approximately equals" operator might also help, but such comparisons are problematic: note that $0.9999 trillion is approximately equal to $1 trillion. Could you please deposit the difference in my bank account?
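Python's standard `decimal` module illustrates the difference between binary floating point and a decimal type:

```python
# Binary floating point cannot represent 0.1 or 0.2 exactly, so their sum
# is not exactly 0.3; decimal arithmetic on the same values is exact.
from decimal import Decimal

print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

Note that the decimals are constructed from strings; `Decimal(0.1)` would faithfully capture the binary approximation instead.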
---
I don't believe anything can or should be done at a language level.
---
We were told what to do in the first-year (sophomore) lecture in computer science when I went to university (the course was a prerequisite for most science courses as well). I recall the lecturer saying: "Floating point numbers are approximations. Use integer types for money. Use FORTRAN or another language with BCD numbers for accurate computation." He then demonstrated the approximation with that classic example of 0.2, which is impossible to represent exactly in binary floating point. It also turned up that week in the laboratory exercises. Same lecture: "If you must get more accuracy from floating point, sort your terms. Add small numbers together before adding them to big numbers." That stuck in my mind. A few years ago I had some spherical geometry that needed to be very accurate, and still fast. 80-bit doubles on PCs were not cutting it, so I added types to the program that sorted terms before performing commutative operations. Problem solved. Before you complain about the quality of the guitar, learn to play. I had a co-worker four years ago who'd worked for JPL. He expressed disbelief that we used FORTRAN for some things. (We needed super-accurate numerical simulations calculated offline.) "We replaced all that FORTRAN with C++," he said proudly. I stopped wondering why they missed a planet.
---
The two biggest problems involving floating point numbers are:

1. Applying a value with the wrong units (losing track of what the number measures).
2. Treating approximate values as if they were exact.

The first type of failure can only be remedied by providing a composite type that includes value and unit information; a length type that carried its unit, for example, would prevent feet from being added to metres. The second type of failure is a conceptual failure. It manifests itself when people think of floating point values as absolute numbers; it affects equality operations, cumulative rounding errors, and so on. For example, for one system two measurements may be equivalent within a certain margin of error: .999 and 1.001 are roughly the same as 1.0 when you don't care about differences smaller than +/- .1. However, not all systems are that lenient. If any language-level facility is needed, I would call it equality precision. In NUnit, JUnit, and similarly constructed testing frameworks, you can control the precision that is considered correct by passing a delta to the assertion.
If, for example, C# or Java were altered to include a precision operator, equality could be written to tolerate a stated margin of error. However, if you supply a feature like that, you also have to consider the case where the +/- sides of the tolerance are not the same. For example, a tolerance of +1/-10 would consider two numbers equivalent if the second was within 1 more, or 10 less, than the first. To handle this case, you might need to let the two bounds be specified separately.
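As a sketch of what such an "equality precision" facility might look like, including the asymmetric +/- case, here is a small function (the name and signature are hypothetical, not from any language or framework):

```python
# Hypothetical "equality precision" check with independent upper and
# lower tolerances, as discussed above.

def approx_equal(a: float, b: float, plus: float, minus: float) -> bool:
    """True if b falls within [a - minus, a + plus]."""
    return a - minus <= b <= a + plus

print(approx_equal(1.0, 1.001, plus=0.1, minus=0.1))     # True
print(approx_equal(100.0, 99.5, plus=1.0, minus=10.0))   # True: within -10
print(approx_equal(100.0, 101.5, plus=1.0, minus=10.0))  # False: beyond +1
```

A symmetric version of this idea already exists in Python as `math.isclose`, which takes relative and absolute tolerances.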
---
By default, languages should use arbitrary-precision rationals for non-integer numbers. Those who need to optimize can always ask for floats. Using floats as the default made sense in C and other systems programming languages, but not in most languages popular today.
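Python's standard `fractions` module shows what exact rational arithmetic buys you, even though it is opt-in rather than the default:

```python
# Arbitrary-precision rationals: no rounding ever occurs, so identities
# that fail in floating point hold exactly.
from fractions import Fraction

third = Fraction(1, 3)
print(third + third + third == 1)          # True, not 0.9999...
print(Fraction(1, 10) + Fraction(2, 10))   # exactly 3/10
```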
---
What can programming languages do? I don't know if there's one answer to that question, because anything the compiler/interpreter does on the programmer's behalf to make his/her life easier usually works against performance, clarity, and readability. I think both the C++ way (pay only for what you need) and the Perl way (principle of least surprise) are valid, but it depends on the application. Programmers still need to work with the language and understand how it handles floating point, because if they don't, they'll make assumptions, and one day the prescribed behavior won't match up with those assumptions. At a minimum, the programmer needs to know how the language represents floating point values and where rounding can occur.
---
Use sensible defaults, e.g. built-in support for decimals. Groovy does this quite nicely: a decimal literal defaults to BigDecimal. Even so, with a bit of effort you can still write code that introduces floating point imprecision.
||||
|
I agree there's nothing to do at the language level. Programmers must understand that computers are discrete and limited, and that many of the mathematical concepts represented in them are only approximations. And never mind floating point: one has to understand that half of the bit patterns are used for negative numbers, and that 2^64 is actually quite small, to avoid typical problems with integer arithmetic.
||||
|
If more programming languages took a page from databases and allowed developers to specify the length and precision of their numeric data types, they could substantially reduce the probability of floating-point-related errors. If a language allowed a developer to declare a variable as a Float(2), indicating that they needed a floating point number with two decimal digits of precision, it could perform mathematical operations much more safely. If it did so by representing the variable as an integer internally and dividing by 100 before exposing the value, it could improve speed by using the faster integer arithmetic paths. The semantics of a Float(2) would also let developers avoid the constant need to round data before outputting it, since a Float(2) would inherently round to two decimal places. Of course, you'd need to allow a developer to ask for a maximum-precision floating point value when the developer needs that precision. And you would introduce problems where slightly different expressions of the same mathematical operation produce potentially different results, because of intermediate rounding when developers don't carry enough precision in their variables. But at least in the database world, that doesn't seem to be too big a deal. Most people aren't doing the sorts of scientific calculations that require lots of precision in intermediate results.
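A minimal sketch of how a hypothetical Float(2) might be backed by integer arithmetic (the class name, API, and rounding choice here are all illustrative):

```python
# Toy Float(2): store the value as an integer count of hundredths and
# only divide by 100 at the display boundary.

class Float2:
    def __init__(self, value: float):
        # Round once, on the way in, to two decimal digits of precision.
        self._scaled = round(value * 100)   # integer hundredths

    def __add__(self, other: "Float2") -> "Float2":
        out = Float2(0)
        out._scaled = self._scaled + other._scaled  # exact integer math
        return out

    def __str__(self) -> str:
        return f"{self._scaled / 100:.2f}"

print(Float2(0.1) + Float2(0.2))  # prints "0.30", not 0.30000000000000004
```

The intermediate-rounding hazard mentioned above shows up here too: multiplication or division would need a policy for rounding the scaled result.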
---
One thing I would like to see would be a recognition that a floating-point value does not identify a single exact quantity, but rather a range of quantities that all share the same representation. Although the IEEE-754 standard requires that floating-point math be performed as though every floating-point number represents the exact numerical quantity precisely at the center of its range, that should not be taken to imply that floating-point values actually represent those exact numerical quantities. Rather, the requirement that the values be assumed to be at the center of their ranges stems from three facts: (1) calculations must be performed as though the operands have some particular precise values; (2) consistent and documented assumptions are more helpful than inconsistent or undocumented ones; (3) if one is going to make a consistent assumption, no other consistent assumption is apt to be better than assuming a quantity represents the center of its range.

Incidentally, I remember that some 25 years or so ago, someone came up with a numerical package for C which used "range types", each consisting of a pair of 128-bit floats; all calculations would be done in such fashion as to compute the minimum and maximum possible value for each result. If one performed a big long iterative calculation and came up with a value of [12.53401391134 12.53902812673], one could be confident that while many digits of precision were lost to rounding errors, the result could still be reasonably expressed as 12.54 (and it wasn't really 12.9 or 53.2). I'm surprised I haven't seen any support for such types in any mainstream language, especially since they would seem a good fit with math units that can operate on multiple values in parallel.

In practice, it's often helpful to use double-precision values to hold intermediate computations when working with single-precision numbers, so having to use a typecast for all such operations could be annoying. Languages could help by having a "fuzzy double" type, which would perform computations as double and could be freely cast to and from single; this would be especially helpful in code that mixes single- and double-precision values.
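A toy version of such a range type can be sketched as interval arithmetic (everything here is illustrative; a faithful implementation would also round lower bounds down and upper bounds up at every step, which this sketch omits):

```python
# Toy "range type": carry a [lo, hi] interval through arithmetic so the
# result bounds every value the true answer could take.

class Interval:
    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # The extremes of a product lie among the endpoint products.
        products = [a * b for a in (self.lo, self.hi)
                          for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))

    def __repr__(self) -> str:
        return f"[{self.lo}, {self.hi}]"

x = Interval(1.9, 2.1)             # "about 2"
print(x * x + Interval(1.0, 1.0))  # guaranteed bounds on x*x + 1
```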
---
The suggestions above are applicable in some cases, but none is really a general solution for dealing with float values. The real solution is to understand the problem and learn how to deal with it. If you're using floating-point calculations, you should always check whether your algorithms are numerically stable. There is a huge field of mathematics/computer science devoted to this problem: it's called numerical analysis.
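A classic illustration of numerical instability is catastrophic cancellation in the quadratic formula; the naive form and an algebraically equivalent rearrangement below compute the same root with very different accuracy:

```python
# Roots of x^2 - 1e8*x + 1 = 0. The small root is about 1e-8, but the
# naive formula subtracts two nearly equal numbers (-b and d) and loses
# most of its significant digits; the rearranged form avoids that.
import math

a, b, c = 1.0, -1e8, 1.0
d = math.sqrt(b * b - 4 * a * c)

naive_small_root  = (-b - d) / (2 * a)   # catastrophic cancellation
stable_small_root = (2 * c) / (-b + d)   # algebraically equal, stable

print(naive_small_root)    # noticeably wrong
print(stable_small_root)   # close to the true root ~1e-8
```

Numerical analysis is largely the study of recognizing and rearranging computations like this.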
||||
|
One thing languages could do: remove the equality comparison from floating point types, other than a direct comparison to the NaN values. Equality testing would only exist as a function call taking the two values and a delta, or, for languages like C# that allow types to have methods, an EqualsTo that takes the other value and the delta.
||||
|
I find it strange that nobody has pointed out the Lisp family's rational number trick. Seriously, open sbcl and evaluate (+ 1/3 1/3 1/3): you get exactly 1, not 0.9999999. That should help somewhat in some situations, shouldn't it?
||||
|
As other answers have noted, the only real way to avoid floating point pitfalls in financial software is not to use it there. This may actually be feasible if you provide a well-designed library dedicated to financial math. Functions designed to import floating-point estimates should be clearly labelled as such, and provided with parameters appropriate to that operation, such as an explicit rounding policy.
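What such a clearly labelled import function might look like, as a sketch (the function name is hypothetical; the rounding modes come from Python's standard decimal module):

```python
# Converting a floating-point estimate into exact integer cents forces
# the caller to say how to round, making the lossy step explicit.
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def cents_from_estimate(estimate: float, rounding=ROUND_HALF_EVEN) -> int:
    """Import a float estimate (in dollars) as integer cents, explicitly."""
    return int(Decimal(str(estimate)).scaleb(2).to_integral_value(rounding))

print(cents_from_estimate(1.005))                          # 100 (banker's)
print(cents_from_estimate(1.005, rounding=ROUND_HALF_UP))  # 101
```

Because the rounding policy is a required part of the conversation, "why is this off by a cent" bugs become design decisions instead of surprises.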
The only real way to avoid floating point pitfalls in general is education: programmers need to read and understand something like What Every Computer Scientist Should Know About Floating-Point Arithmetic.
---
Most programmers would be surprised that COBOL got this right. In the first version of COBOL there was no floating point, only decimal, and the tradition continues in COBOL to this day: the first thing you think of when declaring a number is decimal, and floating point is used only if you really need it. When C came along, for some reason, there was no primitive decimal type, so in my opinion that's where all the problems started.
---