Beating Floating Point at Its Own Game: Posit Arithmetic [pdf] (johngustafson.net)
119 points by speps on June 30, 2017 | 46 comments


> There are no “NaN” (not-a-number) bit representations with posits; instead, the calculation is interrupted, and the interrupt handler can be set to report the error and its cause, or invoke a workaround and continue computing, but posits do not make the logical error of assigning a number to something that is, by definition, not a number. This simplifies the hardware considerably.

What a strange claim. Outputting a NaN is easier than raising an interrupt, or could be done within an interrupt, and detecting a NaN input requires a handful of gates or less.

This is not to say that NaNs are a good or bad method, but they're definitely not expensive to implement.
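
To make that concrete, here is a minimal C sketch (my own, not from the article) of the check the hardware does: a single-precision NaN is just "exponent field all ones, mantissa non-zero", which is why detection needs only a handful of gates.

    #include <stdint.h>
    #include <string.h>

    /* Detect a 32-bit IEEE NaN purely from its bit pattern. */
    static int is_nan_bits(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);                 /* reinterpret the bits */
        return (u & 0x7F800000u) == 0x7F800000u   /* exponent all ones    */
            && (u & 0x007FFFFFu) != 0u;           /* mantissa non-zero    */
    }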

> posits lack a separate ∞ and −∞ [...] “negative zero” is another defiance of mathematical logic that exists in IEEE floats.

I will note that the IEEE standard almost had a projective infinity mode, and x87 has that mode.

> floats are asymmetric and use those bit patterns for a vast and unused cornucopia of NaN values

If we want to ignore languages like JavaScript and Lua, sure.


> Outputting a NaN is easier than raising an interrupt

That's probably why the recent RISC-V architecture doesn't even give the option of raising an interrupt on floating-point errors ("As allowed by the standard, we do not support traps on floating-point exceptions in the base ISA, [...]"). Going even further, it also doesn't raise an interrupt on integer divide by zero (https://content.riscv.org/wp-content/uploads/2017/05/riscv-s...). That probably simplifies out-of-order implementations, since the only things that can then trap are the instruction decoder (instruction fetch errors and invalid instructions), the load/store instructions, and special instructions (system calls, breakpoints, and the like).

(It also doesn't propagate NaN payloads, always returning the same canonical NaN whenever an operation produces a NaN).


> If a programmer finds the need for NaN values, it indicates the program is not yet finished...

These guys have never allreduced a timestep in a simulation where something can go aphysical.

I mean, sure, one can communicate aphysical conditions using out-of-band means. But it is so pretty when it just rides along for free as a NaN result.

Edit: I stand corrected, having looked up the authors. I still think having float effectively be Maybe[float] is a feature, not a bug.


Quiet NaN is critically different from the Maybe monad's Nothing value. Maybe gives you proper exception semantics even if it's implemented by means of predicated data flow rather than early-out control flow (and in a pure, non-strict language like Haskell there isn't a real distinction between data flow and control flow). Whereas quiet NaNs have a significant issue with non-termination where you have to very carefully express all your loop conditions positively. For example, 'while (x - y > eps)' will terminate if either x, y or eps are NaN, but any code that relies on 'if (positive_condition) break' and similar constructs for termination will loop or recurse forever if the condition operands become NaN.
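
A minimal C sketch of that termination hazard (made-up values, just to illustrate): every comparison involving a NaN is false, so a positively expressed loop condition falls through, while a negated break guard never fires.

    #include <math.h>

    void loops(double x, double y, double eps) {
        /* Terminates immediately if x, y or eps is NaN:
           NaN > eps is false, so the loop body never runs. */
        while (x - y > eps) { x *= 0.5; }

        /* Loops forever if x, y or eps is NaN:
           NaN <= eps is also false, so the break never fires.
        for (;;) {
            if (x - y <= eps) break;
            x *= 0.5;
        }
        */
    }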

A closer analogue of NaN might be the x ("unknown") value in the three/four-valued logic of hardware simulation languages like Verilog. But the situation in Verilog, despite its challenges, is so much better since you don't have a hard discontinuity between "numbers" and "logical values". Numerical comparisons between unknowns can produce unknown logical values, and you have equations like true & unknown = unknown, true | unknown = true, ~unknown = unknown, etc, so you don't have the issue with NaNs where as soon as something crosses from the floating-point realm to the integer or logical realm, it has to commit to a definite value, with all the aforementioned issues.

All that said, NaNs were the right design choice given the constraints. But it's far from ideal.


> This is not to say that NaNs are a good or bad method, but they're definitely not expensive to implement.

I saw him explain that in floats there are too many bit representations that amount to "NaN". According to a Stack Overflow answer I found:

"IEEE 754 standard defines 16,777,214 32-bit floating point values as NaNs, or 0.4% of all possible values."

That's "expensive" in terms of losing bits of expressiveness that could have been used to represent actual numbers.
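
For the record, that 0.4% figure follows directly from the encoding: exponent all ones with a non-zero 23-bit mantissa, in either sign. A quick sanity check in C (my own arithmetic, not from the thread):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* 2 signs x (2^23 - 1) non-zero mantissas = 16,777,214 NaNs */
        uint64_t nans  = 2ull * ((1ull << 23) - 1);
        double   share = (double)nans / 4294967296.0;  /* out of 2^32 patterns */
        printf("%llu NaN encodings = %.2f%% of all 32-bit patterns\n",
               (unsigned long long)nans, 100.0 * share);
        return 0;
    }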


Cost of a NaN operation on chip: the same as every other operation, since a NaN is handled and returned like any other floating-point value, always producing a new NaN, so a single error automatically flags all downstream results as invalid.

Cost of an interrupt: 100 ns to 1 microsecond (per Quora). Sorry, that solution is simply not interesting for most implementations where floats are used. There are sensors that hammer out values in real time at such a rate that not using NaNs means dropping part of your sensor data.

This is a classic case of re-inventing an optimal wheel (fine for racing) without really looking at the use cases (not fine for a lot of normal day-to-day driving).

I'm sure it will bring out the groupies at conferences.


> Cost of a Interrupt: 100 ns to 1 microseconds

That sounds like the time it takes to do a context switch to the OS. A math error interrupt doesn't need to do a context switch. The cost doesn't need to be any higher than a branch misprediction at 10-20 cycles.


I stand corrected, sorry; it was late at night, and yes, of course it's a floating-point operation that goes sour, which would be handled in assembly, with the result then passed back to the program's handling. Still expensive when encountered in bulk on a microcontroller, though.

Thanks for setting it right before the misinformation could spread.


> That's "expensive" in terms of losing bits of expressiveness that could have been used to represent actual numbers.

The double-precision floating-point format already represents 15 to 17 significant decimal digits, and those who, for some reason, believe they need more expressiveness can already use the elusive and rarely needed 80-bit extended-precision format.

0.4% of anything is nothing.


Interrupts on NaNs are extremely expensive to implement on vector and GPU processors. We learned that in the Cray-1 days. Handling the exception in a vector operation kicks off an ultra-expensive context switch. A NaN does not break the flow.


It seems like an antipattern. If your execution time is bound by an "ultra-expensive context switch" occurring when you hit a NaN, then something is wrong with your algorithm.


> Interrupts on NaNs are extremely expensive to implement on vector and GPU processors.

That's ok, because NaNs are exceptional responses to numerical errors thrown during number crunching, indicating that something failed unexpectedly and the output is crap.


Why don't they just change the current +/-inf to NaN, and then have the adjacent values be -inf and +inf?

Or is that exception hard to implement?


Here is the spec we drafted a few months ago on exactly what you suggest:

https://github.com/interplanetary-robot/SigmoidNumbers/blob/...


Doesn't that proposal remove the infinities altogether?


Yep!


So... not exactly what I suggested then?


Oh whoops, reading comprehension fail! Yeah, that is an interesting idea that has crossed my mind! I'm a little hesitant to do it because 1) infinities are not really that important to have, and 2) if we have them, we want them to be exact values in the Valid representation, which any value ending in a binary 1 would not be.


Great observation! That's a computational mode we are discussing. It's actually trivial to implement.


Another option could be to borrow a trick from floating point, and have a separate status register with bits that represent the kinds of errors encountered. Before a many-step calculation, software could clear these bits, and if any of them is set after the calculation, there was an error somewhere. Of course, since you don't have a NaN, you have to specify the bit pattern which will be returned by each kind of failing operation (like RISC-V does with integer divide by zero).

That doesn't help if the result could be partially valid, but if a "square root of minus one" or similar anywhere in the calculation means the result is invalid (and would be a NaN under floating point), sticky error bits in a status register could be sufficient.


Question: how do you access this status register using standard C? (Because if you can't, basically no one will use it.)


You use feclearexcept() and fetestexcept() from fenv.h: https://linux.die.net/man/3/fenv
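
A minimal sketch of the usage, for anyone following along (sqrt of a negative is just a convenient way to raise FE_INVALID):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    int main(void) {
        feclearexcept(FE_ALL_EXCEPT);          /* clear the sticky flags */

        volatile double x = -1.0;
        double r = sqrt(x);                    /* sets FE_INVALID, returns NaN */

        if (fetestexcept(FE_INVALID))
            printf("an invalid operation happened somewhere (r = %f)\n", r);
        return 0;
    }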


Awesome, thanks!!!


The LINPACK benchmark feels a bit like bragging. Of course you can demand an exact dot-product operation for any implementation of your standard, and then use it to compute an exact vector-matrix product, but floats could do the same if it were part of the standard. The fact that it requires a 1024-bit accumulator makes me doubt that it would be used for a massively parallel implementation e.g. in GPUs.

The overall idea of choosing a number distribution that loses less precision for common operations and orders of magnitude is pretty compelling, though.


The accumulator is pipelineable. For most graphics and ML applications you probably don't need the exact dot product; you probably do want it for scientific applications. There is a trade-off in performance, as always, but an appeal to calculation speed falls to the retort that, sure, you can calculate wrong answers with higher throughput if you wish.


Here's a C/C++ partial implementation of posits:

https://github.com/libcg/bfp


bfp uses C++ classes which store the format hyperparameters as fields. If you'd rather have a C/C++ library whose memory layout reflects the bit width of the particular posit:

https://github.com/Etaphase/FastSigmoids.jl

contains both a Julia library and a C/C++ library. Posits are implemented as a C type and a C++ class whose sizes correspond to the bit width.


Dev here, I'm planning to transition to a C library with fixed-size posit implementations. I need all the help I can get :)

FastSigmoids uses float and double types which would be problematic in embedded environments.


Shoot me a message. I am planning on making a bitwise/intmath implementation in C. Things might be accelerated if you know someone interested in sponsoring it.


Thanks! Can you elaborate on what's missing?


Conversion from/to float and double, multiplication, division and reciprocation are implemented and should be working properly.

The next big TODOs are addition, subtraction, and rounding, then advanced operations like sqrt, fma, etc., as part of the posit spec.

I've been working alone on that project, this being my first time working with floating point representation. I'm happy I could even get to that stage, and I hope someone can pick up my work and start contributing.


Posits are actually quite reasonable, but there's a lot of either ignorance or disingenuousness in this article, which is really too bad. I wish that John would ditch the hyperbole and solicit feedback from other experts, because posits are not a bad idea, but the presentation continues to give him the trappings of a crank.

I'll unpack just the first example that jumped out at me:

> Currently, half-precision (16-bit) IEEE floats are often used for this purpose, but 8-bit posits have the potential to be 2–4× faster. An important function for neural network training is a sigmoid function, a function f(x) that is asymptotically 0 as x → −∞ and asymptotically 1 as x → ∞. A common sigmoid function is 1/(1 + e^−x) which is expensive to compute, easily requiring over a hundred clock cycles because of the math library call to evaluate exp(x), and because of the divide.

"have the potential to be" 2-4x faster? Sure. But until we see an implementation, 1-2x is much more likely (closer to the 1x end of the spectrum). Commodity hardware runs 32b IEEE float multiplications with 3-4 cycles latency and single-cycle throughput (or better). 16b can be made faster if designers care to. There's simply not much "faster" available for posits to inhabit (2-4x faster than 16b float would be faster than small integer arithmetic).

Evaluating a sigmoid function requires "over a hundred clock cycles" only in the most naive possible implementation sitting on top of a lousy math library. Using 32b floats on a current generation phone or laptop with a decent math library, a naive scalar implementation of the sigmoid function has a latency of less than 50 cycles. But latency doesn't matter at all in a machine learning context; we're interested only in throughput. On a machine with AVX-512, a single core can evaluate a sigmoid function with a throughput of about 1.25 cycles / input. In full-precision 32b floating-point (i.e. a relative error of ~10^-7). John's proposed posit implementation has a relative error of about 10^-1. If we target that error threshold, we can trivially go below 1 cycle/input in 32b or 16b float on a phone. So IEEE floats are at least two orders of magnitude faster than he claims. You need to go back more than 15 years for the numbers that the paper tosses around to even be plausible.
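
To illustrate the last point: a branch-free rational substitute such as x/(1+|x|), rescaled to (0,1), already lands in that ~10^-1 error regime and auto-vectorizes trivially. This is my own sketch, not anything from the paper:

    #include <math.h>

    /* Low-accuracy logistic substitute: 0.5 * x / (1 + |x|) + 0.5.
       Matches 1/(1 + e^-x) only to roughly 10^-1 absolute error,
       but is branch-free and needs no math-library call. */
    static inline float fast_sigmoid(float x) {
        return 0.5f * x / (1.0f + fabsf(x)) + 0.5f;
    }

    void fast_sigmoid_array(const float *in, float *out, int n) {
        for (int i = 0; i < n; ++i)   /* compilers vectorize this loop */
            out[i] = fast_sigmoid(in[i]);
    }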

There are several other examples like this in the paper. I don't want to be too antagonistic, because posits are not a bad idea (actually, I think they're a pretty good format), but this paper is either ignorant of the state of the art, or more marketing than science.


I'm just going to be blunt here. John and I have decided that we need to be more marketing savvy after he's had trouble with several rounds pitching other floating point formats. Posits are just an intermediate step to try to build acceptance for valids, so there's a lot of effort put into branding.

A couple of points: 2-4x faster means 2x faster in dot-product-based SIMD and 4x faster in matrix SIMD, assuming that your bottleneck is memory throughput.

The sigmoid function realistically isn't a bottleneck in general, but you gotta admit it is pretty cool to have a ~zero clock cycle approximation. (I've tried to rein in John a bit on this one)
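
For reference, the ~zero-cost trick described in the paper (as I understand it) is a pure bit manipulation on the posit encoding: flip the sign bit and shift right two places, for es = 0. A sketch, with posit8_t standing in as a plain byte holding the bit pattern rather than any real library type:

    #include <stdint.h>

    typedef uint8_t posit8_t;   /* hypothetical: raw 8-bit posit bit pattern, es = 0 */

    /* Approximate sigmoid(x) = 1/(1 + e^-x) directly on the posit's bits:
       flip the sign bit, then logical shift right by two. */
    static inline posit8_t posit8_sigmoid_approx(posit8_t x) {
        return (posit8_t)((uint8_t)(x ^ 0x80u) >> 2);
    }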


If you're bound by memory throughput, you can't go beyond a 2x speedup (there's 1/2 as much data to move in an 8b format, whether it's in vectors or matrices doesn't matter). I still don't see any reasonable expectation for 4x.


If your matrix contents are static during your rate-limiting step (as they are for most DL applications), your FLOPs scale as O(n^2) while the memory traffic for the vector operand scales as O(n).

[a b, c d] dot [e, f]

is four multiplies

[a b c, d e f, g h i] dot [j, k, l]

is nine multiplies.


HN discussion from 2 years ago:

https://news.ycombinator.com/item?id=9943589


This is different from unums, as stated in the second line of the abstract.


Actual discussion of posits from 3 months ago

https://news.ycombinator.com/item?id=14013971

John Gustafson (the author) has an account

https://news.ycombinator.com/threads?id=jlgustafson


Apologies - my bad.


I read the book on Type I unums; it was quite interesting, but I'm not an expert in this field. However, I appreciate that you don't have to buy a book to find his new research.


That looks really cool. Is there hardware support for posits anywhere?


John Gustafson is on the board of REX Computing, a small semiconductor startup. They claim to have taped out last year; I don't know whether the chip has been validated or whether it had posits in silicon, but I know that is one of their longer-term goals.

I find Gustafson's UNUMS 2.0 more compelling than the first version.


Founder of REX Computing here: we taped out in July 2016 and got our silicon brought up and working back in February. I gave a talk at Stanford showing our hardware and a tiny bit of software: https://www.youtube.com/watch?v=ki6jVXZM2XU

Type 2 unums are pretty much entirely superseded by type 3 unums (now given the name 'posits', as in the OP's linked paper)... they are basically superior in every way, and I recommend watching John's February 2nd talk at Stanford.


Does the Rex Neo have posits?


John's presentation more or less mirrors the topics covered in the paper: https://www.youtube.com/watch?v=aP0Y1uAA-2Y


What's the smallest positive integer that 64-bit posits cannot represent exactly?



