stephencanon's comments | Hacker News

Schubfach's table is quite large compared to some alternatives with similar performance characteristics. swiftDtoa's code and tables combined are smaller than just Schubfach's table in the linked implementation. Ryu and Dragonbox are larger than swiftDtoa, but also use smaller tables than Schubfach, IIRC.

If I$ is all you care about, then table size may not matter, but for constrained systems, other algorithms in general, and swiftDtoa in particular, may be better choices.


IEEE 754 is a floating point standard. It has a few warts that would be nice to fix if we had tabula rasa, but on the whole is one of the most successful standards anywhere. It defines a set of binary and decimal types and operations that make defensible engineering tradeoffs and are used across all sorts of software and hardware with great effect. In the places where better choices might be made knowing what we know today, there are historical reasons why different choices were made in the past.

DEC64 is just some bullshit one dude made up, and has nothing to do with “floating-point standards.”


It is important to remember that IEEE 754 is, in practice, aspirational. It is very complex, and nobody gets it 100% correct. There are so many edge cases around the sticky bit, quiet vs. signaling NaNs, etc., that a processor that gets every special case 100% correct simply does not exist.

One of the most important things that IEEE 754 mandates is gradual underflow (denormals) in the smallest binade. Otherwise you have a giant gap between the smallest normal float and zero, far larger than the spacing just above it, which plays havoc with the stability of numerical algorithms.
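
To make the gradual-underflow point concrete, a small C sketch (mine, not from the standard): with subnormals, x != y still implies x - y != 0 near the bottom of the range; with flush-to-zero the difference collapses to zero.

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        double x = 1.5 * DBL_MIN;   // just above the smallest normal number
        double y = 1.0 * DBL_MIN;   // the smallest normal number, 2^-1022
        double d = x - y;           // 0.5 * DBL_MIN, representable only as a subnormal
        printf("smallest normal    = %a\n", DBL_MIN);
        printf("smallest subnormal = %a\n", DBL_TRUE_MIN);   // C11
        printf("x - y              = %a\n", d);
        // With gradual underflow, d is 2^-1023 (a subnormal). Under flush-to-zero
        // it becomes 0, so x != y no longer implies x - y != 0, which breaks
        // error bounds that many numerical algorithms rely on.
        return 0;
    }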


Sorry, no. IEEE 754 is correctly implemented in pretty much all modern hardware [1], save for the fact that optional operations (e.g., the suggested transcendental operations) are not implemented.

The problem you run into is that the compiler generally does not implement the IEEE 754 model fully strictly, especially under default flags--you have to opt into strict IEEE 754 conformance, and even there, I'd be wary of the potential for bugs. (Hence one of the things I'm working on, quite slowly, is a special custom compiler that is designed to have 100% predictable assembly output for floating-point operations so that I can test some floating-point implementation things without having to worry about pesky optimizations interfering with me).

[1] The biggest stumbling block is denormal support: a lot of processors opted to support denormals only by trapping on them and having an OS-level routine fix up the result. That said, both AMD and Apple have figured out how to support denormals in hardware with no performance penalty (Intel has some way to go), and from what I can tell, even most GPUs have given up and added full denormal support as well.
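
A small illustration of the compiler side (my own sketch, nothing from this thread): whether a*b + c is contracted into a fused multiply-add is a compiler/flag decision (e.g. -ffp-contract on GCC/Clang, or the standard FP_CONTRACT pragma), and it visibly changes results, which is exactly the kind of thing you have to pin down for reproducible floating-point output.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double a = 1.0 + DBL_EPSILON;           // 1 + 2^-52
        double b = a;
        double c = -(1.0 + 2.0 * DBL_EPSILON);  // -(1 + 2^-51), i.e. -round(a*b)
        double separate = a * b + c;    // 0.0 if rounded as written; 2^-104 if contracted
        double fused    = fma(a, b, c); // always exactly 2^-104
        printf("a*b + c    = %a\n", separate);
        printf("fma(a,b,c) = %a\n", fused);
        return 0;
    }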


The orbital example where BDF loses momentum is really about the difference between a second-order method (BDF2) and a fourth-order method (RK), rather than explicit vs. implicit (but: no method with order > 2 can be A-stable; since the whole point of implicit methods is to achieve stability, the higher-order BDF formulas are relatively niche).

There are whole families of _symplectic_ integrators that conserve physical quantities and are much more suitable for this sort of problem than either option discussed. Even a low-order symplectic method will conserve momentum on an example like this.
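
For concreteness, here's a minimal C sketch of velocity Verlet (leapfrog), a second-order symplectic method, on a circular Kepler orbit with GM = 1. This is my own toy example, not code from the article; the step size and step count are arbitrary.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = 1.0, y = 0.0;    // position
        double vx = 0.0, vy = 1.0;  // velocity: circular orbit for GM = 1, r = 1
        double dt = 1e-3;
        for (long i = 0; i < 100000; i++) {       // roughly 16 orbits
            double r = sqrt(x*x + y*y), r3 = r*r*r;
            vx -= 0.5 * dt * x / r3;              // half kick
            vy -= 0.5 * dt * y / r3;
            x += dt * vx;                         // drift
            y += dt * vy;
            r = sqrt(x*x + y*y); r3 = r*r*r;
            vx -= 0.5 * dt * x / r3;              // half kick
            vy -= 0.5 * dt * y / r3;
        }
        // Angular momentum x*vy - y*vx stays at 1 up to rounding, and the energy
        // oscillates within a bounded band instead of drifting away.
        printf("angular momentum = %.15f\n", x*vy - y*vx);
        return 0;
    }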


Obviously^1. But it illustrates the broader point of the article, even if better choices are available for this concrete problem.

1) if you have studied these things in depth. Which many/most users of solver packages have not.


The fascinating thing is that discrete symplectic integrators typically can only conserve one of the physical quantities exactly, e.g. angular momentum but not energy in orbital mechanics.


I have always wanted to know if there is any theorem that says one cannot preserve all of the standard invariants.

For example, we know for mappings that we cannot preserve angles, distances and area simultaneously.


The short answer is that discretization can generally preserve only one invariant exactly; others must be approximate.

This could provide some evidence for the universe not being truly discrete, since we have multiple kinematic quantities that appear to be exactly conserved, but it's hard to tell experimentally, since proposed discrete spacetimes have discretization scales on the order of hbar, which means deviations from the continuum would be hard to detect.


Thanks for replying on this now rather inactive thread.

I am really curious about this issue and am looking for a theorem that gives an impossibility result (or an existence result).

It might be well known, but I don't know enough about DEs to be aware of the result.


leapfrog!


> Where you are not under any circumstances can be robbed by a random person on a street.

I will be very surprised if there's anywhere in the world where the expected loss from being robbed on the street while walking exceeds the expected loss from being in a car accident while driving.

Getting in a car is by far the most dangerous thing most people do routinely.


I don't think most people think to do either direction by hand; it's all just matrix multiplication, you can multiply them in whatever order makes it easier.


I'm just talking about the general algorithm for writing down the derivative of `f(g(h(x)))` using the chain rule.

For vector valued functions, the naive way you would learn in a vector calculus class corresponds to forward mode AD.
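
For a scalar toy version (my own sketch, not anyone's library): carry (value, derivative) pairs through the composition, and each elementary function applies its own chain-rule factor. That propagation, innermost-out, is forward mode in a nutshell.

    #include <math.h>
    #include <stdio.h>

    typedef struct { double v, d; } dual;   // value and d/dx

    static dual d_sqr(dual a) { return (dual){ a.v * a.v, 2.0 * a.v * a.d }; }
    static dual d_exp(dual a) { return (dual){ exp(a.v), exp(a.v) * a.d }; }
    static dual d_sin(dual a) { return (dual){ sin(a.v), cos(a.v) * a.d }; }

    int main(void) {
        dual x = { 1.5, 1.0 };              // seed with dx/dx = 1
        dual y = d_sin(d_exp(d_sqr(x)));    // f(g(h(x))) with f = sin, g = exp, h = x^2
        // Analytically: d/dx sin(exp(x^2)) = cos(exp(x^2)) * exp(x^2) * 2x
        printf("value = %f, derivative = %f\n", y.v, y.d);
        return 0;
    }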


All 48 peaks on the AMC White Mountains 4000-footers¹ list in one continuous trek (no driving/shuttling/etc. between trailheads).

¹ this list is outdated vis-a-vis modern mapping and includes at least one peak shorter than 4000 feet (Tecumseh) and omits at least one peak that should qualify per the rules (Guyot), but if the list were updated they would still have completed the direttissima, since they passed over Guyot on the way to the Bonds (dropping Tecumseh could only make the direttissima easier, but I'm not sure it makes much of a difference; it's been a decade or so since I hiked that section of the Whites).

As an aside, that day 5 from Wildcat to Cabot is absolutely brutal even if you're fresh, to say nothing of having already covered 180 miles in the previous four days.


Worth noting that division (integer, FP, and SIMD) has gotten much cheaper in the last decade. Division is partially pipelined on common microarchitectures now (capable of delivering a result every 2-4 cycles), and its latency has dropped from ~30-80 cycles to ~10-20 cycles.

This improvement is sufficient to tip the balance toward favoring division in some algorithms where historically programmers went out of their way to avoid it.
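
A hedged sketch of what that tradeoff looks like (my own toy code; whether direct division actually wins still depends on the microarchitecture and on how many elements you process per divisor):

    #include <stddef.h>
    #include <stdio.h>

    // Classic trick: hoist one division out of the loop and multiply by the
    // reciprocal. Historically a clear win; note it is not bit-identical to
    // dividing each element.
    static void scale_recip(double *x, size_t n, double d) {
        const double r = 1.0 / d;
        for (size_t i = 0; i < n; i++)
            x[i] *= r;
    }

    // Straight division: correctly rounded per element, and on recent cores
    // often cheap enough that the reciprocal trick no longer pays off.
    static void scale_div(double *x, size_t n, double d) {
        for (size_t i = 0; i < n; i++)
            x[i] /= d;
    }

    int main(void) {
        double a[8] = {1, 2, 3, 4, 5, 6, 7, 8}, b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        scale_recip(a, 8, 49.0);
        scale_div(b, 8, 49.0);
        int same = 1;
        for (int i = 0; i < 8; i++) if (a[i] != b[i]) same = 0;
        // The two versions are not guaranteed to agree to the last bit.
        printf("identical results: %s\n", same ? "yes" : "no");
        return 0;
    }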


Yeah, we live far north of NYC where it gets much colder, and have never spent nearly that much on heating. Even when we lived in a converted barn from the 1930s with single-pane windows and no wall insulation, the most we ever spent was about $500/month. Now (new construction, triple-pane windows, ground-source heat pump) it’s more like $80/month.


Raising children involves a whole lot of simple constraints that you gradually relax.

“Don’t touch the knife” becomes “You can use _this_ knife, if an adult is watching,” which becomes “You can use these knives but you have to be careful, tell me what that means” and then “you have free run of the knife drawer, the bandages are over there.” But there’s careful supervision at each step and you want to see that they’re ready before moving up. I haven’t seen any evidence of that at all in LLM training—it seems to be more akin to handing each toddler every book ever written about knives and a blade and waiting to see what happens.


"Only slightly faster in decompression time."

m5 vs -19 is nearly 2.5x faster to decompress; given that most data is decompressed many, many more times than it is compressed (often thousands or millions of times, often by devices running on small batteries), that's an enormous win, not "only slightly faster".

The way in which it might not be worth it is the larger size, which is a real drawback.
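
To put rough, purely hypothetical numbers on it: an asset that takes 1 s to compress once and is then decoded a million times at 10 ms each costs about 10,000 s of aggregate decode time; cut the decode to 4 ms and that drops to about 4,000 s, while the one-time compression cost is noise either way.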


The difference is barely noticeable in real-world cases, in terms of performance or battery. Decoding images is a small part of loading an entire webpage from the internet. And transferring data isn't free either, so any benefits need to be offset against the larger file size and increased network usage.


When you talk about images over HTTP, you need to consider that most web servers and browsers already support zstd compression at the transport level, so the potential bandwidth win provided by zstd is already being realized today.


I'm not sure how that's relevant for a new "ZPNG" format vs. lossless WebP?


You have to do the math: do you have more bandwidth, storage, or CPU?

Not related to images, but I remember compressing packages of executables and zstd was a clear winner over other compression standards.

Some compression algorithms can run in parallel, and on a system with lots of CPUs that can be a big factor.
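
For what it's worth, with libzstd the parallelism is just a parameter on the compression context. A minimal sketch using the advanced API (zstd >= 1.4 built with multithread support; the payload and worker count here are made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    int main(void) {
        const char *src = "example payload; imagine a package of executables here";
        size_t srcSize = strlen(src);
        size_t dstCap = ZSTD_compressBound(srcSize);
        void *dst = malloc(dstCap);

        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);  // ratio vs. CPU
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 8);          // parallel compression

        size_t n = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
        if (ZSTD_isError(n))
            fprintf(stderr, "%s\n", ZSTD_getErrorName(n));
        else
            printf("%zu -> %zu bytes\n", srcSize, n);

        ZSTD_freeCCtx(cctx);
        free(dst);
        return 0;
    }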


Win how?

More efficiency will inevitably just lead to increased CPU usage and, in turn, faster battery drain.

https://en.wikipedia.org/wiki/Jevons_paradox


So someone is going to load 2.5x as many images because they can be decoded 2.5x faster? The paradox isn't a law of physics; it's an interesting observation about markets. (If this was a joke it was too subtle for me)


Might as well just shoot yourself if that's how you look at improvements. The only way to do something good is to stop existing. (this is a general statement, not aimed at you or anyone in particular)

