> If you look at existing GPU applications, their software implementations aren't truly GPU-native. Instead, they are architected as traditional CPU software with a GPU add-on. For example, pytorch uses the CPU by default and GPU acceleration is opt-in. Even after opting in, the CPU is in control and orchestrates work on the GPU. Furthermore, if you look at the software kernels that run on the GPU they are simplistic with low cyclomatic complexity. This is not unique to pytorch. Most software is CPU-only, a small subset is GPU-aware, an even smaller subset is GPU-only, and no software is GPU-native.
> We are building software that is GPU-native. We intend to put the GPU in control. This does not happen today due to the difficulty of programming GPUs, the immaturity of GPU software and abstractions, and the relatively few developers targeting GPUs.
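For anyone who hasn't used it, the PyTorch pattern the quoted post is describing looks roughly like this (my own minimal sketch, not their code):

```python
import torch

# Default: tensors live on the CPU; nothing touches the GPU unless asked.
a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)

# Opt-in: explicitly move data to the GPU if one is present.
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()

# Even then the CPU stays in control: the Python process running on the CPU
# launches each kernel (here a matmul); the GPU just executes what it's handed.
c = a @ b
print(c.device)
```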
Really feels like fad engineering. The CPU works better as a control structure, and GPUs are not designed for orchestration the way CPUs are. What really worries me is their mention of GPU abstractions, which is completely the wrong way to think about hardware designed for HPC. Their point about PyTorch and kernels having low cyclomatic complexity is confusing to me. GPUs aren't optimized for control flow. The SIMD/SIMT model values throughput, and the hardware design forgoes things like branch prediction. Giving a GPU kernel many independent paths to take would make it perform much worse, and you could very well end up with kernels that are slower than their optimized CPU counterparts.
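To make that concrete: the idiomatic way to express data-dependent logic on a GPU is to compute both branches for every element and select with a mask, not to branch per element. Rough sketch (the `piecewise` function is just an illustration I made up):

```python
import torch

def piecewise(x: torch.Tensor) -> torch.Tensor:
    # No per-element branching: both sides of the "if" are computed for the
    # whole tensor, then a mask selects the result. This is part of why GPU
    # kernels stay simple; divergent control flow is something you avoid.
    return torch.where(x > 0, x * 2.0, torch.exp(x))

x = torch.randn(1_000_000)
if torch.cuda.is_available():
    x = x.cuda()
y = piecewise(x)
```

If the branches were genuinely heavy and independent, you'd be paying for both of them on every element, which is exactly where a branch-predicting CPU can come out ahead.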
I'm sure the people behind this are talented and know what they're doing, but these statements don't make sense to me. GPU algorithms are harder to reason about and implement, and you often need to do extra work just to expose enough parallelism to benefit. There aren't actually that many use cases where the GPU being the primary compute platform is the better choice. My cynical view is that people like the GPU because they compare unoptimized, slow CPU code with decent GPU/tensorized code. They never see how much a modern CPU can actually do, and how fast it can be.
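A rough sketch of the kind of comparison I mean (a toy micro-benchmark I made up; exact numbers will obviously vary by machine):

```python
import time
import numpy as np

x = np.random.rand(10_000_000).astype(np.float32)

# "Unoptimized CPU code": a pure-Python element-by-element loop.
t0 = time.perf_counter()
acc = 0.0
for v in x[:100_000]:              # only 1% of the data; the full loop takes far longer
    acc += v * v
t_loop = time.perf_counter() - t0

# A modern CPU doing what it's good at: one vectorized (SIMD) call over everything.
t0 = time.perf_counter()
acc_vec = float(np.dot(x, x))
t_vec = time.perf_counter() - t0

print(f"python loop over 1% of data: {t_loop:.4f}s")
print(f"vectorized dot over all data: {t_vec:.4f}s")
```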