One of the founders here, feel free to ask whatever. We purposefully didn't put much technical detail in the post as it is an announcement post (other people posted it here, we didn't).
2. Can modern GPU hardware efficiently make system calls? (if you can do this, you can eventually build just about anything, treating the CPU as just another subordinate processor).
3. At what order-of-magnitude size might being GPU-native break down? (Can CUDA dynamically load new code modules into an existing process? That used to be problematic years ago)
Thinking about what's possible, this looks like an exceptionally fun project. Congrats on working on an idea that seems crazy at first glance but seems more and more possible the more you think about it. Still it's all a gamble of whether it'll perform well enough to be worth writing applications this way.
1. The GPU owns the control loop And the only sparingly kicks to the CPU when it can't do something.
2. Yes
3. We're still investigating the limitations. A lot of them are hardware dependent, obviously data center cards have higher limits more capability than desktop cards.
Thanks! It is super fun trailblazing and realizing more of the pieces are there than everybody expects.
Pedantic note: rust-cuda was created by https://github.com/RDambrosio016 and he is not currently involved in VectorWare. rust-gpu was created by the folks at embark software. We are the current maintainers of both.
We didn't post this or the title, we would never claim we created the projects from scratch.
Yes, it is all these folks getting together and getting resources to push those projects to the next level: https://www.vectorware.com/team/
wgpu, ash, and cudarc are great. We're focusing on the actual code that runs on the GPU in Rust, and we work with those projects. We have cust in rust-cuda, but that existed before cudarc and we have been seriously discussing just killing it in favor of cudarc.
We are investing in them and they form the basis of what we are doing. That being said, we are also exploring other technical avenues with different tradeoffs...we don't want to assume a solution merely because we are familiar with them.
Yeah, there is way more prior art. We were doing this at FB early (here is a talk from 2014 mentioning it but it was much earlier: https://www.infoq.com/presentations/facebook-release-process...) and we definitely did not invent it...it was already the industry standard for large CI.
The underlying projects support loading at runtime, so you could have as many AOT compiled kernel variants and load the one you want. Disk is cheap. You could even ship rustc if you really wanted for "JIT" (lol), but maybe not super crazy as mojo is llvm based anyway. There is of course the warmup / stuttering problem, but that is a separate issue and (sometimes) not an issue for compute vs graphics where it is a bigger issue. I have some thoughts on how to improve the status quo with things unique to Rust but too early to know.
One of the issues with GPUs as a platform is runtime probing of capabilities is... rudimentary to say the least. Rust has to deal with similar stuff with CPUs+SIMD FWIW. AOT vs JIT is not a new problem domain and there are no silver bullets only tradeoffs. Mojo hasn't solved anything in particular, their position in the solution space (JIT) has the same tradeoffs as anyone else doing JIT.
> The underlying projects support loading at runtime, so you could have as many AOT compiled kernel variants and load the one you want.
I'm not sure I understand. What underlying projects? The only reference to loading at runtime I see on the post is loading the AOT-compiled IR.
> There is of course the warmup / stuttering problem, but that is a separate issue and (sometimes) not an issue for compute vs graphics where it is a bigger issue.
It should be worth noting that Mojo itself is not JIT compiled; Mojo has a GPU infrastructure that can JIT compile Mojo code at runtime [1].
> One of the issues with GPUs as a platform is runtime probing of capabilities is... rudimentary to say the least.
Also not an issue in Mojo when you combine the selective JIT compilation I mentioned with powerful compile-time programming [2].