PyPy is a JIT compiler — it runs on a standard CPU and accelerates "hot" parts of a program after runtime analysis.
This is a great approach for many applications, but it doesn’t fit all use cases.
PyXL is a hardware solution — a custom processor designed specifically to run Python programs directly.
It's currently focused on embedded and real-time environments where JIT compilation isn't a viable option due to memory constraints, strict timing requirements, and the need for deterministic behavior.
That's an interesting project! I have some follow-up questions:
> No VM, No C, No JIT. Just PyXL.
Is the main goal to achieve C-like performance with the ease of writing Python? Do you have a performance comparison against C? Is memory management the main challenge?
> PyXL runs on a Zynq-7000 FPGA (Arty-Z7-20 dev board). The PyXL core runs at 100MHz. The ARM CPU on the board handles setup and memory, but the Python code itself is executed entirely in hardware. The toolchain is written in Python and runs on a standard development machine using unmodified CPython.
> PyXL skips all of that. The Python bytecode is executed directly in hardware, and GPIO access is physically wired to the processor — no interpreter, no function call, just native hardware execution.
Did you write some sort of emulation to enable testing it without the physical Arty board?
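For readers unsure what "Python bytecode" means in the quote above, here is a minimal sketch that uses CPython's standard `dis` module to list the opcodes of a tiny function. The `toggle` function and the `opcode_names` helper are hypothetical stand-ins, not part of PyXL, and the thread doesn't say whether PyXL consumes these exact opcodes or a translated form of them. Exact opcode names also vary between CPython versions.

```python
import dis


def toggle(state):
    # Hypothetical stand-in for a GPIO toggle; no real hardware is touched.
    return 1 - state


def opcode_names(func):
    # Collect the opcode name of every bytecode instruction CPython
    # generated for `func`.
    return [ins.opname for ins in dis.get_instructions(func)]


names = opcode_names(toggle)
print(names)  # exact names depend on the CPython version
```

This only shows what stock CPython produces on a development machine. How the PyXL toolchain maps it onto the hardware is not described here.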
Goal:
Yes — the main goal is to bring C-like or close-to-C performance to Python code, without sacrificing the ease of writing Python.
However, due to the nature of Python itself, I'm not sure how close I can get to native C performance, especially when competing with systems (both SW and HW) that have been refined for decades.
Performance comparison against C:
I don't have a formal benchmark directly against C yet.
The early GPIO benchmark (480ns toggle) is competitive with hand-written C on ARM microcontrollers — even when running at a lower clock speed.
But a full systematic comparison (across different workloads) would definitely be interesting for the future.
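To put the 480ns figure in terms of clock cycles, here is a quick back-of-the-envelope calculation. The 100 MHz clock and the 480 ns toggle time come from the post. The 600 MHz microcontroller clock is purely an assumption for comparison, not a measured part.

```python
def clock_cycles(duration_ns: int, clock_hz: int) -> int:
    """Whole clock cycles that fit in duration_ns at clock_hz."""
    return duration_ns * clock_hz // 1_000_000_000


PYXL_CLOCK_HZ = 100_000_000         # 100 MHz, from the post
TOGGLE_NS = 480                     # GPIO toggle time, from the post
HYPOTHETICAL_MCU_HZ = 600_000_000   # assumption, for comparison only

pyxl_cycles = clock_cycles(TOGGLE_NS, PYXL_CLOCK_HZ)
mcu_cycles = clock_cycles(TOGGLE_NS, HYPOTHETICAL_MCU_HZ)
print(pyxl_cycles, mcu_cycles)  # 48 288
```

So the toggle takes 48 cycles of the PyXL core. A faster-clocked chip gets more cycles to spend in the same wall-clock time, which is why matching it at 100 MHz is notable.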
Main challenge:
Yes — memory management is one of the biggest challenges.
Dynamic memory allocation and garbage collection are tricky to manage efficiently without breaking real-time guarantees.
I have a roadmap for it, but I'd like to anchor it to a real use case before moving forward.
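As an illustration of why allocation matters for timing, one common embedded pattern (not necessarily what the PyXL roadmap does) is to allocate everything up front and hand out fixed buffers. The `FixedPool` class below is a hypothetical sketch of that idea. Under stock CPython it gives no hard real-time guarantee, since the interpreter still allocates internally. It only shows the shape of the approach.

```python
class FixedPool:
    """Fixed-capacity buffer pool: all buffers are allocated once, up front."""

    def __init__(self, count: int, size: int):
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))  # indices of buffers not in use

    def acquire(self) -> int:
        # Fail loudly when exhausted instead of allocating more memory.
        if not self._free:
            raise MemoryError("pool exhausted")
        return self._free.pop()

    def buffer(self, idx: int) -> bytearray:
        return self._buffers[idx]

    def release(self, idx: int) -> None:
        self._free.append(idx)


pool = FixedPool(count=2, size=8)
a = pool.acquire()
pool.buffer(a)[0] = 0xFF   # use the buffer in place, no new allocation
pool.release(a)
```

The trade-off is that the worst-case memory footprint has to be known ahead of time, and exhaustion becomes an explicit error instead of a hidden pause.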
Software emulation:
I'm using Icarus Verilog for RTL simulation (Verilator would also work), if that's what you meant.
But hardware behavior (like GPIO timing) still needs to be tested on the real FPGA to capture true performance characteristics.
There are a lot of dimensions to what you could call performance. The FPGA here is only clocked at 100 MHz and there's no way you're going to get the same throughput with it as you would on a conventional processor, especially if you add a JIT to optimize things. What you do get here is very low latency.