
Why is it not routine to "compile" Python? I understand that the interpreter is great for rapid iteration, cross compatibility, etc. But why is it accepted practice in the Python world to eschew all of the benefits of compilation by just dumping the "source" file in production?


The primary reason, in my opinion, is that the vast majority of Python libraries lack type annotations (this includes the standard library). Without type annotations, there is very little for a non-JIT compiler to optimize, since:

- The vast majority of code generation would have to be dynamic dispatches, which would not be too different from CPython's bytecode.

- Types are dynamic; the methods on a type can change at runtime due to monkey patching. As a result, the compiler must be able to "recompile" a type at runtime (and thus, you cannot ship optimized target files).

- There are multiple ways every single operation in Python might be executed; for instance, `a.b` either does a `__dict__` lookup or a descriptor lookup, and you don't know which is used unless you know the type (and if that type is monkey patched, then the method that gets called might change).

A JIT compiler might be able to optimize some of these cases (by observing the actual types used), but a JIT compiler can work from the source file and be built into the CPython interpreter itself.
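The monkey-patching problem above is easy to demonstrate in a few lines; both class-level and instance-level rebinding change what `a.b` resolves to, so a compiler cannot bind the call statically:

```python
# Methods can be rebound at runtime, so "a.b()" can't be resolved at compile time.
class A:
    def b(self):
        return "original"

a = A()
assert a.b() == "original"

A.b = lambda self: "patched"    # monkey patch the class: every instance changes
assert a.b() == "patched"

a.b = lambda: "instance"        # instance __dict__ now shadows the class attribute
assert a.b() == "instance"
```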


You make a great point — type information is definitely a huge part of the challenge.

I'd add that even beyond types, late binding is fundamental to Python’s dynamism: Variables, functions, and classes are often only bound at runtime, and can be reassigned or modified dynamically.

So even if every object had a type annotation, you would still need to deal with names and behaviors changing during execution — which makes traditional static compilation very hard.

That’s why PyXL focuses more on efficient dynamic execution rather than trying to statically "lock down" Python like C++.
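The late-binding point above is visible even without classes: a global name referenced inside a function is looked up fresh on every call, not frozen at definition time. A minimal sketch:

```python
def greet():
    return helper()   # 'helper' is resolved at call time, not definition time

def helper():
    return "hello"

assert greet() == "hello"

def helper():         # rebinding the module-level name changes greet()'s behavior
    return "goodbye"

assert greet() == "goodbye"
```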


Solved by Smalltalk, Self, and Lisp JITs, which are at the genesis of JIT technology; some of that work landed in HotSpot and V8.


Python starting with 3.13 also has a JIT available.


Kind of: you have to compile it yourself, and it is rather basic. Still early days.

PyPy and GraalPy are where the fun is; however, they are largely ignored outside their language research communities.


"Addressed" or "mitigated" perhaps. Not "solved." Just "made less painful" or "enough less painful that we don't need to run screaming from the room."


Versus what most folks do with CPython, it is indeed solved.

We are very far from having a full single user graphics workstation in CPython, even if those JITs aren't perfect.

Yes, there are a couple of ongoing attempts, while most in the community would rather write C extensions.


Is "single user graphics workstation" even still a goal? Great target in the Early to Mid Ethernetian when Xerox Dorados and Dandelions, Symbolics, and Liliths roamed the Earth. Doesn't feel like a modern goal or standard of comparison.

I used those workstations back in the day—then rinsed and repeated with JITs and GCs for Self, Java, and on to finally Python in PyPy. They're fantastic! Love having them on-board. Many blessings to Deutsch, Ungar, et al. But for 40 years JIT's value has always been to optimize away the worst gaps, getting "close enough" to native to preserve "it's OK to use the highest level abstractions" for an interesting set of workloads. A solid success, but side by side with AOT compilation of closer-to-the-machine code? AOT regularly wins, then and now.

"Solved" should imply performance isn't a reason to utterly switch languages and abstractions. Yet witness the enthusiasm around Julia and Rust e.g. specifically to get more native-like performance. YMMV, but from this vantage, seeing so much intentional down-shift in abstraction level and ecosystem maturity "for performance" feels like JIT reduced but hardly eliminated the gap.


"Single-user graphical workstation" may not be a great goal anymore, but it's at least a sobering milestone to keep failing to reach.

AFAIK there isn't an AOT compiler from JVM bytecode to native code that's competitive with either HotSpot or Graal, which are JIT compilers. But the JVM semantics are much less dynamic than Python or JS, whose JIT compilers don't perform nearly as well. Even Jython compiled to JVM bytecode and JITted with HotSpot is pretty slow.

However, LuaJIT does seem to be competitive with AOT-compiled C and with HotSpot, despite Lua being just as dynamic as Python and more so than JS.


It is solved to the point that users in those communities are not writing extensions in C all the time to compensate for the interpreter implementation.

AOT winning over JITs on microbenchmarks hardly matters in a meaningful way for most business applications, especially when JIT caches and PGO data sharing across runs are part of the picture.

Sure, there are always going to be use cases that require AOT, and in most of them it is due to deployment constraints more than anything else.

Most mainstream devs don't even know how to use PGO tooling correctly from their AOT toolchains.

Heck, how many Electron apps do you have running right now?


> We are very far from having a full single user graphics workstation in CPython, even if those JITs aren't perfect.

Some years ago there was an attempt to create a Linux distribution with a Python userspace, called Snakeware, but the project has since gone inactive. See https://github.com/joshiemoore/snakeware


I have failed to find anything suggesting good enough performance for a desktop system written in Python.


Sugar is built with Python:

https://github.com/sugarlabs/sugar


> The primary reason, in my opinion, is the vast majority of Python libraries lack type annotations (this includes the standard library).

When type annotations are available, it's already possible to compile Python to improve performance, using Mypyc. See for example https://blog.glyph.im/2022/04/you-should-compile-your-python...
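For illustration, this is the sort of fully annotated function that benefits from that approach (hypothetical example; with the types pinned down, a compiler like mypyc can emit specialized code instead of generic dynamic dispatch):

```python
# A fully annotated function: the compiler knows xs holds ints and s is an int,
# so it doesn't need to fall back to generic object protocols on every +=.
def total(xs: list[int]) -> int:
    s = 0
    for x in xs:
        s += x
    return s

assert total([1, 2, 3]) == 6
```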


Python doesn’t eschew all benefits of compilation. It is compiled, but to an intermediate byte code, not to native code, (somewhat) similar to the way java and C# compile to byte code.

Those, at runtime (and, nowadays, optionally also at compile time), convert that to native code. Python doesn’t; it runs a bytecode interpreter.

Reason Python doesn’t do that is a mix of lack of engineering resources, desire to keep the implementation fairly simple, and the requirement of backwards compatibility of C code calling into Python to manipulate Python objects.
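The compile-to-bytecode step described above is easy to observe with the standard `dis` module; the function below is already compiled when it is defined, just not to native code:

```python
import dis

def add(a, b):
    return a + b

# CPython has already compiled this function to bytecode; the interpreter
# walks these opcodes at runtime instead of executing native instructions.
ops = [ins.opname for ins in dis.get_instructions(add)]
assert "RETURN_VALUE" in ops
```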


If you define "compiling Python" as basically "taking what the interpreter would do but hard-coding the resulting CPU instructions executed instead of interpreting them", the answer is, you don't get very much performance improvement. Python's slowness is not in the interpreter loop. It's in all the things it is doing per Python opcode, most of which are already compiled C code.

If you define it as trying to compile Python in such a way that you would get the ability to do optimizations and get performance boosts and such, you end up at PyPy. However that comes with its own set of tradeoffs to get that performance. It can be a good set of tradeoffs for a lot of projects but it isn't "free" speedup.


A giant part of the cost of dynamic languages is memory access. It's not possible, in general, to know the type, size, layout, and semantics of values ahead of time. You also can't put "Python objects" or their components in registers like you can with C, C++, Rust, or Julia "objects." Gradual typing helps, and systems like Cython, RPython, PyPy etc. are able to narrow down and specialize segments of code for low-level optimization. But the highly flexible and dynamic nature of Python means that a lot of the work has to be done at runtime, reading from `dict` and similar dynamic in-memory structures. So you have large segments of code that are accessing RAM (often not even from caches, but genuine main memory, and often many times per operation). The associated IO-to-memory delays are HUGE compared to register access and computation more common to lower-level languages. That's irreducible if you want Python semantics (i.e. its flexibility and generality).

Optimized libraries (e.g. numpy, Pandas, Polars, lxml, ...) are the idiomatic way to speed up "the parts that don't need to be in pure Python." Python subsets and specializations (e.g. PyPy, Cython, Numba) fill in some more gaps. They often use much tighter, stricter memory packing to get their speedups.

For the most part, with the help of those lower-level accelerations, Python's fast enough. Those who don't find those optimizations enough tend to migrate to other languages/abstractions like Rust and Julia because you can't do full Python without the (high and constant) cost of memory access.
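The dict-backed object layout described above, and the tighter packing trick it contrasts with, can both be seen from Python itself; `__slots__` is a small sketch of trading per-instance flexibility for a fixed layout:

```python
class Flexible:                  # attributes live in a per-instance dict in RAM
    def __init__(self):
        self.x, self.y = 1, 2

class Packed:                    # __slots__ gives a fixed layout, no instance dict
    __slots__ = ("x", "y")
    def __init__(self):
        self.x, self.y = 1, 2

# every attribute access on Flexible is a runtime dict lookup
assert Flexible().__dict__ == {"x": 1, "y": 2}
# Packed instances have no __dict__ at all to look things up in
assert not hasattr(Packed(), "__dict__")
```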


Part of the issue is the number of instructions Python has to go through to do useful work. Most of that is unwrapping values and making sure they're the right type to do the thing you want.

For example, if you compile x + y in C, you'll get a few clean instructions that add x and y directly. But if you compile the same thing with some sort of Python compiler, it would essentially have to include the entire Python interpreter; because it can't know what x and y are at compile time, there necessarily has to be some runtime logic that unwraps values, determines which "add" to call, and so forth.
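A rough Python-level sketch of that runtime logic (simplified; real CPython also handles subclass priority, C-level slots, and operands whose types lack `__add__` entirely):

```python
def dynamic_add(x, y):
    # roughly what the interpreter must do for "x + y" at runtime
    result = type(x).__add__(x, y)
    if result is NotImplemented and type(x) is not type(y):
        result = type(y).__radd__(y, x)   # try the reflected operation
    if result is NotImplemented:
        raise TypeError("unsupported operand types")
    return result

assert dynamic_add(1, 2) == 3
assert dynamic_add(1, 2.5) == 3.5   # int.__add__ punts, float.__radd__ answers
```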

If you don't want to include the interpreter, then you'll have to add some sort of static type checker to Python, which is going to reduce the utility of the language and essentially bifurcate it into annotated code you can compile, and unannotated code that must remain interpreted at runtime that'll kill your overall performance anyway.

That's why projects like Mojo exist and go in a completely different direction. They are saying "we aren't going to even try to compile Python. Instead we will look like Python, and try to be compatible, but really we can't solve these ecosystem issues so we will create our own fast language that is completely different yet familiar enough to try to attract Python devs."


You don't need the whole Python interpreter to fall back to dynamic method dispatch for overloaded operators. CPython itself implements them with per-interface vtables for C extensions, very similar to Golang but laboriously constructed by hand.

For most code, you don't need static typing for most overloaded operators to get decent performance, either. From my experience with Ur-Scheme, even a simple prediction that most arithmetic is on (small) integers with a runtime typecheck and conditional jump before inlining the integer version of each arithmetic operation performs remarkably well—not competitive with C but several times faster than CPython. It costs you an extra conditional branch in the case where the type is something else, but you need that check anyway if you are going to have unboxed integers, and it's smallish compared to the call and return you'll need once you find the correct overload to call. (I didn't implement overloading in Ur-Scheme, just exiting with an error message.)

Even concatenating strings is slow enough that checking the tag bits to see if you are adding integers won't make it much slower.
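In Python-level pseudocode, that predicted-integer fast path looks something like the sketch below (hypothetical names; a real AOT compiler emits this as machine code, with the check done on tag bits rather than `type()`):

```python
def generic_add(a, b):
    # slow path: full dynamic dispatch through the operator protocol
    return a + b

def add_fastpath(a, b):
    # predicted case: one cheap check, then the inlined integer add
    if type(a) is int and type(b) is int:
        return a + b
    return generic_add(a, b)   # anything else takes the overload lookup

assert add_fastpath(2, 3) == 5          # hits the fast path
assert add_fastpath("a", "b") == "ab"   # falls back to generic dispatch
```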

Where this approach really falls down is choosing between integer and floating point math. (Also, you really don't want to box your floats.)

And of course inline caches and PICs are well-known techniques for handling this kind of thing efficiently. They originated in JIT compilers, but you can use them in AOT compilers too; Ian Piumarta showed that.


There's no benefit that I know of, besides maybe a tiny cold start boost (since the interpreter doesn't need to generate the bytecode first).

I have seen people do that for closed-source software that is distributed to end-users, because it makes reverse engineering and modding (a bit) more complicated.


Check Nuitka: https://nuitka.net/


There have been efforts (like Cython, Nuitka, PyPy’s JIT) to accelerate Python by compiling subsets or tracing execution — but none fully replace the standard dynamic model at least as far as I know.


For Python, compilation means emitting some bytecode. And you could conceivably ship that bytecode *. But because it's such a terribly dynamic language, virtually nothing is bound to anything until you execute this particular line. "What code does this function call resolve to?" -- we'll find out when we get there. "What type does this local use?" -- we'll find out when we get there.

Even type annotations would have to be anointed with semantics, which (IIUC) they have none of today (with CPython, AFAIK). They are just annotations for use by static checkers.

Unless you can perform optimizations, the compilation can't make a whole bunch of progress beyond that bytecode.

* In fact, IIRC there was/is some "freeze" program that would do just that: compile your python program. Under the covers it would bundle libpython with your *.pyc bytecode.
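Shipping the bytecode is exactly what `.pyc` files (and freeze-style tools) do; the round trip is visible with the standard `marshal` module:

```python
import marshal

code = compile("x = 2 + 3", "<example>", "exec")
blob = marshal.dumps(code)        # roughly the payload of a .pyc file
restored = marshal.loads(blob)    # a code object, ready to run without the source

ns = {}
exec(restored, ns)
assert ns["x"] == 5
```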


> Why is it not routine to "compile" Python?

Where’s the AOT compiler that handles the whole Python language?

It's not routine because it's not even an option, and people who are concerned either use the tools that let them compile a subset of Python within a larger, otherwise-interpreted program, or use a different language.


AFAIK, one reason is that if you use eval() anywhere, you already need a whole Python compiler shipped with your program. So compiling is no different from shipping the code with the interpreter.
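A minimal example of why eval() forces this: the source being compiled doesn't exist until runtime, so the full compiler has to be present in the shipped program.

```python
# The string to compile is only constructed at runtime, so no amount of
# ahead-of-time compilation can remove the need for the Python compiler here.
factor = 7
src = f"lambda n: n * {factor}"
times = eval(src)
assert times(6) == 42
```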


It's called Nim.


Comparing Nim to compiled Python is almost insulting.

Smaller binaries, faster execution, proper metaprogramming, actual type safety, and you don't need to bundle a whole interpreter just to say "hello world"


I agree, it was just a succinct way of putting it. It's syntactically similar, which makes it easier for Python devs to shift to using it for higher-performance stuff. Aside from that, it's its own thing with its own unique offering.

My real point was that if you want "typed Python", you're doing it wrong. It wasn't built with that in mind, and probably never will be. You should just use a tool that actually has strong typing in mind from the start. Nim fits that bill.



