This is a great post and I probably wouldn't have read up on DynASM without it. But the comparison between DynASM syntax and the BPF JIT is a bit unfair; the Linux kernel is not pervasively JIT'd, and the BPF language hasn't changed in decades, so there was little incentive to create a flexible dynamic code generation system for it.
You could probably very easily clean up the syntax for a C->x64 JIT without requiring (yech) a preprocessor.
I'm not faulting the BPF authors for not inventing DynASM, I'm just illustrating what DynASM buys you over the more traditional approach.
Without a preprocessor you'll always be dealing in encoded instructions instead of symbolic ones. It's manual work that is surely nicer to avoid if you can. I'm surprised that you're put off by the preprocessor aspect of it, given the benefits of this approach.
There's something to be said for instructions encoded into structs instead of raw assembly; those structures are parameterized, so it's easy to interrogate them or bind new values to them; without structures, you end up having to invent new assembly language features for dynamic labels and things like that.
> There's something to be said for instructions encoded into structs instead of raw assembly
I'm confused, what is this in response to? Are we still talking about BPF's JIT vs DynASM?
By "instructions encoded into structs" are you referring to BPF's byte-code? I'm certainly not arguing against a byte-code approach at all -- byte-code obviously has numerous advantages, including (as you mention) ease of inspection. Even LuaJIT (the project that DynASM was built for) uses byte-code pervasively.
The question is: at the point that you decide you want to generate machine code from your byte-code, what's the easiest way to do it?
The BPF JIT's code is functionally identical to a DynASM-based code generator; the only difference is that the BPF JIT requires a human to put the machine encoding of every instruction directly in the source file. DynASM saves the human from performing this step by using a preprocessor.
No, I'm saying, rather than have the representation of assembly code in your C program be textual --- raw assembly code interpolated into C code and expanded into structs by a preprocessor --- there is something to be said for having the representation of your instructions be "native", expressed without a preprocessor. In particular, it makes it easier to modify the assembly code in C code later on in the runtime of the program; and, if you look closely at things like DynASM, it's not "really" assembly qua assembly, because they've added features to the language to handle the dynamic things anyways.
I'm definitely not saying bytecode is better than assembly code! I'm talking strictly about the mechanism by which you generate opcodes from "assembly language".
A library that can generate opcodes without exposing all the fiddly mod/rm stuff is probably just as good as the preprocessor tool.
I am, like everyone else, appropriately reverential of LuaJIT. :)
> A library that can generate opcodes without exposing all the fiddly mod/rm stuff is probably just as good as the preprocessor tool.
I'm skeptical that such a library could have a very simple or clean interface -- the "mov" instruction alone has almost 40 variants, all encoded in different ways -- but if you ever create such a thing I promise to check it out and give it a fair opinion. :)
Most of the x64 instruction set shares a common set of addressing mode variants all encoded the same way.
I've done this from C (messily) and from Ruby (very cleanly) and while the x64 ISA is definitely a pain to work with directly, it's not so painful that I think it defies the abstractions C provides natively. :)
Thanks! It took a while to write, but I was motivated by the positive feedback I'd gotten on HN previously about writing an introduction to DynASM.
Some feedback I got from Mike suggested very simple optimizations for the BF JIT that will let me catch (and exceed) the performance of bf2c. That might be the next article in the series.
TL;DR: DynASM is a toned-down JIT compiler that directly generates target assembly instead of a more abstract IL. Retargeting and optimizations will be hard.
I'm looking forward to the next installment, where (hopefully!) one generates target-independent code via LLVM, then uses one of the LLVM backends to generate the final assembly code.
What evidence do you have for this? I have evidence to the contrary: LuaJIT is a one-man effort and yet is one of the fastest and most portable JITs around (x86, x86-64, ARM, PowerPC, MIPS).
> I'm looking forward to the next installment, where (hopefully!) one generates target-independent code via LLVM, then uses one of the LLVM backends to generate the final assembly code.
LLVM is very cool, but it is an absolute mistake to think of it as obsoleting all other JITs. LLVM uses an IR that is well-suited to some things but not others. If LLVM fits your problem, great. But many problems it does not fit as well -- just look at Unladen Swallow, and notice that none of the mainstream JavaScript JITs use LLVM (not V8, not IonMonkey, not Nitro).
LLVM's design tightly couples an IR with a machine code generator. If you use DynASM, you can write your own machine code generator that accepts whatever IR is best suited to your problem.
I'm assuming that emitting "movzx edi, byte [PTR]" is using x86 as the target, so retargeting to ARM will likely require a complete rewrite of the brainf#ck jit. In that sense, retargeting is hard. But I may be wrong! I'm hoping for a further article that shows how the brainf#ck jit can be retargeted to ARM without a full rewrite of the jit code.
From the jit code that generates assembly tied to a specific architecture and register allocation, plus the code generation process being encoded as a preprocessor step instead of a library, I can only deduce that optimizations aren't the focus of this work. But I may be wrong! Perhaps the preprocessor is syntactic sugar over a library that builds the code representation as a data structure, and there are ways to programmatically manipulate this data structure to implement optimizations. Looking forward to a further article with more details!
I'm not suggesting you necessarily use LLVM, but LLVM is the closest thing to an assembly-generator library that I am aware of. To the best of my knowledge, you'd have a harder time extracting the code generator of, for example, v8 as a standalone library.
It is true that this approach requires a separate code generator for every architecture. That is not the same as saying that "retargeting will be hard" (which makes it sound like DynASM somehow gets in your way).
Yes, as I said before, if you have a problem that maps cleanly onto LLVM and you don't mind the weight that LLVM brings along, by all means use it! But you shouldn't think of LLVM as an "assembly generator library." That implies that it is far more general-purpose than it actually is. DynASM is actually an assembly generator library. LLVM is an IR, a set of optimization passes for that IR, and a set of target-specific code generators for that IR. The key point is "for that IR."
DynASM is a tool that you can count on when no existing IRs like LLVM, .NET, etc. fit your needs. It's a lower-level tool -- LLVM could conceivably use DynASM to perform its own target-specific instruction encoding. DynASM is a small, focused tool that does one thing and does it well. LLVM is more of a toolbox that tries to get the 99% case right for its target audience. As a result, it represents a lot more compromises and changes in more fundamental ways over time (for example, it recently completely rewrote its register allocator).
> Perhaps the preprocessor is syntactic sugar over a library that builds the code representation as a data structure and there are ways to programmatically manipulate this data structure to implement optimizations.
No, definitely not. The idea is that you perform optimizations before the code generation step. I didn't do this in the article because these were just simple "Hello, World" examples, but maybe I should write a follow-up article that illustrates how optimization fits into this framework.