I also have a strange obsession with Prolog and Markus Triska's article on meta-interpreters heavily inspired me to write a Prolog-based agent framework with a meta-interpreter at its core [0].
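For readers who haven't seen one: the vanilla meta-interpreter is famously tiny. A minimal sketch follows (the standard textbook version, not the actual core of my framework):

```prolog
% mi/1 proves a goal using the program's own clauses, which makes
% the execution strategy itself something you can program.
mi(true) :- !.
mi((A, B)) :- !, mi(A), mi(B).
mi(Head) :- clause(Head, Body), mi(Body).

% A toy predicate to interpret. Depending on the Prolog system,
% clause/2 may require the predicate to be declared dynamic.
:- dynamic natnum/1.
natnum(0).
natnum(s(N)) :- natnum(N).

% ?- mi(natnum(s(s(0)))).   % succeeds
```

From here, features like step limits, proof traces, or hooks for LLM-evaluated goals are a matter of adding clauses to mi/1.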
I have to admit that writing Prolog sometimes makes me want to bash my head against the wall, but sometimes the resulting code has a particular kind of beauty that's hard to explain. Anyway, Opus 4.5 is really good at Prolog, so my head feels much better now :-)
Anything you'd like to share? I did some research within the realm of classic robotics-style planning ([1]), and the results were already impressive with local LLMs a year ago, to the point that obtaining textual descriptions for complex enough problems became the bottleneck. This suggests that prompting is of limited use when you could already describe the problem concisely and directly in Prolog, given Prolog's NLP roots and its one-to-one mapping of simple English sentences. Hence that report isn't updated to GLM 4.7, Claude whatever, or other "frontier" models yet.
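To illustrate the kind of near one-to-one mapping I mean (a made-up toy domain, not taken from the report):

```prolog
% "The key is in the office."
in(key, office).
% "The robot is in the office."
in(robot, office).
% "The robot can grab an item if they are in the same room."
can_grab(robot, Item) :- in(robot, Room), in(Item, Room).

% ?- can_grab(robot, key).   % succeeds
```

Each English sentence becomes one fact or rule, which is why prose descriptions quickly feel like a detour.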
Opus 4.5 helped me implement a basic coding agent in a DSL built on top of Prolog: https://deepclause.substack.com/p/implementing-a-vibed-llm-c.... It worked surprisingly well. With a bit of context it was able to (almost) one-shot about 500 lines of code. With older models, I felt that they "never really got it".
>> I have to admit that writing Prolog sometimes makes me want to bash my head against the wall
I think much of the frustration with older tech like this comes from the fact that these things were mostly written (and rewritten to perfection) on paper first, and only the near-final program was typed into a computer with a keyboard.
Modern ways of carving out a program by "successive approximation" with a keyboard and monitor, until you get something that works, are mostly a recent phenomenon. Most of us are used to working like this, which quite honestly is mostly trial and error. The frustration is understandable, because you are basically throwing darts, most of the time in the dark.
I knew a programmer from the 1980s (he built medical electronics equipment) who would tell me how even writing C worked back then. It was mostly writing a lot, on paper. You had to prove things on paper first.
>> I think much of the frustration with older tech like this comes from the fact that these things were mostly written (and rewritten to perfection) on paper first, and only the near-final program was typed into a computer with a keyboard.
I very much agree with this, especially since Prolog's execution model doesn't seem to go that well with the "successive approximations" method.
Before the personal computer revolution, compute time, and even development/test time, on the large computers of the day was rationed.
One can imagine how development would work in an ecosystem like that. You have to understand both the problem and your solution, and you need to be sure it will work before you start typing it out at a terminal.
This is the classic Donald Knuth workflow. He stays disconnected from a computer for long periods, focused on the problems and solutions, working them out with pen and paper until he has arrived at solutions that just work, correctly, and well enough to be explained in a textbook.
When you take this away, you also take away the need to put in the hard work required to make things work correctly. Take a look at how many Java devs out there use the wrong data structure for the problem and then try to shoehorn their solution to roughly fit it. Eventually the solution works for some acceptable inputs, and the remainder is left to be discovered by an eventual production bug. Stack Overflow is full of such questions.
Languages like Prolog just don't offer that sort of freedom. You have to be serious, in some way, about what you are doing, in the sense of truly understanding both the problem and the solution well enough to make them work.
>> Languages like Prolog just don't offer that sort of freedom.
Yes, they do -- that's why people have enjoyed using such languages.
It might help to think of them as very high-level scripting languages with more rigorous semantics (e.g. homoiconicity) and some nifty built-ins, like Prolog's relational database. (Not to mention REPLs, tooling, etc.)
Read, for example, what Paul Graham wrote about using Lisp for Viaweb (which became Yahoo Store) [0] and understand that much of what he says applies to languages like Prolog and Smalltalk too.
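The "relational database" part is easy to underestimate; facts behave like rows and rules like views, with no setup at all. A toy sketch:

```prolog
% Facts act like rows of an employee(Name, Dept, HireYear) table.
employee(alice, engineering, 2019).
employee(bob,   engineering, 2021).
employee(carol, sales,       2020).

% A rule acts like a view: everyone hired after a given year.
hired_after(Name, Year) :- employee(Name, _, Hired), Hired > Year.

% ?- hired_after(Who, 2019).
% Who = bob ;
% Who = carol.
```

The query engine, joins, and backtracking over results all come for free from the language itself.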
>> ...these things were mostly written (and rewritten to perfection) on paper first and only the near-final program was input into a computer with a keyboard.
Not if you were working in a high-level language with an interpreter, REPL, etc. where you could write small units of code that were easily testable and then integrated into the larger whole.
I'm assuming they were written on paper because they were commonly punched into paper at some stage after that. We tend to be more careful with non-erasable media.
But I wonder if that characterization is actually flattering for Prolog? I can't think of any situation, skill, technology, paradigm, or production process for which "doing it right the first time" beats iterative refinement.
Like Lisp and Smalltalk, Prolog saw its heaviest use in the 1980s, so it ran on Unix workstations and also, to some extent, on PCs. (There were even efforts to create hardware designed to run Prolog, a la Lisp machines.)
And, like Lisp and Smalltalk, Prolog can be very nice for iterative development/rapid prototyping (where the prototypes might be good enough to put into production).
The people who dealt with Prolog on punchcards were the academics who created and/or refined it in its early days. [0]
>>"doing it right the first time" beats iterative refinement.
It's not iterative refinement that is bad. It's just that when you use a keyboard as a thinking device, there is a tendency to assume the first trivially working solution is completely correct.
This doesn't happen with pen and paper, because it slows you down. You get the mental space to think through a lot of things, exceptions, etc., so that even with iterative refinement you are likely to build something correct, compared to just committing the first typed function to the repo.
Is there any reasonably fast and portable sandboxing approach that does not require a full-blown VM or containers? For coding agents, containers are probably the right way to go, but for something like Cowork, which is targeted at non-technical users who want or have to stay local, what's the right way?
container2wasm seems interesting, but it runs a full-blown x86 or ARM emulator in WASM, which boots an image derived from a Docker container [0].
As an experiment over the holidays I had Opus create a coding agent in a Prolog DSL (more than 200 lines though) [0], and I was surprised how well the agent worked out of the box. So I guess the latest one or two generations of models have reached a stage where the agent harness around the model is less important than it used to be.
Thank you! I went with a Prolog base because I was interested in what might be possible when combining its execution model with LLM-defined predicates. For anything related to modelling and querying data, a Datalog dialect might indeed be a better choice. I've also used Logica [0] as an intermediate layer in a text2sql system, but as models get better and better, I believe there is less need for these kinds of abstractions.
Hi, I stumbled on this article in my Twitter feed and posted it because I found it to be very practical, despite the somewhat misleading title (and I also don't like encoding agent logic in .md files). For my side project I am experimenting with describing agents / agentic workflows in a Prolog-based DSL [1]
This looks like a very pragmatic solution, in line with what seems to be going on in the real world [1], where reliability seems to be one of the biggest issues with agentic systems right now. I've been experimenting with a different approach to increasing the amount of determinism in such systems: https://github.com/deepclause/deepclause-desktop. It's based on encoding the entire agent behavior in a small and concise DSL built on top of Prolog. While it's not as flexible as a fully fledged agent, it does, however, lead to much more reproducible behavior and more graceful handling of edge cases.
> But my bet is that the proposed program-of-thought is too specific
This is my impression as well, having worked with this type of stuff for the past two years. It works great for well-defined use cases, as long as user queries do not stray too far from what you optimized your framework/system prompt/agent for. However, once you move too far away from that, it quickly breaks down.
Nevertheless, as this problem has been bugging me for a while, I still haven't given up (although I probably should ;-). My latest attempt is a Prolog-based DSL (http://github.com/deepclause/deepclause.ai) that allows part of the logic to be handled by LLMs again, so that it retains some of the features of pure LLM-based systems. As a side effect, this gives additional features such as graceful failure, auditability, and increased (but not full) reproducibility.
I've been experimenting with giving the LLM a Prolog-based DSL, used in a CodeAct-style pattern similar to Hugging Face's smolagents. The DSL can be used to orchestrate several tools (MCP or built-in) and LLM prompts. It's still very experimental, but a lot of fun to work with. See here: https://github.com/deepclause/deepclause-desktop.
My own attempt at "chain-of-code with a Prolog DSL": https://news.ycombinator.com/item?id=45937480. Similarly to CodeAct, the idea there is to turn natural-language task descriptions into small programs. Some program steps are executed directly; some are handed over to an LLM. I haven't run any benchmarks yet, but there should be some classes of tasks where such an approach is more reliable than a "traditional" LLM/tool-calling loop.
Prolog seemed like a natural choice for this (at least to me :-), since it's a relatively simple language that makes it easy to build meta-interpreters and allows for fairly concise task/workflow representations.
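To give a flavor of what I mean (a hypothetical sketch; fetch_emails/2 and llm/3 are invented names, not the actual DSL):

```prolog
% A task compiled into a small program: some steps are ordinary
% tool calls, one is delegated to an LLM.
summarize_inbox(Report) :-
    fetch_emails(today, Emails),        % deterministic tool step
    llm("Summarize these emails in three bullet points",
        Emails, Report).                % step handed to an LLM
```

Because the orchestration itself is a Prolog goal, a failing step triggers ordinary backtracking instead of taking the whole run down.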
Nice, I do like the direction. A Prolog dialect does seem like a natural choice if we must pick only one kind of intermediate representation, but ideally there could be multiple. For example, I saw your "legal reasoning" example... did you know about https://catala-lang.org/ ? I think I'd like to see an LLM experiment that only outputs formal specifications but still supports multiple targets (say Prolog, Z3, Storm, PRISM, Alloy, and what have you). Once you can output these things, you can use them in chain-of-code.
Anyway, the basic point being... it is no wonder LLM reasoning abilities suck when we have no decent intermediate representation for "thinking" in terms of set/probability primitives. And it is no wonder LLMs suck at larger code-gen tasks when we have no decent intermediate representation for "thinking" in terms of abstract specifications. The obsession with natural-language inputs/intermediates has been a surprise to me. LLMs are compilers, and we need to walk with various spec -> spec compilers first so that we can run with spec -> code compilers.
Thank you, https://catala-lang.org/ looks very interesting. I've experimented a lot with LLMs producing formal representations of facts and rules. What I've observed is that the resulting systems usually lose a lot of the generalization capabilities offered by the current generation of LLMs (fine-tuning may help here, but is often impractical due to missing training data). Together with the usual closed-world assumption in e.g. Prolog, this leads to, imho, overly restrictive applications. So the approach I am taking is to allow the LLM to generate Prolog code that may contain predicates which are themselves interpreted by an LLM.
So one could e.g. have:

    is_a(dog, animal).
    is_a(Item, Category) :- @("This predicate should be true if 'Item' is in the category 'Category'").
In this example, evaluation of the is_a predicate would first try the first clause and, if that fails, fall back to the second clause, which hands the question to the LLM. That way the system as a whole does not always fail if the formal knowledge representation is incomplete.
I've also been thinking about the Spec->Spec compilation use case. So the original Spec could be turned into something like:
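For instance (a purely hypothetical sketch in the same style as the is_a example above; the predicate names are invented):

```prolog
% Informal spec: "an order is valid if it has a positive total
% and a plausible shipping address."
valid_order(Order) :-
    order_total(Order, Total),
    Total > 0,
    @("True if the shipping address in 'Order' looks plausible").
```

The checkable parts of the spec become ordinary Prolog goals, while the parts that resist formalization stay behind the LLM escape hatch.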
I am honestly not sure where such an approach might ultimately be most valuable. "Anything-tools" like LLMs make it surprisingly hard to focus on an individual use case.
[0] http://github.com/deepclause/deepclause-desktop