Then you get to definitions like ": open ( string -- handle 1 | 0 ) ... ;" which describes the algebraic type Maybe Handle, returned unboxed on the stack. Algebraic types are fun: they can easily represent Peano arithmetic and get us into the realm of Gödel's incompleteness theorem very quickly.
Or you can try to deduce a signature for the sequence EXEC EXEC. EXEC's stack effect can be described as ( \alpha (\alpha -- \beta) -- \beta ), where each Greek letter (\alpha, \beta, ...) is a placeholder for a stack segment of arbitrary length. Notice that this type comment has nested brackets and does not adhere to the Forth stack-effect comment convention.
When I thought about this more than fifteen years ago, I came up with at least two equally valid types for EXEC EXEC: one where the xt at the top of the stack consumes all of its input and leaves no output, ( \alpha (\alpha -- \gamma) \beta (\beta -- ) -- \gamma ), and one where the first EXEC produces something for the second one to execute upon, ( \alpha \beta (\beta -- \gamma (\alpha \gamma -- \theta)) -- \theta ).
One can argue that the second type of EXEC EXEC subsumes the first one, if the Greek-letter-named stack segments are allowed to be empty.
Still, it shows that typing Forth needs, at the very least, unification at the level of Peano arithmetic, with deduction over stack segments from length zero up to unbounded length.
So, in my opinion, for an LLM to dependably combine typed Forth/concatenative definitions, it needs to call an external tool, like Prolog, to properly deduce the type(s) of a sequence of Forth (or other concatenative-language) definitions.
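To make this concrete, here is a rough sketch in Python of the Prolog-style unification involved. The encoding is entirely mine (a stack is either a row variable or a ('push', rest, top) cell; an xt with effect (in -- out) is a ('quot', in, out) pair); it is not from any real Forth type checker, and the occurs check is omitted for brevity:

    # Unification of stack effects, Prolog-style, for a sequence of Forth words.

    class Var:
        _n = 0
        def __init__(self, name='_'):
            Var._n += 1
            self.name = f"{name}{Var._n}"
        def __repr__(self):
            return self.name

    def walk(t, s):
        # Follow variable bindings in substitution s.
        while isinstance(t, Var) and t in s:
            t = s[t]
        return t

    def unify(a, b, s):
        a, b = walk(a, s), walk(b, s)
        if a is b:
            return s
        if isinstance(a, Var):
            return {**s, a: b}
        if isinstance(b, Var):
            return {**s, b: a}
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            for x, y in zip(a, b):
                s = unify(x, y, s)
                if s is None:
                    return None
            return s
        return s if a == b else None

    def push(rest, top): return ('push', rest, top)   # stack with `top` above `rest`
    def quot(inp, out):  return ('quot', inp, out)    # xt with effect ( inp -- out )

    def compose(eff1, eff2):
        # Effect of running word1 then word2: unify word1's output stack
        # with word2's input stack.
        s = unify(eff1[1], eff2[0], {})
        return None if s is None else (eff1[0], eff2[1], s)

    def resolve(t, s):
        t = walk(t, s)
        return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t

    def exec_effect():
        # EXEC : ( alpha (alpha -- beta) -- beta )
        alpha, beta = Var('alpha'), Var('beta')
        return (push(alpha, quot(alpha, beta)), beta)

    inp, out, s = compose(exec_effect(), exec_effect())
    print(resolve(inp, s))   # deduced input stack of EXEC EXEC
    print(resolve(out, s))   # deduced output stack of EXEC EXEC

Composing EXEC with EXEC this way yields the second flavour of type from above: the output row of the first xt gets instantiated to a stack carrying a fresh xt on top, which the second EXEC then consumes.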
This is a Prolog system that learns programs in polynomial time. As one example, it can one-shot learn a grammar without being "trained" on millions of samples.
So, should one use an LLM that either needs paid access or is just slow to run, or freely go where "old school" systems like Eurisko [1] and Cyc went?
Eurisko demonstrated superhuman abilities in 1982-83. It also demonstrated knowledge transfer at the time: rules from VLSI place-and-route algorithms were used to design the winning Traveller TCS fleet.
You can ask Opus 4.6 to do a task and leave it running for 30 minutes or more to attempt to one-shot it. Imagine doing this with three agents in parallel in three separate worktrees. Then spin up a new agent to decide which approach of the three is best on the merits. Repeat this analysis in fresh contexts and sample until there is clear consensus on one. If there is no consensus after N runs, reframe the task to provide direction for a fourth attempt. Continue until a clear winning approach is found.
This is one example of an orchestration workflow. There are others.
> Then spin up a new agent to decide which approach of the three is best on the merits. Repeat this analysis in fresh contexts and sample until there is clear consensus on one.
If there are several agents doing analysis of the solutions, how do you define a consensus? Should it be unanimous, or above some threshold? Are the agents' scores soft or hard? How is the threshold defined if the scores are soft? There is a whole lot of science on voting approaches; which voting approach is best here?
Is it possible for the analyzing agents to choose the best of several wrong solutions? E.g., the longest remembered table of FizzBuzz answers among several remembered tables of FizzBuzz answers.
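To make the hard-versus-soft distinction I am asking about concrete, here is a purely hypothetical sketch (my own names and thresholds, not anyone's actual orchestration code):

    from collections import Counter

    def hard_consensus(votes, threshold=0.75):
        # votes: one candidate id per judging agent (hard votes).
        # Returns a winner only if its vote share clears the threshold.
        counts = Counter(votes)
        winner, n = counts.most_common(1)[0]
        return winner if n / len(votes) >= threshold else None

    def soft_consensus(scores, margin=0.15):
        # scores: one {candidate: score in [0, 1]} dict per judging agent.
        # Averages the scores; declares a winner only if it beats the
        # runner-up by at least `margin`, which is just one possible rule.
        totals = Counter()
        for judge in scores:
            for cand, s in judge.items():
                totals[cand] += s / len(scores)
        ranked = totals.most_common()
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
            return ranked[0][0]
        return None

    print(hard_consensus(["A", "A", "B"]))        # None: 2/3 is below 0.75
    print(hard_consensus(["A", "A", "A", "B"]))   # "A":  3/4 clears 0.75
    print(soft_consensus([{"A": 0.9, "B": 0.5},
                          {"A": 0.8, "B": 0.7}])) # "A":  average 0.85 vs 0.60

Even this toy version has to pick a threshold and a tie-breaking rule, and neither rule says anything about whether the winning solution is actually correct, which is the FizzBuzz question above.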
We have a voting algorithm that we use, but we would be getting into confidential disclosure if we proceeded further in this discussion. There's lots of research out there into unbiased voting algorithms for consensus systems.
You conveniently decided not to answer my question about the quality of the solutions being voted on (the ranking of FizzBuzz memorizations).
To me, our discussion shows that what you presented as a simple thing is not simple at all: even the voting is complex, and actually getting a good result is apparently so hard that it warrants omitting an answer altogether.
I had no expectations at all; I just asked questions, expecting answers. At the very beginning, the tone of your comment, as I read it, was "agentic coding is nothing but simple, look, they vote." Now the answers to simple but important questions are "confidential IP."
Okay then: agentic coding is nothing but a complex task requiring knowledge of unbiased voting (what is this thing, really?) and, apparently, the use of a necessarily heavy test suite and/or theorem provers.
> You should have test coverage, type checking, and integration tests that catch the edge cases automatically.
You should assume that if you are going to cover edge cases, your tests will be tens to hundreds of times as big as the code being tested. That is the case for several database engines (MariaDB has 24 MB of C++ in the sql directory and 288 MB of tests in the mysql-test directory), and it was the case when I developed a VHDL/Verilog simulator. And not everything can be covered with type checking: many things, but not all.
AMD had hundreds of millions of test cases for its FPU, and formal modeling caught several errors [1].
> This capability has been demonstrated multiple times, specially when it was politically convenient, like for example the intercepted Hamas calls that showed that some of the rockets fall inside Gaza by mistake.
Could it have been generated? One can present something that is politically convenient and untrue at the same time.
> Dion language demo (experimental project which stores program as AST).
Michael Franz [1] invented slim binaries [2] for the Oberon System. Slim binaries were program (or module) ASTs compressed with some kind of LZ-family algorithm. At the time, they were much smaller than Java's JAR files, despite a JAR being a ZIP archive.
Interesting. It looks to me like this was more about the portability of the resulting binary, IIUC.
The Dion project was more about the user interface to the programming language, about unifying tools around the AST (or typed AST?) as the source of truth instead of text, and about what that unlocks.
Their system allows for intermediate states with errors. If such an erroneous state can be stored to disk, then they are using a storage representation that is equivalent to text. If an erroneous state cannot be stored, that makes the Dion system much less usable, at least for me.
They also deliberately avoided the pitfalls of languages like C. That is their prerogative, but I'd like to see how they would extend their concepts, the user interface to the programming language and tools unified around a (typed) AST, to C or, forgive me, C++, and what that would unlock.
Since it allows monadic parsing combinators, it can parse context-sensitive grammars such as C's.
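To illustrate why the monadic part matters, here is a toy sketch in Python (my own hypothetical code, using the classic count-prefixed example rather than C itself): bind lets a later parser depend on an earlier result, which is exactly what a context-sensitive construct needs.

    def pure(value):
        # Parser that consumes nothing and yields `value`.
        return lambda text, pos: [(value, pos)]

    def bind(parser, make_next):
        # Run `parser`, then build the *next* parser from its result.
        # This dependency on an earlier result is what plain context-free
        # combinators lack and context-sensitive grammars need.
        def parse(text, pos):
            return [r for v, p in parser(text, pos) for r in make_next(v)(text, p)]
        return parse

    def char(c):
        def parse(text, pos):
            return [(c, pos + 1)] if pos < len(text) and text[pos] == c else []
        return parse

    def number():
        # One or more decimal digits, returned as an int.
        def parse(text, pos):
            end = pos
            while end < len(text) and text[end].isdigit():
                end += 1
            return [(int(text[pos:end]), end)] if end > pos else []
        return parse

    def exactly(n, parser):
        # Apply `parser` exactly n times, where n came from a previous parse.
        if n == 0:
            return pure([])
        return bind(parser, lambda x:
               bind(exactly(n - 1, parser), lambda xs: pure([x] + xs)))

    # Context-sensitive toy grammar: a count, then exactly that many 'x'.
    counted_xs = bind(number(), lambda n: exactly(n, char('x')))

    def accepts(parser, text):
        return any(pos == len(text) for _, pos in parser(text, 0))

    print(accepts(counted_xs, "3xxx"))  # True
    print(accepts(counted_xs, "3xx"))   # False

For C proper, the same dependency shows up in the typedef problem: whether "a * b;" parses as a declaration or an expression depends on names seen earlier.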
It would be interesting to see whether their demonstration around 08:30, where Visual Studio is unable to recover from an error properly, could be improved with error correction.
Arithmetic coding of a single bit preserves the ordering of the encoded bits, provided the coder assigns the subinterval for 1 above the subinterval for 0 (i.e., the cumulative frequencies follow symbol order). If a byte is encoded from its higher bits to its lower bits, arithmetic coding (even with a dynamic model) will preserve the ordering of individual bytes.
In the end, arithmetic coding preserves the ordering of the encoded strings. Thus, comparison operations can be performed on the compressed representation of strings (and on big-endian representations of integers and even floating-point values), without decompressing the data until the decompressed strings are actually needed.
Another view: strings are compared by memcmp as if they were mantissas with base 256. "hi!" is 'h'*(1/256) + 'i'*(1/256)^2 + '!'*(1/256)^3 + 0*(1/256)^4, and then zeros out to infinity. Arithmetic coding represents the encoded strings as mantissas with base 2. Range coding can use other bases, such as 256.
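As a sanity check on this claim, here is a toy experiment in Python (my own code, not a real coder: a uniform static model, an EOF symbol ordered below byte 0 so that intervals of distinct strings stay disjoint, and exact fractions instead of the usual renormalized integer arithmetic):

    from fractions import Fraction
    from itertools import product

    NSYM = 257  # 256 byte values + EOF

    def interval(data: bytes):
        # [low, low + width) assigned to data followed by EOF.
        low, width = Fraction(0), Fraction(1)
        for sym in [b + 1 for b in data] + [0]:   # EOF has the lowest index
            low += width * Fraction(sym, NSYM)
            width *= Fraction(1, NSYM)
        return low, low + width

    def encode(data: bytes) -> str:
        # Shortest bit string whose value 0.b1b2... lies in the interval.
        lo, hi = interval(data)
        v, w, bits = Fraction(0), Fraction(1, 2), []
        while v < lo:
            bit = 1 if v + w < hi else 0
            bits.append(str(bit))
            v += bit * w
            w /= 2
        return ''.join(bits)

    # Ordering by code bits (compared left to right, like memcmp) should
    # equal the ordering of the original strings.
    strings = [bytes(p) for n in range(4) for p in product(b'!ah', repeat=n)]
    print(sorted(strings) == sorted(strings, key=encode))   # expected: True

Any static model whose cumulative intervals follow symbol order behaves the same way; the uniform one is just the shortest to write down.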
Intriguing claim! At first I was skeptical, thinking that there would be an issue with leading zeros being discarded. However, with some piloting, I was able to use Claude and ChatGPT to construct a proof of your claim.
Sketch of the argument:
First, an arithmetic coder maps strings to non-overlapping subintervals of [0, 1) that respect lexicographic order.
Second, the process of emitting the final encoding preserves this: if enc(s) ∈ I(s), enc(t) ∈ I(t), and I(s) lies below I(t), then enc(s) <= enc(t).
Finally, binary fractions compared left-to-right bitwise yield the same order as their numerical values — this is just memcmp.
Thus, we have a proof of your claim that arithmetic coding preserves lexicographic order! Nice result!
My mistake was in thinking that leading zeros are discarded -- it is trailing zeros that are discarded!
Alphabet Inc., as YouTube's owner, faces a class-action lawsuit [1] which alleges that the platform enables bad behavior and promotes behavior leading to mental health problems.
In my not-so-humble opinion, what AI companies enable (and what this particular bot demonstrated) is bad behavior that can lead to mental health problems for software maintainers, particularly because of the sheer amount of work needed to read excessively lengthy documentation and to review often huge amounts of generated code. Never mind the attempted smear we are discussing here.
I would like to add that sugar consumption is a risk factor for many dependencies, including, but not limited to, opioid dependency [1]. And LLM addiction can be seen as fallout from sugar overconsumption in general.
I definitely don't deny that LLM addiction exists, but what I take issue with is attempting to paint literally everyone who uses LLMs and thinks they are useful, interesting, or effective as addicted or as falling for confidence or persuasion tricks.
Did he do so? I read his comment as a sad take on the situation when one realizes that one is talking to a machine instead of (directly) to another person.
In my opinion, participating in a discussion through an LLM is a sign of excessive LLM use, which can be a sign of LLM addiction.
Off the top of my head, the overhead is around 10% or so.
https://www.youtube.com/watch?v=lEBQveBCtKY

Apparently FP80, which is even wider than FP64, is beneficial for pathfinding algorithms in games.
Pathfinding for hundreds of units is a task worth putting on the GPU.
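To illustrate why extended precision can matter for long accumulated path costs, here is a small, hypothetical experiment in Python (not from the linked video; numpy.longdouble is the 80-bit x87 format on typical x86 Linux builds, but may silently be plain float64 on other platforms):

    import numpy as np

    # Ten million edge costs of 0.1 each; the exact accumulated cost is 1,000,000.
    n = 10_000_000

    for dtype in (np.float32, np.float64, np.longdouble):
        costs = np.full(n, 0.1, dtype=dtype)
        total = np.cumsum(costs)[-1]        # sequential accumulation, like a
        print(np.dtype(dtype).name, total)  # running path-cost sum

The single-precision total visibly drifts away from 1,000,000 once the running sum grows large enough to absorb the small increments, while the wider formats stay close.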