"Paris is the capital of France" is a coherent sentence, just like "Paris dates back to Gaelic settlements in 1200 BC", or "France had a population of about 97,24 million in 2024".
The coherence of sentences generated by LLMs is "emergent" from the unbelievable amount of data and training, just like the correct factoids ("Paris is the capital of France").
It shows that artificial neural networks using this architecture and training process can learn to use language fluently, which was the goal. Because language is tied to the real world, being able to make true statements about the world is to some degree part of being fluent in a language, which is never just syntax but also semantics.
I get what you mean by "miracle", but your argument revolving around it doesn't seem logical to me, apart from the question: what is the "other miracle" supposed to be?
Zooming out, this seems to be part of the issue: semantics (concepts and words) map the world neatly, and they have emergent properties that help not just describe, but sometimes also predict or understand the world.
But logic seems to exist outside of language to a degree, being described by it. Just like the physical world.
Humans are able to reason logically, not always correctly, but language allows for peer review and refinement. Humans can observe the physical world. And then put all of this together using language.
But applying logic or being able to observe the physical world doesn't emerge from language. Language seems like an artifact of doing these things and a tool to do them in collaboration, but it only carries logic and knowledge because humans left these traces in "correct language".
> But applying logic or being able to observe the physical world doesn't emerge from language. Language seems like an artifact of doing these things and a tool to do them in collaboration, but it only carries logic and knowledge because humans left these traces in "correct language".
That's not the only element that went into producing the models. There's also the anthropic principle - they test them with benchmarks (that involve knowledge and truthful statements) and then don't release the ones that fail the benchmarks.
And there is Reinforcement Learning, which is essential to make models act "conversational" and coherent, right?
But I wanted to stay abstract and not go into too much detail outside my knowledge and experience.
With the GPT-2 and GPT-3 base models, you could easily produce "conversations" by writing fitting preludes (e.g. in interview style), but these went off the rails quickly, often in comedic ways.
Part of that surely is also due to model size.
But RLHF seems more important.
I enjoyed the rambling and even that was impressive at the time.
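A minimal sketch of that kind of prelude priming, assuming the Hugging Face transformers library and the public gpt2 checkpoint (the prelude text itself is just an illustration, not from the original experiments):

```python
# Sketch: priming a base (non-chat) model with an interview-style prelude.
# Assumes the Hugging Face "transformers" library and the public "gpt2"
# checkpoint are available; the prelude is a made-up example.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prelude = (
    "Interviewer: Welcome to the show. What first drew you to astronomy?\n"
    "Guest: Thank you for having me."
)

# A base model simply continues the text; with sampling enabled it tends
# to drift off the rails after a few turns, as described above.
out = generator(prelude, max_new_tokens=60, do_sample=True, temperature=0.9)
print(out[0]["generated_text"])
```

Without the chat-style post-training, nothing anchors the model to the "Interviewer"/"Guest" roles beyond the prelude itself, which is why these transcripts tend to derail.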
I guess the "anthropic principle" you are referring to works in a similar direction, although in a different way (selection, not training).
The only context in which I've heard details about post-training selection processes so far was this article about OpenAI's model updates from GPT-4o onwards, discussed earlier here:
The parts about A/B-Testing are pretty interesting.
The focus is ChatGPT as an enticing consumer product and maximizing engagement, not so much benchmarks or the usefulness of the models. It does briefly address the friction between usefulness and sycophancy, though.
Anyway, it's pretty clever to use the wording "anthropic principle" here; I had only known its metaphysical usage (why do humans exist?).