You're way off; this reads more like anti-capitalist political rhetoric than real reasoning.
Look at Nvidia's Nemotron series. They have become a leading open-source training lab themselves, and they're releasing the best training data, training tooling, and models at this point.
When are people going to drop the assumption that immigration is good at all costs?
We need a well-managed set of immigration policies or countries WILL take advantage of the US. These are our military rivals, and we sell our most advanced math, physics, and engineering seats to the highest bidder. It's a self-destructing disaster, and it's not just on us to treat people better.
Look at the rate of Indian asylum seekers in Canada for the most extreme case. It happens anywhere you extend naivety and boundless goodwill.
That's especially encouraging to me because those are all about generalization.
5 and 5.1 both felt overfit and would break down and be stubborn when you got them outside their lane, as opposed to Opus 4.5, which is lovely at self-correcting.
It's one of those things you really feel in the model: not whether it can tackle a harder problem or not, but whether I can go back and forth with this thing, learning and correcting together.
This whole release makes me insanely optimistic. They pushed this much improvement WITHOUT the huge new data centers and without a newly scaled base model. That's incredibly encouraging for what comes next.
Remember, the next big data centers are 20-30x the chip count with 6-8x the efficiency on the new chips; multiplied out, that's very roughly 120-240x the effective compute, assuming the gains compound.
I expect they can saturate the benchmarks WITHOUT any novel research or algorithmic gains. But at this point it's clear they're capable of pushing research qualitatively as well.
> 5 and 5.1 both felt overfit and would break down and be stubborn when you got them outside their lane, as opposed to Opus 4.5, which is lovely at self-correcting.
This is simply the "openness vs. directive-following" spectrum, which as a side effect produces the sycophancy spectrum, and none of them have found an answer to it yet.
Recent GPT models follow directives more closely than Claude models and are less sycophantic. Even the Claude 4.5 models are still somewhat prone to "You're absolutely right!"; GPT 5+ (API) models never do this. The byproduct is that the sycophantic models are willing to self-correct, while the directive-followers are more stubborn.
Opus 4.5 answers most of my non-question comments with ‘you're right.' as the first thing in the output. At least I'm not absolutely right; I'll take this as an improvement.
Hah, maybe 5th gen Claude will change to "you may be right".
The positive thing is that it seems to be more performative than anything. Claude models will say "you're [absolutely] right" and then immediately do something that contradicts it (because you weren't right).
Gemini 3 Pro seems to have struck a decent balance between stubbornness and you're-right-ness, though I still need to test it more.
In my testing, 5.2 seems worse on overfitting for esoteric logic puzzles: tests using precise language, where attention has to be paid to picking the correct definition among many for a given word. It now charges ahead with wrong definitions more often, with noticeably lower accuracy.
Slight tangent, but I think it's quite interesting: you can try out the ARC-AGI 2 tasks by hand at this website [0] (along with other similar problem sets). It really puts into perspective the type of thinking AI is learning!
You haven't actually looked at their fundamentals. They're profitable serving current models, including training costs, and are only losing money on R&D for future training runs; if you project future revenue growth onto future generations of models, you get a clear path to profitability.
They charge higher prices than OpenAI and have faster-growing API demand. They have great margins on inference compared to the rest of the industry.
Sure, the revenue growth could stop, but it hasn't, and there's no reason to think it will.
> They’re profitable serving current models including training costs
I hear this a lot. Do you have a good source (apart from their CEO saying it in an interview)? I might have more faith in him, but, checks notes, it's late 2025 and AI is not writing all our code yet (amongst other mental things he's said).
The best I can find is this TechCrunch article, which appears to be referencing a paywalled article from The Information.
> The Information reports that Anthropic expects to generate as much as $70 billion in revenue and $17 billion in cash flow in 2028. The growth projections are fueled by rapid adoption of Anthropic’s business products, a person with knowledge of the company’s financials said.
> That said, the company expects its gross profit margin — which measures a company’s profitability after accounting for direct costs associated with producing goods and services — to reach 50% this year and 77% in 2028, up from negative 94% last year, per The Information.
So assuming that gross margin is GAAP (which it probably isn't), this would suggest that the cost of training is covered by inference sales this year (which is definitely good).
However, I'm still a little sceptical, as the cost to train new models is (apparently) going up super-linearly, which means the revenue from inference needs to go up alongside it.
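To make those quoted figures concrete, here's a toy back-of-the-envelope in Python (it takes the reported revenue and margin numbers at face value; everything else about their cost structure is unknown, so this is illustration, not analysis):

    # gross_margin = (revenue - direct_costs) / revenue
    # => direct_costs = revenue * (1 - gross_margin)

    projected_2028_revenue = 70e9   # "$70 billion in revenue ... in 2028"
    margins = {
        "last year": -0.94,  # "negative 94% last year"
        "this year": 0.50,   # "50% this year"
        "2028": 0.77,        # "77% in 2028"
    }

    # A -94% margin means direct costs were ~1.94x revenue last year,
    # so the swing to +50% is a massive cost-vs-revenue shift.
    for label, m in margins.items():
        print(f"{label}: direct costs = {1 - m:.2f}x revenue")

    implied_2028_costs = projected_2028_revenue * (1 - margins["2028"])
    print(f"implied 2028 direct costs: ${implied_2028_costs / 1e9:.1f}B")  # ~$16.1B

On those numbers, the projected cash flow only materializes if inference revenue keeps outpacing that super-linearly growing training spend.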
Interesting to think about though, thanks for the source!
I think Claude Code is the moat (though I definitely recognize it's a pretty shallow moat). I don't want to switch to Codex or whatever the Gemini CLI is; I like Claude Code and I've gotten used to how it works.
Again, I know that's a shallow moat - agents just aren't that complex from a pure code perspective, and there are already tools that you can use to proxy Claude Code's requests out to different models. But at least in my own experience there is a definite stickiness to Claude that I probably won't bother to overcome if your model is 1.1x better. I pay for Google Business or whatever it's called primarily to maintain my vanity email and I get some level of Gemini usage for free, and I barely touch it, even though I'm hearing good things about it.
(If anything I'm convincing myself to give Gemini a closer look, but I don't think that undermines my overarching (though slightly soft) point).
My workflow has gone from:
1. using Claude Code exclusively (back when it really was on another level from the competition), to
2. switching back and forth with CC using the Z.ai GLM 4.6 backend (very close to a drop-in replacement these days) after the quota on the Claude Pro plan was massively cut, to
3. now primarily using OpenCode with the Claude Code backend, the Sonnet 4.5 GitHub Copilot backend, or the Z.ai GLM 4.6 backend (in that order of priority).
OpenCode is so much faster than CC, even when using Claude Sonnet as the model (at least on the cheap Claude Pro plan; can't speak for Max). But it can't be entirely due to the Claude plan rate limiting, because OC is way faster than CC even when using Claude Code itself as the backend.
I became so ridiculously sick of waiting around for CC just to, like, move a text field or something; it was like watching paint dry. OpenCode isn't perfect, but it's very close these days and, as previously stated, crazy fast compared to CC.
Now that I'm no longer afraid of losing the unique value proposition of CC, my brand loyalty to Anthropic is incredibly tenuous; if they cut rate limits again or hurt my experience in the slightest way, it will be an insta-cancel.
So the market situation is much different than in the early days of CC as a cutting-edge novel tool, and relying on that first-mover status forever is increasingly untenable in my opinion. The competition has had a long time to catch up, and both proprietary options like Codex and model-agnostic FOSS tools are in a very strong position now. (The exception is Gemini CLI, which is still frustrating to use, as much as I wish it weren't; hopefully Google will fix the weird looping and other bugs ... eventually, because I really do like Gemini 3 and already pay for it via the AI Pro plan.)
Google Code Assist is pretty good. I had it create a fairly comprehensive inventory-tracking app within the quota you get with the $25 Google plan.
Google had PageRank, which gave them much better quality results (and they got users to stick with them by offering lots of free services, like Gmail, that were better than existing paid services). The difference was night and day compared to the best other search engines at the time (WebCrawler was my go-to, then sometimes AltaVista). As Google grew it got better and better at giving you good results and fighting spam, while other search engines drowned in spam and were completely ruined by SEO.

The quality difference between "foundation" models, by contrast, is nil. Even the huge models they run in datacenters are hardly better than local models you can run on a machine with 64 GB+ of RAM (though faster, of course).
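For anyone who never used pre-Google search, here's a toy sketch of what PageRank computes, since it's the algorithm everyone in this thread keeps crediting (the link graph below is made up, and real PageRank had many refinements on top of this):

    # Minimal PageRank via power iteration: a page's rank comes from the
    # ranks of the pages linking to it, not from keyword frequency.
    def pagerank(links, damping=0.85, iters=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outgoing in links.items():
                if not outgoing:  # dangling page: spread its rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / n
                else:
                    for target in outgoing:
                        new_rank[target] += damping * rank[page] / len(outgoing)
            rank = new_rank
        return rank

    toy_web = {
        "home": ["docs", "blog"],
        "docs": ["home"],
        "blog": ["home", "docs"],
        "spam": ["home"],   # nobody links to the spam pages, so they
        "spam2": ["home"],  # end up with only the baseline teleport rank
    }
    print(sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]))

Pages that other well-ranked pages link to float to the top regardless of how often a query keyword appears on them, which is exactly what keyword-matching engines like AltaVista couldn't do.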
PageRank wasn't that much better. It was better, and word spread. Google also had a very clean UI at a time when websites like Excite and Yahoo had super bloated pages.
That was the differentiation. What makes you think AI companies can't find moats similar to Google's? With the right UX and the right model, a winner can race past everyone.
I remember the pre-Google days when AltaVista was the best search engine, just doing keyword matching, and of course you would therefore have to wade through pages of results to hopefully find something of interest.
Google was like night & day. PageRank meant that typically the most useful results would be on the first page.
PageRank. Everything before PageRank was more like the yellow pages than a search engine as we know it today. Google also had a patent on it, so it's not like other people could simply copy it.
Google was also way more minimal (and therefore faster on slow connections) and it raised enough money to operate without ads for years (while its competitors were filled with them).
Not really comparable to today, when you have 3-4 products which are pretty much identical, all operating at a huge loss.
Is Claude Code even running at a marginal profit? (who knows)
Is the marginal profit large enough to pay for continued R&D to stay competitive? (no)
Does Claude Code have a sustainable advantage over what Amazon, Microsoft, and Google can do in this space, using their incumbency advantage, actual profits, and their own infrastructure?
Assuming by "they" you mean current shareholders (who include Google, Amazon, and VCs): if they are selling at least in part, why would at least some of them not be willing to sell their entire stakes?
> They could make more money keeping control of the company and have control.
This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model, only served to Pro users, but the biggest models are always used as teacher models for smaller ones. That's how distillation works. It would be stupid not to use the biggest model you have in distillation, and a waste, since they already have the weights.
They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed the GPT-4.5 run for whatever budget made sense, and then spent the rest on RL.
Sure, they chose not to serve the large base models anymore for cost reasons.
But I'd guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve in inference without distillation.
In RL the efficiency is very different, because you have to run inference on the model to draw online samples. So smaller models start to make more sense to scale.
Big model => distill => RL
makes the most theoretical sense for efficient training spend nowadays.
So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could resume scaling with if the returns justified it.
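For the curious, here's a minimal toy of that big model => distill => RL pipeline in PyTorch (the model shapes, data, and hyperparameters are all made up for illustration; this is the generic distillation recipe, not anyone's actual training setup):

    import torch
    import torch.nn.functional as F

    vocab, d_teacher, d_student = 1000, 512, 128

    teacher = torch.nn.Sequential(  # stands in for the big pretrained model
        torch.nn.Embedding(vocab, d_teacher),
        torch.nn.Linear(d_teacher, vocab),
    )
    student = torch.nn.Sequential(  # the small model you actually serve
        torch.nn.Embedding(vocab, d_student),
        torch.nn.Linear(d_student, vocab),
    )
    opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

    T = 2.0  # distillation temperature
    for step in range(100):
        tokens = torch.randint(0, vocab, (32,))  # toy stand-in for real text
        with torch.no_grad():
            teacher_logits = teacher(tokens)  # teacher stays frozen
        student_logits = student(tokens)
        # Match the student's distribution to the teacher's soft targets:
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The distilled student is what you'd then run RL on, since RL has to
    # sample from the model constantly and small models sample cheaply.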
My understanding of 4.5 was that it was released long, long after the initial training run finished. It also had an older cutoff date than the newer 4o models.
Cutoff dates seem to be Oct 2024 for GPT-4.5, and Jan 2025 for the Gemini models.
It kind of explains a coding issue I had with TradingView, who update their Pine Script thing quite frequently. ChatGPT seemed to have issues with v4 vs. v5.
After a long stretch of work, I just ask Codex to:
Audit the code base and recent changes; describe the data model and all related and possibly overlapping functions.
Plan a redesign that is simpler, has less code, more reuse, and cleaner design.
Execute the refactor.
Review the code, assess the new code, and re-audit.
… repeat
You can queue these up in Codex and it will just go about its way, reducing your tech debt faster than an engineer could.
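For what it's worth, here's roughly how that queue can be scripted (a sketch; it assumes the Codex CLI's non-interactive `codex exec <prompt>` invocation, so adjust to whatever your setup actually exposes):

    import subprocess

    PROMPTS = [
        "Audit the code base and recent changes; describe the data model "
        "and all related and possibly overlapping functions.",
        "Plan a redesign that is simpler, has less code, more reuse, "
        "and cleaner design.",
        "Execute the refactor.",
        "Review the code, assess the new code, and re-audit.",
    ]

    for round_num in range(3):  # "... repeat"
        for prompt in PROMPTS:
            print(f"[round {round_num + 1}] {prompt[:50]}...")
            subprocess.run(["codex", "exec", prompt], check=True)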
I think this might work for smaller codebases, but the main point of my article isn't really about vibe coding. Vibe coded apps are typically smaller anyway, so refactoring isn't that big of an issue there.
When we're talking about actual software that has been around for a while and has accumulated serious tech debt, it's not so easy. I've definitely worked on apps where your described approach doesn't lead to anything viable. It's just too much for an AI to grasp when you have years of accumulated complexity, dependencies, and business logic spread across a large codebase.
Regarding vibe coders specifically: I think people who can't code themselves often don't really know what "cleaner design" or "more reuse" actually means in practice. They can certainly learn, but once they do, they're probably not vibe coders anymore.
With AI generation of code or text, I have found that quality-improvement passes have to be run multiple times, successively, until the output reaches my expectations. Prompts must also be refined before letting it run multiple times.
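As a concrete sketch of that loop (the generate and judge_quality helpers below are trivial stubs standing in for your actual model calls):

    def generate(prompt: str) -> str:
        # stub: call your LLM of choice here
        return prompt.upper()

    def judge_quality(draft: str) -> float:
        # stub: a real version might use a rubric prompt or heuristics
        return 1.0 if draft.isupper() else 0.0

    def refine(prompt: str, draft: str, max_passes: int = 5,
               target: float = 0.9) -> str:
        # Re-run the improvement prompt until quality clears the bar
        # (or the pass limit is hit), mirroring the successive manual
        # passes described above.
        for _ in range(max_passes):
            if judge_quality(draft) >= target:
                break
            draft = generate(f"{prompt}\n\nImprove this draft:\n{draft}")
        return draft

    print(refine("Write a summary of the migration.", "messy first draft"))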
They have been able to write languages for two years now.
I think I was the first to write an LLM language, and the first to use LLMs to write a language, with this project (right at ChatGPT launch, GPT-3.5):
https://github.com/nbardy/SynesthesiaLisp