Hacker News | nbardy's comments

You're way off; this reads more like anti-capitalist political rhetoric than real reasoning.

Look at Nvidia's Nemotron series. They have become a leading open-source training lab themselves, and at this point they're releasing the best training data, training tooling, and models.


When are people going to drop the "immigration is good at all costs" assumption?

We need a well-managed set of immigration policies or other countries WILL take advantage of the US. These are our military rivals, and we sell our most advanced math, physics, and engineering seats to the highest bidder. It's a self-destructive disaster, and it's not just on us to treat people better.

Look at the rate of Indian asylum seekers in Canada to see the most extreme case. It happens anywhere you extend naivety and boundless goodwill.


Those ARC-AGI-2 improvements are insane.

That's especially encouraging to me because those are all about generalization.

5 and 5.1 both felt overfit and would break down and get stubborn when you took them outside their lane, as opposed to Opus 4.5, which is lovely at self-correcting.

It's one of those things you really feel in the model: not whether it can tackle a harder problem, but whether I can go back and forth with it, learning and correcting together.

This whole release makes me insanely optimistic. If they can push this much improvement WITHOUT the huge new data centers and without a new scaled base model, that's incredibly encouraging for what comes next.

Remember, the next big data centers are 20-30x the chip count, with 6-8x the efficiency on the new chips.

I expect they can saturate the benchmarks WITHOUT any novel research or algorithmic gains. But at this point it's clear they're capable of pushing research qualitatively as well.


It's also possible that OpenAI used a lot of human-generated, ARC-like data in training (semi-cheating). OpenAI has plenty of incentive to fake a high score.

Without full disclosure of the training data, you can never be sure whether good performance comes from memorization or "semi-memorization".


> 5 and 5.1 both felt overfit and would break down and get stubborn when you took them outside their lane, as opposed to Opus 4.5, which is lovely at self-correcting.

This is simply the "openness vs. directive-following" spectrum, which as a side effect produces the sycophancy spectrum, and none of them have found an answer to it yet.

Recent GPT models follow directives more closely than Claude models, and are less sycophantic. Even the Claude 4.5 models are still somewhat prone to "You're absolutely right!"; GPT 5+ (API) models never do this. The byproduct is that the Claude models are willing to self-correct, while the GPT models are more stubborn.


Opus 4.5 answers most of my non-question comments with "You're right." as the first thing in the output. At least I'm not *absolutely* right; I'll take that as an improvement.


Hah, maybe 5th gen Claude will change to "you may be right".

The positive thing is that it seems to be more performative than anything. Claude models will say "you're [absolutely] right" and then immediately do something that contradicts it (because you weren't right).

Gemini 3 Pro seems to have struck a decent balance between stubbornness and you're-right-ness, though I still need to test it more.


In my testing, 5.2 seems worse on overfitting for esoteric logic puzzles: tests using precise language where attention has to be paid to picking the correct definition among many for a given word. It now charges ahead with the wrong definitions, with far lower accuracy than before.

Same. The ARC-AGI-2 result also got my attention. That's meaningful, and a HUGE leap.


Slight tangent, but I think it's quite interesting: you can try the ARC-AGI-2 tasks by hand at this website [0] (along with other similar problem sets). It really puts into perspective the type of thinking AI is learning!

[0] https://neoneye.github.io/arc/?dataset=ARC-AGI-2


You haven't actually looked at their fundamentals. They're profitable serving current models, including training costs, and are only losing money on R&D for future training; if you project future revenue growth onto future generations of models, you get a clear path to profitability.

They charge higher prices than OpenAI and have faster-growing API demand. They have great margins on inference compared to the rest of the industry.

Sure, the revenue growth could stop, but it hasn't, and there's no reason to think it will.


> They're profitable serving current models, including training costs

I hear this a lot; do you have a good source (apart from their CEO saying it in an interview)? I might have more faith in him, but, *checks notes*, it's late 2025 and AI is not writing all our code yet (among other wild things he's said).


The best I can find is this TechCrunch article, which appears to be referencing a paywalled article from The Information.

> The Information reports that Anthropic expects to generate as much as $70 billion in revenue and $17 billion in cash flow in 2028. The growth projections are fueled by rapid adoption of Anthropic’s business products, a person with knowledge of the company’s financials said.

> That said, the company expects its gross profit margin — which measures a company’s profitability after accounting for direct costs associated with producing goods and services — to reach 50% this year and 77% in 2028, up from negative 94% last year, per The Information.

https://techcrunch.com/2025/11/04/anthropic-expects-b2b-dema...


So assuming that gross margin is GAAP (which it probably isn't), this would suggest that the costs of training are covered by inference sales this year (which is definitely good).

However, I'm still a little sceptical, as the cost to train new models is (apparently) going up super-linearly, which means the revenue from inference needs to go up alongside it.
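As a rough sanity check on what those quoted figures imply, here's a minimal sketch using only the numbers from The Information above; the cost-of-revenue number is derived, not something Anthropic has reported:

    # Gross margin = (revenue - cost of revenue) / revenue, per the
    # definition quoted above. The implied cost of revenue is derived
    # here from the projections, not reported by Anthropic.
    revenue_2028 = 70e9          # projected 2028 revenue ($70B)
    gross_margin_2028 = 0.77     # projected 2028 gross margin

    implied_cost_of_revenue = revenue_2028 * (1 - gross_margin_2028)
    print(f"Implied 2028 cost of revenue: ${implied_cost_of_revenue / 1e9:.1f}B")  # ~$16.1B

    # And a -94% margin last year means costs were ~1.94x revenue.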

Interesting to think about though, thanks for the source!


We will all have a great source if they IPO :)


Why are you assuming Anthropic is for sale? They have a clear path to profitability, booming growth, and a massive, mission-driven founding team.

They could make more money by keeping the company, and retain control.


> They have a clear path to profitability

I'd love to see evidence for such a thing, because it's not clear to me at all that this is the case.

I personally think they're the best of the model providers but not sure if any foundation model companies (pure play) have a path to profitability.


What do you mean by pure play? Claude Code alone is $1B in revenue. It's not just the API they make money on.

https://www.anthropic.com/news/anthropic-acquires-bun-as-cla...


But there's no moat around these models; they're all interchangeable and leapfrogging each other at a decent pace.

Gemini could get much better tomorrow and their entire customer base could switch without issue.


I think Claude Code is the moat (though I definitely recognize it's a pretty shallow moat). I don't want to switch to Codex or whatever the Gemini CLI is; I like Claude Code and I've gotten used to how it works.

Again, I know that's a shallow moat - agents just aren't that complex from a pure code perspective, and there are already tools that you can use to proxy Claude Code's requests out to different models. But at least in my own experience there is a definite stickiness to Claude that I probably won't bother to overcome if your model is 1.1x better. I pay for Google Business or whatever it's called primarily to maintain my vanity email and I get some level of Gemini usage for free, and I barely touch it, even though I'm hearing good things about it.

(If anything I'm convincing myself to give Gemini a closer look, but I don't think that undermines my overarching (though slightly soft) point).


I went from:

  1. using Claude Code exclusively (back when it really was on another level from the competition) to

  2. switching back and forth with CC using the Z.ai GLM 4.6 backend (very close to a drop-in replacement these days) due to CC massively cutting down the quota on the Claude Pro plan to

  3. now primarily using OpenCode with the Claude Code backend, or the Sonnet 4.5 GitHub Copilot backend, or the Z.ai GLM 4.6 backend (in that order of priority)

OpenCode is so much faster than CC even when using Claude Sonnet as the model (at least on the cheap Claude Pro plan; can't speak for Max). But it can't be entirely due to the Claude plan rate limiting, because it's way faster than CC even when using Claude Code itself as the backend in OC.

I became so ridiculously sick of waiting around for CC just to, like, move a text field or something; it was like watching paint dry. OpenCode isn't perfect, but it's very close these days and, as previously stated, crazy fast in comparison to CC.

Now that I'm no longer afraid of losing the unique value proposition of CC, my brand loyalty to Anthropic is incredibly tenuous; if they cut rate limits again or hurt my experience in the slightest way, it will be an insta-cancel.

So the market situation is much different than in the early days of CC as a cutting-edge novel tool, and relying on that first-mover status forever is increasingly untenable in my opinion. The competition has had a long time to catch up, and both proprietary options like Codex and model-agnostic FOSS tools are in a very strong position now (except Gemini CLI, which is still frustrating to use as much as I wish it weren't; hopefully Google will fix the weird looping and other bugs... eventually, because I really do like Gemini 3 and already pay for it via the AI Pro plan).


You've convinced me to give OpenCode a try!


Google Code Assist is pretty good. I had it create a pretty comprehensive inventory-tracking app within the quota you get with the $25 Google plan.


And if your revenue is $1B but your costs are $2B, it only lasts until the music stops...


I don’t think they are losing money on inference.

Model training, sure. But that will slow down at some point.


Why do you think so? Seems like the space is fiercely competitive, I would expect it to get more expensive.


How many times are people going to repeat this lazy statement?

If Claude Code's revenue grows faster than its costs, it will become profitable.


"If claude code's revenue grows faster than cost, it will become profitable."

No shit?


What was the moat in search?


Google had PageRank, which gave them much better quality results (and they got users to stick with them by offering lots of free services, like Gmail, that were better than existing paid services). The difference was night and day compared to the best other search engines at the time (WebCrawler was my go-to, then sometimes AltaVista).

The quality difference between "foundation" models is nil. Even the huge models they run in datacenters are hardly better than local models you can run on a machine with 64GB+ of RAM (though faster, of course). As Google grew it got better and better at giving you good results and fighting spam, while other search engines drowned in spam and were completely ruined by SEO.
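(For anyone curious, the core of PageRank is just a power iteration over the link graph. A minimal toy sketch with the standard 0.85 damping factor, nothing like Google's production system:)

    # Toy PageRank power iteration: a page's rank is fed by the ranks
    # of the pages linking to it, split across their outlinks.
    def pagerank(links, damping=0.85, iters=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outlinks in links.items():
                if not outlinks:  # dangling page: spread its rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / n
                else:
                    share = damping * rank[page] / len(outlinks)
                    for target in outlinks:
                        new_rank[target] += share
            rank = new_rank
        return rank

    # "c" is linked by both other pages, so it ends up ranked highest.
    print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))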


PageRank wasn't that much better. It was better, and word spread. Google also had a very clean UI at a time when websites like Excite and Yahoo had super-bloated pages.

That was the differentiation. What makes you think AI companies can't find moats similar to Google's? With the right UX and the right model, a winner can race past everyone.


> PageRank wasn't that much better

It really was!

I remember the pre-Google days, when AltaVista was the best search engine, just doing keyword matching, and you would therefore have to wade through pages of results hoping to find something of interest.

Google was like night & day. PageRank meant that typically the most useful results would be on the first page.


PageRank. Everything before PageRank was more like the Yellow Pages than a search engine as we know it today. Google also had a patent on it, so it's not like other people could simply copy it.

Google was also way more minimal (and therefore faster on slow connections), and it raised enough money to operate without ads for years (while its competitors were filled with them).

Not really comparable to today, when you have 3-4 products that are pretty much identical, all operating at a huge loss.


The sheer amount of data and infrastructure Google has relative to their competitors.

Just having far more user search queries and click data gives them a huge advantage.


Google is in a two sided market. Their moat in search is their ads market share, their moat in ads is their search market share.


And the same questions I always ask:

Are they profitable? (No.)

Is Claude Code even running at a marginal profit? (Who knows.)

Is the marginal profit large enough to pay for continued R&D to stay competitive? (No.)

Does Claude Code have a sustainable advantage over what Amazon, Microsoft, and Google can do in this space, using their incumbency advantage, actual profits, and their own infrastructure?


Which is not a lot at all compared to their costs, and especially compared to the valuation discussed here.


> What do you mean by pure play?

They essentially just sell tokens. Much like OpenAI, but very different from Google or Microsoft, who make their money elsewhere.


They are selling, to public equity investors, because they can get a better price that way than selling to another company!


> Why are you assuming Anthropic is for sale?

They're preparing for IPO?


Assuming by "they" you mean current shareholders (who include Google, Amazon, and VCs): if they are selling at least in part, why would at least some of them not be willing to sell their entire stakes?

> They could make more money keeping control of the company and have control.

It depends on how much they can sell for.


We're not assuming anything; this whole post is about them doing an IPO…


That's not clear at all; at best your statement is controversial, if not outright dubious.


signed D. Amodei lmao


This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and only served to Pro users, but the biggest models are always used as teacher models for smaller ones. That's how you do distillation. It would be stupid not to use the biggest model you have for distillation, and a waste, since they already have the weights.

They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.

Sure, they chose not to serve the large base models anymore for cost reasons.

But I'd guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve for inference without distillation.

In RL the efficiency calculus is very different, because you have to sample the model at inference time to draw online samples. So small models start to make more sense to scale.

Big model => distill => RL

makes the most theoretical sense for efficient training spend nowadays.
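To make the distillation step concrete: it's typically a KL loss pushing the small student toward the big teacher's next-token distribution. A minimal sketch (PyTorch, assuming HuggingFace-style models with a `.logits` output; not a claim about any lab's actual recipe):

    import torch
    import torch.nn.functional as F

    def distill_step(teacher, student, input_ids, optimizer, temperature=2.0):
        # The big teacher is only run forward; no gradients needed.
        with torch.no_grad():
            teacher_logits = teacher(input_ids).logits
        student_logits = student(input_ids).logits

        # Soft-label distillation (Hinton et al., 2015): KL between
        # temperature-softened teacher and student distributions.
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()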

So they already trained a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.


My understanding of 4.5 was that it was released long, long after the initial training run finished. It also had an older cutoff date than the newer 4o models


Cutoff dates seem to be Oct 2024 for GPT-4.5, and Jan 2025 for the Gemini models.

It kind of explains a coding issue I had with TradingView, which updates its Pine Script language quite frequently. ChatGPT seemed to have issues with v4 vs. v5.


There's another paper that shows you can get the same effect by training autoregressively on fill-in-the-middle data.

So it's more about the masked-modeling objective than diffusion.
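(For context, fill-in-the-middle training is just a data-reordering trick: split each document into prefix/middle/suffix and rearrange it with sentinel tokens so a plain left-to-right model learns to infill. A minimal sketch; the sentinel strings vary by tokenizer and are assumptions here:)

    import random

    PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

    def to_fim(document):
        # Pick two cut points, yielding prefix / middle / suffix.
        i, j = sorted(random.sample(range(len(document) + 1), 2))
        prefix, middle, suffix = document[:i], document[i:j], document[j:]
        # PSM ordering: the model sees prefix and suffix, then
        # autoregressively predicts the middle.
        return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

    print(to_fim("def add(a, b):\n    return a + b\n"))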


Which paper is that?


This is just not a problem. After a long stretch of work, I just ask Codex to:

Audit the code base and recent changes, and describe the data model and all related and possibly overlapping functions.

Plan a redesign that is simpler, has less code, more reuse, and a cleaner design.

Execute the refactor.

Review and assess the new code, and re-audit.

… repeat

You can queue these up in Codex and it will just go about its way, reducing your tech debt way faster than an engineer could.
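If you'd rather script that loop than queue the prompts by hand, here's a minimal sketch (this assumes the Codex CLI's non-interactive `codex exec <prompt>` mode; treat the exact command as an assumption about your install):

    import subprocess

    # Hypothetical driver for the audit -> plan -> refactor -> review loop.
    PROMPTS = [
        "Audit the code base and recent changes; describe the data model "
        "and all related and possibly overlapping functions.",
        "Plan a redesign that is simpler, has less code, more reuse, "
        "and a cleaner design.",
        "Execute the refactor.",
        "Review and assess the new code, then re-audit.",
    ]

    for _ in range(3):  # repeat the whole cycle a few times
        for prompt in PROMPTS:
            subprocess.run(["codex", "exec", prompt], check=True)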


I think this might work for smaller codebases, but the main point of my article isn't really about vibe coding. Vibe coded apps are typically smaller anyway, so refactoring isn't that big of an issue there.

When we're talking about actual software that has been around for a while and has accumulated serious tech debt, it's not so easy. I've definitely worked on apps where your described approach doesn't lead to anything viable. It's just too much for an AI to grasp when you have years of accumulated complexity, dependencies, and business logic spread across a large codebase.

Regarding vibe coders specifically: I think people who can't code themselves often don't really know what "cleaner design" or "more reuse" actually means in practice. They can certainly learn, but once they do, they're probably not vibe coders anymore.


With AI generation of code or text, I have found that quality-improvement passes have to be run multiple times, successively, until the quality reaches my expectations. Prompts must also be refined before letting it run multiple times.
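That workflow is easy to wrap in a loop. A minimal sketch, where `generate` and `score_quality` are hypothetical stand-ins for your LLM client and whatever quality rubric you apply:

    def refine(draft, generate, score_quality, threshold=0.9, max_passes=5):
        # Run successive improvement passes until the quality score
        # meets the bar or we hit the pass limit.
        for _ in range(max_passes):
            if score_quality(draft) >= threshold:
                break
            draft = generate(
                "Improve the quality of the following text. Fix errors, "
                "tighten the prose, and preserve the meaning:\n\n" + draft
            )
        return draft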


In my experience, Codex fails at that kind of task in many cases.

I think it's unlikely to succeed with a real-world, old, large codebase.

Now, the next release might work!


You know nothing, Jon Snow...


They have been able to write languages for two years now.

I think I was the first to write an LLM language, and the first to use LLMs to write a language, with this project (right at ChatGPT launch, GPT-3.5): https://github.com/nbardy/SynesthesiaLisp


Yeah, I can't get Gemini to stop and think; even if I tell it not to write code, it will rewrite the code block each time.

