You'll need to make all US customers provide personal IDs for access first. I'm not American, but I do often hear how attached Americans can be to their personal firearms and how against providing their personal ID they can be.
Yeah, Opus/GPT need multiple rounds of reviews from each other to get to a clean auto review. Fable was like, it is done and indeed… crickets in bot comments. ‘No issues’ galore.
GPT-5.5 has been really hard to beat imho. I've spent $$$ on Opus, Deepseek v4 Pro and recently started to dogfood GLM-5.2 (which is not bad) but I cannot really trust any of them (almost blindly) like I can trust GPT-5.5. It gives me tremendous confidence. I cannot say the same for any of the others I mentioned.
>> I am on the opposite camp. Open models are starting to perform better. GPT 5.5 keeps on messing things up.
I'm working in a 600k+ LoC codebase that has complex domain-specific logic and lots of moving parts. I find that Codex 5.5 is pretty good at surgical fixes, but does not go out of its way to explore and figure out what those surgical fixes might break. So I only use it to work on parts of the system that are pretty isolated from everything else so that risk of regression is small.
Same here, reserved a 48GB M5 Pro shortly after seeing the news, and now I see the same retailer raised the price by over $1000. If they honor the sale, then this will be the most short term value I've gotten out of an HN submission ever.
To be fair to Codex, you can use any harness you want with it. Access is not gatekept by a crappy, slop-filled Electron app.
So just move to PI, or whatever.
Claude, on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use cowork, only once, will run a 2GB VM on app start, no f's given. at all.
Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.
Sidenote unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but the easy lifting is now done elsewhere.
>if you ever dared to use cowork, only once, will run a 2GB VM on app start, no f's given. at all.
Of all the issues, this seems like the most tame. I mean, there are single Chrome tabs that can use 300MB or even 700MB. A 2GB VM for what is likely isolated local testing of scripts and commands or local lightweight first-level inference to help guide the main harness sounds reasonable.
Not being able to use my own harness on the subscription plan is my biggest gripe with Anthropic/Claude. For what I work on, I still get better results with Opus than I do with GPT5.5-codex, but damn do I hate that I either have to PAYG or I'm stuck using Claude Code.
Anthropic employees are right, but maybe this is for the best. It certainly has opened my eyes.
I can’t rely on using a technology that the US administration can ban at will.
IMO, without getting into personal thoughts about how capable the current US administration is, last Friday's move sent a very powerful signal to the industry.
Also I don’t think China releasing so many good models, capable of competing with Opus 4.8 and GPT 5.5, all at once, is a coincidence.
Are you saying that you think the US government is unpredictable and arbitrary, but that the People’s Republic of China is not? Do you remember all the PRC’s strange and sudden policy shifts (e.g. steel, real estate, education, football/soccer, etc.)?
It seems to me that in the case of AI (as with many other modern technologies), you rely on vendor/creator support and updates to stay relevant, so the ‘next’ model matters more than the current one, and we have no idea whose next model will be open (and whose won’t).
Not OP and I wholly agree, but you can’t dismiss the fact that they are releasing those weights. Their agenda is quite obviously to make Anthropic and OpenAI CFOs sweat bullets, but it isn’t our problem as AI consumers, right?
Yes, I agree that it is possible that the 'open source model providers' are doing the equivalent of 'dumping' in an attempt to establish a dominant market position, or at least a foot-hold. I am generally a skeptic when it comes to the effectiveness of 'dumping' as a long-term strategy (as the producer tends to hemorrhage consumers when it increases prices), but some may see it as problematic.
I think for many practical purposes the frontier open weight models are almost universally good enough for most things. There may be greater and greater frontiers but at a certain point it becomes like IQ. Having a 150 IQ doesn’t mean you’ll be more successful at any particular task than someone with a 125 IQ. Indeed there’s a diminishing return on intelligence for many utility functions, where being more intelligent yields the same or worse ultimate outcomes. It might very well be that the person with a 150 IQ could understand some extraordinarily complex and esoteric concepts faster, but it doesn’t mean that with more effort the 125 IQ person can’t either; and sometimes that extra time spent yields better outcomes overall.
I suspect AI will be somewhat similar where even if the linear scaling laws continue to hold the practical utility of a model flattens for almost all conceivable use cases.
In some ways I already feel this has begun to happen. The marginal utility of Opus-class models and Fable has, in my perception, begun to flatten. While I can tell the differences, they aren’t earth shattering. I could continue to use the present models for the rest of my life and be ludicrously more productive simply by adapting within their constraints through ever more sophisticated applications.
What holds back open weights, IMO, is hardware scaling and industrial production. The enormous transfer of wealth in debt and equity markets to semiconductor and adjacent companies, the corresponding capital investments, the eventual bubble pop leading to overcapacity and market flooding, and advances in technology, math, techniques, and efficiency will together make very large open weight models more directly attainable. This will also lead to chimera models that apply MoE to very large models to get very close to the 1-2T parameter dense models, at which point I suspect utility for almost all uses is nearly fully saturated.
There will be areas where more capable models are needed but they will be frontier models on frontier problems. This, IMO, is inevitable without some criminalization of weights (see the attempts to criminalize encryption algorithms in the 20th century and all the wonderful T-shirts that emerged). It’ll be harder to print a trillion parameter model on a shirt but I’m sure someone will try, as will governments try to keep us in our boxes slaving for food coupons and basic rights like health care.
> I am sorry but I can’t use any US AI if I don’t have the guarantee that I will be able to use it tomorrow.
To be fair, this is every commercial model. We have already seen GHCP increase prices by anywhere from 10-100x (depending on usage). And old models get retired all the time. While these are not exactly the same as a cutting edge model being shut down, a large enough price increase leads to effectively the same outcome.
> And Trump showed us he is willing to take it out whenever he wants.
Yes, the actions of this administration on Friday should have sent shockwaves through the market - a market that's currently "high on AI". How do you get a return on all of that AI investment if the administration can jump in at any time and say "Nope, you can't use this very advanced model!"? (The Iran "deal" over the weekend, I think, helped cushion that blow, but eventually it's going to sink in.)
The problem is there’s a real wall on the VRAM side. While fused main memory is OK, the inference speeds on larger models are impractical. With VRAM on a GPU, the machine class, power requirement, GPU costs, and other factors put them out of most people’s reach. Cloud GPUs require a second job to keep available and hot. What closed providers offer is packing and scale advantages as well as infrastructure. The scaling laws here aren’t the same as Moore’s law - in fact they predict more required hardware and more scale over time. Moore’s law isn’t keeping up with expanded needs, and the ability to fab and produce at scale the specific things that weren’t needed a few years ago is lagging. So it’s not a 6-8 month lag; it’s a lag that will be induced by hardware scarcity and an ever increasing lag until something fundamentally changes with matmul.
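For anyone who wants to put rough numbers on that wall, here is a back-of-envelope sketch. It assumes decode is memory-bandwidth bound (each generated token reads every active weight once) and ignores KV cache, activations, and batching. The function names and the 400 GB/s bandwidth figure are my own illustrative assumptions, not measurements of any particular machine.

```python
# Back-of-envelope sketch. All hardware numbers are illustrative assumptions.

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (no KV cache, no activations)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def decode_tokens_per_sec_ceiling(
    active_params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes every active weight is read from memory once per generated token,
    so speed is capped at bandwidth divided by bytes read per token.
    """
    gb_read_per_token = weight_footprint_gb(active_params_billion, bits_per_weight)
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    ASSUMED_BANDWIDTH_GB_S = 400.0  # hypothetical unified-memory machine

    # Hypothetical 1T-parameter dense model quantized to 4 bits per weight.
    print(weight_footprint_gb(1000, 4))
    print(decode_tokens_per_sec_ceiling(1000, 4, ASSUMED_BANDWIDTH_GB_S))

    # Hypothetical MoE model with 32B active parameters per token, same quantization.
    print(decode_tokens_per_sec_ceiling(32, 4, ASSUMED_BANDWIDTH_GB_S))
```

Under those assumptions the dense 1T model needs about 500 GB for weights alone and tops out below 1 token/s, while the MoE case with 32B active parameters has a ceiling around 25 tokens/s (though the full set of experts still has to fit in memory).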
You can run the Chinese models on your infra, most are open weights. Not saying it’s out of the goodness of their heart, but the fact is, they’re open.
The excuse they give is borderline childish. I get the thing about slow rollout, make sure partners get to fix the bugs, etc...
But bad actors are hard-working, motivated entities with tens of thousands of fake IDs, and American citizens working for them, for pennies.
All while the ones like me or you sit in a crossfire which is borderline useless.
I can't wait to see what Qwen did with the massive distillation they made out of Opus 4.8 and Fable aka Mythos aka pretty sure they jailbroke it.