Let's be clear: your entire post is pure, unadulterated FUD. You first claim, based on cherry-picked benchmarks, that Mythos is only "barely competitive" with existing models; then suggest they must be training to the test; then call it "odd" that they are withholding the release, despite detailed and forthcoming explanations from Anthropic about why they are doing so; then wrap it up with the completely unsubstantiated claim that they must be bleeding subscribers and that this must exist only to stop that bleed.
Y'all know they're teaching to the test. I'll wait until someone devises a novel test that isn't contained in the training datasets. Sure, the models are still powerful.
I read the entire performance degradation report in the OP, and Boris's response, and it seems that the overwhelming majority of the report's findings can indeed be explained by the `showThinkingSummaries` option recently being switched off by default.
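For anyone who wants to rule this out in their own setup: a minimal sketch of re-enabling the option. The exact file location and shape are assumptions on my part (I'm guessing a boolean in a user-level settings JSON); check your own tool's docs before copying this.

```json
{
  "showThinkingSummaries": true
}
```

If your results change noticeably with just this toggle, that would corroborate the explanation in Boris's response.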
>> Also Claude owes its popularity mostly to the excellent model running behind the scenes.
It's a bit of both. Claude Code was the tool that made Anthropic's developer mindshare explode. Yes, the models are good, but before CC they were mostly just available via multiplexers like Cursor and Copilot, via the relatively expensive API.
Yeah I think the 1M context is the issue. Because I use Opus 4.6 through Cursor at the previous 200k limit and it has been totally fine. But if I switch to the 1M version it degrades noticeably.
> Yeah I think the 1M context is the issue. Because I use Opus 4.6 through Cursor at the previous 200k limit and it has been totally fine. But if I switch to the 1M version it degrades noticeably.
I thought it was already well-known that context above 200k - 300k results in degradation.
One of my comments from this past week said exactly that: there is no point in claiming that a 1M context will improve things, because all the evidence we have seen shows that results degrade past 300k of context.
I have a similar workflow, but I disagree that Codex/GPT-5.4 reviews are very useful. For example, in a lot of cases they suggest over-engineering by handling edge cases that won't realistically happen.
>> AI assisted coding makes you dumber full stop. It's obvious as soon as you try it for the first time. Need a regex? No need to engage your brain. AI will do that for you.
Regex is the worst possible example you could have given. Seriously, how many people do you know who painstakingly hand-craft their own regexes as opposed to using one of the million tools out there that can work backwards from example inputs and outputs to generate a regex that satisfies the conditions?