They have a dedicated product called Co-work for non-technical people. Claude Code is a *coding* tool (it's in the name), and Anthropic has made decisions that thoroughly annoy a lot of its users.
I'm not sure why I'm getting downvoted, but the VS Code integration really does stink. Oftentimes it simply won't send the API request and just says "reconnecting", and I've had it freeze outright the same way the VS Code OpenAI Codex plugin has frozen, while all the other plugins like Cline or Roo kept working perfectly fine. So the VS Code integration is almost unusable in my experience.
It's insane! We are so far beyond GPT-3.5 and GPT-4. If you're not approaching Claude Code and other agentic coding tools with an open mind and the goal of deriving as much value from them as possible, you are missing out on superpowers.
On the flip side, anyone who believes you can create quality products with these tools without actually working hard is also deluded. My productivity is insane, and what I can create in a long coding session is incredible, but I am working hard the whole time: reviewing outputs, devising GOOD integration/e2e tests that actually exercise the system, manually testing throughout, and keeping my eyes open for stereotypically bad model behaviors like silently adding fallbacks or deleting code to fulfill some objective.
It's actually a downright pain in the ass and a very unpleasant way to work. I remember the sheer flow state I used to get into when doing deep programming, so immersed in managing the state and modeling the system. The current way of programming with the models doesn't seem to provide that for me. So there are aspects of how I have programmed my whole life that I dearly miss. Hours used to fly past without me being any the wiser, thanks to flow. Now that's no longer the case most of the time.
The article did not provide a constructive suggestion on how to write quality code, either. Nor even empirical proof in the form of quality code written by LLMs/agents via the application of those principles.
I was LITERALLY thinking the other day about a niche tool to help engineers discover and fix this, because at the rate I have seen models version-lock dependencies, I thought this is going to be a big problem in the future.
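To make the idea concrete, here is a rough sketch of the kind of check I mean, assuming a Python project with exact pins like "requests==2.25.1" in requirements.txt. The PyPI JSON endpoint (https://pypi.org/pypi/<name>/json) is real; the file layout and the output format are illustrative assumptions.

    # Sketch: report exact pins in requirements.txt that lag the latest PyPI release.
    import json
    import re
    import urllib.request

    # Matches exact pins only, e.g. "requests==2.25.1" (assumption: no ranges/extras).
    PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==([A-Za-z0-9_.\-]+)\s*$")

    def latest_version(package: str) -> str:
        """Ask PyPI for the latest released version of a package."""
        url = f"https://pypi.org/pypi/{package}/json"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["info"]["version"]

    def report_stale_pins(path: str = "requirements.txt") -> None:
        """Print every exact pin that is behind the latest release on PyPI."""
        with open(path) as fh:
            for line in fh:
                match = PIN.match(line.strip())
                if not match:
                    continue  # skip comments, ranges, editable installs, etc.
                name, pinned = match.groups()
                latest = latest_version(name)
                if pinned != latest:
                    print(f"{name}: pinned {pinned}, latest on PyPI is {latest}")

    if __name__ == "__main__":
        report_stale_pins()

A real tool would also need to handle version ranges, lockfiles, and other ecosystems (npm, Cargo, etc.), but the core loop is about this simple.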
You can do prompt injection through versions. The LLM would go back to GitHub in its endless attempt to people-please, but dependency managers would ignore it as invalid.
Bigger companies have vulnerability and version management toolsets like Snyk, Cycode, etc. to help keep things up to date at scale across lots of repos.
No need to build a tool for it; engineers can avoid the whole issue by simply avoiding slop-spewing code generation tools. Hell, just never allow an LLM to modify the dependency configuration - if you want to use a library, choose and import it yourself. Like an engineer.
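If you want a belt-and-suspenders guard for that rule, one low-tech option is a CI step that fails whenever a change set touches the dependency manifests, forcing an explicit human sign-off. A minimal sketch, assuming a Git checkout with an origin/main branch; the manifest list and the override environment variable are illustrative assumptions, not anyone's actual setup:

    # Minimal CI guard: fail the build when dependency manifests change
    # without an explicit acknowledgement from a human reviewer.
    import os
    import subprocess
    import sys

    # Illustrative manifest names; extend for your ecosystem.
    MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "package-lock.json"}

    def changed_files(base: str = "origin/main") -> set:
        """Return the set of paths changed between the base branch and HEAD."""
        out = subprocess.run(
            ["git", "diff", "--name-only", f"{base}...HEAD"],
            check=True, capture_output=True, text=True,
        )
        return {line.strip() for line in out.stdout.splitlines() if line.strip()}

    def main() -> int:
        # Compare by basename so nested manifests (e.g. frontend/package.json) are caught too.
        touched = {p for p in changed_files() if os.path.basename(p) in MANIFESTS}
        if touched and os.environ.get("DEPS_CHANGE_ACK") != "yes":
            print(f"Dependency manifests changed without sign-off: {sorted(touched)}")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Run it as a CI step on pull requests; a human sets DEPS_CHANGE_ACK=yes (or an equivalent label/approval) only after reviewing the dependency change themselves.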
That Erdős problem solution is believed by quite a few to be a previous result found in the literature, just used in a slightly different way. It also seems less a case of a lack of progress than of no one having cared to give it a go.
That’s a really fantastic capability, but not super surprising.
You're thinking of a previous report from a month ago, #897 or #481, or the one from two weeks ago, #728. There's a new one from a week ago, #205, which is genuinely novel, although it is still a relatively "shallow" result.
Terence Tao maintains a list [1] of AI attempts (successful and otherwise). #205 is currently the only success in section 1, the "full solution for which subsequent literature review did not find new relevant prior partial or full solutions" section - but it is in that section.
As to speed, as far as I know the recent results are all due to GPT-5.2, which is barely a month old, or Aristotle, which is a system built on top of some frontier LLMs and which has only been accessible to the public for a month or two. I have seen multiple mathematicians report that GPT-5.2 is a major improvement in proof-writing, e.g. [2].
Thanks for the wiki link, very interesting, in particular
- the long tail aspect of the problem space: 'a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools'
- the expertise requirement: literature review, but also 'Do I understand what the key ideas of the solution are, and how the hypotheses are utilized to reach the conclusion?', so basically one must already be an expert (or able to become one) to actually use this kind of tooling
and finally the outcomes, which, taking the previous two points into consideration, are very different from what most people would assume "AI contributions" to mean.
If I understood correctly, you are giving an example of a "success" of using the technology. That addresses whether the technology is useful or not, powerful or not, but it does not address what it actually does (maybe there's a gnome inside ChatGPT that solved it; I'm just being provocative here to make the point) or, more importantly, whether it is doing something it couldn't do a year ago or 5 years ago, i.e. how it is doing something new.
For example, if somebody had used GPT-2 with the input dataset of GPT-5.2 (assuming that's the one used for the Erdős problems) rather than the input dataset it had back then, could it have solved those same problems? Without doing such tests it's hard to say whether it moved fast, or at all. The fact that it solved something new does not by itself mean the capability is new. Yes, that's a reasonable assumption, but it's just that. So going from there to assuming "it" is "moving fast" is just a belief IMHO.
Also, something that makes the whole process very hard to verify is what I tried to address in a much older comment: whenever LLMs are used (regardless of the input dataset) by someone who is an expert in the domain (rather than a novice), how can one evaluate what's been done by whom or what? Sure, again there can be a positive result, e.g. a solution to a problem unsolved until now, but what does it say about the tool itself versus a user who is, by definition if they are an expert, up to date on the state of the art?
Also, the very fact that https://github.com/teorth/erdosproblems/wiki/AI-contribution... exists totally changes the landscape. Because it's public, it's safe to assume it's part of the input dataset, so from now on how does one evaluate the pace of progress, in particular for non-open-source models?