They have a dedicated product called Co-work for non-technical people. Claude Code is a *coding* tool (it's in the name), and Anthropic has made decisions that thoroughly annoy a lot of its users.
I'm not sure why I'm getting downvoted, but the VS Code integration really does stink. Oftentimes it simply won't send the API request and just says "reconnecting", and I've had it freeze outright the same way the VS Code OpenAI Codex plugin has frozen, while all the other plugins like Cline or Roo kept working perfectly fine. So the VS Code integration is almost unusable in my experience.
It's insane! We are so far beyond GPT-3.5 and GPT-4. If you're not approaching Claude Code and other agentic coding tools with an open mind and the goal of deriving as much value from them as possible, you are missing out on superpowers.
On the flip side, anyone who believes you can create quality products with these tools without actually working hard is also deluded. My productivity is insane, and what I can create in a long coding session is incredible, but I am working hard the whole time: reviewing outputs, devising GOOD integration/e2e tests that actually exercise the system, manually testing throughout, and keeping my eyes open for stereotypically bad model behaviors like silently adding fallbacks or deleting code to fulfill some objective.
It's actually a downright pain in the ass and a very unpleasant way to work. I remember the sheer flow state I used to get into when doing deep programming, so immersed in managing the state and modeling the system. The current way of programming with the models doesn't seem to provide that for me. So there are aspects of how I have programmed my whole life that I dearly miss. Hours used to fly past without me being any the wiser, thanks to flow. Now that's no longer the case most of the time.
The article did not provide a constructive suggestion on how to write quality code, either. Nor even empirical proof in the form of quality code written by LLMs/agents via the application of those principles.
I was LITERALLY thinking the other day about a niche tool to help engineers discover and fix this, because at the rate I have seen models version-lock dependencies, I thought this is going to be a big problem in the future.
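To make the idea concrete, here is a rough sketch of the kind of check I mean, assuming a Python project with exact pins like "requests==2.25.1" in requirements.txt. The PyPI JSON endpoint (https://pypi.org/pypi/<name>/json) is real; the file layout and the output format are illustrative assumptions.

    # Sketch: report exact pins in requirements.txt that lag the latest PyPI release.
    import json
    import re
    import urllib.request

    # Matches exact pins only, e.g. "requests==2.25.1" (assumption: no ranges/extras).
    PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==([A-Za-z0-9_.\-]+)\s*$")

    def latest_version(package: str) -> str:
        """Ask PyPI for the latest released version of a package."""
        url = f"https://pypi.org/pypi/{package}/json"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["info"]["version"]

    def report_stale_pins(path: str = "requirements.txt") -> None:
        """Print every exact pin that is behind the latest release on PyPI."""
        with open(path) as fh:
            for line in fh:
                match = PIN.match(line.strip())
                if not match:
                    continue  # skip comments, ranges, editable installs, etc.
                name, pinned = match.groups()
                latest = latest_version(name)
                if pinned != latest:
                    print(f"{name}: pinned {pinned}, latest on PyPI is {latest}")

    if __name__ == "__main__":
        report_stale_pins()

A real tool would also need to handle version ranges, lockfiles, and other ecosystems (npm, Cargo, etc.), but the core loop is about this simple.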
You can do prompt injection through versions. The LLM would go back to GitHub in its endless attempt to people-please, but dependency managers would ignore it as invalid.
Bigger companies have vulnerability and version management toolsets like Snyk, Cycode, etc. to help keep things up to date at scale across lots of repos.
No need to build a tool for it; engineers can avoid the whole issue by simply avoiding slop-spewing code generation tools. Hell, just never allow an LLM to modify the dependency configuration - if you want to use a library, choose and import it yourself. Like an engineer.
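If you want a belt-and-suspenders guard for that rule, one low-tech option is a CI step that fails whenever a change set touches the dependency manifests, forcing an explicit human sign-off. A minimal sketch, assuming a Git checkout with an origin/main branch; the manifest list and the override environment variable are illustrative assumptions, not anyone's actual setup:

    # Minimal CI guard: fail the build when dependency manifests change
    # without an explicit acknowledgement from a human reviewer.
    import os
    import subprocess
    import sys

    # Illustrative manifest names; extend for your ecosystem.
    MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "package-lock.json"}

    def changed_files(base: str = "origin/main") -> set:
        """Return the set of paths changed between the base branch and HEAD."""
        out = subprocess.run(
            ["git", "diff", "--name-only", f"{base}...HEAD"],
            check=True, capture_output=True, text=True,
        )
        return {line.strip() for line in out.stdout.splitlines() if line.strip()}

    def main() -> int:
        # Compare by basename so nested manifests (e.g. frontend/package.json) are caught too.
        touched = {p for p in changed_files() if os.path.basename(p) in MANIFESTS}
        if touched and os.environ.get("DEPS_CHANGE_ACK") != "yes":
            print(f"Dependency manifests changed without sign-off: {sorted(touched)}")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Run it as a CI step on pull requests; a human sets DEPS_CHANGE_ACK=yes (or an equivalent label/approval) only after reviewing the dependency change themselves.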
That Erdős problem solution is believed by quite a few to be a previous result found in the literature, just used in a slightly different way. It also seems less a case of a lack of progress than of no one having cared to give it a go.
That’s a really fantastic capability, but not super surprising.
You're thinking of a previous report from a month ago, #897 or #481, or the one from two weeks ago, #728. There's a new one from a week ago, #205, which is genuinely novel, although it is still a relatively "shallow" result.
Terence Tao maintains a list [1] of AI attempts (successful and otherwise). #205 is currently the only success in section 1, the "full solution for which subsequent literature review did not find new relevant prior partial or full solutions" section - but it is in that section.
As to speed, as far as I know the recent results are all due to GPT-5.2, which is barely a month old, or Aristotle, which is a system built on top of some frontier LLMs and which has only been accessible to the public for a month or two. I have seen multiple mathematicians report that GPT-5.2 is a major improvement in proof-writing, e.g. [2].
Thanks for the wiki link, very interesting, in particular
- the long tail aspect of the problem space: 'a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools'
- the expertise requirement: literature review, but also 'Do I understand what the key ideas of the solution are, and how the hypotheses are utilized to reach the conclusion?', so basically one must already be an expert (or able to become one) to actually use this kind of tooling
and finally the outcomes, which, taking the previous two points into consideration, are very different from what most people would assume "AI contributions" to mean.
If I understood correctly, you are giving an example of a "success" of using the technology. That addresses whether the technology is useful or not, powerful or not, but it does not address what it actually does (maybe there's a gnome inside ChatGPT that solved it; I'm just being provocative here to make the point) or, more importantly, whether it is doing something it couldn't do a year ago or 5 years ago, i.e. how it is doing something new.
For example, if somebody had used GPT-2 with the input dataset of GPT-5.2 (assuming that's the one used for the Erdős problems) rather than the input dataset it had back then, could it have solved those same problems? Without doing such tests it's hard to say whether it moved fast, or at all. The fact that it solved something new does not by itself mean the capability is new. Yes, that's a reasonable assumption, but it's just that. So going from there to assuming "it" is "moving fast" is just a belief IMHO.
Also, something that makes the whole process very hard to verify is what I tried to address in a much older comment: whenever LLMs are used (regardless of the input dataset) by someone who is an expert in the domain (rather than a novice), how can one evaluate what's been done by whom or what? Sure, again there can be a positive result, e.g. a solution to a problem unsolved until now, but what does it say about the tool itself versus a user who is, by definition if they are an expert, up to date on the state of the art?
Also, the very fact that https://github.com/teorth/erdosproblems/wiki/AI-contribution... exists totally changes the landscape. Because it's public, it's safe to assume it's part of the input dataset, so from now on how does one evaluate the pace of progress, in particular for non-open-source models?