Hacker News | thoughtpeddler's comments

The author is reasoning through analogies, which can sometimes be helpful, but they don't hold up here. This is a thought-provoking post, but unlikely to be an accurate map of the future. I could use the same analogies to reach entirely different conclusions about the nature of knowledge work in 10-20 years. I have fallen into this trap myself: trying to reason about AI through the lens of the smartphone revolution, the cloud era, etc. I don't think it fully holds up. Sometimes an entirely new paradigm creates a new world, and we have to navigate through the fog to get to a more intelligible side.

Is anyone else surprised that Gödel, Escher, Bach is as low on the list as it is? My experience on HN would have me believe it would be in the top 10 for sure. I wonder if it’s a string-matching issue.

I appreciate Andrej’s optimistic spirit, and I am grateful that he dedicates so much of his time to educating the wider public about AI/LLMs. That said, it would be great to hear his perspective on how 2025 changed the concentration of power in the industry, what’s happening with open-source, local inference, hardware constraints, etc. For example, he characterizes Claude Code as “running on your computer”, but no, it’s just the TUI that runs locally, with inference in the cloud. The reader is left to wonder how that might evolve in 2026 and beyond.

The CC point is more about the data, environment, and general configuration context, not compute and where it happens to run today. The cloud setups are clunky because of context and UI/UX, user-in-the-loop considerations, not because of compute considerations.

Agree with the GP, though -- you ought to make that clearer. It really reads like you're saying that CC runs locally, which is confusing since you obviously know better.

I think we need to shift our mindset on what an agent is. The LLM is a brain in a vat connected far away. The agent sits on your device, as a mech suit for that brain, and can pretty much do damn near anything on that machine. It's there, with you. The same way any desktop software is.

Yeah, I made some edits to clarify.

From what I can gather, llama.cpp supports Anthropic's message format now[1], so you can use it with Claude Code[2].

[1]: https://github.com/ggml-org/llama.cpp/pull/17570

[2]: https://news.ycombinator.com/item?id=44654145
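For context, a minimal sketch of wiring this up, assuming a llama.cpp build recent enough to include the Anthropic-format endpoint from the PR above; the model path is a placeholder, and Claude Code's `ANTHROPIC_BASE_URL` override is how you point it at a non-Anthropic server:

```shell
# Serve a local GGUF model with llama.cpp's HTTP server
llama-server -m ./your-model.gguf --port 8080

# Point Claude Code at the local server; the key value is ignored by
# llama-server, but Claude Code expects one to be set
ANTHROPIC_BASE_URL=http://localhost:8080 ANTHROPIC_API_KEY=placeholder claude
```

Treat this as a setup sketch rather than a recipe; exact flags may differ across llama.cpp versions.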


One of the most interesting coding agents to run locally is actually OpenAI Codex, since it has the ability to run against their gpt-oss models hosted by Ollama.

  codex --oss -m gpt-oss:20b
Or 120b if you can fit the larger model.

What do you find interesting about it, and how does it compare to commercial offerings?

It's rare to find a local model that's capable of running tools in a loop well enough to power a coding agent.

I don't think gpt-oss:20b is strong enough to be honest, but 120b can do an OK job.

Nowhere NEAR as good as the big hosted models though.


Think of it as the early years of UNIX & PC. Running inferences and tools locally and offline opens doors to new industries. We might not even need client/server paradigm locally. LLM is just a probabilistic library we can call.
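One way to picture that "probabilistic library" framing, using llama.cpp's CLI as one possible local runtime (the model path is a placeholder; any local GGUF model works):

```shell
# Fully offline, single call into the model: prompt in, sampled text out.
# No client/server round trip needed; it's just a function call, locally.
llama-cli -m ./your-model.gguf -p "Explain mmap in one sentence." -n 64 --temp 0
```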

Thanks.

What he meant was that agents will probably not be these web abstractions running in deployed services (LangChain, Crew); "agent" here means the harness specifically, the software wrapper that calls the LLM API.

It runs on your computer because of its tooling. It can call Bash. It can literally do anything on the operating system and file system. That's what makes it different. You should think of it like a mech suit. The model is just the brain in a vat connected far away.
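That split is easy to see in a toy harness loop; a hypothetical sketch where the "brain" is stubbed out but the local execution is real:

```shell
#!/bin/sh
# Toy harness: the model only *proposes* a command (stubbed here; a real
# harness would call a remote LLM API), while the harness on your machine
# is what actually executes it.
propose_command() {
  # Stand-in for the brain in a vat: returns a shell command as text.
  echo 'echo hello-from-the-local-machine'
}

cmd=$(propose_command)      # remote brain decides
result=$(eval "$cmd")       # local mech suit acts
echo "$result"
```

Everything interesting (file access, Bash, the OS) lives on the harness side of that loop, which is why it feels like the agent is "on your computer" even when inference is not.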


The section on Claude Code is written very ambiguously and confusingly. I think he meant that the agent runs on your computer (not inference), in contrast to agents running "on a website" or in the cloud:

> I think OpenAI got this wrong because I think they focused their codex / agent efforts on cloud deployments in containers orchestrated from ChatGPT instead of localhost. [...] CC got this order of precedence correct and packaged it into a beautiful, minimal, compelling CLI form factor that changed what AI looks like - it's not just a website you go to like Google, it's a little spirit/ghost that "lives" on your computer. This is a new, distinct paradigm of interaction with an AI.

However, if so, this is definitely a distinction that needs to be made far more clearly.


Well, Microsoft had their "localhost" AI before CC, but that was a ghost without a clear purpose or skill.

Look into the emerging literature around "needle-in-a-haystack" tests of LLM context windows. You'll see part of what the poster you're replying to is describing. It can also be framed as testing "how lazy is my LLM being when it analyzes the input I've provided?" Hint: they can get quite lazy! I agree with the poster you replied to that "RAG my Obsidian"-type experiments with local models are middling at best. I'm optimistic things will get a lot better in the future, but it's hard to trust many of the 'insights' this blog post offers without intense QA-ing (if the author did any, which I doubt, given that the writing itself looks largely AI-assisted).

Is it fair to view this release as Nvidia strategically flexing that they can compete with their own customers in the model layer -- that they can be as vertically integrated as, say, GDM?

If you read the original post, the author actually gives Doctorow the sort of credit you're providing here, and says it is precisely because he is so well-liked (justifiably) that it's important for him to offer better prescriptions to his followers. I myself am a huge Doctorow fan, but I agree with the linked author that "AI isn't going away" in the way Doctorow and other public thinkers making similar claims have suggested. Their words ultimately come from a good place, but what to do about it will likely differ from what's been recommended so far.


This reminds me of a recent reflection, upon seeing an old journal entry of mine from ~2012, where I seemed to be grappling back then with the same exact issues I do today, namely 'browser tab overload'. Even though we've since had over a decade of tech progress (e.g. tab groups and associated features, AI, etc), I'm still drowning in tab overload. It actually made me laugh for a moment. All this powerful AI, large browser feature development teams shipping consistently quarter after quarter, and I'm still in the same spot. I could copy-paste this dilemma across a variety of 'productivity challenges' and arrive at a similar place.


Unless back-channel conversations keep 'competitors' colluding to ensure that 'public SOTA' is ~uniformly distributed...


Have there been deals like this before in the industry, with the same express purpose: creating a dedicated permanent-capital vehicle that acquires and holds legacy businesses (e.g. accounting, IT services) to transform them with AI agents, using an equity-holding tech partner (OpenAI) to train models on company-specific tasks and data? Feels new to me.


I get your point, but hardware becomes obsolete. There will be a 'hardware overhang' (or more precisely stated, 'compute overhang') here. That is, unless we find a use for what will eventually be these older-gen GPUs. At least with traditional data centers, a lot of the hardware was storage, which the unfolding timeline of humanity was sure to fill up. Are we really going to do this much ongoing matmul on hardware that will eventually have outdated economics compared to next-gen GPUs? Go figure...


I don’t think desktop GPUs are going to move the needle for embodied AI robotics, at least not for on-board processing. The bottleneck currently is making better models run on more resource-constrained devices, and there are different ways to approach that problem.

