Hacker News | gbnwl's comments

OK, I'll take the opportunity to be the first non-self-promotional comment on this thread now that concensure and rohan2003 have done their ads.

Based on this post's current position on the front page, it seems to fall in line with a pattern we've all been seeing the past few months: HN is finally majority onboard with believing in the usefulness of coding agents and is celebrating this by rediscovering from scratch each and every personal "I improved CC by doing [blank] thing" project.

That's all whatever. Fine. But what I'm really curious about is does the HN community actually look at the random LLM-generated statistic-vomit text posted by creators like this and find themselves convinced?

I ask because if you're new to random stat vomit, you're going to find yourself having to deal with it all the time soon, and I've yet to find good meta discussions about how we find signal in this noise. I used to use HN or selected Reddit community upvotes as a first-pass "possibly important" signal, but it's been getting worse and worse, illustrated by posts like this getting upvoted to the top without any genuine discussion.


> random LLM-generated statistic-vomit text

I do not understand why this project in particular has set you off.

Their README looks much better than many I've seen on HN:

- no annoying verbosity that is so prevalent in AI-generated text

- not too many buzzwords (they're not saying "agentic" every sentence)

- it is very clear what exactly the project is supposed to do and why it can be useful

Personally, I upvoted this because I wanted to do something similar for a long time but never got around to it.


Their comment reporting stats here:

“Provider: OpenAI (gpt-4o / o1)”

Uh so is it 4o or o1? Very different models. When you read this, how did you interpret this?

    “Suite: 11-task core suite (atomic coding tasks)”
- OK, I'll take your word for it

    “Configuration: autoroute_first=true, single_file_fast_path=false

    Run Variant            Token Delta (per call)   Step Savings (vs Baseline)   Task Success
    Baseline (2026-03-13)  -18.62%                  —                            11/11
    Hardened A             +8.07%                   —                            11/11
    Enhanced (2026-03-27)  -6.73%                   +27.78%                      11/11

    Key Takeaways:”

- What useful information do you glean from this, vova_hn2? Perhaps I'm just ignorant.

    “The ROI of Precision: While the "Enhanced" run used roughly 6.73% more tokens than the baseline per request, it required 27.78% fewer steps to reach a successful solution.”
So it actually takes MORE tokens but fewer “steps”? This could all use actual discussion from the creator: a blog post or a detailed comment. Instead we get this.

What sets me off is projects like this that throw random numbers and technical jargon at you because the user simply asked their LLM to do so. It gives the veneer of "oh, it must be legitimate, look at all the data" to people mentally stuck in 2024, not realizing anyone can generate junk and pass it off in a way that (used to be) convincing.


> Personally, I upvoted this because I wanted to do something similar for a long time but never got around to it.

It's much easier to give your agents the LSP for the language(s) they're working on.

A project that turns them into MCP: https://github.com/isaacphi/mcp-language-server


Easier, sure.


LSP has many more "functions" that are useful to agents, such as rename and get definition/usage.

Some basic differences described here: https://news.ycombinator.com/item?id=30664671

More in depth: https://lambdaland.org/posts/2026-01-21_tree-sitter_vs_lsp/

Claude Code's instructions for LSP: https://github.com/Piebald-AI/claude-code-system-prompts/blo...

I'm not convinced that ASTs are meaningfully helpful in the grand scheme and the long-run. LSPs and code intelligence I find useful as a human and for my coding agent.
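Those LSP operations like rename are just JSON-RPC requests under the hood. A minimal Python sketch of what a `textDocument/rename` request looks like on the wire (the `lsp_message` helper and the file URI are illustrative, not from any real library):

```python
import json

def lsp_message(method, params, msg_id=1):
    # LSP messages are JSON-RPC 2.0 bodies framed with a Content-Length header.
    body = json.dumps({"jsonrpc": "2.0", "id": msg_id,
                       "method": method, "params": params})
    return f"Content-Length: {len(body)}\r\n\r\n{body}"

# textDocument/rename: ask the server to rename the symbol at a position.
# The server replies with a WorkspaceEdit covering every affected file,
# which is exactly the cross-file awareness a bare AST walk doesn't give you.
msg = lsp_message("textDocument/rename", {
    "textDocument": {"uri": "file:///src/app.py"},
    "position": {"line": 10, "character": 4},
    "newName": "fetch_user",
})
```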


My understanding is that yes they are probabilistic by nature in that they give you a probability distribution over all tokens in the vocabulary for the next token, but if you just take the policy of using max prob for each token their output becomes deterministic right? Same output always results for a given input.
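That argmax claim can be sketched with a toy model (everything here is illustrative; `toy_logits` stands in for a real forward pass):

```python
import math

def softmax(logits):
    # Standard numerically-stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def toy_logits(context):
    # Stand-in for a model forward pass: logits derived from the context.
    return [((hash(tuple(context)) >> (4 * i)) & 0xF) / 3.0 for i in range(8)]

def greedy_decode(prompt, steps=5):
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens))
        # Greedy policy: always pick the highest-probability token (argmax).
        tokens.append(max(range(len(probs)), key=probs.__getitem__))
    return tokens
```

With this policy, the same prompt yields the same tokens every time; in practice nondeterminism can still creep in from temperature sampling or non-bit-identical forward passes (e.g. floating-point reduction order on GPUs).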

Hmmm, yes and no. Here the output "tends towards" deterministic (≠ deterministic), because the internal process contains noise (noise can't be nullified; otherwise LLMs wouldn't make mistakes and wouldn't produce unexpected outcomes). And even if the output tends towards deterministic, there will always be ways to do prompt injections. So maybe I should clarify by mentioning that the solution I shared/proposed is external to the AI's internal process; it's a completely separate thing from the LLM.

Not the original commenter but they could be talking about Timberborn. Don't have it but have friends who play.


I'm noticing terms related to DL/RL/NLP are being used more and more informally as AI takes over more of the cultural zeitgeist and people want to use the fancy new terms of the era, even if inaccurately. A friend told me he "trained and fine tuned a custom agent" for his work when what he meant was he modified a claude.md file.


Respectfully, your friend doesn't know what he is talking about and is saying things that just "feel right" (vibe talking??). Which might be exactly how technical terms lose their meaning so perhaps you're exactly right.


Is this a joke that’s going over my head? The country we all know the term “century of humiliation” from has recovered and is literally a superpower right now?


+1 to creating tickets by simply asking the agent to. It's worked great, and larger tasks can be broken down into smaller subtasks that could reasonably be completed in a single context window, so you rarely ever have to deal with compaction. Especially in the last few months, since Claude's gotten good at dispatching agents to handle tasks if you ask it to, I can plan large changes that span multiple tickets and tell Claude to dispatch agents as needed to handle them (which it will do in parallel if they mostly touch different files), keeping the main chat relatively clean for orchestration and validation work.


It totally is. The fact that this post has gotten this many upvotes is appalling.


Just wait, sir. We are indeed doing inference on N64. We had serious issues with text. I am almost done resolving them.


You mean to tell me the included screenshot hasn't convinced you?

https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...


I think the source code in the GitHub repo generates the ROM in the corresponding screenshots, but it seems quite barebones.

It feels very much like it's cobbled together from the libdragon examples directory. Or, they use hardware acceleration for the 2D sprites, but then write fixed-width text to the framebuffer with software rendering.


Partially correct. The value is not the game interface right now. It's proof you can do actual inference with an LLM. The surprise I am developing is a bit bigger than this; I just have to get the LLM outputs right first!


Can you elaborate on the “partially correct” bit? I’d like to understand the programming of the ROM better.


You’re right that the graphics layer is mostly 2D right now. Sprites are hardware-accelerated where it makes sense, and text is written directly to the framebuffer. The UI is intentionally minimal.

The point of this ROM wasn’t the game interface: it was proving real LLM inference running on-device on the N64’s R4300i (93 MHz MIPS, 4MB RDRAM). Since the original screenshots, we’ve added:

- Direct keyboard input

- Real-time chat loop with the model

- Frame-synchronous generation (1–3 tokens per frame @ 60 FPS)

So it’s now interactive, not just a demo render. The current focus is correctness and stability of inference. The graphics layer can evolve later.

Next step is exposing a lightweight SDK layer so N64 devs can hook model calls into 3D scenes or gameplay logic, essentially treating the LLM as a callable subsystem rather than a UI gimmick.

The value isn’t the menu. It’s that inference is happening on 1996 silicon. Happy to answer specifics about the pipeline if you’re interested.
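The poster hasn't shared the code, but "frame-synchronous generation" presumably means inference is budgeted inside the render loop. A hypothetical Python sketch of that idea (`FRAME_MS`, `tokens_this_frame`, and the half-frame budget are all assumptions, not the project's actual implementation):

```python
FRAME_MS = 1000.0 / 60.0  # ~16.7 ms per frame at 60 FPS

def tokens_this_frame(generate_token, now_ms, frame_start_ms,
                      budget_fraction=0.5, max_tokens=3):
    # Spend at most a fraction of the frame on inference, and cap tokens
    # per frame, so rendering and input handling still fit in the rest.
    out = []
    while (len(out) < max_tokens
           and now_ms() - frame_start_ms < FRAME_MS * budget_fraction):
        out.append(generate_token())
    return out
```

On real hardware the same pattern would be a fixed-point C loop checking a hardware timer, but the scheduling idea (1–3 tokens per frame, then yield to the renderer) is the same.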


Delivered. Please reconsider now. AI slop cannot build this without a human who has real RISC CPU knowledge.

The emulator: https://bottube.ai/watch/shFVLBT0kHY

The real iron! It runs faster on real iron! https://bottube.ai/watch/7GL90ftLqvh

The ROM image: https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...



What makes you think their fame will be ephemeral? All of the tech billionaires from the 90s, 00s, and 10s are still constantly in the news for better or worse.


They need to generate revenue to continue to raise money to continue to invest in compute. Even if they have the Midas Touch, it needs to be continuously improved, because there are three other competing Midas Touch companies working on new and improved Midas Touches that will make theirs obsolete and worthless if they stay still even for a second.


But most of their funding comes from speculative investment, not selling their services. Also, wouldn't selling their own products/services generate revenue?


Making a profitable product is so much more than just building it. I've probably made 100+ side projects in my life and only a handful have ever generated any revenue.

