Level 4 is where I see the most interesting design decisions get made, and also where most practitioners take a shortcut that compounds badly later.
When the author talks about "codifying" lessons, the instinct for most people is to update the rules file. That works fine for conventions - naming patterns, library preferences, relatively stable stuff. But there's a different category of knowledge that rules files handle poorly: the why behind decisions. Not what approach was chosen, but what was rejected and why the tradeoff landed where it did.
"Never use GraphQL for this service" is a useful rule to have in CLAUDE.md. What's not there: that GraphQL was actually evaluated, got pretty far into prototyping, and was abandoned because the caching layer had been specifically tuned for REST response shapes, and the cost of changing that was higher than the benefit for the team's current scale. The agent follows the rule. It can't tell when the rule is no longer load-bearing.
The place where this reasoning fits most naturally is git history - decisions and rejections captured in commit messages, versioned alongside the code they apply to. Good engineers have always done this informally. The discipline to do it consistently enough that agents can actually retrieve and use it is what's missing, and structuring it for that purpose is genuinely underexplored territory.
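One lightweight way to make that retrievable is structured commit trailers. A minimal sketch, assuming a fresh repo; the trailer names `Rejected-Alternative` and `Decision-Scope` are invented for illustration, not any git standard:

```shell
# Sketch: capture the rejected alternative and its reasoning in the commit
# message, with made-up trailers an agent can later grep for.
git init -q decisions-demo && cd decisions-demo
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty \
  -m "Keep REST for the orders service" \
  -m "Evaluated GraphQL through prototyping; dropped it because the caching
layer is tuned for REST response shapes, and migrating it costs more than
GraphQL buys at our current scale." \
  -m "Rejected-Alternative: GraphQL
Decision-Scope: services/orders"

# Retrieval: pull the full rationale for a given scope out of history.
git log --grep="Decision-Scope: services/orders" --format="%B"
```

Pairing this with `git log -- <path>` scopes retrieval to the files a change actually touched, which is roughly what an agent needs before editing them.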
At level 7, this matters more than people expect. Background agents running across sessions with no human-in-the-loop have nothing to draw on except whatever was written down. A stale rules file in that context doesn't just cause mistakes - it produces confident mistakes.
I had a hunch that this comment was LLM-generated, and the last paragraph confirmed it. Kudos for managing to get so many upvotes though.
"Where most [X] [Y]" is an up and coming LLM trope, which seems to have surfaced fairly recently. I have no idea why, considering most claims of that form are based on no data whatsoever.
It’s still an insightful and well written comment, but the LLM-ness does make me wonder whether this part was actually human-intended or just LLM filler:
> The discipline to do it consistently enough that agents can actually retrieve and use it is what's missing, and structuring it for that purpose is genuinely underexplored territory
Because I somewhat agree that discipline may be missing, but I don’t believe it to be a groundbreaking revelation that it’s actually quite easy to tell the LLM to put key reasoning that you give it throughout the conversation into the commits and issue it works on.
Suppose you spend months deeply researching a niche topic. You make your own discoveries, structure your own insights, and feed all of this tightly curated, highly specific context into an LLM. You essentially build a custom knowledge base and train the model on your exact mental framework.
Is this fundamentally different from using a ghostwriter, an editor, or a highly advanced compiler? If I am doing the heavy lifting of context engineering and knowledge discovery, it feels restrictive to say I shouldn't utilize an LLM to structure the final output. Yet, the internet still largely views any AI-generated text as inherently "un-human" or low-effort.
I would ignore any HN content written by a ghostwriter or editor. I guess I would flag compiler output, but I'm not sure we're talking about the same thing?
I’m on the internet for human beings. I already read a newspaper for editors and books for ghostwriters.
Not for long though, HN is dying. Just hanging around here waiting for the next thing, I guess…
Sorry man, the internet has died and is not being replaced by anything but an authoritarian nightmare.
My only guess is that if you want actual humans, you'll have to do this IRL. Of course, we as humans have gotten used to the 24/7 availability and scale of the internet, so this is going to be a problem: those meetings won't provide the hyperactive environment we've come to expect.
Any other digital system will be gamed in one way or another.
The problem is: the structure of LLM outputs generally make everything sound profound. It’s very hard to understand quickly whether a comment has actual signal or it’s just well written bullshit.
And because the cost of generating the comments is so low, there's no longer an implicit stamp of approval from the author. It used to be the case that you could engage with a comment in good faith, because you knew somebody had spent effort creating it, so they must have believed it was worth your time. Even on a semi-anonymous forum like HN, that used to be a reliable signal.
So a lot of the old heuristics just don’t work on LLM-generated comments, and in my experience 99% of them turn out to be worthless. So the new heuristic is to avoid them and point them out to help others avoid them.
I hadn't seen it put so eloquently before, but you're right: "LLMs make everything sound profound" and "well-written bullshit".
This has severe ramifications for internet communications in general on forums like HN and others, where it seems LLM-written comments are sneaking in pretty much everywhere.
It's also very, very dangerous :/ because the structure of the writing falsely implies authority and trust where it isn't warranted.
I really don’t mind this in principle (in fact it could help me out a lot). The problem is that the LLM often skews meaning by making up filler-phrases and it becomes hard to tell what you actually mean and what your LLM made up.
Yeah, you need to be aware of hallucinations for sure. Today, for example, while working on my one linear, I used all my curated knowledge to give it structure, looked at deep-research examples, and brainstormed ideas around it, but I was the verifier and the steerer. 99% of the ideas were total BS IMO, but they inspired me on wording, what to use, and how to combine things to achieve something simple and understandable.
One idea that I haven't tried but will is to create a soul.md capturing my writing style, etc., to see the result (which will be an interesting experiment).
But if you think about it, LLMs are good at generic stuff. Then you start curating context, using context engineering to structure it and give it form, and that context is your expertise, your knowledge, and your insights (as long as they are not synthetic). Now you have something tailored to your needs, something usable for brainstorming, idea generation, and filtration: picture it as a pyramid starting from the most expanded and generic, narrowing to the specifics that only you can take and merge as solutions on your mental main branch.
So now you have data and knowledge feeding and shaping the responses the LLM generates for you in the current session, and maybe the harness as well (harnesses are not doing a great job so far at being real connectors).
Of course, we are far from AI working on autopilot with my knowledge, but I have become faster at generating ideas, forming new knowledge, testing it, verifying it, and deciding whether something stays synthetic or whether I should go deeper to discover more and explore its cases to form a new deep connection.
So, is this a case of using an LLM just to generate the content for me? Or have I amplified myself and used the LLM to structure the response (especially if English is not my primary language and I need it to form more in-depth sentences)?
It is for this reason that I usually keep an "adr" folder in my repo to capture Architecture Decision Record documents in markdown. These allow the agent to get the "why" when it needs to. Useful for humans too.
The challenge is really crafting your main agent prompt such that the agent only reads the ADRs when absolutely necessary. Otherwise they muddy the context for simple inside-the-box tasks.
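For reference, a minimal ADR might look like the sketch below. The filename, headings, and content are hypothetical; they follow common ADR conventions rather than any fixed standard:

```markdown
<!-- adr/0007-keep-rest-for-orders-service.md (hypothetical example) -->
# 7. Keep REST for the orders service

Status: accepted
Date: YYYY-MM-DD

## Context
The caching layer is tuned for REST response shapes. GraphQL was
prototyped for more flexible client queries.

## Decision
Stay on REST; do not adopt GraphQL for this service.

## Consequences
- Clients keep using the existing cached endpoints.
- Revisit if client query patterns diversify or the caching layer
  is rewritten.

## Rejected alternatives
- GraphQL: the migration cost of the caching layer exceeded the
  benefit at the team's current scale.
```

The "Rejected alternatives" section is what carries the load-bearing "why": it tells both humans and agents when the decision stops applying.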
Is it a mad dream to wish that wasm never gets DOM access, and that instead a less memory-hungry dynamic representation of web pages, usable only by wasm, gets invented? Yeah, it's a mad dream. But it's also maddening that I can effortlessly open a 100 MB PDF while the browser can barely handle a 10 MB HTML document.