Quick clarification: the `/hn` page is a no-login, no-API-key interactive demo (autoplay is just the default walkthrough; you can pause and click around).
OpenAI BYOK is only for the full app when you want real model calls.
More detail on what’s different under the hood:
- Branches are anchored to a source message + selected span (not freeform threads).
- Collector items are references back to those spans, so "Compose" can build a prompt from explicit citations rather than chat history drift.
- "Context compiler" shows the exact prompt stack + token budget, and lets you exclude/pin items to control what survives truncation.
Feedback I'd love: does Branch + Collector + Compose feel faster than "open a second chat window + copy/paste", or does it feel like extra steps?
1) It's complex. Formally, Moscow controlled the launch codes. However, Ukraine designed and built the ICBMs, and is near the top of nations in nuclear physicists per capita.
On top of that, the Soviet nuclear lockout systems are rumored to be much simpler than the American ones. Whereas the American system is rumored to be something like the decryption key for the detonation timings (without which you have at best a dirty bomb), the Soviet lockout mechanism is rumored to be just a lockout device with an 'is locked' signal going to the physics package. If that's all true, taking control of those nukes from a technical perspective would be on the order of hotwiring a 1950s automobile.
Taking physical control would have been more complex, but everything was both more complex and in some ways a lot simpler as the wall fell. It would have ultimately been a negotiation.
2) See above.
3) Which military nuclear power has been attacked by the kind of adversary that you can throw a nuke at? Yes, it doesn't remove all threats, but no solution does. Removing a class of threat (and arguably the most powerful class of threat in concrete terms) is extremely valuable.
Your computer is designed and built in China, therefore your computer belongs to the Chinese and to China. Right?
> See above
Maybe you should see how well Ukraine kept its naval assets after it used totally legal methods to obtain them. Maybe then you would have a clue about how well it could have maintained them.
> Your computer is designed and built in China, therefore your computer belongs to the Chinese and to China. Right?
The previous owner was the USSR, who ceased to exist, and who Ukraine was a part of.
> Maybe you should see how well Ukraine kept its naval assets after it used totally legal methods to obtain them. Maybe then you would have a clue about how well it could have maintained them.
Are you talking about the ships that weren't originally that Russia mostly scuttled on their way out of Sevastopol, in addition to stuff like a 70%-complete nuclear-powered carrier whose sister ship even Russia couldn't maintain, and which didn't fit any naval doctrine that made sense for Ukraine?
Oh, so there was some sort of wedding contract stating that in case the parties.. part, there would be a transfer and division of assets? Then why didn't Belorussia receive its part of the navy? Kazakhstan? Georgia? The Baltics, because they surely "were parts of the USSR"?
> Are you talking about the ships that weren't originally
That weren't originally what? I know you've degraded to just throwing words around with your blanket knowledge, but again, you can find out the fate of the ships Ukraine used totally legitimate means to obtain from the Russian Federation with a quite short trip to Wikipedia.
Actually, exactly. We're specifically talking about the arsenal of the 43rd Rocket Army of the Soviet Strategic Rocket Forces. A force not reorganized until much later to be under the Russian Federation, and the relevant 1990 Budapest Memorandum occurred before the 1991 creation of the CIS.
Rather than a vague "not quite", would you care to elaborate?
> Oh, so there was some sort of wedding contract stating that in case the parties.. part, there would be a transfer and division of assets? Then why didn't Belorussia receive its part of the navy? Kazakhstan? Georgia? The Baltics, because they surely "were parts of the USSR"?
I think a divorce settlement is actually a pretty good model. Those other states frankly didn't have the means to keep them, but should have been otherwise compensated for that loss. However, as I described above, Ukraine literally designed and built large portions of these systems and was capable of keeping them.
> That weren't originally what? I know you've degraded to just throwing words around with your blanket knowledge, but again, you can find out the fate of the ships Ukraine used totally legitimate means to obtain from the Russian Federation with a quite short trip to Wikipedia.
I'm dyslexic and accidentally dropped a word while editing. Are you incapable of telling what was meant by context, or were you just looking for a reason not to address the point made?
Good, you made the first step; now do the other two.
> but should have been otherwise compensated for that loss
It's quite amusing that you are clearly implying that some states shouldn't be compensated at all.
> Are you incapable of telling what was meant by context, or were you just looking for a reason not to address the point made?
Yes, I'm incapable of telling why you threw in something completely unrelated to the question. I'm not an LLM.
> Ukraine literally designed and built large portions of these systems and was capable of keeping them.
Ah, yes, the mighty Ukraine that did all of that single-handedly, right? Every other nation, state, and people in the USSR contributed nothing to it. I have a feeling you're thinking about this issue as some sort of video game: just a couple of factories and a bunch of special units. But things are not like that in real life.
> Your computer is designed and built in China, therefore your computer belongs to the Chinese and to China. Right?
The question is whether China would be capable of maintaining the equipment it created and has physical possession of, not whether it can root it without physical access.
- I think about what work needs to be done, how it ties into the existing architecture, and how completion is going to make me money / solve a problem
- Start CC (usually in yolo / '--dangerously-skip-permissions' mode), tell it to read relevant files or investigate how x works now, don't code anything.
- Explain the problem to it, still don't code anything but ask it how it's going to solve this. If I'm satisfied that it's looking in the right direction, let it rip
- Wait for results, intensely manually QA whatever comes out. Take a cursory glance at the diffs in my IDE to see if I generally approve of and understand what it's doing
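Concretely, a session tends to look something like this (paths and prompts are illustrative; the flag is the real one mentioned above):

```
claude --dangerously-skip-permissions

> Read src/orders/ and docs/payments.md and explain how refunds
> work today. Don't write any code yet.

> Problem: partial refunds double-count shipping. How would you
> solve this? Still no code, just the plan.

# plan looks right -> let it rip
> Go ahead and implement that.
```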
I can't QA faster than a single Claude Code agent produces output. While I test the output of step n, I sometimes let it continue on step n+1, but more than that is beyond me ATM.
Very curious to learn how others are going beyond this! Right now I don't see an immediate path beyond this for myself, so if you have some tips or another way of doing things entirely, I'd be very grateful!
Working for a few different clients atm as a freelancer:
Postgres ETL pipelines; Python glue code around a computer vision model; some REST API integrations to a frontend (outside of my direct control); an LLM-backed SQL generator that integrates with legacy shitware; a Swift iOS/macOS app...
I agree that I need to invest heavily in testing infrastructure, thanks. My work is pretty heterogeneous, so I kinda put that on the back burner, as there's always more pressing short-term stuff to tackle...
Not sure this counts as "successful" yet (invite-only beta, still rough), but I'm building a full product almost entirely via LLM-assisted coding.
Tangents (https://tangents.chat) is an Angular/Nest/Postgres app for thinking-with-LLMs without losing the thread.
- Branch: select any span (user or assistant) and branch it into a tangent thread so the main thread stays coherent.
- Collector: collect spans across messages/threads into curated context, then prompt with it.
- You can inspect a "what the model will see" preview and keep a stored context-assembly manifest.
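Conceptually, the stored manifest holds something like this (illustrative shape, not the real format):

```typescript
// Illustrative shape of a context-assembly manifest, not the real format.
interface ContextManifest {
  promptId: string;
  assembledAt: string;        // ISO timestamp
  items: Array<{
    sourceThreadId: string;   // where the span was collected from
    sourceMessageId: string;
    span: [start: number, end: number];
    tokens: number;           // cost of including this span
  }>;
  totalTokens: number;        // what the model actually saw
}
```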
Vibe-coding aspect: about 600 commits and about 120k LOC (tests included) and I have not handwritten the implementation code. I do write specs/docs/checklists and I run tests/CI like normal.
What made it workable for something larger than a static page:
- Treat the model like a junior dev: explicit requirements plus acceptance criteria, thin slices, one change at a time.
- Keep "project truth" in versioned docs (design system plus interface spec) so the model does not drift.
- Enforce guardrails: types, lint, tests, and a strict definition of "done."
- The bottleneck is not generating code, it is preventing context/spec drift and keeping invariants stable across hundreds of changes.
If you define "vibe coding" as "I never look at the code," I do not think serious production apps fit that. But if you define it as "the LLM writes the code and you steer via specs/tests," it is possible to build something non-trivial.
Happy to answer specifics if anyone cares (workflow, tooling, what breaks first, etc.).
…it really feels like they’re attempting to reinvent a project tracker and starting off from scratch in thinking about it.
It feels like they’re a few versions behind what I’m doing, which is… odd.
Self-hosting a plane.io instance. Added a plane MCP tool to my Codex. Added workflow instructions into Agents.md which cover standards, documentation, related work, labels, branch names, adding comments before the plan, after the plan, and at varying steps of implementation, plus a summary before moving a ticket to done. Creating new tickets and being able to relate them to current or other tickets, etc…
It ain’t that hard. Just do inception (high- to mid-level details), create epics and tasks. Add personas, details, notes, acceptance criteria, and more. Can add comments yourself to update. Whatever.
Slice tickets thin and then go wild. Add tickets as you’re working through things. Make modifications.
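For a flavor of it, the workflow section of my Agents.md looks roughly like this (paraphrased, not the literal file):

```
## Ticket workflow (via the plane MCP tool)
- Before planning: read the ticket, its epic, linked standards docs,
  and related tickets; comment a summary of your understanding.
- After planning: comment the plan on the ticket before touching code.
- During implementation: branch name = <type>/<ticket-id>-<slug>;
  comment at each major step; apply labels per the standards doc.
- Before moving to Done: comment a summary of changes, tests run,
  and any follow-up tickets created.
```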
This is actually very interesting, I think, as Anthropic pushes back against The Bitter Lesson a bit! The model is a great reasoner, but we still need a concrete way to manage tasks, like we needed for tool calling. Claude Code has an opinionated loop, something like ReAct/CoT with prompting tricks for tasks/skills/etc., but here they add a hierarchical controller/worker setup leveraging the Claude SDK. Mixing agency with actual control using program logic, not just alignment using prompts screaming in all caps and emoji.
We are going to break out of the coding agent’s loop in this way. It’s sorta curving back around to Workflows, after leaving them behind for agency, but right now we need to orchestrate this with deterministic code written mostly by humans, like the git repo Anthropic shared. This won’t last long.
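A minimal sketch of that controller/worker shape, assuming the task list comes from a planning step (hypothetical code; `runWorker` stands in for a Claude SDK call, not a real API):

```typescript
// Hedged sketch of a hierarchical controller/worker loop.
// runWorker() stands in for a Claude SDK invocation; it is not a real API.

interface Task {
  id: string;
  prompt: string;
  done: boolean;
}

async function runWorker(task: Task): Promise<string> {
  // In a real harness this would call the Claude SDK with the task
  // prompt plus whatever context the controller scoped down for it.
  return `result for ${task.id}`;
}

function verify(result: string): boolean {
  // e.g. run tests or check acceptance criteria; stubbed here
  return result.length > 0;
}

// The controller is plain deterministic code: it sequences agentic
// calls and decides completion itself, rather than trusting the model.
async function controller(tasks: Task[]): Promise<void> {
  for (const task of tasks) {
    if (task.done) continue;
    const result = await runWorker(task);
    task.done = verify(result);
  }
}
```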
Used an LLM to help write the following up as I’m still pretty scattered about the idea and on mobile.
——
Something I’ve been going over in my head:
I used to work in a pretty strict Pivotal XP shop. PM ran the team like a conductor. We had analysts, QA, leads, seniors. Inceptions for new features were long, sometimes heated sessions with PM + Analyst + QA + Lead + a couple of seniors. Out of that you’d get:
- Thinly sliced epics and tasks
- Clear ownership
- Everyone aligned on data flows and boundaries
- Specs, requirements, and acceptance criteria nailed at both high- and mid-level
At the end, everyone knew what was talking to what, what “done” meant, and where the edges were.
What I’m thinking about now is basically that process, but agentized and wired into the tooling:
- Any ticket is an entry point into a graph, not just a blob of text.
- Epics ↔ tasks ↔ subtasks
- Linked specs / decisions / notes
- Files and PRs that touched the same areas
- Standards live as versioned docs, not just a random Agents.md:
- Markdown (with diagrams) that declares where it applies: tags, ticket types, modules.
- Tickets can pin those docs via labels/tags/links.
- From the agent’s perspective, the UI is just a viewer/editor.
- The real surface is an API: “given this ticket, type, module, and tags, give me all applicable standards, related work, and code history” (sketched after this list).
- The agent then plays something like the analyst + senior engineer role:
- Pulls in the right standards automatically
- Proposes acceptance criteria and subtasks
- Explains why a file looks the way it does by walking past tickets / PRs / decisions
So it’s less “LLM stapled to an issue tracker” and more “that old XP inception + thin-slice discipline, encoded as a graph the agent can actually reason over.”
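A sketch of what that agent-facing surface could look like (hypothetical types, not any existing tracker’s API):

```typescript
// Hypothetical sketch of the agent-facing API, not an existing tracker.

interface TicketContextQuery {
  ticketId: string;
  ticketType: "epic" | "task" | "subtask";
  module: string;    // e.g. "billing"
  tags: string[];    // e.g. ["api", "postgres"]
}

interface StandardsDoc { path: string; version: string; appliesTo: string[] }
interface RelatedTicket { id: string; title: string; relation: string }
interface CodeHistoryRef { pr: number; files: string[]; decision?: string }

interface TicketContext {
  standards: StandardsDoc[];     // versioned docs whose declared scope matches
  relatedWork: RelatedTicket[];  // linked epics/tasks/decisions/notes
  codeHistory: CodeHistoryRef[]; // PRs and files that touched the same areas
}

// The agent queries the graph instead of scraping the UI; the tracker
// resolves ticket -> pinned docs -> related work -> code history.
declare function getTicketContext(q: TicketContextQuery): Promise<TicketContext>;
```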
Has any project tried forcing a planning layer of //TODOs throughout the code before making any changes? Small loops, like one //TODO at a time? What about limiting changes to one function at a time to stay focused? Or is everyone a slave to however the model was designed, and currently they’re designed for giant one-shot generations only?
Is it possible that all that local models need to do better is more context, used to make simpler, smaller changes one at a time? I haven't seen enough specific comparisons of how local models fail vs the expensive cloud models.
I did find beads helpful for some of these multi-context-window tasks. It sounds a little like there is some convergence between what they are suggesting and how it gives you lightweight subtasks that survive a /clear.
> It sounds a little like there is some convergence between what they are suggesting and how it gives you lightweight subtasks that survive a /clear.
I do see the convergence there. Beads gives you that "state that survives `/clear`," and Anthropic’s harness tries to do something similar at a higher level.
I've been thinking about this with a pretty simple, old-school analogy:
You're at a shop with solid engineering and ticketing practices. You just hired a great junior developer. They know the stack, maybe even the domain basics, but they don't yet know:
- Your business processes
- The quirks of your microservices
- Local naming conventions, standards, etc.
- Team norms around testing, logging, and observability
You trust them with important tasks, but expect their context will frequently get blown away by interruptions, meetings, task-switching, and long weekends. To handle this, you need to make sure each ticket or note contains enough structured info that when they inevitably lose context, they can pick right back up.
For each ticket, you'd likely include:
- Personas and user goals
- Acceptance criteria, Given/When/Then scenarios
- Links to specs, documentation, related tickets, or prior art
- A short summary of their current understanding
- Rough plan (steps, what's done/not done)
- Decisions made and their rationale ("I chose X because Y")
- Open questions or known gotchas
End of day Friday, that junior would ideally leave notes that answer:
"If I have total amnesia next Tuesday, what's the minimum needed to quickly reload my context?"
To me, agent harnesses like Anthropic's or Beads are just formalizing exactly this pattern:
- `/clear` or `/new` is like a "long weekend brain wipe."
- Persistent subtasks or controllers become structured scaffolding.
- The crucial piece isn't remembering everything, just clearly capturing intent, decisions, rationale, and immediate next steps.
My confusion about Anthropic’s approach is why they're doing this with plain text files or JSON instead of leveraging decades of existing tracker and project-management tooling, which already encodes this exact workflow and these best practices.