
In theory, comments on Hacker News should advance the discussion and meet a certain quality bar, lest they be downvoted to make room for ones that do. I am not sure this was ever true in practice, and it certainly seems to have waned in the years I have been reading this forum (see the many "pelican on a bike" comments on any AI model release thread), but I'd expect some people still try to vote with this in mind.

Sarcasm doesn't lower the bar a comment has to clear to avoid downvotes, so before concluding that people missed the sarcasm, consider whether the comment actually adds to the discussion.


TIL that HA notifications can have associated actions. I have the exact same setup as you, except I only receive the notification and then walk over to the laptop to unblock the agent, feeling like a human tool call. This will improve my workflow, thank you.

The notification payload, for reference; you will also need a permission `input_select` (pending/allow/deny) and an automation that triggers on `mobile_app_notification_action`:

  notification_payload=$(cat <<EOF
  {
    "message": "$escaped_message",
    "title": "$escaped_title",
    "data": {
      "tag": "$escaped_request_id",
      "group": "claude-code",
      "actions": [
        {
          "action": "CLAUDE_ALLOW",
          "title": " Allow"
        },
        {
          "action": "CLAUDE_DENY",
          "title": " Deny"
        }
      ]
    }
  }
  EOF
  )
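If you drive this from TypeScript instead of a shell script, a minimal sketch of building the same payload and posting it to Home Assistant's REST notify endpoint; the host, the `mobile_app_phone` service name, and `HA_TOKEN` are placeholders for your own setup:

```typescript
// Build the same actionable-notification payload as the heredoc above.
// CLAUDE_ALLOW / CLAUDE_DENY are the action ids the HA automation listens for.
function buildActionablePayload(title: string, message: string, requestId: string) {
  return {
    message,
    title,
    data: {
      tag: requestId,
      group: "claude-code",
      actions: [
        { action: "CLAUDE_ALLOW", title: "Allow" },
        { action: "CLAUDE_DENY", title: "Deny" },
      ],
    },
  };
}

// POST the payload to Home Assistant's REST API (hypothetical host/service).
async function sendNotification(payload: ReturnType<typeof buildActionablePayload>) {
  await fetch("http://homeassistant.local:8123/api/services/notify/mobile_app_phone", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.HA_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
}
```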

Actionable notifications are a bit cumbersome on iOS since you need to long-press the notification for actions, but it does work.
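For the receiving side, a rough sketch of the automation; entity ids are hypothetical (I'm assuming the permission `input_select` is called `input_select.claude_permission`), so adjust to your setup:

```yaml
alias: Claude Code permission response
trigger:
  - platform: event
    event_type: mobile_app_notification_action
    event_data:
      action: CLAUDE_ALLOW
    id: allow
  - platform: event
    event_type: mobile_app_notification_action
    event_data:
      action: CLAUDE_DENY
    id: deny
action:
  - service: input_select.select_option
    target:
      entity_id: input_select.claude_permission
    data:
      option: "{{ 'allow' if trigger.id == 'allow' else 'deny' }}"
```

The hook script on the laptop then polls (or subscribes to) that `input_select` and unblocks the agent accordingly.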

Not the person you replied to, but I'll stress that it is not just about what you can add that Claude Code doesn't offer; it is also about what you don't have to carry that Claude Code does ship and you don't want.

I dislike many things about Claude Code, but I'll pick subagents as one example. Don't want to use them? Tough luck. (It's been a while since I used CC; maybe it is configurable now, or always was and I never discovered it.)

With Pi, I just didn't install an extension for that. I suspect one exists, but I have the choice of never finding out.


You can just put "Never use subagents" in your CLAUDE.md and it will honor it, no?

IME CLAUDE.md rarely gets fully honored. I've left HN comments before about how I had to convert some CLAUDE.md instructions into deterministic pre-commit checks because of how often they were ignored. My guesstimate is that it is about 70% reliable. That's with Opus 4.5. I've since switched to GPT-5.2 and now GPT-5.3 Codex and use Codex CLI, Pi, and OpenCode rather than CC, so maybe things have changed with a new system prompt or with the introduction of Opus 4.6.

This is and has always been trivially configurable. Just put `Task` as a disallowed tool.
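If memory serves, that's a permission deny rule in Claude Code's settings file; something like this in `.claude/settings.json` should do it (double-check the current docs for the exact schema):

```json
{
  "permissions": {
    "deny": ["Task"]
  }
}
```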

> running oxfmt without any arguments recursively scans directory tree from the current directory for all .js and .ts files and silently reformats them

I've got to say this is what I would have expected and wanted to happen. It is wise not to run tools designed to edit files on files you don't have backed up (e.g., in Git) without doing a dry run or a small-scope experiment first.


While I can get behind things such as "use version control", "use backups", etc., this is definitely not what I'd expect from a program run without arguments, especially one that goes and changes things.

What? The very first page of documentation tells you this. The help screen clearly shows a `--check` argument. This is a formatter and uses the same arguments as many others - in particular Prettier, the most popular formatter in the ecosystem.

How were you not expecting this? Did you not bother to read anything before installing and running this command on a sensitive codebase?


I do usually run new tools from somewhere harmless, like ~/tmp, just in case they do something unexpected.

But most formatters I'm used to absolutely don't do this. For example, `rustfmt` will read input from stdin if no argument is given. It can traverse modules in a project, but it won't start modifying everything under your CWD.

Most unix tools will either wait for some stdin or dump some kind of help when no argument is given. Hell, according to this tool's docs, even `prettier` seems to expect an argument:

    > Running oxfmt without arguments formats the current directory (*equivalent to prettier --write .*)

I'm not familiar with prettier, so I may be wrong, but from the above I understand that prettier doesn't start rewriting files if no argument is given?

Looking up prettier's docs, they have this to say:

    > --write
    This rewrites all processed files in place. *This is comparable to the eslint --fix* workflow.

So eslint also doesn't automatically overwrite everything?

So yeah, I can't say this is expected behaviour, even if it's documented.


A more closely related tool would be Prettier, which also has a `--write` option.

I also built the equivalent of OpenClaw myself back when it was still called Clawdbot, and I'm confused how LLMs can be heralded as ushering in the era of personal apps while everyone simultaneously uses the same vibe-coded personal LLM assistant someone else made, much less one worth an OpenAI acquisition. I agree building one yourself is very fun.

Just this morning I ran across an even narrower case of how AGENTS.md (in this case with GPT-5.3 Codex) can be completely ignored even when filled with explicit instructions.

I have a line there that says Codex should never use Node APIs where Bun APIs exist for the same thing. Routinely, Claude Code and now Codex would ignore this.

I just replaced that rule with a TypeScript-compiler-powered AST based deterministic rule. Now the agent can attempt to commit code with banned Node API usage and the pre-commit script will fail, so it is forced to get it right.
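A minimal sketch of what such a check can look like, using the `typescript` package's parser; the ban list and function name are my own illustration, not the author's actual script:

```typescript
import ts from "typescript";

// Hypothetical ban list: Node modules that have Bun-native equivalents.
const BANNED_MODULES = new Set(["fs", "node:fs", "child_process", "node:child_process"]);

// Parse a file and collect import specifiers that hit the ban list.
function findBannedImports(fileName: string, source: string): string[] {
  const sourceFile = ts.createSourceFile(fileName, source, ts.ScriptTarget.Latest, true);
  const hits: string[] = [];
  const visit = (node: ts.Node): void => {
    if (ts.isImportDeclaration(node) && ts.isStringLiteral(node.moduleSpecifier)) {
      if (BANNED_MODULES.has(node.moduleSpecifier.text)) {
        hits.push(node.moduleSpecifier.text);
      }
    }
    ts.forEachChild(node, visit);
  };
  visit(sourceFile);
  return hits;
}
```

A pre-commit script then runs this over the staged files and exits non-zero on any hit, which the agent cannot talk its way around.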

I've found myself migrating more and more of my AGENTS.md instructions to compiler-based checks like these, where possible. I feel this shouldn't be needed if the models were good, but it seems to be, and the deterministic nature of these checks beats relying on the LLM's questionable respect for the rules.


Not that much different from humans.

We have pre-commit hooks to prevent people doing the wrong thing. We have all sorts of guardrails to help people.

And the “modern” approach when someone does something wrong is not to blame the person, but to ask “how did the system allow this mistake? What guardrails are missing?”


I wonder if some of these could be embedded in the write tool calls?

I'm disappointed that even Signal does this when asking you for access to your contacts.

I agree; I was confused about where 5.3 non-Codex was. 5.2-Codex disappointed me enough that I won't be giving 5.3 Codex a try, but I'm looking forward to trying 5.3 non-Codex with Pi.


The GPT-5.x models in general are very disappointing; the only good chat model was GPT-5 in its first week, before they made "the personality warmer", and Codex was always kinda meh.


Anthropic banned my account when I whipped up a solution to control Claude Code running on my Mac from my phone when I'm out and about. No commercial angle, just a tool I made for myself since they wouldn't ship this feature (and still haven't). I wasn't their biggest fanboy to begin with, but it gave me the kick in the butt needed to go and explore alternatives until local models get good enough that I don't need to use hosted models altogether.


I control it with SSH and sometimes tmux (Termux + WireGuard leads to a surprisingly stable connection). Why did you need more than that?


I didn't like the existing SSH applications for iOS, and I already have a local app of my own that I keep open 24/7, so I added a screen that uses xterm.js and Bun.spawn with Bun.Terminal to mirror the process running on my Mac to my phone. This let me add a few bells and whistles a generic SSH client wouldn't have, like notifications when Claude Code is done working, etc.


How did they even know you did this? I cannot imagine what cause they could have for the ban. They actively want folks building tooling around and integrating with Claude Code.


I have no idea. The alternative is that my account just happened to be on the wrong side of their probably slop-coded abuse detection algorithm. Not really any better.


How did this work? The ban, I mean. Did you just wake up to an email and find that your creds no longer worked? Were you sub-processing out to the Claude Code CLI, or something else?


I left a sibling comment detailing the technical side of things. I used the `Bun.spawn` API with the `terminal` key to give CC a PTY and mirrored it to my phone with xterm.js. I used SSE to stream CC data to xterm.js and a regular request to send commands out from my phone. In my mind, this is no different than using CC via SSH from my phone - I was still bound by the same limits and wasn't trying to bypass them, Anthropic is entitled to their different opinion of course.
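The SSE leg is just text framing; a minimal sketch (function name is mine) of encoding PTY output as events that an `EventSource` + xterm.js client can consume:

```typescript
// Frame a chunk of terminal output as a server-sent event.
// SSE "data:" lines cannot contain newlines, so a multi-line chunk
// becomes multiple data: lines; a blank line terminates the event.
function encodeSseEvent(event: string, chunk: string): string {
  const dataLines = chunk
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${event}\n${dataLines}\n\n`;
}
```

On the phone, `new EventSource(url)` with a listener for that event writes each payload into xterm.js via `terminal.write()`.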

And yeah, I got three (for some reason) emails titled "Your account has been suspended" saying "An internal investigation of suspicious signals associated with your account indicates a violation of our Usage Policy. As a result, we have revoked your access to Claude." There is a link to a Google Form, which I filled out, but I don't expect to hear back.

I did nothing even remotely suspicious with my Anthropic subscription so I am reasonably sure this mirroring is what got me banned.

Edit: BTW, I have since iterated on the same mirroring using OpenCode with Codex models, then Codex CLI, and now Pi with GPT-5.2 (non-Codex), and OpenAI hasn't banned me yet. I don't think they will, as they decided to explicitly support using your subscription with third-party coding agents following Anthropic's crackdown on OpenCode.


> Anthropic is entitled to their different opinion of course.

I'm not so sure. It doesn't sound like you were circumventing any technical measures meant to enforce the ToS, which I think places them in the wrong.

Unless I'm missing some obvious context (I don't use Mac and am unfamiliar with the Bun.spawn API) I don't understand how hooking a TUI up to a PTY and piping text around is remotely suspicious or even unusual. Would they ban you for using a custom terminal emulator? What about a custom fork of tmux? The entire thing sounds absurd to me. (I mean the entire OpenCode thing also seems absurd and wrong to me but at least that one is unambiguously against the ToS.)


> Anthropic is entitled to their different opinion of course.

It'd be cool if Anthropic were bound by the terms of use you had to sign. Of course, those terms may well be broad enough to let them fire customers at will. Not that I suggest you spend any more time fighting this behemoth of a company. Just sad that this is the state of the art.


It sucks and I wish it were different, but it is not so different from trying to get support at Meta or Google. If I were an AI grifter I could probably just DM a person on Twitter and get this sorted, but as a paying customer, it's wisest to go where they actually want my money.


These frontier model providers employ a kind of weaponized malaise, and I feel like that dark pattern, the one you pointed out, and others are used to rate-limit certain subscriptions.


They have two products:

* Subscription plans, which are (probably) subsidized and definitely oversubscribed (i.e., 100% of subscribers could not use 100% of their tokens 100% of the time).

* Wholesale tokens, which are (probably) profitable.

If you try to use one product as the other product, it breaks their assumptions and business model.

I don't really see how this is weaponized malaise; capacity planning and some form of over-subscription is a widely accepted thing in every industry and product in the universe?


I am curious to see how this will pan out long-term. Is the quality gap of Opus-4.5 over GPT-5.2 large enough to overcome the fact that OpenAI has merged these two bullet points into one? I think Anthropic might have bet on no other frontier lab daring to disconnect their subscription from their in-house coding agent and OpenAI called their bluff to get some free marketing following Anthropic's crackdown on OpenCode.


It will also be interesting to see which business model is more sustainable once the money-fire subsidy musical chairs shake out; it all depends on how many whales there are in each direction, I think (subscription customers using more than expected vs. large buyers of profitable API tokens).


So, if I rent my bike out to you for an hour a day for really cheap, and I do the same 50 more times with 50 others, so that the bike is oversubscribed and you and the others don't get your hours, that's OK because it is just capacity planning on my side and widely accepted? Good to know.


Let me introduce you to Citibike?

Also, this is more like "I sell a service called Take a Bike to the Grocery Store" with a clause in the contract saying "only ride the bike to the grocery store." I do this because I assume most users will ride the bike to the grocery store 1 mile away a few times a week, so bikes will remain available, even though there is an off chance that some customers will ride laps to the store 24/7. However, I also sell a separate, more expensive service called Bikes By the Hour.

My customers suddenly start using the grocery store plan to ride to a pub 15 miles away, so I kick them off of the grocery store plan and make them buy Bikes By the Hour.


As others pointed out, every business that sells capacity does this, including your ISP.

They could, of course, price your 10GB plan under the assumption that you would max out your connection 24 hours a day.

I fail to see how this would be advantageous to the vast majority of the customers.


Well, if the service price were in any way tied to the cost of transmitting bytes, then even the 24hr scenarios would likely see a reduction in cost to customers. Instead we have overage fees and data caps to help with "network congestion", which tells us all how little they think of their customers.


Yes, correct. Essentially every single industry and tool which rents out capacity of any system or service does this. Your ISP does this. The airline does this. Cruise lines. Cloud computing environments. Restaurants. Rental cars. The list is endless.


I have some bad news for you about your home internet connection.


They did ship that feature, it's called "&" / teleport from web. They also have an iOS app.


That's non-local. I am not interested in coding assistants that work on cloud-based workspaces. That's what motivated me to develop this feature for myself.


But... Claude Code is already cloud-based. It relies on the Anthropic API. Your data is all already being ingested by them. It seems like a weird boundary to draw, trusting the company's model with your data but not their convenience web UI. Being local-only (i.e., OpenCode and an open-weights model running on your own hardware) is consistent, at least.


It is not a moral stance. I just prefer to have the files of my personal projects in one place. Sure, I sync them to GitHub for backup, but I don't use GitHub for anything else in my personal projects. I am not going to use a workflow that relies on checking out my code to some VM that I then have to set up with access to all the tools and dependencies already on my machine. It's slower and clunkier. IMO you can't beat the convenience of working on your local files. When I used my CC mirror during the brief period it worked, my changes were just already there when I came back to my laptop: no commits, no pulls, no sync, nothing.


Ah okay, that makes sense. Sorry they pulled the plug on you!


I am using GPT-5.2 Codex with reasoning set to high via OpenCode and Codex, and when I ask it to fix an E2E test, it tells me it fixed it and prints a command I can run to verify the changes, instead of checking whether the test actually passes and looping until it does. This is just one example of how lazy/stupid the model is. It _is_ a skill issue, on the model's part.


Non-Codex GPT-5.2 is much better than Codex GPT-5.2 for me. It does everything better.


Yup, I find it very counter-intuitive that this would be the case, but I switched today and I can already see a massive difference.


It fits with the intuition that Codex is simply overfitted.


Yeah, I meant it more like it is not intuitive to me why OpenAI would fumble it this hard. They must have tested it internally and seen that it sucked, especially compared to GPT-5.2.


Codex runs in a stupidly tight sandbox, and because of that it refuses to run anything.

But using the same model through Pi, for example, it's super smart, because Pi just doesn't have ANY safeguards :D


I'll take this as my sign to give Pi a shot then :D Edit: I don't want to speak too soon, but this Pi thing is really growing on me so far… Thank you!


Wait until you figure out you can just say "create a skill to do..." and it'll just do it, write it in the right place and tell you to /reload

Or "create an extension to..." and it'll write the whole-ass extension and install it :D


I refuse to defend the 5.2-Codex models. They are awful.

