Hacker News | antirez's comments

"West" when we talk about urban spaces, walk-accessible cities and public transportation is, IMHO, the wrong category. Europe and USA are very far apart.

Europe and the USA are both huge places, so it depends on what you mean. If you compare major east coast cities (Boston, DC, NYC) to European metros like Paris, Madrid, or Lisbon, the biggest tax on citizens is the same: it's impossible to build anything, so a huge % of income has to go to housing.

Well, Japan isn't much different in terms of the share of income that goes to housing: https://housingpolicytoolkit.oecd.org/2.H_conso.html

East coast cities were built before modern building codes.

Something that, for some reason, people in the States don't want to accept is that - when given the choice - the vast majority of people prefer living in dense urban environments.


OP addresses that. Japan is not particularly dense, especially outside of core downtowns.

You see the same dynamics in London and Paris.

People do not "prefer to live in dense urban environments" by urbanist standards.

They prefer to live in dense urban environments by North American standards, which can still be far less dense than urbanists really want.


> which can still be far less dense than urbanists really want.

And this was my comparison?


Maybe an assumption on my part, but the language "people prefer to live in dense urban environments" is typical of urbanism boosters - who definitely push a narrative online that leads one to believe that anything less than inner Tokyo is unacceptable.

Do you have a source for this? What threshold is needed for it to be 'dense'?


Great point.

Granted I’m approaching it from the perspective of a tourist or business traveler, but 6/6 of the European cities I’ve been in were fully navigable for my purposes via transit. I’d probably guess half or less in the US.

Even in NYC or SF, the metro areas are so large that the success rate really depends on the trip.


they might mean west of Japan ;)

Go far enough and Japan is west of Japan, several times over. You can always keep heading west.

I moved two servers, one from Linode and the other from DO to Hetzner a few months ago, with similar savings. The best part was that the two servers had dozens of different sites running, implemented in different languages, with obsolete libraries, MySQL and Redis instances. A total mess. Well: Claude Code migrated it all, sometimes rewriting parts when the libraries where no longer available. Today complex migrations are much simpler to perform, which, I believe, will increase the mobility across providers a lot.

IMO nobody was paying for magic compute. They're paying to not touch ten years of glue.

if agents eat that glue, the moat gets thin fast.


> agents eat that glue

No wonder they hallucinate :)


It's when they sniff the glue, then things get wild.

It's been like 90% glue since Perl took over.

Don’t forget staples and the tape too. LLMs have a weakness for paperclips, hope we don’t end up on that path

Yeah, at my last job there was a single outdated external wiki server left sitting in DO for those kinds of reasons, while everything else had already been updated and moved internally (if not twice). If it hadn't become such a security risk it would never have been moved.

The problem is that a lot of this glue is proprietary by design at the various cloud services. I realize there are open source and alternative abstractions for a lot of the same services, but there's still quite a bit of glue if you're on AWS, for example, and looking to move to bare metal.

But maybe I’m just thinking of the current capabilities of agents, and if we fast forward a couple years, even removing these abstractions or migrating will be very low friction.


But you can run most of the glue on your own dedicated instances.

I run k8s on a bunch of dedicated servers that are super cheap and I have all the bells and whistles - just tell your coding agent to do it. You can literally design the thing you would never build yourself and it works brilliantly.

Postgres running on dedicated hardware, replicated and with WAL backups? Easy: just tell codebuff (my harness of choice) to do it. Then any number of firewalls, load balancers, bastion servers, etc. If you can imagine it, codebuff will implement it.
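A setup like that boils down to a handful of well-known settings. As a minimal sketch of the "replicated with WAL backups" part, the primary's postgresql.conf might look like this (the archive path and sender count are placeholder choices; you still need a base backup and a standby configured on the other side):

```ini
# postgresql.conf on the primary -- minimal WAL archiving + replication sketch
wal_level = replica          # enough WAL detail for physical replicas
archive_mode = on
# copy each finished WAL segment to backup storage (path is a placeholder)
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
max_wal_senders = 3          # allow a few streaming standbys / base backups
```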


Wow a Claude add embedded into a Hetzner add.

How deep does this go?


I have just seen with my own eyes Claude astroturfing on a gamedev subreddit from a botting account that was picked up by Google so I could see a few of their other comments. The account's operation consisted of going on development subs complaining about how good Claude's latest model is and how awful it is to be afraid of losing one's job to AI.

I know your comment is tongue-in-cheek and the poster here is kinda known, but this kind of astroturfing is a new low and it's everywhere on forums such as these.


I see a lot of these posts on Reddit, too, but I don't think it's actually Anthropic or Claude doing it. It's the same old Reddit karma farmers picking up on the latest trends. They've always combined headlines with ragebait to build karma and now LLM bots make it easier than ever.

It's too bad Reddit now allows accounts to hide their comment history. That used to be an easy way to identify bot accounts.


> the same old Reddit karma farmers

What's the point of these Internet points?


The accounts are worth something later (e.g. for spreading opinions or promoting something) and can be sold.

Why would a 1000 karma bot account be more valuable than a 100 karma one? As long as you pass the threshold for not being shadow banned in most subs because of low karma, it's irrelevant.

On HN for example, karma is a relatively stronger signal of account history, on Reddit there are multiple million+ karma accounts that are quite obviously bots.


Numbers go bigger. Plenty of games based on that.

That is a pity but on the bright side it also helps people to avoid being stalked and harassed.

I think it probably is Claude and Anthropic. It kicked off in earnest toward the end of last year, when "AI bubble" news stories peaked.

They're incredibly exposed to investor sentiment and were likely panicking around Sept/Oct/Nov, when AI bubble stories were trending.

These posts were really consistent and repetitive - similar language about "scary good" models and fear of losing jobs.


I've been warning people about Anthropic's astroturfing for a while now. The amount of "Insert latest model/Claude Code is scary. I'm worried about my job" posts, followed by doom-ridden writing about how their job was automated, 30 dudes got fired, and the person is pivoting into plumbing or working at McDonald's, is just too suspicious not to note. Sometimes it's more covert: they don't mention any provider/model, or there's a subtle insert somewhere in the body - Opus, Claude, etc.

The whole internet is like this now, and it's only just getting started. Makes me sick tbh, and I am still questioning if this is the kind of industry I want to work in.

For those who remember Digg: they recently relaunched a new version and shut it down almost immediately. They were getting hammered with AI bots once it was realized that Digg apparently still has good SEO. They explain it right on the homepage.

https://digg.com/


They explained it with an AI generated post

> I am still questioning if this is the kind of industry I want to work in

I'm not. I stick around for the popcorn, and I'm not gonna miss the schadenfreude in a few years.


> I have just seen with my own eyes Claude astroturfing on a gamedev subreddit from a botting account that was picked up by Google so I could see a few of their other comments.

Where's the link? I mean, why would anyone take your word for it? No one can tell who you are, either. If there are posts in a subreddit, it would be interesting to see them.


I was really confused; then I realized the person you're replying to misspelled "ad" as "add", and you're moving forward with the premise that GP's comment is an ad, and this HN submission is an ad. Then you share that you saw a Reddit account on a gamedev subreddit complaining that AI is too good and that they're worried they won't have a job, and you believe that Reddit account must have been an ad for AI.

Just noting for fellow just-waking-up people

(edit: OP edited)


It's not necessarily astroturfing. There is a seismic shift under way regarding how things get done in this business, and if you don't acknowledge it, that's weird in itself.

I've certainly noticed a seismic shift in how bad support and updates have gotten with some 3rd party vendors we use, and the answer they come back with is always that they're experimenting with AI. Not saying AI isn't part of the job now, but it is getting seriously overhyped and overextended.

It's absolutely not seismic. If you've used AI for a little bit, you'll realize it's good at writing boilerplate code. Any complex logic, and you'd better re-read and correct the code a few times until you trust it.

Of course, if all you do is "host a WordPress website" (like 80% of what "webdevs" do), it will work. The issue is that the last 20% is the hardest to cover, and current AI methods will not get there (you need much more complex methods, like being able to integrate logic with learning-based ML, to do this).


This is the redis guy you're replying to, I doubt he's on Claude's payroll

It would seem that way for sure, if it was just a random anon posting it, but the person you're replying to is the creator of Redis so I feel it's more likely a genuine opinion/experience rather than a Claude ad...

Do folks see a mention of Claude as indicative of an ad? I usually say it because at work we've got a couple different options and I like to mention which one in particular I was using. But maybe I'll just start saying AI on forums unless someone asks me to specify.

I doubt that someone with 30000+ karma waited for this precise post to surreptitiously submit a "clever" Claude ad to draw you in.

I think it is more the other direction. I asked Claude how to save money on my cloud costs, and it suggested migrating from DO to Hetzner.

It's certainly a choice to accuse antirez of all people

True, let's not criticize those saints of ours.

"ad", with a single "d".

So it's a Claude ad inside a Hetzner ad inside a decent grammar ad.


You forgot that this entire forum is a VC/incubator ad. It's ads all the way down.

Don't forget the ad hominem

The amount of ad populum around this issue certainly reaches ad absurdum levels.

Ad for which elementary school?

Come on, can you really nitpick grammar when your original message contains: "when the libraries where[sic] no longer available"

Btw this type of grammar error can be found by proofreading your posts with ChatGPT powered OpenClaw assistant.


I was joking. I notoriously write bad English but don't like using LLMs for writing. It removes personality.

> I notoriously write bad English...

You mean that you write in English badly. :-)


I mean if it were anyone else, yeah I might agree, but I think Salvatore is being genuine here (and have seen Claude do a similarly surprising job fixing ops issues).

On the other hand he has totally drunk the Kool-AI(d).

I don't think so. I think he's clearly abusing language (saying "Claude Code migrated the stuff" rather than "I migrated the stuff after using Claude to help write boilerplate, then double-checked it, tested it, and ran it")

I don't think you've nailed it either. He SHOULD be saying "54 days ago, I powered on my computer and opened a terminal. From my editor I reviewed my code files and realized I had quite a mess on my hands. Realizing it was the year A.D. 2026, I decided to fire up a modern tool. I typed "claude" into my terminal. As it launched, I told it I wanted help taking my running programs and moving them from the virtual private servers I was running in Linode (inc) and Digital Ocean (co) to Hetzner (LLC). Using its tool-use abilities, Claude read the files and made suggestions on how to do the migrations; it indicated that it could go ahead and copy the files and run the needed commands, but I would need to give it permission first. I granted it permission. Once it said the services were running, I instructed it to test that they were accessible and reliable while I reviewed the glowing new code it had written. In summary, with the help of Claude Code I was able to redeploy 37 services in Hetzner."

I think the parent has a point. For how many other accomplishments is the tool framed as the responsible party? We don't say "cranes built the skyscraper", people did. Why do we shift accountability when it comes to AI?

If you vaguely describe to the crane what you want built, and it builds it, then I'd say the crane built it.

On Monday a crane company announces it’s pivoting to AI, followed by a quick 600% boost to its stock price. I wouldn’t even be surprised at this point.

For clarity and accuracy, in the hopes that the person reading it interprets in good faith.

Because it shows you are hip and trendy and MAYBE you deserve a job in the AI era

By this logic your most recent comment was just an ad for Netflix.

Not every fscking story has to be about AI.

They didn't make it about AI, they mentioned a tool that helped with the migration. I find it relevant and helpful to know.

I don't see it as much different from "I used script X to do it" or something.


The author knows that AI can cause social disruption of the kind that siphons even more money to the rich. To that he has thoughts and prayers[1].

Extreme pragmatism for AI converts is just instant gratification for your hacker itch without looking any further.

That’s why people get fatigued by yet another “but now with AI”. In isolation they are just solving this one problem.

[1] https://news.ycombinator.com/item?id=46587277


Excuse my ignorance, but how is that migration (especially of older libraries that apparently had to be rewritten) not just a copy/paste action from one server to the other? When I build software to deploy, it includes everything it requires library-wise. At least for the few things I've deployed so far.

You have to copy data across, and confirm that everything worked correctly, and if you're being fancy about it you need to freeze writes to the old server while you are migrating and then unfreeze after you've directed traffic to the new server. It's not trivial.
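As a rough sketch of that sequence (hostnames, the data path, and the use of MySQL read-only mode are illustrative assumptions, not a universal recipe), the ordered steps might look like:

```python
# Minimal sketch of the freeze/copy/verify/unfreeze sequence described above.
# Hosts, paths, and commands are placeholders; run each step manually the
# first time rather than blindly executing the list.
import shlex

def migration_plan(old_host: str, new_host: str, data_dir: str) -> list[str]:
    """Return the ordered shell commands for a freeze-and-copy migration."""
    quoted = shlex.quote(data_dir)
    return [
        # 1. Freeze writes on the old server (here: MySQL global read-only).
        f"ssh {old_host} \"mysql -e 'SET GLOBAL read_only = ON;'\"",
        # 2. Copy data across while nothing can change underneath us.
        f"rsync -az --delete {old_host}:{quoted}/ {new_host}:{quoted}/",
        # 3. Verify the copy with a checksum pass before cutting over.
        f"rsync -azc --dry-run --itemize-changes {old_host}:{quoted}/ {new_host}:{quoted}/",
        # 4. Point traffic at the new server, then unfreeze writes there.
        f"ssh {new_host} \"mysql -e 'SET GLOBAL read_only = OFF;'\"",
    ]
```

The ordering is the whole point: writes stay frozen for the entire copy-and-verify window, which is exactly the part that makes this "not trivial".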

Sometimes you need library version X, which uses a compiled binary for the platform, which requires C library version Y, which requires glibc version Z, which is deprecated on the current version of the OS, etc etc etc.

Or you can update the app to remove the dependency on the library.

But honestly, this is what containers or VMs are built for in the first place.


I don't see what checking a file system has to do with anything either.

They really can't help themselves showing how they didn't put any effort doing a thing.

> They really can't help themselves showing how they didn't put any effort doing a thing.

I would be proud to show that I managed to take one of the most radical changes we can do to a system, which would otherwise be practically unthinkable, and use a tool to make it trivial to pull off.


Yeah, OP is famous for never having put effort into anything, just an AI shill /s

https://en.wikipedia.org/wiki/Salvatore_Sanfilippo

This whole thread is hilarious.


> Not every fscking story has to be about AI.

It might surprise you to learn that nowadays there are a lot of people using LLM code assistants. Those who do can also use them to help them write blog posts.


You may not be interested in AI, but AI is interested in you.

I use that phrase with "done with Microsoft," but it fits well here too!

Linode is going to lose my business in the next couple months as well. Been there over a decade, have referred countless customers to them, but they’ve kept bumping up prices over and over and I can get a dedicated server at Hetzner or other places with 8x the memory, dedicated NVMe disks, dedicated CPU for cheaper.

Sure you lose a little of the benefit of a “virtual” server which can be migrated but Hetzner’s support has always been super fast and capable, should I wind up in a situation where I’ve got downtime.


Linode was bought by Akamai a couple of years ago.

... expect Hetzner to raise prices in ... 3 ... 2 ... 1


I’ve been experimenting with letting a local agent manipulate my remote servers using https://bower.sh/zmx-ai-portal

What's exciting is how such simple CLI tools can be so impactful to dev workflows


I, too, am bravely using Claude for more DevOps. I run all of my virtual machines on Proxmox atop bare-metal servers I own, and I'm just blown away at how quickly Claude can optimize and set up entire new networks across all of these machines. Truly feels like a coworker or a well-paid sysadmin.

"servers I own" - that's a temporary glitch and AI will fix it for ... someone else.

Let me guess: the Pro subscription was totally sufficient even using the expensive model and it only took an hour?

Now imagine you can do that with a local model. You're basically breaking lock-in on _every_ end. Simply beautiful. A digital guillotine for the digital elite!

> Today complex migrations are much simpler to perform, which, I believe, will increase the mobility across providers a lot.

Syntax did a nice episode on this topic recently. They went over where it works well, and where it does not work well.

https://syntax.fm/show/992/migrating-legacy-code-just-got-ea...


Sure, and then you realize it deleted the db to "simplify the migration" lol

Obviously I agree that AI can be useful to write boilerplate, but it's in no way something you should use blindly when trying to do a migration or anything touching prod

So, to be more precise: no, "Claude Code didn't migrate it all". Claude Code helped you write boilerplate so that you could migrate


It deleted the db to "simplify the migration": that was a real scenario in at least one Sev 1 outage at a "leading cloud provider".

Claude gave him the courage to do the migration.

And, recent research suggests that anthropomorphization may actually be positively correlated with intelligence.


yeah, everything is about to be repriced.

I'm happy to see this short story posted here; it is one that I deeply loved when I was 14 or so, and I read it again multiple times. But I wonder: how did it survive on those sites without being taken down by the copyright holders of Asimov's writings? Given that the story is short and widely shared, was it just tolerated?

EDIT: actually I see that the link historically posted here more often is now dead: multivax.com/last_question.html


I was wondering the same. All the links to Asimov stories I've bookmarked in the past are now dead, so there probably is some enforcement of copyright.

Are you sure you selected GPT 5.4-xhigh as the model in Codex? This makes a huge difference, and with this setting, in my experience, Codex outperforms Opus for almost every coding/reasoning task. Opus is often still better when there are a lot of tools to call, servers to interact with, and similar operations, but not always. For low-level coding, though, Codex with GPT 5.4-xhigh is really powerful.

Let's see if even mid/big companies with tons of resources will, with AI and the right tooling, continue to write webview apps or, even worse, use some kind of multi-target wrapper.

In the Anthropic Mythos model cards they explicitly remarked that they didn't want Mythos to be specifically good at security. They trained it to be good at coding, and as a side effect the model is (obviously) good at security. This is what happens with flesh-and-blood hackers too, mostly. Hackers are very good programmers; as a side effect they understand systems well enough that their understanding has security implications.

Model cards are just marketing material. I wouldn’t trust them one bit.

You don't need to trust anyone. GPT 5.4 xhigh is available and you can test it for $20, to verify it is actually able to find complex bugs in old codebases. Do the work instead of denying AI can do certain things. It's a matter of an afternoon. Or, trust the people that did this work. See my YouTube video where I find tons of Redis bugs with GPT 5.4.

I did not claim or deny anything. You cited the model card, I just pointed out that this is no reliable source. If you have better sources, like your YT video, you should cite those instead.

You are claiming something: that the model card is unreliable and therefore worthless. Sowing doubt without offering an alternative adds little to the conversation. Moreover, your rebuttal is unsubstantiated.

Guys, think about all the security vulnerabilities you're aware of; now think about how many of those you know how to technically reproduce. Now imagine that you actually don't know how to reproduce most things and will never actually be able to judge the result.

Well, just because these are all AI people doesn't mean they verified enough of the output of these models to actually support the significant security implications they're advertising.


And benchmarks can easily be gamed by overfitting. Yet here we are with the top HN comment on the HN Mythos thread outlining its benchmarking performance gains.

I guess we'll never learn.


The whole discussion started out as an attempt to disprove/verify Anthropic's (model card) claims.

He also transfers the logic of their claims to the actual real world. You can say that model cards are marketing garbage. You have to prove that experienced programmers are not significantly better at security.


> You have to prove that experienced programmers are not significantly better at security.

That has not been my experience. It's true that they are "better at security" in the sense that they know to avoid common security pitfalls like unparamaterized SQL, but essentially none of them have the ability to apply their knowledge to identify vulnerabilities in arbitrary systems.


An expert-level human doesn't have to be expert at every programming category. A webdev wouldn't spot a use-after-free; a systems engineer wouldn't know about CSRF. That is, if neither researches security beyond their field. Requiring a programmer to apply their knowledge to an arbitrary system is asking too much. On the other hand, an LLM can be expert level in every programming field, able to spot and combine vulnerabilities creatively. That is all pretty hard, and I don't think a security expert with vast knowledge would say "that's easy".

My point is that more experienced programmers are better at security on average, not that they are security experts.


I would think pwn2own competitions would signal the opposite. I'm consistently and often amazed at how a unique combination of exploits can bring a larger exploit and often in ways that most wouldn't even consider. I think it takes a level of knowledge, experience, creativity and paranoia to be really good with security issues all around as a person.

> essentially none of them have the ability to apply their knowledge to identify vulnerabilities in arbitrary systems.

I've found it to be the opposite. Many of them do have the ability to apply their knowledge in that fashion. They're just either not incentivised to do so, or incentivised to not do so.


But they are treated as holy scripture ...

> Hackers are very good programmers

This does not match my experience.


The missing part of their intended meaning is "skilled hackers". Unskilled hackers are everywhere, and they're bad at programming, but so are unskilled programmers.

>>> the model is (obviously) good at security

Out of curiosity, are you one of the people who has access to the model? If yes, could you write about your experimental setup in more detail?


Yep. Some are, while others are more or less forum leeching and exploiting known risks using tools.

But the ones that really find certain bugs are truly exceptional. Almost all are very prolific with hardware and do assembler stuff. That alone is an impressive feat; I still enjoy 6510 and M68000 assembler here and there, as a former scener who mainly coded demos and here and there improved games (so-called trainers) or cracked a few.

To be honest, the assembler guys always scare me, because with it you can poke a hole in almost anything. No one in their sane mind uses assembler on x86 for professional development besides a few special cases. But Python etc. serve many MB of executable code for the abstraction, and 20 bytes just kills it…


Why this is the wrong analogy: finding hash collisions, while exponentially harder with N, is guaranteed to succeed; with enough work you will find some S such that H(S) satisfies N, so an asymmetry of resources means the side doing more work eventually wins. Bugs are different: 1. different LLM executions take different branches, but eventually the branches possible given the code's states are saturated; 2. if we imagine sampling the model for a bug in a given piece of code M times, with M large, eventually the cap becomes not M (because the state space of the code and the LLM sampler saturates) but I, the model's intelligence level. The OpenBSD SACK bug shows this easily: you can run an inferior model an infinite number of times; it will never realize that the lack of validation of the start window, put together with the integer overflow, put together with the fact that the branch where the node should never be NULL is entered, produces the bug. So the cybersecurity of tomorrow will not be like proof of work, where "more GPU wins": better models, and faster access to such models, win.
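The proof-of-work half of the contrast is easy to make concrete: a partial-preimage search is guaranteed to succeed given enough work, with expected cost growing as 2^n in the difficulty. A toy version (my own illustration, not from the comment above):

```python
# Brute-force search for S such that sha256(prefix + S) has n_bits leading
# zero bits. Unlike bug hunting, success is guaranteed: expected work ~ 2**n_bits.
import hashlib

def find_pow(prefix: bytes, n_bits: int) -> int:
    """Return a nonce so sha256(prefix + str(nonce)) has n_bits leading zero bits."""
    target = 1 << (256 - n_bits)   # digests below this value qualify
    nonce = 0
    while True:
        digest = hashlib.sha256(prefix + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1
```

With n_bits = 8 this returns after a few hundred attempts on average; there is no analogous "keep grinding and you must win" guarantee for bug discovery, which is the point being made.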

Agreed, it is different in terms of there being no guarantee that a specific piece of software even has an exploit. If you don't want to break into a specific piece of software, or even a specific system, I would argue that the law of averages applies: If you just invest enough, you'll likely find _something_ worth exploiting.

In other terms, I feel the argument from TFA generally checks out, just on a different level than "more GPU wins". It's one up: "More money wins". That's based on the premise that more capable models will be more expensive, and using more of it will increase the likelihood of finding an exploit, as well as the total cost. What these model providers pay for GPUs vs R&D, or what their profit margin is, I'd consider less central.

But then again, AI didn't change this, if you have more money you can find more exploits: Whether a model looks for them or a human.


Congrats: completely broken methodology, with a big conflict of interest. Giving specific bug hints, with an isolated function that is suspected to have bugs, is not the same task, NOR (crucially) is it a task you can decompose the bigger task into. It is basically impossible to segment code into pieces, provide the pieces to smaller models, and expect them to find all the bugs GPT 5.4 or other large models can find.

Second: the smarter the model, the less the pipeline matters. In the last couple of days I found tons of Redis bugs with a three-prompt, open-ended pipeline composed of a couple of shell scripts. Do you think I wasn't already trying with weaker models? I did, but it didn't work.

Don't trust what you read: you have access to frontier models for $20 a month. Download some C code, create a trivial pipeline that starts from a random file and looks for vulnerabilities, then add another step that validates each finding under a hard test, like an ASAN crash or the ability to reach some secret, and only then reports the problem. Test for yourself what is possible. Don't let your fear make you blind.

Also, there is a big problem that makes the blog post's reasoning not just weak per se, but categorically weak: if small model X can find 80% of vulnerabilities, and there is a model Y that can find the other 20%, we need Y: the maintainers should make sure they have access to models at least as good as those of the black-hat folks.

Idk, it seems reasonable to me

> "Our tests gave models the vulnerable function directly, often with contextual hints. A real autonomous discovery pipeline starts from a full codebase with no hints. The models' performance here is an upper bound on what they'd achieve in a fully autonomous scan. That said, a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE's and Anthropic's systems do."

Also, they included a test with a false positive; the small models got it right and Opus got it wrong. So this paper shows that with the right approach and harness these smaller models can produce the same results. That's awesome!

So, if you're struggling to make these smaller models work it's almost certainly an issue of holding them wrong. They require a different approach/harness since they are less capable of working with a vague prompt and have a smaller context, but incredibly powerful when wielded by someone who knows how to use them. And since they are so fast and cheap, you can use them in ways that are not feasible with the larger, slower, more expensive models. But you have to know how to use them, it requires skill unlike just lazily prompting Claude Code, however the results can be far better. If you aren't integrating them in your workflow you're ngmi imo :) This will be the next big trend, especially as they continue to improve relative to SOTA which is running into compute limitations.


Anthropic gave the model the whole codebase and told it to find a vulnerability on a specific file, iterating across sessions focusing on different files.

What happens then is that, for example, the model looks through that particular file, identifies potential problems, and works upwards through the codebase to check whether those could actually be hit.

“Hum, here we assume that the input has been validated, is there any way that might not be the case?”

This is not unique to Mythos. You can already do this with publicly available models. Mythos does appear to be significantly more capable, so it would get better results.

The research discussed here provided models with just a known buggy function, missing the whole process required to find that bug in the first place.


Mmm, Anthropic had a harness that had Mythos check each file as an entry point. That's not quite "here is a codebase, find vulns". A more sophisticated harness with a fast and cheap model could go function-by-function to do the same thing. Which is what this was validating.

> The research discussed here provided models with just a known buggy function, missing the whole process required to find that bug in the first place.

That process can be made part of a harness, again which is what they were validating.

I'm not sure why people are so hell-bent on disparaging open source models here. I get that some people can't get results from them, but that's just a skill issue - we should all be ecstatic that we don't need to rely on the unethical AI corps to allow us to do our jobs.


Exactly, this is so flawed. Anthropic themselves said they only reported <1% of the vulnerabilities found, because the rest are unpatched.

Give open models a Linux environment (prior to Feb 15, so no Mythos-discovered vulns are patched) and see how many vulnerabilities they can find. Then put them in a sandbox and see if they can escape and send you an e-mail.


Thanks Dario, very cool!

Don't focus on what you prefer: it does not matter. Focus on what tools the LLM requires to do its work in the best way. MCP adds friction: imagine doing the work yourself through the average MCP server. However, skills alone are not sufficient if you want, for instance, to give LLMs the ability to instrument a complicated system. Work in two steps:

1. Ask the LLM to build a tool, under your guidance and specification, in order to do a specific task. For instance, if you are working with embedded systems, build a monitoring interface that, with a simple CLI, lets you debug the app as it runs, set breakpoints, spawn the emulator, and restart the program from scratch in a second by re-uploading the live image and resetting the microcontroller. This is just an example, but I bet you get what I mean.

2. Then write a skill file where the usage of the tool at "1" is explained.

Of course, for simple tasks you don't need the first step at all. For instance, it does not make sense to have an MCP to use git. The agent knows how to use git: git is comfortable for you to use manually, and it is, likewise, good for the LLM. Similarly, if you always estimate the price of running something with AWS, then instead of an MCP with service discovery and pricing that needs to be queried in JSON (would you ever use something like that?), write a simple .md file (using the LLM itself) with the prices of the things you use most commonly. This is what you would love to have, and this is what the LLM wants. For complicated problems, instead, build the dream tool you would build for yourself, then document it in a .md file.


I feel like the MCP conversation conflates too many things and everyone has strong assumptions that aren't always correct. The fundamental issue is between one-off vs. persistent access across sessions:

- If you need to interact with a local app in a one-off session, then use CLI.

- If you need to interact with an online service in a one-off session, then use their API.

- If you need to interact with a local app in a persistent manner, and if that app provides an MCP server, use it.

- If you need to interact with an online service in a persistent manner, and if that app provides an MCP server, use it.

Whether the MCP server is implemented well is a whole other question. A properly configured MCP explains to the agent how to use it without too much context bloat. Not using a proper MCP for persistent access, and instead trying to describe the interaction yourself with skill files, just doesn't make any sense. The MCP owner should be optimizing the prompts to help the agent use it effectively.

MCP is the absolute best and most effective way to integrate external tools into your agent sessions. I don't understand what the arguments are against that statement?


My main complaint with MCP is that it doesn't compose well with other tools or code. Like if I want to pull 1000 Jira tickets and do some custom analysis, I can do that with a CLI or API just fine, but not with MCP.

Right, that feels like something you'd do with a script and some API calls.
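For the bulk case, the script really is short. A sketch of the paging logic, assuming a hypothetical fetch function (the page-size convention, field names, and the fetcher itself are all made up, and the fetcher is injectable so nothing here depends on a real Jira instance):

```python
# Sketch: collect every ticket by paging until the server runs dry,
# then do arbitrary "custom analysis" in plain code. The fetch_page
# callable is a stand-in for whatever HTTP call the real API needs.

def fetch_all_issues(fetch_page, page_size=100):
    """Page through results; stop when a short page comes back."""
    issues, start = [], 0
    while True:
        batch = fetch_page(start_at=start, max_results=page_size)
        issues.extend(batch)
        if len(batch) < page_size:
            return issues
        start += page_size

def summarize(issues):
    """Toy analysis step: count issues per status."""
    counts = {}
    for issue in issues:
        status = issue.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts
```

With the real thing, fetch_page would be a thin wrapper around the service's search endpoint; the analysis step is whatever code you want, which is exactly the composability point.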

MCP is more for a back and forth communication between agent and app/service, or for providing tool/API awareness during other tasks. Like MCP for Jira would let the AI know it can grab tickets from Jira when needed while working on other things.

I guess it's more like: the MCP isn't for us - it's for the agent to decide when to use.


I just find that e.g. cli tools scale naturally from tiny use cases (view 1 ticket) to big use cases (view 1000 tickets) and I don't have to have 2 ways of doing things.

Where I DO see MCPs getting actual use is when the auth story for something (looking at you slack, gmail, etc) is so gimped out that basically, regular people can't access data via CLI in any sane or reasonable way. You have to do an oauth dance involving app approvals that are specifically designed to create a walled garden of "blessed" integrations.

The MCP provider then helpfully pays the integration tax for you (how generous!) while ensuring you can't do inconvenient things like say, bulk exporting your own data.

As far as I can tell, that's the _actual_ sweet spot for MCPs. They're sort of a technology of control, providing you limited access to your own data, without letting you do arbitrary compute.

I understand this can be considered a feature if you're on the other side of the walled garden, or you're interested in certain kinds of enterprise control. As a programmer however I prefer working in open ecosystems where code isn't restricted because it's inconvenient to someone's business model.


The auth angle is pretty interesting here. I spend a fair amount of time helping nontechnical people set up AI workflows in Claude Cowork, and MCP works pretty well for giving them an isolated external system where I can tightly control their workflow guardrails, while also giving them the freedom to treat what IS exposed as a generic API automation tool.

That, combined with skills, lets these non-technical people string together Zapier-like workflows in natural language, which is absolutely huge for the level of agency and autonomy it affords them. So I find it quite interesting for the use case of providing auth-encapsulated API access to systems that would normally require an engineer to unlock.

The story around "wrap this REST API into a controlled variant scoped to the end user's use case, and let them complete auth challenges in every which way" has been super useful. Some of my MCP servers go through an OAuth challenge-response; others guide users to navigate to the system, generate an API key, and paste it into the server on initial connection.

>while ensuring you can't do inconvenient things like say, bulk exporting your own data

I think this is the key; I want my analysts to be able to access 40% of the database they need to do their job, but not the other 60% parts that would allow them to dump the business-secrets part of the db, and start up business across the street. You can do this to some extent with roles etc but MCP in some ways is the data firewall as your last line of protection/auth.


MCPs are for documentation. CLI->API is for interaction.

Weird... I've been happily using Atlassian's MCP for this kind of thing just fine?

Give the model a REPL and let it compose MCP calls, either by using tool calls' structured output, doing string processing, or piping results to a fast, cheap model to produce structured output.

This is the same as a CLI. Bash is nothing but a programming language, and you can take the same approach by giving the model JavaScript and having it call MCP tools and compose them. If you do that, you can even throw in composing with CLIs as well.


You can make it compose by also giving the agent the necessary tools to do so.

I encountered a similar scenario using Atlassian MCP recently, where someone needed to analyse hundreds of Confluence child pages from the last couple of years which all used the same starter template - I gave the agent a tool to let it call any other tool in batch and expose the results for subsequent tools to use as inputs, rather than dumping it straight into the context (e.g. another tool which gives each page to a sub-agent with a structured output schema and a prompt with extraction instructions, or piping the results into a code execution tool).

It turned what would have been hundreds of individual tool calls filling the context with multiple MBs of raw confluence pages, into a couple of calls returning relevant low-hundreds of KBs of JSON the agent could work further with.


The agent cannot compose MCPs.

What it can do is call multiple MCPs, dumping tons of crap into the context and then separately run some analysis on that data.

Composable MCPs would require some sort of external sandbox in which the agent can write small bits of code to transform and filter the results from one MCP to the next.


This is confusing to me. What is composability if not calling a program, getting its output, and feeding it into another program as input? Why does it matter if that output is stored in the LLM's context, or in a file, or ephemerally?

Maybe I'm misunderstanding the definition of composability, but it sounds like your issue isn't that MCP isn't composable, but that it's wasteful because it adds data from interstitial steps to the context. But there are numerous ways to circumvent this.

For example, it wouldn't be hard to create a tool that just runs an LLM, so when the main LLM convo calls this tool it's effectively a subagent. This subagent can do work, call MCPs, store their responses in its context, and thereby feed that data as input into other MCPs/CLIs, and continue in this way until it's done with its work, then return its final result and disappear. The main LLM will only get the result and its context won't be polluted with intermediary steps.

This is pretty trivial to implement.
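A sketch of that subagent loop, with the model call stubbed out (run_model, the message format, and the tool-call convention are all placeholders here, not any real SDK's API - the point is only that intermediate tool output lives in the subagent's transcript and the caller sees just the final answer):

```python
# Sketch: an isolated agent loop. Tool results accumulate in a local
# transcript; when the stubbed model signals it is done, only the
# final value is returned and the transcript is discarded, so the
# parent conversation's context stays clean.

def run_subagent(task, tools, run_model):
    """Run an isolated loop; return only the final result."""
    transcript = [{"role": "user", "content": task}]
    while True:
        reply = run_model(transcript)        # placeholder for an LLM call
        transcript.append({"role": "assistant", "content": reply})
        if reply.get("tool") in tools:
            result = tools[reply["tool"]](**reply.get("args", {}))
            transcript.append({"role": "tool", "content": result})
        else:
            return reply["final"]            # transcript never escapes
```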


> Why does it matter if that output is stored in the LLM's context

Context window is expensive and precious. Much better to offload to some medium where it isn’t.


Give the model an interpreter like mlua and let it write code to compose MCP calls together. This is a well established method.

It's the equivalent of calling CLIs in bash, except mlua is a sandboxed runtime while bash is not.


At the level of the agent, it knows nothing about MCP, all it has is a list of tools. It can do anything the tools you give it let it do.

It cannot do "anything" with the tools. Tools are very constrained in that the agent must insert the tool call into its context, and it can only receive the tool's response directly back into its context.

Tools themselves also cannot be composed in any SOTA models. Composition is not a feature the tool schema supports and they are not trained on it.

Models obviously understand the general concept of function composition, but we don't currently provide the environments in which this is actually possible, outside of highly generic tools like Bash or sandboxed execution environments like https://agenttoolprotocol.com/


They can already do this, no? MCPs regularly dump their results to a textfile and other tools (cli or otherwise) filter it.

At that point might as well just use CLI

I totally agree that MCP not being composable is a very big issue.


But in the context of this discussion, Atlassian has a CLI tool, acli. I'm not quite following why that wouldn't have worked here. As a normal CLI you have all the power you need over it, and the LLM could have used it to fetch all the relevant pages and save to disk, sample a couple to determine the regular format, and then write a script to extract out what they needed, right? Maybe I don't understand the use case you're describing.

Not all agents are running in your CLI or even in any CLI, which is why people are arguing past each other all over the topic of MCP.

I implemented this in an agent which runs in the browser (in our internal equivalent of ChatGPT or Claude's web UI), connecting directly to Atlassian MCP.


Hmm, but you can't write a standard MCP (e.g. batch_tool_call) that calls other MCPs because the protocol doesn't give you a way to know what other MCPs are loaded in the runtime with you or any means to call them? Or have I got that wrong?

So I guess you had to modify the agent harness to do this? or I guess you could use... mcp-cli ... ??


I don't maintain this anymore but I experimented with this a while back: https://github.com/jx-codes/lootbox

Essentially you give the agent a way to run code that calls MCP servers, then it can use them like any other API.

Nowadays small bash/bun scripts and an MCP gateway proxy gets me the same exact thing.

So yeah at some level you do have to build out your own custom functionality.


MCP is less discoverable than a CLI. You can have detailed, progressive disclosure for a CLI via --help and subcommands.

MCPs needs to be wrapped to be composed.

MCPs needs to implement stateful behavior, shell + cli gives it to you for free.

MCP isn't great, the main value of it is that it's got uptake, it's structured and it's "for agents." You can wrap/introspect MCP to do lots of neat things.


"MCP is less discoverable than a CLI" -> not true anymore with Tool_search. The progressive discovery and context bloat issues of MCP were an MCP client implementation problem, not an MCP problem.

"MCPs needs to be wrapped to be composed." -> Also not true anymore, Claude Code or Cowork can chain MCP calls, and any agent using bash can also do it with mcpc

"MCPs needs to implement stateful behavior, shell + cli gives it to you for free." -> having a shell+CLI running seems like a lot more work than adding a sessionId into an MCP server. And OAuth is a lot simpler to implement with MCP than with a CLI.

MCP's biggest value today is that it's very easy to use for non-tech users. And a lot of developers seem to forget that most people are not tech and CLI power users.


Just to poke some holes in this in a friendly way:

* What algorithm does tool_search use?

* Can tool_search search subcommands only?

* What's your argument for a harness having a hacked in bash wrapper nestled into the MCP to handle composition being a better idea than just using a CLI?

* Shell + CLI gives you basically infinite workflow possibilities via composition. Given the prior point, perhaps you could get a lot of that with hacked-in MCP composition, but given the training data, I'll take an agent's ability to write bash scripts over their ability to compose MCPs by far.


"MCP is less discoverable than a CLI" - that doesn't make any sense in terms of agent context. Once an MCP is connected the agent should have full understanding of the tools and their use, before even attempting to use them. In order for the agent to even know about a CLI you need to guide the agent towards it - manually, every single session, or through a "skill" injection - and it needs to run the CLI commands to check them.

"MCPs needs to implement stateful behavior" - also doesn't make any sense. Why would an MCP need to implement stateful behavior? It is essentially just an API for agents to use.


If you have an API with thousands of endpoints, that MCP description is going to totally rot your context and make your model dumb, and there's no mechanism for progressive disclosure of parts of the tool's abilities, like there is for CLIs where you can do something like:

tool --help

tool subcommand1 --help

tool subcommand2 --help

man tool | grep "thing I care about"

As for stateful behavior, say you have the Google Docs or email MCP. You want to search org-wide for docs or emails that match some filter, make a data set, then do analysis. To do this with MCP, the model has to write the files manually after reading however many KB of input from the MCP. With a CLI it's just "tool >> starting_data_set.csv"


This is a design problem, and not something necessarily solved by CLI --help commands.

You can implement progressive disclosure in MCP as well by implementing those same help commands as tools. The MCP should not be providing thousands of tools, but the minimum set of tools to help the AI use the service. If your service is small, you can probably distill the entire API into MCP tools. If you're AWS then you provide tools that then document the API progressively.

Technically, you could have an AWS MCP provide one tool that guides the AI on how to use specific AWS services through search/keywords and some kind of cursor logic.

The entire point of MCP is inherent knowledge of a tool for agentic use.
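A toy version of that single-tool progressive-disclosure idea (the catalog entries, tool name, and cursor convention are all invented for illustration):

```python
# Sketch: rather than registering thousands of endpoints up front,
# expose one search tool that returns matching docs a page at a time.
# The agent only pays context for what it actually asks about.

CATALOG = [
    {"name": "s3_list_buckets",    "doc": "List all S3 buckets."},
    {"name": "s3_put_object",      "doc": "Upload an object to S3."},
    {"name": "ec2_run_instances",  "doc": "Launch EC2 instances."},
    {"name": "ec2_stop_instances", "doc": "Stop EC2 instances."},
]

def search_tools(keyword, cursor=0, page_size=2):
    """Return matching tool docs plus a cursor for the next page."""
    hits = [t for t in CATALOG
            if keyword in t["name"] or keyword in t["doc"].lower()]
    page = hits[cursor:cursor + page_size]
    next_cursor = cursor + page_size if cursor + page_size < len(hits) else None
    return {"results": page, "next_cursor": next_cursor}
```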


> like there is for CLIs where you can do something like

Well, these will fail for a large number of CLI tools. Any and all combinations of the following are possible, and not all of them will be available, or work at all:

    tool                    some tools may output usage when no arguments are supplied
    tool -h                 some tools may have a short switch for help
    tool --help             some tools may have a long switch for help
    tool help               some tools may have help as a subcommand
    tool command            some tools may output usage for a command with no arguments
    tool command -h         some tools may have a short switch for command help
    tool command --help     some tools may have a long switch for command help
    tool help command       some tools may have a help command
    man tool                some tools may have man pages
    
examples:

    grep                    one-line usage and "type grep --help"
    grep -h                 one-line usage and "type grep --help"
    grep --help             extended usage docs
    man grep                very extended usage docs


    python                  starts interactive python shell
    python -h
    python --help           equivalent help output


    ps                      short list of processes
    ps -h                   longer list of processes
    ps --help               short help saying you can do, for example, `ps --help a`
    ps --help a             gives an extended help, nothing about a

    erl                     
    erl -h
    erl --help              all three start Erlang shell
    man erl                 No manual entry for erl


etc.

Not to say that MCPs are any better. They are written by people, after all. So they are as messy.


> that MCP description is going to totally rot your context and make your model dumb, and there's no mechanism for progressive disclosure of parts of the tool's abilities,

Completely false. I was dealing with this problem recently (a few tools, consuming too many tokens on each request). MCP has a mechanism for dynamically updating the tools (or tool descriptions):

https://code.claude.com/docs/en/mcp#dynamic-tool-updates

We solved it by providing a single, bare bones tool: It provides a very brief description of the types of tools available (1-2 lines). When the LLM executes that tool, all the tools become available. One of the tools is to go back to the "quiet" state.

That first tool consumes only about 60 tokens. As long as the LLM doesn't need the tools, it takes almost no space.

As others have pointed out, there are other solutions (e.g. having all the tools - each with a 1 line description, but having a "help" tool to get the detailed help for any given tool).
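Framework-free, the "quiet state" pattern looks roughly like this (the tool names are invented; a real MCP server would also emit a tool-list-changed notification when the advertised set changes):

```python
# Sketch: the server advertises one ~60-token stub tool by default.
# Calling it swaps in the full catalog; a "collapse" tool swaps back.

FULL_TOOLS = {
    "query_tickets": "Search tickets with a filter expression.",
    "update_ticket": "Modify a ticket's fields.",
    "collapse":      "Hide these tools again to save context.",
}

class ToolRegistry:
    def __init__(self):
        self.expanded = False

    def list_tools(self):
        if not self.expanded:
            # one brief description instead of the whole catalog
            return {"expand": "Ticket tools available; call to load them."}
        return FULL_TOOLS

    def call(self, name):
        if name == "expand":
            self.expanded = True
            return "tools loaded"
        if name == "collapse":
            self.expanded = False
            return "tools hidden"
        return f"ran {name}"
```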


>here's no mechanism for progressive disclosure of parts of the tool's abilities

In fact there is: https://platform.claude.com/docs/en/agents-and-tools/tool-us...

If the special tool search tool is available, then a client would not load the descriptions of the tools in advance, but only for the ones found via the search tool. But it's not widely supported yet.


>man tool | grep "thing I care about"

Isn't the same true of filtering tools available thru mcp?

The mcp argument to me really seems like people arguing about tabs and spaces. It's all whitespace my friends.


Nobody said anything about an API with thousands of endpoints. Does that even exist? I've never seen it. Wouldn't work on it if I had seen it. Such is the life of a strawman argument.

Further, isn't a decorator in Python (like @mcp.tool) the easy way to expose what is needed from an API, even if all we are doing is building a bridge to another API? That becomes a simple abstraction layer, which most people (and LLMs) get.

Writing a CLI for an existing API is a fool's errand.


Cloudflare wrote a blog post about this exact case. The cloud providers and their CLIs are the canonical example, so 100% not a strawman.

> Writing a CLI for an existing API is a fool's errand.

I don't think your opinion is reasonable or well grounded. A CLI app can be anything, including a script that calls curl. With a CLI app you can omit a lot of noise from the context: things like authentication, request and response headers, status codes, response body parsing, etc. You call the tool, you get a response, done. You'd feel foolish wasting tokens parsing irrelevant content that a deterministic script can handle very easily.


>"MCP is less discoverable than a CLI" - that doesn't make any sense in terms of agent context. Once an MCP is connected the agent should have full understanding of the tools and their use, before even attempting to use them. In order for the agent to even know about a CLI you need to guide the agent towards it - manually, every single session, or through a "skill" injection - and it needs to run the CLI commands to check them.

Knowledge about any MCP is not something special inherent in the LLM, it's just an agent side thing. When it comes to the LLM, it's just some text injected to its prompting, just like a CLI would be.


I'm using an MCP to enhance my security posture. I have tools with commands that I explicitly cannot risk the agent executing.

So I run the agent in a VM (it's faster, which I find concerning), and run an MCP on the host that the guest can access, with the MCP also only containing commands that I'm okay with the agent deciding to run.

Despite my previous efforts with skills, I've found agents will still do things like call help on CLIs and find commands that it must never call. By the delights of the way the probabilities are influenced by prompts, explicitly telling it not to run specific commands increases the risk that it will (because any words in the context memory are more likely to be returned).


The way I see it is more like this:

- Skills help the LLM answer the "how" to interact with API/CLIs from your original prompt

- API is what actually sends/receives the interaction/request

- CLI is the actual doing / instruct set of the interaction/request

- MCP helps the LLM understand what is available from the CLI and API

They are all complementary.


There was a great presentation at the MCP Dev Summit last week explaining MCP vs CLI vs Skills vs Code Mode: https://www.figma.com/deck/H6k0YExi7rEmI8E6j6R0th/MCP-Dev-Su...

I think a lot of the MCP arguments conflate MCP the protocol versus how we currently discover and use MCP tool servers. I think there’s a lot of overhead and friction right now with how MCP servers are called and discovered by agents, but there’s no reason why it has to be that way.

Honestly, an agent shouldn’t really care how it’s getting an answer, only that it’s getting an answer to the question it needs answered. If that’s a skill, API call, or MCP tool call, it shouldn’t really matter all that much to the agent. The rest is just how it’s configured for the users.


> MCP is the absolute best and most effective way to integrate external tools into your agent sessions

Nope.

The best way to interact with an external service is an api.

It was the best way before, and it's the best way now.

MCP doesn't scale and it has a bloated unnecessarily complicated spec.

Some MCP servers are good; but in general, a new bad way of interacting with external services is not the best way of doing it, and the assertion that it is, in general, the best is what I refer to as "works for me" Kool-Aid.

…because it probably does work well for you.

…because you are using a few, good, MCP servers.

However, that doesn't scale, for all the reasons listed by the many detractors of MCP.

It's not that it can't be used effectively; it's that, in general, it is a solution that has been incompetently slapped on by many providers who don't appreciate how to do it well, and even then it scales badly.

It is a bad solution for a solved problem.

Agents have made the problem MCP was solving obsolete.


You haven’t actually done that have you. If you did, you would immediately understand the problems MCP solves on top of just trying to use an API directly:

- Easy tool calling for the LLM, rather than having to figure out how to call the API from docs alone.

- Authorization can be handled automatically by MCP clients. How are you going to give a token to your LLM otherwise? And if you do, how do you ensure it does not leak the token? With MCP the token is only usable by the MCP client, and the LLM never needs to see it.

- Lots more things MCP lets you do, like bundling resources and letting the server request out-of-band input from users that the LLM should not see.


> easy tool calling for the LLM rather than having to figure out how to call the API based on docs only

I think the best way to run an agent workflow with custom tools is to use a harness that allows you to just, like, write custom tools. Anthropic expects you to use the Agent SDK with its “in-process MCP server” if you want to register custom tools, which sounds like a huge waste of resources, particularly in workflows involving swarms of agents. This is abstraction for the sake of abstraction (or, rather, market share).

Getting the tool built in the first place is a matter of pointing your agent at the API you’d like to use and just have them write it. It’s an easy one-shot even for small OSS models. And then, you know exactly what that tool does. You don’t have to worry about some update introducing a breaking change in your provider’s MCP service, and you can control every single line of code. Meanwhile, every time you call a tool registered by an MCP server, you’re trusting that it does what it says.

> authorization can be handled automatically by MCP clients. How are you going to give a token to your LLM otherwise??

env vars or a key vault

> And if you do, how do you ensure it does not leak the token?

env vars or a key vault


An authnz aware egress proxy that also puts guard rails on MCP behavior?

Gee, that's starting to sound like a whole "bloated" framework...

Let's say I made a calendar app that stores appointments for you. It's local, installed on your system, and the data is stored in some file in ~/.calendarapp.

Now let's say you want all your Claude Code sessions to use this calendar app so that you can always say something like "ah yes, do I have availability on Saturday for this meeting?" and the AI will look at the schedule to find out.

What's the best way to create this persistent connection to the calendar app? I think it's obviously an MCP server.

In the calendar app I provide a built-in MCP server that gives the following tools to agents: read_calendar, and update_calendar. You open Claude Code and connect to the MCP server, and configure it to connect to the MCP for all sessions - and you're done. You don't have to explain what the calendar app is, when to use it, or how to use it.
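For concreteness, the entire surface such a server needs to wrap could be this small (a toy sketch; the JSON-on-disk storage is an assumption, and a real server would register these two functions as MCP tools with descriptions):

```python
# Sketch: the two functions behind read_calendar / update_calendar.
# Events are kept as {date: [event, ...]} in a JSON file.

import json
from pathlib import Path

DB = Path.home() / ".calendarapp" / "events.json"

def _load():
    return json.loads(DB.read_text()) if DB.exists() else {}

def read_calendar(date):
    """Tool: return the appointments stored for a given date."""
    return _load().get(date, [])

def update_calendar(date, event):
    """Tool: append an event to a date and persist it."""
    events = _load()
    events.setdefault(date, []).append(event)
    DB.parent.mkdir(parents=True, exist_ok=True)
    DB.write_text(json.dumps(events))
    return events[date]
```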

Explain to me a better solution.


Why couldn't the calendar app expose in an API the read_calendar and update_calendar functionalities, and have a skill 'use_calendar' that describes how to use the above?

Then, the minimal skill descriptions are always in the model's context, and whenever you ask it to add something to the calendar, it will know to fetch that skill. It feels very similar to the MCP solution to me, but with potentially less bloat and no obligation to deal with MCP? I might be missing something, though.


Why would I do that if the MCP already handles it? The MCP exposes the API with those tools, it explains what the calendar app is and when to use it.

Connected MCP tools are also always in the model's context, and it works for any AI agent that supports MCP, not just Claude Code.


> The MCP exposes the API with those tools, it explains what the calendar app is

So does an API and a text file (or hell, a self describing api).

Which is more complex and harder to maintain, update and use?

This is a solved problem.

The world doesn't need MCP to reinvent a solution to it.

If we're gonna play the ELI5 game, why does MCP define a UI as part of its spec? Why does it define a bunch of different resource types of which only tools are used by most servers? Why did it not have an auth spec at launch? Why are there so many MCP security concerns?

These are not idle questions.

They are indicative of the “more featurrrrrres” mindset and the lack of competence that went into designing MCP.

Agents running in a sandbox, with normal, standard RBAC-based access control or, for complex operations, standard stateful CLI tooling like the Azure CLI, are fundamentally better.


> So does an API and a text file (or hell, a self describing api).

That sounds great. How about we standardize this idea? We can have an endpoint to tell the agents where to find this text file and API. Perhaps we should be a bit formal and call it a protocol!


> How about we standardize this idea? We can have an endpoint to tell the agents where to find this text file and API

Good news! It's already standardized and agents already know where to find it!

https://code.claude.com/docs/en/skills


How would the AI know about the calendar app unless you make the text file and attach it to the session?

Self-describing APIs require probing through calls, they don't tell you what you need to know before you interact with them.

MCP servers are very simple to implement, and the developers of the app/service maintain the server so you don't have to create or update skills with incomplete understanding of the system.

Your skill file is going to drift from the actual API as the app updates. You're going to have to manage it, instead of the developers of the app. I don't understand what you're even talking about.


[flagged]


You do understand that what it sounds like you're talking about is essentially a proto-MCP implementation right? Except more manual work involved.

This has devolved into "MCP is web scale." https://youtu.be/b2F-DItXtZs

You're clearly very intelligent and a real software engineer, maybe you can explain where I'm wrong?

Sure thing! That probably won't take more than a couple years at 10-20 hours a week of tutelage, and although my usual rate for consulting of any stripe is $150 an hour, for you I'm willing to knock that all the way down to just $150 an hour.

Just give us a taste of what we'd be paying for? I'm sure you're an expert but before I commit to 2+ years of consultation I'd like to see your approach.

I've already pointed this out as the silly, purposeless argument it's become. (Or more become.) Even I at this point can't figure out who is advocating what or why, other than for the obvious ego reasons. You're bikeshedding at each other and wasting all the time and effort it requires, because no one else is enjoying it any more than you two are: if anything you have left your audience more confused than we began, but I see I repeat myself.

Show me you can stop doing that, and I'll happily mediate a technical version of this conversation that proceeds respectfully from the two of you each making a clear and concise statement of your design thesis, and what you see as its primary pros and cons.

For that I'll take a flat $150 for up to 4 hours. I usually bill by the 15-minute increment, but obviously we would dispense with that here, and ordinarily I would not, of course, offer such a remarkable discount. But it doesn't really take $150 worth of effort to remind someone that he should take better care to distinguish his engineering judgment and his outraged insecurity.


I don't get it, you joined this thread to call me an idiot with a meme, and now you're talking about being a neutral arbiter for a technical discussion that I supposedly ruined.

More than anything I'm getting frustrated with HN discussions because people just insinuate that I'm stupid instead of making substantive arguments reasoning how what I'm saying is wrong.

Are we performing for an audience or having a discussion?


I can't make heads nor tails of anyone's position in this mess, precisely because of its devolution into everyone yelling at one another. Yours happened to be the tail comment on this branch at the time I posted. Don't take it more personally than it was meant.

I understand why this website doesn't have DMs except among YC founders. But if it were otherwise, I'd have DMed you instead of posting that first comment publicly. The criticism I remain convinced has merit, but such things are better done in private. If I chose to make an example out of you over the other guy, it was because you looked like offering a better chance than he of redirecting this into the kind of discussion from which someone could conceivably learn something.


Why would you put a second, jankier API in front of your API when you could just use the API?

You realize you can just create your own tools and wire them up directly using the Anthropic or OpenAI APIs etc?

It's not a choice between Skills or MCP, you can also just create your own tools, in whatever language you want, and then send in the tool info to the model. The wiring is trivial.
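As a sketch of how small that wiring can be (Python here for brevity; the tool name, schema, and implementation are illustrative, though the tools/tool_use/tool_result shapes follow the Anthropic Messages API):

```python
import json

# A tool definition in the Messages API "tools" format, plus the local
# dispatch your harness runs when the model emits a tool_use block.
# The tool itself is a plain function -- no server, no protocol layer.
TOOLS = [{
    "name": "read_calendar",
    "description": "Return calendar entries for a given day as JSON.",
    "input_schema": {
        "type": "object",
        "properties": {"date": {"type": "string", "description": "YYYY-MM-DD"}},
        "required": ["date"],
    },
}]

# Local implementations, keyed by tool name.
IMPLS = {
    "read_calendar": lambda date: json.dumps([{"date": date, "event": "standup"}]),
}

def dispatch(tool_use):
    """Execute the requested tool and build the tool_result block that
    gets appended to the conversation on the next API call."""
    output = IMPLS[tool_use["name"]](**tool_use["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": output,
    }
```

Your loop sends TOOLS with each request, and whenever the response contains a tool_use block you call dispatch() and append the tool_result to the next message.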

I write all my own tools bespoke in Rust and send them directly to the Anthropic API. So I have tools for reading my email, my calendar, writing and search files etc. It means I can have super fast tools, reduce context bloat, and keep things simple without needing to go into the whole mess of MCP clients and servers.

And btw, I wrote my own MCP client and server from the spec about a year ago, so I know the MCP spec backwards and forwards, it's mostly jank and not needed. Once I got started just writing my own tools from scratch I realised I would never use MCP again.


Meanwhile, I'm using MCP so the LLM can look up up-to-date documentation rather than hallucinate APIs.

It's like saying it's very safe and nice to drive an F150 with half a ton of water in the truck bed.

How about driving the same truck without that half ton of water?


Hard disagree. APIs and CLIs have been THOROUGHLY documented for human consumption for years, and guess what, the models have that context already. Not only the docs but actual in-the-wild use. If you can hook up auth for an agent, using any random external service is generally accomplished by just saying "hit the api".

I wrap all my APIs in small bash wrappers that are just curl with automatic session handling, so the AI only needs to focus on querying. The only thing in the -h for these scripts is a note that it is a wrapper around curl. I haven't had a single issue with the AI spinning its wheels trying to understand how to hit the downstream system. No context bloat needed, and no reinventing the wheel with MCP when the API already exists.


By wrapping the API with a script and feeding that inventory to the LLM... you reinvented MCP.

Having service providers implement MCP saves everyone from having to do that work themselves.

Plus there are a lot more use cases than developers running agents on their own machines.


Wrapping here is literally just

```

  #!/usr/bin/env bash

  # Fill in the placeholders for your API; keep creds out of the repo.
  creds={path to creds}
  basepath={url basepath}

  # First argument is the endpoint; everything else passes through to curl.
  url=$1; shift

  curl -H "Authorization: $creds" "$basepath/$url" "$@"
```

Just a way to read/set the auth and then call curl. It's generalizable to nearly all APIs out there. It requires no work by the provider, and you can shape it however you need.


> For instance it does not make sense to have an MCP to use git.

What if you don’t want the AI to have any write access for a tool? I think the ability to choose what parts of the tool you expose is the biggest benefit of MCP.

As opposed to a READ_ONLY_TOOL_SKILL.md that states “it’s important that you must not use any edit APIs…”


Just as easy to write a wrapper to the tool you want to restrict. You ban the restricted tool outright, and the skill instructs on usage of the wrapper.

Safer than just giving an instruction to use the tool a specific way.
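As a sketch of that kind of restricting wrapper (the allowed subcommand list and names here are assumptions, pick your own):

```python
import subprocess

# Read-only git subcommands the agent may run; everything else
# (commit, push, reset, ...) is refused before git ever executes.
# Exposed to the agent as a tiny CLI in place of raw git.
READ_ONLY = {"status", "log", "diff", "show", "grep", "ls-files", "blame"}

def run_git_ro(argv):
    """Run `git <argv>` only if the subcommand is read-only.
    Returns (exit_code, output)."""
    if not argv or argv[0] not in READ_ONLY:
        return 1, f"git-ro: refused '{argv[0] if argv else ''}' (not read-only)"
    proc = subprocess.run(["git", *argv], capture_output=True, text=True)
    return proc.returncode, proc.stdout or proc.stderr
```

The write subcommands fail deterministically in the wrapper, rather than depending on the model honoring an instruction.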


Anyone who's ever `DROP TABLE`d on a production rather than test database has encountered the same problem in meatspace.

In this context, the MCP interface acts as a privilege-limiting proxy between the actor (LLM/agent) and the tool, and it's little different from the standard best practice of always using accounts (and API keys) with the minimum set of necessary privileges.

It might be easier in practice to set up an MCP server to do this privilege-limiting than to refactor an API or CLI-tool, but that's more an indictment of the latter than an endorsement of the former.


This is exactly what I do too. Works very well. I have a whole bunch of scripts and CLI tools that Claude can use, most of them built by Claude too. I very rarely need to use my IDE because of this, as I've replicated some of the JetBrains refactorings so Claude doesn't have to burn tokens doing the same work. It also turns a 5-minute Claude session into a 10-second one, as the scripts/tools are purpose-made. It's really cool.

edit: just want to add, I still haven't implemented a single MCP-related thing. Don't see the point at all. REST + Swagger + codegen + Claude + skills/tools works fine enough.


> I've replicated some of Jetbrains refactorings

How? JetBrains in a Java code base is amazing and very thorough on refactors. I can reliably rename, change signatures, move things around, etc.


Typically by deeply misunderstanding what those refactoring tools actually do

What is so magical about it? Most of them are pretty straightforward, with core functionality easy to replicate in 30 minutes or less.

This is a great idea. Did you happen to release the source for this? I run into this all the time!

Nope, I just dump it all in a folder (~/scripts) that claude can read & it picks them up as skills. A good chunk of them are regex based, many are find/replace type tools, some are small code generators & template inflators, some are deployment tools, some are audit tools. I cannot release them at this time, most of them are specific to our company, infra and codebase (main codebase is 1MLoC), sorry about that.

Start with a simple "Let me build a script for Claude that can rename the namespace for all the files in a folder". If you have 100K+ files, the effort is worth it, and your tools start getting chained together too. Make sure each tool has only one purpose for existing and that its output is perfect. Then, when Claude starts chaining them and you see what is possible, the mind opens up even more to possibilities.
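A sketch of what such a single-purpose tool can look like (here for C#-style `namespace` declarations; the file extension and regex are assumptions, adapt to your codebase):

```python
import pathlib
import re

def rename_namespace(root, old, new, ext=".cs"):
    """Rewrite `namespace <old>` declarations in every *<ext> file under
    root. Purely textual and single-purpose -- the kind of tool an agent
    can chain with others. Returns the list of files it changed."""
    changed = []
    for path in sorted(pathlib.Path(root).rglob(f"*{ext}")):
        text = path.read_text()
        updated = re.sub(rf"\bnamespace\s+{re.escape(old)}\b",
                         f"namespace {new}", text)
        if updated != text:
            path.write_text(updated)
            changed.append(str(path))
    return changed
```

Because it is deterministic and cheap, the agent can call it on 100K files without burning a single token per file.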


> MCP adds friction, imagine doing yourself the work using the average MCP server.

Why on earth don't people understand that MCP and skills are complementary concepts, why? If people argue over MCP v. Skills they clearly don't understand either deeply.


I won't be surprised if MCP servers start shipping skills. They already ship prompts and other things exposed as resources. It is not even difficult to do with the current draft, as skills can be exposed by convention without protocol changes.

Future versions of the protocol could easily expose skills so that MCPs can act like hubs.
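That convention could be as thin as scanning a directory and returning an index that the host (or a hub MCP) exposes to the model. A sketch, with the folder layout and field names as assumptions rather than anything from the spec:

```python
import pathlib

def index_skills(root):
    """Build a lightweight index of <root>/<skill-name>/SKILL.md files.
    Assumed convention: the first non-empty line of SKILL.md is a short
    summary the model sees before deciding to load the full file."""
    skills = []
    for path in sorted(pathlib.Path(root).glob("*/SKILL.md")):
        lines = path.read_text().splitlines()
        summary = next((ln.lstrip("# ").strip() for ln in lines if ln.strip()), "")
        skills.append({"name": path.parent.name,
                       "summary": summary,
                       "path": str(path)})
    return skills
```

Whether the resulting index is read from a local folder or served as MCP resources is then an implementation detail of the harness.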



these are prompts - similar yes - but not the same

They're complementary but also have significant overlap. Hence all the confusion and strong opinions.

> clearly don't understand either deeply

No appetite for that. The MCP vs Skills debate has gradually become just a proxy war for the camps of AI skeptics vs AI boosters. Both sides view it as another chance to decide about more magic vs less, in absolute terms, without doing the work of thinking about anything situational. Nuance, questions, reasoning from first principles, focusing on purely engineering considerations is simply not welcome. The extreme factions do tend to agree that it might be a good idea to attack the middle though! There's no changing this stuff, so when it becomes tiresome it's time to just leave the HN comment section.


The more things change in tech, the more they stay the same.

The shoe is the sign. Let us follow His example!

Cast off the shoes! Follow the Gourd!


This is my life motto. Progressive exploration, codifying, use your codified workflows.

> for each desired change, make the change easy (warning: this may be hard), then make the easy change - Kent Beck

https://x.com/KentBeck/status/250733358307500032


Feels to me like the toolchain for using LLMs in various tasks is still in flux (I interpret all of this as "stuff in different places like .md files or skills or elsewhere that is appended to the context window" (I hope that is correct)). Shouldn't this overall process be standardized/automated? That is, use some self-reflection to figure out patterns that are then dumped into the optimal place, like a .md file or a skill?

The entire tooling ecosystem is in flux.

Looking forward, the future is ad-hoc disposable software that once would take a large team a dozen sprints to release.

Eventually it'll be use case -> spec -> validation -> result.

The TV show Stargate showed controls that calculated and operated the starship so that all the operator had to do was point them toward the destination. The AI/computer/hardware knows how to get to the result, and that result is human-driven.

I have evidence of this at work and in my own life with the key component being the tooling integration.


Too early for standardization. Resist the urge. Let a bunch of ideas flow, then watch the Darwinian process by which the best setup is found. Then standardize.

It makes a lot of sense to use an MCP for git and everything else if you want observability across many users. It gives you a place to shim security controls, monitoring, and alerting into the tool call pipeline.
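As a sketch of such a shim (the log sink and fields are assumptions), a wrapper in the tool-call pipeline might look like:

```python
import functools
import time

AUDIT_LOG = []  # stand-in for your real monitoring/alerting sink

def audited(tool):
    """Wrap a tool function so every call is recorded before it runs --
    one place to add security controls, monitoring, and alerting for
    all users going through the same gateway."""
    @functools.wraps(tool)
    def wrapper(**kwargs):
        AUDIT_LOG.append({"tool": tool.__name__,
                          "args": kwargs,
                          "ts": time.time()})
        return tool(**kwargs)
    return wrapper

@audited
def git_log(limit=10):
    # Illustrative tool body; a real one would shell out to git.
    return f"last {limit} commits"
```

The model never sees the shim; it just calls the tool, and the gateway gets its observability.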

Although the author is coming from a place of security and configuration being painful with Skills, I think the future will be a mix of MCP, Agents and Skills. Maybe even a more granular defined unit below a skill - a command...

These commands would be well defined and standardised, maybe with a hashed value that could be used to ensure re-usability (think Docker layers).

Then I just have skills called:

- github-review-slim:latest
- github-review-security:8.0.2

MCPs will still be relevant for those tricky monolithic services or weird business processes that aren't logged or recorded on metrics.


Commands are already a thing, but they're falling out of favor because a user can just invoke a skill manually instead.

> Focus on what tool the LLM requires to do its work in the best way.

I completely agree with you. There was a recent finding that said Agents.md outperforms skills. I'm old school and I actually see best results by just directly feeding everything into the prompt context itself.

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...


How do you shut off particular api calls with an agents.md?

I personally use tool calling for APIs, so really not sure (I don't use agents.md per se, I directly stuff info into the context window)

This is covered well in the article too. See "The Right Tool for the Job" and "Connectors vs. Manuals."

Perhaps the title is just clickbait. :)


> Don't focus on what you prefer: it does not matter. Focus on what tool the LLM requires to do its work in the best way.

I noticed that LLMs will tend to work by default with CLIs even if there's a connected MCP, likely because a) there's an overexposure of CLIs in training data b) because they are better composable and inspectable by design so a better choice in their tool selection.


If your LLM sees any difference between a local skill and a remote MCP, that's a leak in your abstraction and a shortcoming of the agent harness, and it should not influence how we build these systems for devs and end users. The way this comment thinks about building for agents would lead to a hellscape.

Do you know who you're responding to?

> a difference between local skill and remote MCP

A local skill is a text file with a bunch of explanations of what to do and how, and what pitfalls to avoid. An MCP is a connection to an API that can perform actions on anything. This is a pretty massive difference in terms of concept and I don't think it can be abstracted away. A skill may require an MCP be available to it, for instance, if it's written that way.

Antirez' advice is what I've been doing for a year: use AI to write proper, domain-specific tools that you and it can then use to do more impressive things.


Don't think it's relevant who they are if they give advice based on an outdated understanding of how agent harnesses are built and how to use MCP in an agent harness in the first place. You can serve agent skills via MCP or via text files accessed with local tools; if your harness makes these look different to the LLM in the end, it is just a bad harness. The LLM should just see "ways to discover skills" and then "use the skills". Whether skills come from a folder or from an MCP is a transparent implementation detail. This is more than just theoretical: if abstractions like the way skills are served leak into the context, it will measurably degrade agent performance, more or less severely depending on the model!

What if the MCP needs to actually do something, like make an API call? It's nice sometimes to have those credentials out-of-band from the AI itself so it can't access them and is forced to go through the lens of tooling.

You assume an MCP has to work a certain way, which is not the case. MCP can work however you want; it's just a protocol. The same answer applies to tools as to skills. A tool has to look exactly the same to the LLM no matter whether it's served from a CLI, an MCP, or a framework-level JS function. Credentials have to be injected in the gateway in either case.

I've found makefiles to be useful. I have a small skill that guides the LLM towards the makefile. It's been great for what you're talking about, but it's also a great way to make sure the agent is interacting with your system in a way you prefer.

This is how I work with my agent harness. Also have skills for writing tools and skills.

And I still think people don't understand why MCPs are still needed and when to use them.

It's actually pretty simple.


this comment just assumes skills are better without dealing with any of the arguments presented

low quality troll


Very good move. In my experience, for system programming at least, GPT 5.4 xhigh is vastly superior to Claude Opus 4.6 max effort. I ran many brutal tests, including reconstructing for QEMU the SCSI controller (no longer accessible) of a SVSY UNIX of the early 90s used in a 386. Side by side, always re-mirroring the source trees each time one made a breakthrough in the implementation. Well, GPT 5.4 single-handedly did it all, while Opus continued to take wrong paths. The same goes for my Redis bug tracking and development. But $200 is too much for many people (right now, at least: the reality is that if frontier LLMs are not democratized, we will end up paying like a house rent to a few providers), and also, while GPT 5.4 is much stronger, it is slower and less sharp when the thing to do is simple, so many people went for Claude (also because of better marketing and ethical concerns, even if my POV is different on that side: both companies sell LLM models with similar capabilities and similar internal IP protection and so forth; to me they look very similar in practical terms). This will surely change things, and many people will end up with a Claude 5x account + a Codex 5x account, I bet.

GPT 5.4 is the surly physics PhD post-doc who slowly and angrily sits in a basement to write brilliant, undocumented, uncommented code that encapsulates a breakthrough algorithm.

Opus 4.6 is the L5 new hire SWE keen to prove their chops and quickly turn out totally reasonable code with putatively defensible reasons for doing it that way (that are sometimes tragically wrong) and then catch an after-work yoga class with you.


Who replies to you with fucking emoji brainrot

You are absolutely right!

You can tell it to be no nonsense

> and then catch an after-work yoga class with you.

That's cute, but do you mean something concrete with this, i.e., are there non-coding prompts you use it for that you're referring to, or is it simply a throwaway line about L5 SWEs (at a FAANG)?

(FWIW, I find myself using ChatGPT rather than Claude for non-coding prompts for some reason, like random questions such as whether oil is fungible.)


It’s an analogy about the “personalities” of the models.

They are saying that Claude is more of a team player and conformist. It isn’t really much deeper than that.


I think the point they are trying to make is the golden retriever vibe/energy you get from Claude gives "after work yoga."

GPT is also cautious and defensive, but Opus is agreeable.

Thanks for confirming my impressions, it's been like 4 months now that I've arrived at the same conclusions. GPT models are just better at any kind of low-level work: reverse engineering including understanding what the decompiled code/assembly does, renaming that decompiled code (functions/types), any kind of C/C++, way more reliable security research (Opus will find way more, but most will turn out to be false positives). I've had GPT create non-trivial custom decompilers for me for binaries built with specific compilers (it's a much simpler task than what IDA Pro/Ghidra are doing but still complex), and modify existing Java decompilers.

Regarding speed, I don't use xhigh that often, and surprisingly for me GPT 5.4 high is faster than Claude 4.6 Opus high (unless you enable fast mode for Opus).

Of course I still use Opus for frontend, for some small scripts, and for criticizing GPT's code style, especially in Python (getattr).


In the SCSI controller work I mentioned, a very big part of the work was indeed reasoning about assembly code and how IRQs and completion of DMAs worked and so forth. Opus, even though TOOLS.md had the disassembler and it was asked to use it many times, didn't even bother much. GPT 5.4 instead did a very great reverse engineering job, and it was also a lot more responsive to my high-level suggestions, like: work in that way to make more isolated progress, and so forth.

GPT 5.4 is remarkably good at figuring out machine code using just binutils. Amusingly, I watched it start downloading ghidra, observe that the download was taking a while, and then mostly succeed at its assignment with objdump :)

Codex also gives you a lot more usage for $20/mo than Claude, so there's not also that fear that high or xhigh reasoning will eat up all your quota. It really comes down to whether you want to try to save some time or not. (I default to xhigh because it's still fast enough for me.)

+1 to this, I've found GPT/Codex models consistently stronger in engineering tasks (such as debugging complex, cross-systems issues, concurrency problems, etc).

I use both OpenAI and Anthropic models, though for different purposes, what surprises me is how underrated GPT still feels (or, alternatively, how overhyped Anthropic models can be) given how capable it is in these scenarios. There also seems to be relatively little recognition of this in the broader community (like your recent YouTube video). My guess is that demand skews toward general codegen rather than the kind of deep debugging and systems work where these differences really show.


It's surprising to me how much LLM "personality" seems to matter to people, more than actual capability.

I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.

And I don't do hard-tech things! I've chosen a b2b field where I can provide competent products for a niche that is underserved and where long term relationships matter, simply because I'm not some brilliant engineer who can completely reinvent how something is done. I'm not writing kernels or complex ML stacks. So I don't really understand what everyone is building where they don't see the limits of Opus. Maybe small greenfield projects with few users.


> I'm not some brilliant engineer who can completely reinvent how something is done

With an honest evaluation of your own capabilities you are already far above average. Also, it's hard to see the insane amount of work that was often necessary to invent the brilliant stuff, and most people cannot shit that out consistently.


> It's surprising to me how much LLM "personality" seems to matter to people, more than actual capability.

> I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.

Aren't you saying here that the LLM personality matters to you, too? Being critical of you is a personality attribute, not a capabilities one.


Not necessarily. Criticism is the analysis, evaluation, or judgment of the qualities of something. This is a matter of intellectual act. However, you could say that being habitually critical can be partly a result of "personality" or temperament.

(Of course, strictly speaking, LLMs have neither temperament, "personality", nor intellect, but we understand these terms are used in an analogical or figurative fashion.)


Or rather, it’s hard to ask everyone to side-by-side compare both products on their use cases. So the choice really comes down to word-of-mouth even though their use cases may be better served by Codex.

I use Codex for cleaning up after Claude, and it always finds so many bugs, some of them quite obvious.

My non-scientific testing has been that GPT models follow prompts literally. Every time I give it an example, it uses the example in a literal sense instead of using it to enhance its understanding of the ask. This is a good thing if I want it to follow instructions, but bad if I want it to be creative. I have to tell it that the examples I gave are just examples and not to be used in output. I feel comfortable using it when I have everything mapped out.

Claude on the other hand can be creative. It understands that examples are for reference purposes only. But there are times it decides to go off on a tangent on its own and not follow instructions closely. I find it useful for bouncing around ideas or testing something new.

The other thing I notice is Claude has slightly better UI design sensibilities even if you don’t give instructions. GPT on the other hand needs instructions otherwise every UI element will be so huge you need to double scroll to find buttons.


This is also what I noticed.

GPT doesn't know how to get creative, you need to tell it exactly what to do and what code you want it to write.

For Claude you can be more general and it will look up solutions for you outside of the scope you gave it.

I personally prefer Claude.


I think you might benefit from the "superpower" plugin. Add the word "brainstorm" before your prompt and it does a little bit better at figuring out how you want things.

What I like most about gpt coding models is how predictable of a lever that thinking effort is.

Xhigh will gather all the necessary context. low gathers the minimum necessary context.

That doesn't work as well for me with Opus. Even at max effort it'll overlook files necessary to understanding implementations. It's really annoying when you point that out and get hit with a "you're absolutely right".

Codex isn’t the greatest one shot horse in the race but, once you figure out how to harness it, it’s hard to go back to other models.


GPT5.4 with any effort level is scary when you combine it with tricks like symbolic recursion. I actually had to reduce the effort level to get the model to stop trying to one shot everything. I struggled to come up with BS test cases it couldn't dunk in some clever way. Turning down the reasoning effort made it explore the space better.

can you explain what you mean by symbolic recursion tricks in this context?

The model can call a copy of itself as a tool (i.e., we maintain actual stack frames in the hosting layer). Explicit tools are made available: Call(prompt) & Return(result).

The user's conversation happens at level 0. Any actual tool use is only permitted at stack depths > 0. When the model calls the Return tool at stack depth 0 we end that logical turn of conversation and the argument to the tool is presented to the user. The user can then continue the conversation if desired with all prior top level conversation available in-scope.

It's effectively the exact same experience as ChatGPT, but each time the user types a message an entire depth-first search process kicks off that can take several minutes to complete each time.


How is this different from a standard tool-call agentic loop, or subagents?

Each stack frame has its own isolated context. This pushes the token pressure down the stack. The top level conversation can go on for days in this arrangement. There is no need for summarization or other tricks.
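To make the mechanics concrete, here is a toy sketch of that host-side stack (the `model` argument is a deterministic stub standing in for a real LLM turn; in the real setup, actual tool use would only be permitted at depth > 0):

```python
def recursive_agent(model, prompt, depth=0, max_depth=8):
    """Host-side 'symbolic recursion': each Call() runs in a fresh frame
    with its own isolated context list, and only the Return() value
    crosses back into the parent frame."""
    context = [prompt]  # isolated per-frame context
    while True:
        action, payload = model(context, depth)
        if action == "call":
            if depth >= max_depth:
                result = "error: max depth reached"
            else:
                result = recursive_agent(model, payload, depth + 1, max_depth)
            context.append(result)  # only the result is visible up here
        elif action == "return":
            return payload

def stub_model(context, depth):
    """Stand-in for an LLM turn: the top frame delegates once,
    child frames answer immediately."""
    if depth == 0 and len(context) == 1:
        return ("call", "subtask: " + context[0])
    if depth > 0:
        return ("return", "done " + context[0])
    return ("return", context[-1])
```

Because child contexts are garbage-collected when their frame returns, token pressure stays in the leaves and the level-0 conversation can keep growing.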

Is this related to the paper on Recursive Language Models? I remember it mentioned something similar about "symbolic recursion", but the way you describe it makes it sound too simple, why is there an entire paper about it?

The RLM paper did inspire me to try it. This is where the term comes from. "Symbolic" should be taken to mean "deterministic" or "out of band" in this context. A lot of other recursive LLM schemes rely on the recursion being in the token stream (i.e.. "make believe you have a call stack and work through this problem recursively"). Clearly this pales in comparison to actual recursion with a real stack.

This is just subagents.

Yup, I've mentioned this in another thread: I got GPT 5.4 xhigh to improve the throughput of a very complex, non-typical CUDA kernel by 20x. This was through a combination of architecture changes and then low-level optimizations, and it did the profiling all by itself. I was extremely impressed.

Do you mean the non-codex model? Are people preferring normal GPT over codex?

I was using codex cli with 5.4xhigh. So it was able to iteratively improve from simple prompts on my part (can you give some architectural ideas to improve the performance? And once it does, I just say can you implement and benchmark it).

I think it was a bit like Karpathy's autoresearch, except I was doing manual prompting... though I feel I could definitely be removed from that equation.


> right now, at least: the reality is that if frontier LLMs are not democratized, we will end paying like a house rent to a few providers

This part of your comment has slipped through but is very worrying for me. I _think_ we're passing the point now where programmers are accepting that LLMs writing code are the real deal. Lots of antagonism along the way, but the reality is these things are good, and getting better all the time.

What this means in reality, in my opinion, is that if you're an independent programmer, or smaller company trying to compete with others to earn a living, you're almost certainly going to have to use coding agents, which means your competitiveness in the market is going to be gated by the big model providers until we have more options. If you somehow get banned from a few of them, which seems like it can happen through no fault of your own, you're going to be seriously negatively impacted.

That's quite worrying having gatekeepers to our industry where it was previously in our own hands.


I use Claude Code / Anthropic models but...

> I ran many brutal tests, including reconstructing for QEMU the SCSI controller (not longer accessible) of a SVSY UNIX of the early 90s used in a 386.

QEMU is one project that, for a variety of reasons, said that at the moment they simply refuse any code written by an LLM. Is this just a test? Or just for you? Or do you think QEMU should accept that patch?


Really great to see this whole thread after so many questioning looks from people on why I use codex instead of Claude which generally doesn't work for me.

I never thought it was about particular usefulness for low level vs high level but it tracks with my general low level work.


1000%. I have been running claude's work through codex for about a week now and it's insane the number of mistakes it catches. Not really sure why I've been doing this, just interesting to watch I guess.

Not to mention a billion times more usage than you get with claude, dollar for dollar.


Funny, I've been doing the same thing. I've also been giving them both the same task and seeing who does a better job.

I think it's all of this controversy around usage limits and model nerfing that made me start doing this.

In the end though, I _much_ prefer working with claude because it understands the task at hand so much better and I feel like I understand the results better. It's just that codex is doing a better job at the actual coding lately.


It's widely reported that opus has been greatly reduced for a number of weeks since Mythos was released internally

The $100/mo giving access to GPT Pro (with reduced usage) is a nice counter to the just teased Claude Mythos. But GPT 5.4 xhigh being able to perform that kind of low-level reconstruction task is very impressive already.


The price change is for ChatGPT, not Codex; you may be mixing them up. Codex (for coding) remains $200.

I just checked the codex pricing page, it's pro 5x for $100, pro 20x for $200. The 20x plan has a codex usage boost until the end of may, whatever that means.

Edit: apparently the usage boost is an additional 2x for both 5x and 20x. So maybe it's time to start watching whichever of these services is currently doing offers like this and switch subscriptions every few months.


I completely agree with you on both the technical and ethical reasoning.

Thank you for speaking out. I think it's important that reputable engineers like you do so. The Claude gang gaslighting is unhinged right now. It would be none of my concern but I have to deal with it in the real world - my customers are susceptible to these memes. I'm sure others have to deal with similar IRL consequences, too.

