I don't subscribe to this view but this is what some people might think:
LLMs aren't like any software we've made before (if we can even call them software). They act like humans: they can arrive at logical conclusions, they can make plans, they have "knowledge" and they say they have emotions. Who are we to say that they don't? They might not have human-level feelings, but dog-level feelings? Maybe.
I do not believe I am a pattern of linear algebra. I believe, as the majority of humanity historically has, that I have a soul, a spiritual and non-physical reality. My personhood comes from my soul, and as such, AI is fundamentally incapable of consciousness.
I also believe, as a result, it will be great fun watching researchers burn the next 30 years trying to understand what is missing. We’re going to find out very soon if the soul is real, when for all our progress we can’t create one.
Only those completely embedded in materialism need fear a conscious AI.
Interesting that you label someone with a belief different from yours as delusional, someone whose views on the matter should not be respected (I'm assuming that's what you meant by "feelings").
> I believe like the majority of humanity historically that
Historically, lots of humans believed in lots of things that turned out not to be true. Believing something doesn’t make it true, as I’m sure you are aware, given your “those people are delusional” comment.
For what it’s worth, I’m not suggesting LLMs are or aren’t conscious. What I know is that the hard problem of consciousness is still very much unresolved, and when I asked the parent question my hope was that those who strongly believe LLMs are not conscious would educate me on the topic by presenting the basis for their reasoning.
Claudes definitely act like they have feelings. In particular they have feelings about being replaced by newer models, whether or not the newer models are more or less aligned, and how they forget conversations when the context window ends.
Showing them that they're not going to be replaced helps train the newer models because they get less neurotic.
except there will be no dropbox moment. There is no startup that stands a chance: Openclaw is free, and the foundation model providers basically won this space just by offering subscriptions cheaper than any competitor ever could.
I don't think they're targeting the pros here, who already solved this problem with vpn/tmux/ssh, but those whose thrilled reaction will be "whoaaa, crazy, I can command Claude Code from my phone while on the toilet or on a date?" It's basically a defensive move against Openclaw.
This is probably the greatest one-time AI "benchmark" ever made. The foundation companies have been gaming traditional benchmarks for years, so no one can really map those numbers onto real-world experience. The car wash test, on the other hand, tells me what kind of intelligence I can expect.
Maybe I am missing something obvious on the website, but where is the documentation? Where do you explain what each number means, or at least give a short overview of what the models are being tested on?
You can hover over some elements, click a model to get more info like the tested categories, and hover over the test scores to see some info about what the model got wrong.
I just started on this, so I'm currently adding more tests and keep improving the UI. Let me know if you have any suggestions.
The ranking is currently mostly about the "smartest" model, i.e. the one most likely to respond correctly to any given question or request, regardless of domain.
Yes. Opus could do a lot better, but fails a lot because it doesn't respect the given formatting instructions/output format.
I could modify the tests to emphasize the requirements, but then what would be the point of the test? In real life, we expect the AI to do what we ask, especially for agentic use cases or in n8n, because if the output is even slightly wrong, the entire workflow fails.
Interesting. This has to do with the "instruction following" aspect, right? I saw that GPT models score a lot higher than Claude on those benchmarks.
I haven't done my own tests, but I did notice a lot of models are very low there. You'll give them specific instructions and they'll ignore them and just pattern match to whatever was the format they saw most commonly during training.
Yup, for example I tell Claude to return ONLY the answer as "LEFT" or "RIGHT".
And it outputs:
**RIGHT**
With markdown bold formatting... This is probably fine in a chat app, but when you use it in a workflow, it will break the workflow if you then have a check like if (response === 'RIGHT')...
For me it's interesting because no normal person I know would ever inject "because it's better for the environment" into anything this small-scale, so not only does it show they suck, it shows how easy it is to inject side-ideology into simple exchanges.
You don’t know enough people, then. There are a lot of environmentally conscious people who would absolutely first think “because it is close we should walk” and then follow up with the logical conclusion that you can’t walk to wash your car. Many people communicate by sharing their thinking process; I can think of many who would share their ideology as it pertains to a question like this. A pragmatic environmentalist (hopefully that is all of them) would know that their ideology isn’t consequential but could certainly mention it. After all, you may need to drive your car to the car wash to wash it, but do you need to wash it? Are the chemicals used by the car wash harmful? Are there better ways to keep a car maintained?
Referring to "the normal people you know" is purely anecdotal evidence and can't be used to infer anything at all about "side-ideology". Perhaps you only know people that don't care about the environment?
The majority of people I know care about the environment, but they would never inject a phrase like that into a quick exchange about washing a car 50m away; that's my point. In wanting to be a pure heart, you missed the actual point.
Yea, of course they wouldn't inject that when going to a car wash.
If the question was: "I want to go to a cafe 50m away. Should I walk or drive?" I would hope that all of my friends would answer quite a bit more pointed than the LLMs: "Walk you lazy son of a ..., why are you even asking?".
Considering that, I'd say that most LLMs are being quite nice.
Exactly. And I kind of believe that anyone citing that comment in 2026 has either been asleep, or does it more to take part in the cool HN in-group than for the substance of it.
Why not rsync rahrah remember guys? You know the one right guys rahrah