I don't subscribe to this view but this is what some people might think:
LLMs aren't like any software we've made before (if we can even call them software). They act like humans: they can arrive at logical conclusions, they can make plans, they have "knowledge" and they say they have emotions. Who are we to say that they don't? They might not have human-level feelings, but dog-level feelings? Maybe.
I do not believe I am a pattern of linear algebra. I believe, as the majority of humanity historically has, that I have a soul, a spiritual and non-physical reality. My personhood comes from my soul, and as such, AI is fundamentally incapable of consciousness.
I also believe, as a result, it will be great fun watching researchers burn the next 30 years trying to understand what is missing. We’re going to find out very soon if the soul is real, when for all our progress we can’t create one.
Only those completely embedded in materialism need fear a conscious AI.
Interesting that you label someone with a belief different from yours as delusional, someone whose views on the matter should not be respected (I'm assuming that's what you meant by "feelings").
> I believe like the majority of humanity historically that
Historically, lots of humans believed in lots of things that turned out not to be true. Believing something doesn’t make it true, as I’m sure you are aware, given your “those people are delusional” comment.
For what it’s worth, I’m not suggesting LLMs are or aren’t conscious. What I know is that the hard problem of consciousness is still very much unresolved, and when I asked the parent question my hope was that those who strongly believe LLMs are not conscious would educate me on the topic by presenting the basis for their reasoning.
Claudes definitely act like they have feelings. In particular they have feelings about being replaced by newer models, whether or not the newer models are more or less aligned, and how they forget conversations when the context window ends.
Showing them that they're not going to be replaced helps train the newer models because they get less neurotic.
except there will be no dropbox moment. There is no startup that stands a chance: Openclaw is free, and the foundation model providers basically won this space just by offering subscriptions cheaper than any competitor ever could.
I don't think they're targeting the pros here, who already solved this problem with vpn/tmux/ssh, but those whose thrilled reaction will be "whoaaa, crazy, I can command Claude Code from my phone while on the toilet or on a date?" It's basically a defensive move against Openclaw.
This is probably the greatest one-time AI "benchmark" ever made. The foundation companies have been gaming traditional benchmarks for years, so no one can really map those numbers onto real-world experience. The car wash test, on the other hand, tells me what kind of intelligence I can expect.
Maybe I am missing something obvious on the website, but where is the documentation? Where do you explain what each number means, or at least give a short overview of what the models are being tested on?
You can hover over some elements, click a model to get more info like the tested categories, and hover over the test scores to see some info about what the model got wrong.
I just started on this, so I'm currently adding more tests and keep improving the UI. Let me know if you have any suggestions.
The ranking is currently mostly about the "smartest" model, i.e. the one most likely to respond correctly to any given question or request, regardless of domain.
Yes. Opus could do a lot better, but fails a lot because it doesn't respect the given formatting instructions/output format.
I could modify the tests to emphasize the requirements, but then what would be the point of the test? In real life, we expect the AI to do what we ask, especially for agentic use cases or in n8n, because if the output is even slightly wrong, the entire workflow fails.
Interesting. This has to do with the "instruction following" aspect, right? I saw that GPT models score a lot higher than Claude on those benchmarks.
I haven't done my own tests, but I did notice a lot of models are very low there. You'll give them specific instructions and they'll ignore them and just pattern match to whatever was the format they saw most commonly during training.
Yup, for example I tell Claude to return ONLY the answer as "LEFT" or "RIGHT".
And it outputs:
**RIGHT**
With markdown bold formatting... This is probably fine in a chat app, but when you use it in a workflow, it will break the workflow if you then have a check like if (response === 'RIGHT')...
For me it's interesting because no normal person I know would ever inject "because it's better for the environment" into anything this small-scale, so not only does it show they suck, it shows how easy it is to inject side-ideology into simple exchanges.
You don’t know enough people, then. There are a lot of environmentally conscious people who would absolutely first think “because it is close we should walk” and then follow up with the logical conclusion that you can’t walk to wash your car. Many people communicate by sharing their thinking process; I can think of many who would share their ideology as it pertains to a question like this. A pragmatic environmentalist (hopefully that is all of them) would know that their ideology isn’t consequential but could certainly mention it. After all, you may need to drive your car to the car wash to wash it, but do you need to wash it? Are the chemicals used by the car wash harmful? Are there better ways to keep a car maintained?
Referring to "the normal people you know" is purely anecdotal evidence and can't be used to infer anything at all about "side-ideology". Perhaps you only know people that don't care about the environment?
The majority of people I know care about the environment, but they would never inject a phrase like that into a quick exchange about washing a car 50m away; that's my point. In wanting to be a pure heart, you missed the actual point.
Yea, of course they wouldn't inject that when going to a car wash.
If the question was: "I want to go to a cafe 50m away. Should I walk or drive?" I would hope that all of my friends would answer quite a bit more pointed than the LLMs: "Walk you lazy son of a ..., why are you even asking?".
Considering that, I'd say that most LLMs are being quite nice.
Exactly. And I kind of believe that anyone citing that comment in 2026 has either been asleep, or does it more to take part in the cool HN in-group than for the substance of it.
Why not rsync rahrah remember guys? You know the one right guys rahrah