phist_mcgee · 2026-03-14T00:52:30 1773449550

Anthropic is famous for changing things under your feet. Claude Code is basically alpha software with a global footprint.

phist_mcgee · 2026-03-01T23:18:44 1772407124

WHOOOOOSH

phist_mcgee · 2026-02-27T22:24:57 1772231097

Jensen Huang? He put GPUs on the internet!

phist_mcgee · 2026-02-27T22:13:55 1772230435

Does anyone see the demand for coding agents that aren't 90% subsidized by the AI company?

shimman · 2026-02-27T22:34:37 1772231677

Or demand that isn't a condition of keeping your job?

phist_mcgee · 2026-02-22T02:39:32 1771727972

What kind of project lead is going to answer for their CEO?

sarchertech · 2026-02-22T03:06:11 1771729571

Not a normal one but also a normal project lead doesn’t get on HN and start publicly answering questions.

If you’re gonna start speaking for and defending your company though and your company CEO has made asinine statements that are related, I’m gonna ask.

phist_mcgee · 2026-02-06T22:43:54 1770417834

Damage control to limit the rush for the exits?

phist_mcgee · 2026-02-04T10:34:39 1770201279

That's not very nice. Be nice.

booleandilemma · 2026-02-04T12:03:11 1770206591

Who are you? The morality police?

phist_mcgee · 2026-01-29T22:57:58 1769727478

Then you'd get people claiming that the benchmarks were 'paid for' by Anthropic.

nikcub · 2026-01-29T23:05:11 1769727911

one thing you learn from being on the internet is that you're never going to satisfy everybody

phist_mcgee · 2026-01-21T02:55:19 1768964119

Oh my god who cares?

phist_mcgee · 2026-01-20T03:35:37 1768880137

I think this says a lot about yourself and where your prejudices and preferences lie.

spmurrayzzz · 2026-01-20T15:10:45 1768921845

Preferences I think I get, but prejudices?

The OED defines prejudice as a "preconceived opinion that is not based on reason or actual experience."

My day to day work involves: full stack web dev, distributed systems, embedded systems, and machine learning. In addition to using AI tooling for dev tasks, we also use agents in production for various workflows and we also train/finetune models (some LLMs, but also other types of neural networks for anomaly detection, fault localization, time series forecasting, etc). I am basing my original commentary in this thread on all of that cumulative experience.

It has been my observation over the last almost 30 years of being a professional SWE that full stack web dev has been much easier and simpler than the other domains I work in. And even further, I find that models are much better at that domain on average than the other domains, measured by pass@k scores on private evals representing each domain. Anecdotal experience also tends to match the evals.
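For anyone unfamiliar with the metric: pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k)/C(n, k), where n samples are drawn per task and c of them pass. Below is a minimal sketch of that standard estimator. It is not necessarily how the private evals mentioned above were scored, and the function name and example numbers are purely illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples generated for a task (assumed setup)
    c: number of those samples that passed the task's tests
    k: sample budget being scored, with 1 <= k <= n
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples for a task, 3 of which passed.
score_at_1 = pass_at_k(n=10, c=3, k=1)
score_at_5 = pass_at_k(n=10, c=3, k=5)
```

A per-domain score would then be the mean of this value over all tasks in that domain.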

This tracks with all the other information we have pertaining to benchmark saturation; the "we need harder evals" crowd has been ringing this bell for the last 8-12 months. Models are getting very good at the less complex tasks.

I don't believe it will remain that way forever, but at present it's far more common to see someone one-shot a full stack web app from a single prompt than something like a kernel driver for a NIC. One class of devs is seeing a massive performance jump, another class is not.

I don't see how that can be perceived as prejudice. It may just be an opinion you don't agree with or an observation that doesn't match your own experience (both of which are totally valid and understandable).