Hacker News | phist_mcgee's comments

Anthropic is famous for changing things under your feet. Claude Code is basically alpha software with a global footprint.

WHOOOOOSH

Jensen Huang? He put GPUs on the internet!


Does anyone see the demand for coding agents that aren't subsidized 90% by the AI company?


Or demand that isn't a condition of keeping your job?


What kind of project lead is going to answer for their CEO?


Not a normal one but also a normal project lead doesn’t get on HN and start publicly answering questions.

If you’re gonna start speaking for and defending your company though and your company CEO has made asinine statements that are related, I’m gonna ask.


Damage control to limit the rush for the exits?


That's not very nice. Be nice.


Who are you? The morality police?


Then you'd get people claiming that the benchmarks were 'paid for' by Anthropic.


one thing you learn from being on the internet is that you're never going to satisfy everybody


Oh my god who cares?


I think this says a lot about yourself and where your prejudices and preferences lie.


Preferences I think I get, but prejudices?

The OED defines prejudice as a "preconceived opinion that is not based on reason or actual experience."

My day to day work involves: full stack web dev, distributed systems, embedded systems, and machine learning. In addition to using AI tooling for dev tasks, we also use agents in production for various workflows and we also train/finetune models (some LLMs, but also other types of neural networks for anomaly detection, fault localization, time series forecasting, etc). I am basing my original commentary in this thread on all of that cumulative experience.

It has been my observation over the last almost 30 years of being a professional SWE that full stack web dev has been much easier and simpler than the other domains I work in. And even further, I find that models are much better at that domain on average than the other domains, measured by pass@k scores on private evals representing each domain. Anecdotal experience also tends to match the evals.
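For anyone unfamiliar with pass@k: the comment doesn't say how the scores are computed, so the following is only a sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. The function names and the (n, c) input layout are assumptions made for illustration, not the commenter's actual eval harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes,
    given that c of n generated samples passed (assumed inputs)."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) counts the k-subsets made up only of failing samples;
    # it is 0 when fewer than k samples failed, which gives a score of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n, c) for one problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Comparing domains would then mean computing `mean_pass_at_k` separately for each domain's eval set, e.g. one list of (n, c) pairs for the web dev tasks and another for the embedded tasks.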

This tracks with all the other information we have pertaining to benchmark saturation; the "we need harder evals" crowd has been ringing this bell for the last 8-12 months. Models are getting very good at the less complex tasks.

I don't believe it will remain that way forever, but at present it's far more common to see someone one-shot a full stack web app from a single prompt than something like a kernel driver for a NIC. One class of devs is seeing a massive performance jump, another class is not.

I don't see how that can be perceived as prejudice. It may just be an opinion you don't agree with or an observation that doesn't match your own experience (both of which are totally valid and understandable).

