Hacker News | svantana's comments

> 80mph to cars that push past 600mph

I have yet to see evidence that this is really the case. Already 15 years ago, people were creating impressive software over the course of a hack day, by gluing open source repos together in a high-level language. Now that process has been sped up even more, but does it matter that much if the prototype takes 4 or 24 hours to make? The real value is in well-thought-out, highly polished apps, and AFAICT those still take person-years to complete.


The REAL speed-up comes to efforts that are already well-designed but require lots of human busy work. I've personally seen multi-day human efforts reduced to a 15-minute session with an LLM. In a way, LLMs are reducing implementation costs to the Kolmogorov complexity -- you can get what you prompt for, but you have to remember to prompt for everything you want to get -- which comes easiest if you already took time to consider the design.


> At some level, the simplest thing to do is to give up and crash if things are no longer sane.

The problem with this attitude (that many of my co-workers espouse) is that it can have serious consequences for both the user and your business.

- The user may have unsaved data
- Your software may gain a reputation of being crash-prone

If a valid alternative is to halt normal operations and present an alert box to the user saying "internal error 573 occurred. please restart the app", then that is much preferred IMO.


> If a valid alternative is to halt normal operations and present an alert box to the user saying "internal error 573 occurred. please restart the app", then that is much preferred IMO.

You can do this in your panic or terminate handler. It's functionally the same error handling strategy, just with a different veneer painted over the top.
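A minimal Python sketch of that "different veneer" idea, using the standard `sys.excepthook` (the function name and message here are illustrative, not from any particular app):

```python
import sys


def friendly_crash(exc_type, exc, tb):
    # Same "stop everything" error-handling strategy, different veneer:
    # instead of dumping a raw traceback at the user, print a restart
    # message. The process still terminates abnormally afterwards, so
    # crash reporters and supervisors continue to notice the failure.
    sys.stderr.write(
        f"Internal error ({exc_type.__name__}): please restart the app.\n"
    )


# Installed as the handler for any otherwise-unhandled exception.
sys.excepthook = friendly_crash
```

The underlying decision ("we cannot continue safely") is identical; only the last thing the user sees changes.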


Crashing is bad, but silently continuing in a corrupt state is much worse. Better to lose the last few hours of the user's work than corrupt their save permanently, for example.

> Your software may gain a reputation of being crash-prone

Hopefully crashing on unexpected state rather than silently running on invalid state leads to more bugs being found and fixed during development and testing and less crash-prone software.
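The fail-fast idea in miniature (a hypothetical editing function, invented for illustration):

```python
def apply_edit(doc: str, offset: int, text: str) -> str:
    # Fail fast on impossible state: raising here loses at most the
    # current operation, whereas silently clamping a bad offset and
    # carrying on could corrupt every save written afterwards.
    if not (0 <= offset <= len(doc)):
        raise ValueError(
            f"edit offset {offset} out of range for doc of length {len(doc)}"
        )
    return doc[:offset] + text + doc[offset:]
```

During development, the loud `ValueError` surfaces the bug immediately; in a shipped build, the same check is what lets a top-level handler stop before the corrupt state reaches disk.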


So you don't get a crash log? No, thanks.

> The user may have unsaved data

That should not need to be a consideration. Crashing should restore the state from just before the crash. This isn't the '90s, users shouldn't have to press "save" constantly to avoid losing data.
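One common way to get that guarantee is periodic autosave with an atomic file swap; a minimal sketch (file names and state shape are made up for the example):

```python
import json
import os


def autosave(state: dict, path: str = "autosave.json") -> None:
    # Write to a temp file first, then atomically swap it into place.
    # A crash mid-write can therefore never leave a half-written
    # (corrupt) save behind: either the old file or the new one exists.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def restore(path: str = "autosave.json") -> dict:
    # On the next launch, resume from the last autosaved state, if any.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}
```

Call `autosave` on a timer or after every mutation, and a crash costs the user seconds of work, not hours.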


That's probably because "Gemini 3.5 Pro" doesn't exist


The silly verbiage can be excused but not the graphs with completely unlabeled data points, IMO.


Yep that's what I mean - looks like AI slop to me.


That's mentioned in the article, but is the lock-in really that big? In some cases, it's as easy as changing the backend of your high-level ML library.
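Keras 3, for instance, runs the same model code on TensorFlow, JAX, or PyTorch; switching is one environment variable set before import (sketch only, `keras` itself not imported here):

```python
import os

# Keras 3 reads its compute backend from this variable at import time;
# valid values include "tensorflow", "jax", and "torch". It must be set
# before `import keras`.
os.environ["KERAS_BACKEND"] = "jax"

# import keras  # model-building code below this line is unchanged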


That is like how every ORM promises you can just swap out the storage layer.

In practice it doesn't quite work out that way.


That's what it is on paper. But in practice you trade one set of hardware idiosyncrasies for another and unless you have the right people to deal with that, it's a hassle.


On top of that, when you get locked into Google Cloud, you're effectively at the mercy of their engineers to optimize and troubleshoot. Do you think Google will help their potential competitors before they help themselves? Highly unlikely, considering their actions over the past decade plus.


Given my Fitbit's inability to play nice with my pixel phone, I have zero faith in Google engineers.

What else would one expect, given their core value of hiring generalists over specialists* and their lousy retention record?

*Pay no attention to the specialists they acquihire and pay top dollar... And even they don't stick around.


I think you can only run on Google Cloud, not AWS, bare metal, Azure, etc.


According to that site, there were more tech layoffs in 2022 than in 2024 or 2025. Doesn't that speak against the "AI is taking tech jobs" hypothesis?


Massive, embarrassingly shortsighted overhiring in 2020 and 2021 seems like the more likely culprit.


I agree, I think AI taking jobs is all smoke and mirrors by companies trying to gas up their stock prices


Doesn't work in this case because the 'talk' (github PR comments) is also computer generated. But in person (i.e. at work) it's a good strategy


SWEBench-Verified is probably benchmaxxed at this stage. Claude isn't even the top performer, that honor goes to Doubao [1].

Also, the confidence interval for such a small dataset is about 3 percentage points, so these differences could just be down to chance.

[1] https://www.swebench.com/
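That ~3-point figure can be sanity-checked with the normal approximation to the binomial, assuming SWE-bench Verified's ~500 tasks and a score around 80% (both round numbers, not exact):

```python
import math

# 95% normal-approximation confidence-interval half-width for a
# pass rate p measured on n benchmark tasks.
n, p = 500, 0.80
half_width = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"+/- {100 * half_width:.1f} percentage points")  # +/- 3.5
```

So two models a couple of points apart on the leaderboard are within each other's error bars.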


Claude 4.5 gets 82% on their own highly customized scaffolding (parallel compute with a scoring function). That beats Doubao.


Grok got to hold the top spot of LMArena-text for all of ~24 hours, good for them [1]. With style control enabled, that is. Without style control, Gemini held the fort.

[1] https://lmarena.ai/leaderboard/text


Is it just me or is that link broken because of the cloudflare outage?

Edit: nvm it looks to be up for me again


Grok is heavily censored though


Is it censored... or just biased towards edge-lord MechaHitler nonsense whenever Musk feels like tinkering with the system prompt?

