Seems to be the first model that one-shots my secret benchmark about nested SQLite, and it did it in 30s.


Out of interest, does it one-shot it every time?


I'll try again; I only tried it once on my phone a few hours ago. Other models were able to do quite a lot but usually missed some things. This time it handled nested navigation quite well. A lot is still missing for sure; I just tested the basics with the play button in AI Studio.


It seems to be the first impression that makes all the difference, especially with the randomness that comes with LLMs in general. That may explain the "wow, this is so much better" vs. "this is no better than xxx" comments littered throughout this whole parent post.