Seems to be the first model that one-shots my secret benchmark about nested SQLi...

osn9363739 · 2025-11-19T03:37:37 1763523457

Out of interest, does it one-shot it every time?

raffkede · 2025-11-19T05:13:27 1763529207

Will try again. I just tried it once on my phone a few hours ago. Other models were able to do quite a lot but usually missed some stuff; this time it managed nested navigation quite well. Lots of stuff is still missing for sure, I only tested the basics with the play button in AI Studio.

osn9363739 · 2025-11-19T05:30:13 1763530213

It seems to be the first impression that makes all the difference, especially with the randomness that comes with LLMs in general, which maybe explains the 'wow this is so much better' vs. the 'this is no better than xxx' comments littered throughout this whole parent post.