I’m pretty sure you can think of one yourself; I’m not going to play this game. Now it’s 5.4 Thinking, before that it was 5.3, before that 5.2, 5.1, 5, before that it was 4… At every stage there’s someone saying “oh, the previous model doesn’t matter, the current one is where it’s at”. And when it’s shown that the current model can’t do something, there’s always some other excuse. It’s a profoundly bad-faith argument, the very definition of moving the goalposts.
I do have a number of examples to give you, but I no longer share them online so they can’t be scraped and gamed. Now I share them strictly in person.
He means that if the problem becomes known, the AI companies will hack in a workaround rather than solving the problem by making the model more intelligent. Given that they have been caught cheating in that way in the past, I can't blame the GP for not sharing his tests.