As soon as I found out that this model launched, I tried giving it a problem tha...

As soon as I found out that this model launched, I tried giving it a problem that I have been trying to code in Lean4 (showing that quicksort preserves multiplicity). All the other frontier models I tried failed.

I used the pro version and it started out well (as they all did), but it couldn't prove it. The interesting part is that it typoed the name of a tactic, spelling it "abjel" instead of "abel", even though it correctly named the concept. I didn't expect the model to make this kind of error, because they all seems so good at programming lately, and none of the other models did, although they did some other naming errors.

I am sure I can get it to solve the problem with good context engineering, but it's interesting to see how they struggle with lesser represented programming languages by themselves.