Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

false, chess ELO is pretty good

https://maxim-saplin.github.io/llm_chess/

ets not cherry pick and actually see benchmarks please. i would say even ~1000 elo means that it can reason better than the average human.



If you look at the "workflow" section of that page, they had to add a bunch of scaffolding around telling the model what moves are legal -- an llm can't keep enough context to know how to play chess; only to choose an advantageous move from a given list. But feel free to "cherry pick".


why do you think this falsifies that it can't reason?


i ran the benchmark without the valid moves tool as well as the three mistakes grace and gpt-5.4 holds well. it can achieve 1000 ELO which is much higher than my own.

this clearly tells me that GPT is good at chess, at least better than a normal person who has played ~30-40 games in their life.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: