Hacker News
LLMs and Diagnostic Reasoning: A Randomized Clinical Vignette Study [pdf] (medrxiv.org)
3 points by trott on Oct 2, 2024 | 3 comments


TLDR:

Physicians scored 73.7. Physicians armed with GPT-4 scored 76.3. But GPT-4 alone scored 89.2.

The authors think it's unlikely that the materials are in the GPT-4 training data, because the cases have never been publicly released.


thanks for sharing.

the implications are fascinating, if the findings are generalizable and reproducible.

the study suggests LLMs may already be materially superior to experts in a critical field like medicine, and that users inexperienced with LLMs hold them back.

given the author affiliations, it's also likely that the tested physicians are in the top tier -- suggesting even greater disparity between LLMs and doctors in less advanced areas.


a doctor friend highlighted two key limitations: only six cases were evaluated per physician and half the physicians were only residents.



