In a recent cross-sectional study, researchers evaluated the performance of large language models (LLMs) on neurology board-style examinations.
The study, which used a question bank approved by the American Board of Psychiatry and Neurology, offered insight into how these models handle specialist medical knowledge.
ChatGPT Dominates Neurology Exam
The study involved two versions of the LLM ChatGPT: version 3.5 (referred to as LLM 1) and version 4 (LLM 2). The findings revealed that LLM 2 significantly outperformed its predecessor, even surpassing the mean human score on the neurology board examination.
LLM 2 correctly answered 85.0% of the questions, compared with a mean human score of 73.8%.
This data suggests that, with further refinements, large language models could find significant applications in clinical neurology and healthcare.
ChatGPT Performs Better On Lower-Order Exam Questions
Even the older model, LLM 1, demonstrated solid performance, scoring 66.8%, albeit below the human average.
Both models consistently used confident language, irrespective of the correctness of their answers, indicating a potential area for improvement in future iterations.
The study categorized questions as lower-order or higher-order based on Bloom's taxonomy.
Both models performed better on lower-order questions. However, LLM 2 scored well on both question types, demonstrating its versatility.
Disclaimer
In adherence to the Trust Project guidelines, BeInCrypto is committed to unbiased, transparent reporting. This news article aims to provide accurate, timely information. However, readers are advised to verify facts independently and consult with a professional before making any decisions based on this content. Please note that our Terms and Conditions, Privacy Policy, and Disclaimers have been updated.