OpenAI’s language model ChatGPT-4.0 answered 85% of neurology board-style questions correctly in a recent proof-of-concept study, pointing to potential uses in clinical neurology. The study, conducted by researchers from the University Hospital Heidelberg and the German Cancer Research Center, used questions from the American Board of Psychiatry and Neurology, along with some from the European Board for Neurology.
In this comparison, the earlier model, ChatGPT-3.5, answered 1,306 of 1,956 questions correctly, a score of 66.8%. The more advanced ChatGPT-4.0 answered 1,662 questions correctly, an accuracy of 85%. The average human score was 73.8%, which places ChatGPT-4.0 above and ChatGPT-3.5 below the human average. ChatGPT-4.0 also outperformed human respondents on questions in the behavioral, cognitive, and psychological domains.
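The percentages above follow directly from the reported counts. As a quick check, here is a minimal Python sketch that recomputes them; the function and variable names are illustrative and do not come from the study, and only the question counts are taken from the article.

```python
# Recompute the reported accuracy figures from the raw counts in the article.
# All names here are illustrative; only the counts come from the text.

TOTAL_QUESTIONS = 1956

CORRECT_ANSWERS = {
    "ChatGPT-3.5": 1306,
    "ChatGPT-4.0": 1662,
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


for model, correct in CORRECT_ANSWERS.items():
    print(f"{model}: {accuracy_percent(correct, TOTAL_QUESTIONS)}%")
```

Rounded to one decimal place, the counts reproduce the 66.8% and 85% figures quoted in the article.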
The findings suggest that, with further refinement, large language models (LLMs) like ChatGPT could find substantial applications in clinical neurology. However, the study also identified a weakness shared by both models: they performed worse on questions requiring “higher-order thinking” than on those requiring only “lower-order thinking.”
While the study positions LLMs as promising tools for documentation and decision-support systems in neurology, the researchers urge caution. Dr. Varun Venkataramani, one of the authors, clarified that the study serves as a proof of concept and that significant development and fine-tuning are necessary before LLMs are suitable for practical use in clinical neurology.
Despite this potential, the researchers advise neurologists to remain cautious about relying on these models for tasks involving higher-order thinking. They emphasize that LLMs, though powerful, are still imperfect and require further refinement before being integrated into routine clinical practice.
The study also acknowledges the broader context of artificial intelligence in healthcare, where AI already contributes to tasks such as cancer research for AstraZeneca and addressing the overprescription of antibiotics in Hong Kong. It concludes that LLMs need ongoing development and customization before they can be applied effectively in clinical neurology.
