Performance of Large Language Models in Diagnosing Rare Hematologic Diseases and the Impact of Their Diagnostic Outputs on Physicians: Combined Retrospective and Prospective Study
Journal of Medical Internet Research, 2025, Vol. 27, e77334
Hongbin Yu, Tian Chen, Xin Zhang, Yunfan Yang, Qinyu Liu, Chenlu Yang, Kai Shen, He Li, Wenjiao Tang, Xushu Zhong, Xiao Shuai, Xinmei Yu, Y. P. Liao, Chiyi Wang, Huanling Zhu, Yu Wu
Abstract
Without fine-tuning, new-generation commercial LLMs, particularly those with chain-of-thought reasoning, can identify diagnoses of rare hematologic diseases with high accuracy and significantly enhance the diagnostic performance of less-experienced physicians. Nevertheless, biased LLM outputs may mislead clinicians, highlighting the need for critical appraisal and cautious clinical integration with appropriate safeguard systems.