Assessing LLM-generated vs. expert-created clinical anatomy MCQs: a student perception-based comparative study in medical education
Author | Elzayyat, Maram |
Author | Mohammad, Janatul Naeim |
Author | Zaqout, Sami |
Date Available | 2025-09-10T05:34:15Z |
Publication Date | 2025-01-01 |
Publication Name | Medical Education Online |
Identifier | http://dx.doi.org/10.1080/10872981.2025.2554678 |
Citation | Elzayyat, M., Mohammad, J. N., & Zaqout, S. (2025). Assessing LLM-generated vs. expert-created clinical anatomy MCQs: a student perception-based comparative study in medical education. Medical Education Online, 30(1). https://doi.org/10.1080/10872981.2025.2554678 |
Abstract | Large language models (LLMs) such as ChatGPT and Gemini are increasingly used to generate educational content in medical education, including multiple-choice questions (MCQs), but their effectiveness compared to expert-written questions remains underexplored, particularly in anatomy. We conducted a cross-sectional, mixed-methods study involving Year 2–4 medical students at Qatar University, where participants completed and evaluated three anonymized MCQ sets—authored by ChatGPT, Google Gemini, and a clinical anatomist—across 17 quality criteria. Descriptive and chi-square analyses were performed, and optional feedback was reviewed thematically. Among 48 participants, most rated the three MCQ sources as equally effective, although ChatGPT was more often preferred for helping students identify and confront their knowledge gaps through challenging distractors and diagnostic insight, while expert-written questions were rated highest for deeper analytical thinking. A significant variation in preferences was observed across sources (χ² (64) = 688.79, p <.001). Qualitative feedback emphasized the need for better difficulty calibration and clearer distractors in some AI-generated items. Overall, LLM-generated anatomy MCQs can closely match expert-authored ones in learner-perceived value and may support deeper engagement, but expert review remains critical to ensure clarity and alignment with curricular goals. A hybrid AI-human workflow may provide a promising path for scalable, high-quality assessment design in medical education. |
Sponsor | This publication was supported by Qatar University Internal Grant Number (QUST-1-CMED-2025–231). The findings achieved herein are solely the responsibility of the authors. |
Language | en |
Publisher | Taylor and Francis Group |
Subject | AI-generated content; anatomy; assessment quality; large language models; medical education |
Type | Article |
Issue Number | 1 |
Volume Number | 30 |
eISSN | 1087-2981 |
Files in This Record
This record appears in the following collections:
Medical Research [1891 items]