Assessing LLM-generated vs. expert-created clinical anatomy MCQs: a student perception-based comparative study in medical education
Author | Elzayyat, Maram |
Author | Mohammad, Janatul Naeim |
Author | Zaqout, Sami |
Date Available | 2025-09-10T05:34:15Z |
Publication Date | 2025-01-01 |
Publication Name | Medical Education Online |
Identifier | http://dx.doi.org/10.1080/10872981.2025.2554678 |
Citation | Elzayyat, M., Mohammad, J. N., & Zaqout, S. (2025). Assessing LLM-generated vs. expert-created clinical anatomy MCQs: a student perception-based comparative study in medical education. Medical Education Online, 30(1). https://doi.org/10.1080/10872981.2025.2554678 |
Abstract | Large language models (LLMs) such as ChatGPT and Gemini are increasingly used to generate educational content in medical education, including multiple-choice questions (MCQs), but their effectiveness compared to expert-written questions remains underexplored, particularly in anatomy. We conducted a cross-sectional, mixed-methods study involving Year 2–4 medical students at Qatar University, where participants completed and evaluated three anonymized MCQ sets—authored by ChatGPT, Google Gemini, and a clinical anatomist—across 17 quality criteria. Descriptive and chi-square analyses were performed, and optional feedback was reviewed thematically. Among 48 participants, most rated the three MCQ sources as equally effective, although ChatGPT was more often preferred for helping students identify and confront their knowledge gaps through challenging distractors and diagnostic insight, while expert-written questions were rated highest for deeper analytical thinking. A significant variation in preferences was observed across sources (χ² (64) = 688.79, p <.001). Qualitative feedback emphasized the need for better difficulty calibration and clearer distractors in some AI-generated items. Overall, LLM-generated anatomy MCQs can closely match expert-authored ones in learner-perceived value and may support deeper engagement, but expert review remains critical to ensure clarity and alignment with curricular goals. A hybrid AI-human workflow may provide a promising path for scalable, high-quality assessment design in medical education. |
Sponsor | This publication was supported by Qatar University Internal Grant Number (QUST-1-CMED-2025–231). The findings achieved herein are solely the responsibility of the authors. |
Language | en |
Publisher | Taylor and Francis Group |
Subject | AI-generated content; anatomy; assessment quality; large language models; medical education |
Type | Article |
Issue Number | 1 |
Volume Number | 30 |
eISSN | 1087-2981 |
Files in This Record
This record appears in the following collections:
Medical Research [1891 items]