Assessing LLM-generated vs. expert-created clinical anatomy MCQs: a student perception-based comparative study in medical education
Author | Elzayyat, Maram
Author | Mohammad, Janatul Naeim
Author | Zaqout, Sami
Available date | 2025-09-10T05:34:15Z
Publication Date | 2025-01-01
Publication Name | Medical Education Online
Identifier | http://dx.doi.org/10.1080/10872981.2025.2554678
Citation | Elzayyat, M., Mohammad, J. N., & Zaqout, S. (2025). Assessing LLM-generated vs. expert-created clinical anatomy MCQs: a student perception-based comparative study in medical education. Medical Education Online, 30(1). https://doi.org/10.1080/10872981.2025.2554678
Abstract | Large language models (LLMs) such as ChatGPT and Gemini are increasingly used to generate educational content in medical education, including multiple-choice questions (MCQs), but their effectiveness compared with expert-written questions remains underexplored, particularly in anatomy. We conducted a cross-sectional, mixed-methods study involving Year 2–4 medical students at Qatar University, in which participants completed and evaluated three anonymized MCQ sets, authored by ChatGPT, Google Gemini, and a clinical anatomist, across 17 quality criteria. Descriptive and chi-square analyses were performed, and optional free-text feedback was reviewed thematically. Among 48 participants, most rated the three MCQ sources as equally effective, although ChatGPT was more often preferred for helping students identify and confront knowledge gaps through challenging distractors and diagnostic insight, while expert-written questions were rated highest for promoting deeper analytical thinking. Preferences varied significantly across sources (χ²(64) = 688.79, p < .001). Qualitative feedback emphasized the need for better difficulty calibration and clearer distractors in some AI-generated items. Overall, LLM-generated anatomy MCQs can closely match expert-authored ones in learner-perceived value and may support deeper engagement, but expert review remains critical to ensure clarity and alignment with curricular goals. A hybrid AI-human workflow may offer a promising path toward scalable, high-quality assessment design in medical education.
Sponsor | This publication was supported by Qatar University Internal Grant No. QUST-1-CMED-2025-231. The findings achieved herein are solely the responsibility of the authors.
Language | en
Publisher | Taylor and Francis Group
Subject | AI-generated content; anatomy; assessment quality; large language models; medical education
Type | Article
Issue Number | 1
Volume Number | 30
ESSN | 1087-2981
This item appears in the following Collection(s)
- Medicine Research [1891 items]