Clinical Trials Directory


NCT07481162

AI vs Human Exam Assessment and Development (AHEAD Trial)

Psychometric Performance and Student Perceptions of AI- Versus Human-Generated Multiple-Choice Question Development in Medical Education: The AHEAD Randomized Controlled Trial

Status
Completed
Phase
N/A
Study type
Interventional
Enrollment
258 (actual)
Sponsor
University of British Columbia · Academic / Other
Sex
All
Age
18 Years
Healthy volunteers
Accepts healthy volunteers

Summary

The Artificial Intelligence (AI) vs Human Exam Assessment and Development (AHEAD) Trial is a participant-blinded randomized controlled trial conducted among first-year medical students at the University of British Columbia. The study evaluates whether multiple-choice examination questions generated using large language models (LLMs) perform comparably to questions written by humans using traditional methods in medical education. Participants were randomized to complete one of two versions of a formative mock final examination consisting of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same course learning objectives. One exam version contained AI-generated questions produced using a structured LLM workflow with independent AI verification, while the other contained questions authored by senior medical students using conventional methods. The study evaluates exam feasibility, psychometric reliability, validity, student acceptability, and educational impact. Outcomes include exam performance, item discrimination indices, distractor efficiency, student perceptions of exam quality and difficulty, and changes in perceived preparedness for the upcoming summative examination.

Detailed description

The AHEAD Trial (AI vs Human Exam Assessment and Development) is a single-center, participant-blinded randomized controlled trial conducted among first-year Doctor of Medicine (MD) students enrolled in the Foundations of Medical Practice I (MEDD 411) course at the University of British Columbia.

Participants were randomized in a 1:1 ratio to complete either an AI-generated or a human-generated mock final examination. Both exams consisted of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same MEDD 411 curricular objectives. AI-generated questions were produced using a structured workflow involving ChatGPT for question generation and Google Gemini for independent verification. Human-generated questions were authored by senior medical students without AI assistance and underwent independent peer review. Both exams followed identical formatting guidelines and assessed the same learning objectives.

All participants completed identical pre-exam and post-exam surveys assessing demographic characteristics, familiarity with artificial intelligence in education, and perceptions of the examination experience. The study evaluates the utility of AI-generated assessments using van der Vleuten's Assessment Utility Framework, including feasibility, reliability, validity, acceptability, and educational impact. The trial aims to determine whether large language models can accelerate the development of formative medical examinations while maintaining comparable psychometric quality and educational value relative to traditional human-authored questions.

Conditions

Interventions

Type: Other
Name: AI-generated MCQ examination
Description: A formative mock examination composed of 112 case-based multiple-choice questions generated using large language models, aligned with course learning objectives.

Type: Other
Name: Human-generated MCQ examination
Description: A formative mock examination composed of 112 case-based multiple-choice questions written by senior medical students using conventional item-writing methods, aligned with the same course learning objectives.

Timeline

Start date
2024-12-08
Primary completion
2024-12-09
Completion
2024-12-09
First posted
2026-03-18
Last updated
2026-03-18

Locations

1 site across 1 country: Canada

Source: ClinicalTrials.gov record NCT07481162. Inclusion in this directory is not an endorsement.