NCT07414966
Scalable Clinical Oversight of Large Language Models Via Uncertainty Triangulation
Prospective Evaluation of a Model-Agnostic Meta-Verification Framework (SCOUT) for Scalable Clinical Oversight of Large Language Model Outputs in Coronary Heart Disease Diagnosis: A Multi-Reader, Randomized, Crossover Trial
- Status
- Not Yet Recruiting
- Phase
- N/A
- Study type
- Interventional
- Enrollment
- 7 (estimated)
- Sponsor
- China National Center for Cardiovascular Diseases · Other Government
- Sex
- All
- Age
- 18 Years
- Healthy volunteers
- Not accepted
Summary
This prospective, multi-reader, randomized crossover trial evaluates SCOUT (Scalable Clinical Oversight via Uncertainty Triangulation), a model-agnostic meta-verification framework that selectively defers unreliable large language model (LLM) predictions to clinicians by triangulating three orthogonal uncertainty signals: model heterogeneity, stochastic inconsistency, and reasoning critique. The trial assesses whether SCOUT-assisted review can reduce physician review time compared with standard manual review of AI-generated diagnoses while maintaining non-inferior diagnostic accuracy in coronary heart disease (CHD) subtyping.
Detailed description
Background: Large language models are increasingly deployed in clinical workflows, yet requiring clinician review of every AI output negates the efficiency gains that motivate their adoption. SCOUT addresses this efficiency-safety paradox through algorithmic meta-verification. The SCOUT framework triangulates three orthogonal external signals to determine case-level uncertainty: (1) Model Heterogeneity - whether a structurally different auxiliary LLM agrees with the primary model; (2) Stochastic Inconsistency - whether repeated sampling from the same model yields divergent outputs; (3) Reasoning Critique - whether an external checker model identifies logical flaws in the chain-of-thought reasoning. In this crossover trial, 7 clinicians of varying seniority (2 junior residents, 3 senior residents, 2 attending physicians) each review all 110 cases under both standard manual review and SCOUT-assisted review workflows. The study evaluates workflow efficiency (primary endpoint) and diagnostic accuracy (secondary endpoint).
Conditions
- Coronary Heart Disease
Interventions
| Type | Name | Description |
|---|---|---|
| DIAGNOSTIC_TEST | SCOUT-Assisted Review Workflow | SCOUT-Assisted Review (Intervention Arm): Physicians review 56 cases processed through the SCOUT framework. For cases classified as low-uncertainty (D(x)=0), the AI prediction is auto-accepted without physician review. For high-uncertainty cases (D(x)=1), the physician reviews the case with access to the main model's chain-of-thought reasoning and the meta-verification audit results. The main model is DeepSeek-V3.1 with chain-of-thought prompting. |
| DIAGNOSTIC_TEST | Standard Manual Review Workflow | Physicians perform a full manual review of 54 cases using raw medical records with access to the AI model's predictions and reasoning, but without SCOUT uncertainty stratification or selective deferral. |
Timeline
- Start date
- 2026-02-19
- Primary completion
- 2026-02-28
- Completion
- 2026-02-28
- First posted
- 2026-02-17
- Last updated
- 2026-02-17
Source: ClinicalTrials.gov record NCT07414966. Inclusion in this directory is not an endorsement.