NCT07414966
Scalable Clinical Oversight of Large Language Models Via Uncertainty Triangulation
Prospective Evaluation of a Model-Agnostic Meta-Verification Framework (SCOUT) for Scalable Clinical Oversight of Large Language Model Outputs in Coronary Heart Disease Diagnosis: A Multi-Reader, Randomized, Crossover Trial
- Status
- Not Yet Recruiting
- Phase
- N/A
- Study type
- Interventional
- Enrollment
- 7 (estimated)
- Sponsor
- China National Center for Cardiovascular Diseases · Other Government
- Sex
- All
- Age
- 18 Years
- Healthy volunteers
- Not accepted
Summary
This prospective, multi-reader, randomized crossover trial evaluates SCOUT (Scalable Clinical Oversight via Uncertainty Triangulation), a model-agnostic meta-verification framework that selectively defers unreliable large language model (LLM) predictions to clinicians by triangulating three orthogonal uncertainty signals: model heterogeneity, stochastic inconsistency, and reasoning critique. The trial assesses whether SCOUT-assisted review can reduce physician review time compared with standard manual review of AI-generated diagnoses while maintaining non-inferior diagnostic accuracy in coronary heart disease (CHD) subtyping.
Detailed description
Background: Large language models are increasingly deployed in clinical workflows, yet requiring clinician review of every AI output negates the efficiency gains that motivate their adoption. SCOUT addresses this efficiency-safety paradox through algorithmic meta-verification. The SCOUT framework triangulates three orthogonal external signals to determine case-level uncertainty: (1) Model Heterogeneity - whether a structurally different auxiliary LLM agrees with the primary model; (2) Stochastic Inconsistency - whether repeated sampling from the same model yields divergent outputs; (3) Reasoning Critique - whether an external checker model identifies logical flaws in the chain-of-thought reasoning. In this crossover trial, 7 clinicians of varying seniority (2 junior residents, 3 senior residents, 2 attending physicians) each review all 110 cases under both standard manual review and SCOUT-assisted review workflows. The study evaluates workflow efficiency (primary endpoint) and diagnostic accuracy (secondary endpoint).
Conditions
- Coronary Heart Disease
Interventions
| Type | Name | Description |
|---|---|---|
| DIAGNOSTIC_TEST | SCOUT-Assisted Review Workflow | SCOUT-Assisted Review (Intervention Arm): Physicians review 56 cases processed through the SCOUT framework. For cases classified as low-uncertainty (D(x)=0), the AI prediction is auto-accepted without physician review. For high-uncertainty cases (D(x)=1), the physician reviews the case with access to the main model's chain-of-thought reasoning and the meta-verification audit results. The main model is DeepSeek-V3.1 with chain-of-thought prompting. |
| DIAGNOSTIC_TEST | Standard Manual Review Workflow | Physicians perform a full manual review of 54 cases using raw medical records with access to the AI model's predictions and reasoning, but without SCOUT uncertainty stratification or selective deferral. |
Timeline
- Start date
- 2026-02-19
- Primary completion
- 2026-02-28
- Completion
- 2026-02-28
- First posted
- 2026-02-17
- Last updated
- 2026-02-17
Source: ClinicalTrials.gov record NCT07414966. Inclusion in this directory is not an endorsement.