Completed · NCT07493681

ChatGPT in the Diagnosis and Management of Complex Polyneuropathies: Comparative Analysis With Neurologists Using Real-World Cases

Role of ChatGPT in the Differential Diagnosis of Polyneuropathies and Comparison of Its Performance With That of Peripheral Neuropathy Specialists and Non-specialists

Status: Completed
Phase: —
Study type: Observational
Enrollment: 100 (actual)

Sponsor: Istituto Clinico Humanitas · Academic / Other
Sex: All
Age: —
Healthy volunteers: Not accepted

Summary

BACKGROUND AND PURPOSE

Polyneuropathies are diseases affecting the peripheral nerves that occur in approximately 1% of the general population, rising to as much as 13% among older adults. Despite their prevalence, accurate diagnosis is often challenging and requires specialist expertise that is not uniformly available. Patients evaluated in primary care or non-specialist settings frequently experience diagnostic delays or misdiagnoses, highlighting the need for innovative tools to support clinicians at critical points in the diagnostic process.

Artificial intelligence (AI) large language models (LLMs), such as ChatGPT, are increasingly being explored as potential aids in clinical diagnosis. These tools can process complex clinical information and generate diagnostic suggestions at low cost and with broad accessibility. However, their performance in specialised neurological conditions, particularly complex polyneuropathies, has not yet been rigorously evaluated in real-world settings.

STUDY OBJECTIVES

This study aims to evaluate the diagnostic performance of ChatGPT-4o on real-world polyneuropathy cases and to compare it with that of peripheral nerve disease specialists and non-specialist neurologists. A secondary objective is to assess whether exposure to ChatGPT-4o outputs influences and potentially improves neurologist diagnostic accuracy.

STUDY DESIGN

This will be a comparative diagnostic accuracy study conducted at two tertiary referral centres for peripheral neuropathies in Milan, Italy. One hundred patients with confirmed polyneuropathy diagnoses will be randomly selected from consecutive outpatients. Each case will be summarised in a standardised format including demographics, symptom history, neurological examination findings, nerve conduction study results, and screening laboratory data. Only cases with a diagnosis confirmed after at least 12 months of clinical follow-up will be included.
ChatGPT-4o will be presented with each case using a structured prompt, and will be asked to provide: (1) a leading diagnosis, (2) two alternative differential diagnoses, and (3) a single recommended confirmatory diagnostic test. The model will be run in two independent trials to assess response consistency.

The same 100 cases will also be reviewed by neurologists from multiple international centres. Participants will be classified as either peripheral nerve disease specialists (neurologists routinely practising in tertiary polyneuropathy centres) or non-specialists (general neurologists or those sub-specialised in other fields). Neurologists will first provide their own diagnostic assessments independently, and will subsequently be shown ChatGPT-4o's output with the option to revise their responses.

EXPECTED SIGNIFICANCE

This study will provide evidence on whether AI-based LLMs can serve as reliable diagnostic aids in complex polyneuropathy cases.

Detailed description

Background and Rationale

Polyneuropathies represent a heterogeneous group of disorders affecting the peripheral nervous system and constitute one of the most common neurological conditions encountered in clinical practice. Their prevalence is estimated at approximately 1% in the general population and increases substantially with age, reaching nearly 4% among middle-aged individuals and up to 13% among the elderly. Peripheral neuropathies therefore represent a major contributor to neurological morbidity and healthcare utilization worldwide.

Establishing the etiological diagnosis of polyneuropathy remains clinically challenging. Despite advances in laboratory testing and neurophysiological techniques, the diagnostic process continues to rely heavily on careful clinical evaluation, including detailed patient history and neurological examination, followed by targeted use of electrophysiological and laboratory investigations. In many cases, accurate diagnosis requires specialized expertise in neuromuscular disorders.

Access to such expertise is unevenly distributed across healthcare systems. A substantial proportion of patients with peripheral neuropathies are initially evaluated by non-specialist physicians, including general neurologists or clinicians in primary or secondary care settings. Diagnostic delays and misclassification of neuropathy subtypes are therefore relatively common, particularly for rare or atypical etiologies. These challenges highlight the need for tools capable of supporting clinicians during the diagnostic process.

Recent advances in artificial intelligence (AI), particularly the development of large language models (LLMs), have generated increasing interest in their potential applications in clinical medicine. ChatGPT, developed by OpenAI and based on the Generative Pre-trained Transformer architecture, is a conversational AI system capable of generating context-aware medical reasoning and diagnostic suggestions.
Emerging studies suggest that LLMs may achieve diagnostic performance comparable to that of medical trainees or physicians in certain clinical reasoning tasks. However, rigorous evaluation of LLM performance in complex neurological conditions remains limited. In particular, little evidence exists regarding their performance in diagnostically challenging polyneuropathy cases typically encountered in tertiary referral centers. Evaluating the capabilities and limitations of such tools in this setting is essential before considering their potential role as clinical decision-support systems.

This study aims to systematically evaluate the diagnostic performance of ChatGPT-4o when applied to real-world polyneuropathy cases and to compare its performance with that of specialist and non-specialist neurologists.

Study Objectives

The primary objective of this study is to evaluate the diagnostic accuracy of ChatGPT-4o in identifying the leading etiological diagnosis in complex polyneuropathy cases. Specifically, the study aims to compare the diagnostic performance of ChatGPT-4o with that of peripheral nerve disease specialists and non-specialist neurologists.

Secondary objectives include assessing the ability of ChatGPT-4o to generate appropriate differential diagnoses and recommend suitable confirmatory diagnostic tests. The study will also evaluate whether exposure to AI-generated diagnostic suggestions influences neurologist diagnostic decisions. Additionally, the study will assess the consistency of ChatGPT-4o responses across repeated independent evaluations and characterize the types of errors produced by the model when incorrect diagnoses are generated.

Study Design

This is a comparative diagnostic accuracy study designed to evaluate the performance of an artificial intelligence-based large language model relative to human neurologists.
The study will be conducted across two tertiary referral centers for peripheral neuropathies located in Milan, Italy: Humanitas Research Hospital (IRCCS) and Fondazione IRCCS Istituto Neurologico Carlo Besta. These institutions provide specialized care for patients with a wide range of polyneuropathies, including inflammatory, hereditary, metabolic, toxic, and paraneoplastic etiologies. The study protocol has been approved by the Ethics Committee of Humanitas Research Hospital and will be conducted in accordance with the principles of the Declaration of Helsinki.

Study Population and Case Selection

Clinical cases will be retrospectively identified from patients evaluated in the polyneuropathy outpatient clinics of the participating institutions. Eligible cases will include patients with a confirmed diagnosis of polyneuropathy for whom a stable etiological diagnosis has been established after at least 12 months of clinical follow-up.

Clinical information will be extracted from medical records and used to prepare standardized case summaries. Data included in the summaries will reflect the information typically available during early diagnostic evaluation, including patient demographics, clinical history, neurological examination findings, results of nerve conduction studies, and results of standard laboratory screening tests.

From the pool of eligible cases, 100 cases will be randomly selected using a computer-generated sampling method. All cases will be anonymized prior to analysis. The final dataset is expected to include a broad spectrum of polyneuropathy etiologies reflecting the referral patterns of tertiary neuromuscular centers, including inflammatory neuropathies, hereditary neuropathies, metabolic neuropathies, toxic neuropathies, and other less common conditions.

Case Evaluation by ChatGPT-4o

The artificial intelligence system evaluated in this study is ChatGPT-4o Enterprise, developed by OpenAI.
Each clinical case summary will be presented to the model using standardized prompts instructing the model to analyze the clinical scenario and provide diagnostic reasoning. For each case, the model will be asked to generate:

* one leading etiological diagnosis,
* two less likely alternative diagnoses,
* and one diagnostic test capable of confirming the leading diagnosis.

To minimize contextual bias, each case will be evaluated in an independent session with no prior conversation history. The entire dataset will be evaluated twice in two independent runs in order to assess the reproducibility and consistency of the model's responses.

Case Evaluation by Neurologists

A panel of neurologists from multiple international centers will participate as human raters in the study. Participants will include both peripheral nerve disease specialists and neurologists without specific subspecialization in neuromuscular disorders. Neurologists will review the same standardized case summaries using a web-based interface and will be asked to provide a leading diagnosis, two alternative differential diagnoses, and a recommended confirmatory diagnostic test for each case.

The evaluation will be conducted in two phases. In the first phase, neurologists will independently review the cases and provide their diagnostic assessments. In the second phase, cases will be presented again together with the diagnostic output generated by ChatGPT-4o, allowing neurologists to confirm or modify their previous responses. This design will allow assessment of the potential influence of AI-generated suggestions on physician diagnostic decisions.

Outcome Assessment

The primary outcome of the study is diagnostic accuracy for the leading diagnosis, defined as the proportion of cases in which the proposed leading diagnosis matches the final confirmed etiological diagnosis.
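As an illustration of the primary outcome definition above, the following minimal sketch computes the proportion of cases whose proposed leading diagnosis matches the confirmed one. It is not the study's analysis code. The function name, the case-insensitive exact-match rule, and the example labels are all assumptions made for illustration; the record does not state how a "match" is adjudicated.

```python
from typing import Sequence


def leading_diagnosis_accuracy(
    proposed: Sequence[str], confirmed: Sequence[str]
) -> float:
    """Proportion of cases whose proposed leading diagnosis matches the confirmed one.

    Assumption: a match is a case-insensitive exact label match. In practice,
    matching may require expert adjudication rather than string comparison.
    """
    if len(proposed) != len(confirmed):
        raise ValueError("proposed and confirmed must have the same length")
    if not confirmed:
        raise ValueError("at least one case is required")
    matches = sum(
        p.strip().casefold() == c.strip().casefold()
        for p, c in zip(proposed, confirmed)
    )
    return matches / len(confirmed)


# Hypothetical labels for illustration only; these are not study data.
confirmed = ["CIDP", "CMT1A", "diabetic neuropathy", "vasculitic neuropathy"]
proposed = ["CIDP", "cmt1a", "CIDP", "vasculitic neuropathy"]

accuracy = leading_diagnosis_accuracy(proposed, confirmed)
print(f"{accuracy:.2f}")  # 3 of 4 cases match
```

The same function could be applied separately to each rater group (the model, specialists, non-specialists) to obtain the proportions being compared.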
Secondary outcomes include the accuracy of differential diagnoses, the appropriateness of recommended confirmatory diagnostic tests, and changes in neurologist diagnostic performance after reviewing AI-generated outputs. All responses will be evaluated against the final confirmed diagnosis for each case. The appropriateness of differential diagnoses and diagnostic test recommendations will be independently assessed by expert neurologists with extensive experience in polyneuropathy diagnosis.

Clinical Significance

This study will provide empirical evidence on the diagnostic performance of a large language model in a complex neurological domain. By directly comparing AI-generated diagnostic reasoning with that of specialist and non-specialist neurologists, the study aims to clarify the potential role of large language models as decision-support tools in neurology.
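Two of the secondary measures described above, the consistency of the model across its two independent runs and the change in neurologist accuracy between the two review phases, can be expressed as simple proportions. The sketch below is illustrative only. The record does not specify the statistics used, so simple percent agreement and a raw difference in accuracy are assumptions, as are the function names and the example data.

```python
from typing import Sequence


def run_agreement(run_a: Sequence[str], run_b: Sequence[str]) -> float:
    """Share of cases where two runs give the same leading diagnosis.

    Assumption: simple percent agreement on case-insensitive labels.
    """
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    same = sum(
        a.strip().casefold() == b.strip().casefold()
        for a, b in zip(run_a, run_b)
    )
    return same / len(run_a)


def accuracy_change(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Difference in accuracy (after minus before) over the same cases.

    Each element records whether the leading diagnosis was correct for one case.
    """
    if len(before) != len(after) or not before:
        raise ValueError("phases must be non-empty and of equal length")
    return (sum(after) - sum(before)) / len(before)


# Hypothetical values for illustration only; these are not study data.
run_1 = ["CIDP", "CMT1A", "CIDP", "amyloid neuropathy"]
run_2 = ["CIDP", "CMT1A", "diabetic neuropathy", "amyloid neuropathy"]
agreement = run_agreement(run_1, run_2)  # runs agree on 3 of 4 cases

phase_1 = [True, False, False, True]  # correct before seeing model output
phase_2 = [True, True, False, True]   # correct after seeing model output
change = accuracy_change(phase_1, phase_2)  # one more correct case out of 4

print(f"agreement={agreement:.2f}, change={change:+.2f}")
```

A positive change means more correct leading diagnoses after the model's output was shown; a negative value would mean that revisions made accuracy worse.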

Conditions

Polyneuropathies

Interventions

Type	Name	Description
OTHER	ChatGPT-4o	ChatGPT-4o Enterprise (OpenAI) is evaluated as an artificial intelligence-based large language model for clinical diagnostic reasoning. Standardized anonymized clinical case summaries will be presented to the model, which will generate a leading diagnosis, two alternative differential diagnoses, and one recommended confirmatory diagnostic test for each case.
OTHER	chatGPT	Case summaries will

Timeline

Start date: 2023-06-12
Primary completion: 2025-01-15
Completion: 2025-03-01
First posted: 2026-03-25
Last updated: 2026-03-25

Locations

1 site across 1 country: Italy

Source: ClinicalTrials.gov record NCT07493681. Inclusion in this directory is not an endorsement.