Clinical Trials Directory


NCT07497815

Artificial Intelligence-assisted Diagnosis in Ophthalmology

Development and Validation of an Artificial Intelligence-assisted Diagnostic System for Ophthalmic Pathologies

Status
Not Yet Recruiting
Phase
Not applicable
Study type
Observational
Enrollment
15,000 (estimated)
Sponsor
Marisse Masis-Solano · Industry
Sex
All
Age
18 Years and older
Healthy volunteers
Accepted

Summary

This is a retrospective, multicenter, observational study designed to develop and validate an artificial intelligence (AI) system capable of detecting and classifying major ophthalmic diseases (glaucoma, cataract, diabetic retinopathy, and other retinal pathologies) in the Costa Rican population. The study will use approximately 15,000 existing medical images from digital archives of two ophthalmic centers in Costa Rica, without active participant recruitment or capture of new images. The primary motivation is that AI systems developed in other countries (primarily trained on Asian, European, or North American populations) do not necessarily perform with the same accuracy when applied to Latin American populations. This study seeks to establish a precedent for the importance of locally validating any medical AI technology before clinical implementation.

Detailed description

Background and Rationale

Ophthalmic diseases, including glaucoma, diabetic retinopathy, and cataract, represent a major public health burden both globally and in Costa Rica. Early detection is critical for all of these conditions, yet it faces persistent challenges: glaucoma is asymptomatic in its early stages, diabetic retinopathy requires annual screening that overwhelms available ophthalmology capacity, image interpretation is time-consuming and subject to inter-observer variability, and in resource-limited settings the supply of expert ophthalmologists is insufficient to screen all at-risk populations. Advances in deep learning have demonstrated strong capability in ophthalmic image analysis, with published studies showing AI systems achieving diagnostic accuracy comparable to expert ophthalmologists across multiple disease categories. However, most AI systems have been developed and validated in Asian, European, or North American populations. These systems may not generalize well to Latin American populations due to differences in disease prevalence patterns, demographic characteristics, imaging equipment and protocols, and healthcare system structures. Applying AI systems developed elsewhere without local validation is scientifically questionable and potentially unsafe. This study addresses that gap by developing and validating an AI system specifically for the Costa Rican population.

Study Design and Setting

This is a retrospective diagnostic technology validation study conducted at two ophthalmology centers in Costa Rica: Asociados de Mácula y Vítreo de Costa Rica in San José (contributing approximately 10,000 images) and Centro Ocular in Heredia (contributing approximately 5,000 images). The study will use approximately 15,000 existing ophthalmic images from adult patients who received care at these centers during routine clinical practice.
No temporal restrictions are applied; all available historical images meeting quality and modality criteria will be included to maximize data volume and representativeness. There is no active participant recruitment and no new images will be captured. Image modalities include fundus photography (color and autofluorescence), optical coherence tomography (OCT) of the posterior segment, anterior segment photography, automated perimetry (visual fields), and video-OCT.

Study Procedures

Image Extraction (Months 1-6). All available ophthalmic images meeting inclusion criteria will be automatically extracted from the Picture Archiving and Communication Systems (PACS) at each site.

Anonymization (Months 1-6). A rigorous anonymization protocol will be applied. All direct identifiers (name, national ID number, address, phone, email, medical record number) will be removed and each image will be assigned a random anonymous code. Potentially identifying data will be transformed: exact birth dates will be converted to age groups (18-40, 41-60, 61-75, >75) and exact dates will be reduced to year only. An encrypted linkage table will be stored locally at each site solely for ethical emergencies, such as incidental findings requiring patient notification. Non-identifying clinical data will be preserved, including age group, sex, diagnosis, intraocular pressure, best-corrected visual acuity, diabetes mellitus history, and recent ocular surgery history.

Quality Assessment and Labeling (Months 7-12). A double-reading system with adjudication will be used. Each image will first undergo quality screening and be accepted or rejected based on focus, illumination, field of view, and absence of major artifacts. Rejected images will be permanently excluded. For accepted images, two independent readers (Dr. Marissé Masís Solano and Dr. Erick Hernández Bogantes) will each assign a diagnosis while blinded to one another's assessments. Where readers agree, the diagnosis is recorded as final.
Where they disagree, an adjudicator (Dr. Masís Solano) makes the final determination. Dr. Masís meets adjudicator independence requirements: the images are not from her own patients, she is not the Principal Investigator, and the process complies with Good Clinical Practice standards and FDA guidelines for AI medical device development. Diagnostic categories include glaucoma (open-angle and angle-closure), cataract (severity graded), diabetic retinopathy (mild, moderate, severe, proliferative), age-related macular degeneration (dry versus wet), retinal vascular occlusion, macular edema, normal/no significant pathology, and other relevant findings.

Data Split (Month 13). Images will be divided by stratified random split maintaining diagnostic category proportions: a training set of 12,000 images (80%) and a validation set of 3,000 images (20%). Strict separation will be maintained so that validation images are never exposed to the model during training.

Model Development and Training (Months 13-15). The AI system will be built on deep learning architectures (convolutional networks such as ResNet and EfficientNet, as well as Vision Transformers) using transfer learning from ImageNet pre-trained models. A multimodal fusion approach will integrate image features with clinical data. Training will incorporate regularization techniques to prevent overfitting, including dropout, data augmentation, and early stopping, along with hyperparameter optimization.

External Validation (Months 16-18). The model will be tested on the 3,000 held-out validation images. Performance metrics, subgroup analyses, and equity analyses are described in the Outcome Measures section of this record.

Bias Analysis (Months 19-21). A comprehensive algorithmic bias assessment will evaluate the model across sex, age group, disease severity, imaging equipment, and center of origin.

Documentation and Dissemination (Months 22-24).
Final outputs will include manuscript preparation for peer-reviewed publication, technical documentation, reports to health authorities, and conference presentations.

Quality Assurance Plan

Data validation is built into multiple stages of the study. During image extraction, automated checks will verify file integrity, modality type, and metadata completeness. During labeling, the double-reading design with independent adjudication serves as the primary quality control mechanism; inter-reader agreement rates will be calculated (Cohen's kappa) and if excessive disagreement is observed, calibration sessions will be held to align diagnostic criteria before labeling continues. All reader assignments and adjudication decisions will be logged with timestamps in an auditable database. Site-level data quality reviews will be conducted at the midpoint and conclusion of the labeling phase to verify that diagnostic category distributions and rejection rates are consistent with expectations and to identify any systematic data entry errors.

Data Checks and Source Data Verification

Predefined range and consistency checks will be applied to all structured clinical variables at the point of data extraction. Intraocular pressure values, visual acuity measurements, and age groups will be validated against clinically plausible ranges. Cross-field logic checks will flag inconsistencies (for example, a diabetic retinopathy diagnosis in a patient with no documented diabetes history). Flagged records will be reviewed by an investigator and either corrected with documentation or excluded. Because the study uses fully anonymized data and the encrypted linkage table is accessible only to the site PI for ethical emergencies, direct source data verification against original medical records will not be performed routinely.
However, during the pilot phase, a random sample of images from each site will be cross-referenced against their PACS metadata to confirm that the extraction and anonymization pipeline preserves clinical data accurately.

Data Dictionary

A data dictionary will be maintained as a living document throughout the study. It will contain detailed descriptions of each variable, including its source (PACS metadata, reader assignment, or derived), data type, permissible values or coding scheme, and normal or expected ranges where applicable. Diagnostic categories will be coded using a study-specific classification system aligned with established clinical grading scales (for example, the International Clinical Diabetic Retinopathy severity scale for diabetic retinopathy). The data dictionary will be finalized before the labeling phase begins and will be versioned, with all changes tracked and dated.

Standard Operating Procedures

Written standard operating procedures (SOPs) will govern all core study activities, including image extraction and transfer, anonymization, quality screening, diagnostic labeling and adjudication, data management and storage, model training and validation, adverse or incidental finding reporting, and change management. SOPs will be reviewed and approved by all investigators before study initiation. Any amendments during the study will follow a formal change management process requiring investigator review and dated version control.

Sample Size Assessment

The total dataset of approximately 15,000 images, split into 12,000 training images and 3,000 validation images, was determined based on statistical power requirements for the validation phase. With an estimated 750 cases per major diagnostic category in the validation set, the study has greater than 90% power to detect an AUC of 0.90 against a null hypothesis of 0.80 using a two-sided test at the 0.05 significance level. The 95% confidence intervals for AUC will have a width of no more than 0.05.
For a sensitivity estimate of 85% with 750 positive cases, the 95% confidence interval is 82-88%. For a specificity estimate of 90% with 2,250 negative cases, the 95% confidence interval is 89-91%. These narrow confidence intervals support precise conclusions about true model performance. If pilot-phase data suggest that available image volume is lower than anticipated, additional centers may be recruited.

Plan for Missing Data

Missing data will be addressed at two levels. For image quality, images that fail quality screening (insufficient focus, illumination, field of view, or presence of major artifacts) will be permanently excluded and documented with the reason for exclusion; no imputation will be attempted for image-level deficiencies. For structured clinical variables, the extent and pattern of missingness will be characterized for each variable. Variables with less than 5% missing data will be handled by complete-case analysis for that variable. Variables with 5-20% missing data will be evaluated for missingness pattern (missing completely at random, missing at random, or missing not at random) and, where appropriate, multiple imputation will be used with sensitivity analyses comparing imputed results to complete-case results. Variables with greater than 20% missing data will not be used as model inputs but will be reported descriptively. The impact of missing clinical covariates on model performance will be assessed by comparing model accuracy with and without clinical data integration.

Statistical Analysis Plan

The primary analysis will evaluate model diagnostic performance on the held-out validation set using receiver operating characteristic (ROC) curves, with AUC calculated for each disease category and compared against the pre-established threshold of 0.90 using the DeLong method. Ninety-five percent confidence intervals will be reported for all performance metrics.
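As a rough illustration of the AUC metric at the center of the primary analysis, the area under the ROC curve can be computed as a Mann-Whitney rank statistic: the probability that a randomly chosen diseased case receives a higher model score than a randomly chosen healthy case. The sketch below uses toy labels and scores, not study data, and is not the study's DeLong implementation.

```python
# Minimal sketch: AUC as the Mann-Whitney rank statistic.
# Labels and scores are illustrative only; the protocol's DeLong
# comparison against the 0.90 threshold would be layered on top.

def auc(labels, scores):
    """Probability that a random positive scores above a random negative
    (ties counted as half), i.e. the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(auc(labels, scores), 3))  # 0.889 -> below the 0.90 target
```

A model meeting the protocol's threshold would need this statistic, per disease category, to reach 0.90 on the 3,000-image validation set.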
McNemar's test will be used to compare the pattern of errors between the AI model and human readers on discordant cases. Secondary analyses will include between-center performance comparison to assess generalizability across sites, subgroup analyses stratified by sex, age group, disease severity, and capture device, and multivariate analysis to identify factors independently associated with model performance. Equity analysis will use statistical testing to detect performance disparities between demographic subgroups, with a pre-specified criterion that between-group AUC differences remain below 0.05, the threshold defined for a clinically significant disparity. All model errors will be manually reviewed by the ophthalmologist investigators and classified as understandable errors (genuinely ambiguous or borderline cases), serious errors (clearly incorrect diagnoses), or systematic errors (patterns suggesting a consistent weakness in the model). Error analysis results will inform decisions about model iteration or architectural changes.

Ethical Considerations

This study will request a waiver of informed consent on the following grounds: the study poses minimal risk (no intervention, no patient contact, no modification of treatment); obtaining consent is practically impossible given that approximately 15,000 images were collected from thousands of patients over multiple years, many of whom are no longer reachable; complete anonymization renders the data non-identifiable under international data protection standards; the study offers significant social benefit through its potential to improve disease detection across the Costa Rican population; and the approach is consistent with established scientific precedent for large retrospective AI development studies. Patient rights are preserved: any individual may request that their data be excluded.

Data Security

Data will be stored on Google Cloud Platform in a HIPAA-compliant configuration.
Encryption will use AES-256 at rest and TLS 1.3 in transit. Access will require multi-factor authentication. A complete audit trail will log all data access and modification events. Quarterly security audits will be conducted and a documented incident response plan will be maintained. Data will be retained for five years following publication, after which it will be securely destroyed.

Incidental Findings Protocol

If an investigator identifies a previously undiagnosed serious pathology during the labeling process, the investigator will notify the site Principal Investigator, who will use the local encrypted linkage table to identify the patient and contact the original treating physician. The treating physician will then decide whether and how to contact the patient. All findings and actions taken will be documented.

Limitations and Risk Mitigation

The retrospective design means the study depends on historical documentation quality and cannot standardize image capture conditions. Inclusion of only two centers means results may not generalize perfectly to other settings with different equipment or patient populations. The study is specific to the Costa Rican population by design, and findings may not transfer directly to other countries, though the methodology is intended to be replicable across Latin America. Ground truth is established by human experts who are fallible, though this risk is mitigated by the double-reading and adjudication system. Contingency plans are in place: if image volume proves insufficient, additional centers may be recruited; if inter-reader disagreement is excessive, calibration sessions will align diagnostic criteria; if model performance falls below thresholds, the team will iterate with advanced architectures and more aggressive data augmentation; and data security risks are addressed through HIPAA-compliant infrastructure, quarterly audits, and a documented incident response plan.
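The inter-reader agreement statistic named in the quality assurance plan, Cohen's kappa, can be sketched as follows. The reader labels and category names below are illustrative, not study data; the actual calibration threshold for "excessive disagreement" is not specified in this record.

```python
# Sketch of the Cohen's kappa agreement check between two readers.
# Labels are hypothetical examples, not study data.

from collections import Counter

def cohens_kappa(reader_a, reader_b):
    """Chance-corrected agreement between two readers' labels."""
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a = Counter(reader_a)
    freq_b = Counter(reader_b)
    # Expected agreement if both readers labeled independently at
    # their own marginal category rates.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

a = ["glaucoma", "normal", "dr", "normal", "glaucoma", "normal"]
b = ["glaucoma", "normal", "dr", "glaucoma", "glaucoma", "normal"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

Kappa near 1.0 indicates near-perfect agreement; values well below the team's calibration threshold would trigger the alignment sessions described above before labeling continues.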
Scope

This study is limited to AI model development and validation. It does not include clinical implementation. If validation is successful, a subsequent phase would involve a prospective controlled study evaluating the clinical impact of AI-assisted diagnosis, including effects on detection rates, patient management, clinical outcomes, clinician and patient acceptability, and cost-effectiveness. The intended clinical use model positions the AI system as a decision-support tool: the ophthalmologist would always make the final clinical decision, the system would indicate confidence levels for each prediction and flag low-confidence cases for human review, and continuous performance monitoring would be maintained.
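The confidence-interval widths quoted in the Sample Size Assessment can be reproduced with a normal-approximation (Wald) interval for a proportion. This is a back-of-envelope check, not the study's statistical software; the record does not state which interval method the investigators will use.

```python
# Wald 95% confidence interval for a proportion: p +/- z*sqrt(p(1-p)/n).
# Used here only to check the interval widths quoted in the protocol.

import math

def wald_ci(p, n, z=1.96):
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = wald_ci(0.85, 750)   # sensitivity, 750 positive cases
print(f"{lo:.3f}-{hi:.3f}")   # 0.824-0.876, i.e. roughly 82-88%

lo, hi = wald_ci(0.90, 2250)  # specificity, 2,250 negative cases
print(f"{lo:.3f}-{hi:.3f}")   # 0.888-0.912, i.e. roughly 89-91%
```

Both results match the 82-88% and 89-91% figures stated in the sample size section.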

Conditions

Interventions

Type
OTHER
Name
No interventions
Description
This retrospective observational study involves no therapeutic interventions, no treatment modifications, no patient contact, and no comparison groups. It is purely diagnostic technology development and validation using existing historical data.

Timeline

Start date
2026-05-01
Primary completion
2028-05-01
Completion
2029-05-01
First posted
2026-03-27
Last updated
2026-04-01

Locations

2 sites across 1 country: Costa Rica

Source: ClinicalTrials.gov record NCT07497815. Inclusion in this directory is not an endorsement.