NCT07432061

Prediction of Infectious Diseases in LMICs Using Electronic Health Record Data

Status: Completed
Phase: —
Study type: Observational
Enrollment: 1,000 (actual)

Sponsor: Mahidol University · Academic / Other
Sex: All
Age: 18 Years
Healthy volunteers: Not accepted

Summary

Dengue is a rapidly emerging infectious disease in South and Southeast Asia. Definitive diagnosis requires laboratory testing (PCR or antigen testing), which is often unavailable in the settings with the highest incidence. Correctly identifying patients who have dengue, and the small number of them who will progress to severe disease, is important to ensure that appropriate treatment is started promptly.

Existing models use a combination of clinical and laboratory features. A model developed and tested on data from 397 patients admitted to the Hospital for Tropical Diseases in Bangkok in 2013-2014 used Bayesian modelling of laboratory variables (liver and full blood count) and clinical symptoms (including fever, petechiae and bleeding) to distinguish dengue from other febrile illnesses. The resulting model had an AUC of 0.75, which improved to 0.8 when NS1 was included. Sequential Organ Failure Assessment (SOFA) scores, or modified versions of them, use vital sign and blood test (liver, renal and haematology) data and are good indicators of which patients are likely to die. However, they perform less well in moderately severe disease (e.g. predicting the need for ICU admission). These approaches are promising but are constrained by limited generalizability and by their reliance on multiple blood tests and clinical symptoms.

A low-cost, easy-to-use tool able to rapidly diagnose dengue and predict disease severity would be of great value in the region. With modern machine learning methods this is now feasible, and previously identified barriers such as the requirement for large amounts of training data can be overcome. For example, models can be created from large datasets and then optimized for smaller, different datasets (data either from other locations or conditions, or with less input data). We have previously shown that data-driven machine learning algorithms for predicting COVID-19 could generalize across multiple United Kingdom (UK) National Health Service (NHS) Trusts.
Although it was initially trained on data from over 77,000 patients, the model we created requires only vital sign data and a bedside blood count to predict COVID-19 diagnosis in patients presenting at UK hospitals. We have demonstrated the ability to adapt this model to a lower middle-income country (LMIC) setting using data from two Vietnamese hospitals. The adapted models achieved AUROCs of around 0.75 and AUPRCs of around 0.89 (similar to UK sites, where much larger amounts of data were available). Performing "transfer learning", whereby a small subset of UK data was used to support model development in Vietnam, improved performance by 5-10%. We also found that using statistical methods to address missing values can further improve predictive performance by 2-5%. This machine learning model can also serve as a 'baseline model' and be adapted for a new task, i.e. dengue.
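
For readers unfamiliar with the AUC/AUROC figures quoted above, the following is a minimal sketch of how such a value can be computed from predicted risk scores. It is not the study's code: the function name `auroc` and the toy labels and scores are hypothetical, chosen only to illustrate the definition (the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as half).

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = disease present).
    scores: iterable of predicted risk scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positive and 4 negative patients.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
print(auroc(labels, scores))  # 0.875
```

A value of 0.5 corresponds to a model that ranks cases no better than chance, and 1.0 to perfect separation, so the reported values of around 0.75 to 0.8 indicate moderate discrimination.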

Conditions

Interventions

Type	Name	Description
OTHER	No intervention	No intervention

Timeline

Start date: 2024-11-14
Primary completion: 2025-09-15
Completion: 2025-09-15
First posted: 2026-02-25
Last updated: 2026-02-25

Locations

1 site across 1 country: Thailand

Source: ClinicalTrials.gov record NCT07432061. Inclusion in this directory is not an endorsement.