Active Not Recruiting · NCT07519538
Southern Sichuan HIV Cohort Study Protocol
Multi-source Data-integrated HIV Cohort in Four Cities of Southern Sichuan, China: a Mixed Retrospective-prospective Study Protocol
- Status: Active Not Recruiting
- Phase: Not applicable
- Study type: Observational
- Enrollment: 10 (estimated)
- Sponsor: Southwest Medical University · Academic / Other
- Sex: All
- Age: 18 Years – 85 Years
- Healthy volunteers: Not accepted
Summary
Multi-source data-integrated HIV cohort in four cities of southern Sichuan, China: a mixed retrospective-prospective study protocol.
Detailed description
This study is a mixed retrospective-prospective cohort study conducted across four prefecture-level cities in southern Sichuan Province, China: Luzhou, Zigong, Neijiang, and Yibin. These four cities together comprise 23 districts and counties, with HIV care delivered through a network of designated treatment institutions coordinated by local Centers for Disease Control and Prevention (CDC). The region exhibits substantial intra-regional variation in healthcare resources and HIV burden, with molecular epidemiological data showing predominant CRF01_AE (36.9%) and CRF07_BC (41.0%) subtypes and evidence of concentrated transmission clusters among older adults (aged ≥60 years).
Retrospective component
Data are extracted from existing electronic medical records (EMRs), CDC surveillance registries, and pharmacy/insurance claims systems for the period from 1 January 2019 to 31 December 2025. This phase reconstructs individual-level historical trajectories of antiretroviral therapy (ART) engagement, laboratory monitoring (CD4, viral load), and clinical outcomes to establish the baseline cohort.
Prospective component
From 1 January 2026 to 31 December 2026, newly diagnosed individuals and those transferring into care are enrolled consecutively with quota controls by city and age group (<50, 50-64, ≥65 years). At scheduled clinic visits, trained interviewers administer tablet-based structured questionnaires capturing patient-reported outcomes, including health-related quality of life (MOS-HIV, EQ-5D-5L), mental health (PHQ-9, GAD-7), HIV stigma (Berger Stigma Scale), self-management capacity, social determinants of health, and treatment preferences via a discrete choice experiment (5 attributes, 9 choice sets). Clinical data continue to be collected from routine sources.
Data integration
A deterministic linkage procedure uses the unique ART treatment code as the primary key, with probabilistic linkage for records where the code is missing.
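The two-stage linkage just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the field names (`art_code`, `name`, `birth_date`, `sex`), the agreement weights, and the match cut-off are all hypothetical, and the fallback is a simplified field-agreement score standing in for a full probabilistic linkage model.

```python
from typing import Optional

# Hypothetical agreement weights for the fallback score (illustrative only).
WEIGHTS = {"name": 4.0, "birth_date": 5.0, "sex": 1.0}
MATCH_CUTOFF = 8.0  # assumed cut-off, not taken from the protocol


def fallback_score(a: dict, b: dict) -> float:
    """Sum the weights of identifying fields on which two records agree."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )


def link_record(record: dict, registry: list) -> tuple:
    """Return (matched registry row or None, linkage method) for one record."""
    code = record.get("art_code")
    if code:
        # Stage 1: deterministic match on the ART treatment code.
        for row in registry:
            if row.get("art_code") == code:
                return row, "deterministic"
        return None, "unlinked"
    # Stage 2: the code is missing, so fall back to the agreement score.
    best: Optional[dict] = max(
        registry, key=lambda row: fallback_score(record, row), default=None
    )
    if best is not None and fallback_score(record, best) >= MATCH_CUTOFF:
        return best, "probabilistic"
    return None, "unlinked"


def linkage_failure_rate(records: list, registry: list) -> float:
    """Share of records that could not be linked by either stage."""
    failures = sum(1 for r in records if link_record(r, registry)[0] is None)
    return failures / len(records)
```

A batch whose `linkage_failure_rate` exceeds the protocol's 5% threshold would be routed to manual verification.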
A linkage failure rate above 5% triggers manual verification. All data are harmonized (ICD-10 diagnoses, YPID medication codes, standardized laboratory units) and stored on encrypted servers with restricted access.
Sample size
Target enrolment is 2,000 participants across the four cities, accounting for an anticipated 15% loss to follow-up. This ensures ≥1,700 participants with complete 12-month outcome data (2,000 × 0.85), providing ≥200 virological failure events (1,700 × 0.12 = 204, assuming a 12% failure rate) to satisfy the "10 events per variable" rule for multivariable analyses with up to about 20 candidate variables.
Primary analysis
The primary outcome is 12-month virological suppression (HIV RNA <50 copies/mL). Logistic regression with stepwise variable selection (Akaike Information Criterion) is used, reporting adjusted odds ratios and 95% confidence intervals. Secondary analyses include time-to-event analysis for virological failure (≥1,000 copies/mL on two consecutive measures) using Cox proportional hazards models, with verification of the proportional hazards assumption.
Dynamic trajectory modeling
Hidden Markov Models (HMM) are applied to repeated viral load and CD4 measurements to characterize unobserved latent states (e.g., stable suppression, immunological fluctuation, failure) and transition probabilities over time. Self-management capacity and socioeconomic status are included as time-invariant covariates.
Machine learning prediction
Models are developed to predict 12-month virological failure (VL ≥50 copies/mL). Feature selection uses LASSO regression on a training set (70% of data). Four algorithms are compared: logistic regression (benchmark), Random Forest, XGBoost, and a stacked ensemble. Hyperparameter tuning uses grid search with five-fold cross-validation on a validation set (10%). Performance is evaluated on a held-out test set (20%) using AUC-ROC, Brier score, and calibration curves. Synthetic minority oversampling (SMOTE) is applied to the training set to address class imbalance.
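The 70/10/20 partition described above can be sketched with a stratified split, so that each partition keeps roughly the same failure rate. This is an illustrative sketch only: the function name `stratified_split`, the seed, and the simulated outcome vector (1,700 participants with a 12% event rate, taken from the sample-size assumptions) are assumptions, and the protocol does not state that stratification is used.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split row indices into train/validation/test within each outcome class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the held-out test set
    return train, val, test


# Simulated outcomes (hypothetical): 204 failures among 1,700 participants.
labels = [1] * 204 + [0] * 1496
train, val, test = stratified_split(labels)

# Any oversampling (such as SMOTE) would be fitted on `train` only, so that
# the validation and test partitions keep the original class balance.
```

Splitting before oversampling matters here: synthetic minority cases generated from the full dataset would leak information into the test set and inflate the reported performance.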
Model interpretability is enhanced with SHAP values.
Missing data
Multiple imputation by chained equations (MICE) is performed for variables with >10% missingness under the missing at random assumption, generating five imputed datasets combined using Rubin's rules. Complete-case analysis serves as a sensitivity comparison.
Sensitivity analyses
Alternative viral suppression thresholds (<200 copies/mL and <1,000 copies/mL), different loss-to-follow-up handling (inverse probability of censoring weighting), and restriction to participants with at least two viral load measurements are conducted to assess robustness of findings.
Ethics and dissemination
The study has received approval from the Institutional Review Board of Southwest Medical University (approval number SWMUIRBKS-202509-0017) and is conducted in accordance with the Declaration of Helsinki. Written informed consent is obtained from all prospective participants (or legally authorized representatives for those with cognitive impairment). The study is registered with the Chinese Clinical Trial Registry (ChiCTR) prior to enrolment of the first prospective participant. Results will be published in peer-reviewed journals and presented at scientific conferences. De-identified analytical datasets will be made available upon reasonable request, subject to approval from the data governance committee and relevant ethics approvals.
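For the missing-data step described above, combining results from the five imputed datasets with Rubin's rules can be sketched as follows. The function name `pool_rubin` and the example inputs are hypothetical. The sketch pools a single coefficient (for instance a log odds ratio) and returns the pooled estimate with its total variance, leaving out the degrees-of-freedom adjustment used for confidence intervals.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    """
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios and squared standard errors from five imputations.
pooled, total_var = pool_rubin(
    [0.4, 0.5, 0.6, 0.5, 0.5],
    [0.04, 0.04, 0.04, 0.04, 0.04],
)
pooled_se = total_var ** 0.5
```

The `(1 + 1 / m)` factor inflates the between-imputation variance to account for using a finite number of imputations, which is why the pooled standard error is larger than any single-dataset standard error when the estimates disagree.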
Conditions
Timeline
- Start date: 2026-04-03
- Primary completion: 2027-04-30
- Completion: 2027-06-30
- First posted: 2026-04-09
- Last updated: 2026-04-09
Locations
1 site across 1 country: China
Source: ClinicalTrials.gov record NCT07519538. Inclusion in this directory is not an endorsement.