Not Yet Recruiting · NCT07523035
Colorectal Adenoma Cohort
An Ambidirectional Cohort Study on Colorectal Adenoma
- Status
- Not Yet Recruiting
- Phase
- Not applicable
- Study type
- Observational
- Enrollment
- 1,280 (estimated)
- Sponsor
- Second Affiliated Hospital, School of Medicine, Zhejiang University · Academic / Other
- Sex
- All
- Age
- 18 Years – 75 Years
- Healthy volunteers
- Not accepted
Summary
This is an ambidirectional cohort study aiming to develop and validate a risk prediction model for colorectal adenoma recurrence and progression. The study will enroll patients aged 18-75 years who undergo colorectal adenoma resection at the Second Affiliated Hospital of Zhejiang University School of Medicine. A retrospective cohort will include patients treated in the past 10 years with available endoscopic, pathologic, and routine laboratory data. A prospective cohort will be enrolled from the date of ethical approval until December 31, 2030, with collection of epidemiological questionnaire data, lifestyle information, blood and tissue biospecimens, and follow-up outcomes. The primary outcome is adenoma recurrence, with secondary outcomes including advanced adenoma and colorectal cancer. Based on a target of 1,083 evaluable participants (325 events) to ensure adequate model development, and accounting for 15% loss to follow-up, the total planned enrollment is 1,280 participants. The study will validate existing risk models based on traditional adenoma characteristics and establish a novel model incorporating lifestyle factors and systemic inflammatory markers to improve risk stratification and guide surveillance strategies.
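The enrollment figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the record does not say how the 15% allowance was applied, so dividing by (1 - loss rate) is an assumption, and the function name is hypothetical.

```python
import math

def inflate_for_attrition(n_evaluable: int, loss_rate: float) -> int:
    """Smallest enrollment that still leaves n_evaluable participants
    after a fraction loss_rate is lost to follow-up (assumed convention)."""
    return math.ceil(n_evaluable / (1.0 - loss_rate))

# Figures quoted in the summary.
n_evaluable = 1083         # evaluable participants required
n_events = 325             # outcome events required
loss_rate = 0.15           # anticipated loss to follow-up
planned_enrollment = 1280  # total planned enrollment

min_enrollment = inflate_for_attrition(n_evaluable, loss_rate)
implied_event_rate = n_events / n_evaluable
retained_if_planned = math.floor(planned_enrollment * (1.0 - loss_rate))

print(min_enrollment, round(implied_event_rate, 3), retained_if_planned)
```

Under that assumption, the planned enrollment of 1,280 sits slightly above the computed minimum, which leaves a small margin over the 1,083 evaluable participants required.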
Detailed description
This is a single-center, ambidirectional cohort study conducted at the Second Affiliated Hospital of Zhejiang University School of Medicine, aiming to develop and validate a risk prediction model for colorectal adenoma recurrence and progression. The design includes both a retrospective component, which enrolls patients treated in the past 10 years with existing clinical data, and a prospective component, which enrolls patients from the date of ethical approval until December 31, 2030. No sampling frame is applied; all patients meeting the eligibility criteria will be consecutively enrolled until the target sample size of 1,280 participants is achieved. The retrospective cohort will provide preliminary data for model calibration, while the prospective cohort will serve for model development and temporal validation.
This study operates as a patient registry with embedded quality assurance mechanisms. Although the registry is not certified by a third party, internal quality control follows standardized operating procedures aligned with institutional review board requirements and Good Clinical Practice principles for observational studies. A formal quality assurance plan has been established to address data validation and registry procedures. Regular on-site monitoring visits are conducted quarterly by an independent data monitor not involved in patient recruitment or data entry, assessing adherence to the protocol, completeness of case report forms, and consistency with source documents. Annual internal audits are performed by the hospital's clinical research audit unit to evaluate compliance with data management standard operating procedures, patient confidentiality protections, and biospecimen handling protocols. External audits are not planned unless required by the funding agency or regulatory authority.
Data entered into the registry undergo multiple automated and manual checks.
Range checks validate continuous variables against predefined normal or plausible ranges, with out-of-range values flagged for review. Consistency checks verify agreement across fields, such as ensuring that the date of adenoma resection precedes the date of first follow-up colonoscopy and that smoking status changes follow logical sequences. Logical checks apply internal rules, such as the number of adenomas not being negative and follow-up intervals aligning with risk categories. Duplicate detection algorithms identify potential duplicate patient records based on medical record number, date of birth, and name initials. All flagged discrepancies are resolved by querying the responsible study coordinator and verifying against source documents, and corrections are documented with audit trails.
Source data verification is performed on a random sample of 20% of enrolled participants, comparing registry data against external source documents including electronic medical records, paper or electronic case report forms, laboratory information systems, and biospecimen tracking logs. For the retrospective cohort, additional verification compares extracted data against original hospital records. For the prospective cohort, a subset of 10% of participants undergoes full source data verification, with an additional targeted verification for primary outcome events such as adenoma recurrence, advanced adenoma, and colorectal cancer. The acceptable accuracy threshold is set at 98% or higher agreement for critical variables and 95% or higher for non-critical variables.
A comprehensive data dictionary is maintained and version-controlled.
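The four kinds of automated check described above (range, consistency, logical, and duplicate detection) can be pictured with a minimal Python sketch. It is not the registry's implementation: the `Record` fields, the BMI plausibility bounds, and the function names are all hypothetical, chosen only to mirror the examples given in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

@dataclass
class Record:
    # Hypothetical subset of registry fields, for illustration only.
    mrn: str                      # medical record number
    birth_date: date
    initials: str
    resection_date: date
    first_followup_date: Optional[date]
    adenoma_count: int
    bmi: float

BMI_PLAUSIBLE = (10.0, 70.0)  # assumed plausible range, not from the protocol

def check_record(r: Record) -> List[str]:
    """Return a list of flags for one record; an empty list means no issue."""
    flags = []
    low, high = BMI_PLAUSIBLE
    if not low <= r.bmi <= high:                      # range check
        flags.append("range: bmi")
    if (r.first_followup_date is not None
            and r.first_followup_date <= r.resection_date):  # consistency check
        flags.append("consistency: follow-up not after resection")
    if r.adenoma_count < 0:                           # logical check
        flags.append("logical: negative adenoma count")
    return flags

def find_duplicates(records: List[Record]) -> List[Tuple[int, int]]:
    """Pair each record with the first earlier record sharing the same
    medical record number, date of birth, and name initials."""
    seen: Dict[Tuple[str, date, str], int] = {}
    pairs = []
    for i, r in enumerate(records):
        key = (r.mrn, r.birth_date, r.initials.upper())
        if key in seen:
            pairs.append((seen[key], i))
        else:
            seen[key] = i
    return pairs
```

In a workflow like the one described, each returned flag would become a query to the study coordinator rather than an automatic correction.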
For each variable, the dictionary includes the variable name and full label, definition and description, data type, allowed values or valid range, source of the variable, coding schemes where applicable (such as MedDRA version 26.0 or later for medical history and adverse events, WHO Drug Dictionary Global for concomitant medications, and WHO classification of tumors of the digestive system for adenoma morphology), normal ranges for laboratory parameters, missing value codes, and time of collection. The data dictionary is accessible to all study personnel and is updated as needed, with version control and change logs maintained.
Written standard operating procedures have been developed for all key registry operations and analytical activities, and are reviewed annually and updated as needed. Key standard operating procedures cover patient recruitment and enrollment, including procedures for identifying eligible patients, obtaining informed consent, assigning unique participant identifiers, and documenting screening logs. Data collection procedures include instructions for completing case report forms, administering lifestyle questionnaires, extracting data from medical records, and handling paper versus electronic forms. Biospecimen collection, processing, and storage protocols address blood and tissue collection, centrifugation, aliquoting, labeling, temperature monitoring, freezer logs, and chain of custody documentation. Data management procedures cover data entry (with double entry for critical fields), discrepancy resolution, audit trail maintenance, database locking procedures, and backup protocols. Follow-up and outcome ascertainment procedures include scheduling and conducting follow-up colonoscopies, telephone interview scripts for lifestyle updates, and linkage procedures with the Zhejiang Cancer Registry System.
Although the study is not an interventional trial, adverse event reporting procedures require that any serious adverse event occurring during follow-up colonoscopy, such as perforation or bleeding requiring hospitalization, be documented and reported to the institutional review board within 15 days. Finally, change management procedures address protocol amendments, case report form modifications, and database schema changes, including documentation, approval pathways, and communication to study personnel.
The sample size was calculated using the method proposed by Riley et al. (BMJ 2022), as implemented in the 'pmsampsize' R package. The assumptions are an expected model C-statistic of 0.80, an overall outcome event rate (adenoma recurrence or progression within follow-up) of 30%, and 10 candidate predictor variables for model development, selected based on literature review and clinical relevance: adenoma size, number, morphology, family history of colorectal cancer, age, sex, smoking status, body mass index, neutrophil-to-lymphocyte ratio as a systemic inflammatory marker, and dietary fiber intake. The minimum required sample size to simultaneously achieve a shrinkage factor of 0.9 or higher, a calibration slope error of 0.1 or less, and a C-statistic estimation precision (standard error) of ±0.05 or less is 1,083 participants, including at least 325 outcome events. Accounting for a 15% loss to follow-up due to dropout, death, relocation, or withdrawal of consent, the final target enrollment is 1,280 participants. The retrospective cohort is expected to contribute approximately 400 to 500 participants depending on data availability, and the prospective cohort will recruit the remaining participants. Enrollment will continue until the target sample size is reached, with no interim stopping rules.
A comprehensive missing data plan has been developed, as missing data are anticipated in both retrospective and prospective components.
Missingness may be due to non-reporting, unavailability, uninterpretable data, data inconsistency, or out-of-range results. All missing values are coded using standardized missing value codes, and a missing data log is maintained. For the retrospective cohort, variables with more than 20% missingness will not be imputed; instead, such variables will be excluded from model development or analyzed with missingness as a separate category. For variables with 20% or less missingness, multiple imputation by chained equations will be used, assuming data are missing at random, and sensitivity analyses will compare complete-case analysis with imputed results. For the prospective cohort, efforts are made to minimize missing data through standardized data collection protocols, real-time data entry validation, and staff training, with an expected missing rate of less than 10% for core variables. Multiple imputation with 20 imputed datasets will be used, and the imputation model will include auxiliary variables such as age, sex, and baseline adenoma risk category to improve prediction of missing values. If the primary outcome status (adenoma recurrence) is missing due to loss to follow-up, participants will be censored at the time of last known contact, and a sensitivity analysis will assume worst-case and best-case scenarios for missing outcomes to assess potential bias. Where data are suspected to be missing not at random, pattern-mixture models will be explored to test the robustness of findings.
All analyses will be conducted using R version 4.2 or later and Stata version 17 or later. Two-sided statistical tests will be used with a significance level of 0.05, except for model development, where internal validation methods will be emphasized over p-values. No interim analyses are planned. The analysis will follow a pre-specified statistical analysis plan, which will be finalized and time-stamped before any model development analyses are performed.
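As a rough illustration of the 20% missingness rule stated above for the retrospective cohort, the sketch below sorts variables into those eligible for multiple imputation and those to be excluded or handled as a separate category. The column names and the use of `None` as a stand-in for the registry's missing value codes are assumptions; the imputation itself is not shown.

```python
from typing import Dict, List, Optional

IMPUTE_MAX_PERCENT = 20  # threshold quoted in the missing data plan

def triage_missingness(
    columns: Dict[str, List[Optional[float]]]
) -> Dict[str, str]:
    """Label each variable by its share of missing values.

    None stands in for the registry's missing value codes (an assumption).
    Integer arithmetic keeps the 20% boundary exact.
    """
    plan = {}
    for name, values in columns.items():
        n_missing = sum(v is None for v in values)
        if n_missing * 100 <= IMPUTE_MAX_PERCENT * len(values):
            plan[name] = "impute"
        else:
            plan[name] = "exclude_or_separate_category"
    return plan

# Hypothetical example: one variable at exactly 20% missing, one at 50%.
example = {
    "nlr": [2.1, None, 1.8, 3.0, 2.4],
    "fiber_g_per_day": [None, None, 18.0, 22.5],
}
plan = triage_missingness(example)
print(plan)
```

A variable at exactly 20% missingness is treated as eligible for imputation here, matching the wording "20% or less".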
For the primary objective, existing risk models for adenoma recurrence, such as those based on adenoma size, number, and morphology per European Society of Gastrointestinal Endoscopy guidelines, will be validated in the combined retrospective and prospective cohort. Performance will be assessed in terms of discrimination, using the area under the receiver operating characteristic curve with 95% confidence intervals; calibration, using calibration plots, calibration slope, and calibration-in-the-large; and overall prediction error, using the Brier score. If the retrospective cohort shows significant temporal bias due to changes in endoscopic or pathological practices over the past 10 years, validation will be performed in the prospective cohort only.
For the secondary objective of developing a novel risk prediction model, model development will use the prospective cohort data only, with a target of approximately 780 to 880 participants after accounting for retrospective contributions. The primary outcome for model development is time to first adenoma recurrence, including non-advanced and advanced adenomas. Secondary outcomes are time to advanced adenoma and time to colorectal cancer, and the competing risk of death will be considered. The candidate predictors, pre-specified at a maximum of 10, include traditional adenoma characteristics such as size (10 mm or larger versus smaller than 10 mm), number (three or more versus fewer than three), and villous histology (presence versus absence); demographic and lifestyle factors including age, sex, smoking status (current, former, or never), and body mass index as a continuous variable; family history of a first-degree relative with colorectal cancer (yes or no); the neutrophil-to-lymphocyte ratio as a continuous systemic inflammatory marker; and dietary fiber intake in grams per day as a continuous variable.
Continuous predictors will be examined for linearity using restricted cubic splines, and predictors with non-linear relationships will be transformed or categorized based on clinically meaningful cutoffs. A Cox proportional hazards regression model will be used for time-to-event outcomes, and the proportional hazards assumption will be tested using Schoenfeld residuals; if violated, time-dependent coefficients or stratified models will be considered. With 780 participants, an expected event rate of 30% (234 events), and 10 candidate predictors, the events-per-variable ratio is 23.4, exceeding both the recommended minimum of 10 for logistic regression and the more conservative threshold of 20 or higher for Cox models with moderate censoring. To reduce overfitting, the model will be estimated using ridge regression or penalized maximum likelihood. Internal validation will use bootstrap resampling with 200 replicates to estimate the optimism-corrected C-statistic and calibration slope, and a shrinkage factor will be applied to the regression coefficients.
A point-based risk score or nomogram will be derived from the final model, and optimal risk thresholds for recommending different surveillance intervals, such as one-year versus three-year colonoscopy, will be identified using decision curve analysis to maximize net benefit. Subgroup analyses will be performed by age (younger than 60 years versus 60 years or older), sex, and baseline risk category (low versus high). Sensitivity analyses will exclude participants with less than two years of follow-up unless recurrent adenoma occurred. A competing risk analysis using the Fine-Gray subdistribution hazard model will be conducted for advanced adenoma and colorectal cancer outcomes, with death as a competing event.
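Two of the quantities mentioned above lend themselves to a short worked example: the events-per-variable ratio and the net benefit used in decision curve analysis. The sketch below is a simplified illustration, not the study's analysis code. In particular, the net benefit function uses the standard binary-outcome formula and ignores censoring, which a time-to-event analysis would need to handle; the function names and toy data are hypothetical.

```python
from typing import Sequence

def events_per_variable(n: int, event_rate: float, n_predictors: int) -> float:
    """Expected number of outcome events per candidate predictor."""
    return n * event_rate / n_predictors

def net_benefit(y_true: Sequence[int], risk: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating everyone with predicted risk >= threshold.

    Binary-outcome form: TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, risk) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y == 1)
    false_pos = sum(1 for y, _ in flagged if y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)

# Figures quoted in the text: 780 participants, 30% event rate, 10 predictors.
epv = events_per_variable(780, 0.30, 10)
print(round(epv, 1))

# Toy data (hypothetical): observed recurrence and predicted risks.
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.8, 0.2, 0.1]
for t in (0.2, 0.5):
    print(t, round(net_benefit(outcomes, risks, t), 4))
```

Plotting net benefit across a range of thresholds, alongside the "treat all" and "treat none" strategies, gives the decision curve from which surveillance thresholds could be compared.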
For missing data in the primary model development, multiple imputation by chained equations with 20 imputations will be used for missing predictor variables, with imputation performed separately for training and validation sets, and complete-case analysis will be reported as a sensitivity analysis. Finally, an anonymized dataset and analysis code will be made available upon reasonable request after publication of primary results, subject to institutional data sharing policies and patient consent provisions.
Timeline
- Start date
- 2026-04-16
- Primary completion
- 2030-12-31
- Completion
- 2033-12-31
- First posted
- 2026-04-13
- Last updated
- 2026-04-13
Source: ClinicalTrials.gov record NCT07523035. Inclusion in this directory is not an endorsement.