Ultrasensitive plasma-based monitoring of tumor burden using machine learning-guided signal enrichment

All inquiries regarding this data set should be directed towards the following contact persons:

Study description

Patients

The study included 63 patients diagnosed with UICC stage III colorectal cancer (CRC), 20 patients with colorectal adenomas, 10 patients with pT1 CRC, and 10 CRC patients with known “high-burden” circulating tumor DNA (ctDNA), for a total of 103 patients. In addition, the study included 45 healthy individuals.

Samples

For CRC patients, DNA from tumor tissue and normal DNA from white blood cells (WBC) were obtained. In addition, cell-free DNA (cfDNA) from plasma samples was available from the patients as indicated in Table 1. For healthy individuals, plasma cfDNA was collected.

Methods

Whole-genome sequencing (WGS) was performed on all collected samples using the Illumina NovaSeq platform. The MRD-EDGE tool was used to assess ctDNA status and level. For a subset of patients (n = 48), droplet digital PCR (ddPCR) was performed on a second aliquot of the plasma sample.

Cohorts and samples (Table 1)

| Cohort | Patients | Sample types | Comment |
| --- | --- | --- | --- |
| Stage III CRC | 48 | Tumor, normal, 2x pre-OP plasma | Used for MRD-EDGE vs. ddPCR comparison |
| Stage III CRC | 15 | Tumor, normal, pre-OP plasma, post-OP plasma | |
| High Burden CRC | 10 | Tumor, normal, high-burden ctDNA plasma | 5 used for training, 5 used in relation to adenoma/pT1 analysis |
| Adenoma | 20 | Tumor, normal, pre-resection plasma | |
| pT1 | 10 | Tumor, normal, pre-resection plasma | |
| Healthy Controls | 45 | Plasma | 5 used for training, 40 used as non-cancer controls |
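
As a quick consistency check, the cohort sizes in Table 1 can be tallied against the totals given in the study description. The snippet below is only an illustrative sketch: the `COHORTS` list and the `total` helper are hypothetical names introduced here, and the numbers are transcribed by hand from Table 1 rather than read from any released file.

```python
# Cohort sizes transcribed by hand from Table 1 (patients per row).
# The labels in parentheses are added here only to tell the two
# Stage III rows apart; they are not official cohort names.
COHORTS = [
    ("Stage III CRC (MRD-EDGE vs. ddPCR)", 48),
    ("Stage III CRC (pre-/post-OP plasma)", 15),
    ("High Burden CRC", 10),
    ("Adenoma", 20),
    ("pT1", 10),
    ("Healthy Controls", 45),
]


def total(prefixes):
    """Sum the patient counts of all rows whose name starts with a prefix."""
    return sum(n for name, n in COHORTS if name.startswith(prefixes))


stage_iii = total("Stage III CRC")
patients = total(("Stage III", "High Burden", "Adenoma", "pT1"))
healthy = total("Healthy")
# Table 1 states that 5 healthy samples were used for model training.
non_cancer_controls = healthy - 5

print(stage_iii, patients, healthy, non_cancer_controls)
```

Under these assumptions, the two stage III rows add up to the 63 stage III patients, all patient cohorts together give 103, and 40 of the 45 healthy individuals remain as non-cancer controls.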

Available supplementary files (Table 2)

| Restricted table number | Table name | Content |
| --- | --- | --- |
| 1 | MRD-EDGE SNV disease-specific model train, validation, test sets and model performance | Info on 5 high-burden and 5 healthy cfDNA samples used for model training. |
| 4 | In silico and experimental mixing studies | Info on 2 high-burden and 2 healthy cfDNA samples used for in silico mixing studies. |
| 5 | Plasma sample sequencing metrics | Various metrics on all plasma samples, including (but not limited to) blood collection tube, library preparation details, sequencing platform, and post-sequencing metrics. |
| 6 | Clinical data for individual patients | Clinical info on CRC patients and patients with colorectal adenomas, including age, gender, stage, MSI status, histology type, and recurrence information. |
| 7 | Tumor-informed MRD-EDGE Z scores | MRD-EDGE Z scores (CNV and SNV) and related details on all plasma samples. |
| 8 | Adenoma histology features | Info on 20 colorectal adenomas, including location, histological subtype, tumor size, and grade of differentiation. |
| 11 | Tumor tissue sequencing metrics | Various metrics on all tumor samples, including (but not limited to) library preparation details, sequencing platform, and post-sequencing metrics. |
| 12 | Normal tissue sequencing metrics | Various metrics on all normal samples, including (but not limited to) library preparation details, sequencing platform, and post-sequencing metrics. |
| 15 | CRC plasma samples used in comparison with ddPCR | Info on 48 CRC patients and plasma samples used for MRD-EDGE vs. ddPCR comparison. Includes information on gender, age, tumor location and size, MSI status, as well as MRD-EDGE and ddPCR ctDNA results. |

Original publication

Widman et al. Ultrasensitive plasma-based monitoring of tumor burden using machine learning-guided signal enrichment

Data access

External researchers (academic or commercial) interested in analyzing the Danish colorectal cancer dataset must contact the Data Access Committee by email at cla@clin.au.dk. Access to clinical data and supplementary information (Table 2) related to the article requires that the data requestor (legal entity) enter into Collaboration and Data Processing Agreements with the Central Denmark Region (the legal entity controlling and responsible for the data). Requests for access to raw sequencing data furthermore require that the purpose of the data re-analysis be approved by The Danish National Committee on Health Research Ethics. Upon reasonable request, the authors, on behalf of the Central Denmark Region, will enter into a collaboration with the data requestor to apply for approval.