The association of blood ctDNA levels to mutations of marker genes in colorectal cancer

Abstract Background Colorectal cancer (CRC) is a deadly and commonly diagnosed cancer. Cell-free circulating tumor DNA (ctDNA) has been used in the diagnosis and treatment of CRC, but there are open questions about the relationship between ctDNA and CRC. Although mutations of genes detected by ctDNA in CRC have been studied, the quantitative relationship between ctDNA mutations and ctDNA concentration has not been addressed. Aims We hypothesized that there was an association between mutations of genes identified in ctDNA and ctDNA concentration. This study examined this association in a population of CRC patients. Methods In 85 CRC patients, we sampled 282 mutations in 36 genes and conducted an association study, based on a random forest model, between mutations and ctDNA concentrations in all patients. Results This association study showed that mutations in five genes, ALK, PMS2, KDR, MAP2K1, and MSH2, were associated with the ctDNA concentrations in CRC patients' blood samples. Because ctDNA mutations correlate with ctDNA level, we can infer the tumor burden or tumor size from ctDNA mutations, as well as the survival time for prognosis. Conclusion Our findings shed light on the associations between mutations of genes identified in ctDNA and ctDNA concentration in the blood of CRC patients. This discovery provides information regarding the tumor burden or tumor size based on ctDNA mutations.


| INTRODUCTION
Colorectal cancer (CRC) is the fourth most commonly diagnosed cancer but the third most deadly cancer in the world, with more than two million new cases and one million deaths per year worldwide. 1 In routine diagnosis, the genomic characteristics of CRCs are determined from DNA extracted from tumor tissue obtained by biopsy or surgical removal, both of which are invasive. DNA from such a tissue sample is usually not representative of the heterogeneity of the entire tumor because only a limited number of cells from a local area are used. As a complement to tumor tissue genotyping, liquid biopsy allows minimally invasive detection of potential tumor-specific mutations and molecular profiling of their dynamics. Measuring ctDNA also allows for quantitative and qualitative real-time evaluation of body fluids. In addition, a liquid biopsy can be repeated at intervals to monitor response to treatment, the development of drug resistance, and relapse. 2,3 Overall, the identification of ctDNA has prompted extensive research into its use for detecting early disease, relapse, response to therapy, and emerging drug resistance mechanisms. 4 Liquid biopsy can be used for the noninvasive detection of cell-free circulating tumor DNA (ctDNA) or circulating tumor cells. In CRC, ctDNA analysis can be applied to the detection of minimal residual disease (MRD) and to tracking clonal dynamics in response to targeted therapies. 5 Existing studies suggest that the detection of ctDNA in patients with CRC depends on tumor volume. 6 Many studies have suggested that ctDNA level is associated with tumor burden. 7,8 Studies have also found that changes in tumor volume measured by CT imaging are associated with changes in ctDNA levels. 9 Besides serving as a reliable marker of tumor burden, reflecting changes in the volume of disease with treatment or disease progression, ctDNA also carries genetic information. 
10,11 Previous studies showed that ctDNA analysis, which detected genomic changes in RAS, BRAF, ERBB2, MET, and other tumor-related genes associated with resistance to anti-epidermal growth factor receptor (EGFR) therapy, could have higher diagnostic accuracy. 12 In addition, longitudinal monitoring of ctDNA during anti-EGFR therapy revealed that genomic changes in specific genes, mainly genes associated with mitogen-activated protein kinase (MAPK) signaling pathways, occurred as an acquired drug resistance mechanism. ctDNA analysis can also identify predictive biomarkers for immune checkpoint inhibitors, 7 such as mismatch repair gene mutations, microsatellite instability-high phenotypes, and tumor mutation burden. 13,14 A number of prospective clinical trials are underway to assess the impact of targeted drugs on genomic ctDNA changes, or to monitor ctDNA to explore drug resistance biomarkers. 15 Liquid biopsy, including ctDNA and circulating tumor cells, has been used in the diagnosis and treatment of CRC. For example, Yang et al. collected ctDNA mutations and concentrations in 47 early and late CRC patients and analyzed them using targeted sequencing with a panel covering 50 cancer-related genes. Thirty-seven ctDNA mutations were found in 93.6% of all patients. The results showed that TP53, PIK3CA, APC, and EGFR were the most commonly mutated genes. ctDNA concentration in advanced patients was significantly higher than that in early patients, and increased ctDNA concentration was associated with increased tumor size. 16 Previous work showed that the overall concordance rate between ctDNA and matched tissues was 77.2% (78/101). 8 The data confirm that the amplicon-based NGS system can sensitively detect mutant alleles in cell-free DNA (cfDNA). These results suggest that ctDNA may be a novel diagnostic biomarker for monitoring mutation status and changes in tumor burden in patients with mCRC. 
8 Liquid biopsy by ctDNA sequencing has great potential for early detection and postoperative monitoring of CRC. DNA from colon cancer tissue is released into the blood more easily than DNA from rectal cancer tissue. 17 As a sensitive biomarker, ctDNA shows great potential in monitoring the response to multiple treatment modalities and targeted therapies for CRC. 7 The ctDNA level correlates moderately with tumor burden, with reported correlation coefficients of around 0.5-0.7. 7,8,18 Because this correlation is only moderate, other factors, such as mutations in cancer cell genomes, are likely to affect the ctDNA level as well. Although ctDNA mutations in CRC have been studied, the quantitative relationship between ctDNA mutations and ctDNA concentration has not been addressed. We hypothesized that there is an association between ctDNA mutations and ctDNA concentration. We sampled 282 mutations in 36 genes across 85 patients and conducted a statistical analysis of the mutations and ctDNA concentrations of these patients. The association study showed that mutations in five genes, ALK, PMS2, KDR, MAP2K1, and MSH2, are associated with ctDNA concentration. Associations between blood ctDNA levels and therapeutic response, tumor burden, or tumor size have been reported previously. 7-9,18 Therefore, because mutations in ctDNA are associated with ctDNA level, it may be possible to infer tumor burden or tumor size, and even survival time for prognosis, from gene mutations in ctDNA. A large tumor burden and a high ctDNA level indicate shorter overall survival. 19-21 The association we found between gene mutations and ctDNA levels provides new insight into the study of ctDNA and may offer hints about what causes ctDNA levels to change.  Table 1 and their details are listed in the Dataset S1.

| RESULTS
Tumor tissues were collected at surgery, and blood samples (n = 85) were collected before surgery. Then, ctDNAs of 36 CRC marker genes were amplified and sequenced. Single nucleotide variants (SNVs) in these ctDNAs were called and validated against the corresponding DNA obtained from tumor tissues. The ctDNA sequencing dataset contains 282 mutations in 36 genes across 85 samples (Dataset S1). The objective is to identify mutations that could be used as a prognostic marker for ctDNA. The distribution of patient blood ctDNA levels is shown in Figure 1A. Most ctDNA concentrations are below 100 ng/μl and the distribution is skewed. Therefore, we log-transformed all ctDNA concentrations (Figure 1B) so that they approximately conform to normality. Distribution fitting analysis showed that the log-transformed ctDNA values approximately follow a normal distribution.
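The effect of the log-transform on a right-skewed sample can be illustrated with a minimal sketch. The concentrations below are hypothetical values chosen for illustration, not the study data, and the natural logarithm is an assumption, since the base of the logarithm is not stated here. The helper names `skewness` and `log_transform` are likewise illustrative.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


def log_transform(concentrations):
    """Natural-log transform; every concentration must be positive."""
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive")
    return [math.log(c) for c in concentrations]


# Hypothetical right-skewed ctDNA concentrations in ng/ul (NOT the study data).
example_conc = [3.2, 5.1, 7.8, 9.4, 12.0, 15.5, 21.3, 30.8, 48.9, 95.0, 240.0]
example_log = log_transform(example_conc)

raw_skew = skewness(example_conc)
log_skew = skewness(example_log)
print(f"skewness before: {raw_skew:.2f}, after log-transform: {log_skew:.2f}")
```

A skewness closer to zero after the transform is only a rough indication of symmetry; a formal goodness-of-fit check, as described for the distribution fitting analysis, would still be needed.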
We used a random forest regression model to study the association between SNPs in tumor genes and ctDNA concentration.
SNVs with minor allele frequency (MAF) < 0.01 were excluded so that only SNPs were retained, and the remaining SNPs with pairwise correlation coefficients > 0.8 were grouped together. For model building and validation, we randomly split the whole dataset into a 60% training set and a 40% test set and repeated this step 1000 times. For the random forest regression, we set the maximum number of features to 5, that is, five mutations, and the number of independent trees to 100.
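The repeated 60%/40% validation scheme with these random forest settings can be sketched as follows. This is an illustrative outline using scikit-learn, not the authors' code: the synthetic data, the function name `repeated_split_mse`, the seeds, and the reading of "maximum features" as scikit-learn's `max_features` parameter are all assumptions. The demonstration uses 10 repeats instead of 1000 to keep the run short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_split_mse(X, y, n_repeats=1000, test_size=0.4, seed=0):
    """Fit a random forest on repeated random splits; return the test MSEs."""
    mses = []
    for i in range(n_repeats):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed + i
        )
        model = RandomForestRegressor(
            n_estimators=100,  # number of independent trees
            max_features=5,    # assumed meaning: features tried at each split
            random_state=seed + i,
        )
        model.fit(X_train, y_train)
        mses.append(mean_squared_error(y_test, model.predict(X_test)))
    return np.array(mses)


# Synthetic stand-in data (NOT the study data): 85 patients, 30 binary mutations,
# with a log-scale response driven by the first two mutations plus noise.
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 2, size=(85, 30))
y_demo = 3.0 + 1.5 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(0, 0.5, size=85)

demo_mses = repeated_split_mse(X_demo, y_demo, n_repeats=10)
print(f"mean test MSE over {len(demo_mses)} splits: {demo_mses.mean():.3f}")
```

Repeating the split many times gives a distribution of test errors rather than a single estimate, which matters with only 85 samples because any one 40% test set is small.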
The average values and distribution of mean square errors (MSE) and SHAP values were calculated. Figure

| DISCUSSION
Information on ctDNA has many potential applications, including early detection, monitoring for early recurrence, molecular profiling, and therapeutic response prediction. 4 ctDNA may be used to inform clinical decision-making using both tumor-informed and tumor-agnostic platforms. 4 ctDNA has been studied in CRC, for example, in the analysis of circulating tumor DNA to monitor disease burden following colorectal cancer surgery 25 and in the analysis of ctDNA in patients with stages I to III colorectal cancer. 26 The ctDNA concentration itself can provide much useful information. For example, there is a hypothesis that ctDNA concentration during therapy is associated with long-term survival. 27 There are ongoing studies to develop dynamic changes in ctDNA concentrations as a potential surrogate end-point of clinical efficacy in patients undergoing adjuvant immunotherapy. 28 Because the ctDNA concentration is useful in cancer studies, the association between ctDNA concentration and tumor genomes needs to be investigated.
Patients' H&E-stained histopathological images of tumor tissues and cfDNA concentrations. A0 and B0 are H&E-stained histopathological images (magnification ×40) for patients P20190804_008 and P20190609_062, respectively. A1 and B1 are Agilent 4200 TapeStation images for cfDNAs from patients P20190804_008 and P20190609_062, respectively. In A1 and B1, the left and right outermost peaks are internal standards, while the middle peak of 200 bp is the cfDNA concentration.
The level of ctDNA in the serum of individuals with cancer was higher than that of the healthy group. 29 ctDNA carries the same specific mutations as the corresponding cancer cells, including single nucleotide mutations, structural mutations, and DNA methylation. 30 A study showed that the ctDNA concentration in early-stage patients was significantly lower than that in late-stage patients and that the ctDNA concentration was positively correlated with tumor size. 16 It has been discovered that tumor genomic mutations, such as SNV 31 and copy number variants, 32  ies. 35,36 ALK gene copy number gain was found in some CRC tumors, and increased ALK gene copy number was associated with poor prognosis. 35,36 The gain of ALK gene copy number may have a role in resistance to anti-EGFR therapy through cross-talk of signaling pathways. 36 The reason is that mutations in ALK are associated with cell proliferation, resistance to apoptosis, and enhanced DNA synthesis. 37,38 Because ALK gene somatic mutation is associated with ctDNA level, measuring the ctDNA level can help clinicians decide whether ALK-targeted therapy, such as crizotinib or ceritinib, should be given to patients.
MSH2 and PMS2 mutations were found in ctDNAs 39 in different cancer patients, including CRC. 40 It has been shown that MSH2 and PMS2 alterations are associated with high microsatellite instability in tumor genomes. 39 Microsatellite instability is found in 10% to 15% of CRCs. 41 48 Since cell death is related to ctDNA levels, 23,24 it is reasonable that mutations of PMS2 are associated with ctDNA levels.
The MAP2K1 gene encodes MEK1 protein kinase, which is involved in the RAS/MAPK signaling pathway. The RAS/MAPK signaling pathway regulates cell growth, proliferation, differentiation, migration, and apoptosis. 49 MEK1 protein kinase appears to be essential for normal development before birth and for survival after birth, but MAP2K1 mutations were observed in many human epithelial cancers, including esophageal cancer, gastric cancer, breast cancer, and CRC. 50 Table 1 and their details are listed in the Dataset S1.

| Extraction and sequencing of cell-free nucleic acids
Tumor tissue was collected at surgery and blood samples (n = 85) were collected before surgery. In accordance with the IRB protocol for this study, approximately 5-10 ml of blood was collected per sample and transferred into EDTA-coated tubes. All blood samples were processed immediately or within 1 day after storage at 4 °C.
Plasma was separated from blood samples by centrifugation at 1600 × g for 10 min at 4 °C, and a second centrifugation step was performed at 18 000 × g at room temperature to remove any remaining cellular debris. The obtained library was amplified and purified. Multiple libraries were merged for amplification. After amplification, the products were purified with magnetic beads. Finally, sequencing was conducted.

| Analysis and quality control of sequencing data
After sequencing and base-calling, the raw fastq data were processed with in-house quality control software to remove low-quality reads and then aligned to the reference human genome (hs37d5) using the Burrows-Wheeler Aligner (BWA); 53 duplicate reads were marked using Sambamba. 54 The raw fastq data were submitted to the National Omics Data Encyclopedia (NODE) (https://www.biosino.org/node/) under project ID OEP001279. Please see the Data Availability and Materials sections for more details.
SNVs and InDels were called with GATK. 55 The raw calls of SNVs and InDels were further filtered with the following inclusion thresh-

| Experimental validation
We conducted small-scale experiments comparing the gene mutations in patients' primary tumors with those identified in ctDNA, to validate whether mutations discovered in ctDNA are also present in the primary tumor tissues. Mutations in primary tumor samples were validated by Sanger sequencing after DNA amplification by polymerase chain reaction (PCR) (Figure S1).

| Data analysis and model construction
Candidate distributions were fitted to the measured ctDNA amounts of all patients, and diagnostic analysis of the fitted models was performed. The mathStatica toolkit was used for distribution parameter calculation and distribution fitting, and R was used for fit diagnostics.
For the model of the association study, the preprocessing contains two steps to ensure the quality of the data used for the modeling process. SNVs with MAF smaller than 0.01 were dropped to target SNPs only. 56 In the second step, SNPs with correlations larger than 0.8 were grouped together and one mutation from each cluster was picked
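The two preprocessing steps can be sketched in plain Python. This is a minimal illustration under several assumptions that the text does not specify: variants are coded as 0/1 carrier flags per patient, MAF is approximated as the frequency of the rarer state among patients, correlation is the absolute Pearson coefficient, and groups are formed greedily with the first-seen variant kept as representative. The function names and the toy variants are hypothetical.

```python
import math


def minor_allele_frequency(flags):
    """Frequency of the rarer state in a 0/1 carrier vector (a simplification)."""
    freq = sum(flags) / len(flags)
    return min(freq, 1.0 - freq)


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def filter_and_group(variants, maf_min=0.01, corr_max=0.8):
    """Drop rare variants, then keep one representative per correlated group.

    variants: dict mapping variant name -> list of 0/1 flags, one per patient.
    Returns representative names in first-seen order (greedy grouping).
    """
    kept = {name: flags for name, flags in variants.items()
            if minor_allele_frequency(flags) >= maf_min}
    representatives = []
    for name, flags in kept.items():
        # A variant joins an existing group if it correlates above the
        # threshold with that group's representative; otherwise it starts one.
        if all(abs(pearson(flags, kept[r])) <= corr_max for r in representatives):
            representatives.append(name)
    return representatives


# Toy example with hypothetical variants across eight patients.
demo = {
    "var_a": [1, 1, 0, 0, 1, 0, 1, 0],
    "var_b": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to var_a, so same group
    "var_c": [0, 1, 1, 0, 0, 1, 0, 1],
    "var_d": [0, 0, 0, 0, 0, 0, 0, 0],  # never observed, fails the MAF filter
}
selected = filter_and_group(demo)
print(selected)
```

Greedy grouping depends on the order in which variants are visited; a clustering method with an explicit linkage rule would be an alternative if order independence matters.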

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in the online repository. The name of the repository is NODE (The National Omics Data Encyclopedia) (https://www.biosino.org/node/), and the project ID is OEP001279. The raw next-generation sequencing data in fastq format can be downloaded by the following link: https://www.biosino.

ETHICS STATEMENT
Informed consent was voluntarily obtained from the participants who had been fully informed of the study including any of the benefits and