Liquid Biopsy in Gastric Cancer: Analysis of Somatic Cancer Tissue Mutations in Plasma Cell-Free DNA for Predicting Disease State and Patient Survival

Introduction: Late-stage diagnosis and high mortality rates are the main problems in gastric cancer (GC) and call for new noninvasive molecular tools. We aimed to assess somatic mutational profiles in GC tissue and plasma cell-free DNA (cfDNA), evaluate their concordance rate, and analyze the role of multilayer molecular profiling in predicting disease state and prognosis. Methods: A treatment-naive GC patient group (n = 29) was selected. Whole exome sequencing (WES) of GC tissue was performed, and a unique 38-gene panel for deep targeted sequencing of plasma cfDNA was developed. Oncoproteins were measured by enzyme-linked immunosorbent assay, and other variables such as tumor mutational burden and microsatellite instability were evaluated using WES data. Results: The yield of cfDNA was increased 43.6-fold, and the integrity of fragments was decreased, in GC compared with controls. Comparison of the mutational profiles of cancerous tissue (WES) and plasma cfDNA (targeted sequencing) revealed 47.8% concordance. An increased quantity of GC tissue-derived alterations detected in cfDNA was associated with worse patient survival. Analysis of the importance of multilayer variables and the receiver operating characteristic curve showed that the combination of 2 analytes, (i) the quantity of tissue matching alterations and (ii) the presence of any somatic alteration in plasma cfDNA, resulted in an area under the curve of 0.744 when discriminating patients with or without distant metastasis. Furthermore, cfDNA sequence alterations derived from tumor tissue were detected even in patients who had relatively small GC tumors (T1-T2). Discussion: Our results indicate that quantitative and qualitative cfDNA mutational profile analysis is a promising tool for evaluating GC disease status or poorer prognosis.


INTRODUCTION
Gastric cancer (GC) is one of the most common and lethal oncological diseases of the gastrointestinal tract worldwide, largely because its asymptomatic course means that it is usually diagnosed at an advanced stage (1). It is a complex disease arising from the interaction of environmental and host-associated factors (2,3), and conventional diagnostic techniques and current molecular biomarkers have a very limited role in the early diagnosis of GC (4,5). Thus, minimally invasive biomarkers that would help to determine specific molecular spectra for diagnostic and prognostic purposes are highly needed.
Improving technologies have enabled a more comprehensive molecular analysis of the body fluids of patients with cancer and have revealed that circulating tumor-derived molecules could provide multilayer molecular information suitable for cancer diagnostics, prognosis, or even prediction of response to therapy (6-8). The currently available studies analyzing circulating tumor DNA (ctDNA) alterations in GC focus on a limited number of well-known oncogenes such as TP53 (6) and HER2 (9-11). On the other hand, studies implementing high-throughput technologies such as next-generation sequencing (NGS) are still very scarce and have been mostly conducted in Asian populations (12-14).
In this study, using whole exome sequencing (WES) of cancer tissue, we developed a custom 38-gene panel and performed deep targeted sequencing of cfDNA in plasma samples. We were able to identify somatic alterations in cfDNA in a solid proportion of the patients with GC, including patients with early disease stages. Moreover, we performed multicomponent analysis for GC using machine learning on various analytes including cfDNA and oncoproteins. Our study suggests that qualitative and quantitative analysis of somatic variants in plasma cfDNA might be a promising approach for discriminating patients based on disease state and even for predicting survival.

Patient samples
Treatment-naive GC patients (n = 29) were recruited at the Department of Gastroenterology, Lithuanian University of Health Sciences Hospital during the period of 2015-2018. Clinical and demographic characteristics of patients are summarized in Figure 1 (see also Supplementary Table 1, Supplemental Digital Content 1, http://links.lww.com/CTG/A679). Paired tissue and plasma samples were collected at the same time point. Tumor tissue samples were obtained from the primary lesion during gastroscopy or surgical tumor removal. Peripheral blood was collected using K2EDTA tubes (10 mL; Becton, Dickinson and Company, Franklin Lakes, NJ) for cfDNA extraction (double centrifugation protocol within 2 hours of blood draw) and serum separator tubes (5 mL; Becton, Dickinson and Company) for serum separation. The control group (n = 20) consisted of self-reported healthy subjects without a history of cancer. All subjects provided written informed consent. Research was approved by the Kaunas Regional Biomedical Research Ethics Committee (No. BE-2-10, May 8, 2011, and No. BE-2-31, June 5, 2018, Kaunas, Lithuania).

Isolation of nucleic acids
Genomic DNA (gDNA) from the primary GC lesion was isolated using the AllPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany), and gDNA from white blood cells (WBC) was isolated using the salting-out method. Total circulating nucleic acids from plasma were extracted using the QIAamp Circulating Nucleic Acid isolation kit (Qiagen). All isolations of nucleic acids were performed according to the manufacturers' protocols. cfDNA yield and fragment size were evaluated using the TapeStation 2200 system (Agilent Technologies, Santa Clara, CA). The tumor cfDNA fraction was calculated as the mean mutant allele frequency (MAF) in each patient's plasma sample. The GATK Best Practices paired-sample workflow (15) for somatic short variant discovery was used for the GC tissue exome analysis (human genome reference build hg19). Variants were called using GATK4 Mutect2 and annotated using Ensembl-VeP (v96.0) (16). Microsatellite instability (MSI) from WES data was evaluated using MSIsensor (17). Tumor mutational burden (TMB) was defined as the quantity of somatic mutations in the coding region per megabase (Mb) (18).
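The two derived quantities defined above (tumor cfDNA fraction as mean MAF, and TMB as coding somatic mutations per Mb) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input values, and the choice to return 0.0 for a sample without detected variants are assumptions made for the example.

```python
from statistics import mean


def tumor_mutational_burden(n_coding_somatic_mutations, coding_region_bp):
    """Somatic coding mutations per megabase (Mb) of sequenced coding region."""
    if coding_region_bp <= 0:
        raise ValueError("coding_region_bp must be positive")
    return n_coding_somatic_mutations / (coding_region_bp / 1_000_000)


def tumor_cfdna_fraction(mutant_allele_frequencies):
    """Mean mutant allele frequency (MAF) over the variants of one plasma sample.

    Returning 0.0 when no variant was detected is an assumption of this sketch.
    """
    if not mutant_allele_frequencies:
        return 0.0
    return mean(mutant_allele_frequencies)


# Hypothetical inputs: 150 coding mutations over a 30 Mb coding region,
# and three plasma variants with allele frequencies of 2%, 4%, and 6%.
tmb = tumor_mutational_burden(150, 30_000_000)
fraction = tumor_cfdna_fraction([0.02, 0.04, 0.06])
print(f"TMB = {tmb:.1f} mutations/Mb, tumor cfDNA fraction = {fraction:.2%}")
```

Note that MAF here denotes mutant allele frequency, as in the text, not minor allele frequency.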
Filtering of somatic variants and selection of GC-related genes for the custom cfDNA targeted sequencing panel was performed using the following criteria: (i) prevalence of the mutation in the general population <1%; (ii) protein coding nonsynonymous, annotated as having high impact; (iii) Combined Annotation Dependent Depletion score >30; (iv) excluding variants that are present in 100% of the samples; and (v) variant supported with coverage ≥2 in both forward and reverse directions.
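The five filtering criteria can be expressed as a single predicate over an annotated variant record. The sketch below is illustrative only: the `Variant` fields, the `"HIGH"` impact label, and the example records with placeholder gene names are hypothetical and do not reflect the actual annotation format used in the study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    population_af: float    # allele frequency in the general population (0-1)
    nonsynonymous: bool     # protein-coding nonsynonymous change
    impact: str             # annotated impact, e.g. "HIGH" (assumed label)
    cadd_score: float       # Combined Annotation Dependent Depletion score
    sample_fraction: float  # fraction of cohort samples carrying the variant
    fwd_support: int        # supporting reads, forward direction
    rev_support: int        # supporting reads, reverse direction


def passes_filters(v):
    """Apply criteria (i)-(v) from the text to one variant."""
    return (
        v.population_af < 0.01                          # (i)
        and v.nonsynonymous and v.impact == "HIGH"      # (ii)
        and v.cadd_score > 30                           # (iii)
        and v.sample_fraction < 1.0                     # (iv)
        and v.fwd_support >= 2 and v.rev_support >= 2   # (v)
    )


# Hypothetical records with placeholder gene names.
candidates = [
    Variant("GENE_A", 0.001, True, "HIGH", 35.0, 0.10, 4, 3),  # kept
    Variant("GENE_B", 0.050, True, "HIGH", 40.0, 0.20, 5, 5),  # too common
    Variant("GENE_C", 0.001, True, "HIGH", 33.0, 1.00, 6, 6),  # in all samples
    Variant("GENE_D", 0.001, True, "HIGH", 36.0, 0.05, 1, 7),  # strand support
]
kept_genes = [v.gene for v in candidates if passes_filters(v)]
print(kept_genes)
```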

Statistical analysis
Statistical analysis and data visualization were performed using RStudio (R version 3.3.3). Total cfDNA yield was compared using the 2-sided t test or the Mann-Whitney U test, depending on the data distribution. Correlation analysis was performed using the Spearman rank-order correlation. Multivariate comparison was performed using ANOVA, and 2 groups were compared using χ2 or Fisher exact tests (2-sided). MAF analysis was conducted using the maftools package (Bioconductor) (21). Gene list pathway enrichment analysis was performed using the PANTHER Gene List Analysis tool (22). Random forest analysis of the prediction variables' importance was performed using the Boruta and randomForest packages (23,24). Survival analysis was performed using the Kaplan-Meier method and the Cox proportional hazards model.
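To make the Kaplan-Meier step concrete, the following is a minimal pure-Python sketch of the product-limit estimator and of the median survival time read off the resulting curve. The study itself used R; the function names and the follow-up data here are hypothetical, and the median is taken simply as the first event time at which estimated survival is at or below 0.5.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up in days
    events : 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First event time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data (days); the third patient is censored.
times = [100, 200, 300, 400, 500]
events = [1, 1, 0, 1, 1]
curve = kaplan_meier(times, events)
print(median_survival(curve))  # 400
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve drops only at the four observed deaths.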

Total cfDNA yield and size of the fragments differ between GC cases and controls
Total cfDNA yield (fragments from 100 to 1,000 bp, Figure 2a) was compared with GC clinical features and patients' characteristics. As expected, a significantly higher amount of total cfDNA was detected in patients with GC (87.59 ng per ml of plasma) compared with controls (2.01 ng per ml of plasma) (W = 0, P = 7.07 × 10^-14) (Figure 2b). Moreover, the analysis of total cfDNA yield revealed a significant positive correlation with serum CEA levels.
Clinical and Translational Gastroenterology VOLUME 12 | SEPTEMBER 2021 www.clintranslgastro.com
Next, we compared the quantity of tissue matching somatic variants in plasma cfDNA with GC clinical features and analyzed its correlation with total cfDNA yield, serum level of oncoproteins, and age. In concordance with the literature (25,26), the quantity of unique somatic alterations detected in tissue and plasma and the quantity of tissue matching alterations in plasma showed a moderate positive correlation with age (tissue: R = 0.47, P = 0.012; plasma: R = 0.4, P = 0.035; matching variants: R = 0.38, P = 0.048 [see Supplementary Figure 4, Supplemental Digital Content 8, http://links.lww.com/CTG/A686]). Our analysis revealed that cfDNA sequence alterations derived from tumor tissue were detected significantly more often in samples of the patients with larger tumors (T3-T4: 55.6% and T1-T2: 10.0%, χ2 = 5.59, P = 0.018) (Figure 6a) and, although not significantly, more often in patients with distant metastasis (45.5% and 37.5%, M1 and M0, respectively, χ2 = 0.17, P = 0.679) (Figure 6b). Survival analysis showed that patients without sequence alterations in cfDNA had a median survival time (MST) of 803 days, whereas MST for patients with 1-2 cfDNA sequence alterations was 469 days. MST for patients with 3-6 cfDNA sequence alterations and more than 6 cfDNA sequence alterations was 315 and 44 days, respectively (P = 0.008) (Figure 6c).
In addition, a Cox proportional hazards model was used for the survival analysis. The model included not only tissue matching somatic variants detected in plasma but also patients' demographics and tumor characteristics: age, gender, and size of the primary tumor based on tumor-node-metastasis staging. The results showed a slight effect of gender on survival (padj = 0.0410) and a significant association between more than 6 variants detected in plasma and shorter survival (padj = 0.0186).
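The χ2 comparisons of detection rates reported in this section can be checked with a short worked example. The patient counts below are not stated in the text; they are assumptions inferred from the reported percentages (10 of 18 for T3-T4, 1 of 10 for T1-T2, 5 of 11 for M1, 6 of 16 for M0). With a Pearson χ2 test without continuity correction, these assumed counts give statistics that agree with the reported values.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: group; columns: tissue-derived alteration detected / not detected.
# Counts are inferred from the reported percentages (assumption).
chi2_t, p_t = chi_square_2x2(10, 8, 1, 9)   # T3-T4: 10/18, T1-T2: 1/10
chi2_m, p_m = chi_square_2x2(5, 6, 6, 10)   # M1: 5/11, M0: 6/16

print(f"T stage: chi2 = {chi2_t:.2f}, P = {p_t:.3f}")  # chi2 = 5.59, P = 0.018
print(f"M stage: chi2 = {chi2_m:.2f}, P = {p_m:.3f}")  # chi2 = 0.17, P = 0.679
```

With expected cell counts this small, a Fisher exact test would also be a reasonable choice; the sketch uses the χ2 form only because that is the statistic quoted in the text.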

Qualitative and quantitative analysis of somatic variants in plasma discriminates patients with distant metastases
The role of multilayer molecular profiling in the discrimination of patients with larger tumors (T3-T4) and distant metastases was evaluated by including analytes such as the concentration of the oncoproteins CA 19-9, Cancer Antigen 72-4, and CEA; MSI status; TMB; quantity of somatic mutations (unique or matching the tumor tissue); presence or absence of somatic mutations (unique or matching the tumor tissue); and specific mutations of the most mutated genes. Our analysis revealed that the quantity of tissue matching variants and the presence of any somatic alteration in plasma cfDNA were significant for discrimination between M0 and M1 groups (classification analysis resulted in an area under the curve of 0.744).
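For readers less familiar with the area under the curve, it can be read as the probability that a randomly chosen M1 patient receives a higher score than a randomly chosen M0 patient, with ties counted as one half. The sketch below computes it directly from that definition. The scores are hypothetical counts of tissue matching variants, not study data, and the example does not attempt to reproduce the reported value of 0.744, which came from a random forest-based analysis.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical numbers of tissue matching variants per patient.
m1_scores = [0, 2, 3, 5]  # patients with distant metastasis
m0_scores = [0, 0, 1, 2]  # patients without distant metastasis

auc = roc_auc(m1_scores, m0_scores)
print(auc)  # 0.78125
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.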

DISCUSSION
In this study, we present a robust analysis of liquid biopsy for GC using circulating plasma cfDNA. We show that somatic mutations determined by WES in GC tissues can be tracked in the blood of patients with GC. Furthermore, our study suggests that qualitative and quantitative analysis of somatic variants in the plasma cfDNA might be a promising approach to discriminate patients with advanced disease.
Raised cfDNA levels were first reported in the serum of patients with cancer in 1977 (27). However, it was shown that the concentration of cfDNA can increase because of a number of physiological conditions, and more specific analysis of circulating nucleic acids is needed. Circulating tumor DNA can be detected in any body fluid, does not require additional analysis tools such as the cell sorting needed for circulating tumor cell analysis, and has very high clinical potential, with applications ranging from noninvasive genomic analysis of cancer and quantification and monitoring of disease burden to the tracking of clonal evolution. Despite recent efforts (The Cancer Genome Atlas Research Network) (28), there is still a strong need for more appropriate gene panels for cfDNA analysis that could be implemented in routine diagnostics. To analyze wide molecular spectra and investigate genetic alterations in a GC patient group of European descent, we performed WES of tumor tissue and WBC samples. Twenty-three of 29 patients with GC (79.31%) had cancer-associated somatic alterations detected in tissue. All mutated genes were previously associated with gastric tumorigenesis and reported in the Catalogue Of Somatic Mutations In Cancer database (29). Signaling pathway enrichment analysis revealed that the genes we found to be mutated were involved in Wnt and cadherin pathways (see Supplementary Figure 3, Supplemental Digital Content 9, http://links.lww.com/CTG/A687) (30). Based on our WES results, a custom 38-gene panel for deep targeted sequencing of plasma cfDNA was designed. To the best of our knowledge, this is the first study conducted in patients with GC that implemented unique molecular identifier (UMI) error correction and deep sequencing for accurate cfDNA mutational analysis.
This approach allowed us to determine somatic alterations in plasma cfDNA samples for 21 of 23 alteration-positive tissue samples (91.3%) and tumor tissue matching alterations for 11 of 23 alteration-positive tissue samples (47.8%). By comparison, previously reported plasma ctDNA mutational concordance with tissue ranged from 33.9% to 58% (8,13,14), and the differences could be explained by GC tissue molecular heterogeneity (31).
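The detection and concordance rates above follow from simple set and ratio arithmetic, sketched here. Per patient, tissue matching alterations are the intersection of the tissue and plasma variant sets; across the cohort, the rates are the shares of alteration-positive tissue samples. The variant tuples with placeholder gene names are hypothetical, while the counts 21, 11, and 23 are taken from the text.

```python
def matching_variants(tissue_variants, plasma_variants):
    """Variants found in both the tumor tissue and the plasma cfDNA of one patient."""
    return set(tissue_variants) & set(plasma_variants)


def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Hypothetical patient: (gene, change) pairs with placeholder names.
tissue = {("GENE_A", "c.100A>G"), ("GENE_B", "c.200C>T")}
plasma = {("GENE_B", "c.200C>T"), ("GENE_C", "c.300G>A")}
shared = matching_variants(tissue, plasma)

# Cohort-level rates using the counts reported in the text.
any_alteration_rate = percent(21, 23)  # 91.3
matching_rate = percent(11, 23)        # 47.8
print(shared, any_alteration_rate, matching_rate)
```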
Furthermore, we compared the quantity of tissue matching alterations detected in plasma cfDNA with different clinical features. In concordance with other studies, the analysis revealed that alterations derived from tumor tissue were detected significantly more often in samples from patients with more advanced tumors (6,8) and could be associated with worse survival (8). However, our data also indicated that even relatively small GC tumors (10% of T1-T2) could shed detectable amounts of ctDNA into the bloodstream. Multicomponent analysis of variable importance based on machine learning algorithms showed that the combination of the quantity of tissue matching alterations in cfDNA and the presence of any somatic alteration in plasma cfDNA was the most accurate when discriminating patients with distant metastasis (area under the curve = 0.744). However, it is important to note that more than a third of gastric tumors without distant metastasis still gave rise to detectable cfDNA molecules carrying somatic alterations. Therefore, we believe that the ability of our custom cfDNA panel to detect even a fraction of patients with nonadvanced tumors (early stages or without metastasis) could improve early cancer detection and increase survival rates (32). Studies report a strong correlation between tumor-derived cfDNA detection rates and tumor stage and, in concordance with our findings, show that the detection rate is around 30% for tumors without distant metastasis (6,8,33,34). Moreover, survival analysis revealed that an increased quantity of somatic mutations in plasma cfDNA is associated with worse patient survival. Well-known cancer diagnostic analytes (MSI status, TMB, and oncoproteins) did not show any significant impact in our variable importance analysis or our discrimination analysis. These findings support the great need for new minimally invasive molecular markers for GC diagnosis and disease state monitoring.
In addition, we observed that the total cfDNA yield is increased in patients with GC. The higher total cfDNA yield in GC is consistent with previous studies of gastrointestinal cancers (35-38). Although the results of various studies show that levels of oncoproteins such as CEA hardly correlate with clinicopathological features (39,40), we found a moderate positive correlation between total cfDNA yield and serum CEA levels in patients with GC. A plausible explanation for this correlation could be that increased levels of both total cfDNA and CEA are observed during tumorigenesis. In the analysis of cfDNA fragment distribution, we showed that the higher total cfDNA yield in GC affected all fragment sizes and that mononucleosomal and dinucleosomal fragments were smaller in patients with GC than in controls. This observation supports the hypoxia theory: rapidly growing tumor cells lack oxygen, and hypoxia induces necrosis, which leads to phagocytosis of tumor cells and release of DNA fragments into the bloodstream.
The study has some limitations. The sample size is small; however, the study population was clinically well defined and tested for many clinically relevant variables. Plasma cfDNA of healthy controls was not sequenced because healthy controls usually have a very low total cfDNA yield and an extremely low ctDNA fraction, which could result in inconsistencies and sequencing errors. Although the gene panel was not evaluated in an independent validation group, all variants were manually checked in the Integrative Genomics Viewer. Nevertheless, we believe that this study adds very important new data for the development of clinically relevant liquid biopsy tools in patients with GC.
In conclusion, sequencing-based approaches have the advantage of being flexible and capable of detecting a wide range of aberrations in tumor genomes. Therefore, in this study, WES was performed to analyze the GC tissue mutational profile and to develop a custom panel for cfDNA mutational profile analysis. It is important to note that by using our gene panel and UMI correction, we were able to detect tumor-derived cfDNA even for small tumors and tumors without distant metastasis and to identify a solid proportion of patients with GC carrying somatic alterations in plasma cfDNA. We found that the quantity of somatic alterations could be associated with overall patient survival. Further investigation of plasma cfDNA should include larger cohorts of patients with GC and analysis of MAF in cfDNA at different disease time points and/or disease status (e.g., relapse or remission). The implementation of plasma cfDNA analysis into routine cancer testing is still technically challenging, and more population-based screening studies are still needed. However, given the progress in NGS technology and new methods of processing complex data, tumor-derived cfDNA even
This unique gene panel made it possible to identify a solid proportion of patients with GC carrying somatic alterations in plasma cfDNA, including patients whose disease was in early stages.
Multilayer molecular machine learning-based analysis indicated that the quantity of tissue matching variants and the presence of any somatic alteration in plasma cfDNA are significant for discrimination between M0 and M1 groups.