Comprehensive analysis to identify DNA damage response-related lncRNA pairs as a prognostic and therapeutic biomarker in gastric cancer

Objective: Gastric cancer (GC) is the fifth most common malignancy and the fourth leading cause of cancer-related mortality worldwide. The identification of valuable predictive signatures to improve the prognosis of patients with GC is becoming a realistic prospect. DNA damage responserelated long noncoding ribonucleic acids (drlncRNAs) play an important role in the development of cancers. However, their prognostic and therapeutic values remain sparse in gastric cancer (GC). Methods: We obtained the transcriptome data and clinical information from The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) cohort. Co-expression network analyses were performed to discover functional modules using the igaph package. Subsequently, lncRNA pairs were identified by bioinformation analysis, and prognostic pairs were determined by univariate analysis, respectively. In addition, we utilized least absolute shrinkage and selection operator (LASSO) cox regression analysis to construct the risk model based on lncRNA pairs. Then, we distinguished between the highor lowrisk groups from patients with GC based on the optimal model. Finally, we reevaluated the association between risk score and overall survival, tumor immune microenvironment, specific tumor-infiltrating immune cells related biomarkers, and the sensitivity of chemotherapeutic agents. Results: 32 drlncRNA pairs were obtained, and a 17-drlncRNA pairs signature was constructed to predict the


Introduction
Gastric cancer (GC) is the fifth most common malignancy and the fourth leading cause of cancer related mortality worldwide, according for 768,793 deaths in 2020 [1]. Despite many efforts have been made to improve treatment strategies, approximately 50% of patients with GC are diagnosed in advanced stage, therefore, have an unfavorable prognosis [2]. Chemotherapy alone, or in combination with surgery is the mainstay of treatment for this stage [3]. Unfortunately, chemoresistance becomes the major obstacle to successful cancer treatment [4]. Hence, it is imperative to search for effective prognostic and therapeutic indicators for GC.
DNA damage response (DDR) plays a crucial role in the tumorigenesis and progression of several cancers, which also contributes to the resistance of chemotherapy agents [5]. Genomic instability has been implicated in this process. Due to genomic instability, cancer cells might accumulate deleterious secondary agents and promote its progression. Chromosomal instability is now appreciated to occur during the development of ~50% of GC, which contributes to genomic instability [6]. In order to maintain genome stability, the relevant cellular response involves the activation of multiple signaling cascades through DNA damage response (DDR) pathway [7,8]. The dysregulation of DDR pathway related genes was closely associated with the tumorigenesis, progression, and prognosis in GC [9]. However, the molecular mechanisms of DDR in GC have not been fully elucidated and need further study.
Long non-coding RNAs (lncRNAs), a class of RNA transcripts, play important roles in many biological processes by regulating gene expression and signal transduction [10]. The dysregulation of gene expression and signal transduction contributes to the progression of GC [11]. Recently, more and more evidences indicate several lncRNAs could maintain genome integrity via regulating the DDR signaling. In this process, RNA polymerase II is recruited to the sites of DNA damage and further synthesize damage-induced lncRNA to promote DNA repair [12]. Damage-induced lncRNA is necessary for the secondary recruitment of DDR proteins to DNA double strand breaks to form foci formation [13]. Pervious study has also indicated that lncRNAs act as a regulatory role in radioresponse through DDR pathway in cancers [14]. Therefore, it is reasonable to identify lncRNAs associated with DDR to predict GC prognosis. Nevertheless, we still lack direct evidence about the predictive power of DDR related genes in the prognosis and therapeutic of GC. Whether DDR related lncRNAs contributed to the prognosis and therapeutic of GC or not needs further study.
In this study, we obtained the transcriptome data and clinical information from The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) cohort. Co-expression network analyses were performed to discover functional modules using the igaph package. Subsequently, lncRNA pairs were identified by bioinformation analysis, and prognostic pairs were determined by univariate analysis, respectively. In addition, we utilized least absolute shrinkage and selection operator (LASSO) cox regression analysis to construct the risk model based on lncRNA pairs. Then, we distinguished between the high-or low-risk groups from patients with GC based on the optimal model. Finally, we reevaluated the association between risk score and overall survival, tumor immune microenvironment, specific tumor-infiltrating immune cells related biomarkers, and the sensitivity of chemotherapeutic agents. The flow of this present study is shown in Figure 1.

Data source
The transcriptome profiling and corresponding clinical data regarding 375 GC were downloaded from the TCGA (https://portal.gdc.cancer.gov). The complete clinical information of 350 GC patients was available in the TCGA database. The 276 identified DDR-related core genes were acquired from the previous study, and were listed in Supplementary Table S1 [15]. Annotation information of lncRNAs was obtained from GENECODE website (https://www.gencodegenes.org/human/), and GRCh38 was utilized as the genome annotation file from the UCSC database (http://hgdownload.cse.ucsc.edu/).

Differentially expressed analysis of drlncRNAs
Then, we performed co-expression analysis to screen out drlncRNAs between DDR-related genes and lncRNAs using R package igraph (version 1.2.7), with the threshold of correlation coefficients more than 0.4 (absolutive value) and P-value less than 0.001. Subsequently, R package limma was performed to screen the differentially expressed drlncRNAs. Finally, we set adjusted P-value as the threshold to filter out.

Establishment of lncRNA-mRNA co-expression network
The lncRNA-mRNA co-expression network was constructed to assess the correlation between DEdr-lncRNAs and targeted mRNA using the igaph package (version 1.2.7). Pearson correlation analysis was utilized to identify the targeted mRNAs based on correlation coefficient |R| > 0.5 and P-value less than 0.05.

Pairing DDR related lncRNA
Differentially expressed drlncRNAs were periodically singly paired. Let us assume that lncRNA A plus lncRNA B equals C. If the expression level of lncRNA A is higher than that of lncRNA B, then C is defined as 1, otherwise C is defined as 0. There is no relationship between the lncRNA pair and prognosis if the expression quantity of C is 0 or 1. The lncRNA pair was identified as a valid match when the amounts of pairs with the expression quantity of 0 or 1 accounted for more than 20% of the total lncRNA pairs [16].

Construction and verification of the prognostic signature
After obtaining the clinical information of GC from TCGA, univariate Cox regression analysis was performed to screen lncRNA pairs related to overall survival (OS) with P-value less than 0.05. Next, we used the least absolute shrinkage and selection operator (LASSO) analysis to identify lncRNA pairs most correlated with OS. Subsequently, these filtered genes were rolled in the Cox proportional hazard regression analysis, and to construct the signature with penalty parameter (λ) determined by 10-fold cross-validation. The following formula was used to calculate the risk score of each patient: Risk score = ∑ (bi × Si) = . We performed the ttimeROC R package to assess the predictive potential of the model. The 1-, 2-, and 3-year ROC curves were plotted. The maximum inflection point identified from the Akaike information criterion (AIC) values of every point of the 1year ROC curve was determined as the cut-off value. To validate this cut-off point, GC patients were divided into high-and low-risk groups, and Kaplan-Meier analysis was performed to show the survival difference. Besides, we compared our lncRNA pairs model with several established lncRNA models. Furthermore, the univariate and multivariate Cox regression analyses were utilized to determine whether the risk score was an independent prognostic factor for GC.

Functional enrichment of risk model
To study the potential biological attributes that were correlated with our model, the differentially expressed genes (DEGs) between the different groups were utilized to perform enrichment analyses. We first identified the expression of DEG sets between two subgroups, with the threshold of |log2FC| > 1.5 along with FDR < 0.05. GO and KEGG analysis was further conducted by R package clusterProfiler.

Investigation of tumor-infiltrating immune cells
To explore the relationship between the risk score and tumor immune characteristics, we used the currently acknowledged methods to calculate the status of immune cell infiltration including CIBERSORT, CIBERSORT-ABS, TIMER, XCELL, QUANTISEQ, MCPcounter, and EPIC. Wilcoxon signed-rank test was performed to analyze the differences between high-and low-risk groups. Spearman correlation analysis was performed to analyze the correlation between the risk score and tumor-infiltrating immune cells. Subsequently, we validated the results by assessing the expression level of multipotency cell-markers.

Estimation of the sensitivity of chemotherapeutic agents
To evaluate the sensitivity of chemotherapeutic drugs, we calculated the half-maximal inhibitory concentration (IC50) of GC patients in the high-and low-risk groups using the pRRophetic package. Nine common administrating chemotherapeutic drugs were recommended as candidates for GC. Finally, we compared the IC50 value between the high-and low-risk groups by Wilcoxon signed-rank test.

Statistical analysis
R software 3.6.3 was used for all analyses. P-value less than 0.05 and 0.001 were considered statistically and highly significant, respectively. FDR less than 0.05 was considered statistically significant.

Differentially expressed drlncRNAs and enrichment analyses in GC
We performed pearson correlation coefficient analysis to identify drlncRNAs. 722 drlncRNAs were identified and rolled in the subsequent differential expression analysis. Besides, 113 drlncRNAs were distinguished as differentially expressed drlncRNAs (DEdrlncRNAs) (Figure 2A). Among these DEdrlncRNAs, 9 were downregulated in GC samples, while others were upregulated ( Figure 2B). Subsequently, the lncRNA-mRNA co-expression network was established ( Figure 2C).  With the criteria mentioned above, a total of 3947 valid drlncRNA pairs were identified. We utilized univariate Cox regression analysis to screen 32 drlncRNA pairs associated with OS of GC. Using LASSO Cox regression analysis, we constructed a 17-drlncRNA-pair prognostic signature from the above 32 OS-related drlncRNA pairs. The maximum inflection point with value of equal to 0.245 was identified as the cut-off point ( Figure 3A). Based on the cut-off point value, 350 patients with GC were divided into high-risk group (N = 126) and low-risk group (N = 224). As shown in Figure 3B, the OS of patients in the high-risk group was worse than those in the low-risk group. The ROC value of the prognostic signature was 0.797, 0.812 and 0.821 at 1, 2, 3 years, respectively ( Figure 3C). Furthermore, we compared the ROC value of risk score with DDR model. The result showed that risk score was superior to the model based on DDR ( Figure S1). Distribution of risk score and survival status of our model was shown between high-and low-risk groups (Figures 3D and E). In addition, the performance of risk model was better than those of other clinical parameters ( Figure 3F). Finally, we evaluated the performance of risk model in different subgroups, including age, gender, stage, and grade. The ROC value in these subgroups showed the same result as entire group (Figure 4).

Independent prognosis analyses
We used univariate and multivariate Cox regression analyses to determine whether the risk score was an independent prognostic factor for GC. In univariate Cox regression analysis, the hazard ratio (HR) of risk score was 2.786 (P < 0.001) ( Figure 5A). After correction for other confounding factors, the risk score (P < 0.001, HR = 3.198, 95% CI [2.510-4.074]) still showed statistical differences by multivariate Cox regression analysis, which indicated that risk score was one independent prognostic factor for GC ( Figure 5B).

Comparison of established lncRNA-related prognostic signatures
To evaluate the predictive value of the lncRNA pairs signature, we reviewed recently published prognostic signatures associated with lncRNA and compared them with the present study. As shown in Figure 5C, the predictive performance of our signature was superior to others.

Functional enrichment of risk model
To elucidate the potential biological pathways, we performed the enrichment analyses involving GO and KEGG analyses based on risk score. The results showed that these genes in high-risk group participated in the biological processes of extracellular structure organization, protein-lipid complex remodeling, etc. ( Figure 6A). In addition, the pathways involved in cholesterol metabolism, fat digestion and absorption, PPAR signaling pathway, etc. ( Figure 6B).

Correlation between risk score and the tumor-infiltrating immune cells in GC
In recent years, a great deal of evidences showed the tightly correlation between DDR and the immune system [17]. We next analyzed whether the prognostic signature was associated with the tumor immune microenvironment. As shown in Figure 7A, patients in the high-risk group were significantly positively correlated with these immune cell subtypes such as macrophage, cancer-associated fibroblast (CAF), T cell, monocyte, and myeloid dendritic cell. Furthermore, the markers of these cells ACTA2, CD163, CD206, CSF1R, and MAF were differentially expressed in GC samples with highrisk score ( Figure 7B). Taken together, these results suggested great significance of these infiltrating immune cells in the progression of GC.

Estimation of correlation between risk signature and the efficacy of chemotherapeutic agents
To evaluate the clinical value of the model, we attempted to investigate the association between risk score and the sensitivity of common administrating chemotherapeutics. Our study revealed that the high-risk score was linked to higher chemosensitivity in Mitomycin.C, Methotrexate, Cytarabine whereas it was associated with lower chemosensitivity in Imatinib, Dasatinib, Pazopatinib, Sorafinib ( Figure 8). Thus, our risk model showed potential predictive performance for chemosensitivity.

Discussion
With the wide application of high throughout technology, the comprehensive profiling of genes is useful for providing valuable molecular information to individualized therapy. Previous study demonstrated that DDR was involved in the tumorigenesis and progression of GC [18,19]. In addition, DDR related genes mutations were identified in GC [20,21]. Although several studies elucidated coding genes and noncoding RNAs have implications in the prognostic of malignant tumor [9], almost all signatures are composed of quantifying the expression levels of transcripts. Besides, no specific gene-pair model has been developed for GC until now. Therefore, we constructed a risk model based on multiple gene-pair signatures that could act as prognostic and therapeutic biomarker in GC.
Previous study confirmed that lncRNA expression profile has been extensively altered in the development of GC, and further regulated genomic stability, cell cycle, apoptosis, and angiogenesis [22]. LncRNA has been identified as one class of putative biomarkers in GC. In this study, we first retrieved expression data from TCGA-STAD cohort. After performing the differential co-expression analysis, several DEdrlncRNAs were screened out. We further established lncRNA pairs with a 0-or-1 matrix mentioned above. Second, we performed univariate analysis to screen out OS-related lncRNA pairs, and modified lasso penalized regression to choose DEdrlncRNAs pairs. Third, we regrouped the highor low risk-group among patients with GC based on the optimal cut-off point. Finally, we reevaluated the association between risk score and survival, tumor immune microenvironment, specific tumorinfiltrating immune cells related biomarkers, and the sensitivity of chemotherapeutic agents. These results suggested that our signature was an applicable and precise tool, which could provide support to clinical decision-making.
Identification of disease-related lncRNAs is benefit to elucidate complex pathogenesis. In order to provide valuable information for clinical decision-making, the comprehensive profiling of molecular information was carried out in GC [23]. As mentioned above, we identified 17 lncRNA pairs in this signature. some of them have been explored in GC including LINC00106, TSPEAR-AS2 and HAND2-AS1 [24,25]. The recent study indicated that TSPEAR-AS2 was involved in the progression of GC via suppressing GJA1 expression and upregulating CLDN4 expression [25]. Li et.al. have reported that the aberrant expression of HAND2-AS1 was linked to GC progression and prognosis. HAND2-AS1 could inhibit proliferation, colony formation and promote apoptosis in CG [26]. Additionally, other lncRNAs in our model have been investigated in other cancers including TFAP2A-AS1, KCNMB2-AS1 and LINC01082. Therefore, further studies need to be further conducted in GC. Interestingly, our result suggested that the AL158166.1/AC106820.3 pair with highest coefficient was the main contributing gene pair. However, the association between AL158166.1/AC106820.3 and GC have not been explored until now. Given the potential biological and clinical value of this gene-pair, further research on this is still warranted.
Compelling evidence indicated DDR signaling pathway has been associated with activation of anti-tumor immune responses [27]. It has been reported that DNA damage could trigger innate immune responses and IFN signaling [28]. Subsequently, IFNs promote mobilization of immune cells to kill cancer cells [27]. A comprehensive understanding of anti-tumor immune response induced by DNAdamage is essential for the development of effective cancer treatment. In the present study, we explored the association between risk score and tumor-infiltrating immune cells in GC using several common acceptable methods. Our results suggested that risk score was positively associated with macrophage, cancer-associated fibroblast (CAF) and several types of T cell. As mentioned above, dysfunction of DDR proteins could facilitate tumor antigen presentation, and further promote the activation of Type I IFN signaling pathway via T lymphocytes [27,29]. CAF plays a crucial role in the initiation and progression of cancers. Some studies have reported that CAFs could play a regulatory role in inhibition or activation of non-coding RNA function through DDR pathway [14]. Besides, CAFs also mediates drug resistance by secreting various bioactive factors [30]. It has been well accepted that M2 macrophage polarization could facilitate tumor growth, invasion, and immune escape. Meng X, et al. found that the alteration of DDR might remodel the tumor immunosuppressive microenvironment by promoting M2 polarization [31]. In this study, we evaluated the association between risk model and the specific biomarkers of immune cells, which is consistent with these published findings.
Previous studies demonstrated an important crosstalk between the DDR and the immune system in cancers. Targeting DDR related protein could provide new perspectives for cancer treatment. Sistigu A et.al found that the efficacy of chemotherapeutic agents partly relied on activation of anti-tumor immune response [32]. Therefore, we explored the association between risk score and sensitivity of chemotherapeutic agents. As shown in the results, some chemotherapeutic agents expressed significant difference in chemosensitivity between high-and low-risk groups. Further studies are needed to validate our study in the future. Although our risk model has shown promise for GC patients, there were still some limitations and shortcomings in this study. Due to the relatively insufficient of lncRNA datasets for the analysis, we only analyzed the data from the TCGA cohort. In this aspect, 0-or-1 lncRNA pairs matrix would introduce less variations and minimize the sample errors. Therefore, our model might be acceptable in the absence of external data validation. However, prospective external validation would be more beneficial than retrospective cohort. In the further, we will recollect the clinical samples for the prospective validation in our population cohort, in attempts to confirm the clinical application value of the model.

Conclusions
In the present study, we constructed a novel signature by paring drlncRNAs to predict the prognosis and the sensitivity of chemotherapy. This model could provide promising prediction value, and guide individual treatment strategies in the future.