Integrated bioinformatics analysis of noncoding RNAs with tumor immune microenvironment in gastric cancer

In recent years, molecular and genetic research hotspots of gastric cancer have been investigated, including microRNAs, long noncoding RNAs (lncRNAs) and messenger RNA (mRNAs). The study on the role of lncRNAs may help to develop personalized treatment and identify potential prognostic biomarkers in gastric cancer. The RNA-seq and miRNA-seq data of gastric cancer were downloaded from the TCGA database. Differential analysis of RNA expression between gastric cancer samples and normal samples was performed using the edgeR package. The ceRNA regulatory network was visualized using Cytoscape. KEGG pathway analysis of mRNAs in the ceRNA network was performed using the clusterProfiler package. CIBERSORT was used to distinguish 22 immune cell types and the prognosis-related genes and immune cells were determined using Kaplan-Meier and Cox proportional hazard analyses. To estimate these nomograms, we used receiver operating characteristic and calibration curve studies. The ceRNA regulation network of gastric cancer was built in this study, and the genes in the network were analyzed for prognosis. A total of 980 lncRNAs were differentially expressed, of which 774 were upregulated and 206 were downregulated. A survival study identified 15 genes associated with gastric cancer prognosis, including VCAN-AS1, SERPINE1, AL139002.1, LINC00326, AC018781.1, C15orf54, hsa-miR-145. Monocytes and Neutrophils were associated with the survival rate of gastric cancer. Our research uncovers new ceRNA network for the detection, treatment, and monitoring of gastric cancer.

Prognosis of related genes.Lasso regression analysis was performed, and two plots were generated.
The variation of coefficients at different log Lambda values was depicted (Fig. 3A).The horizontal axis represented log Lambda, while the vertical axis represented the values of the coefficients.The changes in partial likelihood deviance at different log Lambda values were showcased (Fig. 3B).The horizontal axis represented log Lambda, and the vertical axis represented the values of the partial likelihood deviance.VCAN-AS1, SERPINE1, AL139002.1,LINC00326, AC018781.1,C15orf54, hsa-miR-145 were obtained by Multivariate Cox regression analysis (Fig. 3C).The scores were then calculated, and the patients were classified into high-risk and low-risk groups based on the median risk score in the training set (Fig. 3D).The relationships between survival status and survival times of gastric cancer patients, ordered by their respective risk scores, were displayed (Fig. 3E).
The high -risk group had a worse chance of survival than the low-risk group (Fig. 4A).The AUC at 1 year, 3 years, 5 years were 0.665, 0.674 and 0.789 (Fig. 4B).The gene SERPINE1 was highly expressed in both high and low-risk groups (Fig. 4C).The Nomogram was established based on the independent prognostic indicators, which predicts survival rates for the first, 3 to 5 years.).Moreover, the calibration curve for nomogram models showed a good agreement between the actual survival rate and the predicted 3 years overall survival rate, suggesting that this model was accurate in its predictions (Fig. 4E).were higher in tumor cells, while B cells memory, plasma cells, T cells CD8, T cells CD4 memory activated and Monocytes were lower in tumor cells (Fig. 5B).The heatmap was also demonstrated that various immune cells were differently expressed in tumor cells (Fig. 5C).CD4 memory activated T cells was associated with survival probability.High number of CD4 memory activated T cells had a good survival probability (Fig. 6A).The association among 22 different types of immune cells was presented (Fig. 6B).The expression of B cells naïve, Mast cells resting and NK cells activated was higher in younger than in older age, while the expression of Mast cells activated, Neutrophils, NK cells resting was lower than in older age (Fig. 7).The expression of Macrophage M0, Mast cells activated, and Plasma cells was higher in G1/2 than in G3 grade, while the expression of Macrophage M1, Mast cells resting, Monocytes, T cells CD4 memory activated, T cells CD8, and T cells follicular helper was lower in G1/2 than in G3 grade (Fig. 7).
Monocytes and neutrophils in gastric cancer.Next, we found no significant difference for survival probability between the high-risk group and the low-risk group (Fig. 8A).The data from AUC curve showed that the nomogram survival prediction model had a reasonable accuracy.The AUC values at 1 year, 3 years, and  8B).Moreover, the calibration curve for nomogram models demonstrated a consistent agreement between the actual survival rate and the predicted 3 years overall survival rate, implying that this model was accurate in its predictions (Fig. 8B).In addition, hsa-miR-145 and AC018781.1 were associated www.nature.com/scientificreports/with Monocytes, while AC018781.1,AL139002.1,SERPINE1, VCAV-AS1 and C15orf54 were associated with Neutrophils (Fig. 8C-D).In the high-risk group, Monocytes and neutrophils were highly expressed (Fig. 9A).
Pearson's correlation assessed the association between biomarkers and immune cells.C15orf54 had the strongest relationship with Neutrophils with a coefficient of 0.32.VCAN-AS1 had a good association with Neutrophils with a coefficient of 0.28.In addition, hsa-miR-145 had a stronger relationship with Monocytes with a coefficient of 0.31 (Fig. 9B).

Discussion
Although the incidence and mortality of gastric cancer have decreased in recent years, gastric cancer is still one of the most common malignant tumors in the world and causes a heavy medical burden globally 3 .identify potential biomarkers associated with the diagnosis, treatment and prognosis of gastric cancer, comprehensive bioinformatics analysis may help to achieve this goal and develop personalized treatment for gastric cancer patients.Based on RNA expression profiles from TCGA, 980 lncRNAs, 104 miRNAs and 1639 mRNAs were identified, which were differentially expressed between tumors and normal tissues.The current work used  www.nature.com/scientificreports/ LncRNAs have also been considered as an underappreciated novel therapeutic target for gastric cancer 43 .Their diverse roles in cancer progression offer new opportunities to disrupt metastasis in clinical settings 44 .LncRNAs play important roles in gene regulation, including regulation of gene activation and silencing, X chromosome inactivation, alternative splicing, and post-translational regulation.mRNAs, miRNAs, and lncRNAs are linked in an intricate network of crosstalk by competing endogenous RNAs 45 .LncRNA VCAN-AS1 has been reported to promote the malignant tumor behaviors via modulating the miR-106a-5p-involved STAT3/HIF-1α axis in breast cancer 46 .Two studies also validated that lncRNA VCAN-AS1 was abnormally expressed in gastric cancer by constructing the ceRNA regulatory network 47,48 .Feng et al. 49 reported that lncRNA VCAN-AS1 contributed to gastric cancer progression via regulation of p53 expression.Hence, lncRNA VCAN-AS1 might be a potential factor to influence gastric cancer development and progression.Due to the unclear role of lncRNA VCAN-AS1 in TIME, further investigation is required to determine the functions of VCAN-AS1 in regulating TIME in gastric cancer.
One study has confirmed that AL139002.1 sponged miR-490-3p and regulated the expression of Hepatitis A virus cellular receptor 1 (HAVCR1) in gastric cancer, contributing to the development of gastric cancer 50 .In addition, LINC00326 has been found to regulate hepatocarcinogenic lipid metabolism 51 .In non-small cell lung carcinoma, overexpression of LINC00326 attenuated tumor progression via blockade of Wnt/β-catenin pathway through miR-657/DKK2 axis 52 .Liu et al. 53 used a competing endogenous RNA network and found that 3 lincRNAs, C15orf54, ADAMTS9-AS1 and AL391152.1,were involved in survival rate of gastric cancer patients.LncRNA AC018781.1 was observed to be an independent risk factors for gastric cancer by a bioinformatic and integrated analysis 48 .
In this study, we found the important role of miR-145 in gastric cancer.In line with this finding, numerous studies have validated the functions of miR-145 in gastric cancer progression 54 .For example, miR-145 inhibited cell proliferation, invasion and migration as well as cell cycle progression via suppression of transcription factor Sp1 in gastric cancer 55 .Xing et al. reported that miR-145 targeted the expression of catenin-δ1, MYO6 and www.nature.com/scientificreports/inhibited cell invasion in gastric cancer 56,57 .Chang et al. 58 found that miR-145 induced the antiproliferative effects via targeting E2F3 in gastric cancer.Low expression of miR-145-5p was associated with poor prognosis in gastric cancer 59 .It is clear that the role of miR-145 has been well studied in gastric cancer.SERPINE1 gene has been studied in a variety of human cancers, including gastric cancer.One group reported a SERPINE1-based immune gene signature, which can predict immunotherapy response and patient prognosis in gastric cancer 60 .Another group uncovered that SERPINE1 enhanced malignant progression and associated with poor prognosis in gastric cancer patients 61 .Several analyses support SERPINE1 as a potential prognostic biomarker in gastric cancer [62][63][64] .One study used data mining in combination with bioinformatics dissected that SERPINE1 was associated with immunoinfiltration and might be a diagnosis and prognosis biomarker in stomach adenocarcinoma 65 .High SERPINE1 expression group presented higher immune cells' expression, including CD4+ T cells, neutrophils, CD8+ T cells, macrophages and B cells.Moreover, SERPINE1 was associated with immune cells in the TIME of stomach adenocarcinoma 65 .Another study discovered that SERPINE1 expression was correlated with several types of immune cells, such as neutrophils, macrophages, CD4+ T cells, CD8+ T cells, B cells, and dendritic cells in clear cell renal cell carcinoma 66 .
Recently, evidence suggested that SERPINE1 was associated with immune infiltrates in gastric cancer 67 .SER-PINE1 enhanced the inhibitory TIME, which was positively associated with the infiltration level of neutrophils, macrophage M2, resting NK cells, and activated mast cells.SERPINE1 was negatively associated with plasma cells and B cell memory in gastric cancer 67 .Consistently, in our study, we identified that SERPINE1 expression was positively correlated with neutrophils in gastric cancer.
CD4+ regulatory T cells were identified to be associated with survival in gastric cancer 68 .In gastric cancer patients, CD4+ T cells and regulatory T cells were enriched, and lower expression of miR-128-3p was correlated with overall survival.Moreover, miR-128-3p targeted the expression of IL16 in gastric cancer 68 .One group reported that high infiltration of CD4+ T cells was linked to worse overall survival in gastric cancer 69 .Another group revealed that CD4+ memory T cell-related genes were correlated with clinical overall survival in patients with gastric cancer 70 .Additionally, lncRNAs, including A2M-AS1, C2orf27A, and ZNF667-AS1, impaired activation of CD4+ T cells and affected prognosis of gastric cancer patients 71 .In gastric cancer patients with FLOT chemotherapy (5-Fluorouracil, Leucovorin, Oxaliplatin and Docetaxel), CD4+/CD8+ lymphocyte ratio was elevated and predicted favourable therapy response 72 .In our study, CD4 memory activated T cells were linked to survival of gastric cancer patients.Without a doubt, more exploration is necessary to determine the mechanism of CD4 T cells-mediated survival in gastric cancer.
In summary, we successfully identified lncRNAs closely related to the occurrence and development of gastric cancer based on the TCGA database.Based on these lncRNAs, a ceRNA network based on the lncRNA-miRNA-mRNA regulatory mechanism was constructed.Several lncRNAs, miRNAs and mRNAs associated with gastric cancer prognosis were screened in the ceRNA network by survival analysis.These indicators provide new targets for the prognosis evaluation of gastric cancer patients.At the same time, our study also confirmed the relationship between immunity and lncRNA-based ceRNA regulatory network.This study will enable us to identify more useful targets to develop effective treatment strategies for gastric cancer.It has several limitations in this study.In TCGA database, there are lower numbers of control groups compared with the tumor groups.The discrepancy between the control group and the experimental group could lead to potential bias in the analysis results.In addition, it is important to mention that in vitro and in vivo experiments are necessary to validate our findings in the future.

Materials and methods
Data colleting from TCGA .RNA-sequencing data, miRNA profiling and clinical information of STAD were downloaded from the TCGA database (https:// portal.gdc.cancer.gov/).Illumina HiSeqRNASeq platforms were used to obtain mRNA and lncRNA data.HiSeqmiRNASeq platforms were used to collect miRNA data.Data were collected for gastric cancer samples and normal samples.Data were further organized by ID conversion, filtering, merging, correction and clinical information.Demography, histology, tumor stage and TNM of the STAD were obtained (Supplementary Table 1 and supplementary file 1).Pre-filtered low count genes (rowMeans(data) > 1) were used to preprocess RNA sequencing profiles.We converted the downloaded data into count format through R, so as to analyze these samples and screen out differentially expressed RNAs.EdgeR (v.3.28.0) is an R package dedicated to analyzing DEGs (differentially expressed genes).Standard settings for DEGs were FDR < 0.01 and log fold change (FC) > 2. DElncRNA (differentially expressed lncRNA), DEmRNA (differentially expressed mRNA) and DemiRNA (differentially expressed miRNA) were visualized using the ggplot package.The Ensembl database was used to annotate genes in RNAseq, and we excluded RNAs not included in the Ensembl database.Afterwards, we extracted lncRNA expression profiles and mRNA expression profiles from the RNAseq expression matrix.

Construction of CeRNA regulatory network.
To further understand the role of DERNA (differentially expressed RNA) in gastric cancer, we constructed a ceRNA regulatory network between lncRNA-miRNA-mRNA to explore the relationship between them.Then, a ceRNA network was built using the hypergeometric test and correlation analysis to identify the differentially expressed miRNAs that can control lncRNAs and mRNAs (p < 0.05 as the filter threshold criteria).DElncRNAs, DEmiRNAs and DEmRNAs were incorporated into ceRNA regulatory network.We visualized the ceRNA regulatory network with Cytoscape.PPI network of mRNAs involved in the ceRNA network were generated by STRING 73 .www.nature.com/scientificreports/ was further conducted through the Gene ontology (GO) database 74 .The clusterProfiler (v.3.14.3) package was adopted to perform GO functional annotation and KEGG pathway analysis of mRNAs in the ceRNA network 75 .

GO analysis and KEGG pathway analysis.
Prognosis analysis of related genes.The clinical information of gastric cancer patients was downloaded from TCGA database and then we extracted the survival information of patients (survival time and survival status of patients) 76 .Subsequently, survival information with the expression matrix of RNAs in the ceRNA network was combined.To determine whether there is a relationship between biomarker expression and STAD survival, Kaplan-Meier (K-M) survival analysis was performed 77 .The next step was to find significant factors in the original Cox model that had prognostic significance and to incorporate those into the reduced Cox proportional hazard model 78 .In order to test the accuracy of the created multifactorial model, LASSO regression uses contraction to lower the data value to a particular point 79 .Ultimately, a nomogram was developed using the multivariate model to forecast the prognosis of STAD.The nomogram might make it possible for doctors to assign each biomarker a prognosis value based on how it expresses 80 .Indicators of prognosis and total 3-and 5 years survival may be found in the sum of the individual values.The accuracy of the nomograms was evaluated using receiver operating characteristic curve (ROC) analysis and calibration 81 .
Immune infiltration in STAD.Gene expression data were used by CIBERSORT to estimate the proportion and abundance of different types of immune cells in tumor and normal groups 82 .To assess the percentages of 22 immune cell types in STAD samples, we used CIBERSORT in this instance.Only instances with CIBERSORT results of P ≤ 0.05 were considered for further analysis.Important immune cells in tumor samples were distinguished from non-malignant samples using Wilcoxon rank-sum 83 .Next, K-M survival analysis was used to see whether the number of particular immune cells was related to overall STAD survival 84 .Different immune cells were incorporated into a Cox proportional hazard model following LASSO regression analysis, and a nomogram for STAD prognosis prediction was created 85 .The bias and precision of the nomogram were evaluated using the concordance index of the Cox mode.

Association between selected RNAs and immune cells.
To examine the immune cells that are connected to survival, univariate Cox regression and Kaplan-Meier survival analysis were employed 77 .In parallel, multivariate Cox regression analysis and Lasso regression were used to create the final immune cell model 86,87 .An equation that forecasts medical outcomes was graphically represented as a nomogram.With nomograms, patients accumulate points based on the severity of their risk factors 85 .The fold difference in gene expression between tumor tissues and normal tissues served as the basis for the risk factors of two nomograms that we constructed for the study.Then, we forecast the patients' 1-, 3-, or 5 years survival rate.

Figure 2 .Figure 3 .
Figure 2. Prognosis-related genes were illustrated by the K-M survival evaluation.

Figure 4 .
Figure 4.The association of ceRNAs and survival.(A) KM analysis of the high-risk and low-risk groups.(B) Time-dependent receiver operating characteristic (ROC) analysis for OS prediction of prediction model.(C) Heatmap of seven differentially expressed genes.(D) Nomogram based on multiple Cox regression.(E) Calibration curve for 3 years survival.

Figure 5 .
Figure 5.The association of immune cells and STAD.(A) The percentage of 22 immune cell subpopulations in STAD patients.(B) The fraction of immune cells in STAD patients.(C) Heatmap of immune infiltration between normal and tumor groups.

Figure 6 .
Figure 6.The correlation of immune cells and STAD.(A) T cells CD4 memory activated and survival in STAD.(B) Co-expression patterns among fractions of immune cells.

Figure 7 .
Figure 7. Correlation between immune cells and clinical features.

Figure 8 .
Figure 8.The association of immune cells and survival in STAD.(A) Survival probability of high and lowrisk groups (left).AUC curve at 1, 3, 5 years (right).(B) Nomogram for predicting patients' outcome based on immune cells (left).Calibration curves for assessing the discrimination and accuracy of the nomogram (right).(C-D) The association between immune cells and prognostic genes.

Figure 9 .
Figure 9.The association of immune cells and survival in STAD.(A) Heatmap of the immune cells (monocytes and neutrophils) in high-risk and low-risk groups.(B) correlation matrix of immune cells and genes.