Comprehensive analysis of competitive endogenous RNA associated with immune infiltration in lung adenocarcinoma

To identify the prognostic biomarker of the competitive endogenous RNA (ceRNA) and explore the tumor infiltrating immune cells (TIICs) which might be the potential prognostic factors in lung adenocarcinoma. In addition, we also try to explain the crosstalk between the ceRNA and TIICs to explore the molecular mechanisms involved in lung adenocarcinoma. The transcriptome data of lung adenocarcinoma were obtained from The Cancer Genome Atlas (TCGA) database, and the hypergeometric correlation of the differently expressed miRNA-lncRNA and miRNA-mRNA were analyzed based on the starBase. In addition, the Kaplan–Meier survival and Cox regression model analysis were used to identify the prognostic ceRNA network and TIICs. Correlation analysis was performed to analysis the correlation between the ceRNA network and TIICs. In the differently expressed RNAs between tumor and normal tissue, a total of 190 miRNAs, 224 lncRNAs and 3024 mRNAs were detected, and the constructed ceRNA network contained 5 lncRNAs, 92 mRNAs and 10 miRNAs. Then, six prognostic RNAs (FKBP3, GPI, LOXL2, IL22RA1, GPR37, and has-miR-148a-3p) were viewed as the key members for constructing the prognostic prediction model in the ceRNA network, and three kinds of TIICs (Monocytes, Macrophages M1, activated mast cells) were identified to be significantly related with the prognosis in lung adenocarcinoma. Correlation analysis suggested that the FKBP3 was associated with Monocytes and Macrophages M1, and the GPI was obviously related with Monocytes and Macrophages M1. Besides, the LOXL2 was associated with Monocytes and Activated mast cells, and the IL22RA1 was significantly associated with Monocytes and Macrophages M1, while the GPR37 and Macrophages M1 was closely related. The constructed ceRNA network and identified Monocytes, Macrophages M1 and activated Mast cells are all prognostic factors for lung adenocarcinoma. Moreover, the crosstalk between the ceRNA network and TIICs might be a potential molecular mechanism involved.

www.nature.com/scientificreports/ genes and the interactions between protein and genes [6][7][8] . Moreover, studies have recently demonstrated that the ceRNA regulatory networks is a prognostic indicator in different diseases, such as cardiovascular diseases, diabetic cataract, myocardial infarction, and some malignancies, including myeloid leukemia, bladder cancer and pancreatic adenocarcinoma 9,10 .
Recently, studies have identified that the tumor immune microenvironment (TIME) plays a vital role in tumorigenesis, metastasis and could also critically influence therapeutic response. Tumor infiltrating immune cells (TIICs) which are regulated by different cytokines and chemokines consist the main regulators of tumor biology in the tumor microenvironment (TME) 11,12 . Previous study shows that TIICs, especially the tumor-associated macrophage, are associated with oncological outcomes and drug response to Bacillus Calmette-Guerin (BCG) treatment in non-muscle-invasive bladder cancer 13 . Besides, it has also been shown that the TIICs could have effect on the ceRNA network by regulating the expression of non-coding RNA directly or indirectly 14 .
In the current study, we detected the different expression of ceRNA in TCGA lung adenocarcinoma and calculated the TIICs by CIBERSORT algorithm. Then, we built nomograms based on the prognosis model, which can be used to assess prognosis for lung adenocarcinoma patients. Moreover, the correlation analysis between the prognostic members in the ceRNA and prognostic TIICs was performed, our results might render to clarify the mechanism of the occurrence and development of lung adenocarcinoma combined with tumor immune microenvironment regulation.

Methods
Data collection and identification of differently expressed RNAs. We downloaded all the miRNA, lncRNA and mRNA expression profiles and relevant clinical data such as pathological factors, and the survival outcome of the lung adenocarcinoma cohort from The Cancer Genome Atlas (TCGA) (https:// portal. gdc. cancer. gov/), including 497 cancer cases, 54 cases of adjacent normal tissues and 486 cases of relevant clinical data. Another independent validation sets GSE14814 15 with 71 LUAD samples and common clinicopathological characteristics were obtained from the Gene Expression Omnibus (GEO) database (https:// www. ncbi. nlm. nih. gov/ geo/ query/ acc. cgi? acc= GSE14 814).
The differently expressed RNAs were screened by using the R package 'edgeR 16 ' , | Log (fold change) |≥ 1.0 and p-value < 0.05 were set as the cut-off. To visualize the differently expressed RNAs, we used the R package 'ggplot2' .
Construction of the ceRNA network and functional enrichment analysis. The hypergeometric correlation of the differently expressed miRNA-lncRNA and miRNA-mRNA were analyzed based on the starBase 17 (http:// starb ase. sysu. edu. cn/) information by using the R package 'GDCRNATools' 18 , p-value < 0.05 were set as the cut-off. Then Cytoscape 19 v3.8.2 was performed to visualize the ceRNA network of miRNA regulated the lncRNAs and mRNAs. The gene ontology (GO) analysis and the Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed by the R package 'clusterProfiler 20 ' , the false discovery rate (FDR) and p-value < 0.05 was set as the cut-off.
Construction of prognostic prediction model. Univariate Cox hazards regression analysis was used to preliminary screen the candidate genes from the ceRNA network, p-value < 0.05 were set as the cut-off. Lasso regression analysis 21 and multivariate Cox hazards regression analysis 22 were performed to further verify the prognostic value, p-value < 0.05 were set as the cut-off. The multivariate Cox hazards regression coefficients were set as the β, and we calculated each case risk score with the formula: risk score = β gene [1] × Expression gene [1] + β gene [2] × Expression gene [2] + … + β gene [n] × Expression gene [n] . Patients above the median risk score would be divided into the high-risk group, and the rest would be divided into the low-risk group. The ROC curves and K-M survival curves were utilised to evaluate the diagnostic efficacies. Finally, we constructed a nomogram to predict the prognosis of lung adenocarcinoma patients and the calibration curves were utilised to access the accuracy.
Estimation of the immune cells in the lung adenocarcinoma. CIBERSORT algorithm 23 was performed to estimate the component of the total immune cells in the lung adenocarcinoma and adjacent normal tissues, p-value < 0.05 were set as the cut-off. To visualize the composition of immune cells in the case, we used the R package 'corplot' , 'heatmap' and 'viopot' . Correlation analysis. We performed Cox hazards regression analysis and Lasso regression analysis to identify the survival related immune cells, p-value < 0.05 were set as the cut-off. Then we constructed a nomogram to predict the prognosis of the lung adenocarcinoma patients. The ROC curves and calibration curves were utilised to access the accuracy. Correlation analysis was performed for each key gene in the ceRNA network and each survival related immune cell with the Pearson method.

Result
Identification of the differently expressed miRNAs, lncRNAs and mRNAs. All

Identification of key genes and construction of prognostic prediction model. All clinicopatho-
logical characteristics of patients in TCGA lung adenocarcinoma were summarized in Table 1. Univariate Cox hazards regression analysis detected 45 candidate genes from the ceRNA network (Supplementary Table) and the lasso regression analysis filtered 12 genes were reliable for prediction model, p-value < 0.05 set as the cut-off (Fig. 3A,B). Then multivariate Cox hazards regression analysis identified 6 genes (FKBP3, GPI, LOXL2, IL22RA1, GPR37, has-miR-148a-3p) were as the key members for constructing the prognostic prediction model in the in the network, p-value < 0.05 set as the cut-off (Fig. 3C). Based on the model, a nomogram was constructed and a good discrimination of the model was validated by the calibration curves (Fig. 3D,E). Kaplan-Meier curves validated these 6 key genes (Fig. 4A-F) and the model were significantly associated with overall survival, p-value < 0.05 set as the cut-off (Fig. 4G). The ROC curves showed that the area under curve (AUC) at 1 year was 0.677, at 3 year was 0.705, and at 5 years was 0.709 (Fig. 4H). We used GSE14814 lung adenocarcinoma   Fig. 4G). Subgroup analysis classified by patients' age, gender, smoking status or clinic stage were subsequently conducted, the results revealed that the high-risk group in this model were significantly associated with worse prognosis among all the subgroups (Fig. 5A-D). The expression of the 6 key genes (FKBP3, GPI, LOXL2, IL22RA1, GPR37, has-miR-148a-3p) in lung adenocarcinoma patients with different tumor stages, T stages, N stages and M stages (Fig. 5E-J).

Comprehensive analysis of immune cells and prognostic value in lung adenocarcinoma. The
CIBERSORT algorithm estimated the component of immune cells in lung adenocarcinoma (Fig. 6A,B). Cox hazards regression analysis detected 3 immune cells (Monocytes, Macrophages M1, activated mast cells) were associated with the prognosis in lung adenocarcinoma (Table 2), and the lasso regression analysis validated the 3 immune cells were reliable for prediction as an immune cell model, p-value < 0.05 set as the cut-off (Fig. 7A,B). Based on the result, a nomogram was constructed (Fig. 7C). The ROC curves showed that the area under curve (AUC) at 1 year was 0.621, at 3 year was 0.604, and at 5 years was 0.585 (Fig. 7D). The discrimination of the 3 immune cells were validated by the calibration curves (Fig. 7E). The results showed that the AUC for network and network + TIICs is 0.705 and 0.706, respectively (Fig. 8A). Pearson correlation analysis indicated that the FKBP3 was negatively associated with Monocytes (R = − 0.13, p < 0.05) and positively associated with Macrophages M1 (R = 0.12, p < 0.05) (Fig. 8C). The GPI was negatively associated with Monocytes (R = − 0.18, p < 0.05) and positively associated with Macrophages M1 (R = 0.15, p < 0.05) (Fig. 8D). The LOXL2 was negatively associated with Monocytes (R = − 0.12, p < 0.05) and positively associated with activated mast cells (R = 0.11, p < 0.05) (Fig. 8E). The IL22RA1 was negatively associated with Monocytes (R = − 0.11, p < 0.05) and positively associated with Macrophages M1 (R = 0.21, p < 0.05) (Fig. 8F). Moreover, the GPR37 was positively associated with Macrophages M1 (R = 0.12, p < 0.05) (Fig. 8G). The correlation of all key genes and the prognostic immune cells were summarized in the heatmap (Fig. 8B).

Discussion
Lung cancer is a heterogeneous disease which is still a big challenge to treatment. Even a part of patients with early-stage tumors who received surgery ultimately develop recurrence 24 . In the process of tumor occurrence and progression, molecular components play an important role and are generally considered as potential prognostic factors 25 . Tumor infiltrating immune cells have been proved to be very important in the occurrence and development of tumors, and have shown the prognostic value of a variety of malignant tumors 26-28 .  29 . Previous studies have shown that the non-coding RNAs play a vital role in cancer occurrence and prognosis, and it has also been generally recognized that the interactions between lncRNA, miRNA and mRNA could regulate the expression levels of mRNAs and the affect the core protein signals, inducing the changes in physiological functions of cells 14 . In this study, we detected the differently expressed miRNAs, lncRNAs and mRNAs in the TCGA lung adenocarcinoma for the first and constructed a ceRNA network from these differently expressed genes.
To identify the key biomarker for prognosis, we performed Cox hazards regression analysis, Lasso regression analysis and survival analysis. Then, FKBP3, GPI, LOXL2, IL22RA1, GPR37 and has-miR-148a-3p were identified as the prognostic biomarkers that were significantly associated with overall survival of LUAD patients. In previous studies, these 6 key genes have been confirmed to play an important regulatory role in the progression of tumors. Zhu et al. have provided evidence that FKBP3 plays an essential role in proliferation and cell cycle progression of NSCLC and might be valued as a prognostic marker in NSCLC 30 . IL22RA1 is a receptor of interleukin 22, has been reported working as a signal transducer and activator of STAT3 signaling in the malignant transformation, it plays an important role in promoting tumor growth and metastasis and inhibiting cell apoptosis [31][32][33] . Besides, many studies have shown that the abnormal expression of LOXL2 in a variety of cancers is related to epithelial-mesenchymal transition, metastasis, poor prognosis, resistance to radiotherapy and chemotherapy, and tumor progression 34 . Consistent with our result, a previous study demonstrated that LOXL2 levels are positively www.nature.com/scientificreports/ associated with poor prognosis in NSCLC patients. In addition, GPR37 also has been shown as a poor prognostic factor in LUAD patients with TP53 mutation combined EGFR mutation 35 . Next, we estimated tumor-infiltrating immune cells through the CIBERSORT algorithm, and used Cox hazards regression analysis and lasso regression analysis to identify prognostic immune cells. Our results show that the Monocytes, Macrophages M1 and Activated mast cells identified were significantly related with survival. Previous studies proved that Monocytes and macrophages play an important role in the immune system, Monocyte activation and then macrophage differentiation is an important step for immune responses 36 . In the early stages of tumor progression, Monocyte's recruitment can be found in many types of cancer, and Monocytes induce direct killing of malignant cells through cytokine-mediated cell death and phagocytosis 37,38 . Macrophages M1 is one major type cell subtype of tumor-associated macrophages (TAMs) which are an important part of   39,40 . Cytokines secreted by TAMs induce anti-apoptotic programs in cancer cells, such as IL-6 derived from TAMs that mediate the resistance of solid tumors to many chemotherapies via activating TGF-β pathway [41][42][43] . Although Mast cells participate in the maintenance of healthy lungs through innate immunity and adaptive immunity 44 , there is evidence that mast cell activation leads to the proliferation of non-small cell lung cancer cells 45 . According to previous study, Kim et. al. proved that the IL-22 could promote the infiltrating immune cells, including macrophages, and then regulate the inflammatory processes 14 . Our correlation analysis revealed that the correlation between IL22RA1 with macrophage was significantly obvious suggesting that the IL22RA1 might be an immune-related gene in lung carcinoma. In addition, the correlation analysis revealed that the GPR37 were also associated with immune cells. Therefore, our results support that the crosstalk between the ceRNA network and tumor-infiltrating immune cells might play a vital role in the formation and progression of lung adenocarcinoma.
In conclusion, through comprehensive analysis, we constructed a ceRNA network and identified FKBP3, GPI, LOXL2, IL22RA1, GPR37 and has-miR-148a-3p as the prognostic members of the network, and then identified Monocytes, Macrophages M1 and activated Mast cells as the prognostic tumor-infiltrating immune cells, thus constructing two nomograms that can be used as indicators for evaluating the survival of lung adenocarcinoma patients. Moreover, our results also suggest the crosstalk between the ceRNA network and tumor-infiltrating immune cells to explore the molecular mechanisms involved in lung adenocarcinoma.