Establishment of the prognostic index of lung squamous cell carcinoma based on immunogenomic landscape analysis

The incidence of lung squamous cell carcinoma (LUSC) increased substantially in recent years. Systematical investigation of the immunogenomic pattern is critical to improve the prognosis of LUSC. Based on the TCGA and GEO dataset, we integrated the immune-related genes (IRGs) expression profile and the overall survival (OS) of 502 patients with LUSC. The survival-related and differentially-expressed IRGs in LUSC patients were evaluated by univariate cox regression and LASSO regression analysis. By applying multivariate cox analysis, a new prognostic indicator based on IRGs was established. We also used CIBERSORT algorithms and TIMER database to analyze immune infiltration of LUSC. Both gene set enrichment analysis (GSEA) and principal component analysis (PCA) was carried out for functional annotation. With the assist of computational biology, we also investigated the latent properties and molecular mechanisms of these LUSC-specific IRGs. We analyzed the correlation between immune checkpoints and risk score. A novel prognostic model was established based on 11 IRGS, including CXCL5, MMP12, PLAU, ELN, JUN, RNASE7, JAG1, SPP1, AGTR2, FGFR4, and TNFRSF18. This model performed well in the prognostic forecast, and was also related to the infiltration of immune cells. Besides, the high-risk groups and the low-risk groups exhibited distinct layout modes in PCA analysis, and GSEA results showed that different immune status among these groups. In summary, our researches screened out clinically significant IRGs and proved the significance of IRG-based, individualized immune-related biomarkers in monitoring, prognosis, and discern of LUSC.

was included in the treatment guidelines for multiple cancers [10,11]. T cell is an important component of tumor immunity [12]. The standard treatment of immunotherapy is to promote T cell functionality in tumors [13], and the studies on immunotherapy focus on the recruitment of cancer-infiltrating T cells [14]. In lung cancer, cancer-infiltrating CD4 + T cells have a vital impact on the immune response [15]. CD4 + T cells were reported to recruit CD8 + T cells to the tumor site [16] and infect mucosa [17]. In addition, they were necessary to inhibit angiogenesis at the tumor sites [18,19]. Recently, several immune checkpoint inhibitors were found to enhance cytotoxic competence by targeting PD-1 ligand 1 (PD-L1), cytotoxic T lymphocyte antigen-4 (CTLA-4), and programmed cell death protein 1 (PD-1). They also had significant clinical effects on LUSC [20]. PD-1 antibodies, Nivolumab and Pembrolizumab, as well as PD-L1 antibody Atezolizumab, were allowed for NSCLC therapy [21,22]. With the development of immune therapy, the relationship between immune cell and tumor has become a hot topic [23,24]. The prognostic value of IRGs was comprehensively explored to utilize personalized immune signals for optimal prognostic evaluations in non-squamous NSCLC patients [25]. However, the prognostic significance and clinical correlation of IRGs in LUSC remain to be explored.
We combined clinical information with IRG expression profiles to evaluate the OS of LUSC patients. The prognostic landscape and expression status of IRGs were systematically analyzed, and individual prognostic characteristics for patients with LUSC were developed. We found that 11 IRGs were significantly correlated with prognosis, and established a new independent prognostic model based on these genes. This model also well predicted immune cell infiltration in LUSC. Our study provided a potential model and biomarkers for further immune-related work and personalized medicine for the treatment of LUSC.

Data collection and processing
The RNA-seq FPKM data of LUSC, containing corresponding clinical data, were downloaded from the TCGA [26], which included 502 LUSC tissues and 49 normal tissues. The dataset (GSE73403) on LUSC with survival data was downloaded from the GEO database as a validation set. This dataset contained 69 tumor samples. IRG list in the ImmPort database has been exported [27]. These genes have been identified as active participants in immune processes. We then screened the immune genes shared by TCGA and GEO datasets. The differentially expressed genes (DEGs) in LUSC and its adjacent normal tissues were analyzed by the R software limma package. The log2 | fold change | > 1 and false discovery rate (FDR) < 0.05 were set as the cut-off value.

Gene ontology and KEGG pathway analysis
To verify whether the DEGs were related to immune, GO and KEGG enrichment analysis were used. First, the org. Hs.e.g.db package was used to convert the gene symbol into entrezID. Then, GO and KEGG enrichment analysis were performed using the clusterProfiler package. P < 0.05 was considered as statistical significance. Finally, the GOplot package was used to draw the bar chart of GO and circle diagram of KEGG.

Univariate COX analysis and LASSO analysis
To get survival-and immune-related genes, we integrated the expression of IRGs with the OS of LUSC patients. IRGs were then analyzed by univariate COX regression analysis with continuous variables (P < 0.05). These survival-and immune-related genes were integrated into least absolute shrinkage and selection operator (LASSO) regression, which was calculated by the glmnet package of R software with 1000 runs [28].

Survival analysis
The patients whose follow-up time was less than 30 days were removed from the survival analysis. 470 patients were analyzed. Multivariate survival analysis was performed to inspect the overall effect of the IRGs on prognosis using the R software survival package. Finally, the prognostic model of LUSC was established based on the multivariate co-efficiency multiplied by expression data. The formula was as follows: The survminer package of R software was used to apply the Kaplan-Meier curve to investigate the connection amid IRGs and prognosis. Univariate analysis and multivariate analysis were used to explore independent prognostic factors of LUSC patients. Survival ROC R Software package was used to calculate the area under the curve (AUC) to verify the manifestation of prognostic characteristics [29]. In addition, we drew a nomogram including the clinical factors and risk scores. The calibration curve and ROC curve were painted to illustrate the accurateness of this model in predicting the survival of LUSC patients.

Validation of the immune-related genes
To investigate the expression of IRGs in distinct cancers, the Oncomine database was utilized to analyze the Risk score = αgene(a) × gene expression(a) + αgene(b) × gene expression(b) + · · · + αgene(n) × gene expression(n). expression levels of the hub gene in tumor tissues and normal tissues. The Human Protein Atlas database was used to verify the protein function of IRGs by immunohistochemistry. The correlations between IRGs and clinical factors were also analyzed.

Molecular characteristics of immune-related genes
We recognized the mutations in IRGs through Cbioportal database [30][31][32]. Cistrome cancer database is a resource for predicting transcription factor (TF) targets and enhancers in cancer [33]. We downloaded tumorrelated TFs from this database and acquired differential expression TFs (log2 | fold change | > 1 and FDR < 0. 05). Through correlation analysis (corFilter > 0.4 and P < 0.001), the association between IRGs and TFs were established. Then we utilized Cytoscape to visualize this relationship.

Analysis of the difference between high-risk and low-risk patients
We built the immune-related gene-based prognostic index (IRGPI) on the basis of the multivariate cox regression coefficient multiplied by expression data. According to the median PI value, the patients were classified into the high-risk group and low-risk group. Principal component analysis (PCA) was utilized to analyze the grouped samples and expression patterns and gene set enrichment analysis (GSEA) was carried out to evaluate different functional phenotypes between low-risk and high-risk groups [34].

Analysis of the relationship between immune cell infiltration and immune-related genes
Based on all the genes of two cohorts, we used the CIB-ERSORT software package to evaluate the proportion of 22 leukocyte subtypes. The perm was set to 1000. The samples with P < 0. 05 in the results of CIBERSORT analysis were delivered for further investigate. Mann-Whitney U test was performed to contrast the difference of similar leukocyte subtypes between the low-risk group and the high-risk group. In addition, we also used TIMER database to calculate the relationship between IRGs and  immune cell infiltration. TIMER reanalyzed gene expression data to assess the infiltrating levels of 6 immune cell subtypes, including CD4 + T cells, B cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells. Therefore, it could be utilized to validation the connections between hub IRGs and immune cell infiltration. To explore the relationship between IRGs and immune checkpoints, we calculated the correlation between IRGs and 30 immune checkpoint genes.

Identification of differentially expressed IRGs
According to the list of IRGs from the ImmPort database, 355 differentially expressed IRGs were identified, containing 135 upregulated and 220 downregulated ( Fig. 1a, b). The results of GO analysis and KEGG analysis confirmed that the differential genes were related to immune (Fig. 1c, d). Through univariate COX regression analysis, 42 differentially expressed IRGs (P < 0.05) were notably associated with clinical outcomes. Then we used LASSO regression analysis to select these survival-related IRGs. As illustrated in Fig. 2a, b, Twentyone IRGs were involved in the classifier.

Transcriptional factor regulatory network
To investigate the latent regulatory mechanism of these IRGs expressions, we obtained differentially expressed TFs between LUSC and normal tissues using data downloaded from the Cistrome database. 111 differentially expressed TFs were identified (Fig. 2c, d). We built an interaction network on the basis of these 111 TFs and the 42 IRGs. The regulatory map showed the relationships between these IRGs and TFs (Fig. 2e).

Evaluation of clinical outcomes
On the basis of IRGs, we built a prognostic model through multivariate cox regression analysis. As shown in Fig. 3a, the hazard ratio of most genes was greater than 1, indicated that the high expression level of these genes implied poor prognosis. With the increased in the risk score, the number of deaths was also increased (Fig. 3b). The calculation formula of the risk score was shown as follows: According to this immune-related biomarker, the clinical outcome of high-risk and low-risk groups could be well distinguished in training and validation sets (Fig. 3c, d). The AUC value of ROC curve was 0.692, 0.702, 0.656 and 0.655, 0.551, 0.713 in training and validation set, respectively, which indicated that the prognostic features based on IRGs have moderate potential in survival monitoring (Fig. 3e, f ). Univariate and multivariate Cox regression analysis showed that the prognostic indexes were independent predictors after  adjusting for other parameters, for example, gender, tumor stage, age, metastasis, and lymph node ( Table 1).

Characteristics of survival-related IRGs
Identified survival-related IRGs have outstanding biomarker capacity and could be used to monitor prognosis. Most of these core IRGs were upregulated in LUSC samples (Table 2), and most of these genes were risk factors. While TNFRSF18 was defined as positive effectors. In the genetic alterations of these IRGs, deep deletion and amplification were the most common forms (Fig. 4a). CXCL5, PLAU, and FGFR4 was the gene that had the most genetic alternations. SPP1, PLAU, JUN, JAG1, CXCL5, and AGTR2 existed mutations in the protein functional domain (Fig. 4b), and these IRGs mutations may affect the prognosis of LUSC patients (Fig. 4c).

Clinic correlation and nomogram of immune-related genes
The ggpubr package was applied to explore the connection of IRGs and clinical factors (Table 3). Except for CXCL5, JAG1, and SPP1, the rest of the IRGs were related to clinical factors. At the same time, we utilized IRGs together with clinical factors to draw a nomogram (Fig. 5a) and the calibration curve was drawn to verify the accuracy of the prediction model ( Fig. 5b-d). The predicted value fits well with the real value, suggesting that our model might be applied to prophesy the prognosis of LUSC patients. ROC was performed to measure the clinical effectiveness of the nomogram. For the 1-, 3-, and 5-year OS probability, the ROC curve showed that the combination of IRGs and other clinical factors were better than the model built only by IRGs (Fig. 5e).

Validation of the immune-related genes
Based on the HPA database, the function of IRGs was verified at the protein levels by immunohistochemistry (Fig. 6a). The results were accordant with our preceding research. CXCL5, ELN, JUN, and FGFR4 were highly expressed in normal tissues, while PLAU, RNASE7, JAG1, SPP1, and TNFRSF18 were highly expressed in tumor tissues. Oncomine analysis of tumor and normal tissues (Fig. 6b) showed that the expression patterns of some IRGs in LUSC were different from those in other tumors.

Immunocyte infiltration in the tumor microenvironment
To understand whether the immune genome exactly mirrored the condition of the LUSC immune microenvironment, we analyzed the connection between IRGs and immune cell infiltration by CIBERSORT algorithm and TIMER database. The proportion of 22 kinds of immune cells in LUSC was calculated through CIBERSORT algorithm. We found that resting memory CD4 + T cell, M0 macrophage, M2 macrophage, and neutrophil infiltration levels were higher in high-risk group. While T cell follicular helper, and CD8 + T cells were higher in low-risk group (Fig. 7a). The TIMER database was also applied to investigate the connection of the IRGs and immune cell infiltration. CD4 + T cells, CD8 + T cells, dendritic cells, neutrophils, and macrophages were positively related to IRGs ( Fig. 7b-g).

Immune status analysis for high-risk and low-risk groups
To study whether the LUSC patients could be distinguished properly based on our prognosis model, PCA analysis was utilized to explore the distinct distribution modes between the high-risk groups and low-risk groups. According to risk genes, the high-risk and lowrisk groups tend to be divided into two aspects (Fig. 8a). Based on the whole-genome sets and whole-IRGs, highrisk and low-risk groups did not show significant separation in immune status (Fig. 8b, c), while the model based on our risk genes could well distinguish the difference of immune status between high-and low-risk groups. The GSEA further validated the functional annotations and found that the high-risk groups had more immune responses than the low-risk groups (immune system process, NES = 2.08, FDR = 0.002; immune response, NES = 2.16, FDR < 0.001; Fig. 8d, e). These results were consistent with the results of immune cell infiltration, indicating that the high-risk scores were correlated with the enhanced immunophenotype.

Discussion
The importance of IRGs in cancer deterioration and immunotherapy has been accepted, but overall genomewide analysis is still to be investigated to explore the molecular mechanism and clinical significance. Our researches revealed the effects of IRGs on LUSC clinical significance and elucidated the molecular characteristics. these IRGs might be valuable clinical indicators. Personalized immune-related prognostic characteristics on the basis of selective, differentially expressed IRGs were raised to evaluate potential clinical outcomes and measure immune cell infiltration.
To establish a suitable and simple scheme to observe the immune status of LUSC patients and imply clinical outcomes, we built an IRGs-based prognostic index. With the consequences of multivariate regression analysis, the prognostic indexes based on 11 IRGs (CXCL5, MMP12, PLAU, ELN, JUN, RNASE7, JAG1, SPP1, AGTR2, FGFR4, and TNFRSF18) were established. Patients with high-risk values have a bad prognosis, whose survival time was shortened with increased risk values. Moreover, univariate COX and multivariate COX regression analysis illustrated that the prognostic signature based on these IRGs might be applied as independent prognostic factors. We also constructed a nomograph composed of IRGs and other clinical factors to predict the OS. Our studies suggested that IRGs could be used as prognostic markers and indexes of immune status.
The TF-IRG regulatory network based on CHIP-SEQ and co-expression will assist to guide and inform the analysis of future mechanisms. The mechanism and function of RNASE7, and ELN had not been reported in lung cancer. The other 9 IRGs, including MMP12, PLAU, JUN, TNFRSF18, JAG1, FGFR4, AGTR2, CXCL5, and SPP1. CXCL5, FGFR4, and PLAU were reported to be a potential prognostic factor in LUSC [35]. Ella et al. suggest that overexpression of MMP12 promotes invasion and migration of lung cancer cells [36]. Chang et al. found that the ectopic expression of JAG1 in lung cancer cells enhances cell migration, invasion and metastasis in vivo and in vitro [37]. It was reported that SPP1 could not only be used as a prognostic biomarker of lung cancer but also play a role in mediating macrophage polarization and immune escape [38,39]. In small cell lung cancer, TNFRSF18 has been found to bind to its receptor and induce apoptosis [40]. The complex of dTAT-AGTR2-Ca2 + could inhibit the growth of Lewis lung carcinoma in mice [41]. Yang et al. discovered that JUN played a role in the inhibition of growth and apoptosis of NSCLC by PS-341 [42]. However, these studies offered a finite message on the mechanism of 11 IRGs in the survival of LUSC patients.
To establish a suitable and simple scheme to observe the immune status of LUSC patients and imply clinical outcomes, we established an immune-based prognostic index. Shi et al. investigated DNA methylation profiling and put forward potential diagnostic biomarkers for LUSC [43]. Chen et al. investigated the roles of IRGs in deterioration of lung cancer and indicated the distinct between LUAD and LUSC from the perspective of the immune response [44]. Several researchers have put forward a prognostic marker for survival prediction in patients with LUSC [45][46][47]. Our study suggested that IRGPI could be used as a prognostic marker. In addition, it could also be utilized as an index of immune status.
On the basis of 11 IRGs in LUSC, this prognostic indicator showed satisfactory clinical feasibility. The tool could adjust the treatment plan relatively quickly according to the level of immune cell infiltration reflected by IRGPI. The results of TIMER database showed that these genes were positively related to the infiltration of  Fig. 8 The high-and low-risk groups showed different distribution patterns and gene-set enrichment analysis. a PCA of the high-and low-risk groups based on the 11 risk genes. b PCA of the high-and low-risk groups based on the whole immune-genome set. c PCA of the high-and low-risk groups based on the whole genome set. d Immune system process and e immune response enrichment analysis by GSEA CD4 + T cells, CD8 + T cells, dendritic cells, neutrophils, and macrophages. However, the results of CIBERSORT showed that CD8 + T cells infiltrated more in the low-risk group, which was different from the TIMER database, resulting from the difference between the 2 algorithms. TIMER quantified 6 kinds of immune cells, but it was different from CIBERSORT (CIBERSORT analysis: the total proportion of 22 kinds of immune cells added to 100%). TIMER did not standardize the predicted value to 1. Therefore, the results could not be interpreted as cell fraction or compared in different data sets. The imbalance of immune cell composition was associated with the survival rate and bad prognosis of cancer patients [48]. Our findings expanded and confirmed the discovery that the infiltration level of the immune cells was crucial for the progression of LUSC. The result indicated that these IRGs had the capacity to be a predictor of increased infiltration of immune cells, which was consistent with previous reports. Nevertheless, the function of the infiltration immune cell of LUSC was still unknown. Our initial observations offered an opinion for investigating this issue, and further study was required in the future.
Interestingly, we found that the risk score was not only associated with immune cell infiltration, but also correlated with the expression of immune checkpoint genes. CD276, PDCD1LG2 (PD-L2) were a member of the B7 transmembrane glycoprotein family, and their expression were associated with poor prognosis and tumor immune escape of NSCLC [49][50][51]. VTCN1 also belonged to the B7 family, but its expression was not related to B cell and T cell infiltration in lung cancer [52]. It was reported that TNFSF14, a member of Tumor necrosis factor superfamily, played an important role in Osteolytic Bone Metastases of NSCLC patients [53]. HAVCR2, also known as TIM-3, was mainly distributed in NK cells and macrophages in NSCLC, which could suppress anti-tumor immunity [54]. Franz et al. found that VSIR was related to the increase of lymphocyte infiltration in tumor microenvironment, specific gene mutation and prognosis of NSCLC patients [55]. The function of NT5E was hydrolyzed Fig. 9 Immune checkpoint. a Heatmap of 30 immune checkpoints between low-and high-risk groups. Analysis of the correlation between risk score and immune checkpoints. b VTCN1. c ENTPD1. d ICOS. e TNFRSF18. f VSIR. g CD276. h NT5E. i PDCD1LG2. j TNFSF14. k HAVCR2 extracellular nucleotides, overexpression of which gene could inhibit anti-tumor immune response and contribute to proliferation, angiogenesis, and metastasis [56]. With the increase of risk score, the gene expression in most immune checkpoints increased, which was consistent with the poor prognosis of patients in the high-risk group. Meanwhile, the treatment of these immune checkpoints may improve the prognosis of LUSC patients.
However, the current study had some shortcomings, which ought to be taken into consideration when explaining our results. First of all, transcriptome analysis could only reflect certain aspects of the immune state, but not global changes. Secondly, the verification with another independent queue was lacked. At the last, our results also required validation of in vivo and in vitro experiments.

Conclusion
On the basis of gene sets downloaded from the TCGA and GEO database, we utilized LASSO regression analysis and univariate cox regression analysis in R to screen IRGs associated with the prognosis of LUSC patients. A prediction model was constructed based on 11 IRGs (CXCL5, MMP12, PLAU, ELN, JUN, RNASE7, JAG1, SPP1, AGTR2, FGFR4, and TNFRSF18). This model also well predicted immune cell infiltration in LUSC. Our study provided a potential model and biomarkers for further immune-related work and personalized medicine for the treatment of LUSC.