Development of an Immune Infiltration-Related Eight-Gene Prognostic Signature in Colorectal Cancer Microenvironment

Objective Stromal cells and immune cells have important clinical significance in the microenvironment of colorectal cancer (CRC). This study is aimed at developing a CRC gene signature on the basis of stromal and immune scores. Methods A cohort of CRC patients (n = 433) were adopted from The Cancer Genome Atlas (TCGA) database. Stromal/immune scores were calculated by the ESTIMATE algorithm. Correlation between prognosis/clinical characteristics and stromal/immune scores was assessed. Differentially expressed stromal and immune genes were identified. Their potential functions were annotated by functional enrichment analysis. Cox regression analysis was used to develop an eight-gene risk score model. Its predictive efficacies for 3 years, 5 years, overall survival (OS), and progression-free survival interval (PFI) were evaluated using time-dependent receiver operating characteristic (ROC) curves. The correlation between the risk score and the infiltering levels of six immune cells was analyzed using TIMER. The risk score was validated using an independent dataset. Results Immune score was in a significant association with prognosis and clinical characteristics of CRC. 736 upregulated and two downregulated stromal and immune genes were identified, which were mainly enriched into immune-related biological processes and pathways. An-eight gene prognostic risk score model was conducted, consisting of CCL22, CD36, CPA3, CPT1C, KCNE4, NFATC1, RASGRP2, and SLC2A3. High risk score indicated a poor prognosis of patients. The area under the ROC curves (AUC) s of the model for 3 years, 5 years, OS, and PFI were 0.71, 0.70, 0.73, and 0.66, respectively. Thus, the model possessed well performance for prediction of patients' prognosis, which was confirmed by an external dataset. Moreover, the risk score was significantly correlated with immune cell infiltration. Conclusion Our study conducted an immune-related prognostic risk score model, which could provide novel targets for immunotherapy of CRC.


Introduction
CRC, as a heterogeneous disease, is a common cause of cancer-related deaths worldwide [1]. TNM staging is usually considered to be one of the main tools for CRC prognosis [2]. However, the prognosis varies greatly among CRC patients with the same TNM stage, suggesting that the current TNM stage does not well provide complete prognostic information for CRC patients. Therefore, it is necessary to adopt a new strategy to increase the predictive efficiency of prognosis and survival outcomes of CRC patients.
Due to the considerable heterogeneity between CRCs, determination of the optimal treatment strategy at the individual level faces the large challenges. Thus, it is an urgent need to conduct robust models to identify high-risk CRC patients and to find novel molecular targets. In the tumor microenvironment (TME), stromal and immune cells are involved in the development of CRC [3,4]. Increasing evidence suggests that stromal and immune cells possess critical clinical significance for CRC. It has been reported that stromal cells can contribute to transcriptome and clinical features of CRC subtype [5]. Furthermore, stromal gene expression can more robustly predict the prognosis of CRC subtypes compared to epithelial tumor cells [6]. In a large cohort of CRC patients, infiltrating immune cell data could better predict patients' survival than histopathological methods [7]. Growing studies have found that infiltrating immune cells are involved in chemoresistance [8] and metastasis [9]. Thus, it is essential to further analyze the biological characteristics of stromal and immune genes and to determine their prognostic value for CRC patients. However, there is a lack of stromal and immune scores that can predict CRC patients' prognosis based on multiple clinical factors. Moreover, robust prognostic models on the basis of stromal and immune scores are also lacking.
In this study, we established a reliable prognostic immune-related risk score for CRC. Our results could offer novel insights for prediction of CRC patients' prognosis and development of individualized immunity therapy strategies.  Table 1. Survival information including OS status, OS time, progression-free survival (PFS) status, and PFS time was derived from the pan-cancer on the GDC website (https://gdc.cancer.gov/about-data/publications/ PanCan-Clinical-2018). Furthermore, mutation data (including BRAF, KRAS, and TP53) were from CRC MuTect. An overview of the workflow is shown in Figure 1.

2.2.
Estimation of Stromal/Immune Scores. ESTIMATE algorithm was used to calculate the stromal/immune scores on the basis of unique expression profiles of stromal/immune cells by the ESTIMATE package in R (https://bioinformatics.mdanderson.org/estimate/) [10].

Kaplan-Meier Survival Analysis.
According to the optimal cutoff of stromal/immune scores, CRC samples were classified into high and low stromal/immune score groups. Kaplan-Meier plot of overall survival between the two groups was analyzed, and the results were evaluated by log-rank test.

Correlation between Clinical Characteristics and
Stromal/Immune Scores. To probe into the clinical significance of stromal/immune scores, we analyzed the correlation between clinical characteristics (including pathologic T, pathologic N, pathologic M, and tumor stage) and stromal/immune scores.
2.5. Differential Expression Analysis. Differential expression analysis between high and low stromal/immune score groups was carried out using the edgeR package in R, following the screening criteria of |log2 fold change ðFCÞ | >1 and FDR ðadjusted p valueÞ < 0:05. Then, up-or downregulated stromal/immune genes were intersected by the VennDiagram package in R, respectively.
2.6. Functional Enrichment Analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of differentially expressed stromal and immune genes were carried out through the cluster-Profiler package in R [11]. GO analysis contains three terms, cellular component (CC), molecular function (MF), and biological process (BP). p value after adjustment < 0.05 was significantly enriched.
2.7. Protein-Protein Interaction (PPI) Analysis. PPI analyses of differentially expressed stromal and immune genes were carried out via The Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/; version 11) [12]. Then, the PPI network was visualized through Cytoscape (version 3.7.2) [13].

Univariate and Multivariate Cox Regression Analyses.
Univariate cox regression analysis of differentially expressed  [14]. Spearman's correlation between the risk score and the infiltrating levels of immune cells was evaluated through the psych package in R. Furthermore, we also assessed the correlation between the genes in the risk score and marker genes of immune cells. The strength of correlation followed the criteria: 0:7 ≤ | r | ≥1 suggested a high correlation, 0:3 ≤ | r | < 0:7 suggested a moderate correlation, and 0 < | r | <0:3 suggested a weak correlation [15].

Results
3.1. Immune Score Is in Significant Association with Prognosis and Clinical Features of CRC Patients. According to the optimal cutoff of stromal/immune scores, the CRC patients were divided into two groups. Kaplan-Meier OS analysis results showed that patients with high stromal score had shorter OS time than those with low stromal score; however, it was not statistically significant ( Figure 2(a); p value = 0.057). As depicted in Figure 2(b), we found that patients with low immune score implied a poor prognosis (p value = 0.0094). Furthermore, we analyzed the correlation between stromal/immune scores and clinical features. As depicted in the results, stromal score was not significantly associated with pathologic T (Figure 3

Identification of Differentially Expressed Stromal and
Immune Genes for CRC. We analyzed differentially expressed genes (DEGs) with |log 2FC | >1 and FDR < 0:05 between the high and low stromal/immune score groups. As volcano plots, there were 1197 up-and 28 downregulated stromal genes in the high stromal score group (Figure 4(a)). Furthermore, 899 immune genes were upregulated and eight immune genes were downregulated in the high immune score group (Figure 4(b)). Hierarchical clustering analysis           . We further performed functional enrichment analysis of these common stromal and immune genes. These genes were significantly correlated with inflammatory biological processes like regulation of inflammatory response and pathways such as cytokinecytokine receptor interaction and chemokine signaling pathway (Figures 5(a)-5(d)). As shown in the PPI network, COL6A2, COL6A1, COL5A2, C1S, and C1R were the top five genes, which were considered hub genes ( Figure 5(e)).

3.3.
Construction of an Eight-Gene Prognostic Signature for CRC. Among 738 differentially expressed stromal and immune genes, 23 genes were significantly associated with CRC patients' prognosis according to univariate Cox regression analysis results. Of them, 20 genes were risk factors, and the remaining three (CCL22, CPA3, and MMP1) were protective factors ( Table 2). These genes were used for multivariate Cox regression analysis. Finally, an eight-gene signature was constructed for CRC. The risk score was calculated on the basis of the coefficients and expression values of the eight genes. All CRC patients were divided into two groups in accordance with the median value of risk score ( Figure 6(a)). Heat maps depicted the difference in expression patterns of the eight genes (including CD36, KCNE4, CPT1C, SLC2A3, RASGRP2, NFATC1, CCL22, and CPA3) 1 1 1 1 1 1 1 1 1 1 1 1  COL3A1   WISP1 PRELP DCN

The Eight-Gene Model Is in Significant Correlation with
Immune Cell Infiltration. The correlation between the model and the infiltrating levels of six immune cells was analyzed. We found that the model was in significant association with the infiltrating levels of six immune cells, including B cell (Figure 10 Table 1). These results suggested that the model could be in association with immune cell infiltration.
3.6. Validation of the Risk Score Using an Independent Dataset. Based on 566 CRC samples from the GSE39582 dataset, the prognostic value of the risk score was validated. The risk score distribution and survival status of CRC patients are shown in Figure 12(a). Heat maps showed the expression differences of CD36, KCNE4, CPT1C, SLC2A3, RASGRP2, NFATC1, CCL22, and CPA3 between the high and low risk scores (Figure 12(b)). As expected, CRC patients with high risk score had a poorer prognosis than those with low risk score (Figure 12(c)). Among the eight genes, CD36, NFATC1, and CCL22 were significantly associated with prognosis of CRC patients (Figure 12(d)). AUCs of the model for 3 years and 5 years were 0.65 and 0.66, respectively (Figure 12(e)), indicating that the risk score could well predict CRC patients' prognosis. The expression levels of CCL22 (Figure 13(a)), CD36 (Figure 13(b)), CPA3 (Figure 13(c)), CPT1C (Figure 13(d)), KCNE4 (Figure 13(e)), NFATC1 (Figure 13(f)), RASGRP2 (Figure 13(g)), and SLC2A3     ( Figure 13(h)) between the high risk score and low risk score were validated based on the 566 CRC samples. Univariate Cox regression analysis results showed that age, KRAS mutation, pathologic T, pathologic N, pathologic M, tumor stage, and risk score were notably associated with CRC patients' prognosis. After multivariate Cox regression analysis, we found that age, KRAS mutation, pathologic M, and risk score could be independent prognostic factors for CRC (Table 4).

Discussion
In TME, stromal and immune cells are involved in the development of CRC. In this study; using the ESTIMATE algorithm, stromal and immune scores were calculated. A significant correlation between the immune score and CRC patients' prognosis was observed. Both the stromal score and immune score were in significant correlation with clinical characteristics of CRC patients. Furthermore, we identified differentially expressed stromal and immune genes for CRC. Functional enrichment analysis results suggested that these genes were positively related with immunerelated pathways like cytokine-cytokine receptor interaction [16,17] and chemokine signaling pathway [18,19].

BioMed Research International
Individual prognosis for CRC patients varies widely. Individual genes often cannot accurately predict the prognosis of patients with CRC. Genes in most prognostic risk scores are screened via differential expression analyses [20][21][22]. Yet, there are few prognostic models associated with CRC immune infiltration. Therefore, in this study, we selected eight differentially expressed stromal and immune genes related to prognosis for constructing a risk score model. However, focusing only on CRC-related immune-related genes may limit its clinical value. For this reason, through multivariate regression analysis, after adjustment of the clinical characteristics of CRC, we assessed the association between the risk score and CRC prognosis. The results showed that the model may become an independent prognostic factor for CRC. Our risk score exhibited well efficiency in predicting CRC patients' prognosis. Therefore, the risk score model possessed potential prognostic value, which was confirmed using an independent dataset. Among the eight genes in the model, both in the discovery and independent datasets, CCL22 was a protective factor of CRC, while CD36 and NFATC1 were two risk factors of CRC. However, other genes exhibited inconsistent results in the two datasets. This is partly due to the heterogeneity of the samples in the two datasets. Patients in the same pathological stage have different prognosis. Both in the discovery and independent datasets, CCL22 and CPA3 were lowly expressed and KCNE4, NFATC1, and SLC2A3 were highly expressed in the high-risk samples compared to the low-risk samples. However, there were inconsistent results about other genes between the high-and low-risk samples in the two datasets, partly due to the heterogeneity of the samples, different sequencing platforms, different background correction and normalization methods and so on. Thus, it is unreliable to predict CRC patients' prognosis by an individual gene. However, our risk score composed of these genes may accurately suggest the patient's prognosis.
As described in a previous study, high CCL22 expression was found in CRC tissues [23]. Recent study has found that CCL22 secreted by M2 macrophages could mediate CRC 5-FU-mediated chemoresistance [24]. Furthermore, it has been reported that CCL22 was in significant correlation with the infiltrating levels of different T cell subsets for CRC [25]. Our results showed that CD36 was significantly downregulated in CRC tissues compared to normal tissues, which was validated in vitro and in vivo [26]. Genome-wide DNA methylation analysis revealed that hypermethylation of CD36 could contribute to its low expression [27]. Fang et al. found that CD36 expression gradually decreased from adenoma to cancer and CD36 loss implied a poor prognosis in patients with CRC [28]. NFATC1 was deregulated in CRC tissues, which was consistent with previous findings [29]. In vitro, its overexpression significantly promoted CRC cell invasion and metastasis [30]. Kumar et al. reported that NFATC1 indicated poor survival outcomes of CRC patients [31]. High SLC2A3 expression was observed in CRC tissues and its high expression indicated a poor prognosis, consistently with previous research [32,33]. Furthermore, downregulated CPA3 and RASGRP2 and upregulated CPT1C and KCNE4 were found in CRC tissues, which implied poor prognosis.
As for immune cell infiltration, we found that the eight genes in the risk score model were moderately correlated with the infiltering levels of CD4+T cell, dendritic cell, macrophage, and neutrophil. It has been confirmed that TME affects the efficacy of immunotherapy, and immune cells in TME possess predictive value for immunotherapy treatment [34][35][36]. Increasing genes have been shown to participate in the regulation of immune cells [37][38][39][40]. Therefore, our risk score model could possess potential value to predict CRC    37 BioMed Research International patients' prognosis, and the eight genes could become promising immunotherapeutic targets, which deserve further study.
Our correlation analysis results confirmed that the eight genes in the risk score were distinctly correlated with molecular markers of CRC prognosis. However, our study has several limitations. First, our retrospective study limited the application of this risk score. Second, the heterogeneity of the immune microenvironment would inevitably contribute to result bias. Therefore, it is necessary to validate our findings in a prospective clinical study.

Conclusion
In this study, we conducted an immune-related prognostic model for CRC on the basis of stromal and immune scores.