Introduction

Colorectal cancer (CRC) is the third most common malignancy reported worldwide and is the second leading cause of cancer-related deaths1,2. Early screening, surgical procedure advancements, and multidisciplinary systemic therapies have led to a decline in CRC mortality rates, and have helped achieve a 5-year survival rate of 90% in early-stage patients3,4,5. However, the 5-year survival rate is less than 15% among patients diagnosed with metastasis, and 25% of patients newly diagnosed with CRC are expected to present with metastases at the time of diagnosis6,7. Moreover, the incidence of early-onset CRC, which is defined as CRC occurring in individuals < 50 years of age, continues to increase considerably8,9. CRC is highly heterogeneous, resulting in varied prognoses even among patients exhibiting the same tumor stage, which poses challenges in the clinical management of the patients. The heterogeneous nature of CRC may be associated with differences in genetic mutations, the tumor microenvironment (TME), gut microbiota profile, and metabolism characteristics10,11,12. Therefore, the classification of CRC at the molecular level is urgently needed to improve risk stratification and adopt precision therapies for CRC prevention and management.

Tumor evolution is a consequence of complex interactions between tumor cells and the TME, including immunocytes and stromal cells. Immunotherapy has transformed the traditional paradigm of clinical cancer therapy and exhibits potent anti-tumor efficacy in various cancer types13,14,15,16. Application of monoclonal antibodies to inhibit immune checkpoints has resulted in the achievement of promising and durable responses in multiple cancers17,18,19,20. However, immune checkpoint inhibitors (ICIs) have shown effectiveness in only a few patients with CRC, and only those with the microsatellite instability (MSI) phenotype but not those with the microsatellite-stable (MSS) phenotype respond to PD-1 blockade (objective response rate [ORR], 40% vs. 0%)21,22. The MSI phenotype is represented by enrichment of tumor-infiltrating lymphocytes (TILs) and production of neoantigens, which facilitate immune activation. Moreover, accumulating evidence has shown that tumor mutation burden (TMB), PD-1/PD-L1 expression, and TILs correlate with ICI response23,24,25. However, a lack of thorough understanding of TME complexity has hindered the application of anti-tumor immune therapies. For example, immunotherapies that repress tumor-associated macrophages have been applied in the clinical setting, but minimal response has been observed with monotherapy26. Therefore, comprehensive profiling of the immune molecular characteristics of CRC is warranted.

Owing to technical advancements, several computational methods are available for dissecting tumor heterogeneity. Non-negative matrix factorization (NMF) analysis is a virtual microdissection approach that has been widely applied in numerous research fields, including image representation27, interpretation of multimodal data28, and omics analyses29. NMF can be used in the identification of cancer molecular subtypes based on gene expression profiles through the identification of exemplar genes30,31,32,33. In this study, we used NMF to virtually microdissect the molecular patterns that are closely associated with immune activation in CRC and validated the resulting molecular classification in three independent cohorts.

Methods

Patient selection

A total of 1503 patients with CRC, whose gene expression profiles and clinicopathological characteristics were available, were selected; the relevant data were obtained from public databases. The Cancer Genome Atlas-Colon Cancer (TCGA-COAD) and The Cancer Genome Atlas-Rectal Cancer (TCGA-READ) datasets were downloaded from UCSC XENA (http://xena.ucsc.edu/) and used as the training cohort. Only patients diagnosed with adenocarcinoma (n = 488) were selected for the subsequent analysis. Three additional CRC datasets with the required clinical information were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) and used as an external validation cohort. The three cohorts—GSE39582, GSE14333, and GSE17538—comprised 1015 patients.

Extraction of immune expression pattern

We used the NMF package (v0.23.0) to perform microdissection of the mRNA expression profiles in the TCGA-COAD and TCGA-READ datasets. After testing, it was found that K = 10 was the number of factors that could be applied for good segmentation of TCGA data. To extract data on immune-related NMF factors, ESTIMATE was used to analyze the immune enrichment score in TCGA samples. Statistical analysis revealed that among all NMF factor groups, patients whose data were decomposed into the seventh factor had a higher immune enrichment score; hence, the seventh factor was defined as an “immune factor.”

According to the difference between the load value of the seventh factor and the maximum load of other factors, all genes were sorted from high to low, and the genes with the top 150 weights were considered to be the key factors distinguishing immune and non-immune classes. This classification was achieved through the utilization of the NMF Consensus command of the NMF package with top 150 genes, which was further refined using the multidimensional scaling (MDS) random forest method of the randomForest (v4.6-16) package. The clusterProfiler (v3.14.3) package was used to analyze the function of the 150 immune-related genes.

Further classification of immune classes

Nearest template prediction (NTP) (CMScaller_0.99.2 package) and the single-sample Gene Set Enrichment Analysis (ssGSEA) method in the Gene Set Variation Analysis (GSVA) (v1.34.0) package were used to evaluate gene expression signatures and enrichment of molecular pathways that represented different inflammatory states or immune cells. Through NTP of stroma activation, the immune class data were further bifurcated into immune-suppressed and immune-activated subtypes. DESeq2 software was used to select genes that were significantly differentially expressed between the immune and non-immune groups with padj < 0.05 and absolute value of log2 fold change (FC) > 1. To determine the gene sets and pathways enriched in the immune and non-immune groups, the clusterProfiler (v3.14.3) package was used to conduct Kyoto Encyclopedia of Genes and Genomes and Gene Ontology pathway functional enrichment analysis of the differentially expressed genes (DEGs); simultaneously, all genes were enriched through GSEA using the fgsea (v1.12.0) package.

Validation of immune subtypes in the validation cohort

We screened the top 150 dysregulated genes between the immune and non-immune classes. Similar to the training cohort, the NMF and the ESITIMAT algorithms were adopted to separate the validation cohort samples into immune and non-immune classes based on the expression of the selected 150 genes. Furthermore, NTP was used to further divide the immune group into immune-suppressed and immune-activated subtypes.

Prediction of response to immunotherapy

To predict the responses to ICI therapy in patients with CRC, tumor immune dysfunction and exclusion (TIDE) and SubMap analysis (GenePattern module “SubMap”) were used. SubMap is an algorithm used to assess similarities in gene expression between previously defined immunophenotypes and responders or non-responders to anti-PD-1 treatments in the MD Anderson melanoma cohort.

Correlation of immune class with copy number alterations, mRNA stemness index, neoantigens, TILs, TMB, and mutational genes

Copy number alterations (CNAs) were calculated using GISTIC2.0 from GDAC Firehose (https://gdac.broadinstitute.org). A statistical comparison was then conducted to determine the difference between the immune and non-immune groups in amplification or deletion events at the arm and focal levels. Previously, Rooney et al. calculated the neoantigen number of TCGA tumors, from which the neoantigen number of TCGA-COAD and TCGA-READ could be obtained34. Furthermore, Saltz et al. estimated the abundance of TILs based on the images of hematoxylin and eosin (H&E)-stained sections of 13 TCGA tumor types, which included data on CRC35. Mutation data were collected from TCGA (https://tcga-data.nci.nih.gov), and the TMB was evaluated using the maftools (v2.6.05) package. To obtain data on the significantly mutated cancer genes (p < 0.01), we used the MutSigCV (v1.41) package to analyze the mutation data. Independent tests were then performed to further determine the mutations with significant differences among the different groups (p < 0.05). Finally, the mutation landscape oncoprint was generated using maftools. The mRNA stemness index (mRNAsi) was analyzed through one-class logistic regression, which can help predict stemness in poorly differentiated tumors.

Statistical analysis

Unless otherwise noted, all analyses were conducted using R software (v3.6.1), and the significance level was set at p < 0.05. Normally distributed data were analyzed using the Student’s t-test and analysis of variance, whereas non-normally distributed data were analyzed using the Wilcoxon and Kruskal–Wallis tests. For the analysis of categorical variables, the Fisher's exact and Pearson's chi-square tests were used (ns p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001, or ****p < 0.0001). Kaplan–Meier analysis and the log-rank test were conducted to compare patients’ survival between different immunophenotypes.

Results

Identification of a novel immune-related subtype of CRC

To establish an immune-associated molecular classification of CRC, we selected 488 patients from TCGA for a training cohort and 1015 patients from the GEO for a validation cohort, to dissect gene expression profiles using the NMF algorithm (Fig. 1). The omics sequencing data and complete clinical information of the patients included in this study are presented in Table 1. We first used the NMF algorithm to conduct a virtual microdissection of distinct gene expression patterns in the training cohort. Upon integration with the immune enrichment score36 generated using the ESTIMATE algorithm, the seventh of the 10 expression patterns (NMF factors) was found to be enriched in patients with high immune enrichment scores and exhibited a significantly higher immune enrichment score average than most of the remaining patterns (Fig. 2a and Supplementary Fig. S1). Therefore, this pattern was considered as an immune factor. The top 150 weighted genes in the immune factor were defined as exemplar genes, which represented the immune factor expression pattern. We performed Gene Ontology enrichment analysis for the exemplar genes and observed that immune activation-associated pathways, such as humoral immune response, lymphocyte-mediated immunity, complement activation, and antigen binding, were significantly enriched (Supplementary Fig. S1), indicating the immune-related functions and signaling of the immune factor.

Figure 1
figure 1

Flow chart of the analyses conducted in this study.

Table 1 Summary of the clinicopathological characteristics of four independent colorectal cancer datasets.
Figure 2
figure 2

Identification of novel immune-related classes of colorectal cancer using the non-negative matrix factorization (NMF) approach. (a) Integration of NMF and ESTIMATE algorithms revealed the seventh factor enriched in most patients with high immune enrichment scores. (b) Data on the immune and non-immune classes were refined through Multidimensional Scaling random forest analysis through the top 150 exemplar genes. (c) Heatmap showing the distribution of patients with different NMF factors, immune factor weight, clustering based on exemplar genes, immunophenotypes, and immune enrichment score. (d) Heatmap showing the gene set variation analysis scores of immune-related signatures reported previously between the immune and non-immune classes. The Student’s t-test was used to statistically analyze enrichment scores between the immune and non-immune classes. TLS tertiary lymphoid structure, CYT cytolytic activity score, CMSs consensus molecular subtypes, ISs immune subtypes.

To further stratify immune-related CRC subtypes, consensus clustering based on the exemplar genes was conducted to obtain preliminary classes, and data for the classes were refined using an MDS random forest algorithm (Fig. 2b). This helped identify a novel immune molecular subtype that accounted for 42.8% of the cohort (209/488). This subtype was termed as the “immune class.” The remaining 57.2% (279/488) of the cohort was termed as the “non-immune class” (Fig. 2c). We characterized the function of the immune class using immune-related gene signatures derived from literature and databases (Supplementary Table S1) through GSVA analysis. The immune class showed significantly higher enrichment scores for immune-related signatures than the non-immune class, including immune infiltrating cell-related signatures (T cell, NK cell, B cell, macrophage), as well as cytotoxic effect-related signatures, such as tertiary lymphoid structure (TLS), interferon (IFN) signatures, and the cytolytic activity score (CYT) (all p < 0.05, Fig. 2d). Additionally, we compared the DEGs between the immune and non-immune classes and observed that these DEGs were significantly enriched in immune-activated signatures, such as cytokine-cytokine receptor interaction, Th17 cell differentiation, Th1 and Th2 cell differentiation, and antigen binding (Supplementary Fig. S2). We also compared the difference in enrichment signatures between immune and non-immune classes through GSEA, which revealed significantly upregulated immune response related-pathways in the immune class, such as the intestinal immune network for IgA production, cytokine-cytokine receptor interaction, inflammatory response, IFN gamma response, IFN alpha response, and tumor necrosis factor alpha signaling via NF-kB (Supplementary Fig. S2). These results indicated that the immune class identified depicted evident characteristics of immune activation.

To evaluate the accuracy of the NMF algorithm-generated immune molecular classification, we integrated features of the established immune class with previously reported CRC molecular features. Thorsson et al.37 established a method of pan-cancer immune classification (C1-C6) of solid tumors by conducting an extensive immunogenomic analysis of over 10,000 tumors derived from 33 distinct cancer types. We observed a significant enrichment of C2 (IFN-γ-dominant, 48/190 vs. 24/236, p < 0.01), and a significant decrease in C1 (wound healing, 130/190 vs. 201/236, p < 0.01), in the immune class compared with those in the non-immune class (Fig. 2d). Consensus molecular subtypes (CMSs; CMS1-CMS4) proposed by the CRC Subtyping Consortium are regarded as the most accurate classification for the stratification of CRC cases38,39. We observed a significant enrichment of CMS4 (mesenchymal subtype, 60/190 vs. 43/236, p < 0.01) and a significant decrease in CMS2 (canonical subtype, 62/190 vs. 120/236, p < 0.01) in the immune class compared with those in the non-immune class (Fig. 2d). Collectively, these results suggest that the immune-related classes identified in this study through the NMF analysis may be used to stratify CRC based on the immune molecular characteristics.

Immune class is composed of immune-activated and immune-suppressed subtypes based on different tumor niches

The TME varies in components of immunocytes and cytokines in the tumor niche, which confer anti- and pro-tumor activities during cancer progression. Moffitt et al. proposed stromal activation as a signature for the categorization of pancreatic ductal adenocarcinoma into normal and activated stromal subtypes, which was successfully applied to provide decision support and tailor therapies in the clinic40. To explore this component in the immune CRC class, we integrated NTP analysis by determining stromal activation signature. We identified 46.9% (98/209) of the patients in the immune class that lacked stromal enrichment, whereas the remaining 53.1% (111/209) had a relatively higher stromal enrichment scores (Fig. 3a). TGF-β is regarded as the pivotal regulator of immune suppression within the TME and has been reported to play roles in tumor immune escape and poor response to ICI immunotherapy41,42. Notably, we observed that multiple TGF-β signatures, such as WNT/TGF-β and TGF-β response signatures of fibroblasts (Fibroblast-TBRs), TGF-β response signatures of T cells (T cells-TBRs), late TGF-β, and extracellular matrix cytokines (C-ECM) were all significantly enriched in the stromal-activated class than those observed in the remainder of the immune class (all p < 0.05, Fig. 3a). We also observed that CMS4 subtypes and PD-1 signaling were higher enriched in the stromal-activated class than the remainder of the immune class (Fig. 3a). Therefore, we defined the stromal-activated class as the “immune-suppressed subclass” of the immune class. The remaining cases, which lacked the TGF-β and PD-1 signatures, were defined as the “immune-activated subclass.”

Figure 3
figure 3

Identification of the immune-activated and immune-suppressed subclasses of colorectal cancer based on immunosuppressive signaling. (a) Heatmap showing the gene set variation analysis scores of immune-related signatures reported previously and the predicted result of the signature calculated by considering nearest template prediction between the immune and non-immune classes. (b) Expression of genes related to immune suppression between the immune-activated and immune-suppressed classes.

Cytotoxic T cells, Th1 cells, and IFN-γ potently eliminate tumor cells, whereas Treg cells, cancer-associated fibroblasts (CAFs), and myeloid-derived suppressor cells (MDSCs) are regarded as immune-suppressive cells43,44. Interestingly, we observed that immunosuppressive Treg cells, tumor-infiltrating Treg (TITR) cells, MDSCs, and macrophages signatures were significantly upregulated in the immune-suppressed subtype (all p < 0.05, Fig. 3a), which confirmed the suppressive status of this subclass. Additionally, we observed that the expression of immune-suppressive genes, such as TGFβ1, TGFβ3, LGALS1, and CCL2, was significantly higher in the immune-suppressed subclass than in the immune-activated subclass (Fig. 3b). Collectively, these results revealed that the immune class consisted of two distinct subtypes based on suppressive signaling.

Reoccurrence of the three immune-related subtypes in other independent cohorts

To validate the recurrence of our established immune-related subtypes, we performed NMF analysis and activated stromal signature-based separation in three CRC cohorts (GSE17538, GSE14333, GSE39582, n = 1015; Table 1). The top 150 dysregulated genes identified between the immune and non-immune classes in the training cohort were considered as the immune classifiers. In the GSE17538 cohort, we identified 50.0% (116/232) of the patients who had a high immune enrichment score and belonged to the immune class, whereas the remaining 50.0% (116/232) were defined as the non-immune class. In the immune class, patients were further divided into immune-activated (40.5%, 47/116) and immune-suppressed (59.5%, 69/116) subclasses (Fig. 4). Similar to the results obtained for the training cohort, the immune class showed higher enrichment scores for the infiltrating immunocyte-related signatures, TLS, CYT, and IFN signatures, than the non-immune class. Accordingly, the immune-suppressed subclass displayed higher scores for stromal activation, TGF-β signatures, and immune-suppressive-associated signatures than the other two classifications. Similar results were also observed in the GSE14333 and GSE39582 cohorts. In GSE14333, 27.9% (63/226) patients were divided into the immune class, including 60.3% (38/63) of the patients grouped under the immune-suppressed subclass, and 39.7% (25/63) of the patients categorized under the immune-activated subclass (Supplementary Fig. S3). In GSE39582, 42.2% (235/557) of the patients were grouped under the non-immune class, whereas 50.9% (164/322) and 49.1% (158/322) of the remaining patients were grouped under the immune-activated and immune-suppressed subclasses, respectively (Supplementary Fig. S4). Genes related to immune suppression were upregulated in the immune-suppressed subclass in all three validation cohorts (Supplementary Fig. S5). Collectively, these results confirmed the stability and robustness of our immune-related CRC molecular classification system.

Figure 4
figure 4

Validation of the immunophenotypes in the GSE17538 cohort.

Immune-related subtypes associated with prognosis outcome and response to ICIs

To explore the clinical implications of this molecular classification, we first evaluated the different prognosis outcomes through survival analysis. The immune-activated subclass exhibited the best overall survival (OS) and disease-free survival (DFS) outcomes, the immune-suppressed subclass showed the poorest prognosis, and the non-immune class exhibited moderate prognosis outcomes in both training and validation cohorts (Fig. 5a). We performed the multivariate analysis to confirm the results of the survival evaluation. The results showed the immune-related classification is an independent factor of OS and DFS (Supplementary Table S2). We further explored the correlation between these groups and clinicopathological characteristics. No significant correlation was observed between the distinct immune subtypes and clinical features, including age, sex, tumor stage, and genetic mutation (Supplementary Table S3).

Figure 5
figure 5

Immune-related molecular subtypes associated with prognosis and response to immune checkpoint inhibitor (ICI) therapy. (a) Kaplan–Meier analysis showing different overall and disease-free survival with three immunophenotypes in the training and validation cohorts. (b) Subclass mapping analysis indicates that patients in the immune-activated subclass show similarity with patients who respond to anti-PD-1 treatment. (c) Tumor immune dysfunction and exclusion analysis showing ICI response ratios among the three immunophenotypes.

To assess the potential applicability of this immune-related molecular classification system in the prediction of response to ICI treatment, we used SubMap analysis to compare the gene expression pattern of our immunophenotypes with those obtained for patients with metastatic melanoma who were subjected to treatment with ICIs. Notably, patients with the immune-activated subclass exhibited a highly similar gene expression profile to melanoma patients who responded to anti-PD-1 immunotherapy in both training and validation cohorts (Fig. 5b). These results imply that patients in the immune-activated subclass may be potential candidates for receiving ICI immunotherapy. TIDE45, which is an effective computational method for predicting response to ICIs, was also used to evaluate the ability of our molecular classification to predict ICI response. We observed that the ratio of ICI response in the immune-activated subclass was higher than that in the other two groups, whereas the immune-suppressed subclass showed the worst ICI responses (Fig. 5c). Taken together, the immune-activated subclass exhibits the best prognostic outcomes and patients grouped under this subclass may benefit more from anti-PD-1 treatment.

Relationship between immune-related subtypes and CNAs, TIL enrichment, PD-1/PD-L1 expression, and cancer stemness

To illustrate the molecular mechanism underlying the different immune phenotypes, we analyzed the relationship between these immune-related subclasses and tumor molecular characteristics. CNAs are associated with distinct molecular characteristics, immunologic phenotypes, and clinical prognosis. Chromosomal instability is a common phenomenon in cancer and usually refers to arm and focal CNAs. Bassaganyas et al. found that tumors with high chromosomal instability exhibited characteristics of immune escape in hepatocellular carcinoma, whereas tumors with low burdens of arm CNAs displayed an immune-activated status, suggesting CNAs could be used to predict response to immunotherapies46. Consistent with the results, we observed that the immune class exhibited a significantly lower burden of copy number amplification than the non-immune class at both arm and focal levels, whereas there was no significant difference based on copy number deletion between the two classes (Fig. 6a, b). Notably, we found that the number of TILs was significantly upregulated in the immune class compared with those in the non-immune class (Fig. 6c). Moreover, we found that the expression of PD-1 and PD-L1 was significantly upregulated in the immune class compared with that in the non-immune class (Fig. 6d, e). Therefore, these well-accepted biomarkers used for predicting ICI response support the potential response of immune class to immunotherapy, suggesting a promising role for the immune-related molecular classification in guiding clinical immunotherapy.

Figure 6
figure 6

Correlation between immunophenotype and genetic characterization. (a, b) Copy number amplification and deletion at the arm and focal levels between the immune and non-immune classes. (c) Tumor-infiltrating lymphocyte count between the immune and non-immune classes. (d, e) PD-1/PD-L1 expression between the immune and non-immune classes. (f) Differentially mutated genes among the three immune subgroups. (g) Tumor mutant burden between the immune and non-immune classes. (h) Neoantigen content between the immune and non-immune classes. (i) Stemness difference represented by the mRNAsi signature between the immune and non-immune classes. Comparisons between groups were conducted using the Student’s t-test, whereas comparisons among three groups were conducted using the Kruskal–Wallis test, t-test, Student’s t-test; K–W test, Kruskal–Wallis test.

Genetic characterization of tumor tissues has confirmed that high TMB and neoantigens, resulting from somatic non-synonymous coding mutations, may elevate tumor immunogenicity and increase susceptibility to immunotherapy47. We observed a different mutation profile among the three immune classifications via Mut-SigCV algorithm analysis (Fig. 6f). Notably, the mutation frequency of ACVR2A and GABRB in the immune-activated subclass was significantly higher than the immune-suppressed subclass and non-immune class (ACVR2A: 10.59% vs. 7.22% and 3.14%; GABRB: 8.24% vs. 3.61% and 1.57%; respectively). More mutations in LRRC7 and CDH10 were revealed in the immune-suppressed subclass than in other classes (LRRC7: 18.07% vs. 9.41% and 5.1%; CDH10: 14.46% vs. 8.24% and 4.71%; respectively). High TMB and neoantigen burden correlated with response to immunotherapy in various solid tumors, whereas this association was uncertain in MSS tumors47. Our study revealed that the TMB and neoantigen burdens were comparable between the immune and non-immune classes (Fig. 6g, h). Miranda et al.48 showed a negative correlation between stemness feature, represented by the mRNAsi signature, and the immune response. Accordingly, the mRNAsi signature expression was significantly lower in the immune class than in the non-immune class, implying that the immune class had reduced stemness and could be more responsive to immunotherapy (Fig. 6i). Collectively, these results indicate that the immune class possesses significantly higher TIL enrichment and PD-1/PD-L1 expression and lower copy number amplification and stemness, which may contribute to its immune response.

Combination of immune-related subtypes and microsatellite status precisely guide immunotherapy strategy

Microsatellite status is an important factor involved in response to immunotherapy in CRC; patients with MSI CRC show a favorable outcome with immunotherapy. While only 40% patients with MSI CRC respond to ICIs, 60% patients with MSI CRC still do not respond to ICIs. Patients with MSI CRC who respond to ICIs and how to improve the response rate of patients with MSI CRC to ICIs are still unknown. We tries to integrate immune-related subclasses in patients with MSI CRC to explore heterogeneity in the effect of immunotherapy in patients with MSI CRC. We identified 72 patients with MSI from the GSE39582 datasets. Based on our established immune-related classification, these 72 patients were divided into non-immune, immune-suppressed, and immune-activated subclasses. Twenty-two (30.6%) patients were allocated to the immune-activated subclass, which exhibited higher immune cell infiltration and immune activation signaling, implying these patients were potential responders to ICI therapy (Fig. 7). Thirty-four (47.2%) patients were allocated to immune-suppressed subclass, which exhibited higher immune suppressive cells and immune suppressive signaling, such as TGF-β, PD-1, PD-L1, MDSC, macrophages, and Tregs (Fig. 7 and Supplementary Fig. S6). We recapitulated these results in the TCGA cohort; 51 patients with MSI CRC were identified, and 25 of them were allocated to the non-immune subclass, 20 patients to the immune-suppressed subclass, and 5 patients to the immune-activated subclass. Similarly, immune suppressive molecules and signaling were higher in the immune-suppressed subclass than in the non-immune and immune-activated subclasses, although no statistical significance was observed owing to small number of patients in the immune-activated subclass (Supplementary Fig. S7). Collectively, these results suggest that patients with MSI CRC allocated to the immune-activated subclass have a “hot” immune status. The tumor may be regressed by single ICI immunotherapy. In contrast, for patients in the immune-suppressed subclass, ICI therapy combined with a TGF-β inhibitor or an agent for the elimination of immune suppressive cells might improve efficacy. Our novel classification not only provides new insights and assists in identifying appropriate candidate patients with MSI CRC who will respond to ICIs, but also, through combination immune-related subtypes and MSI status, may help tailor optimal immunotherapeutic treatment and further improve clinical outcome of patients with CRC on immunotherapy.

Figure 7
figure 7

Validation of the immunophenotypes in patients with MSI CRC in the GSE39582 cohort.

Discussion

Accumulating evidence has demonstrated the essential role of the TME in cancer progression49. However, the immune landscape in CRC remains poorly understood. A thorough investigation of different immune and stromal components in the TME may help understand complex tumor heterogeneity, develop more precise therapies, and refine current immunotherapeutic strategies. Particularly, establishment of an immune-related classification may provide data on more accurate subtypes of patients with CRC to help clinicians select more effective therapies. In this study, we presented a comprehensive immune-related classification of CRC based on the NMF algorithm according to CRC expression profiles. Using the top 150 exemplar genes of the immune pattern for consensus clustering, three immune-related molecular groups were identified, namely immune-activated, immune-suppressed, and non-immune. Molecular classification associated with different immune signaling pathways and immune cell types was established and validated in three independent cohorts. Notably, the immune-related classes correlated with patient prognosis and response to ICI therapy, and this finding might be used to tailor clinical immunotherapy strategies for patients with CRC.

The NMF algorithm, a multiplicative update algorithm, can be used to dissect substantial volumes of data using a limited number of components50. Emerging evidence has suggested NMF to be a promising method for classifying tumor molecular subtypes51. Using the NMF algorithm, we identified an immune pattern in the training cohort. Through consensus clustering of the exemplar genes of that immune pattern, CRC could be distinguished as immune and non-immune classes. Of the 488 patients in the training cohort, 42.8% were grouped under the immune class, which showed higher enrichment of infiltrating immune cells, cytolytic activity, and IFN signaling than the non-immune class. Effective immunotherapy is often limited by the immunosuppressive TME, which includes immunosuppressive cells and cytokines. Calon et al.52 revealed a pro-metastatic program induced upon secretion of IL11 by TGF-β-stimulated CAFs in the tumor niche. Accumulating evidence has suggested that the C-ECM level is upregulated by activated CAFs to trigger the recruitment of immunosuppressive cells53. To investigate the different components in the CRC TME, we further dissected the gene expression profiles of the immune class and divided them into immune-activated and immune-suppressed subclasses based on a stromal-activated signature calculated using the NTP method. Interestingly, 46.9% of the patients in the immune class were assigned to the immune-activated subclasses, which showed lower enrichment of the stromal-related signatures and immunosuppressive cells—such as TGF-β, C-ECM, PD-1, Treg, and MDSC signatures—than that observed in the immune-suppressed subclass. This immune molecular classification was also validated in three independent cohorts. For example, the immune-activated subclass accounted for 40.5% of the patients, whereas the immune-suppressed subclass accounted for 59.5% of the patients in the immune class of the GSE17538 cohort. These results indicate that NMF may be used as a helpful classifier for stratifying immune molecular subtypes of patients with CRC.

Currently, treatment paradigms have been shifting from targeting the tumor cell compartment to focusing on the development of strategies for regulating the TME54. As a smaller number of patients respond to ICI therapy, identification of effective biomarkers for predicting patient suitability to immunotherapy is warranted. Recently, emerged immunological and non-invasive methods, such as liquid biopsy using peripheral T cell receptor repertoire of PD-1+ CD8+ lymphocytes, provide a non-invasive approach for selecting patients with metastatic CRC who could benefit from ICI treatment55. Meanwhile, with the advances in computational methods, such as SubMap and TIDE, a novel algorithm was established to predict ICI therapy response. SubMap analysis is conducted using an algorithm to evaluate the similarity in gene expression profiles between patients in different molecular subtypes and in patients with metastatic melanoma treated with ICIs56. Notably, here, the results of the SubMap analysis showed that patients in the immune-activated subtypes exhibited similar gene expression profiles to those of patients with melanoma who responded to anti-PD-1 immunotherapy, which suggested that patients in the immune-activated subclass could respond to PD-1. Moreover, survival analysis shows that patients in the immune class, especially in the immune-activated subclass, presented with better OS and DFS. These results may have clinical implications in terms of formulating prognosis and treatment decisions.

Genomic studies have revealed that CNAs, TMB, and neoantigens demonstrate a predictive value for ascertaining susceptibility to immunotherapies57. Lower CNAs were linked to the elevated infiltration of immunocytes and cytokines, which implied a higher likelihood of successful ICI therapy58. We analyzed the relationship between the established molecular subtypes and genetic features. We observed that CNA amplification was significantly lower in the immune class than that in the non-immune class. The levels of other canonical biomarkers that could help predict response to ICI, such as TIL and PD-1/PD-L1 expression, were significantly higher in the immune class, which confirmed patients in the immune class could potentially benefit from immunotherapy.

Conclusions

Herein, we proposed a novel immune molecular classifier based on gene expression profiles analyzed using the NMF algorithm. Patients in the immune-activated subclass present with favorable prognosis and may respond positively to anti-PD-1 immunotherapy. This study provides new insights into tumor heterogeneity using an integrated and multifaceted approach to enhance the discovery of clinically important molecular subtypes that respond to immunotherapy.