Deciphering cell diversity and differentiation states in colon cancer
To explore the cell diversity and differentiation status in colon cancer patients in-depth, we conducted a comprehensive single-cell cluster analysis. Firstly, we collected GSE132465 data, retaining only the tumor samples. After quality control of the single-cell data, we obtained 47,285 cells. Subsequently, we selected the cells with the top 2,000 expression variability in the dataset and performed PCA analysis. We retained the top 20 principal components for clustering and subclustering, resulting in the classification of cells into 14 subclusters, as depicted in Fig. S1. Subsequently, differential analysis was performed on the 14 subgroups to identify the highly expressed genes in each cell subgroup (Fig. 1A). Based on previous reports11, we annotated the 14 subpopulations into 6 categories using cell-specific marker genes, namely epithelial cells, endothelial cells, myeloid cells, T cells, B cells, and fibroblasts. The expression of the respective marker genes (Fig. 1B). To demonstrate the spatial relationship of the 6 subpopulations more clearly, we calculated the spatial coordinates of each cell using the t-SNE algorithm (Fig. 1C). Subsequently, the marker genes of each subpopulation were displayed based on their expression levels (Fig. 1D).
Single-cell data analysis revealed abundant infiltration of immune and non-immune cells in the tumor tissues, with the largest proportion being immune cells, specifically T cells and myeloid cells. To further investigate the role of immune cell heterogeneity in colon carcinogenesis, we subdivided T cells and myeloid cells into subpopulations based on previously reported cellular marker genes. Firstly, T cells were classified as CD8+ T cells based on high expression of CD8A and CD8B, while cells were classified as CD4+ T cells based on high expression of IL7R and CD27. Subsequently, T cells were further subdivided based on the highly expressed genes of the respective subpopulations, resulting in the classification of a total of 20 T cell subpopulations (Fig. 2A). Similarly, according to previous literature, myeloid cells were classified into macrophages, monocytes, mast cells, dendritic cells, and neutrophils based on high or low expression of LYZ, CD68, CD163, CST3, and CSF3R. These myeloid cell populations were further subdivided into a total of 20 subpopulations based on the highly expressed genes of each cell type (Fig. 2B). Next, to analyze the immunosuppression-related status, we examined the expression of 79 immune checkpoint-related genes based on previous literature. Initially, we observed that T cells could be classified into CD4+ and CD8+ T cells based on the expression of immune checkpoint genes. However, we noted exceptions in individual cell subpopulations CD4-C01 and CD8-C16, which exhibited high expression in most cases, except for CD4-C01 and CD4-C13, which displayed partially high and low expression levels in the majority of other CD4+ T cells. This observation suggests that CD8+ T cells are primarily responsible for tumor immunosurveillance within our immune microenvironment. Notably, the CD8-C20 and CD8-C06 subpopulations of CD8+ T cells exhibit prominent expression of specific ligand genes, including HLA, BTNL, and LAG3. Our findings indicate that these highly expressed genes play dual roles in both immunosuppression and immune activation. Meanwhile, CD8-C10 exhibits particularly high expression of the immune checkpoint-associated receptor KIR, which belongs to the class of immunosuppressive receptors. This suggests that our C10 cells may be driving the immunosuppressive function in the microenvironment (Fig. 2C). On the other hand, macrophages and monocytes among myeloid cells exhibited a more activated state compared to other subpopulations. Specifically, the Macro-C5 subpopulation predominantly expressed high levels of immune activation-related receptors such as TBFRSF4, PDCD1, TNFRSF18, KIR2DL4, CTLA4, and CD27. Additionally, the Mono-C3 and Mono-C5 subpopulations of monocytes displayed high expression of immune ligands, including genes from the HLA family and CD86, among others. Notably, these genes play bidirectional roles in immune regulation, suggesting that the C3-C5 subpopulations are primarily involved in immune homeostasis within the microenvironment (Fig. 2D).
Exploring functional diversity and differentiation trajectories of immune cell subpopulations in colon cancer tumor microenvironment
To investigate the functions of individual subpopulations of T cells and myeloid cells in the tumor microenvironment, we initially computed the mean expression levels of genes within each cell subpopulation. Subsequently, we employed the GSVA algorithm to calculate pathway scores for HALLMARK pathways within each cell subpopulation. Our investigation revealed a prevalent state of heightened activity in T cells across pathways associated with Fatty Acid Metabolism, Adipogenesis, Estrogen Response Latency, and Glycolysis. Conversely, CD8-C20, CD4-C08, and CD4-C09 exhibited consistently low activity across all pathways. In contrast, CD4-C01, CD8-C06, and CD4-C07 demonstrated notable levels of activity, particularly CD4-C01 displaying elevated activity in pathways linked to Kras Signaling Upregulation, Early Estrogen Response, and Myogenesis. Additionally, CD8-C06 exhibited activity in pathways related to Unfolded Protein Response, Mtorc1 Signaling, and Apoptosis. CD4-C07 exhibited similar characteristics to C06 and demonstrated activity within the Cholesterol Homeostasis pathway. Myeloid cells exhibited heightened activity in pathways associated with the Apical Surface, Hedgehog Signaling, and Apical Junction, while Neu, Mast-C2, Mono-C6, and Mono-C5 displayed diminished activity levels (Fig. 3A). Conversely, all macrophage subpopulations displayed significant activity in inflammation-related pathways, such as the Inflammatory Response, Reactive Oxygen Species, and TNF-α Signaling via NF (Fig. 3B). These findings strongly suggest their involvement in tumor cell development.
To deepen our comprehension of the interrelationships among cell subpopulations, we conducted the cell trajectory analysis as proposed, aiming to discern potential differentiation trajectory associations between cells. Initially, the algorithm identified five distinct differentiation states within the T cell subpopulations (Fig. 3E). Due to the inability of the computer to ascertain the initial differentiation point of the cells, it differentiated the distance of each cell subpopulation along the differentiation trajectory. Thus, only the distribution of possible differentiation timelines for individual cells is simulated, and the location of differentiation initiation cannot be determined (Fig. 3D). Notably, one end predominantly featured the CD8-C05 and CD8-C06 subpopulations, with CD8-C06 being more prevalent, while the other end primarily comprised CD8-C06 subpopulations (Fig. 3C). In addition to the preceding functional analysis, over-representation of CD4-C01 and CD4-C14 subpopulations was observed at the opposite end (Fig. 3C). The preceding findings suggest that while the differentiation trajectory did not distinctly delineate the 20 identified subpopulations, there exist cells exhibiting diverse differentiation trajectories. To elucidate the functional disparities between cell populations situated at the opposite poles of differentiation, we conducted KEGG and GO database enrichment analysis on the top 100 differentially expressed genes within each subpopulation. Our findings reveal that the CD4-C01 subpopulation primarily engages in pathways associated with protein digestion and inflammatory modulation, such as Regulation of Endopeptidase Activity, Regulation of Inflammatory Response, and the Tumor Necrosis Factor-Mediated Signaling Pathway. The CD4-C14 subgroup predominantly engages in protein folding-related pathways, such as the response to unfolded proteins, correction of topologically incorrect proteins, and various other pathways (Fig. 3F, G). At the opposite end of the differentiation trajectory, CD8-C05 is predominantly associated with pathways related to immunoregulation, such as Lymphocyte-Mediated Immunity, Peptide Antigen Assembly with MHC Class II Protein Complex, and other pathways. Conversely, CD8-C06 is primarily implicated in protein folding pathways, akin to the CD4-C14 subpopulation. This suggests that cells located at various stages of differentiation exhibit similar gene expression patterns (Fig. 3H, I). Simultaneously, we delineated the differentiation trajectories of myeloid cell subpopulations. Utilizing our algorithm, we categorized the 20 cell subpopulations into three distinct states. Notably, we observed significant spatial overlap between the distribution sites of macrophages and monocytes. Furthermore, there was a notable coincidence between the distribution of Mast-C2 and macrophages. Conversely, mast cells and DC cells exhibited consistent distribution patterns at the opposite end of the differentiation trajectory (Supplementary Fig. 2). The aforementioned findings indicate that T cells, along with macrophages and monocytes, constitute the predominant cell types within the tumor immune microenvironment. Moreover, distinct subpopulations of T cells exhibit diverse functions and exert regulatory control over immune checkpoints. Notably, among these subpopulations, CD4-C1, CD4-C14, CD8-C05, and CD8-C06 cells demonstrate heightened activity.
CD8+-ANXA1hi-T cells are associated with poor prognosis in colon cancer
Our previous investigation delved into the distinct roles played by different subpopulations of T cells and myeloid cells within tumor tissues, within the context of the tumor immune microenvironment. Our findings highlighted certain subpopulations intricately linked with cancer progression. Subsequently, to deepen our understanding of how these cells impact the survival outcomes of colon cancer patients, we conducted an analysis of the differential gene expression profiles associated with each of the 40 subpopulations, focusing separately on T cells and myeloid cells. We selected the top 5 highly expressed genes among the differentially expressed genes in each cell type through our calculations to serve as the set of marker genes for respective subpopulations (Fig. 4A, B, Supplementary 3A, B). The ssGSEA algorithm was subsequently employed to compute the scores of individual cell subpopulations for each sample from TCGA-COAD, GSE17538, and GSE39582 datasets. Subsequently, we categorized the samples into two groups, namely high and low, based on the median scores. Combining this classification with one-way COX analysis of overall survival, we discovered that CD8+-ANXA1hi-T cells, identified by their characteristic score (hereafter referred to as imm-score) (Fig. 4C-E), it is apparent that the observed phenomenon had a notable detrimental effect on the survival outcomes of patients across all three datasets. Concurrently, we conducted survival analysis using Kaplan-Meier methodology for individual cell subpopulations (Supplementary Fig. 4). Consistently, our findings revealed that patients belonging to the high imm-score subgroup exhibited diminished overall survival rates (Fig. 3F-H). The aforementioned findings collectively identify the CD8+-ANXA1hiT cell subpopulation as a significant scoring cell subpopulation in colon cancer. In our investigation, a notable variance in cellular infiltration was observed between the high-scoring and low-scoring groups. Remarkably, the DC-GALhi subpopulation exhibited the highest degree of infiltration in both cohorts, with the low-scoring group exhibiting a notably elevated proportion of this subpopulation, and the proportion of the CD8+-ANXA1hi-T cell subpopulation within the overall CD8+-ANXA1hi-T cell population was relatively low, however, it was evident that the proportion of individuals in the high-scoring group exceeded that in the low-scoring group. Hierarchical cluster analysis was able to partially separate the samples from the high and low score groups, and all of the above results suggest that the imm-score is a poor prognostic feature, and that the survival of patients with a high imm-score is shorter than that of patients with a low imm-score.
A,B Heat map of the top five differential genes in T cells. Gene expression levels are higher the darker the red the higher the expression, and conversely, the darker the blue the lower the gene expression level. C-E COX univariate analysis of the effect of gene scores characterizing each cell subpopulation on patient survival time. F-H Three independent datasets were performed for KM analysis grouped by gene characterization scoring, with red representing the high scoring group and green representing the low scoring group.
Clinical characteristics of the high and low scoring groups
Our prior investigation revealed an association between CD8+-ANXA1hi-T cell subsets and adverse prognosis among patients with colon cancer, particularly evident in those with elevated scores, leading to shortened survival. To provide a comprehensive depiction of the clinical variances between high and low scoring groups, we integrated our analysis with pertinent clinical data sourced from TCGA. In our analysis of cancer clinical staging, we observed a progressive increase in imm-score from stage I to stage IV, with stage I exhibiting the lowest score and stage IV the highest (Fig. 5A). These findings suggest that imm-score may serve as an indicator for determining the clinical staging of patients to some extent. Previously, it was observed that patients exhibiting elevated scores experienced shorter survival durations. Our analysis comparing the survival outcomes between deceased and surviving patients further validated this observation, revealing that deceased patients exhibited higher imm-scores (Fig. 5B). In addition, our findings indicate an increased number of aberrant genomic loci among patients in the high-scoring group (Fig. 5C). This observation implies a potential early cumulative carcinogenic process associated with this group. Additionally, a multitude of clinical features were observed, encompassing cancer cell infiltration into nerves and lymph nodes. Patients exhibiting invasion showed elevated scores compared to those lacking invasion, with notably higher imm-scores evident in individuals with nerve invasion (Fig. 5D). To explore the potential association between high imm-score and shortened patient survival due to cancer recurrence, we analyzed two sets of Bulk RNA-seq sequencing data of colon cancer (GSE17538 and GSE38582). Initially, imm-scores were calculated for individual samples, followed by their categorization into distinct groups. Subsequently, we conducted Kaplan-Meier analysis based on the patients' tumor recurrence timelines. Our findings revealed a significant correlation between high imm-scores and decreased recurrence-free survival (Fig. 5E, F). We found the same trend of elevated imm-score in samples in the presence of tumor cells invading lymphoid tissue (Fig. 5G, p = 0.081).
A Correlation of patient clinical stage with imm-score, yellow represents clinical stage 1, purple represents clinical stage 2, blue represents clinical stage 3, and orange represents clinical stage 4. B Correlation between patient survival status and imm-score, with red representing death and blue representing survival. C Comparison of the number of genomic abnormal loci between high and low imm-score groups. D Association of tumor neuroinvasion with imm-score. E-F Relationship between imm-score and time to recurrence, with red being the high-score group and green being the low-score group. G Difference in imm-score between subgroups with or without lymphatic invasion.
Cancer-associated pathway activation in the high-scoring group
Through our studies, we have observed that patients with high scores frequently exhibit a poor prognosis. Subsequently, to delve into the potential molecular mechanisms underlying this correlation, we aim to investigate the variances in molecular pathways between patients with high and low scores. Initially, the GSVA algorithm was employed to evaluate disparities in pathway scores between the high and low scoring cohorts (Fig. 6A, B). The analysis of KEGG database annotations unveiled significant activation of glycosaminoglycan synthesis, galactose metabolism, and ECM receptor interaction pathways in the high imm-score group, juxtaposed with notable inhibition in the low imm-score group. Conversely, primary bile acids, selenine, histidine, butyric acid, and butyric acid metabolism, along with bile acid metabolism, exhibited pronounced activation in the low imm-score group. The pathways of histidine metabolism, butyric acid metabolism, and fatty acid metabolism exhibited significant activation and inhibition in the high subgroup, as illustrated in Fig. 6A. Furthermore, we utilized the Gene Ontology (GO) database to annotate the disparities in biological processes between the high and low subgroups, including chromosome attachment to the nuclear envelope, negative regulation of astrocyte activation, phosphatidylinositol synthesis, and fatty acid metabolism, as illustrated in Fig. 6A. Phosphatidylinosito 4 kinase activity and NADH regeneration exhibit heightened activity within the higher subgroup, contrasting with UDP-N-acetylglucosamine transmembrane transport, phosphagen metabolic processes, positive regulation of sister chromatid cohesion maintenance, and pH elevation pathways, which display increased activity within the lower subgroup (Fig. 6B). In addition, we conducted a comprehensive analysis of pathways closely associated with tumorigenesis within the HALLMARK dataset for a thorough investigation of their interrelationships. Our findings reveal a significant positive correlation between the imm-score and the DNA repair, NOTCH signaling, and EMT pathways (Fig. 6C). Genomic instability characterized by alterations in the frequency of microsatellite occurrence is referred to as MSI. Microsatellites exhibit a high mutation rate, and MSI is strongly associated with tumors, including colorectal and gastric cancers. The presence of MSI leads to frameshift mutations, resulting in aberrant protein structures encoded by damaged genes, potentially culminating in tumorigenesis in affected cells. MSI constitutes one of the pivotal triggers for colon cancer. Through a comparative analysis of the imm-score between MSI-H and MSS groups, we observed a notably elevated imm-score in the MSI-H group, indicative of a significantly positive correlation between imm-score and MSI status. Furthermore, our findings reveal that patients in the MSI-H/high imm-score subgroup exhibited shorter overall survival, whereas those in the MSS/low imm-score subgroup displayed significantly prolonged survival (Fig. 6D). Furthermore, we examined the genomic data of colon cancer from the TCGA database and found that patients with high imm-scores had higher tumor mutational burden (TMB), and patients with High-TMB & High-imm-score exhibited significantly shortened overall survival (Fig. 6E). On the other hand, we observed that patients with low tumor heterogeneity also had low imm-scores (Fig. 6F, G). Finally, among colon cancer patients, the top 10 genes with the highest mutation frequency were APC, TP53, TTN, KRAS, SYNE1, MUC16, PIK3CA, FAT4, RYR2, and CSMD3, and their gene expression was negatively correlated with patients' imm-scores (Fig. 6H, I). In summary, the imm-score can infer the activity level of tumor-associated signaling pathways in colon cancer patients.
Assessment of immunotherapy efficacy between different imm-score groups
Previously, we observed that cohorts with higher scores exhibited heightened activity in tumor growth and metastatic pathways, correlating with a poorer prognosis. Our aim is to delve deeper into whether the imm-score might serve as a predictive marker for responsiveness to immunotherapy or chemotherapy. Therefore, we calculated the immune phenotype scores for each patient based on the transcriptomic data from TCGA-COAD. Our analysis revealed that in the MHC, CP, AZ, and IPS scores, the high imm-score group exhibited significantly lower IPS scores compared to the low-score group (Fig. 7A-D). This suggests that patients with a low imm-score may achieve better therapeutic efficacy when treated with immune checkpoint inhibitors (ICBs). To validate our findings, we downloaded an immunotherapy cohort from the TIED database and conducted Kaplan-Meier analysis, which showed that patients with high imm-scores had significantly higher relapse rates after ICB therapy compared to those with low imm-scores (Fig. 7E). Additionally, in two other cohorts, we observed that the imm-score of patients could partially predict the effectiveness of ICB therapy (Fig. 7F-G). Finally, we observed that the overall survival time of patients with high scores was significantly shorter after ICB therapy (Fig. 7H, I). In summary, our newly developed algorithm can predict the therapeutic response to ICB therapy by calculating the imm-score of cancer patients. .