Cell Type population
We used three datasets, GSE176078, GSE161529, and GSE180878, as described in the Materials and Methods section. We first used HiCAT[16] to conduct cell type annotation for each dataset. It was performed individually for each dataset in three hierarchical levels: major-type, minor-type, and subset. Major-types include epithelial cells, endothelial cells, fibroblasts, and immune cells. Within the immune cell category, further distinctions were made for minor-types, such as, CD4 + T cells, CD8 + T cells, B cells, plasma cells, dendritic cells and macrophages. The subset classification goes deeper into CD4 + T cells, subdividing them into Th1, Th2, Th9, Th17, Th22, and regulatory T cells, while macrophages are subcategorized as M1, M2a, M2b, M2c, and M2d. Lastly, B cells are sub-classified into memory B cells, regulatory B cells, follicular B cells, and marginal zone B cells.
Figure 2(a) shows the UMAP plot for the major cell types. In cancer datasets, epithelial cells were classified as either cancerous or normal. It's evident from the UMAP plot that, unlike other normal cell types, cancer epithelial cells form various clusters, indicating heterogeneity across patients. Figure 2(b) shows the proportion of each cell type relative to the total number of cells in each sample. The data indicates that normal tissues consist predominantly of epithelial cells, while tumor tissues also have large portion of immune cells. The proportion of cancer cells and immune cells varies significantly within each dataset of breast cancer, indicating the presence of different envirornments referred to as "hot tumors" and "cold tumors". Hot tumors are characterized by the activation of immune cells that can recognize and eliminate tumor cells[20]. Figure 2(c) shows the differences of immune cell populations among breast cancer subtypes and their statistical significance using t-tests with Bonferroni correction. Although the difference in immune cell proportion in the three subtypes were not as significant as from normal tissue, the two datasets showed similar tendency.
Identification of tumor cells
The identification of tumor cells was performed based on the cell-types identified by HiCAT. We merged the three datasets into one with cell type annotations and used InferCNVpy package to estimate the copy number variation (CNV) patterns for each cell. Based on the distribution of CNV patterns, tumor cell identification was performed based on the procedure described in the Materials and Methods section.
Figure 3(a) to (d) represent the UMAP plots for the dimension reduced CNV patterns of all cells in the merged dataset with colors corresponding to (a) sample ID, (b) tumor decision, (c) major cell type without tumor cell identification, and (d) major cell type with tumor cell identification. The clusters in the center mainly consists of normal cells including immune cells, fibroblasts, and normal epithelial cells from normal tissues (Fig. 3b and 3d). Despite being different cell types, they exhibit similar CNV patterns and appear adjacent to each other on the UMAP plot. On the other hand, the clusters in the outer regions are mostly mapped to epithelial cells from tumor tissues (possibly the cancer epithelial cells) and exhibit diverse cluster formations, clearly distinguishing each sample. It reflects the tumor heterogeneity. In contrast to the tumor cells in the outer clusters, the normal epithelial cells (from normal breast tissues) are centrally located, with similar CNV patterns across different samples, despite being from distinct individuals.
Figure 3 (e) illustrates a comprehensive heatmap depicting CNV patterns of all cells grouped according to their cell-types, where red color indicates copy number amplification (gain) while blue indicates loss. Figure 3 (f) is the same heatmap, but only for the tumor cells, grouped with sample ID. Although CNV patterns are quite heterogeneous, we can identify several commonalities across some samples or in the same subtype. Notably, HER2-positive subtype exhibited high amplification on chromosome 17, aligning with the known fact that the HER2 gene resides on chromosome 17 and may undergo mutation and/or structural variation [21]. In some samples of HR-positive, a common loss was observed on chromosome 6, consistent with previous findings that alterations in several genes on chromosome 6 could influence the formation and development of breast cancer and other cancer types, such as ovarian cancer, melanoma and leukemia [22]. Compared with other subtypes, tumor cells in TNBC subtype shows distinct amplification in chromosome 1 and 7, even if not so high as in chromosome 17 in HER2-positive subtype. In summary, the CNV estimation results for each cell type appear reasonably accurate when compared to the established knowledge and the scRNA-seq based CNV analysis effectively explains distinct gene alterations among different tumor subtypes.
Subtype-specific markers
Subtype-specific marker genes were explored for each cell type that constitutes the TME, including epithelial cells, endothelial cells, fibroblasts, macrophages, and CD8 + T cells, as summarized in Fig. 4. B cells and CD4 + T cells were excluded from the analysis as they were almost absent in normal tissues.
The marker gene expression for epithelial cells in Fig. 4(a) clearly shows the canonical markers for HR-positive and HER2-positive subtypes, i.e., ESR1[23], AGR2[24] and AGR3[25] in the HR-positive group and ERBB2[26] in HER2-positive subtype, indicating that the analysis is appropriate in this context. The per-sample marker expression pattern in (f) also show the difference among subtypes, even though some of marker genes for HER2-positive subtype also expressed in HR-positive samples, e.g., CRIP1, LRRC26, AR, FOXA1, and GPR160. Of note, although AH0319 was classified as HR-positive subtype, its expression pattern looks closer to TNBC as TNBC marker expressions are stronger than those for HR-positive or HER2-positive subtypes.
For TNBC subtype, 12 marker genes were identified in the comparative analysis of TNBC (cancer) epithelial cells with those of normal and other subtypes. The TNBC epithelial cell markers include VIM, CALD1, DSC2, TPGS2, PHGDH, FABP5, USP1, MAD2L2, CRYAB, GABRP, CDKN2A, and ATG5.
VIM (Vimentin) is a gene that was observed from breast cancer cell line experiments to be commonly upregulated in surviving cells after treatment. High expression of VIM is known to be associated with increased invasiveness and silencing this gene has been shown to reduce metastasis, suggesting its potential as a therapeutic target.[27]
CALD1 is a gene that encodes Caldesmon protein and plays a role in regulating the motility and invasiveness of breast cancer cells, affecting metastasis. Breast cancer tissues with upregulated CALD1 genes exhibit aggressive tumor characteristics, including increased growth rate and enhanced motility of tumor cells, which can promote carcinogenesis through the induction of epithelial-mesenchymal transition (EMT).[28]
DSC2 (Desmocollin 2) is primarily expressed in epithelial cells and is closely associated with heart diseases. It is a member of the Desmocollin gene family, which is essential for maintaining strong adhesion between epithelial cells to ensure structural stability. Overexpression of this gene enhances tumor cell cohesion, reduces the diffusion of chemotherapy agents, and promotes tumor growth and survival. It is known to be associated with poor prognosis in aggressive molecular subtypes like HER2-positive and TNBC.[29]
PHGDH (Phosphoglycerate dehydrogenase) enzyme overexpression is a characteristic feature in cancer cells and is associated with rapid cell division, growth, and proliferation. Research is ongoing to develop targeted drugs against this enzyme. In TNBC, there is a tendency for upregulated expression of this gene, and it is commonly associated with the CK5-positive cell lineage, which is observed in various tissues and cancer types.[30]
FABP5 (Fatty Acid-Binding Protein 5) plays a role in delivering fatty acids within breast tissue, providing the necessary energy for the proliferation and growth of cancer cells. Upregulation of RNA and protein levels of FABP5 has been observed in prostate and breast cancer cell lines, with specific expression noted in HR-negative breast cancer patients with poor prognosis. There is a documented association with upregulation by EpCAM, and experimental evidence has revealed that it is a major target gene of c-Myc, promoting the survival and progression of tumor cells.[31]
USP1 (Ubiquitin Specific Peptidase 1) plays a role in responding to DNA damage and facilitating its repair. Mutations in this gene can disrupt DNA repair regulation and potentially lead to the development of cancer. TGF-β is a crucial cytokine involved in EMT and metastasis. The USP1/WDR48 complex can mediate TGF-β signaling, enhancing EMT and the mobility of TNBC cells.[32]
MAD2L2 (Mitotic Arrest Deficient 2 Like 2) plays a crucial role in detecting errors during cell division and regulating proper cell division. Dysregulation or mutations in MAD2L2 can influence the development and growth of tumors. It has received attention as a potential target. Notably, it is overexpressed in MDA-MB-157 TNBC cells compared to other breast cancer subtypes, and this overexpression is associated with the promotion of cell division.[33]
CRYAB is highly expressed in various cancer types, including breast cancer, cervical cancer, and renal cell carcinoma. It plays a protective role by binding to proteins and promoting cell survival. It has been associated with tumor invasion, metastasis, and prognosis in breast cancer. Breast epithelial cells expressing high levels of CRYAB exhibit tumor-like growth patterns in vitro, suggesting their contribution to tumor formation.[34]
Recent research has confirmed that the EGFR signaling pathway is activated in TNBC, and abundant GABRP in TNBC stem cells interacts with EGFR to maintain its high expression levels, thereby increasing chemoresistance. Inhibiting the EGFR signaling pathway induced by GABRP has shown the potential to enhance the response to chemotherapy, including paclitaxel, doxorubicin, and cisplatin.[35]
CDKN2A is known as a tumor suppressor gene that plays a role in temporarily halting the cell cycle during the G1 phase[36].In a recent study, tumor samples from 587 TNBC patients were divided into six subtypes and abnormal amplification of the CDKN2A/B gene was observed in the basal like 1 type[6].In our results,CDKN2A expression was shown to be upregulated in TNBC subtype and it could be attributed to issues related to mutations in the CDKN2A gene itself.
The ATG5 (Autophagy-related 5) gene plays a crucial role in the process of autophagy, which is responsible for breaking down cellular waste, damaged proteins, and other components within cells. Mutations or dysregulation of this gene can lead to the development of various conditions, including cancer, neurodegenerative diseases, and immune disorders. ATG5, along with ATG2B, is targeted by miR-181a, and inhibiting the expression of miR-181a has been shown to weaken cancer stem cells in TNBC and increase autophagy. These findings suggest that ATG5 is associated with the suppression of cancer stem cells in TNBC, and compounds like curcumin that can modulate it may hold therapeutic potential for TNBC [37].
The top 12 markers identified for TNBC cancer epithelial cells belong to genes known to have specific functions in tumor cells. TNBC cancer epithelial cells exhibit minimal expression of marker genes associated with other subtypes, making them suitable candidates for serving as marker genes to distinguish TNBC from other breast cancer subtypes.
In addition to the cancer epithelial cells, we also identified markers for other TME components, including fibroblasts, macrophages, endothelial cells and CD8 + T cells. In Fig. 4(f) to (j), each breast cancer subtype exhibits distinct marker gene expression even though some of markers also expressed in some samples of other subtypes. However, when compared to samples of normal tissue, they display entirely distinct marker expression patterns for the same cell type. In particular, TNBC subtype shows much higher mean expression of its markers compared to other subtypes (Fig. 4(f) to (j)). It is especially the case in immune cells, such as CD8 + T cells and macrophages, of which the marker genes have very low expression in normal tissue (Fig. 4(i) and (j)). This observation highlights the unique and pronounced gene expression patterns associated with TNBC in comparison to other breast cancer subtypes and normal tissues.
Although we could not identify distinct marker genes for fibroblasts, endothelial cells, and macrophages in HR-positive and HER2-positive subtypes, it is evident that the cell types constituting the TME in cancer patients exhibit entirely different gene expression patterns when compared to normal tissue.
Gene Set Enrichment Analysis
The gene set enrichment analysis (GSEA) was performed using GSEApy (prerank method)[18]. Figure 5 shows a summary of gene set enrichment analysis results for(a) fibroblasts, (b) epithelial cells, (c) CD8 + T cells, (d) endothelial cells, and (e) macrophages. For each cell type, we used subtype-specific DEGs, including the marker genes shown in Fig. 4. In TNBC, epithelial cells exhibit the higher level of activation in several pathways, including allograft rejection, cell cycle, DNA replication, primary immunodeficiency, and Th1/Th2 cell differentiation. The results suggest that TNBC cells have rapid cell cycle progression, and various immune regulation processes are more prominently activated, indicating a potentially heightened immune response and cell proliferation in TNBC compared to other subtypes.
Fibroblasts show the activation of allograft rejection, autoimmune thyroid disease, cell adhesion molecules, graft-versus-host disease, hematopoietic cell lineage, natural killer cell-mediated cytotoxicity, Th1 and Th2 cell differentiation, and Th17 cell differentiation, suggesting the involvement of fibroblasts not only in immune regulation-related pathways but also in processes related to cell adhesion, hematopoietic cell development, and potential roles in cellular transitions. Macrophages exhibit the activation of the neuroactive ligand-receptor interaction pathway, while endothelial cells show activation of the human immunodeficiency virus 1 infection and RAP1 signaling pathway. These indicate the involvement of specific cellular interactions and responses in these cell types, possibly related to immune responses or other functions associated with these pathways.CD8 + T cells in TNBC subtype shared some upregulated pathways with HR-positive or HER2-positive subtypes.
Cell-Cell Interaction
Cell-cell interaction analysis was performed using CellPhoneDB, where we focused on the main players around the tumor microenvironment, including tumor cells (cancer epithelial cells), fibroblast, macrophage, endothelial cells, CD8 + T cells, and CD4 + T cells. Figure 6 summaries the results, excluding those interactions commonly occurred in the corresponding normal cells.
In Fig. 6(a) and (b), comparing the number of interactions between different cell types in different subtypes, we see that the interactions between CD8 + T cells and fibroblast was higher in TNBC compared to others, while those between fibroblast and endothelial cells was very weak in TNBC subtype.
In Fig. 6(c-e), we noted that JAG-NOTCH interaction between tumor cells and endothelial cells was found in all subtypes. It is known that JAG-NOTCH interaction in breast cancer has several biological implications, i.e., it can drive the transformation of normal breast cells into cancer cells. Notch signaling promotes uncontrolled cell growth and proliferation. It maintains cancer stem cells, which contribute to tumor recurrence. It also supports the formation of blood vessels within tumors and may help cancer cells evade the immune system. Targeting this pathway is being explored for breast cancer treatment.
Another notable interaction was DSC2-DSG2 interactions between tumor cells. Although this was identified in all subtypes, they have different level of interactions: strongest in TNBC, moderate in HER2-positive, and weakest in HR-positive. In the marker gene analysis results, DSC2 was a marker gene of TNBC cells. The mean values representing the average interaction levels of gene pairs were approximately 0.36 for the HR-positive, 0.73 for the HER2-positive, and 1.43 for the TNBC, indicating an approximately two-fold increase in the average interaction level of DSC2-DSG2 interacting pair in tumor cells across these subtypes. The expression levels of these genes in epithelial cells in different subtype also support this argument (Fig. 6(f)). The DSC and DSG gene families mediate cell adhesion functions through interactions in the intercellular space [38]. This cell adhesion structure is crucial for maintaining the normal architecture in epithelial tissues. While it has been studied that DSC2 is involved in tumor progression and development in various types of cancer, it is known to exhibit high expression in aggressive subtypes and to be correlated with poor survival outcome in breast cancer [39]. Additionally, a recent experimental study on breast cancer cell line revealed a significantly higher expression of the DSC2 gene in HER2-positive and TNBC, and DSC2 overexpression showed a significant correlation with a shorter disease-free and overall survival. Furthermore, tumor cells with high DSC2 expression exhibited increased metastatic potential, tumor cell cohesion, and resistance to chemotherapy[29].