Baseline demographic and clinical characteristics of study patients
The demographic and clinical charateristics of the 26 patients in the NRM group and the 31 patients in the R/M group are shown in Table 1. No significant differences were observed in age, body mass index, ethnicity, cancer stage, neoadjuvant chemotherapy, antibiotics use (pre-surgery), or tumor molecular subtypes.
Tumor microbial diversity was significantly different between the two groups and was associated with prognosis
To explore the potential association of the tumor microbiome composition with prognosis of breast cancer patients, we investigated differences in the microbiome composition of tumor tissues between NRM and R/M patients. A total of 2,610,661 raw reads were obtained from all 57 samples, with an average number of reads of 45,801±10,702 per sample (Additional file 1:Table S1). After quality control and removal of noise and chimeras, a total of 2,241,646 high-quality sequences were generated across all samples (Additional file 1:Table S2) and clustered into 4,569 OTUs (Additional file 2:Figure S1; Additional file 1:Table S3). The rarefaction curves approached saturation, suggesting that the majority of OTUs were recovered (Additional file 2:Figure S2).
Richness and evenness were higher in the tumor microbiome of NRM subjects than in that of R/M patients in terms of alpha diversity, as measured in observed OTUs (P=0.024, Wilcoxon rank sum test, Fig. 1a) and Chao1 index (P=0.024, Wilcoxon rank sum test, Fig. 1b). However, the beta diversity was not significantly different between the two groups based on the Bray-Curtis index (P=0.853, Fig. 1c), weighted UniFrac index (P=0.891, Fig. 1d), or unweighted UniFrac index (P=0.591, Fig. 1e) (all P from PERMANOVA). The PCoA plot was not completely separated based on the Bray-Curtis metrics (Fig. 1f), weighted UniFrac (Fig. 1g), or unweighted UniFrac metrics (Fig. 1h).
According to the PLS-DA, the tumor microbiomes of the NRM and R/M groups were clearly differentiated into two independent clusters, suggesting that their composition was significantly different (Fig. 1i). The area under the curve (AUC) was 0.9975, indicating that the prediction model obtained by the PLS-DA analysis is reliable. Thus, the tumor microbiome in R/M patients may be a strong predictor of recurrence or metastasis in breast cancer patients (Fig. 1g).
Based on the alpha diversity results, we tested the relationship between tumor microbial diversity and DFS by stratifying the patients in two groups based on median diversity obtained by observed OTUs index. We found that patients with high alpha diversity showed significantly longer DFS (median survival: 78 months) than those with low alpha diversity (median survival: 30 months) (Fig. 2).
The relationship between tumor microbial diversity and prognosis actually allowed the stratification of breast cancer patients according to whether their alpha diversity was high or low. These findings support the idea that tumor microbial diversity may be a prognostic predictor, and suggest that tumor microbiome composition may influence tumor progression.
Tumor microbiome composition was significantly different between NRM and R/M patients and was associated with prognosis
To identify marked differences in the predominance of bacterial taxa between NRM and R/M patients, different taxonomy levels were compared using LEfSe analysis based on Kruskal-Wallis and Wilcoxon test to detect bacterial taxa with significant differential abundances between NRM and R/M patients. LEfSe then was performed using LDA to estimate the effect size of each differentially abundant taxa with the criteria of LDA > 2 and P < 0.05. The tumors of NRM patients exhibited a predominance of Ruminococcaceae at the family level and Anaerostipes at the genus level. In contrast, the tumors of R/M patients were dominated by Bacilli at the class level, Tissierellaceae and Enterococcaceae at the family level, Lactobacillales at the order level, and Anaerococcus and Dechloromonas at the genus level (Fig. 3a,b).
The relative abundance of nine genera differed significantly between NRM and R/M. Four genera were enriched in R/M: Dechloromonas, Anaerococcus, Thiothrix and Microbacterium. Conversely, five genera were less abundant in R/M: Ruminococcus, Anaerostipes, Weissella, Butyrivibrio and Deinococcus (adjusted P < 0.05, Kruskal-Wallis and Wilcoxon; Fig. 4; Additional file 1: Table S4).
We then stratified patients into high or low categories based on their median relative abundance of the four genera Ruminococcus, Butyrivibrio, Deinococcus, and Microbacterium. Risk of recurrence or metastasis was significantly lower for breast cancer patients with higher abundance of Ruminococcus [hazard ratio (HR)=0.4554, 95% confidence interval (CI) 0.2169-0.9560], Butyrivibrio (HR=0.3537, 95% CI 0.1397-0.8957), or Deinococcus (HR=0.4763, 95% CI 0.2291-0.9901). Conversely, the risk was significantly higher for patients with higher abundance of Microbacterium (HR=2.632, 95% CI 1.113-6.224) (Fig. 5a-d).
Tumor microbiome function was significantly different between NRM and R/M patients
To determine whether the compositional differences between the two tumor microbiomes corresponded to functional differences, PICRUSt was used to predict the potential function of tumor microbiomes using 16S rRNA gene data. The top 20 most abundant level 2 and 3 KEGG pathways in the two different groups are shown in Fig. 6a,b. The comparison of enriched level 3 KEGG pathways between the two groups, based on the Dunn test, is shown in the Additional file 1: Table S5. In general, there were two level 3 KEGG pathways enriched in the NRM group and 69 in the R/M group. Genes associated with isoflavonoid biosynthesis and glycosylphosphatidylinositol-anchor biosynthesis were more abundant in NRM patients. However, genes associated with bacterial invasion of epithelial cells, cell division, steroid hormone biosynthesis, aminoacyl-tRNA biosynthesis, ECM-receptor interaction, and DNA replication were more abundant in R/M patients (adjusted P < 0.05, Dunn test; Additional file 1: Table S5).
Tumor DEGs between NRM and R/M patients
To investigate the differences in tumor gene expression at the transcriptional level, we performed transcriptome sequencing on total RNA samples from 26 NRM patients and 31 R/M patients, whose tumor microbiome DNA samples were used for 16S rRNA gene analyses. Overall, NRM and R/M groups had 22,221,601 and 21,832,153 reads on average, respectively (Additional file 1: Table S6). Using DESeq2, we identified 16 DEGs (adjusted P < 0.05, Benjamini-Hochberg correction; Fig. 7a,b; Additional file 1: Table S7). Of the 16 DEGs, 11 were upregulated in the R/M group (e.g. AKR1C3, CD36, AKR1C1, MEST, and NR1H3), and five were downregulated (EIF5B, ANKHD1-EIF4EBP3, PABPN1, RBM47 and KCNK6).
Using clusterProfiler, we carried out GO enrichment analysis on the DEGs. Using clusterProfiler and gage R packages, we conduct KEGG enrichment analysis on the DEGs. We were unable to obtain information about GO and KEGG enrichment because of the small number of DEGs.
Associations between tumor microbiome differences and DEGs
In order to elucidate how microbial abundance in breast tumors may influence prognosis, we considered correlations between nine microbial taxa and 16 DEGs (Fig. 8; Additional file 1: Table S9). The relative abundance of Ruminococcus (Spearman rho=0.337, P=0.01) positively correlated with the expression of RBM47. The relative abundance of Anaerostipes positively correlated with the expression of EIF5B (Spearman rho=0.364, P = 0.005), RBM47 (Spearman rho=0.387, P=0.003), and ANKHD1-EIF4EBP3 (Spearman rho=0.346, P=0.008). The relative abundance of Weissella positively correlated with the expression of PABPN1 (Spearman rho=0.268, P=0.044), EIF5B (Spearman rho=0.263, P=0.048), RBM47 (Spearman rho=0.294, P=0.026), and ANKHD1-EIF4EBP3 (Spearman rho=0.319, P=0.016). The relative abundance of Butyrivibrio positively correlated with the expression of RBM47 (Spearman rho=0.354, P=0.007) and ANKHD1-EIF4EBP3 (Spearman rho=0.339, P=0.01). The relative abundance of Dechloromonas positively correlated with the expression of ANKHD1-EIF4EBP3 (Spearman rho=0.294, P=0.026). The relative abundance of Anaerococcus positively correlated with the expression of PABPN1 (Spearman rho=0.354, P=0.007), EIF5B (Spearman rho=0.308, P=0.02), and RBM47 (Spearman rho=0.305, P=0.021), and negatively correlated with the expression of PLIN2 (Spearman rho=-0.29, P=0.028), VAMP5 (Spearman rho=-0.285, P=0.032), FOS (Spearman rho=-0.291, P=0.028), and TXNIP (Spearman rho=-0.393, P=0.003).
The relative abundance of Thiothrix negatively correlated with the expression of EIF5B (Spearman rho=-0.292, P=0.028) and RBM47 (Spearman rho=-0.276, P=0.038). The relative abundance of Deinococcus negatively correlated with the expression of CD36 (Spearman rho=-0.283, P=0.033), and positively correlated with the expression of EIF5B (Spearman rho=0.383, P=0.003), RBM47 (Spearman rho=0.310, P=0.019), and ANKHD1-EIF4EBP3 (Spearman rho=0.301, P=0.023).
Influence of tumor microbiome on the TIME in breast cancer
Next, we aimed to explore the role of the tumor microbiota on the local immune microenvironment and the natural history of the cancer. We used the CIBERSORT tool to examine the relative fractions of 22 TIIC types based on the gene expression profiling of tumor tissues, and we explored associations between relative microbial abundance and TIICs. Fig. 9 exhibits the relative proportions of the 22 TIIC subpopulation in tumor samples. The fractions of plasma cells, CD8+ T cells, follicular helper T cells, gamma delta T cells, activated NK cells, M0 macrophages, M1 macrophages, activated mast cells, and eosinophils were higher in tumor tissue from NRM patients than in tumors from R/M patients (Fig. 10). Conversely, the fractions of memory B cells, naive CD4+ T cells, resting memory CD4+ T cells, resting NK cells, monocytes, M2 macrophages, resting dendritic cells, activated dendritic cells, resting mast cells, and neutrophils were lower in tumor tissues from NRM patients (Fig. 10).
Next, associations between the tumor microbiome and the 22 TIIC types were analyzed to confirm the two-way relationship (Fig. 11; Additional file 1: Table S10). The relative abundance of Dechloromonas positively correlated with the fraction of activated mast cells (Spearman rho=0.268, P=0.044), while it negatively correlated with the fraction of resting mast cells (Spearman rho=-0.33, P=0.012). The relative abundances of Weissella (Spearman rho =-0.279, P=0.036) and Anaerostipes (Spearman rho =-0.279, P=0.036) negatively correlated with the fraction of gamma delta T cells. A trend towards a positive correlation was observed between the relative abundance of Butyrivibrio and the fraction of activated NK cells (Spearman rho=0.258, P=0.052).