Classication of muscle-invasive bladder cancer based on tumor stromal compartment

There has been a resurgence of interest in the tumor stroma in recent years. Whether the relative abundance of various stromal cells can be used as a classication system for muscle-invasive bladder cancer (MIBC) remain elusive. We applied single-cell RNA-sequence (scRNA-seq) data from two MIBCs to identify stromal cell (CD45 negative cells) clusters and the marker gene set of each cell cluster. The single sample gene set enrichment analysis method is used to estimate the relative abundance of the cell clusters in each MIBC sample from The Cancer Genome Atlas. Subsequently, k-means clustering was performed to cluster the MIBCs. Prognosis, oncogenic pathway enrichment score, epithelial-mesenchymal transition (EMT) score, gene mutation frequency, and tumor inltrating lymphocytes (TILs) were compared among the stromal component-based types of MIBC. We proposed a novel stromal component-based classication system to divided into MIBC three phenotype. The three types of MIBC differ in various biological characteristics. The progress of MIBC may be summarized as the process of gradually increasing stromal components and constructing a microenvironment suitable for cancer cells. The stromal-sucient type I and II differ in the composition of different bladder epithelial cell clusters. To the best of our knowledge, this study may be the rst systemic analysis of the non-immune stromal heterogeneity of MIBC.


Introduction
Bladder cancer is the 10th most common cancer worldwide, with an estimated 549,000 new cases and 200,000 deaths in 2018. [1] Muscle-invasive bladder cancer (MIBC) represents 25% of newly diagnosed bladder cancer and require aggressive therapy. [2] Treatment selection still depends on clinicopathologic characteristics, but this staging systems are poor accurate and result in consequently inadequate treatment. In recent years, though several molecular subtypes of MIBC have been proposed, [3][4][5][6][7][8] no one of them has been used in clinical practice. All of these typings focus on the genetic level and may ignore the composition ratio of various types of cells in the tumor.
It was con rmed that factors derived from the tumor microenvironment (TME) can in uence tumor cell survival, invasiveness and metastatic dissemination, as well as access and responsiveness to therapeutics. [9][10][11] The cell types of the TME comprise non-cancer cells, including blood endothelial cells, lymphatic endothelial cells, immune cells, mesenchymal stem cells and their differentiated progeny, and broblasts. The abundance of these cells varies across patients with the same or different cancer types, which may be one of the characteristics of tumor heterogeneity. We hypothesized that it may be a novel tumor classi cation system according to the various stromal compartment in MIBC.
In recent years, single-cell RNA sequencing (scRNA-seq) technology has become an effective method of revealing tumor heterogeneity. [12] In order to investigate our hypothesis, in the present study, scRNA-seq data from two MIBCs was used to identify stromal cell (CD45 negative cell) clusters. Then, MIBCs were classi ed based on the relative abundance of these stromal cell clusters. To our best knowledge, this may be the rst study to uncover tumor heterogeneity using scRNA-seq and bulk RNA-seq analysis.

Data collection
The scRNA-seq data of the CD45 negative cells from two MIBC specimens was upload in Gene Expression Omnibus [13] (https://www.ncbi.nlm.nih.gov/geo/) database by Wang etc. The scRNA-seq data was based on GPL16791 from 10X Genomics. MIBC samples with clinical information, bulk RNAseq data (displayed as read counts), and gene mutation data in the The Cancer Genome Atlas (TCGA, https://gdc.cancer.gov/) were included for subsequent analysis. The gene expression pro les of GSE87304 [7] contained 305 MIBC were downloaded using GEOquery R package [14] and used to validate the MIBC stromal component-based clustering. The work ow of the present study was shown in Fig. 1.
Processing of scRNA-seq data Seurat version 3.0 (https://satijalab.org/seurat/) R (version 3.6.1, https://www.r-project.org/) package was used to created the Seurat Object and performed quality control (QC), data normalization and scaling, and the detection of highly variable features. For QC, we ltered out genes detected in less than 3 cells and cells where < 200 genes had nonzero counts. In addition, the cells having a percentage of total unique molecular identi ers that derived from mitochondrial genome greater than 20% were also removed ( Figure S1), as such these cells may be considered under stress or cell death. After the above processing, 3774 cells remained for subsequent analysis.
After data normalization, 3000 genes with the highest standard deviations ( Figure S2) were considered as highly variable genes. These highly variable genes were used to performed principal component analysis (PCA). The jackstraw function was used to identi ed signi cant principal components. The ElbowPlot function was applied and 20 principal components ( Figure S3) were selected to performed Uniform Manifold Approximation and Projection (UMAP) [15]. Subsequently, cells were clustered by the FindClusters function.
Annotation of cells and identi cation of marker genes of cell cluster SingleR package[16] (https://bioconductor.org/packages/SingleR/) in R was used to perform unbiased cell type recognition from scRNA-seq data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently. The FindMarkers function in Seurat package was used to identify markers of a single cluster compared to all other cells. The relevant parameters were only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.263, and test.use = "roc".
Calculation of cell abundance of each cluster for MIBC samples and stromal type clustering in TCGA The single sample gene set enrichment analysis (ssGSEA) was performed to calculate the abundance of each cell cluster in each sample with expression data according to the marker gene set of cell clusters identi ed by scRNA-seq analysis using GSVA R package [17]. Subsequently, we performed k-means clustering and determined the optimal number of stable stromal subtypes using Nbclust package[18] in R. The samples were reordered according to the result of clustering and displayed as a heatmap using pheatmap R package (https://CRAN.R-project.org/package=pheatmap).

Survival analysis
The univariate and multivariate Cox proportional hazards model was used to analyze the prognostic signi cance of stromal subtypes. Sex, age, and TNM stage were rst analyzed in univariate Cox proportional hazards model. The signi cant variables (P < 0.05) were included in the multivariate Cox proportional hazards model. The overall survival between stromal subtypes were compared using Kaplan-Meier survival analysis with the log-rank method.
Enriched oncogenic pathways between stromal subtypes Referred to a published article [19], ssGSEA was performed according to 331 genes involved in 10 oncogenic pathways to generate enrichment scores for each pathway in each sample. Genes in the 10 oncogenic pathways were annotated as tumor suppressor genes or oncogenes. The enrichment scores were calculated as the activated score minus the repressed score. Then, the enrichment score of each pathway among the stromal types were compared.
Comparison of epithelial-mesenchymal transition (EMT) score among stromal phenotypes EMT plays a crucial role in tumor invasion and metastasis, the EMTness was calculated referred to a published article [20]. The bulk RNA-seq data was normalized using voom function in limma R package. [21] The normalized expression values of corresponding genes was used as following : Subsequently, the EMTness of stromal subtypes were compared.

Comparison of gene mutations among stromal subtypes
The gene mutation annotation les (aligned against the genoeme of reference GRCh38) for MIBC in TCGA were downloaded using the GDCquery_Maf function of TCGAbiolinks R package [22]. The somatic mutation calling work ow used was the MuTect2 pipeline [23]. The count of variant was converted by log 2 to compare the tumor mutational burden (TMB) of stromal subtypes. We also compared gene mutation frequencies among stromal subtypes of MIBC.

Comparison of tumor in ltrating lymphocytes (TILs)
Referred to the published articles [24,25], the abundance of 22 types of speci c immune cells were calculated using ssGSEA method. Subsequently, the 22 TILs were compared among the stromal subtypes of MIBC.

Statistical analysis
All of the analyses were performed in R software. Student t test, Wilcoxon test, and Kruskal-Wallis test were applied to compare continuous variables. Pearson Chi-square test or Fisher exact test were employed for the comparison of unordered categorical variables. Survival analysis was performed using the Kaplan-Meier method, and the survival of the tumor subtypes was compared using the log-rank test. All the tests were two sided, and P-value < 0.05 was considered as indicating signi cance.

Cell clustering and annotations
A total of 9 clusters were identi ed in scRNA-seq analysis.( Fig. 2A) Using SingleR package and the Human Primary Cell Atlas[26] as reference, all the cells were recognized (Table S1). Interestingly, cells in the same cluster may be different types of cells. Since the specimens were derived from MIBC, the cell clusters 0, 1, 2, 3 and 7 are considered to be cancer cells, while the cell cluster 4 with signi cantly different expression patterns may be non-cancerous bladder epithelial cells in the tumor. The cells in cluster 5 and 8 were mainly recognized as stem cells and broblasts, while the cell cluster 6 were recognized were endothelial cells. In addition, a small number of cells in each cell cluster were recognized as other types of cells. The marker genes (Table S2) could basically identify cell clusters. (Fig. 2B) The top marker genes (ranked by AUC value) in each cell cluster were IGFBP3, UPK1A, KIAA0101, ARL6IP1, RPL27A, DCN, GNG11, CYBA and IGFBP7, respectively. (Fig. 2C-D) The stromal component-based subtypes of MIBC In the TCGA MIBC data sets, the abundance of 9 stromal cell subsets in each sample were estimated using ssGSEA method referenced the marker gene sets of each cell cluster. Subsequently, we performed k-means clustering of the MIBC stromal subtypes. Our analysis indicated that three was the optimal and stable clustering number. (Fig. 3A) The 408 MIBC samples were classi ed into three heterogeneous types. (Fig. 3B) Type 1 and 3, the "stromal-su cient" type I and II, were characterized by relatively high stromal cells (cluster 5, 6 and 8) and varying in bladder epithelial cell cluster (cluster 0, 1, 2 and 3). In detail, type 1 was relatively with high clusters 0 and 1while with low clusters 2 and 3. It conversed in type 3. Type 2, the "stromal-desert" type, was characterized by relatively low stromal cells (cluster 5, 6 and 8).
The stromal subtype transformation Survival analyses indicated that the "stromal-desert" type MIBC had signi cantly better overall survival than the other two subtypes (log-rank P = 0.031 and univariate Cox proportional hazards P = 0.011) (Fig. 4A). However, it was not an independent prognostic factor adjusted by routine clinicopathological characteristics (multivariate Cox proportional hazards P > 0.05, Table 1). We found that the stromal type transformation happened as the disease progressed. MIBCs with advanced disease were with higher stromal cells (cluster 5, 6 and 8) than those with early disease (Fig. 4B). The enrichment scores of oncogenic pathways vary among stromal subtypes in MIBC Referred to the published signatures of 10 oncogenic pathways, [19] we calculated the enrichment scores among the three stromal subtypes for each samples. Except for the cell cycle, the enrichment scores of 9 oncogenic pathways differ among stromal types in MIBC. (Fig. 5) In general, the stromal-su cient type I and II MIBC were with higher enrichment scores of oncogenic pathways than stromal-desert type except for the TGF-β pathway. In addition, stromal-su cient type II MIBC has higher enrichment scores in HIPPO, PI3K, TP53 pathways, while lower enrichment scores in NOTCH, NRF2, RTK/RAS pathways than stromalsu cient type I (P < 0.05).
The stromal-su cient subtype MIBC was active in EMT After calculation of the EMTness for each sample among the three stromal types of MIBC, as could be expected, stromal-su cient type MIBC was more active in EMT than stromal-desert type. We also found that stromal-su cient type II MIBC EMTness was higher than stromal-su cient type II. (Fig. 6A) Three stromal subtypes of MIBC with various gene mutation characteristics High TMB levels in tumor lead to an increase in tumor neoantigens may trigger the immune system to attack the tumor. [27] However, our result indicated that no difference in TMB of the three stromal subtypes of MIBC. (Fig. 6B) A total of 423 gene mutation frequency differs between the three stromal types of MIBC. (Table S3) Among them, the high frequency mutant gene (gene with a mutation frequency greater than 10% in any stromal subtypes) are ELF3, KDM6A, KMT2A, MED12, TP53, RB1, and VCAN. (Fig. 6C) The stromal-desert type of MIBC may have relatively few TILs The relative abundance of immune cells among the three stromal subtypes of MIBC were estimated referred to the published studies, [24,25] interestingly, compared to the two types of stromal-su cient MIBC, stromal-desert MIBC also fewer TILs except for resting CD4 memory T cells, activated memory T cells and memory B cells. The stromal-su cient type II MIBC have fewer resting CD4 memory T cells and more activated CD4 memory T cells. The stromal-su cient type I MIBC have more memory B cells. (Fig. 7A) The landscape of the TILs in the three stromal subtype MIBC was displayed as a heatmap. (Fig. 7B) Validation of MIBC stromal component-based clustering As it was in the TCGA MIBC data set, the abundance of the 9 cluster cell subsets in each sample in GSE87304 were estimated using ssGSEA method. We scaled each sample and used Nbclust package to performed kmeans clustering and determined the optimal number of stable stromal subtypes. It was also three was identi ed as the optimal number for clustering. (Fig. 8A) The "stromal-su cient" type I and II have more stromal cells (cluster 5, 6 and 8) while the "stromal-desert" type have lower stromal cells (cluster 5, 6 and 8). (Fig. 8B) In addition, the stromal component-based clustering may be independent to a molecular classi cations which divides MIBC into four types [7] (Luminal, In ltrated-luminal, Basal and Claudin-low) (Pearson's Chi-squared test P = 0.849)

Discussion
It is increasingly becoming clear that TME is critical to the continued uncontrolled growth of the cancer. TME comprises a mass of heterogeneous cell types, including tumor-in ltrating lymphocytes (TILs), endothelial cells, cancer stem cells, and broblasts, alongside cancer cells. Previous studies have shown that the ratio of TILs play a crucial role in the treatment and prognosis of various tumors, [28][29][30] including MIBC. [31] The TIL compartment could even service as a classi cation in triple negative breast cancer. [24] In the present study, we found that the non-immune (CD45 negative) stromal compartment may also service as a classi cation in MIBC. Moreover, our study may provide a method to explore the biological function of the identi ed cell clusters from scRNA-seq analysis.
In the scRNA-seq analysis, the cells in cluster 5 and 8 were mainly recognized as stem cells and broblasts, while the cells of cluster 6 was recognized were endothelial cells. Cancer stem cells (CSCs) are endowed with intrinsic plasticity, whereby they can modulate their stemness in relation to other tumorrelated traits, especially motility and invasiveness. [32] The cancer-related broblasts (CAFs) contribute to the maintenance of cancer stemness and also directly promotes angiogenesis, invasion, metastasis, and chronic in ammation. [33] Our results suggested the CSCs and CAFs may have similar gene expression patterns and be clustered into the same clusters. Tumor endothelial cells can accelerate tumor metastasis. [34] The e cacy of anti-angiogenic agents has been con rmed in the treatment of various cancers. [35][36][37][38] Anti-broblast therapies were in exploration, [39] it may be a novel promising treatment strategy as broblasts are the most abundant in the tumor stroma cell components.
After identifying the cell clusters, we calculated the relative abundance of each cell cluster in the tumor.
The 408 MIBC samples in TCGA were classi ed into three heterogeneous stromal subtypes named as stromal-su cient type I, stromal-desert type and stromal-su cient type II. The stromal-su cient type I and II differ in the composition of different bladder epithelial cell clusters. To the best of our knowledge, this study may be the rst systemic analysis of the non-immune stromal heterogeneity of MIBC. Moreover, the clustering results were validated in an independent data set GSE87304. Nevertheless, the stromal subtypes of MIBC could be converted as the broblasts, endothelial cells and stem cells in tumor gradually increase in tumor progression. A previous study on molecular subtypes suggested that chemotherapy may induce subtype switching. [4] The progress of MIBC may be summarized as the process of gradually increasing stromal components and constructing a microenvironment suitable for cancer cells.
Various degrees of activation in oncogenic pathways among the three stromal types, which suggests that the three types of MIBCs have signi cant differences in biological characteristics. In general, the stromalsu cient type I and II MIBC had higher enrichment scores of oncogenic pathways than stromal-desert type. As CAFs have also been implicated in driving resistance to chemotherapy or radiotherapy, [40,41] we speculated that the drug treatment effect might be better in stromal-desert type than the stromalsu cient type. Our result also showed as the EMTness was higher in stromal-su cient type than in stromal-desert type. So far, we concluded that there may be three sources of non-immune stromal cell components in stromal-su cient type MIBC, epithelial cell transformation[42], stem cell differentiation, and recruitment.
Gene mutation analyses showed no difference in TMB of the three stromal subtypes of MIBC. This suggested the TMB may be not associated with tumor stromal components. TMB was a potential re ecting tumor immunogenicity factor and biomarker for predicting immune checkpoint inhibitor (ICI) e cacy.
[43] Notably, previous clinical trials using ICI therapy do not show strong results in tumors typically associated with high stromal burden,[44, 45] partly due to a failure of CD8 + T cell to in trate cancer nests.
[46] Moreover, CAFs are a critical regulator of tumoral immunosuppression of the T cell response.
[47] In the present study, compared to the stromal-desert MIBC, though stromal-su cient type MIBC have more TILs, the function of TILs may be inhibited by stromal cells. Therefore, whether our MIBC classi cation based on stromal components can provide guidance for ICI treatment remains unclear. On the other hand, our three stromal subtypes of MIBC may have speci c gene mutation characteristics.
Targeted therapies with anti-broblasts and anti-speci c drive genes may be a promising treatment.
Though our study may provide new insight into MIBC classi cation, it had some limitations. In the scRNA-seq analyses, the cells in clusters 5 and 8 were mainly recognized as stem cells and broblasts, which result in not be able to identify the marker genes of them separately. So it is di cult to precisely include all stromal types in MIBC. Second, some cell clusters have too many maker genes, which requires further improvements. In addition, as a retrospective analysis, its feasibility require further validation in prospective experiments.
In conclusion, we proposed a novel stromal component-based classi cation system to divided MIBC into three phenotypes. The three types of MIBC differ in various biological characteristics. The progress of MIBC may be summarized as the process of gradually increasing stromal components and constructing a microenvironment suitable for cancer cells. Availability of data and materials All data generated or analysed during this study are included in this published article and its supplementary information les.

Abbreviations
Competing interests KW, HL and LS contributed to the study design and reviewed the munascript. ZL and HL analyzed the data and writing of the manuscript. SM, YG , YL, DZ, FW, ZQ and XY. contributed to the data collection, data interpretation and manuscript writing. All authors read and approved the nal manuscript.      The work ow of the present study.    Multiple oncogenic pathway enrichment scores vary among stromal subtypes in MIBC.

Figure 6
Page 20/21 Comparison of EMTness, gene mutation burden, and gene mutation frequency among stromal subtypes in MIBC. (A) EMTness of stromal-su cient MIBC is signi cantly higher than in stromal-desert MIBC. (B) Gene mutation burden is not signi cantly different among the three stromal subtypes. (C) The high frequency mutant genes (gene with a mutation frequency greater than 10% in any stromal subtypes) in the three stromal subtypes of MIBC.