Integrative exploration of the mutual gene signatures and immune microenvironment between benign prostate hyperplasia and castration-resistant prostate cancer

Abstract Background Benign prostate hyperplasia (BPH) and prostate cancer (CaP) are among the most frequently occurring prostatic diseases. When CaP progressed to castration-resistant CaP (CRPC), the prognosis is poor. Although CaP/CRPC and BPH frequently coexist in prostate, the inter-relational mechanism between them is largely unknown. Methods Single-cell RNA sequencing, bulk-RNA sequencing, and microarray data of BPH, CaP in the Gene Expression Omnibus database were obtained and comprehensively analyzed. Weighted Gene Co-Expression Network Analysis (WGCNA) and lasso regression analysis were performed to explore the potential biomarkers. Results With WGCNA, five modules in BPH, two in CaP, and three in CRPC were identified as significant modules. Pathway enrichment analysis found that the epigenetics and chromosomal-related signaling were dominantly clustered in the CaP group but not in BPH and CRPC. Lasso regression analysis was used to analyze further the mutual genes between the BPH module and the CRPC module. As a result, DDA1, ERG28, OGFOD1, and OXA1L were significantly correlated with the transcriptomic features in both BPH and CRPC. More importantly, the role of the four gene signatures was validated in two independent anti-PD-1 immunotherapy cohort. Conclusion This study revealed the shared gene signatures and immune microenvironment between BPH and CRPC. The identified hub genes, including DDA1, ERG28, OGFOD1, and OXA1L, might be potential therapeutic targets for facilitating immunotherapy in prostate cancer.


Introduction
Benign prostate hyperplasia (BPH) is the main reason for lower urinary tract symptoms (LUTS) in aging men, which seriously impact the quality of life and health [1].The development of BPH is highly associated with aging.Significantly, the age-specific incidence rates of prostate cancer (CaP), the most prevalent cancer type in men worldwide, also rise steeply from around age 45-49, peaking in the 75-79 years old.In keeping with that, CaP patients frequently have pathological or symptomatic BPH.It is extensively acknowledged that the above problem of the prostate does not lead to the other, although they coexist at the same gland.However, evidence also suggested that BPH could be a risk factor for CaP or bladder cancer [2,3].Thus, whether the association between CaP and BPH is a causal link remains controversial.
Patients with CaP are initially responsive to antiandrogen therapies but eventually develop into castration-resistant prostate cancer (CRPC).Androgen deprivation therapy (ADT) remains the primary palliative strategy for men with locally advanced or metastatic disease [4].Ultimately, ADT becomes ineffective in 18-24 months, yielding to lethal CRPC [5].To date, emerging evidence indicates that immunotherapy, especially immune checkpoint inhibitors (ICIs), has the potential to enable long-term survival in cancer patients [6].Unfortunately, the response rate to ICIs varies across tumors and is limited to immunologically "hot" tumors, but not "cold" tumors, especially prostate [7,8].Therefore, it is critical to further understand the immune microenvironment in different prostatic pathologies and find molecular targets for screening the potential patients with response to ICIs.
Clinically, BPH is not treated as a premalignant lesion.But BPH and CaP share multiple biological features [9].The etiology of BPH remains largely elusive [10].Cellular senescence, metabolic disruption, genetic variation, and chronic inflammation are common risk factors for BPH and CaP [2,11].In addition, the PI3K-Akt pathway and CDK4 play a critical role in both BPH and CaP [12][13][14].Preclinical evidence supports the potential role of chronic inflammation in the malignant transformation of epithelial cells in the prostate [15].In keeping with that, epidemiological evidence revealed that prostatitis and BPH lead to escalating risks of CaP [16].Chronic inflammation leads to the accumulation of proinflammatory mediators, infiltration of immune suppressor cells, and activation of immune checkpoint pathways in effector T cells, which facilitate the formation of an immune suppressive microenvironment in the "cold" tumors.However, the difference of the immune microenvironment among BPH, CaP, and CRPC remains poorly understood.
Here, in this study, we pooled three scRNA-seq datasets of samples from BPH and CaP and 10 bulk transcriptomic datasets derived from BPH, CaP, as well as CRPC.Briefly, we identified the shared gene signatures and immune microenvironment between BPH and CRPC.The identified hub genes, including DDA1, ERG28, OGFOD1, and OXA1L, might be potential therapeutic targets for facilitating immunotherapy in prostate cancer.Understanding the links between BPH and CRPC would improve cancer prevention, screening, and decision-making in integrated therapy.

Datasets download and process
Bulk-transcriptomic and single-cell RNA sequencing (scRNA-seq) studies of BPH, CaP, and CRPC were obtained by a systematic search of literature from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/).As a result, three single-cell RNA sequencing studies (GSE141445/GSE145838/GSE145843) and 10 bulk-RNA sequencing study (GSE80609/ GSE28204/GSE89223/GSE104749/GSE119195/GSE132714/ GSE65343/GSE32982/GSE134073/GSE5377) were included in this study.To further validate our findings in scRNAseq analysis, the bulk-RNA sequencing dataset on the prostate from the TCGA database (PRAD) and the clinical data were collectively downloaded.In addition, the clinical information and transcriptomic profile of two independent cohorts of anti-PD1 therapy (GSE190265/ CheckMate 025) were included for validation.Ethical approval was waived by the institutional ethics committee because data are obtained from public databases, and all the patients are de-identified.

Quality control of scRNA-seq datasets
The scRNA-seq data set was downloaded from the GEO database (GSE141445, GSE145838, and GSE145843).The duplicated data were checked and removed.During quality control process, the mitochondrial reads ratio was 0.01.The resolution parameter used in this study was 1.2.After removal of doublets, 60,668 single-cells have been identified.Cell expression profiles were quantile normalized and analyzed using the Seurat R package (http://satijalab.org/seurat/).The standard Seurat workflow was performed with the pooled scRNA-seq data from 10X Genomics by using the R package Seurat (version 4.05).Samples with less than 3 cells and/or less than 300 detected genes per cell were excluded from further analysis.Doublets were identified and filtered by R package DoubletFinder, with 4% doublets expected.

Integration of scRNA-seq datasets
To integrate multiple samples, integration anchors were identified in the list of samples using the Canonical Correlation Analysis (CCA) method.For clustering, principal component analysis (PCA), T-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) were performed for dimension reduction.Furthermore, marker genes of each respective cluster were identified by the function of "FindAllMarkers" and the package of "SingleR" previous to manual annotation [17].

Weighted gene co-expression network analysis
The weighted gene co-expression network analysis (WGCNA) algorithm was used in this study to find the co-expressed modules among BPH, CaP, and CRPC with high biological significance and investigate the association between gene networks and diseases [18].The module eigengene (M.E.) was defined to summarize the expression profiles of each module.Furthermore, the significance of the correlation between M.E.s and lesion types was calculated.The mutual genes in M.E.s positively associated with both BPH and CRPC were overlapped using interactive [19].

Analysis of sub-clusters in epithelial cells
Following primary annotation, luminal and basal epithelial cells were extracted via the "SubsetData" function.Then, the "FindClusters" and "FindAllMarkers" functions were conducted, and the epithelial were reclustered by TSNE and UMAP.The sub-clusters were annotated by the dominant expression cell markers.The following cutoff threshold values were applied to reveal the marker genes for each cluster: adjust Pvalue < 0.01, and absolute log2 fold change > 0.5.

Analysis of copy number variations
The gene expression data of epithelial cells extracted from the Seurat object were analyzed for the CNVs with the inferCNV program of the Trinity CTAT Project (https://github.com/broadinstitute/inferCNV).The transcriptomic profiles of the epithelial cells from the normal prostate tissue were set as reference cells.For the normal group, BPH group, and CaP group, 3000 cells from each group were randomly sampled after quality control filtering with greater than 3000 UMIs.Each CNV was annotated to be either a gain or a loss.

Transcriptional factor analysis
Single-cell regulatory network inference and clustering (SCENIC) analysis was performed on luminal and basal epithelial cells from different groups of samples using the "SCENIC" package [20].SCENIC can identify targets of translational factors (TFs) based on the mRNA expression network.In addition, SCENIC was used to perform TFs motif enrichment analysis to identify direct targets and score the activity of the regulators (AUCell algorithm).The activity TFs were visualized in a heatmap, and certain TFs were visualized in a dot plot.

Differentially expressed gene analysis
The "Limma" package was used to perform the analysis for a differentially expressed gene (DEG).An empirical Bayesian method was conducted to estimate the fold change between the normal group, BPH group, CaP group, and CRPC group identified by the pathologist.The adjusted P-value for multiple testing was calculated using the Benjamini-Hochberg correction.The genes with an absolute log2 fold change greater than two were identified as DEGs between the two groups.

Gene ontology enrichment and protein-protein interaction (PPI)
Here, the gene lists were analyzed by the DAVID website for the enrichment of biological processes.Moreover, the GO terms were further investigated and visualized as functionally grouped networks by a Cytoscape plug-in tool, namely ClueGO.The P-value < 0.05 was considered significant during the gene ontology analysis.The PPI network and cluster analysis were performed with the STRING plug-in tools under a confidence cutoff value of 0.2 in Cytoscape software (version: 3.9.1).

Single sample GSEA and ImmuCellAI algorithm
Single sample Gene set enrichment analysis was conducted between normal, adjacent tissues, BPH, CaP, and CRPC to explore different immune cell infiltration states.The association between gene expression and immune infiltration was visualized in the violin plot.p < 0.05 was considered statistically significant.Moreover, immunogenomic analysis of TCGA datasets was performed by the CellAI algorithm with 24 immunes cells, which could be accessed by an online database (http://bioinfo.life.hust.edu.cn/GSCA).

Statistical analysis
Gene expression data were analyzed using the P-value, fold changes (FC), and ranks.The statistical difference between groups in the bioinformatics analysis was calculated using the Wilcoxon Signed-rank test.p < 0.05 was considered statistically significant.

GEO datasets of cohorts with prostate disease
A systematic review of the literature with keywords including "prostate hyperplasia", "prostate disease", "prostate cancer", or "prostate" in GEO query found 10 bulk-RNA sequencing or microarray studies and 3 scRNA-seq studies that meet the previously set criteria.
The GSE numbers, detection platforms, and number of BPH cases were listed in Supplementary Table S1. Figure 1(A) illustrates the workflow of this study.As a result, 7 normal prostates, 6 adjacent prostate tissues, 67 pathologically confirmed BPH tissues, 124 treatment naïve prostate cancer tissues, and 12 CRPC samples were included for subsequent analysis.

The landscape of co-expression modules among different prostatic lesions
A total of 17 modules were identified in the above 10 bulk transcriptomic datasets through WGCNA, with each color representing a distinct module.The transcriptomic profiles of the 206 cases could be clustered into dozens of subclusters with high levels of correlations (Figure 1(B,C)).Furthermore, the module-trait relationships based on the Spearman correlation coefficient were evaluated and visualized in a heatmap (Figure 1(D)).Five modules (dark turquoise, dark green, light yellow, white, and steelblue) were highly associated with BPH.The dark orange and black modules were highly correlated with CaP, while the cyan, pink, and white modules had a high association with CRPC (Figure 1(D,E)).G.O. analysis indicated that translation and spliceosome pathways were mutually increased in BPH and CRPC, while the chromatin remodeling and DNA damage repair pathways were exclusively overexpressed in the CaP (Figure 1F-H

The analysis of copy number variations in BPH and prostate cancer
Next, we sought to validate the differences in structural variations of chromosomes between BPH and CaP.Here, the cells from the prostate tissues were clustered into nine populations according to classical markers for indicated cell types (Figure 2(A,B)).The tumor cells could be clustered into luminal-like cells and basal-like cells based on their expression of keratin or PSA (Figure 2(A-C)).Overall, the CNVs more frequently happened in CaP compared with BPH.However, the increase of CNVs in CaP was mainly contributed by the luminal cells, while the basal cells of the BPH had significantly increased CNVs than CaP (Figure 2(D,E), Figure S1).The CNVs results calculated by InferCNV from scRNA-seq datasets were further confirmed by the somatic mutation data from the TCGA-PRAD project (Figure 2(F,G)).Notably, the 10q23.31was frequently deleted in CaP, where PTEN was located (Figure 2(G)).

The specific activation of MYC and FOS transcriptional factors in BPH
To further understand the driving force of transcriptomic changes between BPH and CaP, the activity of transcriptional factors (TFs) was screened across scRNA-seq datasets.Of all the TFs regulons (TFs and their target genes), most of them were generally active in both BPH and CaP.Here, the relatively exclusive TFs were selected and visualized in a heatmap (Figure 3(A)).To be more specific in the location of cell types, feature plots of TFs activity indicated that FOSL1 and MYC were activated explicitly in BPH, but not normal prostate or CaP (Figure 3(B-D)).SPDEF was exclusively activated in luminal cells of CaP (Figure 3(B,E)).CEBPD and ELF3 were more active in normal prostate, compared with BPH or CaP, while XBP1 was suppressed in normal prostate, but not BPH or CaP (Figure 3(B)).The G.O. analysis of shared genes between the BPH regulons and modules indicated that the cytokines signaling pathway was the most significantly enriched (Figure S2).In addition, lipid metabolism and histone deacetylation were dominantly enriched in the prostate cancer group (Figure S3).

Identification of unique and shared gene signatures in BPH and prostate cancer
In keeping with previous findings, we explore the transcriptomic features in BPH, CaP, and CRPC, to further validate the conclusions of genomic and TFs analysis.Here, the genes of DEGs and significant modules from WGCNA were selected to find the overlap signatures (Figure 4(A)).Of note, the BPH and CaP possess reversed important modules according to WGCNA (Figure 1(D)).In keeping with that, none of the genes in the DEGs and modules between BPH and CaP were overlapped (Figure 4(A)).However, five overexpressed genes (H2BC15, PSMA2, H4C3, LIMD2, ATP1B2) were found in the BPH DEGs and significant modules (Figure 4(B)).In addition, the five gene signatures were not significantly associated with the prognosis of prostate cancer (Figure 4(C)).Their shared genes were found to be functionally active in molecular functions toward translation, post-translational modification, or metabolic pathways (Figure 4(D)).As BPH and CRPC shared mutual modules, we next select the overlapped genes derived from DEGs and WGCNA modules  (Figure 1(D), Figure 5(A)).Forty-two genes were shared by BPH and CRPC according to WGCNA, which were mainly focused on the MAPK phosphorylation and MYC signaling pathways (Figure 5(B)).PPI network analysis suggested that the RPL4, FARSA, EIF3G, SNRPA, and OXA1L were the top five hub genes.To further filter the gene signatures related to the prognosis of prostate cancer, DDA1, ERG28, OGFOD1, and OXA1L were found by lasso regression analysis (Figure 5(D,E)).The mRNA expression of four gene signatures was significantly increased in BPH compared with normal prostate and was further upregulated in CaP (Figure 5(F)).Interestingly, the expression of four genes was decreased after the progression to CRPC (Figure 5(F), Figure S4A-C).Importantly, these genes failed to be correlated with the prognosis of CaP in TCGA (Figure 5(G)).However, the expression of DDA1 and ERG28 (C14orf1) were significantly associated with the poor prognosis of patients with anti-PD1 therapy (Figure 5(H), Figure S4D-F).

The shared immune microenvironment in BPH and CRPC
To investigate the immune state of prostate tissues, the enrichment levels of 29 immune-related cells and types in the normal prostate, adjacent tissues, BPH, CaP, and CRPC were shown in the heatmap.The ESTIMATE Score, Immune Score, Stromal Score, and Stromal Score of each case were presented and colored by intensity (Figure 6(A)).After calculating the ssGSEA score representing the immune cells in the prostate, a Pearson's correlation analysis between each immune cell was performed (Figure 6(B)).The ESTIMATE score, immune score, stromal score, and tumor purity were statistically compared among prostate lesions (Figure 6(C-F)).Notably, the high level of CD274 (encoding PDL1) and TIGIT expression, the canonical immune co-inhibitory factors, were significantly increased in the CaP group, compared with BPH or CRPC groups (Figure 6(G,H)).The relative percentage of annotated immune cells in 206 prostate tissues was shown in a bar plot (Figure 6(I)).Compared with CaP, the infiltration of CD8 þ T cells, M1 macrophage, and N.K. cells was significantly increased in BPH and CRPC (Figure 6(I)).However, the proportion of M2 macrophage was relatively high in both CaP and CRPC but not in BPH, adjacent tissues, and normal prostate (Figure 6(I)).
To further illustrate the relation between the gene signatures and the immune microenvironment, the ssGSEA and ImmuCellAI algorithms were applied [21,22].The individuals of the gene set were significantly correlated with a suppressive immune profile (Figure 7(A)).Of note, DDA1, ERG28, OGFOD1, and OXA1L were all negatively and significantly correlated with the infiltration of CD8 þ T cells (Figure 7(E)).In keeping with previous findings, the increase of indicated genes' CNVs also correlated with the decrease of CD8 þ T cells, and increase of exhausted infiltrate in prostate cancer (Figure 7(F), Figure S5A).More importantly, the gene set constituted of DDA1, ERG28, OGFOD1, and OXA1L significantly correlated with an immune-suppressive tumor-microenvironment (Figure 7(G), Figure S5B).

Discussion
In this study, WGCNA analysis indicated that the significant modules in BPH and CaP were almost mutually exclusive.However, the BPH and CRPC shared multiple significant modules in WGCNA.Further analysis suggested that the mutual gene signature was significantly associated with decreased infiltration of anti-tumor immune cells and a suppressive microenvironment.Above all, a four-gene signature (DDA1, ERG28, OGFOD1, and OXA1L) was associated with the shared immune features in BPH and CRPC, as well as the formation of tumor microenvironment in CaP.
Previous findings indicated that the deprivation of androgen signaling altered in the immune microenvironment in CaP contributed to the failures of various therapies for CRPC [23].But the role of the immune system in the development of CRPC is largely unclear.Our data suggested that the transcriptomic profiles between BPH and CaP were distinct.This could be explained by the dominant expansion of luminal cells in CaP and the proliferation of basal epithelial cells in BPH [24].Further analysis proved that these mutual gene signatures were mainly activated in MAPK phosphorylation and MYC signaling pathways and correlated with the suppressive tumor microenvironment.The ssGSEA analysis of the constitution of immune cells showed that M 2 -type macrophage was increased in both CaP and CRPC, compared with BPH and normal prostate.However, the infiltration of CD8 þ T lymphocytes, natural killers, and M 1 -type macrophages was significantly increased in both BPH and CRPC compared with hormone-native prostate cancer.Overall, the immune infiltration scores in CPRC were improved compared to CaP.Since the androgen receptor inhibited the CD8 þ T cell-driven anti-tumor immune response, it is plausible that the long-term administration of androgen-axis blockade could remodel the tumor microenvironment to a less suppressive state [25].Clinically, the autologous cellular product consisting of activated antigen-presenting cells has proved effective in metastatic CRPC [26].Moreover, preclinical studies have shown that the anti-IL23 antibody had synergistic effects with enzalutamide in the disease control of CRPC [27].Of note, an ongoing phase 3 clinical trial will confirm the role of anti-PD1 antibodies with or without docetaxel and prednisone/prednisolone in metastatic CRPC patients [28].Above all, although CaP is a "cold" tumor, patients with CRPC could benefit from immunotherapy.Unlike renal cell carcinoma, the biomarkers for tumor immune microenvironment in prostate cancer have not been extensively explored [29].
Although multiple strategies are available for patients with prostate cancer, there remains an unmet need for biomarkers to predict or prevent the occurrence and progression of prostate cancer.Rebeccca et al. reported that activation of Wnt signaling in prostate cancer may serve as prognostic biomarkers [30].In this study, we globally screened the potential biomarkers based on immune features in BPH and CRPC.Briefly, the expression of DDA1, ERG28, OGFOD1, and OXA1L were significantly correlated with the immune microenvironment in BPH and PCa.DDA1, a novel oncogene, has been reported to be a potential novel target for lung cancer treatment, and a biomarker for tumor prognosis [31].ERG28 was functionally active in cholesterol synthesis [32].OGFOD1, a prolyl hydroxylase, could promote cancer proliferation by regulating the expression of cell cycle regulators [33].Recently, OXA1L was found to be required for the assembly of multiple respiratory chain complexes [34].Further studies are needed to elucidate the molecular role of DDA1, ERG28, OGFOD1, and OXA1L in BPH and CaP.
However, the asymmetric sample size during the WGCNA analysis could introduce potential bias, which was a primary limitation of this study.It should be noted that the tissues from BPH, CaP, and CRPC, but not the normal prostate and adjacent tissues, were the focus of this study.Thus, the relatively small size of the normal control might not affect the major results.
In conclusion, our study identifies novel and mutual gene signatures (DDA1, ERG28, OGFOD1, and OXA1L) in BPH and CRPC, and was closely related to a suppressive tumor microenvironment and clinical prognosis.Targeting these gene signatures would be a promising way to remodel the tumor microenvironment and ameliorate the risk of prostate cancer.Thus, our study provides novel insights into immunotherapeutic strategies for prostate cancer patients.

Figure 2 .
Figure 2. Analysis of copy number variations in benign prostate hyperplasia and prostate cancer.

Figure 3 .
Figure 3.The difference in transcription factors between BPH and prostate cancer.

Figure 5 .
Figure 5.The role of mutual gene signatures between BPH and CRPC in prognosis and anti-PD1 response.

Figure 6 .
Figure 6.The comparison of immune microenvironment between BPH and CRPC.

Figure 7 .
Figure 7.The correlation between selected gene signatures with the tumor immune microenvironment.