Bioinformatics analysis identified hub genes in prostate cancer tumorigenesis and metastasis

: Objective: Prostate cancer (PCa) is the most frequent cancer found in males worldwide, and its mortality rate is increasing every year. To discover key molecular changes in PCa development and metastasis, we analyzed microarray data of localized PCa, metastatic PCa and normal prostate tissue samples from clinical specimens. Methods: Gene expression profiling datasets of PCa were analyzed online. The DAVID was used to perform GO functional and KEGG pathway enrichment analyses. CytoHubba in Cytoscape software was applied to identify hub genes. Survival data were downloaded from GEPIA. Gene expression data were obtained from ONCOMINE and UALCAN. Results: We obtained 4 sets of differentially expressed genes (DEGs), DEGs 1: a comparison of the gene expression between 4 normal prostate and 5 localized PCa samples in GSE27616; DEGs 2: a comparison of the gene expression between 6 normal prostate and 7 localized PCa samples in GSE3325; DEGs 3: a comparison of the gene expression between 5 localized PCa and 4 metastatic PCa samples in GSE27616; DEGs 4: a comparison of the gene expression between 7 localized PCa and 6 metastatic PCa samples in GSE3325. A comparison of these 4 sets of genes revealed 51 overlapped genes. GO function analysis revealed enrichment of the 51 DEGs in functions related to the proteinaceous extracellular matrix and centrosome, protein homodimerization higher in PCa than the levels in normal tissues, respectively (P ＜ 0.05). We obtained 20 hub genes from DEGs by the comparison of normal prostate tissue vs. localized cancer tissue. Among them, KIF20A , CDKN3 , PBK and CDCA2 , were expressed higher in PCa than in normal tissues, and were associated with the DFS of PCa patients. Meanwhile, we obtained 20 hub genes from DEGs by the comparison of localized cancer tissue vs. metastatic cancer tissue. Cox regression revealed PLK1 , CCNA2 and CDC20 , were associated with both the DFS and overall survival of PCa patients. Conclusions: The results suggested that the functions of KIF20A , CDKN3 , PBK and CDCA2 may contribute to PCa development and the functions of PLK1 , CCNA2 and CDC20 may contribute to PCa metastasis. Meanwhile, the functions of TOP2A , CCNB2 , BUB1 , CDK1 and EZH2 may contribute to both PCa development and metastasis.


Introduction
Prostate cancer (PCa) is the second most frequent cancer found in men worldwide (after lung cancer), with an estimate of 248,530 new cases and 34,130 deaths in the United States in 2021 [1]. And PCa is the second leading cause of cancer death in American men and survival rates are low for prostate cancers that advance to metastatic castrationresistant prostate cancer (mCRPC). Despite recent advances and a range of treatment options for mCRPC, outcomes are varied and clinicians are not able to predict response to the available therapies [2]. Therefore, the need to more accurately identify lethal PCa, in an effort to personalize medicine for those in need, has led to a large-scale push for biomarker development in the field.
Many efforts based on microarray have been done in order to select the key genes associated with PCa. Due to microarray technology, it is easier to analyze the genetic alterations underlying PCa development and progression. In addition, through bioinformatics tools it is possible to identify new biomarkers and to construct networks that could be valuable for the management of PCa patients [3]. However, to date, it has been difficult to identify key genes related and distinguished to PCa development and metastasis from microarray data. To discover key molecules active in PCa development and metastasis, we analyzed PCa data from microarray data. Our results suggested KIF20A, CDKN3, PBK and CDCA2 may contribute to PCa development, PLK1, CCNA2 and CDC20 may contribute to PCa metastasis, TOP2A, CCNB2, BUB1, CDK1 and EZH2 may contribute to PCa development and metastasis.

Online data
The gene expression profiling datasets of PCa were analyzed online (GEO; https://www.ncbi.nlm.nih.gov/geo/geo2r/). 4 normal prostate, 5 localized prostate cancer, and 4 metastatic prostate cancer samples were measured in the array GSE27616. 6 normal prostate, 7 localized prostate cancer, and 6 metastatic prostate cancer samples were measured in the array GSE3325.

Identifying differentially expressed genes (DEGs)
To analyze the microarray data, we defined DEGs 1: a comparison of the gene expression between 4 normal prostate and 5 localized prostate cancer samples in GSE27616; DEGs 2: a comparison of the gene expression between 6 normal prostate and 7 localized prostate cancer samples in GSE3325; DEGs 3: a comparison of the gene expression between 5 localized prostate cancer and 4 metastatic prostate cancer samples in GSE27616; DEGs 4: a comparison of the gene expression between 7 localized prostate cancer and 6 metastatic prostate cancer samples in GSE3325. We acquired the overlapped genes between DEGs 1 and DEGs 2 to identify genes involved with tumorigenesis, and defined genes cluster NC represented normal prostate tissue vs. localized cancer tissue. We acquired the overlapped genes between DEGs 3 and DEGs 4 to screen genes that promote tumor metastasis, and defined genes cluster LM represented localized cancer tissue vs. metastatic cancer tissue. DEGs were screened by P value and fold change (FC), restricted by P value < 0.05 and log|FC| ≥ 1. FunRich software (FunRich_3.1.3) was used to identify overlapped DEGs. The upregulated and downregulated genes were measured, respectively.

Mergine data
We proposed the two methods to process the clusters NC and LM: (1) tumorigenesis and metastasis were promoted by the same genes or proteins, the overlapped genes between NC and LM were analyzed to perform Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis; (2) tumorigenesis and metastasis were contributed by different genes, we would find key genes from clusters NC and LM individually, and NC and LM genes were individually analyzed to perform GO and KEGG pathway analysis and retrieve interacting genes.

GO functional and KEGG pathway analysis
The Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/) was used to perform GO functional and KEGG pathway enrichment analyses. P value < 0.05 was set as the cut-off criterion.

Identification of hub genes
Search Tool for the Retrieval of Interacting Genes (STRING) is a biological database (https://string-db.org) for constructing protein-protein interaction (PPI) networks, providing a system-wide view of interactions between each member. DEGs was constructed using STRING database, and an interaction with a combined score > 0.4 was considered statistically significant. Then, we established PPI network using Cytoscape software (Cytoscape_v3.7.2), which visually explores biomolecular interaction networks composed of proteins, genes and other molecules. CytoHubba in Cytoscape software was applied to screen the hub genes ranked by degree. And hub genes represented genes with a large degree and were hypothesized to play a key role in the network.

DEGs screening
We identified 1881 and 2389 DEGs by analysis of normal prostate tissue vs. localized cancer tissue in GSE27616 and GSE3325, respectively. 3246 and 6331 DEGs by analysis of localized cancer tissue vs. metastatic cancer tissue in GSE27616 and GSE3325, respectively. A comparison of these 4 sets of genes revealed 51 overlapped genes, including 18 upregulated and 33 downregulated DEGs ( Figure 1, Table 1). We next analyzed these genes by performing two kinds of functional analysis.

GO and KEGG pathway analysis
In the first analysis, the 18 upregulated and 33 downregulated genes that were differentially expressed in both the comparison of normal prostate tissue and localized cancer tissue and the comparison of localized cancer tissue and metastatic cancer tissue were analyzed using the DAVID. GO functional and KEGG pathway enrichment analyses were performed. GO function analysis revealed enrichment of these DEGs in functions related to the proteinaceous extracellular matrix and centrosome. There was an enrichment of genes involved with regulating cell division, protein homodimerization activity, mitotic nuclear division, proteinaceous extracellular matrix, chromatin binding, cell adhesion and apoptotic process. KEGG pathway analysis indicated that the identified DEGs were mainly enriched in progesterone-mediated oocyte maturation, oocyte meiosis and cell cycle. The 20 most enriched classes based on GO function analysis and the 3 most enriched KEGG pathways were listed in Table 2. In the second analysis, we focused on the DEGs identified by the comparison of normal prostate and localized cancer tissues or those identified by the comparison of localized cancer and metastatic cancer tissue samples. Analysis of DEGs from the normal prostate tissue vs. localized cancer tissue comparison should reflect key genes participating in tumorigenesis or PCa development. GO function analysis of these genes found high enrichment of functions related to plasma membrane and extracellular space, heme binding and heparin binding were the main functions of these genes, which participated in oxidation-reduction process, cell adhesion, and response to drug. KEGG pathway analysis indicated enrichment of these genes in protein digestion and absorption, and TGF-beta signaling pathway (Table 3). We next analyzed the DEGs identified by the comparison of localized cancer tissue and metastatic cancer tissue samples, which should include genes related to PCa metastasis. GO function analysis revealed enrichment of these genes in functions related to the extracellular exosome, extracellular region and extracellular space, and analysis of molecular function showed enrichment in protein binding. The most relevant enriched biological processes were cell adhesion, cell division, and cell proliferation and KEGG pathway analysis indicated enrichment of these genes in pathways in cancer, HTLV-I infection, cell cycle and cell adhesion molecules (CAMs) ( Table 3).

Hub gene analysis
We used STRING for investigating and integrating interaction between proteins. The PPI network of overlapped DEGs between NC and LM was constructed and data were exported for further analysis by Cytoscape (Figure 2a). And a total of 16 genes were identified as hub genes with degrees ≥ 14, shown in Figure 2b. Also, the top 20 hub genes with the highest degree of connectivity in clusters NC and LM were shown in Figures 2c and 2d, respectively.

Gene expression in UALCAN
UALCAN is a website that helps analyze, integrate and discover cancer transcriptomic data and deep analyses of TCGA gene expression information. It allows us to offer the differential expression analyses on normal and PCa tissues. It contained 497 PCa tissues and 52 normal prostate samples. As Figure 5 demonstrated, all hub genes significantly overexpressed in PCa tissues than in normal tissues ( P < 0.001).

Discussion
In this analysis, we defined DEGs for the NC comparison of normal prostate tissue vs. localized cancer tissue and for the LM comparison of localized cancer tissue vs. metastatic cancer tissue in GSE27616 and GSE3325, and considered the identified DEGs contributing to prostate cancer development and contributing to prostate cancer metastasis, respectively. Overlapped genes found in both NC and LM analyses affect both prostate cancer development and metastasis. GO function analysis revealed enrichment of the 51 DEGs overlapped in clusters NC and LM in functions related to the proteinaceous extracellular matrix and centrosome, protein homodimerization activity and chromatin binding were the main functions of these genes, which participated in regulating cell division, mitotic nuclear division, proteinaceous extracellular matrix, cell adhesion and apoptotic process. KEGG pathway analysis indicated that these identified DEGs were mainly enriched in progesterone-mediated oocyte maturation, oocyte meiosis and cell cycle. GO function analysis discovered DEGs in either cluster NC or LM were both enriched in extracellular space, and participated in cell adhesion. Their molecular function and KEGG pathway were different. Heme binding and heparin binding were the main functions of DEGs in cluster NC, and protein binding was the main function of DEGs in cluster LM. KEGG pathway analysis indicated that DEGs in cluster NC were mainly enriched in protein digestion and absorption, and TGF-beta signaling pathway, DEGs in cluster LM were mainly enriched in pathways in cancer, HTLV-I infection, cell cycle and CAMs.
We found that KIF20A, CDKN3, PBK and CDCA2 in cluster NC, expressed higher in prostate cancer than in normal tissues, and were associated with the DFS of prostate cancer patients, maybe associated with prostate cancer development. Although, their expressions were not all elevated in prostate cancer tissue than in normal tissues significantly, PLK1, CCNA2 and CDC20 in cluster LM, were associated with both the DFS and overall survival of prostate cancer patients, may be associated with prostate cancer metastasis. It was interesting that our analysis revealed 5 genes, TOP2A, CCNB2, BUB1, CDK1 and EZH2, overlapped in clusters NC and LM, expressed higher in prostate cancer than in normal tissues, and were associated with the DFS of prostate cancer patients, which may be associated with both prostate cancer development and metastasis.
KIF20A, known as mitotic kinesin-like protein, is a member of the kinesin family protein which is located in the Golgi apparatus and participates in the dynamics of organelle and cell division. Studies [4] have found that KIF20A was abnormally highly expressed in prostate cancer tissues and was associated with adverse prognosis of PCa patients and knockdown of KIF20A could inhibit the proliferation and invasion of prostate cancer cell. Cyclin-dependent kinase inhibitor 3 (CDKN3, also called CDI1 or KAP) is a member of the dual specificity protein phosphatase family, plays an important role in regulating cell division. It has been reported [5] that CDKN3 was over-expressed in PCa and negatively correlated with disease free survival time of PCa. Down-regulation of CDKN3 inhibited proliferation, promoted apoptosis and invasion of PCa cells, and CDKN3 may regulate these biological progresses through associating with cell cycle and DNA replication signal pathways. PDZ-binding kinase (PBK) or T-LAK cell-originated protein kinase (TOPK), is highly homologous to the MAP kinase kinases, particularly MKK3. Physiologically, PBK/TOPK plays a positive regulatory role in proper chromosomal separation and cytokinesis through phosphorylation of various targets. It was found that PBK could be a prognostic biomarker for prostate cancer that would discriminate aggressive prostate cancer from indolent disease, and was a potential target for the therapeutic intervention of aggressive prostate cancer in men [6]. CDCA2, also named as Repo-Man (Recruits PP1 onto Mitotic chromatinat anaphase), is a nuclear protein that binds to protein phosphatase 1 γ (PP1γ). It suggested that CDCA2 was responsible for the targeting of PP1 to chromatin during anaphase, leading to the dephosphorylation of H3 and controlled cell proliferation in vitro [7].
Polo-like kinase 1 (PLK1), a serine/threonine kinase that is essential for cell cycle progression, is a novel and major regulator of FOXO1 in the late phases of the cell cycle. Studies [8] have shown that PLK1 dependent phosphorylation of FOXO1 induced its nuclear exclusion and negatively regulated FOXO1′s transcriptional activity in prostate cancer. Blocking the PLK1-dependant phosphorylation of FOXO1 restored the pro-apoptotic function of FOXO1 in PCa. In addition, some studies [9] believed that cotargeting PLK1 and androgen receptor (AR) would be effective in the treatment of paclitaxel-resistant prostate cancer. Yang et al. [10] used weighted gene co-expression network analysis (WGCNA) to analyze hub genes associated with high-stage tumor stage and gleason scores in prostate cancer. They studied CCNA2 and found that it has a significant correlation with biochemical recurrence rate and survival rate of prostate cancer, and verified the role of CCNA2 in prostate cancer cell lines and found that CCNA2 was associated with tumor cell proliferation, invasion, metastasis, and cell cycle. Cell division cycle 20 (CDC20), is a WD40 repeat domain containing E3 ligase that associates with and activates the anaphase-promoting complex (APC). In prostate cancer, Mao et al. [11] demonstrated high expression level of CDC20 associated with higher gleason scores and predicted biochemical recurrence. Li et al. [12] presented the preliminary evidence that CDC20 was significantly involved in mediating chemoresistance to docetaxel partly through inhibiting Wnt/βcatenin signaling.
Metastatic PCa accounts for the majority of PCa specific mortality. However, the ability to distinguish primary PCa with metastatic potential has not been achieved. Studies have shown that primary human PCa with concurrent increased expression of TOP2A and EZH2 have a faster time to biochemical recurrence, were more susceptible to progress to a metastatic disease and thereby resulting in increased PCa specific death [13]. Jin et al. [14] suggested that TROAP and E2F1 were co-regulated by EZH2, TROAP was overexpressed in PCa when compared with the expression in normal tissues, and indicated that TROAP could promote PCa development and progression, at least partially, via a TWIST/c-Myc pathway. Cyclin B2 (CCNB2), a member of the cyclin protein family, has been found to be up-regulated in human cancers. Studies have suggested that detection of serum circulating CCNB2 mRNA may have potential clinical applications in screening and monitoring of metastasis and therapeutic treatments [15]. Other studies suggest that knockdown of circ_CCNB2 was shown to promote PCa radiosensitivity through autophagy repression by miR-30b-5p/KIF18A axis, developing a molecular resistance mechanism of PCa radiotherapy and a feasible strategy to increase radiosensitivity [16]. But the role of CCNB2 in prostate cancer tumorigenesis and metastasis remains unclear. Budding uninhibited by benzimidazoles 1 (BUB1) is a mitotic checkpoint serine/threonine kinase that serves a central role in aligning chromosomes and establishing the mitotic spindle checkpoint [17]. Researches have suggested that the overexpression of BUB1B in PCa cells promoted cell proliferation and migration. Furthermore, high expression levels of BUB1B in PCa were associated with poor clinicopathological features of PCa and predicted poor outcomes of patients with advanced PCa [18]. Cyclin dependent kinase 1 (CDK1, also called CDC2) as an androgen receptor Ser-81 kinase. Researches have indicated that CDK1 can stabilize ARs and that increased CDK1 activity may enhance AR responses to low levels of androgen in androgen-independent PCa [19]. In prostate cancer patients with PSA at diagnosis of ＜ 20 ng /ml , phosphorylation of AR at serine 515 by CDK1 may be an independent prognostic marker [20]. But whether CDK1 plays a role in prostate cancer metastasis is unclear.
Our methods have several limitations that could be improved in future studies. For instance, Zhang et al. [21] developed a time-course RNA-seq data-driven computational method to study the network mechanisms underlying cancer drug resistance, we also can use the method to study whether the identified hub genes are associated with the targeted therapy response and prognosis of PCa patients. Zhang et al. [22] presented a promising single-cell RNA-seq transcriptome-based multilayer network approach to elucidate the interactions between tumor cell and tumor-associated microenvironment and to identify prognostic and predictive signatures of cancer patients, we can combine single-cell RNA-seq data with clinical gene expression data to develop a computational pipeline for identifying the prognostic and predictive signature that connects cancer cells and microenvironmental cells. Then the single-cell RNA-seq transcriptome-based multilayer network biomarker may predict survival outcome and therapeutic response of PCa patients.

Conclusion
The present study analyzed the gene expression profiles of localized and metastatic prostate cancer tissue using bioinformatics analysis. It was identified that DEGs, may serve roles in development and metastasis of prostate cancer. These genes may become the novel potential biomarkers and therapeutic targets for patients with prostate cancer, and distinguish metastatic potential patients with indolent prostate cancer. However, further research is required to confirm whether these genes play a developmental or metastatic role in prostate cancer, or both.