Development and validation of hub genes for lymph node metastasis in patients with prostate cancer.

Lymph node metastasis is one of the most important independent risk factors that can negatively affect the prognosis of prostate cancer (PCa); however, the exact mechanisms have not been well studied. This study aims to better understand the underlying mechanism of lymph node metastasis in PCa by bioinformatics analysis. We analysed a total of 367 PCa cases from The Cancer Genome Atlas database and performed weighted gene co-expression network analysis to explore modules related to lymph node metastasis. Gene Ontology analysis and pathway enrichment analysis were conducted for functional annotation, and a protein-protein interaction network was built. Samples from the International Cancer Genome Consortium database were used as a validation set. The turquoise module was the most relevant to lymph node metastasis. Functional annotation showed that biological processes and pathways were mainly related to activation of the cell cycle and mitosis. Four hub genes were selected: CKAP2L, CDCA8, ERCC6L and ARPC1A. Further validation showed that the four hub genes distinguished well between tumour and normal tissues, and that they were good biomarkers for lymph node metastasis of PCa. In conclusion, the identified hub genes advance our knowledge of the underlying molecular mechanism of lymph node metastasis in PCa.

have been used to predict the occurrence of lymph node metastasis in PCa patients. 5,6 However, most of them are based on traditional biopsies or imaging-based diagnoses and have limited accuracy and sensitivity. 7 In the past decade, microarray and next-generation sequencing technologies have increasingly been used to explore novel therapeutic targets and prognostic biomarkers for various cancers, and they also provide a good way to explore potential molecular biomarkers for lymph node metastasis of PCa. 8 Weighted gene co-expression network analysis (WGCNA) is an algorithm for weighted correlation network analysis that serves as a data exploration tool and gene screening method for finding clusters of highly correlated genes. 9 It has been widely used to find hub genes in many kinds of cancer. In this study, we used this algorithm to identify relevant modules and hub genes for lymph node metastasis of PCa, so that we can better understand the underlying molecular mechanism.

| Data collection and pre-processing
The workflow of the present study is shown in Figure 1. We downloaded mRNA expression profiles, including 367 PCa cases, from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). Clinicopathological characteristics of the 367 cases are summarized in Table 1. After screening the differentially expressed genes (DEGs) between PCa samples with and without lymph node metastasis, WGCNA was conducted to find the module related to lymph node metastasis. Gene Ontology (GO) analysis and pathway enrichment analysis were conducted for functional annotation of the selected module. We built a protein-protein interaction (PPI) network and selected hub genes according to the degree of connectivity. Then, online databases were used for further validation. Meanwhile, an additional independent data set including 279 PCa samples from the International Cancer Genome Consortium (ICGC) database (https://dcc.icgc.org/) was used to perform survival analyses for the hub genes; receiver operating characteristic (ROC) curves of the hub genes were plotted, and the area under the curve (AUC) was calculated to evaluate their capability to distinguish patients with lymph node metastasis from those without.
Clinicopathological characteristics of the 279 cases are summarized in Table 2.

| Differentially expressed genes screening
We screened the DEGs between PCa samples with and without lymph node metastasis in the TCGA cohort with the 'limma' R package.
An adjusted P-value < .05 and |logFC| ≥ 2 were set as the cut-off criteria for better accuracy and significance, as described previously. 10
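The cut-off rule can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study; it only applies the two thresholds to per-gene statistics that are assumed to have been computed already. The `GeneStat` record, the `screen_degs` function and the toy gene names are hypothetical.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """Per-gene summary (hypothetical record, e.g. exported from a limma table)."""
    gene: str
    log_fc: float   # log fold change, metastasis vs. no metastasis
    adj_p: float    # multiple-testing adjusted P-value


def screen_degs(stats, adj_p_cutoff=0.05, log_fc_cutoff=2.0):
    """Return (up, down) gene lists passing adj P < cutoff and |logFC| >= cutoff."""
    up, down = [], []
    for s in stats:
        if s.adj_p < adj_p_cutoff and abs(s.log_fc) >= log_fc_cutoff:
            (up if s.log_fc > 0 else down).append(s.gene)
    return up, down


# Toy input with made-up values, for illustration only.
toy = [
    GeneStat("GENE_A", 2.5, 0.001),   # passes, up-regulated
    GeneStat("GENE_B", -2.0, 0.04),   # passes, down-regulated (boundary |logFC| = 2)
    GeneStat("GENE_C", 3.0, 0.20),    # fails the adjusted P-value threshold
    GeneStat("GENE_D", 1.2, 0.001),   # fails the fold-change threshold
]
up_genes, down_genes = screen_degs(toy)
print(up_genes, down_genes)
```

Both conditions must hold at once, so a gene with a large fold change but a weak adjusted P-value is excluded, and vice versa.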

| Co-expression network construction
Firstly, data pre-processing and quality assessment were performed, and the WGCNA algorithm was used to construct a scale-free co-expression network for the DEGs. Pearson's correlation matrices were calculated for all pairwise genes by the average linkage method. After that, we constructed a weighted adjacency matrix by the power function $a_{mn} = |c_{mn}|^{\beta}$ ($c_{mn}$ = Pearson's correlation between gene m and gene n; $a_{mn}$ = adjacency between gene m and gene n). β is a soft-thresholding parameter that emphasizes strong correlations between genes and penalizes weak correlations. We chose a proper power of β according to the mean connectivity. Then, the adjacency was transformed into a topological overlap matrix (TOM), and the corresponding dissimilarity (1-TOM) was calculated. 11 In order to classify genes with similar expression profiles into gene modules, average linkage hierarchical clustering was performed on the TOM-based dissimilarity measure, with a minimum module size (gene group) of 30 for the gene dendrogram.
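As an illustration of the adjacency and TOM steps, the following pure-Python sketch computes the soft-thresholded adjacency and the unsigned topological overlap in the form commonly given for WGCNA. It is a sketch under stated assumptions, not the WGCNA R package itself: the function names and the toy expression vectors are hypothetical, and the diagonal of the adjacency matrix is taken as zero when connectivity is summed.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency_matrix(expr, beta):
    """a_mn = |c_mn| ** beta for every gene pair; the diagonal is left at 0."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for m in range(g):
        for n in range(m + 1, g):
            adj[m][n] = adj[n][m] = abs(pearson(expr[m], expr[n])) ** beta
    return adj


def tom_matrix(adj):
    """Unsigned topological overlap: (shared + a_mn) / (min(k_m, k_n) + 1 - a_mn)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]
    for m in range(g):
        for n in range(g):
            if m == n:
                continue
            shared = sum(adj[m][u] * adj[u][n] for u in range(g) if u not in (m, n))
            tom[m][n] = (shared + adj[m][n]) / (min(k[m], k[n]) + 1.0 - adj[m][n])
    return tom


# Toy expression vectors (rows = genes, columns = samples), made up for illustration.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with the first gene
    [1.0, -1.0, 1.0, -1.0],  # weakly correlated with both
]
adj = adjacency_matrix(expr, beta=14)
tom = tom_matrix(adj)
dissimilarity = [[1.0 - v for v in row] for row in tom]
```

Raising the absolute correlation to a high power pushes weak correlations towards zero while leaving strong ones close to one, which is the effect of β described above. The resulting dissimilarity matrix would then be the input to hierarchical clustering.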

| Identification of module associated with lymph node metastasis
We used two methods to identify modules relevant to clinical features of PCa. Gene significance (GS) was defined as the log10 trans-

| Functional enrichment analysis
Webgestalt (http://www.webgestalt.org/) and Metascape (http://metascape.org/) are online databases providing a comprehensive set of functional annotation tools for researchers to better understand the biological meaning behind large lists of genes. 12,13 We uploaded the genes in the chosen module to perform GO analysis and pathway enrichment analysis. P-value < .05 was considered statistically significant.
FIGURE 1 Flow chart detailing the study design and samples at each stage of the analysis

| PPI network and hub genes selection
Search Tool for the Retrieval of Interacting Genes (STRING) is a biological database for constructing PPI networks, providing a system-wide view of interactions between each member. 14 Genes of the selected module were mapped to STRING to explore their relationships with each other, and a combined score of >0.4 was set as the cut-off criterion as described previously. 15 Then, we established the PPI network using Cytoscape software, which visually explores biomolecular interaction networks composed of proteins, genes and other molecules. The plug-in Centiscape was used to search for the most important nodes in the network by calculating centrality parameters for each node. 15 We selected hub genes based on the following criteria: a high degree of connectivity; a statistically significant difference in expression level between samples with and without lymph node metastasis; and a statistically significant difference in overall survival between samples with high and low gene expression levels.
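The degree-of-connectivity criterion can be sketched in a few lines of Python. This is only an illustration of filtering edges by combined score and counting neighbours; it does not reproduce STRING, Cytoscape or Centiscape. The edge list, the protein names and the `degree_by_node` function are hypothetical, and scores are assumed to be on a 0 to 1 scale.

```python
from collections import Counter


def degree_by_node(edges, min_score=0.4):
    """Count, for each node, the edges whose combined score exceeds min_score."""
    degree = Counter()
    for a, b, score in edges:
        if score > min_score:
            degree[a] += 1
            degree[b] += 1
    return degree


# Toy interaction list: (protein, protein, combined score), made up for illustration.
toy_edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P1", "P4", 0.45),
    ("P2", "P3", 0.35),  # below the cut-off, dropped
    ("P3", "P4", 0.40),  # equal to the cut-off, dropped because the rule is strictly ">"
]
degree = degree_by_node(toy_edges)
# Rank nodes by descending degree, breaking ties by name for a stable order.
ranked = sorted(degree, key=lambda node: (-degree[node], node))
print(ranked[0], degree[ranked[0]])
```

Degree is only the first of the three selection criteria listed above; candidates ranked this way would still need to pass the expression and survival comparisons.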

| Validation of hub genes by online database
UALCAN (http://ualcan.path.uab.edu/) is a portal for facilitating tumour subgroup gene expression and survival analyses. 16 The mRNA expression levels and promoter methylation levels of the hub genes were examined using UALCAN. Additionally, the Human Protein Atlas (http://www.proteinatlas.org) was used for validation at the immunohistochemistry level.

| Survival analyses and ROC curve analyses of hub genes
Data set including 279 cases of PCa samples from ICGC database was used for validation. Patients from ICGC cohort were divided into two groups (high expression and low expression) according to a cut-off value of mean expression of the hub genes. Then, survival analyses for hub genes were performed. The hazard ratio (HR) with 95% confidence intervals and log-rank P-value was calculated and displayed. Moreover, receiver operating characteristic (ROC) curves of hub genes were plotted and AUC was calculated with the 'ROC' R package to evaluate the capability of distinguishing a patient with lymph node metastasis or not. and GraphPad Prism 5.0 (GraphPad Software). Survival curves were plotted by the Kaplan-Meier method and compared by the log-rank test. P < .05 was considered to be statistically significant.
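The grouping and AUC steps can be illustrated with a short Python sketch. It is not the R workflow used in the study: the AUC is computed here through its rank-based (Mann-Whitney) interpretation rather than with an R package, the function names and toy values are hypothetical, and samples with expression strictly above the mean are assumed to form the high-expression group.

```python
def split_by_mean(expression):
    """Label each sample 'high' or 'low' relative to the mean expression."""
    cutoff = sum(expression) / len(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def auc(scores, labels):
    """AUC as the probability that a positive sample scores above a negative one.

    Ties count as half a win (Mann-Whitney interpretation of the ROC area).
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration: expression of one gene and
# lymph node status (1 = metastasis, 0 = no metastasis) per sample.
expr = [5.1, 6.3, 4.2, 7.8, 6.0, 3.9]
ln_status = [0, 1, 0, 1, 0, 1]

groups = split_by_mean(expr)
toy_auc = auc(expr, ln_status)
print(groups, round(toy_auc, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so the value indicates how well a single gene's expression ranks metastatic above non-metastatic samples.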

| DEGs screening
We obtained the expression data of 367 PCa samples after data preprocessing and quality assessment. Under the threshold of adjusted P-value < .05 and |logFC| ≥ 2, a total of 745 DEGs (461 up-regulated and 284 down-regulated) were chosen for subsequent analysis. A volcano plot and heat map of the DEGs are shown in Figure 2A,B, respectively.

| Weighted co-expression network construction and key modules identification
Firstly, we selected the power of β = 14 (scale-free R² = .91) as the best soft-thresholding parameter (Figure 3A,B); Figure 3C,D shows the positive result of the rationality test. After that, a sample dendrogram was constructed based on the similarity between the samples, and the clinical characteristics of each sample are shown (Figure 4A).
Finally, 11 modules were identified (Figure 4B). We used two methods to test the relevance between each module and lymph node metastasis. Modules with a higher module significance (MS) value were considered to be more closely connected with lymph node metastasis, and we found that the MS of the turquoise module was higher than that of any other module (Figure 5A). In addition, the module eigengene (ME) of the turquoise module showed a higher correlation with lymph node metastasis than those of the other modules (Figure 5C). We therefore identified the turquoise module, comprising 133 genes, as the module most relevant to lymph node metastasis in PCa (Figure 5B).
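The module significance comparison can be sketched as follows, assuming the usual WGCNA convention that MS is the average absolute gene significance (GS) of the genes in a module. The GS values are taken as given inputs, since their computation is not reproduced here; the gene names, module assignments and the `module_significance` function are hypothetical.

```python
from collections import defaultdict


def module_significance(gene_module, gene_significance):
    """Average absolute gene significance per module."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for gene, module in gene_module.items():
        totals[module] += abs(gene_significance[gene])
        counts[module] += 1
    return {module: totals[module] / counts[module] for module in totals}


# Toy assignments and GS values, made up for illustration.
gene_module = {"g1": "turquoise", "g2": "turquoise", "g3": "blue", "g4": "blue"}
gene_significance = {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": 0.5}

ms = module_significance(gene_module, gene_significance)
top_module = max(ms, key=ms.get)
print(ms, top_module)
```

The module with the highest MS is the candidate most associated with the trait; in the study this ranking was cross-checked against the module eigengene correlation.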

| Functional enrichment analysis
We performed functional enrichment analysis to look for the biological processes and pathways relevant to the turquoise module. GO analysis was conducted with Webgestalt. GO analysis of biological process revealed that genes in the turquoise module were mainly involved in biological regulation, cellular component organization, metabolic process, response to stimulus and multicellular organismal process (Figure 6A). GO analysis of cellular component showed that these genes were mainly enriched in nucleus, membrane-enclosed lumen, cytosol, protein-containing complex and chromosome (Figure 6B). GO analysis of molecular function revealed that these genes were mainly involved in protein binding, ion binding, nucleic acid binding, nucleotide binding and hydrolase activity (Figure 6C). Metascape was used to further investigate the relevant biological processes and pathways. The result showed that the biological processes and pathways were mainly related to the cell cycle and mitosis (Figure 6D,E and Table 3).

| Identification of hub genes and analysis of modules from PPI networks
A PPI network was constructed by STRING ( Figure 6F). Finally, we selected four hub genes for further discussion. They were CKAP2L  Figure 5B. This result was basically consistent with PPI network.

| Validation and efficacy evaluation of hub genes
We used data set from the online database UALCAN for validation.
These hub genes showed higher expression levels in samples with lymph node metastasis than in those without, and higher levels still when compared with normal tissues (Figure 7A). In addition, compared with normal tissues, these hub genes showed lower levels of promoter methylation in PCa tissues (Figure 7B). Immunohistochemistry staining from the Human Protein Atlas database also revealed that protein levels of the hub genes were significantly higher in PCa tissues than in normal tissues (Figure 7C).
A data set including 279 cases of PCa samples from the ICGC database was used for validation. It was found that increased ex-

| DISCUSSION
Lymph node metastasis indicates poor prognosis for PCa patients. 3,4 Therefore, elucidation of the molecular mechanism of lymph node  [17][18][19] This is the basic process of lymph node metastasis; however, the exact mechanisms have not been well studied. 17 In this study, we applied WGCNA to identify the key modules and hub genes in lymph node metastasis of PCa. Functional enrichment analysis was performed, and PPI network was built to further explore the biological significance. The additional independent data set was used to confirm the reliability of the results. This study provides novel insights that will help to explain the mechanism of lymph node metastasis in PCa at the molecular level, and the hub genes identified might act as potential biomarkers as well as therapeutic targets for precise diagnosis and treatment in the future.
In the present study, the turquoise module identified by WGCNA was the most significantly related module to lymph node metastasis of PCa. In functional enrichment analysis conducted based on genes in turquoise module, we found that these genes were mainly related to the process of cell cycle and proliferation. Specifically, pathway analysis indicated that 51 up-regulated genes were significantly en- Owing to the importance of turquoise module in lymph node metastasis of PCa, we screened the hub genes of this module.
Four hub genes were selected according to the degree of connectivity in the PPI network. CKAP2L, also known as Radmis, is a microtubule-associated protein that appears at the mitotic phase and participates in the cell division of neural progenitor cells. 21 In addition, CKAP2L has been shown to be a vital component of the centrosome and is located in the spindle, the midbody and the spindle pole. 22 A recent study revealed that high expression of CKAP2L promoted the invasion of lung cancer through the MAPK signalling pathway and was associated with poor prognosis. 23 In CDCA8, also called Borealin/Dasra B, is a member of the chromosomal passenger complex necessary for transmission of the genome during cell division. 27 It plays a vital role in mitosis, intersecting chromosome segregation and cell division with cancer. 28 A previous study revealed that CDCA8 was overexpressed in colorectal cancers and that loss of CDCA8 suppressed the growth of cancer cells and induced apoptosis. 28 Furthermore, it was reported that high expression of CDCA8 was significantly associated with lymph node metastasis in melanoma. 29  in the development of tumours. 32 It was reported that ERCC6L was relevant to disease progression and poor prognosis in many types of cancer, such as breast, renal and colorectal cancer. 33,34 The high expression of ERCC6L, together with its role in embryonic development and its involvement in remodelling centromeric chromatin, leads us to hypothesize that it may play a key role in tumorigenesis. However, further studies are required to clarify the role of ERCC6L in lymph node metastasis of PCa. ARPC1A is a member of the actin-related protein 2/3 (ARP2/3) complex family. 35 The ARP2/3 complex takes part in the process of actin filament nucleation and depolymerization, which are necessary for the formation of invasive pseudopodia in cancer cells. 36 Previous studies have reported that components of the ARP2/3 complex were highly expressed in various tumours, including bladder, breast, gastric and lung cancers. [36][37][38][39] Consequently, ARPC1A is a promising candidate biomarker or predictor for lymph node metastasis of PCa, provided that further studies confirm its value.

TABLE 3 Functional enrichment analysis for genes in turquoise module
Analyses by UALCAN and the Human Protein Atlas online databases confirmed the function and the clinical significance of these hub genes. Besides, ROC curve analyses conducted based on the additional independent data set from ICGC database revealed that these hub genes did well in predicting lymph node metastasis of PCa, especially when combining all of them. Therefore, these hub genes may become potential biomarkers for predicting lymph node metastasis of PCa.
In conclusion, we used a series of bioinformatics analysis methods to identify the key genes involved in lymph node metastasis of PCa. Our results provide a more detailed molecular mechanism for lymph node metastasis of PCa, shedding light on the potential biomarkers and therapeutic targets. However, the interacting mechanism and function of genes need to be confirmed in further experiments.

CONFLICT OF INTEREST
The authors confirm that there are no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data sets used and/or analysed in this study are available from the corresponding author on reasonable request.