Identification of hub genes in bladder cancer based on weighted gene co‐expression network analysis from TCGA database

Abstract Background Muscular invasive bladder cancer (MIBC) is a common malignant tumor in the world. Because of their heterogeneity in prognosis and response to treatment, biomarkers that can predict survival or help make treatment decisions in patients with MIBC are essential for individualized treatment. Aim We aimed to integrate bioinformatics research methods to identify a set of effective biomarkers capable of predicting, diagnosing, and treating MIBC. To provide a new theoretical basis for the diagnosis and treatment of bladder cancer. Methods and results Gene expression profiles and clinical data of MIBC were obtained by downloading from the Cancer Genome Atlas database. A dataset of 129 MIBC cases and controls was included. 2084 up‐regulated genes and 2961 down‐regulated genes were identified by differentially expressed gene (DEG) analysis. Then, gene ontology analysis was performed to explore the biological functions of DEGs, respectively. The up‐regulated DEGs are mainly enriched in epidermal cell differentiation, mitotic nuclear division, and so forth. They are also involved in the cell cycle, p53 signaling pathway, PPAR signaling pathway, and so forth. The weighted gene co‐expression network analysis yielded five modules related to pathological stages and grading, of which blue and turquoise were the most relevant modules for MIBC. Next, Using Kaplan–Meier survival analysis to identify further hub genes, the screening criteria at p ≤ .05, we found CNKSR1, HIP1R, CFL2, TPM1, CSRP1, SYNM, POPDC2, PJA2, and RBBP8NL genes associated with the progression and prognosis of MIBC patients. Finally, immunohistochemistry experiments further confirmed that CNKSR1 plays a vital role in the tumorigenic context of MIBC. Conclusion The research suggests that CNKSR1, POPDC2, and PJA2 may be novel biomarkers as therapeutic targets for MIBC, especially we used immunohistochemical further to validate CNKSR1 as a therapeutic target for MIBC which may help to improve the prognosis for MIBC.

POPDC2, PJA2, and RBBP8NL genes associated with the progression and prognosis of MIBC patients. Finally, immunohistochemistry experiments further confirmed that CNKSR1 plays a vital role in the tumorigenic context of MIBC.
Conclusion: The research suggests that CNKSR1, POPDC2, and PJA2 may be novel biomarkers as therapeutic targets for MIBC, especially we used immunohistochemical further to validate CNKSR1 as a therapeutic target for MIBC which may help to improve the prognosis for MIBC. Neoadjuvant cisplatin-based chemotherapy (NAC) is the most effective approach and standard of care for MIBC before radical cystectomy. 2 But many patients do not respond to NAC and patients with MIBC usually relapse within 2 years. A biomarker is a biological substance whose detection indicates a specific disease state. 3 To date, several biomarkers have been introduced in daily clinical practice, including risk assessment, screening, differential diagnosis, prognostic determination, treatment response prediction, and disease progression monitoring. 4 With the discovery and development of high-throughput sequencing methods, the systematic analysis of high-throughput sequencing data and screening of important information is the basis for subsequent studies. 5 The emergence of network biology has led to a deeper understanding of complex biological systems, allowing the realization of tissue or cellular functions with a modular character.
The development of cancer is a systems biology process (BP) that spans different functional networks. 6 Weighted gene co-expression network analysis (WGCNA) 7 is a systems biology tool for characterizing gene expression patterns in samples and has been widely used in the analysis of various cancers, 8 such as colorectal cancer, 9 non-smallcell lung cancer (NSCLC), 10 and breast cancer. 11 WGCNA is used by studying the relationship between tissue microarray data and clinical features to identify possible biomarkers for predicting relevant cancers and comparing differentially expressed genes and studying the interactions between genes in different modules. 12 In our study, the RNA sequencing (RNA-seq) profile data of MIBC was downloaded from the Cancer Genome Atlas (TCGA) database.
Then, the differentially expressed genes (DEGs) between MIBC and normal tissues were further analyzed at the expression and functional levels. After that, The gene ontology (GO) functional enrichment analyses of DEGs were performed by clusterprofiler R package. Subsequently, WGCNA was used to identify modules related to disease status, and pivotal genes for turquoise and blue modules were identified. Finally, the hub genes were verified by survival analysis, an independent dataset, and an immunohistochemical (IHC) experiment to determine these genes play an essential role in MIBC development.
Therefore, our research may identify several effective biomarkers for MIBC and provide practical help for treating diseases.

| Data collection and processing
The resource-rich public database (TCGA: https://www.cancer.gov/ tcga) provides insight into the mechanisms of cancer progression and the opportunity to discover new biomarkers. The RNA-seq data included 19 normal and 414 tumor samples. Principal component F I G U R E 1 Flow chart of data collection, preparation, processing, analyzing, and validation in the study analysis (PCA) was performed on the gene expression data of MIBC using the "factoextra" (version: 3.3.3) R package. A follow-up analysis was performed after the exclusion of outliers. The workflow for our experimental study of the MIBC dataset in TCGA is shown in Figure 1.

| Identification of differentially expressed genes
The "limma" R package (version: 3.3.3) was utilized to identify the DEGs between MIBC and normal samples in the dataset for validation. jLog2FCj > 2 and adj. value < .01 are used as cutoff criteria. We use the R package "ggplot2" (version: 3.3.3) to visualize the DEGs and show it with a volcano plot.

| Gene ontology enrichment analyses of differentially expressed genes
A comprehensive understanding of the biological significance behind the genes is essential. GO is widely used for functional annotation and enrichment analysis; BP, molecular function (MF), and cellular component (CC) are the three major components of gene function. 13 The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource that integrates genomic, chemical, and phylogenetic information. It enables efficient candidate genes for pathway enrichment analysis. In this study, GO enrichment analysis and KEGG pathway analysis of previously obtained DEGs were performed using the R package "clusterProfiler" (version: 3.14.3) and "org.Hs.eg.DB" (version: 3.10). p values < .05 for DEGs were considered statistically significant.

| Co-expression network construction
Weighted gene co-expression networks analysis (WGCNA) of DEGs was performed according to the "WGCNA" (version: 1.70-3) R language package. 14 WGCNA, which aims to find co-expressed genes(modules) and explore the association between gene networks and phenotypes of interest and the hub genes in the network. Methodologically, WGCNA is divided into two parts: expression clustering analysis and phenotype association, which mainly include five steps: (1) network construction,

| Identification of hub genes
In this study, the key gene was defined by modular connectivity, measured by the absolute value of the module to measure the relationship between

| Single nucleotide polymorphism validation of hub genes
The cBioportal (http://www.cbioportal.org/) database can provide a resource: visual analysis of multidimensional cancer genomic data.
It also provides a graphical analysis at the gene level. We selected the bladder cancer database with 413 samples from cBioPortal to map the genome, including mutations, copy-number variance (CNV), and mRNA expression z-scores (RNASeqV2 RSEM). Meanwhile, we also showed the mutation types of some hub genes by the lollipop maps.  15 The CNKSR1 antibody (product #10885-1-AP) was used to detect CNKSR1 in this study. Two independent experts evaluated the results of the experiment. The scoring criteria of CNKSR1 protein expression in MIBC samples are as follows: intensity score (À negative, + weak, + + moderate, + + strong) Â positive reaction score (<10% À, 10% 25% +, 26% 50% positive +, > 50% moderate +).

| Immunohistochemical analysis
F I G U R E 2 Differentially expressed genes ***analysis of transcription profile of muscular invasive bladder cancer (MIBC) and normal samples. Survival analysis using the GEPIA database was performed to estimate the relationship between the nine hub genes and the prognosis. As can be seen from the box-profiles in Figure 5, the CNKSR1, HIP1R, and RBBP8NL were lower expressed in MIBC than normal bladder tissues (p < .05), and they may be tumor suppressor genes in MIBC. CFL2, TPM1, CSRP1, SYNM, POPDC2, and PJA2 were highly expressed in normal tissues ( Figure 5A-C). A previous study, which found differential expression of the SYNM, TPM1, CSRP1 gene in bladder cancer, demonstrated the reliability of our study. 16

| Single nucleotide polymorphism analysis of hub genes
We used cBioPortal for Cancer Genomics (https://www.cbioportal. org) to verify the single nucleotide polymorphism of nine mutated hub genes and showed the mutation type and mutation rate of each gene ( Figure 6A). Among them, the mutation rates of RBBP8NL, HIP1R, and CNKSR1 were the highest, which were 12%, 7%, and 5%, respectively.
And the site mutation type and mutation rate of CNKSR1, HIP1R, and other CSRP1 in amino acid sequence ( Figure 6B). The mutation rates and mutation types of the nine genes were shown in Table 1.

| CNKSR1 immunohistochemical analysis
As mentioned earlier, 17 we used Proteintech-branded CNKSR1 antibody (10885-1-AP; Proteintech) for IHC analysis. Immunohistochemistry was analyzed by two independent researchers who were unaware of the clinical results. According to the Shimizu criteria standard, 18 the expression of CNKSR1 protein in MIBC samples ranged from 0 to 2+. CNKSR1 was highly expressed in tumors as compared to matched normal, unaffected resected specimens (Figure 7).
The expression levels of CNKSR1 protein were divided into two low expression groups (0 or 1+) and one high expression group (2+). The  are often found in the cancer genetic sequence of bladder cancer. 16,30 The protein encoded by TPM1 is a member of the widely distributed actin-binding protein myosin (Tm) family, which participates in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells, and studies have shown that TPM1 is a tumor suppressor gene 31  The other three genes, CNKSR1, POPDC2, and PJA2, are also essential and highly involved in the process of many tumors. The

CONFLICT OF INTEREST
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

ETHICAL STATEMENT
Not applicable.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found TCGA database (https://www.cancer.gov/).