Identiﬁcation of Co-Deregulated Genes in Urinary Bladder Cancer Using High-Throughput Methodologies

: Although several genes are known to be deregulated in urinary bladder cancer (UBC), the list of candidate prognostic markers has expanded due to the advance of high-throughput methodologies, but they do not always accord from study to study. We aimed to detect global gene co-expressional proﬁles among a high number of UBC tumors. We mined gene expression data from 5 microarray datasets from GEO, containing 131 UBC and 15 normal samples. Data were analyzed using unsupervised classiﬁcation algorithms. The application of clustering algorithms resulted in the isolation of 6 down-regulated genes ( TMP2, ACTC1, TAGLN, MFAP4, SPARCL1, and GLP1R ), which were mainly implicated in the proteasome, base excision repair, and DNA replication functions. We also detected 6 up-regulated genes ( CDC20, KRT14, APOBEC3B, MCM5, STMN, and YWHAB ) mainly involved in cancer pathways. We identiﬁed lists of drugs that could potentially associate with the Differentially Expressed Genes (DEGs), including Vardenaﬁl, Pyridone 6, and Manganese (co-upregulated genes) or 1D-myo-inositol 1,4,5-triphosphate (co-down regulated genes). We propose 12 novel candidate markers for UBC, as well as potential drugs, shedding more light on the underlying cause of the development and progression of the disease.


Introduction
Urinary bladder cancer (UBC) is the second most common cancer of the human urogenital system [1][2][3][4][5]. In the United States alone, it is the sixth most common malignancy representing 4.6% of all cancers, while globally it ranks eleventh in frequency [2][3][4][5]. The tumor's diagnosis is not related to age, yet its occurrence is very rare before the age of 40, with an average diagnosis at 65-70 years [2][3][4][5]. The male to female ratio is approximately 4:1. In men, UBC is the seventh most frequent tumor precedent by those of the prostate, lung, and colon. Differences in male to female ratio are probably attributed to the genderrelated life-style, to the effect that androgenic hormones have, compared to estrogens, as well as to the process of oncogenesis in the bladder [2][3][4][5].
Depending on their histological type, urinary bladder tumors are classified into transitional cell carcinomas (TCC, 95%), squamous cell carcinomas (SqCC, 1-2%), adenocarcinomas (1%), and small cell carcinomas (SmCC, <1%). Finally, there are even rarer forms, of epithelial or non-epithelial origin, such as leiomyosarcoma, lymphoma, carcinosarcoma, and rhabdomyosarcoma [6]. The staging and grading of the tumor are both critical parameters for the likelihood of its recurrence and progression. Although both give an overview of the disease state, they have limited prognostic capacity in terms of the recurrence of the tumor, patient survival, or response to treatment. Therefore, efforts are constantly made to detect new biomarkers (genes), using a sufficient sample of cancer patients of different tumor stage and grade [1,4,5,7].
Several genes are well-known to participate in the malignant transformation of cells in UBC. These include proto-oncogenes, which once aberrantly expressed, interfere with normal growth mechanisms. The best example of this case is the RAS family of genes, which encode a group of proteins (P21 or G) that indirectly regulate cell growth, where RAS mutations lead to uncontrolled cell growth [8,9]. Other known oncogenes include MYCC [10], HER2 (Cerb-B2) [11], MDM2 [12], FGFR3 [13,14] and ERBB1 [15][16][17]. Several cycle-regulatory genes are also involved in the onset of the disease.
DNA microarrays constitute a high-throughput technology used for the study of gene expression patterns. Common applications of microarrays include the study of changes in gene expression between different conditions [18]. The present work compares two different classes, healthy tissue used as controls and tumors from all stages and grades of the urinary bladder.
The determination of gene relationships or groups of genes and the identification of biological functions of interest is carried out using machine learning algorithms, such as clustering and classification [18]. A major challenge in the analysis of gene expression is the manipulation of the large number of genes found in the original dataset, which could be up to tens of thousands. Many of these genes are not related to grouping or classification and therefore, the data must be adapted somehow, to enhance the relationship between genes and samples. Data (genes) that do not provide any additional information, will add significant complexity to the study, if not removed [19]. For example, if the expression of a particular gene is the same in all samples, this gene will consequently not be able to distinguish between sub-groups. Conversely, if a gene is expressed differently, e.g., between control-disease groups, then, it is probably useful for the distinction. Therefore, the selection of an optimal gene identification subset from the original dataset is an important step before grouping or classifying, and it is customary to call such a subset of genes, "differentially expressed genes" (DEGs) [19]. In the present work, we utilized computational methods, aiming to detect common gene expression patterns in UBC, among different subtypes of the tumor. We have previously used a similar approach, through which we identified that CDC20 is a possible gene marker for the disease [20,21]. Here, we expanded our investigation by collecting gene expression data from various UBC subtypes and detected more candidate genes, which were commonly deregulated across them. Importantly, we validated most of these in an independent sample cohort. Our hypothesis is that gene expression should manifest common profiles across urinary bladder cancers, despite their histological differences. Although gene expression profiling has been widely used for tumor sub-classification, our attempt is to use it in order to find common expressional profiles, and therefore, common prognostic and/or therapeutic targets.

Dataset Collection
We mined gene expression data from five distinct datasets all freely available in the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA). All data derived from gene expression DNA microarray experiments were performed on the tissues of patients with bladder cancer (disease samples), as well as on healthy tissues (control samples) (Supplementary Table S1). In particular, the collected datasets were GSE27448 (consisted of 10 UBC and 5 control samples), GSE89 (consisted of 40 pathological samples), GSE3167 (consisted of 41 pathological samples and 19 controls), GSE7476 (consisted of 9 pathological samples and 3 controls) and GSE12630 (consisted of 19 pathological samples). The aforementioned datasets were selected on the basis of tumor type, i.e., UBC samples representing a complete spectrum of subtypes (e.g., Ta, Tis, T1, muscle-invasive, metastatic, grade 1-4). In total, we collected gene expression data from 131 cancer samples and 15 control samples and analyzed them computationally.

Data Pre-Processing
The aforementioned datasets were produced by different microarray platforms, whereas each platform contained a different number of probes ranging from 7000-50,000. In order to create a common gene dataset, present across all microarray platforms, the "annotation tables" of all platforms were used, in which the detailed information of each probe was available. Gene IDs and the corresponding gene symbols were used as the common identifiers for gene selection, as these are the official identifiers listed on the NCBI website. For example, the gene with the official gene symbol TP53 has a gene ID# 7157. This final selection of genes based on the above criteria resulted in 2266 common genes among all experiments.

Differentially Expressed Genes (DEGs)
To evaluate the DEGs between UBC and control samples, a permutation test with 10,000 iterations for each gene was utilized with t-statistic with unequal variance. The estimated p-values were calculated based on the common distribution of all calculated t-scores derived from all permutations for each gene. So, in total we calculated 10,000 × 2226 t-scores, thus the minimum p-value that could be obtained was 1/(10,000 × 2266) = 4.4 × 10 −8 . Applying previous algorithms [25], based on the distribution of p-values, we also estimated the coefficient π 0 = m 0 /m. This gives an approximation of how many genes are truly null, i.e., not differentially expressed. Therefore, approximately 0.4169 × 2266 = 944.69 genes were not differentially expressed. Therefore, if all genes were selected this would result in a false discovery rate (FDR) equal to 41.6%. In addition, the above algorithm derives the q-value for each gene. Based on these, we obtained the corresponding FDR value for all the genes that were considered to be statistically significant (Supplementary Figure S1). Based on the FDR, one can select the q-value (hence the p-value cutoff) and at the same time, examine the number of false positives [25].

Unsupervised Classification Methods
We used unsupervised machine learning classification algorithms, which included hierarchical clustering (HCL) with the unweighted pair group method with arithmetic mean (UPGMA) and k-means clustering. Both methodologies were applied to the set of DEGs, in order to unravel expression patterns, as well as common expressional profiles across all UBC samples. The k-means algorithm was applied with 100 iterations and the optimal cluster number for the k-means algorithm was estimated using the Calinski-Harabasz criterion. We complementarily used the Davies-Bouldin algorithm for detecting the optimal number of clusters [26].

Common Expression Patterns in UBC
The DEGs were examined for taking part in possible common expression patterns, i.e., genes that were either down-or up-regulated across all UBC samples, irrespectively of the tumor diagnosis. The clusters revealed by unsupervised classification were examined separately. Each gene was counted for its occurrences in up-or down-regulation across all samples and the result was divided by the total number of samples, providing the percentage of up-or down-regulated samples for the respective gene. We looked for genes that were co-up or down-regulated, either across all UBC samples (100%), or in 90-99% of them, respectively.

Inter-cohort Validation of the DEGs
We validated the top deregulated genes using expression data from the Cancer Genome Atlas Bladder Cancer (TCGA-BLCA) dataset, composed of 404 urinary bladder cancer samples and 19 normal bladder samples. To increase the number of the normal samples, we also used 9 normal samples from the GTEx project, having a total of 28 normal bladder samples. We used log 2 (TPM + 1) to log-scale the expression values and the oneway ANOVA to assess statistical differences between bladder cancer and normal samples from both the TCGA and GTEx projects. A log 2 FC = 1 and a p = 0.01 were used as cutoff thresholds for statistical significance.

HCL of DEGs
The DEGs were initially clustered using HCL, to detect patterns of expression. Interestingly, we found a cluster of genes exhibiting global down-regulation across all cancer samples. In addition, high grade and metastatic tumors were clustered together, as well as grade 4 tumors clustered with Ta/T1 grade 3 ones. T1/T2 grade 2 and grade 3 cancers, also appeared to manifest common patterns of gene expression ( Figure 1).

K-Means of DEGs
To investigate further the profile of the DEGs, we clustered them using k-means ( Figure 2). We broke-down the globally DEGs and separated them. In particular, the Davies-Bouldin criterion manifested that the optimal numbers of clusters were 2, 3 and 4 ( Figure 2A). Clustering for k = 2 ( Figure 2B) manifested two clusters, where the first included 49 genes and the second 391 genes. The first cluster showed probable candidate genes for global down-regulation, while the second manifested a variety of gene expressional profiles. Therefore, we clustered genes for k = 3 ( Figure 2C) and k = 4 ( Figure 2D). For k = 3, the first cluster remained the same and the next two clusters manifested up-mid, up-and downregulated genes ( Figure 2C). A further separation with k = 4, still manifested the cluster

K-Means of DEGs
To investigate further the profile of the DEGs, we clustered them using k-means ( Figure 2). We broke-down the globally DEGs and separated them. In particular, the Davies-Bouldin criterion manifested that the optimal numbers of clusters were 2, 3 and 4 ( Figure 2A). Clustering for k = 2 ( Figure 2B) manifested two clusters, where the first included 49 genes and the second 391 genes. The first cluster showed probable candidate genes for global down-regulation, while the second manifested a variety of gene expressional profiles. Therefore, we clustered genes for k = 3 ( Figure 2C) and k = 4 ( Figure 2D). For k = 3, the first cluster remained the same and the next two clusters manifested up-mid, up-and down-regulated genes ( Figure 2C). A further separation with k = 4, still manifested the cluster with the possible 49 down-regulated genes, while it revealed a cluster of 278 genes with possible up-regulated genes ( Figure 2D).

Globally Down-Regulated Genes
The previous results gave us a hint that clusters A ( Figure 2D) and B ( Figure 2D) include genes that could be possibly up-and down-regulated, globally. Therefore, these clusters (cluster A with 49 genes and B with 278 genes, Figure 2D) were chosen for further analysis.

K-Means of DEGs
To investigate further the profile of the DEGs, we clustered them using k-means ( Figure 2). We broke-down the globally DEGs and separated them. In particular, the Davies-Bouldin criterion manifested that the optimal numbers of clusters were 2, 3 and 4 ( Figure 2A). Clustering for k = 2 ( Figure 2B) manifested two clusters, where the first included 49 genes and the second 391 genes. The first cluster showed probable candidate genes for global down-regulation, while the second manifested a variety of gene expressional profiles. Therefore, we clustered genes for k = 3 ( Figure 2C) and k = 4 ( Figure 2D). For k = 3, the first cluster remained the same and the next two clusters manifested up-mid, up-and down-regulated genes ( Figure 2C). A further separation with k = 4, still manifested the cluster with the possible 49 down-regulated genes, while it revealed a cluster of 278 genes with possible up-regulated genes ( Figure 2D).

Globally Down-Regulated Genes
The previous results gave us a hint that clusters A ( Figure 2D) and B ( Figure 2D) include genes that could be possibly up-and down-regulated, globally. Therefore, these clusters (cluster A with 49 genes and B with 278 genes, Figure 2D) were chosen for further analysis.

Globally Down-Regulated Genes
The previous results gave us a hint that clusters A ( Figure 2D) and B ( Figure 2D) include genes that could be possibly up-and down-regulated, globally. Therefore, these clusters (cluster A with 49 genes and B with 278 genes, Figure 2D) were chosen for further analysis.
Additionally, we performed a k-means clustering analysis for cluster A ( Figure 2D), which resulted in two sub-clusters, termed "A1" and "A2" (Figure 3A,B). This further classification was attempted in order to further separate genes based on their expressional profile. Indeed, the genes of cluster A were separated in two clusters, one of which manifested genes with lower expression (cluster A2) compared to those in cluster A1 ( Figure 3B). For visualization purposes, we also performed an HCL clustering in all three clusters (Figure 4), resulting in clusters A ( Figure 2D), A1 ( Figure 3B), and A2 ( Figure 3B). In particular, HCL manifested and visually confirmed the down-regulated pattern of k-means clusters A ( Figure 2D), A1, and A2 ( Figure 3B), which are presented in Figure 4A-C. The globally down-regulated genes are also summarized in Table 1.
Additionally, we performed a k-means clustering analysis for cluster A ( Figure 2D), which resulted in two sub-clusters, termed "A1" and "A2" (Figure 3A,B). This further classification was attempted in order to further separate genes based on their expressional profile. Indeed, the genes of cluster A were separated in two clusters, one of which manifested genes with lower expression (cluster A2) compared to those in cluster A1 ( Figure 3B). For visualization purposes, we also performed an HCL clustering in all three clusters (Figure 4), resulting in clusters A ( Figure 2D), A1 ( Figure 3B), and A2 ( Figure 3B). In particular, HCL manifested and visually confirmed the downregulated pattern of k-means clusters A ( Figure 2D), A1, and A2 ( Figure 3B), which are presented in Figure 4A-C. The globally down-regulated genes are also summarized in Table 1.  Figure 2D were further investigated with k-means clustering. The Davies-Bouldin criterion revealed that 2 clusters were optimal for the present dataset (A). Genes were clustered for k = 2 (B), manifesting two sub-clusters, termed A1 (43 genes) and A2 (6 genes); k is the number of clusters.    Figure 2D were further investigated with k-means clustering. The Davies-Bouldin criterion revealed that 2 clusters were optimal for the present dataset (A). Genes were clustered for k = 2 (B), manifesting two sub-clusters, termed A1 (43 genes) and A2 (6 genes); k is the number of clusters.
resulted in two sub-clusters, termed "A1" and "A2" (Figure 3A,B). This further classification was attempted in order to further separate genes based on their expressional profile. Indeed, the genes of cluster A were separated in two clusters, one of which manifested genes with lower expression (cluster A2) compared to those in cluster A1 ( Figure 3B). For visualization purposes, we also performed an HCL clustering in all three clusters (Figure 4), resulting in clusters A ( Figure 2D), A1 ( Figure 3B), and A2 ( Figure 3B). In particular, HCL manifested and visually confirmed the downregulated pattern of k-means clusters A ( Figure 2D), A1, and A2 ( Figure 3B), which are presented in Figure 4A-C. The globally down-regulated genes are also summarized in Table 1.  Figure 2D were further investigated with k-means clustering. The Davies-Bouldin criterion revealed that 2 clusters were optimal for the present dataset (A). Genes were clustered for k = 2 (B), manifesting two sub-clusters, termed A1 (43 genes) and A2 (6 genes); k is the number of clusters.

Functional Annotation of the DEGs
Our analysis manifested globally up-and down-regulated genes that were further analyzed for their functional annotation.

Functional Annotation of Globally Down-Regulated Genes
The down-regulated genes did not manifest any significant gene ontological function; yet they participated in "Pathways in Cancer" and "Basal Cell Carcinoma", along with "signaling pathways regulating pluripotency of stem cells" (Figure 8A). In addition, we queried for drug annotations related to these genes and found that they are related to Vardenafil, Pyridone 6, Manganese, Disulfiram, Cholecystokinin, Tofacitinib, Pranlukast, 3-isobutyl-1methyl-7H-xanthine, Ertugliflozin, and Semaglutide ( Figure 8B). Among the drugs with the highest enrichment ration, Vardenafil and Disulfiram, an erectile dysfunction drug and a proteasome inhibitor, respectively, were of particular interest. Ertugliflozin and Semaglutide are also two drugs of potential interest since both constitute glucose resorption factors.

Functional Annotation of the DEGs
Our analysis manifested globally up-and down-regulated genes that were further analyzed for their functional annotation.

Functional Annotation of Globally Down-Regulated Genes
The down-regulated genes did not manifest any significant gene ontological function; yet they participated in "Pathways in Cancer" and "Basal Cell Carcinoma", along with "signaling pathways regulating pluripotency of stem cells" (Figure 8A). In addition, we queried for drug annotations related to these genes and found that they are related to Vardenafil, Pyridone 6, Manganese, Disulfiram, Cholecystokinin, Tofacitinib, Pranlukast, 3-isobutyl-1-methyl-7H-xanthine, Ertugliflozin, and Semaglutide ( Figure 8B). Among the drugs with the highest enrichment ration, Vardenafil and Disulfiram, an erectile dysfunction drug and a proteasome inhibitor, respectively, were of particular interest. Ertugliflozin and Semaglutide are also two drugs of potential interest since both constitute glucose resorption factors.

Functional Annotation of the Globally Up-Regulated Genes
Similarly, we assessed the functional annotation of the globally up-regulated genes across all UBC samples and found that the DEGs participate in protein binding, in terms of their molecular function. In addition, they are related to the proteasome ( Figure 9A, Table 3). These findings were confirmed by the pathway annotation analysis, where the DEGs manifested participation in the proteasome and cell cycle pathways, but also in base excision repair, DNA replication, N-glycan biosynthesis, and cell cycle and Epstein -Barr virus infection, among others ( Figure 9B). Finally, the investigation of drug annotation showed that DEGs were related to Pyridone 6, 1D-myo-inositol 1,4,5-triphosphate, Tofacitinib, Inositol 1,3,4,5-tetrakisphosphate, S,S-2(-Hydroxyethyl) Thiocysteine, Formic Acid, Liothyronine, Phenethyl Isothiocyanate, Dasatinib and Medical Cannabis ( Figure 9C).

Functional Annotation of the Globally Up-Regulated Genes
Similarly, we assessed the functional annotation of the globally up-regulated genes across all UBC samples and found that the DEGs participate in protein binding, in terms of their molecular function. In addition, they are related to the proteasome ( Figure 9A, Table 3). These findings were confirmed by the pathway annotation analysis, where the DEGs manifested participation in the proteasome and cell cycle pathways, but also in base excision repair, DNA replication, N-glycan biosynthesis, and cell cycle and Epstein -Barr virus infection, among others ( Figure 9B). Finally, the investigation of drug annotation showed that DEGs were related to Pyridone 6, 1D-myo-inositol 1,4,5-triphosphate, Tofacitinib, Inositol 1,3,4,5-tetrakisphosphate, S,S-2(-Hydroxyethyl) Thiocysteine, Formic Acid, Liothyronine, Phenethyl Isothiocyanate, Dasatinib and Medical Cannabis ( Figure 9C).  Table 3. Gene Ontology annotations of the globally up-regulated genes, as depicted in Figure 8A.   Table 3. Gene Ontology annotations of the globally up-regulated genes, as depicted in Figure 8A.

Discussion
In the present work, we investigated for the first time the presence of globally up-and down-regulated genes in UBC. Microarray experiments consist of a series of experimental and analysis steps, among which, the clustering of the data at hand is of major importance. Clustering is one of the most important functions in data analysis [32]. Some of the main tasks that clustering algorithms are called upon to solve, is gaining insight into data, classifying, and compressing them [33]. Clustering approaches aim to extract sets of genes with similar expression profiles across all samples, or in a subset of them [34]. This is mainly accomplished by using 2D matrices of the form Genes × Samples, on which appropriate clustering algorithms are applied. The k-means or bi-clustering techniques are good examples of such algorithms [35][36][37][38][39][40][41]. There are also numerous proposals in the literature regarding the analysis of time-series microarray data. In such cases, a series of cDNA microarray experiments are performed in different time points, and then clustering techniques on a 3D matrix of the form Time × Genes × Samples are applied [34,[42][43][44][45][46][47][48][49][50][51].
In the present study, our main approach was to use different clustering methods. We focused, however, on the k-means algorithm, since it is very efficient for grouping expression patterns. It is also relatively easy to implement, it can be used for large datasets, and has the significant advantage of calculating data trends by estimating the centroids [52][53][54][55]. For small cluster numbers (k values), it converges very quickly to the local optimal, allowing a sufficient number of initializations to be used [56][57][58][59][60]. An important methodological parameter is the number of clusters. It is considered a difficult task since it influences the final outcome, i.e., which genes will be clustered together. A frequent approach is to test several appropriate values of k, in order to find patterns in expression data. In the present work, the Davies-Bouldin criterion helped in this direction, which successfully gave the optimal k clusters [61].
Unlike k-means, hierarchical clustering algorithms do not offer user-defined configuration options. The only option that can be made is to calculate the distance between clusters. In this case, we chose to use the UPGMA method, which defines the distance between two blocks as the average distance between all elements. Therefore, we used UPGMA in combination with k-means clustering. It has been previously shown that HCL groupings based on k-means clusters, perform well [18]. A major disadvantage of the HCL algorithm is that it requires large amounts of memory in order to store the "Dissimilarity Matrix", making it difficult to implement when the number of data is large.
This mode of analysis led us to the identification of 36 globally down-regulated and 30 globally up-regulated DEGs. The co-up-regulated genes could pose a more interesting therapeutic target since gene inhibition is probably easier, compared to the induction of gene expression. The genes YES1 and PMM1 were the ones with the highest percentage of up-regulation among all tumor samples. To our knowledge, there is no known relation between YES1 or PMM1 and UBC. In addition, no known relations were found for the role of IRF3, ARL6IP1, CYTH2, PSMD2, PSMD7, NDUFV2, RALY, ADRM1, RPN1, TYK2, UNG, EBNA1BP2, SNRPB, and VDAC1 in UBC; therefore, these genes could be the topic of future studies.
Among the co-upregulated genes, we found CDC20, which was also previously reported to be deregulated in urinary bladder cancer [20]. The rest genes were not previously identified as co-DEGs in UBC. CDC20 encodes the cell-division cycle protein 20, a regulatory protein that interacts with several other proteins at multiple points in the cell cycle [62]. It has been observed to be highly expressed in high-grade cancers and is associated with a poor prognosis in breast, pancreatic, bladder, and lung cancers [62]. In previous studies, specifically conducted on bladder cancer, CDC20 was presented as a biomarker that associates with poor prognosis [63,64]. Another report showed that in breast cancer, CDC20 limits the activity of the tumor suppressor SMAR1 [62], while there is evidence that the tumor suppressor TP53 inhibits the growth of cancer cells through indirect regulation of CDC20 [65]. In all, CDC20 is often cited in the literature as a "potentially innovative therapeutic target" for cancer treatment [65,66].
The MCM5 (Minichromosome Maintenance Deficient 5) gene, also known as CDC46, encodes the DNA replication licensing factor protein MCM5, which is implicated in initiating DNA replication. The MCM5 protein is up-regulated during the transition from the G 0 phase to the G 1 /S phase of the cell cycle and can be actively involved in its regulation. Previous studies have shown that the MCM5 protein is a very reliable biomarker for the diagnosis and prognosis of bladder cancer. Completely differentiated cells of the healthy epithelium do not express this gene, yet cancer cells that show intense cell division strongly express MCM5, making its protein detectable in urine [67]. Among several FDA-approved diagnostic tests for the early detection of bladder cancer (cystoscopy, urine cytology, ImmunoCyt, and UroVysion), as well as for the detection of the biomarkers NMP22 (Nuclear matrix protein 22), BTA (bladder tumor antigen), MCM5 is a very reliable biomarker, regardless of the cancer type, with >70% sensitivity and >95% Negative Predictive Value (NPV) [67][68][69][70].
Interestingly, LASP1 has been previously reported in the literature as a potential diagnostic marker of UBC [71][72][73]. The above-mentioned reports are in good agreement with our findings.
In another report, the DDOST gene was found to be hypo-methylated and overexpressed in the majority of UBC samples. In particular, DDOST was in the core of the built protein-protein network in these tumors [74]. Although this was a single report on DDOST in UBC, it is also in agreement with our findings.
SMTN1 is another interesting gene that we revealed. This gene encodes the Stathmin protein, which is also known as oncoprotein 18. This is very important for the regulation of the cytoskeleton, which is involved in many cellular processes, such as cytoplasmic organization, cell division and cell division [75]. In a recent report, it was shown that Stathmin is highly expressed in a wide range of cancers [76], including those of the lung, esophagus, breast, cervix, bladder urothelial carcinoma, and glioblastoma. In addition, the same study concluded that targeting this gene reduces cell proliferation, cell motility, and increases the apoptosis of neoplastic tumors. Specifically in UBC, it has been reported that Stathmin was found: (a) to be low or even not-expressed in the healthy urothelium, (b) to correlate between high protein expression and tumor high stage and degree, (c) to be expressed in the majority of metastatic cancers. These studies are in agreement with our report, declaring STMN1 as a potential therapeutic target of great potential [77].
HMGA1 is another interesting up-regulated gene, connected to the tumor's progression and therapy resistance [78][79][80][81][82]. These reports are in accordance with our findings since HMGA1 was also up-regulated in >90% of tumors in our study, indicating that they indeed manifest similar genotypic profiles.
Another interesting report showed that the TXN gene is probably a direct target of luteolin, which inhibits tumor proliferation through the rapamycin pathway [83]. On the other hand, PTMA [84], IGFR2 [85], and YWHAB [86] are reported in the literature as three genes whose down-regulation is related to poor prognosis, cell proliferation enhancement, and anti-apoptosis, and are thus suspected to act as tumor suppressors. This is in contrast to our findings, where these genes were globally up-regulated. Our finding that USP7 is globally up-regulated, is in line with two previous reports, which indicated that USP7 inhibition is a potential therapy for UBC [87,88].
In addition, PCNA was found to be globally up-regulated in UBC. This is one of the best-studied genes in this tumor type. It is a well-known proliferation marker, related to tumor progression and anti-apoptosis; thus, it acts as an oncogene. The majority of studies agree that PCNA over-expression is related to tumor progression and survival [89][90][91][92][93]. In addition, PCNA inhibition was directly connected to UBC therapy, where it has been shown to function as an anti-tumor agent [90,[94][95][96]. These findings are in good agreement with our results, in which PCNA was globally up-regulated in UBC.
The accumulation of various mutations may be due to exogenous or endogenous factors and is closely related to the development of carcinogenesis. There is increasing evidence that APOBEC3B is a broadly mutagenic agent in multiple tumor types [97]. It also shows increased expression, affecting the ontogenesis and progression of various cancers, including those of the neck, breast, lung, head, and bladder. The APOBEC3B gene functions as a cytidine to uridine (C-to-U) editor, and it is heavily implicated in innate and adaptive immunity with important roles in antibody diversification and antiviral response [98]. It also plays an important role in the functions of the immune system, introducing mutations to the viral genomes [99][100][101][102]. This was confirmed by our functional annotation analysis showing that the identified DEGs, manifested antiviral-related functions. It is reasonable to assume that APOBEC3B inhibitors can prevent the accumulation of mutations in certain cancers, making it a promising therapeutic target. APOBEC3B is often reported in the literature with the terms "mutagen" and "tumor mutator" [103][104][105][106][107].
Another interesting gene that our analysis revealed is KRT14. This encodes the protein keratin 14, which is a type I keratin being part of the cell skeleton of epithelial cells. The bladder epithelium, also known as the urothelium, is made up of 3 different cell types. The first layer, or superficial layer, coats the inner surface of the bladder and is made up of "umbrella cells". The next layers consist of the intermediate and the basal layers, which come in contact with the skin. In several studies performed in mice, it has been shown that the cells of the basal layer, expressing KRT14, are responsible for the regeneration of urothelium in the event of injury, and are cells responsible for the oncogenesis in the bladder [108,109]. According to a previous study, KRT14 expression in bladder cancer is strongly associated with poor prognosis for survival, regardless of other clinical variables such as the stage of cancer, age, or sex [110]. Thus, it has been suggested that KRT14 is a possible prognostic biomarker for distinguishing between high and low-risk patients.
Finally, we observed that the globally up-regulated genes were mostly involved in molecular functions related to the proteasome and protein-protein binding, while the annotated pathways, included viral carcinogenesis and proteasomal functions. On the other hand, the globally down-regulated genes were found to participate in different pathways in cancer, as well as basal cell carcinoma pathways.

Conclusions
In the present work, we identified 36 down-regulated and 30 up-regulated genes, across different subtypes of urinary bladder cancer stemming from various microarray datasets. Importantly, we validated most of these DEGs in an independent dataset from the TCGA, where gene expression was assessed using RNAseq. To the best of our knowledge, there is no previous work identifying common gene expression patterns in UBC. Our study identified genes that are in agreement with previous studies regarding their implication in the disease, as well as working as predictors of tumor prognosis, progression, and therapy. The methodology of finding co-deregulated patterns of gene expression across different studies could constitute a basis for the discovery of tumor biomarkers for therapy, prognosis, and diagnosis.
Supplementary Materials: The following are available online at https://www.mdpi.com/2076-341 7/11/4/1785/s1, Figure S1: Calculated False Discovery Rate (FDR) for the common gene dataset (Legend:π 0 = m 0 m , where, m 0 and m are the number of true null hypotheses and the total number hypotheses tested respectively, p-value is the obtained p-value for each gene, q-value is the respective q-value for each gene, λ is the threshold after which the proportion of truly null features equals the number of p-values greater than λ divided by m(1 − λ)). Figure S2: Schematic representation of p-values vs. q-values (A), the q-values vs. the expected number of significant genes (B) and the number of expected significant genes vs. the number of false positives (C). Table S1: Summary of the microarray experiments (data series) used in the present study (Legend: UBC: Urinary Bladder Cancer, the indication GrX implies a UBC of unknown grade). Author Contributions: G.I.L., Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing-original draft preparation, visualization, resources, supervision; K.V., Methodology, software, validation, formal analysis, data curation; D.K., Supervision, project ad-ministration; A.Z., Conceptualization, methodology, validation, investigation, data curation, writingoriginal draft preparation, writing-review and editing, visualization, funding acquisition. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.