Introduction

Histological examination of a carcinoma in transurethral resection specimens, especially from the bladder neck, always raises the question of whether the carcinoma originated in the bladder or in the prostate. The distinction is crucial because it affects further management and prognosis. For advanced bladder urothelial carcinomas, treatment options include neoadjuvant chemotherapy followed by cystectomy1, whereas for advanced prostate adenocarcinomas, treatment options include radiotherapy and androgen deprivation therapy2.

For low-grade carcinomas, bladder urothelial carcinomas and prostate adenocarcinomas can usually be distinguished on morphological features. For high-grade carcinomas, however, conclusive distinction on morphology alone is difficult because the two types share overlapping morphological features. In such cases, immunohistochemistry is performed with a panel of antibodies against proteins that serve as urothelial lineage or prostate lineage markers3. Urothelial lineage markers such as GATA3 and p63, and prostate lineage markers such as prostate-specific antigen (PSA) and prostate-specific acid phosphatase (PSAP), are routinely used, although their sensitivities and specificities vary4,5.

Over the past decades, the joint effort of the National Cancer Institute and the National Human Genome Research Institute has uncovered the genomic profiles of many cancer types through large-scale genome sequencing and integrated multi-dimensional analyses. In particular, the Pan-Cancer analysis project of The Cancer Genome Atlas (TCGA) research network integrates datasets across tumor types and platforms through broad normalization efforts, enabling analyses of commonalities, differences and emergent themes6. Using the publicly available transcriptomic data for bladder urothelial carcinomas and prostate adenocarcinomas, this study first aims to verify that the genes corresponding to the urothelial and prostate lineage markers used in diagnostic immunohistochemistry are indeed significantly expressed in the corresponding carcinomas. Second, it aims to establish the relative importance of the expression of these genes in distinguishing bladder urothelial carcinomas from prostate adenocarcinomas. Finally, a model incorporating the expression of urothelial and prostate lineage genes is constructed to best distinguish between the two types of carcinoma.

Methods

Using the Xena Browser online portal (https://xenabrowser.net/)7, the TCGA Pan-Cancer database was filtered for the primary tumor sites of bladder urothelial carcinoma and prostate adenocarcinoma. Lineage markers of contemporary diagnostic immunohistochemistry were pre-determined: GATA3, uroplakin III, thrombomodulin, p63, CK5/6, S100 calcium-binding protein P (S100P) and uroplakin II for urothelial lineage5, and prostate-specific antigen (PSA), prostate-specific acid phosphatase (PSAP), prostein (P501S), prostate-specific membrane antigen (PSMA), NKX3.1, androgen receptor (AR) and alpha-methylacyl-CoA racemase (AMACR) for prostate lineage4. Expression data for the genes corresponding to these markers were downloaded; cases without gene expression data were excluded. Relevant clinical data were downloaded from the TCGA Prostate Cancer and TCGA Bladder Cancer databases.

Heat maps of these genes were drawn in Xena Browser. Differential expression of these genes between the two groups of carcinomas was analyzed using RNA-seq data in units of log(TPM + 0.001). Graphs were produced in R version 4.0.3 with the ggplot2 and ggpubr packages8,9. Welch's t-test was applied in SPSS version 24.0. To address the multiple testing problem, the significance level α was adjusted by the Bonferroni correction (α corrected = 0.05/14 tests ≈ 0.0036)10.
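As an illustration only (the analysis itself was run in SPSS), the sketch below shows what Welch's t-test computes for one gene and how the Bonferroni-corrected threshold is obtained for 14 tests. It assumes plain Python lists of log-scale expression values; the function names are hypothetical, and the p-value step is omitted because it needs a t-distribution CDF that the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share a variance.
    """
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni_alpha(alpha=0.05, n_tests=14):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
alpha_corrected = bonferroni_alpha()  # 0.05 / 14, about 0.0036
```

A gene is then called significant when its Welch p-value falls below `alpha_corrected` rather than below 0.05.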

The cases were randomly split into a training set (approximately 70%) and a validation set (the remainder) using randomly generated Bernoulli variates with probability parameter 0.7. To determine which gene expressions best distinguish bladder urothelial carcinomas from prostate adenocarcinomas, standard linear discriminant analysis was performed on the training set and then validated on the validation set in SPSS version 24.0.
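The random split can be sketched as follows: each case independently receives a Bernoulli(0.7) draw and goes to the training set on success. This is a minimal Python illustration under stated assumptions, not the SPSS procedure used in the study; the function name `bernoulli_split` and the `seed` argument are hypothetical.

```python
import random


def bernoulli_split(case_ids, p=0.7, seed=None):
    """Assign each case to training with probability p, else to validation."""
    rng = random.Random(seed)  # seeded generator so the split is reproducible
    train, validation = [], []
    for case_id in case_ids:
        # rng.random() is uniform on [0, 1), so "< p" is a Bernoulli(p) draw
        (train if rng.random() < p else validation).append(case_id)
    return train, validation


cases = list(range(902))  # 407 bladder + 495 prostate samples
train_ids, validation_ids = bernoulli_split(cases, p=0.7, seed=1)
```

Because every case is drawn independently, the training fraction is only approximately 70%, which matches the "about 70%" wording above.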

Results

A total of 407 bladder urothelial carcinoma samples and 495 prostate adenocarcinoma samples were included in this study. Relevant clinical data of these bladder and prostate carcinoma samples are summarized in Table 1.

Table 1 Clinical characteristics of bladder urothelial carcinomas and prostate adenocarcinomas.

A heat map was drawn for the expression of genes corresponding to the urothelial lineage markers in both bladder urothelial carcinomas and prostate adenocarcinomas (Fig. 1A). The genes corresponding to GATA3, uroplakin III, thrombomodulin, p63, CK5/6, S100P and uroplakin II are GATA3, UPK3A, THBD, TP63, KRT5, S100P and UPK2, respectively. For CK5/6, only KRT5 gene expression was included. Similarly, a heat map was drawn for the expression of genes corresponding to the prostate lineage markers (Fig. 1B). The genes corresponding to PSA, PSAP, P501S, PSMA, NKX3.1, AR and AMACR are KLK3, ACPP, SLC45A3, FOLH1, NKX3-1, AR and AMACR, respectively.
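The protein-to-gene correspondence above can be held as a simple lookup table when retrieving expression columns. The sketch below is one possible Python encoding, assuming dictionary keys named after the markers as written in this study; the name `MARKER_TO_GENE` and the helper `genes_for` are hypothetical. As in the study, CK5/6 maps to KRT5 only.

```python
# Marker (protein) -> gene symbol, grouped by lineage, as listed in this study.
MARKER_TO_GENE = {
    "urothelial": {
        "GATA3": "GATA3",
        "uroplakin III": "UPK3A",
        "thrombomodulin": "THBD",
        "p63": "TP63",
        "CK5/6": "KRT5",  # only KRT5 is used for CK5/6
        "S100P": "S100P",
        "uroplakin II": "UPK2",
    },
    "prostate": {
        "PSA": "KLK3",
        "PSAP": "ACPP",
        "P501S": "SLC45A3",
        "PSMA": "FOLH1",
        "NKX3.1": "NKX3-1",
        "AR": "AR",
        "AMACR": "AMACR",
    },
}


def genes_for(lineage):
    """Gene symbols to request for one lineage, in the order listed."""
    return list(MARKER_TO_GENE[lineage].values())
```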

Figure 1

(A) Heat map for expressions of genes corresponding to the urothelial lineage markers (prepared using Xena Browser, accessed and analyzed online on 19 September 2020, https://xenabrowser.net/). (B) Heat map for expressions of genes corresponding to the prostate lineage markers (prepared using Xena Browser, accessed and analyzed online on 18 September 2020, https://xenabrowser.net/).

Figure 2 displays boxplots of urothelial and prostate lineage gene expression in bladder urothelial carcinomas and prostate adenocarcinomas. All urothelial lineage genes had significantly higher expression in bladder urothelial carcinomas than in prostate adenocarcinomas, except UPK3A, which was significantly more highly expressed in prostate adenocarcinomas (all p < 0.001). All prostate lineage genes had significantly higher expression in prostate adenocarcinomas than in bladder urothelial carcinomas (all p < 0.001).

Figure 2

Differential gene expressions for urothelial and prostate lineage markers between bladder urothelial carcinomas and prostate adenocarcinomas (prepared using R version 4.0.3, https://cran.r-project.org/).

Standard discriminant analysis was used to test whether a model based on urothelial lineage gene expression, excluding UPK3A, could predict group membership as either bladder urothelial carcinoma or prostate adenocarcinoma. The model was first fitted on the training set and then validated on the validation set. Table 2 shows the hit ratios; the predictive accuracies of the model for the training set and the validation set were 93.1% and 93.6%, respectively. In descending order of importance, the urothelial lineage genes UPK2, S100P, GATA3 and THBD were the most important predictors of bladder urothelial carcinoma, based on discriminant loadings > 0.3 (Tables 3, 4).
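The discriminant analyses were run in SPSS; the sketch below only illustrates what a standard two-group (Fisher) linear discriminant analysis computes and how a hit ratio is obtained. It is plain Python under stated assumptions: equal prior probabilities for the two groups (a simplifying assumption), label 0 for bladder and 1 for prostate, and hypothetical toy expression values and function names (`fit_lda`, `predict`, `hit_ratio`). The hit ratio is the proportion of cases whose predicted group matches the actual group.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean_vector(rows: Sequence[Vector]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_lda(X: Sequence[Vector], y: Sequence[int]) -> Tuple[Vector, float]:
    """Fisher's two-group LDA: weights w and cut-off score (equal priors)."""
    g0 = [x for x, lab in zip(X, y) if lab == 0]
    g1 = [x for x, lab in zip(X, y) if lab == 1]
    m0, m1 = mean_vector(g0), mean_vector(g1)
    p = len(m0)
    sw = [[0.0] * p for _ in range(p)]  # pooled within-group scatter
    for group, m in ((g0, m0), (g1, m1)):
        for x in group:
            d = [xi - mi for xi, mi in zip(x, m)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    dof = len(X) - 2
    sw = [[v / dof for v in row] for row in sw]  # pooled covariance
    w = solve(sw, [b - a for a, b in zip(m0, m1)])  # w = Sw^-1 (m1 - m0)
    midpoint = [(a + b) / 2 for a, b in zip(m0, m1)]
    cut = sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, cut


def predict(X: Sequence[Vector], w: Vector, cut: float) -> List[int]:
    """Label 1 when the discriminant score w.x exceeds the cut-off."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) > cut else 0 for x in X]


def hit_ratio(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-gene toy data: label 0 = bladder, 1 = prostate.
X_train = [[5.0, 1.0], [6.0, 0.5], [5.5, 1.5], [6.5, 1.2],
           [1.0, 6.0], [0.5, 5.5], [1.5, 6.5], [1.2, 5.8]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
w, cut = fit_lda(X_train, y_train)
train_hit_ratio = hit_ratio(y_train, predict(X_train, w, cut))
```

In the study the same idea is applied with several gene-expression predictors, and the fitted function is then scored on the held-out validation set to obtain the second hit ratio.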

Table 2 Hit ratios for the model based on urothelial lineage gene expressions.
Table 3 Eigenvalues, canonical correlation and Wilks' lambda test of discriminant function based on urothelial lineage gene expressions.
Table 4 Summary of interpretive measures for discriminant analysis based on urothelial lineage gene expressions.
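For readers interpreting the eigenvalue, canonical correlation and Wilks' lambda columns of the discriminant tables, the standard relations for a two-group analysis (which yields a single discriminant function, g − 1 = 1) are given below, where λ is the eigenvalue, R_c the canonical correlation, Λ Wilks' lambda, N the number of cases, p the number of predictors and g the number of groups. The chi-square line is Bartlett's approximation commonly used to test Λ; that SPSS reports exactly this form is an assumption here.

```latex
\Lambda = \frac{1}{1+\lambda}, \qquad
R_c = \sqrt{\frac{\lambda}{1+\lambda}}, \qquad
\Lambda = 1 - R_c^{2}

\chi^{2} = -\left[N - 1 - \frac{p + g}{2}\right]\ln \Lambda,
\qquad \mathrm{df} = p\,(g - 1)
```

A small Λ (equivalently, a large eigenvalue or a canonical correlation close to 1) indicates that the discriminant function separates the two carcinoma groups well.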

Similarly, standard discriminant analysis was performed on prostate lineage gene expression to test whether the model could predict group membership as either bladder urothelial carcinoma or prostate adenocarcinoma. Table 5 shows the hit ratios; the predictive accuracies of the model for the training set and the validation set were 99.8% and 100.0%, respectively. In descending order of importance, the prostate lineage genes NKX3-1, KLK3, ACPP, SLC45A3 and FOLH1 were the most important predictors of prostate adenocarcinoma, based on discriminant loadings > 0.3 (Tables 6, 7).
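The ranking by discriminant loading can be sketched as follows. A loading (structure coefficient) is taken here as the pooled within-groups correlation between a predictor and the discriminant scores, which is how SPSS describes its structure matrix; treat that correspondence, the function names `pooled_within_corr` and `structure_loadings`, and the toy values as assumptions. Predictors with |loading| > 0.3 are kept and sorted by absolute size.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pooled_within_corr(x: Sequence[float], s: Sequence[float],
                       y: Sequence[int]) -> float:
    """Correlation of x with scores s after centring both within each group."""
    sxy = sxx = sss = 0.0
    for g in set(y):
        idx = [i for i, lab in enumerate(y) if lab == g]
        mx = sum(x[i] for i in idx) / len(idx)  # group mean of the predictor
        ms = sum(s[i] for i in idx) / len(idx)  # group mean of the scores
        for i in idx:
            dx, ds = x[i] - mx, s[i] - ms
            sxy += dx * ds
            sxx += dx * dx
            sss += ds * ds
    return sxy / math.sqrt(sxx * sss)


def structure_loadings(columns: Dict[str, Sequence[float]],
                       scores: Sequence[float], y: Sequence[int],
                       cutoff: float = 0.3) -> List[Tuple[str, float]]:
    """Loadings above the cut-off, in descending order of absolute value."""
    loadings = [(name, pooled_within_corr(col, scores, y))
                for name, col in columns.items()]
    kept = [(name, r) for name, r in loadings if abs(r) > cutoff]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: gene A tracks the scores within groups, gene B does not.
labels = [0, 0, 0, 1, 1, 1]
scores = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
genes = {"A": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0], "B": [5.0, 4.0, 5.0, 2.0, 1.0, 2.0]}
ranked = structure_loadings(genes, scores, labels)
```

Gene B differs strongly between the groups yet has a within-group loading of zero, which shows that the loading measures association with the discriminant function after removing group differences.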

Table 5 Hit ratios for the model based on prostate lineage gene expressions.
Table 6 Eigenvalues, canonical correlation and Wilks' lambda test of discriminant function based on prostate lineage gene expressions.
Table 7 Summary of interpretive measures for discriminant analysis based on prostate lineage gene expressions.

Standard discriminant analysis was performed on the two most important urothelial lineage genes and the two most important prostate lineage genes to test whether the model could predict group membership as either bladder urothelial carcinoma or prostate adenocarcinoma. Table 8 shows the hit ratios; the predictive accuracies of the model for the training set and the validation set were 99.8% and 100.0%, respectively. The prostate lineage genes NKX3-1 and KLK3 appeared to be more important predictors than the urothelial lineage genes UPK2 and S100P (Tables 9, 10).

Table 8 Hit ratios for the model based on most important urothelial and prostate lineage gene expressions.
Table 9 Eigenvalues, canonical correlation and Wilks' lambda test of discriminant function based on urothelial and prostate lineage gene expressions.
Table 10 Summary of interpretive measures for discriminant analysis based on urothelial and prostate lineage gene expressions.

Discussion

To distinguish urothelial carcinomas from prostate adenocarcinomas, many studies have used immunohistochemistry to investigate several lineage markers. GATA3, uroplakin III, thrombomodulin, S100P and uroplakin II are commonly recommended as urothelial lineage markers5. In addition, urothelium expresses squamous cell-associated markers such as CK5/6 and p63, whose expression helps distinguish urothelial carcinomas from adenocarcinomas5. This study showed that the genes corresponding to these urothelial lineage markers, with the exception of UPK3A, were indeed significantly more highly expressed in urothelial carcinomas than in prostate adenocarcinomas. Surprisingly, the gene for uroplakin III, UPK3A, was more highly expressed in prostate adenocarcinomas than in urothelial carcinomas. In contrast, many immunohistochemical studies have observed no uroplakin III expression in prostate adenocarcinomas11,12,13,14, yielding a specificity of 100% for determining bladder origin. This discrepancy between UPK3A transcripts and uroplakin III protein expression in the prostate has been documented previously15. The presence of UPK3A transcripts in the absence of uroplakin III protein is likely related to interactions between UPK1B gene expression and translation of UPK3A transcripts15.

The standard discriminant analysis in this study demonstrated that, in descending order of importance for the urothelial lineage markers, UPK2, S100P, GATA3 and THBD were the most important gene-expression predictors of urothelial carcinoma. These results corroborate immunohistochemical studies of these urothelial lineage markers12,14,16,17. Among these, GATA3 has been widely studied as a urothelial lineage marker, with sensitivities ranging from 67% to 100% across studies16. Although most studies reported 0% staining in prostate adenocarcinomas, GATA3 generally lacks specificity because a variety of other tumors express this protein, especially breast carcinomas, cutaneous basal cell carcinomas, and trophoblastic and endodermal sinus tumors18. The protein corresponding to UPK2, uroplakin II, is a relatively new urothelial lineage marker. The reported sensitivities and specificities of uroplakin II for differentiating urothelial carcinomas from prostate adenocarcinomas were 66–78% and 95–100%, respectively12,19,20,21. For S100P, the sensitivities and specificities were 71–100% and > 95%, respectively, in studies that used antibody clone 1616. Thrombomodulin has been used as a urothelial lineage marker, with sensitivities of 46–81% and specificities of 95–100% for differentiating urothelial carcinomas from prostate adenocarcinomas16,17. Thrombomodulin also stains a small number of carcinomas of the lung, breast, ovary and pancreas14.
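The sensitivities and specificities quoted in this discussion follow the usual definitions: for a urothelial marker, sensitivity is the fraction of urothelial carcinomas that stain, and specificity is the fraction of prostate adenocarcinomas that do not. The short Python sketch below makes that arithmetic explicit; the function name and the counts are hypothetical and are not taken from any cited study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # positives detected among target-lineage tumors
    specificity = tn / (tn + fp)  # negatives among tumors of the other lineage
    return sensitivity, specificity


# Hypothetical counts: 100 urothelial carcinomas (78 stain), 100 prostate
# adenocarcinomas (5 stain).
sens, spec = sensitivity_specificity(tp=78, fn=22, tn=95, fp=5)
```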

On the other hand, the recommended prostate lineage markers are PSA, PSAP, P501S, PSMA, NKX3.1, AR and AMACR4. This study confirms that the genes corresponding to these prostate lineage markers were indeed significantly more highly expressed in prostate adenocarcinomas than in urothelial carcinomas. The standard discriminant analysis in this study demonstrated that many of the prostate lineage marker genes were important predictors of prostate adenocarcinoma, i.e. NKX3-1, KLK3, ACPP, SLC45A3 and FOLH1, corresponding to NKX3.1, PSA, PSAP, P501S and PSMA, respectively. Among these, PSA is a sensitive and specific marker of prostatic lineage, with sensitivities and specificities of 85–100% and 88–100%, respectively, for differentiating prostate adenocarcinomas from urothelial carcinomas17. PSAP is another conventional prostate lineage marker, with high sensitivities and specificities of 92–95% and 81–100%, respectively17. PSMA has a similar range of sensitivities (87–100%) and specificities (83–100%) as a prostate lineage marker3,17,22. However, PSMA is also expressed in a few other tumors, such as squamous cell carcinomas and adenocarcinomas of the stomach, colon and pancreas22. NKX3.1 and P501S are relatively newer prostate lineage markers. The sensitivities and specificities were 69–100% and 99–100% for NKX3.1, and 94–100% and 99–100% for P501S, respectively3,17,23. NKX3.1 is especially useful because it is expressed in many PSA-negative prostate adenocarcinomas24.

This study showed that by combining the four lineage markers with the highest discriminant loadings, i.e. UPK2 and S100P for urothelial lineage and NKX3-1 and KLK3 for prostate lineage, classification accuracy approached 100% in both the training set and the validation set. Importantly, the prostate lineage genes took precedence over the urothelial lineage genes as major predictors. The combination of NKX3.1, PSA, uroplakin II and S100P is therefore proposed as the preferred immunohistochemical panel for resolving the dilemma of distinguishing bladder urothelial carcinomas from prostate adenocarcinomas. This is in line with the recommendation of the International Society of Urological Pathology that a combination of both lineage markers should be applied in such scenarios, with greater weight given to prostate lineage markers4.

A few limitations of this study are acknowledged. Although the findings of this study generally support the results of previous studies, this study used gene expression data from tumor tissue rather than visual evaluation of lineage marker expression on tumor cells by immunohistochemistry. Thus, discrepancies between transcript and protein expression may arise, because quantification of transcripts depends on the tumor cellularity of the tissue. Furthermore, 5.2% of the bladder urothelial carcinomas in this study were low grade and 9.1% of the prostate adenocarcinomas had a Gleason score of six. The inclusion of these low-grade carcinomas, as retrieved from the public databases, differs from studies focusing on high-grade carcinomas. Nevertheless, the findings of this study are expected to remain valid, because total loss of expression of all lineage markers in high-grade carcinomas is a rare event. Although this study provides a combination of four lineage gene expressions as an algorithm for distinguishing bladder urothelial carcinomas from prostate adenocarcinomas, its translation to immunohistochemistry in routine diagnostic practice requires future validation.

Conclusions

By mining TCGA expression data for urothelial and prostate lineage markers, this study establishes that, in descending order of importance, the genes for uroplakin II, S100P, GATA3 and thrombomodulin are the most important urothelial lineage markers for distinguishing bladder urothelial carcinoma from prostate adenocarcinoma. In descending order of importance, the genes for NKX3.1, PSA, PSAP, P501S and PSMA are the most important prostate lineage markers. Classification of a carcinoma as either bladder urothelial carcinoma or prostate adenocarcinoma approaches 100% accuracy (99.8% in the training set and 100.0% in the validation set) with a combination of the gene expressions of uroplakin II, S100P, NKX3.1 and PSA. This combination is proposed for clinical diagnostic immunohistochemistry to resolve the dilemma of assigning the origin of a carcinoma as either bladder or prostate.