Transcriptional landscape of PTEN loss in primary prostate cancer

PTEN is the most frequently lost tumor suppressor in primary prostate cancer (PCa) and its loss is associated with aggressive disease. However, the transcriptional changes associated with PTEN loss in PCa have not been described in detail. In this study, we highlight the transcriptional changes associated with PTEN loss in PCa. Using a meta-analysis approach, we leveraged two large PCa cohorts with experimentally validated PTEN and ERG status by Immunohistochemistry (IHC), to derive a transcriptomic signature of PTEN loss, while also accounting for potential confounders due to ERG rearrangements. This signature was expanded to lncRNAs using the TCGA quantifications from the FC-R2 expression atlas. The signatures indicate a strong activation of both innate and adaptive immune systems upon PTEN loss, as well as an expected activation of cell-cycle genes. Moreover, we made use of our recently developed FC-R2 expression atlas to expand this signature to include many non-coding RNAs recently annotated by the FANTOM consortium. Highlighting potential novel lncRNAs associated with PTEN loss and PCa progression. We created a PCa specific signature of the transcriptional landscape of PTEN loss that comprises both the coding and an extensive non-coding counterpart, highlighting potential new players in PCa progression. We also show that contrary to what is observed in other cancers, PTEN loss in PCa leads to increased activation of the immune system. These findings can help the development of new biomarkers and help guide therapy choices.


Background
Previous molecular studies have explored the genomic heterogeneity of prostate cancer (PCa) revealing distinct molecular subsets characterized by common genome alterations [1][2][3]. Among these molecular alterations, loss of the tumor suppressor gene phosphatase and tensin homolog (PTEN)which is implicated in the negative-regulation of the PI3K-AKT-mTOR pathwayhas been identified as one of the most common genomic drivers of primary PCa [4,5]. Since alterations in the PI3K pathway are present in more than 30% of human cancers, the identification of an expression signature associated with PTEN loss has been investigated in different tumor contexts, including breast, bladder, lung, and PCa [6,7]. Assessment of PTEN status by fluorescence in situ hybridization and immunohistochemistry (IHC) in large clinical PCa cohorts have shown a consistent association with adverse pathological features such as high Gleason score, extra-prostatic extension, as well as prognostic value for biochemical recurrence and cancer-related death [4,8]. IHC-based assessment of PTEN status has been shown to correlate tightly with genomic alterations of the PTEN locus and captures not only loss of the gene, but also mutation and epigenetic changes that lead to PTEN functional inactivation [4,9,10] and the potential clinical utility of PTEN IHC as a valuable prognostic marker has been demonstrated previously [11][12][13][14].
Though PTEN is involved in a myriad of cellular processes spanning cellular proliferation to tumor microenvironment interactions [5], the transcriptional landscape related to PTEN expression has not yet been explored in-depth, and the role of long non-coding RNAs (lncRNAs) remains elusive [15]. These observations, added to the evidence that subtle PTEN downregulation can lead to cancer susceptibility [16], demonstrate the important role of PTEN in cancer biology but also highlight the need for additional studies.
Similarly, gene rearrangements of the ETS transcription factor, ERG, with the androgen-regulated gene Transmembrane Serine Protease 2 (TMPRSS2) are present in~50% of PCa from patients of European descent. TMPRSS2-ERG fusion (herein denoted as ERG + for fusion present and ERG − for the absence of fusion) has been shown to activate the PI3K-kinase pathway similarly to PTEN loss [17], leading to increased proliferation and invasion. Importantly, tumors harboring TMPRSS2-ERG rearrangements show enrichment for PTEN loss [17,18]. The co-occurrence of these two genomic alterations makes it challenging to dissect the contributions of each to the transcriptomic landscape.
The goal of this study was to elucidate the transcriptional landscape of PTEN loss in PCa through the analysis of two large and very well clinically curated cohorts, for which PTEN and ERG status was assessed by clinicalgrade IHC: The Natural History (NH) cohort, in which patients that underwent radical prostatectomy for clinically localized PCa did not receive neoadjuvant therapy or adjuvant hormonal therapy prior to documented distant metastases [19]; and the Health Professionals Follow-up Study (HPFS) cohort in which the patients were followed for over 25 years [20]. Based on IHC-assessed PTEN status for these cohorts, we built a PTEN-loss signature highly concordant across the independent datasets, in both presence and absence of TMPRSS2-ERG fusion. Overall, this PTEN-loss signature was associated with cellular processes associated with aggressive tumor behavior (e.g., increased motility and proliferation) and, surprisingly, with increases in gene sets related to the immune response. In addition, through our recently developed FANTOM-CAT/recount2 (FC-R2) resource [21] and copy-number-variation data, we expanded this signature beyond coding genes and report the non-coding RNA repertory resulting from PTEN loss.

Data collection and immunostaining
All expression data used in this work were gathered from public domain databases. In this work, we made use of three cohorts: FC-R2 TCGA, Natural History (NH), and Health Professionals Follow-up Study (HPFS). Information about each cohort is summarized in Table 1. Information about PTEN status by immunohistochemistry for the HPFS cohort was readily available and therefore obtained from the public domain. For NH cohort samples, IHC staining for PTEN and ERG was performed using a previously validated protocol [22]. Last, for TCGA we used the Copy Number Variation (CNV) called by the GISTIC algorithm to define PTEN status and the expectation-maximization algorithm to define ERG status.

Meta-analysis of NH and HPFS cohorts
Normalized microarray expression sets for the Natural History and HPFS cohorts were obtained from the Gene Expression Omnibus (GEO) [23]. We performed a metaanalysis approach using a Bayesian hierarchical multilevel model (BHM) for cross-study detection of differential gene expression implemented in the Bioconductor package XDE [24] on microarray-based cohorts to obtain a PTEN-null signature from PTEN IHC validated samples. The model was fitted using the Δgp model with empirical starting values and 1000 bootstraps were performed. All remaining parameters were set to default values. This analysis was also performed stratifying the samples by ERG status to evaluate the impact of the ERG rearrangement in the signature.

Differential expression analysis in the TCGA cohort
Raw coverage was obtained from the FC-R2 expression atlas [21] and divided by the average read length to obtain read counts. Only primary tumor samples with a PTEN GISTIC score of − 2 and 0 were used in this analysis. Low count genes (< 5 CPM) were filtered and the remaining genes were normalized with the trimmed  [25]. A generalized linear model approach coupled with empirical Bayes moderation of standard errors and voom precision weights [26,27] was used to detect differentially expressed genes in the TCGA cohort. Adjusted p-values controlling for multiple hypothesis testing were performed using the Benjamini-Hochberg method [28] and genes with false discovery rate (FDR) equal or less than 0.01 were reported.

Gene set enrichment analysis (GSEA)
The results from the meta-analysis performed in the NH and HPFS cohort were ranked by the weighted size effect (average of the posterior probability of concordant differential expression multiplied by the Bayesian effect size of each cohort). The results from the TCGA cohort were ranked by t-statistics. Ranked lists were tested for gene set enrichment. Gene set enrichment analysis (GSEA) was performed using a Monte Carlo adaptive multilevel splitting approach, implemented in the fgsea [29] package. A collection of gene sets (Hallmarks, REACTOME, and GO Biological Processes) were obtained from the Broad Institute MSigDB database. The androgen response gene set was obtained from Scheaffer et al. [30]. Gene sets with less than 15 and more than 1500 genes were removed from the analysis, except for the GO biological processes whose max size was set to 300 to avoid overly generic gene sets. The enriched pathways were collapsed to maintain only independent ones using the function collapsePathways from fgsea.

Meta-analysis of natural history and health professionals follow-up study cohorts
We sought to obtain a consensus signature of PTEN loss that could be reproduced across independent cohorts.
We utilized a meta-analysis approach leveraging a multilevel model for cross-study detection of differential gene expression (DGE). We fitted a Bayesian hierarchical model (BHM) for analysis of differential expression across multiple studies that allowed us to aggregate data from two previously described tissue microarray-based cohorts where PTEN and ERG status was determined by IHC (Table 1 and Fig. 1) and we derived a PTEN-loss signature (Fig. 2). In this analysis, we observed 813 genes for which the differential expression was highly concordant (Bayesian Effect Size (BES) ≥ 1, posterior probability of concordant differential expression (PPCDE) ≥ 0.95) ( Table S1).
The consequences of PTEN loss on cell cycle regulation and tumor cell invasion have been extensively reported previously [4,31,32]. Accordingly, beyond PTEN itself, the top DEG genes in our signature reflected this profile ( Fig. 2 and Table S1). Dermatopontin (DPT) (BES = − 2.59, PPCDE = 1) and Alanyl membrane aminopeptidase (ANPEP) (BES = − 2.53, PPCDE = 1) were found downregulated upon PTEN loss. Leucine-Rich Repeat Neuronal 1 (LRRN1) was among the genes up-regulated upon PTEN loss (BES = 3.36, PPCDE = 1). These and other genes found differentially expressed upon PTEN loss have all been shown to be associated with a more aggressive phenotype in several cancer types [5].
Notably, we found ERG among the top upregulated genes in the signature (Fig. 2). As expected [18,33,34], ERG rearrangement was more common among cases with PTEN loss compared to intact PTEN in all cohorts (Fisher exact test, p ≤ 0.001). Given this enrichment, it was not surprising that ERG was among the most upregulated genes in the BHM signature, as well as PLA2G7, which is among the most highly overexpressed genes in ERG-rearranged PCa compared to those lacking ERG rearrangements [35]. The presence of ERG and ERG-regulated transcripts in the PTEN-loss signature suggested that this signature might be confounded by enrichment of ERG rearranged tumors among the tumors with PTEN loss.
Since ERG rearrangements represent a major driver event in PCa and PTEN loss is enriched in ERG-rearranged tumors, we next investigated the role of ERG in our PTEN-loss signature. To this end, we repeated the Bayesian hierarchical model for the analysis of differential expression by stratifying the samples by ERG status. In the background with ERG rearrangement, we observed a similar signature to the previous overall PTEN-loss signature, but without the aforementioned ERG-associated genes (Supplementary figure S1 and Supplementary Table S2).

Extending the PTEN-loss signature
To validate our PTEN loss signatures in an independent cohort, we next examined the TCGA PRAD cohort [36], where PTEN status was estimated by genomic copy number (CN) assessment, which was closely aligned with PTEN gene expression ( Figure S2). We recently developed a comprehensive expression atlas based on the FANTOM-CAT annotations. This meta-assembly is currently the broadest collection of the human transcriptome [21,37]. These gene models include many novel lncRNAs, such as enhancers and promoters, that were annotated by the FANTOM consortium based on transcriptomic and epigenomic data, allowing the signature to be further expanded beyond the coding repertoire. We used TCGA expression data from the FC-R2 expression atlas [21] to perform DGE analysis stratified by the PTEN status as derived from CN analysis. We also performed the same analysis in a stratified manner as in the HPFS and NH cohorts, using the ERG expression with expectation maximization (EM) algorithm to define ERG status given the bimodal nature of ERG expression in PCa.
We observed 521 differentially expressed genes (DEG) when comparing PTEN-null and PTEN-wild-type samples (FDR ≤ 0.01, LogFC ≥1), of which 257 were coding genes and 264 were non-coding genes (Supplementary Table S3). When stratifying the samples by ERG status, we obtained 435 and 364 DEG in the background with and without ERG rearrangement (Supplementary Tables  S4 and S5), respectively, with similar proportions of coding and non-coding genes. Using hypergeometric confidence intervals, we evaluated the concordance between the TCGA and the meta-analysis signatures. The results were found to be significantly concordant ( Figure S3), confirming that CN is a reasonable proxy to IHCstaining in TCGA. Despite differences in technology, PTEN-status call, and statistical analysis, the high concordance between the signatures suggests that they are robust and reproducible, which allowed us to expand this signature to genes that are not encompassed in microarray, especially long non-coding RNAs.
Therefore, in this analysis, we were able to identify a variety of differentially expressed lncRNAs that have been already reported to be involved in PCa development and progression such as PCA3, PCGEM1, SCHL AP1, KRTAP5-AS1, Mir-596 [38][39][40][41][42][43][44][45][46][47] (Supplementary  Tables S3, S4 and S5). PCA3 is a prostate-specific lncRNA overexpressed in PCa tissue. Similarly, lncRNA PCGEM1 expression is increased and highly specific in PCa where it promotes cell growth and it has been associated with high-risk PCa patients [42,43]. On the other hand, to the best of our knowledge, KRTAP5-AS1 expression has not been previously associated with PCa.
In addition, among highly ranked differentially expressed lncRNAs were the lncRNAs SChLAP1 and its uncharacterized antisense neighbor AC009478.1. SchLAP1 is overexpressed in a subset of PCa where it antagonizes the tumor-suppressive function of the SWI/ SNF complex and can independently predict poor outcomes [46,47]. Besides, we observed a strong correlation between SchLAP1 and AC009478.1 expression in TCGA data only for PCa and bladder cancer (R = 0.94 and 0.85, respectively, with p < 2.2e-26, Figure S4), suggesting a possible, still unknown role also for this latter lncRNA in such tumor types.
A substantial proportion of the 264 lncRNAs differentially expressed upon PTEN-loss have not been previously reported in PCa, and 134 were only annotated in the FANTOM-CAT meta-assembly ( Table 2). The FAN-TOM consortium has recently characterized hundreds of lncRNAs via molecular phenotyping [48], however, none of those associated with PTEN-loss was included in their study, and therefore they still lack an assigned function. Interestingly, it was shown that the expression levels of genes in the same topological domain are highly correlated only in tissue types in which these genes play a functional role [48]. For this reason, we characterized our novel PTEN-loss lncRNAs by analyzing the expression correlation with nearby genes across all cancer types in TCGA.
Among the FANTOM-CAT exclusive genes with the highest fold change in close proximity with coding genes, CATG00000038715 and CATG00000079217 were downregulated, while CATG00000117664 was up-regulated (Figure S5). Notably, such genes were mostly expressed in PCa as opposed to other cancer types in TCGA (Fig. 3). CATG00000038715 is near CYP4F2 and CYP4F11, encoding members of the cytochrome P450 enzyme superfamily, and the expression levels of CATG00000038715 and CYP4F2 are most highly correlated in PCa (R = 0.91, p < 2.2e-16) suggesting specificity for this cancer type ( Figure  S6). CATG00000079217 is close to the coding gene FBXL7, an F-box gene that is a component of the E3 ubiquitin ligase complex. These genes showed only a weak correlation (R = 0.14, p < 7.4e-4), however, CATG00000079217 expression was notably higher in PCa and breast cancer than in other tumors, and it was moderately correlated with several PCa biomarkers (e.g. KLK2, KLK3, STEAP2, PCGEM1, SLC45A3) [42,43,[49][50][51][52][53] (R = 0.37-0.57, p < 2.2e-16) in TCGA. Finally, CATG00000117664 is located near GPR158, a G protein-coupled receptor highly expressed in the brain. The expression between GPR158 and CATG00000117664 was significantly correlated (R = 0.54, p < 2.2e-16), and highly specific to PCa [54] (Figure S6).

PTEN loss induces the innate and adaptive immune system
We performed Gene Set Enrichment Analysis (GSEA) using fgsea [29] and tested both the BHM-and TCGA-generated molecular signatures for enrichment in three collections of the Molecular Signature Database (MSigDB) [55,56]: HALLMARKS, REACTOME, and GO Biological Processes. Results were similar in both signatures, with positive enrichment of proliferation and cell cycle-related gene sets (e.g. MYC1 targets, MTORC1 signaling, cell cycle checkpoints, and DNA repair) and both innate and adaptive immune system associated gene sets (e.g. Neutrophil degranulation, MHC antigen presentation, interferon-alpha, and gamma) (Figs. 4 and 5 and Supplementary Tables S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S16, S17, S18, S19 and S20). The positive enrichment of MHC antigen presentation, interferon-alpha, and -gamma in PTEN-null tumors is consistent with our previous study showing that the absolute density of T-cells is increased in PCa with PTEN loss [57].
Since PTEN-null tumors are known to have decreased androgen output, which is a strong suppressor of inflammatory immune cells [30,58,59], we hypothesized that this decrease in androgen levels could activate an immune response. We, therefore, performed a GSEA analysis using a collection of androgen-regulated genes from Schaeffer et al. [30] to test if the PTEN-null signature was enriched in this gene set. Both the TCGA-and BHM-signature were shown to be positively enriched in genes that were shown to be repressed upon dihydrotestosterone treatment (NES =1.39-155, FDR ≤ 0.05) ( Figure S7).

Discussion
With an estimated prevalence of up to 50%, PTEN loss is recognized as one of the major driving events in PCa [60]. PTEN antagonizes PI3K-AKT/PKB and is a key Table 2 Summary of differentially expressed genes between PTEN-null and PTEN-intact in the TCGA cohort. Number of differentially expressed genes with logFC greater than 1 and FDR lesser than 0.01 across different ERG backgrounds. The number in parenthesis shows the number of genes exclusive to the FANTOM-CAT annotations

PTEN-null vs PTEN-intact overall PTEN-null vs PTEN-intact in ERG-fusion PTEN-null vs PTEN-intact in ERG-intact
Coding genes 257 (13) 226 (7) 185 ( modulator of the AKT-mTOR signaling pathways which are important in regulating cell growth and proliferation. Accordingly, PTEN loss is consistently associated with more aggressive disease features and poor outcomes. Saal and collaborators previously generated a transcriptomic signature of PTEN loss in breast cancer [6]. While this signature was correlated with worse patient outcomes in breast and other independent cancer datasets, including PCa, the signature unsurprisingly fails to capture key characteristics of PCa such as ERG-rearrangement [6,11]. Significantly, a comprehensive transcriptomic signature reflecting the landscape of PTEN loss in PCa has not been described to date. Immunohistochemistry assay is a clinically utilized technique to determine the status of the PTEN gene, with high sensitivity and specificity for protein levels, which are reduced when genomic deletions occur [22] (Fig. 1). Therefore, we analyzed transcriptome data from two large PCa cohortsthe Health Professional Followup Study and the Natural History studyfor which IHC-based PTEN and ERG status was available (n = 390 and 207, respectively), deriving a PTEN-loss gene expression signature specific to PCa (Fig. 2 and Supplementary Table S1). Genes that are associated with increased proliferation and invasion in several cancer types, such as DPT, ANPEP, and LRRN1, were among the most concordant DEG in this signature. DPT has been shown to inhibit cell proliferation through MYC repression and to be down-regulated in both oral and thyroid cancer [61,62]. It has also been shown to control cell adhesion and invasiveness, with low expression leading to the worst prognosis [62,63]. ANPEP is known to play an important role in cell motility, invasion, and metastasis progression [63,64], and lower expression of this gene has been associated with the worst prognosis [65]. LRRN1 is a direct transcriptional target of MYCN and an enhancer of the EGFR and IGRF signaling pathway [66]. Higher levels of LRRN1 expression promote tumor cell proliferation, inhibiting cell apoptosis, and play an important role in preserving pluripotencyrelated proteins through AKT phosphorylation [66][67][68], leading to a poor clinical outcome in gastric and brain cancer.
Notably, ERG was shown to be upregulated in our signature, which led us to perform a stratified analysis to avoid capturing signals driven mostly by ERG overexpression. In this latter analysis, we were not able to detect significant differences by PTEN status in the ERG − samples for the HPFS and NH cohorts, which were quantified by gene expression microarrays. Conversely, when analyzing the TCGA cohort, we were able to detect significant changes by PTEN status in the ERG − samples (Supplementary Tables S3, S4 and S5). However, given the known limitations of gene expression microarrays performed on formalin-fixed material, such as the limited dynamic range of expression values [69], observations in the HPFS and NH datasets could have been limited by the technology employed. Nevertheless, concordance between the BHM-and TCGA-cohorts were similar in both the overall and the ERG + background comparisons (Supplementary Figure S3).
When we expanded our analyses to the non-coding transcriptome using the TCGA cohort, we identified several lncRNAs that have been already associated with PCa progression. For instance, among these lncRNAs, were PCA3 and KRTAP5-AS1. PCA3 acts by a variety of mechanisms such as down-regulation of the oncogene PRUNE2 and by acting as a miRNA sponge for mir-1261, which down-regulate the PRKD3 gene, leading to increase proliferation and migration [38,39]. Conversely, knockdown of PCA3 can lead to partial reversion of epithelial-mesenchymal transition (EMT) [40] which can lead to increased cell invasion, motility, and survival [41]. Although KRTAP5-AS1 has not been associated with PCa, it has recently shown that it can act as a miRNA sponge for miRNAs such as mir-596, which targets CLDN4, an oncogene enhancing the invasion capacity of cancer cells promoting EMT [41,44]. Thereby overexpression of KRTAP5-AS1 can potentially lead to increased levels of CLDN4 [45]. Mir-596 has also been shown to be overexpressed in response to androgen signaling and associated with anti-androgen therapy resistance [45].
Moreover, many lncRNAs exclusively annotated in the FANTOM-CAT [37] were associated with PTEN-loss and were shown to be expressed mostly in PCa (Fig. 3). Since these genes are novel without an elucidated function, we analyzed their potential roles by investigating coding genes located in the same genomic loci, under the premise of "guilty-by-association". The genes encoding for CYP4F2, FBXL7, and GPR158, respectively, are positioned in the same loci as 3 of the top DE lncRNAs only known in the FANTOM-CAT (CATG00000038715, CATG00000079217, and CATG00000117664, Figure  S5). CYP4F2 is involved in the process of inactivating and degrading leukotriene B4 (LTB4). LTB4 is a key gene in the inflammatory response that is produced in leukocytes in response to inflammatory mediators and can induce the adhesion and activation of leukocytes on the endothelium [70]. FBXL7 regulates mitotic arrest by degradation of AURKA, which is known to promote inflammatory response and activation of NF-κB [71,72].
Likewise, increase expression of GPR158 is reported to stimulate cell proliferation in PCa cell lines, and it is linked to neuroendocrine differentiation [73].
We consistently observed a strong enrichment in immune response genes and gene sets upon PTEN loss ( Fig. 4 and Supplementary Tables S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S16, S17, S18, S19 and S20). Immune-associated genes (i.e. GP2 and PLA2G2A) were found amongst the top up-regulated genes in our signature (Fig. 2). Positive enrichment of Interferonalpha-and gamma-response genes (FDR ≤ 0.01) further suggests that a strong immuno-responsive environment, with both innate and adaptive systems activated, is developed in PTEN-null tumors (Fig. 5). The positive enrichment of MHC class II antigen presentation, neutrophil degranulation, vesicle-mediated transport, and FC receptor pathway-related genes suggests that PTEN-null tumors may be immunogenic (Fig. 4). This finding was particularly surprising given that PTEN is itself a key positive regulator of the innate immune response, controlling the import of IRF3, which is responsible for IFN production. Accordingly, disruption of PTEN expression has previously been reported to lead to decreased innate immune response [74]. Conversely, it has also been hypothesized that the increased genomic instability caused by, or associated with, PTEN loss can increase immunogenicity in the tumor microenvironment (TME) [75]. This finding is of particular interest given that immune-responsive tumors can be good candidates for immunotherapy-based approaches.
Remarkably, despite the loss of PTEN being associated with higher expression of the immune checkpoint gene programmed death ligand-1 (PD-L1) in several cancer types [76,77], this is not true in PCa [78]. It has been shown that PCa employ different combinations of immune evasion mechanisms such as immunological ignorance, upregulated cytotoxic T lymphocyte-associated protein 4, and upregulated decoy receptor 3 [79]. So far, current immunotherapeutic interventions, such as PD-1 blockade, in PCa have not been successful. One of the possible reasons is the lack of PD-L1 expression [78]. Therefore, alternative targets must be considered for immunotherapy in PCa. One alternative target is the checkpoint molecule B7-H3 (CD276), whose expression has already been associated with PCa progression and worse prognosis [80] and has been suggested as a target for immunotherapy [81,82]. CD276 was one of the most concordant up-regulated genes in our signature (Fig. 2) suggesting that its expression is associated with PTEN loss. Interestingly, B7-H3 expression may be downregulated by androgens [83].
The effects of androgen on the immune system have already been extensively studied and reviewed [58]. Androgens are known to suppress inflammatory immune cells and impair the development and function of B-and T-cells [59]. We, therefore, hypothesized that the decreased levels of androgen in PTEN-null TME could lead to an unsuppressed immune system. By testing our signature for enrichment in androgen-related genes (AR) derived from Schaeffer et al. [30], we observed that upon PTEN-loss, androgen-sensitive genes that are typically suppressed by DHT are positively enriched, indicating that androgen levels or androgen response in PTEN-null tumors may be lower than in their PTEN-intact counterparts ( Figure S7). This decrease in AR-signaling has been described in PTEN-null tumors, in which activation of the PI3K pathway inhibits AR activity [84]. Furthermore, AR inhibition activates AKT signaling by inhibiting AKT phosphatase levels further boosting cell proliferation [84], which has also been noted in this study (Fig. 3). Finally, in the non-coding repertoire, both PCA3 and PCGEM1 are modulated by androgen [85,86] and were down-regulated upon PTEN loss which tracks with the observed decreased androgen response in PTEN-null tumors ( Figure S5 and S7).
Altogether, we have generated a highly concordant gene signature for PTEN loss in PCa across three independent datasets. We show that this signature was highly enriched in proliferation and cell cycle genes, leading to a more aggressive phenotype upon PTEN loss, which is concordant with the literature. We have also highlighted some lncRNAs whose expression shows high specificity in PCa. Unfortunately, the roles of these lncRNAs are currently unknown and further functional studies are warranted, we have noted that they are in proximity to genes involved in immune response. We have shown that PTEN loss is associated with an increase in both innate and adaptive immune responses. Although the literature shows that PTEN loss usually leads to immuno-suppression, we find evidence that this finding may be reversed in PCa. This observation has potential implications in the context of precision medicine since immune responsive tumors are more likely to respond to immunotherapies. Therefore, PTEN-null tumors might benefit more from this approach than PTEN-intact tumors. Potentially, PTEN status can guide immunotherapy combination with other approaches such as androgen ablation.

Conclusion
Using the FC-R2 resource, we were able to highlight many lncRNAs that may be associated with PCa progression. Although functional characterization of these lncRNAs is beyond the scope of this study, we have shown that these novel lncRNAs are highly specific to PCa and track with several coding mRNAs and lncRNAs already reported to be involved in PCa development and progression, most notably, genes involved in the immune response. By providing a PCa-specific signature for PTEN loss and highlighting potential new players, we hope to empower further studies on the mechanisms leading to the development and progression of PCa that can aid in the development of potential biomarkers, drug targets, and guide therapies choice.  Figure S1. Cross-study of differential gene expression in PTEN-null vs PTEN-intact in ERG+ samples. Metaanalysis of HPFS/PHS and NH cohorts with Bayesian Hierarchical Model for DGE using XDE showing the top 25 most concordant differentially up-and down-regulated genes. PTEN status were based on IHC assays. Figure S2. PTEN expression levels stratified by CNV. Figure shows PTEN expression levels distribution by copy number variation (CNV), called by GISTIC algorithm. Figure S3. Correspondence-at-the-top (CAT) plot between TCGA CNV-based calls and the Bayesian Hierarchical Model approach (BHM). Agreement of genes ranked by t-statistics (TCGA) and average Bayesian Effect Size (BHM). Lines represent agreement between tested cohorts for PTEN-intact vs PTEN-null. Black-to-light grey shades represent the decreasing probability of agreeing by chance based on the hypergeometric distribution, with intervals ranging from 0.999999 (light grey) to 0.95 (dark grey). Lines outside this range represent agreement in different cohorts with a higher agreement than expected by chance. Figure S4. Expression of AC009478.1 is shown to be highly specific to PRAD, BLCA, to a lesser extent in UECA and BRCA. Figure shows raw expression values of SchLAP1 and AC009478.1 across cancer types. Pearson correlations and p-values are shown in red. Figure S5. Expression of FANTOM-CAT lncRNAs genes (top) and close coding genes (bottom) stratified by PTEN status. Significances based on t-test between PTEN-null and PTENintact using log2 CPM + 1 value. Significance cutoffs: * 0.05; **≤0.01; ***≤0.0001. Figure S6. Person correlation gene CATG00000038715 and CYP4F2 across cancer types. CATG00000038715 and CYP4F2 expression are shown to be highly correlated in PCa. Moreover, CATG00000038715 expression is shown to be highly specific to PCa. With exception of leukemia cells, none of the other tumors expressed high levels of CATG00000038715. Figure S7. Gene set enrichment for Androgen re-