Interactions between Germline and Somatic Mutated Genes in Aggressive Prostate Cancer

Prostate cancer (PCa) is the most common diagnosed malignancy and the second leading cause of cancer-related deaths among men in the USA. Advances in high-throughput genotyping and next generation sequencing technologies have enabled discovery of germline genetic susceptibility variants and somatic mutations acquired during tumor formation. Emerging evidence indicates that germline variations may interact with somatic events in carcinogenesis. However, the possible oncogenic interactions and cooperation between germline and somatic variation and their role in aggressive PCa remain largely unexplored. Here we investigated the possible oncogenic interactions and cooperation between genes containing germline variation from genome-wide association studies (GWAS) and genes containing somatic mutations from tumor genomes of 305 men with aggressive tumors and 52 control samples from The Cancer Genome Atlas (TCGA). Network and pathway analysis were performed to identify molecular networks and biological pathways enriched for germline and somatic mutations. The analysis revealed 90 functionally related genes containing both germline and somatic mutations. Transcriptome analysis revealed a 61-gene signature containing both germline and somatic mutations. Network analysis revealed molecular networks of functionally related genes and biological pathways including P53, STAT3, NKX3-1, KLK3, and Androgen receptor signaling pathways enriched for germline and somatic mutations. The results show that integrative analysis is a powerful approach to uncovering the possible oncogenic interactions and cooperation between germline and somatic mutations and understanding the broader biological context in which they operate in aggressive PCa.


Introduction
Despite remarkable progress in patient screening using the prostate specific antigen (PSA) and improved patient care, prostate cancer (PCa) remains a major public health problem [1,2]. PCa is the most diagnosed cancer and the second leading cause of cancer-related death among men in the US [1,2]. The majority of patients present with localized tumors that will remain indolent and pose no harm even without treatment [3]. However, a significant proportion of patients ∼ 20% to 30% will develop aggressive tumors, which progress rapidly to metastatic form and is often lethal if not detected and treated early [3,4]. Therefore, there is an urgent need for the discovery of molecular markers for early detection of aggressive PCa.
Advances in high-throughput genotyping and reduction in genotyping costs have enabled discovery of genetic variants, primarily single nucleotide polymorphisms (SNPs) herein called germline mutations associated with an increased risk of developing PCa using genome-wide association studies GWAS [5]. These findings are providing foundational knowledge about the genetic susceptibility landscape of PCa. Most notably, these discoveries are being incorporated into risk prediction models for identification of patients at high risk of developing aggressive disease [6,7]. For example, recently, Seibert et al. developed and validated a polygenic hazard score to guide screening for aggressive PCa [7]. However, despite this remarkable success, establishing the link between genetic predisposition and tumorigenesis remains a challenge. This knowledge gap has hampered 2 Prostate Cancer translation of genetic susceptibility variants from GWAS studies into clinical practice to improve human health.
The recent surge of next generation sequencing of the cancer genomes has led to an expanded molecular classification of PCa and discovery of somatic driver mutations acquired during tumorigenesis [8]. Large multicenter projects such as The Cancer Genome Atlas (TCGA) [9] and the International Cancer Genome Consortium (ICGC) [10] have performed a series of detailed analyses of the somatic alterations affecting tumor genomes in sporadic PCa. Discoveries from these large-scale studies are providing valuable information about the genomic landscape of the tumor genomes. However, to date, somatic mutation information has not been maximally leveraged and integrated with information on germline genetic susceptibility variants to infer the possible oncogenic interactions and mechanisms of cooperation between germline and somatic variation in aggressive PCa. Emerging evidence indicates that germline variations may interact with somatic events in carcinogenesis [11,12]. Recently Wang et al. reported interactions between germline genetic susceptibility loci and somatic alterations in lung cancer [11]. Feigelson et al. proposed approaches for integrating germline and tumor genomic data in cancer research [12]. However, the possible oncogenic interactions between germline and somatic alterations in aggressive PCa have not been reported. The objective of this exploratory study was to investigate the possible oncogenic interactions and cooperation among and between genes containing germline mutations and genes containing somatic mutations in aggressive PCa. Our working hypothesis was that genes containing germline and somatic mutations are functionally related and interact in gene regulatory networks and biological pathways driving aggressive PCa. To address this hypothesis we combined germline mutation information derived from GWAS with somatic mutation information from TCGA using gene expression data on aggressive PCa from the TCGA. Our evaluation focuses on genes containing germline and somatic mutated rather than individual mutations to understand the broader biological context in which they cooperate and to establish putative functional bridges between germlinesomatic interactions and the pathways they control.

Materials and Methods
. . Germline Mutations and Associated Genes. We have developed and published a comprehensive catalogue of genetic susceptibility variants (SNPs) and genes reported to be associated with an increased risk of developing PCa from GWAS [5]. For this study, we used an updated version of this catalogue. The information in this catalogue was supplemented with information from the GWAS catalogue [13] to ensure completeness of the information used in this study. Details about methods used in collection of genetic susceptibility variants and genes from GWAS have been described in our earlier publication [5]. The methods for data collection were based on the guidelines proposed by the Human Genome Epidemiology Network for systematic review of genetic associations [14][15][16][17][18], which we have described elsewhere [5]. Here, we provide a brief, but detailed description of the data used in this study as well as the inclusion and exclusion criteria used.
We reviewed over 200 published reports on GWAS on PCa. The reports were screened by title, abstract, and fulltext review to identify the studies meeting our eligibility criteria. Because most of the GWAS reports have not been clinical phenotype-specific (i.e., indolent or aggressive PCa specific), we reviewed all GWAS studies available on PCa. After screening, 150 studies that met our eligibility criteria were selected and subjected to further detailed review. The exclusion criteria for the 50 studies included removal of studies with insufficient or incomplete information, reviews, studies reporting only intergenic regions, and studies with very small sample sizes (i.e., studies containing <500 subjects in cases and controls). For the remaining 150 studies used in this study, they were considered eligible if they met the following criteria: (1) the studies must have been based on a case-control study design using unrelated individuals, (2) publications must have been of full length and published in peer-reviewed journals or online in English language before June 2018, (3) PCa must have been diagnosed by histological examination, (4) the sample sizes must be more than 500 for the cases and more than 500 for the controls to reduce sampling errors, (5) the study must have provided sufficient information such that genotype frequencies for both PCa and controls can be discerned without ambiguity, and (6) the studies must have used the appropriate and recommended statistical methods to infer the associations by taking into account the covariates and accounting for population structure and genetic background [14].
We manually extracted the information from the 150 studies meeting our eligibility criteria and the accompanying Web sites containing supplementary data. The extracted information included SNP identification number (rs-ID); evidence of association as determined by the GWAS P value; a composite of strong (P ⩽ 10 −8 ), moderate (P = 10 −5 -10 −7 ), and weak (P = 10 −2 -10 −4 ) association; gene name; and associated chromosome position to which the genes map as determined by the dbSNP database [19] and the Human Genome Nomenclature database [20]. This search yielded more than 500 SNPs in introns and exons mapped to 266 genes from a population of more than 250,000 cases and more than 250,000 controls. The GWAS data set included 61 genes containing genetic variants directly associated with an increased risk of developing aggressive PCa. The database was cross-referenced with the GWAS catalogue database for validation and to ensure that all reported genetic variants and associated genes are represented in the data set. A complete list of genetic variants and associated genes along with sources or published reports from which they were derived is presented in Table SA provided as supplementary data to this report.

. . Somatic Mutation Information and Gene Expression Data.
Today, treatment decisions for PCa patients are guided by various stratification algorithms [21]. Among these parameters, the most potent predictor of PCa mortality is the Gleason  grade which ranges from 6 to 10 in the modern era [21]. The presence of Gleason grade 6 is associated with very low cancer-specific mortality rates, even in the absence of intervention. Intermediate grade disease (Gleason grade 7) has a much more variable course. High Gleason grade 8-10 are aggressive and often lethal. For this study, we used somatic mutation information and associated genes derived from 188 patients with aggressive tumors (i.e., tumors with Gleason grad 8-10) from TCGA. In addition, because Gleason grade 7 follows a variable clinical course we used 4 + 3 score used by the American Urological Association to assign this group of patients to aggressive PCa [21]. For classification of tumors as aggressive, we used the clinical information provided by the TCGA and the classification protocols of the American Urological Association [21]. Somatic mutation information and genes were downloaded from the Genomics Data Commons and is available at https://gdc.cancer.gov/ [22]. Gene expression data derived from RNA-seq was downloaded from TCGA using Genomics Data Commons (GDC) data transfer tool along with clinical information at https://gdc.cancer.gov/ [22]. The gene expression data set included 305 patients with aggressive tumors and 52 normal control samples. Note that the number of patients with gene expression data is higher than the number of sequenced patients. The data matrix was filtered to remove rows with missing data, such that each row has at least ≥30% data using cpm (counts per million) filter (>0.5) implemented in R [23]. The resulting data set was normalized by TMM (The trimmed mean of M-values) normalization method and then transformed by Voom, using Limma package implemented in R [23]. The normalized data contained 18,428 probes and was used in the analysis. The probe IDs and gene symbols and names were matched for interpretation using the Ensemble database, a database used for gene annotation of sequencing experiments and sequencing technology platforms.
. . Data Analysis. The project design and data analysis workflow are presented in Figure 1. We performed unbiased whole genome analysis comparing gene expression levels between patients diagnosed with aggressive tumors and matched control samples using the Limma package implemented in R [23] to identify all significant differentially expressed genes distinguishing tumors from control samples. This unbiased approach was carried out to discover, germline and somatically mutated genes as well as nonmutated genes associated with PCa. We used the false discovery rate (FDR) procedure to correct for multiple hypothesis testing [24]. The genes were ranked on P values and the FDR. A gene was defined as PCa susceptibility gene if it belongs to protein-coding genes and contained the genetic susceptibility variant(s) and its expression in PCa tissue was associated with PCa risk SNPs. A gene was considered as PCa driver gene if it contained somatic mutations and there was evidence to be PCa related or the gene contained PCa-related driver mutations. A gene was considered to be involved in the germline and somatic genomes if it contained both germline and somatic mutations.
We performed hierarchical clustering using Morpheus [25] using sets of differentially expressed genes to identify clusters of coregulated genes and to assess similarity in patterns of gene expression profiles, among the germline and somatically mutated and nonmutated genes. Prior to clustering the data was standardized, normalized, and centered. For clustering, we used the Pearson correlation coefficient as the measure of distance between pairs of genes and the complete linkage as the clustering method. We performed enrichment analysis using Ingenuity Pathway Analysis (IPA) software [26] to identify molecular networks and biological pathways enriched for germline and somatic mutations. Using IPA, the most highly significantly differentially expressed genes distinguishing patients with tumors from control samples were mapped onto networks and canonical pathways. The probability scores and the log P values were calculated to assess the likelihood and reliability of correctly assigning the genes to the correct molecular networks and biological pathways. A false discovery rate was used to correct for multiple hypothesis testing in pathway analysis. The predicted molecular networks and biological pathways were ranked based on z-scores and log P values; respectively. Gene ontology (GO) [27] analysis as implemented in IPA was performed to characterize the functional relationships among genes in the networks as implemented in IPA. Genes were classified according to the molecular functions, biological processes and cellular components in which they are involved. Genes were considered interacting or cooperating if they were involved in the same molecular functions, biological process, cellular components, molecular networks, or biological pathways.

. . Discovery of Somatic Mutated and Nonmutated Gene
Signatures. To discover and characterize the somatic mutated and nonmutated gene signatures, we performed whole genome analysis comparing gene expression levels between patients with aggressive tumors and the controls samples. After controlling for multiple hypothesis testing, the analysis revealed two gene signatures. One gene signature consisted of 2,613 significantly (P<0.05) differentially expressed genes containing somatic mutations, of which 2,298 genes were highly significantly (P<0.01) differentially expressed. The other signature consisted of 10,192 significantly (P<0.05) differentially expressed genes containing no somatic mutations, of which 8,913 genes were highly significantly (P<0.01) differentially expressed. The results showing a signature of the top 34 most highly somatically mutated genes (≥ 5 mutation events) that were significantly differentially expressed are presented in Table 1. Also presented in the table are the number of somatic mutation events per gene and the differential expression P value. The list of highly mutated genes included the genes SYNE , ATM, CTNNB , FZD , and RYR implicated in PCa [28][29][30][31][32]. A complete list of significantly differentially expressed somatic mutated and nonmutated genes including their mutation frequencies along with their estimates of P values and false discovery rates are presented in Supplementary Table S1A (somatic mutated) and Table S1B (nonsomatic mutated) provided as supplementary data.
. . Germline and Somatic Mutation Gene Signatures. To discover and characterize the germline and somatic mutation gene signatures, we evaluated the significantly differentially expressed genes between cases and controls. We evaluated the 266 genes containing germline mutations associated with an increased risk of developing PCa. To get a complete picture, we performed a series of investigations. First, we investigated whether genes containing germline variation also harbor somatic mutation and whether these genes are differentially expressed between patients with aggressive tumors and control samples. Specifically, we sought to discover a gene signature with germline and somatic mutations. Secondly, we investigated whether there are any genes containing germline variations that are differentially expressed between patients with aggressive PCa and control samples, but are not somatically altered. Thirdly, we investigated whether there are somatically altered genes that are differentially expressed between patients with aggressive PCa and control samples but are not altered in the germline. The results of these analyses are presented in a Venn diagram in Figure 2. Out of the 266 genes containing germline mutations evaluated, 90 genes contained both germline and somatic mutations (Figure 2). This analysis also revealed a 61 gene signature containing both germline and somatic mutations. The second investigation produced a 117 gene signature containing germline mutations only (Figure 2). The third investigation yielded a 2552 gene signature containing somatic mutations only (Figure 2). A total of 59 genes contained germline mutations only and were not significantly differentially expressed. A set of 29 genes contained both germline and somatic mutations and were not significantly differentially expressed. Some 2137 genes contained somatic mutations only and were not significantly differentially expressed (Figure 2). About 10072 genes were significantly differentially expressed but contained neither germline nor somatic alterations (Figure 2).
The results showing a signature of the 61 genes containing both germline and somatic mutations are presented in Table 2. The signature includes the genes PGBD , OXR ,  GZF , ITGAX, ORC , BMPR B, KLK , KLK , NKX -,  SLC A , POU F B, LMTK , NAALADL , LDAH, PDLIM ,  SLC A , JAZF , LMTK , CASC , DAP IP, TIMM B,  MSMB, MYEOV, FLT , SL B , and HNF B containing genetic variants reported to be directly associated with aggressive PCa (Table 2) (references provided in supplementary Table SA). Also presented in the table are the SNP rs-IDs, the GWAS P value indicating the strength of germline mutation association PCa, the frequency of germline and somatic mutation events, and the gene expression P value indicating the level of significance in expression levels between tumors and control samples for individual genes.
There was significant variation in the distribution of germline and somatic mutations among the genes. Overall the genes containing germline variation did not have a high frequency of somatic mutations ( Table 2) Table 2). The gene LRP B was the most highly somatically altered (Table 2). Interestingly, the genes RNASEL, FTO, BMPR A, ITGA , TCF L , FREM , LMTK , and STAT containing germline mutations with weak to moderate associations were found to contain germline and somatic mutations and were differentially expressed between PCa and controls ( Table 2).

. . Patterns of Expression Profiles for Containing Germline and Somatic Mutations.
To investigate whether the 61 genes containing both germline and somatic mutations are coregulated and have similar patterns of expression profiles, we performed hierarchical clustering. The results showing the patterns of expression profiles are presented in Figure 3. The analysis revealed that genes containing germline and somatic mutations have similar patterns of expression profiles, suggesting that they are likely to be coregulated and functionally related. As expected, there was significant variation in the patterns of gene expression profiles. The variability in patterns of expression profiles among the genes containing germline and somatic mutations (Figure 3) can partially be explained by the fact that these genes were derived from different populations and different clinical phenotypes. Under such condition the observed outcome is expected. Five genes (KLK , KLK , NKX -, HOXB , and SIX ) were consistently highly expressed in both tumors and control samples (    similarities in patterns of gene expression profiles (results not presented here). Further analysis combining the 61 genes containing germline and somatic mutations with highly somatic mutated genes (i.e., >5 somatic mutation events per gene) also revealed similarity in patterns of expression profiles (results not presented). This suggests that genes containing germline and somatic mutations are likely coregulated and may not only be involved in cis regulation but also in trans regulation.

. . Enrichment Analysis of Molecular Networks and Biological Pathways.
To gain insights about the broader biological context through which interactions and cooperation among and between genes containing germline and somatic mutations operate, we performed network and pathway analysis using IPA. We hypothesized that genes containing germline and somatic mutations are functionally related and interact with one another in molecular networks and biological pathways enriched for germline and somatic mutations. Thus, we sought to discover molecular networks and biological pathways enriched for germline and somatic mutations. As a first step, we performed network and pathway analysis including the 61 genes containing germline and somatic mutations from Table 2 and the 34 genes that were found to contain high somatic mutation events (>5 somatic mutation events) from Table 1. In the second step, we combined the 61 genes containing germline variation and somatic mutations and the 34 highly somatic mutated genes with the 117 genes containing only germline mutations. Here we sought to investigate the possible oncogenic interactions between the germline and the somatic genomes. In these analyses, we filtered out genes with spurious and predicted interactions, keeping only genes with ≥ 3 direct connections to ensure the reliability of networks and functional relationships among the genes in the networks.
The results of network and pathway analysis for the 61 genes containing both germline and somatic mutations 8 Prostate Cancer In the network ( Figure 4) the genes in purple color font contain both germline and somatic mutations, the genes in blue color fonts indicate genes with high somatic mutation frequency >5, whereas gene names in red font indicate genes containing germline mutations directly associated with aggressive PCa. As evidenced in Figure 4, the genes containing germline and somatic mutations were found to be functionally related and interacting with one another. In addition, the genes containing germline and somatic mutations were found to be interacting with the most highly somatically mutated genes.
Pathway analysis revealed various signaling pathways enriched for both germline and somatic mutations. Among the most significant pathways included the PCa signaling pathway (P<5.81x10 The results of network and pathway analysis based on combining the 61 genes containing germline variation and somatic mutations and the 34 highly somatically mutated genes with the 117 genes containing germline mutations only are presented in Figure 5. Here we sought to investigate the possible oncogenic interactions between the germline and the somatic genomes. The analysis revealed 15 networks containing genes with multiple overlapping functions with z-scores ranging from 2 to 34. The top networks with Zscore 30 -34 contained genes predicted to be involved in cancer and cell cycle. The results showing the functional relationships among the genes in the three top networks are shown in Figure 5. Network analysis revealed that genes containing germline and somatic mutations, high somatic mutation frequency and genes containing only germline mutations are functionally related ( Figure 5). Interestingly, genes containing genetic variants directly associated with aggressive PCa (KLK , JAZF , KLK , and HNF BP) were found to be interacting with other sets of genes.
Pathway analysis revealed biological pathways enriched for both germline and somatic mutations. Interestingly, genes containing germline variations only were found to be involved in the same pathways with genes containing somatic mutations only. This suggests that germline and somatic mutations may cooperate through the pathways. The most significant pathways were the MSP-RON signaling pathway (P<3.21x10 −8 ), molecular mechanisms of cancer (P<3.31x10 −7 ), PDGF signaling (P<4.00x10 −7 ), Axonal guidance signaling (P<5.60x10 −7 ), P53 signaling (P<1.38x10 −6 ) and STAT3 signaling pathway (P<7.40x10 −6 ). Interestingly in both network and pathway analysis, genes containing genetic variants with strong and moderate to weak associations were found to be functionally related and interacting with genes containing somatic mutations.

Discussion
Studies of cancer genomes have been mostly concerned with understanding the somatic mutations or transcriptome changes that arise during tumorigenesis. For many years and decades, germline and somatic mutation information have generally been analyzed and reported as separate endeavors. Hundreds of genetic variants and genes associated with an increased risk of developing PCa have been reported in GWAS [5]. Similarly, hundreds of somatic mutations and genes have been reported [9,10]. However, emerging evidence indicates that germline variations may interact with somatic events to drive carcinogenesis [33][34][35]. In this exploratory study, we investigated the possible oncogenic interactions and cooperation between genes containing germline and somatic mutations. We found evidence that genes containing germline mutations also contain somatic mutations, and that these genes are functionally related and interact with one another in molecular networks and biological pathways enriched for both genetic alterations. Our analysis reveals that oncogenic interactions and cooperation between germline and somatic mutations likely occurs through molecular networks and biological pathways. To our knowledge this is the first study to show that genes containing germline mutations also contain somatic mutations and that these genes interact with one another in molecular networks and biological pathways in aggressive PCa. The novel aspect of the study is that germline and somatic mutation may interact and cooperate through molecular networks and biological pathways to drive aggressive PCa. The clinical significance of these results is that such information could be used for the development of new precision prevention strategies and polygenic risk scores to identify men at high risk of developing aggressive PCa [6,7].
In this study, evaluation of somatic mutation profiles revealed that genes containing genetic susceptibility variants did not exhibit an overall increase in somatic mutation frequency. This is consistent with the results of a recent study which explored somatic mutation profiles within susceptibility regions in a range of cancers [36]. In that report, the authors found that genes in cancer susceptibility regions did not exhibit high somatic mutation frequency [36]. However, the published analysis considered overall somatic mutations, which represented a substantial proportion of passenger mutations, while our study focused on network and pathway analysis to gain insights about the broader biological context in which germline and somatic mutations may cooperate to drive aggressive PCa. Importantly, network and pathway analysis showed that germline and somatic mutations were involved in many common signaling pathways which have been implicated in PCa. Interestingly, many highly somatically mutated genes not containing germline mutations were found to interact with germline mutated genes in biological pathways. This suggests that cooperation between germline and somatic alteration may not only involve cis regulation but also trans regulation through molecular networks and biological pathways. The clinical significance is that such pathways could serve as targets for the development of novel targeted therapeutics. Although we did not investigate the regulatory mechanisms and the impact of germline variation on gene expression in this study, association of PCa risk variants with gene expression in tumor and normal tissue has been reported [37]. Moreover, cis-regulatory variation in the human prostate transcriptome by using gene-level allelespecific expression has been reported [38].
The low somatic mutation frequency in germline mutated genes could be due to natural selection which is likely to confer selective advantage to specific genes [39]. For example, genomic alteration, clonal selection, and evolution of the tumor microenvironment, may contribute towards unique physiological characteristics under selection pressure contributing to increased mutations in the somatic or tumor genome as observed in this study for the most highly somatic mutated genes, a phenomenon which has been reported in PCa [39]. Although our approach focuses on PCa, the methodology and approach could be applied to other cancers as well. This exploratory study investigated the possible oncogenic interactions and cooperation among and between genes containing germline and somatic mutations. However, limitations of the study must be acknowledged. Important limitations include the lack of consistent methodological reporting between studies examining low and high grade PCa in GWAS studies which can confound the results. Additionally, genetic variants and genes from GWAS were derived from many diverse populations. Gene expression can be populationspecific and there is evidence from the literature that some genetic susceptibility variants may confer population-specific risks [40]. It is worth noting that this study focused on men of European ancestry because the population is well represented in both GWAS and TCGA. Future studies should include other ethnic populations, especially those underrepresented in genomic studies like African American men who are disproportionately affected by aggressive PCa.

Conclusions
Our findings suggest possible oncogenic interactions and cooperation between genes containing germline variation and somatic mutations. The results suggest a complex interplay between germline and somatic variation, underscoring the need for further studies to understand how the genetic variants and somatic alterations interact to drive aggressive PCa in different populations. Additionally, functional experiments are warranted to uncover the molecular mechanisms underlying these interactions and cooperation between the germline and the somatic genomes.

Data Availability
Germline mutation information from GWAS along with sources of original information is provided in the supplementary materials in supplementary Table SA. Original data on somatic mutations, gene expression data, and clinical information is available at the TCGA via the Genomics Data Commons https://gdc.cancer.gov/.

Supplementary Materials
Table S1B: list of significantly differentially expressed nonsomatic mutated genes associated with aggressive PCa along with estimates of P values, adjusted P values, and log fold change.