Validation of a multi-ancestry polygenic risk score and age-specific risks of prostate cancer: A meta-analysis within diverse populations

Background: We recently developed a multi-ancestry polygenic risk score (PRS) that effectively stratifies prostate cancer risk across populations. In this study, we validated the performance of the PRS in the multi-ancestry Million Veteran Program and additional independent studies. Methods: Within each ancestry population, the association of the PRS with prostate cancer risk was evaluated separately in each case–control study, and the study-specific estimates were then combined in a fixed-effects inverse-variance-weighted meta-analysis. We further assessed effect modification by age and estimated the age-specific absolute risk of prostate cancer for each ancestry population. Results: The PRS was evaluated in 31,925 cases and 490,507 controls, including men from European (22,049 cases, 414,249 controls), African (8794 cases, 55,657 controls), and Hispanic (1082 cases, 20,601 controls) populations. Comparing men in the top decile (90–100% of the PRS) with men in the average (40–60%) PRS category, the prostate cancer odds ratio (OR) was 3.8 in European ancestry men (95% CI = 3.62–3.96), 2.8 in African ancestry men (95% CI = 2.59–3.03), and 3.2 in Hispanic men (95% CI = 2.64–3.92). The PRS did not discriminate the risk of aggressive versus nonaggressive prostate cancer. The OR diminished with advancing age (European ancestry men in the top decile: ≤55 years, OR = 7.11; 55–60 years, OR = 4.26; >70 years, OR = 2.79). Men in the top PRS decile reached a 5% absolute risk of prostate cancer ~10 years younger than men in the 40–60% PRS category. Conclusions: Our findings validate the multi-ancestry PRS as an effective prostate cancer risk stratification tool across populations. A clinical study of the PRS is warranted to determine whether it could be used for risk-stratified screening and early detection.
Funding: This work was supported by the National Cancer Institute at the National Institutes of Health (grant numbers U19 CA214253 to C.A.H., U01 CA257328 to C.A.H., U19 CA148537 to C.A.H., R01 CA165862 to C.A.H., K99 CA246063 to B.F.D., and T32 CA229110 to F.C.), the Prostate Cancer Foundation (grants 21YOUN11 to B.F.D. and 20CHAS03 to C.A.H.), the Achievement Rewards for College Scientists Foundation Los Angeles Founder Chapter to B.F.D., and the Million Veteran Program-MVP017. This research has been conducted using the UK Biobank Resource under application number 42195. This research is based on data from the Million Veteran Program, Office of Research and Development, and the Veterans Health Administration. This publication does not represent the views of the Department of Veterans Affairs or the United States Government.


Introduction
Prostate cancer is the second leading cause of cancer death among men and represents one of the largest health disparities in the United States, with African ancestry men having the highest incidence rates (Howlader, 2021). Genetic factors play an important role in prostate cancer susceptibility (Mucci et al., 2016;Conti et al., 2021) and in racial/ethnic disparities in disease incidence (Conti et al., 2021). Polygenic risk scores (PRS), composed of common genetic variants, have been shown to enable effective risk stratification for many common cancers (Kachuri et al., 2020;Mars et al., 2020;Balavarca et al., 2020;Pal Choudhury et al., 2020). We recently conducted a multi-ancestry genome-wide association study (GWAS) of 107,247 prostate cancer cases and 127,006 controls (75.8% of European ancestry, 11.7% of East Asian ancestry, 9.1% of African ancestry, and 3.4% Hispanic), in which 269 common genetic variants were associated with prostate cancer risk at genome-wide significance (Conti et al., 2021). Although individual genetic variants modulate disease risk only marginally, the aggregated effect of these 269 risk variants, measured by a PRS, was found to stratify prostate cancer risk in independent samples of European and African ancestry (Conti et al., 2021;Plym et al., 2022). As a measure of genetic susceptibility to prostate cancer, the PRS could potentially be an effective tool to identify men across diverse populations who are at higher risk of developing prostate cancer and to allow them to make more informed decisions about the age(s) at which, and how frequently, to undergo prostate-specific antigen (PSA) screening.
In this investigation, we evaluated the previously developed multi-ancestry PRS in large independent samples of men from the Veterans Affairs Million Veteran Program (MVP; 21,078 cases and 284,177 controls, including 13,643 cases and 210,214 controls of European ancestry, 6353 cases and 53,362 controls of African ancestry, and 1082 cases and 20,601 controls from Hispanic populations) (Gaziano et al., 2016), the Men of African Descent and Carcinoma of the Prostate (MADCaP) Network (405 cases and 396 controls of African ancestry) (Harlemon et al., 2020), and the Maryland Prostate Cancer Case-Control Study (NCI-MD; 383 cases and 395 controls of African ancestry) (Smith et al., 2017; 'Methods'). We also included, through meta-analysis, independent replication studies of the multi-ancestry PRS conducted to date in European (UK Biobank and Mass General Brigham [MGB] Biobank) and African ancestry populations (California and Uganda Prostate Cancer Study [CA UG] and MGB Biobank; 'Methods'; Conti et al., 2021;Plym et al., 2022), bringing the total sample to 31,925 cases and 490,507 controls.
In each of the replication studies included in our analysis, the PRS was constructed by summing variant-specific weighted allelic dosages of the 269 prostate cancer risk variants using the multi-ancestry conditional weights generated from our previous GWAS for prostate cancer ('Methods'). Within each ancestry population, the association of the PRS with prostate cancer risk was evaluated separately in each study and then combined in a fixed-effects inverse-variance-weighted meta-analysis. Age-stratified analyses were performed in two large replication studies, UK Biobank and MVP, to assess the age-specific effects of the PRS on prostate cancer risk. The absolute risk of prostate cancer was calculated for a given age for each PRS category in men from European, African, and Hispanic populations (Antoniou et al., 2010;Kuchenbaecker et al., 2017;Amin Al Olama et al., 2015;Antoniou et al., 2001) using age- and population-specific prostate cancer incidence from the Surveillance, Epidemiology, and End Results (SEER) Program (1999-2013) and age- and population-specific mortality rates from the National Center for Health Statistics, CDC (1999-2013). The PRS was also tested for association with disease aggressiveness in MVP ('Methods,' Appendix 1-figure 1).

Figure 1. Association between the multi-ancestry polygenic risk score (PRS) of 269 variants and prostate cancer risk in men from European, African, and Hispanic populations. The European ancestry replication studies included Million Veteran Program (MVP), UK Biobank (Conti, Darst et al., Nature Genetics, 2021), and Mass General Brigham (MGB) Biobank (Plym et al., JNCI, 2021). The African ancestry replication studies included MVP, California and Uganda Prostate Cancer Study (CA UG) (Conti, Darst et al., Nature Genetics, 2021), Men of African Descent and Carcinoma of the Prostate (MADCaP) Network, Maryland Prostate Cancer Case-Control Study (NCI-MD), and MGB Biobank (Plym et al., JNCI, 2021). Replication in Hispanic men was conducted in MVP. Results from individual replication studies are shown in Figure 1-figure supplement 1. The x-axis indicates the PRS category. Additional analysis was performed to evaluate the PRS association in men with extremely high genetic risk (99-100%). The y-axis indicates the OR, with error bars representing 95% CIs for each PRS category compared to the 40-60% PRS category. The dotted horizontal line corresponds to an OR of 1. ORs and 95% CIs for each decile are provided in Figure 1-source data 1.
The online version of this article includes the following source data and figure supplement(s) for figure 1: Source data 1. Association between the multi-ancestry polygenic risk score (PRS) and prostate cancer risk replicated in men from European, African, and Hispanic populations.

Results
The multi-ancestry PRS was strongly associated with prostate cancer risk in all three populations (Figure 1, Figure 1-source data 1). In European ancestry men, ORs were 3.78 (95% CI = 3.62-3.96) and 7.32 (95% CI = 6.76-7.92) for men in the top PRS decile (90-100%) and top percentile (99-100%), respectively, compared to men with average genetic risk (40-60% PRS category). In African ancestry men, ORs were 2.80 (95% CI = 2.59-3.03) and 4.98 (95% CI = 4.27-5.79) for men in the top PRS decile and percentile, respectively. In Hispanic men, ORs were 3.22 (95% CI = 2.64-3.92) and 6.91 (95% CI = 4.97-9.60) for men in the top PRS decile and percentile, respectively. PRS associations within each ancestry population were generally consistent across individual replication studies (Figure 1-figure supplement 1). The area under the curve (AUC) increased by 0.136 on average across populations when the PRS was added to a base model of age and principal components of ancestry (Appendix 1-table 1). Compared to the mean PRS in European ancestry controls, the mean PRS in African ancestry controls corresponded to a relative risk of 2.19 (95% CI = 2.17-2.21), and the mean PRS in Hispanic controls to a relative risk of 1.16 (95% CI = 1.15-1.18), consistent with previous findings (Conti et al., 2021).

Discussion
Findings from this investigation provide further support for the PRS as a prostate cancer risk stratification tool in men from European, African, and Hispanic populations. Notably, this investigation provides the first evidence of replication of the multi-ancestry PRS in Hispanic men. Consistent with previous findings (Conti et al., 2021;Plym et al., 2022), we observed lower PRS performance in African versus European ancestry men, supporting the need to expand GWAS and fine-mapping efforts in African ancestry men. The stronger association of the PRS with prostate cancer risk observed for younger men supports previous studies (Conti et al., 2021), suggesting that the contribution of genetic factors to prostate cancer is greater at younger ages and that age needs to be considered when comparing PRS findings across studies and populations.
The PRS is an effective risk stratification tool for prostate cancer at both ends of the risk spectrum. Current guidelines consider age, self-reported race, and family history of prostate cancer in PSA screening decisions (Schaeffer et al., 2021). Although the PRS generally did not differentiate aggressive from nonaggressive prostate cancer, a substantial fraction (~40%) of men who will develop aggressive tumors are among the men in the population with the highest PRS (top 20%), while only ~7% of men who will develop aggressive tumors are among those with the lowest PRS (bottom 20%; Appendix 1-table 2). This suggests that reduced screening among men with a low PRS may reduce the overdiagnosis of prostate cancer. Indeed, previous studies in men of European ancestry suggest that PRS-stratified screening could reduce the overdiagnosis of prostate cancer by 33-42%, with the largest reduction observed in men with lower genetic risk (Pashayan et al., 2015a;Callender et al., 2019;Pashayan et al., 2015b). Risk-stratified screening studies are warranted in diverse populations to evaluate the clinical utility of this multi-ancestry PRS for early disease detection and to determine when in a man's life genetic risk should be considered in the shared decision-making process for prostate cancer screening.

Figure 2. The x-axis indicates the PRS category. Additional analyses were performed to evaluate the PRS association in men with extremely high genetic risk (top percentile, 99-100%). The y-axis indicates the OR, with error bars representing the 95% CIs for each PRS category compared to the 40-60% PRS category. The dotted horizontal line corresponds to an OR of 1. The number of cases and controls, ORs, and 95% CIs for each PRS category in each age stratum are provided in Figure 2-source data 1.
The online version of this article includes the following source data and figure supplement(s) for figure 2: Source data 1. Association of multi-ancestry polygenic risk score (PRS) and prostate cancer risk stratified by age.

Participants and genetic data
We replicated the association between the multi-ancestry PRS and prostate cancer risk in three independent case-control samples from the VA MVP, the MADCaP Network, and the NCI-MD, as described below. Previously, this multi-ancestry PRS was replicated by our group and others in the CA UG (1586 cases and 1047 controls of African ancestry), the UK Biobank (6852 cases and 193,117 controls of European ancestry; updates to the UK Biobank led to slightly different sample sizes in this study of 8483 cases and 193,744 controls of European ancestry), and the MGB Biobank (formerly known as the Partners HealthCare Biobank; 67 cases and 457 controls of African ancestry and 1554 cases and 10,918 controls of European ancestry). Results from these studies are described in detail elsewhere (Conti et al., 2021;Plym et al., 2022). To provide a comprehensive assessment of the PRS validation, we meta-analyzed all replication studies, which included a total of 22,049 cases and 414,249 controls of European ancestry (UK Biobank, MGB Biobank, and MVP) and 8794 cases and 55,657 controls of African ancestry (CA UG, MGB Biobank, MADCaP Network, NCI-MD, and MVP). In Hispanic men, the multi-ancestry PRS was assessed only in MVP (1082 cases and 20,601 controls). All study protocols were approved by each site's Institutional Review Board, and informed consent was obtained from all study participants in accordance with the principles outlined in the Declaration of Helsinki.

Table 1. Age at which 5% absolute risk of prostate cancer is reached in men from European, African, and Hispanic populations. Absolute risks of prostate cancer were estimated using age- and population-specific Surveillance, Epidemiology, and End Results (SEER) incidence rates, CDC National Center for Health Statistics mortality rates, and polygenic risk score (PRS) associations from Figure 2.

The absolute risks were estimated using the age- and population-specific PRS associations from Figure 2-source data 1, the Surveillance, Epidemiology, and End Results (SEER) incidence rates, and the CDC mortality rates corresponding to non-Hispanic White, Black, and Hispanic men. The dotted line indicates the 5% absolute risk of prostate cancer.

MVP
The design of the MVP has been previously described (Gaziano et al., 2016). Briefly, participants have been recruited from approximately 60 Veterans Health Administration (VHA) facilities across the United States since 2011, and current enrollment exceeds 800,000. Informed consent was obtained from all participants to provide a blood sample for genetic analysis and to allow access to their full clinical and health data. The study received ethical and study protocol approval from the VA Central Institutional Review Board in accordance with the principles outlined in the Declaration of Helsinki.
A total of 485,856 samples from participants enrolled between 2011 and 2017 were genotyped on a custom Axiom array designed specifically for MVP (MVP 1.0). The genotyping array design and data quality control have been described extensively elsewhere (Hunter-Zinck et al., 2020). After excluding variants with high genotype missingness (>5%) and those that deviated from the expected allele frequencies observed in the reference populations, genotype data were imputed to the 1000 Genomes Project Phase 3 reference panel (1000 Genomes Project Consortium, 2015). In MVP, genetic ancestry was assessed using HARE (Fang et al., 2019), which assigned >98% of participants with genotype data to one of four nonoverlapping population groups: non-Hispanic White (European), non-Hispanic Black (African), Hispanic, and non-Hispanic Asian. Non-Hispanic Asian participants were excluded from the current analysis because of their small number.
We identified a total of 21,078 cases and 284,177 controls from MVP, of whom 13,643 cases and 210,214 controls were of European ancestry (73.3%), 6353 cases and 53,362 controls were of African ancestry (19.6%), and 1082 cases and 20,601 controls were Hispanic (7.1%). Prostate cancer cases were identified from the Veterans Affairs Central Cancer Registry (VACCR), which collects cancer diagnosis, extent of disease and staging, first course of treatment, and outcomes from 132 VA medical centers. In this analysis, we only included cases from the VACCR who had a confirmed cancer diagnosis based on their diagnostic code, procedure code, and information from other clinical documents. Among MVP participants without any prostate cancer diagnostic codes, we limited controls to those who were aged 45-95 years and had at least one prostate-specific antigen (PSA) test after enrollment. For prostate cancer cases, we obtained additional information on cancer staging and Gleason score to define aggressive prostate cancer phenotypes. Specifically, prostate cancer was considered aggressive if any of the following criteria was met: tumor stage T3/T4, regional lymph node involvement (N1), metastatic disease (M1), or Gleason score ≥8. Nonaggressive cases were defined as tumor stage T1/T2 and Gleason score <7.
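The aggressiveness definitions above amount to a small decision rule. The sketch below encodes them as literally stated, assuming a hypothetical input representation (a T-stage string such as "T2c", booleans for N1 and M1, and an integer Gleason score or None); it is an illustration, not the registry-processing code used in MVP. Note that, read literally, a T1/T2 case with Gleason score 7 meets neither definition and is left unclassified.

```python
from typing import Optional


def classify_case(t_stage: str, n1: bool, m1: bool, gleason: Optional[int]) -> Optional[str]:
    """Label a case 'aggressive', 'nonaggressive', or None (neither definition met).

    Hypothetical inputs: clinical T stage as a string (e.g., "T2c"), booleans for
    regional lymph node involvement (N1) and distant metastasis (M1), and the
    Gleason score (None if unknown).
    """
    t = t_stage.strip().upper()
    # Aggressive if any single criterion is met: T3/T4, N1, M1, or Gleason >= 8.
    if t.startswith(("T3", "T4")) or n1 or m1 or (gleason is not None and gleason >= 8):
        return "aggressive"
    # Nonaggressive requires stage T1/T2 together with Gleason < 7.
    if t.startswith(("T1", "T2")) and gleason is not None and gleason < 7:
        return "nonaggressive"
    # Everything else (e.g., T1/T2 with Gleason 7, or missing data) stays unclassified.
    return None


print(classify_case("T2a", False, False, 6))  # nonaggressive
print(classify_case("T2a", False, False, 7))  # None
```

Returning None rather than forcing a label keeps intermediate-risk cases out of both strata, which mirrors how the stratified case-control comparisons would use only cases meeting one definition.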

MADCaP
The MADCaP Network dataset included 405 prostate cancer cases and 396 controls from sub-Saharan Africa, as previously described (Harlemon et al., 2020;Andrews et al., 2018), with a substantial proportion of cases diagnosed at late stages. The study protocol was approved by each study site's Institutional Review Board/Ethics Review Board. Written informed consent was obtained from all participants, and studies were conducted in accordance with the Declaration of Helsinki and the U.S. Common Rule. The MADCaP samples were genotyped on a customized array designed to capture common genetic variation in diverse African populations; genotyping and quality control have been described in detail elsewhere (Harlemon et al., 2020). GWAS data were imputed using the 1000 Genomes Project Phase 3 reference panel (1000 Genomes Project Consortium, 2015).

NCI-MD
The NCI-MD Study included 383 prostate cancer cases identified from two Maryland hospitals and 395 population-based controls from Maryland and its neighboring states (Smith et al., 2017). The study was approved by the NCI (protocol #05C-N021) and the University of Maryland (protocol #0298229) Institutional Review Boards. Informed consent was obtained from all participants. About 87% of the cases in this study were considered nonaggressive, with a pathologically confirmed T1 or T2 tumor and a Gleason score ≤7. All samples from this study were genotyped on the Illumina InfiniumOmni5Exome array and imputed to the 1000 Genomes Project Phase 3 reference panel (1000 Genomes Project Consortium, 2015).

PRS construction and association analyses
PRSs were constructed by summing variant-specific weighted allelic dosages from 269 previously identified prostate cancer risk variants (Conti et al., 2021). Variants were weighted using the multi-ancestry conditional weights generated from our previous trans-ancestry GWAS for prostate cancer (Conti et al., 2021). Variants and weights used to generate the PRS can be found in the PGS Catalog: https://www.pgscatalog.org/publication/PGP000122/.
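As a minimal sketch of the weighted-dosage sum, assume one person's imputed effect-allele dosages (0 to 2) and the per-variant weights are both keyed by variant ID and already aligned to the same effect allele; the variant IDs and weight values below are hypothetical placeholders, not entries from the PGS Catalog file.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum of weight * effect-allele dosage over all weighted variants.

    dosages: imputed effect-allele dosage (0 to 2) per variant ID for one person.
    weights: per-variant weight for the same effect allele (assumed log-OR scale).
    """
    missing = [v for v in weights if v not in dosages]
    if missing:
        # Fail loudly rather than silently dropping variants from the score.
        raise KeyError(f"no dosage for {len(missing)} weighted variant(s), e.g. {missing[0]}")
    return sum(w * dosages[v] for v, w in weights.items())


# Toy example with two hypothetical variants (not real PGS Catalog entries).
weights = {"var_A": 0.10, "var_B": -0.05}
person = {"var_A": 2.0, "var_B": 1.4}
print(round(polygenic_risk_score(person, weights), 3))  # 0.2 - 0.07 = 0.13
```

Raising on missing variants is a design choice for the sketch; in practice, allele harmonization and handling of variants absent from an imputation panel would need to be decided explicitly before scoring.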
The association of the PRS with prostate cancer risk (i.e., case-control status) was estimated separately in each replication study using indicator variables for the following percentile categories of the PRS distribution: [0-10%], (10%-20%], (20%-30%], (30%-40%], (40%-60%], (60%-70%], (70%-80%], (80%-90%], and (90%-100%], where a parenthesis indicates an exclusive lower bound (greater than) and a square bracket indicates an inclusive bound (less than or equal to). Additional analysis was performed to obtain the association for the top 1% of the PRS by splitting the top PRS decile into (90%-99%] and (99%-100%] categories. PRS thresholds were determined from the observed PRS distribution among controls in each study. In all replication studies, logistic regression was performed with case-control status as the outcome (a binary dependent variable) and the PRS categories as independent predictors, adjusting for age and up to the top 10 principal components of ancestry, with the (40%-60%] category as the reference. Age was defined as age at diagnosis for prostate cancer cases and as age at last PSA test (MVP) or age at study recruitment (MADCaP and NCI-MD) for controls.
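The category construction can be sketched as follows, assuming percentile cut points taken from the controls with linear interpolation (the interpolation rule is an assumption; the text only says thresholds come from the control distribution). A score equal to a cut point falls into the lower category, matching the (lower, upper] intervals. The logistic regression itself would be fit with any standard package using the eight indicator columns plus age and principal components; the block only builds the categories and indicators, and the top-percentile split would add one more cut point at the 99th percentile.

```python
import bisect
from typing import List, Sequence

# Nine categories; each is (lower, upper], and the first, [0-10%], also takes
# every score at or below the 10th-percentile cut point.
LABELS = ["0-10%", "10-20%", "20-30%", "30-40%", "40-60%",
          "60-70%", "70-80%", "80-90%", "90-100%"]
CUT_PERCENTILES = [10, 20, 30, 40, 60, 70, 80, 90]
REFERENCE = "40-60%"


def percentile(sorted_vals: Sequence[float], p: float) -> float:
    """Percentile p (0-100) of pre-sorted values, using linear interpolation."""
    n = len(sorted_vals)
    pos = p * (n - 1) / 100
    lo = int(pos)
    if lo >= n - 1:
        return float(sorted_vals[-1])
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[lo + 1] - sorted_vals[lo])


def control_thresholds(control_prs: Sequence[float]) -> List[float]:
    """Cut points computed from the PRS distribution among controls only."""
    ordered = sorted(control_prs)
    return [percentile(ordered, p) for p in CUT_PERCENTILES]


def prs_category(score: float, thresholds: Sequence[float]) -> str:
    # bisect_left counts cut points strictly below the score, so a score equal
    # to a cut point stays in the lower category.
    return LABELS[bisect.bisect_left(thresholds, score)]


def indicators(category: str) -> List[int]:
    """0/1 dummy variables for the eight non-reference categories."""
    return [int(category == label) for label in LABELS if label != REFERENCE]


th = control_thresholds(list(range(101)))  # toy controls with scores 0..100
print(prs_category(95, th), indicators(prs_category(95, th)))
```

Cases are categorized with the same control-derived thresholds, so by construction the reference (40%-60%] category holds 20% of controls and each other category 10%.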
Discriminative ability was evaluated in MVP by estimating the AUC for logistic regression models of prostate cancer that included covariates only (age and four principal components of ancestry) and for models that additionally included the PRS. All analyses were performed separately within each population.
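The AUC compared here has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way from two lists of model-predicted probabilities. The input lists are hypothetical; in the analysis they would be the fitted probabilities from the base model and from the model with the PRS added, and the AUC gain is the difference between the two results.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case scores higher than a random control (ties count 1/2).

    This equals the area under the ROC curve. The double loop is
    O(n_cases * n_controls): fine for illustration, slow for biobank-scale data.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities from a base model and a PRS-augmented model.
base = auc([0.30, 0.25], [0.20, 0.28])
with_prs = auc([0.45, 0.35], [0.15, 0.30])
print(base, with_prs, with_prs - base)
```

For large cohorts, a rank-based (Mann-Whitney U) formulation gives the same value in O(n log n).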
We performed a fixed-effects inverse-variance-weighted meta-analysis to combine the log ORs and their standard errors for each PRS decile from individual replication studies by ancestry using the R package meta (Schwarzer et al., 2015). This meta-analysis was conducted across the three studies of European ancestry (UK Biobank, MGB Biobank, and MVP) and across the five studies of African ancestry (MGB Biobank, CA UG, MADCaP Network, NCI-MD, and MVP).
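The pooling arithmetic behind a fixed-effect inverse-variance-weighted meta-analysis is short enough to show directly. This is a plain-Python sketch of the standard formula, not the R package meta's implementation: each study contributes its log OR weighted by 1/SE², and the pooled SE is the square root of the inverse of the summed weights. It also assumes, as is typical for published results, that a study's SE can be recovered from an OR and a 95% CI that is symmetric on the log scale. The example ORs are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z975 = 1.959963984540054  # 97.5th percentile of the standard normal


def log_or_se_from_ci(lower: float, upper: float) -> float:
    """SE of log(OR) recovered from a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z975)


def fixed_effect_ivw(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Pooled OR and 95% CI from study-specific log ORs and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - Z975 * pooled_se),
            math.exp(pooled + Z975 * pooled_se))


# Two hypothetical studies reporting top-decile ORs with 95% CIs.
studies = [(3.5, 3.0, 4.1), (4.0, 3.6, 4.4)]
log_ors = [math.log(o) for o, _, _ in studies]
ses = [log_or_se_from_ci(lo, hi) for _, lo, hi in studies]
print(fixed_effect_ivw(log_ors, ses))
```

Pooling on the log scale and exponentiating at the end is what keeps the pooled CI asymmetric on the OR scale, as in the reported results.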
In the two large replication studies, UK Biobank and MVP, logistic regression analyses were repeated after stratifying both cases and controls by age (≤55, (55-60], (60-65], (65-70], and >70 years), with adjustment for age (as a continuous variable) and the top principal components of ancestry. The PRS associations estimated in men of European ancestry from UK Biobank and MVP were meta-analyzed using a fixed-effects inverse-variance-weighted method. Heterogeneity between studies and across strata was assessed via a Q statistic computed from the effect estimates, with corresponding tests of significance (Schwarzer et al., 2015).
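A heterogeneity Q statistic of the kind described here is conventionally Cochran's Q: the inverse-variance-weighted sum of squared deviations of each estimate from the pooled estimate, compared with a chi-square distribution on (number of estimates − 1) degrees of freedom. The sketch below assumes that standard form and uses closed-form chi-square tail probabilities for integer degrees of freedom so that it needs only the Python standard library. The age-stratum log ORs in the usage line are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def cochran_q(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[float, int, float]:
    """Cochran's Q, its degrees of freedom, and the heterogeneity p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical top-decile log ORs for five age strata (<=55 ... >70), with SEs.
print(cochran_q([1.9, 1.4, 1.2, 1.1, 1.0], [0.10, 0.08, 0.06, 0.06, 0.05]))
```

The same function applies to the comparison between studies within a stratum and to the comparison across strata; only the set of estimates passed in changes.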
In the three ancestry populations from MVP, we also performed stratified analyses by disease aggressiveness, in which cases were stratified as aggressive or nonaggressive and all controls were used in each stratified analysis. In both the aggressive cases vs. controls and nonaggressive cases vs. controls analyses, logistic regression was performed with case-control status as the outcome (a binary dependent variable) and the PRS categories as independent predictors, adjusting for age and up to the top 10 principal components of ancestry, with the (40%-60%] category as the reference. Heterogeneity across strata was assessed via a Q statistic computed from the effect estimates, with corresponding tests of significance (Schwarzer et al., 2015).

Estimation of absolute risk
The absolute risk of prostate cancer was calculated for a given age for each PRS category in European, African, and Hispanic ancestry men (Antoniou et al., 2010;Kuchenbaecker et al., 2017;Amin Al Olama et al., 2015;Antoniou et al., 2001). The approach constrains the PRS category-specific incidence rates at each age so that, averaged over the PRS categories, they equal the age-specific incidence for the entire population; that is, each category's incidence is increased or decreased relative to the population incidence according to the estimated risk of that PRS category and the proportion of the population within it. The calculation accounts for competing causes of death.
Specifically, for a given population and PRS category $k$ (e.g., 80-90%, 90-100%), the absolute risk by age $t$ is computed as

$$AR_k(t) = \sum_{u=0}^{t} I_k(u)\, S_k(u)\, S_D(u).$$

This calculation consists of three components:
1. $S_D(t)$ is the probability of not dying from another cause of death by age $t$, computed from the age-specific mortality rates $\mu_D(t)$: $$S_D(t) = \exp\Big(-\sum_{u=0}^{t-1} \mu_D(u)\Big).$$ In this analysis, the age-specific mortality rates from the National Center for Health Statistics, CDC (1999-2013) were used.
2. $S_k(t)$ is the probability of remaining free of prostate cancer by age $t$ in PRS category $k$ and uses the prostate cancer incidence by age $t$ for category $k$: $$S_k(t) = \exp\Big(-\sum_{u=0}^{t-1} I_k(u)\Big).$$
3. The prostate cancer incidence at age $t$ for PRS category $k$, $I_k(t)$, is calculated by multiplying the prostate cancer incidence for the reference category, $I_0(t)$, by the corresponding risk ratio, $\beta_{ka}$, for PRS category $k$ and the age category $a$ (ages ≤55, 55-60, 60-65, 65-70, and >70) containing age $t$: $$I_k(t) = I_0(t)\,\beta_{ka}.$$ The risk ratios are estimated from the odds ratios obtained from the population-specific individual-level PRS analysis for each age stratum (African and Hispanic ancestry odds ratios from MVP and European ancestry odds ratios meta-analyzed from MVP and UK Biobank).

The prostate cancer incidence at age $t$ for the reference category, $I_0(t)$, was obtained by constraining the weighted average of the incidences for the PRS categories to equal the population age-specific prostate cancer incidence, $\mu(t)$:

$$\mu(t) = \sum_k f_k\, I_k(t) = I_0(t) \sum_k f_k\, \beta_{ka},$$

where $f_k$ is the frequency of PRS category $k$, with $f_k = 0.1$ for all nonreference categories in our primary PRS analysis by deciles (e.g., 0-10%, 10-20%, 20-30%, etc.) and $f_0 = 0.2$, $\beta_{0a} = 1$ for the 40-60% reference category. By leveraging the definition that $S_k(t=0) = 1$ for all $k$, the absolute risks were calculated iteratively, starting at $t = 0$ and then obtaining $AR_k(t=1)$. Subsequent values were then calculated recursively for all $t$.
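The recursion can be sketched in a few lines under stated assumptions. This sketch reads the description as follows: yearly age steps; AR_k(t) accumulates I_k(u)·S_k(u)·S_D(u); S_k and S_D are exponentials of cumulative incidence and competing mortality up to the previous year; and I_0(t) is chosen so that the f_k-weighted average of I_k(t) = I_0(t)·β equals the population incidence. Risk ratios are supplied per year of age, already mapped from the age strata. It is an illustration, not the authors' code, and all inputs in the usage lines are hypothetical flat rates rather than SEER or CDC data. A helper also finds the first age at which a 5% absolute risk is reached, as summarized in Table 1.

```python
import math
from typing import Dict, List, Optional, Sequence


def absolute_risk(pop_incidence: Sequence[float],
                  mortality: Sequence[float],
                  risk_ratio: Dict[str, Sequence[float]],
                  freq: Dict[str, float]) -> Dict[str, List[float]]:
    """Cumulative absolute risk AR_k(t) for each PRS category k over yearly ages t.

    pop_incidence[t]: population incidence mu(t); mortality[t]: competing mortality mu_D(t).
    risk_ratio[k][t]: beta for category k at age t (1.0 for the reference category).
    freq[k]: share of the population in category k (sums to 1).
    """
    cum_incidence = {k: 0.0 for k in risk_ratio}  # sum of I_k(u) for u < t
    cum_mortality = 0.0                           # sum of mu_D(u) for u < t
    risk = {k: 0.0 for k in risk_ratio}
    curves: Dict[str, List[float]] = {k: [] for k in risk_ratio}
    for t in range(len(pop_incidence)):
        # Constraint: sum_k f_k * I_0(t) * beta_k(t) equals the population incidence mu(t).
        i0 = pop_incidence[t] / sum(freq[k] * risk_ratio[k][t] for k in risk_ratio)
        s_d = math.exp(-cum_mortality)  # S_D(t): no competing death before age t
        for k in risk_ratio:
            i_k = i0 * risk_ratio[k][t]
            s_k = math.exp(-cum_incidence[k])  # S_k(t): still free of prostate cancer
            risk[k] += i_k * s_k * s_d
            curves[k].append(risk[k])
            cum_incidence[k] += i_k
        cum_mortality += mortality[t]
    return curves


def age_reaching(curve: Sequence[float], ages: Sequence[int],
                 threshold: float = 0.05) -> Optional[int]:
    """First age at which the cumulative risk reaches the threshold (None if never)."""
    for age, r in zip(ages, curve):
        if r >= threshold:
            return age
    return None


ages = list(range(40, 90))
inc = [0.005] * len(ages)   # hypothetical flat incidence, not SEER data
mort = [0.01] * len(ages)   # hypothetical flat mortality, not CDC data
rr = {"low": [0.6] * len(ages), "high": [1.4] * len(ages)}
freq = {"low": 0.5, "high": 0.5}
curves = absolute_risk(inc, mort, rr, freq)
print(age_reaching(curves["high"], ages), age_reaching(curves["low"], ages))
```

The sketch weights categories by the fixed frequencies f_k, as described in the text; whether any additional reweighting by the cancer-free fraction of each category is applied is not specified here and would change the constraint step.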
For each population, absolute risks by age $t$ were calculated using age- and population-specific prostate cancer incidence, $\mu(t)$, from the SEER program (1999-2013) and age- and population-specific mortality rates, $\mu_D(t)$, from the National Center for Health Statistics, CDC (1999-2013).

Additional files

Supplementary files • MDAR checklist
Data availability
This investigation included published results from the following studies: DOI 10.1038/s41588-020-00748-0 and DOI 10.1093/jnci/djab058. For the MVP data, the final data sets underlying this study cannot be shared outside the VA, except as required under the Freedom of Information Act (FOIA), per VA policy. However, upon request through the formal mechanisms in place and pending approval from the VHA Office of Research Oversight (ORO), a de-identified, anonymized dataset underlying this study can be created. Upon request through the formal mechanisms provided by the VHA ORO, we would be able to provide sufficiently detailed variable names and definitions to allow replication of our work. Any requests for data access should be directed to the VHA ORO (OROCROW@va.gov) and should reference the following project and analysis: 'MVP017: A VA-DOE Exemplar Project on Cancer'. Publicly available data described in this manuscript can be found at the following websites: 1000 Genomes Project (https://www.internationalgenome.org/); SEER (https://seer.cancer.gov/); National Center for Health Statistics, CDC (https://www.cdc.gov/nchs/index.htm).