Inclusion of variants discovered from diverse populations improves polygenic risk score transferability

Summary The majority of polygenic risk scores (PRSs) have been developed and optimized in individuals of European ancestry and may have limited generalizability across other ancestral populations. Understanding aspects of PRSs that contribute to this issue and determining solutions is complicated by disease-specific genetic architecture and limited knowledge of sharing of causal variants and effect sizes across populations. Motivated by these challenges, we undertook a simulation study to assess the relationship between ancestry and the potential bias in PRSs developed in European ancestry populations. Our simulations show that the magnitude of this bias increases with increasing divergence from European ancestry, and this is attributed to population differences in linkage disequilibrium and allele frequencies of European-discovered variants, likely as a result of genetic drift. Importantly, we find that including into the PRS variants discovered in African ancestry individuals has the potential to achieve unbiased estimates of genetic risk across global populations and admixed individuals. We confirm our simulation findings in an analysis of hemoglobin A1c (HbA1c), asthma, and prostate cancer in the UK Biobank. Given the demonstrated improvement in PRS prediction accuracy, recruiting larger diverse cohorts will be crucial—and potentially even necessary—for enabling accurate and equitable genetic risk prediction across populations.


Introduction
Increasing research into polygenic risk scores (PRSs) for disease prediction highlights their clinical potential for informing screening, therapeutics, and lifestyle. 1 While their use enables risk prediction in individuals of European ancestry, PRSs can have widely varying and much lower accuracy when applied to non-European populations. [2][3][4] Although the nature of this bias is not well understood, it can be attributed to the vast overrepresentation of European ancestry individuals in genome-wide association studies (GWASs), which is 4.5-fold higher than their percentage of the world population; conversely, there is underrepresentation of diverse populations such as individuals of African ancestry in GWASs, which is one-fifth their percentage. 3 Potential explanations for the limited portability of European-derived PRSs across populations includes differences in population allele frequencies and linkage disequilibrium (LD), the presence of populationspecific causal variants or effects, or potential differences in gene-gene or gene-environment interactions. 4 However, in traits such as BMI and type 2 diabetes, 70%-80% of European-based PRS accuracy loss in African ancestry has been attributed to differences in allele frequency and LD; therefore, most causal variants discovered in Europeans are likely to be shared. 5 Recent methods developed to improve PRS accuracy in non-Europeans have prioritized the use of European-discovered variants and populationspecific weighting. [6][7][8] However, only small gains in accuracy are possible with limited sample sizes of non-European cohorts. 4 PRSs have been applied and characterized within global populations, but there is limited understanding of PRS accuracy in recently admixed individuals and whether this varies with ancestry. Studies applying PRSs in diverse populations [3][4][5]9 or exploring potential statistical approaches to improve accuracy in such populations 6,10 typically present performance metrics averaged across all admixed individuals. Only one study to date has suggested that PRS accuracy may be a function of genetic admixture (i.e., for height in the UK Biobank 8 ). However, it is unknown if the relationship between accuracy and ancestry exists when variants are discovered in non-European populations or what the best approach for applying PRSs to admixed individuals will be when there are adequately powered GWASs in non-European populations.
To help answer these questions, here we systematically and empirically explore the relationship between PRS performance and ancestry within African, European, and admixed ancestry populations through simulations. We highlight PRS-building approaches that will achieve unbiased estimates across global populations and admixed individuals with future recruitment and representation of non-European ancestry individuals in GWASs. We also investigate reasons for loss of PRS accuracy and attribute this to population differences in LD tagging of causal variants by lead GWAS variants, as well as allele frequency biases potentially due to genetic drift undergone by European ancestry populations. Finally, we confirm our simulation findings by application to data on hemoglobin A1c (HbA1c) levels, asthma, and prostate cancer in individuals of European and individuals of African ancestry from the UK Biobank.

Simulation of population genotypes
We used the coalescent model (msprime v.7.3 11 ) to simulate European (CEU) and African (YRI) genotypes, based on whole-genome sequencing of HapMap populations, for chromosome 20 as described previously by Martin et al. 2 Genotypes were modeled after the demographic history of human expansion out of Africa, 12 assuming a mutation rate of 2 3 10 À8 . We simulated 200,000 Europeans and 200,000 Africans for each simulation trial, for a total of 50 independent simulations (20 million total individuals). We generated founders from an additional 1,000 Europeans and 1,000 Africans (10,000 total across the 50 simulations) to simulate 5,000 admixed individuals (250,000 total across the 50 simulations) with RFMIX v.2, 13 assuming two-way admixture between Europeans and Africans with random mating and 8 generations of admixture.

True and GWAS estimated polygenic risk scores
We generated true genetic liability for all European, African, and admixed individuals within each simulation trial. 2 Briefly, m variants evenly spaced throughout the simulated genotypes were selected to be causal, and the effect sizes ðbÞ were drawn from a normal distribution b $ Nð0; ðh 2 =mÞÞ, where h 2 is the heritability. 2 Constant heritability and complete sharing of effect sizes in African ancestry and European ancestry individuals was assumed. The true genetic liability was computed as the summation of all variant effects multiplied by their genotype for each in- Finally, the non-genetic effect ðε ¼ Nð0; 1 Àh 2 ÞÞ standardized to explain the remainder of the Þ was added to the genetic risk, defining the total trait liability ðG þEÞ. 2 Cases were selected from the extreme tail of the liability distribution, assuming a 5% disease prevalence. An equal number of controls and 5,000 testing samples were randomly selected from the remainder of the distribution; all 5,000 admixed individuals were also used for testing. Across simulation replicates we varied causal variants (m ¼ [200, 500, 1,000]) and trait heritability (h 2 ¼ [0.33, 0.50, 0.67]); however, for simplicity, main text results assume m ¼ 1,000 and h 2 ¼ 0.50.
The estimated PRSs were constructed from GWASs of the simulated genotypes (modeled after chromosome 20) in European and African ancestry populations, each with 10,000 cases and 10,000 controls. Odds ratios (ORs) were estimated for all variants with minor allele frequency (MAF) > 1%, and statistical significance of association was assessed with a chi-square test. While causal var-iants may be included in the estimated PRS, they are drawn from the total allele frequency spectrum; thus, they are primarily rare (93% and 87% of causal variants have MAF < 1% in European and African ancestry populations when m ¼ 1,000) and restricted from our analysis. For each population, variants were selected for inclusion into the estimated PRS by p value thresholding (p ¼ 0.01 [main text], 1 3 10 À4 , and 1 3 10 À6 [Supplemental information]) and clumping (r 2 < 0.2) in a 1 Mb window within the GWAS population, where r 2 is the squared Pearson correlation between pairs of variants. A fixed-effects meta-analysis was also performed to calculate the inverse-variance weighted average of the ORs in African and European ancestry populations, and LD r 2 values for clumping used both datasets as the reference.
For each individual, an estimated PRS was calculated as the sum of the log(OR) (i.e., the PRS ''weights'') multiplied by their genotype for all independent and significant variants at a given threshold. The PRSs were constructed for testing samples with variants and weights each selected from European or African ancestry GWASs or a fixed-effects meta-analysis of both combined. Additional multi-ancestry PRS approaches 7,10 were also explored for admixed individuals. Accuracy was measured by Pearson's correlation (r) between the true genetic liability and estimated PRS within each population. Across simulation trials, averages and 95% confidence intervals (CIs) for r were calculated following a Fisher ztransformation for approximate normality. 14 The statistical significance of differences in accuracy between PRS approaches was assessed within ancestry groups, defined by global genome-wide European ancestry proportions, with a Z test (also following Fisher transformation). Specifically, within each simulation trial the z-statistic, measuring the difference between two PRS approaches, was computed, and a two-sided p value was obtained; results were summarized across trials by taking the median p value. While using r as a measure of accuracy has the added benefit of being independent from heritability, in admixed individuals we also calculate the proportion of variance (R 2 ) for total trait liability (genetic and environmental component) explained by the estimated PRS.

Multi-ancestry PRS
Local ancestry weighting PRS In addition to genotypes of simulated admixed individuals, RFMIX 13 also outputs the local ancestry at each locus for every individual. Thus, we used this information to create a local ancestry weighted PRS that is not impacted by ancestry inference errors. For admixed African and European ancestry individuals, an ancestryspecific PRS was constructed for each population (k) by limiting each PRS to variants found in that ancestry-specific subset of the genome ði˛kÞ, as defined by local ancestry, and weighting using variant effects discovered in that population. 7 Each ancestry-specific PRS was then combined, weighted by the genome-wide global ancestry proportion ðr k Þ for that individual as follows: 7 In this way, each individual has a PRS constructed from the same independent variants with personalized weights that are unique to the individual's local ancestry.
Linear mixture of multiple ancestry-specific PRSs Using a linear mixture approach developed by Márquez-Luna et al., 10 we combined two PRSs estimated in each of our global training populations where individual PRSs were constructed using significant and independent variants (p < 0.01 and r 2 < 0.2 in a 1Mb window) and effect sizes from a GWAS in that training population. For simulations, mixing weights (a 1 and a 2 ) were estimated in an independent African ancestry testing population, and as validation, accuracy was assessed in our simulated admixed ancestry individuals.f

Application to real data
We obtained genome-wide summary statistics for HbA1c, 15 asthma, 16,17 and prostate cancer 18,19 calculated in European and African ancestry individuals (Table S1). Summary statistic variants that were not present in both the UK Biobank European and African ancestry testing populations were removed. PRSs for each phenotype were constructed from associated and independent GWAS variants within each training population by p value thresholding (p ¼ [5 3 10 À8 , 1 3 10 À7 , 5 3 10 À7 , 1 3 10 À6 , 5 3 10 À6 , 1 3 10 À5 , 5 3 10 À5 , 1 3 10 À4 , 5 3 10 À4 , 1 3 10 À3 , 5 3 10 À3 , 0.01, 0.05, 0.1, 0.5, 1]) and clumping (LD r 2 < 0.2) of variants within 1 Mb with PLINK. 20 Additionally, a fixed-effects meta-analysis of the two populations was performed using METASOFT v2.0.1. 21 The selected PRS variants exhibited limited heterogeneity between the European and African ancestry training set summary statistics. In particular, of all possible European (African) ancestry selected PRS variants, only 5.4% (9.4%), 6.9% (5.7%), and 7.0% (4.8%) were heterogeneous between the two groups for HbA1c, asthma, and prostate cancer, respectively (i.e., I 2 > 25% and Q p value < 0.05). PRS performance was evaluated in an independent cohort using genotype and phenotype data for individuals of European ancestry and individuals of African ancestry (Table S1) from the UK Biobank, with imputation and quality control previously described. 22 We undertook extensive post-imputation quality control of the UK Biobank, including the exclusion of relatives and ancestral outliers from within each group. Specifically, analyses were limited to self-reported European and African ancestry individuals, with additional samples excluded if genetic ancestry principal components (PCs) did not fall within 5 SDs of the self-reported population mean. For each individual, their PRS was computed as the weighted sum of the genotype estimates of effect on each phenotype from the discovery studies (Table S1), multiplied by the genotype at each variant. For each population-specific variant set, weights from either the European or African summary statistics or the fixed-effects meta-analysis were used. A total of 96 polygenic risk scores were evaluated in each phenotype exploring the impact of ancestral population (two scenarios), p value threshold (16 scenarios), and variant weighting (three scenarios). The proportion of variation explained by each PRS (partial-R 2 ) approach was assessed for UKB European ancestry and African ancestry individuals separately. The partial-R 2 was calculated from the difference in R 2 values following linear regression of HbA1c levels on age, sex, BMI, and first 10 PCs with and without the PRS also included. Similarly, for asthma and prostate cancer, we determined the Nagelkerke's pseudo partial-R 2 following logistic regression of case status on age, sex (asthma only), BMI (prostate cancer only), and first 10 PCs with and without the PRS. Additionally, in African ancestry individuals we created a combined PRS ða 1 PRS EUR þa 2 PRS AFR Þ, where PRS EUR and PRS AFR was the most optimal PRS using variants from the designated population and the weight and p value that resulted in the highest accuracy; albeit in-sample, optimization was done within a single PRS to ensure limited overfitting of the combined model. 10 We used 5fold cross-validation to assess model performance in which 80% of the cohort was used to estimate the mixing coefficients (a 1 and a 2 ) and the combined PRS partial-R 2 was calculated in the remaining 20% of the data. Reported partial-R 2 was averaged across folds. 10 For our binary phenotypes with unbalanced affected and unaffected individuals, we used stratified 5-fold cross-validation.

Results
Generalizability of European-derived risk scores to an admixed population We constructed PRSs from our simulated European datasets and applied them to independent simulated European, African, and admixed testing populations, assuming 1,000 true causal variants (m) and trait heritability (h 2 ) of 0.5. On average, 1,552 (range ¼ [1,134-1,920]) variants were selected for inclusion into the PRS at p value < 0.01 and LD r 2 < 0.2 ( Table 1). The average accuracy across replicates (50 simulations), measured by the correlation (r) between the true and inferred genetic risk, was much higher when applying the PRS to Euro- Figure 1). This is in agreement with decreased performance seen in real data when applying a European-derived genetic risk score to an African population. [2][3][4][5] To understand the relationship between ancestry and PRS accuracy, admixed individuals were stratified by their proportion of genome-wide European (CEU) ancestry:  Figure 1). In comparison to Europeans, the performance of the Europeanderived PRS was significantly lower in individuals with intermediate (20% decrease; p ¼ 1.27 3 10 À47 ), and low (32% decrease; p ¼ 6.48 3 10 À16 ) European ancestry, and with African-only ancestry (41% decrease, p ¼ 8.00 3 10 À155 ). There was no significant difference for individuals with high (5.3% decrease; p ¼ 0.09) European ancestry. These trends remained consistent when varying the genetic architecture ( Figure S1), specifically decreasing the number of causal variants (m) and varying the trait heritability (h 2 ). Additionally, the relationship between ancestry and accuracy persisted with the inclusion of variants at lower p value thresholds ( Figure S2).
By further binning admixed individuals into deciles of global European ancestry and determining the variance explained of the total liability (genetics and environment) by the PRS, we estimated a 1.34% increase in accuracy for each 10% increase in global European ancestry, replicating a previous analysis of height in the UK Biobank. 8 The slope of this linear relationship increased with increasing heritability but was not found to vary with the number of true causal variants ( Figure S3).

Population-specific weighting of European selected variants
Using a well-powered GWAS from our simulated African cohort (10,000 cases and 10,000 controls), we aimed to explore the potential accuracy gains achieved from a PRS with European selected variants, but with population-specific weighting of these variants. We applied three different weighting approaches to incorporate non-European effect sizes: (1) effect sizes from an African ancestry GWAS for all variants; (2) effect sizes from a fixed-effects meta-analysis of European and African ancestry GWAS for all variants, both having 10,000 cases and 10,000 controls; and (3) population-specific weights based on the local ancestry for an individual at each variant in the PRS ( Figure 2).
The most accurate PRS approach varied by the proportion of European ancestry. Populations with greater than 20% African ancestry benefitted significantly from the inclusion of population-specific weights ( Figure 2). Intermediate European ancestry individuals benefitted most from using fixed-effects meta-analysis weighting instead of European weights (r ¼ 0.64 versus 0.61, p ¼ 0.02). In contrast, variant weighting from an African ancestry GWAS instead of from European had higher accuracy in low European ancestry (r ¼ 0.65 versus 0.53; p ¼ 0.009) and African-only (r ¼ 0.64 versus 0.45; p ¼ 2.02 3 10 À44 ) populations. Individuals with high European ancestry had similar accuracy with weights from a fixed-effects meta-analysis as from European (r ¼ 0.73 in both; p ¼ 0.79) but decreased performance with the inclusion of weights from an African ancestry GWAS (r ¼ 0.62 versus 0.73; p ¼ 0.01).
No clear benefits, and in some cases significant decreases, were observed for local ancestry-informed weights compared to weights from a European or African ancestry GWAS or fixed-effects meta-analysis. Individuals with high, intermediate, and low European ancestry had lower accuracy using local ancestry-informed weights compared to the best weighting in each ancestry group: r ¼ 0.71 versus 0.73 (from fixed-effect or European weights; p ¼ 0.58); r ¼ 0.61 versus 0.64 (from fixed-effect weights; p ¼ 0.004); and r ¼ 0.63 versus 0.65 (from African weights; p ¼ 0.60), respectively ( Figure 2).

Performance of non-European PRS variant selection and weighting approaches
In our simulations, population-specific weighting of PRS SNPs discovered in European ancestry populations improved PRS accuracy; however, the disparity between performance in European ancestry individuals versus African and admixed ancestry individuals remained large. We aimed to explore the potential improvements in PRS that could be gained by including variants discovered in large, adequately powered African ancestry cohorts. Following clumping and thresholding of significant variants using LD and summary statistics from the simulated African populations, an average of 5,269 (range ¼ [4,462-6,071]) variants were included in the PRS (Table 1), reflective of the greater genetic diversity and decreased LD compared to Europeans. 23 In contrast, when constructing a PRS using the same LD and p value criteria applied to a fixed-effects meta-analysis of European and African ancestry, an average of only 92 (range ¼ [38-197]) variants were included in the PRS. This substantially smaller number was a result of few variants being statistically significant in both populations. Of the total number of variants included from the European GWAS, African ancestry GWAS, and fixed-effects meta-analysis, only 1.15%, 0.54%, and 15.0% on average were the exact causal variant from the simulation; an additional 3.72%, 5.34%, and 33.3% tagged at least one causal variant with r 2 > 0.2 (and were within 5 1,000 kb of that causal variant) in European ancestry populations and 3.45%, 2.42%, and 28.1% in African ancestry populations ( Table 1). The limited overlap between true causal and GWAS selected variants is a result of causal variants in our simulation arising from the total spectrum of allele frequencies, and therefore more likely to be rare, while GWAS is better powered to detect common variants in the study population from which they were identified. 2 These common variants may not adequately tag rare variants due to low correlation. 24 Overall, we constructed twelve PRSs with variants selected from GWASs in European or African ancestry populations or a fixed-effects meta-analysis of both (three scenarios) and weights from the same approaches plus an additional local ancestry-specific weighting method (four scenarios) (Figure 2). For Europeans, the highest PRS accuracy was achieved with European selected variants and weights (r ¼ 0.77; 95% CI ¼ [0.76, 0.77]); however, a similar accuracy was observed for weights from a fixed-effects meta-analysis (r ¼ 0.76; p ¼ 0.53). For Africans, the highest PRS accuracy was with African selected variants and weights from a fixed-effects meta-analysis (r ¼ 0.75; 95% CI ¼ [0.74, 0.75]); similar performance was observed with African variants and weights (r ¼ 0.74; p ¼ 0.28). For admixed individuals, the highest-performing PRS depended on the population ancestry percentage. In individuals with high European ancestry (>80%), the best PRS was with European selected variants and fixed-effects metaanalysis or European weights (r ¼ 0.73; 95% CI ¼ Again, no benefit was observed with the inclusion of local ancestry-specific weights for any set of PRS variants. Using a more stringent p value threshold and including fewer variants into the PRS did not result in a change of the best PRS variants and weights ( Figure S2).

Inclusion of variants from diverse populations
We found that including in the PRS variants discovered in African ancestry GWASs with population-specific weights results in less disparity in PRS accuracy across ancestries compared to European selected variants, confirming that GWASs in non-bottlenecked populations may yield a more unbiased set of disease variants. 25 For example, applying to individuals of African ancestry a PRS derived from GWAS variants and weights discovered in training data from the target population results in a 15.7% higher accuracy compared to using a PRS comprised of variants discovered in a European GWAS (also with African weights). In contrast, the gains in accuracy achieved by sourcing variants from ancestry-matched studies were much lower in European ancestry individuals. Compared to a PRS with variants from an African ancestry GWAS (with European weights), a PRS derived from a European GWAS (also with European weights) only gave a 3.9% higher accuracy. We also observed better generalization of PRSs based on African selected variants across all admixed groups (Figure 2).
Unlike in Europeans, a PRS with variants discovered in African ancestry populations generalized across ancestral groups with population-specific weighting. However, similar to the European PRS, the African ancestry-derived PRS (with African variants and weights) was estimated to have a 1.62% increase in the variance explained of the total trait liability by the PRS for each 10% increase in African ancestry ( Figure S4). Through a linear combination of the European and African ancestry-derived PRS (Material and methods), 10 the relationship between ancestry and accuracy diminished to less than a 0.4% increase per 10% increase of African ancestry ( Figure S4).
While the best single PRS for admixed individuals with at least 20% African ancestry selected variants based on a GWAS in an African ancestry population with weights from a fixed-effects meta-analysis, a linear combination of the European and African ancestry-derived PRSs had higher accuracy; this was particularly true at decreased African ancestry cohort sizes. We saw considerable improvements with the combined PRS over using a European-derived (European selected variants and weights) PRS, especially for low European ancestry (CEU < 20%), where even with 10-fold fewer African samples there was a 27.4% increase in PRS accuracy compared to the European-derived risk score and a 12.3% increase compared to a PRS with African ancestry selected variants and weights from a fixed-effects meta-analysis (Figure 3).

Allele frequency and LD of GWAS variants
We sought to understand what factors impacted PRS generalizability across the different variant selection Figure 1. Accuracy of European-derived PRSs by proportion of total ancestry Accuracy of PRSs, with variants and weights from a European GWAS, decreases linearly with increasing proportion of African ancestry. Variants and weights were extracted from a GWAS of 10,000 European cases and 10,000 European controls. PRS accuracy was computed as the Pearson's correlation between the true genetic risk and GWAS estimated risk score across 50 simulations in independent test populations of 5,000 Europeans, 5,000 Africans, and 5,000 admixed individuals. Admixed individuals were grouped based on their proportion of genome-wide European ancestry. Simulations assume 1,000 causal variants and a heritability of 0.5 to compute the true genetic risk. A p value of 0.01 and LD r 2 cutoff of 0.2 was used to select variants for the estimated risk score.
approaches. GWASs performed in European and African ancestry populations (for SNPs with MAF R 0.01) were both more likely to identify significant variants that were more common in their own population than in the other. Approximately 60% of variants identified in European ancestry populations had MAFs less than 1% in African ancestry populations and vice versa; however, given the underlying assumption of homogeneity, the smaller number of variants selected by a meta-analysis of the two populations tended to have more similar MAFs ( Figure 4A). Although European and African ancestry GWASs were both better powered to detect variants at intermediate frequencies within the same study population, GWASs in European ancestry populations may be unable to capture derived risk variants that have remained in Africa, which could be the result of genetic drift, whereas GWASs in African ancestry populations are not subject to this bias. 25 We also examined LD tagging of causal variants by GWAS selected variants within our simulated European and African populations. Each causal variant's LD score was calculated by summing up the LD r 2 between that causal variant and every GWAS tag variant within 5 1,000 kb. The LD scores calculated in European and African ancestry populations were highly correlated (Pearson's r > 0.7) for the GWAS and fixed-effects meta-analysis selected variants. Variants selected from a fixed-effects meta-analysis had the highest LD score correlation between populations, as expected given that the variants reached significance in both populations and therefore were more common with similar LD patterns ( Figure 4B). Since LD score correlation did not vary largely between simulations, we examined the raw LD scores for a single simulation in order to illustrate differences in LD score magnitude not captured by the Pearson's correlation.
We found that European selected variants had higher LD scores in European compared to in African ancestry populations; however, variants selected from an African ancestry GWAS tagged causal variants in both populations more strongly ( Figure 4C). This is unlikely to be due to the larger number of African selected variants, as the results were unchanged following normalization of LD scores by dividing the total LD score for each causal variant by PRS size ( Figure S5). Fixed-effects meta-analysis variants had similar LD score magnitudes. However, while a greater proportion of the fixed-effects meta-analysis selected variants were causal, fewer were strong tags for causal variants not directly identified, highlighting the potential need for a model that does not assume homogeneity of effects for tag variants. 26 Additionally, causal variants with identical effect sizes may have differing allele frequencies across populations, which would result in heterogeneous allele substitution effects; this helps indicate why a fixed-effects meta-analysis may not be the optimal approach.
Application to real data To corroborate our simulation findings, we undertook an analysis of 96 PRSs developed for the prediction of multiple complex traits in European and African ancestry individuals from the UK Biobank, including HbA1c levels, asthma status, and prostate cancer (Table S1). We tested variant selection strategies based on p value thresholding and LD clumping of genome-wide summary statistics 15 computed in independent European or African ancestry cohorts and variant weights from the same approaches with an additional weighting from a fixed-effects metaanalysis across populations. Multiple p value thresholds and weighting strategies were tested to assess the PRS Figure 2. PRS construction approaches and performance in admixed individuals Using significant variants from an African ancestry GWAS with population-specific weights results in less disparity in PRS accuracy across populations. PRSs were constructed using variants and weights selected from either a European or African population (10,000 cases, 10,000 controls each) or a fixed-effects meta-analysis of both. An additional local ancestry-specific method was used for PRS weighting. Performance, measured as the Pearson's correlation between the true and GWAS estimated risk score, is shown across 50 simulations. Simulations assume 1,000 causal variants and a heritability of 0.5 to compute the true genetic risk. A p value of 0.01 and LD r 2 cutoff of 0.2 was used to select variants for the estimated risk scores. accuracy in African ancestry individuals relative to European ancestry individuals across parameters.
In UK Biobank Europeans, a GWAS significant Europeanderived PRS (with European variants and weights) had a partial-R 2 of 1.6%, 1.2%, and 1.5%, respectively, for HbA1c levels, asthma, and prostate cancer; the same PRS applied to African ancestry individuals with approximately 13.1% European ancestry 8 only explained 0.07%, 0.38%, and 0.19% ( Figure S6). Although the proportion of variation explained by the PRS (partial-R 2 ) was consistently lower in UK Biobank African ancestry individuals compared to Europeans, prediction was improved through the inclusion of variants or weights discovered in an independent African ancestry cohort across all traits ( Figure S6). Interestingly, we found that a linear combination of the best-performing PRS with European-discovered variants and African ancestry-discovered variants improved accuracy substantially (Table S2), supporting our simulation finding that a combined PRS that includes variants from the target population, even at smaller sample sizes, is the optimal approach for constructing PRS in admixed and non-European individuals.

Discussion
Our work shows that incorporating variants selected from European GWASs into a PRS can result in less-accurate prediction in non-European and admixed populations in comparison to variants selected from a well-powered African ancestry GWAS. Through simulations and application to real data analysis of multiple complex traits, we provide empirical evidence that supports the use of a linear mixture of multiple population-derived PRSs to remove bias with ancestry and achieve higher accuracy in admixed individuals with currently available non-European sample sizes. We also demonstrate the anticipated improvements in PRS prediction accuracy that may be achieved with the inclusion of diverse individuals in GWAS, highlighting the need to actively recruit non-European populations.
Our simulation finding that prediction accuracy of a European-derived PRS linearly decreases with increasing proportion of African ancestry in admixed African and European populations is consistent with a recent study of height, where there was a 1.3% decrease for each 10% increase in African ancestry. 8 This decrease in prediction accuracy has been attributed to LD and allele frequency differences, as well as differences in effect sizes across populations contributing to height. 8 Our work adds further insights into this reduction in PRS accuracy, showing that (1) it exists in the absence of trans-ancestry effect size differences consistent with previous theoretical models that did look at admixture, 2,5 and (2) variants selected from an African population may not have these same biases. Recent work found that known GWAS loci discovered in Europeans have allele frequencies that are upwardly biased by 1.15% in African ancestry populations, which results in a misestimated PRS, a phenomenon that likely arises due to population bottlenecks and ascertainment bias from GWAS arrays. 25 In our simulation study, which was not impacted by ascertainment bias, we show that GWASs in African ancestry populations also identify variants with population allele frequency differences; however, these variants have more consistent LD tagging of causal variants across populations. Our observations support the hypothesis that well-powered African ancestry GWASs will be able to more accurately capture disease-associated loci that are more broadly applicable across populations, due to having undergone less genetic drift. 25 A major advantage of our study is having large simulated European and African ancestry cohorts to provide guidelines for developing the best possible PRSs in admixed individuals with current and future GWASs. Through our exploration of 12 PRSs, with various variant selection and weighting approaches, we recapitulate recent results applying similar PRS strategies to an admixed Hispanic/ Latino population. 9 For individuals with intermediate proportions of European ancestry (20%-80%), we also see improvements using European selected variants and population-specific or fixed-effects meta-analysis weights; however, as non-European cohorts get increasingly large, Figure 3. Impact of African sample size on PRS accuracy and generalization PRS accuracy in diverse populations can be improved by including data from an African ancestry GWAS with smaller sample sizes than in a European GWAS. The number of African samples used in the GWAS and subsequent PRS construction was decreased to reflect availability of diverse samples in real data. Analysis was conducted assuming 1%, 5%, 10%, 50%, and 100% (matched size of European dataset) of the total African ancestry cases. Average accuracy and the 95% CI were reported across the 50 simulations for different variant selection and weighting approaches. Simulations assume 1,000 causal variants and a heritability of 0.5 to compute the true genetic risk. A p value of 0.01 and LD r 2 cutoff of 0.2 was used to select variants for the estimated risk score. A linear mixture of single-population PRSs (a 1 EUR þ a 2 AFR), with variants and weights selected from that population, was also tested in the admixed population. The mixture coefficients (a 1 and a 2 ) were estimated in an independent African ancestry testing population.
it will be imperative to perform variant discovery in these populations, as gains in accuracy with weight adjustment of European selected variants will be limited, especially in individuals with higher proportions of non-European ancestry.
Current methods for improving PRS accuracy in diverse populations have prioritized the inclusion of variants from European GWASs, as these have higher statistical power to identify trait-associated loci. For example, one such approach uses a two-component linear mixed model to allow for the incorporation of ethnic-specific weights. 6 Another method creates ancestry-specific partial PRSs for each individual based on the local ancestry of variants selected from a European GWAS. 7 This approach was found to improve trait predictability, compared to a traditional PRS with population-specific or European weights, in East Asians for BMI but not height. 7 In contrast, implementing this local-ancestry method 7 in our simulation, we found that PRS accuracy was higher with African or fixedeffects meta-analysis weighting than with local ancestry in admixed African ancestry populations. Our results suggest that true equality in performance will be difficult to obtain solely based on European-identified variants even with local ancestry-adjusted weights. Although local ancestry weighting may have greater benefits when complete sharing across populations is not assumed, we show that in the absence of population-specific factors, the optimal PRS approach involves using variants identified in a large African population and population-specific weighting.
To create a multi-ancestry PRS without incorporating local ancestry, Márquez-Luna et al. 10 use a mixture of PRSs, taking advantage of existing well-powered GWAS studies and supplementing with additional information that can be gained from a smaller study in the population of interest. We investigate this approach in the context of varying admixture proportions and find that it achieved high accuracy across all admixed individuals, was not biased by ancestry, and significantly improved performance over a European-only PRS with 10-fold fewer African ancestry cases. Thus, a combination of multiple single-population PRSs may be the best currently available approach for admixed individuals, and this approach will likely continue to improve as the individual PRSs are further developed.
An important finding of our work that the inclusion of variants from an African ancestry population results in A B C less disparity in PRS accuracy across other populations illustrates the need to recruit diverse populations in GWASs and make these data readily available. Large consortia such as H3Africa, PAGE, the Million Veterans Program, and All of Us are undertaking efforts to support this initiative. Based on our analysis of HbA1c, asthma, and prostate cancer in the UK Biobank, we find that improvement in PRS prediction accuracy is currently possible by incorporating findings from GWASs in African ancestry populations, albeit with lower power. In addition to smaller sample sizes, this potential improvement may be limited by ascertainment bias in what SNPs are included on genotyping arrays and poorer imputation in non-Europeans. GWAS arrays and their imputation have substantially higher coverage among Europeans, and this may result in decreased PRS portability of findings across traits; in such situations, whole-genome sequencing in diverse populations may be needed in order to improve accuracy. 27,28 Our study and others support the immense genetic diversity that can be unlocked by studying underrepresented populations to both eliminate the disparity in genetics for precision medicine and provide insights into disease biology for all populations. 25,27,29 Although our simulation study provides important insight into the future of PRS use, it has important limitations. First, while coalescent simulations allow for decreased computational burden, model assumptions may result in inaccurate long-range LD, especially for whole-genome simulations. 30 However, given we only simulated chromosome 20, biases are expected to be modest. 30 We also use a case-control framework for our simulation; therefore, power and potential differences in population PRS accuracy may be even higher if a quantitative trait was used. In addition, our simulations assume random mating among admixed individuals and therefore do not reflect the more complex assortative mating that may be observed, which may impact the distribution of local ancestry tract lengths in our simulation and therefore hinder the improvement of PRS accuracy by local ancestry weighting. 31 Also, although we provide evidence to suggest the contribution of population differences in allele frequency and LD tagging of causal variants to loss of PRS accuracy with varying ancestry, we do not delineate how each of these factors decreases accuracy independently; this is a direction for future work. Finally, we have only simulated individuals from Yoruba, a West African population, which is not representative of the greater diversity in Sub-Saharan Africa. 32 Future studies must be done to ensure our findings can be extended to individuals from other regions of Africa.
Overall, our findings support the concern that while studies have demonstrated the potential clinical utility of PRSs, adopting the current versions of these scores could contribute to inequality in healthcare. 4 Going forward, future studies should prioritize the inclusion of diverse participants, and care must be taken with the interpretation of currently available risk scores. While statistical approaches may offer improvements in accuracy compared to current European-derived risk scores, in order to truly diminish the disparity and achieve PRS accuracies similar to in European ancestry populations we must actively recruit and study diverse populations.

Data and code availability
The code generated during this study is available at GitHub under taylorcavazos/PRS_Admixture_Simulation.