Demographic history impacts stratification in polygenic scores

Large genome-wide association studies (GWAS) have identified many loci exhibiting small but statistically significant associations with complex traits and disease risk. However, control of population stratification continues to be a limiting factor, particularly when calculating polygenic scores where subtle biases can cumulatively lead to large errors. We simulated GWAS under realistic models of demographic history to study the effect of residual stratification in large GWAS. We show that when population structure is recent, it cannot be fully corrected using principal components based on common variants—the standard approach—because common variants are uninformative about recent demographic history. Consequently, polygenic scores calculated from such GWAS results are biased in that they recapitulate non-genetic environmental structure. Principal components calculated from rare variants or identity-by-descent segments largely correct for this structure if environmental effects are smooth. However, even these corrections are not effective for local or batch effects. While sibling-based association tests are immune to stratification, the hybrid approach of ascertaining variants in a standard GWAS and then re-estimating effect sizes in siblings reduces but does not eliminate bias. Finally, we show that rare variant burden tests are relatively robust to stratification. Our results demonstrate that the effect of population stratification on GWAS and polygenic scores depends not only on the frequencies of tested variants and the distribution of environmental effects but also on the demographic history of the population.

utility is limited by lack of transferability between populations [27] and between subgroups 118 within populations [28]. This may be due in part to residual stratification in polygenic 119 scores. To understand the behavior of polygenic scores under the perpetual and recent 120 structure models, we simulated GWAS (N=9,000) of a heritable phenotype with a genetic 121 architecture similar to that of height. We used GWAS effect sizes to calculate polygenic 122 scores in an independent sample (N=9,000) and subtracted the true genetic values for each 123 individual to examine the spatial bias in polygenic scores due to stratification.

124
Under both perpetual and recent structure models, residual polygenic scores are spatially 125 structured, recapitulating environmental effects even when 100 common-PCs are used as 126 covariates in the GWAS (Fig. 3). LMMs perform similarly (Fig. S3). This is due to the fact 127 that when population stratification is not fully corrected, the effect sizes of variants that 128 are correlated with the environment tend to be over or underestimated depending on the 129 Page 6 direction and strength of correlation (Fig. S4). Stratification in residual polygenic scores is minimal when the causal variants are known, but not when the score is constructed from the 131 most significant SNPs ('lead SNPs') ( Fig. 3)-almost always the case in practice. Picking 132 the most significant SNPs tends to enrich for variants that are more structured than the 133 causal variants. Thus, polygenic scores are especially prone to residual stratification when 134 constructed using SNPs in 'sub-significant' peaks where the effect of stratification dominates 135 over relatively small causal variant effect sizes.

136
The effect of stratification in more complex models 137 In reality, genetic structure in most studies is more complex than either model discussed 138 above. Most populations are genetically heterogeneous and each genome is shaped by pro-139 cesses such as ancient and recent admixture, non-random mating and selection, all of which 140 vary both spatially and temporally. The present-day population of Britain, for example, is 141 the result of a complex history of admixture and migration [9,21]. Thus, restricting analysis 142 even to the 'White British' subset of UK Biobank involves population structure on multiple 143 time scales. To study these effects, we simulated under a model based on the demographic history of Europe and geographic structure of England and Wales, while maintaining the 145 same degree of structure as the previous models (Fig. 4). In addition to recent geographic 146 structure, we simulated an admixture event 100 generations ago between two populations, 147 each of which are themselves the result of mixtures between many ancient populations ( Fig.   148 4). We varied the admixture fraction from the two source populations to create a North-

149
South ancestry cline and sampled individuals to mimic uneven sampling in the UK Biobank 150 (Fig. 4, Methods).

151
The results under this model are very similar to the recent structure model in that when 152 the environmental effect is smoothly distributed, it cannot be corrected using common-PCA 153 as population structure is largely recent (Fig. 4). Note also that correction is not complete 154 even with rare-PCA as seen from the biased polygenic scores of individuals from Cornwall.

155
This is not due to reduced migration in the region ('edge effects') but rather to uneven 156 sampling (only 17 individuals sampled from Cornwall as opposed to 250 under uniform Figure 3: Residual stratification in effect size estimates translates to residual spatial structure in the polygenic score in the (A) recent and (B) perpetual structure models. The simulated phenotype in the training sample has a heritability of 0.8, distributed over 2,000 causal variants. Each small square is colored with the mean residual polygenic score for that deme in the test sample, averaged over 20 independent simulations of the phenotype.
In each panel, the rows represent different methods of PCA correction and columns represent two different methods of variant ascertainment. 'Causal' refers to causal variants with p-value < 5×10 -4 , and 'lead SNP' refers to a set of variants, where each represents the most significantly associated SNP with a p-value < 5×10 -4 in a 100kb window around the causal variant. The simulated environment is shown on the left. For the sharp effect, the affected deme is highlighted with an asterisk.
Page 8

169
In practice, however, sample sizes for sibling-based studies are much smaller than stan-

204
The effect of population structure on GWAS depends on the amount of structure, the 205 frequency of the variants tested and the distribution of confounding environmental effects.

206
Here, we demonstrated that it also depends on the demographic history of the population 207 in a way that is not fully captured by the degree of structure as summarized by F ST and 208 genomic inflation. Consequently, to fully correct for population structure, it is necessary 209 to know not only the degree of realized structure, but also the demographic history that 210 generated it, and the likely distribution of environmental effects.

211
Generally, PCA (or mixed models) based on common variants will inadequately capture 212 and correct population structure with a recent origin. genes.

234
Even imperfect correction for population structure is probably sufficient to limit the

245
Our study focused on population structure arising from ancient admixtures and geo-246 graphic structure because these are relatively well-understood and easy to model. However,

247
our results generalize to any type of population structure, for example due to social strati-248 fication or assortative mating. What we refer to as local environmental effects also includes 249 socially structured factors like cultural practices. Ultimately, no single approach can com- Where λ 300k location is the observed value (≈ 12) given a sample size of 300,000 as in Ref. [12].
Plugging this in, we get an expected value of λ 9k location ≈ 1.36. To match this approximately,

Page 14
we set the migration rate to a fixed value of 0.05 and 0.07 for the 'recent' and 'perpetual' models, respectively (Table S1).

282
We parameterized the 'complex' model with two migration rates, m 1 and m 2 , where 283 m 1 represents the migration rate between the source populations mixing 100 generations 284 before present (2.5Kya) and m 2 represents the migration rate between adjacent demes in 285 the grid (Fig. 4). We selected m 1 and m 2 in a step-wise manner, first setting m 1 = 0.

334
We calculated genomic inflation (λ p ) for non-heritable phenotypes as is the p th percentile of the observed association test statistic and F −1 χ 2 (p) is the quantile 336 function of the χ 2 distribution with 1 degree of freedom.

337
Sibling-based tests 338 We conducted structured matings by sampling pairs of individuals from the same deme 339 and generated the haplotypes of each child by sampling haplotypes, with replacement, from 340 each parent without recombination. We generated heritable phenotypes as described in the  burden tends to behave more like a common variant in its spatial distribution.