Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies

Genetic predictions of height differ among human populations and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of subsignificant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection and claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.

Here we focus on a series of recent studies-some involving co--authors of 72 the present manuscript-that have reported evidence of polygenic adaptation at 73 alleles associated with height in Europeans. One set of studies observed that height--74 increasing alleles are systematically elevated in frequency in northern compared to 75 southern European populations, a result that has subsequently been extended to 76 ancient DNA.[4-11]Another study using a very different methodology (singleton 77 density scores, SDS) found that height--increasing alleles have systematically more 78 recent coalescent times in the United Kingdom (UK) consistent with selection for 79 increased height over the last few thousand years.
[12] 80 All of these studies have been based on SNP associations, in most cases with 81 effect sizes discovered by the GIANT Consortium, which most recently combined 79 82 individual GWAS through meta--analysis, encompassing a total of 253,288 83 individuals. [13,14] Here, we show that the selection effects described in these 84 studies are severely attenuated and in some cases no longer significant when using 85 summary statistics derived from the UK Biobank, an independent and larger single 86 study that includes 336,474 genetically unrelated individuals who derive their 87 ancestry almost entirely from British Isles (identified as "white British ancestry" by 88 the UK Biobank) (Supplementary Table  S1). The UK Biobank analysis is based on a 89 single cohort drawn from a relatively homogeneous population enabling excellent 90 control of potential population stratification. Our analysis of the UK Biobank data 91 confirms that almost all genome--wide significant loci discovered by the GIANT 92 consortium are real associations, and that the two datasets have high concordance 93 for low P value SNPs which do not reach genome--wide significance 94 (Supplementary Figure S1; genetic correlation between the two height studies is 95 0.94 [se=0.0078]). However, our analysis yields qualitatively different conclusions 96 with respect to signals of polygenic adaptation. 97 98 Results 99 100 We began by estimating "polygenic height scores"-sums of allele 101 frequencies at independent SNPs from GIANT weighted by their effect sizes-to 102 study population level differences among ancient and present--day European 103 samples. We used a set of different significance thresholds and strategies to correct 104 for linkage disequilibrium as employed by previous studies, and replicated their 105 signals for significant differences in genetic height across populations. qualitatively similar only when we restricted to independent genome--wide 118 significant SNPs in GIANT and the UK Biobank (P < 5x10 --8 ) (Supplementary Figure  119 S2b). This replicates the originally reported significant north--south difference in the 120 allele frequency of the height--increasing allele [4] or in genetic height[5] across 121 Europe, as well as the finding of greater genetic height in ancient European steppe 122 pastoralists than in ancient European farmers,[6] although the signals are 123 attenuated even here. This suggests that tests of polygenic adaptation based on 124 genome--wide significant SNPs may be relatively insensitive to confounding 125 (Supplementary Figure S2b), and that confounding due to stratification is a 126 particular danger for sub--significant SNPs (Figure 1a, Supplementary Figure S2a). 127 128 129

222
We obtained direct confirmation that population structure is more correlated 223 with effect size estimates in GIANT than to those in the UK Biobank. Figure 2a  224 shows that the effect sizes estimated in GIANT are highly correlated with the SNP 225 * * * * * * * * *°°°°β Figures S10--S12). We also do not see a strong correlation with PC loadings in our 235 UK Biobank estimates computed using unrelated individuals (UKB) (Figure 2a). 236 However, the UK Biobank estimates including individuals not of British Isles 237 ancestry and not correcting for population structure (UKB all no PCs) show the 238 same stratification effects as GIANT and NG2015 sibs (Supplementary Figure  S10--239 S12). Similarly, we find that alleles that are more common in the Great Britain 240 population (GBR) than in the Tuscan population from Italy (TSI) tend to be 241 preferentially height--increasing according to the GIANT and NG2015 sibs estimates 242 but not according to the UKB estimates (Figure 2c, Supplementary Figures S11, 243 S12). 244 The tSDS analysis should be robust to the type of population structure 245 discussed above.
[12] However, there is a north--south cline in singleton density in 246 Europe due to the lower genetic diversity in northern than in southern Europeans, 247 with singleton density being lower in northern than in southern regions.
[21] As a 248 consequence, SDS tends to be higher in alleles more common in GBR than in TSI 249 (Figure 2d). This cline in singleton density coincidentally parallels the phenotypic 250 cline in height and the major axis of genome--wide genetic variation. Therefore, 251 when we perform the tSDS test using GIANT--estimated effect sizes and P values, we 252 find fewer singletons (corresponding to higher SDS) around the inferred height--253 increasing alleles which tend, due to the uncontrolled population stratification in 254 GIANT, to be at high frequency in northern Europe (Figures 2c). This effect does not 255 appear when we use UK Biobank summary statistics because of the much lower 256 level of population stratification and more modest variation in height. We find that 257 SDS is not only correlated with GBR--TSI allele frequency differences, but with 258 several principal component loadings across all SNPs (Figure 2b), and that these 259 SDS--PC correlations often coincide with correlations between GIANT--estimated 260 effect sizes and PC loadings (Figure 2a). 261 We further find that the tSDS signal which is observed across the whole 262 range of P values in some summary statistics can be mimicked by replacing SDS 263 with GBR--TSI allele frequency differences (Figures  3a,  3c, Supplementary Figures 264 S7--S8, S13--S14), suggesting that the tSDS signal at non--significant SNPs may be 265 driven in part by residual population stratification. As with the polygenic score 266 analysis, a small but significant effect is observed when we restrict to genome--wide 267 significant SNPs (P < 5 x 10 --8 ). This effect persists when using UK Biobank family--268 based estimates for genome--wide significant SNPs (Figure 3b), and is not driven by 269 allele frequency differences between GBR and TSI (Figure 3d

296
The patterns shown here suggest that the positive tSDS values across the whole range of P 297 values is a consequence of residual stratification. At the same time, the increase in tSDS at 298 genome--wide significant, LD--independent SNPs in (b) cannot be explained by GBR --TSI 299 allele frequency differences as shown in (d). Binning SNPs by P value without LD--pruning 300 can lead to unpredictable patterns at the low P value end, as the SNPs at the low P value end 301 are less independent of each other than higher P value SNPs (Supplementary Figure  S15).
NG2015 sibs UKB all no PCs UKB WB no PCs UKB UKB sibs NG2015 sibs UKB all no PCs UKB WB no PCs UKB UKB sibs 304 305 number of SNPs than previously thought. Indeed, a tSDS signal which is driven by 306 natural selection is not expected to lead to an almost linear increase over the whole 307 P value range in a well--powered GWAS. Instead, we would expect to see a greater 308 difference between highly significant SNPs and non--significant SNPs, similar to the 309 pattern observed in the UK Biobank (Figure 3a).   Lastly, we asked whether any remaining differences in polygenic height 333 scores among populations are driven by polygenic selection by using the Qx 334 framework to test against a null model of genetic drift.
[5] We re--computed 335 polygenic height scores in the POPRES dataset for this analysis as it has larger 336 sample sizes of northern and southern Europeans than the 1000 Genomes 337 project.
[22] We computed height scores using independent SNPs that are 1) 338 genome--wide significant in the UK Biobank ("gw--sig", P < 5 x 10 --8 ) and 2) sub--339 significantly associated with height ("sub--sig", P < 0.01) in different GWAS datasets. 340 For each of these, we tested if population differences were significant due to an 341 overall overdispersion (PQx), and if they were significant along a north--south cline 342 (Plat) ( Therefore, we remain cautious about interpreting any residual signals as "real" 360 signals of polygenic adaptation. 361 362 Discussion 363 364 We have shown that estimates of population differences in polygenic height 365 scores are strikingly attenuated with the UK Biobank GWAS data relative to 366 previous analyses. We find some evidence for population--level differences in 367 genetic height, but it can only be robustly seen at highly significant SNPs, because 368 any signal at less significant P values is dominated by the effect of residual 369 population structure. Even genome--wide significant SNPs in these analyses may be 370 subtly affected by population structure, leading to continued overestimation of the 371 effect. Thus, it is difficult to arrive at any quantitative conclusion regarding the 372 proportion of the population differences that are due to statistical biases vs. 373 population stratification of genetic height. It is equally challenging to test whether 374 differences in genetic height are due to adaptation in response to environmental 375 differences, migration and admixture (e.g. fraction of Steppe pastoralist ancestry), 376 or relaxation of negative selection. Further, estimates of the number of independent 377 genetic loci contributing to height variation are sensitive to and likely confounded 378 by residual population stratification. 379 We conclude that while effect estimates are highly concordant between 380 GIANT and the UK Biobank when measured individually (Supplementary Tables  381  S4-- polygenic adaptation, which are driven by small systematic shifts in allele 385 frequency, they can create highly significant artificial signals especially when SNPs 386 that are not genome--wide significant are used to estimate genetic height. In no way 387 do our results question the reliability of the genome--wide significant associations 388 discovered in the GIANT cohort or the validity of the statistical methodology used in 389 previously reported polygenic tests for adaptation. However, we urge caution in the 390 interpretation of genome--wide signals of polygenic adaption that are based on large 391 number of sub--significant SNPs-particularly when using effect sizes derived from 392 meta--analysis of heterogeneous cohorts which may be unable to fully control for 393 population structure. 394 395 396 Materials and Methods 397 398 Genome--wide association studies (GWAS) 399 We analyzed height using publicly available summary statistics that were 400 obtained either by meta--analysis of multiple GWAS or by a GWAS performed on a 401 single large population. Asia (PJL, BEB) and East Asia (CHB, JPT) (Figure 1a, Supplementary Figures S2--436  S6). In total, we analyzed 1112 ancient individuals, and 1005 modern individuals 437 from 10 different populations in the 1000 genomes project (Supplementary  Table  438 S2). We used the allele frequency differences between the GBR and TSI populations 439 for a number of analyses to study population stratification (Figures 3c, 3d, 440 Supplementary Figures S11--S14). We also analyzed 19 European populations 441 from the POPRES[22] dataset with at least 10 samples per population (Figure 4, 442 Supplementary Table S3, Supplementary Figures S16--S18). 443 444 All ancient samples had 'pseudo--haploid' genotype calls at 1240k sites 445 generated by selecting a single sequence randomly for each individual at each SNP 9 . 446 Thus, there is only a single allele from each individual at each site, but adjacent 447 alleles might come from either of the two haplotypes of the individual. We also re--448 computed scores in present--day 1000 genomes individuals using only pseudo--449 haploid calls at 1240k sites to allow for a fair comparison between ancient and 450 modern samples (Supplementary Figure S6) Al polygenic scores are plotted in centered standardized form ( was used to make independent "clumps" of SNPs for each GWAS at different P value 471 thresholds. This procedure selects SNPs below a given P value threshold as index 472 SNPs to start clumps around, and then reduces all SNPs (P < 0.01) that are in LD 473 with these index SNPs (above an r 2 threshold, 0.1) and within a physical distance of 474 them (1 Mb) into clumps with them. Clumps are preferentially formed around index 475 SNPs with the lowest P value in a greedy manner. The index SNP from each clump is 476 then picked for further polygenic score analyses. The algorithm is also greedy such 477 that each SNP will only appear in one clump if at all. We clumped each GWAS to 478 obtain (2) a set of independent sub--significant SNPs associated with height (P < 479 0.01) similarly to ref.
[7], and (3) a set of genome--wide significant SNPs associated 480 with height (P < 5 x 10 --8 ). The 1000 genomes phase 3 dataset was used as the 481 reference panel for computing LD for the clumping procedure. 482 483 The estimated effect sizes for these three sets of SNPs from each GWAS was 484 used to compute scores. Only autosomal SNPs were used for all analyses to avoid 485 creating artificial mean differences between populations with different numbers of 486 males and females. 487 488 The 95% credible intervals were constructed by assuming that the posterior 489 of the underlying population allele frequency is independent across loci and 490 populations and follows a beta distribution. We updated a Uniform prior 491 distribution with allele counts from ancient and modern populations to obtain the 492 posterior distribution at each locus in each population. We estimated the variance of 493 the polygenic score ! using the variance of the posterior distribution at each locus, 494 and computed the width of 95% credible intervals as 1.96 ! for each population. were computed using a block--jackknife approach, where each block of 1% of all 526 SNPs ordered by genomic location was left out and the Spearman correlation 527 coefficient was computed on the remaining SNPs. We also compared the tSDS score 528 distributions for only genome--wide significant SNPs (Figure 3b).

530
Population structure analysis 531 To compute SNP loadings of the principal components of population 532 structure (PC loadings) in the 1000 genomes data (Figure 2, Supplementary 533 Figure S10), we first computed PC scores for each individual. We used SNPs that 534 had matching alleles in 1000 genomes, GIANT and UK Biobank, that had minor allele 535 frequency > 5% in 1000 genomes, and that were not located in the MHC locus, the 536 chromosome 8 inversion region, or regions of long LD. After LD pruning to SNPs 537 with r 2 < 0. Characterization of stratification effects in GIANT and UK Biobank 688 To better understand how stratification influences the differences observed 689 between GIANT and UK Biobank, we grouped SNPs by their P value in GIANT and by 690 their P value in UK Biobank (Supplementary Figure S1). First, we observe that 691 SNPs with low GIANT P values, but not SNPs with low UK Biobank P values, show 692 greater differences in estimated effect size (Supplementary Figure  S1a). However, 693 the relative difference in beta values decreases for lower P values, and the 694 correlation among betas approaches one at the most significant SNPs 695 (Supplementary Figure S1b,c). In GIANT, more significant SNPs exhibit a greater 696 correlation between effect estimates and GBR--TSI allele frequency differences, while 697 this is not observed in the UK Biobank (Supplementary Figure  S1d). Consequently, 698 the difference in UK Biobank and GIANT effect size estimates is more correlated to 699 GBR--TSI allele frequency differences at more significant SNPs (Supplementary 700 Figure S1e). This suggests that while stratification effects are larger at more 701 significant SNPs, the magnitude of stratification--independent effects is even larger, 702 which may be why polygenic score results converge when using only the most 703 significant SNPs. 704 Next, we investigated how P value inflation as measured by !" is influenced 705 by stratification, by grouping SNPs into deciles based on their GBR--TSI allele 706 frequency difference (Supplementary Figure S12). To guard against the effect of 707 observing lower P values at more differentiated SNPs simply because those SNPs 708 are more common on average, we restrict this analysis to SNPs with mean MAF > 709 20%. We find that !" is not much increased for SNPs that are more differentiated 710 between populations. However, in the presence of stratification, there is a large 711 difference between !" of height increasing alleles and !" of height decreasing 712 alleles (Supplementary Figure S12b). Similarly, there are large effects on the 713 frequency with which a SNP is estimated to be height increasing or height 714 decreasing. In GIANT, SNPs in the highest decile of GBR--TSI allele frequency 715 differences are 52% more often estimated to be height increasing than height 716 decreasing, while these rates are close to even in the UK Biobank (Supplementary 717 Figure S12a).         TSI  IBS  GBR  CEU  TSI  IBS  GBR  CEU  TSI  IBS  GBR  CEU   TSI  IBS

961
LD score regression can be used to detect residual stratification effects in summary 962 statistics, and so we tested whether LDSC confirms our hypothesis of residual stratification.

963
We detect a greatly inflated intercept estimate of 9.42 in UKB all no PCs, but only a 964 moderately increased intercept value in GIANT and an intercept less than one in NG2015