The Impact of Population Demography and Selection on the Genetic Architecture of Complex Traits

Population genetic studies have found evidence for dramatic population growth in recent human history. It is unclear how this recent population growth, combined with the effects of negative natural selection, has affected patterns of deleterious variation, as well as the number, frequency, and effect sizes of mutations that contribute risk to complex traits. Because researchers are performing exome sequencing studies aimed at uncovering the role of low-frequency variants in the risk of complex traits, this topic is of critical importance. Here I use simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome. I show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to a population that did not expand. Under a model where a mutation's effect on a trait is correlated with its effect on fitness, rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, when using a single-marker test, for a given false-positive rate and sample size, recent population growth decreases the expected number of significant associations with the trait relative to the number detected in a population that did not expand. However, in a model where there is no correlation between a mutation's effect on fitness and the effect on the trait, common variants account for much of the additive genetic variance, regardless of demography. Moreover, here demography does not affect the number of significant associations detected. These findings suggest recent population history may be an important factor influencing the power of association tests and in accounting for the missing heritability of certain complex traits.


AUTHOR SUMMARY
Many human populations have dramatically expanded over the last several thousand years. I use population genetic models to investigate how recent population expansions affect patterns of mutations that reduce reproductive fitness and contribute to the genetic basis of complex traits (including common disease). I show that recent population growth increases the proportion of mutations found in the population that reduce fitness. When mutations that have the greatest effect on reproductive fitness also have the greatest effect on a complex trait, more of the heritability of the trait is due to mutations at very low-frequency in populations that have recently expanded, as compared to populations that have not. Also, under this model, for a given sample size and false-positive rate, fewer variants show statistically significant associations with the trait in the population that has expanded than in one that has not. Both of these findings suggest that recent population growth may make it more difficult to fully elucidate the genetic basis of complex traits that are directly or indirectly correlated with fitness.

INTRODUCTION
Genome-wide association studies (GWAS) have successfully detected associations between hundreds of common single nucleotide polymorphisms (SNPs) and complex traits in humans [1,2]. While this catalog of genes has revealed important biological insights, for most traits, the discovered associations can only account for a small fraction of the heritability for these traits measured from family-based studies [3]. This difference in the heritability observed in familial studies and the heritability explained by associated SNPs has been termed "missing heritability", and there is tremendous interest in the human genetics community to find it [3][4][5].
One possibility that has received particular attention is that the missing heritability lies in rare variants that have large effect sizes [3]. Because a risk variant is rare in the population, an association between the variant and the phenotypes of interest may not have been detected using traditional GWAS. Instead, at present, such variants must be assayed through direct sequencing.
Due to technological advances (e.g. next-generation sequencing) [6,7], combined with newer analytical methods designed for analyzing full sequence data [8-11], exome and full genome sequencing studies are now being implemented in human genetics. The progression to sequencing data has already proved fruitful for the identification of causal mutations for several Mendelian diseases [6,12-16]. Full sequence data [17,18] is starting to reveal a richer picture of low-frequency genetic variation (minor allele frequency <0.5%), which may, in turn, increase the community's ability to implicate rare variants in risk of complex disease [4,19-25]. Further, such studies should allow researchers to empirically determine the extent to which rare variants account for the missing heritability of complex traits [26-30]. However, before these new technological and methodological advances can reach their full potential, a more thorough understanding of low-frequency genetic variation in multiple human populations is essential.
To learn about patterns of rare genetic variation, several studies have sequenced hundreds of genes or complete exomes in thousands of individuals [31-34]. These studies have made two important discoveries. First, they have found a larger number of rare variants than was expected under previous models of human population history. It has been argued that this excess of rare variants can be explained by the recent explosion in human population size [31,32,35-37]. Second, these studies have documented a plethora of rare nonsynonymous SNPs that are likely evolutionarily deleterious and may be of medical relevance.
Comparatively less work has been done, however, to examine the implications that recent population history has had on the architecture of complex traits (but see the recent manuscript by Simons et al. [38]). It is unclear whether population history, and recent population growth in particular, affects the number, frequency, and effect sizes of mutations that contribute risk to complex traits. Addressing this question is critical for finding the "missing heritability" in different populations, as well as for performing the most powerful association studies to implicate specific variants in disease risk. Over a decade ago, it was recognized that the power to associate common variants with complex disease varied across populations [39-41]. This was largely due to asymmetry in the extent of linkage disequilibrium (LD) across populations as a result of differences in demographic history [42-44]. While the issue of LD is less relevant for rare variants, the choice of population for rare-variant association studies has received substantially less attention, despite its potential importance.
Here I use population genetic models to investigate the effect of recent population growth on patterns of deleterious genetic variation, the architecture of complex traits, and the ability to associate causal variants with the trait in models where a proportion of the trait's heritability is accounted for by mutations in a subset of the exome. Specifically, I show that recent population growth increases the input of deleterious mutations into the population, directly causing a proportional excess of deleterious genetic variation segregating in the population. Second, if a mutation's effect on reproductive fitness is correlated with its effect on a complex trait (such as a disease), I show that recent population growth increases the amount of the additive genetic variance of the trait that is accounted for by low-frequency variants relative to that in a population that did not expand recently. Further, I demonstrate that recent population growth leads to an increase in the number of alleles that contribute to the trait relative to what is expected in a population that did not recently expand. Finally, recent population growth decreases the number of SNPs that are significantly associated with the trait, relative to the number detected in a population that did not recently expand. This work indicates that in certain circumstances, recent population history will play an important role in determining the genetic architecture of complex traits in a particular population under study. As such, recent population history is a factor that should be considered when designing and interpreting re-sequencing studies for complex traits.

Models of population history
I explore several models of population history (Fig. 1). Because many studies have inferred a population bottleneck in non-African human populations associated with the Out-of-Africa migration process [45-50], the first model includes a brief, but severe, reduction in population size (Fig. 1A). After the bottleneck, the population returns to the same size as the ancestral population. This model is referred to as "BN" throughout the rest of the paper. The second model of population history includes the same Out-of-Africa population bottleneck, but adds an instantaneous, 100-fold population expansion in the last 80 generations, or the last 2000 years, assuming 25 years/generation (Fig. 1B) [51]. This recent explosion in effective population size is meant to approximate the expansion detected in the archeological and historical records as well as in studies of genetic variation [31,35,36,52]. This model is referred to as "BN+growth" throughout the paper. Finally, for comparison purposes, I also investigate a model where a population experienced an ancient 2-fold expansion (Fig. 1C). Such a model is meant to reflect the history of African populations [46,47,53] and is referred to as "Old growth" in the paper.
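As a rough illustration, the three histories can be written as piecewise-constant size functions. This is only a sketch: the bottleneck timing (2000 to 1960 generations ago) is taken from the generation counts quoted later in the text, while the ancestral size, the bottleneck size, and the timing of the ancient expansion are hypothetical placeholders that this section does not state.

```python
# Piecewise-constant population size histories, looking backward in time.
# Sizes and the Old growth timing marked "hypothetical" are placeholders,
# not values taken from the paper.

N_ANC = 10_000                  # hypothetical ancestral size
BN_SIZE = 1_000                 # hypothetical bottleneck size
BN_START, BN_END = 2000, 1960   # bottleneck spans 2000 to 1960 generations ago
GROWTH_START = 80               # recent expansion began 80 generations ago
GROWTH_FACTOR = 100             # instantaneous 100-fold expansion
OLD_GROWTH_START = 6000         # hypothetical timing of the ancient expansion
OLD_GROWTH_FACTOR = 2           # ancient 2-fold expansion


def pop_size(model, gens_ago):
    """Return the population size `gens_ago` generations before the present."""
    if model == "Old growth":
        if gens_ago <= OLD_GROWTH_START:
            return N_ANC * OLD_GROWTH_FACTOR
        return N_ANC
    if model not in ("BN", "BN+growth"):
        raise ValueError(f"unknown model: {model}")
    if BN_END < gens_ago <= BN_START:
        return BN_SIZE
    if model == "BN+growth" and gens_ago <= GROWTH_START:
        return N_ANC * GROWTH_FACTOR
    return N_ANC


for model in ("BN", "BN+growth", "Old growth"):
    print(model, [pop_size(model, t) for t in (3000, 1980, 500, 0)])
```

The BN and BN+growth functions differ only in the final 80 generations, which is the property that lets later comparisons between the two models isolate the effect of recent growth.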

Forward simulations
All results were obtained using a forward-in-time population genetic simulation program. Step-wise population size changes are included in the model by changing the population size (N) at particular time points. Size changes affect both the number of mutations that enter the population during each epoch and the magnitude of genetic drift.
For computational efficiency, I divide the population size by 2 and rescale all times to be two-fold smaller than under the specified model. However, I keep the population-scaled mutation rate (θ) and the population-scaled selection coefficient (γ = 2Ns) equal to the values for the larger population. This rescaling is possible because, in the diffusion limit, patterns of genetic variation depend only on the scaled parameters. Such a rescaling is customary in other forward simulation programs [56,57]. Samples of 1000 chromosomes are taken from the population at different time points to calculate how diversity statistics change over time.
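The rescaling can be sketched in a few lines. This example assumes the per-site form θ = 4Nμ for the scaled mutation rate (the text gives only γ = 2Ns explicitly); the function name `rescale` and the values of N and s are hypothetical.

```python
def rescale(N, mu, s, times, factor=2):
    """Shrink N and all event times by `factor`, and scale mu and s up by the
    same factor so that theta = 4*N*mu and gamma = 2*N*s are unchanged."""
    return {
        "N": N / factor,
        "mu": mu * factor,
        "s": s * factor,
        "times": [t / factor for t in times],
    }


# Hypothetical inputs: N and s are placeholders; the times are event times
# (in generations ago) that appear in the text.
original = {"N": 10_000, "mu": 1e-8, "s": 1e-3, "times": [2000, 1960, 80]}
scaled = rescale(original["N"], original["mu"], original["s"], original["times"])

theta_before = 4 * original["N"] * original["mu"]
theta_after = 4 * scaled["N"] * scaled["mu"]
gamma_before = 2 * original["N"] * original["s"]
gamma_after = 2 * scaled["N"] * scaled["s"]

print(scaled["times"])             # event times in rescaled generations
print(theta_before, theta_after)   # scaled mutation rate before and after
print(gamma_before, gamma_after)   # scaled selection coefficient before and after
```

Under this rescaling, an event 80 generations ago in the specified model corresponds to 40 generations in the simulated population.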

Models of disease
To evaluate the effect of recent demographic history on the architecture of complex traits, I simulate individuals who have a quantitative trait under various demographic scenarios. I assume that deleterious (nonsynonymous) mutations in a given mutational target account for some of the heritability of the trait. Such a model is implicitly assumed in exome re-sequencing studies used to implicate rare variants in disease risk. This quantitative trait could represent a trait that is measured on a quantitative scale (e.g. lipid levels) or represent the underlying risk to a dichotomous phenotype (e.g. diabetes). Below I provide a description of the model and parameters.
I investigate models where mutations at a subset of nonsynonymous sites can account for 5%, 10%, or 30% of the variance of the phenotype (i.e. the heritability accounted for by these variants is 5%, 10%, or 30%). Many complex traits have heritabilities around 30% [79-82]. Thus, the models considered here assume that some fraction of this heritability is accounted for by variants within the mutational target (i.e. a portion of the exome) while the rest is accounted for by variants not modeled here (i.e. noncoding portions of the genome). The mutational target size, M, is the number of nonsynonymous sites in the genome that, if mutated, would generate a variant that affects the phenotype. Assuming a mutation rate of 1 × 10^-8 per site per generation, I investigate mutational target sizes of 70 kb and 140 kb. To gain a sense of how these sites could be partitioned into genes, note that the median length of the coding region of human genes is 1335 bp [58]. Thus, a random gene would have roughly 934 nonsynonymous sites (assuming 70% of the coding sites are nonsynonymous). If all nonsynonymous sites within the gene would, if mutated, produce a causal variant, then the mutational target size of 70 kb would correspond to 75 distinct genes accounting for the specified heritability, and the target size of 140 kb would correspond to 150 distinct causal genes accounting for the heritability. If only half the nonsynonymous sites could be mutated to causal variants, then the number of genes would increase by a factor of 2. In practice, this model is implemented by taking a subset of the nonsynonymous SNPs simulated as described above and then assigning them an effect on the trait.
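The gene-count arithmetic above can be checked directly. The sketch below uses only the numbers quoted in the text; the function name `genes_for_target` is a hypothetical helper introduced for illustration.

```python
MEDIAN_CODING_LENGTH_BP = 1335   # median coding length of a human gene [58]
FRACTION_NONSYNONYMOUS = 0.70    # assumed share of coding sites that are nonsynonymous


def genes_for_target(target_sites, causal_fraction=1.0):
    """Number of median-length genes needed to supply `target_sites` causal
    nonsynonymous sites, when `causal_fraction` of each gene's nonsynonymous
    sites can mutate to a causal variant."""
    nonsyn_per_gene = MEDIAN_CODING_LENGTH_BP * FRACTION_NONSYNONYMOUS
    return target_sites / (nonsyn_per_gene * causal_fraction)


for target in (70_000, 140_000):
    all_sites = round(genes_for_target(target))
    half_sites = round(genes_for_target(target, causal_fraction=0.5))
    print(target, all_sites, half_sites)
```

For the 70 kb target this gives about 75 genes when every nonsynonymous site is causal and about 150 when only half are; the 140 kb target gives about 150 and 300.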
To assign an effect on the trait to a given causal SNP, I follow the model described by Eyre-Walker [59], with a modification described below. Essentially, the i-th SNP's effect on a trait, $\alpha_i$, is given by $\alpha_i = \delta C s_i^{\tau} (1 + \varepsilon_i)$, where δ = 1, $s_i$ is the selective disadvantage for the i-th SNP, and τ is the relationship between the SNP's effect on fitness and the trait. A value of τ = 1.0 indicates a linear relationship, where the mutations that are most deleterious will also have the biggest effects on the trait. A value of τ = 0.0 indicates that a mutation's effect on fitness is independent of its effect on the trait. I set τ = 0.5 and τ = 0.0, to model a situation where there is some correlation between fitness and the trait, and another situation where the trait is independent of fitness. Next, $\varepsilon_i$ for the i-th SNP is drawn from a normal distribution with mean 0 and a standard deviation of 0.5. I did not vary this standard deviation, because Eyre-Walker showed that varying this parameter had little effect on the overall results [59]. C is a normalizing constant for the SNP effect sizes, chosen so that the causal variants account for the target heritability $h_C^2$, where $h_C^2 \in \{0.05, 0.1, 0.3\}$. That is, C scales the SNP effect sizes so that the desired heritability is achieved under each combination of parameters $h_C^2$, τ, and M. Importantly, I find the average value of C across all simulation replicates in the standard neutral model, and then use this value of C for simulations under the other demographic models. As such, a SNP with a given effect on the trait under one demographic scenario will have the same effect on the trait under a different demographic scenario. This framework has the desirable property that a SNP's effect on a trait in a particular individual is biologically determined and is not directly affected by the demography of the population.
Additionally, when setting up the simulations in this manner, the actual $h^2$ in a given simulation replicate is the outcome of a stochastic process, rather than set to a specific value. Nevertheless, in practice, there was little variation in $h^2$ across different demographic scenarios (Fig. S3). Incidentally, different values of C are found when using different values of $h_C^2$, τ, and M (Table S1). This is reasonable because these models are biologically very different from each other.
I then assign trait values ($Y_j$) to each simulated individual. This is done using an additive model, $Y_j = \sum_i z_{ij} \alpha_i + \varepsilon_j$, where the summation is over all i causal variants, $z_{ij}$ is the number of copies of the risk allele ($z_{ij} \in \{0, 1, 2\}$) carried at the i-th SNP by the j-th individual, $\alpha_i$ is the effect of the i-th SNP, and $\varepsilon_j$ is the environmental deviation, which is drawn from a normal distribution with mean 0 and variance
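The effect-size and trait model can be sketched as follows. Two points are assumptions rather than statements from this section: the effect-size expression is written in the Eyre-Walker form δ C s^τ (1 + ε), and the environmental standard deviation `env_sd` is left as a free parameter because its value is not given here. The function names and the selection coefficients are hypothetical.

```python
import random


def effect_size(s, tau, C, sd_eps=0.5, delta=1.0, rng=random):
    """Effect of one causal SNP, assumed to follow
    delta * C * s**tau * (1 + eps) with eps ~ Normal(0, sd_eps)."""
    eps = rng.gauss(0.0, sd_eps)
    return delta * C * (s ** tau) * (1.0 + eps)


def trait_value(genotypes, effects, env_sd, rng=random):
    """Additive trait: sum over SNPs of (risk-allele count * effect),
    plus an environmental deviation drawn from Normal(0, env_sd)."""
    genetic = sum(z * a for z, a in zip(genotypes, effects))
    return genetic + rng.gauss(0.0, env_sd)


rng = random.Random(1)
sel_coeffs = [1e-4, 1e-3, 1e-2]   # hypothetical selective disadvantages

# tau = 0.5: more deleterious SNPs tend to have larger trait effects.
effects_correlated = [effect_size(s, 0.5, C=1.0, rng=rng) for s in sel_coeffs]
# tau = 0.0: trait effects do not depend on the selection coefficient.
effects_independent = [effect_size(s, 0.0, C=1.0, rng=rng) for s in sel_coeffs]

y = trait_value([0, 1, 2], effects_correlated, env_sd=1.0, rng=rng)
print(effects_correlated, effects_independent, y)
```

Holding C fixed across demographic models, as the text describes, means that `effect_size` depends only on the SNP's own selection coefficient and noise term, not on the population it segregates in.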

Recent growth and deleterious variation
First I assess the effect that different population histories (BN, BN+growth, and Old growth) have on neutral and deleterious genetic variation. Fig. 2A and Fig. 2B show how the numbers of synonymous and nonsynonymous SNPs, respectively, segregating in a sample of 1,000 chromosomes change over time as the simulated populations change in size. The population bottleneck 2000 generations ago resulted in a decrease in the number of SNPs segregating in the BN and BN+growth populations (orange and green lines in Fig. 2A and Fig. 2B). When the populations recovered from the bottleneck and increased in size, the number of SNPs in the population also increased. This increase in the number of SNPs after the recovery from the bottleneck is due to two factors. First, the larger population size allows more new mutations to enter the population. Second, genetic drift has a weaker effect when the population size is large. As such, more SNPs are maintained in the population. The recent explosion in population size (dashed green lines in Fig. 2A and Fig. 2B; BN+growth) rapidly results in a substantial increase in the number of both synonymous and nonsynonymous SNPs segregating in the population. This is due to the extreme increase in the population mutation rate (typically referred to as θ) due to the larger population size. Ancient population growth also resulted in an increase in the number of synonymous and nonsynonymous SNPs segregating in the population, via the same mechanisms (purple line in Fig. 2A and Fig. 2B; Old growth).
The bottleneck reduces the proportion of segregating SNPs that are nonsynonymous (Fig. 2C). The reason for this is that, when the population size decreases, rare variants are preferentially lost over common variants. More nonsynonymous than synonymous SNPs are rare, and, as such, the crash in population size results in the loss of more nonsynonymous SNPs than synonymous SNPs. After the population recovers from the bottleneck, the proportion of nonsynonymous SNPs found in the population increases (Fig. 2C). The reason for this increase is that, due to the increase in population size, many new mutations enter the population after the recovery from the bottleneck. Most of these new mutations are nonsynonymous, due to there being more possible nonsynonymous changes than synonymous changes in coding regions. In fact, the proportion of nonsynonymous SNPs segregating in the population immediately after the recovery from the bottleneck is actually higher than that in the ancestral population (Fig. S1). The very recent increase in size in the BN+growth population also results in an increase in the proportion of nonsynonymous SNPs (green line in Fig. 2C). In BN+growth, 54.8% of the SNPs are nonsynonymous (green line in Fig. 2C), compared to 52.8% in the BN population (orange line in Fig. 2C). It will take approximately 4Ne (where Ne is the current effective population size) generations for the proportion of deleterious SNPs to reach the equilibrium value for the larger population size (Text S1). The population that underwent an ancient expansion (dotted purple line in This pattern also holds with other magnitudes of population growth (Text S1).
Next I examine the average fitness effects of nonsynonymous SNPs segregating in a sample of 1,000 chromosomes at different time points in the simulations (Fig. 2D). During the bottleneck, the average segregating SNP in the BN and BN+growth populations becomes less deleterious than in the ancestral population (orange and green lines in Fig. 2D). This is due to many rare, deleterious SNPs being eliminated from the population as well as fewer new deleterious SNPs entering the population when it is small in size. After the population recovers from the bottleneck, the average segregating SNP becomes more deleterious. In the first few generations after the recovery, the average SNP is even more deleterious than in the ancestral population. This is due to the increase in the input of deleterious mutations immediately after the recovery from the bottleneck. After a few generations, however, negative natural selection has eliminated many, though certainly not all, of these deleterious SNPs from the population. In fact, Fig. 2D shows that even in the present day, the average SNP is more deleterious than that in the ancestral population. This same effect applies even more strongly to the recent population growth within the last 80 generations. Immediately after growth, the average SNP segregating in the BN+growth population was more strongly deleterious (Fig. 2D) than what is expected in a population that has not expanded. However, during the last 40 generations, selection has eliminated many of the most deleterious SNPs from the population. In the present day, the average SNP in the BN+growth population (green line) is slightly less deleterious than in the BN population (orange line; also see Text S1), consistent with the results of Gazave et al. [65]. This effect is less pronounced with decreasing amounts of population growth (Text S1).
The description of the average strength of selection on a SNP described above does not take into account the frequency of the deleterious SNP in the population. The genetic load, however, does by weighting the selection coefficient by the SNP's frequency [66]. Genetic load is the reduction in fitness of the population due to deleterious mutations [67]. I find that, unlike the average selection coefficient, the genetic load is not affected by the demographic history of the population (Fig. S2). Similar

Recent growth and the heritability of complex traits
Using the models of demography, selection, and genetic architecture described above, I first examine the effect that population history has on the heritability of the trait. I find that population history has little effect on the heritability of the trait (Fig. S3), regardless of the values of $h_C^2$, τ, and M used in the simulations. This is evidenced by the fact that in all three demographic scenarios investigated, the actual heritability estimated from each simulation replicate is close to $h_C^2$, the value set in a constant size population. To further investigate the effect of recent growth on heritability, I divide the causal variants segregating at the end of the simulation into three categories. The first category consists of those SNPs that arose either further back in time than, or during, the population bottleneck ("Before bottleneck" in Fig. 3). These mutations occurred >1960 generations ago. The second category consists of SNPs that arose after the population had recovered from the bottleneck, but further back in time than the recent population growth ("After bottleneck" in Fig. 3). These mutations arose between 1960 and 80 generations ago. The final category consists of SNPs that arose within the last 80 generations ("After growth" in Fig. 3). In the BN+growth model, these are the mutations that arose after the population expansion. Fig. 3A shows that the average heritability accounted for by mutations that arose at these three different time points is similar in both the BN+growth population (green boxes) and the BN population (orange boxes). Interestingly, when a mutation's effect on fitness is correlated with its effect on the trait (τ = 0.5), mutations that arose in the last 80 generations, as a class, account for the greatest amount of the heritability (Fig. 3A).
Next, I investigate whether other features of genetic variation that affect the heritability (e.g. number of SNPs, mean allele frequency, mean effect size) are affected by recent population history. I find that recent growth has had little effect on the number of mutations that arose prior to the population bottleneck and are still segregating in the sample, the mean allele frequency, and the effect sizes of such mutations (Fig. 3B, Fig. 3C, and Fig. 3D). However, there is a different pattern for mutations that arose after the bottleneck, but more than 80 generations ago (those in the "After bottleneck" category). Recent population growth increases the number of such mutations (roughly 2-fold) relative to that found in the population that did not expand (Fig. 3B). Further, these mutations tend to be at lower frequency in the BN+growth population compared to the BN population (Fig. 3C). The only difference between the two models of population history on variants that arose during this time period is that genetic drift is weaker in the BN+growth population, compared to the BN population. Thus, fewer weakly deleterious mutations are lost from the BN+growth population, generating the pattern seen in Fig. 3B. The mutations that are not lost from the population tend to be at lower frequency in the larger population because they also are less likely to drift to higher frequency in the expanded population as compared to the non-expanded populations. The mutations that arose within the last 80 generations also are affected by recent population history (those in the "After growth" category). As expected, recent population growth leads to a dramatic increase in the number of such SNPs (Fig. 3B). Further, the new mutations tend to be at lower frequency in the BN+growth population than in the BN population (Fig. 3C). More surprisingly, these SNPs tend to have weaker effect sizes on the trait in the BN+growth population than in the BN population (Fig. 3D). This observation can be explained by selection more effectively removing moderately and strongly deleterious mutations from the expanded population than from the non-expanded populations [65]. Thus, while recent population history affects the number, frequencies, and effect sizes of these mutations, it does so in such a way that the overall heritability of the trait appears unaffected by population history. Fig. S4 shows similar plots for the model where a mutation's effect on the trait is not correlated with its effect on fitness (τ = 0).

Recent growth increases the contribution of rare variants to the additive genetic variance
While population history does not affect the overall heritability of the trait, it can have a profound impact on the additive genetic variance ($V_A$), and consequently, the heritability, attributable to low-frequency vs. common variants (Fig. 4). When a mutation's effect on fitness is correlated with its effect on the trait (τ = 0.5), more than 50% of the additive genetic variance Additionally, under this model, demographic history does not make as substantial an impact on the amount of the additive genetic variance explained by SNPs at different frequencies, as suggested by Simons et al. [38]. Again, similar results hold for other heritabilities considered and mutational target sizes (Fig. S5B). Thus, in some instances, recent population growth can result in a substantial increase in the amount of the genetic variance attributable to rare variants.
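One way to make the partition of additive variance concrete is sketched below. It assumes the standard per-SNP contribution 2p(1 - p)α² under Hardy-Weinberg proportions and additivity, which this section does not state explicitly, and it uses the 0.5% minor allele frequency cutoff quoted in the Introduction for low-frequency variation. The SNP list and function names are hypothetical.

```python
def additive_variance(p, alpha):
    """Additive genetic variance contributed by one biallelic SNP with
    allele frequency p and effect alpha: 2 * p * (1 - p) * alpha**2."""
    return 2.0 * p * (1.0 - p) * alpha ** 2


def rare_fraction(snps, maf_cutoff=0.005):
    """Share of total V_A due to SNPs whose minor allele frequency is
    below `maf_cutoff`. `snps` is a list of (frequency, effect) pairs."""
    total = sum(additive_variance(p, a) for p, a in snps)
    rare = sum(
        additive_variance(p, a)
        for p, a in snps
        if min(p, 1.0 - p) < maf_cutoff
    )
    return rare / total


# Hypothetical architecture: two rare SNPs with large effects and one
# common SNP with a small effect.
snps = [(0.001, 1.0), (0.002, 0.8), (0.30, 0.05)]
print(rare_fraction(snps))
```

Because the contribution scales with α², a rare variant can dominate V_A only when its effect is large, which is why the correlation between fitness and trait effects (τ) matters for how much variance rare variants explain.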

Recent growth increases genetic heterogeneity of disease
Population history also has a profound impact on the number of causal mutations in a sample of 1000 individuals who were selected from the upper 40th percentile of the distribution of the quantitative trait (Fig. 5). These individuals can be thought of as cases. Here recent growth (BN+growth) is predicted to have resulted in a substantial increase in the number of causal mutations compared to a population that had not undergone such recent growth (BN; orange vs. green boxes in Fig. 5A). In fact, a sample of 1000 cases from the BN+growth population is predicted to have nearly twice as many distinct causal mutations as a sample from the BN population. An explanation for these patterns is that many new deleterious causal mutations have arisen after the population has expanded in size. Because they are new and rare, they are only found in a small number of individuals. As such, each individual has his/her own set of low-frequency risk mutations. When aggregating this number across hundreds of individuals, the total number of causal mutations in the sample from the BN+growth population is higher than in the BN population. Interestingly, the number of distinct causal mutations is actually higher in the BN+growth population than in the Old growth population (purple box in Fig. 5A).
A similar increase in the number of causal variants in the sample from the recently expanded population relative to a non-expanded population is seen even when a SNP's effect on the trait was uncorrelated with its effect on fitness (τ = 0; Fig. 5B). This pattern is due to the fact that there is the same number of rare causal variants in the BN+growth population even when τ = 0. However, when τ = 0, many of these rare causal mutations have smaller effect sizes, and consequently, do not account for much of the phenotypic variance of the trait.
To further explore this issue, I examine how much of the phenotypic variance ($V_P$) in the population can be accounted for by the SNPs that explain the most $V_P$ (Fig. 6). When τ = 0.5, the top SNPs that explain most of the variance account for less of it in the population that recently expanded (BN+growth) than in the population that did not (BN). For example, when $h_C^2 = 0.05$, the 25 SNPs that account for the most $V_P$ will account for 5% of the $V_P$ in the BN population (orange line in Fig. 6A). In contrast, for the BN+growth population (green line in Fig. 6A), the 25 SNPs that account for the most $V_P$ will explain <3.5% of it. Put another way, the top 25 SNPs that explain the most variance account for >90% of the $V_A$ in the BN population, but <70% of the $V_A$ in the BN+growth population. These results suggest that, if mutational effects on disease are correlated with their effects on fitness, many of the additional rare causal variants found in a recently expanded population may, in aggregate, explain a substantial proportion (say 20%) of the heritability of the trait.
If a mutation's effect on fitness is independent of its effect on disease (τ = 0), then the top SNPs that explain the most variance account for almost all of the $V_A$ (Fig. 6B). For example, in the model where $h_C^2 = 0.05$, the top 25 SNPs will account for nearly 5% of the $V_P$, regardless of the demographic history of the population. Put another way, here the top 25 SNPs account for the majority of the $V_A$, and this pattern is not affected by the demography of the population. This finding supports the previous statement that many of the extra causal mutations seen in Fig. 5B in the recently expanded population actually contribute very little to the overall $V_P$ of the trait. Similar results are found for other values of $h_C^2$ and mutational target sizes (Fig. S6).
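The "top SNPs" summary used in this section can be illustrated with a short sketch. It again assumes the per-SNP contribution 2p(1 - p)α², which is a standard form and not one stated in this section, and the two example architectures are hypothetical; they are not simulation output from the paper.

```python
def top_k_share(snps, k):
    """Fraction of total V_A explained by the k SNPs with the largest
    individual contributions. `snps` is a list of (frequency, effect) pairs."""
    contributions = sorted(
        (2.0 * p * (1.0 - p) * a ** 2 for p, a in snps),
        reverse=True,
    )
    return sum(contributions[:k]) / sum(contributions)


# Hypothetical architectures with the same two common SNPs. The second adds
# many rare variants that each contribute a small amount of variance.
concentrated = [(0.3, 0.2), (0.2, 0.2)] + [(0.001, 0.5)] * 5
diffuse = [(0.3, 0.2), (0.2, 0.2)] + [(0.001, 0.5)] * 60

print(top_k_share(concentrated, 2))
print(top_k_share(diffuse, 2))
```

In the second architecture the same two leading SNPs explain a smaller share of V_A because the remainder is spread over many rare variants, which mirrors the qualitative contrast between the BN and BN+growth populations described for τ = 0.5.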

Effect of demography on the power of association tests
Next, I investigate how different demographic histories affect the power to associate SNPs with a trait in a sample of 1000 cases and 1000 controls. Most power simulations for association tests examine the power to detect a given causal variant conditional on its allele frequency and/or effect size. Using this approach, I find that power to detect the SNPs that explain the greatest amount of $V_A$ is actually higher in the population that recently expanded (BN+growth) than in the population that only underwent a bottleneck (BN; Text S2). However, recent population growth has a more limited effect on the power to detect a given association when conditioning on the allele frequency or odds ratio of the causal SNP (Text S2).
The power analyses described above refer to the power to detect a given causal variant, conditional on various attributes of it. However, the number of causal variants, their frequencies, and their effect sizes are random quantities that are influenced by the evolutionary process experienced by the population under study. Thus, it is also useful to examine the expected number of causal SNPs with P-values less than the significance threshold across the different models of demographic history (Fig. 7). The expected number of causal SNPs detected in a study of a given sample size will account for both the power to detect a given variant as well as the number, frequency distribution, and effect size distribution of causal variants in the population. It  Fig. 7A). Similar trends are seen for the other models of $h_C^2$ and M (Fig. S7).
However, when h C 2 = 0.05 , sample sizes of 1000 cases and controls are too small to detect almost any associations with P < 1 x 10^-5, regardless of the demography of the population. When using a less stringent significance threshold (P<0.01), h C 2 = 0.3 , and M=70kb, a median of 10 causal loci are associated with the trait in the BN population (Fig. S8A). However, a median of only 8 causal loci were detected in the BN+growth population. Again, similar trends are noted for the other models of h C 2 and M (Fig. S8). However, when h C 2 = 0.05 , a median of 2 causal SNPs were detected at P<0.01 for all three demographic models. This result is due to the very low power to detect an association for causal variants with very small effect sizes using samples of 1000 cases and 1000 controls, regardless of the demography of the population. Nevertheless, even here, a higher proportion of simulation replicates detected at least 3 associations in the BN population (54%) than in the BN+growth population (41%). Taken together, this analysis suggests that recent population growth can decrease the expected number of associations detected with a given sample size. Thus, while recent growth may increase power to detect the SNP that explains the greatest amount of the variance, and have little effect on power to detect a given SNP conditional on its frequency or effect size, it enriches the frequency distribution for rare causal variants. The power to detect such variants using single-marker association tests is low, decreasing the expected number of significant associations detected in the population that recently expanded.
However, demographic history has no clear effect on the number of causal loci detected with a given sample size when the mutation's effect on fitness is independent of its effect on the trait (τ = 0; Fig. 7B, Fig. S7, and Fig. S8). For some models, the BN+growth population appears to have a higher number of significant associations than the BN population (Fig. S7D and Fig. S7E). However, this pattern is not consistently seen across models or significance thresholds.
Similarly, when using a significance threshold of P<0.01, the Old growth population appears to show a greater number of significant associations (Fig. S8E-H) than either of the other two models of population history. This pattern may be due to the slight, but noticeable, increase in the h C 2 for the Old growth population (Fig. S3E-H).
Researchers have suggested that the amount of the additive genetic variance ( accounted for by SNPs with a given P-value (Fig. 8A) of V A in the BN+growth population. The reason for this is that many of the rare variants that account for V A for the trait in the population were not present in the sample of 1000 cases and 1000 controls. When τ = 0, population history has little effect on the amount of V A accounted for by SNPs with a given P-value (Fig. 8B). Including all SNPs present in the association study captures over 95% of the V A . This finding is not surprising in light of the observation (Fig. 4B) that much of the V A is accounted for by common variants when τ = 0, and such variants are likely to be present in the sample of 1000 cases and controls. Qualitatively similar trends are seen for other heritabilities and mutational target sizes (Fig. S9).

DISCUSSION
I have shown that very recent population growth can have a profound impact on patterns of deleterious genetic variation and the genetic architecture of complex traits. Specifically, I show that recent population growth leads to an increase in the proportion of nonsynonymous SNPs relative to non-expanded populations. Further, this recent growth is predicted to have affected the genetic architecture of some complex traits. This result has implications for discovering the "missing heritability" in different human populations and detecting causal variants that may also affect reproductive fitness.
While it has been shown that differences in population history between European and African populations have affected the proportion of deleterious SNPs in the two populations [54], here I demonstrate that the influence of population history on deleterious mutations also applies on a much more recent timescale, and to populations that are much more similar to each other than Europeans and Africans.
While I have shown that demographic history greatly affects the proportion and frequencies of deleterious mutations segregating in the population, it is interesting that demography does not have a large effect on the overall genetic load of the population. Haldane has shown that the genetic load at equilibrium contributed by a particular mutation is independent of the strength of selection acting on the particular mutation and the frequency of that mutation [66]. Mutations of strong effect will be maintained by selection at lower frequency than mutations of weaker effect. Haldane suggests that these effects should cancel each other out.
Haldane's result was derived for a simple model with a constant population size. It was unclear whether this result would hold when considering populations with bottlenecks and recent growth.
Here I have shown that Haldane's result applies under certain complex demographic models.
Further work is required to determine whether this trend holds in other species with demographic histories that depart even further from the standard neutral model, and whether this trend holds for models involving dominance.
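Haldane's cancellation argument can be sketched for a single locus. The setup below is a textbook mutation-selection balance model and is an assumption of this sketch, not a description of the simulations: a deleterious allele with equilibrium frequency \hat{q}, per-generation mutation rate u, selection coefficient s, dominance coefficient h > 0, and genotype fitnesses 1, 1 - hs, and 1 - s. When \hat{q} is small, selection removes the allele at a rate of roughly hs\hat{q} per generation while mutation adds it at rate u, so that:

```latex
\hat{q} \;\approx\; \frac{u}{hs},
\qquad
L \;=\; 1 - \bar{w} \;\approx\; 2\hat{q}\,hs \;\approx\; 2u .
```

The selection coefficient cancels: a more strongly selected mutation is held at proportionally lower frequency, so the load depends only on the mutation rate. For a fully recessive allele (h = 0), the analogous calculation gives \hat{q} \approx \sqrt{u/s} and L \approx s\hat{q}^2 = u, which is again independent of s.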
I find that population history is predicted to have little effect on the overall amount of additive genetic variance for a trait seen in different populations. As such, assuming a common environmental variance across populations, the heritability of a trait is predicted to be similar across populations. This finding suggests that if differences in the heritability of a trait are detected across populations, these differences are more likely to be due to differing environmental effects than to different amounts of additive genetic variance. For example, it has been suggested that the heritability of height in a West African population is less than that typically estimated from European populations [75]. My results would argue that such a difference would be due to shifts in the environmental variance, rather than changes in the amount of additive genetic variance as a result of differences in recent population history.
A major conclusion of this study is that recent population growth has a greater effect on the architecture of traits when a mutation's effect on fitness is correlated with its effect on the phenotype than when the mutation's effect on fitness is independent of its effect on the  Further rationale for considering models where a mutation's effect on fitness is correlated with its effect on the trait comes from exome sequencing studies themselves. A major assumption made in exome sequencing studies is that some of the missing heritability can be explained by rare variants of large effect that increase risk to disease [3,4]. If there is no correlation between a mutation's effect on disease and its effect on fitness, then there is no reason for rare variants to have stronger effects on disease than more common variants. Under this model (where fitness effects are independent of trait effects), effect sizes would be randomly assigned to SNPs, regardless of their allele frequency. On the other hand, if a mutation's effect on fitness is correlated with its effect on disease, then the SNPs with the strongest effects on disease are likely to be the most deleterious ones. As such, they will also be the rarest in the population due to purifying selection. Because the exome sequencing paradigm essentially assumes that the effect of a coding region mutation on disease is correlated with its effect on fitness, it is important to investigate the properties of such a model under different population histories.
My models make several predictions that can be tested with empirical data. While these models were developed to apply to exome sequencing data, because the predictions were robust to the mutational target size and heritability accounted for by the mutations in the target region (Fig. S3, Fig. S5-Fig. S9), they should apply to GWAS data as well, especially if low-frequency variants are imputed from a reference panel, like the 1000 Genomes Project. First, the models predict that if a mutation's effect on fitness is correlated with its effect on the trait, common variants should account for more of the heritability in a population that did not expand than in one that had recently expanded. This prediction can be tested by analyzing GWAS data in expanded vs. non-expanded populations. Second, for a given sample size, if a mutation's effect on fitness is correlated with its effect on the trait, the models predict that fewer significant associations will be detected in the recently expanded population than in a population that has not expanded. This prediction can also be tested by comparing the number of significant associations detected in GWAS data from the expanded population vs. the non-expanded population. Third, if a mutation's effect on fitness is correlated with its effect on the trait, the models predict that low-frequency variants should account for more of the heritability in the recently expanded population than in a non-expanded population. This prediction can be tested directly once large-scale exome sequencing data from both expanded and non-expanded populations have been collected.
Failing to find these patterns in GWAS and exome sequencing data would suggest that there is little correlation between a mutation's effect on fitness and its effect on the trait.  recently expanded in addition to those more stable in size, recognizing that larger sample sizes in populations that have recently expanded will be necessary to achieve comparable power to that in non-expanded populations.
Finally, these results are directly relevant for finding the "missing heritability" in different populations. If a mutation's effect on disease is correlated with its effect on fitness, then more of the heritability will be explained by very rare variants in a population that experienced a recent expansion than in a population that did not recently expand. Additionally, the variants detected by single marker association tests explain less of the heritability in a recently expanded population than in a population that did not recently expand. Thus, while the overall heritability of a trait may not be variable across populations, our ability to discover the variants that account for it is likely to vary across populations due to differences in demographic history.
Narrow sense heritability was computed for each demographic model as

[Figure residue: axes labeled "Allele frequency" and "Effect size"]

Additional results on recent growth and deleterious variation
Looking into the future: re-attaining equilibrium
Very recent population growth leads to an increase in the proportion of nonsynonymous SNPs in the population compared to a population that has not recently expanded. Thus, the recent population growth has pushed the population out of equilibrium. But, as natural selection eliminates the new deleterious mutations from the population, the proportion of nonsynonymous SNPs in the population and the average deleterious effect of a SNP will decrease. Eventually, assuming no further demographic changes, the proportion of nonsynonymous SNPs in the population will attain the new equilibrium value for the larger population size (Fig. S1.1). How long will it take for human populations to reach this new equilibrium? Running the simulations with 5-fold growth for many generations into the future shows that it will take roughly 200,000, or 4N e , generations for the proportion of nonsynonymous SNPs segregating in the population to decrease to the new equilibrium value for the larger population size (Fig. S1.1). This agrees well with the classic result that, conditional on fixation, a new neutral mutation takes roughly 4N e generations to become fixed in the population [1]. Thus, extrapolating to 100-fold growth, it would take roughly 4 million generations to reach the equilibrium proportion of nonsynonymous SNPs in the population.
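The extrapolation from 5-fold to 100-fold growth is simple arithmetic and can be checked with a short sketch. The ancestral effective size of 10,000 is an assumption inferred from the figures quoted in the text (200,000 generations = 4 x 5 x 10,000); it is not stated in this section, and the function name is hypothetical.

```python
# Assumed pre-growth effective population size. Inferred from the text's
# "200,000 or 4Ne generations" for 5-fold growth; not stated explicitly.
ANCESTRAL_NE = 10_000


def generations_to_equilibrium(fold_growth, ancestral_ne=ANCESTRAL_NE):
    """Approximate time to re-attain equilibrium, taken as 4 * Ne generations,
    where Ne is the effective size after the expansion."""
    post_growth_ne = fold_growth * ancestral_ne
    return 4 * post_growth_ne


for fold in (5, 100):
    print(f"{fold}-fold growth: ~{generations_to_equilibrium(fold):,} generations")
```

Under this assumption the time to equilibrium scales linearly with the magnitude of growth, which is why the 100-fold case is 20 times longer than the 5-fold case.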
Effect of different growth rates
I also examine the effect that different magnitudes of population growth have on the proportion of nonsynonymous SNPs in the population (Fig. S1.2A) and the average fitness effects of segregating deleterious variants (Fig. S1.2B). As expected, stronger recent population growth results in a higher proportion of nonsynonymous SNPs in the population (Fig. S1.2A).
Interestingly, the difference between 10-fold growth and 100-fold growth appears rather slight compared to the difference between 0 and 5-fold growth. This result suggests that even a small amount of recent growth may be sufficient to affect patterns of weakly deleterious mutations.
The picture for the average fitness effects of segregating deleterious mutations is more complex (Fig. S1.2B). Here, as the magnitude of growth increases, the average segregating SNP becomes less deleterious. This effect can be explained by selection being more efficient in the larger population (as suggested in a recent paper by Gazave et al. [2]). Interestingly, the 80 generations since the expansion have been sufficient time for selection to have begun removing some of the most deleterious mutations (Fig. S1.3). Note that at 50 generations ago, the average SNP was most deleterious in the populations that experienced the greatest expansion (100-fold and 50-fold). This is due to the increased input of new deleterious mutations as a result of the population expansion. However, in the present day (0 generations ago), the average SNP is least deleterious in the populations that expanded the most. This is due to the increased efficacy of purifying selection in a large population. Thus, many of the most deleterious mutations are quickly eliminated from the population.

Supplementary Text S2
Additional results on the effect of demography on the power of association tests
All results described here were obtained using h C 2 = 0.3 , M = 70 kb, and a critical value for Fisher's exact test of 1 x 10^-5.
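For readers who want to see how a single-marker power estimate of this kind can be obtained, the following is a minimal sketch using only the Python standard library. It is not the simulation code used for these analyses. It assumes an allelic 2x2 table (alternate vs. reference allele counts on 2n chromosomes in cases and controls), and it takes the allele frequencies in cases and controls as hypothetical inputs rather than deriving them from a liability model. All function names are introduced here for illustration.

```python
import math
import random


def log_hypergeom_pmf(k, row1, row2, col1):
    """Log-probability that the top-left cell equals k, given fixed margins."""
    def log_comb(n, r):
        return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return log_comb(row1, k) + log_comb(row2, col1 - k) - log_comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_obs = log_hypergeom_pmf(a, row1, row2, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = 0.0
    for k in range(lo, hi + 1):
        lp = log_hypergeom_pmf(k, row1, row2, col1)
        if lp <= log_obs + 1e-7:  # small tolerance for floating-point ties
            p += math.exp(lp)
    return min(1.0, p)


def estimate_power(freq_cases, freq_controls, n_cases=1000, n_controls=1000,
                   alpha=1e-5, replicates=200, seed=1):
    """Fraction of simulated case-control samples with p < alpha.

    freq_cases and freq_controls are hypothetical allele frequencies in the
    two groups; allele counts are drawn binomially on 2n chromosomes.
    """
    rng = random.Random(seed)
    chrom_cases, chrom_controls = 2 * n_cases, 2 * n_controls
    hits = 0
    for _ in range(replicates):
        a = sum(rng.random() < freq_cases for _ in range(chrom_cases))
        c = sum(rng.random() < freq_controls for _ in range(chrom_controls))
        p = fisher_exact_two_sided(a, chrom_cases - a, c, chrom_controls - c)
        hits += p < alpha
    return hits / replicates
```

A call such as `estimate_power(0.02, 0.01)` can be compared with `estimate_power(0.002, 0.001)` to see how the same relative difference in frequency becomes much harder to detect when the variant is rare.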

Power as a function of V A
I examine the power to detect an association as a function of the amount of the V A that a given SNP explains (Fig. S2.1). When a mutation's effect on fitness is correlated with its effect on the trait (τ = 0.5), power to detect an association with a SNP that explains much of the V A (>5%) is higher in the population that recently expanded (BN+growth; green lines in Fig. S2.1A) than in a population that did not (BN; orange line in Fig. S2.1A). This increase in power comes from the fact that those SNPs that explain a lot of V A in the recently expanded population tend to have smaller effect sizes than those SNPs in the population that did not recently expand (Fig. S2.1C). Because, under this model, the effect size of a mutation is correlated with the strength of selection acting against it (Fig. S10), those SNPs that have smaller effect sizes are also less deleterious and can reach higher frequency in the population. This pattern is demonstrated in Fig. S2.1E, which shows that the SNPs that each account for about 5% of the V A in the BN+growth population have a higher average frequency than those SNPs that explain a similar proportion of V A in the BN population. This increase in allele frequency results in an increase in power for the single-marker association test. An obvious question is why the SNPs that explain >5% of the V A in the BN+growth population have smaller effect sizes than SNPs that explain similar amounts of V A in the BN population. This counter-intuitive pattern can be explained by noting that in order for a given SNP to explain >5% of the V A , it must be relatively common, or have a large effect, or both. In the BN population, there are many low-frequency mutations of strong effect which are moderately deleterious that may explain >5% of the V A .
In the BN+growth population, however, many of these low-frequency deleterious mutations with strong effects are eliminated from the population due to the increased efficacy of purifying selection in the large population. Those mutations that are left behind in the recently expanded population that can explain >5% of the V A will tend to be less deleterious and have smaller effect sizes (Fig. S2.1C), but higher allele frequencies, as compared to the mutations that explain >5% of the V A in the BN population (Fig. S2.1E). The recent population growth also increases the number of new deleterious mutations with large effect sizes that enter the population. However, these mutations are so rare that they are unlikely to contribute >5% of the V A , and so they are not relevant for the present discussion.
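The trade-off between frequency and effect size described above can be illustrated with the standard expression for the additive variance contributed by one biallelic SNP, 2p(1 - p)a^2. The sketch below assumes a purely additive model on a liability scale with V P = 1, so that V A = 0.3 when h C 2 = 0.3. The two example variants are hypothetical and are not taken from the simulations.

```python
def snp_additive_variance(freq, effect):
    """Additive variance 2p(1-p)a^2 contributed by one biallelic SNP,
    where p is the allele frequency and a is the per-allele effect."""
    return 2.0 * freq * (1.0 - freq) * effect ** 2


def fraction_of_va(freq, effect, total_va):
    """Share of the total additive variance explained by the SNP."""
    return snp_additive_variance(freq, effect) / total_va


TOTAL_VA = 0.3  # assumes V_P = 1 and heritability 0.3 (illustrative)

# Hypothetical variants: one rare with a large effect, one more common
# with a smaller effect. Both explain a similar share of V_A.
rare_strong = fraction_of_va(freq=0.01, effect=1.0, total_va=TOTAL_VA)
common_weak = fraction_of_va(freq=0.05, effect=0.45, total_va=TOTAL_VA)
print(f"rare, strong effect:   {rare_strong:.3f}")
print(f"common, weaker effect: {common_weak:.3f}")
```

Both hypothetical variants clear the 5% threshold, but the more common one would be far easier to detect with a single-marker test, which is the pattern the BN+growth results show.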
When a mutation's effect on fitness is not correlated with its effect on the trait (τ = 0), I find that demography has little effect on the power of the association test (Fig. S2.1B). For all three models of population history, power to detect mutations that explain more of the V A is high.
Further, the SNPs that explain most of the V A tend to be at higher allele frequency (Fig. S2.1D) and have larger effect sizes than those mutations that explain less V A (Fig. S2.1F).

Power as a function of allele frequency
Next I examine the power of the single-marker association tests as a function of the allele frequency of the SNP (Fig. S2.3). When a mutation's effect on fitness is correlated with its effect on the trait (τ = 0.5), I find that power is highest for SNPs with allele frequencies between 5% and 10% (Fig. S2.3A). Additionally, power is slightly higher in the population that did not expand as compared to that in the expanded population. Power is low for rare SNPs, as expected. Power is also low for very common SNPs (>10%) because, under this model, these SNPs also have the smallest effect sizes, which leads to a decrease in power. While recent population growth only has a subtle effect on the power of the association test when conditioning on allele frequency, it has a substantial effect on the number of rare causal variants. As expected, population growth increases the number of rare causal variants as compared to a population that did not expand (Fig. S2.3C). These are the variants that are very difficult to detect via single-marker association tests.
Similar to previous results, when a mutation's effect on fitness is not correlated with its effect on the trait (τ = 0), I find that demography has little effect on the power of the association test, and that the power of the test is highest for common variants (allele frequency >10%; Fig. S2.3B).
Under this model, effect sizes are not correlated with allele frequencies (Fig. S2.3B), and as such, common variants are just as likely to have large effects as are rare variants. Thus, power is highest to detect common variants. Again, however, recent population growth increases the number of rare causal variants in the population (Fig. S2.3D), which will be difficult to detect using single marker association tests.

Power as a function of odds ratio (OR)
I also examine the power of the association test as a function of the estimated odds ratio (OR) computed from the case-control study (Fig. S2.4). When a mutation's effect on fitness is correlated with its effect on the trait (τ = 0.5), power is highest for those SNPs with an OR close to 10 (Fig. S2.4A). Power decreases as the effect sizes decrease, and there is essentially no difference in power across the different models of population history. Many SNPs were present only in cases. An OR calculated for such SNPs would be infinite. However, power is low to detect such variants because they are typically at low frequency, and single-marker tests are underpowered to detect such variants [1]. Recent population growth increases the number of such mutations (Fig. S2.4C). When a mutation's effect on fitness is independent of its effect on the trait (τ = 0), power is highest for SNPs with ORs between 1.5 and 2. SNPs with higher ORs are typically at low frequency in the population, reducing the power to detect them (Fig. S2.4B).
Though the effect size on the liability scale in the population is not correlated with allele frequency (Fig. S10B), low-frequency SNPs tend to have larger ORs, simply because they are more likely to show larger relative differences in frequency between cases and controls (Fig. S10D). Again, population history has little effect on power (Fig. S2.4B). Recent growth also increases the number of mutations that are present only in cases and that appear to have ORs of infinity (Fig. S2.4D). Growth would also increase the number of mutations present only in controls that would have ORs of 0. However, the median OR is still >1, reflecting the fact that cases carry more variants that are not carried by controls (rather than vice versa). This is expected as mutations were expected to increase risk of disease and as such, cases are expected to carry more of them.
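The boundary cases described above (ORs of infinity and of 0) follow directly from how a sample odds ratio is computed. The sketch below assumes an allelic odds ratio from alternate and reference allele counts in cases and controls. The function name and the example counts are hypothetical, and the original analysis may have computed the OR differently.

```python
import math


def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Sample allelic odds ratio from a 2x2 table of allele counts.

    Returns inf when the denominator is zero and the numerator is positive
    (for example, an allele seen only in cases), 0.0 when the allele is seen
    only in controls, and nan when the allele is absent from both groups.
    """
    numerator = case_alt * control_ref
    denominator = case_ref * control_alt
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


# Hypothetical counts on 2,000 chromosomes per group (1000 cases, 1000 controls).
print(odds_ratio(3, 1997, 0, 2000))    # allele observed only in cases
print(odds_ratio(0, 2000, 3, 1997))    # allele observed only in controls
print(odds_ratio(20, 1980, 10, 1990))  # allele observed in both groups
```

An allele with only a handful of copies in the sample can easily fall entirely in one group by chance, which is why rare variants so often produce extreme estimated ORs despite low power to detect them.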