Using Ancient Samples in Projection Analysis

Projection analysis is a tool that extracts information from the joint allele frequency spectrum to better understand the relationship between two populations. In projection analysis, a test genome is compared to a set of genomes from a reference population. The projection’s shape depends on the historical relationship of the test genome’s population to the reference population. Here, we explore in greater depth the effects on the projection when ancient samples are included in the analysis. First, we conduct a series of simulations in which the ancient sample is directly ancestral to a present-day population (one-population model), or the ancient sample is ancestral to a sister population that diverged before the time of sampling (two-population model). We find that there are characteristic differences between the projections for the one-population and two-population models, which indicate that the projection can be used to determine whether a test genome is directly ancestral to a present-day population or not. Second, we compute projections for several published ancient genomes. We compare two Neanderthals and three ancient human genomes to European, Han Chinese and Yoruba reference panels. We use a previously constructed demographic model and insert these five ancient genomes to assess how well the observed projections are recovered.

The projection of a test genome onto a reference panel provides insight about the demographic relationship between the population from which the test genome is sampled and the reference population (Yang et al. 2014). The projection shows the probability of observing a derived allele at a particular site in a test genome, relative to the derived allele frequency at that site of the reference population. Thus, using a test genome that is a member of the reference population would give a projection of one for all derived allele frequency categories. If the test genome does not belong to the reference population, then the projection may show that the test genome has more or fewer derived alleles than expected given the derived allele frequency in the reference panel. Yang et al. (2014) showed that, for a two-population scenario with no migration or population size changes, if the reference panel was sampled from one population and a test genome from the other, the projection is dependent on the effective population size and the time of divergence between the two populations. The projection is given by $w(x) = e^{-t/2N}$, where $w(x)$ is the projection, $x$ is the derived allele frequency in the reference panel, $t$ is the time of divergence and $N$ is the effective population size. As the two populations diverge further back in time, it is less likely to find a derived allele found in the reference panel in the test genome.
While the projection does not depend on x for the simplest model, it does for more complex models. Through simulations, Yang et al. (2014) described the relationship between the projection and the derived allele frequency for more complex demographic models. A small amount of past migration from the reference population into the test population has little effect on the projection. Migration from the test into the reference population, however, increases the projection for small x, indicating more low frequency derived alleles are found in the test genome than expected. Population size changes, particularly in the reference population, also alter the projection such that the number of derived alleles in the test genome for different derived allele frequency categories varies with x. The two demographic processes that have the greatest effect on the shape of the projection are population size changes in the reference population and admixture from the test population into the reference population (Yang et al. 2014).
Here, we explore how the projection of an ancient sample depends on the relationship to present-day populations. Then, we present the projections of several ancient hominin genomes onto present-day human populations, as represented by Phase 3 of the 1000 Genomes (1KG) Panel (The 1000 Genomes Project Consortium 2015).

SIMULATIONS OF ANCIENT SAMPLES
To simulate demographic scenarios including ancient samples, we used fastsimcoal2 (version 2.1, Excoffier et al. 2013) to model several demographic histories, from which samples were taken to form a reference panel of n = 200 and a test genome to project onto the reference panel. For each simulation, we projected an ancient sample onto a modern population or a modern sample onto an ancient population. The ancient samples were taken at 500, 1000, 2000, 3000 and 4000 generations ago (ga). Unless otherwise indicated, the effective population size was 5000. We considered two demographic models: a one-population model (OPM, Figure 1, OPM A-E) where the ancient sample was directly ancestral to the present-day population, and a two-population model (TPM, Figure 1, TPM A-E) where the ancient sample belongs to a sister population that diverged from the present-day population.
In OPM A, no population size change or migration was applied to the population. In OPM B, we applied a pulse of admixture of 0.05 at 750 ga from an unsampled population into the present-day population. We then allowed a population size expansion from 500 to 5000 at 750 ga (OPM C), a population size decline from 5000 to 500 at 750 ga (OPM D), and a bottleneck between 500 and 1000 ga, during which the population is reduced from 5000 to 500 before recovering to 5000 (OPM E; Figure 1, OPM C-E). In the TPM, the same five scenarios were simulated. Again, we considered no population size changes or migration (TPM A), before adding migration from the sister population into the present-day population (TPM B). The three population size changes occur only in the present-day population (Figure 1, TPM C-E).
For the OPM, when the reference panel is from the present and the test genome is ancient, the projection's shape depends on the sampling time (Figure 2, top row). The projection of an ancient sample onto a reference panel made up of members of the descendant population decreases with the age of the sample. When there are no population size changes or migration (Figure 2, OPM A), the projection follows the $w(x) = e^{-t/2N}$ line, where $t$ is the age of the ancient sample, not the time of population divergence. Small amounts of admixture from an unsampled population have no effect on the projection (Figure 2, OPM B). Population size changes show different levels of effect for different sampling times. When there is a population expansion, the projection decreases for small $x$ (Figure 2, OPM C), while when there is a population decline, the projection increases for small $x$ (Figure 2, OPM D). A bottleneck results in a humped shape similar to that observed when the test genome is sampled from a related population that diverged prior to the bottleneck (Figure 2, OPM E). Changes in the sampling time result in slight changes in the shape of the projection, but the projection retains the characteristic shape for that type of population size change.
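As a rough numerical illustration of the OPM A expectation, the sketch below evaluates $w(x) = e^{-t/2N}$ at the simulated sampling times with an effective population size of 5000. The function and variable names are illustrative and are not taken from the original analysis.

```python
import math


def expected_projection(t_generations: float, n_effective: float) -> float:
    """Expected projection w(x) = exp(-t / (2N)) for the simplest model.

    The value is constant in x, so a single number describes the whole curve.
    """
    return math.exp(-t_generations / (2.0 * n_effective))


# Values stated in the simulation setup: N = 5000, ancient samples at
# 500 to 4000 generations ago (ga).
N_EFFECTIVE = 5000
SAMPLING_TIMES_GA = [500, 1000, 2000, 3000, 4000]

for t in SAMPLING_TIMES_GA:
    w = expected_projection(t, N_EFFECTIVE)
    print(f"sample age {t:>4} ga -> expected projection {w:.4f}")
```

Under these assumptions the expected line falls from roughly 0.95 for the 500 ga sample to roughly 0.67 for the 4000 ga sample, which matches the qualitative statement that the projection decreases with the age of the sample.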
The mirror scenario, where the reference panel consists of ancient samples and the test genome is sampled from the present, looks markedly different (Figure 2, bottom row). Here, the present-day test genome looks no different from the ancient population upon which it is projected. This is reasonable because the main contribution to deviations in the projection from $w(x) = 1$ is from new mutations in the reference population that are not found in the test population. When the reference panel is made up of ancient samples, there are no new mutations in the reference population that are not also in the present-day population from which the test genome is sampled. Thus, using an ancient reference panel and a test genome from the descendant population will not give insight into the demographic changes that the population has undergone between the time of sampling and the present day.
In the TPM, the results for the projection are very different from those found for the OPM. The simplest scenario (Figure 3, TPM A) highlights a difference in the projection relative to OPM A (Figure 2). In TPM A, the projection is lower for older samples, until the time of sampling is younger than the time of divergence. When the time of sampling is younger than the time of divergence, the projection no longer changes as the sampling time changes; it looks the same as if the test genome was sampled from the present day. Thus, if the time of sampling is known, the projection can determine whether an ancient sample is directly ancestral to a present-day population or a member of a related population that diverged before the time of sampling.
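The contrast between OPM A and TPM A can be sketched numerically. The piecewise rule below is an interpretation of the behaviour described in the text, not a formula given in the source: a sister-population sample younger than the divergence time is assumed to behave like a present-day sample (time $T$ of divergence), and an older sample is assumed to behave like a directly ancestral one (time equal to its age). All names are hypothetical.

```python
import math


def opm_a_projection(sample_age: float, n_effective: float) -> float:
    """OPM A: the sample is directly ancestral, so t is the sample age."""
    return math.exp(-sample_age / (2.0 * n_effective))


def tpm_a_projection(sample_age: float, divergence_time: float,
                     n_effective: float) -> float:
    """TPM A under the assumed piecewise rule.

    Assumption (not stated as a formula in the source): the relevant time
    is the larger of the sample age and the divergence time.
    """
    effective_t = max(sample_age, divergence_time)
    return math.exp(-effective_t / (2.0 * n_effective))


# Simulation values from the text: N = 5000, divergence at 2000 ga.
N_EFFECTIVE = 5000
DIVERGENCE_GA = 2000

for age in [500, 1000, 2000, 3000, 4000]:
    opm = opm_a_projection(age, N_EFFECTIVE)
    tpm = tpm_a_projection(age, DIVERGENCE_GA, N_EFFECTIVE)
    print(f"age {age:>4} ga  OPM A {opm:.4f}  TPM A {tpm:.4f}")
```

Under this reading, the two models give the same expectation only for samples at least as old as the divergence; for younger samples the TPM value stays fixed while the OPM value keeps rising toward one, which is the signal that lets a known sampling time separate the two cases.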
A pulse of admixture from the test population into the reference population shows an increase in rare alleles, but only if the test genome was sampled after the time of divergence (Figure 3, TPM B). Population size changes show the characteristic effects (decline in rare alleles for population expansion; increase in rare alleles for population decline; 'humped' effect for population bottleneck; Figure 3, TPM C-E). Similar to the TPM A case, the projections for test genomes sampled more recently than the time of divergence look the same as for when the test genome was sampled in the present.
In the TPM, when the reference panel consists of ancient samples and the test genome is sampled from the present day, the projection is again different from the reverse (Figure 3, bottom row). As the reference panel is sampled closer to the time of divergence, the projection moves closer to the $w(x) = 1$ line and away from the $w(x) = e^{-t/2N}$ expected if the reference panel was sampled from the present. Once the reference panel is sampled from a time at least as old as the time of divergence, the projection behaves as in OPM A; the test genome looks as if it was sampled from the reference population, that is, $w(x) = 1$ for all $x$. To conclude, the shape of the projection can be affected by the time of sampling. In particular, the dynamics differ notably between the case where the ancient samples are directly ancestral to the present-day samples and the case where they belong to a sister population that diverged from the present-day population. In the following analysis, we highlight when this distinction can be made with ancient hominin data.

Figure 1. Simulated demographic models used to illustrate the effect of ancient samples in a one-population model (OPM) and two-population model (TPM). The asterisk (*) represents where the present-day population was sampled and the gray dashed line indicates when the ancient genomes were sampled [0-4000 generations ago (ga)]. Any divergence occurs 2000 ga. For both OPM and TPM, A has an $N_e$ of 5000, with no population size changes or admixture. B adds a pulse of admixture from the second diverging population. C has no admixture but allows a population size expansion from 500 to 5000 in the reference population 750 ga. D allows the reverse, a population size decline from 5000 to 500 in the reference population 750 ga. E has a bottleneck from 5000 to 500, 500-1000 ga. Any diverging population has the same $N_e$ as the ancestral population.

PROJECTIONS OF NEANDERTHALS, DENISOVANS, AND OTHER HUMANS
In this study, five ancient genomes were compared to present-day human populations using projection analysis. Of the five, two are Neanderthal and three are ancient modern humans. Table 1 gives the sampling time reported by the study in which each genome was sequenced. The Vindija Neanderthal was the original Neanderthal genome sequenced, and the Mezmaiskaya Neanderthal was sequenced by Prüfer et al. (2014). The three ancient modern humans used in this study are the Ust-Ishim (Fu et al. 2014), the Loschbour and the Stuttgart genomes (Lazaridis et al. 2014). The Ust-Ishim individual died 45,000 years ago (45 kya), and is equally distant from all present-day non-Africans, with some greater admixture into present-day East Asians (Fu et al. 2014). The Loschbour and Stuttgart genomes date to around 7-8 kya, in Central Europe. The Loschbour individual was found in a hunter-gatherer site, while the Stuttgart individual was associated with the Linearbandkeramik farming culture. Both of these genomes are of West Eurasian ancestry and are members of different populations that contributed to present-day European populations (Lazaridis et al. 2014).
We project these five genomes onto three reference panels representing Europeans (CEU), Han Chinese (CHB) and the Yoruba (YRI) populations. To calculate the projection, we modified the analysis from that found in Yang et al. (2014) to use reads instead of genotypes called from the reads, in order to more accurately assess low coverage samples.
We used the CEU, CHB, and YRI panels from Phase 3 of the 1000 Genomes Project as the reference panels (The 1000 Genomes Project Consortium 2015). We considered only biallelic sites where the mutation was a transversion. We filtered out any sites where the mapping quality was less than 30, and for each ancient genome we kept only sites where the coverage was within the 2.5%-97.5% interval of the coverage distribution unique to each sample (Table 1, minCov and maxCov). The derived allele frequency of the reference panel was determined by using the genotypes assessed in the Phase 3 panels and the ancestral allele called in the Phase 3 1000 Genomes data set. For each site, the test genome was called derived or ancestral by choosing randomly from the set of reads for that site. The projection was calculated across all autosomal sites that were not filtered out by the above criteria. A minimum projection value (MPV) was calculated using the average projection for $x > 0.5$. The projections within each panel were compared to each other and to the line $w(x) = 1$ using the sum of least squares (LSS) score (Table 2).

In the projections, there are several notable characteristics (Figure 4, black curve and Table 2). First, with respect to the reference panel refCEU, the projections for the ancient samples can be divided into three main groups. The Neanderthals have the lowest projections, with MPV values of 0.4622 and 0.4802 (Figure 4, top row). Both Neanderthals show a substantial increase in rare alleles and have very similar projections (LSS = 0.61, Table 2). The Ust-Ishim shows the next lowest MPV of 0.9027 (Figure 4, top row) with minor deviations from a horizontal line likely indicative of population size changes in the refCEU population. The Loschbour and Stuttgart genomes lie almost on the $w(x) = 1$ line (Figure 4, top row and Table 2; LSS = 0.47 and 0.30), with a slight decrease for small $x$.
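The read-based procedure described in the methods above (coverage filter, one read drawn at random per site, projection per derived allele frequency class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout, the function names, the exclusion of sites that are not segregating in the panel, and the definition of the projection as the observed fraction of derived calls divided by $x$ are all assumptions made here.

```python
import random
from collections import defaultdict


def projection_from_reads(sites, panel_size, min_cov, max_cov, rng=None):
    """Estimate w(x) for one test genome from per-site reads.

    sites: iterable of (derived_count, reads), where derived_count is the
        number of derived alleles among the panel's chromosomes and reads is
        a list of "D" (derived) or "A" (ancestral) calls from the test genome.
        This layout is hypothetical.
    panel_size: number of chromosomes in the reference panel.
    min_cov, max_cov: coverage cutoffs (for example the 2.5% and 97.5%
        points of the sample's coverage distribution).
    """
    rng = rng or random.Random()
    derived_calls = defaultdict(int)
    site_counts = defaultdict(int)

    for derived_count, reads in sites:
        # Assumption: keep only sites segregating in the reference panel.
        if not 0 < derived_count < panel_size:
            continue
        # Coverage filter on the test genome.
        if not min_cov <= len(reads) <= max_cov:
            continue
        # Call the site from a single randomly chosen read.
        call = rng.choice(reads)
        site_counts[derived_count] += 1
        derived_calls[derived_count] += int(call == "D")

    projection = {}
    for k in sorted(site_counts):
        x = k / panel_size
        observed = derived_calls[k] / site_counts[k]
        # A member of the reference population is expected to carry the
        # derived allele with probability x, giving w(x) = 1.
        projection[x] = observed / x
    return projection


# Toy input: half of the sites at x = 0.5 carry the derived allele.
toy_sites = [(50, ["D", "D"])] * 5 + [(50, ["A", "A"])] * 5
print(projection_from_reads(toy_sites, panel_size=100, min_cov=1, max_cov=10))
```

Drawing a single read per site avoids calling genotypes, at the cost of extra sampling noise in each frequency class.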
For the refCHB reference panel (Figure 4, middle row and Table 2), the projections for the Neanderthal genomes are nearly identical to those observed for the refCEU panel (MPV values of 0.4323 and 0.4490, Figure 4, middle row). The Ust-Ishim, Loschbour and Stuttgart projections all indicate they are not members of the CHB population (MPV = 0.8626, 0.8748, and 0.8632, Figure 4, middle row). LSS values for each projection are all very high, ranging from 13.61 to 170.61, further supporting that none of these ancient genomes are directly ancestral to the Han Chinese (Table 2). Finally, for the refYRI panel, the projections are unusual (Figure 4, bottom row), but similar to those observed by Yang et al. (2014). The Neanderthals have a higher projection onto the refYRI panel (MPV values of 0.6539 and 0.6697, Figure 4, bottom row) than onto the non-African panels. The higher MPVs are probably because the Yoruba did not undergo the same bottleneck detected in non-Africans. For non-Africans, the projection increases for large $x$, which was shown in simulations of Yang et al. (2014) to be due to high levels of ancient admixture between the ancestral Yoruba and non-African populations, as well as a population decline in the Yoruba population. This results in a closer fit to the $w(x) = 1$ line and lower LSS scores (Table 2), despite the fact that these genomes are not ancestral to the present-day Yoruba population. The shape of these projections is very similar to those for present-day non-Africans relative to the refYRI panel (Yang et al. 2014).

Table note: $w$ is the value of the projection and $x$ is the derived allele frequency in the reference population.
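The two summary statistics used in this section can be sketched as below. The MPV follows the stated definition (average projection for $x > 0.5$). For the LSS, the source does not spell out binning or weighting, so the version here is an assumption: an unweighted sum of squared differences over the frequency classes shared by the two curves, with the line $w(x) = 1$ as the default comparison. Function names are illustrative.

```python
def minimum_projection_value(projection, threshold=0.5):
    """Average of w(x) over frequency classes with x above the threshold."""
    values = [w for x, w in projection.items() if x > threshold]
    if not values:
        raise ValueError("no frequency classes above the threshold")
    return sum(values) / len(values)


def lss_score(projection_a, projection_b=None):
    """Sum of squared differences between two projections.

    If projection_b is omitted, projection_a is compared to w(x) = 1.
    Assumption: unweighted sum over the frequency classes present in both.
    """
    if projection_b is None:
        return sum((w - 1.0) ** 2 for w in projection_a.values())
    shared = set(projection_a) & set(projection_b)
    return sum((projection_a[x] - projection_b[x]) ** 2 for x in shared)


# Toy projection, not real data: keys are x, values are w(x).
toy = {0.25: 1.2, 0.5: 1.0, 0.75: 0.8, 0.9: 0.6}
print(minimum_projection_value(toy))
print(lss_score(toy))
```

A score near zero against $w(x) = 1$ indicates a test genome that looks like a member of the reference population, while a large score indicates a curve far from that line.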

COMPARING THE PROJECTIONS TO A SIMULATED DEMOGRAPHY
To gain greater perspective on how the projections of these ancient genomes relate to human demographic history, we compared the ancient genomes to simulated projections taken from a demographic model. We used the demographic model that best fit the set of projections for modern humans published in Yang et al. (2014), which included eight populations of European, African, East Asian, and Papuan origin, and the Altai Neanderthal and Denisovan. For each ancient genome, we simulated the same demographic model, adding a single simulated sample retrieved at the time indicated in Table 1, where one generation is assumed to be 25 years. The Neanderthals were placed on the Neanderthal lineage, the Ust-Ishim genome shared a common ancestor with Europeans and East Asians, and the Loschbour and Stuttgart genomes were placed on the European lineage (Figure 5), in accordance with the conclusions of their respective studies (Prüfer et al. 2014; Fu et al. 2014; Lazaridis et al. 2014).
Using fastsimcoal2 (version 2.1, Excoffier et al. 2013) and Brent's algorithm, the time ($T_8$) and amount ($f_{NEA-ANC1}$) of Neanderthal admixture, the time of Neanderthal divergence ($T_{15}$) and the recent admixture from Europeans to Yoruba ($f_{FRE-YOR}$) were allowed to vary to improve the fit of the projections (Figure 5, bolded). The LSS was calculated when each simulated and real projection was compared (Figure 4, LSS score in top right corner). Using a time of Neanderthal divergence of 610,175 years, with admixture into non-Africans 38,950 years ago of 0.018, and recent admixture 7,500 years ago from Europeans to Africans of 0.02 (Table 3), the simulated and observed projections exhibited low LSS scores (Figure 4).
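Because the simulations work in generations while the fitted times and sample ages are reported in years, a small conversion helps when reading the parameters. The sketch below uses the 25 years per generation stated in the text; the dictionary labels are descriptive names chosen here, not parameter names from the model.

```python
YEARS_PER_GENERATION = 25  # generation time assumed in the text


def years_to_generations(years, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time in years to generations."""
    return years / years_per_generation


# Times quoted in the text, in years before present.
times_in_years = {
    "Neanderthal divergence": 610_175,
    "Neanderthal admixture into non-Africans": 38_950,
    "European-to-Yoruba admixture": 7_500,
    "Ust-Ishim sample": 45_000,
}

for label, years in times_in_years.items():
    generations = years_to_generations(years)
    print(f"{label}: {years} years = {generations:.0f} generations")
```

With this generation time, the fitted Neanderthal divergence corresponds to 24,407 generations and the Ust-Ishim sample to 1,800 generations.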

DISCUSSION
Simulated scenarios show that the projection can distinguish between samples directly ancestral to a reference population and samples that belong to a sister population that diverged from the reference population. The Neanderthal projections are all very similar to each other with respect to each reference panel, despite the differences in sampling time. They also look very similar to the Altai Neanderthal and Denisovan projections analyzed in Yang et al. (2014). Therefore, these genomes belong to a sister group, and the reconstructed demographic history that recovers the observed projections also places them all in a sister group. These results concur with the conclusions of previous studies (Prüfer et al. 2014; Meyer et al. 2012; Reich et al. 2010). The increase in rare alleles for their projections onto the refYRI panel was recovered by including some recent admixture from Europeans to the Yoruba population. Another scenario that was not illustrated here is direct admixture from Neanderthals, or a sister group to Neanderthals, into the ancestral Yoruba population. This is unlikely, as recent studies have proposed recent admixture from non-African to African populations (Wang et al. 2013; Wall et al. 2013), and another (Gallego Llorente et al. 2015) has shown that there is European gene flow back into many regions of Africa. While we simulated direct admixture from the CEU population to the Yoruba, the admixture may have come from a population distinct from the ones to which the Loschbour and Stuttgart genomes belong. Accounting for this may improve the fit of the Loschbour and Stuttgart projections onto the refYRI panel.

Table 1 notes: Covg, coverage; MinCov, minimum coverage; MaxCov, maximum coverage. (a) In thousands of years, roughly taken from the date ranges found in the reference. (b) The average coverage given in the reference. (c) The 2.5% and 97.5% interval cutoffs for the coverage that were used in the analysis.
The Ust-Ishim genome is different from both the European and East Asian panels, showing it is likely not a member of either population, but it behaves similarly to other non-Africans with respect to the Yoruba panel. When a simulated ancient sample was placed directly ancestral to Europeans and East Asians 45 kya, the simulated projection was very similar to the observed projection, illustrating that the shape of the projection can largely be attributed to the population size changes in Europeans and East Asians after the Ust-Ishim was sampled. The Loschbour and Stuttgart genomes sit on the $w(x) = 1$ line when projected onto the refCEU panel, but not when projected onto the refCHB or refYRI panel. The projections show that the Loschbour and Stuttgart could be considered the same population as present-day Europeans. Lazaridis et al. (2014) showed that these two genomes are members of different ancestral source populations for present-day Europeans. Though Europeans are composed of several different source populations, the projections show only that these two genomes are ancestral to Europeans; they do not indicate whether there are other ancestral populations as well.
Projections provide a visually appealing method of comparing a single genome against a set of genomes belonging to a well-studied reference population. When the genomes sampled are ancient, the projection can distinguish between several different demographic scenarios, providing further insight into potential demographic models to test using more statistically rigorous analyses.

CONCLUSIONS
Projection analysis is a useful tool for studying the relationship between two populations. Here, we have demonstrated the effects on the projection when ancient samples are included. For scenarios where the ancient population is directly ancestral to the modern population, if the test genome is ancient and the reference panel is modern, the projection reflects the changes in the reference panel since the sampling time. However, when the test genome is modern and the reference panel is ancient, the projection of the test genome is on the line $w(x) = 1$, despite the time that has passed since the ancient genomes were present.
In the alternate scenario where the ancient sample is a member of a sister population, if the test genome is ancient and the reference panel is modern, the projection looks the same as when the test genome is sampled from the present, provided the sample is younger than the time of divergence. In the reverse situation, when the test genome is modern and the reference panel is ancient, the projection of the test genome moves closer to the $w(x) = 1$ line as the reference panel is sampled nearer to the time of divergence.
We studied the projections of several ancient hominin genomes. Neanderthals were not directly ancestral to modern humans. The Ust-Ishim projection looks ancestral to both Europeans and East Asians, and the Loschbour and Stuttgart projections suggest that they are ancestral to Europeans, but not to East Asians or the Yoruba.
Projections provide insight into the ancestry of ancient genomes and their relationship to present-day populations. Future studies of ancient genomes may find projections useful as a test for the ancestral relationship between the ancient sample and present-day populations. While not a method of demographic inference, the projection's shape provides clues as to the direction of further model testing using formal methods.

Figure 5. The placement of the five ancient genomes (black circles) in the demographic model described in Figure 7 of Yang et al. (2014) (shaded gray; Table 1 of Yang et al. (2014) contains parameter values). The times of sampling for these five genomes are included in Table 1. Bolded parameters are those that were modified to improve the fit onto the projections (values in Table 3).