A weakly structured stem for human origins in Africa

Despite broad agreement that Homo sapiens originated in Africa, considerable uncertainty surrounds specific models of divergence and migration across the continent1. Progress is hampered by a shortage of fossil and genomic data, as well as variability in previous estimates of divergence times1. Here we seek to discriminate among such models by considering linkage disequilibrium and diversity-based statistics, optimized for rapid, complex demographic inference2. We infer detailed demographic models for populations across Africa, including eastern and western representatives, and newly sequenced whole genomes from 44 Nama (Khoe-San) individuals from southern Africa. We infer a reticulated African population history in which present-day population structure dates back to Marine Isotope Stage 5. The earliest population divergence among contemporary populations occurred 120,000 to 135,000 years ago and was preceded by links between two or more weakly differentiated ancestral Homo populations connected by gene flow over hundreds of thousands of years. Such weakly structured stem models explain patterns of polymorphism that had previously been attributed to contributions from archaic hominins in Africa2–7. In contrast to models with archaic introgression, we predict that fossil remains from coexisting ancestral populations should be genetically and morphologically similar, and that only an inferred 1–4% of genetic differentiation among contemporary human populations can be attributed to genetic drift between stem populations. We show that model misspecification explains the variation in previous estimates of divergence times, and argue that studying a range of models is key to making robust inferences about deep history.


Details about populations used in analyses
inference due to batch effects (as demonstrated in Section 7.6). We therefore wanted to be able to perform 155 joint calling in all of our samples. Together, these considerations make the jointly called Thousand Genomes 156 and ADRP dataset best-suited for our demographic inference. 157 From a study design perspective, our choice of populations aimed to improve our understanding of early 158 human demography in Africa. The Nama derive ancestry from a relatively deeply diverged population 159 branch, and thus their demographic history is of unique interest. Given previous archaeological and genetic 160 evidence, we expected that a minimal model for the Nama must include pastoralist East African groups 161 that contributed to the Nama (e.g., Uren et al., 2016), as well as European colonial admixture. The 162 inclusion of three Ethiopian populations here represent some uncertainty in the possible source population 163 for the pastoralist contribution to the Nama, and the Amhara and Oromo also have substantive "back-164 to-Africa" ancestry. The Gumuz serve as a representative of southwest Ethiopian ancestry that pre-dates 165 the Holocene back-to-Africa migration. Finally, because previous studies have suggested a role for "ghost"    Figure S40). Data filtering 182 was identical for both analyses. We retained biallelic SNPs with no missing data and minor allele frequency 183 greater that 5%. We then thinned the remaining sites to remove SNPs in high linkage disequilibrium with one another. We used the locate unlinked function from scikit-allel version 1.2.1 (Miles et al., 2021) to identify SNPs in low linkage disequilibrium, using a threshold of r 2 = 0.1, a window size of 100 SNPs, 186 and a step size of 20 SNPs, removing the remaining SNPs. This process was iterated five times for each 187 chromosome, resulting in 327,656 roughly unlinked SNPs used in the PCA and ADMIXTURE analyses. 188 2 Computing statistics 189 2.1 LD and diversity statistics used in model fits 190 We used multi-population linkage disequilibrium (LD) and pairwise diversity statistics to fit demographic 191 models to data. These statistics, introduced and described in detail in Ragsdale and Gravel (2019), are the 192 multi-population analog of the classic LD statistics first described and computed by Hill and Robertson 193 (1968); Ohta and Kimura (1971). 194 Given two biallelic loci, with alleles A and a and the left locus and alleles B and b at the right locus, the 195 standard covariance measure of LD is D = p AB p ab −p Ab p aB , where p AB is the frequency of AB haplotypes in 196 a population (and thus the probability of drawing an AB haplotype in a random sample of that population). 197 Hill and Robertson (1968) solve for the expectation of D 2 using a system of equations that includes  indexing. That is Dz i,j,k = D i (1−2p A,j )(1−2p B,k ) and π 2;i,j,k,l = p A,i (1−p A,j )p B,k (1−2p B,l ). We consider 207 statistics normalized by π 2 in an arbitrary reference population (throughout, we use the Mende π 2 ), which 208 removes any dependence on the mutation rate. Thus, statistics take the form HapMap maps were inferred using array data, which is sparser than whole-genome sequencing data as was 234 used in the Nama map inference. In subsequent analyses, we used data based on recombination rates from 235 the OMNI YRI map, and in the Supplemental Results we show that the inferences are robust when using 236 the Nama recombination map (Section 7.1). 237 We removed bins of recombination distance less than a recombination distance of r = 5×10 −6 (at a rough 238 estimate of 1 cM/Mb, this corresponds to a minimum distance of 500 bp on average) to avoid previously 239 reported biases at short distances due to processes like multinucleotide mutations (Harris and Nielsen,240 2014; Ragsdale and Gravel, 2019). To avoid uncertainty in phasing, we used unphased genotypes to 241 compute LD statistics, as proposed in Ragsdale and Gravel (2020).

242
Statistics were computed from intergenic loci, as described in Section 1.4, to avoid biases due to direct 243 or indirect selection in protein-coding regions. To assess how strongly selection in coding regions affects LD 244 and diversity statistics, we also computed statistics from all regions of the genome that pass the Thousand 245 Genomes strict mask. Genome-wide statistics showed slight deviations from statistics inferred from intergenic 246 regions alone ( Figure S9), although the differences were smaller than the differences between our best-fit 247 models and the data they were fit to. Nonetheless, we chose to use only intergenic data to avoid possible 248 biases due to selection in and around protein-coding regions.

256
Some statistics, such as E[D i D j ], require at least two diploid samples from a population to compute.

257
Since we used a single Neanderthal sample, such statistics for the Neanderthal population were not used in 258 the fit. By contrast, there are statistics that only require a single sample in a given population to estimate. 259 These include cross-population nucleotide diversity measures, as well as some LD statistics involving more 260 than one population. For example, statistics of the form D human (1−2p human )(1−2q neanderthal ) require a single 261 Neanderthal sample and are informative of the Neanderthal demography. These statistics were included in 262 the fit, but statistics requiring more than one Neanderthal sample to estimate were removed. As is commonly done in demographic inference, all models we considered are initialized from an ancestral 280 panmictic population at mutation-recombination-drift equilibrium. This allows for rapid initialization of 281 the summary statistics to reasonable values. The size N e of this ancestral population is a model-specific 282 adjustable parameter that also serves as a reference size to define subsequent population size changes.

283
For the early history following this initialization, we tested model parameterizations that cover many 284 of the proposed scenarios of population structure, size changes, and/or archaic hominin admixture. The we account for both scenarios with this parameterization.

293
To include population structure in early human history, we considered multiple parameterizations of 294 models that allowed more than a single stem population. In general, stem populations were allowed to vary 295 in their sizes, split times, and migration rates, with parameterizations flexible enough to encompass proposed 296 scenarios of either archaic hominin admixture or population structure, both connected by gene flow or with 297 isolation between stems.

298
In one parameterization of early structure, which we refer to as a "continuous migration" model, a 299 secondary stem population (stem 2) split from the primary stem (stem 1) that leads to contemporary

303
This continuous migration was allowed until stem 2 disappeared, which occurred as recently as 5ka. We 304 tested models that both allowed or disallowed migration between stems, i.e., before stem 1 split into S/E/W 305 African populations.

306
In another parameterization of early structure, a secondary stem population (stem 2) contributed ancestry 307 to present-day populations via a series of instantaneous admixture events (i.e., pulse admixture or "merger" 308 events) to lineages leading to sampled present-day African populations. Merger events were allowed to occur 309 in one or more of the Nama, Mende, and Gumuz branches, as well as the branch of East and West Africans 310 prior to their split. Those admixture events were allowed to occur at any time along those branches, and 311 with any proportions, and stem 2 was allowed to split from the primary stem at any time before subsequent 312 divergences (and either before or after the split of the Neanderthal branch). We tested models that allowed 313 migration between the early stems, before subsequent splits and admixture events, as well as models that 314 were restricted to isolation between stem branches. Depending on the specific parameters, such models 315 encompass commonly-considered scenarios for admixture from a ghost archaic hominin population (e.g., if 316 a long-isolated lineage more recently contributes a minority of ancestry to one or more populations), as well 317 as relatively simple fragmentation-coalescence scenarios.

318
Based on the geographical locations of present-day populations, we tentatively labeled ancestral branches 319 using a parsimony in migration, referring to South, East, and West African branches (S/E/W AFR, Fig-320 ures 1, 2). We do not know the geographical location of these ancestral populations (nor even if they 321 correspond to populations with a well-defined geographic range), and these labels should be considered as When testing multiple variations of the more complex models, we kept these values fixed (see Table S2  while the models with multiple stems all provide reasonable fits to the data. The merger-with-stem-migration 449 model performed the best of these four models, although the fits were nearly visually indistinguishable from 450 the fits from the merger-with-stem-migration model.

451
A simulation-based validation of the confidence interval calculation is provided in Section 7.5. For each model tested, we ran multiple rounds of likelihood optimization, alternating between the opti-454 mize log fmin, optimize log powell, and optimize lbfgsb methods to explore parameter space and hone in on 455 the best fit parameters. We imposed bounds on many of the parameters in the fits. This is done primarily  Event times were only constrained to occur in the proper order (so that populations were required to have 461 positive existence times, for example). Most best-fit parameters were away from these boundaries.

462
Initial guesses for parameters were chosen from demographically plausible starting points and then per-463 turbed to explore space, using gradient descent (on the log of the parameters). The best fits from these initial  branch" is a branch within a marginal tree (meaning, the tree at a given locus) that has its upper end (i.e., 484 the node of its coalescence with another branch) extending to more than 1 million years in age, and we where H ij is the average number of differences between one haploid copy of the genome from population i 511 and one from population j, and H i and H j are the average number of differences for two haploid copies from 512 the same population.

513
In all inferred models, F ST between Neanderthals and the ancestors of contemporary humans were much 514 larger than between any pair of contemporary human populations or their ancestors (stems). F ST between 515 human populations over the last 100ka remained relatively low, at ∼ 0.1 or less. In the continuous-migration 516 model, the two stem populations also were predicted to have low F ST within this range, as migration 517 rates were inferred to be high enough to maintain genetic similarity. In both the merger-without-stem-518 migration and the merger-with-stem-migration, one of the stem populations was inferred to have a short, 519 severe bottleneck, which led to increased F ST between stem populations during that time. In the merger- To evaluate the degree to which divergence among contemporary human populations is inherited from di-533 vergence between early stem population, we computed f 4 statistics of the form This statistic measures co-linearity of genetic drift between contemporary populations v = (p present 1 − 535 p present 2 ) and stem populations w = (p stem1 − p stem2 ) (see Figures S16-S19 and Lipson (2020) for an 536 overview of f 4 statistics).

537
To contrast these statistics to divergence across contemporary populations, we interpret the f -statistics as scalar products between vectors v and w, i.e., f 2 (present 1 , present 2 ) = v · v 539 and f 4 (stem 1 , stem 2 ; present 1 , present 2 ) = v · w. We can then write v = v ⊥ + v ∥ as the sum of a com-540 ponent v ⊥ perpendicular and a component v ∥ parallel to w = (p stem1 − p stem2 ). We can then decompose Finally, the proportion of present-day structure v 2 = f 2 (present 1 , present 2 ) that is accounted for by the 543 component parallel to ancient population structure is α 2 can be interpreted as the proportion of the f 2 divergence across contemporary populations that is aligned 545 with the differences in allele frequencies across ancient populations. However, drift occurring after the time 546 of sampling of ancient populations is not correlated to drift occurring before that time. For this reason, we 547 also interpret α 2 as the proportion of the f 2 divergence across contemporary populations that is explained 548 by the differences in allele frequencies across ancient populations. We report this predicted statistic for each 549 of the four inferred demographic models in Figures S16-S19. In our best fit models (continuous migration 550 and merger with stem migration), differentiation between Stem 1 and Stem 2 (before the split of Stem 1 551 into Stem 1E and Stem 1S in the merger model) contributes a maximum of 4% to differentiation between 552 the Nama and Gumuz, and closer to 1% to differentiation between the Nama and Mende or the Mende and 553 Gumuz.  Figure S1: Present-day structure attributable to ancient population structure. The statistic f 4 (stem 1 , stem 2 ; present 1 , present 2 ) captures how much of present-day population structure can be attributed to population structure among the stems. The blue line represents genetic drift along ancestral branches between contemporary populations, while the red line represents drift between stem populations. f 4 measures the overlap between the blue and red line, which corresponds to shared genetic drift between the two pairs. In scenario A, contemporary populations are descended from distinct stems, and f 4 (stem 1 , stem 2 ; present 1 , present 2 ) equals the drift between stems, f 2 (stem 1 , stem 2 ). In scenario B, despite deep population structure, there is no shared drift and f 4 = 0. That is, present-day population structure is unrelated to structure among stems. In more realistic scenario C, where present 1 receives a proportion m of its ancestry from the dotted arrow, the overlap is only found along the corresponding path, and f 4 depends on both ancestry proportion m and drift between stems, f 4 (stem 1 , stem 2 ; present 1 , present 2 ) = mf 2 (stem 1 , stem 2 ) relevant to the modeling presented here. The models we inferred account for both Neanderthal admixture 558 in Eurasia and Back-to-Africa migrations, such that they predict the amount of Neanderthal ancestry that 559 we would expect to find in African individuals. We find that African populations without high proportions 560 of recent Eurasian ancestry are predicted to carry 0.1 − 0.2% Neanderthal ancestry (Table S8) Here we show how the predictions of our model can be related to morphological variation in the fossil record.

567
Let ∆ 2 be the expected squared difference in a morphological measurement between fossil 1 and fossil 2, whereX 1 and V 1 andX 2 and V 2 are measurement means and variances for the populations from which fossil 569 1 and fossil 2 are sampled.

570
If we assume that the trait being measured is evolving neutrally, which seems reasonable as an approx- and which relates ∆ 2 to between and within population coalescence times.

580
As a further approximation, if we assume that the heritability of the measurement is close to one half, which shows that under these assumptions ∆ 2 is proportional to τ B .

583
Nucleotide diversity, π, for pairs of individuals sampled from different populations is also proportional to 584 τ B . Therefore, ratios of ∆ 2 can be predicted from ratios of π for the same comparisons. For example, a ratio 585 for which π for Stem 1 and Stem 2 is the numerator and π for a pair of contemporary human populations 586 is the denominator can be used to predict ∆ 2 for two fossils sampled from each of the two stems based on 587 ∆ 2 for contemporary humans. In this way, the predictions of our model can be related to morphological 588 variation in the fossil record. Eurasian populations, as these would be poorer proxies for early Holocene and recent colonial migration from 604 the Near East and Europe back to Africa, and it is unclear how such a model should be interpreted. We 605 therefore refit each of our best-fit demographic models using the following alternate population sets: (Nama, 606 YRI, Gumuz, Oromo/Amhara, and CEU), and (Nama, MSL, Gumuz, Oromo/Amhara, and TSI). population structure may be impacted by recent admixture.

615
To assess the impact of recent European admixture in Nama individuals on inferred early divergence 616 times and population structure, we subset the set of Nama individuals that we included based on inferred 617 European-matching ancestry proportions. We used inferred ancestry proportions from ADMIXTURE with the number of clusters K = 4 (Section 1.5) and kept individuals with either at least a) 90% or b) 99% proportion 619 of inferred ancestry that clusters within the Nama group (colored green in Figure 2D). This left a) 18 Nama 620 individuals with at least 90% inferred South African ancestry, and b) 14 Nama individuals with at least 621 99%. For each subset, we recomputed diversity and heterozygosity statistics and fit each of our demographic 622 models to these statistics. In order to reduce parameter space to avoid overfitting, we fixed parameters that have previously been 625 inferred, including the timings of the early back-to-Africa migration contributing to the ancestry of East

626
African agricultural populations (12ka), the divergence of West and East Africans (60ka), and the out-of-

627
Africa migration event (50ka). Due to variability in inference and dating methods, there is some uncertainty 628 surrounding these dates. We explored how variable timings of these events effect our inferred models by 1) 629 increasing the fixed times of the E-W African split (to 100ka) and the out-of-Africa event (to 60ka), and 2) 630 allowing those two parameters to be jointly fit in model optimization. Here we focus on four parameterizations of early human history: (1) a "single origin" with recent expansion,

639
(2) primary and second stem populations with "continuous migration", (3) two stem populations that do not 640 exchange migrants before merging to form regional African populations, referred to as a "merger without 641 stem migration", and (4) two stem populations that are allowed to exchange migrants before merging to form 642 regional African populations, referred to as a "merger with stem migration". Given the best fit parameters 643 for each model (Tables S3-S7)  We compared the observed cSFS to model predictions for the Nama, Mende, and Gumuz populations. The 664 cSFS from data is sensitive to ancestral allele misidentification, but our simulations assume perfect knowledge 665 of the ancestral allele. To investigate the role of misidentification, we compared predicted and observed cSFS 666 for transitions and transversions separately as well as jointly.

667
For each comparison, we fit a misidentification parameter p misid as well as a scaled mutation rate θ.

668
The misidentification parameter is the fraction p misid of loci that have the ancestral state misidentified.

669
Accounting for p misid mixes the predicted joint population SFS with a small amount of the ancestral-allele-  northern Europeans (CEU), respectively, and refit our inferred demographic models to diversity and LD 683 statistics computed from those populations (along with Nama, Oromo/Amhara, and Gumuz). Because the 684 MSL and YRI share recent West African ancestry, and the GBR and CEU share recent European ancestry, 685 we expected most parameters in inferred models using these two datasets to be largely similar. Indeed, all 686 parameters inferred using the YRI and CEU data were well within one standard error of the same parameters 687 inferred using the MSL and GBR data, across all tested models. The one exception to this was the recent 688 size of the YRI branch, inferred to be somewhat smaller than the final size of the analogous MSL branch 689 (≈ 20, 000 vs. ≈ 30, 000, resp.). Other population sizes, migration rates, and divergence times were highly 690 consistent.

691
Because CEU and GBR are genetically closely related, we also tested the consistency of inferred param-692 eters when replacing GBR with Toscani from Italy, a more distantly related European population set than 693 GBR. Most parameters in each of the tested models were consistent with parameters inferred using GBR.

694
The only large difference was found in fitting the continuous migration model, where the Nama divergence between the models fit using GBR and TSI, with ancestral populations "reshuffling" between 100 and 120ka.

707
In summary, across all validations to alternative datasets described in this section, model fits were largely The LD statistics computed using the OMNI YRI recombination map and the Nama-specific recombination 716 map were very similar, with only a handful of statistics showing slight visible differences ( Figure S8). We 717 therefore expected inferred demographic models to be highly consistent when fit to the two sets of statistics.

718
Across each of our models, inferred parameters from fits using data from the Nama-specific map were 719 unchanged, with best-fit values all within one estimated standard error from the inferred parameters from 720 the OMNI YRI map.

721
Among human populations, the Nama are relatively highly diverged from the sampled populations used 722 in estimating the OMNI recombination maps (such as the YRI), and the two recombination maps were 723 inferred using different computational methods from different types of sequencing data (arrays vs. whole 724 genomes). Therefore, differences between these two maps are likely near the upper bound for differences in 725 recombination rates between present-day human populations (van Eeden et al., 2022). Because our fits were 726 highly consistent when using the two recombination maps, we conclude that inferences using the diversity 727 and LD statistics here are robust to variation in recombination rates between present-day populations as 728 well as to any differences in inferred recombination rates due to true recombination rate differences or typical 729 statistical error in recombination rate estimation. proportions. Nonetheless, we included Nama individuals that exceeded either 90% or 99% ancestry primarily 741 shared among Nama individuals.

742
For both thresholds, early history was robust in all tested models. Divergence times and migration 743 rates did not vary significantly from the original fits that included all unrelated Nama individuals. The 744 primary differences between these fits and the original fits that did not impose an ancestry threshold were 745 the inferred admixture proportions of East African agriculturalists (2ka) and Europeans (10 generations ago) 746 in the Nama population. The East African admixture proportion was reduced from ≈ 25% to 20 − 22%, 747 while the European admixture proportion was reduced from ≈ 15% to either 5 − 6% (with a 90% ancestry 748 cutoff) or 3 − 5% (with a 99% ancestry cutoff). No other parameters were strongly affected by subsetting the 749 Nama individuals by ADMIXTURE-inferred ancestry proportions. Because only the inferred proportions of 750 recent admixture in our model fits changed, we conclude that our inferences of early history are robust to 751 recent admixture, as long as that recent admixture is accounted for in the demographic models. divergence times to 60ka (for the out-of-Africa expansion) and 100ka (for the E/W African divergence). In 758 the next section, we allow those parameters to be fit along with all other model parameters.

759
By fixing those divergence times to older dates, the inferred models preferred older dates for the other free 760 divergence time parameters. These dates were stretched by a similar proportion that our fixed parameters 761 were shifted, so that the Nama divergence time was inferred to be ≈ 150ka, rather than ≈ 130ka. Notably, 762 older divergence times resulted in models with worse likelihoods than our primary presented results. Overall, 763 these inferred models have a similar interpretation to the primary inference results, and the deep population 764 structure was unaffected by these choices.

765
When we allowed the out-of-Africa expansion and eastern/western African divergence times to be jointly 766 fit in model optimization, we found that the best-fit inferred E/W divergence time and the OOA expansion 767 were more recent (45 − 55ka and 35 − 40ka, resp.). The remainder of the parameters were largely unaffected 768 by these shifts to more recent divergence times, and our interpretation of the inferred early history remains 769 unchanged. The 95% confidence intervals around these parameters were relatively narrow (≈ 5ka for the 770 OOA split, ≈ 10ka for the East/West African split), although this uncertainty does not capture uncertainty 771 due to model mis-specification (as discussed in Section 3.1).

772
In summary, fixing these parameters (i.e., East Africa-Eurasian split at 50ka, Neanderthal admixture at The single-origin model fit the cSFS relatively poorly across the three populations compared to the models 784 that allow structure between stem populations (Fig. S20). The model that best fit the cSFS was the merger- The IICR is only a reliable estimator of N e in the absence of population structure. However, since IIRC 800 are readily estimated and visually interesting, it is commonly attempted even when the assumption is not 801 met. Even if the assumption is not met, the inferred IIRC can be interpreted as a summary statistic for 802 which model predictions can be compared to data. It can therefore be informative about aspects of the data 803 that our models fail to predict. We also note that the coalescence rates inferred from data used a larger set  (Fig. S24). The model-inferred N e curves also show this general trend for the more recent reduction in 810 N e (Fig. S26). In the the more distant past, N e fluctuates in a manner that is broadly similar across 811 models with observed "humps" of increased N e . Despite the large differences in parameterizations and 812 interpretations of the four inferred demographic models, each of them show reconstructed N e trajectories 813 that are qualitatively similar, highlighting the difficulty in learning complex demography in the deep past 814 from reconstructed coalescence rate trajectories. 815 We also compared the relative cross-coalescence rates (RCCR) between pairs of populations in the four 816 inferred demographic models (Figures S29-S32) to those in the data (Fig. S28). In the recent past (<10ka), recombination rate r = 10 −8 and mutation rate u = 2 × 10 −8 .

845
As expected, the clean split models consistently provide divergence times T that are more recent than 846 the IM models. Because T and subsequent migration can be inversely confounded in SFS inference, fixing 847 migration rates to zero results in more recent inferred T (Fig. S36). Even though each of our four models

866
To do this, we simulated 500 1Mb regions using the same sample sizes as our analyzed data, with per-base mutation and recombination rates u = r = 10 −8 , using msprime (Baumdicker et al., 2022). From these 868 simulated regions, we replicated our bootstrapping procedure by resampling (with replacement) 500 regions. 869 We repeated this 500 times (to obtain 500 bootstrap replicate datasets). We fit the full (non-bootstrapped) 870 simulated dataset and each of the 500 bootstrap replicates, and compared the inferred parameter values to 871 the true simulated values (Figures S37 and S38).  Finally, many features of recent human history were consistently parameterized across each of our tested 880 models (such as recent migration rates and population sizes). We evaluated the consistency of these recent 881 parameters across our different models ( Figure S39). Since all models use the same set of bootstrapped simu-  coverage data, and 2) demonstrate that merging data without joint variant calling can lead to substantial 898 biases in inferred demographic parameters. We analyzed two populations, the Mende (MSL) and British 899 (GBR), and considered a two-population isolation-with-migration model. 900 We computed four sets of diversity and LD statistics, one set of statistics from the jointly called high 901 coverage data, one from the jointly called low coverage data, and one each from merged high and low coverage 902 (MSL high coverage and GBR low coverage; and GBR high coverage and MSL low coverage). Using the 903 measurement variances from each two-locus statistic in the high coverage data, we estimated measurement 904 error of the other three sets of statistics (low/low, low/high, and high/low) against the high coverage data.

905
The jointly called low coverage data provided statistics that closely matched the high coverage statistics, with 906 each statistic within 1-2 standard errors. On the other hand, both the merged datasets provided statistics 907 that deviated from the jointly called data, particularly for the cross-population statistics ( Figure S3. 908 With the observation that the low and high coverage data provided broadly consistent statistics while 909 the merged data provided inconsistent statistics, we next wanted to assess whether their differences affect   Figure S4: Merged datasets bias demographic inferences. The panels represent best-fit models according to different combinations of high-and low-coverage data. While statistics from low and high coverage datasets provide consistent best-fit demographic models, merged datasets result in biased LD statistics (Figure S3). These biased LD statistics lead to large downstream biases in inferred demographic models.
coverage data provides a consistent model fit to the high coverage data. Therefore, it is preferable to favor 919 jointly called low coverage data over a mix of high coverage but separately called data, at least when using 920 the diversity and LD statistics that we have in our analyses. analyses found that the more complex models (multiple stems with ongoing migration) provided the best 926 fit to the data. To ensure that these more complex models are not simply fitting noise in our data, we 927 simulated data under both simpler and more complex models, with same sample sizes and similar genome 928 length to that used in this study. From these, we fit each of our models and checked that the correct model 929 was chosen.

930
While simulations of datasets of this size are rapid, computing LD statistics is computationally burden-931 some. We simulated 500 1Mb regions under our best fit single origin, continuous migration, and merger 932 with stem migration models. Using these simulated regions, we constructed 500 bootstrap replicate datasets 933 by sampling genomic regions with replacement. Each of these 500 replicate sets were fit using the models 934 reported in the main text, and we compared log-likelihoods across each fit ( Figure S5). The correct model 935 was recovered in the all of the scenarios based on log-likelihoods alone (Table S1). We conclude that our 936 inferences are not simply fitting statistical noise, and the observed differences in log-likelihoods indicate 937 different abilities to capture signal present in the data.
938 Table S1: Model choice from fits to simulated data. We simulated three models, based on the best fit parameters from fits to data. Using these simulated datasets, we reinferred four models that we tested in our analyses in the main text, including simple (single origin) and complex models. Among 500 fits, we exclusively recover the simulated model, even when fitting more complex (i.e., more heavily parameterized) models to simpler simulated models (such as the single origin model).
Simulated model↓ Single origin Cont. migration Merger w/out stem mig. Merger w/stem mig.  Figure S5).  Figure S5: Log-likelihoods of fits of four models to simulated data. In three scenarios, a simple single origin model and our preferred continuous migration and merger with stem migration models, we simulated 500 replicates of 500 Mb with the same populations and sample sizes from our reported best fit models. We then fit the four models reported in the main text to each of these simulated datasets and compared log-likelihoods. In each case, the simulated model is correctly chosen, even under the single origin scenario. We conclude our more complex models (with more parameters) are not simply fitting statistical noise.

Allowing for additional migrations 945
In preliminary analyses, we found that inferred migration rates between the Nama and non-East African 946 groups were orders of magnitude smaller than migration rates between other populations included in our 947 inferences. Therefore, to reduce the number of parameters needing to be fit we fixed those rates to zero in 948 subsequent fits, and our reported best-fit models did not include migration between the Mende and Eurasians, 949 nor between the Nama and Eurasian populations (aside from recent colonial-period admixture). To assess 950 the impact of this choice, we re-inferred four of our reported models (single origin, continuous migration, and merger with and without stem migration), while allowing for additional parameters to include continuous, 952 symmetric migration for the duration of the coexistence of the Nama and Mende of the Nama and British. 953 Indeed, the inferred migration rates for both of these pairs of populations were small, with 2N e m ≪ 1 in 954 all cases. In the single origin model, the Mende-British migration rate was 1.04E-6 (2N e m = 0.02) and the 955 Nama-British rate was 2.67E-6 (2N e m = 0.05). In the continuous migration model, these rates were 2.69E-6 956 (2N e m = 0.038) and 3.38E-6 (2N e m = 0.048), resp. In the merger without stem migration model, these rates 957 were 2.96E-6 (2N e m = 0.064) and 5.46E-6 (2N e m = 0.119), resp. And in the merger with stem migration 958 model, these rates were 2.64E-6 (2N e m = 0.055) and 3.19E-6 (2N e m = 0.066), resp. In comparison with 959 other migration rates, with 2N e m ≳ 1, these migrations have negligible impact on underlying statistics. The 960 fits to other parameters were negligibly different from those reported with these migrations fixed to zero, 961 and log-likelihoods were unimproved. 962 7.9 Replacing the reference population for statistics normalization 963 We used observed π 2 and pairwise diversity in the Mende to normalize two-and one-locus statistics (see 964 Section 2.1). To confirm that this choice of normalization did not impact inferred model parameters, we 965 reran each of our reported models using data normalized by π 2 and pairwise diversity in the Gumuz. As 966 expected, the best fit parameters from fits using the Gumuz-normalized data were very close to those reported 967 in Tables S3-S7, and they were all well within the reported confidence intervals. Our results are therefore 968 unlikely to be sensitive to this choice of normalization.       Table S2: Fixed parameters in inferred demographic models. Many of the fixed parameters are specific to the Neanderthal branch, are constrained by known history, or were consistently fit across multiple model parameterizations. We assumed a generation time of 29 years throughout. See Section 3.1 for details.

B) Merger A) Continuous migration
Stem 2 Stem 1 Stem 2 Stem 1 Figure S6: Two scenarios for population rearrangement. In our demographic models, we considered A) continuous migration models, in which one of the stem population expands (splits into contemporary populations), and the other stem population(s) has continuous symmetric migration with those populations; and B) one or more of the stem populations expands, with instantaneous pulse (or "merger") events from the other stem population, so that recent populations are formed by mergers of multiple ancestral populations. 1.025 2 (1, 1, 2, 2) 2 (1, 2, 1, 2) Figure S7: Comparison of LD statistics parsed using two array-based recombination maps. The recombination map determines genetic distances between pairs of SNPs, so using different maps could result in systematic biases in parsed statistics if the maps differ significantly. Here, the solid lines are statistics parsed using the HapMapII recombination map, and dashed lines are using the OMNI-YRI map. Shading represents 95% confidence intervals from the OMNI map. These two maps results in LD statistics that are indistinguishable from one another. Notations for each statistic are described in section 2.1 and indexed by populations (0: Nama, 1: Mende, 2: Gumuz, 3: Amhara/Oromo, 4: British). 1.025 2 (1, 1, 2, 2) 2 (1, 2, 1, 2) Figure S8: Comparison of LD statistics parsed using OMNI YRI and Nama-specific recombination map. The OMNI YRI recombination map and the Nama-inferred map (van Eeden et al., 2022) differ in two primary ways: 1) the YRI and Nama have relatively highly diverged ancestries, and 2) the YRI map was inferred from low-density array data, while the Nama map was inferred from whole-genome sequencing data. Nonetheless, LD statistics computed under recombination rates from the two maps remain largely concordant, and inferences are unaffected by recombination map choice (Section 7). Shading represents 95% confidence intervals from the Nama-inferred map. Notations for each statistic are described in section 2.1, and indexes represent populations (0: Nama, 1: Mende, 2: Gumuz, 3: Amhara/Oromo, 4: British). We chose to perform model inference using intergenic regions alone to reduce bias due to selection in and around protein-coding regions (Section 2.2). Notations for each statistic are described in section 2.1, and indexes represent populations (0: Nama, 1: Mende, 2: Gumuz, 3: Amhara/Oromo, 4: British).     The continuous-migration model predicts that human stem populations remain genetically similar, while the merger with and without stem migration models predict a period of increased F ST between ancestors of contemporary humans. This is largely due to the very small inferred N e in one of the stem branches, which leads to a rapid increase in differentiation due to drift.  Figure S15: Predicted pairwise differentiation between contemporaneous populations. H i,j is the predicted pairwise differentiation between two populations per base-pair, using a mutation rate estimated from matching predicted to observed pairwise diversity from the intergenic data used in the fits (see Sec. 8).
The single-origin model (A) shows larger deviations between observed and expected differentiation than the models with early human structure (B: continuous migration, C: merger without stem migration, D: merger with stem migration). In the merger models (C and D), while F ST is large between early stems due to the sharp bottleneck in one of those branches, the pairwise differences are comparable to the continuousmigration model. Stem 1E and Stem 1S have low differentiation, as the bottleneck in Stem 1 (which they both split from) sharply reduces diversity by the time of the split. This has the effect of large F ST but small H i,j .   ) is interpreted as the proportion of the amount of drift between sampled populations A and B that aligns with drift between populations X and Y , sampled in the past (see Section 5.2). Despite population structure inferred to have extended up to 1Ma into the past, differentiation between contemporary African populations only traces back to differentiation between ancestral populations ∼100-200ka, indicating that while stem populations were structured (Fig. S14), differentiation due to that ancient structure contributes little to differentiation between contemporary populations. is interpreted as the proportion of the amount of drift between sampled populations A and B that aligns with drift between populations X and Y , sampled in the past (see Section 5.2). Unlike the continuous-migration model, population structure in the stems following the divergence of Neanderthal and human ancestors results in differentiation among contemporary populations that aligns with drift between the stems. In this model, migration between stem populations is disallowed.  Figure S23: Conditional SFS compared to merger-with-stem-migration model predictions. The merger-with-stem-migration model, in addition to providing the best fit to heterozygosity and LD statistics, provided the best fit to the cSFS in the Nama, Gumuz, and Mende populations. Rates of ancestral state misidentification were consistent across each comparison, with the ancestral state of transversion mutations estimated to be 0.7 − 1.0%, and that of transition mutations estimated to be 1.9 − 2.0%. These misidentification rates are consistent with the known difference in mutation rates between transitions and transversions, as well as other SFS-based inferences in humans using the Thousand Genomes data and the 6-primate alignment (see Methods).  Amhara/Oromo GBR Gumuz MSL Nama Figure S25: Inverse instantaneous coalescence rates inferred from the data, subset to focal populations in the model inferences. Fig. S24 shows IICR curves from reconstructed genealogies inferred using a larger dataset (including nearly 1,200 individuals) than the subset of populations we primarily focused on here. To assess the robustness of Relate-inferred genealogies and for a direct comparison to our inferred demographic models, we ran Relate on the 290 genomes from the Nama, Gumuz, Mende, Amhara/Oromo, and British individuals. While the reconstructed IICR curves are qualitatively similar to those from the full dataset for these populations in the distant past, there are noticeable discrepancies over the recent history in the last tens of thousands of years. Left: Relate run on full dataset, showing focal populations. Right: Relate run directly on the subset dataset of focal populations.  Figure S26: Inverse instantaneous coalescence rates reinferred from simulated data. We simulated data under our four highlighted models, sampling the same number of individuals per population as the original dataset, and then ran Relate on each simulated dataset to reconstruct IICR curves for each. Each model has qualitatively different histories, particularly for the distant past with varying numbers and timings of oscillations. However, due to the discrepancies in reconstructed genealogies using even the same data as input (Figs. S25 and S28), it is difficult to draw firm conclusions from quantitative comparisons between reconstructed genealogies.  Figure S27: Expected inverse instantaneous coalescence rates from models. The inferred demographic models specify population sizes, split times, and migration rates and events, making it possible to compute the exact expected IICR curve for each population. Due to discrete events (instantaneous size changes, for example), the models can produce non-smooth IICR curves, although each predicts a period of increased "effective size" in the past ∼ 200 − 500ka.  Figure S28: Relative cross-coalescence rates inferred from the data. Using the same reconstructed genealogies from Relate as displayed in Fig. S25, we computed relative cross-coalescence (RCCR) curves for pairs of populations using genealogies from the either the full dataset (∼ 1, 200 individuals, top) and the subset of focal populations (290 individuals, bottom). Despite the overlap in samples, the two sets of inferred genealogies provided inconsistent RCCR, with both the timing and ordering of increased RCCR varying between the two datasets. Genealogies reconstructed from the full dataset provide a better match to each of inferred demographic models (Figs. S29-S32).  Figure S29: Relative cross-coalescence rates reinferred from simulated data under the singleorigin model. RCCR curves from genealogies reconstructed from simulated data using Relate match those inferred from the full dataset (S28, top) for distant time points. There are large discrepancies in the very recent history, where Relate and other genome-wide coalescence-based methods are known to be underpowered.  Figure S30: Relative cross-coalescence rates reinferred from simulated data under the continuous-migration model. RCCR curves from genealogies reconstructed from simulated data using Relate match those inferred from the full dataset (S28, top) for distant time points. There are large discrepancies in the very recent history, where Relate and other genome-wide coalescence-based methods are known to be underpowered.  Figure S31: Relative cross-coalescence rates reinferred from simulated data under the mergerwithout-stem-migration-model. RCCR curves from genealogies reconstructed from simulated data using Relate match those inferred from the full dataset (S28, top) for distant time points. There are large discrepancies in the very recent history, where Relate and other genome-wide coalescence-based methods are known to be underpowered.  Figure S32: Relative cross-coalescence rates reinferred from simulated data under the mergerwith-stem-migration model. RCCR curves from genealogies reconstructed from simulated data using Relate match those inferred from the full dataset (S28, top) for distant time points. There are large discrepancies in the very recent history, where Relate and other genome-wide coalescence-based methods are known to be underpowered.

Continuous migration RCCR
Nama-MSL Nama-EP Nama-GBR  Figure S33: Expected relative cross-coalescence rates from inferred models. Our inferred demographic models allow for exact calculation of expected RCCR curves, showing that while the parameterizations of the four models are quite different, they each predict very similar RCCR profiles.   Figure S35: Deep branch distribution reinferred from simulated data. We used Relate to reinfer gene genealogies from simulated data under the four highlighted models and categorized Neanderthal-matching deep branches in the simulated datasets. Each model provided a qualitative match to the data (highest proportion of Neanderthal-matching deep branches in the British, followed by Amhara/Oromo and Nama). Similar to reinferred IIRC and RCCR, no model provided a perfect match to the reconstructions using Relate with the data, but the similarities between reconstructed genealogies in the simulated data suggest such coalescence-rate based curves are underpowered to differentiate between plausible models of history. 0 fGBR Nama Figure S38: Evaluation of confidence intervals in the merger with stem migration model. From 500 replicate bootstrap datasets, the true (simulated) parameter value typically lies within the distribution of inferred values from the bootstrap datasets. Stem 1 experienced a strong bottleneck, at the lower bound of allowed inferred values. In this case, and other parameters directly related to Stem 1, bootstrap-inferred values do not overlap with the true value, although this may be due to incomplete convergence of the optimization at these bounds of parameter space. Orange line: simulated parameter value. Green line: inferred parameter value. Blue histogram: inferred parameter values from 500 bootstrap replicate datasets. fGBR Nama Figure S39: Overlap of bootstrap distributions of inferred parameters across multiple parameterized models. While the details of deep history differ substantially between the different inferred models, the parameterization of recent history (population sizes, migration rates) are similar. Despite the differences in the deep history, many parameters related to recent history are consistent across all tested models. The single origin model (blue) more often has inferred values distinct from other models, although this model provides a far worse fit to the data than the other models.  Figure S40: ADMIXTURE clustering using K = 2 to 9 clusters. ADMIXTURE (Alexander et al., 2009) was used to cluster all individuals in the subset of populations used in analyses in this paper. At K = 5 clusters, West African populations begin to show variable separation in assigned clusters, likely due to isolation-bydistance demographic processes rather than recent admixture events. Figure 2D in the main text shows the ADMIXTURE results for K = 4, which most clearly displays known admixture events in the histories of East and South African populations.