sstar: A Python Package for Detecting Archaic Introgression from Population Genetic Data with S*

Abstract
S* is a widely used statistic for detecting archaic admixture from population genetic data. Previous studies used freezing-archer to apply S*, but this implementation is directly applicable only to the specific case of Neanderthal and Denisovan introgression in Papuans. Here, we implemented sstar for more general use. Using simulations, we compared sstar with several tools for detecting introgressed fragments, including SPrime, SkovHMM, and ArchaicSeeker2.0. Our results suggest that sstar is robust to differences in demographic models, including ghost introgression and two-source introgression. We believe sstar will be a useful tool for detecting introgressed fragments in various scenarios and in non-human species.


Brief Communications
Admixture between populations is a topic of great interest (Fontsere et al. 2019), especially in hominins (Peter 2020). To detect archaic admixture from population genetic data, a statistic named S* was introduced to search for patterns of variation and linkage expected in the case of introgression (Plagnol and Wall 2006). This statistic has been applied in subsequent studies in modern humans (Wall et al. 2009; Huerta-Sanchez et al. 2014; Vernot and Akey 2014; Vernot et al. 2016; Xu et al. 2017; Jacobs et al. 2019), as well as in other organisms (Cong et al. 2016; Kuhlwilm et al. 2019). Although the S* statistic is a powerful approach for detecting introgressed fragments without source genomes, no user-friendly and versatile package is available. A previous implementation of S*, freezing-archer, was designed specifically around human demographic models and used for detecting introgressed fragments from Neanderthals and Denisovans in Papuans (Vernot et al. 2016). Users must carefully read and understand the source code of freezing-archer before manually changing the parameters inside the code. To improve efficiency, robustness, and reproducibility when using S* to detect introgression, we implemented sstar.
The whole pipeline is illustrated in figure 1A. We define the population without introgressed fragments as the reference population, the population that received introgressed fragments as the target population, and the population that donated introgressed fragments as the source population (supplementary fig. S1, Supplementary Material online). We assume that genotype data are diploid and biallelic, with no missing genotypes in any individual of a dataset. We remove variants whose derived alleles are fixed in both the reference and target populations. Users can calculate S* in sliding windows across genomes by defining the window length and step size. To assess the significance of S* scores, users can simulate data under a demographic model without introgression and build a generalized additive model (GAM) with different S* scores, quantiles of S*, numbers of mutations, and local recombination rates to predict the expected S* scores, as described previously (Vernot et al. 2016). If a genome from a potential source population is available, users can calculate the source match rate between an individual from the target population and an individual from the source population. If genomes from two different source populations are available, the origin of candidate introgressed fragments can be determined by comparing the source match rates with the different source populations.
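The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the sstar implementation: the function names (`fixed_derived`, `sliding_windows`, `source_match_rate`) are hypothetical, genotypes are assumed to be coded as derived-allele counts (0, 1, or 2) per diploid individual, and the match-rate definition used here (the fraction of sites carrying a derived allele in the target individual at which the source individual also carries one) is an assumed simplification.

```python
from typing import Iterator, Optional, Sequence, Tuple

# Assumption: a genotype is the derived-allele count of a diploid individual.
Genotypes = Sequence[int]


def fixed_derived(ref: Genotypes, tgt: Genotypes) -> bool:
    """True if the derived allele is fixed in both reference and target samples."""
    return all(g == 2 for g in ref) and all(g == 2 for g in tgt)


def sliding_windows(seq_len: int, win_len: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open windows [start, end) until the sequence end is reached."""
    start = 0
    while start < seq_len:
        yield start, min(start + win_len, seq_len)
        if start + win_len >= seq_len:
            break  # the last window already reaches the end of the sequence
        start += step


def source_match_rate(tgt: Genotypes, src: Genotypes) -> Optional[float]:
    """Hypothetical match rate over sites where the target carries a derived allele."""
    informative = [(t, s) for t, s in zip(tgt, src) if t > 0]
    if not informative:
        return None  # no site to compare
    matches = sum(1 for _, s in informative if s > 0)
    return matches / len(informative)


# Toy data: (position, reference genotypes, target genotypes).
variants = [
    (100, [0, 0], [1, 0]),
    (200, [2, 2], [2, 2]),  # derived allele fixed everywhere, so it is removed
    (300, [0, 1], [0, 2]),
]
kept = [v for v in variants if not fixed_derived(v[1], v[2])]
windows = list(sliding_windows(seq_len=100, win_len=50, step=10))
```

With two candidate sources, the same hypothetical `source_match_rate` could be evaluated against each source individual and the two values compared to suggest an origin for a fragment.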
We evaluated the performance of sstar with precision-recall curves because they may be more informative than receiver operating characteristic curves on imbalanced data sets (Saito and Rehmsmeier 2015). We simulated data with msprime 1.0 (Kelleher et al. 2016; Baumdicker et al. 2022) for different demographies and sample sizes. Two models tested ghost introgression: a Human-Neanderthal model (Gower et al. 2021) and a Bonobo-Ghost model. Two further models tested two-source introgression: a Human-Neanderthal-Denisovan model (Malaspinas et al. 2016; Jacobs et al. 2019) and a Chimpanzee-Ghost-Bonobo model.

FIG. 1 caption: For ArchaicSeeker2.0, we only used the best results because this tool does not provide options to define candidate introgressed fragments with different cutoffs. Nref is the diploid sample size of the reference population (10 or 50). sstar (full) are results inferred with GAMs using simulated data from full demographic models without introgression (supplementary figs. S6-S9, Supplementary Material online). sstar (constant) are results inferred with GAMs using simulated data from constant effective population size models without introgression (supplementary figs. S10 and S11, Supplementary Material online). sstar (only ref and tgt) are results inferred with GAMs using simulated data from models with only the reference and target populations; these populations are also constant in size (supplementary figs. S12 and S13, Supplementary Material online). src1 represents the performance for identifying the introgressed fragments from source population 1. src2 represents the performance for identifying the introgressed fragments from source population 2. A baseline is the performance of a random classifier, where the precision is equal to the true proportion of the introgressed fragments. An F1 score is the harmonic mean of a given pair of precision and recall; dotted hyperbolic curves represent ...

A Python Package for S* · https://doi.org/10.1093/molbev/msac212 · MBE

For ghost introgression, we compared sstar with SPrime (Browning et al. 2018), another tool using an S*-like approach, and SkovHMM (Skov et al. 2018), a tool based on hidden Markov models (HMMs). In the Human-Neanderthal model, our results show that sstar and SPrime performed better than SkovHMM when the sample size was small (ten reference individuals; fig. 1B). In the Bonobo-Ghost model, SPrime performed poorly (fig. 1C), assigning the whole simulated sequence as a single introgressed fragment, while sstar and SkovHMM still detected introgressed fragments.
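The evaluation quantities named above (precision, recall, the F1 score, and the random-classifier baseline) can be written out as a short sketch. The function names are hypothetical, and the inputs are assumed to be counts of correctly and incorrectly classified units (for example, base pairs of true and inferred introgressed tracts); the source does not specify the unit used. The last function rearranges F1 = 2pr / (p + r) into p = F1 · r / (2r − F1), which traces a hyperbolic curve of constant F1 in precision-recall space.

```python
from typing import Tuple


def precision_recall(tp: float, fp: float, fn: float) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def baseline_precision(n_true_introgressed: float, n_total: float) -> float:
    """Precision of a random classifier: the true proportion of introgressed units."""
    return n_true_introgressed / n_total


def iso_f1_precision(f1: float, recall: float) -> float:
    """Precision giving the requested F1 at this recall (requires recall > f1 / 2)."""
    if recall <= f1 / 2:
        raise ValueError("no precision in [0, 1] reaches this F1 at this recall")
    return f1 * recall / (2 * recall - f1)


# Toy counts, chosen only for illustration.
p, r = precision_recall(tp=80, fp=20, fn=120)
f1 = f1_score(p, r)
```

In this toy case, `iso_f1_precision(f1, r)` recovers `p`, since the point (r, p) lies on its own constant-F1 curve.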
One key step in sstar is calculating the expected S* scores with simulated data from demographic models without introgression, which requires detailed knowledge of population history (supplementary figs. S6-S9, Supplementary Material online). When approximate models were used instead (supplementary figs. S10-S13, Supplementary Material online), our results suggest that sstar still performed similarly to when the full history was used (fig. 1B and C).
For two-source introgression, we compared sstar with SPrime and ArchaicSeeker2.0 (Yuan et al. 2021; Zhang et al. 2022), another HMM-based tool. Both sstar and SPrime performed better when identifying Denisovan fragments than when identifying Neanderthal fragments (fig. 1D). This may be because the Denisovan introgression event in Papuans was more recent and had a larger admixture proportion than the Neanderthal introgression. More ancient events, as in the Chimpanzee-Ghost-Bonobo model, were not well resolved by SPrime, whereas sstar still retained power (fig. 1E).
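The logic of comparing observed S* scores with scores expected without introgression can be illustrated with a deliberately simplified sketch. sstar, as described above, predicts expected scores with a GAM that conditions on the quantile, the number of mutations, and the local recombination rate; the code below swaps that for a single empirical quantile of null scores, so it is an assumption-laden stand-in rather than the method itself. All names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Window = Tuple[int, int]


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Quantile of values for 0 <= q <= 1, with linear interpolation."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def candidate_windows(
    observed: Dict[Window, float],
    null_scores: Sequence[float],
    q: float = 0.99,
) -> List[Window]:
    """Windows whose observed score exceeds the q-th quantile of the null scores."""
    threshold = empirical_quantile(null_scores, q)
    return [window for window, score in observed.items() if score > threshold]


# Placeholder null scores standing in for S* values simulated without introgression.
null_scores = [float(x) for x in range(101)]
# Placeholder observed scores keyed by (start, end) window coordinates.
observed = {(0, 50000): 150.0, (10000, 60000): 42.0}
candidates = candidate_windows(observed, null_scores, q=0.99)
```

A single global threshold ignores how the null distribution shifts with mutation count and recombination rate, which is the dependence the GAM described in the text is meant to capture.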
We conclude that sstar is robust for detecting introgressed fragments. Because no single tool performed well in all scenarios, users should choose appropriate tools based on their data. We believe sstar will be useful in various scenarios, especially for small samples and non-human data sets.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.