Genetic heritage of the Baphuthi highlights an over-ethnicized notion of “Bushman” in the Maloti-Drakensberg, southern Africa

Summary Using contemporary people as proxies for ancient communities is a contentious but necessary practice in anthropology. In southern Africa, the distinction between the Cape KhoeSan and eastern KhoeSan remains unclear, as ethnicity labels have been changed through time and most communities were decimated if not extirpated. The eastern KhoeSan may have had genetic distinctions from neighboring communities who speak Bantu languages and KhoeSan further away; alternatively, the identity may not have been tied to any notion of biology, instead denoting communities with a nomadic “lifeway” distinct from African agro-pastoralism. The Baphuthi of the 1800s in the Maloti-Drakensberg, southern Africa had a substantial KhoeSan constituency and a lifeway of nomadism, cattle raiding, and horticulture. Baphuthi heritage could provide insights into the history of the eastern KhoeSan. We examine genetic affinities of 23 Baphuthi to discern whether the narrative of KhoeSan descent reflects distinct genetic ancestry. Genome-wide SNP data (Illumina GSA) were merged with 52 global populations, for 160,000 SNPs. Genetic analyses show no support for a unique eastern KhoeSan ancestry distinct from other KhoeSan or southern Bantu speakers. The Baphuthi have strong affinities with early-arriving southern Bantu-speaking (Nguni) communities, as the later-arriving non-Nguni show strong evidence of recent African admixture possibly related to late-Iron Age migrations. The references to communities as “San” and “Bushman” in historic literature has often been misconstrued as notions of ethnic/biological distinctions. The terms may have reflected ambiguous references to non-sedentary polities instead, as seems to be the case for the eastern “Bushman” heritage of the Baphuthi.


Introduction
Using contemporary people as proxies for ancient communities is a contentious practice in anthropology [1][2][3][4][5] and an ongoing discussion in trying to understand the relationship between KhoeSan peoples and culture in southern Africa. 3,6,7 As communities and cultures are continually reinvented and lost, only an imperfect account of the past can be gathered from extant cultures and people. 1 Researchers look toward physical remains and historic accounts as well but connecting past descriptions to contemporary peoples may be misleading. Ethnic labels are continually formed, morphed, appropriated, and lost through time. KhoeSan refers to the collective of linguistically and culturally diverse African communities from a range of environments, regions, and times. 5,6 Possible cultural, genetic, and/or linguistic distinctions between eastern and western KhoeSan are incompletely understood. 8,9 The term ''KhoeSan'' and the many precursor terms are contentious because of their historic use and we refer readers to Note S1 for fuller discussion on the terms used here.
In the western parts of South Africa, KhoeSan have often been referred to as ''San,'' ''Hottentots,'' or ''Khoikhoi,'' 10,11 while those who inhabited mountainous regions of eastern southern Africa-present-day Lesotho, KwaZulu-Natal, Griqualand East, and the former Transkei-were likely to have all been !Ui San language-speakers 8 and have been referred to as ''Bushman'' or ''Mountain Bushman'' 10,[12][13][14] by Europeans and ''BaTwa'' by the Nguni speakers and ''Baroa'' by the Sesotho speakers, which are both Bantulanguage groups. In terms of language and identity, a few KhoeSan communities have persisted in the western regions, although with notable influences from historic events, including loss of language and other indigenous knowledge, cultural creolization, displacement, and genetic admixture. 3,7,8,[15][16][17] In eastern southern Africa, however, there are few known remnant KhoeSan communities-by name or culture-from which to draw insights. 18 The linguistic antecedents of these groups migrated southward into southern Africa by the fifteenth century, 8,[21][22][23]39 but the antecedents of the Nguni-speakers may have arrived earlier than that of the Sotho-Tswana communities. 8,40,39 The contemporary Baphuthi of the southern Maloti-Drakensberg are an interesting community, as they speak Siphuthi, which is a hybrid of these two language groups. 41 From the 1700s, the Baphuthi were ethnically heterogeneous. 11,39,[41][42][43][44] While the Baphuthi's history is rooted in the amalgamation of southern Bantu-speaking communities including Nguni-speaking (such as the amaZizi and the Mpondomise) and Sesotho-speaking (e.g., Maphuthing, Bafokeng), 43 their oral history and identity attests to KhoeSan heritage. 43 The Baphuthi rejected sedentary ''Great Place'' chieftaincy in favor of circulating through a series of settlements atop steep-sided hills scattered along the Senqu river. 11 This lifeway based on nomadism and cattle raiding, rather than agriculture, was shared with their KhoeSan antecedents and contemporaries. 43 The Baphuthi constituency, which has been historically referred to as ''Bushman,'' were ethnically heterogeneous too, but many of their members were KhoeSan. 34,41,43 Furthermore, the assimilation of some Eastern Cape amaTola, who are yet another KhoeSan-Bantu speaking creolized community, 43 would have added ancestry from KhoeSan speakers, in addition to contributions from escaped slaves and outlawed Europeans. 7,35,36 This recent assimilation of ''Bushmen'' and amaTola may be reflected as elevated KhoeSan ancestry, as seen in Lake Chrissie communities, 8 but it is unclear to what extent this would be the case for the Baphuthi as the Baphuthi and ''Bushmen'' of the 1800s are now recognized as ethnically heterogeneous. 25,35,37 We ask whether historic references to ''Bushman'' heritage does reflect KhoeSan ancestry and whether it can provide insight for an eastern KhoeSan ancestry.
We focus on two contending views for the ''Bushman'' heritage but acknowledge that neither is exclusive of the other. Firstly, ''Bushman'' descent may reflect a KhoeSan community with genetic distinctions from the western KhoeSan. Secondly, the ''Bushman'' ancestry may reflect the assimilation of heterogeneous societies with a shared lifeway but with very limited or no KhoeSan-type genetic affinities.
Efforts toward understanding the genetic diversity of southern African KhoeSan and the history of the region increasingly require the insights from remnant genetic signals in descendant communities, such as the Baphuthi. To this end, we examine the genetic affinities of Baphuthi individuals with oral history of KhoeSan descent from the southern Maloti-Drakensberg.

Sample collection and genotyping
We conducted interviews with residents from Masakala in South Africa (2017) and from Semonkong village and Quthing district in Lesotho (2019) (Figure 1). Information on mother-tongue lan-guage, place of birth, and ethnicity of the participant, their parents, and grandparents were collected. Approximately 2 mL of saliva was collected with Oragene-500 kits (DNA Genotek, Canada). To filter the data for individuals who may retain genetic signatures from the historic Baphuthi admixtures (with less influence from very recent admixtures), we considered only participants who indicated that all four grandparents were Baphuthi or spoke Siphuthi for genotyping. DNA extractions were performed at the University of Oxford following the prepIT.L2P salt extraction protocol (catalog# PT-L2P, DNA Genotek, Ottawa Canada). A total of 33 samples from Lesotho were genotyped for over 600,000 SNPs on the Global Screening Array (GSA v2) at the Estonian Institute for Genomics, Tartu, Estonia. A further two samples from South Africa were genotyped for 2.5 million variants on the Illumina  Omni2.5-8 Beadchip v1.3 at the Wellcome Trust Centre for Human Genomics, Oxford University. All raw output was processed with GenomeStudio software (Illumina, USA) and all samples passed a call rate of 97% or more. The Baphuthi population is likely to total only a few thousand, but we could not find census data. This sample set may be representative of the local communities sampled but would not capture the overall regional variation. The sample sizes are however in line with that used in other human admixture studies (see 8,9,22,30 ).

Datasets, merging, and quality control
We merged the samples genotyped here with publicly available data genotyped on the Illumina Omni5 or 2.5 array from the African Genome Variation Project, 45 the 1000 Genomes Project, 46 and four southern African datasets. 8,17,19,47 The latter included a collection of 11 KhoeSan groups including representatives of the Khoe-Kwadi (abbreviated to KK), Juu (abbreviated to JU), and !Ui -Taa (abbreviated to UT) language areas ( Figure 1A). In all populations except the Baphuthi, we restricted sample sizes to 20 randomly selected individuals to reduce computation load.
The final dataset included a collection of southern African populations to which we pay particular attention to compare with the Baphuthi. These groups are hereafter referred to as the ''focal groups.'' We included several southern African KhoeSan groups: Karretjie, zKhomani, and Namibian Nama-as the only representative of Khoekhoe groups. We included several groups from the southern Bantu language communities (here abbreviated to SB). From the southern Bantoid, eastern Bantu language speaking communities (abbreviated as SBE) we included individuals from Zambia, Zimbabwe (the eastern African SBE [web resources]) and from the Basotho, amaZulu, and amaNdebele (southern SBE). The Himba of Namibia were included as a southern Bantoid, western Bantu language community (abbreviated to SBW). We further included two groups of present-day Bantu speakers who historically had KhoeSan affinities: the Lake Chrissie San on the border of Eswatini and the Duma San from the KwaZulu-Natal uKhahlamba-Drakensberg. Both are geographically and linguistically close to the Baphuthi, however based on previous work, 8 we considered the Lake Chrissie San among the KhoeSan and the Duma San among the SBE. The remaining samples made up a ''global reference'' set from which we infer affinities. Table S1 provides an overview of the grouping.
From each dataset, the following analyses were conducted in PLINK 2.0. 48 We retain only bi-allelic variants and pruned for T/A or C/G polymorphisms to prevent strand ambiguities. We removed SNPs with no chromosomal position and updated coordinates from rsIDs to a custom Chr_position[b37] ID to ensure a match across datasets. The minimum allele frequency was set to 1% (-maf 0.01) and missing genotypes were trimmed per individual and per locus to a maximum 5% (-geno 0.05 -mind 0.05). Second degree and closer relatives (kinship coefficient > 0.087) were removed from all groups with the -king-cutoff 0.088 flag. Outliers were detected iteratively with smartpca from the software package Eigensoft. 49 We based removal on the first five eigenvectors by using five iterations with a sigma threshold of 6 as in Novembre et al. 50 The final dataset comprised 164,100 SNPs, 52 populations, and 806 individuals, of which 23 were Baphuthi (n ¼ 2 South Africa, n ¼ 21 Lesotho) (Table S1).

Data clustering and population structure
We investigated genetic clustering by using ADMIXTURE v1.3.0 51 and principal-component analysis (PCA). These approaches al-lowed us to confirm the absence of batch/chip effects or other merging artifacts and to compare the observed population structure to that previously reported. As neither analyses accounts for correlation between SNPs, we trimmed SNPs in linkage disequilibrium (LD). 52 We removed a locus from pairs with R 2 > 0.7 for a 50 bp frame with a 5 bp sliding window (PLINK -indep-pairwise 50 5 0.7) as per Busby et al. 23 The SNP count was reduced to 117,358 in these analyses. We further used a multidimensional scaling plot as a third method to examine possible batch effects (PLINK -cluster -mds 10).
The PCA was performed with plink (-pca) both on the entire dataset and with a focus on relevant data from populations south of the African Forest Belt ($5.6 S latitude).
We ran the ADMIXTURE analysis for the autosomal data for K values between 2 and 16, where K is the number of tested clusters. We used ten replicates for each K and a 5-fold cross-validation error (CV) estimation with 100 bootstraps for standard errors (-B100 -cv INPUTFILE.bed {2..16}). The lowest CV error determined the optimum K values. 51 To identify common modes across replicates, we processed the output with the CLUMPAK server 53 by using default settings (LargeKGreedy algorithm, 2,000 random permutations). Results were visualized with ggplot2 54 in R v.3.5.1 (web resources).
We tested for significant differences in each ADMIXTURE component between the focal populations by using a Kruskal-Wallis rank-sum test as implemented in R (kruskal.test) and we identified pairwise differences with two post-hoc tests: the Baumgartner-Weiß-Schindler (bwsAllPairsTest) and the more conservative Nemenyi test (kwAllPairsNemenyiTest), both from R package PMCMRplus and both with a Holms adjustment for multiple corrections (web resources).

Identifying parent populations and admixture dates
As a formal test for admixture in the history of the focal populations, we estimated the f 3 indices 55 in the form of f 3 (X,Y; test population), where X and Y are potential source populations. Focal and reference populations were all included as possible sources. Negative f 3 values (Z score < À3) are considered indicative of a discordant tree relationship and in support of admixture. 56 Estimates were made with Admixtools v.5.1. 57 The Lake Chrissie San were excluded from this analysis, as they were represented by three individuals only.
We estimated the timing of admixture events by fitting exponential decay curves of LD against increasing distances between SNP pairs, as implemented in MALDER. 19,58 Because of sample size, the Lake Chrissie San were excluded. We used an inter-generation time of 28 years, in line with other work. 9,23,59 Events were estimated from 1960 CE, the mean date of birth of the Baphuthi participants. We estimated standard errors by jack-knifing over chromosomes and estimated a Z score by dividing the mean by the standard error.
To understand whether historic and/or ancient bottlenecks and inbreeding have been influential in shaping the Baphuthi genome, we estimated the cumulative size of the genome that has runs of homozygosity (RoH). We followed the procedure of Schlebusch et al. 17 by using PLINK. We subsampled four individuals from each population for a total of 30 iterations to estimate the average cumulative RoH (cRoH) for each of five size categories in Mb; (0;1), [1;1.5), [1.5;3.), [3;6), [6;20). For populations with insufficient sample sizes for subsetting, we plotted estimates for each individual.

Sex biases in admixture based on the X chromosome
To explore in possibility of sex biases in admixture history, we analyzed the X chromosome data by using ADMIXTURE. Data were prepared as described for the autosome but the following adjustments were included as per Ongaro et al. 60 As X chromosome data were not available for all populations, the dataset was reduced to 27 populations and 3,731 loci after LD trimming. We revised sex classifications by imputing with the genotype data in PLINK (-impute-sex). A male call was made when the rate of homozygosity was >80% and any individuals with ambiguous imputations were removed.
In ADMIXTURE, we set heterozygous SNPs in the male X chromosome as missing and used the option ''-haploid ¼ 'male:23'' to treat male individuals as haploid. As a result of the filtering, one individual from the Baphuthi_LE was removed.
We summarized the results at K ¼ 5 following the same procedure as for the autosomal data and to facilitate comparisons with the autosome, we also summarized autosomal results. To ensure that the components identified for autosomal and X chromosome were indeed capturing the same regional ancestries and that a comparison would be valid, we performed a Pearson's correlation of components across the two datasets by using the corrplot (web resources) package in R.
Lastly, we focus on the ratio of non-KhoeSan African to KhoeSan ancestry (NKS:KS) as a marker of differentiation among populations in the history of admixture related to the expansion of agro-pastoralism.
The ratio of (NKS:KS) autosomal to (NKS:KS) X chromosome then gives an idea of sex biases and variation among populations. A value of 1 indicates no change in the NKS:KS ratio and thus no sex-biased admixture. Low values indicate higher KhoeSan ancestry on the autosome (possible male KhoeSan bias) and high values indicate lower KhoeSan ancestry on the autosome (possible KhoeSan female bias).

Data clustering and population structure
We explored global population structure by using PCA and ADMIXTURE analysis. The patterns observed in the global PCA correspond well with published results on global diversity (Note S3, Figures S1 and S2). When focusing on the PCA among southern African groups ( Figure 1B), the first five PCs accounted for $6.5% of the total variation ( Figure S3).
We see that on PCs 1 and 4, all the focal SBE populations-which includes the Baphuthi-cluster close to one another. These two axes accounted for the separation of KhoeSan from non-KhoeSan Africans and western from eastern populations, respectively, and may suggest shared KhoeSan and Bantu-speaking genetic affinities in the focal SBE. The Lake Chrissie San are distinctly closer to KhoeSan groups, in particular the Taa and Khoe-Kwadi groups. The Baphuthi are at the extreme end of PC 3, which captures a gradient between northerly and southerly Bantuspeaking populations (Figure 1). While this might be caused by a batch artifact, we do not see the same separation on other PCs ( Figure S3), nor do we see an ADMIXTURE component that is unique to the Baphuthi in the unsupervised analyses (results below). The results of the multidimensional scaling plot show no clear sign that the Baphuthi data are influenced by a batch effect either ( Figure S4). On the basis of this, we suggest that the separation on PC 3 reflects instead a Baphuthi-specific change affecting their affinities to other SBE speakers. On PC 2, which seems to account for non-African ancestry in the KhoeSan groups, the Baphuthi are no different from the other focal SBE. In contrast, KhoeSan groups with known Eurasian admixture (zKhomani, Nama, and Karretjie) separated from the remaining Africans. On PCs 2 and 3, the Chrissie San are shifted more toward the KhoeSan who have little recent Eurasian admixture (e.g., G|ui and G||ana, Hai||om) than the southern African zKhomani, Nama, and Karretjie, but they are also shifted toward the southern African SBE groups, showing some genetic similarity to them.
The ADMIXTURE analysis of the global dataset paralleled the variation captured by the PCA described above, in line with previously published work (e.g., Rosenberg et al. 61 and Bryc et al. 62 ) reflecting divisions between global regions (Figures 2 and S5). Here we discuss the results for K ¼ 9 (Figure 2), as the cross-validation errors were similar to the lowest estimates ( Figure S6). To simplify discussing the ADMIXTURE components, we focus on those present in the focal populations and refer to them on the basis of the populations in which the component is at the highest proportion on average.
The South African and Lesotho Baphuthi had similar ADMIXTURE profiles. On the basis of this and their shared position on all PCA axes, we merged the data and from here forward we discuss the joint Baphuthi data. Significant variation in ADMIXTURE components was found across our focal populations (Baphuthi, Duma San, Lake Chrissie San, and the SBE) (Table S2, Kruskal-Wallis test KW ¼ 106.40, p < 0.001). There was some disagreement between post-hoc tests for pairwise differences, but these reflect different levels of conservativeness between the tests. The overall agreement is discussed below.
The Baphuthi profile was composed predominantly of two components but had noteworthy contributions from an additional five components ( Figure 2). The most predominant component in the Baphuthi was prevalent across the southern African SBE but was highest in the Baphuthi (mean $81%, gray in Figure 2). This component distinguished the Bantu language communities from other Africans. The tests for significant differences among focal groups found that the southern SBE had significantly greater proportions of this component compared to all other groups (Figure 3, Tables S2 and S3, post-hoc tests).
The second largest component in the Baphuthi (mean 15% and 12% the lowest seen in any individual; dark green in Figure 2) was strongly related to the KhoeSan groups and was at its greatest in the Naro KhoeSan (98%, hereafter the ''Naro'' component). In the Baphuthi ADMIXTURE profile, the sum of the ''Naro'' and ''Baphuthi'' components (95% 5 4%) was notably greater than the sum of these two components in any other southern SBE (<90%) or eastern SBE (<60%) ( Table S3). For example, of the 22 Lesotho Baphuthi, 22% had a sum of more than 99%. Only some KhoeSan groups had individuals with such profiles.
If we look beyond our focal groups, we see that the Naro, Ju'|hoan, and G|ui and G||ana had the sum of these two components comparable to the Baphuthi (>95%). However, in these populations the sum was largely driven by elevated ''Naro'' proportions (the ratio ''Baphuthi'':''Naro'' was well below 1). The ratio of the two components in the Baphuthi is very similar to the ratio in other southern SBE (median $5.5 5 1, Table S3) despite the variation in absolute values. In contrast the ratios of the southern SBE were far greater than the eastern African SBE where the ''Naro'' component was virtually absent (1 5 1%; mean 5 SD Table S3).
A further distinction of the Baphuthi from the other southern SBE was that they had notably lower proportions of the non-major components. These components likely reflect variation in population history and may be important for understanding divergences between populations.
Of these minor components, the most variable among the southern SBE focal groups was the component dominant in the Himba (''Himba'' component, dark blue in Figure 2). Populations from our region of interest, southern Africa (SBE-speaking and UT-speaking communities) had lower proportion of the ''Himba'' component compared to Bantu-speaking, the KK-speaking, and JU-speaking KhoeSan from other regions (Table S3). The Baphuthi and amaZulu in particular had significantly lower proportions compare to the Basotho, Duma San, and amaNdebele and much lower than the Zambians (41% 5 3%) and Zimbabweans from further north (33% 5 3% Table S2). The Nguni-speakers (amaZulu, amaNdebele, and Duma San) are discrepant in the levels of Himba component. Admixture or drift may have differentiated the Baphuthi and amaZulu from the other southern SBE. There is thus impor-tant variation of this western African component in the region. The co-occurrence in relatively similar amounts of the ''Baphuthi,'' ''Naro,'' and ''Himba'' components are suggestive of a shared history for the southern SBE populations.
The Nama (KK group) are an exception within the region. They have distinctly elevated ''Himba'' (17% 5 17% Figure 3) and ''Somali'' (10% 5 5% Figure 3) components, which are comparable to the other Khoe-Kwadi groups. Furthermore, the Baphuthi, Duma San, and Lake Chrissie San can be distinguished from the Nama, Karretjie, and zKhomani by the lower levels of Eurasian components (light green and pink in Figure 2).

Identifying parent populations and admixture dates
The overall similarity of the southern SBE was further supported by f 3 admixture tests. For all the southern African SBE (which includes the Baphuthi), when the possible pairs of source populations were set as the KhoeSanspeaking Ju'|hoansi and the west Africans (Yoruba or Igbo from Nigeria), the lowest Z scores (Z < À10) were reported, indicating significant support for admixture. When testing for admixture in the eastern SBE, Z scores recovered were lower, for the Zambians, below À8, and the Zimbabweans were not statistically significant (Z > À3) (Table S4). Failure to detect the admixture in the Zimbabweans may reflect the lower proportion of KhoeSan ancestry as seen in the low ''Naro'' ADMIXTURE component.
Another common admixture signal was that of the east African pastoralist gene flow, which we detected in the Nguni-speaking Duma San, amaZulu, amaNdebele, and the Sesotho-speaking Basotho (Z scores < À3) and for all the southern African KhoeSan when we considered source pairs involving non-Bantu east Africans and KhoeSan populations (Table S2) The signal, however, appears weaker if not absent, in the Baphuthi as well as the east African Zambians and Zimbabweans as no admixture signals were detected.   eastern Africa SBE-southern African Nguni admixture is detected in the Basotho.
As admixture of the SBE with the autochthonous KhoeSan may not have been uniform across the region, we performed a series of tests to distinguish the introgression of KhoeSan contributions that were distinct from that already present in the southern SBE by using f 3 (KhoeSan, South African Bantuspeaking group, target population). The test did not support additional KhoeSan contributions to the Baphuthi, amaZulu, Zimbabweans, or Zambians compared to what is already present in the southern SBE (Table S2). Significant results were detected for the Basotho, Duma San, and amaNdebele. We point out that the Nguni-speakers (amaZulu, amaNdebele, and Duma San) do not show a coherent signal. All the southern KhoeSan produced support for admixture with these pairs of sources, indicating that KhoeSan-related proportions and/or diversity were greater than the estimates in the SBE.
We additionally tested for the occurrence of Eurasian ancestry not present in the KhoeSan by considering Eurasian-KhoeSan source pairs in the tests. The Baphuthi, amaZulu, Zambian, and Zimbabweans did not produce results to support such admixture. The Duma San, amaNdebele, and Basotho recovered a common significant f 3 value, and in all cases the top results included GIH (Gujarati) or eastern Asians. The results here were not consistent across the Nguni speakers. All the southern KhoeSan produced significant results for this test and the top scores included a European source.
To provide chronological context to admixture events supported by the f 3 results, we estimated admixture dates with LD decay curves (MALDER ,  Table S5). We focus on events younger than 5 kya as beyond this the accuracy of the LD decay curve is questionable. Multiple admixture events were supported for the Baphuthi, amaZulu, Nama, Himba, and Zambians. In the Baphuthi we detected recent admixture between a KhoeSan and European group dating to $1786 CE. Such dates may correspond to colonial era European admixture, as they are shared with events detected in the Karretjie, zKhomani, and Nama (1808-1835 CE), which are known to reflect recent admixture events. 3,17,[63][64][65] Several of the southern SBE produced KhoeSan-European admixture dates distinctly older (1450-908 CE), indicating an event not tied to the 17 th century expansion of Europeans into southern Africa. We looked at date estimates in SBE groups not included in the focal populations to add context to these older events. The other SBE produced similar dates (Table S5), the oldest of which were similar to events detected in the Nama (252 CE-139 BCE). This suggests that the oldest Eurasian admixtures ($908 BCE) in the SBE may reflect Eurasian ancestry brought southward by events related to the arrival of the East African pastoralist groups.
Admixture between African populations were detected in the Baphuthi (991-812 CE) and in the amaZulu (1258-1143 CE), while older dates were detected for the more northerly Himba and Zambians (843 CE-118 BCE), suggesting more recent admixture for the groups further South.
To profile the extent of historic and/or ancient bottlenecks and inbreeding on the Baphuthi genome, we estimated the cRoH for five size categories.
The Baphuthi were notably different from the other southern SBE in the size categories reflective of recent inbreeding (RoH > 1 Mb; reflecting <10 generations ago, since $1670 CE; see McQuillan et al.) 66 ( Figure S7) and were more similar to the KhoeSan groups. Size categories reflective of ancient events (<1.5 Mb) were again more similar to the southern KhoeSan.
Overall results suggest that historic events, but not recent inbreeding/bottlenecks, were shared between the Baphuthi and SBE. Among the southern Bantu speakers, we note that the amaZulu and Himba showed high cRoH for the category reflective of ancient events. The Duma San and Himba had support for recent events too. However, it is unclear how different admixture histories have impacted these cRoH metrics.
Sex biases in admixture based on the X chromosome We used the X chromosome ADMIXTURE components to detect possible sex biases in admixture by comparing them to the autosome as a ratio. We used the ratio of non-Khoe-San African:KhoeSan ancestry (NKS:KS) discussed below as (NKS:KS) autosomal :(NKS:KS) X chromosome .
At K ¼ 5, major regional ancestries are represented and could be related easily to that observed in the autosome ( Figure S8). Specifically, the autosome and X chromosome shared a non-KhoeSan African component (YRI), a KhoeSan component (Naro), an East Eurasian component (CDX), and a West Eurasian component (GBR or IBS).
Congruence between the datasets was supported by strong correlations of a single autosome-X chromosome pair (coefficient > 0.92, p value < 0.001, Figure S9).
The fifth components were not strongly correlated with each other nor with any other component. The fifth autosomal component, a possible eastern African component (Somali), was best correlated with the X chromosome's Naro and YRI components (coefficient < 0.15, p value < 0.01, Figure S9). The X chromosome's fifth component, a possible South Asian component (GIH), was weakly correlated with the autosomal GBR and CDX components (coefficient < 0.26, p value < 0.001, Figure S9). As these components were ambiguously related to other components, we discuss the less ambiguous YRI:Naro ratio for both autosomal and X chromosome results. Now that we have confirmed that the autosomal and X chromosome components are comparable, we discuss the change in the non-KhoeSan African:KhoeSan ancestry ratios between autosome and X chromosome.
Firstly, on the basis of the ratio of the sum of the NKS and KS components, i.e., (NKSþKS) autosomal :(NKSþKS) X chromosome , we saw very little change in the amount of YRIþNaro ancestry across populations (Table S6). The exceptions were two eastern African groups (MKK and LWK) and the KhoeKhoe-speaking Hai||om where there was a decreased ratio, indicating a possible male bias on non-African admixture (Table S6). In these three populations, it is possible that non-African admixture may confound signals of sex-biased admixing.
The (NKS:KS) autosomal :(NKS:KS) X chromosome ratios showed variation across Africa but were almost entirely above 1 (Table S6, Figure 4). Many populations were not statistically different from a ratio of 1, but this may be because of low power. Values larger than 1 indicate a female sex bias in KhoeSan admixture, which is wide-spread across communities. Of the southern African SBE, the Baphuthi had the largest ratio, on par with Bantu speakers further north ($1.76, p < 0.01), suggesting a stronger female bias in KhoeSan assimilation compared to the other two southern SBE (amaNdebele and Duma San, ratio $1.25, not statistically different from 1). The Hai||om and Chrissie San both had values supporting a female KhoeSan bias (>1.19), suggesting that the bias was not restricted to the Bantuspeaking communities.

Discussion
The Baphuthi have an oral history of descent from BaTwa/ Baroa (KhoeSan) and a more recent narrative of the assimilation of refugees. 35,43 The results from our investigation show that the Baphuthi have an overall similarity to neighboring Bantu-speaking communities but are indeed unique in some aspects of their genetic history. We found no support for a unique eastern KhoeSan ancestry, but the Baphuthi appear to have a unique drifted genome.
The Baphuthi have close affinities to the southern Bantu-speaking communities The Baphuthi show strong genetic affinities to the communities from the surrounding region who speak southern Bantoid, eastern Bantu (SBE) languages as we may expect from their history. This was evident from the overlapping positions in the PCA, similar ratio of ADMIXTURE components related to southern Bantu-speakers and KhoeSan ($5:1) as previously reported, 47,67 and from the MALDER and f 3 results. The profile was particularly similar to the amaZulu in our analysis and in previous work. 8,23,45 In contrast, Baphuthi notably differed from the Duma San (Nguni speakers) even though both share a narrative of recent KhoeSan descent. 7,8 The Baphuthi genetic history appears to have been shaped by bottlenecks and/or inbreeding not shared with other SBE communities as shown by the cRoH profiles. However, these events are not recent.
While the Baphuthi speak Siphuthi, a Southern/ Lowland Ndebele language 41 (web resources), we found the amaNdebele shared more characteristics with the Basotho than the Baphuthi. This was despite the fact that the Baphuthi and amaZulu were geographically and linguistically closer to amaNdebele (and Duma San). Clearly the relationship among the Nguni-speaking communities is not simple.
The Nguni-speakers are suspected to have been the earliest of the present-day Ntu-speaking agriculturalists to arrive to southern Africa 68 and their relatedness may give details to the early history of the region. The divergence of the amaNdebele from the other Nguni may result from non-uniform admixture between the Nguni and later arriving communities. The cultural differences of the Nguni and Sotho-Tswana provide some clues as the customs unique to the Nguni in southern Africa have possible parallels in Rwanda. These include ''hlonipha'' (respectful etiquette) and a distinct dichotomy in social roles (e.g., between men tending to cattle while women tend to crops), 40 suggesting the arrival of an ''intervening influence'' (possibly related to the Basotho) between the regions.
The amaNdebele and Duma San showed support for additional Eurasian and KhoeSan admixture events not seen in the amaZulu and Baphuthi (Tables S2-S4) but in common with the Basotho.
While the amaNdebele migrated out of KwaZulu-Natal during the 1600s 40 and may have admixed during this migration, the east African/Eurasian ADMIXTURE components are too consistent across individuals to be from recent admixture and there was no support from f 3 and MALDER (Tables 1 and S4). Instead the contribution may derive from admixture with another SBE group. The same may be said for the Duma San.
The Basotho appear to descend from recent admixture between a southern SBE with a group from further north. Indeed the Basotho could be modeled as f 3 (Baphuthi, Zambians; Basotho) or f 3 (Duma San, Zimbabweans; Basotho) (but no other combination of these, Table S4). This result would relate to the elevated western Bantu contribution in the Zambians and Duma San (increased ''Himba'' and ''Naro'' ADMIXTURE components, Table S3). The detected admixture dates to 1450-1173 CE (Table 1), which would reflect a late Iron-Age contact. Our results support that the ''proto-Basotho'' migrated southward as one would expect as Sesotho is related to Sepedi, Setswana, Tshivenda, and Makua, found predominantly further north (web resources). 69 The origin of the SBW component in the Basotho (possibly related to that in the Zambians) may be tied to the arrival of Benfica pottery tradition in southern Congo and ultimately a possible western Bantu source. 70 Figure 4. Ratio of autosomal:X chromosome ADMIXTURE components for African populations Components shown are the KhoeSan (Naro) and non-KhoeSan African (YRI), the ratio of the two (YRI:Naro), and the sum (YRI þ Naro) at K ¼ 5. Horizontal lines in violin plots indicate the 25%, 50%, and 75% quantiles. Note that the y axis is on a binary log scale and all values were capped at y ¼ 10 to aid visualization. Horizontal line at y ¼ 1 indicates the expected value when the autosome and X chromosome are equal. Asterisk on YRI:Naro subplot indicates distributions statistically different from y ¼ 1; *p < 0.05, **p < 0.01, ***p < 0.001. Highlighted bars below the plot indicate the focal populations. culture, which is reflective of KhoeSan influence. 7,8,73 We found support for a common genetic origin of the KhoeSan heritage with that of the southern SBE and southern KhoeSan but no support for a unique KhoeSan descent in the Baphuthi. In some SBE (e.g., Zambians), KhoeSan descent was detected that was distinct from the southern SBE. The Baphuthi were not among these groups, contrary to what was anticipated from the ''essentialist'' reading of ''Bushman'' descent in historic texts. Moreover, the component prevalent in the Baphuthi was most prevalent in the Karretjie, zKhomani, and Chrissie San suggesting a regional affinity. Our discussion is made with the caveat that a higher resolution dataset could possibly detect signals that are elusive in the current study. The KhoeSan ADMIXTURE component, referred to as the ''Naro'' component, is seen in the Baphuthi and the southern SBE, Duma San, and Lake Chrissie San. The southern SBE lacked a clear difference from the southern KhoeSan based on the f 3 statistics and among the southern SBE, the ''Baphuthi'':''Naro'' ADMIXTURE ratios were remarkably consistent. This is particularly noteworthy considering the variation in linguistic affinities and other ancestral components of the southern SBE. These results support the proposed common source and event for the KhoeSan admixture in the region. 9,74 The X chromosome deviated from the proportions detected in the autosome and supported a by and large female bias in KhoeSan admixture (Figure 4), in line with earlier work. 22,71,74,72 The trend may be related to the socio-political dynamics during the Bantu-language expansion.
When compared to other SBE, the southern SBE have elevated KhoeSan components and a weaker bias for female KhoeSan admixture, a pattern shared with groups from south-west Africa (Figure 4). This is often argued as the result of demographic dynamics when the early Iron Age Bantu-expansion progressed across in southern Africa. 74,75,76 The change in environmental conditions would have slowed the rate of population growth and allowed for greater admixture with local populations, 77 leading to greater KhoeSan ancestry and potentially more equal assimilation of sexes.
If we assume that the KhoeSan affinities of the Baphuthi indeed reflect a recently acquired ancestry from the Maloti-Drakensberg, these ancestors would then have had shared affinities with other groups across the region. The broad relatedness across the region is possible as the ancestral languages of the groups in which the ''Naro'' ADMIXTURE component was largest (G|ui and G||ana, the Karretjie [likely |Xam language], the ||Xegwi [Chrissie San ancestors]) belonged to the !Ui branch of the Tuu family. 8 The languages spoken around the Drakensberg are likely to have also been Tuu. 78 The southern SBE and southern KhoeSan appear to have a common ancestral KhoeSan population, but the admixture history is not likely shared. The variation of the ''Baphuthi'':''Naro'' ADMIXTURE ratios among the southern KhoeSan suggests independent admixture events, con-trasting to the common ancestral ''proto-southern SBE'' who drifted or admixed to give rise to the present-day SBE diversity.
Some traits make the Baphuthi data peculiar from other SBE. Of particular interest is that the Bantu-related ''Baphuthi'' and ''Naro'' ADMIXTURE components are at higher proportions compared to other southern SBE, yet several Bantu-related components are frequently absent from the Baphuthi ( Figure S5). The ''Baphuthi'' and ''Naro'' components sum to >95% for most Baphuthi but none of the other southern SBE (median total of the two components $85%-89%, Figure S5). Furthermore, components such as the ''Himba,'' ''Kalenjin,'' and ''Jola,'' which were almost ubiquitous in the southern SBE, were frequently absent from the Baphuthi. The Baphuthi have apparently diverged even from their closest genetic kin, the amaZulu. For perspective, consider that only the Lake Chrissie San, ǂKhomani, Karretjie, and G|ui and G||ana had individuals with similar profiles (i.e., the sum of the two components >90%). This may be related to the historic and ancient bottlenecks detected in the Baphuthi (cRoH and PCA results). This distinction is surprising, as the Baphuthi's history mentions the historic amalgamation of diverse SBE groups; e.g., amaZizi, Maphuthing, Bafokeng, and Mpondomise. 43 Furthermore, the Baphuthi had evidence for KhoeSan sex bias more similar to eastern Africans than their geographic neighbors ( Figure 4). There is no support for a recent connection of the Baphuthi to east Africa based on the autosome. It also seems unlikely that the sex bias is a retained ancestral signal as this would contradict the overall regional pattern and indeed a pattern shared with the Chrissie San and Hai||om. Most likely there has been some distortion that may be due to drift and the lower N e of the X chromosome. Without more Nguni populations, our discussion of the Baphuthi's admixture biases is necessarily inconclusive.
With regards to the minor autosomal components absent in the Baphuthi, an independent loss of the same in the Baphuthi, southern KhoeSan, and G|ui and G||ana seems unlikely considering the large differences in the proportions of the two major components if the minor components were at notable levels in the source populations. The position of the Baphuthi on PC3 ( Figure S1) and the high cRoH supports a bottleneck in the Baphuthi related to southern Bantu-speaking and KhoeSan affinities. Moreover, the cRoH estimates did not support a recent shared bottleneck for the Baphuthi and amaZulu, indicating that the lower proportions in the amaZulu are not the result of the recent bottleneck seen in the Baphuthi. It may be the case that these minor components had already drifted, or were not present, in the early arriving Bantuspeaking communities.
A separate (re)introduction of the minor components can be supported. The amaZulu and Baphuthi could not be modeled as an admixture of east Africans and a KhoeSan group, and f 3 suggested that the east African ancestry detected in the Basotho, Duma San, and amaNdebele (e.g., the Asian admixture on f 3 and the ''Somali,'' ''Kalenjin,'' ''Himba'' ADMIXTURE components, Figure 2, Table S4) may reflect subsequent admixture rather than recent drift in the Nguni. The Duma San have a long history with the amaZulu but also acknowledge more recent Bantu-speaker ancestors 8 that may be the source of the (re)introduced components.
The possible re-introduction parallels the argument for multiple streams of east African migration into southern Africa (i.e., Chifumbaze complex) 70,71 with an early arriving Iron Age Bantu-related community, possibly Kwale tradition facies (here the ''proto-southern SBE'' ADMIXTURE profile of the Baphuthi or amaZulu) and then a replacement or assimilation by late Iron Age groups possibly derived from early Iron Age Kalundu tradition ceramic facies or possibly Nkope influence (marked by the addition of the ''Himba,'' ''Kalenjin,'' and ''Jola'' ADMIXTURE components as in Banyarwanda, Barundi in Figure 2). MALDER date estimates for admixture between KhoeSan and non-KhoeSan Africans (991 CE-118 BCE in the Zambians, Himba, and Baphuthi) pre-date the arrival of the late Iron Age Bantu expansion into south-eastern Africa, 21,24 while dates detected in the other southern SBE (1102-908 CE amaNdebele; 1450-1143 CE amaZulu, Duma San, Basotho) for admixture between KhoeSan and non-KhoeSan could be related to the second millennium CE late Iron Age expansion. 79 We found no ''ancient'' distinction of the Baphuthi from the SBE in KhoeSan ancestry, but the Baphuthi may have recently incorporated amaTola. 42,43 The amaTola reportedly incorporated Khoekhoe during the Frontier wars (1779-1878 CE) as well as Bantu-speakers 7 and in the am-aTola descendants, the Duma San, 8 we do see greater proportions of ADMIXTURE components in common with the Nama (a Khoekhoe group). While Khoekhoe pastoralists are recorded along the western parts of South Africa 14 and left cognates in the Nguni languages, 68 the extent of their range eastward is unclear. 80 The Nama have genetic ancestry indicating an admixture between a southern KhoeSan group and a Eurasian group related to the arrival of pastoralism in the region 9,65 possibly 252 CE-139 BCE (Table 1).
If we assume that the east African components identified in the southern SBE entered during their migration through east Africa into southern Africa, 79-81 then the absence of elevated east African ancestry in the Baphuthi would indicate that the assimilated ''Khoekhoe'' were perhaps culturally pastoralists but not of Khoe-Kwadi descent. Alternatively, that component might have been lost to genetic drift. Further work will help clarify this issue.
In this work, we investigated to what extent the oral history of Baphuthi as eastern San and Khoekhoe descendants is reflected in their genetics. The Baphuthi harbor signals for an interesting connection to the early arrival of the Bantu languages, but we could not support a unique eastern KhoeSan contribution. In the case of the Baphuthi and the previously investigated Duma San, we find some support for KhoeSan descent but not the ''essentialist'' reading of an eastern KhoeSan from historic texts. Such essentialist interpretations have created misconstrued narratives of ethnic/biological distinctions. The high status attained by KhoeSan Shaman and the pride taken in the ''Bushman'' means of semi-nomadic subsistence 7 may have entrenched the importance of ''Bushman'' heritage in the collective memory and perhaps without necessarily reflecting recent assimilation of some remnant KhoeSan group.

Data and code availability
Ethics approval and participant consent allows for the data presented in this study to be used in future research provided the research abides by the agreement in the original consent forms and ethics application. To avoid any conflicts of interest and violations of the ethics agreement, the request for data sharing will be conditioned upon signing a Data Transfer Agreement. Further details are provided in Note S2. Array data has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession numbers EGA: EGAD00010002467 and EGAD00010002468. Further information about EGA can be found on https://ega-archive.org.