Divergence and serial colonization shape genetic variation and define conservation units in Asian elephants

Asian elephants (Elephas maximus) are the largest extant terrestrial megaherbivores native to Asia, with 60% of their wild population found in India. Despite their ecological and cultural importance, their population genetic structure and diversity, demographic history, and ensuing implications for management/conservation remain understudied. We analyzed 34 whole genomes (between 11× and 32× coverage) from most known elephant landscapes in India and identified five management/conservation units corresponding to elephants in Northern (Northwestern/Northeastern), Central


INTRODUCTION
Asian elephants (Elephas maximus) are charismatic megaherbivores distributed across South and Southeast Asia and are culturally important across the globe. 1 They are found in a variety of natural ecosystems, from tropical evergreen forests through deciduous forests to grasslands, at various elevations. India harbors at least 60% of the population of wild Asian elephants. 1,2 The increase in human footprint and land-use change over the past two centuries has significantly affected elephants, resulting in population isolation even at regional scales. [3][4][5][6][7] Under such conditions, we might expect low genetic diversity and strong effects of isolation. However, camera trap data 8,9 and radiotelemetry studies 10 reveal that elephants in India have annual home ranges of several hundred square kilometers, often encompassing dense human habitation. Such movement through human-dominated areas might offset the effects of fragmentation, namely the associated loss of genetic variation and inbreeding. 11 Clearly, understanding elephant phylogeographic history and current population genetic structure, together with the possible effects of recent fragmentation, could be useful for elephant conservation and management.
Using microsatellites and a mitochondrial region, Vidya et al. 15 reported four Indian elephant population genetic clusters: one each in Northwestern-Northeastern India (NW&NE) and Central India (CI), and two in Southern India separated by the Palghat Gap in the Western Ghats mountain range, broadly corresponding to their regional population distributions. However, a more recent study by De et al. 16 suggests only three major genetic clusters, corresponding to Northwestern India, Northeastern India, and a combined Southern and Central Indian population (with the NW&NE population appearing to be admixed). Patterns of genetic diversity also varied across ecoregions. Vidya et al. 15 showed that the Southern Indian populations harbored lower mitochondrial haplotype diversity than other populations, while according to De et al., 16 the Northeastern Indian populations showed low heterozygosity. The only genome-wide study of elephants globally 17 has poor range-wide sampling of Asian elephants and suggests old divergences (circa 20-30 kya) between populations in Southern India across the Palghat Gap, as well as between the northern and southern populations. Given the limited genomic data and analyses within Asian elephants, it is difficult to understand their demographic history, population genetic structure, and, consequently, conservation priorities.
To address these gaps, we used whole-genome sequences of wild-caught Asian elephants from all regions of India, and from most landscapes within them, encompassing known biogeographic barriers, to assess their population structure, demographic history, and genetic variation. Our analyses help infer the predominant factors that have shaped the observed genetic patterns across the subcontinent. Further, we use this information to propose population management units (MUs) and speculate on future genetic issues in elephant conservation.

Population structure
We used 28 blood samples collected from wild-born captive elephants of known origin from almost all known elephant landscapes in India (Data S1A 11 ) and resequenced their whole genomes (Figure 1). We also included six previously published elephant whole-genome sequences, four of which were from India. A set of 2,675,655 biallelic SNP markers revealed distinct population structure among elephant landscapes in India. A principal component analysis (PCA) of genetic variation showed that populations separate from south to north along the PC1 axis (Figure 2A). Elephants from NW&NE form a single cluster (Figures 2A-2C). Populations from CI and NW&NE cluster together along PC axes 1 and 2 but resolve into separate clusters along PC axis 3 and in the ADMIXTURE plot (Figures 2A-2D), with the strongest support for five population clusters (northwest-northeast, central, and three clusters in Southern India [namely, north of the Palghat Gap (NPG), south of the Palghat Gap (SPG), and south of the Shencottah Gap (SSG)]; EvalAdmix; Figures S1 and S2). Henceforth, the NW&NE populations together are referred to as the Northern Indian population, and the populations in the Western Ghats in the south (NPG, SPG, and SSG) are collectively referred to as the Southern Indian population, unless the subregional populations are being described. The admixture graph, which includes a Bornean elephant (ERR2260499) as an outgroup and has branch lengths qualitatively adjusted to the drift parameter (Figures 2E and S6), suggests a deep divergence between the Northern and Southern Indian elephants. Additionally, we observe high drift parameter values for the two populations south of the Palghat Gap (SPG and SSG), potentially indicating low effective population sizes. The phylogenetic (neighbor-joining [NJ]) tree supports the patterns described above, with one minor difference: the central population appears nested within the northern population cluster (Figure S5).
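The PCA step above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 ALT-allele counts in a samples × SNPs NumPy array, standardizes each SNP by its mean and binomial standard deviation (similar in spirit to EIGENSOFT's smartpca normalization), and takes sample scores from a singular value decomposition. The function name genotype_pca and the simulated two-group data are hypothetical; this is not the pipeline used in the study.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Return sample PC scores from a (samples x SNPs) 0/1/2 genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # ALT-allele frequency per SNP
    keep = (p > 0) & (p < 1)                 # monomorphic SNPs carry no signal
    g, p = g[:, keep], p[keep]
    # Centre by the expected genotype 2p and scale by sqrt(2p(1-p))
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Left singular vectors times singular values give the sample scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: two hypothetical populations with very different allele frequencies
rng = np.random.default_rng(1)
group_a = rng.binomial(2, 0.1, size=(5, 200))
group_b = rng.binomial(2, 0.9, size=(5, 200))
scores = genotype_pca(np.vstack([group_a, group_b]))
print(scores[:, 0].round(2))
```

The sign of each PC is arbitrary, so only the relative placement of samples along an axis (here, the two toy groups falling on opposite sides of PC1) is interpretable.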
Our results confirm that the NW&NE elephant population is distinct from other Indian populations 15,16,18 and that the Ganges and Brahmaputra rivers have acted as potential barriers to gene flow (Figures 1 and 2). Previous studies suggested that the Brahmaputra River is an incomplete barrier: female-led family groups do not venture to cross it, except perhaps in its upper reaches, but it is not a barrier to adult male elephants. 15 Consistent with these studies, we observe that one individual (Rami, female) in the Central Indian samples had admixture with the northern populations, while another elephant (Bolanath, male) from the Central Indian population dispersed northward across the Ganges River before it was recently rescued from a location close to the northeastern population (Figure 1; Data S1A). Additionally, the Northern Indian elephant samples have ancestries found in all other clusters (Figure 2D), which could be due to incomplete lineage sorting. 19 Although they are geographically proximate, we find that the Central Indian populations are genetically distinct, as suggested previously. 15 Consistent with movement ecology inferences, 20 the elephants in the northwestern population 16 are connected to the Northeastern Indian elephants. This is a large landscape running west to east along the Himalayan foothills, though elephant habitat connectivity here is fragile 20 or has even been completely disrupted in places in recent times. 1,2
Our data and analyses allow the identification of a novel genetic cluster in Southern India and suggest three genetically differentiated populations in the region. Along the Western Ghats in the south, certain breaks or passes divide the elephant population, with the Palghat Gap being the most prominent barrier to elephant dispersal. Further south, the Shencottah Gap also acts as a previously unknown impediment to elephant movement (though anecdotal information suggests that elephants moved across this gap until a few decades ago). The genetic differentiation of SPG and SSG (also see Figures S3 and S4) could be due to founder effects and inbreeding combined with recent isolation and small population size. 21 Gene flow across the Shencottah Gap may have declined recently (compared with that across the Palghat Gap), largely due to a railway line, a highway, and associated development along these transportation corridors. While the nested biogeographic effects of these gaps on population structure and phylogeography have been highlighted for smaller species (e.g., montane birds, 22 bush frogs, 23 geckos, 24 and land snails 25 ) and some mammals (e.g., lion-tailed macaques 26 ), it is surprising that they result in such deeply divergent lineages in a large, highly mobile mammal such as the elephant. Alternatively, founding events followed by minimal gene flow might have produced the observed patterns of divergence. Additionally, elephant numbers and densities south of the Shencottah Gap have always remained small (on the order of 100 individuals). Such small populations are subject to genetic drift, which could accentuate the observed patterns. Pairwise FST, a measure of genetic differentiation between populations, supports these inferences (Data S1B), and we find no significant gene flow between the clusters based on F3 statistics (Data S1C), which test whether a target population is an admixture of two source populations. A significantly negative F3 value indicates that the target population is admixed between the two source populations, signifying gene flow between them. For Indian elephants, we do not find a significantly negative F3 value for any combination of three populations (Data S1C). This lack of gene flow, combined with the possibility of incomplete lineage sorting between the northern populations and the others, points toward serial colonization from the northern populations.
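The F3 test described above can be sketched directly from allele frequencies. This minimal version assumes per-SNP allele frequencies for the target and the two source populations, plus the number of sampled chromosomes in the target, and subtracts the small-sample correction c(1-c)/(n-1) from the target term. It omits the block-jackknife standard errors that ADMIXTOOLS2 uses to decide whether a negative value is significant, so it is a toy illustration rather than the qp3pop computation; the function name f3_statistic is hypothetical.

```python
from typing import Sequence


def f3_statistic(target: Sequence[float], source_a: Sequence[float],
                 source_b: Sequence[float], n_target: Sequence[int]) -> float:
    """Mean f3(target; source_a, source_b) over SNPs.

    Inputs are per-SNP allele frequencies; n_target holds the number of
    sampled target chromosomes per SNP (must be >= 2).
    """
    total = 0.0
    for c, a, b, n in zip(target, source_a, source_b, n_target):
        # Shared drift of the target toward both sources, minus the bias
        # introduced by estimating c from a finite sample
        total += (c - a) * (c - b) - c * (1.0 - c) / (n - 1)
    return total / len(target)


# Target sits between the two sources (admixture-like): negative f3
print(f3_statistic([0.5, 0.5, 0.5], [0.0, 1.0, 0.2], [1.0, 0.0, 0.8], [40, 40, 40]))
# Target has drifted away from both sources: positive f3
print(f3_statistic([0.0, 1.0], [0.5, 0.5], [0.5, 0.5], [20, 20]))
```

Only a significantly negative value is informative about admixture; a positive value, as in the second call, is what a drifted, unadmixed target produces, which matches the absence of negative F3 values reported above.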
Interestingly, the results obtained from the haplotype network analysis based on the mitogenome reveal other nuances (Figure S7).Similar to the nuclear genome, the northwestern population was embedded within the northeastern population.However, the Central Indian population showed closer affinity to the southern populations, unlike the results obtained from the nuclear data, and more in line with some previous studies. 14,16Such discordance between mitochondrial and nuclear datasets in phylogeographic interpretations has been reported previously. 27Since elephants form matrilineal herds, individuals in a herd should share their mitochondrial haplotype.It is possible that herds of matrilineally related elephants from the northern population colonized CI and NPG and subsequently founded the other southern populations, leading to the observed patterns.Our findings support the recognition of five elephant MUs in India, emphasizing their antiquity and unique evolutionary histories.

Demographic history
We investigated recent demographic history for clusters with more than nine samples (northern and NPG) using GONE 28 and show that both populations underwent a recent bottleneck around 1,500-1,000 years ago (Figure 3A). Since the taming of elephants in the subcontinent during Harappan times, at least four millennia ago, wild individuals have been regularly exploited for military and domestic use. 1 We do not have estimates of the actual extent of exploitation (in terms of annual offtake) but qualitatively infer the levels of exploitation from the stocks of elephants in captivity. Historical accounts suggest that the armies of ancient kingdoms and republics in the north (the Gangetic basin) maintained several thousand captive elephants from as early as the 3rd century BCE until late Mughal times in the early 17th century CE, suggesting overexploitation of wild populations for nearly 19 centuries, until the invention and use of gunpowder in warfare rendered the war elephant irrelevant. 1 Our results also suggest that both populations (northern and NPG) may have started recovering from the bottlenecks around 300-500 years ago. This recent recovery-like pattern may instead arise from recent, fine-scale population structure in the northern cluster, 1,2 which would reduce shared linkage disequilibrium (LD) blocks between populations, giving an illusion of population expansion. 29,30
Additionally, we find signatures of population bottlenecks around 100,000 years ago (Figure S10). However, further investigation with hPSMC 32 and fastsimcoal2 31 suggests that this coincides with populations differentiating from each other (Figures 3B and 3C). hPSMC is a pairwise coalescent model based on in silico hybrid genomes consisting of one haplotype from each of the two populations. Because the two haplotypes cannot coalesce more recently than the time at which gene flow between the lineages ceased, the inferred Ne rises toward infinity over that interval. Thus, the point at which Ne begins to increase (moving forward in time) is taken as the time of population divergence; however, the divergence could be older, and we explored this possibility by explicitly modeling population divergence using fastsimcoal.
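The in silico hybrid idea behind hPSMC can be sketched as follows: one haploid sequence from each population is paired into a pseudo-diploid, and each fixed-length window is summarized as containing at least one difference or none, the kind of binary per-window signal PSMC-style models work from. The function name hybrid_windows, the 100-bp window, the 90% callable-site threshold, and the use of "N" for uncalled bases are assumptions for illustration; this is not the hPSMC code.

```python
from typing import List, Optional


def hybrid_windows(hap_a: str, hap_b: str, window: int = 100,
                   min_called: float = 0.9) -> List[Optional[int]]:
    """Summarise a pseudo-diploid built from two aligned haploid sequences.

    Returns one entry per window: 1 if the haplotypes differ at any called
    site, 0 if they never differ, None if too few sites are called.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same reference")
    calls: List[Optional[int]] = []
    for start in range(0, len(hap_a), window):
        a, b = hap_a[start:start + window], hap_b[start:start + window]
        # Keep only positions called in both haplotypes
        called = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
        if len(called) < min_called * len(a):
            calls.append(None)              # insufficient data in this window
        else:
            calls.append(int(any(x != y for x, y in called)))
    return calls


# Hypothetical 300-bp haplotypes: one difference at position 150 and a
# final window that is uncalled in the first haplotype
hap_a = "ACGT" * 50 + "N" * 100
hap_b = hap_a[:150] + "A" + hap_a[151:200] + "A" * 100
print(hybrid_windows(hap_a, hap_b))
```

Because the two haplotypes come from different populations, long runs of difference-free windows become rare once coalescence is pushed back beyond the divergence time, which is what produces the Ne signal described above.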
Qualitative assessments such as hPSMC suggest that the northern elephant population diverged from all other populations about 70,000-100,000 years ago, while fastsimcoal suggests that it diverged from the Southern Indian populations 134-245 kya and from the CI cluster 102-174 kya. The hPSMC results further suggest that the Central Indian elephants diverged from the rest around 50,000-80,000 years ago, while the three Southern Indian populations diverged from each other only around 20,000-30,000 years ago. However, fastsimcoal supports a model in which the Southern Indian populations diverged from each other between 72 and 100 kya. The serial reduction of historical effective population size from north to south indicates potential serial founding of elephant populations in India along this direction. 33 Overall, our results emphasize the antiquity of the northern populations of elephants, consistent with Vidya et al. 14,15 Palkopoulou et al. 17 had suggested a more recent divergence, which could be due to differences in the mutation rates used for inference (4.06*10^-8 vs. 5.3*10^-9). 34 Vidya et al. 14 made phylogeographic inferences based solely on mitochondrial DNA sequences, which are expected to give older estimates of divergence times, even up to an order of magnitude greater. 35 Discordance between mitochondrial and nuclear genome estimates of phylogeography has been attributed to several causes, including sex-biased dispersal, a faster sorting rate of mtDNA than of nuDNA, and differences in the geographical range or number of hybridizing populations 27,36 ; thus, inferences of phylogeny and divergence times from mtDNA alone are best avoided. 37 However, consistent with Palkopoulou et al., 17 who used whole genomes, we find that the NPG and SPG populations in the south diverged from each other only around 20,000 years ago, around the time of the Last Glacial Maximum, when southern India, in particular the Western Ghats, was more arid. 38,39
Genetic diversity
To compare the MUs with each other, we estimated genetic diversity as the average number of pairwise differences between sequences (π) within each cluster. We find that the Southern Indian clusters (SPG, SSG, and NPG) have lower average nucleotide diversity than the NW&NE and CI clusters. Interestingly, there is no visible difference in nucleotide diversity among the southern clusters, or between the northern and central clusters (Figure 4A; Data S1D). This suggests that all the clusters within Southern India have a similar number of haplotypes, as do the central and northern clusters. As expected, the only statistically significant differences were observed between the northern and southern clusters and between the central and southern clusters (Data S1E). About 75% of the discovered SNPs are shared by all populations (Data S1I). However, the number of heterozygous sites per Mb is highest for the northern population and lowest for the southernmost population (SSG) (Figure 4B). We find no difference in heterozygous SNVs per Mb between NPG and CI (Data S1E). Overall, our results suggest that most elephant populations have similar nucleotide diversity, but individuals in SSG more often carry two identical alleles at a site (i.e., are homozygous), indicating a lower effective population size, whereas individuals in Northern India more often carry two different alleles, indicating a higher effective size. These patterns are consistent with signatures of serial founding, where the heterozygosity of each successive population is predicted to decrease relative to the parent population and to be proportional to the effective population size of the founders. 33 Genetic variation is related to effective population size, which, in turn, depends on the census population size of a species and other variables such as the adult sex ratio.
40 Although we cannot speculate on the historical population sizes of Asian elephants across the regions we investigated, a cursory look at the recent population sizes of the five management/conservation units inferred from the genomic data shows that census population size does not predict the observed genetic variation (π and heterozygosity) (Data S1J). For instance, the genetic variation of the Northeastern (NE) population is much higher than that of the southern population north of the Palghat Gap (NPG), although both clusters have similar census population sizes, on the order of 10,000 individuals. Central Indian elephants have higher variation than the southern populations, even though the former have a distinctly lower census population size of ~3,000 elephants (Data S1J). This might indicate that the decline in Central India is recent, because heterozygosity decays slowly after a bottleneck, while the southern populations, especially those south of the Palghat Gap, may have had historically smaller populations. The highly female-biased sex ratios in southern populations, especially those south of the Palghat Gap, in recent historical times (1970s-1990s), resulting from selective poaching of male elephants for ivory, 5,41 could have decreased the effective population size relative to the census size. According to anecdotal evidence, elephants south of the Shencottah Gap were also connected to the SPG population through the movement of males until the early 1980s. 42,43 Overall, our results are thus also consistent with a serial dilution of variation that could result from sequential colonization [44][45][46] from northern to central, from northern + central to southern, and then from NPG to SSG, in that order.
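The two diversity measures used in this section, population-level nucleotide diversity (π) and individual heterozygous sites per Mb, can be sketched as below. The sketch assumes biallelic SNPs summarized as ALT-allele counts among a fixed number of sampled chromosomes, individual genotypes coded 0/1/2, and a known callable length in base pairs; the function names are hypothetical, and real pipelines must also handle per-site missing data.

```python
from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int], n_chrom: int,
                         callable_bp: int) -> float:
    """Per-site pi: mean pairwise differences among n_chrom chromosomes.

    At a SNP with k ALT copies, k * (n_chrom - k) of the
    n_chrom * (n_chrom - 1) / 2 chromosome pairs differ.
    """
    pairs = n_chrom * (n_chrom - 1) / 2
    diffs = sum(k * (n_chrom - k) for k in alt_counts)
    return diffs / pairs / callable_bp


def heterozygous_per_mb(genotypes: Sequence[int], callable_bp: int) -> float:
    """Heterozygous sites per Mb for one individual (genotypes 0/1/2)."""
    n_het = sum(1 for g in genotypes if g == 1)
    return n_het * 1e6 / callable_bp


# Toy example: three SNPs among 4 chromosomes (two diploids), 1 kb callable
print(nucleotide_diversity([1, 2, 4], n_chrom=4, callable_bp=1_000))
# One individual with 3 heterozygous calls in 2 Mb of callable sequence
print(heterozygous_per_mb([0, 1, 1, 2, 1], callable_bp=2_000_000))
```

The distinction matters for the argument above: π summarizes variation across all sampled chromosomes in a cluster, whereas heterozygosity per Mb asks whether the two chromosomes within one individual differ, which is the quantity that falls first under inbreeding and small effective size.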
An alternative and more parsimonious explanation could be that northern and central (NW&NE + CI) vs. southern (NPG, SPG, and SSG) groups diverged long ago (134-245 kya with a median of 185 kya), with the various groups or subgroups becoming more isolated over time due to a combination of several factors such as isolation by distance, phylogeographic barriers, and anthropogenic drivers (in recent millennia).In the future, with more sampling of these populations and varied types of data (including phenotypic data, known neutral and selected loci), a framework such as the one proposed in Orsini et al. 47 might help distinguish between these two scenarios of elephant evolutionary history (divergence vs. colonization) more conclusively.
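The divergence times discussed in this section depend directly on the assumed mutation rate: coalescent-based times are inferred from sequence differences that accumulate at rate μ per generation, so, with generation time held fixed, they scale as 1/μ. The arithmetic below sketches how much of the discrepancy with Palkopoulou et al. the two rates quoted above could account for. It assumes both rates are per-site, per-generation values, that 4.06*10^-8 is the higher rate associated with the earlier study, and that the 185 kya median was obtained under 5.3*10^-9; the function name rescale_time is hypothetical, and this is an illustration, not a re-analysis.

```python
def rescale_time(t: float, mu_assumed: float, mu_new: float) -> float:
    """Re-express a coalescent time estimate under a different mutation rate.

    The same observed divergence d gives t = d / (2 * mu), so
    t_new = t * mu_assumed / mu_new (units of t are preserved).
    """
    return t * mu_assumed / mu_new


median_kya = 185.0  # fastsimcoal median quoted above (assumed mu = 5.3e-9)
print(round(rescale_time(median_kya, 5.3e-9, 4.06e-8), 1))
```

Under these assumptions, the ~7.7-fold difference between the two rates alone would move a 185 kya estimate to roughly 24 kya, of the same order as the 20-30 kya divergences reported in the earlier whole-genome study.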
We further test sequential colonization by estimating FROH, the proportion of the genome in runs of homozygosity (ROH), with longer runs indicative of recent bottlenecks and/or inbreeding. 48 We observe that individuals from SSG have a high average FROH (ROH > 0.1 Mb) of 0.4 (40% of the genome in homozygous stretches), while individuals from Northern India have the lowest average FROH (ROH > 0.1 Mb) of 0.2 (20% of the genome in homozygous stretches; Figure 5A). There is no significant difference between individuals from the SPG, NPG, and CI populations (average FROH (ROH > 0.1 Mb) of 0.25). Longer homozygous stretches indicate recent inbreeding. 49,50 ROH longer than 1 Mb and 10 Mb show few differences between populations, although the Northern Indian population has the least recent inbreeding (Data S1F-S1H). Additionally, the SPG and northern populations have lower variance in FROH than other populations, potentially indicating gene flow and connectivity within these large landscapes, which would result in lower variance in parental relatedness. The results suggest that, despite inbreeding avoidance in elephants, 6 founder effects and drift have led to highly inbred individuals.
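The FROH summaries above reduce to a simple ratio once ROH have been called. The sketch assumes ROH are already available as non-overlapping, half-open (start, end) intervals in base pairs and that the denominator is the analyzed (e.g., autosomal, callable) genome length; the function name froh and the toy segments are hypothetical, and ROH calling itself is not shown.

```python
from typing import Iterable, Tuple


def froh(roh_segments: Iterable[Tuple[int, int]], genome_bp: int,
         min_len_bp: int = 100_000) -> float:
    """Fraction of the genome in ROH longer than min_len_bp."""
    covered = sum(end - start for start, end in roh_segments
                  if end - start > min_len_bp)
    return covered / genome_bp


# Hypothetical ROH for one individual on a 40-Mb toy genome
segments = [(0, 50_000), (100_000, 600_000), (1_000_000, 13_000_000)]
print(froh(segments, 40_000_000))              # ROH > 0.1 Mb -> 0.3125
print(froh(segments, 40_000_000, 10_000_000))  # ROH > 10 Mb -> 0.3
```

Raising the length threshold keeps only long runs, which trace back to recent common ancestors of an individual's parents; this is why the >1 Mb and >10 Mb summaries are read as recent inbreeding.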
While the FROH analyses reveal individual inbreeding, a high average FROH in a population need not result in inbreeding depression, as the homozygous segments may differ between individuals within a population. To measure the consequences of mating between individuals within each population, we measured the percentage of the genome that is identical by descent (IBD) between pairs of individuals in each population (Figure 5B). Elephants in NW&NE share very few IBD stretches with each other (on average 1.6% of the genome in IBD stretches longer than 10 Mb and 6.6% in stretches longer than 0.1 Mb), while the two individuals from SSG share about 43% of their genome in stretches longer than 10 Mb and about 52% in stretches longer than 0.1 Mb. Apart from one outlier pair in the SPG sample set, we observe negligible differences among the SPG, NPG, and CI populations in the proportion of the genome shared IBD between pairs of individuals. Averaged across these three clusters, pairs share more than 10% of their genome in IBD stretches longer than 10 Mb (NPG, 8%; SPG, 14%; CI, 12%) and 19% in IBD stretches longer than 0.1 Mb (NPG, 15%; SPG, 25%; CI, 17%). These results suggest a risk of inbreeding depression in the SSG, SPG, and CI populations in upcoming generations. Because the present individuals already share large IBD stretches, fitness impacts could be severe, especially if these individuals carry higher numbers of deleterious alleles. However, since a few pairs share smaller amounts of their genome IBD (Figure 5B), there is hope that maintaining gene flow within these population clusters could sustain extant genetic variation in these populations into the future.
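The pairwise IBD summary above can be sketched in the same way as FROH, but averaged over all pairs of individuals in a population. The sketch assumes IBD segments have already been inferred and are stored per unordered pair of sample IDs (a frozenset key, so (a, b) and (b, a) map to the same entry); pairs with no detected segments count as zero sharing, and at least two samples are required. The function name, sample IDs, and segments are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

Segments = List[Tuple[int, int]]


def mean_pairwise_ibd(ibd: Dict[FrozenSet[str], Segments], samples: List[str],
                      genome_bp: int, min_len_bp: int) -> float:
    """Mean fraction of the genome shared IBD (segments > min_len_bp)
    across all pairs of samples in one population."""
    fractions = []
    for a, b in combinations(samples, 2):
        segs = ibd.get(frozenset((a, b)), [])
        shared = sum(end - start for start, end in segs
                     if end - start > min_len_bp)
        fractions.append(shared / genome_bp)
    return mean(fractions)


# Hypothetical population of three elephants on a 100-Mb toy genome
samples = ["e1", "e2", "e3"]
ibd = {
    frozenset({"e1", "e2"}): [(0, 20_000_000)],
    frozenset({"e2", "e3"}): [(0, 200_000), (5_000_000, 5_050_000)],
}
print(mean_pairwise_ibd(ibd, samples, 100_000_000, 10_000_000))  # > 10 Mb
print(mean_pairwise_ibd(ibd, samples, 100_000_000, 100_000))     # > 0.1 Mb
```

Averaging over pairs rather than individuals is what makes this a forward-looking measure: it approximates the expected homozygosity of offspring from random mating within the population.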

Genetic load
We attempted to understand whether individuals from populations with high homozygosity could suffer fitness effects due to inbreeding depression by estimating genetic load, that is, putatively damaging mutations. Load depends on the number of derived deleterious mutations harbored across the genome, their zygosity (heterozygous or homozygous), and the putative magnitude of their fitness effects (insertion/deletion [indel] loss of function [LoF] > LoF > missense mutation). 51,52 We normalized these counts by derived SNPs in neutral/synonymous/intergenic regions to account for possible missingness or uneven coverage across genomes. As expected, the population with the lowest genetic variation (SSG) had fewer LoF mutations (Figure 6A, x axis). This could be because of the serial founding demographic history from north to south (Figure 1E). However, LoF mutations in SSG were not entirely a subset of those in the northern population, potentially supporting the emergence of de novo mutations in the SSG population along with differential effects of drift and selection due to the long history of isolation between the two populations. The SSG population has the highest homozygosity for LoF mutations. Interestingly, SPG also had fewer homozygous LoF mutations (Figure 6A), potentially due to a higher effective population size and de novo mutation accumulation due to historical isolation.
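The load summaries above rest on two steps: polarizing each allele as ancestral or derived using an outgroup, and normalizing derived deleterious alleles by derived neutral alleles. The sketch assumes VCF-style REF/ALT alleles with 0/1/2 ALT-count genotypes, treats the outgroup base as ancestral, and defines "fraction homozygous" per site (homozygous derived sites over sites carrying at least one derived allele). These definitions, the function names, and the toy counts are assumptions; the sketch handles a single mutation class and does not weight indel LoF, LoF, and missense differently.

```python
from typing import List, Optional, Tuple


def derived_dosage(genotype: int, ref: str, alt: str,
                   outgroup: str) -> Optional[int]:
    """Convert a 0/1/2 ALT-count genotype into a derived-allele count,
    taking the outgroup base as ancestral; None if it cannot be polarised."""
    if outgroup == ref:
        return genotype        # ALT is the derived allele
    if outgroup == alt:
        return 2 - genotype    # REF is the derived allele
    return None                # outgroup matches neither allele


def relative_load(deleterious: List[int],
                  neutral: List[int]) -> Tuple[float, float]:
    """For one individual, given derived-allele dosages (0/1/2), return
    (derived deleterious / derived neutral alleles,
     fraction of carried deleterious sites that are homozygous)."""
    del_alleles = sum(deleterious)
    neutral_alleles = sum(neutral)
    carried = sum(1 for d in deleterious if d > 0)
    homozygous = sum(1 for d in deleterious if d == 2)
    if neutral_alleles == 0 or carried == 0:
        raise ValueError("need derived neutral and deleterious alleles")
    return del_alleles / neutral_alleles, homozygous / carried


print(derived_dosage(0, "A", "G", "G"))  # REF/REF, but G is ancestral -> 2
print(relative_load([2, 1, 1, 2, 0], [1] * 20 + [2] * 10))
```

Dividing by derived neutral alleles means that an individual with lower coverage or more missing calls is not automatically scored as carrying less load, which is the rationale for the normalization described above.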
To better understand whether elephant populations such as SSG are endangered by imminent inbreeding depression, we compared the number and homozygosity of indel LoFs among all five populations. The total number of indel LoFs did not vary considerably between populations (Figure 6C, x axis). That all populations have a similar indel LoF load (in contrast with the LoF load; Figure 6A, x axis) suggests that these putatively highly deleterious mutations are under strong purifying selection and may already have been purged from these populations. By contrast, homozygous indel LoFs remain more frequent in SSG and SPG. This suggests that, despite purging, elephants south of the Palghat Gap could be the first to experience negative fitness consequences of indel LoFs. These populations may be threatened with inbreeding depression or lower individual fitness, with ensuing negative feedback on population growth rates and the potential for further increases in homozygosity due to inbreeding in future generations. 53 However, the Asian elephant in Borneo, despite its extremely narrow genetic base, 54 possibly the outcome of a severe bottleneck during colonization of the island in the Last Glacial Maximum, 55 has recovered and maintains a population of about 2,000 individuals. Also, African elephant populations (such as in Addo Elephant National Park, South Africa) grew rapidly with no apparent deleterious effects after a severe bottleneck about a century ago. 56,57 Small populations are expected to be efficient at purging strongly deleterious alleles, 52,58 and this does seem to be happening in the isolated population south of the Shencottah Gap, leading to fewer deleterious alleles there. But the question that remains is whether these endangered populations, with long generation times and low population growth rates, can tolerate a high genetic load.
Southern populations (NPG, SPG, and SSG) have fewer derived missense mutations than CI and the northern populations (Figure 6B), but SPG and SSG have a higher homozygous missense mutation load than the NPG, CI, and northern populations, again potentially due to drift and inbreeding. The smaller number of mildly deleterious missense alleles in the Southern Indian population north of the Palghat Gap (NPG) compared with CI and the northern population is counterintuitive, as current literature suggests that larger populations (Data S1J) harbor higher numbers of mildly deleterious alleles. 52,58 We suggest that serial dilution of missense variants, combined with long isolation from the larger and connected Northern Indian population, may have restricted the immigration of mildly deleterious alleles into the NPG population, while LoF mutations are purged equally well in the large NPG, CI, and northern populations. Additionally, the low homozygosity in the NPG, CI, and northern populations means that their load is largely masked, but it could be expressed, causing inbreeding depression, if inbreeding increases. 53 Overall, we observe that the populations in Northern India have high numbers of deleterious alleles, but the effects of these alleles are masked by heterozygosity. Although the southern populations have fewer deleterious alleles, they have a high chance of expressing them due to high homozygosity. For example, in the southernmost population (SSG) more than 40% of the LoF mutations are homozygous, while in NW&NE about 20% are homozygous. Thus, serial founding in elephants seems to increase the realized genetic load with every successive founding event.
While the exact fitness effects of mutations and the distribution of fitness effects are unknown for most wild species, we tried to better understand the possible fitness effects of the load. The LoF mutations were annotated using the FUSIL database. 59 The database categorizes genes on the basis of knockout mutations in mice, according to whether a given knockout is lethal, sub-viable (only 12.5% of individuals survive), or harmless. Among the genes harboring LoF mutations in Indian elephants, 892 were represented in the FUSIL database. Of these 892 genes, 188 carried LoF mutations that are homozygous lethal in mice, and 112 carried mutations that lead to a sub-viable phenotype in mice. Thus, about 34% (300 of 892) of the LoF mutations, when homozygous, produce a deleterious phenotype in mice. This highlights the risks elephants face in the near future. While mutations causing lethal phenotypes are expected to be purged faster, those with sub-viable phenotypes may pose the real threat and lead to a slow decline in elephant populations.
Functionally, the derived missense mutations are related to sensory perception, detection of chemical stimuli, and especially olfaction (Figure S8). However, there is an abundance of olfactory genes in the elephant genome, 60 and thus more opportunities for mutations. Additionally, several olfactory genes may be pseudogenes. However, since we polarized the alleles using the Bornean elephant genome, the missense mutations can be confidently classified as recently derived and potentially deleterious. The LoF mutations mostly affect protein, ion, and nucleic acid binding, along with transferases and transporters (Figure S9). Although inbred individuals are more homozygous for these mutations and might be expected to be severely affected, it is possible that sociality and herd living allow individuals with reduced sensory abilities to persist. However, this needs further investigation.

Conservation implications
Overall, our results suggest five MUs for Asian elephants in India. Elephants in the Himalayan foothills from the northwest to the northeast of India form a single genetic cluster that diverged from other Indian populations more than 70,000 years ago. Also, given that the mitochondrial heritage of the northern cluster is distinct from that of other elephant populations in India, the NW&NE cluster is probably the most evolutionarily unique population. Elephants south of the Shencottah Gap (SSG) need priority conservation attention to better understand census sizes, connectivity, and genomic variation. Conservation and management action should focus on protecting remaining habitats, minimizing unnatural deaths, examining the feasibility of restoring the link across the Shencottah Gap, and carefully considering translocations into and out of this population. Since we did not have extensive samples from the SSG population, more genetic sampling, as well as studies on inbreeding and possible associated phenotypes, will be important to understand future fitness trajectories here.
Populations north and south of the Palghat Gap (NPG and SPG) are distinct MUs, and animals should be moved across this biogeographic divide only after very careful evaluation of the biological consequences. Further, given the high genetic load in these two MUs, existing connectivity must be maintained within each of these two major landscapes.
The Ganges and Brahmaputra rivers in the north function as biogeographic divides, though incomplete, making elephants south of these rivers in Central India a unique management/conservation unit.Detailed and geographically widespread genetic studies in Central India are important to better understand this MU, in particular in the recent context of large-scale dispersals.
Overall, elephants across India face significant challenges from a suite of factors, mainly negative human impacts, and we hope that conservation efforts can be bolstered by recognition of their unique evolutionary history. Using whole-genome data 27,61 as opposed to solely mitochondrial DNA can allow a better understanding of how Asian elephant evolution was shaped by Pleistocene climate cycles, 14 something we have not attempted here, and of how elephants will respond to future climate change in the subcontinent. 62 Previously, researchers have suggested mixing lineages for conservation, provided that the risks of outbreeding depression are evaluated 63 or that outbreeding depression can reasonably be assumed to be negligible. 64 We find deep divergences between elephant populations. We therefore recommend maintaining historic connectivity between populations to allow natural dispersal, rather than translocation, as the conservation approach until the risks of gene flow can be evaluated with empirical evidence.
While the effect of genetic variation on the survival of small and isolated populations continues to be investigated, several recent studies highlight the purging of deleterious allele load (52,53 and the references therein). This study on Indian elephants reveals that while the number of deleterious alleles is low in small populations (such as SSG), the realized load is high because of higher homozygosity. Additionally, our data demonstrate that populations can have a low (but unmasked) deleterious allele load due to their unique demographic histories. Regardless, populations with a high homozygous deleterious mutation load should have high priority for conservation. The exact phenotypic and demographic trajectories of inbreeding depression and extinction will depend on the specific load and may be difficult to generalize across populations and species.
Our approach of using high-throughput genomic data yielded unique and important conservation insights. Elephants in India present perhaps one of the first clear examples of serial founding of a large landscape at a subcontinental scale, with decreasing founder size with distance from the source. Our data allow us to detect recent declines, potentially mediated by historic large-scale elephant captures. Further, elephants provide an excellent system to better understand the interplay between genetic load and inbreeding across a set of five populations, a rare setup among endangered species. While the immediate on-ground conservation challenges for elephants are mitigating human-associated conflict and infrastructure-associated mortality, conservation genomics insights provide long-term conceptual guidance for the future survival of these populations.

Lead contact
Further information and requests for resources should be directed to anubhabkhan@gmail.com.

FST
We estimated pairwise FST between populations using VCFtools. Both mean and weighted FST were estimated to capture the relative separation of the populations.

F3
We estimated F3 statistics for all possible three-population combinations in our data to test for gene flow, using the qp3pop and f3 functions of ADMIXTOOLS2.
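To our understanding, VCFtools reports a "mean" FST (the average of per-site Weir and Cockerham estimates) and a "weighted" FST (the sum of per-site numerators divided by the sum of per-site denominators), and the F3 statistic f3(C; A, B) is the expected product (c - a)(c - b) of allele-frequency differences. The sketch below illustrates both quantities under those assumptions. It is not the VCFtools or ADMIXTOOLS2 code: the function names are hypothetical, the per-site FST components are assumed to be precomputed, and f3_naive omits the finite-sample bias correction and block-jackknife standard errors that qp3pop applies.

```python
from __future__ import annotations

from statistics import fmean


def mean_and_weighted_fst(num: list[float], den: list[float]) -> tuple[float, float]:
    """Contrast two genome-wide summaries of per-site FST.

    num, den: per-site numerator and denominator of an FST estimator,
    e.g. a and (a + b + c) in Weir and Cockerham's variance components.
    Mean FST averages the per-site ratios; weighted FST is a ratio of sums,
    so sites with little variation (small denominators) count for less.
    """
    pairs = [(n, d) for n, d in zip(num, den) if d > 0]  # drop uninformative sites
    if not pairs:
        raise ValueError("no informative sites")
    mean_fst = fmean(n / d for n, d in pairs)
    weighted_fst = sum(n for n, _ in pairs) / sum(d for _, d in pairs)
    return mean_fst, weighted_fst


def f3_naive(c: list[float], a: list[float], b: list[float]) -> float:
    """Naive f3(C; A, B): the mean over sites of (c - a) * (c - b).

    c, a, b: frequencies of the same allele in populations C, A and B.
    A significantly negative value suggests that C is admixed between
    populations related to A and B. Unlike qp3pop, this omits the
    finite-sample bias correction and block-jackknife standard errors.
    """
    if not c or not (len(c) == len(a) == len(b)):
        raise ValueError("need equal-length, non-empty frequency lists")
    return fmean((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b))
```

With per-site numerators [0.1, 0.0, 0.2] and denominators [0.2, 0.5, 0.4], the mean estimate is about 0.333 and the weighted estimate about 0.273, because the site with zero differentiation has the largest denominator and therefore pulls the ratio of sums down more strongly.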

Demographic history
Recent demographic history
We used the LD-based method GONE 28 to estimate recent demographic history up to a few hundred generations ago, using the default parameter settings with PHASE set to 0. We plotted the results assuming a generation time of 31 years. 17 GONE is a multi-sample analysis that requires several samples for accurate estimates; hence, we performed this analysis only for the NW&NE and NPG clusters, which have at least nine samples each. Its results are most informative about recent events and are not very reliable for events older than about a hundred generations. 28

Old demographic history
We used the coalescent-based method SMC++ 85 to estimate older demographic history, following the protocol described for its implementation (https://github.com/popgenmethods/smcpp) without modification. We set the mutation rate to 5.3 × 10^-9 per base per generation 34 and the generation time to 31 years. Estimates from this method are reliable for demographic events between about a hundred and forty thousand generations ago.

Divergence time
We estimated divergence times by contrasting alternative models in fastsimcoal2.7 31 and with hPSMC. 32 We filtered the autosomal SNPs to remove sites in LD, using an r2 threshold of 0.5, which retained 80,684 SNPs. We estimated the folded site frequency spectrum (SFS) using easySFS (https://github.com/isaacovercast/easySFS), subsampling 9, 4, 6, 4, and 2 diploid individuals from the NW&NE, CI, NPG, SPG, and SSG populations, respectively, with the option -proj 18,8,12,8,4 (haploid sample sizes). These projections retained the maximum number of non-missing SNPs for each population while maximizing the number of individuals retained. We tested the scenarios presented in Figures 7A (model_v2) and 7B (model_v3) with constant population sizes and growth rates. In model_v2 (Figure 7A), time1 > time2 > time3 > time4 represents the coalescence of lineages NW&NE-CI > CI-NPG > NPG-SPG > SSG-SPG. In model_v3 (Figure 7B), time2 > time1 > time3 > time4 represents the coalescence of lineages NW&NE-NPG > NW&NE-CI > NPG-SPG > SSG-SPG. For both models, we modeled two divergence scenarios: one with the advent of agriculture (7,000 years ago) as the maximum divergence time (time1 < 7 kya for model_v2 and time2 < 7 kya for model_v3), and another with the Last Glacial Maximum as the minimum divergence time (time1 > 17.5 kya for model_v2 and time2 > 17.5 kya for model_v3 86). We estimated parameters with fastsimcoal2.7 using the options -n 500000 -m -M 0.001 -l 30 -L 60, as described in Armstrong et al. 80 We then selected the best model as the one with the lowest AIC (Data S1J). We estimated confidence intervals by bootstrapping: we randomly sampled 80% of the loci from the dataset 10 times and re-estimated the parameters for each replicate, with initial values declared using -initvalues and the fastsimcoal2.7 options -n 500000 -m -M 0.001 -l 30 -L 60.
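Two steps of this pipeline, projecting allele counts down to a smaller sample size and ranking models by AIC, can be illustrated in a few lines. This is a simplified sketch under stated assumptions rather than the easySFS or fastsimcoal2 code: easySFS projects a multidimensional SFS across all five populations, whereas the hypothetical project_site handles a single site in one population, and the -proj values above are haploid counts (twice the number of diploids). The hypothetical aic_from_log10_lik assumes the likelihoods are reported on a log10 scale, as fastsimcoal2 output usually is, and converts them to natural logs before applying AIC = 2k - 2 ln L.

```python
from __future__ import annotations

from math import comb, log


def project_site(k: int, n: int, m: int) -> list[float]:
    """Hypergeometric projection of one site from n to m haplotypes.

    k: derived-allele count among n called haplotypes.
    Returns P(j derived copies in a random subsample of m), for j = 0..m.
    """
    if not (0 <= k <= n and 0 < m <= n):
        raise ValueError("require 0 <= k <= n and 0 < m <= n")
    total = comb(n, m)
    # math.comb returns 0 when asked to choose more items than exist.
    return [comb(k, j) * comb(n - k, m - j) / total for j in range(m + 1)]


def fold(sfs: list[float]) -> list[float]:
    """Fold an unfolded SFS (classes 0..m) onto minor-allele classes 0..m//2."""
    m = len(sfs) - 1
    folded = [0.0] * (m // 2 + 1)
    for j, x in enumerate(sfs):
        folded[min(j, m - j)] += x
    return folded


def aic_from_log10_lik(log10_lik: float, n_params: int) -> float:
    """AIC = 2k - 2 ln(L), converting a log10 likelihood to a natural log."""
    return 2 * n_params - 2 * log(10) * log10_lik
```

For example, a site with 3 derived copies among 4 haplotypes, projected to 2 haplotypes, contributes 0.5 to the single-copy class and 0.5 to the class with both copies derived; after folding, that becomes 0.5 in minor-allele class 0 and 0.5 in class 1. Summing such per-site contributions gives an expected SFS for the smaller sample without discarding sites that have some missing calls.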
We also estimated divergence times between populations using hPSMC. We used a single individual selected at random from each population (Data S1A). For each pair of populations, we created a pseudo-diploid individual from the two samples for every scaffold except the sex chromosome scaffold (https://github.com/jacahill/hPSMC) and then ran PSMC with default settings. We plotted the results assuming a mutation rate of 5.3 × 10^-9 per base per generation 34 and a generation time of 31 years. Because the two haplotypes of the pseudo-diploid come from different populations, they cannot coalesce more recently than the population split; the time point at which the effective population size estimated from the pseudo-diploid rises exponentially therefore marks population divergence.

Elephant census population size
Elephant census population sizes were obtained from EPE (2017), Synchronized elephant population estimation India 2017, Project Elephant Division, Ministry of Environment, Forest and Climate Change, New Delhi. See Data S1J.
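The hPSMC procedure described above can be sketched in two parts: merging two pseudo-haploid sequences into a binned PSMC-style input, and rescaling PSMC output into years and effective population sizes with the mutation rate (5.3 × 10^-9 per base per generation) and generation time (31 years) given in the text. This is a hypothetical illustration, not the scripts in the hPSMC repository. It assumes haploid sequences of equal length with 'N' for missing data, 100-bp bins coded 'K' (at least one difference), 'T' (no difference), or 'N' (too much missing data), and a max_missing threshold chosen for illustration only. The hypothetical scale_psmc follows the usual PSMC convention that times are in units of 2N0 generations, relative sizes are in units of N0, and N0 = theta0 / (4 mu s) for bin size s.

```python
from __future__ import annotations


def pseudodiploid_psmcfa(hap1: str, hap2: str, bin_size: int = 100,
                         max_missing: float = 0.1) -> str:
    """Merge two pseudo-haploid sequences into a PSMC-style binned string.

    Each bin is 'K' if the haplotypes differ at any site called in both,
    'T' if they never differ, and 'N' if the fraction of sites missing in
    either haplotype exceeds max_missing (an illustrative threshold).
    """
    hap1, hap2 = hap1.upper(), hap2.upper()
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have equal length")
    bins = []
    for start in range(0, len(hap1), bin_size):
        b1, b2 = hap1[start:start + bin_size], hap2[start:start + bin_size]
        called = [(x, y) for x, y in zip(b1, b2) if x != "N" and y != "N"]
        if len(called) < (1 - max_missing) * len(b1):
            bins.append("N")
        elif any(x != y for x, y in called):
            bins.append("K")
        else:
            bins.append("T")
    return "".join(bins)


def scale_psmc(theta0: float, t_k: float, lambda_k: float, mu: float = 5.3e-9,
               gen_time: float = 31.0, bin_size: int = 100) -> tuple[float, float]:
    """Rescale one PSMC interval to (years ago, effective population size).

    Assumes the usual PSMC units: t_k in 2*N0 generations, lambda_k in N0,
    theta0 per bin, so N0 = theta0 / (4 * mu * bin_size).
    """
    n0 = theta0 / (4 * mu * bin_size)
    return 2 * n0 * t_k * gen_time, lambda_k * n0
```

With a generation time of 31 years, 100 generations correspond to 3,100 years and 40,000 generations to 1.24 million years, which brackets the time ranges quoted above for GONE and SMC++.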

Genetic diversity
Nucleotide diversity: pi
We randomly subsampled four individuals each from the NW&NE, CI, NPG, and SPG sample sets and retained the SSG dataset (n = 2) as is. We estimated the average number of pairwise differences per site for each population cluster as described in Wang et al. 87 Briefly, we ran ANGSD with -doSaf 1 to estimate site allele frequency likelihoods and then used realSFS to estimate a folded site frequency spectrum. We then ran ANGSD with -doThetas 1 to estimate pi per site and used thetaStat to summarize pi for each scaffold. We retained only the chromosomal scaffolds from this summary and divided the "tP" value by the number of sites for each scaffold to obtain the average pi per site. Finally, we tested whether the estimated values differed significantly between populations (see Data S1D).
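The final summarization step, dividing tP by the number of sites for each chromosomal scaffold, can be sketched as follows. This assumes the thetaStat per-chromosome summary is a tab-separated table whose header line starts with '#' and contains columns named Chr, tP, and nSites; these column names are an assumption to check against the header of the actual output. The function name and the example rows in the usage check are hypothetical.

```python
from __future__ import annotations

import csv


def pi_per_site_by_scaffold(pestpg_text: str, keep: set[str]) -> dict[str, float]:
    """Average pairwise nucleotide diversity per site for retained scaffolds.

    pestpg_text: tab-separated thetaStat output whose header line starts
    with '#' and includes 'Chr', 'tP' and 'nSites' columns (assumed layout).
    keep: scaffold names to retain, e.g. the chromosomal scaffolds only.
    """
    lines = pestpg_text.strip().splitlines()
    header = lines[0].lstrip("#").split("\t")
    col = {name: i for i, name in enumerate(header)}
    result: dict[str, float] = {}
    for row in csv.reader(lines[1:], delimiter="\t"):
        scaffold = row[col["Chr"]]
        n_sites = float(row[col["nSites"]])
        if scaffold in keep and n_sites > 0:
            # tP is pairwise theta summed over the scaffold's sites.
            result[scaffold] = float(row[col["tP"]]) / n_sites
    return result
```

Looking columns up by header name rather than by position keeps the sketch robust if the table contains additional estimators (for example Watterson's theta or Tajima's D) between the columns of interest.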

Heterozygous SNV encounter rate
We estimated the number of heterozygous SNPs for each individual using the vcfstats command of RTG Tools 75 (https://github.com/RealTimeGenomics/rtg-tools). We then divided the number of heterozygous sites by the total number of sites genotyped for that individual and multiplied by 10^6 to obtain the heterozygous SNV encounter rate per Mb.
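The encounter-rate calculation is simple arithmetic; the sketch below only makes the units explicit. The function name and the counts in the example are hypothetical. As described above, the denominator is the number of sites genotyped for the individual, not only the variant sites.

```python
def het_snv_rate_per_mb(n_het: int, n_genotyped: int) -> float:
    """Heterozygous SNVs per megabase of sequence genotyped in one individual."""
    if n_genotyped <= 0:
        raise ValueError("n_genotyped must be positive")
    # Scale to a per-Mb rate: (heterozygous sites / genotyped sites) * 10^6.
    return n_het * 1e6 / n_genotyped
```

For example, 1,500,000 heterozygous sites over 2,000,000,000 genotyped sites gives 750 heterozygous SNVs per Mb.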

Figure 1. Sampling locations and potential geographic barriers to elephant dispersal. SSG represents south of Shencottah Gap; SPG, south of Palghat Gap but north of Shencottah Gap; NPG, north of Palghat Gap; CI, the Central Indian landscape; NW, Northwestern India; and NE, Northeastern India. See Data S1A for more details.

Figure 2. Population structure based on 2.7 million SNPs. (A-C) PCA, (D) ADMIXTURE plot at K = 5, and (E) qpgraph with no admixture events, with branch lengths qualitatively adjusted to the drift parameter. NW&NE (n = 12, sky red) indicates samples from Northeastern India (n = 9) and Northern India (n = 3), CI (n = 5, dark blue) from Central India, NPG (n = 9, orange) from north of Palghat Gap, SPG (n = 4, green) from south of Palghat Gap but north of Shencottah Gap, and SSG (n = 2, yellow) from south of Shencottah Gap. See Figures S1-S4 and S6 for details on the evaluation of ADMIXTURE plots and population structure after LD pruning.

Figure 3. Demographic history. (A) Recent demographic history within the last 100 generations inferred from GONE. 28 The increase in Ne around 500 years ago is most likely an artifact. (B) Most likely demographic model obtained from fastsimcoal2. 31 (C) Population divergence estimated from hPSMC. 32 The point where population sizes start increasing exponentially in the hPSMC plot is the time of population divergence. Colors represent the same populations as earlier: northwest and northeast (sky blue), Central India (dark blue), north of Palghat Gap (orange), south of Palghat Gap (green), and south of Shencottah Gap (yellow). See Figures S5, S7, and S10 and Data S1B and S1C for details on population differentiation and gene flow.

Figure 4. Genetic diversity. (A) Boxplots of pairwise nucleotide differences per site (pi) in each population (error bars represent variance in mean pi/site across scaffolds) and (B) heterozygous SNV encounter rate per Mb of the genome. As earlier, colors represent northwest and northeast (red), Central India (dark blue), north of Palghat Gap (orange), south of Palghat Gap (green), and south of Shencottah Gap (yellow). See Data S1D and S1E for statistical significance.

Figure 5. Present and potential future inbreeding. (A) Inbreeding measured as FROH of individuals in each population, based on ROH longer than 0.1 Mb (mustard), 1 Mb (blue), and 10 Mb (pink). Error bars represent standard deviations. (B) Identical-by-descent (IBD) segments of the genome longer than 0.1 Mb vs. 10 Mb shared between pairs of individuals in each population. As earlier, colors represent northwest and northeast (red), Central India (dark blue), north of Palghat Gap (orange), south of Palghat Gap (green), and south of Shencottah Gap (yellow). See Data S1F-S1H for statistical significance.

Figure 6. Mutation load. Load measured as a function of homozygosity vs. number of derived (A) loss-of-function (LoF) mutations, (B) missense mutations, and (C) indel mutations. The number of derived deleterious alleles divided by the number of derived neutral alleles is used as a proxy for the number of deleterious alleles. Error bars represent standard deviations. Colors represent northwest and northeast (red), central (dark blue), north of Palghat Gap (orange), south of Palghat Gap (green), and south of Shencottah Gap (yellow). See Figures S8 and S9 for gene enrichment results.

Figure 7. Models tested in fastsimcoal2.7. (A) model_v2 and (B) model_v3, each tested under scenarios of recent divergence (less than 7,000 years ago) and old divergence (more than 17,500 years ago). See Data S1K for statistical significance.