Introduction

The island of Taiwan in Southeast Asia is located about 130 km from the Fujian Province of Mainland China. The two are separated by the Strait of Taiwan, a shallow body of water no more than 100 m deep, which turns into a land bridge during glacial epochs. Human presence in Southeast Asia dates back to at least 40,000 years ago (ya), and the Negrito indigenous groups currently living in the Philippine Islands, Peninsular Malaysia, and Andaman Islands are thought to represent the descendants of the pre-Austronesian out of Africa migration wave [1]. The first occupation of the island of Taiwan can be traced back to the Upper Paleolithic 50–10 kya [2]. Today the aboriginal people of Taiwan, who reached the island during the Holocene 8–6 kya, constitute about 2.3% (~530,000) of the population of the island and are subdivided into 16 nationally recognized groups that include the Ami and Yami tribes, the subjects of this investigation.

Many archaeological sites indicate affinities between southern China and Taiwan, suggesting the first settlers of Taiwan migrated from coastal continental Southeast Asia. For example, the anatomically modern human (AMH) fossil known as Zuozhen Man and artifacts unearthed from the Cailiaoxi River Valley in the southwestern plains of Formosa near Tainan City were dated to 20–30 kya and point to an occupation that extends to as recently as 7–6 kya [3]. Changbinian cultural sites in Eastern Taiwan dated between 15 and 5 kya associate the people of Fujian Province in coastal Mainland China with groups from Eluanbi in the southernmost end of the island and Baxiandong in the central east coast [4]. Cultural practices shared between Kweichow communities and ancestors of the Yüeh people of coastal Mainland China, including couvade, are still observed by several Taiwanese aboriginal tribes [5].

Archaeological evidence suggests that several independent migrations from several coastal locations of mainland Southeast Asia (MSEA) populated different regions of Taiwan [6]. It has been theorized that the Dapenkeng culture (TPK culture) and the Lapita Cultural Complex (LCC) had roots in farming communities of MSEA that subsequently dispersed rapidly to Taiwan and the Philippines [7]. Southeastern China has been identified as the potential source region of Neolithic crops transported overseas to Taiwan and Island Southeast Asia (ISEA) [8]. Today the aborigines of the island of Taiwan speak 9–12 languages that belong to the Formosan subdivision of Austronesian [9, 10], while the inhabitants of Orchid Island, the Yami, speak Yami, a member of the Batanic group of languages, which itself belongs to the Malayo-Polynesian branch of Austronesian that includes all Austronesian languages outside Taiwan. Batanic languages are also spoken in the northern Philippines [11].

These vernaculars exhibit considerable phonological and lexical diversity and all are mutually unintelligible [11]. In Indo-European terms, the diversity of primary Austronesian subgroups in the island is roughly equivalent to finding representatives of every branch of the Indo-European language family (i.e., Anatolian, Indo-Iranian, Greek, Italic, Celtic, Germanic, Armenian, Tocharian, Balto-Slavic, and Albanian) [12]. According to one of the phylolinguistic models, the ten independent subdivisions of the Austronesian family cannot be nested (do not cluster into clades) suggesting considerable time depth and early settlement of the island [13]. Other models provide phylogenies with various levels of linguistic affinities among some of the aboriginal populations [14, 15]. All the tribes are culturally distinct. These archaeological and linguistic data add support to the possibility that multiple migrations from MSEA populated Taiwan.

Genetically, all Taiwanese tribes are internally highly homogeneous but exhibit an unprecedentedly high degree of diversity among them [16, 17]. A recent Y-STR study examining the phylogenetic relationships of the main aboriginal populations from Taiwan indicates that the tribes partition into three clusters reflecting different locations within the island [16]. One grouping includes the Ami and Yami, who associate with populations from the Philippines, located within the putative Austronesian path to Oceania.

Subsequent to the seminal work of Melton and colleagues in 1998 [18], which implicated Taiwan as the source of the Austronesian dispersal, a number of studies contended that the genesis of Austronesians resided in MSEA, underscoring Taiwan as a layover into the Pacific [19,20,21,22,23,24] while others argued for a direct migration from MSEA into Oceania skipping the island altogether [25, 26] and others implicated Indonesia as the origin of Polynesians [27, 28]. According to the Out of Taiwan theory, Austronesians from Taiwan began to travel south about 5 kya [29]. Gray et al. have stipulated that Austronesians experienced a hiatus of 1–2 ky in Taiwan as they developed navigation technology capable of crossing the treacherous channel between Taiwan and the Batanes Islands of the Philippines [9]. A close relationship between the people of the Philippines and the Yami from Orchid Island based on Mitochondrial DNA (mtDNA) haplogroups B4a1a, E1a1a1 (formerly E1a1a), and M7c1c3 (formerly M7c3c) has been reported [30]. The Philippine Archipelago has been identified as the dispersal center of the Malayo-Polynesian branch of the Austronesian family of languages [31]. Following the incursion into the Philippines, it is theorized that Austronesians proceeded in a southeasterly direction. This path traces an insular arc that includes Melanesia and Micronesia, which according to the archaeological record were populated by Austronesians around the same time, about 3500 ya [32]. Uniparental and autosomal Single Nucleotide Polymorphism (SNP) evidence indicates that in Malaysia and western Indonesia, the Austronesians encountered the original inhabitants of the area and in eastern Indonesia they mixed with Papuans/Melanesians allowing for some degree of bidirectional gene flow [33]. 
These migrants then progressed along the northern coast of New Guinea and then in an easterly direction, successively colonizing the islands of the Bismarck, Solomon, Santa Cruz, and Vanuatu Archipelagos [34]. The Austronesians then colonized the Tonga Archipelago around 3.3 kya and the Samoan Islands around 3 kya [35,36,37]. According to radiocarbon dating of archaeological sites in the Society Islands of East Polynesia, the archipelago was settled approximately 1 kya [38, 39], the Marquesas 830–730 ya and Rapa Nui (Easter Island) about 820 ya, Hawaii 800–850 ya, and New Zealand around 740 ya [38, 40]. In recent years, several studies have reported on the genetics of SEA populations, setting the stage for comparative research relating to individual Taiwanese aboriginal populations [22, 41,42,43,44,45,46].

In the present study, the Ami and Yami aboriginal populations of Taiwan, previously identified within the putative Austronesian path to Oceania [16], are examined and compared to key targeted reference populations from the literature utilizing autosomal, mtDNA, and Y-specific markers. We theorized that the data generated would provide information on the trajectory taken by Austronesians in SEA as they colonized Oceania (see Materials and Methods in Supplementary Material).

Results

Y-SNPs and haplogroup distribution

Supplementary Tables 3 and 4 provide the Y-SNP genotype and the assigned haplogroup for every individual of the Ami and Yami populations, respectively. Supplementary Fig. 3 illustrates the phylogeny of the relevant Y chromosome haplogroups and the position of the markers that were genotyped on the branches of the phylogeny. Haplogroup diversity values were calculated and are provided in Supplementary Fig. 3. In the Ami and Yami, all individuals belong to haplogroup O-M175 sub-clades apart from one P1-M45 in Yami. The level of genetic diversity within haplogroup O is greater in the Ami, which have sub-lineages from both O1-M119 and O3-P201 with O1 Y chromosomes mainly belonging to O1a-F31 and O1c-M110. The Yami population displays O1a-F31 and O1b-F819, the latter virtually missing from the Ami population. The single O3 in our Yami sample set belongs to the sister-branch of O3-P201 defined by marker KL1. The Y-SNP genotype data were analyzed together with comparable data from pertinent key reference populations from MSEA, ISEA, and Oceania (Supplementary Table 5) to generate Principal Component Analysis (PCA) plots. One PCA was generated using individual reference populations (Supplementary Fig. 4A) while a second one utilized populations grouped into geographical regions (Supplementary Fig. 4B). The partitioning of populations in Supplementary Fig. 4A delineates a “V”-like distribution with the Ami and Yami groups at the lower end near the Y-axis and in proximity to groups from Taiwan reported in a previous study [19]. Emanating from this cluster, a number of populations from the Philippines, Malaysia, Indonesia, and MSEA segregate sequentially into quadrants 3, 4, and 1. The PCA based on regional groups (Supplementary Fig. 4B) similarly illustrates progressive partitioning from the Ami and Yami of Taiwan to the Philippines, Indonesia, Near Oceania, and Polynesia.

mtDNA

Although the mtDNA diversity in both Ami and Yami populations is limited compared to the MSEA, MSEA-dominating haplogroups including B, F, and M7 are also prominent in the two Taiwanese aboriginal tribes (Supplementary Table 6). Also, typical of ISEA, haplogroup E, nearly absent in MSEA, is present in the Ami (13.8%) and Yami (1.3%). The B4a haplogroup, a precursor of the Taiwanese/Austronesian-specific B4a1a haplogroup and the Polynesia-specific B4a1a1a haplogroup (Polynesia Motif), is an abundant haplogroup in the Ami (20.5%) and Yami (24.0%). The frequency of B4a1a is 16% in the Ami and 4% in the Yami individuals. M7 and F are the other two prominent haplogroups in the Ami (18.2% and 20.5%, respectively) and Yami (36.0% and 20.0%, respectively).

Supplementary Table 7 provides the mtDNA genotype data and the assigned haplogroup of every individual of the Ami and Yami populations. The genotype data of Ami and Yami individuals were analyzed with genotype data from pertinent reference populations from MSEA, ISEA, and Oceania (Supplementary Table 8) to generate PCA plots. One PCA was generated using individual populations (Supplementary Fig. 5A) while a second one used grouped populations from specific geographical regions (Supplementary Fig. 5B). In Supplementary Fig. 5A, the Ami population plots within a compact cluster composed mainly of Taiwanese and Filipino groups. The Yami plots at the center of this aggregation. From this cluster, a number of Filipino and Indonesian groups radiate into quadrant 3 followed by Polynesian populations from the Society and Marquesas Archipelagos of East Polynesia and then by populations from Near Oceania and Fiji. The MSEA groups from Thailand and Myanmar are scattered in quadrant 4 of the projection. A parallelism exists with the segregation observed in the Y-SNP plot (Supplementary Fig. 4A). The data from both of these uniparental marker systems signal a close genetic affinity among Taiwanese aboriginal and Filipino populations as well as gradual weakening of this phylogenetic closeness as a function of geographical distance from Taiwan in the direction of Indonesia, Oceania, and Polynesia. Both Y-SNP and mtDNA data indicate that the populations from MSEA partition distant from those of ISEA and Oceania. The mtDNA PCA based on regional groups of populations (Supplementary Fig. 5B) exhibits the Ami clustering with the Ami reported in a previous study, the merger of Taiwanese tribes and the total of all Filipino populations. The Yami plots in association with the Indonesian group of populations mainly from Sumatra (Indonesia-total). As with the mtDNA PCA generated using individual populations (Supplementary Fig. 5A) and Y-SNP projections with individual (Supplementary Fig. 4A) and regional groups of populations (Supplementary Fig. 4B), groups from Polynesia and Oceania follow in a progression into quadrant 3 with New Britain at the end (Supplementary Fig. 5B). Also, in this projection, the groups from MSEA segregate distantly to the right in quadrant 4 as outliers.

Complete mtDNA sequences of 15 Ami and 18 Yami individuals were obtained. Phylogenetic trees of superhaplogroups B4a/B4g (Supplementary Fig. 6), B4b (Supplementary Fig. 7), B4c (Supplementary Fig. 8), B5 (Supplementary Fig. 9), F1 (Supplementary Fig. 10), and F4/F3 (Supplementary Fig. 11) were constructed with the complete mtDNA sequences generated in this study and previously published reference populations from Taiwan aboriginals, MSEA, Philippines, Indonesia, and the Society Islands (Eastern Polynesia). The complete mtDNA sequences were used to assess the degree of differentiation and genetic homogeneity among individuals. Based on the complete mtDNA sequences, no identical mitochondrial genomes were detected among Ami individuals, while among the Yami only two mtDNA sequences were found in single individuals; all others were shared by at least two individuals. The following haplogroups were found in two to four individuals: B4a2a1 (n = 4), M7c1c3a1 (n = 3), F1a1d1 (n = 3), B4c1b2a2a (n = 2), B4b1a2b2 (n = 2), B4a1a4 (n = 2). Based on these data, haplotype diversity was calculated to be 0.9020 ± 0.0374 for the Yami and 1.0 ± 0.0243 for the Ami (P = 0.0470). The last doubleton shares the Austronesian-specific motif (B4a1a) with all Society Islands individuals from Eastern Polynesia. Comparisons of the complete mtDNA genomes indicate close sequence similarities between the Ami/Yami and populations from the Philippines. Identical mtDNA genomes were detected between a number of Ami/Yami and Filipino samples in branches B4a1a (Supplementary Fig. 6), B4c1b2a2a (Supplementary Fig. 8), F1a4a1, F1a4a1 + 8757, and F1a3a3 (the last three in Supplementary Fig. 10). No identical or nearly identical mtDNA genomes were shared between Ami/Yami individuals and Mainland Chinese samples.
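The haplotype diversity values above can be checked with a short calculation. The sketch below assumes Nei's unbiased estimator, H = n/(n − 1) · (1 − Σ p_i²), and assumes that each haplogroup listed as shared corresponds to one shared haplotype. The function name and the count lists are illustrative; the text does not state which estimator or software the authors used.

```python
def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from per-haplotype copy counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Yami: 18 sequences; shared haplotypes with 4, 3, 3, 2, 2, 2 copies
# plus two singletons (assumed mapping of haplogroups to haplotypes).
yami_counts = [4, 3, 3, 2, 2, 2, 1, 1]
# Ami: 15 sequences, no two identical.
ami_counts = [1] * 15

print(round(haplotype_diversity(yami_counts), 4))
print(round(haplotype_diversity(ami_counts), 4))
```

Under these assumptions the Yami value comes to about 0.902 and the Ami value to 1.0, in line with the reported estimates. The standard errors and the P value are not reproduced here.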

Bayesian skyline plots (BSPs) revealed that the Ami (Supplementary Fig. 12) and Yami (Supplementary Fig. 13) populations underwent bottleneck events at different times. The Ami began to decrease in effective population size at about 8 kya. The population reached its lowest point, below one-third of the previous stable effective population size, around 1500 ya and then started to increase. The Yami, on the other hand, began to decrease in effective population size around 6 kya, reaching its lowest point, below one-tenth of the previous level, around 500 ya. Subsequently, the effective population size started to increase.

Autosomal SNPs

PCAs

The first two components in the PCA plot partition the Ami and Yami individuals in close proximity at the extreme right of the projection (Supplementary Fig. 14A). An expansion of this area is provided in Supplementary Fig. 14B. People from the Igorot population of the Philippines are embedded within this tight Ami/Yami cluster. From this threesome clutch of populations, individuals from the rest of the populations stream out in the form of five region-specific offshoots. From bottom to top, the first stream of people in yellow is entirely made up of individuals from China; in dark blue, individuals from MSEA; in dark green, the Malay Peninsula populations; in orange, groups from the Philippines; in olive green and light blue, people from Bajo in Indonesia and Polynesia, respectively. Most of the separation among individuals within these region-specific offshoots occurs along PC1. The population composition of each stream mostly corresponds to their geographical origins (e.g., the Philippines stream is made up entirely of Filipino individuals). Rarely an individual segregates outside its region-specific elongated cluster. The PC1/PC3 plot provided in Supplementary Fig. 15A parallels the topology observed in the PC1/PC2 projection. The same trio of Ami, Yami, and Igorot (Philippines) individuals followed by five streams of regional populations are observed (see expansion of the plot in Supplementary Fig. 15B). Individuals from Melanesia segregate distantly from the Taiwanese aborigines.

Structure analyses

To shed light on the genetic composition of the Ami and Yami in the context of SEA and Pacific Ocean populations, structure analyses were performed with the autosomal SNP data. Population substructuring was explored at K = 2–18 (Fig. 1A). At the most informative K values (K = 13 and K = 14) distinctive substructuring patterns are observed between the Ami and Yami populations. The former exhibits a substructure composition reminiscent of several Filipino and Borneo populations while the Yami illustrates a genetically highly homogeneous profile. The Yami structure patterns at K = 2–13 are essentially the same as the Igorot’s (Fig. 1A, B). In both populations, all individuals are genetically uniform, exhibiting only one component. It is only at K = 14–18 that the Yami and Igorot adopt unique uniform colors (Fig. 1A, C); at K = 13, both Yami and Igorot exhibit a uniform light magenta color (Fig. 1B) while at K = 14 Igorot retains the light magenta but the Yami changes to a uniform brown color (Fig. 1C). The Ami substructuring at all K values is similar to the pattern seen in the Filipino populations of Casiguran and Zambale from central coastal Luzon, Philippines, and the Kayah-Lebbo, Murut, and Dusun from west coast Borneo. The Ami, Filipino, and Borneo populations mentioned above share similar color patterns including light yellow, aquamarine, brown, light magenta, and orange components, although in different proportions (Fig. 1A).

Fig. 1: Structure analysis.

A Structure analysis based on autosomal SNP loci. Best runs of 100 replicates at K = 2–18. Please refer to Supplementary Table 1 for specific populations examined within each region. K values are indicated on the Y-axis. B Structure analysis of populations at K = 13. Please refer to Supplementary Table 1 for specific populations examined within each region. K values are indicated on the Y-axis. C Structure analysis of populations at K = 14. Please refer to Supplementary Table 1 for specific populations examined within each region. K values are indicated on the Y-axis.

Admixture, genetic affinities, and average heterozygosity

The f3 results do not show evidence of an admixture event between reference populations leading to the genesis of the Ami (Supplementary Table 9) or Yami population (Supplementary Table 10). To see if the Ami and Yami always form a clade with respect to reference populations, f4 analysis was performed (Supplementary Table 11a). When f4 analysis (of the form “Ami, Yami; Mongolia, Yoruba”) was performed, Ami shared a borderline significant positive value with Mongolia. When the order was changed to “Mongolia, Yami, Ami, Yoruba” to check this suspicious result, Mongolia’s f4 value was significantly negative (not different from other reference populations) (Supplementary Table 11b). This suggests that the Ami and Yami are equally related to all the test populations (neither the Ami nor the Yami shares excess ancestry with any of the reference populations). The outgroup f3 analyses show that the most genetically similar populations to the Ami and Yami are from northern Philippines (North Luzon) (Supplementary Tables 12 and 13, respectively). These include the Igorot, Ilocano, Pangasinan, Casiguran, and Zambale, among others. Other groups from northern Borneo such as the Murut and Dusun appear within the populations of close genetic affinity to the Ami and Yami.
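As a reminder of what these statistics measure, the following minimal sketch computes f4 and f3 as plain averages of allele frequency products. It omits the finite-sample bias correction and the block-jackknife standard errors that dedicated software normally applies, and it is not the authors' pipeline. All population names and frequencies are hypothetical.

```python
from statistics import mean


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return mean((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))


def f3(target, x, y):
    """f3(Target; X, Y): mean over SNPs of (t - x) * (t - y), uncorrected."""
    return mean((t - xi) * (t - yi) for t, xi, yi in zip(target, x, y))


# Hypothetical allele frequencies at five SNPs (not data from this study).
pop_a = [0.10, 0.40, 0.75, 0.20, 0.55]
pop_b = [0.12, 0.38, 0.70, 0.25, 0.50]
test_pop = [0.30, 0.35, 0.60, 0.40, 0.45]
outgroup = [0.80, 0.10, 0.20, 0.90, 0.05]

# A value near zero is consistent with A and B forming a clade
# relative to the test population and the outgroup.
print(f4(pop_a, pop_b, test_pop, outgroup))

# Outgroup f3: larger values indicate more drift shared by A and B.
print(f3(outgroup, pop_a, pop_b))
```

In the admixture form f3(Target; X, Y), a significantly negative value is the signal of admixture; the absence of such values is what the first sentence of the paragraph above reports.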

The average heterozygosity of the Ami and Yami ranks among the lowest among the populations examined (Supplementary Table 14). The Yami exhibits a level of heterozygosity comparable to inbred groups such as the Jarawa and Onge from the Andaman Archipelago.
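For orientation, average heterozygosity can be illustrated as the mean fraction of heterozygous genotype calls per individual. The sketch below assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls. The function name and the toy genotypes are hypothetical, and the authors may have used a different estimator, such as expected heterozygosity from allele frequencies.

```python
def average_heterozygosity(genotypes):
    """Mean per-individual fraction of heterozygous calls.

    genotypes: list of per-individual lists coded 0, 1, 2
    (alternate-allele count); None marks a missing call.
    """
    per_individual = []
    for calls in genotypes:
        observed = [g for g in calls if g is not None]
        if observed:
            per_individual.append(sum(1 for g in observed if g == 1) / len(observed))
    if not per_individual:
        raise ValueError("no genotype calls available")
    return sum(per_individual) / len(per_individual)


# Hypothetical genotypes: three individuals, six SNPs each.
homogeneous_group = [[0, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 0], [0, 0, 2, 2, None, 0]]
diverse_group = [[0, 1, 2, 1, 0, 1], [1, 1, 2, 0, 1, 0], [1, 0, 1, 2, 1, 1]]

print(average_heterozygosity(homogeneous_group))
print(average_heterozygosity(diverse_group))
```

A drifted or inbred group carries mostly homozygous calls and therefore scores lower, which is the pattern described for the Yami.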

Haplotype-based analyses

The degree of sharing of DNA segments between Ami/Yami individuals and individuals from the reference populations was ascertained. The observed patterns of short and long IBD (identical by descent) segment sharing with the reference populations are graphed in Supplementary Fig. 16A for the Ami and Supplementary Fig. 16B for the Yami; 95% confidence intervals are provided to evaluate statistical significance. In reference to the Ami, the population with the highest median IBD count is the Yami from Orchid Island followed by the Igorot from the Philippines. Next in median IBD counts is a group of populations from Luzon (Ilocano, Casiguran, and Zambale), the northernmost island of the Philippines. A second group that includes populations from the Philippines, Borneo, and Sumatra exhibits IBD counts lower than the previously mentioned Luzon group. East Polynesians from the Society Islands rank among the populations from this last group with IBD long segments in the range of 4–6 median counts. Other populations exhibiting smaller numbers of IBD median counts include Malaysia, Sumatra, Sulawesi (Indonesia), Papuans, Aboriginal Australians, and Melanesians. Generally, the same populations that possess high to moderate levels of median counts of long IBDs also exhibit appreciable short median IBD counts.

In relation to the Yami, the population with the highest median IBD count is the Ilocano from the northern tip of the island of Luzon. The Ami and Igorot follow in IBD counts. As with the Ami, the Yami share moderate levels of median counts (~9 median counts) with a group of Filipino populations (Visayan, Casiguran, and Zambale). A second group exhibiting lower IBD median counts (4–8 median counts) includes populations from the Philippines, Borneo, Sumatra, and Sulawesi. Polynesians rank among the populations from this last group with long IBD segments in the range of 5–7 median counts. At a lower tier of counts (1–3 counts), there are a number of populations from Malaysia, Sumatra, Sulawesi (Indonesia), Papuans, Aboriginal Australians, and Melanesians. For the most part, populations that exhibit high to moderate levels of long median IBD counts also exhibit appreciable short median IBD counts.

Supplementary Table 15 provides the average length of IBD tracts shared between the Ami or Yami and each of the reference populations as well as the number of tracts the averaging is based on. As expected, the comparison of the Ami to itself generates the highest average IBD length and number of tracts. Next the Papuans, Jarawa, and Kosipe follow with high average IBD lengths but low numbers of tracts (fewer than 100). Thereafter, Polynesians exhibit the next highest average IBD length followed by the Aeta of Luzon, the Yami, and Melanesians. Then, a number of ISEA populations from Sumatra (Batak), Luzon (Igorot), Borneo (Dusun), and southern Philippines (Tagbanua) rank high in IBD length and number of tracts. The average lengths of IBD tracts shared between the Ami and the reference populations are graphically illustrated in the contour map of Fig. 2A. The contour map indicates a distribution arc that delineates a cline emanating in Taiwan, decreasing first in a southern direction and then turning eastward. This crescent-shaped gradient based on average IBD size first follows a southern direction into the Philippine Archipelago, coastal Borneo, and then turns east toward Sulawesi in Indonesia and Bougainville in Melanesia. Noteworthy is the elevated IBD tract size shared between the Ami and the Society Archipelago in East Polynesia, approximately 10,700 km from each other. In fact, the Ami shares larger average IBD tracts with the East Polynesians than with the Yami (p < 0.0001), the aboriginal Taiwanese tribe from Orchid Island about 90 km away (Supplementary Table 15). Both Yami (6478 copies in common with Ami) and East Polynesia (4644 copies in common with Ami) exhibit a high number of shared copies. A second arc points westward from the Philippines into the Malay Peninsula and then into Sumatra. 
In sharp contrast, the average lengths of the IBD segments shared between the Ami as well as the Yami and the MSEA populations are much shorter than the tracts shared with the populations within the arc in ISEA and Oceania (p < 0.0001 for both Ami and Yami) (Supplementary Table 15 and Fig. 2A).
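The two quantities reported per reference population, average IBD tract length and the number of tracts behind that average, can be illustrated with a small grouping routine. This is a sketch under stated assumptions: the input format, the function name, the population labels, and the segment lengths are all hypothetical, and the text does not specify the IBD detection software or the length units.

```python
from collections import defaultdict
from statistics import mean


def summarize_ibd(segments):
    """Map each reference population to (mean tract length, tract count).

    segments: iterable of (reference_population, segment_length) pairs,
    one entry per IBD tract shared with the focal population.
    """
    by_population = defaultdict(list)
    for population, length in segments:
        by_population[population].append(length)
    return {
        population: (mean(lengths), len(lengths))
        for population, lengths in by_population.items()
    }


# Hypothetical tracts shared with a focal population (arbitrary units).
toy_segments = [
    ("RefPop_A", 2.1), ("RefPop_A", 2.4), ("RefPop_A", 1.9), ("RefPop_A", 2.2),
    ("RefPop_B", 3.8), ("RefPop_B", 4.0),
    ("RefPop_C", 1.1), ("RefPop_C", 0.9), ("RefPop_C", 1.0),
]

for population, (avg_length, n_tracts) in summarize_ibd(toy_segments).items():
    print(population, round(avg_length, 2), n_tracts)
```

Keeping the tract count next to the mean matters when reading the results: a population can show a high average length on the basis of very few tracts, as noted above for the Papuans, Jarawa, and Kosipe.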

Fig. 2: Contour analyses.

A IBD sharing between SouthEast Asian populations with the Ami. Intensity scale located on the right-hand side of the contour map indicates average shared IBD sizes. Red points indicate location of populations. X-axis indicates longitude degrees while Y-axis indicates latitude degrees. Box on the lower right-hand corner indicates location of the Society Archipelago in French Polynesia. B IBD sharing between SouthEast Asian populations with the Yami. Intensity scale located on the right-hand side of the contour map indicates average shared IBD sizes. Red points indicate location of populations. X-axis indicates longitude degrees while Y-axis indicates latitude degrees. Box on the lower right-hand corner indicates location of the Society Archipelago in French Polynesia.

The IBD sharing contour map based on comparisons with the Yami tribe differs from the Ami-based projection (Supplementary Table 15 and Fig. 2B). As expected, the Yami exhibits a strong signal of IBD length sharing in the contour map when the Yami is compared to itself. Yet, the distribution arc evident in the Ami-based contour is not observed when the Yami is used as the point of reference. Specifically, the IBD sharing observed with Northern Filipino (Luzon) populations as well as the cline toward Southern Philippines, Borneo, Indonesia, and Melanesia is not seen when the Yami are used for comparison. Yet, it is of interest that the East Polynesian populations of the Society Islands exhibit larger average IBD lengths sharing with the Yami than the average sharing size shared between the Yami and Ami (p < 0.0001), geographically 90 km apart.

The total genome length shared between Ami (Supplementary Fig. 17) or Yami (Supplementary Fig. 18) individuals and individuals from each reference population was averaged over the number of corresponding pairs (the number of pairs is marked in brackets in each figure). In general, the total length of IBD segments shared between the Ami/Yami and the reference populations mirrors the hierarchy observed in the median IBD count values, with Igorot exhibiting the largest size shared followed by a group of Filipino populations including Pangasinense, Ilocano, and Visayan. As with the median IBD counts, Polynesians from the Society Archipelago share moderate IBD length segments with Ami and Yami. Both Ami and Yami exhibit less total genome length sharing with MSEA populations than with the ISEA (p < 0.0001 for Ami and Yami).

Discussion

Both the Ami and Yami exhibit limited average heterozygosity relative to most of the reference populations. The results based on the structure analyses and complete mtDNA sequences indicate that the Yami are genetically extremely homogeneous. This is reflected in the low haplogroup and haplotype diversity of this population with all but two complete mtDNA sequences present in multiple copies. This internal genetic homogeneity has been previously reported for the Yami [21] and Ami [17]. The limited genetic diversity that is seen in the Ami and Yami today may have derived from the bottleneck episodes detected by the BSP analyses (Supplementary Figs. 12 and 13) about 8 kya and 6 kya, respectively, as populations migrated from MSEA during the Chinese Agricultural Revolution (shift to agriculture from hunting and gathering) [47, 48] in search of land [49, 50]. It is possible that bottleneck events may have impacted genetic signals from MSEA. As previously reported using mtDNA markers for Taiwanese tribes [51], we detected population expansions in the Ami and Yami subsequent to the bottleneck events. The Yami likely acquired and maintained a high level of genetic homogeneity as a result of strong genetic drift and intense inbreeding, as the population is made up of only about 3100 individuals living in relative isolation on the tiny Orchid Island (45 km²). The genetic drift evident in the structure analysis (Fig. 1A–C) for the Yami and Igorot has been previously observed in a number of ISEA populations [52].

One widespread theme that emerges from most of the results generated in this study is a delineation of the route that Taiwanese aboriginal migrants may have taken into ISEA and Near Oceania. Examination of the topology of the PCAs based on Y-SNP and mtDNA loci illustrates a sequential distribution of populations stemming from Taiwan into the Philippine Archipelago, Indonesia, and Polynesia. The partitioning of populations within the PCAs based on autosomal SNPs, the profile constitution observed in the structure analyses demonstrating parallel patterns among specific populations, the average IBD tract length gradients observed in the contour maps, the total length of IBD segments shared between the Ami/Yami and the reference populations, the outgroup f3 results, and the identical mtDNA genomes linking the Ami/Yami and Filipino groups all indicate genetic affinities among populations that trace a geographical arc from Taiwan down into the Philippine Archipelago, Borneo, Indonesia, Melanesia, and Polynesia. This migration trajectory has been delineated by previous studies [13, 16,17,18, 51], although a number of publications posit more complex scenarios involving various source populations, dispersal routes, and directions [19,20,21,22,23,24]. Along these lines, mtDNA studies have reported that only about 20% of lineages in the contemporary ISEA gene pool result from the out-of-Taiwan dispersal [53].

Several lines of evidence from the present study indicate that this putative crescent-shaped route is anchored at one end by a trio of populations including the Ami, Yami, and Igorot. As previously reported by a number of studies using autosomal and uniparental markers [21, 54], our data from all the analyses performed show strong genetic affinities between Taiwanese tribes and various Northern Filipino populations. These genetic affinities are particularly evident between the Ami/Yami and the Igorots in the PCAs based on autosomal SNPs and structure analyses. The Igorots of the Philippine northern Luzon highlands are Austronesians and are thought to have migrated from Taiwan as part of the diffusion of the cord-marked and red-slipped pottery traditions [52, 55].

In the genome-wide PCAs (i.e., PC1/PC2 and PC1/PC3), the Ami, Yami, and Igorot individuals segregate at one end of the plots in similar tight conglomerates. From these compact clusters, both PC1/PC2 and PC1/PC3 graphs exhibit a topology reminiscent of five projections. These five elongated clusters are population- or region-specific (Supplementary Figs. 14B and 15B). This type of differentiation suggests limited gene flow among populations or regions subsequent to settlement.

Sharply contrasting the demic connections between the Ami/Yami, ISEA, and Oceania, the genetic affinities with populations of MSEA are more distant. Although some individuals from MSEA (e.g., Vietnam and Laos, and Han Chinese) are genetically close to the Ami and Yami in the PCAs based on autosomal data (e.g., Supplementary Fig. 14A, B), the more distant genetic affinity with MSEA is visualized in the distant partitioning of the continental populations of China, Thailand, Cambodia, Laos, and Vietnam from the Ami and Yami seen in the PCA plots based on Y-SNPs (Supplementary Fig. 4A) and mtDNA haplogroups (Supplementary Fig. 5A) (see Supplementary Tables 16–19 for genetic distances and statistical significance values) and the absence of strong sequence homology as seen in the total mtDNA genomes (Supplementary Figs. 6–11).

We tested the statistical significance of the differences in genetic distances of the mtDNA and Y chromosome data of the Ami and Yami in relation to ISEA and MSEA (Supplementary Figs. 4A, B and 5A, B). Supplementary Tables 16–19 provide the genetic distances and statistical significance values. This comparison was done using the Ami and Yami populations separately as well as combined. The statistical significance (t-test) of the genetic distance differences was calculated. In all cases, the genetic distances between the Ami and Yami (combined and separately) were greater in relation to MSEA than to ISEA. And in all cases, the genetic distances of the Ami and Yami to ISEA populations are statistically smaller than those to MSEA populations, except for the differences involving the Ami using mtDNA markers based on groups of populations (P = 0.051) as well as Y-chromosome markers based on individual populations and groups of populations (Supplementary Tables 16–19). In addition, we compared the distances of the Ami as opposed to the Yami in relation to MSEA as well as the genetic distances of the Ami as opposed to the Yami in relation to ISEA. In all cases, the Yami exhibits smaller genetic distances compared to the Ami in relation to MSEA and greater genetic distances compared to the Ami in relation to ISEA. Most of the differences between the two aboriginal groups are statistically insignificant (Supplementary Tables 16–19). Greater sharing of haplotypes between Austronesian populations with ISEA compared to MSEA has been previously observed [52]. This more distant kinship with MSEA argues against strong genetic contribution from the mainland to the Austronesian dispersal. In the Ami IBD contour map (Fig. 2A), the clinal genetic affinities take the form of an arc. In this arc, the examined genetic parameters suggest a general gradual decline in phylogenetic closeness as a function of geographical distance from Taiwan.
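The comparison of distances to MSEA versus ISEA can be sketched as a two-sample t-test on two sets of pairwise genetic distances. The text says only "t-test"; the unequal-variance (Welch) form used below is an assumption, and the function name and distance values are hypothetical. The sketch returns the t statistic and the Welch–Satterthwaite degrees of freedom; obtaining a P value would additionally need the t distribution, which is left out to keep the example to the standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df


# Hypothetical genetic distances from one focal population
# to mainland and island reference populations.
dist_to_msea = [0.21, 0.25, 0.19, 0.23]
dist_to_isea = [0.10, 0.12, 0.08, 0.11, 0.09]

t_stat, dof = welch_t(dist_to_msea, dist_to_isea)
print(round(t_stat, 2), round(dof, 1))
```

A positive t statistic here corresponds to the direction reported above, with larger distances to MSEA than to ISEA.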

Noteworthy in our analyses is the relatively long average length of the IBD segments shared between the Ami/Yami and the East Polynesians of the Society Archipelago. The sharing of such long tracts and the large number of segments in common stand out considering that the Taiwanese aborigines and the Society Archipelago are about 10,700 km (straight-line distance) apart. This genetic affinity is also reflected in the prominent component shared by the Ami/Yami and East Polynesians at K = 2–10 of the structure analyses. This spike in genetic affinity between the two aboriginal tribes and Polynesia likely results from the settlement of uninhabited lands as opposed to previously populated regions. Also, the number of tracts on which the average is based is about one order of magnitude lower for sharing between the Ami or Yami and Melanesia than for sharing between the Ami or Yami and Polynesia (p < 0.0001 in both comparisons). This difference may reflect the limited time Austronesians spent traveling through Papua/New Guinea and the limited amount of gene flow that might have occurred between the Austronesian migrants and native Melanesians. This notion is consistent with a number of previous studies that suggest a fast-track route along the north coast of New Guinea [27, 42]. Ancient DNA from Vanuatu and Tonga, which lacks Melanesian ancestry, also indicates that the Remote Oceanian pioneers swept past the archipelagos that surround New Guinea without much mating with local people [56].

Conclusion

This article reports on the genetic characteristics of the Ami and Yami, two aboriginal populations of Taiwan, and their phylogenetic relationships to groups from MSEA, ISEA, and Oceania. The results from autosomal SNPs, structure analyses demonstrating parallel component patterns among specific populations, average IBD tract length gradients, total length of genome shared, outgroup f3 data, and identical mtDNA genomes all indicate genetic affinities among populations that trace a geographical arc from Taiwan down into the Philippine Archipelago, Borneo, Indonesia, and Melanesia. In this arc, the examined genetic parameters suggest a general gradual decline in phylogenetic closeness as a function of geographical distance from Taiwan. The relatively long average length of the IBD segments shared between the Ami/Yami and the East Polynesians of the Society Archipelago, which exceeds that of the segments shared between the Ami and the Yami, two neighboring aboriginal Taiwanese tribes, suggests recent contact between the two regions. Both Taiwanese tribal populations exhibit overall limited genetic heterogeneity. The Yami exhibit little sub-structuring and likely reached this level of genetic homogeneity as a result of a dramatic bottleneck event about 6000 ya that occurred during the original colonization of Orchid Island, followed by drift and/or intense inbreeding.