Contrasting Paternal and Maternal Genetic Histories of Thai and Lao Populations

Abstract The human demographic history of Mainland Southeast Asia (MSEA) has not been well studied; in particular, there have been very few sequence-based studies of variation in the male-specific portions of the Y chromosome (MSY). Here, we report new MSY sequences of ∼2.3 mB from 914 males and combine these with previous data for a total of 928 MSY sequences belonging to 59 populations from Thailand and Laos who speak languages belonging to three major Mainland Southeast Asia families: Austroasiatic, Tai-Kadai, and Sino-Tibetan. Among the 92 MSY haplogroups, two main MSY lineages (O1b1a1a* [O-M95*] and O2a* [O-M324*]) contribute substantially to the paternal genetic makeup of Thailand and Laos. We also analyze complete mitochondrial DNA genome sequences published previously from the same groups and find contrasting pattern of male and female genetic variation and demographic expansions, especially for the hill tribes, Mon, and some major Thai groups. In particular, we detect an effect of postmarital residence pattern on genetic diversity in patrilocal versus matrilocal groups. Additionally, both male and female demographic expansions were observed during the early Mesolithic (∼10 ka), with two later major male-specific expansions during the Neolithic period (∼4–5 ka) and the Bronze/Iron Age (∼2.0–2.5 ka). These two later expansions are characteristic of the modern Austroasiatic and Tai-Kadai groups, respectively, consistent with recent ancient DNA studies. We simulate MSY data based on three demographic models (continuous migration, demic diffusion, and cultural diffusion) of major Thai groups and find different results from mitochondrial DNA simulations, supporting contrasting male and female genetic histories.


Introduction
Thailand and Laos occupy a key location in the center of Mainland Southeast Asia (MSEA; fig. 1), which is undoubtedly one of the factors facilitating the extensive ethnolinguistic diversity, as there are 68 recognized groups in Thailand and 82 groups in Laos, belonging to five language families (Simons and Fennig 2018). The prehistoric peopling of the area of present-day Thailand and Laos has been documented by several archaeological studies (Shoocongdej 2006;Demeter et al. 2012;Higham 2014Higham , 2017 and investigated further by recent ancient DNA studies (Lipson et al. 2018;McColl et al. 2018). The earliest presence of modern humans in SEA is dated to $50 ka (Higham 2013;Bae et al. 2017), followed by Paleolithic migration to East Asia $30 ka, inferred from genetic data (Yan et al. 2014;Hallast et al. 2015). There was also an expansion of Neolithic farmers and Bronze Age migrations from southern China to MSEA, which contributed to the present-day gene pool of modern MSEA people, for example, Thais and Laotians (Higham 2014(Higham , 2017Lipson et al. 2018;McColl et al. 2018). Additional migrations during the historical period from neighboring countries (Penth 2000;Schliesinger 2000) have further enhanced ethnolinguistic diversity.
The census size for Thailand was $68.41 million in 2017 and for Laos was $6.76 million in 2016 (Simons and Fennig 2018). There are five linguistic families distributed in these two countries. Although the Tai-Kadai (TK) language is widely spread in southern China and MSEA, it is concentrated in present-day Thailand and Laos as it is a major language spoken by Thais (90.5%) and Laotians (67.7%). Austroasiatic (AA) speakers are next most frequent, accounting for 4.0% in Thailand and 24.4% in Laos. In addition, this area is also inhabited by historical migrants who speak Sino-Tibetan (ST), Hmong-Mien (HM), and Austronesian languages (frequencies of 3.2%, 0.3%, and 2%, respectively, in Thailand; 3.1%, 4.8%, and 0% in Laos) (Simons and Fennig 2018).
It is generally thought that AA languages were brought to the Thai/Lao region by Neolithic farmers from southern China, whereas TK languages were brought by a later, Bronze Age migration, also from southern China (Bellwood 2018). The Neolithic expansion was $2-3 ka before the expansion of TK languages; thus, the AA people were thought to be present before the TK expansion. The TK migration during the Bronze Age could have occurred via either demic diffusion (an expansion of TK people that brought both their genes and their language) or cultural diffusion (a language spread with minor movement of people). A genetic study on the origin of TK people supports a southern Chinese origin (Sun et al. 2013), whereas our previous studies of mitochondrial DNA (mtDNA) genome sequences support demic diffusion as the best explanation for the origin of the presentday Thai/Lao TK groups, although there is a strong signal of admixture between TK and AA groups in central Thailand Kutanan, Kampuansai, Brunelli, et al. 2018). Although there is extensive ethnolinguistic diversity in the region, Thai/Lao populations can be generally categorized based on geography as either hill tribes or lowlanders. Nine ethnic groups, consisting of $700,000 people, are officially identified as hill tribes in Thailand: the AA-speaking Lawa, Htin, and Khmu; the HM-speaking Hmong and IuMien; and the ST-speaking Karen, Lahu, Akha, and Lisu.
Previous studies have reported an influence of postmarital residence pattern on genetic variation in northern Thai hill tribes, with lower within-population genetic diversity coupled with greater genetic heterogeneity among populations for patrilocal groups than for matrilocal groups for the malespecific portions of the Y chromosome (MSY), whereas the opposite pattern is observed for mtDNA (Oota et al. 2001;Besaggio et al. 2007). However, these previous studies compared genetic variation between partial mtDNA sequences (hypervariable regions of the control region) and Y chromosomal short tandem repeats (Y-STRs); it would be informative to investigate more complete genetic data from these groups.
The MSY are paternally inherited and exhibit lineages specific to populations/geographic regions, making the MSY an informative tool for reconstructing paternal genetic history and demographic change (Yan et al. 2014; Barbieri et al. 2016). However, to date, there have been few MSY studies of MSEA and almost all of them employed Y-STRs (Cai et al. 2011;Kutanan et al. 2011;Brunelli et al. 2017) and also defined Genetic Histories of Thai and Lao Populations . doi:10.1093/molbev/msz083 MBE haplogroups by genotyping assays, which are thus biased in terms of the haplogroups detected, and cannot uncover new sublineages. Analyzing partial sequences of the MSY and complete mtDNA genome sequences provides more insight into genetic history, especially sex-biased practices that can influence genetic variation, as well as the role of geography and language (Arias et al. 2018;Bajic et al. 2018;Kutanan, Kampuansai, Changmai, et al. 2018).
We have previously carried out comprehensive studies of the maternal genetic history of the Thai/Lao region, based on 1,823 complete mtDNA genome sequences Kutanan, Kampuansai, Brunelli, et al. 2018;Kutanan, Kampuansai, Changmai, et al. 2018). In order to investigate the paternal genetic variation and demographic history, here, we investigate $2.3 mB of MSY sequence in a subset of the above individuals, comprising 928 sequences from 59 populations. We compare and contrast the MSY and mtDNA results, with a focus on the patrilocal versus matrilocal hill tribes, the AA-speaking versus TK-speaking groups, and the various geographic regions (northern Thailand, central Thailand, and northeastern Thailand and Laos). We also use demographic modeling to address the role of demic versus cultural diffusion versus admixture in the origins of the major TK groups in each Thai/Lao region and contrast the results based on the MSY to previous results based on mtDNA. Our MSY sequencing results provide new insights into the paternal genetic history of MSEA and indicated contrasting paternal and maternal histories in this region.

Results
We generated 914 sequences of $2.3 mB of the MSY, which combined with 14 published sequences brings the total to 928 MSY sequences belonging to 59 populations from Thailand and Laos ( fig. 1 fig. 1, Supplementary Material online). O2a* or O-M324* is the second most frequent haplogroup (25.86%) and has a relatively high frequency (>40%) in some AA and TK groups, and all ST-speaking Karen. Additional minor non-SEA-specific haplogroups were also observed, for example, haplogroup N*, found in the Lawa groups, and haplogroups R*, H*, and J*, which support associations between India and the Mon, and genetic connections between Mon and TK groups ( fig. 1 and supplementary fig. 1 . 2D), suggesting recent paternal expansion in this group, but the converse trend (rather high diversity) for mtDNA. Interestingly, a significantly negative Tajima's D value was observed more frequently in the TK than the AA groups for both the MSY and mtDNA (MSY, P < 0.05: 10/31 for TK vs. 6/24 for AA; mtDNA, P < 0.05: 20/31 for TK vs. The Analysis of Molecular Variance (AMOVA) indicates that the variation among populations (within group) accounts for 11.12% of the total MSY genetic variance (table 1). There is greater genetic heterogeneity within the AA group (20.01%, P < 0.01 and 18.49%, P < 0.01 without MN, the hunter-gatherer group from southern Thailand) than among the TK (4.48%, P < 0.01) and ST-speaking Karen groups (2.29%, P > 0.01). For the AA group with more than one population sampled, the greatest withingroup variation by far was among the three Lawa populations Material online). Very low within-group variation was also observed for the central Thai groups from central Thailand (1.47% P > 0.01), Khon Mueang groups from northern Thailand (À1.83%, P > 0.01), and Lao Isan groups from northeastern Thailand (1.84%, P > 0.01), indicating overall genetic homogeneity among these major TK-speaking groups. In agreement with the MSY, larger mtDNA variation is observed in the AA groups (14.03%, P < 0.01) than the ST (6.51%, P < 0.01) and TK groups (4.33%, P < 0.01), but interestingly the largest within-group variation is not among the Lawa (7.78%, P < 0.01) but rather among the Htin populations (25.71%, P < 0.01). In contrast to the MSY, each of the TK groups with more than one population sampled showed significant within-group differences for mtDNA, especially the Khon Mueang (4.20%, P < 0.01) (supplementary fig. 2, Supplementary Material online). In sum, we observed different patterns of MSY versus mtDNA for the different language groups. The among-population variation within linguistic groups is larger for the MSY (20.01%, P < 0.01) than for mtDNA (14.03%, P < 0.01) for AA groups, but about the same for TK groups (4.48%, P < 0.01 for MSY and 4.33%, P < 0.01 for mtDNA), and the ST groups have larger among-population variation for mtDNA (6.51%, P < 0.01) than for the MSY (2.29%, P < 0.01) (table 1 and  Although there is more variation among groups defined by geographic location (2.38%, P < 0.01) than by language family (1.63%, P < 0.01) (table 1), there is much more MSY variation among populations within the same group than among groups defined either by geographic or by linguistic criteria. Moreover, when the divergent MN population of hunter-gatherers from southern Thailand is removed from the analysis, then the among-group component is no longer significant for either geographic location or language family (À0.09%, P > 0.01 for geography; À0.01%, P > 0.01 for language), and the total variation among populations within group reduces to 10.54%. Thus, neither geography nor language family is a good predictor of the MSY genetic structure of Thai/Lao populations, indicating that these two factors are not important in the broad view (table 1).
There are significant correlations between matrices of MSY genetic and geographic distance, estimated by Mantel tests, for all three types of geographic distances, that is, great circle distance (r ¼ 0.3381, P < 0.01), resistance distance (r ¼ 0.5418, P < 0.01) and least-cost path distance (r ¼ 0.3912, P < 0.01). However, the correlations are no longer significant when the MN group is removed from the analysis: great circle distance (r ¼ 0.0125, P > 0.05), resistance distance (r ¼ À0.0446, P> 0.05) and least-cost path distance (r ¼ 0.0139, P> 0.05). In contrary, no significance was detected (P > 0.05) between matrices of mtDNA genetic distance and geographic distances with and without MN (great circle distance: r ¼ 0.0776 and r ¼ À0.0323), resistance distance (r ¼ 0.1433 and r ¼ À0.1105), and least-cost path distance (r ¼ 0.0997 and r ¼ À0.0253).
To identify and describe population clustering based on multivariate analysis, discriminant analysis of principal components (DAPC) was carried out. This analysis attempts to maximize among-groups genetic differentiation and minimize within-group genetic variation; the results showed considerable overlap among groups defined by either language family or geographic location in both MSY and mtDNA (supplementary fig. 3, Supplementary Material online). In addition, the groupings by population and ethnicity of MSY data revealed the largest discrimination to be among some AAspeaking groups, that is, all Lawa groups (LW1-LW3), Htin (TN1), and Blang (BL), whereas all Htin groups (TN1, TN2, In sum, all results indicate lower genetic diversity of the AA groups than the TK and ST groups, except the Mon and Nyahkur, who exhibit high genetic diversity. The AA groups also show greater genetic heterogeneity than the TK and ST groups.

Postmarital Residence and Genetic Diversity
We studied five highlander groups: four hill tribes (Karen, Htin, Lawa, and Khmu) and the Palaung, another minority group in the mountainous area of northern Thailand but not officially recognized as a hill tribe. The Khmu (KA), Lawa (LW1, LW2, and LW3), and Palaung (PL) groups practice patrilocality, whereas the Htin (TN1, TN2, and TN3) are matrilocal, as are the ST-speaking Karen (KSK1, KSK2, KPA, and KPW). If residence pattern is influencing genetic variation, then lower within-population genetic diversity coupled with greater genetic heterogeneity among populations is expected for patrilocal groups than for matrilocal groups for the MSY, whereas the opposite pattern is expected for mtDNA (Oota et al. 2001). The MSY h and MPD values are higher for matrilocal groups, but not significantly (Mann-Whitney U tests: h: Z ¼ 1.4616, P > 0.05; MPD: Z ¼ 0.9744, P > 0.05); however, haplogroup diversity is significantly higher for the matrilocal groups (Mann-Whitney U tests: Z ¼ 2.1112, P < 0.05) (supplementary fig. 4, Supplementary Material online). For mtDNA, genetic diversity values are higher for patrilocal than for matrilocal groups, but the differences are not statistically significant (Mann-Whitney U tests: h: Z ¼ À0.9744, P > 0.05; MPD: Z ¼ À0.8120, P > 0.05; haplogroup diversity: Z ¼ À1.864, P > 0.05) (supplementary fig. 4, Supplementary Material online). Notably, TN1 and LW3 exhibit very low within-population diversity for the MSY, for example, MPD ¼ 20.07 and 23.07, compared with the average MPD (121.11), whereas TN1 and TN2 (20.69 and 26.14) show lower MPD than average (35.09) for mtDNA (supplementary table 1, Supplementary Material online). For genetic differences between-populations, the patrilocal Khmu, Lawa, and Palaung have significantly higher genetic differentiation for the MSY than for mtDNA (average U st ¼ 0.3109 for MSY and 0.0774 for mtDNA) (Mann-Whitney U tests: Z ¼ 3.5907, P < 0.01), whereas the matrilocal groups (Htin and Karen) also show higher average U st for MSY (0.1859) than for mtDNA (0.1553), but these are not significantly different (Mann-Whitney U tests: Z ¼ 0.3270, P > 0.05). Contrasting genetic differences for the MSY versus mtDNA of Lawa, Htin, and Karen are clearly seen in the multidimensional scaling (MDS) and DAPC plots ( fig. 4A  However, in general, the AA-speaking groups, whether identified as hill tribes or as other minorities, are patrilocal groups. The AMOVA result indicates that the variation among AA populations is higher in MSY (20.01%) than mtDNA (14.03%), in accordance with expectations if residence pattern is influencing genetic variation. Conversely, the TK populations, where neither patrilocal nor matrilocal residence is preferred, exhibit similar among-population variances for the MSY (4.48%) and mtDNA (4.33%) (table 1 and

Genetic Relatedness between Thai/Lao and Other Asian Populations
The MDS based on the MSY U st matrix of 73 populations from across Asia revealed that, in general, population clustering largely reflects linguistic affiliation (fig. 5), with some exceptions. In the first and second dimension, the AA populations are the most diversified, with the PL and MN appearing as outliers. There is one cluster of AA populations on the left, which also includes one TK group (BT2); the other AA populations are scattered along the main axis of the plot. Some Mon groups (MO2, MO4, and MO7) are relatively close to Indian and ISEA populations, indicating potential connections. Two central Thai groups (CT4 and CT7) are also relatively close to the Indian populations. The ST populations (Karen, Han Chinese, and Burmese) are rather close. The ISEA and Papuan populations are in closer proximity to South Asian populations (Indian, Bengali, and Punjabi). Generally, the haplogroup profile indicates genetic affinities between the Mon and South/Central Asian groups, which is consistent with the MDS plots (fig. 5) and results from previous mtDNA haplogroup analyses Kutanan, Kampuansai, Brunelli, et al. 2018).

The Expansion of Male Lineages
The Bayesian Skyline Plots (BSPs) of effective population size change (N e ) over time in each group reveal overall five different trends ( fig. 6). The most common trend, found in Mon, Khmer, Htin, Central Thai, and Black Tai, showed N e increasing gradually or remaining constant during 40-60 ka until a decline $5-7 ka, followed by rapid growth $5 ka and then a decrease $2.0-2.5 ka. The other trends differ from the first trend as follows: no population reduction $2.0-2.5 ka but population size either increases (Khon Mueang and Yuan) or remains stable (Lao Isan and Laotian); the Lue and Phuan show two increases in N e, at about $5 ka and $10 ka; the Lawa show a stable population size since $30 ka and then a decline during the last 2 ka with a sudden increase $1 ka; and the Karen differ only slightly from the common trend with a population increase $1 ka.
By contrast, the BSP based on mtDNA sequences for each ethnicity show three common trends ( fig. 6). The first trend is an increase in N e during 40-50 ka, followed by stability and then decrease $2 ka, which was observed in Mon, Htin, Lawa, Khmer, Yuan, Phuan, and Lue. The second pattern, shown by the Khon Mueang, is an increase in N e $ 40-50 ka, followed by stability and then increase again $10 ka, followed by a decline $2 ka. The Central Thai, Lao Isan, and Laotian show the third trend, in which population increases occur $40-50 and $10 ka. In general, the BSP by ethnicity indicated lower effective population sizes for the MSY than for mtDNA ( fig. 6).
We also plotted the BSP of several Asian populations from published MSY data (Karmin et al. 2015;Poznik et al. 2016) (fig. 7). Almost all of the MSEA and East Asian populations, that is, Kinh, Northern Han, Southern Han, and Japanese

MBE
show a pronounced increase of the MSY N e during $4-6 ka, except the Xishuangbanna Dai, in which there is an increase $2 ka. Around 5 ka, the Japanese show a decrease in N e before a sudden increase, suggesting a bottleneck prior to demographic expansion. Interestingly, the ISEA population shows a large increase in N e $ 35-40 ka and a smaller increase $2.5-3 ka. The South Asian populations, that is, Bengali, Punjabi, and Indians, also show two pulses of population increase at about the same times. The Punjabi also show an additional small increase in N e change during $12 ka.
The BSP by each major MSY haplogroup show four pulses of paternal N e increases, at $9-11 ka, $5 ka, $2.0-2.5 ka, and $1.0 ka ( fig. 8), in agreement with the plot by ethnicity. The early Holocene N e increment is obviously noticed in O2a1c* and O2a2a*, whereas the N e growth $5 ka is observed in O1b1a1a1b* and R*. Haplogroup O1a*, C* and D* show expansions in N e $ 2.0-2.5 ka and haplogroup N* shows a recent expansion $1.0 ka. In addition, there are two expansion times for O1b1a1a1a* and O2a2b* ($5 and $2 ka).

Demographic Models
Previously, we used mtDNA genome sequences and demographic modeling to test different hypotheses about the origins of TK groups. Specifically, we tested whether different TK groups were primarily related to local AA groups (reflecting cultural diffusion, i.e., an AA group switching to a TK language), to a TK group from southern China (reflecting demic diffusion, i.e., spread of TK languages via migration from southern China), or were related to both (reflecting admixture between an incoming TK group from southern China and a local AA group). We found that the Khon Mueang (from northern Thailand), Lao Isan (from northeastern Thailand), and Laotian most likely originated via demic diffusion from southern China without substantial gene flow from AA groups ). However, for the central Thai, the most likely scenario was demic diffusion with a very low level of gene flow between central Thai and Mon groups (Kutanan, Kampuansai, Brunelli, et al. 2018). Here, we use the same approach to test three demographic scenarios concerning the paternal origins of these major Thai groups (supplementary fig. 6, Supplementary Material online).
For the Khon Mueang (KM) people (Test 1), the highest posterior probability (0.80) and rather highly selected classification trees (0.58) were found for the demic diffusion model (supplementary table 3 , the cultural diffusion model had the highest posterior probability at 0.58 and was selected slightly more often among the classification trees (0.50) than the other models. However, a Principal Component Analysis plot shows that based on the first two PCs the observed data fall within the distributions simulated under the three models in only Test 4, whereas the other data sets fall within the simulated distributions for PCs 3 and 4, suggesting that there is low efficiency to reconstruct the variability of the observed data (supplementary fig. 7, Supplementary Material online). The parameter estimation for the best performing models in all five tests was able to obtain point estimates for each of the simulated effective population sizes (supplementary table 4, Supplementary Material online). However, the posterior distributions were generally flat (supplementary fig. 8, Supplementary Material online). We also calculated the MSY U st and corrected pairwise difference among groups of populations used in ABC tests to estimate their genetic relationships (supplementary table 5, Supplementary Material online). The KM are closer to the Dai than the local AA group (Test 1), the ethnic Lao and Laotian showed similar genetic differences to both Dai and AA groups (Test 2 and Test 3), whereas the CT groups (Test 5) have closer genetic relationships to the local AA group than to Dai. In contrast, mtDNA U st and corrected pairwise difference revealed that the KM and ethnic Lao are closer to the Dai than local AA, whereas the CT exhibited somewhat similar genetic distances to both Dai and AA. Overall, the simulations based on MSY sequences, compared with previous mtDNA simulation together with tests of genetic difference by U st and corrected pairwise differences, suggest different demographic histories for males and females in the region.

Discussion
In order to gain more insights into MSEA genetic history, we here investigate the paternal genetic variation and structure by sequencing $2.3 mB of the MSY from representative ethnolinguistic groups from Thailand and Laos. In sum, most of the studied populations exhibit two major MSY  fig. 1 and supplementary table 2, Supplementary Material online). We also compared patterns of MSY variation with mtDNA in the same set of populations and found some similar results, for example, overall lower genetic diversity and greater heterogeneity of AA groups than of TK and ST groups, large differences between the Mon and the other AA groups, and genetic connections between the Mon and central Thai (figs. 2-4). However, in many respects, the patterns of MSY and mtDNA variation are different, suggesting contrasting paternal and maternal genetic histories. Here, we focus on three groups of populations with different cultural practices and histories that also stand out in the genetic analyses: the hill tribes, the AA-speaking Mon, and the major TK-speaking groups.

Factors Influencing Contrasting Genetic Variation in the Hill Tribes
The hill tribes, who occupy the mountainous northern region of Thailand, are notable for their variation in patrilocal versus matrilocal residence pattern (Oota et al. 2001;Besaggio et al. 2007), as well as for their strong sense of group identity, which tends to isolate them from other groups (Schliesinger 2000;Nahhas 2007). If postmarital residence is influencing patterns of genetic variation, then the expectation is for larger between-group differences and smaller within-group diversity for patrilocal groups for the MSY, and the same trends for matrilocal groups for mtDNA. The first comparative study of mtDNA and MSY variation in patrilocal versus matrilocal groups was carried out in the northern Thai hill tribes and found a strong impact of postmarital residence on the mtDNA and MSY variation (Oota et al. 2001). However, previous studies compared genetic variation between partial mtDNA sequences and Y-STRs (Oota et al. 2001;Besaggio et al. 2007); here, we report the first comparison of mtDNA and MSY variation based on comparable sequence data.
Here, we analyzed the sequences of mtDNA genome and $2.3 mB of the MSY of the Khmu, Palaung, and Lawa groups, who practice patrilocality, whereas the Htin are matrilocal, similar to the ST-speaking Karen. The within-population genetic diversity values are in agreement with expectations, that is, greater diversity of matrilocal than patrilocal groups for MSY and the opposite trend in mtDNA (supplementary fig.  4, Supplementary Material online). Moreover, genetic differentiation between populations also goes in the direction predicted by postmarital residence pattern. However, in many cases, the differences between patrilocal and matrilocal groups are not significant, indicating that other factors are also having an effect. In particular, the Htin (TN1) and Lawa (LW3) exhibit very low within-population diversity for the MSY, whereas the Htin (TN1 and TN2) also show lower diversity for the mtDNA ( fig. 2A-C).
One factor in particular that could influence the withinpopulation genetic diversity and between-population differentiation is geographic isolation, which enhances genetic drift and inbreeding, thereby lowering within-population genetic diversity and increasing between-population differentiation. This could explain the very low internal diversity and high differentiation from other groups of some groups of Htin (TN1) and Lawa ( fig. 4A and B and supplementary fig. 3, Supplementary Material online) that live in mountainous, isolated parts of northern Thailand. The Lawa furthermore favor intramarriage (Nahhas 2007) which would also reduce genetic variation in this group. The Htin (TN1) also show very low diversity and extreme divergence in genome-wide single nucleotide polymorphisms data (Xu et al. 2010) and both Htin (TN1-TN3) and Lawa (LW3) exhibit lower diversity and large differentiation in autosomal STRs . Such drastic genetic drift effects could reduce the significance of the impact of postmarital residence on patterns of genetic diversity.
Moreover, these results are in keeping with previous observations that although the expected difference between patrilocal and matrilocal groups holds in some regions (Oota et al. 2001;Besaggio et al. 2007), in other regions patterns of mtDNA and MSY variation do not conform to expectations (Kumar et al. 2006;Arias et al. 2018). This is indeed to be expected given that many other factors, for example, other human cultures (e.g., linguistic exogamy), physical landscape, and subsistence strategies, influence patterns of genetic variation (Wilkins and Marlowe 2006;Chaix et al. 2007).

Genetic Variation and Origin of the Mon
The Mon groups showed genetic differences from other AA populations but closer relatedness to the TK populations, especially the central Thai, in both MSY and mtDNA (figs. 2A, B, 3A, and B). Our previous simulation results, based on mtDNA, also supported admixture among the Mon and central Thai groups (Kutanan, Kampuansai, Brunelli, et al. 2018). In addition, some Mon groups (MSY: MO3, MO5, MO6 and mtDNA: MO2, MO3 and MO4) exhibit genetic affinities with the Karen (fig. 3B), reflecting genetic heterogeneity and contrasting genetic patterns between MSY and mtDNA. Admixture might be an important factor influencing the genetic structure of the lowland AA-speaking Mon. Archaeological evidence indicates that the Dvaravati civilization of the Mon was centered in present-day central Thailand and southern Myanmar and had expanded to a large part of MSEA during the sixth to seventh century AD (Diffloth 1984;Guillou 1999;Saraya 1999). After the intensification of Thai and Burmese kingdoms, the Mon in Myanmar were conquered by the Burmese during the 18th century AD; the ethnic Mon in Myanmar are currently concentrated in the Mon and Karen States (Pon Nya 2001). In Thailand, the present-day Mon are distributed in central Thailand and surrounding areas, with some groups living in the North and the Northeast. However, they are not considered to be the descendants of the ancient Mon Dvaravati civilization in Thailand, but rather political refugees that fled from Myanmar to Thailand during the 16th to 19th centuries AD (Ocharoen 1998). However, based on linguistic evidence, the remnants of the Dvaravati Mon population are now Kutanan et al. . doi:10.1093/molbev/msz083 MBE considered a distinct ethnic group known as the Nyahkur (BO) whose communities are restrict found in hilly areas along the border between central and northeastern Thailand (Diffloth 1984). In contrary to linguistic evidence, the Nyahkur has no shared haplotype or related to any specific Mon groups, indicating their genetic differences. However, Nyahkur show genetic sharing in both MSY and mtDNA with the Khmer groups ( fig. 3A) which reflects their previous connection. In addition, the high frequency of MSY haplogroup O2a* and C* ( fig. 1), close genetic relationship to many TK-and ST-speaking groups ( fig. 3B) and highest MPD value for MSY ( fig. 2C) indicated later extensive gene flow, promoting the paternal difference of Nyahkur from the Mon and also other AA groups.
Previous genetic studies of G6PD mutations reported a high prevalence of the Mahidol type G6PD deficiency in the Mon, Burmese, and Karen, different from Thai, Laotian, and Khmer groups exhibiting the Vientiane-type G6PD mutation (Iwai et al. 2001;Matsuoka et al. 2005;Nuchprayoon et al. 2008). Thus, both our results and previous studies indicate a close genetic relationship among Mon, Burmese, and Karen in Myanmar, suggesting a common origin or extensive gene flow. Our previous mtDNA study also revealed genetic relations between some Mon groups (MO1 and MO5) and Burmese, with both of them close to some Indian populations, whereas other Mon groups are closer to the Karen groups (MO2, MO3, and MO4) (see details in Kutanan, Kampuansai, Brunelli, et al. [2018]). In general, genetic mixing among Mon, Karen, and Myanmar might have happened before the arrival of the Mon to Thailand, whereas mixing among the Mon and central Thai would have occurred after the arrival of the Mon. However, MSY data for the Burmese are limited, and further MSY studies of populations from Myanmar are needed to confirm this scenario.
A connection between Indian groups and the Mon is suggested by South/Central Asian MSY lineages in the Mon, for example, R*, H*, J*, L*, and Q* ( fig. 1), consistent with some mtDNA lineages, for example, W3a1b, M6a1a, M30, M40a1, M45a, and I1b Kutanan, Kampuansai, Brunelli, et al. 2018). Thus, both mtDNA and the MSY indicated contact between the ancestors of the Mon and Indian. Archaeological evidence also suggests Indian influences, for example, the symbolism on the Dvaravati coin which indicates the importance of royalty, and includes several motifs associated with Indian precedents of the first to fourth century AD (Higham and Thosarat 2012).

Demographic Changes
Demographic expansion of Thai/Lao populations is noticeably detected in both paternal and maternal lineages at the beginning of the Holocene, $10 ka ( fig. 6). In this period, increasing and more stable temperatures might have facilitated population expansion (Wen et al. 2016). The male N e increase during the Holocene is primarily driven by the O2a2a* and O2a1c* lineages (fig. 8). The Holocene expansion might thus be related to an expansion of HM paternal lineages, as O2* (O-M122*) is thought to have arisen at the beginning of the Holocene near Tibet (van Driem 2017).
According to this hypothesis, the bearers of this haplogroup became the progenitors of the "Yangtzean" or HM paternal lineages, and contributed this lineage to the ancient AA who carried O1b1a1a* or O-M95* by sharing of knowledge about rice agriculture. However, further sequencing of MSY lineages belonging to the HM populations are needed to verify this hypothesis.
During the Neolithic period, other significant expansions are observed in almost all ethnicities and many MSY haplogroups, that is, O1b1a1a1b*, O1b1a1a1a*, and R* ( fig. 8). Previously, it was suggested that the demographic expansion pattern in the Neolithic in SEA shows strong expansion dynamics, different characteristics than the Paleolithic expansion, and sex-specific expansion patterns, with earlier expansions in female than in male lineages. (Wen et al. 2016). The expansion signals in our results coincide with the beginning of the SEA Neolithic $5-4.5 ka, during which farming expanded from China to SEA (Bellwood 2018). The farming technology for food production could support a higher population density than hunting-gathering, as agriculture could produce a more steady food supply, and males could avoid hunting dangerous animals; thus, effective population size would increase (Jobling et al. 2004;Yan et al. 2014). The farmer expansion $4 ka was probably related to ancestral AA-speaking hill tribes with predominantly O-M95* lineages that knew rice agriculture (van Driem 2017; Lipson et al. 2018;McColl et al. 2018). However, the movement of Neolithic groups from southern China to MSEA probably involved not only AA groups but also TK groups (Bellwood 2018). In our study, a Neolithic expansion signal was observed for the MSY in all studied groups, indicating a large demographic expansion and probable admixture among the ancestors of indigenous southern Chinese groups during the Neolithic period. Haplogroup R1a was previously suggested to show a similar expansion, with paternal population growth during $6.5-4 ka observed globally Wang et al. 2016).
In addition, we found another significant expansion during the Bronze age $2 ka that involves TK-speaking populations, reflected by some haplogroups prevalent in the TK, for example, O1a* ( fig. 8). This TK-related expansion is consistent with the strong expansion detected in the BSP of Xishuangbanna Dai ( fig. 7) and corresponds with the results of a recent ancient DNA study (McColl et al. 2018). The southward expansion of the indigenous southern Chinese TK speakers to MSEA was probably driven by the Han Chinese expansion from the Yellow River basin to southern China during the Qin dynasty, starting $2.5 ka (Bellwood 2018). The migration and expansion of prehistoric TK groups during the Bronze Age has had a profound influence on the modern Thais and Laotians in term of languages and genes. Nowadays, TK languages are mostly concentrated in presentday Thailand and Laos, and the relatively high level of TK genetic homogeneity might be also driven by this recent expansion.
Our previous mtDNA modeling to explore the migration and expansion of prehistoric TK groups during the Bronze Age supported the spread of TK languages via demic diffusion Genetic Histories of Thai and Lao Populations . doi:10.1093/molbev/msz083 MBE and admixture Kutanan, Kampuansai, Brunelli, et al. 2018). Here, a similar modeling approach for the MSY data found weak support for cultural diffusion of TK languages. Although we built the model based on historical sources (supplementary fig. 6, Supplementary Material online), the models did not generate the observed variation (supplementary fig. 7, Supplementary Material online), indicating that the analyzed models do not correspond to the real paternal population history. A possible reason for this striking difference between maternal and paternal histories might be warfare. Historically, many areas of Thailand saw frequent warfare involving various TK groups $200-500 ya (Penth 2000). As a result, forced migrations were imposed upon the losing side and men were taken captive more often than women because men could be used to strengthen the victors' armies. This could result in a different history for the TK male versus female population. More complex demographic models could therefore more accurately capture the paternal history of Thai/Lao populations.
It may be that the MSY sequences do not harbor enough information to distinguish among the different demographic scenarios. However, comparison of genetic differences (U st and corrected pairwise differences) among the groups used in the simulations does support a real contrast in the maternal versus paternal histories for the major TK groups in each region, and also finds genetic heterogeneity among these major groups. The northern Thai people showed closer genetic relationship with the Dai than AA groups in both mtDNA and MSY, supporting the demic diffusion model, whereas the ethnic Lao are closer to Dai for mtDNA but for MSY they are related to both Dai and AA rather equally, suggesting demic diffusion for the maternal history and admixture for the paternal history. The central Thai MSY sequences could be of AA origin because they are genetically more similar to the AA groups than the Dai, supporting cultural diffusion, but for mtDNA they are related to both Dai and AA rather equally, supporting admixture in central Thailand as found previously (Kutanan, Kampuansai, Brunelli, et al. 2018). Overall, these results suggest that the demographic history of Khon Mueang, ethnic Lao, and central Thais are different, possibly reflecting either different migration routes or different small TK groups that expanded from China (Higham and Thosarat 2012). In addition, different patterns of admixture for males versus females could have occurred in ethnic Lao and central Thais. Archaeological and historical evidence indicate that prior to the TK migration, there were existing rich civilizations in the area, for example, the Dvaravati of the Mon and Chenla of the old Khmer. With the arrival of TK groups, the Mon people were incorporated by intermarriage into Tai society and adopted the increasing dominant Thai language as their own (Higham and Thosarat 2012). Our results suggest that there was variation in the pattern of cultural diffusion/admixture involving males versus females in different groups in the area of northeastern and central Thailand and Laos. Such admixture could also have had an impact on the patterns of genetic diversity in the matrilocal versus patrilocal groups, which might then contribute to diminishing the genetic signal attributable to residence pattern.
Finally, another more recent expansion signal was detected in the northern Thai AA-speaking Lawa, involving haplogroups O2a2b* and N* (figs. 6 and 8). Historical evidence indicates that after the arrival of the TK groups in northern Thailand, the native Lawa groups were fragmented and moved to the mountains (Penth 2000), resulting in cultural and geographical isolation. In support of this model of isolation and drift, we note that the most negative Tajima's D value is observed in the LW3 group, which suggests population expansion after a bottleneck ( fig. 2D).

Conclusion
We compared high-resolution mtDNA and MSY sequences and found contrasts in the maternal and paternal genetic history of various Thai/Lao groups, in particular the hill tribes, the major TK groups in different regions, and the AA-and STspeaking groups, as well as significant genetic heterogeneity among samples from the same ethnolinguistic group from different locations (figs. 1 and 4). These contrasting patterns reflect the influence of different factors in different Thai/Lao groups, for example, cultural practices in the hill tribes coupled with genetic drift in some population, as well as gene flow in the lowland Mon and TK groups. This new MSY study from Thai/Lao males provides more insight into the past demographic history in the paternal line and, along with our previous mtDNA studies, is generally in agreement with recent ancient DNA studies in SEA that indicate two demographic expansions from southern China to MSEA, with the first involving the ancestors of AA groups and the second involving TK groups (Lipson et al. 2018;McColl et al. 2018). Overall, the contrasting results for the maternal versus paternal history of some Thai/Lao groups supports the importance of detailed studies of uniparental markers, as such contrasts would not have been revealed by studying autosomal markers in just a few Thai/Lao groups. Additional ancient DNA studies, coupled with more detailed genome-wide data from present-day populations, will provide a complete reconstruction of the genetic history of this region.

Studied Populations
Genomic DNA was extracted from blood, buccal swab or saliva of 914 males belonging to 57 populations that were classified into 26 ethnolinguistic groups, as described previously Kutanan, Kampuansai, Changmai, et al. 2018) ( fig. 1 and supplementary

MSY Sequences
We prepared genomic libraries for each sample using a double index scheme (Kircher et al. 2012) and enriched the libraries for $2.34 mB of the MSY via in-solution hybridizationcapture using a previously designed probe set (Kutanan, Kampuansai, Brunelli, et al. 2018) and the Agilent Sure Kutanan et al. . doi:10.1093/molbev/msz083 MBE Select system (Agilent, CA); further details on the probe design are provided in supplementary table 6, Supplementary Material online. Sequencing was carried out on the Illumina HiSeq 2500 platform with paired-end reads of 125-bp length. Standard Illumina base-calling was performed using Bustard. Illumina adapters were trimmed and completely overlapping paired sequences were merged using leeHOM (Renaud et al. 2014). Demultiplexing of the pooled sequencing data was done by deML (Renaud et al. 2015). The alignment and postprocessing pipeline of the sequencing data was described previously (Kutanan, Kampuansai, Brunelli, et al. 2018).

Genetic Relationships
To investigate the paternal relatedness between populations, we performed a DAPC (Jombart et al. 2010). We grouped our samples based on population sampled, geographic location, and ethnicity (supplementary table 1, Supplementary Material online) before running the analysis for 100,000 iterations using adegenet 1.3-1 (Jombart 2008).
A correspondence analysis based on MSY haplogroup counts was performed using STATISTICA 13.0 (StatSoft, Inc., USA). Haplogroup assignment was performed by yHaplo (Poznik 2016). The R package (R Development Core Team 2016) was used to carry out a nonparametric MDS analysis (based on U st values of MSY and mtDNA), the MDS heat plot with five dimensions, showing perdimension standardized values between 0 and 1, and heat plots of the U st distance matrix and the matrix of shared haplotypes.
To get a broad picture of population relationships in Asia, we included 552 MSY sequences from Asian groups for comparison We downloaded the published Y chromosome sequencing data from the SGDP data set (https://sharehost. hms.harvard.edu/genetics/reich_lab/sgdp/Y-bams/Y.tar; last accessed June 25, 2018) (Mallick et al. 2016), the 1000 Genomes Project (1000Genomes Project Consortium et al. 2015) and the study of Poznik et al. (2016). We merged and processed all sequencing data through the same pipeline as the samples in our study (Kutanan, Kampuansai, Brunelli, et al. 2018). The resulting variant file was merged with data from previous study (Karmin et al. 2015; http://evolbio.ut. ee/chrY/; last accessed June 25, 2018) using Heffalump v0.2 (https://bitbucket.org/ustenzel/heffalump; last accessed June 25, 2018). We subset the variant file to sites that were overlapping the regions present on our capture bait and to samples that had a major haplogroup that was also present in our data set. These samples were combined with our samples; we then removed variant sites for which <25% of the samples had genotype information, and samples that had >25% of all sites with missing genotype information. The resulting data set provides 16,684 variable sites, which was imputed using BEAGLE v4.1 (Browning and Browning 2016). Additional details on these populations are provided in supplementary table 7, Supplementary Material online.

Bayesian Skyline Plots
Based on Bayesian Markov Chain Monte Carlo analyses, BEAST 1.8.4 was used to construct BSPs by ethnicity and by haplogroup (Drummond et al. 2012). To avoid a false detection of bottlenecks stemming from the sample collection procedure (Heller et al. 2013), we pooled all populations within the same ethnicity and ran jModel test 2.1.7 (Darriba et al. 2012) to select the most suitable model for each run during the creation of the input file for BEAST via BEAUTi v1.8.2. We used an MSY mutation rate of 8.71 Â 10 À10 substitutions/bp/year (Helgason et al. 2015), and the BEAST input files were modified by an in-house script to add in the invariant sites found in our data set. Both strict and log normal relaxed clock models were run for each ethnicity and haplogroup, with marginal likelihood estimation (Baele et al. 2012(Baele et al. , 2013. After each BEAST run, the Bayes factor was computed from the log marginal likelihood of both models to choose the best-fitting BSP. Tracer 1.5.0 was used to check the results. We also performed the BSP of compared populations, that is, Dai, Kinh, Southern Han, Northern Han, and Japanese from published MSY sequences . The BSPs by ethnicity based on mtDNA genomes were carried out in a previous study (Kutanan, Kampuansai, Changmai, et al. 2018).

Approximate Bayesian Computation
In order to investigate the paternal origin of TK groups in Thailand/Laos and their local histories, we employed five data sets (encompassing northern Thailand, central Thailand, and northeastern Thailand and Laos) and compared three competing scenarios: demic diffusion (i.e., a migration of people from southern China, who are then the ancestors of presentday Thai/Lao TK people); cultural diffusion (i.e., the Thai ancestors were the native AA groups who shifted languages and culture to TK) and continuous migration (i.e., gene flow between a migrant TK and native AA groups) that were developed based on known historical hypotheses (supplementary fig. 6, Supplementary Material online). The immigrant and endogenous scenarios postulated an initial split of AA Genetic Histories of Thai and Lao Populations . doi:10.1093/molbev/msz083 MBE and Dai populations, with a subsequent treelike split of the target group from Dai (immigrant) or AA (endogenous) populations. The continuous migration model not only shared the same demographic history as the immigrant model but also allowed subsequent bidirectional migration between the newly originated population and the AA population. All of the simulations assumed uniform population sizes, fixed separation times based on historical records, a fixed mutation rate of 8.71 Â 10 À10 substitutions/bp/year (Helgason et al. 2015), and a prior distribution for both effective population sizes and migration rates (supplementary table 4, Supplementary Material online). Finally, due to the uneven sample size between the tested groups, we simulated a number of individuals equal to the lowest sample size among the populations in the model.
We simulated the derived site frequency spectrum (unfolded-SFS) for 2,364,048 loci using the fastsimcoal simulator (Excoffier and Foll 2011) with the flag -s, through the software package ABCtoolbox (Wegmann et al. 2011) and running 50,000 simulations for each model. The observed SFS was calculated with the software 4P (Benazzo et al. 2015). To determine the best performing scenario in each set we employed the model selection procedure ABC-RF (Pudlo et al. 2016), which relies on random forest machine learning methodology (Breiman 2001). This classification algorithm is trained on a reference table of simulations and allows the prediction of the most suitable model at each value of a set of covariates (i.e., the summary statistics). Additional details concerning the ABC-RF analyses are described in our previous study (Kutanan, Kampuansai, Brunelli, et al. 2018).

Data Availability
All reads that aligned to the region of the MSY that was targeted by the capture-enrichment array were deposited in the European Nucleotide Archive (ENA) (study ID: PRJEB31636).

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.