Genetic polymorphisms and phylogenetic characteristics of Tibeto-Burman-speaking Lahu population from southwest China based on 41 Y-STR loci

Abstract
Background: Male sex-linked Y-chromosome short tandem repeats (Y-STRs) have been widely used in forensic cases and population genetics research. At present, the forensic-related Y-STR data in the Chinese Lahu population are still poorly understood.
Aim: To enrich the available Y-STR data of this Chinese minority population and investigate its phylogenetic relationships with other reported populations.
Subjects and methods: The genetic polymorphisms of 41 Y-STR loci were analysed in 299 unrelated healthy Lahu male individuals from Southwest China. Phylogenetic analyses were performed by multidimensional scaling analysis and neighbor-joining phylogenetic tree construction.
Results: A total of 379 alleles were observed at the 41 Y-STR loci. The allele frequencies ranged from 0.0033 to 0.9666. The genetic diversity values ranged from 0.0653 to 0.9072. A total of 254 different haplotypes of the 41 Y-STR loci were observed in 299 individuals. The values of haplotype diversity, haplotype match probability, and discrimination capacity were 0.9987, 0.0047, and 0.8495, respectively. The phylogenetic analysis indicated that the Tibeto-Burman-speaking Lahu population showed a close genetic relationship with the Yunnan Yi population.
Conclusions: The haplotype data of the present study can enrich the forensic databases of this Chinese minority population and will be useful for population genetics and forensic DNA application.


Introduction
The Lahu population is one of the oldest ethnic groups in China. At present, the Lahu people are mainly distributed in Yunnan Province, Southwest China, and some also live in Thailand, Myanmar, Laos, and other Southeast Asian countries (https://www.neac.gov.cn/seac/ztzl/lhz/gk.shtml). According to China's seventh national census in 2020, the total Lahu population in China was 499,167, of whom about 98.66% live in Yunnan Province (http://www.stats.gov.cn/). Based on historical records, the Lahu people originated from the ancient Di-Qiang people, who lived in the area of Qinghai Lake. After continuous migration, differentiation, evolution, and integration, part of the ancient Di-Qiang people gradually formed the present Lahu people in Yunnan (https://www.neac.gov.cn/seac/ztzl/lhz/lsyg.shtml). The Lahu people have their own language (Lahu language), which belongs to the Yi language branch of the Tibeto-Burman subgroup of the Sino-Tibetan family (Ma 2003).
Y-STRs, which lie on the non-recombining part of the Y-chromosome and are paternally inherited, are a valuable tool in the investigation of human evolution, population history, genealogy, forensics, and male medical genetics (Jobling and Tyler-Smith 2017). At present, the forensic-related Y-STR data in the Chinese Lahu population are still poorly understood. Therefore, to enrich the Y-STR genetic data and establish a Y-STR reference database for Lahu male individuals of Yunnan Province in China, we first applied the YanHuangYDatabaseTyping kit (Shenzhen Huada Forensic Technology Co., Ltd., Shenzhen, China) to genotype the haplotypes of 299 unrelated healthy Lahu male individuals residing in Yunnan Province. Furthermore, we explored the phylogenetic relationships between the Lahu population and other reported populations.

Sample collection
Blood samples were collected from 299 unrelated healthy Lahu male individuals living in Yunnan Province, Southwest China. All samples were obtained from participants under informed consent in compliance with the ethical standards of the Helsinki Declaration. This study was approved by the Ethics Committee of Kunming Medical University, Kunming, Yunnan Province, People's Republic of China (No. KMMU2020MEC013).

Data analysis
Allele and haplotype frequencies of each locus were calculated by direct counting. Genetic diversity (GD) and haplotype diversity (HD) were calculated following the formula $\mathrm{GD/HD} = n\left(1 - \sum P_i^2\right)/(n - 1)$, where $P_i$ indicates the relative frequency of the i-th allele or haplotype and $n$ denotes the sample size (Nei 1973). Haplotype match probability (HMP) and discrimination capacity (DC) values were calculated according to the formulas $\mathrm{HMP} = \sum P_i^2$ (where $P_i$ is the frequency of the i-th haplotype) and $\mathrm{DC} = N_{\mathrm{diff}}/N$ (where $N_{\mathrm{diff}}$ and $N$ are the number of different haplotypes and the sample size, respectively). The pairwise genetic distances (Rst) between different populations were calculated by analysis of molecular variance (AMOVA) and visualised in a multidimensional scaling (MDS) plot using the online tool of the YHRD database (http://www.yhrd.org) (Willuweit and Roewer 2015). A heatmap of pairwise Rst values was generated in R v. 3.3 using the heatmap package. The neighbor-joining (N-J) phylogenetic tree was constructed based on the pairwise Rst matrix using the MEGA v6.0 software (Rzhetsky and Nei 1993; Tamura et al. 2013).

Haplotype frequencies and forensic parameters
A total of 299 male individuals were successfully genotyped, and the allele frequencies and GD values of the 41 Y-STR loci are shown in Supplementary Table S1. A total of 379 alleles were observed, among which DYS385, DYF387S1, DYS527, and DYF404S1 were multi-copy loci with 44, 37, 31, and 23 alleles detected, respectively, and the rest were single-copy loci with 2-14 alleles detected at each locus (see Table 1). The allele frequencies ranged from 0.0033 to 0.9666. The GD values ranged from 0.0653 (DYS645) to 0.9072 (DYF387S1). The rapidly mutating (RM) Y-STR loci (DYF387S1, DYF404S1, DYS449, and DYS518) in this system exhibited higher genetic diversity (GD > 0.8) than single-copy loci in the Yunnan Lahu population. Among the 299 male individuals, 254 different haplotypes were observed with the 41 Y-STR system (including RM Y-STRs), giving a discrimination capacity of 84.95%, whereas 246 were observed with the 37 Y-STR system (excluding RM Y-STRs), giving a discrimination capacity of 82.27%. Including the RM Y-STRs therefore raised the discrimination capacity by 2.68 percentage points in the 299 Lahu male individuals. A previous study also indicated that adding RM Y-STRs would improve the efficiency of the system and greatly increase male relative differentiation capacity (Ballantyne et al. 2010). Micro-variant alleles were observed at loci DYS531 (11.1), DYS510 (16.3), DYS518 (37.2), DYS385 (14, 15.1; 14, 18.2), and DYS527 (20, 21.3; 21.2, 24) (see Table 1). Of the 254 haplotypes observed at the 41 Y-STR loci, 218 were unique, 30 were shared by two individuals, five were shared by three individuals, and one was shared by six individuals (Supplementary Table S1). The values of HD, HMP, and DC were 0.9987, 0.0047, and 0.8495, respectively.
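As a consistency check, the forensic parameters can be recomputed from the haplotype sharing counts reported above, using the HD, HMP, and DC formulas given in the Data analysis section. The following is a minimal Python sketch, not the software used in the study; the function name `forensic_parameters` is hypothetical, and the only inputs are the counts stated in the text.

```python
def forensic_parameters(haplotype_counts):
    """Return (HD, HMP, DC) from per-haplotype observation counts."""
    n = sum(haplotype_counts)                      # sample size
    sum_sq = sum((c / n) ** 2 for c in haplotype_counts)
    hd = n * (1 - sum_sq) / (n - 1)                # haplotype diversity
    hmp = sum_sq                                   # haplotype match probability
    dc = len(haplotype_counts) / n                 # discrimination capacity
    return hd, hmp, dc


# Sharing spectrum reported in the text for the 41 Y-STR system:
# 218 unique, 30 seen twice, 5 seen three times, 1 seen six times.
counts = [1] * 218 + [2] * 30 + [3] * 5 + [6] * 1
hd, hmp, dc = forensic_parameters(counts)

# Only the number of distinct haplotypes (246) is reported for the
# 37 Y-STR system, so only its DC can be recomputed.
dc_37 = 246 / sum(counts)
gain_points = (dc - dc_37) * 100                   # percentage points

print(f"n = {sum(counts)}, distinct haplotypes = {len(counts)}")
print(f"HD = {hd:.4f}, HMP = {hmp:.4f}, DC = {dc:.4f}")
print(f"DC (37 loci) = {dc_37:.4f}, gain = {gain_points:.2f} percentage points")
```

The counts sum to 299 individuals and 254 distinct haplotypes, and the rounded outputs agree with the reported HD (0.9987), HMP (0.0047), and DC (0.8495) values, as well as with the 82.27% discrimination capacity of the 37 Y-STR system.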

Population comparison study
To further explore the phylogenetic relationships between the studied population and other groups, 22 different groups were obtained from the YHRD database, including 18 Chinese groups and four worldwide groups (Supplementary Table S2).
The pairwise Rst matrix and associated P values among 23 populations are shown in Supplementary Table S3, and the heatmap of pairwise Rst for 23 populations is shown in Figure 1. Significant differences were observed between the Lahu population and all compared populations (p < 0.0022, after Bonferroni correction). The results indicated that Yunnan Lahu was most closely related to Yunnan Yi (Rst = 0.0220), followed by Yunnan Hui (Rst = 0.1437) and Qinghai Salar (Rst = 0.1438), whereas the Aba Tibetan (Rst = 0.4086) and Garze Tibetan (Rst = 0.3741) showed the largest genetic distances from Yunnan Lahu. The MDS plot and phylogenetic relationship reconstruction based on the Rst distance matrix further supported this genetic relationship. As presented in the MDS plot (Figure 2), Yunnan Lahu, Yunnan Yi, Qinghai Salar, and Yunnan Hui were in the first quadrant, and Yunnan Lahu was relatively isolated in the top right corner. The three Tibetan groups (Aba, Garze, and Qinghai) and the two Mongolian groups (Hohhot and Ordos) grouped together separately in the second quadrant, and the two Korean groups (South Korea and Seoul) clustered together in the fourth quadrant. Chinese Han from different regions (Fujian, Jiangsu, Guizhou, Sichuan, Yunnan, Inner Mongolia, and Shaanxi) clustered together closely and were located in the fourth quadrant. Balochistan Hazara and Maryland Asian Americans were far away from the Yunnan Lahu. As shown in the N-J tree (Figure 3), Yunnan Lahu first clustered with Yunnan Yi, followed by Qinghai Salar and Yunnan Hui. Yunnan Lahu was far from the Koreans and the Tibetans in the N-J tree. The Chinese Han were gathered into a cluster. On the whole, the population structure and distributions were roughly congruent with the results of the corresponding MDS plot.
The phylogenetic analysis showed that, among the compared populations, the Yunnan Lahu were genetically closest to the Yunnan Yi, which is consistent with the findings of Zhang et al. (2022). According to historical records, the ancient Di-Qiang tribes, once active in most areas from the Central Plains to northwest and southwest China, migrated to the west and southwest around 2000-5000 years ago and gradually evolved into the earliest indigenous peoples in Yunnan, including the Lahu, Yi, Bai, and other ethnic groups (Dong et al. 2004). Linguistically, the Lahu language and Yi language both belong to the Yi language branch of the Tibeto-Burman languages (Song et al. 2021). The Yi language and Lahu language have a common historical origin and share the same phonetic characteristics, indicating the historical relationship between the two ethnic groups (Zhou 1998). Additionally, we observed that Chinese Han from different regions clustered together closely in the phylogenetic tree and MDS plot, which reveals the genetic homogeneity among the Chinese Han (Fan et al. 2021).
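The Bonferroni-corrected threshold and the ordering of the reported distances can be illustrated with a short Python sketch. It assumes a family-wise significance level of 0.05, which the text does not state explicitly, and uses only the five Rst values quoted above; the full pairwise matrix is in Supplementary Table S3 and is not reproduced here.

```python
ALPHA = 0.05          # assumed family-wise significance level (not stated in the text)
N_COMPARISONS = 22    # Lahu versus each of the 22 reference populations

# Bonferroni correction: divide the significance level by the number of tests.
threshold = ALPHA / N_COMPARISONS

# Pairwise Rst values between Yunnan Lahu and selected populations, as reported.
rst_to_lahu = {
    "Yunnan Yi": 0.0220,
    "Yunnan Hui": 0.1437,
    "Qinghai Salar": 0.1438,
    "Garze Tibetan": 0.3741,
    "Aba Tibetan": 0.4086,
}

# Rank populations from genetically closest to most distant.
ranked = sorted(rst_to_lahu.items(), key=lambda item: item[1])

print(f"Bonferroni threshold: {threshold:.5f}")
for population, rst in ranked:
    print(f"{population:15s} Rst = {rst:.4f}")
```

Under this assumption the threshold is 0.05/22, approximately 0.00227, which matches the reported cut-off of 0.0022 when truncated to four decimal places.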

Conclusions
In summary, the current study reported the genetic polymorphisms and phylogenetic characteristics of 41 Y-STR loci from Lahu male individuals residing in Yunnan Province, Southwest China. The haplotype data of the present study will not only enrich the available Y-STR data of this Chinese minority population, but will also be useful for population genetics and forensic DNA application.

Figure 1. Heatmap of pairwise Rst values among the studied population and 22 reference populations.

Figure 2. Multidimensional scaling (MDS) plot showing the genetic relationships between the studied population and 22 reference populations.

Table 1. Allele frequencies and genetic diversity (GD) of the 41 Y-STR loci in the Yunnan Lahu population (n = 299).