Genetic polymorphism of 27 Y-STR loci in Kazakh populations from Eastern Kazakhstan

Abstract
Background: The establishment of a national haplotype database is important for forensic and genetic applications and requires studying genetic polymorphisms at Y-STR loci. However, the genetic structure of the Eastern Kazakhstan population is poorly characterised.
Aim: To investigate the genetic polymorphisms of 27 Y-STR loci in the Kazakh population from Eastern Kazakhstan and analyse the population genetic relationships of the Eastern Kazakhs with other populations.
Subjects and methods: The Yfiler Plus kit was used to genotype 246 healthy, unrelated males from Eastern Kazakhstan. From the raw data, haplotype and allele frequencies and forensic parameters were calculated, and an MDS plot was constructed.
Results: A total of 207 haplotypes were detected, of which 186 were unique. The haplotype diversity and discrimination capacity were 0.997 and 0.841, respectively. Population comparisons showed that Eastern Kazakhs have close genetic relationships with Kazakhs from Xinjiang, China. At the same time, a difference was found between the studied population and a previously studied sample from the same part of Kazakhstan.
Conclusions: The obtained haplotypes will help to expand the Kazakhstan Y-chromosome reference database and will be useful for future genetic research and forensic applications.


Background
The analysis of Y-chromosomal short tandem repeats (Y-STRs) is extensively utilised in forensics (Kayser 2017), genealogical and anthropological studies, as well as population genetics (Jobling and Tyler-Smith 2003; Purps et al. 2014; Xu et al. 2015). Y-STRs are particularly valuable in forensic investigations because they allow the detection of male individuals in mixed male and female stains (Roewer 2009). As a result, several commercial Y-STR kits became available and were utilised to expand public Y-STR reference databases (Oostdik et al. 2014; Gopinath et al. 2016). The National Database "Kazakhstan" of the Y-Chromosome STR Haplotype Reference Database (YHRD) contains 2294 haplotypes, of which only 723 are of Kazakhs that were characterised using the 27 Y-STR system; there are 382 samples of Kazakhs from Northern Kazakhstan (Ashirbekov et al. 2022) and 341 samples of the general Kazakh population (Zhabagin et al. 2019; YA004185), in which Kazakhs from Eastern Kazakhstan are represented by only 17 samples. Most of the haplotypes presented in YHRD were obtained using the 17 Y-STR system (Khussainova et al. 2021).
Population studies of the Central (Balanovsky et al. 2015), Western (Zhabagin et al. 2021), Southern (Zhabagin et al. 2020), and Eastern (Tarlykov et al. 2013) regions were also limited to only 17 Y-STR markers. The population sample size of previously studied Eastern Kazakhs was only 67, the smallest among all regions. Furthermore, a low degree of genetic diversity was observed for 17 Y-STR data from Eastern Kazakhs (Tarlykov et al. 2013).
Eastern Kazakhstan occupies a large area (401.8 thousand km²) in the east of the country and borders China and Russia. Along the border stretches the Altai mountain range, where the steppes on both sides are a historical corridor of bilateral latitudinal migrations and the mixing of Asians and Europeans (González-Ruiz et al. 2012). Recent studies of the ancient population of this region, the Scythians, demonstrate their high genetic diversity (Gnecchi-Ruscone et al. 2021). Several caravan routes to the Dzungarian Gates, a natural passageway between the Dzungarian Alatau and the Barlyk Range, went through the area. The Great Silk Road also went through this area. Given these connections, it is reasonable to expect a large genetic diversity in the population of this region. Most of the people in the area are Kazakhs, especially from the Naiman and Kerei tribes (Zhabagin et al. 2018).
Thermo Fisher's Yfiler Plus, which was used in the current study, allows for the characterisation of 27 Y-STR loci, including nine rapidly mutating ones, increasing the discrimination power of the analysis (Ballantyne et al. 2010; Ballantyne et al. 2012). This is especially relevant for the Kazakh population, which consists of tribes and clans. Members of such patrilineal groups are closely related to each other. Consequently, it is important for forensic genetics to distinguish between them. Therefore, this study will contribute to population genetics research in East Kazakhstan and analyse the genetic links between the Eastern Kazakh population and other Kazakh groups.

Sample
For this study, 246 saliva samples were collected from healthy Kazakh male volunteers whose direct family had been resident in the Eastern Kazakhstan region for at least three generations. In addition, the volunteers had been unrelated to one another for at least three generations. All participants in this study provided their written informed consent prior to sample collection. The study was approved by the Ethics Committee of the National Centre for Biotechnology (No. 2 of 1 August 2019) and the Ethics Committee of the Asfendiyarov Kazakh National Medical University for the M. Aitkhozhin Institute of Molecular Biology and Biochemistry (No. 6 of 29 October 2012). All experimental procedures were performed following the standards of the Declaration of Helsinki 1964.

DNA extraction and Y-STR fragment analysis
Genomic DNA was isolated from the saliva samples using the Wizard Genomic DNA Purification Kit (Promega Corporation, Madison, WI, USA). Amplification of 27 STR loci of the Y chromosome was performed using the Yfiler® Plus PCR Kit (Thermo Fisher Scientific, Waltham, MA, USA) on a SimpliAmp Thermal Cycler (Thermo Fisher Scientific, Waltham, MA, USA). Fragment analysis of the amplicons was performed on an 8-capillary Applied Biosystems 3500 genetic analyser (Thermo Fisher Scientific, Waltham, MA, USA). The results were then analysed using the GeneMapper ID-X v.1.5 software (Thermo Fisher Scientific, Waltham, MA, USA) according to the reference allelic ladder.

Data management and statistical analysis
The haplotype data were submitted to the YHRD (http://www.yhrd.org) with the accession number YA006011. To contribute to the haplotype data, the laboratories passed the Quality Control Test of the YHRD (YC000343).
The direct counting approach was utilised to determine the allele and haplotype frequencies in the Eastern Kazakh population. The number of analysed samples ($n$) and the frequency of the $i$-th allele or haplotype ($p_i$) were used to calculate the gene diversity (GD) and haplotype diversity (HD) using the formula $\frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$ (Nei and Tajima 1981).

Haplotype/allele frequencies and forensic parameters
Supplementary Table S1 shows the haplotype distribution of Kazakhs from Eastern Kazakhstan. The 246 samples yielded 207 different 27 Y-STR haplotypes, the frequencies of which are presented in Table S2. Of these, 186 haplotypes (90%) were unique, while the other 21 (10%) were observed at least twice. The most prevalent haplotype appeared in 10 individuals, followed by another that was detected in 5 individuals; 7 haplotypes were each shared by 3 individuals; and 12 haplotypes appeared twice. Haplotype diversity (HD), discrimination capacity (DC), and haplotype match probability (HMP) were equal to 0.997, 0.841, and 0.007, respectively (Table 1).
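The reported forensic parameters can be checked directly against the haplotype counts given above. The following is a minimal sketch, not the authors' own pipeline: it assumes the conventional definitions DC = (number of distinct haplotypes)/n and HMP = Σ p_i², which are not spelled out in this section, and applies the Nei and Tajima (1981) diversity formula. The function name `forensic_parameters` and the `spectrum` dictionary are illustrative names introduced here.

```python
# Haplotype frequency spectrum taken from the text:
# {number of individuals sharing a haplotype: number of such haplotypes}
spectrum = {1: 186, 2: 12, 3: 7, 5: 1, 10: 1}


def forensic_parameters(spectrum):
    """Compute HD, DC and HMP from a haplotype frequency spectrum.

    Assumed definitions (standard conventions, not stated in the source):
      HMP = sum(p_i^2)
      HD  = n / (n - 1) * (1 - sum(p_i^2))   (Nei and Tajima 1981)
      DC  = distinct haplotypes / n
    """
    # Total number of individuals and of distinct haplotypes
    n = sum(size * count for size, count in spectrum.items())
    k = sum(spectrum.values())
    # Sum of squared haplotype frequencies, obtained by direct counting
    sum_p2 = sum(count * (size / n) ** 2 for size, count in spectrum.items())
    return {
        "n": n,
        "haplotypes": k,
        "HD": n / (n - 1) * (1 - sum_p2),
        "DC": k / n,
        "HMP": sum_p2,
    }


params = forensic_parameters(spectrum)
for name, value in params.items():
    print(name, round(value, 3))
```

Under these assumptions the spectrum sums to 246 individuals and 207 haplotypes, and the three parameters round to the values quoted in the text (0.997, 0.841, and 0.007).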
Previously, low values of haplotype diversity (HD = 0.629) and discrimination capacity (DC = 0.338) on 17 Y-STR were reported for Eastern Kazakhs (Tarlykov et al. 2013). Furthermore, Tables S3 and S4 contain information about the 27 Y-STR allele frequencies and the corresponding gene diversity (GD). Overall, at single-copy loci (Table S3), 153 alleles were detected with their frequencies varying from 0.041 to 0.821. DYS391 was the least polymorphic locus, with only three alleles observed and GD equal to 0.31, while the most polymorphic among single-copy loci was DYS449, with 13 allele variants and GD = 0.83. On the other hand, 37 and 29 allele combinations were detected at the two multi-copy loci DYF387S1 and DYS385, respectively, with gene diversities of 0.92 and 0.84 (Table S4).
Abnormal alleles are presented in Table S5. Overall, three intermediate alleles, 15 samples with double alleles, and 5 samples with triallelic patterns were observed at loci DYS458, DYS19, and DYF387S1, respectively. Null alleles were present at loci DYS448 and DYS481.

Conclusion
This is the first study characterising the genetic polymorphism of the Eastern Kazakh population using a comprehensive 27 Y-STR loci system. This allowed for the addition of 27 Y-STR haplotypes from Eastern Kazakhs to the Kazakhstan National Y-chromosome Haplotype Database. The generated population haplotype data have a higher genetic diversity and better discrimination capacity in comparison with the previous study, allowing the usage of the data for forensic purposes. Furthermore, population comparisons indicated the genetic relatedness of Eastern Kazakhs and Kazakhs from Xinjiang, China. The sample of Kazakhs from East Kazakhstan characterised in this study is new and differs from the previous sample (Tarlykov et al. 2013), as shown using Rst and MDS. However, more research is needed to elucidate the complex population structure of Eastern Kazakhs.

Table S7.
The greatest genetic distance was detected between Eastern Kazakhs and Kazakhs from Altai, Xinjiang, China (Rst = 0.2726) and Gansu, China (Rst = 0.2724), while there was a minimal distance between Eastern Kazakhs and Kazakhs from Xinjiang, China (Rst = 0.0079). It should be noted that the previously studied sample of Eastern Kazakhs (Tarlykov et al. 2013) was not close (Rst = 0.1713) to the Eastern Kazakhs of this study. This indicates the existence in East Kazakhstan of population groups of different paternal origins, which corresponds to the settlement of at least two large clans in East Kazakhstan: Naiman and Kerei. Much closer to the Eastern Kazakhs of this study are the Kazakhs of the North (Rst = 0.1163) and the South (Rst = 0.0612).