Genetic polymorphisms of 19 X-STRs in populations of Hubei Han and Guangxi Zhuang and their comparisons with 13 other Chinese populations

Abstract
Background: A prerequisite for applying short tandem repeat (STR) kits is obtaining population allele and/or haplotype frequencies and forensic parameters.
Aim: Firstly, we aimed to investigate the population data of 19 X-chromosomal STRs (X-STRs) included in the AGCU X19 STR kit in the Han people residing in Hubei Province, Central China, and the Zhuang people residing in the Guangxi Zhuang Autonomous Region of South China. Furthermore, we compared these population data with those for other Chinese populations.
Subjects and methods: In total, 509 unrelated Han males and 266 unrelated Zhuang males were genotyped using the AGCU X19 STR kit. Allele frequencies, haplotype frequencies, and forensic parameters were computed, and genetic differences among 15 Chinese populations were analysed.
Results: The 19 X-STRs showed a high power of discrimination and high mean chance of exclusion, whether calculated using allele or haplotype frequencies. Major differences were found between Han and Oroqen, Uyghur, Mongolian, Tibetan, Li, and Yi populations. Aberrant biallelic patterns at DXS10159, DXS10134, and DXS10079 and allelic dropouts at DXS10164 were observed.
Conclusion: The 19 X-STRs were highly polymorphic in the Hubei Han and Guangxi Zhuang populations, and the AGCU X19 STR kit was shown to be suitable for forensic casework.

Hubei Province is in Central China and covers an area of 185,900 km² (Figure 1). The population of Hubei Province is approximately 58 million people comprising 56 ethnic groups, of which the Han Chinese account for >95.0%. The Zhuang are China's largest minority group, with a population of more than 19 million people (China Statistical Yearbook 2021, http://www.stats.gov.cn/tjsj/ndsj/2021/indexch.htm, accessed 17 April 2022). An estimated 80.6% of Zhuang people live within the Guangxi Zhuang Autonomous Region of Southern China (Guangxi Zhuang) (Figure 1). The remainder have settled in Yunnan, Guangdong, Guizhou, and Hunan Provinces. The Zhuang people are the aboriginal inhabitants of Guangxi, with a history dating back to the Neolithic period (8.0-12.0 ka BP) (IA CASS 2003). They also have their own writing system (Sawndip) and a native language belonging to the Tai-Kadai language group.
To the best of our knowledge, genetic data have rarely been reported for the Han people residing in Hubei Province (Hubei Han) or for Guangxi Zhuang (Liu et al. 2022). Investigating the genetic data of these two populations will provide more complete evidence that there is no considerable genetic difference between Hubei Han and the Han people residing in other parts of China, and will help to determine whether there are substantial genetic differences between these two populations and other ethnic groups. The present study explored the genetic polymorphisms of 19 X-STRs in Hubei Han and Guangxi Zhuang and analysed the genetic differences among 15 Chinese populations.

Sample
This study was approved by the Ethics Committee of Tongji Medical College, Huazhong University of Science and Technology, China (Approval Number: Tongji-2019-IEC-S160). After obtaining informed consent, buccal swabs or dried blood spots were collected from 509 unrelated Hubei Han males and 266 unrelated Guangxi Zhuang males. All Zhuang donors were indigenous people whose ancestors had inhabited the Guangxi Zhuang Autonomous Region for at least three generations. Similarly, all Hubei Han donors were local residents, with their extended families originating from Hubei Province. From each individual, buccal swabs were scraped from the inner cheek, gum, and sublingual areas, and whole blood was collected via venipuncture and deposited on FTA cards. There are two main reasons why only males were included. First, numerous previous studies have shown no considerable difference in the allele frequencies of the studied X-STRs between males and females (Yang et al. 2016; He et al. 2017; Chen L, Guo, et al. 2018; Chen P, He, et al. 2018; He et al. 2018; Li et al. 2019; He et al. 2020; Luo et al. 2020; Liu et al. 2022). Second, we aimed to investigate the haplotype frequencies of the X-STRs in the seven defined linkage groups, which can be observed directly in males because they carry a single X chromosome. If several X-STRs are linked, then haplotype frequencies rather than allele frequencies should be used in forensic human identification and kinship analysis. In addition, previous studies have analysed males alone (Deng et al. 2017; He et al. 2020).

Data collection
DNA extraction, PCR, and fragment analysis were performed as described by Xiao et al. (2020). In brief, genomic DNA was isolated from the samples using the Chelex-100 method. Multiplex PCR was then performed using an AGCU X19 STR kit (AGCU ScienTech, Wuxi, China) according to the manufacturer's recommendations in a 2720 thermal cycler (Applied Biosystems, CA, USA). PCR products were subsequently separated through capillary electrophoresis on an ABI 3130 Genetic Analyser and finally analysed using GeneMapper v3.2 software (Applied Biosystems, CA, USA). Bidirectional Sanger sequencing was used to analyse allelic dropouts at the DXS10164 locus using the procedure described by Yu et al. (2016). The primers for PCR and Sanger sequencing were as follows: 5′-GCC TAG GCA AGT TTT CAA GAG TAG-3′ (Forward) and 5′-CCT AAA CAA CCA AAG CAA CTC AAC-3′ (Reverse).

Data management and statistical analysis
Allele and haplotype frequencies were calculated using an in-house R script (https://www.r-project.org/). For biallelic patterns, the most frequent allele was selected for calculation, and dropout alleles were restored using Sanger sequencing. Linkage disequilibrium tests were performed using Arlequin ver. 3.5.2.2 software (Excoffier & Lischer, 2010). Polymorphism information content (PIC), power of exclusion, power of discrimination in females and males, and mean exclusion chances (MEC Krüger, MEC Kishida, MEC Desmarais, and MEC Desmarais Duos) were all computed using the online tool (https://www.chrxstr.org/xdb/calculate.jsf) in the ChrX-STR.org 2.0 database. Haplotype diversity was estimated using the formula $\frac{n}{n-1}\left(1-\sum_i p_i^2\right)$, where $n$ and $p_i$ represent the sample size and the frequency of the $i$th haplotype, respectively. The allele frequencies of the same 19 X-STRs in the other 13 Chinese populations were downloaded from previously published reports (He et al. 2017; Chen L, Guo, et al. 2018; Chen P, He, et al. 2018; He et al. 2018; Han et al. 2019; Li et al. 2019; He et al. 2020; Luo et al. 2020; Liu et al. 2022) after a reliability assessment was completed (i.e. the frequency statistics were determined to be error-free). The exact test of population differentiation between the 15 populations was then conducted locus by locus using allele frequencies in Arlequin ver. 3.5.2.2 software. Nei's standard genetic distances between the 15 Chinese populations were analysed and subsequently visualised as a phylogenetic tree using POPTREE2 (Takezaki et al. 2010). A multidimensional scaling (MDS) plot was also constructed using R ver. 4.1.2.
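The frequency and haplotype diversity calculations described above can be illustrated with a minimal Python sketch. It is not the in-house R script used in the study, which is not reproduced here; the function names and the example haplotypes are hypothetical and serve only to show direct counting in male samples and the diversity formula.

```python
from collections import Counter


def frequencies(observations):
    """Relative frequencies of alleles or haplotypes by direct counting."""
    counts = Counter(observations)
    n = len(observations)
    return {key: count / n for key, count in counts.items()}


def haplotype_diversity(observations):
    """Diversity estimated as n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are required")
    freqs = frequencies(observations)
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))


# Hypothetical haplotypes of one three-locus linkage group,
# one haplotype per male (males carry a single X chromosome).
sample = [
    ("21", "28", "10"),
    ("21", "28", "10"),
    ("22", "29", "11"),
    ("23", "30", "12"),
]

print(frequencies(sample))
print(haplotype_diversity(sample))
```

Because each male contributes exactly one haplotype, no phasing step is needed in this sketch; female genotypes would require haplotype inference before counting.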

Allele frequencies and forensic parameters
The recorded allele frequencies are presented in Tables 1 and S1, which show that allele numbers ranged from 5 to 22 in Hubei Han and from 4 to 19 in Guangxi Zhuang, and that off-ladder alleles were observed. Among the 171 pairwise comparisons, two pairs (DXS10103-DXS10101 and DXS10103-HPRTB) in Hubei Han (Table S2) and one pair (DXS10103-DXS10101) in Guangxi Zhuang (Table S3) were found to be in significant linkage disequilibrium after Bonferroni correction (p < 0.05/171 = 0.0002924).
The forensic statistical parameters calculated based on allele frequencies are listed in Tables 2, S4, and S5. PIC values were 0.4227-0.9052 in Hubei Han and 0.4265-0.9095 in Guangxi Zhuang, with DXS10135 being the most informative marker. With the exception of DXS10103, the combined values of PDM, MEC Desmarais, and MEC Desmarais Duos were greater than 0.999,999,99 for both Hubei Han and Guangxi Zhuang (Table 2).
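The Bonferroni threshold quoted above follows from the number of locus pairs among 19 loci. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the linkage disequilibrium tests themselves were run in Arlequin and are not reimplemented here.

```python
from math import comb


def bonferroni_threshold(n_loci, alpha=0.05):
    """Return the number of locus pairs and the per-test significance level."""
    n_pairs = comb(n_loci, 2)  # unordered pairs of loci
    return n_pairs, alpha / n_pairs


n_pairs, threshold = bonferroni_threshold(19)
print(n_pairs, round(threshold, 7))
```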
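As a hedged illustration of how such parameters are obtained from allele frequencies, the Python sketch below uses commonly cited formulas for PIC and for the power of discrimination at X-chromosomal markers, and combines per-locus values under an assumption of independence. The study used the ChrX-STR.org online tool; it is an assumption that these formulas match that tool, and the function names and example frequencies are hypothetical.

```python
from math import prod


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    # The double sum over i<j equals s2^2 - s4.
    return 1 - s2 - (s2 ** 2 - s4)


def pd_male(freqs):
    """Power of discrimination in males (hemizygous): 1 - sum(p_i^2)."""
    return 1 - sum(p ** 2 for p in freqs)


def pd_female(freqs):
    """Power of discrimination in females: 1 - 2 * (sum p_i^2)^2 + sum p_i^4."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 1 - 2 * s2 ** 2 + s4


def combined(values):
    """Combine per-locus values, assuming the loci are independent."""
    return 1 - prod(1 - v for v in values)


# Hypothetical allele frequencies at one locus.
example = [0.5, 0.5]
print(pic(example), pd_male(example), pd_female(example))
print(combined([pd_male(example), pd_male(example)]))
```

The independence assumption in `combined` does not hold for loci in the same linkage group, which is why linked markers are evaluated through haplotype frequencies instead.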

Haplotype frequencies and forensic parameters
Haplotype frequencies of the seven predefined linkage groups are shown in Tables S6 and S7. Their haplotype diversity values ranged from 0.9321 to 0.9966 in Hubei Han and from 0.9296 to 0.9948 in Guangxi Zhuang, with DXS10148-DXS10135-DXS8378 being the most polymorphic cluster. The combined values of PDM, MEC Desmarais, and MEC Desmarais Duos calculated based on haplotype frequencies were greater than 0.999,999,999 for both Hubei Han (Table 3) and Guangxi Zhuang (Table 4).

Inter-population comparisons
The allele frequencies of each locus in the Hubei Han and Guangxi Zhuang populations were compared with those of the same 19 X-STR loci in 13 previously published populations: the Heilongjiang Daur (Liu et al. 2022) (Han et al. 2019), and Guizhou Tujia (Luo et al. 2020) populations. After Bonferroni correction (significance level = 0.05/105), no significant differences were found at any of the 19 loci between the Hubei Han and the Heilongjiang Daur, Hainan Han, Wuzhong Hui, Guizhou Han, Guizhou Miao, and Guizhou Tujia populations. A significant difference was found at the HPRTB locus between the Hubei Han and Guangxi Zhuang populations, which may have been caused by sampling error. In addition, significant differences were observed at four or more loci between the Hubei Han population and the other six populations. There were no significant differences at the 19 loci between the Guangxi Zhuang and the Hainan Han, Hainan Li, Guizhou Miao, and Guizhou Tujia populations, but there were differences at one locus between the Guangxi Zhuang and the Heilongjiang Daur or Wuzhong Hui population, as well as differences at three or more loci between the Guangxi Zhuang population and the six remaining populations.
The phylogenetic tree based on Nei's genetic distances (Table S8) and the MDS plot based on Fst genetic distances were used to examine the population genetic similarities and distances between the two Chinese populations in this study and the 13 Chinese populations from previous studies (Figure 1). These plots showed that there were few differences between the Hubei Han and the Han populations living in other regions of China, as well as several ethnic minorities such as Guangxi Zhuang, Guizhou Tujia, Wuzhong Hui, and Guizhou Miao. However, there were major differences between the Han populations and ethnic minorities including Heilongjiang Oroqen, Xinjiang Uyghur, Xinjiang Mongolian, Dujiangyan Tibetan, Hainan Li, and Sichuan Yi (Figure 2). The close distances between some ethnic minorities and the Han population may have been due to a higher rate of intermarriage with the Han people.
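For readers unfamiliar with Nei's standard genetic distance, the following Python sketch shows the textbook form, D = -ln(J_XY / sqrt(J_X J_Y)), computed from per-locus allele frequencies. It is an illustration only: the study used POPTREE2, no sample-size bias correction is applied here, and the function name, data layout, and example frequencies are hypothetical.

```python
from math import log, sqrt


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard distance from {locus: {allele: frequency}} mappings."""
    loci = pop_x.keys() & pop_y.keys()
    if not loci:
        raise ValueError("the populations share no loci")
    jx = jy = jxy = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        # Alleles absent from one population contribute zero to the cross term.
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        raise ValueError("no shared alleles; the distance is undefined")
    # Averaging over loci cancels in the ratio, so sums are used directly.
    return -log(jxy / sqrt(jx * jy))


# Hypothetical allele frequencies at a single locus in two populations.
pop_a = {"LOCUS1": {"10": 0.5, "11": 0.5}}
pop_b = {"LOCUS1": {"10": 0.9, "11": 0.1}}

print(nei_standard_distance(pop_a, pop_b))
```

A matrix of such pairwise distances is the kind of input from which a Neighbour-Joining tree can be built.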

Comment
In this study, we obtained the genetic data of 19 X-STRs from the Hubei Han and Guangxi Zhuang populations. An MEC of >0.999,999,9 and a PD of >0.999,999,999,99 both indicated high applicability of the AGCU X19 STR kit in forensic human identification and kinship testing in these populations. These data can serve as important supplements to the national X-STR database. In addition, we provide more complete evidence that there is no substantial genetic difference in the 19 X-STRs among the Han people residing in several different regions of China; we also reveal the differences in frequency distribution, genetic similarity, and genetic distance among the two reported populations and 13 other ethnic groups. Furthermore, greater attention should be paid to abnormal genotypes in the future to generate more accurate expert opinions. Although this study did not include samples from females, this does not affect the conclusion, as is evident from the results of previous studies. In the future, we will add female samples to overcome this shortcoming.

Figure 1. Geographical locations of Hubei Han (red), Guangxi Zhuang (blue), and the 13 other Chinese populations included in the population comparison. The histogram indicates the sample size of each Chinese population.

Figure 2. Genetic differences between Hubei Han, Guangxi Zhuang, and the 13 other Chinese populations. Differentiations are shown by a Neighbour-Joining tree (A) based on pairwise Nei's genetic distances and a multidimensional scaling plot (B) based on pairwise Fst genetic distances.

Figure 3. Allelic dropouts at the DXS10164 locus. Electropherograms show allelic dropouts at DXS10164 for the samples 1098-3 (A) and 1018-3 (B). (C) DNA forward sequence assembly reveals dropouts of allele 10 [-(ATTCT)10-] at DXS10164 for both samples. A point mutation (rs182880143: G/A, indicated by black arrow) located upstream of the core repeat region is found after aligning the sequences of the control sample, which also has a genotype of 10 at DXS10164, and the two samples with allelic dropouts. Detailed sequencing results can be found in Figure S1.

Table 2. Combined forensic parameters calculated using allele frequencies of 19 X-STR loci in the Hubei Han (n = 509 males) and Guangxi Zhuang (n = 266 males) populations.

Table 3. Forensic statistical parameters of seven linkage groups in the Hubei Han population (n = 509 males).

Table 4. Forensic statistical parameters of seven linkage groups in the Guangxi Zhuang population (n = 266 males). Note: polymorphism information content (PIC), power of exclusion (PE), power of discrimination in females (PDF) and in males (PDM), mean exclusion chance (MEC).