Forensic efficiency evaluation of a novel multiplex panel of InDels and STRs in the Guizhou Han population and its phylogenetic relationships with other reference populations

Abstract
Background
Insertion/deletion polymorphism (InDel), as the third genetic marker, has attracted considerable attention from forensic geneticists because it is widely distributed in the human genome, yields small amplicons, and has a low mutation rate. However, the existing InDel panels have only been viewed as supplemental tools for kinship analyses. In addition, these panels are not conducive to mixture deconvolution because the InDels in them mainly display two alleles.
Aims
The purpose of this study is to investigate genetic distributions of a novel panel of InDels and STRs in the Guizhou Han population; assess the forensic application value of the panel; and conduct population genetic analyses of the Guizhou Han and other reference populations based on the overlapping loci.
Subjects and methods
Bloodstain samples of 209 Guizhou Han individuals were collected and genotyped with the novel panel. Allelic frequencies and forensic parameters of two miniSTRs and 59 InDels in the panel were estimated. In addition, we assessed phylogenetic relationships among the Guizhou Han and other reference populations by principal component analysis, DA genetic distance, and a neighbor-joining tree.
Results
A total of 139 alleles at 61 loci were observed in the Guizhou Han population. Polymorphic information content values of the 59 InDels were greater than 0.3 in the Guizhou Han population. The cumulative power of discrimination and probability of exclusion of two miniSTRs and 59 InDels in the Guizhou Han population were 0.999999999999999999999999997984 and 0.9999986, respectively. Principal component analysis of 14 populations showed that the Guizhou Han population was located closer to the Hunan Han and Southern Han Chinese (CHS) populations. Similar results were also discerned from DA genetic distances and the neighbor-joining tree.
Conclusion
To sum up, the novel panel could be employed for forensic personal identification and paternity testing in the Guizhou Han population as a promising independent tool. In addition, the principal component analysis and phylogenetic tree of the Guizhou Han and other compared populations revealed that the Guizhou Han population possesses close genetic affinities with the Hunan Han, CHS, and Han Chinese in Beijing (CHB) populations.


Introduction
Insertion/deletion polymorphism (InDel), as the third genetic marker, is a length polymorphism that results from the insertion or deletion of random DNA fragments (Weber et al. 2002). Analogous to single nucleotide polymorphisms (SNPs), InDels have the advantages of extensive distribution in the human genome, small amplicons, and a low mutation rate (LaRue et al. 2012). More intriguingly, InDels are compatible with conventional capillary electrophoresis (CE) detection technology because their alleles differ in length, which has attracted considerable attention from forensic researchers. For example, Qiagen (Qiagen, Hilden, Germany) developed a multiplex amplification panel of 30 InDels for forensic individual identification, and a large amount of population data for these 30 InDels has been reported so far (Jian et al. 2019; Haidar et al. 2020; Shahzad et al. 2020; Zou et al. 2020; Al-Snan et al. 2021; Xie et al. 2022). However, some of these 30 InDels exhibited relatively low genetic diversities in Chinese populations (Wei et al. 2014; Xie et al. 2018; Jian et al. 2019), which was unfavourable for forensic individual identification and paternity testing. Accordingly, forensic workers in China began to select InDels showing relatively high genetic diversities in Chinese populations and to develop multiplex amplification systems for forensic research using the CE technology (Chen et al. 2019; Jin et al. 2019; Huang et al. 2020; Jin et al. 2021). Nonetheless, these panels provided relatively limited genetic information for paternity analysis and could only be regarded as supplementary systems for forensic kinship testing in comparison to commonly used short tandem repeat (STR) kits. In addition, these InDels are poorly suited to mixture deconvolution because they are biallelic, especially for mixed samples consisting of more than two contributors.
In a previous study, Liu et al. constructed a novel six-dye labeled panel including two miniSTRs, 59 autosomal InDels, and three sex-determining loci using the CE technology. Furthermore, they conducted the developmental validation of the panel in accordance with the guidelines of the Scientific Working Group on DNA Analysis Methods. The results demonstrated that the panel not only showed robust stability, good sensitivity, and species specificity, but also contributed to detecting mixture and outdated samples (Liu, Mei, et al. 2022). They further evaluated the forensic efficiency of the novel panel in Hunan Han (HNH), Tibet Tibetan (TT), and Qinghai Tibetan (QHT) populations, and found that the system can be used as an independent and valuable tool for forensic identification of individuals and paternity analyses (Liu, Mei, et al. 2022; Liu, Cui, et al. 2022). Even so, few population data for the panel in Chinese populations have been reported so far. Therefore, it is necessary to investigate the genetic distribution of these loci and evaluate their forensic application value in other Chinese populations.
In the current study, we aimed to address three issues: (1) investigating genetic distributions of the novel panel in the Guizhou Han (GZH) population; (2) evaluating forensic efficiency of the panel in the GZH population; (3) exploring genetic relationships of the GZH and other reference populations from Asia based on the overlapping loci.

Sample information
Two hundred and nine bloodstains of GZH individuals were collected after obtaining their written informed consent. The individuals included in this study are unrelated healthy persons whose families have lived in Guizhou province for at least three generations. Our study was conducted following the guidelines of Guizhou Medical University and was authorised by the Ethics Commission of Guizhou Medical University (approval no. 2021-224).

Loci information, PCR, and allele detection
The detailed information and selection criteria of the 64 loci were given in the previous study (Liu, Mei, et al. 2022). Since the novel panel is a direct amplification kit, we punched a 1.0 mm² disc from each sample and added it to the PCR cocktail, which included 2 μL Master Mix (HEALTH Gene Technologies, Ningbo, China), 2 μL Primer Mix, and 4 μL nuclease-free water. Next, the mixture was used to perform the multiplex amplification of the 64 loci on the GeneAmp PCR System 9700 (Applied Biosystems, Foster City, CA, United States) with the following parameters: 95 °C for 5 min; 2 cycles of 94 °C for 10 s and 63 °C for 90 s; 25 cycles of 94 °C for 10 s and 60 °C for 90 s; final elongation at 60 °C for 15 min; and then storage at 4 °C. After this step, 1 μL of PCR product was mixed with 0.5 μL SIZE-500 (HEALTH Gene Technologies) and 8.5 μL Hi-Di deionised formamide. Finally, the mixture was separated and detected using the GeneMapper ID-X Software v1.5 (Thermo Fisher Scientific, Foster City, CA, USA). We determined the allelic typing of each locus by comparison with the allelic ladder of the panel.

Reference populations
Previously reported HNH, TT, QHT, CDX, CHB, CHS, JPT, KHV, BEB, GIH, ITU, PJL, and STU populations (Donnelly et al. 2015; Liu, Cui, et al. 2022; Liu, Mei, et al. 2022) were used as the reference populations to explore their phylogenetic relationships and genetic distances with the studied GZH population. General information on these populations is listed in Supplementary Table 1.

Statistical analysis
Allelic frequencies and forensic parameters of two miniSTRs and 59 InDels in the GZH population were computed with the STRAF software v1.0.5 (Gouy and Zieger 2017). Moreover, p values of Hardy-Weinberg equilibrium (HWE) tests for these 61 loci and linkage disequilibrium (LD) analyses were also obtained using STRAF. A heatmap of insertion allele frequencies of the 59 InDels in the GZH and 13 reference populations was plotted with the TBtools software v1.072 (Chen et al. 2020). Principal component analysis (PCA) of these 14 populations was performed using the factoextra package v1.0.7.999 of R software v4.1.0. In addition, we calculated DA genetic distances among these 14 populations using the DISPAN package (https://mybiosoftware.com/dispan-genetic-distance-phylogenetic-analysis.html). Finally, a neighbor-joining tree of these populations was plotted using the MEGA software (Kumar et al. 2018) based on pairwise DA values.

Results and discussion
HWE and LD analyses of two miniSTRs and 59 InDels in the GZH population
P-values of HWE testing for two miniSTRs and 59 InDels are given in Supplementary Table 2. Results showed that these 61 loci conformed to HWE in the GZH population after applying Bonferroni correction (p = 0.05/61 = 0.00082).
As presented in Supplementary Table 3, we found that all pairwise loci were in line with linkage equilibrium after applying Bonferroni correction (p = 0.0000273). Therefore, we proposed that these loci could be regarded as independent of each other in the GZH population.
Allelic frequencies and forensic-related parameters of 61 loci in the GZH population
As shown in Figure 1(A) and Supplementary Table 4, a total of 139 alleles for these 61 loci were observed in the GZH population. For the 59 InDels, insertion allelic frequencies ranged from 0.3517 to 0.6722. In a previous study, LaRue et al. pointed out that bi-allelic genetic markers could be used for forensic personal identification once their minor allelic frequencies were greater than 0.2 (LaRue et al. 2014). We found that minor allelic frequencies of these 59 InDels in the GZH population were greater than 0.2, indicating that these loci could be utilised for forensic individual identification. In addition, eight and 13 alleles were observed at the D3S1358 and D1S1656 loci, respectively.
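For a biallelic InDel, the minor allelic frequency is the smaller of the insertion frequency and its complement, so the reported range of insertion frequencies bounds the minor allelic frequency of every InDel. The sketch below illustrates this check in Python. The function name is hypothetical, and the code assumes that all 59 InDels are strictly biallelic, which is consistent with the allele counts above (139 - 8 - 13 = 118 = 59 × 2).

```python
def minor_allele_frequency(insertion_freq: float) -> float:
    """Minor allele frequency of a biallelic InDel from its insertion frequency."""
    if not 0.0 <= insertion_freq <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    return min(insertion_freq, 1.0 - insertion_freq)


# Endpoints of the insertion-frequency range reported for the 59 InDels
lowest_insertion, highest_insertion = 0.3517, 0.6722

# Every InDel lies inside this range, so its MAF is at least the smaller endpoint MAF
maf_lower_bound = min(
    minor_allele_frequency(lowest_insertion),
    minor_allele_frequency(highest_insertion),
)

# Sanity check on the allele counts: 59 biallelic InDels plus the two STRs
total_alleles = 59 * 2 + 8 + 13

print(f"Lower bound on MAF: {maf_lower_bound:.4f}")
print(f"Total alleles: {total_alleles}")
```

The lower bound comes from the highest insertion frequency (1 - 0.6722), and it stays above the 0.2 threshold cited from LaRue et al.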
Forensic-related parameters, including polymorphic information content (PIC), matching probability (PM), power of discrimination (PD), observed heterozygosity (Ho), expected heterozygosity (He), probability of exclusion (PE), and typical paternity index (TPI), of the 61 loci in the GZH population are shown in Figure 1(B) and Supplementary Table 2. Ho ranged from 0.3923 (rs10649306) to 0.8325 (D1S1656), and He ranged from 0.4417 (rs3830338) to 0.8216 (D1S1656). In a previous study, Botstein et al. proposed that PIC could be used to assess genetic polymorphism: genetic markers with PIC values greater than 0.5 are highly informative, and genetic markers with PIC values greater than 0.25 are reasonably informative (Botstein et al. 1980). In the present study, we found that the PIC values of all loci were greater than 0.3, reflecting that these loci possessed reasonable genetic information content in the GZH population. Even so, we observed that He values of the 59 InDels were lower than those of the two STRs. We inferred that InDels displayed lower genetic diversities than STRs because they mainly displayed two alleles. Therefore, InDels with more than two alleles could be selected in the future to obtain higher genetic polymorphism. The mean PM, PD, PE, and TPI of these loci were 0.3741, 0.6259, 0.1929, and 1.0307, respectively. Cumulative PM, PD, and PE values of these 61 loci in the GZH population were 2.0156 × 10⁻²⁷, 0.999999999999999999999999997984, and 0.9999986, respectively. We also compared the forensic efficiencies of these 61 loci with previously reported InDels and STRs in the GZH population (Yang et al. 2017; Liu et al. 2020), as given in Table 1. We found that these 61 loci possessed higher cumulative PD and PE values than the 30 InDels. Moreover, these 61 loci showed slightly higher cumulative PD values than the 22 STRs, implying that they could also be employed for forensic individual identification in the GZH population, especially for highly degraded samples. Even though the cumulative PE value of these 61 loci was lower than that of the 22 STRs, they were still viewed as valuable loci for paternity analyses in the GZH population given that their cumulative PE value was greater than 0.9999. In addition, these loci might provide more valuable information than commonly used STRs in some complex kinship cases, since STRs possess a higher mutation rate. In a nutshell, we propose that the novel panel could be employed for forensic personal identification and kinship testing in the GZH population.
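The per-locus parameters discussed above can be illustrated with the formulas commonly used in forensic genetics. The Python sketch below is a hedged illustration, not the STRAF implementation: it assumes Hardy-Weinberg proportions, omits any small-sample correction, and uses hypothetical function names. Whether STRAF computes each quantity in exactly this way is an assumption. The final lines use high-precision arithmetic to show how a cumulative PD follows from the cumulative PM of 2.0156 × 10⁻²⁷ reported in the text.

```python
from decimal import Decimal, getcontext
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content (Botstein et al. 1980)."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def matching_probability(freqs):
    """PM as the sum of squared genotype frequencies, assuming HWE."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2.0 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def probability_of_exclusion(het):
    """PE = h^2 * (1 - 2 * h * H^2), where H = 1 - h is the homozygosity."""
    hom = 1.0 - het
    return het ** 2 * (1.0 - 2.0 * het * hom ** 2)


def typical_paternity_index(het):
    """TPI = 1 / (2 * H), where H = 1 - h is the homozygosity."""
    return 1.0 / (2.0 * (1.0 - het))


# Hypothetical biallelic InDel with equal allele frequencies (not a study locus)
example = [0.5, 0.5]
he = expected_heterozygosity(example)
print("He :", he)
print("PIC:", pic(example))
print("PM :", matching_probability(example))
print("PD :", 1.0 - matching_probability(example))
print("PE :", probability_of_exclusion(he))
print("TPI:", typical_paternity_index(he))

# Cumulative PD = 1 - cumulative PM; Decimal avoids rounding to 1.0 in floating point
getcontext().prec = 40
cumulative_pm = Decimal("2.0156E-27")  # value reported in the text
cumulative_pd = Decimal(1) - cumulative_pm
print("Cumulative PD:", cumulative_pd)
```

A biallelic locus reaches its maximum PIC of 0.375 at equal allele frequencies, which illustrates why the InDels in the panel sit below the 0.5 threshold for highly informative markers while the two multi-allelic STRs can exceed it. The Decimal type is used because ordinary double-precision floats cannot represent a value this close to 1.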

Allelic frequency comparisons and PCA of the GZH and other reference populations
Based on the 59 overlapping InDels, we first compared allelic frequencies of these loci in the GZH and other reference populations, as shown in Figure 2(A). We found that most loci exhibited similar frequency distributions among these populations. In addition, we found that the rs3083268, rs35173752, and rs35464887 loci showed relatively low allelic frequencies in the BEB, GIH, ITU, PJL, and STU populations, whereas the rs60922184, rs35453727, and rs71698233 loci displayed relatively high allelic frequencies in these five South Asian populations. Nonetheless, minor allelic frequencies of the majority of loci were greater than 0.2 in these 14 populations, implying that these loci were also useful for forensic personal identification in these Asian populations.
Next, we conducted the PCA of these 14 populations based on allelic frequencies of the 59 shared InDels, as shown in Figure 2(B). PC1 and PC2 accounted for 66.8% and 11.6% of the total variance, respectively. Along PC1, these 14 populations could be classified into two clusters: the five South Asian populations (BEB, GIH, ITU, PJL, and STU) were located on the right side, and the remaining populations were situated on the left side. Along PC2, the two Tibetan populations could be separated from the remaining East Asian populations. Furthermore, we found that the studied GZH population was located closer to the HNH and CHS populations, indicating that they possessed similar allelic frequency distributions.
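The share of variance attributed to PC1 is the largest eigenvalue of the covariance matrix of allele frequencies divided by the total variance. The sketch below illustrates that idea in pure Python using power iteration. It is a simplified stand-in for the R/factoextra workflow used in the study, not a reproduction of it: the function name and the toy frequencies are hypothetical, loci are centred but not scaled (an assumption), and the input is assumed to contain some variation between populations.

```python
from math import sqrt


def first_pc_variance_share(freq_matrix, n_iter=500):
    """Fraction of total variance captured by PC1 (rows: populations, columns: loci)."""
    n_pops = len(freq_matrix)
    n_loci = len(freq_matrix[0])

    # Centre each locus on its mean frequency across populations
    means = [sum(row[j] for row in freq_matrix) / n_pops for j in range(n_loci)]
    centred = [[row[j] - means[j] for j in range(n_loci)] for row in freq_matrix]

    # Sample covariance matrix between loci
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n_pops - 1) for b in range(n_loci)]
        for a in range(n_loci)
    ]
    total_variance = sum(cov[j][j] for j in range(n_loci))

    # Power iteration converges to the leading eigenvector of the covariance matrix
    vec = [float(j + 1) for j in range(n_loci)]
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_loci)) for a in range(n_loci)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]

    # Rayleigh quotient of the unit vector gives the leading eigenvalue
    top_eigenvalue = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(n_loci))
        for a in range(n_loci)
    )
    return top_eigenvalue / total_variance


# Hypothetical insertion frequencies: 4 populations x 3 loci (not study data)
toy_freqs = [
    [0.40, 0.55, 0.60],
    [0.42, 0.50, 0.58],
    [0.60, 0.35, 0.45],
    [0.62, 0.30, 0.50],
]
print(f"PC1 share of variance: {first_pc_variance_share(toy_freqs):.3f}")
```

When populations differ along a single direction of allele-frequency change, PC1 captures nearly all of the variance. A large PC1 share, such as the 66.8% reported here, therefore indicates that one contrast dominates the differences among the populations.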
Genetic distances and phylogenetic tree of the GZH and other reference populations
DA genetic distances of the 14 populations are shown in Figure 3(A). We found that the studied GZH population had the lowest DA value with HNH, followed by the CHB and CHS populations. Conversely, the GZH population showed relatively high genetic distances from the five South Asian populations (>0.009). Finally, we plotted the phylogenetic tree based on pairwise DA values of these populations, as shown in Figure 3(B). Two branches could be discerned from the tree: the five South Asian populations formed one branch, and the remaining populations formed another. In addition, we found that the QHT and TT populations formed a sub-branch, which was consistent with the results of the PCA. The studied GZH population first clustered with HNH, and then with CHB, CHS, and the remaining East Asian populations.
A previous study based on genome-wide data revealed that Han populations from different regions showed genetic substructure, and that Han populations could be classified into northwest Han, northeast Han, central China Han, southwest Han, southeast Han, and south coast Han populations according to their genetic differentiation (Gao et al. 2020). In addition, some scholars proposed that southern Chinese Han populations may have closer genetic affinities with some minority groups residing in southern China, especially Tai-Kadai and Hmong-Mien groups (Chen et al. 2009; Xu et al. 2009). In this study, we found that the GZH population showed close genetic affinities with the HNH, CHS, and CHB populations, implying that the GZH population had low genetic divergence from these Han populations in comparison to other populations. As more genetic data for these 61 loci in other minority groups living in southern China are reported, we can better explore the genetic relationships between the GZH and these surrounding populations.
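The DA distance between two populations is commonly defined as one minus the average, over loci, of the summed square roots of the products of matching allele frequencies. The following Python sketch illustrates that definition. It is not the DISPAN implementation, the function name and data layout are hypothetical, and the frequencies are invented for illustration only.

```python
from math import sqrt


def da_distance(pop_x, pop_y):
    """Nei's DA distance: 1 - (1/L) * sum over loci of sum over alleles sqrt(x * y).

    Each population is a list with one dict per locus mapping allele -> frequency.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share the same non-empty set of loci")
    total = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        alleles = set(locus_x) | set(locus_y)
        # Alleles absent from one population contribute zero to the sum
        total += sum(
            sqrt(locus_x.get(a, 0.0) * locus_y.get(a, 0.0)) for a in alleles
        )
    return 1.0 - total / len(pop_x)


# Hypothetical insertion/deletion frequencies at two InDel loci (not study data)
pop_a = [{"ins": 0.50, "del": 0.50}, {"ins": 0.40, "del": 0.60}]
pop_b = [{"ins": 0.60, "del": 0.40}, {"ins": 0.45, "del": 0.55}]
print(f"DA distance: {da_distance(pop_a, pop_b):.4f}")
```

Identical frequency profiles give a distance of zero and populations with no shared alleles give a distance of one, so the small values reported among the East Asian populations reflect very similar allele frequencies at the shared InDels.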

Conclusion
We reported genetic data of two miniSTRs, 59 InDels, and three sex-determining loci in the GZH population. The novel panel could be used for forensic personal identification and kinship analysis in the GZH population. In addition, population genetic analyses of the GZH population and other compared populations showed that the GZH population had close genetic affinities with the HNH, CHB, and CHS populations.

Figure 1. Allelic frequencies (A) and forensic parameters (B) of 59 autosomal InDels and two STRs in the Guizhou Han population.

Figure 3. Genetic distance (DA) heatmap (A) and the phylogenetic tree (B) of Guizhou Han and other reference populations.

Table 1. Forensic efficiency comparisons of different kits in the GZH population.