A 30-InDel Assay for Genetic Variation and Population Structure Analysis of Chinese Tujia Group

In the present study, thirty autosomal insertion and deletion polymorphic loci were simultaneously amplified and genotyped in a multiplex system, and their allelic frequencies as well as several forensic parameters were obtained in a sample of 236 unrelated healthy Tujia individuals. All the loci were in Hardy-Weinberg equilibrium after applying a Bonferroni correction, and no pair of loci showed significant linkage disequilibrium. These loci were observed to be relatively informative and discriminating, and thus useful for forensic applications. Allelic frequencies of the 30 loci were compared between the Tujia group and other reference populations, and the results of analysis of molecular variance indicated that the Tujia group showed the fewest significant differences from the Shanghai Han (one locus) and the most from the Central Spanish population (22 loci). We analyzed the population genetic structure by principal component analysis, clustering with the STRUCTURE program and a Neighbor-Joining tree, and then evaluated the genetic relationships among the Tujia and 15 other populations.

Statistical analyses. Hardy-Weinberg equilibrium (HWE), allelic frequencies and forensic statistical parameters of the 30 InDels were calculated with the modified PowerStats (version 1.2) spreadsheet (Promega, Madison, WI, USA). Linkage disequilibrium (LD) analysis for all pairwise InDel loci was performed using SNPAnalyzer v2.0 (Istech, South Korea) 20. Fst and p values for pairwise interpopulation comparisons were [...] Table 1. Deletion-allele frequencies at the 30 InDel loci ranged from 0.0445 to 0.9089 in the group, with a mean value of 0.4939. The observed (HO) and expected (HE) heterozygosities ranged from 0.0890 (HLD118) to 0.5381 (HLD92) and from 0.0850 (HLD118) to 0.4985 (HLD136), with mean values of 0.4028 and 0.4073, respectively. Twenty-four InDel loci had power of discrimination (PD) values greater than 0.5; the six exceptions were HLD39, HLD64, HLD81, HLD99, HLD111 and HLD118. The values of the power of exclusion (PE), the matching probability (MP), the typical paternity index (TPI) and the polymorphic information content (PIC) ranged from 0.0067 to 0.2231, 0.3524 to 0.8379, 0.5488 to 1.0826 and 0.0814 to 0.3742, respectively. The lowest HO, HE, PIC, TPI, PD and PE were observed at the HLD118 locus, which was also the least polymorphic locus in other previously studied groups 10. The combined power of exclusion (CPE) and combined power of discrimination (CPD) at the 30 InDel loci in the Tujia group were 0.9860 and 0.9999999999761, respectively. The combined matching probability (CMP) of the 30 InDels in the group was 2.3894 × 10^-11, higher than the 1.10974 × 10^-19 obtained for 21 autosomal STRs in the Tujia group in our previous study 11. According to our calculation, the CMP obtained by combining the 30 InDels with the 21 autosomal STRs reached 2.652 × 10^-30. These data suggested that the panel of 30 InDel loci could be a valid supplement to the routine detection of autosomal STRs in forensic cases.
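The arithmetic behind some of these parameters can be illustrated with a minimal Python sketch. It assumes the formulas commonly used in PowerStats-style spreadsheets, TPI = 1/(2H) and PE = h^2(1 - 2hH^2), where h and H are the observed heterozygote and homozygote frequencies. The heterozygote counts (21 and 127 of 236) are assumptions back-calculated from the reported HO values at HLD118 and HLD92, not figures given in the text, and all function names are hypothetical. Multiplying the InDel and STR matching probabilities further assumes that the two marker sets are independent.

```python
from math import prod


def typical_paternity_index(n_het, n_total):
    """TPI = 1 / (2H), with H the observed homozygote frequency."""
    hom = 1 - n_het / n_total
    return 1 / (2 * hom)


def power_of_exclusion(n_het, n_total):
    """PE = h^2 * (1 - 2*h*H^2), with h/H the heterozygote/homozygote frequencies."""
    h = n_het / n_total
    hom = 1 - h
    return h ** 2 * (1 - 2 * h * hom ** 2)


def combined_matching_probability(mps):
    """CMP is the product of per-locus (or per-panel) matching probabilities."""
    return prod(mps)


def combined_power_of_exclusion(pes):
    """CPE = 1 - product over loci of (1 - PE)."""
    return 1 - prod(1 - pe for pe in pes)


N = 236  # sample size reported in the study

# Assumed heterozygote counts, back-calculated from HO = 0.0890 and 0.5381.
tpi_low = typical_paternity_index(21, N)
tpi_high = typical_paternity_index(127, N)
pe_low = power_of_exclusion(21, N)
pe_high = power_of_exclusion(127, N)

# Combining the reported InDel CMP with the reported STR CMP (independence assumed).
cmp_total = combined_matching_probability([2.3894e-11, 1.10974e-19])
cpd_indel = 1 - 2.3894e-11  # CPD = 1 - CMP

print(f"TPI range: {tpi_low:.4f} to {tpi_high:.4f}")
print(f"PE range:  {pe_low:.4f} to {pe_high:.4f}")
print(f"Combined CMP: {cmp_total:.3e}")
```

Under these assumptions the sketch gives TPI and PE extremes that agree with the reported ranges to four decimal places, and a combined CMP of about 2.652 × 10^-30.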
Linkage disequilibrium tests. Linkage disequilibrium between pairwise InDels was analyzed using SNPAnalyzer version 2.0, which yielded several indexes: LOD, r^2 and |D'|. As shown in Supplementary Fig. 1, no strong linkage disequilibrium between any two InDels was observed in a total of 435 pairwise tests (data not shown): all r^2 values were below 0.8, and no crimson box was enclosed by a thick black curve. The present LD tests suggested that the 30 InDels could be treated as independent in the following statistical analyses and were suitable for forensic cases in the Tujia group.
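As a rough illustration of the indexes named above, the following Python sketch computes r^2 and |D'| for two biallelic loci and confirms that 30 loci give 435 unordered pairs. It assumes the haplotype frequency is already known; for unphased genotype data a program such as SNPAnalyzer has to estimate it first, and that estimation step is not shown. The function name and the input frequencies are hypothetical.

```python
from math import comb


def ld_measures(p_ab, p_a, p_b):
    """Return (r^2, |D'|) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Number of unordered locus pairs among 30 InDels.
n_pairs = comb(30, 2)

# Hypothetical frequencies for a weakly associated pair of loci.
r2, d_prime = ld_measures(p_ab=0.30, p_a=0.5, p_b=0.5)
print(n_pairs, round(r2, 4), round(d_prime, 4))
```

With these made-up inputs r^2 is far below the 0.8 threshold mentioned in the text, which is the pattern expected for loci that can be treated as independent.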
Genetic divergences. Genetic distance is a measure of the genetic divergence between populations and is used to understand the origin of biodiversity and to reconstruct the history of different ethnic groups 22. We measured Nei's D_A distance from the differences between allelic frequencies at the same set of 30 InDel loci in different populations. The pairwise D_A distances among the 16 groups, based on allelic frequencies of the 30 InDel loci, are shown in Table 2.
InDel diversities among populations. Population differentiation at the 30 InDels was compared between the Tujia group and previously published populations using the AMOVA method (p < 0.05). As shown in Table 3, the AMOVA comparisons showed significant differences between the Tujia group and the Shanghai Han, Beijing Han, Guangdong Han, She, Xibe, South Korean, Tibetan, Yi, Uigur, Kazak, Uruguayan, Hungarian, Basque, Dane and Central Spanish populations at 1, 3, 3, 4, 5, 7, 8, 9, 14, 14, 20, 20, 20, 21 and 22 loci, respectively. The present results demonstrated that the HLD125, HLD99, HLD67 and HLD118 loci had relatively high levels of genetic variation, with significant differentiation between the Tujia group and 9, 10, 10 and 11 other populations, respectively, whereas the least differentiation was found at the HLD92, HLD101 and HLD124 loci, each of which differed significantly in only one pairwise population comparison. Therefore, allele frequency data for the 30 InDels are important and necessary for forensic application research in different populations.
Principal component analyses. [...] scaling plot which also showed the close relationship between the Tujia and Han populations 28; a similar result was observed in the PCA plot based on the allelic frequencies of the HLA-DRB1 locus 29. Relatively distant genetic relationships between the Tujia group and the Kazak or Uigur group were observed in the PCA map constructed from mtDNA haplogroup frequencies 30 and in the abovementioned HLA-DRB1 PCA plot 29, respectively.
The genetic relationships among the Tujia, Central Asian (Uigur and Kazak), western Eurasian (Hungarian, Dane, Basque and Central Spanish) and other eastern Eurasian (Shanghai Han, Beijing Han, Guangdong Han, She, Xibe, South Korean, Tibetan and Yi) populations were also examined at the individual level using the abovementioned InDel datasets. The results of the individual-level PCA are presented as plots of the first two PCs (Fig. 1b), which together accounted for 38.82% of the total variation in these populations. The first PC revealed an east-west geographic division within Eurasians: all eastern Eurasians tended to cluster on the left of the PCA plot, whereas western Eurasians formed a separate cluster on the right. As expected, the Tujia individuals clustered within the East Asian group.
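The Nei's D_A distance mentioned under Genetic divergences can be sketched in a few lines of Python. The sketch assumes the standard definition D_A = 1 - (1/r) Σ_j Σ_i sqrt(x_ij * y_ij), where r is the number of loci and x_ij, y_ij are the frequencies of allele i at locus j in the two populations. For a biallelic InDel only the deletion-allele frequency is needed, since the insertion-allele frequency is its complement. The function name and the example frequencies are hypothetical and are not taken from Table 2.

```python
from math import sqrt


def nei_da_biallelic(del_freqs_x, del_freqs_y):
    """Nei's D_A between two populations from deletion-allele frequencies.

    Each argument is a list with one deletion-allele frequency per locus;
    both lists must cover the same loci in the same order.
    """
    if len(del_freqs_x) != len(del_freqs_y) or not del_freqs_x:
        raise ValueError("need the same, non-empty set of loci for both populations")
    total = 0.0
    for p, q in zip(del_freqs_x, del_freqs_y):
        # Sum over the two alleles (deletion and insertion) at this locus.
        total += sqrt(p * q) + sqrt((1 - p) * (1 - q))
    return 1 - total / len(del_freqs_x)


# Hypothetical frequencies at three loci for two populations.
pop_a = [0.45, 0.60, 0.10]
pop_b = [0.50, 0.35, 0.12]
print(round(nei_da_biallelic(pop_a, pop_b), 4))
```

The distance is 0 for identical allele frequencies and 1 when the two populations are fixed for different alleles at every locus, so values close to 0 indicate closely related populations.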
Neighbor-joining phylogenetic reconstruction. We constructed a neighbor-joining (N-J) phylogenetic tree (Fig. 2). The branch in the upper-left corner contained the nine East Asian populations, including the Tujia group, whereas the other branch, in the lower-left corner, contained the Dane, Basque, Central Spanish, Uruguayan and Hungarian populations. The Kazak and Uigur groups lay between these two branches. In a previous study, a close relationship between the Tujia group and the Han population was observed in the N-J dendrogram based on the allelic frequencies of the HLA-A locus 31. The Tujia language belongs to the Tibeto-Burman language group and has no written script. The Tujia have lived alongside other ethnic groups such as the Miao and Han, and many of them can speak Mandarin Chinese and write Chinese characters. A close genetic relationship between the Tujia and Han populations in Hubei province was observed based on fifteen STRs, and the present and previous studies indicated that broad genetic exchange had occurred between them in history 32.
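For readers unfamiliar with the method, the core step of neighbor-joining can be sketched as follows. The text does not state which software or distance matrix was used to build the tree, so this is only a generic illustration of the standard algorithm on a small made-up matrix, and the function name is hypothetical. At each step the algorithm joins the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from taxon i to all the others.

```python
def nj_pick_pair(names, dist):
    """One neighbor-joining step on a symmetric distance matrix (needs n > 2).

    Returns the two taxa to join, their Q value, and the branch lengths
    from each of them to the new internal node.
    """
    n = len(names)
    totals = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    len_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return names[i], names[j], q, len_i, len_j


# Toy five-taxon distance matrix (illustrative values, not from Table 2).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first_join = nj_pick_pair(names, dist)
print(first_join)
```

A full reconstruction would repeat this step, replacing the joined pair with a new node whose distance to every remaining taxon k is (d(i, k) + d(j, k) - d(i, j)) / 2, until only two nodes remain.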
Population STRUCTURE analyses. The STRUCTURE program was used to evaluate the genetic structure of the Tujia and 15 other populations. As shown in Fig. 3, at K = 2, three clusters were clearly visible and easily distinguished by red, green and a mixture of the two colors. For K = 2-7 (Supplementary Fig. 2), the STRUCTURE analyses revealed three major clusters: the first comprised the Dane, Basque, Central Spanish, Uruguayan and Hungarian populations; the second comprised the Kazak and Uigur; and the third comprised the nine East Asian populations, including the Tujia group. These results were similar to those of the PCA plot and N-J tree. No further population structure emerged as K increased. More ancestry-informative InDels should therefore be studied in the future in order to subdivide the genetic structure of different ethnic groups in China and to infer the population origin and ancestral components of an unknown individual.

Conclusion
In summary, the population data presented here indicated that the 30 InDels had high diversity within the studied group and showed genetic differentiation among different populations, and that they could be a useful supplement to the routine detection of autosomal STRs in forensic cases. The PCA plot, N-J tree and STRUCTURE analyses suggested close relationships between the Tujia and Han populations from different regions. More ancestry-informative InDels and SNPs should be selected and validated to clarify the ancestral origin of the Tujia.