Forensic characteristics and genetic substructure analysis of the Handan Han population, Northern China

Abstract We analysed the forensic characteristics and substructure of the Handan Han population based on 36 Y-STR (short tandem repeat) and Y-SNP (single nucleotide polymorphism) markers. The two most dominant haplogroups in Handan Han, O2a2b1a1a1-F8 (17.95%) and O2a2b1a2a1a (21.51%), together with their abundant downstream branches, reflect the strong expansion of the precursors of the Han in Handan. The present results enrich the forensic database and explore the genetic relationships between Handan Han and neighbouring and/or linguistically close populations. They also suggest that the current concise overview of the intricate Han substructure remains oversimplified.


Introduction
Handan is one of the earliest sites of domesticated common millet in East Asia, where many important events and stories took place in Chinese history (Figure S1). According to the latest census (2021), Han Chinese make up the majority of the nearly 10 million residents of Handan City. Previous studies have sketched out several main clusters in the Han groups, but also pointed out significant differences and intricate substructures among them (Wen et al. 2004; Xu et al. 2009). Therefore, regional lineage accumulations and spatial comparisons of the Han population among groups are still necessary.

Subjects and methods
To investigate the phylogenetic history and forensic characteristics of the Han nationality in Handan, a total of 730 unrelated male individuals were recruited and genotyped with the DNATyper™ 36Y Kit (No. 2 Institute of the Ministry of Public Security, Beijing, China). All samples were collected after receiving informed consent, and individuals were considered autochthonous if their ancestors had lived in Handan municipality for at least three generations. The Ethical Committee of Fudan University, Shanghai, People's Republic of China, approved the study (No. 14012). Experimental and statistical methods are shown in Table S1. Data were submitted to the YHRD (Y chromosomal haplotype reference database, https://yhrd.org) under accession number YA004677 for the Handan Han population. First, to outline the genetic relationship between Handan Han and other Asian populations, data for 27 Y-STRs from 20 typical populations were selected for AMOVA based on linguistic and geographical classification and visualised in a multidimensional scaling (MDS) plot. Subsequently, to enlarge the reference dataset and refine the understanding of the genetic background of Handan Han, data for 17 Y-STRs from 38 populations were used for further population comparison. From the genetic distance matrices (Rst matrices), a phylogenetic tree was built with the neighbor-joining (N-J) method in MEGA-X (Kumar et al. 2018) and optimised with the Interactive Tree of Life v5 (Letunic and Bork 2021). The details of the reference dataset are shown in Table S2.
In accordance with the Y-DNA Haplogroup Tree 2020 (https://isogg.org/tree/), we constructed a simplified phylogenetic tree that showed the overall distribution of Handan Han samples. To better understand the distribution of the prevailing haplogroups in the studied population, we integrated data from previously published papers (Shi et al. 2005; Xue et al. 2006; Park et al. 2012; Trejaut et al. 2014; Ning et al. 2016; Li et al. 2020; Zhou et al. 2020) and constructed contour maps with Surfer 15 (https://www.goldensoftware.com/products/surfer).

Results and discussion
The allele frequencies and the gene diversity (GD) values for the 36 Y-STR loci are summarised in Table S3. A total of 407 alleles were detected across the 36 Y-STR loci, with frequencies between 0.0014 and 0.7041. Among the 30 single-copy loci, DYS449 was the most informative, with a GD of 0.8743. The three multi-copy Y-STRs, DYS385a/b, DYF387S1a/b and DYS527a/b, were the most diverse markers, with GD values of 0.9642, 0.9483 and 0.9437, respectively. The Y-STR haplotype distributions and Y-SNP data of the Handan Han population are listed in Table S4. A total of 729 distinct Y-STR haplotypes were found in the 730 Handan Han samples, of which 728 were unique, that is, observed only once (99.73% of the samples). To evaluate the utility of the new markers for forensic casework, haplotype-based analyses were repeated for various subsets of Y-STRs, namely, the Yfiler® marker panel (17 loci), the Yfiler® Plus marker panel (27 loci), and the DNATyper™ 36Y marker panel (Table S5). Overall, an increase in the number of analysed Y-STR markers decreased the number of shared haplotypes and increased the number of unique haplotypes. These results indicate that the DNATyper™ 36Y kit offers excellent discrimination capabilities and may be useful in forensic investigations and paternal lineage identification in Han populations.
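The diversity statistics reported above can be illustrated with a minimal sketch. The statistical methods of this study are given in Table S1, which is not reproduced here, so the sketch assumes the commonly used Nei estimator, GD = n/(n-1) × (1 - Σp_i²), and defines discrimination capacity as the number of distinct haplotypes divided by the sample size. The function names and the toy haplotype labels are hypothetical; the toy list only mirrors the reported counts (730 samples, 729 distinct haplotypes, 728 observed once) and is not the real dataset.

```python
from collections import Counter


def gene_diversity(observations):
    """Nei's unbiased diversity, n/(n-1) * (1 - sum(p_i^2)).

    Works for alleles at one locus (GD) or for whole haplotypes
    (haplotype diversity). Assumed estimator, not taken from Table S1.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(observations)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def discrimination_capacity(haplotypes):
    """Number of distinct haplotypes divided by the sample size."""
    return len(set(haplotypes)) / len(haplotypes)


def unique_fraction(haplotypes):
    """Share of samples whose haplotype is observed exactly once."""
    counts = Counter(haplotypes)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(haplotypes)


# Hypothetical labels reproducing only the reported counts:
# 728 haplotypes seen once plus one haplotype shared by two men.
haplotypes = [f"H{i}" for i in range(728)] + ["H728", "H728"]

print(f"distinct haplotypes: {len(set(haplotypes))}")
print(f"discrimination capacity: {discrimination_capacity(haplotypes):.4f}")
print(f"unique fraction: {unique_fraction(haplotypes):.4f}")
print(f"haplotype diversity: {gene_diversity(haplotypes):.6f}")
```

Under these assumptions, the discrimination capacity is 729/730 ≈ 0.9986, and the unique fraction is 728/730, which matches the 99.73% given in the text.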
To outline the genetic relationship between Handan Han and other Asian populations, we compared the 27-locus Yfiler Plus haplotype data of Handan Han with those of 20 neighbouring populations based on linguistic and geographical classification. Table S6 shows that there were significant differences (p < 0.0024 after Bonferroni's correction) between the Handan Han and all the other populations. Among them, Handan Han showed the largest genetic distances from populations of the Mongolic, Tibeto-Burman, and Turkic groups (Rst = 0.0615-0.3386), followed by Southern Hans (Guangdong, Guizhou, and Hainan) and Hunan-Miao (Rst = 0.0226-0.0350). Low Rst values were obtained between Handan Han and the other Northern Hans (Changchun, Shannxi, Beijing, and Shandong) and Baishan-Manchu, a typical Tungusic-speaking population in Northern China (Rst = 0.0056-0.0163). To portray the patterns of population genetic relationships, an MDS plot was constructed based on linearised Rst values (Figure S2); it showed that all populations generally clustered according to language and geographical factors. Handan Han, all the other Han populations, and Baishan-Manchu gathered in the upper right quadrant. This preliminary analysis indicated that language and geographical factors had a great influence on the genetic composition of Handan Han; a more in-depth comparative analysis requires a wider range of Chinese-speaking groups and more populations from northern China.
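Two small calculations underlie this paragraph and can be sketched as follows. The text does not state how many tests entered the Bonferroni correction; dividing α = 0.05 by 21 is one reading (an assumption) that reproduces the reported threshold of 0.0024. The linearisation is assumed to be Slatkin's transformation, Rst/(1 - Rst), which is the usual choice but is not confirmed by the text. The function names are hypothetical, and the input values are the Rst range boundaries quoted above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def linearise_rst(rst):
    """Slatkin-style linearised distance Rst / (1 - Rst).

    Assumed transformation; it is undefined for Rst >= 1.
    """
    if rst >= 1:
        raise ValueError("Rst must be below 1")
    return rst / (1.0 - rst)


# Assumption: 21 tests; this is not stated in the text.
threshold = bonferroni_threshold(0.05, 21)
print(f"Bonferroni threshold: {threshold:.4f}")

# Rst range boundaries quoted in the text for each group of populations.
reported_ranges = {
    "Northern Hans and Baishan-Manchu": (0.0056, 0.0163),
    "Southern Hans and Hunan-Miao": (0.0226, 0.0350),
    "Mongolic, Tibeto-Burman and Turkic": (0.0615, 0.3386),
}

linearised = {
    group: tuple(linearise_rst(x) for x in bounds)
    for group, bounds in reported_ranges.items()
}
for group, (low, high) in linearised.items():
    print(f"{group}: {low:.4f} to {high:.4f}")
```

The transformation is monotonic, so the ordering of the groups is unchanged; it mainly stretches the larger distances, which is why it is often applied before MDS.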
On the basis of the above analysis, we further compiled the 17-locus Yfiler haplotype data from Chinese-speaking groups and Tungusic-speaking populations (Table S7). From the Rst values of these 38 populations, an N-J phylogenetic tree (Figure S3) was constructed to depict their forensic genetic landscape. The Handan Han first clustered with the northern Han group (Dalian-Han, Gansu-Han, Beijing-Han, Shandong-Han, Shannxi-Han, Luoyang-Han, and Hulunbuir-Han), followed by the Tungusic-speaking group (Liaoning-Manchu, Chengde-Manchu, Baishan-Manchu, etc.) and then the Hui group (represented by the Henan-Hui), which together formed a large northern branch in the phylogenetic tree. The other populations constituted a second large southern cluster. This north-south differentiation pattern further emphasises the great influence of geographical factors on the genetic background. Although the Northern Han, Southern Han, and Hakka groups share the Han identity, they belong to different branches in the tree. The above groups presented a clear north-south dichotomous pattern, but we noted that Yunnan-Hui from the south was located on the northern branch. The Hui group is unique in that its members speak Chinese, as the Han do, but practise Islam. Yunnan-Hui was mainly shaped by large-scale migration in the Ming Dynasty (Chuang 2018), and fine-scale research on its genetic structure is lacking. The Yunnan-Hui population in the YHRD database (sample size = 43) contains a higher frequency of haplogroup J1 (> 18.60%) than others (Lang et al. 2019). Therefore, the current clustering results are generally acceptable.
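The tree in this study was built in MEGA-X; the following is only a minimal sketch of the neighbor-joining procedure itself, to show how a pairwise distance matrix is turned into a tree. The function name, the taxon labels and the 5 × 5 distance matrix are hypothetical toy inputs, not Rst values from Table S7.

```python
def neighbor_joining(names, dist):
    """Return an unrooted Newick string for a symmetric distance matrix.

    names: list of unique taxon labels (at least three).
    dist:  square list of lists with dist[i][j] == dist[j][i].
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Dict-of-dicts keyed by the Newick label of each active node.
    d = {a: {b: dist[i][j] for j, b in enumerate(names) if j != i}
         for i, a in enumerate(names)}
    while len(d) > 3:
        n = len(d)
        r = {a: sum(d[a].values()) for a in d}  # row sums
        # Join the pair that minimises Q = (n - 2) * d_ab - r_a - r_b.
        a, b = min(((x, y) for x in d for y in d[x] if x < y),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dab = d[a][b]
        len_a = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        u = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distances from the new internal node to every remaining node.
        new_row = {k: 0.5 * (d[a][k] + d[b][k] - dab)
                   for k in d if k not in (a, b)}
        del d[a], d[b]
        for k in d:
            del d[k][a], d[k][b]
            d[k][u] = new_row[k]
        d[u] = new_row
    # Resolve the last three nodes as a trifurcation.
    x, y, z = d
    len_x = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    len_y = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    len_z = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{len_x:.4f},{y}:{len_y:.4f},{z}:{len_z:.4f});"


# Hypothetical toy distances between five taxa (not study data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(names, dist)
print(tree)
```

The resulting Newick string can be loaded into a tree viewer such as the Interactive Tree of Life for annotation. In this toy matrix, a and b are joined first and d and e form a second pair, analogous to the way closely related populations pair up before joining a larger branch.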
To illustrate the frequency distribution of the two prevailing haplogroups in this study, Oa-F8 and Ob-F46, contour maps were drawn (Figure S5). Although both are among the most abundant paternal lineages in East Asian populations, Ob-F46 is mainly centred in northern China, while Oa-F8, in addition to northern China, is also common among many groups in Southwest China. It is generally believed that these two haplogroups began to spread in the Upper and Central Yellow River Basin approximately 6,000-7,000 years ago and were related to the shift from hunter-gatherer subsistence to intensive agriculture in North China (Xue et al. 2006; Yan et al. 2014; Wang et al. 2018). Geographically, Handan City is located at or very close to the diffusion centres of these two haplogroups, so it is reasonable that Oa-F8 and Ob-F46 appear frequently in Handan Hans. Furthermore, we suppose that the Handan Han samples also include abundant downstream branches of these two haplogroups, especially some unanalysed paragroups. In the network structure (Figure S6), Ob-F46 in Handan Hans seems to contain two large secondary clades, while the downstream structure of Oa-F8 is more dispersed.

Conclusions
This study attempts to provide a clearer picture of the Han genetic background in Handan, which is also an important complement to the current oversimplified view of the Han substructure. We analysed the forensic characteristics and genetic substructure of the Handan Han population based on 36 Y-STR and predicted Y-SNP markers. The DNATyper™ 36Y set showed excellent discrimination capabilities in Handan Han and should be reliable for database and casework samples. Stratified population comparisons showed that Handan Han had close relationships with Chinese-speaking groups in northern China, which is consistent with the linguistic and geographic classifications. The combination of Y-STR and Y-SNP markers reflected the strong expansion of the precursors of the Han in Handan. With the development of sequencing technology, the Han group may prove to have a deeper and more complex evolutionary structure. The data obtained in this study could serve as a region-specific reference for forensic, genealogical, and evolutionary research.