The genetic structure and admixture of Manchus and Koreans in northeast China

Abstract
Background: The fine-scale genetic profiles and population history of Manchus and Koreans remain unclear. Aim: To infer the fine-scale genetic structure and admixture of Manchu and Korean populations. Subjects and methods: We collected and genotyped 16 Manchus from Liaoning Province and 18 Koreans from Jilin Province at about 700K genome-wide SNPs. We analysed the data using principal component analysis (PCA), ADMIXTURE, Fst, TreeMix, f-statistics, qpWave, and qpAdm. Results: Manchus and Koreans showed a genetic affinity with northern East Asians. Chinese Koreans showed long-term genetic continuity with Bronze Age populations from the West Liao River and had a strong affinity with Koreans from South Korea and with Japanese. Manchus had a genetic profile different from that of other Tungusic populations: they received additional genetic influence from southern Chinese populations but showed no West Eurasian-related admixture. Conclusions: The contribution of southern Chinese ancestry to the genetic formation of Manchus is consistent with the extensive interactions between Manchus and populations from central and southern China. The large-scale genetic continuity between ancient West Liao River farmers and Koreans highlights the role of farming expansion in the peopling of the Korean Peninsula.


Introduction
Manchus are the third-largest ethnic minority group in China. They are mainly distributed in northeast China, especially in Liaoning and Jilin Provinces, but are also scattered throughout the country, including Gansu, Guizhou, Hunan, and other provinces. Historically, the origins of the Manchu can be traced back to the Sushen of the Zhou Dynasty, the Yilou of the Han Dynasty, the Wuji of the Southern and Northern Dynasties, the Mohe of the Sui and Tang Dynasties, and the Jurchen of the Jin Dynasty. During the Ming Dynasty, the Jianzhou Jurchen developed into today's Manchu. The ancestral language of the Manchu is Jurchen. The present-day Manchu language belongs to the Manchuric branch of the Tungusic family, and Manchus are the largest population among Tungusic-speaking groups. Tungusic languages are spoken in Eastern Siberia and Manchuria by Tungusic peoples, including the Manchu, Xibo, Oroqen, Hezhen, Nanai, Ulchi, Evenk, and Negidal. An archaeolinguistic study suggested that the homeland of proto-Tungusic was the region around Lake Khanka, in the far eastern part of the border between Russia and China (Wang and Robbeets 2020).
Previous genetic studies of Manchus, including those of autosomal short tandem repeats (STRs), Y-chromosome haplotypes, and X-STRs (Xue et al. 2005; Liu et al. 2013; Xing et al. 2019), suggested small genetic distances between northern Han Chinese and Manchus. Mitochondrial DNA (mtDNA) data suggested that Manchus carry admixture signals from both southern and northern East Asia and are closely related to the neighbouring Mongolians, Koreans, and Liaoning Han Chinese (Zhao et al. 2011). On the paternal side, the dominant Y-chromosome haplogroup in Manchus is O2-M122, at a frequency of 42.6% (Katoh et al. 2005). Rst values calculated from Y-STRs show that Manchus differ significantly from some Chinese populations, such as Tibetans and Uyghurs, but have a close affinity with northern Han Chinese and Mongolians (Katoh et al. 2005; Xue et al. 2006; He and Guo 2013; Bai et al. 2016; Atif et al. 2019). Qualitative and quantitative analyses of high-density genome-wide SNP data from 93 Manchus collected in Xinbin, Liaoning, revealed large-scale admixture with northern Han Chinese (Zhang et al. 2021). In that study, Manchus could be modelled as a mixture of northern populations, represented by the ancient Mohe and Xianbei people, and southern populations, represented by Iron Age Taiwan samples (Zhang et al. 2021).
Unlike Manchus, Koreans are a cross-border ethnic group and form the main population of the Korean Peninsula. The largest area inhabited by Chinese Koreans is the Yanbian Korean Autonomous Prefecture in Jilin Province. Linguistically, Korean, like Japanese, is considered a language isolate of unknown phylogeny (Song 2005). In its grammatical features, Korean resembles the Japonic, Tungusic, and Mongolic languages. Like Japanese and Manchu, Korean has many Chinese loanwords. However, Korean shares the subject-object-verb word order of Manchu, which differs from the subject-verb-object order of Chinese.
From an mtDNA perspective, Koreans display a typical East Asian profile (Jin et al. 2009). The Y-chromosome haplogroup O2b-SRY465 suggested that the origin of proto-Koreans can be traced back to northeastern China during the Neolithic (9,900-10,000 years BP) and Bronze Ages (3,450-2,350 years BP) (Kim et al. 2011). Kim et al. reported 88 modern Korean genomes and found two major genetic components, related to East Siberia and Southeast Asia (Kim et al. 2020); modern Koreans can be best described as an admixture of Neolithic Northeast Asians and Iron Age Southeast Asians. Gelabert et al. reported the first palaeogenomic data from Korea, dating to the 4th-7th centuries CE during Korea's Three Kingdoms period. These ancient genomes can be modelled as an admixture between a Bronze Age northern Chinese genetic source (Yellow_River LBIA or Liao_River BA) and Jomon-related ancestry (Gelabert et al. 2022). However, no genomic analysis of the Korean ethnic group in China has been reported. Although the formation of populations is not constrained by national borders, the genetic diversity of Chinese Koreans is worth investigating.
Here, we report genome-wide SNP data of 16 Manchu individuals living in Jinzhou, Liaoning, and 18 Korean individuals living in Antu County of the Yanbian Korean Autonomous Prefecture. Zhang et al. (2021) collected and genotyped Manchu individuals from Xinbin County in eastern Liaoning Province, whereas in this study we collected and genotyped Manchu samples from Jinzhou city in central-western Liaoning Province. Because Liaoning has a large Manchu population, genotyping samples from additional locations helps to infer possible genetic substructure within Manchus. In addition, Jinzhou is located east of the Liaodong Corridor, which historically carried most of the overland traffic between north China and northeast China. The ethnic memory of native Manchus in Beizhen, Jinzhou, records their homeland as Changbai Mountain (Supplementary Figure S1). Antu County is an ethnic Korean autonomous region at the foot of the Changbai Mountains. We sampled these two places to provide more detailed information on the migration and admixture events of ethnic minorities in northeast China.

Sampling and genotyping
We collected saliva samples from 16 Manchus in Heishan County and Beizhen (two adjacent sites) in Jinzhou city, Liaoning Province (group label "Manchu_Jinzhou" in the following analyses), and 18 Koreans in Erdaobaihe Town, Antu County, Yanbian Korean Autonomous Prefecture, Jilin Province (group label "Korean_Antu" in the following analyses) (Figure 1). All samples were collected with informed consent. The studies involving human participants were reviewed and approved by the Medical Ethics Committee of Xiamen University (Approval Number: XDYX2019009). Genomic DNA was extracted using the QIAamp DNA Blood Mini kit (QIAGEN, Hilden, Germany). We genotyped the DNA samples with the Illumina WeGene Arrays containing 717,228 single nucleotide polymorphisms (SNPs). We used KING software (Manichaikul et al. 2010) to remove individuals with third-degree or closer relatedness to other samples, ensuring that the samples in the population analyses were unrelated. After filtering, 15 Manchus and 18 Koreans were retained.
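To make the relatedness filter concrete, the sketch below drops individuals greedily until no remaining pair has a kinship coefficient in the third-degree range or closer. It assumes pairwise kinship values have already been estimated (for example by KING) and uses the commonly cited KING boundary of about 0.0442 for third-degree relatives. The greedy rule (remove the individual involved in the most related pairs) and the function name `prune_related` are illustrative assumptions, not the procedure used in this study.

```python
# Hypothetical sketch: greedy removal of related individuals given
# pairwise KING-style kinship coefficients. Not KING's own algorithm.

THIRD_DEGREE_MIN = 0.0442  # assumed lower bound of the 3rd-degree kinship range


def prune_related(pairs, threshold=THIRD_DEGREE_MIN):
    """Return the set of sample IDs to drop so that no remaining pair has
    kinship >= threshold. `pairs` is a list of (id1, id2, kinship)."""
    related = [(a, b) for a, b, k in pairs if k >= threshold]
    dropped = set()
    while True:
        # Only pairs in which neither member has been dropped still matter.
        active = [(a, b) for a, b in related
                  if a not in dropped and b not in dropped]
        if not active:
            return dropped
        # Count how many related pairs each individual still belongs to.
        degree = {}
        for a, b in active:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        # Drop the most-connected individual (ties broken by smallest ID).
        worst = min(degree, key=lambda s: (-degree[s], s))
        dropped.add(worst)
```

Removing the most-connected individual first tends to keep more samples than removing one member of every pair blindly, which matters for small groups such as these.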

Merging data
We merged our newly collected samples with genome-wide SNP data from previously published modern and ancient populations (Patterson et al. 2012; Liu et al. 2020; Ning et al. 2020; Mao et al. 2021; Robbeets et al. 2021; Wang et al. 2021; Gelabert et al. 2022). Data were merged using the mergeit program in EIGENSOFT.
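When two SNP panels are merged, the same SNP may be reported on opposite strands or with its two alleles listed in a different order, and A/T or C/G SNPs cannot be strand-resolved from the alleles alone. The sketch below shows one way to classify shared SNPs. It is a simplified illustration of allele harmonisation, not a reproduction of mergeit's internal logic, and the input layout (SNP ID mapped to two alleles, as in the allele columns of an EIGENSTRAT `.snp` file) is an assumption.

```python
# Illustrative allele harmonisation for merging two SNP panels (assumed layout).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1, a2):
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT.get(a1) == a2


def harmonise(panel1, panel2):
    """panel1/panel2 map SNP ID -> (allele1, allele2).

    Returns {snp_id: action} with action in {'keep', 'swap', 'flip', 'flip_swap'};
    SNPs absent from either panel, strand-ambiguous, or with mismatched
    alleles are dropped."""
    result = {}
    for snp in panel1.keys() & panel2.keys():
        a1, a2 = panel1[snp]
        b1, b2 = panel2[snp]
        if is_ambiguous(a1, a2):
            continue
        fb1, fb2 = COMPLEMENT.get(b1), COMPLEMENT.get(b2)
        if (a1, a2) == (b1, b2):
            result[snp] = "keep"
        elif (a1, a2) == (b2, b1):
            result[snp] = "swap"        # same strand, allele order reversed
        elif (a1, a2) == (fb1, fb2):
            result[snp] = "flip"        # opposite strand
        elif (a1, a2) == (fb2, fb1):
            result[snp] = "flip_swap"   # opposite strand and order reversed
        # anything else is an allele mismatch and is left out
    return result
```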
Here we generated three reference datasets for our later genetic analysis:

Principal component analysis (PCA)
We conducted PCA using the smartpca program in EIGENSOFT (Patterson et al. 2006). We computed the first two principal components (PC1 and PC2) from modern reference populations and then projected ancient individuals onto PC1 and PC2 using the lsqproject: YES option.
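Ancient genomes typically have many missing genotypes, so they cannot be placed on the PCs by a simple dot product with the SNP loadings. The idea behind least-squares projection is to find the PC coordinates that best reconstruct an individual's observed genotypes, using only the SNPs that were actually called. The sketch below solves that two-dimensional least-squares problem in plain Python; the genotype normalisation, the loading vectors `v1` and `v2`, and the function name are assumptions, and smartpca's own scaling of the output coordinates may differ.

```python
# Sketch of least-squares projection onto two precomputed PC axes,
# illustrating the idea behind lsqproject (not smartpca's code).


def lsq_project(x, v1, v2):
    """x: normalised genotypes with None for missing calls;
    v1, v2: SNP loadings of PC1 and PC2.
    Minimises sum over observed j of (x_j - s1*v1_j - s2*v2_j)**2."""
    obs = [j for j, g in enumerate(x) if g is not None]
    # 2x2 normal equations A s = b, restricted to observed SNPs
    a11 = sum(v1[j] * v1[j] for j in obs)
    a12 = sum(v1[j] * v2[j] for j in obs)
    a22 = sum(v2[j] * v2[j] for j in obs)
    b1 = sum(v1[j] * x[j] for j in obs)
    b2 = sum(v2[j] * x[j] for j in obs)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("too few informative SNPs to project this sample")
    s1 = (a22 * b1 - a12 * b2) / det
    s2 = (a11 * b2 - a12 * b1) / det
    return s1, s2
```

Because only observed SNPs enter the sums, a low-coverage ancient sample still lands near its expected position rather than being pulled towards the origin by missing data.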

ADMIXTURE analysis
We first pruned SNPs in linkage disequilibrium using PLINK 1.9 (Purcell et al. 2007) with the parameters "--indep-pairwise 200 25 0.4", which removed 4,611 of 42,192 variants. We ran ADMIXTURE from K = 2 to K = 8 and recorded the cross-validation (CV) error for each run. We chose the best-fitting model with the minimum CV error (K = 4) and visualised the ancestry proportions of each individual using an in-house script.
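The two steps in this paragraph can be illustrated in plain Python. `prune_pairwise` is a simplified, hypothetical version of window-based pruning with a window of 200 SNPs, a step of 25 SNPs, and an r^2 threshold of 0.4; PLINK's actual rule for which SNP of a correlated pair to drop, and its handling of missing genotypes, differ (here the later SNP is simply removed). `best_k` then picks the K with the smallest CV error, the criterion used to select K = 4.

```python
# Simplified sketch of pairwise LD pruning and CV-based choice of K.


def r2(g1, g2):
    """Squared Pearson correlation of two 0/1/2 genotype vectors (no missing)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: treat as unlinked
    return cov * cov / (v1 * v2)


def prune_pairwise(genos, window=200, step=25, threshold=0.4):
    """Slide a window of `window` SNPs by `step`; within each window remove
    the later SNP of any pair with r^2 > threshold. Returns kept indices."""
    removed = set()
    n = len(genos)
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if i in removed:
                continue
            for j in range(i + 1, end):
                if j not in removed and r2(genos[i], genos[j]) > threshold:
                    removed.add(j)
        if end == n:
            break
        start += step
    return [i for i in range(n) if i not in removed]


def best_k(cv_errors):
    """Return the K with the minimum ADMIXTURE cross-validation error."""
    return min(cv_errors, key=cv_errors.get)
```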

Fst
We merged previously published Manchu data with our samples to calculate Fst values, which measure population differentiation and genetic distance. We used the smartpca program in EIGENSOFT (Patterson et al. 2006) with default parameters and the option fstonly: YES.
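For readers unfamiliar with Fst, the sketch below implements Hudson's estimator combined across SNPs as a ratio of averages, using per-SNP allele frequencies and the number of sampled chromosomes in each population. This is an assumed, commonly used estimator chosen for illustration, not a reimplementation of smartpca's fstonly output, which may differ in detail. With 15 Manchus and 18 Koreans, the chromosome counts would be 30 and 36.

```python
# Sketch of Hudson's Fst (ratio of averages across SNPs); assumed estimator.


def hudson_fst(p1, p2, n1, n2):
    """p1, p2: per-SNP allele frequencies in populations 1 and 2.
    n1, n2: per-SNP numbers of sampled chromosomes (2 x individuals)."""
    num = 0.0
    den = 0.0
    for a, b, na, nb in zip(p1, p2, n1, n2):
        # Squared frequency difference minus within-sample sampling noise
        num += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Probability that two alleles, one from each population, differ
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Summing numerators and denominators before dividing keeps rare-variant SNPs from dominating the estimate; small negative values are expected for populations that are essentially undifferentiated.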

TreeMix
TreeMix v1.13 (Pickrell and Pritchard 2012) was used to construct a maximum likelihood tree for the set of populations. The African Mbuti population was used as the outgroup and set as the root via the option -root Mbuti.
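TreeMix works from allele counts rather than individual genotypes. As we understand its documentation, the input is a gzip-compressed text file whose first line lists population names and whose following lines give, for each SNP, one `count1,count2` pair per population. The sketch below builds such a file; the function names and toy counts are hypothetical, and the resulting file would then be passed to TreeMix with `-i` and rooted with `-root Mbuti` as described above.

```python
import gzip


def format_treemix(pops, rows):
    """Build TreeMix-style input text.

    pops: population names in column order.
    rows: one list per SNP of (count_allele1, count_allele2) tuples,
          one tuple per population, in the same order as `pops`."""
    lines = [" ".join(pops)]
    for row in rows:
        if len(row) != len(pops):
            raise ValueError("each SNP row needs one count pair per population")
        lines.append(" ".join(f"{a},{b}" for a, b in row))
    return "\n".join(lines) + "\n"


def write_treemix(path, pops, rows):
    """Write the formatted counts to a gzip-compressed text file."""
    with gzip.open(path, "wt") as fh:
        fh.write(format_treemix(pops, rows))


# Hypothetical example: two SNPs in three populations
# (30 chromosomes for 15 Manchus, 36 for 18 Koreans).
pops = ["Mbuti", "Manchu_Jinzhou", "Korean_Antu"]
rows = [[(12, 8), (20, 10), (30, 6)], [(5, 15), (0, 30), (2, 34)]]
text = format_treemix(pops, rows)
```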

Three-population test (f3 statistics)
We used two forms of f3-statistics in our subsequent analyses, computed with the qp3Pop program in AdmixTools (Patterson et al. 2012). First, we computed outgroup-f3-statistics of the form f3(Studied population, Reference population; Outgroup). The outgroup-f3-statistic measures the genetic drift shared by the studied population and the chosen reference population since their divergence from the outgroup. The African Mbuti population was used as the outgroup.
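Outgroup-f3(A, B; O) is the average over SNPs of (o - a)(o - b), computed from allele frequencies, so it grows with the drift that A and B share after splitting from O. The sketch below computes this point estimate and ranks reference populations by it. It is an illustrative assumption, not qp3Pop's code: it omits the small-sample bias correction and the block-jackknife standard errors that qp3Pop reports, and the reference names in the test are placeholders.

```python
# Sketch of outgroup-f3 from allele-frequency vectors (point estimate only).


def f3(a, b, c):
    """f3(A, B; C): mean over SNPs of (c - a) * (c - b)."""
    if not (len(a) == len(b) == len(c)) or not a:
        raise ValueError("need equal-length, non-empty frequency vectors")
    return sum((z - x) * (z - y) for x, y, z in zip(a, b, c)) / len(a)


def rank_by_outgroup_f3(target, refs, outgroup):
    """Rank reference populations by outgroup-f3(target, ref; outgroup),
    highest (most shared drift) first."""
    scores = {name: f3(target, freqs, outgroup) for name, freqs in refs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```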
We then computed admixture-f3-statistics of the form f3(Source1, Source2; Studied population) to test whether the studied population is admixed between populations related to Source1 and Source2. A significantly negative f3 value (Z-score < -3) indicates that populations related to the two sources likely contributed to the ancestry of the studied population.
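In the admixture form, the studied population sits in the third position, and significance is judged by a Z-score: the estimate divided by a block-jackknife standard error. The sketch below uses an unweighted delete-one-block jackknife over fixed-size blocks of consecutive SNPs. AdmixTools instead defines blocks by genetic distance, weights them, and applies a bias correction for the target's sample size, so these numbers will not match qp3Pop exactly. The default block size of 500 SNPs is an arbitrary assumption.

```python
import math


def f3_values(s1, s2, t):
    """Per-SNP contributions to f3(S1, S2; T) = (t - s1) * (t - s2)."""
    return [(z - x) * (z - y) for x, y, z in zip(s1, s2, t)]


def block_jackknife(values, block_size):
    """Mean of per-SNP values and its delete-one-block jackknife standard error."""
    n = len(values)
    blocks = [values[i:i + block_size] for i in range(0, n, block_size)]
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")
    total = sum(values)
    estimate = total / n
    # Re-estimate with each block left out in turn
    loo = [(total - sum(b)) / (n - len(b)) for b in blocks]
    loo_mean = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((x - loo_mean) ** 2 for x in loo))
    return estimate, se


def admixture_f3(s1, s2, target, block_size=500):
    """Return (f3, standard error, Z); Z < -3 is read as evidence of admixture."""
    est, se = block_jackknife(f3_values(s1, s2, target), block_size)
    z = est / se if se > 0 else float("nan")
    return est, se, z
```

A target sitting between its two sources in allele frequency makes (t - s1) and (t - s2) take opposite signs, which is why admixture drives f3 below zero.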

Four-population test (f4 statistics)
The f4-statistics of the form f4(Mbuti, B; C, D) were calculated with the qpDstat program in AdmixTools (Patterson et al. 2012) with the parameter f4mode: YES, using Mbuti as the outgroup. A significantly positive f4-statistic (Z-score > 3) indicates possible gene flow between populations B and D. Conversely, a significantly negative f4-statistic (Z-score < -3) indicates possible gene flow between populations B and C. An f4 value close to 0 (|Z-score| < 3) indicates that B shares a similar number of alleles with both C and D.
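The sign of f4 follows from its definition, f4(A, B; C, D) = mean over SNPs of (a - b)(c - d): with A as an outgroup, drift shared by B and D makes the two differences co-vary and the product positive, whereas drift shared by B and C makes it negative. The sketch below computes the point estimate and maps a Z-score to that reading. The Z-score itself would come from a block-jackknife standard error, which is omitted here, and the toy frequencies are invented for illustration.

```python
# Sketch of the f4 point estimate and the |Z| = 3 reading used in the text.


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def interpret_f4(z, b="B", c="C", d="D"):
    """Reading of a Z-score for f4(Outgroup, B; C, D)."""
    if z > 3:
        return f"excess allele sharing between {b} and {d}"
    if z < -3:
        return f"excess allele sharing between {b} and {c}"
    return f"{b} is symmetrically related to {c} and {d}"


# Toy check: one population carries a drift shift away from base frequencies.
base = [0.5, 0.4, 0.6]
shift = [0.1, -0.1, 0.05]
drifted = [p + s for p, s in zip(base, shift)]
positive = f4(base, drifted, base, drifted)  # B and D share the shift -> > 0
negative = f4(base, drifted, drifted, base)  # B and C share the shift -> < 0
```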

qpWave and qpAdm
We first used ancient farmers from Northeast Asia (represented by WLR_BA) or the Yellow River (represented by YR_MN), as well as several ancient Korean groups, as a single source in the modelling. Based on the ADMIXTURE and f-statistics results, we then added Iron Age southern East Asians (represented by Taiwan_Hanben) as a second source in two-way admixture models. Considering historical factors, we also explored whether the ancient populations Heishui_Mohe and Mongolia_XiongNu contributed to the studied populations using three-way admixture models.
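qpAdm rests on the fact that f4(X, O1; Oi, Oj) is linear in the allele frequencies of X: if a target is a mixture w*S1 + (1 - w)*S2, its vector of f4-statistics against pairs of outgroups is the same weighted mixture of the sources' vectors. The sketch below recovers w for a two-source model by ordinary least squares with the weights constrained to sum to 1. This is a deliberately simplified, hypothetical illustration: qpAdm itself fits the weights by generalised least squares with a jackknife covariance matrix, reports standard errors, and assesses fit with a p-value, none of which is reproduced here. The f4 vectors are invented and merely mirror an 83%/17% split.

```python
# Simplified two-source mixture fit on f4 vectors (not qpAdm's algorithm).


def two_source_weight(target, source1, source2):
    """Weight w minimising sum((target - (w*source1 + (1-w)*source2))**2).
    Each argument is a vector of f4(X, O1; Oi, Oj) values over the same
    outgroup combinations."""
    dx = [s1 - s2 for s1, s2 in zip(source1, source2)]
    dy = [t - s2 for t, s2 in zip(target, source2)]
    denom = sum(v * v for v in dx)
    if denom == 0:
        raise ValueError("the outgroups cannot distinguish the two sources")
    return sum(x * y for x, y in zip(dx, dy)) / denom


def residuals(target, source1, source2, w):
    """Misfit left after applying weight w; large values suggest a poor model."""
    return [t - (w * s1 + (1 - w) * s2)
            for t, s1, s2 in zip(target, source1, source2)]


# Invented f4 vectors for a target built as 83% north + 17% south.
north = [0.010, -0.004, 0.006, 0.002]
south = [0.002, 0.003, -0.001, 0.008]
target = [0.83 * n + 0.17 * s for n, s in zip(north, south)]
w = two_source_weight(target, north, south)
```

In real data the residuals are never exactly zero; qpAdm's p-value asks whether they are small relative to their jackknife uncertainty.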

Population genetic structure
To qualitatively infer population genetic structure, we first performed PCA (Figure 2). On the first two principal components, we observed two genetic clines (dotted circles) that correspond well with geography and language. The north-south cline of East Asia included Tai-Kadai, Austronesian, Austroasiatic, Hmong-Mien, Han Chinese, and some Tibeto-Burman populations. The other cline mainly comprised Tungusic and Mongolic populations in Northeast Asia. Our studied Manchu and Korean samples were located between these two clines. Manchus clustered closely with northern Han Chinese and some Yugurs, while Koreans in Antu (Korean_Antu) clustered closely with Koreans from South Korea, Japanese, and Bronze Age populations from the West Liao River.
We further performed an unsupervised, model-based ADMIXTURE clustering analysis and observed the lowest cross-validation error at K = 4 (Figure 3 and Supplementary Figure S2). Both studied groups, Manchus and Koreans, consisted of four ancestral components, enriched in Tibetans or Hans (pink and yellow), Southeast Asians (yellow), and Northeast Asians (orange). Manchus showed a genetic profile similar to that of northern Han Chinese, while the Korean_Antu samples were genetically similar to Koreans from South Korea, consistent with the PCA results.

Population relationships between reference modern East Asians
Manchus are a geographically widespread minority, and we observed genetic substructure between Manchus in northern and southern China. Manchus in Liaoning, northeast China (Manchu_Jinzhou and Manchu_Xinbin), and Manchus in Guizhou, southwest China (Manchu_Bijie and Manchu_Jinsha), were genetically heterogeneous (Figure 4). In the PCA plot (Figure 2), our studied Manchus clustered with Manchu_Xinbin but clearly separated from Manchus in Guizhou, who are closer to southern Han, Hmong-Mien groups, and local Guizhou Mongolians (Mongolian_Bijie).
Consistent with the PCA and ADMIXTURE results, both Koreans and Japanese show high outgroup-f3 values and low Fst values with the studied Korean_Antu, indicating a genetic affinity of Korean_Antu with Japanese and Koreans. Both outgroup-f3 and Fst values also indicated a close relationship between the studied Manchus and Koreans. In addition, Koreans, Japanese, and the studied Korean_Antu cluster together on the maximum likelihood (ML) tree inferred by TreeMix, and we detected possible gene flow from Tungusic-speaking populations in Northeast Asia to the Korean-Japanese cluster (Figure 5).
Compared with some Turkic- and Tungusic-speaking groups in northern East Asia (such as Kazakh_China, Kyrgyz_China, and Even) (Figure 6(A,B), Supplementary Table S2), Manchus and Korean_Antu share more genetic drift with Han Chinese, Hmong-Mien, Tai-Kadai, Japanese, and Koreans, as shown by outgroup-f3 statistics and pairwise f4-statistics of the form f4(Mbuti, Manchu/Korean; EastAsia1, EastAsia2). Manchus were genetically different from other Tungusic-speaking populations, as shown by the values of f4(Outgroup, X; Manchu_Jinzhou, other Tungusic), where X represents an East Asian population (Figure 7(A)). Similar patterns were observed for Korean_Antu (Supplementary Figure S3). The f4 values show a closer relationship of Southeast or East Asians with Manchus than with other Tungusic groups, which we suspect was caused by gene flow from West Eurasians into other Tungusic groups and by genetic influence from southern China on Manchus. We used Bronze Age Russia_Afanasievo pastoralists from the Eurasian Steppe as a western source to infer possible West Eurasian influence in the studied Manchus and three Tungusic populations, Hezhen, Oroqen, and Xibo (Supplementary Table S4). We did not detect significant evidence of West Eurasian-related influence in Manchus but found small proportions of western ancestry (4% to 6%) in the other three Tungusic populations.

Population relationships between ancient populations
Manchus and Koreans have the closest affinity with ancient Yellow River (YR) populations and Yankovsky_IA (Iron Age individuals from the Far East), as shown by outgroup-f3(Studied population, Ancient; Mbuti) (Figure 6(C,D)). These ancient Yellow River and West Liao River basin populations share more genetic drift with the studied populations than with Tibetan, Hmong-Mien, Tai-Kadai, and other southern and northeastern populations (Figure 7(B,C)). However, compared with Han Chinese, Manchus and Koreans share similar amounts of genetic drift with the ancient populations of the Yellow River and West Liao River basins.
Koreans are spread across the Korean Peninsula and northeast China. We found that Koreans in South Korea are genetically homogeneous with our studied Koreans. Historical Koreans from the Three Kingdoms period (Korean_TK) share more genetic drift with our studied Koreans, whereas other ancient Koreans from the Neolithic onward show no significant difference in shared genetic drift compared with other East Asians (Figure 7(C)).
Considering the observed close relationship with Han Chinese, we focused on Han populations in f4(Mbuti, ancients; Manchu/Korean, East Asia) (Figure 7(B,C)). Ancient East Asian populations shared similar numbers of alleles with Manchus and with northern Han from Henan, Shanxi, and Shandong provinces. However, ancient populations from Southeast Asia, such as Malaysia_LN, shared more alleles with southern Han from Sichuan and Fujian provinces than with Manchus. The f4-statistics gave significantly positive Z-scores when we replaced Manchus with Koreans and used ancient groups from northern East Asia and southern Siberia as "ancients", showing that Koreans harbour more northern East Asian- or southern Siberian-related ancestry than Han Chinese.

Admixture model of studied Manchus and Koreans
We calculated admixture-f3 statistics using Manchus and Koreans as target populations and all modern populations across East and Southeast Asia as potential sources (Supplementary Table S1). For Manchus, Tibetan_Chamdo, Ulchi, or Nanai as northern sources combined with Ami or Tai-Kadai-speaking groups as southern sources produced the most negative f3 values. For Koreans, Northeast Asians such as Nanai or Ulchi combined with Tai-Kadai-speaking groups such as Mulam or Maonan produced the most negative f3 values. These negative f3 values indicate that the genetic formation of both Manchus and Koreans probably involved admixture between a northern and a southern source.
We next used qpAdm to model the genetic formation of Manchus and Koreans (Figure 8). Koreans can be fitted as deriving from a single source related to Bronze Age farmers from the West Liao River (WLR_BA). Neolithic or Bronze Age individuals from Korea (Taejungni and Ando) can be modelled as the single source of both the studied Manchus and Koreans, showing long-term genetic continuity in the West Liao River region and Korea. However, Manchus appear to have received additional genetic influence from southern China. In a two-way model, Manchus can be modelled as an admixture of approximately 17% Iron Age Taiwan-related ancestry and 83% WLR-related ancestry. The two-way model still held when the northern ancestral source was replaced by the Iron Age upper Yellow River group (Upper_YR_IA), indicating that the currently available data have limited resolution to distinguish Upper_YR_IA from WLR_BA. When we added Heishui_Mohe or Mongolia_XiongNu as a third source in three-way models (Supplementary Table S3), Manchus could be modelled as 7.5% Taiwan_Hanben, 84% YR_LN, and the remaining 8.5% Heishui_Mohe-related ancestry. To compare our targets with closely related groups, we next modelled the formation of neighbouring populations, including Koreans and Mongolians (Supplementary Table S3). With the same sources and outgroups, modern Koreans can also be modelled as an admixture of WLR_BA (85%) and Taiwan_Hanben (15%). Mongolians can be modelled with less southern ancestry from Taiwan_Hanben-related populations but more ancestry from Northeast Asians, such as Xianbei or Mohe.

Paternal and maternal haplogroup genotyping of each individual
Supplementary Table S5 lists the mtDNA and Y-chromosome haplogroups of the studied Manchus and Koreans. The studied Koreans in Antu have a maternal genetic profile similar to that of Koreans in South Korea (Lee et al. 2006), dominated by D4 lineages but also including A5, B4, F1, G1, M7, M9, N9, and Y1, as in other East Asians. The Y-chromosome haplogroups of the six male samples belonged to O1, O2, and C2, all lineages commonly found in Han Chinese and Koreans (Cai et al. 2009).

Discussion
As a minority that once established a highly influential regime, the Manchus have integrated and interacted with many ethnic groups over the centuries. During the Qing Dynasty, which the Manchus founded, Manchus intermarried extensively with Han Chinese. Over the last few centuries, large-scale migration from northeast China to the Central Plain through the Liaodong Corridor, together with the "Brave Journey to the Northeast" (known as "Chuang Guandong" in Chinese), brought extensive interactions between Manchus and populations from central and southern China. These historical processes may explain the genetic affinity between Manchus and other populations across China, including northern Sino-Tibetan and southern Tai-Kadai or Hmong-Mien groups in our study. Moreover, the close relationship between northern Han Chinese and Manchus has been reported in many previous studies (Tian 2004; Zhao 2007) and is confirmed again in this study.
Although Manchus are the largest population in the Tungusic language family, their genetic profile differs from those of other Tungusic groups such as the Oroqen, Hezhen, and Xibo. We found small proportions of West Eurasian-related components in other Tungusic groups but not in Manchus. Manchus have a close affinity with ancient populations from the West Liao River and Yellow River basins. Manchus can be modelled as an admixture of Bronze Age West Liao River farmers and Iron Age Taiwan_Hanben, indicating north-south admixture in the formation of the Manchu. A previous study found that Manchus can be modelled as deriving 32.4% of their ancestry from ancient Mohe people and the remainder from farming-related ancient populations in the Yellow River Basin (Zhang et al. 2021). However, the two-way model with Heishui_Mohe as a source failed in our analysis. One possible reason is that we retained approximately 200,000 SNPs after merging with the 1240k dataset, about 80,000 more than the previous study (almost 120,000 SNPs). More SNPs provide better resolution and may reveal subtle genetic differences that lead a model to be rejected, differences that could not be detected with fewer SNPs. Another reason may be our smaller sample size and the different sampling site: our Manchu samples were collected from Jinzhou, whereas those of Zhang et al. (2021) were from Xinbin.
Gelabert et al. showed that 4th-5th-century South Korean populations had varying proportions of indigenous Jomon-related ancestry, which does not survive in present-day Koreans (Gelabert et al. 2022). Consistent with that study, we did not detect Jomon-related ancestry in present-day Chinese Koreans. Our studied Koreans in Jilin showed a strong connection with present-day people from South Korea, supporting the view that Chinese Koreans originated from recent migration from the Korean Peninsula. Chinese Koreans can also be modelled as deriving ancestry from a single source related to WLR_BA, consistent with the transmission route of farming from northeast China to the Korean Peninsula and even the Japanese islands (Kwak et al. 2017; Kim and Park 2020).
Previous studies have shown that Manchus and Koreans are connected in language, history, and culture. The two languages are linguistically linked: Korean has many Manchu loanwords and shares the same grammatical structure. Chinese Manchus and Koreans are the main ethnic minorities in northeast China and have interacted extensively throughout history. In addition, both Chinese Manchus and Koreans have been deeply influenced by Han culture and have genetic exchanges with Han Chinese (Kim and Park 2020). The historical memory of native Manchus in Beizhen, Jinzhou, records their place of origin as Changbai Mountain (Supplementary Figure S1). Here we found that Manchus were genetically similar to Chinese Koreans from the Changbai Mountain area, which is consistent with the linguistic connection between Manchus and Koreans.
To further investigate the genetic profile of northeast China, more subgroups of Manchus and Koreans in this area should be sampled in the future. In addition, the reference dataset of modern populations was mainly generated with the Affymetrix Human Origins array, so only about 70,000 SNPs remained in the analysis after merging with the Human Origins dataset, which is a limitation of our research.

Figure 1. Overview of sampling sites in this study. (A) Geographical view of the sampling locations in East Asia. (B) Geographical details of the sampling sites in Liaoning and Jilin Provinces.

Figure 2. Genetic structure of ancient and present-day populations included in this study: principal component analysis (PCA) with ancient individuals projected onto modern East Asians. More details are shown in Supplementary Figure S3.

Figure 3. ADMIXTURE results for ancient and modern East Asians at K = 4. The cross-validation (CV) error is lowest at K = 4 (Supplementary Figure S2B), and the lowest CV error corresponds to the best unsupervised model. Other models (K = 2 to K = 8) are shown in Supplementary Figure S2A. The studied Korean_Antu contained more orange components than Manchus in Jinzhou, which indicates that Koreans might have more Northeast Asia-related ancestry.

Figure 4. Fst values between the studied Manchus and Koreans and modern East Asians. The smaller the differentiation between groups, the smaller the Fst value.

Figure 5.

Figure 6. Outgroup-f3 values shown as error-bar plots; error bars denote standard deviations. Groups are coloured by language family. Higher f3 values represent more shared genetic drift. (A,B) Outgroup-f3(Studied pops, Modern East Asia; Mbuti). (C,D) Outgroup-f3(Studied pops, ancient reference East Asians; Mbuti).

Figure 8. Admixture models from qpAdm. Each bar represents a successful model (p > 0.05) with our chosen outgroups. Each bar label gives the target population and the p-value of the model. Error bars denote standard errors estimated by the jackknife. Only successful models for the studied populations are shown; unsuccessful models and models with other target populations are shown in Supplementary Table S3.

The dominant paternal Y-chromosome haplogroup in the studied Manchus is O2a1c1 (Supplementary Table S5), similar to northern Han Chinese. In contrast, we identified a variety of mtDNA haplogroups in the studied Manchus, including B4, C4, D4, F1, F2, M8, and N9, most of which are also prevalent in Han Chinese. Surprisingly, we found a U5 haplogroup in the studied Manchus. Haplogroup U is broadly distributed among West Eurasians (Malyarchuk et al. 2010), so the presence of U5 may reflect West Eurasian maternal influence in the formation of the Manchu people.
Figure 7. Heatmap of f4-statistics. Significantly positive f4 values are marked with "+" and significantly negative f4 values with "-". Groups are coloured by language family. The tree was generated with the default parameters of the pheatmap function in R. The f4-statistics take the following forms: (A) f4(Mbuti, East_Asia; Manchu_Jinzhou, Tungusic), to estimate the genomic affinity of Manchus and other Tungusic populations with other modern East Asians; negative values indicate a closer affinity between Manchus and East Asians. (B,C) f4(Mbuti, ancient; Studied pops, East Asian), to estimate the genomic affinity between the studied Manchus or Koreans and ancient populations.