Phylogeography of Pterocarya hupehensis reveals the evolutionary patterns of a Cenozoic relict tree around the Sichuan Basin

Environmental factors such as mountain tectonic movements and monsoons can enhance genetic differentiation by hindering inter-and intraspecific gene flow. However, the phylogeographic breaks detected within species may differ depending on the different molecular markers used


Introduction
The distribution and genetic structure of many species have been influenced by environmental factors such as monsoons, mountain tectonics, and other historical/ecological processes [1−3] .In addition, the presence of geographic barriers, combined with species-specific characteristics such as dispersal mode of seeds or pollen, can potentially effect on the genetic differentiation of species [4−6] .Thus, understanding phylogeographic patterns and their potential influences on biomes is a primary objective of conservation and evolutionary biology [1,7] .
The Sino-Japanese Floristic Region (SJFR), known for its abundant diversity of temperate flora, has attracted significant attention from phylogeographers and paleo-ecologists [8−10] .Its high biodiversity is typically explained by the absence of continental glaciation and a smaller magnitude of Quaternary environmental change [11] .The Sichuan Basin, a unique geological structure in East Asia, acts as a mountain refuge for numerous relict species [12−14] .The phylogeographic break of relict plants around the Sichuan Basin is mainly related to uplift of the Qinghai-Tibet Plateau (QTP) during the late Pliocene, as well as intensification of the East Asian monsoon system (EAMS) [15−17] .Climate fluctuations since the Miocene may also have been a key determinant of the differentiation and colonization of ancient taxa [18,19] .However, the dynamics of species around the Sichuan Basin during the Miocene climate change-and how species characteristics and geographic barriers affected their genetic patterns-remain poorly understood.
The climate of Asia has changed dramatically since the Miocene [20−22] .Between the early and middle Miocene, the pattern of aridity began to change from "planetary" subtropical to "inland" [21] , and this was followed by formation and dominance of the monsoon climate [23,24] .This phenomenon is generally explained by uplift of the QTP [25,26] and cooling of the global climate during the middle Miocene [27] , which further enhanced East Asian summer and winter monsoons during this period (c.15-10 Mya) [28−30] .
Many phylogeographic studies have shown that several post-Miocene uplift and monsoon events are related to the genetic structure and genetic differentiation of plants [14,18,31] .Monsoons can also strengthen geographic barriers by giving rise to different climatic environments on either side of the barrier, promoting the formation of lineage discontinuities, such as the "Tanaka-Kaiyong Line" (TKL) [32,33] .Abundant summer precipitation, brought about by strengthening of the East Asian monsoon, is critical to the development and maintenance of subtropical evergreen broad-leaved forests in China [34,35] .Aridity is also an important factor affecting plant distribution [36] .With uplift of the QTP and strengthening of the Asian monsoon in the Early Miocene, inland drought began to occur and intensify in Asia [31,37,38] .This forced plants in the interior of Asia to retreat southward to find suitable habitats in subtropical regions with better hydrothermal conditions.
The Sichuan Basin, which is located in the second step of China's terrain, is surrounded by mountains.These mountains exhibit complex geomorphological and climatic characteristics that have created diverse habitats for relict species along steep ecological gradients [39,40] .Many physical and ecological barriers, such as mountains, rivers, and climate, are believed to drive population diversification and speciation around the Sichuan Basin [6,16] .Among the main phylogeographic patterns around the Sichuan Basin, the best-known is the 105°E line [6,41] .This line runs longitudinally across the Sichuan Basin, coincident with the boundary of the Sino-Himalayan and Sino-Japanese forest sub-kingdoms, dividing taxa into eastern and western lineages [13,42] .The TKL is another major phytogeographic boundary that traverses the mountains in the southwestern Sichuan Basin [43] .In addition, the Yangtze River (Three Gorges region) traverses the mountains in the eastern Sichuan Basin, forming a phylogeographic break that divides taxa into northern and southern lineages [31,44] .Monsoons, QTP uplift, mountains, and river basins have thus isolated biodiversity into areas of endemism or created lineages by impeding gene flow in dispersal-limited organisms [16,45,46] .
Most phylogeographic studies of plants in East Asia have used chloroplast molecular markers [14,17,18,47,48] .Chloroplast DNA (cpDNA) is passed down by uniparental maternal inheritance in most angiosperms and is transmitted by seeds alone, thus providing no pollen-related information [49] .However, pollen-mediated gene flow may dominate in wind-pollinated trees [5,50] .Asymmetrical gene flows mediated by pollen and seeds have been consistently demonstrated in phylogenetic studies of many temperate tree species [5,41,51] .Different transmission characteristics may result in different intraspecific genetic structures.For example, inconsistencies between cpDNA and nuclear DNA of Juglans cathayensis and Populus lasiocarpa have been attributed to biological traits (e.g.extensive pollen exchange and wind-dispersed seeds) that partly delay the genetic imprinting of long-term isolation [5,6] .When exploring the effects of past events on the evolutionary history of taxa, species-specific biological characteristics should therefore be taken into account.
Pterocarya hupehensis is a tree species endemic to China that is found in mountainous areas (between 700 and 2000 meters above sea level) around the Sichuan Basin [46,52,53] .It is a typical riparian relict species of subtropical evergreen broad-leaved forests [52] .P. hupehensis is monoecious, producing flowers in catkins (amenta), with female amenta terminal on new growth; it is thus a typical anemophilic, cross-pollinated species [54,55] .P. hupehensis has been assessed as vulnerable (VU) and is consid-ered to be of conservation concern [53] ; a better understanding of its population genetics and phylogeography can therefore guide appropriate protective actions.It is also an excellent case study for investigating potential drivers of genetic patterns in wind-pollinated relict species around the Sichuan Basin.
Here, we used cpDNA and restriction site-associated DNA sequencing (RAD-seq) datasets to reconstruct the evolutionary history and phylogeography of P. hupehensis.We then performed ecological niche modeling (ENM) to explore its suitable habitats from the past to the future.In particular, we addressed three questions.First, what is the genetic diversity of P. hupehensis populations around the Sichuan Basin?Second, how did this species respond to climatic fluctuations from the Neogene to the Quaternary periods?Third, what evolutionary processes contributed to the observed genetic patterns?

Sampling and sequencing
Leaves were collected from 18 natural populations covering the distribution range of P. hupehensis around the Sichuan Basin (Table 1).Individuals in each population were at least 50 m away from each other, and 156 individuals were sampled.Fresh leaves were dried using silica gel, and DNA was extracted from the dried leaf tissue using a modified CTAB method [56] .After removal of samples with low-quality DNA, 141 individuals were selected for cpDNA sequencing and 122 individuals for RAD sequencing (Supplemental Table S1).Voucher specimens of each individual were stored at the Shanghai Chenshan Botanical Garden Herbarium (CSH).After screening previously published universal primers, six cpDNA loci (psbD-trnT [57] , trnV(UAC)x2-ndhC [58] , trnL(UAG)-rpL32-F [57] , trnS-trnfM [59] , trnG-trnS [60] , and trnD-trnY [61] ) were selected for use in this study.PCR amplification was performed as described previously [62] .The amplified PCR products were sequenced by Sangon Bioengineering Co., Ltd.(Shanghai, China).

Chloroplast DNA sequence analysis
Chloroplast DNA sequences were assembled and checked using Sequencher v4.1.4(Gene Codes Corp., Ann Arbor, MI, USA).The sequences were aligned and calibrated using Clustal W implemented in MEGA 11 [63] and then manually calibrated and adjusted.The haplotypes of the chloroplast fragments were extracted using DnaSP v6.0 with default parameters [64] , and a haplotype distribution map was constructed using ArcGIS 10.2.Haplotype diversity (H d ) and nucleotide diversity (π) were calculated with Arlequin 3.5 [65] .Total gene diversity (H T ), within-population gene diversity (Hs), and population differentiation indices (G ST and N ST ) were calculated using PERMUT 2.0 [66] .The Median Joining model of NETWORK 10.2.0.0 [67] was used to construct the haplotype network.BARRIER v.2.2 was used to detect biogeographic boundaries evaluated by 100 replicates of population average pairwise difference matrices [68] .Molecular analysis of variance (AMOVA) with 1000 permutations was performed to examine genetic variation among and within populations using GenAlex 6.5 [69] .
BEAST 2.5 was used to estimate chloroplast haplotype divergence times under a log-normal relaxed clock [70] .We chose Juglans regia as an outgroup [71] .On the basis of the Akaike information criterion (AIC) implemented in Modeltest 3.7 [72] , the HKY+I+G model was selected as the best alternative model.
The age of the earliest conclusive Juglans L. fossil was used as the minimum age to constrain the stem of the haplotype tree (~50 Mya: 44.4-57.88Mya) [73] .We also used Pterocarya fraxinifolia and Pterocarya macroptera as outgroups, and the age of the earliest Pterocarya fossil was used to constrain the crown group (~30 Mya: 28-34 Mya) [52] .Markov chain Monte Carlo runs were performed for 10 million iterations with parameter sampling every 1000 generations.Convergence was assessed using Tracer v1.7 [74] , and the effective sample size (ESS) for all parameters was calculated.The first 20% of each run was discarded as burn-in using TreeAnnotator v1.8 (http://beast.bio.ed.ac.uk/TreeAnnotator).Mismatch distribution analysis and two neutrality tests, Tajima's D [75] and Fu's Fs [76] , were performed to estimate historical demographic expansions using Arlequin 3.5 [65] .

RAD-seq data processing and SNP filtering
The raw reads from all 122 individuals were cleaned to remove reads with uncalled bases and low quality scores using the process_radtags module in Stacks v2.62 [77] .The process_radtags module was also used to truncate the final reads to 120 bp.The Pterocarya stenoptera [78] reference genome was indexed with BWA v0.7.17 [79] , and the P. hupehensis RAD sequences were aligned to the indexed reference genome.The 'flagstat' command in SAMtools v1.16 was used to calculate the mapping rates and read numbers [80] .The Stacks v2.62 pipeline was used to process the RAD-seq reads [77] .The Gstacks module was used to identify single nucleotide polymorphisms (SNPs) at each locus in the population and genotype each individual for each identified SNP.The resulting BAM files were sorted with SAMtools [80] .The populations module in Stacks was used for data filtering and SNP calling with the following criteria: (1) greater than 80% of individuals in each population were processed for each locus using the parameter 'r = 0.8'; (2) the maximum observed heterozygosity was set to 0.7 with the parameter 'max-obs-het = 0.7'; (3) the minimum minor allele frequency (MAF) was set to 0.05; and (4) only the first SNP locus of each read was retained to avoid physical linkage.The variant dataset was then filtered for missing data using VCFtools v0.1.16 [81]with the parameter 'max-missing = 0.8'.Raw sequencing reads were deposited in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA961732.

Population structure and genetic diversity
Bayesian clustering was performed using Admixture [82] .The most probable values of K for explaining population structure were determined using the lowest cross-validation (CV) error rate.R v4.1.0 [83]was used to visualize the curve of the cross-validation error rate from one to ten and the population structure histogram.Population structure was also investigated using principal-component analysis (PCA) for 122 individuals with the R package "adegenet" [84] .The optimal number of lineages was selected on the basis of the lowest associated Bayesian information criterion.An individual-based maximum likelihood (ML) tree was constructed using IQ-TREE v1.6.12 [85] and contained three outgroups: P. macroptera, Cyclocarya paliurus, and Juglans mandshurica [46] .The nucleotide diversity (π), the expected and observed heterozygosities (H E and H O ), and the fixation index (F is ) among populations were calculated using the populations module in Stacks.AMOVA was performed using Arlequin v3.5 [65] to explore the degree of genetic differentiation among lineages, populations within lineages, and populations.

Population demographic histories
We used TreeMix v1.13 [86] to infer possible hybridization events among populations by obtaining allele frequencies from multiple populations and generating a ML tree.Migration

A c c e p t e d & U n -e d i t e d
events were analyzed from one to ten and then calibrated on the ML tree.The parameter "-noss" was used to prevent overcorrection.We calculated the percent variance explained in order to judge the migration events using the script 'treemix-VarianceExplained.R'.We set the program to use 10 migration events for generation of the ML tree.The standard errors of all entries in the covariance matrix estimated from the data were used to construct a heatmap.The migration tree and heat map were visualized using R v4.1.0.Fluctuations in effective population size were inferred using Stairway Plot 2 [87] , which implements an unsupervised learning strategy for model selection and supports both folded and unfolded site frequency spectra (SFS).The P. stenoptera reference genome was indexed, and unfolded SFSs were generated for both genetic groups and total populations using ANGSD v0.939 [88] .The effective population size was inferred for each lineage and for all populations using a mutation rate of 2.06 × 10 −9 per locus per year (with reference to Juglans) [89] and a generation time of 30 years.We used the recommended percentage of training sites (67%) to run the Stairway Plot 2 program.By default, 200 input files were created for each estimation.

Ecological niche modeling
The potential distribution areas of P. hupehensis were generated on the basis of all known distribution points using M AXENT 3.4.4 [91].The distribution points included the 18 sampling points used here, as well as distribution records from the National Specimen Information Infrastructure (http://www.nsii.org.cn/) and the Chinese Virtual Herbarium (https://www.cvh.ac.cn/).We deleted points corresponding to non-natural populations (parks, urban areas, etc.) using a coordinate backcheck.The R package "dismo" [92] was used to delete missing, duplicate, and incorrect coordinates.To reduce the influence of spatial autocorrelation on climate variables, we performed grid screening to remove coordinate points less than 2.5 arc-min (~4.6 km) apart using the R package "raster" [93] .We acquired the standard 19 bioclimatic variables from the WorldClim website (https://worldclim.org/)at a 2.5-arc-min spatial resolution [94] for four time periods: the present (1970-2000), the last glacial maximum (LGM, c.21 kya) [95] , the last interglacial period (LIG, c. 120 kya) [96] , and the 2061-2080 period under the representative concentration pathway (RCP) 4.5 scenario.We used the function variance inflation factor (VIF) from the R package "usdm" [97] to remove environmental factors with correlation coefficients >0.8.The following environmental factors were retained as predictors: annual mean temperature, mean diurnal range, temperature seasonality, precipitation of wettest month, precipitation seasonality, and precipitation of driest quarter.In total, 25% of the data were used for model testing and validation.Ten independent replicates were analyzed using the bootstrap method.The average results were used to reclassify the suitable areas in ArcGIS 10.2.The parameter for reclassification had five levels, 0-0.08, 0.08-0.25,0.25-0.5,0.5-0.75, and 0.75-1, indicating the degree of habitat suitability from low to high.
The phylogenetic network resolved two main haplotype lineages with 30 step mutations (western and eastern lineages) located in the eastern and western parts of the Sichuan Basin.The eastern lineage was further divided into two clades (clades II and III).Clade II included haplotypes from the eastern Sichuan Basin and the western Qinling population (YPC).Clade III was composed mainly of haplotypes from the southeastern Sichuan Basin (Fig. 1).The haplotype structures revealed by BEAST were similar to the groupings revealed by the phylogenetic network.The haplotype divergence between the western and eastern lineages dated to the middle Miocene (16.7 Mya), and clade II separated from clade III at approximately 8.5 Mya.The crown age of clade II was estimated at 5.3 Mya and that of clade III at 5.8 Mya (Fig. 2).
The values of Tajima's D and Fu's Fs were positive for P. hupehensis (D = 0.675, P = 0.693; F S = 0.196, P = 0.626), suggesting no expansion of its distribution.Mismatch distribution analysis showed multimodal distributions for all samples, also suggesting that this species has not undergone a recent demographic expansion (Supplemental Fig. S2).

Population structure and genetic diversity
We obtained 87 million RAD-seq reads from 122 samples.The alignment rates of the samples to the reference genome ranged from 49.49% to 98.23%, with an average of 86.72% (Supplemental Table S2).After restricting variants to those

A c c e p t e d & U n -e d i t e d
processed in >80% of the individuals in each population, 695,262 variant sites remained.After filtering, 2427 and 2889 SNPs with and without outgroups, respectively, were obtained for subsequent analyses.
Admixture analysis revealed that the genetic structure of P. hupehensis consisted of two lineages (Fig. 3; Supplemental Fig. S3).Populations located in the northwestern Qinling Mountains, together with one population (NYX) from the southwest, were assigned to the western lineage.The eastern lineage included all populations from the eastern Sichuan Basin, as well as one population from the northwest (YPC) (Fig. 3a).Genetic introgression was detected in five populations (western lineages LYG, MYG, DSP, and HH and eastern lineage FLC) (Fig. 3a, c).The core populations of the eastern lineage always clustered together when K = 2 to 5 (Supplemental Fig. S4).PCA also divided all populations into two lineages.The percentages of variation explained by PC1 and PC2 were 9.2% and 3.9%, respectively (Fig. 3).A ML tree showed that the eastern lineage evolved earlier than the western lineage (Supplemental Fig. S5), and the hybrid populations were primarily located at the branch ends of the ML tree.

Population demographic histories
The migration events in the ML tree showed two strong signals with a high migration weight, indicating unidirectional gene flow from the DFX population to the common ancestral populations of NYX and HJG and also from the HJG to MYG populations (Fig. 4a).Eight other gene flow signals were also detected.The longest horizontal branch was that of the MYG population, indicating that it had undergone the greatest genetic drift compared with the other populations (Fig. 4a, Supplemental Table S3).The topology of the genetic relationships in the ML tree was consistent with the results of admixture and IQ-TREE analyses.Stronger introgression events among populations were shown by the residual heatmap than by the ML tree inferred with Treemix (especially in the SHJZ, YPC, DFX, and QDZ populations) (Fig. 4a, b).The BEAST result based on the phylogeny from the RAD-seq datasets showed   Fluctuations in the effective population size of P. hupehensis were estimated to have occurred from approximately 1.0 Mya in the Late Pleistocene (Fig. 4c).Stairway Plot 2 analysis revealed a genetic bottleneck event at about 200-400 kya in which the effective population size dropped to about 1.4×10 4 to 2 × 10 3 individuals.The effective population size then climbed to an upper limit of ~3.2 × 10 4 individuals at about 200 kya and remained steady between 120-140 kya during the LIG.From the LGM period to the Holocene, effective population size gradually decreased to its lowest level of 0.1 × 10 3 individuals (Fig. 4c).The western and eastern lineages exhibited similar trends in population size fluctuation.However, the bottleneck event occurred slightly earlier in the western lineage than in the eastern lineage (Supplemental Fig. S7a, S7b).

Ecological niche modeling
The Area Under Curve (AUC) value of the receiver operating characteristic (ROC) curve was high (>0.977) in the four periods (Supplemental Fig. S8a-S8d).Annual mean temperature (35%) and mean diurnal range (26%) made the greatest contributions to the model under the current climate (Supplemental Fig. S9, Supplemental Table S4).The potential distribution of P. hupehensis under the present climate matched its current distribution; suitable area for its growth was limited to the mountains around the Sichuan Basin, especially in the northern and eastern regions (Fig. 5a).During the LGM period, the suitable area was significantly smaller in the southern and western regions of the Sichuan Basin but greater in the Qinling Mountains (Fig. 5b).The suitable area for P. hupehensis around the Sichuan Basin was greatest during the LIG period (Fig. 5c).Notably, during this period, an area of high suitability appeared in the Hengduan and Daliang Mountains.Under the RCP 4.5 scenario for 2061-2080, the suitable area for P. hupehensis was predicted to be just slightly smaller than at present, and the suitable areas in the western and southern Sichuan Basin were predicted to shrink under global warming (Fig. 5d).

Significant chloroplast genetic structure
Pterocarya hupehensis, like other relict tree species in subtropical China, shows strong phylogeographic structure (N ST > G ST ) based on cpDNA markers [5,41,42] .Most of the haplotypes are private, except for H1 and H7, which are shared among populations.As reported for other angiosperm species, genetic differentiation among lineages of P. hupehensis was higher when assessed with cpDNA (F CT = 0.75) than with nuclear DNA (F CT = 0.22).Such significant chloroplast genetic structure is typically attributed to maternal inheritance, as chloroplasts are only transmitted by seeds, which have a limited dispersal distance.Seed-mediated gene flow can be influenced by seed dispersal ability, germination and dormancy, seedling establishment ability, and so forth.The wingnuts of P. hupehensis can be carried over short distances  by wind and dispersed over long distances along rivers.P. hupehensis, as a relict species, is present in restricted microhabitats and shows strong niche conservatism (moist riparian forests along rivers with an elevational range of 700-2,000 m in the mountains) [46,98−103] .It produces relatively few seedlings, even when abundant seeds are present, owing to low seed quality and difficult seedling establishment [52,53] .Taken together, these factors have impeded the exchange of chloroplasts among populations and shaped current genetic patterns.

Different nuclear genetic structure
Optimal clustering based on nuclear genetic structure (from RAD-seq data) divided P. hupehensis into two genetic clades in both ADMIXTURE analysis and PCA.More subdivided genetic structures were recognized in the admixture analyses when K values were larger.Most populations in the eastern lineage were maintained in a single group from K = 2 to K = 5, indicating extensive gene flow via pollen dispersal.However, the numerous and private chloroplast haplotypes in the eastern lineage suggested their independent evolution (Fig. 1, Supplemental Fig. S3).Genetic differentiation among all populations was higher for cpDNA (F ST = 0.98) than for nuclear DNA (F ST = 0.34).Compared with maternal inheritance of cpDNA, parental inheritance of nuclear DNA typically exhibits more imprints from pollen flow.Thus, the inconsistency in genetic structure between cpDNA and nuclear DNA (especially for wind-pollinated species) can be understood as arising from differences between seed-mediated and pollen-mediated gene flows [5,104,105] .The present study provides another example of a temperate tree species with stronger genetic structure in the chloroplast genome (seed-mediated gene flow) than in the nuclear genome (pollen-mediated) [42,106,107] .Gene flows mediated by long-distance pollen dispersal were detected here and have been demonstrated in many other anemophilous tree species, such as Pterocarya fraxinifolia, Quercus ruber, Zelkova carpinifolia, and others [108−110] .The strong East Asian monsoon that began in the early Miocene may have enabled the spread of pollen over long distances and thus promoted gene flow among populations [111] .
Extensive pollen flows of anemophilous tree species have been reported to facilitate genic exchange and delay genetic  differentiation in species with restricted distributions [5] .Such genic exchange can improve population adaptation, particularly for tree species with slow evolutionary rates, high pollen dispersal capacity, and weak reproductive ability [112,113] .Compared with other wind-pollinated species, P. hupehensis has a relatively high level of genetic differentiation [114,115] , which may reflect the influence of slower pollen-mediated gene flow, higher levels of genetic drift, and local adaptation due to selection pressure associated with long-term environmental heterogeneity [114−116] .bottleneck event and small population sizes of P. hupehensis may have led to high levels of genetic drift.In addition, long-established small and isolated populations of P. hupehensis are likely to have experienced more environmental selection pressure, which may also have contributed to a high level of genetic differentiation [117,118] .

Colonization occurs during the wet and rainy monsoon
Reconstructions of divergence times based on cpDNA and nuclear DNA revealed that the eastern and western lineages of P. hupehensis diverged during the early to middle Miocene.A similar pattern was reported for Cyclocarya paliurus, which also belongs to the Juglandaceae [18] .By contrast, most relict tree   [6,13,42] .Initial intensifications of the East Asian summer and winter monsoons began in the early Miocene owing to the rapid uplift of the Tibetan Plateau [20,28] .Abundant precipitation associated with the monsoons, together with subsequent cooling during the midto-late Miocene and early Pliocene, promoted speciation and lineage differentiation of plants in East Asia [16,119,120] .The SJFR contains a rich diversity of temperate flora, which benefited from the changes in precipitation pattern and incomplete glacial coverage of the Quaternary glaciation [10,121−123] .Our species distribution modeling provides further evidence that temperature and precipitation are the most important climatic predictors of suitable habitat for P. hupehensis.Both P. hupehensis and C. paliurus inhabit wet habitats near riverbanks or streams with high-humidity microclimates.Previous research has suggested that the characteristic of naked buds on temperate trees, as exhibited by P. hupehensis, may be associated with colder temperatures and summer precipitation [18,124] .

Multiple driving forces for lineage differentiation
The Sichuan Basin acts as a geographic barrier between the Sino-Himalayan and Sino-Japanese Forest subkingdoms (more or less along the 105° E line) [125,126] , affecting patterns of genetic diversity and structure for many relict species (e.g., Davidia involucrata [13] , Dysosma versipellis [17] , and Primula ovalifolia [127] ).Both chloroplast and nuclear evidence demonstrate that P. hupehensis has also been influenced by this geographic barrier.The NYX and DFX populations provide a clear illustration of this barrier, as they are close geographically but contain chloroplast haplotypes and nuclear genes from different lineages.The western and eastern lineages of P. hupehensis appear to have diverged at the end of the early Miocene, which was followed by intensification of the EAMS.A previous study also detected a monsoon-driven phylogeographic break between western and eastern lineages of relict species around the Sichuan Basin [16] .
North-south lineage divergence in the Three Gorges region of the east Sichuan Basin has been documented in plants [14,31,44] and animals [126] .Here, chloroplast haplotypes in the eastern lineage of P. hupehensis were further divided into northern and southern clades by the Yangtze River in the Three Gorges region.The barrier of the Three Gorges region blocks gene flow by limiting seed dispersal and animal migration.The absence of this phylogeographic break in nuclear gene analyses can be attributed to long-distance pollen dispersal by wind, and the results presented here for P. hupehensis are a good example of this phylogeographic inconformity.
The origin and formation of the Yangtze River and the Sichuan Basin were synchronized with the uplift of the Tibetan Plateau [128] , although details of the age and developmental history of the Yangtze River have been vigorously debated for more than 100 years [129] .Our results suggest that the Three Gorges barrier to plant gene flow may be traced back to the late Miocene, 5 Mya before the speculated formation time of the Yangtze River [130] .Thus, the phylogeographic break for some relict trees in the Three Gorges region may have occurred in the late Miocene.Overall, our data suggest that significant changes in climate and geography around the Sichuan Basin promoted phylogeographic breaks between the western and

Population demographic history after the Pleistocene
Climatic fluctuations in the Pleistocene glacial and interglacial periods had substantial effects on many relict plants.P. hupehensis experienced a population expansion during the warm LIG period and a slight shrinkage during the LGM period.However, we estimated relatively little fluctuation in its distribution around the Sichuan Basin from the Pleistocene to the future, consistent with findings for some other species [6,126] .Because the micro-environment of the mountains around the Sichuan Basin may have been more stable than that of other regions, East Asia acted as a biodiversity sanctuary during the LGM period.There was a brief cooling period at 120-350 kya (penultimate [Riss] glacial period) before the longest period of Pleistocene warmth [131] , and we detected a bottleneck event during this glacial period (200-400 kya).We speculate that temperature may have had an important effect on the distribution of P. hupehensis.The bottleneck effect can cause a series of negative chain reactions, especially for small populations, resulting in a loss of genetic diversity [132,133] , high levels of population isolation [134] , and altered fitness because of genetic drift and inbreeding [135] .The occurrence of bottlenecks or adversity is likely to lead to high levels of genetic drift [132,136] .
Our field investigations revealed that the current population of P. hupehensis is small and fragmented, with only a few dozen individuals in some populations [53] .P. hupehensis is currently listed as vulnerable on the IUCN Red List of Threatened Species.Our findings highlight the risk of a gradual decline in effective population size in the event of renewed adversity (Fig. 4c).Thus, future work should aim to assess the genomic vulnerability of each population, and both ex situ and in situ conservation of these small populations should be improved [116] .We should focus not only on the effect of global warming and greenhouse gas emissions on this species but also on the interference of human activities with its natural habitat.

Conclusions
We used cpDNA and nuclear DNA data to reconstruct the phylogeographic history of P. hupehensis.Both cpDNA and nuclear genetic data revealed two distinct lineages corresponding to two phylogeographic regions.However, the cpDNA data suggest a relatively isolated and stronger phylogeographic structure than the nuclear data.This result suggests that pollen flow plays a more important role than seed flow in shaping genetic structure.External geologic and climatic changes have also influenced current genetic distribution patterns.Strengthening of the EAMS during the early to middle Miocene appears to have been the main driver of colonization and differentiation in P. hupehensis.The Three Gorges region, which acts as a seed dispersal barrier, promoted further north-south differentiation among the eastern lineages.In addition to population genetics studies and modeling, more efforts should be directed toward searching for P. hupehensis populations in the western Sichuan Basin.Genetic resources of P. hupehensis from the western and southwestern Sichuan Basin should be given priority.

Fig. 1
Fig. 1 Distribution of 24 chloroplast DNA (cpDNA) haplotypes of Pterocarya hupehensis.The lower right panel shows the minimum spanning network of 12 chlorotypes.Circle sizes are proportional to the number of samples per haplotype.The red dashed line on the map represents the Sino-Himalayan/Sino-Japanese forest boundary.The black dashed line represents the phylogeographic break along the Yangtze River in the eastern Sichuan Basin.The black dotted lines delineate three phylogroups with closely related chlorotypes.

A c c e
p t e d & U n -e d i t e d that the western lineage separated from the eastern lineage at 16.78 Mya (95% HPD: 11.99-22.25Mya) (Supplemental Fig. S6), very similar to the divergence time of haplotypes between the western and eastern lineages estimated from cpDNA data (16.7 Mya, 95% HPD: 11.8-21.9Mya).

Fig. 2
Fig. 2 BEAST-derived chronogram of 24 Pterocarya hupehensis haplotypes based on six chloroplast DNA (cpDNA) fragments.Blue bars and numbers above bars indicate the 95% highest posterior densities (HPDs) of the time estimates.Posterior probabilities are labeled at each node.
A c c e p t e d & U n -e d i t e d

Fig. 3
Fig. 3 Genetic structure of Pterocarya hupehensis.(a) Geographic origins of 18 P. hupehensis populations and their color-coded grouping at close to K = 2.The red dashed line represents the Sino-Himalayan/Sino-Japanese forest boundary.(b) Principal component analysis (PCA), with pink and blue colors representing two clusters.(c) Histogram of the Admixture analysis of 122 individuals from 18 populations of P. hupehensis based on RAD-seq data for 2889 SNPs.
A c c e p t e d & U n -e d i t e d

Fig. 4
Fig. 4 Hybridization among populations of Pterocarya hupehensis.(a) Maximum likelihood (ML) tree inferred using Treemix; gene-flow events are depicted with arrows, and ten migration events were allowed.Migration arrows are colored according to their weight.Horizontal branch lengths are proportional to the amount of genetic drift that occurred on each branch.The scale bar shows 10× the average standard error of the entries in the sample covariance matrix.(b) Heatmaps of residual fit from the ML tree.Residuals white through blue indicate that the corresponding populations are more closely related to each other than on the ML tree, suggesting confounding events between these populations.(c) Demographic history of P. hupehensis inferred with Stairway Plot 2 using unfolded site frequency spectra.The 95% confidence interval for estimated effective population size is shown with gray lines.
A c c e p t e d & U n -e d i t e d species in this area diverged during the Pliocene, including Davidia involucrata (4.81 Mya), Euptelea pleiosperma (3.64 Mya), and Populus lasiocarpa (3.66 Mya)

Fig. 5
Fig. 5 Results of ecological niche modeling of P. hupehensis in four time periods from past to future.(a) Average projection of the model to present climatic conditions.(b) Average projection of the model to the last glacial maximum (c.21 kya).(c) Average projection of the model for the last interglacial (c.120-140 kya).(d) Average projection of the model to the year 2070 (2061-2080) under an intermediate climate warming scenario (RCP 4.5).Colors from blue to red represent the degree of habitat suitability for P. hupehensis survival, from unsuitable to suitable.

A c c e
p t e d & U n -e d i t e d eastern lineages, whereafter in the eastern lineage of P. hupehensis.

A c c e
p t e d & U n -e d i t e d A c c e p t e d & U n -e d i t e d A c c e p t e d & U n -e d i t e d

Table 1 .
Sample codes, sample locations, and genetic diversity indices for 18 populations of Pterocarya hupehensis based on RAD-seq data and cpDNA data.

Table 2 .
Analyses of molecular variance (AMOVAs) based on cpDNA data and RAD-seq data for Pterocarya hupehensis from western and eastern lineages.