Genomic clines across the species boundary between a hybrid pine and its progenitor in the eastern Tibetan Plateau

Most species have clearly defined distribution ranges and ecological niches. The genetic and ecological causes of species differentiation and the mechanisms that maintain species boundaries between newly evolved taxa and their progenitors are, however, less clearly defined. This study investigated the genetic structure and clines in Pinus densata, a pine of hybrid origin on the southeastern Tibetan Plateau, to gain an understanding of the contemporary dynamics of species barriers. We analyzed genetic diversity in a range-wide collection of P. densata and representative populations of its progenitors, Pinus tabuliformis and Pinus yunnanensis, using exome capture sequencing. We detected four distinct genetic groups within P. densata that reflect its migration history and major gene-flow barriers across the landscape. The demographies of these genetic groups in the Pleistocene were associated with regional glaciation histories. Interestingly, population sizes rebounded rapidly during interglacial periods, suggesting persistence and resilience of the species during the Quaternary ice age. In the contact zone between P. densata and P. yunnanensis, 3.36% of the analyzed loci (57 849) showed exceptional patterns of introgression, suggesting their potential roles in either adaptive introgression or reproductive isolation. These outliers showed strong clines along critical climate gradients and enrichment in a number of biological processes relevant to high-altitude adaptation. This indicates that ecological selection played an important role in generating genomic heterogeneity and a genetic barrier across a zone of species transition. Our study highlights the forces that operate to maintain species boundaries and promote speciation in the Qinghai-Tibetan Plateau and other mountain systems.


INTRODUCTION
Mating barriers and adaptive divergence are major drivers of speciation (Mayr, 1942;Nosil et al., 2005). Knowledge about the genetic and ecological causes of species differentiation is thus the key to understanding speciation and the distribution of biodiversity across landscapes and ecosystems. Hybrid zones or secondary contact zones provide opportunities to study contemporary dynamics of selection and evolution of reproductive isolation on a tractable timescale (Barton, 2001;Petit and Excoffier, 2009). Both theoretical and empirical studies suggest that species boundaries can be maintained by pre-mating barriers and post-mating intrinsic genetic incompatibilities between hybridizing species, and/or bounded hybrid superiority in intermediate habitats that maintain restricted hybrid zones and separate the parent populations (Moore, 1977;Barton and Hewitt, 1989;Nosil et al., 2005;Mao and Wang, 2011;Hamilton et al., 2013). Fitness variations along geographical and ecological transitions are expected to generate genomic heterogeneity in introgression, with the direction and rate of introgression influenced by the selective advantage or disadvantage of genetic variants in different genomic and ecological backgrounds (Barton, 2001;Nosil et al., 2009;. Genomic heterogeneity can, however, also result from neutral processes such as drift, population structure, and linked selection (Nosil et al., 2009;Cruickshank and Hahn, 2014;Ravinet et al., 2017;Colicchio et al., 2021). One way to disentangle these causes is to compare the frequencies of alleles at each locus with a genome-wide average representing the genomic ancestry of an individual or population, and evaluate genomic clines in rates and patterns of introgression among loci Fitzpatrick, 2013). Loci of exceptional introgression can be regarded as candidates that might underlie hybrid fitness or assortative mating Fitzpatrick, 2013). Further association analyses of genomic and environmental clines could provide additional evidence about the role of ecological selection in species delineation and yield important insights into species boundary dynamics and genomic heterogeneity.
Gymnosperms, especially conifers, are mostly outcrossing and show weak genetic incompatibilities even among taxa that diversified more than 10 million years ago (Critchfield, 1975;Zhao et al., 2014). This creates ample opportunities for gene exchange between parapatric or partially overlapping species when pollen dispersal is extensive. Even so, each species has a clearly defined distribution range and ecological niche (Mirov, 1967;Mao and Wang, 2011;Jin et al., 2021), suggesting that intrinsic and extrinsic forces act to define species barriers. However, insights into the forces that operate across species boundaries in the presence of gene flow remain limited.
Pinus densata forms extensive forests in the southeastern Tibetan Plateau . Intermediate forms of its needle and cone traits led to the suggestion that P. densata was a hybrid between Pinus tabuliformis and Pinus yunnanensis (Wu, 1956;Cheng and Fu, 1978). All three species are diploid (2n = 24) and typically wind-pollinated, outcrossing conifers. A series of genetic studies based on mitochondrial (mt), chloroplast (cp), and nuclear DNA provided evidence to support its hybrid origin in the late Miocene (Wang and Szmidt, 1994;Wang et al., 2011;Gao et al., 2012). These studies revealed an ancient hybrid zone between P. tabuliformis and P. yunnanensis in the northeastern margin of the extant P. densata range, where maternal mtDNA haplotypes characteristic of P. tabuliformis and P. yunnanensis were present, creating high mt haplotype diversity in the region (Song et al., 2002;Wang et al., 2011). From this hybrid zone, P. densata migrated westward and colonized its current distribution Gao et al., 2012;Zhao et al., 2020). During the westward migration, founder events, allele surfing, and physical environments contributed to distinct population differentiation between geographical regions Gao et al., 2012;Zhao et al., 2020). The demographic histories of these populations, which have not yet been studied, would provide additional information about the process of establishment of P. densata on the plateau and the impacts of regional geological events on its persistence and genetic diversity.
The three species are currently allopatric, with P. tabuliformis distributed in northern China, P. yunnanensis in southwest China, and P. densata in between them (Supplemental Figure 1). Two contact zones between pairs of these species are located in the southern and northern margins of the P. densata distribution, and the contact zone with P. yunnanensis is much wider than that with P. tabuliformis . The evolutionary history and distribution of this species complex raise an intriguing question: how do newly evolved hybrid species maintain genetic boundaries with their progenitors?
To explore these questions, we performed genome-wide population genetic analyses using exome capture sequencing with 40 000 exome probes in a range-wide collection of P. densata populations and representative P. tabuliformis and P. yunnanensis populations. The choice of genotyping method was dictated by the large (25 Gbp) and highly repetitive genome of pine (Niu et al., 2022), which constrains the feasibility of resequencing-based population studies. Exome capture sequencing enriches the representation of targeted exons while filtering away repetitive regions. We then focused on the broad contact zone between P. densata and P. yunnanensis to examine genomic patterns of introgression. We aimed to (1) characterize genetic variation over the range of P. densata to discern the clinal genetic and gene flow patterns across the landscape, (2) reconstruct the demographic histories of P. densata to understand the impact of climate oscillation on population persistence, and (3) quantify patterns of genomic clines across a species transition zone to define the role of selection in maintaining species barriers. Our results show that introgression, geographical barriers to gene flow, persistence during glacial cycles, and ecological selection over heterogeneous landscapes have contributed to the strong population structure in P. densata and the buildup of intrinsic barriers with its progenitor. This pattern may be generally applicable to plant speciation and diversification on the Qinghai-Tibetan Plateau (QTP), one of the most prominent biodiversity hotspots on Earth.

Population structure and genetic diversity
We included 24 populations of P. densata, two representative populations of P. tabuliformis, and nine populations of P. yunnanensis in this study ( Figure 1A). Exome capture sequencing of 268 sampled trees generated an average of 5.82 million reads per tree, with a mean depth ranging from 173 to 503. Alignment of the sequence reads to the Pinus taeda genome yielded an average of 15.62 Mbp of genomic sequence covered by at least five reads per individual. After stringent filtering, we retained 8 161 108 sites for all 268 individuals. After removal of non-polymorphic sites, 458 321 SNPs were retained; a further minor allele frequency (MAF) <0.05 filtering reduced the final dataset to 79 028 SNPs. Exclusion of Pinus sylvestris (outgroup) samples yielded 77 368 SNPs (MAF >0.05) in the 260 samples of P. tabuliformis, P. yunnanensis, and P. densata.
FastSTRUCTURE determined K = 5 to be an optimal number of clusters to explain the genetic structure in the P. densata species complex ( Figure 1B). Similar to the report of Zhao et al. (2020), one cluster was unique to P. yunnanensis, and the other four clusters split P. densata into distinct geographic zones: west (W), southwest (SW), central (C), and east (E). The eastern range of P. densata was dominated by P. tabuliformis ancestry ( Figure 1B). This grouping was further supported by principalcomponent analysis (PCA), showing clear division of group E and P. tabuliformis from the other four groups/species along PC1, and separations of the other groups/species along PC2 and PC3. The first three PCs jointly explained 24.50% of the total variation (Tracy-Widom test, p < 0.001, Figure 1C). A neighbor-joining (NJ) tree also recovered a topology that assigned samples to five major groups with admixed individuals occupying intermediate positions along branches ( Figure 1D). Differentiation among the four groups of P. densata (E, C, W, and SW) was significant, with F ST between the groups ranging from 0.083 to 0.294 with an average value of 0.176, which was 5-fold higher than the average F ST (0.036) between populations within each group (Supplemental Table 1). The highest F ST value was observed between the peripheries of W and E. Notably, nucleotide diversity (p) in each group of P. densata decreased gradually from east (p = 0.0040) to west (p = 0.0023) (Table 1, Figure 1E). Tajima's D for each group was negative, which indicated population expansion in the recent past (Table 1).
The high F ST values among geographical regions motivated us to generate an overview of gene flow across the landscape. The estimated effective migration surface (EEMS) highlighted scattered barriers to gene flow ( Figure 1F) that mostly coincided with population subdivision. One strong barrier in the eastern part of the P. densata distribution divided group E from the rest of the distribution. Another extensive barrier separated group C from the western range. Within the C region, minor blocks were also present. Further to the west, another block restricted migration between groups SW and W. Regions where gene flow was higher than expected only existed within each geographical zone. All these barriers and corridors were supported by high posterior probability (>0.90) of the migration parameter (Supplemental Figure 2). It should be noted that our sampling focused on representation of major geographical regions of P. densata, and extrapolation beyond the sampling range should therefore be interpreted with caution.

Demographic history
To understand the demographic history of the distinct genetic groups in P. densata, we plotted the changes in their effective population size (N e ) over time ( Figure 2). Group E increased steadily from $4 Mya, with a slight population contraction at $1 Mya, leading to a contemporary N e of $1.60 3 10 5 . Group C experienced a steep decline at $0.5 Mya followed by a quick recovery to the previous N e of 0.81 3 10 5 and has remained relatively stable since. Group SW experienced a contraction at about 0.4 Mya followed by a recovery and then remained at a stable N e of 0.32 3 10 5 . Group W had a weak expansion around 0.3 Mya and then maintained a low N e ($0.23 3 10 5 ) over the past 0.3 million years. To validate these results, we used fastsimcoal2 (Excofffier et al., 2021) to establish the most likely demographic scenarios for these groups with multipopulation models that consider different modes of gene flow and population size changes. The recovered best-fitting model largely agrees with the Stairway results confirming the bottlenecks in C, with an early contraction at 0.8 Mya in the ancestral population of C and a more recent one at 0.1 Mya. However, the strength of the first bottleneck in C was not as strong as that shown by Stairway, likely owing to differences in accounting for gene flow between the two methods. A detailed presentation of the fastsimcoal results is available in the supplemental information.
Introgression between P. densata and P. yunnanensis Detection of introgression using TreeMix suggested that the most likely number of gene flow events (m best ) was two, under which the model explained 99.97% of the variation in the data (Supplemental Figure 3). One strong signal of gene flow (29.30%; i.e., the estimated fraction of ancestry in the receiving population derived from the donor) was from P. yunnanensis to group SW of P. densata, and another weaker signal (12.42%) was between group E and the ancestor of groups SW and W ( Figure 3A). ABBA-BABA tests identified eight of the 20 trios with significant D values (p < 0.001, Supplemental Table 2). By calculating the f branch, we assigned significant introgression on the E branch with the other groups of P. densata and P. yunnanensis ( Figure 3B). This gene migration event should be ancient, as contemporary gene flow from E is geographically restricted ( Figure 1F). This result agrees with previous studies that recognized the E region as the place of origin of P. densata (Song et al., 2002;Wang et al., 2011;Gao et al., 2012;Zhao et al., 2020). The strongest introgression occurred between group SW and P. yunnanensis and was also detected by TreeMix. Another gene flow event occurred between group C of P. densata and P. yunnanensis ( Figure 3B). Overall, these results provided evidence of pervasive introgression between P. yunnanensis and P. densata.

Genomic clines and functional annotation of outliers
The detected introgression between P. yunnanensis and P. densata led us to explore genomic clines in the species complex to better understand the roles of gene exchange and selection on adaptive evolution. We selected 17 populations along a transition zone that runs from P. densata to P. yunnanensis to examine patterns of genomic clines. We estimated the degree of admixture and hybrid index for each individual ( Figure 4A) and found that hybrid index correlated strongly with elevation (Pearson's r = À0.9119, p < 0.01, Supplemental Figure 4A). Individuals at lower elevations were genetically more similar to P. yunnanensis, whereas those at higher elevations inherited more of their alleles from P. densata.
We ran Bayesian estimation of genomic cline (BGC) analysis on 57 849 SNPs in the 17 populations to identify loci with exceptional introgression in the admixed populations. We found 1941 outliers, including 640 positive a, 1227 negative a, 64 positive b, and 14 negative b outliers (Figures 4B and 4C; Supplemental Table 3). The 640 positive a outliers and 1227 negative a outliers represented excess ancestry in P. yunnanensis and P. densata, respectively. The b parameter represents the rate of introgression, and negative and positive outliers represent loci that may be associated with adaptive introgression and reproductive isolation, respectively. Four of the b outliers were also a outliers.
We annotated the detected outliers to gain a preliminary understanding of their biological significance. Of the 1941 BGC outliers, 684 were located in intergenic regions and the remaining 1257 were in 702 genes (hereafter referred to as BGC-outlier genes, Supplemental Table 3). We classified the 702 BGC-outlier genes into 123 biological categories that were significantly enriched in Gene Ontology (GO) terms ( Figure 4D; Supplemental Table 4; p < 0.05), including 77 biological process, 32 molecular function, and 14 cellular component terms. In the biological process category, we identified, in addition to growth and material transport and metabolism, a number of terms related to highelevation adaptation. These included terms related to DNA repair (GO:0033683, GO:0006296, GO:0000731, GO:0006301), carbohydrate metabolism (GO:0005351 and GO:0005402), and flavonoid biosynthesis (GO:0009962 and GO:0009963); flavonoids act as sunscreens and antioxidants to prevent damage and/or help plants adapt to a wide range of stress conditions in alpine environments (Baskar et al., 2018).

Association of genomic outliers with environments
We performed redundancy analyses (RDAs) as another test for signatures of genetic adaptation. Forward selection retained three environmental variables (annual mean temperature [bio1], precipitation seasonality [bio15], and wet-day frequency [WET]) and one distance-based Moran's eigenvector map (dbMEM) . Their variance inflation factors (VIFs) were all below 10, suggesting that multicollinearity among these predictors was properly controlled.
In the partial RDA that controlled for the spatial component, environment had a large and significant effect (p = 0.001, adjusted R 2 = 26.9%) on allele frequencies among populations ( Figure 5A). When controlling for environment, the exclusive contribution of spatial distances was not significant (p = 0.059, adjusted R 2 = 1.70%; Figure 5A). Partial RDA identified two significant RDA axes (p < 0.05) that explained 8.40% and 1.92% of the total variance. The first axis was primarily driven by bio1, whereas the second was associated with bio15 and WET.
RDA identified a total of 387 outlier SNPs significantly correlated with environmental predictors (Figures 5B and 5C; Supplemental Table 3 Lines represent the median effective population size (N e ) over time, and shaded areas represent 95% confidence intervals. The major glaciations on the QTP, i.e., the Xixiabangma (1.2-0.8 Mya), Naynayxungla (0.7-0.5 Mya), and Guxiang (0.3-0.1 Mya) Glaciations, are highlighted with gray vertical bars. outliers ( Figure 5B). Of the 102 RDA-BGC shared outliers, 46 were located in intergenic regions and 56 in 38 genes (Supplemental Table 3). These genes were divided among biological functions influencing growth and development of plants (Supplemental Tables 3 and 4 ). We examined the association of these 102 loci with environment and found that 61 loci were mostly correlated with bio1, 19 loci with bio15, and 22 loci with WET ( Figure 5C; Supplemental Table 3). We selected three loci to illustrate their clinal variation along climate gradients. Locus scaffold653363_26548 is located in a squamosa promoter binding protein-box gene (SPL8) that regulates flowering, growth and development, and drought tolerance (Unte et al., 2003;Gou et al., 2018). Allelic distribution of this locus was strongly correlated with bio1 ( Figure 5D). Locus tscaffold3033_455200 was annotated as a galactinol synthase gene (GolS4) whose encoded enzyme synthesizes galactinol and protects plant cells from oxidative damage (Nishizawa et al., 2008). The allele frequency of this locus co-varied with the precipitation gradient bio15 ( Figure 5E). Locus scaffold165922_114751 is located in a gene encoding a member of the cytochrome P450 family (LUT1) required for carotenoid epsilon-ring hydroxylation activity (Tian et al., 2004). Allelic frequency distribution at this locus followed the WET gradient ( Figure 5F). Interestingly, the hybrid index showed a strong correlation with bio1 (r 2 = 0.9295, p << 0.001; Figure 6A) and WET (r 2 = 0.8782, p << 0.001; Figure 6B). When temperature and WET increased, P. yunnanensis genotypes increased almost linearly (Figures 6D and 6E).

Distinct population structure on the Tibetan Plateau
Wind-pollinated, outcrossing conifers are generally characterized by weak population differentiation over large geographical distances due to extensive gene flow via pollen (Alberto et al., 2013;Hall et al., 2021). Across the large distribution range of P. tabuliformis from northern to central China, population differentiation (F ST ) in the nuclear genome is only 3% (Xia et al., 2018). By contrast, P. densata showed an F ST among groups as high as 18%. Group E of P. densata and P. tabuliformis shared very similar ancestry in the analyzed genomic regions. Previous studies show that group E is distinguishable from P. tabuliformis in mtDNA composition and morphometric traits (Mao et al., 2009;Wang et al., 2011), but the paternally inherited chloroplast DNA is dominated by P. tabuliformis types . This genetic pattern suggests extensive backcrossing with P. tabuliformis via pollen flow in the eastern range of P. densata Gao et al., 2012) that genetically homogenized the populations of the two species. Whether group E of P. densata should be classified as P. tabuliformis is an open taxonomic question. Group C showed admixed ancestry from P. yunnanensis, but little gene flow was detected in the opposite direction from P. densata to P. yunnanensis. Asymmetric introgression is commonly observed in hybridizing species complexes (e.g., Arnold et al., 2010;De La Torre et al., 2014;Todesco et al., 2020) and can result from selection on hybrids, genetic incompatibility, or premating barriers between hybridizing populations. We explore these possibilities in the section ''Genomic clines associated with adaptation to the alpine environment.'' The SW and W groups were distinct from each other and from P. tabuliformis and P. yunnanensis, indicating a long period of isolation from inter-and intraspecific gene exchange. The observed population structure in P. densata is consistent with a fragmented migration landscape shaped by high mountains and deep valleys, even over short distances. Furthermore, divergent selection over heterogeneous environments also played a significant role in shaping genetic clines in P. densata (Zhao et al., 2020).
High genetic differentiation is a characteristic pattern of mountain biodiversity. We reviewed studies that report nuclear genomebased F ST for plants on QTP and adjacent mountains. Surprisingly, nuclear genome-based studies that report population genetic parameters are scarce among the large number of biogeographical studies of diverse taxonomic groups on the QTP (Wen et al., 2014;Mao et al., 2021). Nevertheless, distinct genetic clusters/ lineages within species were commonly detected in the available data (e.g., Fu et al., 2018;Ma et al., 2021;Hu et al., 2022). In Primula tibetica, F ST between populations can be as high as 80% (Ren et al., 2017). Populations of a threatened conifer, Cupressus chengiana, in the northern Hengduan Mountains have pairwise F ST values of 11%-18% despite its narrow distribution (Li et al., 2020). These cases suggest that habitat isolation together with niche-specific selection promote genetic differentiation and clines, even in conifers capable of extensive gene flow.

Population persistence during glacial periods
Previous studies suggest that the distinct genetic groups of P. densata arose through stepwise colonization from E to W during the late Pliocene, as supported by declining population heterozygosity and increasing population differentiation along the migration route Gao et al., 2012;Zhao et al., 2020). Reconstructions of the Quaternary climatic history of the QTP recognize four major glaciations: the Xixiabangma, Naynayxungla, Guxiang, and Baiyu (the last) Glaciations, dated at about 1.2-0.8, 0.7-0.5, 0.3-0.1, and 0.07-0.01 Mya, respectively (Zheng et al., 2002). The Naynayxungla Glaciation was the most extensive, covering large areas of the southeastern QTP and forming large ice caps, glacier complexes, and great valley glaciers (Zheng et al., 2002). Our examination of population size changes within each group of P. densata correlated well with these regional climate events (Figure 2). The population contraction in group E at about 1 Mya aligned with the Xixiabangma Glaciation (1.2-0.8 Mya). Group C experienced its first strong bottleneck at about 0.5 Mya during the Naynayxungla Glaciation (0.7-0.5 Mya), and another weaker decline at about 0.2 Mya coincided with the Guxiang Glaciation. In contrast to groups E and C, groups SW and W each had 3-to 8-fold smaller effective population sizes and showed little size fluctuation since the mid-Pleistocene. This lack of fluctuation may reflect a smaller impact of glaciaiton on populations that were small from the start than on large, widespread populations . A noticeable pattern in all groups is the rapid population recovery during the interglacial periods after each contraction. This clearly suggests that P. densata persisted in multiple regional refugia during the Quaternary ice age and thus preserved the ancient genetic structure that facilitated population rebound. Reconstruction of biogeographical history of diverse plant species frequently reveals the preservation of multiple refugia on the QTP during glacial periods (Opgenoorth et al., 2010;Wen et al., 2014;Ren et al., 2017;Fu et al., 2018;Ma et al., 2021;Hu et al., 2022). The topographic heterogeneity of the QTP and connecting mountains provided rich micro-habitats and broad altitudinal gradients that could have buffered plants from climate fluctuations, allowing their persistence during the Quaternary ice age (Chen et al., 2019;Ding et al., 2020;Guo et al., 2022). The preservation of scattered refugial populations may have led to rapid local range expansion during the postor interglacial periods and thus to deep intraspecific divergence. Such population dynamics likely underlay the evolution of high biodiversity in the QTP and other mountain systems.

Genomic clines associated with adaptation to the alpine environment
Zones of hybridization and introgression are good settings for understanding genetic differences that are potentially associated with adaptation and reproductive isolation between diverging species (Barton and Hewitt, 1989;Buerkle and Lexer, 2008). We selected a transition zone between P. densata and P. yunnanensis for genomic cline analyses for two reasons: (1) the populations in this zone are distributed along an elevation (C) An a 3 b plot illustrates the 2D density of cline width/rate representing the cline center (a; x axis) and steepness of clines (b; y axis). Polygons define density space for significant a (blue), b (orange), and both (purple) outliers.
(D) Network of enriched GO terms of BGC-outlier genes. Node size represents the number of genes annotated to each GO term; each cluster is coded with a different color, and the shading of each node from light to dark represents the p value specified by the enrichment test from large to small.
(E) KEGG pathway enrichment for the BGC-outlier genes.
gradient that runs north to south, with a contact zone and admixed populations in the middle; (2) they are distributed along major river valleys that provide corridors for gene flow. Thus, possibilities for gene exchange among populations along valleys are not hindered by major geographical barriers, and an isolation-by-distance model provides a valid neutral expectation.
BGC identified exceptional introgression at 3.36% (1941) of the 57 849 loci analyzed in the admixed populations. Approximately two-thirds of these outlier loci showed excess P. densata ancestry and one-third showed excess P. yunnanensis ancestry. This could be the result of either divergent selection operating on different loci within the contact zone or genetic drift. Among the 1941 BGC outlier loci, 14 showed more rapid than expected introgression rates (negative b), suggestive of adaptive introgression, and 64 showed slower than expected introgression (positive b), suggestive of loci associated with reproductive barriers . In the genus Pinus, widespread intrinsic genetic incompatibilities are absent among closely related taxa (Critchfield, 1975). We previously examined the genetic compatibility between P. densata, P. yunnanensis, and P. tabuliformis in a controlled crossing experiment and found that the crossability between P. densata and P. yunnanensis was as high as 0.51 (Zhao et al., 2014). Thus, detection of numerous outliers associated with intrinsic reproductive isolation is not expected among crossable pines. In the genic view of speciation, reproductive isolation can be achieved by a small number of major barrier loci (Wu, 2001). The potential roles of these 64 outliers in reproductive isolation should therefore be investigated further.
It has been suggested that, when hybrid fitness involves many loci with different impacts, the loci undergoing the strongest selection will likely be labeled as b outliers, whereas weakly selected loci will be shown as having excess ancestry . This is likely to be the case in a hybrid zone between the two North American pine species Pinus strobiformis and Pinus flexilis, where thousands of excess-ancestry outliers but no b outliers are found (Menon et al., 2018). This pattern of genomic cline is regarded as supporting evidence for species boundaries that are maintained by external ecological factors (Ryan et al., 2017;Menon et al., 2018). Simulations also indicate that although both genetic drift and selection can cause excess ancestry and variable introgression, loci with excess ancestry are often enriched in selected loci . Most studies of conifer hybrid zones have recognized that ecological selection plays a dominant role as a driver of species diversification, but intrinsic isolation loci are rarely recovered (Holliday et al., 2010;De La Torre et al., 2014;Menon et al., 2018). Our results thus suggest that intrinsic genetic barriers may evolve faster when selection pressure is high, and this is likely the case in harsh or contrasting habitats. This explains differences in the recovery of introgression outliers among studies because the results reflect specific populationenvironment dynamics.
Our detection of strong clines in the outliers, their association with critical climatic factors, and the significant (B) Venn diagram depicting overlap between outliers identified by BGC and RDA. (C) Significant environment associated outliers identified by RDA, with the best-supported SNP-environment association in the same color. Dark colors represent the 102 RDA-BGC outliers.
(D-F) Illustration of allelic frequency variation along climate gradients of three outlier loci located on genes SPL8, GolS4, and LUT1 and their association with bio1, bio15, and WET, respectively. The frequencies of the two alternative alleles at each locus are shown in pie charts.
impact of environment on population divergence substantiate selection along climate gradients as a driving force for differential introgression, and thus the delineation of species distributions. The enrichment of genes related to adaptation to the alpine environment in the outlier loci adds functional support for the biological significance of the outliers. Although it is speculative to link these signals to local adaptation without molecular and functional validation, they nevertheless constitute a coherent set of evidence for a role of ecological selection in facilitating adaptive divergence in alpine environments.
Overall, our study highlights the contributions of introgression, migration history, and barriers to gene flow over fragmented and heterogeneous landscapes to the broad-scale population structure and genetic clines of P. densata. Climatic fluctuations in the Quaternary caused recurrent cycles of population decline and post-glacial expansion, a process that is characteristic of the biogeographical history of QTP flora. The preservation of ancient genetic lineages and divergent selection over heterogeneous habitats are important drivers of intraand interspecific diversification in this important biodiversity hotspot. The association of genomic clines with environmental clines also provided valuable perspectives on species boundary dynamics. We uncovered genomic imprints of ecological adaptation and reproductive isolation across a species transition zone, suggesting that both intrinsic genetic barriers and extrinsic ecological selection play a key role in maintaining species boundaries. Thus, although gene flow tends to homogenize connected populations, the coupling of genetic and ecological factors could increase reproductive isolation and ensure species separation and divergence. However, much remains to be done to relate genomic patterns to evolutionary processes. Whole-genome resequencing and better annotation of the pine genome will continue to refine our understanding of the targets, tempo, and modes of selection in this species complex and conifer evolution in general.

Population sampling and exome capture sequencing
This study included 24 populations of P. densata, two representative populations of P. tabuliformis, and nine populations of P. yunnanensis ( Figure 1A). Each population included eight individuals, except for five populations from which four individuals were collected. The sampled trees in each local stand were separated by at least 50 m from each other. Identification and sampling of P. densata were guided by species distribution maps, morphology, and mtDNA composition. Populations from the eastern margin of P. densata (group E in this study) exhibit heterogeneous maternal lineages as shown by a mix of mtDNA haplotypes found in the two parent species . Phenotypically, these eastern populations are distinct from the two parental species in cone and seed morphometric traits (Mao et al., 2009). The name, location, and sample size of each population are listed in Table 1. Of the 36 populations, 27 (nos. 1-6, 8-25, and 33-35) have been used for exome capture sequencing in a previous study (Zhao et al., 2020). The other nine populations (nos. 7, 26-32, 36) were new collections for this study. One population of P. sylvestris was included as an outgroup.
Needles were collected from individual trees in each natural stand, and genomic DNA was extracted using a Plant Genomic DNA Kit (Tiangen, Beijing, China). We performed probe capture sequencing using the same set of 40 000 exome probes as in Zhao et al. (2020). The probes, each 120 nt in length, were designed from P. taeda UniGenes (Neves et al., 2013). The majority of the probes aligned to c. 29 000 genes, and 9800 probes aligned to intergenic regions. The laboratory work consisted of three major steps: (1) genomic DNA fragmentation, (2) hybridization to capture probes, and (3) paired-end sequencing (23 101 bp) of the captured DNA fragments on the HiSeq 2000 platform (Illumina) (Neves et al., 2013). Library preparation, probe hybridization, and sequencing were conducted by RAPiD Genomics (Gainesville, FL, USA). In total, 268 samples were successfully genotyped and analyzed in this study.
To reduce the computation time of the Genome Analysis Toolkit (GATK) pipeline (Van der Auwera et al., 2013), we prepared reduced BAM files and the corresponding reduced reference for each sample. In brief, variants were called with the SAMtools and BCFtools pipeline using default parameters (Li, 2011) for all 268 samples. Scaffolds that had at least one SNP in R50% of the individuals were included in the reduced reference. The reduced BAMs were prepared with SAMtools based on the new reduced reference. PCR duplicates were removed using Picard MarkDuplicates (http://broadinstitute.github.io/picard/). Reads around indels were locally realigned using RealignerTargetCreator and IndelRealigner in GATK. Variant calling was performed using HaplotypeCaller. GenotypeGVCFs was then used to perform multisample Several filtering steps were performed to minimize genotyping errors: indels were removed, SNPs located in repetitive regions (with reference to the P. taeda genome v.1.01) and with mapping quality <40 were removed, genotypes with genotype quality <20 or read depth <3 were masked as missing, and SNPs that met any of the following criteria were removed: missing rate >20%, MAF < 5%, heterozygosity >70%, or allele number >2. The remaining SNPs were used for population genomic analyses, with the exception of site frequency spectrum (SFS) and nucleotide diversity-based analyses, for which no MAF filtering was applied.

Population structure and diversity
Population genetic structure was examined using FastSTRUCTURE (Raj et al., 2014), with the number of genetic clusters (K) required to explain the structure in the dataset set to 1-16. The best K value that maximized the marginal likelihood of clustering in the data was identified by the script ''chooseK.py''. The graphical illustration of ancestry coefficients was plotted using ''distruct.py''. Differentiation (F ST ) between genetic clusters was estimated using the R package HIERFSTAT (Goudet, 2005). Population structure was also investigated using PCA implemented in EIGENSOFT v6.1.4 (Price et al., 2006). An NJ tree among the samples was constructed using PHYLIP version 3.698 (Felsenstein, 2005). Nucleotide diversity (p) (Nei and Li, 1979) and Tajima's D (Tajima, 1989) of each population and group were calculated including both informative and invariant sites with VCFtools 0.1.16 (Danecek et al., 2011).

Estimated effective migration surface
To visualize spatial patterns of gene flow, we used EEMS (Petkova et al., 2016). EEMS uses a stepping-stone model (Kimura and Weiss, 1964) to identify geographic regions where decays in genetic similarity deviate from isolation-by-distance predictions, thus highlighting dispersal corridors or barriers. EEMS was run using a deme size of 800 with a Markov chain Monte Carlo (MCMC) length of 20 million iterations following a burn-in of 10 million and a thinning interval of 9999. The reliability of EEMS models was evaluated by checking the linear relationship between the observed and fitted dissimilarities within and between demes. Finally, the surfaces of effective migration rates (m) were generated and plotted using the rEEMSplot package (https://github.com/dipetkov/eems).

Inference of demographic history
Population genetic analyses identified four distinct groups in P. densata (see Results). We inferred the demographic histories of these groups (E, C, W, and SW) using the flexible multi-epoch coalescent approach implemented in Stairway Plot 2 (Liu and Fu, 2020). We constructed the folded one-dimensional SFS for each species or group by performing the down projection method using the Python script ''easySFS'' (https://github. com/isaacovercast/easySFS). The final folded SFS for P. densata contained 7 544 066 sites. We set the mutation rate to 7 3 10 À10 per site per year and the generation time to 50 years (Willyard et al., 2007). Two hundred subsamples of 67% of all sites were generated to estimate the median and 95% confidence interval of effective population size (Ne) over time. We also used another coalescent simulation-based method, fastsimcoal2 version 2.7 (Excofffier et al., 2021), to infer the demographic history of P. densata groups. We tested 11 scenarios, which differed in (1) the levels of gene flow between groups, (2) the mode of population size change after splitting from the ancestral population, and (3) whether and when strong bottlenecks occurred.
Detailed descriptions of fastsimcoal model testing are presented in the supplemental information.

Detecting introgression
To infer the potential presence of gene flow between distinct genetic groups, we applied a composite-likelihood approach implemented in TreeMix v1.12 (Pickrell and Pritchard, 2012) on linkage disequilibrium (LD)-pruned data. We identified and removed variants in strong LD (r 2 threshold of 0.05) along windows of 50 SNPs and a step size of five SNPs using PLINK (Purcell et al., 2007;Chang et al., 2015). We tested the addition of 0-10 migration events by building 100 replicate maximum likelihood trees for each migration event. Potential migration events were inferred when the proportion of explained covariance among groups stabilized toward their maximum asymptotic values. These analyses were implemented in the BITE R package (Milanesi et al., 2017).
We also performed ABBA-BABA (also known as Patterson's D) statistical tests with Dsuite version 0.4r43 to detect introgression (Malinsky et al., 2021). The ABBA-BABA test involves four populations or taxa in the form (((P1, P2), P3), outgroup) and determines potential gene flow between P3 and P1 or P2 based on the relative site patterns of ABBA and BABA. We inferred phylogenetic relationships among the populations by SVDquartets (Chifman and Kubatko, 2014) using the same set of SNPs as in TreeMix, and we uploaded the recovered tree to Dsuite. A total of 20 trios of (((P1, P2), P3), P. sylvestris) were tested (Supplemental Table 2). The significance of each test was assessed using 100 jackknife resampling runs. Branch-specific gene flow was estimated based on the f 4 ratio using the Dtrios and Fbranch program in Dsuite.

Analysis of genomic clines
ABBA-BABA tests detected introgression between P. yunnanensis and group C of P. densata (see Results). We selected eight populations from P. densata group C and nine populations of P. yunnanensis that formed a transition from pure P. yunnanensis to admixed populations with P. densata to pure P. densata. Additionally, they also formed a geographical gradient along the major river valleys of the region that run north to south following elevations (Supplemental Figure S4B). We investigated patterns of differential introgression between P. yunnanensis and P. densata using BGCs . BGC identifies introgression by quantifying the probability of locus-specific ancestry given genome-wide ancestry or hybrid index . The hybrid index is the proportion 4 of an individual's genome inherited from reference population 1 (P1); the probability of being inherited from reference population 2 (P2) is therefore 1 À 4. BGC summarizes introgression using a locus-specific a that quantifies the change in ancestry relative to a null expectation based on the genome-wide hybrid index and a b that specifies the slope of the cline . Loci identified as a outliers may be involved in directional selection toward homozygous genotypes from one reference population or selection against heterozygotes (Gompert and Buerkle, 2011;Oswald et al., 2019). Negative b values indicate faster-than-expected introgression rates, whereas positive b values denote rates slower than expected. Loci that are identified as b outliers may be involved in reproductive isolation (positive b) or adaptive introgression (negative b) . For both parameters, a zero value corresponds to the null model.
We classified our sampled trees into admixed or reference genotypes based on the Q-values of ADMIXTURE (Alexander et al., 2009). We considered a population to be admixed when its inferred ancestry coefficient (mean Q value) was between 0.1 and 0.9, and those with Q < 0.1 or Q > 0.9 were considered to be from the two reference populations. Under these criteria, we identified seven pure P. yunnanensis populations (P1), four pure P. densata populations (P2), and six admixed populations (Supplemental Figure S4C). We ran BGC three times independently for 400 000 MCMC iterations each time with a burn-in of 200 000 and a thinning interval of 100, and we assessed convergence by visual inspection (Supplemental Figures 4D-4F). Replicate runs were subsequently combined. BGC outliers were identified as credible when their 95% confidence intervals excluded the neutral expectation (i.e., a or b = 0) or their median of posterior distribution surpassed the (1À0.975)/2 and 0.975/2 quantiles of the probability distribution . Genomic clines were visualized using the ClineHelpR R package (Martin et al., 2021).

Genotype-environment association
To examine the role of environmental selection on genomic clines in the contact zone, we performed RDAs to identify outlier loci associated with environmental variables in the 17 populations analyzed for BGC. RDA is a multivariate constrained ordination approach that combines multiple linear regression and PCA to detect significant loci associated with environmental gradients (Legendre and Legendre, 2012).
We included the Hellinger-transformed allele frequencies of each population as the response matrix and two predictor matrices of environmental and spatial variables, respectively. We extracted 12 environmental variables for each sampling site (Supplemental Table 6): annual mean air temperature (bio1), isothermality (bio3), air temperature seasonality (bio4), annual precipitation (bio12), precipitation of the driest month (bio14), precipitation seasonality (bio15), ground-frost frequency (FRS), growing degree days, soil organic carbon, wet-day frequency (WET), annual mean UV-B (UVB1), and elevation (ELEV). These variables have previously been identified as most relevant to niche divergence in P. densata and other pine species Zhao et al., 2020). All raster layers were converted to the same resolution (30 arcseconds). To account for underlying spatial influences, we used the dbMEMs as spatial predictors. dbMEMs are orthogonal spatially explicit eigenvectors that capture spatial patterns from multiple angles rather than from simple latitudinal and longitudinal vectors. To minimize model overfitting, forward selection with 1000 permutations was performed with an a value of 0.05 on the spatial and environmental variables separately (Borcard et al., 2011).
A partial RDA conditioned on the dbMEM spatial matrix was then performed to identify adaptive loci with allele frequencies as the response variable and the environment matrix as the predictor variable using the vegan R package (Oksanen et al., 2019). Significant RDA axes were tested using the anova.cca function with a permutation test of 9999 iterations. VIFs for the predictor variables used in the model were evaluated using the vif.caa function. We extracted the SNP loadings from each of the significant RDA axes and chose three standard deviations (two-tailed p = 0.0027) as the cutoff for outliers. Each SNP was then associated with all environmental predictors using Pearson's correlation coefficient (r) (Forester et al., 2018), and those with the strongest correlations were regarded as the best-supported SNP-environment associations. Similarly, we performed another partial RDA conditioned on environment to estimate the effects of spatial vs. environmental predictors.

Outlier annotation and enrichment analyses
Outlier SNPs were annotated and categorized using SnpEff v4.3t (Cingolani et al., 2012) on the basis of their impact, functional class, and genomic region. The functional annotations of all protein-coding genes in the P. taeda v1.01 assembly were determined using eggNOG-mapper v2 (Cantalapiedra et al., 2021) with parameters -m diamond -tax_scope 33090. We extracted the GO terms and KEGG pathways from the eggNOG-mapper results to build a Pita Orgdb database using the AnnotationForge R package v1.30.1. Based on this Orgdb database, GO and KEGG enrichment analyses were performed in the clusterProfiler R package v3.16.1. We used GOMCL  to reduce redundancy and summarize the lists of GO terms and displayed the results with Cytoscape 3.8.2 (Shannon et al., 2003).

DATA AVAILABILITY
All sequencing data are archived in the NCBI SRA database (BioProjects PRJNA891676 and PRJNA492187) and the China National Genomics Data Center (BioProject PRJCA015270). The VCF genotype files of the samples are deposited at Dryad: https://doi.org/10.5061/dryad. sxksn0373.

SUPPLEMENTAL INFORMATION
Supplemental information is available at Plant Communications Online.

FUNDING
This study was supported by grants from the Swedish Research Council (VR 2017-04686) and T4F program, Sweden.

AUTHOR CONTRIBUTIONS
X.-R.W. and W.Z. designed the study. W.Z. and J.-F.M. conducted field sampling. J.-F.G., W.Z., and B.A. analyzed the data. J.-F.G., W.Z., and X.-R.W. wrote the manuscript draft. All authors contributed to the revision of the manuscript.