Effects of Geological and Environmental Events on the Diversity and Genetic Divergence of Four Closely Related Pines: Pinus koraiensis, P. armandii, P. griffithii, and P. pumila

The effects of mountain uplift and environmental oscillations on nucleotide variability and species divergence remain largely unknown in East Asia. In this study, based on multiple nuclear DNA markers, we investigated the levels and patterns of nucleotide diversity and interspecific divergence in four closely related pines in China, i.e., Pinus koraiensis, P. armandii, P. griffithii, and P. pumila. The four pine taxa shared low levels of nucleotide polymorphisms at the species level. P. pumila had the highest silent nucleotide diversity (πsil = 0.00661) whereas P. griffithii had the lowest (πsil = 0.00175), while the levels of genetic polymorphism in P. armandii (πsil = 0.00508) and P. koraiensis (πsil = 0.00652) were intermediate between the other two species. Population genetic structure analysis showed that variations primarily existed within populations of the four pine species, presumably due to habitat fragmentation or the island-like distributions of Pinus species. Population divergence (FST) analysis showed that the genetic divergence between P. griffithii and P. koraiensis was much greater than that between P. koraiensis and the other two pines species. Isolation-with-migration analysis suggested that asymmetric gene flow had occurred between any two pairs of pine species. Phylogenetic analyses indicated that the four allied species split into two groups about 1.37 million years ago, where P. armandii and P. pumila were closer and clustered as sister species, whereas P. koraiensis and P. griffithii were clustered on another branch. Our results and those obtained in previous studies suggest that mountain uplift and geological climate oscillations may have led to the patterns of genetic divergence and nucleotide variations in these four pine species.


INTRODUCTION
Nucleotide diversity levels within populations and spatial patterns, as well as species divergence are of great importance in the field of evolutionary biology (Coyne and Orr, 2004;Hao et al., 2015;Ortego et al., 2015). Mountain uplift and past environmental oscillations may have been largely responsible for shaping the spatial patterns of diversity and genetic divergence among species (Coyne and Orr, 2004;Wachowiak et al., 2009). In general, the level and distribution of nucleotide diversity are historical products of the long-term evolution of a species, and they are largely associated with the evolutionary potential or future fate of a species (Wright and Gaut, 2005;Wachowiak et al., 2009;Zhou et al., 2014;Tsuda et al., 2017). In addition, ecological or proximal causes (e.g., mating systems) and various barriers (e.g., geographic and spatio-temporal isolation) due to geological history can cause fragmented of species distributions, which may lead to reduced gene flow between isolated populations and adaptability. This process initiates allopatric divergence, and local adaptation can ultimately drive populations toward speciation and change evolutionary processes (Cutter and Gray, 2016;Ren et al., 2017).
Conifers are anemophilous and outcrossing (Fu et al., 1999). They are mainly characterized by long life cycles, large effective population sizes, incomplete lineage sorting, and extensive introgression/hybridization among populations, which makes their genetic structure and spatial patterns of diversity very different from those found in traditional model plants (Neale and Kremer, 2011;Gao et al., 2012;Li et al., 2012Li et al., , 2013Hao et al., 2015). For instance, conifers tend to share haplotypes/genotypes among species, with no distinct genetic divergence across species ranges, and most of the genetic variations are found within populations (Willyard et al., 2009;Chen et al., 2010;Ren et al., 2012;Liu et al., 2014;Zhou et al., 2014). In recent years, many studies have determined nucleotide polymorphisms and speciation history patterns using multiple nuclear loci (Ma et al., 2006;Li et al., 2010;Gao et al., 2012;Wachowiak et al., 2013;Zhou et al., 2014;Zou et al., 2016). These biparentally inherited nuclear genes are functional genes that encode proteins and they are characterized by their orthology, moderate to high rates of evolution, and the presence of many phylogenetically informative sites (Zhou et al., 2014). Therefore, large numbers of nuclear markers can be used to detect the deep evolutionary relationships among closely related species, especially recently diverged taxa (Chen et al., 2010;Zou et al., 2016). In this study, we employed nucleotide polymorphisms as well as the population structure and speciation history to explore the relationships among four Pinus species.
Four related Pinus species in subsection Strobus occur in East Asia: P. armandii Franch., P. koraiensis Sieb. et Zucc., P. griffithii McClelland, and P. pumila (Pall.) Regel. These species share some common features, such as possessing five needle leaves in a bundle. There are obvious differences among these species in terms of their ecological niche, natural geographic distribution, morphology, wood anatomy, and cytology (Wu and Feng, 1995). P. armandii, P. pumila, and P. koraiensis occur according to the changes in the hydrothermal conditions, and P. griffithii is distributed on the China-Nepal and China-Bhutan borders (Supplementary Figure S1; Wu and Feng, 1995). The distributions of these pines also increase successively from low to high altitudes, where P. koraiensis occurs at altitudes of 150-1,800 m and P. pumila always forms copses with other coniferous trees on mountain tops at altitudes of 1,000-2,300 m. P. armandii usually occurs in pure forest or mixed forest at altitudes of 1,000-3,300 m, and P. griffithii is distributed in the same manner at altitudes of 1,600-3,300 m on the Qinghai-Tibet Plateau and Mount Everest. According to the classic categorization of Pinus sect. Strobus, P. pumila is categorized into the P. koraiensis taxon and P. armandii into the P. griffithii group (Fu et al., 1999). A study of the divergence of the resin ducts in Pinus sect. Strobus suggested that P. armandii is the most primitive species and its southward spread gave rise to P. griffithii, whereas its northward spread gave rise to P. pumila and P. griffithii (Peng, 1999). In recent years, several studies based on plastid molecular markers have shown that P. koraiensis and P. pumila are most closely related to each other (Peng, 1999;Wang et al., 2016). However, the accurate phylogenetic relationships and interspecific divergence among these related pine species is still controversial due to the limited availability of morphological and molecular biological evidence (Peng, 1999;Liu et al., 2014;Hao et al., 2015).
In addition, some studies found low variations in DNA barcodes, such as rbcL and matK, among related species due to low levels of cpDNA diversity and genetic divergence (Syring et al., 2007;Li et al., 2015). Moreover, frequent interspecific introgression and hybridization among species have important effects on their genetic diversity levels, especially in parapatric or allopatric of species in China. Thus, in the current study, we used multiple nuclear genes to investigate the genetic diversity and divergence in four closely related pine species comprising P. koraiensis, P. armandii, P. griffithii, and P. pumila. We specifically addressed the following two questions. (1) How is the level and pattern of population divergence among the four pines species? (2) How is the pattern of gene flow and interspecific introgression between these closely related species in East Asia?

Population Sampling
To accurately determine the nucleotide diversity and interspecific relationships among pines, we sampled 216 individuals from 16 allopatric populations of the four pine species (Figure 1). The distance between any two trees of the same species was at least 50 m (Supplementary Table S1). We isolated the haploid megagametophyte from each of the sampled trees.

DNA Extraction, PCR Amplification, and Sequencing
Total genomic DNA was extracted from the megagametophyte samples for each individual using the modified CTAB method (Doyle and Doyle, 1987). In preliminary studies, about 40 nuclear gene loci were screened for cross-amplification in the four pine species (Ma et al., 2006;Eckert et al., 2013). Finally, six polymorphic loci (1_1609_01, CL1694, PTIFG2009, 0_12929_02, 0_14221_01, and 0_1688_02) associated with protein kinase family protein, serine-tRNA ligase, and leucine-rich repeat family protein were selected for subsequent sequence amplification and analysis (Supplementary Table S2). PCR amplification was conducted in a volume of 25 µL with a DNA concentration of 10-40 ng/µL, 50 mM of Tris-HCl, 0.5 mM of each dNTP, l.5 mM of MgCl 2 , 2 µM of each primer, and 0.75 U of Ex Taq DNA polymerase (Runde, Xi'an, China). The PCR program comprised initial denaturation at 94 • C for 5 min, followed by 35 cycles for 1 min at 94 • C, at a specific annealing temperature (53-60 • C, see Supplementary Table S2 for details) for 1.5 min and extension for 1 min at 72 • C, and a final extension for 10 min at 72 • C. Primer synthesis and sequencing of the PCR products were performed by Shanghai Biological Engineering Co. Ltd. (Shanghai, China). Sequencing was conducted using both forward and reverse primers for each gene (Supplementary Table  S2) on an ABI Prism 3730xl sequencer (Applied Biosystems, Foster City, CA, United States).

Data Reconciliation
Sequences were aligned and manually adjusted with Chromas and MEGA5.0 (Librado and Rozas, 2009;Tamura et al., 2011) to correct random errors generated by sequencing.

Nucleotide Diversity and Neutral Tests
The genetic diversity parameters for the four pine species were calculated using DnaSP v. 5.10 (Librado and Rozas, 2009), including the number of segregation sites S, Watterson parameters θ w (Watterson, 1975), total nucleotide polymorphisms π t (Li and Nei, 1975), nucleotide diversities of non-synonymous sites and silent loci (synonymous sites and non-coding positions), π a and π sil , number of haplotypes N h and haplotype diversity H d (Nei and Tajima, 1981;Fu, 1997;Depaulis and Veuille, 1998;Depaulis et al., 2001), and intragenic minimum recombination events (R M ) (Hudson and Kaplan, 1985). In addition, in order to accurately detect departure from the neutral model of molecular evolution at each locus, the neutral equilibrium was tested for various parameters using Tajima's D (Tajima, 1989), Fu and Li's D * and F * (Fu and Li, 1993), and the standardized Fay and Wu's H (Fay and Wu, 2000). Tajima's D measures the standardized difference between π and θ W , whereas Fay and Wu's H measures the difference between π and θ H . The former is more sensitive to an excess of rare variants whereas the latter is more sensitive to an excess of high-frequency-derived variants. Both D and H are expected to be zero under the standard neutral model (Zou et al., 2013). We also conducted maximum frequency of derived mutations (MFDM) tests to examine the likelihood of natural selection acting on individual loci at species levels. The MFDM tests exclude the confounding effects of demography completely when detecting recent positive selection (Li, 2011). In practice, a single DNA fragment (i.e., a locus) may have a short length and only contain a few R M . The MFDM v. 1.1 test always depends on the estimate of R M (Li, 2011).

Genetic Divergence and Population Structure
The sources of genetic variation among the four species (group), populations and individuals were analyzed by analysis of molecular variance (AMOVA) with ARLEQUIN v.3.11 (Excoffier et al., 2005). We estimated F statistics hierarchically, both among species (F CT ) and among populations within species (F ST ). F ST (Wright, 1949) is a coancestry statistic that provides the variance within populations relative to the total population. We used NETWORK v. 4.6.1.3 (Bandelt et al., 1999) to construct phylogenetic relationships based on the haplotypes of each species at the six loci (gaps were excluded). In addition, genetic clustering based on individuals was estimated by Bayesian clustering using the STRUCTURE V.2.3 program (Hubisz et al., 2009). To estimate the number of clusters (K) in the data, K values from 1 to 16 were explored using 10 independent runs per K and an admixture model. As described in previous studies, in order to generate a reliable estimate of the optimal K, the burn-in was set to at least 200,000 and the run length was at least 500,000 (e.g., Zou et al., 2013;Zhou et al., 2014;Tsuda et al., 2015). We also utilized the STRUCTURE HARVESTER program to estimate the most likely number (K) of genetic clusters (Evanno et al., 2005;Earl and vonHoldt, 2012).

Reconstruction of Historical Dynamics and Species Relationship
The migration rates, effective population sizes, and population split times were calculated based on the isolation-with-migration (IM) model using the IMa2 program to infer the population history dynamics of the four pine species (Nielsen and Wakeley, 2001;Hey, 2006Hey, , 2010Kuhner, 2009). We analyzed sibling species in a pairwise manner using a basic two-population model. We extracted the largest region with no recombination for each of the six nuclear loci. Functions of the model parameters were estimated in the M-mode based on 1 × 10 6 Monte Carlo Markov chain (MCMC) steps following 5 × 10 5 burn-in periods in order to obtain reliable estimates (i.e., similar posterior distributions for the parameter), and the effective sample size for each parameter was at least 200. The divergence time between species was estimated based on a mean mutation rate of µ = 4.875 × 10 −9 (per site per generation), and the generation time for pines was assumed to be 25 years (Ma et al., 2006). In addition, we constructed the phylogenetic relationships among the four pine species using * BEASTv1.8.0 (Heled and Drummond, 2010). The species tree was computed using the six nuclear genes sequenced for the sampled species. We selected a Yule model as the species tree prior, a constant population size, and relaxed lognormal clock models for all nuclear loci (Heled, 2012). Pinus bungeana was used as outgroup. We ran the MCMC analysis for one billion generations with sampling every 50,000 generations. Two independent runs were conducted. Tracer 1 v1.5 (Rambaut and Drummond, 2009) was used to assess the convergence of chains to the stationary distribution (effective sample size >200). After discarding the first 2,500 trees as a burn-in, the remaining trees were summarized in a maximum clade credibility tree with the TreeAnnotator v1.8.0 program (Drummond and 1 http://tree.bio.ed.ac.uk/software/tracer/ Rambaut, 2007). Joint Bayesian species delimitation and species tree estimation were also analyzed using the BPP v3.4 program based on the multispecies coalescent model (Yang, 2015). In addition, phylogenetic relationships based on nuclear haplotypes were reconstructed with the maximum likelihood (ML) model using PAUP 4.0 (Swofford, 2002). The ML analysis employed the HKY substitution model, where support values for the nodes were estimated based on 1,000 bootstrap replicates.

Nucleotide Polymorphisms
For all nuclear loci, P. griffithii had the lowest estimates for the total average segregating sites and average values of the segregating sites in silent sites compared with the other three species ( Table 1). P. griffithii had two singleton mutation sites in PTIFG2009. The numbers of shared polymorphisms (S S ) were similar among the four closely related species and the numbers of fixed differences (S f ) were low. The differences in the polymorphisms between P. pumila and P. griffithii were mainly due to the 0_14221_01 gene locus, with 18 polymorphic sites in P. pumila but only three in P. griffithii (Supplementary Table S3). Similarly, the differences between P. koraiensis and P. pumila were mainly due to the CL1694 locus, with 20 polymorphic sites in the former but only five in the latter. The 0_14221_01 and 0_12929_02 loci accounted for the observed differences between P. armandii and P. pumila.

Neutrality Tests
Positive Fu and Li's D * and Fu and Li's F * values were estimated for most loci, although most of these values were not significant in each species ( Table 2). The mean Tajima's D (D) values were negative for P. pumila (−0.236) and P. griffithii (−0.030), but positive for P. koraiensis (0.274) and P. armandii (0.254) ( Table 2). In addition, the mean Fay and Wu's H (H) values were negative for P. koraiensis and P. griffithii but positive for P. pumila and P. armandii (Table 2). However, with the exception of locus PtIFG2009 (P = 0.04674) in P. pumila, no significant deviation from neutrality were detected for the six loci using the MFDM test (Supplementary Table S4). The MFDM test detected slight deviation from the standard neutral model at the PtIFG2009 locus in the four pine species by considering genetic recombination (P < 0.05).

Population Genetic Structure
Within the four species, the 0_12929_02 locus had the highest genetic divergence among populations (F ST = 0.714, P < 0.001), whereas the 1_1609_01 locus had the lowest genetic divergence among populations (F ST = 0.085, P < 0.001) (Supplementary Table S5). The population genetic divergence was also significant across all loci (F ST = 0.624, P < 0.001). It should be noted that F ST was much higher for the overall loci than interspecific genetic differentiation (F CT ) except for the 0_12929_02 locus (Supplementary Table S5). Between pairs of species, F ST varied from 0.008 to 1.000 (Supplementary Table S6).  (Watterson, 1975); π t , nucleotide diversity across all loci; π a , nucleotide diversity at nonsynonymous sites; π sil , nucleotide diversity at silent sites; Rm, minimum number of recombinant events.
The divergence among the four pine species at the six nuclear loci was also supported by Bayesian clustering analysis (Figure 1). The most likely number of clusters for the entire dataset was K = 3 (Supplementary Figure S2). P. pumila and P armandii individuals were separated into two groups that corresponded to their respective species, whereas the majority of P. koraiensis and P. griffithii individuals were assigned to another cluster. Remarkable levels of gene flow and gene introgression were apparent between P. koraiensis and P. armandii (Figure 1 and Supplementary Figure S3).

Genealogy of Each Locus
The average number of haplotypes (N h ) and haplotype diversity (H d ) were much higher in P. pumila (N h = 12.167, H d = 0.822) and P. armandii (N h = 10, H d = 0.781) than P. koraiensis (N h = 7.167, H d = 0.655) and P. griffithii (N h = 4.167, H d = 0.420) ( Table 2). The haplotypes in the center of the network were shared (Figure 2), but most haplotypes were exclusive to specific species at the five loci (1_1609_01, 0_1688_02, PTIFG2009, 0_14221_01, and CL1694). In addition, there was no shared haplotype at the 0_12929_02 locus. According to the ST values for all loci among the species in Supplementary Table S7, each two Pinus species exhibited significant genetic divergence, where the highest ST value (0.62006) was found between P. koraiensis and P. griffithii, whereas the divergence between P. pumila and P. griffithii was lowest ( ST = 0.18941).

Evolutionary Relationships Among the Four Species
The mean divergence time between P. pumila and P. armandii was estimated at 1.13 million years ago (Mya). A younger divergence time (0.319 Mya) was estimated between P. griffithii and P. koraiensis (Table 3). In addition, we found asymmetric historical gene flow between pairs of species. In particular, the migration rate from P. pumila to P. griffithii was 2.0450, with 0.0005 in the reverse direction (Table 3). Pinus griffithii and P. koraiensis had smaller population sizes (0.0918-0.1846 and 0.3264-0.5369, respectively; Table 3) than P. pumila and P. armandii (0.3738-0.8326 and 0.4087-0.7019, respectively; Table 3). The species tree analyses demonstrated that the relationship was closer between P. armandii and P. pumila where they clustered as sister groups, whereas P. koraiensis and Significance levels: * P < 0.05; * * P < 0.01.
P. griffithii were located in another clade. The four species split into two groups about 1.37 Mya (Figure 3). Moreover, we obtained the best species-tree model using BPP v3.4 software, where the posterior probability of the species tree was one and the acceptance proportion was near to zero (0.025) based on multiple runs (Supplementary Figure S4). The topology of the tree was consistent with the results obtained by * BEAST (Figure 3 and Supplementary Figure S4).

Nucleotide Diversity
The nucleotide diversity at silent sites basically agreed with the neutral model of molecular evolution (Ma et al., 2006;Wachowiak et al., 2016). We detected low levels of silent polymorphisms in the four closely related pines species, because the average values were much lower than the average polymorphism for most conifers at multiple nuclear genes (π sil = 0.0029-0.0122) (Ma et al., 2006;Li et al., 2012). Among the four species, P. pumila had the highest silent nucleotide diversity (π sil = 0.00661), whereas P. griffithii had the lowest (π sil = 0.00175). However, these diversity values were much lower than those found in other Pinus species, such as P. densata and P. yunnanensis (Ma et al., 2006). Factors such as the nuclear gene loci selected in the study, sample size variations, mutation rates within species, demographic effects, and natural selection can influence the nucleotide diversity levels and patterns in species or populations (Loveless and Hamrick, 1984;Lande, 1988;Li et al., 2012;Zou et al., 2013). The four related species and six nuclear loci investigated in our research have also been studied previously, and thus these nuclear loci were not the main cause of the low levels of nucleotide polymorphisms. However, the unequal sample sizes for different genes and species may have caused differences in the nucleotide variability among species. To verify this bias, we detected the nucleotide diversity parameters based on the same sample sizes for each gene from each Pinus species. The results showed that there were significant differences in the levels of diversity among different species compared with the previous estimates (Table 1 and Supplementary Table  S8). We concluded that the levels of nucleotide variability among species were significantly associated with the samples FIGURE 2 | Networks obtained for the six nuclear genes in the four species comprising Pinus pumila (red), P. griffithii (green), P. koraiensis (yellow), and P. armandii (blue). Each sector of a circle corresponds to the frequency of the haplotype for each species.
FIGURE 3 | Divergence times and phylogenetic relationships among four pine species. The tree was constructed based on six nuclear genes using * BEAST. The tree was rooted with Pinus bungeana. The numbers on the branches indicate the corresponding posterior probabilities values, mean divergence dates, and 95% credibility interval.
sizes of the pine species. Similar differences in the patterns of diversity have also been detected in some other gymnosperm species (Ma et al., 2006;Du et al., 2009;Zou et al., 2013;Zhou et al., 2014). In addition, the long life cycles and low mutation rates in conifers may explain the low levels of nucleotide polymorphisms in P. pumila, P. griffithii, P. koraiensis, and P. armandii. Moreover, high levels of linkage disequilibrium were detected and some species deviated from neutrality according to the tests conducted in our study. This was particularly evident at one locus, i.e., PtIFG2009, which suggests that this locus might have undergone selection or population shrinkage according to the results obtained from the Tajima's D and MFDM tests. In particular, for the populations of P. griffithii and P. koraiensis, Tajima's D was positive, and Fay and Wu's H was negative, which is a pattern that is consistent with a recent bottleneck. In addition, P. griffithii and P. pumila descendant populations had a somewhat smaller size than the ancestral population (Table 3), and thus it is possible that the populations have experienced from genetic bottlenecks. The population dynamics due to geological isolation and climatic oscillations probably contributed to their relatively lower diversity (Chen et al., 2017). In addition, the mean Tajima's D (D) and mean Fay and Wu's H (H) values were negative but close to zero for P. armandii at the PtIFG2009 locus, which may indicate a neutral equilibrium q 1 , q 2 , and q a , parameters of population sizes of the first and second species and their ancestral population, respectively; t, parameter of the divergence time between two species; m 1 is the population migration rate from the second to the first species, m 2 is the population migration rate from the first to the second species; T is the divergence time among two species. HPD, 90% posterior density (90% low and 90% high). (Holliday et al., 2010). The numbers of nucleotide polymorphism were higher in P. pumila, P. koraiensis, and P. armandii than P. griffithii. These results can partly be explained by the fact that their seeds are food for nutcrackers and rodents such as squirrels. These animals may screen the seeds and transport them over long or short distances for secondary storage and dispersal, thereby also enhancing the spread of the seeds, and this may affect their genetic differences (Li et al., 2007;Fan and Jin, 2011). The low level of nucleotide polymorphism in P. griffithii may be explained by its small geographic distribution compared with more common and widespread species, because of drift, founder events, and other stochastic processes (Cole, 2003;Chen et al., 2017). In addition, P. pumila is well known because of its larger island-like distribution and it rarely develops into pure forest, thereby accounting for its higher diversity compared with the other three species (Qiu et al., 2007;Chen et al., 2017).

Interspecific Gene Flow and Species Divergence
Analysis of molecular variance detected remarkable divergence in the four Pinus species where the variations were mainly within populations, which agreed with the small differences among populations of wind-pollinated gymnosperms ( Supplementary  Table S5). However, we also found a high level of genetic divergence within groups, possibly due to habitat fragmentation or the island-like distribution of the four species, particularly when considering that the habitats of P. pumila and P. griffithii are harsher than those of the other two species. The former often grows in barren soil on bare rocky peaks. This type of habitat is vulnerable to fragmentation but mountains and ravines may partly hinder the gene flow between populations, thereby leading to the isolation of groups. Genetic divergence was found among the four species, although some degree of gene flow and introgression was detected. In particular, populations of P. koraiensis had a mosaic-like pattern and they were further subdivided into independent sub-clusters when K = 4, which suggests that a high level of introgression in this species. The pairwise migration rate between P. koraiensis and P. pumila was relatively high compared with that between P. koraiensis and P. armandii. In addition, P. pumila and P. koraiensis had a relatively limited distribution in the northwest and northeast of North China, and there was no clear phylogenetic resolution among P. pumila, P. koraiensis, and P. armandii based on DNA fragments from the chloroplast, mitochondrial, and nuclear genomes according to previous phylogenetic studies Wang and Wang, 2014;Hao et al., 2015). These results suggest that migration may have led to a sympatric distribution in addition to the existing incomplete reproductive isolation. The phylogenetic relationships determined based on ML and NETWORK analysis also showed that the shared haplotypes were located in the center of the topological structure (Figure 2 and Supplementary Figure S5), and thus incomplete genealogical screening based on a large effective population of Pinus may have led to the sharing of ancestral polymorphisms. However, there were no shared haplotypes based on the 0_12929_02 locus, and the interspecies variation was similar to the genetic differentiation among populations (F CT = 0.718, F ST = 0.714; P < 0.001; Supplementary Table S5). In general, different nuclear loci have different evolutionary rates and molecular functions (Li et al., 2012;Eckert et al., 2013;Zhou et al., 2014). Previous studies have shown that the 0_12929_02 locus is associated with the protein kinase family and that it has been under selection (Eckert et al., 2013). The rapid fixation of genetic variation in this locus may have led to greater species divergence Nei and Tajima, 1981;Ellstrand and Elam, 1993). In addition, nuclear DNA introgression in ancestral populations among species may have also affected the topology of the phylogenetic trees. The significant topological incongruence among the nuclear gene trees (Supplementary Figure S5) indicates a complex evolutionary history, thereby providing novel insights into the evolution of Pinus. The four species split into two clades about 1.37 Mya (Figure 3). However, we should be cautious when inferring divergence times based on the assumption that the generation time is 25 years in the four Pinus species because of the longevity of Pinus, the long overlaps between generations, the variable age of maturity and the replacement speed of forests. In addition, our multilocus analysis determined that the genetic divergence among the four pine species, was consistent with geological events and climatic oscillations in the mid-to late Tertiary period about 5 Mya. The uplift of the Tibetan Plateau caused by Himalayan orogeny had a great impact on the climate in China, with decreases of in temperature in some areas, but increases of 4-8 • C in the region east of 100 • E (across the Inner Mongolia, Gansu, Qinghai, Sichuan, and Yunnan regions of China) and of 1-4 • C to the west (Jiang, 2009). These climatic conditions may have changed the geographic distributions of plants, and thus we suggest that P. armandii and its ancestral population spread eastward to the northeast of China and westward to Tibet. However, the intensities of the winter and summer monsoons were reduced greatly during the middle-late Pliocene, and the dispersal of Pinaceae pollen by the wind might have been affected (Zhou et al., 2014). Moreover, after gradual changes in the microhabitats and variations in the directions and amounts of gene flow, as well as the accumulation of mutations, new relatives may have emerged by gradual divergence. Effective migration, hybridization, and introgression among species can increase genetic diversity (Wachowiak et al., 2016), and other factors such as selection, isolation, and genetic drift among different microhabitats can promote divergence and speciation. Indeed, significant and asymmetric gene flow and introgression were detected in the four closely related Pinus species. Gene flow and genetic introgression among different pines could have led to changes in genetic variability (Hao et al., 2015;Wachowiak et al., 2016).

DATA AVAILABILITY
Sequence data obtained in this study were deposited in GenBank (KF286539 -KF286739).

AUTHOR CONTRIBUTIONS
Z-HL conceived the study. YJ and JZ performed the experiments. Z-HL, YJ, YW, W-BF, and G-FZ contributed materials and analysis tools. Z-HL, YJ, and JZ wrote the manuscript. YJ and ZL revised the manuscript. All authors approved the final version of the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01264/ full#supplementary-material FIGURE S1 | The occurrence records were denoted by small dots of different colors for four related pine species: Pinus pumila (red), P. griffithii (green), P. koraiensis (yellow), and P. armandii (blue). The large dots of four different colors represent the current sampling locations for four pines.     Pinus pumila, P. griffithii, P. koraiensis, and P. armandii. S 1 , number of exclusive polymorphic sites in the first species; S 2 , number of exclusive polymorphic sites in the second species; S S , number of shared polymorphisms; S f , number of fixed differences between two species.