Hybridization and introgression in sympatric and allopatric populations of four oak species

Hybridization and introgression are vital sources of novel genetic variation driving diversification during reticulated evolution. Quercus is an important model clade, having extraordinary diverse and abundant members in the Northern hemisphere, that are used to studying the introgression of species boundaries and adaptive processes. China is the second-largest distribution center of Quercus, but there are limited studies on introgressive hybridization. Here, we screened 17 co-dominant nuclear microsatellite markers to investigate the hybridization and introgression of four oaks (Quercus acutissima, Quercus variabilis, Quercus fabri, and Quercus serrata) in 10 populations. We identified 361 alleles in the four-oak species across 17 loci, and all loci were characterized by high genetic variability (HE = 0.844–0.944) and moderate differentiation (FST = 0.037–0.156) levels. A population differentiation analysis revealed the following: allopatric homologous (FST = 0.064) < sympatric heterogeneous (FST = 0.071) < allopatric heterogeneous (FST = 0.084). A Bayesian admixture analysis determined four types of hybrids (Q. acutissima × Q. variabilis, Q. fabri × Q. serrata, Q. acutissima × Q. fabri, and Q. acutissima × Q. variabilis × Q. fabri) and their asymmetric introgression. Our results revealed that interspecific hybridization is commonly observed within the section Quercus, with members having tendency to hybridize. Our study determined the basic hybridization and introgression states among the studied four oak species and extended our understanding of the evolutionary role of hybridization. The results provide useful theoretical data for formulating conservation strategies.

The genus Quercus L. (oaks) contains more than 400 species that are widespread in the Northern hemisphere [18]. Oak is an important model clade that offers fundamental insights into the ecological and evolutionary consequences of hybridization and introgression [19]. In Quercus, naturally occurring interspecific hybrids are common [11,15]. Within Quercus, the American oaks and European white oaks have been well-studied [20]. For example, high quality SNPs from genic regions have been used to characterize the introgression patterns among European white oaks (Quercus petraea and Quercus robur) [21]. Eaton et al. (2015) utilized genomic RAD-seq data sampled from American live oaks (Quercus series Virentes) for phylogenetic inference and determining introgressions between lineages [22]. In China, most oak hybridization studies have used nuclear DNA markers, such as SSRs and AFLPs [23]. For conservation purposes, it is necessary to study the hybridization and introgression of oak, especially native species.
China is the second-largest distribution center of Quercus, with 62 species described in the Flora of China that are divided into five morphology-based sections: Quercus, Aegilops, Heterobalanus, Engleriana, and Echinolepides [24][25][26]. The first two sections (Quercus and Aegilops) are composed of deciduous oak species which correspond to the Quercus and Cerris proposed by Denk [27]. For our present study, Quercus acutissima Carruthers and Quercus variabilis Blume belong to Group Cerris, while Quercus fabri Hance and Quercus serrata Murray belong to Group Quercus. A new classification of Quercus L. was proposed by Denk harbouring eight sections: Cyclobalanopsis, Cerris, Ilex, Lobatae, Quercus, Ponticae, Protobalanus, and Virentes [27]. The phylogenetic positions of our study species did not change, as they remain in Cerris and Quercus Groups.
The studied four oak species (Q. acutissima, Q. variabilis, Q. fabri, and Q. serrata), having an overlapping distribution in subtropical areas, are important species in subtropical mountain and warm temperate deciduous broad-leaved forests, which occupy important positions in China's forest ecosystem [28]. In our study, we aimed to: (1) identify the pattern of hybridization between these four oak species; (2) investigate whether, and to what extent, introgression exists among these four-oak species; and (3) compare the introgression intensity levels in sympatric and allopatric populations. This work provides an understanding of the evolutionary mechanisms of Quercus, as well as the biological processes behind their biodiversity, and has implications for forest management.

The availability of microsatellite loci (SSRs)
The average values of the estimated null allele frequency at each locus across all populations were all less than 0.2 (Table 1). No evidence of significant linkage disequilibrium was observed for each pair of loci in each population at the indicative adjusted nominal level (5%) for multiple comparisons equal to 0.000368 (Table S1). The 17 polymorphic SSRs generated 361 alleles, with a mean of 21 alleles per locus. Some SSRs were highly variable, including quru-GA-0M05, Quru-GA-1H14, Quru-GA-1i15, MSQ16, ssrQpZAG36, and ssrQrZAG112, which each contained more than 21 alleles. Mean PIC, A R , and H S for all markers were 0.909 (range: 0.844 to 0.939), 10.7 (range:7.1 to 14.2), and 0.550 (range: 0.254 to 0.980), respectively ( Table 1).
Comparison of DIC values across the 10 populations showed that models with null alleles had higher support than models without null alleles, and the average estimated frequency of across loci was slightly high (ranging from 0.102 to 0.188) ( Table 2; Table S1). The inbreeding coefficient of the 10 populations was large (mean = 0.41) and significantly deviated from zero (P < 0.05). Two populations, LT-F and ZW-B, did not departure from Hardy-Weinberg equilibrium, and seven populations, BY-A, BY-V, LT-B, ZW-V, ZJ-A, ZJ-V, ZJ-F, and ZJ-B, did not departure from Hardy-Weinberg equilibrium after excluding the null alleles bias.

Analyses of genetic diversity among the four oaks
At the DNA level, the four oak species had high degree of genetic diversity where mean N E, A R , and H S were 6.7 (range: 5.8 to 7.9), 10.7 (range: 8.7 to 11.8), and 0.829 (range: 0.782 to 0.862), respectively ( Table 2). The four species were divided into two groups belonging to Cerris and Quercus (groups AV and FB) based on the STRU CTU RE analysis. The H o of group FB (0.607) was greater than that of group AV (0.497) as determined using FSTAT (P < 0.05). The highest intraspecific genetic diversity (A R , H S , and H o ) was observed in the Q. fabri population compared with the three other investigated species and was significantly different from Q. variabilis (P < 0.05). Among the four different sites, Liangting had highest H o , which was significantly different from that of Zhongwu. Significant differences in A R were observed between Zijin and Zhongwu while H S and F ST showed no significant differences among sites.

Genetic differentiation and population structure
Null alleles have particular effect on the estimation of population differentiation, which was indicated by the almost equal values and similar 95% level CIs of global F ST across all loci with (cF ST = 0.085, 95% CI: 0.065-0.105) and without (F ST = 0.101, 95% CI: 0.078-0.125) using the "exclusion null alleles" correction (Table 1). For each locus, cF ST (range: 0.033 to 0.168) and standardized differentiation G′ ST (range: 0.033 to 0.190) were relatively low at the nominal level (5%). Thus, a weak genetic differentiation occurred among the four-oak species. We also compared the genetic differentiation between pairs of populations ( Fig. 1a). In this study, we identified the same species as homologous and different species as heterogeneous. Population differentiation analysis revealed the following: allopatric homologous (mean F ST = 0.064) < sympatric heterogeneous (F ST = 0.071) < allopatric heterogeneous (F ST = 0.084). The interspecific genetic differentiation levels were more significant than genetic differentiation among regions as observed using two-sided P-values ( Table 3). The PCoA confirmed a strong genetic structure existed among the populations ( Fig. 1b-d).
AMOVA results indicated that 1.04% (P = 0.003) of the genetic variation occurred among regions and 9.78% (P < 0.0001) among populations within regions, with the greatest genetic differentiation of 89.17% (P < 0.0001) being harbored within populations ( Table 4). The hierarchical AMOVA revealed a significant genetic differentiation among species (F CT = 0.04064, P < 0.001), with 7.2% (P < 0.0001) occurring among populations within species and the greatest genetic differentiation of 88.74% (P < 0.0001) being harbored within populations. The intraspecific gene flow of Q. acutissima, Q. variabilis, Q. fabri, and Q. serrata were estimated to be 3.31, 2.56, 4.72, and 3.8, respectively (Table 5). Simultaneously, the genetic variation among species was greater than among regions. In total, 300 samples were clustered into two branches based on Nei's genetic distances using the NJ method ( Fig. 2). Q. acutissima and Q. variabilis individuals were clustered into a large branch below the tree (group AV) and Q. fabri and Q. serrata (group FB) individuals were clustered into another large branch above the tree. The same species, along with other species, showed intraspecific differentiations and interspecies variations. Additionally, NJ graph showed considerable variation and somehow weak clustering of Q. fabri and Q. serrata. The clustering results of leaf variation were basically consistent with the molecular results. We clustered the four species and found that 14 individuals of Q. fabri and all of Q. serrata samples clustered into a single branch. These results indicated that the leaf traits were quite different from those of other Q. fabri individuals. There was a small deviation at the population-level clustering, which mainly occurred for the Q. variabilis population in Bayan, as well as for Q. fabri and Q. serrata populations in Liangting and Zijinshan (Fig. 3). This indicated that there was no great difference in morphology, but that the genetic structure had changed.

Admixture analysis and hybrid identification
Delta K, which is used to determine the best fit value of K, was computed using STRU CTU RE HARVESTER for the given range, 1-9, and the highest value was obtained at K = 2 (Fig. 4). This was consistent with the taxonomic classifications of these four distinct species. In total, 98% of Q. acutissima and Q. variabilis individuals were assigned to cluster I (QI), with an average proportion of membership (QI = 0.98), and most Q. fabri and Q. serrata individuals (100%) were assigned to cluster II (QII = 0.99). Only three remaining individuals (two Q. acutissima and one Q. variabilis) showed different degree of admixture with assignment probabilities of < 0.90. The 10 populations were divided into two large groups (Groups AV and Table 3 Comparison of genetic statistics among the four Quercus. L groups A R allelic richness with rarefaction to a common sample size of 10, H O observed heterozygosity, H S genetic diversity within populations, F ST genetic differentiation index. *P < 0.05. **P < 0.01. Two-sided P-value was obtained after 10,000 permutations and taken as the proportion of random data sets giving larger statistics than the observed statistics       (Table S2). We observed four types of hybridization, Q. acutissima-Q. fabri, Q. acutissima-Q. variabilis, Q. fabri-Q. serrata, and Q. acutissima-Q. variabilis-Q. fabri (Table 6). A total of 188 heterozygous individuals were observed (overall Hr = 62.77%), including 67 F 1 hybrids and 121 backcross individuals. Among the four sampled plots, the highest hybridization rate was found in Bayan plots (100%). Among the four-oak species, the hybridization rate was as follows: Q. acutissima (Hr = 18.0%) < Q. fabri (Hr = 50.7%) < Q. variabilis (Hr = 74.2%) < Q. serrata (Hr = 94.9%). Most hybridizations occurred in the same groups (AV or FB); however, there were also introgression individuals produced by cross breeding among

Discussion
Introgression studies usually focus on large-scale pairs of individual species [29] and sympatric related species [30], but rarely species within distinct ecological niches and evolutionarily clades. Here, we examined the extent of introgression that existed among four oaks belonging to two different groups of sympatric and allopatric populations. Oak (Quercus) are notorious for hybridization with species extinction, and this may also play a critical role in adaptive evolution [31,32]. Hybridization and introgression in oaks has been well documented, and adaptive introgression was important in evolution as revealed by genomics [11,[32][33][34]. Our study used nSSR markers to investigate the introgression of four oaks. We identified four types of hybrids: Q. acutissima × Q. variabilis, Q. fabri × Q. serrata, Q. acutissima × Q. fabri, and Q. acutissima × Q. variabilis × Q. fabri. Introgression occurred more readily in the Quercus group than in the Cerris group, and cross-group hybridization also existed; however, it was rare. This suggested that a low frequency of interspecific hybridization occurred with incomplete reproductive isolation. Reproductive isolation mechanisms contributed to the simultaneous coexistence of the four oaks.

Genetic diversity and genetic variation of the studied four oak species
Introgression is a vital source of novel genetic variation and drives diversification during reticulate evolution [3].
Owing to the critical variation introduced by hybridization, and the hybrid constitution itself, the diversification rate shifts [2]. In this study, we found that the studied 10 oak populations, all had high levels of genetic diversity, with Q. fabri in the Liangting and Zijing sites having the highest value. Long-lived trees span broad geographical ranges through a combination of adaptive plasticity and adaptive differentiation, as well as the standing genetic variation that occurs in response to environmental changes [4]. A high genetic diversity provides adaptability that ensures the survival of the four oaks when facing extreme climatic events. In our study, the F ST values of the 10 populations ranged from 0.05 to 0.15, which indicates moderate differentiation. Additionally, the order of of genetic differentiation among the 10 populations was as follows: allopatric homologous < sympatric heterogeneous < allopatric heterogeneous. Q. acutissima and Q. variabilis diverged 13.2 Ma, and the divergence time of Q. fabri and Q. acutissima was 12.33 Ma [35,36]. During the quaternary ice period, the three species (Q. acutissima, Q. variabilis, and Q. fabri) showed different migration routes in response to climatic changes. In addition, the strong uplift of the Tibetan Plateau in China led to the intensification of the Asian monsoon climate, which may promote species differentiation [37][38][39]. The main cause of the genetic differentiation was the long-term evolutionary divergence of the four species, followed by adaptive differentiation caused by interspecific introgression [40].
By studying 17 pairs of polymorphic loci with different F ST values, we speculated that gene flow occurred among these oaks. The limited gene flow prevented hybridization-induced genetic swamping, instead of providing more sources of genetic variation for heterozygotes. Additionally, we found that genetic variation among the four oaks mainly occurred within populations (88.74%), with only a small percentage occurring among populations (7.2%). Ramırez-Valiente et al. (2018) found that drought drove the evolution of genetic differences in functional traits among oaks [41]. The cluster analysis of leaf variation of the 300 individuals also showed large phenotypic variation. Allelic and associated phenotypic changes of species are strategies used to respond to the threat of global climatic changes during the adaptive evolutionary progress [42]. Although a small number of nuclear loci were used in this experiment to verify this hypothesis, we could not determine a phenotype associated with introgressed loci. A future genome-wide association analysis will further analyze the ingression regions and phenotypic associations.

Gene flow and hybridization among the four-oak species
Compared with intraspecific gene flow dynamics, interspecific introgression is more complex owing to  (3.749). Gene flow is an important attribute affecting species ability to spread its genes [44]. Most oak trees have larger pollen dispersal distances. The dispersal capacity in the genus Quercus is also related to many factors, such as tree height, vegetation density, and leaf area [45]. A small amount of gene flow enables all breeding populations to maintain similar alleles and thus high heterozygosity, so the NJ clustering graph showed weak differentiation between Q.fabri and Q. serrata groups while the STRU CTU RE clustering revealed similar genetic separation. However, it may also be caused by variation within population, as stated above, the proportion of variation within population is very high (88.74%). We found recurrent interspecific gene flow among the four oaks. The STRU CTU RE clustering revealed that most populations contained hybrids between two species. For example, the ZJ-A population showed F 1 hybrids between Q. acutissima and Q. variabilis, while BY-A population showed hybridization between Q. acutissima and Q. fabri. Interestingly, a large number of individuals in the BY-V population appeared to be trihybrids. Additionally, STRU CTU RE clustering showed that ZJ-F and ZJ-B populations did not form two strong groups and their hybrid individuals are relatively uniform, probably representing a mixed group. However, the phenotypic data indicated that these are two distinct species. Additionally, these two populations, BY-V and ZW-B, have more uniform hybrids, and we are not sure whether this is due to ancestral lineage or recent gene flow, and furthermore we did not rule out the existing possibility of shared variation. Because no suitable outgroups were utilized in the present study, the observed complex admixture patterns maybe are attributable to their ancestral contributions from other Quercus spp. Additional samples of related species are needed to elucidate this case. At the same time, the Bayan site appeared to be an especially active zone of introgression among these three oaks. Many F 1 progeny and some backcrosses were hybrids. In section Cerris, the most hybrids occurred among Q. variabilis backcrosses, while in section Quercus, the most hybrids occurred among Q. serrata backcrosses. This may have resulted from our sampling of individuals displaying more than one species traits, but we cannot rule out a bias toward introgressions between parental species. Such asymmetric introgression is common [46], as in the introgression from the coastal oak species Quercus dentata to an ecotype of the inland oak species Quercus mongolica var. crispula, which has colonized the coastal environment in northern Japan [34]. The environment in which hybrid individuals are grown is an important cause of asymmetric introgression [47][48][49]. Oaks are wind-pollinated, and gene flow mainly occurs through pollen [50]. Therefore, differentiation in seed and pollen dispersal between hybrids and parental species are crucial. Future sampling should target a broader range of individuals in unique environments, like the edge of a hybrid zone, to determine whether their asymmetric introgression is occurring at the genome level. In addition, the reproductive isolation patterns between the hybrid individuals and their parents would be experimentally verified.

Implications for forest management
Natural hybridization is a common phenomenon in plants, occurring in 25% of existing species [51,52]. Fertile hybrids backcross with their parents, leading to introgression. Introgression is an evolutionary creative force that introduces new, possibly adaptive alleles into a population [53,54]; however, excessive gene flow may result in genetic swamping and species extinction [55]. For conservationists, whether to prevent or encourage interspecific gene flow is a difficult question. Oaks are "notorious" for their widespread hybridization and are exceptional forest trees with high levels of genetic diversity. Oaks tend to hybridize, and introgression may influence community structure and increasing genetic diversity [6]. In the face of climatic changes, interspecific gene flow transfers adaptive alleles to avoid extinction [56,57]. The studied four oak species (Q. acutissima, Q. variabilis, Q. fabri, and Q. serrata) are all widely distributed in China. A palaeo distributive simulation showed that Q. acutissima, Q. variabilis, and Q. fabri tended toward expansion and contraction in glacial and interglacial periods, respectively [37][38][39]. Adaptive evolution in response to changing environment allows an organism to avoid extinction. Such adaptive evolution is related to the adaptive introgression of the species. In our study, hybridization and introgression existed among the studied four oak species. Although we did not confirm whether this introgression was adaptive, other experiments have confirmed this view [53,54]. In future studies, we will use genome-wide data to correlate phenotypic data with fitness to explore the adaptive introgression of Quercus. To increase the efficacy of conservation strategies, conservation management should consider evolutionary theory. Thus, managers should consider whether hybrids co-exist with the parental species when formulating conservation strategies because natural hybrids have conservation value. Natural hybridization zones are good sites for studying species adaptive evolution using newly developed genomics techniques.

Conclusions
The long-term evolutionary divergence of the studied four oaks, and the resulting interspecific introgression, led to their high levels of genetic diversity and moderate differentiation. The genetic variation among the four oaks mainly occurred within populations, with only a small percentage occurring among populations. Four types of hybrids (Q. acutissima × Q. variabilis, Q. fabri × Q. serrata, Q. acutissima × Q. fabri, and Q. acutissima × Q. variabilis × Q. fabri) were determined, accompanied by asymmetric introgression. We concluded that interspecific hybridization is commonly observed within the section and that section Quercus has a high tendency to hybridize. In future work, we hope to use genomic data to study the dynamic nature of gene flow and adaptive introgression in these hybrid populations.

Sampling and study sites
In this study, we used the new classification of Quercus proposed by Denk et al. (2017) in which Quercus is divided into eight sections: Cyclobalanopsis, Cerris, Ilex, Lobatae, Quercus, Ponticae, Protobalanus, and Virentes [27]. We selected four species in section Quercus, two wild populations of Q. acutissima, three of Q. variabilis, two of Q. fabri, and three of Q. serrata, for genetic analyses, and they were sampled randomly within four plots distributed in Jiangsu Province (Plot Zijing) and Anhui Province (Plots Bayan, Liangting, and Zhongwu) (Fig. 5). Within each locality, selected individuals were at least 30 m apart to avoid sampling the same plant. A total of 300 adult individuals were sampled, and their geographic coordinates and altitudes were recorded using a global positioning system (Table S3). From each individual plant, three to five leaves were collected, quickly dried in silica gel, and stored at room temperature for molecular experiment. At the same time, we randomly collected 10-15 leaves for morphological analysis. A single branch was collected for identification by Y.F. and then preserved in Nanjing Forestry University herbarium.

Measurement of leaf morphological traits
Nine leaf morphological attributes were measured using 10 mature leaves from each of the 300 trees (Table 4 S). The observed variables were scored as indicated in Table 4S. The dimensional characteristics were measured with a ruler. The epidermal hairs were observed under a microscope at 40 × magnification.

DNA extraction and microsatellite genotyping
Total genomic DNA was extracted from 30-40 mg dried leaves per individual using a Plant Genomic DNA Kit (Tiangen, China). The quality and concentration of the genomic DNA were evaluated using a One-drop spectrophotometer (OD-1000, Shanghai Cytoeasy Biotech Co., Ltd., China) and electrophoresis was conducted in 1% agarose gels. DNA samples were diluted to 20 ng/μL and stored at -20 °C.

Statistical analyses
We used the dimensional characteristics, transformed variables, counted variables, and observed variables described in the literature to detect the intraspecific variation. The morphological characteristics data was analyzed using SPSS 19.0 (SPSS, Inc., Chicago, IL, USA). Using the Euclidean distances, nine phenotypic characteristics were clustered at the individual and population level. The obtained population clustering graphs and individual clustering diagrams were used along with the molecular results in a comparative analysis.
For the SSR data, we used INEST, version 2.2 [64] to detect null allele frequency at each locus simultaneously in each population using the individual inbreeding model, which includes three parameters, null alleles (n), inbreeding coefficients (f), and genotyping failures (b). The number of Markov Chain Monte Carlo iterations and burn-in were set at 500,000 and 50,000 cycles, respectively. Thinning maintained every 50 th update. To conduct a Bayesian model comparison, we performed the analysis using the model with null alleles (nfb) and without null alleles (fb) to evaluate the significance of null alleles within each population. We set nfb significance using low deviance information criterion (DIC). For this method, the thresholds were set to a maximum of less than 0.2 and a mean of less than 0.1, which resulted in the final selection of all 17 loci for subsequent analyses. Linkage disequilibrium for all the locus pairs in each population and significant deviations from Hardy-Weinberg equilibrium were determined using Genepop version 4.6.9 [65].
Genetic diversity statistics of each population and each locus were estimated using POPGENE, version 1.32 [66]. The statistics included number of alleles (N), observed (H O ) and expected (H E ) heterozygosity and effective number of alleles (N E ). Polymorphic information content (PIC) was estimated in Cervus 2.0 [67], while allelic richness (A R ), genetic diversity within populations (H S ), and inbreeding coefficient (F IS ) were estimated using FSTAT version 2.9.3.2 [68]. The corrected inbreeding coefficient (F IS ′) for each population was also evaluated using INEST version 2.2 [64] with the full model (nfb). The significance of F IS ′ was assessed by comparing DIC values of inbreeding and without inbreeding models.
We estimated global F ST and cF ST [69] both with and without the "exclusion null allele" correction per loci using FreeNA [70]. The 95% level confidence intervals (CIs) of both global F ST and cF ST across loci were obtained through bootstrap resampling in the same program. The F ST of each pair of populations was estimated in GENALEX 6.5 [71] and visualized in the form of a heatmap using Heml 1.0.33 [72]. Gene flow was indirectly estimated using Wright's (1951) formula: Nm = 1-F ST /4 F ST . Additionally, we performed a principal coordinate analysis based on pairwise population F ST values using GENALEX 6.5 [71]. We tested the significance of differences in A R , H O , H S , and F ST among geographic regions and among species using FSTAT, version 2.9.3.2 [68]. The two-sided P-values were obtained after 10,000 permutations. To examine the genetic variances of allopatric and heterogeneous types we conducted analysis of molecular variance (AMOVA) using F-statistics using ARLEQUIN, version 3.5 [73]. Significances of fixation indices (F CT , F SC , and F ST ) were tested with 10,000 permutations.
STRU CTU RE software [74] was used to infer population structure. To identify the number of populations (K) capturing the data major structure, a burn-in period of 500,000 Markov Chain Monte Carlo iterations was used, with a 500,000-run length. In total, 20 independent runs were performed for each simulated K value, ranging from 1 to 9. All the iterations were run with the admixture model, which assumes that individuals may have mixed ancestry, because of the likelihood of inter-population and interspecific crossing among the four oaks. The optimal K value was identified from the maximum value of ΔK [75] as implemented using STRU CTU RE HARVESTER 0.6.93 [76]. Clusters of 20 runs were permuted using CLUMPP, version 1.1.2 [77], and DISTRUCT 1.1 [78] was employed to envisage the STRU CTU RE results after processing with CLUMPP. On the basis of the binary matrices, Nei's unbiased genetic distance matrix was calculated using GENALEX 6.5 [71], and was employed to construct a dendrogram using MEGA 7.0 [79] with the neighborjoining (NJ) clustering method [80].
The admixture coefficient (q-value) generated from STRU CTU RE was used to classify individuals into purebreds and hybrids, with a threshold q-value of = 0.1, where samples with q-values < 0.1 or > 0.9 were classified as purebreds, and those with 0.1 < q-values < 0.9 as hybrids, including the F 1 generation and backcrosses [81,82]. F 1 hybrids have q-values = 0.5, but the coefficients of backcrosses are biased toward one of the parental species and produce q-values between 0.5 and 0.9 [81]. Taking errors into consideration, individuals with 0.6 < q-values < 0.9 were recognized as backcrosses.

Supplementary Information
The online version contains supplementary material available at https:// doi. org/ 10. 1186/ s12870-021-03007-4. Additional file 1: Table S1. Deviance information criterion values for nfb, nb, and fb models in 10 populations obtained by INEST. Table S2. Probability of membership to each genetic cluster in each population of four oaks when K = 2 and 4. Table S3. Locations of 10 populations of four oak species. Table 4S.. Leaf macromorphological features examined. Table S5. SSR primer sets, number and size of alleles amplified in four oak species study.