Combined genotype and phenotype analyses reveal patterns of genomic adaptation to local environments in the subtropical oak Quercus acutissima

Understanding the effects of the demographic dynamics and environmental heterogeneity on the genomic variation of forest species is important, not only for uncovering the evolutionary history of the species, but also for predicting their ability to adapt to climate change. In this study, we combined a common garden experiment with range‐wide population genomics analyses to infer the demographic history and characterize patterns of local adaptation in a subtropical oak species, Quercus acutissima (Carruthers). We scanned approximately 8% of the oak genome using a balanced representation of both genic and non‐genic regions and identified a total of 55 361 single nucleotide polymorphisms (SNPs) in 167 trees. Genomic diversity analyses revealed an east–west split in the species distribution range. Coalescent‐based model simulations inferred a late Pleistocene divergence in Q. acutissima between the east and west groups as well as subsequent preglaciation population expansion events. Consistent with observed genetic differentiation, morphological traits also showed east–west differentiation and the biomass allocation in seedlings was significantly associated with precipitation. Environment was found to have a significant and stronger impact on the non‐neutral than the neutral SNPs, and also significantly associated with the phenotypic differentiation, suggesting that, apart from the geography, environment had played a role in determining non‐neutral and phenotypic variation. Our approach, which combined a common garden experiment with landscape genomics data, validated the hypothesis of local adaptation of this long‐lived oak tree of subtropical China. Our study joins the small number of studies that have combined genotypic and phenotypic data to detect patterns of local adaptation.


Introduction
Detecting signatures of adaptation across conspecific populations is important for understanding both the genetic basis of adaptation to the local environment and the potential effects of climate change on species (Savolainen et al., 2013;Prunier et al., 2016;Yeaman et al., 2016). Species occupying spatially heterogeneous environments are usually subject to heterogeneous selective pressures. Natural selection acts initially on phenotypic traits, over time resulting in changes of allele frequencies at the specific genetic loci underlying those traits and leading finally to locally adapted populations (Orsini et al., 2013). There is evidence of widespread local adaptation both in plants (Leimu & Fischer, 2008) and animals (Fraser et al., 2011). Tree species are of particular interest for research on local This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
May 2021 | Volume 59 | Issue 3 | 541-556 © 2020 The Authors. Journal of Systematics and Evolution published by John Wiley & Sons Australia, Ltd on behalf of Institute of Botany, Chinese Academy of Sciences adaptation because of their economic and ecological importance (Hoffmann & Sgro, 2011). Dominant tree species are keystone species for many complex ecological communities and many occupy highly heterogeneous habitats. In addition, trees typically have high outcrossing rates and large effective population sizes, resulting in low levels of linkage disequilibrium (Moritsuka et al., 2012;Parchman et al., 2012). Their genomes could therefore contain genetic material of unrealized adaptive potential Iida et al., 2014).
The architecture of adaptive traits and evolutionary history are important for studying the adaptation of tree species (Alberto et al., 2013;Kremer et al., 2014). Historical demographic processes such as range expansion and serial colonization could determine population structure and mimic the patterns produced by natural selection (Excoffier et al., 2009;Siol et al., 2010;Hoban et al., 2016). Although many common garden and reciprocal-transplant experiments have been carried out to demonstrate the adaptation of tree populations to local environments (reviewed by Savolainen et al., 2007Savolainen et al., , 2013Lind et al., 2017), it is still somewhat difficult to reveal the genetic basis of local adaptation. The lack of genome-wide investigation limits the ability to identify the genomic regions that are under selection as, in most cases, the traits that confer local adaptation in trees are suggested as polygenic quantitative traits (Le Corre & Kremer, 2012). Thus, many markers dispersed throughout the genome are needed to characterize both adaptive and random changes (Bragg et al., 2015;Chen et al., 2016). Given that few tree species have substantially developed genomic resources, restriction site-associated DNA sequencing (RAD-seq) (Miller et al., 2007;Baird et al., 2008) was a cost-effective and efficient method for generating single-nucleotide polymorphisms (SNPs) markers distributed across the whole genome. It could provide both genome-wide amounts of neutral and non-neutral loci for genomic studies of demographic history and for adaptation in trees (Parchman et al., 2018).
Oak (Quercus L. and related genera in the Fagaceae) is a powerful model for studying the adaptation of forest trees to global environmental changes in Eurasia and North America (Petit et al., 2013;Cannon et al., 2018). The combined effects of geographic distance and adaptation on the genetic and morphological differentiation of oak populations has been widely reported both in Europe (Alberto et al., 2013;Rellstab et al., 2016;Pina-Martins et al., 2019) and in California (Riordan et al., 2016;Sork et al., 2016b;Gharehaghaji et al., 2017). In addition, evidence of spatially divergent selection on climate-associated candidate genes in natural populations of valley oak (Quercus lobata Née) has provided preliminary indications of local adaptation (Sork et al., 2016b). Moreover, a study of three oak species (Q. petraea [Matt.] Liebl., Q. pubescens Willd., and Q. robur L.) in Switzerland provided an example of adaptive genetic variation associated with local topography, historical climate, and soil characteristics (Rellstab et al., 2016). However, in tropical and subtropical Asia, most genetic studies of Quercus and related genera have primarily focused on phylogeography and genetic diversity Shi et al., 2014;Xu et al., 2015;Zeng et al., 2015;Du et al., 2017;Meng et al., 2017;Deng et al., 2018;Jiang et al., 2018). Only two studies, one focusing on Castanopsis carlesii (Hemsley) Hayata (Sun et al., 2016) and another on Q. championii Benth. (Jiang et al., 2019), have used genome-wide markers to address whether climatic factors affect genetic variation in natural populations. Both studies suggested that geography and climate jointly shape the genetic structure of these species. Furthermore, specific climate variables such as seasonal precipitation were identified as key drivers of the local adaptation of Q. championii (Jiang et al., 2019). Further research, and in particular common garden experiments undertaken in concert with landscape genomics assessments, could simultaneously quantify the roles of phenotypic plasticity and local adaptation in oak species' response to environmental heterogeneity.
Quercus acutissima (Carruthers) is a dominant species occupying a wide range of environmental conditions in subtropical and warm temperate zones in China. Owing to its high ecological and economic value, it has been listed as a precious timber species in China. Populations of this species can persist across large temperature and precipitation ranges, which are associated with continuous gradients in latitude and longitude (and distance from the ocean), respectively. An investigation of the seeds of Q. acutissima found throughout its natural distribution showed significant differences between populations, especially in seed mass and free sugar content (Liu et al., 2011). Based on observed patterns of phenotypic differentiation, we hypothesized that environmental heterogeneity might exert strong selective pressures on these populations, and could thereby drive phenotypic and genotypic divergence in populations of Q. acutissima. The aim of the present study was to test the hypothesis of genetic-based climate adaptation in Q. acutissima by combining two complementary approaches, that is, an analysis of genome-wide genetic markers obtained using RAD-seq, and direct phenotypic evidence from a common garden experiment. As some of the phenotypic differences observed between populations in a common garden experiment are heritable (and could therefore be adaptive) rather than plastic (Gonda et al., 2011;Brachi et al., 2013), this experiment also enabled us to detect the local adaption on the phenotypic variation among populations from different environments. Our specific goals were to: (i) characterize the genetic structure of Q. acutissima populations; (ii) quantify phenotypic differentiation and the main environmental variables correlated with measured traits; (iii) examine the roles played by geographic isolation and adaptation in determining phenotypic and genotypic variation; and (iv) identify candidate outlier loci that might be involved in local adaptation.  Table S1).
We set up a common garden at Chuzhou, Anhui province, China (118.06°E, 32.16°N) using acorns from 15 to 20 adult trees of each population for long-term observation (Fig. 1). The common garden is located in a transitional area between the humid zone in central China and the semi-arid temperate zone in north China, with an annual mean temperature of 15.2℃ and annual precipitation of 1041.6 mm. We established the common garden using a random block design, with 15 seeds (mixed seeds from all individuals per population) in each population plot and three blocks. To prevent edge effects, we planted the same species tree around the trial. The morphological traits of seedlings were measured at 1 year old.

Restriction site-associated DNA sequencing and SNP calling
We isolated DNA from leaves of eight randomly selected and unrelated individuals from each population in the common garden and generated RAD sequencing data (Baird et al., 2008). In total, we obtained high-quality data for 167 individuals. The RAD-seq library was prepared using 200 ng DNA from each individual tree; after extraction, DNA samples were digested with the restriction enzyme EcoRI. Illumina sequencing adaptor and a unique barcode were ligated to EcoRI-digested fragments. All the individual libraries were pooled and randomly sheared, and amplified by polymerase chain reaction using Illumina sequencing primers (Doc. S1). DNA fragments with insert fragment sizes from 200 to 400 bp were isolated from a 2.0% agarose gel using QiaQuick gel extraction kits (Qiagen, Hilden, Germany). These fragments were then double-end sequenced on an Illumina HiSeq2500 (Illumina, San Diego, CA, USA).
The quality of the raw sequence reads was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/); adaptor sequences and low-quality bases (i.e., with base quality scores < 20) at the start and/or end of each read were removed using Trimmomatic (Bolger et al., 2014). Reads shorter than 36 bp were discarded. We used BWA-MEM (Li, 2013) to map the remaining reads to the Q. lobata reference genome (Sork et al., 2016a) with default parameters. Quercus lobata is the closest relative to Q. acutissima with a relatively good quality of genome sequence available. We carried out SNP calling using GATK version 3.2.2 with HaplotypeCaller and GenotypeGVCFs . We also used Vcftools (Danecek et al., 2011) to further filter low-quality SNPs according to the following criteria: (i) only biallelic SNPs that were at least 5 bp away from any indel and with mapping quality > 40 were retained; (ii) genotypes with a genotype quality score < 20 and sequencing depth < 10 were treated as missing; and (iii) SNP with a genotype missing call rate ≥ 30% and a minor allele frequency < 0.05 across the whole sample set were removed. After these filtering steps, 55 361 SNPs were retained for genetic analyses. For genetic structure and coalescent simulation analysis, we applied different filtering strategies as described below.
2.3 Genetic diversity, genetic structure, and population demography We estimated gene diversity (π) for each population using Arlequin version 3.5 (Excoffier & Lischer, 2010), and the perindividual based heterozygosity of each population using VCFtools version 1.012 (Danecek et al., 2011). We investigated the genetic structure of Q. acutissima populations using the model-based clustering algorithm implemented in Admixture version 1.23 (Alexander et al., 2009). As linkage disequilibrium could violate the statistical assumptions involved in this test, we pruned the dataset (kept 19 834 SNPs) using the indep-pairwise function (50 5 0.05) in PLINK (https://www.cog-genomics.org/plink/1. 9/). We ran Admixture with K = 1-15 and repeated the process 20 times with random seeds. A 10-fold cross-validation procedure was used for evaluating the runs with different K values in Admixture. We also assessed the distribution of genetic variance by principal component analysis (PCA) as implemented in EIGENSOFT version 6.0 (Patterson et al., 2006).
In cases of uneven sampling across species' range, the model-based methods for population structure may tend to partition continuous variation into spurious clusters and overestimate the number of potential clusters. The conStruct software (https://github.com/gbradburd/conStruct) infers genetic structure by taking into account the continuous pattern of differentiation in geographically separated populations (Bradburd et al., 2018). It examines two models, a non-spatial model, which is similar to the model used in Admixture, and a spatial model that allows for testing for the presence of isolation-by-distance. We tested both the spatial and non-spatial models for a range of K = 1-10, with 10 repetitions per K value and 50 000 iterations per repetition. We used 10-fold cross-validation and calculated the relative contribution of each cluster to the total covariance to determine the most likely number of clusters (K value).
To gain insight into the historical processes that generated the observed population structure, we tested eight demographic models using fastsimcoal2 (Excoffier & Foll, 2011;Excoffier et al., 2013) based on the identified two clusters (East and West) in the sampled populations (see "Results"). The models tested describe different demographic scenarios for the two groups. These models include: M1-2, divergence with or without gene flow, and descendant populations have constant population sizes; M3-4, after divergence, the descendant populations show exponential population size changes in both groups (with or without gene flow); M5-6, after divergence, the descendant populations show stepwise population size changes followed by exponential growth (with or without gene flow); and M7-8, after divergence, the descendant populations show constant exponential growth from different start times (see Fig. 3). Since fastsimcoal2 models with missing data can result in biased estimates of the site frequency spectrum (SFS), we performed a downsampling procedure following Thomé & Carstens (2016) to maximize the number of segregating sites retained for SFS estimation. Finally, we generated a joint folded SFS using Arlequin v3.5 (Excoffier & Lischer, 2010) for Group W (n = 36) and Group E (n = 30) using unpruned SNPs with > 10x coverage, including low-frequency SNPs (without filtering out SNPs with minor allele frequency < 0.05) and no missing data (4247 SNPs left). We used a mutation rate per site per year value of 1.8 × 10 −9 , as previously estimated for the genus Quercus  simulations. Each model proceeded for 100 replications of 200 000 simulations to calculate composite likelihoods (Papadopoulou & Knowles, 2015). Model selection was based on maximum likelihood over 100 independent runs using the Akaike information criterion (Excoffier et al., 2013), and the relative likelihood w i . We calculated the 95% confidence intervals (CI) for all parameter estimates from 100 parametric bootstrap replicates by simulating the SFS from the maximum composite likelihood estimates; parameters were reestimated each time.

Outlier detection and genetic-environmental association analysis
To detect putative outlier loci that may be associated with local adaptation, we used the allele frequency-based pcadapt (Luu et al., 2017) and a genetic-environmental associationbased method known as Bayenv2 (Coop et al., 2010;Günther & Coop, 2013). We used the dataset including 55 361 SNPs (without pruning) for these analyses, considering that significant associations are likely to be near the true target loci of selection. The pcadapt takes account of the population structure with PCA and using the Mahalanobis distance to detected outliers with respect to how they are related to population structure (Luu et al., 2017). We ran pcadapt iteratively for K = 1-10 latent factors for 2000 iterations with scaled allele frequencies (i.e., the default parameter); and used Cattell's graphical rule to choose the number of principal components (K). We used the R package qvalue method (Storey et al., 2015) to select SNPs at false discovery rate 0.05.
Bayenv2 tests for correlation between allele frequencies and environmental variables while accounting for sampling and covariance due to shared population history. It uses a set of putatively neutral loci as a null model to estimate an arbitrary covariance matrix of allele frequency for calibration (Günther & Coop, 2013). We generated a set of putatively neutral and independent SNPs by excluding the outliers identified by pcadapt and pruned the dataset using the indep-pairwise (50 5 0.05) implemented in PLINK. This set of SNPs was then used to estimate the covariance matrix over 100 000 iterations generated by three independent runs with different seed numbers to ensure convergence. The environmental correlations were determined by averaging five independent runs of Bayenv2 with 100 000 Markov chain Monte Carlo iterations with different random seeds and using the covariance matrix estimated by the null model as a calibration. We only considered SNPs that were in the top 1% of the observed Bayes factor values (i.e., > 3) and in the top 5% of the absolute Spearman's rank correlation coefficient (ρ) results to be robust candidate adaptive loci. The climate data included in our genetic-environmental association analysis were obtained from Worldclim (available at http:// www.worldclim.org/) (Hijimans et al., 2005); soil organic carbon, soil pH, and growing degree day values were from the Center for Sustainability and the Global Environment (https://nelson.wisc.edu/sage/data-and-models/); groundfrost frequency, wet day frequency, and vapor pressure values were taken from the IPCC database (http://www.ipccdata.org/observ/index.html); the global UV-B seasonality values were taken from the global UV-B radiation dataset (https://www.ufz.de/gluv/). To avoid multicollinearity, we selected 12 environmental variables that had pairwise Spearman correlation coefficients < 0.8 in our sampling range (Table S1). Fig. 3. A, Summary of the best demography model for the two Quercus acutissima groups. Eight demographic parameters estimated by fastsimcoal2 are shown. Each block represents a current or ancestral population with its estimated effective population size (Ne). Arrows denote the direction of gene flow with the estimated migration rate labeled above or below the arrow. The timings of the splitting events and expansion events are indicated in Mya. B, Eight demographic scenarios of the two groups: M1-2, divergence with or without gene flow, and descendant populations have constant population sizes; M3-4, after divergence, the descendant populations show exponential population size changes in both groups (with or without gene flow); M5-6, after divergence, the descendant populations show stepwise population size changes followed by exponential growth (with or without gene flow); M7-8, after divergence, the descendant populations show constant exponential growth from different start times.

Isolation by distance and isolation by environment
To understand the relative contributions of geography and environment to population differentiation, we undertook a series of redundancy analyses (RDA) to divide the amongpopulation genetic variation into IBD (isolation by distance) and IBE (isolation by environment) fractions. We included two independent matrices, the 12 environmental variables (representing IBE) and geographic variables (IBD). The geographical matrix was based on the geographic coordinates (in decimal degrees) transformed by the pcnm function in the vegan R package (Borcard & Legendre, 2002).
To prevent over-fitting, we used a forward selection procedure on geographic and environment matrix variables that retained only the relevant polynomial terms. After forward selection, the geographic variables of PCNM1, PCNM2, PCNM3, PCNM6, PCNM7, and PCNM9 and the environment variables of bio7, bio12, bio14, and bio19 were left. We used the Hellinger-transformed allele frequencies prior to RDA (Legendre & Gallagher, 2001), and we calculated the adjusted R 2 (variance explained by the model) to correct for the number of explanatory variables using the varpart and rda functions of the vegan package (Oksanen et al., 2016). The statistical significance of each partition was tested using the anova.cca function of the vegan package with a step size of 1000, resulting in at least 999 permutations. We undertook RDAs on both neutral (19 029 SNPs: pruned SNPs excluding the outlier identified by both pcadapt and bayenv2) and outliers identified by both pcadapt and bayenv2 (119 SNPs).

Phenotypic trait analysis
We scored the phenotypic traits of three 1-year-old seedlings that were randomly selected from each population grown in the common garden. We took smaller numbers (three individuals per population) for phenotype analysis as the biomass measurement of the seedlings will completely destroy the plants. We only collected the leaves for genotyping in two populations (YNXD and YNJC), and in another population (GZHP) we did not obtain phenotype data, thus we undertook the phenotypic trait analysis based on the remaining 24 populations. The traits that we measured included total mass, leaf area, leaf mass fraction (LMF, g leaf mass g −1 total mass), stem mass fraction (SMF, g stem mass g −1 total mass), root mass fraction (RMF, g root mass g −1 total mass), and specific taproot length (STRL, mm root length g −1 root mass), which measures the root extension achieved per unit root mass. The LMF, SMF, and RMF represent biomass partitioning among the major organs, and STRL represents the efficiency of soil depth penetration to access deeper water (Tomlinson et al., 2012). We ran a PCA on these morphological traits for all populations to examine whether morphological and genetic variables showed the same structure. Then we tested whether any of the traits differed between the two groups using Welch's two-sample t-tests, allowing for unequal variances. Next, we ran multiple regression models to find the best environmental predictors for each trait. We rescaled all predictors to a mean of zero and a variance of one so that coefficients could be compared directly. Because we had few replication samples, we used forward selection to maximize the degrees of freedom of the error. Parameters were added into the model only if they decreased the Akaike information criterion. All statistical analyses were undertaken using R (R Development Core Team, 2016). In addition, we examined the association of morphological variation with environmental and geographical factors using RDA.

Results
3.1 Restriction site-associated DNA sequencing data, genetic diversity, and structure Sequencing of the RAD library generated 2 016 014 750 reads in total and with an average of 12 071 945 reads per individual. After trimming and quality filtering, we retained an average of 11 359 235 reads per individual (  Fig. S1). The GATK pipeline identified 856 474 RAD loci in total. We applied a stringent reference genome-guided SNP calling strategy, and obtained a total of 55 361 SNPs from 167 samples, of which 23 190 (42.7%) were located in genic regions. The distributions of the missing data over all individuals were between 0.03 and 0.29 ( Fig. S1; Table S2).
All methods used to infer the genetic structures (i.e., admixture, conStruct, and PCA) have concordance results that revealed two genetic clusters in our samples of Q. acutissima which split the sampled populations into the East (E) and West (W) groups (Fig. 2). Group E included 11 populations (AHCZ1, AHCZ2, AHTH, AHQS, AHHS, ZJLQ, ZJJD, ZJFY, ZJKH, JSXS, and GDLC) ranging from Anhui to Zhejiang provinces, and group W included 16 populations (GXRS, SCLZ, SCWY, HBYA, HNYY, SDYS, HNCD, HNXN, YNXD, YNJC, HNCS, HNNZ, SXHZ, GZRJ, GZHP, and GZSS) ranging from Hubei to Yunnan in SW China (Figs. 2A, 2C). For the conStruct analysis, the cross-validation plot showed that the predictive accuracy improved very little after K = 2, although the highest predictive accuracy continued to K = 5 in the nonspatial model, which is indicative of overestimation of the potential cluster number (Fig. S2). For the spatial model, the highest predictive accuracy was K = 3; however, the first two layers contributed the most to the total covariance (Fig. S3), indicating that K = 2 sufficiently described the population structure. The spatial model displayed a better fit than the nonspatial model at K = 2 (Fig. S2), and the decay of genetic relatedness against geographic distance (α D = 0 for the nonspatial model, α D(layer 1) = 0.005 and α D(layer 2) = 0.001 for the spatial model ( Fig. S3; Table S3) suggested weak IBD within each ancestral layer. Together, the conStruct results support the East and West groups and moderately indicate the presence of IBD. One population (AHTH) from the East group differed from other populations in the structure analysis when K = 3 and 4, which might be due to the genetic introgression from other sympatricrelated species (Q. variabilis Blume and Q. chenii Nakai). Thus, this population was excluded in the following population history analysis. Partitioning the genetic variance from the PCA analysis also clearly separated the populations into these two groups (Figs. 2B, S4). Measurements of genetic diversity and the average observed heterozygosity in the sampled populations are summarized in Table S1. The estimates of gene diversity (π) and observed heterozygosity (H o ) were slightly higher in group E (0.2319 and 0.2772, respectively) than in group W (0.2222 and 0.2593).

Outlier detection and genetic-environmental association analysis
Analyses using pcadapt detected 2628 SNPs (4.75% of 55 361 SNPs) as significant outliers, while Bayenv2 identified 431 outlier SNPs (0.78% of 55 361 SNPs). The number of SNPs associated with environmental variables is shown in Table S6. To diminish the frequency of false positives, we selected 119 SNPs identified by both methods as robust outliers. The potential functions of the genes carrying these 119 SNPs were deduced from annotations of their homologs in the Q. lobata genome, which showed that 55 were functional genes. Thirty-seven genes are associated with temperature and 12 genes are likely related to adaptation to precipitation (Table S7). There are also seven genes that correlate to soil pH and four genes related to global UV-B radiation.
To understand the impacts of environment and geography on genetic and morphological differentiation, we undertook a series of RDAs on both neutral (pruned SNPs excluding the outliers from both the pcadapt and bayenv2 analysis) and putatively adaptive SNPs (outliers detected by both the pcadapt and bayenv2 analysis) as well as phenotypic traits (Table 1). We found that, when environment was controlled for, geography alone explained a significant proportion (10.11%) of the variation on neutral loci, but when geography was controlled for, the effect of environment was not significant. Regarding outlier loci, both geography (22.36%, P = 0.001, constrained by environment) and environment (5.16%, P = 0.029, constrained by geography) showed significant contributions. The combined effect of geography and environment was 44.06% on outlier loci and 20.80% on neutral loci (Table 1).
The two population groups appear to inhabit different precipitation zones, with Group E being found in areas with moister conditions than group W. Group E areas were found to be associated with greater mean annual precipitation (t-value = 2.92, P < 0.01), greater precipitation in the driest month (t-value = 3.08, P < 0.01), and greater precipitation of coldest quarter (t-value = 1.73, P < 0.01) ( Table 2).
Mean annual rainfall affected biomass traits, with a positive effect on leaf and stem mass fractions and a negative effect on root mass fraction ( Fig. 5; Table 3). Soil organic carbon also affected several traits, notably total mass (negative), leaf area (positive), and specific taproot length (positive; see Table 3). The diurnal temperature range had a positive effect on leaf area, whereas soil pH and Redundancy analysis test is of the form: F~independent matrices|covariate matrices. Significance of confounded fractions between environment and geography was not tested. *P < 0.1; **P < 0.01; ***P < 0.001. ns, not significant; Total explained, total adjusted R 2 of individual fractions.
Uvb both had negative effects on leaf area (Table 3). Two components, which together accounted for 72.4% of the variation, were identified by the morphological PCA test and were consistent with the pattern of genetic structure. The root, stem, and leaf mass fractions were all correlated with the first PCA axis, whereas specific taproot length, total mass, and leaf area were correlated with the second axis (Fig. S4). With respect to morphological variation, only the environment was a significant predictor (28.95%, P = 0.035), once geography was controlled for. The combined effect of geography and environment explained 34.41% of the morphological variation (Table 1).

Spatial genetic structure and divergence history of Quercus acutissima
This study surveyed approximately 8% of the oak genome and recovered 55 361 SNPs from sampled populations of Q. acutissima, and these sites included a balanced proportion of both genic and non-coding regions. Genetic structure and PCA analyses of these populations indicated an east-west split. Concerning the two geographic discrete groups separated by nearly 300 km, other published reports suggest there is an unsampled natural population that exists inbetween the two regions we covered (Zhang et al., , 2018. The Admixture method might overestimate potential clustering by partitioning a continuous variation into false separated clusters. To further assess the spatial pattern of diversity, we used the conStruct analysis to infer the genetic structure, which accounts for the continuous pattern of differentiation in geographically separated populations (Bradburd et al., 2018). The conStruct spatial model supports a weak IBD and revealed that two ancestry layers contributed the most to the total covariance between populations, indicating that two potential clusters sufficiently described the population structure. A typical IBD refers to a pattern of geographically restricted dispersal, allowing genetic drift to establish differentiation between distant locations (Wright, 1943). However, the weak IBD observed in the conStruct analysis of the Q. actuissima populations suggests that other historical processes, like geographic/environmental isolation or past refugia, resulted in the isolation of the eastern and western groups. A previous study of this species using three chloroplast DNA fragments did not detect any obvious geographical structure , likely due to incomplete lineage sorting and shared chlorotypes among populations (Simeone et al., 2016;Vitelli et al., 2017). However, a study using 10 nuclear microsatellites (Simple sequence repeats) (Zhang et al., 2018) of this species uncovered a clear east-west divergence, supporting the pattern derived from the whole genome analysis in this study. A similar phylogeographic difference between plant populations found in subtropical west and temperate east China has been found in other Fagaceae species, including Castanopsis fargesii Franch. (Li et al., 2014;Sun et al., 2014), C. eyrei (Champ. ex Benth.) Tutch. (Shi et al., 2014), and C. carlesii (Sun et al., 2016), as well as many other plant species (reviewed by Qiu et al., 2011; see also Liu et al., 2012). Taken together, these results suggest that an east-west split could be common for tree species in this area. The estimated divergence time between the two groups of Q. acutissima was approximately 1.31 Mya (95% CI, 1.27-1.34 Mya). When assuming an average generation time of 50 years, the divergence time estimated by simple sequence repeat markers for Q. acutissima (Zhang et al., 2018) was 1.775-1.185 Mya; this value also supports our results, and this estimate is similar to that of another sympatric oak species, Q. variabilis (1.45 Mya, Chen et al., 2012). The margins of the east and west populations occur along the eastern margin of the Yunnan-Guizhou Plateau and is the boundary between the second and third Chinese steppe regions (Committee of Chinese Academy of Sciences for Physical Geography of China, 1985). Therefore, the genetic divergence between the east and west populations might be the result of regional responses to the uplift of the Yunnan-Guizhou Plateau and associated mountain ranges in adjacent areas. The uplift of the Yunnan-Guizhou Plateau initiated approximately 2.8 Mya ago (Tang et al., 1994) and several mountain ranges (i.e., the Wuyi and Nanling ranges) gradually uplifted in the late  Model predictor variables were selected using forward selection using the criterion that variables were included if the model Akaike information criterion decreased. bio 2, mean diurnal temperature range; bio 12, annual precipitation; LMF, leaf mass fraction; RMF, root mass fraction; SC, soil organic carbon; SMF, stem mass fraction; SpH, soil pH; STRL, specific taproot length; Uvb2, global UV-B radiation seasonality.
Pleistocene (Chen & Zhou, 1993;Luo et al., 2010). These physical features could have restricted inter-regional gene flow. Furthermore, the intensified East Asia monsoon during the Pleistocene (Clift et al., 2008;An et al., 2015) increased the climatic differences between the second and third steppe regions, that is, a moist maritime climate in the east and seasonal inland climate in west China. An obvious climatic gradient can be observed from west to east, especially with respect to precipitation in the driest month, the driest quarter, and the coldest quarter; each of these variables is generally low in the west and higher in the east (Li et al., 2014). Taken together, ancient regional geological events along with changes in climate likely drove west-east lineage isolation and increased the habitat heterogeneity of plant populations distributed in East Asia (Yan et al., 2012(Yan et al., , 2013Meng et al., 2017;Jiang et al., 2018), including the habitat of Q. acutissima.
The estimated gene diversity in Q. acutissima is similar to that of C. carlesii, another sympatric Fagaceae species (the mean gene diversity for the western and eastern group was 0.250 and 0.32, respectively, in Sun et al., 2016) with a higher diversity in the eastern group than in the western group; however, these differences are not significant. The two groups each occupied a large area with constant population size over a long period of time following divergence, therefore indicating a relatively consistent genetic diversity. However, the slightly higher diversity in the eastern group might be because genetic diversity was increased by introgression between genetically differentiated populations, as hybridization and introgression are common in sympatric and closely related oak species (Cannon et al., 2018), as documented in C. carlesii (Ortego et al., 2015;Sun et al., 2016) as well as in Q. acutissima. We found that the rate of gene flow from the east group to the west group was much higher than that from the west group to the east group. Asymmetric gene flow between populations or species is usually caused by a combination of several factors. First, asymmetric gene flow can result from intrinsic factors, such as hybrid incompatibility , and extrinsic factors, such as wind direction (Friedman & Barrett, 2009). Quercus actuissima is located in the East Asian monsoon region; it is wind pollinated and flowers in early spring (March-April). The subtropical regions of China are mainly affected by western and southern winds from March to May, which facilitate gene flow from the eastern to the southwestern group through wind-borne pollen (Xu & Gao, 1962;Bai et al., 2010). Moreover, a high level of environmental heterogeneity often leads to directional gene flow (Paz-Vinas et al., 2013). Second, population abundance also affects gene flow. Although the estimated effective population size of the east group was slightly lower than that of the west group, potentially leading to a greater gene flow rate from the west group to the east group, this effect might be counteracted by other factors. Third, it is possible that gene flow estimated with the fastsimcoal2 model is biased by genetic introgression from a third, unsampled species (Strasburg & Rieseberg, 2010;Gao et al., 2012). Introgression from the sympatric closely related oak species like Q. variabilis (Chen et al., 2012) and Q. chenii  in the east area might result in an overestimation of the gene flow from the east group to the west group and an underestimation of that in the opposite direction. Further analysis should include sister species in the adjacent regions to better understand the patterns of introgression.

Signature of environmental adaptation
Partitioning the relative effects of geographic distance and environmental differences in Q. acutissima showed that geography and environment combined could explain 17.40% of the genetic differentiation of neutral loci and 44.06% of the variation of outlier SNPs. Geography alone explained 10.11% and 22.36% of the variation at neutral and outlier SNPs, respectively, indicating a strong effect of geographic isolation. This suggests that gene flow in this species is geographically limited, which is consistent with the demographic model. Environment had a significant and stronger impact on outlier loci and morphology. Environment alone explained 5.16% of the variation of outlier loci, and 28.95% of phenotype differentiation, but geography had no significant effect on phenotype. Taken together, the genetic and phenotypic analyses revealed a joint effect of geography and environment on Q. acutissima. The proportion of the variation of outlier SNPs explained by environment was much smaller in Q. acutissima than that reported in two previous studies in Q. lobata Sork et al., 2016b). This could be due to the differences in genetic data; the two Q. lobata studies used genetic markers either putatively involved in local adaptation or obtained from the transcriptome data. These markers are more likely linked to genes that underwent adaptive selection. However, the proportion of variation explained by environment in our study was larger than those estimated in three California oaks (Q. engelmannii Greene, Q. berberidifolia Liebm., and Q. cornelis-mulleri Nixon & K.P. Steele) using microsatellite markers (Riordan et al., 2016), suggesting the effectiveness of our genome-wide scan method.
Distinguishing isolation by colonization (IBC) from IBD and IBE is difficult as their contributions are often confounded. The serial colonization events could lead to a similar pattern as IBD/IBE (Ramachandran et al., 2005;DeGiorgio et al., 2009;Henn et al., 2012) when colonization routes correlate with the environmental/geographic gradients (Orsini et al., 2013;Dewoody et al., 2015). However, the genetic signatures of IBC could be eliminated if the contemporary level of gene flow and the time since colonization is sufficient (Xia et al., 2018). Tree species in subtropical China were less affected by Pleistocene glaciations, and they have been preserved in multiple localized glacial refugia in central and southwestern China during the glaciations (Hewitt, 2000;Shi, 2002). The previous study of the phylogeographic pattern revealed by chloroplast DNA variation in Q. acutissima shows no signs of long-distance colonization . Our historical demography inferred a much earlier divergence time (1.31 Mya) of the two groups of Q. acutissima, as well as the pre-last glacial maximum expansion (0.81 Mya) of the two groups, support the previous results that it did not experience the postglaciations recolonizaiton. Thus, a long time since the population divergence enabled this species to produce a higher number of generations with contemporary levels of gene flow, which could erode the influence of historical factors. Moreover, morphological differentiation, in contrast to genetic variation, is widely considered adaptive and is rarely considered as historic phenotypes of the ancestral gene pools from colonization (Kremer et al., 2002;Gailing et al., 2012). The environment, rather than geography, was significantly associated with phenotypic differentiation in our study supported the effect of environmental isolation (Orsini et al., 2013).
Differences in environment between the two spatially and genetically distinct groups reflect differences in precipitation. Relative to the western group (Group W), the eastern group (Group E) is found in areas with greater mean annual precipitation, greater precipitation in the driest month, and lower precipitation seasonality. Consistent with this environmental differentiation, these two populations showed signs of phenotypical divergence that reflect adaptation to the major climate variable (i.e., precipitation). Seedlings from Group W had more below-ground biomass, a characteristic of plants adapted to drier areas, whereas seedlings from Group E had greater allocation to above-ground biomass, likely reflecting competition for growth. Biomass allocation traits were found to be significantly associated with precipitation, thus further corroborating the role of environment in driving the differentiation of these populations. In general, precipitation critically shapes many plant adaptive traits, including leaf size, branching architecture, and growth rate (Nicotra & Davidson, 2010;Moles et al., 2014). Several studies have found that species adapted to low or seasonal precipitation tend to show increased allocation to root mass (Mokany et al., 2006;Poorter et al., 2012). Furthermore, correlations between biomass and/or ecophysiological traits and climatic variables related to precipitation or temperature have also been reported (De Kort et al., 2014;Lind et al., 2017). For example, Solidago virgaurea L., a species distributed in dry serpentine habitats, shows higher resource allocation to below-ground biomass, a trait involved in local adaptation to dry sites (Sakaguchi et al., 2017). We detected 236 outliers that correlated with precipitation-related variables using bayenv2. Fifty-five annotated genes were identified based on the 119 most robust outlier SNPs, of which 12 genes were related to precipitation response. Therefore, precipitation was associated with both the morphological traits and the outlier SNPs, further supporting the notion that precipitation had driven the local adaptation of this species. The role of precipitation variables driving genetic adaptation in subtropical China has also been confirmed in the sympatric Fagaceae species pair C. fargesii (Li et al., 2014) and Q. championii (Jiang et al., 2019). In another study, the precipitation of the warmest quarter of the growing season explained a larger proportion of the variation of the geographic distribution of the valley oak Q. lobata in North America and was also found to lead to genetic differentiation (Gugger et al., 2013). Individual climatic variables varied with genetic and morphological variations in the three sympatric Californian oaks, suggesting a species-specific response. However, precipitation seasonality was an exception, which is only associated with genetic variations in all the three species (Riordan et al., 2016). Together, these studies indicate that precipitation is a potential common selective pressure for oak species. However, it is worth noting that many of the SNPs we identified were associated with temperature-related variables and several SNPs were related to soil organic carbon, soil pH, and global UV-B radiation. Manel et al. (2012) investigated 13 European alpine species and found that temperature and precipitation both strongly influenced the adaptive genetic variation in those species. Thus, temperature might also have a minor impact on the adaptive genetic variation in Q. acutissima.
In this study, many of the genes identified by our functional annotations of the outliers play roles in growth, development, and disease resistance. One of the detected outlier SNPs (SNP_ID: scaffold2960: 20721) was functionally assigned to a sugar transport protein (i.e., a sucrose transporter). This locus might therefore be involved in the biomass allocation patterns in plants. In plants, sucrose is important as an energy vector, a signaling molecule, and as a source material for the building of carbon skeletons (Lemoine, 2000). Sugar transport proteins mediate cell-tocell and long-distance transport of sugars throughout the plant (Williams et al., 2000). Some functional genes have been identified in other plant species confirming the robustness of our outlier identification; for instance, the loci coded for glutathione S-transferase (GST) was also indicated to play a role in environmental adaptation and is significantly associated with bud phenology traits in oaks (Alberto et al., 2013;Sork et al., 2016b); the SNPs annotated as marking a GTP-binding protein, a NAC domain-containing protein, and UDP-glucosyltransferase in Q. acutissima have homologous genes in Populus L. (Geraldes et al., 2014). NAC genes are involved in drought and salt resistance as well as the regulation of drought-response genes (Hu et al., 2006;Lu et al., 2007), and have also been identified as linked to precipitation in genetic association studies of black spruce (Prunier et al., 2011).

Conclusions and Implications
Adaptive phenotypic traits are usually controlled by multiple genes, each with small effects spread across the genome (Le Corre & Kremer, 2012;Hendry, 2013). Detecting the actual loci linked to phenotypic variability from a genomewide association study strongly depends on the extent of linkage disequilibrium and the density of markers along the genome. The RAD-seq technique is capable of uncovering a large number of markers throughout the genome. In this study, we sampled approximately 8% of the oak genome, a higher degree of coverage than other genetic studies on oak species undertaken to date (e.g., Eation et al., 2015;Deng et al., 2018;Ortego et al., 2018;Pina-Martins et al., 2019). Most importantly, 42.7% of the 55 361 SNPs identified in this study were located in genic regions, a finding rarely reported in other studies. This high-density dataset should provide a sufficient amount of information to assess the population structure and demographic history of populations of Quercus acutissima, and might also be sufficient to detect outlier loci in genic regions, thereby increasing the probability of inferring potentially adaptive variation. Another important advantage of this study is that we used a common garden experiment to detect adaptive phenotypic divergence, enabling us to distinguish such divergence from phenotypic plasticity. Moreover, thus far, few studies (De Kort et al., 2014;Sork et al., 2016b;Vangestel et al., 2018) have combined genome-wide scans with common garden experiments, especially in forest trees with large genomes and longer generation times. To our knowledge, this study is the first to attempt to combine a genome scan and phenotype evaluation in a common garden in an Asian subtropical oak species; it will provide a valuable study system to compare with temperate oaks in North America and Europe. We also note the existence of a few caveats in the interpretation of our study, including the small sample size used to estimate phenotypic variation in each population, and the fact that measurements were taken for only 1 year. Although young seedling traits cannot fully reflect adult phenotypes, they still provide useful information on adaptation, and were used in the neotropical oak Q. oleoides to infer the evolutionary potential to climate change (Ramírez-Valiente et al., 2018) as well as in other studies (Horacio & Miguel, 2003;Savolainen et al., 2004). In the future, long-term observations of a large number of individuals will be carried out in the common garden experiment, which should provide us with additional information regarding adaptive variation in Q. acutissima. Thus far, the adaptive morphological traits and candidate genes discovered in Q. acutissima could help in effective artificial selection and screening of stock material for transplanting, thereby assisting both breeding programs and management decisions for predicting tree performance in response to climate change.
The following supplementary material is available online for this article at http://onlinelibrary.wiley.com/doi/10.1111/jse. 12568/suppinfo: Fig. S1. Histogram of restriction site-associated DNA sequencing (RAD-seq) data quality. A, the percentage of the loci achieved the Q30 (%); B, the distribution of reserved reads ratio (%); C, the genome coverage (Mbp); D, E, the distribution of mean sequencing depth and missing rate (%). Red and blue vertical lines indicate the mean and median value of the distribution, respectively. Fig. S2. Cross-validation results from K = 2 to K = 10 of the conStruct analysis. Green dots represent the non-spatial model; blue dots represent the spatial model. Fig. S3. Genetic structure of the restriction site-associated DNA sequencing (RAD-seq) variation in 27 populations from conStruct. A-C, Results of the spatial model. D-F, Results of the non-spatial model. A, D, Maps of admixture proportions for K = 2. B, E, Corresponding layer-specific covariance curves for K = 2. C, F, Each cluster's relative contribution to the total covariance to each K value. Fig. S4. Distribution of genetic variance by principal component analysis (PCA) for PC1 and PC3. Fig. S5. Principal component analysis (PCA) of the morphological traits from the common garden. LMF, leaf mass fraction; RMF, root mass fraction; SMF, stem mass fraction; STRL, specific taproot length. Table S1. Geographic locations, sample size (N), gene diversity (π), average heterozygosity per locus (Het), and environmental variables of the 27 Quercus acutissima populations. Table S2. Sequencing quality information of each individual. Table S3. Alpha D values based on the spatial model and the non-spatial model. Table S4. Model composite likelihoods and Akaike information criterion model selection for eight alternative demographic models of Quercus acutissima from fast-simcoal2. Table S5. Demography parameters inferred from the best model. Migration rates scaled according to population effective sizes (2Nm) are given forward in time. The 95% confidence intervals were generated from 100 bootstrapped datasets. Estimates of divergence and expansion times are given in years. Estimates of sizes (Ne) are given in the number of haploids. Table S6. Number of putatively adapted single nucleotide polymorphisms detected using Bayenv2. Table S7. Gene annotation identified using the outlier approach and by testing for association with environmental variables in pcadapt and Bayenv2. Doc. S1. The protocol of the restriction site-associated DNA sequencing (RAD-seq) library preparation. Data S1. Raw sequence reads and phenotype data, available from the Dryad Digital Repository: https://doi.org/10.5061/ dryad.q2bvq83fv