Introduction

Natural selection is an important driver of evolution, as advantageous alleles can be inherited across generations to maximize fitness in the local environment (Kawecki and Ebert, 2004; Aitken et al., 2008). Although neutral processes such as drift and migration exert a genome-wide effect, divergent selection may result in clear allele frequency shifts at loci under selection and/or subtle changes in adaptive allelic combinations (Latta, 1998; Le Corre and Kremer, 2003; Kremer and Le Corre, 2012; Eckert et al., 2013). These genomic signatures of selection are expected to be more pronounced with increasing geographical scale because of larger environmental differences and decreased gene flow across longer distances (Volis et al., 2005; Fraser et al., 2011). Unraveling the molecular signature of selection at different geographic scales and along different environmental gradients may reveal fundamental insights into the evolutionary processes underlying species’ abilities to respond to environmental cues, and it may assist management decisions with regard to the preservation of adaptive differentiation within species (Ekblom and Galindo, 2011; Breed et al., 2012; Funk et al., 2012; Narum et al., 2013).

Although the identification of loci that are under selection remains challenging in non-model species without reference genome sequences, several methods exist that aim to detect genetic variation putatively driven by selective forces (Seeb et al., 2011; Schoville et al., 2012; Jones et al., 2013; Joost et al., 2013; Weinig et al., 2014). First, the genome can be screened for outlier loci, by identifying genetic marker variation deviating from overall expected neutral distributions (Luikart et al., 2003; Excoffier et al., 2009). This may, however, render false negative results in the case of weak divergent selection or when small-effect loci are involved, resulting in limited differences in allele frequencies among populations, and therefore low power to detect weak or polygenic selection (Narum and Hess, 2011; De Villemereuil et al., 2014). Moreover, complex demographic history and population structure tend to increase the number of false positives generated by outlier detection methods (Foll and Gaggiotti, 2008; Jones et al., 2013). Landscape genomic approaches do not rely on FST-based neutrality assumptions but attempt to find associations between allele distributions and environmental variables presumed to be the drivers of selection (Manel et al., 2003; Eckert et al., 2010; Jones et al., 2013; Manel and Holderegger, 2013; Sork et al., 2013). Although landscape genomics approaches considerably increase the likelihood of finding weak and polygenic selection (De Mita et al., 2013), correlations among environmental variables (that is, spatial autocorrelation), and lack of knowledge regarding the true drivers of selection, may increase the amount of false positive results (De Villemereuil et al., 2014). In addition, it is assumed that there was sufficient time for selection to drive patterns of allele frequency distributions caused by the environmental agents of selection (Joost et al., 2013). Yet when used together, population genomic and landscape genomic approaches may complement each other in terms of power and accuracy (Joost et al., 2013; De Villemereuil et al., 2014; Weinig et al., 2014).

Despite the importance of tree species with regard to ecosystem service provisioning (for example, carbon fixation timber production, water regulation and erosion control), ecogenomic studies on tree species that lack closely related reference genomes are scarce. Although two studies have investigated the genetic signature of selection using amplified fragment length polymorphism markers in tree species hitherto lacking information on gene functions (Alnus glutinosa, Cox et al., 2011, and Fagus sylvatica, Jump et al., 2006), to our knowledge, only one study so far has screened the uncharacterized genome of a tree species for genome-wide patterns of natural selection using completely anonymous single-nucleotide polymorphisms (SNPs) (A. glutinosa, De Kort et al. (2014)). In contrast, tree genera for which genomic resources are available (for example, Pinus sp., Picea sp., Eucalyptus sp., and Populus sp.) have been frequently examined for patterns of environmental selection (for example, Grivet et al., 2011; Prunier et al., 2011; Keller et al., 2011, Bradbury et al., 2013; Steane et al., 2014). Hence, although genomic markers provide valuable information on the capacity of tree species to respond to climatic cues, little is known about the relative effects of climate and geographical scale on the genomic differentiation in non-model tree species. A most effective strategy for gathering genomic resources in non-model tree species involves pooled restriction-sitesite-associated DNA (RAD) sequencing, which relies on the sequencing of a few hundreds of base pairs adjacent to restriction enzyme recognition sites (Baird et al., 2008). Pooling a limited number of individuals sampled across the species’ distribution range greatly reduces the costs of discovering thousands of SNPs (Vandepitte et al., 2013). Moreover, pooled RADseq opens the possibility to genotype a selected set of candidate SNPs in a large population samples at relatively low costs using allele specific primers (Cuppen, 2007; Liu et al., 2012; Vandepite et al., 2013), as the discovered SNPs can be functionality annotated by using the genome of a model species as a reference (Paterson et al., 2010). SNPs with putatively adaptive functions constitute excellent candidates for population and lanscape genomic research, whereas a set of random SNPs lacking functional annotations can be used to simulate neutral expectations.

Here, we performed a population and landscape genomic investigation of Frangula alnus subsp. alnus Mill. (Rhamnaceae), an insect-pollinated, deciduous small tree with a high seed dispersal capacity and a wide geographical range (Medan, 1994; Hampe, 2004; Hampe, 2005). Our major aims were (i) to evaluate the relative strength of selection in shaping the distribution of genomic variation at different spatial scales and (ii) to identify putative environmental drivers. To achieve these goals, population and landscape genomics approaches were used to screen a previously developed panel of annotated SNPs (De Kort et al., 2013) for the molecular imprints of selection across a continental scale (Europe), a regional scale (the Flemish region, Belgium) and an intermediate scale (the regional scale together with a southern Belgian and a French region). More specifically, a standard Bayesian population genomic outlier screen (BAYESCAN, Foll & Gaggiotti, 2008) was complemented with individual-based latent factor mixed modeling (LFMM, Frichot et al., 2013), and multivariate redundancy analyses (RDA, Borcard et al., 2011), to identify (i) SNPs putatively under selection and (ii) the main environmental drivers of population differentiation while accounting for spatial autocorrelation in allele frequencies. Although the use of candidate SNPs has been the standard method to find loci putatively under selection (for example, Namroud et al., 2008; Alberto et al., 2013; Guichoux et al., 2013; Olson et al., 2013), they may compromise the relevance of classical outlier detection methods due to overestimation of neutral distributions. This stresses the importance of integrating landscape genomic methods to reliably reveal loci putatively under selection and the selective agents underlying the observed patterns (De Villemereuil et al., 2014; Weinig et al., 2014).

Material and methods

Study species

Frangula alnus (Rhamnaceae) is a small, self-incompatible, insect-pollinated tree that occurs throughout most of Europe and in northern Morocco (Medan, 1994; Hampe, 2005). The most prevalent subspecies (F. alnus alnus) occurs throughout most of the continental range, whereas the two other subspecies are limited to Central and East Anatolia (F. alnus pontica), and South Iberia and North Morocco (F. alnus baetica). The fruits of the species are dispersed by birds and water which may transfer alleles over large distances (Hampe, 2004; Kremer et al., 2012). F. alnus grows on soils with diverse humidity levels, varying from relatively dry sandy soils over ditches and wet forest edges to wetlands (Godwin, 1943; Hampe et al., 2003; Fiedler and Landis, 2012).

Samples and genotyping

We collected samples from 25 populations of F. alnus subsp. alnus across Europe, largely covering the extent of the subspecies’ distribution (Figure 1, Table 1). The 25 populations were sampled across eight regions, with three populations from each of four regions in Flanders (northern Belgium: Sandy region, Western Brabant, Eastern Brabant and Campine), three populations from the Ardennes region (southern Belgium), three populations from both the Picardy region (northern France) and the Medelpad region (Sweden) and four populations from the Tuscany region (Italy). The populations from Medelpad and Tuscany represent the northern and southern edge of the F. alnus subsp. alnus geographical distribution range, respectively. We explored patterns of genetic clustering and natural selection across three nested spatial scales: the regional scale (four regions in Flanders), the intermediate scale (the four Flemish regions + Picardy + Ardennes; six regions) and the continental scale, also including Tuscany and Medelpad (eight regions).

Figure 1
figure 1

Sampling locations of F. alnus subsp. alnus. Twelve populations in Flanders (northern Belgium), three populations in the Ardennes (southern Belgium), three populations in Picardy (France), three populations in Medelpad (Sweden) and four populations in Tuscany (Italy). Population codes are explained in Table 1.

Table 1 Population code and location, sample size, expected and observed heterozygosity of the sampled populations

A total of 619 F. alnus subsp. alnus trees were genotyped for 250 putative SNPs, which were selected from a large set of 7383 previously discovered SNPs (for the full discovery, genotyping and annotation procedure see De Kort et al. (2013)). To reduce SNP discovery induced ascertainment bias, which refers to the systematical underrepresentation of existing SNPs, and which mainly affects estimates of population size among other demographic parameters (Nielsen et al., 2004; Rosenblum & Novembre, 2007), we assembled our ascertainment panel from individuals sampled across the distribution range of F. alnus subsp. alnus (Morin et al., 2004; Nielsen et al., 2004). Furthermore, the high sequencing coverage (on average 39.7x), the exclusion of rare alleles, and the small genome of F. alnus (see De Kort et al., 2013) reduce the occurrence of sequencing and read assembly errors that may mimic SNPs (Da Silva et al., 2014).

Briefly, paired-end sequencing of restriction-site-associated DNA (RAD-PE; Etter et al., 2011; Vandepitte et al., 2013) was used on a pooled set of eight individuals distributed across the species’ natural range, to find polymorphisms randomly distributed across the F. alnus subsp. alnus genome. The contig sequences containing the 7383 detected biallelic putative SNPs were subsequently subjected to a blastx against all plant proteins available at the Swiss-Prot database using Blast2Go, and hits were mapped to their corresponding gene annotations using the Gene Ontology (GO) database and several additional database files for functional categorization. This rendered annotation results for 306 SNPs, of which the vast majority (n=250) were selected for genotyping using allele-specific primer extension genotyping (KASPar, LGC Genomics, Teddington, UK). First, we selected all SNPs assigned to one or more of the following, broad higher level GO slim (http://www.geneontology.org/) categories: responses to stimulus (137 GO terms), reproduction (84 GO terms) and developmental processes (225 GO terms) (Supplementary Information 2, De Kort et al., 2013). This yielded a set of 180 putative SNPs which was then complemented with 70 SNPs taken randomly from other categories. The selection of SNPs representing a wide array of gene functions should ensure that overall patterns approximate genome-wide genic variation (see Vandepitte et al., 2014 for a similar approach). Moreover, the use of candidate SNPs may increase the likelihood of finding loci putatively associated with selective agents considerably. Of the 250 assayed SNPs, 183 had fewer than 10% missing data within individuals, and could be genotyped successfully by using a allele-specific primer extension (De Kort et al., 2013). This set of 183 SNPs was used for population genomic data analysis (see below).

Population structure and genomics

For each population, estimates of expected (HE) and observed (Ho) heterozygosity were obtained by using GENALEX 6.4 (Peakall and Smouse, 2006). A mixed linear model was built to compare HE estimates between neutral and outlier SNPs (see below for the outlier detection method). To account for non-independency among data within populations and regions, we included ‘region’ and ‘population nested within region’ as random factors. For each pair of populations, we computed pairwise FST values with ARLEQUIN 3.5 (Excoffier and Lischer, 2010), using 10 000 permutations. A principal coordinates analysis on genetic distances (GENALEX 6.4) was performed to detect major genetic clusters (K) at the three nested spatial scales. Only the neutral SNPs were used to make sure that the revealed clusters represent neutral population structure.

To identify putative loci under selection at the three scales, we used BAYESCAN 2.1., which compares the allele frequencies of each population with a common migrant gene pool (Foll and Gaggiotti, 2008). Hence, effective size and migration rate are allowed to differ among populations. Log10 values of the posterior odds (PO) >0.5 and 1.5 were taken as ‘substantial’ and ‘very strong’ evidence for selection, respectively (Jeffreys, 1961; Foll and Gaggiotti, 2008). The false discovery rate was set at 0.05 and 0.01 to adjust the log10(PO) significance thresholds corresponding to the 0.5 and 1.5 values. Because we were mainly interested in patterns of adaptation driven by environmental gradients, we focused on outlier patterns indicating divergent selection (FST significantly higher than neutral expectations) rather than outliers for balancing selection (FST significantly lower than neutral expectations). BAYESCAN analyses were complemented with a hierarchical genome scan using ARLEQUIN 3.5 which, however, resulted in similar results and is not discussed further (see Table 2, Supplementary Information 1).

Table 2 Outlier SNPs and their significant (|z|-score >4) associations with environmental variables using LFMM

Selection of environmental variables

To test for associations between outlier loci and environment, we collected soil and climate data. For each population, a bulked soil sample was analyzed to quantify cation exchange capacity, moisture content and organic material content. To deal with the collinearity among the soil variables, a principal component analysis (SPSS v20, Armonk, NY, USA) was performed. The first component, which was positively correlated with all measured soil variables and explained 92% of the original variance (Supplementary Information 3), was used in all further analyses. Climate variables related to temperature and precipitation (maximum temperature of the warmest month, minimum temperature of the coldest month, mean temperature of the warmest quarter, mean temperature of the coldest quarter, precipitation of the wettest month, precipitation of the driest month) were downloaded from WorldClim (Hijmans et al., 2005) with a 2.5 arc-minutes resolution. Both the extreme and average climatic variables can be expected to drive natural selection. Values for each climate variable were extracted for each population using Quantum GIS (QGIS 1.9.0, http://www.qgis.org). To deal with collinearity among climate variables, a principal component analysis analysis on all climatic variables was performed. The climate variables were reduced to two principal components (explaining 83% of the variation), the first one positively related to temperature, and the second negatively related to precipitation (Supplementary Information 3). Taken together, one soil and two climate variables (soil, temperature and precipitation) were used for all subsequent analyses.

Landscape genomics: LFMM

We applied LFMM, to find significant associations with environmental factors while accounting for neutral population structure (Frichot et al., 2013). For the analysis at the continental scale, three major genetic clusters (K=3) were chosen for the demographic background model, as revealed by the clustering of populations into three main groups (Supplementary Information 4). At the intermediate and regional scales, K=1 was chosen due to limited genetic structure (Supplementary Information 4). Although other K values may be more appropriate in cases of hidden population structure, results with different K values rendered equivalent results on all scales (Supplementary information 2). To estimate the LFMM parameters, an MCMC algorithm was used for each of the five environmental variables (temperature, precipitation, soil, latitude, longitude), using 1000 sweeps for burn-in and 10 000 additional sweeps to compute |z|-scores for all loci (Frichot et al., 2013). Significance was determined using a standard Gaussian distribution and corrected for multiple testing using the R package q-value v1.43 (Dabney et al., 2011). We considered that a SNP effect was significant when its |z|-score was >4, corresponding to a q-value of q<10−5. For each SNP, we were particularly interested in the environmental variable with the highest |z|-scores, as additional environmental associations are likely the result of spatial autocorrelation resulting in spurious associations (Supplementary information 3). Simulations have shown that tests based on LFMM significantly reduce the rates of false positive associations in highly structured populations (Frichot et al., 2013; De Villemereuil et al., 2014).

Landscape genomics: RDA

RDA is similar to canonical correlation analysis and has the advantage of eliminating redundant information in allele frequencies and their associations with environmental variables, which allows extracting the amount of genetic variance explainable by significant environmental variables (Borcard et al., 2011). The RDA analyses were performed using the Vegan package in R, to reveal how much of the variation in allele frequencies can be explained by spatial vs abiotic variables (Oksanen et al., 2008). Temperature, precipitation and soil variables (abiotic variables), and longitude, latitude and their square-transformed values (spatial variables) were chosen as potential explanatory variables for the observed multivariate genetic structure (Legendre and Fortin, 2010). As raw polynomial terms are highly correlated, we computed the orthogonal polynomials of latitude and longitude using the function ‘poly’ of the R package stats. We performed the forward selection procedure as proposed by Borcard et al. (2011) to determine which geographical variables were relevant at each spatial scale, using the R package packfor. Allele frequencies were Hellinger transformed prior to RDA (Legendre and Gallagher, 2001). An adjusted R2 was calculated to correct for the number of explanatory variables and significance was determined by using 1000 permutations (Borcard et al., 2011).

Results

Population structure and genomics

Expected heterozygosity was on average 0.314 (±0.013), and ranged from 0.219 (Luriano, Tuscany) to 0.346 (De Pinte, Flanders), with highest average values for the populations in Belgium and France (0.307–0.345) and lowest average values for the populations in Sweden and Italy (0.219–0.311) (Table 1). Genetic diversity was higher for SNPs associated with temperature and precipitation (average HE 0.323±0.036) than for neutral SNPs (average HE 0.289±0.029) (F1, 3686.6=33.31, P<0.001) (Figure 2). Pairwise FST values ranged from 0.013 to 0.155 (average 0.064±0.032; P<0.01) at the regional scale, from 0.017 to 0.157 (average 0.081±0.029; P<0.01) at the intermediate scale and from 0.160 to 0.369 (average 0.239±0.047; P<0.01) at the continental scale (Supplementary Information 2). The principal coordinates analysis plot based on all 25 populations showed three main clusters, which correspond to (i) Flanders/Ardennes/Picardy, (ii) Medelpad and (iii) Tuscany (Figure 3; Supplementary Information 4).

Figure 2
figure 2

Expected heterozygosity in the different regions, on the set of neutral SNPs and on the set of putatively adaptive SNPs (SNPs associated to temperature and precipitation).

Figure 3
figure 3

Principal coordinates analysis (GENALEX 6.4) of genetic distances among all populations. Samples were assigned the names of their corresponding region. PCo1 and PCo2 explain, respectively, 34.92% and 22.05% of the variation. See Supplementary Information 4 for genetic structure at the other scale levels using populations as labels. At the intermediate scale, PCo1 and PCo2 explain, respectively, 31.13% and 20.55% of the genetic variation, and at the regional scale, PCo1 and PCo2 explain respectively 34.10% and 15.19% of the variation. See Supplementary Information 5 for the genetic structure for all combinations of the first five PCo axes using regions as labels.

At the continental scale, 12 loci (6.6%) were identified as loci putatively under positive selection (Figure 4), and 5 of these were assigned the gene ontology term ‘response to abiotic stimulus’ (Supplementary Information 2, De Kort et al., 2013). Hence, 41.67% of the 12 outliers versus 29.51% (51 SNPs, Supplementary information 2) of all 183 SNPs were putatively associated with the GO term ‘response to abiotic stimulus’. At the intermediate scale, eight putative outlier loci (4.4%) were observed. Finally, five loci (2.7%) putatively under positive selection were detected at the regional scale (Figure 4, Table 2). Four outlier loci putatively under positive selection (2.2%; a2, a34, a107, n55) were found at all three spatial scales (Table 2), of which a2, a34 and a107 were associated with ‘response to abiotic stimulus Table 1 (Supplementary information 2). SNP n55 was not associated with an ecologically relevant function. All of the regional outliers and six out of the eight of the intermediate outliers were also found at the continental scale (Table 2).

Figure 4
figure 4

Outlier detection at the three spatial scales showing empirical FST values as a function of the log10(PO), resulting from the Bayesian model implemented in BAYESCAN (Supplementary information 2). The dashed line represents the Log10(PO) of 0.5, whereas the solid line indicates the Log10(PO) of 1.5. Log10 values of the PO>0.5 and 1.5 were taken as ‘substantial’ and ‘very strong’ evidence for selection, respectively. Positive outliers (under positive selection) have FST values that were higher than expected under neutrality, whereas negative outliers (under balancing selection) have FST values lower than neutral expectations.

Landscape genomics: LFMM

At the continental scale, a total of 143 SNPs (78.14%) could be associated to one or more environmental variables with |z|>4 (Supplementary information 2). Temperature was the most significant variable in 17 SNP-environment associations (9.30%; Figure 5), with a mean |z|-score (that is, strength of SNP associations) of 7.844. Precipitation could be associated to 47 SNPs (25.68%; Figure 5), with a mean |z|-score of 6.645. Soil was the most significant variable in 13 SNP-environment associations (7.10%; Figure 5), with a mean |z|-score of 5.678. Finally, latitude and longitude could be associated to 23 (12.57%) and 43 SNPs (23.50%), respectively (Figure 5). All 12 loci that were suggested as outlier loci under positive selection at the continental scale were associated with one or more abiotic variables (Table 2). More specifically, precipitation (10 associations), temperature (6 associations) and soil (6 associations) were correlated with outlier allele frequencies. The strongest associations were between precipitation and two SNPs, with locus n19 showing a striking pattern when considering the BAYESCAN and LFMM results combined (Figure 6,Table 2). We previously performed a Blastx of the n19 contig sequence against the Arabidopsis thaliana genome, and annotation of the corresponding Arabidopsis homolog revealed a very confident association with cell wall modification and pectinesterase activity (1.0E-42; AGI code AT2G26440, De Kort et al., 2013).

Figure 5
figure 5

The percentage of SNPs that were associated with an environmental variable. To exclude spurious effects of correlated variables, only the most significant association for each SNP was counted. Numbers represent the average per-SNP |z|-score (i.e., strength of selection).

Figure 6
figure 6

Comparison between the population-specific values of precipitation and allele (G) frequency of SNP n19. Negative values reflect high precipitation and vice versa. Population codes are explained in Table 2.

At the intermediate scale, a total of 91 SNPs (49.73%) were associated with one or more environmental variables (Supplementary Information 2). Temperature was the most significant variable in 39 SNP-environment associations (6.01%; Figure 5), with a mean |z|-score (that is, strength of association) of 5.639. Precipitation was associated with 10 SNPs (5.46%; Figure 5), with a mean |z|-score of 5.520. Soil was the most significant variable in two SNP-environment associations (1.09%; Figure 5), with a mean |z|-score of 4.269. Finally, latitude and longitude were the most important variables in 21 (11.48%) and 47 SNP-environment associations (25.68%) respectively (Figure 5). A total of three out of eight outlier loci were associated with ‘temperature’ and ‘precipitation’ according to the LFMM at the intermediate scale (Table 2).

At the regional scale, a total of 73 SNPs (39.89%) could be associated to one or more environmental variables (Supplementary information 2). Temperature was the most significant variable in 16 SNP-environment associations (8.74%; Figure 5), with a mean |z|-score of 6.618. Precipitation was associated with 12 SNPs (6.56%; Figure 5), with a mean |z|-score of 5.977. Soil was the most significant variable in two associations (1.09%; Figure 5), with a mean |z|-score of 5.567. Finally, latitude and longitude could be associated to 13 (7.10%) and 38 (20.77%) SNPs, respectively (Figure 5). Temperature and/or precipitation were associated with 4 out of 5 loci that were suggested as outlier loci under positive selection (Table 2).

Landscape genomics: RDA

The redundancy analyses revealed strong non-linear isolation-by-distance, with latitude2 explaining most of the genetic variation at the continental scale (F=10.973, P<0.001), whereas longitude and longitude2 played an important role at the intermediate (F=3.940, P<0.001 and F=2.519, P<0.001 respectively) and regional scales (F=5.579, P<0.001 and F=1.804, P<0.01) (Figure 7, Supplementary information 2).

Figure 7
figure 7

The percentage of genetic variation (R2adj) that could be explained by environmental variables at the continental scale (CS), the intermediate scale (IS) and the regional scale (RS).

Together, the geographical variables explained 31.60, 29.73 and 46.53% of the variation in allele frequencies at the continental, intermediate and regional scale respectively (Figure 7, Supplementary Information 2). Temperature was the most important explaining abiotic variable at the continental (R2adj=13.53%, F=3.805, P<0.001) and the regional scale (R2adj=4.68%, F=1.695, P<0.05), whereas precipitation explained most of the genetic variation at the intermediate scale (R2adj=11.11%, F=3.397, P<0.001). Soil had no significant effect on the variation in allele frequencies (Figure 7).

Discussion

Both the population genomic and the landscape genomic approaches indicated a more distinct signature of selection with increasing geographical scale, where more outliers and associations with climate were observed. At the continental scale, large environmental differences and non-linear isolation-by-distance very likely decrease the effective gene flow among populations of F. alnus subsp. alnus, thereby increasing the frequency of beneficial alleles in local environments. At the regional scale, on the other hand, the limited abiotic differentiation may explain the relatively small number of outliers. Nevertheless, the occurrence of some outliers suggest that natural selection may effect genomic variation, even at these small scales where gene flow is expected to be extensive. In line with the outlier detection results, the number of SNP associations with environmental variables increased with increasing geographical scale, while the strength and type of selection pressure varied among scales. At the continental scale, precipitation (reflected by the number of SNP associations) and temperature (reflected by the strength of SNP associations) were identified as putatively important drivers of adaptive differentiation in F. alnus subsp. alnus. At the regional scale on the other hand, SNPs were mainly correlated with temperature (reflected by both the number and strength of SNP associations). Finally, high levels of expected heterozygosity, in particular for the putatively adaptive loci, suggest high potential to cope with climate change.

Evidence for selection at all three spatial scales

In line with previous studies on the tree species A. glutinosa (Cox et al., 2011) and the fish species Salmo trutta (Meier et al., 2011), the number of outliers increased with increasing geographic scale (Figure 4, Table 2), from 2.7% outlier SNPs (5 outliers) at the regional scale, to 4.4% (8 outliers) at the intermediate scale and 6.6% (12 outliers) at the continental scale, suggesting changes in the nature of the selective pressures depending on the distance among the populations. Although restricting the study to SNPs derived only from coding regions may have overestimated the proportion of outlier loci to some extent, outlier detection studies in other tree species revealed similar proportions of outliers at a large scale (>400 km), for example in Picea glauca (5.5% outliers, FST=0.006, genic SNPs) (Namroud et al., 2008), A. glutinosa (3.7% outliers, FST=0.057, amplified fragment length polymorphisms; and 2.86%, FST=0.01, SNPs) (Cox et al., 2011; De Kort et al., 2014), and Quercus robur and Q. petraea (4% and 5% outliers resp., FST=0.013 and 0.015 resp., genic SNPs) (Guichoux et al., 2013). Nevertheless, many of these findings should be interpreted with considerable caution due to the use of candidate loci (except for those of Cox et al., 2011, De Kort et al., 2014), which tend to overestimate neutral distributions and therefore underestimate the true percentage of loci under selection, the latter being expected to be high when using loci related to adaptive functions or traits. The percentage of outlier loci we detected at a similar geographical scale (>400 km, 6.6%) even slightly exceeds this range of outlier proportions, likely because of (i) stronger selection driven by environmental differences between populations, and (ii) limited gene flow between Italian, Swedish and Belgian/French populations as a consequence of geographical isolation by mountain and sea barriers combined with pollination by insects. Indeed, F. alnus subsp. alnus is insect pollinated, which may be featured by limited gene migration between populations, therefore potentially facilitating divergent selection, whereas the studies mentioned earlier all examined signatures of selection in wind pollinated species. These barriers are probably responsible for the unimodal genetic response to latitude revealed by the RDA analyses. Accordingly, pairwise FST values at the continental scale were rather high with an average of 0.239 (±0.047).

The detection of five outlier loci at the regional scale, where little population structure and high genetic diversity were observed (Figures 2 and 3, Supplementary information 5, Table 1), suggests that low levels of adaptive differentiation may have emerged in the presence of strong among-population gene flow, strengthening the notion that local selection pressures can override the homogenizing effects of gene exchange (Lenormand, 2002; Alleaume-Benharira et al., 2006; Neale and Ingvarsson, 2008; Yeaman and Otto, 2011; Richardson et al., 2014).

Environmental agents of selection at the continental scale

Both temperature and precipitation seemed important selective agents driving adaptive differentiation in F. alnus subsp. alnus (9 outliers, Table 2), with the number and strength of SNP-climate associations differing among geographical scales (Table 2). Drought stress induced by different temperature and precipitation regimes may indeed facilitate selection of specific genetic variants, as the species naturally occurs across a variety of soil moisture levels, ranging from ephemeral to wet ecosystems (personal observations; Godwin, 1943; Fiedler and Landis, 2012; see also Zang et al., 2014). Interestingly, whereas most SNP-climate associations at the continental scale involved precipitation (25.68% SNP-precipitation associations vs 9.30% SNP-temperature associations, Figure 5), the most significant associations involved temperature (mean |z|-score of 7.844 and R2adj of 12.24%, Figures 5 and 6). This result may suggest a highly polygenic nature of traits responding to precipitation, whereas temperature may affect traits with a less complex genetic architecture.

Although we found strong neutral genetic differences between the Italian, Swedish and French/Belgian populations (Figure 2), it seems unlikely that neutral genetic divergence in the absence of gene flow has mimicked the signature of selection. First, the RDA analyses took both linear and quadratic geographical structure into account, whereas the LFMM approach accounted for the different genetic clusters. Second, the association of several outlier loci with climate was remarkable, especially at the continental scale, and did not unequivocally correspond to geographic distances.

Furthermore, correlated co-ancestry and demographic processes may increase the number of false positive outliers considerably (Excoffier et al., 2009; Mosca et al., 2012; Bierne et al., 2013). However, the detection of the same outliers at different geographical scales, together with the association of most of our outliers with ecologically relevant functions and environmental variables, supports selective divergence of the corresponding loci (Limborg et al., 2012; Buehler et al., 2012; Alberto et al., 2013; Steane et al., 2014). Overall, these findings also render it unlikely that our outlier SNPs represent sequence assembling errors, which can be introduced during the de novo sequence assembly preceding SNP discovery in non-model organisms (Da Silva et al., 2014). A literature search on the gene homologs containing the outlier SNPs (Table 2) uncovered some patterns that corroborate the ecological relevance of their associations with specific environmental variables. First, locus n19 was annotated to an A. thaliana homolog that is known to affect pectinesterase activity in cell wall modifications, mediating plant responses to drought (Wang et al., 2011; Verelst et al., 2013). Correspondingly, this SNP correlated exceptionally well with patterns of precipitation, over a gradient of very little rainfall in Italy over moderate levels in Sweden and Flanders, to high levels of precipitation in the Ardennes (Figure 6). Second, SNP a107, which appeared to be an outlier at all geographical scales, was putatively associated with a laccase locus (lac17, Table 2). Laccases are mainly known for their role in the lignification process and therefore especially relevant for biotechnological purposes (Berthet et al., 2011; Guerra et al., 2013; Lu et al., 2013). Moreover, because lignin quantity and quality affects cell wall fragility, laccase activity can affect tree growth and tolerance to various biotic and abiotic stresses (Peter and Neale, 2004). Although the role of laccase in responses to drought stress is hitherto poorly understood (but likely related to increased lignin production, Yoshimura et al., 2008; Moura et al., 2010), we observed a clear association between the allele frequency of locus a107 and precipitation at the continental scale. In concert with the exceptional genetic differentiation among populations of F. alnus subsp. alnus (Table 2), this finding denotes locus a107 as a candidate target for natural selection induced by drought.

Although the occurrence of the same outliers at different spatial scales may reflect excellent candidates for further research, selective loci may be different at different spatial scales because of variable selection pressures across scales, or because of correlations among loci underlying the same phenotypic trait (Nosil et al., 2009; Alberto et al., 2013). We therefore cannot exclude the potential adaptive value of outliers that do not appear consistently across all scales.

Environmental agents of selection at the intermediate and regional scale

At the intermediate scale, it is unclear which climatic variable is the most important selective agent (similar number of SNP-climate associations, but precipitation explains significantly more of the genetic variance according to the RDA analyses, Figures 5 and 6). This may be due to the strong correlation among the two variables at this scale (Spearman’s ρ=0.955, Supplementary Information 3), resulting in spurious associations. At the regional scale, temperature imposes a stronger selective pressure than precipitation as putative selective agent, both through the number and the strength of the SNP-temperature associations (Figures 5 and 6). This finding may suggest that at a very small scale, only loci underlying adaptive traits with a less complex genetic architecture, putatively representing large-effect loci, may be able to override the homogenizing effects of gene flow. Nevertheless, a full genome scan is required to make final conclusions on the relation between environmental drivers of selection and the genetic architecture of the SNPs involved.

The dominating effect of temperature at the regional scale (Table 2) is in line with temperature differences following a longitudinal cline, with slightly higher minimum winter temperatures in the more Atlantic populations in the west (ca 0.5 °C) than the more continental populations in the east of Flanders (ca −0.5 °C). In parallel with its association with temperature at the regional scale, outlier SNP a2 was annotated as an auxin-binding protein (Table 2). Auxin is an important plant hormone, modulating many aspects of plant growth, development and responses to biotic and abiotic stress (Wolters and Jürgens, 2009; Peleg and Blumwald, 2011). Variation at loci underlying auxin expression may thus have profound fitness consequences, as has been demonstrated in the context of temperature stress (for example, Swindell et al., 2007; Lee et al., 2014), plant–pathogen interactions (Kazan and Manners, 2009) and symbiotic interactions (for example, Reddy et al., 2006). We correspondingly observed a significant association between SNP a2 and temperature at the regional scale (Table 2). Natural adaptive variation in auxin genes has also been demonstrated in several other tree species, including Picea glauca (Namroud et al., 2008) and Populus trichocarpa (McKown et al., 2014), stressing the role of micro-evolution in driving genetic differences in auxin levels among populations.

Genetic diversity and adaptive potential

The observed high heterozygosity in F. alnus subsp. alnus populations can be explained by the use of putatively adaptive candidate loci, which are expected to considerably increase the probability of finding signatures of selection. Correspondingly, we found significantly higher levels of expected heterozygosity at putatively adaptive SNPs than at the neutral SNPs. Allelic variation may be higher at adaptive loci rather than at neutral loci for several reasons, and we accordingly observed higher levels of genetic diversity at the loci that were associated with climate than the loci not associated with climate (Figure 2). First, climatic variation in time may occur within the life span of a tree, resulting in older trees bearing the alleles adapted to a former climate together with younger trees that were selected for alleles adapted to the new climate. Given the very strong selection during the juvenile stage of each tree generation (Petit and Hampe, 2006; Kremer et al., 2012), long life spans may thus allow accumulation of selective alleles within populations to a greater extent than neutral alleles. Second, the complex genetic architecture underlying the adaptive traits may support high allelic diversity within populations for the loci underlying these traits (that is, adaptive loci). When many small-effect loci affect an adaptive trait, interactions among loci may prevent genetic drift of alleles that are neutral or slightly deleterious in the local environment but beneficial in a different environment (conditional neutrality and antagonistic pleiotropy resp.) (Latta, 1998; Colautti et al., 2012; Alonso-Blanco and Méndez-Vigo, 2014). Hence, although divergent selection reduces population genetic variation at adaptive loci, at least over one generation, it may not negatively affect the overall population adaptive diversity, when considering the adaptive variation across generations. Our findings may therefore suggest substantial adaptive genetic variation that may provide resilience to climate change in F. alnus subsp. alnus (see also Jump et al., 2009). Nevertheless, the current generation of trees may harbor the genetic variation that is associated with former demographic and climatic conditions, but not with future climate, which is predicted to be substantially different from historical conditions (Moritz and Agudo, 2013). Moreover, ongoing habitat loss and fragmentation may reduce population genetic diversity and increase inbreeding in many tree species, which may threaten the ability to cope with the synergistic effects of habitat reduction and climate change (Brook et al., 2008; Hof et al., 2011; Meier et al., 2012; Vranckx et al., 2012).

Conclusion

Whereas the majority of population and landscape genomic studies involve wind-pollinated tree species, our study demonstrates the genetic effects of divergent selection across the genome of a non-model, insect-pollinated tree species. Based on the number of outliers, the number of associations between loci and climate, and the strength of environmental effects on allelic distributions, we were able to assess the molecular signature of selection at different spatial scales. Although patterns of divergent selection imposed by temperature and precipitation were strongest at the continental scale, their occurrence at the intermediate and regional scales, where high intra-population diversity and limited genetic clustering were observed, may suggest a major role of among-population gene exchange in safeguarding evolutionary potential by supporting the conservation of adaptive genetic variation. Taken together, we conclude that the integration of complementary population and landscape genomic approaches using annotated SNPs rendered evidence for considerable selection imposed by temperature and precipitation at various spatial scales in F. alnus.

Data Archiving

The SNPs used in this study are part of a larger set of previously discovered SNPs, which has been submitted to GenBank and can be accessed using the following accession number PRJNA203129. http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA203129. Sequences of the 250 SNPs as well as the GENALEX file containing the successfully genotyped SNPs (183) are accessible on Dryad using the following 10.5061/dryad.r12jk.