Dissecting the Polygenic Basis of Cold Adaptation Using Genome-Wide Association of Traits and Environmental Data in Douglas-fir

Understanding the genomic and environmental basis of cold adaptation is key to understand how plants survive and adapt to different environmental conditions across their natural range. Univariate and multivariate genome-wide association (GWAS) and genotype-environment association (GEA) analyses were used to test associations among genome-wide SNPs obtained from whole-genome resequencing, measures of growth, phenology, emergence, cold hardiness, and range-wide environmental variation in coastal Douglas-fir (Pseudotsuga menziesii). Results suggest a complex genomic architecture of cold adaptation, in which traits are either highly polygenic or controlled by both large and small effect genes. Newly discovered associations for cold adaptation in Douglas-fir included 130 genes involved in many important biological functions such as primary and secondary metabolism, growth and reproductive development, transcription regulation, stress and signaling, and DNA processes. These genes were related to growth, phenology and cold hardiness and strongly depend on variation in environmental variables such degree days below 0c, precipitation, elevation and distance from the coast. This study is a step forward in our understanding of the complex interconnection between environment and genomics and their role in cold-associated trait variation in boreal tree species, providing a baseline for the species’ predictions under climate change.


Introduction
Understanding the fitness consequences of naturally occurring genetic variation is of great biological interest. In natural populations of widely distributed species, the challenge is to identify genes underlying traits that increase the species' ability to survive, thrive and reproduce [1,2]. Species that encounter diverse ecological and environmental natural conditions may be subject to strong differential selection pressures across heterogeneous environments that counteract the homogenizing effects of gene flow and drift in the evolution of local adaptation [3][4][5]. In natural populations of forest tree species, the highly polygenic nature of climate and trait adaptation, the presence of phenotypic plasticity, and the complexity and scale of environmental variation across the species' ranges present significant challenges for the study of phenotypic variation and local adaptation [6][7][8]. Therefore, a comprehensive understanding of the species' potential for evolutionary change and adaptation to changing environments will require the study of genome-wide genetic variation across the species' natural range and its association with ecologically important traits.
In the Northern hemisphere, the interactions between growth, phenology and cold hardiness are main determinants of survival in tree populations. Cold adaptation in forest trees involves significant physiological, cellular, genetic and morphological changes, and is highly synchronized with environmental cues such as photoperiod, day length and temperature [9,10]. Temperate and boreal forest tree species alternate periods of active growth and dormancy. Annual growth and tree development occur through a tradeoff between maximizing growth under favorable environmental conditions to be able to compete for light, while avoiding cold injury from late spring frosts and early fall frosts [9,11]. Both bud burst (growth initiation) and bud set (growth cessation) are highly adaptive traits and parallel latitudinal or longitudinal environmental clines [11,12].
The identification of genes underlying variation in growth, phenology and cold hardiness has been particularly difficult due to the highly polygenic nature of the traits. Univariate genome-wide association studies (GWAS) successfully identify genes of major effect (in which a new advantageous mutation is rapidly driven to fixation) but have very low power to detect weakly selected loci, characteristic of polygenic adaptation [5,13,14]. Therefore, a multivariate GWAS will give a more accurate estimate of the number and effect sizes of genes contributing to phenotypic variation in polygenic traits. Adaptation to climatic or environmental variables also requires the study of many genes and interconnected environmental variables. In this case, multivariate genotype-environment association studies (GEA) are also more adequate than univariate methods. A comparison between univariate and multivariate methods will provide a good understanding of the different trait architectures in a species.
Douglas-fir (P. menziesii) is an economically and ecologically important species in western North America. Considered one of the most important sources of lumber worldwide, it is grown extensively in plantations throughout Europe [15] and is also a major source of timber in its natural range, particularly in the Pacific Northwest [16]. Douglas-fir's wide geographic range spans a large gradient of environmental conditions and exhibits strong clines in relation to drought and cold tolerance [17][18][19]. Two varieties, menziesii and glauca, have been identified in coastal (Northern California-British Columbia) and intermountain regions (central Mexico-British Columbia); although phylogeographic studies using mitochondrial and chloroplast DNA suggest the presence of three, instead of two, distinct varieties (Coastal, Intermountain and Mexican) [20]. Common garden experiments focusing on climate adaptations of coastal Douglas-fir revealed that cold hardiness is highly correlated with cold-season temperature of the seed source [17], adaptations for coping with cold and drought overlap significantly [18,19], and that the species' phenology likely lacks the ability to track climate change [21,22]. Cold adaptation in coastal Douglas-fir has also been studied through transcriptomic analysis [23], QTL mapping [24,25], and candidate gene based GWAS analysis [26]. Genome-wide studies of local adaptation were limited due to the absence, until recently, of a Douglas-fir reference genome [27]. Sequencing and annotating conifer genomes have been quite challenging due to their enormous genome sizes (>10 Gbp) and their high content of repetitive elements. Douglas-fir is a diploid species with 13 chromosomes and an estimated genome size of 16Gbp [27]. In this study, we combined genome-wide patterns of genetic variation obtained from wholegenome re-sequencing, phenotypic measurements of growth, phenology, emergence and cold hardiness, and environmental variables to understand the genomic basis of cold adaptation in coastal Douglas-fir. Our questions were: (1) What is the genomic architecture (number of genes, effect sizes, genomic and geographic location) of cold adaptation in the species? and (2) Which genes, pathways and biological processes are associated with variation in environments and traits related to cold adaptation in the species?

Whole-Genome Re-Sequencing
Seeds from ten individuals spanning most of the coastal Douglas-fir's natural distribution (latitude 42-44 N, longitude 120-124 W degrees) were collected for whole-genome re-sequencing analysis. After collection, seeds were soaked in water at room temperature for four days, then haploid megagametophytes were dissected from each seed. DNA was extracted with a Qiagen DNeasy mini-prep Plant kit, and DNA quality and concentration were evaluated using nanodrop and picogreen on a Qubit 2.0 Fluorometer, respectively. Sequencing libraries were constructed using Illumina's TruSeq Nano DNA Library Prep Kit [27]. Prior to amplification, DNA was fragmented (200 ng starting material and 550 bp target insert size), followed by end repair and size selection of fragments, adenylation of 3 ends, and adapter ligation. PCR enrichment was performed in eight cycles. Barcoded libraries were combined into normalized pools and sequenced to >10 X coverage on an Illumina HiSeq 3000 using 150 bp paired-end reads at the University of California-Davis Genome Center.

SNP Calling
Raw reads from whole-genome re-sequencing data of ten Douglas-fir individuals were aligned to the 16 Gbp Douglas-fir reference genome version Psme v1.0 [27,28]; using Bowtie2 v2.2.9 [29]. The alignments were processed using SAMtools v1.3.1 and BEDtools v2.25.0; and SNPs were called using BCFtools v1.3.1 [30,31] with default parameters. A total of 458M SNPs were called. Filtering criteria included the removal of SNPs with a mapping quality < 20, depth of coverage < 8, and indels. All SNPs were given a score based on the sum of 16mer frequency sums of the 30 bp forward or reverse adjacent to the SNP. SNPs were discarded when the score was higher than 300. Moreover, SNPs were only called if present in scaffolds of 1kb or larger.

Sample Collection Prior to Genotyping
Seeds were obtained from 288 open-pollinated parent trees from throughout the range of Douglas-fir in western Washington and Oregon. Progeny from the parent trees were measured in a common garden study for adaptive traits including growth, phenology, emergence, and cold hardiness [32]. The same set of parents were used in previous association mapping studies [17,26,33]. Population stratification was based on geographic areas showing similar adaptive characteristics (defined as "provinces" in [17]), which roughly correspond with Douglas-fir breeding zones in Oregon and Washington.

DNA Extraction and SNP Genotyping
Seeds were soaked in a solution of 70% water and 30% of 3% hydrogen peroxide for 12 h. After that, megagametophyte haploid tissues from ten half-sib individuals for each family were pooled together to infer the maternal genotype. Megagametophytes DNA was extracted using the Qiagen DNeasy mini-prep Plant kit and an Eppendorf automated pipetting workstation. The extraction protocol included one day of tissue lysis and incubation at 96 • C, followed by several steps of precipitation and filtering. DNA quality and concentration were assessed using nanopore and picogreen on a Qubit 2.0 Fluorometer, respectively. Samples were genotyped using a custom-based multi-species Illumina Infinium SNP array comprising 80 k SNP markers from which 20 k were designed for Douglas-fir and 60 k for sugar pine. Ilumina's Genome Studio Genotyping Module v2.0.4 was used to call genotypes, filter, and generate genotyping statistics for all samples and SNPs. Filtering was applied to Douglas-fir and sugar pine SNPs separately and the pass-filter SNP data were subsequently merged. All SNPs with a call frequency ≤ 0.65 and all individuals with a call rate ≤ 0.8 were not included. Further, markers were filtered out based on minor allele frequency (>0.01) to remove all monomorphic and low-quality SNPs. SNP functional annotations were obtained from aligning against the full NCBI nonredundant protein sequences database (nr) using BLASTP (e-value < 1 × 10 −10 ), and by using the Douglas-fir's genome annotations [34]. SNPs originally designed for sugar pine but that genotyped well in Douglas-fir were aligned to the Douglas-fir genome assembly with Minimap2 [35]. BEDtools v2.25.0 was used to assign SNPs to coding and non-coding regions of the Douglas-fir genome.

Population Structure
The number of genetic clusters was initially assessed with a principal component analysis (PCA) implemented in the "adegenet" R package [36,37]. Population clustering was further analyzed using the software fastSTRUCTURE [38]. Ten runs were completed for each value of K between one and ten. The chooseK script in fastSTRUCTURE was then used for each run to calculate which value of K was optimal. Because fastSTRUCTURE includes stochastic simulation, CLUMPP [39], implemented in the R package "POPhelper" [40] was used to combine the results of each run to optimize the number of clusters. Individuals were assigned to individual genetic clusters when q-value > 0.8; or to multiple clusters when q-value < 0.8. All R analyses were carried out on R version 3.5.3.

Isolation by Distance
To rule out the possibility that any patterns observed were due to isolation by distance (IBD), pairwise genetic and physical distances were calculated in the R package adegenet [36,37]. A mantel test was later performed to determine if there was significant IBD. Genetic distances were calculated as Nei's distances [41] and fixation index (Fst) with the StAMPP package in R [42]. Geographic distances were calculated in base R, with the median latitude and longitude was used for the geographic location of populations. Mantel tests were carried out in ade4 R package [43].

Multivariate and Univariate Genome-Wide Association (GWAS) of Cold-Related Traits
Phenotypic data were measured as part of a large genecology experiment containing 1338 families from 1048 locations planted in a randomized plot design, seedlings were grown for two years in a common garden in Corvallis, Oregon [32]. A subset of those individuals was included in the study represented here. The full description of the methods used in the common garden can be found in [32]. In summary, seeds were collected from open-pollinated individuals in natural stands throughout the range of Douglas-fir in Oregon and Washington. Offspring of the maternal trees were grown for two years in a common garden located in Corvallis, Oregon. Tests were established over three years using different sets of families, with a common set of all three years to adjust for year effects as in [44]. Breeding values for all 23 growth, phenology, emergence, and cold hardiness traits (Appendix A, Table A1) were estimated by measuring the traits of seedlings being grown in the common garden. This study analyzed a small subset of this phenotypic data from 271 families.
A multivariate GWAS was performed by fitting a Bayesian sparse linear mixed model (BSLMM) in GEMMA v0.98 [45]. BSLMM uses Markov Chain Monte Carlo (MCMC) to associate markers and phenotypes by modeling all markers simultaneously while controlling for population structure and relatedness using the following model: where 1n is an n-vector of 1s; µ is a scalar representing the phenotype's mean; X is a matrix of genotypes from 271 families and 20,397 SNPs; β is the corresponding p-vector of the genetic SNP marker effects; u is an n-vector of random effects; and e is an n-vector of errors. Models were fitted for each of the 23 cold-related traits. In contrast to other multivariate GWAS methods, BLSMM captures the effect sizes of both small effect SNPs (α) and large effect SNPs (β). Therefore, PVE, the proportion of phenotypic variance explained by the combination of all small and large effect SNPs, reflects how well one could predict the phenotype from the available SNPs if β and u were known; and PGE reflects how well the phenotype is predicted by only using β [45]. Analysis was done with default options, which includes a burn-in of 100,000 steps, and 1,000,000 sampling steps. A posterior inclusion probability (PIP) > 0.01 (selected SNPs that had an effect in at least 1% of the models) was used as threshold for selection of associated SNPs [46,47]. In addition, a univariate GWAS using mixed linear model (MLM) was performed in TASSEL v.5. The first principal components of a PCA were used as co-variates to control for population structure, and a kinship matrix to account for relatedness [48]. Breeding values of progeny's traits were determined as in [32]. The proportion of phenotypic variance explained by the SNP (R 2 ), and the dominance and additive effects were also calculated with TASSEL v.5.

Multivariate and Univariate Genotype-Environment Association Analyses (GEA)
Elevation data were obtained from GIS coverages using a digital elevation model (DEM). Climate data were obtained using ClimateNA [49]. ClimateNA downscales PRISM data [50] to scale-free point data, allowing to more accurately predict maternal tree climate variables. All climate data are based off the averages for the years 1962-1990. Climate variables include monthly, seasonal and annual averages for minimum and maximum temperature, precipitation, daily temperature fluctuation and aridity (a ratio of precipitation to temperature); dates of 50 % probability of last spring frost and first autumn frost; frost-free period; and seasonal ranges in temperature and precipitation. All variables are detailed in Appendix A, Table A1. Correlations between all environmental (climatic and geographic) and phenotypic variables were investigated in R (version 3.6.1). Correlation plots were produced.
A redundancy analysis (RDA) as implemented in the vegan R package [51] was performed to estimate how much of the genetic variation was explained by environmental variables. RDA is a type of multiple regression that determines how much of the variation in one set of variables is explained by the variation in another set of variables. Environmental variables were investigated for correlation prior to carrying out the analysis. Variables were removed if their correlation coefficient was >0.8 with any other environmental variable. Since most variables were highly correlated with each other, the selected variables for the analysis were: Summer heat moisture index, distance to the sea, and precipitation as snow. Candidate SNPs were selected if falling outside of the 2.5 standard deviations of the mean loading score (p-value = 0.012) for each of the first two axes.
In addition to the multivariate GEA analysis, two univariate GEAs were performed using the programs Bayenv v.2 [52] and TASSEL v.5 [48]. Bayenv is based on a Bayesian method that control for effects caused by isolation-by-distance, population structure, and genomic background. The program estimates a covariance matrix of estimated allele frequencies to use as a null model of neutral genetic structure. This null model is then compared to a linear model between the allele frequencies and each environmental variable to test for improved fit over the null model. The software returns Bayes factors (BF) for each locus and, when using the nonparametric extension, also returns Spearman's rank and Pearson's correlation coefficients. Ten runs (100,000 iterations each) of the covariance matrix estimation were carried out and averaged together to ensure an accurate matrix across different runs. Loci were considered robust candidates for environmental selection if they had a BF ≥ 3 and were in the top 1% of Pearson's correlation coefficient. TASSEL univariate analyses were performed using each environmental variable as the vector of observations, and the SNP markers, population structure and kinship as fixed effects in an MLM association analysis. The first three principal components of a PCA of environmental variables were also tested for associations with the SNP markers.

Genome Scan for Detection of Selection Signatures
Pcadapt R package [53] was used to detect signatures of selection in the 20.3 k SNP set. Pcadapt estimates principal components (K) using a principal component analysis; and later calculates the correlation between genetic variation and the first K principal components using Mahalanobis distance to detect outliers [53]. A q-value threshold of 0.1 was used. Pcadapt was chosen over other genome scans methods because it does not require grouping individuals into a-priori defined populations and it is not sensitive to admixed individuals in the dataset.

SNP Calling and SNP Genotyping
Of the 458M SNPs called in the whole genome re-sequenced individuals, a total of 1.2M SNPs were retained after filtering following the SNP calling process. All SNPs were scored by Illumina using in silico design scores to maximize the number of markers for genotyping that will provide high conversion rates. As a result, 20 k best-scoring SNPs were included in the Illumina Infinium array for SNP genotyping. From this, genotypes were obtained from 16,146 SNPs. Posterior genotype clustering, and removal of monomorphic and low-quality SNPs led to 20,397 SNPs, from which 5815 were originally designed for sugar pine. SNPs selected for posterior analyses were distributed along 5892 scaffolds in the Douglas-fir genome and matched 6833 coding regions (4301 genes plus 2532 transcripts from unknown genes). The distribution of SNPs minor allele frequencies can be seen in Appendix A, Figure A1.

Population Structure
A K value of 2 was shown to be most optimal using the choose K script included in fast STRUCTURE. Distruct [54] used the CLUMPP output of all 10 runs where K = 2 to make an ancestry plot of each individual ( Figure 1). Results of the fastSTRUCTURE analysis showed that two distinct genetic clusters exist within the study zone; a dominant type throughout the sampled range up to the Canadian border ("Coastal Dominant"), and a smaller one existing in Southern Oregon ("Coastal South"). Individuals with mixed ancestry from both clusters were found in Southern Oregon. The map resulting from assigning individuals to the distinct genetic clusters can be seen in Figure 1. The results of the principal component analysis also showed two fairly distinct groups ( Figure 1).

Correlations among Cold-Related Phenotypic Traits and Environmental Variables
All correlations among phenotypic traits and environmental variables are represented in heatmaps in Appendix A, Figures A2-A4. The three measures of cold damage (percentage of damaged tissue in needle, stem, and bud tissue) are positively correlated with each other as well as with growth rate (both height measurements, height increment, root, shoot, and total weight, and root length). These measures of cold hardiness had weak

Correlations among Cold-Related Phenotypic Traits and Environmental Variables
All correlations among phenotypic traits and environmental variables are represented in heatmaps in Appendix A, Figures A2-A4. The three measures of cold damage (percentage of damaged tissue in needle, stem, and bud tissue) are positively correlated with each other as well as with growth rate (both height measurements, height increment, root, shoot, and total weight, and root length). These measures of cold hardiness had weak negative correlations with measures of emergence and root to shoot ratio. Most phenotypes were significantly correlated with each other, with the exception of the propensity to second flush and length of second flush, which were correlated with each other, but not strongly correlated to any other variables (Appendix A, Figure A2). Correlations within environmental variables can largely be broken into two groups, one containing all temperature variables and continentality; with a second group consisting of precipitation variables, PCs 1-3 and geographic variables (Appendix A, Figure A3). Trait 1 was negatively correlated with elevation, longitude, and continentality (TD) and positively correlated with Mean Annual Temperature (MAT) and Mean Coldest Month Temperature (MCMT). Trait 2 was negatively correlated with Climate Moisture Deficit (CMD), Summer heat moisture (SHM), and positively correlated with latitude (Appendix A, Figure A4).

Genome Scan for Detection of Selection Signatures
PCAdapt identified 582 outlier loci distributed across 290 scaffolds in the Douglas-fir genome (Supplementary Material Table S1). From those SNPs, 225 were from coding regions (156 matched Douglas-fir genes and 69 matched transcripts with unknown genes), and the remaining SNPs were from non-coding or unidentified genomic regions. From the 582 outlier loci, 271 were significant in GWAS analyses and 269 in GEA results ( Figure 2).  Table S6). Of these 528 SNPs, 144 matched genes and 58 transcripts (from unknown genes). Each of these variables were correlated to other environmental variables, suggesting the combined effect of groups of individual variables associated with an aspect of climate (Appendix A, Figure A3). Figure 2 shows the individ-

Isolation by Distance
When individuals were not grouped into populations, mantel tests revealed very low levels of isolation by distance (r = 0.09, p-value = 0.001). There was no significant relationship between genetic distance and geographic distance when individuals were grouped into populations (r = 0.02, p-value = 0.37). See Appendix A, Figure A5 for a plot of the results. This, combined with the results of fastSTRUCTURE, indicated a weak population structure in the dataset.

Univariate and Multivariate Genome-Wide Association Study of Cold-Related Traits
The MLM model implemented in TASSEL identified 799 significant associations after a false discovery rate (fdr) correction for multiple-testing [55]. A total of 690 associations were significant for additive effects (Supplementary Material Table S2). Associations were found among 237 SNPs and 20 traits. Of the 237 SNPs, 170 matched coding regions (126 genes and 44 transcripts) and the others were non-coding (Supplementary Material Table S2). Minor allele frequencies of significant SNPs ranged from 0.01 to 0.48 (mean = 0.04, standard deviation = 0.06). Trait 1 (a linear combination of several growth traits), for which higher values represent higher vigor (faster growth, later budset, earlier emergence, greater partitioning to shoot vs. root) was associated with 106 SNPs, and root to shoot ratio (RTSH) was associated with 115 SNPs based on TASSEL results. The number of SNPs associated to each trait can be found in Table 1.  In contrast, GEMMA identified 7536 significant associations among 6034 SNPs and all 23 traits (Table 1, Supplementary Material Table S3). Significant SNPs were matched to 2173 Douglas-fir genes and 1117 transcripts (from unknown genes). Minor allele frequencies of significant SNPs ranged from 0.01 to 0.5 (mean = 0.15, standard deviation = 0.13). Traits with large (>1000) number of associated SNPs were needle cold damage, bud cold damage and Trait 2. Increasing the threshold of posterior inclusion probability (PIP) from 0.01 to 0.05 and 0.1 significantly reduced the number of SNPs associated with needle cold damage, bud cold damage and Trait 2, but had little effect on other traits. The number of shared SNPs found across all GWAS and GEA analyses also remain relatively unchanged when changing the PIP threshold (Appendix A, Figure A6).
Individual SNP markers explained from 6 to 24% (mean = 9.9, stdev = 3.1) of the phenotypic variation for all traits, whereas combined SNP markers explained up to 93% ( Table 1, Supplementary Material Table S2). Cold hardiness traits (needle and stem cold damage) had high PVE but low PGE (proportion of genetic variance explained by sparse effects), suggesting that these traits are highly polygenic and mainly controlled by large numbers of genes of small effect. This was further supported by the fact that traits exhibiting this pattern had very few SNPs identified by univariate models (Table 1), but many SNPs identified by GEMMA. Both large and small effects genes seem to be important controlling growth, emergence and phenology traits. None of the studied traits seemed to be only controlled by large effect genes.

Univariate and Multivariate Genotype-Environment Association Analyses (GEA)
TASSEL univariate association analyses identified 74 SNPs significantly associated with 17 environmental variables (Supplementary Material Table S4). Most SNPs were related to cold temperature variables, with 55, 28, and 16 SNPs associated with Degree days below 0 in autumn, year-wide, and spring (DD_0_at, DD_0, DD_0_sp), respectively. Results of the PCA of environmental variables indicated that principal component 1 explains 57.5% of the variation in the data and is almost entirely represented by mean annual precipitation (MAP), whereas PC2 explains 32.4% and is represented by growing degree days (DD_5) and elevation (Appendix A, Figures A7 and A8). Bayenv identified associations among 404 SNPs and 22 environmental variables. These SNPs matched 110 genes and 43 transcripts from unknown genes (Supplementary Material Table S5). Traits were associated with 1 to 77 SNPs each. Minor allele frequencies of significant SNPs had a mean of 0.04 and a standard deviation of 0.03. From the 404 SNPs, 260 were also significant in other GEA analyses ( Figure 2).
The multivariate (RDA) GEA method identified a total of 528 significant SNPs for continentality (255 SNPs), precipitation as snow (219 SNPs), and summer heat moisture index (54 SNPs) (Supplementary Material Table S6). Of these 528 SNPs, 144 matched genes and 58 transcripts (from unknown genes). Each of these variables were correlated to other environmental variables, suggesting the combined effect of groups of individual variables associated with an aspect of climate (Appendix A, Figure A3). Figure 2 shows the individuals and SNPs plotted in the ordination space. Individuals of genetic cluster coastal south (southern Oregon, see Figure 1) grouped together and separate well from the rest of the individuals mostly on summer heat moisture index. Individuals from this cluster also tend to be at lower latitudes and longitudes than the rest of the population. Minor allele frequencies had a mean of 0.07 and a standard deviation of 0.07. Main gene ontologies of combined results across all GEA methods were metabolic processes (carbohydrate, lipid, nucleobase, phosphorus, and cellular aromatic), response to stress (oxidative and osmotic), and developmental processes involved in reproduction (Figure 2). When comparing the results of all methods, 385 SNPs (130 genes and 54 transcripts from unknown genes) were significant in at least one GEA and one GWAS analysis (Table 2). From them, 15 SNPs were significant in all univariate and multivariate GEA and GWAS analyses (Supplementary  Material Table S7, Appendix A, Figure A6).

Discussion
In agreement with previous studies of natural populations of highly-outcrossing, widely distributed boreal and temperate conifer species [4,6,9,26,33], we found strong evidence for cold adaptation despite low population structure and (potentially) high gene flow. Our Bayesian clustering and PCA analyses revealed very little population structure within the study area, with gene flow mainly occurring asymmetrically from northern to southern populations, and from high to low elevations. Steep gradients of selection are likely balancing large rates of migration (via pollen) to shape steep elevational clines. These clines are most likely produced by large differences in the frequency of alleles that are either common (and widespread) or rare (but frequent at high elevations). Previous studies have shown strong evidence of steep phenotypic and environmental clines within the coastal variety [17][18][19]32,56]. Significant divergence in allele frequency was also found in genome-wide studies comparing high and low elevation populations of Pinus yunnanensis [57]. In contrast, studies in this and other forest tree species suggested small to moderate shifts in allele frequency of adaptive alleles [6,58,59], and the presence of intermediate-frequency polymorphisms characteristic of recurrent selective sweeps [60]. Further studies are required to know whether recurrent selective sweeps, demographic processes and/or introgression from the interior (mountainous) variety explain the pattern observed.
Our results highlight the role of both coding and non-coding regions in cold adaptation of the species. We believe these regions are widely distributed across the genome, however, in the absence of a high-density linkage map or a chromosome-scale reference genome, our study could only identify genomic locations at the scaffold-level. We identified multiple SNPs or genes within the same scaffolds. Due to the highly fragmented nature of the Douglas-fir reference genome, we assume genes located in the same scaffold have a high chance of being linked. A widespread genomic location of adaptive genes was also found in recent GWAS and GEA analyses in Pinus taeda [6,61], Pinus lambertiana [62] and Populus balsamifera [63].

Polygenic Basis of Cold Adaptation in Coastal Douglas-fir
Both GWAS and GEA analyses suggested a polygenic basis of cold adaptation, with many genes associated with cold-related traits and climate adaptation. This is coincident with recent genome-wide analyses in widespread, outcrossing plant species with large population sizes such as Zea mays [64], Populus trichocarpa [65], Pinus contorta [66], and Pinus sylvestris [67]. Trait architecture in coastal Douglas-fir seems to be more complex than previously suggested, with traits (such as growth, emergence and phenology) controlled by both large and small effect genes, and others (cold hardiness) mainly controlled by large numbers of small effect genes. None of the traits was only controlled by major effect genes. The main differences between univariate and multivariate GWAS results were found in cold hardiness and trait2. While univariate MLM identified less than a handful of SNPs, multivariate BSLMM identified 1 or 2 thousand SNPs associated with each trait. Most of these SNPs had very small effect sizes, and even BSLMM failed to capture them when the posterior inclusion probability (PIP) threshold increased from 0.01 to 0.05. While the first results (PIP > 0.01) might overestimate the number of associated SNPs by incorporating false positives, the second results (PIP > 0.05) fail to capture SNPs of very small effect significantly increasing the number of false negatives. Another important difference between univariate and multivariate GWAS results is the percentage of phenotypic variance explained by SNPs. Individual SNP markers explained from 6 to 24% of the phenotypic variation for all traits, whereas combined SNP markers explained up to 93% (Table 1,  Supplementary Material Table S2).
Aspects of climate are highly correlated with each other, complicating the interpretation of genome-wide environmental associations. Our GEA analyses identified several hundreds of SNPs associated with a large number of temperature and precipitation related variables (Supplementary Material Tables S4-S6), suggesting the study of climate adapta-tion requires the analysis of groups instead of individual environmental variables. Previous studies have identified modular aspects of climate adaptation in species such Pinus contorta [68] and Pinus taeda [6]. Multivariate methods identified more SNPs but the difference between univariate and multivariate GEA methods was more subtle than between GWAS methods, with hundreds of significant SNPs shared across methods (Figure 2). Univariate Bayenv and multivariate RDA identified the same 237 significant SNPs, but univariate MLM only found 6 common significant SNPs with multivariate RDA and 29 with Bayenv ( Figure 2). Pcadapt, the outlier method, identified 383 SNPs in common with other GEA methods (Figure 2). When comparing the results of all GEA and GWAS methods, 385 SNPs (130 genes and 54 transcripts from unknown genes) were significant in at least one GEA and one GWAS analysis. These genes indicate a complex genomic architecture of cold adaptation in the species with many genes involved in many important biological functions related to growth, phenology and cold hardiness and that strongly depend on variation in environmental variables such Degree days below 0c, amount of precipitation, elevation and distance from the coast (TD) ( Table 2).

Trade-Offs between Growth and Cold Hardiness
Trees living in cold environments must balance the timing of growth initiation with the risk of frost [69]. If a tree initiates growth too early, before the last spring frost, it risks tissue damage or death. On the other hand, if a tree initiates growth too late, it loses out on potential growing season and might not be able to compete for light with nearby trees. This trade-off between growth and cold hardiness have been observed in several temperate and boreal tree species, including Pseudotsuga menziessii [9,26], Picea sitchensis [70], Populus trichocarpa [71], Pinus sylvestris [67,72,73], Abies sachalinensis [74], Acer rubrum, Betula alleghaniensis, Quercus rubra and Juglans nigra [75].
Our results showed the complex relationships between genotypes, growth (DIAM, HT1, HT2, HTINC), emergence (EMEAN, EMSTD), phenology (BS1, BS2, BB2), cold hardiness, and environmental variables such as degree days below 0c, precipitation as snow, and continentality (TD) (Figure 3). The importance of these environmental variables for cold adaptation was suggested in previous common garden studies in the species [18,19,26,32]. As we move further away from the sea towards the mountains (decreasing longitude and increasing continentality), the weather gets colder with a higher number of degree days below 0c, and increased snow precipitation, characteristic of longer winter seasons. Longer winter seasons bring shorter growing seasons, in which Douglas-fir develops new growth earlier but also stops growing earlier than their coastal counterparts. As a consequence of this, general growth (height and diameter) is reduced but cold hardiness is increased at lower longitudes and higher elevations. For example, 106 SNPs showed associations with Trait 1 (Table 1), a linear combination of several growth traits. Trait 1 inversely correlates with longitude, elevation, continentality and winter temperatures suggesting that higher values of Trait 1 are correlated with higher vigor (faster growth, later budset, earlier emergence, greater partitioning to shoot vs. root), which is coincident with previous common garden studies suggesting an association between Trait 1 and lower drought and cold tolerance [18,32]. Most of the genes associated with both traits and environmental variables also showed evidence of trade-offs between growth and cold hardiness ( Table 2).

Functional Characterization of Genes Associated with Cold Adaptation
Cold adaptation in forest trees involves significant physiological, cellular, genetic and morphological changes [9,10]. In our study, we found that genes associated with cold adaptation affect many important biological processes such as primary (carbohydrate,

Functional Characterization of Genes Associated with Cold Adaptation
Cold adaptation in forest trees involves significant physiological, cellular, genetic and morphological changes [9,10]. In our study, we found that genes associated with cold adaptation affect many important biological processes such as primary (carbohydrate, lipid, aminoacid and RNA) metabolism, secondary metabolism (terpenes, steroids, vitamins and chlorophyll), growth and reproductive development, transcription regulation, stress and signal transduction, and DNA processes ( Table 2). Both carbohydrate and lipid metabolism are deeply modified during cold stress in temperate and boreal trees. Carbohydrate metabolism changes include the accumulation of non-reducing disaccharides (sucrose and raffinose) and the increase in polysaccharide (starch) breakdown, which produces more energy to compensate for reduced photosynthetic activity during freezing stress. In our study, we found genes involved in cell wall pectin (gene PSME_35569); starch and sucrose metabolism (gene PSME_01956); fructose and mannose metabolism (gene PSME_39947); pentose phosphate pathway (PSME_32494); glycolysis (PSME_39868 and PSME_34754); sugar transport (PSME_46816 and PSME_47335) and nine other genes involved in similar processes. Most of these genes were associated with cold hardiness traits (ndlcold and budcold), and environmental variables such as Continentality (TD) and others ( Table 2). Similar to carbohydrates, membrane lipids adjust their composition to protect cell walls against dehydration-related freezing injury [10,11,76]. In our study, we only found one gene involved in the metabolism of glycerophospholipids (PSME_32128), which was correlated with mean coldest month temperature (MCMT) and trait2 ( Table 2). Genes involved in arginine, proline, β-alanine, cysteine and methionine metabolism were also found in this study (PSME_23694, PSME_30851 and PSME_40791).
Plant secondary metabolites play an important role in adaptation to changes in the environment [77]. In our study, we found four SNPs involved in terpenoid biosynthesis (3 of them matching the same gene PSME_39361). One of those SNPs, seq-rs11849-DF (1-deoxy-D-xylulose 5-phosphate reductoisomerase) produces a non-synonymous change and was found to be significantly associated in all five GEA and GWAS analyses. The homozygote form containing the minor allele (CC) is present at eastern longitudes and higher elevations (usually > 900 masl) and is associated with slower growth and earlier bud set (Table 2, Figure 3). Other important genes involved in terpene biosynthesis (specifically diterpenes) are members of the Cytochrome P450 super gene family [78]. We found Cytochrome P450 gene PSME_03065, from which the homozygote form containing the minor allele (GG) is present at higher elevations (usually > 1000 masl) and is associated with lower values of trait 2 (later bud burst and lower partitioning to diameter vs. height) and precipitation as snow higher than 400 mm.
Plants employ complex signaling pathways to regulate the expression of defense and stress genes and other mechanisms that allow resistance to environmental stress [79]. The ubiquitin-proteasome system controls the degradation of most proteins in the cells. It provides a rapid strategy to control many cellular processes by degrading specific proteins, playing a critical role in the regulation of many biological processes such as hormonal signaling, growth, embryogenesis, senescence and environmental stress [80,81]. The initiation of ubiquitination (protein degradation) requires an E1 enzyme joining a Ubiquitin protein and follows with a three-step conjugation cascade (E1 > E2 > E3) that detects specific ubiquination signals [81]. Our study identified a strong candidate for that first ubiquitin protein in the ubiquitination process in Douglas-fir (SNP seq-rs6421-DF, Gene PSME_00292 ubiquitin-40S ribosomal protein S27a). All univariate and multivariate methods in this study identified this SNP as significantly associated with several traits and environments. The homozygote form containing the minor allele (AA) is present at eastern longitudes and higher elevations (usually > 900 masl) and is associated with earlier bud set and slower growth (Table 2, Figure 3). Cold stress induces the degradation of transcription factor ICE1, process that is mediated by a RING-type E3 ligase. In Arabidopsis, overexpression of HOS1 (RING-type E3 ligase) and AtCHIP (U-box type E3 ligase) led to reduced cold hardiness [81]. In our study, we have found two U-box genes (PSME_03290 and PSME_37900) and one RING-type E3 gene (PSME_02072) associated with several growth traits and continentality (TD) in Douglas-fir.
Members of the thaumatin family ("antifreeze proteins") are upregulated during cold stress and are potentially regulated by photoperiod in P. sitchensis [11]. In our study, we found thaumatin gene PSME_47504, which is associated with several growth traits, degree-days below 0c and continentality. Peroxidases (such as genes PSME_37207 and PSME_46413) are upregulated during cold stress to protect against oxidative damage caused by an increase in reactive oxygen species [11,82]. Members of the ABC super gene family were found to be key players in defense mechanisms against different herbivores [83,84] and pathogens [61], growth and drought [84] in several forest tree species [85]. In our study, we found that the homozygous of the minor allele (CC) for ABC gene PSME_40921 was associated with increased root to shoot ratio-RTSH (slower growth); and usually occurs at mountainous regions (elevation > 1000 masl) with more than 30 chilling degree-days in autumn.

Conclusions and Predictions in the Face of Climate Change
Due to the rapid nature of climate change and the slow nature of plant migrations, proactive strategies should be employed by forest managers to mitigate potential damages resulting from the increased frequency of extreme cold and drought events [86]. As global climate warms at unprecedented rates, species' distributions are expected to change exposing populations to new environmental conditions [87]. Climate prediction models suggest current climate change could reduce the resilience of coastal Douglas-fir on the warmer margins of its range [21], which will translate in a reduction in fitness and therefore lower wood productivity. Populations in colder areas might also suffer from an abrupt growth decline as a consequence of late spring frosts (more common under warming climates), as previously observed in Canadian populations of Picea mariana [88]. Moreover, populations of Douglas-fir that are isolated on mountains, especially southern populations of the intermountain variety, should be of particular interest due to their higher risk of losing genetic diversity [89]. Our study indicated that these mountainous regions harbor low-frequency alleles that have very important roles in cold adaptation in coastal Douglasfir. As predictions of future climate variation improve, so should our understanding of the role of phenotypic variation in the potential for adaptation to changing environments.

Supplementary Materials:
The following are available online at https://www.mdpi.com/2073-4 425/12/1/110/s1, Table S1: Results of the outlier analysis results, Table S2: Results of the GWAS analyses implemented in TASSEL, Table S3: Results of the GWAS analyses implemented in GEMMA, Table S4: Results of the GEA analyses implemented in TASSEL, Table S5: Results of the GEA analyses implemented in Bayenv, Table S6: Results of the multivariate GEA analyses using Redundancy Analysis (RDA), Table S7: Candidate SNPs for cold adaptation in Douglas-fir.

Acknowledgments:
The authors would like to thank students who helped with DNA extraction such as Anita to and Justin Ndihokubwayo.

Conflicts of Interest:
The authors declare no conflict of interest.      Figure A2. Correlations among phenotypes. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Phenotypes' IDs can be found in Table A1. Figure A2. Correlations among phenotypes. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Phenotypes' IDs can be found in Table A1. Figure A3. Correlations among environmental variables. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Environmental variables' IDs can be found in Table A2. Figure A3. Correlations among environmental variables. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Environmental variables' IDs can be found in Table A2. Figure A4. Correlations among phenotypes and environmental variables. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Phenotypes' IDs can be found in Table  A1, and environmental variables' IDs can be found in Table A2. Figure A5. Results of the Mantel tests. IBD refers to Isolation by distance, and "physical distance" refers to geographic distance among populations. Physical Distance G e n e tic D is ta n c e IBD by Individual Figure A4. Correlations among phenotypes and environmental variables. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Phenotypes' IDs can be found in Table A1, and environmental variables' IDs can be found in Table A2.

Appendix A
Genes 2021, 12, x FOR PEER REVIEW 23 of 29 Figure A4. Correlations among phenotypes and environmental variables. A heatmap is shown with a gradient between purple (negative correlation coefficient) and teal (positive correlation coefficient) with white representing a correlation coefficient of zero. No significant correlations are denoted by an "×" in each box. Phenotypes' IDs can be found in Table  A1, and environmental variables' IDs can be found in Table A2. Figure A5. Results of the Mantel tests. IBD refers to Isolation by distance, and "physical distance" refers to geographic distance among populations.