Evidence for parallel adaptation to climate across the natural range of Arabidopsis thaliana

How organisms adapt to different climate habitats is a key question in evolutionary ecology and biological conservation. Species distributions are often determined by climate suitability. Consequently, the anthropogenic impact on earth's climate is of key concern to conservation efforts because of our relatively poor understanding of the ability of populations to track and evolve to climate change. Here, we investigate the ability of Arabidopsis thaliana to occupy climate space by quantifying the extent to which different climate regimes are accessible to different A. thaliana genotypes using publicly available data from a large-scale genotyping project and from a worldwide climate database. The genetic distance calculated from 149 single-nucleotide polymorphisms (SNPs) among 60 lineages of A. thaliana was compared to the corresponding climate distance among collection localities calculated from nine different climatic factors. A. thaliana was found to be highly labile when adapting to novel climate space, suggesting that populations may experience few constraints when adapting to changing climates. Our results also provide evidence of a parallel or convergent evolution on the molecular level supporting recent generalizations regarding the genetics of adaptation.


Introduction
Climate is one of the most important factors determining the distribution of plants (Walther 2003) and therefore adaptation to climate should be a major selective force. Furthermore, the ability to adapt to climate heterogeneity can facilitate or constrain the dispersal of organisms, affecting species range (Angert et al. 2011), and climate adaptation may even play an important role in speciation (Keller and Seehausen 2012). Although historically local climates have been known to fluctuate across space and time at an ecological scale, human impacts are accelerating climate change and this has already affected the survival and distribution of some organisms (Parmesan and Yohe 2003;Parmesan 2006). The effects of climate change are expected to increase in the future (Hancock et al. 2011). Thus, the ability to adapt to different climate regimes will likely be an important factor in the persistence of populations and species. This is especially true of plants, which are sessile and less able to disperse to more favorable climates as climate change occurs.
Of particular interest is how labile populations are with respect to climate adaptation. That is, how easily are they able to expand their range into novel climate space, and how readily are they able to respond to climate shifts in their own range? The ability to predict the evolutionary dynamics that will result from widespread climate change will inform both conservation efforts and basic evolutionary theory (Bradshaw and Holzapfel 2001;Olsen et al. 2004;Teplitsky et al. 2008;Kearney et al. 2009;Hoffman and Sgro 2011;Hansen et al. 2012).
Studies of the effect of climate on species ranges have a long history in plant ecology and evolution (see e.g., Darwin 1859, ch. 11). Furthermore, there is extensive evidence for ecotypic variation within species that contributes to climate adaptation (e.g., Clausen 1926;Clausen et al. 1947;Lowry and Willis 2010). Although plasticity does play a role (Nicotra et al. 2010), the overall picture is that there is a significant genetic contribution to climate adaptation.
Large, publicly available data sets provide a wealth of information for genetic studies. Climate data are also widely available. Given that climate is a significant selec-tive pressure, when populations have resided in a locality for a considerable time (number of generations), it is reasonable to assume that they have adapted to the local conditions. Therefore, combining such large-scale data sets allows researchers to estimate adaptation to climate on a greater scale than would be possible using experimental methods (Banta et al. 2012).
The mouse-eared cress Arabidopsis thaliana is an ideal candidate for such a study. A. thaliana exhibits an annual life-history strategy with a cosmopolitan distribution across a wide range of habitat types. As a model organism for genetic studies, A. thaliana strains from many different climate regimes have been extensively genotyped (Shindo et al. 2007). Climate is known to be an important feature affecting fitness of A. thaliana (Wilczek et al. 2009;Fournier-Level et al. 2011). Climate regimes have been experimentally shown to predict performance under common garden conditions Rutter and Fenster 2007). Finally, climate has been shown to be an important factor limiting the distribution of A. thaliana (Hoffmann 2002).
Although only a few loci contributing to climate adaptation have been well studied, the emerging picture is that climate adaptation in A. thaliana is affected by a vast network of genes affecting traits such as tolerance to temperature (Westerman 1971) and drought (McKay et al. 2003). Loci related to climate adaptation have been found to be widespread throughout the genome by a genome scan (Hancock et al. 2011) and a recent study found a correlation between climate and particular nonsynonymous substitutions at the genomic level (Lasky et al. 2012). Despite this, few studies have empirically examined adaptation to climate in natural A. thaliana populations due in large part to the difficulty in conducting field studies across a large sample of populations (but see Agren and Schemske 2012). When environmental factors can be correlated with fitness, relying on publicly available environmental and genetic data allows for more comprehensive studies.
Here, we quantify whether the genotype of an ecotype is a useful predictor of the climate habitat it occupies. On the basis of earlier studies (Wilczek et al. 2009;Lasky et al. 2012), we expect the relationship between genetic distance and climate distance to be positive. However, how strong shared evolutionary lineage determines the ability to invade climate space is key to our understanding the lability of populations to adapt to climate. If there is a weak relationship between genetic relatedness and occupied climate space then it would suggest that there are multiple ways that a lineage can adapt to a particular climate regime, indicating high lability in the ability of this organism to adapt to climate. This question is highly relevant given the current state of drastic anthropogenic climate change. If the relationship between climate space and genotype space is limited, then it bodes ill for organ-isms like plants that may be restricted in their ability to escape unsuitable habitat.

Methods
We used a large genetic data set from A. thaliana and a worldwide climate database to examine the relationship between genetic relatedness and occupied climate space. To compile data on a substantial number of ecotypes and to generate a genetic distance matrix, we took advantage of publically available data from a large-scale genotyping study (Borevitz lab: http://www.naturalvariation.org/ hapmap). The A. thaliana accessions that we used were taken from 853 lines characterized at 149 SNPs. It was important to have evidence that the accessions had experienced the local climate for long enough to adapt to their collection climate locality. Thus, we attempted to only use accessions that were collected from less anthropogenically disturbed habitats (i.e., not roadsides) typical of A. thaliana's natural habitat where they were more likely to have a relatively long history, and consequently enough time to adapt to local climatic conditions. Such habitats include steep rocky slopes, open areas near forest (but not in understory), and open habitats with sandy or limited soil. We included these habitats as well as habitats that reflect some human disturbance including fallow fields, rocky walls, cemeteries with sandy soil, and etc. In response to reviewers' suggestions we added a further 67 accessions that from the habitat descriptions appeared to be from more anthropogenically disturbed sites including roadsides, tourist parks, fields under active cultivation, railway ballasts, and etc. (Appendix B). However, the vast majority of lines had no habitat data and were omitted immediately. Of the rest, several were collected outside of the native range of A. thaliana (e.g., in North America) and several more were not genotyped at the majority of the SNPs. In addition, we excluded such habitats as "Botanic Garden" or any university-associated sites, as those plants may represent escaped accessions adapted to other localities. This resulted in a data set consisting of 60 accessions (Appendix A) that derived from what we consider the least anthropogenically disturbed habitats and 67 more accessions from sites that might reflect higher anthropogenic disturbance (Appendix B).
To generate a climate distance matrix among the 60 and 67 accessions, we compiled climate data for each locality from a database consisting of nine different climate factors recorded every 10 degree minutes worldwide. The closest recorded point to the collection site of each A. thaliana line was used for this study. In some cases this created overlap in the site data for certain accessions. While this data set does not capture what may be important microclimate variation, it was the most precise data available to us. Given that previous studies have demonstrated a genetic contribution to climate adaptation, we believe that this will provide a conservative measure of the lability of genetic adaptation to climate, as genotypes from populations in the same climate regime would be expected to increase the correlation between genotype and habitat climate. Data for eight of the climate factors were collected monthly. These were precipitation (pre), number of wet days (wet), mean temperature (tmp), mean diurnal temperature range (dtr), relative humidity (reh), sunshine (sunp), ground frost (frs), and 10-m wind speed (wnd). The ninth was elevation and consisted of a single measure for each location. We included all available climate factors to avoid any a priori assumptions about which factors were most important. We ran additional analyses on a selected subset of the data (mean temperature from November to June and precipitation from June to August). These factors were selected using Banta et al. (2012) as a guide. This reduced the partial Mantel correlation slightly and to a nonsignificant degree. We therefore included all climate factors in the final analysis. The data are from New et al. (2002) and can be downloaded from Climate Research Unit website (http://www.cru.uea. ac.uk/cru/data/tmc.htm).
To compare climate distance to genetic distance we calculated distance matrices for both genotype (SNPs) and climate. The genetic distance matrix was calculated using DNADIST from the PHYLIP package (Felsenstein 1989) using the F84 substitution model. The climate distance matrix was calculated using PROC DISTANCE METHOD=DGOWERS in SAS 9.1.3 (SAS Institute 2004). This is Gower's environmental distance metric (Gower 1971).
To compare the genetic distance matrix to the climate distance matrix using tree-based methods, we estimated neighbor-joining trees for each distance matrix using the program NEIGHBOR in the PHYLIP package (Felsenstein 1989). We then calculated the Robinson-Foulds tree distance metric using TREEDIST from the PHYLIP package (Felsenstein 1989). This metric measures the dissimilarity among the overall topology of two or more unrooted trees (Robinson and Foulds 1981). Smaller numbers indicate higher similarity among topologies. The scale of the metric ranges from 0 (total concordance) to 2n À 6, where n is the number of terminal nodes. In the situation where we used the 60 accessions, then the maximum Robinson-Foulds index would be 2 (60) À 6 = 114. We then used the program topd/fMtS (Puigbo et al. 2007) to calculate a Robinson-Foulds metric among a set of randomized trees. This number is expected to reflect low concordance due to random topologies.
We expected that geographic distance could inflate the relationship between genetic and climate distance because closely related genotypes are expected to share geographic locales and hence similar climates (Beck et al. 2008). Thus, to remove the confounding influence of geographic proximity, we calculated an additional matrix of geographic great circle distance in R (R Development Core Team 2011). We used the VEGAN (Oksanen et al. 2011) package in R to calculate the partial Mantel correlation (Mantel 1967) between the genetic distance matrix and the climate distance matrix controlling for the geographic distance matrix for both the 60 least disturbed and 67 moderately disturbed accessions separately and together. Mantel and partial Mantel tests are commonly used in ecology to study the relationship between ecological factors and genetic distance (Smouse et al. 1986). VEGAN calculates three correlation measures: Pearson's product moment correlation, Spearman's rank correlation, and Kendall's rank correlation. All R scripts can be found in the online supporting information.

Results
The genetic distance matrix (SNP) and the climate distance matrix for the 60 accessions collected from less disturbed sites are both represented as neighbor-joining trees (Fig. 1). The Robinson-Foulds distance metric for the two neighbor-joining trees was 110, indicating very low concordance between trees. The least possible concordance is 2n À 6, 114 in this case. The calculated Robinson-Foulds for a set of 100 randomized topologies for this data set is 113 with a 95% confidence interval of AE0.2. Therefore, although the concordance between these two trees is very low, there is a small signal of lineage on occupied climate space.
The results of the partial Mantel tests are presented in Table 1 for the 60 accessions collected from less disturbed sites that in our opinion more likely reflect native habitat. The partial Mantel correlations comparing the genetic distance matrix to the climate distance matrix and controlling for geographic distance were positive but low. Including the 67 moderately disturbed localities with the less disturbed (a total of 127 lineages) did not affect the ranked correlations from the partial Mantel test but reduced the Pearson correlation by about two thirds (from r = 0.23 to r = 0.07). When a partial Mantel test was conducted with the 67 lineages from the moderately disturbed localities alone, the Pearson correlation between genetic distance and climate distance was not significantly different from 0 (r = 0.02, P = 0.285). Therefore, we decided to base our conclusions only on the 60 accessions collected from less disturbed localities.
As an additional visualization we include a scatter plot of pairwise climate distance by pairwise genetic distance for the 60 accessions that shows a positive but low correlation, with the majority of the points reflecting high genetic distance coupled with low climate distance (Fig. 2). We acknowledge that pseudoreplication is a concern with this presentation and we do not base any of our formal analyses on this figure. It is included solely for illustrative purposes.

Discussion
We demonstrate positive but low concordance between genetic relatedness in A. thaliana populations and the climate space that those populations inhabit. Both the partial Mantel tests and the Robertson-Foulds index indicate that genetic relatedness has little explanatory power in predicting the climate in which a genotype will be found. We interpret our results to mean that similar A. thaliana genotypes are able to occupy different climate regimes and that different genotypes have access or the ability to evolve to similar climate regimes. Therefore, access to different climate spaces appears to be relatively unconstrained by the A. thaliana genotype. This is also seen in Figure 2 where a number of genotypes have zero genetic distance based on the survey of 149 SNP's and yet occupy a wide range of climate regimes.  . Scatter plot of pairwise climate data (Gower's distance) versus pairwise genetic data for 60 Arabidopsis thaliana accessions collected across its native range. As these data suffer from pseudoreplication, we did not include it in our formal analyses and only include it for illustrative purposes. The figure is divided into four quadrants representing general relationships between climate distance and genetic distance. They are (A) high climate distance and low genetic distance, (B) high climate distance and high genetic distance, (C) low climate distance and low genetic distance, and (D) low climate distance and low genetic distance. The majority of the points reflect a correlation between low climate distance and high genetic distance. Hoffmann (2005) examined the evolution of climate adaptation in the genus Arabidopsis using phylogenetic reconstruction with climate space as a character. The analyses determined the core climate space (the climate space where all studied taxa coexist) and the realized climate niche (the intersection of taxa distribution ranges and climate data) of the genus. Hoffmann concluded that there was a high degree of parallel evolution to climate across the genus. Here, we demonstrate this same phenomenon within a species.
The ability of different genotypes to access similar climate habitats, and vice versa may help explain how A. thaliana has been able to achieve a cosmopolitan distribution in such a short period (e.g., across North America in approximately the last 150-200 years) (Vander Zwan et al. 2000;Jorgensen and Mauricio 2004). Given its annual life history it is possible that populations have adapted to climate on the order of tens to hundreds of generations. Recent range expansion in A. thaliana is certainly a result of human interference, but it is believed that the dispersal of self-fertilizing seed colonists has been the most important force in the history of the species. The ability of these annual colonists to rapidly adapt to novel climate habitats likely facilitated this process (Samis et al. 2012). Genetic adaptation to climate may exhibit the pattern we found due to parallel evolution or convergent evolution. In the former, the same genetic changes occur independently. In the latter, different genetic changes occur but the end result is the same. Several greenhouse and field studies have found a high beneficial mutation rate in A. thaliana, with as many as 50% of new mutations conferring a fitness advantage (Shaw et al. 2000;MacKenzie et al. 2005;Rutter et al. 2010Rutter et al. , 2012. Thus, the independent adaptation to similar climates by genotypes that are not directly related may be due to the contribution of new beneficial mutations. We foresee two potential concerns for our interpretation of independent adaptation to climate space. The first is the possibility that A. thaliana is phenotypically plastic with regard to climate space. Although this likely plays some role, we believe that our study also reflects genetic adaptation to climate. A reciprocal transplant study demonstrated genetic adaptation to climate between two European populations (Agren and Schemske 2012). Likewise, a common garden experiment has shown that the success of an accession of A. thaliana in a particular habitat can be accurately predicted by the similarity of that habitat to the native habitat of the ecotype (Rutter and Fenster 2007), consistent with adaptive differentiation to climate. Finally, analysis of the 67 accessions from localities we deemed that moderately disturbed showed no correlation between climate distance and genetic distance. We therefore feel our criteria for selecting localities were sufficiently conservative and reflect accessions that are likely to be locally adapted to their climate regime.
The second concern is that our study may not have captured variation in the loci that are involved in climate adaptation. The 149 SNP markers that we used to construct the genetic distance matrix were not intended to be used to identify loci associated with climate adaptation, although it is likely that some were given that linkage disequilibrium estimates vary from 10-250 kb (Nordborg et al. 2002(Nordborg et al. , 2005Kim et al. 2007). Rather our SNP-based genetic distance tree clearly demonstrates that genetic relatedness is not a strong predictor of the climate inhabited by the genotype. We do know that climate adaptation in A. thaliana involves the interaction of a large number of loci distributed throughout its genome (Wilczek et al. 2009). In a recent study using 214,051 SNPs and 1003 accessions of A. thaliana, 15.7% of the genetic variation was found to be associated with climate (Lasky et al. 2012). Therefore, one would not expect all the same loci to be involved in the evolution to similar climates given the large number of loci so far identified to be associated with climate adaptation.
Based on our results, it seems likely that parallel evolution is common not just among species of Arabidopsis (Hoffmann 2005) but also within A. thaliana. An intriguing generality from adaptation genetics studies is that, at the sequence level, parallel evolution may be more common than once believed (Orr 2005a,b). If this is true, then species may not be as genetically constrained with regard to potential habitats, or adaptation to changing habitats. Our data indicate that A. thaliana is unconstrained with regard to climate adaptation within the range of climate in which it is found and this may account for its rapid cosmopolitan range expansion. Similar results were reported by Banta et al. (2012). These authors found that later flowering time restricted the niche breadth (measured by climate variables) of A. thaliana accessions as compared to earlier flowering accessions which were relatively unconstrained. Flowering time in their study could be controlled by any one of 12 different loci. That is, adaptation to climate could be affected by at least 12 different genetic pathways.
Our findings have important implications to conservation efforts that are responding to anthropogenic change (Bradshaw and Holzapfel 2001;Hoffman and Sgro 2011), suggesting that it may not be easy to predict which populations have the ability to adapt to new climate regimes. Using information from ecological and genetic databases in conjunction with smaller scale field studies may therefore be a useful way to generate results from a much larger number of populations than is feasible using experimental methods, and may help shed light on the genetics of climate adaptation.