Species distribution modeling and molecular markers suggest longitudinal range shifts and cryptic northern refugia of the typical calcareous grassland species Hippocrepis comosa (horseshoe vetch)

Abstract Calcareous grasslands belong to the most diverse, endangered habitats in Europe, but there is still insufficient information about the origin of the plant species related to these grasslands. In order to illuminate this question, we chose for our study the representative grassland species Hippocrepis comosa (Horseshoe vetch). Based on species distribution modeling and molecular markers, we identified the glacial refugia and the postglacial migration routes of the species to Central Europe. We clearly demonstrate that H. comosa followed a latitudinal and due to its oceanity also a longitudinal gradient during the last glacial maximum (LGM), restricting the species to southern refugia situated on the Peninsulas of Iberia, the Balkans, and Italy during the last glaciation. However, we also found evidence for cryptic northern refugia in the UK, the Alps, and Central Germany. Both species distribution modeling and molecular markers underline that refugia of temperate, oceanic species such as H. comosa must not be exclusively located in southern but also in western of parts of Europe. The analysis showed a distinct separation of the southern refugia into a western cluster embracing Iberia and an eastern group including the Balkans and Italy, which determined the postglacial recolonization of Central Europe. At the end of the LGM, H. comosa seems to have expanded from the Iberian refugium, to Central and Northern Europe, including the UK, Belgium, and Germany.

In the traditional view, glacial refugia are known to be southern refugia for temperate species from all groups of organisms: the Iberian, Italian, Balkan peninsulas in Southern Europe (Hewitt, 1999(Hewitt, , 2004Taberlet, Fumagalli, Wust-Saucy, & Cosson, 1998), where the influence of the glacial cycles was alleviated (Tzedakis, Lawson, Frogley, Hewitt, & Preece, 2002) and species could escape from cold dry climates and persist until they could repopulate Europe in the interglacials and after the last glacial. Nevertheless, recent studies suggested several additional refugia for temperate species beyond these peninsulas. These cryptic northern refugia are postulated for higher latitudes than the expected southern refugia (Bhagwat & Willis, 2008;Bylebyl, Poschlod, & Reisch, 2008;Magri et al., 2006;Willis & van Andel, 2004) and are defined as climatic islands with favorable conditions (Stewart & Lister, 2001), surrounded by unsuitable conditions.
In the case of calcareous grasslands, this may have been rocky outcrops or steep sunny slopes with shallow dry soils (Ellenberg, 1988;Poschlod, Baumann, & Karlik, 2009) in deeply incised valleys providing microclimates for temperate species, allowing to survive where they normally would have perished (Flojgaard, Normand, Skov, & Svenning, 2009;Stewart & Lister, 2001). As soon as climate became warmer in the postglacial, recolonization of the surrounding steppe-tundra vegetation may have started from there. In this context, it must be mentioned that the following reforestation might have been held back before the Neolithic (see Bush, 1988), either by human with fire or by megaherbivores enlarging the potential habitats of calcareous grassland species besides naturally treeless sites like cliffs (Pokorný, Chytrý, & Juřičková, 2015;Svenning, 2002). Consequently, both wild animals and humans with domesticated animals may have contributed to species' ranges and contributions. Stewart, Lister, Barnes, and Dalen (2010) postulated a longitudinal oceanic-continental gradient that is often ignored when speculating about recolonization of species along the latitudinal axis.
The longitudinal gradient explains the expansion of steppe species and their inclusion in the Late Pleistocene 'steppe-tundra' biome.
Accordingly, the occurrence of current postglacial steppe species is limited to eastern continental interglacial refugia which are determined by the longitudinal gradient. Occurrences in the west could therefore be interpreted as cryptic refugia (compare Kunes et al., 2008;Schmitt & Varga, 2012). On the other hand, there should be counterpart examples for oceanic species, because extension of arid climates during the late Pleistocene would have been an impediment to some taxa likewise cold climates. The ranges of oceanic species would have expanded during the moister interglacials and contracted to Western Europe during the glacial periods. Together with the latitudinal gradient, both work in tandem in defining suitable habitats of a species (Stewart et al., 2010).
In order to gain insight in the origin of a typical calcareous grassland species with submediterranean and oceanic requirements, we chose Hippocrepis comosa L. (horseshoe vetch). It has already been demonstrated that human activities have contributed at least since the early Neolithic to the migration of crops, weeds, and animals (Beebee & Rowe, 2000;Fjellheim, Rognli, Fosnes, & Brochmann, 2006;Rosch, 1998;Willerding, 1986). It seems therefore quite possible that the migration of H. comosa is also related to human migration processes. The occurrence of H. comosa was first time documented for the Roman age in the lower Rhine Valley (Knörzer, 1996). Therefore, the question is whether or not the species came to Central Europe via Roman settlers. As Mediterranean species, there is also the possibility of spreading from Iberian or Balkan Peninsula. Exemplarily Poschlod (2014) claims the migration of dry grassland species from the Eastern Mediterranean region or southeast Europe through the migration of the first farmers of the linear ware ceramic culture (LBK) to Central Europe or from Western Europe through the La Hoguette culture.
Considering the climatic conditions in Central Europe during the Pleistocene, we postulate that H. comosa shifted westwards in the glacial periods due to the lateral expansion of continental climate and additional to its submediterranean character also southwards. We assume that H. comosa survived glaciations in western or southern refugia, but it cannot fully be excluded that the species also occurred in cryptic refugia in Central Europe.
Our aim was to identify glacial refugia and postglacial immigration routes of H. comosa to Central Europe, and we applied, therefore, two scientific approaches. Firstly, we used species distribution modeling (SDM) to predict suitable refugia during the Pleistocene with climate data. Within MaxEnt, a machine-learning application, we initially calibrated a model containing actual distribution data of H. comosa in combination with a set of today's climate parameters (Elith & Leathwick, 2009). This model was then used to process climate data prevailing during the last glacial maximum to predict suitable refugia. Secondly, we applied amplified fragment length polymorphisms (AFLPs) as molecular markers to analyze the genetic variation within and among 38 populations of H. comosa from the whole distribution range to gain information about glacial refugia and recolonization routes of the species. More specifically we asked the following questions: (i) Which refugial areas served as source for the postglacial immigration of H. comosa to Central Europe? (ii) Where were the main migration routes from the refugia to Central Europe? (iii) Is there evidence for the long-term survival of H. comosa in cryptic northern refugia?

| Study species
For this study, we selected Hippocrepis comosa (horseshoe vetch), which is a typical calcareous grassland species with submediterranean and oceanic requirements. As mentioned by Schmidt et al. (2007), the species occurs primary in natural habitats and secondary in seminatural habitats. It also occurs in recent and ancient grasslands  and seed exchange by grazing was shown possible (Müller-Schneider, 1938

| Species distribution modeling
Information containing georeferenced occurrences of H. comosa was downloaded from the Global Biodiversity Information Facility (GBIF). The total number of downloaded data was 17,934 with about 7,000 locations clustered in the northern half of France. Therefore and because of the fact that the data set showed a mixture of grid based data (mainly in Germany, France, Spain, and UK) and pinpoint occurrences, an uniform raster was created with a point distance of 2.5 min in an unprojected coordinate reference system (WGS84) encompassing the total distribution area of the H. comosa. With this approach, sampling bias can be avoided (Wisz, Hijmans, & Li, 2008).
The new distribution map was reduced to 2,794 points, 38 of them were additionally added from another study focusing on the same species (unpublished data). Geological parameters had to be excluded from our survey, as due to our grid based approach, the occurrence data of H. comosa would have been linked to incorrect edaphic values. Furthermore, to our knowledge, geological maps that describe the edaphic conditions during the LGM, especially in regard to current undersea areas, are not available.
To describe the climatic circumstances of the present age and the LGM (about 22,000 years ago), we used 19 bioclimatic variables (listed in Table A1, Appendix). The variables are derived from monthly mean temperature and precipitation and represent climatic annual trends, seasonality, and extreme conditions. Provided as separate climate layers at WorldClim (http://worldclim.org, Version 1.4, release 3, Hijmans, Cameron, Parra, Jones, & Jarvis, 2005), data were downloaded as grid (raster) format. The resolution of the data was 2.5 min (WGS84, unprojected). The current conditions involve interpolation of observed data from 1950 to 2000. As climatic data for the last glacial maximum conditions, two different reconstructions were used: CCSM4 and MPI-ESM-P. All gathered from WorldClim the original data were provided by the Coupled Model Intercomparison Project (CMIP5). The resolution was 2.5 min (WGS84, unprojected). As H. comosa exclusively occurs in Europe, geographic data were reduced to the European region. To avoid geographic bias, data were projected to an equal area projection (Europe Albers EAC). All GIS-related work was carried out in ArcGis 10.2.2 (ESRI, Redlands, CA, USA).
Ecological niche modeling and the subsequent creation of the geographic distribution maps of H. comosa at present and past time was computed with the program Maxent, version 3.3.3 (Phillips & Dudik, 2008). The Maxent software uses a maximum entropy algorithm which is well suited for species habitat modeling using presence-only data (Elith, Graham, & Anderson, 2006). Therefore, it is a proper method for predicting species distributions for both past-and future-orientated scenarios (Hijmans & Graham, 2006). In a first step, we calibrated the model with the actual occurrence data of H. comosa together with all current 19 bioclimatic variables (Table A1, Appendix). The resulting potential species distribution model was then projected onto the climate conditions prevailing during the last glacial maximum. To validate the informative value of the model regarding species distribution, we used the area under the receiver operating characteristic (ROC) curve (AUC) (Fielding & Bell, 1997), which is an implemented validation routine within Maxent. The occurrence data were randomly partitioned into two groups: One group containing 75% of the data was used for the model calibration; the remaining 25% were used for model testing (Phillips, Anderson, & Schapire, 2006). High AUC values (>0.7) indicate a good model performance (Fielding & Bell, 1997). With following exceptions, we used the default settings in Maxent (Phillips & Dudik, 2008). The convergence threshold was set to 10 −5 , the maximum number of iterations was 5,000, and 15 replicates with the replicated run type "subsample" were made. The selection of the relevant climate data was automated. As threshold rule, we chose maximum test sensitivity and specificity (MTSS) to optimize the correct discrimination of presences and pseudoabsences in the test data (Hernandez, Graham, Master, & Albert, 2006;Jimenez-Valverde & Lobo, 2007). The continuous logistic output of Maxent was transformed in a binary presenceabsence map. The threshold value for presence was based on MTSS values which were averaged over 15 runs.

| AFLP analysis
For the molecular analysis, plant material of H. comosa was sampled throughout the whole species range distribution on the European continent and embraced in total 588 individuals from 38 populations (Table 1). Each sample encompassed multiple fresh and healthy leaves which were dried in silica gel.
DNA extraction followed CTAP protocol from Rogers and Bendich (1994) adapted by Reisch (2007) using 15 mg of the dried leaf samples. DNA contents were photometrically determined and adjusted to 7.8 ng DNA per 1 μl H 2 O. We chose the dominant marker analysis of amplified fragment length polymorphisms (AFLP, Vos, Hogers, & Bleeker, 1995;Zabeau & Vos, 1993) to produce loci over the whole genome (standardized AFLP protocol from Beckmann Coulter (Brea, USA)). Molecular analyses were conducted as described previously (Bylebyl et al., 2008). For the selective DNA amplification,

| Statistical analysis of the AFLP data
Based on the allele frequencies from the 0/1 matrix Nei′s Gene Diversity (Nei, 1972), Shannon′s Information Index (Shannon 1948), the number and percentage of polymorphic loci were calculated for each population using POPGENE v. 1.31 (Yeh, Yang, & Boyle, 1999).
In order to make the results of Nei's gene diversity more comparable, an additional calculation was conducted at which the number of tested individuals was set to 12 for each population (lowest available amount). The 12 samples were chosen randomly with 50,000 iterations, and a mean value was calculated. The results were plotted onto the geographic coordinates of the sample locations. A base map provided by "Natural Earth" served as background for this and all following maps.
As an additional measure of divergence, the rarity of markers was calculated by frequency-down-weighted marker values (DW) (Schönswetter & Tribsch, 2005). The calculation of the DW values was performed via the r-script AFLPdat (Ehrich, 2006). To grant equal sample sizes, for each population, 12 individuals were randomly selected with 10 iterations and a mean value of DW was calculated. The results were plotted onto the geographic coordinates of the sample locations. The value of DW is expected to be high in long-term isolated populations where rare markers should accumulate due to mutations whereas newly established populations are expected to exhibit low values, thus helping in distinguishing old vicariance from recent dispersal (Schönswetter & Tribsch, 2005). We The resulting AMOVA-SS diversity values per sample location were also presented cartographically. Permutation tests (9,999 iterations) were conducted to show significance.
The genetic structure and group assignment of the populations was investigated with Bayesian clustering in STRUCTURE v 2.3 (Pritchard, Wen, & Falush, 2009;Pritchard, Stephens, & Donnelly, 2000). The program performs a Markov chain Monte Carlo (MCMC) algorithm to assign the tested individuals into k groups based only on its genetic data, and not on population affiliation. The program was run with following parameters: no admixture ancestry model, correlated allele frequency model, k from 2 to 40, a burn-in period of 10,000 followed by 10,000 iterations, 10 replicate runs. The most likely number of groups in the data set was determined via the calculation of Δk following the method of Evanno, Regnaut, and Goudet (2005). The results were plotted onto the geographic coordinates of the sample locations.
To identify spatial genetic patterns within the data set, a multivariate approach was conducted using spatial principal component analysis (sPCA). sPCA was carried out in R (Development Core Team, 2014) using package adegenet (Jombart, Devillard, Dufour, & Pontier, 2008).  ETRS 1989 LCC. To avoid same coordinates of individuals in the same population, the coordinates were shifted randomly by a factor of 0.5.
Unlike the analysis with STRUCTURE, the data for a sPCA do not have to meet Hardy-Weinberg expectations or linkage equilibrium. For the method, two matrices are necessary. The first one contains the relative allele frequencies of all individuals and the second embraces all spatial proximity information from the projected coordinates. The spatial proximity information matrix was gained from a connection network using Delaunay triangulation. Also, the second matrix was used to calculate a spatial autocorrelation using Moran's I (Moran, 1948). Moran's I ranges from +1 to −1, indicating a strong positive or negative spatial autocorrelation, respectively. In case of a positive spatial autocorrelation, a global structure in the data can be assumed. A Moran's I of zero indicates a totally random pattern. For a visual verification of the occurrence of spatial structures, a screeplot was drawn by plotting the variance of the sPCA against spatial autocorrelation (Moran's I).
Supplementary, to statistically strengthen the previous visual findings, two Monte Carlo tests with 9,999 permutations each were conducted in order to detect global or local structures in the data set. As display, the genetic differentiation of the principal components was plotted onto the geographic coordinates.

| Species distribution modeling
Species distribution modeling resulted in three maps, which show the

| AFLP analysis
In  Mantel test (10,000 iterations) for a correlation between geographic and genetic distances was highly significant (p < .0001).

| Eastern and western lineages and postglacial migration
AFLP analysis is a powerful tool in revealing glacial refugia and post-  (Hewitt, 1999(Hewitt, , 2004Taberlet et al., 1998) could be confirmed: Iberia as a southwestern refugia and Italy and the Balkans as southeastern refugia, enclosing the admixed populations in Spain, South Germany, and Hungary as contact zones. Contact zones have been reported to cluster in the Alps, Central Europe, northern Balkans, and the Pyrenees (Hewitt, 2000;Taberlet et al., 1998), resulting in an accumulation of genotypes from both groups. Therefore, we assume that H. comosa repopulated Central Europe from Iberia (Pyrenees) to France, Britain, Belgium, Switzerland, and Germany until its eastern border, where the populations admixed with populations that were migrating from Italy and the Balkans up north. It has been shown before that in contrast to Iberian lineages, Italian genomes rarely populated Northern Europe, as the ice-capped Alps prevented their northward expansion (Hewitt, 2000;Taberlet et al., 1998). This barrier is regarded as explanation of the relatively low species' and genetic diversity of northern populations compared to southern populations (Hewitt, 2000).

| Southern and cryptic northern refugia
The most important refugial areas are geographic regions where species persisted throughout several full glacial/interglacial cycles (each 100-120 kyr in duration). These so-called true refugia (Stewart & Dalen, 2008) are expected to possess higher genetic variability compared with surrounding recolonized regions (Comes & Kadereit, 1998;Taberlet et al., 1998;Tzedakis et al., 2013). In contrast, recent dispersal might lead to genetic depauperation due to founder effects.
Supporting this theory, the highest values for genetic variation of H. comosa were recorded almost entirely in southern populations, like the Iberian, the Italian, the Balkan Peninsula, and south of the Alps.
Other above-average genetically diverse populations were located in the Alps and in Germany. The former can be ascribed to the abovementioned hybridization of the western and the southeastern lineage (Petit, Aguinagalde, & de Beaulieu, 2003;Provan & Bennett, 2008;Tzedakis et al., 2013). We found different explanations for the high genetic variation of the latter. Firstly, it may be the result of recent genetic exchange due to grazing management, which is more prevalent in the area of the Jurassic mountains in Germany than in the northern German populations. Paun, Schonswetter, Winkler, Tribsch, Secondly, the high genetic variations may also have resulted from a phalanx way of recolonization from south to north or can be interpreted as a legacy from Younger Dryas cold reversal (Hewitt, 1999(Hewitt, , 2000(Hewitt, , 2004. A reduction of northern populations during this cold period could have led to high diversity populations due to the mixture with recolonizing linages during the Holocene (Tzedakis et al., 2013).
In order to circumvent these confusions, a second parameter, the rarity index (DW) was used, as it is a better indicator of historical processes (Paun et al., 2008). The value of DW is expected to be high in longterm isolated populations where rare fragments could accumulate due to mutations, whereas young populations are expected to show low values, thus helping in distinguishing old vicariance from recent dispersal. Refugial populations and recolonized regions would on the one hand contain identical fragments but due to drift rare fragments would accumulate and distinguish them from other refugial areas and surrounding younger populations that would be less divergent (Provan & Bennett, 2008;Schönswetter & Tribsch, 2005;Tzedakis et al., 2013).
According to this, we identified additional northern populations of H. comosa that are located far from the traditional southern refugia which possessed high DW values, supposing cryptic refugia in UK, the Alps, and Central Germany. Such cryptic refugia have previously also been identified for other grassland species like Bromus erectus (Sutkowska, Pasierbinski, Warzecha, Mandal, & Mitka, 2013) or grassland related pine forest species like Polygala chamaebuxus. Although it must not be ignored that the rarity index might be overestimated, as related southern populations with these alleles may not have been investigated or distinct gene patches might result by gene surfing on the leading edge (Tzedakis et al., 2013). Nevertheless, we found a significant correlation of the rarity index (DW) and Nei's gene diversity (He), which means that in our study, rare fragments accumulation and high genetic diversity came along with each other, pointing toward "true refugia". The German population with a very high rarity index and relatively low genetic variation may indicate an isolated refugium during the LGM, similar to alpine populations of Ranunculus glacialis (Paun et al., 2008). These results also coincide with Dengler, Janisova, Torok, and Wellstein (2014), who proposed a continuous existence of palearctic grassland at least since the Pleistocene (2,400 ka). During glaciations, grasslands covered most of the continent as steppetundra over permafrost and as xerothermic grassland further in the F I G U R E 4 Results from STRUCTURE analysis. The surveyed populations of Hippocrepis comosa were plotted onto geographic coordinates of. As STRUCTURE proposed a two-group solution, each population was assigned according its associated group south. During the interglacials (Lang, 1994;Pärtel, Bruun, & Sammul, 2005), grasslands were mostly replaced by forests, apart from smallscale areas on soils that impede tree growth (drought, shallow ground, instability (Ellenberg & Leuschner, 2010;Janišová, Bartha, Kiehl, & Dengler, 2011;) and reoccurring events like fire, wind throw (Hejcman, Hejcmanova, Pavlu, & Benes, 2013), grazing by wild herbivores (Vera, 2000), or human activity since the Mesolithic (Bush, 1988;Simmons & Innes, 1981). As H. comosa was described frost tolerant by Hennenberg and Bruelheide (2003) and it is growing up to 2000 m a.s.l. in the Alps, it seems reasonable to assume that the species might have occurred in Central Europe during the LGM on outcrops under favorable conditions.

| Latitudinal and longitudinal constraints of its distribution
On a global scale, it applies for past, present, and future that wild species ranges are primarily determined by climate (Normand, (Normand et al., 2011), which may have resulted in a current disequilibrium (postglacial colonization) of species ranges within the actual climate (migrational lag). Regarding the distribution of H. comosa in Europe, we assume that it has fully expanded to its potential range. The northern distribution limitation as it is today was investigated on a regional scale by Hennenberg and Bruelheide (2003), showing a reduced fitness (reduced seed setting), which correlated with effective air temperature measured at a height of 10 cm above ground. The area of climatically suitable habitats was also shown in the present time species distribution model for this species. In addition, soil composition impedes an expansion further north and east than its present status (Hartmann & Moosdorf 2012).
Therefore, we depicted that the current occurrence of H. comosa is mainly limited due to climate and soil factors and that the distribution pattern is not limited by reduced accessibility. The declining temperature at the beginning of the last ice age was with no doubt the main driving factor influencing migration of plants following a latitude gradient to their southern refugia (Taberlet et al., 1998 (Fischer, Poschlod, & Beinlich, 1996;Müller-Schneider, 1938;von Oheimb, Schmidt, Kriebitzsch, & Ellenberg, 2005) and epizoochory, implying a transportation with soil material in the hooves, which was reported for sheep and cattle (Fischer et al., 1996; and affects the genetic structure of plant populations (Willerding & Poschlod, 2002 (Bush & Flenley, 1987;Bush, 1988;Pokorný et al., 2015), also because of human practices (Poschlod, 2014). The La Hoguette culture (~7,500 y.a.) showed first steps toward nomadic goat and sheep breeding (Gronenborn, 2003) and therefore could have directly influenced the dispersal of H. comosa via epizoochory (Müller-Schneider, 1938) from Southern France to Germany. This assumption demonstrates that a more interdisciplinary approach to this subject is necessary in order to fully understand the recolonization of Central Europe by plants, animals, or humans. A strengthened cooperation of phylogenetics and archeology could deliver intriguing new insights in the genesis of our environment.

| CONCLUSIONS
Based on the present study on the previous and current distribution of H. comosa, we could demonstrate that a comprehensive climatic approach including a second driving factor can lead to a better understanding of historical and present developments.   FIGURE A4 Screeplot of the total data set. The eigenvalues of the sPCA are arranged by its variance (x-axis) and Moran's I (spatial autocorrelation, y-axis). As the y-axis represents Moran's I, positive scores indicate global structures. Moran's I shows a high-spatial autocorrelation (>0.8). Due to its largest positive eigenvalues λ1, the first global score can be clearly distinguished from all other values and therefore was chosen for further interpretation. The maximum attainable variance from an ordinary PCA is shown as the dashed vertical line on the right side of the graph. The range of variation of Moran's I is limited by the horizontal dashed lines