Potential distribution range of invasive plant species in Spain

Success of invasive species has been frequently estimated as the present distribution range size in the introduced region. However, the present distribution range is only a picture of the invasion for a given time step and do not inform on the potential distribution range of the species. Based on niche-based models we used climatic, geographic and landscape information on the present distribution range for 78 major plant invaders in Spain to estimate and map their potential distribution range. We found a positive relationship between present and potential distribution of species. Most of the species have not yet occupied half of their potential distribution range. Sorghum halepense and Amaranthus retroflexus have the widest potential distribution range. Sorghum halepense and Robinia pseudoacacia have the highest relative occupancy (i.e. proportion of potential distribution range currently occupied). Species with a larger minimum residence time have, on average, higher relative occupancy. Our study warns managers that it might be only a matter of time that currently localized invasive species reach their potential area of distribution.


Introduction
Invasive plant species are defined as alien species that sustain self-replacing populations without direct human intervention.They produce offspring, often in very large numbers, at considerable distances from the parent plants, and thus have the potential

ReSeAh ARtICle
A to spread over a large area (Pyšek et al. 2004).Yet, the spread rate of invasive species differs considerably.The distribution of invasive species is not static.There might be large differences between the present and potential distribution ranges of invasive species (Higgins et al. 1996, Sakai et al. 2001).From a management point of view, it is extremely important to identify areas not yet invaded but where early warning detection and control programs are critical to implement.
Up to now, most efforts to evaluate the success of invasive species at the regional scale have been traditionally measured as the present distribution range in the region of introduction (Mack et al. 2006).However, the present geographical range size shows only a picture of the degree of invasion for a given time step, but it does not inform about the dynamics of invasion and the potential invasion range in the near future.Recent studies have developed niche-based models to assess the suitability of a region for a given invasive species and its potential to spread throughout (Petterson 2003, Rouget et al. 2004, Guisan and Thuiller 2005, Thuiller et al. 2005).These models are mainly based on the climate matching approach (Curnutt 2000, Pauchard et al. 2004, Watt et al. 2010, Kriticos et al. 2011).However, even at the regional scale, other factors determine the distribution of species including biotic interactions, evolutionary change and dispersal ability (Pearson andDawson 2003, Ibáñez et al. 2006).For invasive species, direct and indirect human assisted dispersal is a primary determinant of species distribution.This is the reason why recent estimations of the distribution area of invasive species incorporate geographical and landscape variables related to human activities and disturbances (Pino et al. 2005, Thuiller et al. 2006, Chytrý et al. 2008, Gassó et al. 2009).
Moreover, historical factors determining differences in propagule pressure such as the minimum residence time (i.e.time since first record) also influence the range size of invaders (Hamilton et al. 2005, Gassó et al. 2009, Ahern et al. 2010).Due to lag times, the longer the species is present in the region, the more propagules are spread and the probability of founding new populations increases (Crooks 2005, Lockwood et al. 2005).Therefore, the relationship between range size and residence time should be considered.If there is a positive relationship between the proportion of the potential distribution range currently occupied and the minimum residence time, we can consider that it is only a matter of time for a localised invasive species to become widespread.
Here we calculated and mapped the potential distribution ranges of the main invasive plant species in Spain using climatic, geographic and land use variables.This research is planned to assist environmental managers to estimate the risk of present alien invasive species to expand into non-invaded areas.Mapping the potential distribution of species in areas were they are still not occurring is of high priority for the regional administrations to fulfil the Spanish List and Catalogue of Invasive Exotic Species Act 1628/2011 (http://www.boe.es/boe/dias/2011/12/12/pdfs/BOE-A-2011-19398.pdf).Our main questions are: (i) To what extent is the potential distribution range related to the present distribution range?(ii) What is the mean proportion of potential distribution range currently occupied (i.e.relative occupancy)?(iii) Does relative occupancy depend on the minimum residence time of the species?

Species distribution
Distribution data and minimum residence time (i.e.earliest date on which a given species was recorded in Spain) were compiled from the Atlas of Invasive Plant Species in Spain (Sanz-Elorza et al. 2004).This atlas contains spatially explicit presence records for over 100 invasive alien plant species at a resolution of 10×10 km UTM (Universal Transverse Mercator) grid.The atlas was generated using several information sources: herbarium records, publications and field surveys.From the initial database, we only calculated the potential distribution range for neophytes (i.e.established aliens introduced after 1500) recorded in more than 10 UTMs.We did not include archaeophytes because the minimum residence time is unknown.We also excluded UTM cells with a land proportion of less than 60% to avoid large differences of land proportion per UTM cell.Overall, our analysis is based on 2401 UTM cells and 78 invasive species (Appendix I).

Environmental data
Environmental data were obtained from different data sources that were originally at different resolutions, but we aggregated each one of them to a 10x10 km UTM grid cell scale by averaging.All the GIS procedures involving the set up of the environmental variables were performed using MiraMon (Pons 2000); mapping was performed with ArcView (ESRI 1992(ESRI -2006)).
The selection of environmental variables was based on preliminary results on variables strongly related to invasive plant species richness in Spain (Gassó et al. 2009).These included 3 climatic variables (minimum temperature in winter, annual temperature range, and summer rainfall), a reduction of 10 landscape variables to 5 using a principal component analysis (PCA), and keeping the first five orthogonal axes (cumulated explained variance = 80%) and one geographic variable (distance to the coastline) (Table 1).
In Spain, distance to coastline encompasses a complex gradient.Coastal areas concentrate the tourism, trading and transport centres, as most of the first records of alien species (Sanz-Elorza et al. 2004, Gassó et al. 2009).Moreover, due to the continental effect and the natural topography of Spain (i.e.high plateau in the centre), there is a climatic gradient from mild climatic, lowland conditions in the coast to contrasted, mountain climate inland.In consequence, distance to the coast is strongly and negatively correlated with annual temperature range (i.e.difference between maximum temperature in July and minimum temperature in January).In order to keep distance to coastline into the model despite its association to annual temperature range, we adjusted distance to coastline by fitting a univariate non-linear regression (generalised additive model with 4-degrees of freedom) with annual temperature range as the pre-dictor variable.We then used the residuals of the univariate regression as a predictor into the model.We followed the same strategy for summer rainfall which was correlated with minimum winter temperature and annual temperature range as predictor variables (for more details on the approach, see Thuiller et al. 2006).

Estimation of potential distribution ranges
Because a precise native distribution was not known for most of the species selected, we estimated the potential range of each species using climatic, geographic and landscape information from their present distribution in Spain (see Wilson et al. 2007 for more details on the approach).
Considering that our goal was to estimate and map the potential distribution of 78 invasive species, it was impossible to find good climatic data from the native range for table 1.Initial set of environmental predictors to estimate potential distribution ranges of 78 invasive plant species in Spain.Landscape variables were reduced from 10 to 5 using a principal component analysis (PCA) and keeping the first five orthogonal axes (cumulated explained variance = 80%).Distance to the coastline and 3 climatic variables were also selected.For distance to the coastline we used the residuals from the regression with annual temperature range as predictor variable.For summer rainfall, we used the residuals from multiple regressions with annual temperature range and minimum winter temperature as predictor variables (for more details see Thuiller et al. 2006).all species.However, notice that we did not solely base our analysis on climatic data but also on geographic and landscape data.These variables account for habitat invasibility and propagule pressure influencing on the degree of invasion.Therefore, even if possibly our models might be climatically conservative they included other relevant landscape variables known to influence the degree of invasion (Vilà and Ibáñez 2011).

Variables
Considering that the grain of the analysis are 10×10 km UTM grids, these maps can be used as tools for risk analysis for the different Spanish administrative regions (e.g.early warning maps for species that have still not invaded a particular administrative region).
The potential distribution range of each species was modelled as a function of the 9 selected environmental variables.All the modelling process was performed using the BIOMOD application implemented under R software.We calibrated 4 models usually described as the most powerful approaches available (Elith et al. 2006, Prasad et al. 2006): generalised linear models (GLM) using a stepwise regression with AIC criteria, generalised additive models (GAM) with four degrees of smoothing using a stepwise regression with AIC criteria, Random Forest (RF) with 2000 trees, and Generalised Boosting Models (GBM) with 3000 trees and an interaction depth of 2. Models were calibrated using 70% of the initial data sets and evaluated on the remaining 30% using the Relative Operating Characteristic (ROC) curve procedure.
To avoid the usual trouble of selecting a particular model, we performed a weighted averaging procedure across our four models as recommended by Marmion et al. (2009).For each species, the four models were ranked according to the area under the ROC curve values (AUC), and only the best three predictions (i.e. from the best three models) were conserved and were awarded 3, 2 or 1 point(s) respectively and then standardized to produce a vector of weights whose elements sum to unity.Final projections consisted in the weighted average of these three simulations.Then, for each species, we transformed the averaged predictions into presence-absence using a threshold maximizing the percentage of presence and absence correctly predicted (Pearce and Ferrier 2000).For these averaged predictions, the accuracy of the simulations was assessed using the area under the ROC curve (AUC).We used the following conservative rough guide for the AUC: AUC<0.8, bad model; 0.8<AUC<0.9,good model and AUC>0.9, very good model.

Statistical analyses
We analysed the relative occupancy as a function of minimum residence time.Relative occupancy was calculated as the proportion of potential distribution range currently occupied by each species.Relative occupancy was expressed as a binary variable with the first column containing the number of UTM cells currently occupied and the second column with the number of potentially suitable UTM cells not yet occupied.Minimum residence time was log transformed before analysis to meet the assumptions of parametric analysis.
Invasive species are a non-random subset of all species introduced (Blackburn and Duncan 2001).Furthermore, species are linked by phylogeny (Harvey and Pagel 1991).Therefore, using species as independent data points may inflate the degrees of freedom and increase the Type-I error.We used Generalized Linear Mixed Models (GLMM) to deal with the phylogenetic effects by incorporating taxonomic categories as nested random factors (Family/Genus).Several sophisticated procedures are available to implement phylogenetic structure in the model, but, in our case, there was not any robust phylogeny available covering all studied species.
Analyses were conducted in the open source R software version 2.5.1 (R Development Core Team 2005).We modelled relative occupancy with a binomial distribution of errors using the glmmPQL of the MASS library on the R statistical package (Venables and Ripley 2002, R Development Core Team 2006).

Results
We are confident that our models to estimate the potential distribution range were very good because for most species AUC>0.9 (Appendix I).As expected, there was a positive relationship between present distribution ranges (CDR) and potential distribution ranges (PDR).However, present distribution ranges only explained half of the variance of the potential distribution ranges (PDR = 309.3+ 0.89CDR, R 2 = 0.53, F (1, 76) = 85.44, p < 0.01).There was a set of species, especially those that currently occupy less than 200 UTM that, according to our models, would have the potential to spread through larger areas than that expected by the linear relationship (Fig. 1).
The species with the widest potential distribution ranges were Sorghum halepense, considered as one of the top weeds in the world (Holm et al. 1977) and Amaranthus retroflexus, also a worldwide invader, both of them invading many different habitattypes (Sanz-Elorza et al. 2004).These two species were introduced more than 100 years ago and exhibit wide present distribution ranges, being spread already in more than half of their potential distribution range.
The mean (±SE) relative occupancy of species was 0.28 ± 0.02; with values ranging from 0.05 to 0.73.Most of the species have not yet occupied half of their potential geographic ranges (Fig. 2).The two species with the lowest relative occupancy were the shrub Senecio inaequidens (0.28) and the herb Tradescantia flumminensis (0.23), and the two species with the highest relative occupancy were the deciduous tree Robinia pseudoacacia (0.73) and the grass Sorghum halepense (0.72) (Fig. 3).
Having accounted for the potential phylogenetic effects, the glmmPQL showed that minimum residence time explained a significant portion of variance in relative occupancy (t = 3.9, p<0.0001).Species introduced earlier had, on average, occupied a higher proportion of their potential distribution range (Fig. 4).However, it is interesting to note that the relationship is not linear, with some species introduced a long time ago (i.e.400-450 years) having still a very restricted distribution with respect to their modelled potential distribution range.

Discussion
We found large differences among species in their potential distribution range in Spain.However, in general, invasive species have not yet reached half of their potential distribution ranges.On average, it takes around 150 years for a neophyte to reach its maximum distribution range in an European country (Gassó et al. 2010) and many invasive species in Spain have been introduced less than a century ago (Sanz-Elorza et al. 2004).
Our calculations of potential distribution ranges for the 78 invasive species are based on climatic conditions in the introduced range in Spain and not in the native range.Theoretically, modelling the potential distribution range of an alien species should be based on climate matching envelopes build with information from the native range where the species is at equilibrium (Jiménez-Valverde et al. 2011).It is possible that recently arrived species in Spain have not yet expanded to available localities of suitable climates.However, climatic information from the native range was not available for most species.Moreover, models build with climatic conditions in the native range assume that the same interactions between biotic factors and climatic factors that limit the range size in the native range operate in the introduced range (Pearson and Dawson 2003).This assumption might not be correct as plant fitness, population performance and distribution range of invasive species are usually improved in the introduced than in the native range (Hierro et al. 2005).It is clear that the interactions between biotic factors and abiotic factors in the native range are different than in the introduced range (Ibáñez et al. 2006, Wilson et al. 2007).Even if our estimations of the potential distributional ranges are probably conservative, we are confident that by including geographic and landscape variables in our models, the predictions are more accurate than by only including climatic variables (Ibáñez et al. 2009).
Results confirmed that relative occupancy was dependent on minimum residence time (Hamilton et al. 2005, Williamson et al. 2009, Ahern et al. 2010).In general, the longer an invasive species in a region, the more it extends into its potential range because it has had more opportunities to be introduced several times at various locations and more time to disperse naturally.In a previous work with a larger subset of species from the same data set, a relationship between present distribution range and minimum residence time was also found (Gassó et al. 2009).However, this association was not significant for species introduced during the last 100 years.Thuiller et al. (2006) also found that minimum residence time did not explain the distribution patterns of invaders in South Africa less than a century of residence.As previously mentioned, species distribution models assume that organisms are at equilibrium with their environment.Nevertheless, this might not be the case for recently introduced species (Václavík and Meentemeyer 2012).These species are listed as invasive in Spain due to their fast population growth at the local scale even if their regional spread might still be limited.The accuracy of potential distribution models for species at early invasion stages and with short minimum residence times is usually lower than for species from late invasion stages (Václavík and Meentemeyer 2012).
Similarly, the association between relative occupancy and minimum residence time is weak for species introduced many centuries ago (Williamson et al. 2009) because there is high uncertainly with old first records.Indeed, in Spain, some species introduced a long time ago have not yet occupied their entire potential suitable habitat.For example, Sophora japonica was introduced 304 years ago, but it currently occupies only 11.1% of its potential distribution range, demonstrating a very low spread rate.The low spread of this species could be related to the history of its use.It was first introduced in the country in the 18 th century but it was not used commonly as an ornamental species until the 20 th century (Sanz-Elorza et al. 2004).Besides minimum residence time, other historical factors such as the intensity and frequency of introduction determine propagule pressure and hence invasion success (Lockwood et al. 2005).For example, market availability (i.e.sold in many nurseries) and frequency (i.e.sold very often) are significant determinants of invasion by traded ornamental plant species (Dehnen-Schmutz et al. 2007).We can therefore suspect that in some species introduced several centuries ago there might be a substantial time lag between the date of first record and spread due to differences in historical propagule pressure (Crooks et al. 2005).
Besides propagule pressure, differences in the potential distribution range might be also explained by differences in their niche breadth (Thuiller et al. 2005) and the availability of suitable habitats for establishment.The potential distribution ranges calculated here are mainly based on abiotic factors defining a fundamental niche sensu Hutchinson and Deevey (1949), while species traits (e.g.reproduction or dispersal), local biotic interactions (e.g.competition, natural enemies, and mutualistic relationships), and geographical barriers for dispersal also influence alien species establishment and spread (Rejmánek et al. 2005, Ibáñez et al. 2006, Pyšek and Richardson 2006).
Our study has been possible because there was reliable spatially explicit data on the present distribution of invasive species (Sanz-Elorza et al. 2004).This empirical information, which has been a compilation of the effort by many naturalists and botanists, has been complemented by modelling approaches to estimate the potential distribution area of the species.More than half of the main invasive species in Spain have not reached their potential area of distribution.However, many species would be able to reach this area in the near future because many species are ornamental and dispersal is favoured by humans (Aikio et al. 2010).We believe that the potential distribution maps of these species are a crucial early warning tool to guide control and eradication plans even if the potential distribution for recent introduced invasive species is possibly underestimated.

Figure 3 .
Figure 3. Maps of present distribution range (black) over potential distribution range (grey) of four invasive plant species in Spain.The two species on top are the ones with the highest relative occupancy (i.e.proportion of the potential distribution range currently occupied) and the ones at the bottom are those with the lowest.Maps for the remaining analysed 74 invasive species are available in Appendix II.

Figure 4 .
Figure 4. Average relative occupancy (i.e.proportion of the potential distribution range currently occupied) for minimum residence time classes of invader plant species in Spain (N = 78).