Predicting the expansion of invasive species: how much data do we need?

Ecological niche models (ENMs) are a powerful tool to predict the spread of invasive alien species (IAS) and support the implementation of actions aiming to reduce the impact of biological invasions. While calibrating ENMs with distribution data from species’ native ranges can underestimate the invasion potential due to possible niche shifts, using distribution data combining species’ native and invasive ranges may overestimate the invasion potential due to a reduced fitness and environmental tolerance of species in invaded ranges. An alternative may be using the increasingly available distribution data of IAS as they spread their invaded ranges, to iteratively forecast invasions as they unfold. However, while this approach accounts for possible niche shifts, it may also underestimate the species’ potential range, particularly at the early stages of the invasion when the most suitable conditions may not yet be represented in the distribution range data set. Here, we evaluate the capacity of ENMs to forecast the distribution of IAS based on distribution data on invaded ranges as these data become available. We further use dispersion models to assess the expansion process using the predicted potential distributions. Specifically, we used the common waxbill ( Estrilda astrild ) in the Iberian Peninsula as a model system. We built ENMs with 10×10 km grid cells distribution records cumulatively for each decade from 1960 to 2019, and yearly bioclimatic variables, to forecast the species potential range in the coming decades. Then, we assessed the performance of the models for each decade in forecasting the species’ observed range expansion in the following decades and evaluated how the number of distribution records determined the quality of the forecasts. Finally, we performed dispersal estimates (based on species traits, topography, climate and land cover) to analyse the prediction capacity of models as their uncertainty may be reduced when projecting them to the next decades. Our results show that invasion-only ENMs successfully forecasted the species’ range expansion over three decades after invasion, while dispersion models were not important in forecasting common waxbill expansion. Our study highlights the importance of constantly monitoring alien species, suggesting that iterative updating of ENMs with observed distribution data may accurately forecast the range expansion of alien species.


Introduction
Biological invasions are among the most worrisome environmental problems in modern times (Díaz et al. 2019).The spread of invasive alien species (IAS) worldwide has been responsible for population declines of native species, changes in community composition (Bellard et al. 2016(Bellard et al. , 2021)), alterations of ecosystem processes and functioning (Ehrenfeld 2010), disruptions of socio-economic activities (Diagne et al. 2021) and public health issues (e.g.Naeem et al. 2009;Fournier et al. 2019;Ogden et al. 2019).In a globalised world, the number of IAS is expected to increase (Seebens et al. 2021), as well as their potential impacts (Fournier et al. 2019;Essl et al. 2020), promoted by the increasing international wildlife trade and global changes (Scheffers et al. 2019;Naimi et al. 2022).As a response to this urgency, several international regulations and mechanisms have been implemented in recent decades to prevent the introduction and spread of IAS.This includes the establishment of a legal framework with specific legislation, such as the EU Regulation 1143/2014 on IAS (Regulation EU 2014).However, the successful implementation of these mechanisms requires the anticipation of new invasion areas which have been hampered by the lack of monitoring data on species distributions at adequate spatial and temporal resolutions (Santana et al. 2023).There is thus a need for a continuing effort to develop approaches, which may include ecological modelling tools, to accurately predict IAS expansion, in order to reduce both the ecological and socio-economic impacts of IAS.
Modelling and projecting the realised niche of IAS in the geographical space allows the identification of the areas at risk of invasion (Jiménez-Valverde et al. 2011;Guisan et al. 2014).The realised niche is part of the fundamental niche, i.e. the abiotic environmental space where a species can maintain a viable population and persist over time without immigration, which is then further limited by biotic interactions, dispersal capacity, or historical aspects (Soberón and Peterson 2005).This assessment is often done through correlative ecological niche modelling (ENM) (Peterson and Vieglais 2001;Jeschke and Strayer 2008;Jiménez-Valverde et al. 2008;Capinha and Anastácio 2011;Venette 2015;Sillero et al. 2021), which quantify species-environment relationships based on observed patterns of species distributions and environmental predictors (Franklin 2010;Peterson et al. 2011;Guisan et al. 2019;Sillero et al. 2021).A procedure of key practical importance concerns the geographical areas used to calibrate the ENMs.For IAS, these models can be calibrated using distribution data from the species' native range (Peterson et al. 2003), thus assuming that the species native distribution represents the entire suite of suitable environments (i.e., distributional equilibrium; Guisan and Zimmermann 2000;Araújo and Pearson 2005;Araújo et al. 2005), or at least, all suitable habitats where the species can disperse (i.e.pseudo-equilibrium; Anderson and Raza 2010; Sillero et al. 2021).However, species' realised niches may shift in new areas or periods (i.e.niche shift sensu Guisan et al. 2014), which implies that IAS will not be necessarily circumscribed to areas that are environmentally analogous to their native ranges (Peterson 2003;Jeschke and Strayer 2008;Elith and Leathwick 2009).This is because, when the environmental conditions change, or the species arrives in a new area, the drivers limiting the species' realised niche can change (e.g. the new area lacks a competing species or the species can now disperse to new habitats), enabling the exploration of new areas inside its fundamental niche (Sillero et al. 2022).Some IAS have shown marked climatic niche shifts during invasion (i.e., a divergence between climatic conditions in native and alien ranges; sensu Broennimann et al. 2009), likely driven by adaptive changes enabling species to endure conditions that were previously unsuitable (Blossey and Notzold et al. 1995), i.e., a shift in its realised climatic niche (Sillero et al. 2022).
Considering the potential for realised niche shifts, previous studies have recommended calibrating ENMs using distribution data of IAS in both native and invasive ranges (Fitzpatrick et al. 2006;Broennimann 2007;Broennimann and Guisan 2007;Urban et al. 2007;Beaumont et al. 2009;Pili et al. 2020).While this approach potentially captures niche shifts as they emerge in invaded areas, the combination of native and invasive distribution data raises relevant practical and conceptual issues.The existence of higher-quality distribution data for the species in one range versus the other (Vanette et al. 2010), may require a reduction in spatial resolution, leading to information loss when merging both data frames (Jarnevich et al. 2022).Although spatial downscaling can be employed to enable modelling at a coarse resolution and projection at a higher resolution, this approach introduces uncertainty due to assumptions regarding the consistent relationships between coarse and fine-resolution data within the area, on the employed methods (Keil et al. 2013).On the other hand, and perhaps more importantly, the use of native distribution data may overestimate invasion ranges.This is because there are circumstances where invasive species may not be able to colonise similar environmental conditions to their native areas.Factors such as novel negative interspecific interactions (e.g., predators, parasites, competitors) (Sih et al. 2010;Dostal et al. 2013;Carthley and Banks 2018), genetic bottlenecks and founder effects, can drive a reduction in the species' environmental tolerances, and species dispersal capacity.These factors cannot be included directly in the native-based model, which will provide the maximum extent of the species distribution in the invasive range if the environmental conditions are the same (Jarnevich et al. 2022).
Invasion monitoring efforts are producing high-quality spatiotemporal data of spread for a large number of IAS in invaded ranges (e.g.Groom et al. 2019;Howard et al. 2022).Hence, given the challenge of reconstructing the invasive process over time for most species, an alternative is to use spatiotemporal invasion data to iteratively forecast invasions as they unfold.The issues raised by using native distribution data are overcome by restricting the calibration of ENMs to the region being invaded.However, any approach relying only on invasive distribution data for calibrating ENMs must acknowledge the likely underestimation of species' potential ranges, particularly at early stages of invasion, when most suitable conditions may not yet be represented in the distribution range data set.In this context, it is pivotal to clarify the data requirements to ensure accurate ENM for IAS, and particularly, the extent to which invasion-only distribution data can be used to accurately predict the expansion of IAS.Modelling the invasion over time will provide information about the routes used by the species during the expansion process.
Here, we evaluate the capacity of iterative calibration of ENM models based on invasion-only distribution data to predict the invasion potential and analyse the expansion process of IAS.We specifically explore the relationship between the number of species distribution records since establishment and the capacity of models to inform about the species' invasion potential.We also assessed the importance of accounting for the dispersal capacity of species to predict their expansion.To achieve these aims, we considered one of the most studied and successful avian invasive alien species, established in different environments and biogeographic regions worldwide: the afro-tropical common waxbill (Estrilda astrild).This species was first introduced to Portugal in 1964, and has spread across much of the country and into part of Spain fed by further introductions (Reino and Silva 1998;Silva et al. 2002).Common waxbill is a small-sized (<12 cm) granivorous finch that forages in low grass vegetation, typically found in open agricultural fields near water bodies (Payne et al. 2020;Ribeiro et al. 2020).The invasive success of this gregarious, non-territorial species may be attributed to its breeding biology and interspecific relations (Ribeiro et al. 2020), including: i) a variable breeding season (Sanz-Aguilar et al. 2015;Payne 2020); ii) the ability to produce several broods a year (Burton and Burton 2002); iii) vagrant movements in search of suitable habitat (flight range<37 km) (Payne 2020); and iv) the lower prevalence of parasites in non-native ranges compared to native regions (Lopes et al. 2018).
Using a unique, high-quality, database on spatial dispersion of the common waxbill through the Iberian Peninsula over six decades (Reino and Silva 1998;Silva et al. 2002;Reino 2005;Sullivan et al. 2012), we applied a backcasting approach, fitting ENMs using distribution data available until the end of each decade and using the resulting model to project the distribution for the next decade.Then, we analysed how the number of observation records used in each ENM was related to the performance of the forecasts of species dispersal over time.Finally, because ENMs do not account for species' dispersal per se (Sillero et al. 2021;Sillero et al. 2022), we also implemented a species dispersal model over time considering a comprehensive set of species traits and climatic and landscape variables (Engler et al. 2012).We discuss these results in light of the amount of distribution data (i.e.length of the time series since establishment) needed for invasion-only ENMs and dispersal analyses to predict the invasion potential of IAS and iteratively forecast future invasions.

Study area
The Iberian Peninsula (southwestern Europe), covers an area of 582,860 km 2 and mainly includes the continental territories of Spain and Portugal (Fig. 1).It is bordered to the southeast and east by the Mediterranean Sea and to the south, north and west by the Atlantic Ocean, and is separated from the rest of Europe by the Pyrenees in the northeast.The Peninsula has a high diversity of climatic conditions, influenced by both the Atlantic Ocean and the Mediterranean Sea, with a longitudinal gradient of precipitation and a latitudinal gradient of precipitation and temperature (Capel 1981).While the Mediterranean climate dominates most of the area, the Oceanic climate occurs primarily in major mountain ranges and isolated southern mountains (Sillero et al. 2009).

Common waxbill distribution data
We gathered historical data on the common waxbill expansion in the Iberian Peninsula since its first introduction in the 1960s.For this, we obtained presence data of the species in the continental territories from Sullivan et al. (2012), including the national and regional breeding bird atlases from Portugal and Spain, updated with all-year-round information from the eBird database (eBird 2019) and authors unpublished data (L.Reino).Presumed new human-mediated introductions were considered whenever an isolated presence of the species was identified or when introduction events were documented through publications or personal communications.Following data compilation, we mapped all records onto a 10×10 UTM km grid of Portugal and Spain (Fig. 1).We then aggregated the data by decade, from 1960 to 2019.Therefore, we represented the species expansion over six decades: 1 st -1960-1969 (9 UTM grid cells), 2 nd -1970-1979 (57 cells),  3 rd -1980-1989 (163  cells), 4 th -1990-1999 (340 cells), 5 th -2000-2009 (760  cells) and 6 th -2010-2019 (1128 cells) (Fig. 1, Suppl. material 1).

Environmental data
We obtained yearly climate data for the temporal period covered by the distribution data from the EuMedClim Database (http://gentree.data.inra.fr/climate/;Fréjaville and Garzón 2018), which provides yearly climate data between 1901-2014 at 1 km resolution for Europe and the Mediterranean Basin.We considered the seven bioclimatic variables available from this source: bio1 -annual mean temperature; bio2 -mean diurnal temperature range; bio5 -Maximal temperature of the warmest month; bio6 -minimal temperature of the coldest month; bio12 -annual precipitation; bio13 -precipitation of the wettest month; bio14 -precipitation of the driest month).From these variables, to minimise cross-correlation between variables, we kept four variables that had an absolute value of Pearson correlation coefficient below 0.7 (Suppl.material 2: table S1, fig.S1): mean diurnal temperature range (bio2), minimal temperature of the coldest month (bio6), precipitation of the wettest month (bio13) and precipitation of the driest month (bio14).As our species occurrences were available at 10×10 UTM km grid squares, we aggregated the values of these variables to match this resolution.

Ecological niche models
We estimated the realised niche of the species (sensu Sillero 2011) every decade, considering, for each model, the species distribution data cumulatively since the first introduction in 1960 (Fig. 1, Suppl.material 2: fig.S2).For this, we assumed that there were no extinctions, which is a reasonable expectation given the 10×10 km resolution, allowing us to use the climate data from the corresponding decade.Then, we projected each model to the following decades (Suppl.material 2: fig.S2).For example, we used the model calibrated with data from the first decade (1960)(1961)(1962)(1963)(1964)(1965)(1966)(1967)(1968)(1969) to project the species' potential distribution in each of the following decades, i.e., 1970-1979, 1980-1989, 1990-1999, 2000-2009 and 2010-2019.We calculated realised niche models using Maxent v.3.4.4 (Phillips et al. 2006(Phillips et al. , 2017) ) following standard procedures (Sillero and Barbosa 2021;Sillero et al. 2021).All Maxent models used the same parameterization.Specifically, we used presence/ background data representing the spectrum of environmental conditions available to the species as dependent variables (Phillips et al. 2009;Guillera-Arroita et al. 2014).Background data consisted of 9196 randomly generated points distributed throughout the entire study area, including pixels where the species occurred.Thus, background points are not equivalent to pseudo-absences (Phillips et al. 2009;Guillera-Arroita et al. 2014).Maxent output represents habitat suitability, ranging from 0.0 (not suitable) to 1.0 (suitable), in Cloglog format (Phillips, et al. 2017).Because Maxent includes stochasticity in the training data random selection, different model runs can lead to slightly different outcomes (Phillips et al. 2006(Phillips et al. , 2017)).For this reason, we used the average of 10 distinct modelling events to obtain the final suitability values for each decade, randomly selecting 70% of the occurrence records as training data and 30% as test data.The models used auto features, where different distribution functions are used depending on the sample size (Phillips et al. 2006(Phillips et al. , 2017)).
We measured model discrimination performance using the area under the curve (AUC) of the receiver operating characteristics (ROC) plots (Liu et al. 2005) and True Skill Statistics (TSS; Allouche et al. 2006).The ROC plot was calculated by representing the sensitivity against 1-specificity for all possible thresholds.It measures the proportion of true positives against the proportion of false positives, i.e., the likelihood that the model will rank a randomly chosen presence higher than a randomly chosen absence.The AUC is an integral of the ROC curve.The AUC discriminates a species' model from a random model, with a value equal to or close to 0.5 corresponding to an accuracy similar to that of a random model and a value of 1 corresponding to a perfect discrimination accuracy.TSS is equal to sensitivity + specificity -1 (Allouche et al. 2006).TSS ranges from − 1 to 1, with 0 corresponding to random classification.AUC and TSS increase with the extent of the study area and are correlated (Leroy et al. 2018).Additionally, we calculated a set of null models following the methodology by Raes and ter Steege (2007).For this, we generated 100 different datasets with the same number of random points as each dataset following a Poisson distribution.We calculated a Maxent model for each of these random datasets and obtained the AUC values of the ROC plots.Then, we compared the training AUC values of the species models with the ones calculated for the null models using the non-parametric Wilcoxon test.We calculated the null models in R 3.4.4(R Core Team 2020) with the 'dismo' package (Hijmans et al. 2017).As AUC and TSS are correlated (Leroy et al. 2018), we calculated the null models using only the AUC.
We used the Boyce Index (Boyce et al. 2002;Hirzel et al. 2006) to measure the level of agreement between predictions generated by the model and actual observations (proportion of presences).The Boyce Index is specifically designed for presence-background algorithms and measures the extent to which model predictions differ from a random distribution of the observed presences across the prediction gradients (Hirzel et al. 2006).The index calculates the proportion between the number of presences predicted by the model that fall into a particular class and the number of presences expected under a random classifier, plotted against the corresponding suitability category.The habitat suitability range is divided into b classes (or bins) to calculate a predicted-to-expected (P/E) ratio (Fi), where P is the predicted frequency of evaluation points and E is the expected frequency of evaluation points.If the model is well calibrated, low suitability classes should contain fewer evaluation presences than expected by chance (Fi < 1) and high suitability classes should contain more evaluation presences than expected by chance (Fi > 1).A good model should have a monotonically increasing curve when P/E is plotted against the mean habitat suitability of each class.The index ranges between -1 and +1: positive values indicate a model where predicted presences are consistent with the distribution of presences in the evaluation dataset; values close to zero mean that the model is not statistically different from a random model; negative values indicate counter predictions, i.e., predicting lower suitability in areas where presences are more frequent.We calculated the Boyce index with ModEvA 3.10 R package (Barbosa et al. 2013).
We determined the contribution of each climatic variable in explaining the species' distribution using a jackknife resampling based on: (1) values of the training and test gain; and (2) of AUC values.The jackknife resampling comprises two steps: (1) the generation of a model with all climatic variables except one; and (2) the generation of univariate models, each using only one climatic variable.In each step, the jackknife analysis measures the change in training and test gain, and the AUC determines the importance of each variable.Using the results from each of these procedures, Maxent calculated an average percentage contribution of each climatic variable.We also calculated the permutation importance: for each environmental variable in turn, the values of that variable on training presence and background data are randomly permuted.The model is re-evaluated on the permuted data, and the resulting drop in training AUC is calculated, and normalised to percentages (Phillips et al. 2006).When variables interact, the variable contributions and the permutation importances are not equally ordered, preventing individual responses for each variable.

Ecological niche models validation over time
We further validated the ENMs for each decade and their respective projections by counting the number of presences classified as presences or as absences.For this, we categorised the continuous models into two categories by applying the threshold 'Maximum training sensitivity plus specificity' for the Cloglog output.We used the presence records of each decade and previous decades, i.e., cumulatively.The total number of presences used to validate the projections of each model was the same (9,114,160,334,752,1120, see Suppl.material 2: table S4).We also validated the projections of all the models using the presences from the last decade (i.e.2010's; 1120 presences, see in Suppl.material 2: table S5).

Dispersal analyses
Accounting for dispersal barriers/capacity has been pointed out as important to reduce uncertainty in future projections of species distribution (Engler et al. 2012).We estimated dispersal movements over time with the R package 'MigClim' (Engler et al. 2012), a cellular automaton model that simulates the dispersal of species in the landscape.MigClim uses ENMs as indicators of landscape permeability: the higher the habitat suitability index, the higher the permeability.We applied MigClim to each decadal Maxent model and respective projections (Suppl.material 2: fig.S2).Therefore, Migclim modelled how the species dispersed between Maxent models, i.e. from the first model to the next projections.The MigClim model considers short-distance and long-distance dispersal events, the type of dispersion through the landscape (using a continuous or a categorical Maxent model), propagule production probability, initial maturity age, and the presence of barriers.We considered three possible scenarios: i) no dispersal barriers; ii) weak barriers (i.e.barriers that can be transposed), and iii) strong barriers (i.e.barriers that cannot be transposed) (Table 1).We assessed the significance of dispersion models in forecasting the expansion of the common waxbill by visually inspecting the maps produced for each scenario.
Dispersal barriers were represented by elevation, land cover, and hydrological factors (Fig. 2).We obtained elevation data from the Shuttle Radar Topography Mission (https://www.earthdata.nasa.gov/sensors/srtm;Farr et al. 2007) at 90 m and aggregated it to 10×10 km cell resolution.Following the results obtained in previous studies in Iberia (e.g.Silva et al. 2002), we considered elevations higher than 800 m as barriers to dispersion.We obtained land cover from the Global Land Cover 2000 dataset with 250 m of spatial resolution from the European Environmental Agency (https://www.eea.europa.eu/data-and-maps/figures/globalland-cover-2000-250m).We considered the land cover classes including tree cover as barriers to the dispersal, as E. astrild is mostly associated with open habitats (Payne et al. 2020;Ribeiro et al. 2020) (Suppl.material 2: table S2).To represent hydrological configuration, we considered the vectorial dataset of free-flowing rivers in Europe from the European Environmental Agency (https://www.eea.europa.eu/data-and-maps/figures/free-flowing-rivers-in-europe).River hierarchy ranged from 1 to 8, with a value of 8 corresponding to large watercourses such as main rivers and 1 to small, often intermittent streams.We calculated the average hierarchy of watercourses in each grid cell of 10×10 km.For this variable, we considered as barriers those grid cells with a hierarchy of 1, reflecting the species' association with permanent water courses (Ribeiro et al. 2020).The final layer of barriers corresponded to the combination (multiplication) of all barrier layers, resulting in a layer classified as 0 (without dispersal barriers) or 1 (with barriers) (Fig. 2).The parameters used in MigClim are shown in Table 1.

Common waxbill expansion patterns
The current geographic distribution of the common waxbill in the Iberian Peninsula spans most of the Iberian Atlantic coast, but also through large areas in Southern and Eastern Iberia extending to the Mediterranean coast, as far as Catalonia (Fig. 1).Although this species was initially introduced around the Lisbon region (Central Portugal) during the 60's, it rapidly spread in all directions during the following decade.Its geographic expansion was also enhanced by more recent and independent introductions, as in the Algarve (southern Portugal), during the 70's, and Andalucia (southern Spain) and later in other regions in eastern Spain, during the 80's (Fig. 1).
The expansion process was faster in the Central and Northern regions of Portugal (70's), whereas the spread in the south seemed to have been boosted by an additional introduction event in the Algarve in the same decade that enabled the colonisation of western Andalucia during the following decade (Fig. 1).This event appears to underlie the colonisation of almost all Iberian southwestern coasts, unifying the northern population -which had also started to colonise Alentejo (Portugal) from the Sado River valley in the 80's -with the southern population (Fig. 1).Spread eastwards was slower, and several areas remain uncolonized to the present, namely the mountainous regions of Northern and Central Spain (Fig. 1).Spanish populations have arisen from independent introductions and the expansion of Portuguese populations directly from the extensive area bordered by the Minho River in Galicia in the 80's, and through the Tagus and Guadiana River valleys in the Southern and Central regions by the 90's (Fig. 1).

Temporal changes in ecological niche projections
ENMs had test AUC values higher than 0.8 and significantly differed from random (Table 2, Suppl.material 2: table S3).All test TSS values were higher than 0.6 (Table 2).The Boyce index exceeded 0.85 for all decades, except the first (0.02).This indicates a consistent prediction of presences by the models, aligning closely with the distribution of presences in the evaluation dataset for all but the first decade, to which the model is not statistically different from random.

Ecological Niche Models validation
Validation of the ENMs of each decade projected to the remaining periods (Fig. 4, Suppl.material 2: table S4) indicated the proportion of the presences correctly classified is >80% for the projections of all models, except for the projections for the last decade based on the models using distribution data of the first and second decades (44% and 50%, respectively).The same pattern was observed in the models using data from the past decade, so the results are presented only in the Suppl.material 2: fig.S3, table S5.
Table 3. Contributions and permutation importance of the bioclimatic variables (mean diurnal temperature range (bio02), minimal temperature of the coldest month (bio06), precipitation of the wettest month (bio13) and precipitation of the driest month (bio14) of the Maxent models.Highest values of variable contribution and permutation importance for each model are highlighted in bold.Temperature, and particularly the minimal temperature, was the most important variable affecting the distribution of common waxbill for all models.

Dispersal analyses
The species' potential range accounting for dispersal capacity increased over time, driven by the results of ENM projections.In the first decade, the range deemed susceptible to colonisation was narrow, and almost all of the Iberian Peninsula was beyond reach.On the other hand, for the last decade, these areas were much wider (Fig. 5).Similarly to what was verified for the ENM, the spatial patterns of the dispersal models have remained quite stable since the fourth decade (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999).Visual inspection of the maps reveals that there were some differences when barriers were introduced in the dispersal models compared to when they were not (Fig. 5), but the results with weak or strong barriers are very similar.Dispersal models thus confirmed that the species was able to disperse over time following suitable areas identified by ENM.
Our results are in line with previous studies arguing that ENMs may underestimate the species' potential ranges (Liu et al. 2020), particularly at the early stages of the invasion when the most suitable conditions may not yet be represented in the distribution range dataset.The common waxbill is a tropical species sensitive to low temperatures (<15 °C) and wet weather conditions and inhabits open fields with tall grasses, often near water (Ribeiro et al. 2020).While detailed information on the distribution data in the native area is not available for analysis, existing data suggest that the distribution range of common waxbill in the invaded area generally falls within the climatic variation of its native African range, with  S4).Our backcasting approach showed a high forecast capacity of EMNs after the 3 rd decade following the common waxbill establishment (high % of correctly classified presences).1960-1979 1960-1989 1960-1999 1960-2009 1960 -1969 1960 -1979 1960 -1989 1960 -1999 1960 -2009 1960 -2019 expansions observed into colder and rainier areas in Northern Iberia in recent decades (Ribeiro et al. 2020).Our projections based on invasion-only data failed to forecast the current species distribution using data from the first two decades after species introduction, likely because the species range was still not representative of the species' suitable environmental conditions (Araújo and Pearson 2005).The actual niche overlap of the species is probably contributing to the model accuracy of the last decades.However, the AUC has slightly decreased in the models for the last three decades probably due to the increment in the presences: as the species distribution range increases, the species has a more generalist character, making it harder to predict (Guisan and Thuiller 2005;Sillero et al. 2021).Models disregarding the species' global distribution provide worse results than full distribution models (Barbet-Massin et al. 2010;Capinha et al. 2011;Jarnevich et al. 2022).This is because ENM algorithms assume that the species distribution data used is a good representation of the species' environmental requirements (Sillero et al. 2021).In other words, the algorithm assumes that the data used represents a species in equilibrium with the environment, i.e. the species occupies all available suitable habitats where it can disperse (Guisan and Thuiller 2005;Anderson and Raza 2010;Sillero et al. 2021).Distributional data representing only a portion of a species' global range may fail to capture all the suitable conditions where the species can thrive, potentially leading to an underestimation of its potential range.Therefore, modelling the realized niche of an expanding alien species presents significant challenges (Ficetola et al. 2005).It is expected that the ENM for a particular period will fail to forecast imminent range expansion stages, although this does not mean that the ENM is inaccurate (Barbet-Massin et al. 2018).The ENM for that particular period can be accurate, but the ENM does not have enough information to predict the upcoming dispersion process.This was our case: the increment of new suitable areas in ENM projections stabilised from the third decade after introduction (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999), i.e., only after three decades of dispersion, the species' occurrence data were representative of the species' environmental requirements.The abrupt change between the third and fourth decades might be strongly influenced by new introduction events in the third decade that occurred in regions environmentally different from previous ones, particularly in the East of the Iberian Peninsula.These results thus suggest that modelling expansion based on the early stages of introduction may provide limited results, demanding the interactive recalibration of models as new distribution data becomes available.
Contrary to expectations, our results suggest that barriers to dispersion were not insuperable by the common waxbill, although they might be important for other species with lower dispersal capacity.MigClim considers long dispersal events where the species reaches new locations without human intervention.In that case, the species only needs to arrive at a pixel with enough suitability.The few differences found in the projections using the ENM-only and dispersion models (with strong and weak barriers) indicate that the species was able to disperse over time following the suitable areas predicted by ENMs.In line with the commonly observed lag period during the invasion process (generally attributed to the exponential growth process, stochastic extinction of propagules, or an evolutionary modification of species following establishment, Sakai et al. 2001), previous studies have suggested that the dispersion capacity of the common waxbill across Iberia was very low in the first decades after the first introduction (e.g., Silva et al. 2002;Reino 2005).Justifications for this lag period for the common waxbill relied on the stochastic extinction of propagules due to the absence of favourable habitat conditions (i.e.agricultural fields near water bodies, Ribeiro et al. 2020) out of the areas where the species was first introduced.This was based on the slower colonisation process in the southern regions of Portugal, where initial populations were very small and limited to the Tagus valley around Lisbon and the westernmost region of Iberia (Portugal) and an acceleration after the 80s.According to these studies, the additional introductions across Iberia might have jointly fostered higher dispersion rates to new areas, suggesting that the dispersal capacity of the common waxbill in Iberia is a combination of both habitat suitability and propagule pressure.However, our results suggest that this is likely to be a consequence of insufficient data to capture the species' environmental requirements during the first decades, as they projected a potential for expansion lower than the real one.
While the common waxbill has been a highly successful avian invader across various continents and islands, its spatial spread appears more limited in comparison to other, predominantly older, Palearctic invaders worldwide, such as the European starling (Sturnus vulgaris) or the house sparrow (Passer domesticus).However, it seems to show a rather eclectic adaptation as these last two species colonise a great diversity of open and semi-open habitats, but not limited to human-made habitats (e.g., agricultural habitats, gardens), but also to wetlands.Probably, its expansion is more comparable with a more recent invader: red-billed leiothrix (Leiothrix lutea) in Europe.Though this species is more associated with forest habitats and is likely to be more limited to tree-based habitats, it has been a very successful and established invader with populations in several European countries and regions (Pereira et al. 2019).It is worth noting that both the European starling and the house sparrow have much older introductions, with established populations in regions like North America and Australia.For instance, the European starling was successfully introduced to multiple areas during the same period, as in Australia (Stuart et al. 2023).Variances in propagule pressures may offer a partial explanation for differences in range expansion.

Conclusions
Accurately anticipating the expansion of IAS is key to ensuring the successful implementation of preventive and mitigation actions.Forecasting invasions by means of different quantitative methods and modelling strategies have been used in the last decades, and new approaches are constantly emerging (Peterson 2003;Reino et al. 2009;Jiménez-Valverde et al. 2011).However, predictions may be severely compromised by different methodological options and their specific limitations.ENMs are powerful tools for predicting the spread of IAS and guide management.Although ENM enables predicting and evaluating biological invasions, it is often compromised by the amount (time-series length), quality (spatial and temporal resolution) and availability (data access) of distribution data in both native and invaded ranges.Our study evaluates the capacity of ENMs based on spatiotemporal data of invaded ranges only to forecast the potential distribution of IAS using common waybill dispersion analysis as a case study.We demonstrate that invaded range-only data may be used to accurately project the expansion of alien species in novel regions if enough time (at least three decades in our study model) is given to allow the species to expand and occupy the most suitable conditions.These results indicate that the invaded ranges-only ENMs are of limited utility in the early stages of invasion, while they clarify the need for using an iterative approach where models are recurrently updated with the most recent distribution data of the species since establishment.This approach will contribute to a better understanding of climatic niche changes during the expansion process of alien species, and offers a solution to managers and scientists dealing with the scarcity and asymmetry of distribution data available for alien species worldwide, in their native and invaded ranges.Our study helps solve a much-discussed conundrum and offers a practical solution to better guide management actions and significantly improve stakeholders' ability to halt biological invasions.

Figure 1 .
Figure 1.Location of the study area in the Iberian Peninsula, southwest Europe (upper panel) and distribution of the common waxbill Estrilda astrild in each decade from 1960 to 2019 (bottom panel) and the accumulated number of presences for each decade (n = #, bottom right of each map).

Figure 2 .
Figure 2. Variables (elevation, land cover, and river hierarchy) used to define the barriers to dispersal through the landscape, and the barriers used in MigClim to measure the dispersion across the landscape.Blue means barrier; white means no barrier.

Figure 3 .
Figure 3. Results of cumulative ecological niche models projecting habitat suitability to the next decades.The suitability maps are organised as in Suppl.material 2: fig.S2: each decade is a row; models (light blue background) and projections (yellow background) are placed in columns.Habitat suitability ranges from dark blue (low suitability) to red (high suitability), following the turbo palette.

Figure 4 .
Figure 4. Percentage of presences incorrectly (red) and correctly (green) classified over time, for each model (blue background) and projection (yellow background).Validation for each model was conducted using the number of cumulative presences from the previous decade(s) (for details see Suppl.material 2: tableS4).Our backcasting approach showed a high forecast capacity of EMNs after the 3 rd decade following the common waxbill establishment (high % of correctly classified presences).

Figure 5 .
Figure 5. Results of dispersal models per decade and type of barriers (no barriers, weak barriers, and strong barriers).The colour sequence Blue -> Green -> Light Green -> Yellow indicates the dispersal of species over time in each decade.Yellow indicates areas where the species did not have time to arrive.Purple indicates areas where the species cannot occur because habitat suitability predicted by ENMs was low.

Table 1 .
Parameters and values used in dispersal models.

Table 2 .
Results of AUC, TSS, and Boyce index for models considering the accumulated presences since the first introduction.