Valuing the information hidden in true long-term data for invasion science

Invasive species pose a significant threat to global biodiversity and human well-being. Despite the widespread use of long-term biomonitoring data in many natural science fields, the analysis of long-term time series with a focus on biological invasions is uncommon. To address this gap, we used twenty macroinvertebrate time series from the highly anthropogenically altered Rhine River, collected over 32 years from 1973 to 2005. We examined the adequacy of the data in capturing non-native species trends over time and related trends in alpha, beta, and gamma diversity of non-native species to several climatic and site-specific predictors. Our findings revealed that the data adequately captured a saturating non-native species richness over time. Additionally, we observed an increase in both alpha and gamma diversity of both native and non-native species over time, with a recent dip in trends. Beta diversity trends of non-native species were more complicated but eventually increased, in contrast to trends in native species beta diversity. Our applied models indicate that in this highly altered ecosystem, climatic shifts were insignificant, while time was the primary driving factor. Proximity to anthropogenic structures and the distance to the outlet were the only site-specific predictors facilitating non-native species diversity. These findings highlight the value and importance of long-term time series for the study of invasive species, particularly long-term invasion dynamics, and once again underline that the naturalness of ecosystems takes precedence over the effect of climate change.

species for resources, altering ecosystem processes, and causing extinction of native species (Mack et al. 2000; Simberloff et al. 2013). While not all non-native species will ultimately become invasive (Catford et al. 2016), climate change (Hellmann et al. 2008) and human-mediated ecosystem change (Sala et al. 2000) will probably increase the likelihood that non-native species establish, spread, and cause notable impacts (D'Antonio et al. 2020).
The impacts of non-native species are far-reaching and can have significant economic consequences by damaging crops, fisheries, and forestry, leading to billions of dollars in losses each year (Diagne et al. 2021). Despite these impacts, there is still much that is not understood about the ecology of non-native species and the factors that contribute to their success (Simberloff et al. 2013; Courchamp et al. 2017). Past studies investigating the ecology of non-native species have primarily relied on historical or anecdotal data (see e.g., Haubrock et al. 2021; Clavero 2022), rather than long-term monitoring (but see Soto et al. 2023a). This has resulted in an incomplete understanding of the long-term impacts of non-native species on ecosystems and biodiversity (Strayer et al. 2006). As such, impact assessments and projections of future impacts of non-native species have mostly focused on local scales and momentary recordings, rather than robust records from the past and present (Seebens et al. 2017; Vilizzi et al. 2021). However, recent studies have shown that long-term data from biomonitoring can be of considerable value to invasion scientists, managers, and stakeholders (Soto et al. 2023a, b, c).
The importance of long-term data obtained from biomonitoring cannot be overstated: biomonitoring data contain information on past records and changes in abundance and richness over time at local and large scales (Stork et al. 2017; Hulme 2022). By combining a multitude of monitored sites, biodiversity change can be studied over a broad spatio-temporal gradient (Lepš et al. 2016). By studying changes in abundance and richness, scientists can gain a better understanding of the factors that contribute to the success of non-native species. Biomonitoring data can further provide insights into the ecology of non-native species and their impact on ecosystems, revealing how ecosystems have changed over time and which factors have contributed to those changes. In particular, long-term data from highly invaded regions can be highly relevant, as they may explain shifts in previous trends of non-native species richness (Wright 2011). When linked with environmental (e.g., hydromorphological or climatic) explanatory variables, these data can be used to explain changes in non-native species diversity over time and thereby reveal predictors of future invasion success (Bertocci et al. 2013). If the data originate from an already anthropogenically stressed system, which is commonly less species-rich (McKinney and Lockwood 1999), they can also reveal the importance of climate change or site-specific characteristics (Alexander et al. 2015). By understanding which factors contribute to the success of non-native species, managers can develop more effective control and mitigation strategies to deter the spread of non-native species (Pyšek et al. 2012).
Long-term data obtained from biomonitoring can be of considerable value to invasion scientists, providing needed insights into the dynamics of non-native species and their impact on ecosystems over time. Here, we used continuously collected aquatic macroinvertebrate biomonitoring data from the Rhine River, a highly anthropogenically altered river in Germany invaded by numerous non-native species (Le Hen et al. 2023), to investigate changes in the local and regional presence of non-native species over time and how trends can be explained by external drivers (i.e., climatic shifts or hydromorphology; Hellmann et al. 2008). We hypothesize that in such an artificially altered system, (1) neither climatic nor environmental (i.e., site-specific hydromorphological) changes facilitate the presence of non-native species locally or regionally, and (2) continuous biomonitoring can identify the point at which saturation in non-native species occurs, i.e., when all non-native species are identified. Our results should hence be seen as a first investigation of the value of 'true' long-term biomonitoring data for invasion scientists.

Data
To investigate the temporal dynamics of non-native species in the Rhine River, we selected time series from this drainage within a recently collated database of long-term macroinvertebrate time series (Haase et al., unpublished data; but see Haubrock et al. 2022), reporting the abundance of macroinvertebrate taxa in streams and rivers across 22 European countries. We specifically selected the Rhine River, as it is one of the most heavily anthropogenically impacted freshwater rivers in Europe with a total length of 1250 km (Rheinhold and Tittizer 1997; Van der Velde et al. 2000; Uehlinger et al. 2009), and provides numerous services such as transportation, power generation and drinking water (Cioc 2002; Uehlinger et al. 2009). The Rhine-Main-Danube canal, opened in 1992, links the Rhine catchment via the Main tributary with the Danube River and serves as a major pathway facilitating the spread of alien species (Rheinhold and Tittizer 1997; Bij de Vaate et al. 2002; Balzani et al. 2022). Each time series comprised macroinvertebrate assemblages collected at a single site over a minimum of eight years.
We identified 45 time series from the Rhine River (Fig. 1), which covered a period of 33 years (1968 to 2007). Starting and ending years of time series, as well as their length, varied among time series (Supplement 1). These differences could lead to biased or inaccurate interpretations. Thus, to ensure the comparability of trends and patterns over time among time series, we selected a period common to all time series, from 1973 to 2005, retaining a total of 20 time series. Within these time series, macroinvertebrates were sampled consistently over time following the German standard protocol DIN 38410 (Arndt et al. 2009). This protocol includes both qualitative and quantitative methods for collecting and identifying macroinvertebrates, involving a variety of sampling methods (e.g., kick sampling and hand sampling). We then kept only entries identified at the species level to ensure a homogeneous taxonomic resolution.

Filling gaps
As the macroinvertebrate time series were not sampled annually (i.e., there are missing years), we filled the annual gaps to minimize the effects of missing years on subsequent analyses, using the mice function of the mice R package (Van Buuren and Groothuis-Oudshoorn 2011). This function relies on multiple imputation by chained equations (Van Buuren and Groothuis-Oudshoorn 2011) and is commonly implemented by specifying Generalized Linear Models (GLMs) for the univariate conditional distributions (Raghunathan et al. 2001; Royston and White 2011) to impute different types of variables. We used the mice function to impute missing years for each taxon and its respective abundance over time, using predictive mean matching for numeric data (Van Buuren and Groothuis-Oudshoorn 2011).
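The idea behind predictive mean matching can be illustrated with a minimal Python sketch. It is not the mice implementation (which draws model parameters from their posterior and produces multiple imputed data sets); it assumes a single taxon, a simple linear trend on year as the prediction model, and hypothetical abundance values. The function names and the choice of `k = 3` donors are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def pmm_impute(series, k=3, seed=0):
    """Fill None entries of {year: abundance} by predictive mean matching.

    For each missing year, the k observed years whose predicted values are
    closest to the prediction for the missing year act as donors; the
    observed abundance of one randomly chosen donor is used as the fill-in.
    """
    rng = random.Random(seed)
    observed = {yr: v for yr, v in series.items() if v is not None}
    a, b = fit_line(list(observed), list(observed.values()))
    filled = dict(series)
    for yr, value in series.items():
        if value is None:
            target = a + b * yr
            donors = sorted(observed, key=lambda d: abs((a + b * d) - target))[:k]
            filled[yr] = observed[rng.choice(donors)]
    return filled

# Hypothetical abundances of one taxon; None marks an unsampled year.
abundance = {1973: 4, 1974: None, 1975: 7, 1976: 9, 1977: None, 1978: 12}
imputed = pmm_impute(abundance)
```

Because the fill-in is always an actually observed value, imputed abundances stay within the range of the data, which is one reason predictive mean matching is often preferred for count-like variables.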

Non-native species saturation
To identify alien species in each time series, we verified the natural range of these species by checking three main sources: (1) Web of Science (https://webofknowledge.com/), (2) Invasive Species Compendium (CABI, https://www.cabi.org/ISC), and (3) the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/). To determine whether sample size (i.e., number of time series over time) was sufficient to describe the presence of non-native species in the Rhine River and hence to identify a possible saturation in the detection of non-native and native species, both native and non-native species occurrences were separately plotted against the cumulative number of time series investigated per year. For this, we used the specaccum function of the vegan R package (Oksanen et al. 2013), randomizing each time series ten times (Ferry and Cailliet 1996; Ferry et al. 1997). Cumulative curves were considered to be asymptotic if ten previous values of the total number of taxa were within ± 0.5 of the range of the asymptotic number of taxa, indicating the required minimum of monitored time series years to describe the diversity of non-native and native species (Huveneers et al. 2007).
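The accumulation and asymptote check described above can be sketched in Python. This is a simplified stand-in for vegan's specaccum, not a reimplementation of it: it averages cumulative richness over random orderings of the samples and then applies one possible reading of the ± 0.5 criterion, taking the final value of the curve as the asymptotic number of taxa. The sample data and function names are hypothetical.

```python
import random

def accumulation_curve(samples, permutations=10, seed=0):
    """Mean cumulative species richness over random orderings of samples.

    samples: list of sets, one set of species per monitored time series year.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, species in enumerate(order):
            seen |= species
            totals[i] += len(seen)
    return [t / permutations for t in totals]

def samples_to_asymptote(curve, window=10, tol=0.5):
    """Smallest number of samples after which `window` consecutive curve
    values lie within +/- tol of the final value (assumed asymptote).
    Returns None if no such window exists."""
    asymptote = curve[-1]
    for end in range(window, len(curve) + 1):
        if all(abs(v - asymptote) <= tol for v in curve[end - window:end]):
            return end
    return None

# Hypothetical data: 30 samples drawn from a pool of five species.
samples = [{"sp%d" % (i % 5)} for i in range(30)]
curve = accumulation_curve(samples)
needed = samples_to_asymptote(curve)
```

A curve that never satisfies the criterion returns `None`, which would indicate that more monitoring years are needed before saturation can be claimed.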

Native versus non-native diversity over time
We calculated three common measures of biodiversity over spatial scales: alpha (α), beta (β), and gamma (γ) diversity for both the native and non-native communities. Alpha diversity represents the diversity of species within a community (i.e., in each time series) and γ diversity represents the overall number of species at large scale (e.g., the Rhine River), while β diversity represents the differences in species composition among communities, estimated as the ratio between γ and α diversity (sensu Whittaker 1972), as a simplified yet reliable estimate (Andermann et al. 2022).
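As a worked illustration of these definitions, the following Python sketch computes mean α diversity, γ diversity, and Whittaker's multiplicative β = γ / α for a small set of sites. The site names and species labels are hypothetical, and α is taken here as the mean richness per site for a single point in time.

```python
def diversity_partition(sites):
    """Return (alpha, beta, gamma) for {site: set of species}.

    alpha: mean species richness per site
    gamma: total number of distinct species across all sites
    beta:  gamma / alpha (Whittaker's multiplicative beta)
    """
    alpha = sum(len(species) for species in sites.values()) / len(sites)
    gamma = len(set().union(*sites.values()))
    return alpha, gamma / alpha, gamma

# Hypothetical non-native assemblages at three sites in one year.
sites = {
    "site_A": {"sp1", "sp2"},
    "site_B": {"sp2", "sp3"},
    "site_C": {"sp1", "sp2", "sp3", "sp4"},
}
alpha, beta, gamma = diversity_partition(sites)
# alpha = 8/3, gamma = 4, beta = 1.5
```

A β value close to 1 means that every site holds nearly the full regional species pool, whereas larger values indicate stronger compositional differences among sites.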
To analyse trends in α, β, and γ diversity, we used Generalized Additive Mixed Models (GAMMs) using the R package mgcv (Wood and Wood 2015). We included a set of climatic, site-, and region-specific characteristics that to some degree reflect anthropogenic interferences and that may modulate the temporal trends of α, β, and γ diversity. As site-specific predictors, we considered: (1) runoff, expressed as the annual Q (mm), which was extracted from the TerraClimate dataset at 4-km spatial resolution (Abatzoglou et al. 2018), (2) the elevation of each site (MERIT Hydro digital elevation model; Yamazaki et al. 2019) at 90-m spatial resolution, (3) the site's local slope, which was extracted using the r.stream.slope function (Hydrography90m; Amatulli et al. 2022), (4) the distance to the next weir or barrier (Global Reservoir and Dam Database; GRanD v1.3), and (5) the distance to the outlet, which was extracted using the r.stream.distance function (Jasiewicz and Metz 2011). We further extracted (6) mean daily temperature and (7) total daily precipitation data from the E-OBS gridded European-scale observation-based dataset (spatial resolution: 0.1°; Cornes et al. 2018), and considered the average monthly temperature and precipitation of the 12 months preceding the sampling date as climatic predictors. Finally, (8) we included the average native species α and γ diversity, as native species richness can indicate competition (Sagouis et al. 2015), biotic resistance (Jeschke et al. 2018), and habitat degradation (Mokany et al. 2020), and therefore the vulnerability of the ecosystem to invasion ("invasibility", Hui et al. 2016).
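The climatic predictors rest on a 12-month look-back window. A minimal Python sketch of that step is given below, assuming monthly means have already been aggregated from the daily E-OBS values; the dictionary layout, the function name, and the temperature values are hypothetical.

```python
def preceding_12_month_mean(monthly, year, month):
    """Mean of the 12 monthly values preceding the sampling month.

    monthly: {(year, month): value}; the sampling month itself is excluded.
    """
    values = []
    y, m = year, month
    for _ in range(12):
        m -= 1
        if m == 0:          # step back across the year boundary
            y, m = y - 1, 12
        values.append(monthly[(y, m)])
    return sum(values) / len(values)

# Hypothetical monthly mean temperatures: the value equals the month number.
monthly_temp = {(y, m): float(m) for y in (1999, 2000) for m in range(1, 13)}

# Sampling in June 2000 uses June 1999 through May 2000.
mean_temp = preceding_12_month_mean(monthly_temp, 2000, 6)
```

The same function applies unchanged to monthly precipitation totals.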
To select the "best model" possible (i.e., the one that best explains the relationship between the response and predictor variables), we performed model selection using the glmulti function of the glmulti package in R (Anderson and Burnham 2004; Calcagno and de Mazancourt 2010). Based on the lowest corrected Akaike Information Criterion (AICc), 'elevation' was the only non-relevant predictor, and it was also subject to collinearity (Morlini 2006) and variance inflation (VIF; Craney and Surles 2002). The threshold for detecting collinearity among variables was set at a VIF of four; variables exceeding this threshold were considered highly correlated and were evaluated individually for their ecological importance (i.e., they were kept if considered important). Collinearity was evaluated using the vifstep function of the usdm package (Naimi 2015).
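To make the variance inflation threshold concrete, the sketch below computes VIFs in plain Python as the diagonal of the inverse correlation matrix of the predictors, which is equivalent to 1 / (1 − R²) from regressing each predictor on all others. It is only an illustration of the quantity being thresholded, not the stepwise procedure of usdm's vifstep, and the predictor values are hypothetical.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(predictors):
    """Variance inflation factor per predictor for {name: values}."""
    names = list(predictors)
    inverse = invert(correlation_matrix([predictors[k] for k in names]))
    return {k: inverse[i][i] for i, k in enumerate(names)}

# Hypothetical site-level predictors; the first two are nearly collinear.
predictors = {
    "elevation": [10, 20, 30, 40, 50, 60],
    "distance_to_outlet": [12, 19, 33, 41, 48, 62],
    "slope": [3, 1, 4, 1, 5, 2],
}
scores = vif(predictors)
flagged = [name for name, value in scores.items() if value > 4]
```

Predictors in `flagged` would then be inspected individually, mirroring the decision rule described above.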
Hence, each response variable (i.e., α, β, and γ diversity of non-native species) was analysed as a function of climatic and site-specific characteristics, i.e.: 'year', 'mean monthly temperature', 'mean monthly precipitation', 'the site's slope', 'distance to the next weir', 'distance to the outlet', 'runoff', and finally 'native species α diversity' and 'native species γ diversity'. In addition, we included the 'site ID' as a random effect to account for spatial variability among time series. We used a negative binomial distribution (with "log" link to the predictor), which is appropriate when the residual variance is larger than the mean, a condition commonly known in ecological data as overdispersion (White and Bennetts 1996; Wood 2008). All predictors were checked to ensure that variance inflation factors were lower than four, to exclude the possibility of collinearity, and that no predictor was affected by nonlinear correlations or dependencies (concurvity; Wood 2008). Lastly, the importance of the variables in each model was evaluated based on the χ² and p values of the respective model output using the gg_vimp function of the randomForestSRC R package (Ishwaran et al. 2023). All analyses were performed in R version 4.2.2 (R Core Team 2020).
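A quick way to see why a negative binomial family is preferred over a Poisson one is to compare variance and mean. The Python sketch below does this on raw counts as a rough screen; the formal check concerns the residual variance of a fitted model, which this simplification does not compute. The counts are hypothetical.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest
    overdispersion relative to a Poisson distribution."""
    return variance(counts) / mean(counts)

# Hypothetical non-native species counts across site-years.
counts = [0, 1, 1, 2, 8, 0, 3, 12, 1, 2]
ratio = dispersion_ratio(counts)
overdispersed = ratio > 1
```

For these counts the mean is 3.0 and the sample variance is about 15.3, so the variance clearly exceeds the mean.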

Results
In total, we recorded 35 native and 14 non-native species at the species level (excluding genus-level information) in the Rhine River time series from 1973 to 2005 (see Supplement 2). Our species accumulation curves showed that observations of new non-native species tended to reach saturation earlier than observations for native species, with non-native species reaching their asymptote after an average of 14.0 monitoring years compared to 41.1 monitoring years for native species (Fig. 2).
Focusing on diversity metrics, we observed a similar pattern in the α and γ diversity of non-native species (Fig. 3a, c). Both showed a continuous increase until reaching their peaks between 1995 (γ diversity) and 2000 (α diversity). While the γ diversity of native species matched that of non-native species in terms of trend and percentage increase, the α diversity of native species peaked much earlier (1985 vs. 2000). Non-native α diversity increased from, on average, one non-native species (Dreissena polymorpha) in 1973 to an average of 5.3 non-native species per time series in 2000 (a 5.3-fold value, i.e., an increase of 430%). Beta diversity (β diversity) of native and non-native species showed contrasting patterns: non-native β diversity decreased towards the mid-1980s (while native β diversity increased) before rising towards the end of the available period (while native β diversity declined; Fig. 3a).
Examining non-native species α, β, and γ diversity, we observed a consistent pattern of change over time, which was statistically significant (p < 0.05; Supplement 3). 'Year' was the most significant predictor of change across these dimensions. Moreover, γ diversity of native species was a significant predictor of changes in α and γ diversity of non-native species, while the distance to the outlet was a significant predictor of α diversity of non-native species (Fig. 4a). We further found that across the monitored period, the ratio between γ and α diversity was larger for native than for non-native species (2.21 > 1.72).

Discussion
Despite the importance of long-term data in ecology, a limited number of studies in the field of invasion science focus on large-scale trends in non-native species diversity over time using long-term time series data from biomonitoring efforts (but see Haubrock et al. 2022, 2023a; Soto et al. 2023c). This is mostly due to a lack of available and suitable data. However, we found that the information hidden within long-term time series data for studying non-native species is substantial, even when the taxonomic information was reduced by excluding records not identified to species level. To the best of our knowledge, no other work has investigated trends in non-native macroinvertebrate species diversity (α, β, or γ) using high temporal resolution data over time from a highly degraded system like the Rhine River in Germany (but see Le Hen et al. 2023). Therefore, we believe that our study is the first of its kind. A recently published study on the non-native fish community of the Rhine River (Le Hen et al. 2023), albeit subject to substantial data gaps, found trends resembling ours, i.e., that non-native species were increasingly occupying a larger proportion of the community. Indeed, within the twenty time series analysed from the Rhine River in this study, we found that at the species level, non-native macroinvertebrate species contributed almost half of the overall alpha and about one third of gamma diversity at their respective peaks. Furthermore, native diversities decreased more strongly than non-native diversities after reaching their respective peaks, indicating a potential turnover toward a community dominated by non-native species (Haubrock et al. 2021). While we acknowledge that disregarding information at the genus level results in the loss of a considerable amount of information, this may also be the case for non-native species, which themselves suffer from an identification bias, i.e., they are often not identified correctly (Gurevitch et al. 2011). In many cases, temporal dynamics of native and non-native species were synchronized, i.e., following each other.
Non-native species α diversity followed similar patterns as native species α diversity, thereby indicating the presence of an underlying driver (e.g., climate warming) affecting both. However, non-native α diversity trailed the native trend, peaking substantially later and decreasing less from its peak when directly compared. While we cannot make any inference as to why native species α diversity declined after its peak, the weaker decline in non-native species might indicate their resilience toward ecosystem change (Pyšek et al. 2020). It is also possible that unfavorable conditions severely affect native species, reducing their abundance or even leading to local extinction and creating empty niches, which in turn provide new opportunities for non-native species to establish (Catford et al. 2016), or that competitive interactions between native and non-native species result in the decline of native species (Davis et al. 2020). As such, the concomitant pattern in native and non-native gamma diversity, similar to the comparable trend in α diversity, hints towards the effect of, e.g., global warming, which increases the productivity of freshwater ecosystems and thereby allows a higher richness and larger abundances. While we cannot exclude the possibility that the recent decline in native species diversity was caused by unfavorable climatic conditions, temperature and precipitation were not limiting factors for either native or non-native species (Hellmann et al. 2008). This also indicates that in highly anthropogenically altered ecosystems, other non-climatic drivers may make the difference in native and non-native species changes (Vitousek et al. 1997).
Indeed, the applied models did not reveal any significant climatic effects on non-native species diversity. However, the results suggest that native species' γ diversity, which represents the total regional native species richness, was a significant predictor of a site's non-native α diversity and of the regional non-native γ diversity. While an increase in native γ diversity is unlikely to favor higher invasion rates, the positive relationship between native species γ diversity and non-native α and γ diversity is intriguing. This relationship may (1) reflect an identification bias that also increases the identification of non-native species, (2) be due to favorable environmental conditions that promoted higher detection rates, or (3) reflect general shifts in species richness in an already anthropogenically altered environment. Remarkably, our findings revealed that the γ:α diversity ratio of native species was larger than that of non-native species, implying that non-native species were more uniformly distributed across the studied stretch of the Rhine River than native species, which is suggestive of the prevalent ecological profile of non-native species (Leprieur et al. 2008). Although only the change over time was found to be significant for β diversity, non-native α diversity was also predicted by an increasing distance to the outlet. The continuous invasion of non-native species from the Ponto-Caspian region in the Rhine River since the opening of the Rhine-Main-Danube canal in 1992 may explain why the distance to the outlet reflects the ongoing downstream spread of non-native species. Hence, our results suggest that the effects of anthropogenic disturbance that caused severe degradation of habitats outweigh climatic effects (Vitousek et al. 1997; Hulme 2007), or that in an already highly anthropogenically altered system, climate effects are of lesser magnitude.
This, once again, underlines the importance of restoration projects to revert ecosystems to their natural condition (Strayer et al. 2005; Sinclair et al. 2022). Long-term biomonitoring data analysis has hence become a critical component of ecological science and should be considered by invasion scientists. Our findings suggest that 20 long-term monitoring sites along the Rhine River, repeatedly monitored over the study period, were sufficient to identify all non-native species, as indicated by saturation curves having reached their respective asymptote. This highlights the significance of having a network of long-term monitoring sites with high spatio-temporal resolution, such as eLTER (Mirtl 2018). Interestingly, the native species accumulation curve reached its asymptote considerably later, possibly indicating better species-level identifications or the immigration of native species from the surrounding areas. The Rhine River is frequently used for shipping and likely serves as a stepping stone for the introduction of new alien species upstream (Leuven et al. 2009). Therefore, long-term biomonitoring throughout the entire Rhine River catchment could prove to be an effective way not only to identify non-native species introductions but also to mitigate their spread (Feld et al. 2011). Our results thus support the value of long-term biomonitoring in identifying non-native species.
Funding Open Access funding enabled and organized by Projekt DEAL.
Data availability Data used in this work will be shared upon reasonable request.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.