A digital GIS update to the classic Ulbrich’s 1930 map of muskrat ( Ondatra zibethicus ) population range expansion in Central Europe

Historical data sets are valuable for learning and developing new tools for understanding and predicting species spread and dispersal. We update the classic Ulbrich (1930) map of muskrat ( Ondatra zibethicus ) spread in Central Europe, converting it to a set of GIS maps. We illustrate the updated data by fitting a set of simple models that measure the rate of spread of species. We recover the classic findings of radial spread from previous studies and, with the finer-scale data, show when the traditional radial assumption of spread breaks down. We believe that our version of the Ulbrich data is a valuable tool for pedagogy and a type of digital playground for the design and development of new tools to estimate species spread and dispersal.


Introduction
Non-native or invasive species pose significant ecological and economic burdens (Early et al. 2016), having the potential to cause shifts in species richness and ecosystem function by predating on or competing with native species for resources (Strayer 2012), hybridization and introgression (Rhymer and Simberloff 1996), physically damaging ecosystems (Campbell and Long 2009), and altering nutrient dynamics (Chisholm 2012;Cucherousset and Olden 2011;Ehrenfeld 2010).In order to make informed management decisions, policy makers and managers require accurate information on where and how an organism is spreading.Indeed, accurate and effective detection are key components to the least-cost management of non-native species (Epanchin-Niell et al. 2012;Homans and Horie 2011;Mehta et al. 2007), where a policy is "least cost" in the sense that it is the least expensive policy considered or it has the lowest impact in terms of disturbance to or welfare of other wildlife.Similarly, when evaluating a particular management action such as the removal or stocking of individuals in an area, managers need some "objective function" to gauge the effectiveness of their actions on the species in question and for the interests of local stakeholders.
Historical data sets are valuable resources for learning and developing tools for understanding and predicting species spread and dispersal.Ulbrich's (1930) data set of muskrat spread (Ondatra zibethicus) throughout Central Europe during the early 1900s is a seminal data set for species exhibiting constant spread from a single point of introduction.Since its original publication, the data have been used in analyses by Skellam (1951), Elton (1958), Williamson et al. (1986), Hengeveld (1989), Andow et al. (1990), Williamson (1996), and Shigesada and Kawasaki (1997), among others, and still remains a classic data set in this field.Yet to the best of our knowledge, while the dispersal maps themselves may have been digitized, the data itself remains coarse.We obtained an original copy of Ulbrich (1930), digitized the dispersal data, and constructed a high-resolution geographic information system (GIS) map of the population range expansion.
We illustrate the use of the data by fitting a set of simple models that measure the spread of a species.Models that measure the spread of an organism range from the highly abstract to data-driven statistical models to more-recent machine learning and spatial mapping techniques (Hastings et al. 2005;Liebhold and Tobin 2008).Most models in this literature include local spread dynamics which can be approximated with an assumption of a constant, radial rate of spread around an initial point of introduction (Hengeveld 1989;Okubo and Levin 2001;Shigesada and Kawasaki 1997;Williamson 1996).Analyses of these simple models of local spread dynamics have the advantage that they are straightforward to implement and understand, are easily applied to empirical data, and can provide reasonable first approximations of complex models (Andow et al. 1990;Hastings et al. 2005;Okubo and Levin 2001).
Every model of the spread of an organism is a phenomenological model fit to data and is subject to the assumptions inherent to the construction of the model.Taken together with the ecology of an organism, these assumptions affect the interpretation of spread and make the resulting estimate more or less reliable.Indeed, we recapture the results from previous analyses, but -due to the finer-scale information in the updated digitized data set -show that the assumptions of these simple models break down when we dig deeper into the details of muskrat range expansion over time.However, while few studies explicitly compare models of spread (Hastings et al. 2005;Liebhold and Tobin 2008), taking a set of simple models designed to capture different aspects of spread can still reveal a clear picture of the evolution of the population range over time.
We believe that this data set is particularly valuable for pedagogy, and it could certainly be used as a baseline when testing simple and complex spread models, developing new approaches for measuring spread, or for training or calibrating machine learning models.

Materials and methods
Using the Worldshare Interlibrary Loan service provided by the Online Computer Library Center (OCLC), we obtained an original copy of Ulbrich (1930).The dispersal data-in the form of a physical, hard-copy map-were scanned and converted into a digital image file (Figure 1A).The present-day names of every city were identified and the image loaded into Arc Map 10.2.We geo-referenced the dispersal data by overlaying the image onto a current map of Central Europe and used the city names as reference points to accurately "fix" the dispersal map to the base layer of present-day Europe.For each year of the data, we traced a polygon of the population range of the species (Figure 1B).To the best of our knowledge, this is the highest resolution version of the data and the only version available in a geographic information system (GIS) format.
Since the data of Ulbrich (1930) consist solely of a range map, we have no information of the individual observations of species presence/absence used to originally construct the dataset.However, if we assume that there were sufficient presence observations at the periphery of each year's range to construct the original maps, then we can take the vertices of each polygon-or the set of individual points which, once connected, define the shape of the population range-as observations of species occurrence (Supplementary material Figure S1).This is an approach similar to the "microscale observations" referred to in Andow et al. (1990).
To demonstrate an application of the data, we fit a series of models that measure the spread of an organism, often referred to as "spread [rate] estimators" or "metrics" and "models of spread" in the literature.This literature is vast, and we would direct the reader to books by Hengeveld (1989), Williamson (1996), and Shigesada and Kawasaki (1997) and papers by Hastings et al. (2005) and Liebhold and Tobin (2008) for comprehensive reviews.Specifically, we fit two types of models of spread: those that explicitly measure the population range ("population range metrics") and those that measure the how far the organism has dispersed from its initial point of introduction ("distance metrics").The site of initial introduction is rather ambiguous, with reports including a pond in the estate of Prince Colloredo Mannsfeld in Dobris (Becker 1972;Williamson 1996) and a farm near Prague (Shigesada and Kawasaki 1997).Therefore, since we have no concrete record of the site of initial introduction, we take the center of the first population range as the point of introduction.We re-evaluate muskrat spread at each time point in the data.
It is worth noting that there are questions regarding the completeness of the original data, specifically missing data along the western border between Bohemia and the Austro-Hungarian empire (Andow et al. 1990;Skellam 1951;Williamson 1996).For now, let us treat the data as complete.We will return to this point later in the Results and Discussion.

Population range metrics
We take advantage of the updated format of the data to calculate the population range area and border length over time (polygon area and length) (Andow et al. 1990;Shigesada and Kawasaki 1997;Skellam 1951).We then calculate the perimeter to area ratio of each polygon, which acts as an indicator of the complexity of the population range (Levin et al. 2009;Wu and Hobbs 2007).We complement this with the patch shape index, another well-established index of patch complexity (Forman 1995;Wu and Hobbs 2007).The patch shape index is measured as the perimeter divided by two times the square root of the area times π, the inverse of which is often referred to as "compactness" in landscape ecology.It is equal to 1 for circles and about 1.2 for rectangles.We compare our polygons to a circular population range by taking the maximum dispersal distance-the farthest observation of species occurrence from the origin-as the radius of the circular population range, and use it to calculate range area, circumference, and perimeter to area ratio.The circular model can be thought of as a kind of null model with which to test the hypothesis of constant, circular range expansion.The greater the polygon deviates from the circular model, the greater the complexity of the shape of the population range, and the less likely that the assumption of constant, circular spread holds.

Distance metrics
Many of the models of local spread dynamics make the assumption that the species population range uniformly expands at a constant rate from a single point of introduction, i.e. circular or radial range expansion.Species dispersal occurs uniformly in all directions.
If we implicitly assume circular range expansion, then the square root of the range area divided by π yields the average dispersal distance of an organism at a given moment in time (Andow et al. 1990;Shigesada and Kawasaki 1997;Skellam 1951;Williamson et al. 1986).Further dividing by the time since introduction gives the spread rate in units of distance (kilometers) per unit time (year).
Alternatively, we can compare this to a set of models that explicitly assume a circular range expansion.We construct lower and upper bounds on the rate of spread as per Suarez et al. (2001).We assume that dispersal occurs uniformly in all directions and calculate the minimum dispersal rate as the difference between the farthest observations from the origin at any two points in time (t and t+Δt) divided by the change in time between those observations (Δt).The maximum dispersal distance is given by the farthest observation from the point of origin at a moment time.It is interpreted as the maximum possible distance an individual could disperse in one year.We augment these bounds by an intermediate radial measure of spread defined as the farthest observed dispersal distance at time t divided by the time since introduction (Caughley 1970;Shigesada and Kawasaki 1997;Suarez et al. 2001).A visual illustration of the calculation of these bounds can be found in Figure S2.
The lower, intermediate, and maximum bound models of spread are designed to be used with limited data availability, relying on small sample sizes for their calculation.We complement the previous approaches with two more complex (but still simple) spread models that can incorporate more data and mechanistic processes into the estimation of the spread process.First, we perform a linear regression of dispersal distances from the origin against time, where the dependent variable represents a vector of the distances of all observations from the origin from the initial time of introduction to time t.The independent variable is a vector of time periods for each observation, and the constant term is constrained to zero (a species is not present prior to its introduction).The slope of the regression line may be interpreted as the spread rate, or the average expected change in distance from the origin over time.This approach has been used in studies of the gypsy moth (Liebhold et al. 1992;Sharov and Liebhold 1998), maritime pine (Higgins and Richardson 1999), and the emerald ash borer (Sargent et al. 2010).
Second, we adopt a measure of spread common to the theoretical literature that has been used to describe the spread of the muskrat (Skellam 1951), sea otter (Lubina and Levin 1988), and house finch (Okubo 1988).It takes spread as a diffusive process, measuring it as a traveling period wave (Brownlee 1911;Fisher 1937;Shigesada and Kawasaki 1997;Skellam 1951) with a speed approximated by, (1) where the parameter r is an organism's intrinsic growth rate and D is the diffusion coefficient, e.g. a measure of how quickly (on average) the organism spreads (diffuses) from the point of origin.Note that the model in (1) assumes Malthusian growth and that the dispersal data are normally-distributed around the origin (Okubo and Levin 2001;Shigesada and Kawasaki 1997).The diffusion coefficient can be estimated empirically by, (2) where d it is the distance of observation i from the origin at time t since introduction, and N is the total number of observations at that moment in time (Andow et al. 1990;Holmes 1993;Kareiva 1982;Kareiva 1983).
Both the intrinsic growth rate and diffusion coefficient have been estimated for the muskrat.The former was approximated as 1.39 (Williamson et al. 1986) and between 0.2 and 1.1 individuals per year (Andow et al. 1990).The latter was estimated as 22.97 km²/year (Central Europe) (Williamson et al. 1986), 51.2 km²/year (Netherlands) (von Trooswijk 1976), and 230 km²/year (Finland) (Andow et al. 1990).We use the updated data to calculate the diffusion coefficient and at least partially control for site-specific properties that affect muskrat dispersal.We vary the intrinsic growth rate to compute a lower (r = 0.2), intermediate (r = 1.1), and upper (r = 1.39) estimate of the speed of the traveling periodic wave.
Finally, we can relax the assumption of circular range expansion and measure the change in the population range in each of the four cardinal directions (Kovaliski 1998).Compared to our other models, this gives a more specific description of where the population range is changing.Indeed, it is a simplified version of an approach of Andow et al. (1990), whom break down the population range into neighborhoods and measure spread in each neighborhood separately.
A digitized scan of the original Ulbrich (1930) map, its associated polygon shape files, a detailed step-by-step guide to georeferencing and tracing the population ranges per year, and code for the spread model analyses are available on the Zenodo data repository (10.5281/zenodo.8094920).

Results and discussion
Our updated data set allows for precise analyses of the muskrat population expansion.Skellam (1951) and Shigesada and Kawasaki (1997) include six and seventeen polygons of the population area in their analyses respectively.We can account for every year in the data.Furthermore, while we do not have data of the individual species presence observations used to construct the original population ranges, if we assume that there are sufficient observations at the edge of the population range to identify its border, then we have a total of 17,034 observations in the data across the study period (µ = 744, σ = 346.33observations per year).
In our illustration of the data, we recover the primary finding from previous work that the radius of the population range (measured as the square root of the area of the range divided by π) increases linearly over time (Figure 2A).This is the traditional finding taken as evidence that the population range expands radially at a constant rate over time.
Additionally, the nature of the updated data allows for a finer-scale analysis of population range expansion that goes well beyond assumptions of circular expansion.For instance, we observed several distinct periods of population area expansion (Figure 2B) and a reduction in shape complexity over time (Figure 2D).Digging deeper into our updated data shows that-while the traditional assumption of constant, radial spread may hold true on averagethere is significant variation in spread rate over time (Figure 3).Both population range and distance metrics exhibit variation around the average, though this degree of variation can be minimal (linear regression, µ = 9.51, σ = 1.23) or quite large (max dispersal distance, µ = 179.41,σ = 106.60).Much of this variation can be explained by the quantity and type of data used to estimate the spread rate.The linear regression uses all of the data points; the maximum and intermediate dispersal distances rely on a single observation.
A close inspection of the Ulbrich (1930) data reveals that part of the range expansion may be missing in the original data, particularly along the western front of the former Austro-Hungarian empire (1916-1929 and 1921-1926).(This likely explains some of the lower observed spread rates in the western direction.)We could certainly expect missing or erroneous data to bias the estimates of many of the models of spread implemented in this paper.Interestingly, this is not discussed at all by Skellam (1951), Williamson (1996), and Shigesada and Kawasaki (1997), and mentioned only in passing in Andow et al. (1990) (though notably all of these studies omit these data points in their analyses).This presents a pedagogic opportunity.By omitting the data for these years and rerunning our analysis, we can test the robustness of our models to errors in data quality and quantity.We find that filtering out the potentially incomplete polygons and focusing only on the remaining ones (1905-1915, 1920, and 1927) leads to surprisingly few changes in regard to the individual estimates of spread rates per year, but, in general, lower average estimates with the lower sample size (Figures 2 and 3, gray; Figures S3 and S4).This is due to the quantity and type of data used by the models in the study.Take, for example, the intermediate and maximum radial measures of spread.If the observation farthest  S1.Trends in each distance metric over time are located in Figures S3 and S4. For comparison, Williamson et al. (1986) observed rates of spread between 0 and 60 km/year (µ = 11.3,σ = 10.04).Using a linear regression, Andow et al. (1990) estimated muskrat spread between 10.3 and 25.4 km/year in Central Europe (specifically, the former Czechoslovakia), 0.9 to 4.6 km/year in France, and 3.5 to 6.7 km/year in Finland.In fitting reaction-diffusion models, Williamson et al. (1986) and Andow et al. (1990) estimated spread as 7.65 km/year and between 6.4 to 31.8 km/year respectively.from the origin is not on the western front, then-since these models rely solely on this observation for their estimation of spread-they will not be affected by the inclusion or exclusion of the missing data.Future studies could delve deeper into questions of data quantity and quality by reducing the number of individual observations at the border of each polygon or by randomly sampling the landscape inside and at the limit of the population range, and using these data to estimate spread.
We now have access to high-resolution landscape and climatic data that are important for shaping the rate and distribution of spread across a landscape.The muskrat is a semi-aquatic species, preferring swamps or wetlands.We would expect them to expand farthest along rivers, and slower in the face of mountains and dry seasons (Hengeveld 1989;Shigesada and Kawasaki 1997;Williamson 1996).Examination of population range expansion reveals that the maximum dispersal distances often follow river systems, such as the Vltava (1907-1908), Berounka (1908-1909), Jizera (1911-1912), or Sàzava (1915-1916) (Figure S5).It is more pronounced in the East-Southeast and South directions (Andow et al., 1990).Indeed, range expansion is the result of a complex interaction between species biology, topology, environmental conditions, species interactions, and human behavior.This is no more evident in the country-specific differences in reproductive and spread rates, and the successful eradication of the muskrat in Britain, Scotland, and Ireland but failure in continental Europe (Shigesada and Kawasaki 1997;Williamson 1996).It would be feasible to augment the data with historic spatial data sets (elevation, slope, temperature/precipitation, land use and land cover, including river systems) to develop habitat-based models that consider landscape heterogeneity in their measurement of spread.(See, for example, Buchan and Padilla (1999) and Havel et al. (2002), who incorporate habitat characteristics into predictions of invasion probability.) We view the data as a type of experimental playground, where individuals can learn, play, and test out models on a seminal, historical data set of species spread and dispersal.Because the data set is a "classic", it is well suited as a teaching tool for students, managers, policy makers, or incoming researchers to the field.Additionally, it could be used in the development of new approaches to measure species spread and dispersal.For example, we could imagine applying bootstrapping techniques to understand the effects of sampling rates and design on the observed spread rate.It could potentially be used as a training data set for more recent machine learning algorithms such as that of Aidoo et al. (2022), who applied a machine learning algorithm to predicting potential invasions of the African citrus psyllid (Trioza erytreae).

Figure 1 .
Figure 1.Original map of muskrat dispersal in Central Europe (Ulbrich 1930) (A), and the corresponding geo-referenced map with the population ranges traced in ArcGIS (B).Each colored shape in (B) corresponds to a polygon of the population range of the muskrat in a given year.

Figure 2 .
Figure 2. Population range metrics over time: (A) radius of the population range, (B) shape area, (C) shape length, and (D) perimeter to area ratio.Marker shape indicates whether the population range is measured as a polygon (square, solid) or circle (triangle, hollow).Gray markers indicate years with potentially incomplete data.In (A), the black trendline denotes a fit with the full data set; the gray line illustrates a fit for years only with complete data.

Figure 3 .
Figure 3. Box plots of distance spread metrics over time, with an inset of the change in cardinal directions over time.Colored box plots were created using all years of the data.Red bars represent the median values for each model.Boxes represent the 25 th and 75 th percentiles of estimates over the study period.Whiskers correspond to approximately 2.5 standard deviations from the mean.Pluses indicate potential outliers.Box plots omitting years with potentially missing values are plotted in gray.Note that the right axis corresponds to the maximum bound model.Means (µ) and standard deviations (σ) for each model can be found in TableS1.Trends in each distance metric over time are located in FiguresS3 and S4.For comparison, Williamson et al. (1986)  observed rates of spread between 0 and 60 km/year (µ = 11.3,σ = 10.04).Using a linear regression,Andow et al. (1990) estimated muskrat spread between 10.3 and 25.4 km/year in Central Europe (specifically, the former Czechoslovakia), 0.9 to 4.6 km/year in France, and 3.5 to 6.7 km/year in Finland.In fitting reaction-diffusion models,Williamson et al. (1986) andAndow et al. (1990) estimated spread as 7.65 km/year and between 6.4 to 31.8 km/year respectively.