Mapping environmental suitability for anthrax reemergence in the Arctic

Recent studies have raised concerns about the potential reemergence of infectious diseases in Arctic regions associated with warming temperatures. Among these, particular attention has been devoted to anthrax, as a consequence of the outbreak that occurred in the Russian Yamalo-Nenets peninsula in 2016. Understanding how environmental change might influence the diffusion of this pathogen could allow informed decisions to prevent further zoonotic or epidemic episodes. To that end, the present study aims to identify and investigate the driving variables that may favor anthrax transmission within the Arctic, in order to build environmental niche maps describing the future suitability of these regions for the pathogen. To do so, we use the MaxEnt statistical learning tool informed by Arctic-specific variables, such as reindeer herd distribution and active-layer variation. Because of the relative lack of reliable georeferenced information in these regions, the resulting potential distribution maps are to be considered preliminary, but they can already provide a first assessment tool for local communities living in potential risk areas. They also indicate areas in which additional investigation is needed to improve the reliability of environmental niche modeling, hence the accuracy of risk mapping and the usefulness to Arctic communities.


Introduction
More than other regions, the Arctic environment is experiencing dramatic modifications as a consequence of climate change and human activities, and these modifications have global implications (see e.g. the IPCC Fifth Assessment Report [1]). Indeed, over the last few decades, many studies have brought attention to the unprecedented rates of coastal erosion, carbon and contaminant release, global sea-level rise, and active-layer thawing occurring in the Arctic, leading to an overall degradation of this crucial, yet fragile environment [2][3][4][5][6]. As a result, the livelihoods, health, and cultural heritage of indigenous communities might be compromised [7]. Moreover, recent studies have investigated the presence of potentially reemerging bacteria and viruses in permafrost [8][9][10] and even in deep layers of Arctic sea ice [11].
Among these pathogens, anthrax has recently raised major concerns in light of the 2016 outbreak in the Yamalo-Nenets peninsula, where over 2000 reindeer and one child died, and around 100 people were hospitalized. Reports by the World Organization for Animal Health (OIE; World Animal Health Information Database [12]), suggested that the outbreak may have been triggered by the exposure of herds to infected carcasses that were buried during an anthrax outbreak that had occurred in the region in the past, and that were brought to surface as a consequence of unprecedented thawing rates of the active layer. This assumption has been further supported by other studies [13,14]. Experts had already warned about the potential existence of infected sites in Siberia, which was severely affected by anthrax outbreaks between 1915 and 1920, about one century before the 2016 incident [15]. Their rational, indeed, was that sites that once were home to an outbreak could be 're-activated' as a consequence of the increasing rates of active-layer and permafrost thawing, in combination with imperfect burial practices and the long-term viability of anthrax spores.
Anthrax is a global zoonotic disease triggered by the spore-forming bacterium Bacillus anthracis. It principally affects sub-Saharan Africa, Asia and South America, and some areas in Central and North America [16][17][18]. However, it has recently been reported that climate change might promote a northward range expansion of anthrax potential suitability in the northern hemisphere, as a consequence of changes in habitat conditions, wildlife distribution, and other environmental factors [19]. For this reason, anthrax is listed among the so-called 'climate-sensitive' diseases [19][20][21].
Investigating how environmental variables and their changes might influence disease diffusion can prove instrumental towards providing institutions and communities alike with useful information about potential precautionary and surveillance measures. Ecological niche modeling has been widely used for this purpose. Concerning anthrax, its spatial distribution has been investigated at multiple spatial scales, ranging from local and national (e.g. [22][23][24][25]) to regional and global [16,19]. However, its potential relevance for the Arctic region is still underinvestigated, and thus possibly underestimated, mainly because the relative lack of easily accessible and sufficiently detailed georeferenced information makes it difficult to properly characterize the peculiarities of this environment (e.g. spatial information about remote human settlements, reindeer husbandry, infected burial sites).
For this reason, in this study we investigate the spatial suitability of the Arctic region for anthrax, leveraging on the spatially-distributed data currently available. Our purpose is twofold: on the one hand, we aim to produce an indicative map describing the environmental conditions that may be conducive to anthrax outbreaks and favor the diffusion of this pathogen in the Arctic; on the other, we try and highlight the limitations of this approach linked to data deficiency, which could be usefully overcome with the implementation of suitable surveillance systems. We follow the methodology applied by Walsh et al [19], namely the MaxEnt (Maximum Entropy) statistical learning tool [26][27][28]. We include variables that are relevant to the Arctic environment, including reindeer herd distributions and active-layer thawing depths, building also on recent modeling work [29] suggesting that these factors may play a relevant role in disease transmission dynamics.

Methods and data
MaxEnt is a machine learning algorithm that is widely used in ecological applications for modeling species distributions from presence-only data [27,28,30,31]. In general, distribution models like MaxEnt can be used to characterize the environmental niche occupied by a species, which in turn can be thought of as the set of habitat requirements under which a certain species can persist without immigration [32]. These methods have been also applied in epidemiology, namely to investigate the suitability of the environment for sustaining infections and to predict potentially at-risk areas [33][34][35]. In the following, we briefly outline how MaxEnt can be applied to evaluate the potential distribution of anthrax in the Arctic region, based on the available spatial data on anthrax outbreaks and some relevant covariates. Given a set of environmental features within a certain area of interest, Max-Ent is trained on a selected fraction of sites where outbreaks have occurred (training presence sites). Accordingly, the algorithm identifies the range values of the environmental features for which the disease is more likely to emerge. Results are then tested on the remaining fraction of known outbreak locations (testing presence sites). Eventually, outbreak risk can be projected over the whole study area.

Anthrax data and environmental variables
To gather a sufficient amount of data and obtain significant results with MaxEnt, we trained and validated our model on a larger geographical scale and then discussed its projections for the Arctic area. Indeed, we considered predictive variables and anthrax outbreak occurrences above the 25 • parallel north. Based also on previous studies [19,16], the sites of known anthrax outbreaks have been taken from the WAHIS database portal maintained by OIE and from the Global Animal Disease Information System (EMPRES-i) portal maintained by the Food and Agriculture Organization. Additional information was acquired in ProMED-mail, an alert program run by the International Society for Infectious Diseases, in order to cover North America, Canada, and China. After removing duplicates, a total dataset of 151 anthrax outbreaks, occurred between January 2005 and December 2019, was used.
Since few data were available within Arctic regions, we included additional information in order to better resolve feature selection and model results in this region. Specifically, we added four outbreak occurrences in the Yakutia region, in Northern Siberia, where many old infected burial sites have been reported [15]. Since the exact locations of old burial sites are unknown, the coordinates of these outbreaks have been randomly selected within the region. This number was balanced to include the characteristics of this region in the evaluation of the environmental suitability of anthrax, while at the same time avoiding out-weighting the features of other regions where outbreaks were reported. Slightly different numbers (in the range 2-8) gave similar results. Although the environmental conditions used in this study might have been slightly different when these past outbreaks occurred, we argue in favor of their inclusion to better characterize the risk in this region. As the 2016 Siberian outbreak has suggested, old infected burial sites in the Arctic might still be active, and the release of spores should be considered a potential threat also at present. All the considered sites where anthrax outbreaks occurred are shown in figure 1.
Covariates have been chosen in order to highlight environmental characteristics that might contribute to anthrax proliferation and transmission according to available literature [18,19], and that can be relevant also for the Arctic regions [29]. Accordingly, we used soil temperature anomaly, Priestley-Taylor coefficient, soil pH and organic carbon (OC), reindeer and livestock densities, and active layer depth variations (figure 2). Soil temperature anomaly (T) for the period 2005-2019 was obtained from the NASA Earth Observations repository based on data provided by the NASA Goddard Institute for Space Studies (GISS). The soil temperature anomaly of a given year describes changes in temperature registered with respect to the period 1951-1980 [36]. GISS provides annual mean values at two degrees spatial resolution, from which we derived the overall mean temperature anomaly across the period 2005-2019 [37][38][39]. The Priestly-Taylor α (PT-α) coefficient was used as an indicator of soil water stress, representing water availability with respect to the water requirement of the vegetation. It ranges between zero, for high water stress, and one, when water availability is high. A global raster map of PT-α of 30 arc-seconds cell size was obtained from the Consultative Group for International Agricultural Research Consortium for Spatial Information [40]. Soil pH and OC content were retrieved at 30 arcsecond resolution from the Global Soil Dataset for Earth System Modelling [41]. One-kilometre resolution spatial rasters of active-layer thickness were obtained from the Permafrost Project run by the European Space Agency within its Climate Change Initiative [42]. Grid products are released in annual files containing the maximum depth of seasonal thaw, which corresponds to the active-layer thickness. The dataset covers the Northern Hemisphere (north of 30 • , where there is permafrost) for the period 2003-2017. We then calculated the average rate of change in the active-layer thickness (following referred to also as AL), obtained as the slope of the linear trend across the entire period.
Cattle, goat, and sheep population density distributions in 2010 were obtained from the Gridded Livestock of the World database at a spatial resolution of five arc-minutes [43]. We summed up these variables and formed a unique spatial raster, in the following referred to as livestock. Information about reindeer herds has been derived from the International Centre for Reindeer Husbandry webpage [44], which provides the number of reindeer within herders at approximate locations. A plausible spatial distribution of all herd communities was obtained by gauging additional information from the report by Magga et al [45]. From these two sources we inferred the spatial distribution of reindeer abundance within the study area. To obtain a realistic estimate of population density (number of animals/km 2 ), the spatial abundance pattern was randomly downscaled using a negative binomial distribution (with mean corresponding to the average density of reindeer in the study area) to describe the observed clumping of herders [46]. In addition, the abundance of migratory herds of caribou (North America and Greenland) and wild reindeer (Russia and Norway) was retrieved from the Report Card of the Arctic Program of the U.S. National Oceanic and Atmospheric Administration [47]. We then applied the same procedure described for reindeer herders to derive spatial grids of reindeer and caribou densities. We finally combined the derived grids and created a spatial raster that sums up all the reindeer data. Since the preparation of this dataset involved a stochastic downscaling procedure, different realizations may produce slightly different spatial distributions of reindeer density.
All the variables described so far were mapped using the Lambert azimuthal equal-area projection to avoid distortions at northern latitudes. Environmental features as well as MaxEnt outputs were processed at a spatial resolution of 5 km × 5 km. Correlation between covariates was tested and found to be low in most cases. Indeed, the pairwise Pearson correlation coefficient, ρ, was lower than 0.65 for all features (figure 2).

Model set-up and evaluation
Preliminary to MaxEnt runs, the sites where anthrax outbreaks occurred were randomly split into 70%-30% training-testing sets. Also, 10 000 points with no known outbreaks but with available information about the environmental covariates were randomly sampled within our area of interest. These background points are used by Max-Ent to contrast presences against background locations where presence/absence is unmeasured and help estimate site-specific presence probabilities [27,30].
MaxEnt was then set up by using the KUENM package [48], implemented in R software (version 4.0.3). We created 84 models by combining four sets of environmental and host variables (see table 1), seven values of the regularization parameter, used in machine learning to reduce model overfitting [30], β (i. e. β = {0.25, 0.5, 0.75, 1, 1.25, 1.5, 2}), and three sets of feature classes (linear, l, quadratic, q, and their combination, lq). Best model candidates were chosen by evaluating • the partial receiver operating characteristic (ROC) curve and the associated area under the curve (AUC), to measure overall classification performance and statistical significance. Partial ROCs were used instead of full ROCs, because more suitable for niche modeling applications, usually based on presence data only [49]. AUC values were compared against null expectations (i.e. AUC = 0.5, that is a model does not distinguish between true and false positive) by defining the AUC ratio (AUC/0.5) [48,49]. The statistical significance of each candidate model was assessed by bootstrapping 50% of the test sites 500 times and then by counting the proportion of bootstrap replicates that have AUC ratio <1; • the omission rate, E, to test whether models created using training points fit well values at testing locations [50,51]. The target omission rate was set to be below 10%; and • the Akaike information criterion corrected for small-sample size, AIC c [52], to weigh model accuracy vs. complexity. AIC c was chosen over the common AIC because of the relatively small sample size of known outbreak locations. A target ∆AIC c ≤2 was chosen, which corresponds to the default value in the KUENM package [48].
The final model was then created with the whole set of occurrences and the parameters chosen during the evaluation process described above. We produced five replicates of MaxEnt predictions by bootstrap, and calculated their mean.

Results
Out of the 84 candidate models evaluated, all resulted significant (i.e. the proportion of bootstrap replicates with AUC <1 resulted always zero), 12 met the omission rate criterion, and, out of these, only one met the AIC c condition ( figure 3(a)). The selected model includes active layer variations, soil pH, PT-α, temperature anomaly, livestock and reindeer densities (Set 3 in table 1), with a regularization parameter β = 0.25, and with both linear and quadratic features (lq). Figure 3(b) shows the ROC curve for all five replicate runs of the selected model and its average pattern. The average AUC over all the five replicates resulted 0.840, with a standard deviation of 0.006.
The modeled spatial distribution of anthrax potential suitability is shown in figure 4. The overall picture (panel a) is consistent with the distribution identified by previous studies, in particular the one conducted by Walsh et al [19]. Model results suggest higher anthrax suitability in Europe, in the south-central zone of the Eurasian continent, and between northern USA and southern Canada. On the other hand, our model predicts a comparatively lower, yet still appreciable, anthrax suitability in the Arctic region. Indeed, zooming in on this region (panel b), it becomes apparent that widespread areas within northern latitudes are potentially suitable for anthrax reemergence. In general, it can be noted that the spatial distribution of the areas projected to be potentially at risk agrees with higher values of temperature anomaly and active-layer variation rates (see figure 2), confirming the importance of these variable in determining potential anthrax reemergence and diffusion.

Discussion and conclusions
We applied the MaxEnt statistical learning algorithm to study the spatial distribution of anthrax disease outbreaks. Specifically, the present work constituted a first attempt at investigating the spatial suitability of the Arctic region for the reemergence of anthrax.
To this purpose, we included in our model Arcticspecific information related to e.g. reindeer herders and permafrost active-layer thawing. As a matter of fact, the distribution of reindeer herds is directly linked to the magnitude of the pool of organisms that are susceptible to the disease, a primary driver for outbreak risk, while permafrost dynamics is a typical feature of the Arctic environment, one that has been linked to the reemergence of anthrax transmission in this region [29]. The overall resulting projections of anthrax spatial distribution (figure 4) agree well with previous studies [16,19]. Regarding Arctic regions, environmental niche modeling suggests that anthrax may be more likely to reemerge in areas with higher values of temperature anomaly and active-layer rate of change. The former variable has been used in a previous study to identify the influence of climate warming [19]. The latter is an original addition of this work, whose inclusion allowed us to highlight the specific effects of warming temperatures in the Arctic environment, especially in terms of increasing active-layer thawing rates. Therefore, our results stress once more the importance of including the region-specific effects of climate change in any modeling effort aimed to evaluate the spatial patterns of potential anthrax suitability, in particular within the Arctic region, where higher temperatures have led to drastic modifications of the environment [3,53,54]. Regarding the other environmental variables that prove instrumental to risk mapping, anthrax spores are known to be sensitive to soil pH, water content, OC, and temperature, whose values are critical for their maintenance [18,19,55]. The selection of livestock and reindeer distributions as covariates of the risk map, instead, highlights the importance of host populations for anthrax transmission and proliferation.
In general, we note that an exact detection of areas potentially at risk does not seem completely at reach yet, mostly for the limitation of the currently available data. For example, our results indicate low probability of anthrax transmission in the Yamalo-Nenets region, where the 2016 outbreak occurred. In this regard, we acknowledge that our analysis is based on data of possibly inconsistent quality. Therefore, the presented results should be intended as preliminary, and the identification of the spatial locations characterized by higher potential risk of anthrax transmission should be interpreted with caution as well. For example, it is worth remarking that the reindeer distribution has been statistically derived starting from information available at a larger scale. By contrast, ground-truthed population density estimates, possibly complemented by the identification of summer grazing areas and migration routes, would make reindeer information more accurate, thereby improving our estimates of spatial anthrax suitability.
While collecting the data for our analysis, we also noted a general under-representation of the Arctic region in the reporting of anthrax outbreaks. In fact, of the 149 reported episodes accounted for in our dataset, just 3 referred to this region, in contrast with the reported presence of infected burial sites [15]. In this regard, knowing the exact locations of infected burial sites would represent a crucial piece of information to improve the accuracy of model-based risk projection. The availability of reliable, georeferenced information would make it possible to focus more specifically on the Arctic, thus allowing a better qualification of its peculiar environmental characteristics, with possible implications for risk projections. Compared to previous studies, our application suggests that neglecting the proxies related to Arctic-specific features, such as active-layer depth and reindeer herding practice, would lead to a likely misrepresentation of the potential suitability of northern latitudes for anthrax diffusion. By allowing a more accurate identification of areas subject to high (albeit potential) risk of anthrax reemergence, improved data availability could thus allow herding communities to face the risk associated with the reemergence and transmission of this pathogen in an informed way, thereby possibly helping them to adapt their practices and prevent future infections.

Data availability statement
No new data were created or analyzed in this study. The study is based upon available data-sets. Corresponding links are reported as references.