Nested Species Distribution Models of Chlamydiales in Ixodes ricinus (Tick) Hosts in Switzerland

Ixodes ricinus is the vector of pathogens including the agent of Lyme disease, the tick-borne encephalitis virus, and the less well-known Chlamydiales bacteria, which are responsible for certain respiratory infections. In this study, we identified the environmental factors influencing the presence of I. ricinus and Chlamydiales in Switzerland and generated maps of their distribution from 2009 to 2018. We found an important expansion of suitable areas for both the tick and the bacteria during the last decade. Results also provided the environmental factors that determine the presence of Chlamydiales within ticks. Distribution maps as generated here are expected to bring valuable information for decision makers in controlling tick-borne diseases in Switzerland and establishing prevention campaigns. The methodological framework presented could be used to predict the distribution and spread of other host-pathogen pairs to identify environmental factors driving their distribution and to develop control or prevention strategies accordingly.


Initial data
• SRTM Digital Elevation Model • Spatial resolution: 90 m • Source: NASA Shuttle Radar Topography Mission (https://www2.jpl.nasa.gov/srtm/), with hole-filled version processed by the CIAT Agroecosystems Resilience project (Jarvis, 2008) • URL: http://srtm.csi.cgiar.org/ Pre-processing • The two tiles covering Switzerland were downloaded and merged using the QGIS 2.14.7 software (function merge from GDAL). • The merged dataset was then resampled at a 100-m resolution and cropped to the Swiss extent using the SAGA "resampling" tool accessed from QGIS 2.14.7 and using the interpolation method: mean value (cell-area weighted). • Null values were assigned for all pixels outside the Swiss borders.
Slope, Aspect, General Curvature (GC) • Method: These indicators were computed in SAGA GIS 2.3. using the tool "Terrain Analysis > Morphometry > Slope, Aspect, Curvature", with the method "9 parameter 2nd order polynom" (Thorne et al., 1987) and the units defined in degrees.

Morphometric Protection Index (MPI)
• Definition: This indicator provides a dimensionless index expressing how well an area is protected from the surrounding relief, based on the analysis of the environment surrounding each pixel up to a given distance. It is equivalent to the positive openness described by Yokoyama et al. (2002). • Method: This indicator was computed in SAGA GIS 2.3.2 using the tool "Terrain Analysis > Morphometry > Morphometric Protection Index and the default parameters (the relief in the surrounding 2km of each pixel is taken into account).
Terrain Ruggedness Index (RI) • Definition: This indicator compares the elevation of one pixel with the elevation of the neighbouring pixels to provide a measure of terrain heterogeneity (Riley et al., 1999). • Method: This indicator was computed in SAGA GIS 2.3.2 using the tool "Terrain Analysis > Morphometry > Terrain Ruggedness Index" with the default parameters (Radius (Cells)=1 indicating that one neighbour cell is considered in each direction).

Sky-view factor (SVF)
• Definition: This indicator provides an indication of the portion of sky that is obstructed by the surrounding relief: 0 = completely obstructed, 1=completely visible (Böhner and Antonić, 2009, p. 8) • Method: This indicator was computed in SAGA GIS 2.3.2 using the tool "Terrain Analysis > Lighting, Visibility > Sky view factor" with the default parameters (Maximum search radius = 10 km).
Topographic Wetness Index (TWI) • Definition: This indicator is defined from the ratio of the catchment area (area draining water to a given cell) to the local slope (indicator of the capacity to evacuate the water received) and is used as a proxy of soil moisture (Kopecký and Čížková, 2010). • Method: First we computed the specific catchment area in SAGA GIS 2.3.2 using the tool "Terrain Analysis > Hydrology > Flow Width and Specific Catchment Area" with the default parameters (Aspect method). The TWI was then computed in SAGA GIS 2.3.2 using the tool "Terrain Analysis > Hydrology > Topographic Wetness Index" with the default parameters (Standard method).

Land Cover
Land cover classification • Land cover classification in 6 classes : artificial areas, grass and herb vegetation, brush vegetation, tree vegetation, bare land and watery areas • Spatial resolution : 100 m • Source: Swiss Federal Statistical Office (OFS, 2017) • URL: https://www.bfs.admin.ch/bfs/fr/home/statistiques/espaceenvironnement/nomenclatures/arealstatistik/nolc2004.html • Processing: the only processing was to rasterise the data using the function rasterise in QGIS 2.14.7 (the initial data was available as a .csv file)

Processing
• First, the raster with a spatial resolution of 25m was resampled in QGIS 2.14.7 to a raster with a spatial resolution of 12.5 m using the function "resample" with the nearest neighbour method. • A percentage of conifers was then assigned to each 12.5m pixel according to the classification proposed by OFS: 0 = no-forest => 0 % coniferous 1 = coniferous forest => considered 100% coniferous 2 = mixed forest predominantly coniferous => considered 70% coniferous 3 = mixed forest predominantly broadleaved => considered 30% coniferous 4 = broadleaved forest => considered 0% coniferous 9 = unclassified => considered no forest => 0% coniferous • The 12.5 m raster was aggregated to a 100 m target grid, by computing for each target cell the average percentage of coniferous using the tool "zonal statistics" in QGIS 2.14.7. • The resulting grid was rasterised using the "rasterise" function in QGIS 2.14.7.

Initial Data
• Vector landscape model SwissTLM3D from 2016 • Source: Swiss Federal Office of Topography (O'Sullivan et al., 2008) • URL: https://shop.swisstopo.admin.ch/en/products/landscape/tlm3D Processing • All the elements characterising watery areas were extracted from the landscape model o For running water: the lines "Fliessgewaesser" and the polygons "Fliessgewaesser" extracted from the LandCover (Bodenbedeckung) polygons o For stagnant water: the lines "Stehendes Gewasser" and the polygons " "Stehendes Gewasser" extracted from the land cover (Bodenbedeckung) polygons o For the wetlands: the polygons "Feuchtgebiet" extracted from the land cover (Bodenbedeckung) polygons • The vector layers were rasterised using the "rasterise" function in QGIS 2.14.7.
• For each pixel, the minimal Euclidean distance to each water category was then computed using the function "Raster > Analysis > Proximity" in QGIS 2.14.7. This resulted in three raster layers, representing the minimum distance to running water elements, stagnant water and wetlands, respectively. • Finally, the minimum of the three raster files was used as the minimal distance to any watery element.

Initial Data
• MODIS Terra 16-days composite NDVI • Definition: The 16-day composite NDVI is produced on 16-day intervals and provide an indicator of the greenness of the vegetation during these 16 days. NDVI is derived from the reflectance in the red and near-infrared (NIR) bands obtained from the images of the MODIS satellite.

= − +
A large amount of red wavelengths are absorbed by the vegetation during photosynthesis, while the near infrared is reflected, in a proportion that depends in particular on the leaf area index. Land covered by vegetation will therefore show a large difference between NIR and red reflectance, resulting in high NDVI values.

Processing
• We downloaded all images for the years 2006 to 2019.
• The hdf4 files were converted to .tif format using the "gdal_translate" function in R (R Development Core Team, 2008) • The MODIS data being in sinusoidal projection, rasters were reprojected in the CH1903/LV03 projection system using the "projectRaster" function of the "raster" package in R • The files were cropped and resampled to a 100m resolution using the "crop" and "resample" function from the "raster" package in R • For each pixel, the monthly mean values were then computed in R.
• Finally, remaining null values were replaced by the average value of the neighbouring pixels using the "focal" function from the "raster" package in R.

Derived variables
Four indicators were derived for the period of interest (1,3,12,24

Derived variables
First, 15 indicators were derived for the period of interest (1,3,12,24 or 36 months before the sampling date). Some of these indicators are very close to the worldclim bioclimatic variables (https://worldclim.org/data/bioclim.html). They were computed in R using two custom R functions (one defined for the treatment of data frame and the other for raster layers). The two functions are available in: https://github.com/estellerochat/SDM-Chlamydiales.

Processing
• The daily grids were imported in R • The daily relative humidity was computed using the same procedure as in Zimmermann et al. (2001) o Compute the average daytime temperature following Running et al. (1987) = 0.394 + 0.606 o Compute ambient vapour pressure using the Tetens equation for temperatures above 0°C (Murray, 1966) and minimum temperature as an approximation of dew point temperature (Running et al., 1987) = 610.78 exp � 17.269 237.3 + � o Compute the potential vapour pressure of saturated air for daytime temperature using the Tetens equation for temperatures above 0°C (Murray, 1966)  = * 100 • The daily relative humidity grids were then aggregated to compute four monthly grids:

Ixodes ricinus
Sampling date Distribution of sampling dates (month and year) of the occurrence dataset (presences, 2293 points) and selected background points below 1500 m (6050 points).

Altitude
Distribution of altitude values of the occurrence dataset (presences, 2293 points) selected background points below 1500 m (6050 points).

Sampling date
Distribution of sampling dates (month and year) of the occurrence dataset (presences, 186 points) and background points (1029 points).

Altitude
Distribution of altitude values of the occurrence dataset (presences, 186 points) and background points (1029 points).

Supp. File 5 -Ixodes ricinus models
This tab provides the list of all models tested for the distribution of Ixodes ricinus. The mean and standard deviation (sd) values over the 20 runs are given for each of the evaluation parameters. reg is the value of the regularization multiplier. features indicate the features used (l=linear, lp=linear and product, lq = linear and quadratic, lpq=linear product and quadratic). OE_test is the omission error on the testing dataset and OE_indep on the independent dataset. # coeff is the number of non-zeros coefficients estimated by the model. The ranks (1)(2)(3)(4) correspond to the ranking procedure defined in the method section. The final rank gives the final ranking of the models (1=best model, parameters selected for the final modelling).

Supp. File 6 -Ixodes ricinus suitability maps
Maps of suitability predicted based on the "best" model presented in the paper.

Supp. File 7 -Chlamydiales models
This tab provides the list of all models tested for the distribution of Chlamydiales. The mean and standard deviation (sd) values over the 20 runs are given for each of the evaluation parameters. reg. Is the value of the regularization parameter. feat. indicates the features used (l=linear, lp=linear and product, lq = linear and quadratic, lpq=linear product and quadratic). med suit. P 2009 (resp. 2018) is the median of the suitability predicted on presences points from 2009 (resp. 2018). med suit. "A" 2009 (resp. 2018) is the median of the suitability predicted at sites where no Chlamydiales were found ("absences") in 2009 (resp. 2018). # coeff is the number of non-zeros coefficients estimated by the model. The ranks (1)(2)(3)(4) correspond to the ranking procedure defined in the method section. The final rank givse the final ranking of the models (1=best model, parameters selected for the final modelling).

Supp. File 8 -Chlamydiales : T-test and selection of variables
For the signification of the acronym names, please refer to Supp. File A2.3.

T-test
For each variable and buffer radius, the heatmap below shows the results of the T-test. Only results that were significant according to the p-value of the T-test are shown (grey area = no significant results). The numbers on the cells indicate the time period considered before sampling date (in number of months) which resulted in the highest T-value for the given combination of variable and buffer radius. Numerical values are available in the following