Distribution models calibrated with independent field data predict two million ancient and veteran trees in England

Abstract

Large, citizen‐science species databases are powerful resources for predictive species distribution modeling (SDM), yet they are often subject to sampling bias. Many methods have been proposed to correct for this, but there exists little consensus as to which is most effective, not least because the true value of model predictions is hard to evaluate without extensive independent field sampling. We present here a nationwide, independent field validation of distribution models of ancient and veteran trees, a group of organisms of high conservation importance, built using a large and internationally unique citizen‐science database: the Ancient Tree Inventory (ATI). This validation exercise presents an opportunity to test the performance of different methods of correcting for sampling bias, in the search for the best possible prediction of ancient and veteran tree distributions in England. We fitted a variety of distribution models of ancient and veteran tree records in England in relation to environmental predictors and applied different bias correction methods, including spatial filtering, background manipulation, the use of bias files, and, finally, zero‐inflated (ZI) regression models, a new method with great potential to investigate and remove sampling bias in species data. We then collected new independent field data through systematic surveys of 52 randomly selected 1‐km2 grid squares across England to obtain abundance estimates of ancient and veteran trees. Calibration of the distribution models against the field data suggests that there are around eight to ten times as many ancient and veteran trees present in England as the records currently suggest, with estimates ranging from 1.7 to 2.1 million trees compared to the 200,000 currently recorded in the ATI.
The most successful bias correction method was systematic sampling of occurrence records, although the ZI models also performed well, significantly predicting field observations and highlighting both likely causes of undersampling and areas of the country in which many unrecorded trees are likely to be found. Our findings provide the first robust nationwide estimate of ancient and veteran tree abundance and demonstrate the enormous potential for distribution modeling based on citizen‐science data combined with independent field validation to inform conservation planning.
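The scale-up reported above can be sketched as a simple density calibration: compare the tree density observed in the independently surveyed squares with the density implied by existing records, then rescale the national record count. The per-square counts below are illustrative placeholders, not the study's data:

```python
# Hypothetical sketch of the calibration logic behind the 8-10x estimate.
# All per-square numbers here are illustrative only, not the survey results.

recorded_total = 200_000      # trees currently recorded in the ATI
n_squares = 52                # independently surveyed 1-km2 grid squares

# Illustrative mean counts per surveyed square (stand-in values):
observed_per_square = 9.0     # trees found by systematic field survey
recorded_per_square = 1.0     # ATI records falling within those squares

# If surveys find ~9x more trees than the records contain, the national
# total scales up by the same undersampling factor.
undersampling_factor = observed_per_square / recorded_per_square
estimated_total = recorded_total * undersampling_factor
print(f"Estimated total: {estimated_total:,.0f} trees "
      f"(~{undersampling_factor:.0f}x the recorded count)")
```

With these stand-in densities the estimate lands at 1.8 million trees, inside the 1.7–2.1 million range reported above.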


Species distribution model parameter tuning and evaluation of alternative methods of splitting training and test data using Maximum Entropy modelling
Section S1: Introduction

When fitting any species distribution model (SDM), the choice of parameters can strongly influence the model's accuracy and performance (Fourcade et al., 2018). Maximum Entropy modelling has a variety of adjustable parameters that can be used to tune the model and produce a model of best fit (Phillips et al., 2009). Two of these parameters, feature class (FC) and regularisation measure (RM), are among the most useful and allow a balance to be struck between overfitting and goodness of fit (Muscarella et al., 2014; Fourcade et al., 2018). In addition, the method of splitting training and test data for model evaluation has been shown to strongly influence model performance (Wenger & Olden, 2012; Bahn & McGill, 2013). One of the most common methods involves selecting a random proportion of occurrence (and background) records, usually between 20 and 50%, as a 'pseudo-independent' test data set (Fielding & Bell, 1997). Alternative methods involve splitting the data into k groups (k-fold cross-validation), with each group used in turn as the test data and the remaining groups as the training set. However, these methods of random splitting have been criticised for underestimating model error and for being affected by spatial autocorrelation (Burnham & Anderson, 2003; Araújo et al., 2005). Instead, non-random, geographical splitting of the data may be more appropriate, and can test the extrapolation ability of the model (Radosavljevic & Anderson, 2014; Roberts et al., 2017). Initial analysis was carried out to evaluate the best method of splitting the test and training data, as well as the best tuning-parameter combination of FCs and RMs, for the baseline SDM of ancient and veteran trees across England with no bias correction method.

Section S2: Methods and analysis
MaxEnt models of ancient and veteran tree distributions across England were tuned and fitted in R (R Core Team, 2018) using the 'ENMeval' package (Muscarella et al., 2014). Initial tuning across all combinations of FCs ('L', 'LQ', 'LQP' and 'LQPTH') and RMs of 0.5, 1, 2, 3, 4, and 5 was undertaken for the model with no bias correction method applied. Model predictive power was evaluated using three methods of splitting the data into training and test sets. The first method involved geographic splitting of the data into four spatial blocks, from which one was randomly assigned as test data and the others as training data ('Block'). The second method was similar but split the data into a spatial checkerboard design ('Check'), dividing the area into bins at the resolution of the raster predictors. The final method used 10-fold cross-validation ('Kfold'). The splitting was carried out 10 times, with a separate model run for each, resulting in a total of 720 models (four feature classes (FC) × six regularisation measures (RM) × three splitting methods × 10 repetitions). Model performance was evaluated using the corrected Akaike information criterion (AICc) and the area under the curve (AUC). Generalised linear mixed models (GLMMs) were used to analyse significant differences in model performance (AICc and AUC) in relation to FC, RM and splitting method. GLMMs were fitted in R using the package 'lme4' (Bates et al., 2015) separately for training and test data, specifying a Gaussian distribution, and included splitting method, FC and RM as fixed factors, and repetition run as a random factor. Backward selection based on AIC was used to find the most parsimonious model with the most influential predictors.
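The tuning grid and the AICc score used to compare models can be sketched as follows (Python for illustration; the study ran this in R via 'ENMeval', and the AICc helper simply restates the standard small-sample formula, not the authors' code):

```python
# Sketch of the tuning grid: 4 feature classes x 6 regularisation measures
# x 3 splitting methods x 10 repetitions = 720 model runs, each scored
# by the corrected Akaike information criterion (AICc).
from itertools import product

feature_classes = ["L", "LQ", "LQP", "LQPTH"]
reg_measures = [0.5, 1, 2, 3, 4, 5]
split_methods = ["Block", "Check", "Kfold"]
repetitions = range(10)

grid = list(product(feature_classes, reg_measures, split_methods, repetitions))
print(len(grid))  # 720

def aicc(log_likelihood, k, n):
    """AICc = AIC + 2k(k+1)/(n-k-1): AIC with a penalty that grows when
    the number of parameters k is large relative to the sample size n."""
    aic = -2 * log_likelihood + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)
```

As n grows large relative to k the correction term vanishes and AICc converges to ordinary AIC.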

Section S3: Results
Model performance and predictive power differed significantly across splitting method, FC and RM (Appendix S2: Table S1). When considering each parameter separately, the most effective tuning parameters based on both mean AICc and AUC (train and test) were the 'Kfold' splitting method, FC 'LQ' and RM 5 (Appendix S2: Fig. S1 & S2). However, when considering interactions between parameters, an increase in RM only had a significantly positive influence on model performance (AICc) across FCs 'LQP' or 'LQPTH', and had little effect on models with 'L' or 'LQ' FCs (Appendix S2: Fig. S1). Therefore, the choice of RM when using either 'L' or 'LQ' FCs appears to be of little consequence, and the default value of 1 may be the best choice. Additionally, there was a significant interaction between splitting method and FC (Appendix S2: Table S1), with significantly poorer model performance with FC 'LQP' or 'LQPTH', particularly for the 'Block' splitting method (Appendix S2: Fig. S1).
Therefore, based on AICc, the best tuning parameters are the 'Kfold' splitting method and FC 'LQ', with any RM. When considering AUC, all parameters and interactions had a significant influence on model predictive power (Appendix S2: Table S1). Again, the worst-performing models used the 'Block' splitting method, the 'LQP' and 'LQPTH' FCs and lower RMs, particularly when assessing the test data (Appendix S2: Fig. S2).

Section S4: Conclusion
The choice of tuning parameters is an important step in model fitting, as is the division of the training and test data for model evaluation. The choice of parameters is highly model-specific and should be made before fitting and interpreting any SDM. In all cases, 'Kfold' data splitting was the most effective way to divide training and test data, regardless of any other parameter. Therefore, we have chosen to use this method in all subsequent bias correction models. For the baseline model of ancient and veteran tree distributions with no bias correction, the combination of parameters producing the model with the highest performance and fit, as well as the greatest predictive power, was FC 'LQ' with RM 5, hence these parameters were chosen for this model. For all other sampling-bias-corrected SDMs, models were fitted using all combinations of FC and RM, as the best combination is likely to be highly variable across models. The best model for each bias correction method was then chosen based on AICc.

[Table/figure caption fragments: splitting method ('Block', 'Check' or 'Kfold'), feature class (FC) ('Linear (L)', 'Linear and Quadratic (LQ)', 'Linear, Quadratic and Product (LQP)' or 'Linear, Quadratic, Product, Threshold and Hinge (LQPTH)') and regularisation measure (RM) (0.5, 1, 2, 3, 4 and 5); mean values (±SE) shown across the 10 repetitions of model fitting.]
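The final AICc-based selection step described above can be sketched as follows (illustrative Python; the scores dictionary contains stand-in values, not fitted results):

```python
# Sketch of choosing the best (FC, RM) combination per bias correction
# method: fit all combinations, then keep the one with the lowest AICc.
from itertools import product

feature_classes = ["L", "LQ", "LQP", "LQPTH"]
reg_measures = [0.5, 1, 2, 3, 4, 5]

def select_best(aicc_scores):
    """Return the (FC, RM) combination with the lowest AICc score."""
    return min(aicc_scores, key=aicc_scores.get)

# Illustrative stand-in scores: pretend FC 'LQ' with RM 5 fits best
scores = {(fc, rm): 1000.0 + i
          for i, (fc, rm) in enumerate(product(feature_classes, reg_measures))}
scores[("LQ", 5)] = 900.0

print(select_best(scores))  # ('LQ', 5)
```

In the study this selection was repeated independently for each bias correction method, since the best-fitting combination varies across models.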