Predicting the future distribution of antibiotic resistance using time series forecasting and geospatial modelling [version 1; peer review: awaiting peer review]

Background: Increasing antibiotic resistance in a location may be mitigated by changes in treatment policy, or interventions to limit transmission of resistant bacteria. Therefore, accurate forecasting of the distribution of antibiotic resistance could be advantageous. Two previously published studies addressed this, but neither study compared alternative forecasting algorithms or considered spatial patterns of resistance spread. Methods: We analysed data describing the annual prevalence of antibiotic resistance per country in Europe from 2012 – 2016, and the quarterly prevalence of antibiotic resistance per clinical commissioning group in England from 2015 – 2018. We combined these with data on rates of possible covariates of resistance. These data were used to compare the previously published forecasting models, with other commonly used forecasting models, including one geospatial model. Covariates were incorporated into the geospatial model to assess their relationship with antibiotic resistance. Results: For the European data, which was recorded on a coarse spatiotemporal scale, a naïve forecasting model was consistently the most accurate of any of the forecasting models tested. The geospatial model did not improve on this accuracy. However, it did provide some evidence that antibiotic consumption can partially explain the distribution of resistance. The English data were aggregated at a finer scale, and expectedtrend-seasonal (ETS) forecasts were the most accurate. The geospatial model did not significantly improve upon the median accuracy of the ETS model, but it appeared to be less sensitive to noise in the data, and provided evidence that rates of antibiotic prescription and bacteraemia are correlated with resistance. Open Peer Review Reviewer Status AWAITING PEER REVIEW Any reports and responses or comments on the article can be found at the end of the article. Page 1 of 18 Wellcome Open Research 2020, 5:194 Last updated: 19 AUG 2020


Introduction
Globally, the prevalence of antibiotic resistance is growing at an alarming rate [1][2][3] . This is a huge threat to public health which is further compounded by the fact that the rate of discovery of new antibiotics has fallen to almost zero 4 . It is now understood that the continued use of antibiotics in medicine, agriculture, and aquaculture creates the selection pressure which has driven the increase in the prevalence of resistant bacteria 2,5 . However, rates of resistance are highly heterogeneous between locations, reflecting the impact of multiple covariates on the observed prevalence of antibiotic resistance 6 . These include: rates of antibiotic use 5 ; seasonality 5 ; prevalence of disease particularly within health-care settings which promotes increased antibiotic consumption; poor quality antibiotics 7 ; and patient adherence to prescribed antibiotic regimens 8 . Socio-economic status has also been observed to be correlated with antibiotic resistance, although it is probable that this is confounded by other factors such as disease prevalence and inappropriate consumption of antibiotics 9,10 .
Recommendations for changes to clinical practice and public health policy have been designed to limit the future spread of antibiotic resistance [11][12][13][14] . Additionally, therapeutic solutions to treating multidrug resistant (MDR) infections are under development [15][16][17][18][19] . However, to maximise the efficacy of these clinical interventions, they should be implemented as early as possible when resistance to particular antibiotics is observed to be rising. For this to be achieved, active surveillance of antibiotic resistance is essential 20 , and the ability to accurately forecast the future spatial and temporal distribution of antibiotic resistance from surveillance data could be highly informative.
Most published mathematical models of antibiotic resistance epidemiology are mechanistic models of bacterial evolution 21 , or transmission of disease 22 , from which trends in the prevalence of antibiotic resistance are inferred [21][22][23][24][25][26][27] . These studies normally use small data sets from a single hospital or clinical trial, typically with the intention of assessing the efficacy of an intervention 26 or improving understanding of the spread of antibiotic resistance and the factors which promote it 27 . Although these studies provide useful insight into the biology of resistance they are likely to be unsuitable for making future forecasts of the spatiotemporal distribution of antibiotic resistance 28 .
It has been demonstrated that model structure can have a highly significant influence on the predictions of epidemiological studies 29 . Therefore, initial testing can employ flexible forecasting algorithms, to identify methods that might be broadly generalisable to different species, antibiotics, and settings. Such models learn their parameters from the data, independent of assumptions regarding the population structure or evolutionary biology of the species in question. To our knowledge there have only been two previous attempts to forecast the future prevalence of antibiotic resistance from retrospective antimicrobial susceptibility testing (AST) data using a model which satisfies these conditions: Lopez-Lozano et al., 2000 30 and Teodoro and Lovis, 2013 28 . Lopez-Lozano et al. applied an autoregressive integrated moving average (ARIMA) model to a time series of AST data taken from a single hospital and combined it with pharmacy data as a proxy for antibiotic consumption to investigate the correlation between these two factors, whilst also producing future forecasts. A similar approach was taken by Gharbi et al., 2015 31 . However, they only included meropenem consumption as an explanatory variable, not retrospective AST data, so their model is only applicable if detailed retrospective data on antibiotic prescription or consumption are available. A more complex approach was taken by Teodoro and Lovis, 2013 28 , who applied empirical mode decomposition (EMD) followed by k-nearest neighbour (KNN) analysis to an AST time series from a hospital in Geneva to generate forecasts.
Whilst the authors of both these papers demonstrate their method produces reasonably accurate forecasts, there are important limitations to both studies. Firstly, they only include data from a single geographic location and therefore cannot make inference regarding the future spatial spread of antibiotic resistance.
Secondly, the output of the model devised by Teodoro and Lovis does not include any estimate of uncertainty. This is a limitation of many machine learning models, which do not implicitly provide any measurement of error, unlike ARIMA, which is constructed within a frequentist framework allowing standard error to be inferred from the data 32 .
Finally, neither study presents any comparison of their chosen method to other methods of time series analysis (other than a benchmark, random-walk model). Reviews in the econometrics literature have demonstrated that there are large differences in the accuracy of alternative forecasting models, and the most accurate method is often related to the type of data [33][34][35][36] . Given that neither Lopez-Lozano et al. 30 or Teodoro and Lovis 28 present such a comparison, the optimal method of forecasting from AST data remains unclear.
The primary objective of this study was to build on the work of Tedoro and Lovis, 2013 28 and Lopez-Lozano et al., 2000 30 to develop models for accurately forecasting the dissemination of antibiotic resistance. Specifically, we aimed to: 1. Determine the optimal method for predicting future prevalence of antibiotic resistance within a single geographic location from retrospective surveillance data.
2. Extend the model to produce accurate forecasts of the spatiotemporal distribution of antibiotic resistance.
3. Incorporate covariates into the model to investigate their relationship with the outcome and observe how they impact on forecast accuracy.

Data
Two open access data sets describing the prevalence of antibiotic resistance over time across a given geographic region were used for this study. The United Kingdom) were used for the analyses of fluoroquinolone resistance. These countries and antibiotics were chosen because data describing the estimated annual consumption of these antibiotics in these countries was available from the European Surveillance of Antimicrobial Consumption Network (ESAC-Net) 37 .
Additional covariates included in the analysis of the European data were annual gross domestic product (GDP) per capita, estimated by the International Monetary Fund (IMF) 38 , and annual rate of influenza-like-illness (ILI) per 100,000 people, estimated by the World Health Organisation (WHO) 39 .
The second dataset was the Fingertips database, including antibiotic resistance surveillance data from England 40 . This described the quarterly percentage of E. coli isolates in each of 195 clinical commissioning groups (CCGs) that were nonsusceptible to treatment with either ciprofloxacin, gentamicin, piperacillin/tazobactam, or 3rd generation cephalosporins (3gC), between the final quarter of 2015 and the final quarter of 2018. We also included the rate of E. coli bacteraemia per CCG per quarter from this dataset in our analysis 40 . These data were merged with information on the rate of prescribing of each of the above antibiotics per CCG from the Open Prescribing database 41 measured in the number of items prescribed per month, which was divided by the population in each CCG, estimated by the Office of National Statistics 42 , and multiplied by 1000 to give the number of items prescribed per 1000 people. We then aggregated the prescription data to quarters to allow comparison with the antibiotic resistance data.
Extended data, Figure 1 and Figure 2 shows the location of each country represented in the European data and the CCG boundaries in England respectively 43 .

Single time series forecasting models
Four alternative forecasting models were tested on the data from each location individually. These were autoregressive integrated moving average (ARIMA), which is the model used by Lopez-Lozano et al. 30 ; expected-trend-seasonal (ETS) forecasting 32 ; neural network (NN) autoregression forecasting 32 , and the model developed by Teodoro and Lovis 28 (from hereon referred to as Teodoro and Lovis forecasting) which is summarised in the following paragraphs. ARIMA, ETS, and NN autoregression are standard methods of time series prediction, described in 32. Finally, a naïve forecasting model (every point in the future is equal to the last observed value in a time series) was used as a benchmark comparison of forecast accuracy 32 .
Teodoro and Lovis forecasting begins by applying empirical mode decomposition (EMD) to the time series. This decomposes the data into a set of intrinsic mode functions (IMFs), defined as a function satisfying two properties 44 : 1. The number of extrema must be equal to, or one more or less than, than the number of zero crossings.
2. At any given point in the time series the mean value of the envelopes defined by the local maxima and minima equals zero.
EMD is conducted iteratively by first identifying the local maxima and minima of a time series and connecting them with a cubic spline to define the upper envelope e max(t) and the lower envelope e min(t) . At each point in the time series the mean of the upper and lower envelope is calculated according to A time series decomposed with EMD can be expressed as: where c is an IMF, n equals the total number of IMFs, and r is the residual. The IMFs and the residual are collectively referred to as components of the original time series.
Teodoro and Lovis then remove the highest frequency components as these are assumed to be associated with random noise. They hypothesise that the remaining components are associated with seasonalities of decreasing frequency and finally the overall temporal trend in the rate of antibiotic resistance.
Having selected the relevant, low frequency components of the time series, each component is then embedded into delay vectors. This converts the component into a series of vectors each representing one point in the component and containing the previous m data points. The value of m is selected using a method derived by Chakrabarti and Faloutsos, 2002 45 who use fractal dimensions to specify the optimal value of m. They increment m between 1 and 10 and calculate the fractal dimension of the resulting matrix using a box-counting algorithm 46 . After some value of m the fractal dimension of the matrix plateaus and this is chosen as the optimal value of m for the delay embedding vectors.
Finally, k-nearest neighbour (kNN) analysis is applied to the components in the m-dimensional embedding space. Cross validation is used to select the optimal value of k which gives the number of points closest in time to the output (y t+1 ) to be used to generate a forecast. The mean of these k variables gives the output of the kNN algorithm. This is done for each low frequency component of the time series and the one-step ahead forecast is calculated as the sum of each of these values. Multi-step ahead forecasts are calculated by iteratively concatenating each predicted value with the time series and repeating the process.

Spatiotemporal modelling
Classically, geostatistical data has been modelled as a continuous Gaussian field 47 . This is an n-dimensional process from which any subset of values can be described by a multivariate Gaussian distribution, defined by its mean and a covariance function. Having fitted a model of this kind to spatially indexed data it is possible to use Markov chain Monte Carlo (MCMC) methods to sample from a posterior distribution defined by the Gaussian field and the data. However, this is a computationally expensive method which scales very poorly (O(n 3 )) due to the high cost of performing algebraic operations with large, dense covariance matrices 48 . As an alternative to this, we employed an approximate Bayesian computation (ABC) method that uses stochastic partial differential equations (SPDE) to derive an approximate probability distribution from which inference can be made using integrated nested Laplace approximations (INLA) 48,49 .
The INLA approach first requires a Gaussian field to be approximated as a discretely indexed Gaussian Markov random field (GMRF) 48 . This is defined on a regular lattice with a precision matrix Q (the inverse of the covariance matrix). The assumption that is made in formulating a GMRF is that the conditional distribution of a single element x i is only determined by it's neighbouring elements δ i . Formally where π denotes a probability distribution. Therefore, a sparse precision matrix can be defined, which only need specify the conditional relationship between each element and it's immediate neighbours. For example, an autoregressive model of order 1 32 defined as modelled as a GMRF would have the following precision matrix: where all the unspecified elements are zeros. SPDE describes the method used to define a GMRF for a given spatiotemporal model.
The GMRF itself, formulated with SPDE, has a lattice (or mesh) structure where observed data points are triangulated between vertices of the lattice. Any point in the GMRF X(s) can then be described as a piecewise combination of functions each corresponding to a vertex in the lattice: where n is the total number of edges connected to point X(s), ψ l is a basis function corresponding to edge l and ω l is the Gaussian distributed weight associated with the edge defined by the covariance matrix Q.
To complete the spatiotemporal model, a 1-dimensional mesh is constructed using SPDE with vertices connecting each point x i in the 2-dimensional mesh at time t, with the same location x i at times t + 1 and t -1.
Given a spatiotemporal model formulated as a GMRF by utilising SPDE, the INLA algorithm can be applied to approximate the posterior distributions of parameters of interest using deterministic simulations 49 . This is considerably more efficient than stochastically sampling from the posterior distribution of a continuously indexed field using an MCMC algorithm. For full details see Rue et al. 49 , Bakka et al. 50 and Lindgren et al. 47 .

Modelling covariates
As described above, we also gathered data on covariates that were likely to be associated with the prevalence of antibiotic resistance. An advantage of all the algorithms described above is that they can easily be extended to include covariates by including another set of inputs, each corresponding to the value of a covariate in a given time and location.
Evaluating forecast accuracy The forecasting models described above were applied to the European and English data sets. To allow the accuracy of each model to be determined, for an h-step ahead forecast, the time series of length T was split to remove the last h observations (the testing data). The model was then fitted to the remaining T-h observations (the training data) so the difference between the forecasted data and the real data could be observed.
The single time series forecasting models were fitted to each location individually and the INLA model was fitted to the entire data set and forecasts produced for each location. Two measures of accuracy were calculated for each forecast to allow comparison of the different models. Firstly, the root mean squared error (RMSE) (Equation 7) was calculated to assess ψ the magnitude of the forecast error. RMSE is a proper scoring rule as defined by 51, meaning the objective of the forecasting task will always be to minimise this value.
In Equation 3 h is the forecasting horizon and e i is the error associated with each forecasted point in the time series.
Secondly, the percentage of real data points that lie within the 95% prediction interval (PI) of the forecast was calculated to determine how accurately each algorithm assessed its own uncertainty.
All of the above analysis was performed in the R statistical programming language version 3.6.1 52 . The INLA model was implemented within the R-INLA package version 18.07.12 49 , and the single time series forecasting models within the forecast package version 1.26 53 .

Results
The following sections describe the results of our analyses of the European and English antibiotic resistance surveillance data sets. For each data set we first provide a comparison of the results of the temporal forecasting algorithms applied to data from individual locations; we then consider the impact of including relevant covariates in the forecasting model; and finally we present the results of the spatiotemporal forecasts produced by the INLA model. In each case the forecasting algorithms were applied to the training data, which were the first T-h time points describing the proportion of isolates that were resistant in a given location (where T is the total length of the time series, and h is the forecasting horizon). Forecast accuracy is evaluated by comparison to the final h data points in a time series (the testing data).
European data: single location forecasting We began our analysis of these data by splitting the data by country and applying single time series forecasting models with a forecasting horizon of two years to each country's data. For these data we compared three forecasting models: ARIMA, Expected-Trend-Seasonal (ETS), and Neural Network (NN) autoregression. The Teodoro and Lovis model was not applied to these data because there were only six data points meaning the Intrinsic Mode Function (IMF) algorithm nearly always returned the original time series as the residual, and no additional components. Therefore, the model was reduced to just the k-nearest neighbours (KNN) algorithm. Finally, we also included a naïve forecasting model as a benchmark comparison.
Box and whisker plots in Figure 1 show the root mean squared errors (RMSE) of each forecasting model applied to each location in the data set. The eight individual plots represent each species-antibiotic combination for Acinetobacter spp., E. coli, K. pneumoniae or P. aeruginosa, and carbapenems or fluoroquinolones.
The accuracy of the forecasts produced by the alternative models is highly variable between data sets. The relative accuracies of the four forecasting models were comparable between species and phenotypes. The neural network autoregression model consistently produced the least accurate forecasts (indicated by the greatest median RMSE) which is probably a result of the neural network overfitting to a small data set. The worst performing model, based on the highest observed RMSE (53%), was the NN model for the P. aeruginosa -fluoroquinolones data. The forecast RMSE produced by the naïve model was the lowest observed RMSE in almost every instance, followed by the ARIMA forecasts. Finally, on average the ETS forecasts had slightly greater RMSE values than the equivalent ARIMA or naïve forecast. Hence only the ARIMA method showed the potential to outperform a naïve model.
The median RMSE for each forecasting model was relatively consistent across data sets: between 0% and 7% for every species -antibiotic combination with every forecasting model. It should be noted however, that this is probably skewed by the fact that the mean value of many of the individual time series was very close to zero. As there are limits applied to the outputs of the forecasting models to prevent them from predicting values greater than 100% and less than 0%, if each observation in a time series is very close to either limit (and there is minimal fluctuation in observed values) then the forecasts will be maintained artificially close to the testing data as any trend is curtailed by the upper or lower forecast limit.
An alternative measure of model accuracy to RMSE is the percentage of observations in the training data that lie within the 95% prediction interval (PI) of the forecasting model ( Figure 2). This also demonstrates that the naïve forecasting model produces the most accurate projections with these data for the vast majority of time series, the only exception being the Acinetobacter spp.-carbapenems data, for which the mean prediction interval accuracy was slightly higher for the ARIMA forecasts. In every case the mean PI accuracy of the ETS forecasts was lower than the ARIMA and naïve forecasts, but always within one standard error of the ARIMA forecast. Finally, the PI accuracy of the NN forecasting model was as low as a 0% for many of the individual time series, and the greatest mean PI accuracy with this model was 25% for the E. coli -carbapenems data. This reflects both the poor accuracy of the point estimates of the NN model (highlighted by Figure 1), and the difficulties of estimating error outside of a defined frequentist or Bayesian framework. Prediction intervals for the NN model are obtained by performing multiple simulations randomly drawing error terms from the training data, and therefore it is a natural tendency of NN models to overestimate their own precision when making predictions outside of the range of the training data. This confirmed that the ARIMA and naïve models best predicted these data.
It is possible that when the coverage of the 95% PI of a forecasting model is high, this is a result of the model predictions having very high variance, meaning the PI often spanned a very wide range of possible values. To test this possibility, we also calculated and plotted the 75% PI for each forecasting model (Extended data, Figure 3 43 ). A perfect forecasting model would have a 95% PI that included 95% of all future data points and a 75% PI that included 75% of all future data points. The reduction in percentage accuracy of the 75% PI compared to the 95% PI was greater than 20 percentage points for every ARIMA and ETS model. However, the difference in coverage of the 95% and 75% PI for the naïve and NN model was not always this great. (For the NN model this is because the 95% PI accuracy was already approaching zero with many of the data sets.) Figure 2 and Extended data, Figure 3 indicate that all the models have a tendency to overestimate their precision when fitted to these data. However, the magnitude of this overestimation was least with the naïve model, followed by ARIMA, ETS, and then the NN autoregression model.
Having tested each of the forecasting models on individual time series we then considered incorporating relevant covariates for which we had data into the forecasting models. These were: rate of consumption of the relevant antibiotic, expressed in defined daily doses (DDD) per 1000 people per day; annual cases of influenza-like-illness (ILI) per 100,000 people; and annual gross domestic product (GDP). Extended data, Figure 4 shows the relationship between each of the covariates and the percentage of Acinetobacter spp. isolates that were resistant to carbapenems 43 . It is clear there is no correlation between any of the covariates and the outcome. This relationship was observed for every species and resistance phenotype. Therefore, we did not extend any of the single time series forecasting models to include covariates, as there was insufficient evidence that any of the covariates were directly associated with the outcome.
European data: spatiotemporal forecasting To further investigate the relationship between antibiotic consumption, GDP, and ILI, and the rate of antibiotic resistance, we hypothesised that there could be a more complex association driven by the spatial distribution of the covariates and potentially a lag in the emergence of resistance following a change in the level of a given covariate. An example of this phenomenon is described by Volz and Didelot 54 , who observed a lagged spike in the effective population size of methicillinresistant Staphylococcus aureus (MRSA) in the USA following a sharp increase in the rate of consumption of beta-lactam antibiotics.
For this we employed a spatiotemporal model fitted with the integrated nested Laplace approximation (INLA) algorithm. This model also captures the spatial correlation in antibiotic resistance as well as the temporal trend. We first fitted the INLA model to the resistance data without any covariates and a range of values of the autoregressive parameter. Increasing the autoregressive parameter beyond 2 years did not improve the model fit any further and we therefore used this as the optimal value for all analyses of these data with the INLA model. We then fitted the INLA model to the data for each species and phenotype with each covariate individually. Table 1 shows the point estimate and 95% credible interval of the coefficient for each covariate with each species and resistance phenotype.
The data in Table 1 provides some evidence that the rate of resistance is correlated with consumption when spatial and temporal trends are accounted for by the INLA model. This association was only significant (p = 0.05) for Acinetobacter spp., K. pneumoniae, and P. aeruginosa resistance to   Figure 3 (the RMSE of the naïve and ARIMA forecasts are also provided for comparison).
The accuracy of the INLA forecasts was broadly comparable to the ARIMA forecasts. However, the median RMSE of the naïve forecasts is marginally lower for every species and resistance phenotype. This indicates that, although there was an observable relationship between antibiotic resistance and consumption when spatial and temporal correlations were accounted for, there was not enough signal in these data to improve upon the accuracy of a naïve forecasting model.

English data: single location forecasting
To test whether finer temporal and spatial resolution would improve the performance of these predictive algorithms, we applied them to the Fingertips database of disease in England. Firstly, we split the data by CCG and applied five forecasting algorithms to the data from each location individually. These were ARIMA, ETS, NN autoregression, Teodoro and Lovis forecasting, and naïve forecasting. For these data we used a range of forecasting horizons between 1 and 4 quarters to observe the difference in accuracy when forecasts are made further into the future. The RMSE of the forecasts produced by each model for each resistance phenotype with forecasting horizons ranging between 1 and 4 quarters are displayed in the box and whisker plots in Figure 4. Plots A-D represent forecasts of E. coli resistance to 3rd generation cephalosporins (3gC), ciprofloxacin, gentamicin, and piperacillin/tazobactam respectively, over forecasting horizons of 1-4 quarters.
Box and whisker plots in Figure 4 firstly demonstrate that the median RMSE of each forecasting algorithm increases as forecasts are calculated for a more distant forecasting horizon, increasing by approximately 1.5% between quarters 1 and 4 for each algorithm. This observation is consistent across phenotypes. The median accuracy of the forecasts produced by each algorithm are also broadly comparable across phenotypes. For example, the median RMSE of the ETS forecasts performed with a forecasting horizon of 4 quarters ranged between 2.98% for E. coli resistance to gentamicin and 3.84% for resistance to ciprofloxacin. With regards to the relative accuracy of each forecasting model, NN autoregression is consistently the least  accurate algorithm for all phenotypes and forecasting models, followed by ARIMA. The median and maximum RMSE of the forecasts produced by the naïve model are very similar to those produced by the ETS model: for E. coli resistance to 3gC, ciprofloxacin, and piperacillin/tazobactam the median RMSE of the naïve forecasts was marginally lower than the median RMSE of the ETS forecasts, whereas for E. coli resistance to gentamicin the median RMSE of the ETS forecasts was lower (2.08% and 2.07%). Finally, the Teodoro and Lovis were always equivalent to a naïve forecast in this population 28 (see Discussion).
Plots A-D in Figure 5 refer to E. coli resistance to 3gC, ciprofloxacin, gentamicin, and piperacillin/tazobactam respectively, and show the percentage of future observations in the time series that are within the 95% PI of each forecasting algorithm. (Teodoro and Lovis forecasting is excluded because it does not produce any measure of uncertainty). As was observed with the European data, the percentage of future observations that are within the 95% PI of the NN forecasts is far lower than with any other forecasting model (less than 11% for every phenotype with every forecasting horizon). The 95% PI of the naïve forecasting algorithm also contains the most future data points of any of the forecasting models (greater than 90% for every forecasting horizon with every phenotype with the exception of cephalosporins, for which it was between 88% and 90%). This was also observed with the European data. However, unlike the forecasts of the European data, the 95% PI of the ETS model contains a greater percentage of future observations than that of the ARIMA forecasts in every case. The RMSE of the ETS forecasts is also consistently lower than the RMSE of the ARIMA forecasts ( Figure 4).
We also plotted the percentage accuracy of the 75% PI for each forecasting model applied to the English data (Extended data, Figure 5 43 ). The difference in coverage of the 95% and 75% PI was far less with the English data than with European data: the mean percentage accuracy of the 75% PI was between 60% and 75% with both the ETS and naïve model applied to every data set.
Having observed the relative accuracy of each forecasting model applied to the time series of E. coli antibiotic resistance in each CCG, we then considered incorporating covariates into the forecasting models for these data. These were the rate of E. coli bacteraemia and the amount of each antibiotic that was prescribed per CCG. Extended data, Figure 6 shows the relationship between these covariates and resistance of E. coli to 3gC considering each point in space and time individually 43 . As was observed with the European data, there is no correlation between either covariate and the prevalence of antibiotic resistance (this relationship was consistent for every E. coli phenotype). Therefore, we did not include either covariate in the single time series forecasting models. However, it is important to note that this analysis did not capture the spatial correlation or the temporal correlation of resistance with either covariate.
English data: spatiotemporal forecasting To further investigate the spatiotemporal distribution of E. coli antibiotic resistance in England and the covariates described above, we fitted the INLA model to the data describing each resistance phenotype and each of the two covariates. The autoregressive parameter was set to 4 so the model could capture any seasonality in the data. Table 2 shows the coefficients of the INLA model fitted to the data for each E. coli resistance phenotype with each covariate. Although the effect sizes are small (0.022-0.036), the model indicates that the rate of E. coli bacteraemia is significantly associated (p = 0.05) with resistance to each of the four antibiotics of interest when spatial and temporal trends are accounted for. It is unclear, however, whether the rate of bacteraemia is a driver of resistance or alternatively, increases as a result of increasing rates of resistance (see Discussion). The INLA coefficient describing the relationship between the rate of prescription of piperacillin/ tazobactam and piperacillin/tazobactam resistance was also statistically significant (p = 0.05). However, the relationship between prescription of, and resistance to each of the other three antibiotics was not deemed significant after accounting for the spatiotemporal distribution of these data. Therefore, forecasts were produced from the INLA model with bacteraemia included as the only covariate. A forecasting horizon of four quarters was used; the RMSE of these forecasts is presented in Figure 6 alongside the RMSE of each of the naïve forecasts and the most accurate alternative single location forecasting model for these data (ETS). Plots A-D refer to E. coli resistance to 3gC, ciprofloxacin, gentamicin, and piperacillin/tazobactam respectively.
The box and whisker plots in Figure 6  It is important to note that all the forecasting models, including the naïve model, produced more accurate results on average with the English data than with the European data. Therefore, it should not be assumed that the ETS and INLA forecasts only appeared to do better with these data because they are smoothing over noise in the data which is being captured by the naïve model. It is possible that in some cases the ETS model is able to more accurately forecast the English data because it captures seasonality in the data. However, this is unlikely to be the case for the majority of time series' because E. coli resistance to the drugs of interest here is not highly seasonal, when averaged across English healthcare facilities. These data are shown in Extended data, Figure 7 43 . We also note that the rate of prescription of all four antibiotics in England was not found to be seasonal in these data: Extended data, Figure 8 43 .

Discussion
The purpose of the analysis presented here was to investigate the plausibility of performing accurate and precise data-driven forecasting of the future prevalence of antibiotic resistance. To do this we identified two open access data sets describing the past prevalence of antibiotic resistance at differing spatial and temporal scales (see Methods for a description of the two data sets). This allowed us to question how differences in the data a forecasting model is fitted to affect the accuracy of the model predictions. Significant differences in the accuracy of each forecasting model, including the benchmark naïve forecasting model, were observed between species. These differences are immediately apparent when we consider Figure 1 and Figure 3, showing the RMSE of each forecasting model applied to Europe-wide data describing the prevalence of resistance to fluoroquinolones and carbapenems of each of the four species. The RMSEs of the forecasts for each model applied to the data describing Acinetobacter spp. and P. aeruginosa are broadly comparable. However, they are both clearly greater than the RMSE of each model prediction when fitted to the data describing E. coli resistance to either antibiotic class. RMSEs of the forecasting models fitted to the K. pneumoniae-fluoroquinolones data are similar to those of Acinetobacter spp. and P. aeruginosa data. However, the RMSEs of the models fitted to the data describing K. pneumoniae resistance to carbapenems are considerably lower than that of either Acinetobacter spp. and P. aeruginosa. All four of these species are environmental opportunistic pathogens, and are commonly associated with hospital acquired infections (HAI) [55][56][57][58][59] . Therefore, it is possible that the differences in the accuracy of these forecasts is driven by alternative exposures to antibiotics within the community as well as within health-care settings. For example, their individual ecologies will determine the degree of their exposure to antibiotics prescribed in hospitals and the community, as well as antibiotics used in agriculture and aquaculture 2,5 . The data we had access to were aggregated data on hospital and community consumption in Europe 37 so it was not possible for us to investigate the relationship between resistance in each species and different uses of antibiotics.
Other factors which could potentially explain species differences in the accuracy of the forecasts produced by the models include total prevalence of the species in the population and the seasonality of disease. For example, the ETS method explicitly models seasonality 32 , whilst other models such as ARIMA, NN autoregression, and INLA will implicitly capture seasonality if the degree of the autoregressive component of the model is at least equal to the period of seasonality in the data 32,49 . Unfortunately, the European data was aggregated per year, so it was not possible to investigate how seasonality of disease impacts on forecast accuracy with these data. However, the English data (which was recorded per quarter) did not have noticeable seasonal trends, in contrast with previous studies of antibiotic resistance which have concluded that resistance is highly seasonal 60,61 .
Differences in population genomics could be a further contributory factor to species differences in the accuracy of antibiotic forecasting models. Compared to eukaryotes, bacterial species are notoriously difficult to define 62,63 , and studies have shown that ecological perturbations can result in dramatic changes to the genetic and phenotypic characteristics of bacterial populations [62][63][64][65] . Therefore, ecological differences, and environmental changes that the data we had access to here did not capture, are also likely to be important in driving the dynamics that determine the rate of antibiotic resistance.
There are also some differences in the accuracy of the forecasts produced regarding resistance to different antibiotics. For example, the forecasts of fluoroquinolone resistance were consistently more accurate than those of carbapenem resistance in Europe. However, it is unclear whether these differences reflect biological variation in the epidemiology of resistance to alternative antibiotics, or simply stochastic variation in the amount of noise in alternative datasets. It is reasonable to expect that there may be some natural variation in the accuracy of forecasts produced corresponding to resistance to different antibiotics, related to the way resistance is transmitted within bacterial populations and the alternative ways in which antibiotics are used in healthcare. Secondly, the differences in forecast accuracy between antibiotic resistance phenotypes could be correlated with the alternative ways in which antibiotics are used, such as the mode of delivery or the patients they are offered to (inpatients versus outpatients). One objective of future work in this field should be investigating these species and resistance phenotype differences in forecast accuracy further.
Comparing Figure 5 and Figure 6 (the RMSE and mean PI accuracy of the forecasts with the English data set, which was recorded per quarter per CCG) with Figure 2 and Figure 3 (the corresponding Figures for the European data set, recorded per year per country), demonstrates that increased spatial and temporal resolution of the data is associated with improved forecast accuracy. An in-depth comparison of the two sets of results could be misleading, as the data describe different species and resistance phenotypes. However, the results do indicate that forecast accuracy is improved with data which is aggregated at smaller units in space and time.
Considering the plots describing the RMSE and mean PI accuracy of the alternative models for both the European and English datasets demonstrates that there are significant differences in the accuracy of forecasts produced with each model. The Teodoro and Lovis 28 model is one of only two previously published forecasting models for antibiotic resistance at the time of writing. As indicated previously there are significant flaws with this model: firstly, using the kNN algorithm to produce forecasts from an alternative representation of the data is more susceptible to error caused by noise in the data than fitting coefficients to different components of the time series. Secondly, for any forecasting task it is always important to have an awareness of the uncertainty in the model outputs, but unfortunately this model does not produce any measure of uncertainty. Thirdly, there is a problem applying the method that Teodoro and Lovis adapted from Chakrabarti and Faloutsos 45 . When selecting the optimal lag for creating the embedding space in which kNN will be performed, Chakrabarti and Faloutsos state that the maximum lag which should be considered is the point at which increasing the lag any further does not significantly increase the fractal dimensionality of the data any further. This is defined as an increase in the fractal dimensionality of the data of less than max (0.3,0.1A m ) where A m is the moving average of the fractal dimension 45 . However, with every time series we applied this algorithm to, the increase in the fractal dimension after increasing the lag from 1 to 2 was always less than 0.3, therefore the kNN step of the algorithm was always applied to vectors of length one, containing only the last observation in the time series. Hence, the output was always equivalent to a naïve forecast for the published surveillance data of the type we used here, without changing these parameters.
The NN model performed significantly worse than the simpler statistical forecasting models. This result is consistent with the results of previous comparisons of forecasting models [33][34][35]66 which show that applying complex machine learning models to time series forecasting tasks is rarely justified, and they can often be reduced to simpler models. This second observation was demonstrated by Hyndman and Billah 66 who showed that the "theta model" 67 , which won the M3 forecasting competition 33 , is equivalent to a simple exponential smoothing model with drift 32 .
It appears that the ARIMA model was more accurate than ETS for forecasting the European data. However, for the finer resolution English data set the ETS model was considerably more accurate. This was the only case in which an alternative forecasting model performed better than the benchmark naïve forecasting model. This indicates that with good quality surveillance data ETS forecasting would be the best method, of those considered in this study, for producing forecasts from a single location.
Finally, we also applied the INLA algorithm 49 to extend the forecasts to include spatial correlation, and covariates where justified. Interestingly, this did not improve upon the forecast accuracy achieved with an ETS model. This implies that there may not be sufficient spatial signal in the determinants of antibiotic resistance for any more predictive accuracy to be gained from modelling the spread of resistance as a spatial process. Another objective of future work in this field should be to test this assertion with finer resolution data, which was purposefully collected to minimise sampling bias. It will be useful to extend the work we have begun here by using data aggregated at a range of spatiotemporal scales, so the optimal scale for forecasting can be determined.
Whilst fitting the INLA model to the data did not improve upon the forecasting accuracy of the single time series forecasting models, it did provide some useful insight into the relationship between the covariates, and the outcome. We did not find any evidence of a relationship between GDP and antibiotic resistance, despite the fact that the prevalence of antibiotic resistance is known to be inversely correlated with wealth 11 . This is probably explained by the very high spatial aggregation of the European data, and the fact that all the countries represented in the data were high income countries 68 so the gradient of GDP was not great enough for this relationship to be detected by the model. The rate of prescription or consumption of antibiotics was also not found to be significantly correlated with the prevalence of resistance in many cases, which is also an unexpected result. For the European data this is possibly explained by the signal being obscured by the high spatial and temporal aggregation of the data. However, the English data, recorded in the Open Prescribing database 42 , reports the number of items of antibiotics that were prescribed. Consequently, each prescription was treated as equal, regardless of dosage, length of course of antibiotic, or mode of delivery. This likely introduces some noise into the data. Additionally, the Open Prescribing database describes the amount of antibiotic prescribed in primary care, and therefore does not include information on prescribing in hospitals, which is likely to be highly relevant to HAIs. Finally, fitting the INLA model provided some evidence of an association between the rate of bacteraemia and resistance. However, it is probable that this relationship is due to resistant infections being more likely to result in cases of bacteraemia as the infection is harder to treat, rather than bacteraemia being a determinant of resistance.
Having tested a range of forecasting models, including two previously published antibiotic resistance forecasting models 28,30 , we conclude that ETS forecasting 32 is likely to be the most appropriate model for producing forecasts of antibiotic resistance within a given location. Extending the forecasting model to incorporate spatial information using INLA did not improve the average predictive accuracy of the ETS forecasts. However, there were significant caveats with the data that potentially reduced the accuracy of all the models tested compared to a benchmark, naïve forecasting model. To extend this work it will be necessary to gather better quality data from which forecasts can be produced. This data should be collected following a rigorous protocol to limit bias. Ideally, a range of spatiotemporal units would be used so that the optimal scale for forecasting could be determined, but the evidence we provide here indicates that data should be aggregated in smaller spatial units than the countries of Europe and smaller temporal units than years. It may also be useful to incorporate data on the rate of consumption of antibiotics in hospitals and the community, as well their rate of use in agriculture and aquaculture, in order for the model to construct a more complete picture of the ecological forces which drive the spread of antibiotic resistance.