Inference for a spatio-temporal model with partial spatial data: African horse sickness virus in Morocco

African horse sickness virus (AHSV) is a vector-borne virus spread by midges ( Culicoides spp.). The virus causes African horse sickness (AHS) disease in some species of equid. AHS is endemic in parts of Africa, previously emerged in Europe and in 2020 caused outbreaks for the first time in parts of Eastern Asia. Here we analyse a unique historic dataset from the 1989–1991 emergence of AHS in Morocco in a naïve population of equids. Sequential Monte Carlo and Markov chain Monte Carlo techniques are used to estimate parameters for a spatial–temporal model using a transmission kernel. These parameters allow us to observe how the transmissibility of AHSV changes according to the distance between premises. We observe how the spatial specificity of the dataset giving the locations of premises on which any infected equids were reported affects parameter estimates. Estimations of transmissibility were similar at the scales of village (location to the nearest 1.3 km) and region (median area 99 km 2 ), but not province (median area 3000 km 2 ). This data-driven result could help inform decisions by policy makers on collecting data during future equine disease outbreaks, as well as policies for AHS control.


Introduction
African horse sickness (AHS) is a disease of equids caused by African horse sickness virus (AHSV) belonging to the Reoviridae family, genus Orbivirus. This vector-borne disease, spread between equids by biting midges (Culicoides spp.), is endemic in parts of Africa. Historic outbreaks, caused by serotype-9, have occurred in other (non-endemic) regions of Africa, the Middle East and Asia and Europe (Spain) (Anwar and Qureshi, 1973;Howell, 1960;Carpenter et al., 2017;Mellor and Boorman, 1995). In July 1987, equids in Spain became infected with AHSV serotype 4, which is believed to have been caused by importation of sub-clinically infected zebras from Namibia. New infections continued to arise until October, when lower temperatures restricted the ability of the virus to spread. Although it was initially hoped that this would eradicate AHSV from the region, the virus successfully overwintered (Lubroth, 1988;Thompson et al., 2012). The next year, the virus persisted in Spain alone, however in 1989 it spread to Portugal and Morocco. The outbreak in Portugal only lasted 13 weeks due to the introduction of a policy of mass vaccination, strict movement controls and culling of equids housed at the same premises where infected equids were identified. Overall, 206 Portuguese equids were infected and the total cost was estimated at 2 million US dollars (Portas et al., 1999). The next year, Spain eradicated the virus after a monovalent vaccine was administered to 38,000 susceptible equids. During the outbreak in Spain, there were 400 infectious cases and a further 900 were culled in an attempt to control the spread of AHSV (Rodriguez et al., 1992;Mellor, 1993). The virus was not eradicated from Morocco until 1991 (Baylis et al., 1997). More recent outbreaks include serotype 2 in Nigeria and Senegal and serotype 7 in Senegal in 2007 (Diouf et al., 2013) and serotypes 2, 4, 6, 8 and 9 in Ethiopia (Aklilu et al., 2014(Aklilu et al., ) in 2007(Aklilu et al., -2010. Remarkably, in 2020 AHS (serotype 1) occurred for the first time in Thailand and Malaysia (Castillo-Olivares, 2021;Lu et al., 2020), this is the furthest east the disease has emerged. This demonstrates the threat this disease continues to pose in both Africa and other parts of the world, including countries historically unaffected by the disease.
In naïve populations of horses, mortality due to AHS may exceed 90% in epidemics (Castillo-Olivares, 2021). There is no specific treatment for animals with AHS but vaccines are available for all nine serotypes. In endemic regions, a multivalent live-attenuated vaccine is used. However, there are concerns regarding use of live-attenuated https://doi.org/10.1016/j.epidem.2022.100566 Received 24 October 2021; Received in revised form 4 April 2022; Accepted 10 April 2022 AHSV vaccines because of their capacity for reversion to virulence and transmission by midges, the potential for reassortment between the vaccine and field strains of virus leading to novel viruses and teratogenic effects (Van Rijn et al., 2020;Mellor and Hamblin, 2004). Inactivated vaccines avoid these potential drawbacks and monovalent inactivated vaccines have been used to control outbreaks where the virus is not endemic, e.g. in Iran in the 1960s and Spain in 1993(Castillo-Olivares, 2021. The emergence of bluetongue virus (BTV), a closely related virus also transmitted by Culicoides, in Europe in 2006 has raised concerns about the potential of AHSV outbreaks in Europe's naïve equine populations (Zientara et al., 2015).
Spatio-temporal models are often used to understand viral outbreaks in livestock species, including BTV (Szmaragd et al., 2009;Boender et al., 2014). Previously, models have been used to assess the spatiotemporal risk of AHSV, but not transmission between premises (Faverjon et al., 2015;Lo Iacono et al., 2013). A key difference in our ability to accurately model outbreaks of BTV is the greater quality of data available on distribution and movement of cattle and other livestock species compared with that available for equids in many countries. In this study, we used a spatial dispersal kernel model to analyse data on the emergence of AHSV in Morocco in 1989. Here we know the locations of the premises on which infected equids were identified ('infected premises') but not of the 'uninfected premises'. However, data available on the estimated number of equids in each province is used to approximate their spatial distribution. We estimated parameters given the data at the province, region and village level for the infected premises. This allowed assessment of how the quality of spatial data available influences how well we can model the outbreak.

Data
Data for the Moroccan outbreak from 1989-1991 described by Baylis et al. (1997) were obtained in 1993 by examining case records held at the Ministére de l'Agriculture et de la Réforme Agraire, Rabat, to obtain the names and locations of all villages in Morocco where cases of AHS were reported. The latitudes and longitudes of these villages were obtained by identifying named villages on 1:50,000 Morocco survey maps.
Various datasets were available as estimates for the population number and distribution of equids. Some gave estimates for individual rural communities based on vaccination records. However, these did not include all communities in the outbreak area. The dataset documents were hard copies translated into French close to the time of the outbreak. This caused issues, with many villages having multiple translations from Arabic to French, which varied across the datasets. The minimum scale at which all equine population estimates were available was the province. Shapefiles for Morocco were available from Anon (2018) at four levels, two of which matched the data available. These are referred to here as the provinces (level 2) and regions (level 4). Due to many of the names of regions differing between the historic records and modern shapefiles, the co-ordinates for each infected premises were plotted using QGIS and assigned the region name given in the shapefile these co-ordinates were within. Some cases in Tanger province were found to be in another province called Fahs (not mentioned in the French dataset). The province names were updated to those given in the shapefiles.
The data included an estimate of the number of equids in each province. Due to this not being at the species level, the data of infected animals were simplified so they were not species specific. Therefore, for infected premises, the data available were: date infected, province (all premises), region (270/271 premises), latitude and longitude of village (243/271 premises), number of equids infected and total number of equids on premises. No information was available for the uninfected premises.

Spatial distribution
Uninfected premises are assumed to have the same distribution of the total number of equids on each premises as the infected premises. Therefore, a gamma distribution is fitted to the total number of equids on infected premises, giving a shape parameter of 1.85 and scale parameter of 1.24. Using the estimates for the total number of equids in each province equids that did not become infected were divided into premises such that the total number of equids on uninfected premises followed the same gamma distribution as the infected premises. As we only know the province of these simulated uninfected premises they are randomly placed in the shapefile of the province they are in. Here an additional shapefile was made, using QGIS, combining Tanger and Fahs to match the historic data. The infected farms are placed randomly in the shapefile of the province or region they are in for the province or region level of spatial specificity, respectively. As the latitude and longitude co-ordinates for the village of infected premises are given to the nearest minute, we randomly place these within a circle of radius 30 s from the co-ordinates of the village. This also avoids multiple premises being assigned the same location.

Transmission model
The incubation period for AHSV determined from the literature (Fairbanks et al., 2021), varied from 2 days to 11 days with mode 3 days. The latent period varied from 3 days to 10 days with mode 4 days. Delays between the onset of clinical signs, their observation, a veterinarian officially diagnosing and culling taking place has potential to vary depending on location, day and management of premises. Due to the variability in these factors, two premises infected on the same day could potentially be reported several days apart. Therefore we do not consider the latent period between premises in the model.
The distance kernel, ( ), is used to describe how the transmissibility of AHSV varies in relation to the distance of an infectious location ( ) from a naïve location ( ). For each spatial distribution the distance between premises is calculated using the haversine function. We only calculate the distance between infected premises and all other premises to reduce computational costs.
The force of infection ( ) on farm from farms on day is calculated as where is the number of equids on premises , is the distance between premises and , is the kernel-function, and ( ) is 1 if premises is infected on day and 0 otherwise. is the transmission parameter. We assume that premises become infected with an exponential distribution with − ( ) being the probability that a premises does not become infected on day . The likelihood is therefore given as where is the set of premises with no report of infected equids during the outbreak, is the set of premises with reported infected equids during the outbreak, 0 is the start of the outbreak and is the time at which infection is detected on an infected premises. This method does not require the model to be simulated, therefore reducing computational costs.
To identify the most appropriate kernel model, PubMed was searched using the search terms (disease OR virus) AND ∼model AND kernel AND ∼transmission'. Eligible studies included fitting parameters for a distance-based kernel model of disease spread between livestock premises. Supplementary file 1 describes results from the literature review. The selected kernel is given as Szmaragd et al. (2009) attempted to fit this kernel (Eq. (3)) to the spread of BTV in Northern Europe during 2006. Here multiple values of were fit to the data with = 2 yielding the best fit, therefore we set to 2. The sequential Monte Carlo-Markov chain Monte Carlo (SMC-MCMC) approach was applied as described in Supplementary file 2. As the likelihood was not numerically stable (converging to 0), the log-likelihood

Parameter estimation
was used. Initially, to derive our prior, we consider the parameter values fitted by Szmaragd et al. (2009) to explain the spread of BTV-8 in Northern Europe during 2006 for the kernel (Eq. (3). Our kernel is less informed than Szmaragd et al. (2009) as it does not include parameters such as the probability of acquiring infection, which evidence suggests may be different for AHSV and BTV .
Our parameter priors are therefore calculated based on the BTV parameters as: = the probability of acquiring infection x√ ,

=̂2,
wherêis the kernel parameter ( ) from Szmaragd et al. (2009). This gives values of = 0.0108 and = 0.0012. To initialise the model, Latin hypercube sampling (LHS) is used to select parameters from a uniform distribution with minimum of 0.5x and maximum of 2x the calculated values for and . This yields a lower bound of 0.0054 and 0.0006 and upper bound of 0.0216 and 0.0024 for and , respectively.
As the spatial distribution of uninfected premises is estimated, it may affect the parameter estimation. Therefore, each iteration we generate multiple spatial distributions that each have a different set of parameters depending on the results of the previous iterations. We use a modified version of Ripley's L function (Supplementary file 3) to determine how the spatial homogeneity of the distribution of randomly distributed uninfected premises varied around the infected premises across spatial distributions. If the spatial distributions are similar, this will cause less variation during parameter estimation, therefore fewer iterations will be needed than if they are very different. As little variability was observed between spatial distributions, we conclude that a large number of different spatial distributions is not necessary at each MCMC iteration (Supplementary file 3).

Results
It was found that disease outbreaks were widespread across the country (Fig. 1), with the greatest number of reported cases in the northwest (1989)(1990). For 1991 there were many more premises that could not be added to the map due to lack of spatial data, but it is possible to observe the increased range of the virus. In 1991, there were significant numbers of infected equids in Marrakesh province. Despite having a large equine population, there were no cases reported west of the Atlas Mountains. During the first year of the outbreak, 271 infected premises were recorded, resulting in a total of 518 cases and an additional 106 equids culled. Cases were reported in the provinces of Larache, Tanger and Tetouan.
Initially during the parameter estimation using the latitude and longitude of villages, we observe a large deviation from the prior (Supplementary figure S3). The first 100 iterations are removed as burn-in. After this, parameters appear to follow normal distributions ( Table 1). As appears somewhat skewed, a log-normal distribution was fitted, but this yielded the same result.
As the spatial specificity decreases, the variance of accepted parameter values increases (Supplementary figures 3-5). The region data provide parameter estimates much more similar to the village data than the province data (Table 1). Another more simple approach to analysing the patterns of transmission involves calculating the distances between the premises where latitude and longitude of the villages of infected premises are given and calculating the minimum distance of each premises from a previously infected premises (Fig. 2a). We observe that most of these distances are short (<5 km), although there were some cases of transmission occurring over larger distances. Comparing outputs from this and our transmission kernel shows that our kernel also predicts most of the transmission occurring locally (Fig. 2b).
The prior and posterior values for the kernel shape parameter were similar at the province level (Fig. 2b). However, the posterior values for AHSV at the village and region levels suggest that AHSV transmission during the Moroccan outbreak occurred over shorter distances than during the 2001 BTV outbreak in northern Europe, from which the prior values were derived.

Discussion
In this paper we were able to parameterise a model for the 1989 emergence of AHSV in Morocco for three different spatial specificities of data on infected equids (village, region and province). Data-informed models are important for stakeholders with responsibility for controlling outbreaks in naïve populations, which are increasingly likely with climate change and globalisation (Rocklöv and Dubrow, 2020). Previously, a lack of data has prohibited modelling the transmission of AHS between premises, but this unique dataset allowed a first attempt at this. One of the biggest challenges encountered was the computational demand of this large individual-scale model. Missing data were compensated for using multiple random distributions. Here the Ripley's L function was used to determine the spatial variation between these random distributions. Observing that there was little variance in spatial homogeneity between premises near infected premises allowed the number of SMC steps to be reduced each iteration. Previous research has shown the optimal acceptance rate is 0.234 under quite general conditions (Gelman et al., 1997). Further research has indicated that this acceptance rate does seem to perform well for approximate finite-dimensional situations where the number of parameters jointly estimated is as few as five. This does not apply when estimating fewer parameters. For example, it has been found that for one normally distributed parameter the optimal acceptance rate is approximately 0.44 (Rosenthal, 2011). Here, we accepted up to 30% of the parameter sets each iteration, dependent upon their log-likelihood.
The number of equids on 'uninfected premises' was assumed to follow the same distribution of those on premises with a reported infection. The accuracy of the census data used to create the distribution of uninfected premises is not known, and could be an underestimate of the true number. Gubbins et al. (2014) suggested that the principle mechanism behind AHSV transmission is vector dispersal. Analysis of the 1989 Morocco outbreak showed most of the transmission occurred over short distances, consistent with studies of Culicoides dispersal (Kluiters . This does not necessarily imply that AHSV lacks BTV's potential to spread over larger distances. Rather, this could be caused by equine premises being more clustered compared to cattle and sheep holdings, and therefore most of the transmission is local. Tildesley et al. (2010) showed that spatial structure can be important for predictions of emergence and population-scale dynamics, however this structure is mostly subsumed in the parameterisation when considering the effectiveness of control strategies. Here we assume that the premises stay infected after infected equids are culled as local infected midges will remain present for a period. However, there is uncertainty regarding midge behaviour at the time after culling. Elbers and Meiswinkel (2015) quantified host preferences of Culicoides during early summer in the Netherlands. Generally, there was a strong correlation between attack rate and host size with a cow attracting the greatest proportion (62.4%) of the Culicoides captured, followed by a Shetland pony (29.2%) and then a sheep (8.4%). However, similar numbers of C. dewulfi were collected from the cow and the pony and C. obsoletus numbers where evenly distributed among the three host species. In contrast, Schmidtmann et al. (1980) captured C. obsoletus in comparable numbers from calves and sheep but significantly smaller numbers were found on ponies (Schmidtmann et al., 1980). Therefore, when equids are removed from a premises by culling, how far the Culicoides may travel to seek alternative hosts may depend on the species of Culicoides and the distribution and density of other species locally.
One of the most important features of vector-borne diseases not addressed in this model is climate impacts as climate data from the time of the AHS outbreak were not available for the region of this study. Climate factors will impact both the abundance of Culicoides and the dynamics of virus transmission (Sanders et al., 2011;Carpenter et al., 2011). Although climate data are not available to inform this model, the within-premises dynamics may help provide information on vector capacity throughout the outbreak. For example, if more secondary infections are reported (before culling) for a period within the outbreak, we could assume that the vector capacity was higher at the time.
In conclusion, although vector capacity is likely to vary between geographical regions this model may help inform the dynamics of potential outbreaks upon emergence. Overall, we found that the village and region datasets generated similar parameter estimates, whereas the province level data did not. Should countries be faced with the introduction of a foreign animal disease such as AHSV, the current uncertainty in spatial data for many species is likely to effect the ability to capitalise on modelling techniques available. Data on the location of equids are often sparse; for example, reports of equine influenza in 2019 were given for each nomenclature of territorial unit 2 (NUTS2) of the UK. The median area of a NUTS2 region in the UK is 4788 km 2 , which is most comparable to the median province area for the three provinces of Morocco with infected equids in 1989 (2653 km 2 ). This suggests that, in the unfortunate circumstance of an AHSV outbreak in the UK, for example, spatial data would need to be collected at a finer scale to predict the population-scale spatial dynamics of the virus from a knowledge of the individual-level dynamics. Proactively gathering more informed data before an outbreak would allow the epidemiological and modelling tools available to be applied in real-time.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
An example randomly generated region-level spatial distribution and code are available on the University of Nottingham data sharing platform (http://dx.doi.org/10.17639/nott.7193).