Modelling the spread and mitigation of an emerging vector-borne pathogen: Citrus greening in the U.S.

Predictive models, based upon epidemiological principles and fitted to surveillance data, play an increasingly important role in shaping regulatory and operational policies for emerging outbreaks. Data for parameterising these strategically important models are often scarce when rapid actions are required to change the course of an epidemic invading a new region. We introduce and test a flexible epidemiological framework for landscape-scale disease management of an emerging vector-borne pathogen for use with endemic and invading vector populations. We use the framework to analyse and predict the spread of Huanglongbing disease or citrus greening in the U.S. We estimate epidemiological parameters using survey data from one region (Texas) and show how to transfer and test parameters to construct predictive spatio-temporal models for another region (California). The models are used to screen effective coordinated and reactive management strategies for different regions.

Introduction A rapid response to emerging epidemics of crop disease is essential for successful management of an epidemic [1][2][3] just as for the pandemic of COVID-19 [4,5]. Effective management, however, requires knowledge of critical epidemiological parameters for transmission and dispersal of the pathogen [6][7][8]. Knowledge of epidemiological parameters is required for models to predict the current extent of infection, the likely future spread [7,9,10] and the effectiveness of potential intervention strategies [7,[11][12][13]. Epidemiologically important parameters such as transmission rates and dispersal kernels are frequently unknown when an epidemic invades a new region. There are two options to obtain values for the epidemiological parameters: wait until there are adequate surveillance data in the newly invaded region from which to estimate parameters, or transfer parameters derived for another region and incorporate them into models that allow for different host distributions and environmental variables in the newly invaded region. Here we introduce and test an epidemiological modelling framework for an emerging epidemic of Huanglongbing (HLB) disease, one of the most serious threats to citrus production world-wide [14,15]. We parameterise and validate the model using training and test data in one region (Lower Rio Grande Valley in Texas). We then assess the adaptability and validation of the model in new regions with different climatic conditions, where the insect vector is endemic (southern California) or invading (Central Valley, California).
Huanglongbing disease also known as citrus greening causes severe chlorosis of foliage, dieback, loss of yield, discolouration and ill-flavour of fruit, and death of citrus trees [14]. The disease is associated with three bacterial strains of which Candidatus Liberibacter asiaticus (Las) is the prevalent type in the Western Hemisphere and the only strain that has been reported as established within the U.S. The disease is reported to have caused a 74% drop by 2018 in citrus production in Florida since the first detection of the pathogen in 2005 (USDA/NASS; 2022). The pathogen has spread rapidly in backyard and commercial trees in Texas [16] and has been introduced and become established in backyard trees in southern California [17,18]. There is a consequent risk of invasion into the major citrus production area with the Central Valley in California [19], where the insect vector, the Asian citrus psyllid (ACP), Diaphorina citri Kuwayama, is currently invading. The pathogen also continues to pose a major threat to citrus production in Brazil [20]. The invasion of another vector, the African citrus psyllid, Trioza erytreae (Del Guercio), which primarily transmits the Africanus strain of Liberibacter (Laf), in Portugal and Spain constitutes a threat of introduction of the disease into European countries [21]. The disease is also widespread in South East Asia [22] and the Las bacterium has recently been reported in Kenya [23].
The pathogen is transmitted by human mediated movement of infected planting material as well as the insect vector [14]. There are currently no genetically resistant hosts. Options for control include pesticide application to kill the vector, removal of infected and surrounding trees and quarantine to prevent movement of infected planting material and citrus fruit [24]. Given the widespread distribution and continued spread of the pathogen and the psyllid vectors, there is an urgent need for a flexible parameterised epidemiological model that can be used to predict spread at landscape scales and to inform surveillance and management options. Here we adopt a classical susceptible-exposed-infected (SEI) compartmental modelling framework for HLB [1,8]. We couple this with a model for the dynamics of ACP infection and an observational model for vector trapping data and disease surveillance data. The models take account of the heterogeneous distribution of the citrus host in the landscape, encompassing plantations and backyard trees [25]. The models are parameterised using Bayesian methods and extensive surveillance data that comprises successive snapshots of disease for the citrus growing region in the lower Rio Grande Valley in Texas (Fig 1A and 1D).
The Bayesian approach allows for commonly encountered problems in the estimation of parameters for emerging epidemics. These include incomplete spatial coverage of surveys that require inferences about chains of unobserved infections between recorded snapshots of disease in the landscape and allowance for cryptic infection from asymptomatic or otherwise undetected hosts [7,25,26]. The parameter estimation also takes account of the confounding effects of pesticide application by some growers to manage the vector, which affects the observed dynamics of pathogen spread [8]. Our initial objective is to test and validate the model using the Texas data. Specifically, we use the model to quantify the infection pressures from primary and secondary infection sources in order to analyse the roles of different transmission mechanisms in epidemic spread. We also use the model to analyse the effectiveness of ACP control strategies retrospectively for the epidemic in Texas.
The HLB epidemic in southern California is at an earlier stage compared with the outbreak in Texas. The ACP vector was first detected in southern California in 2008 [27] and is now endemic. HLB was first detected in 2012 in Hacienda Heights and then in 2015 in the San Gabriel areas of greater Los Angeles with subsequent clusters of infection in Los Angeles, Orange, Riverside and San Bernardino counties. An extensive surveillance programme linked with compulsory treatment and removal of infected trees together with imposition of quarantines over movements of planting material and fruit around infected sites has restricted but not prevented the spread of the disease. The reservoir of the pathogen in southern California poses a threat to the principal citrus production area for the state in the Central Valley, where the ACP vector is known to be invading [17]. The intensive management of the disease in southern California along with absence in the Central Valley mean there were insufficient data to re-estimate parameters for the epidemiological models under Californian conditions. Accordingly, we therefore test the adaptability and validation of the model in new regions with different climatic conditions. We use the model to predict rates of pathogen and vector spread and screen some options for management. The potential flexibility of the combined modelling and parameter estimation are discussed.

Data
We obtained data from surveys for early HLB detections in both plant and psyllid samples in Texas and California and detection of ACP invasion in California. The data were collected by the U.S Department of Agriculture (USDA) and the California Department of Food and Agriculture (CDFA). Data collection surveys were part of an intensive area-wide management regulatory program for HLB over multiple U.S. states and localized management of ACP in California. The survey teams collected leaves with HLB-like symptoms and psyllids based on a risk-based survey model [28]. Samples were tested for Las bacteria in certified laboratories using the approved USDA diagnostic protocol utilizing qPCR technology [29]. For confirmatory diagnostic tests of plant tissue samples, a Ct value less than 36 (over 40 cycles) classified the sample as HLB positive. A Ct threshold of 38 was used for samples of vectors. The lower threshold for leaf tissue samples was an internal policy decision by USDA for added confidence in the diagnostic results because USDA regulates on positive plant tissue samples, but not on ACP samples positive for Las bacteria. We used the HLB leaf tissue survey data in Texas (Fig 1A and  1B) to estimate parameters and validate an HLB epidemiological spread model (Fig 2A). We also used an additional survey of psyllid samples for HLB-infected ACP in Texas to estimate parameters for vector spread (S1B Fig). Maps of citrus distribution (Fig 1C and 1D), together with HLB survey and ACP trapping data for southern California ( Fig 1D) and Central Valley ( Fig 1F) were used to initiate simulated epidemics for model inference on spread and control strategies and for further validation. All data were geo-mapped and aggregated by visiting dates on rasterised 1 km x 1 km grid cells to obtain spatiotemporal datasets for presence or absence of HLB for plant and vector samples in Texas (Figs 1B and S1B)), for presence or absence of HLB plant samples in southern California ( Fig 1D) and ACP presence in the Central Valley ( Fig 1F). For further details of the datasets used for parameter estimation, see S1 Text.

Epidemiological models
We used spatially explicit, continuous-time, stochastic compartment models for HLB and ACP dynamics and spread through rasterised landscapes comprising 1 km x 1 km grid cells. We treat the grid cells as the units of interest (i.e. in determining whether or not a grid cell is infected). We allow for the dynamics of infection spread within cells, i.e. transitions from Exposed to Infections to Symptomatic at the grid cell level but we do not explicitly model infections spread from one tree to another within grid cells. We also distinguish between two cases where the vector is endemic (Fig 2A: Texas and southern California) and where the vector is also spreading (Fig 2B: Central Valley, California). In addition, we also model the spread of HLB infected ACP through an endemic ACP population (S1B Fig). We link the HLB (Fig  2A) and HLB-ACP ( Fig 2B) to observation models for detection of infected plants and infected vectors for the purposes of parameter estimation. The principal variables and parameters used in the models are summarised in Table 1. The simulation models and the algorithms for parameter estimation were coded in open access Python and are available in GitLab (see SI). Epidemiological models for ACP and HLB spread. (A) The stochastic compartment model for HLB epidemiological dynamics in a Texas citrus grid cell and the observation model that matches infection status to survey diagnostic data. An infectious cell can infect other susceptible cells via the movement of local vectors, which were known to have established over the whole region before the emergence of HLB. ϕ, γ P , π P denote the transition rates and detection probability and are described in Table 1 and derived in the Methods section. (B) The joint epidemiological model for ACP and HLB dynamics in California where ACP is invading and not yet endemic. The model extends the Texas model and introduces a new epidemic category 'ACP + HLB infected' that connects the dynamics of ACP infestation to HLB infection in a grid cell. The transition rates ϕ, ψ, φ, γ V , γ P and detection probabilities p V ; p P ; p V P are described in Table 1  Pathogen spread by endemic insect vector. When the vector is endemic, as in Texas and southern California, we consider four categories for the status of HLB infection in each citrus grid cell: 'HLB susceptible', 'HLB exposed', 'HLB infectious', and 'HLB detected' (Fig 2A). A cell is susceptible if all the trees in the cell are healthy or free from the Las bacterium. An exposed cell contains infected trees but is not yet infectious to trees in other cells. An infectious cell can transmit Las bacteria to other cells via the movement of the psyllid (ACP). Finally, a cell becomes detected as a survey team collects a positive HLB sample confirmed by a qPCR diagnostic test. The transition of a cell from being infectious to detected requires two steps: first, infectious trees must show symptoms, and second, the site must be visited by a survey team in searching for the symptomatic trees.
Pathogen spread by invading insect vector. The model for HLB spread (Fig 2A) was adapted to account for the fact that the underlying ACP population in the Central Valley is still spreading and has not fully invaded the region. Grid cells were additionally classified ( Fig 2B) as: 'ACP susceptible' (free from vectors), 'ACP exposed' (first vectors arrived in the cell but have not reproduced sufficiently to invade other cells), and 'ACP infested' (vectors settled in the cell and started invading nearby cells). We also introduced a new epidemic category 'ACP + HLB infected' for grid cells occupied by HLB-infected vectors ( Fig 2B).
Modelling HLB exposure. An 'HLB susceptible' cell, i, is exposed to infection as the first tree in the cell gets infected. The exposure can happen either by primary or secondary transmission. Primary transmission originates from either the introduction of infected trees and products by trade and other human-mediated movements or from infected vectors arriving from external environments, including across the Mexico border. The border was marked by 1km 2 cells that contain the border line. We introduce a parameter, � to represent HLB importation rate by human activities and two parameters, ε B for cross-border vector transmission and ε W for vector transmission from further outside the region of interest.
Secondary transmission from an HLB infectious to a susceptible cell happens by the flux of vectors moving between the two cells. We denoted β P as the rate for secondary infection and used an isotropic exponential kernel, K α (r)/e −r/α where α represents the dispersal scale, to depict the dependence of movement rate on the spatial distance, r, between the cells. The number of vectors moving towards a cell i from cell j depends on the distance between the cells and the availability of new flushes (foliar growth attractive for psyllid feeding) in the destination cell i. We established the overall strength of spatial coupling between grid cells by considering a mechanistic model that addresses vector movement and feeding. The model constructs the equilibrium abundance of psyllids moving to and feeding in a citrus grid cell. In doing so, it considers the rates of dispersal and also the birth and death rates of vectors (see S1 Text for a detailed derivation of the flux model). As vector dynamics occur at faster rates than HLB epidemiological dynamics, the vector counts quickly converge to equilibrium values that we used in calculating the HLB exposure rate (ϕ i (t): see also Fig 2A and 2B) from primary and secondary sources as follows: Secondary infection is encompassed in the final term in Eq (1) and accounts for the net exposure rate of a susceptible cell i from infectious cells j, allowing for citrus density, h i , and a vector density weight, which incorporated the effect of vector control, k C i , and weather suitability to vector development k W i . The superscripts V and P indicate infected vectors and plants, respectively and j 0 is a general cell index that points to all cells in the host raster including i and j. The parameter μ is the ratio of time spent feeding compared with moving by the vector. See S1 Text for the derivation of Eq 1.
As commercial growers in Texas applied sprays in November and early February in a coordinated manner, they were able to reduce the density of vectors in commercial orchards. We used k C i to represent the relative weight of vector capacity in plantations compared with residential trees. We assumed that the vector density in cell i decreased as the proportion of commercial trees in the cell, f C i , increased. The parameter η denotes the efficiency of vector control measures applied to a cell: Daily temperatures and other weather variables affect vector density by leveraging or slowing down the development of eggs, nymphs, and adult vectors. We used k W i to account for the variation of vector density at different locations due to their corresponding weather patterns. We used the modified Logan function r(�) provided by Liu and Tsai [30] to calculate the vector development rate in a day, given the day's temperatures, r id = r(w id ). The function addresses temperatures in the range of 10 to 33˚C and assumes that no vector growth occurs beyond this range. We computed the expected weather-driven vector capacity coefficient, k W i , by averaging over the development rates for the whole year: The unknown parameters �, ε B , ε W , α, β P , μ, η required to model HLB exposure were estimated from the plant diagnostic data of the Texas HLB survey using a data augmented Markov chain Monte Carlo (DA-MCMC) algorithm under a Bayesian inference framework. We used uninformative prior distributions for all parameters.
Modelling ACP exposure. An 'ACP susceptible' cell i is exposed to an infestation when the first few vectors arrive in the cell either from nearby sites or are transported from external environments. We refer to these mechanisms as secondary and primary infestation and used parameters β V and ε V , respectively, to represent the rates of infestation. We calculated the force of ACP exposure (ψ i (t): See also Fig 2B) to a susceptible cell i analogously to the pressure of HLB exposure (Eq 1) as follows (see S1 Text for a detailed derivation): Since the ACP spread model was used for the Central Valley only, we discarded the extra risk of primary infection along the Mexico border. Lack of sufficient data meant we could not estimate the unknown parameters ε V , β V directly from California ACP trapping data. We assumed, therefore, that ε V �ε W and estimated β V from the Texas HLB survey data using the 'ACP Infested'-to-'ACP + HLB Infected' model described below.
Modelling the transition from 'ACP Infested' to 'ACP + HLB Infected'. An 'ACP infested' cell transitions to 'ACP + HLB Infected' cell ( Fig 2B) when vectors in cell i acquire Las bacteria either by feeding on infectious trees inside the cell or by migrating from a nearby infectious cell. The former is also known as the bulking up of infected vectors inside the cell and is driven by an unknown parameter ξ. The rate at which infected vectors migrate from an 'ACP + HLB Infected' cell to an 'ACP Infested' site is equivalent to the rate β V at which vectors from an 'ACP Infested' cell arrive in an 'ACP Susceptible' cell under the assumption that Lascarrying vectors behave similarly to uninfected vectors. Using the same dynamic model of vector movement and feeding as before, we can calculate the transition force as follows: The superscripts V and P indicate the epidemic compartments for the vector and pathogen respectively. The unknown parameters ξ, β V were estimated using the vector diagnostic data from the Texas HLB survey. Since we carried out parameter estimation under a Bayesian inference framework, we can use the previously acquired posterior estimates for α, μ, η to complement the sparsity of vector diagnostic data.
Latent period parameters. The time it takes for a cell to transition from 'HLB Exposed' to 'HLB Infectious' is known as the HLB latent period. It starts when a tree gets exposed to HLB infection and ends as a significant number of trees in the cell became infectious so that the number of Las-carrying vectors that move away from the cell is large enough to cause infection in another cell. We followed Parry et al. [8] to use a seasonally-forced model for the rate of infectiousness onset, where a P denotes the average rate at which a cell moves from 'HLB Exposed' to 'HLB Infectious'. We set the parameter to empirical estimates from in-orchard and in-nursery observations for trees more than ten years of age, which reported an average latent period of 15 months, i.e. a P = 0.8. We assumed that the specific latent period for each cell follows an exponential distribution with rate γ(t). As such, the latent periods vary among cells and include very short durations due to the exponential form of the model. Analogously, the ACP latent period indicates the time it takes for a cell to transition from 'ACP Exposed' to 'ACP Infested'. We used the same seasonally forced model as for the HLB latent period model, with a P replaced by a V = 0.042. The rate is equivalent to an expected ACP latent period of 15 days, which covers the duration for nymphs to develop into adult psyllids.
Parameter estimation MCMC details. We adopted a Bayesian approach to estimate parameter values for the epidemiological parameters from the survey data. We treated the unknown timing of epidemiological transitions as random variables and used a data augmented MCMC algorithm [31,32] to infer the timings of the unobserved transitions. The Metropolis-Hasting method [33] was used to construct samplers for both parameters and unobserved epidemic transitions. We used simple log-normal proposal distributions for �, ε W , ε B , α, μ, η, ξ, σ, π, and Gibbs samplers for β P , β V . We used the randomized construction of Markov trajectory [34] and exact inference algorithms for hidden Markov models [35] to improve samplers of epidemic transitions. Further details of the algorithms are given in the SI.
The overall likelihood of a set of parameter values comprises the model likelihood and the data likelihood (Fig 2B). The model likelihoods (Eqs (5) and (8) in the SI) can be naturally derived from the stochastic construction of the HLB and ACP epidemiological models described above. We developed data likelihoods (Eqs (6, 7 and 9) in the SI) using two parameters of the data collection process: π represents the probability that a positive sample is collected from an infectious cell, and σ indicates the expected duration from becoming infectious to getting detected. Both parameters, together with parameters of epidemiological models, were estimated from Texas HLB survey data.

Model validation
We estimated the epidemiological parameters for spread of HLB (�, ε W , ε B , α, β P , μ, η, σ, π) using the Texas HLB survey data collected between December 2011 and August 2016 as training data. We validated the model using data collected between September 2016 and October 2018 as the testing data for Texas. The application of the HLB spread model, parameterised for Texas to southern California was validated using surveillance data collected for HLB collected in southern California from June 2015 to June 2019. We compared model simulations against testing data in terms of both temporal progression and spatial autocorrelation metrics. We used Moran's I [36,37]. See the S1 Text for further details of the validation processes.
The application of the ACP spread model, parameterised using Texas plant and vector diagnostic data and adjusted for temperature conditions, to the Central Valley, was validated against ACP trapping data for the Central Valley in 2015 and 2016. We used temporal prevalence as the evaluation metric and compared the model predictions with the observed trapping data.

Model variants
To analyse the sensitivity of different model components to prediction performance, we considered four variants to the full model described above. Each model variant differed from the full model by one component of the secondary transmission model: (1) no normalisation, in which the normalisation term for vector fluxes is assumed to be the same for all cells and absorbed into the secondary infection rate; (2) no control effect, in which we ignored the occurrence of the annual coordinated spraying program; (3) no border effect, in which we did not distinguish between sites near to and far from the Mexico border and used the same primary infection rate for infected vectors from external environments; (4) power-law kernel, in which the exponential dispersal function is replaced with a power-law function. Model variants were fitted using the Texas HLB survey data up to August 2016 and validated with data up to August 2017 (S2 and S3 Figs).

Predictions and allowance for control
We made prospective predictions more than two years beyond the final observation time for each region. Not all sites already infected with HLB are observable at the time of forecast. The observed survey data were used to infer the locations of cells that had been exposed to and infectious with HLB at that time. We used an MCMC-based simulator analogous to the dataaugmented MCMC algorithms used for parameter estimation to sample epidemic transition times that agreed well with observed data.
Model simulations were also used to analyse the efficiencies of selected control measures in each region. These included an annual coordinated spraying program to manage ACP in Texas, reactive removal of infected trees and HLB quarantine in southern California, and reactive vector spraying upon ACP detection from sticky traps in the Central Valley. The efficiency of ACP control strategies in Texas was assessed by comparing results for η = 100%, 50%, 20% of the estimated (default) value for the spread of HLB between 2011 and 2019. For southern California, we simulated epidemic trajectories with a quarantine radius of 1, 2, 3, . . ., 14 km. The potential of the reactive ACP eradication program in the Central Valley was assessed by varying the treatment efficiency between 0% and 100% and the eradication radius from 0.1 to 2.0 km. In all cases 1000 simulations were run for each scenario and credible intervals 50%, 75% and 95% for disease trajectories were computed. (See S1 Text for further details of simulations.)

Parameterisation and validation of the model for the HLB epidemic in the Lower Rio Grande Valley, Texas
Preliminary analyses were carried out to compare the performance of the full HLB epidemiological model, as specified in Fig 2A,  We also used the receiver operating characteristics (ROC) curve analysis to compare and evaluate the models' predictive performance (Fig 3A).
The The ROC curves (Fig 3A) indicated superior performance for the full model and the variant with no border effect over the other three variants, with the full model performing marginally better that the 'no-border' variant ( Fig 3A).
The full model was selected for future work and the parameters re-estimated with a slightly adjusted citrus landscape to accommodate for updated information of residential sites and with 1000 simulations. We used the data-augmented Markov chain Monte Carlo (DA-MCMC) algorithm with survey data from December 2011 to August 2016 as a training dataset and from Sept 2016 to October 2018 as a testing dataset. The data comprised the locations and times of collection for presence and absence of HLB assessed by qPCR of leaf samples in commercial and residential locations (Fig 1B).
The posterior estimates for the transmission rates (�, ε B , ε W , β P , β V ) and the dispersal scale (α) are summarised in Table 1, together with the average efficiency of the coordinated spraying program (η) adopted by commercial growers and the waiting period (σ −1 ) from being infectious to detected for a citrus grid cell. The corresponding marginal posterior distributions are summarised in S3 Fig. We observed good agreement between the model simulations and the training and testing data for both temporal and spatial validation metrics (Figs 3C, 3D and 4) with additional support from ROC curves for the model performance as a binary classifier that separated positive from negative cells (Fig 3B).
Inspection of the temporal curves for epidemic progress (Fig 3C) and retrospective mapping of the training data indicate an initial lag from the time that a cell becomes detectable according to the model and is recorded as detected by ground survey (Fig 4). Model The difference underlines the importance of cryptic (asymptomatic) infection in generating epidemic spread ahead of ground surveys [1,2,9]. Ground surveys, in turn, are frequently under-resourced during the early stages of epidemics, followed by an intensification in surveillance in response to increased awareness of the epidemic. Slow initial surveillance followed by acceleration in surveillance is consistent with the results in Figs 3 and 4).
Local spread of HLB was dominated by secondary transmission involving ACP vectors (Fig 5A). The rate of secondary transmission was three orders of magnitude greater than the rates of primary infection via cross-border infected vectors and four orders larger than human-mediated movement of infected material (Fig 5A). We used an exponential prior for the cross-border infection rate (cf S4 Fig) to reflect the default belief that the rate is close to zero but we observed a clear departure from zero for the posterior distribution. We also found that the posterior estimate of longer distance (ε W ) is much lower than immediate cross-border (ε B ) primary transmission (Table 1). These results suggest that there are sources of infectious vectors close to the Mexico border. Fig 5B presents the annual infection pressure (calculated as the relevant components of ϕ i (Eq 1 in Methods section) and summed over the whole landscape and each year) caused by the four infection sources. We again observe the significant role of local vector movement in driving the epidemic, resulting in infection pressures an order of magnitude larger than primary forces in early years, rising to two orders in the later years.

Retrospective analysis of the effectiveness of the ACP control strategy in the Lower Rio Grande Valley in Texas
The parameterised model enabled retrospective analysis of the area-wide coordinated vector spraying program in Texas. Beginning in 2011, growers joined the program by applying pesticide sprays within a short, designated period to target the overwintering vector populations. Using historic HLB survey data, we estimated that the program helped to reduce approximately 80% of the vector population in commercial orchards, designated as the control efficiency (Fig 6A). To understand the importance of having an area-wide collaborative effort amongst growers in place, we ran retrospective simulations for hypothetical scenarios in

PLOS COMPUTATIONAL BIOLOGY
which less intensive ACP control had been carried out (Fig 6B and 6C). We observed a nonlinear relationship between the control efficiency and HLB exposed area. While increasing control efficiency from 20% to 50% did not guarantee the reduction of epidemic size, bringing control efficiency up to 80% was effective (we observed both a significant reduction of the infected area in 2020 and a clear separation of the 95% credible intervals associated with 20% and 80% control efficiency) (Fig 5A).

Transfer of the model and prediction of HLB spread in southern California
The HLB epidemic in southern California was at an earlier stage compared with the outbreak in Texas, with clusters of infected trees found in the Orange and Riverside Counties (Fig 1E). Recorded HLB detections were limited to two clusters with intensive tree removal upon detection. Such intervention and the limited spatial representativeness of the data made it impossible to infer key parameters from the data. The data were sufficient, however, to test the credibility of transferring Texas parameters to this new region. We accounted for the difference in weather conditions that affect ACP distributions in California and Texas by incorporating a weather, suitability score for ACP growth, particularly for temperature [30] to the model. We also introduced HLB quarantines and the removal of HLB-confirmed trees as discrete stochastic events into the spread model.
We inferred the locations of unobserved infected cells up to 30 th June 2017 in southern California using the survey data collected before that date and thereafter simulated the epidemic forward. For the two years in the testing data (June 2017-June 2019), the predicted detections successfully reproduced the spatiotemporal patterns observed in the survey data (Fig 7). The predictions also showed good quantitative agreement with the training data and initially for the testing data for the temporal progression (Fig 8A) but while the model predicted a continued upward trend in the amount of detected grid cells, the surveillance data suggest a slowing towards a linear rate of increase. The observed measure of the spatial autocorrelation metric (Moran's I) for the surveillance data lay within the credible intervals predicted by the model albeit with some subjective evidence of underestimation at short and overestimation at longer distances ( Fig 8B). Inspection of the model predictions for unobservable infectious categories (exposed and infectious cells) indicated that the extent of HLB spread is likely to be far greater than was detected by survey data. Huanglongbing is likely to be present in most counties in southern California and even with the imposition of quarantines and tree removals upon detection, the disease was increasing in severity. Our results indicate that surveys by visual inspection are likely to reveal more HLB positive samples from all over Los Angeles and Orange Counties, and infected trees in San Bernardino, Riverside and San Diego Counties would become detectable.

Effect of changing quarantine radius on disease management in southern Californian
To assess the effect of increasing the radius of quarantine circles to decrease the HLB infectious area in southern California, we used the counts of 1km 2 cells that had been HLB infectious up to December 2021 as the evaluation metric. We started simulations in June 2017 using the inferred infected locations from data up to that point. We did not make use of either survey data or the quarantine boundaries data available after this time. In the absence of the availability of detailed information from the ground regarding the timings of removal and quarantine events after detection, we sample the lagging time as stochastic events. However, control events are systematic and deterministic in terms of location i.e. cells are marked for removal and quarantine if they or a nearby cell becomes detected with HLB. We evolved the boundaries of  the quarantine area as the model predicted new detections. The model used the detection rate estimated from Texas. Besides the implementation of HLB quarantines, we also incorporated the effect of the annual coordinated spraying and the removal of HLB infected trees upon positive diagnostic confirmations.
We simulated 1,000 epidemic trajectories for each quarantine radius of 1, 2, 3, . . ., 14 km and derived the median and credible intervals from the generated samples. Simulation results demonstrated consistent reduction of the infectious regions as the quarantine radius is increased (Fig 9), with the then currently prescribed 8 km radius helping to reduce a third of the epidemic size by the end of 2021.

Estimation of ACP invasion rate in the Central Valley using Texas ACP survey data
The potential epidemic of HLB in the Central Valley, the major citrus growing area in California, was at a much earlier stage than for southern California, with the vector, ACP, invading rather than being established (Fig 1C and 1F). Modelling the spread for ACP at the landscape scale requires an estimate of the vector invasion rate. Exploratory analyses showed this was not possible using the small amount of ACP trapping data for the Central Valley: instead, we calculated the ACP invasion rate using the more extensive ACP survey dataset for the Lower Rio Grande Valley in Texas. Although ACP had fully invaded Texas by the start of data collection in 2011, the region was not fully infested with HLB-infected ACP. By introducing a new epidemic category 'ACP + HLB infected' (Fig 2B) that connected the dynamics of ACP infestation to HLB infection in a grid cell (S1A Fig), it was possible to estimate an invasion rate parameter for ACP from the ACP diagnostic data collected as part of the Texas HLB survey (S1B Fig). In particular, the 'ACP + HLB infected' category marked cells containing HLBinfected ACPs. By modelling the transition of cells from 'ACP infested' to 'ACP + HLB infected', we estimated the rate at which vectors move from one cell to another. Spatiotemporal maps for the potential spread of the ACP vector and HLB infection up to 2030 are shown in Fig 10 on the assumption of transferable parameters from Texas with suitable adaptation for weather conditions in the Central Valley.
We validated the ACP spread model by running simulations to reproduce the historic spread in the Central Valley in 2015 and 2016 and compared model predictions with the ACP trapping data, which were collected independently from the HLB survey and not used for parameter estimation. We incorporated the dynamics of the reactive ACP treatment program by the California Department of Food and Agriculture (CDFA) into the model and observed good agreement in both temporal progression (Fig 11A and 11B) and spatial autocorrelation  We assumed that as ACP prevalence passes 0.3, ACP the reactive treatment program would have been dropped due to the high cost of maintaining the program and the reduced effectiveness as ACP becomes widespread. (C) Comparison of the spatial autocorrelation scores of the predicted ACP infestation prevalence (red line and shading) in the Central Valley with that of the ACP trapping data (purple line) at the end of the data availability period. (D,E) The effect of varying the efficiency and radius of pesticide treatment upon detection of the vector ACP on the total Infested area in the Central Valley. We started simulations from January 2020 and calculated the total Infested area for December 2021 for (D) control efficiency from 0% to 100% for treated circles of radius 0.4 km and (E) treatment radius from 0.1 km to 2 km assuming spraying efficiency of 80%.

Potential efficiency of vector control in the Central Valley
As the focus for the Central Valley was on ACP invasion, we considered two parameters that drive a reactive ACP eradication program: the eradication efficiency (Fig 11D), and the radius of the treated circle around an ACP positive site (Fig 11E). Our simulations allowed for pesticide treatment, applied by CDFA, on all citrus trees within a 400 m radius of an ACP positive site. Where treatment circles overlap a commercial grove, the whole grove is treated. Reactive treatment occurs in addition to a coordinated spray implemented annually by commercial growers. Simulation results show that having both high efficiency and sufficient radius are essential to slow down the spread of ACP in the Central Valley (Fig 11D and 11E). There is a consistent reduction in the median infested area and also the 95% credible interval as the eradication efficiency increases. An eradication efficiency of 80% reduces the median infested area by half. It also reduces the upper boundary of the credible interval by 75% compared with no treatment. Increasing the eradication radius from 100 m to 500 m decrease the expected infested area by 50% and the credible interval by 75%. The results suggest that although 80% eradication efficiency is reasonable, it might be worthwhile to increase the treatment radius to 500 m.

Discussion
Our primary aim in this paper was to develop and test an epidemiological modelling framework to predict the spread and control of Huanglongbing disease on citrus at extensive landscape scales (cf Fig 1). Our stochastic framework takes account of the intrinsic uncertainty of disease spread through heterogeneous citrus populations that encompass backyard trees and commercial citrus plantations. The framework was designed to be flexible. It allows for cryptic infection from sites that are infected but not yet symptomatic or detectable [1]. The model framework can be adapted to account for multiple sources of infection, and it has tuneable parameters that are adjustable to simulate a range of control scenarios. The models can also be adapted to allow for cases where the vector is endemic (Fig 2A) as in Texas and southern California or when the vector is also invading (Fig 2B) as in the Central Valley in California.
Having compared several model variants, the selected SEI model with an additional detected class (Fig 2A), an exponential dispersal kernel, normalised vector flux and allowance for vector control in commercial plantations successfully fitted the data for the citrus growing region in the Rio Grande Valley in Texas (Figs 3 and 4). Although sparse relative to the size of the host population, the surveillance data for Texas had the advantage over other datasets, such as for Florida, in that systematic surveys for both ACP and for Las had been conducted across a wide area before the first HLB positive trees were found. This allowed for direct observation of the spread of the disease from known points of introduction, which aided parameter estimation.

Insights from model fitting on dynamics of HLB spread in Texas
Our results distinguish four potential transmission rates in the Rio Grande Valley (Fig 5). Secondary transmission by local movement of infected vectors is the dominant force of infection in epidemic spread following introduction (Fig 5B). We show, however, that longer-distance vector spread, local cross-border border and human-mediated movements all contribute to the initiation of new infected sites. Secondary transmission is remarkably effective in spreading the pathogen thereafter ( Fig 5B). Moreover, retrospective inference of HLB infection times for Texas showed that the epidemic progressed at a much faster pace than had been captured by the survey data. There is a marked difference between the extent of spread of the detected class, which is comparable with the surveillance data, and the exposed and infectious classes (Fig 4). Knowledge of the locations of cryptically infected sites gives government and industry decision-makers a two-year advantage in knowledge of the extent of the epidemic when compared with survey data alone. Failure on the part of regulators and other stakeholders to allow for the enhanced spread of the pathogen beyond what is immediately detectable can lead to serious underestimation of the severity and impact of emerging epidemics [11].
We were able to infer the average efficiency of the annual coordinated spraying programme from the Texas survey data and to run retrospective analyses to compare with less effective uptake (Fig 6). Our results showed the benefit of having a high efficiency equivalent to a high level of participation from commercial growers in slowing epidemic progression, especially during the first few years after the invasion (Fig 6A). Thereafter, however, the benefit is insufficient to prevent the ultimate spread of the epidemic through the target region (Fig 6). Allowance for the localised disruption and interference of control on the intrinsic rates of epidemic spread is an important and often overlooked challenge for parameter estimation of emerging epidemics. Parry et al. [8] examined the impact of control on transmission and dispersal parameters for HLB at the plantation scale. Here we have extended the results from Parry et al. [8] using a similar SEI model and MCMC estimation to the landscape scale.

Efficiency of MCMC estimation
Surveillance data that comprise incomplete snapshots of disease at successive times also pose a significant challenge for parameter estimation [7,8]. We used a Markov chain Monte Carlo method with data augmentation (DA-MCMC) to infer chains of infection. The DA-MCMC method is considered a robust approach for inferring parameters for stochastic, individualbased epidemiological models [8,32,38,39]. The method has been used to estimate epidemiological parameters for heterogeneous plantation-scale epidemic systems with cryptic infections [8]. Related applications of the methods beyond plant disease include foot-and-mouth outbreaks in cattle [38], avian influenza epidemics in poultry [40] and MRSA outbreaks in hospital wards [41].
Convergence of the DA-MCMC method, however, is known to be difficult when applied to domains (landscapes) with heterogeneously distributed target populations. Accordingly, we improved the mixing and convergence of MCMC samplers for unobserved epidemic transitions by utilising the randomised construction of Markov trajectories [34] and exact inference algorithms for hidden Markov models [35]. Using the improved MCMC samplers, it was possible to infer locations that were cryptically (i.e., asymptomatically) infected from the survey data available at the time of prediction. Initialising spatio-temporal epidemic models with asymptomatic as well as symptomatic infected sites was essential to capture the current extent and the future potential for disease spread.

Transfer of the models to California
Transfer of the models to southern California gave encouraging results albeit in general patterns of epidemic spread, albeit with a caveat that some dynamics are not yet accounted. Allowing for cryptically infected sites (estimated from the training data) as well as survey reports of symptomatic sites when initialising the HLB spread model in southern California gave very good agreement in predicting the spread patterns observed in the two years of test data (Fig 7). There was also good agreement between model predictions and data for the spatial autocorrelation metric (Fig 8B). Dynamic risk maps for HLB in southern California from the epidemiological model (Fig 7) were markedly consistent with the risk maps derived by McRoberts et al. [24] from a statistical model. Building on earlier work to analyse risk in Florida [42] the statistical model used a mixture of social, biophysical and environmental variables to quantify risks across a rasterised landscape (1.6 km 2 ) analogous to the 1km 2 grid cells in the epidemiological model. Narouei-Khandan et al. [43] also used regression methods to relate global occurrence of HLB with historic climate data. The models also predicted that coastal areas in California were climatologically favorable for ACP and Las, but less so than in Florida. When current USA presence data were included in the models, the suitable areas for ACP establishment expanded to the Central Valley, CA, while this area remained less conducive for Las.
Careful inspection of the future predictions of the temporal data for southern California (Fig 8A) indicated evidence of deviation between the survey data and the detected class during the test period. The model results predicted continued acceleration in the rate of disease spread compared with a near-linear rate of increase in the surveillance data up to June 2019 (Fig 8A). Expert opinion based upon ongoing sampling of the vector indicates a striking difference between Texas and California in the percentage of ACP samples testing positive for Las.
Whereas >50% of psyllid samples tested positive for Las in Texas by 2018, only 0.25% of psyllid samples have tested positive for Las in California since 2017 (Bartels, pers comm). The highest level (0.6%) in southern California was reached in the 2 nd quarter of 2022. Even when data from the core areas of HLB positive trees are separated out, the percent ACP samples positive for Las is~2% of the total. The proportion of infested psyllids in Texas is comparable with reports from other states and countries, albeit with sometimes considerable variation. Wulff et al. [44] found the incidence of ACP with Las ranged from 33 to 74.6% in Brazil's citrus belt, while Hall [45] reports a mean of 17.5% of ACP with Las in Florida. Hu et al. [46] observed found the incidence of ACP with Las in China ranged from 3 to 78% and was correlated with HLB incidence in citrus trees.
The reason for the difference in the uptake of Las by ACP in California compared with other areas is unknown and requires further study to determine how the strains of Las in California interact with the ACP vector. Four different strains of Las in California, based on prophage typing groups, have been detected with sequence data indicating that the California strains were not introduced from Florida but are likely to have come from Asia [47]. Our model was adjusted to account for differences in temperature patterns between southern California and Texas, with essentially no vector dynamics outside 10˚-33˚C [30] in California. This might underestimate survival in certain high temperature inland regions. Antolínez et al. [48] recently analysed and discussed heat and dry stress on ACP, noting that survivorship at high temperatures is likely to depend upon exposure time. Antolínez et al. [48] suggest this may account for low ACP numbers during very hot summer months in inland areas of southern California compared with relatively higher ACP numbers reported for non-desert areas close to the coast.
Heat and dry stress are not sufficient to account for low uptake of Las by ACP and subsequent effects on transmission of Las. Our model parameters for the rates of acquisition of Las by ACP (ξ), and of primary (ε V ) and secondary infestation (β V ) can be altered to accommodate changes in uptake, with β P adjusted for transmission rates. This, however, requires further data and analysis.

Provisional control scenarios in California
The models were used to investigate the impact of different control strategies on epidemic outcomes for HLB in California under an assumption of the ability of the vector to acquire and transmit infection. While necessarily speculative, the analyses indicate the likely intrinsic sensitivity to adjusting the intensity of quarantine. Increasing the radius of the quarantine area to prevent movement of citrus products around newly detected HLB sites has the potential to reduce the total Infectious area for southern California (Fig 9). Within the Central Valley, our results indicate that changing the radius and the efficiency for the reactive ACP treatment programme could each reduce the infested area (Fig 11A and 11B) if a significant epidemic were to occur. There was also a marked reduction in the uncertainty of the outcomes of the programmes with enhanced control effort (Fig 11A and 11B).

Predictive models for disease outbreaks
Predictive models are an important tool in the management of infectious disease outbreaks.
Here we have used compartmental epidemiological models coupled with dispersal kernels to analyse and the predict the spread of HLB. Epidemiological models of this type have the advantage that both the state variables (e.g. susceptible, exposed, infectious sites) and the parameters (e.g. transmission rates, time to detection, dispersal kernel) have intrinsic biological meanings [1].The models can also be readily adapted to incorporate mechanisms for control, as here, for quarantine and vector control. Parry et al. [8], Chiyaka et al. [49], Neri et al. [7] used compartmental frameworks to model the dynamics of citrus disease at scales ranging from individual plants [49] through plantations [8] to local landscapes [7]. Parry et al. [8] and Neri et al. [7] also addressed fundamental issues in the use and analysis of Bayesian methods to estimate parameters from surveillance data for emerging epidemics of HLB and citrus canker. Haynes et al. [50] used a related approach of agent-based modelling to analyse ACP and HLB spread on a lattice of nine plantations. Our work, in this paper, has extended the scale to analyse and predict the spread and control of the pathogen and the vector at large heterogeneous landscape scales that extend across multiple counties. We also address the transferability and modulation of parameter estimates derived from one region to other regions. Compartmental epidemiological models, analogous to those introduced here, have also been widely used to respond to outbreaks and to formulate current and future policies for livestock and human diseases. Examples range from early work on foot-and-mouth disease [26,51], severe acute respiratory syndrome (SARS) [52] and recent intensive work on SARS-CoV-2 (e.g. [53-55]). Parameter estimation for emerging epidemics of human and livestock populations frequently benefits from enhanced reporting and more extensive mapping of outbreaks than is usual for crop disease. Many of the challenges, however, in extracting signals from comparatively sparse data are common to all types of infectious disease epidemiology and deserve more study to assess how soon and for what types of data robust initial estimates of epidemiological parameters can be obtained for emerging epidemics. Statistical methods, based on regression and correlation, along with the increasing popularity of machine learning [56] offer alternative approaches to assessing disease risk. McRoberts et al. [24] estimated landscape-scale risk maps for HLB in California at an analogous spatial resolution to the one used in this paper while Alves et al.
[57] used a hierarchical Bayesian modelling approach to link climatic variables with the regional prevalence of HLB (at a spatial resolution of 55 x 55 km) in Minas Gerais state in Brazil. Further work is needed to assess the relative merits and complementarities of different approaches to modelling risk and assessing options for control of emerging epidemics such as for HLB on citrus.
Supporting information S1 Text. Technical Appendix. Contains additional details of data, epidemiological models and for methods parameter estimation and model prediction used in the analyses. (DOCX)

S1 Fig. The epidemiological model and Texas data used to estimate the ACP invasion rate.
(A) The joint model of ACP and HLB spread after accounting for the fact that the vector had fully infested the citrus landscape before the modelling period. The ACP and HLB spread share the dispersal scale parameter and differ by the rates of invasion β V and transmission β P respectively. τ V!H denotes the probability or efficacy of pathogen transmission to trees during the feeding of infected psyllids. The full model which allows for emerging ACP population is shown in Fig 2B (main text). (B) Geo-coded diagnostic samples of the vector ACP collected between December 2011 and October 2018 as part of the HLB state-wide survey in Texas. Samples with Ct value less than 38 were marked as positives. Diagnostic samples of citrus leaves collected as part of the same survey are mapped in Fig 1D (main text).