Statistical methods for linking geostatistical maps and transmission models: Application to lymphatic filariasis in East Africa

Infectious diseases remain one of the major causes of human mortality and suffering. Mathematical models have been established as an important tool for capturing the features that drive the spread of a disease, predicting the progression of an epidemic, and hence guiding the development of strategies to control it. Another important area of epidemiological interest is the development of geostatistical methods for the analysis of data from spatially referenced prevalence surveys. Maps of prevalence are useful not only for enabling a more precise disease risk stratification, but also for guiding the planning of more reliable spatial control programmes by identifying affected areas. Despite the methodological advances that have been made in each area independently, efforts to link transmission models and geostatistical maps have been limited. Motivated by this fact, we developed a Bayesian approach that combines fine-scale geostatistical maps of disease prevalence with transmission models to provide quantitative, spatially explicit projections of the current and future impact of control programmes against a disease. These estimates can then be used at a local level to assess the effectiveness of suggested intervention schemes and to allow investigation of alternative strategies. The methodology has been applied to lymphatic filariasis in East Africa to provide estimates of the impact of different intervention strategies against the disease.


Appendix A.2. The absolute continuity condition
The absolute continuity condition in Lemma 1, that f must be absolutely continuous with respect to g, means that whenever g(p) = 0 we must also have f(p) = 0. In other words, when the prior probability of a prevalence is zero, the map measure of that prevalence must also be zero. This has important implications for implementing our method. For example, we find that our prior for the transmission model parameters is unlikely to produce any simulations with a prevalence above 85%. However, prevalences this high do appear in the tails of the posterior distribution of the geostatistical model, especially when the amount of uncertainty is high. When the weighting is applied, there are no simulations to capture the high-prevalence region, which leads, for example, to an underestimation of the mean prevalence.
There are several potential approaches to ameliorate this. The simplest would be to mask out areas of the map where this occurs. However, there may be large areas of the map that have only a very small probability of producing a prevalence above 85% according to the geostatistical model, and masking would exclude these areas unnecessarily.
A second approach, which we could take if we believed that prevalences above 85% were a priori impossible, would be to remove any samples from the geostatistical map that went above 85%. This is effectively the same as applying the weighting naively, and leads to the mean and median of the prevalence distribution of the simulations being lower than the corresponding statistics from the geostatistical model. A third, more sophisticated approach would be to shift the weight from high prevalences in the geostatistical map to the nearest prevalences available in the simulations, for example by using the histogram-based empirical Radon-Nikodym derivative with a wide bin capturing all prevalences above 75%. Although this reduces the size of the underestimation, it substantially reduces the effective sample size, as a small number of simulations get very high weight.
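The underestimation produced by the second approach (simply discarding map samples the simulations cannot reach) can be seen in a small numerical sketch. This is our own illustrative code, not the paper's: Beta(8, 4) is an arbitrary stand-in for one pixel's posterior prevalence samples with some mass above 85%, not a fitted distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for one pixel's posterior prevalence samples,
# with some mass above 85% (Beta(8, 4) is arbitrary, not fitted).
map_samples = rng.beta(8.0, 4.0, size=100_000)

# Second approach: drop map samples above the reachable prevalence range.
truncated = map_samples[map_samples <= 0.85]

# The truncated mean is necessarily lower than the full map mean --
# this is the underestimation described above.
print(map_samples.mean(), truncated.mean())
```

The same effect occurs implicitly whenever the weighting is applied with no simulations in the upper tail.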
Appendix A.3. Minimum discrepancy-based empirical Radon-Nikodym derivative

For each pixel, we would like to minimise the following distance between the empirical cumulative distribution functions (cdfs) of the posterior prevalences and the weighted simulated prevalences:

D(w^(2)) = Σ_{j=1}^{J} [ F̂(p_(j)) − Σ_{k=1}^{j} w^(2)_(k) ]², (A.1)

with respect to the weights w^(2), where F̂ denotes the empirical cdf of the posterior prevalence samples. First, the posterior and simulated prevalence samples are sorted in ascending order. Similarly, we introduce the notation w^(2)_(j) for the weight of the simulation which produces the j-th smallest prevalence p_(j), for j = 1, 2, . . . , J. For clarity of notation, an example of two empirical cdfs is shown in the left panel of Figure A.1.

Starting with j = 1, at the minimum Equation (A.1) is equal to 0, therefore:

ŵ^(2)_(1) = F̂(p_(1)).

Similarly, we differentiate the distance between the two cdfs with respect to w^(2)_(j), for j = 2, 3, . . . , J − 1, and by setting it equal to 0 we obtain:

ŵ^(2)_(j) = F̂(p_(j)) − F̂(p_(j−1)).

Finally, to ensure that the weights sum to one, for j = J we have that:

ŵ^(2)_(J) = 1 − Σ_{k=1}^{J−1} ŵ^(2)_(k).

As an example, in the right panel of Figure A.1 we provide the optimal weights ŵ^(2)_(j), i.e. the ones achieving the minimum distance between the two empirical cdfs.

Figure A.1: An example of the empirical cumulative distribution functions of the posterior prevalences (red) and the weighted simulated prevalences (blue). The empirical cdf of the weighted simulated prevalence is shown in the left panel with arbitrary weights and in the right panel using the optimal weights that achieve the minimum distance between the two cdfs, as described in Appendix A.3.
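The closed-form weights described above (first weight, interior increments, final normalising weight) translate directly into code. The sketch below is our own naming and implementation, not the authors' code; it uses `np.searchsorted` to evaluate the empirical posterior cdf at each sorted simulated prevalence.

```python
import numpy as np

def min_discrepancy_weights(posterior, simulated):
    """Weights w(2) whose cumulative sums match the posterior ecdf at the
    sorted simulated prevalences (the closed-form minimiser)."""
    posterior = np.sort(posterior)
    order = np.argsort(simulated)
    # Posterior ecdf evaluated at each sorted simulated prevalence p_(j).
    F = np.searchsorted(posterior, simulated[order], side="right") / len(posterior)
    w = np.empty(len(order))
    w[0] = F[0]                      # w(1) = F(p_(1))
    w[1:-1] = np.diff(F)[:-1]        # w(j) = F(p_(j)) - F(p_(j-1)), j = 2..J-1
    w[-1] = 1.0 - w[:-1].sum()       # ensures the weights sum to one
    out = np.empty_like(w)
    out[order] = w                   # align with the original simulation order
    return out
```

Because F̂ is non-decreasing along the sorted prevalences, all weights are non-negative, and by construction the weighted ecdf of the simulations reproduces the posterior ecdf at every simulated prevalence.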

Appendix B. Toy example
Suppose that the prior distribution is π(θ1, θ2) = 2 if 0 < θ2 < θ1 < 1 and zero otherwise. The prior support and marginal densities are shown in Figure B.2. For simplicity, assume that the transmission model has equilibrium prevalence given by p(θ1, θ2) = θ1, so that the induced prior over prevalences is the marginal for θ1, i.e. g(p) = 2p for 0 < p < 1. Further, suppose that we are given a pixel with prevalence measure f(p) = 2(1 − p) for 0 < p < 1.
This challenging example allows us to assess how the methodology performs when there are few simulations from the proposal in the area of high posterior probability close to p = 0, so that those few simulations receive very large weights.
By Lemma 1 of the main text, the new measure over the parameter space is given by

π*(θ1, θ2) = π(θ1, θ2) f(p(θ1, θ2)) / g(p(θ1, θ2)) = 2 × 2(1 − θ1) / (2θ1) = 2(1 − θ1)/θ1, for 0 < θ2 < θ1 < 1.

First, we can verify that we get the correct marginal for p = θ1:

∫_0^{θ1} 2(1 − θ1)/θ1 dθ2 = 2(1 − θ1) = f(θ1),

as required. Second, we can determine the new marginal density for θ2:

π*(θ2) = ∫_{θ2}^{1} 2(1 − θ1)/θ1 dθ1 = 2(θ2 − 1 − ln θ2), for 0 < θ2 < 1. (B.1)
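The toy example can be checked with a small Monte Carlo sketch (our own illustrative code, not the paper's): the weights are simply the Radon-Nikodym ratio f/g evaluated at the simulated prevalences, and the weighted means should recover the analytic values under the new measure (E[θ1] = ∫ p f(p) dp = 1/3 and E[θ2] = E[θ1]/2 = 1/6).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Draw from the prior pi(theta1, theta2) = 2 on 0 < theta2 < theta1 < 1:
# theta1 ~ Beta(2, 1), then theta2 | theta1 ~ Uniform(0, theta1).
theta1 = rng.beta(2.0, 1.0, size=n)
theta2 = theta1 * rng.uniform(size=n)

# Induced prior over prevalence: g(p) = 2p; pixel measure: f(p) = 2(1 - p).
# Radon-Nikodym weights w = f(p) / g(p) evaluated at p = theta1.
w = (1.0 - theta1) / theta1
w /= w.sum()

# Weighted means; analytically E[theta1] = 1/3 and E[theta2] = 1/6.
mean_t1 = np.sum(w * theta1)
mean_t2 = np.sum(w * theta2)

# The effective sample size collapses because the few draws near p = 0
# receive very large weights -- exactly the difficulty this example probes.
ess = 1.0 / np.sum(w ** 2)
print(mean_t1, mean_t2, ess)
```

The large-weight draws near p = 0 make the estimator heavy-tailed, which is why the simulation studies below monitor the effective sample size alongside accuracy.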
In this section, we perform a series of simulations to assess the accuracy and efficiency of the proposed method for recovering the distribution of pixel prevalence. Particular focus is given to how the method is affected as we vary the value of δ, the proposal distribution q(θ), and the empirical estimate of the Radon-Nikodym derivative.
We first investigated the performance of the proposed method as a function of δ, where the observed pixel and simulated prevalence data are obtained from the toy model described in Appendix B. In particular, we generated M = 2 000 pixel prevalence samples from Beta(1,2). We then generated J = 2 000 samples from the joint prior distribution of parameters θ1 and θ2; first we drew θ1 ∼ Beta(2,1) and then θ2 | θ1 ∼ θ1 × Beta(1,1). Therefore, the obtained simulated prevalence samples are draws from Beta(2,1), which is the marginal prior for θ1. We considered values for δ from 0.001 through to 0.1, increasing by 0.001 each time.
Accuracy was assessed by computing the Kolmogorov-Smirnov (KS) distance, which is defined as the largest vertical distance between the two empirical cumulative distribution functions of the pixel prevalence and the weighted simulated prevalence. We also consider an additional measure for quantifying the distance between the two cdfs, termed the integrated squared distance and given by

ISD = ∫_0^1 [F̂_f(p) − F̂_w(p)]² dp,

where F̂_f and F̂_w denote the empirical cdfs of the pixel prevalences and the weighted simulated prevalences, respectively. The calculation of the two distances is repeated 100 times for each value of δ, using new pixel and simulated prevalence samples each time, in order to prevent biases arising from the simulation procedure. Results are shown in Figure B.3. We see that the performance of our method is affected by the value of δ. In the left panel of Figure B.3 we show the median KS distance, along with the 95% interval over the 100 realisations, for each δ. Overall, the accuracy of the algorithm increases as δ grows from 0.001 to 0.033, where the median KS distance reaches its minimum value, and for δ higher than 0.033 a slight decrease is observed. The integrated squared distance, shown in the middle panel of Figure B.3, leads to identical conclusions. Therefore, from now on we use the integrated squared distance to assess the accuracy of the method.
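Both distances can be computed directly from the posterior samples and the weighted simulated samples. The helper below is our own sketch (`ks_and_isd` is a hypothetical name, not from the paper), assuming prevalences lie in [0, 1] and evaluating both ecdfs on a common grid.

```python
import numpy as np

def ks_and_isd(posterior, simulated, weights, n_grid=2001):
    """KS distance and integrated squared distance between the ecdf of
    `posterior` and the weighted ecdf of `simulated`, on a grid over [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    F_post = np.searchsorted(np.sort(posterior), grid, side="right") / len(posterior)

    order = np.argsort(simulated)
    cum_w = np.cumsum(weights[order] / np.sum(weights))
    idx = np.searchsorted(simulated[order], grid, side="right")
    F_sim = np.where(idx == 0, 0.0, cum_w[np.maximum(idx - 1, 0)])

    diff = F_post - F_sim
    ks = np.abs(diff).max()                        # largest vertical distance
    isd = np.sum(diff ** 2) * (grid[1] - grid[0])  # Riemann-sum approximation
    return ks, isd
```

Evaluating on a fixed grid approximates the supremum and the integral; a finer grid tightens both approximations at linear cost.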
An opposite pattern is observed in the efficiency of the method, as can be seen in the right panel of Figure B.3.

In this section, we carry out a sensitivity analysis to assess the effect that the proposal distribution of the parameters has on the performance of the method. More specifically, instead of drawing θ1 from its prior distribution Beta(2,1), we consider a uniform proposal U(0, 1). This change results in a uniform distribution over the simulated prevalences. As before, we assess the performance of the method for different values of δ; results are shown in Figure B.4. Overall, we conclude that the performance of the method improves substantially when we move from the prior to the uniform proposal distribution over the prevalences. This is because there are no areas of strong posterior probability that contain few simulations from the proposal. In particular, the median integrated squared distance between the two cdfs (left panel) is much lower under the uniform proposal, and is also associated with lower variability of the estimate. In addition, using a uniform proposal distribution leads to a substantial increase in ESS (right panel), with the median ESS being at least 1.5 times higher for values of δ close to 0.001 and up to 3.5 times higher for larger values of δ. For reference, we also display the minimum discrepancy-based empirical Radon-Nikodym derivative, as described in Appendix A.3, and its associated ESS. Overall, we come close to reaching the minimum possible integrated squared distance between the two cdfs, with larger ESS.
To further assess the accuracy of the method we compared the weighted posterior distribution of θ 2 with its true density, calculated in Equation (B.1).
Results are shown in Figure B.5 for each choice of the proposal distribution.
The simulation analysis illustrates that both proposals perform well in terms of recovering the target density of θ2. Nevertheless, using a uniform proposal for the prevalence leads to a large improvement in the estimate.

In order to study the influence of the empirical Radon-Nikodym derivative (ERND) on the performance of the method, we repeated our simulation analysis for three different choices: (a) the proposed empirical estimate described in Section 2.2, referred to as the distance-based derivative, (b) the histogram-based derivative in Section 2.5.1, and (c) the discrepancy-based derivative using the integrated squared distance (for more details see Section 2.5.2 of the manuscript). Results are given in Table B.1. Note that the proposed distance-based derivative is based on using the prevalences within δ/2 of p_j, and the histogram-based derivative is calculated by splitting the [0, 1] interval into 100 equal bins.
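The histogram-based derivative with 100 equal bins can be sketched as follows; `histogram_ernd_weights` is our own naming and implementation of the idea, not the authors' code. Each simulated prevalence is weighted by the ratio of binned density estimates f/g in its bin, so all simulations in the same bin receive the same relative weight.

```python
import numpy as np

def histogram_ernd_weights(posterior, simulated, n_bins=100):
    """Histogram-based ERND: ratio of binned densities f/g evaluated at
    each simulated prevalence, normalised to sum to one."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    f_counts, _ = np.histogram(posterior, bins=edges)
    g_counts, _ = np.histogram(simulated, bins=edges)
    # Bin index of each simulated prevalence (clip keeps p = 1.0 in the last bin).
    idx = np.clip(np.digitize(simulated, edges) - 1, 0, n_bins - 1)
    # g_counts[idx] >= 1 because each simulation lies in its own bin.
    ratio = (f_counts[idx] / len(posterior)) / (g_counts[idx] / len(simulated))
    return ratio / ratio.sum()
```

Because the relative weightings within each bin are unchanged, the weighted sample only matches the target distribution at the resolution of the bins, which is the bin-choice sensitivity discussed below.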
As expected, the discrepancy-based ERND is optimal in terms of the distance between the two cdfs, but results in much lower ESS compared to the alternative derivatives. When there are few simulations from the proposal in the area of high posterior probability close to zero prevalence, the distance-based ERND performs better than the histogram-based ERND in terms of both integrated squared distance and ESS. When we move to a uniform simulated prevalence distribution, we find that the histogram-based derivative outperforms the distance-based derivative and scores highest in terms of ESS, followed in order by the distance-based and the discrepancy-based derivative. However, its performance depends on the choice of bins, since the relative weightings within each bin are unchanged. For example, Figure B.6 uses just 10 bins to illustrate that the distribution within each bin does not reflect the target distribution. Finally, we conclude that the performance of all the derivatives is greatly improved when we change the proposal from the prior to the uniform distribution over the prevalences.

Appendix C. The transmission model

The mf density for each individual increases according to the number of fertile female worms and decreases at a constant rate. The total mf density in the population contributes towards the current density of L3 larvae in the human-biting mosquito population. Therefore, the model describes the dynamics of individual humans, worms inside the host, mf inside the host, and larvae inside the mosquito. A detailed description and mathematical formulation of the model are given in Irvine et al. (2015) and more recently in Smith et al. (2017), so here we provide a summary.

Worm dynamics
For each individual i, both male and female worms are added according to their bite risk b_i, which is individually sampled from a gamma distribution with mean 1 and shape parameter k. The rate at which an individual i acquires an adult worm, either female or male, is described by the following expression:

λ (V/H) ψ1 ψ2 s2 h(a) b_i L3*,

where λ is the number of bites per mosquito, V/H is the ratio of vectors to hosts, ψ1 is the probability that an L3 larva leaves the mosquito during a biting event, ψ2 is the probability that the L3 enters the host, s2 is the proportion of L3 within the host that develop into adult worms, h(a) is the age-dependent biting rate, which increases with body size and saturates at age nine, and L3* is the equilibrium number of L3 larvae per mosquito. Finally, we assume that each worm has a constant death rate µ.

Mf dynamics
For an individual i, the dynamics of the mf density M_i are given by the following expression:

dM_i/dt = α W^f_i I(W^m_i > 0) − γ M_i,

where α is the production rate of mf per female worm, γ is the constant death rate of mf, W^f_i and W^m_i are the numbers of female and male worms in host i, and the indicator function I(W^m_i > 0) is one if there is at least one male worm and zero otherwise.
This expression follows from the fact that W. bancrofti is completely polygamous and therefore the mf production rate depends upon the number of female worms combined with the presence of at least one male in the human host.
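A forward-Euler step of this within-host equation can be sketched as follows (our own illustrative code, with the mating indicator implemented as a simple conditional):

```python
def mf_step(M, n_female_worms, n_male_worms, alpha, gamma, dt):
    """One forward-Euler step of dM/dt = alpha * Wf * I(Wm > 0) - gamma * M."""
    # mf are produced only if at least one male worm is present (polygamy).
    production = alpha * n_female_worms if n_male_worms > 0 else 0.0
    return M + dt * (production - gamma * M)
```

Iterating the step drives M towards the equilibrium α·W^f/γ whenever a male worm is present, and towards zero otherwise.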

Larvae dynamics
Larvae development occurs when mf enter the mosquito during a blood meal from an infected host. There are two forms of this relationship, depending on the genus of the mosquito vector:

L(m) = κ_s (1 − e^{−rm/κ_s})²  (anopheline vectors) and L(m) = κ_s (1 − e^{−rm/κ_s})  (culicine vectors),

where m is the concentration of mf per 20µL taken during a blood meal and r, κ_s are the parameters related to the functional form of the uptake curve (Gambhir and Michael, 2008). The uptake of larvae in the population is an average over all individuals of the uptake from their peripheral-blood mf concentration, weighted by their bite-risk b_i:

L̄ = Σ_i b_i L(m_i) / Σ_i b_i,

giving the average number of larvae taken up per mosquito. The density of L3 larvae per mosquito then evolves as

dL3/dt = λ g L̄ − (σ + λ ψ1) L3,

where λ is the number of bites per mosquito, g is the proportion of mosquitoes which pick up infection when biting an infected host, σ is the death rate of mosquitoes and ψ1 is the proportion of L3 leaving the mosquito per bite.

Finally, the equilibrium value for L3 in a mosquito is given by:

L3* = λ g L̄ / (σ + λ ψ1).
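Under the uptake forms and parameter names above, the equilibrium can be sketched in a few lines. This is our own sketch (`larvae_equilibrium` is a hypothetical helper), and all parameter values used to exercise it are illustrative, not fitted.

```python
import numpy as np

def larvae_equilibrium(mf, b, lam, g, sigma, psi1, r, kappa_s, anopheline=True):
    """Equilibrium L3 per mosquito: L* = lam * g * Lbar / (sigma + lam * psi1).

    mf : array of per-individual mf concentrations m_i
    b  : array of per-individual bite risks b_i
    """
    x = 1.0 - np.exp(-r * np.asarray(mf) / kappa_s)
    uptake = kappa_s * (x ** 2 if anopheline else x)   # anopheline vs culicine form
    Lbar = np.sum(b * uptake) / np.sum(b)              # bite-risk-weighted average
    return lam * g * Lbar / (sigma + lam * psi1)
```

With no mf in the population the equilibrium is zero, and it increases monotonically with the mf concentrations, as expected from the uptake curves.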

Host dynamics
Each human begins with zero infection and, as mentioned before, has a bite-rate of exposure drawn from a gamma distribution with mean 1 and shape parameter k. The shape parameter defines how aggregated bites are amongst individuals and consequently defines the aggregation of infection amongst individuals (Irvine et al., 2015). Each individual has a rate of death τ that is assumed to be constant throughout an individual's lifetime, with a cut-off at age 100.
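The bite-risk draw can be sketched directly: a gamma distribution with mean 1 and shape k has scale 1/k, so the variance is 1/k and smaller k means more aggregated exposure. The value k = 0.3 below is purely illustrative, not a fitted parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 0.3          # shape parameter; illustrative value, not fitted
n = 100_000

# Gamma bite-risk multipliers with mean 1: shape k, scale 1/k.
b = rng.gamma(shape=k, scale=1.0 / k, size=n)

# Mean is 1 by construction; variance 1/k grows as k shrinks,
# concentrating bites on a few heavily bitten individuals.
print(b.mean(), b.var())
```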
A list of the basic model parameter values is provided in Table C.3, with specification of their source. Prior distributions are placed on the ratio of vectors to hosts (V/H), individual exposure to mosquitoes (k) and the importation rate (α_Imp). More specifically, for the importation rate we used a uniform prior distribution with minimum 0 and maximum 0.0005 (at most 5 imported infections per 10 000 individuals per month). We investigated different maximum values for the importation rate and chose the one which gave the desired results without driving the dynamics of the disease. In addition, the interventions reduce the prevalence over time, and therefore, as years pass, we decrease the importation rate after intervention in proportion to the reduction in prevalence seen in pilot simulations. More specifically, we produced 2 000 simulations with constant importation rate over five years of MDA. Then, for our main set of simulations, we adjusted the importation rate according to how the prevalence changed in our pilot runs after the intervention was applied.
Parameters V/H and k were drawn from a range of plausible values based on previously analysed data (Irvine et al., 2015; Smith et al., 2017). A graphical representation of their prior distributions is shown in the left panel of Figure C.7. We then generated 100 000 parameter vectors by randomly sampling from the prior distributions of parameters V/H, k and α_Imp and from the proposal distribution of the population size, as described in Section 4.3 and shown in the right panel of Figure C.7. These samples were then used to generate 100 000 simulations from the model for each scenario specified in the main text, using the exact years and coverage of the MDA treatments.
Our simulations here are focused on areas with Anopheles as the dominant vector species. Finally, we set δ = 0.01.

Point estimates along with lower (2.5%) and upper (97.5%) percentiles are presented.