Dengue modeling in rural Cambodia: statistical performance versus epidemiological relevance

Dengue dynamics are shaped by the complex interplay between several factors, including vector seasonality, interaction between four virus serotypes, and inapparent infections. However, paucity or quality of data do not allow for all of these to be taken into account in mathematical models. In order to explore separately the importance of these factors in models, we combined surveillance data with a local-scale cluster study in the rural province of Kampong Cham (Cambodia), in which serotypes and asymptomatic infections were documented. We formulate several mechanistic models, each one relying on a different set of hypotheses, such as explicit vector dynamics, transmission via asymptomatic infections and coexistence of several virus serotypes. Models are confronted with the observed time series using Bayesian inference, through Markov chain Monte Carlo. Model selection is then performed using statistical information criteria, but also by studying the coherence of epidemiological characteristics (reproduction numbers, incidence proportion, dynamics of the susceptible class) in each model. Considering the available data, our analyses on transmission dynamics in a rural endemic setting highlight both the importance of using two-strain models with interacting effects and the lack of added value of incorporating vector and explicit asymptomatic components.


INTRODUCTION INTRODUCTION
The World Health Organization (WHO) considers that dengue is a major public health issue 41 worldwide, with four billion people in 128 countries exposed to the dengue virus [3,4], an    130 We take as the reference population (N=161391) the number of children below 15 years old . Models 140 We take a Susceptible-Exposed-Infected-Recovered (SEIR) model as the simplest model 141 (cf. Figure 2). In this model, the basic reproduction number, i.e. the number of secondary

One-strain models
This model is compared with two other models that include the mosquito vector transmission 145 components. In the first one, derived from Pandey et al. [29], the vector is modelled . Models METHODS external force of infection including two stages, latent (κ) and current (λ) (cf. Figure 4). 149 We derived R 0 for each model as R Pandey 0 (γ+µ H )(σ+µ H ) µ V (µ V +τ) and the estimation . Models where v s is the proportion of susceptible mosquitoes, v E the proportion of exposed mosquitoes, implicit vector-borne transmission is modelled with the compartments κ and λ; λ current force of infection; κ latent force of infection reflecting the exposed state for mosquitoes during the extrinsic incubation period; β(t) is the transmission parameter; τ is the transition rate associated with the extrinsic incubation period.
The equations describing the Laneri model are: . Models 160 We also consider a model in which asymptomatic infections are explicitly taken into account 161 in the transmission process (cf. Figure 5)

Model with explicit asymptomatic individuals (SEIAR)
In the 2012 and 2013 epidemics, DENV-1 was highly dominant : the three other serotypes 172 represented less than 1% of the cases reported in the DENFREE study in 2012 and about 173 20% in 2013 (cf. Table 1). Therefore, a two-strain model is also studied, in which we 174 separate DENV-1 cases from DENV-2, DENV-3 and DENV-4 combined (cf. Figure 6).

175
For simplicity and parsimony in the number of parameters, the two strains share the same 176 parameter values. We first assume both strains to be independent (ψ = 1 in equation 5, 177 called SEIR2 model). In this context, the reproduction numbers for each strain are equal, strains; H E1 (resp. H E2 ) individuals infected (not yet infectious) to strain 1 (resp. strain 2); H I 1 (resp. H I 2 ) individuals infectious to strain 1 (resp. strain 2); H S1 (resp. H S2 ) individuals immune to strain 1 only (resp. strain 2); H E12 (resp. H E21 ) individuals (not yet infectious) with a secondary infection to strain 2 (resp. strain 1); H I 12 (resp. H I 21 ) infectious individuals with a secondary infection to strain 2 (resp. strain 1); H R individuals immune to both strains; β(t) is the transmission parameter; σ is the rate at which exposed individuals move to the infectious class; infectious individuals then recover at rate γ; ψ is the change in infectivity for secondary infected individuals in SEIR2psi model (in SEIR2 model, ψ = 1); individuals leave the children population at rate µ H .
, according to a sinusoidal function whose phase p and 197 amplitude b are estimated. 198 We also assume that a constant number of cases i are imported. The prior distributions are listed in Table 3.

217
[40], which we considered as the closest setting to be compared with Kampong Cham. We S λ is used as the mean of the gaussian prior, and the standard deviation is fixed at 0.05.

222
Using information on the sampling scheme, we calculate an observation rate on DENFREE 224 data (cf. Observation rate (r s =(g/f)*(n/N) and r a =b/N) 0.0283 0.0107 0.0661 0.0255 We first calculated this indicator on the data used for estimation (separating NDSS data 252 and DENFREE data). Then we used it to assess the predictive performance of the model,     [2014][2015]. It is also computed on separated years in order to highlight well or badly estimated years: for example, for each simulation i,  Figures 12 and 13).
For single strain models, the SEIR and Pandey models proved the best with the DIC criterion 274 (cf. Table 6). For two strain models, the SEIR2psi proved best. As regards simulation-based but the ones with two strains overpredict the first epidemic peak (as pointed out in Table 6).

285
SEIR2psi is the model in which the large epidemic in 2007 is best reproduced and overall,

286
SEIR2psi is the model that reproduces most accurately the observed data.    Simulations with negative binomial noise using parameters from the posterior distribution (SEIR/Laneri/Pandey/SEIR2/SEIR2psi: observed NDSS cases, SEIAR: hospitalized cases). Posterior median (solid line), 95% credible intervals (shaded blue area) and NDSS data points (black dots). Considering the predictive capacity of the models, only SEIR2psi model predicts a smaller 288 epidemic in 2014, as was observed in the data (cf. Figure 9). Across the other models,     The average R 0 is estimated to be between 2 and 3 in most of the models and the maximum 293 value between 3 and 4 (cf. Table 7), except with the Pandey model, in which it is higher  The observation rate for NDSS data varies between models, from 14 to 21% in models with 322 one strain and between 6 and 13% in the two strain models. These values indicate that a 323 large proportion of dengue cases are not reported in national surveillance, likely reflecting 324 mild symptoms that do not require hospitalization or misdiagnosis or misreporting.

326
In the SEIR2psi model, we also plotted the current number of infected individuals for each 327 strain (cf. Figure 10). In our simulations, the first strain is responsible for large explosive 328 outbreaks, whereas the second one has a more regular pattern over the years. Moreover, 329 the two strains are asynchronous and each one dominates for two or three years. It is also 330 qualitatively close to the dynamics observed in Thailand [13] or Singapore [54].

331
The proportion of susceptible individuals also displays asynchronous dynamics between 332 strains, as they reflect the history of past epidemics. Despite the seasonality and the 333 year-to-year variations, the total number of susceptibles remains high (cf. Figure 10 for We calculated the annual incidence proportion as the proportion of new infections over one 337 year among the susceptibles at the beginning of that year (cf. Figure 11). The values for 338 primary infections are coherent with other studies in Vietnam or Indonesia who analyzed 339 is highly variable from one year to the next, especially in models with two strains.  ones, and there is no strain specific R 0 or incidence.

389
Secondly, the selected model formulations were restricted due to data availability. In 390 particular, despite the endemicity of the four serotypes in rural Cambodia, we did not 391 consider more than two dengue serotypes. This was done to limit model complexity,