Improving early epidemiological assessment of emerging Aedes-transmitted epidemics using historical data

Model-based epidemiological assessment is useful to support decision-making at the beginning of an emerging Aedes-transmitted outbreak. However, early forecasts are generally unreliable as little information is available in the first few incidence data points. Here, we show how past Aedes-transmitted epidemics help improve these predictions. The approach was applied to the 2015–2017 Zika virus epidemics in three islands of the French West Indies, with historical data including other Aedes-transmitted diseases (chikungunya and Zika) in the same and other locations. Hierarchical models were used to build informative a priori distributions on the reproduction ratio and the reporting rates. The accuracy and sharpness of forecasts improved substantially when these a priori distributions were used in models for prediction. For example, early forecasts of final epidemic size obtained without historical information were 3.3 times too high on average (range: 0.2 to 5.8) with respect to the eventual size, but were far closer (1.1 times the real value on average, range: 0.4 to 1.5) using information on past CHIKV epidemics in the same places. Likewise, the 97.5% upper bound for maximal incidence was 15.3 times (range: 2.0 to 63.1) the actual peak incidence, and became much sharper at 2.4 times (range: 1.3 to 3.9) the actual peak incidence with informative a priori distributions. Improvements were more limited for the date of peak incidence and the total duration of the epidemic. The framework can adapt to all forecasting models at the early stages of emerging Aedes-transmitted outbreaks.


Models
Here, we provide a detailed description of the three different models adopted to reconstruct the informative priors and to forecast the ZIKV epidemics in the French West Indies. A one island, one disease model was adopted to forecast the ZIKV epidemic in each island of the French West Indies based on dataset D1 and the three different prior choices presented in the main paper. A several islands, one disease approach was used to jointly model the CHIKV epidemics in the three islands of the French West Indies (dataset D2) and to obtain the posteriors on regional and local CHIKV parameters, as described in the first step of the procedure presented in the main paper. A several islands, two diseases model [?] was used for the CHIKV and ZIKV epidemics in six islands of French Polynesia (dataset D3) to recover the ratio between ZIKV and CHIKV parameters, as described in the second step in the main paper.
1.1 One island, one disease: dataset D1

We are interested in modelling the observed ZIKV incidence data available in an island of the French West Indies (up to week $K$) and in producing forecasts of future observed incidence (up to week $K + 104$). We consider the time series $O = \{O_t\}_{t=1,\dots,K}$ of the weekly number of incident cases reported to the surveillance system. $O$ consists of the cases who sought clinical advice and were diagnosed, a fraction of the (unobserved) incident infected cases $I = \{I_t\}_{t=1,\dots,K}$. We therefore write, in the "observation" level of the model, that $O_t$ is a proportion $\rho$ of all cases $I_t$ according to:

$$O_t \mid I_t \sim \mathrm{Binomial}(I_t, \rho) \tag{1}$$

where $\rho$ is the probability that an infected case consulted with a health professional, was diagnosed and reported, hereafter referred to as the reporting rate. In the "transmission" level, we link incidence $I_t$ with past observed incidences $O_t^- = \{O_1, \dots, O_{t-1}\}$ as:

$$I_t \mid O_t^- \sim \mathrm{Binomial}\left(S_t, \frac{R_0}{N} \sum_{n=1}^{5} w_{t,n} \frac{O_{t-n}}{\rho}\right) \tag{2}$$

where $N$ is the total population of the island and $R_0$ is a transmission parameter. The term $\sum_{n=1}^{5} w_{t,n} O_{t-n}/\rho$ summarizes exposure to infectious mosquitoes at time $t$: it is an average of past incidence with weights defined by the serial interval distribution, $w_{t,n} = G(n + 0.5; T_t) - G(n - 0.5; T_t)$, as discussed in section 2 of the supplementary appendix. The number of susceptible individuals at the beginning of period $t$, $S_t = N - \sum_{u=1}^{t-1} I_u$, is computed as $N - \sum_{u=1}^{t-1} O_u/\rho$, noting that $O_u/\rho$ is a first-order approximation to $I_u$. To avoid data augmentation with the unobserved $I$ during estimation, we collapse the "observation" and "transmission" levels into a single binomial distribution:

$$O_t \mid O_t^- \sim \mathrm{Binomial}\left(S_t, \frac{R_0}{N} \sum_{n=1}^{5} w_{t,n} O_{t-n}\right) \tag{3}$$

In a final step, we account for the imprecise nature of the $O$ data, since observed cases have been extrapolated from limited information provided by a network of local health practitioners. We therefore allow for over-dispersion using a negative binomial distribution instead of the binomial, as:

$$O_t \mid O_t^- \sim \mathrm{NegBinomial}\left(\frac{R_0 S_t}{N} \sum_{n=1}^{5} w_{t,n} O_{t-n}, \phi\right) \tag{4}$$

where the first argument is the mean and the variance is computed as the mean divided by $\phi$.
The joint probability of data and parameters is finally

$$p(O, R_0, \rho, \phi) = \pi(R_0)\,\pi(\rho)\,\pi(\phi) \prod_{t} p(O_t \mid O_t^-, R_0, \rho, \phi) \tag{5}$$

where $\pi(R_0)$, $\pi(\rho)$ and $\pi(\phi)$ are prior distributions, which can be either non-informative or informative (Table 1); the implications of this choice are the main interest of the paper. Forecasts of future incidence rely on a stochastic model following equation (4) and are obtained using the full posterior distributions of $R_0$, $\rho$ and $\phi$.
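The over-dispersed observation model can be made concrete with a short sketch. The Python code below is an illustration only, not the authors' implementation: it assumes serial-interval weights that do not vary in time, a value of $\phi$ strictly between 0 and 1, and hypothetical function names. It computes the expected number of observed cases in a given week and the log-likelihood of a series under a negative binomial whose variance equals the mean divided by $\phi$.

```python
import math

def expected_cases(past_obs, weights, r0, rho, n_pop):
    """Mean of O_t given the past observed incidence O_1..O_{t-1}.

    weights[n-1] is the serial-interval weight for lag n
    (assumed constant in time in this sketch).
    """
    # Susceptibles at the start of week t: N - sum_u O_u / rho.
    susceptibles = max(n_pop - sum(past_obs) / rho, 0.0)
    # Exposure: sum_n w_n * O_{t-n}; lags before the first observation are skipped.
    exposure = sum(
        w * past_obs[-lag]
        for lag, w in enumerate(weights, start=1)
        if lag <= len(past_obs)
    )
    return susceptibles * r0 / n_pop * exposure

def nb_logpmf(k, mean, phi):
    """Log-probability of k cases under a negative binomial with variance = mean / phi."""
    size = mean * phi / (1.0 - phi)  # requires 0 < phi < 1 and mean > 0
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(phi) + k * math.log(1.0 - phi)
    )

def log_likelihood(obs, weights, r0, rho, phi, n_pop):
    """Sum of weekly log-probabilities, starting from the second week."""
    total = 0.0
    for t in range(1, len(obs)):
        mean = expected_cases(obs[:t], weights, r0, rho, n_pop)
        if mean <= 0.0:
            continue  # no past cases in the lag window: week skipped in this sketch
        total += nb_logpmf(obs[t], mean, phi)
    return total
```

In a full analysis this log-likelihood would be added to the log-priors on $R_0$, $\rho$ and $\phi$ and explored with an MCMC sampler; that step is outside the scope of this sketch.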

Several islands, one disease: dataset D2
Here, we aim to jointly model the three CHIKV epidemics in Martinique, Guadeloupe and Saint-Martin, with the objective of extracting information from these past epidemics and using it to improve forecasting. We introduce a hierarchical structure in the model for reporting rates and transmission, with island levels nested within a regional level. Reporting rates $\rho_i$ in island $i$ are modelled using a logistic-normal model, $\mathrm{logit}(\rho_i) = r_i$, where $r_i \sim N(\mu_\rho, \sigma_\rho^2)$ is a random island-specific effect. Reporting is thus controlled at the regional level by two hyperparameters, $\theta_\rho = \{\mu_\rho, \sigma_\rho\}$. Likewise, we allow for a random island-specific coefficient in the transmission term. Transmission is thus controlled at the regional level by two hyperparameters, $\theta_{R_0} = \{\mu_{R_0}, \sigma_{R_0}\}$. Writing $O_i$ for the observed incidence in the outbreak in island $i$, and $O$ for the whole dataset, the joint probability of data and parameters combines the island-level likelihoods with the regional distributions of the island effects and the priors on the hyperparameters. We chose weakly informative prior distributions for all parameters (Table A).
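The nesting of island-level reporting rates within a regional level can be illustrated with a minimal sketch. It assumes that each island effect $r_i$ is mapped to a reporting rate through the inverse-logit function, and it uses hypothetical hyperparameter values chosen only for illustration (they are not estimates from the paper).

```python
import math
import random

def inverse_logit(x):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def draw_island_reporting_rates(mu_rho, sigma_rho, n_islands, rng):
    """Draw r_i ~ N(mu_rho, sigma_rho^2) and return rho_i = inverse_logit(r_i)."""
    return [inverse_logit(rng.gauss(mu_rho, sigma_rho)) for _ in range(n_islands)]

rng = random.Random(1)
# Hypothetical regional hyperparameters (illustration only).
rates = draw_island_reporting_rates(mu_rho=-1.0, sigma_rho=0.5, n_islands=3, rng=rng)
```

A small $\sigma_\rho$ pulls the island rates towards a common regional value, while a large $\sigma_\rho$ lets each island differ freely, which is how the hierarchical structure shares information between islands.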

Several islands, two diseases: dataset D3
Last, we jointly model the successive epidemics of CHIKV and ZIKV in six islands or archipelagoes of French Polynesia. Building on the hierarchical model introduced in section 1.2, we add a fixed effect for the disease. Reporting rates $\rho_{ij}$ in island $i$ for disease $j$ are therefore modelled using a logistic-normal model where $r_i \sim N(\mu_\rho, \sigma_\rho^2)$ is an island-specific random parameter corresponding to the reporting rate of CHIKV in island $i$, $V_j$ is 1 for ZIKV and 0 for CHIKV, and $\beta_\rho$ is the ratio of reporting of ZIKV cases relative to CHIKV cases during an epidemic. Three parameters $\theta_\rho = \{\mu_\rho, \sigma_\rho, \beta_\rho\}$ thus control reporting. Likewise, transmission is modelled with an island-specific random parameter corresponding to the transmission of CHIKV in island $i$, the indicator $V_j$ (1 for ZIKV and 0 for CHIKV), and $\beta_{R_0}$, the relative transmission of ZIKV compared to CHIKV. Transmission thus depends on parameters $\theta_{R_0} = \{\mu_{R_0}, \sigma_{R_0}, \beta_{R_0}\}$. Writing $O_{ij}$ for the observed incidence in the outbreak in island $i$ and disease $j$, and $O$ for the whole dataset, the joint probability of data and parameters is built in the same way as in section 1.2. There again, we chose weakly informative prior distributions for all parameters (Table B).

Parameter    Prior distribution    Comments
$\mu_\rho$   $N(0, 1.5^2)$         Implies an approximately uniform distribution between 0 and 1 after inverse-logit transformation.

We thus obtained posterior estimates of the difference in reporting and in transmission between CHIKV and ZIKV epidemics occurring in the same locations, that is $\pi(\beta_\rho \mid D3)$ and $\pi(\beta_{R_0} \mid D3)$, respectively. These were combined with the information extracted from dataset D2 to build the local and regional informative priors.
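The comment attached to the prior on $\mu_\rho$ in the table above can be checked numerically. The sketch below (hypothetical helper names) computes the prior mass falling in each tenth of the interval (0, 1) when the logit of the reporting rate follows $N(0, 1.5^2)$. The resulting masses are roughly flat, with the two extreme deciles receiving somewhat less than the central ones, which is why the prior behaves as close to uniform on the probability scale.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logit(p):
    return math.log(p / (1.0 - p))

def prior_mass(lower, upper, sd=1.5):
    """P(lower < rho < upper) when logit(rho) ~ N(0, sd^2)."""
    lo = 0.0 if lower <= 0.0 else normal_cdf(logit(lower) / sd)
    hi = 1.0 if upper >= 1.0 else normal_cdf(logit(upper) / sd)
    return hi - lo

# Prior probability of each decile of the reporting rate.
decile_mass = [prior_mass(i / 10, (i + 1) / 10) for i in range(10)]
```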

Serial interval
Here we provide a detailed description of the reconstruction of the serial interval distribution. We used the framework that we previously developed [?, ?, ?]. It describes the different stages of disease progression in infected humans and vectors, from the development of symptoms in the primary case to the development of symptoms in the secondary case. Precisely, we split the distribution of the serial interval $T_{SI}$ into four components:
1. Time from infectiousness to symptoms in a human case, $T_V$, where $\tau_V$ is the maximum duration from infectiousness to symptoms.
2. Time from infectiousness to infectious mosquito bite, $T_B$, where $\tau_a$ and $\tau_b$ are the minimum and maximum durations of the infectious period, respectively.
3. A mosquito-related component, $T_M$, modelled as a mixture of components $T_{M,c}$ indexed by gonotrophic cycle $c$, where $\gamma$ is the average duration of a gonotrophic cycle and $\kappa$ is the duration of the extrinsic incubation period. The weight of each $T_{M,c}$ component is proportional to $e^{-\delta c \gamma}$, where $\delta$ is the mosquito mortality: this corresponds to the fraction of mosquitoes surviving long enough to bite during the $c$-th cycle. The number of components in the mixture changes with the duration of the gonotrophic cycle.
4. The incubation period (from infection to disease onset) in a secondary case, $T_I$, which follows a lognormal distribution of mean $\mu_I$ and standard deviation $\sigma_I$.
Using these components, the serial interval is:

$$T_{SI} = T_B - T_V + T_M + T_I$$

According to this formulation, the distribution of $T_{SI}$ depends on eight parameters, five of which relate only to the human host ($\mu_I$, $\sigma_I$, $\tau_V$, $\tau_a$ and $\tau_b$) and are constant over time for a given virus. This resulted in distributions for the serial interval best summarized by gamma distributions with mean 2.5 weeks (standard deviation 0.7) for ZIKV and mean 1.6 weeks (standard deviation 0.6) for CHIKV. These distributions were then discretized to be used as the weights $w_t$ in the statistical models.
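The discretization of the gamma-distributed serial interval into weekly weights can be sketched as follows. The code is an illustration under stated assumptions: it uses a single, time-invariant gamma distribution per virus, derives shape and rate from the reported mean and standard deviation, and evaluates the gamma cumulative distribution function with a series expansion of the lower incomplete gamma function so that only the standard library is needed. Function names are hypothetical.

```python
import math

def gamma_cdf(t, shape, rate):
    """Gamma CDF via the series expansion of the lower incomplete gamma function."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    term = 1.0 / shape
    total = term
    k = 1
    # Sum x^k / (shape * (shape + 1) * ... * (shape + k)) until terms are negligible.
    while term > 1e-16 * total and k < 10000:
        term *= x / (shape + k)
        total += term
        k += 1
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))

def serial_interval_weights(mean, sd, max_lag=5):
    """Weights w_n = G(n + 0.5) - G(n - 0.5) for lags n = 1..max_lag (in weeks)."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return [
        gamma_cdf(n + 0.5, shape, rate) - gamma_cdf(n - 0.5, shape, rate)
        for n in range(1, max_lag + 1)
    ]

zikv_weights = serial_interval_weights(2.5, 0.7)   # ZIKV: mean 2.5 weeks, sd 0.7
chikv_weights = serial_interval_weights(1.6, 0.6)  # CHIKV: mean 1.6 weeks, sd 0.6
```

With five lags the weights do not sum exactly to one, because a small part of the gamma mass lies below 0.5 weeks or above 5.5 weeks; whether to renormalize them is a modelling choice that this sketch leaves open.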

Information extracted from historical data
The analysis of past CHIKV outbreaks in the French West Indies (dataset D2) using the model described in section 1.2, and of past ZIKV and CHIKV outbreaks in French Polynesia (dataset D3) using the model described in section 1.3, led to the posterior distributions for $R_0$ and $\rho$ summarised in Table D. In addition, dataset D3 was used to estimate the relative transmissibility (ratio $\beta_{R_0}$ estimated at 1.03, 95% credible interval 0.90-1.18) and the relative reporting (ratio $\beta_\rho$ estimated at 0.46, 95% credible interval 0.42-0.50) of ZIKV with respect to CHIKV.
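One simple way to picture how these two sources of information can be combined into a ZIKV prior is to scale posterior draws for CHIKV by draws of the ZIKV-to-CHIKV ratio. The sketch below assumes a multiplicative combination on the natural scale and independent draws, which is an interpretation of "ratio" rather than a statement of the exact procedure used in the paper. The CHIKV draws are placeholders, not values from Table D, and the ratio draws use a normal approximation (standard deviation 0.02, an assumption) to the reported estimate of 0.46 (0.42-0.50).

```python
import random
import statistics

def zikv_prior_draws(chikv_draws, ratio_draws):
    """Scale CHIKV posterior draws by draws of the ZIKV/CHIKV ratio, pairwise."""
    return [c * b for c, b in zip(chikv_draws, ratio_draws)]

rng = random.Random(0)
# Placeholder CHIKV reporting-rate draws (hypothetical, for illustration only).
rho_chikv = [rng.uniform(0.25, 0.35) for _ in range(5000)]
# Normal approximation to the reported ratio 0.46 (95% CrI 0.42-0.50); sd is assumed.
beta_rho = [rng.gauss(0.46, 0.02) for _ in range(5000)]

rho_zikv = zikv_prior_draws(rho_chikv, beta_rho)
prior_mean = statistics.mean(rho_zikv)
```

The resulting draws could then be summarised by a parametric distribution to serve as an informative prior; the choice of that distribution is not specified here.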

Parameter estimates for the ZIKV epidemics in the French West Indies
The evolution of the posterior distributions (mean and 95% credible intervals) of the basic reproduction number $R_{0,Z}$ (panel A) and the reporting rate $\rho_Z$ (panel B) throughout the ZIKV epidemics of the French West Indies is shown in Fig. 5 of the main paper. In addition, we present summaries of the posteriors at dates "P" (peak) and "E" (end of the period of epidemic activity) in Table E.

Table E: Posterior distributions of $R_{0,Z}$ and $\rho_Z$ at dates "P" (peak) and "E" (end of the period of epidemic activity) during the ZIKV epidemics in the French West Indies using different a priori distributions on the parameters: non-informative priors (NI) or informative priors based on historical data considered either at the regional (R) or the local (L) level. Distributions are summarized by their mean and 95% credible interval.

Impact of the serial interval
Our estimate of the distribution of the serial interval relied on a mechanistic reconstruction involving several assumptions (section 2). In this sensitivity analysis, we verify whether the main results hold when the mean of the serial interval distribution is shifted by +1 week or -1 week (Fig. B). The advantages of using informative priors, in particular local ones, over non-informative priors were still noticeable with these alternative parameterizations (Fig. C).

Using information from CHIKV in the same island without adjusting for the differences between CHIKV and ZIKV
In this sensitivity analysis, we replace the informative prior distributions on $R_0$ and $\rho$ by the posterior estimates of these parameters obtained using data on CHIKV in the same islands, without adjusting for the ratios between CHIKV and ZIKV as advocated in our approach. Compared to the local priors used in the main analysis, this change mainly affects the prior on $\rho$, which was estimated in French Polynesia to be lower for ZIKV than for CHIKV ($\beta_\rho$ estimated at 0.46, 95% credible interval 0.42-0.50), while the estimates of $R_0$ were similar for both diseases ($\beta_{R_0}$ estimated at 1.03, 95% credible interval 0.90-1.18). The results show that omitting the ratios between ZIKV and CHIKV consistently leads to significantly lower forecasting accuracy (Fig. D). This highlights the importance of the comparative analysis between the different diseases of interest to inform the use of historical data from one disease in the epidemic forecast of the other.

Using information from the ZIKV epidemics in other islands
Here, we use the posterior estimates of $R_0$ and $\rho$ obtained from the ZIKV epidemics in each of six islands of French Polynesia as priors for the ZIKV epidemics in the French West Indies. Compared to the local priors built for Guadeloupe, Martinique and Saint-Martin using data from these islands, this change leads to consistently higher priors on $R_0$, centered around 2 (Fig. ??). The modifications induced on the priors on $\rho$ differ for each island. For instance, the estimates of $\rho$ in Mo'orea, the Sous-le-vent islands and Tahiti, concentrated around 0.15, are very close to the local prior on $\rho$ applied to Guadeloupe, while they are higher in the Australes, Marquises and Tuamotu islands. These differences in prior specification translate into differences in forecasting accuracy (Fig. E), with a large drop in accuracy when using estimates from the Australes, Marquises or Tuamotu islands as priors. In Martinique and Saint-Martin, local priors computed according to our approach led to better results at the early stages than priors directly taken from any area of French Polynesia. In Guadeloupe, however, the accuracy obtained with priors from Sous-le-vent, Tahiti or Mo'orea was similar to that obtained with local priors computed according to our approach. This sensitivity analysis emphasizes that directly using the estimates obtained from an epidemic of the same pathogen in a different location might lead to prior misspecifications causing large inaccuracies in some cases.

The respective impact of the informative priors on ρ and R 0
One of the conclusions of this work is that improvements in forecasting quality come together with improvements in the estimation of the reporting rate $\rho$ rather than of the transmission rate $R_0$. This suggests that prior information is essentially required for the reporting rate, a difficult-to-estimate quantity as already noted in [?]. In a sensitivity analysis, we test this hypothesis by evaluating separately the informative priors on $R_0$ and on $\rho$. The results confirm the greater importance of using informative priors on $\rho$ for forecasting accuracy (Fig. F). Indeed, after the first 2-4 weeks of circulation, providing a prior for $\rho$ only yields the same accuracy as providing priors for both $\rho$ and $R_0$. This small delay is consistent with Fig. 5 of the main paper, which shows that $R_0$ estimates are close to the final ones from weeks 2-4.

Informative priors on the overdispersion parameter
The parameter $\phi$ is related to the imprecise nature of observed incidence data, which was extrapolated from the limited information provided by a network of local health practitioners (see section 1.1). In this sensitivity analysis, we measure the impact of introducing an informative prior on $\phi$, obtained directly from the analysis of past CHIKV epidemics in the French West Indies. No hierarchical structure was introduced for this parameter, which was thus considered the same in every island. Moreover, we did not consider it meaningful to adjust for ZIKV the estimate of $\phi$ obtained during the CHIKV epidemics in the French West Indies, and thus directly used this estimate as the prior for the ZIKV epidemics in the region. Adding an informative prior on this parameter has a very limited effect on forecasting accuracy (Fig. G).