An ecological and digital epidemiology analysis on the role of human behavior on the 2014 Chikungunya outbreak in Martinique

Understanding the spatio-temporal dynamics of endemic infections is of critical importance for a deeper understanding of pathogen transmission, and for the design of more efficient public health strategies. However, very few studies in this domain have focused on emerging infections, generating a gap of knowledge that hampers epidemiological response planning. Here, we analyze the case of a Chikungunya outbreak that occurred in Martinique in 2014. Using time series estimates from a network of sentinel practitioners covering the entire island, we first analyze the spatio-temporal dynamics and show that the largest city has served as the epicenter of this epidemic. We further show that the epidemic spread from there through two different propagation waves moving northwards and southwards, probably by individuals moving along the road network. We then develop a mathematical model to explore the drivers of the temporal dynamics of this mosquito-borne virus. Finally, we show that human behavior, inferred by a textual analysis of messages published on the social network Twitter, is required to explain the epidemiological dynamics over time. Overall, our results suggest that human behavior has been a key component of the outbreak propagation, and we argue that such results can lead to more efficient public health strategies specifically targeting the propagation process.


S1/ Estimation of Aedes aegypti population dynamics
To estimate mosquito population dynamics, we fitted a logistic Generalized Linear Model (GLM) to presence/absence data for Aedes aegypti collected across Martinique through extensive routine surveillance over the last twenty years (described elsewhere, see (1)). Table S1 shows the results of this GLM.

Table S1. Results of the logistic GLM.

Variable                                                      Estimate    p-value
sqrt(Proportion of breeding sites within large containers)    5.419e-2    <2e-16***
sqrt(Proportion of breeding sites within tires)               1.061e-1    <2e-16***
sqrt(Proportion of breeding sites within trash)

Using this GLM, we can extrapolate the population dynamics across the island (Figure S1), which roughly follow a sinusoidal function. In addition to the factors retained here, our dataset also included the same set of environmental variables as the one fully described in (2). The variables shown are those kept after forward model selection.

The relation between the presence of larvae in breeding sites and the abundance of adult mosquitoes is not simple, and probably not linear as we assume. First, there is a delay between the larval and adult stages. However, this delay is about a week for Aedes aegypti, whereas our epidemiological model uses a monthly time scale, so it is unlikely to affect our results. Second, the probability of presence may not be directly proportional to abundance. However, we use this dynamic only to identify trends in the mosquito population, i.e., the periods of high and low abundance, while its quantitative impact on the pathogen transmission rate relies on epidemiological parameters that are estimated from the epidemiological dynamics (see Section S3 in Supplementary materials for a full description). We are therefore confident that our estimate of the mosquito population is relevant for the purpose of our study.
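As an illustration of the kind of model described in this section, the following minimal sketch fits a logistic regression of presence/absence on a single square-root-transformed proportion. It is not the authors' analysis: the data are synthetic, the function names (`simulate_sites`, `fit_logistic`, `predict`) and the coefficients used to generate the data are hypothetical, and plain gradient ascent on the log-likelihood stands in for the fitting routine of a statistics package.

```python
import math
import random


def sigmoid(z):
    """Logistic link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_sites(n, b0, b1, seed=0):
    """Synthetic (proportion, presence) pairs; NOT the surveillance data."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(b0 + b1 * math.sqrt(x)) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(presence) = sigmoid(b0 + b1 * sqrt(x)) by gradient ascent
    on the mean Bernoulli log-likelihood."""
    roots = [math.sqrt(x) for x in xs]  # square-root transform, as in Table S1
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for s, y in zip(roots, ys):
            err = y - sigmoid(b0 + b1 * s)
            g0 += err
            g1 += err * s
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict(b0, b1, x):
    """Predicted probability of presence for a proportion x in [0, 1]."""
    return sigmoid(b0 + b1 * math.sqrt(x))


# Hypothetical generating coefficients, chosen only for the demonstration.
xs, ys = simulate_sites(500, b0=-1.0, b1=3.0)
b0_hat, b1_hat = fit_logistic(xs, ys)
print(f"intercept={b0_hat:.2f}, slope on sqrt(proportion)={b1_hat:.2f}")
```

A positive fitted slope means that the predicted probability of presence increases with the proportion of breeding sites of that type, which is how the positive estimates in Table S1 would be read.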

S2/ Classification of tweets and dynamics through time
We first identified all the Twitter accounts that mentioned the word "Chikungunya" during the outbreak in Martinique (from December 2013 to June 2014). Among these accounts, only those declared in their user profile to be located in Martinique were considered. The number of such tweets was interpreted as a measure of awareness of the outbreak.

To measure protection need, two people read the content of each tweet three times in order to correctly identify the tweets expressing a need for protection. During the first reading, we identified all the keywords associated with protection (French keyword, with the English translation in parentheses):

Répulsif (repellent), protéger (to protect), anti-moustique (anti-mosquito), traitement (treatment), raquette (electric racket used to kill mosquitoes), huiles essentielles (essential oils), prier (pray), homéopathie (homeopathy), vêtement long (long clothes), démoustication (…)

Following this, the two readers independently carried out a second reading of the tweets containing any of the above keywords to verify their classification as expressing protection need. Finally, the two readers carried out a third reading of the tweets that did not contain any of the identified keywords, in order to confirm their classification as not expressing protection need.

We note that protection need tweets are a subset of the awareness tweets, as described above. This also follows from the observation that protection need requires awareness of the disease in the first place. Figure S2 shows the dynamics of these two quantities through time.
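The keyword step of this procedure can be sketched as a simple pre-screen. This is only an illustration: in the study, the final classification was made by two human readers, the keyword list below is limited to the terms recovered in this text, and the function names and example tweets are hypothetical. Substring matching is deliberately crude and would over-match in practice, which is one reason manual verification is needed.

```python
import unicodedata

# Keywords listed in this section (French); the list is limited to those shown.
PROTECTION_KEYWORDS = [
    "répulsif", "protéger", "anti-moustique", "traitement", "raquette",
    "huiles essentielles", "prier", "homéopathie", "vêtement long",
    "démoustication",
]


def normalize(text):
    """Lowercase and strip accents so 'Répulsif' and 'repulsif' both match."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def is_awareness(tweet):
    """Awareness: the tweet mentions the word 'chikungunya'."""
    return "chikungunya" in normalize(tweet)


def flags_protection(tweet):
    """Candidate protection-need tweet: an awareness tweet containing
    at least one protection keyword (a subset of awareness tweets)."""
    text = normalize(tweet)
    return is_awareness(tweet) and any(
        normalize(keyword) in text for keyword in PROTECTION_KEYWORDS
    )


def monthly_counts(tweets):
    """tweets: iterable of (month, text) pairs.
    Returns {month: (awareness_count, protection_count)}."""
    counts = {}
    for month, text in tweets:
        if not is_awareness(text):
            continue
        aware, protect = counts.get(month, (0, 0))
        counts[month] = (aware + 1, protect + (1 if flags_protection(text) else 0))
    return counts


# Hypothetical example tweets, written for this sketch only.
example = [
    ("2014-01", "Le chikungunya arrive en Martinique"),
    ("2014-01", "Chikungunya : pensez au répulsif !"),
    ("2014-02", "Démoustication prévue contre le chikungunya"),
    ("2014-02", "Beau temps aujourd'hui"),
]
print(monthly_counts(example))
```

Counting by month gives two time series of the kind plotted in Figure S2, with the protection-need count never exceeding the awareness count.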

S3/ Estimation of best parameters
We estimate the likelihood of our model with the function given in (3), in which d represents the data, m the model realizations and σ² the variance of the errors. We therefore assume that the errors are normally distributed, as has been done in similar studies (3, 4).

In the main text, we show that propagation waves differ between the north and the south of the island (i.e., correlations are significant with the epidemic's peak in the north and with the whole time series in the south). As mentioned in the discussion, one main difference between the north and the south of the island is the road network topology, which is less dense in the north because of the presence of the volcano (Figure S6).

In the multi-population model, index i indicates the population considered, and n is the number of populations involved. β_ij represents the transmission rate from population i to population j. All other parameters are the same as in the main text.
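The assumption of normally distributed errors used for parameter estimation can be made concrete with a standard Gaussian log-likelihood. The sketch below is a generic textbook form under the assumptions of independent errors and a single shared variance. It is not necessarily the exact expression used in (3), and the function name and numerical values are hypothetical.

```python
import math


def gaussian_log_likelihood(data, model, sigma2):
    """Log-likelihood of observations `data` given model output `model`,
    assuming independent normal errors with common variance `sigma2`."""
    if len(data) != len(model):
        raise ValueError("data and model must have the same length")
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    n = len(data)
    # Sum of squared differences between data and model realizations.
    sse = sum((d - m) ** 2 for d, m in zip(data, model))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sse / (2.0 * sigma2)


# Hypothetical monthly case counts and two candidate model outputs.
observed = [10.0, 40.0, 90.0, 60.0, 20.0]
close_fit = [12.0, 38.0, 85.0, 63.0, 22.0]
poor_fit = [50.0, 50.0, 50.0, 50.0, 50.0]
print(gaussian_log_likelihood(observed, close_fit, sigma2=25.0))
print(gaussian_log_likelihood(observed, poor_fit, sigma2=25.0))
```

Under this form, maximizing the likelihood for a fixed variance is equivalent to minimizing the sum of squared differences between data and model realizations, so the parameter set whose output lies closest to the data receives the highest value.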

To represent the road network in the north of the island, which is not highly connected, we assumed a one-step matrix. For the south of the island, we assumed a transmission rate that decreases with the reciprocal of the distance. Here β_ij represents the inter-population transmission rate and β_ii the intra-population transmission rate. Initially, we assume that all β_ij are identical, as are all β_ii. We then also include a random term (following a uniform distribution between 0 and 1) in the transmission patterns between localities in order to introduce stochastic noise. We arbitrarily considered seven localities, but the results shown below remain the same as long as the same assumptions are made regarding the matrix values. Finally, we assume that the population sizes of the different localities decrease linearly with geographic distance from the largest city.
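The two network assumptions can be sketched as transmission matrices. This is an illustrative reading of the description rather than the matrices used in the study. It assumes that localities are evenly spaced along a line, so that the index difference |i - j| stands in for geographic distance, and that the random term multiplies the off-diagonal entries. The function names, rates and population sizes are hypothetical.

```python
import random


def one_step_matrix(n, beta_intra, beta_inter):
    """North: a chain in which each locality is coupled only to its
    immediate neighbours (one step along the road)."""
    return [
        [beta_intra if i == j else (beta_inter if abs(i - j) == 1 else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def reciprocal_distance_matrix(n, beta_intra, beta_inter):
    """South: every pair of localities is coupled, with a rate that
    decreases as the reciprocal of the distance |i - j|."""
    return [
        [beta_intra if i == j else beta_inter / abs(i - j) for j in range(n)]
        for i in range(n)
    ]


def add_noise(matrix, rng):
    """Scale each off-diagonal entry by a U(0, 1) draw (assumed form of the
    random term); diagonal entries are left unchanged."""
    return [
        [value if i == j else value * rng.random() for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def population_sizes(n, largest, smallest):
    """Population sizes decreasing linearly with distance from the largest city."""
    return [largest - (largest - smallest) * i / (n - 1) for i in range(n)]


# Seven localities, as in the text; all numerical values are hypothetical.
north = one_step_matrix(7, beta_intra=1.0, beta_inter=0.1)
south = reciprocal_distance_matrix(7, beta_intra=1.0, beta_inter=0.1)
noisy_south = add_noise(south, random.Random(1))
sizes = population_sizes(7, largest=70000, smallest=10000)
```

In the chain, an infection starting in the first locality must pass through each intermediate locality in turn, whereas in the reciprocal-distance network every locality is seeded directly from the start, only more weakly with distance.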

We then tested whether the first situation (representing the road network in the north of the island) and the second one (representing the road network in the south of the island) can produce the pattern we observe in the data. In the first situation (unidimensional road network, corresponding to the north of the island), the timing of the epidemic peak has a much stronger correlation with geographic distance than the distance between time series does (Figure S7, A, C, E). However, the Euclidean distance between time series has a much stronger correlation with geographic distance than the timing of the epidemic peak does for the second situation (representing the south of the island, figure