Opportunities for improved surveillance and control of infectious diseases from age-specific case data

One of the challenges faced by global disease surveillance efforts is the lack of comparability across systems. Reporting commonly focuses on overall incidence, despite differences in surveillance quality between and within countries. For most immunizing infections, the age-distribution of incident cases provides a more robust picture of trends in transmission. We present a framework to estimate transmission intensity for dengue virus from age-specific incidence data, and apply it to 363 administrative units in Thailand, Colombia, Brazil and Mexico. Our estimates correlate well with those derived from seroprevalence data (the gold-standard), capture the expected spatial heterogeneity in risk, and correlate with known environmental drivers of transmission. We show how this approach could be used to guide the implementation of control strategies such as vaccination. Since age-specific counts are routinely collected by many surveillance systems, they represent a unique opportunity to further our understanding of disease burden and risk for many diseases.


Introduction
A fundamental challenge of disease surveillance systems is how to transform data that is routinely collected into useful, actionable evidence that can inform control interventions. Disease surveillance systems typically focus on analyzing aggregate counts of cases over defined time periods to stratify the risk of populations (1). However, the use of raw case counts, or incidences, as a measure of disease risk can frequently be misleading because the quality of surveillance often differs significantly both across countries and within regions of a country, making comparisons inappropriate. Areas with more complete reporting of disease may inaccurately appear to have more disease simply because reported cases scale linearly with completeness of reporting. Surveillance systems may also change over time (e.g. improving in completeness), making comparisons difficult or even impossible. These problems are particularly troubling when examining disease trends or ranking regions according to their disease risk. Recent efforts have tried to improve quantification of disease burden by pooling numerous sources of data. For example, disease mapping methods that combine disease presence/absence data, environmental covariates and available incidence data (from cohort or cross-sectional studies) have been used to predict spatial limits and global case counts for diseases including malaria, dengue and several neglected tropical diseases (2-5). However, while these methods are successful in identifying boundaries of endemic areas, the robustness of these approaches to quantify transmission within endemic areas has not been validated.
For most immunizing diseases, serological surveys of immunity are regarded as the gold standard to measure the susceptible fraction and infer the extent of transmission, as they provide a direct measure of the proportion of the population that has been infected at a single point in time or between two or more time points. This is particularly useful for diseases such as dengue, influenza or Zika, where the asymptomatic to symptomatic ratios are large or unknown. Methods to estimate transmission parameters from age-stratified serological data have been available for many years (6) and have been used to analyze trends in transmission for multiple diseases including measles (7), hepatitis A (8), dengue (9-12), pertussis (13), influenza (14), malaria (15)
While aggregated case-counts can be misleading when quantifying disease risk, the age-distribution of incident cases contains a lot of information on the age-specific susceptibility of the population. Importantly, the age-distribution of cases is also largely robust to under-reporting, facilitating the comparison between locations or over time. By combining age-specific incidence data and mechanistic models of how population immunity is acquired over time, it is possible to estimate key transmission parameters and obtain a much more accurate picture of the local and global burdens of disease. Since age-specific counts are routinely collected by surveillance systems as part of standard practice, they represent a missed opportunity to further our understanding of epidemic patterns for many diseases.
Here, we use dengue virus as an example to illustrate how age-specific incidence data can be used to quantify disease transmission and inform control interventions. Dengue is a relevant example because, despite being the most widely spread mosquito-transmitted virus, large gaps remain in our understanding of its global and local epidemiology (17).
We present a model to estimate the transmission intensity of dengue from age-specific incidence data, and apply it to surveillance data from administrative units in four countries that suffer from endemic dengue transmission (Thailand, Colombia, Brazil and Mexico). We validate our estimates using serological data and show that they correlate well with known environmental drivers of dengue transmission at subnational level. Finally, we show how this approach could be used to guide the implementation of dengue control strategies such as vaccination.

We estimated the average forces of infection (FOI) over the last 20 years for 152 administrative level one units where age-specific case data was available (Figure 1B)
For the 17 locations where we had access to both age-stratified serological data (the gold standard) and case data, we found good correlation between the estimates of the force of infection derived from both data sources (R²=0.73, 95%CI 0.51-0.87, Figure 1C). In contrast, we found no correlation between recent incidence of dengue in these locations (the average yearly incidence over the last 5 years for which we had data) and the estimates of force of infection derived from serological data (R²=0.002, Figure S1).
Since estimates of transmission intensity derived from seroprevalence data are only available for a small number of locations, to further validate our method we also explored the association between our estimates of the force of infection for 211 Colombian municipalities (administrative level 2) and known environmental drivers of dengue transmission including temperature, elevation, a published metric of Aedes aegypti abundance (Figure 2) and population density (
While we found strong associations between environmental variables and FOI, the recent incidence of dengue in these municipalities was not correlated with temperature, elevation or Aedes aegypti abundance (R² of 0.01, 0.01 and 0.00, respectively, Table S1). We did find a negative association between population density and incidence, indicating a 6% (95%CI 2%-10%) decrease in log incidence for each 2-fold increase in population density.
In locations where seroprevalence among 9 year olds was estimated to be less than 80%, we calculated the age-group that could be targeted to ensure a seroprevalence >80%.
Our results suggest that, to comply with the WHO-SAGE recommendations, it would be necessary to target children 14 years of age or older in over 70% (108/152) of locations. Furthermore, in approximately 50% of the locations evaluated, the target vaccination age would need to be 18 years or older, precluding school-based vaccination strategies.

Discussion
For most immunizing infections, age-stratified serological surveys are considered the gold-standard to measure the susceptibility of the population and quantify transmission. However, population-based serosurveys are expensive and labor intensive and therefore rarely available at spatial or temporal resolutions useful to assess global and local epidemiologic patterns or to target control interventions. We developed a model to estimate the force of infection of dengue from age-specific incidence data, routinely collected by most surveillance systems, and generated country-wide subnational estimates for Thailand, Colombia, Brazil and Mexico.
Most surveillance systems use case counts or incidences to describe temporal and spatial trends of communicable diseases. Our results underscore the extent to which, for immunizing diseases in endemic circulation, recent incidence may be a poor metric of transmission and may be misleading when ranking spatial units (Figure S3). Immunity of the population in high transmission settings reduces the number of individuals that are susceptible to infection. As a result, incidence in places where transmission intensity is lower, but people remain susceptible for a longer period of time, may be roughly equivalent to that in higher transmission intensity areas. Metrics such as the force of infection, which quantify the risk among the susceptible population, better reflect the underlying transmission potential. Since FOI estimates are derived from the age distribution of incidence, and not from the aggregate counts, they are more robust to differences in surveillance efficiency and can be obtained with relatively small numbers of yearly cases. These findings raise concerns about the vaccination policies that were decided in some settings based on recent incidence, in the absence of seroprevalence data.
In particular, the decision to deploy the vaccine in the Brazilian state of Paraná is highly questionable, as estimates suggest that seroprevalence is likely to be well below the recommended threshold for vaccination, even among adults (20).
There are several limitations of using age-specific surveillance data to estimate transmission parameters of dengue. Of concern are age-specific differences in the probability of clinical disease upon infection or in health care utilization that could bias estimates derived from case data either up or down depending on the specific bias. Differences in reporting practices between and within countries could also limit our capacity to reconstruct transmission history from case-data. Our model also makes several assumptions that may be questionable. In particular, it assumes that the age distribution of cases represents the age-distribution of secondary infections, thus ignoring the potential contribution of primary, tertiary and quaternary infections. It also assumes that risk is not age dependent, even though there is some evidence that suggests that certain age-groups may be at increased risk (21-23). Finally, it assumes equal circulation of all serotypes, despite the known dominance of specific serotypes for extended periods of time in the Americas (21,24). Despite these simplifications, validation of our estimates using age-stratified serological data from 17 locations is very encouraging, as is the good correlation with known drivers of dengue transmission. While further validation is desirable, it is important to note that some discrepancy between our estimates and those derived from seroprevalence data is expected as the two sources of data do not represent exactly the same period of time and location.
For example, most of the serosurveys available were conducted in specific urban centers, while case-data represents the full administrative unit.
Targeting control interventions against dengue and other communicable diseases requires good understanding of when and where transmission is occurring. Careful analyses of age-specific incidence data, collected by surveillance systems at high temporal and spatial resolution, can provide very useful information to characterize transmission and target control interventions at spatial scales at which serological data is rarely available. While here we present average forces of infection, these methods can also be used to reconstruct changes in transmission over time (25,26). Open-access to age-specific incidence data would greatly enrich and enhance existing efforts to quantify trends in the global burden of disease.

Data used
We used data on the yearly age-specific number of dengue and dengue hemorrhagic fever cases for administrative level 1 units of Thailand, Colombia, Brazil and Mexico (27-30) as well as administrative level 2 units from Colombia. We also used population data from each administrative unit analyzed, available from the national statistical office of each country. Information on the type, source and years of data used is provided in Table 1.

Statistical analyses
We estimated the average force of infection of dengue, over the last 20 years, for each administrative unit for which we had available data. The force of infection (λ) is a metric used to characterize the transmission intensity in a specific setting and estimates the per capita rate at which susceptible individuals are infected. Methods to estimate transmission intensity from age-specific incidence data have been previously used to reconstruct the transmission history of measles and dengue (7,26). Briefly, these methods rely on the fact that, for immunizing infections, accumulation of immunity shapes the age distribution of future cases. In settings with high endemic transmission, incident cases are expected to be concentrated in younger age groups, as adults are likely to be already immune (Figure 1A). In contrast, in places where there is less population immunity, the age distribution of cases is more likely to resemble the age distribution of the population itself, with cases in both children and adult populations.
Methods to estimate dengue forces of infection from case data have been applied to settings where dengue is thought to be close to endemic circulation (26,31). These methods generally rely on the cumulative incidence proportion, and therefore assume that all individuals are infected by dengue at some point in their lifetime. They also often assume that the distribution of cases (of dengue hemorrhagic fever cases in particular) is representative of the distribution of secondary cases. Here, we extend these methods to accommodate settings where transmission hazards are lower or where dengue may have been more recently introduced. We do this by modeling directly the age-specific incidence of cases, rather than the cumulative incidence proportion.
Details of our model are provided in the supplementary material and code to implement the model is available at https://github.com/isabelrodbar/dengue_foi.
We fit all models in a Bayesian Markov chain Monte Carlo (MCMC) framework using the RStan package in R, using wide priors (Normal distribution with mean 0 and standard deviation of 1000). We simulated four independent chains, each of 30000 iterations, and discarded the first 10000 iterations as warm-up. We assessed convergence visually and using the Gelman-Rubin R statistic. We obtained 95% credible intervals from the 2.5% and 97.5% percentiles of the posterior distributions.

Validation and sensitivity analyses
We validated our estimates of the force of infection by comparing them to those obtained from age-stratified serological data (the gold-standard) for 17 locations where we had both serologic and age-specific case data.
Since dengue transmission is known to be highly spatially heterogeneous, we also correlated our administrative level 2 estimates for Colombia with known environmental drivers of dengue transmission: temperature, elevation, population density and a published composite metric of Aedes aegypti abundance (32).
As stated above, a key assumption of this model is that the age distribution of cases represents the age distribution of secondary infections. Since data from Thailand has consistently suggested that the majority of dengue hemorrhagic fever (DHF) cases arise from secondary infections (33), we limited our analysis to reports of DHF where possible (Thailand, Brazil and Mexico). However, for Colombia we used combined DF/Severe dengue data because the severe dengue data alone was too sparse. To assess the impact of the data type, we performed sensitivity analyses including all dengue cases, as it is known that many surveillance systems do not differentiate between types of dengue disease.

Application: Guiding dengue vaccination policy
The first dengue vaccine has been licensed for use in children over 9 years of age in 20 countries. Due to uncertainty regarding the vaccine's benefits and risks in individuals who have not been previously infected by dengue, the WHO's Strategic Advisory Group of Experts (SAGE) on Immunization recommended in April 2016 that this vaccine only be used in settings with known high endemicity, defined as places where seroprevalence is greater than 70% in the target vaccination age-group (34), and that it should not be used in places where seroprevalence is under 50%.
This recommendation was later revised, and the WHO now recommends that individuals should be tested for dengue antibodies prior to vaccination, and that the vaccine should only be given to individuals who have been infected by dengue in the past (18). In the absence of appropriate serological assays that would allow for pre-vaccination screening, an alternative that has been discussed is deploying the vaccine in settings where seroprevalence is 80% or greater. These recommendations pose challenges to countries wanting to implement the vaccine, as they require detailed knowledge of the epidemiology of dengue. Specifically, they require knowledge of the population seroprevalence against dengue at subnational levels, even though such data is not available.
In order to provide information useful to countries considering deploying the vaccine according to the WHO recommendations, we used our estimates of the force of infection to calculate the proportion of the population expected to be seropositive at age 9 years for each of the subnational

Model
i. Estimating the force of infection
The force of infection (λ) is a measure used to characterize infection hazard in a given setting and estimates the per capita rate of acquisition of infection by susceptible individuals. Methods to estimate the force of infection from age-stratified serosurveys have been described extensively elsewhere (1). When serosurvey data is not available, age-stratified incidence data can also be used to estimate forces of infection, as age patterns of disease depend on the age distribution of susceptible individuals in the population (2).
Methods have been described to estimate dengue forces of infection using case data in settings where dengue has been in endemic circulation (3). These methods rely on the cumulative incidence proportion, and therefore assume that all individuals are infected by dengue at some point in their lifetimes. They also assume that the age distribution of incident cases is representative of the age distribution of secondary infections.
Here, we adapt these methods to accommodate settings where transmission hazards are lower or where dengue may have been more recently introduced. We do this by modeling directly the age-specific incidence of cases, rather than the cumulative incidence proportion.
The fraction of the population susceptible to all dengue serotypes at age a and time t, x(a,t), is given by
$$x(a,t) = \exp\left(-4\int_{t-a}^{t}\lambda(u)\,du\right), \qquad (1)$$
where λ(t) is the average force of infection per serotype at time t and 4λ(t) is the total force of infection assuming four circulating serotypes.
The proportion of individuals of age a who have been infected with only one serotype at time t, but are still susceptible to all other serotypes, is denoted z1(a,t) and is given by:
$$z_1(a,t) = 4\,\exp\left(-3\int_{t-a}^{t}\lambda(u)\,du\right)\left(1-\exp\left(-\int_{t-a}^{t}\lambda(u)\,du\right)\right). \qquad (2)$$
Assuming that the age-specific incidence of cases is representative of the distribution of secondary infections, the expected incidence rate among individuals of age a at time t is given by
$$I_2(a,t) = 3\,\lambda(t)\,z_1(a,t)$$
and the expected reported number of cases is
$$E[C(a,t)] = \Phi(t)\,I_2(a,t)\,P(a,t),$$
where P(a,t) is the size of the population aged a at time t, and Φ(t) represents a time-specific reporting rate.
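The quantities defined in this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which is written for RStan): the function names, the yearly discretization of the cumulative hazard, and all numeric inputs are hypothetical choices made for the example.

```python
import math

N_SEROTYPES = 4  # the model assumes four serotypes circulating equally


def cumulative_hazard(lam_by_year, year, age):
    """Per-serotype hazard accumulated over the `age` years before `year`.

    Discrete stand-in for the integral of lambda(u) from t - a to t,
    assuming lambda is constant within each calendar year.
    """
    return sum(lam_by_year[y] for y in range(year - age, year))


def susceptible_fraction(cum_hazard):
    """x(a,t): fraction that has escaped infection by all four serotypes."""
    return math.exp(-N_SEROTYPES * cum_hazard)


def monotypic_fraction(cum_hazard):
    """z1(a,t): infected by exactly one serotype, susceptible to the other three."""
    return (N_SEROTYPES * (1.0 - math.exp(-cum_hazard))
            * math.exp(-(N_SEROTYPES - 1) * cum_hazard))


def expected_reported_cases(lam_by_year, year, age, population, reporting_rate):
    """Expected reported cases, treating every case as a secondary infection."""
    big_lambda = cumulative_hazard(lam_by_year, year, age)
    # Three heterologous serotypes remain after a first infection.
    secondary_incidence = ((N_SEROTYPES - 1) * lam_by_year[year]
                           * monotypic_fraction(big_lambda))
    return reporting_rate * secondary_incidence * population


# Hypothetical inputs: constant per-serotype hazard of 0.05 per year.
lam_by_year = {y: 0.05 for y in range(1990, 2016)}
for age in (5, 10, 20):
    cases = expected_reported_cases(lam_by_year, 2015, age, 100_000, 0.1)
    print(age, round(cases, 1))
```

With a high, constant hazard such as the one assumed here, expected cases are concentrated in the younger ages, which is the age signature the model exploits.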

ii. Likelihood and estimation
Assuming that the observed age-specific case counts C(a,t) follow a Poisson distribution, the likelihood of the data can be expressed as
$$L = \prod_{a,t}\frac{E[C(a,t)]^{\,C(a,t)}\,e^{-E[C(a,t)]}}{C(a,t)!},$$
where E[C(a,t)] is the expected reported number of cases among individuals of age a at time t.
We fit the model in a Bayesian Markov chain Monte Carlo (MCMC) framework using the RStan package in R (4,5). Both the annual hazards of infection (λ) and the reporting rates (Φ) were estimated on a logit scale using wide priors (Normal distribution with mean 0 and standard deviation of 1000). We simulated four independent chains, each of 30000 iterations, and discarded the first 10000 iterations as warm-up. We assessed convergence visually and using the Gelman-Rubin R statistic. We obtained 95% credible intervals from the 2.5% and 97.5% percentiles of the posterior distributions.
A limitation of this approach is that, due to the large number of cases (often in the thousands) characteristic of the data for some settings, the estimated credible intervals are extremely narrow and do not adequately reflect the underlying uncertainty. The observed counts can instead be assumed to follow a negative binomial distribution to account for some overdispersion.
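The two observation models mentioned here can be compared with a short sketch. It is illustrative only: the function names are hypothetical, and the negative binomial is written in the common mean/dispersion form (variance = mu + mu²/dispersion), which is an assumption since the text does not specify a parameterization.

```python
import math


def poisson_loglik(observed, expected):
    """Poisson log-likelihood of observed counts given expected counts (> 0)."""
    return sum(c * math.log(mu) - mu - math.lgamma(c + 1)
               for c, mu in zip(observed, expected))


def negbin_loglik(observed, expected, dispersion):
    """Negative binomial log-likelihood, mean/dispersion form (assumed).

    Smaller `dispersion` means more overdispersion; as it grows large the
    distribution approaches the Poisson.
    """
    k = dispersion
    total = 0.0
    for c, mu in zip(observed, expected):
        total += (math.lgamma(c + k) - math.lgamma(k) - math.lgamma(c + 1)
                  + k * math.log(k / (k + mu))
                  + c * math.log(mu / (k + mu)))
    return total


# Hypothetical counts by age group and their model expectations.
observed = [120, 340, 95]
expected = [100.0, 300.0, 110.0]
print(poisson_loglik(observed, expected))
print(negbin_loglik(observed, expected, dispersion=20.0))
```

Because the negative binomial allows more variance for the same mean, it penalizes large departures from the expected counts less sharply, which tends to widen the resulting intervals.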

iii. Parameters estimated
Since it is known that the force of infection has varied substantially over time in many of the settings considered, we allowed λ(t) to vary as a function of time. To limit the number of parameters estimated, we assumed constant λ(t) for periods of 20 years. Thus, if for a given setting we were estimating hazards for the period 1935-2015, we assumed piecewise-constant λ(t)s for the periods 1935-1954, 1955-1974, 1975-1994, 1995-2015. Given the objective of this study was to characterize recent transmission in endemic settings, we focused our results on the estimate of the average λ(t) for the most recent 20-year period. In the main text we focus on reporting the total force of infection (4λ(t)).
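The piecewise-constant structure can be illustrated with a small sketch. The helper names and hazard values are hypothetical, and letting the final period absorb leftover years is an assumption inferred from the 1995-2015 period in the example above.

```python
def period_index(year, start_year, end_year, period_length=20):
    """Index of the piecewise-constant period containing `year`.

    The final period absorbs any leftover years, so 1935-2015 yields the
    four periods listed in the text, the last one being 1995-2015.
    """
    if not start_year <= year <= end_year:
        raise ValueError("year outside the estimation window")
    n_periods = max(1, (end_year - start_year) // period_length)
    return min((year - start_year) // period_length, n_periods - 1)


def expand_hazards(period_hazards, start_year, end_year, period_length=20):
    """Map one hazard per period to a {year: hazard} lookup."""
    return {
        year: period_hazards[period_index(year, start_year, end_year, period_length)]
        for year in range(start_year, end_year + 1)
    }


# Hypothetical per-serotype hazards for the four periods.
hazards = expand_hazards([0.01, 0.02, 0.04, 0.03], 1935, 2015)
print(hazards[1954], hazards[1955], hazards[1995], hazards[2015])
```

Under this scheme only four hazard parameters are estimated for an 80-year window, and the last entry corresponds to the recent-period estimate reported in the main text.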
To account for the large variation in yearly dengue incidence, our model also included yearly "reporting rates" Φ(t). These reporting rates not only capture variations in reporting itself, but also variation in the symptomatic:asymptomatic ratio of dengue infections.

iv. Estimating the proportion seropositive at a given age
Using our estimates of the average force of infection, we estimated the proportion of individuals expected to be seropositive by age a, y(a), as:
$$y(a) = 1 - x(a) = 1 - e^{-4\lambda a},$$
where x(a) is the proportion of the population susceptible at age a and λ is the average force of infection per serotype (assuming four serotypes circulating). Since the vaccine has been registered for use in children 9 years of age or older, we report the proportion of individuals expected to be seropositive by age 9 years for each of the settings.
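As a worked illustration of this calculation, the sketch below evaluates the expected proportion seropositive at age 9 under a constant total force of infection (four times the per-serotype hazard). The function name and the force-of-infection values are hypothetical and are not estimates from the study.

```python
import math


def proportion_seropositive(total_foi, age):
    """Expected proportion infected at least once by `age`.

    `total_foi` is the total force of infection (4 x per-serotype hazard),
    assumed constant over the person's lifetime.
    """
    return 1.0 - math.exp(-total_foi * age)


# Hypothetical total forces of infection (per year).
for total_foi in (0.05, 0.10, 0.20):
    print(total_foi, round(proportion_seropositive(total_foi, 9), 3))
```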

v. Estimating the minimum age to achieve a given level of seropositivity
Given that the WHO initially recommended this vaccine in places where at least 70% of the target age-group is seropositive, we estimated the minimum age at which this level of seropositivity is expected for each of the settings.
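One way to sketch this calculation is to invert the constant-hazard expression for the proportion seropositive, 1 - exp(-4λa), and solve for age. The function name and the input values below are hypothetical, and rounding up to a whole year is an illustrative choice rather than something stated in the text.

```python
import math


def minimum_age(total_foi, target_seropositivity):
    """Age at which expected seropositivity first reaches the target.

    `total_foi` is the total force of infection (4 x per-serotype hazard),
    assumed constant over time.
    """
    if total_foi <= 0 or not 0 < target_seropositivity < 1:
        raise ValueError("need total_foi > 0 and 0 < target < 1")
    return -math.log(1.0 - target_seropositivity) / total_foi


# Hypothetical setting: total force of infection of 0.10 per year, 70% target.
age = minimum_age(0.10, 0.70)
print(round(age, 2), math.ceil(age))
```

Lower transmission pushes this age upward quickly: halving the force of infection doubles the age needed to reach the same seropositivity.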
For a given level of transmission λ it is possible to estimate the minimum age (A) at which a given level of seropositivity (s) is expected as:
$$A = -\frac{\ln(1-s)}{4\lambda}$$

Filled circles show the estimates for each municipality (administrative level 2 unit) within the department that reported >200 cases. Hollow circles indicate the mean force of infection for those municipalities that reported <200 cases. Size of circles is proportional to the number of cases available to estimate the FOI. Triangles indicate the mean estimate for each department.

Figure S3: Ranking of administrative level-1 units of Thailand, Colombia, Mexico and Brazil based on FOI estimates (x-axis) and incidence in the past 5 years (y-axis).

fact that milder forms of the disease probably arise from a mixture of primary to quaternary infections, and not just secondary infections. Thus, inference made for places where all cases are reported is likely to be conservative.