Challenges in dengue research: A computational perspective

Abstract The dengue virus is now the most widespread arbovirus affecting human populations, causing significant economic and social impact in South America and South-East Asia. Increasing urbanization and globalization, coupled with insufficient resources for control, misguided policies or lack of political will, and expansion of its mosquito vectors are some of the reasons why interventions have so far failed to curb this major public health problem. Computational approaches have shed light on dengue's population dynamics with the aim of providing not only a better understanding of the evolution and epidemiology of the virus but also robust intervention strategies. It is clear, however, that these have been insufficient to address key aspects of dengue's biology, many of which will play a crucial role in the success of future control programmes, including vaccination. Within a multiscale perspective on this biological system, with the aim of linking evolutionary, ecological and epidemiological thinking, as well as expanding on classic modelling assumptions, we here propose, discuss and exemplify a few major computational avenues: real-time computational analysis of genetic data, phylodynamic modelling frameworks, within-host model frameworks and GPU-accelerated computing. We argue that these emerging approaches should offer valuable research opportunities over the coming years, as previously applied and demonstrated in the context of other pathogens.


| Dengue transmission cycle
Dengue virus is transmitted to humans by the bite of infectious mosquitoes. Two mosquito species have been identified as the main disease vectors: Aedes aegypti and Ae. albopictus. Intercontinental changes in the second half of the 20th century, in particular the post-World War II boom in urbanization and globalization, have facilitated the introduction and establishment of dengue's two principal mosquito vectors into many major urban and peri-urban settings and resulted in the worldwide establishment of endemic transmission cycles (i.e., between humans; Vasilakis & Weaver, 2008; Murray et al., 2013; Weaver & Vasilakis, 2009). Ae. aegypti is particularly domesticated and often uses artificial water containers found within and around human habitats as breeding sites. Adult mosquitoes are optional haematophages, but females require blood for egg production and show strong preference for human blood meals (Scott, Morrison, Lorenz, Clark, & Strickman, 2000).
In contrast, the transmission cycle of sylvatic viruses takes place in forest environments of South-East Asia and West Africa between nonhuman primates and arboreal Aedes species. Rural populations living in close proximity to forested or plantation areas are periodi[...] Dehecq, Balleydier, Jaffar, & Michault, 2012), and a major epidemic in Pakistan took place in 2013 (Kraemer, Perkins, Cummings, Zakar, & Hay, 2015; Khan, Khan, & Amin, 2016). Together with the widespread presence of Ae. aegypti and Ae. albopictus, this would indicate that the burden of dengue in Africa is likely to be worse than previously acknowledged (Guzman et al., 2010; Jaenisch et al., 2014; Kyle & Harris, 2008).
In some cases, this may progress to more severe and life-threatening forms of disease, namely dengue haemorrhagic fever (DHF) or dengue shock syndrome (DSS), with manifestations of circulatory failure, vascular permeability and haemorrhagic symptoms (Gubler, 1998; Halstead, 2007).

| Immune interactions and computational approaches
Importantly, however, empirical evidence supporting a universal increase in the transmission potential of secondary infections is still lacking. That is, while the antibody-dependent enhancement (ADE) phenomenon has been shown in both in vitro and in vivo studies (Dejnirattisai et al., 2010; Halstead, 2011), its actual consequences for disease transmission and the proportion of hosts contributing to ADE-induced phenotypes at any given time are largely unknown.
Cross-immunity, where an infection by one serotype negatively affects the fitness of subsequently infecting serotypes, has equally been put forward in modelling studies to explain dengue's complex epidemiology (Reich et al., 2013; Aguiar et al., 2008; Wearing & Rohani, 2006). Recent studies on prospective cohorts have found that shorter time periods between sequential infections are associated with some degree of clinical protection (Anderson et al., 2014; Montoya, Gresh, Mercado, Williams, & Vargas, 2013). However, apart from a single human study conducted in the 1950s (Sabin, 1952), experimental support for both the nature and duration of cross-immunity is still lacking. Consequently, there is no broad consensus about the protective efficacy of the observed cross-reactivity between dengue serotypes and whether cross-reactive responses prevent infection or solely act to alleviate clinical outcomes. Model assumptions have therefore diverged significantly with regard to the duration and the type of protection following a primary dengue infection.
It is worth noting that some of the controversies surrounding serotype immune interactions and their effects on the epidemiological dynamics of dengue are partially consequences of the actual modelling frameworks themselves. That is, deterministic models based on mass-action principles, which comprise the majority of mathematical models found in the published literature, require an interaction term in order to desynchronize the serotypes and to exhibit dengue's characteristic multi-annual incidence patterns.
Stochastic, and in particular individual-based, models do not rely on this assumption to generate epidemiological time series with such properties. The contrasting dynamic behaviour is driven by the amplification of stochastic effects at the individual level (demographic stochasticity), which keeps each serotype/strain in a transient regime rather than approaching the expected deterministic equilibrium even in the absence of explicit desynchronizing factors (Lourenço & Recker, 2013; Alonso, McKane, & Pascual, 2007). Crucially, understanding the epidemiological consequences of dengue immune interactions goes beyond the theoretical argument and has recently been shown to be of major importance for dengue vaccination impact studies based on mathematical models (Lourenço & Recker, 2016; Flasche et al., 2016).
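The contrast between a deterministic equilibrium and a stochastic trajectory can be illustrated with a deliberately simplified sketch. It assumes a single strain, a constant population with balanced births and deaths, and a standard Gillespie algorithm; none of this is the individual-based dengue model cited above, and all function names and parameter values (`gillespie_sir`, `endemic_equilibrium`, `beta`, `gamma`, `mu`) are hypothetical choices made for illustration only.

```python
import random


def endemic_equilibrium(n, beta, gamma, mu):
    """Deterministic endemic number of infected hosts for an SIR model
    with per-capita birth/death rate mu and constant population size n."""
    r0 = beta / (gamma + mu)
    return mu * n * (1.0 - 1.0 / r0) / (gamma + mu)


def gillespie_sir(n=1000, beta=1.5, gamma=0.5, mu=0.02, i0=10,
                  t_end=50.0, seed=1):
    """Stochastic counterpart of the same model (illustrative values).

    Every death is immediately replaced by a susceptible newborn, so the
    population size stays constant. Returns event times and infected counts.
    """
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    t = 0.0
    times, infected = [0.0], [i]
    while t < t_end:
        rates = [
            beta * s * i / n,  # infection: S -> I
            gamma * i,         # recovery: I -> R
            mu * i,            # death of I, replaced by a newborn S
            mu * r,            # death of R, replaced by a newborn S
        ]
        total = sum(rates)
        if total == 0.0:
            break  # nothing left that can happen
        t += rng.expovariate(total)  # waiting time to the next event
        u = rng.random() * total     # choose which event fires
        if u < rates[0]:
            s, i = s - 1, i + 1
        elif u < rates[0] + rates[1]:
            i, r = i - 1, r + 1
        elif u < rates[0] + rates[1] + rates[2]:
            i, s = i - 1, s + 1
        else:
            r, s = r - 1, s + 1
        times.append(t)
        infected.append(i)
    return times, infected
```

Running `gillespie_sir` with several seeds and comparing the infected counts against `endemic_equilibrium(1000, 1.5, 0.5, 0.02)` gives a feel for how far individual realizations can wander from, or fade out before reaching, the deterministic expectation when the equilibrium number of infected hosts is small.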

| CHALLENGES AND OPPORTUNITIES IN COMPUTATIONAL RESEARCH
The zoonotic origin, separate endemic and sylvatic transmission cycles, immunological serotype interactions and the vast diversity of
FIGURE 2 Dengue publications over the last five decades. Total number of dengue articles per year (bars) and the percentage of those with a computational focus (spikes). Between 1970 and 2016, a total of 15,267 dengue articles were published, including 190 modelling studies (see Appendix for details). Horizontal axis: year (1970 to 2016); vertical axis: dengue-related publications.

| Real-time collection and computational analysis of genetic data
Like the yellow fever virus (Auguste, Lemey, Pybus, Suchard, & Salas, 2010), DENV was probably first introduced in the Americas through infected hosts and vectors during the slave trade (Weaver & Vasilakis, 2009; Vasilakis & Weaver, 2008). At present, the distribution of DENV serotypes results from a combination of strong population structure and gene flow across geographical regions. South-East Asia harbours the greatest DENV genetic diversity, and it is from there that viral lineages seem to seed epidemics in the Americas and elsewhere (Holmes, 2008). Several studies suggest that every 7-10 years, [...]
Dengue virus incidence is often underestimated and novel surveillance tools need to be applied to improve current estimates and reduce misclassification (Silva, Rodrigues, Paploski, Kikuti, & Kasper, 2016) [...] (Nunes, Faria, de Vasconcelos, Golding, & Kraemer, 2015). It is also the case that representative data are essential to quantify the burden of transmission and to predict dynamic behaviour and impact of interventions (Jaenisch et al., 2014). However, even in regions with access to diagnosis and surveillance, it is rarely the case that reported case data on arboviruses are representative (Silva et al., 2016; Lourenco, Maia de Lima, Faria, Walker, & Kraemer, 2017). A major example for which both of these issues are particularly challenging is the Amazon region, which has been suggested to play a stepping stone role in the introduction of DENV lineages from the Caribbean into Brazil (Temporão, Penna, Carmo, Coelho, & Azevedo, 2011; Nunes, Faria, Vasconcelos, de Almeida Medeiros, & de Lima, 2012), and may have also been involved in the emergence and maintenance of ZIKV (Faria, Quick, Claro, Thézé, & de Jesus, 2017).
With the ongoing democratization of pathogen sequencing (Faria, Sabino, Nunes, Alcantara, & Loman, 2016), it is hoped that the generation of genetic data will soon become standard in public health and clinical practice (Houldcroft, Beale, & Breuer, 2017). With the real-time genomic epidemiology revolution (Gardy, Loman, & Rambaut, 2015), new insights that complement clinically reported case data are expected to provide unique information about the macro- and microscale processes shaping viral dynamics. Statistical models that take into account shared ancestry allow testing for the contribution of demographic factors associated with virus persistence and spread (Faria, Suchard, Rambaut, Streicker, & Lemey, 2013; Lemey, Rambaut, Bedford, Faria, & Bielejec, 2014). Using such approaches, air travel has been shown to play a pivotal role in the spread of vectors (Tatem, Hay, & Rogers, 2006) and DENV (Nunes et al., 2014). Mobility may also play a significant role in the spread of arboviruses at more local scales; for example, time to travel has been shown to be strongly associated with incidence within distinct neighbourhoods of the same city during an arboviral outbreak (Faria, Lourenço et al., 2016). As real-time collection of genetic data becomes more frequent and available, increased time and spatial resolution will allow future phylogenetic and modelling studies to further evaluate the relative contribution of short- and long-distance mobility.
It may seem that these recently established approaches and

| Phylodynamic modelling frameworks
The genetic diversity of RNA viruses is shaped in direct response to changes in host and vector demographic and ecological factors, such as population size, structure or movement. This provides measurable connections between host population dynamics, viral evolution and transmission. In recent years, genetic, immunological and epidemiological data have been successfully used to investigate such connections, unifying key observations across biological scales within a single framework known as phylodynamics (Grenfell et al., 2004). This approach is exemplified in Box I, where we extended a previously published, individual-based dengue transmission model (Lourenço & Recker, 2013) to include explicit molecular evolution. Two scenarios are compared: one in which the dynamics and evolutionary histories are purely driven by demographic and transmission processes without immune interactions (Box I, a-d), and one where we considered temporary, serotype-transcending immunity following recovery from infection (Box I, e-f). With this approach, the phylogenetic output would be particularly informative as cross-immunity results in more imbalanced phylogenetic trees, akin to a ladder-like topology that is often reported for other pathogens, such as influenza A (Bedford, Cobey, & Pascual, 2011) or within-host HIV (Shankarappa, 1999), where cross-immunity is known to play a crucial role.
Given seasonal epidemic troughs in endemic regions, possible mechanisms for viral persistence are still open topics of discussion, in particular as phylogenetic studies have shown that the lack of case reports can be followed by resurgence of previously circulating genetic variants (Bennett et al., 2003, 2010; Carrington, Foster, Pybus, Bennett, & Holmes, 2005).

Box I Phylodynamic modelling
The explicit modelling of molecular evolution allows for an exploration of dengue's drivers for key observations of phylogenetics and population genetics, such as tree imbalance, Tajima's D and measures of viral diversity. Here, in order to illustrate such potential, we expand an individual-based framework to include the transmission and evolution of individual virions. We make a number of simplifying assumptions: one virion is modelled per host; the transmission chain of a single serotype is used to define the discrete birth-death process of virions; mutation occurs over the birth-death process, resulting in a multichotomous evolution tree; a fixed molecular clock is used for the mutational input, which follows an infinite-sites model with no recombination; all mutations are assumed to be neutral; a burn-in period of 25 years for transmission (ending at t = 0 in figures), followed by 10 years for molecular evolution (t = 10), is considered before model output is recorded; a single founder virus is used (t = 0); population genetic sampling is uniform in time and frequency-dependent; the phylogenetic tree is a small sample (<1%) of the evolution tree.
In a broader evolutionary perspective, drivers of key observations on the molecular evolution of dengue could also be studied within these frameworks. For instance, strong purifying selection is consistently reported in sequence data, for which it is generally accepted that the two-host life cycle imposes strong evolutionary constraints.
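Tajima's D, one of the population-genetic summaries named in Box I, can be computed directly from a sample of aligned sequences. The sketch below follows the standard definition of the statistic (the normalized difference between mean pairwise diversity and the number of segregating sites scaled by a harmonic sum); it is not the code used for Box I, and the function name `tajimas_d` and the toy alignments in the usage note are hypothetical.

```python
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of equal-length aligned sequences (strings).

    Returns None when the sample has no segregating sites. At least four
    sequences are required, because the variance term vanishes for n < 4.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    n_pairs = n * (n - 1) / 2.0
    seg_sites = 0  # S: number of polymorphic alignment columns
    pi = 0.0       # mean number of pairwise differences
    for column in zip(*sequences):
        counts = {}
        for allele in column:
            counts[allele] = counts.get(allele, 0) + 1
        if len(counts) > 1:
            seg_sites += 1
            identical_pairs = sum(c * (c - 1) / 2.0 for c in counts.values())
            pi += (n_pairs - identical_pairs) / n_pairs
    if seg_sites == 0:
        return None

    # Standard normalizing constants of the statistic.
    a1 = sum(1.0 / k for k in range(1, n))
    a2 = sum(1.0 / k ** 2 for k in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

As a usage example, a toy sample in which every mutation is a singleton, `tajimas_d(["1000", "0100", "0010", "0001"])`, yields a negative value, whereas a sample dominated by intermediate-frequency variants, `tajimas_d(["1100", "1100", "0011", "0011"])`, yields a positive one. An excess of rare variants is the pattern commonly read as a signature of purifying selection or population expansion.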
Box I (b) shows, however, that even without explicit constraints on molecular evolution (e.g., deleterious mutations), negative Tajima's D, a proxy for purifying selection, is self-emergent. As such, demographic and transmission processes appear sufficient to maintain the viral population as if under purifying selection. Sensitivity experiments could therefore be used to test which processes seem to dictate these pop[...]
Our previous work has suggested that stochastic effects and clonal interference are major determinants of the emergence of fitter, potentially more virulent variants (Lourenço & Recker, 2010). With these more detailed individual-based model frameworks, the strength of such determinants could be tested with higher resolution, discerning, for instance, between stochastic pressures dictated by population structure, host mixing or spatial heterogeneity in herd immunity (Lourenço & Recker, 2013).
Disparate levels of within- and between-host genetic diversity are a common feature of DENV, and complementary hypotheses exist for these observations (Choudhury et al., 2015; Lequime et al., 2016; Thai et al., 2012). By allowing parameterization of mutational input and quantification of how population-level events modulate within-host genetic diversity, phylodynamic models could help make predictions on the relative contribution of different mechanisms. In particular, transmission bottlenecks could be implemented and their effect tested in the background of cross-immunity and clonal interference (Lourenço & Recker, 2010).
There are concerns related to viral escape variants following intervention programmes based on vaccination and/or Wolbachia-infected mosquitoes. While for Wolbachia-based control strategies, it can be argued that these concerns are so far theoretical (Bull & Turelli, 2013), escape in a postvaccination era has been reported for a wide range of pathogens (Read, Baigent, Powers, Kgosana, & Blackwell, 2015; Pérez-Sautu, Costafreda, Caylà, Tortajada, & Lite, 2011). The explicit generation, emergence and spread of escape mutants could be modelled within phylodynamic frameworks such as the one presented in Box I, given that individual mutations are already tracked. Vaccination, as demonstrated in previous work, can also be easily included (Lourenço & Recker, 2016; Flasche et al., 2016). Parameterization of the efficiency of viral escape from vaccine-induced immunity would allow for sensitivity experiments and projections on the public health impact of such escape variants. Furthermore, demographic and ecological heterogeneities, such as population structure, could also be researched in order to evaluate under which conditions such variants could more easily emerge and to devise appropriate intervention schemes that could stop their fixation.
Finally, as described in the previous section, real-time collection and analysis of genetic data will soon offer vast amounts of high-resolution data on population genetic measurements and phylogenetics, both in time and space. This will fundamentally change our perspective and the number of empirical observations that can be used to validate model frameworks. The use of more complex modelling approaches, such as individual-based models, including atypical dimensions such as detailed spatial scales (Box II) or molecular evolution (Box I), will not only be a valuable path of research, but could indeed become necessary approaches to be able to match the rich variety of observed data patterns across biological scales.

| Within-host models
The vast majority of dengue models has focused on viral dynamics at the population level. Given the importance that the within-host processes (e.g., ADE and cross-immunity) play for both viral trans[...]

Box II Sensitivity analysis using GPU acceleration
The increased computational power due to GPU parallelization permits a deeper exploration into the epidemiological effects of different community structures, vector and host heterogeneities, environmental heterogeneities or human movement patterns. Here, in order to illustrate the use of GPU-accelerated agent-based modelling, we investigate how the connectivity between subpopulations, as a proxy for human movement/major commuting patterns, affects the epidemiological dynamics of dengue within a spatial meta-population model. In this framework, the population (human and mosquitoes) is divided into subpopulations and arranged spatially into a network of communities, connected by major trade and/or commuting routes. Dengue transmission events are dispersed through the network predominantly via local connections between communities as well as occasional but random transmission events between distant (i.e., not directly connected) subpopulations. The global connectivity pattern can then be quantified by means of the network's degree bias, which, in its simplest form, can be interpreted as the number of nodes (i.e., communities) that are only connected to one other node.
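The simplest reading of the degree bias given above, the number of communities connected to only one other community, can be sketched in a few lines. This is an illustrative assumption about how such a count might be taken from an edge list; the function name `degree_one_count` and the example networks are hypothetical and are not taken from the Box II implementation.

```python
from collections import defaultdict


def degree_one_count(edges):
    """Count nodes that are connected to exactly one other node.

    `edges` is an iterable of (node_a, node_b) pairs describing an
    undirected network; self-loops and duplicate edges are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop does not connect to another node
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sum(1 for linked in neighbours.values() if len(linked) == 1)


# Two six-community toy networks at opposite extremes.
ring = [(k, (k + 1) % 6) for k in range(6)]  # every node has two neighbours
star = [(0, k) for k in range(1, 6)]         # five leaves around one hub
```

Here `degree_one_count(ring)` is 0 and `degree_one_count(star)` is 5, which corresponds to the low and high ends of the connectivity pattern described in the box.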
Plots a-b show two examples of networks with either a low or a high degree bias, respectively. As can be seen in the corresponding time series of serotype-specific incidence, the way that communities are connected to form a meta-population has a strong effect on the epidemiological dynamics, especially with respect to the interannual variability and serotype periodicity. This can be further illustrated by means of a sensitivity analysis, where we gradually increased the degree bias and recorded the model's relative change in four characteristic epidemiological and serotype-specific measures (plots c-d). In this specific example, where we considered a population size of 5 million humans and up to 6 million mosquitoes, we gained a speed-up of around 320-fold (using an Nvidia GTX 1070 graphics card). In other words, running the same analysis, which took just over 9 hr to complete, would have taken about 4 months on a single-core CPU and still many weeks using multithreaded CPU computing on a high-end PC. This clearly illustrates how GPU-based computing may facilitate the use of bigger and more complex models for addressing some of the outstanding dengue epidemiological and evolutionary questions discussed in this work.

[...] modelling frameworks (Gujarati & Ambika, 2014; Nikin-Beers & Ciupe, 2015). While one study concluded that enhancing antibodies (ADE) were key in secondary infections (Gujarati & Ambika, 2014), the other study suggested instead a direct decrease of overall heterologous viral clearance, whereby cross-reactive antibodies render virions unavailable for further binding and subsequent clearance by ADCC (Nikin-Beers & Ciupe, 2015). In another study, the question about the role of the adaptive and innate immune responses in disease severity was addressed (Ben-Shachar & Koelle, 2014; Ben-Shachar et al., 2016).
It was shown that characteristic features of primary infection could be replicated solely by innate immune responses (NK cells). In contrast, features of secondary infection, including ones related to clinical outcome, required greater infectivity rates (ADE) together with T cell-mediated clearance of infected cells in a process arguably similar to well-described cytokine storms (Rothman, 2011; Katzelnick et al., 2017). In line with previous results by Clapham et al. (2014), it was proposed that the observed variations in viral load could largely be explained by assuming patient-specific incubation periods.
Generally and similarly to the aforementioned discrepancy between model assumptions related to immune interactions at the population level, within-host approaches have been able to replicate important and ubiquitous data patterns, but, either due to formalism or data restrictions, have been limited in establishing a consensus framework. In some cases, the models and the data sets differed, making it hard to assess whether contrasting conclusions are data- or model-driven. For instance, some models propose a key role of innate immunity in clearance of primary infection (Ben-Shachar & Koelle, 2014; Ben-Shachar et al., 2016), while others propose that either humoral or cell-mediated clearance is required (Nuraini et al., 2009; Ansari & Hesaaraki, 2012; Clapham et al., 2014; Gujarati & Ambika, 2014; Nikin-Beers & Ciupe, 2015; Clapham et al., 2016). Equally, asymmetries between serotypes in both cell infectivity and timing of peak viraemia versus symptomatic manifestations were reported in one study (Ben-Shachar et al., 2016) but not in another (Clapham et al., 2014).
A current major problem is the lack of fine-scale data that capture the early, prepeak infection dynamics. This issue arises from the self-limiting nature of dengue infections, fast viral clearance rates and the absence of antiviral therapies against dengue, which effectively prevent human infection studies. In fact, antiviral therapy has been devalued due to the fact that the timing of clinical symptoms often overlaps with peak viral load (Clapham et al., 2014); that is, most patients will seek treatment at a point when the virus is already being cleared by the immune system. However, in another example of contrasting observations, if there are indeed significant differences between the serotypes, as suggested by Ben-Shachar et al. (2016), then there might be a question whether serotype-specific strategies could be beneficial, for which modelling studies could contribute useful answers.
Another promising avenue is in the context of vaccination. Recently, the first dengue vaccine was licensed (Dengvaxia®) and many others are in advanced trial stages (Schmitz, Roehrig, Barrett, & Hombach, 2011; Thomas & Endy, 2011; Schwartz, Halloran, Durbin, & Longini, 2015). For a good number of these, data will be made available on population-level efficacy but also on within-host markers and dynamic measurements of infection and immune response (e.g., Dorigatti, Aguas, Donnelly, Guy, & Coudeville, 2015). Such data can be used to inform new modelling frameworks to explain vaccine action (e.g., infection versus transmission blocking), which may have considerable impact on licensing and future distribution. We note, however, that it is yet unknown whether fine-scale kinetic data will be available, which is essential to parameterize models (Magnus, 2013).

| GPU modelling frameworks
Many traditional approaches in theoretical epidemiology of dengue and other infectious diseases rely on mass-action principles whereby the rate at which a disease spreads through a population is directly proportional to either the number or the proportion of infected and susceptible individuals within that population. Under these assumptions, every individual has the same probability of getting infected, contributes equally to disease transmission and recovers at the same rate as everyone else. Their low computational footprint and analytical tractability make these models an attractive choice for investigating population-level dynamical behaviours, especially when homogeneity in time and space can safely be assumed. In many cases, this cannot be guaranteed, however, and individual variations, arising through the stochastic nature of infection events, for example, or through ecological and demographic heterogeneities, cannot be captured by these models and require a different approach altogether.
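The mass-action assumption can be made concrete with a minimal deterministic sketch. It assumes a single-strain SIR model advanced with a simple Euler step, which is only one of many possible formulations and is not drawn from any specific study cited here; the function names (`force_of_infection`, `sir_step`) and parameters (`beta`, `gamma`, `dt`) are hypothetical. The flag switches between transmission proportional to the number of infected individuals and transmission proportional to their proportion in the population.

```python
def force_of_infection(beta, infected, population, frequency_dependent=True):
    """Per-susceptible infection rate under mass action.

    Frequency-dependent: proportional to the proportion infected.
    Density-dependent: proportional to the number infected.
    """
    if frequency_dependent:
        return beta * infected / population
    return beta * infected


def sir_step(s, i, r, beta, gamma, dt, frequency_dependent=True):
    """Advance a deterministic SIR model by one Euler step of length dt.

    Every susceptible faces the same force of infection and every
    infected individual recovers at the same rate gamma.
    """
    n = s + i + r
    lam = force_of_infection(beta, i, n, frequency_dependent)
    new_infections = lam * s * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def run_sir(s, i, r, beta, gamma, dt, steps):
    """Iterate sir_step and return the final (S, I, R) state."""
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, beta, gamma, dt)
    return s, i, r
```

Because the state is just three numbers, a call such as `run_sir(990.0, 10.0, 0.0, beta=0.5, gamma=0.1, dt=0.01, steps=1000)` costs almost nothing to evaluate, which is the low computational footprint referred to above. The price is that every host is treated as identical.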
Agent- or individual-based models offer a natural and intuitive way to account for these variations by keeping track of individual, probabilistic infection events in both humans and mosquito vectors and have been implemented to various degrees of realism and model complexity. As an example, spatial details can be included by dividing the population into smaller subpopulations arranged in a rectangular grid (see Figure 3). This approach has already highlighted how the spatial segregation of individuals, causing stochastic, local extinction and re-invasion, can explain much of dengue's observed epidemiology (Lourenço & Recker, 2013). In many cases, the computational overhead of agent-based models arises not necessarily from individual infection events but rather from keeping track of host and vector demography and ecology, for example the processes related to birth, death and ageing. As these are largely independent events, they are amenable to code parallelization and can thus benefit from graphics processing unit (GPU) acceleration, which specializes in attaining high arithmetic throughputs. Speed-ups of anywhere between 10 and 1000 times from a serialized central processing unit (CPU) code are theoretically feasible, although in practice this very much depends on the particular model and implementation.
That is, the benefits of GPU acceleration can easily get lost on increasingly refined spatial resolutions, for example, when more time is spent keeping track of individuals' movements (which is generally not parallelizable) than updating infection statuses or age. Nevertheless, the gain in computational speed-up through GPU acceleration enables us to simulate much bigger and more detailed models in significantly less time (see Box II).
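Why non-parallelizable work erodes the benefit of acceleration can be illustrated with Amdahl's law, a standard result that is not part of the original discussion and is introduced here only as an interpretive aid. The sketch also reproduces the arithmetic behind the Box II figures (a 9 hr run at a speed-up of around 320). The function names `amdahl_speedup` and `serial_runtime_days` are hypothetical.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speed-up when only part of the runtime can be accelerated.

    parallel_fraction: share of the serial runtime that is parallelizable
    parallel_speedup: factor by which that share is accelerated
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def serial_runtime_days(accelerated_hours, speedup):
    """Serial runtime, in days, implied by an accelerated runtime."""
    return accelerated_hours * speedup / 24.0


# Box II: about 9 hr on the GPU at a speed-up of around 320.
box_ii_days = serial_runtime_days(9.0, 320.0)  # 120 days, i.e. about 4 months

# If half of the runtime (e.g., movement tracking) stays serial, even a
# 1000-fold acceleration of the rest cannot double the overall speed.
half_serial = amdahl_speedup(0.5, 1000.0)
```

Under this reading, the quoted speed-up implies that nearly all of the runtime in the Box II model was spent in parallelizable updates, whereas a model dominated by movement tracking would sit much closer to the `half_serial` case.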
The necessity to include higher resolution in biological scales such as population structure in modelling frameworks is clear, given the recent rise of ever more detailed genetic (Woolhouse et al., 2015), mobility (Kraemer, Perkins et al., 2015; Wesolowski et al., 2015; Lemey et al., 2014) [...] In the context of DENV, such approaches are now reaching the capacity to be modulated and fitted to particular, existing host populations and to simulate real-world scenarios in terms of transmission and control. This would be particularly valuable in the context of DENV vaccination modelling, which is recent in the literature (e.g., Lourenço & Recker, 2016; Flasche et al., 2016; Chao et al., 2012; Coudeville & Garnett, 2012). For most of the published studies, simplifying assumptions are made on population heterogeneities or age-dependent factors, for instance. However, with the reduction of computational costs, realistic vaccination campaigns can be assessed, including details related to relative targeting of rural versus urban settings or detailed catch-up and routine strategies dependent on host age. In terms of vector control programmes, the actual location of water reservoirs, the influence of climate or host mobility networks can also be considered.
FIGURE 3 Increasing model complexity demands higher computational power. Model detail can be added by dividing a well-mixed population into separate subpopulations, arranged in a regular spatial grid or by means of complex networks to represent geographical distribution of villages, towns and cities, with edges corresponding to major human movement patterns. Depending on data availability, more spatial and demographic detail can be added by considering individual households, places of work or schools. However, the computational demands increase significantly with more detailed information to keep track of, making the model very setting-specific and impractical for sensitivity analyses and model fitting exercises.
In summary, these advanced computing techniques have the capacity to change our perspective of modelling frameworks from being based on groups, generally of individuals sharing similar properties or phenotypes, to a perspective of host-pathogen systems focusing on the role of single individuals, and how the mechanics of transmission and evolution at that scale can emerge as self-evident patterns at the level of the population.

| CONCLUDING REMARKS
We have reviewed the biological and epidemiological background of dengue, together with the major achievements of computational approaches dedicated to the pathogen. In addition and in the context of its increasing success as a global threat to public health, we have also highlighted critical knowledge gaps and research underachievements that call for an urgent renewed focus and promotion of such methodologies. We argue that possible advancements, based on new processing strategies, real-time data sources and modelling frameworks already implemented for other pathogens, are already within reach of the research community. These new approaches are expected to make a significant contribution to our understanding of the evolutionary ecology and immunology of the dengue virus and its control in the near future.

Dengue incidence data for Puerto Rico were provided by Michael Johansson from the CDC Dengue Branch, comprising clinically suspected cases of dengue fever (DF) and dengue haemorrhagic fever (DHF) in Puerto Rico between 1986 and 2012. Serotype-specific testing has varied over time, so we adjusted the serotype-specific incidence data proportionally to match the incidence of all suspected cases by month. A shorter time series had previously been published and analysed in (Johansson, Cummings, & Glass, 2009) [...] (Nisalak et al., 2003).
Yearly counts of dengue publications were collected from PubMed under two lists (queries). The first contained the total number of publications per year dedicated to dengue, including all hits with reference to dengue in the title and abstract (counts collected using the "results by year" functionality in PubMed). The second was a subset of the first, which included the use of dynamic models in the context of dengue research (manually curated). Queries were performed on the 1st of February 2017.