Linked within-host and between-host models and data for infectious diseases: a systematic review

The observed dynamics of infectious diseases are driven by processes across multiple scales. Here we focus on two: within-host, that is, how an infection progresses inside a single individual (for instance, viral and immune dynamics), and between-host, that is, how the infection is transmitted between individuals of a host population. The dynamics at each of these scales may be influenced by the other, particularly over evolutionary time. Thus, understanding each of these scales, and the links between them, is necessary for a holistic understanding of the spread of infectious diseases. One approach to combining these scales is through mathematical modeling. We conducted a systematic review of the published literature on multi-scale mathematical models of disease transmission (defined as models combining within-host and between-host scales). Our aim was to determine the extent to which mathematical models are being used to understand across-scale transmission, and the extent to which these models are being confronted with data. Following the PRISMA guidelines for systematic reviews, we identified 24 of 197 qualifying papers across 30 years that include both linked models at the within-host and between-host scales and data used to parameterize or calibrate the models. We find that the approach combining modeling with data is under-utilized, although its use is increasing. This highlights the need for better communication and collaboration between modelers and empiricists to build well-calibrated models that both improve understanding and may be used for prediction.


INTRODUCTION
In the study of biological systems, phenomena are often observed at multiple scales, from the sub-cellular level to entire populations. Here, we focus on the between-host scale and the within-host scale, given …

Mathematical and computational modeling, which has a rich history of application to the dynamics of ecological systems and infectious diseases, has been used to study phenomena at both the within-host and between-host scales (as well as at other scales, as reviewed in Garira (2017)). At the between-host scale, classic compartmental models have been used to predict the spread of infectious diseases between individuals in a population (Kermack and McKendrick, 1991, 1927; Anderson and May, 1992). One example is the SIR model, which represents the interactions between susceptible individuals S, infected individuals I, and recovered individuals R. At the within-host scale, models have been used to understand viral load within hosts (Perelson et al., 1996; Nowak and May, 2000). One example is the TIV model of viral dynamics, which represents the interactions between target cells T, infected cells I, and virus V.

To understand the outcomes produced by the interactions within and between different scales, a multi-scale model that links the scales may be constructed. For example, an SIR model may be used to describe …
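The SIR and TIV models named above can each be written as a small system of ordinary differential equations. The sketch below integrates both with a fixed-step fourth-order Runge-Kutta scheme. It is a minimal illustration under stated assumptions: the SIR version assumes frequency-dependent transmission. The parameter values, initial conditions, and function names (`rk4`, `sir`, `tiv`) are hypothetical choices and are not taken from any of the cited works.

```python
from typing import Callable, List, Sequence

State = List[float]
RHS = Callable[[float, State], State]


def rk4(f: RHS, y0: Sequence[float], t_end: float, dt: float) -> List[State]:
    """Integrate dy/dt = f(t, y) from t = 0 with a fixed-step fourth-order Runge-Kutta scheme."""
    y, t = list(y0), 0.0
    trajectory = [list(y)]
    for _ in range(int(round(t_end / dt))):
        k1 = f(t, y)
        k2 = f(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k1)])
        k3 = f(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k2)])
        k4 = f(t + dt, [yi + dt * k for yi, k in zip(y, k3)])
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += dt
        trajectory.append(list(y))
    return trajectory


def sir(beta: float, gamma: float) -> RHS:
    """Between-host SIR model: S -> I at rate beta*S*I/N, I -> R at rate gamma*I."""
    def rhs(t: float, y: State) -> State:
        s, i, r = y
        new_infections = beta * s * i / (s + i + r)
        return [-new_infections, new_infections - gamma * i, gamma * i]
    return rhs


def tiv(beta: float, delta: float, p: float, c: float) -> RHS:
    """Within-host TIV model: target cells infected at rate beta*T*V, infected cells die
    at rate delta, virions produced at rate p per infected cell and cleared at rate c."""
    def rhs(t: float, y: State) -> State:
        target, infected, virus = y
        infection = beta * target * virus
        return [-infection, infection - delta * infected, p * infected - c * virus]
    return rhs


# Illustrative parameter values only (time in days); not fitted to any data.
epidemic = rk4(sir(beta=0.3, gamma=0.1), [990.0, 10.0, 0.0], t_end=160.0, dt=0.1)
within_host = rk4(tiv(beta=2.7e-5, delta=4.0, p=1.2e-2, c=3.0), [4e8, 0.0, 1.0],
                  t_end=10.0, dt=0.001)
print(f"final epidemic size: {epidemic[-1][2]:.0f} of 1000")
print(f"peak viral load: {max(state[2] for state in within_host):.3g}")
```

The two models share no variables here. Linking them, for example by letting the between-host transmission rate depend on the within-host viral load, is what turns this pair into a multi-scale model.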

Of the 22 reviews found by our search, two were themselves systematic reviews (Dorratoltaj et al., … number that is similar to our own study. However, they actually found three papers that did not appear in … within- and between-host components as well as data, none of these three papers would match our criteria. It is therefore unremarkable that we did not find these papers through our search.

In this review, we aim to illuminate the state of the field that joins experimental data with mathematical and computational models bridging within-host and between-host scales. By doing so in a systematic manner, we expect to identify potential gaps in understanding and methodology. We therefore examine papers that incorporate models with linked within-host and between-host components and that explicitly use data. While we have described an example that links two compartmental models in the context of a viral disease, we do not restrict our search to only compartmental models or to …

SURVEY METHODOLOGY
… a systematic review or a meta-analysis. The flowchart showing our procedures is presented in Fig. 1A.

We searched "All Databases" on Web of Science using the search terms (within-host* OR in-host-model* OR among-host-model*) AND (between-host* OR nested-model* OR cross-scale-model*) AND (pathogen* OR parasite*) AND (transmi*). The search covered papers published up to December 31, 2018. Terms combined within parentheses with 'OR' require at least one of those terms, while the 'AND' between parenthesized groups requires at least one term from each group. A term ending in * accepts any ending of the word; for example, 'transmi*' returns papers containing transmit or transmission. We note that this search type covers the 'title,' 'abstract,' 'author keywords,' and 'keywords plus' fields. If the search terms are not found in these locations, the paper will not be returned …

Based on these search terms, we obtained 225 results (Fig. 1A). We initially eliminated 29 search results, which included duplicates and other results that were not papers. Further, one paper could not be obtained in English (Verenini, 1983); only an Italian version was found.
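To make the Boolean structure of the query concrete, the sketch below applies the same four term groups to a single record. It is a simplified approximation and not Web of Science's actual matching engine, which has its own rules for hyphens, phrases, and word variants. The tokenization, the field names in `FIELDS`, and the example records are assumptions made for illustration.

```python
import re
from typing import Dict, List

# The four parenthesized groups of the search string: AND across groups, OR within a group.
QUERY: List[List[str]] = [
    ["within-host*", "in-host-model*", "among-host-model*"],
    ["between-host*", "nested-model*", "cross-scale-model*"],
    ["pathogen*", "parasite*"],
    ["transmi*"],
]
# Only these record fields are searched (hypothetical field names).
FIELDS = ("title", "abstract", "author_keywords", "keywords_plus")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split on non-alphanumeric characters, so hyphens separate words."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def term_matches(term: str, tokens: List[str]) -> bool:
    """True if the words of `term` appear consecutively in `tokens`.
    A trailing * lets the final word match any token that starts with it."""
    wildcard = term.endswith("*")
    words = tokenize(term.rstrip("*"))
    for start in range(len(tokens) - len(words) + 1):
        window = tokens[start:start + len(words)]
        if window[:-1] != words[:-1]:
            continue
        if window[-1] == words[-1] or (wildcard and window[-1].startswith(words[-1])):
            return True
    return False


def record_matches(record: Dict[str, str]) -> bool:
    """Apply the Boolean query to one record, searching each field separately."""
    field_tokens = [tokenize(record.get(field, "")) for field in FIELDS]
    return all(
        any(term_matches(term, tokens) for term in group for tokens in field_tokens)
        for group in QUERY
    )


hit = {
    "title": "Linking within-host and between-host dynamics of a parasite",
    "abstract": "We model how transmission depends on parasite load.",
}
miss = {"title": "Within-host dynamics of a pathogen", "abstract": "Transmission is not modeled."}
print(record_matches(hit), record_matches(miss))  # True False
```

The second record fails only because no term from the between-host group appears. This is the sense in which 'AND' requires something from each group.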
This left us …

In the initial abstract screening phase, two randomly assigned people (i.e., two of LMC, FEM, ZG, … If an abstract was labeled with two 'Yes' or with one 'Yes' and one 'Maybe', we retained the paper for full-paper screening. If an abstract was labeled with two 'No', we excluded the paper from screening.

If an abstract was labeled with one 'Yes' and one 'No', we reviewed the abstract collectively to relabel it as either two 'Yes', two 'No', or one 'Maybe'. If an abstract was labeled with one 'Maybe' and one 'No', the person who labeled it 'Maybe' was assigned to skim the paper and decide whether it should be kept or eliminated. If an abstract was labeled with two 'Maybe', a third randomly chosen person was assigned to skim the paper and decide whether it should be kept or eliminated. A paper was kept if it appeared to have a linked model and/or data but it remained unclear whether it had both; otherwise, it was excluded.
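The reconciliation rules above amount to a lookup table keyed on the unordered pair of reviewer labels. A minimal sketch follows. The `Label` enum, the action strings, and `keep_after_skim` (our reading of the skim criterion) are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Dict, Tuple


class Label(Enum):
    YES = "Yes"
    MAYBE = "Maybe"
    NO = "No"


# Keys hold the two labels in alphabetical order, so the rule ignores which reviewer said what.
RULES: Dict[Tuple[str, str], str] = {
    ("Yes", "Yes"): "retain for full-paper screening",
    ("Maybe", "Yes"): "retain for full-paper screening",
    ("No", "No"): "exclude",
    ("No", "Yes"): "discuss collectively and relabel",
    ("Maybe", "No"): "reviewer who labeled 'Maybe' skims the paper",
    ("Maybe", "Maybe"): "a third, randomly chosen reviewer skims the paper",
}


def screening_action(first: Label, second: Label) -> str:
    """Next step for an abstract given the labels from its two assigned reviewers."""
    low, high = sorted((first.value, second.value))
    return RULES[(low, high)]


def keep_after_skim(appears_to_have_linked_model: bool, appears_to_have_data: bool) -> bool:
    """Skim outcome: keep the paper if it appears to have a linked model and/or data."""
    return appears_to_have_linked_model or appears_to_have_data


print(screening_action(Label.NO, Label.MAYBE))  # reviewer who labeled 'Maybe' skims the paper
```

Sorting the pair means six entries cover all nine ordered label combinations.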

Once this process was completed, we kept 62 papers for further screening and excluded 133 papers … In all, we included 24 papers in the full analyses (Fig. 1A).

For the papers that were included, we answered a detailed set of questions, which described important …

… 3) were only collected for the 158 papers that were retained following the abstract screening stage but were neither reviews nor out of scope. Questions in boxes 4 through 8 were completed for all 24 papers that remained following the final screening stage. Questions are found in Text S1; responses are found in Tables S1-S8; references to all included papers are found in Text S2; references to all excluded papers are found in Text S3; and all recorded data can be found in our Supplemental Data Sets.

… those excluded) and papers meeting our criteria to include both models and data (i.e., those included) increased in that time frame (Fig. 2A).

Papers spanned a variety of host species systems (Fig. S2). Infections of humans were, not surprisingly, the most common in both the excluded (44/136) and included (8/24) categories, followed by non-human mammals (30 overall) and invertebrates (20 overall). Human infections were considered in the largest number of included papers overall. However, when broken down by focal host species, the proportion of papers included is largest for non-human mammals (7/30) and approximately the same for invertebrates (3/20) and humans (8/52). The most common reason for exclusion was a lack of data used with the model (37%), followed by the absence of a model (21%) (note that only one reason was recorded for each paper).
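The per-host comparison above divides the number of included papers by the number of all papers on that host. A short sketch of that arithmetic follows, using only the counts reported in the text (the names `COUNTS` and `inclusion_proportions` are ours).

```python
# (included, total) papers per focal host group, as reported above; totals include excluded papers.
COUNTS = {
    "humans": (8, 52),
    "non-human mammals": (7, 30),
    "invertebrates": (3, 20),
}


def inclusion_proportions(counts):
    """Fraction of papers on each host group that met the inclusion criteria."""
    return {host: included / total for host, (included, total) in counts.items()}


proportions = inclusion_proportions(COUNTS)
for host, share in sorted(proportions.items(), key=lambda item: item[1], reverse=True):
    print(f"{host}: {share:.1%}")
# non-human mammals: 23.3%
# humans: 15.4%
# invertebrates: 15.0%
```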

172
That is, many papers explore within- to between-host transmission from either a modeling or an empirical perspective, but many fewer link the models robustly to data. Recently, there have been a number of review papers on multi-scale models … with data, another common reason for exclusion (11%).

We considered whether the aim of each paper was primarily strategic (trying to understand underlying dynamics) or primarily tactical (trying to make predictions) (Nisbet and Gurney, 1982). Of the papers examined, most were classified as primarily strategic and very few as primarily tactical. Only one paper was classified as both strategic and tactical (Vrancken et al., 2014) (Table S4.2). Included papers were rarely found in highly specialized non-mathematical journals (2/24). Otherwise, they were spread relatively evenly among mathematically focused journals, biology-focused journals, and journals for a general audience (Fig. S1).

Figure 4. … The x-axis shows the model types used in the within-host part of the model, while the y-axis shows the model types used in the between-host part. The diameter of each dot represents how many papers used a particular framework. In all cases, the categorization is based on the process/mechanism portion of the model being considered. Deterministic models included any mechanistic/process-based model that did not include stochasticity in the process. Stochastic models include any model, except IBMs, that includes stochasticity in the description of the mechanism/process. Here, IBMs include any models in which individuals or agents were modeled separately from each other and allowed to interact within a simulated environment. Statistical models are those that seek to fit a function to data without the function having an explicit link to a mechanism for what produces a pattern.

Traits of included papers
… between-host dynamics on another factor in their model (Table S4.3).

In the multi-scale models we considered, the within-host component and between-host component … In all cases, our categorization is based on the process/mechanism portion of the model being considered.

Deterministic models included any mechanistic/process-based model that did not include stochasticity in the process. For example, this category includes ODE/compartmental models. These models may have been fit assuming a stochastic observation model overlaid on the dynamics. Stochastic models include any model that includes stochasticity in the description of the mechanism/process. Examples include stochastic SIR models and stochastic differential equations, but not IBMs. Here, IBMs include any models in which individuals or agents were modeled separately from each other and allowed to interact within a simulated environment. This is in contrast to models in which data collected on individuals are used to parameterize the model (we refer to these as "trait based"). IBMs as defined here may contain both deterministic and stochastic components. These models are typically much more difficult to analyze and fit to data than either of the other two types of mechanistic models, which is why we considered them separately. Finally, statistical models are those that seek to fit a function to data without the function having an explicit link to a mechanism for what produces a pattern. This category includes all forms of regression and many machine-learning approaches.
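The difference between stochasticity in the process and a stochastic observation model laid over deterministic dynamics is easiest to see in code. The sketch below is one stochastic counterpart of the SIR model, simulated with Gillespie's direct method. Individual infection and recovery events occur at random times, so repeated runs with identical rates give different outbreak sizes, including early fade-out. A deterministic ODE cannot produce that variation. The function name, rates, and seeds are illustrative assumptions.

```python
import random
from typing import List, Tuple

Event = Tuple[float, int, int, int]  # (time, S, I, R)


def gillespie_sir(s: int, i: int, r: int, beta: float, gamma: float,
                  seed: int = 0) -> List[Event]:
    """One realization of a stochastic SIR model using Gillespie's direct method.
    Assumes gamma > 0, so the total event rate is positive while I > 0."""
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    path: List[Event] = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        t += rng.expovariate(total_rate)          # exponential waiting time to the next event
        if rng.random() * total_rate < infection_rate:
            s, i = s - 1, i + 1                    # infection event
        else:
            i, r = i - 1, r + 1                    # recovery event
        path.append((t, s, i, r))
    return path


# Identical rates, different random seeds: final outbreak sizes (R at the end) vary between runs.
final_sizes = [gillespie_sir(99, 1, 0, beta=0.3, gamma=0.1, seed=k)[-1][3] for k in range(5)]
print(final_sizes)
```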
Figure 4 shows the types of within-host and between-host models used in the included papers. Most studies used a deterministic model at least once, for the within-host model, the between-host model, or both. In the included papers, within-host models were most commonly deterministic (11/24), followed by statistical (9/24), individual-based (2/24), and stochastic (2/24). In contrast, between-host models were mostly deterministic (13/24), with a lower and more evenly distributed representation of stochastic (5/24), individual-based (3/24), and statistical (3/24) models. One study used an … In general, studies did not use the same modeling approach for both the within- and between-host components. As for host type, there was no evident correlation between model types and the focal host species used in the model (Fig. S3).

… bidirectionally, both within-host to between-host and between-host to within-host (Table S5.5).

To link the within-host and between-host models, a linking mechanism was needed, which we categorized as either a state or a trait. Linking via a state meant that an outcome of the model was … (Table S5.6).

Within-host models (Fig. 5A) are linked to the between-host models mostly via the pathogen load, with more than half of the papers (18/24) using this linking mechanism. Pathogen growth rate was the second most commonly used trait for linking the within-host model to the between-host model (5/24 papers). All other within-host linking mechanisms were used in two or fewer papers. Between-host models were also linked into the …

Bayesian inference, although a popular statistical method, was used only three times in the included papers (Fig. 7). Only a single paper (Volz et al., 2017) was recorded as using multiple fitting methods at the same scale, and most papers used the same fitting method across all scales. There was a diversity of fitting methods used across scales (Fig. 7). Different fitting methods could be, and often were, used in the same paper to incorporate data at different scales. The least squares method was used most often when fitting data at the within-host scale, followed by maximum likelihood.
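As an example of a within-host least-squares fit, the sketch below estimates an exponential pathogen growth rate by ordinary least squares on log-transformed load data. Pathogen growth rate is one of the linking traits discussed above. The data and the function name are hypothetical. Real studies often fit a full mechanistic model rather than only a log-linear growth phase.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_growth(times: Sequence[float], loads: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log V(t) = log V0 + r t; returns (r, V0)."""
    if len(times) != len(loads) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    y = [math.log(v) for v in loads]          # loads must be positive
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    r = sxy / sxx                              # slope: growth rate per unit time
    return r, math.exp(y_bar - r * t_bar)      # intercept back-transformed to V0


# Noise-free synthetic data with r = 0.8 per day and V0 = 10, for illustration only.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
loads = [10.0 * math.exp(0.8 * t) for t in times]
r_hat, v0_hat = fit_exponential_growth(times, loads)
print(f"r = {r_hat:.3f}, V0 = {v0_hat:.3f}")
```

Least squares on the log scale coincides with maximum likelihood when the log-scale residuals are independent and Gaussian with constant variance. This is one reason the two methods often serve the same role in practice.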

Similarly, linking mechanisms were primarily fit to data using least squares and maximum likelihood. In contrast, between-host models were less consistent. Across model components, the category 'Other' consisted mostly of qualitative fitting methods and of papers whose authors were imprecise about how they fit the data. It also included methods such as cubic-spline interpolation.
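A linking function can also be fit by maximum likelihood. The sketch below assumes, purely for illustration, an exponential dose-response link between a source's pathogen load V and a contact's probability of infection, P(infection | V) = 1 - exp(-k V). It maximizes the binomial log-likelihood over k with a golden-section search on log10(k). The link form, the search bounds, the data, and the function names are all hypothetical.

```python
import math
from typing import List, Tuple

# Each observation: (pathogen load of the source, number of exposed contacts, number infected).
Observation = Tuple[float, int, int]


def log_likelihood(k: float, data: List[Observation]) -> float:
    """Binomial log-likelihood (up to a constant) for P(infection | V) = 1 - exp(-k V)."""
    total = 0.0
    for load, exposed, infected in data:
        p = -math.expm1(-k * load)                    # 1 - exp(-kV), accurate for small kV
        if infected > 0:
            total += infected * math.log(p)
        total += (exposed - infected) * (-k * load)   # log(1 - p) = -kV
    return total


def fit_k(data: List[Observation], log10_lo: float = -8.0, log10_hi: float = 0.0,
          iterations: int = 200) -> float:
    """Maximum-likelihood k by golden-section search over log10(k)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log10_lo, log10_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = log_likelihood(10 ** c, data), log_likelihood(10 ** d, data)
    for _ in range(iterations):
        if fc > fd:                    # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_likelihood(10 ** c, data)
        else:                          # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_likelihood(10 ** d, data)
    return 10 ** ((a + b) / 2)


# Hypothetical counts, for illustration only.
observations = [(1e2, 50, 1), (1e3, 50, 8), (1e4, 50, 40)]
print(f"k_hat = {fit_k(observations):.3g}")
```

The log-likelihood is concave in k, so it is unimodal in log10(k), which is the property a golden-section search needs. For a single load level, the estimate reduces to the closed form k = -ln(1 - x/n) / V.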

Our objective in this review was to determine how multi-scale infectious disease models, focusing on within-host and between-host scales, are used when they directly incorporate data. We focused on which host species are modeled, which pathogens are modeled, which types of models are used, how the within-host and between-host dynamics are linked, and at what scale data have been used. We found the following to be most common in these models:
- describing a human population;
- modeling a viral disease;
- using a deterministic model at either of the two scales considered;
- linking the pathogen load at the within-host scale;
- linking the transmission rate at the between-host scale;
- using data at the within-host scale.

We found the following to be least common:
- describing a plant, fish, reptile, or amphibian population;
- modeling a bacterial, macroparasite, or fungal infection;
- using a stochastic model at either of the two scales considered;
- linking host symptoms at the within-host scale;
- linking the host recovery rate at the between-host scale;
- using data at the between-host scale.

We speculate on the reasons for these outcomes. As human disease has tangible consequences directly impacting the wider population, it is unsurprising that the primary host species to examine these multi- … explicitly linking the within-host and between-host scales is challenging. Many studies defaulted to the standard assumption that a higher pathogen load often correlates with a higher chance of disease transmission, making pathogen load the simplest way to link the within-host and between-host scales.
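To illustrate the pathogen-load link in its simplest form, the sketch below feeds a viral-load trajectory from a TIV within-host model into a between-host transmission rate. It then integrates that rate over the infection to obtain the expected number of secondary infections caused by one host in a fully susceptible population. The saturating form of `transmission_rate`, its parameters, the fixed 10-day infection window, and the TIV parameter values are hypothetical choices for illustration. The reviewed papers used a variety of link functions.

```python
from typing import List


def viral_load(t_end: float = 10.0, dt: float = 0.001, beta: float = 2.7e-5,
               delta: float = 4.0, p: float = 1.2e-2, c: float = 3.0,
               target0: float = 4e8, v0: float = 1.0) -> List[float]:
    """V(t) on a grid of step dt from a TIV within-host model, integrated with forward Euler."""
    target, infected, virus = target0, 0.0, v0
    loads = [virus]
    for _ in range(int(round(t_end / dt))):
        infection = beta * target * virus
        target, infected, virus = (
            target - dt * infection,
            infected + dt * (infection - delta * infected),
            virus + dt * (p * infected - c * virus),
        )
        loads.append(virus)
    return loads


def transmission_rate(v: float, beta_max: float = 0.5, k: float = 1e4) -> float:
    """Hypothetical saturating link: between-host transmission rate (per day) given load v."""
    return beta_max * v / (v + k)


def reproduction_number(loads: List[float], dt: float) -> float:
    """Transmission rate integrated over the infection (left Riemann sum)."""
    return dt * sum(transmission_rate(v) for v in loads[:-1])


loads = viral_load()
print(f"peak load {max(loads):.3g}, implied R0 {reproduction_number(loads, dt=0.001):.2f}")
```

Because `transmission_rate` never exceeds `beta_max`, the implied R0 is bounded by `beta_max` times the infection window. Under this link, higher and longer-lasting loads translate directly into more onward transmission, which is exactly the assumption described above.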

Other linking mechanisms are often difficult to model because there may be no obvious relationship between two elements at different scales that affect one another. Data were incorporated primarily at the within-host scale, perhaps because some of these relationships can be obtained through laboratory-based research. In contrast, between-host data often require large-scale resources and monitoring.

We were quite surprised that our search yielded only 24 papers that included both across-scale modeling and substantial use of data. It is possible that our particular search terms were overly restrictive. For instance, the search term "pathogen" may be less likely to be used to describe infectious macroparasites (e.g., worms). Nonetheless, our relatively small included set indicates that there is …

… infection outcome, and parasite evolution. The American Naturalist, 172(6):E244-E256.