Understanding the transmission dynamics of respiratory syncytial virus using multiple time series and nested models

The nature and role of re-infection and partial immunity are likely to be important determinants of the transmission dynamics of human respiratory syncytial virus (hRSV). We propose a single model structure that captures four possible host responses to infection and subsequent reinfection: partial susceptibility, altered infection duration, reduced infectiousness and temporary immunity (which might be partial). The magnitude of these responses is determined by four homotopy parameters, and by setting some of these parameters to extreme values we generate a set of eight nested, deterministic transmission models. In order to investigate hRSV transmission dynamics, we applied these models to incidence data from eight international locations. Seasonality is included as cyclic variation in transmission. Parameters associated with the natural history of the infection were assumed to be independent of geographic location, while others, such as those associated with seasonality, were assumed location specific. Models incorporating either of the two extreme assumptions for immunity (none or solid and lifelong) were unable to reproduce the observed dynamics. Model fits with either waning or partial immunity to disease or both were visually comparable. The best fitting structure was a lifelong partial immunity to both disease and infection. Observed patterns were reproduced by stochastic simulations using the parameter values estimated from the deterministic models.


Introduction
Human respiratory syncytial virus (hRSV) is ubiquitous, with the great majority of people having a primary exposure before their third birthday. Infection results in lower respiratory tract disease in about 40% of primary cases and is responsible for the hospitalisation of 0.1-2% of infants each year [1]. Recurrent infections in the same individual are common, and can occur within a single epidemic as well as in subsequent epidemics [2,3]. Although re-infections are generally less severe than the primary infection [4], the relative contributions of primary infection and re-infection to hRSV transmission remains unclear.
The transmission dynamics of hRSV are strongly seasonal with a pronounced annual component in many countries. For example, in Finland, double epidemic peaks, a minor followed by a major peak, are observed every 2 years [2]. Epidemics occur each winter in many temperate climates and are often coincident with seasonal rainfall and religious festivals in tropical countries [5,6]. Although commonly ascribed to climate, the factors responsible for variation in seasonality between locations remain unproven [7][8][9].
Mathematical models of viral transmission are usually based on compartmentalisation of the host population based on the individual's infection and immunity status. The two extreme models, and most studied, are the susceptible-infected-recovered (SIR) and susceptible-infected-susceptible (SIS) models. These frameworks, respectively, assume that individuals recover from infection to become either solidly immune to reinfection (e.g., as for measles, mumps and rubella), or totally susceptible to subsequent infection. An additional status is often added to the SIR framework to include those individuals who have been exposed (and are infected), but are yet to be infectious (SEIR). Intermediate models between SIR and SIS include degrees of susceptibility to reinfection, or a reinfection state that has, for example, reduced infectiousness; such a model might be susceptible-infected-(partially)-susceptible-(re)infected (SISI) [10]. Weber et al. [11] used an SEIRS model with seasonal forcing to estimate specific parameters related to seasonal transmission and acquisition of immunity. Two models of bovine RSV (bRSV) have been published: Greenhalgh et al. [12] demonstrated a subcritical endemic state (i.e., bRSV could persist in a population into which it could not invade) and de Jong et al. [13] applied a stochastic model to data. Both models were of the SISI format and therefore structurally similar to those described in other contexts [14][15][16]. In each of the attempts to model RSV, the model structure implicitly contains assumptions about the relationship between primary and subsequent infections. There are a number of routes by which the continuum between SIR and SIS can be bridged, which essentially depend on either considering individuals as being on a continuous scale of immunity [17] or on discretising this scale [10].
Our principal aim is to use a model in which the underlying transmission model structure can be estimated from data. We introduce a set of four ordinary differential equations that include homotopy parameters that provide paths along which different underlying transmission assumptions can be realised. In this way, we ''nest'' a number of biological assumptions within a single model structure. In order to gain some insight into both the underlying host-pathogen biology (which should be invariant in geographical space) and the seasonal contact pattern (which might vary between host populations), we use multiple time series from different locations to estimate parameters minimising implicit, structural assumptions. We validate the model estimates by making a quasi-independent prediction, and explore the effects of stochasticity using the best fitting model.

The data
The data sets were all records of the monthly or weekly number of diagnosed hRSV cases in a given population. In each data set these are hospitalisations, most probably predominantly of infants experiencing their first infection, representing the serious disease ''tip of the iceberg'' of all hRSV infections. Vaccination is not available, and so the data represent ''natural'' (unperturbed) infection dynamics. The eight populations from which such case notification data were acquired were: • Porto Alegre, Brazil monthly cases from 1990 to 2003 [18] • Rio de Janeiro, Brazil monthly cases from 1986 to 2003 [19,20] • England and Wales monthly cases from 1991 to 1998 (Communicable Disease Surveillance Centre, UK) • West Midlands, UK weekly cases from 1991 to 1998 (Health Protection Agency (West Midlands), Communicable Disease Surveillance Centre, UK) • Finland monthly cases from 1981 to 2000 [2] • Florida monthly cases from 1981 to 1997 [21] • The Gambia monthly cases from 1990 to 1994 [11,22] • Madrid, Spain monthly cases from 1991 to 2002 [23] • Singapore monthly cases from 1990 to 1995 [6].
The data are shown in Figs. 3 and 4.

The model
The full model (within which all others are nested) is illustrated by the flow diagram in Fig. 1 with parameters as defined in Table 1. The model is developed as follows. Individuals are born at rate l into the fully susceptible class (S) and can be infected at a rate K, wherefrom their infection (I S ) is lost at a rate s. Individuals are totally protected while they are infected, i.e., super-infection is not possible. Individuals that have been previously infected but recovered (R) become reinfected individuals (I R ) at a rate rK, with potentially reduced infectiousness (by a factor g) and potentially reduced infectious period by a factor q, i.e., infectivity being lost at a rate s/q. This structure includes two susceptible classes (S and R) and two infected classes (I S and I R ). The homotopy parameters r, g and q determine the differences between primary and subsequent infections in terms of susceptibility, infectiousness and duration of infection, respectively.
To include temporary (waning) immunity in the same framework, we assume that all individuals with experience of infection (I S , I R , R) can return to the fully susceptible class (S). Conceptually, if the loss of immunity is a separate process to loss of infection, then individuals need not recover to a partially protected class before losing immunity. In order to preserve the previous structure, we define the rate at which immunity wanes as as/q, so that a = 0 defines a model in which immunity does not wane. Note that the homotopy parameters a and q are not strictly  independent. Considering individuals with primary infection (I S ), they recover at a total rate s with a proportion a/q returning to the fully susceptible class and (1 À a/q) recovering to the partially susceptible class. Thus, to preserve the sense of the direction of flows: 0 6a 6 q. The full model equations are: The model variables represent the numbers of the population in each class, so that S + I S + I R + R = P, and the population size is assumed constant with birth and death rates equal. The rate of infection, K, is the product of a basic reproduction number and the total effective infectiousness: K = Q(t) · I(s + l) and I = I S + gI R . Transmission varies seasonally: , where b, a and / are treated as unknown parameters. The peak in transmission (/) is given in a fraction of a year (0 is 1st January and 1 is 31st December). The relative amplitude is determined by a, which varies between 0 (no seasonal variation) and 1 (where transmission is 0 at its minimum and twice the mean at its maximum). The overall infectiousness is set by the transmission coefficient b, and defined so that the mean basic reproduction number of primary cases is given as The four homotopy parameters (r, q, g and a) take values between 0 and 1, and by setting some of them to the extremes and estimating others, we obtain different sub-models. The full model without seasonality has a backwards bifurcation for certain conditions on the parameters, however, this condition does not apply for realistic parameter value ranges for hRSV.

Model parameterisation
Eight combinations of global and estimated parameters were the foundation of the analysis ( Table 2).

Global parameters
Parameters not permitted to vary between locations are described as global ( Table 2). The average duration of primary infection is fixed (i.e., not estimated) at 9 days for all models and locations [4]. The homotopy parameters are treated as global, in that they define the biological interaction between the host immune response and the virus. In Model 1, the parameters r, q, g and a are fixed at values that make the model a standard SIS model. In this case, immunity does not wane, and there are no transmission differences between the susceptibility classes and infection classes. In Model 2, the parameters r, q, g and a are fixed at values that make the model a standard SIR model. In Model 3, the parameters r, q and g are fixed at values that make the model a standard SIRS model. The parameter a is estimated to calculate a global value of the average duration of immunity. In Model 4, the parameters q, g and a are fixed at values that make the model equivalent to the partial immunity model developed by Gomes et al. [10]. The parameter r is estimated to obtain a global value of the altered susceptibility factor. In Model 5, the parameters q and g remain fixed while the partial immunity is allowed to wane. The parameters r and a are estimated to obtain global values of the altered susceptibility factor and average duration of immunity respectively. In Model 6, the parameter a is fixed at a value that makes the model equivalent to that proposed by Greenhalgh et al. [12]. The parameters r, q and g are estimated to obtain global values of the altered susceptibility, altered duration of infection and altered infectiousness factors, respectively. Model 7 assumes temporary resistance to disease rather than infection. The parameter r is fixed at unity while the parameters q, g and a are allowed to assume some value between zero and unity. This model therefore defines partial temporary immunity as an altered response to subsequent infection given that challenge occurs within a given time since previous infection. Model 8 is an extension to that proposed by Greenhalgh et al. [12] that allows the partial immunity to wane. The parameters r, g, q and a are estimated to obtain global values of the altered susceptibility factor, altered infectiousness factor, altered duration of infection factor and average duration of immunity, respectively.
A further global parameter is the percentage of primary cases that are hospitalised, h. The models predict the numbers of the population in each state at a given time (in years), and the data give the new hospitalised cases per month, therefore we must use a scaling factor in order to compare the model output with the data. Weber et al. [11] assumed that the incidence rate (number of new cases per unit time) is proportional to the number in the infected class at that time. In fitting the models, we assume that the data represent a proportion of the incidence of infections of completely susceptible individuals, i.e., a proportion of KS. The observed data are the cumulative numbers

KS:
If we assume that the incidence of hospitalised infections is constant over the interval, then the model output must be multiplied by the following scaling factor to relate to data The value of h was estimated for Finland as 2.1 [24]. In the Gambia, the RSV hospitalisation rate per child year was estimated at 0.0087 [25], which if it is assumed that all children will have experienced an infection by the age of 2 years would give a value of h of 1.74. This publication also cites a range of 0.018-0.198 per child year for RSV hospitalisation rates in developing countries. A recent community study in Kenya [1] stated that of 338 children, 133 tested positive for RSV and 4 were hospitalised with severe RSV. However the test used for RSV infection is known not to be 100% sensitive, so a minimum estimate of h would be 4/338= 0.012 and a maximum estimate of h would be 4/133 = 0.03. Since the range of estimates for h for developing countries encompasses the range of estimates for developed countries, we will assume that it is a global parameter in the estimation process.

Location specific parameters
For each of the eight locations, estimates were obtained for: (i) mean basic reproduction numbers (R 0 ); (ii) relative amplitudes of seasonal variation of the transmission rate (a); (iii) times in the year of the peak of the transmission rate (/).

Modelling fitting
The negative log-likelihood, LL À , was calculated under the assumption that the observations follow a Poisson distribution as follows: for n i 2 f1; :: where N i is the number of observations in location i, d n i are the individual observations and D n i are the corresponding expected incidences. The Poisson distribution was used because it is most suitable for considering count data [26]. This definition of the likelihood is also intrinsically weighted for larger population sizes and higher numbers of samples.
The negative log-likelihood, LL À , was used in the optimise feature of the computer package Berkeley Madonna [27] to obtain the best fit of each model to the eight data sets simultaneously. The fits of nested models were compared using the likelihood ratio. In each case the stable limit cycle of the model was fitted to the data, we assume a stable limit cycle. There is no reason to assume that the systems were perturbed at the time of the beginning of observation in any location and therefore be in a transient phase. What is more, the assumption of transient dynamics would require the estimation of initial conditions for the variables of each model therefore increasing the number of unknown parameters unnecessarily. The best fitting initial conditions were for a condition very close to stable limit cycles, therefore justifying their use. Thus an assumption of transient dynamics would allow a slightly higher and unjustified contribution of the earlier parts of the data sets to the likelihood.
In order to increase the accuracy of the model fit, multiple optimisations were carried out. First we obtained a visual fit using multiple runs and fits of nested models if available. We then performed minimisation of the negative log-likelihood using initial guesses that encompassed the visually fitted values. We repeated the minimisation using initial guesses that encompassed the previous fitted values. We repeated the previous step until each resulting estimate was within two significant figures of its initial guesses. This process was completed with between 12 and 21 minimisations.
The diagram in Fig. 2 shows the nested structure of the set of 8 models. The diagram demonstrates the fact that although all the models are nested within Model 8, they cannot be organised as a set of nested models with each model being nested within the next up to Model 8. Therefore some pairs of models cannot be compared directly using their likelihood ratio (e.g., Models 6 and 7), but all models can be compared with Model 8.

Stochastic model
The model equations were discretised using the Euler approximation for a differential and an interval of 0.1 month. The difference equations are as follows. S 0 n i ¼ l i P i À C n i S n i ðI S;n i þ gI R;n i Þ þ as q max½ðP i À S n i Þ; 0 À l i S n i ; I 0 S;n i ¼ C n i S n i ðI S;n i þ gI R;n i Þ À ðs þ l i ÞI S;n i ; I 0 R;n i ¼ rC n i ð1 À S n i À I S;n i À I R;n i ÞðI S;n i þ gI R;n i Þ À s q þ l i I R;n i ; n i ðt n i À t n i À1 Þ; 0g; I S;1 i ¼ I S 0i ; I S;n i þ1 ¼ maxfH½I S;n i þ I 0 S;n i ðt n i À t n i À1 Þ; 0g þ H½nP i ; I R;1 i ¼ I R 0i ; I R;n i þ1 ¼ maxfH½I R;n i þ I 0 R;n i ðt n i À t n i À1 Þ; 0g þ H½nP i ; D m ¼ C n i S n i ðI S;n i þ gI R;n i Þ h 100 is sampled from a Poisson distribution of mean x. The mean number of individuals in a given state (e.g., I S ) at a given time is determined from the previous numbers of individuals in each state and the parameters of the model. Then a value is sampled from a Poisson distribution with that mean. In order to avoid fade-out of infection during epidemic troughs, a random number of infections is introduced (sampled from a Poisson distribution of mean nP i ) representing a number of imported infections from outside the population of location i.

Quasi-independent validation
The fitted global and local parameters for England and Wales from Model 6 were then used to reproduce the weekly incidence of hRSV in the West Midlands region of England. Since the population of the West Midlands is roughly a tenth (0.100775) of that in England and Wales (ONS, Population Estimates Unit), the population size and inter-sampling interval for England and Wales was altered accordingly and the incidence from Model 6 was plotted against the hRSV case data for the West Midlands in Fig. 4. The model prediction underestimates the height of the peaks, but replicates the timing and shape of the annual epidemics.

Stochastic simulation
The stochastic model was run for each population for a period of 20 years. Fifty runs were performed for two values of n (0 and 2 · 10 À7 ). If it is assumed that each population is closed and that infections can only arise from transmission from other members of that population, i.e., that n = 0, then fade out can occur very easily during simulation (Fig. 5(a)).
When some infection events were permitted to occur randomly and independently of the infection levels in the population (i.e., n > 0), simulations then showed no fade out and rarely cycled at a different periodicity from the observed data for n = 2 · 10 À7 (Fig. 5(b)), which indicates a random influx of RSV cases at the level about one millionth of the population each week into each infectious class. Increasing this value further forced oscillations into a period of one year in all regions and increased the value of the inter-epidemic troughs.
Stochastic versions of Models 3, 5 and 7 were also run and had similar characteristics to Model 6, although fade out was more common.
NN indicates that the models are not nested and therefore cannot be compared. Fig. 3. The fit of Model 6 to the eight data sets. For each location, the model prediction of the incidence (line) is plotted with the monthly cases for that location (crosses).

Discussion
We have used a dynamic model with four homotopy parameters to gain insight into the natural history of hRSV infection and reinfection. By applying a family of nested models to multiple data sets we are able to discount the SIR model (Model 2), which assumes that immunity is solid and  lifelong. This is an expected result since reinfection is known to occur [28] and some individuals regaining susceptibility is not unexpected given that RSV has a high level of antigenic diversity, and this diversity is known to change with time [5,29,30]. However, we can also discount the simplest structures for reinfection, i.e., where reinfection is identical to primary infection (Models 1 and 4) although the risk of reinfection may be comparatively lower (Model 4). The remaining models provided comparable visual fits, but Model 6 was found to provide the best and most parsimonious fit based on statistical inference of the calculated log-likelihoods.
A breakdown of the negative log-likelihood by data set (see Table 4) for each model indicates that most between-model variation occurs for the Finland data set although there is noticeable variation in the Gambian values as well. There was a higher negative log-likelihood than for Models 6 and 8 for all other models for Finland. Since the dynamics for Finland are more complex with a biennial signal rather than an annual signal, it is reasonable to expect more variation in these results. However, Model 6 was one of the two best fitting models for every data set, indicating that the fit was not completely dominated by the Finland data.
The fitting procedure for the parent model (Model 8) involves maximising the likelihood in 35dimensional parameter space. This may seem a very high number of parameters, but it is important to note that many of these parameters are local and are associated with specific independent data sets. Therefore the parameter space is structured and is more like six 4-dimensional spaces and two 3-dimensional spaces all linked by a further 5 dimensions. However, an improved fitting method would be to use a multi-dimensional algorithm for each set of local parameters coupled with an update of the global parameters for each iteration, until convergence is achieved.
Model 6 describes a structure where immunity is partial but lifelong. Partial immunity manifests in two ways: first, a reduced risk of infection (partial immunity to infection) and second, an altered infectiousness and duration of repeat infections (partial immunity to disease). Interestingly, of the visually well fitting models, this is the only model that does not include any return of hosts to the completely naïve, fully susceptible state. This permits a decomposition into two transmission submodels: fully susceptible hosts (S) become infected (I S ) at a rate K to recover with partial immunity; partially immune hosts (R) become infected (I R ) at a reduced rate rK to recover back into R generating a recurrent process. The first submodel induces the epidemic threshold (R 0 = 1) above which infection invades a host population, and the second determines the reinfection threshold (R 0 ffi 1/rgq ffi 10, for model 6 with parameters as in Table 2). The estimated mean basic reproduction number for this model is consistently estimated at about 9.3, so that the seasonal changes in transmission oscillate near the threshold. For example, the basic reproduction number for Finland varies between 8.6 and 9.8 and results in a dramatic change in infection prevalence (Fig. 6).
Although Models 6 and 8 were found to be the best fitting models in the deterministic context, it is important to note that Models 3, 5 and 7 also provided good visual fits and their stochastic versions reproduced the observed dynamics. The fitted homotopy parameter values in Table 2 suggest that these three models are very similar. Hence, there appear only two model forms supported by the data: (i) lifelong partial immunity, and (ii) waning immunity. It should be possible to investigate which of the model types is most plausible through epidemiological investigations of individuals with known history of infection, and such studies are currently on going in Kenya [31] and India [32]. Evidence exists for reduced levels [33] and durations [33][34][35] of viral shedding during infection of adults and older children compared with young children [34,36]. This appears to support a lifelong alteration of the response to reinfection as indicated by Model 6. The estimated amplitudes for the oscillations in the transmission coefficient reveal different patterns under different model types (Table 4). Therefore, understanding what underlies seasonality can provide another useful indicator for model plausibility.
The predicted percentage of primary infections that are hospitalised is estimated at 1.3% for the partial immunity model (6) which is not dissimilar to a previous estimate of this value of 2.45% [11]. It is estimated at 0.038% for the waning immunity models (3, 5 and 7). This is because individuals in these models are able to completely lose their immunity and thus experience a primary infection more than once and at any age. However, hRSV infections of individuals older than 2 or 3 years that result in hospitalisation are negligible in number [37]. Therefore, given the assumption that the populations are uniformly distributed over age, the estimated percentage of primary cases (as defined by the waning immunity models) should be roughly the estimate for the partial immunity model divided by the average life expectancy (about 70 years) and multiplied by the average age of hospitalisation (about 2 years). This is in fact the case here since 1:3Â2 0:038 ffi 70. It could also be the case that the proportion hospitalised is dependent on the state of the epidemic, acting through, for example maternal antibody or dose effects.
De Jong et al. [13] showed that infections in previously infected (seropositive) animals are not likely to be the cause of persistence of bRSV in dairy farms. We found that persistence was common in three of the eight regions from stochastic simulation, and that persistence was only achieved in the remaining five locations if some amount of infections could be introduced. Some influx of infection from outside each location would seem a reasonable assumption, since none of the locations are completely isolated from other areas where hRSV is present.
We must be somewhat distrustful of the estimated parameter values given here due to the lack of independent estimates for the population sizes for all but the England and Wales and Turku (Finland) data sets. A recent identifiability analysis, performed on an SIR model with seasonal forcing (similar to Model 2) proved that the model was unidentifiable if either the population size or percentage hospitalised was unknown [38], so that even with perfect, continuous, noise-free data for the transient dynamical phase, unique estimates for the parameter values cannot be obtained. However, we believe that our results pertaining to model structure are robust since the two known population sizes will provide constraints on the global parameters (including the percentage of primary cases hospitalised) and these will in turn constrain the estimates for the remaining population sizes.
The advantage of a comparative analysis of global data is the investigation of common determinants of transmission. If temperature variation were the explanation for the seasonality of hRSV globally, we might expect the ordering of the locations by amplitude to correlate with the ordering according to latitude, which is not the case. Likewise, there is no consistent pattern relating the timing of peak transmission with temperature change. There are many apparent patterns in results in Table 4, however the number of global locations is too small to be confident of proposing robust hypotheses. We suggest that more likely explanations of epidemic patterns in hRSV disease will involve aspects of spatial dynamics, genetic diversity (acting through susceptibility to reinfection) as well as currently undefined seasonal signals.
hRSV exhibits substantial genetic variation, that has been linked to the risk of reinfection, i.e., the possibility of is genotype specific immunity. The simple model structures used here include genetic variation implicitly in the process of acquisition and loss of immunity, for example partial immunity to reinfection could be due to challenge with a different (heterologous) strain. The interactions between genetics and dynamics have been explored for a subset of the data presented here coupled with typing data [39], and demonstrate that these interactions are able to produce complex dynamics (e.g., 6 point cycles). However, the underlying seasonality remains largely unexplained.
The general condition for an epidemic is that the product of the density of susceptibles and basic reproduction number (the effective reproduction number) be greater than unity. Seasonality in SIR infections (e.g., measles) has been explained by the timing of the school year (i.e., when contact rates between children substantially increase). Although measles and hRSV generally have different seasonal patterns, we suggest that both patterns could be due to the same seasonal forcing, given that the duration of immunity against measles is different from that against hRSV. Thus, it could be that the same seasonal factors drive transmission patterns of hRSV, measles and other viruses, but the variation in timing of observed incidence is due to differences in immunity. We plan to explore this further with a number of viral infections.