Accounting for cross-immunity can improve forecast accuracy during influenza epidemics

Previous exposure to influenza viruses confers partial cross-immunity against future infections with related strains. However, this is not always accounted for explicitly in mathematical models used for forecasting during influenza outbreaks. We show that, if an influenza outbreak is due to a strain that is similar to one that has emerged previously, then accounting for cross-immunity explicitly can improve the accuracy of real-time forecasts. To do this, we consider two infectious disease outbreak forecasting models. In the first (the ``1-group model''), all individuals are assumed to be identical and partial cross-immunity is not accounted for. In the second (the ``2-group model''), individuals who have previously been infected by a related strain are assumed to be less likely to experience severe disease, and therefore recover more quickly than immunologically naive individuals. We fit both models to case notification data from Japan during the 2009 H1N1 influenza pandemic, and then generate synthetic data for a future outbreak by assuming that the 2-group model represents the epidemiology of influenza infections more accurately. We use the 1-group model (as well as the 2-group model for comparison) to generate forecasts that would be obtained in real-time as the future outbreak is ongoing, using parameter values estimated from the 2009 epidemic as informative priors, motivated by the fact that without using prior information from 2009, the forecasts are highly uncertain.In the scenario that we consider, the 1-group model only produces accurate outbreak forecasts once the peak of the epidemic has passed, even when the values of important epidemiological parameters such as the lengths of the mean incubation and infectious periods are known exactly. As a result, it is necessary to use the more epidemiologically realistic 2-group model to generate accurate forecasts.Accounting for partial cross-immunity driven by exposures in previous outbreaks explicitly is expected to improve the accuracy of epidemiological modelling forecasts during influenza outbreaks.


115
The 1-group SEIR model is described by the following differential equations, in which individuals are either (S)usceptible and available for infection, (E)xposed (i.e. infected but not yet infectious or symptomatic), (I)nfectious or, (R)emoved: In this model, the infection rate is governed by the parameter β, the mean latent period is 1/κ weeks and the 116 mean infectious period is 1/µ weeks. The basic reproduction number R 0 of the 1-group model is given by Following Cintrón-Arias et al. (2009), the number of recorded cases in week j (recorded at the end of that 118 week) where j is the integer number of weeks since the epidemic began, is given by The constant value S + E + I + R = N represents the effective population size. Since pathogens are most likely 120 to be transmitted locally, individuals in distant locations are not available for infection and so N is expected to 121 be smaller than the true population size (Gart, 1968;Pouillot et al., 2008). In Table 1  The 2-group model is an extension of the standard SEIR model in which immunologically naive and partially immune individuals are distinguished between. The 2-group model is given by the following system of differential equations: The basic reproduction number R 0 of the 2-group model is given by where Γ = (Γ 1 , Γ 2 ) is the eigenvector corresponding to the dominant eigenvalue of the matrix It has been shown that immune imprinting with a previous related influenza strain decreased the numbers of 128 severe cases of H1N1, H5N1 and H7N9 influenza (Gostic et al., 2016(Gostic et al., , 2019. We assume that partially immune 129 individuals experience less severe disease and therefore typically recover more quickly than immunologically naive 130 individuals (i.e. 1/µ I < 1/µ N ). To isolate this effect alone on the predictability of epidemics, in this model it is 131 assumed that partially immune and naive individuals are otherwise identical. 132 We assume that only cases of severe disease (i.e. infected individuals who were previously immunologically 133 naive) report infection, so that the number of recorded cases in week j is given by Since we assume that only immunologically naive individuals report infection, the infectious period of immuno-135 logically naive individuals in the 2-group model is assumed to be identical to the infectious period of individuals 136 in the 1-group model (i.e. 1/µ N = 1/µ).

137
Denoting the fraction of the population that is partially immune by ν, we have that S I + E I + I I + R I = νN  Table 1 we list the parameters that appear in equations (7) (Hastings, 1970). All other parameters are assumed to be known. A likelihood function is used where it is accounted for in the models) are normally distributed, and that this noise scales with the square root of the size 147 of the data (i.e. the number of cases can model the partially immune fraction of the population over the years between the end of a first epidemic 165 and the start of a future epidemic of a related strain of influenza. We assume the next epidemic infects the same 166 effective population as the first one (in 2009). This immune fraction of the population will decay due to births 167 and deaths, which we now consider because the timespan between major influenza epidemics (which are often 168 part of global pandemics) is typically many years (Kilbourne, 2006). Further details are given in Supplementary   169 Information Section S.4. We seed the future epidemics by assuming there are 10,000 recorded cases in the first 170 week from which we generate forecasts (three orders of magnitude smaller than the effective population sizes 171 used in our models). When using the 2-group model to forecast future epidemics, we assume we know the partial 172 immune fraction of the population exactly. In Figure 1 we have presented two mathematical models which can be used to forecast influenza epidemics. As 175 well as determining which model to use, we shall investigate three forecasting approaches (see Figure 2 for a 176 schematic of these approaches).   All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020.  Table 2. The numbers of recorded, unrecorded, and combined total weekly cases estimated 188 using the 2-group model are shown in Supplementary Information Section S.5. 189 We use the mean estimated transmission rate and effective population size from the 2-group model to generate 190 synthetic data for future epidemics. We assume that, if the simulated epidemic takes place further in future,   (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint If a major influenza epidemic were to occur, we would want to make real-time forecasts of its dynamics using 195 live data of the number of cases each week to update predictions. We consider the case of a future epidemic 196 occurring 25 years after the 2009 outbreak. We make forecasts using the 1-and 2-group models, at 10, 20, and 197 30 weeks after the first recorded cases. The results presented in Figure 4 show how the model forecasts change 198 as we update them. The uncertainty in the forecasts of both models is large when they are made at weeks 199 10 or 20. By contrast, if forecasts are made at week 30, then they accurately describe the remainder of the 200 epidemic. We remark that at week 30, the peak of the epidemic has already passed and so accurate forecasting 201 is less practically useful. We conclude that it is difficult to forecast the dynamics of an epidemic in real-time 202 without prior information about the transmission rate and effective population size. This result motivates us 203 to use information from previous epidemics when forecasting the dynamics of a future one; we investigate this 204 further below. 205 9 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint  (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Forecasting Epidemics in Advance
The copyright holder for this preprint this version posted July 20, 2020.

219
Assuming the 2-group model more accurately reflects the underlying epidemiology of the system, we conclude 220 that the 1-group model forecasts may differ markedly from the dynamics of the next epidemic of a related strain.

221
Forecasts using larger and smaller variances of the distributions of β and N are presented in Supplementary   222 Information Section S.7. The partially immune fraction, basic reproduction number, total number of recorded 223 cases, duration, and size and timing of the peak of the epidemic forecasted using the 2-group model, as the time 224 between epidemics increases, are shown in Supplementary Information Section S.8. As seen in Figure 5, the 225 further into the future the next epidemic of a related strain occurs then, all else being equal, the greater the total 226 and peak number of cases and the shorter the duration of the epidemic. 227 11 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

228
We showed above that real-time forecasts of an influenza epidemic, without informative priors for the fitting 229 parameters, were typically very uncertain (e.g. Figure 4 (a)-(b)). We therefore now consider using priors to

236
We considered different widths of priors used to inform the epidemic forecasts ( Figure 6 and figures in 237 12 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint used, there was sometimes a discrepancy between the calibrated model trajectory and the observed epidemic 239 data (e.g. left part of Figure 6 (f)). For that reason, we also show epidemic forecasts obtained using a wider prior 240 (Figure 6 (d), (g) and (j)), so that the model fits the observed data more accurately. When the 1-group model 241 is used, the forecast is either inaccurate (when a prior with low variance is used; e.g. Figure 6 (c)) or imprecise 242 (when a prior with higher variance is used; e.g. Figure 6 (d)). In either case, our main conclusion is unchanged: 243 predictions are improved when the more epidemiologically realistic 2-group model is used. 244 13 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020.   14 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. to account for partial cross-immunity when forecasting influenza epidemics and, moreover, whether data from 250 previous epidemics can improve forecasts. 251 We have considered two different mathematical models which describe the spread of influenza, a 2-group   Finally, we considered incorporating knowledge of parameters from the 2009 epidemic, combined with using 271 outbreak data to make real-time forecasts during a future epidemic. When the 1-group model was used to make 272 forecasts, we found that wide priors had to be used to calibrate the model trajectories to the data, especially 273 early on. Although these priors, based on the 2009 epidemic, improved the forecasts compared to when no prior 274 information was used, the forecasts were still uncertain when made before the peak of the epidemic. By contrast, 275 the 2-group was able to make more accurate forecasts as it accounted for the changes in the number of partially 276 immune individuals and, hence, its priors accurately reflected the values of the true underlying parameters of 277 the system. We conclude that to forecast the long-term dynamics of major influenza epidemics, a model that 278 accounts for partial cross-immunity should be used, and data from related previous epidemics taken into account. 279 We note that there may be instances in which the underlying epidemiology of an outbreak is not well known, 280 or where a complex model cannot be parameterised due to poor surveillance and or lack of data (Gibbons et al., of an epidemic fully may still be informative. 289 We investigated whether or not it is necessary to include a particular source of heterogeneity (i.e. partial 290 immunity) when forecasting influenza epidemics. In a similar manner, we could extend our models to question 291 15 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint whether other sources of heterogeneity should be accounted for. For instance, age-structure could be incorporated 292 (Nishiura et al., 2010). We could also include spatial heterogeneity, by partitioning the population into distinct  Nonetheless, our approach has demonstrated the principle that including partial cross-immunity in forecasting 309 models during influenza epidemics can lead to more accurate forecasts. Cross-immunity due to previous infection ). We expect cross-immunity to remain important in future influenza epidemics. Consideration of partial 313 cross-immunity by epidemiological modellers is therefore of obvious public health importance.

328
There may be situations in which the assumptions of a model underlying the parameter estimation framework 329 are violated, for instance, how the amplitude of the noise in the data scales with the number of cases. These 330 incorrect assumptions may not be immediately apparent when observing the fitted models and data. However, 331 to look for any systematic deviations between the fitted models and data, we can carry out a residual analysis,

334
In Figure S1 we plot the residuals, as calculated by (S6) of the 1-group and 2-group models with least squares  Figure S1, which suggests that it is reasonable to assume that the noise in the observed 338 data of the number of cases scales with the square root of the size of the data. 17 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020.  (Hastings, 1970). It is assumed that the noise in the observed data scales with the square root of the size 342 of the data (i.e. the number of cases), hence we use the likelihood function where z(t j , θ 0 ) are the number of cases given by our model with the parameters θ, y j are the observed data, and 344 Z is a normalisation constant we need not consider.

345
To determine σ, we first fit the 1-and 2-group models to data from the 2009 epidemic in Japan using a least 346 squares procedure, again assuming that the observed data scales with the square root of the size of the data, 347 and that the observed data deviates from the model forecasts due to noise which the model does not account for, 348 such that 349 y j = z(t j ; θ) + j z(t j ; θ) 1/2 for j = 1, ..., n, where j are assumed to be independent and identically distributed random variables with zero mean E[ j ] = 0 350 and finite variance var[ j ] = σ 2 . We define the cost function The least squares parameter estimates are given by where Θ is the feasible set of parameter values. We solve the optimisation problem (S4) using MATLAB's 353 fminsearch function.

354
Following Cintrón-Arias et al. (2009), we estimate the noise scaling parameter σ 2 by where p is the number of parameters estimated from the data, and define the residuals by the ratio When fitting the models to data from the 2009 epidemic in Japan, we estimate σ using (S5), calculating 357 σ = 123.5 for the 1-group model and σ = 123.9 for the 2-group model respectively. Hence, we use the value 358 σ = 124 in the likelihood function (S1) throughout this study. A residual analysis of both models with the least 359 squares estimated parameters is given in Supplementary Information Section S.1, justifying the assumption the 360 noise in the observed data scales with square root size of the data.

361
18 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint The observable data we consider is the number of cases that have been recorded each week, C(j). Given that 363 we observe C(j) for j = 0, 1, ..., m, we wish to estimate the number of individuals in each compartment of our 364 models at week j as initial conditions to make forward forecasts.

365
When estimating initial conditions from case data from multiple previous week (i.e. when we are part way 366 through an epidemic), it is necessary to take data from every week into account, as for instance, an individual 367 who was reported as being infected many weeks ago may not have recovered yet. As we estimate initial conditions 368 from the observed data rather than model trajectories, we make the assumption that the cases are generated 369 uniformly in time (e.g. if there were seven cases reported in one week, we assume that there was one new case 370 each day).

371
Assuming an exponentially distributed infectious period as in our models, and first considering the 1-group model, if C(j) cases were reported each week for j = 0, 1, ..., m, the number of infected individuals I at week m is given by Restating (6) We now consider the 2-group model. As it is assumed that only cases of naive individuals becoming infected 378 are reported, we estimate the number of infected naive individuals at week m to be As partially immune and naive individuals are equally likely to be infected, we estimate the number of new 380 (unrecorded) cases of infectious partially immune individuals each week to be ν 1−ν C(j). Hence, we estimate that 19 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint The number of exposed, recovered, and susceptible individuals at week m in each group are estimated by 20 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint We assume that all infected naive individuals in the 2009 H1N1 influenza epidemic in Japan acquired partial 383 immunity to the virus and related strains. Hence, we can compute the partially immune fraction of the population 384 immediately after the epidemic, ν * , by where ν is the immune fraction before the epidemic, C T is the number of total cases recorded in the data, and 386 N is the total effective population size. Hence, from the 2-group model fitted to data from the 2009 epidemic in 387 Japan as shown in Figure 3, we estimate the partially immune fraction of the population immediately after the 388 epidemic to be ν * = 0.7443, using the mean estimated total effective population size, N = 4.660 × 10 7 .

389
To determine the partially immune fraction when a second epidemic in the future begins, we shall compute 390 the fraction as a function of the time since the end of the first epidemic. We shall assume that once an individual 391 acquires partial immunity, they remain partially immune for their lifetime.  by d(a). Hence, the probability, P (a, n), that an individual 395 of age a survives for n more years is given by where d(a)/b(a) is the probability than an individual aged a dies within one year.

397
The fraction of individuals who have survived for n years is then given by where ν * is the immune fraction immediately after the first epidemic. The result is shown in Figure S6  (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. We assume that all cases of immunologically naive infected individuals are recorded in the data, and that no cases 407 relating to partially immune individuals are recorded, due to the assumption that naive individuals suffer more 408 severe symptoms and are therefore more likely to seek medical attention. In Figure S3 we display the number 409 of (recorded) infected naive individuals, the number of (unrecorded) infected partially immune individuals, and 410 their combined total, estimated by the 2-group model fitted to the data from the 2009 epidemic in Japan. Figure S3: The weekly number of recorded, unrecorded, and total (recorded and unrecorded) cases as estimated by the 2-group model with mean estimated parameters, fitted to data from the 2009 Japanese influenza epidemic.

411
22 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

413
Different variances of these distributions may be considered when making forecasts, values of which are given in 414   Table S1. 415
23 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. 24 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. . https://doi.org/10.1101/2020.07.19.20157214 doi: medRxiv preprint In Figure S6 we show how the partially immune fraction, basic reproduction numbers, and various quantities of 423 interest relating to the dynamics of a future epidemic (estimated using both the 1-group and 2-group models) vary 424 dependent on when the future epidemic occurs. The partially immune fraction decreases approximately linearly 425 before tapering to zero as time increases (Figure S6 (a)). This results in an increase in the basic reproduction 426 number, R 0 , of the 2-group model ( Figure S6 (b)). 427 We explore various quantities of interest which would inform policy makers on how best to control the  The total number of recorded cases of individuals seeking medical attention reflects the number of total medical 435 resources (e.g. antivirals, hospital beds, staff) required to treat individuals ( Figure S6 (c)). This increases in 436 the 2-group model the further into the future the next epidemic begins, due to the decreasing partially immune 437 population fraction. It would be useful for public health officials and medical facilities to know the expected 438 duration of the epidemic so they can plan and allocate resources accordingly. Here we define the duration of the 27 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted July 20, 2020. 28 All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.