Accurately summarizing an outbreak using epidemiological models takes time

Recent outbreaks of mpox and Ebola, and worrying waves of COVID-19, influenza and respiratory syncytial virus, have all led to a sharp increase in the use of epidemiological models to estimate key epidemiological parameters. The feasibility of this estimation task is known as the practical identifiability (PI) problem. Here, we investigate the PI of eight commonly reported statistics of the classic susceptible–infectious–recovered model using a new measure that shows how much a researcher can expect to learn in a model-based Bayesian analysis of prevalence data. Our findings show that the basic reproductive number and final outbreak size are often poorly identified, with learning exceeding that of individual model parameters only in the early stages of an outbreak. The peak intensity, peak timing and initial growth rate are better identified, with their true values being, in expectation, over 20 times more probable having seen the data by the time the underlying outbreak peaks. We then test PI for a variety of true parameter combinations and find that PI is especially problematic in slow-growing or less-severe outbreaks. These results add to the growing body of literature questioning the reliability of inferences from epidemiological models when limited data are available.

Incredible efforts have been made in recent years to apply epidemiological models to the empirical data borne out of the COVID-19 pandemic. The LitCovid aggregator currently contains over 3,000 papers on "epidemic forecasting" and "modelling and estimating" trends of COVID-19 spread [1]. We are seeing similar waves of models and forecasts for recent outbreaks of mpox, Ebola, influenza and respiratory syncytial virus. However, the enormous variability in model predictions, even among works using the same model and similar data, erodes confidence when interpreting these efforts for policy decisions [2]. It is clear that uncertainty remains about what we can expect to learn from models, and when.
Disease models tackle the difficult challenge of describing complex epidemic processes by relating mechanistic processes to population-level observations such as daily reported cases. Identifying combinations of parameters which plausibly replicate observed data can help summarize the epidemic dynamics. Common statistics include the basic reproductive number R, the average number of new cases someone will cause in an entirely susceptible population, and the outbreak size O, the fraction of the population who will eventually have had the disease. Because these indicators are the product of interacting social and biological phenomena, they are never available through direct observation. Fitting epidemiological models to data is one of the best options for estimating these important quantities [3].
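As a concrete illustration of how these summary statistics follow from the model parameters, the sketch below computes R = β/α and solves the standard SIR final-size relation numerically. The function names and the parameter values in the example are ours, chosen for illustration, and the final-size relation assumes a negligible initial number of infections.

```python
import numpy as np
from scipy.optimize import brentq

def reproductive_number(beta, alpha):
    # Expected secondary cases from one infectious individual
    # in an entirely susceptible population
    return beta / alpha

def final_outbreak_size(beta, alpha, S0):
    """Fraction O of the population eventually infected, found by solving
    the SIR final-size relation O = S0 * (1 - exp(-(beta / alpha) * O)),
    assuming a negligible initial number of infections."""
    if beta * S0 / alpha <= 1.0:
        return 0.0  # below the epidemic threshold: no major outbreak
    f = lambda O: S0 * (1.0 - np.exp(-(beta / alpha) * O)) - O
    return brentq(f, 1e-9, S0)  # the nontrivial root lies in (0, S0)
```

For example, with β = 1.0, α = 0.2 and S0 = 1.0, the reproductive number is 5 and the solver returns a final size above 99% of the population, illustrating how quickly O saturates for large R.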
The classic Susceptible-Infectious-Recovered (SIR) model accounts for a minimal number of critical mechanisms of disease spread. Infectious individuals infect susceptible individuals at a rate β and recover at a rate α. These mechanisms can be tracked through time by a set of ordinary differential equations:

dS/dt = -βSI,    dI/dt = βSI - αI,    dR/dt = αI.

* Email: laurent.hebert-dufresne@uvm.edu

It is common to consider S, I and R as fractions of the population in a given state such that S + I + R = 1 at all times. The initial state of the population might not be known, especially the susceptible pool S0 ≡ S(t = 0). Focusing on the second equation, we can see that the epidemic will grow exponentially at a rate βS0 - α for initially small values of I, making it clear there will be large uncertainty in the value of individual parameters [4]. Conversely, when I becomes small after the peak, the infectious population eventually decays exponentially at a rate α. Observations of I will therefore provide information about different parameters, or combinations thereof, at different points of an outbreak. However, how this information accumulates over time, and how it allows us to identify key summary statistics, is more complicated. The widespread application of models built on the SIR backbone has led several authors to caution that the reliability of predictions can be sensitive to available data [4,5], and even more so for common extensions such as the SEIR model [2,6]. The question of whether parameters estimated from data are reliable, i.e.
close to some hypothetical true parameters θ* = (α*, β*, S0*) which generated the data, is termed the practical identifiability (PI) problem. Here we use a new measure which allows us to directly quantify our ability to learn various epidemiological quantities. If u = f(θ) is an unknown variable to be estimated, our pseudo-Bayesian interpretation of the identifiability of u is the expected logarithm of the ratio between posterior and prior probabilities, evaluated at u*:

δ_u = E_{y|θ*}[ log( P(u = u* | y) / P(u = u*) ) ],    (1)

where y | θ* are noisy observations of the epidemiological variable, e.g., daily case counts, and where the expectation is taken over realizations of the observation process. This measure reflects the magnitude of information a researcher can expect to gain when fitting a model to data, while allowing the effect of particular values of θ* to be studied. Importantly, (1) does not require computationally expensive Bayesian inference methods to compute; a simple Monte Carlo procedure for estimating (1) is outlined in the SI Text.

RESULTS
Figure 1 shows the PI of the SIR model parameters, as well as five summary variables which are commonly calculated in terms of θ (see Table I for mathematical definitions), for a typical parametrization θ* of the model. Observations were distributed with relatively little noise, to better study the PI inherent to the SIR model itself. δ_u is computed daily for these eight variables using observations from the first 30 days.
The rate of learning for all variables is uneven over time, with each reaching plateaus of varying length before the peak. The infection rate β is the worst identified. Gaining information on α appears easier than for β and S0, and even exceeds learning for R and O after around T = 20 days of observation. PI of the peak intensity, peak timing and growth rate increases more rapidly at first, with the learning of the growth rate happening particularly fast. The true growth rate is over 25 times more probable having seen the data after only 5 days of observation.
To test the sensitivity of these findings to θ*, we computed δ_u over a grid of values for β* and S0* (Figure 2). Since slower-growing outbreaks will naturally contain less information per day [7], information gain was calculated using observations up until the first day after the epidemic peak. The outbreak size of the true epidemic was the most correlated with learning of the five summary variables, followed by the growth rate.

DISCUSSION
The analysis presented here makes it clear that some epidemiological variables are easier to estimate through model dynamics than others, and emphasizes that most epidemiological summary statistics should be interpreted with caution when data are limited. Taken together, the rates of learning for all the variables suggest that learning takes place in three general phases. In phase 1, plausible parameter combinations quickly concentrate along the surface {θ : βS0 - α = G*}, as infections increase exponentially with the initial growth rate. This explains the sharp but modest gain in information for all variables except G during this phase. In phase 2, infections begin to saturate and parameter combinations matching the true peak intensity and timing become more plausible. However, for β especially, saturating case counts do little to further restrict the plausible parameter surface from phase 1. Finally, phase 3 is characterized by gradual information gain for the remaining variables. Since infections slowly decline at rate α during this phase, this gain is explained by α* gradually being identified, which propagates to allow some remaining combinations on the plausible surface to be eliminated.

FIG. 2. Change in practical identifiability δ_u when the true parameters θ* are varied. δ_u is calculated using daily observations up to the first day after the true (unobserved) outbreak has peaked. True parameters tested were all combinations of β* = 0.3, 0.5, ..., 1.5 and S0* = 0.1, 0.3, ..., 0.9, with α* = 0.2 fixed. Pearson correlation between δ_u and the true outbreak size is given in the corner of each panel. Priors are the same as in Figure 1.
Parameters describing the mechanisms of the model (β, α and S0) take a particularly long time to learn, on account of quickly reaching a plateau at low values of δ_u. As a result, the SIR model is more effective at forecasting short-term statistics of the dynamics, such as peak timing and intensity, than it is at estimating mechanisms. This result shows how difficult it is to estimate parameters from early data in the hope of forecasting the impacts of mechanistic interventions, such as reducing β with preventive measures or increasing α with treatment [8].
Learning was nearly as difficult for the statistics R and O as for the individual model parameters, despite the fact that, optimistically, these transformations would combine the information of each parameter they depend on. The failure of these statistics to resolve nearly exchangeable parameter combinations limits their reliability for succinctly describing an epidemic. In contrast, the initial growth rate resolves such combinations to give rapid shrinkage to the correct value, despite encoding similar information as R about disease dynamics [9]. This suggests growth rates are a more reliable "first look" at an outbreak when using prevalence data under the SIR model.
When varying the true values θ* (see Figure 2), we find that less-severe outbreaks are generally harder to learn from, despite having more daily observations available before their peak. The initial susceptible population S0 appears the most poorly identified across values of θ* by the peak, and the expected posterior shrinkage is even slightly negative for 25% of the tested values. An interesting implication for control measures is that the more we reduce the severity of the true infection dynamics, the harder it will be to accurately estimate the impacts of interventions. Further, the mode of intervention matters: variability along the y-axis in Figure 2 for similar values of O* shows that lowering S0* impacts learning differently than a reduction in β*.
Previous investigations into the PI of the SIR model have mainly focused on the PI of α and β under the simplified model where S0 ≈ 1 is known. These works generally agree that the PI of both α and β is limited during phase 1 [4], but that the majority of the available information has been learned by the time the disease has peaked [6,10]. Most comparable to the observational design in Figure 1, Capaldi et al. (2012) considered the asymptotic variance of β and α over an increasing timespan, and found the variance of both estimators decreased rapidly and smoothly just before and after the peak, respectively [7]. In contrast, the delayed rate of learning of these parameters in Figure 1 paints a more pessimistic picture of PI when exact likelihoods and prior context are taken into account. This finding supports the idea that previous PI results based on sensitivity equations underestimate uncertainty, particularly during the early stages of an outbreak when the likelihood surface is highly nonlinear [11,12].
The Bayesian nature of our method of assessing PI means that estimates of model parameters, and of any variables which depend on them, are sensitive to prior beliefs. In this report, our choice of uniform priors represents modest assumptions about an emerging pathogen: a priori, just over 50% of scenarios result in an outbreak (i.e. have βS0/α > 1), and outbreaks range from modest to highly severe (up to 70% of individuals infected at the peak). However, for many pathogens more informative prior information is frequently available, for example on the recovery rate of a disease [13]. Relative to more realistic settings of P(θ), this may mean α is more difficult to gain information about than β and S0. Further, our choice of priors shows that initial shrinkage in the likelihood surface can just as readily be explained by common-sense bounds on the model parameters. In this sense, not taking prior assumptions into account when calculating PI arguably over-reports learning.
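Claims of this kind about what a prior implies can be audited directly by simulation. The sketch below uses illustrative uniform prior ranges of our choosing (not necessarily those of Figure 1) to estimate the prior probability of an outbreak, βS0/α > 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative uniform priors; the paper's exact ranges may differ
alpha = rng.uniform(0.1, 1.0, n)
beta = rng.uniform(0.1, 1.5, n)
S0 = rng.uniform(0.1, 1.0, n)

# Fraction of prior draws above the epidemic threshold
outbreak_prob = np.mean(beta * S0 / alpha > 1.0)
print(f"Prior probability of an outbreak: {outbreak_prob:.2f}")
```

Swapping in the priors actually used for an analysis makes such an audit a cheap sanity check before any data are fit.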
While we have considered only noisy observation of the current infectious population, real data may also come in the form of daily new infections or cumulative case counts, and may suffer from reporting lags or preferential sampling [14,15]. Learning epidemiological variables from such data will bring its own distinct challenges [6]. The PI of the SIR model should also be assessed with hierarchical models incorporating data from multiple sources, such as hospitalizations and isolated clinical experiments [16]. Yet our work shows that, even in its simplest form, learning the parameters and statistics of SIR dynamics takes time, limiting which inferences, forecasts and control policies can be made from early epidemic data.

Data availability
Materials necessary to reproduce this analysis are available online at github.com/brendandaisy/epi-summaries-over-time.

Observation model
Infectious individuals are assumed to be independently tested at a fixed rate η at integer timepoints t = 1, ..., T, giving a likelihood y_t ∼ Poisson(ηI(t; θ*)), where I(t; θ*) are the infectious dynamics parameterized by the unknown values θ*. η = 1000 is assumed known throughout.
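A minimal sketch of this observation model, with function names of our own choosing:

```python
import numpy as np
from scipy.stats import poisson

ETA = 1000  # testing rate, assumed known

def simulate_observations(I, eta=ETA, rng=None):
    """Draw daily counts y_t ~ Poisson(eta * I(t)) from an infectious curve I."""
    rng = rng or np.random.default_rng(0)
    return rng.poisson(eta * np.asarray(I))

def log_likelihood(y, I, eta=ETA):
    """Poisson log-likelihood of counts y given infectious fractions I."""
    return poisson.logpmf(y, eta * np.asarray(I)).sum()
```

For example, an infectious curve I = [0.01, 0.05, 0.10] yields counts with expectations 10, 50 and 100, and the log-likelihood can be evaluated for any candidate curve against the same counts.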

FIG. 4. Density of samples from P(α, β | u*) and P(β, S0 | u*) given different summary transformations. True values and priors are the same as in the main text.