Uncertainty and error in SARS-CoV-2 epidemiological parameters inferred from population-level epidemic models

During the SARS-CoV-2 pandemic, epidemic models have been central to policy-making. Public health responses have been shaped by model-based projections and inferences, especially related to the impact of various non-pharmaceutical interventions. Accompanying this has been increased scrutiny over model performance, model assumptions, and the way that uncertainty is incorporated and presented. Here we consider a population-level model, focusing on how distributions representing host infectiousness and the infection-to-death times are modelled, and particularly on the impact of inferred epidemic characteristics if these distributions are mis-specified. We introduce an SIR-type model with the infected population structured by ‘infected age’, i.e. the number of days since first being infected, a formulation that enables distributions to be incorporated that are consistent with clinical data. We show that inference based on simpler models without infected age, which implicitly mis-specify these distributions, leads to substantial errors in inferred quantities relevant to policy-making, such as the reproduction number and the impact of interventions. We consider uncertainty quantification via a Bayesian approach, implementing this for both synthetic and real data focusing on UK data in the period 15 Feb–14 Jul 2020, and emphasising circumstances where it is misleading to neglect uncertainty. This manuscript was submitted as part of a theme issue on “Modelling COVID-19 and Preparedness for Future Pandemics”.


Introduction
A simple deterministic discrete time Susceptible-Infected-Removed ('SIR') epidemic model, sometimes termed the 'general epidemic model' (Diekmann and Heesterbeek, 2000), is defined as follows. Suppose that and are the number of susceptible and infectious individuals, respectively, on day ; and that between day and + 1 for some rate constants and , of the susceptibles become infectious, and of the infectious individuals become 'removed' (by death or recovery, so that they are no longer susceptible nor infectious). Though very simple, this model embodies important epidemiological principles and exhibits reasonable dynamics, including exponential growth in the early stages and decline once the susceptible population is suitably depleted. The model also connects to key epidemiological parameters such as the basic reproduction number,  0 , defined as expected number of cases directly generated by one case in a population where all individuals are susceptible, for this model equal to ∕ . The the aim to understand the impact of infected-age structure on model dynamics. In contrast, our focus here is on the ''inverse'' problem, i.e., to understand the impact on quantities inferred from data, and in particular to highlight circumstances when ignoring infected-age structure gives misleading conclusions.
Quantities of policy-making relevance and of interest to infer include: the timing and impact of behavioural changes, especially relating to non-pharmaceutical interventions (NPIs) such as 'lockdown' measures; the effect of NPIs on epidemiological characteristics such as the effective reproduction number,  (defined later); the timing and size of the peak in the number of new daily infections; and the number of individuals that remain susceptible at the end of an epidemic wave.
In Section 2 we describe the data available on the epidemic timecourse and on changes in mobility patterns relating to response to NPIs. We describe an infected-age-structured model, which is the central model in the paper, and some simpler models for comparison of the types sometimes used for inferring characteristics of an epidemic. The results in Section 3 include an application of the model to compare competing hypotheses for epidemic decline; a simulation study investigating the impact of using non-infected-age-structured models for inference when the data arise from an infected-age-structured model; and full Bayesian inference focusing on UK data for the SARS-CoV-2 outbreak in spring 2020, including for the quantities listed in the preceding paragraph. Section 4 contains a concluding discussion.

Data
We draw on epidemiological and clinical data for ancestral SARS-CoV-2 that was available following the first epidemic wave, as described further below.

Deaths associated with SARS-CoV-2
We focus on country-level SARS-CoV-2 mortality data for England and Wales for the period Feb-July 2020. The total population, denoted in the following models, is 59.1 million (ONS, 2020b). Various time-course data are available for this period, including the number of confirmed cases by specimen date, though such data are influenced by rapid changes to testing capacity (DHSC, 2020) and strategy (Dunn et al., 2020) that make them challenging to connect to variables in an epidemic model. For this reason we focus exclusively on data for SARS-CoV-2-associated deaths collated by date of death for England and Wales by the UK Office for National Statistics. These deaths data count registered deaths where COVID-19 was mentioned on the death certificate (ONS, 2020a), with the first death occurring on 8 Mar 2020. Although death data are available beyond July 2020, we restrict ourselves to considering inferences that may be made at the conclusion of the first epidemic wave. The deaths data are plotted later in results Fig. 3.

Mobility data
The UK government announced on 23 Mar 2020 that a lockdown would be imposed entailing severe restrictions including that no one should leave their place of residence except for some specific reasons (UK Government, 2020a). Over the next 5 months this legislation was revised approximately monthly, sequentially expanding the permissible reasons to leave one's place of residence, whilst introducing guidance on COVID-secure workplaces (Department for Business, Energy & Industrial Strategy and Department for Digital, Culture, Media & Sport, 2020) and mandating face coverings in some public spaces (UK Government, 2020b). Publicly available data indicate the extent to which public behaviour altered in this period. Google mobility data, for example, indicate a relative change, with respect to a normal baseline, in activity in various categories across the UK (Google, 2020), and provide a proxy for changes in mixing intensity in England & Wales over this period. Fig. 1A shows the mean of 'workplace' and 'transit' categories computed daily from the UK Google mobility data, which motivates later modelling of a change in population mixing intensity with a step function to account for the introduction of NPIs.

Clinical data on individuals' response to SARS-CoV-2 infection
The models in Section 2.2 involve epidemiological parameters drawn from the clinical literature that characterise individuals' infectiousness and mortality. These are summarised in Table 1, including 'default' values that are used in the later simulation study, and priors that are used for the Bayesian inference.
For most hosts infected with SARS-CoV-2, detectable viral load in the upper respiratory tract peaks at around 5-6 days following exposure . For many hosts this approximately aligns with the delay from exposure to experiencing symptoms (the infection-toonset period) (Backer et al., 2020), and thus there is a significant period of infectiousness prior to symptom onset Ashcroft et al., 2020). Some hosts remain asymptomatic throughout infection and experience similar viral load dynamics, besides somewhat faster viral clearance following peak viral load (Kissler et al., 2020). For the purposes of developing a transmission model, it is therefore important to describe a typical host infectiousness profile, , subject to ∑ ∞ =1 = 1, as depending on the time in days, , since infection (rather than with respect to symptom onset).
The generation time interval, defined as the period from host infection to the generation of progeny infection, has a distribution that describes such an infectiousness profile (Lehtinen et al., 2021). We use the inferred gamma-distributed generation time from the Singapore cluster in Ganyani et al. (2020), with mean 5.2 days and variance 2.96 days, to define a fixed infectiousness profile when inferring the impact of lockdown in Section 3.4. Since the models in this paper are in discrete time, we discretise the gamma distribution by taking is the gamma cumulative distribution function. Estimates from other international data sets available in 2020 are congruent with the estimate from Ganyani et al. (2020) (see, e.g. Sun et al., 2020;Ferretti et al., 2020). Estimates for the generation interval for the UK during 2020 became available in 2021: Challen et al. fit parametric distributions to infector-infectee data for the First Few Hundred cases monitored by PHE (Challen et al., 2021), while Hart et al. also fit a mechanistic model to household infector-infectee data that accounts for the role of symptom onset on infectiousness (Hart et al., 2021(Hart et al., , 2022. Resulting point estimates for the realised mean generation interval (accounting for behaviour and susceptible depletion Fraser, 2007) range from 3.6-4.9 days. To account for potential variability due to alternate estimates for the latent period, the role of an infected's behaviour -in particular the efficacy and timing of isolation -in influencing the generation interval, and the potential for unrepresentative sampling of infectorinfectee pairs, we draw on the range and reported confidence across early estimates to construct priors on the mean and variance of a gamma distributed infectiousness profile (see Table 1).
We refer to as the infectiousness profile rather than the generation time distribution, as the latter has a small correction due to any overlap in the distributions for time from infection to death and infectiousness by infected-age due to the non-zero hazard of death while infectious (see Section 2.2.5). Fig. 1B shows the 'default' infectiousness profile.
To infer the epidemic dynamics based on the death data, it is key to characterise accurately the infection-to-death distribution. We denote the probability mass function of the infection-to-death distribution by = 1}, such that is the proportion of individuals that die days after infection, conditional on ultimately dying from SARS-CoV-2. The infection-to-death distribution is the convolution of infection-to-onset and onset-to-death distributions, for which separately there are data.  (3), for , which describes changes in population mixing intensity. The data points are given by the mean of workplace and transit activity levels for the UK, with Saturday and Sunday removed (the four outliers being bank holidays). The blue step is given by Eq. (3) and contains two parameters which control the timing and severity of the drop in viral transmission due to lockdown measures: * and , respectively. Plots (B) and (C) respectively show the infectiousness profile, over a 15 day period, and the infection-to-death distribution, shown over a 50 day period; these correspond to the distributions and default parameters listed in Table 1. The infection-to-onset period is estimated from cases with well constrained exposure history (e.g. Backer et al., 2020). For consistency with our choice of infectiousness profile, we adopt the infection-toonset period used by Ganyani et al. (2020) when inferring the generation interval distribution above, namely a gamma distribution with mean 5.2 days and standard deviation 2.8 days. For onset to death, early data from Hubei, China, corrected for epidemic growth, suggested an average time 18 days and standard deviation 8.4 days (Verity et al., 2020). A report published by the UK government provides distributions for the time from symptom onset to death for over 22,000 patients hospitalised in the UK during the first epidemic wave (up to August 1st 2020), by age range and sex . The aggregate distribution is described by a gamma distribution with mean 14 days (std. dev. 9.8 days), and is stable when re-weighting the age-and sexdependent distributions to match those of reported deaths in England and Wales over the same period (ONS, 2020a), giving some confidence that this distribution is representative for the population we are modelling over the period of study. Similar estimates for the death delay are reported by Sherratt et al. (2021) based on confidential UK data.
We model the infection-to-death distribution as a negative binomial distribution, chosen for an appropriate shape, computing point estimates for its parameters by matching moments to the convolution of the Ganyani et al. (2020) infection-to-onset and Harrison et al. (2020) onset-to-death distributions described above. This leads to the parameters shown in Table 1 and the distribution plotted in Fig. 1C. There remains uncertainty, however, in the infection-to-death distribution owing to uncertainty in the infection-to-onset period (see Backer et al., 2020), the censoring effect of unknown symptom onset dates in the hospitalisation data , and regional variability (about which we lack data) that may influence the effective average time to death profiles (e.g. Hawryluk et al., 2020). We hence characterise uncertainty in the infection-to-death distribution via priors on the mean and dispersion parameters. We choose priors, shown in Table 1, such that infection-to-death distributions with high prior probability are consistent with the distributions estimated by Harrison et al. (2020) and Verity et al. (2020) time-to-death distributions.
A further key parameter for inferring epidemic dynamics from death data is the infection fatality rate (IFR), , which is the risk of death for an infected host, neglecting other host covariates. Seroprevalence of antibody to SARS-CoV-2 -together with evidence regarding the preservation of measurable antibody (Huang and Garcia-Carreras, 2020) -provides an estimate of the integrated exposure history to SARS-CoV-2, and enables estimation of the . In the results sections we explore the impact of different assumptions about : in Section 3.1 we let be a free parameter; in Section 3.2 we fix its value to the central population-weighted estimate for England from (O'Driscoll et al., 2020); and then in Sections 3.3 and 3.4 we try to characterise knowledge about with a judicious prior (Table 1) that reflects uncertainty Table 1 Summary of key epidemiological parameters drawn from clinical literature, as described in Section 2.1.3, including default values used later to generate synthetic data (Section 3.2), and priors used in the Bayesian inference (Section 3.4). The default values correspond with the plots of the infectiousness profile and infection-to-death distribution shown in Fig. 1B  and bias that may arise for various reasons, such as non-representative serological surveys, non-uniform prevalence in different risk groups (e.g. in care homes versus the surrounding community) and waning antibody levels (O'Driscoll et al., 2020).

Epidemic models
In the following we present several simple deterministic epidemic models. The first, called , is the central one in this paper and introduced with the goal of being as simple as possible whilst retaining structure by infected age. The subsequent , and 2 2 2 models are simple models that are not directly structured by infected age. Fig. 2 shows schematics of the various models.

A model structured by infected age:
We denote by , the number of individuals who on day became infected > 0 days ago; and the total number of infected individuals on day is thus = ∑ ∞ =1 , . In this model, between day and + 1 the number of new infections is +1,1 = ∑ ∞

=1
, , for suitable that in general depends on time (to reflect changing transmission owing to changes in population mixing intensity, e.g., due to NPIs) and infected age (to reflect non-constant infectiousness of those infected); and the number of new deaths amongst individuals with infected age is ℎ , , for suitable ℎ . The general infected-age-structured model is therefore for ∈ {0, 1, 2, 3 …} and ∈ {1, 2, 3, …}.  , as defined in Section 2.2. In (B) the compartment indicates deaths that will occur after an additional fixed delay . The model is the special case of the model with = 0. Model (D) includes two compartments for each of the exposed ( 1 , 2 ), infectious ( 1 , 2 ), and non-infectious ( 1 , 2 ) states.
In this model, in contrast to some common epidemic models including the non-infected-age-structured ones described later in this section, the denotes individuals who have been ''infected'' but are not necessarily ''infectious'', since individuals of infected age contribute to infecting susceptibles if and only if > 0. This removes the need for an variable for ''exposed but pre-infectious'' individuals, or an variable for ''recovered'' individuals.
The degrees of freedom in the model need to be controlled via some further modelling choices for the { } and {ℎ }. We will write Here, the infectiousness profile , defined in Section 2.1.3, is a probability mass function where ∑ ∞ =1 = 1, and models the population mixing intensity relative to pre-epidemic social behaviour and is subject to the constraint 0 = 1. In practice is a composite parameter capturing contact rates, social distancing (including mask wearing) and mobility. A simple model for the impact of introducing NPIs is to enable a sharp reduction in at some change-point time * . Enabling a continuous * helps in Markov Chain Monte Carlo procedures described later, hence we model to transition from 1 to a new baseline at change-point time * via in which ⌊⋅⌋, ⌈⋅⌉ and 1 (⋅) respectively denote the standard floor, ceiling and indicator functions. An example of { } is shown in Fig. 1A.
In view of the constraints on { } and { }, we include > 0 in (2) as a scale factor for the infectious pressure. Parameter = + + is the population size, which in this model is constant, because births and non-SARS-CoV-2 death are negligible on the time scales of interest and are thus neglected.
In the language of survival analysis, the {ℎ ∶ ≥ 1} is the hazard rate function for the death of infecteds, such that ℎ is the probability that an individual having survived − 1 days post-infection will die on day . In terms of the infection-to-death distribution, { }, and its corresponding cumulative distribution function, where is the infection fatality rate. Derivation of this expression is in Appendix A.

2.2.2.
And models A simple and commonly used model (e.g. Britton, 2020;Lourenço et al., 2020) that does not maintain structure by infected age is as follows.
In this model (and those below), the number of new daily infecteds is and = + + + . There is a close parallel to (1) but with key differences: infectiousness of infected individuals is constant with respect to infected age ( is taken to equal 1), and the hazard of removal from being infected is also constant (ℎ is taken to equal some constant ). This model includes an variable, because the assumption of constant infectiousness of individuals necessitates a way other than death for an infected individual to cease being infectious. By analogy to (1), the constant hazard implies that the duration in infected state is geometrically distributed.
The model is a generalisation of the model that replaces Eq. (5d) with which introduces a non-negative integer ''delay-to-death'' parameter, . Introducing a fixed delay in this way is a common modelling strategy to make infection-to-recovery and infection-to-death distributions distinct (e.g. Lourenço et al., 2020) and the latter non-exponential. The basic model is the special case with delay = 0. A generalisation of (7), in which is not necessarily an integer, is For example, if = 7.2 then 80% of the deaths will have a delay of 7 days and 20% will have a delay of 8 days. D.G. Whittaker et al. 2.2.3. An model A different strategy besides incorporating a delay is to incorporate additional model states (see e.g. Royal Society SET-C, 2020). A model variation in this spirit is to incorporate a non-infectious state that follows infection but precedes recovery or death, i.e., in which is a rate constant.

An
2 2 2 model The same strategy can be extended by including more states, for example an exposed (infected but not yet infectious) state, and by representing states using multiple compartments. Such an approach can enable the model dynamics to mimic the delay to peak infectiousness, and the delay between infection and death, and hence indirectly models infected ages (Hurtado and Kirosingh, 2019). The following model, chosen because it closely matches the approach of Kucharski et al. (2020), uses two compartments for each of the , and states. (1) (2) (1) (2) (1) (2) in which is a rate constant.

Connection to epidemiological parameters
The foregoing models connect directly to some key epidemiological parameters. A parameter important for characterising whether the epidemic is growing or declining is the time-varying reproduction number,  , of which there are multiple variations. We adopt for  in this paper the instantaneous reproduction number (Fraser, 2007), which is the average number of secondary infecteds generated by a single infected assuming there are no population level changes in susceptibility or mixing behaviour during their infection. We adopt the 'survival function' approach to calculating  ; performing the sum over infected age of the number of new infecteds generated by a host at infected age weighted by the proportion of hosts still infectious at infected age (Heffernan et al., 2005). For the model: where is the survival function of an infected individual associated with the hazard ℎ of removal from the , defined as the probability of not having been removed by day after infection, which equals = 1 − −1 . The product is the proportion of the  secondary infections that are generated at an infected age , and therefore when normalised is the (discretised) generation time distribution.
For the , , and 2 2 2 models, we compute  from the same definition, noting that infecteds must transit through before reaching ( = 1). Given residence in an infectious state, the number of new infecteds generated per infected ( ) is independent of infected-age. The infectiousness profile is = ∕ ∑ ( ), in which is the proportion of infecteds in an infectious state at infectedage , which can be derived by considering the disease progression defined in Eqs. (5), (8) or (9) as a Markov process (Appendix B). Hence depends on the number of and states as well as the parameters governing the total time in the latent ( ) and infected classes ( ). However, by construction, the average residence time in an infectious state is −1 for each of these models (Appendix B). Hence Estimates of  can be used to infer the benefit of mitigating measures such as NPIs for reducing transmission. Of strategic interest is to infer argmin { < 1}, as this indicates the time when NPIs were sufficient to put the epidemic into decline. Also related and important is the peak daily incidence, max { − +1 }, in which for each model − +1 is the number of new infecteds on day . The basic reproduction number, written  0 , is the special case for infection seeded into a fully susceptible population ( ≈ ) and prior to any mitigation ( = 1).
By design, the model explicitly incorporates the infectiousness profile, { }, and the infection-to-death distribution { }, such that these can be chosen according to clinical data. For the other models in which these are not explicit, it is helpful to understand what are the implied { } and { }. Calculation of these is in Appendix B.

Model initial conditions
In this paper, indexes the number of days since 15 Feb 2020, which is day = 0. We assume initially zero deaths, 0 = 0, and zero recovereds, 0 = 0 (for models including ) and 0 = − 0 for parameter 0 which is to be inferred. For the infected-age-structured model, it is necessary to specify how the 0 initial infecteds are distributed by infected age { 0, }. When the epidemic is growing exponentially the distribution of infecteds by infected age converges to an equilibrium distribution (see Appendix C), thus we assume convergence to this equilibrium distribution in the dynamics prior to day = 0, and for the numerical calculations in this paper we use the equilibrium distribution as the initial condition at = 0.

Observation model
The epidemic models above are deterministic. It is common to account for variability in the observed number of daily deaths, obs , on day , in mechanistic models of infectious disease transmission via a negative binomial observation model (e.g. Mathews et al., 2007;Cauchemez and Ferguson, 2008). The negative binomial model admits overdispersion, which is often present in count data on cases or deaths from an infectious disease owing to spatial and demographic heterogeneities, or other unmodelled processes (Held et al., 2019). We adopt the negative binomial model, assuming that independently for each , where NB( , ) denotes the negative binomial distribution with mean and dispersion parameter defined such that the variance equals + 2 (Robinson and Smyth, 2008).

Inference methods
We denote by * the free parameters that appear in the respective dynamical models, such that = ( * ), and by = ( * , ) the vector of all the parameters including in the observation model (12). The data  = { obs } are the daily deaths indexed by time . We denote by (| ) the likelihood function for under observation model (12). When adopting a Bayesian approach and specifying a prior distribution ( ) on , the posterior distribution for is then In the results sections below, where we compute point estimates of we do so using the maximum a posteriori (MAP) estimator then the inferred dynamics, and the epidemiological parameters inferred from them (defined in Section 2.2.5), are based on solutions of the respective epidemic model with * =̂ * MAP . The MAP estimator corresponds to the widely used maximum-likelihood estimator (MLE) MLE = argmax (| ) if the priors are uninformative, or more generally if ( ) is constant on a domain that containŝM LE and zero elsewhere. Elsewhere we target the full posterior distribution (13) by sampling from it using Markov chain Monte Carlo (MCMC). We used an adaptive Metropolis-Hastings MCMC algorithm (Haario et al., 2001) which involves, after a warm-up phase, adapting the covariance matrix of a multi-variate Gaussian proposal distribution according to the covariance of the accepted samples. Convergence to the stationary distribution is hastened with an initial maximisation of the posterior density with respect to using a covariance matrix adaptation evolution strategy (CMA-ES) (Hansen et al., 2003). These methods were implemented using the PINTS python package (Clerx et al., 2019).
For priors on , those arising from the clinical data are detailed in Table 1 and Section 2.1.3, in addition to which we specify: ∼ (1, 10) and 0 ∼ (1, 5 × 10 7 ) consistent with wide range of possible values for  0 and number of initial infecteds; * ∼  (31, = 3), where = 31 corresponds to where the Google mobility data, shown in Fig. 1A, first suggest a substantial reduction in mobility; and ∼ (0, 1) and ∼ (0, 1) such that both are uniform on their possible ranges of values. For the -type models, transition rates in Eqs. (5), (8) and (9) have upper limits such that, for example, no more than the entirety of a compartment can transition out in a single time step. Consequently, we take ∼ (0, 1) for the , & models, ∼ (0, 1) for the model and , , ∼ (0, 0.5) for the 2 2 2 model. Due to the additional scaling by 1∕ in the relationship between  and for SIR-type models (equation (11)) plausible values for are contracted and we use ∼ (0, 3).

A change in mixing intensity, and not 'herd immunity', is necessary to explain the epidemic dynamics
A theory after decline from the initial epidemic peak was that the epidemic dynamics were affected little by NPIs and could be explained by 'herd immunity' (Lourenço et al., 2020), that is, that  in (10) had reduced to below 1 because the number of remaining susceptibles, , had sufficiently decreased to curtail the growth. The theory that depletion of susceptibles is adequate to explain the dynamics can be tested by fitting both models to the data; the 'herd immunity' theory corresponds to a special case of the general model that has a step change in but with the restriction of having constant = 1, achieved in the model by setting = 1. This supposes no effect from NPIs, and thus that the decline in the epidemic must be on account of the depletion of susceptibles. In this restricted model the parameter * is redundant, therefore the restricted model has two fewer parameters than the unrestricted model. To consider this, we use the model, with the infectiousness profile and infection-to-death distribution parameters fixed to the values shown in Table 1, and noninformative priors on , 0 , * , , , and , then fit the model to the England and Wales deaths data described in Section 2.1.1. We then do likewise for the restricted model, with the extra restriction that = 1 and with * removed.
The fitted models are shown in Fig. 3, from which it is clear that the full model matches well to the data but the restricted model matches very poorly. The difference in the value of the maximised values of the log-posterior, log ( |), for the two models is 260, which for models differing, as here, by two degrees of freedom is overwhelming evidence that the restricted model is inadequate.
Some values of epidemiological parameters in the fitted restricted model also seem implausible; for example, the fitted 0 is greater than 1 million people, and the IFR is ∼4 times smaller than the best estimate from O'Driscoll et al. (2020) -see Fig. 3 for all of the inferred parameter values.
The values of the maximised likelihoods, and visual inspection of the fits, make clear that a change in transmission dynamics over time, via in the model, is necessary to explain the epidemic dynamics during the first outbreak. In other words, it was never plausible that 'herd immunity' was responsible for the end of the first outbreak, as confirmed by the subsequent resurgence of infections in autumn of 2020. We hence continue to use the step-change model of mixing intensity (3) in the following sections.

Inference for epidemiological parameters is unreliable if based on non infected-age structured models
To investigate the impact of model error on the values and uncertainties of inferred parameters -and in particular whether simpler epidemic models such as , D , and 2 2 2 can be used to infer accurately the epidemiological parameterswe generate synthetic data from the time-structured t model ( Section 2.1.1) and observation model (12), with the model parameters as specified in Table 2, then fit the various models to compare the inferred parameter values with the true ones. We include the datagenerating t in the set of models being fitted, primarily to check that the model parameters are indeed identifiable from the deaths data, but emphasise its advantage in being the data-generating model; the D.G. Whittaker et al.

Table 2
Details of the simulation study described in Section 3.2, including parameter values, in addition to those in Table 1, used to create synthetic data from the model; and MAP estimates of the model parameters and derived epidemiological parameters from fitting the various models to the synthetic data. The (-) indicate parameters not relevant to the particular model. The derived parameters are: the basic reproduction number  0 ; the minimum value of , min { }; the index of the day at which  first decreases below 1, argmin { < 1}; the maximum number of new infections on any day, max { − +1 }; the mean, , of the infectiousness profile; and the mean of the infection-to-death distribution, . The (A) and (B) tables differ in that (B) involves extra parameters being fixed when the models are fitted, as described in Section 3. results in this section are hence not suited to advocating for t over the other models, but rather for understanding how precisely understood mis-specifications of the other models lead to errors in epidemiological quantities inferred from them. When fitting the models, we fix the IFR, , to the default value from Table 1 and treat the other parameters as free. Hence the free parameters include those that determine the implied infectiousness profile and the infection-to-death distributions. The data-generating and fitted parameters values are summarised in Table 2A, and results are plotted in Fig. 4. Fig. 4A shows that each fitted model matches broadly well the deaths data. The and D models provide similar and relatively reasonable fits to the synthetic death data, but the peak daily deaths in these model fits occurs almost one week before the true peak.
Each of the simpler models -, , -substantially underestimate  0 compared to the true value, in each case at least by a factor of 1.6, but estimates from 2 2 2 are vastly too large and with very high posterior variance; see Fig. 4B. The latter also had the largest number of free parameters, so we also considered the case with fixed to the value 1∕5.2, which is an appropriate choice for this parameter in the sense that it fixes the model's onset-to-infection interval to match with the mean we assumed when setting the default infection-to-death distribution (see Section 2.1.3). Even with this parameter value fixed, there is little improvement in the inferred values for  0 . During the epidemic decline after introduction of NPIs, the misspecification of the model is less impactful on the inferred values for  , but the inferred time, * , at which NPIs are inferred to impact mixing intensity is highly variable between models: for example it is inferred around 14 days too late for the model, 7 days early for D and 9 days late for ( Table 2). Compared to the data-generating model, the inferred peak of new infections, max ,1 , and the inferred timing of this peak, argmax ,1 , are also very inaccurate. For and D models the peak is approximately 36% too low and the timing is respectively 14 days late and 2 days early. For the SIURD model the inferred peak is approximately 43% too high and 9 days too late. Because here we are fixing the IFR and are fitting to death data, all model versions return approximately the same remnant susceptible pool ( =end ∕ ≈ 0.88) by design. This means that from Eq. (10) if mixing intensity returns to pre-epidemic levels following the first wave ( = 1), and all other parameters are fixed, then  ≈ 0.88 0 . This provides an upper bound on the median  for a subsequent unmitigated outbreak equal to 2.8 in this simulated example. Because the simple SIR-like models underestimate  0 , they also substantially underestimate  for a subsequent unmitigated outbreak, suggesting an upper bound of the median  ≈ 1.4 ( , ) or 1.1 ( ) which are highly misleading. In contrast, estimates of the median unmitigated  for the 2 2 2 models are vastly inflated. In the context of influencing policy-making, the errors arising from the simple  SIR-like models are unacceptably large. Fig. 4c shows that the implied infectiousness profile and infection-to-death distributions for the fitted models were all quite different to the true ones.
In the preceding investigation with synthetic data, the datagenerating model was at an advantage over the other models through having its infectiousness profile and infection-to-death distribution specified correctly, and without free parameters related to these that needed estimating. To what extent did the other models perform poorly on account of their extra parameters, rather than their model mis-specification? To investigate, we fixed the values of those parameters to ''optimal'' values in the sense that the moments of their implied infectiousness profile and infection-to-death distribution matched as closely as possible to the true ones, and with the consequence that the number of free parameters is then the same for each model, including for the data-generating model. The parameters are summarised in Table 2B and explained as follows.   Tables 1 and 2; and dynamics of fitted  ,  , , and 2 2 2 models for the parameters indicated in Table 2B; this version has additional fixed parameters, whose values are chosen a priori such that for each model the infectiousness profile and infection-to-death distribution, shown in (C), matches as closely as possible to the data-generating ones. (B) Violin plots of the derived epidemiological parameters uncertainties for each model and the synthetic data true values (blue dotted lines).
To match the mean of the infectiousness profile to the true one (recalling that the relationship between this profile and the model parameters for the various models is explained in Appendix B), for , and models we take the infectivity rate = 1∕5.2 where, per model there are many choices for and that yield an implied infectiousness profile (Appendix B) with appropriate mean, however in general the variance is higher than that for the clinical infectiousness profile (Section 2.1.3). We therefore fix = , which minimises the variance in the implied infectiousness profile. We then select fixed values for and to match the means of the clinical infectiousness and time-to-death profiles to those of the implied profiles.
The results of this second fitting, with extra fixed parameters, are shown in and the results are shown in Table 2B and Fig. 5. For 2 2 2 in particular the results are much improved. Fig. 5B shows that the derived epidemiological parameters are much more reliably inferred, although  0 is still slightly underestimated and argmin { < 1} overestimated. The better performance is because the inclusion of the state in this model, with the duplicated compartments for , and , enables the fitted infectiousness profile and infection-to-death distribution to match reasonably closely to the true ones, as shown in Fig. 5C. For each of the other models this is not the case, there is a much larger mismatch, as shown in Fig. 5C, and a consequence is that the values of inferred epidemiological parameters, particularly  0 , remain highly unreliable.
In summary, a model such as 2 2 2 could plausibly lead to reasonable inference for epidemiological parameters, provided the number of pre-infectious, infectious and post-infectious compartments is judiciously chosen such that the implied infectiousness profile and infection-to-death distribution can match suitably closely the clinical data; though if, as in this paper, inference is based on deaths data (Section 2.1.3) then it appears important that the parameters of such a model are estimated a priori from the clinical data.
The results in Table 2 & Figs. 4 and 5 are generated from a single synthetic data set; however, the results are representative of the large number repetitions performed. Corresponding results from fitting these models to the real data are in the Supplementary Material (see Table  S1 and Figures S1 & S2). For the real data we have no true parameters with which to compare the inferred parameters for the various models, but we do observe analogous disagreement in the inferred parameters, which we attribute to mis-specification of the infectiousness and infection-to-death distributions in the simpler SIR models.

Bayesian inference leads to small posterior uncertainty when conditioning on the infectiousness profile and infection-to-death distribution
In this section, we adopt a Bayesian approach to characterise the uncertainty in the inferred parameters and predictions. We fixed the infectiousness profile and infection-to-death distribution to their default values from Table 1. The priors for the other parameters are as specified in Section 2.4, and results for fitting the model are shown in Fig. 6. Fig. 6A shows bi-and uni-variate marginal posterior distributions for the model parameters; these suggest relatively small posterior variance, and small posterior covariance except between and ; and 0 and . Fig. 6B shows a good match between the deaths data and the model dynamics based on the maximum a-posteriori (MAP) model parameter values. Fig. 6C shows corresponding inferred new infections, which peak in mid-March, owing to the inferred sharp change in , then decline.
Posterior distributions for derived epidemiological parameters are shown in Fig. 6E. The posterior for  0 is concentrated in the region ∼3 to 3.5; min { } is concentrated around 0.795; almost all the posterior mass for argmin { < 1} is on = 32, which is 18th March; and for max { ,1 } there is somewhat more posterior uncertainty, with values between ∼220,000-400,000 plausible, and the posterior mode at ∼280,000.
The change in matches quite closely to the Google mobility data, both in the timing, * , and the extent of the drop, , as shown in Fig. 6D. These numbers are consistent with a reduction in mixing throughout March (before the official lockdown was mandated) due to voluntary changes in mixing intensity, perhaps driven by media coverage and government announcements prior to lockdown. In other words, the results suggest the Google mobility data may accurately reflect the timing and drop in population mixing around the period a lockdown was mandated.
Our inferred values of  0 are slightly higher than estimates based on an exponential growth model fitted to the death data (2.6 (2.4-2.9), Lonergan and Chalmers (2020)), but consistent with other estimates for the UK (Royal Society SET-C, 2020). Estimates of  after lockdown are consistent with those from exponential decay models fitted to death data (Lonergan and Chalmers, 2020), and real-time estimates based on reported cases in the period April-June 2020 (see Royal Society SET-C (2020) and references therein) or surveyed contact rates in late March 2020 (Jarvis et al., 2020).

Incorporating uncertainty in the infectiousness profile and infection-todeath distribution leads to misleading conclusions owing to model error
Here we repeat the analysis of the previous section, except this time incorporating uncertainty on the four parameters -, 2 , , -describing the infectiousness and infection-to-death distributions. The priors on these parameters are as described in Section 2.1.3 and summarised in Table 1. Results analogous to Fig. 6 are shown in Fig. 7. These results show that posterior variances are somewhat larger but not substantially so, although MAP estimates are considerably different.
The MAP parameter gives visually a similarly good fit between model and observed data (Fig. 7B), predicting daily new infections to peak in early-to mid-March, after which a sharp drop in new infections is observed (Fig. 7C). The inferred mixing intensity in this case still agrees reasonably with the UK Google mobility data (Fig. 7D), but is predicted to drop earlier and further than was inferred in Section 3.3, when the values of those four parameters were instead conditioned upon.
The posteriors for derived parameters show  0 is distributed largely between 5 and 7 and peaking at ∼ 6; min { } is concentrated around 0.72, argmin { < 1} is some time between 11th and 17th March, with posterior peaking on 14th March, and max { ,1 } somewhere between ∼250,000-500,000, peaking at ∼350,000 (Fig. 7E). These posterior distributions are somewhat in conflict with some informative priors (the influence of the priors can be seen in Supplementary Figure  S3). For example, the peak infectiousness occurs ∼ 3 days later than estimated by Ganyani et al. (2020), and an  0 of around 6 is higher than most very early estimates of  0 (Park et al., 2020). However, reproduction numbers of around 6 have been reported for other European countries (Billah et al., 2020).
A possible explanation for the discrepancy in results between this section and Section 3.3, and with estimates from clinical data such as Ganyani et al. (2020), is that it is a consequence of model error.
To investigate this, we repeated the Bayesian inference of the present section and of Section 3.3 for the synthetic data shown in Fig. 4. Doing so leads to posterior distributions that are consistent with the datagenerating parameters, and consistent with each other, with larger variance under the approach of this section in which there is prior uncertainty on the four additional parameters. Results for these cases are in the Supplementary Material ( Supplementary Figures S4 and S5). In other words, the results of the Bayesian inference are as we might expect when the data arise from the assumed model, supporting the possibility that the unexpected results for the real data are on account of model error.
If so, what are the possible sources of model error? The negative binomial observation model heavily penalises discrepancies between modelled and observed deaths when the number of daily deaths is small, especially when the over-dispersion parameter, is small. The MAP estimate of is indeed small -around 10 −3 -especially so with the extra four degrees of freedom considered in this section, which enable the deterministic part of the model to explain more of the variability in the data. A consequence is that the inference is dominated by the model having to match well the data in the period when the number of daily deaths is small but growing rapidly. This is the period when the deterministic component of the model, which assumes a homogeneously mixing population and neglects stochastic D.G. Whittaker et al.  effects in the epidemic process, describes the dynamics least well. In summary: enabling the extra four degrees of freedom in this section appears to lead to an overfitted model, which leads to underestimation of the overdispersion parameter (and variance). The posterior consequently has low variance, but with inference dominated by the dynamics in a period when the model is likely to be least accurate, and the low-variance posterior is located in a region of parameter space that suggests model error is making the results unreliable.
The Supplementary Material contains results of some further exploration of this issue, including where we have fixed at two larger values, 0.01 and 0.1, (Supplementary Figures S6 and S7), which inflates posterior variance and shifts the MAP estimates to more plausible values, e.g. 4.5 for  0 ; and a case with the Negative Binomial observation model replaced by a Gaussian model (Supplementary Figure S8), such that the inference is not dominated by the early period with low prevalence, and for this case again the MAP estimates of parameters are more in accordance with other sources and with the results of Section 3.3.

Summary and discussion
We have explored various simple epidemic models, particularly investigating the importance of accurately characterising the infectiousness profile and the infection-to-death time distribution. These distributions are well characterised in the clinical literature and easy to incorporate into an SIR-type model structured by infected age. The results show that it is essential to incorporate them in models used for inferring the underlying epidemic dynamics from death data. Basic D.G. Whittaker et al. Fig. 7. Results from the Bayesian inference described in Section 3.4. The interpretation of (A-E) are as for Fig. 6. These results are analogous to those in Fig. 6, except here the analysis involved placing priors on -rather than conditioning on -the parameters , 2 , , of the infectiousness profile and infection-to-death distribution.
SIR models implicitly mis-specify the infectiousness and infection-todeath distributions and lead to highly misleading inferences about the impact of NPIs, the time and magnitude of peak infections, and the basic reproduction number.
In the particular context of the spring 2020 SARS-CoV-2 outbreak in England and Wales, we used the infected-age-structured model, , to compare the hypothesis that the epidemic decline was on account of NPIs versus the hypothesis, which was entertained at the time, that the decline was owing to 'herd immunity'. When using distributions for the infectious profile and infection-to-death distribution that are drawn from clinical literature, but fitting all the other model parameters, we found very strong evidence in favour of efficacious NPIs, rather than herd immunity, as the explanation for the epidemic decline (Fig. 3). This conclusion was in spite of accommodating the possibility of an implausibly low IFR, , and high initial number infected, 0 , both of which are to the benefit of the herd immunity hypothesis. This indicates that the death data, combined with infectiousness profile and infectionto-death distributions available from clinical data, enable dismissing of the 'herd immunity' hypothesis, without needing to know the IFR very accurately.
The limitations of very basic SIR-type models in mis-specifying the infectiousness profile and the infection-to-death has been pointed out by Keeling and Rohani (2008) and Wearing et al. (2005). Our work extends this to consider the impact of mis-specification when the model is used to infer epidemiological parameters and dynamics from data, and particularly when there is an abrupt change in population mixing intensity. Our findings show major errors in the inferred dynamicsfor example, the timing of changes in population mixing can be wrong by weeks -using basic SIR-type models, suggesting that untangling the impact of NPIs adopted in quick succession (e.g. Dehning et al., 2020) may be especially error-prone if using a simple SIR-like model. Models such as 2 2 2 that incorporate multiple infected states (in this case pre-infectious, infectious, post-infectious) and multiple sub-compartments per state can potentially perform well if the number of states, the number of sub-compartments, and the values of rate parameters are suitably chosen (see also Hurtado and Kirosingh, 2019;İşlier et al., 2020), but in our view this entails more complexity and no advantage compared using the model. Some models for SARS-CoV-2 transmission were calibrated using multiple data streams, particularly in the UK where data on cases, hospitalisations, prevalence and seroprevalence were also available (Royal Society SET-C, 2020). Doing so diminishes the influence of the deaths data, and hence diminishes the influence of how the infection-to-death distribution is specified, on the inferred quantities. However, other data streams bring their own modelling challenges -for example, how to connect the underlying epidemic model to data on cases when testing capacity is severely limited but rapidly growing -and entail added complexity that we have sought to avoid by basing inference upon the deaths data.
Our estimates for  0 are significantly larger when we take a fully Bayesian approach and fit parameters describing the infectiousness profile and infection-to-death distribution. Much of the variation between early estimates from case data has been attributed to assumptions for the generation time distribution (Park et al., 2020), which is closely linked to the infectiousness profile. The need to quantify uncertainty in estimates of  0 for use in policy-making has been highlighted, for example by Royal Society SET-C (2020). Uncertainty analysis of a spatial agent-based model suggests that parameters that largely determine the infectiousness profile (exposure to onset delays, onset to isolation delays) remain significant sources of overall uncertainty (Edeling et al., 2021). We found that incorporating uncertainty by placing priors with relatively large variance on the four parameters characterising the infectiousness profile and infection-to-death distribution -as opposed to conditioning on them -yielded low-variance posteriors concentrated in implausible regions of parameter space, contrary to our initial expectation that the additional prior uncertainty would largely just inflate the posterior variance. We understand this to be the impact of model error, with the negative binomial observation model leading to inference being dominated by a period in which the deterministic dynamics are least reliable. A tendency of deterministic models to yield overconfident and error-prone estimates of  from early incidence data has also been documented (King et al., 2015).
We have argued in favour of an SI D model structured by infected age over a basic SIR model, especially when the goal is inference from death data, but this model is itself only a coarse populationlevel model with many limitations. The model assumes homogeneous mixing, and neglects stochastic effects which might be appreciable when prevalence is low. To keep the model simple, we accommodated only a step change in the population mixing intensity, . For the observation model for deaths, we assumed a negative binomial model and conditional independence between different days, which are strong parametric assumptions upon which the inference can be sensitive. We did not attempt to model subtleties such as time-dependence in the IFR due to, for e.g., changes in hospital fatality depending on case loads within hospitals , or the time-dependence of generation intervals (Hart et al., 2022). Hence there are many possible sources of model error. In the context of inference with this model, we have found that affording too much freedom through high prior variance on parameters enables overfitting of the deterministic part and amplifies the impact of model error. Erring on the side of choosing priors with high variance is therefore not necessarily a conservative strategy, at least when model error is non-negligible.

Software availability
Open source software to reproduce the results in this paper is available at https://github.com/DGWhittaker/nottingham_covid_modelling.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Dominic Whittaker is now an employee of GlaxoSmithKline Plc.

Appendix C. Pseudo-steady distribution of infecteds by infected age
During a phase of epidemic growth with ≈ and = constant with respect to , the distribution of s by infected age reaches a dynamic equilibrium,