Dynamic causal modelling of COVID-19

This technical report describes a dynamic causal model of the spread of coronavirus through a population. The model is based upon ensemble or population dynamics that generate outcomes, like new cases and deaths over time. The purpose of this model is to quantify the uncertainty that attends predictions of relevant outcomes. By assuming suitable conditional dependencies, one can model the effects of interventions (e.g., social distancing) and differences among populations (e.g., herd immunity) to predict what might happen in different circumstances. Technically, this model leverages state-of-the-art variational (Bayesian) model inversion and comparison procedures, originally developed to characterise the responses of neuronal ensembles to perturbations. Here, this modelling is applied to epidemiological populations—to illustrate the kind of inferences that are supported and how the model per se can be optimised given timeseries data. Although the purpose of this paper is to describe a modelling protocol, the results illustrate some interesting perspectives on the current pandemic; for example, the nonlinear effects of herd immunity that speak to a self-organised mitigation process.


Introduction
The purpose of this paper is to show how dynamic causal modelling can be used to make predictions-and test hypotheses-about the ongoing coronavirus pandemic (Huang et al., 2020;Wu et al., 2020;Zhu et al., 2020). It should be read as a technical report 1 , written for people who want to understand what this kind of modelling has to offer (or just build an intuition about modelling pandemics). It contains a sufficient level of technical detail to implement the model using MATLAB (or its open source version Octave), while explaining things heuristically for non-technical readers. The examples in this report are used to showcase the procedures and subsequent inferences that can be drawn. Having said this, there are some quantitative results that will be of general interest. These results are entirely conditional upon the model used.
Dynamic causal modelling (DCM) refers to the characterisation of coupled dynamical systems in terms of how observable data are generated by unobserved (i.e., latent or hidden) causes (Friston et al., 2003;Moran et al., 2013). Dynamic causal modelling subsumes state estimation and system identification under one Bayesian procedure, to provide probability densities over unknown latent states (i.e., state estimation) and model parameters (i.e., system identification), respectively. Its focus is on estimating the uncertainty about these estimates to quantify the evidence for competing models, and the confidence in various predictions. In this sense, DCM combines data assimilation and uncertainty quantification within the same optimisation process. Specifically, the posterior densities (i.e., Bayesian beliefs) over states and parameters-and the precision of random fluctuations-are optimised by maximising a variational bound on the model's marginal likelihood, also known as model evidence. This bound is known as variational free energy or the evidence lower bound (ELBO) in machine learning (Friston et al., 2007;Hinton & Zemel, 1993;MacKay, 1995;Winn & Bishop, 2005).
Intuitively, this means one is trying to optimise probabilistic beliefs-about the unknown quantities generating some data-such that the (marginal) likelihood of those data is as large as possible. The marginal likelihood 2 or model evidence can always be expressed as accuracy minus complexity. This means that the best models provide an accurate account of some data as simply as possible. Therefore, the model with the highest evidence is not necessarily a description of the process generating data: rather, it is the simplest description that provides an accurate account of those data. In short, it is 'as if' the data were generated by this kind of model. Importantly, models with the highest evidence will generalise to new data and preclude overfitting, or overconfident predictions about outcomes that have yet to be measured. In light of this, it is imperative to select the parameters or models that maximise model evidence or variational free energy (as opposed to goodness of fit or accuracy). However, this requires the estimation of the uncertainty about model parameters and states, which is necessary to evaluate the (marginal) likelihood of the data at hand. This is why estimating uncertainty is crucial. Being able to score a model-in terms of its evidence-means that one can compare different models of the same data. This is known as Bayesian model comparison and plays an important role when testing different models or hypotheses about how the data are caused. We will see examples of this later. This aspect of dynamic causal modelling means that one does not have to commit to a particular form (i.e., parameterisation) of a model. Rather, one can explore a repertoire of plausible models and let the data decide which is the most apt.
Dynamic causal models are generative models that generate consequences (i.e., data) from causes (i.e., hidden states and parameters). The form of these models can vary depending upon the kind of system at hand. Here, we use a ubiquitous form of model; namely, a mean field approximation to loosely coupled ensembles or populations. In the neurosciences, this kind of model is applied to populations of neurons that respond to experimental stimulation (Marreiros et al., 2009; Moran et al., 2013). Here, we use the same mathematical approach to model a population of individuals and their response to an epidemic. The key idea behind these (mean field) models is that the constituents of the ensemble are exchangeable; in the sense that sampling people from the population at random will give the same average as following one person over a long period of time. Under this assumption 3 , one can then work out, analytically, how the probability distribution over various states of people evolves over time, e.g., whether someone was infected or not. This involves parameterising the probability that people will transition from one state to another. By assuming the population is large, one can work out the likelihood of observing a certain number of people who were infected, given the probabilistic state of the population at that point in time. In turn, one can work out the probability of a sequence or timeseries of new cases. This is the kind of generative model used here, where the latent states were chosen to generate the data that are (or could be) used to track a pandemic. Figure 1 provides an overview of this model. In terms of epidemiological models, this can be regarded as an extended SEIR (susceptible, exposed, infected and recovered) compartmental model (Kermack et al., 1997). Please see (Kucharski et al., 2020) for an application of this kind of model to COVID-19 4 .
There are a number of advantages to using a model of this sort. First, it means that one can include every variable that 'matters', such that one is not just modelling the spread of an infection but an ensemble response in terms of behaviour (e.g., social distancing). This means that one can test hypotheses about the contribution of various responses that are installed in the model, or what would happen under a different kind of response. A second advantage of having a generative model is that one can evaluate its evidence in relation to alternative models, and therefore optimise the structure of the model itself. For example, does social distancing behaviour depend upon the number of people who are infected? Or does it depend on how many people have tested positive for COVID-19? (This question is addressed below.) A third advantage is more practical, in terms of data analysis: because we are dealing with ensemble dynamics, there is no need to create multiple realisations or random samples to estimate uncertainty. This is because the latent states are not the states of an individual but the sufficient statistics of a probability distribution over individual states. In other words, we replace random fluctuations in hidden states with hidden states that parameterise random fluctuations. The practical consequence of this is that one can fit these models quickly and efficiently, and perform model comparisons over thousands of models. A fourth advantage is that, given a set of transition probabilities, the ensemble dynamics are specified completely. This has the simple but important consequence that the only unknowns in the model are the parameters of these transition probabilities. Crucially, in this model, these do not change with time. This means that we can convert what would have been a very complicated, nonlinear state space model for data assimilation into a nonlinear mapping from some unknown (probability transition) parameters to a sequence of observations.
We can therefore make precise predictions about the long-term future, under particular circumstances. This follows because the only uncertainty about outcomes inherits from the uncertainty about the parameters, which do not change with time. These points may sound subtle; however, the worked examples below have been chosen to illustrate these properties.
This technical report comprises four sections. The first details the generative model, with a focus on the conditional dependencies that underwrite the ensemble dynamics generating outcomes. The outcomes in question here pertain to a regional outbreak. This can be regarded as a generative model for the first wave of an epidemic in a large city or metropolis. This section considers variational model inversion and comparison, under hierarchical models. In other words, it considers the distinction between (first level) models of an outbreak in one country and (second level) models of differences among countries, in terms of model parameters. The second section briefly surveys the results of second level (between-country) modelling, looking at those aspects of the model that are conserved over countries (i.e., random effects) and those which are not (i.e., fixed effects). The third section then moves on to the dynamics and predictions for a single country; here, the United Kingdom. It considers the likely outcomes over the next few weeks and how confident one can be about these outcomes, given data from all countries to date. This section drills down on the parameters that matter in terms of affecting death rates. It presents a sensitivity analysis that establishes the contribution of parameters or causes in the model to eventual outcomes. It concludes by looking at the effects of social distancing and herd immunity. The final section concludes with a consideration of predictive validity by comparing predicted and actual outcomes.

The generative model
This section describes the generative model summarised schematically in Figure 1, while the data used to invert or fit this model are summarised in Figure 2. These data comprise global (worldwide) timeseries from countries and regions from the initial reports of positive cases in China to the current day 5 .
Figure 1. In brief, this compartmental model generates timeseries data based on a mean field approximation to ensemble or population dynamics. The implicit probability distributions are over four latent factors, each with four levels or states. These factors are sufficient to generate measurable outcomes; for example, the number of new cases or the proportion of people infected. The first factor is the location of an individual, who can be at home, at work, in a critical care unit (CCU) or in the morgue. The second factor is infection status; namely, susceptible to infection, infected, infectious or immune. This model assumes that there is a progression from a state of susceptibility to immunity, through a period of (pre-contagious) infection to an infectious (contagious) status. The third factor is clinical status; namely, asymptomatic, symptomatic, acute respiratory distress syndrome (ARDS) or deceased. Again, there is an assumed progression from asymptomatic to ARDS, where people with ARDS can either recover to an asymptomatic state or not. Finally, the fourth factor represents diagnostic or testing status. An individual can be untested or waiting for the results of a test that can either be positive or negative. With this setup, one can be in one of four places, with any infectious status, expressing symptoms or not, and having test results or not. Note that, in this construction, it is possible to be infected and yet be asymptomatic. However, the marginal distributions are not independent, by virtue of the dynamics that describe the transition among states within each factor. Crucially, the transitions within any factor depend upon the marginal distribution of other factors. For example, the probability of becoming infected, given that one is susceptible to infection, depends upon whether one is at home or at work. Similarly, the probability of developing symptoms depends upon whether one is infected or not. The probability of testing negative depends upon whether one is susceptible (or immune) to infection, and so on. Finally, to complete the circular dependency, the probability of leaving home to go to work depends upon the number of infected people in the population, mediated by social distancing. The curvilinear arrows denote a conditioning of transition probabilities on the marginal distributions over other factors. These conditional dependencies constitute the mean field approximation and enable the dynamics to be solved or integrated over time. At any point in time, the probability of being in any combination of the four states determines what would be observed at the population level. For example, the occupancy of the deceased level of the clinical factor determines the current number of people who have recorded deaths. Similarly, the occupancy of the positive level of the testing factor determines the expected number of positive cases reported. From these expectations, the expected number of new cases per day can be generated. A more detailed description of the generative model (in terms of transition probabilities) can be found in the main text.
Figure 2. Timeseries data. This figure provides a brief overview of the timeseries used for subsequent modelling, with a focus on the early trajectories of mortality. The upper left panel shows the distribution, over countries, of the number of days after the onset of an outbreak, defined as 8 days before more than one case was reported. At the time of writing (4th April 2020), a substantial number of countries witnessed an outbreak lasting for more than 60 days. The upper right panel plots the total number of deaths against the durations in the left panel. Those countries whose outbreak started earlier have greater cumulative deaths. The middle left panel plots the new deaths reported (per day) over a 48-day period following the onset of an outbreak. The colours of the lines denote different countries. These countries are listed in the lower left panel, which plots the cumulative death rate. China is clearly the first country to be severely affected, with remaining countries evincing an accumulation of deaths some 30 days after China. The middle right panel is a logarithmic plot of the total deaths against population size in the initial (48-day) period. Interestingly, there is little correlation between the total number of deaths and population size. However, there is a stronger correlation between the total number of cases reported (within the first 48 days) and the cumulative deaths as shown in lower right panel. In this period, Germany has the greatest ratio of total cases to deaths. Countries were included if their outbreak had lasted for more than 48 days and more than 16 deaths had been reported. The timeseries were smoothed with a Gaussian kernel (full width half maximum of two days) to account for erratic reporting (e.g., recording deaths over the weekend).
The generative model is a mean field model of ensemble dynamics. In other words, it is a state space model where the states correspond to the sufficient statistics (i.e., parameters) of a probability distribution over the states of an ensemble or population; here, a population of people who are in mutual contact at some point in their daily lives. This kind of model is used routinely to model populations of neurons, where the ensemble dynamics are cast as density dynamics, under Gaussian assumptions about the probability densities; e.g., (Marreiros et al., 2009). In other words, a model of how the mean and covariance of a population affects itself and the means and covariances of other populations. Here, we will focus on a single population and, crucially, use a discrete state space model. This means that we will be dealing with the sufficient statistics (i.e. expectations) of the probability of being in a particular state at any one time.
This renders the model a compartmental model (Kermack et al., 1997), where each state corresponds to a compartment. These latent states evolve according to transition probabilities that embody the causal influences and conditional dependencies that lend an epidemic its characteristic form. Our objective is to identify the right conditional dependencies-and form posterior beliefs about the model parameters that mediate these dependencies. Having done this, we can then simulate an entire trajectory into the distant future, even if we are only given data about the beginning of an outbreak 6 .
The model considers four different sorts of states (i.e., factors) that provide a description of any individual-sampled at random-that is sufficient to generate the data at hand. In brief, these factors were chosen to be as conditionally independent as possible to ensure an efficient estimation of the model parameters 7 . The four factors were an individual's location, infection status, clinical status and diagnostic status. In other words, we considered that any member of the population can be characterised in terms of where they were, whether they were infected, infectious or immune, whether they were showing mild and severe or fatal symptoms, and whether they had been tested with an ensuing positive or negative result. Each of these factors had four levels. For example, the location factor was divided into home, work, critical care unit, and the morgue. These states should not be taken too literally. For example, home stands in for anywhere that has a limited risk of exposure to, or contact with, an infected person (e.g., in the domestic home, in a non-critical hospital bed, in a care home, etc). Work stands in for anywhere that has a larger risk of exposure to-or contact with-an infected person and therefore covers non-work activities, such as going to the supermarket or participating in team sports. Similarly, designating someone as severely ill with acute respiratory distress syndrome (ARDS) is meant to cover any life-threatening conditions that would invite admission to intensive care.
Having established the state space, we can now turn to the causal aspect of the dynamic causal model. The causal structure of these models depends upon the dynamics or transitions from one state to another. It is at this point that a mean field approximation can be used. Mean field approximations are used widely in physics to approximate a full (joint) probability density with the product of a series of marginal densities (Bressloff & Newby, 2013; Marreiros et al., 2009; Schumacher et al., 2015; Zhang et al., 2019). In this case, the factorisation is fairly subtle: we will factorise the transition probabilities, such that the probability of moving among states (within each factor) depends upon the marginal distribution of other factors (with one exception). For example, the probability of developing symptoms when asymptomatic depends on, and only on, the probability that I am infected. In what follows, we will step through the conditional probabilities for each factor to show how the model is put together (and could be changed).
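To make the idea of a mean field factorisation concrete, the following minimal sketch builds a joint distribution over two factors as the product of their marginals and then recovers one marginal by summing over the other factor. The numerical values and the function names (`joint_from_marginals`, `marginal`) are illustrative assumptions, not quantities or routines from this report.

```python
from itertools import product

# Hypothetical marginal distributions over two of the four factors
# (illustrative values only, not estimates from the report).
location = {"home": 0.7, "work": 0.3, "CCU": 0.0, "morgue": 0.0}
infection = {"susceptible": 0.9, "infected": 0.06, "infectious": 0.03, "immune": 0.01}


def joint_from_marginals(*marginals):
    """Mean field joint: the product of the marginal probabilities."""
    joint = {}
    for combo in product(*(m.items() for m in marginals)):
        states = tuple(state for state, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        joint[states] = prob
    return joint


def marginal(joint, factor_index):
    """Recover one factor's marginal by summing over the remaining factors."""
    out = {}
    for states, prob in joint.items():
        key = states[factor_index]
        out[key] = out.get(key, 0.0) + prob
    return out


joint = joint_from_marginals(location, infection)
recovered = marginal(joint, 1)  # marginal over infection status
```

With four factors of four levels each, the same construction would give a joint distribution over 256 combinations of states, while only the four marginals need to be tracked.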

Transition probabilities and priors
The first factor has four levels, home, work, CCU and the morgue. People can leave home but will always return (with unit probability) over a day. The probability of leaving home has a (prior) baseline rate of one third but is nuanced by any social distancing imperatives. These imperatives are predicated on the proportion of the population that is currently infected, such that the social distancing parameter (an exponent) determines the probability of leaving home 8 . For example, social distancing is modelled as the propensity to leave home and expose oneself to interpersonal contacts. This can be modelled with the following transition probability: This means that the probability of leaving home, given I have no symptoms, is the probability I would have gone out normally, multiplied by a decreasing function of the proportion of people in the population who are infected. Formally, this proportion is the marginal probability of being infected, where the marginal probability of a factor is an average over the remaining factors. The marginal probability p l of the location factor is as follows: Where the final four equalities define each factor or state in the model. The parameters in this social distancing model are the probability of leaving home every day (θ out ) and the social distancing exponent (θ sde ).
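The social distancing rule described above can be sketched as follows. The functional form used here, a baseline probability multiplied by (1 - proportion infected) raised to the social distancing exponent, is an assumption that is consistent with the verbal description; the exponent value of 32 is purely illustrative, while the baseline of one third is the prior quoted in the text.

```python
def p_leave_home(theta_out, theta_sde, p_infected):
    """Probability of leaving home when asymptomatic.

    Assumed form: theta_out * (1 - p_infected) ** theta_sde, i.e. the
    baseline probability of going out multiplied by a decreasing function
    of the proportion of the population that is currently infected.
    """
    return theta_out * (1.0 - p_infected) ** theta_sde


theta_out = 1.0 / 3.0   # prior baseline rate quoted in the text
theta_sde = 32.0        # hypothetical social distancing exponent

# Probability of going out as the prevalence of infection rises.
profile = [(p, p_leave_home(theta_out, theta_sde, p)) for p in (0.0, 0.01, 0.05, 0.1)]
```

A larger exponent makes the population more sensitive to the prevalence of infection; an exponent of zero switches social distancing off altogether.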
The only other two places one can be are in a CCU or the morgue. The probability of moving to critical care depends upon bed (i.e., hospital) availability, which is modelled as a sigmoid function of the occupancy of this state (i.e., the probability that a CCU bed is occupied) and a bed capacity parameter (a threshold). If one has severe symptoms, then one stays in the CCU. Finally, the probability of moving to the morgue depends on, and only on, being deceased. Note that all these dependencies are different states of the clinical factor (see below). This means we can write the transition probabilities among the location factor for each level of the clinical factor as follows (with a slight abuse of notation): Here, the columns and rows of each transition probability matrix are ordered: home, work, CCU, morgue. The column indicates the current location and the row indicates the next location. The parameter θ cap is the bed capacity threshold, and bed availability is a decreasing sigmoid function of CCU occupancy. In brief, these transition probabilities mean that I will go out when asymptomatic, unless social distancing is in play. However, when I have symptoms I will stay at home, unless I am hospitalised with acute respiratory distress. I remain in critical care unless I recover and go home or die and move to the morgue, where I stay. Technically, the morgue is an absorbing state.
In a similar way, we can express the probability of moving between different states of infection (i.e., susceptible, infected, infectious and immune) as follows: These transition probabilities mean that when susceptible, the probability of becoming infected depends upon the number of social contacts-which depends upon the proportion of time spent at home. This dependency is parameterised in terms of a transition probability per contact (θ trn ) and the expected number of contacts at home (θ Rin ) and work (θ Rou ) 9 . Once infected, one remains in this state for a period of time that is parameterised by a transition rate (θ inf ). This parameterisation illustrates a generic property of transition probabilities; namely, an interpretation in terms of rate constants and, implicitly, time constants. The rate parameter θ is related to the rate constant κ and time constant τ according to: In other words, the probability of staying in any one state is determined by the characteristic length of time that state is occupied. This means that the rate parameter above can be specified, a priori, in terms of the number of days we expect people to be infected, before becoming infectious. Similarly, we can parameterise the transition from being infectious to being immune in terms of a typical period of being contagious, assuming that immunity is enduring and precludes reinfection 10 .
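The two ideas in this paragraph can be illustrated with a short sketch. It assumes the usual exponential relationship between the daily probability of staying in a state and its time constant, theta = exp(-1/tau), and it assumes that the probability of eluding infection per contact is one minus the transmission probability times the proportion of infectious people. The function names and the example values are hypothetical.

```python
import math


def stay_probability(tau_days):
    """Daily probability of remaining in a state with time constant tau (assumed form)."""
    return math.exp(-1.0 / tau_days)


def time_constant(theta):
    """Inverse mapping: time constant (in days) implied by a stay probability."""
    return -1.0 / math.log(theta)


def p_infection(theta_trn, p_infectious, n_contacts):
    """Probability of becoming infected after n_contacts interpersonal contacts.

    Each contact is eluded with probability 1 - theta_trn * p_infectious, so
    the chance of eluding all contacts is that quantity to the power n_contacts.
    """
    p_elude = 1.0 - theta_trn * p_infectious
    return 1.0 - p_elude ** n_contacts


theta_inf = stay_probability(5.0)        # hypothetical five-day infected period
risk_home = p_infection(0.25, 0.02, 4)   # hypothetical: few contacts at home
risk_work = p_infection(0.25, 0.02, 48)  # hypothetical: many contacts outside
```

Specifying priors in terms of days, rather than probabilities, makes them easier to motivate from clinical experience; the mapping above converts between the two.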
Note that in the model, everybody in the morgue is treated as having acquired immunity. The transitions among clinical states depend upon both the infection status and location as follows: The transitions among clinical states (i.e., asymptomatic, symptomatic, ARDS and deceased) are relatively straightforward: if I am not infected (i.e., susceptible or immune) I will move to the asymptomatic state, unless I am dead. However, if I am infected (i.e., infected or infectious), I will develop symptoms with a particular probability (θ dev ). Once I have developed symptoms, I will remain symptomatic and either recover to an asymptomatic state or develop acute respiratory distress with a particular probability (θ sev ). The parameterisation of these transitions depends upon the typical length of time that I
9 Here, $P = 1 - \theta_{trn} \cdot p^{inf}_{infectious}$ can be interpreted as a probability of eluding infection with each interpersonal contact, such that the probability of remaining uninfected after $\theta_R$ contacts is given by $P^{\theta_R}$. Note that there is no distinction between people at home and at work; both are equally likely to be infectious.
We can now assemble these transition probabilities into a probability transition matrix, and iterate from the first day to some time horizon, to generate a sequence of probability distributions over the joint space of all factors: Notice that this is a completely deterministic state space model, because all the randomness is contained in the probabilities. Notice also that the transition probability matrix T is both state and time dependent, because the transition probabilities above depend on marginal probabilities.
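The following toy example illustrates what it means for the transition probabilities to be state dependent while the dynamics remain deterministic. It iterates a single factor (infection status) rather than the full joint space of four factors, and the transition rule and parameter values are simplifying assumptions made for illustration, apart from the three-day period of contagion, which is the prior mentioned in this report.

```python
import math


def step(p, theta_trn=0.25, n_contacts=8.0, tau_inf=5.0, tau_con=3.0):
    """One daily update of [susceptible, infected, infectious, immune]."""
    s, e, i, r = p
    # The chance of catching the infection depends on the current
    # proportion of infectious people: this is the state dependence.
    p_catch = 1.0 - (1.0 - theta_trn * i) ** n_contacts
    k_inf = math.exp(-1.0 / tau_inf)  # probability of staying infected
    k_con = math.exp(-1.0 / tau_con)  # probability of staying infectious
    return [
        s * (1.0 - p_catch),
        s * p_catch + e * k_inf,
        e * (1.0 - k_inf) + i * k_con,
        i * (1.0 - k_con) + r,
    ]


def simulate(p0, n_days):
    """Iterate the update from day zero to the time horizon."""
    trajectory = [p0]
    for _ in range(n_days):
        trajectory.append(step(trajectory[-1]))
    return trajectory


trajectory = simulate([0.999, 0.0, 0.001, 0.0], 180)
```

Running the simulation twice gives identical trajectories, because nothing is sampled: the probabilities themselves are the states being propagated.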
In this approximation, the number of contacts I make is a weighted average of the number of people I could infect at home and the number of people I meet outside, per day, times the number of days I am contagious. The effective reproduction rate is not a biological rate constant. However, it is a useful epidemiological summary statistic that indicates how quickly the disease spreads through a population. When less than one, the infection will decay to an endemic equilibrium. We will use this measure later to understand the role of herd immunity.
This completes the specification of the generative model of latent states. A list of the parameters and their prior means (and variances) is provided in Table 1. Notice that all of the parameters are scale parameters, i.e., they are rates or probabilities that cannot be negative. To enforce these positivity constraints, one applies a log transform to the parameters during model inversion or fitting. This has the advantage of being able to simplify the numerics using Gaussian assumptions about the prior density (via a lognormal assumption). In other words, although the scale parameters are implemented as probabilities or rates, they are estimated as log parameters, denoted by $\vartheta = \ln \theta$. Note that prior variances are specified for log parameters. For example, a variance of 1/64 corresponds to a prior confidence interval of ~25% and can be considered weakly informative.
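The correspondence between a log-space variance of 1/64 and an interval of about 25% can be checked with a few lines of arithmetic. The sketch below assumes that the interval is read as plus or minus two standard deviations in log space and that the prior expectation is mapped to the mean of the log parameter; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def lognormal_interval(prior_value, log_variance, n_sd=2.0):
    """Interval for a scale parameter whose logarithm has a Gaussian prior.

    The interval is symmetric in log space and therefore multiplicative
    (and strictly positive) in the original space.
    """
    sd = math.sqrt(log_variance)
    log_mean = math.log(prior_value)
    return math.exp(log_mean - n_sd * sd), math.exp(log_mean + n_sd * sd)


# Prior baseline probability of leaving home (one third), variance 1/64.
lower, upper = lognormal_interval(1.0 / 3.0, 1.0 / 64.0)
ratio_low, ratio_high = lower * 3.0, upper * 3.0
```

Under these assumptions the interval spans roughly 0.78 to 1.28 times the prior value, and the lower bound can never cross zero, which is the point of working with log parameters.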
These prior expectations should be read as the effective rates and time constants as they manifest in a real-world setting. For example, a three-day period of contagion is shorter than the period that someone might be infectious (Wölfel et al., 2020) 14 , on the (prior) assumption that they will self-isolate, when they realise they could be contagious.

Initial conditions and population size
Further parameters are required to generate data, such as the size of the population and the number of people who are initially infected (θ N , θ n ) 15 , which parameterise the initial state of the population (where ⊗ denotes a Kronecker tensor product): In this technical report, we will choose a simpler option that treats a pandemic as a set of linked point processes that can be modelled as rare events. In other words, we will focus on modelling a single outbreak in a region or city and treat the response of the 'next city' as a discrete process post hoc. This simplifies the generative model; in the sense we only have to worry about the ensemble dynamics of the population that comprises one city. A complementary perspective on this choice is that we are trying to model the first wave of an epidemic as it plays out in the first city to be affected. Any second wave can then be treated as the first wave of another city or region.
11 It is revealing to note that the number of model parameters pertaining to PCR testing matches the number of parameters mediating the epidemiology per se. This reflects the fact that the generative model has to consider every aspect of how data are generated. In order to leverage the information in new positive tests, it is necessary to think carefully about all the parameters that contribute to these data; for example, the probability of being tested and the selection bias towards testing people who are more likely to be infected. Crucially, this bias has to be estimated during model inversion and could vary substantially from country to country. Although not implemented in this report, subsequent distinctions between Pillar 1 and 2 test data would be a nice example of different selection biases. This speaks to the importance of modelling Pillar 1 and 2 as distinct data modalities. From a technical perspective, equipping standard epidemiological models with an 'observation model' can be regarded as building a complete dynamic causal model. The key thing to bear in mind here is that the parameters of so-called observation models have to be treated in exactly the same way as epidemiological parameters, because they could show conditional dependencies. In dynamic causal modelling, all unknown parameters are treated in a uniform way to maximise (a free energy bound on) marginal likelihood.
12 Notice that this model is configured for new cases that are reported based on buccal swabs (i.e., am I currently infected?), not tests for antibody or immunological status. A different model would be required for forthcoming tests of immunity (i.e., have I been infected?). Furthermore, one might consider the sensitivity and specificity of any test by including sensitivity and specificity in (1.7). For example, 1 in 3 tests may be false negatives; especially, when avoiding bronchoalveolar lavage to minimise risk to clinicians: Wang et al., 2020b. Detection of SARS-CoV-2 in Different Types of Clinical Specimens. JAMA.
13 Added in revision: the reproduction ratio in this report was based upon an approximation to the expected number of people that I might infect, if I was infectious. In subsequent reports, the reproduction ratio was brought into line with more formal definitions, based on the geometric rate of increase in the prevalence of infection and the period of contagion. A minimum reproduction ratio (R) of nearly zero in this report corresponds to about 0.7 in subsequent (and other) reports.
14 Shedding of COVID-19 viral RNA from sputum can outlast the end of symptoms. Seroconversion occurs after 6-12 days but is not necessarily followed by a rapid decline of viral load.

Effective Population:
Under the initial conditions, the population size can be set, a priori, to 1,000,000; noting that a small city comprises (by definition) a hundred thousand people, while a large city can exceed 10 million. This population parameter is a prior that is updated based on the available data, providing an estimate of the "effective population" size. Effective population is defined here as the proportion of the total population who are susceptible to infection, and therefore participate in the outbreak.
The assumption that the effective population size reflects the total population of a country is a hypothesis that we will test later 16 . For clarity, we are not implying that the remainder of the population classed as "not susceptible" are immune or resistant to COVID-19; rather, there exists a sub-population who do not take part in the current outbreak for any of a variety of reasons that may include being shielded or geographically isolated from infected cases. Furthermore, as the effective population (and other parameters) are estimated directly from the data, they will reflect the source of the information. At the time of writing, in the UK this was dominated by the London outbreak. Finally, as all parameters pertain to the effective population, proportions (or probabilities), such as population immunity, require appropriate scaling to be expressed as a percentage of the total (census) population.

The likelihood or observation model
The outcomes considered in Figure 2 are new cases (of positive tests and deaths) per day. These can be generated by multiplying the appropriate probability by the (effective) population size. The appropriate probabilities here are just the expected occupancy of positive test and deceased states, respectively. Because we are dealing with large populations, the likelihood of any observed daily count has a binomial distribution that can be approximated by a Gaussian density 17 .
Here, outcomes are counts of rare events with a small probability π << 1 of occurring in a large population of size n >> 1. For example, the likelihood of observing a timeseries of daily deaths can be expressed as a function of the model parameters as follows:
The advantage of this limiting (large population) case is that a (variance stabilising) square root transform of the data counts renders their variance approximately constant, irrespective of the expected count (close to 1/4, or unity after scaling by a factor of two). With the priors and likelihood model in place, we now have a full joint probability over causes (parameters) and consequences (outcomes). This is the generative model P(Y, ϑ) = P(Y | ϑ)P(ϑ). One can now use standard variational techniques (Friston et al., 2007) to estimate the posterior over model parameters and evaluate a variational bound on the model evidence or marginal likelihood. Mathematically, this is expressed as follows:
Table 1 also includes a parameter for the proportion of people who are initially immune, which we will call on later.
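The variance stabilising property of the square root transform can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the exact variance of √y for binomial counts with small π and large n. The function names and the particular values of n and π are illustrative assumptions. For such counts the variance of √y is close to 1/4 whatever the expected count, so that 2√y has approximately unit variance.

```python
import math


def binomial_log_pmf(k, n, p):
    """Log probability of k events in n trials, each with probability p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def sqrt_count_variance(n, p):
    """Exact variance of sqrt(y) for y ~ Binomial(n, p), by summing the pmf.

    The sum is truncated far in the upper tail (12 standard deviations),
    where the remaining probability mass is negligible.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    k_max = min(n, int(mean + 12.0 * sd) + 20)
    first_moment = 0.0   # E[sqrt(y)]
    second_moment = 0.0  # E[sqrt(y)^2] = E[y]
    for k in range(k_max + 1):
        w = math.exp(binomial_log_pmf(k, n, p))
        first_moment += w * math.sqrt(k)
        second_moment += w * k
    return second_moment - first_moment ** 2


if __name__ == "__main__":
    n = 1_000_000  # illustrative (effective) population size
    for p in (1e-5, 1e-4, 1e-3):  # illustrative daily event probabilities
        v = sqrt_count_variance(n, p)
        print(f"expected count {n * p:7.0f}: "
              f"var(sqrt y) = {v:.4f}, var(2 sqrt y) = {4 * v:.4f}")
```

The expected count changes by two orders of magnitude across these settings, while the variance of the transformed count barely moves; this is what licenses a Gaussian likelihood with fixed variance on the square root scale.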
These expressions show that maximising the variational free energy F with respect to an approximate posterior Q(ϑ) renders the Kullback-Leibler (KL) divergence between the true and approximate posterior as small as possible. At the same time, the free energy becomes a lower bound on the log evidence. The free energy can then be used to compare different models, where any differences correspond to a log Bayes factor or odds ratio (Kass & Raftery, 1995;Winn & Bishop, 2005).
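For reference, the relationship described here can be written in its standard form. This is a sketch of the usual variational identity rather than a reproduction of the numbered equations in this report; the symbol Y for the outcomes is an assumption, and the notation may differ from the original.

```latex
\begin{aligned}
F[Q] &= \mathbb{E}_{Q(\vartheta)}\big[\ln P(Y \mid \vartheta)\big]
        - D_{KL}\big[\,Q(\vartheta)\,\|\,P(\vartheta)\,\big] \\
     &= \ln P(Y) - D_{KL}\big[\,Q(\vartheta)\,\|\,P(\vartheta \mid Y)\,\big]
      \;\le\; \ln P(Y)
\end{aligned}
```

The first line expresses free energy as accuracy minus complexity. The second shows that, because the KL divergence is non-negative, F is a lower bound on the log evidence that becomes tight when Q(ϑ) equals the true posterior.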

Bayesian model comparison
One may be asking why we have chosen this particular state space and this parameterisation. Are there alternative model structures or parameterisations that would be more fit for purpose? The answer is that there will always be a better model, where 'better' is a model that has more evidence. This means that the model has to be optimised in relation to empirical data. This process is known as Bayesian model comparison based upon model evidence (Winn & Bishop, 2005). For example, in the above model we assumed that social distancing increases as a function of the proportion of the population who are infected (1.1). This stands in for a multifactorial influence on social behaviour that may be mediated in many ways; for example, government advice, personal choices, availability of transport, media reports of 'panic buying' and so on. So, what licenses us to model the causes of social distancing in terms of a probability that any member of the population is infected? The answer rests upon Bayesian model comparison. When inverting the model using data from countries with more than 16 deaths (see Figure 2), we obtained a log evidence (i.e., variational free energy) of -15701 natural units (nats). When replacing the cause of social distancing with the probability of encountering someone with symptoms-or the number of people testing positive-the model evidence fell substantially to -15969 and -15909 nats, respectively. In other words, there was overwhelming evidence (log Bayes factors of 268 and 208 nats, respectively) in favour of infection rates as a primary drive for social distancing, over and above alternative models. We will return to the use of Bayesian model comparison later, when asking what factors determine differences between each country's response to the pandemic.
Table 1 lists all the model parameters; henceforth, DCM parameters. In total, there are 21 DCM parameters. This may seem like a large number to estimate from the limited amount of data available (see Figure 2).
The degree to which a parameter is informed by the data depends upon how changes in the parameter are expressed in data space. For example, increasing the effective population size will uniformly elevate the expected cases per day. Conversely, decreasing the number of initially infected people will delay the curve by shifting it in time. In short, a parameter can be identified if it has a relatively unique expression in the data. This speaks to an important point: the information in the data is not just in the total count; it is in the shape or form of the transient 18 .
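The model comparison reported in this section reduces to simple arithmetic on the free energies. The sketch below takes the three values quoted above and converts them into log Bayes factors and posterior model probabilities. It assumes equal prior probabilities over the three models; the dictionary keys and function names are illustrative labels, not identifiers from the original software.

```python
import math

# Variational free energies (nats) quoted in the text, one per hypothesis
# about what drives social distancing. The keys are illustrative labels.
LOG_EVIDENCE = {
    "prevalence of infection": -15701.0,
    "prevalence of symptoms": -15969.0,
    "number testing positive": -15909.0,
}


def log_bayes_factors(log_evidence, reference):
    """Log Bayes factor of the reference model relative to every model."""
    f_ref = log_evidence[reference]
    return {name: f_ref - f for name, f in log_evidence.items()}


def posterior_model_probabilities(log_evidence):
    """Softmax of log evidences, assuming equal prior model probabilities."""
    f_max = max(log_evidence.values())  # subtract the maximum for stability
    weights = {name: math.exp(f - f_max) for name, f in log_evidence.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    lbf = log_bayes_factors(LOG_EVIDENCE, "prevalence of infection")
    post = posterior_model_probabilities(LOG_EVIDENCE)
    for name in LOG_EVIDENCE:
        print(f"{name:26s} log BF = {lbf[name]:6.1f} nats, "
              f"posterior probability = {post[name]:.3g}")
```

A difference of about three nats is conventionally regarded as strong evidence (Kass & Raftery, 1995), so differences of 208 and 268 nats leave essentially no posterior probability for the alternative models.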

Summary
On this view, there are many degrees of freedom in a timeseries that can be leveraged to identify a highly parameterised model. The issue of whether the model is over-parameterised or under-parameterised is exactly the issue resolved by Bayesian model comparison; namely, the removal of redundant parameters to suppress model complexity and ensure generalisation: see (1.13) 19 . One therefore requires the best measures of model evidence. This is the primary motivation for using variational Bayes; here, variational Laplace (Friston et al., 2007). The variational free energy, in most circumstances, provides a better approximation than widely used alternatives such as the Akaike information criterion and the Bayesian information criterion (Penny, 2012).
One special aspect of the model above is that it has absorbing states. For example, whenever one enters the morgue, becomes immune, dies or has a definitive test result, one stays in that state: see Figure 1. This is important, because it means the long-term behaviour of the model has a fixed point. In other words, we know what the final outcomes will be. These outcomes are known as endemic equilibria. This means that the only uncertainty is about the trajectory from the present point in time to the distant future. We will see later that-when quantified in terms of Bayesian credible intervals-this uncertainty starts to decrease as we go into the distant future. This should be contrasted with alternative models that do not parameterise the influences that generate outcomes and therefore call upon exogenous inputs (e.g., statutory changes in policy or changes in people's behaviour). If these interventions are unknown, they will accumulate uncertainty over time. By design, we elude this problem by including everything that matters within the model and parameterising strategic responses (like social distancing) as an integral part of the transition probabilities.
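The consequence of absorbing states can be illustrated with a toy Markov chain. This is a deliberately simplified sketch and not the model of this report: it has three states, fixed (hypothetical) daily transition probabilities and no dependence of transitions on the state of the population. It shows only that, once some states cannot be left, the occupancy probabilities converge to a fixed point that is determined by the transition probabilities.

```python
STATES = ["infected", "immune", "deceased"]

# Hypothetical daily transition probabilities (each row sums to one).
# Row i gives the probability of moving from state i to each state.
# "immune" and "deceased" are absorbing: once entered, they are never left.
TRANSITIONS = [
    [0.90, 0.09, 0.01],  # infected -> infected / immune / deceased
    [0.00, 1.00, 0.00],  # immune   -> immune
    [0.00, 0.00, 1.00],  # deceased -> deceased
]


def step(p, transitions):
    """Propagate the occupancy probabilities p forward by one day."""
    n = len(p)
    return [sum(p[i] * transitions[i][j] for i in range(n)) for j in range(n)]


def evolve(p, transitions, days):
    """Propagate the occupancy probabilities forward by a number of days."""
    for _ in range(days):
        p = step(p, transitions)
    return p


if __name__ == "__main__":
    p0 = [1.0, 0.0, 0.0]  # everybody starts in the infected state
    for days in (10, 50, 500):
        p = evolve(p0, TRANSITIONS, days)
        summary = ", ".join(f"{s} = {x:.4f}" for s, x in zip(STATES, p))
        print(f"day {days:3d}: {summary}")
```

In this toy example the final occupancies are fixed by the ratio of the two exit probabilities (0.09 to 0.01), however long the trajectory takes to get there; uncertainty about the parameters therefore affects the path more than the destination.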
We have made the simplifying assumption that every country reporting new cases is, effectively, reporting the first wave of an affected region or city. Clearly, some countries could suffer simultaneous outbreaks in multiple cities. This is accommodated by an effective population size that could be greater than the prior expectation of 1 million. This is an example of finding a simple model that best predicts outcomes-that may not be a veridical reflection of how those outcomes were actually generated. In other words, we will assume that each country behaves as if it has a single large city of at-risk denizens. In the next section, we look at the parameter estimates that obtain by pooling information from all countries, with a focus on between country differences, before turning to the epidemiology of a single country (the United Kingdom).
Hitherto, we have focused on a generative model for a single city. However, in a pandemic, many cities will be affected. This calls for a hierarchical generative model that considers the response of each city at the first level and a global response at the second. This is an important consideration because it means, from a Bayesian perspective, knowing what happens elsewhere places constraints (i.e., Bayesian shrinkage priors) on estimates of what is happening in a particular city. Clearly, this rests upon the extent to which certain model parameters are conserved from one city to another-and which are idiosyncratic or unique. This is a problem of hierarchical Bayesian modelling or parametric empirical Bayes (Friston et al., 2016;Kass & Steffey, 1989). In the illustrative examples below, we will adopt a second level model in which key (log) parameters are sampled from a Gaussian distribution with a global (worldwide) mean and variance. From the perspective of the generative model, this means that to generate a pandemic, one first samples city-specific parameters from a global distribution, adds a random effect, and uses the ensuing parameters to generate a timeseries for each city.
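The generative process described above (a global expectation plus a random effect for each city) can be sketched as follows. The parameter names, global means and between-city standard deviations are hypothetical placeholders chosen for illustration; only the prior expectation of one million for the effective population size is taken from the text. Sampling is on the log scale, so that the ensuing parameters are always positive.

```python
import math
import random


def sample_city_parameters(global_log_mean, between_city_sd, n_cities, rng):
    """Sample first level parameters for each city.

    Each log parameter is the global (worldwide) log mean plus a Gaussian
    random effect; exponentiating yields positive-valued parameters.
    """
    cities = []
    for _ in range(n_cities):
        params = {}
        for name, mu in global_log_mean.items():
            log_value = mu + rng.gauss(0.0, between_city_sd[name])
            params[name] = math.exp(log_value)
        cities.append(params)
    return cities


# Hypothetical second level (global) expectations, on the log scale.
GLOBAL_LOG_MEAN = {
    "effective_population": math.log(1e6),  # prior expectation in the text
    "contacts_at_home": math.log(3.0),      # illustrative value
}

# Hypothetical between-city standard deviations of the log parameters.
BETWEEN_CITY_SD = {
    "effective_population": 0.5,
    "contacts_at_home": 0.25,
}

if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    for i, params in enumerate(
            sample_city_parameters(GLOBAL_LOG_MEAN, BETWEEN_CITY_SD, 4, rng)):
        print(f"city {i}: " + ", ".join(
            f"{name} = {value:,.1f}" for name, value in params.items()))
```

Each sampled parameter set would then be passed to the first level model to generate a timeseries for that city. Inverting this hierarchy is what allows estimates for one city to be constrained by data from the others.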

Parametric empirical Bayes and hierarchical models
This section considers the modelling of country-specific parameters, under a simple (general linear) model of between-country effects. This (second level) model requires us to specify which parameters are shared in a meaningful way between countries and which are unique to each country. Technically, this can be cast as the difference between random and fixed effects. Designating a particular parameter as a random effect means that this parameter was generated by sampling from a countrywide distribution, while a fixed effect is unique to each country. Under a general linear model, the distribution for random effects is Gaussian. In other words, to generate the parameter for a particular country, we take the global expectation and add a random Gaussian variate, whose variance has to be estimated under suitable hyperpriors. Furthermore, one has to specify systematic differences between countries in terms of independent variables; for example, does the latitude of a country have any systematic effect on the size of the at-risk population? The general linear model used here comprises a constant (i.e., the expectation or mean of each parameter over countries), the (logarithms of) total population size, and a series of independent variables based upon a discrete sine transform of latitude and longitude. The latter variables stand in for any systematic and geopolitical differences among countries that vary smoothly with their location. Notice that the total population size may or may not provide useful constraints on the effective size of the population at the first level. Under this hierarchical model, a bigger country may have a transport and communication infrastructure that could reduce the effective (at risk) population size. A hint that this may be the case is implicit in Figure 2, where there is no apparent relationship between the early incidence of deaths and total population size.
In the examples below, we treated the number of initial cases and the parameters pertaining to testing as fixed effects and all remaining parameters as random effects. The number of initial infected people determines the time at which a particular country evinces its outbreak. Although this clearly depends upon geography and other factors, there is no a priori reason to assume a random variation about an average onset time. Similarly, we assume that each country's capacity for testing was a fixed effect; thereby accommodating non-systematic testing or reporting strategies 20 . Note that in this kind of modelling, outcomes such as new cases can only be interpreted in relation to the probability of being tested and the availability of tests 21 .
With this model in place, we can now use standard procedures for parametric empirical Bayesian modelling (Friston et al., 2016; Kass & Steffey, 1989) to estimate the second level parameters that couple between-country independent variables to country-specific parameters of the DCM. However, there are a large number of these parameters, which may or may not contribute to model evidence. In other words, we need some way of removing redundant parameters based upon Bayesian model comparison. This calls upon another standard procedure called Bayesian model reduction, which compares a set of reduced models. Each of these models corresponds to a particular combination of parameters that have been 'switched off', by shrinking their prior variance to zero. By averaging the posterior estimates in proportion to the evidence for each model (known as Bayesian model averaging; Hoeting et al., 1999), we can eliminate redundant parameters and thereby provide a simpler explanation for differences among countries. This is illustrated in the lower panels, which show the posterior densities before (left) and after (right) Bayesian model reduction. These estimates are shown in terms of their expectation or maximum a posteriori (MAP) value (as blue bars), with 90% Bayesian credible intervals (as pink bars).
The first 21 parameters are the global expectations of the DCM parameters. The remaining parameters are the coefficients that link various independent variables at the second level to the parameters of the transition probabilities at the first. Note that a substantial number of second level parameters have been removed; however, many are retained. This suggests that there are systematic variations over countries in certain random effects at the country level. Figure 4 provides an example based upon the largest effect mediated by the independent variables. In this analysis, latitude (i.e., distance from the South Pole) appears to reduce the effective size of an at-risk population. In other words, countries in the northern hemisphere have a smaller effective population size, relative to countries in the southern hemisphere. Clearly, there may be many reasons for this; for example, systematic differences in temperature or demographics.
The key thing to take from this analysis is the tight credible intervals on the parameters, when averaging in this way. According to this analysis, the number of effective contacts at home is about three people, while this increases by an order of magnitude to about 30 people when leaving home. The symptomatic and acute respiratory distress periods have been estimated here at about five and 13 days respectively, with a delay in testing of about two days. These are the values that provide the simplest explanation for the global data at hand-and are in line with empirical estimates 22 .
In this instance, the models compared are at the second or between-country level. In other words, the models compared contained all combinations of (second level) parameters (a parameter is removed by setting its prior variance to zero). If the model evidence increases-in virtue of reducing model complexity-then this parameter is redundant. The upper panels show the relative evidence of the most likely 256 models, in terms of log evidence (left panel) and the corresponding posterior probability (right panel). Redundant parameters are illustrated in the lower panels by comparing the posterior expectations before and after the Bayesian model reduction. The blue bars correspond to posterior expectations, while the pink bars denote 90% Bayesian credible intervals. The key thing to take from this analysis is that a large number of second level parameters have been eliminated. These second level parameters encode the effects of population size and geographical location, on each of the parameters of the generative model. The next figure illustrates the nonredundant effects that can be inferred with almost 100% posterior confidence.
Here, the effective size of the population appears to depend upon the latitude of a country. The right panel shows the absolute values of the GLM parameters in matrix form, showing that the effective size of the population was most predictable (the largest values are in white), though not necessarily predictable by total population size. The red circle highlights the parameter mediating the relationship illustrated in the left panel.
23 https://en.wikipedia.org/wiki/Greater_London
24 However, there does appear to be some predictive validity to these that are addressed in an epilogue. Note that, rather than dissect the predictive validity of each parameter and country, which is widely recognised as a challenging problem (Moghadas, S.M., Shoukat, A., Fitzpatrick, M.C., Wells, C.R., Sah, P., Pandey, A., Sachs, J.D., Wang, Z., Meyers, L.A., Singer, B.H., Galvani, A.P., 2020. Projecting hospital utilization during the COVID-19 outbreaks in the United States. Proc Natl Acad Sci U S A 117, 9122-9126.), we have provided some representative examples. A comprehensive analysis of this type would be beyond the scope of this report. It is also important to note that predictions based upon rate parameters and probabilities are a reflection of prior assumptions about these parameters, whereas predictions based upon the hidden states speak to the predictive validity of the DCM model structure (see below).
25 Or, indeed, a previous pandemic, such as the 2009 H1N1 pandemic. We will return to this in the conclusion.
Figure 6 shows the country-specific parameter estimates for 12 of the 21 DCM parameters. These posterior densities were evaluated under the empirical priors from the parametric empirical Bayesian analysis above. As one might expect-in virtue of the second level effects that survived Bayesian model reduction-there are some substantial differences between countries in certain parameters. For example, the effective population size in the United States of America is substantially greater than elsewhere at about 25 million (the population in New York state is about 19.4 million). The effective population size in the UK (dominated by cases in London) is estimated to be about 2.5 million (London has a population of about 8.96 million) 23 . Social distancing seems to be effective and sensitive to infection rates in France but much less so in Canada. The efficacy of social distancing, in terms of the difference between the number of contacts at home and work, is notably attenuated in the United Kingdom, which has the greatest number of home contacts and the smallest number of work contacts. Other notable differences are the increased probability of fatality in critical care evident in China. This is despite the effective population size being only about 2.5 million. Again, these assertions are not about actual states of affairs. These are the best explanations for the data under the simplest model of how those data were caused 24 .
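The Bayesian model averaging used in this section can be sketched in a few lines. The example below is a simplification under stated assumptions: it averages posterior expectations only (a full treatment would average the posterior densities), it assumes equal prior probabilities over models, and the log evidences and parameter estimates are hypothetical numbers chosen for illustration.

```python
import math


def bayesian_model_average(log_evidences, posterior_means):
    """Average posterior expectations over models, weighted by evidence.

    log_evidences   : one log evidence (free energy) per model
    posterior_means : one list of parameter expectations per model; a
                      parameter that is 'switched off' has expectation zero
    Returns the posterior model probabilities and the averaged expectations.
    """
    f_max = max(log_evidences)  # subtract the maximum for stability
    weights = [math.exp(f - f_max) for f in log_evidences]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    n_params = len(posterior_means[0])
    average = [
        sum(p * means[k] for p, means in zip(probabilities, posterior_means))
        for k in range(n_params)
    ]
    return probabilities, average


if __name__ == "__main__":
    # Hypothetical comparison of a full model with a reduced model, in which
    # the second parameter has been switched off.
    log_evidences = [-100.0, -97.0]      # the reduced model is 3 nats better
    posterior_means = [[1.2, 0.8],       # full model
                       [1.1, 0.0]]       # reduced model
    probabilities, average = bayesian_model_average(log_evidences,
                                                    posterior_means)
    print("posterior model probabilities:",
          [round(p, 4) for p in probabilities])
    print("averaged expectations:", [round(a, 4) for a in average])
```

When removing a parameter increases the evidence, the reduced model dominates the average and the averaged expectation of that parameter shrinks towards zero. This is how redundant second level effects are eliminated.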

Summary
This level of modelling is important because it enables the data or information from one country to inform estimates of the first level (DCM) parameters that underwrite the epidemic in another country 25 . This is another expression of the importance of having a hierarchical generative model for making sense of the data. Here, the generative model has latent causes that span different countries, thereby enabling the fusion of multimodal data from multiple countries (e.g., new test or death rates). Two natural questions now arise. Are there any systematic differences between countries in the parameters that shape epidemiological dynamics-and what do these dynamics or trajectories look like?
This concludes our brief treatment of between country effects, in which we have considered the potentially important role of Bayesian model reduction in identifying systematic variations in the evolution of an epidemic from country to country. The next section turns to the use of hierarchically informed estimates of DCM parameters to characterise an outbreak in a single country.

Dynamic causal modelling of a single country
This section drills down on the likely course of the epidemic in the UK, based upon the posterior density over DCM parameters afforded by the hierarchical (parametric empirical) Bayesian analysis of the previous section (listed in Table 2). Figure 7 shows the expected trajectory of death rates, new cases, and occupancy of CCU beds over a six-month (180 day) period.
These (posterior predictive) densities are shown in terms of an expected trajectory and 90% credible intervals (blue line and shaded areas, respectively). The black dots represent empirical data (available at the time of writing). Notice that the generative model can produce outcomes that may or may not be measured. Here, the estimates are based upon the new cases and deaths in Figure 2.
The panels on the left show that our confidence about the causes of new cases is relatively high during the period for which we have data and then becomes uncertain in the future. This reflects the fact that the data are informing those parameters that shaped the initial transient, whereas other parameters responsible for the late peak and subsequent trajectory are less informed. Notice that the uncertainty about cumulative deaths itself accumulates. On this analysis, we can be 90% confident that in five weeks, between 13,000 and 22,000 people may have died. Relative to the total population, the proportion of people dying is very small; however, the cumulative death rates in absolute numbers are substantial, in relation to seasonal influenza (indicated with broken red lines). Although cumulative death rates are small, they are concentrated within a short period of time, with near-identical CCU needs and the attendant risk of overwhelming available capacity (not to mention downstream effects from blocking other hospital admissions to prioritise the pandemic).
The underlying latent causes of these trajectories are shown in Figure 8. The upper panels reproduce the expected trajectories of the previous figure, while the lower panels show the underlying latent states in terms of expected rates or probabilities. For example, the social distancing measures are expressed in terms of an increasing probability of being at home, given the accumulation of infected cases in the population. During the peak expression of death rates, the proportion of people who are immune (herd immunity) increases to about 30% and then asymptotes at about 90%. This period is associated with a marked increase in the probability of developing symptoms (peaking at about 11 weeks, after the first reported cases). Interestingly, under these projections, the number of people expected to be in critical care should not exceed capacity: at its peak, the upper bound of the 90% credible interval for CCU occupancy is approximately 4200, which is within the current CCU capacity of London (corresponding to the projected capacity of the temporary Nightingale Hospital 26 in London, UK).
It is natural to ask which DCM parameters contributed the most to the trajectories in Figure 8. This is addressed using a sensitivity analysis. Intuitively, this involves changing a particular parameter and seeing how much it affects the outcomes of interest. Figure 9 reports a sensitivity analysis of the parameters in terms of their direct contribution to cumulative deaths (upper panel) and how they interact (lower panel). These are effectively the gradient and Hessian matrix (respectively) of predicted cumulative deaths. The bars in the upper panel pointing to the left indicate parameters that decrease total deaths. These include social distancing and bed availability, which are-to some extent-under our control. Other factors that improve fatality rates include the symptomatic and acute respiratory distress periods and the probability of surviving outside critical care. These, at the present time, are not so amenable to intervention. Note that initial immunity has no effect in this analysis because we clamped the initial values to zero with very precise priors. We will relax this later. First, we look at the effect of social distancing by simulating the ensemble dynamics under increasing levels of the social distancing exponent (i.e., the sensitivity of our social distancing and self-isolation behaviour to the prevalence of the virus in the community).
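The sensitivity analysis described above amounts to evaluating first and second derivatives of an outcome with respect to the parameters. The following sketch does this with central finite differences. The outcome function is a hypothetical stand-in with two parameters, chosen so that the derivatives can be checked by hand; it is not the generative model of this report, and the function names are illustrative.

```python
import math


def gradient(f, x, h=1e-4):
    """Central finite-difference gradient of a scalar function f at x."""
    g = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2.0 * h))
    return g


def hessian(f, x, h=1e-3):
    """Central finite-difference Hessian (second derivatives) of f at x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pp, pm, mp, mm = list(x), list(x), list(x), list(x)
            pp[i] += h
            pp[j] += h
            pm[i] += h
            pm[j] -= h
            mp[i] -= h
            mp[j] += h
            mm[i] -= h
            mm[j] -= h
            hess[i][j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h * h)
    return hess


def toy_cumulative_deaths(theta):
    """Hypothetical outcome: deaths fall with theta[0] (a protective factor)
    and rise in proportion to theta[1] (a fatality probability)."""
    protective, fatality = theta
    return 1000.0 * fatality * math.exp(-protective)


if __name__ == "__main__":
    theta = [0.5, 0.2]
    print("gradient:", [round(g, 3)
                        for g in gradient(toy_cumulative_deaths, theta)])
    for row in hessian(toy_cumulative_deaths, theta):
        print("hessian row:", [round(v, 3) for v in row])
```

A negative gradient entry corresponds to a parameter that decreases the outcome, while the off-diagonal entries of the Hessian quantify how the effect of one parameter depends upon another.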
It may be surprising to see that social distancing has such a small effect on total deaths (see upper panel in Figure 9).
However, the contribution of social distancing is in the context of how the epidemic elicits other responses; for example, increases in critical care capacity. Quantitatively speaking, increasing social distancing only delays the expression of morbidity in the population: it does not, in and of itself, decrease the cumulative cost (although it buys time to develop capacity, treatments, and primary interventions). This is especially the case if there is no effective limit on critical care capacity, because everybody who needs a bed can be accommodated. This speaks to the interaction between different causes or parameters in generating outcomes. In the particular case of the UK, the results in Figure 4 suggest that although social distancing is in play, self-isolation appears limited. This is because the number of contacts at home is relatively high (at over five); thereby attenuating the effect of social distancing.
In other words, slowing the spread of the virus depends upon reducing the number of contacts by social distancing. However, this will only work if there is a notable difference between the number of contacts at home and work. One can illustrate this by simulating the effects of social distancing, when it makes a difference. Figure 10 reproduces the results in Figure 8 but for 16 different levels of the social distancing parameter, while using the posterior expectation for contacts at home (of about four) from the Bayesian parameter average. Social distancing is expressed in terms of the probability of being found at home or work (see the panel labelled location). As we increase social distancing, the probability and duration of being at home during the outbreak increases. This flattens the curve of death rates per day from a peak of about 600 to a peak of about 400. This is the basis of the mitigation ('curve flattening') strategies that have been adopted worldwide. The effect of this strategy is to reduce cumulative deaths and prevent finite resources being overwhelmed. In this example, cumulative deaths fall from about 17,000 to 14,000, potentially saving about 3000 people. This is roughly four times the number of people who die in the equivalent period due to road traffic accidents. Interestingly, these (posterior predictive) projections suggest that social distancing can lead to an endgame in which not everybody has to be immune (see the middle panel labelled infection).
We now look at herd immunity using the same analysis. Figure 11 reproduces the results in Figure 10 using the United Kingdom posterior estimates, but varying the initial (herd) immunity over 16 levels from, effectively, 0 to 100%. The effects of herd immunity are marked, with cumulative deaths ranging from about 18,000 with no immunity to very small numbers with a herd immunity of about 70%. The broken red lines in the upper right panel are the number of people dying from seasonal influenza (as in Figure 7).
These projections suggest that there is a critical level of herd immunity that will effectively avert an epidemic; in virtue of reducing infection rates, such that the spread of the virus decays exponentially. If we now return to Figure 8, it can be seen that the critical level of herd immunity will, on the basis of these projections, be reached 2 to 3 weeks after the peak in death rates. At this point-according to the model-social distancing starts to decline as revealed by an increase in the probability of being at work. We will put some dates on this trajectory by expressing it as a narrative in the conclusion.
The key point to take from this figure is the quantification of uncertainty inherent in the credible intervals. In other words, uncertainty about the parameters propagates through to uncertainty in predicted outcomes. This uncertainty changes over time because of the nonlinear relationship between model parameters and ensemble dynamics. By model design, one can be certain about the final states; however, uncertainty about cumulative death rates itself accumulates. The mapping from parameters, through ensemble dynamics to outcomes is mediated by latent or hidden states. The trajectory of these states is illustrated in the next figure.
The expected death rate is shown in blue, new cases in red, predicted recovery rate in orange and CCU occupancy in yellow. The black dots correspond to empirical data. The lower four panels show the evolution of latent (ensemble) dynamics, in terms of the expected probability of being in various states. The first (location) panel shows that after about 5 to 6 weeks, there is sufficient evidence for the onset of an episode to induce social distancing, such that the probability of being found at work falls, over a couple of weeks to negligible levels. At this time, the number of infected people increases (to about 32%) with a concomitant probability of being infectious a few days later. During this time, the probability of becoming immune increases monotonically and saturates at about 20 weeks. Clinically, the probability of becoming symptomatic rises to about 30%, with a small probability of developing acute respiratory distress and, possibly death (these probabilities are very small and cannot be seen in this graph). In terms of testing, there is a progressive increase in the number of people tested, with a concomitant decrease in those untested or waiting for their results. Interestingly, initially the number of negative tests increases monotonically, while the proportion of positive tests starts to catch up during the peak of the episode. Under these parameters, the entire episode lasts for about 10 weeks, or less than three months. The broken red line in the upper left panel shows the typical number of CCU beds available to a well-resourced city, prior to the outbreak.
27 Note, only 2800 beds are ventilator/ITU beds.
28 We will use predictions-as opposed to projections-when appropriate, to emphasise the point that the generative model is not a timeseries model, in the sense that the unknown quantities (DCM parameters) do not change with time. This means that there is uncertainty about predictions in the future and the past, given uncertainty about the parameters (see Figure 7). This should be contrasted with the notion of forecasting or projection; however, predictions in the future, in this setting, can be construed as projections.
From a modelling perspective, the influence of initial herd immunity is important because it could form the basis of modelling the spread of the virus from one city to another-and back again. In other words, more sophisticated generative models can be envisaged, in which an infected person from one city is transported to another city with a small probability or rate.
Reciprocal exchange between cities (and ensuing 'second waves') will then depend sensitively on the respective herd immunities in different regions. Anecdotally, other major pandemics, without social isolation strategies, have almost invariably been followed by a second peak that is as high as (e.g., the 2009 H1N1 pandemic), or higher than, the first. Under the current model, this would be handled in terms of a second region being infected by the first city and so on; like a chain of dominos or the spread of a bushfire (Rhodes & Anderson, 1998; Zhang & Tang, 2016). Crucially, the effect of the second city (i.e., wave) on the first will be sensitive to the herd immunity established by the first wave. In this sense, it is interesting to know how initial levels of immunity shape a regional outbreak, under idealised assumptions. Figure 12 illustrates the interaction between immunity and viral spread as characterised by the effective reproduction rate, R (a.k.a. number or ratio); see (1.9). This figure plots the predicted death rates for the United Kingdom and the accompanying fluctuations in R and herd immunity, where both are treated as outcomes of the generative model. The key thing to observe is that with low levels of immunity, R is fairly high at around 2.5 (current estimates of the basic reproduction ratio R0 in the literature range from 1.4 to 3.9; see footnote 29). As soon as social distancing comes into play, R falls dramatically to almost 0. However, when social distancing is relaxed some weeks later, R remains low due to the partial acquisition of herd immunity, during the peak of the epidemic. Note that herd immunity in this setting pertains to, and only to, the effective or at-risk population: 80% herd immunity a few months from onset would otherwise be overly optimistic, compared to other de novo pandemics; e.g., Donaldson et al. (2009). On the other hand, an occult herd immunity (i.e., not accompanied by symptoms) is consistent with undocumented infection and rapid dissemination (Li et al., 2020). Note that this way of characterising the spread of a virus depends upon many variables (in this model, two factors and three parameters) and can vary from country to country. Repeating the above analysis for China gives a much higher initial or basic reproduction rate, which is consistent with empirical reports (Sanche et al., 2020).
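To build an intuition for why R can stay low after social distancing is relaxed, the following minimal sketch uses the textbook approximation in which the effective reproduction rate is the basic reproduction ratio scaled by the susceptible fraction. This is not the report's equation (1.9), which is not reproduced here; the function names and the `contact_scale` parameter (a crude stand-in for social distancing) are assumptions made purely for illustration.

```python
def effective_reproduction(r0, immune_fraction, contact_scale=1.0):
    """Textbook approximation: R = R0 * contact_scale * (1 - immune fraction).

    contact_scale is a hypothetical multiplier for social distancing
    (1.0 = normal contact rates, 0.0 = no contacts at all).
    """
    if not 0.0 <= immune_fraction <= 1.0:
        raise ValueError("immune_fraction must lie in [0, 1]")
    return r0 * contact_scale * (1.0 - immune_fraction)


def herd_immunity_threshold(r0):
    """Immune fraction at which R falls to 1 under normal contact rates."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    r0 = 2.5  # initial value of R quoted in the text
    for immune in (0.0, 0.2, 0.4, 0.6, 0.8):
        r = effective_reproduction(r0, immune)
        print(f"immune fraction {immune:.1f}: R = {r:.2f}")
    print(f"threshold for R < 1: {herd_immunity_threshold(r0):.2f}")
```

Under this simplification, an initial R of 2.5 falls below one once more than 60% of the (effective) population is immune, irrespective of social distancing.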

This concludes our characterisation of projections for what is likely to happen and what could happen under different scenarios for a particular country. In the final section, we revisit the confidence with which these posterior predictive projections can be made.

Predictive validity
Variational approaches of the sort described in this technical report use all the data at hand to furnish statistically efficient estimates of model parameters and evidence. This contrasts with alternative approaches based on cross-validation. In cross-validation schemes, model evidence is approximated by cross-validation accuracy. In other words, the evidence for a model is scored by the log likelihood that some withheld or test data can be explained by the model. Although model comparison based upon a variational evidence bound renders cross-validation unnecessary, one can apply the same procedures to demonstrate predictive validity. Figure 13 illustrates this by fitting partial timeseries from one country (Italy) using the empirical priors afforded by the parametric empirical Bayesian analysis. These partial data comprise the early phase of new cases. If the model has predictive validity, the ensuing posterior predictive density should contain the data that were withheld during estimation. Figure 13 presents an example of forward prediction over a 10-day period that contains the peak death rate. In this example, the withheld data are largely within the 90% credible intervals, speaking to the predictive validity of the generative model. There are two caveats here. First, similar analyses using very early timeseries from Italy failed to predict the peak, because of insufficient (initial) constraints in the data. Second, the credible intervals probably suffer from the well-known overconfidence problem in variational Bayes, and the implicit mean field approximation (MacKay, 2003; see also footnote 30).
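The two ways of scoring withheld data mentioned above (coverage by the 90% credible interval and the log likelihood of the test data) can be sketched as follows. This is a minimal illustration that assumes a Gaussian posterior predictive density for each withheld datapoint; the function names are hypothetical and the numbers in the usage example are placeholders, not results from the report.

```python
import math

# Half-width of a two-sided 90% interval, in standard deviations.
Z90 = 1.6448536269514722


def credible_interval(mean, sd, z=Z90):
    """Lower and upper bounds of a Gaussian credible interval."""
    return mean - z * sd, mean + z * sd


def coverage(withheld, means, sds, z=Z90):
    """Fraction of withheld datapoints that fall inside the interval."""
    inside = 0
    for y, m, s in zip(withheld, means, sds):
        lower, upper = credible_interval(m, s, z)
        inside += lower <= y <= upper
    return inside / len(withheld)


def log_predictive_score(withheld, means, sds):
    """Sum of Gaussian log densities of the withheld data."""
    total = 0.0
    for y, m, s in zip(withheld, means, sds):
        total += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((y - m) / s) ** 2
    return total


if __name__ == "__main__":
    # Placeholder values for illustration only.
    withheld = [10.0, 12.0, 30.0]
    means = [10.0, 11.0, 12.0]
    sds = [1.0, 1.0, 1.0]
    print("coverage:", coverage(withheld, means, sds))
    print("log score:", log_predictive_score(withheld, means, sds))
```

A coverage well below the nominal 90% would be one symptom of the overconfidence noted in the second caveat.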

Conclusions
We have rehearsed variational procedures for the inversion of a generative model of a viral epidemic, and have extended this model using hierarchical Bayesian inference (parametric empirical Bayes) to deal with the differential responses of each country, in the context of a worldwide pandemic. Clearly, this narrative is entirely conditioned on the generative model used to make these predictions (e.g., the assumption of lasting immunity, which may or may not be true). The narrative is offered in a deliberately definitive fashion to illustrate the effect of resolving uncertainty about what will happen. It has been argued that many deleterious effects of the [...] uncertainty about what will happen. This is a key motivation behind procedures that quantify uncertainty, above and beyond being able to evaluate the evidence for different hypotheses about what will happen.

29 The basic reproduction ratio is a constant that scores the spread of a contagion in a susceptible population. This corresponds to the effective reproduction ratio at the beginning of the outbreak, when everybody is susceptible. See Figure 12.

30 Note further that the credible intervals can include negative values. This is an artefact of the way in which the intervals are computed: here, we used a first-order Taylor expansion to propagate uncertainty about the parameters through to uncertainty about the outcomes. However, because this generative model is non-linear in the parameters, higher-order terms are necessarily neglected.

31 This narrative is not offered as a prediction, but as an example of the kind of predictions afforded by dynamic causal modelling. An aspect of these predictions is that they include systemic factors beyond the epidemiology per se. The best example of this is the above predictions about social distancing, which could be read as 'lockdown'; namely the probability that I will leave home. This highlights a key distinction between dynamic causal models and standard quantitative epidemiological modelling that treats things like 'lockdown' as interventions that are supplied to the model. In contrast, interventions such as social distancing and testing are modelled as an integral part of the process, and are estimated on the basis of the data at hand. One consequence of this is that one can make predictions about when 'interventions' (or their suspension) will occur in the future.
One aspect of this is reflected in rhetoric such as "there is no clear exit strategy". It is reassuring to note that, if one subscribes to the above model, there is a clear exit strategy inherent in the self-organised mitigation (footnote 32) afforded by herd immunity. For example, within a week of the peak death rate, there should be sufficient herd immunity to preclude any resurgence of infections in, say, London. The term 'self-organised' is used carefully here. This is because we are part of this process, through the effect of social distancing on our location, contact with infected people and subsequent dissemination of COVID-19. In other words, this formulation does not preclude strategic (e.g., nonpharmacological) interventions; rather, it embraces them as part of the self-organising ensemble dynamics (footnote 33).

Outstanding issues
This technical report describes an initial implementation of the DCM framework to provide a generative model of a viral epidemic, and to demonstrate the potential utility of such modelling. Clearly there are a number of ways this model could be refined. Our hope in making it open source is that it will allow others to identify issues, contribute to improvements and help facilitate objective comparisons with other models, using Bayesian model comparison.
There remain a number of outstanding issues. The generative model, at both the first and second level, needs to be explored more thoroughly. At the first level, this may entail the addition of other factors; for example, splitting the population into age groups or different classes of clinical vulnerability. Procedurally, this should be fairly simple, by specifying the DCM parameters for each age group (or cohort) separately and precluding transitions between age groups (or cohorts). One could also consider the fine graining of states within each factor; for example, making a more careful distinction between being in and not in critical care (e.g., being in self-isolation, being in a hospital, community care home, rural or urban location and so on). At the between-city or between-country level, the parameters of the general linear model could be easily extended to include a host of demographic and geographic independent variables. Finally, it would be fairly straightforward to use increasingly fine-grained outcomes, using regional timeseries, as opposed to country timeseries (these data are currently available from: https://github.com/CSSEGISandData/COVID-19).
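One way to picture the cohort extension described above is as a block-diagonal transition matrix: each age group has its own parameters, and the zero off-diagonal blocks preclude transitions between groups. The sketch below is a simplified, linear illustration in plain Python; the three-state infection chain, the function names and the parameter values are all hypothetical and are not taken from the DCM code.

```python
def cohort_transition(p_infect, p_recover):
    """Column-stochastic matrix over (susceptible, infected, immune).

    Columns index the current state and rows the next state.
    """
    return [
        [1.0 - p_infect, 0.0, 0.0],
        [p_infect, 1.0 - p_recover, 0.0],
        [0.0, p_recover, 1.0],
    ]


def block_diagonal(blocks):
    """Stack square matrices along the diagonal; off-diagonal blocks are zero."""
    size = sum(len(block) for block in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


def step(matrix, probs):
    """One update of the state probabilities: p_next = matrix * p."""
    return [sum(row[j] * probs[j] for j in range(len(probs))) for row in matrix]


if __name__ == "__main__":
    # Hypothetical cohort-specific parameters.
    younger = cohort_transition(p_infect=0.10, p_recover=0.30)
    older = cohort_transition(p_infect=0.05, p_recover=0.20)
    transition = block_diagonal([younger, older])

    # 60% of the population in the younger cohort, 40% in the older one.
    probs = [0.6, 0.0, 0.0, 0.4, 0.0, 0.0]
    for _ in range(10):
        probs = step(transition, probs)
    print("younger cohort mass:", sum(probs[:3]))
    print("older cohort mass:", sum(probs[3:]))
```

Because the off-diagonal blocks are zero, the probability mass assigned to each cohort is conserved over time, which is the sense in which transitions between cohorts are precluded.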
Figure 13. Predictive validity. This figure uses the same format as Figure 7; however, here, the posterior estimates are based upon partial data, from early in the timeseries for an exemplar country (Italy). These estimates were obtained under (parametric) empirical Bayesian priors. The red dots show outcomes that were not used to estimate the expected trajectories (and credible intervals). This example illustrates the predictive validity of the estimates for a 10-day period following the last datapoint, which capture the rise to the peak of new cases.

Another plausible extension to the hierarchical model is to include previous outbreaks of MERS and SARS (Middle East and Severe Acute Respiratory Syndrome, respectively) in the model. This would entail supplementing the timeseries with historical (i.e., legacy) data and replicating the general linear model for each type of virus. In effect, this would place empirical priors or constraints on any parameter that shares characteristics with MERS-CoV and SARS-CoV.

In terms of the model parameters, as opposed to model structure, more precise knowledge about the underlying causes of an epidemic will afford more precise posteriors. In other words, more information about the DCM parameters can be installed through adjusting the prior expectations and variances. The utility of these adjustments would then be assessed in terms of model evidence. This may be particularly relevant as reliable data about bed occupancy, proportion of people recovered, etc. becomes available.

A key aspect of the generative model used in this technical report is that it precludes any exogenous interventions of a strategic sort. In other words, the things that matter are built into the model and estimated as latent causes. However, prior knowledge about fluctuating factors, such as closing schools or limiting international air flights, could be entered by conditioning the DCM parameters on exogenous inputs. This would explicitly install intervention policies into the model. Again, these conditions would only be licensed by an increase in model evidence (i.e., through comparing the evidence for models with and without some structured intervention). This may be especially important when it comes to modelling future interventions, for example, a 'sawtooth' social distancing protocol. A simple example of this kind of extension would be including a time dependent increase in the capacity for testing: at present, constraints on testing rates are assumed to be constant.
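The comparison of models with and without a structured intervention can be illustrated with a short calculation that turns log evidences (e.g., variational free energies) into posterior model probabilities. This is a generic sketch that assumes equal prior probabilities over models; the function names and the two log evidence values are hypothetical placeholders, not outputs of the DCM.

```python
import math


def log_bayes_factor(log_evidence_a, log_evidence_b):
    """Difference in log evidence in favour of model A over model B."""
    return log_evidence_a - log_evidence_b


def posterior_model_probabilities(log_evidences):
    """Softmax of log evidences, assuming equal prior model probabilities."""
    peak = max(log_evidences)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in log_evidences]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    # Placeholder log evidences for models with and without an intervention.
    with_intervention = -1200.0
    without_intervention = -1203.0
    print("log Bayes factor:", log_bayes_factor(with_intervention, without_intervention))
    print("posterior probabilities:",
          posterior_model_probabilities([with_intervention, without_intervention]))
```

In this placeholder example, a difference in log evidence of three corresponds to a posterior probability of about 0.95 for the model with the intervention.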
A complementary approach would be to explore models in which social distancing depends upon variables that can be measured or inferred reliably (e.g., the rate of increase of people testing positive) and optimise the parameters of the ensuing model to minimise cumulative deaths. In principle, this should provide an operational equation that could be regarded as an adaptive (social distancing) policy, which accommodates as much as can be inferred about the epidemiology as possible.
A key outstanding issue is the modelling of how one region (or city) affects another-and how the outbreak spreads from region to region. This may be an important aspect of these kinds of models; especially when it comes to modelling second waves as 'echoes' of infection, which are reflected back to the original epicentre. As noted above, the ability of these echoes to engender a second wave may be sensitively dependent on the herd immunity established during the first episode. Herd immunity is therefore an important (currently latent or unobserved) state. This speaks to the importance of antibody testing in furnishing empirical constraints on herd immunity. In turn, this motivates antibody testing, even if the specificity and sensitivity of available tests are low. Sensitivity and specificity are not only part of generative models, they can be estimated along with the other model parameters. In this setting, the role of antibody testing would be to provide data for population modelling and strategic advice-not to establish whether any particular person is immune or not (e.g., to allow them to go back to work).
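To see why antibody tests with modest sensitivity and specificity can still constrain herd immunity at the population level, consider the standard (Rogan-Gladen) correction of apparent prevalence. Note that this sketch treats sensitivity and specificity as known constants, whereas the text above proposes estimating them alongside the other model parameters; the function names and example values are assumptions for illustration.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Expected proportion of positive tests given the true prevalence."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1.0 - specificity) * (1.0 - true_prevalence)
    return true_positives + false_positives


def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence, clipped to [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("the test must be better than chance")
    estimate = (apparent + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical test characteristics and seroprevalence.
    sensitivity, specificity = 0.70, 0.90
    observed = apparent_prevalence(0.20, sensitivity, specificity)
    print("apparent prevalence:", round(observed, 3))
    print("corrected prevalence:",
          round(corrected_prevalence(observed, sensitivity, specificity), 3))
```

The correction says nothing about whether any particular person is immune; it only recovers the population proportion, which is the quantity of interest for modelling.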
Finally, it would be useful to assess the construct validity of the variational scheme adopted in dynamic causal modelling, in relation to schemes that do not make mean field approximations. These schemes usually rely upon some form of sampling (e.g., Markov Chain Monte Carlo sampling) and cross-validation. Cross-validation accuracy can be regarded as a useful but computationally expensive proxy for model evidence and is the usual way that modellers perform automatic Bayesian computation. Given the prevalence of these sampling based (non-variational) schemes, it would be encouraging if both approaches converged on roughly the same predictions. The aim of this technical report is to place variational schemes on the table, so that construct validation becomes a possibility in the short-term future.

Software note
The figures in this technical report can be reproduced using annotated ( ., 2020)). The code is also compatible with GNU Octave 5.2. Details about future developments of the software will be available from https://www.fil.ion.ucl.ac.uk/spm/covid-19/.

Posthoc evaluation of model predictions
This epilogue was written three months after the report was submitted, providing an opportunity to revisit some of the predictions in light of actual outcomes. Although the predictions in this report were used to illustrate the nature of the predictions supported by models that included social distancing, they can be used to assess the predictive validity of the DCM.
Subsequently, the DCM was optimized using Bayesian model comparison. A crucial addition was the inclusion of heterogeneity in the response of the population to viral infection. However, even the simple DCM above accommodated sufficient heterogeneity-in terms of the distinction between an effective and total (census) population-to provide some accurate predictions.
In brief, the shape and timing of the epidemic in London were predicted to within a few days. Conversely, the numbers of fatalities and positive test results were overestimated by a factor of about 3. In what follows, we list the accurate and inaccurate predictions. We assume that the census population of London was 8.96 million (footnote 34). London's population is taken to be the effective population, estimated to be 2.49 million (see Table 2), and social distancing is read as lockdown (i.e., the probability of leaving home).

Accurate predictions
• "Based on current data, reports of new cases in London are expected to peak on April 5" Daily confirmed cases of coronavirus in London (and the UK) peaked on April 5 (footnote 35).
• "A peak in death rates around April 10 (Good Friday)" [...] This prediction corresponds to 8.9% = 32% x 2.49/8.96 of the census population of London, which coincides with the consensus estimates at that time. "Professor Chris Whitty admits he thinks at least 10% of the capital has been infected" (published on 24-April-2020) (footnote 38).
• "Improvements should be seen by May 8, shortly after the May bank holiday, when social distancing will be relaxed." On May 8, the first Black Lives Matter demonstrations started in London. This was followed by the first governmental relaxation of lockdown on May 10: "So, work from home if you can, but you should go to work if you can't work from home." (Prime Minister's address to the nation: 10-May-2020) (footnote 39).
• "At this time [May 8] herd immunity should have risen to about 80%" Population immunity in the effective population corresponds to 80% x 2.49/8.96 = 22% seroprevalence in the census population, which had risen to 17.5% in the previous week: "After making adjustments for the accuracy of the assay and the age and gender distribution of the population, the overall adjusted prevalence in London increased from 1.5% in week 13 to 12.3% in weeks 15 to 16 and 17.5% in week 18" (week ending May 3, 2020) (footnote 40).
• "By June 12, death rates should have fallen to low levels with over 90% of people being immune" Weekly reported deaths in London hospitals for the week ending June 11 fell to 22 (with positive tests) (footnote 41). Seroprevalence for this period was not reported.
• "By June 12, social distancing [lockdown] will no longer be a feature of daily life." The second governmental relaxation of lockdown was announced on June 10 and June 23, with an initial reopening of shops, and an easing of the two-metre social distancing rule: "[A]s the Business Secretary confirmed yesterday, we can now allow all shops to reopen from Monday." (Prime Minister's statement at the coronavirus press conference: 10-June-2020) (footnote 42). "Thanks to our progress, we can now go further and safely ease the lockdown in England. At every stage, caution will remain our watchword, and each step will be conditional and reversible. Mr Speaker, given the significant fall in the prevalence of the virus, we can change the two-metre social distancing rule, from 4th July." (Prime Minister's statement to the House: 23-June-2020) (footnote 43).

Inaccurate predictions
[...] These were overestimates; daily deaths in London peaked at 249 on April 9, with cumulative deaths at the time of writing (17-July-2020) of 6,106 (footnote 45). This represents consistent overestimates by factors of 3.2 and 2.8, respectively. This may reflect the fact that the data used in the report included regions in the United Kingdom outside London.
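The rescaling from the effective population to the census population, used in the accurate predictions above, is simple arithmetic and can be checked with a few lines. The sketch below uses the effective population (2.49 million) and census population (8.96 million) stated earlier in this section; the function name is hypothetical.

```python
EFFECTIVE_POPULATION_MILLIONS = 2.49  # estimated effective population
CENSUS_POPULATION_MILLIONS = 8.96     # assumed census population of London


def census_fraction(effective_fraction,
                    effective=EFFECTIVE_POPULATION_MILLIONS,
                    census=CENSUS_POPULATION_MILLIONS):
    """Convert a proportion of the effective population to the census population."""
    return effective_fraction * effective / census


if __name__ == "__main__":
    infected_at_peak = census_fraction(0.32)
    immune_by_may_8 = census_fraction(0.80)
    print(f"infected at peak: {100 * infected_at_peak:.1f}% of census population")
    print(f"immune by May 8: {100 * immune_by_may_8:.1f}% of census population")
```

This reproduces the figures of 8.9% and (to the nearest percent) 22% quoted in the list.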

Software availability
Software is available from: https://www.fil.ion.ucl.ac.uk/spm/covid-19/.

This technical report presents a dynamical causal model of the transmission dynamics of COVID-19. I believe this paper is one of very few (if any) that follow this type of approach, which makes it interesting and an important contribution to the literature even after dozens of modeling papers on the topic have been published or are in the process of publication. The paper is well described and the results are interesting and present a new approach for assessing the role of multiple factors on the spread of COVID-19. However, the epidemic has advanced significantly, and it would be good to see how the results and perspectives are shaped by more recent data. Authors should consider updating the paper with the most recent data available, and discuss how their analysis/conclusions are shaped by integrating additional data.

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes

We have tried to revise the paper to preserve its original content (by limiting changes to the main text to clarify and unpack things). We have used new footnotes no. 11, 13, 16, 24, 31, and 33 and a new section "Posthoc evaluation of model predictions" to address issues that have arisen since submission (for example, the validity of predictions in light of actual outcomes).
We hope this revised version is helpful in further clarifying our new approach.

This is an interesting and expansive modelling paper from a group of scientists that do not primarily focus on modelling infectious diseases. I think contributions to epidemiology from other fields should always be welcomed and this is no exception. The techniques employed in this paper are less of a different type of model and more of an entirely different modelling framework. As such, I see part of my job in this review as trying to bridge the gaps between the language and techniques of dynamic causal modelling and infectious disease modelling. Hopefully in doing so I am able to present any criticism in a way that both the authors and other infectious disease modellers are able to follow and understand. The dynamic causal model developed in this paper can be understood roughly as a stochastic compartmental SEIR model that has 1) a "generative" model that describes movement between unobserved states over time (infection, recovery etc) and 2) an "observational model" that describes the likelihood for the parameter values in the generative model given the observed data (in this case daily deaths and positive tests). The generative model has four components: location, which determines where you are and the contacts you make; infection, which is akin to the susceptible-exposed-infectious-recovered model used commonly; clinical, which determines the clinical presentation should you become infected; testing, which links your current infection status to the result of a swab test. You can be at various states within any of these four components at once; for example, I could be an asymptomatic, infectious person at work that has not been tested.
How I move between these states is governed by a matrix of probabilities that can be non-linear in time and as a response to feedback from other parameters within the model (for example my probability of observing lockdown can grow as more people die during the outbreak).
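The factorial structure summarised above can be sketched in a few lines: the joint state space is the Cartesian product of the four factors, and a factor's transition probabilities can depend on the current ensemble state. In the sketch below, the state labels, the reduced two-state location factor, the function names and all parameter values are assumptions chosen for illustration; they are not the report's parameterisation.

```python
from itertools import product

# Hypothetical state labels; the location factor is reduced to two states.
FACTORS = {
    "location": ["home", "work"],
    "infection": ["susceptible", "infected", "infectious", "immune"],
    "clinical": ["asymptomatic", "symptomatic", "ARDS", "deceased"],
    "testing": ["untested", "waiting", "positive", "negative"],
}


def joint_states(factors=FACTORS):
    """Every combination of one state per factor."""
    return list(product(*factors.values()))


def infection_transition(p_infectious, transmission=1.5, p_progress=0.25, p_clear=0.2):
    """Column-stochastic matrix for the infection factor.

    The probability of becoming infected grows with the proportion of
    infectious people, which makes the dynamics nonlinear.
    """
    p_infect = min(1.0, transmission * p_infectious)
    return [
        [1.0 - p_infect, 0.0, 0.0, 0.0],
        [p_infect, 1.0 - p_progress, 0.0, 0.0],
        [0.0, p_progress, 1.0 - p_clear, 0.0],
        [0.0, 0.0, p_clear, 1.0],
    ]


def step(probs):
    """Update the infection-state probabilities by one time step."""
    matrix = infection_transition(probs[2])
    return [sum(matrix[i][j] * probs[j] for j in range(4)) for i in range(4)]


def simulate(initial, n_steps):
    """Trajectory of infection-state probabilities over n_steps updates."""
    trajectory = [list(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


if __name__ == "__main__":
    print("number of joint states:", len(joint_states()))
    final = simulate([0.99, 0.0, 0.01, 0.0], 100)[-1]
    print("final infection-state probabilities:", [round(p, 3) for p in final])
```

The reviewer's example of an asymptomatic, infectious person at work who has not been tested is one element of this joint state space.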
I think ultimately the generative model is comparable to a complicated SEIR model and the next step in the mind of an infectious disease modeller is to use the likelihood from the observational model in a fitting method such as MCMC to generate samples from the posterior distribution of the generative model parameters. Instead, dynamic causal modelling has a developed body of theory that allows for approximation of the analytical solution to the posterior of the model parameters that maximises the model evidence (marginal likelihood). This allows for immediate comparison of different generative model structures on the same data through selecting the model with the optimal log model evidence, which is also referred to as "variational free energy" (a similar process to the commonly used Akaike or Bayesian information criterion). This was refreshing to me as it can sometimes be difficult to obtain AIC/BIC after fitting your model depending on how you have fit it, such as in the probabilistic modelling language Stan where you sometimes need to calculate the leave-one-out information criterion (LOO-IC) yourself.
Another interesting methodological addition from the dynamic causal modelling framework is fitting the model to data from several different countries and then assigning model parameters as fixed or random effects, using a generalised linear model to estimate the between-country effects of certain covariates. In the manuscript the authors show the results of this process, finding a relationship between the latitude of a country and the effective population size of the outbreak inferred by the model. While, as the authors acknowledge, latitude here is very likely a proxy for other socio-economic variables, this approach could potentially yield interesting results using a wider selection of between-country effects or as a heuristic device to try and understand what factors are driving the model fit in each country. This is complemented with a technique called "Bayesian model reduction", which efficiently prunes redundant parameters out of the model to simultaneously achieve model parsimony and perform a sensitivity analysis of sorts since it involves fixing the prior of each parameter and looking at the difference in model fit. To me, the framework of dynamic causal modelling seems to make available several tools that should be of interest to infectious disease modellers. It is not the case that infectious disease modellers don't already try to reduce models or compare them between countries, but what is attractive about the dynamic causal modelling approach is the coherency of the framework and the availability of software to perform the methods for models in general (although I think most infectious disease modellers would prefer to use R rather than MATLAB). At the very least, the methods employed in the dynamic causal modelling framework could be adapted to work with the more familiar combined compartmental model and MCMC approach. 
The methods in the dynamic causal modelling framework are heavily used and accepted in the field of neuroscience, so I don't think it's my job in this review to scrutinise them in particular outside of understanding them to the point where I can understand how the model in this particular paper was fitted.
With the general modelling approach summarised I can move on to the specifics of the structure of the generative and observation models:

Predictive validity
In a similar way to AIC/BIC, I think I am correct in thinking that model selection using variational free energy only provides a relative score of model fit and not an objective score. Choosing the best model out of a suite of models does not guarantee that this best model fits well, for this we need to turn to predictive validity and this is where I think the model laid out in the paper is at its weakest. Below is the best-fitting model's prediction for London in full: "Based on current data, reports of new cases in London are expected to peak on April 5, followed by a peak in death rates around April 10 (Good Friday). At this time, critical care unit occupancy should peak, approaching-but not exceeding-capacity, based on current predictions and resource availability. At the peak of death rates, the proportion of people infected (in London) is expected to be about 32%, which should then be surpassed by the proportion of people who are immune at this time. Improvements should be seen by May 8, shortly after the May bank holiday, when social distancing will be relaxed. At this time herd immunity should have risen to about 80%, about 12% of London's population will have been tested. Just under half of those tested will be positive. By June 12, death rates should have fallen to low levels with over 90% of people being immune and social distancing will no longer be a feature of daily life." It's quite hard to tell if we are meant to interpret this as an example of what sort of narrative could be derived from the results of the model, or whether this is a genuine model prediction. If it is the latter, then I would expect to see mention of when the prediction was made, as well as plots showing the prediction (shown in Figures 12 and 6) against the data which is now available. The authors do this for their predictions for Italy ( Figure 13) but not London. I am writing this review on June 10th and at the time of writing the number of deaths on the 9th June was 286. 
Without numbers given for the prediction, it's hard to know whether this counts as "low levels" or not; the 8th June was the beginning of week 24, and the corresponding prediction of daily deaths in Figure 12 is near zero.

Effective population size
Perhaps more concerning than the prediction for deaths is the prediction for immunity. In the paper I find it quite difficult to tell what exactly is being spoken about when it comes to immunity. The model fits a parameter called "effective population" (θN) that I think could do with some further explanation; it seems to be the case that immunity is presented as the number of infections inferred by the model divided by the effective population. When the model was fitted to UK data it inferred an effective population size of ~2.5 million people. It's quite hard to tell, but from Figure 8, looking at the cumulative cases inferred by the model and the proportion of the population entering the immune category, it seems like the model has predicted that nearly all of the 2.5 million people in the effective population are now immune.
Here is what the authors say about the effective population parameter: "In this technical report, we will choose a simpler option that treats a pandemic as a set of linked point processes that can be modelled as rare events. In other words, we will focus on modelling a single outbreak in a region or city and treat the response of the 'next city' as a discrete process post hoc. This simplifies the generative model; in the sense we only have to worry about the ensemble dynamics of the population that comprises one city. A complimentary perspective on this choice is that we are trying to model the first wave of an epidemic as it plays out in the first city to be affected. Any second wave can then be treated as the first wave of another city or region. Under this choice, the population size can be set, a priori, to 1,000,000; noting that a small city comprises (by definition) a hundred thousand people, while a large city can exceed 10 million. Note that this is a prior expectation, the effective population size is estimated from the data: the assumption that the effective population size reflects the total population of a country is a hypothesis (that we will test later)." It is true that you can use a model with a population size under 67 million, look at the dynamics of the outbreak from the model output, and infer things about the potential effectiveness of social distancing, eventual likelihood of herd immunity, and so on, that would be true in a larger population. However, you would not fit a model to death data for all of the UK using a population parameter that is smaller than the population of the UK. I think the model output as shown in the manuscript is a best guess at the outbreak dynamics if the number of deaths and cases observed in a place with a population of 67 million people were instead observed in a place with a population of 2.5 million.
As a result of fitting to death rates for a population 30 times bigger than the one in your model, you would expect to find that almost everyone is infected.
Since the writing of this manuscript, serological studies have started to emerge which estimate the percentage of the population that have been infected (which would correspond to the immune compartment in the model). On the 24th May the ONS estimated that around 7% of the UK have antibodies for COVID-19, rising to 17% in London. Even acknowledging that serology studies are not perfect and that the ones performed so far have been quite small scale, this is really quite a different picture than the 90% population immunity presented by the model output.
The picture is similar in serological studies across the world, even in healthcare workers in hard-hit cities like Barcelona that would have faced constant exposure to infection.
What is the result of fitting the model to UK deaths and reported cases with a fixed, actual value for the effective population? Or at least using the UK population as the prior value? I think either a) the model output should be more clearly presented as an example or b) you should acknowledge that the model output gives predictions that seem very different from the emerging evidence.

CCU fatality
The fitted probability that a person dies given that they are in the CCU (θfat) for China and Italy is very high (nearly 100% and well over 50%). How well does this compare to actual observed mortality rates in CCUs?
For example, this paper (ref. 1) found 26% mortality in ICUs in Lombardy, Italy in early March.

Observation model
The UK data collated by the Johns Hopkins COVID-19 data repository that the authors use fetches data from here. The observation model could be improved by including a delay between the actual occurrence of death and its eventual reporting in the official statistics; sometimes it can take a couple of days for deaths to appear in the government figures. I think this could interfere with the model fit as it tries to align deaths and reported cases (which it currently reasons have both happened on that day).
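The kind of reporting delay suggested here could, for instance, be added to an observation model by convolving the modelled daily deaths with a delay distribution. The following is a minimal sketch of that idea; the function name and the delay probabilities are hypothetical and are not estimates from any data.

```python
def apply_reporting_delay(daily_deaths, delay_pmf):
    """Spread each day's deaths over later reporting days.

    delay_pmf[k] is the assumed probability that a death is reported
    k days after it occurs; the probabilities must sum to one.
    """
    if abs(sum(delay_pmf) - 1.0) > 1e-9:
        raise ValueError("delay_pmf must sum to one")
    reported = [0.0] * (len(daily_deaths) + len(delay_pmf) - 1)
    for day, deaths in enumerate(daily_deaths):
        for lag, probability in enumerate(delay_pmf):
            reported[day + lag] += deaths * probability
    return reported


if __name__ == "__main__":
    # Hypothetical delay: half of deaths reported the same day,
    # 30% one day later and 20% two days later.
    occurred = [0.0, 10.0, 20.0, 10.0, 0.0]
    print(apply_reporting_delay(occurred, [0.5, 0.3, 0.2]))
```

The total number of deaths is preserved, but the reported series is shifted and smoothed relative to the series of occurrences, which is the misalignment with reported cases that the delay would need to capture.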

Reporting structure
It is also important to consider the structure of the surveillance system when trying to fit to reported cases. In the UK for a good while tests were only undertaken on hospital admissions that were severe enough to warrant being admitted overnight (or at least that is what the official policy was). Other countries like South Korea had drive-through test centres. This is going to cause a huge discrepancy in how you should interpret changes in reported cases.

Between-country parameter value variation
It is strange that there is so much variation in some of the parameters between countries. For example, the contagious period is around 1 day in China but around 3-5 days in France? What is the biological reasoning behind this? Arguably there could be some genetic variation in the virus between countries but could that cause such a significant difference? Is there any empirical evidence that supports differences in how long you are contagious between countries?
The same goes for the numbers of contacts at home or contacts at work. People in the United Kingdom are estimated to have around 7 contacts at home, but the average size of UK households is just 2.3. It would be good to link the output of these variables to any empirical data that is available to show that they are meaningful and do actually correspond to whatever data might be available. One of the countries with the lowest effective contacts in the household (~ 1.5) has a higher average household size (2.5) than the UK.
The variable for the probability of infection given contact (θtrn) is fairly stable apart from China and Australia where it is relatively large and small, respectively. Do the authors have any thoughts why this might be the case?

Age structure
The model does not include any kind of age structure. Age has a large effect on the fatality of infection and should therefore be accounted for. Countries with an older population would likely see a higher fatality rate. Age could also influence the amount and types of contacts that people make, with more intergenerational contacts happening within the home and more intragenerational contacts happening at work or school.

Summary
The model described in this paper is an interesting and important first step towards a model of infectious disease dynamics within the framework of dynamic causal modelling. However, when the particular model here is fit to data, I don't think it shows that it has captured the dynamics of the outbreak well wherever it can be compared to separately collected data such as seroprevalence or CCU mortality.
I think what has happened in the model fitting process, for the most part, is that the variation introduced into the time series of deaths and reported cases by differing surveillance and reporting structures, testing regimes, outbreak responses, and population demographics between countries has been absorbed by the generative model through between-country variation in parameters such as the effective population size, numbers of contacts at work (for example, do most people in China really have between 100 and 150 effective contacts at work?), CCU fatality, contagious period length, and others. The unfortunate reality is that with a flexible enough model (in terms of numbers of parameters) it is always possible to produce a fit that very closely matches the reported case and death data observed so far. The real test for this model is whether the estimated parameter values that can be compared to other sources of data match what we observe empirically, and I think it is fairly obvious that this has not happened.
Sadly, I don't think that I can recommend this paper for indexing as it currently stands because I don't think it is clear what it is trying to do. I think the easiest way of resolving this problem is for the authors to ask themselves the question "Do I think the model predictions made for the UK in this paper are plausible or are they examples of predictions that can be made from the model?". If the predictions are examples, then this paper is an introduction to disease modelling using dynamic causal modelling and the predictions should be more clearly labelled as examples. The paper could then be further improved by showing how methods such as the between-country parameter comparisons using the hierarchical GLM correspond to the types of questions that disease modellers want to answer. Alternatively, if the authors do think that the predictions made in this paper are accurate, then they need to be far more stringent in comparing their predictions with data that have become available since they were made, and they have questions to answer regarding the gap between the 90% immunity in London that they predict and the 17% that has been estimated by the ONS. That London may have already reached herd immunity has huge implications for future intervention policies, the most significant being that there is no danger of a second wave. If we behave as if there is 90% immunity (completely end social distancing etc.) but we are in fact well below herd immunity, then we will likely have caused the second wave through our own actions.

Recommendations in brief:
Compare

We would like to thank you for the considerable time and effort you have spent reviewing our manuscript. Your thoroughness and attention to detail, in what must be very busy and challenging times, have been very much appreciated. We were particularly impressed with the summary of the technical aspects of this work, which are useful and informed descriptions in their own right.
We have tried to revise the paper to preserve its original content (by limiting changes to the main text to those that clarify and unpack things). We have used footnotes and a new section to address issues that have arisen since submission (for example, the validity of predictions in light of actual outcomes).
Below are the replies to the comments, which for clarity we have grouped into key themes. We hope these revisions are what you had in mind:

A) Predictive validity:
The primary purpose of this paper was to serve as a technical report, introducing a methodology that could be, and was, used to answer specific questions about epidemiological parameters and epidemiological model structure. To clarify this, we have emphasised that the narrative at the end of the paper is an example of the kind of predictions that can be made, rather than a definitive prediction per se (footnote 31):
"This narrative is not offered as a prediction, but as an example of the kind of predictions afforded by dynamic causal modelling. An aspect of these predictions is that they include systemic factors beyond the epidemiology per se. The best example of this is the above predictions about social distancing, which could be read as 'lockdown'; namely the probability that I will leave home. This highlights a key distinction between dynamic causal models and standard quantitative epidemiological modelling that treats things like 'lockdown' as interventions that are supplied to the model. In contrast, interventions such as social distancing and testing are modelled as an integral part of the process, and are estimated on the basis of the data at hand. One consequence of this is that one can make predictions about when 'interventions', or their suspension, will occur in the future."
Regarding specific predictive validity, we thought it would be disingenuous to change the predictions in light of subsequent outcomes, or the procedures that were applied in subsequent reports. However, we have now added an extensive 'Posthoc evaluation of model predictions' section in the revised version that addresses the predictions in light of current data. This section implicitly addresses the specific points about predictions in the reviewers' comments.
We have also attempted to make the demarcation between a procedural and predictive contribution clearer throughout the text by including footnotes like the following (footnote 33): "To reiterate, the purpose of this technical report was to introduce the variational procedures entailed by dynamic causal modelling in the setting of quantitative, epidemiological modelling. Since this report was submitted, several papers have used procedures described in this report to address specific questions; for example, the impact of lockdown cycles, the effect of population fluxes among regional outbreaks, the efficacy of testing and tracing, and the impact of heterogeneous susceptibility and transmission. Crucially, in line with a key message of this foundational paper, each successive application of the dynamic causal modelling leveraged Bayesian model comparison to update the model as new data became available." We also take the opportunity to future-proof retrospective evaluations of the reproduction ratio with the following footnote 13: "Added in revision: the reproduction ratio in this report was based upon an approximation to the expected number of people that I might infect, if I was infectious. In subsequent reports, the reproduction ratio was brought into line with more formal definitions, based on the geometric rate of increase in the prevalence of infection and the period of contagion. A minimum reproduction ratio (R) of nearly zero in this report corresponds to about 0.7 in subsequent (and other) reports." In addition to these, we have also incorporated a number of additional changes outlined below.

B) Effective population
It is clear that the "effective population" terminology, particularly with respect to immunity, represents a common source of confusion. To rectify this, we have made a number of changes throughout the paper. First, we have amended the "Initial Conditions and population size" section, splitting it and introducing a new subsection as follows:
We have also annotated the legend to figure 11, and made the following change to immunity predictions, to clarify this further:

C) Improvements to model:
We appreciate the number of suggestions to help refine or improve this model further. As surmised in the "Predictive Validity" section of your review, this report provides an initial technical description of the kind of analyses that could be performed with the presented methodology. In a sense, it represents a proof of concept for this type of modelling, and we acknowledge there are many directions and improvements that could be made, such as: "There remain a number of outstanding issues:"
Additionally, in a separate piece of work [1] we have also formally compared an ODE-based SEIR model to the DCM presented here. The SEIR model was originally developed by Moghadas et al. [2] to assess CCU projections due to COVID-19 in the US. It comprised 12 states, including asymptomatic and subclinical infected states, self-isolation, and separate states of hospitalization [2]. We optimised parameters for both the SEIR and DCM using identical variational procedures to those presented here. Taking data from seven European countries including the UK, we found that the approximate model evidence (free energy) provided very strong support for the DCM as compared to the SEIR model, suggesting that marginal state occupancy was important when accounting for those data. In particular, log Bayes factors of >100 were found for all seven datasets. This comparative analysis is currently under review.
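To make the comparison concrete, the following minimal sketch shows how a log Bayes factor and posterior model probabilities follow from the free energies of two models, treating free energy as an approximation to log model evidence and assuming equal prior probabilities for the models. The free energy values are hypothetical and are not taken from the comparative analysis.

```python
import math


def log_bayes_factor(free_energy_a, free_energy_b):
    """Log Bayes factor (in nats) in favour of model A over model B,
    using free energy as an approximation to log model evidence."""
    return free_energy_a - free_energy_b


def posterior_model_probabilities(free_energies):
    """Softmax over free energies, assuming equal prior model probabilities."""
    peak = max(free_energies)  # subtract the maximum for numerical stability
    weights = [math.exp(f - peak) for f in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical free energies (nats) for one country's data; illustrative only.
F_DCM = -1200.0
F_SEIR = -1320.0

log_bf = log_bayes_factor(F_DCM, F_SEIR)
p_dcm, p_seir = posterior_model_probabilities([F_DCM, F_SEIR])
print(f"log Bayes factor (DCM vs SEIR): {log_bf:.1f}")
print(f"posterior probability of DCM:   {p_dcm:.6f}")
```

For orientation, a log Bayes factor of 3 corresponds to odds of about 20 to 1 (since exp(3) is roughly 20), so differences above 100 leave essentially no posterior probability for the weaker model.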

D) Observation model
We thank the reviewer for highlighting this. We are aware that delays in reporting deaths and reporting of statistics over weekends do represent potential confounds to the observed time series data. In this work, we perform smoothing of time series by several days to deal with these delays in reporting. Delays in reporting PCR testing were modelled explicitly in terms of a 'waiting for a test' state because entry into this state depends upon testing capacity. Conversely, a simple delay in reporting a death can be accommodated by an increase in effective dwell time in critical care. One could consider a DCM that modelled the delay in reporting deaths explicitly-and then use Bayesian model comparison to compare models with and without delays. We did not do this; however, the conditional dependencies between an additional delay parameter and the existing parameters would probably reduce the marginal likelihood (i.e., Bayesian model evidence) of an extended DCM.
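The smoothing step can be illustrated with a minimal sketch. The kernel used in the report is not specified here, so a simple centred moving average is assumed; the function name `moving_average`, the window length, and the daily counts are hypothetical.

```python
def moving_average(series, window):
    """Centred moving average over `window` days.

    The window is truncated at the edges, so the output has the same length
    as the input. This is an assumed smoother, not necessarily the one used
    in the report.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical daily reported deaths with a weekend dip on days 3 and 4.
daily_deaths = [10, 12, 14, 6, 5, 18, 20]
smoothed = moving_average(daily_deaths, window=3)
print([round(s, 1) for s in smoothed])
```

A wider window suppresses weekend effects more strongly, at the cost of blurring the timing of the peak.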

E) Reporting structure
We agree that differences in testing and reporting strategies will impact the data. In the model presented, the testing rate parameter accounts for some of these differences. We have added the following footnote 11 to emphasise the importance of this part of the model.

F) CCU fatality/Between-country parameter value variation
First, a disclaimer: these assertions (for example, Fig. 6, showing differences among countries) are not about actual states of affairs. These are the best explanations for the data available at the time, under the simplest model of how those data were caused. However, there does appear to be some degree of predictive validity; for example, the predicted CCU mortality rate in the UK in April (at the time of writing of the paper) of about 48% was close to data published on the 4th April by the Intensive Care National Audit and Research Centre (critical care mortality = 50.1% [3]). Regarding the Italian data from Lombardy, whilst the mortality rate was lower (26%), the data were acquired earlier on in the pandemic (February 20 to March 18), before the peak in cases. Rather than dissect the predictive validity of each parameter and country, which is widely recognised as an extremely challenging problem [4], we would reiterate that this paper is intended as a technical report for DCM, and provides examples of the types of questions that could be addressed using this method. To clarify these points, we have modified the following in the "Parametric empirical Bayes and hierarchical modelling" section:

[5] https://en.wikipedia.org/wiki/Greater_London

Posthoc evaluation of model predictions
This section was written three months after the report was submitted, providing an opportunity to revisit some of the predictions in light of actual outcomes. Although the predictions in this report were used to illustrate the nature of the predictions supported by models that included social distancing, they can be used to assess the predictive validity of the DCM.
Subsequently, the DCM was optimized using Bayesian model comparison. A crucial addition was the inclusion of heterogeneity in the response of the population to viral infection. However, even the simple DCM above accommodated sufficient heterogeneity-in terms of the distinction between an effective and total (census) population-to provide some accurate predictions.
In brief, the shape and timing of the epidemic in London were predicted to within a few days. Conversely, the number of fatalities and positive test results were overestimated by a factor of about 3. In what follows, we list the accurate and inaccurate predictions. We assume that the census population of London was 8.96 million [1]. London's population is taken to be the effective population, estimated to be 2.49 million (see Table 2), and social distancing is read as lockdown (i.e., the probability of leaving home).
"At the peak of death rates [April 10], the proportion of people infected (in London) is expected to be about 32%"
○ This prediction corresponds to 8.9% = 32% x 2.49/8.96 of the census population of London, which coincides with the consensus estimates at that time. "Professor Chris Whitty admits he thinks at least 10% of the capital has been infected" (published on 24-April-2020) [5].
"Improvements should be seen by May 8, shortly after the May bank holiday, when social distancing will be relaxed."
○ On May 8, the first Black Lives Matter demonstrations started in London. This was followed by the first governmental relaxation of lockdown on May 10: "So, work from home if you can, but you should go to work if you can't work from home." (Prime Minister's address to the nation: 10-May-2020) [6]
"At this time [May 8] herd immunity should have risen to about 80%"
○ Population immunity in the effective population corresponds to 80% x 2.49/8.96 = 22% seroprevalence in the census population, which had risen to 17.5% in the previous week: "After making adjustments for the accuracy of the assay and the age and gender distribution of the population, the overall adjusted prevalence in London increased from 1.5% in week 13 to 12.3% in weeks 15 to 16 and 17.5% in week 18" (week ending May 3, 2020) [7].
"By June 12, death rates should have fallen to low levels with over 90% of people being immune"
○ Weekly reported deaths in London hospitals for the week ending June 11 fell to 22 (with positive tests) [8]. Seroprevalence for this period was not reported.
"By June 12, social distancing [lockdown] will no longer be a feature of daily life."
○ The second governmental relaxation of lockdown was announced on June 10 and June 23, with an initial reopening of shops, and an easing of the two-metre social distancing rule: "[A]s the Business Secretary confirmed yesterday, we can now allow all shops to reopen from Monday." (Prime Minister's statement at the coronavirus press conference: 10-June-2020) [9] "Thanks to our progress, we can now go further and safely ease the lockdown in England. At every stage, caution will remain our watchword, and each step will be conditional and reversible. Mr Speaker, given the significant fall in the prevalence of the virus, we can change the two-metre social distancing rule, from 4th July." (Prime Minister's statement to the House: 23-June-2020) [10]

Inaccurate predictions
"About 12% of London's population will have been tested (May 8). Just under half of those tested will be positive."
○ This was an overestimate: 12% of the effective population corresponds to 143,424 = 12% x 0.48 x 2.49 million positive tests. At the time of writing (17-July-2020), only 34,397 people have tested positive in London [11], about a quarter of the predicted number.
From Figure 8: Peak daily death rate 807 (710-950) with cumulative deaths of 17,500 (14,000-21,000)
○ These were overestimates; daily deaths in London peaked at 249 on April 9, with cumulative deaths at the time of writing (17-July-2020) of 6,106 [12]. This represents consistent overestimates by factors of about 3.2 and 2.9, respectively. This may reflect the fact that the data used in the report included regions in the United Kingdom outside London.
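The conversions used in the list above can be reproduced with a short sketch. It only restates arithmetic already given in the text: proportions of the effective population are rescaled to the census population, and predicted values are divided by observed values. The variable and function names are illustrative.

```python
CENSUS_POPULATION = 8.96e6     # census population of London, as assumed above
EFFECTIVE_POPULATION = 2.49e6  # effective population quoted from Table 2


def to_census_fraction(effective_fraction):
    """Rescale a proportion of the effective population to the census population."""
    return effective_fraction * EFFECTIVE_POPULATION / CENSUS_POPULATION


infected_at_peak = to_census_fraction(0.32)  # 32% of the effective population
immune_on_may_8 = to_census_fraction(0.80)   # 80% of the effective population

# 12% of the effective population tested, with 48% of those testing positive.
predicted_positive_tests = 0.12 * 0.48 * EFFECTIVE_POPULATION

# Ratios of predicted to observed values, using the figures quoted above.
peak_death_ratio = 807 / 249
cumulative_death_ratio = 17_500 / 6_106
positive_test_ratio = predicted_positive_tests / 34_397

print(f"infected at peak (census):  {infected_at_peak:.1%}")
print(f"immune on May 8 (census):   {immune_on_may_8:.1%}")
print(f"predicted positive tests:   {predicted_positive_tests:,.0f}")
print(f"peak daily deaths ratio:    {peak_death_ratio:.2f}")
print(f"cumulative deaths ratio:    {cumulative_death_ratio:.2f}")
print(f"positive tests ratio:       {positive_test_ratio:.2f}")
```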