Dynamic survival analysis for non-Markovian epidemic models

We present a new method for analysing stochastic epidemic models under minimal assumptions. The method, dubbed dynamic survival analysis (DSA), is based on a simple yet powerful observation, namely that population-level mean-field trajectories described by a system of partial differential equations may also approximate individual-level times of infection and recovery. This idea gives rise to a certain non-Markovian agent-based model and provides an agent-level likelihood function for a random sample of infection and/or recovery times. Extensive numerical analyses on both synthetic and real epidemic data from foot-and-mouth disease in the UK (2001) and COVID-19 in India (2020) show good accuracy and confirm the method’s versatility in likelihood-based parameter estimation. The accompanying software package gives prospective users a practical tool for modelling, analysing and interpreting epidemic data with the help of the DSA approach.


Introduction
The standard approach to building a stochastic compartmental epidemic model is to make use of a continuous-time Markov chain (CTMC) to keep track of the sizes of the compartments over time (e.g. number of individuals with different immunological statuses) using counting processes [1]. Following the random time change representation of Poisson processes [2,3], the trajectory equations for those counting processes are written in terms of independent, unit rate Poisson processes. When the size of the population under consideration is large, those counting processes, appropriately scaled, converge to deterministic, continuous real-valued functions satisfying certain ordinary differential equations (ODEs) by virtue of the functional law of large numbers (FLLN) for Poisson processes [4,5]. This convergence provides a link between the stochastic and the deterministic world and the limiting ODEs are often referred to as the mean-field equations in the literature. Famous examples include the classical Kermack-McKendrick equations for the susceptible-infected-recovered (SIR) epidemic model [6].
However, this astounding popularity of the standard Markov models or the corresponding mean-field ODE models seems to belie their apparent lack of faithfulness to the underlying biology of the disease. To quote van Kampen [7], 'Non-Markov is the rule, Markov is the exception'.
Indeed, the population count-based Markov models assume exponentially distributed inter-event times. As a consequence, the instantaneous rates of infection and recovery are assumed constant regardless of key epidemiologically relevant covariates, such as the age of infection (see §2), time since vaccination, etc. As shown in [8] (in particular, see table 1 and fig. 1), the estimates obtained by assuming a Markovian model when the underlying model is non-Markovian could be significantly biased. While there are more advanced stochastic models that do incorporate those covariates (as we will also do in this paper), those models are often fit to data in an ad hoc fashion; or are too computationally expensive to be useful for practical purposes. Our aim in this work is to build a principled and rigorous statistical approach to fitting those more advanced stochastic models to data without compromising on simplicity.
In this paper, we present a survival analytic approach, dubbed dynamic survival analysis (DSA), that constructs probability distributions of individual times of infection and recovery from population-level (mean-field) trajectory equations. In [9], a subset of the authors first employed this idea in the context of the classical Kermack-McKendrick Markovian (SIR) epidemics described by their mean-field ODEs. Here, we extend the idea to the vastly more realistic class of non-Markovian models that allow non-exponential contact interval [8] and infectious periods. The theoretical underpinning is laid down by an extension of the so-called Sellke construction [1,10], which we describe in detail in §2.2.
There are several advantages of DSA. First, DSA does not require knowledge of the size of the susceptible population, which is almost always unknown in real epidemics and often assumed to be the population of the entire city, state, or even a country. In fact, DSA not only avoids this ad hoc adjustment, but also provides a ready estimate of the effective population size, tracking of which could provide further insights into an ongoing epidemic. Second, DSA does not require the whole epidemic trajectory and works with only a random sample of infection and, if available, recovery times. Third, on the strength of its survival analytic foundation, DSA is able to handle censoring, truncation and aggregation of data (over time and population) in a straightforward manner. We illustrate some of these features of the DSA method below.
The rest of the paper is structured as follows. §2 describes the stochastic model in terms of measure-valued processes and the so-called Sellke construction, along with their large population mean-field approximations. In §3, we describe the DSA modelling approach in detail before conducting extensive numerical analysis in §4. We apply the DSA method to the epidemics of foot-and-mouth disease (FMD) in the UK and COVID-19 in India. In §4, we also provide synthetic data analysis so that DSA could be compared against the true data-generating model. Finally, we conclude with a short discussion in §5. For the sake of completeness, additional mathematical derivations and numerical figures are provided in appendices, where we also compare the performances of Markovian and non-Markovian DSA on the FMD dataset.

Stochastic model
Because we want to keep track of important epidemiological covariates along with counts of individuals in different compartments, our primary tool will be measure-valued processes, which are naturally capable of carrying more information than raw population counts. The measure-valued representation will also allow us to turn an inherently non-Markovian model into a Markov model, albeit on a more abstract state space. While the age of infection ( §2) is the most natural choice for 'age', one may also use the notion of age to account for other important covariates that describe time since some specific event. For instance, the biological age, time since vaccination are important for certain infectious diseases. Therefore, we use the term 'age' in a broad sense and keep track of the ages of individuals with different immunological statuses (susceptible, infected, recovered/removed).
Suppose we have n susceptible and m infected individuals initially. We assume that m depends on n in the sense that m/ n → ρ as n → ∞ for some ρ ∈ (0, 1). Let us now define the following stochastic processes: where N S (t), N I (t) and N R (t) are, respectively, the total numbers of susceptible, infected and recovered individuals in the population at time t. The quantities s k (t), i k (t) and r k (t) are the ages of the kth susceptible, infected and recovered individuals (following some specific ordering convention). The set-function δ x is the Dirac measure, i.e. for a set A, the function δ x (A) takes value 1 if x ∈ A and 0 otherwise. The stochastic processes X S t , X I t and X R t keep track of the age distribution of the population of individuals. For instance, taking the 'age' for the infected individuals to represent the age of infection, X I t ð½3:5, 7Þ gives us the number of infected individuals whose ages of infection lie in the set [3.5, 7]. Now, define the stochastic process The process X t is a Markov process. Although we do not explicitly show the dependence of the stochastic process X t on the initial size of the susceptible population n, it is worth keeping in mind.

Contact intervals and infectious periods
We adopt the pairwise model of [8] to describe the dynamics of the epidemic process under the stochastic mass-action setup. There are two types of events: infection and natural recovery. In order to describe the intensities (of the Markov process X t ) corresponding to these two types of events, let us introduce two functions: b : R þ Â R þ ! R þ and g : R þ ! R þ . The function β(u, v) describes the instantaneous intensity of an infectious contact between a susceptible individual of age u and an infectious individual of age v. That is, the probability that a susceptible individual of age u will be infected by an infectious individual of age v in the next δt time unit is n − 1 β(u, v)δt under the stochastic law of mass-action, where δt is assumed infinitesimally small. In the language of the pairwise model [8] of infectious diseases, the function β characterizes the probability law of the contact intervals. The function γ is the hazard function that characterizes the probability law of the infectious period. Note that neither of these two probability laws needs to be exponential, even though X t itself is a Markov process (see [11] for a similar example in the context of a stochastic chemical reaction network (CRN)). The infection and natural recovery processes are assumed independent. We also assume that recovered individuals can no longer infect others or be infected. The stochastic process X t can be simulated by extending the standard Doob-Gillespie's stochastic simulation algorithm (SSA) in a straightforward manner. An alternative approach to simulating individual trajectories is the Sellke construction, which also provides the theoretical underpinning to the DSA approach. For the sake of simplicity, we will assume in the following that the function β(u, v) depends only on the age v of the infected individual and not on the age u of the susceptible individual, i.e. β(u, v) = β(v). This will allow for a simpler and a more intuitive description of the Sellke construction. The general case of β(u, v) is considered in appendix A.

Sellke construction
The classical Sellke construction [1] provides an alternative individual-based description of the standard stochastic mass-action (SIR) epidemic model. It can be shown that the resultant epidemic process is equivalent to the original population-level stochastic model in the sense that the counts of individuals with different immunological statuses have the same probability law under both constructions. However, the crux of the Sellke construction is that it describes the epidemic process in terms of individual survival probabilities (i.e. for an initially susceptible individual, the probability of remaining susceptible until time t). This is useful for parameter inference. The classical Sellke construction can be adapted to the age-structured epidemic model of ours in a straightforward fashion.
To each of the initial n susceptible individuals, we assign a threshold, an exponentially distributed random variable with mean one. Let U i denote the threshold corresponding to the ith susceptible individual. The random variables U 1 , U 2 , …, U n are independent. Let U (1) , U (2) , …, U (n) be the corresponding order statistics, i.e. U (1) ≤ U (2) ≤ … ≤ U (n) . Let us now define the cumulative infection pressure where, for a point measure n ¼ P n i¼1 d xi and a measurable function f, the notation 〈n, f〉 denotes the integration of the function f with respect to the measure ν, i.e. hn, fi : ¼ The epidemic process proceeds as follows: The first infection occurs when the cumulative infection pressure exceeds the smallest individual threshold, i.e. when AðtÞ ! U ð1Þ for the first time; the second infection occurs when AðtÞ ! U ð2Þ , and so on. Note that infected individuals recover following an infectious period that has a probability law characterized by the hazard function γ. Therefore, it is possible that the cumulative infection pressure becomes constant when the last infected individual recovers and there are no more infected individuals. Susceptible individuals whose thresholds are never exceeded by the cumulative infection pressure AðtÞ escape infection and never leave the susceptible compartment. Figure 1 provides a pictorial description of the Sellke construction. Let us denote the time of infection of an initially susceptible individual by T I . In essence, the Sellke construction specifies an individual-level survival function: The probability that an initially susceptible individual i remains susceptible until time t, conditional on the history (filtration) H tÀ of the epidemic process, is given by where U i Exponential ð1Þ is the threshold of the individual i. This survival probability will play a crucial role in devising the DSA-likelihood function. It is worth pointing out that the random variable T I is improper because some individuals may escape infection with positive probability. From the classical theory of stochastic epidemiology, we know that appropriately scaled population counts in CTMC-based epidemic models converge to solutions to ODEs in the large population (mean-field) limit [1]. They are a consequence of the FLLN-type approximation theorems for Markov processes [4,5]. The intuition is that the stochastic fluctuation, which is typically described in terms of a zeromean martingale after a Doob-Meyer decomposition of the counting processes around the mean, vanishes in the limit. A similar intuition holds true for measure-valued Markov processes. Indeed, the scaled process n −1 X t converges to a vector of deterministic measure-valued functions in the limit of n → ∞. Furthermore, when the limiting measurevalued functions admit densities, it is possible to describe them using partial differential equation (PDE) (e.g. see [12,13] and appendices).

Mean-field limit
We are interested in the limit of the epidemic process as n → ∞ with m/n → ρ, for some ρ ∈ (0, 1). Therefore, in the limit, the total scaled population size is (1 + ρ). We scale the system this way because we wish to interpret the susceptible curve as a survival function, which takes the value of one at zero. We shall elaborate further on this point in §3 on DSA.
Under some technical assumptions on the intensities and the initial population size (more precise statement in appendix A), the scaled stochastic process n −1 X t converges to a deterministic continuous function x R t Þ as n → ∞, where the components x S t , x I t and x R t are measurevalued functions. The main technical tools required to establish the convergence are borrowed from existing probability theory literature. In particular, similar techniques and derivations can be found in [12][13][14][15][16]. A brief, intuitive sketch of the proof of convergence of the scaled process n −1 X t to the deterministic function x t is provided in appendix A for the sake of completeness. royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 The densities y S ðt, †Þ, y I ðt, †Þ and y R ðt, †Þ of x S t , x I t and x R t satisfy the system of PDE given in (A 5) in appendix A. Because of our simplifying assumption β(u, v) = β(v), it makes sense to integrate out the age component for the susceptible and the recovered individuals. Therefore, by defining z S ðtÞ : ¼ ð 1 0 y S ðt, sÞds, and z R ðtÞ : ¼ we can write the limiting system as follows: where S g is the survival function of the probability distribution characterized by the hazard function γ. That is, . Unfortunately, y I does not admit an explicit solution. However, efficient numerical methods exist. We describe the solution scheme that we adopted in appendix B. The limiting proportion of recovered individuals z R is also fully described by the limiting density y I of infected individuals For different choices of the functions β and γ depending on the particular infectious disease in question, one can solve (2.4) numerically and fit to data. Typically, one would assume a parametric representation of the functions β and γ, and then attempt to infer those parameters based on data. However, a common problem in epidemiological literature is that the choice of the likelihood function is often ad hoc and strictly speaking, unjustifiable. To this end, the DSA method [9,[17][18][19][20] provides, in a principled way, a likelihood function based on a random sample of transfer times. 1 In the next section, we describe the DSA method in greater detail.

Dynamic survival analysis and parameter inference
The DSA method combines dynamical systems theory and survival analysis. For a given dynamical system, typically described by ODEs or PDEs for population counts/proportions, the DSA method provides an alternative interpretation that characterizes probability laws of transfer times [9,17,19,21]. The mathematical underpinning is provided by a novel application of the Sellke construction. Rewriting (2.4) and with the initial condition z S (0) = 1, we immediately see which is precisely the limit of the survival function P(T I . t j H tÀ ) according to the Sellke construction in (2.3) as n → ∞. That is, Note that the random variable T I in the Sellke construction depends on n even though we do not show it explicitly to keep the notations simple. Therefore, the function z S , the limiting proportion of susceptible individuals, can be interpreted as a survival function. However, the survival function z S is improper because z S (∞) > 0. The quantity z S (∞) is precisely the limiting proportion of susceptible individuals that forever escape the infection. The survival function z S can be made proper by conditioning on individuals who get infected [9]. Another important observation is that the 'time to infection' random variables associated with the initially susceptible individuals become independent in the limit of n → ∞. This phenomenon is sometimes referred to as mean-field independence [21,22].

Likelihood contribution of infection times
Let us denote by θ the set of parameters required to describe the contact interval distribution in terms of β and the infectious period in terms of γ. On account of the Sellke construction, we can treat the function z S as an improper survival function for the (improper) random variable T I , the time to infection for an initially susceptible individual. Therefore, we can define the conditional probability density function (PDF) for the infection times, where τ T : = 1 − z S (T ). Also, set τ : = τ ∞ . The PDF f T is proper by virtue of the conditioning. Most epidemic and pandemic trajectories are only partially observed. A crucial advantage of the DSA approach is that it does not require the whole trajectory. Suppose we have a random sample of infection times t 1 , t 2 , …, t K from an epidemic trajectory observed partially until time T, for some finite, positive number T. Then, following the meanfield independence, the contribution of the infection times to the DSA likelihood function is given by The contribution ' I can be modified in a straightforward fashion if the infection times are censored and/or truncated [19].

Likelihood contribution of recovery times
Now, let us describe the contribution of the recovery times to the DSA likelihood. While the recovery times are often royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 not observed, or only partially observed (with further possibility of censoring or truncation), when available they can be incorporated into the DSA likelihood function, rendering it more informative. There are two possible scenarios. Let us consider the simpler case first: we have a random sample s 1 , s 2 , …, s L of infectious periods. Then, denoting the PDF of the probability law characterized by the hazard function γ by r γ , the contribution of the random sample of infectious periods to the DSA likelihood function is given by Now, let us consider the second case: we do not directly observe individual infectious periods, but only observe recovery times. Suppose u 1 , u 2 , …, u M is a random sample of recovery times of M individuals whose infection times are unknown. They are precisely a random sample of the sum of two independent random variables: time to infection and infectious period. Therefore, we can define the convolution-form PDF conditional on the partially observed epidemic trajectory until time T, where Now, with the conditional PDF of the recovery times given in (3.4), we can write down the contribution of the random sample u 1 , u 2 , …, u M of recovery times as follows: The conditional PDF g T,θ , in general, does not admit a closedform expression. However, it can be computed numerically.

The DSA likelihood
Suppose we have a random sample t 1 , t 2 , …, t K of infection times, a random sample s 1 , s 2 , …, s L of infectious periods and a random sample u 1 , u 2 , …, u M of recovery times. Then, the DSA likelihood function is given by Note that it is not necessary to have data on recovery times. The likelihood contribution ' I (θ) is adequate for parameter inference. See [17], where parameter inference was done for the COVID-19 pandemic in the state of OH, USA, based only on infection times. When information on recovery times is unavailable, we simply set ' Often it is easier to work with the log-likelihood function. Therefore, for the purpose of parameter inference, we also define the DSA log-likelihood function The maximum-likelihood estimate (MLE)û of the parameter θ is then numerically obtained by maximizing the log-likelihood function LðuÞ. That is, We present numerical results in §4. For Bayesian methods, we need to introduce a prior for the parameter θ and then implement a Markov Chain Monte Carlo (MCMC) algorithm to approximate the posterior distribution of the parameter θ. However, we do not pursue the Bayesian path in this paper.

Mean-field limits as Chapman-Kolmogorov equations
An alternative way to view DSA is to interpret the limiting trajectory equations as satisfying Chapman-Kolmogorov equations (written in the differential form) for certain probability distributions. Let us pick a random individual embedded in an infinitely large population (mean-field) and follow in time. Let WðtÞ [ fS, S c g denote a timeinhomogeneous CTMC that keeps track of whether an individual is in the susceptible compartment (S) or not (S c ). We specify the time-dependent instantaneous transition rates of W(t) as follows: where x t is the mean-field FLLN limit of the stochastic pro- Then, following the previous discussion, DSA, in essence, is tantamount to writing It is straightforward to verify that p t satisfies d dt p t ¼ p t QðtÞ, ð3:11Þ which is the time-inhomogeneous Chapman-Kolmogorov equation (in the differential form) for the marginal distribution. It is in this viewpoint that we say the limiting mean-field equations given in equation (A 5) satisfy the Chapman-Kolmogorov equations for the probability distribution p t . It is worth mentioning that the time derivative (d/d t )p t gives us what is popularly known as the chemical master equation (CME) in the physical sciences literature. Note that our Chapman-Kolmogorov viewpoint is somewhat different from the notion of a generalized master equation (GME) [23,24] in that we are not attempting to describe the original stochastic system in §2 with equation (3.11), but rather constructing a Markov chain whose Chapman-Kolmogorov equations are given by the mean-field limit of the original stochastic process.
Viewing the limiting trajectory equations as satisfying Chapman-Kolmogorov equations also reveals that, if we have data only on individual infection times, the likelihood function ' I in equation (3.2) is essentially a Markov likelihood function. Here, we described the Chapman-Kolmogorov viewpoint on the simplistic state space fS, S c g. In general, we could construct a Markov process WðtÞ [ W : ¼ fS, I, Rg Â ½0, 1Þ that keeps track of the immunological status along with the age of the individual. Accordingly, DSA can be shown to be tantamount to describing the transition kernel for W(t) in terms of the mean-royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 field trajectory equations x t and their densities y S , y I , y R . Since this viewpoint is only a side note and not the main aim of the paper, we leave the discussion for a future work. We do, however, refer interested readers to [25], where the authors use a stationary GME with memory terms and show that the effect of molecular memory is equivalent to the introduction of a feedback in the context of intracellular reaction processes. We also remark that in certain examples, e.g. [26], the non-Markovian formulation can be shown to be equivalent to a Markovian formulation in that the steady state of the non-Markovian process can be reduced to that of an equivalent Markov process.

Estimate of effective population size
In addition to giving a simple product-form likelihood function for θ, DSA also gives a ready estimate of the effective population size. Given k T , the number of cases observed by time T, the effective population size can be estimated by the discount estimatorn In similar vein, we can also estimate the final size of the epidemic as follows:k Refer to [9,17] for further discussions on this.

Numerical results
In this section, we demonstrate how the DSA method can be used for inference of model parameters from infectious disease outbreak data using the likelihood functions described in §3. Typical outbreak data consist of population-level aggregated counts (such as the daily number of newly positive cases). Hence, we use this scenario as a benchmark for numerical validation. At the beginning, we will analyse synthetic data and make several simplifying assumptions, which we will gradually remove in favour of more realistic models when considering datasets from real epidemic outbreaks, such as the FMD epidemic in the UK and the COVID-19 pandemic in India.

Synthetic data
We begin by carrying out DSA analysis on synthetic data. We begin by keeping the premise deliberately simple: we assume that the family of the infectious period is known in that the functional form of the hazard function γ (or the PDF characterized by γ) is known, but the parameters are to be inferred along with the initial condition of the PDE (A 5) and a constant infection rate, β. To this end, we begin by assuming the infectious period is a GAMMA random variable. The rationale behind this choice is the flexibility of the GAMMA distribution and its historical importance in the infectious disease epidemiology, as a natural generalization of the EXPONENTIAL distribution [27][28][29][30][31]. The proposed inference scheme, of course, works for any other distribution, such as the LOG-LOGISTIC or WEIBULL (see §4.2). All the code to reproduce the results in this section is available online, 2 and a brief description of the numerical scheme used to solve the PDE can be found in appendix B.

Description of data
The Sellke construction is an excellent means to generate exact simulations of an epidemic. We simulate an outbreak on a population of N = 10 000 individuals. Epidemics are run until no infected individuals are present in the population. Datasets consist of the series of infection and recovery times taken from the simulation, without noise nor delays. We consider three different scenarios, characterized by different availability of data: we either work with only recovery times, with only infection times, or with both. We generate 1000 datasets from the same initial conditions, to characterize the distribution of the estimates. Estimates are found by means of a mix of global and local optimization routines.
The objective is to infer the initial proportion of infected individuals ρ = 50/9950, the per-contact infection rate β = 0.25, and the parameters of the distribution of infectious period, which is a GAMMA distribution with mean μ = 9 and variance σ 2 = 6. Results are shown in figures 2 and 3.
We find that inference based on only infection times using the likelihood function ' I (θ) in (3.2) results in wider distributions for all inferred parameters, suggesting greater uncertainty, than inference based on both. This is expected because the likelihood function '(θ) in (3.7) is more informative than the likelihood function ' I (θ) in (3.2). In general, the true parameters are always near the mode of the distributions of the inferred parameters. It is worth noting that when the infection rate β is overestimated, the initial proportion of infected individuals ρ is underestimated, and vice versa. This suggests a potential statistical unidentifiability of the parameters. Outbreaks starting with a higher number of infected individuals but smaller transmission rate may be hard to distinguish from those that start with a smaller number of infected individuals but with higher transmission rate.
The mean and the standard deviation of the distribution of the infectious period are reported in figure 3. We observe that inference based only on infection times, in general, accurately captures the mean of the distribution of the infectious period but tends to overestimate the variance. The overall quality of inference improves significantly when recovery times are also available.

Foot-and-mouth disease
Let us now turn to real datasets. We consider the 2001 FMD outbreak in the UK. The outbreak began at the end of February 2001 and ended in September 2001, affecting more than 2000 farms. Policy makers' efforts to control the epidemic resulted in the culling of millions of herds and flocks [32]. Because of the specific interventions taken to control the outbreak, we interpret the infectious period in the DSA model as the time from when the disease hit a farm to the elimination of infected herds, i.e. the time to removal. Since this quantity is unlikely to be exponentially distributed, we fit a more flexible GAMMA distribution to it. For the contact interval distribution characterized by the hazard function β, we assume a WEIBULL distribution, which is in line with other methods present in the literature [33]. We note that both these choices may be viewed as generalizing the usual Markovian model based on two EXPONENTIAL distributions.
The dataset 3 consists of daily incidence of infected premises by time of report, {t i , I i }, with no information on removal times (figure 4). For each day t i , we distribute the number of new cases I i uniformly in the interval (t i−1 , t i ).
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 Furthermore, we consider only the first 80 days of data, to exclude the noisy tail and potentially confounding effects of strict measures. These simplifying assumptions allow us to maximize the likelihood ' I (θ) in (3.2). Since the original data points are too noisy, we consider the 7-day moving average of the counts, starting from day 6. This results in a smoother dataset that is less noisy, although a bit delayed with respect to the true one.
Maximum-likelihood estimates are obtained by means of a mix of global and local optimization routines. The distributions of inferred contact interval and infectious period are shown in figure 5. The shapes of the inferred distributions are in line with findings from other studies of same outbreak [34]. Our model with WEIBULL contact interval distribution and GAMMA infectious period does not consider the incubation period explicitly. Once both infectious period and contact interval distributions are known, we can find R 0 using the formula R 0 ¼ Ð 1 0 S g ðtÞbðtÞdt [35], where S γ , we recall, is the survival function of the infectious period distribution. This gives a point-estimate of R 0 = 2.55. Finally, the effective population size inferred wasn T ¼ 2284.
We compute confidence intervals using a bootstrap method, which we describe now. We first solve the limiting PDE (A 5) with the MLE estimates. From the solution, we compute the distribution of infection times PDF (3.1). This distribution is used to generate n = 500 synthetic datasets with as many datapoints as the original one, consisting of simulated dates of infections, on which we repeat the inference. Each new set of inferred parameters is then used to produce both the estimate for R 0 (shown in figure 11) and the (t, I(t)) incidence curve that we can compare against the true data.
Finally, when computing confidence intervals, we compensate for other sources of noise that cannot be explicitly accounted for in our the model but are present in realworld data, such as testing limits, day-of-the-week effects and various sources of delays. This variance-adjustment is done by inflating the confidence intervals by a factor determined by taking the square root of the variance between the data points and the 7-day moving average. Results are shown in figure 6. As can be verified, the trajectories do capture the epidemic trend quite well in that all the data points lie within the variance-adjusted 95% confidence interval.     figure 13 in appendix C. The DSA model is better able to capture the dynamics of the real epidemic, and gives substantially more reliable predictions, even when only a few observations are taken into account. Another important aspect is the interpretability of the results. The DSA model allows us to get a realistic idea of how both the contact interval and the infectious period distributions appear, whereas standard EXPONENTIAL-EXPONENTIAL models do not provide much insight, beyond looking at the rate of the underlying EXPONENTIAL distributions. Another important result that the DSA model is able to achieve is the estimation of the effective population size. This can be interpreted as the number of farms that were potentially involved in the dynamics, and it is therefore an upper limit on the total number of farms that might become infected.

Third wave of COVID-19 in India
The analysis of FMD outbreak data makes use of only infection times. As the synthetic data analysis suggests that inference based only on infection times tends to be poorer compared to when both infection times as well as recovery times are available, we now analyse an epidemic where both times are available.
In a global effort to document and control the ongoing COVID-19 pandemic, many governments provided freely available population-level datasets that we can use as case studies for inference when both infection and recovery times are known. Various countries adopted strong nonpharmaceutical measures that drastically changed the local dynamics of the epidemic, resulting in several distinct epidemic waves. At the same time, new SARS-CoV-2 variants emerged with markedly different epidemiological characteristics. To curtail the impact of such exogenous factors, we consider only the third wave in India. 4 Data consist of daily incidence and prevalence of cases, recoveries and deaths, meaning that we have data to inform both likelihoods in      royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 Similar to our approach on the FMD data, daily cases are distributed uniformly across the day. Because the DSA method requires only a random sample of infection and recovery times, we work with a dataset generated by taking a random sample (without replacement) of size 3000. We do not consider exogenous factors such as under-reporting of cases as they are beyond the scope of this paper. It is worth noting, however, that these exogenous factors surely have an impact on the results and can be accounted for by a more refined model.
The best-fitting inferred contact interval and infectious period distributions are shown in figure 9. There are roughly in line with estimates of viral load and recovery distributions, respectively, from the literature [36]. The point estimate for the reproduction rate is R 0 = 1.69. Although R 0 of the SARS-CoV-2 Delta variant is estimated to be in the range 3-8 [37], it is more realistic to compare our estimate with R t calculated from observed cases in that period, as our model uses only that source of information. The recovery distribution has a mean of 5.6 days and a variance of 26 days, so it is rather wide and right-skewed. The contact interval distribution is more peaked, with a slightly lower mean (around 4.5 days) and a variance of roughly 10. It is important to notice that infection times represent the collection of specimens from infected individuals, and recovery times follow country-specific healthcare system protocols, so they do not necessarily coincide with the true infectious distributions. Furthermore, the infectious period starts immediately after the incubation time has passed, while time to recovery is usually calculated from the onset of symptoms. Finally, the estimated effective population size is 31 million people. This is likely an underestimate because of the underreporting of cases in India during the Delta wave.
Confidence intervals are computed in a similar way to the FMD analysis, with two major differences: the 7-day moving averages result in a curve that is too delayed with respect to the actual one because of exponential growth/decline. Although this effect may be accounted for by considering exponential moving averages, we preferred not to modify royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 the data that way. For a similar reason, computing the variance-adjusted confidence intervals that take into account all the noise that cannot be explained by the model is not possible. Therefore, the confidence intervals, displayed in figure 10, underestimate the true variability of the underlying process, but seem to be generally in good agreement with the data. Interestingly, repeating the inference on different subsets of the original dataset does not produce significantly different estimates for the two distributions of interest. This suggests that the method is robust, not only because we have many data points to inform the likelihood, but also because we consider both the infection times and the recovery/death times. The distribution of the estimates of the reproduction number is shown in appendix C (figure 12).

Discussion
In this paper, we presented a method called DSA to both model and infer parameters of non-Markovian epidemic models. A crucial advantage of DSA is that it makes the entire toolkit of survival analysis available for making inference on dynamical systems. Therefore, DSA handles censored, truncated data in a straightforward and principled way. For instance, see [20] for an application of the DSA method adapted to a simple Markovian susceptible-exposed-infected-recovered (SEIR) model, where a snapshot of COVID-19 positivity data gathered through mass testing is used to analyse transmission in an Ohio prison. The analysis helped uncover the grave COVID-19 situation in correctional facilities in Ohio. Also, see [18] where we used the DSA approach coupled with approximate Bayesian computation (ABC) method to quantify the population-level effect of the mass vaccination campaign against COVID-19 in Israel. The analysis further helped quantify the indirect effect of vaccination on the unvaccinated young population in Israel. In [19], the DSA method was used to analyse the individual-level epidemic data from the Ebola pandemic in the Democratic Republic of Congo, suggesting success of the ring vaccination and contact tracing efforts evident from much lower estimates of the effective population size than previous analyses.
In this paper, we adopted the law of mass-action to model the interactions among the individuals for the sake of simplicity. Under the law of mass-action, an infected individual can potentially infect any susceptible individual in the population. This is in contrast with network-based models, where infected individuals can only infect their neighbours (connections defined by the graph adjacency matrix) [17,[37][38][39]. However, inferring the underlying network structure is a nontrivial task and often infeasible. This is particularly true when the underlying network exhibits complex substructures [41]. Therefore, the mass-action models are still routinely used despite being unrealistic in many epidemics. Nevertheless, an immediate future direction for us would be to develop the DSA methodology for a non-Markovian network model.
The crux of the DSA methodology lies in the change in perspective about dynamical systems-one that views them as describing probability distributions of times of infection and recovery, as opposed to describing (scaled) counts. As such, the method is completely general and could be quickly adapted to the particular setting of any infectious disease. We hope the software package [42] will help translate the DSA methodology into a useful practical tool in modern infectious disease epidemiology. royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220124 All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration. We declare we have no competing interests. Funding. E.K. and W.R.K. were supported by the National Institute of Allergy and Infectious Diseases (NIAID) grant no. R01 AI116770, and G.A.R., E.K. and W.R.K. were supported by the National Science Foundation (NSF) grant no. DMS-2027001. I.Z.K. and F.D.L. acknowledge support from the Leverhulme Trust for the Research Project grant no. RPG-2017-370. The content is solely the responsibility of the authors and does not represent the official views of the NSF or NIAID.

A.2. Assumptions
It is sufficient to assume that the global jump rates (in terms of the instantaneous intensity functions β and γ) of the Markov process X t are bounded above by a positive, finite quantity and that the initial population size does not explode in the sense that E[n À1 ðN S ð0Þ þ N I ð0ÞÞ] , 1 in order to ensure the trajectory equation (A 1) admits a unique path-wise solution ðX S In addition to the assumption of the global jump rates (in terms of the instantaneous intensity functions β and γ) of the Markov process X t being bounded above by a positive finite quantity, we also assume that the intensity functions β and γ are continuous. In order to study the FLLN approximation of the scaled process n −1 X t , we further assume a finite second moment condition on the initial population size. That is, we assume sup n E[n À2 ðN S ð0Þ þ N I ð0ÞÞ 2 ] , 1. Finally, we assume the initial age distribution does not explode.
Note that the assumptions about the initial size of the population are satisfied because n is chosen to be the size of the initial susceptible population and m/n → ρ ∈ (0, 1), as mentioned in §2. With the above technical assumptions, we are now ready to study the moments of the stochastic process X t and associated martingale processes.

A.3. Moments and martingale properties
Note that the components X S t , X I t and X R t of X t satisfy the stochastic integral equations described in (A 1). Then, for a sufficiently large class of test functions f : (a, s) → f s (a), the component measure-valued processes satisfy  ðA 2Þ For different choices of the test function f, (A 2) can be used to study various moments of the component measure-valued processes X S t , X I t and X R t . Moreover, (A 2) allows us to study certain martingale processes associated with the stochastic process X t . For susceptible, infected and recovered compartments, define the stochastic processes PDE (e.g. [12,13]): (@ t þ @ s ) y S ðt, sÞ ¼ Ày S ðt, sÞ ð 1 0 bðs, uÞy I ðt, uÞdu, (@ t þ @ s ) y I ðt, sÞ ¼ ÀgðsÞy I ðt, sÞ and (@ t þ @ s ) y R ðt, sÞ ¼ 0, We set y R (0, s) = 0 for all s [ R þ in keeping with our assumption that initially there are no recovered individuals. One can interpret y S (t, s), y I (t, s) and y R (t, s) as the densities at time t of susceptible, infected and recovered individuals at age s. The limiting system of PDE in (A 5) is linear in y S , y I and y R , but non-local. Now, since we have assumed the global jump rates are bounded above by a finite-positive number, we can show that the solutions remain bounded on finite-time intervals. In order to prove the uniqueness of the solutions, we can show that the distance between two possible solutions must vanish by invoking Grönwall's lemma and by virtue of the fact that the solutions remain bounded on finite-time intervals.
Appendix B. Numerical scheme to solve the mean-field PDE In this section we describe the numerical schemes used to solve the PDE equation (2.4). In numerical terms, equation (2.4) is, inside the domain, an advection equation with one spatial dimension, in which the characteristics move at velocity U(x, t) = 1, and with a forcing term given by the righthand side term −γ(s, t)y(s, t). Such equations are well known and can be solved with an explicit Semi-Lagrangian scheme [45,46]. The potential source of numerical instability comes from the nonlinear non-local boundary condition equation (2.5). We have opted for a numerical scheme that combines the explicit Semi-Lagrangian approach inside the domain and an implicit method to treat the solution at the boundary. Note that at the boundary we need to compute a scalar quantity; therefore, implementing an implicit method does not have a notable impact on the run-time of the numerical scheme itself, while improving the stability of the solution.
We define a mesh with spacing ΔX = 1/M and ΔT = 1/N, and points s i = iΔx and t n = nΔT, with 0 ≤ i ≤ M and 0 ≤ n ≤ N.
The discretized set of equations is then: yðt nþ1 , s i Þ À yðt n , s i À ðDX=DTÞÞ DT For simplicity, we use ΔX = ΔT, so that s i − (ΔX/ΔT ) = s i−1 . In algorithm B.1, we outline our implementation of the code. This returns z s (t) and z I ðtÞ ¼ Ð 1 0 y I ðt, sÞ ds. It is straightforward to modify it to return y I (t, s).
We report the estimates for R 0 from the bootstrap analysis of the FMD data ( figure 11) and COVID-19 Delta wave in India (figure 12) (see §4). Results are based on 500 bootstrap samples obtained from simulating infection/recovery times with parameters given by the MLE. Additionally, we report the cumulative incidence distribution for the prediction of the FMD compared to that of a standard SIR model ( figure 13). Finally, we show the histogram of the distribution of effective population size from bootstrap analysis in figure 14.