A Kermack-McKendrick model with age of infection starting from a single or multiple cohorts of infected patients

During an epidemic, the infectiousness of infected individuals is known to depend on the time since the individual was infected, which is called the age of infection. Here we study the parameter identifiability of the Kermack-McKendrick model with age of infection, which takes this dependency into account. By considering a single cohort of individuals, we show that the daily reproduction number can be obtained by solving a Volterra integral equation that depends on the flow of new infected individuals. We test the consistency of the method by generating data from deterministic and stochastic numerical simulations. Finally, we apply our method to a dataset from SARS-CoV-1 with detailed information on a single cluster of patients. We stress the necessity of taking into account the initial data in the analysis to ensure the identifiability of the problem.


Introduction
The existence of an individual infection (or contagiousness) period of variable length among infected individuals and over the course of an epidemic is an established fact for all contagious diseases. This variability has multiple origins. It may be due to (i) a variation in the symptomatic state of the infected person (such as a change in the frequency and intensity of the cough, a source of emission of the infectious agent) caused by variable immune defenses since the beginning of the infection (e.g., the transition between the initial innate immune reaction and the secondary adaptive reaction); (ii) a variation in the environmental conditions of transport and survival of the infectious agent in the atmosphere (heat, humidity, sunshine, altitude, etc.), in a more or less favorable socio-sanitary environment presenting different spreading characteristics (such as compliance with distancing or confinement, existence of preventive vitamin supplementation, educational level, food quality, demographic isolate, etc.); (iii) a variation in the state of defense of the final host (immune defense and self-protection); (iv) a variation in the virulence of the infectious agent, which can mutate or possibly change its intermediate host.
Due to the complexity of the contagion mechanisms and of their variations, accentuated by the fact that the rate of the epidemic may be the result of several geographically distant clusters starting at different times, it is preferable, in a first approach, to assume that the initial conditions are reduced to a single cluster. The challenge is then to estimate, on each day of the infection period, the transmissibility rate allowing the calculation of the daily reproduction number of the contagious disease.
Continuous time model: Recall that the age of infection $a$ is the time since an individual became infected. The major difficulty in comparing the data and the Kermack-McKendrick model with age of infection is to identify: 1) the initial distribution of infected with respect to the age of infection; 2) the daily reproduction number $R_0(a)$, which is the reproduction number at the age of infection $a$ (i.e. the average number of secondary cases produced by a single infected at the age of infection $a$). We can decompose the daily reproduction number as follows
$$R_0(a) = \underbrace{\tau_0}_{(A)} \times \underbrace{S_0}_{(B)} \times \underbrace{\beta(a)}_{(C)} \times \underbrace{e^{-\nu a}}_{(D)},$$
where (A) is the transmission rate at time $t_0$ (we assume the transmission rate to be constant during the period where $R_0(a)$ is evaluated), (B) is the number of susceptible individuals at time $t_0$ (we assume the number of susceptibles to be constant during the period where $R_0(a)$ is evaluated), (C) is the probability that an individual who has been infected for $a$ days is infectious, i.e., capable of transmitting the pathogen (or the fraction of infectious among the infected with age of infection $a$), and (D) is the probability that an individual infected for $a$ days is still infected ($\nu$ being the rate at which individuals die or recover).
Then the basic reproduction number (i.e. the number of secondary cases produced by a single infected) is given by
$$R_0 = \int_0^{+\infty} R_0(a)\,da.$$
Here we partly solve the problem of finding the initial distribution of infected by assuming that the epidemic starts at time $t_0$ with a single cohort of $I_0$ new infected patients. That is, the epidemic starts with $I_0$ infected patients all with age of infection $a = 0$. The case of an epidemic starting from a single infected patient (usually called patient 0) corresponds to the case $I_0 = 1$. This is a common assumption in epidemiology. Note that the time $t_0$ at which the first patient becomes infected is also unknown for most epidemics.
Assume that the epidemic starts at time $t_0$ with a cohort of $I_0$ new infected patients (i.e., all with age of infection $a = 0$). Then $N(t)$, the flow of infected at time $t$, satisfies the model starting from a single cohort of infected
$$N(t) = I_0\,R_0(t - t_0) + \int_{t_0}^{t} R_0(t - \sigma)\,N(\sigma)\,d\sigma, \quad \forall t \ge t_0. \tag{1.1}$$
Equation (1.1) will be derived as an extension of the Kermack-McKendrick model starting from a single cohort of infected (see equation (4.5)). This model remains valid as long as the transmission rate $\tau(t)$ and the number of susceptible hosts $S(t)$ remain constant, so it is valid at the start of the epidemic.
Assume that $I_0$ is fixed and the function $a \mapsto R_0(a)$ is given. Then the map $t \mapsto N(t)$ can be obtained by solving (1.1). The goal of the article is to consider the converse problem. That is, assume that $I_0$ is fixed and that $t \mapsto N(t)$ is given from the data. Then the map $a \mapsto R_0(a)$ can be obtained by solving the Volterra integral equation
$$R_0(a) = \frac{1}{I_0}\left[ N(t_0 + a) - \int_0^{a} R_0(a - s)\,N(t_0 + s)\,ds \right], \quad \forall a \ge 0. \tag{1.2}$$
Therefore, if the map $t \mapsto N(t)$ is known, we can theoretically derive the average dynamics of infection at the level of a single patient.
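As a worked step, the inversion can be read off from the single-cohort equation by a change of variable. The sketch below assumes that (1.1) has the form $N(t) = I_0 R_0(t-t_0) + \int_{t_0}^{t} R_0(t-\sigma) N(\sigma)\,d\sigma$; the integration variable $s$ is introduced here only for illustration.

```latex
\begin{align*}
N(t_0 + a) &= I_0\,R_0(a) + \int_{t_0}^{t_0 + a} R_0(t_0 + a - \sigma)\,N(\sigma)\,d\sigma
  && \text{(set } t = t_0 + a \text{ in (1.1))} \\
           &= I_0\,R_0(a) + \int_{0}^{a} R_0(a - s)\,N(t_0 + s)\,ds
  && \text{(substitute } \sigma = t_0 + s\text{)} \\
\Longrightarrow\quad
R_0(a) &= \frac{1}{I_0}\left[\,N(t_0 + a) - \int_{0}^{a} R_0(a - s)\,N(t_0 + s)\,ds\right]
  && \text{(requires } I_0 > 0\text{)}.
\end{align*}
```

Since $R_0$ appears both on the left-hand side and under the integral, this is a Volterra equation of the second kind in the unknown $R_0$, with a kernel given by the data $N$. The division by $I_0$ is where the initial cohort enters the identification.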
The standard assumption used in the literature is
$$R_0(a) = 0, \quad \forall a \ge a_+, \tag{1.3}$$
where $a_+ > 0$ is the maximal age of infectiousness for an infected patient.
Then for $t \ge t_0 + a_+$, the equation (1.1) becomes
$$N(t) = \int_0^{a_+} R_0(a)\,N(t - a)\,da. \tag{1.4}$$
Now assume for example that $N(t) = N_0 e^{\lambda t}$. After simplification, we obtain a standard characteristic equation
$$1 = \int_0^{a_+} R_0(a)\,e^{-\lambda a}\,da. \tag{1.5}$$
Assume that $\lambda > 0$ is given. Consider $a \mapsto \chi(a)$ any non-negative and non-null continuous function satisfying $\chi(a) = 0$ for all $a \ge a_+$. Then
$$R_0(a) = \frac{\chi(a)}{\int_0^{a_+} \chi(s)\,e^{-\lambda s}\,ds}$$
satisfies (1.5). Therefore, neglecting the initial value in the Volterra equation leads to a non-identifiable problem (in general). This shows the crucial role of the initial value in identifying the function $a \mapsto R_0(a)$.
Day by day model: The model (1.1) with a single cohort of infected becomes the discrete Volterra equation (1.6), in which (I) is the number of infected produced directly by the $I_0$ infected individuals already present on day $t_0$, and (II) is the number of new infected individuals at time $t$ produced by the new infected individuals since day $t_0$.
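The non-identifiability argument can be checked numerically. The following is a minimal sketch, not the authors' code: the two shapes `chi_early` and `chi_late`, the growth rate `LAM` and the maximal age `A_PLUS` are illustrative assumptions. Both normalized kernels satisfy the characteristic equation for the same exponential growth rate, although they describe different infection dynamics.

```python
import math

def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

def normalized_kernel(chi, lam, a_plus):
    """Build R0(a) = chi(a) / int_0^{a_plus} chi(s) exp(-lam * s) ds."""
    denom = simpson(lambda s: chi(s) * math.exp(-lam * s), 0.0, a_plus)
    return lambda a: chi(a) / denom

LAM, A_PLUS = 0.2, 10.0  # illustrative growth rate and maximal age (assumptions)

def chi_early(a):
    """Hypothetical shape with early infectiousness, zero outside [0, A_PLUS]."""
    return a * (A_PLUS - a) ** 3 if 0.0 <= a <= A_PLUS else 0.0

def chi_late(a):
    """Hypothetical shape with late infectiousness, zero outside [0, A_PLUS]."""
    return a ** 3 * (A_PLUS - a) if 0.0 <= a <= A_PLUS else 0.0

results = []
for chi in (chi_early, chi_late):
    r0 = normalized_kernel(chi, LAM, A_PLUS)
    # Left-hand side of the characteristic equation; should equal 1 for both shapes.
    char_integral = simpson(lambda a: r0(a) * math.exp(-LAM * a), 0.0, A_PLUS)
    # Basic reproduction number: integral of the daily reproduction number.
    basic_r0 = simpson(r0, 0.0, A_PLUS)
    results.append((char_integral, basic_r0))
    print(f"characteristic integral = {char_integral:.6f}, basic R0 = {basic_r0:.3f}")
```

The two kernels give the same exponential growth but different basic reproduction numbers, so the growth rate alone cannot distinguish them.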
Next, by setting $a = t - t_0$, we obtain the day by day equation (1.7) for the daily reproduction number. In formula (1.7) and throughout the paper, we use the convention $\sum_{d=k}^{m} = 0$ whenever $m < k$.
If we consider the first terms of the discrete time Volterra equation (1.6), we obtain . . .
In practice, we can assume that R 0 (0) = 0 since infected individuals are not infectious immediately after being infected. Under this additional assumption, we obtain the system . . .
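The day by day recursion can be sketched in a few lines. This is a minimal illustration under an assumed indexing convention, which may differ from the one used in (1.6)-(1.7): `n[a]` is the number of new cases on day $t_0 + a$ (the initial cohort is not counted), `r0[0] = 0`, and `n[a] = i0 * r0[a] + sum_{d=1}^{a-1} r0[a-d] * n[d]`. The function names and the kernel `r0_true` are hypothetical.

```python
def simulate_new_cases(r0, i0, days):
    """Forward model: daily new cases produced by a single cohort of size i0."""
    def r(a):
        return r0[a] if a < len(r0) else 0.0
    n = [0.0] * (days + 1)
    for a in range(1, days + 1):
        # (I) cases produced by the initial cohort + (II) cases produced by later cases
        n[a] = i0 * r(a) + sum(r(a - d) * n[d] for d in range(1, a))
    return n

def reconstruct_r0(n, i0):
    """Inverse problem: recover the daily reproduction numbers from daily cases."""
    r0 = [0.0] * len(n)  # r0[0] = 0: no transmission on the day of infection
    for a in range(1, len(n)):
        secondary = sum(r0[a - d] * n[d] for d in range(1, a))
        r0[a] = (n[a] - secondary) / i0
    return r0

r0_true = [0.0, 0.1, 0.4, 0.3, 0.1]       # illustrative values (assumption)
cases = simulate_new_cases(r0_true, i0=10.0, days=12)

r0_exact = reconstruct_r0(cases, i0=10.0)  # correct cohort size
r0_small = reconstruct_r0(cases, i0=6.0)   # cohort size underestimated

print([round(x, 4) for x in r0_exact[:6]])
print([round(x, 4) for x in r0_small[:6]])
```

With the correct cohort size the kernel is recovered up to rounding. With an underestimated `i0`, the early values are overestimated and a later value becomes negative, which matches the behavior reported in the numerical simulations.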
When reliable information is available on the first cluster(s), the best formula for calculating daily basic reproduction numbers is equation (1.2) (or its discrete time version (1.7)). Based on (1.4), some methods have been developed in the literature to cope with the lack of precise information.
For instance, in [2,3], the authors, following [27], propose an optimization algorithm for estimating the daily basic reproduction numbers. In [4], D. Bernoulli mentions in 1760 the changes in the contagiousness parameters and presents the prediction of the transition between endemic and epidemic peaks as a crucial challenge, in a prophetic sentence: "Le retour d'une épidémie longtemps suspendue fait un ravage plus terrible dans une seule année qu'une endémie uniforme ne pourrait faire pendant un nombre d'années considérable" (The return of a long-suspended epidemic wreaks more terrible havoc in a single year than a uniform endemic could do for a considerable number of years). In [10], the authors use a deconvolution algorithm for calculating the daily basic reproduction numbers. In each case, the problem of the initial conditions is evoked at best only through the hypothesis of a unique "patient zero". Despite the considerable means of current investigation, in particular those of the WHO and the members of the government of the WHO, it is rare that this patient is identified (this was the case for H1N1 in Mexico). The patient zero, also called the index or primary case, is the first patient identified in a given population during an epidemiological investigation. It points to the source of the spread of a disease in a given reservoir, but this search is in general very difficult, as was the case for HIV in North America [32].
One of the main difficulties in estimating the R 0 (a) function is its non-identifiability in general. Recent studies [8,9,16,17,25] developed methods to identify the various parameters for the COVID-19 pandemic by using cumulative reported cases data and differential equations models. Differential equations can be written in the form studied here by assuming that R 0 (a) (or equivalently, β(a)) is independent of a. Suppose that we are restricted to a period when the data is growing exponentially fast. If we take a fixed function β(a), then by adapting the method developed in [25], we could identify a transmission rate τ so that the output of the model stays very close to the data, for any function β(a). The same could be achieved with a good phenomenological description of the data by using the method developed in [8,16,17] with a time-dependent transmission rate. This means that the reported cases data is not sufficient to determine accurately the function R 0 (a). Without a good description of the initial distribution, it is hopeless to identify R 0 (a) by using reported cases only.
In this article, we first extend the Kermack-McKendrick model to initial conditions that are a linear combination of Dirac masses. The PDE model (2.1) can not be extended in the space of measures (due to a lack of time continuity for the solutions). However, the Volterra integral equation can be extended and still makes perfect sense whenever we use Dirac masses for the initial condition. In many real examples, the initial distribution must be a linear combination of Dirac masses since the data are discrete at the early stage of an epidemic (in a city, a country). Indeed, at the early beginning of an epidemic, the epidemic starts from a few cases imported from other places. Therefore Dirac masses make perfect sense. In practice, the early stage of an outbreak is often undocumented and generally difficult to determine. But our study applies to data from a finite collection of clusters that is easier to determine using contact tracing.
Consequently, for the single cohort model, we can reverse the problem, and by assuming that the daily number of new infected is known, we can compute the daily reproduction number by solving a Volterra integral equation. The daily basic reproduction number informs us about the dynamics of infection at the level of a single patient. Therefore, knowing R 0 (a) should help the medical doctors decide about quarantine measures. Reported case data for clusters are particularly valuable for reconstructing the dynamics of infection at the level of a single individual.
In this paper, we also provide an Individual-Based Model (IBM) (see Appendix B). This IBM converges to the deterministic model whenever the initial number of infected increases. We use this IBM to generate sample data to test our method and compute the daily basic reproduction number. This will allow us to test the effects of the day-to-day discretization (on the data) and the impact of stochastic perturbations on the daily reproduction number. We conclude the paper by applying our approach to a cluster of SARS-CoV-1 in Singapore.
The plan of the paper is the following. In Section 2, we recall the Kermack-McKendrick model with age of infection. We explain how to derive the Volterra formulation of the model, and we compare it with the Kermack-McKendrick SI model with age of infection (ODE model). In Section 3, we explain how to connect the model with the data. In Section 4, we extend the Kermack-McKendrick model with age of infection to the case where the epidemic starts from a single or multiple cohorts of infected individuals. In Section 5, we derive an equation to compute the daily reproduction number from the data. In Section 6, we consider a day by day discretized Kermack-McKendrick model with age of infection. In Section 7, we run some numerical simulations and compare the deterministic model with a stochastic individual-based simulation presented in the appendix. In Section 8, we compare the model with some data from SARS-CoV-1, and we discuss the data from SARS-CoV-2.

Kermack-McKendrick model with age of infection
Let $a \mapsto i(t, a)$ be the distribution of infected individuals with respect to the age of infection at time $t$; that is, $\int_{a_1}^{a_2} i(t, a)\,da$ is the number of infected at time $t$ with infection age between $a_1$ and $a_2$. Therefore the total number of infected individuals at time $t$ is $\int_0^{+\infty} i(t, a)\,da$.
Let $\beta(a) \in [0, 1]$ be the probability to be contagious or infectious (i.e. capable of transmitting the pathogen) at the age of infection $a$. The quantity $\beta(a)$ can be interpreted as the fraction of infected individuals with age of infection $a$ that are infectious. Then the total number of contagious (or infectious) individuals at time $t$, i.e. the individuals capable of transmitting the pathogen, is $\int_0^{+\infty} \beta(a)\,i(t, a)\,da$.
The model of Kermack-McKendrick [22] with age of infection is the following, for each $t \ge t_0$,
$$\begin{cases} S'(t) = -\tau(t)\,S(t)\displaystyle\int_0^{+\infty} \beta(a)\,i(t, a)\,da, \\ \partial_t i(t, a) + \partial_a i(t, a) = -\nu\,i(t, a), \quad a > 0, \\ i(t, 0) = \tau(t)\,S(t)\displaystyle\int_0^{+\infty} \beta(a)\,i(t, a)\,da, \end{cases} \tag{2.1}$$
with the initial data $S(t_0) = S_0$ and $i(t_0, a) = i_0(a)$.
In the model, $S(t)$ is the number of susceptible individuals at time $t$, $t \mapsto \tau(t)$ is the transmission rate at time $t$, and $\nu \ge 0$ is the rate at which individuals die or recover. Here, the parameter $\nu$ is assumed to be independent of the age of infection $a$. This is a simplifying assumption to improve the readability of the paper.
The parameter ν combines both the specific fatality rate and the recovery rate.
The above equation can be understood first as follows where (I) is the flow of new infected, and (II) is the flow of individuals who die or recover.
We make the following assumption.
Assumption 2.1. We assume that (ii) The probability to be infectious at the age of infection a → β(a) ∈ L ∞ + (0, +∞) is a non-negative and measurable function of a which is bounded by 1;

Volterra integral equation formulation of the model
In the model (2.1), the quantity $N(t) = \tau(t)\,S(t)\int_0^{+\infty} \beta(a)\,i(t, a)\,da$ is the flow of new infected individuals at time $t$.
By using the S-equation in system (2.1), we obtain By integrating the i-equation of system (2.1) along the characteristics, we obtain (2.5) By using (2.5), we deduce that t → N (t) satisfies the following Volterra integral equation where (I) is the flow of new infected individuals at time t produced by the infected individuals already present on day t 0 ; (II) is the flow of new infected individuals at time t produced by the new infected individuals since day t 0 .
By using equations (2.4) and (2.6), we can summarize the epidemic model (2.1), by saying that t → N (t) is the unique continuous map satisfying where and The function Λ(t) is the number of infectious individuals (capable to transmit the pathogen) at time t among the infected individuals already present at time t 0 .
The function t → Λ(t) plays a fundamental role in solving the Volterra equation. Indeed, the quantity is the number of infected produced between the instants t 1 and t 2 by the infected already present at time t 0 . So, for example, if no new infected are produced by the infected already present at time t 0 , that is if then there will be no new infected at all after the time t 0 , that is The function t → Λ(t) can be regarded as the initial distribution of the Volterra integral equation (2.7).

Connecting the data and the model
The data are represented by the function t → CR(t) which is the cumulative number of reported cases at time t. We assume that the flow of reported cases is a fraction 0 ≤ f ≤ 1 of the flow of recovering individuals, that is By using (2.5), we can compute the number of infected at time t. That is where is the total number of infected at time t 0 . By using equations (3.1) and (3.2), we obtain or equivalently (by using the change of variable σ = t − a) By choosing t = t 0 we obtain and by differentiating both sides of the above equation, we obtain Therefore we obtain the following connection between the data and the model.

Connection between the data and the model
Let t → CR(t) be the cumulative number of reported cases. Then the initial number of infected is given by and the flow of new infected individuals N (t) at time t is given by
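The connection between the data and the model can be sketched numerically. The relations used below are assumptions reconstructed from the description in the text (reported cases are a fraction $f$ of the recovering individuals, and $\nu$ is the removal rate): $CR'(t) = f\,\nu\,I(t)$ and $I'(t) = N(t) - \nu I(t)$, which give $I(t) = CR'(t)/(\nu f)$ and $N(t) = (CR''(t) + \nu\,CR'(t))/(\nu f)$. The function name and the synthetic data are hypothetical.

```python
import math

def flow_from_cumulative(cr, t, f, nu, h=1e-3):
    """Estimate I(t) and N(t) from a smooth cumulative reported cases function cr.

    Uses central finite differences for CR'(t) and CR''(t).
    """
    d1 = (cr(t + h) - cr(t - h)) / (2.0 * h)               # CR'(t)
    d2 = (cr(t + h) - 2.0 * cr(t) + cr(t - h)) / h ** 2    # CR''(t)
    infected = d1 / (nu * f)                # I(t) = CR'(t) / (nu f)
    new_flow = (d2 + nu * d1) / (nu * f)    # N(t) = (CR''(t) + nu CR'(t)) / (nu f)
    return infected, new_flow

# Synthetic check (illustrative values): if I(t) = i0 * exp(lam * t), then
# CR(t) = f * nu * i0 * (exp(lam * t) - 1) / lam and N(t) = (lam + nu) * I(t).
f, nu, lam, i0 = 0.8, 1.0 / 9.0, 0.15, 10.0

def cumulative_reported(t):
    return f * nu * i0 * (math.exp(lam * t) - 1.0) / lam

infected, new_flow = flow_from_cumulative(cumulative_reported, 5.0, f, nu)
print(infected, new_flow)
```

With real data, $CR(t)$ is only known at discrete times and is noisy, so some regularization would be needed before taking second differences; this sketch sidesteps that by using a smooth synthetic curve.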

Kermack-McKendrick model starting from a single and multiple cohorts of infected patients
The major difficulty to compare the model (2.4) with the data is to identify the functions a → i 0 (a) and a → β(a). To simplify the discussion, let us consider the model at the early stage of the epidemic. When the epidemic just starts we can assume that the transmission rate t → τ (t) remains constant, and the number of susceptible individuals t → S(t) is constant and equal to S 0 . Under such a simplifying assumption the Volterra equation (2.4) becomes (4.1)

Initial distribution for a single cohort of infected with age of infection a = 0
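In order to understand the mathematical concept of Dirac mass centered at 0, we first consider an approximation by an exponential law $i_0(a) = I_0\,\kappa\,e^{-\kappa a}$, with mean and standard deviation equal to $1/\kappa$. Then a Dirac mass centered at 0 can be understood as the limit of such a distribution when $\kappa$ goes to $+\infty$. The limit of $i_0(a) = I_0\,\kappa\,e^{-\kappa a}$ needs some explanation. Recall that $\int_{a_1}^{a_2} i_0(a)\,da$ is the initial number of infected individuals with infection age between $a_1$ and $a_2$ at time $t_0$. We deduce that, for $0 \le a_1 < a_2$,
$$\lim_{\kappa \to +\infty} \int_{a_1}^{a_2} I_0\,\kappa\,e^{-\kappa a}\,da = \lim_{\kappa \to +\infty} I_0\left(e^{-\kappa a_1} - e^{-\kappa a_2}\right) = \begin{cases} I_0, & \text{if } a_1 = 0, \\ 0, & \text{if } a_1 > 0. \end{cases}$$
That is to say, when $\kappa$ tends to $+\infty$, the initial distribution of population $i_0(a)$ approaches the case where all the infected individuals at time $t_0$ have the same age of infection $a = 0$.
For short, we write $i_0(a) = I_0\,\delta_0(a)$, where $\delta_0(a)$ is called the Dirac mass centered at 0.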
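The concentration of the exponential law at age 0 can be seen with a short computation. This sketch uses the closed form $\int_{a_1}^{a_2} I_0 \kappa e^{-\kappa a}\,da = I_0\,(e^{-\kappa a_1} - e^{-\kappa a_2})$; the function name and the chosen intervals are illustrative.

```python
import math

def cohort_mass(i0, kappa, a1, a2):
    """Number of initially infected with age in [a1, a2] for i0*kappa*exp(-kappa*a)."""
    return i0 * (math.exp(-kappa * a1) - math.exp(-kappa * a2))

i0 = 10.0
for kappa in (1.0, 10.0, 100.0, 1000.0):
    near_zero = cohort_mass(i0, kappa, 0.0, 0.5)   # interval containing age 0
    away = cohort_mass(i0, kappa, 0.5, 5.0)        # interval away from age 0
    print(f"kappa = {kappa:7.1f}: mass on [0, 0.5] = {near_zero:.6f}, "
          f"mass on [0.5, 5] = {away:.6f}")
```

As $\kappa$ grows, the whole cohort of size $I_0$ ends up in any interval that contains age 0, and nothing remains elsewhere.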

Model starting from a single cohort of infected with age of infection a = 0
Recall that In order to derive the Kermack-McKendrick model with Dirac mass initial distribution as limit, we first need the following result.
Proof. Let ε > 0. We observe that Then we have The result from the fact that Then by using (2.6), the Kermack-McKendrick model can be reformulated for t ≥ t 0 , as the following system By taking first a formal limit when κ → +∞, we obtain the model starting from a single cohort of infected.

Kermack-McKendrick model starting from a single cohort of infected
Assume that the initial distribution of infected only contains a single cohort composed of I 0 individuals all with age of infection a = 0 at time t 0 . Then the flow of new infected t → N (t) is the unique continuous solution of the Volterra integral equation where S(t) is obtained from (2.4).
The following theorem says that the model with a single cohort of infected extends the earlier model of Kermack-McKendrick with initial distribution in L 1 . This theorem is a consequence of Lemma 4.1 and the continuity of the semiflow generated by the Volterra integral equation. We refer to Ducrot and Magal [14] for more results on this topic.  Remark 4.3. When the initial distribution is a Dirac mass centered at a = 0, the total number of infected individuals at time t is and the number of infectious individuals at time t is

Kermack-McKendrick model starting from multiple cohorts of infected
Assume that the initial distribution of infected consists of $n \ge 1$ cohorts of infected with ages of infection $a_1 < a_2 < \dots < a_n$ at time $t_0$. That is,
$$i_0(a) = I_0^1\,\delta_{a_1}(a) + \dots + I_0^n\,\delta_{a_n}(a),$$
where $I_0^j$ is the number of infected in the $j$-th cohort at time $t_0$. Then the flow of infected $t \mapsto N(t)$ satisfies the following Volterra integral equation where $S(t)$ is obtained from (2.4).

Basic reproduction number for the extended model (4.3)
In this section, we assume that the transmission t → τ (t) is constant equal to τ , and t → S(t) is constant equal to S 0 .
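Define the daily reproduction numbers
$$R_0(a) = \tau\,S_0\,\beta(a)\,e^{-\nu a}, \quad \forall a \ge 0.$$
Assuming that the number of susceptible individuals $t \mapsto S(t)$ is constant and equal to $S_0$ in the $N$-equation (4.3), we obtain
$$N(t) = I_0\,R_0(t - t_0) + \int_{t_0}^{t} R_0(t - \sigma)\,N(\sigma)\,d\sigma, \quad \forall t \ge t_0. \tag{4.5}$$
By using the change of variable $s = t - t_0$, replacing the notation $s$ by $t$, and defining $N_{t_0}(t) = N(t + t_0)$, we obtain
$$N_{t_0}(t) = I_0\,R_0(t) + \int_0^{t} R_0(t - s)\,N_{t_0}(s)\,ds, \quad \forall t \ge 0. \tag{4.6}$$
By replacing $N_{t_0}(t)$ by the right-hand side of (4.6) in the integral term of (4.6), and iterating, we obtain
$$N_{t_0}(t) = I_0\left[ R_0(t) + (R_0 * R_0)(t) + R_0^{*3}(t) + \dots + R_0^{*n}(t) + \dots \right],$$
where the convolution is defined by
$$(R_0 * R_0)(t) = \int_0^{t} R_0(t - s)\,R_0(s)\,ds,$$
and for each integer $n \ge 3$, $R_0^{*n} = R_0 * R_0^{*(n-1)}$.
We can interpret the $N$-equation (4.5) concretely as follows: $I_0\,R_0(t)$ is the flow of infected produced at time $t$ by the first $I_0$ infected individuals; $I_0\,(R_0 * R_0)(t)$ is the flow of infected produced at time $t$ by the second generation of infected individuals; $I_0\,R_0^{*3}(t)$ is the flow of infected produced at time $t$ by the third generation of infected individuals; and, more generally, $I_0\,R_0^{*n}(t)$ is the flow of infected produced at time $t$ by the $n$-th generation of infected individuals.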

Basic reproduction number
The total number of the first generation of new infected produced by a single infected patient with age of infection $a = 0$ at time $t = t_0$ is called the basic reproduction number. That is,
$$R_0 = \int_0^{+\infty} R_0(a)\,da.$$
The flow of the first generation of new infected produced by a single patient who has been infected for $a$ days is called the daily reproduction number. When the time scale is one day, the function $R_0(a)$ is also the average daily number of cases produced by a single patient at the age of infection $a$.
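Proposition 4.4. The total number of cases produced by the $n$-th generation of infected resulting from a single infected patient is
$$\int_0^{+\infty} R_0^{*n}(t)\,dt = R_0^{\,n},$$
where $R_0^{*n}$ denotes the $n$-fold convolution of $a \mapsto R_0(a)$ with itself.
Proof. By using Fubini's theorem we have
$$\int_0^{+\infty} (R_0 * R_0)(t)\,dt = \int_0^{+\infty}\!\int_0^{t} R_0(t - a)\,R_0(a)\,da\,dt = \int_0^{+\infty} R_0(a) \int_a^{+\infty} R_0(t - a)\,dt\,da,$$
and by making the change of variable $l = t - a$ we obtain
$$\int_0^{+\infty} (R_0 * R_0)(t)\,dt = \int_0^{+\infty} R_0(a)\,da \int_0^{+\infty} R_0(l)\,dl = R_0^{\,2},$$
and the result follows by induction.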
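A discrete analogue of Proposition 4.4 is easy to verify. In the sketch below, the daily kernel `r0_daily` is an illustrative assumption and `convolve` is a plain discrete convolution; the total of the $n$-fold convolution equals the $n$-th power of the sum of the kernel, mirroring the continuous statement.

```python
def convolve(f, g):
    """Discrete convolution of two finite sequences indexed from day 0."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

r0_daily = [0.0, 0.1, 0.4, 0.3, 0.1]   # illustrative daily reproduction numbers
basic_r0 = sum(r0_daily)               # discrete basic reproduction number

generation = list(r0_daily)            # flow produced by the initial patient
totals = []
for n in range(1, 5):
    totals.append(sum(generation))     # total cases produced by generation n
    generation = convolve(generation, r0_daily)

for n, total in enumerate(totals, start=1):
    print(f"generation {n}: total = {total:.6f}, basic_r0**{n} = {basic_r0 ** n:.6f}")
```

Summing the totals over all generations gives a geometric series, which converges exactly when the basic reproduction number is below 1.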

Computing the age dependent reproduction number Γ(a) from the data
By using (4.3), we obtain the following result.

Computing Γ(a) from the data
Assume in addition that the parameters $t_0$, $S_0 > 0$, $I_0$, $\nu > 0$, and the function $t \mapsto \tau(t)$ are given. Then the function $t \mapsto \Gamma(t)$ can be obtained from the flow of new infected $t \mapsto N(t)$, as the unique solution of the Volterra integral equation (5.1), where $S(t)$ is obtained by using (2.4).
Remark 5.1. Assume that patients can not transmit the pathogen when the age of infection is above a + > 0. That is Γ(a) = 0, ∀a ≥ a + .
Then the equation (5.1) becomes for all t ≥ t 0 + a + ,

Some explicit examples of a → R 0 (a)
We assume that the transmission t → τ (t) is constant equal to τ , and t → S(t) is constant equal to S 0 . Then and by setting a = t − t 0 , the equation (5.1) becomes

Day by day Kermack-McKendrick model with age of infection
The variation of the number of susceptible individuals $S(t)$ is given each day $t = t_0, t_0 + 1, \dots$, by where $S_0$ is the number of susceptible individuals on day $t_0$, $S(t)$ is the number of susceptible individuals on day $t$, and $N(d)$ is the daily number of new infected individuals on day $d$. By analogy with the equation (2.6), the daily number of new infected individuals satisfies the following discrete time Volterra integral equation for all $t = t_0, t_0 + 1, t_0 + 2, \dots$,
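A day by day simulation with depletion of susceptibles can be sketched as follows. This is an illustration under assumed discrete forms, which may differ from the equations of this section: the susceptibles on day $t_0 + k$ are $S_0$ minus all earlier new infected, and the new infected are $\tau\,S$ times the infectious pressure from a single initial cohort and from later cases. The function names, the kernel `gamma` (standing for the probability to be still infected and infectious at a given age) and all parameter values are hypothetical.

```python
import math

def simulate_day_by_day(s0, i0, tau, gamma, days):
    """Day by day single-cohort model with susceptible depletion (assumed form)."""
    s = [float(s0)]   # s[k]: susceptibles on day t0 + k
    n = [0.0]         # n[k]: new infected on day t0 + k (initial cohort excluded)
    for k in range(1, days + 1):
        # Susceptibles left after all earlier new infections.
        s_k = max(s0 - sum(n), 0.0)
        # Infectious pressure: initial cohort (age k) plus cases of day d (age k - d).
        pressure = gamma(k) * i0 + sum(gamma(k - d) * n[d] for d in range(1, k))
        # Clamp so that new infections never exceed the susceptibles (assumption).
        n_k = min(tau * s_k * pressure, s_k)
        s.append(s_k)
        n.append(n_k)
    return s, n

def gamma(a, nu=1.0 / 9.0):
    """Hypothetical kernel: infectious from day 2 to day 8, removal at rate nu."""
    return math.exp(-nu * a) if 2 <= a <= 8 else 0.0

s, n = simulate_day_by_day(s0=1000.0, i0=5.0, tau=5.0e-4, gamma=gamma, days=60)
print(f"final susceptibles = {s[-1]:.1f}, total new infected = {sum(n):.1f}")
```

When the number of susceptibles barely changes, `tau * s_k` is almost constant and the recursion reduces to a linear discrete Volterra equation, which is the regime in which the daily reproduction numbers are defined.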

Day by day single cohort model and daily basic reproduction number
Assume that $t \mapsto \tau(t)$ is constant and equal to $\tau_0$, and $t \mapsto S(t)$ is constant and equal to $S_0$. Assume that the epidemic starts at time $t_0$ with a cohort of $I_0$ new infected patients (i.e. with age of infection $a = 0$). The model (4.5) with a single cohort of infected becomes a discrete Volterra equation, from which we obtain the day by day equation (6.5) for the daily reproduction number.

Numerical simulations

Comparison of deterministic and stochastic simulations
In the simulations, the unit of time is one day, and we fix $S_0 = 10^7 = 10\,000\,000$, $1/\nu = 9$ days, and $R_0 = 1.1$.
For each function $\beta(a)$ described below, the parameter $\tau$ is obtained numerically by using the following formula
$$\tau = \frac{R_0}{S_0 \int_0^{+\infty} \beta(a)\,e^{-\nu a}\,da},$$
where the integral is computed by using the Simpson integration method.
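The calibration of $\tau$ can be sketched as follows, using $R_0 = \tau S_0 \int_0^{+\infty} \beta(a) e^{-\nu a}\,da$. The profile `beta` below is an assumption: a shifted gamma-like shape $\beta_0 (a - a_0) e^{-\beta_1 (a - a_0)}$ for $a > a_0$, chosen because with $a_0 = 3$, $\beta_0 = e/2$ and $\beta_1 = 1/2$ it peaks at exactly 1; it is not claimed to be the exact formula used for the simulations. The truncation age `A_MAX` and the helper `simpson` are also illustrative.

```python
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

A0, BETA0, BETA1 = 3.0, math.e / 2.0, 0.5   # parameters quoted for Example 1

def beta(a):
    """Assumed shifted gamma-like probability to be infectious (peaks at 1)."""
    if a <= A0:
        return 0.0
    return BETA0 * (a - A0) * math.exp(-BETA1 * (a - A0))

S0, NU, R0_TARGET = 1.0e7, 1.0 / 9.0, 1.1
A_MAX = 200.0   # truncation of the infinite integral (assumption; the tail is negligible)

integral = simpson(lambda a: beta(a) * math.exp(-NU * a), 0.0, A_MAX)
tau = R0_TARGET / (S0 * integral)

# Closed form of the same integral for this particular beta, used as a cross-check.
analytic = BETA0 * math.exp(-NU * A0) / (BETA1 + NU) ** 2
print(f"integral = {integral:.6f} (closed form {analytic:.6f}), tau = {tau:.6e}")
```

The grid is chosen so that the kink of `beta` at $a_0$ falls on a panel boundary, which keeps the Simpson rule accurate.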
In the following, we use the numerical scheme described in Appendix A to run the simulation of the Volterra integral equation (4.3)-(4.1). We use the Individual Based Model (IBM) described in Appendix B to run the stochastic simulations of the model. In the following, we illustrate the convergence of the IBM to the deterministic model whenever I 0 increases.

Example 1
We assume that the probability to be infectious is a shifted gamma-like distribution (7.1), with $a_0 = 3$ days, $\beta_0 = e/2 = 1.3591$, and $\beta_1 = 1/2 = 0.5$. In Figures 2-3 we use the IBM to investigate some properties of the clusters obtained from the stochastic simulations. We compare such a stochastic sample with the original $a \mapsto R_0(a)$.
By comparing [...] 1 000), we apply the discrete-time equation (6.5) to reconstruct $a \mapsto R_0(a)$ from the trajectories of the deterministic or stochastic models. In the deterministic model, we observe the effect of the day-by-day discretization (which corresponds to the daily reported data). In the stochastic case, we observe the effect of the stochasticity of the IBM.
Simulations for $I_0 = 10$: In Figures 6-8 we focus on the reconstruction of the daily reproduction number $R_0(a) = \tau S_0 e^{-\nu a} \beta(a)$. In Figure 6, we focus on the reconstruction of the daily reproduction number from deterministic simulations, while in Figures 7-8 we focus on the reconstruction of the daily reproduction number from stochastic simulations.
Figure 6: On the left-hand side, we plot the daily basic reproduction number by using the original formula $R_0(a)$ with (7.1). On the right-hand side, we apply formula (6.5) to the flow of new infected obtained from the deterministic model. We vary $I_0 = 6, 10, 14$. The value $I_0 = 10$ corresponds to the value used for the simulation of the deterministic model. The yellow curve gives the best visual fit, and $R_0(a)$ becomes negative whenever $I_0$ becomes too small.
[...] 4), and we compare it with the cumulative number of cases obtained from 500 runs of the IBM. On the right-hand side, we plot the average values of the 500 runs obtained from the IBM as well as the quantiles (10%-90% in light blue and 25%-75% in blue).
In Figures 11-13 we focus on the reconstruction of the daily reproduction number $R_0(a) = \tau S_0 e^{-\nu a} \beta(a)$. In Figure 11, we focus on the reconstruction of the daily reproduction number from deterministic simulations, while in Figures 12-13 we focus on the reconstruction of the daily reproduction number from stochastic simulations.
Figure 11: On the left-hand side, we plot the daily basic reproduction number by using the original formula $R_0(a)$ with (7.1). On the right-hand side, we apply formula (6.5) to the flow of new infected obtained from the deterministic model. We vary $I_0 = 600, 1000, 1400$. The value $I_0 = 1000$ corresponds to the value used for the simulation of the deterministic model. The yellow curve gives the best visual fit, and $R_0(a)$ becomes negative whenever $I_0$ becomes too small.
Figure 13: On the left-hand side, we plot the daily number of cases $t \mapsto N(t)$ (for $t = 0, 1, 2, \dots$) obtained by summing the daily number of cases for 500 runs of the IBM. On the right-hand side, we apply formula (6.5) (with $I_0 = 500 \times 1000$) to the daily number of cases obtained from the IBM.

Example 2
It is common to see biphasic flu clinically: after an incubation of one day, there is a high fever, then a drop in temperature before it rises again, hence the term "V" fever [7]. Such biphasic contagiousness is also observed in Covid-19. The viral load in throat swabs and sputum has been measured for Covid-19 patients, and it leads to biphasic contagiousness [10,28]. To cover this type of infectious disease, we introduce the form (7.2) for the probability to be infectious $a \mapsto \beta(a)$.
In Figures 15-16 we use the IBM to investigate some properties of the clusters obtained from the stochastic simulations. We compare such a stochastic sample with the original $a \mapsto R_0(a)$.
By comparing [...] 1 000), we apply the discrete-time equation (6.5) to reconstruct $a \mapsto R_0(a)$ from the trajectories of the deterministic or stochastic models. In the deterministic model, we observe the effect of the day-by-day discretization (which corresponds to the daily reported data). In the stochastic case, we observe the effect of the stochasticity of the IBM.
Simulations for $I_0 = 10$: In Figures 19-21 we focus on the reconstruction of the daily reproduction number $R_0(a) = \tau S_0 e^{-\nu a} \beta(a)$. In Figure 19, we focus on the reconstruction of the daily reproduction number from deterministic simulations, while in Figures 20-21 we focus on the reconstruction of the daily reproduction number from stochastic simulations.
Figure 19: On the left-hand side, we plot the daily basic reproduction number by using the original formula $R_0(a)$ with (7.2). On the right-hand side, we apply formula (6.5) to the flow of new infected obtained from the deterministic model. We vary $I_0 = 6, 10, 14$. The value $I_0 = 10$ corresponds to the value used for the simulation of the deterministic model. The yellow curve gives the best visual fit, and $R_0(a)$ becomes negative whenever $I_0$ becomes too small.
In Figures 24-26 we focus on the reconstruction of the daily reproduction number $R_0(a) = \tau S_0 e^{-\nu a} \beta(a)$. In Figure 24, we focus on the reconstruction of the daily reproduction number from deterministic simulations, while in Figures 25-26 we focus on the reconstruction of the daily reproduction number from stochastic simulations.
Figure 24: On the left-hand side, we plot the daily basic reproduction number by using the original formula $R_0(a)$ with (7.2). On the right-hand side, we apply formula (6.5) to the flow of new infected obtained from the deterministic model. We vary $I_0 = 600, 1000, 1400$. The value $I_0 = 1000$ corresponds to the value used for the simulation of the deterministic model. The yellow curve gives the best visual fit, and $R_0(a)$ becomes negative whenever $I_0$ becomes too small.
Figure 26: On the left-hand side, we plot the daily number of cases $t \mapsto N(t)$ (for $t = 0, 1, 2, \dots$) obtained by summing the daily number of cases for 500 runs of the IBM. On the right-hand side, we apply formula (6.5) (with $I_0 = 500 \times 1000$) to the daily number of cases obtained from the IBM.

Application to SARS-CoV-1
In practice, the Kermack-McKendrick model starting from a Dirac mass means that the epidemic starts from a single patient at time t 0 (whenever I 0 = 1) or from a group of I 0 infected patients all with the same age of infection a = 0 at time t 0 . This assumption corresponds to the standard conception of a cluster in epidemiology. An example of such a cluster is obtained [34] for the SARS-CoV-1 epidemic in Singapore in 2003. The cluster is represented by a network of contact between individuals in Figure 27. On the right-hand side, we plot the daily reported cases from Singapore for the epidemic of SARS in 2003. Case 1 generated 21 cases and 3 suspected cases, case 2 generated 23 cases and 5 suspected cases, case 3 generated 23 cases and 18 suspected cases, case 4 generated 40 cases and 22 suspected cases, case 5 generated 15 cases and 0 suspected cases [34]. The cases 1,2,3,4,5 correspond respectively to the patients 1, 6, 35, 130 and 127. Figure 27 presents the time series of reported cases by source of infection and date of fever onset. In Figure 28 we present three representations of these data in continuous time: as a step function, regularized by Gaussian average and rolling weekly average. In Figure 29 we apply the continuous-time model to the rolling weekly regularization of the data. Similar to the reconstruction of R 0 (a) presented in Figures 6-8, 11-13, 19-21, 24-26, the basic reproduction number R 0 (a) becomes negative after a given age. Our interpretation is that the data are far from perfect and involves sampling errors and probably a large number of undetected cases. The fact that the transmission rate is subject to variations in time could also explain this negativity. In Figure 30 we apply the discrete model (1.6) to the original data for different values of I 0 . Finally, in Figure 31, we transform the data by taking advantage of the information on the source of infection given in [34]. 
We fix an incubation period of 5 days, which corresponds to the average incubation period reported in [34]. Then we shift all secondary cases produced by the six sources identified in [34] to the same origin, as if all cases had been produced by the same cluster of six individuals. We present the data on the left-hand side of Figure 31 and apply the method to obtain R 0 (a) for I 0 = 30, I 0 = 50 and I 0 = 100.

Figure 28: Regularizations of the daily cases data from the SARS-CoV-1 outbreak in Singapore [34]. The applications in Figure 29 are done with the "Rolling Weekly" regularization.

Figure 31: Left: Daily reported cases of a theoretical cluster based on the data from the SARS-CoV-1 outbreak in Singapore in 2003 [34] (blue line). The secondary cases produced by the six patients identified in [34] are shifted to the same initial infection date of the infector, as if all were produced by the same cluster starting at t = 0. We also use an incubation period of 5 days, as indicated in [34]. Right: Numerical solution of the R 0 (a) function computed by using the discrete model (1.6) with I 0 = 30, I 0 = 50 and I 0 = 100.
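One possible reading of this pooling step is sketched below in Python. It is an assumption-laden illustration, not the procedure actually used for Figure 31: we assume that each source was infected one incubation period before its own fever onset, that this day is moved to t = 0, and that secondary cases are counted at their onset day on this common clock. The function name and the input layout are hypothetical, and the numbers in the usage example are invented and are not taken from [34].

```python
import numpy as np

INCUBATION_DAYS = 5  # average incubation period reported in [34]


def pool_secondary_cases(secondary_onsets, source_onset,
                         incubation=INCUBATION_DAYS):
    """Daily counts of secondary cases on a common clock.

    secondary_onsets: dict mapping a source label to the onset days of the
        cases attributed to that source.
    source_onset: dict mapping a source label to the source's own onset day.
    Assumption: a source is infected `incubation` days before its onset,
    and that infection day becomes t = 0 for all of its secondary cases.
    """
    shifted = []
    for source, onsets in secondary_onsets.items():
        source_infection_day = source_onset[source] - incubation
        shifted.extend(int(day - source_infection_day) for day in onsets)
    # Discard cases that would fall before t = 0 after the shift.
    shifted = np.array([t for t in shifted if t >= 0], dtype=int)
    return np.bincount(shifted)  # counts[t] = number of cases on day t


if __name__ == "__main__":
    # Hypothetical onset days, for illustration only.
    onsets = {"A": [12, 13], "B": [25]}
    sources = {"A": 10, "B": 20}
    print(pool_secondary_cases(onsets, sources))
```

The resulting array plays the role of the pooled daily case series t → N (t) of a single theoretical cluster.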

Discussion
We see from the numerical simulations in Section 7 that the initial number of infected individuals I 0 has a very significant influence on the value of the basic daily reproduction numbers R 0 (d): these decrease sharply with I 0 , until they become negative, and their fluctuations increase in stochastic simulations. This tendency to negativity for small I 0 and these fluctuations are only corrected when the results are averaged over a large number of stochastic simulations (500). It can also be noted that the stochastic simulations lead to a behavior of hyperexponential type in the coefficient of variation of the secondary cases produced by an infectious individual, that is, this coefficient is relatively constant and much greater than 1. This phenomenon is related to the exponential character of the gamma distribution used in the simulations.
In deterministic simulations, the reconstruction of the daily basic reproduction numbers R 0 (a) along the contagiousness period of an individual depends strongly on the value of I 0 . In particular, the set of days on which R 0 (a) fluctuates, in some cases becoming negative, is large for large values of I 0 . In stochastic simulations, we observe the same behavior for the individual curves in the sample of R 0 (a) curves, but the expectation of this sample considerably attenuates these fluctuations, and the coefficient of variation of the curves remains approximately constant and greater than 1, as in the case of hyperexponential distributions, in agreement with the exponential character of part D of equation (1.7) defining R 0 (a).
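The sensitivity of the reconstruction to I 0 can be illustrated on a toy example in Python. The sketch assumes a discrete renewal relation of the form N(d) = I 0 R 0 (d) + Σ_{j=1..d-1} N(j) R 0 (d - j) for d ≥ 1, which is a simplified discrete analogue of the Volterra equation and is not claimed to coincide with equation (1.6). The kernel values and the function names are hypothetical.

```python
import numpy as np


def simulate_cases(r0, i0, n_days):
    """Forward renewal model; cases[d] for d = 1..n_days (cases[0] is 0)."""
    r0 = np.asarray(r0, dtype=float)  # r0[a] = daily reproduction number at age a

    def kernel(age):
        return r0[age] if age < len(r0) else 0.0

    cases = np.zeros(n_days + 1)
    for d in range(1, n_days + 1):
        from_cohort = i0 * kernel(d)  # produced by the initial cohort
        from_later = sum(cases[j] * kernel(d - j) for j in range(1, d))
        cases[d] = from_cohort + from_later
    return cases


def reconstruct_r0(cases, i0):
    """Solve the assumed renewal relation for R0(d), one day at a time."""
    cases = np.asarray(cases, dtype=float)
    r0 = np.zeros(len(cases))
    for d in range(1, len(cases)):
        from_later = sum(cases[j] * r0[d - j] for j in range(1, d))
        r0[d] = (cases[d] - from_later) / i0
    return r0


if __name__ == "__main__":
    true_r0 = [0.0, 0.5, 0.8, 0.4, 0.1]  # hypothetical kernel
    cases = simulate_cases(true_r0, i0=10, n_days=12)
    print(reconstruct_r0(cases, i0=10))  # reconstruction with the true I0
    print(reconstruct_r0(cases, i0=2))   # reconstruction with I0 too small
```

With this toy kernel, the reconstruction with the true value I 0 = 10 recovers the kernel, whereas the same data with I 0 = 2 give R 0 (1) = 5/2 = 2.5 and then R 0 (2) = (10.5 - 5 × 2.5)/2 = -1. Under this assumed relation, an underestimated I 0 inflates the first reconstructed values, and the later ones turn negative to compensate.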
Concerning the clusters, observations made during investigations of the start of the outbreak in some countries [1, 5, 6, 11, 15, 18-21, 24, 29-31, 33] provide spatial and temporal information on the start of the epidemic, but these studies rarely allow the estimation of the parameters S 0 and τ in the population concerned and, worse, they give no indication of how long these parameters remain constant. Here we assumed that they remained constant only during the period of exponential growth of the observed new cases.
Our work provides a method to reconstruct the daily basic reproduction number R 0 (a) from the daily reported cases data, as long as we consider a cluster starting from a single infected individual. This is a strong assumption which is usually neglected. It is extremely hard to find in the literature a dataset that satisfies this assumption. For COVID-19, we did not find any publication including suitable data. Although such data have not been published yet, we believe that they could be gathered by detailed contact tracing and, duly anonymized, could be made available on request. That would allow the development of more realistic and accurate methods for the analysis and forecast of epidemics.