In vivo kinetics of SARS-CoV-2 infection and its relationship with a person’s infectiousness

Significance
Quantifying the kinetics of SARS-CoV-2 infection and individual infectiousness is important for understanding SARS-CoV-2 transmission and evaluating intervention strategies. Here, we developed within-host models of SARS-CoV-2 infection, and by fitting them to clinical data, we estimated key within-host viral dynamic parameters. We also developed a mechanistic model for viral transmission and show that the logarithm of the viral load in the upper respiratory tract serves as an appropriate surrogate for a person's infectiousness. Using data on how viral load changes during infection, we further evaluated the effectiveness of PCR and antigen-based testing strategies for averting transmission and identifying infected individuals.


Choice of fixed parameter values
Total target cell numbers in the absence of infection ($T_0$). It has been estimated that there are $4 \times 10^8$ epithelial cells in the upper respiratory tract (URT) (1). Hou et al. recently estimated that the fraction of cells that express angiotensin-converting enzyme 2 (ACE2), i.e., the receptor for SARS-CoV-2 entry, on the cell surface is approximately 20% in the URT (2). A much higher fraction of cells express the type II transmembrane serine protease TMPRSS2, a co-receptor for SARS-CoV-2 entry (2). Therefore, in our model, we assume the initial number of target cells in the URT to be $T_{1,0} = 8 \times 10^7$ cells, i.e., 20% of the total epithelial cells. Note that for a standard viral dynamics model (as well as in our immunity model described below), the number of initial target cells and the virus production rate are individually unidentifiable and only their product is identifiable (3). Thus, an increase (decrease) in the initial number of target cells will lead to a corresponding decrease (increase) in the estimate of the virus production rate but not in the estimates of other parameters, including $R_0$ (see Eqs. [4] and [5]).
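The identifiability statement can be illustrated with a short scaling argument. The sketch below assumes the standard target cell limited equations with an eclipse phase; the symbols $\beta$, $k$, $\delta$, $\pi$, the generic initial target cell number $T_0$ and the scaling factor $a$ are notation assumed here for illustration, and the initial number of infected cells is assumed to be rescaled together with the target cells.

```latex
% Assumed model:
%   dT/dt = -\beta V T,  dE/dt = \beta V T - kE,
%   dI/dt = kE - \delta I,  dV/dt = \pi I - cV
\begin{align*}
&\text{Rescale, for any } a > 0: \quad
  \tilde T = aT,\ \ \tilde E = aE,\ \ \tilde I = aI,\ \ \tilde\pi = \pi / a. \\
&\frac{d\tilde T}{dt} = -\beta V \tilde T, \qquad
  \frac{d\tilde E}{dt} = \beta V \tilde T - k\tilde E, \qquad
  \frac{d\tilde I}{dt} = k\tilde E - \delta \tilde I, \\
&\frac{dV}{dt} = \tilde\pi\,\tilde I - cV = \pi I - cV
  \quad\Rightarrow\quad V(t) \text{ is unchanged.} \\
&R_0 = \frac{\beta\,\pi\,T_0}{c\,\delta}
     = \frac{\beta\,\tilde\pi\,\tilde T_0}{c\,\delta}.
\end{align*}
```

Under these assumptions the viral load trajectory and the basic reproductive number depend on the production rate and the initial target cell number only through their product, so fixing the number of target cells fixes the scale of the estimated production rate.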
Initial number of infected cells (in an eclipse phase), $E_0$. Evidence strongly suggests that the URT is the initial site of infection (2, 4). Thus, we assume that one cell in the URT is infected at the start of infection, $E_0 = 1$ cell, rather than setting an initial viral load, which avoids the complication of predicting whether one or more infectious virions would be present for any chosen $V_0$. This approach is similar to that in Ref. (5), which showed that this assumption does not change the dynamics of the model significantly, as any initial viral particles that succeed in initiating infection must (rapidly) infect one or more cells before being cleared. In a sensitivity analysis, we test $E_0 = 10$ cells.
Virus clearance rate, $c$. We set $c = 10$/day because in vivo viral clearance is usually fast in many infections, including respiratory infections such as influenza (1, 5, 6). We and others have used this value of $c$ in previous models of infection by SARS-CoV-2 (7, 8). In sensitivity analyses, we set $c = 5$ or $20$/day.
… showed viral titers of $10^4$ PFU/ml in vitro at 6 hours post infection, the earliest time point sampled.

The target cell limited (TCL) model
We first construct a within-host model based on the target cell limited model. The model keeps track of the total numbers of target cells ($T$), cells in the eclipse phase of infection ($E$), i.e., infected cells not yet producing virus, productively infected cells ($I$) and total viruses ($V_T$). To compare the model with data, we keep track of sampled viruses, i.e., virus levels measured in nasopharyngeal swabs, $V$, and assume that these levels are proportional to the actual number of viruses in the URT, $V_T$. The ordinary differential equations (ODEs) describing the model are

$$
\begin{aligned}
\frac{dT}{dt} &= -\beta_T V_T T \\
\frac{dE}{dt} &= \beta_T V_T T - kE \\
\frac{dI}{dt} &= kE - \delta I \\
\frac{dV_T}{dt} &= pI - cV_T \\
V &= f V_T
\end{aligned}
\qquad \text{[S1]}
$$

where $f$ is the proportion of virus sampled from the URT, either in a single swab for the German data or in 1 mL of the fluid that the swab was placed in for the NBA data.

Model simplification. From Eqn. S1, we have:

$$
\frac{dT}{dt} = -\frac{\beta_T}{f} V T, \qquad \frac{dV}{dt} = f\frac{dV_T}{dt} = fpI - cV.
$$

Letting $\pi = fp$ and $\beta = \beta_T / f$, we get the following ODEs, shown as Eqn. [2] in the main text:

$$
\begin{aligned}
\frac{dT}{dt} &= -\beta V T \\
\frac{dE}{dt} &= \beta V T - kE \\
\frac{dI}{dt} &= kE - \delta I \\
\frac{dV}{dt} &= \pi I - cV
\end{aligned}
$$

3. Innate immunity models
We constructed three versions of the innate immune model. The first version of the model is used in the analyses in the main text, and is termed the innate immune model throughout the main text.
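The simplified target cell limited model can be explored numerically. The following is a minimal sketch, not the authors' fitting code: it assumes the standard form of the model with an eclipse phase, uses the fixed values stated above (8×10^7 target cells, one initially infected cell, c = 10/day), and fills in hypothetical values for the infection rate `beta`, eclipse rate `k`, infected cell death rate `delta` and production rate `pi`. A hand-written fourth-order Runge-Kutta step keeps the example free of dependencies.

```python
def tcl_rhs(state, beta, k, delta, pi, c):
    """Right-hand side of the target cell limited model (assumed form)."""
    T, E, I, V = state
    infection = beta * V * T
    return (-infection, infection - k * E, k * E - delta * I, pi * I - c * V)


def rk4_step(f, state, dt, *args):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state, *args)
    k2 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)), *args)
    k3 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)), *args)
    k4 = f(tuple(s + dt * d for s, d in zip(state, k3)), *args)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * g + d)
        for s, a, b, g, d in zip(state, k1, k2, k3, k4)
    )


def simulate_tcl(days=15.0, dt=0.01, T0=8e7, E0=1.0,
                 beta=5e-8, k=4.0, delta=2.0, pi=50.0, c=10.0):
    """Return rows (t, T, E, I, V). beta, k, delta and pi are hypothetical."""
    state = (T0, E0, 0.0, 0.0)
    rows = [(0.0,) + state]
    for step in range(1, int(round(days / dt)) + 1):
        state = rk4_step(tcl_rhs, state, dt, beta, k, delta, pi, c)
        rows.append((step * dt,) + state)
    return rows


trajectory = simulate_tcl()
peak_time, peak_v = max(((row[0], row[4]) for row in trajectory),
                        key=lambda pair: pair[1])
print(f"peak sampled viral load {peak_v:.3g} at day {peak_time:.2f}")
```

Because the hypothetical parameters are not fitted values, the printed peak is only illustrative; the sketch is meant to show the structure of the model, in which viral load rises until target cells are depleted and then declines.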

a. Innate immune model: refractory cells
In the first version of the innate immune model, we keep track of type I interferon ($F$) and cells refractory to infection ($R$), in addition to the compartments in the TCL model. We assume that binding of interferons to receptors on target cells stimulates genes that make target cells refractory to infection. In this model, the impact of the innate immune response is to convert target cells into refractory cells at rate $\phi F T$, where $\phi$ is a rate constant. Refractory cells can become target cells again at rate $\rho$. Interferon is produced and cleared at rates $s$ and $\mu$, respectively.
To minimize the number of unknown parameters, we simplify the model by making the quasi-steady-state assumption that the interferon dynamics are much faster than the dynamics of infected cells and assume that $dF/dt = 0$. Thus $sI = \mu F$ or $F = sI/\mu$.
Let $\Phi = \phi s / \mu$, so that the ODEs for the innate immunity model become

$$
\begin{aligned}
\frac{dT}{dt} &= -\beta V T - \Phi I T + \rho R \\
\frac{dR}{dt} &= \Phi I T - \rho R \\
\frac{dE}{dt} &= \beta V T - kE \\
\frac{dI}{dt} &= kE - \delta I \\
\frac{dV}{dt} &= \pi I - cV
\end{aligned}
$$

b. Innate immune model: reducing infectivity
The second version of the model considers that interferons may reduce infectivity, i.e., make cells less susceptible to infection. Again, we make the quasi-steady-state assumption that the interferon dynamics are much faster than the dynamics of infected cells and assume that $F$ is proportional to $I$. The ODEs for the model include a constant representing the effect of innate response mediators such as type I interferon.
c. Innate immune model: reducing virus production
The third version of the model considers the potential impact of the innate response on reducing virus production from infected cells. For example, in hepatitis C virus infection, administration of type I interferon reduces viral RNA replication and viral production (10, 11). As above, we make the quasi-steady-state assumption that the interferon dynamics are much faster than the dynamics of infected cells and assume that $F$ is proportional to $I$. The ODEs for the model include a constant representing the effect of innate response mediators such as interferon.
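The effect of the first (refractory cell) version of the innate immune model can be sketched numerically. The code below is an illustration under stated assumptions rather than the authors' implementation: target cells are assumed to become refractory at rate `Phi * I * T` and to revert at rate `rho * R`, and all parameter values except the fixed ones given earlier (8×10^7 target cells, one initially infected cell, c = 10/day) are hypothetical. Setting `Phi = 0` recovers the target cell limited model, which gives a simple comparison.

```python
def innate_rhs(state, p):
    """Refractory-cell innate immune model (assumed form)."""
    T, R, E, I, V = state
    infection = p["beta"] * V * T
    to_refractory = p["Phi"] * I * T      # interferon-mediated protection
    back_to_target = p["rho"] * R         # loss of the refractory state
    return (
        -infection - to_refractory + back_to_target,
        to_refractory - back_to_target,
        infection - p["k"] * E,
        p["k"] * E - p["delta"] * I,
        p["pi"] * I - p["c"] * V,
    )


def rk4_step(state, dt, p):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = innate_rhs(state, p)
    k2 = innate_rhs(shifted(state, k1, 0.5 * dt), p)
    k3 = innate_rhs(shifted(state, k2, 0.5 * dt), p)
    k4 = innate_rhs(shifted(state, k3, dt), p)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * g + d)
        for s, a, b, g, d in zip(state, k1, k2, k3, k4)
    )


def peak_viral_load(Phi, days=15.0, dt=0.01):
    """Peak of V; beta, k, delta, pi and rho are hypothetical values."""
    p = dict(beta=5e-8, k=4.0, delta=2.0, pi=50.0, c=10.0, rho=0.1, Phi=Phi)
    state = (8e7, 0.0, 1.0, 0.0, 0.0)     # T, R, E, I, V at infection
    peak = 0.0
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt, p)
        peak = max(peak, state[4])
    return peak


peak_without = peak_viral_load(Phi=0.0)   # reduces to the TCL model
peak_with = peak_viral_load(Phi=1e-6)     # hypothetical interferon effect
print(f"peak V without innate response: {peak_without:.3g}")
print(f"peak V with innate response:    {peak_with:.3g}")
```

With these illustrative values the refractory mechanism removes target cells from the susceptible pool before they can be infected, which lowers the peak viral load relative to the target cell limited case.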

Model fitting and parameter estimation
Estimating time of infection
To estimate the times of infection of individuals in the NBA dataset, we fit both the TCL model and the innate immune model to the viral load measurements from each individual by minimizing the sum of squared residuals between the measured and the model-predicted viral loads on a logarithmic scale. The best estimates of the times of infection are reported in Table S1.
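The least-squares idea can be sketched as a one-dimensional search over candidate infection times. In the example below, `log10_viral_load` is a hypothetical piecewise-linear stand-in for the model prediction (in the actual analysis the prediction would come from solving the TCL or innate immune model), and the sampling days, candidate grid and synthetic observations are all made up for illustration.

```python
def log10_viral_load(tau, peak_day=5.0, peak_log10=8.0, up=2.0, down=0.6):
    """Hypothetical stand-in: log10 viral load tau days after infection."""
    if tau <= peak_day:
        return peak_log10 - up * (peak_day - tau)
    return peak_log10 - down * (tau - peak_day)


def sum_squared_residuals(t_inf, times, log10_obs):
    """Squared error on the log10 scale for a candidate infection time."""
    return sum(
        (y - log10_viral_load(t - t_inf)) ** 2
        for t, y in zip(times, log10_obs)
    )


def estimate_infection_time(times, log10_obs, candidates):
    """Pick the candidate infection time with the smallest squared error."""
    return min(
        candidates,
        key=lambda t_inf: sum_squared_residuals(t_inf, times, log10_obs),
    )


# Synthetic example: sampling days are relative to the first swab (day 0).
sample_days = [0.0, 1.0, 2.0, 4.0, 6.0, 9.0]
true_t_inf = -3.0
observations = [log10_viral_load(t - true_t_inf) for t in sample_days]
candidates = [-10.0 + 0.25 * i for i in range(41)]   # -10.0 ... 0.0 days
best_t_inf = estimate_infection_time(sample_days, observations, candidates)
print(f"estimated infection time: day {best_t_inf}")
```

A grid search is used here only for transparency; any one-dimensional optimizer would serve the same purpose, and with noisy data the estimate would of course not recover the true value exactly.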
Parameter estimation from all datasets
We used a population approach, based on non-linear mixed effect modeling, to fit the data from all patients simultaneously with each of the models. For the NBA dataset, we fixed the infection dates at the values estimated using the corresponding model (Table S1); for the German dataset, we used the known infection dates. We allowed random effects on the fitted parameters.
We analyzed the source of the dataset, i.e., the NBA data or the German data, as a categorical covariate for the fitted parameters. We first tested the model by assuming that all of the parameters covary with the covariate. We then excluded the parameter that had the lowest p-value by the Pearson's correlation test, which Monolix uses to test whether covariates should be removed from the model. All estimations were performed using Monolix (Monolix Suite 2019R1, Antony, France: Lixoft SAS, 2019, lixoft.com/products/monolix/).

Inferring the relationship between the number of infectious viruses and viral load
To understand how the level of infectious viruses relates to viral load, we constructed and fit mathematical models to the three datasets. In the first dataset, 'the Jaafar dataset', Jaafar et al. reported, for each Ct value, the total number of samples and the number of samples that were positive in cell culture (12).

Relationship between viral load and Ct values reported in Jaafar et al. (12)
First, the viral load, $V$, measured as RNA copies in a sample, is related to the cycle threshold value, $C$, through a log-linear relationship in which the constants $a$ and $b$ are determined by the RT-PCR assay used. Jaafar et al. measured the Ct values using the LightCycler 480 instrument (Roche Diagnostics) (12). According to a recent report (14) …

Relationship between viral load and infectious viruses
We assume the number of infectious viruses that was in the sample for the cell culture experiment to be a random variable, $Y$, that follows a Poisson distribution. We consider three alternative models describing how the mean number of infectious viruses in a sample, $V_{inf} = E(Y)$, is related to the viral load measured by qPCR: the 'linear' model, the 'power-law' model and the 'saturation' model.
1. The linear model
We first assume that the mean of the infectious virus in a sample, $V_{inf}$, is proportional to the viral load, $V$, in the sample, i.e.,

$$V_{inf} = \alpha V, \qquad \text{[S9]}$$

where $\alpha$ is a constant. This is the simplest model describing the relationship between infectious viruses and viral load. However, as we will show below, the model does not fit the three datasets. Therefore, we developed two additional models to describe this relationship.
2. The power-law model
In this model, we assume that the mean of the infectious virus in a sample, $V_{inf}$, is related to the viral load, $V$, by a power-law function

$$V_{inf} = B V^{h}, \qquad \text{[S10]}$$

where $B$ and $h$ are constants.
3. The saturation model
In this model, we assume that the mean of the infectious virus in a sample, $V_{inf}$, is related to the viral load, $V$, by a Hill function

$$V_{inf} = V_m \frac{V^{h}}{V^{h} + K_m^{h}}, \qquad \text{[S11]}$$

where $V_m$ and $K_m$ are constants.

Probability of cell culture positivity
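We now calculate the probability of cell culture positivity for the three models in Eqs. [S9]-[S11]. We assume the number of infectious viruses in the sample, $Y$, is a random variable that follows a Poisson distribution with mean $E[Y]$. If each infectious virus has a probability $q$ to establish infection such that the cell culture becomes positive, then the number of viruses that successfully establish an infection in cell culture is Poisson with parameter $\lambda = qE[Y]$. Thus, the probability of one or more viruses successfully infecting the culture so that it tests positive is

$$p = 1 - e^{-\lambda} = 1 - e^{-qE[Y]}. \qquad \text{[S12]}$$

Using the subscript $i$ to denote the model for $E[Y]$, for the linear model we substitute the expression for $E[Y]$ in Eq. [S9] into Eq. [S12]; the power-law and saturation models are treated in the same way. This gives

$$
\begin{aligned}
p_1(V) &= 1 - e^{-AV}, & \text{[S13]} \\
p_2(V) &= 1 - e^{-DV^{h}}, & \text{[S14]} \\
p_3(V) &= 1 - \exp\!\left(-G\,\frac{V^{h}}{V^{h} + K_m^{h}}\right), & \text{[S15]}
\end{aligned}
$$

where $A = q\alpha$, with $\alpha$ the proportionality constant of the linear model, $D = qB$, and $G = qV_m$, with $V_m$ the maximum of the Hill function.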
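Model fitting
Jaafar et al. reported the total number of samples and the number of samples that were positive in cell culture for each Ct value (ranging between 11 and 37) (12). We can calculate the likelihood of observing these numbers given the probabilities of cell culture positivity as defined in Eqs. [S13]-[S15]. Writing $p_i(V)$ for the probability of cell culture positivity under the $i$th model, the probability of observing $n_j$ positive cell cultures in a total of $N_j$ cultures, where $j$ denotes the $j$th Ct value of the sample put into culture, $j = 11, 12, \ldots, 37$, is

$$P_{i,j} = \binom{N_j}{n_j}\, p_i(V_j)^{n_j}\, \left[1 - p_i(V_j)\right]^{N_j - n_j}, \qquad \text{[S16]}$$

where $V_j$ is the viral load corresponding to the $j$th Ct value.
The negative log-likelihood (NLL) of the $i$th model given all the data in Jaafar et al. is then given by

$$NLL_i = -\sum_j \log P_{i,j}, \quad i = 1, 2, 3. \qquad \text{[S17]}$$

In the Kohmer dataset and the Jones dataset, the viral load and the cell culture positivity were reported for each sample. We calculate the likelihood of the $k$th observation being positive or negative as

$$
L_{i,k} =
\begin{cases}
p_i(V_k) & \text{if the culture is positive,} \\
1 - p_i(V_k) & \text{if the culture is negative,}
\end{cases}
\qquad \text{[S18]}
$$

where $V_k$ is the viral load of the $k$th observation. The NLL of the $i$th model for these datasets is then

$$NLL_i = -\sum_k \log L_{i,k}, \quad i = 1, 2, 3. \qquad \text{[S19]}$$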
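The likelihood calculations above can be sketched in a few functions. This is an illustrative implementation under assumptions, not the authors' code: the culture positivity probabilities are taken to have the form 1 - exp(-mean number of successful infections), with composite parameters `A`, `D`, `G`, `h` and `Km` as named in the model comparison section, and the data and the value of `G` are hypothetical. Probabilities are clipped away from 0 and 1 only to keep the logarithms finite.

```python
import math


def p_linear(v, A):
    """Culture positivity under the linear model (assumed form)."""
    return -math.expm1(-A * v)            # 1 - exp(-A v)


def p_power_law(v, D, h):
    """Culture positivity under the power-law model (assumed form)."""
    return -math.expm1(-D * v ** h)


def p_saturation(v, G, h, Km):
    """Culture positivity under the saturation (Hill) model (assumed form)."""
    vh = v ** h
    return -math.expm1(-G * vh / (vh + Km ** h))


def _clip(p, eps=1e-12):
    return min(max(p, eps), 1.0 - eps)


def nll_binomial(counts, p_of_v):
    """NLL for binned data: rows of (viral_load, n_positive, n_total)."""
    nll = 0.0
    for v, n_pos, n_tot in counts:
        p = _clip(p_of_v(v))
        nll -= (math.log(math.comb(n_tot, n_pos))
                + n_pos * math.log(p)
                + (n_tot - n_pos) * math.log(1.0 - p))
    return nll


def nll_bernoulli(observations, p_of_v):
    """NLL for per-sample data: rows of (viral_load, culture_positive)."""
    nll = 0.0
    for v, positive in observations:
        p = _clip(p_of_v(v))
        nll -= math.log(p if positive else 1.0 - p)
    return nll


# Hypothetical data and a hypothetical value of G, for illustration only.
binned = [(1e8, 9, 10), (1e6, 5, 10), (1e4, 1, 10)]
per_sample = [(1e8, True), (1e6, True), (1e6, False), (1e4, False)]


def model(v):
    return p_saturation(v, G=3.0, h=0.51, Km=8.8e6)


nll_binned = nll_binomial(binned, model)
nll_samples = nll_bernoulli(per_sample, model)
print(f"binomial NLL: {nll_binned:.3f}, Bernoulli NLL: {nll_samples:.3f}")
```

In a fitting routine these NLL functions would be passed to a numerical minimizer over the model parameters; that step is omitted here to keep the sketch dependency-free.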

Model comparison
To compare models, we compute the AIC scores as

$$AIC_i = 2m_i + 2\,NLL_i,$$

where $m_i$ is the number of estimated parameters of the $i$th model, i.e., $m_1 = 1$ (parameter $A$), $m_2 = 2$ (parameters $D$ and $h$) and $m_3 = 3$ (parameters $G$, $h$ and $K_m$).
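As a small worked example of this comparison, the sketch below computes AIC scores from negative log-likelihoods using the standard definition (twice the number of parameters plus twice the NLL). The NLL values are hypothetical placeholders, not results from the paper; only the parameter counts (1, 2 and 3) come from the text.

```python
def aic(nll, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2 * n_params + 2 * nll


# Parameter counts from the text; NLL values are hypothetical.
n_params = {"linear": 1, "power-law": 2, "saturation": 3}
example_nll = {"linear": 130.0, "power-law": 101.5, "saturation": 100.2}

scores = {name: aic(example_nll[name], n_params[name]) for name in n_params}
best_model = min(scores, key=scores.get)
delta_aic = {name: score - scores[best_model] for name, score in scores.items()}

for name in scores:
    print(f"{name:>10}: AIC = {scores[name]:.1f}, delta = {delta_aic[name]:.1f}")
```

Reporting the difference from the best score makes it easy to see when a second model has only a slightly higher AIC and therefore retains support.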

Results
We fitted the three models to the three datasets by minimizing the NLLs described in Eqs. [S17] and [S19]. According to the AIC scores (the lower the score, the better the model), the saturation model best describes the Jaafar dataset (Table S5). The power-law model best describes both the Jones dataset and the Kohmer dataset (Table S5). For these two datasets, the saturation model had only slightly higher AIC scores and thus also has considerable support (15).
Interestingly, the estimated parameter values using the saturation model are very similar across the three datasets (Table S6), emphasizing the reliability of these estimates. For the saturation model in the main text, we use $h = 0.51$ and $K_m = 8.8 \times 10^6$ RNA copies/ml, as estimated from the Jaafar dataset. For the power-law model in the main text, we use $h = 0.53$, as estimated from the Jones dataset.

Figure S1. The individual infectiousness profile (blue lines) predicted by the saturation model for individuals in the Germany study (A) and the NBA study (B).
Parameters used are the same as in Fig. 2D. In panel A, the expected serial interval (SI), the fraction of presymptomatic transmission and the infectious period are reported. In panel B, only the expected serial interval (SI) and the infectious period are reported, because the symptom onset dates for these individuals are unknown. Horizontal dashed lines denote the threshold we defined (i.e., 0.02) above which a person becomes infectious. Vertical lines in panel A denote the time of symptom onset as reported in Ref. (16).

… load dynamics from model simulations, and right panels show the predicted infectiousness for the corresponding viral load dynamics. In Scenario 1, we assumed, for simplicity, that in breakthrough infections the viral load decreases uniformly across all time points by 10-, 100- or 1000-fold (dashed lines). In Scenario 2, we assumed that the peak viral load (VL) is decreased by 10-, 100- or 1000-fold, but the exponential growth and decline curves are kept the same as in unvaccinated participant 737, except that once the VL reaches the reduced peak VL defined for the vaccinated simulation scenario, it remains at that peak. The predicted reductions in overall infectiousness are shown in the legends of the right panels.
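The two breakthrough-infection scenarios can be illustrated with a short calculation. The sketch below rests on several assumptions that are not taken from the paper: the baseline viral load is a hypothetical piecewise-exponential curve rather than the fitted trajectory of participant 737, infectiousness is approximated by the normalized saturation function with h = 0.51 and Km = 8.8×10^6 RNA copies/ml, and overall infectiousness is taken to be the area under that curve.

```python
H = 0.51        # Hill coefficient reported for the saturation model
KM = 8.8e6      # half-saturation viral load, RNA copies/ml


def infectiousness_proxy(v):
    """Normalized saturation function of viral load (assumed proxy)."""
    vh = v ** H
    return vh / (vh + KM ** H)


def baseline_viral_load(t, peak_day=4.0, peak_log10=8.0, up=2.0, down=0.7):
    """Hypothetical unvaccinated trajectory: exponential rise, then decline."""
    if t <= peak_day:
        return 10.0 ** (peak_log10 - up * (peak_day - t))
    return 10.0 ** (peak_log10 - down * (t - peak_day))


def scenario_uniform(v, fold, peak):
    """Scenario 1: viral load reduced by the same factor at all times."""
    return v / fold


def scenario_capped_peak(v, fold, peak):
    """Scenario 2: same growth and decline, but capped at a reduced peak."""
    return min(v, peak / fold)


def area_under_curve(times, values):
    """Trapezoidal rule."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


times = [0.1 * i for i in range(201)]                 # days 0 to 20
baseline = [baseline_viral_load(t) for t in times]
peak = max(baseline)
base_auc = area_under_curve(times, [infectiousness_proxy(v) for v in baseline])


def reduction(scenario, fold):
    """Fractional reduction in the area under the infectiousness curve."""
    scaled = [scenario(v, fold, peak) for v in baseline]
    auc = area_under_curve(times, [infectiousness_proxy(v) for v in scaled])
    return 1.0 - auc / base_auc


for fold in (10, 100, 1000):
    r1 = reduction(scenario_uniform, fold)
    r2 = reduction(scenario_capped_peak, fold)
    print(f"{fold:>4}-fold: scenario 1 {r1:.0%}, scenario 2 {r2:.0%}")
```

Under these assumptions the uniform reduction always lowers the proxy at least as much as capping the peak, because the capped trajectory coincides with the unvaccinated one whenever the viral load is below the reduced peak.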