Quantifying asymptomatic infection and transmission of COVID-19 in New York City using observed cases, serology, and testing capacity

Significance

As health officials face another wave of COVID-19, they require estimates of the proportion of infected cases that develop symptoms, and the extent to which symptomatic and asymptomatic cases contribute to community transmission. Recent asymptomatic testing guidelines are ambiguous. Using an epidemiological model that includes testing capacity, we show that many infections are nonsymptomatic but contribute substantially to community transmission in the aggregate. Their individual transmissibility remains uncertain. If they transmit as well as symptomatic infections, the epidemic may spread at faster rates than current models often assume. If they do not, then each symptomatic case generates, on average, a higher number of secondary infections than typically assumed. Regardless, controlling transmission requires community-wide interventions informed by extensive, well-documented asymptomatic testing.

For exposed compartment E m where m = 1:

Accumulator Variables
Let C Q1 represent the total number of individuals with severe COVID-19 cases who enter the hospital over a single-day period. In the SEPIAR model, this is the number of people moving from compartment I S1 to H in a single day.
We assume that non-severe COVID-19 cases are sampled at the same time in the course of their infection as severe cases, provided that sufficient testing capacity is available. Let C Q3 represent the total number of people who move from compartment I S1 to compartment I S2 over a single day in the SEPIAR model. These people represent symptomatic COVID-19 cases that do not become severe.
The quantities C Q1 and C Q3 generated from the epidemiological model are used as inputs for the testing model.

Fitted Parameters
Unless otherwise mentioned, we fit the recovery rate for non-severe symptomatic infections (γ), the scaling factors for asymptomatic and pre-symptomatic transmission (b a and b p ), the scaling factor for post-intervention transmission (b q ), the proportion of new infected cases that will become symptomatic (p S ), the proportion of symptomatic cases that are severe enough to require hospitalization (p H ), the reproductive number for symptomatic cases (R 0 ), the dispersion parameter for RT-PCR testing (σ M ), and the initial number of infected (I 0 ) and exposed (E 0 ) individuals at the start of the simulation on March 1, 2020.
We constrain the fitting algorithm to explore only positive values for all fitted parameters and only values between 0 and 1 for p S , p H , b a , b p , and b q .

Initial Grid Searches of SEPIR and SEIAR Models
We first fit the SEPIR model, which does not have asymptomatic transmission, and the SEIAR model, which does not have pre-symptomatic transmission (Figure 11). For each model, we generate a grid of 25,000 initial parameter combinations using Latin Hypercube Sampling. For each initial combination, the given model is fit to observed case data for two sets of 50 iterations with the iterated filtering algorithm MIF2 [1], using 50,000 particles. The final likelihood of each parameter combination with respect to observed case data is then estimated using the sequential Monte Carlo algorithm pfilter [2]. We then isolate all parameter combinations within 2 log-likelihood units of the parameter combination with the highest likelihood (the maximum likelihood estimate or MLE), and calculate the likelihood of each combination with respect to the serology data. Finally, we isolate the parameter combination with the highest likelihood with respect to the serology data.
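The two-step selection described above (filter by case-data likelihood, then rank by serology likelihood) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_by_serology`, the dictionary keys, and the toy likelihood values are all hypothetical, and the threshold is assumed to be inclusive.

```python
def select_by_serology(fits, delta=2.0):
    """Pick the fit with the best serology likelihood among those
    supported by the case data.

    fits: list of dicts with hypothetical keys 'params',
          'loglik_cases' and 'loglik_serology'.
    delta: width of the support window in log-likelihood units.
    """
    # Highest log-likelihood with respect to case data (the MLE).
    best_cases = max(f["loglik_cases"] for f in fits)
    # Keep combinations within `delta` units of the MLE
    # (assumption: the boundary itself is included).
    supported = [f for f in fits if best_cases - f["loglik_cases"] <= delta]
    # Among those, return the one with the highest serology log-likelihood.
    return max(supported, key=lambda f: f["loglik_serology"])


# Toy values for illustration only.
fits = [
    {"params": {"p_S": 0.2}, "loglik_cases": -100.0, "loglik_serology": -12.0},
    {"params": {"p_S": 0.4}, "loglik_cases": -101.5, "loglik_serology": -9.0},
    {"params": {"p_S": 0.6}, "loglik_cases": -105.0, "loglik_serology": -3.0},
]
best = select_by_serology(fits)
```

In this toy example the third combination has the best serology likelihood but is excluded because it is more than 2 units below the case-data MLE.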

Monte Carlo Profile of SEPIAR Model
For the analysis of the full SEPIAR model, which includes both pre-symptomatic and asymptomatic transmission, we construct a Monte Carlo Profile [3] of the relative strength of asymptomatic transmission (b a ) (see Figure 2). The values for all of the other fitted parameters were uniformly drawn from the boundaries of all parameter combinations obtained from fitting the SEIAR model that had likelihoods with respect to case data within 20 log-likelihood units of the MLE. This yielded a total of 1200 starting points. Each profile starting point was then fit to case data using the iterated filtering algorithm MIF2 [1] and the sequential Monte Carlo algorithm pfilter [2], with MIF2 constrained to keep b a unchanged. For all parameter combinations that were supported by observed case data (i.e. that had log-likelihoods within 2 units of the MLE), we then calculated the likelihood with respect to serology. All parameter combinations from the full SEPIAR model with serology likelihoods within 2 log-likelihood units of the MLE were used in subsequent analyses of the proportion of cases that are symptomatic (p S ), the reproductive number in symptomatic individuals (R 0 ), and the overall reproductive number for the model (R 0NGM ), which was calculated using the Next Generation Matrix [4].
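As a rough illustration of how profile starting points of this kind can be generated, the sketch below fixes the profiled parameter at each value of a grid and draws every other parameter uniformly from the range spanned by a set of supported parameter combinations. The function name `profile_starts`, the grid of b a values, the number of draws per value, and the toy parameter sets are assumptions for illustration; the source does not state how the 1200 points were divided across values of b a.

```python
import random


def profile_starts(supported, profile_values, n_per_value, rng, fixed="b_a"):
    """Build starting points for a Monte Carlo profile over `fixed`.

    supported: list of parameter dicts (e.g. combinations supported by
               the case data); their min/max define the sampling box.
    profile_values: hypothetical grid of values for the profiled parameter.
    n_per_value: number of random starts per grid value.
    """
    names = [k for k in supported[0] if k != fixed]
    lo = {k: min(p[k] for p in supported) for k in names}
    hi = {k: max(p[k] for p in supported) for k in names}
    starts = []
    for value in profile_values:
        for _ in range(n_per_value):
            # Uniform draw inside the box for every non-profiled parameter.
            start = {k: rng.uniform(lo[k], hi[k]) for k in names}
            # The profiled parameter is fixed, not sampled.
            start[fixed] = value
            starts.append(start)
    return starts


supported = [
    {"gamma": 0.1, "p_S": 0.2, "b_a": 0.5},
    {"gamma": 0.3, "p_S": 0.6, "b_a": 0.9},
]
starts = profile_starts(supported, [0.0, 0.5, 1.0], 4, random.Random(1))
```

Each starting point would then be passed to the fitting routine with the profiled parameter held constant.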

Model Comparison
We compare the likelihoods with respect to the serology data of all the maximum likelihood estimates from the SEPIR, SEIAR, and SEPIAR models via the Likelihood Ratio Test. Recall that when calculating the likelihood with respect to case data, all three models had maximum likelihoods that were not statistically different.

Testing Model
The testing model is implemented with discrete time steps of a day, denoted hereafter by t.

Testing Capacity Data
We use the total number of RT-PCR tests conducted each day across the five counties comprising New York City, as measured by the New York State Health Department's COVID tracker, as the daily testing capacity for the city.
This capacity is fed into the model as a co-variate (L reported (t)).
Let L 0 (t) represent the initial testing capacity on day t. Recall that we assume it took 2 days to conduct an RT-PCR test. Therefore, we advance the testing capacity by 2 days from the observed value, since the testing capacity on day t will correspond to the number of tests conducted on day t + 2.
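One way to read this shift is L 0 (t) = L reported (t + 2). The sketch below applies that reading to a list of daily values; the function name `initial_capacity` and the toy numbers are hypothetical, and the last two days are simply dropped because no value two days ahead is available for them.

```python
def initial_capacity(reported, lag=2):
    """Return L_0 as a list, where L_0[t] = reported[t + lag].

    reported: daily reported test counts, indexed by day.
    lag: assumed number of days between testing and reporting.
    The result is `lag` entries shorter than the input.
    """
    return reported[lag:]


reported = [10, 20, 30, 40, 50]  # toy daily test counts
l0 = initial_capacity(reported)  # l0[0] is the capacity assigned to day 0
```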

Testing Priorities
Given a finite daily testing capacity, we assume that tests are used in order of queue priority (Queues 1 through 4, described below) during any given day until the testing capacity is exhausted. We assume that any unused testing capacity remaining after all four queues have been tested is used to test individuals without COVID-19.
Thus, this unused capacity does not roll over to the next day.
Supporting Figure S4 illustrates these testing priorities.

General framework for incorporating changes in testing capacity
Below we provide the detailed steps of the general testing framework, which can be applied to other locations where testing capacity is changing over time. A diagram of this framework is shown in Supporting Figure S3. We illustrate the steps by focusing on the testing of hospitalized severe COVID-19 cases.
In subsection 3.4, we describe several modifications to this framework that we make to take into account additional queues and modifications that are specific to New York City in the early stage of the epidemic.
We use the variable Q 1 to represent all COVID-19 hospitalized cases that have been sampled but have not been tested yet. We first add the hospitalized COVID-19 cases that have just been sampled on day t (C Q1 (t)) to Q 1 .
Let the variable T Q1 represent all of the people in Queue 1 who will be tested.
If L 0 (t) is larger than Queue 1, then everyone in Queue 1 can be tested, so T Q1 (t) equals the size of Queue 1. Let L unused (t) represent the testing capacity left over after Queue 1 has been tested, that is, L 0 (t) minus T Q1 (t). There are then no individuals left in Queue 1 who need to be tested.
However, suppose that on a given day there is not enough testing capacity to test everyone in Queue 1 (i.e. at the start of that day, L 0 (t) < Q 1 (t)).
In this case, we assume that all of the testing capacity is used to test Queue 1, so T Q1 (t) equals L 0 (t). We therefore decrease Queue 1 by the number of people who were tested.
In this case, there is no unused testing capacity, so L unused (t) is zero. Let Y Q1 (t) represent the number of people who tested positive on day t, and c the sensitivity of the RT-PCR assay.
We simulate RT-PCR testing as a draw from a binomial distribution with T Q1 (t) trials and success probability c. If there is unused testing capacity after everyone in Queue 1 has been tested (i.e. L unused (t) > 0), then this capacity can be used to test individuals in other queues with a lower priority (see Section 3.4).
If testing of any of the individuals in these later queues results in additional new positive cases, these are added to Y Q1 (t) to obtain the total number of expected new COVID-19 cases during day t, denoted by Y sum (t).

NYC Specific Modifications and Additional Queues
There are several aspects of the testing model that are particular to New York City and the initial wave of the epidemic in the U.S. We summarize those aspects here. With sufficient description of the specific testing implementation, similar modifications could be considered for other locations.
• Re-Sampling of Hospitalized Cases-We assume that hospitalized COVID-19 cases will be re-sampled twice over a 24-hour period as they leave the hospital, to make sure that they have recovered. We take this into account with a second queue, Queue 2, for patients who were re-sampled as they left the hospital and need to be tested again. We do not simulate RT-PCR testing, keep track of results of testing, or deal with lags for Queue 2, since the RT-PCR results of this re-sampling are not relevant for our model. What is relevant is that at least some of the available testing capacity L unused (t) may be used up re-testing hospitalized patients as they leave the hospital before other groups, such as non-severe symptomatic patients, can be tested.
In our epidemiological model, we assume that individuals spend an average of 13 days in the hospital. To be consistent, in our testing model, we assume that individuals who test positive for COVID-19 will be re-sampled once 13 days after they enter Queue 1, and then re-sampled a second time 1 day later. To keep track of the days since each sample entered Queue 1, we modify our implementation of Queue 1 by introducing initial sampling cohorts. There are thirteen cohorts, representing people who entered Queue 1 up to 13 days before time t. Let T Q1 v (t) represent the number of people who were sampled v days before day t who will be tested on day t. We calculate T Q1 v (t) as we loop through each cohort v in Queue 1, Q 1v , starting with the oldest (Q 1 V ) and ending with the most recent (Q 11 ).
For example, if there is sufficient capacity to test everyone in the oldest cohort (i.e. L 0 (t) > Q 1 V (t)), then we follow a procedure similar to equations 15 and 16: everyone in that cohort is tested, and the capacity left over is recorded as L unused (t). We similarly decrease L unused (t) as the capacity is used up from testing each subsequent cohort. For example, suppose that there is enough capacity to test a subsequent cohort v (i.e. L unused (t) > Q 1v (t)). Then the whole cohort is tested and L unused (t) is reduced by the size of that cohort. Alternatively, when we do not have enough testing capacity to test all of the cohort, the remaining capacity is spent on that cohort, the cohort is reduced by the number of people tested, and no unused capacity remains. We loop through all cohorts in Queue 1 until either all the people in the queue have been tested or until the unused testing capacity is exhausted.
When simulating RT-PCR testing, we again loop over all sampling cohorts v from 1 to V. Equation 21 is modified accordingly, so that the binomial draw is made separately for the people tested from each cohort. For the first re-sampling, we need to keep track of those individuals who tested positive when they first entered the hospital. Because different cases from the same cohort may be tested on different days, we need a variable to accumulate those cases that tested positive and belong to the same cohort. Let P Q1 v (t) represent all cases from Queue 1 initially sampled v − 1 days before time t who have tested positive so far.
We accumulate the total number of people in initial sampling cohort v who have tested positive so far by adding the number of people from that cohort who tested positive on day t (Y Q1 v (t)) to P Q1 v (t). For the first re-sampling, the oldest cohort is entered into Q 2 (t). For the second re-sampling, we keep track of the individuals in this oldest cohort and enter them into Q 2 again one day later (on day t + 1).
At the end of each simulation day, all initial sampling cohorts are advanced by 1 day.
• 2-Day Lag in Testing-We incorporate a 2-day lag between when tests are conducted and results are reported to take into account the 48-hour testing time of early RT-PCR tests. We do not update cohorts during this lag, so this effectively adds another 2 days between the initial sampling and the re-sampling beyond the 13 days spent in the hospital. If this framework is applied to other locations and time periods, this modification may not be needed, particularly if rapid diagnosis RT-PCR tests are in use.
• Testing of non-severe symptomatic cases-Queue 3, which records the number of non-severe symptomatic cases that need to be tested, is implemented identically to Queue 1, except that individuals can exit the queue at rate γ as they recover, even if they have not yet been tested.
Sampling cohorts are used in this queue as well.
• Re-testing of non-severe symptomatic cases-Early CDC guidelines [5] recommended a 14-day quarantine for non-hospitalized symptomatic individuals, and that these individuals be re-tested at the end of the quarantine. Queue 4, which records the re-sampling of non-severe symptomatic cases as they exit quarantine, is implemented identically to Queue 2, except that cases are re-sampled 14 days after they enter Queue 3, rather than 13 days, to match the length of the quarantine period.
• Queues are numbered in order of priority-Any unused testing capacity after Queue 1 is empty is applied to Queue 2, and subsequently Queues 3 and 4.
• Severe non-COVID hospitalized cases-Queue NC, which represents severe non-COVID-19 respiratory cases, is implemented in a similar fashion to Queue 1. Let G severe (w, y) represent our estimate of the weekly expected severe non-COVID-19 respiratory cases that would be sampled for testing in the hospital, where w denotes the week and y the year.
This estimate is obtained from syndrome surveillance data as described later in Section 4. The number of daily new cases that enter Queue NC, C QNC , is therefore derived from this weekly estimate. Queue NC has the same priority as Queue 1, since both groups of individuals will present with severe respiratory symptoms at the time of sampling.
Both groups are hence sampled at once. When testing capacity is greater than the total samples in a given sampling cohort in Queue 1 and Queue NC, we simply test all samples from that cohort in both queues. When the testing capacity is limiting, we simulate a draw from a hypergeometric distribution to determine how many of the tested samples come from each queue. For example, in the model, we modify equation 26 accordingly. We assume that the RT-PCR test is 100% specific, so all cases in Queue NC will test negative since they do not have COVID-19. We thus do not simulate RT-PCR testing for Queue NC. The main importance of individuals in Queue NC is that they deplete some of the testing capacity that would otherwise be used for Queue 1.

Measurement Model
Let Y sum represent the total number of new positive tests from all of the queues.
We assume that there is additional negative-binomial distributed dispersion after the RT-PCR testing with standard deviation σ M . Thus, if Y is the observed number of daily cases, we simulate Y from a negative binomial distribution with mean equal to Y sum and a variance determined by σ M .

Syndrome Surveillance Estimates
We estimate the number of non-COVID-19 severe respiratory cases that may have presented each week of the epidemic in NYC hospitals using syndrome surveillance data from NYC hospital emergency departments and observed influenza cases in NYC in previous years. The estimate we seek here represents the typical number of non-COVID-19 respiratory cases one would expect in a given week of the year, given the seasonal pattern of influenza cases and that of other respiratory ailments that present to NYC hospitals. Syndrome surveillance reports were obtained from the Health Department's web portal [6]. These reports include all cases in which the chief complaint mentioned bronchitis, chest cold, chest congestion, chest pain, cough, difficulty breathing, pneumonia, shortness of breath, or upper respiratory infection. For this analysis, we use weeks 2-20 from 2016, 2017 and 2019.

Confirmed Flu Cases
Confirmed influenza cases from 2016-2020 for all counties in New York State were obtained from the New York State public health portal [7]. We used data from the five counties comprising New York City that correspond to the same time period as the syndrome surveillance data. We excluded 2018 from this analysis since 2018 was an anomalous, severe influenza season [8].

Description of Statistical Model
Recall that G severe (w, y) is the number of non-COVID-19 respiratory infections during week w of year y that were severe enough to be sampled for COVID-19 testing. This was the quantity added to Queue NC in Equation 30 of Section 3.4. We assume that these cases are a fixed fraction s of the total non-COVID respiratory syndrome cases presenting in the emergency departments of hospitals in NYC in week w of year y, which we denote as G(w, y), so that G severe (w, y) = s G(w, y). We assume that in the absence of COVID-19, a portion of observed respiratory syndrome surveillance reports are associated with influenza, and that the non-influenza associated reports have a fixed seasonality.
Therefore, we consider that our estimate for the non-COVID-19 weekly respiratory syndrome surveillance cases (G(w, y)) varies linearly with confirmed influenza cases F (w, y) in NYC and presents additional weekly variability whose effect is represented non-parametrically with a polynomial dependency on the week (coefficients b i ), plus an error term with standard deviation σ.

Model Fitting and Simulation
We estimate the intercept g 0 and influenza coefficient g F via a linear regression of respiratory syndrome surveillance reports and confirmed cases in New York City in 2016, 2017 and 2019. We then fit the residuals from this linear regression to a third-order polynomial seasonality function to obtain estimates of coefficients b i . We estimate the error term σ by measuring the residual standard error from the polynomial fit.
When fitting the epidemiological model, we simulate values for G(w, y) using observed weekly influenza cases in 2020 as the co-variate F (w, y) obtained from the same source [7]. A plot of the fitted model is shown in Figure S10.

Estimation of Proportion Sampled for COVID-19 testing
Recall that the scaling parameter s represents the probability that an individual who shows up to the emergency department with respiratory symptoms is severe enough to merit testing for COVID-19. As a proxy for this value, we use the ratio of individuals aged 65 or older who were hospitalized for influenza to the number of individuals aged 65 or older who had a medical visit for influenza during the 2018-2019 influenza season [9].
The value we use for this scaling parameter s for the fitting of all three models is 0.16.
We profile over this parameter value as a sensitivity analysis (see Figure S8).
Allowing this parameter to vary does not result in a higher likelihood with respect to serology data compared to the MLE parameter combination from the main analysis.

Detection of anomalies in 2020 syndrome surveillance
From our model of non-COVID-19 respiratory cases, we can obtain an estimate of the expected number of syndrome surveillance reports G(w, y) in 2020 in the absence of COVID-19. Note that we do not use the scaling parameter s in this calculation, unlike when fitting the epidemiological model.
If we subtract this value from the observed respiratory syndrome surveillance reports in 2020, we obtain a metric for anomalous respiratory syndrome surveillance reports related to COVID-19. This is the pink line in Figure 5 of the main manuscript.

Overall Reproductive Number Derivation
Following [4], we derive R 0NGM as the leading eigenvalue of the next-generation matrix, which is composed of two other matrices, T and Σ −1 , defined below.

Figure 1: Diagram of the grid searches for the SEPIR (A) and SEIAR (B) models.

1. Initial testing of hospitalized patients. Hospitalized patients include patients who have severe cases of COVID-19 (Queue 1) as well as individuals who do not have COVID-19 but are hospitalized with respiratory symptoms (Queue NC). Patients are added to Queue 1 as they enter compartment H in the epidemiological model.
2. Re-testing hospitalized patients when they leave the hospital (Queue 2). Patients hospitalized with COVID-19 are re-tested twice over a 24-hour period as they exit the hospital.
3. Initial testing of non-severe symptomatic cases (Queue 3). Patients are added to Queue 3 as they enter compartment I S2 in the epidemiological model.
4. Re-testing of non-severe symptomatic cases 14 days after they are first sampled (Queue 4).

Figure 3: Diagram of the general testing framework described in Section 3.3. The New York City-specific modifications described in Section 3.4 are not shown here.

Figure 5: Monte Carlo profile of the strength of transmission in asymptomatic cases relative to that of symptomatic cases (b a ). Each point represents the parameter combination from the Monte Carlo profile for b a with the highest log-likelihood (with respect to observed cases) for a given value of b a . All points above the blue line are supported by the case data (i.e. they have likelihoods within 2 log-likelihood units of the profile MLE).

Figure 8: Monte Carlo profile of the probability that an individual (who does not have COVID-19) who shows up to the emergency department with respiratory symptoms is severe enough to merit testing for COVID-19 (s). Each point represents the parameter combination from the Monte Carlo profile for the scaling parameter s with the highest log-likelihood (with respect to observed cases) for a given value of s. All points above the blue line have likelihoods within 2 log-likelihood units of the profile MLE.

Figure 10: Plot of observed respiratory syndrome surveillance reports compared to simulations from the fitted statistical model. The red line corresponds to weekly respiratory infections from syndrome surveillance reports in NYC hospitals in 2016, 2017 and 2019 that were used to fit the statistical model in Section 4. The blue line represents the median estimate for the number of expected syndrome surveillance reports (G(w, y)) for that week and year from 100 simulations from the fitted statistical model. The shaded light blue region represents the 2.5% and 97.5% quantiles from those 100 simulations.

Figure 2: SEPIAR Profile Fitting Procedure. This diagram summarizes how the Monte Carlo profile of b a for the SEPIAR model was fit to case data and subsequently to serology.

Table 2: Comparison of MLE likelihoods from SEPIR, SEIAR, and SEPIAR models with respect to serology data.