Impact of insufficient detection in COVID-19 outbreaks

Abstract: The COVID-19 (novel coronavirus disease 2019) pandemic has tremendously impacted global health and economics. Early detection of COVID-19 infections is important for patient treatment and for controlling the epidemic. However, many countries/regions suffer from a shortage of nucleic acid testing (NAT) due to either resource limitations or epidemic control measures. The exact number of infective cases is mostly unknown in countries/regions with insufficient NAT, which has been a major issue in predicting and controlling the epidemic. In this paper, we propose a mathematical model to quantitatively identify the influences of insufficient detection on the COVID-19 epidemic. We extend the classical SEIR (susceptible-exposed-infectious-recovered) model to include random detections, which are described by Poisson processes. We apply the model to the epidemic in Guam, Texas, the Virgin Islands, and Wyoming in the United States and determine the detection probabilities by fitting model simulations to the reported numbers of infected, recovered, and dead cases. We further study the effects of varying the detection probabilities and show that low detection probabilities significantly affect the epidemic; increasing the detection probability of asymptomatic infections can effectively reduce the scale of the epidemic. This study suggests that early detection is important for the control of the COVID-19 epidemic.


Introduction
Coronavirus disease 2019 (COVID-19), a pandemic disease caused by the novel severe acute respiratory syndrome (SARS)-like coronavirus (SARS-CoV-2), has spread globally since it was first reported in December 2019 in Wuhan, China. As of May 25, 2021, there were more than 166 million confirmed cases including more than 3 million deaths [1]. Controlling the rapid spread of COVID-19 has been an urgent global public health issue. Many countries have implemented different types of nonpharmaceutical interventions (NPIs) to control the COVID-19 epidemic, such as restricting travel, prohibiting gatherings, locking down cities, closing schools, and self-protection [2][3][4]. Furthermore, vaccines have been widely used in many countries. However, the epidemic situation is far from under control, and the second wave of COVID-19 erupted in India in April 2021, resulting in more than 300,000 new cases reported every day. Importantly, the number of reported cases may be much lower than the exact number due to the low coverage rate of nucleic acid testing (NAT). Many countries/regions are suffering from an NAT shortage due to either resource limitations or control measures. Insufficient detection of COVID-19 may seriously affect the clinical intervention of infected patients and the forecasting of epidemic trends. Nevertheless, the impact of insufficient detection of COVID-19 has not been clearly quantified.
Many mathematical models have been proposed to investigate the effects of various measures on controlling the epidemic and to forecast the dynamics of the spread of COVID-19. Most models are formulated as differential equations that originate from the classical compartmental dynamics of SIR (susceptible-infectious-recovered) or SEIR (susceptible-exposed-infectious-recovered) models [5][6][7][8][9]. Many studies have attempted to forecast the COVID-19 epidemic based on an established model formulation and parameters estimated from reported data [10][11][12][13][14][15]. Alternatively, data-driven studies try to forecast epidemic dynamics directly from statistical analyses of reported data [16][17][18][19][20]. Other studies have attempted to improve the estimation of COVID-19 mortality by combining historical and current mortality data, statistical test models, and SIR epidemic models [21,22]. However, some countries do not perform NAT for mildly or moderately symptomatic infections, potentially leading to missing data and serious prediction problems. More importantly, some dead cases are not detected and hence are missing from the reported data. Hence, estimating the real number of infected cases from the reported number of dead cases can be misleading, and modeling and forecasting the spread of COVID-19 remains a challenge [23].
This study was intended to investigate the impact of insufficient detection on the prediction and control of COVID-19 outbreaks. We propose a mathematical model with random (Poisson process) detections and varying detection probabilities for infection. We use epidemic data from Guam, Texas, the Virgin Islands, and Wyoming in the United States as examples to estimate the model parameters and study the potential effects of varying the detection probabilities. Based on the proposed model, we discuss possible long-term scenarios for COVID-19 by analyzing the role of detection probabilities in reducing the final scale and duration of COVID-19 outbreaks.

Model formulation
We extended the classical SEIR model to the situation of insufficient detection of infections and deaths. The model is illustrated in Figure 1. Clinically, COVID-19 infections can be separated into two subpopulations: asymptomatic infections (I_1) and symptomatic infections (I_2). The infection rate of a susceptible person is β (day^-1), the transition rate from the latent compartment (E) to the asymptomatic infections compartment (I_1) is γ_1 (day^-1), and the transition rate from I_1 to the symptomatic infections compartment (I_2) is γ_2 (day^-1). To compare the model simulation with reported recovered and dead cases, we distinguished the compartments of recovered (R) and dead cases, and further separated the death compartment into unreported (D_1) and reported (D_2) dead cases. The unreported dead cases mainly come from undetected infections, with a rate µ_1 (day^-1). The reported dead cases may come from the hospitalized compartment with a rate µ_2 (day^-1) or from the infectious compartment with a rate µ_3 (day^-1). Here, we omitted death from asymptomatic infections. We assumed that asymptomatic (I_1) and symptomatic (I_2) infected patients are detected with daily detection probabilities p_1 and p_2, respectively, and that confirmed cases are moved to the hospitalized compartment (H). Moreover, we assumed perfect isolation: confirmed infectious cases are hospitalized or isolated immediately, so they no longer contribute to infections. We further assumed that patients in the compartments E, I_1, and I_2 recover spontaneously, with rates α_E, α_1, and α_2 (day^-1), respectively, and that hospitalized patients recover with a rate α_H (day^-1). The above model assumptions lead to a system of differential equations (2.1) for the compartments S, E, I_1, I_2, H, R, D_1, and D_2. Here, N = S + E + I_1 + I_2 + H + R + D_1 + D_2 represents the total population, which is assumed to be constant.
We introduced a factor k to represent the infection rate of asymptomatic infections (I_1) relative to that of symptomatic infections (I_2). The terms Ψ_1(I_1, p_1) and Ψ_2(I_2, p_2) are nonhomogeneous Poisson processes with varying arrival rates λ_1(t) = p_1 I_1(t) and λ_2(t) = p_2 I_2(t), respectively, which represent the number of infected patients testing positive per unit time. Thus, our model is implicitly stochastic, since increments of infected individuals are randomly subtracted from I_1 and I_2 and added to the confirmed compartment H. The detection probabilities p_1 and p_2 are explicitly included in the model and depend on the epidemic control policy and NAT resources. The parameters associated with the infection rate (β), detection probabilities (p_1 and p_2), and death rates (µ_2 and µ_3) may vary with time, especially during the early stages of the outbreak of a novel epidemic disease, and hence are treated as piecewise functions of time.
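As a minimal sketch of how a piecewise time-dependent parameter such as p_1(t) could be represented in code, the Python snippet below builds a step function from breakpoints and values. Piecewise-constant behavior, the helper name `piecewise_constant`, and the breakpoint at day 100 are assumptions made for illustration; only the two levels 0.045 and 0.13 are the Guam estimates quoted later in the text.

```python
import bisect


def piecewise_constant(breakpoints, values):
    """Return f(t) that equals values[i] on the i-th interval.

    breakpoints: sorted days at which the value changes.
    values: one more entry than breakpoints (one value per interval).
    """
    if len(values) != len(breakpoints) + 1:
        raise ValueError("need exactly len(breakpoints) + 1 values")

    def f(t):
        # bisect_right counts the breakpoints <= t, which is the interval index.
        return values[bisect.bisect_right(breakpoints, t)]

    return f


# Hypothetical schedule: the change at day 100 is an illustrative assumption.
p1_of_t = piecewise_constant([100], [0.045, 0.13])
```

With this convention, a breakpoint day belongs to the later interval, so `p1_of_t(100)` already returns the later-stage value.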
Model parameters and their ranges are listed in Table 1. The implicitly stochastic model (2.1) can be solved numerically through a modified Euler method. Consider a differential equation whose right-hand side includes a Poisson process with arrival rate λ_i(t). In the numerical scheme of the modified Euler method, the contribution of each such process over a time step is drawn as P(λ), a random number from a Poisson distribution with parameter λ.
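The scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' system (2.1): the right-hand side is our reading of the verbal model description (force of infection β S (k I_1 + I_2) / N, unreported deaths µ_1 I_2, reported deaths µ_2 H + µ_3 I_2), the Poisson draw uses mean p_i I_i Δt, and the cap that keeps compartments non-negative is our own choice. The function name `simulate` and all numerical values except p_1 = 0.13 and p_2 = 0.7 (the defaults quoted later in the text) are hypothetical.

```python
import numpy as np


def simulate(par, init, days, dt=1.0, seed=0):
    """Euler steps with Poisson-distributed detections (assumed equations)."""
    rng = np.random.default_rng(seed)
    S, E, I1, I2, H, R, D1, D2 = (float(v) for v in init)
    N = S + E + I1 + I2 + H + R + D1 + D2  # constant total population
    steps = int(round(days / dt))
    traj = np.empty((steps + 1, 8))
    traj[0] = (S, E, I1, I2, H, R, D1, D2)
    detected = np.zeros(steps)  # newly confirmed cases per step
    for n in range(steps):
        # Assumed force of infection: asymptomatic cases weighted by k.
        new_inf = par["beta"] * S * (par["k"] * I1 + I2) / N * dt
        out1 = (par["gamma2"] + par["alpha1"]) * I1 * dt
        out2 = (par["alpha2"] + par["mu1"] + par["mu3"]) * I2 * dt
        # Poisson detections with mean p_i * I_i * dt, capped (our choice)
        # so that detections never exceed the remaining infected individuals.
        det1 = min(rng.poisson(par["p1"] * max(I1, 0.0) * dt), max(I1 - out1, 0.0))
        det2 = min(rng.poisson(par["p2"] * max(I2, 0.0) * dt), max(I2 - out2, 0.0))
        dE = new_inf - (par["gamma1"] + par["alphaE"]) * E * dt
        dI1 = par["gamma1"] * E * dt - out1 - det1
        dI2 = par["gamma2"] * I1 * dt - out2 - det2
        dH = det1 + det2 - (par["alphaH"] + par["mu2"]) * H * dt
        dR = (par["alphaE"] * E + par["alpha1"] * I1
              + par["alpha2"] * I2 + par["alphaH"] * H) * dt
        dD1 = par["mu1"] * I2 * dt
        dD2 = (par["mu2"] * H + par["mu3"] * I2) * dt
        S, E, I1, I2 = S - new_inf, E + dE, I1 + dI1, I2 + dI2
        H, R, D1, D2 = H + dH, R + dR, D1 + dD1, D2 + dD2
        traj[n + 1] = (S, E, I1, I2, H, R, D1, D2)
        detected[n] = det1 + det2
    return traj, detected


# Illustrative values only: p1 and p2 are the defaults quoted in the text,
# every other number is hypothetical and not taken from Table 3.
par = dict(beta=0.3, k=0.5, gamma1=0.2, gamma2=0.1, alphaE=0.05, alpha1=0.03,
           alpha2=0.02, alphaH=0.07, mu1=0.001, mu2=0.002, mu3=0.001,
           p1=0.13, p2=0.7)
init = (160000, 50, 30, 10, 5, 0, 0, 0)  # S, E, I1, I2, H, R, D1, D2
traj, detected = simulate(par, init, days=120)
```

Because every outflow from one compartment appears as an inflow to another, the sketch keeps the total population constant, in line with the assumption on N. The step size must be small enough that each per-step outflow fraction stays below one.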

Data collection
Insufficient detection of COVID-19 is a common issue in many countries/regions for various reasons, such as limited testing resources, a large number of asymptomatic or moderately symptomatic infections, or government control policies. Here, we used reported epidemic data from Guam, Texas, the Virgin Islands, and Wyoming in the United States from April 12, 2020 to February 28, 2021, according to the COVID-19 Map from the Johns Hopkins Coronavirus Resource Center [24]. The retrieved data include cumulative numbers of confirmed cases, recovered cases, and dead cases, which are shown in Figure 2.

Parameter estimation
To estimate the model parameters, we randomly sampled values for each parameter over the ranges listed in Table 1 and chose the parameter set that minimizes the mean square error between the simulation results and the time series of reported cumulative numbers of confirmed, recovered, and dead cases. In the parameter estimation, we first compared the data for confirmed cases at different stages to obtain estimates of most epidemic parameters in the model; then, we compared the data for recovered cases and dead cases, based on the results already obtained, to estimate the other parameters. Here, day 0 corresponds to April 12, 2020. In sampling the parameters, we assumed that latent infections have a higher self-recovery rate (α_E) than asymptomatic infections (α_1) due to innate immune responses at the early stage after infection. Moreover, the detection probability of symptomatic infection (p_2) is usually higher than that of asymptomatic infection (p_1).
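The sampling procedure described above can be sketched as a plain random search. The Python snippet below is a generic illustration, assuming uniform sampling within each range; the function name `random_search`, the toy exponential model, and its parameter ranges are hypothetical stand-ins and do not correspond to the epidemic model or to Table 1.

```python
import numpy as np


def random_search(simulate_fn, data, ranges, n_samples=2000, seed=0):
    """Sample parameter sets uniformly and keep the one with the lowest MSE."""
    rng = np.random.default_rng(seed)
    best_params, best_mse = None, np.inf
    for _ in range(n_samples):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        mse = float(np.mean((simulate_fn(candidate) - data) ** 2))
        if mse < best_mse:
            best_params, best_mse = candidate, mse
    return best_params, best_mse


# Toy stand-in for a model simulation: an exponential growth curve.
t = np.arange(30)


def toy_model(params):
    return params["c0"] * np.exp(params["r"] * t)


observed = toy_model({"c0": 10.0, "r": 0.1})  # synthetic "reported" series
fit, err = random_search(toy_model, observed, {"c0": (1.0, 50.0), "r": (0.0, 0.3)})
```

Ordering constraints such as α_E > α_1 or p_2 > p_1 could be imposed by rejecting candidates that violate them before the simulation is run.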
For comparisons with the reported data, we also need to estimate the initial values. The initial values of the variables H, R, and D_2 were obtained from the reported data (on April 12, 2020). The initial value of susceptible persons (S) was retrieved from open sources [36]. The initial values of E, I_1, I_2, and D_1 were estimated by minimizing the mean square error. Estimated initial values are shown in Table 2, and parameter values are shown in Table 3 and Figures 3 and 4. Here, we note that the infection rate β, the detection probabilities p_1 and p_2, and the death rates µ_2 and µ_3 are piecewise functions, since they may change with distancing policies and clinical conditions. Based on the parameter values in Table 3, the estimated initial values in Table 2, and model (2.1), we can simulate the quantities reported in Figure 2. Comparisons between simulations and epidemic data are shown in Figure 5.
According to the estimated parameters in Table 3, the infection rate β varies markedly between stages, which may reflect the distancing policies of the local government and people's attitudes towards the disease. The death rates µ_2 and µ_3 are lower in the later stage than in the early stage, which may reflect changes in clinical strategies in the later stage.
The estimated detection probabilities for the four states are shown in Figure 4 as piecewise functions and suggest possible changes in NAT. According to our estimation, the detection probability for symptomatic infections (p_2) is higher in the later stage than in the early stage in all states, with a maximum detection probability larger than 0.8 in Guam, Texas, and Wyoming, and larger than 0.5 in the Virgin Islands. The detection probability for asymptomatic infections (p_1) also increases in Guam and Texas, but is much lower than that for symptomatic infections. Moreover, the detection probability for asymptomatic infections in the Virgin Islands and Wyoming is extremely low.
Table footnote (a): These parameters are defined by the piecewise functions given in Figure 3.
Figure 4. Estimated piecewise functions of the detection probabilities p_1 and p_2 in the four states: Guam, Texas, the Virgin Islands, and Wyoming.

Increasing the detection probability for asymptomatic infections can reduce the epidemic scale
The above simulation shows that the proposed model is capable of reproducing epidemic dynamics. To further quantify the influence of the detection probability p_1 on the COVID-19 epidemic, we took the parameters for Guam as an example in the following study. Based on our parameter estimation, the probability p_1 in Guam increased from 0.045 at the early stage to 0.13 in the later stage. Here, we took p_1 = 0.13 as the default value. First, we varied the detection probability p_1 for asymptomatic infections (p_1 = 0.07, 0.1, 0.13, 0.16, 0.19) while keeping the other parameters fixed. In these simulations, p_1 was held constant over time.
We note that H(t) in the model equation represents the number of hospitalized patients, which varies with time due to newly confirmed cases, recovered patients, and dead patients. Here, to quantify the epidemic dynamics, we are interested in the daily confirmed cases, defined as
H_new(t) = Ψ_1(I_1(t), p_1) + Ψ_2(I_2(t), p_2). (3.1)
Moreover, we also examined the peak value of daily confirmed cases,
max_t {H_new(t)}, (3.2)
and the cumulative new confirmed cases (3.3). Similarly, we also consider the daily new infected cases I_new(t) (3.4), the peak value
max_t {I_new(t)}, (3.5)
and the corresponding cumulative new infected cases I_C(t) (3.6).
Similar to the classical SIR or SEIR models, the daily confirmed case number increases to reach a peak value and then decreases to 0 as time t approaches infinity. The cumulative confirmed case number saturates at a final value when t is large enough. The daily new confirmed case number increases markedly if the detection probability p_1 is decreased (Figure 6a). If p_1 is reduced by half (p_1 = 0.07), the peak value of daily confirmed cases can be as high as 5000, and the number decreases to 98 if the detection probability increases to p_1 = 0.16, approximately 1% of that with p_1 = 0.07. We further examined the peak values of both daily new confirmed cases and daily new infected cases; both numbers decrease exponentially with the detection probability p_1, while the daily infection number is more sensitive to changes in the detection probability (Figure 6b).
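Given a simulated series of daily new confirmed cases, the summary quantities discussed above (peak value and cumulative count) can be computed as in the Python sketch below. The function name `summarize_daily_series` and the short example series are hypothetical, and taking the cumulative count as a running sum of the daily values is our assumption about how the cumulative quantity is defined.

```python
import numpy as np


def summarize_daily_series(daily):
    """Return the peak value, the day of the peak, and the running total."""
    daily = np.asarray(daily, dtype=float)
    peak_value = float(daily.max())
    peak_day = int(daily.argmax())  # first day on which the peak is reached
    cumulative = np.cumsum(daily)   # assumed definition of the cumulative count
    return peak_value, peak_day, cumulative


# Made-up daily counts, for illustration only.
h_new = [2, 5, 9, 14, 11, 6, 3, 1]
peak, day, cum = summarize_daily_series(h_new)
```

The last entry of `cum` is the final epidemic scale of the series in the sense used in the text, once the daily counts have decayed to zero.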
We further examined the dependence of cumulative confirmed cases on the detection probability p_1 (Figure 6c). The cumulative confirmed case number increases markedly as p_1 is reduced. Model simulations predict a final epidemic scale of 4 × 10^4 cases when p_1 = 0.07, and the number decreases to 2.7% of that value (1160 cases) when p_1 increases to 0.16. These results suggest that increasing the detection probability p_1 can effectively reduce the epidemic scale.
To quantify the epidemic dynamics with various detection probabilities, we defined a relative increase index of daily new confirmed cases (∆‡H) as the ratio of changes in daily new confirmed cases to daily new confirmed cases, which is formulated as
∆‡H(t) = ∆H_new(t) / H_new(t),
where ∆ is a difference operator defined as ∆f(t) = f(t) − f(t − 1) for any function f(t). Based on our model simulations, the time evolution of ∆‡H is shown in Figure 6d. Despite the clear dependence of epidemic dynamics on the detection probability p_1, the relative increase index ∆‡H is insensitive to p_1; however, the underlying mechanism is not yet known. Hence, the proposed relative increase index may be used to quantify epidemic dynamics in a way that is independent of the detection probability for asymptomatic infections.
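The relative increase index can be computed from a daily series as in the Python sketch below. The function name `relative_increase_index` is hypothetical, and two choices are our assumptions: the denominator is taken at day t (rather than t − 1), and days with zero new confirmed cases yield NaN instead of a division error.

```python
import numpy as np


def relative_increase_index(h_new):
    """Return (H_new(t) - H_new(t-1)) / H_new(t) for t = 1, 2, ..."""
    h = np.asarray(h_new, dtype=float)
    diff = h[1:] - h[:-1]  # difference operator applied to H_new
    with np.errstate(divide="ignore", invalid="ignore"):
        # Undefined where the denominator is zero, so report NaN there.
        return np.where(h[1:] > 0, diff / h[1:], np.nan)


doubling = [10, 20, 40, 80]  # made-up series that doubles every day
index = relative_increase_index(doubling)
scaled_index = relative_increase_index([3 * x for x in doubling])
```

By construction, the index is unchanged when the whole series is multiplied by a constant, so `index` and `scaled_index` coincide. The sketch only illustrates the definition and does not address why the simulated index is insensitive to p_1.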

Increasing the detection probability for symptomatic infections reduces the epidemic scale
We further examined the influence of changing the detection probability for symptomatic infections (p_2). Based on our parameter estimation, the detection probability p_2 in Guam increased from 0.3 at the early stage to 0.882 in the later stage. Here, we took p_2 = 0.7 as the default value. We varied the detection probability p_2 (p_2 = 0.5, 0.6, 0.7, 0.8, 0.9) and fixed the other parameters at their default values. Simulation results are shown in Figure 7. Similar to the results with changes in p_1, the daily new confirmed case number increases as p_2 decreases (Figure 7a), and the peak value of daily new confirmed cases and the peak value of new infections both decrease with increasing detection probability (Figure 7b). Increasing the probability p_2 can decrease the number of cumulative confirmed cases (Figure 7c). We further examined the relative increase index ∆‡H and found that the index is independent of the detection probability p_2 (Figure 7d).

Prediction of epidemic scales with varying detection probabilities
To examine the impact of the detection probabilities p_1 and p_2 on epidemic size in the four states of Guam, Texas, the Virgin Islands, and Wyoming in the US, we performed model simulations with varying detection probabilities p_1 ∈ (0, 0.2) and p_2 ∈ (0, 1), keeping the other parameters fixed at their estimated values shown in Table 3 and Figure 3. The simulation results for the cumulative new infected cases I_C(t) in Guam, Texas, the Virgin Islands, and Wyoming are shown in Figure 8. The model simulation shows that cumulative infected cases decrease markedly if either detection probability, p_1 or p_2, is increased. Specifically, in Guam and Texas, the epidemic size decreases markedly when p_1 varies from 0 to 0.1. For the Virgin Islands, the epidemic size decreases markedly when p_2 = 0 and p_1 increases from 0 to 0.2, and a slight increase in p_2 from 0 may significantly reduce the epidemic size. Similar results are obtained for Wyoming; slight increases in the detection probabilities p_1 or p_2 from 0 would markedly reduce the epidemic size. These results suggest the importance of performing NAT and isolating confirmed cases in controlling the COVID-19 epidemic.

Conclusions
Many countries suffer from insufficient detection of COVID-19 infections, which may result in underestimation of the epidemic size and, in turn, hamper appropriate epidemic control measures. Our study proposed a mathematical model to investigate how insufficient detection may affect the dynamics of the spread of COVID-19. The model explicitly considers various detection probabilities for asymptomatic and symptomatic infections. We took the reported data from four states in the US as an example in our study and tuned the model parameters. We found that detection probabilities may vary over time with different strategies of control measures.
Model simulations show that both infected and confirmed cases depend sensitively on the detection probability. Insufficient detection of either asymptomatic or symptomatic infections may worsen the COVID-19 epidemic, including increases in the number of daily new confirmed cases, the peak value of daily new infections, and the cumulative number of confirmed cases.
We further investigated the influence of varying the detection probabilities for both asymptomatic and symptomatic infections on the epidemic scale in our model. Simulations show that increasing the detection probability can significantly reduce the epidemic size. The detection probability for asymptomatic infections is very important for reducing the size of the epidemic. Therefore, early detection and isolation of COVID-19 infections is important for the control of the epidemic. Nevertheless, tests often generate false-positive and false-negative results for asymptomatic infections, and there is a tradeoff between test sensitivity and test frequency when the testing budget is limited. In this case, a multiscale modeling study has recommended that low-sensitivity tests be employed at high frequency [37].