BY 4.0 license Open Access Published by De Gruyter Open Access November 14, 2023

Evaluating early pandemic response through length-of-stay analysis of case logs and epidemiological modeling: A case study of Singapore in early 2020

  • Jaya Sreevalsan-Nair, Anuj Mubayi, Janvi Chhabra, Reddy Rani Vangimalla and Pritesh Rajesh Ghogale

Abstract

It is now known that early government interventions in pandemic management help in slowing down the pandemic in its initial phase, during which a conservative basic reproduction number can be maintained. There have been several ways to evaluate these early response strategies for COVID-19 during its outbreak globally in 2020. As a novelty, we evaluate them through the lens of patient recovery logistics. Here, we use a data-driven approach of recovery analysis in a case study of Singapore during January 22–April 01, 2020, which is effectively the analysis of length-of-stay in the government healthcare facility, National Center for Infectious Diseases. We propose the use of a data-driven method involving periodization, statistical analysis, regression models, and epidemiological models. We demonstrate that the estimates of the reproduction number in Singapore show variation across age groups and periods, indicating the success of the early intervention strategy in the initial transmission stages of the pandemic.

MSC 2010: 62-07; 62H10; 62P10; 91C20; 92D30

1 Introduction

The viral contagion, named COVID-19 (C19), was declared a global pandemic by the World Health Organization (WHO) on March 11, 2020.[1] The pandemic, characterized by atypical pneumonia, is caused by a virus from the coronavirus family, namely, SARS-CoV-2 (severe acute respiratory syndrome coronavirus-2), which is a positive-sense single-stranded RNA virus. A global pandemic in the 21st century saw a rapid exchange of information at a global level, which involved country-wise data collection amongst other mitigation and/or containment responses [29,30,74,78], as a collective effort.

The interest in analyzing the data related to C19 has been multi-faceted, of which the lessons from the country-wise early response are critical for preparedness for future pandemics [68]. These lessons transcend public healthcare studies and also cover the social, political, cultural, and economic aspects of how each country responded to the global pandemic. The varied early response strategies include examples of the collectivist and self-sacrificing tendencies in East and Southeast Asian countries, restrictions limited by fragile healthcare systems in Australasian, East European, Scandinavian, Middle Eastern, African, and South American countries, and slow response by the developed economies in North America and Europe avoiding infringement of personal freedom of citizens. The studies pertaining to the early response have matured since 2020 and shifted focus from just the initial data-driven studies of disease transmission [4,74] to other hidden and sometimes, forgotten, aspects of social policy response [22].

It is now known that the effective early responses involved fast, coordinated government interventions in developed economies, e.g., Taiwan, Australia, and Singapore [28,50,65,74]. These responses combined containment and mitigative strategies [13]. Containment strategies included active contact tracing using digital technologies, policies governing early screening/isolation/quarantine, and widespread mask use [64,65,70,74], and mitigative strategies included vaccinations [13]. The containment strategies were quicker to arrive at and adopt than the mitigative ones, and the type of response was time sensitive in the face of an evolving global pandemic.

The learnings from these studies lean towards building resilient early responses through active comprehensive communication, adapting healthcare capacity, preserving health system functions and resources, and reduction of vulnerabilities [28]. For the implementation of such response systems, there is a dominant global urgency for governance supported by scientific evidence, amongst others [28], including modeling the response [7]. Today, there are global information platforms, such as the COVID-19 health system response monitor that are used by health policy experts to collate information and distill key policy insights related to health systems using governance tools [72].

In the post-COVID era, there is a need to study the effectiveness of such early response systems during the outbreak. Several data-driven studies have emerged, of which we are interested in those on nonpharmaceutical interventions (NPIs). The NPIs include government interventions, which are complex to study as they involve several sectors in each country, e.g., the government, healthcare, news media, transport, and education. In a ranking of these NPIs in the wake of the outbreak [29], the following were ranked highly: (a) small gathering cancellations, (b) closure of educational institutions, (c) border restrictions, (d) increase in healthcare and public health capacities, (e) individual movement restrictions, and (f) national lockdown. Such a ranking has been arrived at through an extensive analysis of the time-varying reproduction number R t obtained from a coded dataset of NPIs. At the same time, the government agencies/departments of developed economies also actively maintained and provided public-facing data in a timely fashion through notices and social media posts, which have been found to be insightful for the public [59]. A few such agencies include the Ministry of Health (MOH) in Singapore, the Centers for Disease Control and Prevention (CDC) in the United States, and Public Health England (PHE). The data that are available in the public domain, i.e., the open data, can be harnessed for several retrospective studies, e.g., studies done in African countries [34], Singapore [67], and so on. We observe that several studies are based on the incidence of the disease [34,67,69].

Here, we shift the focus of data analysis to be viewed through the lens of recovery. We demonstrate a case study using early response data in the form of demographic information in case logs through in-depth data analysis. We observe that particularly for Singapore, there have been critical assessments of its early response and communications [1,16,32], which emphasizes the 3P strategy – planning, preparedness, and protective equipment [15]. Interestingly, despite the active early response, the disease rapidly spread in April 2020 owing to the dense population in workers’ dormitories. The government acted quickly to put in “circuit-breaker” strategies to control the spread and the fatality rate [1,16,32]. Thus, the response before April 2020 to slow the pandemic down is different from that after the exponential spread. Given the availability of reliable open data and a definite scope of government interventions, we choose C19 in Singapore during January 22–April 01, 2020, for our case study.

2 A case study: Singapore during January 22–April 01, 2020

In this article, we use the case study of the early outbreak of COVID-19 (C19) in Singapore to demonstrate how the data-driven analysis of open data can feed into different epidemiological models. Our goals are to:

  1. use different types of statistical models to estimate length-of-(hospital)-stay (LOS) for different population demographics (i.e., age and gender) in different stages of the epidemic, and

  2. integrate fine-to-coarse scale of population analysis by using individual-level log data (that may be extended to clinical data) to draw cohort-level implications.

In Singapore, COVID-19 was managed at the time of its outbreak in January 2020 using active surveillance for case detection and containment. The disease was managed by admitting all suspected and positive cases to an outbreak screening center, the National Center for Infectious Diseases (NCID) [40]. The patients were determined to be infected through an initial chest radiograph and positive results of respiratory samples tested using the SARS-CoV-2 reverse transcriptase polymerase chain reaction (RT-PCR) test. The patients were determined to be disease free only after all symptoms were resolved, and two nasopharyngeal samples gave negative results on the RT-PCR test for two consecutive days.

Definition 1

(LOS [in-hospital]) We define the length of in-hospital stay, Δ t LOS , as the period from the index admission of a patient to the hospital until the discharge, which is roughly equivalent to the period between the infection and the time of being disease free. Hence, Δ t LOS can be considered equivalent to the recovery period in this case.
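Definition 1 can be illustrated with a minimal sketch. It assumes that the positive confirmation date stands in for the index admission date, as in the case summaries used later in this study; the function name and the example dates are hypothetical and are not taken from the MOH logs.

```python
from datetime import date


def length_of_stay(confirmed: date, discharged: date) -> int:
    """Return the LOS in whole days between confirmation and discharge."""
    if discharged < confirmed:
        raise ValueError("discharge date precedes confirmation date")
    return (discharged - confirmed).days


# Hypothetical case record (illustrative only, not an actual patient).
example_los = length_of_stay(date(2020, 3, 17), date(2020, 4, 2))
print(example_los)
```

Working with calendar dates rather than day counts keeps the leap day of February 2020 from being miscounted.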

2.1 Dataset

Here, we describe the dataset for our case study and why we chose it.

As of April 02, 2020, there were 827,419 confirmed C19 cases and 40,777 deaths across 206 countries,[2] and the ratio of the number of the recovered[3] to the infected patients was r r i ≈ 0.2 . In Singapore, on the other hand, 245 of 1,000 C19-positive cases had clinically recovered by April 1, thus giving r r i = 0.245 . In addition to the recovered, i.e., the hospital-discharged cases, only three deaths owing to C19 complications had occurred in Singapore by April 1. The recovered cases were 79.5% of the closed cases (i.e., recovery or death) worldwide, compared with 98.8% in Singapore, on April 1. Both Singapore and South Korea are Asian countries that aggressively followed the “test and trace” strategy, which has been shown to slow the pandemic down [43]. South Korea crossed its first 1,000 C19 confirmations[4] during January 20–February 26, i.e., within 38 days, compared to 70 days in Singapore, giving r r i = 0.019 with the recovered being only 66.67% of closed cases [64].
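The Singapore figures quoted above follow from simple arithmetic on the stated counts (1,000 infected, 245 recovered, 3 deaths). The sketch below reproduces them; the function names are illustrative and not part of the original study.

```python
def recovered_to_infected(recovered: int, infected: int) -> float:
    """Ratio r_ri of recovered to infected patients."""
    return recovered / infected


def recovered_share_of_closed(recovered: int, deaths: int) -> float:
    """Percentage of closed cases (recovery or death) that recovered."""
    return 100.0 * recovered / (recovered + deaths)


# Singapore counts as of April 1, 2020, as stated in the text.
sg_rri = recovered_to_infected(245, 1000)
sg_closed = recovered_share_of_closed(245, 3)
print(f"r_ri = {sg_rri:.3f}, recovered share of closed cases = {sg_closed:.1f}%")
```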

Singapore fared relatively well in its initial pandemic response owing to its effective public healthcare system, which implemented a quick and efficient pandemic response once its first case was confirmed on January 23, 2020. The government enforced strict quarantine, isolation, hospital surveillance, large-scale contact tracing, and testing. The centralization of the system has facilitated responsive data gathering and anonymized patient-wise reporting to the public. The first cases of local transmission were determined on February 4th, after which there were concerted efforts to protect the age groups, namely, (0–20) and (60+) years, considered vulnerable to the C19 conditions at that time. On the other hand, South Korea did not focus on early containment of local transmission, even though it had implemented other measures as done by Singapore. Thus, the overall progression of C19 in Singapore indicates that its early measures had effectively slowed the local transmission down.

There have been extensive studies on the effect of preventive measures, e.g., restrictions on human mobility, on the C19 transmission dynamics [23]. There has been recent work that has shown how epidemic diffusion is slowed down by larger fractions of the population that deny risks, which is implicitly enforced by the government [20], as was the case with Singapore also. An interesting study has determined that the government measures for C19 management take effect with a time lag of 9 days [44].

We use the open data available through the official press releases or notices of the MOH, the Government of Singapore, for this study. The type of data used here is administrative data, which includes the hospital admission-discharge logs and is used for demographic analysis. The quality of administrative data is known to improve when data collection is done using machines with patient IDs [19]. Thus, we emphasize the credibility of our study owing to the reliability, accessibility, and availability of the data published by the governmental agency [51]. Due to the unique practice of mandatory hospitalization of all C19 patients that was followed in Singapore during the early period of the pandemic, the data on C19 recovery have been systematically gathered, and the recovery period is the same as LOS in hospitals. In our previous works [6163], we determined the statistical model for LOS estimation based on recovery data of patients recovered until April 01, 2020, in Singapore. Here, we shift our focus to the available recovery data of all patients infected until April 1st.

The data for our work have been collated from the health surveillance reports containing case summaries in the public press releases made by the MOH. We use the age, gender, positive confirmation date, and discharge date from these anonymized summaries. We consider Δ t LOS as both an observed count variable and as a time-series variable, here. In our study, we use statistical modeling for estimating Δ t LOS , and then we use the estimated value in epidemiological models. We purposefully chose the period until April 01, 2020, for identifying the C19 patients for three reasons: (i) with the number of infected being 1,000 at this juncture, the data are available reliably until April 01, 2020, (ii) April 1 marks the end of the period of community transmission in Singapore, after which the country has entered the epidemic/pandemic stage, and (iii) the available data are complete, thus facilitating the use of appropriate statistical modeling. Despite an organized healthcare sector in Singapore, the number of infections rose steeply, following an exponential curve, beyond April 1. Even in the presence of a rigorous internal tracking mechanism, the case IDs are not published regularly in government notices. For the data of 1,000 patients infected on or before April 1, the demographic analysis is shown in Figure 1 and Table 1.

Figure 1

Visualization of available data of C19 patients in Singapore during Jan 23–Apr 01, 2020. (a) Cumulative stacked bar charts as population pyramids give the age–gender distribution, and (b) visual representation of period-wise available data, showing (i) all patients infected as well as recovered, (ii) patients confirmed C19 positive, and (iii) patients discharged upon clinical recovery. Periodization used in (a) uses a positive confirmation date, whereas in (b), the discharge dates are shown in (i) and (iii). The available data have 1,000 C19 patients (576M, 424F), of whom 671 (395M, 276F) have discharge dates. The last discharge date in the available data is Apr 20, 2020.

Table 1

Percentage values of the age–gender structure of the C19 patients during January 23–April 01, 2020, in Singapore, from the available data from MOH (Figure 1(a))

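| Gender \ Age group (in years) | (0–9) | (10–19) | (20–29) | (30–39) | (40–49) | (50–59) | (60–69) | (70–79) | (80–89) | (90+) | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Positive C19 confirmation (in % of 1,000 patients (576M, 424F)) | | | | | | | | | | | |
| Male | 0.6 | 1.2 | 15.4 | 12.4 | 9.8 | 9.3 | 6.1 | 2.2 | 0.6 | 0.0 | 57.6 |
| Female | 0.7 | 1.7 | 11.9 | 7.8 | 5.7 | 7.1 | 4.9 | 1.6 | 0.9 | 0.1 | 42.4 |
| Total | 1.3 | 2.9 | 27.3 | 20.2 | 15.5 | 16.4 | 11 | 3.8 | 1.5 | 0.1 | 100 |
| Clinically recovered (in % of 671 patients (395M, 276F)) | | | | | | | | | | | |
| Male | 0.6 | 1 | 14 | 12.4 | 11.33 | 11.03 | 6.4 | 1.9 | 0.3 | 0 | 58.9 |
| Female | 0.6 | 1 | 8.95 | 9.1 | 5.81 | 7.3 | 6.1 | 1.8 | 0.45 | 0 | 41.1 |
| Total | 1.2 | 2 | 22.95 | 21.5 | 17.14 | 18.33 | 12.5 | 3.7 | 0.75 | 0 | 100 |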

Bold values represent high percentage values.

Age-gender distribution. Given that school children and senior citizens are considered as vulnerable groups, we consider the age brackets (0–19), (20–59), and (60–89). They are referred to as A 1 , A 2 , and A 3 , respectively. However, most studies use a finer scale of age brackets of 10 years [54]. Here, we use the finer scale of age brackets for preliminary data analysis and the coarser one for regression modeling.

The preliminary analysis shows that the age–gender distribution of the confirmed (i.e., the infected), and the recovered are different, as shown in the population pyramid (Figure 1(a)(i)). The gender ratios (male to female) of the infected and the recovered are 1.36 and 1.43, respectively (Table 1). The highest count of confirmed cases is in the age group (20–29) years (Figure 1(a)(ii)), but that of recovered cases is in the age group (30–39) years (Figure 1(a)(iii)). This demonstrates that the affected population is in the overall age group of (20–39) years. These counts also indicate that the restricted mobility of the vulnerable group of the population, e.g., the older male [77], has led to shifts in the distribution of contagion in the population, especially with respect to age.

Periodization. The first imported and local transmission cases were confirmed on January 23 and February 4, respectively. A sudden spike in the daily confirmations occurred on March 17. We periodize using these events for piecewise analysis of the pandemic progression (Figure 1(b)):

  1. P 1 : January 23–February 03, 2020,

  2. P 2 : February 04–March 16, 2020,

  3. P 3 : March 17, 2020–April 01, 2020, and

  4. P 4 : April 02, 2020, and thereafter.

The period-wise counts of positive confirmed cases, N i , discharged/recovered cases, N r , and deceased, N d , are presented in Table 2. The rationale behind our periodization is based on the transmission stages of the pandemic, as explained in Appendix A.
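The periodization above amounts to a lookup of a date against the four period start dates. A minimal sketch follows; the names `PERIOD_STARTS` and `assign_period` are hypothetical and are not taken from the original analysis.

```python
from datetime import date

# Period start dates as listed above; P4 is open-ended.
PERIOD_STARTS = [
    ("P1", date(2020, 1, 23)),
    ("P2", date(2020, 2, 4)),
    ("P3", date(2020, 3, 17)),
    ("P4", date(2020, 4, 2)),
]


def assign_period(day: date) -> str:
    """Return the label of the latest period that starts on or before `day`."""
    if day < PERIOD_STARTS[0][1]:
        raise ValueError("date precedes the first confirmed case")
    label = PERIOD_STARTS[0][0]
    for name, start in PERIOD_STARTS:
        if day >= start:
            label = name
    return label


print(assign_period(date(2020, 3, 16)))  # last day of P2
```

The same function can be applied to either the confirmation date or the discharge date of a patient, depending on which grouping is wanted.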

Table 2

Counts of C19 positive confirmed cases, N i , discharged/recovered cases, N r , and deceased, N d in Singapore based on transmission stages, used in our case study

| Period | Description of transmission stage | Time-period | N i | N r | N d | Cumulative N i | Cumulative N r |
|---|---|---|---|---|---|---|---|
| P 1 | Imported cases | January 23–February 03, 2020 | 22 | 0 | 0 | 22 | 0 |
| P 2 | Local transmission | February 04–March 16, 2020 | 225 | 109 | 0 | 247 | 109 |
| P 3 | Cluster of cases | March 17–April 01, 2020 | 753 | 136 | 3 | 1,000 | 245 |
| P 4 | Community transmission | April 02, 2020– | 0* | 426† | 0* | 1,000 | 671 |

* Not the total count, as the total count is not of interest to our study.

† Not the total count, as the count of our interest is the number of C19 patients infected in P 3 who recovered in P 4 .

Bold values represent high percentage values.

The period-wise age–gender distribution shows that P 1 is different from both P 2 and P 3 (Figure 1(a)). The ratio of recovered to infected, r r i , i.e., N r / N i , increased from 0.00 to 0.45, and then dipped to 0.245, through P 1 to P 3 . However, the trend in r r i does not explicitly indicate the influence of shifts in the age–gender distribution (Figure 1(a)(ii)) of N i and N r on tr . Hence, we analyze the time-varying Δ t LOS using descriptive statistics and regression. We limit our study of P 4 to only the available data of recovery length of the patients confirmed during P 2 and P 3 but recovered during P 4 . This is because P 4 has stretched from April 2, 2020, until now, as the world has witnessed. There have been several changes during P 4 itself which cannot be studied using the analysis based on transmission stages alone.

2.2 Statistical estimates of LOS

Our statistical modeling and analysis of the recovery period data supplement other approaches of estimation based on the count of the infected [4] and clinical/laboratory/treatment analysis [79], and to a lesser extent, epidemiological transmission models [57]. Our work is different from the LOS estimation for patients requiring special care during C19 [5], as the dataset of our interest is of all C19 patients during the first three transmission stages of the pandemic in Singapore.

Descriptive statistical analysis of Δ t LOS in groups based on age, gender, and period provides a preliminary understanding of the influence of these variables on Δ t LOS . We further use an appropriate regression model to estimate Δ t LOS . Here, we consider Δ t LOS as an observed count variable and fit the frequency distribution of Δ t LOS into a multivariate linear regression model. Here, we use semi-parametric generalized linear models (GLM) for regression, with age, gender, and period as the predictor variables, and Δ t LOS as the dependent variable.

For the period-wise analysis of Δ t LOS , grouping is needed for determining significant sub-populations. One way to group the discharged patients is based on their positive confirmation date, referred to as G +cfrmDt . An alternative grouping is based on their discharge date, referred to as G -cfrmDt . Since the protocols followed in the hospital are similar for patients with closer admission dates, the G +cfrmDt grouping shows more cohesive descriptive statistics of Δ t LOS than the G -cfrmDt one. Thus, we rationalize the use of G +cfrmDt grouping here for period-based analysis. We observe in G +cfrmDt grouping that P 1 has larger variation compared to P 2 and P 3 (Figure 2(b)). By virtue of sparser data in P 1 , we observe that both P 2 and P 3 jointly influence the overall trend (Figure 2(a)). We do not observe such insights from G -cfrmDt grouping (Figure 2(c)). Thus, the period-wise age–gender-based analysis using box-and-whisker plots corroborates our rationale behind the choice of G +cfrmDt grouping over G -cfrmDt for period-based analysis.

Figure 2

Gender- and age-wise box-and-whisker plots of LOS of 671 discharged patients grouped in periods during (i) January 23–February 03 [ P 1 ] (ii) February 04–March 16 [ P 2 ], and (iii) March 17–April 01 [ P 3 ]. Periods: (a) during January 23–April 01 [entire period], (b) period-wise grouping of patients based on C19 positive confirmation date ( G +cfrmDt ), and (c) period-wise grouping of patients based on clinical recovery/hospital discharge dates ( G -cfrmDt ). The number of patients considered is (a) 671, ((b),(i)) 22, ((b),(ii)) 208, ((b),(iii)) 441, ((c),(i)) 109, ((c),(ii)) 136, and ((c),(iii)) 426.

Given the age-based mobility measures, we use coarse age binning for performing hypothesis testing and regression modeling. Thus, we use the binning of { A 1 , A 2 , A 3 } , { male , female } , and { P 1 , P 2 , P 3 } for age, gender, and period, respectively, for hypothesis testing prior to determining appropriate GLMs. We use the Kruskal–Wallis H (KWH) test of Δ t LOS with respect to age bins, and with respect to the period, separately. The KWH test is a rank-based nonparametric test to determine if two or more groups of an independent ordinal/continuous variable (i.e., age, and period) differ significantly in the continuous dependent variable (i.e., Δ t LOS ). We use the Mann–Whitney–Wilcoxon (MWW) test of Δ t LOS with respect to gender. The MWW test is a nonparametric two-sample test to verify the null hypothesis that the distribution of a continuous variable (i.e., Δ t LOS ), which is not normal, is the same in two independent groups formed based on a nominal/categorical independent variable (i.e., gender).
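To make the KWH test concrete, the following is a self-contained sketch of the H statistic with the standard tie correction. It is not the implementation used in this study, which relies on kruskal.test in R; the function names and the sample data are hypothetical. The p-value uses the chi-square approximation, written in closed form only for 1 or 2 degrees of freedom, which are the cases that arise with two or three groups.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df, p); assumes at least two distinct pooled values."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h = max(h / (1 - ties / (n ** 3 - n)), 0.0)
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2))  # chi-square survival function, 1 df
    elif df == 2:
        p = math.exp(-h / 2)  # chi-square survival function, 2 df
    else:
        p = float("nan")  # closed form not provided in this sketch
    return h, df, p


# Hypothetical LOS samples (in days) for three groups.
h_stat, dof, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, dof, p_value)
```

With three groups the p-value reduces to exp(−H/2), so an H statistic of about 6 corresponds to the 5% significance threshold.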

We now determine the appropriate GLM by experimenting with Poisson (PRM) and negative binomial (NBM) distributions for Δ t LOS . These distributions are commonly used for LOS, owing to its naturally skewed distribution [11], and for over-dispersed data [76], such as LOS. In general, PRM and NBM are traditionally used for frequency count data [73], which is the property associated with LOS here. PRM assumes spatial independence, whereas NBM is more generalized [73]. For most biological mechanisms whose data are over-dispersed relative to the PRM distribution, NBMs have been found to give a better fit. Here, we fit the model using the frequency distribution of LOS, where the model uses LOS as the dependent variable Y , and age, gender, and period of infection as predictor variables X 1 , X 2 , and X 3 , respectively. They are used as numerical, nominal, and ordinal type variables, respectively. We assume the frequency distribution of Y to be either Poisson, with E ( Y ) = Var ( Y ) = μ , or negative binomial, where the constraint Var ( Y ) = μ is relaxed. Thus, the GLM with intercept β 0 , regression coefficients, β i for i = 1 , 2 , 3 , and error term ε is given as follows:

g ( μ ) = β 0 + β 1 X 1 + β 2 X 2 + β 3 X 3 + ε .
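For orientation, the two candidate models can be written out under assumptions that the text does not state explicitly: a log link g, which is the usual choice for count GLMs, and the quadratic (NB2) variance function with a dispersion parameter θ for the negative binomial case. The symbol θ is introduced here for illustration only, and the error term ε of the equation above is left out because, in this assumed formulation, the randomness is carried by the distribution of Y.

```latex
\begin{align}
  g(\mu) = \log \mu &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3, \\
  \text{PRM:}\quad Y \sim \mathrm{Poisson}(\mu), &\qquad \mathrm{Var}(Y) = \mu, \\
  \text{NBM:}\quad Y \sim \mathrm{NB}(\mu, \theta), &\qquad \mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}.
\end{align}
```

Under these assumptions, the NBM approaches the PRM as θ grows large, which is why the NBM can absorb over-dispersion that the PRM cannot.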

We use the mean and median in the different groups of patients, from the descriptive statistical analysis, to determine if a single linear regression model or a set of local linear models will be a good fit for the data. The latter, called piecewise linear regression modeling, uses several local linear approximations, thus providing more accurate and significant insights than a single global model, following the principle of local linearity [25]. Even for large-scale datasets, learning models have been proposed to determine the piecewise linear regression models [27], whose scalability with the dataset size can be improved [3]. In our work, since the dataset is small, we use its semantic context for arriving at the linear models. The inference of such a model is done using the likelihood approach [66].

2.2.1 Descriptive statistical analysis

Δ t LOS observed in 671 patients infected during the entire period, January 23–April 01, 2020, has the descriptive statistics shown in Figure 3(i)–(iii). The values (#patients, mean of Δ t LOS , the standard deviation of Δ t LOS , the median of Δ t LOS ), denoted as ( N , μ , σ , Mdn ), are given in Table 3.

Figure 3

Box-and-whisker plots of LOS of 671 discharged C19 patients, infected during January 23-April 01, 2020, in Singapore, based on (i) age, (ii) gender, and (iii) period, with G +cfrmDt grouping.

Table 3

Descriptive statistics (mean μ , standard deviation σ , median Mdn ) of the length of in-hospital stay ( Δ t LOS ) of C19 patients in Singapore, infected during the first three transmission stages

| Group | Description | # Patients ( N ) | μ (days) | σ (days) | Mdn (days) |
|---|---|---|---|---|---|
| Overall | | 671 | 16.09 | 7.63 | 16 |
| Age wise, age groups in years (Figure 3(i)) | | | | | |
| A 1 | (0–9) | 8 | 12.75 | 8.31 | 15 |
| | (10–19) | 14 | 14.71 | 4.08 | 15.5 |
| A 2 | (20–29) | 153 | 16.75 | 6.65 | 17 |
| | (30–39) | 144 | 15.10 | 6.92 | 13.5 |
| | (40–49) | 115 | 15.60 | 6.85 | 15 |
| | (50–59) | 123 | 15.71 | 7.19 | 15 |
| A 3 | (60–69) | 84 | 17.74 | 9.70 | 16 |
| | (70–79) | 25 | 18.84 | 13.22 | 17 |
| | (80–89) | 5 | 12.60 | 7.23 | 11 |
| Overall, age weighted | | | | | 15.3 |
| Gender wise (Figure 3(ii)) | | | | | |
| M | Male | 395 | 16.25 | 8 | 16 |
| F | Female | 276 | 15.86 | 7.08 | 15 |
| Overall, gender weighted | | | | | 15.6 |
| Period wise, with G +cfrmDt grouping (Figure 3(iii)) | | | | | |
| P 1 | (Jan 23–Feb 03, 2020) | 19 | 17.32 | 6.25 | 17 |
| P 2 | (Feb 04–Mar 16, 2020) | 211 | 15.82 | 10.04 | 14 |
| P 3 | (Mar 17–Apr 01, 2020) | 441 | 16.16 | 6.23 | 16 |
| Overall, period weighted | | | | | 15.4 |

Bold values represent high percentage values.

We observe in Table 3 that the median of Δ t LOS is comparable to that reported in some of the early analyses of C19 patients, i.e., 15 days [4], while another study had concluded it to be 20 days for recovery [79]. The overall, gender-weighted, age-weighted, and period-weighted medians of Δ t LOS are 16, 15.6, 15.3, and 15.4 days, respectively. Thus, the overall descriptive statistical analysis gives a conservative estimate of Δ t LOS to be 15.5 days.
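The weighted medians quoted above can be reproduced from the patient counts and group medians in Table 3, assuming that "weighted" means a patient-count-weighted average of the group medians. The weighting scheme is not spelled out in the text, so this reading is an assumption; the variable and function names are illustrative.

```python
# (N, median) pairs copied from Table 3.
AGE = [(8, 15), (14, 15.5), (153, 17), (144, 13.5), (115, 15),
       (123, 15), (84, 16), (25, 17), (5, 11)]
GENDER = [(395, 16), (276, 15)]
PERIOD = [(19, 17), (211, 14), (441, 16)]


def count_weighted_median(rows):
    """Average the group medians, weighting each by its patient count."""
    total = sum(n for n, _ in rows)
    return sum(n * mdn for n, mdn in rows) / total


for label, rows in (("age", AGE), ("gender", GENDER), ("period", PERIOD)):
    print(f"{label}-weighted median: {count_weighted_median(rows):.1f} days")
```

Under this assumption, the age-, gender-, and period-weighted values come out at about 15.3, 15.6, and 15.4 days, in line with the figures quoted in the text.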

For the gender-based analysis, we observe that the descriptive statistics for the male and female gender are similar. Given the variations in the descriptive statistics observed in different age groups as well as periods, we hypothesize that there is a strong influence of age as well as period on Δ t LOS , unlike gender.

We perform further analysis through the hypothesis tests and regression models to identify the extent of influence of variables, namely, age and national transmission stage on the recovery period, i.e., Δ t LOS .

2.2.2 Multivariate linear regression models

The regression model can be generated piecewise with respect to a predictor variable in the case of the nonmonotonous behavior of the observed variable. We observe that the median Δ t LOS monotonously decreases from age bins (20–29) to (50–59), and then further monotonously increases with the age (Figure 3 (i)). We observe anomalous behavior in age bins (30–39) and (80–89). Like age, we observe that Δ t LOS shows nonmonotonous behavior in the case of the period (Figure 3 (iii)). Hence, we use the piecewise GLMs for regression with respect to age bins, as well as the periods indicating transmission stages of C19 in Singapore.

We have implemented the hypothesis tests using kruskal.test and wilcox.test in the stats package in R [52]. We use the null hypothesis H 0 , “all groups are statistically similar,” and the alternative hypothesis H 1 , “not all groups are statistically similar.” We reject H 0 when the p -value < 0.05. Table 4 shows that grouping by age bins gives statistically significant results for P 2 with G +cfrmDt grouping as well as for males, thus indicating that age influences Δ t LOS . For period-wise grouping, we observe statistically significant results for the A 1 sub-population as well as for males, thus indicating the influence of transmission stage on Δ t LOS . Gender-wise grouping does not show statistical significance, except for the A 3 sub-population ( p = 0.02). Overall, these results support the use of age binning as well as periodization for piecewise GLMs.
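The link between the H-statistic and the p-value in the Kruskal–Wallis rows of Table 4 can be sketched as follows. The sketch assumes the usual chi-square approximation for H (which is what kruskal.test in R reports) and uses the closed-form chi-square tail probabilities that exist for 1 and 2 degrees of freedom. The function names and the list `TABLE4_KW` are illustrative and not part of the published analysis code; the numbers are transcribed from Table 4.

```python
import math


def chi2_sf(h, df):
    """Upper-tail probability of a chi-square variable (closed forms for df = 1, 2)."""
    if df == 1:
        # P(X > h) = P(|Z| > sqrt(h)) for a standard normal Z
        return math.erfc(math.sqrt(h / 2.0))
    if df == 2:
        # Chi-square with 2 degrees of freedom is exponential with mean 2
        return math.exp(-h / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


# (label, H, df, reported p) transcribed from the Kruskal-Wallis rows of Table 4
TABLE4_KW = [
    ("age bins, All", 2.35, 2, 0.30),
    ("age bins, P1", 0.00, 1, 1.00),
    ("age bins, P2", 6.05, 2, 0.05),
    ("age bins, P3", 2.33, 2, 0.31),
    ("age bins, Male", 8.72, 2, 0.01),
    ("age bins, Female", 0.80, 2, 0.67),
    ("period, All", 6.25, 2, 0.04),
    ("period, A1", 4.61, 1, 0.03),
    ("period, A2", 5.47, 2, 0.06),
    ("period, A3", 0.20, 2, 0.90),
    ("period, Male", 6.15, 2, 0.04),
    ("period, Female", 1.44, 2, 0.48),
]


def recomputed_p_values(rows=TABLE4_KW):
    """Return (label, recomputed p, reported p) for every row."""
    return [(label, chi2_sf(h, df), p) for label, h, df, p in rows]


if __name__ == "__main__":
    for label, p_new, p_rep in recomputed_p_values():
        print(f"{label:18s} recomputed p = {p_new:.3f}   reported p = {p_rep:.2f}")
```

Because the tabulated H values are rounded to two decimals, the recomputed p-values agree with the reported ones only to within about 0.01.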

Table 4

Hypothesis testing results of 671 discharged patients using the Kruskal–Wallis H test of Δ t LOS with respect to age and period, and using the Mann–Whitney–Wilcoxon test of Δ t LOS with respect to gender

Δ t LOS by Age bins using Kruskal–Wallis H test
Parameters All P 1 P 2 P 3 Male Female
N (# samples) 671 19 211 441 395 276
d f (degrees of freedom) 2 1 2 2 2 2
H (H-statistic) 2.35 0.00 6.05 2.33 8.72 0.80
p ( p -value) 0.30 1.00 0.05 0.31 0.01 0.67
Δ t LOS by period [ G +cfrmDt ] using Kruskal–Wallis H test
Parameters All A 1 A 2 A 3 Male Female
N (# samples) 671 22 535 114 395 276
d f (degrees of freedom) 2 1 2 2 2 2
H (H-statistic) 6.25 4.61 5.47 0.20 6.15 1.44
p ( p -value) 0.04 0.03 0.06 0.90 0.04 0.48
Δ t LOS by gender using Mann-Whitney–Wilcoxon test
Parameters All P 1 P 2 P 3 A 1 A 2 A 3
N (# samples) 671 19 211 441 22 535 114
W (W-statistic) 55,757 59.5 5,404 24,171 55.5 33,189 2047
p ( p -value) 0.62 0.25 0.98 0.56 0.76 0.62 0.02
r (effect size statistic) 0.019 0.270 0.001 0.027 0.070 0.021 0.225

After identifying the independent variables, we implemented the regression models using the glm function in the stats package in R [52]. For GLMs with the Poisson and binomial families, the dispersion is fixed at 1.0, and the number of parameters ( k ) is the same as the number of coefficients in the regression model [52]. The NBM has an additional parameter to model over-dispersion in the data. We use the Akaike information criterion (AIC) and its corrected version for small sample sizes (AICc) to determine the goodness of fit of our proposed models. For N samples in the data, we use AIC when N / k > 40, and AICc otherwise [9].
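The selection rule between AIC and AICc can be sketched as below. This is a minimal illustration and not the authors' code: it assumes the standard small-sample correction AICc = AIC + 2k(k + 1)/(N − k − 1), and the function names `aicc` and `criterion_for` are hypothetical.

```python
def aicc(aic, n, k):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def criterion_for(n, k, threshold=40.0):
    """Name the criterion to report: AIC when n / k exceeds the threshold, else AICc."""
    return "AIC" if n / k > threshold else "AICc"


if __name__ == "__main__":
    # Sample sizes of the age-binned groups (Table 5), Poisson model with k = 4
    for group, n in [("all data", 671), ("A1", 22), ("A2", 535), ("A3", 114)]:
        print(group, criterion_for(n, 4))
```

With k = 4, the rule selects AICc for A 1 and A 3 and AIC for the other two groups, which matches the pattern of NA entries in Table 5. The N / k ratios tabulated in Tables 5 and 6 appear to coincide with d f / k (for example, 21/4 = 5.25); this does not change the choice of criterion in any column.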

Regression models with age binning: We consider two modeling scenarios for GLM with age binning. We use three independent variables here, namely, age, gender, and period. The modeling scenarios are Model ℳ 1 , which is a single GLM for all 671 patients; and Model ℳ 2 , a piecewise linear model with three separate GLMs for A 1 , A 2 , and A 3 . While we have used the age bins for grouping in ℳ 2 , age is used in both models as a numerical variable. Our results are given in Table 5. The count distribution and approximation of distribution functions are shown in Figure 4(a) and (b), respectively. Our inferences are as follows:

  1. We find that the variables, i.e., age, gender, and period, do not show consistency in their statistical significance across ℳ 1 or ℳ 2 , for both PRM and NBM.

  2. The NBM shows medians and variances (observable from the range and IQR) of deviance residuals closer to 0, lower AIC/AICc, and a more consistent Δ t LOS at the maximum likelihood value than the PRM. Thus, NBM is a better fit than PRM when using age-binned piecewise GLM.

  3. Figure 4(b) also demonstrates that NBM gives a better fit than PRM.

  4. Overall, the small sample size in ℳ 2 with large variance for age (0–19) makes the modeling for that bin error-prone. For both PRM and NBM, the remaining linear models in the piecewise ℳ 2 are a better fit than the single model, ℳ 1 , owing to the lower error measures (AIC/AICc and deviance residuals). We also find that the errors in ℳ 1 , i.e., the mean squared error and root mean squared error, are reduced when we exclude the samples (age: 0–19) from the observations. The goodness-of-fit criteria demonstrate that NBM is a better fit than PRM in this scenario.

Overall, by choosing NBM and ℳ 2 , we obtain a frequency-weighted estimate of Δ t LOS = 12.96 days, taking for each group the value of Δ t LOS at which the fitted likelihood is maximal. Our additional experiments show that the error values increase in the absence of the period as an independent variable. The key conclusion from the hypothesis testing and multivariate regression analysis is that a GLM using age binning for piecewise modeling, i.e., ℳ 2 , with the NBM distribution is an apt model to estimate Δ t LOS .
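The gap between the Poisson and negative binomial estimates can be illustrated with a simplified, covariate-free sketch. It assumes that both distributions are matched to the overall mean (16.09 days) and standard deviation (7.63 days) of Δ t LOS quoted from Table 3 in Table 8; this moment matching is an assumption made for illustration and is not the GLM fit reported in Table 5. All function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of count k for mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def negbin_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu and size theta (variance mu + mu**2 / theta)."""
    p = theta / (theta + mu)
    log_pmf = (math.lgamma(k + theta) - math.lgamma(k + 1) - math.lgamma(theta)
               + theta * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def mode_and_peak(pmf, support=range(0, 101)):
    """Return the count with the highest probability and that probability."""
    k_best = max(support, key=pmf)
    return k_best, pmf(k_best)


MEAN_LOS, SD_LOS = 16.09, 7.63  # overall mean and SD of the length of stay (days)
# Moment-matched size parameter (assumption): variance = mu + mu**2 / theta
THETA = MEAN_LOS ** 2 / (SD_LOS ** 2 - MEAN_LOS)

poisson_mode, poisson_peak = mode_and_peak(lambda k: poisson_pmf(k, MEAN_LOS))
negbin_mode, negbin_peak = mode_and_peak(lambda k: negbin_pmf(k, MEAN_LOS, THETA))

if __name__ == "__main__":
    print(f"Poisson: mode {poisson_mode} days, peak {100 * poisson_peak:.1f}%")
    print(f"NegBin:  mode {negbin_mode} days, peak {100 * negbin_peak:.1f}%")
```

Under these assumptions, the Poisson mode is 16 days with a peak probability of about 9.9%, and the negative binomial mode is 13 days with a peak probability of about 5.6%. These values are close to the all-data MLE entries in Table 5 (9.8% at 16 days and 5.6% at 13 days), which suggests that the lower NBM estimate reflects the over-dispersion of Δ t LOS .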

Table 5

Results of the generalized linear models with age, gender, and period as independent variables, using Poisson distribution and negative binomial distribution of Δ t LOS , with all data, ℳ 1 , and piecewise linear model for age binning, ℳ 2

Model results ℳ 1 : all data ℳ 2 : A 1 ℳ 2 : A 2 ℳ 2 : A 3
(age:(0–89)) (age:(0–19)) (age:(20–59)) (age:(60–89))
N 671 22 535 114
d f 670 21 534 113
Poisson regression model (PRM)
k 4 4 4 4
N k 167.5 5.25 133.5 28.25
Coefficients:
Intercept (p = 0.000, (p = 0.037, (p = 0.000, (p = 0.000,
CI = [2.58, 2.83]) CI = [0.020,1.76]) CI = [2.57, 2.88]) CI = [3.09, 4.19])
Age (p = 0.002, (p = 0.58, (p = 0.201, (p = 0.72,
CI = [0.0007, 0.003]) CI = [ 0.023 , 0.013]) CI = [ 0.003 , 0.0006]) CI = [ 0.008 , 0.006])
Gender (p = 0.18, (p = 0.32, (p = 0.27, (p = 0.00,
CI = [ 0.06 , 0.012]) CI = [ 0.36 , 0.119]) CI = [ 0.02 , 0.068]) CI = [ 0.363 , 0.185 ])
Period (p = 0.574, (p = 0.00, (p = 0.387, (p = 0.009,
CI = [ 0.025 , 0.046]) CI = [0.36, 1.06]) CI = [ 0.02 , 0.058]) CI = [ 0.189 , 0.026 ])
#F. S. Iter. 4 5 4 5
AIC; AICc 5401.9; NA NA; 149.45 4098.7; NA NA; 1085.13
Dev. Res. ( Mdn = 0.137 , ( Mdn = 0.273 ( Mdn = 0.128 , ( Mdn = 0.344 ,
IQR = [ 1.38 , 1.009]) IQR = [ 0.84 , 0.632]) IQR = [ 1.30 , 1.027]) IQR = [ 1.63 , 0.933])
MSE 0.32 0.35 0.30 0.37
RMSE 0.57 0.59 0.55 0.61
MLE 9.8% at Δ t LOS = 16 7.4% at Δ t LOS = 15 9.9% at Δ t LOS = 15 8.0% at Δ t LOS = 17
NBM
#Parameters ( k ) 5 5 5 5
N k 134 4.2 106.8 22.6
Coefficients:
Intercept (p = 0.000, (p = 0.090, (p = 0.000, (p = 0.000,
CI = [2.45, 2.94]) CI = [ 0.15 , 2.004]) CI = [2.45, 3.005]) CI = [2.42, 4.94])
Age (p = 0.100, (p = 0.861, (p = 0.49, (p = 0.75,
CI = [ 0.0003 , 0.0041]) CI = [ 0.0263 , 0.022]) CI = [ 0.0046 , 0.0022]) CI = [ 0.019 , 0.0145])
Gender (p = 0.52, (p = 0.42, (p = 0.54, (p = 0.006,
CI = [ 0.097 , 0.049]) CI = [ 0.477 , 0.201]) CI = [ 0.054 , 0.103]) CI = [ 0.458 , 0.076 ])
Period (p = 0.72, (p = 0.003, (p = 0.642, (p = 0.290,
CI = [ 0.05 , 0.079]) CI = [0.248, 1.148]) CI = [ 0.054 , 0.088]) CI = [ 0.279 , 0.082])
#F. S. Iter. 1 1 1 1
AIC; AICc 4569.6; NA NA; 149.6 3591; NA NA; 820.8
Dev. Res. ( Mdn = 0.071 , ( Mdn = 0.22 ( Mdn = 0.072 , ( Mdn = 0.15 ,
IQR = [ 0.76 , 0.52 ] ) IQR = [ 0.61 , 0.53 ] ) IQR = [ 0.75 , 0.55 ] ) IQR = [ 0.81 , 0.401 ] )
MSE 0.325 0.342 0.300 0.376
RMSE 0.57 0.58 0.55 0.61
MLE 5.6% at Δ t LOS = 13 6.3% at Δ t LOS = 12 5.9% at Δ t LOS = 13 4.7% at Δ t LOS = 13

CI: 95% confidence interval, Mdn : Median, #F. S. Iter.: Number of Fisher scoring iterations, MSE and RMSE: Error with respect to observed data, NA: Not applicable, N : #Samples, d f : #Degrees of freedom, k : #Parameters, Dev. Res.: Deviance residuals, MLE: Maximum likelihood estimate/value.

Figure 4

Multivariate linear regression model of the recovery-period tr for all 671 C19 discharged cases in Singapore during January 23–April 01, for (i) age (0–89) years [ℳ 1 ], (ii) age bin (0–19) years [ℳ 2 ], (iii) age bin (20–59) years [ℳ 2 ], and (iv) age bin (60–89) years [ℳ 2 ]; uses (a) patient-count distribution and (b) fitting Poisson and negative binomial (NegBin) distributions. The number of patients considered is (i) 671, (ii) 22, (iii) 535, and (iv) 114, using G +cfrmDt grouping.

Regression models with period binning. We now consider the piecewise modeling of GLM using periodization as a criterion. Here, the modeling scenarios are as follows: Model ℳ 1 , which is a single GLM for all 671 patients; and Model ℳ 2 , a piecewise linear model with three separate GLMs for P 1 , P 2 , and P 3 . Since the period is a categorical variable and has a constant value within each piecewise group in ℳ 2 , we now compute both ℳ 1 and ℳ 2 using two independent variables, namely, age and gender. Our results are given in Table 6, and our inferences are as follows:

  1. We find that gender is consistently not statistically significant across ℳ 1 and ℳ 2 , for both PRM and NBM.

  2. Just as in the age-binned models, the NBM shows medians and variances (observable from the range and IQR) of deviance residuals closer to 0, lower AIC/AICc, and a more consistent Δ t LOS at the maximum likelihood value than the PRM. Thus, NBM is a better fit than PRM when using period-binned piecewise GLM.

  3. The piecewise model ℳ 2 is a better fit than the single global model, ℳ 1 , as demonstrated by the goodness-of-fit criteria with the lower error measures, i.e., AIC/AICc and deviance residuals.

Overall, choosing NBM and ℳ 2 , we obtain a frequency-weighted estimate of Δ t LOS = 12.76 days, taking for each period the value of Δ t LOS at which the fitted likelihood is maximal. The key conclusion from the hypothesis testing and multivariate regression analysis is that a GLM using period binning for piecewise modeling, i.e., ℳ 2 , with the NBM distribution is an apt model to estimate Δ t LOS .
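The frequency-weighted estimates quoted for the two piecewise scenarios can be reproduced, up to rounding, from the NBM rows of Tables 5 and 6. The sketch below assumes that "frequency-weighted" means a sample-size-weighted average of the per-group MLE values of Δ t LOS ; the function name `frequency_weighted` is illustrative.

```python
def frequency_weighted(estimates, counts):
    """Average of per-group estimates, weighted by the number of samples per group."""
    if len(estimates) != len(counts):
        raise ValueError("estimates and counts must have the same length")
    total = sum(counts)
    return sum(e * n for e, n in zip(estimates, counts)) / total


# NBM MLE values of the length of stay (days) and group sizes N
AGE_BINNED = frequency_weighted([12, 13, 13], [22, 535, 114])     # A1, A2, A3 (Table 5)
PERIOD_BINNED = frequency_weighted([14, 10, 14], [22, 208, 441])  # P1, P2, P3 (Table 6)

if __name__ == "__main__":
    print(f"age-binned:    {AGE_BINNED:.2f} days")
    print(f"period-binned: {PERIOD_BINNED:.2f} days")
```

The age-binned average evaluates to about 12.97 days, which agrees with the reported 12.96 days up to rounding, and the period-binned average evaluates to 12.76 days.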

Table 6

Results of the generalized linear models with age and gender as independent variables, using Poisson distribution and negative binomial distribution of recovery period Δ t LOS , with all data, ℳ 1 , and piecewise linear model for period binning, ℳ 2 , both with G +cfrmDt grouping

Model results ℳ 1 : All data ℳ 2 : P 1 ℳ 2 : P 2 ℳ 2 : P 3
(Jan 23–Apr 01, 2020) (Jan 23–Feb 03, 2020) (Feb 04–Mar 16, 2020) (Mar 17–Apr 01, 2020)
N 671 22 208 441
d f 670 21 207 440
PRM
k 3 3 3 3
N k 223.33 7.00 69.00 146.67
Coefficients:
Intercept (p = 0.000, (p = 0.000, (p = 0.000, (p = 0.000,
CI = [2.66, 2.81]) CI = [2.48, 3.37]) CI = [2.32, 2.59]) CI = [2.76, 2.94])
Age (p = 0.002, (p = 0.503, (p = 0.000, (p = 0.235,
CI = [0.00, 0.01]) CI = [0.00, 0.01]) CI = [0.01, 0.01] CI = [0.00, 0.00])
Gender (p = 0.176, (p = 0.131, (p = 0.067, (p = 0.360,
CI = [ 0.07 , 0.01]) CI = [ 0.37 , 0.05]) CI = [ 0.14 , 0.00]) CI = [ 0.07 , 0.03])
#F. S. Iter. 4 4 4 4
AIC; AICc 5394.7; NA NA; 159.52 2069; NA 3126.2; NA
Dev. Res. ( Mdn = 0.142 , ( Mdn = 0.047 , ( Mdn = 0.4624 , ( Mdn = 0.063 ,
IQR = [ 1.37 , 0.99 ] ) IQR = [ 1.02 , 1.03 ] ) IQR = [ 1.92 , 1.41 ] ) IQR = [ 1.09 , 0.93 ] )
MSE 0.33 0.18 0.52 0.23
RMSE 0.57 0.43 0.72 0.48
MLE 9.8% at Δ t LOS = 16 9.3% at Δ t LOS = 16 8.9% at Δ t LOS = 15 9.9% at Δ t LOS = 16
Negative binomial regression model (NBM)
k 4 4 4 4
N k 167.50 5.25 51.75 110.00
Coefficients:
Intercept (p = 0.000, (p = 0.000, (p = 0.000, (p = 0.000,
CI = [2.59, 2.88]) CI = [2.23, 3.64]) CI = [2.10, 2.76]) CI = [2.7, 3.00])
Age (p = 0.109, (p = 0.702, (p = 0.02, (p = 0.46,
CI = [0.00, 0.01]) CI = [ 0.01 , 0.01 ]) CI = [0.00, 0.01]) CI = [0.00, 0.00])
Gender (p = 0.51, (p = 0.338, (p = 0.616, (p = 0.565,
CI = [ 0.10 , 0.05 ]) CI = [ 0.47 , 0.16 ]) CI = [ 0.21 , 0.12 ]) CI = [ 0.1 , 0.05 ])
#F. S. Iter. 1 1 1 1
AIC; AICc 4564.7; NA NA; 152.36 1485.1; NA 2882.8; NA
Dev. Res. ( Mdn = 0.077 , ( Mdn = 0.028 , ( Mdn = 0.18 , ( Mdn = 0.04 ,
IQR = [ 0.75 , 0.52 ] ) IQR = [ 0.718 , 0.664 ] ) IQR = [ 0.85 , 0.60 ] ) IQR = [ 0.70 , 0.570 ] )
MSE 0.33 0.18 0.52 0.23
RMSE 0.57 0.43 0.72 0.48
MLE 5.6% at Δ t LOS = 13 6.4% at Δ t LOS = 14 4.8% at Δ t LOS = 10 6.4% at Δ t LOS = 14

CI: 95% confidence interval, Mdn : Median, #F. S. Iter.: Number of Fisher scoring iterations, MSE and RMSE: Error with respect to observed data, NA: Not applicable, N : #Samples, d f : #Degrees of freedom, k : #Parameters, Dev. Res.: Deviance residuals, MLE: Maximum likelihood estimate/value.

2.2.3 Summary of regression models

We observe that the two piecewise linear modeling scenarios, i.e., age binning with three independent variables and period binning with two independent variables, yield comparable estimates. This is because gender does not have statistical significance in either, and age and transmission stage (period) are incorporated in both scenarios. We also observe that the government regulations in Singapore limited the exposure of the vulnerable population, i.e., children and senior citizens, thus slowing down the progression of the transmission stages. Thus, we can conclude that our results reflect the efficient implementation of the pandemic response by the government. In general, the response by the government changed primarily when the disease transmission advanced to the next stage. From the regression analysis, we conclude that Δ t LOS ≈ 13 days by rounding off. In comparison to our previous works [61–63], where Δ t LOS ≈ 9 days, we now have a more complete and credible, albeit higher, estimate based on the overall recovery analysis of C19 patients infected until April 1.

We observe that the regression models give a lower estimate compared to the median value of Δ t LOS ≈ 15 days in the descriptive statistical analysis. A key result of our work is that the estimated in-hospital length of stay is 13 days, which is lower than the 15–20 days estimated in the literature published in June 2020 [4,79]. At this juncture, we can conclude that the overall lower Δ t LOS may be attributed to the government interventions and hospitalization, as an early response strategy.

Our additional analysis of the data using local regression is given in Appendix B, and clustering/classification outcomes using machine learning algorithms are included in Appendix C. Our curated data and statistical analysis code are available at https://github.com/vrrani/COVID19-Singapore.

In the case study of Singapore, Δ t LOS is analogous to the C19 recovery period. Hence, our analysis is not complete without the epidemiological models.

2.3 Epidemiological models

We estimated the value of Δ t LOS using statistical distributions and also determined its mean and variance for different populations using descriptive statistics. These can now be used as key parameters in appropriate epidemiological models [57].

Since the contagion had a time-varying reproduction number ( R t ) with characteristic trends in specific time periods [37], the use of periodization for piecewise analysis has been prevalent [37]. The governing differential equations in the simplest susceptible-infected-recovered (SIR) model, known as the Kermack–McKendrick model [35], are as follows:

(1) $$\frac{dN_s}{dt} = -\beta \frac{N_s}{N_p} N_i, \qquad \frac{dN_i}{dt} = \beta \frac{N_s}{N_p} N_i - \gamma N_i, \qquad \frac{dN_r}{dt} = \gamma N_i,$$

where β is the rate at which an infected individual infects others, γ is the transition rate in the SIR model,[5] N p is the size of the population, N i is the number of infected persons (i.e., with positive COVID-19 confirmation), N r is the number of recovered persons (i.e., clinically recovered), and N s is the number of susceptible people. The basic reproduction number or reproduction rate, ℛ 0 = β / γ , characterizes an infection: ℛ 0 > 1 implies that the infection will continue to spread, and ℛ 0 < 1 implies that the spread is limited and under control. In April 2020, ℛ 0 for COVID-19 was estimated to be 2.2 [39,55]. γ is estimated as the reciprocal of the infectious period [46], i.e., the recovery period Δ t LOS , which implies that γ is in the range of 0.05–0.077 per day, based on estimates of Δ t LOS in the range of 13–20 days [4,79]. In our work, using an approximate estimate of Δ t LOS ≈ 13 days from the piecewise regression models, we estimate γ ≈ 0.077 per day during January–June 2020 in Singapore. Thus, our estimated recovery rate in a simple SIR model is higher than the published results at the time.
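As a minimal sketch of how the length of stay enters the SIR model of equation (1), the snippet below converts Δ t LOS into the recovery rate γ and evaluates the right-hand side of the three equations. The function names are illustrative, and the infection rate in the usage example is an arbitrary placeholder, not an estimate from this study.

```python
def recovery_rate(los_days):
    """Recovery rate gamma as the reciprocal of the length of stay (per day)."""
    return 1.0 / los_days


def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma for the simple SIR model."""
    return beta / gamma


def sir_rhs(n_s, n_i, n_r, beta, gamma):
    """Right-hand side of equation (1): (dN_s/dt, dN_i/dt, dN_r/dt)."""
    n_p = n_s + n_i + n_r            # total population size
    new_infections = beta * n_s * n_i / n_p
    recoveries = gamma * n_i
    return (-new_infections, new_infections - recoveries, recoveries)


if __name__ == "__main__":
    for los in (13, 20):
        print(f"length of stay = {los} days -> gamma = {recovery_rate(los):.3f} per day")
    # beta = 0.3 per day is a placeholder chosen only to exercise the function
    print(sir_rhs(990.0, 10.0, 0.0, beta=0.3, gamma=recovery_rate(13)))
```

The three derivatives sum to zero, which reflects the assumption of a closed population with no deaths.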

Figure 5

State diagrams of our selected epidemiological models. (a) Model ℰℳ 1 : SITR and (b) Model ℰℳ 2 : Staged treatment/linear chain model.

2.3.1 Selected extensions of SIR model

Since the recovery in the case study involved treatment with hospitalization, we extend the SIR model to variants that incorporate treatment and hospitalization. Our study incorporates the government intervention of hospitalizing all infected patients, which necessitates the extended models. Thus, our work differs from a more generic SEIR (susceptible-exposed-infected-recovered) model [67] applied to a similar dataset. We use the following models (Figure 5):

  1. Model ℰℳ 1 : SITR (susceptible-infected-treatment-recovery) model,

  2. Model ℰℳ 2 : linear chain model,

  3. Model ℰℳ 3 : age-structured SIHR (SIR with hospitalization) model.

In all three models described below, N refers to the total population. We define each model here using its state variables and parameters, which are defined in Table 7. The outcome of each of these models is the basic reproduction number ℛ 0 . We effectively compute the time-varying variant ℛ t when we compute ℛ 0 for different periods.

Table 7

State variables and parameters of our selected epidemiological models

Epidemiological models Specific state variables Common state variables
Symbol Definition Symbol Definition
ℰℳ 1 (SITR) T Treatment population S Susceptible population
ℰℳ 2 (Linear chain) T i Treatment population at i th stage I Infected population
ℰℳ 3 (Age-structured SIHR) H Hospitalized population R Recovered population
Parameters
Symbol Definition Value
ρ Percentage reduction of infectivity of T class individuals as compared to I class individuals 0.30/day [10]
ℰℳ 1 (SITR) and σ Per capita rate at which symptomatic individuals are isolated 0.1639/day [37]
ℰℳ 2 (Linear chain) β Infection rate 0.4/day [41]
α 1 Recovery rate 0.1809/day [37,42]
α 2 1/(Length of stay) = 1/Δ t LOS Estimated (Section 2.2)
ℰℳ 2 (Linear chain) m Number of stages in the treatment Estimated (Table 3)
ℰℳ 3 M Age-contact matrix [14]
(Age-structured SIHR) h i Hospitalization for i th age bracket [26]
β Infection rate 0.4/day [41]
θ i 1/(Length of stay) = 1/Δ t LOS Estimated (Section 2.2)

ℰℳ 1 (SITR): This is an SIR model with a treatment stage [56], given by the following equations:

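(2) $$\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S (I + \rho T)}{N}, \\
\frac{dI}{dt} &= \frac{\beta S (I + \rho T)}{N} - (\alpha_1 + \sigma) I, \\
\frac{dT}{dt} &= \sigma I - \alpha_2 T, \\
\frac{dR}{dt} &= \alpha_1 I + \alpha_2 T.
\end{aligned}$$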
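The dependence of ℛ 0 on the length of stay in the SITR model can be sketched with a closed-form expression. The text does not state the expression used for ℛ 0 , so the sketch assumes the standard next-generation result for equation (2): each infected individual spends on average 1/(α 1 + σ) days in class I, moves to class T with probability σ/(α 1 + σ), and then spends 1/α 2 days there with infectivity reduced by the factor ρ. The parameter values are taken from Table 7, and the function name `sitr_r0` is hypothetical.

```python
# Parameter values from Table 7 (per day, except the dimensionless factor rho)
BETA, ALPHA1, SIGMA, RHO = 0.4, 0.1809, 0.1639, 0.30


def sitr_r0(los_days, beta=BETA, alpha1=ALPHA1, sigma=SIGMA, rho=RHO):
    """Assumed next-generation R0 of the SITR model for a given length of stay."""
    alpha2 = 1.0 / los_days                  # discharge rate from treatment
    time_in_i = 1.0 / (alpha1 + sigma)       # mean time spent in class I
    prob_treated = sigma / (alpha1 + sigma)  # probability of moving from I to T
    time_in_t = 1.0 / alpha2                 # mean time spent in class T
    return beta * (time_in_i + rho * prob_treated * time_in_t)


if __name__ == "__main__":
    for los in (10, 13, 14, 16.09):
        print(f"length of stay = {los} days -> R0 = {sitr_r0(los):.2f}")
```

Under this assumption, ℛ 0 grows linearly with the length of stay and evaluates to about 1.73, 1.90, 1.96, and 2.08 for 10, 13, 14, and 16.09 days, respectively. These values are close to, but not identical with, the ℰℳ 1 entries in Table 8 (1.73, 1.87, 1.98, and 2.11), so the expression should be read as an approximation of the published computation.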

ℰℳ 2 linear chain/staged treatment model (SITmR): This is an SIR model with linear treatment stages [36]. The number of treatment stages, m , is estimated from the following relations:

$$\mu(\mathrm{LOS}) = \frac{1}{\alpha_2} \quad \text{and} \quad \sigma(\mathrm{LOS}) = \frac{1}{\alpha_2 \sqrt{m}}.$$

We use the values of μ(LOS) and σ(LOS) from Table 3 to estimate α 2 and m . The dynamics of the model are given by the following equations:

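(3) $$\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S \left( I + \rho \sum_{i=1}^{m} T_i \right)}{N}, \\
\frac{dI}{dt} &= \frac{\beta S \left( I + \rho \sum_{i=1}^{m} T_i \right)}{N} - (\alpha + \sigma) I, \\
\frac{dT_1}{dt} &= \sigma I - (m \alpha_1 T_0 + \delta T_0), \\
\frac{dT_i}{dt} &= \frac{dT_{i-1}}{dt} + (\delta T_i), \\
\frac{dR}{dt} &= \alpha I + \alpha_1 T_m.
\end{aligned}$$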
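The relations μ(LOS) = 1/α 2 and σ(LOS) = 1/(α 2 √m) given for the linear chain model can be inverted to obtain α 2 and m from the descriptive statistics. The sketch below assumes the overall mean (16.09 days) and standard deviation (7.63 days) of the length of stay listed in Table 8; the function name `chain_parameters` is illustrative, and how the non-integer m is rounded to a number of stages is left open because the text does not specify it.

```python
def chain_parameters(mean_los, sd_los):
    """Invert mu = 1/alpha2 and sigma = 1/(alpha2 * sqrt(m)) for alpha2 and m."""
    alpha2 = 1.0 / mean_los           # overall discharge rate (per day)
    m = (mean_los / sd_los) ** 2      # from sigma = mu / sqrt(m)
    return alpha2, m


if __name__ == "__main__":
    alpha2, m = chain_parameters(16.09, 7.63)
    print(f"alpha2 = {alpha2:.4f} per day, m = {m:.2f} stages")
```

For the overall statistics, this gives α 2 ≈ 0.062 per day and m ≈ 4.4, i.e., about four treatment stages.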

ℰℳ 3 age-structured SIHR. The effect of disease on each age group could vary as seen in Section 2.2. Hence, we use the age-structured SIR model with hospitalization stage [54]. The dynamics of the model are given by the following equations:

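(4) $$\begin{aligned}
\frac{dS_i}{dt} &= -\beta \frac{S_i}{N} \sum_{j=1}^{n} \mathcal{M}_{ij} I_j, \\
\frac{dI_i}{dt} &= \beta \frac{S_i}{N} \sum_{j=1}^{n} \mathcal{M}_{ij} I_j - \gamma I_i, \\
\frac{dH_i}{dt} &= \gamma h_i I_i - \theta_i H_i, \\
\frac{dR_i}{dt} &= \gamma (1 - h_i) I_i + \theta_i H_i.
\end{aligned}$$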

2.3.2 Experiments and results

We perform two ensembles of experiments, namely, ( E 11 , E 12 ) and ( E 21 , E 22 ):

  1. Age-based experiments: In E 11 , we use the mean values of Δ t LOS from descriptive statistics (Table 3). In E 12 , we use the MLE values of Δ t LOS from the NBM regression model in the age-based GLM (Table 5). Here, we run ℰℳ 1 , ℰℳ 2 , and ℰℳ 3 . For ℰℳ 3 , we apply the MLE Δ t LOS obtained for the coarser age brackets, A 1 , A 2 , and A 3 , to the finer age groups or age brackets of 10 years, as given in Table 3.

  2. Period-based experiments: Here, we compare ℛ 0 for the entire period (January 22–April 01, 2020) with that for each period, P 1 , P 2 , and P 3 . Here, ℰℳ 3 is not applicable as the population samples in a combined age- and period-wise grouping are not sufficient for statistical analysis. In E 21 , we run ℰℳ 1 and ℰℳ 2 with values of Δ t LOS from descriptive statistics (Table 3). In E 22 , we run the two models but with MLE Δ t LOS from NBM in the period-based GLM (Table 6).

It must be noted that for all experiments with ℰℳ 2 , we estimate the number of treatment stages m using the mean ( μ ( L O S ) ) and standard deviation ( σ ( L O S ) ) from descriptive statistics (Table 3), as they are more appropriate than the values from a fitted regression model. The values of the parameters of the models (Table 7 and equations (2)–(4)) used in our experiments are given in Table 8.

Table 8

The estimated values of Δ t LOS used in the experiments with epidemiological models and the resultant basic reproduction number ℛ 0 .

Model ℰℳ 1 ℰℳ 2 ℰℳ 3
Experiment Δ t LOS ℛ 0 Δ t LOS μ ( L O S ) * σ ( L O S ) * ℛ 0 Δ t LOS ℛ 0
Age-based ensemble
E 11 16.09* 2.11 16.09* 16.09* 7.63* 1.45 [12.75, 14.71, 16.75, 15.10, …]* 1.75
E 12 13# 1.87 13# 16.09* 7.63* 1.51 [12, 12, 13, 13, …]# 1.75
Period-based ensemble
E 21
All data 16.09* 2.11 16.09* 1.66
P 1 17.32* 2.11 17.32* 16.09* 7.63* 1.63
P 2 15.82* 2.11 15.82* 1.66
P 3 16.16* 2.11 17.16* 1.64
E 22
All data 13 1.87 13 16.09* 7.63* 1.66
P 1 14 1.98 14 17.32* 6.25* 1.63
P 2 10 1.73 10 15.82* 7.08* 1.66
P 3 14 1.98 14 16.16* 6.23* 1.64

The values from Tables 3, 5, and 6 are marked with *, #, and , respectively. The resultant ℛ 0 is given in boldface and italics.

We have reported the results of ( E 11 , E 12 ) and ( E 21 , E 22 ) in Table 8 in boldface.

We observe the following from Table 8:

  1. In the age-based ensemble ( E 11 , E 12 ), ℰℳ 1 is sensitive to the changes in the estimates of Δ t LOS , which is reflected in the change in ℛ 0 . This is unlike ℰℳ 2 and ℰℳ 3 .

  2. In the period-based ensemble ( E 21 , E 22 ), ℰℳ 1 is sensitive to the differences in parameters across periods in E 22 , but not in E 21 . Also, ℰℳ 2 is sensitive to the differences in parameters across periods but insensitive to the values of μ and σ . This may be attributed to the fact that the m value is approximately the same across both experiments.

  3. ℰℳ 2 gives a lower estimate of ℛ 0 , and ℰℳ 3 gives a relatively higher value.

  4. For ℰℳ 1 , the ℛ 0 estimate is lower when the values of Δ t LOS are estimated using GLMs/regression instead of the statistical descriptors.

  5. All estimates of ℛ 0 in our experiments are conservatively low but are close to the forecasted result of the outbreak seen in Wuhan, China in February 2020 [4].

2.3.3 Summary of epidemiological models

Overall, we conclude that the flexibility in choosing the Δ t LOS estimate from statistical descriptors or regression models provides variations in computing the basic reproduction number. Hence, the data-driven analysis and modeling must be done with appropriate experimentation. Thus, we have demonstrated how case log data of C19 patients can feed into population/cohort-based epidemiological modeling by carefully integrating the different scales. Instead of estimating the dynamic/time-varying reproduction number ℛ t , we estimated the basic reproduction number ℛ 0 , as the reproduction number may be considered constant during the initial stages of the pandemic.

In the period-wise experiments, we observe a dip in ℛ 0 in P 2 , thus indicating that the government interventions to contain local transmission have been effective. P 1 is characterized by imported cases, and P 3 by clusters of cases, which account for a larger number. Thus, we can conclude that the crux of the government response in Singapore was in slowing down the pandemic in P 2 .

3 Discussion

Our case study demonstrates the untapped potential of open data for in-depth analysis of early responses through government interventions. However, open data comes with its own issues of management and usability. We discuss a few aspects of the usability of open data in health data science in general and in the context of a global pandemic. In this regard, we also elucidate the significance of the dataset presented in our case study in Section 2.

3.1 Open data in health data science

There are three controversies about health data science [49], of which the first and the third are relevant to our work.

  1. Controversy I: Data shall be used only for the purpose of data collection.

  2. Controversy II: Big Data cannot replace traditional medical research practices.

  3. Controversy III: Health data shall not be reused for research without explicit patient consent, to protect patient privacy.

We observe that our work disagrees with the statements of controversies I and III. The data collected by the governmental agency were intended for monitoring the C19 reproduction rate. However, to counter controversy I, we have demonstrated how the collected data can be further used in C19 recovery analysis. This reuse of data for research also applies to controversy III. In terms of patient privacy, the data published on the MOH website are nonpersonal data [64,70] and are, in most cases, an aggregation of case logs. If there is a provision for proper information governance controls, then health data can be reused for research without explicit patient consent [49], which is applicable to the data in our case study. Thus, our use of open data for recovery analysis conforms to the best practices in health data science.

3.2 Access to open data for early pandemic response

Open data have led to several analyses leading to cohort-based studies and insights into managing the pandemic. Here, we list a few examples of interesting analyses of C19 open data: mass behavior for buying face masks [60], quality of research funding [47], impact on universal healthcare [45], seasonal behavior of C19 [2], etc. At this juncture, open data are available through several open resource repositories [21,31,48,71,78].

However, there are gaps in data sharing in several countries, which could be attributed to high population density, lack of data collection infrastructure, national/sovereign policies and priorities, etc. [64]. This extends beyond case logs to include clinical data [79], wastewater data [48], genome sequencing [53], contact tracing [64], and so on, which can still be anonymized. It remains the case that sharing data for responding to a pandemic facilitates better surveillance, expert collaboration, coordination, and equity analysis [48]. Sharing data thus improves the preparedness of countries through surveillance and thereby has a positive impact globally [53].

While the available open data and communication in the public domain are beneficial for pandemic management, they are also fraught with misleading information [24], which has a large negative impact. Digital technologies using machine learning help in detecting misinformation. That said, the best measure is to avoid misinformation completely, which is a difficult problem today with the high penetration of Internet access globally.

3.3 Uniqueness of the data in case study

We considered this period because infection spread almost linearly in this initial period of the COVID-19 outbreak in Singapore owing to surveillance by the government, thus resulting in negligible deaths. Moreover, the datasets were complete and were made freely available by the health ministry during this period. There are three key aspects of the dataset used in our case study that contribute to its uniqueness.

First, in the progression of COVID-19, as is known today, there is an exponential growth in infections in the last of the four transmission stages of the disease progression model, i.e., the “community transmission” [12]. The exponential growth makes it difficult to study the transition between the third transmission stage (i.e., “cluster of cases”) and the fourth one using a linear model. Hence, we used a dataset that is sufficient to model the transition between transmission stages. Also, when the number of cases is very high during the last transmission stage, there are lapses in the protocol followed in the hospital, which leads to potential issues of inhomogeneity in the collected data. For the dataset, we have considered the infections between January 22, 2020, and April 01, 2020, a period with 1,000 patients in all. The next 1,000 infections happened by April 10, the following 1,000 by April 14, and so on. (The details of converting this timeline to our proposed periodization are provided in Appendix A.) This illustrates how restricting the study to a specific phase in the progression of the pandemic helps in generating models that can demonstrate the dependencies between variables.

Second, the number of deaths in the population in Singapore is one, which we consider negligible. This assumption enables us to use variants of the susceptible-infected-recovered epidemiological model that include hospitalizations and exclude the deceased.

Third, very few countries have collected data as systematically as Singapore, which was possible partly because of the slowing down of the outbreak in its escalation to the pandemic stage. South Korea has infrastructure and protocols similar to those of Singapore, except that its early outbreak measures were not as stringent as those in Singapore [64]. Hence, it reached the pandemic stage within a month, i.e., by the end of February 2020. Taking all these aspects into consideration, we chose this dataset for its uniqueness and its potential to provide insight into socioeconomic dynamics.

4 Conclusions

Here, we demonstrated the evaluation of the early-response government interventions for the COVID-19 pandemic through a case study of Singapore. Our work differs from similar analyses in that we use recovery analysis, i.e., LOS estimation, as a way to understand the outcome of the interventions. We used the open data available as case logs in the notices of the MOH and conducted an in-depth analysis. We performed statistical analysis and regression modeling to estimate the LOS, $\Delta t_{\mathrm{LOS}}$, and then used this estimate in our selected epidemiological models that include treatment and/or hospitalization. Our case study shows that open data can be effectively used for evaluating early response strategies at a broad level, going from individual (case logs) to population (cohorts) scales.

Acknowledgments

We are thankful to the anonymous reviewers, whose comments have gone a long way in improving the manuscript. The authors would like to thank IIITB for its continued and generous support, which helped in undertaking the socially relevant problem presented in this paper.

  1. Funding information: This work was supported partially by IIITB, EHRC, and Visvesvaraya PhD scheme.

  2. Conflict of interest: The authors state no conflict of interest.

  3. Ethical approval: This research did not require ethical approval.

  4. Data availability statement: Our curated data and statistical analysis code are available at https://github.com/vrrani/COVID19-Singapore.

Appendix A

Here, we provide a detailed background of our case study needed to explain the rationale behind our proposed periodization. The pandemic response implemented by the Government of Singapore is based on the stage of its transmission in the country. Even though this is the response for several developed countries, the strategies adopted in Singapore are known to have slowed the pandemic in early 2020.

Pandemic response in Singapore. The healthcare system of Singapore has implemented a quick and efficient pandemic response since its first case was confirmed on January 23, by enforcing strict quarantine, isolation, hospital surveillance, large-scale contact tracing, and testing. For containing the spread of contagions, systems have been in place for scaled-up, responsive, and aggressive contact tracing at entry points of the country (airports) and through local healthcare providers. Since the SARS outbreak in 2003, Singapore has systematically strengthened its centralized system of managing the spread of infectious diseases [75]. The measures include opening dedicated facilities (the NCID, the National Public Health Laboratory, and more biosafety level-3 laboratories), increasing capacity in the public healthcare system (e.g., negative pressure isolation beds, personal protective equipment, trained health professionals), and deploying formal (digital) platforms for inter-governmental agency cooperation. There has been a holistic improvement, supported by increased economic investment, in building expertise in infectious disease management. This organized system has thus facilitated controlled management of the C19 transmission in the country. The centralization has also facilitated responsive data gathering and anonymized patient-wise reporting to the public.

The rationale behind our proposed periodization. The following timeline of nationwide measures[6] mandated by the government is relevant here:

  1. January 23, 2020: Singapore commenced contact tracing and expanded its travel advisory, recommending that Singaporeans avoid travel to Hubei Province, China. Hospital surveillance of local cases had also begun, which helped in identifying the first four local transmission cases (i.e., cases 19, 20, 21, and 24) on February 04.

  2. February 04, 2020: Additional measures to contain the risk of spread beyond local clusters to the broader community had been announced. Aggressive contact tracing and expanded surveillance efforts to conduct testing on all pneumonia cases in the hospitals had been intensified. Precautions had been implemented to reduce the spread in vulnerable groups. Measures had been taken in schools to lower the risk of transmission among students and staff, including staggered recesses and suspension of school assemblies. Widespread suspension of large groups and communal activities in preschools had begun, and similarly, with external excursions and large-scale gatherings in eldercare facilities.

  3. March 3, 2020: The border had started banning travelers from countries that were hotspots. A complete ban on international travel into Singapore was implemented on March 22.

  4. March 16–19, 2020: The Ministry of Education and the Ministry of Foreign Affairs, with support from Singapore Airlines, had begun repatriation of Singaporeans working and studying outside of Singapore. This activity continued until the end of April.

We also identify the following significant dates relevant to our study:

  1. On January 23, the first imported case from Wuhan, China, was confirmed C19 positive.

  2. On February 04, the local transmission began in Singapore, and coincidentally, the first clinically recovered patient was discharged from the hospital.

  3. On March 17, the cumulative number of positively confirmed patients spiked, with 44 new cases on a single day. This is the start of community transmission.

  4. On March 21, the first two deaths owing to complications from C19 were recorded.

  5. On April 01, the number of confirmed patients reached 1,000, with 245 discharged and 3 deceased.

The aforementioned timeline provides the rationale behind the periodization we propose in Section 2.1:

  1. $P_1$: January 23–February 03, 2020,

  2. $P_2$: February 04–March 16, 2020,

  3. $P_3$: March 17–April 01, 2020, and

  4. $P_4$: April 02, 2020 and thereafter.

Our periods $P_1$–$P_3$ also correspond to stages 2–4 of the established disease progression model, where the four sequential stages are “imported cases only,” “sporadic cases/local transmission,” “clusters of cases,” and “community transmission” [12]. Thus, the progression in the transmission stages has been used as a “rule” for our proposed periodization.
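As an illustration of this rule, the period boundaries listed above can be encoded as a small date lookup. This is a minimal sketch only: the name `period_of` and the choice to return `None` for dates before January 23 are assumptions made for the example.

```python
from datetime import date

# Start dates (inclusive) of the four periods, as listed in this appendix.
PERIOD_STARTS = [
    ("P1", date(2020, 1, 23)),
    ("P2", date(2020, 2, 4)),
    ("P3", date(2020, 3, 17)),
    ("P4", date(2020, 4, 2)),
]


def period_of(confirmation_date):
    """Return the period label for a date, or None if it precedes P1.

    Hypothetical helper: returning None for earlier dates is an assumption.
    """
    label = None
    for name, start in PERIOD_STARTS:
        # Keep the last period whose start date has been reached.
        if confirmation_date >= start:
            label = name
    return label


# Example: a case confirmed on March 17 falls at the start of P3.
example_label = period_of(date(2020, 3, 17))
```

Encoding only the start dates keeps the periods contiguous by construction, so no date between January 23 and April 01 can fall into a gap.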

Overall, our findings demonstrate that the age-based mobility measures by the government have influenced the transmission of the disease in the country. A timely review of the early pandemic response by the Government of Singapore [38] has opined on the positive influence of the response on socio-economic activities and the healthcare system in the country. Our study corroborates the same.

Appendix B

For the case study discussed in Section 2, we use the time series of $\Delta t_{\mathrm{LOS}}$ and fit a line using a local regression model, for which we choose the loess model. Loess is a non-parametric local regression model for smoothing empirical time-series data [17] and scatterplots [33].

Loess model. The loess model [17] is used to smooth the conditional means in the observations of $\Delta t_{\mathrm{LOS}}$ to obtain its trendline. Loess uses a linear least squares regression model, $y = g(x) + \varepsilon$, with independent and dependent variables, $x$ and $y$, respectively, a smooth function $g$ (we use the Gaussian function), and a zero-mean constant-scale random variable as the error value, $\varepsilon$. The loss function $\varepsilon_w(x) = w(x) \cdot (y - g(x))^2$ is minimized using a weight function $w(x)$. Loess uses a key parameter, the span $\alpha$, which controls the size of the local neighborhood. With $d$ as the normalized distance from data points to the fitted curve, the tricubic weight function is:

$$
w(x) =
\begin{cases}
(1 - d^3)^3, & \text{if } \alpha < 1, \\
0, & \text{otherwise}.
\end{cases}
$$
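To make the weighting concrete, the following minimal Python sketch evaluates tricubic weights and a locally weighted linear fit at a single point. It is illustrative only: the analysis in this appendix uses loess in R, whereas the names `tricube` and `loess_point`, the nearest-neighbor bandwidth rule, the cutoff on the scaled distance (weights vanish once the scaled distance reaches 1), and the local degree of one are all assumptions of the sketch.

```python
import numpy as np


def tricube(d):
    """Tricube kernel: (1 - |d|^3)^3 for |d| < 1, and 0 otherwise."""
    d = np.abs(np.asarray(d, dtype=float))
    return np.where(d < 1.0, (1.0 - d**3) ** 3, 0.0)


def loess_point(x, y, x0, span=0.75):
    """Locally weighted linear fit evaluated at x0 (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # The span sets the fraction of points in the local neighborhood.
    k = max(2, int(np.ceil(span * len(x))))
    dist = np.abs(x - x0)
    # Bandwidth: distance from x0 to its k-th nearest data point.
    h = np.sort(dist)[k - 1]
    h = h if h > 0 else 1.0
    w = tricube(dist / h)
    # Weighted least squares for y ~ b0 + b1 * (x - x0); b0 is the fit at x0.
    sw = np.sqrt(w)
    design = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
    return float(coef[0])


# Usage on noise-free linear data: the local fit recovers the line.
x_demo = np.arange(20.0)
y_demo = 2.0 * x_demo + 1.0
fit_demo = loess_point(x_demo, y_demo, x0=7.5)
```

Repeating `loess_point` over a grid of `x0` values traces out a smoothed trendline; a larger `span` averages over more neighbors and gives a smoother curve.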

Results of loess model. We smooth the time series of $\Delta t_{\mathrm{LOS}}$, represented as a scatter plot based on hospital admission dates (Figure A1), using loess, implemented using the loess function with default span $\alpha = 0.75$ in R [52]. The loess model is a default local regression model used for a sample with fewer than 1,000 observations in the Stats package in R. The optimal span has been identified based on the values that produce a relatively low residual standard error (RSE) as well as show appropriate smoothing in the visualizations. The loess model for discharged patients during January 23–April 01 gives the following degrees of freedom ($df$) and RSE for $N$ patients: $(N = 671, df = 5.37, \mathrm{RSE} = 145.4)$ (Figure A1(i)). $\Delta t_{\mathrm{LOS}}$ has a slope of $-25^{\circ}$ in $P_3$. Thus, the key conclusion from the local regression model on the time series is the negative slope, i.e., a downward trend in $\Delta t_{\mathrm{LOS}}$ in $P_3$. Given the higher reliability of the data until $P_3$, we have excluded the time-series analysis of $P_4$ here. This allows us to study the trend based on reliable data. We find that the smoothing of period-wise observations using separate loess models does not provide a model with a good fit, and hence, is not as effective as the overall model, as shown in Figure A1(ii).

Figure A1

Scatter plot of the length of in-hospital stay for 671 discharged patients, until April 1, with the timeline of their C19-positive confirmation, with a trend-line estimated using loess model (i) for all 671 patients during the entire period, and (ii) separately for different periods, i.e., 22, 208, and 441 patients in $P_1$, $P_2$, and $P_3$, respectively. The fitted loess curve and its error ribbon show a decrease in LOS during the entire period.

Appendix C

For the case study discussed in Section 2, we can further use machine learning to find patterns in the data through classification and clustering. After modeling the count data of LOS based on age, gender, and period, we now determine the existence of a characteristic pattern in age, gender, and LOS, based on the periodization. Using age, gender, and $\Delta t_{\mathrm{LOS}}$ as patient-specific attributes, we mine the pattern in these attributes based on the transmission stage or period.

Classification and clustering. Here, we use classifiers and clustering algorithms to group patients. The rationale is that if a characteristic pattern is present in the patient-wise data, then the class or cluster assigned to a patient would correspond to a transmission stage. We use the following methods:

  1. Supervised learning for classification, with a 70–30 split for training and testing, enables grouping patients by a label. Since there is a class imbalance in the data, given the relatively sparse data during $P_1$, we use a stratified split to maintain uniform percentages of all classes in both training and testing data. We use the support vector machine (SVM) [18] and ensemble methods, such as the random forest classifier [6] and AdaBoost [58], as classifiers.

  2. Unsupervised learning for clustering enables us to find natural clusters. Given that the number of clusters, $n_c$, is the same as the number of transmission stages, we choose methods that take $n_c$ as an input. Thus, we use k-means, spectral clustering, agglomerative clustering using ward and average linkage, and Gaussian mixture models.
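The two steps above can be sketched with scikit-learn as follows. This is a hedged illustration, not the configuration behind the reported results: the data are synthetic stand-ins generated in the code, and the feature scaling, default hyperparameters, random seeds, and dictionary names are assumptions. Spectral clustering is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient table: age, gender (0/1), LOS in days.
# Class sizes are imbalanced on purpose; the values are NOT real data.
sizes = {1: 30, 2: 200, 3: 440}
X_parts, y_parts = [], []
for period, n in sizes.items():
    age = rng.integers(1, 90, size=n)
    gender = rng.integers(0, 2, size=n)
    los = rng.poisson(lam=20 - 4 * period, size=n)
    X_parts.append(np.column_stack([age, gender, los]))
    y_parts.append(np.full(n, period))
X = np.vstack(X_parts).astype(float)
y = np.concatenate(y_parts)

# 70-30 split, stratified so each period keeps its share in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Classifiers named in the text; scaling before the SVM is an assumption.
classifiers = {
    "svm_poly": make_pipeline(StandardScaler(), SVC(kernel="poly")),
    "random_forest": RandomForestClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}
test_accuracy = {
    name: clf.fit(X_train, y_train).score(X_test, y_test)
    for name, clf in classifiers.items()
}

# Clustering methods that take the number of clusters n_c as an input.
n_c = len(sizes)
X_scaled = StandardScaler().fit_transform(X)
cluster_labels = {
    "kmeans": KMeans(n_clusters=n_c, n_init=10, random_state=0).fit_predict(X_scaled),
    "ward": AgglomerativeClustering(n_clusters=n_c, linkage="ward").fit_predict(X_scaled),
    "average": AgglomerativeClustering(n_clusters=n_c, linkage="average").fit_predict(X_scaled),
    "gmm": GaussianMixture(n_components=n_c, random_state=0).fit_predict(X_scaled),
}
```

Passing `stratify=y` is what keeps the sparse first period represented in the test subset; without it, a random 30% draw could contain very few or no samples of that class.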

We use classification accuracy and intersection over union (Jaccard similarity) as metrics to compare results with the ground truth, i.e., the raw data. For classification, we also use the mean absolute error (MAE).
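The metrics can be written out in a few lines of NumPy. The sketch below is illustrative and rests on stated assumptions: the Jaccard score is macro-averaged over classes, the MAE treats the period labels as ordinal integers, and cluster ids are matched to periods by majority vote (the matching rule is not specified above, so `relabel_clusters` is a hypothetical helper).

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def mean_absolute_error(y_true, y_pred):
    """Average |true - predicted| on period indices (assumed ordinal)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def macro_jaccard(y_true, y_pred):
    """Per-class intersection over union, averaged over classes (assumed)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        inter = np.sum((y_true == c) & (y_pred == c))
        union = np.sum((y_true == c) | (y_pred == c))
        scores.append(inter / union)
    return float(np.mean(scores))


def relabel_clusters(y_true, clusters):
    """Map each cluster id to the most frequent true label inside it."""
    y_true, clusters = np.asarray(y_true), np.asarray(clusters)
    relabeled = np.empty_like(y_true)
    for k in np.unique(clusters):
        members = clusters == k
        labels, counts = np.unique(y_true[members], return_counts=True)
        relabeled[members] = labels[np.argmax(counts)]
    return relabeled


# Toy example with three periods and six patients.
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 2, 2, 2, 3, 1])
acc = accuracy(y_true, y_pred)             # 4 of 6 labels match
mae = mean_absolute_error(y_true, y_pred)  # (1 + 2) / 6
iou = macro_jaccard(y_true, y_pred)        # mean of 1/3, 2/3, 1/2

# Cluster ids are arbitrary, so they are relabeled before scoring.
clusters = np.array([0, 0, 2, 2, 1, 1])
cluster_acc = accuracy(y_true, relabel_clusters(y_true, clusters))
```

With majority-vote relabeling, several clusters may map to the same period when one class dominates, which is worth keeping in mind when reading clustering accuracies on imbalanced data.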

Results of the periodization-based patterns. Here, we report the results of running classifiers and clustering algorithms on patient-wise data, i.e., age, gender, and LOS, to determine the period during which the patient has been infected. We also conduct another experiment without gender as a feature, since gender has been found to be statistically insignificant in our regression model; in the absence of this categorical variable, all variables are numerical. We have implemented the algorithms and the metrics using the Python package sklearn [8].

When experimenting with different metrics, we have found that the accuracy and intersection over union (Jaccard similarity) give similar results. Our results are shown in Table A1. We observe that, with or without gender as a feature, the classification accuracy is the best for SVM with the polynomial kernel, at 66–67% classification accuracy. Similarly, the clustering accuracy is the best for agglomerative clustering using average linkage, at 66%. Thus, we conclude that there is a weak grouping pattern in the demographic and LOS attributes with respect to the period.

Table A1

Performance in classifiers and clustering algorithms on data with patient-wise attributes for grouping by the period or transmission stage

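| Method | Age, gender, and LOS: MAE | Age, gender, and LOS: Accuracy | Age and LOS: MAE | Age and LOS: Accuracy |
|---|---|---|---|---|
| Supervised classification | | | | |
| Random forest classifier | 0.4703 | 0.5594 | 0.4752 | 0.5742 |
| AdaBoost | 0.4406 | 0.5891 | 0.4604 | 0.5742 |
| SVM (polynomial kernel) | 0.3564 | 0.6782 | 0.3713 | 0.6634 |
| Clustering algorithm | | | | |
| k-Means | – | 0.6572 | – | 0.6572 |
| Spectral clustering | – | 0.6572 | – | 0.6572 |
| Gaussian mixture model | – | 0.6617 | – | 0.6617 |
| Agglomerative clustering: Ward | – | 0.6572 | – | 0.6572 |
| Agglomerative clustering: Average linkage | – | 0.6602 | – | 0.6647 |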

References

[1] Abdullah, W. J., & Kim, S. (2020). Singapore’s responses to the COVID-19 outbreak: A critical assessment. The American Review of Public Administration, 50(6–7), 770–776. doi:10.1177/0275074020942454

[2] Alamo, T., Reina, D. G., Mammarella, M., & Abella, A. (2020). COVID-19: Open-data resources for monitoring, modeling, and forecasting the epidemic. Electronics, 9(5), 827. doi:10.3390/electronics9050827

[3] Anagnostopoulos, C., & Triantafillou, P. (2020). Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics, 9(1), 17–55. doi:10.1007/s41060-018-0163-5

[4] Anastassopoulou, C., Russo, L., Tsakris, A., & Siettos, C. (2020). Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PloS One, 15(3), e0230405. doi:10.1371/journal.pone.0230405

[5] Bezzan, V. P., & Rocco, C. D. (2021). Predicting special care during the COVID-19 pandemic: A machine learning approach. Health Information Science and Systems, 9(1), 1–13. doi:10.1007/s13755-021-00164-6

[6] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324

[7] Brooks-Pollock, E., Danon, L., Jombart, T., & Pellis, L. (2021). Modelling that shaped the early COVID-19 pandemic response in the UK. Philosophical Transactions of the Royal Society B, 376(1829), 20210001. doi:10.1098/rstb.2021.0001

[8] Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., …, Varoquaux, G. (2013). API design for machine learning software: Experiences from the scikit-learn project. Cornell University. arXiv preprint arXiv:1309.0238.

[9] Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A practical information-theoretic approach. Springer Verlag. doi:10.1007/978-1-4757-2917-7_3

[10] Bustamante-Orellana, C., Cevallos-Chavez, J., Montalvo-Clavijo, C., Sullivan, J., Michael, E., & Mubayi, A. (2020). Modeling and preparedness: The transmission dynamics of covid-19 outbreak in provinces of ecuador. medRxiv, pages 2020–07. doi:10.1101/2020.07.09.20150078

[11] Carter, E. M., & Potts, H. W. (2014). Predicting length of stay from an electronic patient record system: A primary total knee replacement example. BMC Medical Informatics and Decision Making, 14(1), 26. doi:10.1186/1472-6947-14-26

[12] Chamola, V., Hassija, V., Gupta, V., & Guizani, M. (2020). A comprehensive review of the COVID-19 pandemic and the role of IoT, drones, AI, blockchain, and 5G in managing its impact. IEEE Access, 8, 90225–90265. doi:10.1109/ACCESS.2020.2992341

[13] Chen, H., Shi, L., Zhang, Y., Wang, X., & Sun, G. (2021). A cross-country core strategy comparison in China, Japan, Singapore and South Korea during the early COVID-19 pandemic. Globalization and Health, 17, 1–10. doi:10.1186/s12992-021-00672-w

[14] Chikina, M., & Pegden, W. (2020). Modeling strict age-targeted mitigation strategies for covid-19. PloS One, 15(7), e0236237. doi:10.1371/journal.pone.0236237

[15] Chotirmall, S. H., Wang, L.-F., & Abisheganaden, J. A. (2020). Letter from Singapore: The clinical and research response to COVID-19. Respirology (Carlton, Vic.), 25(10), 1101. doi:10.1111/resp.13929

[16] Chua, A. Q., Tan, M. M. J., Verma, M., Han, E. K. L., Hsu, L. Y., Cook, A. R., …, Legido-Quigley, H. (2020). Health system resilience in managing the COVID-19 pandemic: Lessons from Singapore. BMJ Global Health, 5(9), e003317. doi:10.1136/bmjgh-2020-003317

[17] Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83(403), 596–610. doi:10.1080/01621459.1988.10478639

[18] Cortes, C., & Vapnik, V. (1995). Support vector machine. Machine Learning, 20(3), 273–297. doi:10.1007/BF00994018

[19] Cruz-Correia, R., Ferreira, D., Bacelar, G., Marques, P., & Maranhão, P. (2018). Personalised medicine challenges: Quality of data. International Journal of Data Science and Analytics, 6(3), 251–259. doi:10.1007/s41060-018-0127-9

[20] d’Andrea, V., Gallotti, R., Castaldo, N., & De Domenico, M. (2022). Individual risk perception and empirical social structures shape the dynamics of infectious disease outbreaks. PLOS Computational Biology, 18(2), e1009760. doi:10.1371/journal.pcbi.1009760

[21] Desvars-Larrive, A., Dervic, E., Haug, N., Niederkrotenthaler, T., Chen, J., Di Natale, A., & Chakraborty, A. (2020). A structured open dataset of government interventions in response to COVID-19. Scientific Data, 7(1), 285. doi:10.1101/2020.05.04.20090498

[22] Dorlach, T. (2023). Social policy responses to Covid-19 in the global south: Evidence from 36 countries. Social Policy and Society, 22(1), 94–105. doi:10.1017/S1474746422000264

[23] Du, B., Zhao, Z., Zhao, J., Yu, L., Sun, L., & Lv, W. (2021). Modelling the epidemic dynamics of COVID-19 with consideration of human mobility. International Journal of Data Science and Analytics, 12(4), 369–382. doi:10.1007/s41060-021-00271-3

[24] Elhadad, M. K., Li, K. F., & Gebali, F. (2020). Detecting misleading information on COVID-19. IEEE Access, 8, 165201–165215. doi:10.1109/ACCESS.2020.3022867

[25] Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications: Monographs on statistics and applied probability 66. Chapman and Hall/CRC.

[26] Ferguson, N., Laydon, D., Nedjati Gilani, G., Imai, N., Ainslie, K., Baguelin, M., …, Dighe, A. (2020). Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Imperial College London, 10(77482), 491–497.

[27] Ferrari-Trecate, G., & Muselli, M. (2002). A new learning method for piecewise linear regression. In International conference on artificial neural networks (pp. 444–449). Springer. doi:10.1007/3-540-46084-5_72

[28] Haldane, V., De Foo, C., Abdalla, S. M., Jung, A.-S., Tan, M., Wu, S., …, Perez, T. (2021). Health systems resilience in managing the COVID-19 pandemic: Lessons from 28 countries. Nature Medicine, 27(6), 964–980. doi:10.1038/s41591-021-01381-y

[29] Haug, N., Geyrhofer, L., Londei, A., Dervic, E., Desvars-Larrive, A., Loreto, V., …, Klimek, P. (2020). Ranking the effectiveness of worldwide COVID-19 government interventions. Nature Human Behaviour, 4(12), 1303–1312. doi:10.1038/s41562-020-01009-0

[30] He, R., Zhang, J., Mao, Y., Degomme, O., & Zhang, W.-H. (2020). Preparedness and responses faced during the COVID-19 pandemic in Belgium: An observational study and using the national open data. International Journal of Environmental Research and Public Health, 17(21), 7985. doi:10.3390/ijerph17217985

[31] Hu, T., Guan, W. W., Zhu, X., Shao, Y., Liu, L., Du, J., …, Zhang, L. (2020). Building an open resources repository for COVID-19 research. Data and Information Management, 4(3), 130–147. doi:10.2478/dim-2020-0012

[32] Jacinta, I., Chen, P., Yap, J., Hsu, L. Y., & Teo, Y. Y. (2020). COVID-19 and Singapore: From early response to circuit breaker. Ann Acad Med Singapore, 49, 561–572. doi:10.47102/annals-acadmedsg.2020239

[33] Jacoby, W. G. (2000). Loess: A nonparametric, graphical tool for depicting relationships between variables. Electoral Studies, 19(4), 577–613. doi:10.1016/S0261-3794(99)00028-1

[34] James, A., Dalal, J., Kousi, T., Vivacqua, D., Câmara, D. C. P., Dos Reis, I. C., …, Lee, T. M. (2022). An in-depth statistical analysis of the COVID-19 pandemic’s initial spread in the WHO African region. BMJ Global Health, 7(4), e007295. doi:10.1136/bmjgh-2021-007295

[35] Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing papers of a mathematical and physical character, 115(772), 700–721. doi:10.1098/rspa.1927.0118

[36] Kribs-Zaleta, C., Siddiqui, N. A., Kumar, N., & Das, P. (2009). The control reproduction number and case-under reporting of visceral Leishmaniasis in Bihar, Academia, India.

[37] Kucharski, A. J., Russell, T. W., Diamond, C., Liu, Y., Edmunds, J., Funk, S., & Eggo, R. M. (2020). Early dynamics of transmission and control of COVID-19: A mathematical modelling study. The Lancet Infectious Diseases. https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30144-4/fulltext. doi:10.1016/S1473-3099(20)30144-4

[38] Lee, W., & Ong, C. (2020). Overview of rapid mitigating strategies in Singapore during the COVID-19 pandemic. Public Health, 185, 15–17. doi:10.1016/j.puhe.2020.05.015

[39] Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., …, Feng, Z. (2020). Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. New England Journal of Medicine, 382(13), 1199–1207. doi:10.1056/NEJMoa2001316

[40] Lin, R. J., Lee, T. H., & Lye, D. C. B. (2020). From SARS to COVID-19: The Singapore Journey. The Medical Journal of Australia, 212(11), 497–502.e1. doi:10.5694/mja2.50623

[41] Liu, J., Ong, G. P., & Pang, V. J. (2022). Modelling effectiveness of COVID-19 pandemic control policies using an Area-based SEIR model with consideration of infection during interzonal travel. Transportation Research Part A: Policy and Practice, 161, 25–47. doi:10.1016/j.tra.2022.05.003

[42] Liu, T., Hu, J., Kang, M., Lin, L., Zhong, H., Xiao, J., …, Deng, A. (2020a). Transmission dynamics of 2019 novel coronavirus (2019-ncov). doi:10.2139/ssrn.3526307

[43] Liu, Y., Qin, J., Fan, Y., Zhou, Y., Follmann, D. A., & Huang, C.-Y. (2020b). Estimation of infection density and epidemic size of COVID-19 using the back-calculation algorithm. Health Information Science and Systems, 8(1), 1–8. doi:10.1007/s13755-020-00122-8

[44] Lo, K. L., Zhang, M., Chen, Y., & Mi, J. J. (2021). Forecasting the trend of COVID-19 considering the impacts of public health interventions: An application of FGM and buffer level. Journal of Healthcare Informatics Research, 5(4), 497–528. doi:10.1007/s41666-021-00103-w

[45] McGlacken-Byrne, D., Parker, S., & Burke, S. (2023). Tracking aspects of healthcare activity during the first nine months of COVID-19 in Ireland: A secondary analysis of publicly available data. HRB Open Research, 4, 98. doi:10.12688/hrbopenres.13372.2

[46] McMahan, C. S., Self, S., Rennert, L., Kalbaugh, C., Kriebel, D., Graves, D., …, Freedman, D. L. (2021). COVID-19 wastewater epidemiology: A model to estimate infected populations. The Lancet Planetary Health, 5(12), e874–e881. doi:10.1016/S2542-5196(21)00230-8

[47] Mugabushaka, A.-M., van Eck, N. J., & Waltman, L. (2022). Funding Covid-19 research: Insights from an exploratory analysis using open data infrastructures. Quantitative Science Studies, 3(3), 560–582. doi:10.1162/qss_a_00212

[48] Naughton, C. C., Roman Jr, F. A., Alvarado, A. G. F., Tariqi, A. Q., Deeming, M. A., Kadonsky, K. F., …, Katsivelis, P. (2023). Show us the data: Global COVID-19 wastewater monitoring efforts, equity, and gaps. FEMS Microbes, 4, xtad003. doi:10.1093/femsmc/xtad003

[49] Peek, N., & Rodrigues, P. P. (2018). Three controversies in health data science. International Journal of Data Science and Analytics, 6(3), 261–269. doi:10.1007/s41060-018-0109-y

[50] Price, D. J., Shearer, F. M., Meehan, M. T., McBryde, E., Moss, R., Golding, N., …, Abbott, S. (2020). Early analysis of the Australian COVID-19 epidemic. Elife, 9, e58785. doi:10.7554/eLife.58785

[51] Pung, R., Chiew, C. J., Young, B. E., Chin, S., Chen, M. I.-C., Clapham, H. E., …, Lee, V. J. M. (2020). Investigation of three clusters of COVID-19 in Singapore: implications for surveillance and response measures. The Lancet, 395, 1039–46. 10.1016/S0140-6736(20)30528-6Search in Google Scholar PubMed PubMed Central

[52] R Core Team. (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Search in Google Scholar

[53] Rahman, N., O’Cathail, C., Zyoud, A., Sokolov, A., Munnink, B. O., Grüning, B., …, Yuan, D. Y. (2023). Mobilisation and analyses of publicly available SARS-CoV-2 data for pandemic responses. bioRxiv, pp. 2023–04. 10.1101/2023.04.19.537514Search in Google Scholar

[54] Ram, V., & Schaposnik, L. P. (2021). A modified age-structured sir model for covid-19 type viruses. Scientific Reports, 11(1), 15194. 10.1038/s41598-021-94609-3Search in Google Scholar PubMed PubMed Central

[55] Riou, J., & Althaus, C. L. (2020). Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020. Euro Surveillance, 25(4,pii:2000058). published correction appears in Euro Surveill. 2020 Feb;25(7). 10.2807/1560-7917.ES.2020.25.4.2000058Search in Google Scholar PubMed PubMed Central

[56] Rojas, J. H., Paredes, M., Banerjee, M., Akman, O., & Mubayi, A. (2022). Mathematical modeling and dynamics of sars-cov-2 in colombia. Letters in Biomathematics, 9(1), 41–56. Search in Google Scholar

[57] Roques, L., Klein, E., Papaix, J., & Soubeyrand, S. (2020). Mechanistic-statistical sir modelling for early estimation of the actual number of cases and mortality rate from covid-19. medRxiv. https://www.medrxiv.org/content/early/2020/03/24/2020.03.22.20040915.full.pdf. Search in Google Scholar

[58] Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336. 10.1023/A:1007614523901Search in Google Scholar

[59] Sesagiri Raamkumar, A., Tan, S. G., & Wee, H. L. (2020). Measuring the outreach efforts of public health authorities and the public response on Facebook during the COVID-19 pandemic in early 2020: Cross-country comparison. Journal of Medical Internet Research, 22(5), e19334. 10.2196/19334Search in Google Scholar PubMed PubMed Central

[60] Shibuya, Y., Lai, C.-M., Hamm, A., Takagi, S., & Sekimoto, Y. (2022). Do open data impact citizens’ behavior? Assessing face mask panic buying behaviors during the COVID-19 pandemic. Scientific Reports, 12(1), 17607. 10.1038/s41598-022-22471-y

[61] Sreevalsan-Nair, J., Vangimalla, R. R., & Ghogale, P. R. (2020a). Analysis of clinical recovery-period and recovery rate estimation of the first 1,000 COVID-19 patients in Singapore. medRxiv.

[62] Sreevalsan-Nair, J., Vangimalla, R. R., & Ghogale, P. R. (2020b). Estimation of length of in-hospital stay using demographic data of the first 1,000 COVID-19 patients in Singapore. medRxiv. 10.1101/2020.04.17.20069724

[63] Sreevalsan-Nair, J., Vangimalla, R. R., & Ghogale, P. R. (2020c). Influence of COVID-19 transmission stages and demographics on length of in-hospital stay in Singapore for the first 1000 patients [version 1; not peer reviewed]. F1000Research 2020, 9(ISCB Comm J).

[64] Sridhar, V., Sreevalsan-Nair, J., Ghogale, P. R., & Vangimalla, R. R. (2022). Sharing and use of non-personal health information: Case of the COVID-19 pandemic. In: V. Sridhar (Ed.), Data Centric Living: Algorithms, Digitization and Regulation (chapter 8, 1st ed.). India: Routledge. 10.4324/9781003093442-8

[65] Summers, J., Cheng, H.-Y., Lin, H.-H., Barnard, L. T., Kvalsvig, A., Wilson, N., & Baker, M. G. (2020). Potential lessons from the Taiwan and New Zealand health responses to the COVID-19 pandemic. The Lancet Regional Health-Western Pacific, 4, 100044. 10.1016/j.lanwpc.2020.100044

[66] Sun, J., & Zhao, X. (2013). Statistical analysis of panel count data. Springer. 10.1007/978-1-4614-8715-9

[67] Tan, J. B., Cook, M. J., Logan, P., Rozanova, L., & Wilder-Smith, A. (2021). Singapore’s pandemic preparedness: an overview of the first wave of COVID-19. International Journal of Environmental Research and Public Health, 18(1), 252. 10.3390/ijerph18010252

[68] Tang, J. W., Caniza, M. A., Dinn, M., Dwyer, D. E., Heraud, J.-M., Jennings, L. C., …, Marr, L. C. (2022). An exploration of the political, social, economic and cultural factors affecting how different global regions initially reacted to the COVID-19 pandemic. Interface Focus, 12(2), 20210079. 10.1098/rsfs.2021.0079

[69] Ulahannan, J. P., Narayanan, N., Thalhath, N., Prabhakaran, P., Chaliyeduth, S., Suresh, S. P., …, Ulahannan, J. (2020). A citizen science initiative for open data and visualization of COVID-19 outbreak in Kerala, India. Journal of the American Medical Informatics Association, 27(12), 1913–1920. 10.1093/jamia/ocaa203

[70] Vogt, F., Haire, B., Selvey, L., Katelaris, A. L., & Kaldor, J. (2022). Effectiveness evaluation of digital contact tracing for COVID-19 in New South Wales, Australia. The Lancet Public Health, 7(3), e250–e258. 10.1016/S2468-2667(22)00010-X

[71] Wahltinez, O., Cheung, A., Alcantara, R., Cheung, D., Daswani, M., Erlinger, A., …, Brenner, M. P. (2022). COVID-19 Open-Data a global-scale spatially granular meta-dataset for coronavirus disease. Scientific Data, 9(1), 162. 10.1038/s41597-022-01263-z

[72] Waitzberg, R., Hernández-Quevedo, C., Bernal-Delgado, E., Estupiñán-Romero, F., Angulo-Pueyo, E., Theodorou, M., …, Kaitelidou, D. (2022). Early health system responses to the COVID-19 pandemic in Mediterranean countries: A tale of successes and challenges. Health Policy, 126(5), 465–475. 10.1016/j.healthpol.2021.10.007

[73] White, G. C., & Bennetts, R. E. (1996). Analysis of frequency count data using the negative binomial distribution. Ecology, 77(8), 2549–2557. 10.2307/2265753

[74] Whitelaw, S., Mamas, M. A., Topol, E., & Van Spall, H. G. (2020). Applications of digital technology in COVID-19 pandemic planning and response. The Lancet Digital Health, 2(8), e435–e440. 10.1016/S2589-7500(20)30142-4

[75] Wong, J. E. L., Leo, Y. S., & Tan, C. C. (2020). COVID-19 in Singapore-Current experience: Critical global issues that require attention and action. JAMA. https://jamanetwork.com/journals/jama/fullarticle/2761890. 10.1001/jama.2020.2467

[76] Xiang, L., Lee, A. H., Yau, K. K., & McLachlan, G. J. (2007). A score test for overdispersion in zero-inflated Poisson mixed regression model. Statistics in Medicine, 26(7), 1608–1622. 10.1002/sim.2616

[77] Yang, X., Yu, Y., Xu, J., Shu, H., Xia, J., Liu, H., …, Shang, Y. (2020). Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine. https://www.thelancet.com/journals/lanres/article/PIIS2213-2600(20)30079-5/fulltext. 10.1016/S2213-2600(20)30079-5

[78] Zheng, Q., Jones, F. K., Leavitt, S. V., Ung, L., Labrique, A. B., Peters, D. H., …, Azman, A. S. (2020). HIT-COVID, a global database tracking public health interventions to COVID-19. Scientific Data, 7(1), 286. 10.1038/s41597-020-00610-2

[79] Zhou, F., Yu, T., Du, R., Fan, G., Liu, Y., Liu, Z., …, Cao, B. (2020). Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet, 395(10229), 1054–1062. 10.1016/S0140-6736(20)30566-3

Received: 2023-05-13
Revised: 2023-09-03
Accepted: 2023-10-09
Published Online: 2023-11-14

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 27.4.2024 from https://www.degruyter.com/document/doi/10.1515/cmb-2023-0104/html