Development and validation of a risk stratification model for screening suspected cases of COVID-19 in China

Rapid identification of high-risk populations is critical to epidemic control. We developed and validated a risk prediction model for screening SARS-CoV-2 infection in suspected cases with an epidemiological history. A total of 1019 patients, ≥13 years of age, who had an epidemiological history were enrolled from fever clinics between January 2020 and February 2020. Among them, 103 (10.11%) cases of COVID-19 were confirmed. Multivariable analysis identified four features associated with increased risk of SARS-CoV-2 infection, summarized in the mnemonic COVID-19-REAL: radiological evidence of pneumonia (1 point), eosinophils < 0.005 × 10^9/L (1 point), age ≥ 32 years (2 points), and leukocytes < 6.05 × 10^9/L (1 point). The area under the ROC curve for the training group was 0.863 (95% CI, 0.813-0.912). A cut-off value of less than 3 points for COVID-19-REAL was assigned to define the low-risk population. Only 10 (2.70%) of the 371 patients in this low-risk population proved to be SARS-CoV-2 positive, giving a negative predictive value of 0.973. Results in external validation were similar. This study provides a simple, practical, and robust screening model, COVID-19-REAL, able to identify populations at high risk for SARS-CoV-2 infection.


INTRODUCTION
At the end of December 2019, an outbreak of pneumonia caused by a novel coronavirus (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2) was reported in Wuhan, China [1]. Transmission takes place through respiratory droplets and other routes such as ocular surfaces [2][3][4]. This highly contagious virus spread rapidly to other cities in China and gave rise to a global outbreak. As of 23 March 2020, over 300,000 cases of COVID-19 had been confirmed worldwide, and more than 10,000 patients had died. The number of confirmed cases is still increasing. One study estimates the basic reproductive number (R0) to be 2.68 and the epidemic doubling time to be 6.4 days [5]. The control of COVID-19 must include detection and isolation of latent infection. A considerable proportion of COVID-19 cases are infected by people who had only mild symptoms [6,7]. COVID-19 patients have the highest viral load near symptom presentation [8]. Moreover, the rapid spread of COVID-19 has meant that large numbers of patients with suspicious symptoms are often crowded into fever clinics for diagnosis.
At present, cases are confirmed by a positive result with high-throughput sequencing or real-time reverse-transcriptase polymerase chain reaction (RT-PCR) assay of samples from nasal or pharyngeal swabs [9]. However, nucleic acid tests are not available to all suspected patients in pandemic areas due to the shortage of equipment and reagents [10,11]. Testing all cases with mild symptoms and/or an epidemiological history can lead to competition for resources. In addition, undiagnosed mild-type COVID-19 patients who are not properly isolated can become sources of infection because their viral load peaks near symptom presentation, which could explain the rapid spread of this epidemic [12]. A large proportion of infected cases continue to test negative for viral RNA even after they develop clinical manifestations and positive chest CT (computed tomography) results [13,14]. This dilemma demands a fast and accurate model for early screening for SARS-CoV-2 infection to prioritize high-risk patients for clinical care, isolation, and contact tracing. Previous studies reported that a number of COVID-19 patients exhibit lymphopenia and thrombocytopenia [15][16][17]. Blood counts and high-sensitivity C-reactive protein (hsCRP) are commonly used for early identification of fever [18], and CT is used to assess pneumonia. These tests are simple and fast, and nearly all patients with fever or respiratory symptoms can be tested. We first compared alterations of hematological parameters between cases with and without SARS-CoV-2 infection, then developed and validated a novel score-based prognostic model (COVID-19-REAL) for SARS-CoV-2 infection.

Patient characteristics
A total of 1019 patients were enrolled in this study out of the 1076 patients who presented to fever clinics through 5 February 2020. Fifty-seven patients were excluded, including one with stroke, two with organ transplantation, one with HIV, 12 with cancer, one with active tuberculosis, 18 with age < 12 years, and 22 whose status remained unconfirmed as of 10 February 2020 (Figure 1). Of the 1019 patients, 485 (48%) were female, and the median age was 34 years (range 13 to 91 years). The characteristics of the patients are shown in Table 1. All received sequencing or nucleic acid testing using RT-PCR; 103 (10.11%) tested positive for SARS-CoV-2 (Supplementary Table 1).

Factors associated with SARS-CoV-2 infection
The association between age and infection rate is presented in Figure 2A. The rate of SARS-CoV-2 infection increased with age. After stratifying patients by age quartile, the positive rate of SARS-CoV-2 infection from the first to the fourth quartile was 2.90%, 3.06%, 12.14%, and 23.81% in the training group, and 2.97%, 3.45%, 6.72%, and 23.28% in the validation group (Figure 2B, C). The risk of infection in the last two quartiles was higher than in the first two. The infection rate was low (less than 5%) for patients aged < 32 years. Subgroup analyses were therefore performed with age ≥ 32 years as the criterion for stratifying patients into the high-risk population.
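The quartile stratification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `positive_rate_by_age_quartile` and the twelve sample records are made up for the example and are not the study data, and the quartile boundaries come from Python's `statistics.quantiles` default method, which may differ from the software the authors used.

```python
from statistics import quantiles


def positive_rate_by_age_quartile(ages, infected):
    """Return the positive rate in each age quartile (youngest first).

    ages     -- list of ages in years
    infected -- list of 0/1 flags, 1 meaning a confirmed infection
    """
    # Three cut points that split the ages into four groups.
    q1, q2, q3 = quantiles(ages, n=4)
    bins = [[], [], [], []]
    for age, status in zip(ages, infected):
        # Count how many cut points the age exceeds: 0, 1, 2, or 3.
        index = (age > q1) + (age > q2) + (age > q3)
        bins[index].append(status)
    return [sum(b) / len(b) if b else float("nan") for b in bins]


# Hypothetical records, NOT the study data.
example_ages = [15, 18, 22, 25, 28, 30, 33, 36, 40, 45, 52, 60]
example_infected = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
rates = positive_rate_by_age_quartile(example_ages, example_infected)
```

In this toy sample the rate rises from the youngest to the oldest quartile, which is the pattern the text reports for both the training and validation groups.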

A COVID-19 prediction model based on age, leukocyte count, eosinophil count, and radiological evidence of pneumonia
In the training group, the AUROC values of leukocytes and eosinophils for COVID-19 diagnosis were 0.747 and 0.729, respectively. These were comparable to the validation group, where the AUROC values for leukocytes and eosinophils were 0.763 and 0.772, respectively (Supplementary Figure 1). Using Youden's index, the optimal cut-off values for leukocytes and eosinophils were 6.05 × 10^9/L and 0.005 × 10^9/L, respectively.
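For readers unfamiliar with Youden's index, the following minimal Python sketch shows how a cut-off can be chosen by maximizing J = sensitivity + specificity - 1. It is an illustration, not the authors' analysis code: the function name `youden_cutoff` and the eight sample values are hypothetical, and the sketch assumes that a value below the cut-off counts as a positive prediction, matching the direction of the criteria reported here.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is predicted positive when value < cutoff, because lower
    counts indicate higher risk in this setting.
    """
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    best_cutoff, best_j = None, -1.0
    # Try every observed value as a candidate cut-off.
    for cutoff in sorted(set(values)):
        sensitivity = sum(v < cutoff for v in positives) / len(positives)
        specificity = sum(v >= cutoff for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:  # ties keep the lowest cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical leukocyte counts (x 10^9/L), NOT the study data.
# 1 = infected, 0 = not infected.
counts = [3.9, 4.5, 5.2, 5.8, 6.4, 7.1, 8.0, 9.3]
status = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, j = youden_cutoff(counts, status)
```

On this toy sample the best cut-off is 5.8 with J = 0.75. Two candidates tie at that value, and the sketch keeps the lower one; other tie-breaking rules are equally defensible.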

DISCUSSION
Beginning in mid-January 2020, a large number of people living in Wuhan left the area via public transportation for Chinese New Year, leading to a dramatic increase in confirmed or suspected cases nationwide. The management of these suspected cases is of major concern. Nucleic acid testing is currently the main diagnostic method, but the sensitivity and specificity of nucleic acid tests are yet to be verified, and the overall detection rate is constrained by virus concentration and sampling method. Another problem is that some patients with positive chest CT images test negative for COVID-19 by RT-PCR [14]. With such issues in mind, we proposed a robust, high-throughput screening model to help prioritize high-risk patients. We used data from routine blood tests and CT images to develop a scoring system (COVID-19-REAL) that can stratify patients into risk groups. Suspected cases scoring fewer than 3 points had a predicted probability of 99.16% in the training group and 97.3% in the validation group of not being infected by SARS-CoV-2. This risk classification can be employed by clinicians and medical institutions, especially those with inadequate detection reagents or equipment, to allocate resources rationally.
Previous investigations have revealed valuable information about demographics for COVID-19. Most patients with COVID-19 are older [16]. We first stratified patients according to age. Two earlier studies stated the median age of the patients was 56 and 59 years [15,19]. In our study, the median age was 47 years. We found the risk of infection significantly increased with age, from less than 3% to over 23% from the first to last quartile.
The levels of leukocytes, monocytes, lymphocytes, eosinophils, neutrophils, and platelets were markedly lower in COVID-19 patients. Our results are consistent with previous research showing that patients exhibited leukopenia, lymphopenia, and thrombocytopenia after SARS-CoV-2 infection [15,20]. Some researchers suggested that a decreased level of white blood cells could serve as an auxiliary diagnostic indicator [20]. Similar patterns emerged in SARS-CoV, with cases of lymphopenia and neutropenia [21,22], and decreased levels of leukocytes and platelets [23]. A SARS-CoV model showed that neutrophils, lymphocytes, and leukocytes were significantly reduced the day after infection [24]. In a SARS-CoV MA15 infection model, the decrease of peripheral blood cells was explained by inflammatory cell infiltration into the lungs [25]. The N protein of SARS-CoV enhances eosinophilic infiltration into the lungs and aggravates lung inflammation [26]. Lung lesions were the most important feature of SARS-CoV-2 infections [20], and eosinophilopenia may indicate a poor prognosis of COVID-19 [27]. These results shed light on the neglected role that eosinophils might play in the progression of respiratory disease.
To better stratify SARS-CoV-2 infection risk for the suspected cases, four criteria were used to determine the likelihood of SARS-CoV-2 infection: leukocytes < 6.05 × 10^9/L (1 point), eosinophils < 0.005 × 10^9/L (1 point), radiological evidence of pneumonia (1 point), and age ≥ 32 years (2 points). We defined four risk groups: very low risk (0 points), low risk (1-2 points), moderate risk (3 points), and high risk (4-5 points). With the cut-off assigned at less than 3 points on the COVID-19-REAL score, the number of suspected cases who required priority examination and hospitalization decreased by 70.94% and 71.98%, while maintaining a false negative rate of 2.70% and 2.24% in the training and validation groups, respectively.
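The scoring rule and risk groups described above can be written as a short Python sketch. The point values, thresholds, and group boundaries are taken from the text; the function and parameter names (`covid19_real_score`, `risk_group`, `needs_priority_testing`) are hypothetical and chosen only for illustration. The sketch is not a validated clinical tool.

```python
def covid19_real_score(age_years, leukocytes, eosinophils, pneumonia_on_imaging):
    """Compute the COVID-19-REAL score (0 to 5 points).

    leukocytes and eosinophils are given in units of 10^9/L;
    pneumonia_on_imaging is True when radiological evidence is present.
    """
    score = 0
    if pneumonia_on_imaging:   # radiological evidence of pneumonia
        score += 1
    if eosinophils < 0.005:    # eosinophils < 0.005 x 10^9/L
        score += 1
    if age_years >= 32:        # age >= 32 years
        score += 2
    if leukocytes < 6.05:      # leukocytes < 6.05 x 10^9/L
        score += 1
    return score


def risk_group(score):
    """Map a score to the four risk groups defined in the text."""
    if score == 0:
        return "very low risk"
    if score <= 2:
        return "low risk"
    if score == 3:
        return "moderate risk"
    return "high risk"


def needs_priority_testing(score):
    """Scores below 3 define the low-risk population; 3 or more are prioritized."""
    return score >= 3


# Two hypothetical patients, NOT taken from the study data.
older = covid19_real_score(45, leukocytes=4.8, eosinophils=0.0, pneumonia_on_imaging=True)
younger = covid19_real_score(25, leukocytes=7.2, eosinophils=0.02, pneumonia_on_imaging=False)
```

Here the first hypothetical patient scores 5 points (high risk) and the second scores 0 points (very low risk). Because age alone contributes 2 points, a patient aged 32 or older needs only one further criterion to reach the 3-point threshold.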
Clinical decision models have been explored to predict SARS-CoV-2 infection. Sun et al. [28] studied 788 cases in Singapore to identify populations at high risk for COVID-19. From their large population-based study, a model that combined laboratory blood tests, clinical findings, and radiology was proposed, with an AUROC of 0.88 (95% CI: 0.83-0.93). As in our cohort, those authors found that eosinophils and CT-imaged pneumonia were strong predictors. However, their conclusions were limited by a lack of external verification, clinical inapplicability caused by redundant parameters, and missing data in laboratory blood tests.
The advantage of the present study is that a simple and applicable prediction model, COVID-19-REAL, which combines age, radiological image, and two functionally related hematological indicators (i.e., leukocytes and eosinophils), has been developed to stratify and distinguish between high- and low-risk populations suspected of SARS-CoV-2 infection. This evaluation of suspected cases based on age, radiological image, and two dichotomous criteria could be easily implemented in routine clinical practice. In clinical settings where resources and testing kits are limited, patients with advanced respiratory symptoms are usually tested first. However, undiagnosed mild-type COVID-19 patients who are not properly isolated can become sources of infection because the viral load peaks near symptom presentation. This scoring system will be of great help for early infection screening and will offer more information for physicians to help prioritize high-risk patients.

There are limitations in the current study. Our training and validation data come from China; the applicability of the model to Western populations must be separately evaluated. The results were obtained from people over 12 years of age and may not be applicable to younger people. Only routine tests, including hsCRP, radiological imaging, and blood cell count, were performed; other laboratory indicators, including liver and kidney function, are lacking.
In conclusion, this study provides a simple, practical, and robust screening model (COVID-19-REAL) to identify high-risk populations for SARS-CoV-2 infection. This prediction model will help reduce the burden on hospitals in pandemic areas and help them allocate resources more rationally.

Patients
Suspected cases of COVID-19 aged ≥13 years with an epidemiological history were included from the fever clinics of the First Affiliated Hospital, College of Medicine, Zhejiang University and Taizhou Enze Medical Center (Group), Enze Hospital, between 23 January 2020 and 5 February 2020. All suspected cases received sequencing or RT-PCR assay for SARS-CoV-2. According to the National Health Commission, an epidemiological history of COVID-19 is defined as follows: within 14 days before the onset of the disease, (1) there was a history of travel to or residence in Wuhan or its surrounding areas, or other communities with confirmed cases; (2) there was contact with confirmed cases of COVID-19; (3) there was contact with suspected cases (having fever or respiratory symptoms) from Wuhan or its surrounding areas, or other communities with confirmed cases; or (4) one confirmed case was found in an enclosed environment (such as a family house, a construction site, or an office), with one or more cases of fever/respiratory tract infection found at the same time. The patient-selection process is shown in Figure 1.
The COVID-19 cases were all confirmed by sequencing or RT-PCR assay [9]. The RT-PCR was mainly performed using a commercial kit for SARS-CoV-2 detection (BoJie, Shanghai, China), which was approved by the China Food and Drug Administration. We excluded patients with HIV infection, cancer, organ transplantation, stroke, or active tuberculosis; severe and critical COVID-19 patients according to the National Health Commission [17]; and suspected cases without confirmed laboratory evidence as of 10 February 2020. The study was approved by the Ethics Committee of the First Affiliated Hospital, College of Medicine, Zhejiang University, and complied with the ethical guidelines of the Declaration of Helsinki. The researchers analyzed only anonymous data, so informed consent was waived. Age, gender, laboratory assessments consisting of hsCRP and complete blood count, and radiological images were obtained from electronic medical records. Radiological evidence of pneumonia was defined as lung consolidation and/or ground-glass opacity [20]. The images were reviewed independently by two radiologists; in case of disagreement, a third radiologist performed a further examination.

Statistical analysis
Continuous variables were expressed as medians and interquartile range (IQR), and were compared by t-test or

Supplementary Tables
Supplementary Table 1