Introduction

Coronavirus disease 2019 (COVID-19) causes severe respiratory complications, including acute respiratory failure, and has put significant strain on healthcare systems worldwide as they accommodate a massive influx of critically ill patients. For example, patients with acute respiratory distress syndrome (ARDS) from COVID-19 often require intubation and intensive care unit (ICU) care, both of which are resource intensive [1]. Because of the shortage of mechanical ventilators and ICU capacity, it is crucial to predict, accurately and in a timely manner, which COVID-19 patients will require critical care. More importantly, an early determination of patient prognosis is useful when implementing new treatments such as remdesivir [2], convalescent plasma transfusion [3], ruxolitinib [4], and other emerging therapies, leading to better outcomes [5, 6]. Progression to critical illness can occur in any COVID-19 patient, but especially in those with comorbidities such as obesity, cardiovascular disease, chronic lung disease, hypertension, or cancer [7]. However, it remains challenging to predict when a COVID-19 patient will progress to critical illness.

Beyond patient demographics and laboratory parameters, chest CT has been instrumental in distinguishing severe from non-severe cases of COVID-19. For example, a chest CT severity score has been developed and shown to correlate with disease severity and/or emergent status in COVID-19 patients [8,9,10]. Patients with severe disease generally have diffuse multi-lobe involvement, pleural effusion, consolidation, bronchial wall thickening, and poor lung aeration on chest CT [8,9,10,11,12]. Because COVID-19 produces characteristic signs on chest CT, artificial intelligence (AI) may have utility in ascribing disease severity status and prognosis to patients [13].

Using chest CT and clinical data, this study aimed to develop an artificial intelligence (AI) system to predict future deterioration to critical illness in COVID-19 patients.

Methods

Patient cohorts

A total of 1,051 patients with COVID-19 confirmed by RT-PCR and chest CT imaging suggestive of pneumonia were retrospectively identified from nine hospitals in Hunan Province, China; the Hospital of the University of Pennsylvania (HUP) in Philadelphia, PA; the Rhode Island Hospital (RIH) in Providence, RI; and open-source data from a previously published paper [13]. Ninety COVID-19 patients without abnormality on chest CT were excluded because they did not develop critical illness and including them would have inflated model performance. A diagram illustrating patient inclusion and exclusion criteria is shown in Fig. 1.

Fig. 1

Patient inclusion and exclusion workflow. Abbreviations: CT, computed tomography; RIH, Rhode Island Hospital; HUP, Hospital of the University of Pennsylvania; AI, artificial intelligence; RT-PCR, reverse transcriptase polymerase chain reaction

The identified CT scans were downloaded directly from the hospital Picture Archiving and Communications System (PACS). The open-source data, containing chest CT images and clinical metadata of COVID-19 patients confirmed by molecular PCR, were downloaded directly from the China National Center for Bioinformation (http://ncov-ai.big.ac.cn/download?lang=en). Four radiologists, each with 5–10 years of experience in thoracic radiology and direct clinical experience with COVID-19 chest CT cases, reviewed and assessed the CT scans. Each half of the 1,051 cases was independently reviewed by two radiologists (F. F. Xie and L. P. Zhu for one half, S. Li and D. Cao for the other half). A third radiologist (Z. Xiong), with 20 years of cardiothoracic imaging experience, helped to resolve differences to reach consensus.

The RT-PCR results were extracted from the patients’ electronic medical records in the hospital information system. For the Chinese cohorts, the RT-PCR assays were performed using TaqMan One-Step RT-PCR Kits from Shanghai Huirui Biotechnology Co., Ltd. or Shanghai BioGerm Medical Biotechnology Co., Ltd., both of which have been approved for use by the China Food and Drug Administration. For the US cohorts, the COVID-19 RT-PCR test (Laboratory Corporation of America) was used. For patients with multiple RT-PCR assays, a positive result on the last performed test was used as confirmation of diagnosis.

The institutional review boards of all institutions approved this retrospective study, and written informed consent was waived. To avoid any potential breach of confidentiality, the patient data were de-identified and had no linkage to the researchers.

Clinical information

Each patient’s age, sex, symptoms (presence or absence of fever), white blood cell count, lymphocyte count, comorbidity status (cardiovascular disease, hypertension, chronic obstructive pulmonary disease, diabetes, chronic liver disease, chronic kidney disease, cancer, and human immunodeficiency virus [HIV]), and history of exposure to the COVID-19 epicenter and/or another patient with COVID-19 were collected. Use of mechanical ventilation and/or intensive medical care and/or progression to death was recorded as well. For all patients, admission and discharge times were also recorded.

Patients were classified as critical (having severe disease) if they reached any of the following endpoints: mechanical ventilation, intensive medical care, or death. Otherwise, they were classified as non-critical. For critical patients, the time to progression was calculated from the time of CT to the earliest of the above three critical events. A comparison of clinical data among the four patient cohorts is shown in Supplementary Table 1.
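As a minimal sketch of the endpoint definition above, the following Python function derives a critical/non-critical label and a time in days from recorded timestamps. The function and argument names are hypothetical, and using the discharge time as the end of follow-up for non-critical patients is an assumption; the text states only that admission and discharge times were recorded.

```python
from datetime import datetime
from typing import Optional, Tuple


def time_to_critical_event(
    ct_time: datetime,
    ventilation_time: Optional[datetime],
    icu_time: Optional[datetime],
    death_time: Optional[datetime],
    discharge_time: datetime,
) -> Tuple[bool, float]:
    """Return (is_critical, days_from_ct) for one patient.

    Critical patients: days from CT to the earliest critical event.
    Non-critical patients: days from CT to discharge (assumed censoring time).
    """
    seconds_per_day = 86400.0
    recorded = [t for t in (ventilation_time, icu_time, death_time) if t is not None]
    if recorded:
        first_event = min(recorded)  # earliest of the three endpoints
        return True, (first_event - ct_time).total_seconds() / seconds_per_day
    return False, (discharge_time - ct_time).total_seconds() / seconds_per_day


# Illustrative timestamps only, not study data.
ct = datetime(2020, 3, 1, 8, 0)
critical = time_to_critical_event(
    ct,
    ventilation_time=datetime(2020, 3, 4, 8, 0),
    icu_time=datetime(2020, 3, 3, 20, 0),
    death_time=None,
    discharge_time=datetime(2020, 3, 20, 8, 0),
)
non_critical = time_to_critical_event(ct, None, None, None, datetime(2020, 3, 11, 8, 0))
```

In a survival framework, the boolean serves as the event indicator and the number of days as the observed time, which is right-censored when the boolean is false.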

AI models

First, the pulmonary tissue and the lung parenchymal abnormality from COVID-19 were automatically segmented on CT images by a deep learning (DL) model based on a deep convolutional neural network. Examples of the automatically segmented pulmonary tissue and lung parenchymal abnormality from COVID-19 are shown in Supplementary Figure 1. Second, DL-based severity prediction models were built using the lung and lesion segmentations to determine, at the time of the CT scan, whether a patient with COVID-19 will develop critical or non-critical illness. Third, progression prediction models were built by using DL features extracted from the severity prediction models, together with clinical data, as input to a random survival forest that assigns risk scores to different subjects. The workflow pipeline is shown in Fig. 2. Detailed descriptions of the AI-based lung and lesion segmentation (Supplementary Text 1), AI-based severity prediction models (Supplementary Text 2), and AI-based progression prediction models (Supplementary Text 3) are included in the supporting information.

Fig. 2

Illustration of our analysis pipeline. The pipeline includes a severity prediction stage and two progression prediction branches. (a) Deep learning (DL)–based severity prediction. The top 10 segmented lung slices by largest area of pathology were used as input to EfficientNet to predict disease severity for individual slices, and the slice-level predictions were then pooled to predict severity at the patient level. (b) DL-based progression prediction. In this branch, 256-D DL features from the model were aggregated via an average pool layer for each patient. Then, a random survival forest model was optimized on the DL features to assign risk scores to different subjects. (c) Clinical (Clin)-based progression prediction. In this branch, 15 clinical features extracted from demographic recordings were input to another survival forest model to assign risk scores to different subjects. Finally, the DL-based prediction and Clin-based prediction were combined to predict progression for each patient
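The slice selection and per-patient aggregation described in the caption can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, lesion area is taken as the count of non-zero mask pixels, and ties in area are broken by slice order.

```python
from typing import List, Sequence


def top_k_slices_by_lesion_area(
    lesion_masks: Sequence[Sequence[Sequence[int]]], k: int = 10
) -> List[int]:
    """Return indices of the k slices with the largest lesion area.

    Each mask is a 2-D grid of 0/1 values; area is the number of 1-pixels.
    """
    areas = [sum(sum(row) for row in mask) for mask in lesion_masks]
    ranked = sorted(range(len(areas)), key=lambda i: areas[i], reverse=True)
    return ranked[:k]


def average_pool(slice_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean over slices, giving one feature vector per patient."""
    n = len(slice_features)
    return [sum(column) / n for column in zip(*slice_features)]


# Toy example: three 2x2 masks with lesion areas 1, 4, and 2.
masks = [
    [[1, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[0, 1], [1, 0]],
]
selected = top_k_slices_by_lesion_area(masks, k=2)

# Toy 2-D features standing in for the 256-D DL features of two slices.
patient_vector = average_pool([[1.0, 2.0], [3.0, 4.0]])
```

The same averaging could be applied to slice-level probabilities to obtain a patient-level severity score, although the caption does not specify which pooling rule was used for that step.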

Statistical analysis

Segmentations were evaluated by calculating the Dice similarity coefficient scores and visual examinations. Accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (ROC-AUC) were calculated for the classification models. The 95% confidence intervals on accuracy, sensitivity, and specificity were determined using the adjusted Wald method [14]. C-index for right-censored data was applied to evaluate the performance of progression prediction models [15]. Time-dependent ROC-AUC was calculated from the obtained risk scores and progression information via the Kaplan-Meier method [16] to further evaluate the progression prediction performance.
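For readers unfamiliar with the adjusted Wald interval mentioned above, a minimal Python sketch follows. It uses the common Agresti-Coull form, in which z²/2 successes and z²/2 failures are added before the usual Wald formula is applied; whether reference [14] uses exactly this variant is an assumption, and the counts in the example are illustrative rather than taken from the study.

```python
import math
from typing import Tuple


def adjusted_wald_interval(
    successes: int, n: int, z: float = 1.959964
) -> Tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion.

    z = 1.959964 corresponds to a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    n_adj = n + z ** 2                       # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj  # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Illustrative counts: 80 correct predictions out of 100 cases.
lower, upper = adjusted_wald_interval(80, 100)
```

The same function applies to accuracy, sensitivity, and specificity; only the numerator and denominator change (for sensitivity, true positives over all critical cases, for example).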

Data and code availability

Out of respect for patient confidentiality and consent, the CT images and clinical information datasets analyzed in this study are not available for download. The models generated from these datasets are publicly available at https://www.dropbox.com/sh/g1w13gyoezq36y8/AAC0DvGyuLHdPPXKtvOQ_lTma?dl=0

The DL models for severity prediction were implemented with Keras (version 2.2.5) and TensorFlow (version 1.12.3) and trained on NVIDIA V100 GPUs. The progression prediction models were implemented with scikit-learn (version 0.21.3). To allow other researchers to develop their own models, the code is publicly available on GitHub at http://github.com/robinwang08/COVID19.

Results

Patient characteristics

Of the 1,051 patients with RT-PCR-confirmed COVID-19 and chest CT, 282 developed critical illness (also termed severe disease), defined as requiring ICU admission and/or mechanical ventilation and/or dying during the hospital stay. The median time from RT-PCR-confirmed COVID-19 diagnosis to initial CT acquisition was 2 days. The distribution of time between CT acquisition and RT-PCR diagnosis is shown in Supplementary Figure 2. The median age of patients who progressed to critical illness was higher than that of non-critical patients (57 vs. 45 years, p < 0.001). The median time from admission to critical illness was 0.4 days (range: 0 to 30 days). The clinical characteristics of COVID-19 patients with critical or non-critical illness are shown in Table 1. This cohort was randomly divided into training, validation, and testing sets with a 7:1:2 split ratio to build the severity prediction models and progression prediction models. Patient characteristics across sets can be found in Supplementary Table 2.
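A random 7:1:2 split at the patient level can be sketched as below. The function name and the seed are hypothetical, and the sketch assumes a simple unstratified shuffle; the text does not say whether the split was stratified by outcome or by cohort.

```python
import random
from typing import Iterable, List, Tuple


def split_7_1_2(
    patient_ids: Iterable[int], seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle patient IDs and split them into train/validation/test (7:1:2)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(0.7 * len(ids))
    n_val = round(0.1 * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]      # remainder, roughly 20%
    return train, val, test


train_ids, val_ids, test_ids = split_7_1_2(range(1051))
```

With 1,051 patient IDs this rounding yields 736, 105, and 210 patients, respectively. Splitting by patient rather than by slice keeps all slices from one patient in the same set.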

Table 1 Clinical characteristics of critical and non-critical COVID-19 patients

AI-based segmentation of lung and lesion

Deep learning–based lung and lesion segmentations resulted in mean validation Dice similarity coefficients of 0.97 ± 0.02 and 0.62 ± 0.22, respectively, using manual segmentation as the gold standard. The average volume error for lesions was 27.1% ± 29.0% of the ground truth. The average false positive rate was 48.3% and the average false negative rate was 12.6%, indicating that the automatic model tends to overestimate the lesion. Supplementary Figure 3 shows the Bland-Altman plots for the volume differences between the automatic and manual segmentations. To reduce the impact of automatic segmentation errors, the generated lung and lesion segmentations were manually corrected as needed; a total of 115 lesion segmentations (11%) were manually corrected. Examples of manually corrected segmentations are shown in Supplementary Figure 4.
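The Dice similarity coefficient reported above is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a manual mask B. A minimal Python sketch on 2-D binary masks follows. The relative volume error is included under the assumption that it is the absolute difference in segmented volume divided by the ground-truth volume, a definition the text does not spell out; both function names are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def dice_coefficient(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same shape")
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * intersection / total


def relative_volume_error(pred: Mask, truth: Mask) -> float:
    """|V_pred - V_truth| / V_truth, with volume counted in pixels (assumed)."""
    v_pred = sum(bool(v) for row in pred for v in row)
    v_truth = sum(bool(v) for row in truth for v in row)
    if v_truth == 0:
        raise ValueError("ground-truth mask is empty")
    return abs(v_pred - v_truth) / v_truth


# Toy masks: the prediction covers the true lesion pixel plus one extra pixel.
pred_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [0, 0]]
dice = dice_coefficient(pred_mask, true_mask)
vol_err = relative_volume_error(pred_mask, true_mask)
```

In this toy case the prediction over-segments, which lowers the Dice score even though the true lesion pixel is fully covered; this is the same pattern as the overestimation described above.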

Severity and progression prediction models

The model utilizing the top 10 segmented lung slices based on the largest lesion area for each patient achieved a slice level accuracy of 0.778 (95% CI: 0.760–0.795), sensitivity of 0.710 (95% CI: 0.667–0.751), specificity of 0.796 (95% CI: 0.776–0.815), and a ROC-AUC of 0.832. At the pooled patient level, the model achieved an accuracy of 0.833 (95% CI: 0.776–0.877), sensitivity of 0.622 (95% CI: 0.476–0.749), specificity of 0.890 (95% CI: 0.832–0.930), and a ROC-AUC of 0.856. A summary of the results for the model is shown in Supplementary Table 3 with the ROC-AUC graphs presented in Fig. 3.

Fig. 3

Performance of deep learning severity model in area under receiver operating characteristic curve (ROC-AUC) utilizing top ten segmented lung slices by largest lesion area

Built to predict the length of time from admission to critical illness, the progression prediction model was optimized on the same data split as the severity prediction model and tested on an independent test set. The prediction model achieved a C-index of 0.804, demonstrating success in assigning patients risk scores consistent with their progression outcomes. As shown in Fig. 4a–c, the progression prediction model achieved time-dependent ROC-AUCs of 0.82, 0.81, and 0.83 for prediction of progression risk at cutoff values of 3, 5, and 7 days, respectively. The median of the risk scores obtained by the progression prediction model was used to stratify patients into high-risk and low-risk subgroups. As indicated by the survival curves of the stratified risk groups shown in Fig. 4f, the high-risk and low-risk subgroups differed significantly in the risk of disease progression (p < 0.0001, log-rank test).
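To make the C-index and the median-based stratification concrete, a minimal Python sketch follows. It implements Harrell's concordance index for right-censored data, in which a pair is comparable only if the patient with the shorter time had an observed event; whether reference [15] uses this exact estimator is an assumption. The function names, the toy data, and the choice to assign scores strictly above the median to the high-risk group are likewise assumptions.

```python
import statistics
from typing import List, Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A higher risk score should correspond to a shorter time to event.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def stratify_by_median(risks: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = statistics.median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


# Toy data: days to event or censoring, event indicator, predicted risk score.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [True, True, False, True]
toy_risks = [0.9, 0.1, 0.3, 0.8]
c_index = concordance_index(toy_times, toy_events, toy_risks)
groups = stratify_by_median(toy_risks)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The two groups returned by the median split are what a log-rank test would then compare.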

Fig. 4

Time-dependent ROC-AUCs and risk-stratified subgroup survival curves based on deep learning (DL) features extracted from top lung slices. a–c Time-dependent ROC curves and AUCs with different cutoff values (3-day, 5-day, and 7-day). d–f The risk-stratified survival curves based on DL-based progression prediction, clinical-based progression prediction, and combined progression prediction. The y-axis is survival probability, representing the probability of not progressing to a critical event. The “+” in the survival curves denotes a censored patient. Risk tables of these stratification results are also listed at the bottom of this figure

Progression prediction models based on DL features alone and on clinical data alone achieved C-index values of 0.719 and 0.774, respectively. As shown in Fig. 4d and e, the median of the risk scores obtained by these two progression prediction models also successfully stratified the patients into groups with distinct critical outcomes (p < 0.0001, log-rank test). The model based on the CT severity scores described by Yeun-Chung Chang et al. [17] achieved a C-index value of 0.724. The progression prediction and risk stratification performances of the aforementioned methods are summarized in Supplementary Table 4.

Discussion

Early disease detection and treatment have been linked to decreased mortality in COVID-19 patients, especially for those who are severely ill [5, 6]. Approximately 15% of COVID-19 patients develop acute respiratory distress syndrome, and over half of patients admitted to the ICU develop hypoxia or respiratory exhaustion [18]. Early anticipation of severe disease development is crucial because it allows for timely intervention, which can potentially improve outcomes for critically ill COVID-19 patients [18]. Further, strained resources such as mechanical ventilators or extracorporeal membrane oxygenation (ECMO) machines can be allocated appropriately when patients’ disease trajectories are known. This study developed an AI system based on chest CT and clinical data that predicts COVID-19 disease progression better than clinical data alone. This is relevant because it demonstrates that AI, by integrating data from multiple sources, has the potential to help identify patients at risk for progression to critical illness and to inform patient care.

AI can help in identifying patients at risk for progression to critical illness within the timeframe for early treatment and improve prognosis. Remdesivir was recently approved for emergency use by the Food and Drug Administration (FDA) and, along with ruxolitinib [4], has shown preliminary promise in reducing recovery time for critically ill patients [19]. Similarly, convalescent plasma transfusion (CPT) has shown efficacy in reducing clinical symptoms and mortality for severely ill COVID-19 patients [3]. Historically, CPT has been used to successfully treat Spanish influenza A (H1N1), severe acute respiratory syndrome (SARS), and Ebola among others [3]. If these results are maintained in future studies, identifying patients with severe disease early on may be necessary to maximize clinical benefit of these treatments.

This chest CT–based AI system has the potential to assist physicians in patient management by enhancing clinical data in the prediction of progression risk. In contrast to previous studies [13, 20, 21], this study was attentive to the exact timespan between the performance of chest CT and the earliest occurrence of critical events (i.e., ICU admission, intubation, or death) using a multinational cohort of patients from different institutions. Unlike existing studies that build prediction models using pattern classification techniques, our AI system was built in a time-to-event (survival) analysis framework that can effectively handle censored data in risk prediction. By using the risk prediction AI model based on CT imaging and clinical data, it may be possible to stratify patients into different risk groups for progression to critical illness, assign a critical window for early treatment, and have a more informed timeline for obtaining advanced respiratory support equipment.

The study has several limitations. There was likely patient selection bias secondary to the retrospective and multi-institutional nature of the study. For example, the clinical characteristics differed between the cohorts from the USA and China. While the percentage of critical patients was 22% for the cohort from China, it was almost 50% for both cohorts from the USA. This data heterogeneity likely reflects differences in practice patterns. In the Chinese population, CT scans were used more often for COVID-19 screening, especially during the earlier period of the pandemic. Other explanations include differences in demographics and disease prevalence. However, our study encompasses data from two hospitals in the USA, nine hospitals in China, and open-source data [13]. The stable performance on an independent held-out test set supported the robustness and generalizability of the model. Additionally, the definition of critical outcome used herein may encompass more patients than is typical of other COVID-19 severity studies, which define severity by criteria such as respiratory failure, septic shock, or multiple organ dysfunction. The present study focused on patients who went to the ICU, were intubated, or died.

Conclusions

AI based on CT imaging and clinical data has the potential for prediction of risk for future deterioration to critical illness among patients with COVID-19. As this technology is further developed, providers may be able to utilize AI to help designate high-risk patients based on their disease severity and progression risk prediction, enabling them to readily allocate appropriate treatments, equipment, and other necessary resources and to assist with early clinical decision-making.