CT-based radiomics for predicting the rapid progression of coronavirus disease 2019 (COVID-19) pneumonia lesions

Objectives: To develop and validate a radiomic model to predict rapid progression (defined as volume growth of pneumonia lesions > 50% within seven days) in patients with coronavirus disease 2019 (COVID-19). Methods: Patients with laboratory-confirmed COVID-19 who underwent longitudinal chest CT between January 01 and February 18, 2020 were included. A total of 1316 radiomic features were extracted from the lung parenchyma window for each CT. The least absolute shrinkage and selection operator (LASSO), Relief, Las Vegas Wrapper (LVW), L1-norm-Support Vector Machine (L1-norm-SVM), and recursive feature elimination (RFE) were applied to select the features associated with rapid progression. Four machine learning classifiers were used for modeling: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT). Accordingly, 20 radiomic models were developed on the basis of 296 CT scans and validated in 74 CT scans. Model performance was assessed with the receiver operating characteristic curve. Results: A total of 107 patients (median age, 49.0 years; interquartile range, 35–54) were evaluated. The patients underwent a total of 370 chest CT scans with a median interval of 4 days (interquartile range, 3–5 days). The combination of L1-norm SVM and SVM with 17 radiomic features yielded the highest performance in predicting the likelihood of rapid progression of pneumonia lesions on the next CT scan, with an AUC of 0.857 (95% CI: 0.766–0.947), sensitivity of 87.5%, and specificity of 70.7%. Conclusions: Our radiomic model based on longitudinal chest CT data could predict the rapid progression of pneumonia lesions, which may help tailor CT follow-up intervals and reduce radiation exposure.
Advances in knowledge: Radiomic features extracted from the current chest CT have potential in predicting the likelihood of rapid progression of pneumonia lesions on the next chest CT, which would improve clinical decision-making regarding timely treatment.


INTRODUCTION
The rapid spread of coronavirus disease 2019 (COVID-19), a potentially fatal disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a major and urgent threat to global health. 1 As of August 14, 2020, more than 21.05 million confirmed cases and 752,378 deaths had been reported by the World Health Organization (WHO). 2 Since the outbreak of COVID-19, chest CT has played an indispensable role in the detection, diagnosis, and follow-up of COVID-19 pneumonia. 3 Chest CT not only depicts the clinical course and severity of COVID-19 infection but also predicts poor patient outcomes. [4][5][6] However, multiple CT scans within a short time during the COVID-19 pandemic have raised great concern about the radiation burden on patients and healthcare workers. It is widely accepted that ionizing radiation increases the lifetime likelihood of developing cancer. 7 Some previous studies have tried low-dose chest CT in the diagnosis of COVID-19 pneumonia to reduce radiation dose. [8][9][10][11][12] However, low-dose CT may miss some key signs of COVID-19 pneumonia compared with standard-dose CT. In the present study, for the first time, we aimed to develop a CT-based radiomic model to predict the probability of rapid progression of COVID-19 pneumonia in order to guide the follow-up interval of chest CT, which may reduce ionizing radiation dose and estimated cancer risk.

Patient data
This study was approved by the institutional review board, and the need for written informed consent was waived. A total of 118 COVID-19 patients from two designated hospitals were retrospectively included between January 8, 2020 and February 25, 2020. All were adult patients with laboratory-confirmed COVID-19, established by real-time reverse transcription-polymerase chain reaction (RT-PCR) assay of throat swab samples (at least two samples were taken, at least 24 h apart) according to the protocol established by the WHO. The 118 CT scans from 11 patients were excluded because no follow-up CT scan was available or the interval between two adjacent CT scans was >7 days. Finally, a total of 370 CT scans from 107 patients were analyzed in this study. The whole dataset was randomly divided into two subsets, 80% for training and the remaining 20% for validation; 10-fold cross-validation was performed within the training subset. A representative case and the flowchart of patient and CT scan inclusion are shown in Figure 1. The distribution of the number of CT scans and patients by time interval between two adjacent CT scans is illustrated in Figure 2.

CT image acquisitions
Patients underwent chest CT on a CT 64 scanner (GE Medical System), a Siemens Emotion 16 scanner (Siemens Healthineers; Erlangen, Germany), or an ICT 128 scanner (Philips Healthcare, Netherlands). No contrast agent was administered. The CT acquisition parameters of the three scanners are shown in Table 1.

CT image segmentation
We used a 3D U-Net++ previously trained on 2000 COVID-19 pneumonia cases to pre-segment the COVID-19 pneumonia lesions in the present study. Several days later, an experienced radiologist (with more than 15 years' experience in chest imaging) edited and verified the pre-segmentation results, removing false positives and segmenting missed lesions.

Radiomic feature extraction
All raw CT images were first resampled to a voxel size of 1 mm × 1 mm × 1 mm. Radiomic features were extracted from the lung window (window width: 1500 Hounsfield units [HU]; window level: -600 HU) in Python (v.3.7.0, Beaverton, Ore; https://www.python.org/) using the Pyradiomics package (v.3.0; https://github.com/Radiomics/pyradiomics). The parameters used for the different image transforms are presented in Table 2. A total of 1316 radiomic features were extracted under seven image types: Original, Wavelet, LoG, Square, SquareRoot, Exponential and Logarithm. The classes and corresponding numbers of radiomic features are presented in Table 3. All radiomic features were rescaled by min-max normalization.
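As an illustration of the min-max normalization step, the following minimal sketch rescales each feature column to the range [0, 1]. It assumes, which the text does not state, that the per-feature minimum and maximum are estimated on the training scans and then reused for the validation scans. The function names and the toy values are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = CT scans, columns = radiomic features


def fit_min_max(train: Matrix) -> Tuple[List[float], List[float]]:
    """Return per-feature minima and maxima estimated on the training rows."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows: Matrix, mins: List[float], maxs: List[float]) -> Matrix:
    """Rescale each feature to [0, 1] using previously fitted minima and maxima.

    Constant features (max == min) are mapped to 0.0 to avoid division by zero.
    """
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# Toy example with 3 scans and 2 features (values are made up).
train = [[10.0, 5.0], [20.0, 5.0], [30.0, 5.0]]
mins, maxs = fit_min_max(train)
train_scaled = apply_min_max(train, mins, maxs)
```

Reusing the training minima and maxima keeps information from the validation scans out of the preprocessing step; validation values outside the training range would then fall slightly outside [0, 1].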

Radiomic feature selection
Because high-dimensional radiomic features may contain redundant information, five feature selectors, namely the least absolute shrinkage and selection operator (LASSO), 13 Relief, 14 Las Vegas Wrapper (LVW), 15 L1-norm-Support Vector Machine (L1-norm-SVM), 16 and recursive feature elimination (RFE), 17 were used to reduce the feature dimensionality before the machine learning models were trained.
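To make one of these selectors concrete, the sketch below implements a basic two-class Relief weighting in plain Python. It is only an illustration under stated assumptions and not the configuration used in this study: features are assumed to be min-max normalized, nearest neighbors are found with the Manhattan distance, and the weight update uses absolute rather than squared differences (a common textbook formulation). The function name and toy data are hypothetical.

```python
import random
from typing import List


def relief_weights(X: List[List[float]], y: List[int],
                   n_iter: int = 100, seed: int = 0) -> List[float]:
    """Basic two-class Relief.

    A feature gains weight when it differs on the nearest sample of the other
    class (nearest miss) and loses weight when it differs on the nearest
    sample of the same class (nearest hit). X is assumed to be min-max
    normalized so that every |a - b| lies in [0, 1].
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features

    def distance(a: List[float], b: List[float]) -> float:
        # Manhattan distance (an assumption of this sketch).
        return sum(abs(p - q) for p, q in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(n_features):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n_iter
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

Features would then be ranked by weight and the top-ranked ones kept; how many to keep is a tuning choice that this sketch does not address.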

Machine-learning-based radiomic model construction
Rapid progression of pneumonia lesions was defined as volume growth >50% within seven days, assessed by the ratio of the pneumonia volume on the next CT scan (V2) to the pneumonia volume on the current CT scan (V1). The 50% threshold was chosen according to the COVID-19 guidelines (trial version 6) released by the National Health Commission of China. 18 A current CT scan with V2/V1 > 1.5 was labeled a positive sample; otherwise, it was labeled a negative sample. For unbiased estimates of diagnostic accuracy, the dataset was randomly split into training and validation datasets at a ratio of 4:1, with the proportions of positive and negative samples kept the same in both datasets. Four common machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR) and Decision Tree (DT), were applied to predict the occurrence of rapid progression. To select the optimal hyperparameters for each model, we conducted 10-fold cross-validation on the training dataset. The hyperparameters of the 20 combination models are shown in Supplementary Material 1. The model with the highest area under the curve (AUC) was considered the optimal model. All model building was performed in the Python environment (v.3.7.0, https://www.python.org/) using the Scikit-learn package (v.0.23.1; https://scikit-learn.org/).
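The labeling rule and the stratified 4:1 split described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the random seed, and the lesion volumes are hypothetical, and the split is made at the level of individual CT scans, as the text describes.

```python
import random
from typing import List, Tuple


def label_rapid_progression(v1: float, v2: float, ratio: float = 1.5) -> int:
    """Return 1 if lesion volume grew by more than 50% (V2/V1 > 1.5), else 0."""
    if v1 <= 0:
        raise ValueError("current lesion volume V1 must be positive")
    return int(v2 / v1 > ratio)


def stratified_split(labels: List[int], train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so both subsets keep roughly the same class ratio."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    valid_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx += idx[:cut]
        valid_idx += idx[cut:]
    return sorted(train_idx), sorted(valid_idx)


# Made-up (V1, V2) lesion volumes for pairs of consecutive scans.
pairs = [(10.0, 16.0), (20.0, 25.0), (5.0, 7.5), (8.0, 20.0), (30.0, 28.0),
         (12.0, 12.5), (4.0, 9.0), (15.0, 18.0), (6.0, 6.0), (9.0, 10.0)]
labels = [label_rapid_progression(v1, v2) for v1, v2 in pairs]
train_idx, valid_idx = stratified_split(labels)
```

Note that a ratio of exactly 1.5, as in the third toy pair, is labeled negative because the definition uses a strict inequality.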

Statistical analysis
Categorical variables were expressed as counts and percentages, while continuous variables were expressed as median and interquartile range. All statistical analyses were performed using Python, v.3.7.0 (Beaverton, Ore; https://www.python.org/). The packages were used as follows: "mlr" for LR, "randomForest" for RF, and "e1071" for SVM. Receiver operating characteristic (ROC) curve analyses were performed to evaluate the performance of the different models in predicting the occurrence of rapid progression. The AUCs of different models were compared using the DeLong test. 19 A p < 0.05 was considered significant.
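For readers less familiar with these performance measures, the sketch below computes the AUC, sensitivity and specificity from predicted scores in plain Python. It is an illustration only: the function names, the toy labels and scores, and the 0.5 threshold are hypothetical, and the DeLong comparison of AUCs is not included.

```python
from typing import List, Tuple


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true: List[int], scores: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


# Toy labels (1 = rapid progression) and model scores; values are made up.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]
auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores, threshold=0.5)
```

In this toy example 13 of the 15 positive-negative pairs are ranked correctly, so the AUC is 13/15; at the 0.5 threshold the sensitivity is 2/3 and the specificity is 3/5.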

DISCUSSION
In the present study, we developed and validated radiomic models based on the current chest CT scan to predict the probability of rapid progression of COVID-19 pneumonia on the next CT scan. We combined five feature selection methods and four classification methods, and the results showed that the combination of L1-norm SVM + SVM outperformed the other combinations, yielding an AUC of 0.857 (95% CI: 0.766-0.947), sensitivity of 87.5%, specificity of 70.7%, and accuracy of 74.3%.
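As an arithmetic consistency check on these figures, the sketch below enumerates the integer counts that reproduce the reported sensitivity, specificity and accuracy. It assumes, which the text does not state explicitly, that all three metrics were computed on the 74 validation CT scans at a single operating point and rounded to one decimal place. The counts it returns are derived from the percentages and are not figures reported by the authors; the function name is hypothetical.

```python
from typing import List, Tuple


def consistent_counts(n_total: int = 74, sens: float = 0.875,
                      spec: float = 0.707, acc: float = 0.743,
                      tol: float = 0.0005) -> List[Tuple[int, int, int, int]]:
    """Enumerate (positives, negatives, TP, TN) matching the rounded metrics."""
    matches = []
    for n_pos in range(1, n_total):
        n_neg = n_total - n_pos
        for tp in range(n_pos + 1):
            if abs(tp / n_pos - sens) >= tol:
                continue
            for tn in range(n_neg + 1):
                if (abs(tn / n_neg - spec) < tol
                        and abs((tp + tn) / n_total - acc) < tol):
                    matches.append((n_pos, n_neg, tp, tn))
    return matches


matches = consistent_counts()
# Under the stated assumptions the only solution is (16, 58, 14, 41):
# 14/16 = 87.5%, 41/58 = 70.7%, (14 + 41)/74 = 74.3%.
```

The three reported percentages are therefore mutually consistent with a validation set of 74 scans.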
Radiomics is a quantitative tool for medical imaging that enhances the data available to clinicians by means of advanced mathematical analysis from the field of machine learning. This approach has been widely used in the diagnosis, staging, and prediction of treatment response and prognosis of cancers. Since the outbreak of COVID-19, some studies have applied CT-based radiomics to cope with this emerging infectious disease. Through high-throughput extraction of large numbers of features from chest CT images of COVID-19 pneumonia, radiomics can reveal underlying information associated with disease heterogeneity. Fang et al developed a radiomic nomogram with high performance in differentiating COVID-19 from other types of viral pneumonia. 20 Recent studies also used radiomics to diagnose COVID-19 and predict its outcomes. For example, researchers proposed a non-invasive and quantitative radiomic model using CT to predict poor outcomes in advance among COVID-19 patients. [21][22][23] Wei et al 24 Current evidence demonstrates that radiomics has potential in the clinical management of COVID-19. Unlike previous studies, this study provided a radiomic tool to identify COVID-19 patients at high risk of rapid progression of pneumonia within seven days, and the results were promising.
The performance of a radiomic model can be affected by each step of the radiomic workflow. Differences in noise and resolution between CT images from different CT systems may affect the reproducibility of radiomics, for instance by producing different values of radiomic features. 29 We used image pre-processing to reduce the bias caused by different scanners and imaging protocols. In line with previous studies, we extracted three common classes of radiomic features (first-order statistics, shape-based features and texture features) using the Pyradiomics package. Given the various choices of feature selection and modelling methodologies, identifying the optimal machine learning methods for radiomic applications is a crucial step towards stable and clinically relevant decision support systems; thus, multiple machine-learning methods should be employed and compared. In this study, we chose five feature selectors and four modeling methods to identify the best combination and found that L1-norm SVM + SVM achieved the highest performance in the specific task of predicting rapid progression of COVID-19. Zhang et al 30 evaluated six feature selection methods and nine classifiers to predict recurrence and distant metastasis in patients with advanced nasopharyngeal carcinoma and found that the combination Random Forest (RF) + RF performed best. In this study, we identified the 17 radiomic features most strongly related to the prediction outcome, consisting of 3 first-order statistics and 14 texture features. All are associated with image uniformity and heterogeneity. COVID-19 pneumonia lesions at high risk of rapid progression were more heterogeneous (e.g. mixed ground-glass opacity and consolidation) than those at low risk of rapid progression. Previous studies have shown that radiomics or texture analysis can characterize tumor phenotypes and reflect tumor heterogeneity. 31,32 This study also demonstrated that radiomic features can serve as an effective biomarker of COVID-19 pneumonia by reflecting the heterogeneity of lesions.

This study also has some limitations. First, the study was retrospective in nature. Second, clinical and laboratory variables were not integrated into the prediction model because they were not matched with each chest CT examination. Third, the effect of treatment on COVID-19 pneumonia was not considered because there was no specific treatment for COVID-19; the drugs used in the clinical setting varied despite guideline recommendations. Finally, the model lacks external validation, and its generalizability needs to be tested in other institutions.
In conclusion, we proposed a CT-based radiomic model to predict the rapid progression of COVID-19 pneumonia, which may rationalize the chest CT follow-up intervals of COVID-19 patients and would benefit the clinical management of COVID-19 patients.