Radiomics Analysis of Iodine-Based Material Decomposition Images With Dual-Energy Computed Tomography Imaging for Preoperatively Predicting Microsatellite Instability Status in Colorectal Cancer

Purpose: The aim of this study was to investigate the value of radiomics analysis of iodine-based material decomposition (MD) images with dual-energy computed tomography (DECT) imaging for preoperatively predicting microsatellite instability (MSI) status in colorectal cancer (CRC). Methods: This study included 102 CRC patients proved by postoperative pathology, and their MSI status was confirmed by immunohistochemistry staining. All patients underwent preoperative DECT imaging scanned on either a Revolution CT or Discovery CT 750HD scanner, and the iodine-based MD images in the venous phase were reconstructed. The clinical, CT-reported, and radiomics features were obtained and analyzed. Data from the Revolution CT scanner were used to establish a radiomics model to predict MSI status (70% samples were randomly selected as the training set, and the remaining samples were used to validate); and data from the Discovery CT 750HD scanner were used to test the radiomics model. The stable radiomics features with both inter-user and intra-user stability were selected for the next analysis. The feature dimension reduction was performed by using Student's t-test or Mann–Whitney U-test, Spearman's rank correlation test, min–max standardization, one-hot encoding, and least absolute shrinkage and selection operator selection method. The multiparameter logistic regression model was established to predict MSI status. The model performances were evaluated: The discrimination performance was accessed by receiver operating characteristic (ROC) curve analysis; the calibration performance was tested by calibration curve accompanied by Hosmer–Lemeshow test; the clinical utility was assessed by decision curve analysis. Results: Nine top-ranked features were finally selected to construct the radiomics model. In the training set, the area under the ROC curve (AUC) was 0.961 (accuracy: 0.875; sensitivity: 1.000; specificity: 0.812); in the validation set, the AUC was 0.918 (accuracy: 0.875; sensitivity: 0.875; specificity: 0.857). In the testing set, the diagnostic performance was slightly lower with AUC of 0.875 (accuracy: 0.788; sensitivity: 0.909; specificity: 0.727). A nomogram based on clinical factors and radiomics score was generated via the proposed logistic regression model. Good calibration and clinical utility were observed using the calibration and decision curve analyses, respectively. Conclusion: Radiomics analysis of iodine-based MD images with DECT imaging holds great potential to predict MSI status in CRC patients.


INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related death worldwide (1). The occurrence and development of CRC are accompanied by a series of genetic abnormalities, of which microsatellite instability (MSI) is an important pathway in carcinogenesis (2). According to previous reports, even though MSI occurs in only approximately 15% of CRCs, it has gained considerable attention by clinicians owing to its significant value for CRC prognosis and treatment (2,3). Microsatellite stability (MSS) status is maintained by the mismatch repair (MMR) genes, which are applied to repair genetic sequences that have been erroneous during replication in normal tissues. When MMR system is impaired, the error microsatellite sequences will accumulate, resulting in MSI and early onset of CRC (2). Obtaining MSI status is necessary because the MSI CRC tissues possess special biological behaviors, they are more likely to have a better prognosis and benefit from immunotherapy, and they may be resistant to fluorouracil chemotherapy (4). However, the methods for assessing MSI status including immunohistochemistry (IHC) and polymerase chain reaction (PCR) are all based on pathological tissues obtained by invasive methods. And these advanced biological tests have not been widely generalized owing to the limitation of advanced medical equipment in local institutions (5). Thus, development of noninvasive and cost-effective method for predicting MSI status could be meaningful for clinicians to obtain more diagnostic clues and guide further treatment strategies.
Given the growing number of applications in clinical diagnosis, dual-energy computed tomography (DECT) has been considered as a milestone in CT imaging because it can provide quantitative measurements to characterize the lesions (6). DECT can generate accurate iodine-based material decomposition (MD) images, which can reflect the vascularization of various tissues via measuring the contrast material (iodine) concentration (IC) (7)(8)(9). And the correlation between IC values and MSI status has been reported in previous studies (10,11). However, from the iodine-based MD images, we can only routinely obtain the mean value of IC in lesions, and more imaging characteristics such as heterogeneity remain untapped. Radiomics analysis achieved the conversion of medical images to high-dimensional mineable data to quantitatively and comprehensively describe tissues' characteristics from imaging (12). Several scholars have reported that the radiomics features extracted from CT images showed some value in predicting MSI status in CRC patients; however, the diagnostic performance was limited (13,14). Accordingly, we have presumed that the radiomics analysis of iodine-based MD images might serve as a non-invasive and reproducible way to preoperatively assess MSI status in CRC patients and set up a study to investigate its diagnostic efficacy.

Patient Population
Our institutional review board approved this retrospective study with waiver of the informed consent. Patients examined in our institution from January 2016 to March 2019 who met the following criteria were included in our study. Inclusion criteria are as follows: (1) underwent curative-intent surgical resection and diagnosed as CRC by postoperative pathology; (2) underwent abdominal enhanced DECT examination within about 1 week before surgery; and (3) with MSI information tested by IHC staining in pathological report. Exclusion criteria are (1) with any local or systematic anticancer therapy (radiotherapy, chemotherapy, and biotherapy) before CT imaging; (2) without available digital imaging data and communications in medicine (DICOM) files in our system; (3) without available or complete clinical data; and (4) with invisible target lesion on CT images. According to the outcomes of MSI testing in the pathological report, we collected 653 CRC patients including 34 MSI CRC patients (incidence rate of 5.2%) and 619 MSS CRC patients. For further statistical analysis, 34 MSI CRC patients (23 scanned on Revolution CT and 11 scanned on Discovery CT 750HD) and 68 controls with MSS CRC (46 scanned on Revolution CT and 22 scanned on Discovery CT 750HD) in a 1:2 ratio (15) (randomly selected from 619 MSS CRC patients) were ultimately included in our study (61 males and 41 females; age: 63.82 ± 11.51 years; range 26-87 years). The flowchart of patient selection process is shown in Figure 1. The demographics of CRC patients is listed in Table 1. The clinical data of all CRC patients including age, gender, carcinoembryonic antigen (CEA) (normal level, 0-5 U/ml), carbohydrate antigen 19-9 (CA19-9) (normal level, 0-27 U/ml), alcohol history, smoking history, hypertension history, diabetes history, and family history of cancer were recorded. The included CRC patients were divided into two independent cohorts: (1) primary cohort: CRC patients examined on the Revolution CT scanner were used to establish a radiomics model to predict MSI status (70% samples were randomly selected as the training set, and the remaining samples were used to validate); and (2) testing cohort: CRC patients examined on the Discovery CT 750HD scanner were used to test the predictive model.

Microsatellite Instability Status Assessment
The MSI status was assessed by IHC staining of MMR proteins (MLH1, MSH2, PMS2, and MSH6). IHC staining was routinely performed based on postoperative tissues via standard streptavidin biotin-peroxidase procedure. According to the staining results of MMR proteins, patients were classified into the MSI or MSS group. CRC tissues with at least one of four negatively stained MMR proteins were defined as MSI CRC; others with four positively stained proteins were defined as MSS CRC (2).

Iodine-Based Material Decomposition Image Acquisition and Analysis
The abdominal DECT scans were performed on a Revolution CT scanner or Discovery CT 750HD scanner (GE Healthcare) in supine position. The non-enhanced abdominal CT scan was performed first with the conventional CT protocol of using the tube voltage of 120 kVp. The contrast-enhanced CT scans were performed using the dual-energy spectral CT scanning mode using the following scan parameters: helical, rapid switch administered via the antecubital vein at an injection rate of 3 ml/s. The arterial phase, venous phase, and delayed phase scans were obtained after 30, 60, and 120 s following the administration of contrast agents. The CT scans covered the abdomen and pelvis from the dome of diaphragm to pubic symphysis. After CT scans, the iodine-based MD images in the venous phase were reconstructed at 1.25-mm image slice thickness and interval using the Gemstone Spectral Imaging (GSI) software on an advanced workstation 4.6 (AW 4.6; GE Healthcare). Image analysis was performed by an abdominal radiologist with 3 years of experience and independently verified by another trained radiologist with 5 years of experience to reduce possible bias. Their discrepant interpretations were resolved via consultation. These observers were blinded to all clinical and pathological information of CRC patients. The following data extracted from CT images were analyzed and recorded: (a) tumor size, defined as the maximum axial diameter of tumors on images; (b) tumor location, subclassified as right colon, left colon, and rectum; (c) CTreported serous invasion, defined as irregular projections from the serosal surface, and/or clouding of the pericolic fat, and/or loss of the normal fat planes, and/or thickened contiguous fascial reflections; (d) CT-reported lymph node invasion, defined as enlarged lymph node (short-axis diameter > 1 cm), and/or clustered at least three lymph nodes (16).

Tumor Segmentation and Radiomics Feature Extraction
The ROI was placed by two experienced abdominal radiologists independently. Radiologist 1 (with 5 years of experience) performed the segmentation of all patients twice with a 6month interval. Radiologist 2 (with 3 years of experience) performed the segmentation of all patients once. From the iodine-based MD images of venous phase, the two radiologists selected the slice with the largest axial diameter of CRC tumor and its adjacent upper and lower slices. Then, they manually outlined the boundary of the visible tumor on the selected slices via an open-source software ITK-SNAP (version 3.6.0) (17). The ROIs were required to include the area of necrosis and bleeding within the tumor and excluded perienteric fat and intestinal contents. To correct for acquisition-related differences of differing voxel resolutions in the two different CT scanners, voxel dimensions (mm) of each iodine-based MD image dataset were isotropically resampled to a common voxel spacing 0.5 × 0.5 × 0.5 mm 3 (x, y, z) via linear interpolation algorithm (18,19). Next, a total of 606 radiomics features for each CRC patient were extracted via Artificial Intelligent Kit (GE Healthcare) in concordance with the reference manual by the "Image Biomarker Standardization Initiative." These features were divided into four groups: (1) first-order histogram features (n = 42); (2) second-order texture features: gray level co-occurrence matrix (GLCM) (n = 240), Haralick features (n = 10); (3) grey-level zone size matrix (GLZSM) (n = 11); and (4) Gaussian transform (n = 303). The inter-user variability for radiologist 1 and intra-user variability between radiologist 1 and radiologist 2 in tumor segmentation were analyzed via intraclass correlation coefficient (ICC) method [type: single rater; definition: absolute agreement; model: interuser ICC: two-way random effects; intra-user ICC: two-way mixed effects (20)]. Details of radiomics features are described in Figure 2. The formulas of radiomics parameters are shown in Supplementary I.

Feature Selection and Prediction Model Building
The dimensionality reduction of all features including clinical, CT-reported, and radiomics features was performed based on training dataset for further analysis. First, the radiomics features with both inter-user and intra-user stability (with ICC values >0.90) were selected via ICC analysis. Second, the continuous features with significant differences (p < 0.05) between MSI and MSS groups were selected by Student's t-test (for normally distributed data) or Mann-Whitney U-test (for non-normally distributed data). Then, the categorical features (gender, CEA, CA19-9, alcohol history, smoking history, hypertension history, diabetes history, family history of cancer, tumor location, CT-reported serous invasion, and CT-reported lymph node invasion) were encoded by using one-hot encoding. Onehot encoding uses N-bit state registers to encode N status, each of which has its own register bits, and at any time, only one of them is valid. One-hot encoding can convert the category variables into a form readily available to machine learning algorithms (21). For example, the "CT-reported tumor location" has three status, right colon, left colon, and rectum, which were coded as "1, 2, 3" in our study first. Then we used one-hot encoding method to encode right colon, left colon, and rectum as 100, 010, 001, respectively. Onehot encoding method can ensure that "1, 2, 3" represents  the tumor location instead of the true value of 1, 2, or 3. Third, Spearman's rank correlation test was performed for each feature. Radiomics features with correlation coefficient ≥0.9 were selected into the following steps and then transferred with minmax standardization, whereas all features were normalized to a range of 0 to 1. Finally, the least absolute shrinkage and selection operator (LASSO) selection method was further used to identify the top-ranked and most valuable features to build the predictive model.
The selected features were applied to construct multiparameter logistic regression model to predict MSI status. The 5-fold cross-validation technique was used for model selection. The data in training set were divided into five subsets equally. Then, four subsets were selected each time to train, and the remaining one subset was used to test. By changing the subtest set in turn, five loss function values (L (w)) during the above five models would be obtained. The average value of L (w) was calculated. When the average L (w) reached a minimum value, the optimization of the logistic regression model would be completed, and the final model would be constructed. The details of L (w) are shown in Supplementary II. A nomogram based on clinical factors and radiomics score was generated via the proposed logistic regression model. The probability of MSI status defined as a nomogram score can be calculated for each patient by using the developed nomogram. The data from the Revolution CT equipment were used to establish and validate the radiomics model, and the data from the Discovery CT 750HD equipment were used to test the radiomics model.

Radiomics Model Evaluation
The discrimination performance was accessed by using receiver operating characteristic (ROC) curve analysis. The area under the ROC curve (AUC), accuracy, sensitivity, and specificity was calculated. DeLong's test was used to compare the statistically difference between AUCs. The calibration performance was tested by using the calibration curve accompanied by the Hosmer-Lemeshow test (H-L test). The calibration curves measure the consistency between the predicted MSI status probability and the actual MSI status probability. The H-L test assesses the goodness of fit of the prediction models. The clinical utility of radiomics model was assessed by using decision curve analysis. For decision curve, the horizontal axis indicates the threshold probability with a range of 0.0 to 1.0. The vertical axis indicates the clinical net benefit values. There are two reference lines defined under the assumption that all patients are diagnosed to be either MSI or MSS. A larger area under the decision curve suggests a better clinical utility. All statistical analyses were conducted with R software (version 3.6.0; https://www. r-project.org/). The workflow of radiomics analysis is shown in Figure 3.

Inter-user and Intra-user Variabilities
The stable radiomics features (with ICC values >0.90) were as follows: 503 features between the two sets of measurements for FIGURE 4 | The heat map gives a good visual representation of the feature correlation. The x axis and y axis indicate features. Color scale on the right side displays the absolute value of the correlation coefficient (higher from 0 to 1, and from blue to yellow). radiologist 1, 568 features between the first measurement of radiologist 1 and radiologist 2, and 430 features between the second measurement of radiologist 1 and radiologist 2. Finally, 429 features were considered stable with both inter-user and intra-user stability. These 429 features obtained by radiologist 1 in the first measurement were used for the next analysis.

Feature Selection and Radiomics Model Building
From a total of 429 radiomics features and 13 clinical or CT-reported features, the nine top-ranked features were finally selected for subsequent analysis: gender, smoking, family history of cancer, MaxIntensity, uniformity, GLCMEnergy_AllDirection_offset6_SD_Gaussian, GLCMEnergy_angle90_offset8_Gaussian, GLCMEntropy_ AllDirection_offset8_Gaussian, and HaralickCorrelation_ AllDirection_offset8_SD_Gaussian. The correlation heat map summarizes the correlations of features (Figure 4). Feature selection using the LASSO algorithm is shown in Figure 5. The nomogram based on clinical factors and radiomics score is shown in Figure 6.    Table 2). DeLong's test revealed that above AUCs had no significant difference, which suggested that there was no overfitting among training, validation, and testing sets:  Figure 7. Good calibrations of radiomics models for predicting MSI status in training, validation, and testing sets are shown in Figure 8. The H-L test was not significant (p > 0.05), demonstrating a good fit (training set: p = 0.462; validation set: p = 0.785; testing set: p = 0.568). The decision curves for radiomics models in training, validation, and testing sets (with net benefit of 17.44, 15.40, and 13.43, respectively) are presented in Figure 9.

DISCUSSION
In our study, we established a radiomics model based on iodinebased MD images to predict MSI status in CRC patients before surgery. We achieved a good diagnostic performance based on data from the Revolution CT equipment in both the training set (AUC, 0.961) and validation set (AUC, 0.918). And this radiomics model was also suitable for the iodine-based MD images acquired on another CT equipment (Discovery CT 750HD) although with slightly lower diagnostic performance (AUC, 0.875).
Iodine-based MD images can quantitatively reflect the vascularization of tissues. A clear relationship between blood supply and IC values calculated from iodine-based MD images has been confirmed; the richer blood supply is accompanied   by a higher IC value (8,22,23). Our previous study has found that IC value of MSI CRC was significantly lower than that of MSS CRC and demonstrated the clinical value of using the IC values to distinguish MSI status with limited diagnostic performance (10). The measurement of IC in the previous study only reflected the average IC value in ROIs, and more information (such as tumor heterogeneity) was not evaluated. Therefore, radiomics approach, which can provide a wealth of complementary information of the images, should further extend our knowledge and improve the diagnosis (24)(25)(26). In our study, six radiomics features were finally selected as the most closely related features to the MSI status. For iodine-based MD images, the MaxIntensity generally represents the most abundant blood supply value within the predefined ROI. This may be explained by the biological characteristics of MSI CRC with less angiogenic phenotype confirmed by previous investigations (11,27). The uniformity is a measure of the sum of the squares of each intensity value. From the perspective of image smoothness, the higher the intensity value, the higher the uniformity of the image. GLCMEnergy_AllDirection_offset6_SD_Gaussian and GLCMEnergy_angle90_offset8_Gaussian describe the uniformity of the intensity level distribution. GLCMEntropy_AllDirection_offset8_Gaussian describes the randomness of image values. It mainly calculates the average amount of information to encode image values. HaralickCorrelation_AllDirection_offset8_SD_Gaussian measures the linear dependency of grey levels of neighboring pixels; in other words, it measures the similarity of the grey levels in neighboring pixels and tells how correlated a pixel is to its neighbor over the whole image (28,29). They have all served as recognized parameters to reflect tumor heterogeneity. We reviewed the biological differences between MSI and MSS tumors and tried to explain the imaging heterogeneity observed in this study. De Smedt et al. suggested that the morphological heterogeneity was the most striking feature to distinguish MSI from MSS CRC. Histologically, MSI CRC is often more inclined to present with a mixed morphological patterns including glandular, mucinous, and solid content, which caused the tumor heterogeneity (30). In addition, the higher incidence of internal heterogeneity in MSI CRC may also be explained by a higher density of tumor-infiltrating lymphocytes and a lower cell proliferation rate than MSS CRC (31,32). Our results that the imaging heterogeneity was a biomarker for MSI tumors were consistent with those of previous studies (14). During radiomics analysis, integrating diverse clinical features plays an important role in improving the performance of the diagnostic model. We recorded the clinical features and CT reported features, which were discrete data except for age and tumor size. We used the one-hot encoding to process category variables, with the main benefits of one-hot encoding as follows: (1) to solve the problem that the classifier is not good at processing category data; (2) to a certain extent also play a role in expanding features; and (3) to choose the most representative new features. In our study, we found that the gender, smoking, and family history of cancer were closely related with the MSI status in CRC patients, and further explorations were required based on larger samples (13,33).
Radiomics analysis is a promising method to unveil large amount of tumor features hidden in medical images. However, previous studies have reported that the repeatability of radiomics features can be influenced by different CT scanners (34). Our study included data from two different DECT scanners including Revolution CT and Discovery CT 750HD. We first used the data obtained from the Revolution CT scanner to establish a model for preoperatively predicting MSI status; the AUCs of training and validation sets were 0.961 and 0.918, respectively. Subsequently, we analyzed whether this radiomics model was suitable for another DECT scanner (Discovery CT 750HD), and we found that the performance was good with AUC of 0.875. Our results suggested that the radiomics model established in this study was applicable to both Revolution CT and Discovery CT 750HD, and this might be attributed to the stability of the iodine quantification, and that there is little effect of various DECT scanners and acquisition parameters on iodine density (35). Hence, further studies are recommended to focus on the radiomics analysis of iodine-based MD images with DECT imaging.
Our study has several limitations. First, the study was retrospective and may result in inherent biases. Second, although IHC test is a reliable way to assess MSI status, the PCR should still be recommended. Third, only a handful of patients were analyzed owing to the low incidence rate of MSI in CRC patients. Further studies are required using a larger sample. Fourth, only three slices of CT images were analyzed, and we plan to compare the performance of using three slices and whole tumors in future investigations. Fifth, some discrepancies caused by manually outlined ROIs are unavoidable, even though we had made efforts to minimize the bias by using two trained radiologists. Sixth, our data were only from a single center. In the future, we will try to collect multicenter data to reinforce the conclusions of our study.
In conclusion, radiomics analysis based on iodine-based MD images with DECT imaging can provide a relatively high diagnostic value for predicting MSI status in CRC patients. This study provides insight into the potential applications of using radiomics analysis of iodine-based MD images produced via DECT in predicting MSI status, and its usefulness for preoperatively providing more information in CRC clinical outcome and treatment decision making.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The First Affiliated Hospital of Dalian Medical University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
Data analysis and interpretation, study design, manuscript writing, and manuscript approval were performed by JW, QZ, and AL, and they are accountable for all aspects of the work. CT data and pathological data analysis and interpretation, statistical analysis, and manuscript approval were performed by YZ. CT data analysis and manuscript approval were performed by YL and AC. Statistical analysis and manuscript approval were performed by XL, TW, JL, and YG.

FUNDING
This work was supported by the Program for Training Capital Science and Technology Leading Talents (grant number Z181100006318003). The funding body contributes to the design of the study and analysis of data.