MRI-Based Radiomics Predicts Tumor Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer

Background: Conventional methods for predicting treatment response to neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC) are limited. Methods: This study retrospectively recruited 134 LARC patients who underwent standard nCRT followed by total mesorectal excision surgery in our institution. Based on pre-operative axial T2-weighted images, machine learning radiomics was performed. A receiver operating characteristic (ROC) curve was performed to test the efficiencies of the predictive model. Results: Among the 134 patients, 32 (23.9%) achieved pathological complete response (pCR), 69 (51.5%) achieved a good response, and 91 (67.9%) achieved down-staging. For prediction of pCR, good-response, and down-staging, the predictive model demonstrated high classification efficiencies, with an AUC value of 0.91 (95% CI: 0.83–0.98), 0.90 (95% CI: 0.83–0.97), and 0.93 (95% CI: 0.87–0.98), respectively. Conclusion: Our machine learning radiomics model showed promise for predicting response to nCRT in patients with LARC. Our predictive model based on the commonly used T2-weighted images on pelvic Magnetic Resonance Imaging (MRI) scans has the potential to be adapted in clinical practice. Novelty and Impact Statements: Methods for predicting the response of the locally advanced rectal cancer (LARC, T3-4, or N+) to neoadjuvant chemoradiotherapy (nCRT) is lacking. In the present study, we developed a new machine learning radiomics method based on T2-weighted images. As a non-invasive tool, this method facilitates prediction performance effectively. It achieves a satisfactory overall diagnostic accuracy for predicting of pCR, good response, and down-staging show an AUC of 0.908, 0.902, and 0.930 in LARC patients, respectively.


INTRODUCTION
Rectal cancer is a common malignancy worldwide, accounting for ∼30-50% of colorectal cancer (1,2). Moreover, in rectal cancer patients, lesions are usually located in middlelow rectum, which causes increased difficulty in treatment and worse prognosis, especially the locally advanced rectal cancer (LARC, T3-4 or N+) (3,4). Currently, neoadjuvant chemoradiotherapy (nCRT) followed by total mesorectal excision is the recommended treatment for LARC patients, especially those with lesions located in the middle-low rectum (5). The advantages of nCRT are usually significant (6,7). However, the response of LARC to nCRT varies widely, ranging from pathological complete response (pCR, ypT0N0M0) with no viable tumor cells left in the surgical specimen, to virtually no tumor regression at all (stable) or even tumor progression in a small group of patients (8,9). Among these patients, pCR is not only associated with favorable disease-free and overall survival (7,10), but also motivates the "watch-and-wait" treatment strategy, a non-operative option for patients achieving clinical complete response (11). Therefore, clinicians are motivated to identify ways to accurately predict patients' individual responses to nCRT.
Radiological examination has been considered to be one of the means most likely to accomplish this task (12). Among all modalities, Magnetic Resonance Imaging (MRI) is regarded as the most promising method because it uses no radiation, shows high soft tissue resolution, and has wide routine clinical application for evaluation of rectal cancer. Notably, some conventional and functional MRI methods have been reported to show some advantages in predicting tumor response to nCRT (13)(14)(15). Unfortunately, conventional MRI analysis remains limited when predict treatment response in individual patient using experience (16). There is a need to develop new methods.
Quantitative image data analysis, such as texture analysis and radiomics are procedures for converting clinical images into high-dimensional, exploitable, and quantitative imaging features by high-throughput extraction of data-characterization algorithms (17). In addition to clinical outcomes, the biomedical information contained in medical images, such as overall information about phenotype and microenvironment of the tumor, may be vitally important for evidence-based clinical Abbreviations: LARC, locally advanced rectal cancer; nCRT, neoadjuvant chemoradiotherapy; MRI, magnetic resonance imaging; RC, rectal cancer; CRC, colorectal cancer; pCR, pathological complete response; cCR, clinical complete response; AUC, area under curve; DWI, diffusion-weighted imaging; GR, good response; ROI, region of interest; ROC, receiver operating characteristic. decision support. In theory, all magnetic resonance images in different can be used as a source of analysis. In theory, for quantitative analysis, used features can be extracted from images of all modalities (12,16,(18)(19)(20)(21)(22). However, T2 weighted image is almost the most widely used one, when considering the wide availability of images which can be stably acquired based on different machines. Quantitative image data analysis methods have the potential to reveal such biomedical information, providing an opportunity to improve decision-support in oncology and non-invasively (17,23). The potential advantage of this kind of method has already been verified in colorectal cancer (24) and a variety of other cancers, including nasopharyngeal carcinoma (25), lung cancer (17), and breast cancer (26). Recently, some independent studies (12,(19)(20)(21)(22)24) reported that a multimodality MRI based radiomics model could predict RC tumor response to nCRT with an improved accuracy for pCR and good response prediction. However, due to the relatively small sample size, or the inclusion of multimodality images with other MRI sequences such as diffusion-weighted imaging, or the lack of integration of important relevant clinical pathological features, there is a need for improving accuracy of the prediction model.
In the present study, we retrospectively collected 134 consecutive surgically and pathologically confirmed LARC patients who received standard nCRT before surgery. We developed a machine learning radiomics model based on imaging data extracted from the T2-weighted images, and validated its prediction efficiency of treatment response to nCRT in patients with LARC.

Patients
This retrospective study was approved by our institutional review board (IRB No. 201610070). The written informed consents from patients were waived.
Medical data of consecutive biopsy-proven rectal adenocarcinoma patients with LARC treated with nCRT followed by total mesorectal excision between March 2009 and December 2017 in our institution were retrospectively analyzed. Complete clinical data, including MRI imaging of all patient's performed before radiotherapy, was analyzed. Details about the inclusion and exclusion criteria, clinical and pathological characteristics, and treatments information can be found in Supplementary Files. The patients recruiting process was shown in Figure 1.

Pathological Assessments of Tumor Samples
Each specimen was sampled and evaluated by two experienced dedicated gastrointestinal pathologists. The two pathologists were both blind to the MRI data and clinical data. Criteria for pCRT and non-pCRT were defined as described in previous reports (8). We also classified TRG 3-4 into the good response (GR) group, and TRG 0-2 into the non-GR group according to Dowrak/Rödel's system (27). Changes in TNM staging were recorded by comparing to cTNM before the surgery, and responses were classed as either down-staging or non-downstaging (stability and progression). Details can be found in the Supplementary Files.

MRI Image Acquisition
All patients underwent an MRI scan in our hospital with either a 1.5 Tesla (Siemens, Erlangen, Germany) or a 3.0 Telsa scanner (GE, Milwaukee, US), using a phased-array body coil, 3-10 days before the start of chemoradiation. To ensure MRI image quality, a quality assurance check was performed biweekly by a hospital radiological physicist and executed bimonthly by the Siemens or GE engineer, as appropriate, according to the maintenance rules for the MRI scanners in our institute. Axial T2-weighted (T2w fast spin echo sequence) images (T2WI) and T1-weighted (T1w spin echo sequence) images (T1WI) were acquired regularly. Subsequently, multiphase T1w images were obtained before and after contrast injection, using a spoiled gradient echo sequence (LAVA/VIBE sequence). Contrast injection and data acquisition were triggered simultaneously. Briefly, a total of four repetitions were acquired, including one before the contrast injection and three after the injection (at 28, 65, and 120 s). For contrast, generally 90-100 ml of the gadolinium-based contrast media dimeglumine gadopentetate (Magnevist; Schering Diagnostics AG, Berlin, Germany) was administrated intravenously at a rate of 2.5 ml/s through a high pressure injector (Optistar LE, Liebel-Flarisheim Company, OH, USA).
Since all patients had at least three kinds of MRI images (T1WI, T2WI, and enhanced T1WI), MRI images from these three serials were included in the present study.

MRI Image Analysis
All MRI images of each patient were evaluated independently by two experienced abdominal radiologists (reader 1. C.C with 7 years of experience; reader 2, L.X.Y with 15 years of experience), who were totally blinded to all medical information. Final disagreement was resolved in a panel format including two additional radiologists (L.W.H and Y.X.P). The location and boundary of the tumor were confirmed, tumor size, the distance from the lower edge of the tumor to the anal canal, and the MRIbased TNM stage were recorded. The findings were recorded by consensus.

Texture Analysis Feature Extraction
For each patient, an anonymized representative axial T2WI image in which the lesion had the largest cross-sectional area was selected and retrieved from Picture Archiving and Communication System (PACS, Carestrem, Canada) using Digital Imaging and Communications in Medicine (DICOM) Works software (version 1.3.5). Subsequently, each image was transferred to a personal computer and inputted into the texture analysis software (MaZda Version 4.6, Institute of Electronics, Technical University of Lodz, Poland) (28).
Briefly, the process of texture analysis feature extraction was conducted by 2-steps as follows: (a) selecting and retrieving the suitable MRI images, and then (b) outlining the tumors as the regions of interest (ROI), and extracting quantitative texture analysis features by using the texture analysis software (MaZda Version 4.6, Institute of Electronics, Technical University of Lodz, Poland).
Tumors were outlined as a region of interest (ROI) by performing MaZda on T2WI, while using all other image sequences (especially gadolinium-enhanced images) as references in cases where the margin of the rectal lesion was difficult to define on unenhanced images. Briefly, a ROI was delineated initially by following the tumor outline, with notation that fat and air outside the mass are not included. Then, the ROI was saved for subsequent texture analysis. Contouring was performed carefully to cover the maximum extent of the tumor without exceeding the lesion border, to avoid contamination from adjacent normal rectal tissues or the intestinal lumen. For each ROI, a total of 340 quantitative features were automatically generated using MaZda software, including a gray level histogram, gradient, run-length matrix, co-occurrence matrix, autoregressive model, and wavelet transform analysis according to the software settings.

Evaluation of the Reproducibility of Radiomics Feature Extraction by the Two Radiologists
The reproducibility assessment of the features extracted by the two radiologists from the independent segmentations of T2WI images of all patients was performed. The inter-observer (reader 1 v reader 2) and intra-observer (reader 1 twice) correlation coefficient values were evaluated. The final consistency is evaluated by the following criteria regarding the correlation coefficient values: <0.20 indicates poor reproducibility, 0.21-0.40 fair reproducibility, 0.40-0.60 moderate reproducibility, 0.61-0.80 good reproducibility, and 0.81-1.00 excellent reproducibility. Generally, a correlation coefficient >0.75 is regarded as being in good agreement.
For the Kappa consistency test, excellent, good, and poor agreement were defined as kappa values of >0.81, in the range of 0.61-0.80, and <0.60, respectively.
The Mann-Whitney U-test was used to compare the values of each feature between the two groups. An independent samples t-test or Kruskal-Wallis H test, where appropriate, was used to assess the differences between the features generated by reader 1 (first time) and those generated by reader 2, as well as between the features generated twice by reader 1.
Inter-observer and intra-observer reproducibility of texture feature extraction was initially analyzed with 50 randomly chosen images from all T2WI images selected for evaluation by the two radiologists (reader 1, and reader 2). To assess the intraobserver reproducibility, reader 1 repeated the generation of texture features twice within a 2-week period following the same procedure. Reader 1 completed the workflow for the remaining images.

Statistical Analysis, Features Selection, Signature Generation, and Prediction Model Building
All statistical analyses were conducted using IBM SPSS version 20.0.0 (IBM Corporation, Armonk, NY, USA). To test the difference between groups, the Wilcoxon rank-sum test was performed for the quantitative features, and the chi-square test or fisher's exact test was performed for the qualitative features.
All data processing, data reduction and feature selection, and model built were performed using MATLAB 2017a (The Mathworks, Inc., Natick, MA, USA). The least absolute shrinkage and selection operator (LASSO) method, was used to select the most useful predictive features from the primary data set, and a radiomics score (Rad-score) was calculated for each patient at the mean time as a linear combination of selected features that were weighted by their respective coefficients. Based on these selected features, another classification model was also constructed by Random forest (RF), and the RF-score was generated. Subsequently, a combined classification model was finally built by the support vector machine (SVM) method (SVMscore), based on Rad-score and RF-score in the previous step. On the basis of two SVM-scores obtained, calculated from TE and TRC features, respectively, a final classification model was generated by using the SVM method again (SVM-score-final). Through the above steps, a total of seven models were generated representing each classification task, considering there are two kinds of data (Texture analysis [TA] features, and Traditional radiological-clinicopathological [TRC] features) that were used to build the model. The seven models include three models generated from TA features (model based on Rad-score, RF model, and first-step SVM model), three models generated from TRC features (model based on Rad-score, RF model, and firststep SVM model), and one combined SVM model.
The basic idea of this algorithm is to consider LASSO and RF as weak regressors and combine them using SVM. For each type of data, i.e., texture feature, we first use LASSO to obtain the Rad-score, and use its side product, i.e., important features, as the feature set of RF to obtain the RF-score. Since Rad-score and RF-score are independently acquired by two different weak regressors, using SVM to regress them in a twodimensional plane achieves a better result than by them owns. Moreover, data sets Texture feature and Traditional radiologicalclinicopathological data are also independent to each other. So for the same reason, we use SVM to regress the scores from Texture feature and Traditional radiological-clinicopathological data to get the final regression score. Finally, the regressed scores can be binarized for further prediction.
To evaluate the performance of the models, all patients were divided into two cohorts: a training cohort and a validation cohort. The models were developed in the training cohort, and tested in the validation cohort. The classification efficiencies of each kind of model mentioned above, including the receiver operating characteristic (ROC) curves, both in the training and validation cohort were calculated. A P-value < 0.05 was considered statistically significant. Details of the flow chart for building the classification model are shown in Figure 2.

Patients Characteristics
There were134 patients enrolled in the present study. Patients' characteristics in the training and validation cohorts were summarized in Table 1. Patients were randomly allocated into training cohort and validation cohort in a 3:1 ratio for building the pCR predictive model, the down-staging model, and the good response model. No differences were found between the training and validation cohorts in any of the three models. In addition, the patients' clinicopathological data mostly consisting of laboratory data were also used in building the predictive models.

The Classification Model Building and Predicting Efficiency
From a total of 340 features that were extracted from T2-wighted images for each patient, a set of features with corresponding numbers were selected by LASSO and used to calculate the Radscores for the pCR, Good Response, and Down-staging models.

Predicting Pathological Complete Response (pCR)
On the basis of the selected 10 texture and 8 clinicopathological features, a predictive model was finally constructed with SVM method for pCR prediction. The SVM model yielded an AUC of 90.78% in the training cohort, and 87.45% in the validation cohort (Figure 3 and Supplementary Figure 1).

Predicting Good Response (GR)
The predictive model built based on the 10 texture features and 7 clinicopathological features achieved an AUC of 90.17% in the training cohort, and 89.72% in the validation cohort (Figure 3 and Supplementary Figure 1).

Predicting Down-Staging
The predictive model with 10 texture features and 7 clinicopathological features showed an AUC of 92.97% in the training cohort, and 89.20% in the validation cohort (Figure 3 and Supplementary Figure 1). Details about prediction efficiency of three kinds of models could be found in Table 2.
The correlation matrix of the selected features used in the three kinds of models was showed as Figure 4.

DISCUSSION
To the best of our knowledge, this was the first cohort studied to date utilizing monosequence-MRI-based machine learning radiomics to predict tumor response to neoadjuvant chemoradiation therapy in patients with locally advanced rectal cancer. Our predictive model constructed with both radiomics features and clinicopathological data achieved higher accuracies than previously reported in the literature, with an AUC of more than 90%. Substantial evidence from prior studies has demonstrated that a number of clinicopathologcial and radiological features may help to predict treatment response (16,18,29). Nevertheless, no single factor has stood out to be the most reliable way for clinicians to use in decision making process (6,16). It is important to distinguish the LARC patients who will likely to respond to nCRT from patients who would not. However, this has not been achieved yet. We introduced here a new imaging oriented strategy for a better prediction, which may have potential for clinical practice.
Our study is in general accord with prior research (19,30,31). Nie et al. (12) have reported a relatively satisfactory result by using a radiomics method, with an AUC of 0.84 for pCR and 0.89 for good response prediction. Most recently, Cui et al. (19) reported a further attempt on a bigger group LARC patients by similar methods, which show very high predictive efficiency with an AUC of 0.944. In addition, several LARC studies (20,21) also perform similar radiomics-based studies with good experimental results, using features extracted from multimodality MR images including T2WI. However, there were obvious advantages in our study when compared to these studies. First, we fully evaluated three aspects of the treatment response: not only pCR and good response, but also down-staging. Our study has the potential to provide more information on the tumor and treatment response. Second, the number of enrolled patients in our study (n = 134), was larger than that in the Nie's (n = 48), and comparable to Cui y's (n = 186), which can ensure the desired prediction results. Third, we also included conventional MRI findings and clinicopathological data which may further improve the prediction. Lastly, our radiomic features were extracted from only one sequence, i.e., the T2-weighted images, other than the multi-sequence MRI images used in previous studies. The T2 weighted images are commonly used in clinical practice, which is familiar to radiologists. In addition, it can be acquired easily and the images are quite stable in appearance, especially when compared with images obtained by special sequence, such as diffusion weighted images. Notably, diffusion weighted images are prone to distortion and susceptibility artifacts, which affect tumor segmentation and data extraction. Similarly, other sequences such as T1-weighted dynamic contrast enhanced images depend on the amount and distribution of the injected contrast-enhancing agent, which might be influenced by variable hemodynamic conditions in the patients. The exact reason for why quantitative MRI-based texture data appear to be able to predict treatment response is still largely unknown. In theory, the biological phenotype of tumors, including treatment response, is largely determined by their underlying molecular subtypes, whose manifestations may vary. One of the phenotypes may be radiological heterogeneity, including inter-and intra-tumor heterogeneity. A large body of literature indicates that texture based radiomic modeling can evaluate tumor heterogeneity, and can correlate radiological findings with underlying genomic and biological characteristics, including prognosis and treatment response (17,23). Our study may add into the literature in this regard as we have shown a predictive model for treatment response with high accuracy. From another perspective, the large amount of previous evidence (13,15), supporting using advanced MRI-based radiomic features to predict different responses to nCRT in patients with rectal cancer.
In addition, we introduced clinicopathological features into the prediction model, which may contribute significantly to the improvement of prediction efficiencies. These features may represent, to some extent, some of the intrinsic properties of the tumor (32,33). For example, the fecal occult blood test and red cell counts may indicate oxygen status of tumor. Neutrophil counts, Monocyte counts, globulin, or platelet counts, may actually reflect the immune status of LARC patients to some extent. The hypoxia and immune status of the tumor can influence tumor treatment response and mediate radiotherapy resistance (34,35). Pathology type and distance from the anal verge also influence the tumor response, as has been shown in previous studies (36,37). Our study results suggest that these clinicopathological data may play an important role in treatment response.
There were several limitations in our study. First, as a retrospective study, there may be a selection bias. Second, the sample size in our study was modest, which may affect the  accuracy and stability of the predictive models. Third, both the building and validation of the models were conducted in our institution with a single dataset. A multicenter prospective study might be helpful to further validate and optimize our prediction models. The texture features were extracted from the largest cross-sectional area of the tumor rather than from the entire tumor, which may raise questions as to whether these features were optimally representative of the characteristics of the entire tumor. Lastly, the MRI images used in the texture feature extraction were obtained from three different MRI scanners (Siemens and GE) in our hospital, and differences among the scanners may potentially influence the texture features and the subsequent model building. Future research is needed to standardize the signal intensity among different MRI scanners.

CONCLUSION
Our study showed a predictive model built with radiomic features and clinicopathological data was promising to predict tumor response to neoadjuvant chemoradiation in patients with locally advanced rectal cancer. In addition, our method developed with information from the clinically obtained T2weighted sequence may be used as a complimentary tool to assist clinical decision making. Nevertheless, future prospective multicenter studies with larger samples will be needed to validate our study result and to optimize the prediction models for clinical practice.