Deep Learning vs. Radiomics for Predicting Axillary Lymph Node Metastasis of Breast Cancer Using Ultrasound Images: Don't Forget the Peritumoral Region

Objective: Axillary lymph node (ALN) metastasis status is important in guiding treatment in breast cancer. The aims were to assess how deep convolutional neural networks (CNNs) performed compared with radiomics analysis in predicting ALN metastasis using breast ultrasound, and to investigate the value of both intratumoral and peritumoral regions in ALN metastasis prediction. Methods: We retrospectively enrolled 479 breast cancer patients with 2,395 breast ultrasound images. Based on the intratumoral, peritumoral, and combined intra- and peritumoral regions, three CNNs were built using DenseNet and three radiomics models were built using random forest. By additionally incorporating the molecular subtype, another three CNNs and three radiomics models were built. All models were built on the training cohort (343 patients, 1,715 images) and evaluated on the testing cohort (136 patients, 680 images) with ROC analysis. Another prospective cohort of 16 patients was enrolled to further test the models. Results: AUCs of image-only CNNs in the training/testing cohorts were 0.957/0.912 for the combined region, 0.944/0.775 for the peritumoral region, and 0.937/0.748 for the intratumoral region, which were numerically higher than those of the corresponding radiomics models (0.940/0.886, 0.920/0.724, and 0.913/0.693). The overall performance of image-molecular CNNs in terms of AUCs on the training/testing cohorts slightly increased to 0.962/0.933, 0.951/0.813, and 0.931/0.794, respectively. AUCs of both CNNs and radiomics models built on the combined region were significantly better than those built on either the intratumoral or the peritumoral region on the testing cohort (p < 0.05). In the prospective study, the CNN model built on the combined region achieved the highest AUC of 0.95 among all image-only models. Conclusions: CNNs showed numerically better overall performance than radiomics models in predicting ALN metastasis in breast cancer. For both CNNs and radiomics models, combining intratumoral and peritumoral regions achieved significantly better performance.

INTRODUCTION
Breast cancer is the leading malignancy in females (1). Axillary lymph node (ALN) metastasis status is one of the most important factors in guiding treatment decision making in breast cancer (2). Traditionally, nodal status has been assessed by surgical methods such as sentinel lymph node biopsy (SLNB) and axillary lymph node dissection (ALND) (3). According to the guideline from the American Society of Clinical Oncology, SLNB is considered to have a high overall accuracy ranging from 93 to 97.6% with a relatively low false negative rate (FNR) ranging from 4.6 to 16.7% in detecting axillary metastasis (4). However, these surgical approaches have been considered controversial because of their invasiveness, potential complications, and possible overtreatment (3)(4)(5)(6).
Ultrasound is a widely-used tool in breast cancer assessment as it is non-invasive, radiation-free, real-time and well-tolerated in women. Previous studies have shown that axillary ultrasound (AUS) may provide useful information relevant to ALN status in breast cancer (7). However, AUS alone has moderate sensitivity and may not be a reliable predictor for nodal metastasis (7,8).
Recently, imaging-based machine learning approaches have shown promise in cancer diagnosis. The two most popular machine learning approaches are radiomics analysis and convolutional neural networks (CNNs). Radiomics analysis relies on a pipeline that extracts numerous handcrafted imaging features, followed by feature selection and machine learning-based classification. Handcrafted radiomics features extracted from the breast tumor area have been shown to be predictive of ALN metastasis, with FNRs ranging from 13.9 to 25% (9,10). However, handcrafted features are limited to the current knowledge of medical imaging, which may limit the potential of the predictive model. Deep learning improves on this handcrafted pipeline by automatically learning discriminative features directly from images. Recent studies have shown that deep CNN-based approaches can achieve state-of-the-art performance in lesion detection and cancer diagnosis (11)(12)(13). To our knowledge, no studies have assessed breast ultrasound-based CNNs in predicting ALN status for breast tumors.
Most studies have focused on mining predictive imaging features within the tumor, while the surrounding tissues were ignored. Previous evidence has shown that the peritumoral region (the tumor-adjacent parenchyma immediately surrounding the tumor mass) may offer valuable outcome-associated information (14)(15)(16). Two recent studies have demonstrated that handcrafted imaging features from the peritumoral region in Dynamic Contrast-Enhanced MRI (DCE-MRI) are associated with sentinel lymph node metastasis (9) and pathological complete response to neoadjuvant chemotherapy (17) in breast cancer. Here, we hypothesize that deep CNNs built on intra- and peritumoral regions in breast ultrasound could provide relevant information in predicting ALN status. We are interested in comparing the performance of deep CNNs and radiomics models. Additionally, breast cancer can be classified into different molecular subtypes that have distinct prognoses and respond differently to specific therapies (18). Therefore, we further assessed whether deep CNNs or radiomics models combining imaging features and molecular subtypes could offer improved accuracy.
In this hypothesis-driven study, we first developed deep CNNs and radiomics models based on intratumoral, peritumoral, and combined regions in breast ultrasound images for predicting ALN metastasis. We then compared the performance of the deep CNNs with that of the radiomics models on each region.

Study Population
The study was approved by the Ethics Committee of Peking University Shenzhen Hospital (PUSH). The requirement for informed consent was waived for all patients by the ethics committee of PUSH. From the pathology and radiology databases in PUSH, a retrospective search was performed to recruit female patients with breast cancer between January 2016 and December 2018. The inclusion criteria were patients (1) with histologically-confirmed primary breast cancer, (2) with pretreatment breast ultrasound images, (3) with known ALN metastasis status determined by final histopathology, (4) with known molecular subtypes, and (5) without neoadjuvant chemotherapy prior to SLNB or ALND. Patients were excluded if (1) the region of interest in the ultrasound images was very small (<100 pixels) or (2) they had not undergone SLNB or ALND. Finally, 479 patients with 479 breast tumors (136 positive and 343 negative ALNs) were included in this study. This cohort was randomly divided into a training cohort of 343 patients and a testing cohort of 136 patients, a ratio of approximately 2.5:1. The patient recruitment pathway is shown in Figure S1.
The baseline clinical and histopathological data were derived from patient medical records, including age, histological grade, immunohistochemistry (IHC) results and ALN status (positive or negative). According to the 2017 St Gallen International Expert Consensus, each patient was classified into one of four molecular subtypes: human epidermal growth factor receptor-2 (HER2) positive, triple-negative, Luminal A, and Luminal B (18). The status of HER2, estrogen receptor (ER), progesterone receptor (PR) and Ki-67 was assessed by IHC, and the subtype was determined from these IHC results.

Ultrasound Image Acquisition
The breast ultrasound examinations were performed by breast radiologists in our center using the Hitachi Ascendus ultrasound system equipped with 13-3 MHz linear array transducers. The examinations and assessments were conducted according to the 5th edition of the Breast Imaging Reporting and Data System (BI-RADS) presented by the American College of Radiology (ACR) (19). The parameters were set as follows: depth, 4-5 cm; brightness gain, 10-25 dB; dynamic range, 70 dB; frame rate, 26 frames per second. Patients were placed in the supine or lateral position. The field of view was set to have the pectoralis muscle at the deepest aspect of the image. The focal zone was adjusted to be centered at the lesion. Ultrasound images were acquired and documented in the Picture Archiving and Communication System (PACS). For each lesion, five images were selected from PACS by a breast radiologist (XL, with 5 years' experience in breast radiology) and used in our study according to the following scheme: (1) an image along the longest axis of the lesion; (2) an image orthogonal to the first image; and (3) three images at other angles where the lesion was clearly presented. The five images together represented the ultrasonographic features of a 3D lesion from different angles. For all 479 patients, we finally obtained 2,395 images in total, including 1,715 images (343 patients) in the training cohort and 680 images (136 patients) in the testing cohort.

ROI Delineation
The tumor region in each ultrasound image was manually delineated using the ITK-SNAP software (http://www.itksnap.org) by one radiologist (XL) who was blinded to the clinical and histopathological data of patients. A second breast radiologist (DS, with 12 years' experience in breast radiology) reviewed all the delineations. Any disagreement between the two raters was resolved by discussion and consensus. The peritumoral regions were obtained by dilating the delineated tumor contour by ∼5 mm with a standard morphological dilation operation, using an in-house program implemented in Matlab 2016b (MathWorks, Natick, MA). For each ultrasound slice, three region of interest (ROI) images were finally obtained: the intratumor ROI, the peritumor ROI, and the combined ROI that merged the intratumoral and the peritumoral regions. Examples of ultrasound slices overlaid with intratumoral and peritumoral ROIs for two patients are shown in Figure 1.
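The dilation step can be illustrated with a minimal pure-Python sketch. This is not the in-house Matlab program: the disk-shaped structuring element, the default pixel spacing of 0.5 mm, the treatment of the peritumoral ROI as a ring that excludes the tumor, and the function names `dilate` and `build_rois` are all assumptions made for illustration.

```python
def dilate(mask, radius_px):
    """Binary dilation of a 2D 0/1 mask with a disk of the given pixel radius."""
    rows, cols = len(mask), len(mask[0])
    # Offsets that make up the disk-shaped structuring element (assumed shape).
    offsets = [(dr, dc)
               for dr in range(-radius_px, radius_px + 1)
               for dc in range(-radius_px, radius_px + 1)
               if dr * dr + dc * dc <= radius_px * radius_px]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Stamp the disk around every tumor pixel, clipped to the image.
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


def build_rois(tumor_mask, margin_mm=5.0, spacing_mm=0.5):
    """Return (intratumoral, peritumoral, combined) masks.

    spacing_mm is a hypothetical pixel size; the real value depends on the scan.
    """
    radius_px = int(round(margin_mm / spacing_mm))
    combined = dilate(tumor_mask, radius_px)
    # Peritumoral ring = dilated region minus the tumor itself (assumption).
    peritumoral = [[c - t for c, t in zip(crow, trow)]
                   for crow, trow in zip(combined, tumor_mask)]
    return tumor_mask, peritumoral, combined
```

With a real image the margin in pixels would be derived from the scanner's pixel spacing, so the same 5 mm margin covers a different number of pixels at different imaging depths.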

Deep Learning With DenseNet
Deep CNNs can automatically learn discriminative features from imaging data by stacking multiple convolutional layers. Among different CNN variants, the densely connected convolutional network (DenseNet) has shown superior classification performance as it strengthens feature propagation while reducing the number of parameters (20). This is accomplished by connecting each layer to every other layer in a feed-forward fashion with less computational complexity. Here, our model was built on the standard DenseNet-121 (20). All ROI images were resized to 224 × 224. The resized ROI images were used as input and transformed through the chained convolutional layers, yielding a class probability vector as the prediction result. The network was trained from scratch with a cross-entropy loss function and the Adam optimizer, with a learning rate of 0.0001, a batch size of 16, and a regularization weight of 0.0001. In the training cohort, data augmentation approaches including random rotation, random shear and random zoom were employed before the training procedure to reduce possible overfitting. The network was implemented in Keras (https://keras.io/) with the TensorFlow library as the backend (https://www.tensorflow.org/). The architecture of the image-only CNN is shown in Figure 2. The details of the convolutional network implementation can be found in Table S1.
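To make the dense connectivity concrete, the following sketch tracks how the number of feature maps grows through a DenseNet-121. The block sizes (6, 12, 24, 16), growth rate of 32, 64 initial feature maps and compression factor of 0.5 are those of the standard published DenseNet-121 configuration, and are assumed here rather than taken from Table S1; the function names are hypothetical.

```python
def densenet_channels(block_config=(6, 12, 24, 16), growth_rate=32,
                      init_features=64, compression=0.5):
    """Track feature-map counts through dense blocks and transition layers."""
    channels = init_features
    trace = []
    for i, num_layers in enumerate(block_config):
        # Each layer in a dense block adds growth_rate new feature maps,
        # because its input is the concatenation of all earlier outputs.
        channels += num_layers * growth_rate
        trace.append(("dense_block_%d" % (i + 1), channels))
        if i < len(block_config) - 1:
            # Transition layers (1x1 convolution + pooling) compress channels.
            channels = int(channels * compression)
            trace.append(("transition_%d" % (i + 1), channels))
    return trace


def weighted_layer_count(block_config=(6, 12, 24, 16)):
    """Initial conv + 2 convs per dense layer + 1 conv per transition + classifier."""
    return 1 + 2 * sum(block_config) + (len(block_config) - 1) + 1
```

Under these assumptions the last dense block outputs 1,024 feature maps, and the count of weighted layers is 121, which is where the name DenseNet-121 comes from.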

Deep Learning-Based Predictive Model Building
For predicting the nodal status, three image-only CNN models (the intratumoral CNN, the peritumoral CNN and the combined-region CNN) were built with DenseNet on the intratumor ROI images, the peritumor ROI images, and the combined ROI images, respectively. Furthermore, three corresponding image-molecular models were built with DenseNet by using both ROI images and molecular subtype information as the network input. Specifically, the molecular subtype information was incorporated as input to the fully connected layers of the DenseNet, as shown in Figure 2.
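One simple way to feed a categorical subtype into fully connected layers is to one-hot encode it and concatenate it with the image feature vector. The sketch below shows that idea only; the text does not state how the subtype was encoded, so the one-hot encoding, the ordering of the subtypes, and the names `one_hot_subtype` and `fuse` are assumptions.

```python
# The four molecular subtypes named in the study; the ordering is arbitrary.
SUBTYPES = ("HER2-positive", "triple-negative", "Luminal A", "Luminal B")


def one_hot_subtype(subtype):
    """Encode a subtype label as a 4-element 0/1 vector."""
    if subtype not in SUBTYPES:
        raise ValueError("unknown molecular subtype: %r" % (subtype,))
    return [1.0 if s == subtype else 0.0 for s in SUBTYPES]


def fuse(image_features, subtype):
    """Concatenate image features with the encoded subtype.

    The fused vector is what a fully connected layer would receive.
    """
    return list(image_features) + one_hot_subtype(subtype)
```

For example, `fuse([0.2, 0.7], "Luminal A")` yields a six-element vector whose last four entries mark the subtype.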

Radiomics Feature Extraction and Selection
For each ultrasound slice, 104 radiomics features were extracted from each of the three ROI areas by using the open-source toolbox Pyradiomics (https://pyradiomics.readthedocs.io) (21). Three groups of features were extracted: shape features, intensity features, and texture features, as summarized in Table S2. Eleven shape features describing the geometric characteristics of the ROI were extracted. Eighteen intensity features describing the first-order distribution of the ROI intensities were extracted. Seventy-five texture features were computed to describe the patterns, or the high-order distributions, of the ROI intensities with five different methods: the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighborhood gray-tone difference matrix (NGTDM). The detailed definitions of the radiomics features used can be found in two articles (22,23). Because the radiomics features are high-dimensional, feature selection is required to reduce the dimension and avoid overfitting. Here an efficient machine learning-based wrapper algorithm, Boruta, was used to select a subset of features that were relevant to the prediction outcome (24). Boruta evaluates feature relevance iteratively by comparing the importance of the original features with that achieved by artificially added random features, yielding an all-relevant subset of features that is considered optimal for the classification task. We used the R package Boruta for feature selection.
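The comparison against artificially added random features can be sketched as follows. This is a simplified illustration of the shadow-feature idea, not the Boruta package: it uses absolute Pearson correlation with the label as a stand-in for random forest importance, and a fixed hit-fraction threshold in place of Boruta's statistical test. The names `abs_corr` and `shadow_select` and all parameter values are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; stand-in importance measure (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_select(columns, labels, n_iter=50, keep_fraction=0.9, seed=0):
    """Keep features that beat the best shuffled 'shadow' feature in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        # Shadow features: each real column with its values randomly permuted,
        # which destroys any relationship with the labels.
        shadow_max = 0.0
        for values in columns.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_max = max(shadow_max, abs_corr(shuffled, labels))
        # A real feature scores a hit when it outperforms every shadow.
        for name, values in columns.items():
            if abs_corr(values, labels) > shadow_max:
                hits[name] += 1
    return [name for name in columns if hits[name] >= keep_fraction * n_iter]
```

A feature that tracks the label is retained because it beats its shuffled copies in nearly every round, whereas a feature uncorrelated with the label never does.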

Radiomics-Based Predictive Model Building
Based on the selected radiomics features, three image-only radiomics models were built using the random forest algorithm (25) on the intratumor ROI, the peritumor ROI, and the combined ROI, respectively. Correspondingly, three image-molecular radiomics models were built using random forest by integrating ROI image features and molecular subtype information as the input. After testing different settings, the number of trees in all random forest classifiers was set to 300. The Gini index was used as the importance measure (26). The R package randomForest was used for random forest classification.
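As a small worked illustration of the Gini index, the sketch below computes the Gini impurity of a node and the impurity decrease produced by one split. Gini-based feature importance in a random forest is built from such decreases accumulated over the splits that use a feature; the function names here are hypothetical and the code is not taken from the randomForest package.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity decrease when a parent node is split into left and right children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted
```

A balanced two-class node has impurity 0.5, and a split that separates the classes perfectly removes all of it.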

Statistical Analysis
The differences in age, histological grade and molecular subtype between the training and testing cohorts were assessed with the χ² test or the Wilcoxon rank-sum test, where appropriate.
All 12 prediction models (3 image-only CNNs, 3 image-only radiomics models, 3 image-molecular CNNs and 3 image-molecular radiomics models) were trained on the training cohort and evaluated on the testing cohort. Because each tumor had five ultrasound images, there were five corresponding prediction outcomes in the form of class probabilities. Among them, the median probability was chosen as the final prediction for each tumor and was used for statistical analysis. The prediction performance was assessed by the area under the receiver operating characteristic (ROC) curve (AUC), accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV). The AUCs of two models were statistically compared using the DeLong test (27). All statistical analyses were performed with R software, version 3.5.1 (https://www.r-project.org/). All statistical tests were two-sided, and p < 0.05 was considered statistically significant.
No significant difference was found in patient age, histological grade, molecular subtype or ALN status between the training and testing cohorts (p = 0.457 to 0.844; Table 1).
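The per-tumor aggregation and the reported metrics can be sketched as below. This is a minimal illustration, not the R analysis used in the study: the decision threshold of 0.5 is an assumption (the text does not state the cutoff), the AUC is computed with the rank-based (Mann-Whitney) formulation, and the function names are hypothetical.

```python
from statistics import median


def tumor_probability(image_probs):
    """Final prediction for one tumor: median of its per-image probabilities."""
    return median(image_probs)


def auc(scores, labels):
    """AUC as the chance that a positive case outranks a negative one (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def confusion_metrics(scores, labels, threshold=0.5):
    """ACC, SEN, SPE, PPV and NPV at an assumed probability threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return {
        "ACC": _ratio(tp + tn, len(labels)),
        "SEN": _ratio(tp, tp + fn),
        "SPE": _ratio(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }
```

The median is less sensitive than the mean to a single image on which the model is unusually confident or unsure, which is one plausible reason for preferring it when only five views are available per tumor.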

Image-Only Deep CNNs vs. Radiomics Models
The predictive performance of the three image-only deep CNNs and the three image-only radiomics models in both training and testing cohorts is summarized in Table 2. Their ROC curves in both training and testing cohorts are shown in Figure 3. The radiomics feature selection results can be found in Table S3. Among all six image-only models, the combined-region CNN achieved the best performance, with the highest AUC of 0.912 and the highest accuracy of 89.3% in the testing cohort. In the testing cohort, the CNN built on each region performed better than the corresponding radiomics model built on the same region in terms of AUC and accuracy, but the differences in AUC between the CNNs and their corresponding radiomics models were not statistically significant (Image-only CNN vs. Radiomics

Image-Molecular Deep CNNs vs. Radiomics Models
The performance of the three image-molecular CNNs and the three image-molecular radiomics models is summarized in Table 3. Their ROC curves in both training and testing cohorts are shown in Figure 4. Tables 2 and 3 show that the overall performance of the image-molecular models was slightly higher than that of their corresponding image-only models in the testing cohort, but no significant AUC differences were found between them. Among all 12 predictive models built in our study, the image-molecular CNN model built on the combined region achieved the best performance, with the highest AUC of 0.933, the highest accuracy of 90.3% and the highest NPV of 0.958 in the testing cohort. All image-molecular CNNs achieved higher AUCs and higher accuracy than their corresponding radiomics models built on the same tumoral region, but there were no significant differences between their AUCs (Image-molecular

Assessment of Peritumoral and Intratumoral Regions
The predictive value of different tumoral regions was assessed by comparing the models built with the same machine learning method (CNN or radiomics). For the image-only CNNs and image-only radiomics models, the AUCs of the peritumoral models were slightly higher than those of the intratumoral models in the testing cohort, but the AUC differences were not significant (Image-only Peritumoral vs. The image-only CNNs and image-only radiomics models built on the combined region achieved higher AUCs than their corresponding models built on either the intratumoral or the peritumoral region in the testing cohort, and the AUC differences between them were significant (Image-only Combined-region vs.

Prospective Validation
To further validate the CNNs and radiomics models, we performed a validation study using a relatively small prospective cohort. From November 18, 2019 to December 12, 2019, 16 breast cancer patients (6 node positive and 10 node negative) with 80 breast ultrasound images (5 images per patient, as described in section Ultrasound Image Acquisition) were enrolled for analysis. Age, grade, and node status were obtained for the 16 patients and are summarized in Table 1. All six image-only prediction models were tested. As we did not obtain IHC results, the image-molecular models were not tested. The model performance in this prospective cohort is summarized in Table 4. The ROC curves of all tested models are shown in Figure S2. The CNN built on the combined region achieved the highest AUC of 0.95 and the highest accuracy of 81.3%, with two node-positive patients and one node-negative patient misclassified. In general, CNNs outperformed radiomics models, and prediction models built on the combined region outperformed those built on either the intratumor or the peritumor region only. These results were consistent with the observations on the retrospective cohort.

DISCUSSION
The major finding of this study was that a deep CNN, built by combining intratumoral and peritumoral regions in breast ultrasound images, outperformed radiomics models in predicting ALN metastasis. Although imaging-based machine learning approaches have been shown to be useful in assessing breast cancers, few studies have evaluated the value of intra- and peritumoral regions in metastasis prediction (9), and no studies have investigated how breast ultrasound-based deep CNNs perform compared with radiomics models. In this study, we first developed three types of CNN models based on the intratumoral, peritumoral, and combined regions, respectively, in ultrasound images for assessing nodal metastasis, and then compared the performance of the three CNNs with that of three radiomics models built on the same regions. Moreover, we evaluated whether further benefit can be obtained by integrating ultrasound images and molecular subtype information into the predictive models. Note that besides a high AUC, a high NPV is also important, as accurately identifying patients with negative nodes [∼65% of all breast cancer patients (28)] helps to avoid axillary overtreatment and reduce the associated serious complications.
Identifying associations between breast ultrasound features and ALN status has clear clinical benefit. In clinical routine, the axilla can be staged clinically by palpation or surgically by SLNB or ALND. Although SLNB has less severe complications than ALND, it is not risk-free, and SLNB-associated complications have been reported in large prospective trials (6). As palpation is inaccurate (29), AUS is performed to provide more relevant information. AUS alone has a reported sensitivity of 39-60%, specificity of 90-96%, PPV of 80-91%, and NPV of 75-83% (6,30,31). This implies that, despite an acceptable specificity above 90%, about 40-60% of nodal metastases cannot be found by AUS prior to surgery, and about 20-25% of patients with a negative AUS are found to have nodal metastases after surgery. In case of a suspicious ALN, AUS alone or combined with ultrasound-guided needle biopsy is performed for axillary staging to select patients who would benefit from ALND. A recent meta-analysis has shown that the use of AUS in stratifying patients directly to fast-track ALND without SLNB leads to overtreatment in up to two-thirds of patients (32). These data indicate that AUS alone is not sufficiently accurate for axillary staging.
Recent studies have shown the value of radiomics features from the primary lesion in predicting lymph node metastasis for different cancer sites, e.g., CT radiomics features in colorectal cancer (33), and MRI/CT radiomics features in bladder cancer (34,35) and esophageal cancer (36). For breast cancer, two recent studies have assessed the value of radiomics features extracted from the primary tumor region at DCE-MRI and diffusion-weighted MRI (DWI) in predicting sentinel lymph node metastasis, where the reported AUC, sensitivity and specificity ranged from 0.805 to 0.869, 0.700 to 0.778, and 0.747 to 0.861, respectively (9,10). In our study, we built three image-only radiomics models by using both peri- and intratumoral regions in multiple ultrasound slices per lesion. The combined-region radiomics model achieved an AUC of 0.886, a sensitivity of 87.5% and a specificity of 81.8% on the testing cohort, which were comparable with those of the previous radiomics models built with MRI.
Although promising, an efficient radiomics analysis relies heavily on a handcrafted image processing pipeline comprising three tightly coupled steps: feature extraction, feature selection and machine learning model building. Small variations in each stage may affect the prediction accuracy and stability (37). A deep CNN improves on this pipeline by automatically learning predictive features on its own, and yields a class probability vector as output. Currently, CNN-based learning methods have achieved diagnostic accuracy levels in skin cancer (11) and retinal diseases (12,13) that have been unattainable via radiomics approaches. For breast cancer, a comparative study (38) demonstrated that CNN was superior to radiomic analysis, with a significantly higher AUC (0.88 vs. 0.81, p < 0.001), for classification of enhancing lesions as benign or malignant at MRI. Another comparative study by Kooi et al. (39) also demonstrated that CNN was superior to radiomics-based software in detection of mammographic breast lesions. In our study, all six CNNs (three image-only and three image-molecular) achieved higher AUC and accuracy than the corresponding radiomics models built on the same regions on both training and testing cohorts. Note that in our results the differences between their AUCs (CNN vs. radiomics) were not significant (DeLong p > 0.05).
Most image analysis studies on breast cancer have focused on the intratumoral region. Evidence has shown that imaging features of peritumoral regions can offer outcome-related information. Several studies have demonstrated that the enhancement patterns of tumor-adjacent parenchyma in DCE-MRI were associated with chemotherapy response (14), local recurrence (15), and survival (16) in breast cancer. In a recent study (40), the grade of peritumoral edema identified in breast MRI was independently associated with disease recurrence. In the study by Zhou et al. (41), it was demonstrated that the peritumoral stiffness assessed by ultrasound elastography of malignant breast lesions was higher than that of benign lesions. A 2017 study (17) was the first attempt to extract radiomics features from both intratumoral and peritumoral regions in breast DCE-MRI, where the features successfully predicted the pathological complete response to neoadjuvant chemotherapy. A more recent 2019 study (9) demonstrated for the first time the feasibility of predicting sentinel lymph node metastasis by using intratumoral and peritumoral radiomics features in DCE-MRI, achieving an AUC of 0.806 and an NPV of 82.4% with radiomics features only. Our study has shown the value of peritumoral ultrasonographic CNN features in predicting nodal metastasis, with an AUC of 0.775 and an NPV of 91.6%. By combining both intra- and peritumoral regions, the CNN achieved a significantly better AUC of 0.912 and an NPV of 94.4%. The FNRs of the image-only CNN model built by combining the intra- and peritumoral regions were 5.9, 9.3, and 10% in the training, testing, and prospective data sets, respectively. The biological mechanism underlying the peritumoral imaging features and their association with clinical outcomes remains unclear.
Many cancer studies have shown that biological changes in the tissue immediately surrounding the breast tumor mass might be potential predictive or prognostic markers, such as peritumoral lymphovascular invasion (42,43), peritumoral lymphocytic infiltration (44), and peritumoral edema (45). The study by Zhao et al. (46) suggested that vascular endothelial growth factor (VEGF)-C/D induced peritumoral lymphangiogenesis may be one mechanism that leads to metastatic spread. In the study by Wu et al. (16), the prognostic peritumoral features were associated with the tumor necrosis factor (TNF) signaling pathway, which has been implicated in oncogenic angiogenesis, invasion, and metastasis (47). Further studies are warranted to determine how the underlying biological changes are reflected by peritumoral imaging features.
Our study has several limitations. The first limitation is the limited population size, which may lead to bias. Larger patient populations from more centers should be involved in the future to improve the machine learning-based models. The population size of the prospective cohort is particularly small, so significant bias may occur. We will recruit more prospective data in the future to further evaluate our methods in clinical practice. The second limitation is that all image data were obtained on the same type of ultrasound machine. In the future we will evaluate our models on more heterogeneous image data acquired with different machines. Moreover, we built our CNNs and radiomics models using only ultrasound images and molecular subtypes. In the future we will build more comprehensive models by incorporating more clinical and pathological data. Our future research also includes exploring the biological mechanism underlying the association between intratumoral/peritumoral imaging features and nodal metastasis. We will also assess the possible incremental value of the tumoral ultrasonographic features over AUS in axillary staging.
In conclusion, CNNs built on tumoral regions in ultrasound images allowed accurate prediction of ALN metastasis and achieved higher AUC and NPV than radiomics models. Both CNNs and radiomics models built on peritumoral regions performed slightly better than those built on intratumoral regions, while combining both intra- and peritumoral regions achieved significantly better AUCs and higher NPVs. Further integrating the molecular subtype information into either CNNs or radiomics models can slightly benefit the performance.

DATA AVAILABILITY STATEMENT
To achieve repeatability, the data set of this study, including pretrained CNN models, imaging data of the prospective cohort, statistical analysis, and the Python implementation, was deposited into the Mendeley data library (https://data.mendeley.com/datasets/rc32mg38rb/draft?a=2333e5fd-e7b1-4603-b06e-b609d79bab11).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of Peking University Shenzhen Hospital. The ethics committee waived the requirement of written informed consent for participation.

AUTHOR CONTRIBUTIONS
DS, Z-CL, and DL conceived and designed the study. XL collected the clinical and image data and performed image pre-processing. QS, YZ, LL, and KY analyzed the image data and performed the statistical analysis. QS wrote the manuscript. All authors approved the final manuscript.