Prediction of nonsentinel lymph node metastasis in breast cancer patients based on machine learning

Background To develop the best machine learning (ML) model to predict nonsentinel lymph node metastases (NSLNM) in breast cancer patients. Methods From June 2016 to August 2022, 1005 breast cancer patients were included in this retrospective study. Univariate and multivariate analyses were performed using logistic regression. Six ML models were introduced, and their performance was compared. Results NSLNM occurred in 338 (33.6%) of 1005 patients. The best ML model was XGBoost, whose average area under the curve (AUC) based on 10-fold cross-validation was 0.722. In the test set, it performed better than the nomogram, which was based on logistic regression (AUC: 0.764 vs. 0.706). Conclusions The ML model XGBoost can predict NSLNM in breast cancer patients well. Supplementary Information The online version contains supplementary material available at 10.1186/s12957-023-03109-3.


Introduction
Cancer and cardiovascular diseases are the two main causes of death across the world and seriously harm people's physical and mental health [1]. According to data from the World Health Organization (WHO), the number of newly diagnosed cancers in 2020 totaled 19.29 million, of which 2.26 million were breast cancers, and approximately 685,000 people died from breast cancer [2]. Breast cancer ranks first in morbidity and mortality in most countries [2]. At the same time, the treatment regimens of breast cancer are changing over time. In 1985, the results of the National Surgical Adjuvant Breast and Bowel Project (NSABP) B-06 study demonstrated that breast-conserving surgery combined with radiotherapy led to no significant difference in overall survival (OS) and disease-free survival (DFS) of patients with early breast cancer compared with mastectomy, which raised the proportion of breast cancer patients treated with breast-conserving surgery [3], and the safety of breast-conserving surgery was confirmed in the following 20 years of follow-up [4]. In 2010, the results of the NSABP B-32 study showed that for malignant breast tumor patients with negative axillary lymph nodes, the success rate of axillary sentinel lymph node biopsy (SLNB) was 97.2%, and the false-negative rate was only 9.8%. There were no significant differences in OS, DFS, or local recurrence rate (LRR) between patients with negative sentinel lymph nodes who did not undergo axillary lymph node dissection (ALND) and those who underwent ALND [5,6]. The risk of lymphedema and reduced range of motion in the upper limbs associated with ALND is not negligible, and it seriously affects the quality of life of patients [7]. The AMAROS study showed that early breast cancer patients who underwent SLNB combined with radiotherapy had similar axillary lymph node recurrence and DFS rates as those who underwent ALND, even if there were 1 or 2 sentinel lymph node metastases (SLNMs) [8]. In 2015, the American College of Surgeons Oncology Group (ACOSOG) Z0011 study confirmed that SLNB combined with radiotherapy could exempt early breast cancer patients with 1 or 2 SLNMs from ALND [9,10], which further promoted the clinical application of SLNB. However, ALND is required for breast-conserving surgery patients with more than three sentinel lymph node metastases or total mastectomy patients with more than one sentinel lymph node metastasis. Studies have shown that 40-60% of breast cancer patients who undergo SLNB and further undergo ALND have no other lymph node metastases [11][12][13]. With the progress of individualized treatment of breast cancer and patients' increasing demand for quality of life, axillary lymph node management is more inclined to include the evaluation of tumor staging and prognosis to accurately predict the risk of axillary lymph node metastasis, which can avoid surgical complications caused by overtreatment and thereby improve patients' quality of life. It can also help reduce the recurrence risk for breast cancer patients with nonsentinel lymph node metastases (NSLNMs) who undergo SLNB but not ALND.
In recent years, machine learning (ML) has been used to manage different medical problems, such as pathologic diagnosis and treatment support, and ML models constructed in previous studies not only have better model performance but also have higher prediction accuracy [14][15][16]. Few models have been constructed to predict NSLNM. Guo Xu and his team constructed a deep learning model to predict NSLNM, but they failed to explain the impacts of different variables in their model [17]. Yang, ZB et al. [18] developed a nomogram to predict NSLNM, which showed an area under the curve (AUC) of 0.718 in the training set and 0.742 in the validation set, but its performance was not compared with that of ML models.
Lundberg et al. first conceived the SHapley Additive exPlanations (SHAP) framework, which has been applied to machine learning [19]. It can assess the contributions of different features in different ML models, allowing the performance of each model to be reasonably compared [20].
The purpose of this study was to construct an optimal ML model to predict the NSLNM of breast cancer patients by using preoperative and intraoperative clinicopathological and imaging features and to choose the best model by using the SHAP framework. This study also compared its performance with that of a nomogram.

Patients
A total of 3658 malignant breast cancer patients undergoing surgery at Harbin Medical University Cancer Hospital from June 2016 to August 2022 were retrospectively enrolled. This study was approved by the Ethics Committee of Harbin Medical University Cancer Hospital. It conforms to the 1964 Helsinki Declaration of the World Medical Association and its subsequent revisions. Each patient signed the hospital's informed consent form before receiving treatment.
The inclusion criteria were as follows: no other breast cancer treatment prior to breast surgery, and both SLNB and ALND performed during breast surgery.
The exclusion criteria were as follows: neoadjuvant therapy received before breast surgery in our hospital, SLNB without ALND or ALND performed directly during breast surgery, a pathological type of ductal carcinoma in situ, distant metastasis, and male breast cancer.
Finally, a total of 1005 breast cancer patients were included. Their details are shown in Fig. 1.

Axillary lymph node status management
Methylene blue dye was injected into the intradermal, subcutaneous, areolar, and glandular areas (peritumor, intratumor, subtumor) 10-15 min before breast cancer surgery (Jichuan Pharmaceutical Group, China), or a carbon nanosuspension was injected into the subserous membrane along the peritumoral site at 4-6 points (Chongqing Lemei Pharmaceutical, China) during surgery to facilitate the localization of SLNB. Sentinel and nonsentinel lymph nodes were evaluated in hematoxylin-eosin (HE)-stained sections that were fixed with 10% formalin and embedded in paraffin. After fixation, successive sections of the lymph nodes were obtained for definitive analysis of lymph node status.

Classification
An estrogen receptor (ER) immunohistochemical (IHC) detection degree of < 1% nuclear staining was interpreted as ER negativity, and an IHC-positive degree between 1 and 10% nuclear staining was interpreted as ER weak positivity. An IHC-positive degree of > 10% nuclear staining was interpreted as ER positivity [21]. Progesterone receptor (PR) was negative if its IHC-positive degree was < 1%, weakly positive if its IHC-positive degree was between 1 and 20%, and positive if its IHC-positive degree was > 20% [22]. A human epidermal growth factor receptor-2 (HER-2) IHC result of 0 was defined as HER-2 negative. Low HER-2 expression was defined as a HER-2 IHC result of 1+ or a HER-2 IHC result of 2+ along with negative fluorescence in situ hybridization (FISH). A HER-2 IHC result of 3+ or a HER-2 IHC result of 2+ with positive FISH was defined as HER-2 positivity [23]. The Ki-67 expression level was divided into the Ki-67 ≤ 14% group and the Ki-67 > 14% group [24]. According to the results of IHC, all patients were divided into luminal A, luminal B, triple-negative breast cancer (TNBC), and HER-2 overexpression groups [22].
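The receptor cut-offs above can be written as simple rules. The following is a minimal sketch, not the study's code; the function names and the returned labels are assumptions chosen for illustration, and only the thresholds come from the text.

```python
from typing import Optional


def er_status(percent_stained: float) -> str:
    """ER status from the percentage of nuclei stained on IHC."""
    if percent_stained < 1:
        return "negative"
    if percent_stained <= 10:
        return "weak positive"
    return "positive"


def pr_status(percent_stained: float) -> str:
    """PR status; same logic as ER but with a 20% upper cut-off."""
    if percent_stained < 1:
        return "negative"
    if percent_stained <= 20:
        return "weak positive"
    return "positive"


def her2_status(ihc_score: int, fish_positive: Optional[bool] = None) -> str:
    """HER-2 status from the IHC score (0-3) and, for 2+, the FISH result."""
    if ihc_score == 0:
        return "negative"
    if ihc_score == 1:
        return "low"
    if ihc_score == 2:
        if fish_positive is None:
            raise ValueError("an IHC result of 2+ needs a FISH result")
        return "positive" if fish_positive else "low"
    if ihc_score == 3:
        return "positive"
    raise ValueError("IHC score must be 0, 1, 2, or 3")


def ki67_group(percent: float) -> str:
    """Ki-67 group using the 14% cut-off."""
    return "<=14%" if percent <= 14 else ">14%"
```

For example, `her2_status(2, fish_positive=False)` returns `"low"`, matching the definition of low HER-2 expression given above.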
According to current American Joint Committee on Cancer (AJCC) standards [25], single tumor cells or maximum tumor diameter < 2 mm in axillary lymph nodes was defined as node negative, and tumor diameter ≥ 2 mm was defined as node positive. Pathological lymph node staging (pN) was determined according to the number of positive axillary lymph nodes. The staging was as follows: pN0 meant no axillary lymph node metastasis, pN1 meant 1-3 axillary lymph node metastases, pN2 meant 4-9 axillary lymph node metastases, and pN3 meant more than 9 axillary lymph node metastases.
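As an illustration of the staging rule above, the sketch below counts positive nodes from the largest tumor deposit in each node and maps the count to a pN stage. The function names and the input format (one deposit diameter in millimetres per node) are assumptions made for this example.

```python
def count_positive_nodes(deposit_diameters_mm):
    """Count nodes whose largest tumor deposit is at least 2 mm."""
    return sum(1 for d in deposit_diameters_mm if d >= 2)


def pn_stage(n_positive: int) -> str:
    """Map the number of positive axillary nodes to a pN stage."""
    if n_positive < 0:
        raise ValueError("the number of positive nodes cannot be negative")
    if n_positive == 0:
        return "pN0"
    if n_positive <= 3:
        return "pN1"
    if n_positive <= 9:
        return "pN2"
    return "pN3"


# Hypothetical patient with five examined nodes (0 means no tumor cells found)
diameters = [0, 0.5, 2.0, 3.1, 0]
stage = pn_stage(count_positive_nodes(diameters))
```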
Since the patients included in this study were Chinese women with breast cancer, body mass index (BMI) was classified into different groups according to the standards of the Chinese Health Commission. BMI = weight (kg) / height² (m²), and the underweight group was defined as BMI < 18.5 kg/m². BMI between 18.5 and 23.9 kg/m² was the normal group, BMI between 24 and 27.9 kg/m² was the overweight group, and BMI ≥ 28 kg/m² was the obesity group.
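The BMI grouping can be sketched as follows. This is an illustrative helper rather than the study's code; treating values between 23.9 and 24 kg/m² as normal (and between 27.9 and 28 kg/m² as overweight) is an assumption made here to close the gaps between the stated ranges.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_group(value: float) -> str:
    """BMI group using the cut-offs 18.5, 24, and 28 kg/m^2."""
    if value < 18.5:
        return "underweight"
    if value < 24:      # assumption: 23.9-24 is counted as normal
        return "normal"
    if value < 28:      # assumption: 27.9-28 is counted as overweight
        return "overweight"
    return "obesity"


# Hypothetical patient: 60 kg, 1.60 m
group = bmi_group(bmi(60, 1.60))
```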
Considering the small number of patients with other types of breast cancer included, the patients were divided according to pathological type into infiltrating ductal carcinoma and other types of carcinoma. The other types included invasive lobular carcinoma (18 patients), invasive micropapillary carcinoma (3 patients), ductal carcinoma in situ with microinvasion (3 patients), and mucinous carcinoma (2 patients).

Data preprocessing and feature selection
The k-nearest neighbor imputer (KNNImputer) was used to impute parameters with fewer than 30% missing values [26]. Recursive feature elimination was applied to select the best variables (Fig. S1). The best number of variables was 12: age, BMI, pregnancy history, nipple retraction, single/multiple tumors, cT stage, blood flow signal of tumor, cN stage, ultrasound (US) BI-RADS
The logistic regression (LR) model is usually applied to explore how characteristics influence binary variables [27]. For a regression or classification problem, a cost function is established, the optimal model parameters are solved iteratively by an optimization method, and the quality of the resulting model is then verified by testing.
The support vector machine (SVM) is applied to classify things with multidimensional attributes into two categories [28]. It is a supervised learning model that is commonly used for pattern recognition, classification, and regression analysis. Based on structural risk minimization theory, it constructs the optimal hyperplane in the feature space so that the learner is globally optimized, and the expectation of the whole sample space satisfies a certain upper bound with a certain probability.
K-nearest neighbor (KNN) is one of the most commonly used nonparametric classification techniques. Its working premise is that if most of the k samples nearest to a given sample in the feature space belong to a certain class, then that sample is assigned to the same class. The class decision therefore depends on only a very small number of adjacent samples. Because the KNN method relies on these few neighboring samples rather than on discriminating class domains, it is more suitable for sample sets whose class domains cross or overlap considerably [29].
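The KNN decision rule can be illustrated with a few lines of code. This is a minimal sketch using Euclidean distance and a majority vote; the function name, the choice of distance, and the toy data are assumptions for illustration and do not reproduce the model trained in this study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` to the majority class among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query point
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    # Majority vote among the k closest labels
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters with labels 0 and 1
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

near_first_cluster = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
near_second_cluster = knn_predict(train_X, train_y, (5.5, 5.5), k=3)
```

Only the k nearest samples take part in the vote, which is why the method depends on a small neighborhood rather than on a global decision boundary.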
Classifiers that use multiple trees to train and predict samples are called random forest (RF) classifiers; combining many trees reduces training variance and improves generalization [30]. RF training can be highly parallelized, which is an advantage for the training speed on large samples in the era of big data. Because the features used to split the decision tree nodes can be randomly selected, the model can still be trained efficiently even when the sample feature dimension is very high.
The decision tree (DT) algorithm divides the data layer by layer according to their characteristics until all the characteristics have been used, so it can be used to solve classification and regression problems [31]. It is a kind of nonparametric supervised learning that is easy to understand, applicable to all kinds of data, and performs well on various problems; it is also the core of various ensemble algorithms built on tree models. It is widely used in various industries and fields.
Extreme gradient boosting (XGBoost) is an ML technique that can process missing data and build accurate prediction models from weak prediction models [32]. It is good at capturing complex dependencies in data, can obtain effective models from large-scale datasets, and supports multiple systems and languages.

Statistical methods
All patients were randomly divided into training and testing sets at a 7:3 ratio (Fig. 1). The ML prediction models were developed in the training set and optimized by using 10-fold cross-validation. The AUC, accuracy, recall rate, F1 value, and precision were used to evaluate the ability of each ML model. Brier scores were applied to evaluate the overall performance of the model [33]. Pearson's χ² or Fisher's exact test was used for intergroup analysis. Univariate and multivariate analyses were performed using logistic regression. Based on multivariable logistic regression analysis, a nomogram was built, whose accuracy was determined by calculating its C-index. Internal validation was carried out by the bootstrap method, and the difference between the actual value and the value predicted by the nomogram was analyzed graphically. To explain the optimal ML model more intuitively, we introduced the SHAP framework, whose interpretability has been demonstrated in many cancers [18][34][35][36]. It can demonstrate the contributions of various variables in any ML model to the outcome event [20]. All statistical analyses were performed using Python 3.9 and R 4.1.2. P < 0.05 was considered statistically significant.
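For readers unfamiliar with the evaluation metrics named above, the following sketch computes them from true labels and predicted probabilities using only the Python standard library. It is an illustration of the definitions, not the code used in this study; the function names, the 0.5 decision threshold, and the toy data are assumptions.

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a positive case is ranked above a negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Each positive-negative pair scores 1 if ranked correctly, 0.5 if tied
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, and F1 at a given probability threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    tn = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: four hypothetical patients (1 = NSLNM, 0 = no NSLNM)
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
metrics = classification_metrics(y_true, y_prob)
```

A lower Brier score indicates better overall performance, whereas higher values are better for the other metrics.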

Machine learning model construction and performance comparison
Twelve variables were selected to develop ML models. The relationships between different variables are shown in Fig. 2. Based on the above 12 variables, six ML models were developed on the training set, and learning curves showed that there was no overfitting of these six machine learning models (Fig. 3). Therefore, we further compared the performance of different ML models using the AUC value, accuracy, precision, F1 value, and Brier score. The results show that in the training set with 10-fold cross-validation, the average AUC value of the XGBoost model was the largest (0.722, Fig. 4a), and its accuracy was the highest (0.673, Fig. 4b). Moreover, in both the training set and the test set, the AUC value of the XGBoost model was the largest, at 0.781 (Fig. 4c) and 0.764 (Fig. 4d), respectively. The Brier score of the XGBoost model was the smallest in the training set and the second smallest in the test set, at 0.196 (Fig. 4e) and 0.191 (Fig. 4f), respectively. In the test set, the accuracy and precision of the XGBoost model were the second largest, at 0.752 and 0.723, respectively. The recall rate and F1 value of the XGBoost model were the highest, at 0.728 and 0.726, respectively (Table 3). The positive predictive value and negative predictive value of the XGBoost model were highly consistent with the real values (Fig. 5). In conclusion, of the six ML models tested, the XGBoost model demonstrated the best performance.

Establishment of the nomogram and performance comparison with the XGBoost model
Based on multivariate logistic regression analysis, the SLN-positive ratio was an independent predictor of NSLNM. In a previous study, SLN status was also correlated with NSLNM [38]. Therefore, these two variables were applied to develop the nomogram. The C-index of the nomogram in the training set and test set was 0.706 and 0.647, respectively. After internal validation by the bootstrap method, the C-index in the training set and test set was similar, at 0.706 and 0.646, respectively. Figure S2a shows a nomogram for predicting NSLNM based on the SLN-positive ratio and SLN group. Based on the scores from the different states of the nomogram's variables, the probability of NSLNM for a certain patient can be obtained. The AUC values of this model in the training set and the test set were 0.647 (Fig. S2b) and 0.706 (Fig. S2c), respectively. The deviation between the predicted value and the actual value in the training set and the test set was somewhat large (Fig. S2d, e). In the training and test sets, the AUC value of XGBoost was larger than that of the nomogram (0.781 vs. 0.647; 0.764 vs. 0.706; Table S1). These results showed that the XGBoost model was superior to the nomogram in predicting NSLNM.
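How a nomogram turns variable states into points and then into a probability can be sketched as follows. This is a generic illustration of a logistic-regression nomogram with binary predictors; the intercept and coefficients are hypothetical placeholders, NOT the fitted values of this study, and the variable and function names are assumptions.

```python
import math


def nomogram_points(coefs, patient):
    """Points per variable, scaled so the largest coefficient is worth 100 points."""
    scale = 100.0 / max(abs(b) for b in coefs.values())
    return {name: coefs[name] * patient[name] * scale for name in coefs}


def probability_from_points(total_points, intercept, coefs):
    """Convert total points back to the linear predictor, then to a probability."""
    scale = 100.0 / max(abs(b) for b in coefs.values())
    linear_predictor = intercept + total_points / scale
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical values for illustration only (not taken from the study)
INTERCEPT = -1.0
COEFS = {"sln_positive_ratio_gt_0_5": 1.0, "sln_positive": 0.5}

# Hypothetical patient: both binary predictors present
patient = {"sln_positive_ratio_gt_0_5": 1, "sln_positive": 1}

points = nomogram_points(COEFS, patient)
probability = probability_from_points(sum(points.values()), INTERCEPT, COEFS)
```

The points scale is only a graphical rescaling: the probability obtained from the total points equals the one obtained by applying the logistic function directly to the intercept plus the weighted sum of the predictors.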

Interpretability of the XGBoost model
Based on the above results, XGBoost was the best model to predict NSLNM. To make this model and its prediction easier to understand, this study makes use of the SHAP framework. Figure 6a shows the top ten features affecting NSLNM: SLN-positive ratio, BMI, MG BI-RADS classification, SLN group, cT, number of births, age, cN, US blood flow signal of tumor, and US BI-RADS classification. To explore how these characteristics affect NSLNM, SHAP values are further used for interpretation (Fig. 6b). The SHAP value (X-axis) represents the degree to which the feature influenced NSLNM, and the Y-axis lists the features in order of importance. Red dots represent higher feature values, and blue dots represent lower feature values. The results show that compared with an SLN-positive ratio ≤ 0.5, an SLN-positive ratio > 0.5 was more likely to be found along with NSLNM. Compared with the low-BMI group and the normal-BMI group, the overweight and obesity group was more likely to develop NSLNM. Compared with the lower class of MG BI-RADS, the higher class of MG BI-RADS was more likely to be found with NSLNM. NSLNM was more likely to occur in the SLN-positive group than in the SLN-negative group. This study also individualized the interpretation of the model and took two typical examples to verify the accuracy of XGBoost: one patient with actual NSLNM (Fig. 7a) and one patient without NSLNM (Fig. 7b).
Arrows demonstrate the effects of different variables on the outcome prediction. Red and blue arrows show whether the variable pushed the prediction toward NSLNM (red) or away from it (blue). The combined effects of all variables provided the final SHAP value, corresponding to the predicted score. The patient with NSLNM had a high SHAP value of 1.57
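The additive property described here, in which the contributions of all variables sum to the predicted score, can be demonstrated on a toy model. The sketch below computes exact Shapley values by enumerating feature coalitions; it is not the SHAP library and not the explainer used in this study, and the toy model, feature names, and baseline are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a coalition value function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            # Weight of coalitions of size r that do not contain f
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                coalition = set(subset)
                total += weight * (value_fn(coalition | {f}) - value_fn(coalition))
        phi[f] = total
    return phi


def toy_model(x):
    """Hypothetical risk score with an interaction term (not the study's model)."""
    return 1.0 * x["a"] + 0.5 * x["b"] + 0.8 * x["a"] * x["b"]


def make_value_fn(model, patient, baseline):
    """Features in the coalition take the patient's value; the rest take the baseline."""
    def value_fn(coalition):
        x = {k: (patient[k] if k in coalition else baseline[k]) for k in patient}
        return model(x)
    return value_fn


patient = {"a": 1, "b": 1}
baseline = {"a": 0, "b": 0}
value_fn = make_value_fn(toy_model, patient, baseline)
phi = shapley_values(value_fn, ["a", "b"])

base_value = value_fn(set())
prediction = value_fn({"a", "b"})
```

In this toy case the contributions are 1.4 for `a` and 0.9 for `b`, and together with the base value of 0 they add up to the prediction of 2.3, which mirrors how the arrows in a force plot combine into the final score.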

Discussion
In this study, we constructed six ML models to predict NSLNM using preoperative and intraoperative clinicopathological features and compared their performance. The XGBoost model showed the best performance, and its predictive ability was also superior to that of the nomogram. The XGBoost model was well explained through the SHAP framework.
In previous studies, lymphovascular invasion (LVI), grade, pathological tumor size, and molecular typing of breast cancer were often included in prediction models for NSLNM [39][40][41][42]. Although the inclusion of these postoperative parameters improved the prediction accuracy, the difficulty in obtaining these parameters preoperatively and intraoperatively may limit their clinical application. A previous study used clinical tumor size to establish predictive models [43]. Therefore, clinical tumor size was put into the predictive models in this study. Murata, T et al. included 804 patients with operable primary breast cancer and showed that NSLNM was more likely to occur with an SLN-positive ratio of ≥ 0.5 than with an SLN-positive ratio of < 0.5 (p = 0.024) [44]. Wang Nana et al. retrospectively analyzed 495 patients and found that patients in the SLN-positive group were more likely to develop NSLNM than those in the SLN-negative group (p < 0.001) [41]. This study also demonstrated that the SLN-positive ratio was an independent predictor of NSLNM. Some scholars [45] have found that NSLNM was closely related to the ultrasound tumor boundary and blood flow signal (p = 0.038, p = 0.036). This study had similar results: of the 338 patients with NSLNM, 26 (7.7%) had clear ultrasound tumor boundaries, 83 (24.6%) had ambiguous boundaries, and 229 (67.8%) had unclear boundaries. Patients with ambiguous or unclear tumor boundaries were more likely to develop NSLNM. In the patients with NSLNM, most patients (84.3%) showed a blood flow signal. The above parameters were not independent predictors of NSLNM, which may be attributed to the fact that SLN-negative patients were also included in this study.
Kuo YL et al. retrospectively analyzed 1496 malignant breast cancer patients and established a nomogram to predict NSLNM. The model showed good predictive performance, and the AUC value of the model was 0.738 [46], but it is not clear whether it was better than the ML model. Mi DU et al. developed an ML model to predict 3-year and 5-year disease-specific survival for oral and pharyngeal cancers and compared its performance with conventional Cox regression, showing that the ML model had better predictive performance [47]. However, no such comparison has been made in breast cancer for predicting NSLNM. In this study, for the first time, the prediction performance for NSLNM was compared between the ML model and the nomogram. The results demonstrated that the AUC value of the XGBoost model was larger than that of the traditional nomogram in the training and test sets (0.781 vs. 0.647; 0.764 vs. 0.706). Compared with traditional regression models, ML models can more accurately identify and analyze the potential relationships between different variables, and their predictive accuracy is particularly suitable for achieving individualized therapy and predictive medicine [48], which will help us better solve clinical problems.
Fig. 7 NSLNM prediction of two typical patients
A deep learning radiomics model has been developed to predict the risk of NSLNM. Its sensitivity for NSLNM was 98.4% (95% CI: 95.6-99.9%), and its negative predictive value was 91.7% (95% CI: 88.8-97.9%) in the validation set [17]. This model has good predictive ability, but the lack of explanation of the model makes it impossible for readers to intuitively understand the prediction results of the ML model, and its complex region of interest (ROI) drawing also limits its application for clinical breast surgeons. In this study, six powerful ML models were developed using clinicopathological features that are easy to obtain, and their predictive performance for NSLNM was compared. All models showed good predictive performance, and the XGBoost model was the best. We visualized the optimal model with SHAP values and graphs. The summary charts show the effects of different variables on NSLNM, among which the SLN-positive ratio had the greatest impact on NSLNM. Compared with an SLN-positive ratio ≤ 0.5, an SLN-positive ratio > 0.5 was more likely to be accompanied by NSLNM. Two typical patients (one with NSLNM and one without NSLNM) were also explained using force diagrams.
Some studies [11][12][13] showed that 40-60% of breast cancer patients who underwent ALND after SLNB had no other lymph node metastases. Some breast cancer patients chose to directly undergo ALND due to poor financial conditions. With ALND comes the problem of lymphedema, which limits the upper limb function of breast cancer patients, leading to worse working ability and lower income, creating a vicious cycle. On the other hand, some patients were found to have NSLNM with negative SLNs (6.2% in this study), which could lead to a second surgery of the axilla. Therefore, accurate prediction of NSLNM is necessary. The XGBoost model in this study showed powerful predictive ability, which could help us avoid overtreatment or undertreatment. There is still a long way to go before this model can be applied to real-world medical settings because it still needs to be tested in different populations. In addition, developing a software application (app) based on this model will be a difficult and time-consuming project.
Although the XGBoost model developed here predicts NSLNM well, this study has some limitations. First, this is a retrospective study conducted at a single institution. The inclusion of multicenter data would be more conducive to model validation. Second, with the exception of the group with breast invasive ductal carcinoma, few patients had other pathological types of breast cancer. If the sample size of patients with other pathological types of breast cancer can be increased, the probability of occurrence of NSLNM in different pathological types can be better compared, and the ML model developed will be more suitable for clinical practice.

Conclusion
The optimal ML model XGBoost was developed using preoperative and intraoperative clinicopathological features and was superior to the traditional nomogram in predicting NSLNM. The SHAP framework can explain how the best model works, intuitively display the influence of characteristic variables on NSLNM, realize the clinical translation of machine learning technology, and assist clinicians in making more individualized and accurate diagnosis and treatment plans.

Fig. 1 The flow chart of patient selection and the flow chart for the development, evaluation, and explanation of models

Fig. 3 Learning curves of different machine learning models

Fig. 4 Performance comparison of different machine learning models

Fig. 5 The confusion matrix of different ML models

Table 1
The relationship between characteristics and non-SLN metastasis

Table 2
Relationship between training set characteristics and NSLNM

Table 3
Results of NSLNM predicted by different ML models
Abbreviations: LR Logistic regression model, DT Decision tree model, SVM Support vector machine model, KNN K-nearest neighbor model, RF Random forest model, XGB Extreme gradient boosting model