Predicting the survival of patients with pancreatic neuroendocrine neoplasms using deep learning: A study based on Surveillance, Epidemiology, and End Results database

Abstract Background The study aims to evaluate the performance of three advanced machine learning algorithms and a traditional Cox proportional hazard (CoxPH) model in predicting the overall survival (OS) of patients with pancreatic neuroendocrine neoplasms (PNENs). Method The clinicopathological dataset obtained from the Surveillance, Epidemiology, and End Results database was randomly assigned to the training set and testing set at a ratio of 7:3. The concordance index (C‐index) and integrated Brier score (IBS) were used to compare the predictive performance of the models. The accuracy of the model in predicting the 5‐year and 10‐year survival rates was compared using the receiver operating characteristic curve, decision curve analysis (DCA) and calibration curve. Results This study included 3239 patients with PNENs in total. The DeepSurv model had the highest C‐index of 0.7882 in the testing set and training set and the lowest IBS of 0.1278 in the testing set compared with the CoxPH, neural multitask logistic and random survival forest models (C‐index = 0.7501, 0.7616, and 0.7612, respectively; IBS = 0.1397, 0.1418, and 0.1432, respectively). Moreover, the DeepSurv model had the highest accuracy in predicting 5‐ and 10‐year OS rates (area under the curve: 0.87 and 0.90). DCA showed that the DeepSurv model had high potential for clinical decisions in 5‐ and 10‐year OS models. Finally, we developed an online application based on the DeepSurv model for clinical use (https://whuh‐ml‐neuroendocrinetumor‐app‐predict‐oyw5km.streamlit.app/). Conclusions All four models analyzed above can predict the prognosis of PNENs well, among which the DeepSurv model has the best prediction performance.


| INTRODUCTION
Neuroendocrine neoplasms (NENs) originate from neuroendocrine cells and can occur in all organs; the most common sites are the gastrointestinal tract, pancreas, and lungs. 1 Although rare, the incidence of NENs has increased significantly over the past decades, reaching 6.98 cases per 100,000 people as reported by the Surveillance, Epidemiology, and End Results (SEER) program. 2 As a subgroup of NENs, pancreatic neuroendocrine neoplasms (PNENs) account for approximately 7% of all NENs, and the highest incidence reaches 0.48 per 100,000 people. 2,3 PNENs are classified as nonfunctional and functional tumors according to whether they have hormone secretion function and hormone-induced clinical symptoms. Nonfunctional PNENs account for 60%-90% of PNENs, and functional PNENs are rare and mainly include insulinomas, glucagonomas, gastrinomas, growth hormone tumors, vasoactive intestinal polypeptide (VIP) tumors, and adrenocorticotropic hormone tumors. 4 Because of the highly heterogeneous biological behavior and complex clinical manifestations of PNENs, early diagnosis and prognostic evaluation remain a great challenge for physicians. According to the SEER database, the median survival time of PNEN patients with distant metastasis is only 27 months, and the 5-year and 10-year overall survival (OS) rates are only 27% and 11%, respectively. 5 Even after radical surgery, the 5-year recurrence and metastasis rates range from 21% to 47%. 6,7 Consequently, accurate models are necessary to predict the prognosis of PNENs.
TNM stage, tumor size, organ invasion, and Ki-67 index have been included in the prognostic systems of the European Neuroendocrine Tumor Society (ENETS) and the American Joint Committee on Cancer (AJCC), but factors such as age, sex, tumor number, and therapy methods are not yet included. 8,9 Therefore, a comprehensive and accurate survival prediction model that incorporates numerous clinicopathological features is needed to support treatment decisions and disease surveillance.
With the continuous development of computer technology, machine learning (ML) technology has provided a new method for tumor diagnosis and management. 10 It can use information from large datasets to create more accurate prediction models. Therefore, we aim to assess the performance of three advanced ML algorithms and a traditional Cox proportional hazard (CoxPH) model in predicting the OS of PNENs in this study. Finally, the best model will be implemented in available software for clinical use.

| Data collection
We retrospectively collected and analyzed data of patients with PNENs from the SEER database (Version 8.4.0; National Cancer Institute, Bethesda, MD) from the 18 cancer registries from 2000 to 2018. SEER data are open to the public, and approval from the local ethics committee is not needed.
The following variables from SEER were obtained: age, sex, marital status, race, primary tumor site, AJCC TNM stage (seventh edition), grade, treatment methods (surgical type, radiotherapy, and chemotherapy), tumor size, tumor number, tumor extension, distant metastasis, survival months and status. In the pathological grading system, PNENs were divided into a well-differentiated group and a poorly differentiated group.

| Feature selection
Correlations between clinicopathological features were calculated using the STATS R package. Highly collinear variables with Pearson correlation values greater than 0.7 were excluded because they could cause overfitting. 11 The prognostic value of each characteristic was evaluated using univariate and multivariate Cox regression. The final features obtained from these analyses were used to develop the models.
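The collinearity filter described above can be illustrated with a minimal Python sketch. The study itself used the STATS R package; the greedy keep-first rule, the function names, and the toy values below are assumptions made purely for illustration, and which member of a correlated pair to drop remains an analyst's choice. Only the 0.7 threshold comes from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_collinear(columns, threshold=0.7):
    """Greedily keep a column only if |r| with every kept column is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: "metastasis" closely tracks "stage"; "age" does not.
toy = {
    "stage": [1, 2, 3, 4, 4, 1, 2, 4],
    "metastasis": [0, 0, 0, 1, 1, 0, 0, 1],
    "age": [60, 45, 71, 52, 66, 58, 49, 63],
}
selected = drop_collinear(toy)  # "metastasis" is dropped as collinear with "stage"
```

In this toy example the correlation between stage and metastasis is about 0.87, so only one of the two is retained.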

| Data preprocessing
We preprocessed the data extracted from the SEER database. Unordered categorical variables were converted into binary indicator variables using the one-hot encoding method. Continuous variables did not require any special treatment. The MissForest method, which is suitable for both continuous and categorical variables, was used to impute the missing data. Its core idea is to treat variables with missing values as dependent variables and the remaining observed variables as independent variables, and to train a random forest to predict the missing values. 12
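As an illustration of the one-hot encoding step only, the following Python sketch expands one unordered categorical variable into binary indicator columns. The helper name and the example values are hypothetical, and the MissForest imputation is not reproduced here.

```python
def one_hot(values, prefix):
    """Expand an unordered categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }

# Hypothetical values for the primary tumor site variable.
site = ["head", "tail", "body", "head", "other"]
encoded = one_hot(site, "site")
```

Each patient then has exactly one indicator set to 1 among the columns derived from the same variable.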

| Model development
In this study, the patients from the SEER database were divided into a training set and a testing set at a ratio of 7:3. Four popular algorithms, namely DeepSurv, neural multitask logistic regression (NMTLR), random survival forest (RSF), and the standard CoxPH model, were selected to develop the prediction models. CoxPH is a semiparametric regression model that analyzes the impact of multiple variables on patient survival. RSF combines random forest and survival analysis methods; it can estimate the risk function and handle right-censored data by aggregating the predictions of multiple decision trees. 13 As a prognostic neural network model based on Cox regression analysis, DeepSurv has been reported to be more stable than linear regression or random survival forest prediction. 14 NMTLR builds on multitask logistic regression, with improvements to the network structure and loss function. Moreover, fivefold cross-validation on the training dataset was used to tune the hyperparameters, and the parameter combination with the best predictive ability was selected.
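The data partitioning described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the random seed, and the interleaved fold assignment are assumptions, and the hyperparameter search itself is omitted.

```python
import random

def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into training and testing lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k=5):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]

# 3239 patients split 7:3, then fivefold cross-validation on the training part.
train_idx, test_idx = train_test_split_indices(3239)
splits = list(k_fold(train_idx))
```

With 3239 patients, this split yields 2268 training and 971 testing indices; each candidate hyperparameter combination would then be fitted on four folds and validated on the fifth.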

| Model performance evaluation
After obtaining the final models, the accuracy and overall prediction performance of each model on the testing and training datasets were evaluated and compared using the concordance index (C-index) 15 and integrated Brier score (IBS). 16 The sensitivity and specificity of the prediction models were evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The clinical usefulness and net benefit of the models were assessed using decision curve analysis (DCA). 17
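To make the C-index concrete, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is a simplified illustration: the function name and toy values are hypothetical, pairs with tied survival times are ignored, and the IBS, which requires censoring weights, is not shown.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ordered correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death has the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    return concordant / comparable

# Hypothetical toy cohort: survival months, event flag (1 = death), risk score.
times = [5, 12, 20, 30]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 comparable pairs -> 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by predicted risk.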

| Model importance
First, we permuted the values of each feature in the testing dataset to calculate the importance of the clinicopathologic characteristics. The accuracy of the model was then recalculated on the permuted data, and the resulting change was used to determine the feature importance. 18
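The permutation procedure can be sketched in Python as follows. The toy scoring function stands in for a fitted survival model and is purely hypothetical, as are the feature names, the number of repeats, and the seed.

```python
import random

def permutation_importance(score_fn, rows, feature, n_repeats=20, seed=0):
    """Mean drop in score_fn(rows) after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = score_fn(rows)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - score_fn(permuted))
    return sum(drops) / n_repeats

def toy_score(rows):
    """Pairwise concordance of a hypothetical model whose risk score is age."""
    pairs = [(a, b) for a in rows for b in rows if a["time"] < b["time"]]
    return sum(a["age"] > b["age"] for a, b in pairs) / len(pairs)

# Hypothetical data: older patients die earlier; "noise" is unused by the model.
rows = [{"age": 80 - 5 * i, "noise": (7 * i) % 5, "time": 10 + 3 * i}
        for i in range(8)]
age_importance = permutation_importance(toy_score, rows, "age")
noise_importance = permutation_importance(toy_score, rows, "noise")
```

Shuffling a feature the model relies on lowers the score, whereas shuffling an unused feature leaves it unchanged, so the size of the drop ranks the features.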

| Algorithm deployment
The "Streamlit" package in Python was used to develop an online application of PNEN survival probability based on the model with the best performance.

| Statistical analysis
Continuous variables such as clinical and pathological features are displayed as the mean value ± standard deviation. Frequencies (n) and percentages (%) were used to represent categorical variables. For statistical analysis, we used the chi-square test and Student's t test. Python (version 3.3.8) was applied to construct the ML models. The date of initial diagnosis to death or last follow-up was used to represent OS. All calculations and analyses were performed in R software (version 4.2.1; the R Foundation for Statistical Computing). p < 0.05 was considered statistically significant in this study.

| Patient characteristics
The study collected data from a total of 3239 patients diagnosed with PNENs in the SEER database from 2000 to 2018, and Table 1 presents the clinicopathological features of these patients. There were 1451 (45.0%) females and 1788 (55.0%) males. The median age at diagnosis was 60 years (range: 50-69 years). A total of 38.0% of patients were not married, and 62.0% were married. Seventy-nine percent of patients were white, 12% were black, and 8.9% were other. According to the primary site of PNENs, 1032 (32%) were in the head of the pancreas, 963 (30%) were in the tail, 420 (13%) were in the body, and 833 (26%) were in other sites. There were 814 (25%) patients in stage I, 603 (19%) in stage II, 103 (3.2%) in stage III, and 1719 (53%) in stage IV. The grading results showed that 81.2% of the tumors were well differentiated, and 18.8% were poorly differentiated. For the surgical method, 1744 (55%) patients did not receive surgery, 1278 (40%) received local or partial pancreatectomy, and 169 (5.3%) received total or extended pancreatectomy. In addition, 94 (2.9%) patients received radiotherapy, and 1093 (34%) patients received chemotherapy. For the tumor size, the median size was 38 mm (range: 24-60 mm). For the tumor number, 2965 (92%) cases had one tumor, and 274 (8.5%) had more than one tumor. Distant metastases occurred in 1719 (53%) patients. Localized tumors accounted for 56%, no vascular invasion accounted for 32%, and

| Feature selection and data preprocessing
The univariate Cox regression analysis revealed that most clinicopathological characteristics except for race and radiotherapy were significantly associated with OS in patients with PNENs (Table 1). According to the results of the multivariate Cox regression, the independent prognostic factors for OS were age (p < 0.001), sex (p < 0.01), stage (p < 0.01) and surgery (p < 0.001). There was a high degree of collinearity between stage and distant metastasis, as shown in Figure 1. Overall, we finally included the following 10 characteristics in the model development: age, sex, marital status, race, primary tumor site, grade, surgery, chemotherapy, tumor size, and tumor extension. Next, according to a ratio of 7:3, we divided the dataset into a training set (2268 cases) and a testing set (971 cases). The data distribution of these features is shown in Table 2.

| Model performance comparisons
To identify the best predictive model, several methods were used to compare the performance of the CoxPH, NMTLR, DeepSurv, and RSF models. Table 3 shows the values of the C-index and IBS for these models.
In the testing set, the DeepSurv model had the highest C-index value of 0.7882 compared with the CoxPH, NMTLR and RSF models (C-index = 0.7501, 0.7616, and 0.7612, respectively). The IBS values of the CoxPH, NMTLR, DeepSurv, and RSF models were 0.1397, 0.1418, 0.1278 and 0.1432, respectively (Table 3; Figure 2). The higher C-index and lower IBS indicated that the DeepSurv model performed better than the other models.
To assess the accuracy of the four models, ROC curves were constructed. The results showed that the DeepSurv model had the best predictive performance for 5- and 10-year OS (Figure 3A,B). In addition, we constructed the ROC curve of the seventh AJCC TNM staging system to compare the predictive models with a commonly used prognostic classification and found that the AUCs of the four models in the testing set were all larger than that of the AJCC staging system for both 5- and 10-year OS (AUC = 0.721 and 0.805) (Figure S1). The DCA results showed that all the prognostic models had good positive net benefit at the 5- and 10-year time points (Figure 3C,D). The calibration plots of 5- and 10-year OS suggested that the concordance between the predicted and observed probabilities was best in the DeepSurv and RSF models, followed by the NMTLR and CoxPH models (Figure 3E,F). The above results suggested that the DeepSurv model had higher performance and accuracy than the NMTLR, RSF and classical CoxPH models in predicting the survival prognosis of patients with PNENs.

FIGURE 1 Correlation coefficients for variables in the dataset. The value of the correlation coefficient varied between +1 and −1. Blue represents a positive correlation, and yellow represents a negative correlation.

| Performance of the models tested by risk stratification of survival time
Patients with PNENs were classified into high- and low-risk groups based on the median risk scores of the CoxPH, NMTLR, DeepSurv, and RSF models (Figure 4A-D). The OS of patients in the high-risk group was shorter than that of patients in the low-risk group in all four models (p < 0.001, Figure 4E-H). These findings indicated that all four models performed well in stratifying the survival probability of PNEN patients. Figure 5 presents the feature importance for the accuracy of the DeepSurv, NMTLR and RSF prognostic models. On average, the C-index decreased by more than 1% when surgery, grade, age, chemotherapy, tumor extension, primary tumor site, or tumor size was permuted.
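The median-based risk stratification can be illustrated with the Python sketch below, which splits a toy cohort at the median risk score and estimates a Kaplan-Meier curve for each group. The function names and data are hypothetical, and the statistical comparison between the groups is not implemented.

```python
from statistics import median

def stratify_by_median(risks):
    """Label a patient 'high' if the risk score exceeds the cohort median."""
    cutoff = median(risks)
    return ["high" if r > cutoff else "low" for r in risks]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability)."""
    curve, survival = [], 1.0
    for t in sorted(set(tm for tm, e in zip(times, events) if e)):
        at_risk = sum(tm >= t for tm in times)
        deaths = sum(1 for tm, e in zip(times, events) if tm == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical toy cohort: model risk score, survival months, event flag.
risks = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3]
times = [6, 40, 12, 55, 9, 30]
events = [1, 0, 1, 1, 1, 0]

groups = stratify_by_median(risks)
high = [i for i, g in enumerate(groups) if g == "high"]
low = [i for i, g in enumerate(groups) if g == "low"]
high_curve = kaplan_meier([times[i] for i in high], [events[i] for i in high])
low_curve = kaplan_meier([times[i] for i in low], [events[i] for i in low])
```

In this toy cohort, every death in the high-risk group occurs before the first death in the low-risk group, which is the pattern a well-separating model would be expected to produce.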

| Development of the online survival application
To better apply the conclusions of this study to clinical practice, we developed a web-based PNEN survival probability application based on the DeepSurv model (https://whuh-ml-neuroendocrinetumor-app-predict-oyw5km.streamlit.app/). This website could provide clinicians, patients, and researchers with the 3-, 5-, and 10-year survival probabilities of PNENs. The illustration of the online application is also shown in Figure 6.

| DISCUSSION
In this study, we used the SEER database to collect the important clinicopathological characteristics of PNEN patients and constructed prognostic models of PNENs through three common ML algorithms (NMTLR, RSF, and DeepSurv) and the classical CoxPH model. We first performed CoxPH regression and collinearity analysis to determine variables related to the prognosis of 3239 patients with PNENs. Finally, age, sex, marital status, race, primary site, grade, surgery, chemotherapy, tumor size, and tumor extension were included in the models. A variety of methods were applied to compare the prediction effectiveness of each model for PNENs. PNEN is a rare heterogeneous tumor that is usually diagnosed at an advanced stage and has a poor prognosis. 19 Early detection and treatment are crucial to improve the prognosis of these patients. At present, there are several systems to assess the prognosis of patients with PNENs. Zhai et al. found that based on the SEER database, grading was superior to TNM staging in predicting the survival prognosis of PNENs. 20 A meta-analysis by Gao et al. showed that surgical margin, G-stage, TNM stage, lymph node, metastasis, vascular invasion and necrosis were associated with the prognosis of PNENs. 21 In addition, one study established a nomogram of recurrence after radical resection of PNENs and found that the number of positive lymph nodes, tumor diameter, Ki-67 index and perineural or vascular invasion were prognostic factors. 22 In addition to these factors, primary tumor site, grade, surgical procedure, chemotherapy, race, and marital status were included in the model of this study.
In recent years, ML has been widely applied in medical imaging, auxiliary diagnosis and disease prediction of NENs. Klimov et al. applied ML to predict metastatic risk in PNENs. 23 The pathological grade of patients with PNENs might be identified precisely using CT combined with ML. 24,25 However, ML has not yet been reported in predicting the survival of PNENs. Therefore, this study aimed to focus on the role of ML models in the prognosis of PNENs. We established NMTLR, RSF, DeepSurv and CoxPH models and then evaluated the performance of these four models with the C-index, IBS, ROC curve, calibration plot and DCA. Our study showed that compared with the CoxPH model, the C-index of the NMTLR, RSF and DeepSurv models was higher in the training set and testing set, indicating that the prediction performance of ML was better than that of CoxPH regression. Among the three ML models, the DeepSurv model had the best prognostic performance. Moreover, the IBS of the DeepSurv model was the lowest, and its AUC was the largest in predicting the 5- and 10-year OS. Additionally, we found that the AUCs of the four models in predicting 5- and 10-year OS were all larger than those of the traditional AJCC staging system. All of these findings suggested that the DeepSurv model was more accurate in predicting the survival of patients with PNENs.
Previous studies often used Cox regression to assess the survival of PNEN patients, which can evaluate the effects of multiple factors on survival time simultaneously but cannot identify complex nonlinear relationships among variables. 26 In contrast, ML can incorporate nonlinear factors well into the impact on the results. As a prognostic neural network model, DeepSurv had more stable prediction ability than linear regression or RSF. 14 Oei et al. found that the conditional survival forest and DeepSurv model were better than the CoxPH model in predicting the survival prognosis of nasopharyngeal carcinoma with the C-index and IBS. 27 Kim et al. found that the DeepSurv model was more precise in predicting the survival rate of oral cancer patients than the RSF and Cox proportional hazards models. 28 Yan et al. found that DeepSurv was the most successful model in predicting the prognosis of chondrosarcoma compared with the RSF, CoxPH, and NMTLR models. 11 Deep learning can convert linear and nonlinear predictor variables into linear combinations to predict the prognosis of patients by using multistage neural networks and reduce the structural bias associated with missing follow-up information, which can predict the survival probability of patients at any time point. 27,29 Therefore, when dealing with large samples and multivariate and nonlinear data, the DeepSurv model has obvious advantages in prediction compared with other models.
Finally, to provide clinicians, patients and researchers with more convenient data visualization, we developed a free online application based on the DeepSurv algorithm in this study. The application can dynamically predict the OS probability of PNEN patients with different clinicopathological features at different time points, and it can be accessed via the website https://whuh-ml-neuroendocrinetumor-app-predict-oyw5km.streamlit.app/.
However, there were some limitations in our study. First, this study only included data from patients with PNENs in some regions of the United States, and we did not verify the established predictive model in the external dataset. The clinicopathological data of PNEN patients from other centers are not easy to obtain due to the particularity of clinicopathological information. In addition, we could only access the inferred data of previously published articles, rather than the original data, which cannot be verified externally. We will collect relevant clinicopathological information from PNEN patients to further evaluate the reliability of the DeepSurv method in future studies. Second, the specific chemotherapy methods, which are significant predictive factors for PNENs, are not included in the SEER database. Third, prospective studies are required to provide more reliable evidence of the predictive performance of the models.

| CONCLUSION
In conclusion, based on the basic clinical characteristics of PNENs in the SEER database, this study established and compared three ML algorithms and a traditional algorithm to predict the survival of patients with PNENs. Our results showed that all four models could predict the prognosis of PNENs well, among which the DeepSurv model had the best prediction performance. As a deep learning algorithm, the DeepSurv model had AUCs of 0.87 and 0.90 in predicting the 5- and 10-year OS, respectively, suggesting potential value for clinical application. We finally developed an application based on the DeepSurv model that could provide reliable individual survival information for patients with PNENs and help with clinical decision-making.

FIGURE 5 Heatmap of feature importance for the DeepSurv, NMTLR and RSF models. The values are expressed as the percentage decrease in the C-index after permutation. Higher values indicate features that are more important to the predictive accuracy of the models. NMTLR, neural multitask logistic regression; RSF, random survival forest.