Prognostic artificial intelligence model to predict 5 year survival at 1 year after gastric cancer surgery based on nutrition and body morphometry

Abstract Background Personalized survival prediction based on large datasets with many variables, including time‐varying factors in nutrition and body morphometry, is important for gastric cancer patients after gastrectomy. One year after gastrectomy may be the optimal time to predict long‐term survival, because most patients experience significant nutritional changes, muscle loss, and other postoperative changes in the first year after gastrectomy. We aimed to develop a personalized prognostic artificial intelligence (AI) model to predict 5 year survival at 1 year after gastrectomy. Methods From a prospectively built gastric surgery registry at a tertiary hospital, we selected 4025 gastric cancer patients (mean age 56.1 ± 10.9, 36.2% females) who were treated with gastrectomy and survived more than 1 year. Eighty‐nine variables, including clinical and derived time‐varying variables, were used as input variables. We proposed a multi‐tree extreme gradient boosting (XGBoost) algorithm, an ensemble AI algorithm based on 100 datasets derived from repeated five‐fold cross‐validation. Internal validation was performed on a split dataset (n = 1121) by comparing our proposed model with six other AI algorithms. External validation was performed on 590 patients from another hospital (mean age 55.9 ± 11.2, 37.3% females). We performed a sensitivity analysis using a leave‐one‐out method to analyse the effect of the nutritional and fat/muscle indices. Results In the internal validation, our proposed model showed an AUROC of 0.8237, outperforming the other AI algorithms (0.7988–0.8165), with 80.00% sensitivity, 72.34% specificity, and 76.17% balanced accuracy. In the external validation, our model showed an AUROC of 0.8903, 86.96% sensitivity, 74.60% specificity, and 80.78% balanced accuracy. Sensitivity analysis demonstrated that the nutritional and fat/muscle indices increased the balanced accuracy by 0.31% and 6.29% in the internal and external validation sets, respectively.
Our AI model has been published on a website for personalized survival prediction. Conclusions Our proposed AI model performs well in predicting 5 year survival at 1 year after gastric cancer surgery. The nutritional and fat/muscle indices contributed to the prediction performance of our AI model.


Introduction
Gastrectomy is a pivotal treatment that offers patients with gastric cancer the possibility of cure. 1 Diagnosis at earlier stages, the introduction of perioperative chemotherapy, and advances in surgical techniques have enabled clinicians to improve survival in patients with this malignancy. 2 As the number of long-term survivors increases, precision medicine and individualized stratification of patient prognosis are gaining emphasis, because the tumour, node, metastasis (TNM) staging system does not provide accurate predictions of patient survival during and after treatment. 3 Many prognostic models, including nomograms, scoring systems, and artificial intelligence (AI) models, have been developed to predict the overall survival of patients after surgery. [4][5][6][7] However, none of these models has been used extensively in clinical practice, because of their limited accuracy in predicting patient survival across various situations.
We hypothesized that one of the main reasons for this inaccuracy is the limited number of prognostic variables available to build an adequate model. For simplicity and uniform application, prior models depended on a few known variables, such as TNM stage, age, sex, tumour location, tumour histology, and the extent of surgery. 3,6,8 However, recent studies demonstrated that additional variables, such as nutrition, sarcopenia, anaemia, and interval changes in these variables between the preoperative and postoperative periods, could affect patient survival in gastric cancer. 9,10 Based on these findings, we assumed that higher prognostic accuracy could be achieved by an in-depth analysis of as many variables as possible that can be easily obtained from clinical data and from imaging-derived body morphometry data.
Based on prior research, we postulated that the optimal time to predict long-term survival would be 1 year after gastrectomy, when patients have recovered from the various changes caused by surgery and adjusted to a new metabolic balance. 9,10 In addition, time-varying host factors, such as interval changes in muscle mass and nutrition between the preoperative and postoperative periods, may influence long-term survival and thus should be included in a prognostic model. 4 Deep learning might be a better tool than conventional approaches such as the Cox proportional hazards regression model for constructing a prognostic model that consists of many variables. Deep learning has advantages in handling big clinical data with non-linear effects, interactions between variables, and time-varying effects, such as changes from before to after surgery. 3 A few encouraging studies have built prognostic models for patients treated with gastrectomy using deep learning AI techniques 3,7,11,12 ; however, these models have shortcomings, including insufficient patient cohort size, a limited number of variables, exclusion of non-disease-related variables, and an inadequate external validation process.
In this study, we aimed to develop and validate an AI prognostic model for predicting 5 year survival at 1 year after gastric cancer surgery using big data sets (>4000 cases) with many patient-related variables, including nutrition, skeletal muscle mass, visceral and subcutaneous adipose tissue mass, sarcopenia, obesity, co-morbidities, and interval changes in these variables between the preoperative and postoperative periods, as well as cancer-related variables.

Methods
This study was approved by two institutional review boards (IRBs): Asan Medical Center (AMC), Seoul, Korea (IRB No. 2017-0216) and Ajou University Hospital (AUH), Suwon, Korea (AJIRB-MED-MDB-22-012). The requirement for informed consent was waived by both IRBs. All methods were performed in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines. 13

Patient data for AI model development
We used the comprehensive gastric cancer surgery registry that was prospectively built at AMC between 2003 and 2012. This registry contained data from 6229 patients with 63 clinicopathologic variables. Of these 6229 patients, we excluded those who died within 1 year after surgery or were lost to follow-up (n = 688). We further excluded 1516 patients who were missing more than 19 variables (30% of the clinical variables) or who lacked abdominal CT scans from the preoperative period or from 1 year after gastrectomy. In total, data from 4025 patients (mean age 56.1 ± 10.9, 36.2% females) from AMC were used to develop our AI model.
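The exclusion steps above can be expressed as a simple eligibility filter. This is a minimal sketch only: the `Patient` fields and the `eligible` function are hypothetical names rather than the registry's schema, and the threshold follows the text's "more than 19 variables" rule.

```python
from dataclasses import dataclass


# Hypothetical per-patient summary; the real registry schema is not described in the text.
@dataclass
class Patient:
    alive_at_1yr: bool        # survived the first postoperative year
    followed_up: bool         # follow-up was not missed
    n_missing_clinical: int   # how many of the 63 clinical variables are missing
    has_preop_ct: bool        # abdominal CT available before surgery
    has_1yr_ct: bool          # abdominal CT available at 1 year after gastrectomy


MAX_MISSING = 19  # "more than 19 variables (30% of clinical variables)" excludes a patient


def eligible(p: Patient) -> bool:
    """Apply the two exclusion steps described for the AMC registry."""
    # Step 1: died within 1 year or lost to follow-up (n = 688 in the text)
    if not (p.alive_at_1yr and p.followed_up):
        return False
    # Step 2: too many missing variables or a missing CT scan (n = 1516 in the text)
    if p.n_missing_clinical > MAX_MISSING:
        return False
    return p.has_preop_ct and p.has_1yr_ct


# The reported counts are consistent: 6229 - 688 - 1516 = 4025
assert 6229 - 688 - 1516 == 4025
```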
The 63 clinical variables were classified into demographic variables, physical indices, laboratory results, nutritional index, fat/muscle indices, surgery-related variables, clinicopathological information, and co-morbidities. Table 1 summarizes the 63 clinical variables used to create our AI model and their statistical summaries in the survived and deceased groups. For the physical index variables, we used height, weight, and body mass index (BMI).
The laboratory results included cholesterol, haemoglobin, albumin, and protein values. For the nutritional index, we used the nutritional risk index (NRI), calculated as NRI = 1.519 × serum albumin (g/L) + 0.417 × (present weight/usual weight × 100). 14 For body morphometry data, including fat/muscle indices, we used an artificial intelligence solution (AID-U™, iIAD Inc., Seoul, Korea) to measure the subcutaneous fat area (SFA), visceral fat area (VFA), and skeletal muscle area (SMA) at the level of the L3 vertebral body on abdominal CT scans. The skeletal muscle mass index (SMI) was calculated as the SMA divided by the height squared (SMA/height²) or by the body mass index (SMA/BMI). 15 For the surgical information, we used the type of gastrectomy, the type of anastomosis, treatment intention, history of previous gastric surgery or endoscopic submucosal dissection (ESD), operation method (open vs. laparoscopic approach), extent of lymph node dissection, and length of proximal and distal resection margins.
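The NRI and muscle indices described above reduce to a few lines of arithmetic. The sketch below assumes that albumin is in g/L, weight in kg, SMA in cm², and height in metres, so SMA/height² is in cm²/m². The function names are illustrative and not part of the study's code. Note that 0.417 × (ratio × 100) equals 41.7 × ratio.

```python
def nutritional_risk_index(albumin_g_l: float, present_wt: float, usual_wt: float) -> float:
    """NRI = 1.519 x serum albumin (g/L) + 0.417 x (present weight / usual weight x 100)."""
    return 1.519 * albumin_g_l + 0.417 * (present_wt / usual_wt * 100)


def smi_by_height(sma_cm2: float, height_m: float) -> float:
    """Skeletal muscle index normalized by height squared (cm^2/m^2)."""
    return sma_cm2 / height_m ** 2


def sma_per_bmi(sma_cm2: float, bmi: float) -> float:
    """Skeletal muscle area adjusted for body mass index."""
    return sma_cm2 / bmi


# Example: albumin 40 g/L with no weight loss gives 60.76 + 41.7 = 102.46
print(round(nutritional_risk_index(40.0, 60.0, 60.0), 2))
```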
For clinicopathological data, we used cancer stage, number of tumours, tumour size, number of metastatic lymph nodes, number of retrieved lymph nodes, the presence and diameter of extranodal extension of a metastatic lymph node, the presence of lymphovascular invasion and perineural invasion by tumour cells, gross appearance of advanced and early gastric cancer, pathological tumour type, Lauren classification, and tumour location. [16][17][18] For co-morbidities, we included diabetes mellitus, hypertension, chronic active hepatitis, liver cirrhosis, tuberculosis, myocardial infarction, cerebrovascular accidents, valvular heart disease, chronic obstructive pulmonary disease, asthma, and chronic renal failure in the input variables.
Interval changes in variables between the preoperative and 1 year postoperative records were recorded as time-varying measurements. Changes in physical indices, laboratory results, the nutritional index, and fat/muscle indices were calculated as the time-varying indices. These variables were weight, BMI, cholesterol, haemoglobin, albumin, protein, NRI, SFA, SMA, SMI, and SMA/BMI.
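One way to derive the time-varying indices is to compute, for each variable, the change from the preoperative value to the 1 year postoperative value. The text does not give the exact formulas, so this sketch makes three assumptions. The difference is taken as postoperative minus preoperative. The percent difference is expressed relative to the preoperative value. The `_pre`, `_post`, `_diff`, and `_pctdiff` suffixes are hypothetical names.

```python
from typing import Dict, Optional

TIME_VARYING = ["weight", "BMI", "cholesterol", "haemoglobin", "albumin", "protein",
                "NRI", "SFA", "SMA", "SMI", "SMA/BMI"]


def interval_change(pre: Optional[float], post: Optional[float]):
    """Return (post - pre, percent change relative to pre); None when a value is missing."""
    if pre is None or post is None:
        return None, None
    diff = post - pre
    pct = diff / pre * 100 if pre != 0 else None  # undefined for a zero baseline
    return diff, pct


def add_time_varying(record: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Add <var>_diff and <var>_pctdiff for each time-varying variable in a patient record."""
    out = dict(record)
    for var in TIME_VARYING:
        diff, pct = interval_change(record.get(f"{var}_pre"), record.get(f"{var}_post"))
        out[f"{var}_diff"] = diff
        out[f"{var}_pctdiff"] = pct
    return out
```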

Patient data for external validation
Data from a total of 590 patients (mean age 55.9 ± 11.2, 37.3% females) were used for external validation of our AI model. The data were collected from a comprehensive gastric cancer registry prospectively built at AUH between 2010 and 2012. In the AUH dataset, only 28 of the 63 clinical variables were available. Table 2 summarizes the available clinical variables from the AUH dataset. The AUH clinical variables did not include all of the co-morbidities. Only cancer stage was available among the clinicopathological information, and the surgical information included only the types of surgery and anastomosis. The other clinical variables, such as age/sex, physical indices, laboratory results, and body morphometry data with fat/muscle indices, were all measured, except for protein values (Table S1). Evaluating the developed AI model with only a fraction of the clinical variables was challenging but worthwhile. These variables were mostly age/sex, physical indices, laboratory information, and fat/muscle indices.

Final variables with derived time-varying variables
For the AI model to predict 5 year survival, 63 clinical variables were used. The derived time-varying variables were added to these 63 variables, giving a total of 89 input variables (Table S2). The external validation data had 28 variables available, which were extended to 54 variables (Table S2).

Data split and cross-validation
In this study, our data are composed of training, internal validation, and external validation data. The AMC data were split into training and internal validation data at a ratio of 8:2 in a stratified fashion. The internal validation dataset was used only for an independent test of the developed AI model and not for training. In addition, the whole AUH dataset was used only for external validation and never for model training. Table S3 summarizes the datasets for training, internal validation, and external validation.
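A stratified 8:2 split keeps the proportion of deceased and survived patients similar in the training and internal validation sets. The sketch below illustrates this with the standard library and a hypothetical function name. In practice, scikit-learn's `train_test_split(..., test_size=0.2, stratify=y)` performs the same kind of split.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class contributes ~test_fraction of its members to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])   # held-out share of this class
        train.extend(idx[n_test:])  # remaining share goes to training
    return sorted(train), sorted(test)
```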
A five-fold cross-validation was performed and repeated 20 times to confirm the model's generalization ability using the training data. The training dataset (n = 3220) was first randomly shuffled and divided into five equal groups in a stratified manner. Subsequently, four groups were selected for training the model, and the remaining group was used for testing. This process was repeated five times by changing which group was the testing data. The whole process was repeated 20 times. The finalized AI model was based on the repeated five-fold cross-validation and is described in subsequent sections. We evaluated the performance of the AI model using the internal validation data and the external validation data.
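The repeated five-fold scheme produces 5 × 20 = 100 train/test partitions of the training data. The sketch below is a minimal standard-library version with a hypothetical function name. The usual library equivalent is scikit-learn's `RepeatedStratifiedKFold(n_splits=5, n_repeats=20)`.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(labels: Sequence[int], n_splits: int = 5, n_repeats: int = 20,
                              seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles and re-partitions every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    all_idx = range(len(labels))
    for _ in range(n_repeats):
        folds: List[List[int]] = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin into the folds to keep class ratios similar
            for k, i in enumerate(shuffled):
                folds[k % n_splits].append(i)
        for k in range(n_splits):
            test_set = set(folds[k])
            train = [i for i in all_idx if i not in test_set]
            yield train, sorted(test_set)
```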

Preprocessing
There were missing values in the AMC (training and internal validation data) and AUH (external validation data) datasets (Table S4). The mean ± standard deviation proportions of missing data in the AMC and AUH datasets were 26.7 ± 30.4% and 0.1 ± 0.1%, respectively. Note that the percentages of missing data for AUH were calculated only for the available variables summarized in Table 2. Missing values in the training data were replaced with the mean of that variable across the training, internal validation, and external validation datasets. The same replacement method was also applied to the unavailable variables in the AUH dataset.
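The imputation rule described above has two parts. Each missing value is replaced with the variable's mean over all three datasets, and variables absent from the AUH data are filled in the same way. The sketch below represents records as dictionaries with `None` for missing values. That representation and the function names are assumptions for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def pooled_means(datasets: List[List[Record]], columns: List[str]) -> Dict[str, float]:
    """Mean of the observed values of each column, pooled across all datasets."""
    means = {}
    for col in columns:
        values = [r[col] for ds in datasets for r in ds if r.get(col) is not None]
        if values:  # a column with no observed values gets no mean and stays missing
            means[col] = sum(values) / len(values)
    return means


def impute(records: List[Record], means: Dict[str, float]) -> List[Record]:
    """Fill missing or absent columns with the pooled mean."""
    filled = []
    for r in records:
        new = dict(r)
        for col, m in means.items():
            if new.get(col) is None:
                new[col] = m
        filled.append(new)
    return filled
```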
Dataset standardization, a common requirement for machine learning estimators, was performed. Standardization transforms each variable to have a mean of zero and a standard deviation of one using the equation

$$x_{\text{std}} = \frac{x - \text{mean}(\text{train})}{\text{SD}(\text{train})}$$

where mean(train) and SD(train) are the mean and standard deviation of each variable in the training dataset. The standardization was applied to the training, internal validation, and external validation datasets.
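Standardization is fitted on the training data only and then applied unchanged to all three datasets. The text does not specify two details, so this sketch assumes them. It uses the population standard deviation, as scikit-learn's `StandardScaler` does, and it leaves constant columns unscaled.

```python
import statistics
from typing import List, Tuple


def fit_standardizer(train_col: List[float]) -> Tuple[float, float]:
    """Return mean(train) and SD(train) for one variable."""
    mean = statistics.fmean(train_col)
    sd = statistics.pstdev(train_col)
    return mean, (sd if sd > 0 else 1.0)  # avoid division by zero for constant columns


def standardize(col: List[float], mean: float, sd: float) -> List[float]:
    """Apply x_std = (x - mean(train)) / SD(train)."""
    return [(x - mean) / sd for x in col]
```

The same `(mean, sd)` pair from the training column is reused when transforming the internal and external validation columns.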

Multi-tree extreme gradient boosting
The extreme gradient boosting (XGBoost) model was adopted to develop the AI model to predict the 5 year survival. 19 In this study, we trained the XGBoost model using the five-fold cross-validation method and repeated this 20 times using the training data. Then, the results were ensembled into 100 trees with soft voting. Figure S1 illustrates the ensemble AI model, based on the combination of the 100 trees produced by the XGBoost algorithms. Each tree was produced from the XGBoost algorithm by setting the maximum depth to 2, the learning rate to 0.1, the number of tree estimators to 50, the value of the regularization parameter α to 0.8, the fraction of observations to 0.2, and the fraction of columns to 0.8.
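The hyperparameters and the soft-voting step can be sketched as below. The mapping of the text's settings onto `xgboost.XGBClassifier` keyword arguments is an assumption. In particular, "fraction of observations" is read as `subsample` and "fraction of columns" as `colsample_bytree`. The 0.5 decision threshold is also hypothetical. Soft voting here means averaging each model's predicted probability of the positive class.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed mapping onto xgboost.XGBClassifier(**XGB_PARAMS); not confirmed by the text.
XGB_PARAMS: Dict[str, float] = {
    "max_depth": 2,           # maximum tree depth
    "learning_rate": 0.1,
    "n_estimators": 50,       # number of boosted trees per model
    "reg_alpha": 0.8,         # L1 regularization term (alpha)
    "subsample": 0.2,         # fraction of observations sampled per tree
    "colsample_bytree": 0.8,  # fraction of columns sampled per tree
}


def soft_vote(model_probs: Sequence[Sequence[float]],
              threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Average positive-class probabilities over models, then threshold the average.

    model_probs[m][i] is model m's predicted probability for patient i.
    """
    n_models = len(model_probs)
    averaged = [sum(p) / n_models for p in zip(*model_probs)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels
```

If the xgboost package is installed, each of the 100 cross-validation runs would supply one fitted `XGBClassifier(**XGB_PARAMS)`. Calling `model.predict_proba(X)[:, 1]` on each model gives the per-model probabilities passed to `soft_vote`.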
In this study, the number of deceased patients was much lower than the number of survived patients. Thus, for each tree from XGBoost, we up-sampled the deceased patient data using the synthetic minority over-sampling technique (SMOTE). This balanced the two groups and aimed to prevent the model from being biased toward the survived patient data. After building the multi-tree XGBoost model, we investigated the contribution of each of the 89 variables to the prediction of survival via a variable importance analysis. The repeated five-fold cross-validation provided 100 sets of variable importances. We then averaged and normalized these sets so that the values from each classifier ranged from zero to one.
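SMOTE creates synthetic minority-class samples (here, deceased patients) by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified standard-library illustration of that idea, with hypothetical function names. The study reports using the imbalanced-learn library, whose class is `imblearn.over_sampling.SMOTE` (default `k_neighbors=5`). This code is not that library.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic samples from the minority class (requires >= 2 samples)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean distance), excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to its neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(minority: Sequence[Vector], n_majority: int, **kwargs) -> List[Vector]:
    """Up-sample the minority class until it matches the majority class size."""
    return list(minority) + smote(minority, max(0, n_majority - len(minority)), **kwargs)
```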
To compare the performance of our proposed predictive AI model, we separately trained the following models: random forest (RF), 20 gradient boosting machine (GBM), 21 adaptive boosting (AdaBoost), 22,23 light gradient boosting machine (LightGBM), 24 categorical boosting (CatBoost), 25 and ensemble models including XGBoost. 19 The machine learning models were implemented and analysed using Imbalanced-learn (version 0.8.

Performance evaluation of AI models
The performance of our AI model was evaluated and compared based on repeated K-fold cross-validation using the isolated testing data. The model was additionally evaluated with the external validation data. Sensitivity, specificity, accuracy, and balanced accuracy were defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{BalancedAccuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2},$$

where true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are the counts used in the evaluations. The balanced accuracy accounts for the imbalance between the survived and the deceased groups. In addition, we computed the area under the receiver operating characteristic curve (AUROC).
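The metrics above can be computed directly from confusion-matrix counts. The AUROC can be computed from predicted scores via its rank interpretation. It is the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. This section does not state which class is positive, so the sketch simply assumes that label 1 is positive. The function names are illustrative.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and balanced accuracy from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a check on the definition, a sensitivity of 80.00% and a specificity of 72.34% give a balanced accuracy of (80.00 + 72.34)/2 = 76.17%.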

Sensitivity analysis
In developing the AI model, we included variables corresponding to pre- and postoperative nutritional and body morphometry data, such as NRI, SFA, VFA, SMA, SMI, and SMA/BMI. To investigate the effect of these variables, we adopted a leave-one-out method. We excluded the pre- and postoperative nutritional and fat/muscle indices from the input variables and repeated the training of the AI model. We then evaluated the prediction performance on the cross-validation, internal validation, and external validation data.
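The sensitivity analysis removes one group of input columns and retrains the model. Each metric is then reported as the new model's value minus the full model's value, the "diff" convention used for Table 7. This is a sketch under assumptions: the column-naming scheme (`NRI_pre`, `SMA/BMI_diff`, and so on) and all helper functions are hypothetical.

```python
from typing import Dict, Iterable, List

# Base names of the nutritional and fat/muscle indices listed in the text
NUTRITION_FAT_MUSCLE = {"NRI", "SFA", "VFA", "SMA", "SMI", "SMA/BMI"}
SUFFIXES = ("_pctdiff", "_diff", "_pre", "_post")  # hypothetical column suffixes


def base_name(column: str) -> str:
    """Strip a time-point or change suffix, e.g. 'NRI_post' -> 'NRI'."""
    for suffix in SUFFIXES:
        if column.endswith(suffix):
            return column[: -len(suffix)]
    return column


def drop_group(columns: Iterable[str], group=NUTRITION_FAT_MUSCLE) -> List[str]:
    """Keep only columns whose base variable is not in the excluded group."""
    return [c for c in columns if base_name(c) not in group]


def metric_diff(reduced: Dict[str, float], full: Dict[str, float]) -> Dict[str, float]:
    """Per-metric difference: reduced model minus full model (negative = performance drop)."""
    return {k: reduced[k] - full[k] for k in full}
```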

Public website deployment
We deployed the AI model on a public web server (http://airesearch.co.kr/survival) through Amazon Web Services, which provides a secure, durable, and scalable service. After accessing the website, a user enters the clinical variables, which are encoded by the website's server, and users can immediately obtain the predicted 5 year survival. There is no need to enter private information other than the clinical variables. The entered information is immediately deleted after the prediction is derived, so there is no risk of information exposure. Code is available at https://github.com/HeewonChung92/Gastric_Cancer_Survival.

Results
Variable importance rankings
Table 3 shows the ranked averages of the variable importance values from XGBoost across the repeated K-fold cross-validation. The results showed that 25 variables contributed to the prediction of survival. Among these variables, age had the highest importance value, followed by preoperative albumin, preoperative NRI, T stage, and the percent difference in haemoglobin. In contrast, co-morbidities and surgical information did not contribute to the prediction of survival. Furthermore, among the top 25 variables, 15 were pre- and postoperative variables, including physical indices, laboratory results, the nutritional index, and fat/muscle indices. These pre-and

K-fold cross-validation results
Based on the repeated five-fold cross-validation, the AI model showed a sensitivity of 76.77%, a specificity of 75.26%, an accuracy of 75.32%, a balanced accuracy of 76.01%, and an AUROC of 0.8118 (Table 4). All five values were higher than those of the other models, including RF, GBM, AdaBoost, LightGBM, CatBoost, and Ensemble.

Internal validation results
Using the isolated split data (n = 1121) only for internal validation, the AI model showed a sensitivity of 80.00%, a specificity of 72.34%, an accuracy of 72.67%, a balanced accuracy of 76.17%, and an AUROC of 0.8237. Table 5 summarizes the internal validation data results in comparison with other machine learning algorithms. The results showed that our AI model provided higher values of balanced accuracy and AUROC than those from any other model.

External validation results
With the independent external validation data (n = 590), our AI model showed a sensitivity of 86.96%, a specificity of 74.60%, an accuracy of 75.08%, a balanced accuracy of 80.78%, and an AUROC of 0.8903 (Table 6). Performance was higher on the external validation data than on the internal validation data. Sensitivity increased from 80.00% to 86.96%, specificity from 72.34% to 74.60%, accuracy from 72.67% to 75.08%, balanced accuracy from 76.17% to 80.78%, and AUROC from 0.8237 to 0.8903. This improvement occurred even though the external validation data had only 58 extended variables (28 original variables), compared with the 89 variables used to train the model. Although the external validation data had fewer variables, they included enough variables with high importance values. Of the top 25 variables in the training dataset, 17 were available in the external validation dataset. Specifically, the external validation data included six of the top seven predictive variables: age, preoperative albumin, preoperative NRI, the percent difference in haemoglobin, cancer stage, and 1 year postoperative NRI. The external validation dataset also had less missing pre- and postoperative data than the internal validation dataset. In the training and internal validation datasets, four of the top 25 variables had the following proportions of missing data: 1 year postoperative NRI (53.9%), SMI (69.9%), and VFA (69.8%), and the difference in NRI (53.9%). In the external validation dataset, the corresponding proportions were only 0.2%, 0.0%, 0.0%, and 0.2%. The smaller amount of missing data in the pre- and postoperative variables may explain the improved performance of the predictive model.
Sensitivity analysis results
Table 7 compares our AI model with a new AI model trained without the nutritional and fat/muscle indices. The difference (diff) is the metric value of the new model without the nutritional and fat/muscle indices minus that of our AI model. On the cross-validation dataset, the balanced accuracy and AUROC decreased by 1.70% and 0.0223, respectively. On the internal validation dataset, they decreased by 0.31% and 0.0076, respectively. On the external validation dataset, they decreased by 6.29% and 0.0213, respectively. These results indicate that the nutritional and fat/muscle indices improved the prediction performance of our AI model.

Website deployment
The web application provides the 5 year survival probability at 1 year after gastric surgery, as shown in Figure 1. A user enters the 89 quantified clinical variables (Figure 1(A)) and is provided with a 5 year survival prediction (Figure 1(B)).

Discussion
Our 5 year predictive AI model, multi-tree XGBoost, predicted 5 year survival at 1 year after gastric surgery with high accuracy using data from two medical institutions. On the internal validation dataset (data from AMC), it achieved a balanced accuracy of 76.17% and an AUROC of 0.8237. On the external validation dataset (data from AUH), it achieved a balanced accuracy of 80.78% and an AUROC of 0.8903. We compiled our study datasets to have several unique characteristics. First, we incorporated as many variables in the model as possible, initially 63 variables, to reflect modern clinical practice and patient characteristics. Second, we incorporated variables beyond routine clinicopathological data, including nutritional and fat/muscle indices such as NRI, SFA, VFA, SMA, SMI, and SMA/BMI. Third, both preoperative and postoperative variables were included in our model to reflect time-varying effects. Using preoperative or postoperative variables alone might not reflect the physiological changes after gastrectomy. Finally, we trained our AI model on a large, well-curated dataset of 4025 patients from a high-volume specialty centre. To the best of our knowledge, this study used the largest cohort for an AI predictive model for patients with gastric cancer treated with gastrectomy, which enabled us to build an AI model with high prediction accuracy.
In this study, we adopted the Ensemble XGBoost model. XGBoost is a widely recognized machine learning approach known for its efficiency and accuracy. As a boosting algorithm, it integrates multiple tree models to improve prediction accuracy. We applied ensemble learning with soft voting to further enhance prediction accuracy. We postulated that this technical approach was well suited to an AI prediction model with many variables. Compared with the other AI algorithms, including RF, GBM, AdaBoost, LightGBM, CatBoost, and Ensemble, our Ensemble XGBoost model yielded the highest balanced accuracy and AUROC.
Our results also demonstrated, through the leave-one-out sensitivity analysis, the effect of the nutritional and fat/muscle indices on survival prediction. When we excluded the pre- and postoperative nutritional and fat/muscle indices, the prediction accuracy decreased in both the internal and external validation datasets. In addition, the feature importance analysis showed that most nutritional and fat/muscle indices were relatively high contributors to the prediction of 5 year survival among the 64 variables. Their ranks were preoperative NRI (3rd), 1 year postoperative NRI (7th), 1 year postoperative SMI (11th), 1 year postoperative VFA (12th), 1 year postoperative protein (14th), 1 year postoperative cholesterol (15th), preoperative weight (19th), preoperative BMI (21st), and difference in nutritional risk index (25th). These results showed that incorporating the nutritional and fat/muscle indices improved the prediction performance of the AI model. Previous studies reported that prolonged malnutrition early in life increased the risk of gastric cancer mortality later in life. 26 The majority of patients with gastric cancer experience cancer anorexia-cachexia syndrome, with weight loss, reduced appetite, fatigue, and weakness. 27 In addition, malnutrition before and after gastrectomy has been reported to significantly and adversely affect overall survival. 28 To our knowledge, our study is the first to use pre- and postoperative nutritional and fat/muscle indices in machine learning algorithms; previous studies performed only statistical analyses of patient data. [26][27][28][29][30][31] The main causes of post-gastrectomy death vary with time after gastrectomy. 32 Our AI model is intended to predict long-term survival at 1 year after gastrectomy using big data sets that include patients' nutritional status and body morphometry.
Our AI model is not intended to predict early post-gastrectomy mortality, which is mainly attributed to old age, metabolic/nutritional imbalance, tumour recurrence, and postoperative complications. 33,34 Therefore, we excluded all patients who died within 1 year after gastrectomy and did not include postoperative complications in our model. We also did not include surgeon factors, such as extent of experience and surgical skill level. As a high-volume centre, we perform about 1800 gastrectomies per year. We have made considerable efforts to standardize operative procedures as well as patient management processes, as reported previously. 35 Thus, we did not consider the surgeon factor in our AI model.
Our model is available as a web toolkit, so anyone can use it. Currently, the application does not store any information entered by users. However, we plan to store information entered by users, with their agreement, to improve the AI model via a real-time learning process. We will use our web application to acquire additional data and perform real-time training to update the model. Although our AI model demonstrated high accuracy in predicting 5 year survival, several limitations exist. First, our AI model was trained using data from a single high-volume specialty gastrectomy centre. Because our institution performs approximately 1800 gastric surgeries every year, our patients' survival might exceed that reported by other institutions or researchers. In addition, we validated our AI model using a single external institution that is also a high-volume specialty gastrectomy centre. Next, our data included only Korean patients. In future studies, we will train and apply our AI model on datasets comprising more diverse subjects. To overcome these generalization issues, it may be necessary to validate our AI model using external datasets, such as data from various medical institutions. In addition, we plan to further develop our AI model using extended variables. Finally, the training dataset had a high percentage of missing data. Notably, most pre- and postoperative variables corresponding to nutritional and fat/muscle indices had 70% or more missing data. Nevertheless, we achieved high accuracy, which might be attributed to the imputation method used to replace missing values in the training dataset. In general, accuracy is higher on imputed datasets than on incomplete datasets. 36 In the future, it will be important to collect more data with as much information as possible.
In conclusion, we developed an AI model that predicts, at 1 year after surgery, the 5 year survival probability of gastric cancer patients treated with gastrectomy. The model was built on a large training cohort and many variables, including pre- and postoperative nutritional and fat/muscle indices. Its prediction of 5 year survival was accurate overall and may help healthcare providers and patients improve survival after gastrectomy.