Machine Learning Models of Acute Kidney Injury Prediction in Acute Pancreatitis Patients

Background. Acute kidney injury (AKI) has long been recognized as a common and important complication of acute pancreatitis (AP). In this study, machine learning (ML) techniques were used to establish predictive models for AKI in AP patients during hospitalization. This is a retrospective review of prospectively collected data of AP patients admitted to our department within one week after the onset of abdominal pain from January 2014 to January 2019. Eighty patients developed AKI after admission (AKI group) and 254 patients did not (non-AKI group). Using information such as demographic characteristics and laboratory data, support vector machine (SVM), random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost) were used to build AKI prediction models, and their predictive performance was compared with that of the classic logistic regression (LR) model. Among the machine learning models, XGBoost performed best in predicting AKI, with an AUC of 91.93%. The AUC of the logistic regression model was 87.28%. The present findings suggest that, compared with the classical logistic regression model, machine learning models using features that can be easily obtained at admission performed better in predicting AKI in AP patients.


Introduction
Acute pancreatitis (AP) is an inflammatory condition of the exocrine pancreas, and most AP patients have a mild disease course and recover within one week [1]. About 20% of patients develop severe complications such as persistent organ failure and systemic inflammatory response syndrome (SIRS). Acute kidney injury (AKI) has long been recognized as a common and important complication of AP, with an incidence as high as 10%-42% [2,3]. Furthermore, AP patients with concomitant AKI have a poor prognosis, with a mortality of 25%-75% [4][5][6][7]. Hence, the early identification and timely management of AKI in AP patients are very important.
However, it is difficult to identify renal injury early using traditional indicators, mainly because kidney damage has already occurred by the time serum creatinine (SCr) increases or urine output decreases [8].
Previous studies have identified a series of risk factors for predicting AKI, including triglyceride levels, age, male sex, procalcitonin, hypoxemia, abdominal compartment syndrome, and some biomarkers [9], and developed several AKI prediction models using classical regression methods [10][11][12]. However, their predictive performance was rarely reported in terms of the area under the receiver operating characteristic curve (AUROC), the primary measure of a prediction model [13]. Furthermore, the classical logistic regression model is sensitive to multicollinearity among the independent variables, which can make its estimates unstable and its predictions less accurate. Recently, artificial intelligence applications based on machine learning have been gradually implemented in the medical field [14][15][16], with excellent performance reported in predicting complications compared to logistic regression analysis. Supervised learning and unsupervised learning are two widely used types of machine learning. Unsupervised learning mainly deals with unlabeled data and allows the model to discover information on its own. Supervised learning, such as random forest [17], classification trees [18], and extreme gradient boosting [19], learns from labeled training data and predicts outcomes for unseen data. However, few studies have used machine learning approaches to predict acute kidney injury in AP patients.
Therefore, in this study, we aimed to develop AKI predictive models for AP patients using different machine learning algorithms, namely, classification and regression tree (CART), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), and to compare their predictive performance with that of the classical multivariable logistic regression (LR) method.

2.1. Patients.
We performed a retrospective observational study of AP patients admitted to the Center of Severe Acute Pancreatitis (CSAP) of Jinling Hospital, Nanjing, China, from January 2014 to January 2019. The center is a tertiary center for acute pancreatitis located in eastern China. Patients who met the following criteria were included: (1) diagnosis of AP and (2) admission to our department within one week after disease onset. To minimize bias, we excluded patients who were older than 75 or younger than 18 years, had already developed AKI before admission, or had suspected chronic pancreatitis, pancreatic tumors, pancreatic trauma, or pregnancy. All data were retrieved from a prospectively collected electronic database with the approval of the Acute Pancreatitis Database Management Committee. Informed consent from individuals was waived due to the retrospective, observational, and anonymous nature of the current study.

2.2. Definition.
AP (ICD-10: K85) was diagnosed according to the definition in the 2012 revision of the Atlanta classification [20]. Acute kidney injury (AKI) (ICD-10: N17) was diagnosed and staged using the Kidney Disease: Improving Global Outcomes (KDIGO) criteria based on serum/plasma creatinine and urine output [8]. Patients who met the diagnostic criteria for AKI at any time during hospitalization for AP were assigned to the AKI group. Alcohol abuse (ICD-10: F10) and smoking (Z72.0, Z86.43, and Z87.891) were identified using the relevant diagnostic codes.

2.3. Data Collection.
We collected data on demographic characteristics, previous medical history, physical examination, laboratory examination, and therapeutic treatments of each patient. Based on previous studies, we selected 23 possible risk factors for predicting AKI, including etiology, demographic data (age, gender, smoking, and alcohol abuse), body mass index (BMI), hypertension, intra-abdominal pressure (IAP), disease severity score (APACHE II), acute respiratory distress syndrome (ARDS), and laboratory examination (amylase, lipase, triglyceride (TG), cholesterol, white blood cells (WBC, ×10^9), C-reactive protein (CRP), interleukin-6 (IL-6), procalcitonin (PCT), total bilirubin (TBIL), alanine aminotransferase (ALT), hemoglobin (Hb), platelet (PLT), and prothrombin time (PT)). All of the data were available from the inpatient electronic medical record system within 24 h after admission. However, the serum IL-6 values were not complete (240 out of 334 total patients), so we imputed the missing values with the mean of the remaining data.
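The mean imputation described for IL-6 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `impute_mean`, the use of `None` as the missing-value marker, and the example readings are all assumptions made for the sketch.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical IL-6 readings; None marks a missing measurement.
il6 = [12.0, None, 30.0, 18.0, None]
il6_filled = impute_mean(il6)  # missing entries become 20.0, the observed mean
```

Mean imputation keeps every patient in the dataset but shrinks the variance of the imputed feature, which is worth keeping in mind when reading its feature importance.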

2.4. Statistical Analysis.
The population characteristics are presented as medians and interquartile ranges (IQR) for continuous variables and as counts and percentages for dichotomous variables. For continuous variables, we used the Kolmogorov-Smirnov test to assess the normality of the data distribution and Mann-Whitney U tests to compare nonnormally distributed data. A p value < 0.05 was considered statistically significant.
Prior to developing predictive models, the collected data were split into a training dataset (70%) and a test dataset (30%). The training dataset was used for developing predictive models using machine learning and logistic regression algorithms. The parameters of the models were tuned using tenfold cross-validation to reduce the chance of overfitting, and the final performance of each model was then validated and compared on the test dataset. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy were adopted as the comparative measures between the models.
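As a rough illustration of the 70%/30% split and of the four comparison measures, the sketch below uses only the Python standard library. It rests on several assumptions that the text does not state: the split is stratified by outcome, the random seed is fixed, the classification threshold is 0.5, and all function names and the toy scores are hypothetical. Only the group sizes (80 AKI, 254 non-AKI) come from the study.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = AKI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count 0.5
    return wins / (len(pos) * len(neg))

# Group sizes from the study: 80 AKI and 254 non-AKI patients.
labels = [1] * 80 + [0] * 254
train_idx, test_idx = stratified_split(labels)

# Hypothetical predicted probabilities for eight held-out patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
metrics = confusion_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

The AUC is threshold-free, whereas sensitivity, specificity, and accuracy all depend on the chosen cutoff, so the two kinds of measure can rank models differently.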

2.5. Logistic Regression Algorithms and Machine Learning Algorithms
2.5.1. Logistic Regression (LR).
The logistic regression model is a discrete choice model and a generalized linear regression model [21]. It has been widely used in medicine, industry, and other areas. It uses the sigmoid function to map the predicted value to a probability in (0, 1), which is then used to classify the outcome (Figure 1(a)). This model can be applied to both continuous and categorical independent variables.
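The mapping from a linear predictor to a probability can be sketched as below. The function names, the coefficients, and the feature values are illustrative assumptions and are not taken from the fitted model in this study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, mapping a real z to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, intercept):
    """Probability of the positive class for one patient."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and standardized feature values.
p = predict_proba(features=[1.0, 2.0], weights=[0.5, -0.25], intercept=0.0)
```

Here the linear predictor is 0.5 - 0.5 = 0, so the predicted probability is exactly 0.5, the usual decision boundary.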

2.5.2. Classification and Regression Tree (CART).
The classification and regression tree [18] is a tree-like prediction model (Figure 1(b)). Each nonleaf node in the tree represents a feature input to the model. The branches under the node represent the possible values of that feature. Each leaf node represents one or more samples, and the path taken from the root node to the leaf node represents the classification process of the sample. The decision tree itself has no specific requirements for the input feature values and can be used for both numerical data (including continuous and discrete values) and logical or categorical data. The CART algorithm uses the Gini index to select the optimal feature. The Gini index measures the impurity of a node (lower values indicate a purer node), and its value is between 0 and 1.
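How the Gini index guides the choice of a split can be sketched as follows. This is a simplified one-feature illustration with hypothetical data; the helper names `gini` and `best_threshold` are assumptions, and a full CART implementation would repeat this search over every feature and recurse on the resulting child nodes.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical severity scores and outcomes (1 = AKI).
severity = [6, 8, 9, 11, 17, 19, 22]
outcome = [0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(severity, outcome)
```

In this toy data the split at 11 separates the two classes completely, so the weighted impurity of the children drops to 0.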

2.5.3. Random Forest (RF).
The random forest is an ensemble classifier composed of multiple decision trees [17] and belongs to the family of bagging algorithms (Figure 1(c)). There is no dependency between the weak learners, so they can be generated and fitted in parallel. The outputs of the weak learners are combined (by mean, mode, etc.) to form the model output. The random forest is an evolved version of the bagging algorithm which uses a CART decision tree as the weak learner.
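The bagging idea (independent weak learners trained on bootstrap samples, then combined by majority vote) can be sketched as below. This is a deliberately reduced illustration: the weak learner is a one-feature threshold rule rather than a full CART tree, the random feature subsetting that a random forest performs at each split is omitted, and the data, seed, and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Weak learner: threshold t for the rule 'predict 1 if x > t'."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def fit_bagging(xs, ys, n_learners=25, seed=0):
    """Train each weak learner on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

def predict_bagging(thresholds, x):
    """Combine the weak learners by majority vote (the mode)."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical one-feature data (1 = AKI).
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_bagging(xs, ys)
low_risk = predict_bagging(forest, 0.5)
high_risk = predict_bagging(forest, 20)
```

Because each learner depends only on its own bootstrap sample, the loop in `fit_bagging` could run in parallel, which is the independence property noted in the text.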

2.5.4. Support Vector Machine (SVM).
The support vector machine [22] is a supervised learning model applied to classification and regression problems. For linearly separable problems, the model constructs hyperplanes (sets) in a high-dimensional or infinite-dimensional space to separate samples; for linearly inseparable problems, the model chooses a suitable kernel function (φ) to map the samples to a high-dimensional space that is much higher than the original space dimension, so that the samples are linearly separable in the high-dimensional space (Figure 1(d)).
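The effect of mapping samples to a higher-dimensional space can be illustrated with a small example. The sketch below uses an explicit feature map φ(x) = (x, x²) and a hand-picked (not learned) hyperplane; both are assumptions chosen for illustration. A real SVM would learn the maximum-margin hyperplane and would typically evaluate a kernel function directly instead of computing φ explicitly.

```python
def phi(x):
    """Explicit feature map from one dimension to two: x -> (x, x^2)."""
    return (x, x * x)

def linear_rule(z, w=(0.0, 1.0), b=-2.5):
    """Linear decision function in the mapped space: 1 if w.z + b > 0 else 0."""
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else 0

def separable_by_threshold(xs, ys):
    """Check whether any single cut on the original axis separates the classes."""
    for t in xs:
        for sign in (1, -1):
            if all((sign * (x - t) > 0) == bool(y) for x, y in zip(xs, ys)):
                return True
    return False

# Hypothetical one-dimensional data: positives lie on both sides of the negatives.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [1, 1, 0, 0, 0, 1, 1]

in_original_space = separable_by_threshold(xs, ys)
in_mapped_space = [linear_rule(phi(x)) for x in xs]
```

No single cut on the original axis separates these classes, but after the mapping the horizontal line x² = 2.5 does.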

2.5.5. Extreme Gradient Boosting (XGBoost).
Extreme gradient boosting (XGBoost) is an efficient system implementation of the Gradient Boosting Decision Tree (GBDT) algorithm, which belongs to the family of boosting algorithms [19,23]. Boosting proceeds as follows: (1) weak learner 1 is trained on the training set with initial sample weights, (2) the weights of the training samples are updated according to the learning error rate, (3) the weights of the samples that weak learner 1 predicted poorly are increased, (4) weak learner 2 is trained with the new weights, and this is iterated until the number of weak learners reaches the specified number T, and (5) finally, all T weak learners are combined to obtain the final strong learner (Figure 1(e)).
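The generic boosting loop described above can be sketched with one-feature threshold rules as weak learners. This is an AdaBoost-style illustration of sample reweighting and is not XGBoost itself, which fits each new tree to the gradients of a regularized loss. The labels are coded as +1/-1 for convenience, and the data and function names are hypothetical.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Best rule 'predict s if x > t else -s' under sample weights w."""
    best = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if (s if x > t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def fit_boosting(xs, ys, T=3):
    n = len(xs)
    w = [1.0 / n] * n  # step 1: equal initial sample weights
    learners = []
    for _ in range(T):
        err, t, s = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this learner
        learners.append((alpha, t, s))
        # steps 2-3: raise the weights of misclassified samples, lower the rest
        w = [wi * math.exp(-alpha * y * (s if x > t else -s))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]  # step 4: the next learner sees the new weights
    return learners

def predict_boosting(learners, x):
    """Step 5: weighted vote of all weak learners."""
    score = sum(a * (s if x > t else -s) for a, t, s in learners)
    return 1 if score > 0 else -1

# Hypothetical data that no single threshold rule can classify perfectly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = fit_boosting(xs, ys, T=3)
train_pred = [predict_boosting(model, x) for x in xs]
```

On this toy data each individual rule misclassifies at least three samples, yet the weighted vote of three rules reproduces every training label.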

Predictive Effects of Different Models.
We generated five models, including LR (logistic regression), SVM (support vector machine), XGBoost (extreme gradient boosting), RF (random forest), and CART (classification and regression tree), to predict the development of AKI in AP patients after admission. Figure 2 shows the receiver operating characteristic (ROC) curves of the five models for predicting AKI on the test dataset. The areas under the ROC curves (AUC) showed that the XGBoost model achieved the best predictive performance for AKI, with an AUC of 0.9193. Taking the LR model as a reference, the XGBoost and RF models outperformed it in predicting AKI, whereas the SVM and CART models did not, as shown by the AUC values. Table 3 presents a set of detailed performance metrics for the five models. Across these metrics, XGBoost achieved the best overall performance, with the highest AUC (0.9193), the highest sensitivity (0.6190), the highest specificity (0.8815), and the second-highest accuracy (0.8631). The ranks of feature importance in each model are listed in Table 4. As shown, APACHE II, IAP, and PCT rank as the top three features contributing to the prediction models for AKI in AP patients.

Discussion
Acute kidney injury (AKI) is a common complication of AP, and its incidence is 14%-43% [2,3,24]. According to previous reports, AKI in AP may be caused by the release of a large number of inflammatory mediators and cytokines, which lead to microcirculation disorders and tissue damage [25]. At the same time, hypercoagulability and SIRS may cause damage to the renal tubules [26]. In this study, PCT was the second most important feature in the XGBoost model. The clinical outcomes of AP patients complicated with AKI are extremely poor, and the mortality reported in previous studies is up to 40-70% [6,24]. Hence, identifying high-risk patients and preventing further deterioration of their renal function should be a top priority.
We compared the performance of four machine learning models and the traditional logistic regression model in predicting AKI at an early stage. The results showed that XGBoost achieved the best performance in predicting AKI in terms of combined predictive performance and predictive stability. XGBoost is a scalable tree boosting system that is widely used by data scientists and provides state-of-the-art results on many problems. Compared to gradient tree boosting, XGBoost helps to reduce overfitting by using only a random subset of descriptors in building a tree and is known as a "regularized boosting" technique. The balance between sensitivity and specificity for each of the algorithms should also be evaluated. In particular, XGBoost had higher specificity than sensitivity, meaning that it is more likely to be correct in ruling out AKI than in detecting it. Our results suggest that XGBoost is a very effective machine learning method in terms of specificity and accuracy.
We listed the features of the highest importance in the three best-performing models. The APACHE II score, IAP, PCT, and lipase turned out to be among the most important features. The APACHE II score is a nonspecific scoring system, which is related to the severity and complications of AP [27,28]. Previous studies found that the APACHE II score is an independent risk factor for AP complicated with AKI [29,30]. The median APACHE II score of patients in the AKI group was much higher than that in the non-AKI group (18.28 vs. 9.87, p < 0.005). IAP was the most important feature in the XGBoost model, and previous studies showed that IAP is an independent risk factor for AKI [31][32][33][34]. Locally in the abdomen, intra-abdominal hypertension compresses and compromises blood flow in the renal parenchyma, vena cava, and renal vein. Increased IAP affects the kidney through a series of mechanisms that result in a decrease in the glomerular filtration rate (GFR) with oliguria, which is usually the first clinical evidence of kidney impairment [35,36]. Screening and intervention to decrease IAP and improve perfusion of the kidney are essential to minimize these negative effects [37].

Gastroenterology Research and Practice
Novel machine learning techniques are relatively free of the limitations of conventional statistical analysis and have demonstrated improved predictive performance compared to classical statistical methods. Machine learning has been used to predict AKI in some disease populations (e.g., severely burned patients and patients receiving liver transplants) and has shown favorable performance [38,39]. Compared with traditional static predictive models, deep learning techniques have the advantage of automatically learning the features and relationships in readily available data [40], which makes the early prediction of AKI possible before significant changes occur in classical indicators such as creatinine and/or urine output. Earlier identification of renal injury with easily obtained medical data at admission provides a "therapeutic window" for clinicians to take preventive measures to avoid further renal function damage.
Previous studies showed that early detection and treatment of AKI can help most patients recover renal function and reach a better clinical outcome [41,42]. Therefore, it is particularly important to identify the risk factors and prognostic factors for acute pancreatitis with acute renal injury at an early stage, so as to develop a predictive model that helps clinicians take preventive intervention measures and avoid renal function damage [43]. Our study provides predictive models based on machine learning algorithms that perform better in predicting AKI in AP patients than the classical LR algorithm. Such a model may have a positive effect on the outcome of AP patients.
Our study has several limitations. Firstly, our analysis used only a small number of cases from a single AP treatment center. The performance of machine learning techniques may differ when they are applied to a sample from a different institution with a different distribution of covariates. Secondly, the models were developed from the last 5 years of data in our center and were not validated using data from other centers or open databases.
Compared to the classical logistic regression model, machine learning models (XGBoost and RF) using features that can be easily obtained at admission had a better performance in predicting AKI in the AP patients. Predictive models using machine learning algorithms may help clinicians predict AKI early and may prevent the renal function from further injury.

Data Availability
The data in this study are available for other researchers to verify the results of our article, replicate the analysis, and conduct secondary analyses. Other researchers can send emails (e-mail address: njzyantol@hotmail.com) to contact us for obtaining our data.