Predicting Timing of Surgical Intervention Using Recurrent Neural Network for Necrotizing Pancreatitis

Previous research focused on the qualitative discussion of the correlation between surgical time and mortality in acute pancreatitis (AP). Recommendations for surgical timing of necrotizing pancreatitis are delayed as far as possible, without recommendations for individuals. The aim of this article is to predict timing of surgical intervention in necrotizing pancreatitis with recurrent neural network (RNN). Time series data in AP were retrospectively extracted from a hospital in China (n = 15,813) to develop model, and the Cerner Health Facts database in the United States (n = 142,650) was used to externally validate. The developed model, time-aware Phased-Decay long short-term memory (LSTM), was used to predict timing of surgical intervention and critical clinical features in necrotizing pancreatitis based on laboratory tests, compared to other machine learning models. Area under the ROC curve (AUC) of RNN-based models was more than 0.70. The AUC of time-aware Phased-Decay LSTM (0.75) with more explanatory was similar to that (0.76) of Phased-Decay LSTM to predict surgical timing. The developed model visualized the specific surgical process and laboratory indicators changes for the patients with AP from the onset to discharge. Different clinical features had different contribution modes with time to the specific surgical event in AP. Heart function contributed the most to the prediction of surgical intervention at the onset and in the first week. Our developed model could monitor the specific surgical process from the onset of AP to discharge and extract the contribution modes of clinical features with time in necrotizing pancreatitis.


I. INTRODUCTION
Acute pancreatitis (AP), an inflammatory disorder of pancreas, is the leading cause of admission to hospital for gastrointestinal disorders in the United States and many other countries [1], [2], and cost an estimated 2.6 billion dollars per year in hospitalization [3], with an annual incidence ranging from 13 to 45/100,000 people [4]- [6]. The inpatient mortality of AP is 0.8% [7], and in severe cases, it could even reach 30% [8] where presenting many therapeutic challenges [9], The associate editor coordinating the review of this manuscript and approving it for publication was Yudong Zhang . [10]. Severe AP is characterized by the presence of either infected (peri)pancreatic necrosis or persistent organ failure (POF) [11]. The current management guideline for necrotizing pancreatitis from IAP/APA [8] recommends delaying the timing of surgical intervention until four or more weeks. In addition to IAP/APA guideline, recommendations for surgical timing of necrotizing pancreatitis in the United States, United Kingdom, Italy, and Japan are also delayed as far as possible, without recommendations for individuals [12]- [15].
A prospective study observed that POF in the first week was more likely to determine mortality than infection in patients with necrotizing pancreatitis [16]. While a VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ prospective cohort study from the Netherlands showed that there were no associations between infection, onset of organ failure, duration of organ failure and mortality in patients with necrotizing pancreatitis [17]. Additionally, current studies cannot explain the relationship between the suggested surgical indications of necrotizing pancreatitis and surgical timing [18]- [21]. A randomized controlled trial (RCT) which was established to optimize timings of surgery following percutaneous catheter drainage (PCD) in patients with infected pancreatic necrosis was forced to stop early due to practical difficulties [22]. The surgical timing problem has not been resolved by RCT. Therefore, in order to predict the timing of surgical intervention for necrotizing pancreatitis individually, a novel recurrent neural network (RNN) based deep learning model, namely time-aware Phased-Decay LSTM, was developed and applied. There are three contributions in this study: 1) we developed a novel deep learning model, which performed better than conservative machine learning approaches; 2) the developed model was applied to predict the timing of the specific surgical intervention and important laboratory indicators in necrotizing pancreatitis for the first time; 3) patients with necrotizing pancreatitis could be monitored from the onset of AP to discharge by using our developed model.

II. RELATED WORK
In order to solve specific medical problems, many machine learning and deep learning algorithms are improved to be effectively applied and solved on the medical problems [23]- [28]. Clinical studies generally have a small sample size, so data enhancement or transfer learning is needed to help train the model [29]- [32]. However, the sample size of this study is relatively large in clinical practice, so data enhancement and transfer learning can be avoided.
Additionally, predicting the timing of surgical intervention in patients with necrotizing pancreatitis is a very meaningful task. However, previous research just focused on qualitative discussion of the correlation between surgical time and mortality rate [33]. Even though our previous study has proved that there is a L-shaped relationship between timing of surgical intervention and risk of death in necrotizing pancreatitis patients [34], it cannot explain the relationship between the timing of surgical intervention and risk of death personally. The methods they used were mainly based on hypothesis testing and regression models. Machine learning methods were also used to classify infected necrotizing pancreatitis for surgery within or beyond four weeks [35]. However, modeling temporal information is common in the field of clinical research. Wu et al. made a comprehensive summary of the characteristics of multivariate time series [36]. The bin method is often adopted because time stamp is unified and the sequences are changed from asynchronous to synchronous [37], [38]. The typical approach is to fill the missing values in some way such as mean value filling, last observation filling, generating missing mask information, filling with zero etc., and then to predict by using classifiers [37]- [39]. A good approach is to integrate missing value filling and modeling prediction. A representative work is applying decay based on gated recurrent unit (GRU), a variant of RNN, named GRU-D [40]. The GRU-D model introduces the concept of decay, which makes full use of the missing mask information of the variables and the interval time information between the monitoring points to fill the missing values. The filling process is guided by the prediction results. Such an architecture can relatively give a more reasonable padding value. However, GRU-D may fill in some values that should not exist, thereby losing the information of whether the patient was actually followed up, which can be solved by phase gate of the phased long short-term memory (Phased-LSTM) [41], [42], another variant of RNN. Recently, with respect to modeling for clinical time series data, Choi et al. used the attention mechanism, which weighted two dimensions of time and variables and achieved good accuracy and interpretability [43]. Neil et al. extended the gate mechanism of RNN to automatically choose which moments of information are valuable for hidden state updates. As a result, not only the model training was accelerated, but better accuracy also was achieved [42]. It was reported that in the time series prediction task, the performance of the RNN-derived model is better than that of the Hidden Markov model (HMM) [44].

III. TIME-AWARE PHASED-DECAY LSTM A. MOTIVATION
It is very challenging to fuse and model based on the asynchronous, unequal length, and heterogeneity data. Therefore, laboratory tests only were used to predict. According to our data, laboratory tests included 1 ∼ 33 measurements, and the average number of measurements was 5.36 in AP. Those sequences were characterized by unequal spacing, unequal length, and asynchronous. RNN and its derivative models can handle the problem of unequal length for temporal sequences, which cannot be handled by HMM. Unequal spacing and asynchronous can result in missing values during temporal sequences analysis. We adopted decay to improve the missing issue in this study. The introduction of phase gate enables RNN to automatically learn the weights in the time dimension and control the hidden layer update. At the same time, time is also mapped into a vector of the same length as the input vector and multiplied by it to achieve the effect of filtering redundant information from the variable dimension. Missing values at each time point in sequences are different, and decay can fill in sequences. However, decay is not able to directly solve the problem of rationality of filling by using phase gate. Therefore, we introduced weight gate, which can automatically learn different time points and give their weights, so as to make up for the problem of filling rationality. On the other hand, as AP is a sudden-onset and severe disease, time is very important for the treatment process, and the variables are different for the importance of surgical intervention at different time points. The time-aware mechanism proposed in this study can maximize the usability of time information and provide important clinical features for surgical intervention of necrotizing pancreatitis at different time points. This is the motivation for developing time-aware Phased-Decay LSTM.

B. MODEL STRUCTURE
Time-aware Phased-Decay LSTM uses information from the current time and previous time to predict whether an event will happen in the next period. Fig. 1 shows its model structure, where t is a time point in the range from the onset of AP to discharge in an encounter; X t is the input vector at t; m t is the missing value mask vector at t;X is the mean vector; γ t is the decay rate that indicates how much information will be retained from t − 1 to t; h t is the hidden layer at t; C t is the cell state at t; k t is the value of the phase gate function calculated by t; w t is the weight that is mapped by t.
We use the standard LSTM framework to explain the neural network framework in our developed model, which includes forget gate layer, input gate layer, cell state, output gate layer, and hidden layer, as shown in Eq. (1) ∼ Eq. (5), respectively.
After introducing the standard LSTM, we will further add decay, time-aware and phase gate. Before inputting for LSTM, decay [40] is used to solve the problem of missing values. Suppose there are N patients, D variables, and T time points in total. If only the case of the n-th patient is considered, then four types of variables including time series X, masking for X, timestamps for X, and time interval for X, as shown in Eq. (6) ∼ Eq. (9), respectively, are as input, as well as previous moment for X.
Decay rate is defined by: where δ t is D-dimensional time interval vector. m t is the mask of missing features with a binary masking vector to indicate whether a feature is measured or not at t.X is the global mean vector. The input and hidden layer for LSTM can be updated as Eq. (11) and Eq. (12), respectively.
Time weight is determined by Eq. (13), which is developed to update the input for LSTM as shown in Eq. (14). Time weight represents time-aware mechanism in this work.
Phase gate function k t of Phased-LSTM [42] is used to achieve more efficient training over the irregularly sampled data. To reduce the calculation burden and achieve the same effect as the previous segmented function, we use sin function to get k t formulating as where s represents phase position and c represents adjustment of the vertical position of the periodic function through updating: The prediction values can be calculated by: where softmax function is for classification output and identity function is for continuous output, and W , s, c, b are parameters. We got parameters by minimizing the loss functions defined below.

C. PREDICTING THE TIMING OF SPECIFIC SURGICAL INTERVENTION
We defined our loss function for the single task of predicting the timing of specific surgical intervention in AP as: where batch represents the number of patients in an iteration, T represents the number of measurement of features for a patient in an encounter, ϕ t indicates the real label of surgical intervention andφ t represents the prediction value.

D. MULTI-TASK PREDICTION
We predicted critical laboratory indicators for AP such as C-reactive protein, white blood cell count (WBC), creatinine, and calcium ion, as well as predicting the timing of specific surgical intervention using the development cohort. These critical laboratory indicators were output as continuous values. The loss functions for multi-task were defined as: where ψ e is the real label,ψ e is the predicting value of critical laboratory indicators and E is the number of critical laboratory indicators.

IV. EXPERIMENTS A. PATIENTS
For necrotizing pancreatitis, surgical treatment can be classified into three categories: drainage, pancreatectomy, and pancreatic necrotic tissue removal plus extensive drainage [45]. Therefore, in this study, patients who underwent any of the three surgical interventions described above were defined as patients with necrotizing pancreatitis to develop our model. We collected data on patients admitted to West China Hospital of Sichuan University from 2010 to 2018. The inclusion criteria for the study patients were: 1) diagnosis with AP on admission based on International Classification of Diseases (ICD) code; 2) having as least one surgical intervention experience including pancreatic drainage, pancreatic debridement, or pancreatectomy in an encounter. 15,813 patients diagnosed with AP were initially identified. We further screened patients who underwent the specific surgery and survived after surgery. Finally, 1,112 patients were included in our development cohort (Fig. 2).

C. TRAINING AND TESTING
Given a time point t, if the time of any specific surgical intervention was included in the range of [t, t + 7) days, the label of y at t was 1, otherwise it was 0 in the development cohort. All laboratory tests information from admission to t was regarded as input, and then features with a missing rate of less than 85% were selected. The missing rate was calculated by the formula: where I ij represents whether it is missing and n is the number of patients. The length of the time point was taken as 15. The original length was kept if the length was less than 15, and the last 15 time points were taken if the length was over 15.
We used the union of the monitoring points of each sequence, which preserves the original time information to the greatest extent.
The raw dataset after pre-processing in the development cohort was further divided into training dataset, validation dataset and testing dataset according to the proportions of 70%, 20% and 10%. In order to compare with traditional classifiers that perform well, such as support vector machine (SVM) and random forest (RF), SVM and RF with decay were adopted to predict the timing of the specific surgical intervention. We also compared standard LSTM and GRU, time-aware LSTM and GRU, and Phased-LSTM, where decay was used to fill missing values.
The definition of the label in the single task of prediction is whether to perform target surgery in a certain time window in the future. Obviously, the length of the time window is a hyperparameter. It is very important for AP. Based on the performance of the validation dataset, the size of time window was tuned to determine the specific period of time for the target surgery. The time window ranging from 3 to 10 days were tested based on the best-performing model in the development cohort. Fig. 3 shows the accuracy and AUC from the best-performing model by tuning the time window from 3 to 10 days. When the time window was set as sevenday, the model performed best based on accuracy or AUC.
The number of hidden layer neurons was set to 200. The batch size was set to 350 and the learning rate was set to 0.001. Models were optimized using a gradient descent approach. The performance of models was an average of 100 epochs. We added dropout layer to avoid overfitting when training [47]. Preprocessing of data, SVM and RF were done with R (version 3.6.0), and RNN-based models were done with Python (version 3.5).

1) PERFORMANCE OF THE DEVELOPMENT COHORT
In a single prediction task, standard LSTM and GRU obtained higher area under the ROC curve (AUC) in testing than SVM and RF. With the addition of time-aware, the AUC of testing increased compared to standard LSTM and GRU. Phased-Decay LSTM (P-D-LSTM) had the highest AUC of testing at 0.76. The AUC of testing of time-aware Phased-Decay-LSTM was like that of Phased-Decay LSTM, which was 0.75. The AUC of testing from multi-task prediction  Table 1 also shows the standard deviation of performance.

2) VISUALIZATION OF AP FROM THE ONSET TO DISCHARGE
We visualized the changes in the patient's status since the onset of AP. A patient case was taken as an example. Fig. 4 shows the predicted possibility of surgery for the AP patient from onset to discharge based on the laboratory indicators using time-aware Phased-Decay LSTM. The estimated probability for patients who didn't have surgery should be less than 0.5, while the estimated probability for patients who have undergone surgery should be higher than 0.5. As seen in Fig. 4, our developed model learned the changes in the laboratory indicators to accurately predict the timing of the specific surgical intervention.  Except for the contribution of the coagulation group, which increased during the first week, the other groups all declined to a certain degree. The contributions of infection index, pH balance, kidney function and heart function all declined before four weeks, and then flattened after four weeks. The highest contribution was heart function and the lowest contribution was thrombosis index.

E. EXTERNAL VALIDATION
We used the Cerner Health Facts database to predict the timing of any surgical intervention in AP to validate the performance of our developed model externally. At first, based on the ICD code, patients diagnosed with AP on admission (142,650 patients) were extracted. Due to the differences in surgical records between EHR of the Cerner Health Facts database and EMR of West China Hospital of Sichuan University, we extracted the patients with AP who had any surgical intervention experience in the Cerner Health Facts database. A total of 1,597 patients were included in the external validation dataset. The data processing in the external validation dataset was the same with that in the development dataset.
By using our developed model, the AUC can reach 0.71 to predict the timing of any surgical intervention in AP in the external validation cohort, close to the performance of the development cohort, where the performance also got about 0.70 based on the data from Northeast, Midwest, South and West region of the Cerner Health Facts database (Table 2).

V. DISCUSSION
This study developed a novel model, time-aware Phased-Decay LSTM, which could give the importance of clinical features with time in acute disease. For the first time, the novel model based on RNN was applied to the timing prediction of the specific surgical intervention in AP. And by using the model, the specific surgical process of AP from the onset to discharge could be monitored.
Because LSTM can forget and update the long-term state for better prediction, while SVM and RF contain redundant information, which affects performance. LSTM performed better than SVM and RF, which illustrates the advantages of time series models in processing time series data. After adding time-aware, the performance of the model was better compared to standard LSTM and GRU because AP is timesensitive. In addition to predicting the timing of surgical intervention, the novel model can also predict clinical features that surgeons believe are important.
The performance of our developed model was not as good as that of Phased-Decay LSTM when predicting timing, but it was close. It shows that phase gate function of Phased-Decay LSTM can effectively use time information, however, it was hard to explain. By using time-aware, we got the importance of clinical features in the time dimension for the specific surgical intervention. It indicates that in the developing course of AP, at different time points, we can focus on the more important features. For example, heart function contributed the most at the onset and in the first week.
Based on the data, we found the seven-day time window was the best prediction interval because if the frequency of laboratory tests monitoring was low in EMR, it cannot be accurately used to predict in a narrow window, and some laboratory indicators may have a large change of value after one week which makes it difficult to predict accurately in an excessively long window. The specific surgical process of patients with AP from the onset to discharge was monitored dynamically for the first time through seven-day time window.
The prediction results for some patients were not accurate. Due to the asynchronous, unequal length, and heterogeneity of the data, it is very challenging to monitor the patients with AP from the onset to complications to surgical intervention to discharge in real time based on EMR, which takes time. Thus, the possible reason for the failure is the inclusion of laboratory information only. To some extent, it illustrates that our developed model is useful for predicting the timing of the specific surgical intervention in AP. By using this patient-specific model, whether a patient with AP needs a specific surgical intervention in the next week can be determined in advance. It would be a great help when making surgical decisions in AP. The Cerner Health Facts database validated the generalization ability of our developed model.
There are some limitations in this work. We collected patients' data for necrotizing pancreatitis retrospectively, rather than prospectively. There may be some misclassifications. It is the first time to use RNN-based model to predict the timing of surgical intervention in necrotizing pancreatitis and there were some failure prediction cases, which may be related to the lack of including information (laboratory information only). Next, we will collect more multimodal data of AP, like clinical notes, images and EMR to improve the performance of our model.

VI. CONCLUSION
In this study, the developed novel model, time-aware Phased-Decay LSTM, could extract the contribution modes of clinical features with time that was important for surgical intervention and monitor the surgical process from the onset of AP to discharge in necrotizing pancreatitis. XIAOBO ZHOU received the Ph.D. degree in applied mathematics from Peking University. He is currently a Professor and the Director of the Center for Systems Medicine, School of Biomedical and Bioinformatics, The University of Texas Health Science Center at Houston. His research interests include data sciences, imaging informatics, and clinical informatics. VOLUME 8, 2020