J Korean Med Sci. 2024 Feb 05;39(5):e53. English.
Published online Jan 26, 2024.
© 2024 The Korean Academy of Medical Sciences.
Original Article

Early Prediction of Mortality for Septic Patients Visiting Emergency Room Based on Explainable Machine Learning: A Real-World Multicenter Study

Sang Won Park,1,2,* Na Young Yeo,3,* Seonguk Kang,4 Taejun Ha,5 Tae-Hoon Kim,6 DooHee Lee,7 Dowon Kim,7 Seheon Choi,7 Minkyu Kim,7 DongHoon Lee,7 DoHyeon Kim,7 Woo Jin Kim,1,8,9 Seung-Joon Lee,8,9 Yeon-Jeong Heo,8,9 Da Hye Moon,8,9 Seon-Sook Han,8,9 Yoon Kim,6,10 Hyun-Soo Choi,6,11 Dong Kyu Oh,12 Su Yeon Lee,12 MiHyeon Park,12 Chae-Man Lim,12 Jeongwon Heo,8,9 and On behalf of the Korean Sepsis Alliance (KSA) Investigators
    • 1Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon, Korea.
    • 2Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon, Korea.
    • 3Department of Medical Bigdata Convergence, Kangwon National University, Chuncheon, Korea.
    • 4Department of Convergence Security, Kangwon National University, Chuncheon, Korea.
    • 5Department of Biomedical Research Institute, Kangwon National University Hospital, Chuncheon, Korea.
    • 6University-Industry Cooperation Foundation, Kangwon National University, Chuncheon, Korea.
    • 7Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea.
    • 8Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea.
    • 9Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea.
    • 10Department of Computer Science and Engineering, Kangwon National University, Chuncheon, Korea.
    • 11Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, Korea.
    • 12Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
Received June 15, 2023; Accepted December 05, 2023.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Worldwide, sepsis is the leading cause of death in hospitals. If mortality rates in patients with sepsis can be predicted early, medical resources can be allocated efficiently. We constructed machine learning (ML) models to predict the mortality of patients with sepsis in a hospital emergency department.

Methods

This study prospectively collected nationwide data from an ongoing multicenter cohort of patients with sepsis identified in the emergency department. Patients were enrolled from 19 hospitals between September 2019 and December 2020. Using data acquired from 3,657 survivors and 1,455 non-survivors, six ML models (logistic regression, support vector machine, random forest, extreme gradient boosting [XGBoost], light gradient boosting machine, and categorical boosting [CatBoost]) were constructed using fivefold cross-validation to predict mortality. Through these models, 44 clinical variables measured on the day of admission were compared with six sequential organ failure assessment (SOFA) components (PaO2/FIO2 [PF], platelets [PLT], bilirubin, cardiovascular, Glasgow Coma Scale score, and creatinine). The confidence interval (CI) was obtained by performing 10,000 repeated measurements via random sampling of the test dataset. All results were explained and interpreted using Shapley additive explanations (SHAP).

Results

Among the 5,112 participants, the CatBoost model exhibited the highest area under the curve (AUC) of 0.800 (95% CI, 0.756–0.840) using clinical variables. Using the SOFA components for the same patients, the XGBoost model exhibited the highest AUC of 0.678 (95% CI, 0.626–0.730). As interpreted by SHAP, albumin, lactate, blood urea nitrogen, and the international normalized ratio were determined to significantly affect the results. Additionally, PF and PLT among the SOFA components significantly influenced the prediction results.

Conclusion

Newly established ML-based models achieved good prediction of mortality in patients with sepsis. Using several clinical variables acquired at baseline can provide more accurate results for early prediction than using SOFA components. Additionally, the impact of each variable was identified.

Graphical Abstract

Keywords
Clinical Decision Support System (CDSS); Explainable Artificial Intelligence (XAI); Machine Learning; Mortality Prediction; Sepsis

INTRODUCTION

Sepsis is a severe and potentially life-threatening condition affecting millions of people worldwide every year.1, 2 It is a complex syndrome that arises from the body’s response to an infection and can lead to organ dysfunction, septic shock, and death if not treated promptly and adequately.3, 4 If we can accurately predict the prognosis of patients with sepsis, we can distribute the limited medical resources more efficiently and improve sepsis prognosis.5 In particular, early prediction of the prognosis of patients presented to an emergency room (ER) with sepsis is crucial for making informed decisions about their management and optimizing their chances of survival.6

Currently, sequential organ failure assessment (SOFA) component scores are used in clinical risk stratification of patients with sepsis; however, they have the limitations of poor specificity and sensitivity.7, 8, 9, 10, 11 The SOFA score is one of the main criteria of the Sepsis-3 definitions. Several previous studies have suggested predictive results for patients with sepsis using the SOFA score; however, they achieved varying performance in predicting short-term mortality.12, 13, 14 In other words, the generalizability of these prediction models is limited. The SOFA score comprises six organ-system scores, each ranging from 0 to 4 points. It is calculated by simply adding the scores of the six organ systems and does not consider interactions among the organ systems.11 Because each organ system may affect the mortality risk of patients with sepsis differently, this simple summation can reduce the predictive power of a model.

Recently, machine learning (ML) has attracted the attention of and gained recognition from clinicians owing to the evolution of computer technology.15 ML studies have emerged as a promising approach for predicting sepsis mortality, which can help clinicians identify high-risk patients and provide timely and appropriate treatment.16, 17, 18 In addition, new ML techniques and model implementations can be widely applied in predictive models of various diseases and can exhibit better performance than conventional logistic regression (LR) or Cox regression analysis.19, 20 Recent studies have shown that ML models have good predictive ability for identifying patients with sepsis who are at a high risk of mortality.21, 22, 23 ML models can analyze large amounts of patient data, such as vital signs, laboratory results, and medical histories, and can identify patterns that may not be apparent to human clinicians.24, 25 Furthermore, the accuracy of mortality predictions can be significantly improved using large-scale data. These models can provide personalized risk predictions for individual patients, thus helping clinicians make informed decisions regarding treatment and management. They can also provide insights into the factors contributing to a patient's risk of death, which can help design a personalized treatment plan, leading to improved patient outcomes and more efficient allocation of healthcare resources. However, there is a lack of evidence demonstrating the advantage of ML algorithms for predicting mortality in patients with sepsis by leveraging large amounts of clinical data acquired from early patient conditions and comparing the results against SOFA component scores.

This study aims to develop an ML model to predict the mortality rate of patients with sepsis admitted to an ER using baseline data. In addition, a new ML framework can be constructed using various clinical variables in a large dataset to achieve better performance than the SOFA score. We use explainable artificial intelligence (XAI) to confirm the effects of individual factors on the mortality prediction performance in patients with sepsis.

METHODS

Data source and participants

This prospective cohort study used data from an ongoing nationwide cohort study conducted by the Korean Sepsis Alliance. The patients were enrolled from 19 participating hospitals between September 2019 and December 2020. The protocols for patient enrollment and data collection have been described previously.26 Patients were included if they were at least 19 years old and diagnosed with sepsis or septic shock in the emergency department. The diagnoses of sepsis and septic shock were based on the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3).4 We investigated patients diagnosed with sepsis who visited an ER, regardless of whether they were admitted to a ward or an intensive care unit (ICU). Patients were excluded if they were not prescribed antibiotics, were prescribed antibiotics that were not in accordance with the guidelines, or were prescribed antibiotics to which the cultured organism proved to be resistant.27 The average time the patients investigated in this study stayed in the ER was 12 hours. Empiric therapy was defined as the initial use of antimicrobial treatment without specific identification of the causative pathogen. The appropriateness of empiric therapy was evaluated based on the results of drug susceptibility tests or relevant guideline recommendations.28 Crystalloid fluid was administered for bolus loading; the volume was at the discretion of the treating clinician but was at least 30 mL/kg.29

Data source

We used multidimensional data acquired from 5,112 patients with sepsis (demographic characteristics, clinical information, pre-existing comorbidities, initial laboratory findings, and initial characteristics of infection) as variables (Table 1). A total of 44 variables were selected for ML modeling.30 The final 44 clinical variables were selected by a respiratory specialist after retaining only those variables with fewer than 10% missing (NA) values among the 213 variables in the cohort (Fig. 1). For the SOFA component score, six organ-system variables (respiratory, cardiovascular, liver, renal, coagulation, and neurological) were selected and used (Table 2, Supplementary Table 1).

Table 1
Subject characteristics for clinical variables

Fig. 1
Flowchart of patient selection and machine-learning-model application for mortality prediction. Of the 7,113 patients diagnosed with sepsis, 5,112 patients were selected after excluding patients with missing values and outliers. Of all selected patients, 80% were used for model training and 20% for model testing.
NA = not available, LR = logistic regression, SVM = support vector machine, RF = random forest, XGBoost = extreme gradient boosting, Light GBM = light gradient boosting machine, CatBoost = categorical boosting.

Data pre-processing

Prior information in the raw data can directly affect the optimized classifier performance. In this regard, data pre-processing is important for obtaining efficient mortality classification performance with ML algorithms.31 Data normalization is essential for scaling and transforming data to minimize bias. We removed the NA values in the dataset, along with values that an expert clinician judged to be incorrectly recorded or to be outliers. In addition, a skewness-based method was applied to biased data variables to identify outliers within the quantitative values and transform the distributions toward normality. Finally, min–max normalization was performed.
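A minimal sketch of this pre-processing pipeline on a synthetic right-skewed variable; the data, the skewness threshold of 1.0, and the choice of a log transform are illustrative assumptions, not the study's exact procedure:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Synthetic right-skewed lab value standing in for a real clinical variable.
x = rng.lognormal(mean=0.5, sigma=0.8, size=1000)

# Skewness-based step: if the variable is heavily biased, transform it
# toward a normal distribution (a log transform is one common choice).
if abs(skew(x)) > 1.0:
    x = np.log1p(x)

# Min-max normalization scales the cleaned variable into [0, 1].
x_scaled = (x - x.min()) / (x.max() - x.min())
```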

Data split and validation

Of the total data, 80% were used for training, and the remaining data were used as test data. Validation data comprised 20% of the training set. Of the selected participant data (i.e., all participants), 20% were withheld entirely from the cross-validation-based estimation of hyperparameter values for each classification method. In addition, stratified k-fold cross-validation (k = 5) was performed to avoid label distortions that might occur during model generation and to maintain model stability (Fig. 2). The stratified k-fold cross-validation technique is similar to regular k-fold cross-validation, except that stratified sampling is used instead of random sampling. To demonstrate the reliability of the individual model results, half of the entire test dataset was randomly sampled, and repeated tests were performed 10,000 times to present the confidence intervals (CIs) for the test results.
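The split-and-validate scheme above can be sketched as follows; the synthetic dataset, the LR model, and the 1,000 repetitions (reduced from the paper's 10,000 for brevity) are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic imbalanced stand-in for the sepsis cohort.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.7, 0.3],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=42)

# Stratified 5-fold cross-validation preserves the label ratio in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(LogisticRegression(max_iter=1000), X_tr, y_tr,
                         cv=cv, scoring="roc_auc")

# Confidence interval: repeatedly score a random half of the held-out test set.
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]
rng = np.random.default_rng(0)
aucs = []
for _ in range(1000):
    idx = rng.choice(len(y_te), size=len(y_te) // 2, replace=False)
    if len(np.unique(y_te[idx])) == 2:  # AUC needs both classes present
        aucs.append(roc_auc_score(y_te[idx], probs[idx]))
ci_lo, ci_hi = np.percentile(aucs, [2.5, 97.5])
```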

Fig. 2
Stratified k-fold cross-validation. Each result is averaged to obtain the validated performance of the model.

Statistical analysis

Continuous variables are presented as means with standard deviations. Student's t-test was used to assess differences in means between the survivor and non-survivor groups. Categorical variables are expressed as numbers and percentages and were compared using the χ2 test.
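These two tests can be reproduced with SciPy; the group sizes and values below are hypothetical, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical lactate values (mmol/L) for survivor and non-survivor groups.
lactate_alive = rng.normal(2.5, 1.0, 300)
lactate_dead = rng.normal(4.0, 1.5, 100)
# Student's t-test for a difference in means between the two groups.
t_stat, p_t = stats.ttest_ind(lactate_alive, lactate_dead)

# Hypothetical categorical variable (e.g., mechanical ventilation) as a 2x2 table.
table = np.array([[40, 260],   # alive: yes / no
                  [45, 55]])   # dead:  yes / no
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
```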

ML

LR

As a form of supervised learning in ML, LR is a conventional probabilistic statistical model for classification that has been broadly used across disciplines in the medical sciences.32 LR models the relationship between continuous independent variables and a categorical dependent variable. It can classify a sample into a group with a probability value between 0 and 1 by learning the relationship between the independent variables x1, x2, ..., xn and the dependent variable y as a specific function: y = σ(w1x1 + ... + wnxn), where w1, ..., wn are trainable parameters and σ denotes the sigmoid function, σ(t) = 1/(1 + e−t). In linear regression, the predicted dependent variable falls within the range [−∞, ∞]. Applying the sigmoid function, which always returns a probability in the range [0, 1], allows LR to perform binary classification.
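A minimal sketch of the formulation above; the weights and inputs are arbitrary toy values:

```python
import numpy as np

def sigmoid(t):
    """Sigmoid function: 1 / (1 + e^(-t)), mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def predict_proba(x, w):
    """Logistic regression: y = sigmoid(w1*x1 + ... + wn*xn)."""
    return sigmoid(np.dot(w, x))

# Toy example with two features and hand-picked weights (illustrative only).
w = np.array([0.8, -0.5])
x = np.array([1.2, 0.4])
p = predict_proba(x, w)  # probability of the positive class
```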

Support vector machine (SVM)

The linear discriminant function of the SVM defines a decision hyperplane in two or more dimensions to perform classification.33 The SVM is a commonly used algorithm for classification tasks in ML; it finds the optimal hyperplane that maximizes the margin between two groups. The margin is the distance between the two parallel hyperplanes that pass through the support vectors on either side of the optimal separating hyperplane, and it may be a hard or soft margin. The separating hyperplane is not unique and is estimated by maximizing the classifier's performance, that is, its ability to operate satisfactorily on unseen data.
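A brief soft-margin SVM sketch on synthetic data, assuming a linear kernel; `C` trades off margin width against training errors (small `C` widens the margin and tolerates misclassifications, large `C` approaches a hard margin):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only).
X, y = make_classification(n_samples=400, n_features=8, random_state=3)

# Linear soft-margin SVM; only the support vectors determine the hyperplane.
svm = SVC(kernel="linear", C=1.0).fit(X, y)
n_support = svm.support_vectors_.shape[0]
```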

Random forest (RF)

RF is an ensemble learning method that creates a multitude of decision trees at training time and outputs the class, which is the mode of the classes (classification) or mean prediction (regression) of the individual trees.34 Each tree in the RF is trained on a randomly selected subset of the data and features, which helps to reduce overfitting and improve the generalizability of the model. During the prediction, each tree in the RF produces a class prediction, and the class with the most votes across all trees is selected as the final prediction.
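The bootstrap-and-vote mechanism can be illustrated with scikit-learn; the synthetic data and tree count are arbitrary:

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each of the 200 trees sees a bootstrap sample and a random feature
# subset (max_features="sqrt"), which reduces overfitting.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=0).fit(X, y)

# Collect each tree's class prediction for one sample and take the mode.
votes = [int(tree.predict(X[:1])[0]) for tree in rf.estimators_]
majority = Counter(votes).most_common(1)[0][0]
```

Note that scikit-learn's `RandomForestClassifier` actually averages the trees' predicted probabilities rather than counting hard votes, so the explicit tally here is only a didactic approximation of the same idea.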

Gradient boosting

Gradient boosting is an ensemble method with the advantage of constructing a strong classifier by combining several weak classifiers. In this study, extreme gradient boosting (XGBoost), light gradient boosting machine (Light GBM), and categorical boosting (CatBoost), which are widely used for analyzing structured tabular data, were employed.35, 36, 37

XGBoost is a tree-based ensemble-learning algorithm that combines multiple decision trees and performs classification and regression based on gradient descent. It expands the decision trees horizontally (i.e., levelwise) to reduce their depth. It works well on imbalanced datasets and has excellent accuracy and speed. Light GBM is an algorithm similar to XGBoost but can learn faster on large datasets. It uses a leaf-centered, histogram-based segmentation algorithm and applies a leafwise division method to make decisions. The CatBoost algorithm specializes in handling categorical variables. It applies its own categorical data processing method and can process both numerical and categorical variables. It uses the symmetric tree method and an ordered target encoding technique and has a built-in overfitting prevention function. We considered the characteristics of the high-dimensional, large-scale dataset used in this study and confirmed the adequacy of the models. Because all three methods have relative strengths and weaknesses, we compared their performance in predicting mortality in patients with sepsis.

Interpretation model based on Shapley additive explanations (SHAP)

We explained the classification predictions of our model with SHAP,38 a method that explains models through feature importance. It is inspired by coalitional game theory, replacing the “player” in the theory with a “data feature” in tabular data. In the ML context, players correspond to the features of the data, and the goal is to explain the model’s prediction, which is expressed as a linear function of these features. The original model f is explained using a surrogate model g, defined as follows:

g(z′) = φ0 + Σ_{j=1}^{M} φj z′j,

where z′ denotes the coalition vector, φj denotes the attribution of feature j, and M denotes the maximum number of features.
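The attribution formula can be checked with a brute-force implementation that enumerates every feature coalition; the three-feature toy model below is illustrative only and is not the study's model:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all coalitions of features.

    Features outside a coalition S are replaced by their baseline value,
    and each feature's attribution is the weighted average of its
    marginal contributions f(S U {j}) - f(S).
    """
    n = len(x)

    def value(S):
        z = [x[j] if j in S else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(set(S) | {j}) - value(set(S)))
        phi.append(total)
    return phi

# Toy model with an interaction term (not from the paper; illustrative only).
f = lambda z: 2 * z[0] + z[1] * z[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
```

The efficiency property of the surrogate g holds here: φ0 (the prediction at the baseline, f(base) = 0) plus the summed attributions reproduces f(x) = 8, with the interaction z1·z2 split evenly between the two participating features.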

Evaluation performance

Five model performance metrics (accuracy, recall, precision, F1 score, and specificity) were used to evaluate the proposed models. As this study focused on the accurate classification of survival in patients with sepsis, survival was treated as the positive class when establishing the overall performance evaluation of the classification models.
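The five metrics reduce to simple ratios of confusion-matrix counts; the counts below are hypothetical:

```python
def metrics(tp, fp, fn, tn):
    """Compute the five evaluation metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)            # sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, specificity, f1

# Hypothetical counts with survival as the positive class.
acc, rec, prec, spec, f1 = metrics(tp=650, fp=80, fn=81, tn=212)
```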

Tools

All programming work for ML in this study was performed in Python programming language (version 3.9.16) and statistical analysis was conducted in R (version 4.2.2). All ML modeling was performed using Scikit-learn version 1.2.0 and hyperparameter tuning was performed using Optuna, a hyperparameter optimization framework.39, 40

Ethics statement

The study was approved by the Institutional Review Board of each participating hospital, and the requirement for informed consent was waived because of the non-interventional observational nature of the study. This study was approved by the Institutional Research Board of Kangwon National University Hospital (KNUH-2022-11-010-001), and written informed consent was obtained from all patients.

RESULTS

Subject characteristics

Tables 1 and 2 summarize the subjects used in this study with descriptive statistics of survivors and non-survivors of sepsis from the cohort. Table 1 lists the participants’ information regarding the selected clinical variables, and Table 2 lists the SOFA scores. The characteristics of subjects with sepsis were stratified by training and test datasets for selected clinical variables and SOFA scores (Supplementary Tables 2 and 3).

Model performance

For the selected clinical variables, among all six ML methods, the area under the curve (AUC) of the CatBoost model (0.800; 95% CI, 0.756–0.840) exhibited the highest discriminative ability (Table 3), followed by the XGBoost model (0.797; 95% CI, 0.754–0.838) and Light GBM model (0.795; 95% CI, 0.750–0.836). The XGBoost model (0.773; 95% CI, 0.744–0.802) also exhibited the highest accuracy among all six ML methods, followed by the CatBoost model (0.769; 95% CI, 0.742–0.798) and Light GBM model (0.763; 95% CI, 0.726–0.797). The RF model (0.736; 95% CI, 0.688–0.783) had the lowest AUC among all ML models. Among the ML models, the CatBoost model exhibited the highest specificity (0.937; 95% CI, 0.910–0.962), followed by the Light GBM (0.937; 95% CI, 0.910–0.962) and XGBoost (0.929; 95% CI, 0.902–0.954) models. The XGBoost model also exhibited the highest area under the precision-recall curve (AUPRC) (0.626; 95% CI, 0.558–0.697), similar to the CatBoost (0.621; 95% CI, 0.551–0.694) and Light GBM (0.617; 95% CI, 0.546–0.689) models. All metrics of the different models for predicting sepsis-related mortality are shown in Fig. 3 and Table 3. Most performance metrics of the boosting models were superior to those of the other three models.

Fig. 3
Receiver operating-characteristic curve of six machine learning models for two datasets. (A) Clinical variables. (B) SOFA component scores.
LR = logistic regression, SVM = support vector machine, RF = random forest, XGB = extreme gradient boosting, LGB = light gradient boosting machine, CAT = categorical boosting.

For the SOFA score, among all six ML methods, the AUC of the XGBoost model (0.678; 95% CI, 0.626–0.730) exhibited the highest discriminative ability, similar to that of the CatBoost (0.676; 95% CI, 0.623–0.727), RF (0.672; 95% CI, 0.620–0.723), and Light GBM (0.667; 95% CI, 0.615–0.720) models (Table 3). The accuracy of the CatBoost model (0.732; 95% CI, 0.708–0.755) was the highest among all six models, followed by the XGBoost (0.725; 95% CI, 0.703–0.750) and SVM (0.675; 95% CI, 0.636–0.712) models. The LR model (0.629; 95% CI, 0.587–0.669) had the lowest AUC among the ML models.

Among the ML models, CatBoost demonstrated the highest specificity (0.956; 95% CI, 0.934–0.975) followed by XGBoost (0.948; 95% CI, 0.924–0.970). The F1 score of the RF model (0.480; 95% CI, 0.420–0.536) was higher than that of the other models, followed by the Light GBM model (0.461; 95% CI, 0.398–0.522), similar to the results of the LR model (0.469; 95% CI, 0.412–0.524). All metrics of the different models are shown in Fig. 3 and Table 3.

Interpretation of models by SHAP

To interpret the contribution of each variable to each model's result, the Shapley value was used to measure the influence of all variables in each model. The SHAP global bar plot explains the global feature importance, where the global importance of each feature is taken as the mean absolute value for that feature over the entire test dataset. These visualizations show the impact on the overall prediction of each model in the order of the relative importance of each variable. Figs. 4A and 5A show the explainable results of the CatBoost and Light GBM models for the clinical variables and SOFA score, respectively. For a more comprehensive understanding of the variable importance in each model, we present all SHAP results in Supplementary Figs. 1 and 2. In the clinical variable dataset, lactate and albumin had greater effects than all other variables among the 44 clinical variables in all models, and blood urea nitrogen had a high effect in all models.

Fig. 4
Feature importance for clinical variables based on interpretation using Shapley additive explanations of categorical boosting. (A) Global importance of each feature presented as the mean absolute value. (B) Information-dense summary of how the top features in a dataset impact the model output.
PaO2 = pressure of arterial oxygen, PaCO2 = pressure of arterial carbon dioxide, INR = prothrombin time-international normalized ratio.

Fig. 5
Feature importance for SOFA component scores based on interpretation using Shapley additive explanations of the light gradient boosting machine. (A) Global importance of each feature presented as the mean absolute value. (B) Information-dense summary of how the top features in a dataset impact the model output.
SOFA = sequential organ failure assessment, PF = PaO2/FIO2, PLT = platelets, GCS = Glasgow Coma Scale score, BIL = bilirubin, Cardio = cardiovascular.

The SHAP summary plot provides an information-dense summary of the effect of the top features in a dataset on the model results. For each instance, the explanation is represented by a single dot on each feature row. Figs. 4B and 5B show, through Shapley values, the association between feature values and the outputs of the CatBoost and Light GBM models. The SOFA-PaO2/FiO2 variable exhibited the highest influence in all models except the SVM model (Fig. 5, Supplementary Figs. 3 and 4). Platelets also had a high influence, contributing significantly to the performance of all models.

DISCUSSION

In summary, in this study, we performed ML-based mortality-prediction classification using both real-world clinical variables obtained through a multicenter cohort and the widely known SOFA score. We demonstrated the performance of models optimized on the clinical information of each patient and presented an explainable ML method to support a clinical decision support system (CDSS). A CDSS can offer a systematic application of health-related knowledge and analysis of available data. It also assists clinical decisions by clarifying the effects of variables in septic patients and identifying opportunities for quality improvement.41 We evaluated the effectiveness of several ML algorithms for mortality prediction in patients with sepsis based on the 44 clinical-factor combinations utilized in this study. Overall, the variables used in this study were easily obtained clinically from patients with sepsis who visited the ER, and the reliability of the model was further demonstrated by the possibility of efficient interpretation and its consistency with clinical experience.

Sepsis, with its high morbidity and mortality, is a life-threatening disease requiring prompt diagnosis and treatment to improve patient outcomes. One of the most critical factors in the management of sepsis is the accurate prediction of patient mortality. The SOFA score is widely known as a tool for assessing organ function and predicting mortality in patients with sepsis. However, recent studies have shown that the accuracy of the SOFA score in predicting mortality may be limited.42, 43, 44

In this study, we suggest that ML based on clinical variables obtained at baseline in patients with sepsis presenting to the emergency department can be used to predict mortality risk and has a higher predictive value than the SOFA components. Early detection of septic patients at risk of death can help healthcare providers prioritize interventions and effectively allocate resources.27 Unlike the early diagnosis of sepsis, the advantage of early mortality prediction for septic patients has not been fully explored. However, ML-based mortality prediction can calculate and present the risk of death using a patient's vital signs, laboratory results, clinical records, and other data. Additionally, early mortality prediction may help allocate resources more efficiently and does not mean discontinuing treatment in high-risk patients. Patients identified as having a high risk of mortality may undergo more intensive monitoring and receive more intensive interventions, whereas those assessed to be at lower risk may adhere to standard treatment protocols.45 Consequently, this risk stratification enables medical doctors to personalize treatment plans for individual patients more effectively. This personalized approach ensures optimal deployment and allocation of resources where they are needed most. Furthermore, healthcare institutions can allocate critical resources such as ICU beds, ventilators, and specialized medical staff more efficiently by prioritizing patients at higher risk of mortality.46 This targeted allocation enhances the likelihood of survival by delivering crucial interventions to those most in need and suggests that constructing predictive models can help a CDSS.44

This study, which developed an ML model for predicting sepsis-related mortality in an ER, is a crucial step toward improving sepsis management. The constructed ML models have the potential to analyze vast amounts of data and identify complex patterns that may not be apparent using traditional statistical methods.47, 48, 49 The performances of CatBoost (AUC = 0.800), Light GBM (AUC = 0.795), and XGBoost (AUC = 0.797) were significantly better than those of the other three models for the patient record variables (Table 3). The performances of RF (AUC = 0.672), XGBoost (AUC = 0.678), and CatBoost (AUC = 0.676) were significantly better than those of the other three models for the SOFA score components (Table 3). All models were validated using stratified k-fold (k = 5) cross-validation during training and demonstrated good generalization performance through repeated testing on random selections of the test data with presented CIs. The implemented prediction was more accurate in classifying mortality risk in patients with sepsis than a previous prediction study based on the Medical Information Mart for Intensive Care IV (MIMIC-IV) database (AUC = 0.61) and other previous studies using SOFA scores, which reported AUCs of 0.612–0.752.12, 13, 14, 50

In this context, the development of an ML model using real-world clinical data that predicts sepsis mortality with higher accuracy than the existing SOFA score is a significant achievement. Many existing studies have used MIMIC data to predict the mortality rate of patients with sepsis; however, most investigated patients after admission to the ICU. These studies utilized LR, RF, and SVM, or, when they utilized GBMs, did not differentiate among the models in detail. They also differ in the number of variables used and, even for patients admitted to the ICU, do not provide performance superior to that of this study. In addition, although they presented the influence of individual variables, this was done only through conventional statistical analysis.51, 52, 53 Thus, the ML models constructed in this study provide more accurate predictions and can improve patient outcomes. The ML model developed in this study aimed to predict the early mortality rate of patients with sepsis who visit an ER, which has significant implications for improving patient outcomes and reducing healthcare costs. The model achieved higher accuracy in predicting sepsis mortality when using various clinical variables than when using the SOFA score, demonstrating the potential of ML in enhancing sepsis management.

Generally, an artificial intelligence (AI) model, being a black box, cannot have its results, predictions, or decision-making interpreted and explained. Several studies have been conducted to interpret AI models to explain AI behavior. Compared with generally derived feature importance, feature importance produced by SHAP values provide a detailed explanation by considering offsets among all variables for result calculations by carrying out permutation calculations.54, 55 First, we developed a surrogate model, aiming to mirror the original model's behavior and calculate the SHAP values. A surrogate model represents a more straightforward and interpretable version, designed for enhanced ease of explanation. The SHAP values for each feature were determined by assessing how the surrogate model’s predictions changed when that feature was altered to its baseline value (usually the average or median of the training dataset). Traditional feature importance might indicate varying levels of significance based on the impact of parameters as the model undergoes successive iterations. Hence, in this research, we utilized the SHAP to quantify the importance of all variables in patient records and characteristics of the SOFA components, which influenced the performance of the ML classifier. Both lactate and albumin are laboratory markers that have been shown to be strongly associated with sepsis and its severity. From the SHAP values in Fig. 4, it was confirmed that lactate and albumin were important factors in predicting the mortality rate of patients with sepsis. Lactate, a by-product of anaerobic metabolism, is commonly used as a marker of tissue hypoxia and can result from sepsis-induced hypo-perfusion.55 Elevated lactate levels have been associated with increased mortality in patients with sepsis. In contrast, albumin is a protein synthesized by the liver and is commonly used as a marker of nutritional status and inflammation. 
In sepsis, albumin levels often decrease owing to increased capillary permeability and fluid shifts from the intravascular space to the interstitial space. Low albumin levels have been associated with increased mortality in septic patients. Our study showed that patients with sepsis who had higher lactate and lower albumin levels were at increased risk of mortality. Therefore, monitoring these laboratory markers in patients with sepsis may be useful for predicting prognosis and guiding management.
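
The baseline-substitution procedure described above can be illustrated with a minimal, self-contained sketch. The toy risk model, its weights, the patient values, and the baseline values below are all hypothetical and are not the study's model; the exact Shapley formula averages each feature's marginal contribution over all coalitions of the remaining features, with absent features replaced by their baseline values:

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical linear risk score over (lactate, albumin); illustrative only.
    lactate, albumin = x
    return 0.3 * lactate - 0.2 * albumin

def shapley_values(f, x, baseline):
    """Exact Shapley values, replacing 'absent' features with baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Coalition weight |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                x_with = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                x_without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (f(x_with) - f(x_without))
    return phi

x = [6.0, 2.5]      # hypothetical patient: lactate 6.0 mmol/L, albumin 2.5 g/dL
base = [1.5, 4.0]   # baseline: assumed cohort means
phi = shapley_values(model, x, base)
# Efficiency property: the contributions sum to model(x) - model(base).
```

Because the exact sum is exponential in the number of features, libraries such as shap approximate this average by sampling, which is the permutation-based attribution referred to above.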

However, this study has several limitations. First, although we used prospectively collected multicenter data with a large number of clinical variables, unstructured data such as bio-signals and images were not included. In the future, a multimodal study combining structured tabular data with unstructured data will be necessary. Second, no longitudinal analyses were conducted. We predicted mortality for patients with sepsis, but this was a cross-sectional study; although the results can be used as part of a CDSS, longitudinal studies are needed to design more precise and personalized treatments. Third, although this study aimed at early mortality prediction in septic patients using mainly ER-obtained clinical variables from a large, continuously updated cohort, it also included variables that are difficult to obtain early, such as infection source control and initial empirical therapy. These variables were retained to secure a large number of enrolled patients who would otherwise have been excluded, and because we considered them clinically important. Nevertheless, the comparison among the various ML models and the XAI interpretation of variable interactions constructed in this study showed that most variables that can be obtained quickly in the ER had a significant influence on early mortality prediction. We believe that a more accurate early mortality prediction model covering more patients can be developed through future prospective data collection. Lastly, interpretation through SHAP has potential issues because the attribution of feature importance is normally based on random permutations. This approach has limitations when predictors are highly correlated: high correlation among features can lead to SHAP values that do not precisely represent the actual importance of each feature.
In future studies, we plan to implement Kernel SHAP, a method that employs a weighted sampling strategy to calculate each SHAP value, taking into account the inter-correlations among features.
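
For reference, the coalition weighting used by Kernel SHAP can be sketched directly from the SHAP kernel of Lundberg and Lee; the snippet below is an illustrative computation (the feature count M is arbitrary), not the study's implementation:

```python
from math import comb

def kernel_shap_weight(M, s):
    """SHAP kernel weight for a coalition with s of M features present.

    Coalitions with s == 0 or s == M receive infinite weight; Kernel SHAP
    handles them as hard constraints rather than regression samples.
    """
    if s == 0 or s == M:
        return float("inf")
    return (M - 1) / (comb(M, s) * s * (M - s))

M = 10
weights = {s: kernel_shap_weight(M, s) for s in range(1, M)}
# The kernel is symmetric and concentrates weight on very small and very
# large coalitions, which are the most informative about single features.
```

Sampling coalitions according to this kernel and fitting a weighted linear regression over them yields the Kernel SHAP estimates.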

In conclusion, a new ML model for predicting sepsis mortality was developed. The model predicted sepsis mortality with higher accuracy than the SOFA score, a significant achievement in sepsis management. Constructed from various clinical variables in a large dataset, it outperformed SOFA and has the potential to improve patient outcomes by enabling more accurate predictions and facilitating timely interventions. However, further studies are necessary to validate this model and ensure its suitability for clinical practice.

SUPPLEMENTARY MATERIALS

Supplementary Table 1

SOFA score criteria


Supplementary Table 2

Data split of clinical variables


Supplementary Table 3

Data split of clinical variables


Supplementary Fig. 1

SHAP figure of clinical variables. Feature importance for clinical variables based on interpretation using Shapley additive explanations of the machine learning models, providing an information-dense summary of how the top features in the dataset impact the model output. (A) Logistic regression. (B) Support vector machine. (C) Random forest. (D) Extreme gradient boosting. (E) Light gradient boosting machine.


Supplementary Fig. 2

Confusion matrix results for clinical variables in all six models. (A) LR. (B) SVM. (C) RF. (D) XGB. (E) LGB. (F) CatBoost.


Supplementary Fig. 3

SHAP figure of SOFA score. Feature importance for SOFA component scores based on interpretation using Shapley additive explanations, providing an information-dense summary of how the top features in the dataset impact the model output. (A) Logistic regression. (B) Support vector machine. (C) Random forest. (D) Light gradient boosting machine. (E) Categorical boosting.


Supplementary Fig. 4

Confusion matrix results for SOFA component scores in all six models. (A) LR. (B) SVM. (C) RF. (D) XGB. (E) LGB. (F) CatBoost.

Click here to view.(170K, doc)

Notes

Funding: This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI21C1074), and supported by Korean Sepsis Alliance (KSA) affiliated with Korean Society of Critical Care Medicine (KSCCM) funded by the Research Program of the Korea Disease Control and Prevention Agency (fund code: 2019E280500, 2020E280700, 2021-10-026).

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Conceptualization: Park SW, Heo J.

  • Data curation: Oh DK, Lee SY, Park M, Lim CM, Heo J, Park SW, Kim D,1 Ha TJ, Kim TH, Lee DH, Kim D,2 Choi S, Kim MK.

  • Formal analysis: Moon DH, Heo YJ, Kim WJ, Lee SJ, Heo J.

  • Funding acquisition: Heo J.

  • Investigation: Park SW, Kang S, Yeo NY, Heo J.

  • Methodology: Park SW, Yeo NY.

  • Software: Park SW, Yeo NY.

  • Validation: Kim Y, Choi HS.

  • Visualization: Park SW, Yeo NY.

  • Writing - original draft: Park SW, Kang S, Yeo NY.

  • Writing - review & editing: Park SW, Kang S, Yeo NY, Han SS, Heo J, Kim Y, Choi HS.

Kim D,1 Dowon Kim; Kim D,2 DoHyun Kim.

ACKNOWLEDGMENTS

We were supported by the Korean Sepsis Alliance (KSA) affiliated with the Korean Society of Critical Care Medicine (KSCCM). The following persons and institutions participated in the Korean Sepsis Alliance (KSA): Sang-Bum Hong, Gee Young Suh, Kyeongman Jeon, Ryoung-Eun Ko, Young-Jae Cho, Yeon Joo Lee, Sung Yoon Lim, Sunghoon Park; Korea University Anam Hospital – Jae-myeong Lee; Daegu Catholic University Hospital – Kyung Chan Kim; Seoul National University Bundang Hospital – Yeon Joo Lee; Inje University Sanggye Paik Hospital – Youjin Chang; Samsung Medical Center – Kyeongman Jeon; Seoul National University Hospital – Sang-Min Lee; Suk-Kyung Hong; Pusan National University Yangsan Hospital – Woo Hyun Cho; Chonnam National University Hospital – Sang Hyun Kwak; Jeonbuk National University Hospital – Heung Bum Lee; Ulsan University Hospital – Jong-Joon Ahn; Jeju National University Hospital – Gil Myeong Seong; Chungnam National University Hospital – Song-I Lee; Hallym University Sacred Heart Hospital – Sunghoon Park; Hanyang University Guri Hospital – Tai Sun Park; Severance Hospital – Su Hwan Lee; Yeungnam University Medical Center – Eun Young Choi; Chungnam National University Sejong Hospital – Jae Young Moon; Inje University Ilsan Paik Hospital – Hyung Koo Kang.

References

    1. Rudd KE, Kissoon N, Limmathurotsakul D, Bory S, Mutahunga B, Seymour CW, et al. The global burden of sepsis: barriers and potential solutions. Crit Care 2018;22(1):232.
    2. Park DW, Chun BC, Kim JM, Sohn JW, Peck KR, Kim YS, et al. Epidemiological and clinical characteristics of community-acquired severe sepsis and septic shock: a prospective observational study in 12 university hospitals in Korea. J Korean Med Sci 2012;27(11):1308–1314.
    3. Reaven MS, Rozario NL, McCarter MS, Heffner AC. Incidence and risk factors associated with early death in patients with emergency department septic shock. Acute Crit Care 2022;37(2):193–201.
    4. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA 2016;315(8):801–810.
    5. Kim HI, Park S. Sepsis: early recognition and optimized treatment. Tuberc Respir Dis (Seoul) 2019;82(1):6–14.
    6. Seymour CW, Kennedy JN, Wang S, Chang CH, Elliott CF, Xu Z, et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA 2019;321(20):2003–2017.
    7. Liu H, Zhang L, Xu F, Li S, Wang Z, Han D, et al. Establishment of a prognostic model for patients with sepsis based on SOFA: a retrospective cohort study. J Int Med Res 2021;49(9):3000605211044892.
    8. Thakur R, Naga Rohith V, Arora JK. Mean SOFA score in comparison with APACHE II score in predicting mortality in surgical patients with sepsis. Cureus 2023;15(3):e36653.
    9. Koozi H, Lidestam A, Lengquist M, Johnsson P, Frigyesi A. A simple mortality prediction model for sepsis patients in intensive care. J Intensive Care Soc 2023;24(4):372–378.
    10. Li W, Wang M, Zhu B, Zhu Y, Xi X. Prediction of median survival time in sepsis patients by the SOFA score combined with different predictors. Burns Trauma 2020;8:tkz006.
    11. Pan X, Xie J, Zhang L, Wang X, Zhang S, Zhuang Y, et al. Evaluate prognostic accuracy of SOFA component score for mortality among adults with sepsis by machine learning method. BMC Infect Dis 2023;23(1):76.
    12. Yang J, Liao Y, Dai Y, Hu L, Cai Y. Prediction of prognosis in sepsis patients by the SOFA score combined with miR-150. Adv Clin Exp Med 2022;31(1):9–15.
    13. Liu Z, Meng Z, Li Y, Zhao J, Wu S, Gou S, et al. Prognostic accuracy of the serum lactate level, the SOFA score and the qSOFA score for mortality among adults with Sepsis. Scand J Trauma Resusc Emerg Med 2019;27(1):51.
    14. Li Y, Yan C, Gan Z, Xi X, Tan Z, Li J, et al. Prognostic values of SOFA score, qSOFA score, and LODS score for patients with sepsis. Ann Palliat Med 2020;9(3):1037–1044.
    15. Yue S, Li S, Huang X, Liu J, Hou X, Zhao Y, et al. Machine learning for the prediction of acute kidney injury in patients with sepsis. J Transl Med 2022;20(1):215.
    16. Kijpaisalratana N, Sanglertsinlapachai D, Techaratsami S, Musikatavorn K, Saoraya J. Machine learning algorithms for early sepsis detection in the emergency department: a retrospective study. Int J Med Inform 2022;160:104689.
    17. Yao RQ, Jin X, Wang GW, Yu Y, Wu GS, Zhu YB, et al. A machine learning-based prediction of hospital mortality in patients with postoperative sepsis. Front Med (Lausanne) 2020;7:445.
    18. Moor M, Rieck B, Horn M, Jutzeler CR, Borgwardt K. Early prediction of sepsis in the ICU using machine learning: a systematic review. Front Med (Lausanne) 2021;8:607952.
    19. Du M, Haag DG, Lynch JW, Mittinty MN. Comparison of the tree-based machine learning algorithms to cox regression in predicting the survival of oral and pharyngeal cancers: ANALYSES based on SEER database. Cancers (Basel) 2020;12(10):2802.
    20. Hou N, Li M, He L, Xie B, Wang L, Zhang R, et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med 2020;18(1):462.
    21. Hu CA, Chen CM, Fang YC, Liang SJ, Wang HC, Fang WF, et al. Using a machine learning approach to predict mortality in critically ill influenza patients: a cross-sectional retrospective multicentre study in Taiwan. BMJ Open 2020;10(2):e033898.
    22. Rodríguez A, Mendoza D, Ascuntar J, Jaimes F. Supervised classification techniques for prediction of mortality in adult patients with sepsis. Am J Emerg Med 2021;45:392–397.
    23. Greco M, Caruso PF, Spano S, Citterio G, Desai A, Molteni A, et al. Machine learning for early outcome prediction in septic patients in the emergency department. Algorithms 2023;16(2):76.
    24. van Doorn WP, Stassen PM, Borggreve HF, Schalkwijk MJ, Stoffers J, Bekers O, et al. A comparison of machine learning models versus clinical evaluation for mortality prediction in patients with sepsis. PLoS One 2021;16(1):e0245157.
    25. Kong G, Lin K, Hu Y. Using machine learning methods to predict in-hospital mortality of sepsis patients in the ICU. BMC Med Inform Decis Mak 2020;20(1):251.
    26. Park S, Jeon K, Oh DK, Choi EY, Seong GM, Heo J, et al. Normothermia in patients with sepsis who present to emergency departments is associated with low compliance with sepsis bundles and increased in-hospital mortality rate. Crit Care Med 2020;48(10):1462–1470.
    27. Valera Durán M. Surviving sepsis campaign: International guidelines for management of sepsis and septic shock: 2016. Rev Electron AnestesiaR 2017;9:2.
    28. Na SJ, Oh DK, Park S, Lee YJ, Hong SB, Park MH, et al. Clinical characteristics and outcomes of neutropenic sepsis: a multicenter cohort study. Shock 2022;57(5):659–665.
    29. Yeo HJ, Lee YS, Kim TH, Jang JH, Lee HB, Oh DK, et al. Vasopressor initiation within 1 hour of fluid loading is associated with increased mortality in septic shock patients: analysis of national registry data. Crit Care Med 2022;50(4):e351–e360.
    30. Jeon K, Na SJ, Oh DK, Park S, Choi EY, Kim SC, et al. Characteristics, management and clinical outcomes of patients with sepsis: a multicenter cohort study in Korea. Acute Crit Care 2019;34(3):179–191.
    31. Singh D, Singh B. Investigating the impact of data normalization on classification performance. Appl Soft Comput 2020;97:105524.
    32. Feng J, Xu H, Mannor S, Yan S. Robust logistic regression and classification; NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems; December 8-13, 2014; Montreal, Canada. Cambridge, MA, USA: MIT Press; 2014. pp. 253-261.
    33. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20(3):273–297.
    34. Breiman L. Random forests. Mach Learn 2001;45(1):5–32.
    35. Chen T, Guestrin C. XGBoost: a scalable tree boosting system; Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD’16; August 13-17, 2016; San Francisco, CA, USA. New York, NY, USA: Association for Computing Machinery; 2016. pp. 785-794.
    36. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017;30:3149–3157.
    37. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst 2018;31:6638–6648.
    38. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30:4765–4774.
    39. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011;12(85):2825–2830.
    40. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: a next-generation hyperparameter optimization framework; Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; August 4-8, 2019; Anchorage, AK, USA. New York, NY, USA: Association for Computing Machinery; 2019. pp. 2623-2631.
    41. Amland RC, Hahn-Cover KE. Clinical decision support for early recognition of sepsis. Am J Med Qual 2019;34(5):494–501.
    42. Do SN, Dao CX, Nguyen TA, Nguyen MH, Pham DT, Nguyen NT, et al. Sequential Organ Failure Assessment (SOFA) Score for predicting mortality in patients with sepsis in Vietnamese intensive care units: a multicentre, cross-sectional study. BMJ Open 2023;13(3):e064870.
    43. Moreno R, Rhodes A, Piquilloud L, Hernandez G, Takala J, Gershengorn HB, et al. The Sequential Organ Failure Assessment (SOFA) Score: has the time come for an update? Crit Care 2023;27(1):15.
    44. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, et al. Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA 2016;315(8):762–774.
    45. Schinkel M, Nanayakkara PW, Wiersinga WJ. Sepsis performance improvement programs: from evidence toward clinical implementation. Crit Care 2022;26(1):77.
    46. Burney M, Underwood J, McEvoy S, Nelson G, Dzierba A, Kauari V, et al. Early detection and treatment of severe sepsis in the emergency department: identifying barriers to implementation of a protocol-based approach. J Emerg Nurs 2012;38(6):512–517.
    47. Rajula HS, Verlato G, Manchia M, Antonucci N, Fanos V. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina (Kaunas) 2020;56(9):455.
    48. Churpek MM, Yuen TC, Winslow C, Meltzer DO, Kattan MW, Edelson DP. Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Crit Care Med 2016;44(2):368–374.
    49. Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed 2022;214:106584.
    50. Janiesch C, Zschech P, Heinrich K. Machine learning and deep learning. Electron Mark 2021;31(3):685–695.
    51. Ribas Ripoll VJ, Vellido A, Romero E, Ruiz-Rodríguez JC. Sepsis mortality prediction with the Quotient Basis Kernel. Artif Intell Med 2014;61(1):45–52.
    52. Zhang Z, Hong Y. Development of a novel score for the prediction of hospital mortality in patients with severe sepsis: the use of electronic healthcare records with LASSO regression. Oncotarget 2017;8(30):49637–49645.
    53. Li X, Zhou Y, Dvornek NC, Gu Y, Ventola P, Duncan JS. Efficient Shapley explanation for features importance estimation under uncertainty. Med Image Comput Comput Assist Interv 2020;12261:792–801.
    54. Bakker J, Nijsten MW, Jansen TC. Clinical use of lactate monitoring in critically ill patients. Ann Intensive Care 2013;3(1):12.
    55. Quinlan GJ, Martin GS, Evans TW. Albumin: biochemical properties and therapeutic potential. Hepatology 2005;41(6):1211–1219.
