Machine Learning Models for Predicting Mortality in Critically Ill Patients with Sepsis-Associated Acute Kidney Injury: A Systematic Review

While machine learning (ML) models hold promise for enhancing the management of acute kidney injury (AKI) in sepsis patients, creating models that are equitable and unbiased is crucial for accurate patient stratification and timely interventions. This study aimed to systematically summarize existing evidence to determine the effectiveness of ML algorithms for predicting mortality in patients with sepsis-associated AKI. An exhaustive literature search was conducted across several electronic databases, including PubMed, Scopus, and Web of Science, employing specific search terms. This review included studies published from 1 January 2000 to 1 February 2024. Studies were included if they reported on the use of ML for predicting mortality in patients with sepsis-associated AKI. Studies not written in English or with insufficient data were excluded. Data extraction and quality assessment were performed independently by two reviewers. Five studies were included in the final analysis, reporting a male predominance (>50%) among patients with sepsis-associated AKI. Limited data on race and ethnicity were available across the studies, with White patients comprising the majority of the study cohorts. The predictive models demonstrated varying levels of performance, with area under the receiver operating characteristic curve (AUROC) values ranging from 0.60 to 0.87. Algorithms such as extreme gradient boosting (XGBoost), random forest (RF), and logistic regression (LR) showed the best performance in terms of accuracy. The findings of this study show that ML models hold immense potential to identify high-risk patients, predict the progression of AKI early, and improve survival rates. However, the lack of fairness in ML models for predicting mortality in critically ill patients with sepsis-associated AKI could perpetuate existing healthcare disparities.
Therefore, it is crucial to develop trustworthy ML models to ensure their widespread adoption and reliance by both healthcare professionals and patients.


Introduction
Acute kidney injury (AKI) is a common public health threat worldwide, affecting one in five adults and one in three children [1]. The risk of AKI is even higher among patients with sepsis, which is implicated in 40-50% of cases [2]. Previous studies have highlighted that sepsis increases the likelihood of in-hospital mortality by six- to eightfold [3,4]. However, the timely identification and careful management of high-risk AKI patients may reduce mortality risk [5,6]. Recent technological advancements have paved the way for developing automated real-time monitoring tools, facilitating appropriate clinical interventions [7][8][9].
In recent decades, machine learning (ML) has emerged as a pivotal tool for predicting disease onset and effectively managing medical conditions [10][11][12]. By leveraging extensive clinical datasets comprising patient demographics, drug and disease history, organizational factors, lifestyle variables, and biomarkers [13], ML algorithms can accurately identify patterns and risk factors associated with conditions such as AKI [14][15][16]. Existing algorithms can analyze complex interactions among various factors, facilitating the development of predictive models that assess an individual's likelihood of developing AKI [17]. ML models also hold promise for personalized treatment plans by stratifying individual responses to various interventions, optimizing strategies for disease management, and ultimately improving patient outcomes [18][19][20].
However, developing fair ML models for stratifying high-risk AKI patients is crucial to minimize potential biases and disparities in healthcare outcomes. There is a dearth of prior research examining the fairness of ML models (e.g., testing ML performance across various racial and ethnic backgrounds), a factor critical for ensuring equitable treatment and accurate classification across diverse demographic groups. This study aims to fill this gap by evaluating the performance of ML models in predicting mortality in patients with sepsis-associated AKI using insights gleaned from previously published research. By ensuring fairness in ML for mortality prediction, healthcare systems can offer more equitable and precise care to individuals, regardless of their racial and ethnic background or contextual orientations. Moreover, this study also provides strategies for reducing bias and developing fair models within real-world clinical settings.

Search Strategy
A systematic search was conducted across electronic databases, including PubMed, Web of Science Core Collection, and Scopus, covering 1 January 2000 to 1 February 2024. The following keywords were used to retrieve all relevant articles: "machine learning" AND "sepsis-associated acute kidney injury" AND "mortality". This search strategy was validated by an expert and involved the development of structured search strategies. No language restriction was applied during the search process. The bibliographies of retrieved studies were further scrutinized to identify additional relevant studies for inclusion.

Eligibility Criteria
Studies were only eligible for inclusion if they met the following criteria: (i) described ML algorithms, (ii) focused on mortality in patients with sepsis-associated AKI as the outcome of interest, (iii) reported the predictive performance of ML models, and (iv) were clinical studies. Studies were excluded if they (i) were not written in English or (ii) were reviews, editorials, letters, or conference abstracts. No specific study design or setting was prioritized in the inclusion criteria.

Study Selection
Two reviewers (M.M.I. and C.C.W.) independently screened the titles and abstracts to identify relevant studies, after duplicate records were removed. The same two reviewers then conducted full-text screening of articles that met the inclusion criteria. Any discrepancies during the screening process were resolved through discussion between the two reviewers until a final consensus was reached.

Risk of Bias Assessment
The same two reviewers independently applied the Prediction model Risk Of Bias ASsessment Tool (PROBAST) to assess the methodological quality of each included ML model study [21]. This tool is specifically designed to assess the risk of bias and the applicability of diagnostic and prognostic prediction model studies. It is structured around four domains: participants, predictors, outcome, and analysis, which are crucial for evaluating the reliability and relevance of included studies. PROBAST rigorously assesses study methodologies to identify potential biases that might influence outcomes [22]. The tool is highly regarded for its systematic, structured method of assessing the integrity and clarity of prediction models [23].

Data Extraction
A table was generated to record all relevant data. The extracted information was independently collected and cross-checked by the same two authors. Any discrepancies were resolved through consensus based on predefined criteria. All data items were collected according to the Cochrane guidance for data collection and the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist [24]. This checklist assists in the appraisal and extraction of relevant data from prediction model studies for systematic reviews [25]. It guides reviewers through a comprehensive assessment of study methodologies, focusing on key aspects such as study participants, predictors, outcomes, and statistical analysis methods [26]. We extracted the following information from each study: first author, publication year, country, total number of patients, mean age, study design, AKI definition, patient inclusion and exclusion criteria, data partitioning method, name of ML model, internal and external validation details, and predictive performance metrics (including discrimination and calibration). If an included study reported multiple ML models, we recorded the predictive performance of each model separately.

Statistical Analysis
A descriptive summary was provided regarding the types of ML models used, the prediction of mortality in patients with sepsis-associated AKI, risk of bias assessment, and model validation. The results of the ML studies were presented for each predicted outcome to illustrate the predictive performance of each type of ML model.

Search Results
Initially, 85 studies were retrieved through the search process, and 66 duplicate records were excluded. The remaining 19 studies underwent initial screening, resulting in the exclusion of 12 studies based on title and abstract review. The remaining seven studies underwent full-text evaluation, leading to the exclusion of two further studies. Finally, five studies met the criteria for inclusion in this systematic review [27][28][29][30][31] (Figure 1).

Study Characteristics
A total of 57,769 patients were enrolled across the five studies. Table 1 presents the baseline characteristics of these studies. All studies utilized MIMIC data to develop and evaluate the performance of ML models. Luo et al. [27], however, employed data from the eICU Collaborative Research Database (eICU-CRD) to test their ML models. The majority (>50%) of patients across studies were male. While only two studies reported on race and ethnicity, a predominance of White patients was included [27,28]. Vital signs such as heart rate, respiratory rate, and blood pressure, along with laboratory tests including serum creatinine, platelets, white blood cell count (WBC), blood urea nitrogen (BUN), and urine output, were consistently assessed across all studies to predict mortality. Additionally, the Sequential Organ Failure Assessment (SOFA) score and Simplified Acute Physiology Score (SAPS II) were commonly utilized for outcome prediction in patients with severe sepsis-associated AKI.

The studies included in this systematic review were predominantly conducted in ICU settings, using the Sepsis-3 criteria as the selection standard. Of the five studies reviewed, three utilized data partitioning methods for model development and evaluation, involving various approaches to dividing the dataset into training and testing subsets. Two studies employed cross-validation, iteratively training and testing the model on different data subsets to assess generalizability and performance. The proportion of missing data varied, and different imputation methods such as XGBoost, MiceForest, and K-nearest neighbor were employed. Notably, none of the studies tested their models across different racial groups, highlighting a significant gap in evaluating the models' fairness and generalizability (Table 2).
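As a sketch of the preprocessing steps described above, K-nearest-neighbor imputation of missing values followed by partitioning into training and testing subsets can be implemented with scikit-learn. The data, feature names, and parameters below are illustrative only, not taken from the included studies:

```python
# Hypothetical sketch: KNN imputation of missing labs, then a train/test split.
# Feature names (heart_rate, creatinine, bun) are illustrative examples of the
# vitals and labs listed in the text; the data are synthetic.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "heart_rate": rng.normal(90, 15, 200),
    "creatinine": rng.normal(2.0, 0.8, 200),
    "bun": rng.normal(30, 10, 200),
})
# Simulate missingness in 10% of creatinine values
df.loc[df.sample(frac=0.1, random_state=0).index, "creatinine"] = np.nan

imputer = KNNImputer(n_neighbors=5)  # fill gaps from the 5 nearest patients
X = imputer.fit_transform(df)

y = rng.integers(0, 2, 200)          # synthetic mortality labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```

Stratifying the split on the outcome, as here, keeps the mortality rate comparable in the training and testing subsets.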

Predictor Identification
Gao et al. [29] initially considered 51 original factors to predict mortality in patients with sepsis-associated AKI. They then applied ensemble stepwise feature ranking to identify the 11 most promising factors. Li et al. [30] collected 44 variables and identified 24 statistically significant variables associated with mortality using LASSO regression. Luo et al. [27] applied the XGBoost algorithm to assess the importance of various features in predicting mortality. To explore the interpretability of these XGBoost models, the Shapley Additive Explanations (SHAP) method was also utilized. Yang et al. [31] initially conducted univariate regression analysis on all variables, excluding any with a p-value above 0.05. They then employed the Boruta algorithm, which relies on random forest techniques, to select the most important variables for predicting mortality. Finally, Zhou et al. [28] employed the recursive feature elimination (RFE) algorithm to pinpoint key variables for their ML models.
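To illustrate one of the selection approaches named above, recursive feature elimination (RFE) can be sketched with scikit-learn; the synthetic data and the choice of a logistic regression base estimator are our assumptions, not details from Zhou et al. [28]:

```python
# Illustrative sketch of recursive feature elimination (RFE): repeatedly fit a
# model, drop the weakest features, and stop at the requested subset size.
# Data and parameters are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)

selected = np.where(selector.support_)[0]  # indices of the retained features
```

The same pattern applies to LASSO (via `sklearn.linear_model.LassoCV`) or Boruta, with only the selector swapped out.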

Risk of Bias
Figure 2 presents the overall risk of bias assessment based on the PROBAST criteria. All studies were rated as having a low risk of bias in both the participant and predictor domains. However, in the outcome domain, three studies (60%) received an unclear risk of bias rating due to insufficient information on the outcome definition and assessment. Additionally, all studies were rated as having a high risk of bias due to the absence of external validation and the lack of evaluation of ML performance across diverse groups. Consequently, the overall risk of bias for the included studies was considered high, primarily due to the unclear ratings in the outcome domain and the lack of external validation.

Figure 2. Overall risk of bias assessment of included studies using PROBAST.


Performance of Machine Learning Models
The predictive models demonstrated varying levels of performance, with area under the receiver operating characteristic curve (AUROC) values ranging from 0.60 to 0.87. The included studies developed and tested several popular machine learning (ML) models, including logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). All studies developed and validated the XGBoost model, which showed an AUROC ranging from 0.79 to 0.87 and an accuracy between 0.77 and 0.83. RF and LR were also commonly used to predict mortality in patients with sepsis-associated AKI, achieving an average accuracy of 0.813 and an AUROC of 0.802. Table 3 presents the performance of these ML models in predicting mortality.
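The metrics reported above, AUROC and accuracy, are typically computed as in the following sketch. The data are synthetic, and scikit-learn's gradient boosting stands in for XGBoost so the example remains self-contained; none of this reproduces the included studies' pipelines:

```python
# Minimal sketch: train LR, RF, and a gradient-boosting stand-in for XGBoost,
# then report AUROC (discrimination) and accuracy on a held-out split.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("RF", RandomForestClassifier(random_state=0)),
                    ("GBM", GradientBoostingClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    # AUROC needs predicted probabilities of the positive class, not labels
    auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: AUROC={auroc:.3f}, accuracy={acc:.3f}")
```

Note that AUROC is threshold-free while accuracy depends on the default 0.5 cutoff, which is why studies often report both.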

Discussion
This systematic review of ML models for predicting mortality in patients with sepsis-associated acute kidney injury (AKI) reveals several key findings and areas for future research. To our knowledge, this represents the first systematic review of ML studies aimed at predicting mortality in this patient population. The findings show that the included studies were primarily conducted in ICU settings and utilized the Sepsis-3 criteria for patient selection. Data partitioning and cross-validation methods were employed to develop and evaluate the models, with three studies using traditional partitioning techniques and two opting for cross-validation to ensure generalizability.
Feature selection and imputation methods varied across studies, with XGBoost, MiceForest, and K-nearest neighbor among the imputation techniques used; these methods were crucial in handling missing data and identifying significant predictors of mortality. Vital signs, comorbidities, and laboratory tests emerged as critical prognostic indicators, aligning with clinical insights and reinforcing the importance of these variables in risk stratification. The included studies applied widely used feature selection techniques such as SHAP, LASSO regression, and recursive feature elimination to identify key variables for their ML models. The performance of ML models varied across studies, with AUROC values ranging from 0.60 to 0.87. Notably, algorithms such as XGBoost, random forest (RF), and logistic regression (LR) consistently emerged as high-performing models. Despite these advancements, a significant gap was identified in the evaluation of the models across different racial groups. None of the included studies tested model performance in diverse racial populations, raising concerns about the fairness and generalizability of the findings.
Recent evidence underscores the importance of ensuring the fairness of ML models when predicting diseases such as AKI [32][33][34]. AKI affects individuals from diverse racial and ethnic backgrounds, socioeconomic strata, sexual orientations, and geographical regions [35][36][37]. Without fair development, algorithms may disproportionately impact certain demographic groups, leading to inequitable healthcare outcomes. Algorithmic biases may also result in inaccurate predictions or misdiagnoses of high-risk AKI, potentially causing inappropriate treatments or delays in care. In contrast, developing fair ML models ensures that all individuals receive accurate assessments and appropriate interventions, regardless of their demographic characteristics [38,39].
Ethical concerns are increasingly raised regarding the differential impact that models may exert on under-represented communities [40][41][42]. Therefore, there is widespread acknowledgment of the urgent need for fairness in AI, especially within healthcare ML models [43]. With AKI prevalence higher among minority groups, it is essential to develop fair ML models using diverse and representative datasets. Continuous monitoring and testing of ML model performance across various demographic groups are essential for identifying and effectively addressing relevant biases. Indeed, including individual, organizational, and community factors in the model development and validation processes can offer valuable insights into fairness considerations [33,44,45]. To ensure algorithms' accountability and trustworthiness, transparent documentation of the model's decision-making process and regular audits are recommended.
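The monitoring recommended above can start with a simple subgroup audit: computing the model's AUROC separately for each demographic group and flagging large gaps. The sketch below is our illustration with synthetic data, not a method used by the included studies:

```python
# Illustrative fairness audit: compare AUROC across demographic subgroups.
# The "group" column and all scores are synthetic; group B's scores are left
# uninformative to simulate a model that underperforms on one subgroup.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n),
    "y_true": rng.integers(0, 2, size=n),   # observed mortality (synthetic)
    "y_score": rng.random(n),               # model risk scores (synthetic)
})
# Make group A's scores track the outcome; group B's stay random.
a = df["group"] == "A"
df.loc[a, "y_score"] = df.loc[a, "y_true"] * 0.6 + rng.random(a.sum()) * 0.4

per_group_auroc = {
    g: roc_auc_score(sub["y_true"], sub["y_score"])
    for g, sub in df.groupby("group")
}
gap = abs(per_group_auroc["A"] - per_group_auroc["B"])  # flag large gaps
```

A large `gap` value would prompt the kind of review and documentation the text calls for; richer fairness metrics (equalized odds, calibration by group) follow the same per-subgroup pattern.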

Strengths and Limitations:
Our study has several strengths. Firstly, it is the first systematic review assessing the fairness of ML models in predicting mortality among patients with sepsis-associated AKI. Secondly, our findings elucidate commonly utilized ML models, offering valuable insights for researchers embarking on similar investigations. Nevertheless, our study has several limitations that could affect the robustness and applicability of our findings. Firstly, it is based on a small sample of only five existing studies, which may reduce the statistical power and comprehensiveness of our analysis; more studies could have broadened the scope and depth of our assessment. Secondly, the use of a single dataset across all studies limits the generalizability of our conclusions, as results might not be replicable with different data. Lastly, the performance of the ML models, while satisfactory in predicting mortality, is drawn from data with limited racial and ethnic diversity. This lack of representation could potentially perpetuate health disparities, suggesting a need for more diverse datasets to ensure broader applicability and equity in healthcare outcomes.

Future Directions:
To enhance the robustness and relevance of future studies, several steps should be taken based on the current limitations.
(a) External Validation: One significant research gap is the absence of external validation. Most studies included in this review used the same dataset for both developing and evaluating the model. Future research should focus on validating machine learning models with external datasets to ensure their robustness and applicability across various clinical settings.
(b) Racial and Ethnic Diversity: None of the included studies evaluated their models in diverse racial and ethnic groups, which raises concerns about fairness and generalizability. Future research should focus on developing and testing models that can be applied across various populations to help reduce health disparities.
(c) Advanced Imputation Techniques: Although various imputation methods were used in the included studies, the development and application of more advanced techniques could significantly improve model accuracy, especially in datasets with high levels of missing data. Each study should also report the percentage of missing data to assist future researchers.
(d) Comprehensive Feature Selection: Although the included studies utilized popular feature selection algorithms, future research should explore more comprehensive and innovative techniques to identify novel prognostic indicators. Integrating clinical expertise with advanced statistical methods can lead to more reliable predictors of mortality among sepsis patients with AKI.
(e) Longitudinal Data: Using longitudinal data can enhance the predictive power of machine learning models for predicting mortality among sepsis patients with AKI. Future research should aim to develop models that incorporate and analyze temporal trends in patient data.
By addressing these gaps, future research can enhance the development of ML models that are not only accurate but also fair and generalizable, ultimately improving clinical outcomes for patients with sepsis-associated AKI.There is a need for research on the integration of ML models into clinical workflows; therefore, future studies should assess the feasibility, acceptance, and impact of these models in real-world clinical settings to facilitate their adoption.

Conclusions
In this study, we evaluated ML models for predicting mortality in patients with sepsis-associated acute kidney injury (AKI), a condition noted for its high morbidity and mortality rates. Our findings show that ML models such as logistic regression, random forest, and extreme gradient boosting (XGBoost) delivered promising results. These outcomes highlight the efficacy of advanced data-driven methods in improving predictive accuracy, which is essential for timely interventions and management in septic patients. Key feature selection methods such as Shapley Additive Explanations (SHAP), LASSO regression, and recursive feature elimination (RFE) were crucial in optimizing these models, underscoring the importance of selecting clinically relevant variables to maximize the benefits of ML models in healthcare. Despite the robust performance of these ML models, the included studies showed a significant risk of bias, potentially limiting the application of ML in medical diagnostics and planning. In particular, the included studies often excluded minority populations, which raises concerns about the models' effectiveness across diverse socio-economic and racial groups. Future research should focus on enhancing the fairness of algorithms to ensure consistent and reliable predictions across all demographic groups.

Table 1 .
Baseline characteristics of included studies.

Author Contributions: Conceptualization, C.-C.W. and M.M.I.; methodology, T.N.P.; software, T.N.P.; validation, M.-C.L.; formal analysis, Y.-C.W.; investigation, M.-C.L.; resources, M.M.I.; data curation, M.M.I.; writing-original draft preparation, M.M.I. and C.-C.W.; writing-review and editing, T.N.P.; visualization, M.M.I.; supervision, M.M.I. All authors have read and agreed to the published version of the manuscript.

Table 2 .
Summary of included studies on machine learning models for mortality prediction in sepsis-associated acute kidney injury.

Table 3 .
The performance of machine learning models for predicting mortality in patients with sepsis-associated acute kidney injury.