Explainable machine learning for predicting neurological outcome in hemorrhagic and ischemic stroke patients in critical care

Aim The objective of this study is to develop accurate machine learning (ML) models for predicting the neurological status at hospital discharge of critically ill patients with hemorrhagic and ischemic stroke and identify the risk factors associated with the neurological outcome of stroke, thereby providing healthcare professionals with enhanced clinical decision-making guidance. Materials and methods Data of stroke patients were extracted from the eICU Collaborative Research Database (eICU-CRD) for training and testing sets and the Medical Information Mart for Intensive Care IV (MIMIC IV) database for external validation. Four machine learning models, namely gradient boosting classifier (GBC), logistic regression (LR), multi-layer perceptron (MLP), and random forest (RF), were used for prediction of neurological outcome. Furthermore, shapley additive explanations (SHAP) algorithm was applied to explain models visually. Results A total of 1,216 hemorrhagic stroke patients and 954 ischemic stroke patients from eICU-CRD and 921 hemorrhagic stroke patients 902 ischemic stroke patients from MIMIC IV were included in this study. In the hemorrhagic stroke cohort, the LR model achieved the highest area under curve (AUC) of 0.887 in the test cohort, while in the ischemic stroke cohort, the RF model demonstrated the best performance with an AUC of 0.867 in the test cohort. Further analysis of risk factors was conducted using SHAP analysis and the results of this study were converted into an online prediction tool. Conclusion ML models are reliable tools for predicting hemorrhagic and ischemic stroke neurological outcome and have the potential to improve critical care of stroke patients. The summarized risk factors obtained from SHAP enable a more nuanced understanding of the reasoning behind prediction outcomes and the optimization of the treatment strategy.


Introduction
Stroke encompasses a set of conditions characterized by the sudden rupture or occlusion of cerebral blood vessels, ultimately resulting in insufficient blood flow and subsequent damage to brain tissue.Clinically, stroke is broadly classified into two main types-ischemic and hemorrhagic-with the latter comprising intracerebral and subarachnoid hemorrhage forms (1).Stroke affects a staggering one in every four individuals over 25 years of age, rendering it the second most common cause of mortality and third leading cause of disability among adult populations worldwide (2).Approximately 16 million people worldwide suffer from various motor and cognitive impairments as a result of stroke, which are often unavoidable sequelae for stroke patients, and severely affects the mobility and quality of life of stroke victims (3).
Acute stroke patients often enter the intensive care unit (ICU) due to consciousness disorders, cardiopulmonary complications, circulatory instability, or acute thrombolytic therapy (4).Compared with patients admitted to a dedicated neurological ward or stroke unit, those with stroke who are admitted to the ICU exhibit heightened neurological severity, notable impairment of consciousness at a moderate to severe level, often necessitating mechanical ventilation, and encounter an elevated risk of hospital mortality (5,6).ICU provides complex and resource-intensive treatment for hospitalized patients with severe conditions, but current medical resources are often insufficient to meet the needs of ICU patients, and hospitals face pressure to improve critical care efficiency and reduce costs (7).Early prediction of neurological outcome in critically ill stroke patients can provide important references for patients and their families, and can also guide clinicians to give the best intervention measures to patients.
In contrast to conventional predictive models that rely on established variables for computation, machine learning (ML) approaches offer the distinct advantage of incorporating a broader range of variables that more comprehensively capture the intricacies and inherent unpredictability of human physiology (8,9).Consequently, ML has emerged as a promising tool in the medical field, with its capacity to integrate abundant variables, extract nuanced insights, and generalize acquired knowledge to novel cases with remarkable efficiency and precision (10,11).Furthermore, interpretable machine learning is increasingly being applied in clinical research, demonstrating robust clinical applicability and guiding capabilities (12,13).
In this work, we aimed to construct ML models for early and effective prediction of neurological outcome at hospital discharge in critically ill patients with hemorrhagic and ischemic stroke, and employed the shapley additive explanations (SHAP) methods to elucidate the underlying reasons and decision-making processes involved within the optimal algorithm.

Study design
The implementation of the study design was shown in Figure 1.

Participants
In this study, patients diagnosed with stroke according to the ninth and 10th revisions of the international classification of diseases were included (Table 1).These patients were then stratified into hemorrhagic and ischemic cohorts for comparative analysis.Inclusion criteria included individuals over 18 years of age but under 89 years of age who had been in the ICU for at least more than 24 h, along with a Glasgow Coma Scale (GCS) score within 24 h of admission and a documented motor GCS score within 24 h prior to discharge.It is important to note that in the case of repeat ICU admissions, only data relating to the first ICU admission were retained.

Variables extraction and outcome
In this study, detailed demographic data were collected on age, gender, race, weight, height, and body mass index (BMI).The maximum, minimum and mean values of vital signs during the initial 24 h of ICU admission were extracted, encompassing heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean blood pressure (MBP), temperature, respiratory rate (RR), and oxygen saturation (SpO 2 ).Laboratory parameters obtained within the first 24 h of admission were also extracted.For certain parameters with multiple measurements, both the maximum and minimum values were evaluated.Moreover, the medical interventions employed during the 1st day of admission, such as mechanical ventilation and renal replacement therapy, along with the illness severity scoring systems, namely Charlson comorbidity index, GCS, acute physiology score III (APS III), and sequential organ failure assessment (SOFA), were recorded.Table 2 summarized the variables extracted.
The objective of this study was to investigate neurological status at the time of hospital discharge.In the stroke population, the National Institute of Health stroke scale (NIHSS) is a widely accepted metric to determine neurological outcomes.However, since NIHSS was not recorded in eICU-CRD or MIMIC IV, we adopted a surrogate neurological outcome marker based on the motor subscore of the Glasgow Coma Score (mGCS) at discharge.The mGCS score was stratified into two categories, favorable (mGCS score of 6) and unfavorable (mGCS score < 5).

Data preprocessing
In our data preprocessing approach, variables with missing values exceeding 40% were identified as unreliable and thus removed from the dataset to ensure data integrity.Outliers were then detected using the Interquartile Range (IQR) method, computed as the difference between the 75th and 25th percentiles (Q3 and Q1, respectively).Data points falling outside the range of Q1 -1.5 * IQR or Q3 + 1.5 * IQR were flagged as outliers and subsequently eliminated based on statistical conventions.Finally, the multiple imputation method was employed for the imputation of missing numerical values.Renowned for its robustness and capacity to handle intricate datasets, this algorithm efficiently imputes missing values while preserving the inherent data structure.

Model development
In order to prevent overfitting and simplify the model, we simplified the feature data for each outcome, thereby enabling the models to identify underlying patterns in the data and enhance their generalization capability.To achieve this goal, we utilized the least absolute shrinkage and selection operator (LASSO), a machine learning algorithm, and selected the optimal regularization coefficient lambda through a crossvalidation process.Specifically, we opted 10-fold for crossvalidation to determine the regularization parameter λ (penalty parameter) in the LASSO algorithm.This parameter facilitates variable selection and shrinkage, enabling the compression of some non-essential variables to zero once λ surpasses a certain threshold, thereby excluding them from the model.Through the computation of performance metrics, such as mean squared error, across various λ values during the cross-validation process, we identified the λ value corresponding to the minimum mean squared error as the final regularization parameter.At this λ value, the non-zero coefficients denote the selected significant features.Our study involved four machine learning models, namely Gradient Boosting Classifier (GBC), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and Random Forest (RF).They are all classification models in supervised learning, with GBC being an ensemble learning model, MLP being a deep learning model, while LR and RF are traditional machine learning models.To optimize model performance, we performed hyperparameter tuning with pre-set hyperparameters (Supplementary Tables 1, 2).The hyperparameter tuning was carried out using 10-fold cross-validation during the training set loop.

External validation
We conducted external validation using the MIMIC IV dataset, ensuring consistency in patient inclusion criteria and data processing methods with those described for the eICU patient data.Additionally, we ensured that the clinical indicators analyzed in the MIMIC database maintained consistent units with those in the eICU patient data to ensure the accuracy of validation.

Statistical analysis
We evaluated the predictive performance of our model by measuring several common performance metrics, including accuracy, positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, F-measure (F1), and area under the curve (AUC).To determine statistical significance, we used a threshold of P < 0.05 and applied Two-tailed Student's t-tests or Mann-Whitney U-tests for continuous variables, as well as Chi-squared or Fisher's exact tests for categorical variables.

Participants
A total of 1,216 hemorrhagic stroke patients and 954 ischemic stroke patients from the eICU-CRD dataset, as well as 921 hemorrhagic stroke patients and 902 ischemic stroke patients from the MIMIC IV dataset, were included in this study.The comparison of baseline features is presented in Table 3, while Supplementary Table 3 provides a summary of specific information for all patients.
In cohort extracted from the eICU-CRD, 33.47% (n = 407) of hemorrhagic stroke patients had unfavorable neurological outcome at discharge, while 25.47% (n = 243) of ischemic stroke patients had unfavorable neurological outcome at discharge.The proportion of unfavorable neurological outcome for hemorrhagic and ischemic stroke patients from MIMIC IV was 50.71% (n = 467) and 40.13% (n = 362), respectively.Supplementary Table 4 showed the specific information on hemorrhagic and ischemic stroke patients across training and testing sets.

Feature selection
The results of feature selection based on the LASSO algorithm was shown in Figures 2A-D.The optimal regularization coefficient lambda for each clinical outcome was selected through a crossvalidation process.In hemorrhagic cohort, the optimal lambda value for predicting neurological outcome was 0.01023367 and 0.01618563 in ischemic cohort.Figure 2E showed a Venn diagram of the features selected to predict neurological outcome in hemorrhagic and ischemic stroke.Upon application of the LASSO algorithm, a total of 26 and 23 features were discerned to be associated with neurological outcomes in patients diagnosed with hemorrhagic stroke and ischemic stroke, respectively.Encouragingly, it was observed that 13 features exhibited shared significance across both stroke types, emphasizing potential converging mechanisms influencing neurological outcomes.

Model performance
Four models, GBC, LR, MLP, and RF were generated to predict neurological outcome in the extracted cohort using the selected features.Acute physiology and chronic health evaluation IV (APACHE IV), a wildly used method for evaluating critically ill patients, was assessed to compared with the generated models in testing set.A set of detailed performance metrics for various machine learning models was presented in Table 4. Figures 3A, C depicted the predictive performances of the four models and APACHE reference in terms of AUC curve and decision curve analysis (DCA) curve.Among the four models, LR model showed the highest accuracy (0.83), PPV (0.734), specificity (0.86), F1 (0.752) and AUC (0.887).The performance of the optimal model was improved compared to APACHE reference.According to the DCA curves of the four predictive models, the net benefit for LR model was larger over the range of the other models.
In the same way, the four models were generated to predict neurological outcome in ischemic stroke cohort.Figures 3B, D exhibited the discrimination performance of these models via AUC and DCA curves in the testing set.The predictive performance of each model was presented in Table 5.Of the four models, RF model had the best predictive performance (AUC = 0.867).Besides, RF model had the highest F1 (0.66).In addition, GBC model had the highest NPV (0.973) and sensitivity (0.945) and LR model had the highest accuracy (0.794), PPV (0.57) and specificity (0.799).

Model interpretation
In order to comprehensively elucidate the effect of various clinical features on the neurological outcome of stroke patients, we employed the SHAP algorithm to determine their overall positive or negative impact on the optimal model output.As shown in Figure 4A, GCS score ranked the first in importance among the features for predicting neurological outcome in hemorrhagic stroke cohort, followed by APS III score, age, glucose_min, and sodium_max.Figure 4B showed that GCS score had the most potent predictive power in predicting neurological outcome in ischemic stroke patients, followed by APS III score, SOFA score, mechanical ventilation and temperature_max.To offer a comprehensive overview of feature importance ranking in optimal model construction, Supplementary Figure 1 also provided the summaries of the feature importance ranking for predicting neurological outcome in hemorrhagic and ischemic stroke.
To facilitate a more intuitive understanding of how alterations in individual clinical features impact the model's output, we included SHAP dependence plots that depict the top 10 contributing features for each model in Figure 5, SHAP dependence plots of the remaining features were shown in Supplementary Figures 2, 3.In the case of hemorrhagic stroke, as illustrated in Figures 5A-J, patients with lower GCS score (<11), sodium_min (<137 mmol/L), SpO 2 _mean (<97%) and lighter weight (<80 kg) or higher APS III score (>50), glucose_min (>125 mg/dL), sodium_max (>142 mmol/L), heart rate_max (>100 beats/minute), temperature_max (>37.5 • C), and order age (>65 years) are more likely to be predicted as having unfavorable neurological outcome.In the ischemic stroke group, the effects of GCS score, APS III score, temperature_max, heart rate_max, glucose_min, and sodium_max on the model's predictions aligned with those observed in the hemorrhagic stroke group.Additionally, ischemic stroke patients with higher SOFA score (>3) and wbc_max (>10 9/L ), lower bicarbonate_min (<23 mEq/L) or treated with mechanical ventilation are more prone to be predicted as having unfavorable neurological outcome (Figures 5K-T).

External validation
Our predictive analyses of the MIMIC IV validation cohorts demonstrated a consistent pattern with that observed in the eICU-CRD cohorts (Supplementary Figure 4).In the validation set, the LR model demonstrated an AUC of 0.836 for predicting neurological outcome in hemorrhagic stroke patients, which represented a decrease of 0.051 compared to the testing set.For the ischemic stroke validation set, the AUC of the RF model was 0.856 for neurological outcome, demonstrating a reduction of 0.011.A comprehensive overview of external validation results is in Supplementary Tables 5, 6.

Online tool for prediction
Based on the optimal model for predicting neurological outcome in hemorrhagic and ischemic stroke, along with the relevant clinical variables that it encompasses, we devised an online prediction tool (Figure 6).By selecting the stroke type and entering the relevant clinical data, the user can obtain an automatically generated ID for outcome query.After entering the above ID into the query interface, the user can obtain the neurological outcome prediction result.This prediction tool is accessible at: http://www.strokeprophet.cn.

Discussion
This retrospective analysis delved into the medical records of stroke patients, utilizing data from the eICU-CRD multicenter database, and effectively validated the findings using the MIMIC IV database.By employing multiple algorithmic techniques and various machine learning models, we successfully identified the features of clinical indicators within the first 24 h of admission that are highly correlated with the neurofunctional state at discharge.Importantly, we integrated the SHAP algorithm and existing literature to provide comprehensive interpretability and inference for these factors, which holds significant value in guiding future prospective research and supporting clinical decision-making in stroke care.Most importantly, the results of this study have been translated into a practical online tool that enables precise prediction of neurologic prognosis in hemorrhagic and ischemic stroke patients using features from the first 24 h in the ICU.
Categorically, stroke can be divided into two primary subtypes: ischemic stroke, accounting for 87% of all cases, and hemorrhagic stroke, comprising intracerebral hemorrhage and subarachnoid hemorrhage, which collectively make up 10 and 3% of stroke occurrences, respectively.Prompt emergency interventions aimed at restoring blood flow prove crucial in improving patient prognosis in the context of ischemic stroke.Conversely, effective management of hemorrhagic stroke necessitates surgical hemostasis and the control of intracranial pressure (14).Existing studies primarily consist of single-center, retrospective observational research aimed at understanding the differences in neurofunctional influencing factors between these two subtypes of stroke (14).However, there a lack of reported research investigating the neurofunctional prediction and exploration of risk factors specifically in these two stroke patient populations.Our study significantly contributes to filling this gap in knowledge.
Our comprehensive study reveals that within 24 h of hospital admission, ventilator use, age, temperature, GCS score, APS III score, heart rate, blood sodium levels, blood calcium levels, respiratory rate, blood bicarbonate levels, blood glucose  Recent discoveries indicate that the requirement for mechanical ventilation among stroke patients may exhibit a stronger correlation with the site of brain injury rather than the stroke subtype itself.Consequently, optimizing ventilation strategies assumes a crucial role in enhancing the prognosis of patients within the hemorrhagic stroke cohort (15).In stark contrast, our investigation unveils an intriguing observation: aside from its influence on hemorrhagic stroke outcomes, mechanical ventilator employment also exerts a significant detrimental impact on the prognosis of neurological function in patients suffering from ischemic stroke.Furthermore, our findings identify respiratory rate as a contributing factor affecting the neurological prognosis in both stroke subtypes.These observations may be germane to the development of complicating conditions such as ventilatorassociated pneumonia and acute respiratory distress syndrome (16).Therefore, when treating stroke patients, individual circumstances should be carefully considered, mechanical ventilation therapy should be optimized.
Several studies have yielded compelling evidence regarding the influence of hypernatremia and hypernatremia on the neurological prognosis of stroke patients (17,18).Notably, we have identified a novel association between blood bicarbonate ion levels and the neurological prognosis in both hemorrhagic and ischemic stroke patients, a correlation that has not been previously reported in the literature.The presence of acute hypernatremia and underlying diabetes, recognized risk factors for cerebrovascular disease, further exacerbate the neurological prognosis in stroke patients (19).Additionally, heightened heart rate and elevated body temperature emerge as crucial factors contributing to an unfavorable prognosis among this patient population (20, 21).Our investigation also reveals that advanced age significantly impairs neurological recovery, with older stroke patients exhibiting diminished prognosis relative to their younger counterparts (22).Intriguingly, the observations in elderly mice suggest an enhanced propensity for neutrophil plugging in the ischemic brain microcirculation post-stroke, resulting in compromised blood flow and worsened prognosis (23).Furthermore, dysfunctionality within microglia in aged mouse brains may contribute to the deteriorating neurological prognosis following stroke (24).Similarly, Peng et al.'s report on stroke cohorts from 1990 to 2019 also indicates a relative inadequacy in healthcare for stroke patients across all age groups, underscoring the need for further investigation into the impact of age on mortality trends (25).Disparities in neurological prognosis between racial groups have been noted, with Black patients exhibiting poorer outcomes compared to White patients, potentially linked to stroke subtype (26).Furthermore, Huang et al.'s study also found that race is one of the top 11 most important FIGURE SHAP dependence plots of the top features of predicting neurological outcome in hemorrhagic stroke cohort (A-J) and ischemic stroke cohort (K-T).SHAP, Shapley additive explanations; GCS, Glasgow Coma Scale; APS III, Acute Physiology Score III; SpO , Oxygen saturation; SOFA, Sequential Organ Failure Assessment; WBC, white blood cell.
features for predicting 28-day all-cause in-hospital mortality among hypertensive ischemic or hemorrhagic stroke patients (27).Our study corroborates these findings, highlighting that White patients and Hispanic patients experience inferior neurological prognosis in hemorrhagic stroke, while other ethnic groups fare worse in ischemic stroke.Remarkably, Black patients exhibit poorer prognoses across both stroke subtypes (28).Nevertheless, comprehensive investigations are warranted to validate and elucidate these observations.Scoring systems play a pivotal role in evaluating the condition and predicting the prognosis of patients, thereby guiding treatment strategies and facilitating informed decision-making.Among these scoring systems, the GCS has widespread application in stroke patients, individuals undergoing open-heart surgery, and those with varying degrees of coma arising from diverse etiologies (29).Our investigation highlights a significant correlation between GCS score and APS III scores and the neurological outcome of patients afflicted by both hemorrhagic and ischemic stroke.Notably, hemorrhagic stroke patients exhibited diminished neurological prognosis when their GCS score was ≤11, whereas ischemic stroke patients experienced compromised functional prognosis when their GCS score fell below ≤12.However, the underlying mechanisms contributing to these findings necessitate further inquiry.Furthermore, our study reveals that the SOFA score exerts a more pronounced impact on the neurological prognosis of individuals with ischemic stroke relative to the aforementioned scoring metrics (30).
Electrolyte imbalance manifests earlier in individuals suffering from ischemic stroke and holds the potential to serve as a prognostic indicator for neurological outcomes among stroke patients (31).In light of our investigation, it has been established that excessive serum chloride ion concentration (>110 mmol/L) exerts a more pronounced influence on neurological function within the hemorrhagic stroke patient cohort, thereby emphasizing the role of averting hyperchloremia in enhancing neurological outcomes for individuals with hemorrhagic stroke (32).Remarkably, our novel observation reveals that heightened anion gap exhibits a greater impact on the neurological prognosis of ischemic stroke patients compared to ischemic individuals, an unprecedented discovery in current literature.Clinical practitioners should therefore dedicate substantial attention to blood gas analysis results, ensuring prompt correction of electrolyte imbalances while avoiding overcorrection.
Studies in the field of stroke hemodynamics have predominantly concentrated on ischemic stroke due to its higher incidence rate compared to hemorrhagic stroke.Both hypertension and hypotension have detrimental effects on acute ischemic stroke.Generally, antihypertensive therapy is recommended when systolic blood pressure (SBP) exceeds 220 or 180 mmHg in patients eligible for thrombolytic therapy.In the case of acute hemorrhagic stroke, SBP exceeding 140 mmHg is associated with an unfavorable neurological prognosis (33).Notably, our investigation reveals that when monitoring circulatory dynamics in stroke patients, it is crucial to adopt distinct approaches depending on whether the patient has a hemorrhagic or ischemic stroke.For patients with hemorrhagic stroke, the prognostic significance of maximum diastolic blood pressure on neurological function is of greater importance.Conversely, in ischemic patients, both the highest and lowest mean blood pressure levels exert an impact on prognosis.Furthermore, the administration of vasopressors escalates the risk of poor neurological prognosis among individuals with hemorrhagic stroke.
In conjunction with the aforementioned findings, it is imperative to consider several additional factors that warrant separate attention and pertain to patients afflicted with different stroke types.Among individuals with hemorrhagic stroke, maintaining an optimal level of SpO 2 proves advantageous for prognosis.Notably, accumulating evidence suggests that oxygen therapy is not devoid of risks and should be withheld when SpO 2 exceeds 90%.Moreover, in patients at risk of hypercapnia, the threshold for oxygen therapy ought to be even lower, specifically ≤ 88%.Once oxygen therapy has been initiated, diligent monitoring of the patient's oxygen saturation and inhaled oxygen concentration becomes paramount in order to maintain SpO 2 within the targeted range (93-96%) and avert the detrimental consequences of hypercapnia (34).Furthermore, it is worth noting that hemorrhagic stroke patients with lower body weight exhibit an inferior neurological prognosis (35), while diminished levels of blood urea nitrogen are likewise associated with an unfavorable neurological outcome (36).
In the context of individuals suffering from ischemic stroke, heightened emphasis should be placed on monitoring the patient's blood leucocyte count as a means to mitigate infection risk and enhance neurological prognosis (37).Moreover, it is crucial to duly acknowledge the substantial influence exerted by the international normalized ratio (INR) and plasma prothrombin time on ischemic stroke patients, especially following thrombolytic therapy, warranting careful attention (38, 39).These discoveries offer valuable insights into the management of stroke patients necessitating intensive care unit treatment and furnish guidance for improved resource allocation in clinical decision-making.
The final point we wish to emphasize is that machine learning algorithms excel at constructing complex models and making informed decisions when provided with ample and relevant data.In our study, the choice of machine learning models was carefully guided by their unique strengths and documented effectiveness in similar tasks.The GBC model was selected for its exceptional performance in managing high-dimensional data and intricate feature relationships.By leveraging ensemble learning techniques, GBC adeptly captures non-linear associations within the data, making it well-suited for addressing complex classification challenges (40).The RF model was chosen owing to its resilience and efficiency in handling high-dimensional datasets with numerous features (41).Notably, Elsaid et al.'s study on hemorrhagic transformation prediction found that both RFC and GBC models, capable of capturing non-linear interactions among predictor variables, yielded the best predictive performance (AUC: 0.91, 95% CI: 0.85-0.95;AUC: 0.91, 95% CI: 0.86-0.95,respectively) (42).For binary classification tasks, the simplicity, interpretability, and effectiveness of LR made it a prudent choice.Su et al.'s investigation on post-stroke cognitive impairment (PSCI) underscored LR's efficacy in discerning the varying impacts of different factors on cognitive impairment (43).The MLP model was selected for its ability to handle complex non-linear relationships, as demonstrated in Zhou et al.'s study on dementia cognitive footprint recognition (44).MLP emerged as the top-performing model, showcasing satisfactory performance in dementia identification.In our study, the LR model demonstrated the highest AUC of 0.887 in the hemorrhagic stroke cohort, whereas the RF model exhibited superior performance in the ischemic stroke cohort, with an AUC of 0.867.Considering the insightful observations derived from SHAP regarding the importance of variables in predicting each stroke subtype, we analyze and speculate that the primary reason for the differences in optimal models lies in the interactions among predictive factors.Specifically, in hemorrhagic stroke prediction, the most crucial variables predominantly consisted of continuous variables such as age, weight, and glucose.This suggests that LR, with its linear decision boundary, may be better suited to capture the linear relationships among these predictors, thereby achieving higher predictive performance.Conversely, in ischemic stroke prediction, notable variables included the use of mechanical ventilation (mechvent).This intriguing finding suggests that RF, with its capacity to capture complex non-linear relationships, may excel in identifying intricate patterns involving categorical variables like mechvent, which LR may overlook due to its linear nature.These findings lead us to hypothesize that LR's superior performance in hemorrhagic stroke prediction may be attributed to the predominantly linear relationships among predictors, while RF's effectiveness in predicting ischemic stroke-induced damage may stem from its ability to capture non-linear associations.However, further research and validation are warranted to confirm these hypotheses and gain deeper insights into the underlying mechanisms driving model performance discrepancies between stroke subtypes.
The present study has several limitations that warrant consideration.Firstly, our analysis was retrospective in nature, which resulted in the exclusion of certain variables with high missing rates but potentially significant predictive value, such as lactate levels.Secondly, our inability to directly evaluate patients necessitated reliance on diagnostic codes to define our patient cohort, raising the possibility of incorrect associations due to misclassifications.Thirdly, our focus on patients who remained in the ICU for over 24 h and had an mGCS score 24 h before discharge resulted in the exclusion of a sizable number of patients, and this may have introduced some bias into our results.In addition, some important features, namely neuroimaging and electrophysiological examinations, were not included in this study due to the limitations of the database.

Conclusions
In summary, we employed advanced machine learning techniques to identify and compare the shared and distinct factors that influence hospital discharge outcomes in both hemorrhagic and ischemic stroke.By elucidating these factors, our research will contribute to the advancement of knowledge in the field of stroke, inform medical decision-making, and guide personalized treatment strategies.Our results also demonstrate that machine learning models outperform single standard scoring systems, with the potential to ultimately improve patient outcomes.
The present study was a retrospective modeling study utilizing data from two widely used databases-the eICU Collaborative Research Database (eICU-CRD v2.0) spanning 2014-2015 and the Medical Information Mart for Intensive Care IV (MIMIC IV version 2.2) covering 2008-2019.The author of this study underwent rigorous training, culminating in certification (number 49437998), and was tasked with data extraction following secure access to both databases.

FIGURE
FIGUREWorkflow of the study and flow diagram of study participants inclusion.

FIGURE
FIGURE Feature selection using LASSO algorithm.Tuning parameter λ selection by -fold cross-validation with minimum criteria.The binomial deviance is plotted on the y-axis against the logarithm of λ on the x-axis.Vertical dotted lines are drawn at the optimal value of λ, which corresponds to the point where the model achieves the best fit to the data.(A, C) Depicting partial likelihood deviance of the LASSO regression for neurological outcome of patients with hemorrhagic and ischemic stroke, respectively.(B, D) Coe cient profile of the clinical features associated with neurological outcome of patients with hemorrhagic stroke and ischemic stroke.(E) Venn diagram of selected features associated with neurological outcome in hemorrhagic and ischemic stroke.LASSO, least absolute shrinkage and selection operator.

FIGURE
FIGURE Model performance in hemorrhagic stroke test cohort and ischemic stroke test cohort.Receiver operating characteristic (ROC) analysis of GBC, LR, MLP, RF models, and APACHE reference.(A) Hemorrhagic stroke; (B) Ischemic stroke.Decision curve analysis (DCA) curves of four machine learning models.(C) Hemorrhagic stroke; (D) Ischemic stroke.GBC, gradient boosting classifier; LR, logistic regression; MLP, multi-layer perceptron; RF, random forest; APACHE, acute physiology and chronic health evaluation.

FIGURE
FIGURESHAP analysis of the optimal model for neurological outcome in hemorrhagic stroke cohort (A) and ischemic stroke cohort (B).SHAP, Shapley additive explanations.

FIGURE
FIGUREAn example of online tool for prediction.
TABLE The international classification of diseases codes of the patients included in the study.
TABLE Clinical features overview.
TABLE Model performance summary of all models in hemorrhagic stroke test cohort.
TABLE Model performance summary of all models in ischemic stroke test cohort.