Prediction of Intracranial Hypertension and Brain Tissue Hypoxia Utilizing High-Resolution Data from the BOOST-II Clinical Trial

The current approach to intracranial hypertension and brain tissue hypoxia is reactive, based on fixed thresholds. We used statistical machine learning on high-frequency intracranial pressure (ICP) and partial brain tissue oxygen tension (PbtO2) data obtained from the BOOST-II trial with the goal of constructing robust quantitative models to predict ICP/PbtO2 crises. We derived the following machine learning models: logistic regression (LR), elastic net (EN), and random forest (RF). We split the data set 70–30% into training and testing sets and utilized a discrete-time survival analysis framework and a 5-fold hyperparameter optimization strategy for all models. We compared model performance on discrimination between events and non-events of increased ICP or low PbtO2 using the area under the receiver operating characteristic curve (AUROC). We further analyzed clinical utility through a decision curve analysis (DCA). When considering discrimination, the number of features, and interpretability, we identified the RF model that combined the most recent ICP reading, episode number, and longitudinal trends over the preceding 30 min as the best performing for predicting ICP crisis events within the next 30 min (AUROC, 0.78). For PbtO2, the LR model utilizing the most recent reading, episode number, and longitudinal trends over the preceding 30 min was the best performing (AUROC, 0.84). The DCA showed clinical usefulness over a wide range of risk thresholds for both ICP and PbtO2 predictions. Acceptable alerting thresholds could range from 20% to 80% depending on a patient-specific assessment of the benefit-risk ratio of a given intervention in response to the alert.


Introduction
Annually, >5.5 million people worldwide experience severe traumatic brain injury (sTBI). 1 Outcomes have not substantially changed over the past 30 years, with mortality of 30-40% and very limited breakthroughs after almost 200 randomized controlled trials (RCTs). 2,3 There are few effective treatments for sTBI, and management presently centers on the early evacuation of mass lesions and on the identification and treatment of secondary brain injury (SBI) that evolves in the hours and days after the initial impact. Contemporary critical care of patients with sTBI aims to identify and manage SBI by monitoring intracranial pressure (ICP), cerebral perfusion pressure (CPP), and partial brain tissue oxygen tension (PbtO2). 4 This approach is recommended by the Brain Trauma Foundation guidelines and, more recently, by the Seattle International Severe TBI Consensus Conference. 5,6 The therapeutic paradigm underlying these recommendations is a reactive one, where fixed, population-based treatment thresholds are observed and acted upon to alleviate SBI. However, by the time treatment is enacted, it may be too late. The ability to predict the onset of these "crisis" events would give clinicians valuable time to attempt to abort or manage these episodes more effectively, instead of merely reacting when thresholds are violated. 7 Prediction efforts can be broadly divided into two approaches: 1) ICP forecasting, involving algorithms designed to predict future ICP values, and 2) ICP dose prediction, which involves algorithms aimed at the development of early warning systems for impending crisis events. Our work belongs to the latter category. A few studies have attempted to forecast future ICP values or to predict the onset of ICP crisis events, and one investigation has explored both ICP and PbtO2 dose predictions. [8][9][10]
In this article, we report on the performance and clinical usefulness of predictive models for intracranial hypertension and brain tissue hypoxia in high-frequency data obtained from the BOOST-II (Brain Oxygen Optimization in Severe Traumatic Brain Injury) phase II randomized trial. 11 The objectives are 2-fold: 1) to explore machine learning models that utilize high-frequency data and a minimal set of features to predict intracranial hypertension and brain tissue hypoxia insults as defined in BOOST-II; and 2) to show that this modeling is of clinical utility based on decision curve analysis (DCA).

Methods
The BOOST-II data set is the source of data for the present work, and the study was approved by the University of Chicago (UChicago) institutional review board (IRB) under protocol IRB19-1847. The BOOST-II study was a two-arm, single-blind, prospective, randomized controlled multi-center phase II trial assessing the safety and efficacy of a management protocol optimizing PbtO2 post-sTBI (ClinicalTrials.gov registration: NCT00974259); 110 patients were randomized. After randomization, the control group (ICP only) was managed with a standard-of-care step-wise intervention strategy triggered by an ICP ≥20 mm Hg for >5 min. The intervention group (ICP + PbtO2) was medically managed with step-wise treatments to correct either an ICP increase or a reduction in PbtO2 (≤20 mm Hg, >5 min).
The study concluded that a treatment protocol guided by both ICP and PbtO2 reduces the duration of measured brain tissue hypoxia. 11 The combined ICP/PbtO2 group was managed according to four types of events: A) no interventions; B) high ICP; C) low PbtO2; and D) high ICP + low PbtO2 (Supplementary Table S1 compares the characteristics of patients who experienced at least one ICP or PbtO2 event with those of patients who experienced no events). For ICP prediction, we investigated the succession of events from A or C to B or D; once this change was detected, subsequent observations were discarded (i.e., all B/D to B/D episodes were removed). For PbtO2, the succession used was A or B to C or D; once this change was detected, subsequent observations were discarded (i.e., all C/D to C/D episodes were removed). We constructed several sets of features for both outcomes. First, we used the event number and the last recorded measurement. Then, we expanded the set to include trends (mean, median, standard deviation, minimum, maximum, difference between first and last recording, difference between most recent recordings, and area under the curve [AUC]) over the preceding 30 min.
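As a rough illustration of the trend features listed above, the following Python sketch summarizes one 30-min window of readings. The function name `trend_features`, the dictionary keys, the trapezoidal rule for the area under the curve, the use of the sample standard deviation, and the sign convention of the two differences are all assumptions made for this example; the text does not specify these details.

```python
from statistics import mean, median, stdev


def trend_features(times_min, values):
    """Summarize one window of readings (hypothetical helper, not the study code).

    times_min: sample times in minutes, in ascending order.
    values:    the corresponding ICP or PbtO2 readings in mm Hg.
    """
    if len(values) < 2 or len(times_min) != len(values):
        raise ValueError("need at least two time-aligned readings")

    # Assumed: trapezoidal area under the signal curve, in mm Hg x min.
    auc = sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times_min, times_min[1:], values, values[1:])
    )

    return {
        "last": values[-1],
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # assumed: sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
        "diff_first_last": values[-1] - values[0],  # assumed sign: last minus first
        "diff_recent": values[-1] - values[-2],     # assumed: two latest readings
        "auc": auc,
    }


# Illustrative readings only, not trial data.
window = trend_features([0, 10, 20, 30], [12.0, 14.0, 15.0, 19.0])
print(window)
```

In practice the window would hold many more samples from the high-frequency recording; four points are used here only to keep the arithmetic easy to follow.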
Finally, we added frequency-domain measures (slope of power spectrum distribution, variance of power spectrum distribution, and approximate entropy). We derived the following machine learning models: logistic regression (LR), elastic net (EN), and random forest (RF). We split the data set into 70-30% for training and testing and utilized a discrete-time survival analysis framework and 5-fold hyperparameter optimization strategy for all models. Briefly, data within the training set were blocked into 30-min intervals where the last observations were chosen as representative of the block. Models were trained to predict the outcome within the next block. The test data set was not blocked when evaluating model performances. We compared model performances on discrimination between events and non-events of increased ICP or low PbtO2 with the area under the receiver operating characteristic (AUROC) curve. We further analyzed clinical utility or net benefit from making treatment decisions based on predictions through a DCA using a set of observations sampled from the test data set that had an equal proportion of outcomes. 12

Results
Figure 1 (upper panel) depicts the AUROC best model performances for ICP prediction, with the RF exhibiting best performance (AUROC, 0.78), whereas for PbtO2, the LR model was the best performing (AUROC, 0.84). Supplementary Tables S2 and S3 provide AUROCs for ICP and PbtO2 prediction within the next 30 min for the various models tested. In summary, the LR model that only used the most recent ICP reading and episode number predicted ICP crisis events within the next 30 min with good discrimination (AUC, 0.73; 95% confidence interval, 0.72-0.74). However, extending to an RF model improved performance (RF AUC 0.78 vs. LR AUC 0.73; p < 0.001).
For PbtO2 crisis event prediction, the addition of longitudinal features to a feature set comprising the most recent PbtO2 reading and episode number did not meaningfully improve performance of the LR model (AUC, 0.84 vs. 0.83). Extension to RF decreased performance (RF AUC 0.82 vs. LR AUC 0.84), which may indicate overfitting.
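To make the AUROC values compared above more concrete: the AUROC equals the probability that a randomly chosen event receives a higher predicted risk than a randomly chosen non-event, with ties counted as one half. The Python sketch below computes it directly from that definition. The function name `auroc` and the toy labels and scores are hypothetical and serve only as an illustration; this is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Pairwise (rank-based) AUROC; labels are 1 for events, 0 for non-events."""
    event_scores = [s for y, s in zip(labels, scores) if y == 1]
    non_event_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not event_scores or not non_event_scores:
        raise ValueError("need at least one event and one non-event")

    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                wins += 1.0   # event ranked above non-event
            elif e == n:
                wins += 0.5   # ties count as half
    return wins / (len(event_scores) * len(non_event_scores))


# Illustrative predictions only: 3 of the 4 event/non-event pairs are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)  # 0.75
```

The double loop is quadratic in the number of observations, which is acceptable for a demonstration; on high-frequency data a sorted-rank implementation from a standard library would be the more practical choice.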
Overall, when considering discrimination, number of features, and interpretability, we identified the RF model that combined the most recent ICP reading, episode number, and longitudinal trends over the preceding 30 min as the best performing model for predicting ICP crisis events within the next 30 min. An LR model utilizing the most recent PbtO2 reading, episode number, and longitudinal trends over the preceding 30 min was the best performing model for predicting PbtO2 crisis events (see Fig. 2 for feature variable importance plots). The DCA showed clinical usefulness over a wide range of risk thresholds for both ICP and PbtO2 predictions (Fig. 1, lower panel). Acceptable alerting thresholds could range from 20% to 80% depending on a patient-specific assessment of the benefit-risk ratio of a given intervention in response to the alert.

Discussion
The approach underlying current management of ICP, CPP, and PbtO2 is based on mostly fixed, generic treatment thresholds as triggers for an escalating list of interventions. An important caveat is that by the time treatment is initiated, even if a return to desirable values is achieved, irreversible SBI may have occurred. This could partly explain the lack of effect, or indeed the negative clinical outcome, of a reactive management mode toward fixed values of ICP in guiding treatment. [13][14][15] Combining statistical machine learning with clinical insight allows the construction of robust quantitative models to predict ICP/PbtO2 crises. Although previous work has been published on predicting ICP crises, these approaches used low-resolution data, utilized hundreds of independent variables, or required hours-long epochs of monitoring to deliver predictions. [8][9][10]16 Recently, Carra and colleagues undertook validation testing of Gaussian process (GP)-based predictive modeling using the high-resolution CENTER-TBI data set. 10 These algorithms demonstrated good intercenter robustness, with the model achieving an accuracy of 88%, sensitivity of 83%, and specificity of 91% in providing a 30-min forewarning of an ICP crisis (defined as an ICP >30 mm Hg lasting at least 10 consecutive min). However, the GP approach, though promising with retrospective data, is computationally intensive and requires 4 h of input data to make a prediction. In contrast, we show here that it is possible to achieve reasonable predictive performance using few and clinically intuitive features, such as the most recent ICP/PbtO2 reading, episode number, and longitudinal trends over the preceding 30 min, to form models that are perhaps less prone to overfitting and more likely to generalize to clinical settings.
The performance of these features is consistent with past work on ICP/PbtO2 predictive modeling from a single-center retrospective study of 817 sTBI patients based on prospectively collected physiological data. 9 It should be noted that metrics of accuracy such as AUROC do not address the clinical value of a model; in fact, models with very different AUROCs can be comparable in clinical utility, and models with higher AUROCs can sometimes be inferior. 17 For these reasons, we undertook DCA, as suggested by Steyerberg and colleagues. 12 The decision curves demonstrate that the presented models can be of clinical utility, given that within a wide threshold range they provide higher net benefit than both a strategy of always treating (alert-all policy) and current practice, in which no warning exists (no-alert policy). Setting an alerting threshold is a clinical decision, with acceptable thresholds ranging from 20% to 80% depending on a patient-specific assessment of the benefit-risk ratio of a given intervention in response to the alert; the riskier the intervention, the higher the alerting threshold should be.
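The comparison against the alert-all and no-alert policies can be sketched with the standard decision-curve definition of net benefit: true positives per observation minus false positives per observation weighted by the odds of the risk threshold. The Python below is a minimal illustration under that assumption. The function names, the convention of alerting when the predicted risk is at or above the threshold, and the toy data are hypothetical and are not taken from the study.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of alerting whenever predicted risk >= threshold (assumed rule)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight given to a false alert
    return true_pos / n - (false_pos / n) * odds


def net_benefit_alert_all(labels, threshold):
    """Alert-all policy: every observation is treated as predicted high risk."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Illustrative sample with an equal proportion of events and non-events.
labels = [1, 1, 0, 0]
risks = [0.9, 0.6, 0.7, 0.2]

model_nb = net_benefit(labels, risks, 0.5)   # 2/4 - (1/4) * 1 = 0.25
all_nb = net_benefit_alert_all(labels, 0.5)  # 2/4 - (2/4) * 1 = 0.0
none_nb = 0.0                                # no-alert policy: no alerts, no benefit
print(model_nb, all_nb, none_nb)
```

Raising the threshold increases the penalty on false alerts through the odds term, which is the same trade-off described above for riskier interventions.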
These models require prospective validation to inform individualized prediction assessments in real time. Besides a prospective assessment of accuracy, real-time validation can provide mechanistic insights. A limitation of the presented purely data-driven predictive modeling is that it does not address the mechanisms behind predicted crisis events. To design clinical management approaches, the mechanisms responsible for generating crises must be further characterized. This approach may be novel in targeting SBI after TBI; nevertheless, it has been shown in other clinical environments that delivering alerts for predicted cardiorespiratory instability to providers leads to a marked decrease in both the duration of instability and the number of serious instability episodes. 18,19 Higher doses of intracranial hypertension, cerebral hypoperfusion, and brain tissue hypoxia have been associated with worse outcomes after sTBI. 9,11,[20][21][22][23] The ability to predict such events could enhance efforts to reduce the burden of these insults and, by extension, potentially improve functional outcomes.

Conclusion
Combining statistical machine learning with clinical insight allows the construction of robust, clinically valuable quantitative models to predict ICP/PbtO2 crises. These models require prospective validation for their performance and in order to gain mechanistic insights. An accurate, automatic system of alarm delivery sets the stage for considering and testing a preemptive clinical algorithm for the prevention of crisis events. Such a clinical algorithm, if successful, could shift our treatment approaches from the current reactive mode to a preemptive one.

responsibility of the authors and do not necessarily represent the official views of the National Institute of Neurological Disorders and Stroke (NINDS) or NIH.

Author Disclosure Statement
Christos Lazaridis, David O. Okonkwo, and Ramon Diaz-Arrastia have been part of the original investigator team for BOOST-II.

Supplementary Table S1
Supplementary Table S2
Supplementary Table S3