Clinical Prediction Models and Predictors for Death or Adverse Neurodevelopmental Outcome in Term Newborns with Hypoxic-Ischemic Encephalopathy: A Systematic Review of the Literature

Background: Although many predictive parameters have been studied, an internationally accepted, validated model to predict the clinical outcome of asphyxiated infants suffering from hypoxic-ischemic encephalopathy (HIE) is currently lacking. The aim of this study was to identify, appraise, and summarize available clinical prediction models, and to provide an overview of all investigated predictors for the outcome death or neurodevelopmental impairment in this population. Methods: A systematic literature search was performed in Medline and Embase. Two reviewers independently included eligible studies and extracted data. Quality was assessed using PROBAST for prediction model studies and the QUIPS assessment tool for predictor studies. Results: A total of nine prediction models were included. These models were very heterogeneous in the number of predictors assessed, methods of model derivation, and primary outcomes. All studies had a high risk of bias according to the PROBAST assessment and low applicability due to complex model presentation. A total of 104 predictor studies were included, showing tremendous heterogeneity in investigated predictors, timing of predictors, primary outcomes, results, and methodological quality according to QUIPS. Selected high-quality studies with accurate discriminating performance provide clinicians and researchers with an evidence map of predictors for prognostication after HIE in newborns. Conclusion: Given the low methodological quality of the currently published clinical prediction models, implementation into clinical practice is not yet possible. Therefore, there is an urgent need to develop a prediction model that complies with the PROBAST guideline. An overview of potential predictors to include in such a model is presented.

Introduction
Hypoxic-ischemic encephalopathy (HIE) is the leading cause of death or long-term neurological disability in full-term newborns, with a reported incidence ranging from 1 to 8 per 1,000 live births in developed countries [1][2][3][4][5]. Accurate prediction of the long-term outcome of infants suffering from HIE is essential for counseling parents, for making evidence-based intensive care decisions, for triaging infants for follow-up programs, and for identifying those patients who should be targeted in future intervention studies. Despite the availability of multiple clinical, neurophysiological, and imaging tests and scores, prognostication after HIE remains a challenge [6,7]. Ideally, predictors or prediction models should accurately distinguish infants with an adverse outcome from those surviving with a normal outcome over a wide range of baseline risks.
Recently, three systematic reviews evaluating reported individual predictors of adverse outcome following HIE have been published. However, these reviews have several limitations that restrict the generalizability and completeness of their results. First, most reviews described individual predictors from the pre-cooling era [6]. Second, they summarized only neurophysiological and neuroimaging modalities, excluding potentially valuable clinical predictors used in daily care [7,8]. In addition to individual predictors, multivariate prediction models combining several of these predictors have also emerged in the literature, but to our knowledge, no systematic review evaluating the available prediction models for HIE has been performed [9].
The primary aim of this study was to identify, appraise, and summarize all available multivariate prediction models for the outcome death or neurodevelopmental impairment in newborns with HIE who underwent therapeutic hypothermia. The secondary aim was to identify and appraise the evidence on all reported individual predictors, to inform the future derivation of new prediction models or the updating of existing ones.

Search Strategy and Data Sources
This systematic review was performed according to the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist [10] and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [11]. The protocol was registered in PROSPERO (registration number: CRD42020173899). An electronic search for studies predicting death or adverse neurodevelopmental outcome in term newborns suffering from HIE after perinatal asphyxia, with and without hypothermia, was performed in the bibliographical databases MEDLINE All and EMBASE, both via the Ovid interface, in August 2021. The full search query is provided in online supplementary material 1 (for all online suppl. material, see www.karger.com/doi/10.1159/000530411). The primary search was not restricted by language or publication date. Furthermore, the citations of eligible studies were cross-referenced for additional studies.

Eligibility Criteria
Studies were included if they: (1) were observational retrospective or prospective cohort studies; (2) included term newborns (≥35 weeks gestational age, birth weight ≥1,800 g) with HIE due to perinatal asphyxia, treated with or without therapeutic hypothermia; (3) investigated multivariate prediction models, or one or more individual predictors/determinants for the outcome of interest, and included an analysis of the effect of the predictor(s) on outcome (i.e., sensitivity, specificity, negative or positive predictive value, receiver operating characteristic (ROC) curve, odds ratio, or relative risk/risk ratio), where only diagnostic tests currently used in daily care in this population were considered relevant (e.g., neurophysiological tests; imaging tools such as ultrasound or MRI; clinical scores; laboratory parameters); (4) clearly documented the follow-up period, with a follow-up of at least 18 months deemed sufficient for long-term neurodevelopmental outcome; and (5) were published in English, Spanish, French, German, or Dutch. Studies were excluded if they: (1) included patients with congenital malformations; (2) were published before 2009 (as studies performed before 2009 were deemed unlikely to reflect the current state of medical care in the NICU); (3) did not report hypothermia and normothermia as separate patient groups; or (4) were not published as a full-text article.

Data Collection
After duplicate articles were removed, titles and abstracts were screened using Rayyan by two independent reviewers (J.F.L. and T.R.H.), based on title, abstract, publication date, and language [12]. Articles were sorted into the following groups: "clearly eligible," "clearly not eligible," and "possibly eligible." Clearly eligible and possibly eligible articles were read in full by both reviewers to decide on inclusion or exclusion. Disagreements were resolved by discussing the article with a third reviewer (W.O.) until consensus on inclusion or exclusion was reached. If the full text was not available or only a conference abstract was found, the authors were contacted to obtain the associated full text, if available.

Data Extraction
Data extraction was performed independently by two reviewers (J.L. and K.B.) using a standardized tool modified from the CHARMS checklist [10]. Included studies were divided into two groups: group 1, studies describing the derivation and/or external validation of a prediction model; and group 2, studies describing one or multiple individual predictors. For each included study, the following information was extracted: patient characteristics, predictive factors, outcome, and the range of probabilities for the outcome. Study outcomes were divided into two categories: short-term and long-term outcomes. A short-term adverse outcome (at discharge or before 18 months) was defined as death or abnormal clinical examination, whereas a long-term adverse outcome (at ≥18 months) was defined as death, abnormal clinical examination, or abnormal neurodevelopmental testing, as defined per study.

Methodological Quality
Two reviewers (J.F.L. and K.B.) independently performed the critical appraisal of all included studies.
To assess the methodological quality of the prediction models, the Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used [13]. For prediction model studies, PROBAST assesses applicability and risk of bias in the following domains: source of data, participants, outcomes to be predicted, candidate predictors, sample size, missing data, model development, model performance, model evaluation, results, interpretation, and discussion.
The Quality in Prognosis Studies (QUIPS) tool was used to assess the risk of bias in studies describing individual predictive variables [14]. The QUIPS tool covers the following methodological domains: study participation, study attrition, prognostic factor measurement, outcome measurement, and statistical analysis and reporting. The QUIPS confounding domain was excluded because confounding is a causal concept, and correcting for confounding in predictive studies could bias the prediction through overfitting [15,16].

Data Analyses
The results of the prediction models were summarized by narrative synthesis and descriptive statistics, reporting the performance of each included model in terms of discrimination, calibration, and explained variance [17]. Each model was assessed for generalizability. The number of events per variable (EPV) was calculated for each developed model by dividing the number of events by the number of candidate predictors.
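As a minimal sketch of the EPV calculation described above, the Python function below divides the number of outcome events by the number of candidate predictors. The function name and the example counts (60 events, 8 candidate predictors) are hypothetical and not taken from any included study; the threshold of 10 is the commonly cited rule of thumb.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Return EPV: number of outcome events divided by number of candidate predictors."""
    if n_events < 0:
        raise ValueError("number of events cannot be negative")
    if n_candidate_predictors < 1:
        raise ValueError("at least one candidate predictor is required")
    return n_events / n_candidate_predictors


# Hypothetical example: 60 adverse outcomes and 8 candidate predictors.
epv = events_per_variable(60, 8)
print(f"EPV = {epv:.1f}")                 # EPV = 7.5
print(f"EPV of at least 10: {epv >= 10}")  # EPV of at least 10: False
```

Note that the denominator is the number of candidate predictors considered during development, not the number retained in the final model; using the final count would overstate the EPV.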
Results of individual predictor studies were summarized by narrative synthesis and descriptive statistics and were evaluated for predictive value. To provide an overview, a descriptive map of all investigated individual predictors was made. To highlight potential individual predictors for future incorporation into a multivariate prediction model, we selected studies of good methodological quality that reported individual predictors with fair discriminating performance. A study was considered to be of good methodological quality if it had ≥3 domains at low risk of bias and no domain at high risk of bias in the QUIPS assessment. A study was judged to have fair discriminating performance if it reported a statistically significant odds ratio or risk ratio, an AUC >0.7, or a specificity higher than the median specificity of all studies (87%). We performed a sensitivity analysis including only prospective studies to determine the amount of heterogeneity and risk of bias in this subset of records.
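The two selection criteria above can be expressed as simple rules. The sketch below is a hypothetical Python encoding, assuming each study's QUIPS ratings are stored as "low", "moderate", or "high" per domain and that significance, AUC, and specificity have already been extracted; the domain keys and the example study are illustrative, not data from this review.

```python
from typing import Dict, Optional

MEDIAN_SPECIFICITY = 0.87  # median specificity across all included studies (87%)


def good_quality(quips_ratings: Dict[str, str]) -> bool:
    """Good methodological quality: >=3 domains at low risk and none at high risk."""
    ratings = list(quips_ratings.values())
    return ratings.count("low") >= 3 and "high" not in ratings


def fair_discrimination(
    significant_or_rr: bool,
    auc: Optional[float] = None,
    specificity: Optional[float] = None,
) -> bool:
    """Fair discrimination: significant OR/RR, AUC > 0.7, or specificity above the median."""
    return (
        significant_or_rr
        or (auc is not None and auc > 0.7)
        or (specificity is not None and specificity > MEDIAN_SPECIFICITY)
    )


# Hypothetical study: the five QUIPS domains used (confounding excluded), illustrative ratings.
example_ratings = {
    "study_participation": "moderate",
    "study_attrition": "low",
    "prognostic_factor_measurement": "low",
    "outcome_measurement": "low",
    "statistical_analysis_and_reporting": "moderate",
}
selected = good_quality(example_ratings) and fair_discrimination(False, auc=0.78)
print(selected)  # True
```

Both thresholds are strict inequalities, mirroring the text: an AUC of exactly 0.7 or a specificity of exactly 87% does not qualify on its own.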

Results
The primary search identified 13,275 articles, of which 3,583 duplicates were excluded and 4,844 articles were manually removed because they were published before 2009. After title and abstract screening, a further 4,270 articles were excluded, leaving 578 articles for full-text reading. In total, nine studies describing prediction models [9, 18-25] and 104 studies describing individual predictors (online suppl. material 2) were included. Two studies that described both a prediction model and individual predictors were included in both groups [20,22]. The full selection process is shown in Figure 1.

Prediction Models
Description of Participants and Studies
The study characteristics of the nine included prediction models are described in Table 1. The majority of these prediction models (n = 5) were derived from single-center retrospective cohort studies. Most studies were conducted in the USA or Europe, and patients were included between 2003 and 2016. All prediction models included asphyxiated newborns ≥36 weeks gestational age, and all patients were treated with hypothermia except in one study, which developed a prediction model in normothermia-treated infants [9].

Model Development
The number of patients included in the prediction model studies ranged from 38 to 666, and the number of events for each outcome ranged from 9 to 242. The EPV, which could be calculated for 78% of the models (7/9), was less than 10 in one model [19]. The median number of predictors in the final models was 4 (range 2-6). The predictors selected in the final models are displayed in Table 1. The Apgar score, Sarnat score, spontaneous activity on physical examination, and gray matter abnormalities on MRI were used in multiple models. Most models specified the timing of some variables, but only one model specified the timing of all its variables [24]. Furthermore, the prediction models were heterogeneous regarding the outcomes of interest and the method and timing of outcome assessment.
Different methods of model development were used, such as multivariable logistic regression [9, 19-25], community health access and rural transformation model (CHART modelling) [23], support vector machine (SVM) modelling [18], and a machine learning model with network modelling [21]. Candidate predictors were selected based on p values, expert opinion, or a backward selection process. None of the prediction models used multiple imputation for missing data. Except for one study using single imputation [25], the studies either did not mention how missing data were handled [9,18,20,21] or excluded observations with missing data [19,22,23]. In addition, none of the studies described how continuous variables were handled.

Model Performance, Validation, and Presentation
Discriminative ability was reported as an AUC for two models (range 0.861-0.90) [22,24], and only one of these studies reported an ROC curve [24]. Two studies described calibration [9,19], and two studies reported the explained variance (range 0.42-0.819) [9,20]. Internal validation was performed for one model, using 80% of the cohort for development and 20% for internal validation [24]. None of the identified models were externally validated. Only one study reported the complete regression formula of its model, including the intercept and the final predictor weights (online suppl. material 3) [25].
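To illustrate why the intercept and the final predictor weights are needed, the sketch below applies a logistic regression model to a single infant. All coefficients and predictor names are hypothetical placeholders, not values from any reviewed model; the point is only that the predicted probability cannot be computed without the intercept.

```python
import math
from typing import Dict


def predicted_risk(intercept: float, weights: Dict[str, float], patient: Dict[str, float]) -> float:
    """Predicted probability from a logistic model: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(b * patient[name] for name, b in weights.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical model: intercept and weights are invented for illustration only.
intercept = -3.0
weights = {"abnormal_eeg_background": 1.5, "severe_encephalopathy": 1.2}
infant = {"abnormal_eeg_background": 1, "severe_encephalopathy": 0}

print(round(predicted_risk(intercept, weights, infant), 3))  # 0.182
```

Publishing only the predictor weights (or odds ratios) leaves the baseline risk undefined, so an external validation study cannot reproduce individual predicted risks without the full equation.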

Risk of Bias
Table 2 shows the quality assessment of all included prediction model studies using PROBAST [13]. The overall risk of bias of all developed models was rated as high. This high risk of bias arose in the analysis domain and was due to inadequate handling of missing data, selection of predictors based on univariable analysis, absence of internal and external validation, incomplete evaluation of model performance (discrimination and calibration), and failure to present the final model with the included predictors and their assigned weights.

Individual Predictors
Description of Participants and Studies
An overview of all investigated individual predictors and their outcomes is displayed in online supplementary material 4-9, and the characteristics of each included study are given in online supplementary material 10. The investigated predictors, outcomes, and statistics used were very heterogeneous. Most studies investigated the predictive value of imaging and neurophysiological testing, while others assessed maternal and antenatal data in the delivery room (N = 11 studies). Most of these individual predictor studies were single-center (75/104), and 49/104 had a retrospective design. The studies were conducted in a variety of countries, and patients were included between 1991 and 2019.

Risk of Bias
Table 3 shows the quality assessment of all included individual predictor studies using QUIPS. Risk of bias varied widely, with only one study at low risk of bias in all domains. The study participation, study attrition, and prognostic factor measurement domains had the highest risk of bias.

Selection of Individual Predictors on Methodological and Statistical Quality
Only two studies investigating six antenatal predictors were of sufficient methodological quality (online suppl. material 2), and neither showed significant discriminating capacity. Routinely collected clinical data during admission were investigated in 37 studies (online suppl. material 2). Of these, eight showed good methodological quality (online suppl. material 2), and only neurological examination and the Thompson score were significant predictive factors for long-term outcome. Seventeen studies (online suppl. material 2) described 20 laboratory predictors, such as pH, lactate, and CK-MB fraction. Only four of these studies had good methodological quality (online suppl. material 2), and none reported sufficient discriminating accuracy. Of the 43 studies (online suppl. material 2) investigating neurophysiological or near-infrared spectroscopy (NIRS) predictors, only 15 had sufficient quality according to the QUIPS assessment (online suppl. material 2). Of these 15 studies, four high-quality studies reported EEG background pattern, timing of normalization, seizure burden, absence of sleep-wake cycling, and abnormal NIRS measurements as significant neurophysiological predictors (online suppl. material 2). Finally, of the 54 studies (online suppl. material 2) investigating imaging predictors, 22 had good methodological quality (online suppl. material 2). Of these 22 studies, six high-quality studies found multiple significant discriminating predictors (e.g., abnormal basal ganglia/thalamus, abnormal gray or white matter on MRI, abnormal posterior limb of the internal capsule, abnormal MR spectroscopy, and abnormal NICHD or Rutherford MRI score) (online suppl. material 2). In these studies, MRI was performed within the first 2 weeks of life. All individual predictors with accurate discrimination that were investigated in studies of sufficient methodological quality are summarized in Table 4, suggesting that the most promising individual predictors for derivation of a prediction model might be a combination of clinical scores and multiple neurophysiological and neuroimaging measures.

Sensitivity Analysis
When only prospective studies were included, 52 records remained. As shown in online supplementary material 11, these studies were still heterogeneous in outcome timing, predictor type, statistical methods, and outcome assessment. The risk of bias assessed using QUIPS also still varied widely across studies (online suppl. material 12).

Discussion
This is the first systematic review identifying, appraising, and summarizing all published prediction models for death or long-term neurological outcome after HIE due to perinatal asphyxia. None of the nine prediction models had sufficient methodological quality, and none was yet feasible for clinical use.
The identified models selected candidate predictors for multivariate analysis if the univariate analysis showed statistical significance. Although there is no consensus on the optimal selection of candidate predictors, selecting variables on the basis of univariate significance is known to yield overfitted prediction models [26].
A major methodological flaw of the reviewed prediction models was that they either did not report how missing data were handled or excluded patients with missing data from the analysis. Excluding these patients leads to biased effect estimates and reduces the discriminative ability of multivariable models compared with models in which missing values are replaced using multiple imputation [27]. Furthermore, none of the reported models were externally validated. Prediction models are known to perform optimistically in the derivation dataset compared with their performance in an independent but similar population. Therefore, prediction models should always undergo external validation before implementation in clinical practice [28].
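Multiple imputation fits the model in each of several imputed datasets and then pools the results. As a minimal sketch of the pooling step only (Rubin's rules), assuming the per-dataset coefficient estimates and their squared standard errors are already available, the function below returns the pooled estimate and its total variance. The numbers are hypothetical, and the imputation model itself is not shown.

```python
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total variance = within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance of the estimates (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log-odds ratios and squared standard errors from m = 3 imputed datasets.
est, var = pool_rubin([0.50, 0.60, 0.55], [0.040, 0.050, 0.045])
print(round(est, 3), round(var, 4))  # 0.55 0.0483
```

The between-imputation term is what a single imputation ignores: treating one imputed dataset as if it were observed data understates the uncertainty of the coefficients.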
Finally, only one model reported a complete regression equation or a risk score [22]. It is recommended that the final prediction model be published in the form of the original regression equation, including the intercept or baseline hazard. Incomplete model presentation limits the ability of other researchers and users to reproduce the study's findings or to apply the model to obtain a predicted risk. For external validation purposes, the final model equation is mandatory.
A positive finding when appraising the methodological quality of the prediction models was the high EPV in almost all studies. The EPV is a key concept in the development of a prediction model for a binary outcome: a low EPV is known to lead to overfitted models, making their application in other populations unreliable. Generally, an EPV of at least 10 is recommended for fitting prediction models in clustered data using logistic regression [29]. Although the EPV could not be calculated for two models, an EPV of <10 was found in only one model [19]. Furthermore, it is remarkable that only 5/9 models reported their performance and only two of the nine models reported an AUC >0.85.
Despite this positive finding of an appropriate EPV and/or performance, none of the identified prediction models can progress to an external validation phase, given the other methodological shortcomings mentioned above and inadequate or complex model presentation. Future studies should follow state-of-the-art guidelines for prediction model development to improve the quality of derivation and reporting of new prediction models [13].
To assist future studies in developing new prediction models, we also appraised all studies investigating single (individual) predictive variables for the composite outcome death or long-term neurodevelopmental impairment after HIE following perinatal asphyxia. From these studies, predictors with potentially high discriminating performance that could be incorporated into a future prediction model were selected (Table 4, online suppl. material 13). Studies with significant discriminating performance were summarized and assigned to three groups for potential use in a multivariate prediction model: clinical neurological assessment at birth (n = 2 studies), neurophysiological and NIRS assessments (n = 6 studies), and neuroimaging assessments (n = 8 studies). The variables in these three groups should be explored when developing a prediction model for short- and long-term outcomes after HIE. The intended timing of prediction determines which parameters should be incorporated. For example, MRI DWI/ADC assessments will only add predictive value when performed between the 3rd and 7th day of life. When developing a tool for HIE prediction, several aspects of the TRIPOD and PROBAST guidelines should be carefully taken into account. First, the EPV should be at least 10 [13,30]. Second, missing data should be handled properly, e.g., by multiple imputation instead of complete-case analysis. Third, not only the discriminating performance of the developed model (the AUROC) but also its calibration performance should be reported for proper assessment. Finally, before a model can be implemented into clinical practice, overfitting must be addressed through internal and external validation [17].
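The recommendation to report both discrimination and calibration can be made concrete with a small sketch. Assuming a list of observed binary outcomes and model-predicted risks (the values below are invented), the AUROC is computed as the concordance probability (ties count one half), and calibration-in-the-large as the observed event rate minus the mean predicted risk. A full calibration assessment would also include a calibration slope or plot, which is not shown here.

```python
from itertools import product
from typing import Sequence


def auroc(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """AUROC as the probability that a random infant with the outcome has a higher
    predicted risk than a random infant without it (ties count as 0.5)."""
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    concordant = sum(
        1.0 if c > n else 0.5 if c == n else 0.0 for c, n in product(cases, controls)
    )
    return concordant / (len(cases) * len(controls))


def calibration_in_the_large(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Observed event rate minus mean predicted risk; 0 means correct on average."""
    return sum(outcomes) / len(outcomes) - sum(risks) / len(risks)


# Invented example: 1 = death or neurodevelopmental impairment, 0 = normal outcome.
y = [1, 1, 0, 0, 0, 1]
p = [0.8, 0.6, 0.3, 0.5, 0.2, 0.4]
print(round(auroc(y, p), 3))                     # 0.889
print(round(calibration_in_the_large(y, p), 3))  # 0.033
```

A model can discriminate well (high AUROC) while systematically over- or underestimating risk, which is why the two measures are reported separately.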
Recently, three reviews were published reporting pooled results of imaging and neurophysiological predictors [6][7][8]. Although studies of low methodological quality were not excluded from these reviews, their results and conclusions are in line with the current overview of individual predictors. Van Laerhoven et al. [6] mainly evaluated studies in normothermia-treated patients and found that EEG and aEEG performed best as predictors in the first week of life, followed directly by MRI. Ouwehand et al. [7] demonstrated that MRI had high predictive accuracy when performed between the fourth and eighth day after the hypoxic event causing HIE. Ouwehand et al. [7] and Liu et al. [8] concluded that the aEEG background pattern at different time points, posterior limb of the internal capsule abnormalities, apparent diffusion coefficient (ADC) values of the thalamus on MRI, and proton magnetic resonance spectroscopy (¹H-MRS) results were reliable predictors of neurodevelopmental outcome. Given the heterogeneity and methodological quality of the studies, a meta-analysis of the neurophysiological and imaging predictors was not considered informative; furthermore, the paucity of data on clinical neurological and laboratory predictors made meta-analysis impossible. The sensitivity analysis showed that heterogeneity across studies persisted when only prospective studies were selected.

Strengths and Weaknesses
To date, this is the first comprehensive overview of all available clinical prediction models and all available clinically used predictors. Selection and assessment of the literature followed guidelines for systematic reviews [10,31]. A careful double-blinded selection of studies and extraction of data was performed by two independent authors. The PROBAST and QUIPS tools were used to perform a robust assessment of the risk of bias for each individual model and predictor study, respectively.
However, there are limitations that need to be addressed. The main obstacle to summarizing results or drawing conclusions was the methodological quality of the studies themselves. We observed a high risk of bias, and therefore generally low methodological quality, in all prediction model studies. Furthermore, both the prediction model studies and the studies on individual predictors showed a high level of heterogeneity regarding timing, outcome definition, and statistical methods and measures. To minimize bias (including publication bias), we chose to widen the search and to contact the authors of all conference abstracts found.
Finally, 23 studies were excluded because they included infants treated with hypothermia as well as infants treated with normothermia, and outcomes were not reported separately for each subgroup. Since the outcomes of newborns treated with normothermia and hypothermia differ, combining these infants in one study group is not recommended.

Conclusion
In conclusion, given the low methodological quality of the currently published clinical prediction models, implementation into clinical practice is not yet recommended. Given these limitations in methodological quality and general applicability, a new model should be developed. The PROBAST guidelines should be followed for developing future high-quality prediction models. Several clinical, neurophysiological, and neuroimaging predictors showing significant performance are available for developing such a model.

Table 1. Characteristics of prediction model studies

Table 2. Quality assessment of prediction models

Table 4. Characteristics of predictor studies with good quality and fair discriminating performance