Introduction

Hip fractures are a global health problem [1], most commonly affecting adults in their 80s [2]. As life expectancy rises worldwide, the incidence of hip fractures is expected to increase in the coming years [3, 4]. According to epidemiological projections, 6.26 million individuals will sustain a hip fracture annually by 2050 [5]. Hip fractures are associated with an increased risk of mortality amongst older adults, with a cumulative 30-day mortality rate between 5 and 10% [6]. Over the first postoperative year, cumulative mortality can reach approximately 30% [4].

In order to identify the patients at greatest risk, preoperative predictors of mortality following hip fracture surgery have been studied extensively [7,8,9]. Predictors of early mortality are particularly important, as they lie at the core of preoperative decision-making in clinical guidelines [10]. Preoperative prognostic information could be used to better inform patients and their families about the consequences of the different treatment alternatives, leading to better shared decision-making. This is particularly relevant for frail patients with a limited life expectancy, who face a higher risk of mortality but not a poorer quality of life when they opt for conservative (non-surgical) management following shared decision-making [11]. Shared decision-making can therefore be used to select a treatment that is optimal in terms of both clinical outcomes and patients’ personal values [12, 13]. During this process, it is essential that decisions are supported by the best available evidence [14]. Meta-analyses can substantiate shared decision-making, as they are one of the strongest sources of evidence in evidence-based medicine [15].

However, existing meta-analyses in this field have several methodological limitations [7,8,9]. Firstly, the effects of predictors have frequently been pooled across widely ranging follow-up times, which obscures their clinical implications for early mortality. Secondly, the statistical uncertainty in cumulative evidence caused by the small number of available studies per predictor has so far received little attention. Finally, to the best of our knowledge, none of the existing meta-analyses in this field has used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria to assess the confidence in the cumulative evidence per predictor [16, 17]. There is therefore ample room to improve the consolidation of the current evidence base.

To support and improve evidence-based medicine for hip fracture patients, it is important to adequately reflect the uncertainty in cumulative evidence. This will allow clinicians to assess the risk of early mortality with more confidence and to inform their patients accordingly. The aim of this study is to conduct a meta-analysis, accompanied by GRADE assessments and sensitivity analyses, to identify valid predictors of early mortality following hip fracture surgery.

Method

This review was reported according to the PRISMA 2020 statement [18].

Search strategy

Scopus and PubMed were searched from inception to 3 November 2021. The search strategy comprised a combination of four key terms relating to older adults, hip fractures, mortality, and predictors. The complete search strategy is shown in Online Resource 1. Additionally, the Dutch Hip Fracture Audit (DHFA) was contacted for internal research reports.

Selection criteria

In this study, the inclusion criteria were as follows: (1) the article describes a cohort study examining preoperative predictors of mortality following hip fracture surgery, (2) the study reports on primary evidence, (3) the article is written in English, and (4) the full-text document can be retrieved. The exclusion criteria were as follows: (1) the article describes an unrepresentative population (i.e. a mean/median age below 70 years, only a single gender included, or only periprosthetic or pathological fractures included), (2) the article does not report on preoperative predictors, (3) the article does not report on independent risk factors, (4) the article does not report the statistics of interest (i.e. no odds or hazard ratios and no 95% confidence intervals), (5) the article does not report on mortality as an outcome or reports on mortality only as part of a composite score of multiple adverse events, and (6) the article does not report on mortality within 1 year.

Data collection and extraction

The title, abstract, and full-text screenings were performed by MB. The abstract and full-text screenings were independently verified by WSN for a sample of 70% of the articles. Disagreements were resolved through discussion, without the need for adjudication by a third reviewer. Study characteristics were extracted into standardised tables containing author, year, country, study design, sample size, gender distribution, mean/median age, fracture types, treatment types, and mortality rates.

Outcomes

Adjusted odds ratios (ORs) and adjusted hazard ratios (HRs) of preoperative predictors of 30-day mortality following hip fracture surgery were the primary outcomes. Independent predictors of mortality within 1 year were secondary outcomes.

Risk of bias

Risk of bias was assessed with the Quality in Prognosis Studies tool [19]. A quarter of the articles were assessed independently by two reviewers (MB, WSN), who jointly refined the protocol to resolve ambiguities in the assessment criteria. The remaining articles were assessed by MB using the refined criteria (Online Resource 2).

Data analysis

All predictors that were reported at least twice were synthesised in narrative summary tables [20], regardless of whether they were reported as ORs or HRs. A minimum of three studies was required for quantitative synthesis, and eligibility for pooling was based on consistency in variable definitions. ORs and HRs were meta-analysed separately for each predictor, using DerSimonian-Laird random-effects models [21] to accommodate population and intervention heterogeneity [22,23,24]. Heterogeneity was quantified with the I² statistic, and results were summarised in forest plots.
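To make the pooling step concrete, the following is a minimal Python sketch of the DerSimonian-Laird random-effects estimator and the I² statistic. It is not the authors' code (the analyses were run with metafor in R); it assumes that each study contributes an adjusted OR or HR with a 95% CI and that the standard error on the log scale can be back-calculated from that CI. The function name `dersimonian_laird` is hypothetical.

```python
import math

Z975 = 1.959964  # 97.5% quantile of the standard normal distribution


def dersimonian_laird(ratios, ci_low, ci_high):
    """Pool adjusted ORs (or, separately, HRs) with a DerSimonian-Laird random-effects model.

    Assumption: each study's standard error is back-calculated from its 95% CI
    on the log scale as (ln(upper) - ln(lower)) / (2 * 1.96).
    """
    k = len(ratios)
    if k < 3:
        raise ValueError("this review required at least three studies per pooled estimate")
    y = [math.log(r) for r in ratios]
    se = [(math.log(hi) - math.log(lo)) / (2 * Z975) for lo, hi in zip(ci_low, ci_high)]

    w = [1 / s**2 for s in se]                                # fixed-effect weights
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))    # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                        # DL between-study variance
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0  # I² in percent

    w_re = [1 / (s**2 + tau2) for s in se]                    # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return {
        "pooled": math.exp(mu),
        "ci": (math.exp(mu - Z975 * se_mu), math.exp(mu + Z975 * se_mu)),
        "tau2": tau2,
        "I2": i2,
    }
```

Pooling happens on the log scale and the result is exponentiated back to a ratio. With hypothetical inputs that disagree far more than their CIs allow, such as ORs of 1.2, 3.0 and 2.0 with narrow intervals, the sketch returns an I² above 90% and a positive τ².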

Sensitivity analyses were conducted with respect to publication bias and to the statistical uncertainty caused by the small number of available studies per predictor. Publication bias was inspected with the trim-and-fill method [25] using the \(R_0^{+}\) and \(L_0^{+}\) algorithms as recommended by Duval and Tweedie [26]. Statistical uncertainty was inspected with the modified Knapp-Hartung method with ad hoc variance correction [27] and with a Bayesian hierarchical model [28] (details of the Bayesian model specification can be found in Online Resource 3). The Bayesian model was of particular interest for the sensitivity analysis because of its demonstrated ability to deal effectively with small sample sizes [22, 29,30,31]. All analyses were performed with R version 4.1.2 (R Foundation, 2020, Vienna, Austria), using the metafor [32], brms [33], and robvis [34] packages.
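The small-sample sensitivity analysis can be illustrated in the same spirit. The sketch below is plain Python rather than the metafor implementation: it rescales the random-effects variance by the Knapp-Hartung factor, applies the ad hoc correction that prevents this factor from dropping below 1, and uses a t quantile with k - 1 degrees of freedom. The t critical values are hard-coded for 3 to 15 studies (the range pooled in this review) because the standard library has no t distribution, and the function names are hypothetical. The Bayesian hierarchical model is not sketched here.

```python
import math

# Two-sided 95% Student-t critical values t(0.975, df), hard-coded because the
# standard library has no t distribution.
T975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365,
        8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145}


def dl_tau2(y, se):
    """DerSimonian-Laird estimate of the between-study variance."""
    w = [1 / s**2 for s in se]
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)


def knapp_hartung_ci(y, se, adhoc=True):
    """Random-effects mean and 95% CI on the log scale, Knapp-Hartung style.

    y: log ORs or log HRs; se: their standard errors.
    """
    k = len(y)
    if not 3 <= k <= 15:
        raise ValueError("this sketch supports 3 to 15 studies")
    tau2 = dl_tau2(y, se)
    w = [1 / (s**2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung scaling factor: weighted residual variance around mu
    q_kh = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    if adhoc:
        q_kh = max(1.0, q_kh)  # ad hoc correction: never below the unscaled variance
    half = T975[k - 1] * math.sqrt(q_kh / sum(w))
    return mu, mu - half, mu + half
```

Exponentiating the three returned values gives the pooled ratio and its interval on the OR or HR scale. With three identical studies, the uncorrected Knapp-Hartung interval collapses to zero width, which is the failure mode the ad hoc correction guards against.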

Certainty of evidence assessment

Each pooled estimate was appraised using the GRADE criteria [16, 17] (Online Resource 4). When the quality of evidence differed between multiple pooled estimates of the same predictor, the quality of the pooled estimate based on the most studies and patients was adopted for the final appraisal.
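The rule for predictors with several pooled estimates can be expressed as a short selection function. This is a sketch under one assumption: the text does not say what happens when one estimate has more studies but fewer patients, so the code ranks by number of studies first and uses the number of patients only as a tie-breaker. `PooledEstimate` and `final_appraisal` are hypothetical names.

```python
from typing import NamedTuple


class PooledEstimate(NamedTuple):
    """One pooled estimate for a predictor (hypothetical container)."""
    label: str
    n_studies: int
    n_patients: int
    quality: str  # GRADE level: "high", "moderate", "low" or "very low"


def final_appraisal(estimates):
    """Return the GRADE quality of the estimate backed by the most studies.

    Assumption: the number of patients only breaks ties in the number of studies.
    """
    best = max(estimates, key=lambda e: (e.n_studies, e.n_patients))
    return best.quality


dementia = [
    PooledEstimate("HR", 3, 29_929, "high"),
    PooledEstimate("OR", 7, 389_185, "moderate"),
]
print(final_appraisal(dementia))  # moderate
```

Applied to the two dementia estimates reported in the Results, the sketch returns "moderate", which matches the final appraisal of dementia in the Discussion.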

Results

Search and included studies

From the initial database yield of 1869 articles, 139 were reviewed in full text after assessing the eligibility based on titles and abstracts. Subsequently, an internal research report published by the DHFA was included and analysed. Reapplication of the exclusion criteria to the full texts yielded 100 articles for narrative synthesis and 33 articles for meta-analysis. The selection process is shown in Fig. 1.

Fig. 1 PRISMA flow diagram describing the identification, screening, and selection of articles

A summary of the characteristics of the included studies is presented in Online Resource 5. Overall, early mortality was studied relatively infrequently: predictors of inpatient mortality were reported in 14 studies [35,36,37,38,39,40,41,42,43,44,45,46,47,48], predictors of 30-day mortality in 35 studies [38, 40, 49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81], and predictors of 1-year mortality in 60 studies. Amongst the 33 studies included in the meta-analysis, involving 462,699 patients, one study did not report the 30-day mortality rate [52]. Across the remaining studies, the median 30-day mortality rate was 8.0% (interquartile range 6.5–9.6%). Notably, these studies generally lacked detailed descriptions of hip fracture aetiology. The mechanism of injury was explicitly mentioned in only eight studies, with six reporting the exclusion of patients with high-energy trauma [54, 65,66,67, 69, 71] and two reporting their inclusion [75, 81]. Similarly, explicit statements on the presence of pathological fractures could be ascertained for only 14 studies, with 12 reporting their complete exclusion [38, 40, 50, 54, 58, 66, 68,69,70,71, 73, 78] and two reporting their inclusion [75, 81].

Risk of bias

Figure 2 depicts the unweighted risk of bias summary of the 33 studies included in the meta-analysis. Twelve articles were judged to be at overall low risk of bias, 15 were judged to raise some concerns, and six were judged to be at high risk of bias. A high risk of bias arose from study participation in three studies and from confounding in two studies. In one of the 14 pooled estimates, studies at high risk of bias carried a cumulative weight of 71.6%. For all remaining pooled estimates, this weight was below 30%, with a median of 2.5% (interquartile range 0–14.0%). The risk of bias assessments of all 100 studies included in the narrative review are shown in Online Resource 6.
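The cumulative weight of high-risk studies is the share of a pooled estimate's total weight contributed by studies judged to be at high risk of bias. The text does not state which weights were used; the sketch below assumes random-effects weights 1 / (SE² + τ²), and `high_risk_weight_share` is a hypothetical name.

```python
def high_risk_weight_share(se, tau2, high_risk):
    """Percentage of the total random-effects weight carried by high-risk studies.

    se: standard errors of the study estimates (log scale)
    tau2: between-study variance of the pooled model
    high_risk: booleans, True where the study was judged at high risk of bias
    """
    weights = [1 / (s**2 + tau2) for s in se]  # assumed random-effects weights
    high = sum(w for w, flag in zip(weights, high_risk) if flag)
    return 100 * high / sum(weights)
```

Because precise studies receive larger weights, a single large high-risk study can dominate a pooled estimate even when most contributing studies are at low risk; a larger τ² pulls the weights towards equality and dampens this effect.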

Fig. 2 Unweighted risk of bias summary of the studies included in the meta-analysis of predictors for 30-day mortality

Predictors for 30-day mortality

An overview of all meta-analysed predictors for 30-day mortality is shown in Table 1, and forest plots of all high-quality evidence predictors are shown in Fig. 3. The remaining forest plots are shown in Online Resource 7. None of the pooled evidence was downgraded for publication bias based on the outcomes of the sensitivity analysis using the trim-and-fill method.

Table 1 Summary of findings for the predictors of 30-day mortality following hip fracture surgery. The degree to which the studies included in the pooling procedures supported the association between the predictor and the increased risk of 30-day mortality is denoted by the direction of the association per study, where + denotes a significant result in favour of the association, 0 denotes a non-significant result in favour of the association, and − denotes a significant result refuting the association.

Fig. 3 Forest plots of high-quality evidence predictors for 30-day mortality following hip fracture surgery. The right panel depicts the risk of bias assessments according to the bias domains of the Quality in Prognosis Studies tool, i.e. study participation (D1), study attrition (D2), prognostic factor measurement (D3), outcome measurement (D4), study confounding (D5), and statistical analysis and reporting (D6). The risk of bias levels of low, moderate, and high are colour-coded in green, yellow, and red, respectively.

Age

Age was reported as both a categorical and a continuous variable. Because the cut-off levels of age strata were inconsistent [49,50,51, 56, 60, 61, 65, 77], pooling was limited to studies reporting the effect of age per year increase. Analysis of 10 studies [40, 53, 57, 63, 64, 66, 68,69,70, 76] including 154,353 patients provided high-quality evidence that each additional year of age increased the risk of 30-day mortality, with an OR of 1.06 and 95% CI: 1.04–1.07. Figure 3 indicates that the 95% CIs of all studies contained the pooled estimate, except those reported by Cao et al. (1.07–1.08) and Würdemann et al. (1.01–1.05). Since these CIs missed the pooled estimate by only a small margin, the I² value was deemed misleading. It was therefore decided not to downgrade the quality of evidence for inconsistency, despite I² = 69%.
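The overlap check used to judge inconsistency for age can be written down explicitly. The sketch lists the studies whose 95% CI excludes the pooled point estimate and by how much; it deliberately encodes no threshold for what counts as a small margin, since that was a judgement in this review. The intervals for Cao et al. and Würdemann et al. are taken from the text, the third study is hypothetical, and `cis_excluding_pooled` is a hypothetical name.

```python
def cis_excluding_pooled(pooled, study_cis):
    """Return {study: margin} for every study whose 95% CI excludes the pooled estimate.

    The margin is the distance between the pooled point estimate and the nearest
    CI bound, on the ratio scale used in the text.
    """
    margins = {}
    for study, (lower, upper) in study_cis.items():
        if pooled < lower:
            margins[study] = lower - pooled
        elif pooled > upper:
            margins[study] = pooled - upper
    return margins


age_cis = {
    "Cao et al.": (1.07, 1.08),        # from the text
    "Würdemann et al.": (1.01, 1.05),  # from the text
    "hypothetical study": (1.02, 1.10),
}
print(cis_excluding_pooled(1.06, age_cis))
```

Both flagged studies miss the pooled OR of 1.06 by about 0.01, which is the kind of small margin the text refers to.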

American Society of Anesthesiologists score

American Society of Anesthesiologists (ASA) scores were reported as both categorical and continuous variables across the studies. Amongst the reports of categorically treated ASA scores, two studies were excluded from pooling as there were insufficient data for the respective cut-off levels [53, 70]. Analysis of six studies [40, 56, 64, 73, 76, 79] including 12,994 patients provided high-quality evidence that individuals in ASA strata III–V were at a greater risk of 30-day mortality than individuals in ASA strata I–II, with an OR of 2.69, 95% CI: 2.12–3.42, and I² = 0%.

Furthermore, analysis of three studies [57, 63, 78] including 5394 patients provided moderate-quality evidence that each unit increase in ASA score increased the risk of 30-day mortality, with an OR of 2.62, 95% CI: 2.21–3.12, and I² = 0%. The quality of evidence was downgraded by one level for risk of bias, as the cumulative weight of studies at high risk of bias was 71.6%.

Chronic renal failure

Renal failure was defined as end-stage renal failure (ESRF) [61], unspecified chronic renal failure (CRF) [70], moderate to severe CRF [75], or a joint stratum of acute renal failure (ARF) and early to end-stage CRF [80]. To keep the analysis homogeneous, estimates that included ARF were excluded from pooling.

Analysis of three studies [61, 70, 75] including 248,872 patients provided moderate-quality evidence that CRF increased the risk of 30-day mortality, with an OR of 1.61, 95% CI: 1.11–2.34, and I² = 50%. The quality of evidence was downgraded by one level for imprecision, as both the Knapp-Hartung CI (0.52–5.23) and the Bayesian credible interval (CrI) (0.73–3.09) contained the null effect.
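The downgrading decisions in this section follow a simple pattern, sketched below in Python. The sketch reads the imprecision rule as requiring that both the Knapp-Hartung CI and the Bayesian CrI contain OR = 1 (the text describes this combination but does not state whether either interval alone would suffice), and it deducts one level per flagged domain from a starting level of high. Other GRADE domains and any upgrading are deliberately omitted, and all names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def contains_null(interval, null=1.0):
    """True if a ratio-scale interval includes the null effect (OR or HR = 1)."""
    lower, upper = interval
    return lower <= null <= upper


def imprecise(kh_ci, bayes_cri):
    """Assumed rule: downgrade when both small-sample intervals include the null."""
    return contains_null(kh_ci) and contains_null(bayes_cri)


def grade(start="high", risk_of_bias=False, inconsistency=False, imprecision=False):
    """Deduct one GRADE level per flagged domain, never going below 'very low'."""
    level = LEVELS.index(start) - sum([risk_of_bias, inconsistency, imprecision])
    return LEVELS[max(level, 0)]


# Chronic renal failure: Knapp-Hartung CI 0.52-5.23, Bayesian CrI 0.73-3.09
print(grade(imprecision=imprecise((0.52, 5.23), (0.73, 3.09))))  # moderate
# Non-metastatic cancer: additionally downgraded for inconsistency
print(grade(inconsistency=True, imprecision=imprecise((0.99, 1.73), (0.95, 1.86))))  # low
```

The two calls reproduce the appraisals reported for CRF and for non-metastatic cancer; the inconsistency flag is passed in by hand because the review judged inconsistency case by case rather than with a fixed I² cut-off.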

Dementia

Three studies did not specify how dementia was diagnosed [40, 70, 75], three studies reported on dementia in Alzheimer’s disease [52, 77, 80], and one study reported on memory loss and (pre)senile and vascular dementias [65]. Two studies diagnosed dementia using an Abbreviated Mental Test Score ≤ 6 [49, 67], and one study used Hodkinson’s Abbreviated Mental Test Score ≤ 6 [82]. Pooled estimates were not stratified by dementia diagnosis.

Analysis of three studies [52, 65, 77] including 29,929 patients provided high-quality evidence that dementia increased the risk of 30-day mortality, with a HR of 1.47, 95% CI: 1.31–1.64, and I² = 0%.

Furthermore, analysis of seven studies [40, 49, 67, 70, 75, 80, 82] including 389,185 patients provided moderate-quality evidence that dementia increased the risk of 30-day mortality, with an OR of 1.57 and 95% CI: 1.30–1.90. The quality of evidence was downgraded for inconsistency due to substantial heterogeneity (I² = 94%).

Diabetes

Analysis of four studies [68, 70, 75, 80] including 378,573 patients provided moderate-quality evidence that diabetes increased the risk of 30-day mortality, with an OR of 1.09, 95% CI: 1.01–1.18, and I² = 28%. The quality of evidence was downgraded for imprecision, as both the Knapp-Hartung CI (0.96–1.25) and the Bayesian CrI (0.84–1.43) contained the null effect.

Gender

Analysis of 15 studies [38, 40, 49, 51, 53, 56, 57, 63, 68,69,70, 75, 76, 78, 80] including 411,554 patients provided high-quality evidence that males were at a greater risk of 30-day mortality than females, with an OR of 1.99, 95% CI: 1.87–2.13, and I² = 58%.

Similarly, analysis of six studies [50, 55, 60, 65, 71, 77] including 23,988 patients provided high-quality evidence that males were at a greater risk of 30-day mortality than females, with a HR of 2.13, 95% CI: 1.94–2.34, and I² = 0%.

Haemoglobin

The influence of haemoglobin (Hb) was tested as anaemia (Hb ≤ 10 g/dL) [49, 55, 67] and per millimole per litre decrease [40, 68, 78]. The three anaemia studies reported a mix of ORs and HRs, leaving insufficient consistent data for pooling.

Analysis of three studies [40, 68, 78] including 5838 patients provided moderate-quality evidence that a millimole per litre decrease in Hb increased the risk of 30-day mortality, with an OR of 1.37, 95% CI: 1.17–1.61, and I² = 40%. The quality of evidence was downgraded for imprecision, as both the Knapp-Hartung CI (0.96–1.96) and the Bayesian CrI (0.95–1.94) contained the null effect.

Heart failure

Four studies did not specify how heart failure was diagnosed [60, 61, 70, 72], two studies diagnosed heart failure using ICD-10 code I50 [75, 80], and one study included multiple hypertensive heart diseases in addition to ICD-10 code I50 [52]. Pooling was limited to studies reporting ORs, since only two studies reported HRs [52, 60].

Analysis of five studies [61, 70, 72, 75, 80] including 384,312 patients provided moderate-quality evidence that heart failure increased the risk of 30-day mortality, with an OR of 2.20 and 95% CI: 1.28–3.78. The quality of evidence was downgraded for inconsistency due to substantial heterogeneity (I² = 99%).

Institutional residence

Analysis of six studies [40, 49, 67, 68, 78, 81] including 12,338 patients provided high-quality evidence that individuals living in an institution were at a greater risk of 30-day mortality than individuals living in their own home, with an OR of 1.81, 95% CI: 1.31–2.49, and I² = 56%.

Malignancy

Four definitions of malignancy were found: a history of any malignancy [61, 68, 80], a history of any malignancy excluding non-invasive skin cancer [49], non-metastatic cancer [67, 70, 75], and metastatic cancer [70, 72, 75]. For the cases of metastatic cancer, no information could be found on whether bone metastases were included. Separate pooled estimates were computed for a history of any malignancy (combining both definitions, with and without non-invasive skin cancer), non-metastatic cancer, and metastatic cancer.

Analysis of four studies [49, 61, 68, 80] including 136,160 patients provided moderate-quality evidence that a history of any malignancy increased the risk of 30-day mortality, with an OR of 2.39 and 95% CI: 1.69–3.38. The quality of evidence was downgraded by one level for inconsistency due to substantial heterogeneity (I² = 61%).

Furthermore, analysis of three studies [67, 68, 70] including 136,906 patients provided low-quality evidence that non-metastatic cancer increased the risk of 30-day mortality, with an OR of 1.17 and 95% CI: 1.08–1.27. The quality of evidence was downgraded by one level for imprecision, as both the Knapp-Hartung CI (0.99–1.73) and the Bayesian CrI (0.95–1.86) contained the null effect, and by another level for inconsistency due to substantial heterogeneity (I² = 80%).

Finally, analysis of three studies [70, 72, 80] including 270,355 patients provided high-quality evidence that metastatic cancer increased the risk of 30-day mortality, with an OR of 2.83, 95% CI: 2.58–3.10, and I² = 0%.

Narrative review findings

The narrative review findings on predictors of postoperative mortality within 1 year, including 30-day mortality, are summarised in Table 2. Overall, the results were congruent with the meta-analysis. For institutional residence, however, the proportion of significant associations with mortality differed between short-term and long-term follow-up. Table 1 shows that two-thirds of the studies contributing to the pooled estimate for institutional residence reported non-significant results. When 4-month and 1-year follow-ups were included, two-thirds of the tested associations between institutional residence and mortality were significant.

Table 2 Summary of narrative review findings of adjusted odds and hazard ratios for the association between predictors and postoperative mortality within 1 year

Discussion

This paper reports the results of the first GRADE-compliant meta-analysis focusing on predictors of 30-day mortality following hip fracture surgery. In total, five high-quality evidence predictors were identified: age, male gender, ASA classification, institutional residence, and metastatic cancer. Additionally, six moderate-quality evidence predictors were identified: CRF, dementia, diabetes, Hb, heart failure, and a history of any malignancy. Finally, low-quality evidence was found for the influence of non-metastatic cancer.

To use these findings optimally in clinical practice, a few considerations must be made. Firstly, although a history of any malignancy is predictive of 30-day mortality, its prognostic value varies substantially across studies (I² = 61%). Mortality risk predictions could be improved by distinguishing between non-metastatic and metastatic cancer. Our results showed that the respective 95% CIs of 1.11–1.56 and 2.58–3.10 did not overlap and were relatively narrow, indicating that the mortality risk differed significantly between these two malignancy types. Although the need for this distinction might seem self-evident, several 30-day mortality risk scores do not yet make it [68, 73, 135]. In line with the Charlson Comorbidity Index (CCI) [136], risk predictions should distinguish between non-metastatic and metastatic cancer to provide more accurate and personalised prognoses.
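Non-overlapping CIs do imply a difference, but a more direct check compares two pooled ORs through their ratio, using a z-test on the log scale. The sketch below is a hedged illustration rather than an analysis performed in this review: it back-calculates standard errors from the 95% CIs and assumes the two pooled estimates are independent, which holds only approximately here because at least one study contributed to both the non-metastatic and the metastatic estimate. `log_se` and `compare_ratios` are hypothetical names.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_se(ci):
    """Standard error of a log ratio, back-calculated from its 95% CI."""
    lower, upper = ci
    return (math.log(upper) - math.log(lower)) / (2 * Z975)


def compare_ratios(r1, ci1, r2, ci2):
    """Ratio of two pooled ORs (r2 / r1) with a two-sided z-test on the log scale.

    Assumes the two pooled estimates are independent.
    """
    diff = math.log(r2) - math.log(r1)
    se = math.hypot(log_se(ci1), log_se(ci2))  # SE of the difference of logs
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p
```

Called as `compare_ratios(or_nonmetastatic, ci_nonmetastatic, or_metastatic, ci_metastatic)`, it returns the ratio of ORs, the z statistic, and a two-sided p-value; a ratio well above 1 with a small p-value supports the distinction argued for above.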

Secondly, CRF can manifest in different degrees of severity. Amongst the pooled studies, only one exclusively reported on the effect of ESRF [61]. Owing to the low prevalence of ESRF (29/746 patients), the respective 95% CI was wide (1.05–10.01). Consequently, the meta-analysis did not reveal a need to stratify the risk estimate by severity of CRF, as the individual 95% CIs overlapped by a sufficient margin to keep the between-study heterogeneity within acceptable bounds (I² = 50%). However, larger studies with ESRF prevalences of 113/3981 [35] and 886/44,419 patients [48] consistently reported larger risks of inpatient mortality, with ORs of 6.70 (95% CI: 4.20–10.69) and 6.70 (95% CI: 3.57–12.58), respectively. The pooled OR of 1.61 reported in this review is therefore unlikely to be representative of patients with ESRF. Because CRF is highly prevalent amongst older adults [137], it is increasingly important to personalise prognoses based on the severity of CRF rather than merely its presence or absence.

Thirdly, heart failure might require a more careful definition to be of better prognostic value. The pooled estimate reported in this review exhibited substantial unexplained heterogeneity (I² = 99%). Even between the two studies that both used ICD-10 code I50 for heart failure diagnosis [75, 80], the ORs differed substantially (95% CI: 1.54–1.73 vs 95% CI: 3.68–4.13). A disadvantage of ICD-10 code I50 is that it includes both heart failure with preserved ejection fraction and heart failure with reduced ejection fraction. Studies have shown that a decrease in left ventricular ejection fraction (LVEF) generally increases the risk of mortality [138]. We therefore postulate that LVEF is an unobserved variable that could explain the high I² value. Future studies should acknowledge the varying degrees of severity of heart failure and report diagnoses in terms of LVEF.

Several important limitations are noted. Some studies might have been missed, since only two databases were searched for this review. Furthermore, the number of studies focusing on independent predictors of 30-day mortality is relatively limited, as most focus on longer-term prognoses. Consequently, the small number of available studies precluded the use of publication bias diagnostics other than the trim-and-fill method that could have assessed the risk of publication bias more reliably. We abstained from using publication bias diagnostics based on funnel plots and Egger’s test because of their very low power [139]. Hence, the conclusions drawn with respect to publication bias should be interpreted with caution.

Furthermore, the list of predictors is incomplete due to restrictions in pooling. Ischaemic heart disease was repeatedly associated with 30-day mortality but could not be pooled, as the results were a mix of ORs and HRs [60, 75, 80]. Additionally, inconsistent reporting was identified as a systematic cause of incompleteness in the list of predictors. The CCI [38, 51, 70] and the number of comorbidities [49, 58, 67] were also repeatedly found to be significant predictors of 30-day mortality. However, they could not be pooled, since the cut-off levels by which patients were categorised were inconsistent.

Inconsistent reporting also affected the quality of the pooled evidence. The pooled OR of Hb per millimole per litre decrease was based on three studies instead of five because of inconsistent definitions of the influence of Hb. The quality of this evidence was consequently downgraded for imprecision, which we postulate arose from a lack of power. Had all five studies been eligible for pooling, sufficient power might have been attained to avoid downgrading. Hence, future studies should establish which variable definitions and cut-off levels are most clinically relevant to the field of geriatric trauma surgery, e.g. by using the methods reported by Ogawa et al. [140], to improve consistency in reporting.

Conclusion

This study identified five high-quality, six moderate-quality, and one low-quality evidence predictors of 30-day mortality following hip fracture surgery based on preoperative data. Many published studies and widely used risk scores define predictors as the mere presence or absence of disease. To provide better risk predictions, future studies should move away from such coarse definitions. According to the findings of this study, malignancy, CRF, and heart failure should be further subcategorised by severity to increase their prognostic value in prediction models. The results of this meta-analysis may help clinicians better identify patients at high risk of 30-day mortality. This information can be used to better inform patients about their prognosis, as one of the factors that may lead to better shared decision-making in the preoperative phase.