Original Article
Optimal sampling in derivation studies was associated with improved discrimination in external validation for heart failure prognostic models

https://doi.org/10.1016/j.jclinepi.2020.01.011

Abstract

Objectives

The objective of the study was to identify determinants of external validity of prognostic models.

Study Design and Setting

We systematically searched for studies reporting prognostic models of heart failure (HF) and examined their performance for predicting 30-day death in a cohort of 3,452 consecutive patients with acute HF. We applied published critical appraisal tools and examined whether bias or other characteristics of the original derivation studies determined model performance.

Results

We identified 224 models from 6,354 eligible studies. The mean c-statistic in the cohort was 0.64 (standard deviation, 0.07). In univariable analyses, only optimal sampling, assessed by an adequate and valid description of the sampling frame and of the recruitment details used to assemble the population of interest (total score range: 0–2, with higher scores indicating lower risk of bias), was associated with high performance (standardized β = 0.25, 95% CI: 0.12 to 0.38, P < 0.001). The association remained significant after adjustment for relevant study characteristics, such as data source, scale of study, stage of illness, and study year (standardized β = 0.24, 95% CI: 0.07 to 0.40, P = 0.01).

Conclusion

Optimal sampling, which reflects the gap between the population of interest and the population actually studied in derivation studies, was a key determinant of the external validity of HF prognostic models.

Introduction

Heart failure (HF) is a leading cause of mortality and morbidity and a heavy social and economic burden in affluent societies [1]. In addition to innovative therapies, accurate prognostic assessment tools with optimal risk management within the existing framework of medicine are a major key to reducing these burdens [2]. Notably, they support appropriate decision-making by health care stakeholders, facilitating well-planned life management for patients themselves and efficient, effective patient and hospital management for health care providers and policymakers within limited time and resources [[2], [3], [4], [5]]. Conversely, inappropriate selection and application of prognostic models can cause substantial losses through risk-treatment mismatch and regulatory failures [[6], [7], [8]]. The demand for efficient prediction models is higher than ever in the era of precision medicine [9].

Despite the plethora of prognostic models for HF currently available [1], no confident guides exist on how to select among them and which models to apply [10]. It is recommended to select models that have been replicated in a number of studies and derived from cohorts similar to the population in question for the target outcomes [[10], [11], [12], [13], [14], [15]]. However, a model with complete correspondence is hard to find, because the settings in which prognostic models were derived differ, more or less, from the settings in which they are applied in terms of patients' characteristics, prevalence of disease, incidence of adverse events, available treatments, outcomes to be predicted, time span of prediction, and intended moment of prediction. Therefore, understanding which differences matter and what determines the predictive value of prognostic models is required for efficient use of the models [10,[12], [13], [14]].

In addition to similarities in study characteristics, the quality of the derivation study is a potential determinant of the external validity of prognostic models [[14], [15], [16], [17]]. A biased study would capture a distorted spectrum of patients, predictors, and outcomes, yielding models composed of inappropriate variables with inappropriate weights [18]. This is why a reporting guideline for prediction studies [19] and guides for systematic reviews of prediction models [14,20,21] emphasize study bias. However, the effect of study bias on predictive ability has not been systematically examined because of the lack of a standard, systematic, and quantitative assessment tool.
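As a purely illustrative, hedged sketch (not part of the original study), the following Python simulation shows how a sampling frame that truncates the patient spectrum can derive a model with inappropriate predictor weights that loses discrimination in the full population of interest; all variables, effect sizes, and sample sizes are invented:

```python
# Hypothetical simulation: biased sampling in a derivation study distorts
# predictor weights and degrades external discrimination. All values invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000
age = rng.normal(size=n)   # standardized age
sbp = rng.normal(size=n)   # standardized systolic blood pressure

# Effect modification: higher SBP is protective except in the oldest patients,
# so a model derived only from old patients learns the wrong SBP weight.
sbp_effect = np.where(age > 0.5, 0.2, -1.0)
logit = -3.0 + 1.0 * age + sbp_effect * sbp
death = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
X = np.column_stack([age, sbp])

# Derivation sample A: representative random sample ("optimal sampling").
idx_a = rng.choice(n, 2_000, replace=False)
# Derivation sample B: sampling frame restricted to older patients.
idx_b = rng.choice(np.where(age > 0.5)[0], 2_000, replace=False)

for label, idx in [("representative", idx_a), ("age-restricted", idx_b)]:
    model = LogisticRegression().fit(X[idx], death[idx])
    # External validation: discrimination (c-statistic) in the full population.
    c = roc_auc_score(death, model.predict_proba(X)[:, 1])
    print(f"{label:>14} derivation: external c-statistic = {c:.3f}")
```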

Recently, an expert panel developed a method for the critical appraisal of prognostic research, the Quality In Prognosis Studies (QUIPS) tool [22]. A standardized format for identifying important clinical characteristics of prediction models has also been proposed (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies [CHARMS]) [20]. To address the issue, we used a cohort of 3,452 Japanese patients with acute HF and examined the performance of existing prognostic models of HF in predicting 30-day death after admission in this cohort. To identify factors associated with greater versus lesser predictive power, we examined characteristics of the derivation studies, including study biases and the similarity of their study characteristics to those of the Japanese cohort.

Section snippets

Overall study design

Our study involved four steps. First, we systematically identified all studies reporting prognostic models of HF. Second, we assessed the risk of bias and other study characteristics of the original derivation studies based on recently published assessment tools. Third, we examined the performance of the models in an existing cohort of HF. Fourth, we examined associations between characteristics of the derivation studies and the models' predictive power. The institutional review boards at the National Cerebral and Cardiovascular Center …
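As a hedged sketch of the third step only, the snippet below applies a made-up published model's linear predictor to a cohort table and computes the c-statistic for 30-day death; the file name, column names, and coefficients are hypothetical, not those of the study:

```python
# Hypothetical sketch of external validation: apply a published model's
# linear predictor to cohort data and measure discrimination for 30-day death.
# File name, columns, and coefficients below are invented for illustration.
import pandas as pd
from sklearn.metrics import roc_auc_score

cohort = pd.read_csv("acute_hf_cohort.csv")  # one row per patient (hypothetical)

# Published models are usually applied via their linear predictor
# (sum of coefficient x predictor value); an invented two-variable example:
lp = 0.03 * cohort["age"] - 0.02 * cohort["sbp"] + 0.50 * cohort["creatinine"]

# The c-statistic is rank-based, so the linear predictor suffices;
# for a binary outcome it equals the area under the ROC curve.
c_stat = roc_auc_score(cohort["death_30d"], lp)
print(f"c-statistic for 30-day death: {c_stat:.2f}")
```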

Systematic review of prognostic models

Among 6,340 articles identified through electronic database search and 103 additional papers identified through reference list search, we identified 224 models in 224 studies (Supplementary Fig. S1). Supplementary Table S1 summarizes all included studies and their characteristics. Supplementary Fig. S2 summarizes the risk of bias according to QUIPS. The percentage agreement and the kappa statistic among the independent raters were 72% and 0.48, respectively, overall (Supplementary Table S2).
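The inter-rater statistics reported above can be computed as follows; this is a minimal sketch with invented ratings, not the study's actual QUIPS judgements:

```python
# Percentage agreement and Cohen's kappa between two independent raters.
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
# p_e is the agreement expected by chance. Ratings below are invented.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater1 = np.array(["low", "moderate", "low", "high", "moderate", "low"])
rater2 = np.array(["low", "low",      "low", "high", "moderate", "low"])

agreement = np.mean(rater1 == rater2)      # raw percentage agreement
kappa = cohen_kappa_score(rater1, rater2)  # chance-corrected agreement
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```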

Major findings

In the present study, we demonstrated that, among all the bias components and study characteristics of the derivation studies, optimal sampling was the key determinant of the performance of prognostic models for HF. Similarities between the derivation and replication cohorts, such as data source, stage of illness, age, gender, study year, and the outcome to be predicted, showed associations with predictive value in univariable analyses, but optimal sampling was the only significant determinant. All users and …
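As a hedged sketch of the association analysis described above (assuming a simple ordinary least squares formulation; the data below are simulated, not the study's), z-scoring both the validated c-statistics and the optimal-sampling scores makes the regression slope a standardized β:

```python
# Sketch of the association analysis: regress validated c-statistics on the
# derivation studies' optimal-sampling scores; z-scoring both variables makes
# the OLS slope a standardized beta. All data below are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_models = 224
sampling_score = rng.integers(0, 3, n_models)  # range 0-2, higher = lower risk of bias
c_stat = 0.60 + 0.02 * sampling_score + rng.normal(0, 0.06, n_models)

def z(x):
    """Standardize to mean 0, SD 1."""
    return (x - x.mean()) / x.std(ddof=1)

fit = sm.OLS(z(c_stat), sm.add_constant(z(sampling_score))).fit()
beta = fit.params[1]
lo, hi = fit.conf_int()[1]
print(f"standardized beta = {beta:.2f} (95% CI {lo:.2f} to {hi:.2f}), P = {fit.pvalues[1]:.3g}")
```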

Conclusion

Optimal sampling in derivation studies, rather than similarity of study characteristics, was a key determinant of the performance of HF prognostic models when they were applied in an acute HF cohort to predict 30-day death after admission. Consideration and presentation of study bias are important for all model users and developers.

CRediT authorship contribution statement

Naotsugu Iwakami: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing - original draft. Toshiyuki Nagai: Data curation, Funding acquisition, Investigation, Project administration, Writing - original draft. Toshiaki A. Furukawa: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing - review & editing. Aran Tajika: Investigation, Writing - review & editing. Akira Onishi:

Acknowledgments

The authors are grateful for the contributions of all investigators, clinical research coordinators, and data managers involved in the WET and NaDEF studies. The authors also thank Aya Ichizawa and Keiko Fujii (Kyoto University) for help collecting the full papers for the review.

References (33)

  • D.S. Lee et al. Risk-treatment mismatch in the pharmacotherapy of heart failure. JAMA (2005)
  • M.B. Mortensen et al. Limitations of the SCORE-guided European guidelines on cardiovascular disease prevention. Eur Heart J (2017)
  • F.S. Collins et al. A new initiative on precision medicine. N Engl J Med (2015)
  • R.D. Riley et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ (2016)
  • A.C. Justice et al. Assessing the generalizability of prognostic information. Ann Intern Med (1999)
  • K.G. Moons et al. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ (2009)

Funding Statement: The WET-NaDEF collaboration project was supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science, and Technology (Japan Society for the Promotion of Science [JSPS KAKENHI]), Tokyo, Japan, Grants 23591062 and 26461088 (Dr. Yoshikawa); Grants-in-Aid for Young Scientists from JSPS KAKENHI, Grants 15K19402 (Dr. Nagai) and 18K15860 (Dr. Shiraishi); a Japan Health Labour Sciences Research Grant, Tokyo, Japan, Grant 14528506 (Dr. Yoshikawa); the Sakakibara Clinical Research Grant for Promotion of Sciences, Japan, 2012, 2013, and 2014 (Dr. Yoshikawa); a grant from the Japan Agency for Medical Research and Development, Tokyo, Japan, Grant 201439013C (Dr. Kohsaka); and a grant from the Japan Cardiovascular Research Foundation, Tokyo, Japan, Grant 24-4-2 (Dr. Anzai). The funders played no role in conducting the research.

Disclosures: Dr. Furukawa reported personal fees from Meiji Seika, grants and personal fees from Mitsubishi-Tanabe, personal fees from MSD, and personal fees from Pfizer, outside the submitted work. Dr. Kohsaka reported grants and personal fees from Bayer Yakuhin, grants from Daiichi Sankyo, and personal fees from Bristol-Myers Squibb/Pfizer, outside the submitted work. All other authors reported that they have no relationships relevant to the contents of this paper to disclose.
