Original Article
Methods to select results to include in meta-analyses deserve more consideration in systematic reviews

https://doi.org/10.1016/j.jclinepi.2015.02.009

Abstract

Objectives

To investigate how often systematic reviewers encounter multiple trial effect estimates that are available for inclusion in a particular meta-analysis (multiplicity of results) and the methods they use to select effect estimates.

Study Design and Setting

We randomly sampled Cochrane and MEDLINE-indexed non-Cochrane reviews published between January 2010 and January 2012. The first presented meta-analysis of an effect measure for a continuous outcome in each review was identified, and methods to select results to include in this meta-analysis were extracted from review protocols and reviews. All effect estimates that were available for inclusion in the meta-analyses were extracted from trial reports.

Results

We examined 44 reviews. Multiplicity of results was common, occurring in 49% of trial reports (n = 210). Prespecification of decision rules to select results from multiple measurement scales and intervention/control groups (in multi-arm trials) was uncommon (19% and 14% of 21 review protocols, respectively). Overall, 70% of reviews included at least one randomized controlled trial with multiplicity of results, but this occurred less frequently in reviews with a protocol (risk difference, −25%; 95% confidence interval: −52%, 1%).

Conclusion

Systematic reviewers are likely to encounter multiplicity of results in the included trials. We recommend that systematic reviewers always consider predefining methods to select results to include in meta-analyses. Methods focusing on selection of measurement scales and how to deal with multi-arm trials would be most valuable.

Introduction

Systematic reviews of randomized controlled trials (RCTs) of health care interventions have the potential to have a major impact on patient health, research agendas, and policy making. However, the validity of systematic review findings can be compromised by challenges in undertaking meta-analysis. One challenge is that multiple effect estimates in a trial report may be available for inclusion in a particular meta-analysis [1], [2]. For example, a trial report may present effect estimates for two depression scales at weeks three, six, and nine, each analyzed both unadjusted and adjusted for covariates. Multiplicity of effect estimates may lead to “selective inclusion of results,” whereby the process for selecting the trial effect estimates for inclusion in a meta-analysis is based on the estimates themselves, which may, in turn, result in biased meta-analytic effects [3].

Several organizations that produce systematic reviews (e.g., [4], [5], [6]) have recommended methods that aim to reduce selective inclusion of results. The methods (specified a priori) aim to uniquely identify results that will be included in a meta-analysis and can be placed in two broad categories, which we label “eligibility criteria to select results” and “decision rules to select results.” Eligibility criteria to select results include specifying lists of measurement scales, intervention/control groups, time points, and analyses that systematic reviewers consider eligible to include in the review (ideally based on some clinical or methodological rationale). Providing specific criteria discourages the use of broad outcomes such as “pain,” and instead encourages specification of details such as the eligible pain measurement scales and time points of interest to the review [1], [2].
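
As a rough illustration of how prespecified eligibility criteria can narrow the candidate results for a given trial, the sketch below encodes a hypothetical criteria set for a pain outcome and filters a trial's reported effect estimates against it. The scale names, time points, and data structure are invented for this example and are not drawn from the article or from any particular review.

```python
# Illustrative sketch only: the criteria set, field names, and example results
# below are hypothetical and are not taken from the article.

# Prespecified eligibility criteria for one outcome domain (e.g., pain).
ELIGIBLE_SCALES = {"VAS 0-100", "NRS 0-10"}      # eligible measurement scales
ELIGIBLE_TIMEPOINTS_WEEKS = {12}                  # eligible follow-up time points
ELIGIBLE_ANALYSES = {"intention-to-treat"}        # eligible analysis types


def select_eligible_results(candidate_results):
    """Keep only the trial effect estimates that satisfy the prespecified criteria."""
    return [
        r for r in candidate_results
        if r["scale"] in ELIGIBLE_SCALES
        and r["timepoint_weeks"] in ELIGIBLE_TIMEPOINTS_WEEKS
        and r["analysis"] in ELIGIBLE_ANALYSES
    ]


# One trial reporting several estimates for the same outcome domain:
trial_results = [
    {"scale": "VAS 0-100", "timepoint_weeks": 12, "analysis": "intention-to-treat"},
    {"scale": "VAS 0-100", "timepoint_weeks": 24, "analysis": "intention-to-treat"},
    {"scale": "McGill Pain Questionnaire", "timepoint_weeks": 12, "analysis": "per-protocol"},
]

print(select_eligible_results(trial_results))  # only the first estimate remains
```

In this toy example, the criteria identify a single eligible estimate; as discussed next, this will not always be the case, which is where decision rules become useful.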

Predefining eligibility criteria to select results is an effective method to minimize the number of effect estimates available for inclusion in a particular meta-analysis. However, this method may not always identify a single eligible effect estimate per trial, and in such cases, the addition of decision rules is useful. Decision rules are strategies to either select one effect estimate or combine effect estimates when multiple estimates are available. An example of a decision rule to select one effect estimate is to rank commonly encountered measurement scales for a particular outcome domain (e.g., depression) according to their psychometric properties and, for trials that report the results of more than one scale, to select the results for the scale with the best measurement properties. Such a strategy has previously been referred to as an “outcome data hierarchy” [2], [7], [8]. An example of a decision rule to combine effect estimates applies when a trial includes more than one active treatment arm (e.g., placebo vs. high-dose drug vs. low-dose drug): rather than selecting data from only one of the active arms (e.g., only one dosage group), data from all active treatment arms are combined (e.g., any dosage vs. placebo) [9], [10].
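
Where the combining approach is taken, a worked sketch of the standard formulas for merging two arms (sample sizes $n_1, n_2$, means $m_1, m_2$, and standard deviations $s_1, s_2$) into a single group, as given for example in the Cochrane Handbook, is:

\[
n = n_1 + n_2, \qquad
m = \frac{n_1 m_1 + n_2 m_2}{n_1 + n_2}, \qquad
s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dfrac{n_1 n_2}{n_1 + n_2}\left(m_1^2 + m_2^2 - 2 m_1 m_2\right)}{n_1 + n_2 - 1}}.
\]

One common motivation for combining arms in this way is to avoid counting a shared control group more than once when a multi-arm trial would otherwise contribute several pairwise comparisons to the same meta-analysis.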

To our knowledge, only two previous studies have investigated multiplicity of results in trial reports or the methods systematic reviewers use to select results to include in meta-analyses [2], [11]. In the first study, which examined interobserver variation in results extracted from trials for use in meta-analyses, decision rules to select final values vs. change-from-baseline values were reported in 4 of 10 review protocols [11]. In the second study, which examined the impact of multiplicity of trial results on meta-analysis results, multiplicity was found to be common, but methods to select results to include in meta-analyses were rarely predefined [2]. Among 83 RCTs included in 19 Cochrane reviews published from 2006 to 2007, 35% of the RCTs had multiple measurement scales, 29% had multiple intervention/control groups (i.e., multi-arm RCTs), and 36% had multiple time points that were available for inclusion in a particular meta-analysis. Eligibility criteria for measurement scales and intervention/control groups were defined in all review protocols, whereas eligibility criteria for time points were defined in only eight (42%). In contrast, decision rules to select measurement scales or intervention/control groups were not reported in any of the review protocols, and a decision rule to select time points was reported in only one review protocol (5%) [2].

To inform methods guidance regarding inclusion of results when there is multiplicity, several issues still require exploration. First, the protocols examined in the Tendal et al. studies were published before 2006, and it is unclear whether reporting of eligibility criteria and decision rules to select results has changed over time. Second, most systematic reviewers do not report working from a review protocol [12], [13], and the methods used to select results to include in such reviews have not been examined. Third, there has been no investigation of the frequency of other types of multiplicity that may arise in RCTs [e.g., reporting of results from intention-to-treat (ITT) and per-protocol analyses, or from unadjusted and covariate-adjusted analyses]. Fourth, no one has examined whether multiplicity of results and reporting of methods to select results to include in meta-analyses differ between clinical conditions. One might hypothesize that there is less multiplicity of results for clinical conditions that have “core outcome measurement sets” available [14], [15], [16]. Core outcome measurement sets are measurement scales recommended for use in RCTs and systematic reviews of a particular health condition and are designed to increase consistency in scale selection.

Our aim was to investigate multiplicity of results in trial reports and the methods systematic reviewers use to select results to include in meta-analyses. The primary objectives were to investigate the frequency and types of (1) multiplicity of results that arises in RCTs and (2) eligibility criteria and decision rules to select results that are reported in review protocols and reviews. Secondary objectives were to examine how the extent of multiplicity of results was modified by the existence of a review protocol and by the clinical condition of the review, and how the reporting of eligibility criteria and decision rules to select results was modified by the clinical condition of the review. We also plan to investigate whether there is evidence of selective inclusion of results in the sample of reviews and what impact this may have on meta-analytic effect estimates [17]; the results of this research will form a subsequent article.


Methods

Our study protocol, which describes the eligibility criteria, search strategies, selection of systematic reviews, data extraction, and planned analyses, is published elsewhere [17]. An overview of the methods is provided here.

Results

A flow diagram of the identification, screening, and inclusion of systematic reviews is presented in Fig. 1. Searching yielded a total of 2,590 records. A full-text report was retrieved for 264 records. Of these, 145 were screened and excluded (the most common reasons for exclusion were that no meta-analyses were conducted or that no continuous outcomes were analyzed in the review). The target sample size was reached after screening 189 randomly sorted full-text articles (leaving 75 full-text articles unscreened).

Discussion

Our investigation of multiplicity of results demonstrates that systematic reviewers can expect to commonly encounter multiple eligible effect estimates in trials when they do not predefine methods to select results to include in meta-analyses. Multiple measurement scales and intervention/control groups (in multi-arm RCTs) were the most common types of multiplicity. At least one eligibility criterion and decision rule to select results were reported in more than 80% of review protocols and

Acknowledgments

This work was conducted as part of a PhD undertaken by M.J.P., which is funded by an Australian Postgraduate Award administered through Monash University, Australia. J.E.M. is supported by an NHMRC Australian Public Health Fellowship (1072366).

References (34)

  • Hasselblad V. Meta-analysis of multitreatment studies. Med Decis Making. 1998.
  • Higgins JPT, Deeks JJ. Chapter 7: selecting studies and collecting data. In: Higgins JPT, Green S, editors. Cochrane...
  • Tendal B, et al. Disagreements in meta-analyses using outcomes measured on continuous or rating scales: observer agreement study. BMJ. 2009.
  • Moher D, et al. Epidemiology and reporting characteristics of systematic reviews. PLoS Med. 2007.
  • Turner L, et al. An evaluation of epidemiological and reporting characteristics of complementary and alternative medicine (CAM) systematic reviews (SRs). PLoS One. 2013.
  • Tugwell P, et al. OMERACT: an international initiative to improve outcome measurement in rheumatology. Trials. 2007.
  • Page MJ, et al. An empirical investigation of the potential impact of selective inclusion of results in systematic reviews of interventions: study protocol. Syst Rev. 2013.

Conflict of interest: M.J.P. has roles in The Cochrane Collaboration including systematic review trainer for the Australasian Cochrane Centre; Methodological Editor for the Depression, Anxiety, and Neurosis Group; member of the Bias Methods Group, Statistical Methods Group, and Trainer's Network; and author of Cochrane systematic reviews. J.E.M. has roles in The Cochrane Collaboration including Co-convenor of the Statistical Methods Group; member of the Methods Executive, Methods Board, and the Bias Methods Group; Statistical Editor for the Consumers and Communication Review Group; Editor of Cochrane Methods; and author of Cochrane systematic reviews. M.C. has a role in The Cochrane Collaboration as author of Cochrane systematic reviews. S.E.G. has roles in The Cochrane Collaboration including Co-Director of the Australasian Cochrane Centre; past editor of the Cochrane Handbook for Systematic Reviews of Interventions; and author of Cochrane systematic reviews. A.F. has a role in The Cochrane Collaboration as member of the Statistical Methods Group. The views expressed in this article are those of the authors and not necessarily those of The Cochrane Collaboration or its registered entities, committees, or working groups.
