Methods to select results to include in meta-analyses deserve more consideration in systematic reviews
Introduction
Systematic reviews of randomized controlled trials (RCTs) of health care interventions have the potential to have a major impact on patient health, research agendas, and policy making. However, the validity of systematic review findings can be compromised by challenges in undertaking meta-analysis. One challenge is that multiple effect estimates in a trial report may be available for inclusion in a particular meta-analysis [1], [2]. For example, a trial report may present effect estimates for two depression scales, at weeks three, six, and nine, each analyzed as unadjusted and adjusted for covariates. Multiplicity of effect estimates may lead to “selective inclusion of results,” whereby the process for selecting the trial effect estimates for inclusion in a meta-analysis is based on the estimates themselves, which may, in turn, result in biased meta-analytic effects [3].
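The combinatorial nature of this multiplicity is easy to underestimate. A minimal sketch of the worked example above (the scale names are assumed here purely for illustration; the original does not name them) shows how quickly the candidate effect estimates multiply:

```python
from itertools import product

# Hypothetical trial from the example above: two depression scales,
# three time points, and two analysis types are all reported.
scales = ["HAM-D", "BDI"]  # assumed scale names, for illustration only
time_points = ["week 3", "week 6", "week 9"]
analyses = ["unadjusted", "covariate-adjusted"]

# Every combination is a distinct effect estimate a reviewer could
# select for the same meta-analysis cell.
candidates = list(product(scales, time_points, analyses))
print(len(candidates))  # 2 x 3 x 2 = 12 candidate effect estimates
```

If the selection among these 12 candidates is influenced by the estimates themselves (e.g., picking the largest), bias can enter the meta-analysis even though each individual estimate is validly reported.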
Several organizations that produce systematic reviews (e.g., [4], [5], [6]) have recommended methods that aim to reduce selective inclusion of results. The methods (specified a priori) aim to uniquely identify results that will be included in a meta-analysis and can be placed in two broad categories, which we label “eligibility criteria to select results” and “decision rules to select results.” Eligibility criteria to select results include specifying lists of measurement scales, intervention/control groups, time points, and analyses that systematic reviewers consider eligible to include in the review (ideally based on some clinical or methodological rationale). Providing specific criteria discourages the use of broad outcomes such as “pain,” and instead encourages specification of details such as the eligible pain measurement scales and time points of interest to the review [1], [2].
Predefining eligibility criteria to select results is an effective method to minimize the number of effect estimates available for inclusion in a particular meta-analysis. However, this method may not always identify a single eligible effect estimate per trial, and in such cases, the addition of decision rules is useful. Decision rules are strategies to either select one effect estimate, or combine effect estimates, when multiple estimates are available. An example of a decision rule to select one effect estimate is when commonly encountered measurement scales for a particular outcome domain (e.g., depression) are ranked based on their psychometric properties, and for trials that report the results of more than one scale, the results for the scale with the best measurement properties are selected. Such a strategy has previously been referred to as an “outcome data hierarchy” [2], [7], [8]. An example of a decision rule to combine effect estimates is when a trial includes more than one active treatment arm (e.g., placebo vs. high-dose drug vs. low-dose drug), and rather than selecting data from only one of the active arms (e.g., only one dosage group), data from all active treatment arms are combined (e.g., any dosage vs. placebo) [9], [10].
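Both kinds of decision rule are mechanical once specified. The sketch below illustrates them under stated assumptions: the function names and the example scale hierarchy are hypothetical, and the arm-combining step uses the standard formulae for pooling the means and standard deviations of two groups (as given in the Cochrane Handbook), not any method specific to this article.

```python
import math

def select_by_hierarchy(results, hierarchy):
    """Decision rule 1 (outcome data hierarchy): return the result for
    the highest-ranked scale the trial reports.
    `results` maps scale name -> effect estimate; `hierarchy` is ordered
    best-first."""
    for scale in hierarchy:
        if scale in results:
            return scale, results[scale]
    return None  # no eligible scale reported in this trial

def combine_arms(n1, mean1, sd1, n2, mean2, sd2):
    """Decision rule 2: merge two active arms into a single group,
    using the standard formulae for combining means and SDs."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + n1 * n2 / n * (mean1 - mean2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)

# Usage with hypothetical values: a high-dose and a low-dose arm are
# combined into one "any dose" arm for comparison against placebo.
n, m, sd = combine_arms(50, 12.0, 4.0, 50, 10.0, 5.0)
print(n, m)  # 100 11.0
```

The key point is that both rules are deterministic functions of prespecified inputs (the hierarchy, the pooling formula), so applying them cannot be steered by the effect estimates themselves.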
To our knowledge, only two previous studies have investigated multiplicity of results in trial reports or the methods systematic reviewers use to select results to include in meta-analyses [2], [11]. The first study, which examined interobserver variation in results extracted from trials for use in meta-analyses, found that decision rules to select final vs. change from baseline values were reported in 4 of 10 review protocols [11]. The second study [2], which examined the impact of multiplicity of trial results on meta-analysis results, found that multiplicity was common but that methods to select results to include in meta-analyses were rarely predefined. In 83 RCTs included in 19 Cochrane reviews published from 2006 to 2007, 35% of the RCTs had multiple measurement scales, 29% had multiple intervention/control groups (i.e., in multi-arm RCTs), and 36% had multiple time points that were available for inclusion in a particular meta-analysis. Eligibility criteria for measurement scales and intervention/control groups were defined in all review protocols, and eligibility criteria for time points were defined in eight (42%). In contrast, decision rules to select measurement scales or intervention/control groups were not reported in any of the review protocols, whereas a decision rule to select time points was reported in one review protocol (5%) [2].
To inform methods guidance regarding inclusion of results when there is multiplicity, several issues still require exploration. First, the review protocols examined in these previous studies were published before 2006, and it is unclear whether reporting of eligibility criteria and decision rules to select results has changed over time. Second, most systematic reviewers do not report working from a review protocol [12], [13], and the methods used to select results to include in such reviews have not been examined. Third, there has been no investigation of the frequency of other types of multiplicity that may arise in RCTs [e.g., reporting of results from intention-to-treat (ITT) and per-protocol analyses, or from unadjusted and covariate-adjusted analyses]. Fourth, no one has examined whether multiplicity of results and reporting of methods to select results to include in meta-analyses differ between clinical conditions. One might hypothesize that there is less multiplicity of results for clinical conditions that have “core outcome measurement sets” available [14], [15], [16]. Core outcome measurement sets are measurement scales recommended for use in RCTs and systematic reviews of a particular health condition and are designed to increase consistency in scale selection.
Our aim was to investigate multiplicity of results in trial reports and methods systematic reviewers use to select results to include in meta-analyses. The primary objectives were to investigate the frequency and types of: (1) multiplicity of results that arise in RCTs and (2) eligibility criteria and decision rules to select results, which are reported in review protocols and reviews. Secondary objectives were to examine how the extent of multiplicity of results was modified by the existence of a review protocol and the clinical condition of the review and how the reporting of eligibility criteria and decision rules to select results was modified by the clinical condition of the review. We also plan to investigate whether there is evidence of selective inclusion of results in the sample of reviews and what impact this may have on meta-analytic effect estimates [17]; the results of this research will form a subsequent article.
Methods
Our study protocol that describes the eligibility criteria, search strategies, selection of systematic reviews, data extraction, and planned analyses is published elsewhere [17]. An overview of the methods is provided here.
Results
A flow diagram of the identification, screening, and inclusion of systematic reviews is presented in Fig. 1. Searching yielded a total of 2,590 records. A full-text report was retrieved for 264 records. Of these, 145 were screened and excluded (the most common reasons for exclusion were that no meta-analyses were conducted or no continuous outcomes were analyzed in the review). The target sample size was reached after screening 189 randomly sorted full-text articles (leaving 75 full-text
Discussion
Our investigation of multiplicity of results demonstrates that systematic reviewers can expect to commonly encounter multiple eligible effect estimates in trials when they do not predefine methods to select results to include in meta-analyses. Multiple measurement scales and intervention/control groups (in multi-arm RCTs) were the most common types of multiplicity. At least one eligibility criterion and decision rule to select results were reported in more than 80% of review protocols and
Acknowledgments
This work was conducted as part of a PhD undertaken by M.J.P., which is funded by an Australian Postgraduate Award administered through Monash University, Australia. J.E.M. is supported by an NHMRC Australian Public Health Fellowship (1072366).
References (34)
- et al. Attention should be given to multiplicity issues in systematic reviews. J Clin Epidemiol (2008)
- et al. Many scenarios exist for selective inclusion and reporting of results in randomized trials and systematic reviews. J Clin Epidemiol (2013)
- et al. Osteoarthritis: rational approach to treating the individual. Best Pract Res Clin Rheumatol (2006)
- et al. Developing core outcome measurement sets for clinical trials: OMERACT Filter 2.0. J Clin Epidemiol (2014)
- et al. Design and conduct of clinical trials in patients with osteoarthritis: recommendations from a task force of the Osteoarthritis Research Society: results from a workshop. Osteoarthritis and Cartilage (1996)
- et al. Multiplicity of data in trial reports and the reliability of meta-analyses: empirical study. BMJ (2011)
- et al. Methodological standards for the conduct of new Cochrane Intervention Reviews. Version 2.3 (2013)
- Finding what works in health care: standards for systematic reviews (2011)
- Methods guide for effectiveness and comparative effectiveness reviews (2014)
- et al. Meta-analysis: chondroitin for osteoarthritis of the knee or hip. Ann Intern Med (2007)
- Meta-analysis of multitreatment studies. Med Decis Making
- Disagreements in meta-analyses using outcomes measured on continuous or rating scales: observer agreement study. BMJ
- Epidemiology and reporting characteristics of systematic reviews. PLoS Med
- An evaluation of epidemiological and reporting characteristics of complementary and alternative medicine (CAM) systematic reviews (SRs). PLoS One
- OMERACT: an international initiative to improve outcome measurement in rheumatology. Trials
- An empirical investigation of the potential impact of selective inclusion of results in systematic reviews of interventions: study protocol. Syst Rev
Conflict of interest: M.J.P. has roles in The Cochrane Collaboration including systematic review trainer for the Australasian Cochrane Centre; Methodological Editor for the Depression, Anxiety, and Neurosis Group; member of the Bias Methods Group, Statistical Methods Group, and Trainer's Network; and author of Cochrane systematic reviews. J.E.M. has roles in The Cochrane Collaboration including Co-convenor of the Statistical Methods Group; member of the Methods Executive, Methods Board, and the Bias Methods Group; Statistical Editor for the Consumers and Communication Review Group; Editor of Cochrane Methods; and author of Cochrane systematic reviews. M.C. has a role in The Cochrane Collaboration as author of Cochrane systematic reviews. S.E.G. has roles in The Cochrane Collaboration including Co-Director of the Australasian Cochrane Centre; past editor of the Cochrane Handbook for Systematic Reviews of Interventions; and author of Cochrane systematic reviews. A.F. has a role in The Cochrane Collaboration as member of the Statistical Methods Group. The views expressed in this article are those of the authors and not necessarily those of The Cochrane Collaboration or its registered entities, committees, or working groups.