Elsevier

Journal of Clinical Epidemiology

Volume 101, September 2018, Pages 61-72

Original Article
Three risk of bias tools lead to opposite conclusions in observational research synthesis

https://doi.org/10.1016/j.jclinepi.2018.05.021

Abstract

Objectives

The aim of this study was to assess the agreement and compare the performance of three different instruments in assessing risk of bias (RoB) of comparative cohort studies included in a health psychology meta-analysis.

Study Design and Setting

Three tools were applied to the 28 primary studies included in the selected meta-analysis: the Newcastle-Ottawa Scale (NOS), quality of cohort studies (Q-Coh), and risk of bias in nonrandomized studies of interventions (ROBINS-I).

Results

Interrater agreement varied greatly from tool to tool. For overall RoB, 75% of the studies were rated as low RoB with the Newcastle-Ottawa Scale, 11% of the studies with Q-Coh, and no study was found to be at low RoB using ROBINS-I. No influence of quality ratings on the meta-analysis results was found for any of the tools.

Conclusion

Assessing RoB using the three tools may lead to opposite conclusions, especially at low and high levels of RoB. Domain-based tools (Q-Coh and ROBINS-I) provide a more comprehensive framework for identifying potential sources of bias, which is essential to improving the quality of future research. Both further guidance on the application of RoB tools and improvements in the reporting of primary studies are necessary.

Introduction

Assessing the methodological quality or risk of bias (RoB) of primary studies is an essential component of any systematic review or meta-analysis [1], [2] and should inform the interpretation of the review's results [3]. Moreover, including poor-quality studies in a review may lead to invalid conclusions [3], [4]. In fact, the results of such quality assessments often influence key decisions in the review process, such as whether to exclude studies that do not meet certain quality standards, whether to perform sensitivity analyses, how to grade the strength of evidence, and what to recommend for future research and clinical practice [5], [6].

Compared with clinical trials, the quality assessment of observational studies is often more demanding because of the variety of designs involved and their greater susceptibility to bias [5], [7], [8]. These difficulties probably explain why, in some areas such as health psychology, only about half of all reviews that include cohort and case–control studies assessed the RoB of the primary studies [9]. Although a wide range of tools suitable for observational studies have been reviewed by several authors [10], [11], [12], there is no consensus on the best procedure or tool for assessing RoB in observational designs, even though observational studies are usually included in systematic reviews, including Cochrane reviews [13]. Moreover, most of these tools were poorly developed: their developers often failed to follow standard methodological procedures or to test the tools' validity and reliability [10], [14]. Thus, RoB assessments of the same study using different tools may lead to different conclusions [4], [15], [16], both in randomized controlled trials [1], [14], [17] and in observational studies [7], [8], [18].

Meanwhile, the use of scales that produce a single summary score is strongly discouraged [4], [15], [19] because such scores implicitly weight the component items, some of which may be unrelated to RoB [3], [11]. The preferred alternative appears to be a domain-based RoB assessment [20], [21], [22], [23]. This approach is increasingly applied and apparently provides a more structured framework for making qualitative judgments about the overall quality of studies and for detecting potential sources of bias [16].
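The contrast between the two aggregation strategies can be illustrated with a minimal Python sketch. The domain-based rule loosely follows the ROBINS-I convention that the overall judgment is driven by the most severe domain-level judgment, simplified here by ignoring the "No information" category. The class and function names, the domain keys, and the one-point-per-domain scoring used for comparison are all hypothetical and are not part of any tool's specification.

```python
from enum import IntEnum


class Judgement(IntEnum):
    """Ordinal RoB judgements (hypothetical coding); higher value = more severe."""
    LOW = 1
    MODERATE = 2
    SERIOUS = 3
    CRITICAL = 4


def overall_judgement(domains: dict[str, Judgement]) -> Judgement:
    """Domain-based rule: the overall judgement is the most severe domain judgement."""
    if not domains:
        raise ValueError("at least one domain judgement is required")
    return max(domains.values())


def summary_score(points: list[int]) -> int:
    """Summary-score rule: every point counts equally toward one total."""
    return sum(points)


# Hypothetical study: low RoB in every domain except confounding.
study = {
    "confounding": Judgement.SERIOUS,
    "participant_selection": Judgement.LOW,
    "intervention_classification": Judgement.LOW,
    "deviations_from_interventions": Judgement.LOW,
    "missing_data": Judgement.LOW,
    "outcome_measurement": Judgement.LOW,
    "reported_result_selection": Judgement.LOW,
}

# Hypothetical scoring scheme: one point per domain judged LOW.
points = [1 if j == Judgement.LOW else 0 for j in study.values()]

print(overall_judgement(study).name)             # SERIOUS
print(summary_score(points), "of", len(points))  # 6 of 7
```

Under the domain-based rule, a single serious flaw determines the overall rating. The summary score still looks favourable, which is the weighting problem that motivates the move away from single summary scores.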

The general purpose of this study was to assess the agreement and compare the performance of three different instruments in assessing the RoB of comparative cohort studies included in a meta-analysis related to health psychology. The selected tools were as follows: (1) the Newcastle-Ottawa Scale (NOS) [24], the most frequently used scale to assess the quality of cohort and case–control studies [9], which provides a summary score; (2) quality of cohort studies (Q-Coh) [21], a specific domain-based tool to assess the RoB of cohort studies with good psychometric properties; and (3) risk of bias in nonrandomized studies of interventions (ROBINS-I) [22], a new domain-based tool proposed by Cochrane, which is intended to assess RoB in nonrandomized studies of interventions but is also applicable to a wide variety of observational designs [25]. More specifically, the objectives were as follows:

  • To estimate, for each tool, the degree of interrater agreement when examining items, domains of RoB, and overall quality rating.

  • To estimate the level of agreement between tools for specific biases, domains of RoB, and overall quality rating.

  • To appraise the qualitative aspects of the tools related to their usability: the average time spent, clarity of instructions and items, coverage, and validity.

  • To determine the effect of quality ratings on the results of a meta-analysis.
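The snippets reproduced here do not state which agreement statistic was used. As a hedged illustration, the Python sketch below computes Cohen's kappa for two raters. It also computes the prevalence-adjusted bias-adjusted kappa (PABAK), the adjustment discussed by Byrt et al. in the reference list, written here in its common k-category form. The function names and example ratings are hypothetical. The example reproduces the "high agreement but low kappa" paradox: when almost every study falls into one category, raw agreement can be high while kappa is near zero.

```python
from collections import Counter


def observed_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Proportion of items on which the two raters chose the same category."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_obs = observed_agreement(rater_a, rater_b)
    n = len(rater_a)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    p_exp = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if p_exp == 1:
        return float("nan")  # both raters used one identical category: undefined
    return (p_obs - p_exp) / (1 - p_exp)


def pabak(rater_a: list[str], rater_b: list[str], k: int) -> float:
    """Prevalence- and bias-adjusted kappa for k categories: (k*p_obs - 1)/(k - 1)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return (k * observed_agreement(rater_a, rater_b) - 1) / (k - 1)


# Hypothetical ratings of 10 studies: the raters disagree on only one.
rater_1 = ["low"] * 9 + ["high"]
rater_2 = ["low"] * 10

print(observed_agreement(rater_1, rater_2))      # 0.9
print(round(cohen_kappa(rater_1, rater_2), 3))   # 0.0
print(round(pabak(rater_1, rater_2, k=2), 3))    # 0.8
```

Reporting PABAK alongside kappa is one way to show when a low kappa reflects skewed category prevalence rather than genuine disagreement between raters.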


Risk of bias assessment tools

The NOS [24] was developed to assess the quality of observational studies included in systematic reviews. Separate versions exist for cohort and case–control designs; only the cohort version was applied here. Studies are assessed on eight items grouped into three dimensions: selection (four items), comparability (one item), and exposure for case–control studies or outcome for cohort studies (three items). Each item can earn one star except comparability, which can earn up to two, so a study can be awarded a maximum of nine stars.
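The star arithmetic of the NOS cohort scale can be written out as a small Python sketch. It encodes four selection items and three outcome items worth one star each, plus a comparability item worth up to two stars. The `NOSCohortRating` container, its field names, and the example rating are hypothetical.

```python
from dataclasses import dataclass

MAX_STARS = 4 * 1 + 2 + 3 * 1  # selection + comparability + outcome = 9


@dataclass(frozen=True)
class NOSCohortRating:
    """Stars awarded to one cohort study on the NOS (hypothetical container)."""
    selection: tuple[int, ...]  # four items, 0 or 1 star each
    comparability: int          # one item, 0 to 2 stars
    outcome: tuple[int, ...]    # three items, 0 or 1 star each

    def __post_init__(self) -> None:
        if len(self.selection) != 4 or any(s not in (0, 1) for s in self.selection):
            raise ValueError("selection needs four items scored 0 or 1")
        if self.comparability not in (0, 1, 2):
            raise ValueError("comparability is scored 0, 1, or 2")
        if len(self.outcome) != 3 or any(s not in (0, 1) for s in self.outcome):
            raise ValueError("outcome needs three items scored 0 or 1")

    def total(self) -> int:
        """Total number of stars (0 to MAX_STARS)."""
        return sum(self.selection) + self.comparability + sum(self.outcome)


# Hypothetical study missing one selection star.
rating = NOSCohortRating(selection=(1, 1, 1, 0), comparability=2, outcome=(1, 1, 1))
print(rating.total(), "/", MAX_STARS)  # 8 / 9
```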

Risk of bias assessment

Fig. 1 shows a summary of the consensus results of RoB assessment for each tool, for overall RoB and by domain. The NOS scores ranged from 5 to 9, with a median and mode of 8 (25th percentile [p25] = 6; p75 = 9). Once the studies were classified into categories, there were 21 studies with low RoB and seven studies with moderate RoB. None of the studies were placed in the category of high RoB. According to the Q-Coh results, three studies were classified as low RoB, 11 studies as moderate RoB,
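As a quick arithmetic check, the NOS category counts reported here reproduce the 75% figure in the Abstract: 21 low, 7 moderate, and 0 high out of 28 studies. The three Q-Coh low-RoB studies correspond to the reported 11%. The helper name `category_shares` is hypothetical.

```python
def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert category counts into percentages of the total number of studies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no studies to summarise")
    return {category: 100 * n / total for category, n in counts.items()}


N_STUDIES = 28
nos_counts = {"low": 21, "moderate": 7, "high": 0}  # consensus NOS categories

for category, pct in category_shares(nos_counts).items():
    print(f"NOS {category}: {pct:.0f}%")  # low 75%, moderate 25%, high 0%

# Q-Coh: 3 of 28 studies at low RoB (about 10.7%, reported as 11%).
print(f"Q-Coh low: {100 * 3 / N_STUDIES:.0f}%")  # 11%
```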

Discussion

Our comparison of three tools for RoB assessment of nonexperimental studies suggests that they embody three different approaches to RoB assessment, each of which could lead to different conclusions about the final quality grade assigned to a study. In this study, no agreement between tools was found for overall RoB. While 75% of the studies could be considered at low RoB when the NOS was applied, 86% would be at serious RoB according to ROBINS-I. Overall RoB
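Between-tool agreement for overall RoB requires the tools' categories to be placed on a common scale first. This is necessary because the NOS categories (low, moderate, high) and the ROBINS-I judgments (low, moderate, serious, critical) differ. The snippets do not describe how this was done. The Python sketch below assumes a hypothetical mapping that collapses ROBINS-I "serious" and "critical" into a single high category. It then computes a quadratic-weighted kappa, which penalizes low-versus-high disagreements more heavily than adjacent ones. The mapping names, the function name, and the example ratings are hypothetical.

```python
# Hypothetical harmonisation onto a common 3-level ordinal scale (0 = low, 2 = high).
NOS_SCALE = {"low": 0, "moderate": 1, "high": 2}
ROBINS_I_SCALE = {"low": 0, "moderate": 1, "serious": 2, "critical": 2}


def weighted_kappa(x: list[int], y: list[int], k: int, power: int = 2) -> float:
    """Weighted kappa for ordinal codes 0..k-1 (power=1 linear, power=2 quadratic)."""
    if not x or len(x) != len(y) or k < 2:
        raise ValueError("need equal-length, non-empty ratings and k >= 2")
    n = len(x)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(x, y):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at the extremes
            disagreement += w * observed[i][j]
            expected += w * row[i] * col[j]
    if expected == 0:
        return float("nan")  # no chance disagreement possible: kappa undefined
    return 1 - disagreement / expected


# Hypothetical overall ratings of the same three studies by each tool.
nos = [NOS_SCALE[r] for r in ["low", "low", "moderate"]]
robins = [ROBINS_I_SCALE[r] for r in ["serious", "moderate", "critical"]]
print(round(weighted_kappa(nos, robins, k=3), 3))
```

With real data, the 28 paired consensus ratings would replace the three hypothetical ones. A different harmonisation choice, for example keeping "critical" as a fourth level, can change the result.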

Conclusions

The present study, comparing the performance of three different tools when assessing the RoB of 28 cohort studies, shows that assessing the RoB of the same study with different tools may lead to opposite conclusions, especially at the low and high ends of the RoB spectrum: most of the studies were rated as low RoB with the NOS, whereas most were rated as high RoB with ROBINS-I. Therefore, both the NOS and ROBINS-I showed limited ability to grade RoB in observational studies. Our

References (57)

  • C.A. Lantz et al.

    Behavior and interpretation of the κ statistic: resolution of the two paradoxes

    J Clin Epidemiol

    (1996)
  • D.V. Cicchetti et al.

    High agreement but low kappa: II. Resolving the paradoxes

    J Clin Epidemiol

    (1990)
  • T. Byrt et al.

    Bias, prevalence and kappa

    J Clin Epidemiol

    (1993)
  • D. Moher et al.

    Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses?

    Lancet

    (1998)
  • C.M. Faggion

    The rationale for rating risk of bias should be fully reported

    J Clin Epidemiol

    (2016)
  • L. Hartling et al.

    Risk of bias versus quality assessment of randomised controlled trials: cross sectional study

    BMJ

    (2009)
  • B.T. Johnson et al.

    Panning for the gold in health research: incorporating studies' methodological quality in meta-analysis

    Psychol Health

    (2014)
  • J. Higgins et al.

    Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]

    The Cochrane Collaboration

    (2011)
  • P. Jüni et al.

    The hazards of scoring the quality of clinical trials for meta-analysis

    JAMA

    (1999)
  • Centre for Reviews and Dissemination

    Systematic reviews: CRD’s guidance for undertaking reviews in health care

    CRD, University of York

    (2009)
  • P. Jüni et al.

    Systematic reviews in health care—assessing the quality of controlled clinical trials

    BMJ

    (2001)
  • J.M. Hootman et al.

    Reliability and validity of three quality rating instruments for systematic reviews of observational studies

    Res Synth Methods

    (2011)
  • A. Margulis et al.

    Quality assessment of observational studies in a drug-safety systematic review, comparison of two tools: the Newcastle–Ottawa scale and the RTI item bank

    Clin Epidemiol

    (2014)
  • J.J. Deeks et al.

    Evaluating non-randomised intervention studies

    Health Technol Assess

    (2003)
  • S. Sanderson et al.

    Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography

    Int J Epidemiol

    (2007)
  • A. Jarde et al.

    Methodological quality assessment tools of non-experimental studies: a systematic review

    An Psicol

    (2012)
  • S. Armijo-Olivo et al.

    Assessment of study quality for systematic reviews: a comparison of the Cochrane collaboration risk of bias tool and the effective public health practice project quality assessment tool: methodological research

    J Eval Clin Pract

    (2012)
  • P. Herbison et al.

    Adjustment of meta-analyses on the basis of quality scores should be abandoned

    J Clin Epidemiol

    (2006)

Conflict of interest: None.

Funding: This work was supported by the Spanish Ministry of Science and Innovation (grant number: PSI2014-52962-P). I.O. was supported by funding from a predoctoral grant from the Ministry of Education, Culture and Sport of Spanish Government (grant number: FPU14/04514). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.