Original Article
Reported effects in randomized controlled trials were compared with those of nonrandomized trials in cholecystectomy

https://doi.org/10.1016/j.jclinepi.2009.12.009

Abstract

Objectives

Because the external validity of randomized controlled trials (RCTs) may be insufficient, the role of nonrandomized controlled trials (nRCTs) is increasingly debated. RCTs and nRCTs were compared using the example of laparoscopic vs. open cholecystectomy (LC vs. OC).

Study Design and Setting

RCTs and nRCTs comparing LC and OC were identified by searching PubMed. To assess internal and external validity of the studies, patient characteristics, relative risks, and mean differences of RCTs and nRCTs were compared by meta-analytic techniques.
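The abstract does not state which pooling model or which test was used to compare the two designs. As a minimal sketch, assuming an inverse-variance fixed-effect model and a z-test for the difference between two subgroup estimates (both assumptions, not necessarily the authors' method), such a comparison could look as follows. Relative risks are pooled on the log scale, and mean differences can be passed in directly. The function names `pool_fixed` and `compare_designs` and all numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def pool_fixed(estimates, std_errors):
    """Inverse-variance fixed-effect pooling (an assumed model, for illustration).

    estimates: per-study effects, e.g. log relative risks or mean differences.
    std_errors: their standard errors.
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def compare_designs(rct, nrct):
    """Two-sided z-test for the difference between pooled nRCT and RCT effects.

    rct, nrct: (estimates, std_errors) tuples, one per study design.
    Returns (difference, z, p). On the log-RR scale, math.exp(difference)
    is the ratio of the pooled relative risks (nRCT / RCT).
    """
    y_rct, se_rct = pool_fixed(*rct)
    y_nrct, se_nrct = pool_fixed(*nrct)
    diff = y_nrct - y_rct
    z = diff / math.sqrt(se_rct ** 2 + se_nrct ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p


if __name__ == "__main__":
    # Hypothetical relative risks with 95% CIs, converted to log RR and SE.
    def log_rr_and_se(rr, lower, upper):
        return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * 1.96)

    rct_studies = [log_rr_and_se(0.80, 0.50, 1.28), log_rr_and_se(0.95, 0.60, 1.50)]
    nrct_studies = [log_rr_and_se(0.55, 0.40, 0.76), log_rr_and_se(0.60, 0.45, 0.80)]
    rct = ([y for y, _ in rct_studies], [se for _, se in rct_studies])
    nrct = ([y for y, _ in nrct_studies], [se for _, se in nrct_studies])
    diff, z, p = compare_designs(rct, nrct)
    print(f"ratio of RRs = {math.exp(diff):.2f}, z = {z:.2f}, P = {p:.3f}")
```

A random-effects model would widen the standard errors when studies are heterogeneous; the fixed-effect version is shown only because it is the shortest correct instance of inverse-variance pooling.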

Results

In total, 162 studies were analyzed (136 nRCTs and 26 RCTs). Significant discrepancies between RCT- and nRCT-based results were revealed for 3 of 15 variables: overall complications (P < 0.021), wound infections (P < 0.014), and length of hospital stay (P < 0.005). In both RCTs and nRCTs, length of hospital stay and return to work were significantly reduced with LC compared with OC. Results were heterogeneous across studies more often among nRCTs (11 of 15 variables) than among RCTs (4 of 15).
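Between-study heterogeneity is commonly assessed with Cochran's Q and the I² statistic. The sketch below is an assumption about how such a check could be carried out, not the authors' documented procedure; it uses only the Python standard library, with a closed-form chi-square tail probability for integer degrees of freedom. The function names are hypothetical, and the P < 0.10 threshold in the example is a common convention rather than a value reported in this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h**(i - 0.5) / Gamma(i + 0.5)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def heterogeneity(estimates, std_errors):
    """Cochran's Q, its degrees of freedom and P value, and I² in percent."""
    if len(estimates) < 2:
        raise ValueError("heterogeneity needs at least two studies")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q: weighted squared deviations of each study from the fixed-effect estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I²: share of total variability attributed to between-study differences.
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0
    return q, df, chi2_sf(q, df), i2


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors for one outcome variable.
    q, df, p, i2 = heterogeneity([-0.60, -0.10, -0.45, 0.05], [0.15, 0.20, 0.18, 0.25])
    flag = "heterogeneous" if p < 0.10 else "not heterogeneous"
    print(f"Q = {q:.2f} (df = {df}), P = {p:.3f}, I2 = {i2:.0f}% -> {flag}")
```

Running the same function separately on the RCT and the nRCT estimates for each of the 15 variables would yield counts of heterogeneous variables per design, analogous to those reported above.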

Conclusion

The results of RCTs and nRCTs differ significantly in at least 20% of the variables. The external validity of RCTs and nRCTs of LC vs. OC appears to be similar. Between-study heterogeneity was larger in nRCTs than in RCTs of cholecystectomy.

Introduction

What is new?

  • The results of RCTs and nRCTs differ significantly in at least 20% of the variables, and more often as the number of participants in RCTs increases.

  • Between-study heterogeneity is more common in nRCTs than in RCTs.

  • Based on a comparison of cholecystectomy patients' baseline characteristics, the external validity of RCTs did not differ from that of nRCTs.

Randomized controlled trials (RCTs) are widely accepted to represent one of the highest levels of evidence in the hierarchy of research designs. Treatment comparisons based on RCTs are recognized as the most valid method to avoid selection and confounding biases in clinical research. If properly designed and well conducted, RCTs are likely to have high internal validity; that is, they measure what they are intended to measure. Thus, using RCTs, researchers are able to detect even small or moderate treatment effects [1]. In contrast, it has been claimed that the treatment effects of observational studies and nonrandomized controlled trials (nRCTs) might be overestimated [2], [3], [4] because of their limited internal validity (e.g., lack of baseline comparability of the groups) [5].

The predominance of RCTs has been criticized for several reasons. First, RCTs are thought to have insufficient external validity; that is, their results may apply poorly to the general population [6]. This is often attributed to highly selected study participants [7]: patients included in RCTs are, on average, younger or healthier than those included in nRCTs. Second, a notable drawback is the high cost of RCTs, which results from the numerous quality requirements that must be met when an RCT is performed [8]. Third, the incidence of rare side effects may be difficult to estimate reliably, as RCTs are often based on small sample sizes. Finally, the feasibility of RCTs of nonpharmacological interventions, such as surgery, has been questioned [9], [10] because of problems in standardizing surgical procedures, difficulties with blinding, or a lack of acceptance by patients and surgeons [11].
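The sample-size argument about rare side effects can be made concrete with two standard formulas: the probability of observing at least one event among n patients when the true incidence is p, 1 - (1 - p)^n, and the one-sided 95% upper confidence limit for the incidence when zero events are observed, obtained by solving (1 - p)^n = 0.05 and often approximated by the "rule of three", 3/n. The sketch below is illustrative only; the function names, the incidence, and the trial sizes are hypothetical and are not taken from the cholecystectomy literature.

```python
import math


def prob_at_least_one_event(incidence, n):
    """Probability that a study of n patients observes at least one event."""
    return 1.0 - (1.0 - incidence) ** n


def zero_event_upper_limit(n, alpha=0.05):
    """One-sided upper confidence limit for the incidence when 0 of n patients have an event.

    Solves (1 - p)**n = alpha; for alpha = 0.05 this is close to the rule of three, 3 / n.
    """
    return 1.0 - alpha ** (1.0 / n)


def patients_needed(incidence, target=0.95):
    """Smallest n for which P(at least one event) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - incidence))


if __name__ == "__main__":
    incidence = 0.005  # hypothetical rare complication rate (0.5%)
    for n in (50, 100, 500):
        print(f"n = {n}: P(>=1 event) = {prob_at_least_one_event(incidence, n):.2f}, "
              f"upper limit if 0 events = {zero_event_upper_limit(n):.3f}")
    print("patients needed:", patients_needed(incidence))
```

With small trial arms, a complication of this order can easily go unobserved, and the upper confidence limit after zero events remains far above the true incidence.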

Given these disadvantages of RCTs, conducting nRCTs may have some merit. Although nRCTs may be superior to RCTs in terms of external validity, RCTs are superior in terms of efficacy results. However, depending on their characteristics, some nRCTs may closely approximate the “true” efficacy result [12]. It has been suggested that, for specific medical topics, RCTs and nRCTs may sometimes yield very similar results [13], [14]. In addition, various studies have identified methodological aspects that increase the scientific value of nRCTs [15], [16], [17]. For example, a study using data from a general practice database accurately replicated the results of a large RCT assessing hormone replacement therapy in women at risk of coronary heart disease [18]. Although the reputation of nRCTs has improved, most health care agencies accept coverage of novel pharmacological interventions only when data from RCTs indicate a significant increase in clinical effectiveness.

In the area of laparoscopic cholecystectomy (LC), surgeons and researchers have not yet agreed on the optimal approach for evaluating a surgical procedure. Because the advantages of LC over open cholecystectomy (OC) were overwhelming for many years, RCTs were not performed [19]. Even high-quality journals accepted observational data from nRCTs for publication [20], because conducting RCTs on this topic was considered impossible. When RCTs, and even blinded trials, were eventually performed [21], [22], LC was found to be less advantageous than expected from previous nRCT data. However, bile duct injury, which may occur during LC, was primarily observed in case series and registry studies [23], [24]. This adverse event may go undetected when relying solely on RCT data. Thus, nRCTs might be of considerable value in the evaluation of surgical procedures [25].

In summary, there is still controversy about whether and under what circumstances the results of nRCTs agree with those of RCTs. Because RCTs are currently more widely accepted, the scientific value of nRCTs has not yet been sufficiently established. LC might serve as an ideal showcase, because a wide variety of studies on this procedure were published within a short period of time. Although some modifications of the LC technique have been developed, for example, mini-instruments or fewer trocars, none has gained widespread acceptance, so LC remains a highly standardized technique. The aims of this literature analysis were as follows: first, to compare the results of RCTs vs. nRCTs in terms of their internal validity (study results); second, to compare RCTs vs. nRCTs in terms of their external validity (baseline characteristics); and third, to assess which characteristics of nRCTs are associated with less reliable study results.


Literature search

A combined literature search of the Medline database (1993–2008) was performed to identify both RCTs and nRCTs. For this analysis, nRCTs are defined as quasi-experiments, natural experiments, or observational studies, which may be prospective or retrospective cohort studies or case–control studies [26]. Because most studies, even among the RCTs, did not define a primary outcome criterion, studies (including registries) that fulfilled the following inclusion criteria were selected for

Results of the literature search

The literature search yielded 1,567 potentially relevant articles (Fig. 1). Based on their titles and abstracts, 33 RCTs and 192 nRCTs were identified. After full assessment of these articles, 63 studies (seven RCTs and 56 nRCTs) were excluded because they did not fulfill the inclusion criteria listed earlier (Appendix [available on the journal's Web site at www.elsevier.com]). Thus, the total number of studies for this analysis amounted to 26 RCTs and 136 nRCTs, including 15 studies in which LC was

Discussion

Using data from 162 studies on a set of baseline and outcome variables, the results of RCTs and nRCTs of LC vs. OC were compared. The present analysis yields several major findings. First, the study designs do not differ in their baseline characteristics; thus, the external validity of RCTs does not appear to be lower than that of nRCTs. Second, in none of the examined variables were the results of the study designs found to be significant and in opposite

References (41)

  • A.D. Furlan et al. Methodological quality and homogeneity influenced agreement between randomized trials and nonrandomized studies of the same intervention for back pain. J Clin Epidemiol (2008).
  • R. Kunz. Randomized trials and observational studies: still mostly similar results, still crucial differences. J Clin Epidemiol (2008).
  • G.A. Colditz et al. How study design affects outcomes in comparisons of therapy. I: Medical. Stat Med (1989).
  • J.N. Miller et al. How study design affects outcomes in comparisons of therapy. II: Surgical. Stat Med (1989).
  • T.C. Chalmers et al. Bias in treatment assignment in controlled clinical trials. N Engl J Med (1983).
  • K. McPherson. The best and the enemy of the good: randomised controlled trials, uncertainty, and assessing the role of patient choice in medical decision making. J Epidemiol Community Health (1994).
  • R.C. Russell. Surgical research. Lancet (1996).
  • N. Black. Why we need observational studies to evaluate the effectiveness of health care. BMJ (1996).
  • R. McLeod. Randomized, controlled trials: is there a role for them in surgery? Ann Surg (2006).
  • R. Kunz et al. Randomisation to protect against selection bias in healthcare trials. Cochrane Database Syst Rev (2007).