Reported effects in randomized controlled trials were compared with those of nonrandomized trials in cholecystectomy

doi:10.1016/j.jclinepi.2009.12.009

Journal of Clinical Epidemiology

Volume 63, Issue 10, October 2010, Pages 1082-1090

https://doi.org/10.1016/j.jclinepi.2009.12.009

Abstract

Objectives

Because the external validity of randomized controlled trials (RCTs) may be insufficient, the conduct of nonrandomized controlled trials (nRCTs) is increasingly debated. RCTs and nRCTs were compared using the example of laparoscopic vs. open cholecystectomy (LC vs. OC).

Study Design and Setting

RCTs and nRCTs comparing LC and OC were identified by searching PubMed. To assess internal and external validity of the studies, patient characteristics, relative risks, and mean differences of RCTs and nRCTs were compared by meta-analytic techniques.
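The abstract does not state which meta-analytic model was used. As a minimal sketch, assuming fixed-effect inverse-variance pooling of log relative risks within each design and a z-test for the difference between the two pooled estimates (a common approach, not necessarily the authors'), the RCT vs. nRCT comparison for one outcome could look like this. The function names and all example numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple

def pool_inverse_variance(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    if len(effects) != len(ses) or not effects:
        raise ValueError("need one standard error per effect, and at least one effect")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    return pooled, math.sqrt(1.0 / total_w)

def compare_designs(rct: Tuple[float, float], nrct: Tuple[float, float]) -> Tuple[float, float]:
    """z-test for the difference between two independent pooled estimates.

    Inputs are (pooled estimate, SE) pairs on the log relative risk scale.
    Returns the ratio of relative risks (nRCT / RCT) and a two-sided P value.
    """
    diff = nrct[0] - rct[0]
    se_diff = math.sqrt(rct[1] ** 2 + nrct[1] ** 2)
    z = diff / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return math.exp(diff), p_two_sided

# Hypothetical per-study log relative risks and standard errors (not data from the paper).
rct = pool_inverse_variance([math.log(0.8), math.log(0.9)], [0.20, 0.25])
nrct = pool_inverse_variance([math.log(0.5), math.log(0.6), math.log(0.55)], [0.15, 0.20, 0.10])
ratio_of_rrs, p_value = compare_designs(rct, nrct)
print(f"ratio of RRs = {ratio_of_rrs:.2f}, P = {p_value:.3f}")
```

The same pooling would apply to mean differences (for example, length of hospital stay) by passing per-study mean differences and their standard errors directly; the difference between designs would then be reported as `diff` itself rather than exponentiated.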

Results

In total, 162 studies were analyzed (136 nRCTs and 26 RCTs). Significant discrepancies between RCT- and nRCT-based results were revealed for 3 of 15 variables: overall complications (P < 0.021), wound infections (P < 0.014), and length of hospital stay (P < 0.005). In RCTs and in nRCTs, length of hospital stay and return to work were significantly reduced when using LC compared with OC. The results of nRCTs were more often heterogeneous among themselves (11 of 15) as compared with RCTs (4 of 15).
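Between-study heterogeneity within one design is commonly assessed with Cochran's Q and the I² statistic. The sketch below assumes that approach (the abstract does not name the test) and computes the chi-square P value in pure Python through the closed-form recurrence of the regularized upper incomplete gamma function for integer degrees of freedom. `chi2_sf` and `cochran_q` are hypothetical helper names, and the example inputs are invented.

```python
import math
from typing import Sequence, Tuple

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Evaluates Q(df/2, x/2), the regularized upper incomplete gamma function,
    via its closed-form recurrence, so no SciPy dependency is needed.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        term = math.exp(-y)            # Q(1, y) = e^{-y}
        total = term
        for k in range(1, df // 2):
            term *= y / k              # adds y^k e^{-y} / k!
            total += term
        return total
    total = math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    term = 2.0 * math.sqrt(y / math.pi) * math.exp(-y)  # y^{1/2} e^{-y} / Gamma(3/2)
    for k in range((df - 1) // 2):
        total += term
        term *= y / (k + 1.5)          # next term: y^{k+3/2} e^{-y} / Gamma(k+5/2)
    return total

def cochran_q(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, int, float, float]:
    """Return Cochran's Q, its degrees of freedom, I^2, and the heterogeneity P value."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2, chi2_sf(q, df)

# Hypothetical log relative risks and standard errors for one outcome within one design.
q, df, i2, p = cochran_q([-0.7, -0.2, -0.9, 0.1], [0.20, 0.25, 0.30, 0.15])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0%}, P = {p:.3f}")
```

Counting, per design, how many of the 15 variables reach a chosen significance threshold for Q would give tallies of the kind reported above (11 of 15 for nRCTs, 4 of 15 for RCTs); the threshold the authors used is not stated in this excerpt.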

Conclusion

The results of RCTs and nRCTs differ significantly in at least 20% of the variables. External validities of RCTs and nRCTs in LC vs. OC appear to be similar. Between-study heterogeneity was larger in nRCTs than in RCTs of cholecystectomy.

Introduction

What is new?

• The results of RCTs and nRCTs differ significantly in at least 20% of the variables, but more often when the number of participants in RCTs increases.
• Between-study heterogeneity is more common in nRCTs than in RCTs.
• With respect to cholecystectomy patients' baseline characteristics, the external validity of RCTs did not differ from that of nRCTs.

Randomized controlled trials (RCTs) are widely accepted to represent one of the highest levels of evidence in the hierarchy of research designs. Treatment comparisons based on RCTs are recognized as the most valid method to avoid selection and confounding biases in clinical research. If properly designed and well conducted, RCTs are likely to have high internal validity, that is, they measure what they are intended to measure. Thus, using RCTs, researchers are able to detect even small or moderate treatment effects [1]. In contrast, it has been claimed that treatment effects in observational studies and nonrandomized controlled trials (nRCTs) might be overestimated [2], [3], [4] because of a lack of internal validity (e.g., lack of baseline comparability between groups) [5].

The predominance of RCTs has been criticized for several reasons. First, RCTs are assumed to have insufficient external validity, that is, the applicability of their results to the general population may be low [6]. This is often attributed to highly selected study participants [7]: patients included in RCTs are, on average, younger or healthier than those included in nRCTs. Second, a notable drawback is the high cost of RCTs, which results from the numerous quality requirements that must be met when an RCT is performed [8]. Third, reliable estimation of the incidence of rare side effects may be difficult, as RCTs are often based on small sample sizes. Finally, the feasibility of RCTs of nonpharmacological interventions, such as surgery, has also been questioned [9], [10] because of problems with the standardization of surgical procedures, blinding, or a lack of acceptance by patients and surgeons [11].

Given these disadvantages of RCTs, conducting nRCTs may have some merit. Although nRCTs may be superior to RCTs in terms of external validity, RCTs are superior for estimating efficacy. However, depending on the study characteristics, some nRCTs may closely approximate the “true” efficacy result [12]. It has been suggested that, for specific medical topics, RCTs and nRCTs may sometimes yield very similar results [13], [14]. In addition, various studies have identified methodological aspects that increase the scientific value of nRCTs [15], [16], [17]. For example, one study accurately replicated the results of a large RCT assessing hormone replacement therapy in women at risk of coronary heart disease using data from a general practice database [18]. Although the reputation of nRCTs has improved, most health care agencies accept coverage of novel pharmacological interventions only when data from RCTs indicate a significant increase in clinical effectiveness.

In the area of laparoscopic cholecystectomy (LC), surgeons and researchers have not yet agreed on the optimal approach for evaluating a surgical procedure. Because the advantages of LC over open cholecystectomy (OC) appeared overwhelming for many years, RCTs were not performed [19]. Even high-quality journals accepted observational data from nRCTs for publication [20], because it was considered impossible to conduct RCTs on this topic. When RCTs, and even blinded trials, were eventually performed [21], [22], LC was found to be less advantageous than expected from previous nRCT data. However, bile duct injury, a complication that may occur during LC, was primarily detected in case series and registry studies [23], [24]. This adverse event might never have been detected if researchers had relied solely on RCT data. Thus, nRCTs might be of considerable value in the evaluation of surgical procedures [25].
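The argument that rare adverse events such as bile duct injury can escape detection in small RCTs can be made concrete with binomial arithmetic. This is a sketch that assumes independent patients and a hypothetical constant incidence: it computes the chance that a cohort of n patients shows at least one event, the smallest n needed for a 95% chance of seeing one, and the "rule of three" approximate upper 95% bound when no events are observed. The incidence value is illustrative only, not an estimate from the cited studies, and the function names are hypothetical.

```python
import math

def prob_at_least_one_event(incidence: float, n: int) -> float:
    """P(at least one event) among n independent patients with a constant incidence."""
    return 1.0 - (1.0 - incidence) ** n

def n_for_detection(incidence: float, target: float = 0.95) -> int:
    """Smallest n giving at least `target` probability of observing one or more events."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - incidence))

def rule_of_three_upper(n: int) -> float:
    """Approximate upper 95% confidence bound on incidence when 0 events occur in n patients."""
    return 3.0 / n

hypothetical_incidence = 0.005  # illustrative value only, not taken from the cited studies
print(prob_at_least_one_event(hypothetical_incidence, 100))  # chance a 100-patient arm sees any event
print(n_for_detection(hypothetical_incidence))               # patients needed for a 95% chance
print(rule_of_three_upper(100))                              # bound after 0 events in 100 patients
```

Observing an event at all is a much weaker requirement than estimating its incidence precisely or comparing it between arms, so the sample sizes needed in practice would be larger still.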

In summary, there is still controversy about whether and under what circumstances the results of nRCTs agree with those of RCTs. Because RCTs are currently more widely accepted, the scientific value of nRCTs has not yet been sufficiently established. LC might serve as an ideal showcase, because a wide variety of studies on this procedure were published within a short period of time. Although some modifications of the LC technique have been developed, for example, mini-instruments or fewer trocars, none of these has gained widespread acceptance, so LC remains a highly standardized technique. The aims of this literature analysis were as follows: first, to compare RCTs vs. nRCTs in terms of their internal validity (study results); second, to compare RCTs vs. nRCTs in terms of their external validity (baseline characteristics); and third, to assess which characteristics of nRCTs are associated with less reliable study results.


Literature search

A combined literature search of the Medline database was performed to identify both RCTs and nRCTs (period 1993–2008). For this analysis, nRCTs are defined as quasi-experiments, natural experiments, or observational studies, which may be prospective or retrospective cohort studies or case–control studies [26]. Because most studies, even among the RCTs, did not define a primary outcome criterion, studies (including registries) that fulfilled the following inclusion criteria were selected for

Results of the literature search

The literature search yielded 1,567 potentially relevant articles (Fig. 1). Based on their titles and abstracts, 33 RCTs and 192 nRCTs were identified. On assessment of these articles, 63 studies were excluded because they did not fulfill the inclusion criteria listed earlier (seven RCTs and 56 nRCTs; Appendix [available on the journal's Web site at www.elsevier.com]). Thus, the total number of studies for this analysis amounted to 26 RCTs and 136 nRCTs, including 15 studies in which LC was
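The study flow reported above can be checked with simple arithmetic; the counts below are taken directly from the text, and only the variable names are introduced here.

```python
# Study counts as reported in the "Results of the literature search" paragraph.
identified = {"RCT": 33, "nRCT": 192}
excluded = {"RCT": 7, "nRCT": 56}

included = {design: identified[design] - excluded[design] for design in identified}
total_identified = sum(identified.values())
total_excluded = sum(excluded.values())
total_included = sum(included.values())

print(included)                                            # {'RCT': 26, 'nRCT': 136}
print(total_identified, total_excluded, total_included)    # 225 63 162
```

The per-design differences reproduce the 26 RCTs and 136 nRCTs analyzed, and their sum matches the 162 studies reported in the abstract.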

Discussion

Using data from 162 studies for a set of different baseline and outcome variables, the results of RCTs and nRCTs in LC vs. OC were compared. The present analysis contains several major findings. First, the study designs do not differ in their baseline characteristics, and thus, the external validity of RCTs does not appear to be affected compared with that of nRCTs. Second, in none of the examined variables were the results of the study designs found to be significant and in opposite
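The Discussion snippet is cut off, but it refers to whether the two designs yielded results that were significant and in opposite (presumably) directions. One way to operationalize such a check, assumed here rather than taken from the paper, is to classify each pair of pooled 95% confidence intervals by where they lie relative to the null value (0 for log relative risks or mean differences). The function names, categories, and example intervals are hypothetical.

```python
from typing import Tuple

def ci_direction(ci: Tuple[float, float], null: float = 0.0) -> int:
    """+1 if the whole CI lies above the null, -1 if below, 0 if it includes the null."""
    lower, upper = ci
    if lower > null:
        return 1
    if upper < null:
        return -1
    return 0

def classify_agreement(rct_ci: Tuple[float, float], nrct_ci: Tuple[float, float], null: float = 0.0) -> str:
    """Classify a pair of pooled 95% CIs (RCT vs. nRCT) for one outcome variable."""
    d_rct, d_nrct = ci_direction(rct_ci, null), ci_direction(nrct_ci, null)
    if d_rct == 0 and d_nrct == 0:
        return "neither significant"
    if d_rct == 0 or d_nrct == 0:
        return "significant in one design only"
    if d_rct == d_nrct:
        return "significant, same direction"
    return "significant, opposite directions"

# Hypothetical CIs for log relative risks (null = 0); not results from the paper.
print(classify_agreement((-0.50, -0.10), (-0.30, -0.05)))
print(classify_agreement((-0.20, 0.15), (-0.60, -0.20)))
```

For relative risks on the ratio scale, `null=1.0` would be passed instead.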

References (41)

D.A. Grimes et al. An overview of clinical research: the lay of the land. Lancet (2002).
G.H. Guyatt et al. Randomized trials versus observational studies in adolescent pregnancy prevention. J Clin Epidemiol (2000).
P.M. Rothwell. External validity of randomised controlled trials: “to whom do the results of this trial apply?” Lancet (2005).
D. Mant. Can randomised trials inform clinical decisions about individual patients? Lancet (1999).
L. Audige et al. Issues in the planning and conduct of non-randomised studies. Injury (2006).
R.L. Tannen et al. A simulation using data from a primary care practice database closely replicated the women's health initiative trial. J Clin Epidemiol (2007).
A.W. Majeed et al. Randomised, prospective, single-blind comparison of laparoscopic versus small-incision cholecystectomy. Lancet (1996).
S. Sauerland et al. Retrospective clinical studies in surgery: potentials and pitfalls. J Hand Surg [Br] (2002).
D. Moher et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet (1998).
A.J. McMahon et al. Laparoscopic versus minilaparotomy cholecystectomy: a randomised trial. Lancet (1994).
A.D. Furlan et al. Methodological quality and homogeneity influenced agreement between randomized trials and nonrandomized studies of the same intervention for back pain. J Clin Epidemiol (2008).
R. Kunz. Randomized trials and observational studies: still mostly similar results, still crucial differences. J Clin Epidemiol (2008).
G.A. Colditz et al. How study design affects outcomes in comparisons of therapy. I: Medical. Stat Med (1989).
J.N. Miller et al. How study design affects outcomes in comparisons of therapy. II: Surgical. Stat Med (1989).
T.C. Chalmers et al. Bias in treatment assignment in controlled clinical trials. N Engl J Med (1983).
K. McPherson. The best and the enemy of the good: randomised controlled trials, uncertainty, and assessing the role of patient choice in medical decision making. J Epidemiol Commun Health (1994).
R.C. Russell. Surgical research. Lancet (1996).
N. Black. Why we need observational studies to evaluate the effectiveness of health care. BMJ (1996).
R. McLeod. Randomized, controlled trials: is there a role for them in surgery? Ann Surg (2006).
R. Kunz et al. Randomisation to protect against selection bias in healthcare trials. Cochrane Database Syst Rev (2007).
