Raising Awareness for the Replication Crisis in Clinical Psychology by Focusing on Inconsistencies in Psychotherapy Research: How Much Can We Rely on Published Findings from Efficacy Trials?

Citation: Hengartner MP (2018) Raising Awareness for the Replication Crisis in Clinical Psychology by Focusing on Inconsistencies in Psychotherapy Research: How Much Can We Rely on Published Findings from Efficacy Trials? Front. Psychol. 9:256. doi: 10.3389/fpsyg.2018.00256

The replication crisis points to a fundamental problem in psychological research. Reported associations are systematically inflated and many published results do not replicate, suggesting that the scientific psychological literature is replete with false-positive findings (Pashler and Harris, 2012; Yong, 2012; Aarts et al., 2015). Unfortunately, the replication crisis remained almost unaddressed in clinical psychology until very recently. Leichsenring et al. (2017) and Tackett et al. (2017) are to be complimented on their comprehensive recommendations for clinical science replicability, as these two contributions were the first to address this important topic with respect to clinical psychology. Their arguments are persuasive and elaborate, but some controversial topics not detailed by these authors need to be addressed in order to provide a critical appraisal of our most heeded research findings. Therefore, in order to raise awareness for the replication crisis in clinical psychology, I will outline some specific issues underscoring that inconsistent and systematically biased research findings persistently compromise the yield of clinical research. To this end, I will elaborate on the efficacy of psychotherapy, which arguably is the most cited research topic within clinical psychology.

PUBLICATION AND REPORTING BIAS INFLATES EFFICACY
Concerning replicability in psychotherapy research, the main question to pose is: How much can we rely on the published evidence? To start with, it needs to be acknowledged that the average efficacy of psychotherapy based on the scientific literature is systematically overestimated due to publication bias (Cuijpers et al., 2010a; Driessen et al., 2015; Cristea et al., 2017a). In accordance with findings from psychopharmacological research (e.g., Turner et al., 2008), studies with unfavorable treatment outcomes are less likely to be published in the scientific literature. For instance, Driessen et al. (2015) found that 24% of all trials aimed at evaluating the efficacy of psychological treatments for major depression funded by the National Institutes of Health were not published, which led to a 25% reduction in the estimated efficacy of psychotherapy (g = 0.52 vs. g = 0.39 after consideration of unpublished trials). Similarly, focusing exclusively on the efficacy of cognitive-behavioral therapy (CBT) for adult major depression, Cuijpers et al. (2010a) reported a reduction of 37% in efficacy after adjustment for publication bias (d = 0.67 vs. d = 0.42 after imputation of unpublished trials). At the level of individual studies, some researchers use selective outcome reporting to illegitimately present findings in an opportunistic way. Outcome reporting bias is very prevalent in clinical science and refers to authors omitting or changing primary outcomes on the basis of the results in order to avoid undesired findings (Dwan et al., 2008). For instance, Kirkham et al. (2010) showed that adjusting for outcome reporting bias reduced the primary treatment effect by 20% or more in 23% of all meta-analyses of clinical trials reviewed. They further state that 19% of meta-analyses with an initially significant result became non-significant after adjustment for reporting bias.
To the best of my knowledge, reporting bias has not yet been systematically tested in psychotherapy research, but given its high prevalence in clinical science (Dwan et al., 2008) it is very likely that controlling for reporting bias would reduce the average efficacy of psychotherapy even further than correction for publication bias alone. Obtaining unbiased efficacy estimates for psychotherapy trials from the published literature is evidently a serious challenge.
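The percentage reductions quoted above for publication bias follow directly from the reported effect sizes. As a minimal sketch of that arithmetic (the function name `relative_reduction` is chosen here for illustration and does not come from the cited studies), the relative reduction is the difference between the published and the adjusted estimate, divided by the published estimate:

```python
def relative_reduction(published: float, adjusted: float) -> float:
    """Percentage by which the adjusted effect size falls below the published one."""
    if published == 0:
        raise ValueError("published effect size must be non-zero")
    return (published - adjusted) / published * 100.0


# Effect sizes as quoted in the text: (published estimate, bias-adjusted estimate).
estimates = {
    "Driessen et al. (2015), g": (0.52, 0.39),
    "Cuijpers et al. (2010a), d": (0.67, 0.42),
}

for label, (published, adjusted) in estimates.items():
    # Prints a reduction of about 25% and 37%, matching the figures in the text.
    print(f"{label}: {relative_reduction(published, adjusted):.0f}% reduction")
```

This only reproduces the percentages from the two summary estimates; it is not a reanalysis of the underlying trials.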

INCONSISTENT META-ANALYSES
The replication crisis in the clinical sciences also becomes evident when one scrutinizes the literature on the comparative efficacy of different psychotherapies. Allegiance bias means that outcome studies in psychotherapy research are biased toward the main authors' psychotherapeutic allegiance (Luborsky et al., 1999). In this regard it is important to specifically mention three recent meta-analyses that came to completely divergent conclusions on the relative efficacy of CBT vs. psychodynamic therapy. In their meta-analysis, Leichsenring and Rabung (2011), both devoted to psychoanalysis, concluded that long-term psychodynamic therapy is markedly superior to short-term modalities such as CBT. Conversely, Smit et al. (2012) found no evidence for the superiority of long-term psychoanalysis related to their primary outcome of recovery as well as to all of their secondary outcomes comprising target problems, general psychiatric symptoms, personality pathology, social functioning, overall effectiveness, and quality of life. Finally, a meta-analysis conducted by Tolin (2010) concluded that CBT was superior to (short-term) psychodynamic therapy for depression and anxiety disorders. Obviously, and in accordance with an alarming issue recently detailed by Ferguson and Heene (2012), changes in the study selection criteria and the analysis procedure allow for producing almost any desired meta-analytic outcome. Unfortunately, the scientific literature is replete with such examples. Thus, what shall we tell our patients: is long-term psychoanalysis empirically supported or would they fare better (or at least as well) with a short-term therapy such as CBT? However that may be, clinicians and researchers should be aware that the credibility of many meta-analyses is rather modest (Pereira and Ioannidis, 2011).

SYSTEMATIC BIASES ARE PERVASIVE
Another perennial hot topic in clinical psychology is the efficacy of pharmacological vs. psychological treatments. In meta-analyses of direct comparisons, Cuijpers et al. (2013) as well as Huhn et al. (2014) found no significant differences between treatment modalities for panic disorder, generalized anxiety disorder and social phobia. Conversely, focusing on pre-post effect sizes, Bandelow et al. (2015) estimated that pharmacotherapy was largely superior to psychotherapy for these major anxiety disorders (d = 2.02, 95%-CI = 1.90-2.15, for medications vs. d = 1.22, 95%-CI = 1.14-1.30, for psychotherapies, p < 0.001). According to the authors this finding cannot be explained by heterogeneity, publication bias or allegiance effects (Bandelow et al., 2015). So, again, a largely inconsistent finding impedes stringent clinical recommendations. Shall we recommend psychotropic drugs as first-line treatment for major anxiety disorders or is psychotherapy equally efficacious? And what are the reasons for such striking discrepancies between aggregated study results? Cristea et al. (2017b) provide a partial explanation. In their recent meta-analysis they showed that trials that were funded by the pharmaceutical industry report slightly better outcomes for pharmacotherapy relative to psychotherapy. Indeed, research sponsored by the pharmaceutical industry or conducted by authors with industry-related financial conflicts of interest is systematically biased toward the industry's vested interests (Bekelman et al., 2003; Lexchin et al., 2003; Lundh et al., 2012). Apparently, researchers can willingly produce results that match their (or their sponsors') expectations through questionable research practices (Simmons et al., 2011; Bakker et al., 2012). But financial interests and allegiance are only part of the story; reputation and promotion are equally powerful motives (Nosek et al., 2012).
Differences in study design are another explanation for inconsistencies between research findings. Khan et al. (2012) as well as Hróbjartsson et al. (2013) showed that unblinded trial assessors systematically overestimate the efficacy of the experimental intervention, and compared to pharmacotherapy trials, psychotherapy trials use significantly fewer blinded outcome assessors (Huhn et al., 2014). Given that participants in psychotherapy trials are not blinded, patients' treatment expectations and beliefs (see Chen et al., 2011) may further inflate the apparent efficacy of psychotherapeutic interventions. Finally, most psychotherapy trials use waitlist conditions as the comparator. However, waitlist designs not only produce larger efficacy estimates than trials with placebo or routine care comparators (Cuijpers et al., 2016), but may even impede or postpone spontaneous remission (Furukawa et al., 2014), which is referred to as a nocebo effect. The meta-analysis by Furukawa et al. (2014) is particularly revealing, as it showed that the response rate in CBT for depression did not appreciably differ from psychological placebo (OR = 1.7), but differed weakly from no-treatment conditions (OR = 2.4) and markedly from waitlist conditions (OR = 6.3). Likewise, comparing the effect of psychotherapy for major depression to pill placebo, Cuijpers et al. (2014b) found a small effect size of g = 0.25, which is much smaller than the large effect sizes commonly obtained relative to waitlist conditions.

ON TRIAL QUALITY AND EFFECTIVENESS
Study quality is an important determinant of treatment efficacy in clinical science, but unfortunately, most published psychotherapy trials rely on weak methods such as small sample sizes, inadequate concealment of allocation, no intent-to-treat analyses, and unblinded outcome assessors (e.g., Newby et al., 2015; Cristea et al., 2017a). The hypothesis that study quality affects efficacy estimates was stringently tested by Cuijpers et al. (2010b) with respect to psychotherapy for adult depression. Their results indeed revealed that high-quality studies are a small minority and that they yield remarkably lower mean effect size estimates than studies of lower quality (d = 0.22 vs. d = 0.74, p < 0.001). A meta-regression using a continuous measure of study quality ranging from 0 to 8 points showed that each additional point of study quality was associated with a change of −0.07 in the average effect size (95%-CI = −0.09 to −0.05, p < 0.001). The impact of low-quality study bias was very recently replicated by Cristea et al. (2017a) in a meta-analysis of the efficacy of psychotherapy for borderline personality disorder, suggesting that these findings are generalizable. Worthy of note, the estimates outlined above refer almost exclusively to efficacy under controlled laboratory conditions using selected, unrepresentative patient samples. Just as in pharmacological research (see Naci and Ioannidis, 2015), evidence of efficacy for psychological interventions under optimal laboratory conditions often does not replicate in real-world clinical settings (Westen et al., 2004). Due to selective samples and unrepresentative clinical settings, the effectiveness of many empirically supported psychological interventions is inadequate under naturalistic real-world conditions (Weisz et al., 1995; Hansen et al., 2002; Westen et al., 2004).
Furthermore, some psychological interventions with proven laboratory-based efficacy turned out to be largely ineffective (Hallfors and Cho, 2007) or even harmful (Lilienfeld, 2007) in real-world effectiveness trials. That is, efficacy estimates are not only inflated due to scientific and methodological biases, but they also translate poorly into measurable public health benefits. However, a crucial point to consider is: "What do psychotherapy trials actually measure?" Following the primacy of the biomedical model of mental disorder, clinical psychology has largely adopted the methods of pharmacology trials (Deacon, 2013). That is, symptom rating scales have become the primary outcome in most trials, but this is not necessarily the domain where psychotherapy has its most significant impact. Perhaps psychotherapy's major asset, in contrast to pharmacological treatments, is to improve social functioning (e.g., Fournier et al., 2015). Replicating effectiveness within such domains is perhaps even more challenging than replicating symptom-based efficacy.
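To make the size of the study-quality effect reported by Cuijpers et al. (2010b) earlier in this section more tangible, the sketch below multiplies the meta-regression slope of −0.07 per quality point by the 8-point range of the quality scale and prints the result next to the reported difference between high-quality and lower-quality studies. It assumes a strictly linear relationship across the whole scale, which is a simplification made here for illustration; the function name `implied_change` is likewise hypothetical.

```python
# Values quoted in the text (Cuijpers et al., 2010b).
SLOPE = -0.07               # change in effect size per additional quality point
SLOPE_CI = (-0.09, -0.05)   # 95% confidence interval of that slope
QUALITY_RANGE = 8           # the quality scale runs from 0 to 8 points
D_HIGH_QUALITY = 0.22       # mean effect size of high-quality studies
D_LOWER_QUALITY = 0.74      # mean effect size of lower-quality studies


def implied_change(slope: float, points: int) -> float:
    """Change in effect size implied by a linear slope over `points` quality points."""
    return round(slope * points, 2)


# Extrapolate the slope (and its confidence bounds) across the full scale.
full_range = implied_change(SLOPE, QUALITY_RANGE)
ci_bounds = tuple(implied_change(bound, QUALITY_RANGE) for bound in SLOPE_CI)

# Difference between the two group means reported in the text.
group_gap = round(D_HIGH_QUALITY - D_LOWER_QUALITY, 2)

print(f"Implied change over {QUALITY_RANGE} points: {full_range} "
      f"(from {ci_bounds[0]} to {ci_bounds[1]})")
print(f"Reported high- vs. lower-quality difference: {group_gap}")
```

Under that linearity assumption, the slope implies a drop of roughly half a standardized mean difference between the lowest and the highest possible quality score. The two figures are not directly comparable, because the group comparison does not contrast scores of 0 and 8, so this is an order-of-magnitude illustration rather than a reanalysis.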

SUMMARY AND CONCLUSIONS
As in other psychological specialties (see Bakker et al., 2012), effect sizes published in the clinical psychological literature are often heterogeneous and inflated due to various scientific biases including allegiance bias (Luborsky et al., 1999), publication bias (Driessen et al., 2015), unblinded outcome assessors (Khan et al., 2012), sponsorship bias (Cristea et al., 2017b), or small sample sizes (Cuijpers et al., 2010b). After adjustment for systematic biases, efficacy estimates for various psychotherapy modalities tend to be disappointingly small (Cuijpers et al., 2010b; Cristea et al., 2017a). Some evidence suggests that when efficacy is estimated based exclusively on unbiased high-quality trials, effects of psychotherapy could fall below the threshold for clinical relevance (Cuijpers et al., 2014a). Recently, some psychotherapy researchers have hence raised the controversial point that effects of both psychotherapy and pharmacotherapy for depression may entirely reflect a placebo effect (Cuijpers and Cristea, 2015). Of further concern is the gap between treatment efficacy in controlled laboratory trials and treatment effectiveness in naturalistic real-world settings (Westen et al., 2004; Hallfors and Cho, 2007). The literature reviewed in this commentary was restricted to the efficacy of clinical psychological interventions, as that topic is highly relevant for clinical psychology. Nevertheless, conflicting and irreproducible findings have been detected and discussed in various other hot topics within clinical psychology, including the debatable effect of menopause on the occurrence of depression (Rössler et al., 2016; Hengartner, 2017), the putative consequences of violent video games (Ferguson and Kilburn, 2010; Calvert et al., 2017), or inconsistent associations between psychopathology and stress physiology (Chida and Hamer, 2008; Rosmalen and Oldehinkel, 2011).
Even though the replication crisis has mostly been addressed within social psychology, I conclude that it is no less pernicious and prevalent in clinical psychology. Psychotherapy was a marvelous invention, but initial enthusiasm regarding its efficacy has now been clouded by scientific biases that systematically inflate estimates. Being aware of these issues may well improve our scientific and clinical endeavors.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and approved it for publication.