Effectiveness of cognitive behavioural therapy with people who have autistic spectrum disorders: A systematic review and meta-analysis

• CBT when used with people who have autism is associated with a small or medium effect size.
• Effect size varied according to type of outcome measure used.
• Outcomes measured using self-report are associated with a small and non-significant effect size.
• Outcomes measured using informant or clinician ratings are associated with either a small or medium effect size.
• There have been many small trials and large definitive trials are now needed.


Introduction
Autism spectrum disorders (ASDs) are a range of neurodevelopmental disorders characterised by difficulties with social communication and interaction across contexts, as well as restricted and repetitive patterns of behaviour, interests and activities. The phenotype incorporates a range of symptoms across multiple domains, including cognitive, behavioural, affective and sensory symptoms (Volkmar, Paul, Klin, & Cohen, 2005; Wiggins et al., 2015). Sleeping and eating difficulties, synaesthesia, affective dysregulation, and difficulties with initiation, planning and organisation are often present (Baron-Cohen, 2008; Wiggins et al., 2015). The prevalence amongst 4-year-olds has been estimated to be approximately 13.4 per 1000 (Christensen et al., 2016), while the adult prevalence has been estimated to be 9.8 per 1000 (Brugha et al., 2011).
There has been a marked increase in psychosocial interventions that aim to treat the symptoms or features of ASDs. In the United Kingdom, the National Institute for Health and Care Excellence (2012a) recommended that people with ASDs should be offered age-appropriate psychosocial interventions for comorbid mental health problems and the core symptoms of ASDs. A large number of interventions claim to treat symptoms of ASDs, even though the evidence base is poor (Matson, Adams, Williams, & Rieske, 2013). However, there is evidence to support the use of applied behaviour analysis in the treatment of symptoms of ASDs, and the authors of a Cochrane review concluded that early and intensive behavioural interventions can lead to improvements in adaptive and communicative behaviour, as well as social skills (Reichow, Barton, Boyd, & Hume, 2012). Nevertheless, there are few studies examining the effectiveness of these types of interventions with adults, as opposed to children, with ASDs (Wright, Brooks, D'Astous, & Grandin, 2013).
Alongside this, psychiatric comorbidity amongst people with ASDs is elevated (Green, Gilchrist, Burton, & Cox, 2000; Kim, Szatmari, Bryson, Streiner, & Wilson, 2000; Lugnegård, Hallerbäck, & Gillberg, 2011; Rescorla, 1986; Russell & Sofronoff, 2005), prompting many to consider how to adapt and deliver psychological therapies for children, adolescents and adults with ASDs. Several meta-analytic or narrative reviews of studies that recruited samples of children and adolescents have been completed in this area, examining the effectiveness of cognitive behavioural therapy (CBT) for anxiety disorders or social skills training (Ho, Stephenson, & Carter, 2014; Kreslins, Robertson, & Melville, 2015; Spain & Blainey, 2015; Sukhodolsky, Bloch, Panza, & Reichow, 2013; Ung, Selles, Small, & Storch, 2014). While all of the aforementioned reviews concluded that CBT and associated interventions for anxiety amongst children with ASDs appear to be promising, none considered CBT across the lifespan. Further, none of the previously completed meta-analyses have: (a) considered CBT, as opposed to applied behavioural analysis, when used as a treatment for the actual symptoms or features of ASDs, rather than the treatment of anxiety disorders, (b) included studies involving adult participants, and (c) included other affective disorders, such as depression, alongside anxiety disorders. In order to address these weaknesses, we completed a comprehensive meta-analysis and systematic review of the literature which aimed to investigate the effectiveness of cognitive behavioural therapy across the lifespan for either (a) affective disorders more broadly, while focusing on anxiety disorders as well, or (b) the symptoms and features associated with ASDs. A supplementary aim was to investigate whether there are differences in outcome for children, adolescents and adults.

Method
Relevant studies were identified by systematic searches of the following electronic databases: PsycINFO, MEDLINE, CINAHL Plus, Web of Science and Google Scholar. The Cochrane Library was searched to identify any existing systematic reviews. The key search terms and how they were combined are found in Table 1. Terms were searched using English and American terminology, spelling, and truncation to ensure that all variant word endings were identified. Alongside this, the ancestry method was used to identify any further papers that may have met eligibility criteria. The grey or fugitive literature was also searched in an attempt to minimise publication bias. An initial search was completed via http://www.opengrey.eu, which includes research reports, dissertations and conference papers. Dissertation Abstracts International and the Comprehensive Dissertation Index were also searched, as well as trial registers. The final search for studies was completed on 29 January 2016. The review was registered with PROSPERO, an international database of systematic reviews in health and social care, in order to provide transparency to the review process and to

Eligibility criteria and study selection
Initially, titles and abstracts were screened for eligibility, and studies were included if all of the following criteria were met: (a) participants had a diagnosis of Autism Spectrum Disorder (or autistic disorder, Asperger disorder, childhood disintegrative disorder or pervasive developmental disorder not otherwise specified prior to the publication of DSM-V), and diagnosis was made by a qualified clinician and/or using a standardised diagnostic assessment; (b) studies used a control or comparison group design, e.g. waiting list or treatment as usual (TAU), with or without randomisation; (c) a clinician-led CBT intervention, either individual or group-based, incorporating both cognitive and behavioural components was used. Interventions in which CBT theory and principles were utilised to teach or improve behavioural patterns, e.g. social skills, were included, provided that this was explicitly stated; (d) at least one validated and standardised outcome measure was used, assessing either core features of ASDs, i.e. difficulties in social interaction, impaired social communication or restricted or repetitive patterns of behaviour and interests, or co-occurring symptoms of mental disorder, e.g. anxiety or depression; and (e) the study was written in English.
Studies that aimed to treat affective disorders or symptoms of ASDs were analysed separately for two reasons: (a) the "target" of the intervention was separate in these studies, with one group focusing on trying to treat symptoms of affective disorders, while the other attempted to reduce difficulties or symptoms associated with having an ASD, and (b) CBT for either incorporated psychoeducation, skills teaching, skills practice, behavioural experiments, and cognitive restructuring. However, the description of the interventions across studies was at times sparse, and it was at times difficult to ascertain the degree to which cognitive restructuring was used within some of the interventions. As a consequence, it was clear that the intervention incorporated both cognitive and behavioural components for some studies, while for others this was less clear, although in all instances the interventions were described by the authors as using both cognitive and behavioural methods. It is important to bear in mind that CBT incorporates both cognitive and behavioural components, although for some disorders there is a clear focus on behavioural interventions (e.g. exposure and response prevention) when delivering CBT. We excluded any studies that made use of behavioural methods alone.
Studies were excluded if any of the following criteria were met: (a) the methodology used was a single case design, case series, qualitative study, meta-analysis or review article; (b) the design of the study was such that the effect of the CBT intervention could not be isolated from other treatment methods, e.g. psychotropic medication; (c) the primary intervention was applied behavioural analysis or behaviour modification, or behavioural activation as a stand-alone treatment; and (d) the dataset had been used within a previously included study, to avoid double counting of data (Senn, 2009). No limits were applied to the date of publication, age of participants or whether the study had been published in a peer-reviewed journal.
Studies that were non-randomised were not excluded. While this represents an inherent weakness by increasing the risk of bias, the decision was made to include non-randomised studies at this stage considering the likelihood that few definitive (Phase III) trials within this area have been completed.
Following the removal of duplicate studies, the systematic search of the electronic databases returned 2332 potentially eligible studies. Following an initial screen of the titles and abstracts, 2263 were excluded. In addition to the remaining 69 studies, a further 102 were identified using the ancestry method, and two were located from searching the grey literature. The resulting total number of papers retrieved was 173, six of which were protocols. The authors of protocols were contacted directly to try to source outcome data; two of these research groups provided data, while the remaining four did not respond and were excluded. A further 107 papers were excluded because they did not include a comparison or control group, five were excluded because they had made use of a pre-existing dataset that had been previously included, four were excluded because they did not include cognitive-behavioural components within the intervention, one was excluded due to a lack of validated or standardised outcome measures, one was excluded because the effects of CBT could not be isolated and one was excluded because we were unable to trace the paper.
The remaining 50 studies met the eligibility criteria, although two studies were excluded at this stage because the published data were insufficient and we could not calculate effect sizes; the authors did not respond to our request for further data (DeRosier, Swick, Davis, McMillen, & Matthews, 2011; Provencal, 2003). Forty-eight studies, involving 2099 participants (1081 CBT, 1018 control), were therefore included in the quantitative synthesis. Fig. 1 depicts a PRISMA flow diagram, outlining the identification, screening and inclusion or exclusion of articles throughout the process. Reasons for article rejection are clearly indicated. The eligibility criteria were applied by two authors (LW & PL) independently, and inter-rater reliability was excellent (96.5% agreement, k = 0.92, 95% CI [0.85, 0.98]).
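The study-selection counts reported above can be checked with simple arithmetic. The following is a minimal sketch that only transcribes the numbers given in the text; the variable names and exclusion labels are illustrative and do not come from the review itself.

```python
# Counts transcribed from the study-selection narrative above.
database_hits = 2332               # after removal of duplicates
excluded_on_title_abstract = 2263
from_ancestry_method = 102
from_grey_literature = 2

retrieved = (database_hits - excluded_on_title_abstract
             + from_ancestry_method + from_grey_literature)

# Exclusions applied to the retrieved papers (labels are illustrative).
exclusions = {
    "protocol, no outcome data supplied": 4,
    "no comparison or control group": 107,
    "re-used a previously included dataset": 5,
    "no cognitive-behavioural components": 4,
    "no validated or standardised outcome measure": 1,
    "effect of CBT could not be isolated": 1,
    "paper could not be traced": 1,
}

eligible = retrieved - sum(exclusions.values())
in_quantitative_synthesis = eligible - 2   # insufficient published data
participants = 1081 + 1018                 # CBT + control

print(retrieved, eligible, in_quantitative_synthesis, participants)
# 173 50 48 2099
```

Each total matches the figure reported in the text, so the narrative is internally consistent.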

Data analysis
The standardised mean difference (SMD) was calculated to estimate the difference between the treatment and control conditions. Cohen's d was transformed into Hedges' g (Hedges, 1981) using correction factor J to correct for possible positive bias due to small sample sizes. The magnitude of Hedges' g was interpreted using Cohen's convention as small (0.2), medium (0.5), and large (0.8). The variance and standard error of g for each study were calculated. As outcome measures may take the form of self-, clinician- or informant-reports, and there is evidence to suggest that people with ASD may have difficulties with judging their own social or communicative behaviour (Baron-Cohen, Jolliffe, Mortimore, & Robertson, 1997), effect sizes were calculated individually for each type of outcome measure where possible (i.e. outcome measures were grouped as either self-report, informant-report, clinician-report, or task-based, where participants were invited to complete a task, such as an emotion recognition task using faces). In this context, an informant-based outcome measure was a rating of clinical symptomatology provided by a third party who was not the clinician or the participant. Often, this person was a family member.
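The transformation from Cohen's d to Hedges' g can be sketched as follows. This is an illustration under stated assumptions rather than the exact computation performed in the review: it uses the common approximation J = 1 − 3/(4df − 1) and a standard large-sample variance formula, and the function name and the input values in the usage line are hypothetical.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g, standard error of g) for two independent groups."""
    df = n_t + n_c - 2
    # Pooled standard deviation across the treatment and control groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction factor J (approximation)
    g = j * d
    # Large-sample variance of d, then rescaled by J squared.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    var_g = j ** 2 * var_d
    return g, var_g, math.sqrt(var_g)

# Hypothetical example: 10 participants per arm, higher scores indicating better outcome.
g, var_g, se_g = hedges_g(10.0, 4.0, 10, 8.0, 4.0, 10)
print(round(g, 3), round(se_g, 3))
```

Because J is always below 1, g is slightly smaller in magnitude than d, and the shrinkage is most noticeable in the small trials that dominate this literature. The sign of the result depends on the scoring direction of the outcome measure, which has to be harmonised across studies before pooling.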
The analysis was undertaken using RevMan Version 5.3. A random effects model was used for the following reasons: (a) heterogeneity was anticipated as data came from a variety of sources and we could not assume a common effect size; and (b) inferences made from random effects models are unconditional and can be applied to a population of studies larger than the sample.
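A random effects model of this kind can be sketched as below. The text states only that a random effects model was fitted in RevMan; the sketch assumes the DerSimonian-Laird estimator of between-study variance, and the function name and the input values are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """Pool study effect sizes with inverse-variance weights and a
    DerSimonian-Laird estimate of between-study variance (tau squared)."""
    k = len(effects)
    w = [1 / v for v in variances]                       # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each study's variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": pooled / se,
        "q": q,
        "tau2": tau2,
    }

# Hypothetical inputs: two studies with very different effects and equal precision.
result = random_effects_pool([0.0, 1.0], [0.01, 0.01])
print(result["pooled"], result["tau2"])
```

When the studies disagree more than sampling error alone would predict, tau squared is positive, the weights become more equal across studies, and the confidence interval around the pooled estimate widens.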
Heterogeneity was thought to be associated with whether CBT was delivered as a group or individually, the age range of participants, and symptom severity. This was explored using the I² statistic, which describes the percentage of variation across studies due to heterogeneity, rather than chance (Higgins & Thompson, 2002). The I² statistic was chosen rather than Cochran's Q since it enables quantification of the effect of heterogeneity, providing a measure of the degree of inconsistency in results (Higgins & Thompson, 2002), and it does not inherently depend on the number of studies included in the meta-analysis (Higgins, Thompson, Deeks, & Altman, 2003). The degree and impact of heterogeneity was assessed using the categorisation of low (25%), medium (50%) and high (75%), in addition to a quality assessment of the methodology (Higgins et al., 2003). A sensitivity analysis was also undertaken. Outliers were removed and the weighted mean effect size was recalculated. Publication bias was assessed graphically using funnel plots, plotting summary effect size against standard error (Light & Pillemer, 1984); a skewed and asymmetrical plot may indicate a publication bias (Iyengar & Greenhouse, 2009). Fail-safe N (Rosenthal, 1991) was used to assess the impact of bias by calculating an estimate of the number of new studies averaging a null result that would be required to bring the overall treatment effect to nonsignificance. A figure exceeding 5n + 10 would indicate that the results could be considered robust to the effects of publication bias (Rosenthal, 1991).
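The two summary quantities described above can be sketched as follows. This is a minimal illustration, assuming the usual definition I² = 100% × (Q − df)/Q with df = k − 1, and Rosenthal's fail-safe N computed from per-study z scores with a one-tailed critical value of 1.645; the function names and inputs are hypothetical.

```python
def i_squared(q, k):
    """Percentage of variability attributable to heterogeneity, given
    Cochran's Q and the number of studies k (truncated at zero)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100

def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: number of unpublished null studies needed
    to bring the combined result to non-significance."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

def robust_to_publication_bias(z_scores):
    """Apply the 5n + 10 tolerance level, where n is the number of studies."""
    return fail_safe_n(z_scores) > 5 * len(z_scores) + 10

# Hypothetical values for illustration only.
print(i_squared(20.0, 11))
print(robust_to_publication_bias([2.5] * 10))
```

I² rescales Q by its degrees of freedom, which is why it is less dependent on the number of studies than Q itself.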

Quality appraisal
Quality appraisal of included studies was undertaken by two authors (LW & PL) independently using the National Institute for Health and Care Excellence Quality Appraisal Checklist for Quantitative Intervention Studies (National Institute for Health and Care Excellence, 2012b), bearing in mind that the use of such scales has been criticised in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance

Quality appraisal
The key characteristics of the 50 included studies are found in Appendix A, while the summary quality appraisal ratings for each study are found in Appendix B. A persistent problem across all studies was small sample size, contributing to reduced power. Freitag et al. (2015) included the highest number of participants (101 CBT, 108 control), whilst eight of the studies included in the quantitative synthesis involved fewer than ten participants per group. Several of these studies were defined by the authors as pilot or feasibility trials. However, a number of studies that were not called pilot or feasibility trials were in fact lower in quality and had smaller sample sizes than many clearly defined pilot or feasibility trials. Quality appraisal and risk of bias were therefore considered on a study-by-study basis, and sensitivity analysis was conducted by removing studies deemed to be at high risk of bias, rather than those labelled as pilot or feasibility trials.
Other common problems included the lack of reporting on participant engagement within intervention sessions, poor reporting on missing data, and minimal information on fidelity checks. Very few studies reported adequate allocation concealment and ten of the studies included in meta-analysis were non-randomised, contributing to a high risk of allocation bias. Due to the nature of the interventions involved, it is not possible for investigators to blind participants (and often informants) to intervention allocation. However, blinding of outcome assessors was possible but was not conducted in the majority of studies, contributing to detection bias.
A final common difficulty across studies was failure to specify a primary outcome measure. This complicated the meta-analysis, particularly in studies where a high number of outcome measures were utilised or different measures were used to assess a range of constructs, because we were left to make the decision as to which outcome measure to use within our meta-analysis. We made this decision based upon the predominant hypothesis or research question under investigation. For example, where a study aimed to investigate the effectiveness of an intervention for social skills, we chose the instrument that was used to measure social skills so that the study could be included in our meta-analysis. In some circumstances, researchers made use of more than one measure which was associated with the predominant hypothesis; in these instances, we chose the most commonly used measure across studies in an attempt to reduce heterogeneity. Where there were no commonalities across studies, the authors did not specify their primary outcome measure, and there were multiple measures used, we chose the primary outcome measure at random. The lack of measures validated for use with individuals with ASD was noted, although this is clearly a wider issue that needs attention.

Cognitive behavioural therapy for affective disorders
Twenty-four of the included studies aimed to examine the effectiveness of CBT for affective disorders, with the bulk attempting to treat anxiety disorders and others targeting depression or emotion regulation difficulties. Seventeen of these studies involved children and adolescents, whilst four included adult participants. Three studies included both adolescent and adult participants and were therefore assigned to a 'Mixed Age' subgroup for analysis (McGillivray & Evert, 2014; Pahnke, Lundgren, Hursti, & Hirvikoski, 2014; Russell et al., 2013). Fifteen of the 24 studies examined group-based CBT, whilst eight reported on individual CBT. The remaining study involved 21 group sessions, as well as three individual sessions (Langdon et al., 2016). Since this study was predominantly group-based, the decision was made to include it in the 'group-based' subgroup when analysing mode of CBT delivery. The majority of studies targeted anxiety (15 of the 24 studies). As this was such a large group, a subgroup analysis was conducted to assess potential variations of treatment effects across age groups within this subset of studies. This included studies investigating the treatment of anxiety disorders that had been included in earlier meta-analytic work (Sukhodolsky et al., 2013; Ung et al., 2014), but also included additional studies; two studies that targeted symptoms of obsessive compulsive disorder (Russell et al., 2013; Russell, Mataix-Cols, Anson, & Murphy, 2009) were also included within this subset, as was a study investigating depression, anxiety and rumination (Spek, van Ham, & Nyklíček, 2013) and a study investigating depression, anxiety and stress (McGillivray & Evert, 2014). In the latter two studies, only outcomes pertaining specifically to anxiety were used, to reduce heterogeneity within the quantitative synthesis as much as possible. In total, 19 studies were included within the anxiety subset.
Of the remaining five studies, one targeted anger (Sofronoff, Attwood, Hinton, & Levin, 2007), one targeted general emotional regulation skills (Scarpa & Reyes, 2011), one targeted insomnia (Cortesi, Giannotti, Sebastiani, Panunzi, & Valente, 2012), one targeted self-esteem, quality of life and sense of coherence (Hesselmark, Plenty, & Bejerot, 2014) and one targeted stress and emotional distress (Pahnke et al., 2014).
Fourteen studies were defined as randomised controlled trials, seven of which compared a CBT intervention with a waitlist control group, and three compared CBT to treatment as usual. Three randomised controlled trials compared CBT to a non-CBT group-based treatment: either a social recreational program (Hesselmark et al., 2014; Sung et al., 2011) or an anxiety management group (Russell et al., 2013). The final randomised controlled trial (Cortesi et al., 2012) compared a CBT group to a group which received a placebo drug. This study also included a condition in which participants received melatonin and a condition in which participants received both melatonin and CBT. Participants from these intervention arms were not included, as no other included study used a drug-based comparison group.
Three of the 24 studies investigating CBT for the treatment of affective disorders were quasi-experimental or non-randomised (Clarke, 2012;McGillivray & Evert, 2014;van Steensel, Dirksen, & Bögels, 2014), whilst seven were called pilot studies. Three of the seven pilot studies within this group were randomised, whilst four were not, and six compared a CBT intervention to a waitlist control group, whilst one compared CBT to treatment as usual.
As anticipated, there was extensive variation in the outcome measures used across studies. Many studies included outcome measures from various sources, with the most common report type being self-report within studies targeting co-occurring symptoms of affective disorder, followed closely by informant-report (usually parent) outcomes and clinician-rated outcomes. Only one study within this group used a task-based outcome measure (Cortesi et al., 2012). There was also considerable variation in the intensity and content of intervention. The number of sessions ranged from four to 50, whilst the length of each session ranged from 40 to 180 min. The majority of studies used a structured protocol (22 out of 24), and 21 of the studies utilised "traditional" CBT methods, with common components including role play, exposure and teaching/rehearsal of emotional regulation skills. Common adaptations to CBT included an increased emphasis on behavioural rather than cognitive components, the use of social stories and vignettes and increased involvement of family members. One of the studies (Hepburn, Blakeley-Smith, Wolff, & Reaven, 2015) piloted a videoconferencing CBT intervention designed for delivery in a small, multifamily group format, whilst another study (Spek et al., 2013) used a modified version of Mindfulness Based Therapy with cognitive elements omitted. Another used a modified Acceptance and Commitment Therapy protocol, and participants in the CBT group engaged in daily mindfulness exercises in addition to structured intervention sessions.

Cognitive behavioural therapy for ASD
There were 24 included studies that examined the effectiveness of CBT for symptoms or features of ASD. One study investigated both the effect of CBT on social skills and anxiety (White et al., 2013) and the outcomes pertaining to social skills were included in the meta-analysis. Another intervention study focused upon both social communication and anxiety, but the findings were reported in two separate papers (Fujii et al., 2013;Wood, Fujii, Renno, & Van Dyke, 2014); the decision was made to exclude Fujii et al. (2013) as inclusion would have led to the double counting of data. Provencal (2003) and DeRosier et al. (2011) were excluded as attempts to obtain data required to calculate effect sizes were unsuccessful.
The majority of studies targeted social skills (18 of the 24 studies included in quantitative synthesis), while of the remaining six studies, four targeted Theory of Mind (Begeer et al., 2011, 2015; Ozonoff & Miller, 1995; Solomon, Goodlin-Jones, & Anders, 2004), one targeted affectionate communication (Andrews, Attwood, & Sofronoff, 2013) and one targeted the perception of facial emotions (Baghdadli et al., 2013). A number of studies targeted both social skills and aspects of social cognition. In these circumstances, the primary outcome measure was included, but there was extensive variation in outcome measures across studies. In situations in which the primary outcome measure was not specified, only outcome measures pertaining to social skills were included to avoid comparisons of different constructs across report types. The most common type of outcome measure was informant-report, followed by self-report. In contrast to studies investigating the effectiveness of CBT for affective disorders, seven studies within this group utilised task-based measures, for example Theory of Mind tasks.
Fourteen of the studies were randomised controlled trials, one of which is the only Phase III trial in this area to date (Freitag et al., 2015). This study compared CBT to treatment as usual, whilst thirteen of the RCTs compared a CBT intervention with a waitlist control group. The final RCT (Soorya et al., 2015) compared CBT to a facilitated play active control group. Three of the remaining ten studies were quasi-experimental or non-randomised, and seven were labelled pilot studies. These studies were included in the initial analysis, but the quasi-experimental studies involved a variety of control groups: Ozonoff and Miller (1995) compared CBT to no treatment, Laugeson, Frankel, Gantman, Dillon, and Mogil (2012) used a waitlist control group and Laugeson, Ellingsen, Sanderson, Tucci, and Bates (2014) and Laugeson and Park (2014) reported the use of an active control group based on a non-CBT social skills curriculum ("Superskills", Coucouvanis, 2004). Three pilot studies used a waitlist control group, two compared CBT to treatment as usual and one compared CBT to "no intervention" (Koning, Magill-Evans, Volden, & Dick, 2013). The remaining study reported the use of an active control group with sessions consisting predominantly of leisure activities (Baghdadli et al., 2013). Six of the seven pilot studies within this group were randomised, whilst the remaining study was quasi-experimental (Turner-Brown, Perry, Dichter, Bodfish, & Penn, 2008).
There was considerable variation in the intensity and content of intervention. The number of sessions ranged from five (Andrews et al., 2013) to 70, with one study reporting on an intervention in which children received 30-minute sessions five days per week over a period of 14 weeks. The length of each session ranged from 30 min to whole-day sessions. The majority of studies investigating the effectiveness of CBT for core features of ASD used a structured protocol (22 out of 24). In terms of treatment content, studies within this group less commonly reported "traditional" CBT methods. Some studies did not directly refer to cognitive behavioural therapy per se, but they explicitly mentioned the inclusion of both cognitive and behavioural techniques in the intervention, and therefore met inclusion criteria for the current study. Content commonly included direct social skills teaching and role play, emotional identification work and problem-solving exercises or discussions. Common adaptations included increased use of social stories and vignettes, increased use of role play and the involvement of family members in intervention sessions and homework activities.

Self-report outcome measures
Seventeen studies, including 645 participants (329 CBT, 316 control), included self-reported outcome measures. One study utilised a relevant self-reported outcome measure but it was not possible to include this in the analysis as an attempt to obtain the data necessary to calculate the effect size was unsuccessful (Storch et al., 2013). The outcome measures used varied considerably across studies. A random-effects meta-analysis of these trials indicated a small to medium but non-significant effect favouring CBT over waiting-list, treatment as usual or active control as reported by participants, g = 0.24; 95% CI [−0.05, 0.53], z = 1.6, p = 0.11 (Fig. 2). The analysis revealed a significant amount of heterogeneity, with I² indicating that 69% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p < 0.001.
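The relationship between a pooled estimate, its confidence interval and the reported z and p values can be illustrated with the self-report figures above. This is a rough consistency check, assuming a symmetric normal-theory 95% interval; because the published values are rounded to two decimal places, the recomputed statistics can only be expected to agree approximately. The function name is hypothetical.

```python
import math

def z_and_p_from_ci(estimate, ci_lower, ci_upper, z_crit=1.96):
    """Recover the standard error from a symmetric 95% CI, then compute
    the z statistic and the two-sided p value under a normal model."""
    se = (ci_upper - ci_lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return se, z, p

# Pooled self-report effect as reported: g = 0.24, 95% CI [-0.05, 0.53].
se, z, p = z_and_p_from_ci(0.24, -0.05, 0.53)
print(round(se, 3), round(z, 2), round(p, 2))
```

The recomputed z is close to the reported value of 1.6, and the two-sided p value falls just above 0.10, in line with the non-significant result described in the text.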
As one study had an SMD (g = 2.64) considerably higher than those of the other included studies, for which g ranged from −0.39 to 0.85, a sensitivity analysis was conducted in which this outlier was removed (Chalfant, Rapee, & Carroll, 2007). Exclusion of this study resulted in no significant treatment effect, g = 0.10; 95% CI [−0.06, 0.27], z = 1.21, p = 0.23, and I² reduced markedly to 4%, p = 0.41, indicating the considerable impact that the inclusion of this study had on the pooled SMD. A further sensitivity analysis to remove studies deemed to be at a high risk of bias (Clarke, 2012; Hesselmark et al., 2014; McGillivray & Evert, 2014; Reaven et al., 2009; Russell et al., 2009)
Eleven studies were included within our sub-group analysis focusing on the treatment of anxiety disorders using self-report measures. A random effects meta-analysis of these trials revealed a non-significant small to medium effect size, g = 0.32; 95% CI [−0.10, 0.75], z = 1.50, p = 0.13. The analysis revealed a significant amount of heterogeneity, with I² indicating that 77% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p < 0.001. To complete our sensitivity analysis, Chalfant et al. (2007) was removed as it was judged to be an outlier, which reduced the effect size to g = 0.08; 95% CI [−0.12, 0.29], z = 0.79, p = 0.43, I² = 0%, p = 0.63. Removing two further studies judged to be at high risk of bias (Clarke, 2012; Reaven et al., 2009) reduced the effect size further, g = 0.01; 95% CI [−0.21, 0.24], z = 0.12, p = 0.90, I² = 0%, p = 0.78.

Informant-report outcome measures
Sixteen studies, including 620 participants (325 CBT, 295 control), made use of informant-reported outcome measures. One study utilised a relevant informant-reported outcome measure but was excluded because we did not obtain the data necessary to calculate the effect size (Pahnke et al., 2014). The outcome measures used varied considerably across studies. The meta-analysis of these trials indicated a significant medium effect favouring CBT over waiting-list, treatment as usual or active control as reported by informants, g = 0.66; 95% CI [0.29, 1.03], z = 3.49, p < 0.001 (Fig. 3). The analysis indicated a significant amount of heterogeneity, with I² indicating that 78% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p < 0.001.
Again, Chalfant et al. (2007) had an SMD (g = 4.27) considerably higher than those of the other included studies, for which g ranged from −0.39 to 1.21, and a sensitivity analysis was therefore conducted to remove this outlier. Exclusion of this study resulted in a lower treatment effect, g = 0.47; 95% CI [0.25, 0.69], z = 4.17, p < 0.001, although it remained statistically significant. I² reduced to 38%, p = 0.07, again indicating the impact that the inclusion of this study had on the pooled SMD. A further sensitivity analysis to remove studies deemed to be at a high risk of bias (Clarke, 2012; Hepburn et al., 2015; Reaven et al., 2009; Scarpa & Reyes, 2011) resulted in a very similar effect, g = 0.45; 95% CI [0.18, 0.72], z = 3.24, p = 0.001.

Clinician-rated outcome measures
Thirteen studies, including 514 participants (262 CBT, 252 control), made use of clinician-rated outcome measures, but there was substantial variation in the choice of measure. Two of these studies presented dichotomous data (Chalfant et al., 2007; van Steensel et al., 2014). In order to include these studies in a random-effects meta-analysis, the odds ratio was calculated and re-expressed as an SMD (Chinn, 2000). A random-effects meta-analysis using the Generic Inverse Variance method was conducted as estimates of effect were calculated for the two aforementioned studies. The analysis indicated a significant medium effect favouring CBT over waiting-list, treatment as usual or active control as rated by clinicians, g = 0.73; 95% CI [0.38, 1.08], z = 4.05, p < 0.001 (Fig. 4). The analysis again indicated a significant amount of heterogeneity, with I² indicating that 69% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p < 0.001.
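The re-expression of an odds ratio as an SMD can be sketched as follows, assuming the logistic-distribution conversion commonly attributed to Chinn (2000), in which the log odds ratio is multiplied by √3/π (equivalently, divided by roughly 1.81). The function name and the 2 × 2 counts are hypothetical, and no continuity correction for zero cells is applied.

```python
import math

def odds_ratio_to_smd(resp_t, nonresp_t, resp_c, nonresp_c):
    """Convert a 2 x 2 table (responders / non-responders by arm) into an
    SMD and its standard error via the log odds ratio."""
    log_or = math.log((resp_t * nonresp_c) / (nonresp_t * resp_c))
    var_log_or = 1 / resp_t + 1 / nonresp_t + 1 / resp_c + 1 / nonresp_c
    factor = math.sqrt(3) / math.pi        # approximately 1 / 1.81
    smd = log_or * factor
    se_smd = math.sqrt(var_log_or) * factor
    return smd, se_smd

# Hypothetical counts: 15 of 20 responders with CBT versus 5 of 20 in the control arm.
smd, se_smd = odds_ratio_to_smd(15, 5, 5, 15)
print(round(smd, 2), round(se_smd, 2))
```

The resulting SMD and standard error can then be entered alongside the continuous-outcome studies, which is why a generic inverse variance analysis is needed: it accepts pre-computed estimates rather than raw means and standard deviations.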

Task-based outcome measures
As only one study made use of this type of outcome measure, it was not possible to calculate the pooled SMD.

Self-report outcome measures
Nine studies (370 participants; 192 CBT, 178 control) investigated the effectiveness of CBT in treating symptoms associated with ASD and included appropriate self-reported outcome measures. As indicated in Fig. 5, a random-effects meta-analysis of these trials indicated a small, but non-significant effect favouring CBT over waiting-list, treatment as usual or active control, as reported by participants, g = 0.25; 95% CI [−0.03, 0.53], z = 1.77, p = 0.08. Heterogeneity was not significant, although I² indicated that 40% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p = 0.10. A sensitivity analysis to remove studies deemed to be at a high risk of bias (Gantman, Kapp, Orenski, & Laugeson, 2012; Laugeson et al., 2012; Turner-Brown et al., 2008) resulted in no significant treatment effect, g = 0.10; 95% CI [−0.24, 0.45], z = 0.58, p = 0.56.
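The relationship between a reported effect size, its 95% confidence interval, the z statistic and the p value can be checked with a short sketch. It assumes a symmetric normal-theory interval, so the standard error is the interval width divided by 2 × 1.96. The function name is illustrative, and small differences from the reported values are expected because the published figures are rounded to two decimal places.

```python
import math


def z_and_p_from_ci(g, lower, upper, crit=1.959964):
    """Recover the standard error, z statistic and two-sided p value
    from an effect size and its 95% confidence interval."""
    se = (upper - lower) / (2 * crit)
    z = g / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return se, z, p


# Self-report outcomes for symptoms of ASD, as reported in the text.
se_self, z_self, p_self = z_and_p_from_ci(0.25, -0.03, 0.53)

# Informant-report outcomes for affective disorders, as reported in the text.
se_inf, z_inf, p_inf = z_and_p_from_ci(0.66, 0.29, 1.03)

print(round(z_self, 2), round(p_self, 3))
print(round(z_inf, 2), round(p_inf, 4))
```

A confidence interval that includes zero, as in the self-report analysis, corresponds to a two-sided p value above 0.05.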

Informant-report outcome measures
Eighteen studies (950 participants; 480 CBT, 470 control) were included in this analysis, revealing a significant small effect favouring CBT over waiting-list, treatment as usual or active control as reported by informants, g = 0.48; 95% CI [0.30, 0.65], z = 5.39, p < 0.001. Heterogeneity was not significant, although I² indicated that 36% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p = 0.06. A sensitivity analysis to remove studies deemed to be at a high risk of bias (Ichikawa et al., 2013; Koning et al., 2013; Laugeson et al., 2012) resulted in a slightly larger medium treatment effect, g = 0.52; 95% CI [0.34, 0.70], z = 5.63, p < 0.001, with a small reduction in heterogeneity, I² = 33%, p = 0.12 (Fig. 6).

Clinician-rated outcome measures
Six studies, including 153 participants (79 CBT, 74 control), were included. One of these studies presented the outcome as dichotomous data, and therefore the odds ratio was calculated and expressed as a SMD (Koenig et al., 2010); the estimate of effect was calculated using the generic inverse variance method. The analysis indicated a significant medium effect favouring CBT over waiting-list, treatment as usual or active control as rated by clinicians, g = 0.65; 95% CI [0.10, 1.21], z = 2.30, p = 0.02 (Fig. 7). Heterogeneity was non-significant, although I² indicated that 47% of the variability in estimated treatment effect was due to heterogeneity rather than chance, p = 0.10.
The SMD for one study, g = 2.43 (Koenig et al., 2010), was considerably higher than those of the other included studies, which ranged from g = 0.08 to g = 1.51. Removing this outlier resulted in a lower treatment effect, g = 0.47; 95% CI [0.09, 0.85], z = 2.40, p = 0.02, although it remained statistically significant. I² reduced to 1%, p = 0.40, indicating the considerable impact that the inclusion of this study had on the pooled SMD. A further sensitivity analysis to remove studies deemed to be at a high risk of bias (Ichikawa et al., 2013; Turner-Brown et al., 2008; Wood et al., 2014) resulted in a very similar but lower and non-significant treatment effect, g = 0.44; 95% CI [−0.01, 0.89], z = 1.90, p = 0.06. It is highly likely that this is related to the fact that the exclusion of the above studies left only two studies in the analysis, and as such, this analysis should be interpreted with marked caution.

Task-based outcome measures
Seven studies, incorporating 237 participants (117 CBT, 120 control), were included in this analysis, which revealed a significant small effect in favour of CBT over waiting-list, treatment as usual or active control on task-based measures, g = 0.35; 95% CI [0.09, 0.61], z = 2.67, p = 0.008 (Fig. 8). Heterogeneity was not an issue, I² = 0%, p = 0.58. Removing studies deemed to be at a high risk of bias (Baghdadli et al., 2013; Koning et al., 2013; Ozonoff & Miller, 1995; Solomon et al., 2004) resulted in a very similar but non-significant effect size, g = 0.30; 95% CI [−0.12, 0.72], z = 1.42, p = 0.16. Again, it is highly likely that this is related to the fact that the exclusion of the above studies left only three studies in the analysis, which should therefore be interpreted with marked caution.

The effectiveness of CBT across differing age groups
Further subgroup analysis using self-report outcome measures was not completed because our initial analysis indicated that CBT was not superior to control conditions when used to treat either affective disorders or symptoms associated with autism. While there were 16 studies that made use of informant-report outcome measures when treating affective disorders, none of these included adult participants, and only one study looking at the treatment of symptoms related to autism included adult participants. As such, a subgroup analysis based on informant-report outcome measures was not completed.
Subgroup analysis using clinician-rated outcome measures across different age groups was possible, but only for studies that aimed to treat affective disorders. There was substantial variability that appeared due to genuine subgroup differences, rather than sampling error, I² = 80.2%, p = 0.006, and a large combined effect size in favour of CBT for studies involving children and adolescents, g = 0.95; 95% CI [0.55, 1.35], z = 4.64, p < 0.001, but not for studies involving adults, g = 0.04. Removing two studies (Chalfant et al., 2007; Wood et al., 2009) from the studies involving children and adolescents resulted in a lower but significant effect size, g = 0.67; 95% CI [0.42, 0.91], z = 5.28, p < 0.001. The comparison between studies involving children, adolescents and adults is inherently problematic and should be interpreted cautiously because only two studies involving adults were included (Fig. 9).

Publication bias
Visual inspection of funnel plots did not reveal significant asymmetry for self-reported outcome measures used within studies that aimed to treat affective disorders. Fail-safe N was not calculated because CBT was not superior to control conditions. A similar analysis could not be completed for studies that focused on symptoms related to autism because there were fewer than ten.
Turning to informant-based outcome measures, used for both studies that focused on affective disorders and symptoms associated with autism, no significant asymmetry was found. For studies involving affective disorders, 281 new studies averaging a null result would be required to bring the overall treatment effect to non-significance. For studies targeting symptoms related to autism, 287 new studies averaging a null result would be needed to again bring the overall treatment effect to non-significance. These figures exceed 5n + 10, and the conclusion that these findings are robust to publication bias is valid.
Considering clinician-rated outcome measures, there was no significant asymmetry for studies that treated affective disorders, while a funnel plot was not created for studies that treated symptoms of autism because there were fewer than ten. For studies that treated affective disorders, fail-safe N revealed that 227 new studies averaging a null result would be needed to bring the treatment effect calculated using clinician-rated outcome measures to non-significance. The effect calculated using clinician-rated outcome measures taken from studies treating symptoms associated with autism would become non-significant if only 18 papers averaging a null effect were published, suggesting that this finding may be subject to publication bias and is influenced by the small number of papers in this area.
It was not possible to examine task-based outcome measures for studies that treated mental disorder. For studies that focused on symptoms related to autism, a funnel plot could not be created because there were fewer than ten papers. However, fail-safe N revealed that only 5 new studies averaging a null effect size would bring the overall treatment effect to non-significance. This means that publication bias may feature, and the conclusions are heavily influenced by there being relatively few papers.
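The 5n + 10 criterion applied above can be set out as a brief sketch. The fail-safe N values are those reported in the text. The number of studies for each analysis is an assumption, taken from the study counts given for the corresponding primary analyses in the Results (16, 18, 13, 6 and 7), and the function names are illustrative.

```python
def tolerance_level(k):
    """Tolerance level for fail-safe N: 5k + 10, where k is the number of studies."""
    return 5 * k + 10


def is_robust(k, fail_safe_n):
    """A finding is treated as robust when fail-safe N exceeds the tolerance level."""
    return fail_safe_n > tolerance_level(k)


# (analysis, assumed number of studies, fail-safe N reported in the text)
analyses = [
    ("informant-report, affective disorders", 16, 281),
    ("informant-report, symptoms of autism", 18, 287),
    ("clinician-rated, affective disorders", 13, 227),
    ("clinician-rated, symptoms of autism", 6, 18),
    ("task-based, symptoms of autism", 7, 5),
]

for label, k, n_fs in analyses:
    verdict = "robust" if is_robust(k, n_fs) else "possibly affected by publication bias"
    print(f"{label}: fail-safe N = {n_fs}, tolerance = {tolerance_level(k)}, {verdict}")
```

Under these assumed study counts, the informant-report analyses and the clinician-rated analysis for affective disorders exceed the tolerance level, while the clinician-rated and task-based analyses for symptoms of autism do not, in line with the interpretation given above.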

Discussion
The results of the meta-analysis indicated that cognitive behavioural therapy (CBT) is associated with a small to medium effect size when used to treat co-morbid affective disorders with children, adolescents, or adults who have ASDs, but this varied according to whether the outcome data were taken from self-report, informant-report, clinician-report, or task-based measures. CBT was associated with a small and non-significant effect size, g = 0.24, when the analysis was completed using self-report measures, and there was significant heterogeneity; when studies at risk of bias were excluded, resulting in low heterogeneity, treatment was associated with a small non-significant effect size, g = 0.09. CBT was superior to control conditions when the analysis was completed with either informant- or clinician-report measures, both being associated with a medium effect size, but there was significant heterogeneity; sensitivity analyses reduced heterogeneity, and revealed that CBT remained superior, and was associated with medium effect sizes of g = 0.45 and g = 0.59, respectively.
Turning to consider CBT for symptoms associated with ASDs, the findings from the meta-analysis were very similar to those found for CBT when used to treat co-morbid affective disorders. CBT, when used as a treatment for the symptoms of ASDs, rather than affective disorders, was associated with an effect size that ranged from small to medium, again dependent upon the type of outcome measure used. Using data from self-report measures, CBT was associated with a small non-significant effect size, g = 0.25, and while heterogeneity was not significant, excluding studies at risk of bias to reduce heterogeneity reduced the effect size; it remained small and non-significant, g = 0.10. There was evidence that CBT was significantly beneficial when the analysis was based on informant-report measures, and resulted in a small effect size, g = 0.48, which increased to medium following our sensitivity analysis to account for heterogeneity, g = 0.52. Considering clinician-report measures, CBT was found to be significantly superior, and associated with a medium effect size, g = 0.65. Following the exclusion of studies thought to be at risk of bias to reduce heterogeneity, CBT was no longer superior, and was associated with a non-significant medium effect size, g = 0.44. Task-based measures, which are both less subjective and completed by the participant, were also evaluated to determine whether CBT is an effective treatment for symptoms of ASDs. The initial findings were significantly in favour of CBT as an effective treatment, and associated with a small effect size, g = 0.35, but the exclusion of studies thought to be at higher risk of bias led to a non-significant treatment effect, falling in the small range, g = 0.30.
Subgroup analysis based on the age of the participants was not completed for self-report measures as there was no evidence that CBT was superior to control conditions, nor was this possible for informant-based measures, as few studies involving adults also included an informant-based measure. It was only possible to undertake a subgroup analysis for the treatment of affective disorders based on clinician-report measures, and the findings indicated that CBT was superior and associated with a large effect size, g = 0.95, when used with children and adolescents, while following our sensitivity analysis, this reduced to a medium effect size, g = 0.67. These effect sizes are lower than those previously reported by Sukhodolsky et al. (2013) and Kreslins et al. (2015), with both previous meta-analyses having included fewer studies. Turning to consider adults, the results indicated that CBT was not superior to control conditions, and was associated with a small effect size, g = 0.04; interpreting this result is problematic because it is based on only two published studies.
Within the current meta-analysis, and those completed previously which focused on the treatment of anxiety amongst children and adolescents (Kreslins et al., 2015; Sukhodolsky et al., 2013; Ung et al., 2014), there are substantial differences in treatment efficacy dependent upon the type of outcome measure included within the analysis. Self-report measures, in contrast to informant- and clinician-report measures, are not reliably associated with significant change following treatment. Within the current meta-analysis, this was the case for studies involving children, adolescents or adults who received treatment for affective disorders more broadly. This was also the case for studies where CBT was used to treat the symptoms of ASDs. As discussed previously by both Sukhodolsky et al. (2013) and Kreslins et al. (2015), it may be the case that individuals with ASDs have difficulties with reliably reporting symptoms because of the associated developmental challenges (e.g. communication problems) faced by this population. Interestingly, Kreslins et al. (2015) suggested that children with ASDs may confuse symptoms of anxiety and ASDs, which may lead to difficulties with completing self-report measures of anxiety. However, it is apparent that adults with ASDs also have these difficulties: while there are few trials involving adults, those that have been completed had similar difficulties with the use of self-report measures. Alongside this, trials of CBT used to treat symptoms of ASDs, rather than affective disorders, have also encountered similar difficulties with self-report measures. It is probable that individuals with ASDs find self-report measures difficult because of their associated developmental problems (e.g. perspective-taking, communication problems), and further work regarding the development of valid and reliable measures for use with this population is needed.
However, it must also be mentioned that CBT may not bring about change for individuals with ASD, and that the results using both informant- and clinician-report measures may have been subject to an observer-expectancy effect, considering that it is very difficult to mask informants, and not all studies made use of masked assessors, introducing significant bias. While this may not explain all the variability within the data, it has a role to play, and as such, it is vitally important that future trials make use of masked assessors and have satisfactory arrangements for independent data management.
Related to these difficulties, there were a variety of issues associated with the included studies, highlighted by the quality appraisal, which need to be considered further. First, the majority of the studies included involved small samples, and trials labelled as feasibility or pilot trials often had larger sample sizes than studies that were not identified as either a feasibility or pilot trial. Eight of the studies included in this meta-analysis had fewer than ten participants per group. This is problematic, as there are no large-scale definitive trials in this area making use of robust methodologies. As such, the conclusions reached within this meta-analysis, and previous meta-analyses, are potentially limited. This does not mean that the conclusions are entirely invalid, but it does allow some questions to be raised about validity, which could be addressed in the future with the completion of several large-scale definitive trials by different research groups around the world. Related to these issues, the study by Chalfant et al. (2007) tended to have a relatively higher standardised mean difference. While this was a randomised trial, the assessors were not masked, and in fact were the actual therapists who carried out the intervention. Considering the lack of blinding and independent data management within this study, there is an inherent increased risk of bias. Several other studies included within this meta-analysis also had a relatively higher standardised mean difference (e.g. Wood et al., 2009), and the majority of them did not make use of independent data management and analysis, something we would strongly recommend for future trials in this area.
Second, studies often did not report sufficient information regarding participant engagement and fidelity. Third, there were issues with adequate allocation concealment that must be addressed within future studies. Fourth, it is important to note that ten studies were not randomised, and few reported that data were managed and analysed independently. Fifth, and again looking forward to the future, researchers in this area need to specify a primary outcome measure within their trials, and further work to develop valid and reliable measures of outcome for use with participants who have ASDs is needed. Sixth, it would be advantageous for researchers to describe their interventions more thoroughly or ensure that they are available for scrutiny, perhaps within public databases. Finally, it is recommended that future trials make use of and adhere to the CONSORT recommendations for reporting randomised controlled trials to help increase the quality of the evidence that is available.
There are a number of strengths associated with the current meta-analysis. We attempted to include studies that aimed to treat affective disorders more broadly, rather than just anxiety, and included studies that were designed to evaluate CBT as a treatment for the actual symptoms or core features of ASDs. As such, our work is comprehensive, capturing studies that have attempted to make use of CBT with individuals with ASDs for a variety of problems, and this is a marked strength over and above previously completed meta-analytic work. Alongside this, we have included studies with samples of children, adolescents, and adults, or mixed samples, while at the same time undertaking a subgroup analysis to compare differences between children/adolescents and adults, considering the developmental differences between these populations which may have an impact upon the process of engaging in and completing therapy. We have also made use of an appropriate analytic strategy, and made use of independent reviewers for both screening and the quality appraisal. As such, the current meta-analysis is the most comprehensive to date, covering CBT used to treat either affective disorders or symptoms of autism.
Turning to consider weaknesses, there are a variety of problems with many of the included studies which have been mentioned in the preceding paragraph, and these problems need to be considered when interpreting the results of this meta-analysis. While this does not necessarily invalidate our conclusions, it must be considered when interpreting the findings and considering future research.
We would suggest that future studies in this area adhere to the following recommendations: (a) small-scale studies should be clearly described as feasibility or pilot trials; (b) methods and interventions should be described fully, in line with CONSORT recommendations, as standardised reporting and a more uniform approach to study design would help to minimise heterogeneity across studies; (c) appropriate allocation concealment, randomisation, blinding procedures and independent data management should be considered a priority and should be described fully; (d) where possible, consistent usage of pre-existing outcome measures across studies would be beneficial in order to increase comparability across trials; (e) researchers should specify a primary outcome measure a priori; and (f) participant engagement and fidelity should be clearly reported. Looking forward to the future, considering the marked number of small trials, well-designed definitive trials from different research groups around the world are needed in order to demonstrate that CBT is an empirically validated treatment for use with people who have ASDs. To date, there has only been a single definitive trial within this area (Freitag et al., 2015).
Bearing the aforementioned recommendations for future studies in mind, and considering the conclusions from both the current and previous meta-analyses, CBT is at least associated with a small non-significant effect size, and at best, associated with a medium effect size, depending on whether you ask those receiving the treatment, those supporting the treatment, or those delivering the treatment. There are three further comments we would like to add to help in the design of future studies, including the interventions. First, there have been a variety of modelling and pilot studies across different countries, but very few researchers have developed interventions within the spirit of co-production with people with autism and their families. Co-production means working together with those who will receive the intervention when developing and running a clinical trial to ensure that those who are likely to receive the intervention have also genuinely helped design the intervention. While some studies employed this, if used more commonly, such a strategy would lead to improved engagement and outcomes, especially from the point of view of children and adults with autism.
Second, many of the reviewed studies focused on delivering group-based interventions for a variety of different problems. While delivering interventions in a group may be more cost effective, this may not be associated with greater effectiveness. The reason for this is that co-morbidity is high amongst people with autism, and within a group there may be participants who have obsessive-compulsive disorder, social phobia, generalised anxiety disorder, depression, or many other psychiatric problems, in addition to the difficulties associated with autism itself. While there are marked similarities, cognitive behavioural therapy for depression is different from cognitive behavioural therapy for obsessive-compulsive disorder, and delivering interventions within a group may have prevented therapists from being able to tailor the intervention to address the needs of each individual within the group adequately. Related to this, there are some individuals with ASDs who may be unable or unwilling to access group-based interventions. As such, we recommend that researchers begin to focus more heavily on formulation-driven and trans-diagnostic interventions delivered with individuals, rather than within a group, bearing in mind that there is evidence that individually delivered CBT is associated with stronger effect sizes than group-based CBT for people with intellectual disabilities, another group which tends to have marked co-morbidity (Vereenooghe & Langdon, 2013).
Finally, little to no attention has been paid to therapist competence within this area, including therapist style, integrity, alliance and experience, all of which have been linked to outcomes in a variety of studies involving people without ASDs (Brown et al., 2013; Haug et al., 2016; Muse & McManus, 2013; Podell et al., 2013). Further research is needed into these factors within studies involving people with ASDs in order to potentially help improve outcomes. Related to this, little attention has been paid to the accreditation of cognitive behavioural therapists within the literature. While behavioural therapists are certified through the Behaviour Analyst Certification Board®, those offering cognitive behavioural therapy are not certified in a similar manner in many jurisdictions. In some countries, such as the United Kingdom, there are organisations which accredit cognitive behaviour therapists, namely the British Association for Behavioural and Cognitive Psychotherapies (BABCP), but this does not mean that therapists have the clinical expertise and experience of working with people who have ASDs that is needed to adapt therapy in a way that is likely to be efficacious. Further, while CBT should be adapted to meet the needs of those with ASDs, we still know relatively little about the effectiveness of many of these adaptations, as they have not been investigated using experimental designs to determine whether they lead to substantial improvements in treatment engagement and outcome. While future definitive trials are certainly needed within this area, we also need more experimental work examining the effectiveness of various adaptations to CBT for use with people who have ASDs.