Internet-Delivered Psychological Treatments for Mood and Anxiety Disorders: A Systematic Review of Their Efficacy, Safety, and Cost-Effectiveness

Background: Greater access to evidence-based psychological treatments is needed. This review aimed to evaluate whether internet-delivered psychological treatments for mood and anxiety disorders are efficacious, noninferior to established treatments, safe, and cost-effective for children, adolescents and adults.
Methods: We searched the literature for studies published until March 2013. Randomized controlled trials (RCTs) were considered for the assessment of short-term efficacy and safety and were pooled in meta-analyses. Other designs were also considered for long-term effect and cost-effectiveness. Comparisons against established treatments were evaluated for noninferiority. Two reviewers independently assessed the relevant studies for risk of bias. The quality of the evidence was graded using an international grading system.
Results: A total of 52 relevant RCTs were identified whereof 12 were excluded due to high risk of bias. Five cost-effectiveness studies were identified and three were excluded due to high risk of bias. The included trials mainly evaluated internet-delivered cognitive behavioral therapy (I-CBT) against a waiting list in adult volunteers and 88% were conducted in Sweden or Australia. One trial involved children. For adults, the quality of evidence was graded as moderate for the short-term efficacy of I-CBT vs. waiting list for mild/moderate depression (d = 0.83; 95% CI 0.59, 1.07) and social phobia (d = 0.85; 95% CI 0.66, 1.05), and moderate for no efficacy of internet-delivered attention bias modification vs. sham treatment for social phobia (d = −0.04; 95% CI −0.24, 0.35). The quality of evidence was graded as low/very low for other disorders, interventions, children/adolescents, noninferiority, adverse events, and cost-effectiveness.
Conclusions: I-CBT is a viable treatment option for adults with depression and some anxiety disorders who request this treatment modality. Important questions remain before broad implementation can be supported. Future research would benefit from prioritizing adapting treatments to children/adolescents and using noninferiority designs with established forms of treatment.


Introduction
A pressing challenge for mental health services is meeting the demand for the treatment of depression and anxiety disorders. Nearly 40% of the population is estimated to be in need of treatment at some time during their life for anxiety or depression [1]. Each year 14-18% of the population across the age span suffer an anxiety disorder and 7-9% suffer from depression in the United States as well as in Europe [1,2]. Thus, meeting the needs of people suffering anxiety and depression with the current delivery methods is a gargantuan task [3][4][5].
Only one third of depressed patients respond fully to pharmacotherapy [6] and patients prefer psychological to pharmacologic treatment for depression and anxiety at a 3:1 rate [7]. Fortunately, cognitive behavioral treatments are helpful for anxiety and depression for adults [8][9][10] and for children and adolescents [11]. Other psychological therapies such as interpersonal and psychodynamic therapies have also been reported to produce significant improvements [10,12,13]. However, limited access to qualified therapists restricts the utility of psychological treatments. In fact, of those with a serious problem as many as 50% in developed and 85% in undeveloped countries will simply go untreated [14]. Of those who do receive treatment, rates of quality care are moderate to low for anxiety disorders [15].
The internet has offered a new avenue for providing psychological treatments, but the effectiveness of these treatments is still an issue. Most reviews to date have found support for the use of internet-delivered cognitive-behavioral therapy (I-CBT) [16][17][18]. For example, a meta-analytical review found that I-CBT was helpful for four distinct disorders [19]. Similarly, Hedman et al. [20] reviewed randomized controlled trials (RCTs) of I-CBT and reported large effects for depression, social phobia and panic disorder. While ambitious, extant reviews nevertheless fail to address some key issues.
First, the quality of the evidence needs to be carefully considered. In previous reviews, when used at all, quality assessments were restricted to a few indices of the internal validity of the individual studies. A proper assessment of risk of bias is essential to avoid the risk of drawing false conclusions, however, and it can be justified to exclude studies of higher risk of bias from the synthesis [21]. A recent example is a Cochrane review that found moderate clinical effect of exercise on depression when including all relevant trials regardless of risk of bias [22]. However, when restricting the analysis to the trials with low risk of bias, the estimate indicated only a small effect of exercise that did not reach statistical significance.
Furthermore, investigators who conduct systematic reviews and meta-analyses are increasingly aware that not only individual studies but also the body of evidence needs to be systematically evaluated, because the confidence in the pooled effect estimates may be compromised not only by risk of bias in individual studies but also by several other factors (e.g., imprecision, inconsistency, indirectness, and publication bias) [23]. The issue of quality assessment is compounded further if reviews are conducted by the trial authors themselves [20,24]. For example, the Cochrane Collaboration requires an independent assessment of eligibility and risk of bias by a second author not involved in the study/studies due to potential conflicts of interest [25]. Also, as experts in the content area under review, trial authors may have pre-formed opinions that can influence their assessments [26]. Given that the extant reviews were conducted by the trial authors themselves, the field would gain additional credibility from an independent evaluation.
Second, the issue of noninferiority has been largely ignored in previous reviews, but is necessary when comparing an existing evidence-based treatment (e.g., CBT) with a new one (e.g., I-CBT). In contrast to investigations of psychological therapy that involve new methods in areas where there is no known evidence-based treatment, the internet programs wisely employ known treatment techniques; only the manner of treatment delivery is altered. A greater reach and eventual cost savings could make internet therapies viable alternatives in healthcare. A critical issue, then, is whether they are noninferior to existing treatment. Noninferiority trials have gained increased attention to help in clinical decision making as the list of possible treatments grows, since a new treatment should be at least not inferior to existing evidence-based ones [27]. The methodology for noninferiority trials differs from that for superiority trials [27] and there is a need to review the literature from this perspective. Previous reviews on internet-delivered treatments generally conclude that these treatments have effects equivalent to the established forms of treatments [18][19][20][28]. However, the absence of a significant difference between two treatments in a clinical trial is not the same as a proof of noninferiority. Furthermore, formal indirect comparisons of treatment effect estimates between trials are only appropriate if the new and established treatments were compared against a reference that is similar both in methods and population [29], which in this case seems to be an indeterminate presumption. We therefore believe that the field is ripe for an analysis that elaborates on the issue of noninferiority vs. superiority.
Noninferiority trials are difficult to design and execute well [27]. Circumstances that strengthen inferences about superiority, because they increase similarities across treatment arms, can have the reverse effect on inferences of noninferiority. If a novel treatment is in fact inferior to established treatments, a trial with a sloppy design will be biased against finding this difference [27]. Superiority trials mainly use intention-to-treat (ITT) samples, whereas noninferiority should also be demonstrated in the per-protocol analysis because an ITT analysis tends to dilute differences. Furthermore, there should be a fairness of comparisons between the new and established treatment, such that the established treatment is implemented rigorously under conditions that do not compromise the assay sensitivity. For example, if many subjects in a trial have previously failed to respond to the control treatment, there would be a bias in favor of the new treatment [30]. Noninferiority trials could also provide data for whether internet therapies are cost-effective, with important implications for healthcare.
Third, the previous reviews have largely ignored potential adverse events (e.g., harms, side effects, and deterioration), which may prove important for implementation of remotely delivered psychological treatments. Finally, reviews to date have focused on CBT, while trials of other treatments have begun to emerge [31].
The current review addresses all of the above issues. It has been conducted under the auspices of the Swedish Council on Health Technology Assessment (SBU), a government agency that has produced numerous systematic reviews evaluating the effects of various treatments (www.sbu.se/en/). The overall aim of this report is to provide a systematic review of the literature evaluating internet-delivered psychological treatment for mood and anxiety disorders with attention to methodological quality, consideration of the noninferiority perspective, and with ratings of the quality of the evidence using Grading of Recommendations Assessment, Development and Evaluation (GRADE) [23] by a freestanding council. Specifically, the following questions guided the review (additional questions were addressed in the governmental report): 1. Is internet-delivered psychological treatment efficacious, safe and cost-effective for mood and anxiety disorders in children, adolescents and adults? 2. Is internet-delivered treatment noninferior to established psychological treatments?

Protocol and registration
This systematic review was conducted at SBU. The inclusion criteria were pre-specified and a protocol was registered in advance internally at SBU (ref. no UTV2012/26), see Protocol S1.

Eligibility criteria
Only published studies in English were considered for this review. The criteria for eligibility included the following characteristics.
Patients. Children, adolescents and adults with anxiety or mood disorders according to the manuals of the American Psychiatric Association [32] and the World Health Organization [33]. The specific diagnoses included were major depressive disorder, dysthymia, bipolar disorder, social phobia, panic disorder, generalized anxiety disorder (GAD), posttraumatic stress disorder (PTSD), obsessive-compulsive disorder (OCD), specific phobia, and separation anxiety (in children and adolescents). Studies were excluded if the participants were selected primarily because of a specific physical illness.
Interventions. Internet-delivered psychological treatments, defined as interventions based on an explicit psychological theory, not conducted at a clinic, and delivered to the patients via the internet. Any support had to be remotely delivered (e.g. email-like messages or telephone). The degree of support was categorized into pure self-help (no support), technician-assisted (e.g., nonclinical), or therapist-guided (i.e., clinical support).
Comparator. Any established psychological treatments, waiting list, usual care, or attention control.
Outcome. Change in symptoms of the primary disorder, adverse events, and cost per effect and per quality-adjusted life-year (QALY).
Study design. For short-term effects and risk of adverse events only RCTs were included. For long-term follow-up assessments (i.e., ≥6 months after post-assessment) RCTs and observational studies were included because of the ethical and practical dilemmas of conducting long-term RCTs. For cost-effectiveness data, economic evaluations based on individual-level data and decision models were eligible.

Information sources
Electronic searches were conducted using Medical Subject Headings (MeSH) and relevant text word terms. The databases used were PubMed, Cochrane Library, CINAHL, PsycINFO, Psychology and Behavioral Sciences Collection (PBSC), TRIP database and CRD, up to March 4, 2013.

Search strategy
We used search terms for depression/mood and anxiety and for each disorder (e.g., panic, phobia), for a range of delivery methods (e.g., online, internet, web, computer, phone), and for therapy, psychotherapy, intervention, and terms for specific interventions (e.g., cognitive behavioral, psychodynamic, interpersonal). The detailed search strategies are found in Appendix S1.

Study selection
Two reviewers independently screened the titles and abstracts identified by the search strategy. All studies of potential relevance according to the inclusion criteria were obtained in full text and two reviewers independently assessed them for inclusion. Any disagreements were resolved by discussion. Reference lists were screened for additional studies of relevance. Appendix S2 lists the efficacy and cost-effectiveness reports that were excluded after full-text reading.

Data collection process
From each included study of moderate or low risk of bias (see below), data were extracted and entered in a table by one reviewer. A second reviewer audited the data extraction. Any disagreements were resolved by discussion.

Risk of bias in individual studies
Two reviewers independently assessed the risk of bias with the use of checklists developed for each relevant study design [34]. Risk of bias is the systematic tendency that any aspect of the study makes the estimated treatment effect deviate from its true value, that is, the extent to which results of an included trial can be believed. The checklist for RCTs used herein is highly similar to the Cochrane Collaboration's tool for assessing risk of bias [26] and includes 31 items to consider for the randomization (methods and outcome; 3 items), treatment (blinding, compliance, therapists, confounds; 5 items) and assessment (blinding, reliability, validity, timing, analysis; 9 items) of the participants, dropout (size, balance, covariates, analysis; 5 items), reporting bias (protocol, primary/secondary outcome, adverse events, assessment; 6 items), and conflicts of interest (3 items). A rating of low, moderate or high risk of bias was given to each category of items, and these ratings were combined into a global rating of the trial.
Trials that had a serious flaw were rated as high risk of bias; trials that met all or nearly all criteria were rated as low risk of bias, such as trials with a convincing comparator (e.g., an established treatment or a sham version of attention bias modification) and no other obvious risk of bias; the remainder were rated as moderate risk of bias. Trials of moderate risk vary in their strengths and weaknesses: some trials likely provide valid results while others are only possibly valid. A high-risk trial is not valid; the results are at least as likely to reflect flaws in the study design as true differences among the trial arms. A fatal flaw may be reflected by one aspect introducing a high risk of bias or by failure to meet combinations of item criteria. The reviewers agreed on rules of thumb for decisions on categories and paid particular attention to trials that had, for example, N < 30, dropout > 20%, or unbalanced baseline characteristics. We included trials for the evaluation of long-term effects if they had a dropout rate of less than 30% and reported on other treatments during the follow-up period. For health-economic studies to be included they had to report both costs and effects. Any disagreements were resolved by consensus or by arbitration by a third reviewer. If necessary, study authors were solicited to provide additional information. Only studies with low or moderate risk of bias were used for further synthesis.

Planned method of analysis
We included as noninferiority designs all comparisons of internet-delivered treatment vs. established psychological therapies (e.g., I-CBT compared with individual therapist-led CBT). For these comparisons we used a predefined noninferiority margin of d = −0.2, chosen because it relates to a small effect size [35] and to ensure that noninferior treatments would retain an advantage over no treatment. All other comparisons were evaluated as superiority designs. Meta-analyses were carried out in RevMan 5. The calculations of the standardized mean differences were based on the groups' sample sizes, means and standard deviations at post-treatment. If the number of participants at post-treatment was not reported, the group sizes at randomization were used. Random effects models were used. All effect sizes in this report refer to between-group effects. Costs were converted to USD at the 2013 price level [36].
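The effect-size calculation described above can be sketched as follows. This is a minimal illustration, not the RevMan implementation: it assumes a Hedges-type small-sample correction and a DerSimonian-Laird estimate of the between-trial variance, neither of which is specified in the text, and the function names and input numbers are hypothetical. Positive values are taken to favor the internet-delivered treatment (lower post-treatment symptom scores).

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance) for a post-treatment between-group comparison.

    Sign convention (an assumption): positive g means lower symptom
    scores in the treatment arm than in the comparison arm.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_c - mean_t) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d

def pool_random_effects(effects):
    """DerSimonian-Laird pooling of [(effect, variance), ...].

    Returns (pooled effect, lower 95% limit, upper 95% limit, tau squared).
    """
    w = [1 / v for _, v in effects]
    y = [e for e, _ in effects]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    # Between-trial variance, truncated at zero; zero when it cannot be estimated.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2

# Hypothetical trials: (mean, SD, n) for treatment, then for waiting list.
trials = [
    hedges_g(10.0, 5.0, 50, 14.0, 5.0, 50),
    hedges_g(12.0, 6.0, 30, 16.5, 6.0, 30),
]
pooled, lower, upper, tau2 = pool_random_effects(trials)
print(f"pooled g = {pooled:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

With few trials the between-trial variance is poorly estimated, which is one reason small bodies of evidence are rated down for imprecision.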

Publication bias
Potential publication bias was assessed for plausibly effective interventions by inspecting funnel plots and by a trim-and-fill procedure [37], which yields an estimate of the effect size after taking bias into account (analyses performed in Comprehensive Meta-Analysis v2, Biostat Inc.).

Quality of evidence (GRADE)
The international grading system GRADE [23] was used to assess the quality of evidence for effects and safety with regard to groups of studies relevant to each treatment and support type, population, and disorder, according to the following four levels:
• High quality (⊕⊕⊕⊕): We are very confident that the true effect lies close to that of the estimate of the effect.
• Moderate quality (⊕⊕⊕○): We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
• Low quality (⊕⊕○○): Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect.
• Very low quality (⊕○○○): We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of the effect.
In the GRADE system, evidence based on RCTs begins as high quality evidence, but may be rated down for several reasons, including study limitations, inconsistency of results, indirectness of evidence, imprecision or reporting bias. That is, for each type of treatment and support type, for each disorder and population, the quality of the evidence was assumed to be high at the outset, but subsequently rated down if there were limitations in the relevant studies. For example, there were three trials of CBT with clinical support vs. waiting list for adult participants diagnosed with panic disorder. Evidence of treatment efficacy starts as being of high quality because the trials were RCTs, while study limitations (waiting list comparison [WLC]), inconsistency in the results (two trials show favorable effect, one shows no effect), and imprecision across studies (all three trials have small samples) entailed that the body of evidence finally received a low-quality rating. The quality of evidence was decided upon through discussions among the authors and input from an external group, the Quality and Priority Group at the agency. In line with agency guidelines we rated down for indirectness when only one RCT was included for a specific question, unless the included RCT was a multi-center trial.

Results
We identified 52 relevant trials (54 reports), whereof 12 trials (13 reports) were excluded due to high risk of bias. The efficacy data thus included 39 reports with 40 RCTs of low or moderate risk of bias and 2 additional reports of long-term follow-ups of these trials that were included in the synthesis (Figure 1). Most trials recruited volunteers via advertisements, evaluated variations of therapist-guided I-CBT in self-help format carried out over 8-12 weeks and used a WLC (Table 1). The support was delivered via phone or email-like messages and took approximately 10-20 minutes per participant and week. Diagnoses were made mainly by using the MINI neuropsychiatric interview or the Structured Clinical Interview for DSM Axis-I Disorders (SCID) and the screening was performed in person or via telephone. The majority of the trials (88%) were conducted by teams from Australia or Sweden.

Mood disorders in adults
Nine trials were identified: eight had moderate risk of bias [31][38][39][40][41][42][43][44] and one had high risk of bias (Appendix S2). The participants fulfilled criteria for a depressive episode, current or in partial remission, recurrent episodes, or dysthymia. No trials for bipolar disorder were found. Six trials included only participants with mild to moderate depression and six trials excluded participants who reported suicidal ideation.
None of the trials assessed noninferiority. Five evaluated the effect of I-CBT vs. a WLC [41,42,44], WLC and weekly symptom ratings [39], or WLC and access to an online discussion group [40]. We found a large pooled effect for I-CBT as compared to a WLC (Figure 2). The quality of evidence was rated as moderate for therapist-guided I-CBT due to study limitations (WLC), see Table 2.
Three other trials were included that evaluated one intervention each: one trial with an intervention that combined components from acceptance and commitment therapy, behavioral activation, and mindfulness [38]; one with internet-delivered psychodynamic therapy (I-PDT) [31], and one with therapist-led I-CBT delivered via a chat interface [43]. For each intervention, the quality of evidence was rated as very low. Four long-term follow-up assessments (five reports) were assessed as having a high risk of bias [31,39,40,44,45].
Social phobia. Eight trials with moderate risk of bias evaluated the effect of therapist-guided I-CBT compared to a WLC [49][50][51][52][53][54][55][56]. The treatments conferred a large effect compared to WLC (Figure 3). The quality of evidence for therapist-guided I-CBT was rated as moderate due to study limitations (WLC). One report also evaluated whether therapist-guided I-CBT was superior to bibliotherapy [56]. I-CBT was not found to be superior to bibliotherapy. The quality of evidence for guided I-CBT vs. bibliotherapy was rated as very low due to imprecision (small sample) and indirectness (single trial).
Two trials (three reports) [56,58,59] that included long-term follow-ups of the treatment groups were assessed as having moderate risk of bias. Their findings suggested that participants' improvements persisted after 30 months [58] and 1 and 5 years [56,59]. The quality of evidence was assessed as very low due to risk of bias and imprecision. One trial found that unguided I-CBT was not superior to a WLC [53]. The quality of evidence for unguided I-CBT for social phobia was rated as very low due to study limitations (WLC), imprecision (small sample), and indirectness (single trial).
Three trials, two of low [46,48] and one of moderate risk of bias [57], compared internet-delivered Attention Bias Modification (I-ABM) to an identical sham intervention. We found no clinically relevant pooled effect (Figure 4 includes one of three primary outcomes; plots were nearly identical for the Social Phobia Scale and the Liebowitz Social Anxiety Scale). The quality of evidence was rated as moderate for a lack of clinically meaningful effect of I-ABM (rated down due to imprecision, i.e., small sample).
Panic disorder. Nine trials of I-CBT were identified. Five trials had moderate risk [60][61][62][63][64] and four had high risk of bias (e.g., due to differences among groups in baseline characteristics, sample sizes, dropout). One trial found no difference between therapist-guided I-CBT and live group CBT in participants recruited from a clinical population (d = 0.00) [60]. Noninferiority was not established as the 95% CI (−0.41 to 0.41) included our predefined noninferiority margin of d = −0.20. One trial found no difference between I-CBT and live individual CBT [63]. This trial was not designed as a noninferiority trial and the small sample limits the inferences to be made. The quality of evidence for the noninferiority of I-CBT vs. either individual or group CBT thus was rated as very low due to study limitations (e.g., insufficient information about treatment integrity), imprecision (small sample), and indirectness (single trial).
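The noninferiority decision in this paragraph reduces to comparing the lower confidence limit with the margin. A minimal sketch, assuming that positive d favors I-CBT and that noninferiority requires the entire 95% CI to lie above the margin; the function name is hypothetical:

```python
NONINFERIORITY_MARGIN = -0.20  # predefined margin used in this review

def noninferiority_established(ci_lower, margin=NONINFERIORITY_MARGIN):
    """True only if the whole confidence interval lies above the margin."""
    return ci_lower > margin

# Reported 95% CI for I-CBT vs. group CBT in panic disorder [60]: -0.41 to 0.41
print(noninferiority_established(-0.41))
```

The call returns False for the reported interval, because a true difference as unfavorable as −0.41 cannot be ruled out. A point estimate of zero is therefore not sufficient on its own.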
Three trials that compared therapist-guided I-CBT vs. a WLC with [64] or without [61,62] online information about panic found small to very large effects. No meta-analysis was undertaken because of the heterogeneity in outcome measures and effect sizes. We rated the quality of evidence as low because of study limitations (WLC, dropout) and imprecision (heterogeneous effect sizes, small samples).
Generalized anxiety disorder. Four trials with moderate risk of bias were identified [65][66][67][68]. They evaluated therapist-guided I-CBT vs. a WLC. The pooled effect was large although heterogeneous across the trials (Figure 5). We rated the quality of evidence as low for the short-term effect because of study limitations (WLC) and imprecision (heterogeneous effect sizes, small samples).
One trial found that I-CBT with non-clinical support by a technician was more effective than a WLC [67]. One trial included therapist-guided I-PDT [65]. As for the I-CBT condition in this trial, no effect was found for I-PDT vs. WLC. The quality of evidence for I-CBT with non-clinical support and therapist-guided I-PDT was rated as very low because of study limitations (WLC, only one technician), imprecision (small sample), and indirectness (single trial).
Specific phobia. We identified one trial of moderate risk of bias [69]. Four weeks of therapist-guided I-CBT did not outperform brief therapist-led exposure (one introductory session and one three-hour exposure session) according to a behavioral approach test in participants with spider phobia. The quality of evidence was rated as very low due to study limitations, imprecision (small sample), and indirectness (single trial).
Posttraumatic stress disorder. Two relevant trials were identified; one trial with high risk of bias and one trial with moderate risk of bias that found that therapist-guided I-CBT was superior to WLC [70]. The quality of evidence was rated as very low due to study limitations (WLC), imprecision (small sample), and indirectness (single trial).
Obsessive-compulsive disorder. We identified one trial of moderate risk of bias [71]. Therapist-guided I-CBT conferred a large effect compared to supportive therapy online. The quality of evidence was rated as very low due to study limitations (no credible active comparison condition), imprecision (small sample), and indirectness (single trial).

Transdiagnostic interventions for anxiety and depression. Six trials of moderate risk of bias were identified that included participants with mixed anxiety disorders and/or major depressive disorder (MDD) [72][73][74][75][76][77]. Five trials found that therapist-guided I-CBT had moderate or large effects as compared to a WLC [72][73][74][76]. No meta-analysis was performed due to the heterogeneity in outcome measures, diagnoses, and treatment protocols. We rated the quality of evidence as low for these interventions because of study limitations (WLC) and imprecision (small samples, heterogeneous effects and interventions). One trial included a 1- and 2-year follow-up, with results suggesting that the improvements lasted throughout the follow-up [74]. The quality of evidence was rated as very low due to study limitations (observational design, dropout) and imprecision (small sample). One trial recruited participants from an anxiety clinic and found no difference between unguided I-CBT and a WLC on the Patient Global Impression scale [75]. The quality of evidence was rated as very low due to study limitations (WLC, dropout), imprecision (small sample), and indirectness (single trial).

Publication bias
Funnel plots and Duval and Tweedie's trim-and-fill procedure indicated no or trivial publication bias with respect to the pooled effect sizes for I-CBT for adults with depression, social phobia, and GAD.

Children and adolescents
We found four trials and excluded three due to high risk of bias because of various shortcomings. One trial of moderate risk of bias evaluated I-CBT for mixed anxiety disorders [78]: 30% of the completers no longer fulfilled criteria for their primary anxiety diagnosis, compared to 10% in the WLC. The quality of evidence was rated as very low for the efficacy of internet-based psychological interventions for children and adolescents (Table 2).

Risk of adverse events
Eight trials provided information on intervention-associated risks for depression [31,38], social phobia [46,54], GAD [65], OCD [71], and transdiagnostic treatments [74,76]. The information provided was related to a worsening in symptoms and indicated that symptom worsening was present in 0-5% of treated participants and in 2-9% of participants in the comparison groups. The quality of evidence was rated as very low for the risk of adverse events following internet-based psychological interventions for both children and adults (⊕○○○).

Cost-effectiveness
Of the 139 studies screened for cost-effectiveness data, five trials met the eligibility criteria. Two had a moderate risk of bias [79,80] and three were excluded due to high risk of bias (e.g. incomplete information on costs; Appendix S2). One trial compared costs and effects between I-CBT and treatment as usual while on waiting list among patients with depression [80], and found that I-CBT had a cost per QALY of 29,384 USD compared to treatment as usual. At a willingness-to-pay for a QALY of 50,000 USD the probability was approximately 70% that I-CBT was cost-effective compared to treatment as usual. One trial compared costs and effects between I-CBT and group CBT among patients with social phobia [79]. Compared to group CBT, I-CBT was associated with a lower cost per patient of 1,422 USD and 19% greater improvement on LSAS at the six-month follow-up. At a willingness-to-pay per additionally improved patient of 3,000 USD, the probability that I-CBT was cost-effective compared to group CBT was approximately 90%. The calculations of QALYs had not taken the time aspect of the effect on quality of life into account and are not presented.
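The cost-per-QALY comparison above follows the usual incremental logic, sketched below. The function names and the incremental cost and QALY inputs are hypothetical and chosen only so that the ratio equals the reported 29,384 USD per QALY; the trial's actual incremental values are not given here, and the reported probabilities of cost-effectiveness would require resampling of individual-level data, which this sketch does not attempt.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return delta_cost / delta_effect

def net_monetary_benefit(delta_cost, delta_effect, willingness_to_pay):
    """Positive values mean the new treatment is cost-effective at this threshold."""
    return willingness_to_pay * delta_effect - delta_cost

# Hypothetical inputs: 0.05 QALYs gained, with the extra cost set so that
# the ratio matches the reported cost per QALY.
delta_qaly = 0.05
delta_cost = 29_384 * delta_qaly
print(round(icer(delta_cost, delta_qaly)))
print(net_monetary_benefit(delta_cost, delta_qaly, 50_000) > 0)
```

When a new treatment is both cheaper and more effective, as the point estimates for I-CBT vs. group CBT [79] suggest, the net monetary benefit is positive at any non-negative willingness-to-pay.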

Discussion
In this review we assessed whether internet-delivered psychological treatments for mood and anxiety disorders are efficacious, noninferior to established treatments, associated with risk of adverse events, and cost-effective. We found limited to moderate evidence that for adults who seek out this treatment, therapist-guided I-CBT has a favorable short-term effect compared to waiting list for social phobia, panic disorder, generalized anxiety disorder, or mild to moderate major depression. We were not able to draw conclusions about noninferiority to proven treatments, long-term effects, adverse events, cost-effectiveness, or efficacy when given to children and adolescents.
Several reviews interpret the body of evidence such that I-CBT and established forms of CBT have comparable effects for mild to moderate depression and several anxiety disorders [19,20,81]. In contrast, we found insufficient evidence to conclude whether I-CBT is noninferior to face-to-face treatment. There are important aspects that need to be attended to with regard to the methodology, and ethics, of conducting trials with a placebo/no-treatment arm when there are existing evidence-based treatments [82,83]. These issues notwithstanding, we found few trials that compared I-CBT to a face-to-face treatment. These trials were generally not adequately designed to evaluate questions of noninferiority [27,84], with the exception of one trial, which provided tentative support for similar efficacy of therapist-guided I-CBT and group CBT for social phobia in adults [47].
There are a number of shortcomings in the existing trials that future studies would benefit from addressing. A common issue is that the trials were conducted by teams that developed the I-CBT program but had no role in developing the comparison face-to-face therapy. In addition, independent ratings of the quality of delivery of the therapy were not routinely included. Further, the face-to-face comparator was often group CBT and not individual CBT, although the latter is generally the first-line choice for anxiety and mood disorders [85,86]. The guidelines from the National Institute for Health and Care Excellence (NICE) do not support the notion of equivalence between internet-delivered and face-to-face treatment for social phobia [86], in part due to the aforementioned issues. More aptly designed trials are needed before we can answer clearly whether internet-delivered treatments are noninferior to face-to-face treatment. Furthermore, the lack of comparisons with established treatments provides scant data for cost-effectiveness analyses. Consequently, this review can provide no conclusions about the cost-effectiveness of I-CBT.
The diverging conclusions among extant reviews about equal efficacy between internet-delivered and face-to-face treatments highlight critical methodological aspects that set the present review apart from previous reviews [19,81,87]. First, we used rigorous criteria for establishing noninferiority, whereas previous reviews seemingly have relied on subjective and indirect appraisal of the effect size differences. Second, we performed a systematic assessment of the body of evidence for each disorder [23], whereas previous reviews either used no formal assessment or relied only on the criteria stated by Chambless et al. [88], which indicate as evidence-based treatment any treatment that has been found superior to any comparison condition in two RCTs. The grading of the body of evidence that we used here entailed a reduced confidence in the results, mainly because studies were unblinded, used subjective outcome measures, were designed with waiting list or similar comparison groups, and included relatively small samples.
Third, we performed a comprehensive assessment of the risk of bias in the trials and excluded trials with high risk. Few trials for social phobia and depression were judged as having high risk of bias, which resulted in similar conclusions about short-term efficacy as the meta-analysis by Andrews et al. [19]. Similarly, for PTSD, OCD, specific phobia, and transdiagnostic treatments only one trial (for PTSD) was excluded due to high risk. However, excluding high-risk studies resulted in fewer trials and a lower grading of the evidence for panic disorder than stated in previous reviews [19,89]. Of the four excluded publications on panic disorder, two were from 2001, one from 2006, and one from 2008 (see Appendix S2). Given the technical progress in the field, and given that the reports represent studies planned and performed some years before publication, at least the 2001 publications are among the first in an emerging field and would bear less resemblance to current and future practice of internet-delivered treatment packages.
We found only four relevant trials for children and adolescents, and three had high risk of bias. The three excluded trials concerned social anxiety, OCD, and diverse anxiety disorders (mainly GAD), respectively. Including them would not alter our conclusions. This outcome seems to reflect the generally slow progress of psychological interventions for children and adolescents [90]. Although the low number of studies precludes quantitative meta-analysis, an equally important objective of a systematic review is to identify gaps in the literature. This could alert researchers and funding agencies to important research questions that are not given sufficient attention. The effect of internet-delivered interventions in general may be smaller among children [91], which stresses the need for more research specifically for this population.
Finally, the trials included in this review may seem few in comparison to the expanding number of publications in the literature. However, we only included studies of participants with diagnosed mood and anxiety disorders. There are many other studies on internet treatments in which participants have not been subjected to a diagnostic interview. Several of those trials used unguided interventions, which may explain why so few trials of unguided interventions were included. Also, we did not pool studies across treatments and support types, or across disorders, and therefore each cluster of studies yielded a modest number of trials despite an impressive number overall.
Remote delivery is one of several promising avenues for expanding the reach of psychological interventions [5]. Indeed, a key impetus in much of the reviewed research is to improve accessibility to CBT [63] and attract those normally too shy to seek treatment and those without access to CBT [92]. A central question, therefore, is whether internet-delivered treatment indeed attracts an underserved population. Among the trials of I-CBT for depression, 53-61% of participants had a history of psychological treatment [39,40,44]. Among the anxiety trials, 16-66% of participants had previously received psychological treatment [49-52,54,56,71] and one-fourth had received CBT [63,71]. The data indicate that many depression trial participants already had access to treatment, whereas this was less clear for anxiety disorders. The high level of educational attainment and the high employment rates among the participants raise concerns about whether the effects found in most RCTs can be generalized to those who today are underserved. Other questions likely to be important to generalization concern how these treatment programs can be implemented within the healthcare services and what types of changes would be needed; for example, the training of existing therapists. Expanding the reach of psychological treatments is important [5]. We therefore concur with the NICE guidance [86] and hope for further research that attends to these issues in more detail.
Several trials assessed long-term outcomes of the treatments. Yet, no clear conclusions could be drawn about long-term effects as these data were limited, mainly due to the observational design, attrition, and the lack of data on participants' receipt of other treatments during the follow-up period. Only eight efficacy trials reported on deterioration, and no trial suggested that adverse events in a broader sense had been monitored. There is clearly a need for better reporting of safety data [93,94]. Currently, the risk of reporting bias precludes conclusions about the risk-benefit ratio of the treatments, which is an important aspect of comparing treatments.
Correlational evidence suggests that therapist guidance is beneficial for the outcome [16,95,96]. Less extensive support without adequate oversight of the patients' mental health status could also compromise patient safety. We therefore emphasize that evidence was found only for therapist-guided treatments. The lack of efficacy of I-ABM (also seen in a trial published after our final search [97]) compared to the effects of ABM in the laboratory [98], and of other remotely delivered therapies [99] further indicates the importance of attention to details about how interventions are delivered.
We believe that using only trials with low or moderate risk of bias is an improvement over previous reviews. We are mindful of the fact that the ratings of risk of bias were subjective, which hampers the replicability of our findings. However, it is broadly recognized as poor review practice to disregard study quality altogether [26], for example, because of the impact of quality on effect estimates [28]. Instead of choosing a threshold approach, a quality-weighting approach can be used whereby low quality studies are included in the review and their influence is analyzed, thus avoiding selection bias. However, the assignment of quality weights is still fraught with subjectivity: unless rigorously implemented, it might increase the risk of over-inclusion bias and may result in inconsistency [100]. In addition, the use of simple scoring sheets for assessing bias is not recommended [26]. To minimize the uncertainty due to subjective judgments we performed the ratings according to best practice: the risk of conflicts of interest was minimized by the choice of independent reviewers; we used comprehensive score sheets developed for risk of bias ratings in individual trials and dual review; and we used the GRADE model for the overall assessment of the evidence [23]. Our ratings of the strength of evidence are related not only to specific treatment packages and comparison conditions (Table 2), but also to the particular population of adults seeking out treatment themselves. The majority of the included trials were conducted in Sweden or Australia, which greatly increases external validity within these countries but also warrants caution before extrapolating these findings to healthcare services in other countries and cultural settings.

Conclusions
I-CBT for adults with mild to moderate depression and select anxiety disorders may complement existing services. More research is needed before conclusions can be drawn about the efficacy of internet-delivered treatment regarding other anxiety disorders, treatment methods other than CBT, the treatment of children, long-term effects, safety, cost-effectiveness, and noninferiority to proven forms of treatment. We believe that a shift is warranted from waiting list trials to using active comparators, particularly direct comparisons with established treatments. Nonetheless, more research is needed to understand what makes psychological treatments effective, and for whom. This field is developing rapidly, however, and it may not be long until remaining questions can be satisfactorily answered.

Supporting Information
Appendix S1 Detailed search strategy. (DOCX)
Appendix S2 List of reports excluded after full-text reading. (DOCX)
Figure 5. Short-term efficacy of internet-based cognitive behavioral therapy (I-CBT) vs. waiting list for generalized anxiety disorder in adults. For the meta-analysis, the outcome chosen from each study was the Penn State Worry Questionnaire. doi:10.1371/journal.pone.0098118.g005