Systematic Review on Internet Support Groups (ISGs) and Depression (1): Do ISGs Reduce Depressive Symptoms?

Background: Internet support groups (ISGs) enable individuals with specific health problems to readily communicate online. Peer support has been postulated to improve mental health, including depression, through the provision of social support. Given the growing role of ISGs for both users with depression and those with a physical disorder, there is a need to evaluate the evidence concerning the efficacy of ISGs in reducing depressive symptoms.
Objective: The objective was to systematically review the available evidence concerning the effect of ISGs on depressive symptoms.
Method: Three databases (PubMed, PsycINFO, Cochrane) were searched using over 150 search terms extracted from relevant papers, abstracts, and a thesaurus. Papers were included if they (1) employed an online peer-to-peer support group, (2) incorporated a depression outcome, and (3) reported quantitative data. Studies included both stand-alone ISGs and those used in the context of a complex multi-component intervention. All trials were coded for quality.
Results: Thirty-one papers (involving 28 trials) satisfied the inclusion criteria from an initial pool of 12,692 abstracts. Sixteen trials used either a single-component intervention, a design in which non-ISG components were controlled, or a cross-sectional analysis; of these, 10 (62.5%) reported a positive effect of the ISG on depressive symptoms. However, only two (20%) of these 10 studies employed a control group. Only two studies investigated the efficacy of a depression ISG, and neither employed a control group. Studies with lower design quality tended to be associated with more positive outcomes (P = .07). Overall, studies of breast cancer ISGs were more likely to report a reduction in depressive symptoms than studies of other ISG types (Fisher P = .02), but this finding may have been due to confounding design factors rather than the nature of the ISG.
Conclusions: There is a paucity of high-quality evidence concerning the efficacy or effectiveness of ISGs for depression. There is an urgent need to conduct high-quality randomized controlled trials of the efficacy of depression ISGs to inform the practice of consumers, practitioners, policy makers, and other relevant users and providers of online support groups.


Introduction
Internet support groups (ISGs) provide individuals with specific health problems an opportunity to share experiences and to seek, receive, and provide information, advice, and emotional support online. It has been estimated that millions of people visit online peer-to-peer discussion groups daily [1], and there is evidence that over 28% of Internet users have visited an online support group at least once [2].
Internet users seeking health information frequently access information about depression [3], and online depression groups have been reported to be among the most common ISGs on the Internet [4]. It is also known that there is a high level of depression among individuals with a physical illness [5]. Thus, many users seeking to join health ISGs may have elevated depressive symptoms or may be at risk of developing depression.
Peer support has been postulated to improve mental health, including depression, through the provision of social support, which alters cognitions, attitudes, self-attributions, and coping, which, in turn, leads to a reduction in depressive symptoms [6]. Given the growing role of ISGs both for consumers with depression and for those with other health conditions, there is a need to evaluate the evidence concerning the effect of these groups on depressive symptoms. One research group has conducted a high-quality systematic review of studies on the effect of health ISGs on a range of outcomes [1]. That review did not, however, examine depression outcomes in detail and was confined to articles published prior to October 2003.
The current paper aims to provide a systematic and comprehensive review of the available evidence concerning the effect of ISGs on depressive symptoms regardless of the ISG health condition. A more detailed review of depression ISGs specifically is provided in a companion paper, which reports the scope and findings from all qualitative and quantitative empirical studies of depression ISGs (see [7]).

Databases
Three databases (PubMed, PsycINFO, Cochrane) were searched using keywords and phrases for the period prior to August 2007. The search was undertaken at two time points, the first in May 2005 and the second in July 2007.

Search Methodology
The search terms and strategies were based on those reported by Eysenbach et al [1], which involve the following concepts: (computer/Internet communication and support) or e-community venue. In addition, a further 48 relevant search terms were extracted from research papers on ISGs, from abstracts retrieved by running database searches with the resulting search terms, and from an online thesaurus used to find synonyms of key terms [8].

Study Identification
A multi-step process was employed to select relevant studies for the current review and for the review of depression ISGs reported in the companion paper to this study [7] (see Figure 1). In the first stage, each of the 12,692 abstracts returned by the database searches was screened by one of the three authors (AC, MB, KG). The aim of this stage was to screen out clearly irrelevant abstracts and, in particular, to eliminate any reference that clearly did not satisfy the following inclusion criteria: or peer support, online/computer-based communication or interaction, collaborative virtual environments or interventions. 3. The support "group" discussed or investigated was health/psychology related (eg, biological illness, mental illness, health risk factors, bereavement, group counseling), or the article measured a health/psychology related outcome in relation to the support group.
After removing duplicate papers (Stage 2), the remaining abstracts (n = 859) were coded as relevant, not relevant, or possibly relevant according to the following inclusion criteria (Stage 3):
1. Employed an online peer-to-peer support group
2. Incorporated either a depression outcome or involved a unipolar depression ISG
3. Reported either quantitative or qualitative empirical data
Studies were included whether they incorporated a stand-alone ISG or involved a complex multi-component intervention. Reviews of ISGs satisfying the first two criteria were identified and analyzed separately. Abstracts were coded by one author (AC or KG) and checked by a second author (KG or AC). Any disagreement was resolved by discussion. After excluding the irrelevant abstracts, 158 papers were obtained, read (if in English), and coded against the inclusion criteria by one author (KG). The coding was checked by a second author (AC). Papers that did not report a depression outcome or did not concern an ISG exclusively devoted to depression were excluded (Stage 4), as were duplicate papers generated by the two-phase search process (n = 2). In addition, two papers were judged to be non-English versions of an English-language publication and were excluded [9,10]. Nine other non-English papers of possible but not definite relevance were excluded for pragmatic reasons (cost of translation) [11-19]. It is unclear how many of these would have been retained in the review had they been formally translated. However, one did not satisfy the inclusion criteria based on a translation by the first author [18], and only three of the remaining non-English papers were rated as being of probable or definite relevance, based on the English abstract and a perusal of the tables in the untranslated paper [11,14] or on a partial translation supplied by a colleague [19].
The above process yielded a total of 38 relevant papers and five systematic reviews. Two additional relevant papers were identified from the five reviews, and a further two papers cited in at least one of the 38 relevant papers were included among the pool of relevant papers (Stage 5). This resulted in a total of 42 relevant papers of which 31 papers comprising 28 separate trials incorporated a depression outcome (Stage 6) and 11 (studies of depression ISGs) did not. The current paper focuses on the 28 trials reporting a depression outcome.

Coding of the Included Papers
The 31 papers reporting a depression outcome were independently coded by two raters (KG, AC), and discrepancies were subsequently resolved by discussion between the two raters.
Quantitative studies that included depression outcomes were coded for ISG, participant and study characteristics, and depression outcomes.

Analyses
A formal quantitative meta-analysis was not conducted because of the low quality of the studies meeting the inclusion criteria and the heterogeneity of the conditions studied. However, the possible roles of study characteristics and quality were explored by comparing samples reported to have yielded positive, statistically significant results with those that did not, using a series of Fisher exact tests for categorical attributes and Mann-Whitney tests for other data.
For the purposes of this analysis, data were analyzed at the comparison rather than the study level. In addition, for descriptive purposes, Cohen's d standardized effect sizes were calculated and reported where possible. For uncontrolled studies, the pre-post standardized effect size was calculated from the mean pre-test and post-test scores and their standard deviations. For controlled studies, the study effect size was the difference between the pre-post effect size for the control group and that for the intervention group. For the study comparing depression scores of high-use and low-use Internet users, the effect size was the standardized difference between the two groups. Effect sizes were not calculated in several instances. Where only the t value for dependent (or equivalent) samples was available [20], no effect size was estimated, because such t values are based on the standard error of the difference rather than a pooled standard deviation and would therefore overestimate the effect size. For the same reason, no effect size was calculated from the F value of a simple effects analysis of residualized change in depression [21]. Effect sizes were also not calculated for studies that reported only baseline-adjusted means [22] or a baseline-adjusted difference in change [23], or for one study with apparent inconsistencies in its reported sample standard deviations [24].
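The effect-size rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The text does not say which standard deviation served as the denominator, so the sketch assumes the root mean square of the pre and post SDs for the pre-post d and the conventional sample-size-weighted pooled SD for the between-group d. It also assumes the sign convention (post minus pre), under which the "control minus intervention" difference is positive when the intervention group improves more. All function names and numbers are hypothetical.

```python
from math import sqrt


def pre_post_d(pre_mean: float, pre_sd: float, post_mean: float, post_sd: float) -> float:
    """Pre-post standardized change for a single group.

    ASSUMPTION: the denominator is the root mean square of the pre and post
    SDs; the sign is (post - pre), so a fall in depression scores is negative.
    """
    pooled_sd = sqrt((pre_sd ** 2 + post_sd ** 2) / 2)
    return (post_mean - pre_mean) / pooled_sd


def controlled_d(control: tuple, intervention: tuple) -> float:
    """Control-group pre-post d minus intervention-group pre-post d.

    Each argument is (pre_mean, pre_sd, post_mean, post_sd). Under the sign
    convention above, a positive result means depression scores fell more in
    the intervention group than in the control group.
    """
    return pre_post_d(*control) - pre_post_d(*intervention)


def between_group_d(mean_a: float, sd_a: float, n_a: int,
                    mean_b: float, sd_b: float, n_b: int) -> float:
    """Standardized difference between two independent groups (e.g. high-
    vs low-use), using the sample-size-weighted pooled SD (an assumption)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


# Hypothetical CES-D summaries (pre_mean, pre_sd, post_mean, post_sd);
# these numbers are invented for illustration and come from no study.
control = (20.0, 10.0, 19.0, 10.0)
intervention = (20.0, 10.0, 15.0, 10.0)
print(pre_post_d(*intervention))                      # -0.5
print(round(controlled_d(control, intervention), 2))  # 0.4
```

The sketch also makes the reason for excluding dependent-samples t values concrete: those statistics divide by the standard error of the change scores rather than by any of the standard deviations used here.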

Results
Of the 28 studies with depression outcomes, five reported results separately for two different populations (patients versus carers [21,25], mothers versus fathers [26], adolescents versus young adults [27], heterogeneous versus homogeneous group composition [28]), and one involved two arms differing in intervention duration [24]. Thus, there were a total of 34 samples. In reporting the findings below, the term "samples" will be used to refer to these 34 different populations or arms, and the term "studies" will be reserved to describe the 28 trials.

Study Characteristics
Of the 28 studies with depression outcomes, 16 were single-component studies: they evaluated stand-alone ISGs, used a design that controlled for intervention components other than the peer-to-peer component, or were cross-sectional studies of online groups. The remaining 12 studies incorporated a multi-component intervention comprising the discussion group plus at least one additional component such as health education, skills training, or decision aids. See Table 1 and Table 2.

Origin
The majority of studies were reported in published journal articles, and, in most cases, the senior author was located in the United States.

Interventions
The studies primarily employed bulletin boards, chatrooms, or mailing lists, either alone or in combination (see Table 3). Approximately two-thirds were closed ISGs, typically developed for research purposes. Half of the studies specified that the ISGs were moderated, and of these the majority of moderators were health professionals. The duration of the interventions ranged from 12 minutes to 12 months (median 16.5 weeks), and length of time to follow-up ranged from immediately post-intervention to 12 months post-intervention.

Participants
More samples were focused on ISGs for breast cancer than on any other condition. In addition, a substantial proportion of the samples concerned depression and ISG use among people without a physical or psychological condition. As noted above, only two samples were exposed to depression ISGs. The median age of participants typically fell between 26 and 65 years. Some of the samples comprised college-aged students or younger adolescents. None was concerned specifically with older people, although the median age of one sample was 64 years [28]. Notably, only a minority of samples focused on men, whereas almost one half consisted predominantly or entirely of women. Only one study focused on rural participants [47]; two others mentioned the inclusion of some rural residents [26,30].

Outcome Measures
Half of the studies (n = 14) used the Center for Epidemiologic Studies Depression Scale (CES-D) as an outcome measure, with the next most common measures (with two trials each) being the Symptom Checklist 90 (SCL-90) and the Beck Depression Inventory (BDI). Each of the remaining measures was administered in one trial only.

Study Quality
One third of the studies involved an RCT, and almost half of the 28 studies employed a control group. Most of the remaining studies used a pre-post design. Of the 23 studies that used at least a pre-post design, only three (13%) used an ITT analysis; a further study neither specified whether an intent-to-treat analysis was employed nor indicated the level of dropout, if any [46]. Two of the three ITT studies [29,33] used the last observation carried forward method to handle missing data. The third inferred mood from initial and final posts on a bulletin board, thus ensuring that there was no dropout [32]. No study used multiple imputation to handle missing data. Of the nine studies said to have employed an RCT design, only three [22,24,48] both adequately specified the randomization procedure and employed an appropriate method of randomization [51].
Intervention and control sample sizes ranged from 10 to 244 (median 46) and 30 to 236 (median 51), respectively, for samples derived from studies of at least pre-post test quality. Cross-sectional study sample sizes ranged from 158 to 2373 (median 230). Dropout among samples in studies of at least pre-post test quality ranged from 7.9% to 41.7% and 0% to 37% for intervention and control conditions, respectively. Of the 22 studies of at least pre-post design with some dropout, 46% (n = 10) compared the characteristics of completers and non-completers. All but one of these (n = 9, 90%) reported no difference in baseline characteristics for these groups.

ISG Efficacy for Depression
The outcomes for single-component and multi-component studies are discussed separately.

Single-Component Studies
Of the 17 intervention samples (16 studies) involving a peer-to-peer component alone or a cross-sectional design, 10 (59%) yielded a positive effect of the ISG on depressive symptoms. However, only two of these involved a controlled trial.
The largest number of single-component samples involved women with breast cancer (n = 5) [20,29-32]. Of these, four yielded significant effects of moderate to large size [20,29-31], and the fifth showed a small but significant association between board use and improved mood [32]. However, only one of these trials employed a controlled design [29].
Three samples (three studies) involved ISGs comprising members with a mental disorder, two of them for depression [33-35]. One of these produced a positive result: Houston et al [34] found that more frequent depression ISG users were significantly more likely to recover from depression after adjustment for baseline depression severity and demographic variables. However, the study did not include a control group. The second depression ISG comparison involved the control arm of an RCT of an online cognitive behavior therapy intervention for depression, in which a research bulletin board served as the control condition [33]. There was no significant effect of the bulletin board.
There were two other single-component samples (two studies) involving medical conditions: one involved a trial of an ISG for diabetes [36,37], the other an ISG for renal patients undergoing dialysis [38]. The ISG did not produce an effect on depressive symptoms in either study, although the latter involved only three cases.
Finally, seven samples (six studies) involved people with no psychological or physical disorder [27,39-43]. Three samples (two studies) involved experimental studies of the effect on mood of online communication between peer dyads [27,39]. Two of these reported a positive effect of dyadic communication on mood. The remaining four samples (four studies) involved cross-sectional surveys investigating the association between frequency of chatroom use and mood in community samples. Two of these studies involved university communities and found that higher chatroom use predicted lower depression [40,43]. A third cross-sectional study, of general Internet users, did not find an association between frequency of use and mood, but it employed a dichotomized measure of frequency and may therefore have lacked statistical power [42]. The final study, which involved adolescents aged 11 to 16 years, found the opposite association, with higher Internet use associated with a higher level of depressive symptoms [41]. In summary, there is weak evidence that chatroom use among people without a disorder may be associated with lower levels of depression, but the evidence is of poor quality and the findings are inconsistent.

Multi-Component Studies
Of the 17 samples (12 studies) that involved intervention components in addition to the ISG, only two (12%) reported a positive effect [21,28]. The first, involving a homogeneous group of patients with Parkinson's disease, employed a pre-post design only and incorporated a health professional education component as well as the ISG [28]. The second, involving heart transplant recipients, employed a historical control group that differed in depression severity and comprised many potentially active components in addition to the ISG, including stress skills training [21].

Association Between Positive Results and Study Characteristics
Multi-component studies were significantly less likely to yield significant, positive outcomes than stand-alone interventions and cross-sectional studies (Fisher exact test, P = .01). Breast cancer ISGs were more successful than other ISGs (Fisher exact test, P = .02), but most of the breast cancer studies originated from a single research group. Outcome was not significantly associated with the use of synchronous (chatroom) rather than asynchronous (bulletin board, listserv/newsgroup) ISGs (Fisher exact test, P = .99), with whether the study reported using a moderator (Fisher exact test, P = .72), or with whether the board was public, a research board, and/or restricted access (Fisher exact test, P = .11).
Outcome was not significantly associated with the duration of the intervention (Mann-Whitney U = 57, P = .23) or the length of follow-up (Mann-Whitney U = 75.5, P = .83). Nor was there a significant association between age (25 years and younger vs older) and success (Fisher exact test, P = .64), although there were few studies of young people. Considering only the samples composed predominantly of males (n = 4) or of females (n = 16), there was no association between outcome and sex (P = .59), but the number of male samples was very small.
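For readers unfamiliar with the Mann-Whitney statistic reported above, the U value for one group is the number of cross-group pairs in which that group's value is larger, with ties counted as one half. The minimal sketch below computes only U; the function name is hypothetical, and the data in the usage lines are invented. Obtaining a P value additionally requires the exact null distribution or a normal approximation with a tie correction, both of which `scipy.stats.mannwhitneyu` provides.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x relative to sample y.

    Counts, over all len(x) * len(y) cross-group pairs, how often the x
    value exceeds the y value, counting ties as one half. The U for y is
    then len(x) * len(y) minus the U for x.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical follow-up lengths (weeks), invented for illustration only.
positive = [0, 4, 12, 12]
not_positive = [2, 12, 26, 52]
u = mann_whitney_u(positive, not_positive)
print(u, len(positive) * len(not_positive) - u)  # 4.0 12.0
```

Half-unit values such as the U = 75.5 reported above arise naturally from tied observations under this counting rule.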
With respect to study quality, there was a trend toward an association between lower design quality and positive outcomes: 19% (n = 3) of samples involving controlled comparisons (RCT, controlled trial, historical control) yielded significant positive findings, compared with 53% (n = 9) of uncontrolled samples. However, this association fell short of statistical significance (Fisher exact test, P = .07). A similar non-significant trend (Fisher exact test, P = .13) was noted for samples involving RCTs compared with other designs: only 17% (n = 2) of the RCT samples yielded a positive effect, and none of these employed an ITT analysis, whereas 48% (n = 10) of the samples from lower-quality trials yielded significant positive outcomes. There was no association between total sample size of study intervention groups and outcome (Mann-Whitney U = 62, P = .26).
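As an arithmetic check, the P = .07 reported for controlled versus uncontrolled samples can be reproduced with a two-sided Fisher exact test, assuming the underlying 2x2 table was 3 of 16 controlled and 9 of 17 uncontrolled samples with positive findings. That table is an inference from the reported percentages (3/16 ≈ 19%, 9/17 ≈ 53%), not one reported in the text. The function below is a hypothetical, minimal sketch: conditioning on the table margins, the top-left count is hypergeometric, and the two-sided P value sums the probabilities of all tables with the same margins that are no more probable than the observed one. `scipy.stats.fisher_exact` is the usual library alternative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. controlled vs uncontrolled samples); columns are
    outcomes (positive finding vs not).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    # Integer weights proportional to the hypergeometric probabilities, so
    # the "no more probable than observed" comparison is exact (no float ties).
    def weight(x: int) -> int:
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return tail / comb(n, col1)


# Reconstructed table: 3 positive / 13 not (controlled), 9 / 8 (uncontrolled).
p = fisher_exact_two_sided(3, 13, 9, 8)
print(f"P = {p:.2f}")  # P = 0.07
```

That the reconstructed table reproduces the reported P value is consistent with, though not proof of, the inferred cell counts.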

Discussion
The most salient finding of this review was the paucity of high-quality studies of the impact of depression or other ISGs on depression outcomes. Only a minority of the identified studies employed a control group, and two-thirds of RCTs either failed to use an adequate method of randomization or failed to specify the method of randomization. In addition, only 13% of studies of at least pre-post quality used an ITT analysis, and no study used multiple imputation to handle missing data. This low level of quality is a cause for concern, particularly given the trend toward an association between significant positive findings and low design quality.
Despite the apparent popularity of the Internet as a source of support for people with depression, there were only two studies of the effectiveness or efficacy of depression ISGs in improving mood. One comprised the control arm in a study of the effectiveness of a psychological therapy, and the other was an uncontrolled multi-time-point study of an existing public depression ISG. Although the findings from the latter study were promising, neither study was of sufficient quality to determine whether depression ISGs improve depression outcomes. Clearly, there is a need for an RCT of the effect of a depression ISG on depression status.
Although there were more studies of the effect on depression of ISGs for conditions other than depression, many of these studies were of low quality, and almost half employed multi-component interventions in which the ISG was only one component. Indeed, only two studies employed both a controlled design and a single-component intervention [27,29]. The first involved a structured 12-week breast cancer newsgroup intervention facilitated by a psychologist; in ITT analyses, depressive symptoms decreased more in the ISG group than in the control group. The second involved a sample of well adolescents and a sample of well college students who, after exposure to a negative mood induction, were given the opportunity to interact online with an unknown peer. Mood improved among adolescents assigned to online peer interaction relative to control adolescents, but there was no such effect among college students. Thus, the results of the two highest-quality studies are encouraging and suggest that further studies of ISGs of all types are warranted.
The finding that breast cancer ISGs were significantly more likely to be associated with positive results than ISGs of other types requires further investigation given that women with breast cancer are known to be at increased risk of depression [52]. If found to be effective in reducing depressive symptoms, such ISGs could provide an important mental health self-care and prevention tool for women with breast cancer. However, the status of the current results is unclear given that the majority of findings were derived from one research group and the studies were typically of low quality.
The finding that chatroom use tends to be associated with lower levels of depression among participants without depression or other medical conditions raises the possibility that chatroom use may protect against depression in universal samples of community members. However, much of this evidence comes from cross-sectional surveys. The direction of causation therefore cannot be determined, and chatroom use may be associated with other behaviors that, rather than the chatroom use itself, account for the lower depression levels.
Theoretically, online support groups could be particularly relevant and appropriate for users who are isolated or not able to access conventional or face-to-face services, either due to lack of mobility or geographic location. It is therefore of some concern that none of the studies investigated ISGs among older people and that only one study specifically focused on the effectiveness of an ISG for rural participants.

Limitations
A limitation of this study is that it does not include trials published after July 2007. To assess the impact of this limitation, the first author conducted a further search covering the period from August 2007 to May 2009, using the same search terms as the reported searches but limiting results to those incorporating the terms "depression," "depressive," or "mood." After excluding a published study reporting data from a dissertation already incorporated into the review [53], 14 new relevant papers were identified. Of these, six involved experimental studies [54-59] and the remaining eight were non-experimental [60-67]. No new descriptive studies of depression ISGs were identified. Of the experimental studies, all but two [54,55] incorporated potentially active components in addition to an ISG. Only one of the six employed an ITT analysis [58], and although three were RCTs [56-58], none specified the method of randomization. The remaining three experimental studies were controlled trials [54,55,59], but one employed a non-contemporaneous control [54]. Of the two single-component studies, one involved an ISG for Spanish-speaking immigrant women with breast cancer [54] and the other an ISG for Asian American women with a lesbian or bisexual orientation [55]. Neither resulted in a positive effect on depressive symptoms relative to a control.
Of the four multi-component trials [56-59], three reported a greater reduction in depressive symptoms in the intervention group [57-59]. The first of these involved an ISG and educational films for people with chronic pain or burnout ([57], RCT), but the effect was not sustained at follow-up. The second employed a discussion group in addition to a therapist-facilitated online group and an offline cognitive behavioral therapy program, although the latter is itself a known effective treatment for depression ([58], RCT). The third comprised a computer and Internet educational program for older people that incorporated, but was not limited to, participation in forums and virtual communities ([59], controlled trial). The remaining multi-component trial found no effect of a complex intervention incorporating an ISG component for rural-residing women with a chronic illness ([56], RCT): an intensive intervention involving peer-to-peer online support, expert-facilitated online group discussion, and online expert advice resulted in no greater reduction in depression than an information intervention alone or no intervention. The eight non-experimental studies investigated the relationship between chatroom (unspecified) use and depression; most used a cross-sectional design, and their findings were mixed. In summary, studies published since mid-2007 shed little additional light on the effectiveness of ISGs in reducing depressive symptoms and provide no further evidence concerning the efficacy of depression ISGs.

Conclusions
There is a need for high-quality research to investigate the effect of ISGs on depression outcomes. We acknowledge that there are significant challenges associated with designing and undertaking efficacy studies of ISGs. We acknowledge too that the appropriateness and feasibility of conducting such research on online self-help groups have been questioned [68]. However, we believe that creative researchers, together with consumers, can find a way to shed further light on an issue of unquestionable practical significance for millions of consumers worldwide.