Smartphones for Smarter Delivery of Mental Health Programs: A Systematic Review

Background: The rapid growth in the use of mobile phone applications (apps) provides the opportunity to increase access to evidence-based mental health care.
Objective: Our goal was to systematically review the research evidence supporting the efficacy of mental health apps for mobile devices (such as smartphones and tablets) for all ages.
Methods: A comprehensive literature search (2008-2013) in MEDLINE, Embase, the Cochrane Central Register of Controlled Trials, PsycINFO, PsycTESTS, Compendex, and Inspec was conducted. We included trials that examined the effects of mental health apps (for depression, anxiety, substance use, sleep disturbances, suicidal behavior, self-harm, psychotic disorders, eating disorders, stress, and gambling) delivered on mobile devices with a pre- to posttest design or compared with a control group. The control group could consist of wait list, treatment-as-usual, or another recognized treatment.
Results: In total, 5464 abstracts were identified. Of those, 8 papers describing 5 apps targeting depression, anxiety, and substance abuse met the inclusion criteria. Four apps provided support from a mental health professional. Results showed significant reductions in depression, stress, and substance use. Within-group and between-group intention-to-treat effect sizes ranged from 0.29-2.28 and 0.01-0.48 at posttest and follow-up, respectively.
Conclusions: Mental health apps have the potential to be effective and may significantly improve treatment accessibility. However, the majority of apps that are currently available lack scientific evidence about their efficacy. The public needs to be educated on how to identify the few evidence-based mental health apps available in the public domain to date. Further rigorous research is required to develop and test evidence-based programs.
Given the small number of studies and participants included in this review, the high risk of bias, and unknown efficacy of long-term follow-up, current findings should be interpreted with caution, pending replication. Two of the 5 evidence-based mental health apps are currently commercially available in app stores.

Introduction
[...] mobile devices [2], and around 13,600 health apps intended for use by consumers were available for download in Apple's App Store [3]. About 6% of these apps targeted mental health outcomes, while 18% focused on related health issues, such as sleep, stress, relaxation, and smoking behaviors. A survey among the Australian general public indicated that 76% would be interested in using mobile phones for mental health monitoring and self-management [4]. This suggests that mHealth is acceptable and may be a useful vehicle for enhancing access to evidence-based monitoring and self-help for individuals with mild-to-moderate common mental health conditions [4]. Clinical practice guidelines recommend cognitive behavior therapy (CBT) and self-help resources (such as mHealth) as options for psychological treatment for individuals experiencing mild-to-moderate symptoms of anxiety or depression [5]. mHealth apps can be used as stand-alone self-help programs or as a conjunctive treatment modality in guided programs, for example, part of a website or through direct contact with a mental health professional. The app can include treatment components such as cognitive therapy (CT), behavioral activation (BA), psychoeducation, or monitoring of symptoms.
Advantages of mHealth include the improvement of treatment accessibility and participant retention, real-time symptom and activity monitoring and tracking of treatment progress through ecological momentary assessment (EMA), provision of personalized feedback and motivational support, portability and flexibility of use, and the potential to improve adherence to treatment [6-10]. However, there are also disadvantages to using mobile devices for mental health. Technical problems and factors related to telecommunication can arise (eg, battery failures, reliability and sustainability of connections [11]), and issues of data security, patient privacy, and the identification and timely management of crises and risk of harm must be carefully considered when integrating smartphone technology into behavioral health care [12].
Previous research suggests that mental health interventions delivered through mobile apps can be effective in treating a range of mental health problems, such as depression, stress, and anxiety, and in supporting smoking cessation [6,7]. However, the thriving development of mental health apps warrants a systematic review of the available evidence base in this growing area. Previous reviews examining evidence-based mental health apps did not incorporate quantitative analyses [12] or included mHealth interventions that were not directly downloadable as an app (such as programs using SMS [short message service] text messaging or Internet-enabled interventions on mobile phones [6,13]). Therefore, the aim of this paper is to systematically review the available evidence-based apps directly downloadable on mobile devices (such as smartphones and tablets) for mental health symptoms or disorders (depression, anxiety, substance use, sleep disorders, suicidal behavior, psychotic disorders, eating disorders, stress, gambling) in children, adolescents, adults, and older individuals.

Search Strategy and Selection of Studies
A comprehensive literature search in bibliographic databases (MEDLINE, Embase, the Cochrane Central Register of Controlled Trials, PsycINFO, PsycTESTS, and Compendex and Inspec) for relevant articles published from January 1, 2008 (launch date of the first app), to May 30, 2013, was conducted. Terms indicative of mobile apps and mental health disorders were used to search these databases, with the search being limited to "humans", English, and peer-reviewed journals (see Multimedia Appendices 1-3 for the full search string). The identified titles and abstracts were screened for eligibility by 2 independent researchers. Full text copies of all potentially relevant papers, or papers where there was insufficient information in the abstract to determine eligibility, were obtained. Full text articles were further screened and discarded from further analyses if they met exclusion criteria. In addition, references of earlier reviews and reference lists of the included primary articles were examined. Furthermore, key technology journals (Cyberpsychology, Behavior, and Social Networking; Journal of Medical Internet Research; and Studies in Health Technology and Informatics) were hand-searched. We also reviewed Beacon, a website for evidence-based online programs for mental health, developed and delivered by the Centre for Mental Health Research at the Australian National University. Finally, a search was conducted of prominent individual authors' and researchers' names in the field of mHealth or Internet interventions (see Multimedia Appendix 4) in MEDLINE. Data extraction of relevant articles was completed by 2 independent researchers, with disagreements resolved through discussion or with a third researcher.
We applied strict inclusion criteria in order to investigate any evidence-based mental health apps that could be downloaded from app stores (eg, Google Play for Google Android [14] or the Apple iTunes store [15]). Studies examining the effects of mental health apps on mental health symptoms or disorders (depression, anxiety, substance use, sleep disorders, suicidal behavior, self-harm, psychotic disorders, eating disorders, stress, and gambling) that were directly downloadable on a mobile device (eg, smartphone or tablet) compared with a control group were included. The control group could consist of a wait list, treatment-as-usual, or another treatment. Studies without a control group (pre-post design) were also included. There was no restriction on participant age. Studies were excluded if they did not include an intervention, if mental health symptoms/disorders were not an outcome, or if the intervention was an Internet-based intervention, virtual reality exposure treatment, interactive voice response technology intervention, or a text messaging-only intervention without a mobile application component. Studies were also excluded if the intervention was downloaded on a computer and transferred (eg, through Bluetooth or infrared) to a mobile device, if the intervention targeted a medical disorder (eg, irritable bowel syndrome, diabetes), if the paper provided a description of the mobile application but no outcome data, or if the intervention was developed before 2008. Conference abstracts, protocol papers, case studies, non-peer-reviewed papers, and non-English papers were also excluded.

Quality Assessment
Study quality was assessed according to 6 basic criteria of the Cochrane Risk of Bias Assessment Tool [16]: sequence generation, allocation concealment, blinding of outcome assessors, incomplete outcome data, selective outcome reporting, and other sources of bias. For the third criterion (blinding of outcome assessors), we omitted blinding of participants, since blinding participants to treatment allocation is rarely achievable in intervention trials for mental health disorders.

Outcome Measures
Primary outcome measures included reduction of depression symptoms, anxiety symptoms, substance use, sleep disturbance, suicidal behavior (suicide ideation, suicide plans, and attempts), self-harm, psychotic symptoms, symptoms of eating disorders, and gambling, as assessed with validated mental health scales.

Statistical Analyses
When data were available and extractable, intention-to-treat (ITT) within-group and between-group effect sizes (Cohen's d) for the intervention group were calculated by taking the difference between the mean pre- and posttest scores (within-group effect size) or the difference between the posttest scores of the intervention and control groups (between-group effect size) and dividing by the pooled standard deviation. Effect sizes of 0.8 can be assumed to be large, while effect sizes of 0.5 are moderate, and effect sizes of 0.2 are small [17]. Where authors provided only t test statistics, we computed effect sizes using the formula $d = t / \sqrt{df}$ [18]. Hedges' g effect sizes were converted to Cohen's d. Authors were contacted to provide additional data if needed. Two studies [19,20] did not provide sufficient data to calculate ITT within-group effect sizes.
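The effect size calculations described above can be sketched as follows. This is a minimal illustration, not the code used for the review: the function names are hypothetical, the pooled standard deviation assumes equal weighting of the two sets of scores (the text does not state which pooling formula was applied), the "negligible" label for values below 0.2 is an added assumption, and the numbers in the usage lines are invented for demonstration only.

```python
import math


def pooled_sd(sd_a: float, sd_b: float) -> float:
    """Equal-weight pooled SD (assumption: the pooling formula is not given in the text)."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Difference between two means divided by the pooled SD.

    Within-group: a = pretest, b = posttest of the intervention group.
    Between-group: a = control posttest, b = intervention posttest.
    """
    return (mean_a - mean_b) / pooled_sd(sd_a, sd_b)


def d_from_t(t: float, df: float) -> float:
    """Conversion used when only a t statistic is reported: d = t / sqrt(df)."""
    return t / math.sqrt(df)


def interpret(d: float) -> str:
    """Label an effect size using the 0.2 / 0.5 / 0.8 benchmarks [17]."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label below 0.2 is an assumption, not from the text


# Hypothetical scores: pretest mean 20 (SD 5), posttest mean 15 (SD 5).
d_within = cohens_d(20.0, 5.0, 15.0, 5.0)
print(d_within, interpret(d_within))
# Hypothetical t statistic of 3.0 with 9 degrees of freedom.
print(d_from_t(3.0, 9))
```

Under these conventions a positive d indicates a lower symptom score in the second set of scores (posttest or intervention group).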

Selection and Inclusion of Studies
A total of 5464 abstracts in MEDLINE (n=1859), Embase (n=1030), the Cochrane Central Register of Controlled Trials (n=277), PsycINFO (n=1095), PsycTESTS (n=1), and Compendex and Inspec (n=1203) were examined (N=4997 abstracts in total, after removal of duplicates). The majority of records that were excluded addressed nonpsychological technical issues, provided descriptions of mobile apps without outcome data, or were protocol papers or conference abstracts. Of the screened abstracts, 133 full text papers potentially eligible for inclusion were retrieved for further consideration, of which 126 were excluded. Seven trials met inclusion criteria. A further screening for potentially relevant references in recent systematic reviews or meta-analyses and the included studies, individual author names in MEDLINE, and hand-searching of technology journals (Cyberpsychology, Behavior, and Social Networking; Journal of Medical Internet Research; and Studies in Health Technology and Informatics [January 1, 2008, to May 30, 2013]) and the Beacon website resulted in 95 potentially relevant abstracts and retrieval of 64 additional full text papers for further assessment. Of these, only 1 study met inclusion criteria and was included in the final analysis. In total, 8 trials were identified. These described 5 apps (Mobilyze! [11], mobiletype [21,22], DBT Coach [23], Mobile Stress Management [19,20,24], and Get Happy Program [25]) (see Figure 1 for a flowchart of the screening process). There was a high degree of consensus among raters who screened the titles and abstracts (an interrater reliability of 95.2%).
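The screening flow reported above can be tallied in a few lines. This is only a bookkeeping sketch using the totals stated in the text; the variable names are hypothetical, and the derived figures (duplicates removed, abstracts not taken to full text) are computed here rather than reported by the review.

```python
# Totals as stated in the text.
abstracts_identified = 5464      # all database hits
abstracts_after_dedup = 4997     # after removal of duplicates
full_text_retrieved = 133        # from the database search
full_text_excluded = 126
extra_abstracts = 95             # reference lists, author search, hand search, Beacon
extra_full_text = 64
extra_included = 1

# Derived figures (not reported directly in the text).
duplicates_removed = abstracts_identified - abstracts_after_dedup
abstracts_not_retrieved = abstracts_after_dedup - full_text_retrieved
included_from_databases = full_text_retrieved - full_text_excluded
included_total = included_from_databases + extra_included

print(f"Duplicates removed: {duplicates_removed}")
print(f"Abstracts not taken to full text: {abstracts_not_retrieved}")
print(f"Included from database search: {included_from_databases}")
print(f"Included in total: {included_total}")
```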

Characteristics of Included Studies
A total of 227 participants were recruited across all studies. One study [19] did not provide sufficient information about sample size per treatment arm. Of the 8 included studies, 4 trials describing 3 apps assessed depression (Mobilyze!, mobiletype, Get Happy Program), and 3 studies describing 1 app (Mobile Stress Management) assessed stress as a primary outcome measure. Substance use was used as an outcome measure in 1 study (DBT Coach). Table 1 provides an overview of the included studies (see Multimedia Appendix 5 for the complete version of the table). One study used BA and another used CBT as the therapeutic mode of the intervention. Two studies described a trial delivering emotional self-awareness (ESA), 1 study was based on dialectical behavioral therapy (DBT) and opposite action (ie, emotional regulation skills), and 3 studies described an app delivering stress inoculation training (SIT) as the content of the intervention. Four studies describing 3 trials used an attention-placebo as a control group, 1 study used an active comparison, and 1 study did not specify the nature of the control group. Two studies used a pre-post design without a control group, and all studies except one were feasibility and/or pilot studies. Two studies recruited adults from the community, 1 study recruited from an outpatient clinic, and 2 studies recruited from the workplace. Two studies describing 1 trial recruited adolescents from general practice, and 1 study targeted female university students. Four studies delivered the intervention through a stand-alone mobile app, while 3 studies describing 2 trials used a mobile app alongside a website and EMA to deliver the intervention. One study used a mobile application in conjunction with traditional face-to-face therapy. All included studies delivered the program on a mobile phone, with 1 study also including iPads. Delivery length varied between 6 days and 8 weeks. 
Five studies assessed posttest outcomes only, whereas 3 studies describing 2 trials undertook follow-up assessments as well (6 weeks and 3 months). Five studies describing 4 apps were guided by mental health professionals through phone or email contact, whereas in 3 studies describing 1 app, participants independently navigated their way through the trial.

Quality Assessment
The quality of the studies varied but was generally low (see Table 2). Three studies describing 2 apps reported adequate sequence generation [21,22,25], whereas 3 studies [19,20,24] did not outline their sequence generation method. Three studies [21,22,25] reported allocation to conditions by an independent (third) party, whereas 3 other studies [19,20,24] did not provide sufficient information on allocation. Two studies that included diagnostic interviews [21,22] reported using blinded outcome assessors, and 4 studies [19,20,24,25] did not report blinding of assessors or used self-report outcome measures. Two studies [11,23] were not eligible for ratings for sequence generation, allocation concealment, or blinding of outcome assessors due to the pre-post study design. In 6 studies [11,21-25], ITT analyses (completeness of follow-up data) were conducted; 1 of these failed to describe dropout rates [23], and only 1 study [11] described reasons for dropout during the intervention. Two studies [19,20] did not state the nature of the statistical analyses or dropout rate at all. Insufficient information and a high risk of bias of selective outcome reporting was present in 3 studies [19,20,23] and 2 studies [21,22] respectively. Three studies [11,23,25] had a high risk of other sources of bias (eg, absence of a control group, possible treatment infidelity) while for 5 studies [19-22,24] the risk of bias from other sources was unclear (due to significant difference at baseline for stress outcome, unequal number of participants in intervention and control group, and insufficient information). None of the included studies met all 6 quality criteria of the Cochrane tool (see Table 2).

Depression
Four studies describing 3 mobile apps [11,21,22,25] assessed depression. In a randomized controlled trial (RCT) of a guided mobiletype app with EMA conducted by Kauer et al [21] and Reid et al [22], no significant differences were found at posttest and follow-up on outcomes of depression, anxiety, and stress among adolescents from general practice compared to an attention control group (Depression Anxiety Stress Scales [DASS] anxiety: d=0.07, P=.76; DASS depression: d=0.09, P=.69). However, it should be noted that the control group received largely the same intervention as the experimental group, with the exception of two components: ESA training via EMA and minimal feedback reports. Mediator analyses yielded an indirect effect of group on depression via ESA (beta=-0.610, 95% CI -5.596 to -0.003). Significant small to moderate within-group differences over time were found for the intervention group.

Anxiety/Stress
Three RCTs describing 1 unguided mobile app (Mobile Stress Management) using SIT [19,20,24] found a significant decrease in state and trait anxiety (State-Trait Anxiety Inventory [STAI]) and a significant increase in active coping skills among oncology nurses [20,24] and female university students [19] compared to a control group. Grassi et al [19] used a simplified version of the Mobile Stress Management app, which was also effective for reducing stress. However, neither Villani et al [20] nor Grassi et al [19] provided statistical results for intervention versus control group comparisons. Villani et al [24]

Substance Use
A pilot feasibility study aiming to reduce substance use (alcohol, drugs, and tobacco) among adults suffering from borderline personality disorder using a mobile app (DBT Coach [23]) in conjunction with face-to-face DBT indicated a significant reduction (P<.05) within each DBT Coach session in emotional intensity and urge to use substances (d=0.52 and d=0.29, respectively). Furthermore, significant improvements (P<.05) in symptoms of depression (Beck Depression Inventory [BDI]: P=.014, d=0.55), global symptom severity (Brief Symptom Inventory: P=.021, d=0.43), and confidence in participants' ability to use opposite action (ie, emotion regulation) skills (Behavior Confidence Questionnaire: P=.008, d=0.59) were noted from pre- to postassessment. Multimedia Appendix 5 outlines the ITT within-group effect sizes. The DBT Coach app is publicly available for download.

Ecological Momentary Assessment
Mixed findings were obtained from the 2 studies using EMA as part of the intervention. In the Burns et al [11] study, promising accuracy rates (60-91%) were achieved in predicting categorical contextual states (eg, location) based upon participant EMA entries. For participant states rated on continuous self-report scales (eg, mood), predictive capability was poor. Notwithstanding these technological outcomes, Kauer et al [21] and Reid et al [22] demonstrated that increased self-monitoring with EMA by participants did lead to increased ESA and thereby reduced depressive symptoms.

Intervention Feasibility and Adherence
Three studies providing usability and feasibility outcomes (eg, acceptability of the technology, perceived usefulness, perceived utility) reported moderate to high rates of mobile phone usage, feasibility, and participant satisfaction with the intervention [11,23,25]. The dropout rate was reported in 4 studies and varied between 12.5% and 34.3% [11,21,22,25]. Reported reasons for dropout, where described, were mostly due to technical problems [11].

Principal Results and Comparison With Prior Work
In general, the studies included in this systematic review showed promising results for evidence-based mental health apps in reducing depressive symptoms and caseness, stress, anxiety, and substance use, similar to previous reviews of mHealth [6,7]. However, due to the high risk of bias in some studies, these findings need to be considered with caution pending replication. Due to the absence of a control group in 2 studies [11,23], it was difficult to determine whether the beneficial effects were attributable to the app itself, a function of natural remission or regression to the mean, or in the case of the DBT Coach app [23], due to the face-to-face DBT offered to all participants in conjunction with the app. Additionally, a clear conclusion about the efficacy of the DBT Coach for substance use treatment cannot be drawn yet, since, besides the absence of a control group, change in substance use (eg, amount of alcohol units per week) prior to or after treatment was not reported, nor was a distinction made between different types of substance use (alcohol, drugs, nicotine). Furthermore, some studies failed to provide sufficient information regarding dropout rates or did not report the statistical analyses used [19,20].
The mobiletype app was the only intervention that failed to yield any significant direct effect on depression, although a significant indirect effect was found in a reduction of depressive symptoms through the direct effect of increased ESA [21]. The attention-placebo control group received almost the same intervention as the experimental group, except for the ESA component, and this is likely to be the cause of the nonsignificant finding. This study suggests that repeated self-monitoring over time using EMA on a mobile device may increase ESA and thereby reduce depressive symptoms. Evidence supports a similar mechanism underlying improvements in depression with CBT, where one of the most important components of CBT for depression involves rating one's mood and activities in a diary to raise awareness of how activities influence mood states [26]. The development of mobile devices has facilitated the collection of EMA data, thereby providing a portable and convenient delivery mode with which an individual can incorporate EMA and regular mood monitoring in their daily lives and improve ESA as part of treatment for depression. Although EMA shows promising results in predicting categorical contextual states, it needs to be further optimized to be able to accurately predict mood states [11]. Once refined in such a way as to maximize accuracy and temporal resolution and minimize bias, EMA holds considerable potential to reveal the dynamic interplay between mood, cognition, and behavior, increase participant self-awareness of such processes, and thereby enhance mental health treatment [27]. Together with the use of biomedical and/or activity sensors, timely personalized feedback can be generated to prompt users. mHealth interventions therefore have the potential to improve current depression treatment considerably [10].
In a similar way to guided Internet interventions [28], guided apps might derive larger effect sizes and adherence rates than stand-alone self-help apps, but more research is necessary to elucidate this.

Usability, Helpfulness, and Satisfaction
Usability, helpfulness, and satisfaction ratings, where assessed, were moderate to high [11,23,25], indicating that mHealth apps are perceived to be a useful vehicle for enhancing access to evidence-based monitoring and self-help. However, common technical problems (eg, battery failure, connectivity, freezing of app) need to be overcome. Adherence rates (if reported) were high, in line with previous research in mHealth [29], and higher than adherence rates seen with Internet-based interventions [30]. The method of delivery (mobile phone), its portability and flexible usage, and/or its delivery of personalized feedback may account for these higher retention rates for mobile apps. However, some of the included studies provided subjects with monetary rewards for participation, which is likely to have artificially raised adherence rates as well.

Sustainability of Results
Most studies included only posttest assessment or a short-term follow-up (6 weeks). Although 1 study showed sustainable results at 3-month follow-up [25], sustainability of results over a medium- to long-term timeframe requires further investigation and replication. As such, on the basis of current evidence, sustainability of results cannot yet be determined.
Since mental health apps downloadable for use by the general population are increasing rapidly, despite evidence for their efficacy being largely unknown, the focus of this systematic review was on apps only. We applied very stringent inclusion criteria to ensure that we identified the evidence-based mental health apps that could be downloadable in the future by the general public from app stores, for example, Google Play for Google Android [14] or the Apple App Store [15]. Therefore, several highly sophisticated programs using mobile technology were excluded, such as the myCompass program [31] for depression, anxiety, and stress. The CBT-based myCompass program is delivered via a website with an Internet-enabled mobile phone component and encourages real-time self-monitoring of moods, mood triggers, and lifestyle behaviors using SMS text messaging and email prompts. Other examples of similar programs include an SMS-based txt2quit intervention [32] and a video-based STUB IT intervention [33], both of which have been shown to be effective for smoking cessation, and an SMS-based intervention [34] to increase medication adherence in individuals with schizophrenia. We were also unable to include the innovative INTREPID research [35], which used virtual reality exposure therapy on mobile phones to reduce anxiety.
There are more than 3000 mental health apps for Android, Apple, and Microsoft freely available to download to date, compared to the 5 evidence-based apps (described in 8 papers) we identified through our systematic review. Only 2 of the apps included in this review are currently available for public download, comprising less than 1% of the commercially available apps. A recently published review on existing (commercial) mHealth apps for the most prevalent health conditions in the Global Burden of Disease list provided by the World Health Organization [36] echoes this finding. The authors concluded that the development of mHealth apps was first and foremost driven by commercial and economic motivations rather than scientific motivations behind research. Although the numerous protocols [9,37] and case studies [38,39] we excluded indicate a nascent field of research, the rapid growth and development of thousands of non-evidence-based mental health technologies has generated the need for independent regulation. This is underlined by the alarming findings from previous research [40-42] indicating that only 13-26% of Web-based or app-based interventions for smoking cessation adhere to treatment guidelines. A recent study on commercial apps using EMA for alcohol use echoes these findings [43]. The US Food and Drug Administration has taken an important step towards the development of quality control guidelines for health apps [44], but there are still major issues and dangers concerning the lack of quality control of commercially available mental health apps. Further research and work must be undertaken to develop, test, and disseminate evidence-based mHealth interventions among the public to ensure optimal public health outcomes.

Limitations
This review has several limitations. First, despite an extensive search, the number of included studies was small, which restricted our interpretations as to whether mHealth apps have an effect on reducing mental health symptoms. Second, the number of participants in the included studies was small. As a result, the studies were probably underpowered to detect the more subtle effects of the interventions. Furthermore, small sample sizes hamper the precision and accuracy of the statistical results and therefore limit our interpretations [45]. Third, the quality of the included studies was low. Historically, low-quality trials have tended to yield positive results [46]. Due to the small number of studies, we were unable to examine whether significant differences existed between higher- and lower-quality studies. Fourth, there were no studies that examined the long-term efficacy of mental health apps. Therefore, long-term effects remain as yet unknown. Finally, only studies from peer-reviewed, English language journals were included in this review. However, the effect of language bias has been shown to have a minimal impact on the conclusions of systematic reviews [47].
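To give a rough sense of what "underpowered" means here, the sketch below estimates how many participants per arm a two-group comparison would need to detect small, moderate, and large effects. It is an illustration under stated assumptions, not an analysis from the review: it uses the standard normal-approximation formula for a two-sided test, with alpha=.05 and 80% power chosen as conventional defaults, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect effect size d.

    Normal approximation for a two-sided, two-group comparison:
    n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Benchmarks for small, moderate, and large effects (0.2, 0.5, 0.8).
for label, d in [("small", 0.2), ("moderate", 0.5), ("large", 0.8)]:
    print(f"{label} (d={d}): about {n_per_group(d)} per group")
```

Under these assumptions, detecting a small effect requires roughly 393 participants per group, a moderate effect 63, and a large effect 25, whereas the 8 included studies recruited 227 participants in total.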

Future Research
There is a very clear need for more research in this area. Trials with an RCT design of high quality to minimize risk of bias are needed to determine the efficacy of mental health apps. Unfortunately, the competitive nature and time-consuming process of grant applications and RCT designs necessary for such high-quality research contrasts sharply with the speed of development in this highly innovative technology. Component testing with small sample sizes may offer one solution to help bridge the gap between academia and real-world applications [48]. Research is particularly weak in the domains of sleep disturbance, anxiety disorders, and smoking cessation and needs further investigation. The cost-effectiveness and cost-utility of mHealth, compared to standard care or Internet-based treatment, requires further examination.

Conclusions
In summary, although a firm conclusion cannot yet be drawn, the current systematic review suggests that mobile apps for mental health have the potential to be effective in reducing depression, anxiety, stress, and possibly substance use for individuals experiencing these symptoms. Given the widespread usage of mobile phones and smartphones and the increasing uptake of tablet devices, mHealth has the potential to increase treatment accessibility globally. The difference in the volume of commercial apps compared to the small number of tested evidence-based apps is striking. It underscores the need for public education, further development of and research into evidence-based mental health apps, and consideration of industry regulation.