Outcomes of Psychological Therapies for Prisoners With Mental Health Problems: A Systematic Review and Meta-Analysis

Objective: Prisoners worldwide have substantial mental health needs, but the efficacy of psychological therapy in prisons is unknown. We aimed to systematically review psychological therapies with mental health outcomes in prisoners and to qualitatively summarize difficulties in conducting randomized clinical trials (RCTs). Method: We systematically identified RCTs of psychological therapies with mental health outcomes in prisoners (37 studies). Effect sizes were calculated and meta-analyzed, and eligible studies were assessed for quality. Subgroup and metaregression analyses were conducted to examine sources of between-study heterogeneity, and a thematic analysis reviewed difficulties in conducting prison RCTs. Results: In 37 identified studies, psychological therapies showed a medium pooled effect size (0.50, 95% CI [0.34, 0.66]) with high heterogeneity; the most evidence was for CBT and mindfulness-based trials. Studies that used no-treatment (0.77, 95% CI [0.50, 1.03]) or waitlist controls (0.71, 95% CI [0.43, 1.00]) had larger effect sizes than those that used treatment-as-usual or other psychological therapies as controls (0.21, 95% CI [0.01, 0.41]). Effects were not sustained at 3- and 6-month follow-up. No differences were found between group and individual therapy or between treatment types. The use of a fidelity measure was associated with lower effect sizes. Qualitative analysis identified difficulties with follow-up and institutional constraints on the scheduling and implementation of trials. Conclusions: CBT and mindfulness-based therapies are modestly effective for depression and anxiety outcomes in prisoners. In prisons with existing psychological therapies, more evidence is required before additional therapies can be recommended.

To address this, many countries have introduced specialist mental health services in prisons, but these vary considerably within and between countries, including in the provision of psychological therapies. Little is known about which treatments are supported by good-quality evidence. Evidence from community settings may not generalize to prisons because of the particular challenges of delivering treatment there, arising both from individual characteristics (including comorbidity) and from the nature of the prison environment.
A number of systematic reviews of mental health interventions for prisoners have been published (Bartlett et al., 2015; Fontanarosa, Uhl, Oyesanmi, & Schoelles, 2013; Heckman, Cropsey, & Olds-Davis, 2007; Himelstein, 2011; Kouyoumdjian et al., 2015; Leigh-Hunt & Perry, 2015; Morgan & Flora, 2002; Morgan et al., 2012; Ross, Quayle, Newman, & Tansey, 2013; Shonin, Van Gordon, Slade, & Griffiths, 2013; Sirdifield, Gojkovic, Brooker, & Ferriter, 2009). However, most focus on selected populations and disorders (Leigh-Hunt & Perry, 2015) or specific therapies (Shonin et al., 2013), or combine randomized and nonrandomized trials (Bartlett et al., 2015; Morgan et al., 2012). Other reviews have been broader literature reviews that examined different study designs (including theoretical papers, audits, needs assessments, and screening; Sirdifield et al., 2009) or included interventions outside prison (Fontanarosa et al., 2013). One review of English-language studies that covered a broad range of interventions and outcomes using dichotomous diagnoses found a large effect size (ES = 0.87) but did not explore sources of heterogeneity or compare outcomes by treatment type (Morgan et al., 2012). Another recent review covered RCTs to improve health during imprisonment and a year after release, but it covered a wide range of interventions, mostly for physical health and drug abuse (Kouyoumdjian et al., 2015), did not meta-analyze findings, and used a search strategy that was not optimized for identifying psychological treatments. Thus, previous reviews have been limited in examining the efficacy of psychological therapies by being either too specific or overly broad. This paper aims to address these gaps with a systematic review and meta-analysis restricted to RCTs of psychological therapies in unselected samples of prisoners.
For the purposes of this review, prisoners are considered to be presentenced (also known as remand prisoners or detainees) and sentenced individuals in jails and prisons, but not persons in police custody or other forms of administrative detention (such as immigrant detention centers). We sought to compare effect sizes across different types of psychological therapies and examine sources of heterogeneity. In addition, we qualitatively examined the difficulties in implementing RCTs of psychological therapies in prisons in order to make further recommendations for research.

Protocol and Registration
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed (Moher, Liberati, Tetzlaff, & Altman, 2009). The protocol was prospectively registered in PROSPERO (n.d.), the International Prospective Register of Systematic Reviews, to minimize reporting bias through adherence to the initial protocol and to avoid duplication, since registration lets researchers see which systematic reviews are in progress before undertaking their own.

Search Strategy
PsycINFO, MEDLINE, Global Health, PubMed, CINAHL, National Criminal Justice Reference Service, Scopus, EMBASE, and Cochrane Library were searched from their start dates until May 30, 2015. Additional targeted searches were conducted by hand-searching citations and reference lists of other systematic reviews and articles. Targeted searches on specific authors (identified from previous papers), mindfulness-based therapies, and treatments for psychopathy were conducted separately. We corresponded with authors to clarify data when necessary. Details about keywords are outlined in Appendix A.

Study Eligibility
Inclusion and exclusion criteria were as follows.
Study design. RCTs, including pilot studies and cluster-randomized trials, were included. Nonrandomized trials (including pretest/posttest comparisons) and case studies were excluded.
Population. Prisoners (including juveniles, remand prisoners, and detainees) were included. Samples not currently in prison (e.g., recipients of post-release treatments [Sacks, McKendrick, & Hamilton, 2012], people on parole, and people in secure hospitals or therapeutic communities outside prisons) were excluded.
Interventions. Cognitive behavioral therapy, dialectical behavior therapy, mindfulness-based therapy, and other group treatments such as music therapy and art therapy (including self-help treatments) were included. Studies examining only medication were excluded.
Outcomes. Studies that reported psychological improvement measured by standardized instruments at posttreatment and follow-up were included. Outcomes restricted to recidivism or substance use were excluded.
Language. Studies in any language, including unpublished reports (e.g., doctoral theses), were considered. Studies that did not provide data to calculate effect sizes were excluded.
Studies treating psychopathy or sociopathy in prisons were not included because none of the identified studies had standardized psychological outcomes.

Data Extraction and Quality Assessment
In addition to effect sizes, 95% confidence intervals, variance of outcomes, and prespecified study characteristics were recorded. The primary outcome for each included study was selected as the most commonly used psychological assessment, to facilitate comparisons. A second extractor (a consultant psychiatrist with prison experience) extracted data independently, and any disagreements were resolved.
Eligible studies were assessed using the quality checklist of the National Institute for Health and Care Excellence (NICE; see Appendix B), which assesses internal validity, such as the use of an adequate concealment method for participant allocation (concealing the allocation sequence from research and clinical staff and participants until participants are permanently assigned to each study group), blinding of subjects and investigators, and intention-to-treat analyses. The overall rating was − (few or no criteria fulfilled), + (some fulfilled), or ++ (all or most fulfilled).

Statistical Analysis
Effect size calculation. The standardized mean difference (d), its 95% confidence interval, and its variance were calculated for each study (Wilson, 2001). For studies with more than one control group, the control group that received more therapy was chosen over the waitlist control to give a more conservative estimate. For studies that compared two different treatment groups, each treatment group was compared independently with the controls. Double-counting of participants did not apply, as no studies reported participants in both intervention groups.
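To illustrate the effect size step, the sketch below computes d, its variance, and a 95% CI from posttreatment means and standard deviations using the standard large-sample formulas. It is a minimal sketch under stated assumptions, not the calculator cited above: the function name, the pooled-SD denominator, and the convention that lower scores mean fewer symptoms (so a positive d favors treatment) are our assumptions, and the example numbers are hypothetical.

```python
import math


def standardized_mean_difference(m_treat, sd_treat, n_treat,
                                 m_ctrl, sd_ctrl, n_ctrl,
                                 lower_is_better=True, z=1.96):
    """Return (d, variance, (ci_low, ci_high)) for a two-arm comparison.

    Assumption: on symptom scales lower scores mean fewer symptoms, so the
    difference is taken as control minus treatment and a positive d
    indicates benefit of treatment over control.
    """
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_treat - 1) * sd_treat ** 2 +
                           (n_ctrl - 1) * sd_ctrl ** 2) /
                          (n_treat + n_ctrl - 2))
    diff = (m_ctrl - m_treat) if lower_is_better else (m_treat - m_ctrl)
    d = diff / pooled_sd
    # Large-sample variance of d.
    var = (n_treat + n_ctrl) / (n_treat * n_ctrl) + d ** 2 / (2 * (n_treat + n_ctrl))
    se = math.sqrt(var)
    return d, var, (d - z * se, d + z * se)


# Hypothetical posttreatment depression scores (not from any included trial).
d, var, ci = standardized_mean_difference(10.0, 4.0, 20, 12.0, 4.0, 20)
```

Here d = 0.5, and with 20 participants per arm the 95% CI still spans zero, which shows why small prison trials produce imprecise individual estimates.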
Meta-analysis. Given the clinical heterogeneity between studies, random-effects models were used. Statistical heterogeneity was assessed with the I² statistic, which represents the percentage of the observed variation in effect sizes across studies that is due to true heterogeneity rather than chance; values of 25%, 50%, and 75% indicate low, moderate, and high heterogeneity, respectively (Higgins, Thompson, Deeks, & Altman, 2003).
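A minimal sketch of random-effects pooling with I², assuming the DerSimonian-Laird estimator of between-study variance (τ²). This is a common default, but the text does not name the estimator used, and the function name is hypothetical. Inputs are per-study d values and their variances.

```python
import math


def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # I^2: percentage of total variation attributed to true heterogeneity.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "se": se, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical example: two studies with equal precision but divergent effects.
res = dersimonian_laird([0.2, 0.8], [0.04, 0.04])
```

In this toy case I² is about 78%, which by the thresholds above counts as high heterogeneity, and τ² widens the pooled CI relative to a fixed-effect model.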
Effect sizes were grouped into domains and presented in forest plots. First, the studies were grouped by comparator type: one group included no-treatment controls (including no-contact groups), a second included waitlist controls, and a third included active treatment controls, such as treatment-as-usual or another form of psychological therapy (e.g., individual supportive therapy, a standard prison-based therapeutic community, supportive group therapy [SGT], or attention-matched manualized psychoeducation; Ford, Chang, Levine, & Zhang, 2013; Johnson & Zlotnick, 2012; Messina, Grella, Cartier, & Torres, 2010; Perkins, 1998; Wilson, 1990).
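Comparator subgroups can be compared by testing whether their pooled estimates differ more than expected by chance. The sketch below computes Cochran's Q between subgroups from each subgroup's pooled estimate and standard error. It is an illustrative assumption, since the text does not state which subgroup test was used. With three comparator groups (no treatment, waitlist, active), df = 2, and for that case only the chi-square tail probability has the closed form exp(-Q/2). The function names and input values are hypothetical.

```python
import math


def q_between(subgroup_estimates, subgroup_ses):
    """Cochran's Q for differences among k subgroup pooled estimates.

    Under the null hypothesis of no subgroup difference, Q is approximately
    chi-square distributed with k - 1 degrees of freedom.
    """
    w = [1.0 / se ** 2 for se in subgroup_ses]    # inverse-variance weights
    mean = sum(wi * yi for wi, yi in zip(w, subgroup_estimates)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, subgroup_estimates))
    return q, len(subgroup_estimates) - 1


def chi2_sf_df2(q):
    """Upper-tail chi-square probability; this closed form holds only for df = 2."""
    return math.exp(-q / 2.0)


# Hypothetical pooled estimates and SEs for three comparator subgroups.
q, df = q_between([0.8, 0.6, 0.2], [0.15, 0.15, 0.10])
p = chi2_sf_df2(q) if df == 2 else None
```

Treating subgroup estimates as independent is a simplification. It holds here because each study contributes to only one comparator group.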

Metaregression and Publication Bias
Metaregression analysis was performed to examine sources of heterogeneity across a range of prespecified factors. For the dichotomous version of the gender variable, samples that were more than 90% male were classified as male, even when they included some females. Because of the large number of U.S.-based studies (n = 26) and the few studies from each of the other countries, country setting was analyzed as U.S. versus the rest of the world.
In metaregression, variables with p values < 0.1 in univariate analyses were included in multivariable models. Multivariable analysis was conducted with all of these variables entered simultaneously, using either the dichotomous or the continuous version of each variable to avoid collinearity (Chatterjee & Hadi, 2006). If fewer than 10 studies reported the explanatory variable(s) of interest, metaregression analysis was not performed.
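The screening rule above (univariate p < 0.1, at least 10 reporting studies) can be sketched as follows. This is a simplified illustration, not the authors' Stata code. It fits each univariate metaregression by weighted least squares with weights 1/(vᵢ + τ²) for a caller-supplied τ² and a Wald z test. Most metaregression software instead estimates residual τ² iteratively (e.g., by REML), so treat the estimator as an assumption. Function and variable names are hypothetical.

```python
import math


def weighted_metaregression(x, y, variances, tau2=0.0):
    """Univariate meta-regression by weighted least squares.

    Weights are 1 / (v_i + tau2) with tau2 supplied by the caller; this is
    a simplified sketch, not an iterative REML fit.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))  # must be > 0
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)                 # model-based SE of the slope
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided Wald p-value
    return {"slope": slope, "intercept": ybar - slope * xbar, "se": se, "p": p}


def screen_candidates(candidates, effects, variances, tau2=0.0,
                      p_threshold=0.1, min_studies=10):
    """Return names of study-level variables that pass univariate screening.

    `candidates` maps a variable name to per-study values (None = not
    reported). Variables reported by fewer than `min_studies` studies are
    skipped, mirroring the 10-study rule described in the text.
    """
    selected = []
    for name, values in candidates.items():
        idx = [i for i, v in enumerate(values) if v is not None]
        if len(idx) < min_studies:
            continue
        fit = weighted_metaregression([values[i] for i in idx],
                                      [effects[i] for i in idx],
                                      [variances[i] for i in idx], tau2)
        if fit["p"] < p_threshold:
            selected.append(name)
    return selected
```

Variables that pass this screen would then be entered together in the multivariable model.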
To test for publication bias, funnel plot analysis and Egger's test were performed (Sterne & Egger, 2001; Sterne et al., 2011; Tacconelli, 2010). As an exploratory analysis, trim-and-fill analysis (with a random-effects model) was also conducted on the total sample and on a subset of samples (studies with no-treatment/waitlist controls) to identify and correct for funnel plot asymmetry attributable to publication bias (Peters, Sutton, Jones, Abrams, & Rushton, 2007). Analyses were performed in Stata/IC 14.
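Egger's test regresses each study's standardized effect (d/SE) on its precision (1/SE). An intercept far from zero suggests small-study effects, such as publication bias. The sketch below is a minimal ordinary-least-squares version under that standard formulation. The function name is hypothetical, and it returns the t statistic (k − 2 degrees of freedom) rather than a p-value, because the Python standard library has no t distribution.

```python
import math


def eggers_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (y / se) on precision (1 / se) by
    ordinary least squares; the intercept estimates asymmetry and its
    t statistic has k - 2 degrees of freedom.
    """
    k = len(effects)
    x = [1.0 / s for s in ses]                     # precision
    z = [y / s for y, s in zip(effects, ses)]      # standardized effect
    xbar, zbar = sum(x) / k, sum(z) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    resid = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)      # residual variance
    se_int = math.sqrt(s2 * (1.0 / k + xbar ** 2 / sxx))
    # If the fit is exact (zero residuals) the t statistic is undefined.
    t = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "slope": slope, "se": se_int,
            "t": t, "df": k - 2}


# Hypothetical, deliberately asymmetric data: smaller studies (larger SE)
# report larger effects.
res = eggers_test([0.6, 0.7, 0.75, 1.0], [0.1, 0.2, 0.25, 0.5])
```

In this constructed example the intercept is 1.0, reflecting the built-in association between imprecision and effect size.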

Qualitative Analysis
For a qualitative analysis of the difficulties of conducting RCTs of psychological therapies in prisons, the discussion sections (in particular the limitations) of the included studies were reviewed through a thematic analysis, which identifies key recurrent messages from a series of studies (Bearman & Dawson, 2013). The identified factors were organized thematically by the frequency of their appearance in these studies, and those mentioned by at least two independent researchers were extracted for this synthesis.
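The frequency-ordering and extraction rule can be expressed as a short counting step. The sketch assumes that "at least two independent researchers" means at least two distinct included studies, and that each theme counts once per study. Both are interpretive assumptions, and the study IDs and theme codes are hypothetical.

```python
from collections import Counter


def recurrent_themes(coded_studies, min_studies=2):
    """Rank themes by how many studies mention them, keeping recurrent ones.

    coded_studies maps a (hypothetical) study ID to the list of themes coded
    from its discussion/limitations section. Each theme counts once per study.
    """
    counts = Counter()
    for themes in coded_studies.values():
        counts.update(set(themes))  # de-duplicate within a study
    return [(theme, n) for theme, n in counts.most_common() if n >= min_studies]


# Hypothetical coding of four studies (IDs and codes are illustrative only).
coded = {
    "study_A": ["follow-up", "institutional constraints"],
    "study_B": ["follow-up"],
    "study_C": ["follow-up", "institutional constraints", "institutional constraints"],
    "study_D": ["small sample"],
}
print(recurrent_themes(coded))
# [('follow-up', 3), ('institutional constraints', 2)]
```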

Main Results
Study characteristics. We identified 37 studies from 31 publications (see Figure 1), published between 1979 and 2015 in 7 countries (China, India, Iran, Norway, Spain, the U.S., and the U.K.). These included 2,761 prisoners, 59% of whom were male. The mean age was 31.8 years (adult prisoners: 34.4 years; juveniles: 16.9 years). All identified studies recruited voluntary participants through informed consent, and none of the studied treatments were mandatory. Sixteen studies had either a specific diagnosis, such as PTSD (n = 6) or depression (n = 2), or specific symptoms in their inclusion criteria (see Appendix C for details of included studies).
In addition, 12 studies had a satisfactory fidelity measure of treatment, 5 had a partial measure, 9 had no measure, and 11 did not report on fidelity. Seven studies used double-blinding.
Specific types of outcomes. Twenty studies that measured depression outcomes had a pooled effect size of 0.60, 95% CI [0.38, 0.83], with high heterogeneity (I² = 71%, 95% CI [54%, 81%]; see Figure 3). Effect sizes were higher in trials that used no-treatment and waitlist controls.
Psychological treatments were effective for other mental health outcomes including anxiety, overall psychopathology, trauma, and anger/hostility but not for somatization (see Table  1).
Effect sizes at follow-up. Six studies investigated outcomes at 3 months posttreatment, and reported a nonsignificant pooled effect size of 0. . When the studies were stratified by treatment type, effect sizes did not significantly differ (see Figure 4).

Metaregression Results
Univariate metaregression analysis. Higher attrition rates and the use of no treatment/waitlist controls correlated with higher effect sizes (see Table 2).
The second most commonly identified problem was institutional constraints which reflected two main subcategories: constraints on the

Discussion
We have reported a systematic review and meta-analysis of RCTs of psychological therapies focused on prisoner mental health outcomes, based on 37 studies involving 2,761 prisoners. The random-effects pooled effect size was 0.50 (95% CI [0.34, 0.66]), which would represent a medium effect (Cohen, 1977), but when RCTs were limited to those with active controls, the effect size was reduced to 0.21 (95% CI [0.01, 0.41]). This pattern was consistent for specific mental health problems, such as depression, for which there was the most evidence.

Implications
There were four main implications. First, this review suggests that RCTs of CBT and mindfulness-based therapies provide moderate evidence of improvement in depressive and anxiety symptoms in prisoners where no preexisting treatments are in place, with mindfulness-based therapies possibly demonstrating higher effect sizes. The mechanisms underlying such treatment efficacy need exploration (van der Velden et al., 2015). Second, trauma-based therapies demonstrated limited evidence of effect on trauma symptomatology. Although the difference between types of therapy was not statistically significant, both visual inspection and subgroup analysis showed that effect sizes for trauma symptom outcomes were consistently lower than those for other mental health problems such as depression or anxiety. Improving trauma-based treatments should be prioritized given the high prevalence of PTSD in prisons (4% to 21%; Goff et al., 2007). Prisoners not only arrive with high levels of existing trauma symptoms but are also prone to traumatic experiences in prison. Future research should therefore take repeat traumas during imprisonment into account in both treatment delivery and the assessment of outcomes. In contrast, we found that trauma symptoms were reduced after psychological treatments in prisoners, but this was in trials using all therapeutic approaches, not only trauma-based ones. This suggests that reducing trauma symptoms in prisoners may benefit more from improving psychological treatments generally than from introducing specific types of therapy. Third, it was difficult to draw conclusions about action-oriented approaches (such as art and music therapy) because of the lack of research and the difficulty of interpreting pooled estimates based on different treatments.
These methods are not widely available to prisoners but may provide alternatives for those not interested in current treatments and may be more cost-effective (Bilderbeck et al., 2013), partly because they are more accessible and less stigmatizing for male prisoners than other psychosocial treatments (Byrne, 2000). A final implication follows from the finding that outcomes did not differ significantly by participation format (group vs. individual). This suggests that group therapies could be considered as a baseline psychological intervention if resources are limited, although they will not be appropriate for acute illnesses. Caution is warranted in interpreting this lack of difference, as there may be other explanations. For example, treatment dosage differed: the average treatment length was 10 weeks for group therapies, 6 weeks for individual therapies, and 12 weeks for combined formats (treatments comprised weekly or biweekly sessions).
Most of the included trials involved short-term treatment, with an average length of 10 weeks. Providing short-term psychological therapies can be efficient, particularly as the review found that treatment length did not alter treatment effects. However, as psychological gains were not maintained at 3 and 6 months, further research is needed to clarify ways to retain short-term gains, and consideration should be given to additional sessions after a treatment program ends. In addition, future research should investigate combined individual and group treatments.
Qualitative analysis of the difficulties in conducting RCTs in prisons suggested that improving research design would not overcome many obstacles, because many stemmed from structural factors in prison research (such as following up prisoners and scheduling treatments). Involving the relevant custodial staff and departments early in research design and implementation planning may address these problems.
We identified shortcomings in trial design in many of the included RCTs. Small sample sizes, in particular, could be overcome by multicenter trials (Bilderbeck et al., 2013; Chen et al., 2016; Sleed et al., 2013). In multisite trials, adherence to the study protocol must be thoroughly checked to ensure that results are comparable across sites. In addition, few studies used a fidelity measure to ensure consistent quality and delivery of treatment (Bond, Evans, Salyers, Williams, & Kim, 2000). We found that the presence of a fidelity measure was associated with lower effect sizes. A possible explanation is that fidelity measurement accompanies more stringent study conditions, which make trials less prone to bias such as lack of blinding.
Prison populations exhibit high levels of psychopathology but also elevated levels of comorbidity, including personality disorder and substance use. If research and treatment pathways fail to take these comorbidities into account, any treatment approach that focuses on a single diagnostic group may have difficulty identifying and interpreting the true clinical effect, or may exclude individuals with notable health and social needs. For example, a pilot scheme in England that extended a community service, Improving Access to Psychological Therapies (IAPT), into prison found that limiting access for prisoners with more complex presentations excluded high-need individuals (Forrester, MacLennan, Slade, Brown, & Exworthy, 2014). The provision of more specialist and targeted services should, however, continue to be considered for acute cases and for those who do not respond to available treatment approaches. A more joined-up approach between offending and health pathways may be warranted. Many jurisdictions provide large-scale psychological treatment programs that address offending needs, including emotional management. These programs have run successfully for decades, and although their impact on mental health is uncertain, future research could consider their broader psychological outcomes.

Comparisons
Evidence comparing psychological and pharmacological treatments for prisoners is lacking, as we did not identify head-to-head  (Leucht, Helfer, Gartlehner, & Davis, 2015). Treatment effects were not sustained at 3-month and 6-month follow-up in the studies in this review that examined longer-term outcomes. This contrasts with trials of antidepressants and antipsychotics for acute treatment in the community, whose effects appear to be sustained at follow-up (e.g., for antidepressants at 12 (Chang, Lichtenstein, Langstrom, Larsson, & Fazel, 2016)). Comparisons will need to take into account the differential adherence patterns of psychotropic medication and psychological treatments. A recent review of mostly CBT, disorder-specific psychotherapies, and psychodynamic approaches reported an effect size of 0.58 (Huhn et al., 2014), similar to our pooled estimate of 0.50. Community-based trials have also found that studies with no-treatment/waitlist controls have higher effect sizes than subgroups with more active controls (such as those receiving placebo, treatment as usual, and noneffective therapy; Huhn et al., 2014). This supports the view that active treatment controls are likely to have better posttreatment outcomes than no-treatment/waitlist controls because of placebo or other nonspecific benefits of the intervention offered to the control group. Finally, the current review did not show clear differences by participation format (individual vs. group), similar to community studies (Gaudiano & Miller, 2013).

Strengths and Limitations
To our knowledge, this is the first comprehensive meta-analysis of all psychological therapies for prisoners. It includes 37 trials, more than a previous review of 15 investigations (Morgan et al., 2012), although the latter focused on prisoners with dichotomous diagnoses. The current review also provides a more conservative estimate of effect (ES = 0.50) than the 2012 review (ES = 0.87), likely because of the larger number of included studies. On the other hand, some limitations need to be considered. Double-blinding is difficult in psychological treatment studies (Huhn et al., 2014). Lack of blinding can bias results in favor of treatment, and imperfect blinding has been a commonly identified issue in other meta-analyses of psychotherapy studies (Gold, Voracek, & Wigram, 2004; Huhn et al., 2014; Sensky, 2005). In addition, 8 studies did not use intention-to-treat (ITT) analyses (ES = 0.58, 95% CI [0.14, 1.01]), which might bias results in favor of the treatment group if noncompleters would have shown smaller treatment effects; however, these studies did not differ from those that used ITT analyses (ES = 0.46, 95% CI [0.29, 0.63]). A further, related limitation was the analytic strategies employed. Apart from one trial (Johnson & Zlotnick, 2012), studies did not use analysis of covariance (ANCOVA) when reporting posttreatment outcomes; using pretreatment scores as a covariate when comparing posttreatment scores would yield a more precise effect size estimate. Apart from four investigations Zlotnick, 2002; Zlotnick, Johnson, & Najavits, 2009), studies included in the review relied on self-report measures for outcomes (Ahrens & Rexford, 2002; Loper & Tuerk, 2011; Rohde, Jorgensen, Seeley, & Mace, 2004; Wilson, 1990). However, such measures are appropriate for many psychological trials: clinical interviews that only check for the presence or absence of a formal diagnosis may not be sensitive to treatment change, and many trials did not require a baseline diagnosis.
Nevertheless, some triangulation of outcomes (with clinical and possibly biological markers) should be considered in future work. Furthermore, outcomes of specific disorders were not examined as a subgroup analysis in this review because few studies required participants to have a clinical diagnosis. Future work could consider recruiting prisoners with specific diagnoses, particularly severe mental disorders that are overrepresented in custodial populations and whose outcomes are worse than those of other prisoners. The shortage of empirically tested treatments targeting specific psychological diagnoses in prisoners seems to be largely a result of structural factors, such as the institutional constraints of prison settings discussed in the thematic analysis in this review (see also Appendix E). However, the fundamental purpose of prisons is not the care and treatment of those with severe mental illness, and the emphasis in many jurisdictions is on transferring them to secure hospitals to access the full range of appropriate care and treatment in an explicitly therapeutic environment. The interventions that prisoners may access in secure hospitals were not included in the review.
We found high levels of heterogeneity, so our overall effect size should be interpreted with caution. High heterogeneity is not unusual in meta-analyses of RCTs and partly reflects the diverse populations studied (Higgins et al., 2003). We addressed this partly by conducting several subgroup analyses (by comparator, treatment type, and outcome) and multivariable metaregression on a range of prespecified characteristics. For example, metaregression analyses indicated that higher attrition rates were correlated with higher effect sizes (both univariately in all studies and in a multivariable analysis of waitlist control studies). In studies that did not use intention-to-treat (ITT) analysis, one explanation is that participants who drop out do not complete all treatment components and are therefore less likely to benefit, so excluding them inflates the apparent effect. This correlation may also be partly attributable to other factors that have been shown to correlate with treatment dropout, such as format of treatment delivery (e.g., in-person vs. self-guided) or number of sessions (Fernandez, Salem, Swift, & Ramtahal, 2015). Furthermore, retention rate and sample size were significantly associated with between-study heterogeneity in waitlist control studies but not in active treatment control studies. This supports the view that such factors contribute less to heterogeneity in better-designed studies. In addition, heterogeneity was not high in some subgroups, such as trials with no-treatment controls and those with trauma outcomes. Subgroup and metaregression analyses are potentially informative because they identified some consistent explanations for variation between studies, which can inform the conduct and interpretation of future treatment trials in prisoners. However, for some subgroups, such as 'other' therapies that included a wide range of treatments, clinical heterogeneity means that the pooled effect size should be interpreted with considerable caution.
Another limitation is that the metaregression was based on reported study characteristics, and there will be other explanations that we were unable to test, such as environmental factors (prison conditions and the attitudes of correctional staff and other prisoners). The alternative, a systematic review without a meta-analysis, was considered; the information in this review allows groups of similarly conducted studies to be reviewed. Moreover, we incorporated a qualitative analysis of the barriers to psychological trials in prisons. At the same time, because we conducted two complementary analyses of heterogeneity, this review is more than a presentation of pooled estimates.
Another limitation is that we examined outcomes using continuous symptom scores rather than categorical diagnoses. This made outcomes more sensitive to change and allowed prisoners without diagnoses at baseline to be included. The alternative, investigating changes in diagnoses, may be easier to interpret and may assist in planning services, but it was not feasible because of the lack of relevant studies; future studies could consider including both continuous and categorical outcomes.

Conclusion
We found that psychological therapies for mental health outcomes in prisoners were modestly effective when there are no existing psychological treatment programs. However, effects were weaker when active treatment controls and a fidelity measure were used in trials. Whether this level of evidence is sufficiently strong for the introduction of such therapies in prison requires careful review and consideration of other factors including cost-effectiveness.

Appendix A Search Keywords
Psychological disorders keywords: mental*, psych*, disorder*, depress*, schizo*
Prison setting keywords: prison*, inmate*, jail*, penal, correctional, incarcerated
Intervention keywords: cognitive behavioral/behavioral therapy, CBT, group therap*, intervention*, treatment*, therapeutic communit*
Study design keywords: RCT*, controlled trial*, randomi#ed controlled trial*, controlled clinical trial*, randomi#ed trial*, randomi#ed clinical trial*, trial*
Additional targeted searches for psychological therapies that may have been missed by other search terms were performed with the keywords therap* and psychotherapy.

RCT Quality Checklist Used for Quality Appraisal
Response options for each item: Well covered; Adequately addressed; Poorly addressed; Not addressed; Not reported; Not applicable.
The only difference between groups is the treatment under investigation.
1.7 All relevant outcomes are measured in a standard, valid and reliable way.
1.8 What percentage of the individuals or clusters recruited into each treatment arm of the study dropped out before the study was completed?
1.9 All the subjects are analyzed in the groups to which they were randomly allocated (often referred to as intention-to-treat analysis

Studies that did not provide mean age but reported the age range of the sample: the range was recorded.

Note. Full description of quality checklist questions can be found in Appendix B. Abbreviations: equiv. = equivalent; sig. = significant; expr = experimental. Key for rating: 1 = Well covered; 2 = Adequately addressed; 3 = Poorly addressed; N/A = Not addressed/not applicable; ++ = all or most of the criteria fulfilled (unfulfilled criteria very unlikely to alter conclusions of study); + = some of the criteria fulfilled (unfulfilled criteria unlikely to alter conclusions of study).