Intervention studies to foster resilience - A systematic review and proposal for a resilience framework in future intervention studies.

Psychological resilience refers to the phenomenon that many people are able to adapt to the challenges of life and maintain mental health despite exposure to adversity. This has stimulated research on training programs to foster psychological resilience. We evaluated concepts, methods and designs of 43 randomized controlled trials published between 1979 and 2014 which assessed the efficacy of such training programs and propose standards for future intervention research based on recent developments in the field. We found that concepts, methods and designs in current resilience intervention studies are of limited use to properly assess efficacy of interventions to foster resilience. Major problems are the use of definitions of resilience as trait or a composite of resilience factors, the use of unsuited assessment instruments, and inappropriate study designs. To overcome these challenges, we propose 1) an outcome-oriented definition of resilience, 2) an outcome-oriented assessment of resilience as change in mental health in relation to stressor load, and 3) methodological standards for suitable study designs of future intervention studies. Our proposals may contribute to an improved quality of resilience intervention studies and may stimulate further progress in this growing research field.

resilience definitions and the instruments used to measure resilience, and will 2) propose standards for future intervention trials based on recent developments in the field.

Current state and challenges in resilience research in relation to definition and assessment of resilience
Over the past two decades, the concept of resilience has changed significantly from a trait-oriented to an outcome- or process-oriented approach. A trait-oriented approach assumes that resilience is primarily determined by a certain personality type (often referred to as 'hardy personality'), which enhances individual adaptation to stress or adversity (Block & Block, 1980; Connor, Davidson, & Lee, 2003; Hu, Zhang, & Wang, 2015; Ong, Bergeman, Bisconti, & Wallace, 2006). Resilience conceptualized as a trait is considered an intrinsic and stable attribute. So far, however, there is only weak empirical evidence supporting that assumption (Bonanno & Diminich, 2013; Kalisch et al., 2017). Instead, personality seems to be one of many risk or resilience factors for maintaining or regaining mental health (Bonanno & Diminich, 2013; Luthar, Cicchetti, & Becker, 2000).
In recent years, resilience has increasingly been considered as an outcome (outcome-oriented approach), meaning that mental (or physical) health is maintained or regained despite significant stress or adversity (i.e., short-term/acute or long-term/chronic, social or physical stressors) (Kalisch et al., 2017; Kalisch, Müller, & Tüscher, 2015). Here, the exposure to substantial risk or adversity is a central prerequisite of resilience (Earvolino-Ramirez, 2007; Jackson, Firtko, & Edenborough, 2007; Luthar et al., 2000; Masten, 2001). In this vein, the psychological resilience of a person can only be determined if the individual was or is currently exposed to stress or trauma. Resilience as outcome is viewed as modifiable (Masten, 2001) and is partially determined or predicted by multiple resilience factors (Bonanno & Diminich, 2013). Resilience factors refer to resources which protect a person from the potential negative effect of encountered stressors by modifying the individual's response to stress and adversities (Fletcher & Sarkar, 2013; Rutter, 1985). These include internal factors, such as (epi)genetics, (resilience-conducive) personality traits (e.g., optimism, hardiness) or beliefs (e.g., self-efficacy) (Reivich & Shatté, 2002; Southwick & Charney, 2012; Southwick, Litz, Charney, & Friedman, 2011). Besides the individual perspective, the role of external and environmental resources for resilience (e.g., social, material or energy resources), the access to those resources and the stability of access have been pointed out (Hobfoll, Stevens, & Zalta, 2015). For example, people living in resource-rich and stable environments were shown to be more resilient when faced with adversities than individuals in a more unfavorable context. As internal and external resilience factors are predictors, they have to be differentiated from resilience as an outcome (Bonanno, Romero, & Klein, 2015; Kalisch et al., 2015; Kalisch et al., 2017).
Finally, resilience is increasingly considered as a dynamic process of adaptation itself (process-oriented approach). The temporal aspects of resilience were emphasized, for example, by Bonanno et al. (2015). Besides considering the baseline or preadversity functioning and looking at characteristics of the actual aversive circumstances (e.g., chronic or acute events or level of exposure, such as the physical proximity to a stressor), they suggested several trajectories of postadversity adjustment: chronic dysfunction, recovery, delayed reactions and resilience (for an overview see Bonanno & Diminich, 2013). With regard to resilience, the authors further differentiate between emergent resilience (i.e., resilience following a chronic aversive event) and minimal-impact resilience (i.e., resilience following acute aversive events) (Bonanno et al., 2015). However, the trajectories of adjustment are conceptualized differently by other research groups. For example, Layne et al. (2009) delineate stress resistance and resilience. According to these authors, stress resistance refers to maintaining homeostasis and a stable adaptive functioning when faced with adversity (compare resilience according to Bonanno et al., 2015). Resilience, on the other hand, is rather understood as a trajectory of recovery (i.e., full recovery of homeostasis following temporary perturbation in functioning after a stressor). Depending on the time required to restore healthy systems, resilience may also be distinguished from protracted recovery (i.e., gradual recovery). One step further, posttraumatic growth is another trajectory of adjustment that is differentiated from resilience in the literature (e.g., Layne et al., 2009).
Whereas resilience relates to maintaining mental health or the full recovery of preadversity functioning, posttraumatic growth pertains not only to restoring homeostasis, but also to increasing the level of functioning compared to the outset prior to stressor exposure by positive transformations (Layne et al., 2009). In comparison to resilience that frequently occurs after adversity and is the most prevalent outcome (Angel, 2016), posttraumatic growth is seen as the rarer phenomenon that can be observed in less resilient individuals (Angel, 2016;Levine, Laufer, Stein, Hamama-Raz, & Solomon, 2009). According to Tedeschi and Calhoun (1996), posttraumatic growth includes the perception of benefits (i.e., meaning making) in different domains (i.e., closeness in social relationships, possibilities in life, personal strengths, spiritual change and appreciation of life) after a traumatic event. It results from reflective ruminative thinking (Angel, 2016;Zoellner & Maercker, 2006) and can in turn increase future resilience as soon as the individual had time to process the traumatic event (Angel, 2016;Tedeschi, 2011). Posttraumatic growth is associated with several resilience factors, such as optimism, positive reappraisal or sense of coherence (Zoellner & Maercker, 2006). However, since the associations between posttraumatic growth and mental dysfunctions (e.g., depression, PTSD) are less clear (Zoellner & Maercker, 2006), more research on posttraumatic growth as an adaptive phenomenon (as assumed for example by Tedeschi & Calhoun, 1996) is needed.
To sum up, in current research, resilience as process is characterized by either a trajectory of undisturbed, stable mental health during or after a period of adversity or by a pattern of temporary disturbances that is followed by a relatively rapid and successful recovery (see also American Psychological Association, 2015; Kalisch et al., 2015; Mancini & Bonanno, 2009; Norris, Tracy, & Galea, 2009; Sapienza & Masten, 2011; Windle, 2011). As resilient individuals are able to adapt in the face of adversity, they are assumed to be less likely to engage in the meaning-making processes that are associated with posttraumatic growth (Levine et al., 2009).
Consequences of the conceptual heterogeneity in resilience definitions have already been discussed elsewhere (e.g., Davydov, Stewart, Ritchie, & Chaudieu, 2010;Kalisch et al., 2017;Luthar et al., 2000). With regard to intervention research that aims to modify certain behaviors and cognitions, a trait-oriented definition of resilience does not seem to be useful. Although individuals with certain resilience-conducive factors and traits may be more likely to have positive outcomes than others (Miller & Harrington, 2011), such resilience-conducive factors and traits should not be confounded with the resilient outcome itself (Kalisch et al., 2015). For intervention research, an outcome-oriented definition seems more appropriate since it views resilience as a modifiable and teachable construct.
The heterogeneity in resilience definitions also influences the operationalization of the construct. Up to now, there is no 'gold standard' for the assessment of resilience and no established outcome measure of resilience. It is often examined by self-report 'resilience scales' or measures of surrogate outcomes.
Overall, most of the existing 'resilience scales' measure resilience as a stable personality trait (e.g., Resilience Scale [RS]; Wagnild & Young, 1993) or focus on assessing the availability of different resilience factors (e.g., social support, self-efficacy) to maintain or regain mental health despite significant adversities (e.g., Connor-Davidson Resilience Scale [CD-RISC]; for an overview and evaluation of the psychometric quality of these scales see Pangallo, Zibarras, Lewis, & Flaxman, 2015 and Windle, Bennett, & Noyes, 2011). In this way, they only provide a summary or 'composite' of putative resilience factors supporting positive adaptation to stress and adversities (Smith et al., 2008; Windle et al., 2011), whereas resilience as an outcome is not assessed. Smith et al. (2008) suggested a different approach. The Brief Resilience Scale (BRS; Smith et al., 2008) measures the ability to recover from stress and is the only scale not simply assessing factors that may favor mental health despite adversities. However, to the best of our knowledge, when considering resilience as outcome, the explained variance of 'resilience scales' has not yet been systematically analyzed. As a consequence, it remains unclear to what extent those scales predict a resilient outcome.
Another approach that is relevant for resilience intervention science is the use of so-called 'surrogate outcomes' (Macedo et al., 2014). Based on definitions of surrogate outcomes in clinical trials (e.g., La Cour, Brok, & Gøtzsche, 2010; Twaddell, 2009), they can be defined as outcome measures that are not specifically targeted by an intervention but are assessed as alternatives to, or substitutes for, intervention-specific outcomes. Examples include resilience factors, mental health-related constructs (e.g., well-being or quality of life) or subjective stress (e.g., Macedo et al., 2014).
There are also alternative approaches for resilience assessment. We recently suggested a conceptual framework for the neurobiological research of resilience against stress-related mental dysfunctions and made proposals for outcome variables (compare Kalisch et al., 2015). Since resilience as outcome is defined as mental health despite stress, the outcome variable has to take account of mental health and individual stressor exposure. Evidence for this approach comes from non-interventional studies. Here, mental health is defined as the main dependent variable and stressor load (i.e., sum of experienced stressors) is considered as an independent variable, thereby controlling for individual differences in stressor exposure. These studies show that stressor load has a significant negative effect on mental health (e.g., Kendler, Karkowski, & Prescott, 1998; Lu, 1991; Salguero, Fernández-Berrocal, Iruarrizaga, Cano-Vindel, & Galea, 2011). Furthermore, as the cutoffs between normal and pathological symptom levels are often arbitrary, resilience studies should focus on mental health dimensions and observable behavior that can be investigated across conventional diagnostic boundaries (Kalisch et al., 2015).

Study designs in current resilience intervention research
All currently available resilience intervention studies apply a longitudinal design testing the efficacy of training programs with at least two measurement time points, one before and one after the intervention. The study design, however, varies regarding the time point at which the training is conducted in relation to stressor exposure (before, during or after). Overall, there are three types of resilience interventions (see Fig. 1):
1) Resilience interventions before stressor exposure (Fig. 1a): Training programs conducted in preparation of an imminent, acute and often severe stressor that will be experienced by all participants (e.g., military deployment) in order to prevent mental dysfunctions. Since stressor exposure begins after the end of the training program, the period between T2 and T3 is particularly relevant to test the efficacy of the intervention in preventing mental dysfunctions. Therefore, in relation to the stressor exposure, T2 is a first post-test and T3 is the second post-test in this intervention design. The follow-up assessment at T4 (and further time points) allows for examining the long-term effects of the intervention.
2) Resilience interventions during stressor exposure (Fig. 1b): Training programs conducted during (chronic) stressor exposure (e.g., workplace stressors in employees) to prevent or treat subsequent mental dysfunctions. After testing for an improvement in resilience between T1 (baseline) and T2 (post-test), the follow-up assessments at T3 and T4 also allow for examining long-term effects of the resilience training program.
3) Resilience interventions after stressor exposure (Fig. 1c): Training programs implemented after a severe, acute and often unpredictable stressor that was experienced by all participants (e.g., natural disaster) to prevent or treat subsequent mental dysfunctions. To assess the effect of the intervention, investigators would have to assess outcome changes before (baseline T1) and after (post-test T2) the intervention. Long-term effects of the resilience training program could be examined at the follow-up assessments at T3 and T4.

Aims of this review
In this review we aim to a) critically evaluate and discuss existing intervention research with regard to resilience definition, its assessment and technical design issues in 43 randomized controlled trials published between 1979 and 2014; and b) derive a first set of suggestions for further intervention studies based on established guidelines in intervention research and current developments in the field.
In the following part, we describe our methods in evaluating intervention studies before presenting the results. The results section is divided into three sections: In the first two sections, we synthesize current issues concerning the operationalisation of resilience (definition and assessment) in intervention studies and discuss the use of surrogate outcomes of resilience. The third section contains the results of our evaluation with regard to design issues of resilience intervention studies. In the discussion, we summarize our findings and further discuss our proposals for the assessment of outcome-defined resilience in intervention trials. Based on the results of our evaluation, we end our review with a proposal for a checklist (see conclusion), which may be useful for designing and conducting future resilience intervention studies.
A. Chmitorz et al. Clinical Psychology Review 59 (2018) 78-100

Method
This is the first review that synthesizes all randomized controlled trials (RCTs) (k = 43) that were published between 1979 and 2014 and which were included in recently published systematic reviews on the efficacy of resilience intervention trainings (Leppin et al., 2014;Macedo et al., 2014;Robertson et al., 2015;Vanhove et al., 2015). In contrast to those reviews, which summarized the empirical evidence for resilience training programs in adults, we focus on methodological aspects of resilience intervention studies to answer the question whether concepts, methods and designs of those studies are useful to measure the efficacy of intervention trainings to foster resilience.
In this review, we paid particular attention to the definition and assessment of resilience in intervention studies. For each intervention study, we examined the presence of a resilience definition and analyzed the outcome variables used (see Appendix, Table B.1). Studies were classified depending on the use of a resilience scale and the assessment of resilience factors, mental health-related constructs, stress perception, mental health and stressor load. Moreover, we focused on the following methodological aspects: time point of intervention in relation to stressor exposure, sample size and sample size calculation, comparator used, use of comprehensive baseline diagnostics of mental and physical health, analysis of baseline comparability, post-test and follow-up period, assessment of adverse effects, assessment of satisfaction and usability as well as use of multi-center or multi-site trials (see Appendix, Table D.1).

Results
In this part, we present the findings of our evaluation concerning the definition and assessment of resilience as well as technical design issues and make specific suggestions for the respective planning of future intervention studies to foster resilience.
In the 25 studies without a resilience definition, the underlying resilience concept can only be retrieved from the training concept or the assessment method used in the evaluation.
Suggestions: In order to compare findings of intervention studies and facilitate meta-analysis, we suggest determining the resilience definition at the planning phase of an intervention study and stating it clearly in any subsequent publication. Heterogeneous definitions of a construct or an outcome limit the comparability of study results and the possibility to validly pool the evidence on the efficacy of resilience training programs (Robertson et al., 2015). Specifically, we propose that intervention studies should be based on an outcome-oriented approach instead of defining resilience as a personality trait (see also introduction).

Assessment of resilience
In the following sections, we present our findings on the assessment of resilience in intervention studies regarding resilience scales, surrogate outcomes for resilience as well as the assessment of mental health and stressor load.
Suggestions: A large number of 'resilience scales' assess several resilience factors. By assessing resilience factors, however, valid conclusions about the efficacy of resilience interventions cannot be drawn. We therefore propose to clearly distinguish between resilience factors and resilience as an outcome (see also introduction) and use separate outcome measures for their assessment. Studies interested in assessing resilience factors should rely on specific instruments developed to assess those factors.

Surrogate outcome measures
Fourteen of the 43 RCTs used 'surrogate outcomes' for measuring intervention effects (see Appendix, Table E.1 for an overview), although those studies were defined as evaluations of interventions fostering resilience.
• Mental health-related constructs: Six of the 43 RCTs measured mental health-related constructs, such as psychological well-being, as surrogate outcomes (Abbott, et al., 2009; Burton, et al., 2009; Grant, et al., 2009; Grant, et al., 2010; Kent, Davis, Stark, & Stewart, 2011; McCraty & Atkinson, 2012).
Suggestions: Evaluation studies should consider the outcomes targeted by the intervention in order to avoid selective reporting bias (see conclusion and Table 1). We therefore suggest assessing outcome-defined resilience as the primary outcome in resilience intervention studies (see part 'Assessment of mental health and stressor load'). Measures of resilience factors, mental health-related constructs and stress perception may then be included as secondary outcomes to gather additional information.
• Stressor load: Two of the 43 studies have considered individual stressor exposure when investigating mental health. In these RCTs conducted in the military context (Adler, et al., 2009;Castro, Adler, McGurk, & Bliese, 2012), individual differences in combat exposure prior to a resilience training were controlled in the analysis of mental health (PTSD symptoms and depression). No study examined the impact of stressors during or after an intervention.
Suggestions: By definition, stressor exposure is a prerequisite of resilience and an individual's resilience can only be determined in the presence of stressor exposure (see introduction). Therefore, we suggest operationalising outcome-defined resilience by assessing a person's mental health in relation to the individual stressor load, that is, the sum of experienced stressors in a certain time period.
Depending on the objectives of a resilience training program, mental health can be assessed at a more general (i.e., covering several dysfunctions) or more specific level (i.e., focusing on a single dysfunction). Mental dysfunctions should be assessed by quantitative, ideally dimensional, outcomes. In case resilience interventions aim at protecting participants against several stress-induced negative mental health effects, the mental health assessment should test for a wide range of dysfunctions. We suggest measuring a global or average score of mental health reflecting the average burden from several dysfunctions. Examples include the General Health Questionnaire (GHQ) (Goldberg & Hillier, 1997), the Adult Self Report (ASR) (Achenbach & Rescorla, 2003), the Symptom Checklist-90-Revised (SCL-90-R) (Derogatis, 1994) or the Brief Symptom Inventory (BSI) (Derogatis, 1993). All four instruments show good psychometric quality (Achenbach & Rescorla, 2003;Derogatis & Melisaratos, 1983;Groth-Marnat, 2009). If, however, resilience training programs aim at exclusively preventing or treating a specific mental dysfunction (e.g., posttraumatic stress symptoms in soldiers returning from military deployment), it may be sufficient to measure only this specific dysfunction. Nevertheless, a global mental health score would be beneficial to obtain additional information on the effects of the training on a wider range of mental dysfunctions.
The assessment of stressor load in resilience intervention studies might also be more or less detailed depending on the specific adversities the training participants are exposed to. The individual stressor load may be assessed either by questionnaires for micro- or macrostressors (Caspi et al., 1996; Lazarus & Folkmann, 1989; Swann & Hodson, 2004) or by using Ecological Momentary Assessment (EMA), which aims to capture subjective experiences, emotions and psychological reactions in daily life using, for example, electronic diaries via handheld computers or internet-enabled smartphones (Kubiak & Krog, 2012; Trull & Ebner-Priemer, 2014). In this way, it allows for repeated sampling in a given subject in the sense of intensive longitudinal data (Bolger & Laurenceau, 2013), and for tapping into a range of relevant domains in real time and in the subjects' natural habitat (Kaplan & Stone, 2013; Shiffman, Stone, & Hufford, 2008). Ecological Momentary Assessment has been successfully employed in a range of different fields within the health sciences (Robbins & Kubiak, 2014; Smyth & Stone, 2003; Trull & Ebner-Priemer, 2013; Trull & Ebner-Priemer, 2014) and could also be beneficial for resilience intervention studies.
We are aware that these suggestions impose considerable additional burden on intervention studies. We nevertheless believe that resilience training must prove that it enhances an individual's chances to stay mentally healthy in the face of adversity.

Technical design issues in resilience intervention studies
Besides the limitations already pointed out previously (Leppin et al., 2014; Macedo et al., 2014; Robertson et al., 2015; Vanhove et al., 2015) (small sample sizes, lack of dismantling designs, lack of follow-up assessment or insufficient follow-up periods and lack of adverse effects assessment), our evaluation of technical design issues in the 43 RCTs yielded the following additional aspects: lack of a priori sample size calculations, lack of adequate comparators and comparable treatment doses in intervention and control groups, lack of adequate baseline diagnostics of mental and physical health, missing assessment of baseline comparability and of the participants' satisfaction with and the usability of the training, as well as insufficient conduct of multi-center studies.

A priori sample size calculation
Only 11 of the 43 RCTs performed an a priori sample size calculation before recruiting training participants (see Appendix, Table D.1; column Type of sample and sample size (Randomized/Analyzed); a priori sample size calculation).
Suggestions: In line with international guidelines, we propose that investigators should perform an a priori sample size calculation and describe the procedures of sample size determination.
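To make the idea of an a priori sample size calculation concrete, the following minimal sketch uses the common normal-approximation formula for comparing two group means, n per group = 2(z_{1-α/2} + z_{1-β})² / d², where d is the expected standardized effect size. The formula choice, the function name and the default values are assumptions for illustration and are not taken from the guidelines cited here; the approximation gives slightly smaller numbers than an exact t-test-based calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided
    comparison of two means (normal approximation, equal group sizes)."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = standard_normal.inv_cdf(power)           # desired statistical power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical planning scenario: medium expected effect (d = 0.5).
print(n_per_group(0.5))  # 63 per group under these assumptions
```

Expected dropout would have to be added on top of this number, and clustered or repeated-measures designs require different formulas.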

Comparators, multiple treatment groups and dismantling design
In eight of the 43 RCTs the control group received no intervention (NI), 15 studies used a waiting list control (WL) and seven compared a resilience intervention with treatment as usual (TAU) (i.e., training participants receive TAU plus a resilience intervention or the resilience intervention is compared to TAU). Examples for the TAU conditions are standard post-deployment stress education (Adler, et al., 2009), standard diabetes self-education (Bradshaw, et al., 2007), a general practitioner's usual care (Brouwers, et al., 2006), a traditional childbirth education program (Schachman, et al., 2004) or standard care and treatment for depressive patients (Songprakun & McCann, 2012a).
Table 1
Proposal of a checklist for planning and conducting resilience intervention studies based on international guidelines and up-to-date resilience research (Higgins & Green, 2011; Moher et al., 2010; WBP, 2010).
Definition of resilience
• Dropout analyses (comparison of number/reasons for dropout between study groups; comparison of characteristics between participants dropping out and remaining in the study) (to be transparent about potential attrition bias) □
• Intention-to-treat analyses (ITT) □
a Performance bias can hardly be avoided in resilience interventions that are implemented face-to-face as intervention providers cannot be blinded. However, in case of online resilience training programs, even this sort of bias may be avoided.

In 13 of 43 RCTs, attention control (AC) groups were used (see Appendix, Table D.1; column 'Comparators, multiple treatment groups, dismantling design'). Examples for the AC groups are discussion groups (Cohn & Pakenham, 2008), emotional ventilation (Farchi & Gidron, 2010), a general wellness online intervention (Kanekar, et al., 2009), a decision-making exercise (Luthans, et al., 2010; Luthans, et al., 2008), a non-directive support intervention (Sahler, et al., 2013) or a verbal and written skills training (Sadow & Hopkins, 1993). Among the 13 studies using AC, in one case (Rose, et al., 2013) treatment doses were not fully comparable, since the authors compared an interactive resilience training program with a rather passive learning AC. Two of the 43 studies included multiple intervention groups to compare resilience interventions with alternative treatments (e.g., Battlemind debriefing, relaxation/meditation training) (Adler, et al., 2009; Maddi, et al., 1998). Three of the 43 studies compared different types of resilience interventions (e.g., small vs. large group intervention; emotion vs. problem-focused intervention) (Adler, et al., 2009; Bond & Bunce, 2000; Burton, et al., 2009; Gardner, et al., 2005). In a study protocol, Burton, et al. (2009) planned to examine the additional value of integrating physical activity promotion in a resilience intervention. Furthermore, none of the RCTs used a dismantling design to study the efficacy of single components of a resilience training program.
Suggestions: According to the 'gold standard' for intervention research, resilience training programs should be evaluated in four phases (Mazurek Melnyk, Morrison-Beedy, & Moore, 2012; NIH, 2016). Phase 1 includes concept development and feasibility testing. In phase 2, resilience interventions should be tested in open-ended, uncontrolled studies. In phase 3, trainings are then evaluated in RCTs, which provide the most reliable evidence on the efficacy of interventions (Kendall, 2003; Moher et al., 2010; WBP, 2010). In line with that, Leppin et al. (2014) and Robertson et al. (2015) already recommended using RCTs as the ideal study design for resilience intervention research. To exclude bias of true outcomes through attention effects, we suggest using at least TAU and preferably AC as control group in RCTs. Studies should apply identical treatment doses in intervention and control groups, for example, with regard to the frequency of contact with intervention providers. After having compared the resilience intervention with TAU, we suggest examining its (incremental) effects in contrast to alternative treatments (e.g., cognitive-behavioral therapy in clinical populations). Besides, resilience training programs should be tested in different samples to examine their specificity for certain populations. In phase 4, after the evidence for the efficacy of a multicomponent intervention has been established, dismantling studies should be conducted to identify effective training components. In addition, phase 4 could include larger field trials in real-world settings.

Baseline diagnostics
Another critical aspect in intervention studies is the lack of sufficient diagnostic procedures for mental and physical health prior to the resilience training program.
Suggestions: Resilience intervention studies should include a structured and fully standardized clinical interview for mental health (e.g., SCID-I and II, M.I.N.I.) (First, Williams, Karg, & Spitzer, 2015;Sheehan et al., 1998) for all subjects at baseline, as also suggested by the German Scientific Advisory Board on Psychotherapy (WBP, 2010). In addition, studies could use self-report screening instruments in order to monitor changes in the participants' mental health throughout the intervention. These instruments should be sensitive to change and consider various symptom domains. Examples are the instruments we already proposed for generating a global mental health score (see 'Assessment of mental health and stressor load'). In case a resilience intervention aims at protecting participants against the development of mental dysfunctions before, during or after stressor exposure (i.e., prevention), only healthy individuals have to be included in the intervention study. However, if the training program focuses on treating mental impairments (i.e., therapy) during or after stressor exposure, participants would have to be selected according to existing dysfunctions.
Suggestions: Comparable to mental health diagnostics at baseline, resilience intervention studies should also include a comprehensive baseline assessment of physical health and medical history for all participants. Similar to baseline assessments in epidemiological studies (e.g., Dawber, 1980;Marmot & Brunner, 2005), it could comprise a health survey considering general health, different illnesses and symptoms (e.g., heart diseases), medical consultation, as well as medication use. Moreover, health-related quality of life could be assessed using, for example, the Medical Outcomes Study Short Form Health Survey (SF-36) (Brazier et al., 1992;McHorney, Ware, & Raczek, 1993).
In addition, data on health-related behaviors (e.g., physical activity) should be collected at baseline. During the intervention, physical health should be monitored using scales that have proven to be sensitive to changes in physical health, such as the SF-36 (e.g., Hemingway, Stafford, Stansfeld, Shipley, & Marmot, 1997).

Baseline comparability
Of the 43 RCTs, 29 assessed the baseline comparability of intervention and control group with regard to sociodemographic and outcome variables.
Suggestions: Baseline characteristics of the intervention and control group should be compared (post-hoc) to test for comparability between study groups (Altman et al., 2001) and to verify a successful randomization.
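To illustrate what such a comparison could look like for a single baseline variable, the following minimal Python sketch computes a standardized mean difference between the two study groups. The data, the variable names and the function name are hypothetical, and the standardized mean difference is an illustrative choice of descriptive index rather than a procedure prescribed by the guideline cited above.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical baseline ages (years) in the two randomized groups.
intervention_age = [34, 41, 29, 38, 45, 33]
control_age = [36, 40, 31, 37, 44, 35]

smd_age = standardized_mean_difference(intervention_age, control_age)
print(f"Standardized mean difference for age: {smd_age:.3f}")
```

Values close to zero indicate that the groups are similar on that variable. Which threshold counts as a relevant imbalance is a convention that investigators would have to justify for their own study.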

Post-test and follow-up assessment
With regard to the three forms of intervention designs (see Fig. 1 and introduction), the resilience intervention was conducted before stressor exposure in two of the 43 RCTs (Fig. 1a; e.g., resilience intervention to prepare police academy trainees for the stressors of police work) (see Appendix, Table D.1; column 'Time point of implementation in relation to stressor exposure'). Forty-one of the 43 RCTs evaluated a resilience intervention implemented during (k = 36, Fig. 1b) or after stressor exposure (k = 5, Fig. 1c) (e.g., during continuous work stressors or in the aftermath of various traumatizing events, respectively).
Overall, 36 of the 43 RCTs performed a post-test assessment. Since in the two studies conducted before stressor exposure the exact time point and time frame of the specific stressor(s) was not predictable, both studies used simulations (e.g., simulated stressful police activities; stressful video of a car accident) (Sarason, et al., 1979;Varker & Devilly, 2012) at the end of the training programs to assess intervention effects. Assuming that all assessments within one week after the end of the intervention can be viewed as post-test in studies conducted during or after stressor exposure, 34 of the remaining 41 RCTs performed a post-test.
Follow-up assessment was conducted in 26 of the 43 studies. Of the two RCTs that evaluated a resilience intervention before stressor exposure, only Varker and Devilly (2012) assessed the participants' reaction to a stressful video at a four-week follow-up. Of the 41 studies evaluating a training program during or after stressor exposure, 25 included a follow-up assessment, and about half of these (13/25) considered short follow-up periods of three months or less (see Appendix, Table D.1; column 'Post-test/Follow-up').
Suggestions: Intervention studies should include a post-test and a long-term follow-up assessment. If a resilience intervention is conducted before a stressor exposure, the time period between the end of the training program and the end of the stressor exposure is relevant in order to examine the efficacy of the intervention in preventing mental dysfunctions. In this case, intervention studies should include a post-test immediately or at least within several days after the stressor exposure. If the training program is implemented during or after stressor exposure, resilience intervention studies should include a post-test immediately or within several days after the end of the training (e.g., last training session). Follow-up assessments should be conducted in all three types of intervention studies (see 'Study designs in current resilience intervention research'). Repeated follow-up measurements at least six and 12 months after the stressor exposure (see Fig. 1a) or after the end of the resilience intervention (see Fig. 1b and c), respectively, are required in order to examine long-term effects of the intervention.
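The timing rules suggested above can be written down as a small planning aid. The following Python sketch is only an illustration: the function name, the design labels, the seven-day post-test window (borrowed from the one-week criterion used in this review) and the approximation of six and 12 months as 182 and 365 days are all assumptions made for the example.

```python
from datetime import date, timedelta


def assessment_schedule(design, intervention_end, stressor_end=None,
                        post_test_window_days=7):
    """Plan post-test and follow-up dates for one of three designs.

    design: 'before', 'during' or 'after' (timing of the training relative
    to stressor exposure). For 'before', assessments are anchored at the
    end of the stressor exposure; otherwise at the end of the training.
    """
    if design == "before":
        if stressor_end is None:
            raise ValueError("stressor_end is required for the 'before' design")
        anchor = stressor_end
    elif design in ("during", "after"):
        anchor = intervention_end
    else:
        raise ValueError(f"unknown design: {design!r}")
    return {
        "post_test_latest": anchor + timedelta(days=post_test_window_days),
        "follow_up_6_months": anchor + timedelta(days=182),   # approx. 6 months
        "follow_up_12_months": anchor + timedelta(days=365),  # approx. 12 months
    }


# Hypothetical study: training ends in January, stressor exposure ends in March.
plan_before = assessment_schedule("before", date(2024, 1, 10), date(2024, 3, 1))
plan_during = assessment_schedule("during", date(2024, 1, 10))
for label, planned_date in plan_before.items():
    print(label, planned_date.isoformat())
```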

Assessment of adverse effects
None of the 43 RCTs assessed potential adverse effects of resilience interventions (see Appendix, Table D.1; column 'Assessment of adverse effects').
Suggestions: A systematic assessment of unwanted or adverse effects of the training program should be included in resilience intervention studies. To assess adverse effects after a resilience training, items similar to instruments in psychotherapy research (e.g., Unwanted Events and Adverse Treatment Reaction Checklist for Psychotherapy (UE-ATR); Linden, 2013) could be applied at post-test and follow-up.

Assessment of satisfaction and usability
Satisfaction and usability were assessed in 13 of the 43 studies (see Appendix, Table D.1; column 'Assessment of satisfaction and usability').
Suggestions: The participants' satisfaction with the resilience training program and the usability of learned skills should be examined. This may be relevant since both aspects could affect the efficacy of the intervention.

Multi-center or multi-site trials
So far, 42 of 43 intervention studies were conducted as single-center designs, i.e., resilience training programs were tested at single clinical/non-clinical locations (e.g., specific worksite) (see Appendix, Table D.1; column 'Multi-center or multi-site trial'). Only Sahler, et al. (2013) conducted a multi-site trial by investigating the efficacy of 'Bright IDEAS' at four hospitals.
Suggestions: Single-center trials may be particularly useful in the early phase of resilience training evaluation (Bellomo, Warrillow, & Reade, 2009). Subsequently, the intervention should be tested in multi-center trials to examine whether results on its efficacy can be generalized (Bellomo et al., 2009;Gheorghe, Roberts, Ives, Fletcher, & Calvert, 2013).

Discussion
By evaluating methods and designs of the 43 RCTs aiming to foster resilience, major problems in the concepts, methods and designs of the studies were identified. First, there is no consistent definition of resilience in intervention studies. In more than half of the studies a resilience definition was not included. In the remaining studies, the definitions differed significantly. Second, studies differ with regard to the outcome variables and assessment instruments to measure resilience, which limits comparisons and pooling of study results. In one third of the studies, resilience was assessed by resilience scales measuring certain resilience factors. Another third of the studies assessed surrogate outcomes of resilience, in particular resilience factors, mental health related constructs or stress perception. A major problem is that, although measures of mental health were included in more than half of the studies, individual stressor exposure was only considered in two studies. Third, there are several major technical design problems, e.g., the lack of a priori sample size calculations (only conducted by one quarter of the studies) or the lack of adequate comparators. Since half of the studies used no intervention in the control group or a WL control, attention effects may bias the results. None of the studies used a dismantling design. Moreover, we found a lack of adequate baseline diagnostics. While mental health at baseline was assessed in three quarters of the studies, physical health at baseline or aspects related to it were only assessed in one quarter of the studies. Whereas the majority of studies performed a post-test assessment, only two thirds conducted long-term follow-up assessments. Adverse effects were not assessed in any of the studies. Participants' satisfaction was measured in one third of the studies. There was only one multi-center study among the studies included in our evaluation.
We conclude from this in-depth methodological analysis that concepts, methods and designs in current resilience intervention studies are of limited use to properly assess the efficacy of interventions to foster resilience. This results on the one hand from an incomplete application of international guidelines for the conduct and report of intervention studies, such as CONSORT and Cochrane criteria (Altman et al., 2001;Higgins & Green, 2011). On the other hand, and more importantly, the definition and assessment of resilience as a trait construct or composite of resilience factors and the lack of assessment of the relationship of stressor load and mental health as a measure of resilience make it difficult to conclude that resilience trainings have indeed resulted in resilient outcomes. As a consequence, we first propose a checklist which summarizes measures to prevent major biases in further intervention studies and thereby may help to improve quality in further resilience intervention research (see Table 1). Second, we discuss in the following paragraph our proposals for the assessment of outcome-defined resilience in intervention studies. Using such an assessment may be better suited to demonstrate efficacy of interventions aiming at fostering resilience.

A proposal for study designs for resilience interventions before, during, and after stressor exposure
Following the line of the principal designs for resilience intervention studies (see 'Study designs in current resilience intervention research' and Fig. 1), we propose three study designs (see Appendix F, Figs. F.1-F.3) for testing the efficacy of resilience interventions. They refer to RCTs comparing the resilience intervention group with a comparator group. These proposals build on the following considerations (see also Kalisch et al., 2015): In a longitudinal resilience study with two time points $T_A$ and $T_B$, mental health problems $P$ at $T_A$ and $T_B$ are expressed as sum scores $\sum P_{T_A}$ and $\sum P_{T_B}$. The change in mental problems from $T_A$ to $T_B$ is expressed as $\sum (P_{T_B} - P_{T_A})$, whereby a positive sign reflects an increase in mental health problems from $T_A$ to $T_B$. An individual's cumulative stressor load between $T_A$ and $T_B$ is assessed using a quantitative sum score $\sum S_{T_A \text{ to } T_B}$. In order to operationalise resilience as an outcome, mental health problems are then related to the individual stressor load. The outcome-oriented assessment of resilience is based on the assumption that someone is more resilient at $T_B$ the less that person develops mental problems between $T_A$ and $T_B$ (the smaller $\sum (P_{T_B} - P_{T_A})$) in proportion to the stressor load accumulated between $T_A$ and $T_B$ ($\sum S_{T_A \text{ to } T_B}$). As a consequence, individuals with high stressor load (high $\sum S$) and low mental health problems ($\sum P$) at a given time are considered as more resilient than individuals experiencing equal mental health problems under low stressor load (low $\sum S$) in the same period.
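The logic of relating change in mental health problems to stressor load can be made concrete with a small numerical example. The Python sketch below uses hypothetical item scores for two individuals. Expressing "in proportion to" as a simple ratio of change to load is an assumption made for illustration only, and all names in the code are invented for the example.

```python
def sum_score(item_scores):
    """Sum score over questionnaire items (problems) or stressor ratings."""
    return sum(item_scores)


def change_relative_to_load(problems_ta, problems_tb, stressors_ta_to_tb):
    """Change in problem sum score from T_A to T_B per unit of stressor load.

    Smaller values mean fewer additional problems for the same load.
    """
    delta_p = sum_score(problems_tb) - sum_score(problems_ta)
    load = sum_score(stressors_ta_to_tb)
    if load <= 0:
        raise ValueError("stressor load must be positive for this ratio")
    return delta_p / load


# Two hypothetical individuals with the same increase in problems (+2)
# but different cumulative stressor loads between T_A and T_B.
ratio_a = change_relative_to_load([1, 0, 2, 1], [2, 1, 2, 1], [3, 4, 5])  # load 12
ratio_b = change_relative_to_load([1, 0, 2, 1], [2, 1, 2, 1], [1, 1, 2])  # load 4

print(f"Individual A: {ratio_a:.3f}, individual B: {ratio_b:.3f}")
```

In this example, individual A would be considered the more resilient of the two, because the same increase in mental health problems occurred under a three times higher stressor load.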
In general, there are several ways to control for individual stressor load when examining changes in mental health in resilience intervention studies. For example, resilience intervention studies could control for stressor-related variables in the statistical analysis. The number of micro- and macrostressors could be considered as covariates in an analysis of covariance for mental health, or the individual stressor load might be controlled for by including it as predictor in regression analysis on mental health problems (∑P). In such regression analysis, especially the statistical interaction between the predictors intervention group (e.g., resilience training vs. control) and stressor load (e.g., high vs. low stressor load) would reveal if the effect of individual stressor exposure on mental health problems depends on the participation in the resilience intervention, i.e., if the resilience intervention is effective by reducing the effect of stressor load on mental health problems. Alternatively, we have recently proposed the R score to examine outcome-defined resilience in a longitudinal manner (Kalisch et al., 2015). The R score is the quotient of the difference in mental health problems between two time points normalised by the individual stressor load in the same period. However, the validity of this R score as primary outcome measure of resilience has to be further examined.
In the three possible designs (see also introduction and Fig. 1), in which the resilience training is conducted before (see Fig. 1a and Appendix Fig. F.1), during (see Fig. 1b and Appendix Fig. F.2) or after stressor exposure (see Fig. 1c and Appendix Fig. F.3), the difference in mental health between two time points in relation to stressor exposure can be determined for each period (T1 to T2; T2 to T3; T3 to T4 etc.) in the intervention and control group, respectively. An individual who develops no mental dysfunctions, or only minor ones, during or after the stressor exposure (i.e., between T2 and T3 in Fig. F.1 or between T1 and T2 in Figs. F.2 and F.3) would be more resilient than a person experiencing severe impairments in mental health during or after the same stressor.
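The regression approach with a group by stressor load interaction can be sketched as follows. This Python example fits an ordinary least squares model of mental health problems on group, stressor load and their product, using only the standard library. The data are hypothetical and noise-free, stressor load is treated as a continuous score rather than a high vs. low split, and the function names are invented for the illustration. A real analysis would use a statistics package and report standard errors and tests.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction_model(group, stressor_load, problems):
    """Least squares fit of problems ~ 1 + group + load + group * load."""
    rows = [[1.0, g, s, g * s] for g, s in zip(group, stressor_load)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, problems)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: group 0 = control, group 1 = resilience training.
group = [0] * 5 + [1] * 5
stressor_load = [1, 2, 3, 4, 5] * 2
# Problems rise by 1.0 per unit of load in controls, by 0.4 after training.
problems = [2 + 1.0 * s for s in stressor_load[:5]] + \
           [2 + 0.4 * s for s in stressor_load[5:]]

intercept, group_effect, load_effect, interaction = fit_interaction_model(
    group, stressor_load, problems
)
print(f"load effect: {load_effect:.2f}, interaction: {interaction:.2f}")
```

A negative interaction coefficient, as in this constructed example, would correspond to a weaker relationship between stressor load and mental health problems in the training group than in the control group.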
In order to control for pre-intervention differences, data on individual mental health and stressor load at baseline should also be collected, where possible. With regard to baseline stressor exposure, the number of macrostressors experienced during lifetime and the amount of microstressors at the time prior to the intervention could be assessed. The baseline mental health value and the baseline (cumulative) stressor load could then be considered as covariates in the post-test analysis of mental health. Alternatively, investigators could also conduct a baseline monitoring phase prior to the start of the intervention (e.g., two weeks between T0 and T1; see Figs. F.1 and F.2) to obtain information on the baseline values of mental problems and individual stressor load. A monitoring phase might also be necessary if investigators aim at selecting participants at high risk for mental dysfunctions or exposed to high levels of stressors. Since the participants are only included after stressor exposure, it is not possible to conduct a baseline monitoring phase if the resilience training is conducted after a certain stressor (e.g., natural disaster; see Fig. F.3). Nevertheless, data on the occurrence of other macrostressors during lifetime could be collected at baseline in this design.
In all three designs, the occurrence of new macrostressors should be assessed retrospectively at each time point (T2, T3, and T4) in the course of the intervention study. The amount of microstressors during the intervention might be assessed either retrospectively using questionnaires at each time point (T2, T3, and T4; e.g., by asking for typical daily hassles experienced during the last week) or using EMA between the measurements (see also results).
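If microstressors are recorded by EMA between the measurements, the entries have to be aggregated into a stressor load per assessment period. The following Python sketch shows one possible aggregation. The data, the dates of the time points and the function name are hypothetical, and counting an entry recorded on the day of a time point towards the period that starts on that day is an arbitrary convention chosen for the example.

```python
from datetime import date


def load_per_interval(ema_entries, time_points):
    """Sum daily hassle counts within each period between time points.

    ema_entries: list of (day, number_of_hassles) tuples.
    time_points: dict mapping labels (in chronological order) to dates.
    An entry on day d belongs to the period with start <= d < end.
    """
    labels = list(time_points)
    loads = {}
    for start_label, end_label in zip(labels, labels[1:]):
        start, end = time_points[start_label], time_points[end_label]
        loads[f"{start_label} to {end_label}"] = sum(
            count for day, count in ema_entries if start <= day < end
        )
    return loads


# Hypothetical time points and EMA entries for one participant.
time_points = {
    "T1": date(2024, 1, 1),
    "T2": date(2024, 1, 15),
    "T3": date(2024, 1, 29),
}
ema_entries = [
    (date(2024, 1, 2), 3),
    (date(2024, 1, 9), 1),
    (date(2024, 1, 15), 2),
    (date(2024, 1, 20), 4),
    (date(2024, 1, 29), 5),  # on or after T3, so outside both periods
]

loads = load_per_interval(ema_entries, time_points)
print(loads)
```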
We propose conducting RCTs in resilience intervention research from phase II on (see checklist, Table 1). Although this study design currently presents the 'gold standard' for clinical intervention trials (Moher et al., 2010), some limitations have been discussed that we want to mention here. Those include, apart from high costs and the complexity of implementing RCTs, especially the problem that experimental conditions often differ from the situation in real life (Rosen, Manor, Engelhard, & Zucker, 2006). As a consequence, a lack of external validity, for example due to strict eligibility criteria in RCTs, is often criticized (Rothwell, 2006). This can be alleviated by design aspects, such as intention-to-treat analysis or community-level RCTs with less strict inclusion criteria (e.g., Rosen et al., 2006). Ethical limitations in RCTs depend on the clinical equipoise of intervention and control groups: If researchers are uncertain whether a resilience intervention is superior to another training program, withholding a program from some individuals or groups on the basis of randomization can be considered ethical if potential risks are taken into account (Rosen et al., 2006). However, it might not be ethical to randomize participants after a severe trauma (e.g., natural disaster, fatal accident) to a wait-list control or an attention control that is known not to be (clinically) equivalent to the resilience intervention (Nardini, 2014).

Conclusion
The present review is the first that focuses on methodological aspects of resilience intervention studies, summarizes the current state of research and proposes a methodologically sound framework of resilience intervention research. With regard to the currently published intervention studies aiming at fostering resilience, we conclude that concepts, methods and designs in those studies are of limited use to properly assess the efficacy of interventions to foster resilience. Major problems are the use of definitions of resilience as trait or composite of resilience factors, the use of unsuited assessment instruments, and issues in the study design. To overcome these challenges, we have proposed an outcome-oriented definition of resilience, an outcome-oriented assessment of resilience as change in mental health in relation to stressor load, and made proposals for methodological standards for suitable study designs of future intervention studies. We conclude our work with a summary of the suggestions (see Table 1) in this paper which could be used as checklist for planning and conducting resilience intervention studies.
We are aware that establishing a consensus about study guidelines in such a wide field is no minor challenge, and we would like to explicitly point out that we only consider our conclusions as potential starting points for a discussion that hopefully will involve a large part of the community. We believe that the benefit of such a consensus finding process would lie in reduced heterogeneity in the way resilience is operationalized, permitting comparability of studies and facilitating cross-talk between researchers, and also in less confusion about how to interpret study results and how to communicate our findings to the general public. We would therefore ask our colleagues to consider this paper as an invitation to an open-ended discussion that is nevertheless necessary to guarantee that future resilience intervention studies will follow highest quality standards and yield robust and interpretable results. We are convinced that this will ultimately be for the benefit of the entire field.

Role of funding sources
There was no funding for this review.

Contributors
All eight authors contributed to the development of this manuscript. The first authorship as well as the overall responsibility for the review is shared by AC and AK. AC, AK, IH and KL conceived the initial review and developed the manuscript. AC and AK conducted the methodological analyses of the included studies. KL, OT, RK, TK and MW provided expert support with regard to the proposals for a resilience framework in future intervention research and critically commented on the review. AC and AK wrote the first draft of the manuscript and all authors contributed to and have approved the final manuscript.

Conflict of interest
All authors declare that they have no conflicts of interest.

Appendix A. Data extraction
- intensity (e.g., number of sessions, duration of sessions)
- setting (e.g., group, individual, combined)
- delivery mode (e.g., face-to-face, online, telephone, combined)
• For each comparator:
- type of comparator (no intervention, attention control, waiting-list control, treatment as usual, head-to-head comparison)
- in case of attention control, treatment as usual or head-to-head comparison: content and intensity of control group

Outcomes
• Outcomes and time points collected;
• Assessment instruments;
• Format of outcome assessment (e.g., paper-pencil, online);
• Dropouts;

Results
• Statistical analyses;
• Significant and non-significant effects (e.g., main effects, interactions);
• Conclusions of the study authors concerning efficacy of resilience intervention;

Miscellaneous aspects
• Place of study conduction;
• Strengths and limitations of studies according to authors.
Appendix B. Background information on study characteristics of the 43 RCTs part I
A. Chmitorz et al. Clinical Psychology Review 59 (2018) 78-100

Appendix C. Resilience definitions in the 43 RCTs
Abbott, et al. (2009): 'A person's ability to persevere in the face of challenges, setbacks and conflicts (Reivich & Shatté, 2002)' (p. 89)
Bekki, et al. (2013): 'Ability to overcome everyday setbacks'; 'attitudinal and behavioral characteristics that enable returning to a positive trajectory after an event that hinders progress in some way'; 'resilience is multidimensional (Lightsey, 2006)'; 'a dynamic process (APA)' (p. 26)
Bradshaw, et al. (2007): 'Capacity to be resilient'; 'capacity of all individuals to withstand hardship and repair oneself, to transform and change no matter the risks (Lifton, 1994;Wolin & Wolin, 1995) and a force within everyone that drives them back to seek self-actualization, altruism, wisdom, and harmony with a spiritual source of strength (Richardson, 2002)'; 'the capacity and trait that people with diabetes must have to adapt to living well with their disease' (p. 651)
Dolbier, et al. (2010): 'Recovering from a stressor to a pre-stressor level of functioning (Steinhardt, 2008)'; 'a protective factor that may decrease adjustment problems and increase positive change when coping with stressful situations (Paton, Violanti, & Smith, 2003)' (p. 137)
Hodges (2010): '"The capability of individuals to cope successfully in the face of change, adversity, and risk" (Stewart, Reid, & Mangham, 1997, p. 22)'; '"the capacity to rebound or bounce back from adversity, conflict, failure, or even positive events, progress, and increased responsibility" (Luthans, 2002a, p. 702).' (p. 26)
Loprinzi, et al. (2011): 'Ability to thrive despite stress and adversity'; 'is also described as invulnerability and hardiness (Kobasa, 1979)'; 'the source of resilience is an individual's inner strength that helps the individual adapt to stressors and pursue life's meaning and purpose' (p. 365)
Luthans, et al. (2008): 'State-like construct'; 'one's ability, when faced with adversity, to rebound or "bounce back" from a setback or failure (Block & Kremen, 1996;Masten et al., 1985)'; 'dynamic learning process of resilience' (p. 211)
Luthans, et al. (2010): '"A class of phenomena characterized by patterns of positive adaptation in the context of significant adversity or risk", which enables individuals to bounce back quickly and effectively from adverse events (Masten & Reed, 2002)'; 'difference between those who recover well after adversity and those who remain devastated and unable to move ahead (Block & Kremen, 1996;Masten et al., 1985)'; 'those higher in resilience bounce back psychologically (including emotion and cognition) to levels at, or even beyond, previous levels of homeostasis or equilibrium (Richardson, 2002)' (p. 47)
McCraty and Atkinson (2012): 'Capacity to prepare for, recover from, and adapt to stress, adversity, trauma, or tragedy' (p. 49)
McGonagle, et al. (2014): 'Positive adaptability or ability to thrive in the face of adversity (Campbell-Sills & Stein, 2007;Luthans, 2002)' (p. 387)
Pidgeon, et al. (2013): 'Resilience defined as competence to cope and adapt in the face of adversity and to bounce back when stressors become overwhelming is considered a significant protective factor against instances of compassion fatigue, burnout and mental and physical illness (Thomas & Otis, 2010)' (p. 355)
Rose, et al. (2013): 'Ability of individuals to adapt successfully in the face of acute stress, trauma, or chronic adversity, maintaining or rapidly regaining psychological well-being and physiological homeostasis (Charney, 2004)' (p. 107)
Songprakun and McCann (2012a); Songprakun and McCann (2012b): 'Psychosocial capacity of the person to maintain positive adaptive functioning which minimizes negative thoughts and promotes recovery of strength and coping ability and to have a positive outlook in the face of difficult circumstances such as depression (Reivich et al., 2005)'; 'includes four major components: social competence, problem solving ability, development of autonomy, and having a sense of purpose and a sense of meaning (Bernard, 2004)'; 'has been suggested that resilience is a protective factor that facilitates successful coping in conditions of adversity (Fergus & Zimmerman, 2005)' (p. 2)
Sood, et al. (2011): 'Ability of an individual to withstand adversity )' (p. 858)
Steinhardt and Dolbier (2008): 'ability to recover quickly from disruptions in functioning that result from stress appraisals and to return to the previous level of functioning (Carver, 1998;O'Leary & Ickovics, 1995)' (p. 445)
Stoiber and Gettinger (2011): 'Capacity to be resilient, or to develop positive adaptation or "bounce back" when faced with difficulties and cope effectively (Luthar, 2000;Merrell, Levitt, & Gueldner, 2010)' (p. 687)
Varker and Devilly (2012): 'Resilience as an adverb describing the goal of the intervention (i.e. resilience training) and as an outcome as measured by lack of distress (to determine whether the resilience training was or was not successful)' (p. 697)
'A force within everyone that drives them to seek self-actualization, altruism, and be in harmony with a spiritual source of strength (Richardson, 2002)' (p. 179)

Appendix D. Background information on study characteristics of the 43 RCTs part II
Post-test assessment: All assessments within one week after the stressor exposure or the end of the intervention; follow-up periods are also indicated as assessment periods after the stressor exposure or the end of the intervention.
Simulated or 'artificial' stressor at post-test (simulated stressful police activities) since resilience intervention aims at preparing for a stressor (in police work) that cannot be exactly predicted.
r Assessment took place eight weeks after the baseline assessment, there is no information about the duration of the training available.
s Steinhardt et al. (2008) used the same sample as Dolbier et al. (2010); the two publications differ in the reported outcomes and the focus of the statistical analyses (Dolbier et al. (2010): post-traumatic growth).
t Simulated or 'artificial' stressor at post-test (car crash video) since resilience intervention aims at preparing for a stressor (stressors in the work of emergency services personnel) that cannot be exactly predicted.
u No intervention in the control group, but all groups participate in weekly group sessions (sham intervention) to exclude group effects.
v Control group completes questionnaires and participates in groups on Navy knowledge assignment (sham intervention).
w Post-test just before end of the 9-week training. Assessment of retention rates in study completers after two years.