Effects of mental health interventions for students in higher education are sustainable over time: a systematic review and meta-analysis of randomized controlled trials

Background Symptoms of depression, anxiety, and distress are more common in undergraduates compared to age-matched peers. Mental ill health among students is associated with impaired academic achievement, worse occupational preparedness, and lower future occupational performance. Research on mental health promoting and mental ill health preventing interventions has shown promising short-term effects, though the sustainability of intervention benefits deserves closer attention. We aimed to identify, appraise and summarize existing data from randomized controlled trials (RCTs) reporting on whether the effects of mental health promoting and mental ill health preventing interventions were sustained at least three months post-intervention, and to analyze how the effects vary for different outcomes in relation to follow-up length. Further, we aimed to assess whether the effect sustainability varied by intervention type, study-level determinants, and participant characteristics.
Material and Methods A systematic search in MEDLINE, PsycInfo, ERIC, and Scopus was performed for RCTs published in 1995–2015 reporting an assessment of mental ill health and positive mental health outcomes for at least three months of post-intervention follow-up. Random-effects modeling was used for quantitative synthesis of the existing evidence, with the standardized mean difference (Hedges’ g) used to estimate an aggregated effect size. Sustainability of the effects of interventions was analyzed separately for 3–6 months, 7–12 months, and 13–18 months of post-intervention follow-up.
Results A total of 26 studies were eligible after reviewing 6,571 citations. The pooled effects were mainly small, but significant for several categories of outcomes. For the combined mental ill health outcomes, symptom reduction was sustained up to 7–12 months post-intervention (standardized mean difference (Hedges’ g) effect size (ES) = −0.28 (95% CI [−0.49, −0.08])). Further, sustainability of symptom reduction was evident for depression, with intervention effects lasting up to 13–18 months (ES = −0.30 (95% CI [−0.51, −0.08])), for anxiety up to 7–12 months (ES = −0.27 (95% CI [−0.54, −0.01])), and for stress up to 3–6 months (ES = −0.30 (95% CI [−0.58, −0.03])). The effects of interventions to enhance positive mental health were sustained up to 3–6 months for the combined positive mental health outcomes (ES = 0.32 (95% CI [0.05, 0.59])). For enhanced active coping, sustainability up to 3–6 months was observed with a medium and significant effect (ES = 0.75 (95% CI [0.19, 1.30])).
Discussion The evidence suggests long-term effect sustainability for mental ill health preventive interventions, especially for interventions to reduce the symptoms of depression and symptoms of anxiety. Interventions to promote positive mental health offer promising, but shorter-lasting effects. Future research should focus on mental health organizational interventions to examine their potential for students in tertiary education.


INTRODUCTION
Mental health problems among students in higher education are an emerging public health issue and evidence-based prevention is essential (Christensson et al., 2010;Dahlin et al., 2011;Garlow et al., 2008;Hunt & Eisenberg, 2010;Steptoe et al., 2007). Recent systematic reviews on student health raise concerns over high rates of mental ill health outcomes, with pooled prevalence ranging between 27% and 34% for depression and depressive symptoms and reaching 11% for suicidal ideation (Ibrahim et al., 2013;Rotenstein et al., 2016;Tung et al., 2018). In addition, the risk of suicide has been shown to be two-fold higher during ongoing university studies than after having attained a university education (Lageborn et al., 2017). Elevated rates of mental ill health, namely symptoms of distress, anxiety, and depression, in undergraduates appear to substantially exceed the corresponding estimates in age-matched peers (Cvetkovski, Reavley & Jorm, 2012;Dyrbye, Thomas & Shanafelt, 2006;Leahy et al., 2010;Winzer et al., 2014) and the general population (Ibrahim et al., 2013;Rotenstein et al., 2016). Female students, minority groups, and students with financial problems constitute groups with higher risks (Cvetkovski, Reavley & Jorm, 2012;Eisenberg, Hunt & Speer, 2013;Said, Kypri & Bowman, 2013). Once heightened at the beginning of the study period, the symptoms of anxiety and depression remain elevated over the academic years and at no time point drop down to pre-registration levels (Bewick et al., 2010). Mental ill health among students may be caused by heavy workload, insufficient feedback from teachers and worries about future endurance/competence (Dahlin, Joneborg & Runeson, 2005), but may also reflect the increase in deteriorated mental health among adolescents (Hunt & Eisenberg, 2010).
Mental ill health problems are often accompanied by decrements in positive mental health through lowered self-perception, inadequate social-emotional skills, and poor interpersonal relationships (Conley, Durlak & Kirsch, 2015). Moreover, perceived academic stress and burn-out are associated with impaired academic achievement (Andrews & Wilding, 2004;Keyes et al., 2012;Vaez & Laflamme, 2008), worse occupational preparedness and lower occupational performance after graduation (Rudman & Gustavsson, 2012). In prevention science, the health promotion approach constitutes a substantial ingredient of the integrative model for mental health intervention in youth (Weisz et al., 2005). Including aspects of positive mental health, i.e., emotional, psychological and social well-being (Westerhof & Keyes, 2010), is a beneficial strategy in mental health interventions (Kobau et al., 2011). It has been shown that strengthening psychological assets, e.g., positive emotions, coping strategies, and compassion, may help people to manage life's challenges. Thus, promoting mental health and preventing mental ill health are two essential and complementary steps in reducing the burden of disease (Jane-Llopis, 2007;Keyes, 2007;World Health Organization (WHO), 2002).
Previous research on mental health promotion and mental ill health prevention has shown promising short-term effects of stress reduction techniques and meditation, self-hypnosis, cognitive behavioral, and mindfulness interventions (Conley, Durlak & Kirsch, 2015;Conley et al., 2017;Regehr, Glancy & Pitts, 2013;Shiralkar et al., 2013) as well as of technology-based interventions (Conley et al., 2016;Davies, Morriss & Glazebrook, 2014;Farrer et al., 2013). As mental health problems persist during the study period (Bewick et al., 2010;Christensson et al., 2010) and negatively affect academic performance and future working capacity (Rudman & Gustavsson, 2012;Vaez & Laflamme, 2008), the sustainability of intervention benefits as well as its determinants and moderators deserve closer attention. Several reviews have approached the issue of intervention effect sustainability by averaging the effects reported for the longest follow-up periods, although a substantial variability in the ranges and means of the follow-up lengths made the comparisons difficult (Conley, Durlak & Kirsch, 2015;Conley et al., 2016, 2017). The authors highlighted the need for in-depth investigation of the intervention benefit sustainability over variable post-intervention follow-up periods since the effects may change their direction and strength over time (Conley, Durlak & Kirsch, 2015). Therefore, to further address the nature of sustainability of intervention effects, in this review we aimed to systematically identify, appraise and summarize the existing data from randomized controlled trials (RCTs) reporting on whether the effects of mental health promoting and mental ill health preventing interventions are sustained for at least three months of post-interventional follow-up.
Further, we aimed to analyze how the direction and magnitude of the effects vary for different outcomes in relation to the lengths of follow-up and to assess whether effect sustainability varied by the types and major features of interventions, study-level determinants, and characteristics of participants.

Eligibility criteria
The protocol was registered in PROSPERO, CRD42015029353 (Data S1). The study followed the guidelines for conducting systematic reviews as suggested by the Cochrane handbook for systematic reviews of interventions (Higgins, Green & Cochrane Collaboration, 2008) and reported the study findings and procedure in accordance with the PRISMA statement (Moher et al., 2009) (Table S5).
The PICO components (population, intervention, comparator and outcome) were developed after discussing eligibility criteria with stakeholders from the student health services: P = students in university settings; I = any types of mental health-promoting and mental ill health-preventing interventions; C = any types of active or inactive controls; O = (i) positive mental health, including well-being, coping, locus of control, resilience, self-esteem/self-compassion, stress management, academic achievement or academic performance, and (ii) mental ill health, including symptoms of anxiety, symptoms of depression, psychological distress, worry, fatigue, sleeping problems, and perceived stress. The study design was restricted to RCTs with at least three months of post-intervention follow-up. No language restrictions were initially applied. Studies focused on students with diagnosed psychiatric disorders and studies conducted in primary care settings were excluded.

Search strategy
In collaboration with librarians (CG, AW: see Acknowledgements), a sensitive search strategy was developed and adapted to the following databases: MEDLINE (Ovid), PsycInfo (Ovid), ERIC (Ovid), and Scopus.

Study selection, data extraction, and quality assessment
Screening was conducted independently by two authors (RW, KG) and two colleagues (AF, AM: see Acknowledgements). The eligibility of each article was initially evaluated by the title and abstract and, if found appropriate, followed by full-text examination. At this stage only English-language publications were assessed. This resulted in a loss of 20 publications in Chinese (k = 15), Japanese (k = 3), Korean (k = 1), and Spanish (k = 1). Gray literature was taken into consideration when accessible free of charge. Any disagreements were resolved through panel discussions. Studies selected for inclusion were examined for potential overlap in study populations, which was not found.
Data extracted from the articles included first author, country of origin, setting, funding, inclusion and exclusion criteria, characteristics of the intervention and comparison groups (age, gender, ethnicity), characteristics of the intervention (type, format, delivery level, length of session, duration), type of comparison, outcome definition and measurement scale, sample size, post-intervention length of follow-up, percent of withdrawals at each measurement point, and study quality (described below).
If a study reported multiple outcomes and/or if outcomes were assessed at multiple follow-up time points, quantitative data (means and standard deviations (SDs)) were extracted separately for each outcome at each follow-up period. The same approach was utilized for multi-armed RCTs, from which separate extraction was performed for each intervention-comparison pair. When data were missing in the original reports, we contacted authors for further clarification.
As suggested by Conley, Durlak & Kirsch (2015), original interventions were grouped into: (i) cognitive behavior therapy (CBT)-related if focusing on identifying and changing unhelpful cognitions, behaviors and emotional regulation; (ii) mind-body-related, i.e., interventions that facilitate the mind's capacity to affect bodily function and symptoms; and (iii) psycho-educational-related if focusing on information, discussion and didactic communication on, e.g., stress-reduction and coping. Categorization was based on the original definitions, if provided, and was otherwise made by us. Level of delivery was considered as universal if the intervention targeted students without reported mental ill health symptoms and as selective if provided for those with adverse mental health symptoms. Interventions were further divided into group or individual format. Comparators were sub-divided into active controls (i.e., another type of intervention) and inactive controls (waitlist controls, placebo-controls, "living as usual," and no intervention). Study outcomes were classified in two major categories: mental ill health outcomes consisting of anxiety symptoms, depressive symptoms, psychological distress, stress, self-reported worry, passive coping, and deteriorated quality of sleep; and positive mental health and academic performance outcomes including self-esteem, self-compassion, self-efficacy, mental or subjective well-being, resilience, active coping, happiness, stress management and academic performance.
The quality of selected trials was assessed independently by three authors (RW, KG, LL) and colleagues (AF, AM, SB: see Acknowledgements) using the Effective Public Health Practice Project Quality Assessment Tool (EPHPP), as recommended by the Cochrane Collaboration for public health reviews (Higgins, Green & Cochrane Collaboration, 2008). The EPHPP assesses selection bias, study design, confounders, blinding, data collection method, and withdrawals and dropouts to rate the study quality as strong, moderate or weak. Discrepancies in quality assessment were resolved by discussions with one of the three reviewers not involved in the review process.

Statistical analysis
Because of the variety of instruments used for measuring outcomes, a standardized mean difference using Hedges' g was chosen as a common effect size (ES) for conducting quantitative synthesis. ES was calculated separately at each post-intervention follow-up time point as a difference in means between intervention and control group, divided by the pooled within-group SD and incorporating a correction factor for small sample sizes (Borenstein et al., 2009). One trial reported Hedges' g as the study ES (Braithwaite & Fincham, 2009), while for other studies it was calculated from the available raw data. Throughout the recalculations we kept the original direction of scales indicating the improvement of outcome measures. Thus, for mental ill health outcomes ESs below zero pointed to superiority of the intervention group over the controls, while for positive mental health and academic performance outcomes, ESs above zero indicated that the results favored the intervention. For one study where follow-up means, but not SDs, were provided and the intervention effect was indicated as "non-significant" (Chiauzzi et al., 2008), we set ES to zero. To ease interpretation of the magnitude of Hedges' g, we applied Cohen's convention (Cohen, 1992) and defined the ES as small (0.2), medium (0.5), and large (0.8).
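The effect size calculation described above can be illustrated with a short Python sketch. It is not the authors' STATA code; it assumes the standard formulas for Hedges' g and its variance given by Borenstein et al. (2009), and the function names and the "negligible" label for values below 0.2 are hypothetical choices made for this illustration.

```python
import math

def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return (g, variance of g) for intervention vs. control at one follow-up."""
    df = n_i + n_c - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_i - 1) * sd_i**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_i - mean_c) / sd_pooled
    # Small-sample correction factor (common approximation)
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_i + n_c) / (n_i * n_c) + d**2 / (2 * (n_i + n_c))
    return j * d, j**2 * var_d

def cohen_label(g):
    """Map |g| onto Cohen's benchmarks; treating them as lower cut-offs is an assumption."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # hypothetical label, not used in the review

# Made-up depression scores: a negative g favors the intervention group
g, var_g = hedges_g(mean_i=10, sd_i=5, n_i=20, mean_c=12, sd_c=5, n_c=20)
print(round(g, 3), round(var_g, 3), cohen_label(g))
```

Because the original scale direction is kept, the sign of g carries the interpretation: below zero favors the intervention for mental ill health outcomes, above zero for positive mental health outcomes.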
Precautions were taken to overcome unit-of-analysis error and avoid using multiple assessments of the same construct (Higgins, Green & Cochrane Collaboration, 2008). If more than one ES was reported in a given study for the same outcome at a given follow-up point (e.g., for depression assessed by both the Hamilton depression rating scale and Beck depression inventory), we averaged ESs to obtain a single outcome measure per intervention at each measurement point (Higgins, 2006;Jones & Johnston, 2000;Kanji, White & Ernst, 2006;Peden et al., 2001;Seligman et al., 1999). ESs were also averaged within the trials with multiple interventions of similar nature, i.e., if interventions belonged to the same category (Chiauzzi et al., 2008;Mak et al., 2015). A similar approach was applied to studies with multiple comparisons (Chiauzzi et al., 2008;Rohde et al., 2014;Yang et al., 2014). An exception was the study by Kanji, White & Ernst (2006), where two control groups (an attention control and a time control) were included separately, as they were considered to differ in approach and content and thus to represent active and inactive comparisons, respectively.
Meta-analysis was conducted for all specific outcomes originally reported and for outcomes combined within mental ill health and positive mental health and academic performance categories. To analyze the combined outcomes, we applied a hierarchical approach for selecting outcomes from the studies reporting more than one from the same category. The hierarchy was based on descending order of outcome reporting, i.e., from the most often reported to the least often reported. For mental ill health outcomes the hierarchical selection was ordered as: depressive symptoms, anxiety symptoms, stress, psychological distress, self-reported worry, quality of sleep, and passive coping. For positive mental health and academic performance outcomes the order was: self-esteem, academic performance, self-efficacy, self-compassion, mental or subjective well-being, resilience, stress management, active coping, and happiness.
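The averaging of multiple ESs for the same outcome and the hierarchical selection of one outcome per study could be sketched as follows. This is an illustrative Python outline, not the authors' procedure as implemented; the record layout `(study, outcome, es)` and the function name are assumptions, and only the mental ill health hierarchy from the text is shown.

```python
from collections import defaultdict
from statistics import mean

# Order taken from the text: most often reported outcome first
ILL_HEALTH_HIERARCHY = [
    "depressive symptoms", "anxiety symptoms", "stress",
    "psychological distress", "self-reported worry",
    "quality of sleep", "passive coping",
]

def select_combined_outcome(records, hierarchy=ILL_HEALTH_HIERARCHY):
    """records: iterable of (study, outcome, es) within one follow-up category.

    Returns {study: (outcome, es)} with one ES per study. Studies reporting
    no outcome from the hierarchy are left out of the result.
    """
    by_study = defaultdict(lambda: defaultdict(list))
    for study, outcome, es in records:
        by_study[study][outcome].append(es)
    selected = {}
    for study, outcomes in by_study.items():
        for outcome in hierarchy:
            if outcome in outcomes:
                # Several instruments for the same construct are averaged
                selected[study] = (outcome, mean(outcomes[outcome]))
                break
    return selected

# Made-up example: study A measured depression with two scales and also stress
records = [
    ("A", "stress", -0.2),
    ("A", "depressive symptoms", -0.4),
    ("A", "depressive symptoms", -0.6),
    ("B", "passive coping", 0.1),
]
print(select_combined_outcome(records))
```

Selecting a single outcome per study keeps each trial from contributing more than one ES to a pooled estimate, which is the unit-of-analysis concern raised above.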
Because of the initial assumptions of between-study heterogeneity, a random-effects model incorporating both within- and between-study variability was used for quantitative synthesis. To assess the sustainability of intervention effect over time and address the variety of the follow-up lengths reported in the original studies, we categorized the post-intervention follow-ups as 3-6 months, 7-12 months, and 13-18 months. Each of the included studies reported outcome measures for at least one of these categories and quantitative synthesis was conducted separately for each category. If a given study provided several outcome measures falling in the same length category (e.g., for both three and six month follow-ups), the ES for the follow-up closest to the upper boundary (i.e., six months) was chosen (Kanji, White & Ernst, 2006;Seligman, Schulman & Tryon, 2007;Vazquez et al., 2012). Only one study (Seligman et al., 1999) assessed outcomes at follow-up periods longer than 18 months and the measurements of those periods (i.e., 24 months, 30 months, and 36 months) were not included in the meta-analysis. Finally, to obtain comparability between trials, if a study provided results based on both imputed data (i.e., intention-to-treat analysis) and non-imputed data (i.e., follow-up completers) (Reavley et al., 2014) or if both crude and adjusted ESs were available (Chase et al., 2013), we favored the imputed and crude measures for the main analysis, leaving the non-imputed and adjusted measures for sensitivity analysis.
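The follow-up categorization rule can be expressed compactly. The Python sketch below is only an illustration under the assumption that follow-up lengths are recorded in whole months; the function names are hypothetical.

```python
# Follow-up categories used in the review (months, inclusive bounds)
FOLLOW_UP_BINS = [(3, 6), (7, 12), (13, 18)]

def follow_up_category(months):
    """Return the category label for a follow-up length, or None if outside 3-18 months."""
    for low, high in FOLLOW_UP_BINS:
        if low <= months <= high:
            return f"{low}-{high}"
    return None

def pick_es_per_category(measurements):
    """measurements: iterable of (months, es) for one study and one outcome.

    Keeps, per category, the measurement closest to the upper boundary,
    i.e., the latest follow-up that falls inside the category.
    """
    chosen = {}
    for months, es in measurements:
        category = follow_up_category(months)
        if category is None:
            continue  # e.g., 24, 30, or 36 months are not pooled
        if category not in chosen or months > chosen[category][0]:
            chosen[category] = (months, es)
    return chosen

# Made-up example with follow-ups at 3, 6, 12, and 24 months
print(pick_es_per_category([(3, -0.1), (6, -0.3), (12, -0.2), (24, -0.05)]))
```

With these inputs the six-month measurement represents the 3-6 months category and the 24-month measurement is dropped, mirroring the handling of the longest follow-ups described above.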
We evaluated statistical heterogeneity among the studies using the Q and I² statistics. For Q, a p-value < 0.1 was considered to represent statistically significant heterogeneity, and I² values of 25%, 50%, and 75% were taken to indicate low, moderate and high heterogeneity, respectively (Higgins et al., 2003).
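To make the pooling and heterogeneity statistics concrete, the sketch below pools study-level ESs under a random-effects model and reports Q and I². The text does not name the between-study variance estimator, so the DerSimonian-Laird estimator used here is an assumption, as are the function name and the normal-approximation 95% CI; it is an illustration in Python rather than the STATA routine used in the review.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model (assumed estimator)."""
    k = len(effects)
    w = [1 / v for v in variances]                      # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % of variation due to heterogeneity
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q, "df": df, "I2": i2, "tau2": tau2,
    }

# Made-up example with two studies of equal precision
print(random_effects_pool([-0.5, -0.1], [0.04, 0.04]))
```

The p-value for Q comes from a chi-square distribution with k - 1 degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(Q, df)` is one way to obtain it.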
The subgroup analyses were performed by stratifying the main analysis by a priori identified moderators related to interventions (category of intervention, delivery level, type of format, type of controls), study-level moderators (initial study size, study quality) and moderators related to participant characteristics (gender, country). The analyses were performed if at least two studies were included in each subgroup. A mixed methodology was applied, with random-effects modeling used for within-group pooling, while between-group differences were assessed with a fixed-effect model. Leave-one-out influence analysis was conducted to assess the potential impact of individual studies on the overall pooled ES by omitting one study at a time (Tobias, 1999). Following the approach suggested by Hart et al. (2012), sensitivity analysis was conducted to assess whether the overall pooled ES differed if the lowest or the highest original ES was selected from the studies with multiple outcome assessments or multiple interventions or comparisons. In meta-analyses with three or more studies included, we assessed publication bias by funnel plots, Egger's regression asymmetry test, and the Begg-Mazumdar adjusted rank correlation test (Begg & Mazumdar, 1994;Egger et al., 1997).
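The leave-one-out influence analysis amounts to re-pooling the data with each study removed in turn. A minimal Python sketch follows; it again assumes a DerSimonian-Laird random-effects estimate, which the text does not specify, and the helper names and input layout are hypothetical.

```python
def pool_dl(effects, variances):
    """Compact DerSimonian-Laird random-effects pooled estimate (assumed estimator)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)

def leave_one_out(studies):
    """studies: list of (label, es, variance); needs at least two studies.

    Returns {label: pooled ES obtained when that study is omitted}.
    """
    results = {}
    for i, (label, _, _) in enumerate(studies):
        rest = studies[:i] + studies[i + 1:]
        results[label] = pool_dl([s[1] for s in rest], [s[2] for s in rest])
    return results

# Made-up example: how much does the pooled ES move when each study is dropped?
studies = [("A", -0.5, 0.04), ("B", -0.1, 0.04), ("C", -0.3, 0.04)]
for label, pooled in leave_one_out(studies).items():
    print(label, round(pooled, 3))
```

Comparing each re-pooled estimate (and, in a fuller version, its confidence interval) with the overall ES shows whether any single trial drives the result.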
All statistical analyses were performed using STATA version 13.1 (StataCorp, College Station, TX, USA). P-values < 0.05 were considered statistically significant, and all statistical tests were two-sided.

RESULTS
After removing the duplicates, 6,571 records were available for title and abstract screening. Among these, 6,519 records were excluded as not meeting the PICO-criteria, leaving 52 articles for full-text examination. Further evaluation excluded another 26 studies: post-intervention follow-up less than three months (k = 11), not enough data to calculate ES (k = 6), not an RCT (k = 5), population not relevant (k = 3), and outcome not relevant (k = 1). The selection process yielded a final number of 26 RCTs to be included in the meta-analysis (Fig. 1). Table 1 summarizes the characteristics of studies eligible for inclusion. Among the 26 RCTs (Braithwaite & Fincham, 2009;Chase et al., 2013;Cheng et al., 2015;Chiauzzi et al., 2008;Erogul et al., 2014;Fontana et al., 1999;Franklin & Franklin, 2012;Gortner, Rude & Pennebaker, 2006;Hamdan-Mansour, Puskar & Bandak, 2009;Higgins, 2006;Jones & Johnston, 2000;Kanji, White & Ernst, 2006;Kattelmann et al., 2014;Kenardy, McCafferty & Rosa, 2006;Li et al., 2015;Mak et al., 2015;Pachankis & Goldfried, 2010;Peden et al., 2001;Reavley et al., 2014;Rohde et al., 2014;Seligman et al., 1999;Seligman, Schulman & Tryon, 2007;Shapiro et al., 2011;Vazquez et al., 2012;Yang et al., 2014;Zheng et al., 2015), CBT-related interventions were assessed in 11 studies, while mind-body-related and psycho-educational-related interventions were assessed in 10 and five studies, respectively. Universal and selective delivery levels were equally present (k = 13 for both). Face-to-face group format was the most common (k = 16). At least one mental ill health outcome was assessed in 24 studies, while at least one positive mental health outcome was appraised in 14 studies. Twenty-three trials reported at least one outcome measurement during 3-6 months post-intervention follow-up, with eight and five trials reporting corresponding measurements during 7-12 months and 13-18 months follow-ups. None of the interventions had an organizational approach.
More detailed characteristics of the selected studies are presented on-line (Table S1). The study quality assessed in all 26 RCTs varied between strong (k = 4), moderate (k = 12), and weak (k = 10). When subdivided by the outcome categories, the 24 trials with at least one mental ill health outcome were of strong (k = 4), moderate (k = 11), and weak (k = 9) quality, and the 14 trials with at least one positive mental health or academic performance outcome were of strong (k = 3), moderate (k = 6), and weak (k = 5) quality. Across all studies included in the analysis, selection bias was the component most commonly rated as weak (k = 21) (Table S2).

Effects and sustainability over time
Interventions preventing mental ill health
As presented in Table 2, for the combined mental ill health outcomes an aggregated ES for all preventive interventions yielded a superiority of interventions over the comparisons at 3-6 months and 7-12 months of post-intervention follow-up, although the effects were small. Pooled ES did not reach statistical significance for follow-up periods of 13-18 months. High-to-moderate heterogeneity was detected at all three follow-up periods with I² of 79.5%, 74.2%, and 58.2%, respectively. Publication bias was evident for studies with 3-6 months follow-up (Egger's test p-value = 0.013), though not for studies with longer follow-up periods (7-12 months: p-value = 0.151; 13-18 months: p-value = 0.141) (Fig. S1). Influence analysis revealed no indication that individual RCTs, if omitted, would significantly influence the observed overall ESs. As previously noted, sensitivity analyses were performed for studies with multiple outcome measures reported for the same follow-up and for studies with multiple interventions or comparisons. Pooling together the highest ESs originally reported for these studies did not alter the results of the main analysis 3-6 months: . Only one study reported the results using both imputed and non-imputed data (Reavley et al., 2014), with the former included in the main meta-analysis. Alternative inclusion of the latter did not change the overall ES.
Table 2 Meta-analysis and sub-group analyses for hierarchically selected mental ill health outcomes, stratified by the length of post-interventional follow-up periods (3-6, 7-12, and 13-18 months). All interventions: k = 21, 9, and 3, respectively.
In sub-group analyses for the combined mental ill health outcomes, studies employing CBT-related interventions revealed significant pooled ESs for 3-6 month and 13-18 month follow-ups (Table 2; Fig. 2). Less consistent results were observed for mind-body-related interventions. No superiority of intervention group appeared among studies with psycho-educational interventions. Pooled ESs for universal preventive interventions yielded significant results for follow-up up to 7-12 months. Less consistency appeared in the aggregated results for selective interventions and interventions conducted face-to-face in groups.
Trials with small sample size and trials comprising more than 60% females yielded significant effects for up to 7-12 months of follow-up. The small numbers of studies might explain the lack of consistency in the results of other sub-group comparisons. The high heterogeneity seen for studies with 3-6 months of follow-up might reflect differences in delivery level (p-value for Q between sub-groups = 0.006) and study size (p < 0.001), while for studies with follow-up periods of 7-12 months, heterogeneity could be explained by differences in type of comparison (p < 0.001), study size (p = 0.01), and country where the RCT was conducted (p = 0.003). We were unable to detect between-group differences for trials with follow-up of 13-18 months because of the small number of studies in the sub-groups. Assessment of the specific mental ill health outcomes revealed a sustainable effect of all interventions combined lasting up to 13-18 months for symptoms of depression (ES = -0.30 (95% CI [-0.51, -0.08])) (Table S3). For symptoms of anxiety, sustainability was observed up to 7-12 months (ES = -0.27 (95% CI [-0.54, -0.01])). Only one study assessed the effect of interventions targeting anxiety during 13-18 months of post-intervention follow-up and, hence, we were unable to perform meta-analysis. For symptoms of stress, reductions lasted up to 3-6 months post-intervention (ES = -0.30 (95% CI [-0.58, -0.03])). Other comparisons were either inconclusive or quantitative synthesis was not performed because of the small number of studies.
Interventions promoting mental health and academic performance: a paucity of outcomes
Table 3 presents overall ESs for the combined positive mental health and academic performance outcomes with rather limited data available, in particular for the follow-up periods longer than 3-6 months. All interventions combined showed superiority over the controls during 3-6 months of follow-up with a small, but significant pooled ES. For longer follow-up periods the results were inconclusive. High heterogeneity was detected when studies with 3-6 months follow-up were pooled (I² = 86.5%). Because of the small number of studies, publication bias was assessed only for studies with 3-6 months of follow-up and was detected (Egger's test p-value = 0.03) (Fig. S1). Influence analysis indicated that each of four individual studies (Erogul et al., 2014;Hamdan-Mansour, Puskar & Bandak, 2009;Pachankis & Goldfried, 2010;Peden et al., 2001), if omitted, would reduce the significant overall ES for the studies with 3-6 months follow-up to borderline significance. Overall ESs at follow-ups of 7-12 and 13-18 months remained non-significant regardless of individual study influences. Sensitivity analyses pooling the highest ESs originally reported for studies with multiple outcome assessment or multiple interventions or comparison groups showed no alteration to the overall results at 3-6 months follow-up (ES = 0.32 (95% CI [0.06, 0.58])), but made the overall ESs significant for 7-12 months follow-up (ES = 0.53 (95% CI [0.20, 0.87])) as well as for 13-18 months follow-up (ES = 0.53 (95% CI [0.21, 0.86])). However, only two studies were assessed within each category of 7-12 and 13-18 months follow-ups. Use of the lowest originally reported ES did not affect the results of the main analysis (3-6 months: ES = 0.32 (95% CI [0.06, 0.58]); 7-12 months: ES = 0.16 (95% CI [-0.18, 0.50]); 13-18 months: ES = 0.16 (95% CI [-0.17, 0.49])).
One study reported both crude and adjusted outcome assessment (Chase et al., 2013). Use of the adjusted ES for sensitivity analysis did not alter the results observed in the main analysis.
Sub-group analyses for the combined positive mental health and academic performance outcomes were performed only for studies with 3-6 months follow-up (Table 3; Fig. 3). Superiority of interventions over comparisons was shown for CBT-related interventions, selective delivery level, face-to-face group format, RCTs with inactive comparisons, studies with small sample size and trials conducted in the US. Between-group difference was significant for delivery level (p < 0.001), format type (p < 0.001), study size (p < 0.001), and gender mix (p = 0.02). Sub-group analyses for studies with longer follow-up either revealed non-significant results or were impossible to conduct owing to the small number of trials in the sub-groups. Because of lack of data on the specific positive mental health and academic performance outcomes, only studies on active coping, self-esteem, and self-efficacy with 3-6 months follow-up were quantitatively assessed (Table S4). Sustainability of the intervention effect was observed for active coping (ES = 0.75 (95% CI [0.19, 1.30])) with no significant effects shown for other outcomes.
Table 3 Meta-analysis and sub-group analyses for hierarchically selected positive mental health and academic performance outcomes, stratified by the length of post-interventional follow-up periods (3-6, 7-12, and 13-18 months). All interventions: k = 11, 2, and 2, respectively; Hedges' g at 3-6 months: 0.32.

DISCUSSION
Our systematic review and meta-analysis showed sustainability of the benefits of mental health interventions targeting students in higher education, though in most of the analyses, the pooled ESs yielded significant, but small overall effects. For the combined mental ill health outcomes, the observed effects across all preventive interventions were sustained for up to 7-12 months post-intervention. Sustainability of effects was most pronounced for interventions designed to reduce the symptoms of depression, for which the superiority of intervention groups over the comparisons remained significant for up to 13-18 months post-intervention. For the combined positive mental health and academic performance outcomes, aggregated results across all promotion interventions revealed slightly shorter, but still evident sustained effects, which remained significant at post-intervention follow-up of 3-6 months.
To our knowledge, this is the first systematic review and meta-analysis focusing primarily on the sustainability of the effects of mental health promoting and mental ill health preventing interventions among students in higher education and analyzing different categories of follow-up duration. A direct comparison to the existing literature was therefore difficult as other reviews mostly assessed the effects measured at the longest follow-up periods (Conley, Durlak & Kirsch, 2015;Conley et al., 2016, 2017). The first of these reviews (Conley, Durlak & Kirsch, 2015) showed the duration of follow-up to be negatively correlated with aggregated ES across mental ill health and positive mental health outcomes combined as well as no effect for psycho-educational interventions. A similar tendency for the effects of intervention to become non-significant as the duration of follow-up increases was observed in our study, though the sustainability of effects differed between ill-health and positive mental health outcomes. As in Conley's review (Conley, Durlak & Kirsch, 2015), no effects of psycho-educational interventions on any outcomes were evident in our data, regardless of the duration of follow-up. The second review (Conley et al., 2016) reported a significant effect of universal interventions at any follow-up periods ranging between 13 and 52 weeks as well as a positive effect of selective interventions during the follow-up periods of 2-26 weeks. Similarly, in our study mental ill health outcomes were reduced by universal interventions for up to 7-12 months of follow-up and by selective interventions at follow-ups of up to 3-6 months, although our results on positive mental health and academic performance outcomes were less conclusive. Similar to the third review (Conley et al., 2017), our results indicated that the most sustainable effects were observed for interventions designed to reduce the symptoms of depression and symptoms of anxiety. Although our literature search for intervention studies was not limited to psychological interventions, only this type was retrieved.
The scarcity of organizational mental health promoting interventions was confirmed by a scoping review (Enns et al., 2016). An exception may be a recent systematic review of learning environment interventions for medical student well-being, which suggested changes to the curriculum (Wasson et al., 2016). Its results support previous findings suggesting that, to maximize the effectiveness of mental health promotion, all levels of delivery must contribute, i.e., not just the individual and group levels, but also the structural and societal levels (Hamilton & Bhatti, 1996). To further improve the sustainability of student mental health promotion, psychological interventions may be combined with a whole-setting approach, as endorsed by the WHO initiative Health Promoting Universities (HPU) (World Health Organization, 1995).

Limitations
Systematic reviews on student mental health have identified the lack of follow-up data on outcome assessment as a major obstacle to determining the long-term effect of interventions (Conley, Durlak & Kirsch, 2015; Conley et al., 2016; Davies, Morriss & Glazebrook, 2014; Farrer et al., 2013). Likewise, the scarcity of studies assessing the effects of interventions at post-intervention follow-ups longer than three months, along with the substantial variability in the lengths of follow-up reported in the original studies, should be considered major limitations of our review. In particular, the lack of original evidence affected our analysis of positive mental health outcomes, as it restricted us mainly to aggregating the effects of interventions with 3-6 months of follow-up. Other limitations must also be considered. First, in most cases the low number of studies in sub-groups prevented us from exploring the moderating effect of intervention type, study-level determinants and participant characteristics during follow-up periods longer than six months, which makes the results of the sub-group analyses tentative. This also precluded an in-depth investigation of the sources of heterogeneity, which was found to be mostly high. Second, our intention to analyze two dimensions of mental health, which resulted in combining the original outcomes into the "mental ill health" and "positive mental health and academic performance" outcome categories with a hierarchical approach applied, could have increased heterogeneity. In subsequent analyses, we attempted to reduce heterogeneity by pooling studies that reported the same specific outcomes, though for several outcomes this was not possible due to data scarcity. Third, the evidence was insufficient to obtain aggregated ESs for some specific outcomes, in particular for self-reported worries, passive coping, academic performance, self-compassion, mental and subjective well-being, resilience and happiness rating.
Fourth, substantial variability exists in measurement instruments and, in several cases, the same outcome was measured with different scales. We tried to address this limitation by choosing Hedges' g as the ES and by investigating how sensitive the aggregated results were to our initial approach of combining the original ESs in cases of multiple outcome measures or in multi-armed RCTs. The sensitivity analyses supported the robustness of our findings for mental ill health, though for positive mental health outcomes the use of the lowest ESs from the original studies altered the results for studies with 7-12 and 13-18 months of follow-up. Fifth, more than 30% of the original studies were assessed as being of weak quality. To address this issue, we conducted sub-group analyses stratifying the trials by study quality. For both categories of outcomes, these analyses revealed inconclusive results when trials of insufficient quality were pooled, which should be taken into account when interpreting our results. Furthermore, selection bias was the most commonly identified weakness. This bias, whether induced by the investigators or caused by self-selection, may have resulted in either underestimation or overestimation of the original ESs and could therefore affect the aggregated results. Finally, the results should be seen in the context of the publication bias present among the studies with 3-6 months of follow-up and of our inability to assess publication bias for positive mental health outcomes at follow-ups longer than six months. The publication bias may have resulted from our restriction to English-language publications at the final stage of selection.
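To make concrete how Hedges' g places outcomes measured on different scales on a common metric, and how heterogeneity enters a random-effects pooled estimate, the following is a minimal sketch rather than the analysis code used in this review. It assumes two independent groups described by means, standard deviations and sample sizes, uses the common approximation for the small-sample correction factor, and assumes the DerSimonian-Laird estimator for the between-study variance (the review does not state which estimator was used). The function names `hedges_g` and `pool_random_effects` are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, var_g) for two independent groups from summary statistics."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two arms.
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    # Small-sample correction factor (common approximation).
    j = 1 - 3 / (4 * df - 1)
    # Approximate sampling variance of the uncorrected difference d.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool_random_effects(effects, variances):
    """Pool effect sizes with a random-effects model.

    Assumes the DerSimonian-Laird estimator of the between-study variance.
    Returns (pooled, ci_low, ci_high, i2), with I^2 in percent.
    """
    k = len(effects)
    w = [1 / v for v in variances]
    # Fixed-effect estimate, needed for Cochran's Q.
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    # Share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2
```

Under these assumptions, a sensitivity analysis of the kind described above would amount to calling `pool_random_effects` again after replacing each study's combined ES with, for example, its lowest reported ES, and comparing the two pooled estimates and their confidence intervals.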

CONCLUSION
Despite the limitations, the evidence suggests that the effects of mental ill health preventive interventions are sustained in the long term, in particular for interventions designed to reduce the symptoms of depression and symptoms of anxiety. Interventions designed to promote positive mental health offer promising, but shorter-lasting effects. As the research field of health promoting interventions for students expands, future studies may build on our attempts to establish the effectiveness and sustainability of those interventions, e.g., by ascertaining the effects for specific positive mental health outcomes. In addition, future research should focus on organizational mental health interventions to investigate their potential for students in tertiary education.