Preventing new episodes of bipolar disorder in adults: Systematic review and meta-analysis of randomized controlled trials

Uncertainty remains regarding the relative efficacy of maintenance pharmacotherapy for bipolar disorder (BD), and available data require updating. The present systematic review and meta-analysis aims to consolidate the evidence from the highest quality randomized controlled trials (RCTs) published up to July 2021, overcoming the limitations of earlier reviews. PubMed and the Cochrane Central Register of Controlled Trials were searched for double-blind RCTs involving lithium, mood stabilizing anticonvulsants (MSAs), antipsychotics, antidepressants, and other treatments. Rates of new mood episodes with test vs. reference treatments (placebo or alternative active agent) were compared by random-effects meta-analysis. Polarity index was calculated for each treatment type. Eligible trials involved ≥6 months of maintenance follow-up. Of 2,158 identified reports, 22 met study eligibility criteria and involved 7,773 subjects stabilized for 1-12 weeks and followed up for 24-104 weeks. Psychotropic monotherapy overall (including lithium, MSAs, and second generation antipsychotics [SGAs]) was more effective in preventing new BD episodes than placebo (odds ratio, OR = 0.42; 95% confidence interval, CI 0.34-0.51, p < 0.00001). Significantly lower risk of new BD episodes was observed with the following individual drugs: aripiprazole, asenapine, lithium, olanzapine, quetiapine, and risperidone long-acting (ORs varied 0.19-0.46). Adding aripiprazole, divalproex, quetiapine, or olanzapine/risperidone to lithium or an MSA was more effective than lithium or MSA monotherapy (OR = 0.37; 95% CI 0.25-0.55, p < 0.00001). Active treatment favored prevention of mania over depression. The key limitations were the "responder-enriched" design of most trials and high heterogeneity of outcomes. The PROSPERO registration number is CRD42020162663.

In the past two decades, eleven meta-analytical reviews have assessed the efficacy of prophylactic/maintenance pharmacotherapy for BD in randomized clinical trials (RCTs) (see the comparison table in the Supplementary file S1). These reviews included 692-9821 participants per meta-analysis (average N = 3768) who were aged ≥16 years and had clinical follow-up for 3-12 months. Most trials had a "responder-enriched" design, in which subjects who responded favorably to initial treatment for acute mania or bipolar depression were stabilized and then randomized either to continue the same treatment or to discontinue it in favor of an alternative medication or placebo.
A meta-analytical review of lithium trials by Geddes et al. found lithium to be more effective than placebo in preventing any BD relapses and manic relapses (Geddes et al., 2004). A later review of RCTs on lithium by Severus et al. found this agent to be more effective than placebo in preventing any mood episodes, manic episodes, and, depending on the type of analysis applied, depressive episodes (Severus et al., 2014). The most recent review by Oya et al. of RCTs on lithium and lamotrigine found both agents to be significantly superior to placebo in preventing new BD episodes of any mood polarity (Oya et al., 2019). Most other meta-analytical reviews have focused on a broader set of BD treatments, including MSAs and antipsychotics. Vieta et al. found that monotherapy with aripiprazole, olanzapine, quetiapine, risperidone, or valproate (VPA) was associated with fewer new episodes of both mania and bipolar depression than placebo, and that among various combinations, only quetiapine added to lithium or VPA was more effective than lithium or VPA alone (Vieta et al., 2011). Miura et al. and Kishi et al. identified multiple drug regimens associated with significantly fewer new BD episodes than placebo (asenapine, aripiprazole, olanzapine, quetiapine, lamotrigine, lithium, VPA, aripiprazole + lamotrigine, aripiprazole + VPA, and others) (Miura et al., 2014; Kishi et al., 2020a). The most recent meta-analyses showed that monotherapy with SGAs or MSAs was associated with lower overall BD recurrence rates (Kishi et al., 2020b), and that combination SGA + mood stabilizer prevented recurrence for up to 12 months for BD type I compared with mood stabilizer alone (Kishi et al., 2021). Only two meta-analyses focused exclusively on SGAs (Lindström et al., 2017) or specifically on olanzapine (Cipriani et al., 2010). Liu et al. found greater benefits of antidepressants (ADs), alone or added to lithium or an MSA, over placebo in reducing risk of new episodes of bipolar depression without increased risk of mood switching into mania. However, compared with MSA monotherapy, ADs alone increased risk of mania without reducing risk of new depression (Liu et al., 2017).
Most meta-analyses did not cover the entire range of available treatments for BD, and the most recent comprehensive meta-analytical review analyzing all BD treatment classes is now relatively outdated (Miura et al., 2014). The systematic review for the latest guidelines by the International College of Neuropsychopharmacology (CINP) focusing on long-term BD treatment in adults was published in 2017 (Fountoulakis et al., 2017). Some trials analyzed in earlier reviews were as brief as 12 weeks post-treatment, and thus may have included "relapses" of the initial (index) mood episode rather than a true "new" BD episode ("recurrence"). Some reviews (Miura et al., 2014; Kishi, Ikuta, et al., 2020) also included unblinded or partially blinded trials, or excluded "continuation" studies (where subjects were randomly assigned to a maintenance treatment regimen while in an acute mood episode) (Miura et al., 2014). These considerations indicate the need for an updated and comprehensive review of double-blind RCTs, comparing a broad range of prophylactic/maintenance BD pharmacotherapies based on post-treatment observation for at least six months. The aim of the present meta-analysis is to consolidate the evidence on maintenance BD pharmacotherapy from a set of the highest quality RCTs published to date, overcoming the limitations of earlier reviews.

Eligibility criteria
We included studies that were prospective, randomized, controlled, double-blind treatment trials for BD, reported between each database inception date and July 10th 2021, had prospective follow-up of at least six months, and had ≥30 participants per treatment arm. Diagnosis of BD was based on standardized criteria of the American Psychiatric Association Diagnostic and Statistical Manual of Mental Disorders (DSM) or the World Health Organization International Classification of Diseases (ICD). All languages were accepted provided there was at least an abstract in English. We excluded reports focusing on specific patient populations (such as pregnant or postpartum women, juveniles or the elderly); participants limited to those with particular co-occurring conditions (including anxiety, substance use, personality disorders, or suicidal behavior); trials limited to patients with rapid-cycling BD, or trials including cases of unipolar major depression or schizoaffective disorder or rapid cycling BD without reporting results for BD subgroups separately; trials with subjects known to be unresponsive to BD pharmacotherapy (lithium and/or VPA); and reports lacking quantitative data on treatment response (see the full list of selection criteria in Supplementary file S2). Subjects included were men or women, aged ≥18 years, hospitalized or ambulatory, with any type of BD (I, II, unspecified), entering trial in any current clinical state (manic, mixed, depressed, euthymic, with or without psychotic features), with any duration of illness and any previous treatment. Test treatments considered were lithium, MSAs, first generation antipsychotics, SGAs, ADs, and other substances (agomelatine, cannabinoids, ketamine, melatonin, memantine, or omega-3 fatty acids). Medicines could be given as a monotherapy or as an adjunct to other treatments, with any dose, formulation, or route of administration. 
Comparators could be an inactive placebo or an alternative psychotropic agent, and each treatment arm could involve monotherapy, drug combination, or drug with a psychosocial intervention.

Information sources
The Cochrane Central Register of Controlled Trials was searched ("Advanced Search" option) with five embedded sources of information: PubMed, EMBASE (Excerpta Medica dataBASE), ClinicalTrials.gov, ICTRP (International Clinical Trial Registry Platform), and CINAHL (Cumulative Index to Nursing and Allied Health Literature). We also searched the original PubMed website separately.

Search strategy, data selection and data collection processes
Two primary reviewers (AN, CG) conducted independent electronic literature searches through all databases using standardized search formulas detailed in the Supplementary files S3 and S4. Potentially relevant publications were identified by screening titles and abstracts, followed by review of the full texts and references of plausible reports. Other authors' meta-analyses and systematic reviews on BD maintenance pharmacotherapy were also screened by the two primary reviewers, and each referenced trial paper was examined. Each of the two primary reviewers composed a list of RCTs meeting the eligibility criteria (described above). Discrepancies in selections were resolved by consensus among the authors, and the final list of selected trials was consolidated. Duplicated data were reduced to the most recent or most comprehensive report. A study selection flowchart was prepared.
The original RCT investigators were contacted via email by the first author (AN) when the full text of a paper of interest was not accessible, or to obtain unpublished data.
From the final set of reports selected for analysis, the two primary reviewers (AN, CG) independently extracted summary data into an Excel spreadsheet; these data were audited and revised by other reviewers (EV, MT, RB). A full list of extracted data items is reported in the Supplementary file S5.

Data items, study risk of bias assessment, and effect measure
The primary outcome of interest was the proportion of subjects who developed at least one new BD episode of any type during the clinical follow-up after maintenance phase randomization. The secondary outcomes were proportions of subjects with new manic, mixed, and depressive episodes. An intent-to-treat (ITT) sample size (defined as subjects who received at least one dose of medication, had at least one follow-up assessment, or both) was used as a denominator (subjects at risk) to calculate outcome rates (subjects with a new episode). If the ITT sample size was not reported, we used the randomized sample size. When several follow-up points were reported, we used data from the longest available observation.
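The denominator rule described above can be illustrated with a minimal Python sketch. The function name and the counts are hypothetical and are not taken from any included trial; the sketch only shows the fallback from the ITT sample size to the randomized sample size.

```python
from typing import Optional


def new_episode_rate(events: int, randomized_n: int, itt_n: Optional[int] = None) -> float:
    """Proportion of subjects with at least one new episode.

    The ITT sample size is the preferred denominator; the randomized
    sample size is used only when the ITT size is not reported.
    """
    denominator = itt_n if itt_n is not None else randomized_n
    if denominator <= 0 or not 0 <= events <= denominator:
        raise ValueError("need denominator > 0 and 0 <= events <= denominator")
    return events / denominator


# Hypothetical arm: 40 subjects with a new episode, 160 randomized, 150 in ITT.
rate_itt = new_episode_rate(40, 160, itt_n=150)  # denominator is 150
rate_fallback = new_episode_rate(40, 160)        # ITT not reported, denominator is 160
```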
Because all included trials met our stringent inclusion and exclusion criteria (listed above), risk of study bias was minimized from the outset, and no quality score was assigned to individual trials. We attempted to grade each study further at a higher level of data granularity (N ≥ 50 per treatment arm, follow-up ≥48 weeks, stabilization ≥4 weeks, and separate psychometric scales to assess mania and depression); however, this failed to yield any useful cut-offs for a quality hierarchy without compromising the power of the analysis.
The main effect measure for the outcomes of interest was the odds ratio (OR) with 95% confidence interval (CI) for the number of newly observed BD episodes during follow-up with test treatment vs. comparison treatment. To facilitate interpretation of drug differences we also reported 1/OR, with values > 1.0 signifying an excess of new episodes with placebo (or comparator treatment) vs. test treatment.
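For a single trial, the effect measure and its inverse can be sketched as follows. This is a minimal illustration assuming a simple 2x2 table and a Wald-type interval on the log-odds scale; the function names and the example counts are hypothetical, and the pooled estimates in this review were produced by meta-analysis software rather than by this code.

```python
import math


def odds_ratio_ci(events_test, n_test, events_ref, n_ref, z=1.96):
    """Odds ratio with a Wald-type CI computed on the log-odds scale."""
    a, b = events_test, n_test - events_test  # test arm: events, non-events
    c, d = events_ref, n_ref - events_ref     # reference arm: events, non-events
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive in this simple sketch")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def invert_or(odds_ratio, lower, upper):
    """Return 1/OR with its CI; the bounds swap places when inverted."""
    return 1 / odds_ratio, 1 / upper, 1 / lower


# Hypothetical trial: 30/100 new episodes on test drug vs. 50/100 on placebo.
or_, lo, hi = odds_ratio_ci(30, 100, 50, 100)
inv_or, inv_lo, inv_hi = invert_or(or_, lo, hi)
```

Applying `invert_or` to the rounded pooled add-on estimate reported in this review, OR = 0.37 (0.25-0.55), gives approximately 2.70 (1.82-4.00); the small difference from the reported 1/OR interval of 1.81-4.02 is presumably due to rounding of the inputs.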

Synthesis methods
Meta-analyses were conducted using the Review Manager software (RevMan, 2020) and SPSS 19.0 (IBM SPSS Statistics for Windows, Version 19.0; IBM Corp., Armonk, NY, 2010), and verified with Stata 13 software (Stata, 2013) to make sure that calculations performed independently by two authors (AN, RB) were in accord. Chi-squared test was used to compare the proportion of new BD episodes in two comparison groups (contingency tables); t-test was used to compare average event rates between the two comparison groups, and correlation analysis (Spearman's nonparametric r coefficient) was used to explore the association of the outcome rates and a continuous variable such as time spent in stabilization or follow-up duration, after verifying normality of data distribution with the Shapiro-Wilk test. Mantel-Haenszel random effects modeling was used to adjust for variability among trial results. Heterogeneity was reported using the I² index. All p-values were two-sided and the difference was considered significant at p-value < 0.05. Separate meta-analyses were performed for monotherapies, add-on treatments, medication classes, and individual drugs, as well as for manic and depressive subtypes of outcome episodes.
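The pooling step and the I² index can be sketched as follows. Note the substitution: this is a simplified inverse-variance DerSimonian-Laird random-effects estimator in plain Python, not the Mantel-Haenszel routine of Review Manager, so it would not be expected to reproduce the reported values exactly. The function name and all inputs are illustrative assumptions.

```python
import math


def pool_random_effects(log_ors, ses, z=1.96):
    """DerSimonian-Laird random-effects pooling of log odds ratios."""
    w = [1 / se ** 2 for se in ses]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I² in percent
    w_re = [1 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se_pooled),
        "ci_high": math.exp(pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs: three trials with identical effects (no heterogeneity)
# and two equally precise trials with very different effects.
homogeneous = pool_random_effects([math.log(0.5)] * 3, [0.2, 0.3, 0.4])
heterogeneous = pool_random_effects([math.log(0.2), math.log(0.8)], [0.1, 0.1])
```

In the second example the two trials receive equal weight, so the pooled OR is the geometric mean of 0.2 and 0.8, and nearly all of the observed variance is attributed to between-trial heterogeneity.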
The "polarity index" (PI) (Popovic et al., 2012) was computed for each treatment type as the ratio of the meta-analytically computed number-needed-to-treat (NNT) for BD depression to the NNT for mania; PI > 1.0 is considered to indicate greater antimanic than antidepressive effect of treatment.
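The PI ratio can be illustrated with a short Python sketch. It assumes that NNT is taken as the reciprocal of the absolute risk reduction from simple event rates, whereas the review used meta-analytically computed NNTs; the function names and rates are hypothetical.

```python
def nnt(rate_control: float, rate_treated: float) -> float:
    """Number needed to treat, as 1 / absolute risk reduction."""
    arr = rate_control - rate_treated
    if arr <= 0:
        raise ValueError("NNT is undefined here when treatment shows no risk reduction")
    return 1 / arr


def polarity_index(nnt_depression: float, nnt_mania: float) -> float:
    """PI = NNT(depression) / NNT(mania); PI > 1 suggests a mainly antimanic effect."""
    return nnt_depression / nnt_mania


# Hypothetical rates of new episodes (control vs. treated):
nnt_mania = nnt(0.30, 0.15)        # ARR 0.15, NNT about 6.7
nnt_depression = nnt(0.25, 0.20)   # ARR 0.05, NNT about 20
pi = polarity_index(nnt_depression, nnt_mania)  # about 3.0
```

Because each NNT is the reciprocal of a risk reduction, the PI equals the risk reduction for mania divided by the risk reduction for depression, which is why larger values point to predominantly antimanic prophylaxis.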

Bias mitigation
Reporting bias was mitigated by exhaustive search of all relevant trials with both positive and negative findings. Certainty bias was minimized by using a stringent set of trial inclusion and exclusion criteria.
Meta-regression modeling with Stata 13 explored the association of the outcome with the class of test drug, index mood episode polarity, pre-specified minimum duration of trial stabilization phase, pre-specified maximum duration of clinical follow-up, sample size, and publication year.

Registration
This systematic review and meta-analysis is reported in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Liberati et al., 2009), and is registered with the International Prospective Register of Systematic Reviews (PROSPERO; CRD42020162663).

Study selection and study characteristics
Initially, 2158 reports were identified, and 22 of them, representing a total of 7773 unique subjects, met our eligibility criteria and were selected for analysis, as summarized in a flowchart (Figure 1). Some trials were excluded because they involved subjects with inadequate initial response to lithium or VPA administered for at least 2 weeks, who were thus considered non-responders (Bowden et al., 2010, 2013). Four older trials involving the AD imipramine were excluded after discussion within our research team, being considered unrepresentative of current clinical practice (Prien et al., 1984; Quitkin et al., 1981; Kane et al., 1981, 1982). One RCT by Calabrese et al. was excluded because it focused on rapid-cycling BD (Calabrese et al., 2000). The only RCT on omega-3 polyunsaturated fatty acids that met our study inclusion criteria failed to show a significant prophylactic effect of this adjunctive supplement (McPhilemy et al., 2020), and was biased by different baseline pharmacotherapy regimens in patients from the compared cohorts. One recent study on the AD citalopram (Ghaemi et al., 2021) allowed inclusion of patients who had been depressed for ≥8 weeks despite receiving a mood stabilizer for ≥4 weeks, which may indicate that the cohort included non-responders. Moreover, patients were not re-randomized after completing the acute treatment phase. The table in Supplementary file S1 displays reasons for exclusion of trials that were analyzed in other systematic reviews. Salient characteristics of the 22 selected trials are summarized in Table 1.
The included studies were grouped to compare: [a] monotherapy vs. placebo, [b] active medication added to lithium or MSA vs. lithium or MSA alone, and [c] two active drugs. Of 13 monotherapy trials, 6 had three treatment arms, with two active treatments and one placebo arm (used as a control for both active agents in our meta-analysis). Thus, in total there were 13 monotherapy trials (with 19 pairwise comparisons), 8 add-on trials (8 pairwise comparisons), and one trial comparing two active treatments, for a total of 28 pairwise comparisons (Table 1). In all included trials the participants were first treated for an acute episode of mania or bipolar depression, and then, after achieving "recovery," "remission," or "treatment response," they were clinically "stabilized" over a period of 1-12 weeks (mean 6.7 weeks) and randomized to one of the maintenance treatment arms. Thus, almost all trials had a "responder-enriched" design. The pre-specified follow-up duration ranged from 24 to 104 weeks (mean 58.5 weeks). One study pre-specified "≥140 mood episode recurrences" as an observation endpoint, which was reached at 79.7 weeks on average with paliperidone and 40.4 weeks with placebo (Berwaerts et al., 2012).
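The comparison counts given above follow from simple bookkeeping, sketched here in Python for transparency. The variable names are illustrative; the numbers are those stated in the text.

```python
monotherapy_trials = 13
three_arm_monotherapy_trials = 6  # two active arms sharing one placebo arm
add_on_trials = 8                 # one pairwise comparison each
active_comparator_trials = 1      # one pairwise comparison

two_arm_monotherapy_trials = monotherapy_trials - three_arm_monotherapy_trials

# A two-arm trial yields one comparison; a three-arm trial yields two.
monotherapy_comparisons = (
    two_arm_monotherapy_trials * 1 + three_arm_monotherapy_trials * 2
)

total_trials = monotherapy_trials + add_on_trials + active_comparator_trials
total_comparisons = (
    monotherapy_comparisons + add_on_trials + active_comparator_trials
)
# Comparisons against a placebo arm (monotherapy and add-on trials only).
placebo_controlled_comparisons = monotherapy_comparisons + add_on_trials
```

The last quantity, 27, matches the number of drug vs. placebo comparisons entered into the meta-regression.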

Risk of any new mood episode
The overall mean proportion of subjects with any new BD episode following randomization was significantly lower with active medication (as a monotherapy or as an adjunct to lithium/MSA) compared with placebo alone or with lithium or MSA alone. Meta-analysis of 8 placebo-controlled add-on trials found an overall superiority of active treatment added to lithium or MSA vs. lithium or MSA alone in preventing any new BD episode (Figure 2); the resulting overall pooled OR was 0.37 (95% CI, 0.25-0.55; z = 4.88, p < 0.00001), or 1/OR 2.70 (95% CI, 1.81-4.02). The heterogeneity among add-on trial results was also high (I² = 74%). Several individual medicines added to lithium or MSA yielded significantly lower odds of any new BD episode compared with lithium or MSA alone; they included aripiprazole, divalproex extended release (ER), quetiapine, and either olanzapine or risperidone (Figure 2).
The only trial with two active comparators moderately favored olanzapine over lithium (1/OR = 1.62; 95% CI, 1.09-2.41; z = 2.40, p = 0.02). Table 2 provides a summary of the effects of all treatments in both monotherapy and add-on trials, as well as comparison of drug classes or individual drugs for prevention of any BD episode. All drug classes showed significant preventive efficacy, with comparable pooled OR in the following order (from lowest to highest OR value): SGAs, lithium, and MSAs.

Polarity of index and outcome mood episode
The most common polarity of the index mood episode was manic or mixed (14/22 trials, 63.6%). Among subjects with a new BD episode from all treatment arms, 49.5% experienced a manic or mixed episode, and 50.4% experienced depression.
The "polarity index" (PI) was 1.38 for all agents, 2.29 for lithium, and 1.57 for SGAs, indicating greater effects in preventing mania; but it was only 0.38 for the MSA class, indicating greater effects in preventing bipolar depression (Table 3). Thus, based on the PI value, lithium had the highest antimanic prevention efficacy.

Meta-regression analysis and correlation analysis
Meta-regression modeling of the rate of new BD episodes based on 27 comparisons of drug vs. placebo found no significant association of trial outcome with drug class (SGA, MSA, lithium), index mood episode polarity, minimum required duration of stabilization, maximum follow-up duration, sample size, or trial reporting year (all z-scores were ≤1.23, all p-values were ≥0.22; not shown). Among placebo-treated subjects there was a weak nonsignificant inverse association between duration of stabilization phase (in weeks) and rates of new BD episodes (Spearman's r = -0.175, p = 0.43). There was also a statistically significant positive correlation between duration of follow-up (in weeks) and the rate of new BD episodes (r = 0.419, p = 0.005) among placebo-treated subjects.
[Table note, polarity index: values > 1.00 favor reduction of risk for mania more than for bipolar depression. Random-effects meta-analyses were performed comparing active treatment versus placebo.]
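The rank correlations reported in this section can be illustrated with a small pure-Python sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks. The data points are hypothetical placeholders, not the trial-level values used in this review, and no p-value is computed.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical placebo arms: follow-up duration (weeks) and new-episode rate.
follow_up_weeks = [24, 26, 52, 104]
episode_rates = [0.30, 0.35, 0.50, 0.60]
r_up = spearman_r(follow_up_weeks, episode_rates)
r_down = spearman_r(follow_up_weeks, list(reversed(episode_rates)))
```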

Discussion
This meta-analytical review of BD maintenance pharmacotherapies is one of the largest, in terms of total patient sample size, among those published in the past two decades (mentioned above). It adds evidence from seven more trials compared with the latest formal meta-analysis on all BD pharmacotherapies (Miura et al., 2014) and complements the latest meta-analyses focused on MSAs and antipsychotics (Supplementary file S1) (Kishi, Ikuta, et al., 2020; Kishi, Matsuda, et al., 2020; Kishi et al., 2021). Although we searched for a broad range of BD medications, only RCTs on lithium, MSAs and SGAs were selected for our final list of trials due to stringent eligibility criteria.
Our findings indicate that long-term psychotropic pharmacotherapy for 24 to 104 weeks (mean 58.5 weeks), either as a monotherapy compared with placebo, or as an adjunct to lithium or MSA compared with lithium or MSA alone, was more effective in preventing new mood episodes in clinically stabilized adult BD patients (mostly with BD type I) who had responded to initial short-term and stabilization treatment.
Heterogeneity of outcomes among the analyzed trials was high. Such variance may reflect differences in subject characteristics, sample size, criteria and duration of "stabilization" following the index episode, criteria for a new mood episode, duration of initial treatment and of follow-up, and polarity of the index mood episode. Shorter stabilization prior to long-term randomization may increase the risk of subsequent new episodes, which is consistent with the observed (nonsignificant) trend toward a negative correlation between the minimum number of weeks pre-specified for the "stabilization" phase and the rates of subsequent BD episodes. Longer follow-up is likely to be associated with a higher probability of new BD episodes being detected, as was supported by the observed positive correlation between the maximum number of follow-up weeks pre-specified and the rates of any new BD episode among placebo-treated subjects. Interestingly, our sensitivity analysis of the set of trials homogenized by minimum required stabilization time did not decrease the overall heterogeneity of results, suggesting that other factors contributed substantially to the observed heterogeneity.
Among the studied classes of medicines, SGAs demonstrated the lowest odds of any subsequent BD mood episodes (OR = 0.37) compared with placebo, followed by lithium (OR = 0.46), and MSAs (OR = 0.53). However, since confidence intervals largely overlapped between all these treatment classes, no conclusion can be drawn about the superiority of SGAs. Asenapine appeared to be particularly effective compared with placebo, though it was tested in only one trial (Table 2). Other more extensively evaluated drugs with relatively favorable performance compared to placebo were the SGAs aripiprazole, olanzapine, quetiapine, and risperidone. Thus, our data support the recently developed CINP Treatment Guidelines for Bipolar Disorder in Adults, which recommend that clinicians "start with lithium, aripiprazole, olanzapine, paliperidone, quetiapine, or risperidone (including risperidone LAI) monotherapy" for long-term BD treatment (Fountoulakis et al., 2017).
There was an association between polarity of the index BD episode and polarity of the outcome BD episode. Patients who were initially manic (the majority) were more likely to develop subsequent new mania, whereas those initially depressed were more likely to develop depression. This finding suggests that both relapses of an incompletely remitted index episode and entirely new BD episodes (recurrences) were observed. Inclusion of relapses seems particularly likely in that initial stabilization following treatment of the acute index episode lasted only 1-12 (on average 6.7) weeks, which is shorter than typical untreated episodes of mania and particularly of bipolar depression (Manic-Depressive Illness, 2008; Yildiz, Nemeroff, and Ruiz, 2015).
Lithium demonstrated the highest preventive effect for new BD manic episodes vs. new depressive episodes (PI = 2.29), followed by the SGA class (PI = 1.57). While lithium and SGAs showed significant preventive effect for both mania and depression, MSAs were selectively effective for preventing depressive outcomes only (PI = 0.38). The latter finding may reflect the particularly weak antimanic effect reported for lamotrigine (Solmi et al., 2016). In one previous study several tested MSAs (lamotrigine, oxcarbazepine, and VPA) showed PI values < 1.00, indicating greater prophylactic efficacy against bipolar depression than mania (Popovic et al., 2012). Interestingly, in the same study lamotrigine had the lowest PI value among eight comparators (PI = 0.40) (Popovic et al., 2012). Finally, active pharmacotherapy had an overall PI of 1.38, implying predominantly antimanic effects.

Limitations
This study is limited by a small number of trials with individual treatments other than SGAs, as well as by relatively high levels of between-trial heterogeneity. The "enrichment" design of most included trials adds several limitations. First, it impedes generalization of the findings, which are only applicable to medication "responders". Second, discontinuing the drug on which a patient was stabilized might trigger relapse of the same index mood episode in the placebo arm, and this event would be mistakenly registered as a "new" mood episode. The association of index and outcome mood episode polarity also suggests that the studied outcomes included both BD relapses and recurrences. Future RCTs should consider providing a longer time for patient stabilization to adequately assess the actual rates of "new" BD episodes. Several pairwise comparisons used the same placebo arm from three-arm placebo-controlled trials, which may have artificially reduced heterogeneity. Some clinical trials used "time to event" metrics as the primary outcome, whereas our meta-analyses relied on the reported proportion of patients with new BD episodes. Thus, even though some trials did not report the test drug to be significantly superior to placebo (Vieta et al., 2012), our OR calculation did find it superior. Notably, most index episodes were manic rather than depressive, despite the status of bipolar depression as the more challenging component of BD to treat (Baldessarini et al., 2019; Baldessarini et al., 2020).
Of note, the actual time spent in "stabilization" phase beyond a pre-specified minimum could differ for each patient, and its average duration was not consistently reported. Some patients required weeks to meet stabilization criteria, others were stabilized quickly. The mean follow-up duration was only reported in seven out of 22 trials; while remitted patients were observed up to the last visit in the pre-specified follow-up time frame, those who developed a new mood episode had different time of drug exposure, which was typically reported as time-to-event metrics using Kaplan-Meier curves.
Additional trials with head-to-head comparisons are required to clarify the relative effectiveness of specific agents and drug classes to prevent new episodes of mania, and importantly, new episodes of bipolar depression following different initial mood states (especially euthymic, depressive, or mixed).

Contributors
AN and CG performed the literature screen and data extraction. RB, MT and EV audited the RCTs to be included. AN and CG have verified the included data. AN and RB performed data analyses which were further verified by YZ. RB, MT, EV, and CG provided clinical expertise regarding data interpretation. AN generated a paper draft which was collaboratively revised by RB, EV, MT, CG, YZ. All authors contributed to and have approved the final manuscript.

Role of the funding source
There was no single funding source for this study common for all the authors.

Author disclosures
Dr. Tohen was supported by a grant from Atlas Foundation.
Dr. Baldessarini was supported by a grant from the Bruce J. Anderson Foundation and by the McLean Private Donors Psychiatric Research Fund.
Dr. Zhu was supported by the National Institute of General Medical Sciences grants P20GM13042201, P20GM109089, and P20GM121196.
All the above-mentioned grant sources had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.
Other authors (Dr. Nestsiarovich, Dr. Gaudiot) have no financial relationships with commercial entities that might appear to represent potential conflicts of interest with the work presented.