A comparison of the three year course between chronic depression and depression with multiple vs. few prior episodes

This study tested the hypothesis that chronic depression (CD) is more similar to depression with multiple prior episodes (ME) than to depression with few prior episodes (FE). Data from participants (n = 1013) with mild to moderate depressive symptoms (Patient Health Questionnaire [PHQ-9] score 5-14) who took part in a randomized controlled trial of an internet intervention for depression (EVIDENT trial) were re-analyzed. The MINI interview was conducted to diagnose CD (n = 376). If CD was not diagnosed, the self-reported number of depressive episodes was used to categorize participants as having episodic depression with up to five (FE, n = 422) or more than five (ME, n = 215) prior episodes. Over a three-year period, participants were assessed repeatedly regarding the course of depression (PHQ-9, QIDS), quality of life (SF-12) and therapeutic progress (FEP-2). At baseline, most scores differed between CD and FE but were comparable between CD and ME. Time to remission did not differ between CD and ME but was longer in CD than in FE. The results suggest that ME closely resembles CD and that CD differs from FE.


Introduction
Major depressive disorder is one of the most frequent mental disorders, with a lifetime prevalence of 15% (Wittchen et al., 2011). Chronic depression (CD) develops in approximately 20 to 30% of depressed individuals (Jobst et al., 2016; Murphy and Byrne, 2012). Definitions of CD vary among diagnostic manuals and studies (see Jobst et al., 2016). In DSM-5 the diagnostic entity "persistent depressive disorder" was introduced as a new category that aims to combine dysthymic disorder (including double depression) and chronic major depression (American Psychiatric Association, 2013), as these forms of depression could not be differentiated satisfactorily based on their symptomatology and course, and because both illnesses respond to the same treatments (Ildirli et al., 2015; Jobst et al., 2016). The current ICD-10 (WHO, 2010) does not offer a distinct diagnosis of CD in a similar way as DSM-5, with only the category dysthymia corresponding to CD.
Currently, "chronicity" is defined by the duration of depressive symptoms, with a threshold of two years (American Psychiatric Association, 2013). Some clinicians, however, also label highly recurrent depression as "chronic". For example, Barnhofer et al. (2009) combined patients with more than three episodes and patients with CD in their trial on mindfulness-based cognitive therapy for CD. Research results indeed indicate similarities between recurrent depression and CD. A chronic course is predicted by similar risk factors as a recurrent course of depression (Hoertel, 2017; ten Have et al., 2018). Even more similarities can be found between CD and depression with multiple prior episodes (ME). CD is characterized by more psychiatric comorbidities, a more difficult treatment course, more submissive and hostile interpersonal styles, an earlier onset of depression, and more childhood adversities than non-chronic depression (Bird et al., 2018; Klein et al., 2015; Köhler et al., 2019; Nelson et al., 2017), whereby the latter two (onset, childhood adversities) are also characteristics of ME compared to depression with few prior episodes (FE) (Bockting et al., 2005; Ma and Teasdale, 2004).

https://doi.org/10.1016/j.psychres.2020.113235 Received 10 April 2020; Received in revised form 8 June 2020; Accepted 13 June 2020
Previous research suggests that CD shows more similarities to ME than to FE; however, to the best of our knowledge individuals with CD have never been directly compared to those with ME vs. FE. Using data from a randomized controlled trial, we compared these three groups (CD, ME and FE) with regard to differences in clinical variables (depressive symptoms, quality of life, well-being, symptom distress, interpersonal problems, and incongruence) at baseline (cross-sectional) and over the course of three years (longitudinal). Based on the literature cited above, we hypothesized that participants with the diagnosis of CD closely resemble (at baseline and over the course of three years) those with ME but differ from those with FE.

Methods
This study is based on a multicenter (five sites in Germany) randomized, controlled and assessor-blinded trial comparing the internet-based intervention "Deprexis" with the control condition care-as-usual (CAU). The trial was conducted in compliance with the Declaration of Helsinki, approved by the Ethics Committee of the German Psychological Association (SM 04_2012), and registered with ClinicalTrials.gov (NCT01636752). The full protocol (Klein et al., 2013), results regarding the effectiveness of the intervention (Klein et al., 2016, 2017b) as well as results for moderators of treatment effects (Probst et al., 2020) have been published previously.

Participants
The recruitment process has been described in detail elsewhere (Klein et al., 2017a). Inclusion criteria were age between 18 and 65, internet access, the ability to communicate in German, and self-reported mild to moderate depressive symptoms on the Patient Health Questionnaire-9 (PHQ-9; Kroenke et al., 2001). Mild depressive symptoms were defined as PHQ-9 scores between 5 and 9, whereas moderate depressive symptoms were PHQ-9 scores from 10 to 14. Exclusion criteria were acute suicidality and a lifetime diagnosis of bipolar disorder or schizophrenia (Klein et al., 2016, 2017b).
In total, n = 1013 participants (Fig. 1) were randomized (ratio 1:1) into the control condition (care-as-usual, CAU) or the intervention condition (CAU plus Deprexis). Randomization into the intervention or control condition was stratified by PHQ-9 (PHQ-9 < 10 vs. PHQ-9 ≥ 10), as participants with PHQ-9 < 10 (mild depressive symptoms) received unguided Deprexis and participants with PHQ-9 ≥ 10 (moderate depressive symptoms) received guided Deprexis. Based on the Mini International Neuropsychiatric Interview (MINI), a structured interview conducted at baseline (see the description of measures below), participants were categorized as having dysthymia or not. Dysthymia was diagnosed in n = 376 (37%) participants (CD group). Among CD-participants, 198 (53%) reported a history of up to five depressive episodes and 178 (47%) of more than five previous episodes. In 115 CD-participants dysthymia was accompanied by a current major depressive disorder, thus meeting the criteria of double depression according to DSM-IV (American Psychiatric Association, 1994). Participants without a MINI dysthymia diagnosis were categorized according to their self-reported number of depressive episodes as either non-chronic with a history of up to five depressive episodes (few episodes; FE group, n = 422, 42%) or non-chronic with more than five depressive episodes (multiple prior episodes; ME group, n = 215, 21%).
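The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the trial's actual data-processing code; the `Participant` record and its field names are hypothetical.

```python
from dataclasses import dataclass

# Cut-off taken from the text: "up to five" vs. "more than five" prior episodes.
EPISODE_CUTOFF = 5


@dataclass
class Participant:
    # Hypothetical record layout; these field names are illustrative only.
    dysthymia_mini: bool   # MINI dysthymia diagnosis at baseline
    prior_episodes: int    # self-reported number of depressive episodes


def depression_group(p: Participant) -> str:
    """Return 'CD', 'ME', or 'FE' following the rule described in the text."""
    if p.dysthymia_mini:
        # The interview diagnosis takes precedence over the episode count.
        return "CD"
    return "ME" if p.prior_episodes > EPISODE_CUTOFF else "FE"
```

Note that under this rule a participant with dysthymia is assigned to CD regardless of episode count, which is why the CD group contains participants with both few and many prior episodes.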

Intervention
All participants were permitted to use any form of treatment, including medication and psychotherapy for depression or other conditions. This constituted CAU. Participants in the control condition received solely CAU during the first 12 months. Following a naturalistic design approach, CAU was not influenced by the investigators but was monitored during the course of the study (Klein et al., 2017). The control group participants were offered access to a psychological internet intervention (Deprexis) 12 months after the baseline assessment.
Participants in the intervention condition received CAU and, immediately after randomization, were offered access to the 12-week CBT-based internet intervention Deprexis. Briefly, the Deprexis program consists of ten modules, including therapeutic techniques such as cognitive restructuring, behavioral activation, acceptance and mindfulness, relaxation exercises, problem-solving, and positive psychology interventions (Meyer et al., 2009).
Details on treatment utilization (antidepressant medication, inpatient psychiatric treatment, outpatient psychiatric treatment) in the CAU as well as the Deprexis group have been reported previously (Klein et al., 2016, 2017b), and details on adherence to Deprexis were published by Fuhr et al. (2018).

Mini International Neuropsychiatric Interview (MINI)
The MINI 6.0 (Sheehan et al., 1998) is a structured interview to assess DSM-IV (American Psychiatric Association, 1994) and ICD-10 (World Health Organization, 2010) psychiatric disorders. The MINI was administered by trained raters via telephone. Raters were mostly degree-educated psychologists but also included advanced graduate students majoring in psychology or medicine. Before they were permitted to rate trial participants, raters completed face-to-face training and were required to demonstrate adequate interrater reliability on an audiotaped interview.

Patient Health Questionnaire-9 (PHQ-9)
The PHQ-9 is a self-administered version of the PRIME-MD diagnostic instrument for common mental disorders (Kroenke et al., 2001). The nine items known as PHQ-9 represent the depression module; each item is rated on a 4-point Likert scale from "0" (not at all) to "3" (nearly every day). The PHQ-9 score can thus range between 0 and 27, and remission can be defined as a total score below the threshold for mild depressive symptoms (PHQ-9 < 5) (Nierenberg and DeCecco, 2001). The PHQ-9 has been shown to be a valid and reliable measure of depression that is sensitive to change (Löwe et al., 2004; Nierenberg and DeCecco, 2001). The PHQ-9 represented the primary outcome and was administered at baseline and at 14 follow-ups (3 months, monthly from 4 to 12 months, 18 months, 24 months, 30 months, and 36 months). In our sample, internal consistency was good (α = 0.80-0.85).
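As a minimal sketch, the scoring and thresholds described in this article (total score 0-27, mild 5-9, moderate 10-14, remission below 5, stratification at 10) could be expressed as follows. The function names are illustrative assumptions and do not come from the trial's software.

```python
def phq9_total(items):
    """Sum nine item ratings (each 0-3); the total ranges from 0 to 27."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-9 requires nine ratings, each between 0 and 3")
    return sum(items)


def is_eligible(total):
    """Trial inclusion range: mild (5-9) to moderate (10-14) symptoms."""
    return 5 <= total <= 14


def deprexis_format(total):
    """Stratification rule from the text: < 10 unguided, >= 10 guided."""
    if not is_eligible(total):
        raise ValueError("score outside the trial's inclusion range")
    return "unguided" if total < 10 else "guided"


def is_remitted(total):
    """Remission: total below the threshold for mild symptoms (PHQ-9 < 5)."""
    return total < 5
```

For example, nine ratings of 1 give a total of 9, which falls in the mild range and would correspond to the unguided format.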

Quick Inventory of Depressive Symptomatology (QIDS)
The PHQ-9 was complemented by the clinician-rated version of the QIDS via telephone interviews. Depressive symptom severity is assessed by 16 items referring to the last seven days, focusing on the nine DSM-IV criterion symptom domains measuring psychological as well as somatic symptoms (Rush et al., 2003). The QIDS was applied at baseline and at two follow-ups: 3-months and 12-months. The internal consistency of QIDS was satisfactory (α = 0.75).

Short-Form Health Survey (SF-12)
The SF-12 is an abbreviated version of the 36-item "Short-Form Health Survey", containing two subscales measuring physical and mental aspects of health-related quality of life (Ware et al., 1996). The SF-12 is a self-report measure rating the presence and severity of physical and psychological impairment over the course of the last four weeks. The subscale scores range from 0 to 100, where higher scores indicate higher levels of health. The SF-12 was administered at baseline and at seven follow-ups: 3 months, 6 months, 12 months, 18 months, 24 months, 30 months, and 36 months.

E. Humer, et al. Psychiatry Research 291 (2020) 113235

Questionnaire for the Evaluation of Psychotherapeutic Progress-2 (FEP-2)
The FEP-2 was developed as a measure of therapeutic progress, which has been shown to be change-sensitive as well as reliable (Lutz et al., 2009; Lutz and Böhnke, 2008). The instrument can also be used as a broad symptom measure covering the following four dimensions: well-being, symptom distress, interpersonal problems and incongruence with respect to approach and avoidance goals. The FEP-2 consists of 40 items that are rated on a 5-point Likert-type scale, with higher scores indicating higher impairment. Reliability is high for the global scale (Cronbach's α = 0.96; retest reliability rtt = 0.69-0.77) and sensitivity to change has been demonstrated (Lutz et al., 2009). The FEP-2 was administered at baseline and at seven follow-ups: 3 months, 6 months, 12 months, 18 months, 24 months, 30 months, and 36 months.

Statistical analyses
Statistical analyses were conducted with SPSS version 25 (Inc, Chicago, IL, USA). All tests were two-tailed and the significance level was set to p < 0.05. To evaluate differences in sociodemographic characteristics, univariate ANOVAs and chi-square tests were conducted. Bonferroni corrections were applied for the pairwise post-hoc tests. To assess whether the three depression groups (CD, ME, FE) differ at baseline and over time in the outcome variables (depressive symptoms, quality of life, well-being, symptom distress, interpersonal problems, and incongruence), linear multilevel models were conducted. One multilevel model was conducted for each outcome variable (PHQ-9, QIDS, SF-12, and FEP-2 scales), as the number of repeated assessments differed among the dependent variables (i.e., PHQ-9: 15 time points; FEP-2, SF-12: 8 time points; QIDS: 3 time points) as described above. All models had two levels (repeated measures nested within individuals) and were estimated with the full maximum likelihood method. Fixed effects were time and depression group (CD, ME, FE, with CD being the reference) as well as the interaction between time and depression group. The time variable was calculated for each participant individually as the number of days between the baseline assessment (coded as 0) and each follow-up assessment, since the timing of the follow-up assessments varied between participants. As random term, a random intercept was included; a random slope was not added because the models did not converge with both a random intercept and a random slope.
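For orientation, a random-intercept model of the kind described above can be written out as follows. The notation is an assumption introduced here for illustration (the article does not give the equation): Y_ti is the outcome of participant i at assessment t, ME_i and FE_i are dummy variables with CD as the reference group, and the normal distributions are the usual assumptions of a linear multilevel model.

```latex
\begin{align}
Y_{ti} &= \beta_0 + \beta_1\,\mathrm{time}_{ti}
        + \beta_2\,\mathrm{ME}_i + \beta_3\,\mathrm{FE}_i
        + \beta_4\,(\mathrm{time}_{ti}\times\mathrm{ME}_i)
        + \beta_5\,(\mathrm{time}_{ti}\times\mathrm{FE}_i)
        + u_{0i} + \varepsilon_{ti}, \\
u_{0i} &\sim \mathcal{N}(0,\tau^2), \qquad
\varepsilon_{ti} \sim \mathcal{N}(0,\sigma^2)
\end{align}
```

Under this parameterization, \beta_0 and \beta_1 are the baseline level (intercept) and slope of the CD group, \beta_2 and \beta_3 are the baseline differences of ME and FE relative to CD, and \beta_4 and \beta_5 are the corresponding slope differences. The term u_{0i} is the participant-specific random intercept; no random slope is included.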
We also conducted a time-to-event analysis to evaluate differences between the depression groups (CD, ME, FE, with CD being the reference) in time to remission from depressive symptoms as measured with the PHQ-9 (the first time a PHQ-9 score < 5 was reached was defined as the onset of remission). Participants who did not achieve remission within 36 months or dropped out were censored. No missing data were substituted in the analysis. The Kaplan-Meier method was used to estimate time to remission (in days), and differences among the three depression groups were compared using the log-rank, Breslow and Tarone-Ware tests.
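The following is a minimal sketch of how time to remission and a Kaplan-Meier estimate could be derived from repeated PHQ-9 scores. It is not the SPSS procedure used in the study; the function names and the input layout (a list of follow-up days paired with PHQ-9 totals) are assumptions made for illustration. Censoring at the last observed assessment is likewise an assumption, since the text does not specify the censoring time.

```python
def time_to_remission(followups, threshold=5):
    """followups: list of (days_since_baseline, phq9_total), follow-ups only.

    Returns (days, event). event is True at the first score below the
    threshold; otherwise the participant is censored (event False) at the
    last observed assessment.
    """
    ordered = sorted(followups)
    if not ordered:
        return 0, False  # no follow-up data: censored at baseline
    for day, score in ordered:
        if score < threshold:
            return day, True
    return ordered[-1][0], False


def kaplan_meier(observations):
    """observations: list of (days, event).

    Returns [(day, probability of not yet being remitted)] at each day on
    which at least one remission occurred.
    """
    at_risk = len(observations)
    not_remitted = 1.0
    curve = []
    for day in sorted({d for d, _ in observations}):
        events = sum(1 for d, e in observations if d == day and e)
        leaving = sum(1 for d, _ in observations if d == day)
        if events:
            not_remitted *= 1 - events / at_risk
            curve.append((day, not_remitted))
        at_risk -= leaving  # remitted and censored cases both leave the risk set
    return curve
```

In a group comparison, one such curve would be estimated per depression group (CD, ME, FE) and the curves compared with a test such as the log-rank test.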
The multilevel models and the event analysis were conducted for the total sample as well as for the intervention (Deprexis) and control condition (CAU) separately. This was done to explore whether the results for the total sample can be replicated in both conditions.

Sociodemographic variables
Significant differences among the three depression groups were observed in age (F(2, 1012) = 3.65, p = 0.026) and gender (χ² = 8.18, p = 0.017) but not in marital status and education (Table 1). Post-hoc tests for age showed that the CD group was older than the FE group (p = 0.021). Importantly, no age differences emerged between ME and FE (p = 0.918), ruling out the possibility that ME participants developed more episodes than FE participants because of their older age.

Outcome variables at baseline
CD vs. ME: For the total sample (Tables 2-4), the intercepts did not differ between the CD and the ME group in depression severity (PHQ-9 and QIDS), the SF-12 mental component, and the FEP-2 scales. Solely for the SF-12 physical component (p = 0.019), a significant difference towards stronger impairment in CD was observed at baseline. However, when the analysis was conducted for the CAU and Deprexis condition separately (Supplemental Tables 1-3), the intercepts of all dependent variables did not differ between CD and ME.
CD vs. FE: The intercepts of all investigated variables differed, with a stronger impairment in CD (p < 0.001) for the total sample (Tables 2-4). This was also the case when running the analysis separately for the Deprexis and the CAU condition (Supplemental Tables 1-3).

Comparisons between the three depression groups in trajectories of outcome variables over the 3-year observation period
The time courses of the primary outcome, as well as the secondary outcome variables, are reflected by the estimates for the fixed effect time (slope) summarized in Tables 2-4 for the total sample. For all analyzed dependent variables, no differences in slope were observed between CD and ME or between CD and FE.
Separate statistical analyses of the two treatment conditions (Supplemental Tables 1-3) were in accordance with the analysis of the total sample, except for differences in the control group sample between CD and FE in the FEP-2 scales interpersonal problems (p = 0.036), incongruence (p = 0.028) and the total score (p = 0.044). In these variables the slopes were more positive in FE than in CD, indicating less improvement over time for FE compared to CD.

Discussion
The current study was conducted to examine whether participants with CD resemble those with ME but differ from those with FE.
At baseline, significant differences emerged between CD and FE in all investigated outcome variables, but only one significant difference was observed between CD and ME (physical quality of life) in the total sample. Regarding the trajectories over three years, CD and ME did not differ in any of the parameters, whereas CD and FE differed in several outcome variables in the CAU condition. Furthermore, time to remission from depressive symptoms was similar in CD and ME but differed between CD and FE.
In summary, we found more differences between CD and FE than between CD and ME, although more than half of CD-participants (53%) had a history of up to five depressive episodes, as did the FE group. Yet, we also observed a difference regarding physical aspects of health-related quality of life at baseline between CD and ME. These results provide partial support for our hypothesis. Conventionally, several factors are examined when trying to validate a diagnostic category: antecedent validators (e.g. risk factors), concurrent validators (e.g. symptom similarity) and predictive validators (e.g. clinical course and treatment response). Our results provide concurrent (similar but not identical baseline symptoms) and predictive validators (similar three-year course) suggesting that patients with ME and CD belong to the same category. The studies by Bockting et al. (2005) and Ma and Teasdale (2004) provide antecedent evidence (childhood adversity and age of onset), suggesting that the ME and FE categories might be separate. Unfortunately, we were unable to compare participants with CD and ME with respect to these or other antecedent validators. Therefore, our results must be replicated and extended before firm conclusions can be drawn regarding the diagnostic validity and utility of a "frequent episode" category. Altogether, the current findings emphasize the need for further refinement of the definition of CD, to improve treatments for those who have developed a more protracted course of the disorder but are currently not considered as patients with CD. According to our study, an extension of the definition of chronic forms of depression to include patients with a high number of previous depressive episodes should be considered and evaluated in future studies.

Table 3
Estimates of the multilevel model with SF-12 as dependent variable.

Abbreviations used in the tables: CD: chronic forms of depression. ME: more than five depressive episodes. FE: up to five depressive episodes. N: sample size. SD: standard deviation. SE: standard error. χ²: chi-square. QIDS: Quick Inventory of Depressive Symptomatology. PHQ-9: Patient Health Questionnaire-9. SF-12: Short-Form Health Survey. FEP-2: Questionnaire for the Evaluation of Psychotherapeutic Progress-2.

Strengths and limitations
Several strengths and limitations should be noted. The first strength of this study is the relatively long observation period of three years, which allowed us to study whether the outcome trajectories differed among the three depression groups. A second strength is the large sample size, which allowed us to create relatively large subsamples with different depression forms. A third strength concerns the fact that participants were recruited from a broad array of clinical and nonclinical settings, which enhances external validity. A fourth strength is the broad generalizability of our findings, as the trial's inclusion criteria were broad and allowed any form of concomitant treatment. A fifth strength is that all statistical analyses were conducted not only for the total sample but also for the intervention and control group separately. These additional analyses largely confirmed that the results for the total sample are replicable in both conditions. A final strength is that the diagnosis of dysthymia and the diagnosis of current depression (MINI) as well as depression severity (QIDS) were ascertained by trained clinical raters.
There are also limitations to consider when interpreting the results. The first and major limitation of the study is that CD was assessed with an interview, whereas the differentiation between ME and FE was based on self-reports. Recall bias regarding the illness course could have led to false classifications, at least in some participants (Andrews et al., 1999). Using the same method to assess CD, ME, and FE would have been ideal. A second limitation is that the differentiation between ME and FE was based on a lifetime history of more than five versus up to five episodes. In contrast, other studies used different cut-offs, such as three (Ma and Teasdale, 2004), four (Bockting et al., 2009), or five (Bockting et al., 2005) previous episodes. To avoid the problem of selecting a cut-off score to dichotomize the sample into those with multiple vs. few prior episodes, continuous scoring of the episodes should be used in future research, which should also apply validated clinical interviews to assess the number of previous depressive episodes. These future studies should also include a detailed assessment of the subtypes of CD, as the chronic forms investigated here included dysthymia and double depression only (the MINI interview evaluated current major depression as well as dysthymia but no other depressive disorders). Therefore, we do not know whether and how many participants met the criteria of persistent major depressive episode, intermittent major depressive episode with current episode, or intermittent major depressive episode without current episode as defined in DSM-5. It might be that the chronic forms of depression evaluated here (dysthymia and double depression) more closely resemble ME than FE, but other forms of CD might not show the same pattern. Another shortcoming is the lack of biological samples due to the online study design; for example, cortisol levels (also in response to a stimulus) might help to differentiate the groups.
Moreover, there were age and gender differences between the groups. Including gender and age as moderators in further studies would be interesting to see if results are comparable for women and men and across age groups. A further limitation of the study is that generalizability is limited by the mild to moderate symptom severity range in our sample; thus, results might not generalize to patients with severe depression. Moreover, participants had higher levels of education compared to a representative population-based sample with the same range of depressive symptom severity (Späth et al., 2017). Therefore, these results might not generalize to all persons with mild to moderate depressive symptoms. A final limitation is that participants were allocated to two conditions, with the timing of access to the intervention representing a confounder (immediate access in Deprexis vs. delayed access in CAU). However, to analyze whether the observed differences for the three depression groups are replicable in both conditions, separate analyses for both groups were performed, which confirmed the findings for the total sample.

Conclusions
We aimed to examine whether participants with ME and CD resemble each other closely, which might justify their inclusion in the same diagnostic category. Our results provide concurrent (similar but not identical baseline symptoms) and predictive validators (similar three-year course) suggesting that patients with ME and CD belong to the same category. More research is needed to elucidate whether ME and CD should be included in a combined diagnostic category, as these findings tentatively suggest.

Funding
This work was supported by the German Federal Ministry of Health, II A 5 -2512 FSB 052. The funding body had no role in the design of the study, data collection, analysis or interpretation of the data.

Declaration of Competing Interests
JPK received payments for presentations and publications by the following companies: Beltz, Hogrefe, Elsevier. BM is employed as research director at GAIA AG, the company that developed, owns, and operates the internet intervention investigated in this trial. All the other authors report no relationships with commercial interests.