Behavioural Activation versus Treatment as Usual for Depressed Older Adults in Primary Care: A Pragmatic Cluster-Randomised Controlled Trial

Introduction: Effective non-pharmacological treatment options for depression in older adults are lacking. Objective: We evaluated the effectiveness of behavioural activation (BA) delivered by mental health nurses (MHNs), compared with treatment as usual (TAU), for depressed older adults in primary care. Methods: In this multicentre cluster-randomised controlled trial, 59 primary care centres (PCCs) were randomised to BA or TAU. Consenting older adults (≥65 years; n = 161) with clinically relevant symptoms of depression (PHQ-9 ≥ 10) participated. Interventions were an 8-week individual MHN-led BA programme and unrestricted TAU in which general practitioners followed national guidelines. The primary outcome was self-reported depression (QIDS-SR16) at 9 weeks and at the 3-, 6-, 9-, and 12-month follow-ups. Results: Data from 96 participants from 21 PCCs in BA and 65 participants from 16 PCCs in TAU, recruited between July 4, 2016, and September 21, 2020, were included in the intention-to-treat analyses. At post-treatment, BA participants reported significantly lower severity of depressive symptoms than TAU participants (QIDS-SR16 difference = −2.77, 95% CI = −4.19 to −1.35, p < 0.001; between-group effect size = 0.90, 95% CI = 0.42–1.38). This difference persisted up to the 3-month follow-up (QIDS-SR16 difference = −1.53, 95% CI = −2.81 to −0.26, p = 0.02; between-group effect size = 0.50, 95% CI = 0.07–0.92) but not at the 12-month follow-up (QIDS-SR16 difference = −0.89, 95% CI = −2.49 to 0.71, p = 0.28; between-group effect size = 0.29, 95% CI = −0.82 to 0.24). Conclusions: BA led to a greater reduction of depressive symptoms in older adults than TAU in primary care at post-treatment and at the 3-month follow-up, but not at the 6- to 12-month follow-ups.


Introduction
Depressed older adults (≥65 years) often do not receive adequate psychological care [1]. Research shows that older adults have less access to psychological treatment than younger adults, even though its effectiveness does not differ between age groups [1,2]. Many older adults report not knowing where to get help or feel that they should be able to solve their problems themselves [3]. Additionally, older adults are two to three times less likely to be referred for guideline-based psychological treatment [4]. Even though the majority of older adults with depression would rather receive psychological than pharmacological treatment, most are treated with antidepressants [5,6]. Antidepressants often take several weeks to show an effect, and their use is associated with risks and adverse events, such as falls and possible adverse interactions with two-thirds of the medications prescribed to older adults [7].
Older people tend to visit their general practitioner (GP) frequently, which makes primary care a practical setting for increasing the accessibility of mental health care [8]. Despite the common constraints of limited time, competing priorities, and limited availability of mental health experts in primary care [9], behavioural activation (BA) may be a viable option because of its simplicity and feasibility [10]. In BA, the therapist and patient work together to create a personal environment of positive reinforcement by increasing functional and pleasurable behaviour and decreasing avoidant and depressed behaviour [11]. BA is as effective as cognitive behavioural therapy and more effective than control conditions in reducing depression in adults [12,13]. Because BA relies on behavioural techniques and leaves out the more complex cognitive techniques, it can be delivered by less specialised health practitioners such as mental health nurses (MHNs) [10]. It has been shown to be feasible, effective, and cost-effective in managing depression in primary care [14].
In their meta-analysis, Orgeta and colleagues concluded that BA might also be effective for treating depression in older adults but that sample sizes were too small to draw firm conclusions [15]. Several studies also showed that collaborative care, which included telephone-delivered or face-to-face BA, was more effective than usual primary care in treating and preventing major depression, dysthymic disorder, and clinically relevant depressive symptoms in older adults [16–19]. Nonetheless, adequately powered real-world randomised controlled trials (RCTs) investigating the effectiveness of BA as a standalone therapy for depressed older adults are lacking. Therefore, we aimed to evaluate the effectiveness of BA delivered by MHNs, compared with treatment as usual (TAU), for older adults with moderate to severe depressive symptoms in primary care. A cluster-randomised design was chosen because of possible contamination of treatment knowledge between participating MHNs: randomising primary care centres (PCCs) was the only way to ensure that therapy was delivered by separate mental health professionals in each treatment arm [20]. Our primary hypothesis was that BA leads to a greater reduction of depressive symptoms at the patient level than TAU after treatment and at subsequent follow-ups. Results regarding the cost-effectiveness, moderators, and mediators of BA will be described elsewhere.

Trial Design
We performed a multicentre, cluster-randomised trial in primary care with two treatment arms and a 12-month follow-up period. Clusters were PCCs, which were recruited by contacting two umbrella organisations (networks of several PCCs sharing the same pool of MHNs) as well as standalone PCCs in the Radboudumc network. Both umbrella organisations (Zorroo [19 PCCs] and NEO [24 PCCs]) joined, as did 16 standalone practices. PCCs were eligible if they employed at least one MHN (which applies to almost all PCCs in the Netherlands). Details of the design have been described elsewhere [21]. This study was funded by ZonMw (843001606) and was registered retrospectively in the Dutch Trial Register (NL5436), 2 months after recruitment started, because of an administrative error. At that point, ten participants had been recruited; no changes were made to the trial design after the start of recruitment.

Participants
Participants were older adults (≥65 years) who presented with depressive symptoms at their general practice, either spontaneously or after reading an information leaflet about the ongoing trial. Inclusion criteria were age 65 years or older and current clinically relevant depressive symptoms as measured with the Patient Health Questionnaire 9 (PHQ-9 ≥ 10) [22], the shortest validated screening instrument for older adults in primary care [23]. Exclusion criteria were current severe mental illness (except severe depression), high risk of suicide, and drug and/or alcohol abuse requiring specialised treatment, as assessed with the Mini International Neuropsychiatric Interview (MINI 5.0.0) [24]; psychotherapy in the previous 12 weeks or current treatment by a mental health specialist, as confirmed by the GP; and moderate to severe cognitive impairment, as measured with the Montreal Cognitive Assessment (MoCA < 18) [25].
If the MINI 5.0.0 results suggested a severe mental condition, an old-age psychiatrist and the patient's GP were consulted to determine whether participating in the study, and thereby forgoing other specialised treatment, would harm the patient. Only patients in need of specialised care were excluded. Physical illness, comorbid psychological disorders not requiring specialised treatment, disability, and mild cognitive impairment were not exclusion criteria. Illiteracy or an imperfect command of the Dutch language were not exclusion criteria either, because independent, trained, and blinded research assistants offered assistance with filling in questionnaires at participants' homes. These assistants were employed as quality officers at the research department of a large mental health institute (Pro Persona) and were involved in neither the study design nor the analyses. Patients using antidepressants were eligible, provided that they had been on a stable dose for at least 12 weeks before entering the study.

Randomisation and Masking
PCCs were randomly allocated to BA or TAU, taking into account the number and sex of the employed MHNs, the size of the patient population, and affiliation with an umbrella organisation. An independent statistician received an anonymised list of these characteristics per PCC, without any traceable data, and created two groups of PCCs that were balanced on these characteristics. Computer-generated random numbers were used to assign these groups to conditions (BA or TAU). The allocation sequence was concealed until clusters were enrolled and assigned to one of the interventions. Participants and MHNs were not blinded to treatment allocation because of the nature of the intervention. Research assistants involved in outcome assessment were blinded to allocation.

Procedures
Older adults who visited their GP or MHN with depressive symptoms, complaints of loneliness, or medically unexplained symptoms were informed about the study with an information letter. After giving their consent, they were asked whether they agreed to be contacted by our research assistant. Some older adults visited their GP after having read posters, newsletters, or information booklets about the study in the waiting room. Potentially eligible participants could be referred by their GP to the research assistant, who contacted them to plan a home visit to check eligibility and to inform them about the treatment option allocated to their PCC (BA or TAU).
During this baseline home visit, an experienced and trained research assistant conducted the MINI 5.0.0 and the cognitive assessment. Participants received the baseline questionnaire (online or on paper), which they were asked to complete within the following week. In case of illiteracy, language barriers, or difficulty answering questions, an extra meeting with a different, independent research assistant was planned to assist with the questionnaires. Treatment started approximately 1 week after the baseline meeting. The post-treatment home visit and questionnaires were planned 9 weeks after the baseline meeting. In the follow-up period (3–12 months after the post-treatment meeting), participants received a follow-up questionnaire every 3 months, either on paper or online. Questionnaires included questions about depressive symptoms, possible mediators, and a cost-effectiveness measure. A detailed description of all measures used can be found elsewhere [21]. Participants received assistance with filling in questionnaires if necessary. GPs immediately informed the research team about any serious adverse events (SAEs), such as hospitalisation or death, during the study and the 12-month follow-up. Research assistants asked whether SAEs had occurred at follow-up appointments. SAEs were reported to the sponsor and to the medical ethical committee. In agreement with the medical ethical committee, non-serious adverse events were not collected because of their expected high frequency in the older population and the limited likelihood that they were associated with the intervention.

Interventions
In the intervention arm, consisting of 30 PCCs, BA was delivered in one 45-min face-to-face session followed by seven 30-min face-to-face sessions. All key elements of the original BA protocol of Martell et al. [11] were included, such as functional analysis, activity registration, activity scheduling, and relapse prevention. BA was delivered by primary care MHNs at the PCC or at participants' homes. All MHNs had at least a bachelor's degree in nursing and had been trained in different treatment techniques, such as motivational interviewing and problem solving, but not in protocolised therapies for mental health problems such as cognitive behavioural therapy or BA for depression [26]. All MHNs in the BA arm (n = 29) received a 2-day training from licensed specialists in the delivery of the BA protocol and received biweekly online supervision in small groups. After every session, MHNs filled in a session checklist for each patient to record adherence to the BA protocol. Before the start of treatment, participants were told that BA consisted of a fixed number of eight weekly sessions and that the goal was to complete treatment within these eight sessions. During the intervention, no additional mental health interventions were delivered to the BA group, and any prescribed antidepressant medication was kept at a stable dose.
In the TAU arm, consisting of 29 PCCs, treatment options were consistent with the guidelines of the Dutch College of GPs [27]. Health professionals in the TAU arm were free to determine the frequency and duration of care. Usual MHN support consists of 30-min eclectic counselling sessions.

Depressive Symptoms
The Quick Inventory of Depressive Symptomatology, self-report, 16 items (QIDS-SR16) was the primary outcome measure of the severity of depressive symptoms at the patient level at baseline, post-treatment (9 weeks), and all follow-ups (3, 6, 9, and 12 months after post-treatment) [28]. The QIDS-SR16 is a self-report instrument assessing depressive symptoms during the last 2 weeks, with a score range of 0–27; higher scores indicate more depressive symptoms. It is widely used, can be administered in 5–7 min, and has good psychometric properties [28].
The PHQ-9 was used to assess the eligibility of potential participants and also served as a secondary outcome to assess changes in depressive symptoms during treatment (weeks 2, 4, and 7), at post-treatment (week 9), and at all follow-ups.

Behavioural Activation for Late-Life Depression
The PHQ-9 is a 9-item self-report instrument that assesses depressive symptoms during the last 2 weeks. It has good psychometric properties and a score range of 0 to 27. Patients scoring ≥ 10 on the PHQ-9 can be classified as having clinically relevant depressive symptoms [29].

Depressive Disorder and Psychiatric Comorbidity
A trained research assistant administered the complete MINI 5.0.0 at baseline to distinguish patients with a major depressive disorder from patients with only clinically relevant symptoms and to assess psychiatric comorbidity. At post-treatment and at the 6- and 12-month follow-ups, only the depression section was administered to determine whether participants had a major depressive disorder. The MINI 5.0.0 is a short diagnostic interview that assesses current psychiatric disorders based on DSM-IV diagnostic criteria.

Statistical Analyses
Prior to the study, we calculated a required sample size of n = 200 (alpha = 0.05, power [1 − beta] = 0.80, two-tailed test) to detect an expected effect size of 0.50, based on available data from studies with younger participants [21]. Because the COVID-19 pandemic posed the risk that we would have to discontinue the trial prematurely, we performed an interim analysis on the 161 participants to determine whether recruitment needed to be restarted after the first lockdown. Taking into account the observed correlation between pre- and post-treatment scores (0.468), the design effect based on the observed intraclass correlation coefficient (ICC; 0.017) and mean cluster size (4.35), the observed dropout during the treatment phase (17.5%), and the variation in cluster size (0.275), this recalculation yielded a required sample of 137 participants (alpha = 0.05, power [1 − beta] = 0.80, two-tailed test) to detect an effect size of 0.50. The recalculation was conducted by an independent statistician. Baseline characteristics of the participants and the PCCs were summarised using means for continuous data and percentages for categorical data.

We used R version 4.1.1 [30] within RStudio version 2021.09.0+351a, specifically the lme4 and lmerTest packages [31], to model differences in the course of depressive symptoms between BA and TAU with linear mixed-effects models. Before entering treatment effects into the models, we fitted two models to examine whether the PCC level added predictive power (see online supplementary material; for all online supplementary material, see https://doi.org/10.1159/000531201). To model the non-linear effect of treatment on depression over time while accounting for the within-subject correlation between time points, we used flexible functions known as regression splines, with one internal knot placed at the end of treatment (week 9), and included the pre-treatment depression score as a covariate as well as a random part in the model. We modelled the effect of time using linear and quadratic splines and selected the best-fitting model based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [32]. Fixed effects were decomposed by ANOVA and evaluated with the F statistic. Because a quadratic spline with one knot is difficult to interpret, we computed estimated marginal means (EMMs) using the R emmeans package [33] and examined whether the differences between the groups were significant at post-treatment and at 3, 6, 9, and 12 months after post-treatment by performing t tests on these EMMs and computing standardised between- and within-group effect sizes (Cohen's d). For the effect size calculation, we used the estimated SD of the intercept from the model without the baseline covariate of the outcome, because the SD of the intercept is nearly zero when this covariate is added.
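The interim sample-size recalculation combines several standard adjustments. The sketch below is a minimal illustration under stated assumptions: a normal-approximation two-sample formula, an ANCOVA-style reduction by (1 − r²) for adjusting for the baseline score, the design effect for unequal cluster sizes, 1 + ((CV² + 1)·m − 1)·ICC, with the reported 0.275 treated as the coefficient of variation (CV) of cluster size, and inflation for dropout. These formulas and all function names are our assumptions; the exact calculation used by the independent statistician is not reported here.

```python
import math
from statistics import NormalDist


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-arm n for a two-sided two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    return 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d ** 2


def design_effect(icc: float, mean_cluster_size: float, cv_cluster_size: float = 0.0) -> float:
    """Design effect for clusters of unequal size: 1 + ((CV^2 + 1) * m - 1) * ICC (assumed formula)."""
    return 1 + ((cv_cluster_size ** 2 + 1) * mean_cluster_size - 1) * icc


def total_n(d: float, r_pre_post: float, icc: float, m: float, cv: float,
            dropout: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total sample size after the hypothetical chain of adjustments, rounded up."""
    n = 2 * n_per_arm(d, alpha, power)   # both arms, no adjustments
    n *= 1 - r_pre_post ** 2             # gain from adjusting for the baseline score
    n *= design_effect(icc, m, cv)       # inflation for clustering within PCCs
    n /= 1 - dropout                     # inflation for expected dropout
    return math.ceil(n)


if __name__ == "__main__":
    # Inputs as reported for the interim recalculation.
    print(total_n(d=0.50, r_pre_post=0.468, icc=0.017, m=4.35, cv=0.275, dropout=0.175))
```

With the reported inputs this sketch returns 127 rather than 137, so the statistician's calculation presumably used different or additional adjustments; the code is meant to show how each factor moves the required sample size, not to reproduce the reported figure.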
The intention-to-treat analyses included all participants, regardless of the number of sessions received. Because the mixed models included time, treatment, the interaction between time and treatment, and the baseline covariate as parameters, separate multiple imputation of missing data was not needed [34]. We replicated our final model with the secondary outcome measure (PHQ-9), which also included measurements at weeks 2, 4, and 7 of the treatment phase.
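To make the spline specification concrete, the sketch below builds fixed-effects design rows with a truncated power basis: a quadratic spline with one knot at week 9 has the terms t, t², and (t − 9)₊², and a linear spline has t and (t − 9)₊. This basis spans the same space as a B-spline basis of the same degree and knot, but it is our illustrative choice, not necessarily the parameterisation used in the R models. A quadratic spline with one knot has three time terms, which is consistent with the 3 numerator degrees of freedom reported for the time × treatment F tests in the Results. The function names and the follow-up weeks (assuming roughly 13 weeks per 3 months) are hypothetical.

```python
from typing import List

KNOT_WEEK = 9.0  # end of treatment: the single internal knot


def spline_basis(t: float, degree: int = 2, knot: float = KNOT_WEEK) -> List[float]:
    """Truncated power basis: t, ..., t**degree, (t - knot)_+ ** degree."""
    terms = [t ** p for p in range(1, degree + 1)]
    terms.append(max(t - knot, 0.0) ** degree)
    return terms


def design_row(t: float, ba: bool, baseline: float, degree: int = 2) -> List[float]:
    """Intercept, baseline covariate, arm, time terms, and arm x time interaction terms."""
    basis = spline_basis(t, degree)
    arm = 1.0 if ba else 0.0
    return [1.0, baseline, arm] + basis + [arm * b for b in basis]


# Hypothetical assessment weeks: post-treatment (9) and follow-ups roughly 3, 6, 9, 12 months later.
WEEKS = [9, 22, 35, 48, 61]

if __name__ == "__main__":
    for week in WEEKS:
        print(week, design_row(week, ba=True, baseline=15.0))
```

Before the knot the (t − 9)₊ term is zero, so the curve can bend only after the end of treatment, which is what lets the model separate the treatment phase from the follow-up phase.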
We determined the proportion of patients who showed reliable and clinically significant improvement and deterioration on the outcome measures at the 12-month follow-up, based on the model of Jacobson and Truax [35,36]. Relapse and recurrence were assessed with survival analysis (Cox proportional hazards regression), using treatment group as a covariate together with any covariates that significantly influenced outcomes in the main analysis. Relapse was defined as the return of clinically relevant depressive symptoms (PHQ-9 ≥ 10) after remission (PHQ-9 ≤ 5) had occurred but before recovery (a minimum of 4 months with PHQ-9 ≤ 5) had been attained [37]. Recurrence was defined as the return of clinically relevant depressive symptoms (PHQ-9 ≥ 10) after recovery had taken place [37]. We used the PHQ-9 (relapse or recurrence of clinically relevant depressive symptoms) rather than the MINI 5.0.0 (presence or absence of a depressive disorder) because the PHQ-9, unlike the MINI 5.0.0, had been used as an inclusion criterion.
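The relapse and recurrence definitions above can be read as a small state machine over each participant's PHQ-9 series. The sketch below is one possible operationalisation, assuming time-ordered (month, PHQ-9) pairs, that recovery requires an uninterrupted run of PHQ-9 ≤ 5 spanning at least 4 months, that a subclinical score (6–9) breaks that run without counting as relapse, and that only the first event is returned, as for a time-to-first-event Cox model. The function name and these rules are illustrative assumptions, not the authors' code.

```python
from typing import Optional, Sequence, Tuple

REMISSION_MAX = 5      # PHQ-9 <= 5: remission
CLINICAL_MIN = 10      # PHQ-9 >= 10: clinically relevant symptoms
RECOVERY_MONTHS = 4.0  # remission sustained for at least 4 months: recovery


def first_event(visits: Sequence[Tuple[float, int]]) -> Tuple[Optional[str], Optional[float]]:
    """Return ("relapse" | "recurrence" | None, month) for (month, PHQ-9) visits."""
    remitted = False   # has remission occurred at any earlier visit?
    run_start = None   # month at which the current run of PHQ-9 <= 5 began
    recovered = False
    for month, score in sorted(visits):
        if score >= CLINICAL_MIN:
            if recovered:
                return "recurrence", month
            if remitted:
                return "relapse", month
            run_start = None
        elif score <= REMISSION_MAX:
            remitted = True
            if run_start is None:
                run_start = month
            if month - run_start >= RECOVERY_MONTHS:
                recovered = True
        else:
            run_start = None  # a subclinical score (6-9) interrupts the run towards recovery
    return None, None


if __name__ == "__main__":
    # Hypothetical participant: remission at month 2, symptoms return at month 8.
    print(first_event([(0, 14), (2, 4), (5, 3), (8, 12)]))
```

Participants without an event would be censored at their last visit in the survival analysis.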
For the per-protocol subgroup analysis, we fitted the final model to a subset of the data. We defined BA as delivered per protocol when at least five out of eight sessions were completed before the end-of-treatment measurement (9 weeks) and the BA protocol was adhered to in these sessions, as registered by the MHNs in the provided checklist. A session counted as a per-protocol session when all treatment steps were checked in the checklist. Failure to meet these per-protocol requirements was regarded as premature discontinuation of treatment. Data from all TAU participants were used in the per-protocol model because no requirements were defined for TAU treatment. Furthermore, we analysed whether baseline characteristics were related to per-protocol treatment completion within the BA group, using t tests and Pearson's χ² tests.
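The per-protocol rule can be expressed directly as a filter and a count. In this sketch, a session counts only if every checklist item was ticked, and only sessions strictly before week 9 are counted; the `Session` class, the number of checklist items, and the strict "before week 9" boundary are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List

MIN_PP_SESSIONS = 5        # at least five of the eight planned sessions
END_OF_TREATMENT_WEEK = 9  # end-of-treatment measurement


@dataclass
class Session:
    week: float                                          # weeks since baseline
    checklist: List[bool] = field(default_factory=list)  # one entry per protocol step

    def adherent(self) -> bool:
        """A session is 'per protocol' only if all protocol steps were checked."""
        return bool(self.checklist) and all(self.checklist)


def is_per_protocol(sessions: List[Session]) -> bool:
    """True if at least five fully adherent sessions took place before the week-9 measurement."""
    counted = [s for s in sessions if s.week < END_OF_TREATMENT_WEEK and s.adherent()]
    return len(counted) >= MIN_PP_SESSIONS


if __name__ == "__main__":
    # Hypothetical participant: five weekly sessions with four fully checked steps each.
    print(is_per_protocol([Session(w, [True] * 4) for w in range(1, 6)]))
```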
To investigate whether dropout influenced treatment outcome, dropout was entered as a binary variable in the final model. This variable indicated whether or not a participant completed the follow-up measurement 12 months after post-treatment. We also analysed whether baseline characteristics of participants were related to later dropout, using t tests and Pearson's χ² tests.
Lastly, we performed several ancillary analyses. In a COVID-19 analysis, we examined the effect of the pandemic period on our results by adding a binary variable indicating whether each measurement point was before or during the pandemic as a factor in the final model. Furthermore, we investigated whether age, education, cognitive impairment, sex, and receiving help with filling in questionnaires were related to treatment outcome when added as factors in the final model.

Results
We recruited 27 PCCs before the start of the study and, because of concerns that the incidence rate would be lower than expected, recruited the remaining 32 PCCs in two subsequent recruitment waves. Between July 4, 2016, and September 21, 2020, we assessed 395 older adults for eligibility. Of the 212 potential participants in the BA condition, 96 (45.2%) met the eligibility criteria and gave informed consent. In the TAU arm, 183 potential participants were screened, of whom 65 (35.5%) met the eligibility criteria and gave informed consent. The 22 centres that did not recruit any participants were considered lost to follow-up, even though they did not formally drop out of the study. At some point during the study period, data were missing for 43/96 (44.8%) participants in the BA arm and 20/65 (30.8%) participants in the TAU arm (Fig. 1). Eight participants received BA at home rather than at the GP's office. Three SAEs were reported, all of which resulted in death; none were related to treatment or to depressive symptoms.
Baseline characteristics are shown in Table 1. Participants had a mean age of 75.2 years (SD = 7.0) and were predominantly female (60.2%). One in three participants had a comorbid anxiety disorder. The average duration of depressive symptoms was 9.7 years (SD = 16.6), and more than half of the participants fulfilled the criteria for major depressive disorder. Of all participants, 69% had pre-existing symptoms that were known to and previously treated by their GP, while 26% reported an incident depression; treatment history data for the remaining 5% were missing. On average, participants used 4.8 prescribed medications for physical conditions and 0.6 for mental health conditions. Thirty-five participants (21.7%) needed help filling in the questionnaires. There were no significant baseline differences between groups. A detailed description of baseline characteristics can be found in the online supplementary material.

Main Results
One participant dropped out before the first measurement. All available data from the other participants (n = 160) were used in the intention-to-treat model. The best-fitting model used quadratic splines without PCC as an additional level. The comparison between models can be found in online supplementary Table S1.
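Model selection by information criteria reduces to comparing AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L across candidate models, where k is the number of estimated parameters and n the number of observations; lower values are better. The log-likelihoods and parameter counts below are invented purely for illustration and are not the values in Table S1. When candidate models differ in their fixed effects, such comparisons are normally made on maximum-likelihood rather than REML fits.

```python
import math
from typing import Dict, Tuple


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n_obs: int) -> float:
    """Bayesian information criterion."""
    return k * math.log(n_obs) - 2 * loglik


def best_model(fits: Dict[str, Tuple[float, int]], n_obs: int, criterion: str = "aic") -> str:
    """Name of the model with the lowest criterion value; fits maps name -> (log-likelihood, k)."""
    if criterion == "aic":
        scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    elif criterion == "bic":
        scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    return min(scores, key=scores.get)


# Hypothetical fits, for illustration only.
FITS = {
    "linear": (-1500.0, 10),
    "quadratic": (-1485.0, 13),
    "quadratic + PCC level": (-1484.9, 14),
}

if __name__ == "__main__":
    print(best_model(FITS, n_obs=600, criterion="aic"), best_model(FITS, n_obs=600, criterion="bic"))
```

Here the extra PCC level improves the log-likelihood only marginally, so both criteria penalise the additional parameter and prefer the simpler quadratic model.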
The interaction between treatment arm and time on depressive symptoms was significant [F(3, 127) = 4.60, p < 0.001]. Figure 2 shows the predicted QIDS-SR16 values based on our final model. At the end of treatment (9 weeks), the EMMs showed significantly fewer depressive symptoms in BA participants than in TAU participants (QIDS-SR16 difference = −2.77, p < 0.001), and a significant difference persisted until the 3-month follow-up after post-treatment (QIDS-SR16 difference = −1.53, p = 0.02) but not at the 6- to 12-month follow-ups. The between-group effect size at the end of treatment was d = 0.90 (95% CI, 0.42–1.38), a large effect [38]. At the 12-month follow-up, the between-group effect size was d = 0.29 (95% CI, −0.81 to 0.24), a small to medium effect [39]. All EMMs and effect sizes can be found in Table 2.
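The reported numbers can be cross-checked with simple arithmetic. Because d is the EMM difference divided by an SD, dividing each reported difference by its d should give roughly the same implied SD at every time point; and a 95% CI implies a standard error of (upper − lower)/(2 × 1.96), from which a two-sided p value follows. The sketch below uses the differences and CIs reported in the abstract and assumes a normal approximation, whereas the analyses used t tests on the EMMs, so small discrepancies would not be surprising. Function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()
Z_975 = Z.inv_cdf(0.975)  # about 1.96


def implied_sd(difference: float, d: float) -> float:
    """SD implied by Cohen's d = |difference| / SD."""
    return abs(difference) / d


def p_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Two-sided p value from an estimate and its symmetric 95% CI (normal approximation)."""
    se = (upper - lower) / (2 * Z_975)
    return 2 * (1 - Z.cdf(abs(estimate) / se))


# QIDS-SR16 differences (BA - TAU), 95% CI bounds, and d as reported in the abstract.
REPORTED = {
    "post-treatment": (-2.77, -4.19, -1.35, 0.90),
    "3-month follow-up": (-1.53, -2.81, -0.26, 0.50),
    "12-month follow-up": (-0.89, -2.49, 0.71, 0.29),
}

if __name__ == "__main__":
    for label, (diff, lo, hi, d) in REPORTED.items():
        print(f"{label}: implied SD = {implied_sd(diff, d):.2f}, p = {p_from_ci(diff, lo, hi):.3f}")
```

Run as written, the implied SD is about 3.1 at all three time points, and the recovered p values are consistent with the reported p < 0.001, p = 0.02, and p = 0.28.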

Secondary Outcomes
The PHQ-9 model showed a pattern similar to that of the QIDS-SR16 model. The PHQ-9, which was administered before, during, and after treatment, showed that BA participants reported significantly fewer depressive symptoms than TAU participants from 2 weeks after baseline onwards (difference = 1.18, p < 0.001), reaching a difference of 3.21 (p < 0.001) at the end of treatment, which decreased to a non-significant difference of 1.09 (p = 0.25) at the 12-month follow-up.
Based on the model data and using the standard deviation (SD) of baseline depression in our sample (SD = 3.96), 50% of the BA participants and 17% of the TAU participants showed reliable clinical improvement at post-treatment, whereas none of the BA participants and 3.1% of the TAU participants showed reliable deterioration. At the 12-month follow-up, the percentage with reliable improvement had increased to 52% of BA participants and 29% of TAU participants, while 1.1% of BA participants and 1.6% of TAU participants had deteriorated. A Cox proportional hazards regression analysis with treatment group and help with questionnaires as covariates showed that the estimated risks of relapse (hazard ratio, 1.67 [95% CI, 0.63–4.38]; p = 0.30) and recurrence (hazard ratio, 0.88 [95% CI, 0.05–12.98]; p = 0.88) in the BA condition did not differ from those in the TAU condition.
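The Jacobson–Truax reliable change index (RCI) divides the pre–post change by the standard error of the difference, S_diff = √2 · SD · √(1 − r_xx), and calls a change reliable when |RCI| exceeds 1.96. The sketch below uses the baseline SD of 3.96 reported above; the reliability r_xx = 0.85 is a hypothetical placeholder, because the reliability coefficient used is not given in this section, and the clinical-significance cut-off of the full method is omitted. Lower scores indicate fewer symptoms.

```python
import math

SD_BASELINE = 3.96  # baseline SD reported in the text
RELIABILITY = 0.85  # hypothetical reliability coefficient (assumption)
Z_CRIT = 1.96


def s_diff(sd: float, reliability: float) -> float:
    """Standard error of the difference between two scores (Jacobson & Truax)."""
    return math.sqrt(2) * sd * math.sqrt(1 - reliability)


def reliable_change_index(pre: float, post: float, sd: float = SD_BASELINE,
                          reliability: float = RELIABILITY) -> float:
    """RCI = (post - pre) / S_diff."""
    return (post - pre) / s_diff(sd, reliability)


def classify(pre: float, post: float, **kwargs: float) -> str:
    """Lower scores mean fewer symptoms, so an RCI at or below -1.96 is reliable improvement."""
    rci = reliable_change_index(pre, post, **kwargs)
    if rci <= -Z_CRIT:
        return "reliable improvement"
    if rci >= Z_CRIT:
        return "reliable deterioration"
    return "no reliable change"


if __name__ == "__main__":
    print(round(Z_CRIT * s_diff(SD_BASELINE, RELIABILITY), 2))
    print(classify(16, 10), classify(16, 13), classify(12, 17))
```

Under these assumptions, a change of more than about 4.25 points counts as reliable; a different reliability coefficient would shift this threshold.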

Per Protocol Analysis and Dropout
The per-protocol analyses (n = 127) yielded results similar to those of the intention-to-treat analyses. Participants who completed BA treatment per protocol (n = 62) were more likely to live alone than participants who discontinued treatment prematurely (n = 34) (χ²[1, N = 91] = 5.60, p = 0.018).
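For one degree of freedom, the χ² p value has a closed form, p = erfc(√(χ²/2)), which makes the reported test easy to verify; the 2 × 2 Pearson statistic itself is N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]. The cell counts in the example are hypothetical, because the living-alone counts are not reported in this section, and no continuity correction is applied.

```python
import math


def chi2_df1_p(chi2: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    print(round(chi2_df1_p(5.60), 3))                  # the reported statistic
    print(round(pearson_chi2_2x2(10, 20, 30, 40), 3))  # hypothetical counts
```

The first line recovers p ≈ 0.018 for χ² = 5.60, matching the value reported above.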
Adding dropout to the final model did not significantly influence treatment results, and there was no significant interaction between dropout and treatment group. Participants who dropped out of the study (n = 54) did not differ significantly at baseline from completers (n = 107) on any baseline measure except cognitive impairment (MoCA score; t(161) = 10.3, p = 0.002) and age (t(161) = 4.42, p = 0.037). Participants who dropped out had more cognitive impairment (MoCA, M = 24.72, SD = 2.81) and were older (M = 77, SD = 7) at baseline than participants who did not drop out (MoCA, M = 26.10, SD = 2.36; age, M = 74, SD = 6). Details of these analyses can be found in online supplementary Table S2.

Ancillary Analyses
Adding a binary variable indicating whether a measurement point was during a COVID-19 lockdown as a factor did not alter the results of the main analysis, nor did adding the baseline variables age, education, cognitive impairment, and sex as covariates. There was a significant interaction between time and receiving help during the study [F(3, 127) = 3.09, p = 0.03]. Participants who received help during the study reported significantly more depressive symptoms at the 3-, 6-, and 9-month follow-ups than participants who filled in the questionnaires alone, with QIDS-SR16 differences ranging from 1.66 at 3 months to 2.59 at 9 months. There was no interaction between receiving help and treatment arm [F(3, 127) = 0.006, p = 0.99]. Detailed results of these ancillary analyses can be found in the online supplementary material.

Discussion
This cluster-RCT showed that symptom severity was significantly lower in BA participants than in TAU participants from the second week of the study up to 3 months after the end of treatment, but not at the 6- to 12-month follow-ups. We observed between-group effect sizes favouring BA of d = 0.90 directly after treatment, which decreased to d = 0.29 at the 12-month follow-up. Some bias might have occurred as a result of the cluster-randomised design. Consent and dropout rates were in line with expectations during the treatment phase, but after treatment BA had more dropouts (45%) than TAU (27%). The BA arm had a higher consent rate (45%) than TAU (35%), and more PCCs were lost to follow-up in TAU (13 PCCs) than in BA (9 PCCs). The majority of PCCs lost to follow-up (17 PCCs) did not manage or did not intend to recruit participants but did not formally withdraw. Our ancillary analyses showed that adding dropout as a factor in the main analyses did not alter the results and that age and cognitive impairment were associated with dropout from the study but not with treatment completion or treatment results. The only factor explaining premature discontinuation of treatment was living alone: participants living with a partner were more likely to discontinue treatment than participants living alone. This association has not been reported in earlier studies.
Orgeta et al. [15] showed that BA was a promising treatment for depressed older adults, with moderate to large effect sizes compared with TAU (SMD = −0.72, 95% CI, −1.04 to −0.41 at 4–12 weeks), but that higher-quality research was needed to confirm this. To our knowledge, our study was the first adequately powered cluster-RCT assessing the effectiveness of standalone BA for depression in older adults, and its results are similar to those of this meta-analysis. Our results are also comparable to those of a recent meta-analysis on psychotherapy for depression [2], which found large effect sizes for older adults (≥55 to 75 years; g = 0.66) as well as for the oldest old (≥75 years; g = 0.97), and to the results of the CASPER-plus trial, which studied the effectiveness of collaborative care with a focus on a brief BA intervention by telephone, compared with TAU, for older adults with depression [16]. That trial found that collaborative care was more effective than TAU in the short term, with an effect size of 0.32, but the effects did not remain significant at the 12-month follow-up and beyond [16]. Collaborative care, however, involves appointing a case manager who discusses the patient's health with other health professionals, keeps track of medication and adds medication if needed, and calls patients to provide information about other potentially helpful services. Implementing a collaborative care system may therefore be more complex and possibly costlier than implementing BA. Moreover, at 12 months after the end of treatment, the effects (d = 0.29) were similar to the effects of antidepressants compared with placebo in other studies [40].

Fig. 1. CONSORT flow diagram. ¹ "Range" refers to the number of enrolled participants per PCC. ² All serious adverse events during treatment were assessed and presented to the Dutch Central Committee on Research Involving Human Subjects (CCMO), and all were categorised as unrelated to treatment.

A main strength of this study was its ecological validity. Procedures closely matched the daily practice of PCCs, and our BA protocol was adapted to the primary care setting. Furthermore, compared with other studies, the participants in our sample were diverse in terms of age, symptom duration, and level of education. Whereas trials are often criticised for stringent inclusion criteria, our study included older adults with physical and psychological comorbidity and mild cognitive impairment. The home visits by research assistants retained the one in five participants who would otherwise have declined to participate because of illiteracy or difficulties with questionnaires. These participants reported more severe depressive symptoms at follow-up than participants who filled in questionnaires independently, regardless of treatment arm.
The study has some limitations that require careful consideration in light of the methodological recommendations for trials of psychological interventions by Guidi et al. [41]. First, while the pragmatic design of the study improved its ecological validity by closely resembling daily practice, the resulting heterogeneity may have limited our ability to draw definitive conclusions about the efficacy of BA, as opposed to its real-world effectiveness. For example, by using a PHQ-9 cut-off as an inclusion criterion, we missed an opportunity to differentiate between patients in different phases of depressive disorder [41]. It remains unclear whether participants with a first incident depression benefit from BA as much as those with persistent or recurrent treatment-resistant depressive symptoms. Future studies could adopt a staging approach that incorporates a patient's history into the randomisation, yielding insights into BA's effectiveness in different stages of depressive disorder. Furthermore, the varying frequency of attended sessions and the heterogeneous nature of the treatment options in TAU may have affected post-treatment outcome differences. Possibly, BA participants received more attention from a health care provider, and the potentially active ingredients of TAU remained unclear [42]. One way to eliminate such a "frequency effect" is to standardise the number of sessions in both treatment arms, for example, with an "attention placebo group" [41]. This may result in a better understanding of BA's efficacy in this population but would be less pragmatic and would limit understanding of its effectiveness in usual health care settings. Future pragmatic studies could describe the frequency of depression-related care in the TAU group in more detail, allowing the frequency effect to be explored in analyses.
Second, bias may have occurred in this study. Outcome measures were self-report questionnaires completed by participants who were aware of their treatment allocation. A single self-rating scale fails to capture the wide range of variables that can affect the clinical presentation of a disorder, and adverse events are usually ignored [41]. Future studies should consider combining observer- and self-rated tools and assessing a wider range of factors beyond clinical symptoms. However, the QIDS-SR has validity comparable to that of clinician-rated instruments such as the QIDS-C and MADRS. It also has a lower cost and places less burden on the patient [43,44], which is an important consideration in study design. Furthermore, assessors assisting with questionnaires were independent and blinded to treatment condition, and no significant baseline differences between groups were detected. Both treatment groups were presented as active and likely to be effective conditions, limiting the impact of expectancy effects on results. Third, dropout rates were high at both the patient and the PCC level. The enthusiasm expressed by participating umbrella organisations did not translate into active recruitment by all individual PCCs. Furthermore, MHNs in BA practices were offered additional training for the intervention, whereas MHNs in the TAU arm were not. Healthcare providers in the BA arm might therefore have been more engaged in the recruitment process. This may have led to the inclusion of less motivated patients in the BA arm, which would explain the higher dropout rate in that group. However, the difference in dropout rates was not significant and may also be due to chance. Our ancillary analyses showed that adding dropout as a factor in the main analyses did not alter the results.
ES = between-group effect size. Post-treatment is at 9 weeks after baseline. Follow-up months are the months after post-treatment.
Fourth, recruitment ended prematurely because of COVID-19 lockdowns. An interim sample size analysis by an independent statistician, approved by the medical ethics committee, confirmed that the study still had sufficient power to answer the research questions with a slightly smaller dataset. Ancillary analyses showed that whether the intervention took place before or after the onset of COVID-19 did not significantly alter treatment results. It remains unclear whether COVID-19 affected participants in areas that the questionnaires did not cover. A final limitation is that ethnicity was not formally registered, because more than 95% of participants were expected to be Caucasian. Therefore, the results cannot be generalised to other ethnic groups.
These limitations notwithstanding, this study shows that MHNs in primary care can deliver BA, a low-intensity treatment, to depressed older adults, including the oldest old. BA leads to faster symptom reduction than TAU, resulting in significant differences between treatment groups at post-treatment but not in the long run (6-12 months after treatment ended). Two weeks into treatment, participants in the BA arm showed significantly lower symptom severity than those in TAU, and this difference lasted until 3 months post-treatment. Rapid symptom reduction has individual and societal benefits, since depressive symptoms are associated with decreased quality of life, disability, greater healthcare utilisation, and increased mortality [45,46]. BA is easy to teach and implement. This offers an opportunity to reduce the use of psychopharmaceuticals, which seem to lack effectiveness in older adults. However, BA does not outperform TAU in the long term, and the treatment elements that underlie the effectiveness of BA in older adults remain unknown. Future studies could use staging methods to determine whether the treatment is effective at different stages of depression. They could also investigate whether booster sessions could sustain symptom reduction for longer.

Fig. 2. Predicted QIDS-SR16 scores with 95% confidence intervals and error bars for the effect of treatment on depression over time, based on a quadratic spline model with baseline depression as a covariate and a knot at day 78 (the median day of the end-of-treatment measurement).
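The Fig. 2 caption describes a quadratic spline in time with a single knot at day 78 and baseline depression as a covariate. The sketch below shows how such a time basis and a fixed-effects design matrix could be constructed. It makes several assumptions that are not taken from the trial's analysis plan: it uses a truncated-power parameterisation, includes treatment-by-time interactions, and relies on the hypothetical helper names `quadratic_spline_basis` and `design_matrix`. The random-effects structure for repeated measures and clustering within PCCs is not described in this section, so it is omitted here.

```python
import numpy as np

# Knot position from the Fig. 2 caption (median day of the end-of-treatment measurement).
KNOT_DAY = 78


def quadratic_spline_basis(days, knot=KNOT_DAY):
    """Truncated-power basis for a quadratic spline with one knot.

    Columns: t, t^2, (t - knot)_+^2. Before the knot the curve is an ordinary
    quadratic. After the knot the extra term lets the curvature change, while
    the curve and its slope remain continuous at the knot.
    """
    t = np.asarray(days, dtype=float)
    hinge = np.clip(t - knot, 0.0, None)  # (t - knot)_+ : zero up to the knot
    return np.column_stack([t, t**2, hinge**2])


def design_matrix(days, treated, baseline_score, knot=KNOT_DAY):
    """Hypothetical fixed-effects design matrix.

    Columns: intercept, baseline covariate, treatment indicator,
    three time-basis columns, and three treatment-by-time interactions.
    """
    basis = quadratic_spline_basis(days, knot)
    treated = np.asarray(treated, dtype=float).reshape(-1, 1)
    baseline = np.asarray(baseline_score, dtype=float).reshape(-1, 1)
    intercept = np.ones_like(treated)
    return np.hstack([intercept, baseline, treated, basis, treated * basis])
```

Given fitted coefficients `beta` (from whatever mixed model is used), `design_matrix(days, arm, baseline) @ beta` would give predicted scores per arm over time. A plot like Fig. 2 displays curves of this kind. The interaction columns let each arm follow its own curve.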

Table 1. Baseline information