Stability of frontal alpha asymmetry in depressed patients during antidepressant treatment

Highlights
• Frontal alpha asymmetry (FAA) is moderately stable over time.
• Antidepressant response prediction with FAA remains consistent, even after 8 weeks of treatment.
• FAA is a differential predictor of antidepressant response, robust to state and drug effects.

properties of FAA as a prognostic biomarker, knowledge of its reliability, stability, and sensitivity to other factors, such as medication or severity of depression, needs to be established.
A predominant view in affective neuroscience is that FAA in depressed patients mostly reflects trait-like features, which do not change over time with state and are independent of interventions, although some studies have suggested otherwise. Both longitudinal and cross-sectional designs have been used to test FAA stability (see Table 1 for a summary, and appendix Table A1 for a detailed overview of studies). With the exception of Debener et al. (2000), most studies report FAA to be stable, with minor or no changes between baseline and later assessments, both in patients and in healthy controls (Allen et al., 2004; Bruder et al., 2008; Davidson et al., 2003; Deldin and Chiu, 2005; Gollan et al., 2014; Keune et al., 2011; Spronk et al., 2008; Sutton and Davidson, 1997; Tomarken et al., 1992).
Despite some inconclusive results, the majority of findings indicate that FAA is predominantly a trait, only partially affected or unaffected by changes in depressive state. Our meta-analysis on FAA as a diagnostic marker of depression (Van der Vinne et al., 2017) demonstrated that bias is strongly reduced in samples of 300 cases or more. Studies investigating FAA stability have so far all used smaller samples (n ≤ 85), which may explain part of the conflicting results among these studies.
This motivated the current work, which aims to replicate longitudinal results on the temporal stability of FAA using data from the iSPOT-D dataset (baseline n = 1008, week-8 n = 453). The primary hypothesis was that FAA is reliable and remains stable over time, with limited changes as a result of antidepressant treatment, time, and state change. We therefore assessed FAA after eight weeks of antidepressant treatment and the consequent changes in mood state. As age, sex, and depression severity have significantly influenced FAA-related outcomes in iSPOT-D and other studies (e.g., Arns et al., 2016; Bruder et al., 2001; Stewart et al., 2010; Van der Vinne et al., 2017), we extended the analyses by investigating possible effects of these variables on FAA. In particular, we studied the subgroups that differentiated MDD patients from healthy controls in our previous meta-analysis, i.e., severely depressed patients over 53 years old (Van der Vinne et al., 2017). As anxiety was not found to influence FAA in earlier iSPOT-D reports, we did not add this variable to our analyses.
For clinical use of FAA as a biomarker for treatment response, it is relevant to assess its stability and robustness to medication. Stability is particularly advantageous when patients are already taking an antidepressant (AD) before baseline: ADs often have long half-lives that require wash-out periods of weeks, which would be unnecessary if FAA remained unaffected by medication. We therefore also assessed outcome prediction with FAA recorded after eight weeks of treatment. In our previous report (Arns et al., 2016), right-sided FAA at baseline in females was associated with a favorable outcome to the SSRIs escitalopram and sertraline, whereas left-sided FAA was not. If FAA is prognostic for AD treatment outcome in specific subsamples, and FAA is indeed a stable trait, FAA measured after eight weeks on an AD should still predict treatment outcome for females, in agreement with our previous study (Arns et al., 2016). We hypothesized that analysis of week-8 medicated EEG data would yield the same treatment prediction results as the baseline unmedicated data.

Design
This is an international, multi-center, randomized, prospective, open-label trial (Phase-IV clinical trial), in which MDD patients were randomized to escitalopram, sertraline, or venlafaxine-XR treatment in a 1:1:1 ratio. The study protocol details, including a power calculation, have been published by Williams et al. (2011). This design was deliberately chosen to mimic real-world practice, with the aim of optimizing translatability to real-world settings.

MDD patients and treatment
We included 1008 MDD patients, recruited between October 2008 and January 2011. A detailed description of the study assessments, inclusion/exclusion criteria, diagnostic procedures, and treatment is available in Williams et al. (2011). In summary, the primary diagnosis of nonpsychotic MDD was confirmed before randomization using the Mini-International Neuropsychiatric Interview (MINI-Plus, Sheehan et al., 1998), according to DSM-IV criteria, together with a score ≥16 on the 17-item Hamilton Rating Scale for Depression (HRSD 17). Depressive symptoms were additionally measured with the Very Quick Inventory of Depressive Symptomatology-Self Report (VQIDS-SR 5, De La Garza, John Rush, Grannemann, and Trivedi, 2017). Comorbid anxiety disorders were allowed (present in 6.2% [specific phobia] to 10.5% [social phobia] of patients). All patients were either medication-naive or, if previously prescribed an antidepressant medication, had undergone a washout period of at least five half-lives before the baseline clinical and EEG assessments. After the baseline visit, patients were randomized to one of three antidepressant medication treatments. After eight weeks of treatment, patients were tested again using the HRSD 17, the VQIDS-SR 5, and an EEG assessment (Fig. 1). This study was approved by the institutional review boards at all participating sites, and the trial was registered with ClinicalTrials.gov. Registration number: NCT00693849; URL: http://clinicaltrials.gov/ct2/show/NCT00693849.

Pre-treatment assessments
EEG recordings were performed using a standardized methodology and platform (Brain Resource Ltd., Australia). Details of this procedure (Williams et al., 2011) and of its reliability and across-site consistency have been published elsewhere (Paul et al., 2007; Williams et al., 2005). In summary, subjects were seated in a sound- and light-attenuated room controlled at an ambient temperature of 22°C. EEG data were acquired from 26 channels: Fp1, Fp2, F7, F3, Fz, F4, F8, FC3, FCz, FC4, T3, C3, Cz, C4, T4, CP3, CPz, CP4, T5, P3, Pz, P4, T6, O1, Oz and O2 (Quik-cap; NuAmps; international 10-20 electrode system). EEG was recorded for two minutes with eyes open (EO), with the subject asked to fixate on a red dot on the screen, and for two minutes with eyes closed (EC). The subject was instructed to remain relaxed for the duration of the recording. The operator did not intervene when drowsiness patterns were observed in the EEG. Data were referenced to averaged mastoids, with a ground at AFz. Horizontal eye movements were recorded with electrodes placed 1.5 cm lateral to the outer canthus of each eye. Vertical eye movements were recorded with electrodes placed 3 mm above the middle of the left eyebrow and 1.5 cm below the middle of the left bottom eyelid. Skin resistance was <5 kΩ for all electrodes. The sampling rate of all channels was 500 Hz. A low-pass filter with an attenuation of 40 dB per decade above 100 Hz was applied prior to digitization.

EEG analysis
A detailed overview of the data analysis can be found in Arns et al. (2016). In summary, data were (1) filtered (0.3-100 Hz, plus a notch filter); (2) EOG-corrected using a regression-based technique similar to that of Gratton et al. (1983); (3) segmented into 4-second epochs with 50% overlap; and (4) cleaned with an automatic de-artifacting method. This EEG processing pipeline was also validated against an independent manual-processing pipeline (Arns et al., 2016). For further analysis, an average reference was applied, alpha power (µV²) was computed in the 8-13 Hz band, and FAA was calculated from alpha power at F3 and F4 as (F4 - F3)/(F4 + F3).
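To make the last step concrete, the sketch below computes a normalized FAA index from two already-cleaned channel signals. It is a minimal illustration, not the Brain Resource or Arns et al. (2016) pipeline: the spectral estimator (Hann-windowed periodograms averaged over 4-second, 50%-overlapping epochs) and all function names are assumptions made for this example. The log-difference variant, ln(right) - ln(left), is included only because several studies in Table A1 use it.

```python
import numpy as np

FS = 500              # sampling rate in Hz, as reported for the recordings
EPOCH_S = 4.0         # epoch length in seconds
OVERLAP = 0.5         # 50% overlap between consecutive epochs
ALPHA_BAND = (8.0, 13.0)


def split_epochs(signal, fs=FS, epoch_s=EPOCH_S, overlap=OVERLAP):
    """Cut a 1-D signal into overlapping, equally long epochs (one per row)."""
    n = int(epoch_s * fs)
    step = int(n * (1 - overlap))
    starts = range(0, len(signal) - n + 1, step)
    return np.stack([signal[s:s + n] for s in starts])


def alpha_power(signal, fs=FS):
    """Mean 8-13 Hz power over Hann-windowed epoch periodograms.

    The absolute scaling is arbitrary but identical for every channel, so it
    cancels in both asymmetry indices below.
    """
    seg = split_epochs(np.asarray(signal, dtype=float), fs)
    seg = seg - seg.mean(axis=1, keepdims=True)        # remove per-epoch offset
    win = np.hanning(seg.shape[1])
    spec = np.abs(np.fft.rfft(seg * win, axis=1)) ** 2
    freqs = np.fft.rfftfreq(seg.shape[1], d=1.0 / fs)
    band = (freqs >= ALPHA_BAND[0]) & (freqs <= ALPHA_BAND[1])
    return spec[:, band].mean()


def faa_ratio(power_f4, power_f3):
    """FAA as described in the text: (F4 - F3) / (F4 + F3) on raw alpha power."""
    return (power_f4 - power_f3) / (power_f4 + power_f3)


def faa_log(power_f4, power_f3):
    """Log-difference variant, ln(right) - ln(left), used in other studies."""
    return np.log(power_f4) - np.log(power_f3)
```

With a synthetic 10 Hz rhythm twice as large at F4 as at F3, alpha power is about four times higher on the right, so `faa_ratio` returns roughly (4 - 1)/(4 + 1) = 0.6. Positive values therefore mean more alpha power over the right hemisphere. The ratio form is bounded between -1 and 1, whereas the log form is unbounded.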

Statistics
Normality was inspected, and appropriate transformations were applied in case of non-normality. Untransformed (non-log) alpha power was used to calculate FAA. Remission was defined as a score ≤7 on the HRSD 17 eight weeks after starting treatment (current endpoint), and response was defined as a ≥50% decrease in HRSD 17 score from baseline to eight weeks. To control for antidepressant side-effects, we used the VQIDS-SR 5, which was developed specifically to focus on the core symptoms of depression. This enabled us to measure depression severity while largely ruling out antidepressant side-effects such as physical complaints. We repeated the ANOVAs from paragraphs 3.2 and 3.3, replacing all HRSD 17 variables with their VQIDS-SR 5 equivalents. Results are reported in Appendix D.
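The outcome definitions above translate directly into simple rules. The helper names below are illustrative and not taken from the study code. The response rule is written as `week8 <= 0.5 * baseline` to avoid dividing by the baseline score, and the percentage-change helper reflects one common convention (negative values indicate improvement).

```python
def is_responder(hrsd_baseline: float, hrsd_week8: float) -> bool:
    """Response: HRSD17 decreased by at least 50% from baseline to week 8."""
    return hrsd_week8 <= 0.5 * hrsd_baseline


def is_remitter(hrsd_week8: float) -> bool:
    """Remission: HRSD17 score of 7 or lower at week 8."""
    return hrsd_week8 <= 7


def percent_change(baseline: float, week8: float) -> float:
    """Percentage change in a severity score (negative = improvement)."""
    return 100.0 * (week8 - baseline) / baseline
```

For example, a patient moving from 22 to 11 points counts as a responder but not a remitter. A patient moving from 24 to 13 points is neither, because 13 is more than half of 24.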
Differences in age, sex, education, and depression severity at baseline were tested using one-way ANOVAs or non-parametric tests, depending on the distribution. We only included patients who returned for their week-8 visit while on their assigned medication and who had followed this treatment for at least 6 weeks ('per-protocol' grouping; see also the CONSORT diagram in Fig. 1).
FAA reliability was analyzed by calculating Intraclass Correlations (ICCs) across the baseline and week-8 measurements. A full-factorial Repeated Measures ANOVA was conducted with the within-subject factor FAA Change Eyes Closed (FAA at baseline and after eight weeks) and the between-subject factor Treatment arm (escitalopram, sertraline, or venlafaxine). Given the large sample size, we set the significance level for main effects of FAA Change in the main analyses at p ≤ .01; for interaction effects, it remained at the conventional p ≤ .05. When significant interactions prompted subgroup analyses, a level of p ≤ .05 was again used. Effect sizes (ES) of main effects are reported as Cohen's d. FAA stability was also tested through Pearson correlations between FAA Change and HRSD 17 Change.
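The text does not state which ICC form was used. The sketch below therefore computes two common single-measure forms from a subjects × sessions matrix (baseline and week 8 as the two columns), following the Shrout and Fleiss mean-square formulas: ICC(2,1) (absolute agreement) and ICC(3,1) (consistency). Which one matches the reported values is an assumption left open. The same function applies to dichotomous FAA coded as 0/1. The correlation helper mirrors the FAA Change versus HRSD 17 Change test. All names are hypothetical.

```python
import numpy as np


def icc_two_way(data):
    """Single-measure ICCs from a subjects x sessions matrix (Shrout & Fleiss).

    Returns (ICC(2,1) absolute agreement, ICC(3,1) consistency).
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)    # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)    # between sessions
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc2 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc3 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc2, icc3


def change_correlation(faa_base, faa_week8, hrsd_base, hrsd_week8):
    """Pearson r between FAA change and HRSD17 change (week 8 minus baseline)."""
    faa_change = np.asarray(faa_week8) - np.asarray(faa_base)
    hrsd_change = np.asarray(hrsd_week8) - np.asarray(hrsd_base)
    return np.corrcoef(faa_change, hrsd_change)[0, 1]
```

The two ICC forms differ only in how a systematic shift between sessions is treated. If every subject's FAA moves by the same amount, ICC(3,1) stays at 1, whereas ICC(2,1) is penalized for the shift.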
Post hoc, we repeated the Repeated Measures and Pearson correlation analyses in the subgroups of moderately and severely depressed patients (HRSD 17 score ≥24) over the age of 53, separately for males and females (in line with our meta-analysis, Van der Vinne et al., 2017). However, as these subgroup tests might be underpowered, we also performed a custom Repeated Measures ANCOVA on the whole dataset that additionally included the covariates Age and Depression severity, separately for males and females.
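As a sketch of the subgroup criteria only, the filter below keeps patients of one sex who are older than 53 with a baseline HRSD 17 of at least 24. The `Patient` record, its field names, and the "F"/"M" coding are hypothetical. Reading "over the age of 53" as strictly older than 53 is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    sex: str               # hypothetical coding: "F" or "M"
    age: float             # years at baseline
    hrsd_baseline: float   # HRSD17 total score at baseline


def meta_analysis_subgroup(patients, sex):
    """Patients of one sex, older than 53, with baseline HRSD17 >= 24."""
    return [p for p in patients
            if p.sex == sex and p.age > 53 and p.hrsd_baseline >= 24]
```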
When a null hypothesis was not rejected by any of the ANOVAs or correlational analyses, we used Bayesian alternatives. These tested for evidence of absence of a change in FAA, using the Bayesian Repeated Measures ANOVA framework (based on work by Jeffreys (1961) and Rouder et al. (2009)). We analyzed the data with JASP (JASP Team, 2017). The first null hypothesis states that there is no difference in FAA between baseline and after 8 weeks; the second, that FAA Change is not correlated with HRSD 17 Change. The two-sided alternative hypotheses state that FAA changed after eight weeks, or that FAA Change is correlated with HRSD 17 Change.
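To show what a Bayes factor for the correlation hypotheses measures, the sketch below integrates Jeffreys' approximate likelihood ratio for a Pearson r over a uniform prior on ρ in (-1, 1). JASP's default Bayesian correlation uses a stretched beta prior of width 1, which is equivalent to this uniform prior, but it evaluates an exact expression. Values from this approximation may therefore differ slightly, and the function is an illustration, not the software used in the study.

```python
import numpy as np


def bf10_correlation(r, n, grid=200_001):
    """Approximate BF10 for a Pearson correlation (Jeffreys, 1961).

    H0: rho = 0; H1: rho ~ Uniform(-1, 1). The likelihood ratio uses
    Jeffreys' approximation (1 - rho^2)^((n-1)/2) / (1 - rho*r)^(n-3/2).
    Assumes |r| < 1 and n > 3.
    """
    rho = np.linspace(-1.0, 1.0, grid)[1:-1]      # open interval avoids log(0)
    log_f = 0.5 * (n - 1) * np.log1p(-rho ** 2) - (n - 1.5) * np.log1p(-rho * r)
    peak = log_f.max()                            # rescale to avoid overflow
    dx = rho[1] - rho[0]
    integral = np.exp(peak) * np.sum(np.exp(log_f - peak)) * dx
    return 0.5 * integral                         # 0.5 = uniform prior density
```

For r = 0.039 and n = 453, `1 / bf10_correlation(0.039, 453)` gives a BF01 of roughly 12. This is in the same range as the JASP value reported for the HRSD 17 change score, and it shows how a large sample turns a near-zero r into positive evidence for the null hypothesis, rather than mere absence of significance.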
Using the Repeated Measures model from Arns et al. (2016), we again predicted treatment outcome in females taking an SSRI (escitalopram or sertraline), this time replacing baseline FAA with week-8 FAA (within-subjects factor FAA Condition [EC and EO], between-subjects factor Response, and covariate Age). We tested effects one-tailed (halved p-values are reported) because we specifically expected more right-sided FAA in SSRI responders than in non-responders, so that a result in the unexpected direction would lead to the same conclusion as finding no difference at all (Ruxton and Neuhäuser, 2010). In Appendix B, we explain why we compare the smaller sample, containing only patients who were present for the assessment after 8 weeks, with the larger sample of all baseline patients from the previous study.
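The one-tailed rule can be stated precisely. Halving a two-sided p-value is only valid when the observed effect lies in the predicted direction. An effect in the opposite direction gets a one-sided p of 1 - p/2 and can never be significant. The function name is illustrative.

```python
def one_tailed_p(p_two_sided: float, effect_in_expected_direction: bool) -> float:
    """Convert a two-sided p-value into a one-sided one.

    Halving applies only when the observed effect lies in the predicted
    direction (here: more right-sided FAA in SSRI responders); otherwise the
    one-sided p-value is 1 - p/2.
    """
    if not 0.0 <= p_two_sided <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    half = p_two_sided / 2.0
    return half if effect_in_expected_direction else 1.0 - half
```

As a rough illustration, F(1,150) = 3.725 corresponds to a two-sided p of about .056, which halves to the .028 reported for week-8 FAA.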

Results
Of the 1008 MDD patients enrolled, 453 formed the final MDD sample for the FAA Change analyses. The remaining 555 patients were excluded because they never started treatment, received less than 6 weeks of medication, or had no week-8 assessment (or one of insufficient quality) (see Fig. 1). Table 2 shows demographic information and response and remission rates for the included patients. There were no differences between the three treatment groups in age, sex, baseline MDD and anxiety severity, remission and response rates, or number of rejected EEG epochs. Approximately 5.3% of EEG epochs were rejected due to artifacts in the MDD group during EC.

FAA change over time
ICCs for FAA as a continuous and as a dichotomous (leftward or rightward FAA) variable were 0.276 and 0.256, respectively. The Repeated Measures ANOVA revealed no evidence for a change in FAA after AD treatment (F(1,450) = 1.421, p = .234), nor an interaction with Treatment Arm (F(2,450) = 0.690, p = .502). FAA Change was not significantly correlated with either the change score in HRSD 17 (r = 0.039, p = .410) or the percentage change in HRSD 17 (r = 0.047, p = .323). Bayesian Repeated Measures testing of invariant (constant) FAA yielded Bayes factors indicating evidence for the null hypothesis: for the models with the factors FAA Change and Treatment Arm, the data are >7.4 times more likely under the null hypothesis than under any alternative model with (a combination of) the factors. Bayesian Pearson correlations between FAA Change and the difference score or percentage difference of HRSD 17 revealed moderate to strong evidence for the null hypothesis: the data are respectively 12.1 and 9.3 times more likely under the null hypothesis than under the model assuming a correlation between the variables. See Appendix F for an elaboration on results and JASP tables.

Extended repeated measures model and correlations
Focusing on variables known to influence FAA, specifically in the subgroups we considered prone to changes in FAA (severely depressed females and males over 53 years old), we did not find significant changes, although subsample sizes were small. Furthermore, in these subgroups the FAA Change score was not significantly correlated with the change score in HRSD 17 (see appendix Table C1 for all statistics). Bayesian Repeated Measures ANOVAs for the two sex groups of severely depressed patients over the age of 53 revealed anecdotal (i.e., "not worth more than a bare mention", the customary label for BFs between 1 and 3) to moderate evidence. Most models therefore provided no conclusive evidence for either the null or the alternative hypothesis, although some models indicated moderate evidence that the data are more likely under the null hypothesis. See Appendix F for an elaboration on results and JASP tables.
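The verbal labels used here follow a widely used classification of Bayes factors, Jeffreys' scheme as adapted by Lee and Wagenmakers and used in JASP. The helper below is a sketch assuming those cut-offs (1-3 anecdotal, 3-10 moderate, 10-30 strong, 30-100 very strong, >100 extreme). It takes a BF01 and reports which hypothesis is favored.

```python
def describe_bf01(bf01: float):
    """Label a BF01 using Jeffreys / Lee & Wagenmakers evidence categories.

    Returns (category, favored hypothesis). BF01 > 1 favors H0, < 1 favors H1.
    """
    if bf01 <= 0:
        raise ValueError("Bayes factors must be positive")
    if bf01 == 1:
        return "no evidence", "neither"
    strength = bf01 if bf01 > 1 else 1.0 / bf01
    favored = "H0" if bf01 > 1 else "H1"
    for upper, label in ((3, "anecdotal"), (10, "moderate"),
                         (30, "strong"), (100, "very strong")):
        if strength < upper:
            return label, favored
    return "extreme", favored
```

Under these cut-offs, the BF01 values of 12.1 and 9.3 reported for the correlations fall in the "strong" and "moderate" categories, respectively. A BF01 of 2 would be merely anecdotal.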
Extending the Repeated Measures model from paragraph 3.2 showed that, irrespective of sex, baseline severity and age did not contribute significantly to FAA Change. Bayesian Repeated Measures alternatives to the extended ANOVAs showed results similar to those in paragraph 3.2: for females, the data are ≥6.6 times more likely under the null hypothesis than under any alternative model with (a combination of) the factors, and ≥4.7 times more likely for males. See Appendix F for an elaboration on results and JASP tables.

Treatment prediction using medicated week-8 data in females
Treatment outcome prediction with week-8 data revealed a prediction pattern similar to that of the baseline data reported in Arns et al. (2016). With one-tailed testing of response prediction in females taking an SSRI for depression (escitalopram or sertraline), the treatment response effect remained significant with week-8 FAA at group level (F(1,150) = 3.725, p = .028). As at baseline, no response effect of FAA was found in the venlafaxine group after eight weeks.
The week-8 SSRI data in Fig. 2 show that responders had significantly more right-sided FAA than non-responders (based on the female FAA means reported in appendix Table E1). Fig. 2 also shows that the response effect was similar to that at the baseline assessment. This held even though the confidence interval (CI) of FAA for SSRI non-responders measured with EO after eight weeks did not differ significantly from 0 (Fig. 2). No interactions with age were observed. The equivalent of the Fig. 2 data for males is available in Appendix G.
Cohen's d comparing FAA change scores of female SSRI responders and non-responders was 0.304. Using the direction of week-8 FAA alone to choose between an SSRI and an SNRI would have improved the remission rate for SSRIs from 47% to 56-58%.
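The text does not specify which variant of Cohen's d was computed. A common choice for two independent groups, such as responders versus non-responders, divides the mean difference by the pooled standard deviation. The sketch below assumes that variant.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled (n - 1 weighted) standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

The sign depends on argument order. Passing responders first yields a positive d when responders have higher (more right-sided) FAA values.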

Discussion
We investigated the stability of FAA in MDD patients during antidepressant treatment, hypothesizing that FAA is a robust metric, insensitive to time, antidepressant drug treatment, and state changes. FAA did not change significantly after eight weeks of escitalopram, sertraline, or venlafaxine treatment, despite a relatively low reliability of the FAA measurements. Additional Bayesian testing revealed that a stable FAA is more likely than a change in FAA over time after antidepressant treatment. Furthermore, post-hoc tests with variables known to influence FAA (in earlier iSPOT-D studies) revealed no differential temporal changes in FAA in depressed patients differing in age, sex, depression severity, or change in depression severity. Focusing only on core depression symptoms (as measured by the VQIDS-SR 5, see appendix D), we found similar results.
To further confirm the temporal stability of FAA, we hypothesized that predicting treatment outcome in females taking SSRIs would lead to similar results when using week-8 FAA instead of the previously studied baseline FAA (Arns et al., 2016). This re-analysis indeed confirmed that, in the SSRI group, right-sided FAA was associated with response and left-sided FAA with non-response. Although the effect size was less pronounced with week-8 data (Cohen's d of 0.304, vs. 0.547 in the previous analyses), week-8 FAA yielded the same conclusions as the baseline measurements. Furthermore, we obtained a similar improvement in remission rates when week-8 FAA was used for 'prescribing' medication: SSRI remission rates previously improved from 46% to 53-60% using baseline FAA, and now from 47% to 56-58% using week-8 FAA. This extends the use of FAA as a prognostic biomarker, as response prediction was modified neither by the moment of assessment nor by AD treatment.
The low reliability was unexpected and implies that FAA following treatment was less stable than in previous studies. In several studies, FAA was found to be relatively reliable and consistent, based on ICCs and Cronbach's alpha (Allen et al., 2004; Debener et al., 2000; Keune et al., 2011; Sutton and Davidson, 1997; Towers and Allen, 2009). Towers and Allen (2009) in particular demonstrated FAA consistency through several methods. An important difference is that our study used a single FAA statistic per assessment time (two in total), whereas several other studies used multiple (fictive) time points. This could account for our lower reliability. Despite the low ICC, we replicated the absence of a significant change in FAA over time, in a large sample (N = 453).
To our knowledge, this is the first study to assess the temporal stability of FAA in a large sample. Our results support previous studies showing that FAA mainly reflects trait-like features, insensitive to antidepressant treatment, age, sex, or depression severity (Allen et al., 2004; Arns et al., 2016; Bruder et al., 2008; Carvalho et al., 2011; Deldin and Chiu, 2005; Feldmann et al., 2018; Gollan et al., 2014; Keune et al., 2011; Nusslock et al., 2018; Spronk et al., 2008; Sutton and Davidson, 1997; Tomarken et al., 1992; Van der Vinne et al., 2017; Vuga et al., 2006). Similarly, Segrave et al. (2011) found no evidence for antidepressant-elicited changes in FAA when comparing a small group of depressed patients on ADs with unmedicated patients. In other small cohorts, FAA was not modified by the use of antidepressant medication either (Bruder et al., 2008; Vuga et al., 2006), in agreement with our observations.
The prevailing approach-withdrawal motivation system hypothesis assumes that FAA is associated with lifetime MDD (having had at least one depressive episode in one's life), and not specifically with current MDD. This is an important distinction, and at first sight our results support this theory: the motivation system hypothesis states that FAA is not expected to change as a result of changes in MDD status, including MDD remission. However, establishing FAA (in)stability alone does not allow our study to provide evidence either for or against the theory. Had we found the opposite result (a change in FAA), this could also have been explained, by the related capability model (Coan et al., 2006). This model states that resting-state FAA is more prone to fluctuations than FAA measured after inducing a positive or negative mood. Because we measured resting-state FAA, either outcome could be explained within the approach-withdrawal motivation system, given the capability model. It is therefore difficult to place our results unambiguously within existing theories. Note that our earlier findings were less compatible with the motivation system. Firstly, in the approach-withdrawal motivation system, left-sided FAA is theorized to be more strongly associated with withdrawal behavior and depression. However, brain asymmetry was not found to differ between depressed patients and healthy controls, as measured both through EEG FAA (Van der Vinne et al., 2017) and through MRI in a recent large ENIGMA consortium study (de Kovel et al., 2019). Secondly, prognostic results for females in the FAA iSPOT-D study (Arns et al., 2016) revealed heterogeneity among MDD patients, which is not consistent with a homogeneous FAA-related vulnerability for MDD.
N. van der Vinne, et al. NeuroImage: Clinical 24 (2019) 102056
In sum, the current study was not designed to directly investigate the approach-withdrawal motivation theory and cannot provide support for or against it. We show that FAA is a robust metric, suitable for sex-specific treatment prediction under challenging circumstances, such as changes in state, the passage of time, the use of common antidepressant agents, and drug changes. This suggests that FAA can be reliably implemented in clinical practice as a prognostic biomarker in both medicated and unmedicated patients.

Conclusions
In an adequately powered sample, we demonstrate that neither (1) antidepressant medication nor (2) MDD state and severity has systematic effects on FAA. This confirms FAA stability. Furthermore, as the prognosis of treatment response does not depend on the moment of measurement, FAA may serve as a robust biomarker to optimize MDD treatments.

Table A1.
Gotlib et al., 1998. Analysis: regression with 1st predictor "Never vs. Ever depressed" and 2nd predictor "Currently depressed vs. remitted". FAA metric: log R − log L. Most relevant results: no difference between currently depressed and remitted patients; FAA seems to be a state-independent marker.
Grünewald et al., 2018. Analysis: correlation FAA-BDI. FAA metric: ln[right] − ln[left]. Most relevant results: overall no specific conclusion on state or trait, but no correlations were found in the MDD group.

Appendix B. Comparison of baseline and week-8 data
To justify using a follow-up sample that should contain the same MDD patients as the baseline data (paragraph 3.5) but does not, owing to incomplete assessments, we repeated the baseline analysis from Arns et al. (2016) using only patients with a complete week-8 assessment. The effect within the SSRI group was preserved (F(1,150) = 10.619, p = .001; see Table B1 for all statistics).
Table C1.

Appendix D. VQIDS-SR 5
To control for AD side effects, we repeated the analyses from paragraphs 3.2 and 3.3, replacing all HRSD 17 variables with their VQIDS-SR 5 equivalents. Correlational analyses showed that FAA Change was not significantly correlated with either the change score in VQIDS-SR 5 (r = 0.059, p = .225) or the percentage change in VQIDS-SR 5 (r = 0.060, p = .219).
Focusing on variables known to influence FAA, specifically in the subgroups we considered prone to changes in FAA (severely depressed females and males over 53 years old), we did not find the FAA Change score to be significantly correlated with the change score in VQIDS-SR 5, although subsample sizes were small. Extending the Repeated Measures model from paragraph 3.2 showed that VQIDS-SR 5 baseline severity and age did not contribute significantly to FAA Change, in either males or females (see Table D1 for all statistics).

Table B1
P-values of the interaction effects mentioned in the text, from the re-analysis of Arns et al. (2016) using only data from MDD patients with measurements after 8 weeks (i.e., excluding baseline FAA measurements of patients who did not return for follow-up).

F1. Elaborated Bayesian analyses for paragraph 3.2
Bayesian testing of an absence of change in FAA revealed a Bayes factor indicating evidence for the null hypothesis: for the models with the factors FAA Change and Treatment Arm, the data are >7.4 times more likely under the null hypothesis than under any alternative model with (a combination of) the factors. Moderate evidence for the null hypothesis was found with only FAA Change in the model (BF01 = 7.483), increasing to (very) strong evidence when adding a combination of the two main effects (BF01 = 240.356) and when including their interaction effect (BF01 = 5109.119). The error percentage was <2.5%, which indicates sufficient stability of the numerical algorithm used to obtain the result. For each factor, the inclusion BF reflects how well the factor predicts the data by comparing the performance of all models that include the factor with that of all models that do not. For both FAA Change and Treatment Arm, the evidence is against their inclusion (BF inclusion = 0.134 and 0.031, respectively; values below 1 favor exclusion), as it is for the interaction effect (BF inclusion = 0.047). This implies that these factors do not provide evidence for a change in FAA. See Table F1.
Note: All models include subject.
Note: Compares models that contain the effect to equivalent models stripped of the effect. Higher-order interactions are excluded.
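The inclusion Bayes factor described above, comparing all models with a factor against all models without it, can be sketched as the change from prior to posterior inclusion odds. This is the "across all models" variant that matches the prose. The table note describes a matched-models comparison, which JASP also offers and which can give different values. The function and the toy model list are illustrative assumptions.

```python
def inclusion_bf(models, effect):
    """Inclusion Bayes factor across all models.

    models: iterable of (terms, prior_prob, posterior_prob), where terms is a
    set of effect names in that model. Returns the posterior inclusion odds
    divided by the prior inclusion odds for `effect`.
    """
    prior_in = prior_out = post_in = post_out = 0.0
    for terms, prior, posterior in models:
        if effect in terms:
            prior_in += prior
            post_in += posterior
        else:
            prior_out += prior
            post_out += posterior
    return (post_in / post_out) / (prior_in / prior_out)
```

In a toy case with only a null model and an FAA Change model at equal prior probability, the inclusion BF reduces to 1/BF01. A BF01 of 7.483 then corresponds to an inclusion BF of about 0.134, which illustrates why values below 1 indicate evidence against including a factor.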