Relationship between daily rated depression symptom severity and the retrospective self-report on PHQ-9: A prospective ecological momentary assessment study on 80 psychiatric outpatients

Background: Depression-related negative bias in emotional processing and memory may impair the accuracy of recall of temporally distal symptoms. We tested the hypothesis that responses to the Patient Health Questionnaire (PHQ-9) reflect temporally proximal mood states more accurately than distal ones. Methods: Currently depressed psychiatric outpatients (N = 80), with depression confirmed in semi-structured interviews, had the Aware application installed on their smartphones for ecological momentary assessment (EMA). The severity of “low mood”, “hopelessness”, “low energy”, “anhedonia”, and “wish to die” was assessed on a Likert scale five times daily during a 12-day period, after which the PHQ-9 questionnaire was completed. We used auto- and cross-correlation analyses and linear mixed-effects multilevel models (LMM) to investigate the effect of time lag on the association between EMA of depression symptoms and the PHQ-9. Results: Autocorrelations of the EMA of depressive symptom severity on two consecutive days were strong (r varying from 0.7 to 0.9; p < 0.001). “Low mood” was the least and “wish to die” the most temporally stable symptom. The correlations between EMA of depressive symptoms and total scores of the PHQ-9 were temporally stable (r from 0.3 to 0.6; p < 0.001). No effect of assessment time on the association between EMA data and the PHQ-9 emerged in the LMM. Limitations: Altogether 11.5 % of observations were missing. Conclusions: Despite fluctuations in severity of some of the depressive symptoms, patients with depression accurately recollect their most dominant symptoms, without a significant recall bias favouring the most recent days, when responding to the PHQ-9.


Introduction
Measurement-based care of depression (Trivedi et al., 2006), currently a standard in clinical practice, includes a systematic assessment of the severity of depressive symptoms and guides treatment-related decision-making. The Patient Health Questionnaire (PHQ-9) (Kroenke et al., 2001) is a widely used validated self-report depression scale that scores each of the nine Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) (American Psychiatric Association, 2013) criteria over the preceding two weeks. However, current mood may, by inducing autobiographical memory bias (Hitchcock et al., 2019; Quinlivan et al., 2017), distort recall of distal emotional experiences in patients with mood disorders. Therefore, responses to the PHQ-9 might reflect patients' temporally proximal depressive symptoms more than their distal symptoms within the two-week period.
Ecological momentary assessment (EMA), a method involving frequent repeated assessment of patients' experiences, symptoms, and behaviour in real time and in the context of their daily life, may overcome the recall bias related to the use of depression scales. Studies examining how daily EMA of depression symptoms corresponds to the total score of the PHQ-9 are, however, rare and include small (Cao et al., 2020; Moukaddam et al., 2019; Torous et al., 2015) or non-clinical samples with self-reported depressive symptoms (Burchert et al., 2021; McIntyre et al., 2021). These studies have demonstrated moderate correlations between PHQ-9 scores and daily assessments of depression symptoms, as well as within-patient fluctuation of some of the depressive symptoms. For example, mood symptoms (Bowen et al., 2017) and suicidal ideation (Kleiman et al., 2017) seem to oscillate over time in depressed patients. This temporal fluctuation of depressive symptom severity, in addition to recall bias, may also affect the validity of the PHQ-9, which is supposed to reflect the most frequent and dominant depression symptoms over a two-week period, as required in the DSM-5 for major depressive disorder (MDD).
In this prospective study using EMA in psychiatric patients with major depressive episodes, we aimed to examine the temporal patterns of the severity of core depression symptoms ("low mood", "anhedonia", "low energy", "hopelessness", and "wish to die") assessed daily over the course of two weeks. We also investigated the effect of assessment timing on the association between the EMA of daily depression symptom severity and patients' retrospective self-report of their depression symptoms over the two weeks, assessed by the PHQ-9. We hypothesized that a) depressive symptom severity fluctuates over time when assessed daily; and b) the PHQ-9 scores reflect the severity of depressive symptoms during temporally proximal days more than during distal days.

MoMo-Mood project
This study is a part of the Mobile Monitoring of Mood (MoMo-Mood) study, a collaborative project between the Department of Psychiatry, University of Helsinki, the Department of Psychiatry, Helsinki University Hospital, the Department of Psychiatry, University of Turku, the Department of Psychiatry, Turku University Hospital, the City of Espoo Mental Health Services, and the Department of Computer Science at Aalto University. The Helsinki and Uusimaa Hospital District's Ethics Committee approved the research protocol on 29 August 2018.

Participants
Overall, 80 currently depressed outpatients with DSM-5 (American Psychiatric Association, 2013) diagnoses of MDD (n = 46), bipolar disorder (BD) (n = 16), or borderline personality disorder (BPD) (n = 18) were recruited from outpatient clinics for mood disorders. Inclusion criteria were a) MDD, BD type I or II, or BPD with an ongoing major depressive episode; b) PHQ-9 score ≥ 10; and c) possession of a smartphone with an Android or iOS operating system. MDD and BD were ascertained with the Mini International Neuropsychiatric Interview (Sheehan et al., 1998) and BPD with the Structured Clinical Interview-II (First et al., 1997). Exclusion criteria were a) psychotic features; b) concurrent substance use disorder; and c) imminent risk of suicide.

Data collection
We used Aalto University's Niima Data Collection Platform (Aledavood et al., 2017). Niima is primarily developed to run studies with various modalities of behavioural and physiological data, enabling various types of data from study participants to be collected and accessed. It allows different levels of access to data (e.g. access only to certain sources of data) for different researchers, depending on their role in the study. The platform is privacy-first and does not use other cloud services as data intermediaries. Niima is an open-source project that also allows the use of other open-source applications as necessary. In MoMo-Mood, we used the AWARE application as part of Niima for collecting the EMA data.

Procedures
After giving informed consent, all participants had the AWARE application (Ferreira et al., 2015) installed on their smartphones. During the two-week period, patients received notifications on their phones five times a day (once in the morning, once in the evening, and three times in between). The patients were asked to assess their mood, ability to feel pleasure, hopefulness, energy level, and willingness to live using Likert scale items (1-7). The mood item ranged from "I am very happy" (1) to "I am very depressed" (7); the energy level item from "I am totally exhausted" (1) to "I am very energetic" (7); the pleasure item was "I am enthusiastic and feel pleasure" (1 = "not at all", 7 = "very much"); the hopefulness item ranged from "I am totally hopeless" (1) to "I am very hopeful" (7); and the willingness to live item from "I want to die" (1) to "I want to live" (7). The patients completed the PHQ-9 both before the EMA and on the day after it. For technical reasons, the data from the first two days of the EMA were abnormal in some patients. Therefore, we marked the day of the PHQ-9 after the EMA as day 0 and tracked the EMA responses back to day -12.
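The day indexing and daily averaging described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the function name `daily_means`, the input layout (pairs of answer date and score for one patient and one item), and the example dates are all hypothetical. It assumes that unanswered prompts are simply skipped (no imputation) and that the EMA days are offsets -12 to -1 relative to the PHQ-9 day (day 0).

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_means(responses, phq9_date, max_lag=12):
    """Average the answered prompts of each day for one patient and one EMA item.

    responses: iterable of (answer_date, score) pairs; score is None when a
               prompt went unanswered (hypothetical layout).
    phq9_date: date on which the post-EMA PHQ-9 was completed (day 0).
    Returns {day_offset: mean score} for offsets -max_lag .. -1.
    """
    by_day = defaultdict(list)
    for answered_on, score in responses:
        if score is None:
            continue  # missing prompt: skipped, nothing is imputed
        offset = (answered_on - phq9_date).days
        if -max_lag <= offset <= -1:
            by_day[offset].append(score)
    return {offset: mean(scores) for offset, scores in sorted(by_day.items())}


# Made-up example: two days of "mood" ratings before a PHQ-9 on 15 March.
phq9_day = date(2019, 3, 15)
mood = [
    (date(2019, 3, 14), 5), (date(2019, 3, 14), 6), (date(2019, 3, 14), None),
    (date(2019, 3, 13), 4), (date(2019, 3, 13), 4),
    (date(2019, 2, 20), 7),  # outside the 12-day window, ignored
]
mood_by_day = daily_means(mood, phq9_day)
```

Keying each daily mean by its offset from the PHQ-9 day keeps patients aligned on the same time axis even when their first EMA days are unusable or missing.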

Statistical analysis
In the analyses, we calculated means of the daily EMA item scores using all available data and investigated product-moment auto- and cross-correlations. No imputation was used. We estimated linear multilevel models (Bates et al., 2015; Gelman and Hill, 2007), regressing the experience sampling variables on the fixed effects of lag-1 experience (autocorrelation), post-active PHQ-9 report, time lag to last sampling, and the interaction of PHQ-9 and the lag to last sampling. The interaction represents the hypothesis that PHQ-9 is more correlated with the temporally proximal than with the distal experiences. As a sensitivity analysis, we compared a few recent days with the entire 12 days using a dummy indicator of time lag from 0 to 2 days from the last sampling (and PHQ-9 sampling).
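One way to write the multilevel model described above is sketched below. The notation is ours, and the random-effects structure is an assumption: the text names the fixed effects but does not state the random part, so a patient-level random intercept is shown as the simplest choice.

```latex
% y_{it}: daily mean EMA score of patient i on day t
% PHQ_i:  post-active PHQ-9 sum score of patient i
% lag_t:  number of days from day t to the last sampling day
% u_i:    patient-level random intercept (assumed, not stated in the text)
\begin{equation}
  y_{it} = \beta_0 + \beta_1\, y_{i,t-1} + \beta_2\, \mathrm{PHQ}_i
         + \beta_3\, \mathrm{lag}_t
         + \beta_4\, (\mathrm{PHQ}_i \times \mathrm{lag}_t)
         + u_i + \varepsilon_{it},
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
  \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
\end{equation}
```

In this form, the hypothesis that the PHQ-9 tracks proximal days more closely than distal days corresponds to a non-zero interaction coefficient $\beta_4$. The sensitivity analysis would replace $\mathrm{lag}_t$ with a dummy $D_t = \mathbb{1}[\mathrm{lag}_t \le 2]$ in both the main effect and the interaction.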

Sample characteristics
The sample comprised 61 women and 19 men (mean age 35 years; SD 12 years). The mean PHQ-9 scores were 16.5 (SD 5.2) before and 16.1 (SD 5.6) after the EMA (NS). Overall, 11.5 % of all EMA data but no PHQ-9 data were missing (all available data were used, and no patient was fully excluded from any analysis). The daily EMA means are presented in Table 1.

Correlation analyses
The experience sampling variables of each day were correlated with the variable itself at the last assessment date, although slightly more so during the immediately preceding days (Fig. 1a). The mood variable appeared less stable over time than death thoughts (Fig. 1b). The correlation of the experience sampling variables (and their daily total scores) with post-active PHQ-9 reports remained much more stable over the 12-day follow-up than their autocorrelations (Fig. 1c vs. a). Confidence intervals of successive days clearly overlapped when examining the correlation of core symptoms with post-active PHQ-9 (Fig. 1d).
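The per-day correlations with confidence intervals reported above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the text does not say how the 95 % confidence intervals were obtained, so the Fisher z-transformation is used here as a common choice, and the function names and data layout are hypothetical. Patients lacking a daily mean for a given day are dropped for that day only, in line with using all available data.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95 % CI for r via the Fisher z-transformation (needs n > 3, |r| < 1)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def lagged_correlations(ema_by_patient, phq9_by_patient):
    """Correlate each day's EMA means with the PHQ-9 score across patients.

    ema_by_patient:  {patient_id: {day_offset: daily mean}} (hypothetical layout)
    phq9_by_patient: {patient_id: PHQ-9 sum score}
    Returns {day_offset: (r, (ci_low, ci_high))}.
    """
    offsets = sorted({o for days in ema_by_patient.values() for o in days})
    result = {}
    for o in offsets:
        pairs = [(days[o], phq9_by_patient[pid])
                 for pid, days in ema_by_patient.items() if o in days]
        if len(pairs) < 4:
            continue  # too few patients for a Fisher interval
        xs, ys = zip(*pairs)
        r = pearson_r(xs, ys)
        result[o] = (r, fisher_ci(r, len(pairs)))
    return result
```

The same `pearson_r` helper covers the autocorrelations if the second argument is each patient's last-day EMA value instead of the PHQ-9 score.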

Linear multilevel regression analysis
In the linear multilevel regression analysis, experience samples were associated with their past values and with PHQ-9, but not with time of experience assessment relative to PHQ-9 (see Supplementary Table 1). In addition, as a sensitivity analysis, we estimated the same models without the lag-1 variable for autocorrelations, but the conclusion of the lack of assessment time-to-PHQ interactions on experience samples remained (see Supplementary Table 2). The conclusion also withstood a test of non-linear association of the few closest days to PHQ-9 assessment with experiences (see Supplementary Table 3).

Discussion
In this prospective ecological momentary assessment (EMA) study on 80 psychiatric outpatients with depression, we tested the hypothesis that the PHQ-9 reflects temporally proximal depressive symptoms more accurately than distal ones, owing to a previously postulated memory- and mood-related negative bias. Overall, the PHQ-9 does not seem to favour the severity of depressive symptoms during the days closest to its completion, partly owing to the relative temporal stability of the majority of depressive symptoms over the 12-day period, except for depressive mood, which markedly fluctuated in some patients with depression. This study has several strengths. First, to our knowledge, this is the largest EMA study examining the association between daily EMA of depression symptoms and PHQ-9 in psychiatric outpatients with depression (N = 80). Second, our study also included currently depressed patients with BD and BPD, allowing broader generalizability of the findings. Third, mood disorders were diagnosed with semi-structured clinical interviews by experienced psychiatrists.
Table 1. The daily means and standard deviations (SD) of the ecological momentary assessment (EMA) questions during a 12-day follow-up of patients with depression (n = 80). Explanations: The "mood" item on the 1-7 Likert scale ranged from "I am very happy" (1) to "I am very depressed" (7); the "hopelessness" item from "I am totally hopeless" (1) to "I am very hopeful" (7); the "anhedonia" item was "I am enthusiastic and feel pleasure" (1 = "not at all", 7 = "very much"); the "death" item ranged from "I want to die" (1) to "I want to live" (7); and the "energy" item from "I am totally exhausted" (1) to "I am very energetic" (7).
Our study also has limitations. First, not all depressive symptoms were included in our EMA items. Second, 11.5 % of the EMA data were missing and no imputation method was used. Third, we were unable to gather EMA data on depressive symptoms during the full two-week period that the PHQ-9 covers (we fell two days short). Fourth, we deliberately used mean values of the five daily assessments in the analyses, which limits our ability to unravel within-day fluctuations in symptom severity. Fifth, the EMA items were not validated against healthy controls. However, correlations between the EMA responses and PHQ-9 scores support the convergent validity of the EMA items. Sixth, effects of treatment were not investigated. Finally, all patients were diagnosed with an ongoing major depressive episode, limiting the range of their PHQ-9 scores and mood fluctuations.
The EMA of daily rated depressive symptoms allows examination of their temporal patterns. In our study, although varying over a 12-day period, autocorrelations of such important depressive symptoms as "wish to die", "hopelessness", "low energy", and "anhedonia" remained high or moderate. This is in contrast to depressive mood, which showed notable temporal variability in some patients, with the autocorrelations weakening with increasing time lag between assessments. This finding corresponds to previous EMA studies in patients with MDD (Thompson et al., 2012; Torous et al., 2015), which revealed instability of mood in some depressed patients. Fluctuating mood is probably even more pronounced in depressed patients with comorbid BPD (Trull et al., 2008), who were also included in our sample. In contrast, the EMA symptom "wish to die" appeared the most temporally stable, differing from a report describing rapid temporal fluctuation of suicidal ideation in depressed patients (Kleiman et al., 2017). Repeated answers from patients with no "wish to die" (low base rate), as well as conceptual differences between "wish to die" and suicidal thoughts, may underlie this discrepancy. Overall, although varying, the severity of the majority of core depressive symptoms appears moderately stable, apart from the mood symptom, which showed the most prominent fluctuation in some depressed patients over a 12-day period.
The correlations between the EMA of depressive symptoms (and their daily total scores) and the total scores of the PHQ-9 were stable and moderate throughout the 12-day period, agreeing with previous research (Burchert et al., 2021; McIntyre et al., 2021; Moukaddam et al., 2019). Moreover, we detected no associations between timing of assessment of depressive symptom severity and PHQ-9 scores. The relative temporal stability of correlations between the EMA of depressive symptoms and the PHQ-9 scores concurs with our observation of moderate temporal stability of the majority of depressive symptoms over the 12 days. The variability in severity of some of the depressive symptoms, detected in the EMA, does not appear sufficiently marked to bias patients' ability to recall their most frequent and dominant depressive symptoms over the two weeks when answering the PHQ-9, which inquires about the frequency of depressive symptoms over this period.
In conclusion, despite variability in the severity of some depressive symptoms and previously postulated negative memory bias, patients with depression are able to recollect their most dominant mood state over a two-week period when answering the PHQ-9.

Fig. 1. a) Autocorrelation of daily averages of experience sampling variables with the last day's value. b) Mood and death thoughts variables from panel (a) with whiskers for 95 % confidence intervals. c) The (cross-)correlation of experience sampling variables with the post-active PHQ-9 sum score. d) The values of panel (c) core symptoms are shown with 95 % confidence intervals. In panels (a) and (b), day -11 is the first day with EMA data and day 0 is the last day with EMA data. In panels (c) and (d), day 0 is the day when the PHQ-9 was answered and days -1, -2, …, -12 are the 12 days prior to the PHQ-9 on which the EMA was completed. Abbreviations: EMA, ecological momentary assessment; PHQ-9, Patient Health Questionnaire.