Discrepancies in self-reported and actigraphy-based sleep duration are associated with self-reported insomnia symptoms in community-dwelling older adults

Objectives: To establish agreement between self-reported and actigraphy-based total sleep time (TST), and to determine the impact of self-reported sleep problems on these measurements. Design: Cross-sectional study using data from Wave 3 of The Irish Longitudinal Study on Ageing (2014–2015). Participants: Community-dwelling older adults aged ≥ 50 years with self-reported sleep information and ≥ 4 days of actigraphy-based TST (n = 1520). Measurement: Self-reported total sleep time, daytime sleepiness and insomnia symptoms (trouble falling asleep, trouble waking too early) were measured during a structured self-interview. Actigraphy-based TST was collected using GENEactiv wrist-worn accelerometers. Demographic characteristics and health information were controlled for. Analyses included descriptive statistics and reliability and agreement analysis using paired t-tests, intra-class correlations and Bland-Altman analysis. Linear regression was used to model associations with measurement discrepancies. Results: Participants reported that they slept 7.0 hours (SD: 1.4, range: 2.0–13.0 hours) on average, compared to 7.7 hours (SD: 1.2 hours, range: 3.0–13.0 hours) recorded by accelerometry. Trouble falling asleep or waking too early “most of the time” were associated with under-reporting of sleep by 2.3 and 2.2 hours, respectively. Agreement between measurements had an intra-class correlation of 0.18 and wide 95% limits of agreement (−3.90 to 2.55 hours). Under-reporting of sleep was independently associated with insomnia symptoms. Conclusion: Agreement between self-reported and actigraphy-based TST in community-dwelling older adults was low. Self-reported insomnia symptoms were independently associated with under-reporting of sleep. Studies seeking to measure sleep duration should consider including questions on the experience of insomnia symptoms to account for their potential influence on measurements.


Introduction
Sleep is a restorative process and plays a vital role in the preservation of cognitive, physical and mental health. [1][2][3] Poor sleep quality and duration have been associated with adverse health outcomes such as cognitive impairment, cardiovascular disease, depression and mortality, many of which older adults are at increased risk of. 1,[4][5][6][7][8] Sleep can be measured both subjectively and objectively. Despite the importance of measuring sleep quality and duration, few studies have examined agreement between these types of measurement in large community samples. [9][10][11][12][13][14] Surveys typically use subjective means, such as sleep questionnaires or sleep diaries, to assess sleep. Laboratory-based polysomnography is considered the gold standard for sleep measurement, but it is expensive and impractical for use in large community-dwelling populations. 15 Activity-based monitoring using accelerometer devices has been shown to be an effective tool for sleep research, 16 and has become a feasible method for capturing objective sleep in large population studies. Wrist-worn devices are cost-effective and allow non-invasive, objective measurement of sleep in natural settings over long periods of time.
Sleep quality is a complex construct. Perception of sleep may not agree with measured sleep, potentially influenced by sleep complaints experienced by the individual. 9,[17][18][19] A number of studies using accelerometer devices have identified disagreement between subjective and objective measurements of sleep in older adults. [9][10][11][12][13][14] The CARDIA study showed subjective reports of sleep to be longer than actigraphy-measured sleep, increasing by 31 minutes for each additional hour of measured sleep. 13 Sleep complaints are prevalent in the older population. 20 It has been reported that 30–48% of older adults experience insomnia symptoms, the most common complaint being difficulty initiating or staying asleep. 21 Sleep complaints have been found to drive differences between objective and subjective measurements. Using wrist-worn accelerometer devices and the Pittsburgh Sleep Quality Index (PSQI) in a sample of adults aged 57–97 years, van den Berg et al. showed that adults reporting poorer sleep were more likely to underestimate their sleep duration relative to their accelerometer measurement, 11 while McCrae et al. reported that subjective and objective measurements matched only for those without sleep complaints. 14 Similarly, vulnerable older adults reporting sleep complaints were more likely to report their sleep more negatively than their objective measurements indicated. 12 Studies are advised to include both subjective and objective measurements of sleep. 9,22 Subjective measurement is common in population studies, but there is no consensus on which tool is optimal for this purpose. Subjective sleep duration is often captured by a single survey question such as "How many hours on average do you sleep per night?". A number of high-quality sleep scales have been validated, such as the PSQI (19 items), 23 the Epworth Sleepiness Scale (8 items) 24 and the Munich Chronotype Questionnaire (13 items). 25 These scales provide detailed data on different aspects of sleep, but they may be difficult to apply in studies with high participant burden or in clinical settings with limited time.
Large studies examining agreement between subjective and objective measurements in community-dwelling older adults are limited. 11,26 A number of large, population-representative studies of older adults have introduced accelerometry measurement, including The Irish Longitudinal Study on Ageing (TILDA), 27 the English Longitudinal Study of Ageing 28 and the Whitehall II study of British civil servants. 26 These studies provide unique opportunities to analyze sleep in rich, complex datasets. Using the Whitehall II cohort, van Hees et al. showed gender, depression and insomnia symptoms to be drivers of measurement disagreement. 26 As objective measurement becomes more common in large population studies, it is important to understand the concordance of self-reported and objective sleep measurements as typically collected in these studies. The majority of research in this area has used small samples with limited generalisability. 9,12,19,29 This study set out to contribute to the existing literature by assessing measurement agreement in the TILDA cohort of community-dwelling older adults using wrist-worn accelerometry and quick-to-administer survey questions on sleep duration and complaints. The nature of the study allowed an extensive list of known confounders of sleep measurement agreement to be accounted for in a large, population-derived sample of community-dwelling older adults.
This study aims to (1) establish the overall level of agreement between self-reported total sleep time and actigraphy-based total sleep time and (2) determine the impact of self-reported sleep problems on measurement agreement using three short sleep problem questions.

Methods
Data were drawn from TILDA, which is a nationally representative population study of community-dwelling adults aged 50 and over in the Republic of Ireland. The TILDA study design has been outlined previously. 27,30 Briefly, the sample was selected using the RANSAM sampling procedure. 31 Interviewers visited residential addresses drawn from the Irish National Geodirectory. Any adult aged 50 or older, and their spouse of any age, was invited to participate. In total, 8175 adults aged 50 or older took part in the first wave of data collection in TILDA (2009–2011), representing a response rate of 62%. Participants provided written, informed consent. Ethical approval was obtained from the Trinity College Dublin Faculty of Health Sciences Research Ethics Committee.
Through structured interviews, TILDA collects detailed information on the health, social and financial situation of each participant. Fig. 1 presents a detailed breakdown of the analysis sample for this study. This study analyzed data from Wave 3, collected during 2014 and 2015 (n = 6902). Where a participant could not complete the survey due to a cognitive or physical impairment, a proxy interview was offered. An end-of-life interview was administered to a close relative or friend where a participant had passed away. Only participants who were aged 50 years or older and had completed the self-interview were included (n = 6497). Wave 3 included a comprehensive health assessment, 32 carried out by trained research nurses and offered to participants who completed the structured interview. A total of 190 GENEactiv wrist-worn accelerometer devices were also available during Wave 3, facilitating an accelerometry study in a sub-sample of Wave 3 health assessment participants. A randomly selected group of participants were asked to wear an accelerometer device for seven consecutive days immediately following their health assessment (n = 1578). 33 Previous analyses reported that four days of measurement were sufficient to obtain a reliable estimate of total sleep time in this cohort. 34 Because of technical faults with the device, or because a participant did not wear the device for the full seven days, 45 devices were returned with fewer than four days of data captured. A further 13 participants with incomplete self-reported sleep data were excluded from the analysis. The final analysis sample consisted of 1520 participants (1578 − 45 − 13).

Accelerometer devices
Devices used were wrist-worn GENEactiv devices. These devices have a measurement range of ±8 g, with a maximum logging period of 7 days at 100 Hz. The device is lightweight and water resistant, and has a body-temperature sensor that can confirm wear and non-wear periods. Data were processed using a fully automated classification algorithm for Micro-Electro-Mechanical Systems (MEMS) accelerometer data, devised for epidemiological studies. 34,35

Measures

Total sleep time
Actigraphy-based total sleep time (TSTobj) was defined as the average total sleep time over the period of device wear, as measured by the accelerometer device. 34,35 Self-reported total sleep time (TSTsub) was derived from the interview question "Approximately how many hours do you sleep on a weeknight?". Participants were asked to round to the nearest hour when self-reporting their average sleep; for comparability, TSTobj was also rounded to the nearest hour. As previous analyses have shown no difference between weekday and weekend recordings in this sample, measured total sleep time was derived using all available days of recording. 34

Sleep reporting difference
A sleep reporting difference score was calculated as TSTsub − TSTobj. A positive value indicated over-reporting of sleep relative to objective measurement, and a negative value indicated under-reporting.
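The derivation of the two TST measures and the difference score can be sketched as below. This is a minimal illustration, not the TILDA processing pipeline: it assumes nightly actigraphy TST values are already available in hours, treats the four-day inclusion threshold as a count of valid nights, and rounds exact halves up (Python's built-in `round` rounds halves to even, so 6.5 would become 6). The names `tst_obj`, `round_half_up`, `sleep_difference` and `reporting_direction` are hypothetical.

```python
import math
from statistics import fmean
from typing import Optional, Sequence

# Assumption: the ">= 4 days" inclusion rule is applied as ">= 4 valid nights".
MIN_VALID_NIGHTS = 4


def tst_obj(nightly_hours: Sequence[float]) -> Optional[float]:
    """Average actigraphy TST over all recorded nights (weekdays and weekends
    pooled); None when too few nights were captured for inclusion."""
    if len(nightly_hours) < MIN_VALID_NIGHTS:
        return None
    return fmean(nightly_hours)


def round_half_up(hours: float) -> int:
    """Round to the nearest whole hour, with exact halves rounded up (assumed)."""
    return math.floor(hours + 0.5)


def sleep_difference(tst_sub: int, tst_obj_hours: float, round_obj: bool = True) -> float:
    """TSTsub - TSTobj. TSTobj is rounded for the descriptive comparisons;
    pass round_obj=False for the unrounded version used in Bland-Altman analysis."""
    obj = round_half_up(tst_obj_hours) if round_obj else tst_obj_hours
    return tst_sub - obj


def reporting_direction(diff: float) -> str:
    """Positive = over-reporting, negative = under-reporting."""
    if diff > 0:
        return "over-reporting"
    if diff < 0:
        return "under-reporting"
    return "agreement"
```

For example, a participant reporting 7 hours with nightly recordings of 7.5, 8.0, 7.5 and 8.0 hours has TSTobj = 7.75 hours, which rounds to 8, giving a difference score of −1 (under-reporting); the unrounded difference is −0.75 hours.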

Daytime sleepiness
Participants rated how likely they were to doze off or fall asleep during the day on a four-point Likert scale: 0, would never doze; 1, slight chance of dozing; 2, moderate chance of dozing; 3, high chance of dozing.

Insomnia symptoms
Participants were asked two additional questions, drawn from the Jenkins Sleep Scale, relating to insomnia symptoms. 36 These asked how often the participant has trouble falling asleep, and how often they have trouble waking up too early and not being able to fall asleep again. These were answered on a three-point Likert scale: 0, rarely or never; 1, sometimes; 2, most of the time.

Covariates
Variables known to be related to sleep duration were included as potential confounders. Socio-demographic information including age, sex, education level (primary/none, secondary level, third level or higher), marital status (married, not married, separated/divorced, widowed), area of residence (urban, rural) and employment status (employed, retired/not employed) were collected during the interview. The season of recording for measured sleep was also included.
Cognitive status was assessed using the Mini-Mental State Examination (MMSE). 37 Health measures included self-rated health (excellent/very good/good, fair/poor), smoking status (smoker, non-smoker), self-reported pain, and depressive symptoms, defined as a score of nine or higher on the eight-item Center for Epidemiologic Studies Depression Scale. 38 Height and weight were measured during the health assessment. BMI was defined as weight divided by height squared (kg/m²) and categorised as underweight/normal weight (0–24.9), overweight (25.0–29.9) or obese (≥30.0).
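The BMI categories and the depressive-symptom flag described above reduce to simple threshold rules. The sketch below assumes weight in kilograms and height in metres, and closes the gaps between the published bands (e.g. between 24.9 and 25.0) by using strict `<` comparisons; the function and constant names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared (kg/m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Bands used in this study; strict '<' cut-points leave no gaps between bands."""
    if value < 25.0:
        return "underweight/normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


# Cut-off on the eight-item CES-D, as stated in the text.
CESD8_CUTOFF = 9


def has_depressive_symptoms(cesd8_score: int) -> bool:
    """True when the CES-D 8 score is at or above the cut-off."""
    return cesd8_score >= CESD8_CUTOFF
```

For instance, 80 kg at 1.75 m gives a BMI of about 26.1 kg/m², which falls in the overweight band.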
Participants were asked whether they had ever been diagnosed with any chronic or cardiovascular conditions. Chronic conditions included lung disease, asthma, arthritis, osteoporosis, cancer, Parkinson's disease, stomach ulcer, varicose ulcer, liver disease, thyroid disease and kidney disease. Cardiovascular conditions included hypertension, stroke, angina, heart attack, heart murmur, atrial fibrillation and other abnormal heart rhythms. Participants were classified as having none, or one or more, chronic conditions, and none, or one or more, cardiovascular conditions. Detailed information on medication use was also collected, and medications were classified using Anatomical Therapeutic Chemical (ATC) classification codes. Use of medications that affect sleep was included as a covariate. Antidepressant medication was classified by ATC code N06A. Sleep medication included ATC codes N05A (antipsychotics), N05B (anxiolytics), N05C (hypnotics and sedatives) and R06A (antihistamines for systemic use). Antihypertensive medication was classified by ATC codes C02 (antiadrenergic agents), C03 (diuretics), C07 (β-blockers), C08 (calcium-channel blockers) and C09 (agents acting on the renin–angiotensin system, including angiotensin-converting enzyme inhibitors).
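Because the ATC system is hierarchical, each medication class above can be identified by prefix matching on full ATC codes. The sketch assumes each participant's medications are recorded as a list of full ATC codes (e.g. a seven-character code such as N06AB06); `MEDICATION_CLASSES` and `medication_flags` are hypothetical names, not part of TILDA's data release.

```python
from typing import Dict, Iterable, Tuple

# ATC prefixes as listed in the Methods; a code belongs to a class if it starts
# with any of that class's prefixes.
MEDICATION_CLASSES: Dict[str, Tuple[str, ...]] = {
    "antidepressant": ("N06A",),
    "sleep_medication": ("N05A", "N05B", "N05C", "R06A"),
    "antihypertensive": ("C02", "C03", "C07", "C08", "C09"),
}


def medication_flags(atc_codes: Iterable[str]) -> Dict[str, bool]:
    """Return one True/False covariate per medication class for a participant."""
    codes = [code.strip().upper() for code in atc_codes]
    return {
        # str.startswith accepts a tuple and matches any of its prefixes
        name: any(code.startswith(prefixes) for code in codes)
        for name, prefixes in MEDICATION_CLASSES.items()
    }
```

A participant with no recorded medications receives False for every class, which matches treating non-use as the reference category.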

Statistical analysis
Analyses were performed using Stata version 15.1 (StataCorp, College Station, TX). Descriptive statistics for sleep parameters and participant characteristics were produced. Differences between TSTsub and TSTobj are presented by sample characteristics to examine where discrepancies were most pronounced.
Measurement agreement between TSTsub and TSTobj was examined using intra-class correlation coefficients (ICCs), estimated from two-way mixed-effects models, and Bland-Altman analysis. 39,40 Agreement was calculated overall and by category of each sleep problem.
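Stata's `icc` command fits these models; the dependency-free sketch below only illustrates the ANOVA arithmetic behind a single-measure ICC for two "raters" (TSTsub and TSTobj) from a two-way model. The text does not state whether the consistency or absolute-agreement form was used, so both are shown; `icc_two_way_mixed` is a hypothetical name. The absolute-agreement form additionally penalises a systematic offset between methods, which is relevant here because TSTsub averages about 0.7 hours less than TSTobj.

```python
from statistics import fmean
from typing import Sequence


def icc_two_way_mixed(sub: Sequence[float], obj: Sequence[float], absolute: bool = False) -> float:
    """Single-measure ICC from a two-way ANOVA on n participants x 2 methods.

    absolute=False -> consistency, ICC(3,1)
    absolute=True  -> absolute agreement, ICC(A,1)
    """
    n, k = len(sub), 2
    if n != len(obj) or n < 2:
        raise ValueError("need two equal-length series with n >= 2")
    rows = list(zip(sub, obj))
    grand = fmean(list(sub) + list(obj))
    row_means = [fmean(r) for r in rows]
    col_means = [fmean(sub), fmean(obj)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between methods
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error
    if absolute:
        denom += k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denom
```

With a constant one-hour offset between methods (e.g. 6, 7, 8, 9 versus 7, 8, 9, 10 hours), the consistency ICC is 1.0 while the absolute-agreement ICC drops to 10/13 ≈ 0.77.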
The Bland-Altman analysis used TSTobj measurements in their original, unrounded form. Linear regression models were used to analyze independent associations between TSTsub − TSTobj difference scores and sleep problems (daytime sleepiness, trouble falling asleep and trouble waking up too early). Basic models were adjusted for age and sex. Full models were additionally adjusted for covariates associated with large discrepancies between TSTsub and TSTobj. Full model outputs are shown in Supplementary Tables A3–A5.
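The basic regression models can be illustrated with a small ordinary least squares routine that dummy-codes a three-level sleep problem item against its "rarely or never" reference and adjusts for age and sex. This is a dependency-free sketch that solves the normal equations, not the Stata models used in the study; the function names and the synthetic data in the check below are hypothetical.

```python
from typing import List, Sequence, Tuple

LEVELS = ("rarely or never", "sometimes", "most of the time")


def dummy_code(values: Sequence[str], levels: Sequence[str] = LEVELS,
               reference: str = "rarely or never") -> List[List[float]]:
    """One 0/1 column per non-reference level, in the order given by `levels`."""
    return [[1.0 if v == lvl else 0.0 for lvl in levels if lvl != reference] for v in values]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


def ols(x_rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Fit y = b0 + b.x by least squares; return ([b0, b1, ...], R^2)."""
    design = [[1.0] + list(r) for r in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot
```

Each dummy coefficient is then read as the adjusted mean difference score relative to "rarely or never", so a negative value indicates greater under-reporting; the returned R² corresponds to the variance-explained figures discussed in the Results.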
Significance was set at p<0.05. All tests were two-tailed.

Interview–health assessment time delay
There was a delay between the structured interview and the health assessment, with a median of 56 days (IQR: 23–109 days) (data not shown). Supplementary analyses were conducted to assess whether this delay contributed to measurement discrepancies.

Sample characteristics
Characteristics of the sample are presented in Table 1. Of the sample, 53.6% were female. The sample had a mean age of 67.5 years (SD 9.1, range 50–94 years), and 55.8% lived in an urban area. The majority of the sample were married (72.4%), reported that their health was excellent/very good/good (84.1%) and were retired or not working (73.4%). Over half the sample reported one or more chronic conditions (56.1%) and 48.1% reported one or more cardiovascular conditions. Just 8.4% reported use of sleep medication, while 9.3% reported use of antidepressants and 46.3% reported use of antihypertensive medication. The cohort were predominantly overweight (44.9%) or obese (31.8%). Depressive symptoms were reported by 9.0%, and 35.7% reported the presence of pain. Only 10.8% of the sample were current smokers. The mean MMSE score was 28.7 (SD = 1.7, range: 16–30). Table 2 summarises TSTsub and TSTobj by sample characteristics. On average, participants had a TSTsub of 7.0 hours (SD: 1.4, range: 2.0–13.0 hours) compared to a TSTobj of 7.7 hours (SD: 1.2 hours, range: 3.0–13.0 hours). The magnitude of the TSTsub − TSTobj difference varied by demographic characteristics. Female participants had a sleep difference score of −0.8 hours, compared with −0.6 hours in male participants. Those living in an urban area had a difference of −0.8 hours compared to −0.5 hours for those living in a rural area. The difference was −0.7 hours in retired/not employed participants compared to −0.5 hours in employed participants. One of the largest discrepancies between TSTsub and TSTobj was found in participants who were separated/divorced (−1.1 hours).

Total sleep time
Differences in health status were also associated with the TSTsub − TSTobj difference. Those who reported fair/poor self-rated health had a significantly larger difference than those who reported excellent/very good/good self-rated health (−1.1 vs. −0.6 hours). Similarly, participants who used sleep medication had a difference score of −0.9 hours, compared to −0.6 hours for those who were not taking sleep medication. The largest discrepancy was found in those who reported depressive symptoms, who had a difference of −1.7 hours, compared to −0.6 hours in those who did not report depressive symptoms. Fig. 2 presents a Bland-Altman plot of TSTsub and TSTobj. The mean difference (bias) line lies below zero (mean difference = −0.68 hours), suggesting a systematic bias toward under-reporting of sleep relative to actigraphy-based sleep. The 95% limits of agreement ranged from −3.90 to 2.55 hours.
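The Bland-Altman quantities above follow from the paired differences: the bias is their mean, and the 95% limits of agreement are the bias ± 1.96 × SD of the differences. As a consistency check on the reported figures, the midpoint of the limits, (−3.90 + 2.55)/2 = −0.675 hours, matches the reported mean difference of −0.68 hours, and the width of 6.45 hours implies an SD of the differences of about 6.45/3.92 ≈ 1.65 hours. The sketch below is a minimal illustration; `bland_altman` is a hypothetical name, and it expects unrounded TSTobj values, as in the Methods.

```python
from statistics import fmean, stdev
from typing import List, Sequence, Tuple


def bland_altman(sub: Sequence[float], obj: Sequence[float], z: float = 1.96
                 ) -> Tuple[List[float], List[float], float, Tuple[float, float]]:
    """Return (means, differences, bias, (lower LoA, upper LoA)).

    Differences are TSTsub - TSTobj, so a negative bias means under-reporting.
    Limits of agreement are bias +/- z * SD of the differences.
    """
    if len(sub) != len(obj) or len(sub) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [s - o for s, o in zip(sub, obj)]
    means = [(s + o) / 2 for s, o in zip(sub, obj)]  # x-axis of the plot
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return means, diffs, bias, (bias - z * sd, bias + z * sd)
```

Plotting `diffs` against `means`, with horizontal lines at the bias and both limits, reproduces the layout of a standard Bland-Altman plot.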
Supplementary Table A.1 presents mean TSTsub − TSTobj differences by quantile of interview–health assessment time delay. Those with the shortest delay between structured interview and health assessment (0–20 days) had a mean TSTsub − TSTobj difference of −0.73 hours (SD = 1.74), compared to −0.70 hours (SD = 1.69) in those with delays of 77–127 days and −0.52 hours (SD = 1.60) in those with delays of 128–421 days. A scatterplot showed no clear relationship between the interview–health assessment time delay and the TSTsub − TSTobj difference (Supplementary Fig. A.1). This was further assessed in Supplementary Table A.2, which presents a simple linear regression of the TSTsub − TSTobj difference on the (log) interview–health assessment time delay. The association was not significant [B = 0.04, 95% CI: −0.04, 0.12].
Scatterplots depicting average TSTsub − TSTobj differences by sleep problem component are shown in Fig. 3. Significant negative Spearman rank correlations were found between self-reported sleep and all sleep problem components, suggesting that sleep difference scores became more negative, representing greater under-reporting of sleep, as sleep problems increased. Correlations were strongest for trouble waking too early (Rs = −0.454, p < 0.001) and trouble falling asleep (Rs = −0.345, p < 0.001). Only daytime sleepiness was significantly correlated with actigraphy-based sleep, and this correlation was weak (Rs = −0.082, p < 0.01).
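Because each sleep problem item has only three or four levels, the data are heavily tied, and Spearman's coefficient is then computed as the Pearson correlation of average ranks. The dependency-free sketch below illustrates that calculation only; it does not compute the p-values reported above, and `average_ranks` and `spearman_rho` are hypothetical names.

```python
from statistics import fmean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `x` would hold the Likert codes of one sleep problem item and `y` the corresponding sleep measure or difference score for each participant.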
The ICC between TSTsub and TSTobj was 0.18 (Table 3). The highest ICCs were observed in those who reported trouble waking too early "rarely or never" (ICC = 0.29) and trouble falling asleep "rarely or never" (ICC = 0.26). The lowest were observed in participants who reported trouble falling asleep or trouble waking up too early "most of the time", with ICCs of 0.03 and 0.02, respectively. Fig. 4 depicts mean TSTsub and TSTobj by sleep problem measures. Sleep difference scores were similar across levels of daytime sleepiness: those reporting that they would never doze and those reporting a high chance of dozing both had a TSTsub − TSTobj difference of −0.8 hours (Fig. 4A). TSTsub − TSTobj differences were smallest in those who reported "rarely or never" having trouble falling asleep (−0.3 hours) (Fig. 4B) or trouble waking too early (−0.2 hours) (Fig. 4C), and increased in magnitude as the frequency of either problem increased. Participants who stated that they had trouble falling asleep "most of the time" had a difference of −2.3 hours, and those who stated they had trouble waking too early "most of the time" had a difference of −2.2 hours.

Self-reported and actigraphy-based sleep differences
Regression models predicting sleep difference scores for individual sleep problem components are shown in Fig. 5. Basic models and fully adjusted models are displayed. Positive coefficients represent over-reporting of sleep; negative coefficients represent under-reporting of sleep.
In the fully adjusted models, only a slight chance of daytime dozing, compared with no chance, was significantly associated with sleep difference scores, and this association was positive (B = 0.31, 95% CI: 0.08, 0.51, p < 0.01) (Fig. 5A).
Compared with participants who stated they "rarely or never" had trouble falling asleep or waking up too early, those who reported having trouble either "sometimes" or "most of the time" had significant negative associations with sleep difference scores. The effects were attenuated in fully adjusted models but remained significant. The strongest associations were for trouble falling asleep "most of the time" compared with "rarely or never" (B = −1.80, 95% CI: −2.10, −1.51, p < 0.001) (Fig. 5B), and for trouble waking up too early "most of the time" compared with "rarely or never" (B = −1.86, 95% CI: −2.10, −1.63, p < 0.001) (Fig. 5C).
The R² value for the basic daytime sleepiness model was just 0.01, suggesting that although significant associations were found, daytime sleepiness explains little of the variance (1%) in sleep difference scores (Supplementary Table A3). In contrast, the R² value for the basic model using trouble falling asleep to predict sleep difference scores was 0.18 (Supplementary Table A5). This increased only slightly, to 0.21, when all covariates were included in the fully adjusted model.

Discussion
This study sought to assess measurement agreement between self-reported total sleep time and actigraphy-based total sleep time using data from TILDA. Overall agreement between measurements was low, with apparent systematic under-reporting of sleep relative to actigraphy-based sleep. Average results from both measurements were within one hour of each other, but participants under-reported their sleep by 0.7 hours on average compared to actigraphy-based sleep. Differences were more pronounced in participants reporting poorer health, such as fair/poor self-rated health and depressive symptoms, with discrepancies of greater than one hour between measurements.
Another aim of this study was to determine the impact of subjective sleep problems on measurement agreement using three short sleep problem questions. Sleep problems were prevalent: over one-third of participants experienced insomnia symptoms, in accordance with previous literature. 21 Agreement between measurements was lower in those experiencing insomnia symptoms, and the magnitude of sleep under-reporting increased with the frequency of these symptoms. In particular, those who experienced trouble falling asleep or waking too early "most of the time" under-reported their sleep by over two hours on average. This is consistent with other findings showing that poor self-reported sleep quality was associated with shorter reported TST when compared to measured sleep. 11,12,14 Daytime sleepiness did not show the same effect on sleep reporting in this sample. Measurement differences were similar across all categories of daytime sleepiness, which may suggest that daytime sleepiness does not drive perceptions of sleep to the same extent as insomnia symptoms. The TILDA data enabled extensive adjustment for relevant confounders when assessing these symptoms as drivers of the discrepancies.
Of the participant characteristics examined, depressive symptoms were associated with the largest sleep reporting differences. To further understand this relationship, supplementary analyses assessing the prevalence of depressive symptoms and antidepressant use by sleep problems were conducted. Depressive symptoms were more prevalent in participants reporting greater sleep problems (Supplementary Table A6). Similarly, use of antidepressant medication was more commonly reported by those reporting more sleep problems (Supplementary Table A7). This is consistent with previous findings on the association between insomnia and depressive symptoms. 2,6 Accelerometer devices may classify a period of sedentary, motionless behaviour as sleep; however, they are still considered an effective method for sleep research. 16 We have shown that in a sample of older adults, actigraphy-based measurements were on average longer than self-reported sleep.
Under-reporting was most pronounced in participants who self-reported insomnia symptoms. It has been suggested that discrepancies of this kind in people with insomnia may reflect poor sleep quality that is not captured by accelerometer devices, rather than misperception. 11 It is unclear whether the differences observed here reflect under-reporting of sleep actually obtained, or longer actigraphy recordings arising from motionless periods of wakefulness associated with trouble falling asleep or waking too early. The day-to-day variability in sleep duration experienced by those with insomnia complaints creates challenges for both objective measurement and single-item survey questions. With consideration to the high prevalence of insomnia in the older population, 21    methods used in sleep quality assessment given the complex nature of sleep. 9 The PSQI has commonly been used to measure sleep quality in research of this kind, [9][10][11][12]19,28 but the length of this tool may be prohibitive in certain settings. Our findings demonstrate that short, quick-to-administer questions assessing insomnia symptoms may be effective complementary questions to account for potential bias when evaluating sleep duration. These results add to the current literature and confirm similar findings in a large, population-derived cohort of older adults. These findings also have implications for clinical practice. Those with adverse health characteristics were shown to be most at risk of discrepancies, and measurement in these groups should emphasise objective measurement using wearable devices such as Fitbit or similar.

Strengths and limitations
This study has a number of strengths. It is one of the largest studies of its kind to date and uses a population-derived, community-dwelling cohort of older adults. The design of TILDA allowed agreement of sleep measurements in older adults to be assessed while accounting for a comprehensive array of confounding factors.
Limitations are also present. The accelerometer measurements are from a sub-sample of the TILDA study and are not population representative; however, this sub-sample has been shown to be characteristically similar to the full cohort. 34 Participants were asked to round their reported sleep to the nearest hour, reducing the precision of this measurement. Self-reported total sleep time was collected during the self-interview, while accelerometer measurements were taken directly after the health assessment. As a consequence of the TILDA study design, there was a delay between the structured interview and the health assessment. This created some uncertainty as to whether reported sleep reflected the same sleep patterns experienced during the accelerometer recording period. We conducted supplementary analyses to assess whether this delay contributed to measurement discrepancies and did not find clear evidence of an effect resulting from the lag. Nevertheless, a lag between assessments is a potential limitation in studies of variable characteristics such as sleep duration, and studies of a similar nature should be conscious of its potential contribution to discrepancies.

Conclusions
In summary, agreement between self-reported and actigraphy-based sleep duration was low in this sample of community-dwelling older adults. Health practitioners should be aware of the prevalence of under-reporting of sleep duration, particularly as those with adverse health characteristics were shown to be most at risk of discrepancies. Insomnia symptoms were associated with measurement discrepancies, and this effect could be gauged using quick-to-administer questions. Studies which seek to use either form of measurement should consider including similar questions as a means of establishing whether self-reporting or objective measurement may have been influenced. Future work will investigate longitudinal differences in measurements and incident insomnia symptoms to better understand the directionality of the relationship.

Declaration of conflict of interest
The authors declare no conflict of interest.