Psychometric and Structural Validity of the Pittsburgh Sleep Quality Index among Filipino Domestic Workers

Objectives: To evaluate the psychometric properties and structural validity of the Filipino version of the Pittsburgh Sleep Quality Index (PSQI) among Filipino domestic workers (FDWs). Methods: In Study 1, 131 FDWs completed the PSQI and other scales, along with a 10-day actigraphic assessment and an accompanying electronic daily sleep diary. A subsample of 61 participants completed a follow-up assessment after 10 days. In Study 2, 1363 FDWs were recruited and randomly divided into two halves. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted on the first and second halves, respectively. Results: In Study 1, the Cronbach's alpha of the PSQI was 0.63 at baseline and 0.67 at follow-up. Test-retest reliability for the PSQI global score based on the intraclass correlation was 0.63. Convergent validity was supported by the significant associations between the PSQI global score, PSQI component scores, sleep patterns from the daily sleep diary, and measures of depression, anxiety, and rumination. Small correlations between the PSQI global score and measures of daytime sleepiness, social support, and self-reported height supported discriminant validity. In Study 2, EFA yielded two PSQI factors with acceptable factor loadings. CFA established that this two-factor model, comprising perceived sleep quality and sleep efficiency, showed better model fit than the alternative models tested. The Cronbach's alpha values of the two factors were 0.70 and 0.81, respectively. Conclusions: The PSQI demonstrated good internal consistency for the two factors and good convergent and discriminant validity. The results can be referenced in future studies to measure and screen for sleep dysfunction among clinical and non-clinical populations in the Philippines.


Background
The healthcare-related burden of impaired sleep is enormous. Studies increasingly link inadequate sleep and sleep disorders such as insomnia to increased risk of depression and other mood disorders [1][2][3], as well as increased fatigue [4], reduced psychomotor performance [5], poor memory consolidation [6], and substantial workplace costs due to underperformance and absenteeism [7].
Migrant workers are likely to experience increased risk of poor sleep and consequent poor health. Migrant workers, especially domestic workers, may be exposed to sleep deprivation due to on-call PSQI once again after 10 days. The data for this study were part of a larger study that used actigraphy to determine the burden of sleep dysfunction and related correlates, along with several embedded validation studies [11,19,21] and a pilot study for the larger planned respondent-driven sampling (RDS) project [20][31][32][33].
In Study 2, data from 1363 FDWs were obtained from an RDS project conducted in Macao (SAR) from November 2016 to November 2017.
The studies were approved by the ethics committee of the University of Macau. The research process and objectives were explained to the participants before informed consent was obtained.

Measures
In Study 1, the Filipino versions of the PSQI and Epworth Sleepiness Scale (ESS) were provided by the Mapi Research Trust (https://eprovide.mapi-trust.org). Official translated versions of the PHQ-9 and GAD-7 were obtained from Pfizer [34]. The Ruminative Response Scale (RRS) and Multi-Dimensional Scale of Perceived Social Support (MSPSS) were translated into Filipino following standard forward and backward translation guidelines, including cognitive interviews and pilot testing [35]. Actigraphy and daily sleep diaries were used in Study 1. In Study 2, only PSQI questionnaire data and demographic information were used.

• Objective Sleep
The Actiwatch-2 (Philips Respironics, Bend, OR, USA) is a widely used wrist-worn sleep-monitoring device, validated against polysomnography (PSG), and used to monitor sleep patterns and individual sleep quality [36]. All participants wore the actigraph on the wrist of their non-dominant hand for 10 continuous days with a 30-second epoch length. We used data from only eight nights, removing weekend nights, which reflect different sleep patterns. The following outcome variables were generated: total sleep time (TST); sleep onset latency (SL); sleep efficiency (SE); wake after sleep onset (WASO); number of wake bouts (WB); and fragmentation index (FI), which indicates the degree of sleep fragmentation (detailed in Table 2).

• Daily Sleep Diary
This consisted of self-reported TST, bedtime and wake time, SL (assessed on a 5-point ordinal item ranging from 'less than 15 minutes' to 'more than 120 minutes'), sleep quality (SQ) (assessed on a 5-point ordinal item ranging from 'very good' to 'very bad'), time in bed (TIB; the total time spent in bed), and SE (detailed in Table 2). The diary records of bedtime and wake time were also used to clean the sleep logs from the Actiwatch-2. The daily sleep diary was completed via an online survey sent by short message service twice per day (morning and evening).

• Depressive Symptoms
The Patient Health Questionnaire with nine items (PHQ-9) is a self-report screening measure used to assess depressive symptoms occurring in the past two weeks. Each item is rated from 0 (not at all) to 3 (nearly every day). Higher total scores indicate greater depressive symptom severity [37]. The Filipino version of the PHQ-9 was used in a previous study among FDWs in Macao and showed good internal consistency (Cronbach's alpha = 0.79) [38] and validity [19,39]. The Cronbach's alpha in the present study was 0.78, indicating good internal consistency reliability.

• Anxiety
The Generalized Anxiety Disorder scale with seven items (GAD-7) was used to measure anxiety symptoms [40]. Each item is rated from 0 (not at all) to 3 (nearly every day), yielding an anxiety symptom severity score from 0 to 21. The Filipino version of the GAD-7 was used in a previous study among FDWs in Macao and showed good internal consistency (Cronbach's alpha = 0.80) [38] and validity [19]. The Cronbach's alpha in the present study was 0.82, indicating good internal reliability.

• Epworth Sleepiness Scale
The Epworth Sleepiness Scale (ESS) is an 8-item self-report questionnaire that measures daytime sleepiness in adults [41,42]. Items are rated from 0 ('never') to 3 ('high chance') to reflect respondents' probability of falling asleep in eight different situations (e.g., while sitting or reading, watching television, and driving). The total ESS score ranges from 0 to 24, with higher scores indicating greater daytime sleepiness [41]. The Cronbach's alpha in the present study was 0.82, indicating good internal reliability.

• Rumination
The short version of the Ruminative Response Scale (RRS) describes rumination that is self-focused, symptom-focused, and focused on the possible causes and consequences of dysphoric mood [43]. Each of the 10 items is rated on a Likert scale ranging from 1 (almost never) to 4 (almost always). The total score ranges from 10 to 40. Higher total scores reflect greater self-reported rumination. In the present study we omitted the item 'write down what you are thinking about and analyze it' based on community feedback during the translation and cultural adaptation process, as migrant workers considered it atypical for them. The Cronbach's alpha of the RRS in the present study was 0.93, indicating excellent internal reliability.

• Perceived Social Support
The Multi-Dimensional Scale of Perceived Social Support (MSPSS) is a 12-item scale to assess perceived social support [44]. This measure consists of three subscales that examine perceived support from family (four items), friends (four items) and a significant other (four items). Respondents answer on a 7-point scale, from 1 (very strongly disagree) to 7 (very strongly agree). The Cronbach's alpha in the present study was 0.89, indicating good internal reliability.
Participant characteristics included self-reported age, years working as a domestic worker in Macao, marital status, education level, type of visa, Cantonese fluency (speaking and understanding), monthly salary, weekly working hours, number of days off per month, and residence (i.e., living in or outside of the employer's house).
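Because Table 2 is not reproduced here, the following is only a minimal sketch of how diary-based TIB and SE could be derived from bedtime, wake time, and TST. It assumes the conventional definition SE = TST / TIB × 100, which the text above does not state explicitly; the function names and the example entry are hypothetical.

```python
from datetime import datetime, timedelta


def time_in_bed_minutes(bedtime: str, wake_time: str) -> float:
    """Minutes between bedtime and wake time, both given as HH:MM (24-hour clock).

    A wake time that is not later than the bedtime is assumed to fall on the next day.
    """
    fmt = "%H:%M"
    start = datetime.strptime(bedtime, fmt)
    end = datetime.strptime(wake_time, fmt)
    if end <= start:
        end += timedelta(days=1)
    return (end - start).total_seconds() / 60.0


def sleep_efficiency(tst_minutes: float, tib_minutes: float) -> float:
    """Sleep efficiency in percent, assuming SE = TST / TIB * 100."""
    if tib_minutes <= 0:
        raise ValueError("time in bed must be positive")
    return 100.0 * tst_minutes / tib_minutes


# Hypothetical diary entry: in bed 23:30 to 06:00, 351 minutes asleep.
tib = time_in_bed_minutes("23:30", "06:00")
se = sleep_efficiency(351, tib)
print(f"TIB = {tib:.0f} min, SE = {se:.1f}%")
```

The midnight rollover is the only non-obvious step: most diary entries have a bedtime on one calendar day and a wake time on the next.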

Study 1
We computed descriptive statistics for participants' demographic information. All variables were checked for normality. Pearson correlations were computed for relationships between normally distributed variables, and Spearman's rho was used for relationships between non-normally distributed variables. Item-level missing PSQI data were observed for 3 participants and were handled using listwise deletion, given that less than 5% missingness was observed in the sample [45].
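The missing-data rule described above can be sketched as follows. This is an illustration only: the record layout, the function name, and the toy data (131 records, 3 of them incomplete) are assumptions made for the example, not the study's dataset or code.

```python
def listwise_delete(records, max_missing_share=0.05):
    """Drop every record with a missing (None) value.

    Raises if the share of incomplete records reaches the threshold,
    because listwise deletion is only justified here for low missingness.
    """
    incomplete = [r for r in records if any(v is None for v in r.values())]
    share = len(incomplete) / len(records)
    if share >= max_missing_share:
        raise ValueError(f"missingness {share:.1%} is too high for listwise deletion")
    complete = [r for r in records if all(v is not None for v in r.values())]
    return complete, share


# Hypothetical data: 131 records with two PSQI components each, 3 incomplete.
records = [{"c1": 1, "c2": 2} for _ in range(128)] + [{"c1": None, "c2": 1}] * 3
complete, share = listwise_delete(records)
print(len(complete), f"{share:.2%}")
```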

Reliability Testing
Internal consistency reliability was assessed using Cronbach's alpha for the seven PSQI component scores. Values over 0.60 are considered acceptable [46]. Item-to-total correlations (ITC) were calculated to assess the internal homogeneity of the scale. Each PSQI component score was treated as one separate item. ITC values higher than 0.30 are acceptable [47]. Test-retest reliability was assessed with the intraclass correlation coefficient (ICC), using baseline PSQI global and component scores and the paired 10-day retest scores. A nonparametric bootstrap was used to obtain the 95% confidence interval (CI) of the ICC. ICC values higher than 0.75 are considered strong, values from 0.40 to 0.75 moderate, and values less than 0.40 poor [48].
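As a rough illustration of the internal consistency statistics named above, the sketch below computes Cronbach's alpha, alpha-if-item-deleted, and uncorrected item-to-total correlations from component scores. It is not the authors' code: the data are hypothetical, and Pearson's r is used for the item-to-total correlations as a simplifying assumption (Spearman's rho may be preferable for ordinal component scores).

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per component."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(items):
    """Correlation of each component with the (uncorrected) total score."""
    totals = [sum(row) for row in zip(*items)]
    return [pearson(col, totals) for col in items]


def alpha_if_deleted(items):
    """Alpha recomputed with each component left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores (0 to 3) for four components and six respondents.
demo = [
    [0, 1, 2, 3, 1, 2],
    [1, 1, 2, 3, 0, 2],
    [0, 2, 1, 3, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
print(round(cronbach_alpha(demo), 2))
print([round(r, 2) for r in item_total_correlations(demo)])
print([round(a, 2) for a in alpha_if_deleted(demo)])
```

A component whose removal raises alpha is a candidate for contributing less to the scale's homogeneity.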

Validity Testing
Convergent validity refers to associations between two measures that are theoretically related. This was tested with correlations between the PSQI global score and the PHQ-9, GAD-7, and RRS. Based on previous literature, we hypothesized that: (a) greater depressive symptom severity would correlate with worse sleep dysfunction [49]; (b) greater anxiety symptom severity would correlate with worse sleep dysfunction [50]; and (c) greater levels of rumination would correlate with worse sleep dysfunction [51]. Convergent validity was also examined through the associations between the follow-up PSQI global and component scores and averaged daily sleep parameters from the Actiwatch-2 and the sleep diary, separately. We hypothesized that TST, SL, and SE from the Actiwatch-2 and the daily sleep diary would be significantly associated with the PSQI components 'sleep duration', 'sleep latency', and 'habitual sleep efficiency', respectively.
Discriminant validity refers to the expected lower association between constructs due to their lack of theoretical relation. This was assessed by correlating the PSQI global score with the ESS, MSPSS, and self-reported height. A previous study found a poor correlation between the ESS and the PSQI global score [28]; this might be due to the different goal of the ESS, which measures habitual sleepiness rather than actual sleep symptoms [52]. We hypothesized that there would be negligible correlations between the PSQI global score and the ESS [28], MSPSS [53], and self-reported height, respectively.
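The correlational checks above can be sketched with a plain implementation of Spearman's rho (Pearson's r on average ranks, so ties are handled). The 0.30 cut-off for a small correlation follows the threshold used later in the Discussion; the function names and the toy scores are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def is_small(rho, cutoff=0.30):
    """True when the absolute correlation is below the small-effect cut-off."""
    return abs(rho) < cutoff


# Hypothetical PSQI global scores and self-reported heights (cm).
psqi = [4, 7, 9, 5, 12, 6, 8, 3]
height = [158, 150, 161, 155, 157, 149, 163, 152]
rho = spearman_rho(psqi, height)
print(round(rho, 2), is_small(rho))
```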

Study 2
The basic psychometric properties of the PSQI, including Cronbach's alpha, component-to-component correlations (Spearman's rho), and component-to-total correlations (Spearman's rho), were assessed. Construct validity of the PSQI was examined in two parts, EFA and CFA.
The participants were randomly divided into two halves with the RAND formula in Excel. EFA was conducted on the first random sample. Before conducting factor analysis procedures, the suitability of the data for factor analysis was assessed based on Bartlett's test of sphericity (p < 0.001) and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (0.64) [54]. EFA was performed using principal component analysis with maximum likelihood estimation to identify the latent factors that explain the common and unique variance of the 19 items of PSQI. An oblimin rotation procedure was conducted. Factors were extracted based on eigenvalues above 1 [55]. Items with loadings equal to or greater than 0.3 were retained.
To verify the factor structure of the PSQI, CFA was then conducted on the second random sample to assess the fit of the structural model identified in the EFA. The weighted least squares mean and variance adjusted (WLSMV) estimator was used because the PSQI component scores are ordinal rather than continuous [56]. Model fit was evaluated against standard benchmarks, including the chi-square test of the model (a p value greater than 0.05 is preferred), comparative fit index (CFI) >= 0.90, Tucker-Lewis index (TLI) >= 0.90, root mean square error of approximation (RMSEA) <= 0.08, and standardized root mean square residual (SRMR) <= 0.08 [57,58]. We also calculated the goodness of fit of other models from previous studies to allow comparison with other samples. For the best model, the standardized estimates of the factor loading paths are summarized in Figure 1. Descriptive statistics and EFA procedures were conducted with STATA 14.0 (Stata Corp, College Station, TX, US). CFA was conducted using Mplus [59]. The statistical significance level for all tests was set at p < 0.05 (two-tailed).
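The random split and the eigenvalue rule described above can be sketched as follows. The split imitates the Excel RAND approach (attach a uniform random number to each participant, sort, cut in half); the seed, the participant IDs, and the eigenvalues are hypothetical, and placing the extra participant of an odd-sized sample in the second half is an arbitrary choice made for this example.

```python
import random


def random_split_halves(ids, seed=None):
    """Split participant IDs into two random halves.

    Each ID gets a uniform random key (as with RAND in a spreadsheet),
    the IDs are sorted by that key, and the sorted list is cut in the middle.
    """
    rng = random.Random(seed)
    keyed = sorted((rng.random(), pid) for pid in ids)
    mid = len(keyed) // 2
    first = [pid for _, pid in keyed[:mid]]
    second = [pid for _, pid in keyed[mid:]]
    return first, second


def kaiser_retained(eigenvalues, threshold=1.0):
    """Eigenvalues kept under the 'eigenvalue above 1' extraction rule."""
    return [ev for ev in eigenvalues if ev > threshold]


efa_ids, cfa_ids = random_split_halves(range(1, 1364), seed=2017)
print(len(efa_ids), len(cfa_ids))

# Hypothetical eigenvalues for seven component scores.
print(kaiser_retained([2.4, 1.6, 0.9, 0.8, 0.6, 0.4, 0.3]))
```

Fixing the seed makes the split reproducible, which a spreadsheet RAND column does not guarantee unless the generated values are stored.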

Study 1
One hundred and thirty-one FDWs with an average age of 39.7 years (SD = 8.3; median = 39; range = 21-59) participated in this study. Their average height was 155.7 cm (SD = 6; median = 157.5; range = 130-183). The majority (58.02%) of participants reported having at least some college education. The average length of employment as a domestic worker in Macao was 5.1 years (SD = 3.6; median = 4). The average monthly salary was 488.4 USD (SD = 107.1; median = 480). The reported average weekly working hours were 69.1 (SD = 20.1; median = 70). More than half (59.5%) lived outside of their employer's home.

Reliability
The Cronbach's alpha of the PSQI global scale was 0.63 at baseline (n = 131) and 0.67 at follow-up (n = 61). According to the alpha-if-item-deleted analysis, reliability increased slightly (to 0.64) when either the 'use of sleeping medication' or the 'habitual sleep efficiency' component was omitted (see Table 3). The item-to-total correlation coefficients ranged from 0.37 ('use of sleeping medication') to 0.66 ('subjective sleep quality' and 'sleep latency'). The 10-day ICC of the PSQI global score was 0.63.
The ICC values of the PSQI component scores ranged from 0.30 ('use of sleeping medication') to 0.58 ('sleep latency' and 'sleep duration').
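The test-retest analysis can be illustrated with the sketch below, which computes an ICC, a percentile bootstrap interval obtained by resampling participants, and the strong/moderate/poor bands given in the Methods. The ICC form is an assumption: the text does not say which variant was used, so this example uses the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1). The paired scores are hypothetical.

```python
import random


def icc_2_1(scores):
    """ICC(2,1) for rows of repeated scores, one row per participant."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bootstrap_ci(scores, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling participants with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(scores) for _ in scores]
        try:
            estimates.append(icc_2_1(sample))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variance at all
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper


def interpret_icc(icc):
    """Bands from the Methods: above 0.75 strong, 0.40 to 0.75 moderate, else poor."""
    if icc > 0.75:
        return "strong"
    if icc >= 0.40:
        return "moderate"
    return "poor"


# Hypothetical [baseline, 10-day retest] PSQI global scores.
pairs = [[5, 6], [9, 7], [3, 4], [11, 10], [6, 6], [8, 5], [4, 6], [7, 9], [10, 8], [2, 3]]
estimate = icc_2_1(pairs)
print(round(estimate, 2), interpret_icc(estimate), bootstrap_ci(pairs))
```

Applied to the values reported above, `interpret_icc(0.63)` falls in the moderate band and `interpret_icc(0.30)` in the poor band.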

Validity
The detailed results for the discriminant validity of the PSQI are shown in Table 4. The detailed results for the convergent validity of the PSQI are given in Tables 4 and 5. No actigraphy variables were significantly associated with the follow-up PSQI global and component scores, except for inverse correlations between PSQI 'sleep duration' and Actiwatch-2 TST (r_s = −0.65, p < 0.01) and WB (r = −0.43, p < 0.01), which show consistency in reporting (higher PSQI scores indicate shorter sleep time).

Participant Characteristics
Participant characteristics are presented in Table 6.

EFA Results
The EFA results are displayed in Table 7. Based on the criterion of eigenvalues >= 1, two factors were obtained, which explained 33.99% and 22.25% of the variance, respectively. Each PSQI component had an acceptable loading, ranging from 0.38 to 0.77. Five components loaded highly on factor 1, which was named 'perceived sleep quality'. Two components loaded highly on factor 2, which was named 'sleep efficiency'. This result was the same as that reported by Magee et al. [30].

CFA Results
The two-factor model identified through the EFA was tested. We also compared our model with the original one-factor model [23] and with other two-factor [28,60] and three-factor models [27,29,61]. Table 8 presents the goodness-of-fit indices of each PSQI model in the second random half of the sample. The two-factor model based on the EFA results showed good fit: CFI = 0.96, TLI = 0.94, RMSEA = 0.065, SRMR = 0.039. However, the original one-factor model and the other two-factor models provided poor fit to the data (see Table 8).
The replicated three-factor model from Gelaye et al. [61] also showed acceptable fit: CFI = 0.94, TLI = 0.90, RMSEA = 0.050, SRMR = 0.093. However, the standardized path coefficient (1.59) between factor 1 'perceived sleep quality' and factor 3 'daytime disturbances' of the model was greater than 1. This result suggested that factor 1 and factor 3 might overlap conceptually and should be combined into one factor, which is consistent with the two-factor model identified by the EFA. Figure 1 shows the standardized path coefficients of the two-factor model of the PSQI.
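For illustration, the fit benchmarks stated in the Methods (CFI >= 0.90, TLI >= 0.90, RMSEA <= 0.08, SRMR <= 0.08) can be applied mechanically to the indices reported above. The dictionary layout and function name are hypothetical; the numbers are those given in the text. Such a check is no substitute for the fuller model comparison in Table 8.

```python
# Benchmarks as stated in the Methods: (direction, cut-off) per index.
BENCHMARKS = {
    "CFI": (">=", 0.90),
    "TLI": (">=", 0.90),
    "RMSEA": ("<=", 0.08),
    "SRMR": ("<=", 0.08),
}


def check_fit(indices):
    """Return, per index, whether the reported value meets its benchmark."""
    result = {}
    for name, (direction, cutoff) in BENCHMARKS.items():
        value = indices[name]
        result[name] = value >= cutoff if direction == ">=" else value <= cutoff
    return result


# Indices reported in the text for the two models discussed above.
two_factor = {"CFI": 0.96, "TLI": 0.94, "RMSEA": 0.065, "SRMR": 0.039}
three_factor = {"CFI": 0.94, "TLI": 0.90, "RMSEA": 0.050, "SRMR": 0.093}

print("two-factor:", check_fit(two_factor))
print("three-factor:", check_fit(three_factor))
```

Under these cut-offs the two-factor model meets all four benchmarks, whereas the three-factor model meets the CFI, TLI, and RMSEA benchmarks but its SRMR of 0.093 lies above 0.08.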

Basic Psychometric Properties of PSQI
PSQI global scores of the 1363 participants ranged from 0 to 17, with a mean score of 6.28 (SD = 3.24). The Cronbach's alpha values of the PSQI factors 'perceived sleep quality' and 'sleep efficiency' were 0.70 and 0.81, respectively. Table 9 provides more detailed information.
Note: Overall Cronbach's alpha of PSQI is 0.63 at baseline and 0.67 at 10-day retest. * = p < 0.05, ** = p < 0.01. a = Spearman's correlation, b = Pearson correlation. CI = confidence interval. ICC = intraclass correlation coefficient. Each component score ranges from 0 to 3. Item-to-total correlation means the correlation between each component and the PSQI global score.
Note: EFA = exploratory factor analysis, CFA = confirmatory factor analysis. Cantonese fluency was assessed with a ruler scale, which ranged from the lowest level (0) to the highest level (10). "Live-in/live-out" was asked by "Do you live in your employer's home?".
Note: * = p < 0.05, ** = p < 0.01. IQR = interquartile range. NA = no correlation is presented since items were included in the PSQI factor. Each component score ranges from 0 to 3. All the correlations were Spearman's rho coefficients.

Discussion
To our knowledge, this is the first study to assess the psychometric properties and the factorial validity of the Filipino version of the PSQI. The results demonstrated a low internal consistency of the PSQI global score, but acceptable values for the two PSQI factors. Our literature review revealed a wide range of Cronbach's alpha values for the PSQI, from 0.57 to 0.89 [24,25,68]. Measures with low alpha may still be useful [69]. The 10-day test-retest ICC values for the PSQI global and component scores demonstrated moderate reliability, except for the components 'habitual sleep efficiency' and 'use of sleeping medication', which suggests that sleep is stably assessed using the PSQI global score and some of the component scores within this population.
Overall, the 'subjective sleep quality' and 'sleep latency' components were most highly correlated with the global score, and the 'use of sleeping medication' and 'habitual sleep efficiency' components were least correlated with the global score of the PSQI. This pattern of associations suggests that the PSQI global score reflects 'subjective sleep quality' and 'sleep latency' more than other components and that 'use of sleeping medication' and 'habitual sleep efficiency' are less reliable, consistent with previous studies [62,68]. For 'use of sleeping medication', this is likely due to the infrequent use of sleep medication in this sample (less than 20% reported its use).
The PSQI demonstrated good convergent validity in our sample. Greater sleep dysfunction was significantly associated with higher levels of depression and anxiety, similar to previous research [70]. The PSQI global score and many components were significantly and moderately associated with the RRS. One possible explanation is conceptual overlap between the two scales. The RRS assesses respondents' reflection and brooding on the possible causes and consequences of dysphoric mood [43]. Its association with sleep quality has been demonstrated among undergraduate students, with findings that rumination factors such as worry might contribute to cognitive activity, which could affect sleep quality [71]. The adequate convergent validity of the PSQI was also supported by the moderate associations between follow-up PSQI scores and sleep diary variables, which were observed not only for the PSQI global score but also for the component scores. We would expect daily assessments of sleep dysfunction to demonstrate higher test-retest reliability than aggregated retrospective reports of sleep problems [72], so the high correlations between daily diary reports and PSQI scores obtained at the 10-day follow-up indicate strong reliability for self-reported sleep problems in the sample. In particular, self-reported SQ, SL, and TST were especially highly correlated. This was consistent with previous findings that sleep patterns from sleep diaries were highly correlated with PSQI items [73].
For actigraphy variables, we found only that longer TST and more WB were significantly associated with lower PSQI 'sleep duration' component scores, that is, with longer self-reported sleep. The results were consistent with a previous criterion validity study of the PSQI in a non-clinical population, which found no significant correlations between the PSQI global score and the actigraphy variables TST, SE, WASO, and SL, but significant associations between PSQI sleep duration and actigraphy-measured TST [74]. Similarly, the original PSQI validation study showed a lack of association between the PSQI and PSG, with the strongest correlation being r = 0.30 between the PSQI and PSG SL [23]. A possible reason is that actigraphy and PSG measure actual sleep in real time, whereas the PSQI relies on retrospective recall, which may reduce accuracy and introduce reporting biases. Moreover, the low correlations ranging from 0.28 to 0.32 between PSQI components and PSG sleep parameters also support the difference between objective sleep measures and self-report measures [73].
Discriminant validity was demonstrated by small effect size correlations (<0.30) between the PSQI global score and the MSPSS and self-reported height. Although some significant associations between the PSQI and MSPSS-total and MSPSS-family were found, the associations were still weak. The result was consistent with the previous validation study [53].
The present study also examined the factor structure of the Filipino version of the PSQI. The EFA identified two factors within the PSQI: the first factor, labeled 'perceived sleep quality', included the PSQI components 'subjective sleep quality,' 'sleep latency,' 'sleep disturbances,' and 'daytime dysfunction', and the second factor, labeled 'sleep efficiency', included 'sleep duration' and 'habitual sleep efficiency'. Subsequent CFA showed that this two-factor model and the three-factor model [61] were favored statistically over the original one-factor model [23] and other published two-factor models [28,60]. Although the three-factor model from Gelaye et al. [61] had model fit similar to that of our two-factor model, the results suggested combining the 'perceived sleep quality' and 'daytime disturbances' factors. The results and process in the present study were similar to those of previous studies [30,61].
The Cronbach's alpha of the PSQI factor 'perceived sleep quality' was 0.70, indicating acceptable internal consistency. The Cronbach's alpha of the PSQI factor 'sleep efficiency' was 0.81, indicating good internal consistency. All the components showed high component-total correlations with the PSQI factors, which further supported good internal consistency of the PSQI among FDWs.
Investigators previously argued that a two- or three-factor structure of the PSQI might be a better representation of sleep disturbance than a unidimensional model [27,28,75]. Our study supported a two-factor structure of the PSQI, which was consistent with the two-factor model proposed by previous researchers [30,61,64]. Some researchers observed that the removal of 'use of sleeping medication' did not have a major impact on the fit of the CFA models [30]. However, another structural validation study among 309 Brazilian adolescents found that the best two-factor model of the PSQI excluded the 'use of sleeping medication' component [64]. In our study, the fit indices of the identified two-factor model improved when this component was removed. Of note, the PSQI global score and the cut-off score for defining poor sleep would change if 'use of sleeping medication' were removed. Further studies should explore whether the scale demonstrates incremental validity in assessing sleep dysfunction when the 'use of sleeping medication' component is included in non-clinical samples.

Conclusions
Study 1 provided evidence that the Filipino version of the PSQI is an adequately reliable and valid assessment instrument useful for quantifying sleep parameters in FDWs. Among the PSQI component scores, the most robust evidence was obtained for 'subjective sleep quality,' 'sleep latency,' and 'sleep duration.' The use of sleep medication is not likely a critical indicator of sleep dysfunction in this population. The findings in Study 2 validated the two-factor structure of the PSQI to assess self-reported subjective sleep disturbance among FDWs. The Filipino version of the PSQI demonstrated good construct validity. The present study can be referenced in future studies to measure and screen for sleep dysfunction among clinical and non-clinical populations in the Philippines.
The current study has some notable strengths. First, it is the first known study to evaluate the psychometric properties and structural validity of the PSQI among Filipino transnational migrants or any Filipino sample. Second, the study design included daily diary self-reported sleep assessments. Third, we used actigraphic assessment as an objective indicator of sleep dysfunction. Despite these strengths, the study has several limitations. First, the sample included only female domestic workers, limiting generalizability to other transnational migrants and men. This two-factor structure of the PSQI may not generalize to all Filipinos or Filipino migrant workers, especially men [27][28][29][30]. Second, participants were recruited using snowball sampling methods in Study 1, which is likely to introduce some sampling bias. Third, the factorial validity of the measure could not be assessed given the size of the sample. Further studies that assess a more diverse sample of overseas Filipino workers and evaluate the factorial validity of the Filipino version of the PSQI are needed. Fourth, previous studies used one or several self-reported items instead of the full PSQI scale to measure sleep in epidemiological studies [76,77]. Further studies could explore the utility of a brief version of the PSQI among FDWs, given their very busy schedules.
Declarations: Ethical approval and consent to participate: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study was approved by the ethics committee of the University of Macau. Informed consent was obtained from all individual participants included in the study.