Measurement Invariance and Latent Mean Differences in the Nurses' Emotional Labour Scale

ABSTRACT
Background: The measurement invariance of the Emotional Labour Scale and latent mean differences in emotional labor across hospital and monthly salary levels among registered nurses have never been confirmed. These issues may influence the application and efficacy of this scale in practice.
Purpose: This study was developed to evaluate the factor structure of the nurses' Emotional Labour Scale and to examine the measurement invariance and latent mean differences for this scale across different hospital and monthly salary levels.
Methods: Data were collected from 461 registered nurses working in four general hospitals and 12 long-term care hospitals. Confirmatory factor analysis and multigroup confirmatory factor analysis were performed to determine the internal structure and measurement invariance of the Emotional Labour Scale.
Results: The results of the confirmatory factor analysis indicate that the factor structure model proposed by the original scale fits our data well. Configural invariance, factor loading invariance, intercept invariance, uniqueness invariance, and factor variance/covariance invariance were supported across the two hospital levels, and configural invariance, factor loading invariance, and intercept invariance were supported across the two monthly salary levels. The latent mean for emotional control effort in profession was lower in general hospital nurses than in long-term care hospital nurses. No statistically significant latent mean differences were found across monthly salary levels.
Conclusions/Implications for Practice: The findings show that the Emotional Labour Scale is a valid and reliable tool for assessing registered nurses and that comparing mean emotional labor scores across hospital and monthly salary levels is feasible. The scale may contribute to the development of human resource strategies.


Introduction
Emotional labor has been studied primarily among service providers such as flight attendants (Hochschild, 2012) and began to be studied in sociology and business approximately 20 years ago (Grandey & Sayre, 2019). There has been increasing interest in emotional labor among Korean nurses from the perspective of nursing service (S. H. Kim & Ham, 2015; J. E. Kim et al., 2019). Because emotional labor is a concept rooted in the premise of interactions with others and refers to the efforts to control and express emotions to adhere to the norms required by clients or organizations (M. J. Kim, 2017), the practice of emotional labor may vary according to the characteristics of clients and organizations. In contrast to other service jobs that serve a physically and mentally healthy target population, nursing services are provided to patients with physical and mental health problems. In addition, because nurses provide primary care at the patient bedside, their emotional labor differs from that of other professionals (Gray, 2009; Williams, 2013).
Emotional labor exists for nurses in all nursing care settings, as it relates to their interpersonal interactions with patients, guardians, and other health professionals. In particular, nurses show therapeutic emotional labor in their relationships with patients and guardians (Delgado et al., 2017). Nurses perceive increased autonomy as professionals when providing care and services for patients with a professional attitude (Hong, 2016). Moreover, the facial expressions, attitudes, and kindness that nurses show during the process of care are directly linked to patients' satisfaction (Andrew et al., 2011) and facilitate an empathetic experience with patients (Msiska et al., 2014). The emotional management efforts of nurses to use positive facial expressions and control their feelings increase patient comfort and sense of security. Thus, this concept of emotional labor has been considered an important topic in the study of nursing work (Mann & Cowburn, 2005).
As described above, understanding the unique features of nurses' work is critical to understanding nurses' emotional labor. However, most Korean studies on emotional labor in nurses have utilized a translated and revised version (S. H. Kim & Ham, 2015) of a foreign instrument originally developed for use on hotel workers (Morris & Feldman, 1996). Moreover, this instrument comprises factors such as the frequencies and varieties of emotional display, attentiveness for emotional display, and emotional dissonance (Morris & Feldman, 1996) and is limited in its ability to assess the unique attributes of nurses' emotional labor (Hong & Kim, 2019). In addition, validity and reliability-testing problems were encountered in developing the Korean version of this instrument. To overcome these issues, Hong (2016) developed the Emotional Labour Scale for nurses to measure the properties of emotional labor performed by Korean registered nurses with patients. Hong (2016) and Hong and Kim (2019) identified the attributes of emotional labor of hospital nurses using a sample of 304 participants. The construct validity of the instrument was tested in these studies using exploratory factor analysis, the group comparison method, and item-total correlation, whereas criterion validity was tested based on correlation with Brotheridge and Lee's (2003) emotional scale. The Emotional Labour Scale was developed by Hong (2016) and Hong and Kim (2019) using sample populations of nurses in a metropolitan area, and its applicability is limited to general hospitals and upper-level general hospitals. The scale has not yet been tested in other geographic settings or types of medical institutions. Therefore, to increase the generalizability of this scale, measurement invariance should be tested to ensure that respondents who record the same score on the scale are at the same level of the measured construct regardless of individual characteristics. Accordingly, the adequacy of the Emotional Labour Scale across hospital levels, including long-term care hospitals, and across salary levels should be evaluated, and the consistency of its factor structure should be tested.
Many studies employ confirmatory factor analysis during the process of instrument development or assessment to examine and validate instrument suitability. In addition, confirming that the factor structure of an instrument is maintained equally between groups is crucial to ensuring that scores may be compared between groups without error (S.-H. Kim, 2019). Emotional labor and other factors related to emotional labor (e.g., burnout) are known to vary according to hospital and nurse salary levels (K. R. Lee & Kim, 2016; Yom et al., 2017). In a previous study (K. R. Lee & Kim, 2016), nurses' degree of emotional labor was statistically significantly higher in large-sized hospitals than in small- and medium-sized hospitals but was not affected by level of income. However, in Yom et al. (2017), the surface acting of emotional labor was found to differ between yearly income groups. Moreover, the mean of emotional labor was higher in low-salary nurses, although the difference did not achieve statistical significance (J. Lee, 2019). In another study, higher levels of emotional labor demand were found to be associated with lower salary levels (Bhave & Glomb, 2009; Glomb et al., 2004). This suggests that response patterns to each item may differ among participant groups. However, instruments are typically developed without establishing equivalence across groups for major factor-structure constructs. No study on the emotional labor of nurses in the literature has examined measurement invariance across different hospital and salary levels, although, for example, nurses in long-term care settings who earn a certain salary and care for patients with widely different levels of illness severity may interpret the meaning of an item very differently from nurses who work in a general hospital located in a metropolitan area.
Therefore, this study was designed to examine measurement invariance in the Emotional Labour Scale, which was not tested by either Hong (2016) or Hong and Kim (2019) during scale development. This will contribute to ensuring the correct use of this scale and the correct interpretation of results and is expected to increase the usefulness of this instrument in measuring emotional labor in nurses. The aims of this study were to examine the factor structure of the Emotional Labour Scale and to assess measurement invariance and latent mean differences across different hospital and nurse salary levels.

Design and Participants
In this study, a descriptive cross-sectional survey was used to assess the factorial structure of the Emotional Labour Scale and test for measurement invariance and latent mean differences across hospital and monthly salary levels. Nurses working in a general or long-term care hospital who signed an informed consent form were enrolled as participants.
The minimum sample size for conducting a valid confirmatory factor analysis of the Emotional Labour Scale was calculated based on the model fit (Preacher & Coffman, 2006). The results designated a minimum required sample size of 197 with a degree of freedom of 101, a significance of .05, a power of .90, a null root mean square error of approximation (RMSEA) of .00, and an alternative RMSEA of .05. Thus, the authors designated a target total sample size of 450, with 225 participants assigned to each of the two groups and a withdrawal buffer of 10%. The questionnaire was distributed to 500 participants. After excluding the 26 questionnaires that were not returned and the 13 with incomplete responses, data from 461 participants were available for analysis in this study.

Instrument
The Emotional Labour Scale (Hong, 2016) is a 16-item scale consisting of three factors: emotional control effort in profession (seven items), patient-focused emotional suppression (five items), and emotional pretense by norms (four items). Each item is rated using a 5-point Likert scale ranging from strongly disagree (1) to strongly agree (5), with higher scores indicating greater emotional labor. At the time of development, the Cronbach's alpha of the Emotional Labour Scale was .81 for the entire scale, .88 for the emotional control effort in profession factor, .77 for the patient-focused emotional suppression factor, and .69 for the emotional pretense by norms factor. In this study, the Cronbach's alpha was .84 (95% CI [.82, .86]).
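As a rough illustration of how an internal-consistency coefficient of this kind is obtained, the following sketch computes Cronbach's alpha from an item-response matrix. The function name `cronbach_alpha` and the five-respondent, three-item data set are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                     # number of items
    item_columns = list(zip(*responses))      # transpose: one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert responses (rows = respondents, columns = items).
toy_responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.3f}")
```

The sketch uses sample variances (n - 1 in the denominator); the ratio of variances, and therefore alpha, would be the same with population variances.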

Data Collection
Data were collected from June 13 to August 18, 2017. Four general hospitals and 12 long-term care hospitals were convenience sampled from Daegu City and Gyeongsangbuk-do Province. Next, the heads of the nursing department at each hospital were contacted via email and phone and informed of the study purpose, participant inclusion requirements, methods, and sample size. Upon obtaining permission to collect data, a research assistant visited the hospital to obtain signed informed consent from the participants and then distributed and retrieved the questionnaires. The participants were each given a small gift as a token of appreciation for their cooperation.

Ethical Considerations
This study was approved by the institutional review board of the authors' university (approval number: CUIRB-2017-0010) before beginning the study. To protect the participants, they were informed of the purpose, method, and procedures of the study as well as of their freedom to withdraw from the study at any point. In addition, the authors informed participants of the measures that would be taken to ensure participant anonymity and data confidentiality. The participants then signed a consent form and completed the questionnaire.

Statistical Analysis
The data were analyzed using IBM SPSS Statistics 25.0 and Amos 20 software (IBM Inc., Armonk, NY, USA). Demographic characteristics were presented as frequency and percentage. The differences in demographic characteristics between the groups were analyzed using a chi-square test. Univariate normal distribution was tested using skewness and kurtosis. First, the factor structure of emotional labor was analyzed using confirmatory factor analysis. Before this analysis, multivariate normal distribution was tested, and estimates that did not satisfy multivariate normality were analyzed using bootstrapping. There were no missing data on emotional labor in this study. Model fit was examined comprehensively using the χ² statistic (df, p value), normed χ² (normed chi-square, NC), comparative fit index (CFI), goodness-of-fit index (GFI), adjusted GFI (AGFI), RMSEA, and standardized root mean residual (SRMR). The criteria for the model fit indices were p ≥ .05 for the χ² statistic, 2.00-5.00 for NC, CFI ≥ .90, GFI ≥ .90, AGFI ≥ .90, RMSEA ≤ .08, and SRMR ≤ .08 (Bae, 2016). To confirm the validity of the factors, convergent validity was established based on a factor loading ≥ .5 (critical ratio, CR ≥ 1.97), construct reliability ≥ .7, and average variance extracted (AVE) ≥ .5, whereas discriminant validity was established based on the AVE being greater than the square of the correlation coefficients between factors and the correlation coefficient plus or minus 2 times the standard error (correlation coefficient ± 2 × standard error) not including the value of 1 (Bae, 2016).
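The convergent and discriminant validity checks described above can be sketched as follows. This is a minimal illustration that assumes the commonly used formulas based on standardized loadings (AVE as the mean squared loading, and construct reliability as the squared sum of loadings over that quantity plus the summed error variances); the source does not state which variant of these formulas was applied, and all function names and input values are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE from standardized loadings: mean of the squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def construct_reliability(loadings):
    """Construct reliability, taking 1 - loading^2 as each item's error variance."""
    sum_loadings = sum(loadings)
    sum_errors = sum(1 - l * l for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + sum_errors)


def discriminant_validity_ok(ave_a, ave_b, correlation, standard_error):
    """Both checks from the text: AVE > r^2, and r +/- 2*SE must not include 1."""
    ave_exceeds_r2 = min(ave_a, ave_b) > correlation ** 2
    lower = correlation - 2 * standard_error
    upper = correlation + 2 * standard_error
    interval_excludes_one = not (lower <= 1.0 <= upper)
    return ave_exceeds_r2 and interval_excludes_one


# Hypothetical standardized loadings for one factor.
example_loadings = [0.7, 0.8, 0.9]
ave = average_variance_extracted(example_loadings)
cr = construct_reliability(example_loadings)
print(f"AVE = {ave:.3f}, construct reliability = {cr:.3f}")
print(discriminant_validity_ok(ave, 0.60, correlation=0.50, standard_error=0.05))
```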
Next, multigroup confirmatory factor analysis was conducted to test for measurement invariance across groups.
The levels of invariance were tested using the following invariance steps: configural, factor loading, intercept, uniqueness, and factor variance/covariance (Bae, 2016). Next, the fit of each model was examined. To compare latent means, once configural invariance, factor loading invariance, and intercept invariance were established, the latent mean differences across the two groups were examined in a model in which the factor loadings and intercepts were constrained to be equal. In this study, Hedges's g and Glass's delta were used as the effect size, with a value of .20 interpreted as small, a value of .50 interpreted as medium, and a value of .80 interpreted as large. Hedges's g was used for the comparison across hospital levels because the proportions of participants in the two groups differed significantly. Factor variance invariance was not confirmed across monthly salary levels in this study. Therefore, Glass's delta was used for that comparison, as each monthly salary group had different standard deviations (Ellis, 2020).
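The two effect size measures and the interpretation thresholds given above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the small-sample correction factor for Hedges's g is offered as an optional switch because the source does not say whether it was applied, and the label used for values below .20 is an assumption.

```python
import math


def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2, small_sample_correction=False):
    """Standardized mean difference using the sample-size-weighted pooled SD."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    g = (mean_1 - mean_2) / math.sqrt(pooled_variance)
    if small_sample_correction:
        # Optional bias correction (an assumption, not confirmed by the source).
        g *= 1 - 3 / (4 * (n_1 + n_2) - 9)
    return g


def glass_delta(mean_1, mean_2, sd_reference):
    """Standardized mean difference using only the reference group's SD."""
    return (mean_1 - mean_2) / sd_reference


def effect_size_label(value):
    """Map an effect size to the .20 / .50 / .80 benchmarks used in the text."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # label for values under .20 is an assumption


# Hypothetical group summaries, not the study data.
print(hedges_g(10, 8, 2, 2, 50, 50))
print(glass_delta(10, 8, 4))
print(effect_size_label(0.262))
```

Glass's delta avoids pooling, which is why it suits comparisons in which the groups' standard deviations cannot be assumed equal.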
Although the χ² difference is frequently utilized to test for measurement invariance, this parameter is known to be sensitive to sample size, which hinders accurate analysis of differences. Therefore, differences in model fit indices were utilized as an alternative approach. In this study, as suggested by Chen (2007), the differences in the CFI were used as the main criterion, whereas the differences in the RMSEA and SRMR were used as the subcriterion to identify the measurement invariance model. The comparison criteria differed according to the analytical model used. For studies, including this study, with fewer than 300 cases per group, the criteria for equivalence of fit between two models are ΔCFI ≤ −.005, ΔRMSEA ≤ −.010, and ΔSRMR ≤ −.025 for the test of factor loading invariance and ΔCFI ≤ −.005, ΔRMSEA ≤ −.010, and ΔSRMR ≤ −.005 for the test of intercept and uniqueness invariances.
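A comparison of two nested invariance models by fit-index differences might be sketched as below. The sign convention is an assumption made for this sketch: differences are computed as the more constrained model minus the less constrained model, and the constrained model is treated as fitting equivalently when the CFI falls by less than .005 and the RMSEA and SRMR rise by less than their cutoffs. The function name, dictionary keys, and fit values are hypothetical and are not the study results.

```python
def compare_invariance_models(less_constrained, more_constrained, step):
    """Return the fit-index differences and whether each stays within its cutoff.

    step: "loading" uses an SRMR cutoff of .025; any other step
    (e.g., "intercept" or "uniqueness") uses .005.
    """
    srmr_cutoff = 0.025 if step == "loading" else 0.005
    delta_cfi = more_constrained["cfi"] - less_constrained["cfi"]
    delta_rmsea = more_constrained["rmsea"] - less_constrained["rmsea"]
    delta_srmr = more_constrained["srmr"] - less_constrained["srmr"]
    return {
        "delta_cfi": delta_cfi,
        "delta_rmsea": delta_rmsea,
        "delta_srmr": delta_srmr,
        "cfi_ok": delta_cfi > -0.005,        # main criterion
        "rmsea_ok": delta_rmsea < 0.010,     # subcriterion
        "srmr_ok": delta_srmr < srmr_cutoff, # subcriterion
    }


# Hypothetical fit indices for a configural model and a loading-invariance model.
configural = {"cfi": 0.920, "rmsea": 0.071, "srmr": 0.070}
loading = {"cfi": 0.918, "rmsea": 0.069, "srmr": 0.072}
result = compare_invariance_models(configural, loading, step="loading")
print(result["cfi_ok"], result["rmsea_ok"], result["srmr_ok"])
```

Reporting the three checks separately, rather than a single verdict, keeps the CFI difference visible as the main criterion while the RMSEA and SRMR differences remain available as supporting evidence.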

Characteristics of Subjects
Most participants were in their 20s (39.8%), followed by 30s (22.6%). Most were women (96.5%), held a bachelor's degree (52.9%), and were in a clinical career stage with more than 6 years of experience (58.1%). The highest percentage of participants worked in geriatric units (36.9%), followed by medical units (23.6%) and surgical units (17.6%). The most common current job position was staff nurse (81.1%). Most nurses were assigned fewer than 30 patients per shift on average (54.6%), and most nurses had a monthly salary of less than approximately 2,200 U.S. dollars (51.9%). The characteristics of the general and long-term care hospital nurses are shown in Table 1.

Confirmatory Factor Analysis for the Emotional Labour Scale
Model fit indices are shown in Table 2. Confirmatory factor analysis was performed to test the factor structure of the Emotional Labour Scale and, with the exception of NC and SRMR, the model fit indices did not meet the criteria: χ² = 483.216 (df = 101, p < .001), NC = 4.784, CFI = .864, GFI = .880, AGFI = .839, RMSEA = .091, and SRMR = .078. After checking the highest modification index, we set a covariance between the errors of Items 1 and 2 and examined the model fit once more. Only NC, RMSEA, SRMR, and GFI met the criteria, whereas the remaining fit indices did not: χ² = 384.839 (df = 100, p < .001), NC = 3.848, CFI = .899, GFI = .904, AGFI = .869, RMSEA = .079, and SRMR = .071. Next, the highest modification index was rechecked, and a covariance between the errors of Items 11 and 12 was set for another test of model fit. With the exception of χ², which is influenced by sample size, and AGFI, the model fit indices all satisfied the criteria: χ² = 325.602 (df = 99, p < .001), NC = 3.289, CFI = .920, GFI = .918, AGFI = .887, RMSEA = .071, and SRMR = .070. With the exception of Item 13 (factor loading = .46), the factor loadings for all of the items in the Emotional Labour Scale satisfied the criteria. Moreover, the criteria for construct reliability (≥ .78) and AVE (.58) were also met. Thus, convergent validity was established (Table 3). AVE was greater than the square of the coefficients of correlation between Factors 1 and 2 (.39), between Factors 1 and 3 (.37), and between Factors 2 and 3 (.47). Furthermore, the values for correlation coefficients (between Factors 1 and 2, Factors 1 and 3, and Factors 2 and 3) ± 2 × SE did not include 1, at .30-.37, .35-.39, and .43-.51, respectively. Thus, discriminant validity was established.
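Two of the reported indices can be recomputed from the chi-square values as a consistency check. The sketch below assumes the usual definitions, NC as chi-square divided by its degrees of freedom and RMSEA as sqrt(max(chi-square − df, 0) / (df × (N − 1))), with N = 461; the exact formula used by the software is not stated in the source, and the model labels are only shorthand for the three successive models described above.

```python
import math

SAMPLE_SIZE = 461  # number of participants analyzed


def normed_chi_square(chi_square, df):
    """Normed chi-square (NC): chi-square divided by degrees of freedom."""
    return chi_square / df


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA for a single-group model."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square and df values as reported in the text for the three models.
reported_models = {
    "initial": (483.216, 101),
    "with Items 1-2 error covariance": (384.839, 100),
    "with Items 1-2 and 11-12 error covariances": (325.602, 99),
}

for label, (chi_square, df) in reported_models.items():
    nc = normed_chi_square(chi_square, df)
    fit = rmsea(chi_square, df, SAMPLE_SIZE)
    print(f"{label}: NC = {nc:.3f}, RMSEA = {fit:.3f}")
```

Under these assumed formulas, the recomputed values round to the NC and RMSEA figures reported for each model.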

Measurement invariance across hospital level
To test for measurement invariance across hospital level, we examined the fit of each model (Table 4). The configural invariance
Note. Model 1 has a covariance between the errors of Items 1 and 2; Model 2 has two covariances between the errors of Items 1 and 2 and between Items 11 and 12. df = degrees of freedom; NC = normed chi-square; CFI = comparative fit index; GFI = goodness-of-fit index; AGFI = adjusted goodness-of-fit index; RMSEA = root mean square error of approximation; 90% RMSEA = the lower limit and upper limit of a 90% confidence interval for the population; SRMR = standardized root mean residual. *p < .001.

Latent mean differences across hospital level
Only the latent mean for emotional control efforts in profession differed significantly between groups: it was lower by .103 in general hospital nurses than in long-term care hospital nurses (CR = −2.430, p = .015). Hedges's g of Factor 1 (emotional control efforts in profession) was 0.262, indicating a small effect size, whereas those of Factors 2 and 3 were 0.056 and 0.029, respectively (Table 5).

Discussion
This study was designed to examine the factor structure of the Emotional Labour Scale developed by Hong (2016) and Hong and Kim (2019) for measuring and evaluating nurses' emotional labor and to verify whether measurement invariance was maintained across hospital and salary levels. Thus, a survey was conducted on registered nurses working in general and long-term care hospitals, and factor and measurement invariance analyses that discriminated among hospital and salary levels were conducted on the Emotional Labour Scale. Confirmatory factor analysis was conducted on the 16 items and three factors to identify the factor structure of the Emotional Labour Scale, whereas multigroup confirmatory factor analysis was conducted to examine whether the scale exhibits measurement invariance across two hospital levels and nurses' monthly salary levels. Furthermore, latent mean analysis was performed to examine potential variation in the means of the latent factors of emotional labor across two hospital levels and monthly salary levels.
Note. df = degrees of freedom; NC = normed chi-square; CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean residual. *p < .05.
After conducting a confirmatory factor analysis for each factor structure model to examine the fit of the factor structure model of the Emotional Labour Scale, Model 2, which has two covariances between the errors of Items 1 and 2 and between Items 11 and 12 (which had a high modification index), was identified as the most suitable. Item 1 is "I try to be kind to patients genuinely from my heart," whereas Item 2 is "I try to change my emotions to be positive to meet my patients' expectations." Item 11 is "I control my mind, thinking that patience is a virtue," whereas Item 12 is "I tolerate unfair treatment to maintain a good work atmosphere in the ward." Because the items in each pair have similar meanings in Korean, this similarity may have raised the modification indices.
Then, before testing measurement invariance, Model 2 (the model identified in the confirmatory factor analysis as most appropriate) was analyzed separately by hospital and salary levels to assess its fit to the data in each group. As a result, the three-factor structure suggested by Hong (2016) and Hong and Kim (2019) was found to be the most suitable. We successfully replicated the factor structure of the Emotional Labour Scale across hospital and monthly salary levels. This means that nurses in this study interpreted the items on the Emotional Labour Scale similarly regardless of work setting and salary level. In the future, further research should be conducted to explore the possibility of applying the factor structure suggested by the original author across different countries using international comparative research.
In the tests for measurement invariance in the Emotional Labour Scale across hospital and monthly salary levels conducted in this study, the scale was shown to have factor loading invariance, intercept invariance, uniqueness invariance, and factor variance/covariance invariance, which is considered the highest level of invariance, across hospital levels. This supports that the scale is valid for application at both of the hospital levels examined, which is a major contribution of this study. In other words, empirical evidence is provided in this study of the usefulness of the Emotional Labour Scale in analyzing the between-group differences in both a general hospital and a long-term care hospital. Furthermore, the Emotional Labour Scale exhibited invariance up to intercept invariance across the two monthly salary levels (i.e., less than and more than 2,200 USD). The equivalence of residual variances with factor variances and covariances was not supported. Therefore, the mean of the latent variable between the two groups may be compared in the future, but caution is needed in the interpretation of the mean difference because of the differences in factor and residual variances.
General hospital nurses earned a lower mean score for F1 (emotional control efforts in profession) than their long-term care hospital peers. This result suggests that the latter display greater emotional control efforts. This implies that long-term care hospital nurses engage in greater emotional labor such as controlling their emotions or breaking down barriers when engaging in interactions with their patients, who typically stay in the hospital longer than general-hospital patients, and thus form stronger carer-patient bonds (Gray, 2009). In this study, the effect size on emotional control efforts in profession, a latent variable, was found to be small. The lack of significant between-group differences in latent mean values suggests the presence of similar variations of means in observed variables that take measurement error into consideration between the groups, with no significant measurement error.
The original scale developed by Hong (2016) and Hong and Kim (2019) targeted nurses working in general hospitals and upper-level general hospitals. However, in this study, nurses from tertiary hospitals (upper-level general hospitals) were excluded, and nurses working in long-term care hospitals and general hospitals were used for comparison. Thus, the Emotional Labour Scale has been used to examine nurses at each hospital level. Moreover, whereas Hong (2016) and Hong and Kim (2019) studied nurses from healthcare institutions in the Seoul metropolitan area, in this study, nurses were enrolled from healthcare institutions in Daegu City and Gyeongsangbuk-do to test for measurement invariance across regions. In addition, although nurses were distinguished into two groups based on monthly salary level, this allocation does not necessarily correlate with job position. Hence, subsequent studies should be developed to test for measurement invariance across job positions and analyze variance attributable to job position to confirm that the scale may be used regardless of job position. In general, measurement invariance between groups must be verified in scales (even standardized scales) to allow their measured concept to be applied across two or more groups. In the absence of verification, it is difficult to ascertain whether between-group differences are caused by a true score or by a systematic error in the estimation of a value (Han et al., 2019). In addition, further studies to verify longitudinal measurement invariance as well as cross-sectional measurement invariance are suggested. Although no single statistical criterion is applicable to all situations, several important methodological issues based on a previous study (Han et al., 2019), including the sample size and equality across groups, were considered in this study. Because the sample sizes were not equal and the sample size in each group was less than 300, we used a stricter criterion (ΔCFI). The effect size measures were also considered.

Implications for Nursing Management
On the basis of the emotional labor scores of nurses, different hospital levels were found to maintain factor variance/covariance invariance, and wages (monthly salaries) were found to maintain up-to-intercept invariance. This supports the validity of the Emotional Labour Scale and its reliability for use in making comparisons and conducting between-group analyses distinguished by hospital level and salary level. The verification of measurement invariance supports using the Emotional Labour Scale to accurately measure the emotional labor of nurses across diverse settings to improve human resources management.
Emotional labor in nurses is widely considered to be higher than ever now because of factors including the COVID-19 pandemic. The findings of this study have significant implications for nursing management, as the scale may be utilized to provide valid baseline data for managing emotional labor in nurses. Furthermore, by better understanding emotional labor in nurses, nursing organizations can develop more-effective strategies to reduce job stress, increase job satisfaction, and reduce turnover intention. The Chinese version of the Emotional Labour Scale originally developed by Hong and Kim (2019) has also recently been developed and tested for validation (Ying et al., 2020). The data obtained using this scale may be useful in developing human resources strategies and management policies for nurses.