Measuring mental well-being in Norway: validation of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS)

Background Mental well-being is an important, yet understudied, area of research, partly due to a lack of appropriate population-based measures. The Warwick-Edinburgh Mental Well-being Scale (WEMWBS) was developed to meet the need for such a measure. This article assesses the psychometric properties of the Norwegian version of the WEMWBS and its short version (SWEMWBS) in a sample of primary health care patients who participated in the evaluation of Prompt Mental Health Care (PMHC), a novel Norwegian mental health care program aimed at increasing access to treatment for anxiety and depression.
Methods Forward and back-translations were conducted, and 1168 patients filled out an electronic survey including the WEMWBS and other mental health scales. The original dataset was randomly divided into a training sample (≈70%) and a validation sample (≈30%). Parallel analysis and confirmatory factor analysis were carried out to assess construct validity and precision. The final models were cross-validated in the validation sample by specifying a model with fixed parameters based on the estimates from the training set. Criterion validity and measurement invariance of the (S)WEMWBS were examined as well.
Results Support was found for the single factor hypothesis in both scales, but, similar to previous studies, only after a number of residuals were allowed to correlate (WEMWBS: CFI = 0.99; RMSEA = 0.06, SWEMWBS: CFI = 0.99; RMSEA = 0.06). Further analyses showed that the correlated residuals did not alter the meaning of the underlying construct and did not substantially affect the associations with other variables. Precision was high for both versions of the WEMWBS (>.80), and scalar measurement invariance was obtained for gender and age group. The final measurement models displayed adequate fit statistics in the validation sample as well. Correlations with other mental health scales were largely in line with expectations. No statistically significant differences were found in mean latent (S)WEMWBS scores for age and gender.
Conclusion Both WEMWBS scales appear to be valid and precise instruments to measure mental well-being in primary health care patients. The results encourage the use of mental well-being as an outcome in future epidemiological, clinical, and evaluation studies, and may as such be valuable for both research and public health practice.


Background
As declared by the World Health Organization (WHO), mental health is not solely characterized by the lack of negative symptoms or the absence of mental disorders [1]. Increasingly, the definition of mental health also incorporates the presence of psychological resources, encompassing both hedonic (subjective well-being) and eudemonic (psychological functioning) aspects [2,3]. The WHO has even declared positive mental health to be the foundation for well-being and effective functioning for both the individual and the community [1]. The positive approach towards mental health has been applied in numerous areas [4]. For example, in clinical psychology it has been shown that interventions focusing on strengths and positive emotions can be as effective in treating mental disorders as more traditional approaches like cognitive behavioral therapy [5,6]. Another example is the increased focus on positive attributes (assets) of people and/or communities within the area of public health [7]. Despite these advances, the field of positive mental health continues to be under-researched, partly because of the lack of appropriate population-based measures [8].
The Warwick-Edinburgh Mental Well-being Scale (WEMWBS) was developed to meet the need for a psychometrically sound measure of positive mental health [9]. The scale was derived from the "Affectometer 2", a mental well-being scale with several favorable psychometric properties, but also with important limitations with regard to social desirability bias, item redundancy, and scale length [10]. Based on the literature, validation results of the Affectometer 2, and input from focus groups, an expert panel agreed on key concepts and items that should be part of the new and improved scale. The key concepts were "positive affect and psychological functioning" (including autonomy, competence, self-acceptance, and personal growth), and "interpersonal relationships". The final scale consisted of 14 positively worded items [9].
In the UK, the WEMWBS has been considered as an appropriate tool to measure mental well-being in different samples, such as overall population samples [9,11,12], students [9,13], teenagers [14,15], clinical samples [16,17], and ethnic minority samples [18]. Yet, high values for Cronbach's alpha led to the suspicion that item redundancy could be an issue for the WEMWBS as well. As a result, the 7-item short WEMWBS (SWEMWBS) was developed [19,20]. The SWEMWBS has been preferred in terms of its psychometric properties and its convenience for monitoring positive mental health. However, it presents a more restricted definition of mental well-being as it mainly encompasses hedonic items. The WEMWBS may therefore be preferred when content coverage is an issue [19,20]. It should also be noted that a recent study indicated limited discriminant validity of the WEMWBS when compared to the General Health Questionnaire (GHQ-12), a common mental distress measure [21].
To date, the WEMWBS has been translated into more than a dozen languages, including Hindi, Urdu and Arabic [22]. Some of these translations have been validated and published, including Dutch, Italian, Spanish, and Portuguese versions [11,23-26]. The Chinese, Norwegian, and Swedish versions of the SWEMWBS have also been validated and published [17,27]. The full version of the WEMWBS has not yet been validated in Norway, and the validation of the Norwegian version of the SWEMWBS by Haver et al. [27] was conducted among Norwegian hotel managers only. It is therefore necessary to assess whether their findings can be generalized to other populations as well.
Several of the previous validation studies of the (S)WEMWBS (this abbreviation includes both WEMWBS and SWEMWBS) found evidence against the original proposed 1-factor structure [9,13,18,27] as indicated by poor model fit. This may suggest multidimensionality, but a meaningful additional factor has not been identified so far [18]. Some studies improved model fit by including a number of correlated error terms to the 1-factor model [9,27]. However, a clear justification, other than improving model fit, was not given. As models with and without correlated error terms may be differentially associated with relevant other variables, it would be of interest to examine the impact of correlated error terms in more detail.
An important advantage of the WEMWBS has been its combined brief and rich description of positive mental health, which is useful in monitoring mental health in the population, as well as in evaluating mental health programs. One such program in Norway is Prompt Mental Health Care (PMHC), which is modelled after the English program Improving Access to Psychological Therapies (IAPT) [28]. Like PMHC, IAPT is a free-of-charge, low-threshold, primary health care program, aimed at reaching adults with anxiety and mild to moderate levels of depression. Cognitive behavioural therapy (CBT) is provided by multidisciplinary teams of health care professionals including at least one psychologist. PMHC was launched in Norway in 2012 and has to date been expanded to 23 sites across the country [29].
As a mental health program, PMHC aims to reduce symptoms of depression and anxiety, as well as to increase work participation, quality of life and mental well-being. The WEMWBS has been used to measure the latter outcome. The aim of the present study is to examine the psychometric properties of the original and the abbreviated WEMWBS in this Norwegian sample of primary care patients.

Participants
A total of 1858 patients received treatment at PMHC between October 2014 and February 2016 across 14 different sites. Of these, 1189 participated in the study, resulting in an overall participation rate of 64%. Participation was based on opt-in: all eligible clients were invited after the initial assessment by a PMHC therapist. All participants provided written informed consent upon recruitment. Patients were either referred to the service by their general practitioners (GPs) or contacted the free-of-charge service themselves. Eligible patients were adults with anxiety and/or low to moderate levels of depression whose home address was within the catchment area of their respective PMHC site. Patients with suspected psychosis, bipolar disorder, personality disorder, severe drug abuse, or suicide risk were generally excluded from PMHC and were referred to the GP or more specialized mental health care services. The data presented in this manuscript are based on the responses to a set of questionnaires that all participating PMHC patients completed prior to treatment. The study was approved by the Regional Committee for Medical Research Ethics in Norway (nr. 2014/597).

Measures
The Warwick-Edinburgh mental well-being scale (WEMWBS)
The WEMWBS consists of 14 items, each rated on a 5-point Likert scale ranging from "none of the time" to "all of the time". A global score was calculated by adding up the item scores, and ranges from 14 to 70. The higher the global score, the higher the level of mental well-being. The original WEMWBS showed high reliability and low social desirability bias, and confirmatory factor analysis supported the single-factor hypothesis after allowing some of the residuals to correlate [9]. Moreover, the original scale showed high positive correlations with other well-being scales, and low to moderate positive correlations with overall health [9].
The SWEMWBS consists of 7 items, and was found to have good psychometric properties as well [20,24]. Haver et al. [27] assessed the validity of the SWEMWBS among Norwegian and Swedish hotel managers, and reported acceptable psychometric properties.
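The sum scoring described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function name `sum_score` is hypothetical, it assumes complete responses coded 1 to 5, and the source does not describe how missing items were handled in the manifest scores.

```python
def sum_score(responses, n_items, low=1, high=5):
    """Add up Likert item responses after checking their count and range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    for value in responses:
        if not (low <= value <= high):
            raise ValueError(f"response {value} is outside {low}-{high}")
    return sum(responses)


# Hypothetical respondent who answers "3" on all 14 WEMWBS items.
wemwbs_total = sum_score([3] * 14, n_items=14)
print(wemwbs_total)  # 42, within the possible range of 14 to 70
```

The same function applies to the 7-item short version by passing `n_items=7`, under the assumption that the short version is also scored as a raw sum.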

Translation into Norwegian
The method of forward and back translation was used to translate the original scale to Norwegian, as advised by Health Scotland [30]. In Stage 1, the original scale was independently translated by an expert panel of four people, of whom two were native Norwegian speakers. All four were fluent in both English and Norwegian. Three had knowledge about the instrument. In stage 2, the translators agreed upon a synthesized version with a recording observer present. In stage 3, the synthesized version was translated back to the original language, English, by two additional independent translators with fluency in both English and Norwegian. In stage 4, the expert panel developed a pre-final version of the questionnaire for field testing. Finally, in stage 5 a small sample of psychology students at the University of Bergen (N = 5) completed the pre-final version of the questionnaire, and were assessed for question comprehension and interpretation. As this did not lead to further changes, the pre-final version was adopted as the final version of the Norwegian WEMWBS.

Other measures
Other measures were used to assess criterion validity of the WEMWBS. All measures included in this study were self-administered.
The Patient Health Questionnaire (PHQ-9) was used to measure depressive symptoms [31]. It included 9 items, one for each of the DSM-IV criteria for depression, each scored from 0 ("none of the time") to 3 ("all of the time"). This yielded a total sum score that ranged from 0 to 27. Cronbach's alpha was .85 in the current sample.
The Generalized Anxiety Disorder Assessment (GAD-7) was used to measure anxiety [32]. It includes 7 items that score common anxiety symptoms, with the same response alternatives as the PHQ-9, ranging from 0 ("none of the time") to 3 ("all of the time"). The total score could range from 0 to 21. In addition to measuring generalized anxiety disorder, there are indications that the GAD-7 also has good sensitivity and specificity for panic, social anxiety, and post-traumatic stress disorder [32]. Cronbach's alpha was .87 in the current sample.
The EQ-5D-5L was used to measure functional health status [33]. It included 5 items measuring mobility, self-care, usual activity, pain/discomfort, and anxiety/depression, each scored from 1 (no problem) to 5 (extreme problem/unable to carry out activity). This yielded a total sum score that ranged from 5 to 25. The higher the score, the lower the level of functional status. Cronbach's alpha was .66 in the current sample.
An abbreviated version of the Mindful Attention Awareness Scale (MAAS-5) was used to measure awareness of and attention to whatever is happening in the present [34-36]. It included 5 items, each scored from 1 to 6, from which a mean can be computed. Higher scores reflect higher levels of mindfulness. Cronbach's alpha was .87 in the current sample.
The Brief Self-Control Scale (BSCS) is a short version of the Self-Control Scale, which measures five domains: controlling thoughts, controlling emotions, controlling impulses, regulating behavior/performance, and habit-breaking [37]. It included 13 items. In the Norwegian version, responses ranged from 1 (disagree very strongly) to 6 (agree very strongly) for 5 items, and from 1 (never) to 6 (always) for the remaining 8 items. Higher scores reflect lower levels of self-control. Cronbach's alpha was .77 in the current sample.

Statistical analyses
Prior to analyses, the original dataset (N = 1168) was divided into a training sample (≈70%) and a validation sample (≈30%) by means of a random split [38]. The training set was used for all analyses described below, while the validation set was only used to validate the final measurement models.
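A random split of this kind can be sketched as below. The source does not state how the split was drawn, so the sketch assumes each respondent is assigned independently to the training sample with probability 0.7, which yields shares that are only approximately 70% and 30%. The function name `random_split` and the seed are illustrative.

```python
import numpy as np


def random_split(n_rows, train_prob=0.7, seed=2016):
    """Assign each row to training with probability train_prob (assumed scheme)."""
    rng = np.random.default_rng(seed)
    in_training = rng.random(n_rows) < train_prob
    indices = np.arange(n_rows)
    # Rows not drawn into the training sample form the validation sample.
    return indices[in_training], indices[~in_training]


train_idx, valid_idx = random_split(1168)
print(len(train_idx), len(valid_idx))
```

Fixing the seed keeps the split reproducible, so the validation sample stays untouched across all analyses on the training sample.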
Item level descriptive statistics were calculated to examine the distributional properties of the WEMWBS.
For each WEMWBS item, a univariate ordinal probit variance component model was fitted in order to examine whether it was necessary to account for the cluster effects of pilot site (average cluster size = 64). Intraclass correlation coefficients (ICCs) were calculated for each item as the ratio of the residual between-group variance to the total variance (probit variance at the within-group level was standardized to 1). The largest ICCs were found for item 13 (ICC = .009) and item 4 (ICC = .013). The ICCs of all other items were ≤.002. Given the very small ICCs across all items, accounting for the cluster effect of pilot site was deemed unnecessary.
A parallel analysis was carried out to determine whether a multidimensional factor structure was supported in our data based on the eigenvalues from the sample correlation matrix. For this particular analysis, we treated the 5-point Likert scale of the WEMWBS as continuous. The 95th percentile criterion was used to decide on the number of factors. Results indicated that only the first eigenvalue of the sample correlation matrix (λ1 = 6.45; λ2 = 1.19) was larger than the corresponding 95th percentile eigenvalue of the parallel analysis (λ1p = 1.27; λ2p = 1.21), which supported the 1-factor structure of the WEMWBS.
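The ICC definition used here, with the within-group probit variance fixed at 1, reduces to a one-line formula. The sketch below is illustrative: the function names are hypothetical, and the between-site variance is back-calculated from the reported ICC rather than taken from the source.

```python
def probit_icc(between_variance, within_variance=1.0):
    """ICC = between-group variance / (between-group + within-group variance)."""
    if between_variance < 0:
        raise ValueError("variance cannot be negative")
    return between_variance / (between_variance + within_variance)


def between_variance_from_icc(icc):
    """Invert the ICC formula for a within-group variance of 1."""
    return icc / (1.0 - icc)


# Between-site variance implied by the largest reported ICC (.013, item 4).
implied_variance = between_variance_from_icc(0.013)
print(round(implied_variance, 4), round(probit_icc(implied_variance), 3))
```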
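The logic of a parallel analysis with the 95th percentile criterion can be sketched as follows. This is a generic version of Horn's procedure on Pearson correlations, not the authors' Mplus run: the function name, the number of simulations, and the simulated one-factor data (14 items, loading 0.65, 799 rows) are all assumptions made for illustration.

```python
import numpy as np


def parallel_analysis(data, n_sims=500, percentile=95, seed=0):
    """Compare sample eigenvalues with those of random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    # eigvalsh returns ascending eigenvalues; reverse to get largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    simulated = np.empty((n_sims, n_items))
    for s in range(n_sims):
        noise = rng.standard_normal((n_obs, n_items))
        simulated[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    reference = np.percentile(simulated, percentile, axis=0)
    # Retain factors up to the first eigenvalue that fails to exceed its reference.
    exceeds = observed > reference
    n_factors = n_items if exceeds.all() else int(np.argmin(exceeds))
    return observed, reference, n_factors


# Simulated one-factor data, for illustration only.
rng = np.random.default_rng(1)
factor = rng.standard_normal((799, 1))
items = 0.65 * factor + np.sqrt(1 - 0.65**2) * rng.standard_normal((799, 14))
observed, reference, n_factors = parallel_analysis(items, n_sims=200)
print(n_factors)
```

A factor is retained only while the sample eigenvalue exceeds the corresponding reference eigenvalue, which is the comparison reported in the paragraph above.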
Confirmatory factor analysis based on the polychoric correlation matrix was fitted to the data, which provides statistical results analogous to Samejima's Graded Response Model [39]. This model was used to assess the unidimensional fit and the construct validity of both the full Norwegian WEMWBS and SWEMWBS (indicators specified as categorical, robust weighted least squares estimator (WLSMV), parameterization Theta). The mean and variance of the latent factor were set to 0 and 1, respectively. WLSMV allows for partially missing data. Only 1.2% of the participants did not complete any of the WEMWBS items, whereas the percentage of missingness per item varied between .9% and 2.5%. The Root Mean Square Error of Approximation (RMSEA) and the Comparative Fit Index (CFI) were used as goodness of fit measures. An RMSEA close to or lower than .06 and a CFI close to or higher than .95 were taken to indicate good model fit [40]. The initial 1-factor models (WEMWBS and SWEMWBS) were specified without correlated errors. If model fit was poor, correlated errors were added in a stepwise fashion based on the largest standardized expected parameter change (SEPC) until adequate fit statistics were obtained [41], similar to the procedure adopted by Tennant et al. [9]. The final models were cross-validated in the validation sample by specifying a model with fixed parameters based on the estimates obtained from the training set.
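For readers unfamiliar with the two fit indices, the sketch below gives their common textbook definitions and applies the cutoffs named above. It is an illustration under stated assumptions: the function names are hypothetical, the chi-square inputs are made up, the "close to" wording is simplified to strict cutoffs, and Mplus's WLSMV implementation may differ in detail (for example, in the sample-size term).

```python
import math


def rmsea(chi2, df, n):
    """Textbook RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Textbook CFI: 1 - noncentrality of the model / noncentrality of the baseline."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_baseline == 0 else 1.0 - d_model / d_baseline


def good_fit(rmsea_value, cfi_value, rmsea_cut=0.06, cfi_cut=0.95):
    """Strict reading of the cutoffs; the source allows values 'close to' them."""
    return rmsea_value <= rmsea_cut and cfi_value >= cfi_cut


# Hypothetical chi-square values, chosen only to make the arithmetic easy to follow.
print(rmsea(200.0, 100, 101), cfi(200.0, 100, 1100.0, 100))
# Fit statistics reported for the WEMWBS: initial model, then final model.
print(good_fit(0.11, 0.90), good_fit(0.06, 0.99))
```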
Measurement invariance was examined separately for gender (male vs female), and equal sized age groups (18-30; 31-43; ≥44). First, configural invariance was tested by estimating the 1-factor model of the WEMWBS in each group without constraining factor loadings and intercepts. In the next step, metric invariance was tested by constraining the factor loadings to be equal in each group. Due to the categorical nature of the indicators, additional constraints were required to test metric invariance; the first threshold of each item was held equal across groups and the second threshold of the item that was used to set the metric of the factor was also held equal across groups [42]. Finally, both factor loadings and thresholds were constrained to be equal across groups to test for scalar invariance. Scalar invariance is required for comparing absolute scores across groups. In many cases, full scalar invariance cannot be obtained and one or more of the constrained model parameters need to be set free in order to improve model fit. According to recommendations from Byrne et al. [43] partial scalar invariance is obtained when at least two of the factor indicators are invariant. Testing strict measurement invariance, in which residual variances are fixed to one across groups, was considered less relevant for the present study since correlated errors were explicitly accounted for in the measurement model [44]. Adjustments to the model were informed by SEPC. In line with Cheung and Rensvold [45], we used a change in CFI of more than 0.01 as an indicator of true difference in relative model fit.
To examine the impact of taking into account measurement error and correlated error terms on structural parameter estimates, the correlations with the criterion variables were calculated for latent (S)WEMWBS scores with and without correlated errors, and for manifest (S)WEMWBS scores. This resulted in three correlations per WEMWBS version for each criterion variable. To express the relative difference between these three estimates, relative bias, (|r| − |r_ref|) / |r_ref|, was calculated using the correlations between the criterion variables and the latent scores from the (S)WEMWBS model with correlated errors as the reference (r_ref). Small relative bias for the estimated structural correlations based on the latent (S)WEMWBS model without correlated error terms, as compared to the model with correlated error terms, would support the justification of the latter model. Not accounting for measurement error typically attenuates the size of the correlation between variables, and lower reliability would therefore result in larger relative bias when applying manifest sum scores. Finally, mean WEMWBS scores across age and gender were examined. We expected similar levels of mental well-being across age groups, and men to have higher levels of mental well-being than women [9].
The Statistical Package for the Social Sciences (SPSS) version 22 was used to prepare the data file and for basic descriptive statistics. Mplus version 7.11 was used for all other analyses.
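The relative bias calculation can be written out as a short function. The correlations in the example are hypothetical and are not taken from Table 3; the function name and the 10% flag (the threshold that the Discussion attributes to Forero et al. [47]) are used here only for illustration.

```python
def relative_bias(r, r_ref):
    """(|r| - |r_ref|) / |r_ref|, with r_ref from the model with correlated errors."""
    if r_ref == 0:
        raise ValueError("reference correlation must be non-zero")
    return (abs(r) - abs(r_ref)) / abs(r_ref)


# Hypothetical values: latent reference correlation and an attenuated manifest one.
r_reference = -0.50
r_manifest = -0.44
bias = relative_bias(r_manifest, r_reference)
exceeds_ten_percent = abs(bias) > 0.10
print(round(bias, 2), exceeds_ten_percent)
```

A negative value indicates that the compared correlation is smaller in absolute size than the reference, which is the pattern expected when measurement error attenuates manifest scores.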

Sample and item characteristics
The training sample included 799 participants of which 73% were women, 44% had higher education than secondary school, and 61% were living with a partner or spouse. Mean age was 37.3 (SD = 12.6). Forty percent were employed, 34% were on sick leave, 5% were receiving disability pension, and 21% belonged to another occupational status category.
As displayed in Table 1, all categories were used and there were at least 6 responses for each answer category across all items. There was evidence for skewness for some of the items, in particular items 3, 5, 10 and 13, while kurtosis was most pronounced for items 9 and 12. In histograms, the WEMWBS and SWEMWBS sum scores seemed normally distributed, and there were no indications for floor and/or ceiling effects. The mean manifest sum scores of the WEMWBS and the SWEMWBS were 38.6 (SD = 8.9) and 20.0 (SD = 4.5).

Construct validity
Confirmatory factor analysis was conducted to test the hypothesized one-factor structure of the Norwegian version of the WEMWBS. The initial model, assuming no dependencies among residuals, showed poor fit (CFI = 0.90; RMSEA = 0.11). After adding correlated error terms in a stepwise fashion, adequate fit statistics were obtained after 10 steps (CFI = 0.99; RMSEA = 0.06), as seen in Table 2.
The final model was tested in the validation sample and displayed adequate fit statistics (CFI = 0.99; RMSEA = 0.03). The same analyses were conducted to test the hypothesized one-factor structure of the Norwegian short version of the scale, the SWEMWBS. Like the 14-item model, the 7-item model assuming no dependencies among residuals showed relatively poor fit (CFI = 0.95; RMSEA = 0.11). Adequate model fit (CFI = 0.99; RMSEA = 0.06) was obtained after adding 5 correlated error terms. As can be seen in Table 2, adding correlated error terms had relatively little effect on the discrimination parameters (factor loadings). The final model of the SWEMWBS was tested in the validation sample and displayed adequate fit statistics as well (CFI = 0.98; RMSEA = 0.04). The correlation between the manifest scores of the WEMWBS and the SWEMWBS was .95.

Measurement invariance
The configural model for the WEMWBS with correlated errors yielded an acceptable fit for both gender (RMSEA = .06; CFI = .98) and age group (RMSEA = .07; CFI = .98). Subsequent estimation of the metric and scalar models yielded acceptable model fit statistics as well. For gender, ΔCFI was <.001 for the metric vs configural comparison and .001 for the scalar vs metric comparison. For age, ΔCFI was .004 for the metric vs configural comparison and .002 for the scalar vs metric comparison. Similar results were obtained when testing the measurement invariance of the SWEMWBS.
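The decision rule applied to these comparisons (a change in CFI of more than 0.01 indicating a true difference in fit) can be checked directly against the reported values. The sketch below is illustrative; the names are hypothetical, and the value reported as "<.001" is entered as its upper bound.

```python
DELTA_CFI_CUTOFF = 0.01  # criterion attributed to Cheung and Rensvold [45]

# Reported changes in CFI for the WEMWBS between nested invariance models.
delta_cfi = {
    ("gender", "metric vs configural"): 0.001,  # reported as <.001 (upper bound)
    ("gender", "scalar vs metric"): 0.001,
    ("age", "metric vs configural"): 0.004,
    ("age", "scalar vs metric"): 0.002,
}


def invariance_supported(change, cutoff=DELTA_CFI_CUTOFF):
    """The more constrained model is retained when CFI changes by at most the cutoff."""
    return abs(change) <= cutoff


results = {key: invariance_supported(value) for key, value in delta_cfi.items()}
for (grouping, comparison), supported in results.items():
    print(grouping, comparison, supported)
```

All four comparisons fall well below the cutoff, which is consistent with the conclusion that scalar invariance holds for both gender and age group.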

Precision
As displayed in Fig. 1, the conditional precision for the WEMWBS was >.90 within ±2 SD of the mean for both the initial model without correlated errors and the final model with correlated errors. For the SWEMWBS, the conditional precision was >.80 within ±2 SD of the mean for the models with and without correlated errors. For comparison, Cronbach's alpha was .91 and .83 for the WEMWBS and the SWEMWBS, respectively. It should be noted that the average inter-item correlation for the WEMWBS was .42 (18.7% of inter-item correlations >.5, range = .21 to .73), whereas the average inter-item correlation for the SWEMWBS was .41 (23.8% of inter-item correlations >.5, range = .24 to .61). The lower alpha for the SWEMWBS therefore seems to be primarily the result of fewer items, and not per se due to the removal of redundant items.
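The argument that the lower alpha mainly reflects the number of items can be illustrated with the standardized alpha formula, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r the average inter-item correlation. This is an approximation to the raw Cronbach's alpha reported above, and the function name is hypothetical.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return k * r / (1 + (k - 1) * r)


# Reported item counts and average inter-item correlations.
alpha_full = standardized_alpha(14, 0.42)
alpha_short = standardized_alpha(7, 0.41)
# Counterfactual: 7 items with the full scale's average inter-item correlation.
alpha_short_same_r = standardized_alpha(7, 0.42)
print(round(alpha_full, 2), round(alpha_short, 2), round(alpha_short_same_r, 2))
```

Under this approximation the formula gives values close to the reported alphas, and halving the number of items lowers alpha by roughly the same amount even when the average inter-item correlation is held at .42.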

Criterion validity
The correlation of the mental well-being scores with depressive symptoms varied between −.67 and −.62, in line with expectations (see Table 3). Moderate correlations were found with anxiety (−.46 < r < −.46) and functional health status (−.47 < r < −.44). The correlation with mindfulness was low (.27 < r < .29), whereas the correlation with self-control was somewhat higher than expected (−.37 < r < −.35). Still, this correlation was significantly lower than the correlations with anxiety and functional status. Relative bias appeared to be small in most cases, except for manifest SWEMWBS scores. For three out of five correlations with manifest SWEMWBS scores, the relative bias exceeded 10% (Table 3).

Discussion
There has been an increased interest in measuring positive aspects of mental health [46]. The WEMWBS was developed in the UK as a broad measure of mental well-being with good psychometric properties, capturing both hedonic and eudemonic aspects. The primary aim of this study was to validate the WEMWBS and its abbreviated version (SWEMWBS) in a sample of Norwegian primary health care patients who suffer from anxiety and/or mild-to-moderate depression. The unidimensional nature of the WEMWBS was confirmed by means of a parallel analysis, and CFA indicated acceptable model fit after including a number of correlated error terms. The latter is generally considered bad practice, as correlated error terms should only be added when they can be theoretically justified [44]. Correlated error terms can alter the meaning of the underlying latent construct, and can bias structural parameter estimates. However, in the case of the WEMWBS, our results showed that the relative bias in structural parameter estimates of associations with relevant criterion variables was small for the latent (S)WEMWBS scores with and without correlated error terms, and for manifest WEMWBS scores. As expected, the relative bias was somewhat higher for manifest SWEMWBS scores, as 3 out of 5 correlations had relative biases exceeding 10%. Forero et al. [47] labeled a relative bias of more than 10% as "substantial", but a general consensus on what constitutes substantial bias seems to be lacking. Nonetheless, our finding suggests that relative bias should be taken into account when using manifest SWEMWBS scores.
In contrast to some previous studies [19,20], the present study did not formally test whether the SWEMWBS fitted the Rasch model. However, given that it did not fit the less restrictive graded response model, it would be safe to conclude that the Rasch model did not hold in this sample of Norwegian primary health care patients. Future studies should examine whether it's possible to derive an   [9,19,20,24]. No evidence was found for floor or ceiling effects, suggesting the scale has potential to examine change during the course of treatment. In addition, the precision estimates of both WEMWBS versions were good. Full scalar measurement invariance was obtained for gender and age group, suggesting that meaningful comparisons of the WEMWBS scores across these groups can be made. Previous studies found measurement invariance across age [15], but not across gender [15,19]. Associations with criterion variables were largely as expected, and in line with previous findings [9,11,15,19,27].

Limitations
A number of limitations should be mentioned. The sample consists of primary health care patients only, and the findings may therefore not be extended to the general Norwegian population. Moreover, responsiveness of the Norwegian versions of WEMWBS and SWEMWBS should be assessed in future studies in order to address whether they are responsive to change, as was the original scale [48]. The current study provided limited information on the discriminant validity of the WEMWBS. In the light of recent findings [21], more research is needed to determine how different the WEMWBS is from other competing measures. Finally, test-retest reliability and measurement invariance across time were not tested.

Conclusion
In summary, both the full and short WEMWBS scales appear to be valid and precise instruments to measure well-being in Norwegian primary health care patients with anxiety and/or mild-to-moderate depression. The SWEMWBS has a clear advantage over the WEMWBS due to its brevity. Future studies are warranted to confirm these findings in similar and other populations, and extend the validation work as outlined in the limitations section. The results of the present study encourage the use of mental well-being as an outcome in epidemiological, intervention and evaluation studies, and may as such be valuable for both research and public health practice.