The health and suffering scale: Item reduction, reliability and validity among women undergoing rehabilitation for exhaustion and long‐lasting pain

Abstract
Aim: To investigate the necessity of an item reduction and to evaluate estimates of dimensionality, reliability and validity of the Health and Suffering Scale among two groups of women, one undergoing rehabilitation for exhaustion and long‐lasting pain and one reference group.
Design: Psychometric evaluation of the scale using cross‐sectional data.
Method: The Health and Suffering Scale is a self‐report scale which measures perceived suffering in relation to health on a semantic visual analogue scale. Classical and modern test theory were applied for item reduction and to explore estimates of reliability and validity.
Results: The Health and Suffering Scale was found to be unidimensional; nine of the original twenty items were part of a consistent factor structure and hierarchical order. These items were internally consistent, discriminated between patients and healthy respondents, and had an excellent level of separation of individuals experiencing various levels of health and suffering. Re‐test reliability estimates were moderate.


| BACKGROUND
Widely used instruments measuring health-related quality of life are mainly conceptualized around individuals' function and symptoms as well as expectations and concerns in everyday life (Brazier et al., 1992; Herdman et al., 2011; WHO, 2002). The shortcoming of instruments developed within the medical paradigm is that existential signs of health are either rarely touched or are approached in an objectified way rather than as an inner subjective experience.
We argue that nurses and caring scientists should complement each other with appropriate resources to estimate patients' health from a caring perspective. Caring science emphasizes that health cannot be understood without taking the phenomenon of suffering into consideration (Arman et al., 2015; Eriksson, 2006). Suffering is regarded as an inseparable part of life; thus, understanding and integrating suffering into one's life is necessary in order to perceive health.
Suffering that is understood and integrated into life becomes a bearable and natural part of health (Arman et al., 2015; Eriksson, 2006).
According to theory, this back-and-forth movement between nonintegrated, unbearable suffering and integrated, bearable suffering can be affected by various coincidences in life (Eriksson, 2006).
Acknowledging these existential signs of health is especially important in encounters with patients living with long-term disease or patients who are going through decisive periods of life (Rehnsfeldt & Eriksson, 2004). Challenging life situations usually involve suffering, and existential caring encounters are aimed at making suffering bearable through the creation of meaning (Rehnsfeldt & Eriksson, 2004).
In line with an ontological understanding of health, Andermo et al. (2018) developed twenty items based on empirical data and nursing theory of health and suffering as outlined above. The items intend to capture an individual's balance of health and suffering on a visual analogue scale (VAS) between word pairs reflecting health and suffering. In the initial phase of development, the item collection was considered to be multidimensional (Andermo et al., 2018).
When developing a new scale, redundant or dysfunctional items are to some degree expected but undesirable (Streiner et al., 2015).
Thus, the necessity of an item reduction for psychometrical reasons must be considered in scale development. The original context of item development was in rehabilitation for long-term disease, pain and exhaustion among mainly female patients. The reason for this was that exhaustion and long-lasting pain are the most common reasons for sickness absence and are demanding public health problems both in Sweden (Swedish Insurance Agency, 2018, 2020) and the European Union (Breivik et al., 2013; Cimmino et al., 2011; Milczarek et al., 2009). A point of particular concern is that exhaustion and long-lasting pain are more prevalent in women than in men (Breivik et al., 2013; Cimmino et al., 2011; Norlund et al., 2010; Purvanova & Muros, 2010; Swedish Insurance Agency, 2018, 2020). It is both a question of validity and an ethical matter that self-report instruments used in clinical practice and research are perceived as relevant by the individuals themselves; otherwise, people's subjective perspectives risk being insufficiently considered in evidence-based care (Hagell et al., 2009). The twenty items intend to capture individuals' inner subjective experience of their health and were found to be perceived as relevant and meaningful (Andermo et al., 2018). To maintain continuity in the scale development, the current psychometric evaluation of the items was performed among women in a rehabilitation context. The aims of the study were to investigate the necessity of an item reduction and to evaluate estimates of dimensionality, reliability and validity of the Health and Suffering Scale among two groups of women, one undergoing rehabilitation for exhaustion and long-lasting pain and one reference group.

| DESIGN
In this second phase of development of the Health and Suffering Scale, psychometric properties of the scale and the items were tested using cross-sectional data.

| Sample
Women were selected consecutively at a rehabilitation clinic for exhaustion and long-lasting pain in Sweden. There were 297 eligible patients and the response rate after two reminders was 56.2%, yielding 167 participants. One participant's responses in the Health and Suffering Scale (HSS) were all missing and were excluded from the analysis, yielding 166 participants. The sample consisted of 94 participants (56.6%) undergoing rehabilitation for long-lasting pain and 69 participants (41.6%) undergoing rehabilitation for exhaustion.
Three participants (1.8%) received another kind of rehabilitation or had finished the rehabilitation program more than 3 months before (Table 1). More than a third of women (39.8%, N = 66) were working in human service professions, including childcare and teaching.
A reference sample consisted of nurses, occupational therapists and physiotherapists studying in health care programs for specialization within their profession at a medical university in Sweden. The sample was selected consecutively for known-group validation and calculation of test re-test reliability. Students at specialization level were chosen for known-group validation foremost because they were expected to be healthy women, which is a central criterion for testing the scale's ability to differentiate between healthy and suffering women. Further, they matched the patients to some degree with regard to employment within human service professions and midlife situation. There were 209 eligible students and the response rate after two reminders was 61.7%, yielding a final sample of 129 participants. Response rate for the re-test was 83.7% (N = 108). Eight students (6.2%) reported having received rehabilitation for long-lasting pain or exhaustion during the last 3 months.
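The reported response rates and subgroup shares follow directly from the counts given above. The short Python sketch below re-derives them as a consistency check; the helper name `percent` is illustrative and not part of the study's analysis.

```python
# Re-derive the percentages reported in the Sample section from the raw counts.
# `percent` is a hypothetical helper introduced only for this check.
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

patient_response = percent(167, 297)   # responding patients / eligible patients
student_response = percent(129, 209)   # responding students / eligible students
retest_response = percent(108, 129)    # re-test responders / first-round students
pain_share = percent(94, 166)          # pain rehabilitation / analysed patients
exhaustion_share = percent(69, 166)    # exhaustion rehabilitation / analysed patients

print(patient_response, student_response, retest_response)
print(pain_share, exhaustion_share)
```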

| Instruments
The Health and Suffering Scale (HSS) was developed on both an empirical and theoretical basis (Andermo et al., 2018). It is a self-report scale consisting of 20 items that intends to measure perceived suffering in relation to health on a semantic visual analogue scale (VAS). Perceived suffering is reported in relation to perceived health on a VAS between word pairs reflecting health and suffering according to the theory of Eriksson (2006), for example, "lost grip on life - understanding about life" or "life without meaning - meaningful life" (Figure 1). Two of the 20 items directly reflect the concepts of health and suffering: "Barriers to health - Health" and "Unbearable suffering - bearable suffering". The remaining 18 items were initially related to five sub-domains of health and suffering: life passion and energy, presence in life, relationships, personal freedom and meaning. The VAS registers 101 steps from 0 to 100.

| Data collection
Patients and students were orally informed about the study in the middle of the 10-week rehabilitation period and in connection with a lecture, respectively, and could voluntarily sign up for an information letter including a personal link to the study's web survey by email.
A first and a second reminder were sent 2 and 4 weeks after the information letter had been sent. Data were collected between October 2018 and August 2019.
Participants entered the web survey by a personal link.
The web survey was identical for both samples, consisting of sociodemographic questions, HSS, KEDS and a third instrument not evaluated in the current study. Patients were asked to complete the survey once, whereas students were asked to complete the HSS for a re-test 2 weeks after having submitted the survey for the first time.

FIGURE 1 Example of four Health and Suffering Scale items taken from the web survey

TABLE 1 Sociodemographic characteristics of participants
Students were asked to participate in a re-test because they were expected to be more stable in their health experience than patients under rehabilitation. The median (IQR) time between answering test and re-test was 15 (13-20) days.

| ANALYSIS
Psychometric hypotheses guiding the analysis:

1. Dimensionality
a. The HSS is expected to be multidimensional with factors having a reasonable share of the total variance.
b. The factor structure is simple (items load only on one factor) and consistent between both samples.

2. Item reduction
a. Item reduction will be necessary for items with a unique variance >0.5 and inconsistent hierarchical position between samples.

3. Reliability estimation
a. Coefficient α is between 0.7 and 0.9 for each extracted factor.
b. The paired differences of the test and re-test sample come from a distribution with zero median and the ICC estimate is categorized as substantial (> 0.8).
c. The HSS can separate between the two samples and individuals perceiving various levels of suffering within each sample.

4. Construct validation
a. Item hierarchy is meaningful from a theoretical point of view and consistent in both samples.
b. The HSS targets the patient group better than the reference sample and differentiates between the two samples.

| Dimensionality and item reduction
Factor analysis was applied separately in the two data sets, both to explore dimensionality of HSS and for item reduction (MATLAB version R2019b, MathWorks). Factor solutions consisting of factors with eigenvalues >1 were investigated and estimated factor loadings were rotated according to both the varimax and promax methods. The communality of a variable should be >0.5 and a factor should not consist of fewer than three variables (Norman & Streiner, 2014). Factor loadings exceeding 0.5 are considered practically significant and factor loadings >0.7 indicate a clear factor structure (Hair et al., 2016).
The MATLAB function 'factoran' returns maximum likelihood estimates of the factor loading matrix and the specific variances.
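The thresholds for loadings, communality and specific (unique) variance are linked through the common factor model. The relations below are a sketch under two assumptions that are not stated in the text: items are standardized and the common factors are uncorrelated (as after a varimax rotation; with an oblique rotation such as promax, the communality also depends on the factor correlations). The symbols $\lambda_{ij}$ (loading of item $i$ on factor $j$), $h_i^2$ (communality) and $\psi_i$ (specific variance) are introduced here for illustration.

```latex
% Assumes standardized items and orthogonal (uncorrelated) common factors.
\begin{align*}
x_i &= \sum_{j=1}^{m} \lambda_{ij} f_j + e_i, \\
1 = \operatorname{Var}(x_i) &= \underbrace{\sum_{j=1}^{m} \lambda_{ij}^{2}}_{h_i^{2}\ \text{(communality)}}
  + \underbrace{\psi_i}_{\text{specific variance}}, \\
m = 1:\quad |\lambda_i| > 0.5 &\;\Rightarrow\; h_i^{2} > 0.25, \qquad
|\lambda_i| > 0.7 \;\Rightarrow\; h_i^{2} > 0.49, \\
\psi_i < 0.5 &\;\Longleftrightarrow\; |\lambda_i| > 1/\sqrt{2} \approx 0.707 .
\end{align*}
```

Under these assumptions, requiring a communality >0.5 is the same as requiring a specific variance <0.5, and in a 1-factor solution this corresponds to a loading of roughly 0.71 or more.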
The patient sample skewness was −0.14 and kurtosis was 2.36. The distribution of the reference sample was moderately skewed to the left (−0.69) and had a kurtosis of 3.36. Thus, the assumption of normality was not severely violated, considering that a normal distribution has a skewness of 0 and a kurtosis of 3. Further, exploratory factor analysis is robust against moderate deviations from normality (Norman & Streiner, 2014).
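For readers who want to reproduce this kind of normality check, the following Python sketch computes moment-based skewness and (non-excess) kurtosis, for which a normal distribution gives 0 and 3. It assumes the population (biased) moment definitions; whether the study used exactly these definitions is not stated, and the input values are toy numbers, not study data.

```python
# Moment-based skewness and non-excess kurtosis (normal distribution: 0 and 3).
# Assumption: population (biased) central moments; toy data, not study data.
def skewness_and_kurtosis(values):
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt

# A symmetric toy sample has zero skewness.
skew, kurt = skewness_and_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)
```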

| Reliability estimation and construct validation
The Andrich rating-scale model ( (Wright & Linacre, 1994). An instrument discriminates a sample into three or four levels for person reliability >0.9 and a person separation index of three has been described as an excellent level of separation (Boone et al., 2013).
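The correspondence between a person reliability of 0.9 and a separation index of three can be illustrated with the relations commonly used in Rasch measurement. The sketch below assumes the conventional formulas for the separation index, G = sqrt(R / (1 − R)), and for the number of statistically distinct strata, H = (4G + 1) / 3; these formulas and the function names are assumptions added for illustration and are not quoted from the text.

```python
import math

# Assumed standard Rasch relations (not quoted from the source):
#   separation index G = sqrt(R / (1 - R)), strata H = (4G + 1) / 3
def person_separation(reliability):
    """Separation index implied by a reliability coefficient in [0, 1)."""
    return math.sqrt(reliability / (1.0 - reliability))

def strata(separation):
    """Number of statistically distinct levels implied by a separation index."""
    return (4.0 * separation + 1.0) / 3.0

g = person_separation(0.9)   # reliability of 0.9
h = strata(g)
print(round(g, 2), round(h, 2))
```

With a reliability of 0.9 the separation index is about 3, and the strata formula gives roughly four distinguishable levels, which is in line with the "three or four levels" described above.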
Internal consistency was evaluated with Cronbach's alpha coefficient (Leontitsis, 2020). The intraclass correlation coefficient (ICC) was calculated to estimate test re-test reliability according to Shrout and Fleiss (1979), convention (2,1), with a 95% confidence interval (Qin et al., 2019; Zoeller, 2020). The ICC form was selected based on the following assumption: in the test re-test, subjects provided observations on both measurement occasions, which requires analysis by a 2-way model (Weir, 2005). Addressing both systematic and random error in the estimation of the ICC was of interest in order to take agreement (versus consistency) of individual observations between the two measurements into account (McGraw & Wong, 1996; Weir, 2005).
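Both coefficients can be computed from a subjects-by-columns score matrix. The Python sketch below is a minimal illustration, not the MATLAB routines cited above: `cronbach_alpha` uses the usual variance-ratio formula and `icc_2_1` uses the two-way mean squares of the Shrout and Fleiss (2,1) form (absolute agreement, single measurement). Function names and the 6 × 4 toy matrix are assumptions made for the example; confidence intervals are omitted.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (columns = items)."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def icc_2_1(rows):
    """rows: one list per subject (columns = measurement occasions)."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-occasions mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Toy 6 x 4 matrix (illustrative values, not study data)
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
alpha = cronbach_alpha(toy)
icc = icc_2_1(toy)
print(round(alpha, 3), round(icc, 3))
```

In the toy data the columns differ systematically in level, so the agreement-type ICC(2,1) is much lower than alpha, which ignores such systematic differences. This is the reason given above for choosing an agreement form for the test re-test comparison.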
Health and Suffering Scale data were missing for 9.8% of observations among patients versus 1.6% among students. For all missing values in the patient sample, no slider movement was registered.
The VAS slider in the web survey was on zero by default and participants might not have moved the slider because it already appeared to be in the position zero, reflecting unbearable suffering. In comparison, the percentage of missing values in KEDS-generated data was 1.4% among patients versus 0% among students. We assumed that the true percentage of missing values in the HSS-generated patient data was comparable to that observed in the student data and the KEDS-generated data. Missing values were therefore treated as value zero.

| RESULTS
Women in the patient sample were older, had a lower level of education, were more often single, and were more often job seeking than women in the reference sample (Table 1).

| Dimensionality
The exploratory factor analysis of the patient data identified two factors with eigenvalues >1 (11.2, accounting for 56% of the total variance and 1.4, accounting for 7% of the variance of the 20 items).
In the reference sample, three factors with eigenvalues >1 were identified (9.7, accounting for 48.5% of the total variance, 2.1 accounting for 10.5% of the total variance and 1.2, accounting for 6% of the total variance of the 20 items). Biplots and deep scree plots indicated a strong first factor.
After this preliminary factor extraction, the 'varimax' rotation revealed that nine items were identical in the first factor and that four items were identical in the second factor. The nine identical items in the first factor had a unique variance of <0.5, whereas the four identical items in the second factor had a unique variance of >0.5. The comparison indicated inconsistency in the 2-factor solution between the two samples, and the HSS was therefore explored for unidimensionality.
The 1-factor solutions showed that all items loaded >0.5 in the patient sample compared to 13 items in the reference sample, implying that more than 25% of the variance of most items can be explained by one factor (Hair et al., 2016) (Table 2). Analysing both samples as one sample (N = 295) revealed a strong first factor with an eigenvalue of 12.16 (explaining 60.8% of the total variance) and a second factor with an eigenvalue of 1.46 (7.3% of the total variance).
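The reported shares of variance can be recovered from the eigenvalues, because the eigenvalues of a correlation matrix of 20 standardized items sum to 20. The Python sketch below applies this to the eigenvalues reported above and also applies the eigenvalue >1 retention rule; the variable and function names are illustrative.

```python
# Share of total variance per factor = eigenvalue / number of items,
# assuming the analysis is based on the correlation matrix of the 20 items.
N_ITEMS = 20

reported_eigenvalues = {
    "patients": [11.2, 1.4],
    "reference": [9.7, 2.1, 1.2],
    "combined": [12.16, 1.46],
}

def variance_share(eigenvalue, n_items=N_ITEMS):
    """Percentage of total variance explained by one factor."""
    return round(100 * eigenvalue / n_items, 1)

shares = {
    sample: [variance_share(e) for e in eigenvalues]
    for sample, eigenvalues in reported_eigenvalues.items()
}
# Eigenvalue > 1 rule: count factors explaining more than one item's variance
retained = {
    sample: sum(1 for e in eigenvalues if e > 1)
    for sample, eigenvalues in reported_eigenvalues.items()
}
print(shares)
print(retained)
```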

| Item reduction
Following a 1-factor solution, items with a unique variance of >0.5 were excluded (Table 2). According to this criterion, nine items were suggested to be retained with total consistency between Applying RSM for comparison of item hierarchies between the two samples and the re-test reference sample (Table 3)

| Reliability estimation
The probability curves of categories within the RSM showed that each part of an item measure was assigned one most probable response category with distinct probability peaks >0.

| Construct validation
The Wright map of the patient sample (Figure 2a)

| DISCUSSION
The current study found the HSS to be of unidimensional character; nine of the original twenty items were part of a consistent factor structure and hierarchical order in two different samples of women.
These items were internally consistent and discriminated between patients and healthy respondents. Further, they showed an excellent level of separation of individuals experiencing various levels of suffering in each sample. However, re-test reliability was estimated to be fair to moderate in a reference sample of women who were expected to be more stable regarding their perceived health and suffering.

TABLE 3 Item difficulty of the Health and Suffering Scale. Note: Ordered from the most difficult items at the top to the easiest item at the bottom.

TABLE 4 Health and Suffering Scale mean sum score values for known-group comparison and test-retest comparison
The study's major finding is that the original, supposedly multidimensional, instrument turned out to be unidimensional and that only nine of 20 items were retained, a procedure that risks changing the scale's core construct and therefore critically needs to be discussed.
Those nine items which contributed to a consistent factor structure in two different but also comparable contexts were closely related to caring science theory on health and suffering. From previous re- Reliability coefficients reflect the amount of error and "the extent to which a measurement instrument can differentiate among individuals" (Streiner et al., 2015, p. 161 (Streiner et al., 2015;Weir, 2005). Signs of low between-subject variability in the reference sample could be found in the Wright map.
The Wright map of the reference sample illustrated that most individual scores were high, indicating ceiling effects with the risk of not sufficiently separating students' perceived suffering outside the narrow ability range of the existing items. The scale targeted the patient sample much better, as illustrated by the corresponding Wright map, which leads to the conclusion that higher between-subject variability, and consequently higher estimates of reliability, might be expected in the patient sample. One limitation of the current study is that a test re-test was not performed in the patient sample because the patients were not regarded as stable in their health and suffering experience when undergoing rehabilitation. Even though reliability estimates are bound to the interaction between a specific study context and the instrument rather than to the instrument itself (Shrout, 1998; Streiner et al., 2015), the obtained ICC estimates have implications for the power calculation of studies using the scale. In future studies, it has to be taken into account that sample size needs to be adjusted to the probability of detecting a true effect, which in turn depends on the ability of the HSS to differentiate among individuals, that is, its reliability (Shrout, 1998; Streiner et al., 2015; Weir, 2005).
For convergent validation of the HSS, a validated clinical measure was needed that captured the health condition of both patients with pain and those with exhaustion. As pain disorders and symptoms of exhaustion disorder tend to co-occur (Borchers & Gershwin, 2015; Eller-Smith et al., 2018; Salvagioni et al., 2017; Yalcin & Barrot, 2014), the assessment instrument for exhaustion disorder syndrome, KEDS (Beser et al., 2014), was chosen as a clinical convergent measure.
When evaluating convergent validity, the association between the new construct and a similar established construct should be robust, but not "overly high" (Streiner et al., 2015, p. 240). A moderate association between exhaustion and perceived health and suffering was found, and according to the confidence interval it can be considered robust at least for the patient sample. This indicates that estimates of exhaustion are specific to the clinical context but not equivalent to the construct of health and suffering, which is more general and not bound to a specific clinical context. Although pain and symptoms of exhaustion co-occur (Yalcin & Barrot, 2014), the KEDS is no absolute measure of health for the heterogeneous group of patients suffering from pain and/or exhaustion. This might have contributed to the moderate association between clinically determined health and perceived health and suffering.

| Limitations
The psychometric properties of the HSS were evaluated within patients undergoing rehabilitation for long-lasting pain and exhaustion as well as in students in health care programs, contexts that are strongly dominated by female individuals. Thus, further evaluation of the HSS in male populations is needed. A major drawback of the chosen reference sample is that respondents were not selected from the Swedish normal population but from a university, causing significant differences in age and educational level between the samples. Further, the requirement of a sufficient sample size for factor analysis was only fulfilled at a minimum level, with subject-to-item ratios of 8:1 and 6:1. The statistical risk is that items load on the wrong factor, resulting in a misleading factor structure (Norman & Streiner, 2014). However, analysing both samples as one, implying a subject-to-item ratio of almost 15:1, reinforced the unidimensional structure of the scale. According to the item separation indexes obtained in both samples (>4, item reliability >0.9), the sample size needed for a reliable Rasch analysis was fulfilled (Boone et al., 2013). Another major limitation is the high percentage of missing items in the HSS due to the deceptive appearance of the digital slider function.
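The subject-to-item ratios quoted above can be re-derived from the sample sizes reported in the Sample section (166 patients, 129 students, 295 combined) and the 20 original items. The Python sketch below shows the arithmetic; the names are illustrative.

```python
# Subject-to-item ratios for the 20-item scale, from the reported sample sizes.
N_ITEMS = 20
sample_sizes = {"patients": 166, "reference": 129, "combined": 166 + 129}

ratios = {name: n / N_ITEMS for name, n in sample_sizes.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} subjects per item")
```

The ratios come out at about 8.3, 6.5 and 14.8 subjects per item, consistent with the rounded figures of 8:1, 6:1 and almost 15:1.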

| CONCLUSION
The study identified a nine-item version of the HSS with five response categories (Figure 3) as a unidimensional measure of perceived health and suffering with reasonable estimates of reliability and validity among women undergoing rehabilitation for exhaustion and pain. The scale reflects an ontological understanding of health as subjective, existential and dynamic, embracing suffering as a natural part of life. After further psychometric investigation, the HSS is expected to help patients and health care professionals guide and evaluate rehabilitation processes that aim to enhance individuals' health and alleviate their suffering. This psychometric evaluation of the HSS serves as a first basis for future studies that aim to estimate patients' subjective health and suffering from a caring perspective within the context of rehabilitation for exhaustion and long-lasting pain. Nevertheless, in line with general recommendations of psychometricians, we advise re-assessing the psychometric properties of the HSS in parallel with the main aim of future investigations.

ACKNOWLEDGEMENTS
The authors would like to thank all patients and students for their participation in the study and the clinic for its collaboration.

CONFLICT OF INTEREST
The authors have no conflicts of interest to declare. All co-authors have seen and agree with the contents of the manuscript and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

AUTHOR CONTRIBUTIONS
AG, SA, and MA designed the study. AG collected the data, performed the analysis, and wrote the manuscript. The analysis was scrutinized by AL and critically discussed by all authors. All authors contributed to the final manuscript and approved its submission.

DATA AVAILABILITY STATEMENT
Raw data are archived at Karolinska Institutet, Sweden. Data supporting the findings of this study are available from the corresponding author AG on reasonable request.