Validity and reliability of the Fatigue Severity Scale in Finnish multiple sclerosis patients

Abstract Background Fatigue is one of the most debilitating symptoms in multiple sclerosis (MS), considerably interfering with patients’ daily functioning. Both researchers and clinicians need psychometrically robust methods to evaluate fatigue in MS. Objectives The objective of this study was (i) to evaluate the psychometric properties of the Finnish version of the Fatigue Severity Scale (FSS) and (ii) to describe the results among patients with MS. Methods In total, 553 patients with MS (mean age, 53.8 years; standard deviation [SD], 11.4; 79% women; mean patient‐defined disease severity, Expanded Disability Status Scale [EDSS] 4.0; SD, 2.5) completed the self‐administered questionnaires including the FSS. A standard procedure was used for the translation of the FSS. Results The mean (SD) score for the FSS was 4.5 (1.7); in 65% of the patients, the score was ≥4.0. The data quality of the FSS was excellent, with 99.6% of computable scale scores. Floor and ceiling effects were minimal. The FSS showed high internal consistency (Cronbach's alpha, 0.95). Unidimensionality was supported by confirmatory factor analysis, with a comparative fit index of 0.94. The FSS showed moderate/high correlations with the perceived burden of the disease, quality of life and disease severity, whereas age and gender did not have a significant effect on the FSS score. Conclusions The Finnish version of the FSS showed satisfactory reliability and validity and thus can be regarded as a feasible measure of self‐reported fatigue.


| INTRODUCTION
Fatigue is considered to be one of the most common and disabling symptoms of multiple sclerosis (MS), affecting about 80% of patients (Minden et al., 2006; Weiland et al., 2015). However, there is no universally accepted definition of fatigue. MS-related fatigue has been reported to manifest itself as an overwhelming sense of tiredness and lack of energy that affects a patient's participation in the activities of daily living and work. Fatigue is observed at all stages of disability and in all clinical forms of the disease (Induruwa, Constantinescu, & Gran, 2012).
The causes of fatigue in MS are multifactorial and not well understood.
Fatigue has been associated with dysfunction in the central nervous system and in immune and neuroendocrine regulation. Pro-inflammatory cytokines, overactivity of neural circuits, defects in pre-frontal basal ganglia circuitry, and axonal injury have been suggested as possible mechanisms (Induruwa et al., 2012). Depressive symptoms, impaired sleep, heat sensitivity, physical deconditioning, and medications have also been related to fatigue in MS (Induruwa et al., 2012).
Fatigue assessment typically relies on subjective self-report questionnaires. Fatigue has been reported as a more frequent symptom in patients with higher disability (Amtmann et al., 2012; Armutlu et al., 2007; Mills & Young, 2010; Valko, Bassetti, Bloch, Held, & Baumann, 2008), in association with unemployment (Johansson, Ytterberg, Hillert, Widen, & von Koch, 2008; Mills & Young, 2010), as well as in progressive phenotypes of the disease (Mills & Young, 2010). Conversely, no significant association with demographic factors, such as age (Mills & Young, 2010; Valko et al., 2008) or gender (Valko et al., 2008), has been reported. A frequently used inventory for the evaluation of fatigue is the Fatigue Severity Scale (FSS) developed by Krupp et al. for use in patients with systemic lupus erythematosus and MS (Krupp, LaRocca, Muir-Nash, & Steinberg, 1989). The FSS, a nine-item questionnaire, primarily focuses on the motor aspects of fatigue, with the main emphasis on assessing the severity of the fatigue symptom and its impact on an individual's daily functioning. Each item of the questionnaire is scored on a seven-point Likert scale ranging from 1 ("completely disagree") to 7 ("completely agree"; Table 1). The mean score of the nine items is used as the FSS score. Originally, the cut-off score for fatigue was set at ≥4 (Krupp et al., 1995), because fewer than 5% of healthy controls rated their fatigue above this level, while 60%-90% of patients with medical disorders experienced fatigue at or above this level (Krupp et al., 1989). Subsequent studies have recommended the same cut-off score (Armutlu et al., 2007; Valko et al., 2008). A categorization into non-fatigue (FSS ≤4.0), borderline fatigue (4.0 < FSS < 5.0) and fatigue (FSS ≥5.0) has also been suggested (Johansson, Ytterberg, Back, Holmqvist, & von Koch, 2008; Ottonello, Pellicciari, Giordano, & Foti, 2016).
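The scoring rule described above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen for illustration only. The requirement that all nine items be answered is an assumption, since the handling of missing item responses is not specified here.

```python
from statistics import mean

N_ITEMS = 9            # the FSS has nine items
MIN_RESPONSE = 1       # "completely disagree"
MAX_RESPONSE = 7       # "completely agree"


def fss_score(items):
    """Return the FSS score as the mean of the nine item responses.

    Assumption: all nine items must be present; how missing
    responses are handled is not specified in the text.
    """
    if len(items) != N_ITEMS:
        raise ValueError("expected exactly nine item responses")
    if any(not MIN_RESPONSE <= x <= MAX_RESPONSE for x in items):
        raise ValueError("each response must lie between 1 and 7")
    return mean(items)


def is_fatigued(score, cutoff=4.0):
    """Dichotomous classification with the original cut-off (score >= 4)."""
    return score >= cutoff


def fss_category(score):
    """Three-level categorization suggested in later studies."""
    if score <= 4.0:
        return "non-fatigue"
    if score < 5.0:
        return "borderline fatigue"
    return "fatigue"
```

Note that the two schemes treat a score of exactly 4.0 differently: it counts as fatigue under the ≥4 cut-off but as non-fatigue in the three-level categorization.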
As the validity and reliability of an assessment are contextual (i.e., The aim of this study was to evaluate the psychometric properties of the Finnish version of the FSS and to describe the results among patients with MS. The specific aims were to evaluate the validity and reliability of the FSS and its dimensional structure.

| Patients
This was a retrospective, cross-sectional postal survey. The study protocol was approved by the ethics committee of the Hospital District of 1.

| Outcome measures
Patients were required to complete the survey questionnaire or were interviewed via telephone using the Finnish questionnaire adapted from that used in previous, multi-national studies (Karampampa, Gustavsson, & Miltenburger, 2013). The questionnaire included demographic background variables (e.g., age, gender, employment status, and early retirement due to MS) and disease information (e.g., year of diagnosis, age at the diagnosis, type of MS, and self-assessment of disease severity by Patient Assessment of Expanded Disability Status Scale (EDSS) Levels, a method widely used in cost-of-illness studies in MS (Kobelt, Berg, Lindgren, & Jönsson, 2006)). The self-perceived feelings of fatigue were evaluated with the FSS (Krupp et al., 1989). The study population and methods have been described previously (Ruutiainen, Viita, Hahl, Sundell, & Nissinen, 2016). The perceived quality of life was evaluated using the generic EuroQol 5D-3L instrument (EQ-5D), which includes five domains of well-being (mobility, personal care, usual activities, pain/discomfort, and anxiety/depression), using a social tariff established with the general population in the UK (EuroQol Group, 1990). The EQ-5D was officially translated into Finnish in 1991. The visual analog scale (VAS) was used to assess patients' perceived health state on a scale of 0 (worst imaginable health state) to 100 (best imaginable health state) (EuroQol Group, 1990).

| Statistical methods
Psychometric properties of the FSS were evaluated using standard methods (Nunnally & Bernstein, 1994) including: •

| Demographic and clinical characteristics of the sample
The study sample (n = 553) was representative of all ages, MS phenotypes and levels of disability. Sample demographics and disease characteristics are summarized in Table 2. The mean (SD) age was 53.8 (11.4) years. A majority (76.1%) of the patients were of working age (<63 years). The mean (SD) patient-assessed EDSS score of the study sample was 4.0 (2.5).

| Data quality
The percentage of missing data was low (0.4%), and the percentage of computable scale scores was high (99.6%; Table 3).

| Scaling assumptions
The frequency distribution of item responses was relatively symmetrical; item mean scores ranged from 3.9 to 5.2 (SD, 1.8-2.2). Item-to-total correlations were acceptable (range, 0.626-0.875; Table 3).
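As an illustration of how item-to-total correlations of this kind can be obtained, the sketch below correlates each item with the sum of the items. Whether the reported values were corrected (that is, computed against the total of the remaining items only) is not stated here, so the `corrected` flag is an assumption, as are the function names and the choice of Pearson's coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(rows, corrected=True):
    """Correlate each item with the scale total.

    rows: one list of item responses per respondent (complete cases only).
    corrected=True removes the item from the total before correlating,
    which avoids inflating the coefficient (an assumed convention).
    """
    n_items = len(rows[0])
    totals = [sum(r) for r in rows]
    result = []
    for j in range(n_items):
        item = [r[j] for r in rows]
        if corrected:
            reference = [t - v for t, v in zip(totals, item)]
        else:
            reference = totals
        result.append(pearson(item, reference))
    return result
```

The rows passed in would be the complete nine-item response vectors; with real data, respondents with missing items would need to be handled before this step.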

| Reliability
The Cronbach's alpha reliability coefficient for the entire sample was 0.949, indicating a high degree of internal consistency of the FSS. When any single item of the FSS was deleted, the Cronbach's alpha values did not change markedly (range, 0.939-0.951; Table 3).
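For readers who wish to reproduce this type of analysis, the following sketch computes Cronbach's alpha with the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), together with the alpha-if-item-deleted values. The function names are hypothetical, and the use of sample variances on complete cases is an assumption about the computation, not a description of the software used in this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table.

    rows: one list of item responses per respondent (complete cases).
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for i, v in enumerate(r) if i != j] for r in rows])
        for j in range(k)
    ]
```

If deleting an item raises alpha noticeably above the full-scale value, that item contributes little to internal consistency; in the present data the values stayed within a narrow range.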

| Validity
The correlations between the FSS and other outcomes are provided in Table 5. The construct validity of the FSS was supported by moderate/high Spearman's rank correlation coefficients between the FSS and burden of the disease (MSIS-29), quality of life (EQ-5D and VAS), and disease severity (EDSS). Higher fatigue scores were associated with a greater perceived burden of the disease, lower quality of life, and higher disease severity.
Known-group validity was also supported (Table 6). As predicted, mean fatigue scores for patients who were retired due to their MS were significantly higher than those for patients who were employed, when limiting the comparison to age groups <63 years. Additionally, mean fatigue scores for patients with greater disease severity were higher than those for patients with milder disease severity. Similarly, the mean fatigue score for patients with a progressive disease phenotype (secondary or primary progressive) was higher than that for patients with the relapsing-remitting form of the disease. In contrast, mean fatigue scores did not differ according to age group or gender.
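A Spearman's rank correlation of the kind reported above can be sketched as a Pearson correlation computed on ranks, with tied values given their average rank. The function names are hypothetical and the tie-handling convention is an assumption; statistical packages offer equivalent routines.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

Because only ranks enter the computation, the coefficient is suited to ordinal measures such as patient-assessed disability levels.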

| DISCUSSION
This study examined the psychometric properties of the FSS in a large sample of Finnish patients with MS. The results were consistent with the findings from other language versions of the FSS (Al-Sobayel et al., 2016; Amtmann et al., 2012; Armutlu et al., 2007; Bakalidou et al., 2013; Krupp et al., 1989; Learmonth et al., 2013; Lerdal, Johansson, Kottorp, & von Koch, 2010; Valko et al., 2008), suggesting that the influences of language and cultural background might not be significant in the FSS among patients with MS. In the total sample, 360 (65%) patients were classified as fatigued when using a score of ≥4.0 as a criterion for self-perceived fatigue. When using more stringent cut-off scores, 56% (≥4.5) and 48% (≥5.0) of the patients in the present sample were evaluated as fatigued. Using a score of ≥4.0 as a criterion for possible fatigue is supported by the overall frequency estimates (80%) (Minden et al., 2006; Weiland et al., 2015). Typically, floor and ceiling effects are considered problematic when more than 15% of the sample has either the lowest or the highest possible score (Terwee et al., 2007). In our study sample, the FSS did not show ceiling (3.5%) or floor (2.5%) effects of this magnitude, supporting previous findings (Amtmann et al., 2012).
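The floor and ceiling criterion described above can be expressed as a short check. This is a sketch only: the function name is hypothetical, and it assumes that the lowest and highest possible FSS scores are 1 and 7 (the bounds of the item scale, which are also the bounds of the mean score).

```python
def floor_ceiling_effects(scores, lowest=1.0, highest=7.0, threshold=15.0):
    """Percentage of respondents at the lowest and highest possible score.

    An effect is flagged when more than `threshold` percent of the
    sample sits at that extreme (the 15% criterion cited in the text).
    """
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == highest) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }
```

With the percentages reported here (2.5% at the floor and 3.5% at the ceiling), neither flag would be raised.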
The reliability analyses included estimation of item-to-total correlations and internal consistency. High item to total correlations (r, range   Armutlu et al., 2007; Bakalidou et al., 2013; Ottonello et al., 2016; Valko et al., 2008). The optimal Cronbach's alpha range has been reported to be between 0.7 and 0.9 for internal consistency or item homogeneity, while values over 0.9 have been suggested to show item redundancy (Boyle, 1991). Our results, together with previous findings, suggest some redundancy in item content in the FSS and therefore a possibility to shorten the scale without a significant loss of precision. Item numbers 1 and 2 have previously shown relatively low inter-item correlations (Lerdal et al., 2005) and reliability (Bakalidou et al., 2013). Subsequently, based on Rasch models, it has been suggested that by eliminating item number 1 (Ottonello et al., 2016), or item numbers 1 and 2 (Lerdal et al., 2010), better psychometric properties than those of the original nine-item version may be obtained. Based on the Rasch analyses, an even shorter five-item version (eliminating item numbers 1, 2, 6, and 8) that satisfies strict tests of unidimensionality has been recommended (Mills, Young, Nicholas, Pallant, & Tennant, 2009). These shortened versions have, however, been found to show relatively high ceiling effects (Mills et al., 2009; Ottonello et al., 2016).
Additionally, the five-item version was found to be less sensitive in detecting differences between groups and change over time (Lerdal et al., 2010). We found that inter-item correlations and item-to-total correlations were the lowest for item numbers 1 and 2 (item-to-total r, 0.626 and 0.707, respectively). However, these correlations were also considerably higher than the 0.40 threshold value that is commonly interpreted as evidence of scale reliability (Everitt, 2002). Additionally, in CFA, the FSS showed a CFI of 0.94. A CFI of ≥0.90 has been suggested as a criterion for acceptable fit of the scale in a unidimensional model (Hu & Bentler, 1999). Previously reported CFIs for the FSS were 0.97 (Amtmann et al., 2012) and 0.99 (Bakalidou et al., 2013). These  (Learmonth et al., 2013), disease severity (Armutlu et al., 2007; Flachenecker et al., 2002; Valko et al., 2008), depression (Armutlu et al., 2007; Bakalidou et al., 2013; Flachenecker et al., 2002), pain (Amtmann et al., 2012) hypothesized group differences based on previous findings concerning employment status (Mills & Young, 2010), disability (Amtmann et al., 2012; Armutlu et al., 2007; Mills & Young, 2010; Valko et al., 2008), disease phenotype (Mills & Young, 2010), as well as the demographic factors, age and gender (Valko et al., 2008).
Progressive disease (higher disability and progressive phenotype) as well as retirement due to MS were found to be associated with higher levels of fatigue as evaluated by the FSS. In contrast, age or gender did not have an effect on the FSS scores.
The limitations of this study should be considered. The response rate was relatively low (37%) (Ruutiainen et al., 2016). Thus, it is possible that the sample is not representative. It can be argued that the responders had more severe fatigue than the non-responders, which may increase the risk of "selection bias." However, as described previously (Ruutiainen et al., 2016), the demographic and disease-related characteristics of the study population represent the general MS population well. Additionally, the evaluations, including the severity and the phenotype of the disease, were based on patients' self-reports. Although this method is widely used in cost-of-illness studies in MS (Kobelt et al., 2006), we cannot rule out the possibility that some of the evaluations might have been different if based on a clinician's evaluation. Possible "selection bias" and "information bias" may affect the generalisability of the findings of this study. Further, since patients with depression were not excluded from the study sample and depression was not evaluated, we cannot rule out the effects of depressive symptoms on the FSS scores observed in this study. The cross-sectional data did not allow evaluation of the responsiveness of the FSS to change, an important aspect of psychometric functioning. The evaluation of test-retest reliability or comparison of the FSS with other fatigue scales or between MS patients and healthy controls was not possible. Traditional methods comparable with those of previous studies were adopted to establish reliability and validity in this study. Strengths of the study include good data quality, at least partly explained by the possibility to complete the questionnaires via telephone interview.
In conclusion, this study supported the validity and reliability of the Finnish version of the FSS in patients with MS. The scale appears psychometrically feasible to assess perceived fatigue among Finnish patients with MS.