Psychometric Properties of Preference-Based Measures for Economic Evaluation in Amyotrophic Lateral Sclerosis: A Systematic Review

Objective The aim of this review was to synthesize the psychometric properties of generic preference-based measures (PBMs) of health-related quality of life (HRQL) in Amyotrophic Lateral Sclerosis (ALS). Methods A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Four databases were searched from inception to April 2019: OVID Medline, Embase, PsycINFO, and CINAHL. Studies were included if (1) the sample represented individuals with ALS, (2) a generic PBM was utilized and reported on, and (3) information on the psychometric property of a generic PBM was provided. Results Ninety-one articles were screened, and 39 full-text articles were reviewed. Seven full-text articles were included in this review. The mean age of participants ranged from 58.1 to 63.8 years, and mean time since diagnosis ranged from 20.5 to 44.6 months. Two generic PBMs were found, the EQ-5D-3L (n = 6) and the Quality of Well-Being Self-Administered (QWB-SA) scale (n = 1). Convergent validity of the EQ-5D-3L was large against a global scale of self-perceived health (r = 0.60) and small to large against ALS specific HRQL measures (r = 0.19 to 0.75). For the QWB-SA scale, correlations were small against a generic measure (r = 0.21) and large against ALS specific measures (r = 0.55). The EQ-5D-3L discriminated across different disease severity; however, floor effects were reported. Conclusion This review highlights the need for more rigorously designed studies to assess the psychometric properties of generic PBMs in ALS and the development of an ALS specific PBM that adequately reflects the health concerns of individuals with ALS.


Introduction
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease characterised by selective and progressive degeneration of voluntary motor neurons [1]. Adults with ALS have an overall mortality rate of 80% within the first 2 to 5 years after diagnosis and experience wide variability in disease severity and disease progression [2]. The disease affects more than 200,000 people worldwide in mid to late adulthood with an average age of onset of 55-66 years [3]. Signs and symptoms of ALS include (a) muscle weakness and atrophy resulting in loss of muscle control; (b) spasticity; (c) bulbar symptoms such as speech and swallowing difficulties; and (d) respiratory symptoms [4]. With disease progression and the resulting symptoms and loss of independence, the health-related quality of life (HRQL) of individuals with ALS is severely impacted [4][5][6][7].
HRQL instruments provide a structured way of including the patient's perspective when evaluating the influence of a disease and its treatments on one's physical, mental, and social well-being [5,7,8]. HRQL can be assessed using health profiles or preference-based measures (PBMs; also known as utility measures). Health profiles, such as the ALS Specific Quality of Life-Revised (ALSSQOL-R) scale, are scored by subscales and do not produce a single index score useful for economic evaluation purposes [5,9,10]. PBMs, on the other hand, are scored from 0.0 (death) to 1.0 (full health) and provide a single value of HRQL [9]. They can be used by researchers and policymakers for economic decision-making purposes to calculate quality-adjusted life years (QALYs) and determine the cost-effectiveness of interventions in ALS [9].
Existing PBMs used with individuals with ALS are generic and consist of measures such as the Short Form 6 Dimension (SF-6D) [11], Health Utilities Index Mark 3 (HUI3) [12], and EuroQol 5 Dimension (EQ-5D) (3 and 5 levels) [13,14]. For some conditions, such as rheumatoid arthritis [15], cardiovascular disease [16], and various cancers [17], these measures have established estimates of reliability and validity. However, the reliability and validity of PBMs have not yet been summarized for ALS. As these measures were not developed specifically for individuals with ALS, it is important to assess their psychometric properties in this population [18].
This will assist in understanding whether the values obtained by the scoring system are valid and can be utilized by researchers and policymakers for clinical and cost-evaluation purposes. Therefore, the aim of this review was to synthesize the psychometric properties of generic PBMs of HRQL in ALS.

Methods
A structured search was conducted in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [19] reporting guidelines to identify possible articles that report information on the psychometric properties of PBMs of HRQL in ALS. COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines for systematic reviews of patient-reported outcome measures [20] were used to facilitate the understanding of a systematic review on PBMs and determine the quality of PBMs used.

Study Selection.
Two independent reviewers (NP and AM) identified potentially relevant articles by systematically screening titles/abstracts and then selecting full-text articles for inclusion. Reasons for exclusion were recorded, and if present, differences in responses between the two reviewers were discussed and a consensus reached. A third reviewer (AK) was consulted if a consensus was not reached. Studies were included if (1) the study sample represented individuals with ALS, (2) a generic PBM of HRQL was utilized and reported on, and (3) potentially relevant information on the psychometric property of a generic PBM was provided, whether this was their objective or not. Only full-text articles written in English or French and published in peer-reviewed journals were included in the review. Grey literature, conference proceedings, and abstracts were excluded.

Data Extraction.
The following information was extracted independently by two reviewers (NP and AM) from the full-text articles selected for data extraction: (i) study characteristics: author(s), year of publication, study design, study purpose, and study setting; (ii) sample characteristics: sample size (N), age, gender, time since diagnosis (months), ALS diagnosis, and disease severity; (iii) PBM(s) used (mean ± standard deviation (SD)); and (iv) psychometric properties. Specifically, the following metrics were sought from the included articles:
(i) Reliability (test-retest reliability): the extent to which scores of a measure remain unchanged over time, provided the characteristics being measured do not change [21,22].
(ii) Content validity: the degree to which the content of an instrument is an adequate reflection of the construct of interest [21].
(iii) Construct validity, comprising convergent validity: the degree to which scores of two measurement instruments relate when measuring a similar construct of interest [21,23]; and discriminative (known-groups) validity: the degree to which an instrument is able to discriminate between two groups that differ on the construct being measured [24].
(iv) Predictive validity: the extent to which measurement instrument scores are an adequate reflection of a gold standard for the construct of interest in the future [18].
(v) Responsiveness: the ability of an instrument to detect change over time in the construct of interest [21].
(vi) Floor/ceiling effect: the percentage of the sample obtaining scores at the lower and upper ends of the scale, respectively [18]; known as a form of interpretability that can affect the responsiveness of an instrument [18]. Floor and ceiling effects were deemed significant when percentage values >15% were seen [25].
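The >15% criterion for floor and ceiling effects can be illustrated with a minimal Python sketch. The function name, the 0-100 scale, and the scores below are hypothetical and are not taken from any included study; the sketch assumes that lower scores indicate worse health.

```python
from typing import Dict, Sequence


def floor_ceiling_effects(
    scores: Sequence[float],
    scale_min: float,
    scale_max: float,
    threshold_pct: float = 15.0,
) -> Dict[str, object]:
    """Percentage of respondents at the lowest and highest possible score.

    An effect is flagged when the percentage exceeds threshold_pct
    (15% by default, following the criterion stated in the text).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == scale_min) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == scale_max) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }


# Hypothetical scores on a 0-100 scale (illustration only).
example_scores = [0, 0, 10, 25, 40, 0, 55, 100, 70, 0]
result = floor_ceiling_effects(example_scores, scale_min=0, scale_max=100)
print(result)
```

In this illustrative sample, four of ten respondents sit at the scale minimum, so a floor effect would be flagged, whereas the single respondent at the maximum would not constitute a ceiling effect.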

Evaluation of Measurement Properties.
The evaluation of measurement properties consisted of three steps. First, the methodological quality of studies was assessed using the relevant boxes for each measurement property included in the COSMIN Risk of Bias Checklist [26]. Second, the results of each study were rated against COSMIN's criteria for good measurement properties as either sufficient (+), insufficient (−), or indeterminate (?) [26]. Third, all results were rated and graded using COSMIN's modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (Supplementary File, Tables 2 & 3) [20,26]. The evaluation of measurement properties could only be assessed for studies whose primary or secondary objective(s) was to evaluate the psychometric properties of a PBM [27]. The hypotheses derived were used to evaluate the psychometric properties when evaluating results against COSMIN's criteria for good measurement properties [26]. Reliability correlation coefficients were hypothesized to be greater than or equal to 0.70 [18]. For measures assessing similar constructs (e.g., HRQL), we hypothesized large correlations of ≥0.50 [18,28]. For measures assessing related but dissimilar constructs (e.g., function/disease severity), we hypothesized a medium correlation of 0.30-0.49 [18,28]. For discriminative (known-groups) validity, we hypothesized a significant difference in mean scores (p < 0.05) between groups of different predetermined variables (e.g., ALS severity levels) [26]. For predictive validity, areas under the curve (AUCs) were hypothesized to be greater than or equal to 0.70 [26]. Responsiveness was hypothesized to be significant at p < 0.05 or with an AUC ≥0.70 [26].
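The correlation hypotheses above can be expressed as a simple classification rule. The following Python sketch is illustrative only: the function names are hypothetical, the label "small" for coefficients below 0.30 is an assumed cut-off consistent with how the terms are used in this review, and the sign of the coefficient is ignored on the assumption that only the strength of association matters.

```python
def correlation_magnitude(r: float) -> str:
    """Label a correlation coefficient as small, medium, or large.

    Cut-offs: >= 0.50 large, 0.30-0.49 medium (as hypothesized in the text);
    anything below 0.30 is labelled "small" (assumption).
    """
    size = abs(r)  # direction of the association is ignored (assumption)
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    return "small"


def meets_hypothesis(r: float, similar_construct: bool) -> bool:
    """Check an observed correlation against the a priori hypothesis.

    Similar constructs are expected to correlate at a large magnitude;
    related but dissimilar constructs at a medium magnitude.
    """
    expected = "large" if similar_construct else "medium"
    return correlation_magnitude(r) == expected


# Illustrative coefficients only.
for r in (0.60, -0.75, 0.43, 0.19):
    print(r, correlation_magnitude(r))
```

Under this rule, a coefficient of 0.72 between measures of dissimilar constructs would not match the medium-correlation hypothesis, because it exceeds the 0.30-0.49 band.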

Results of Search.
A total of 135 records were identified through the database searches. Forty-four records were removed due to duplication, resulting in a total of 91 articles for screening. Fifty-two articles were excluded during the initial screening of titles and abstracts. The remaining 39 full-text articles were assessed for eligibility, and 32 of those articles were subsequently excluded. Articles were excluded if (i) a generic PBM was not assessed (n = 4), (ii) the psychometric properties of a generic PBM were not assessed (n = 8), (iii) the study did not report on or assess the population of interest (n = 4), or (iv) the articles were grey literature, conference proceedings, or abstracts (n = 16). This left seven full-text articles for inclusion in the review.
Table 1 presents key characteristics and psychometric properties from each study included in the review. Sample sizes across the seven studies ranged from 19 to 214 participants, and 31% to 49% of participants were female. The mean participant age ranged from 58.1 to 63.8 years, and the mean time since ALS diagnosis ranged from 20.5 to 44.6 months. ALS severity was classified according to (i) the ALS Functional Rating Scale-Revised (ALSFRS-R) (mean score = 32.63) [29]; (ii) the ALS Severity Scale (ALSSS) (mean score = 27.1) [31]; (iii) high or low severity classified as requiring caregiver assistance or not (75% of sample classified as high) [32]; or (iv) the ALS Health State Scale (ALS/HSS) (27-29% of sample classified as moderate or severe ALS) [33,34]. If ALS severity was not reported, ALS diagnosis was classified using the El Escorial criteria with 21% to 47% of the sample classified as probable or definite ALS [30,35].
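The study-selection counts reported above can be checked with a few lines of arithmetic. The Python sketch below simply restates the reported numbers; the variable names and dictionary keys are illustrative labels.

```python
# Counts as reported in the text; variable names are illustrative.
identified = 135
duplicates_removed = 44
screened = identified - duplicates_removed          # titles/abstracts screened

excluded_at_screening = 52
full_text_assessed = screened - excluded_at_screening

# Reasons for exclusion at the full-text stage.
full_text_exclusions = {
    "generic PBM not assessed": 4,
    "psychometric properties not assessed": 8,
    "population of interest not assessed": 4,
    "grey literature, proceedings, or abstracts": 16,
}
excluded_full_text = sum(full_text_exclusions.values())
included = full_text_assessed - excluded_full_text

print(screened, full_text_assessed, excluded_full_text, included)
```

The four exclusion reasons sum to the 32 excluded full-text articles, and each stage of the flow is consistent with the next.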
The EQ-5D-3L is a widely used generic PBM of HRQL [37]. It consists of five domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and produces a single index score for health utility ranging from −0.594 for the worst possible health state to 1.0 for the best possible health state [38]. The QWB scale is an interview-administered scale that has been developed for self-administration (QWB-SA). It combines three scales of functioning with a measure of symptoms and problems and produces a single index score that ranges from 0.0 (death) to 1.0 (full function) [39].
Only one of the seven included studies [35] had the primary purpose of evaluating the psychometric properties of a generic PBM, the QWB-SA scale. The remaining six studies [29][30][31][32][33][34] reported information on the psychometric properties of a generic PBM, the EQ-5D-3L; however, this was not the purpose of those studies. Mean EQ-5D-3L scores ranged from 0.18 to 0.54, and these studies included between 37 and 214 individuals with ALS. A mean QWB-SA score of 0.43 was reported in the remaining study, which included nineteen individuals with ALS [35].

Psychometric Properties.
Convergent validity, discriminative (known-groups) validity, and floor effects were reported in the seven included studies.

Convergent Validity.
For the EQ-5D-3L, convergent validity was evaluated in four out of six studies (Table 2) [29][30][31][33]. A large correlation of 0.60 with the EQVAS was reported in a single study (n = 77) [33]. Correlations with the subscales of a disease-specific health profile, the ALS Assessment Questionnaire 40 (ALSAQ-40), ranged from small with the Eating and Drinking (ALSED) subscale (r = 0.19) to large with the Activities of Daily Living and Independence (ALSADL-I) subscale (r = −0.75) [33]. A large correlation of 0.72 with the disease-specific functional measure, ALSFRS-R, was reported in a smaller study (n = 46) [29]. A medium correlation of 0.43 was found with social support, as measured by the FSozU K-14 measure [29].

Floor Effects.
Floor effects were reported for the EQ-5D-3L, where 54% to 92% of individuals with ALS reported moderate or severe problems across all five dimensions of the measure (Table 1) [29, 31, 34].

Evaluation of Psychometric Properties.
Six out of seven studies could not be evaluated on the psychometric properties reported, as only one of the seven studies [35] had the primary purpose of evaluating the psychometric properties of a generic PBM. For this study [35], the methodological quality assessment, conducted using COSMIN's risk of bias checklist [27], indicated a serious risk of bias. In grading the quality of evidence using the GRADE approach and in accordance with hypotheses, there was serious inconsistency, very serious imprecision, and serious indirectness. This resulted in an overall rating of very low (Table 1).

Discussion
To our knowledge, this was the first study systematically reviewing the psychometric properties of generic PBMs in ALS. Across the seven studies included in this review, only the EQ-5D-3L and the QWB-SA scale were used in ALS. Furthermore, convergent validity, known-groups validity, and floor/ceiling effects were the only psychometric properties assessed for these measures in this population. Our review revealed that other important psychometric properties of PBMs (i.e., content validity, reliability, and responsiveness) have not yet been evaluated in ALS. Furthermore, with one exception [35], none of the included studies was specifically designed to assess the psychometric properties of a generic PBM in the ALS population. When the methodological quality of this study was assessed, the quality was graded as very low, preventing an accurate conclusion regarding the usability of the QWB-SA scale in the ALS population. The EQ-5D-3L was highly correlated with the ALSFRS-R, an ALS-specific functional rating scale reflective of disease severity, well exceeding our hypothesized correlation of less than 0.5 (for comparison of the dissimilar constructs HRQL and disease severity). This is not entirely unexpected, however, as both the EQ-5D-3L [9] and the ALSFRS-R [40] contain similar domains, such as mobility and self-care, that are highly affected in ALS; this may explain the large correlations observed between the two measures [5]. Moreover, mobility is a domain that is greatly affected in various conditions, including ALS [41], due to its relation to independence and quality of life. As such, it is often included as a construct in many generic PBMs of HRQL. The QWB-SA scale, however, may not be a generic measure that can be used in this population due to our study's findings and the unique nature of symptoms experienced by individuals with ALS.
For example, the QWB-SA scale contains items that address mobility; however, the items are focused on symptoms and limitations, with little emphasis on ALS-relevant items such as functional mobility, speech, or pain [35,42]. This could result in items that are not relevant to this population or even an underrepresentation of items that are relevant. Furthermore, the structure of the QWB-SA scale includes a style of item weighting that results in items relevant to individuals with ALS contributing much less to the overall score. In addition, the QWB-SA scale was shown to weakly correlate with the generic SF-36 (r = 0.21) and strongly correlate with the disease-specific SIP/ALS-19 health profile (r = 0.55). A correlation ≥0.50 would be expected with the former and a correlation of 0.30-0.49 with the latter; however, the opposite was observed. Additionally, this was the only study included in the review with the primary purpose of psychometric evaluation. When the quality of evidence was assessed, it was deemed to be poor [27,35]. As the QWB-SA scale was observed to correlate weakly with certain domains of the SF-36 that were similar in the EQ-5D-3L and the ALSFRS-R, items included in the QWB-SA scale may not truly capture what is important to individuals with ALS or be the best tool for use in this population. However, as only one study has assessed this, further research is warranted in order to make accurate recommendations.
At a total score level, the EQ-5D-3L measure in ALS was able to discriminate between patients across disease severity, as evidenced by significant differences in mean scores. However, at the individual item level, there is a prominent floor effect, as the majority of individuals reported moderate or severe problems in EQ-5D-3L domains, indicating the full scope of the disease is not being captured. This can affect the responsiveness of an instrument and the ability to accurately detect change over time [18]. For individuals with ALS, this is important to note, as responsiveness is a critical property for assessing the cost-effectiveness of interventions in ALS [9]. Moreover, content validation, a fundamental component of validity, was not assessed in any of the studies. As such, generic PBMs may miss domains that are important or specific to individuals with ALS. For example, valued domains such as recreation and leisure activities and interpersonal relationships have been identified by individuals with ALS to be important to their quality of life [8]. However, these domains are not always assessed by generic PBMs. The development of an ALS-specific PBM would be one possible solution to help ensure that included domains reflect the health concerns of individuals with ALS.
PBMs, such as the EQ-5D-3L and the QWB-SA scale, were developed to provide evidence on the benefits or harms of a treatment on HRQL from the patient's perspective [9]. They provide a single index value of HRQL used to produce QALYs in order to evaluate the cost-effectiveness of interventions for a health condition [9]. PBMs can be of great use to patients, clinicians, and researchers alike; however, our results indicate there is limited evidence of their psychometric properties in ALS.
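How a single index value feeds into QALYs can be shown with a minimal Python sketch. It assumes the conventional definition of a QALY as the utility weight multiplied by the time spent in that health state, summed over states; the function name and the utility values are hypothetical and do not come from any included study.

```python
from typing import Iterable, Tuple


def total_qalys(periods: Iterable[Tuple[float, float]]) -> float:
    """Sum of utility x duration over consecutive health states.

    Each period is (utility, years). Utilities may not exceed 1.0 (full
    health); values below 0.0 are allowed because some value sets score
    states as worse than death.
    """
    total = 0.0
    for utility, years in periods:
        if utility > 1.0:
            raise ValueError("utility cannot exceed 1.0 (full health)")
        if years < 0:
            raise ValueError("duration cannot be negative")
        total += utility * years
    return total


# Hypothetical trajectory: 2 years at utility 0.5, then 1 year at 0.25.
trajectory = [(0.5, 2.0), (0.25, 1.0)]
qalys = total_qalys(trajectory)
print(qalys)
```

Because the utility weight multiplies directly into the result, any measurement error in the PBM, such as a floor effect that masks deterioration, carries through to the QALY estimate.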
The EQ-5D consists of two parts. The first part (the descriptive system) assesses health in five domains: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. In addition to this, the EQ-5D contains a visual analogue scale (VAS) of self-rated health, scored from 0 to 100. The scores from the VAS cannot be used directly as weights in QALY calculations as they do not produce a single index value; however, the scores can be used as a subjective assessment of self-perceived health. The VAS can provide clinicians and researchers with insight into how individuals perceive their overall health status and how it changes over time with treatment. More recently, a 5-level version of the EQ-5D (EQ-5D-5L) was developed to improve the sensitivity of the measure and reduce ceiling effects [14]. The measure maintains the five domains from the EQ-5D-3L but expands from 3 to 5 response levels (no, some, moderate, severe, and extreme problems). The EQ-5D-5L defines a total of 3125 health states (5^5) [14], a substantial increase from the 243 health states (3^5) of the EQ-5D-3L [13], and has been translated into more than 170 languages worldwide. For the EQ-5D-5L, each domain is scored from 1 to 5 and a utility value is derived from the five questions. To produce a single index score, a time-trade-off value set with general population preferences was recently developed for Canada [43]. The EQ-5D-5L may be useful in ALS; however, more studies should be conducted with the primary purpose of psychometric evaluation and utilization of this measure. This would result in a stronger conclusion regarding the appropriateness of the EQ-5D-5L for clinical research and economic evaluation.
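The number of health states follows from combining one response level per domain. The Python sketch below enumerates these combinations; the function name is hypothetical, and writing each state as a five-digit profile (for example, "11111" for no problems in any domain) is an assumed labelling convention used here for illustration.

```python
from itertools import product
from typing import List


def health_states(n_domains: int, n_levels: int) -> List[str]:
    """Enumerate every combination of response levels across domains.

    Levels are numbered from 1 (no problems) to n_levels; each state is
    written as a string with one digit per domain.
    """
    levels = range(1, n_levels + 1)
    return ["".join(str(level) for level in state)
            for state in product(levels, repeat=n_domains)]


states_3l = health_states(n_domains=5, n_levels=3)
states_5l = health_states(n_domains=5, n_levels=5)

print(len(states_3l), len(states_5l))
print(states_3l[0], states_3l[-1])
```

The enumeration reproduces the counts of 3^5 = 243 and 5^5 = 3125 states; a value set then assigns one index score to each profile.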
One limitation of this systematic review is the small number of studies included. As only one study's primary purpose was the psychometric evaluation of a generic PBM, there is limited evidence regarding the psychometric properties of generic PBMs in ALS. Another limitation is that only two generic PBMs have been used in ALS; the findings may therefore give an imprecise picture of how generic PBMs perform in this population.

Conclusion
To our knowledge, this is the first study systematically reviewing the psychometric properties of generic PBMs in ALS. The EQ-5D-3L was the most reported generic PBM. Although this measure demonstrated convergent and known-groups validity in ALS, significant floor effects were observed for all items, indicating that questions may not be appropriate for individuals with ALS. The only other measure used was the QWB-SA scale, which showed poor quality in its assessment of convergent validity and revealed items that are not relevant to individuals with ALS. Furthermore, there were psychometric properties of generic PBMs that have not been assessed in ALS, namely, content validity, reliability, and responsiveness. Therefore, our results highlight the need for more rigorously designed studies assessing the psychometric properties of generic PBMs in ALS or the development of an ALS-specific PBM that reflects the health concerns of individuals with ALS.

Data Availability
The data that support the findings of this study are available in this published article and in the supplementary material of this article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.