Psychometric validation of the EuroQoL 5-Dimension 5-Level (EQ-5D-5L) in Chinese patients with adolescent idiopathic scoliosis

Background Scoliosis is a common spinal deformity that often occurs during adolescence. Previous studies suggest that various aspects of the lives of patients with adolescent idiopathic scoliosis (AIS) can be affected by the disease presentation and/or the treatment received. It is important to identify a reliable instrument with which these patients' health-related quality of life can be assessed. This study aims to assess the validity, reliability and sensitivity of the EuroQoL 5-dimension 5-level (EQ-5D-5L) in Chinese patients with AIS. Methods Adolescent idiopathic scoliosis patients of Chinese descent were prospectively recruited to complete the traditional Chinese versions of both the EQ-5D-5L and the refined Scoliosis Research Society-22 (SRS-22r) questionnaires. Patients' demographic profiles and corresponding clinical parameters, including treatment modalities, spinal curve pattern and magnitude, and duration of bracing, were recorded. Telephone interviews were then conducted at least two weeks later to assess test-retest reliability. Construct validity of the EQ-5D-5L domains was assessed using Spearman's correlation test against the SRS-22r, whereas the intra-class correlation coefficient (ICC) was used to assess test-retest reliability, and agreement over the test-retest period was expressed in percentages. The sensitivity of the EQ-5D-5L in differentiating various known clinical groups was determined by effect size, independent t-test and analysis of variance. Results A total of 227 AIS patients were recruited. Scores of the EQ-5D-5L domains correlated significantly (r: 0.57-0.74) with the scores of the SRS-22r domains that were intended to measure similar constructs, supporting construct validity. The EQ-5D-5L domain responses and utility scores showed good test-retest reliability (ICC: 0.777; agreement: 76.4-98.1 %). Internal consistency was good (Cronbach's α: 0.78) for the EQ-5D-5L utility score. The EQ-5D-5L utility score was sensitive in detecting differences between subjects with different treatment modalities and bracing durations, but not between subjects with different curve patterns or magnitudes. Conclusions The EQ-5D-5L is found to be a valid, reliable and sensitive measure of health-related quality of life in Chinese AIS patients. This supports the use of the EQ-5D-5L to estimate AIS patients' health-related quality of life, on the basis of which the outcomes of various treatment options can eventually be evaluated. Electronic supplementary material The online version of this article (doi:10.1186/s13013-016-0083-x) contains supplementary material, which is available to authorized users.


Background
Scoliosis can be defined as a torsional spinal deformity in which the 3-dimensional geometry of the spine is changed by a combination of translation and rotation of a variable number of vertebrae [1]. The majority of scoliosis is idiopathic and presents during adolescence [2]. Patients with adolescent idiopathic scoliosis (AIS) often present with variable curve magnitudes at the first consultation, and the curvature may progress depending on the initial curve magnitude and the status of skeletal maturity [3,4]. The natural history may also be affected by the introduction of an intervention, such as bracing, before patients have reached skeletal maturity [5].
Beyond the obvious radiographic differences in curve magnitude, a treatment option can only truly demonstrate benefit through superior patient-perceived outcome measures. It is thus necessary to explore patients' quality of life. This is particularly important in AIS, as previous reports suggest that these patients experience relatively poorer psychosocial functioning, self-perception of body image, and health-related quality of life than their nonscoliotic peers [6]. When compared to their healthy peers, AIS patients undergoing brace treatment may be negatively affected in terms of psychosocial well-being [7,8]. Among the various treatment modalities, AIS patients under observation may have better scores for body image and quality of life than braced patients [9,10]. In contrast, some studies suggest that there are no differences in quality of life between patients treated with bracing and those under monitoring only [11], or even between braced/operated patients and the general population in the long term [12].
The evidence reported here suggests that AIS can affect the health-related quality of life (HRQOL) of affected adolescents, and that the effect can vary with the severity of disease presentation and with the treatment option. With the appropriate indications for treatment in place, healthcare providers may be able to improve the HRQOL of AIS patients with timely interventions. For instance, patients may benefit from interventions such as psychological therapy accompanying the administration of bracing. This can improve self-perception of body image, a known barrier to the initiation and continuation of brace treatment [13], and ultimately enhance brace compliance. Therefore, a reliable instrument tailored for AIS is desirable to assess the physical ability, psychological well-being and psychosocial functioning of these patients. Moreover, such an instrument serves as an indicator of how these factors can impact the HRQOL of the AIS population in general.
In fact, validated outcome measurements, together with systematic reviews based on clinical trials, form the scientific framework of evidence-based medicine (EBM), which is used to guide clinical practice [14]. Evidence-based medicine can be defined as an integration of the best research evidence with clinical expertise and patient values [15]. Ultimately, the goal of EBM is to provide scientific information to clinicians to improve the quality of healthcare by taking into account cost, ethics and safety. Adoption of EBM in clinical practice depends on the quality of evidence (i.e. from validated outcome measurements and systematic reviews of clinical trials), and on the willingness of clinicians to apply that evidence to their practice [14]. Therefore, it is of utmost importance to use an effective and appropriate objective outcome measure for the assessment of patient values and quality of life. This can be accomplished by the use of structured questionnaires that measure an individual's perception of his/her physical, mental and social ability to function [16].
Several systematic reviews [6,17,18] have summarized the instruments available for assessing the HRQOL of AIS patients, which fall into two main categories: generic and condition-specific instruments. Whereas generic instruments capture a very broad range of health statuses, condition-specific measures assess the particular states and functions of a disease in greater detail than generic measures [16], with more responsiveness in detecting important changes over time and better sensitivity in discovering subtle effects of interventions [10,19]. However, disease-specific instruments can only focus on known and anticipated consequences [20,21]. These instruments do not allow straightforward comparisons across populations with different diseases, or between outcomes of different treatments for patients with various health problems [16]. In contrast, generic measures give health state utility values that permit comparisons between patient groups [22], or cost-effectiveness comparisons between different treatment modalities for various diseases [23]. They can be used to generate 'normative values' with which patients with health problems can be compared [16]. Although generic measures may have value in detecting unexpected positive or negative effects of an intervention [24], their nonspecific nature can reduce sensitivity in detecting changes caused by interventions relevant to any one illness, especially in clinical trials. Generic measures allow broad applicability across specialties or populations but are multi-domain. This poses a risk of misinterpretation, and may distort general scoring, if improvement in only a single domain is reported as a general improvement in quality of life [25].
The refined Scoliosis Research Society-22 (SRS-22r) was originally developed to measure the spine-specific HRQOL of adolescent or adult patients with scoliosis. Given that two domains of the SRS-22r (self-image and satisfaction with management) are relevant and specific only to scoliosis patients, the constructs measured by the SRS-22r may not fully overlap with those of generic instruments. Previous studies [26,27] administering both generic and spine-specific instruments suggest that self-image and satisfaction with management are poorly correlated with the domains of generic instruments. This is the case with commonly used instruments like the EuroQoL 5-dimension (EQ-5D) [26] and the 36-Item Short Form Health Survey (SF-36) [27], whose domains do not relate well to spine-specific instruments. Furthermore, generic instruments allow head-to-head comparisons among different health conditions; this is particularly true of the EQ-5D, a preference-based measure that enables the calculation of quality-adjusted life years (QALYs) in economic evaluation. As such, spine-specific HRQOL instruments may not supersede generic instruments among AIS patients. Therefore, the aim of the present study was to assess the validity, reliability and sensitivity of the EQ-5D-5L in Chinese AIS patients.

Subjects and setting
A convenience sample of AIS patients of Chinese ethnicity was recruited between August and October 2015 at the Duchess of Kent Children's Hospital in Hong Kong. Patients were excluded if they had non-idiopathic scoliosis (congenital/neuromuscular), could not understand traditional Chinese, refused to participate, or were physically or mentally unfit. This study was approved by the local institutional review board.
Subjects who consented were asked to answer a structured questionnaire consisting of the EQ-5D-5L questionnaire (Hong Kong (traditional Chinese) EQ-5D-5L version) and the traditional Chinese version of the SRS-22r questionnaire. Half of the subjects were asked to complete the SRS-22r questionnaire before being given the EQ-5D-5L, and the other half were given the questionnaires in the reverse order.
Demographic data of patients and clinical data at the time of visit were collected. A spine surgeon performed the consultation and radiographic measurement as usual, without prior knowledge that the questionnaires were being administered. The Cobb angle [28] was measured on the whole-spine radiograph taken at that appointment and recorded. The curvatures were also classified using the modified Lenke classification system [29], which includes six curve types: type 1 (main thoracic), type 2 (double thoracic), type 3 (double major; thoracic curve larger than lumbar curve), type 4 (triple major), type 5 (thoracolumbar or lumbar curve), and type 6 (double major; thoracolumbar or lumbar curve larger than thoracic curve). Treatment modalities, namely whether patients were undergoing observation, bracing, or bracing followed by surgery, or had undergone corrective surgery and presented for regular review, were retrieved from subjects' medical records.
All subjects were scheduled for a telephone interview, conducted by a single member of the research staff in random order, at least two weeks after their baseline interview. This follow-up interview consisted of administering the two questionnaires in the same order as at baseline. It was structured to assess the test-retest reliability of our study instruments.

Study instruments
The EQ-5D-5L is a generic health status measure developed by the EuroQol Group for the measurement of quality of daily life [30], providing descriptions of five dimensions of health status. It is an instrument enabling a quantitative expression of the individual's values and preferences regarding overall health status [16,31]. Being a utility measure, the EQ-5D-5L plays an important role in both clinical and economic appraisal, for instance in assessing the social value of different healthcare interventions by means of cost-utility analysis [32], and in its possible use as a decision aid in individual patient care where patients have difficulty deciding between treatment options [33].
The EQ-5D-5L has five domain scales (mobility, self-care, usual activities, pain and discomfort, and anxiety and depression) and five levels for each domain. Since a Chinese-specific EQ-5D-5L value set (tariff) is currently not available, we applied a two-step indirect approach to estimate EQ-5D-5L scores applicable to the Chinese population, as adopted in previous studies [34]. The first step was the application of an indirect interim mapping method [35], in which the EQ-5D-5L health status was transformed to the EQ-5D-3L health status according to the transition probability matrix. In the second step, the EQ-5D-3L health status was scored according to a recently developed Chinese-specific EQ-5D-3L value set, ranging from −0.149 for the worst health status ('33333') to 1 for full health ('11111') [36]. Since the EQ-5D-5L has five items, each digit of the five-digit code refers to the status of one dimension, ranging from 1 for no problems to 5 for extreme problems. For example, the code '11111' denotes a health status with no problems in any of the five dimensions, listed in the order: mobility = 1, self-care = 1, usual activities = 1, pain and discomfort = 1, anxiety and depression = 1. A higher EQ-5D-5L score indicated better HRQOL.
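The two-step scoring described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the transition probabilities and the 3L decrements below are placeholder numbers invented for illustration, not the published interim mapping [35] or the Chinese value set [36]. The sketch also assumes that the 5L-to-3L transition is applied independently per dimension and that the final score is the probability-weighted average of the 3L values. The names `TRANSITION`, `TOY_DECREMENTS`, `value_3l` and `crosswalk_value` are hypothetical.

```python
from itertools import product

# ASSUMPTION: placeholder probabilities P(3L level | 5L level), shared by all
# five dimensions. A published crosswalk has its own matrix per dimension.
TRANSITION = {
    1: (1.0, 0.0, 0.0),
    2: (0.3, 0.7, 0.0),
    3: (0.0, 1.0, 0.0),
    4: (0.0, 0.6, 0.4),
    5: (0.0, 0.0, 1.0),
}

# ASSUMPTION: toy additive decrements for 3L levels 2 and 3, one dict per
# dimension in the order mobility, self-care, usual activities,
# pain/discomfort, anxiety/depression. These are NOT the Chinese tariff.
TOY_DECREMENTS = [
    {2: 0.05, 3: 0.20},
    {2: 0.05, 3: 0.20},
    {2: 0.05, 3: 0.15},
    {2: 0.10, 3: 0.25},
    {2: 0.10, 3: 0.25},
]


def value_3l(state_3l):
    """Step 2: score a 3L state (tuple of five levels) with the toy tariff."""
    return 1.0 - sum(
        dec.get(level, 0.0) for dec, level in zip(TOY_DECREMENTS, state_3l)
    )


def crosswalk_value(state_5l):
    """Steps 1 and 2: probability-weighted 3L value for a 5L code like '21321'."""
    levels = [int(ch) for ch in state_5l]
    expected = 0.0
    # Enumerate all 3^5 = 243 possible 3L states.
    for state_3l in product((1, 2, 3), repeat=5):
        prob = 1.0
        for level_5l, level_3l in zip(levels, state_3l):
            prob *= TRANSITION[level_5l][level_3l - 1]
        if prob > 0.0:
            expected += prob * value_3l(state_3l)
    return expected
```

With these placeholder numbers, `crosswalk_value("11111")` returns full health (1.0), and a code with a single slight problem, such as `"21111"`, returns a value a little below 1. Substituting the real transition matrices and value set would change the numbers but not the structure of the calculation.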

Refined Scoliosis Research Society-22 (SRS-22r) (Additional file 2)
The SRS-22r is a simple and valid spine-specific health-related quality of life instrument developed by the Scoliosis Research Society. It provides an insight into the idiopathic scoliosis patient's perception of his/her condition [37]. The SRS-22r is a refinement of the previous SRS-22 questionnaire with minor revisions (i.e. to Question 18, related to going out, and to Question 15, over a concern related to financial considerations); it makes the gathering of longitudinal HRQOL information from adolescence through adulthood possible [38].
The SRS-22r had 22 items grouped into five subscales. The domains covered were: Function (5 items), Pain (5 items), Self-image/appearance (5 items), Mental Health (5 items) and Satisfaction with Management (currently being undergone or previously performed; 2 items). The mean of the item scores gave the overall SRS-22r total score, with a range from 1 to 5. Patients were asked to indicate their stage of treatment: whether they were presenting for initial consultation, regular follow-up without intervention, or bracing, or were immediately pre-operative or postoperative. The SRS-22r questionnaire had been previously validated in the Hong Kong Chinese scoliosis population [39].

Statistical analysis
Descriptive statistics including the mean, standard deviation (±SD) and the percentages of patients at the floor and ceiling of the domain and total scores were calculated. A floor or ceiling effect was considered present when at least 15 % of patients achieved the lowest or highest possible score, respectively [40]. The construct validity of the EQ-5D-5L domains was assessed using Spearman's correlation test against the SRS-22r domain scores measuring similar constructs.
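As a rough illustration of the two checks described above, the following Python sketch flags floor and ceiling effects using the 15 % threshold and computes Spearman's correlation as the Pearson correlation of tie-averaged ranks. The function names and the example data are hypothetical, and the study itself used SPSS and STATA rather than this code.

```python
def floor_ceiling(scores, lowest, highest, threshold=0.15):
    """Share of respondents at the lowest/highest possible score, with flags."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)
```

For example, a hypothetical sample in which 66 of 100 respondents score the maximum of 1.0 would be flagged with `ceiling_effect` set to `True`, since 66 % exceeds the 15 % threshold.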
Internal consistency was assessed by Cronbach's alpha, with a value >0.7 indicating adequate internal consistency [41]. Test-retest reliability was assessed by examining the weighted kappa for the five individual domain responses and the intra-class correlation coefficient (ICC) for the EQ-5D-5L score over the 2-week period. An ICC of ≥0.7 was used to indicate good reproducibility of the EQ-5D-5L score [40]. A weighted kappa of ≤0.2 was interpreted as poor agreement of individual domain responses between the two assessments, 0.21-0.4 as fair, 0.41-0.6 as moderate, 0.61-0.8 as good and >0.8 as very good [42].
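A minimal Python sketch of two of these reliability checks follows, assuming the standard formula for Cronbach's alpha (computed from the item variances and the variance of the summed score) and the agreement bands quoted above. The function names are hypothetical. Assigning a value that falls exactly on a band boundary to the lower band is an assumption of this sketch.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list of per-item score lists of equal length, one entry per
    respondent: alpha = k/(k-1) * (1 - sum(item variances) / variance(total)).
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def interpret_kappa(kappa):
    """Map a weighted kappa to the agreement bands used in the text."""
    # ASSUMPTION: boundary values (0.2, 0.4, 0.6, 0.8) go to the lower band.
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "very good"
```

When every item carries identical scores, the items are perfectly consistent and `cronbach_alpha` returns 1; values above 0.7 would meet the adequacy criterion stated above.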
The sensitivity of the EQ-5D-5L score was determined by known-group comparisons using effect size, independent t-test and analysis of variance, where appropriate. Cohen's effect size was calculated as the difference between mean scores divided by the pooled SD. The known clinical groups compared were (i) Observation treatment versus bracing or surgery; (ii) Observation treatment versus bracing only; (iii) Bracing versus surgery; (iv) Duration of bracing: less than versus more than one year; (v) Curve pattern: modified Lenke classification type 1/2 (thoracic curves only) versus type 5 (lumbar curves only) versus type 3/4/6 (thoracic and lumbar curves); (vi) Curve magnitude: Cobb angle ≤40° versus >40°. Data analyses were conducted using SPSS Windows 23.0 (IBM SPSS Inc., Chicago, IL, USA) and STATA version 13.0 (StataCorp LP, College Station, Texas, USA). A P-value <0.05 was considered statistically significant.
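The effect-size calculation described above can be sketched in a few lines of Python. The pooled SD here weights each group's sample variance by its degrees of freedom, which is the usual convention but an assumption about the exact variant used in the study. The function name `cohens_d` and the example scores are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: sample variances weighted by degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5
```

For instance, `cohens_d([2, 4, 6], [1, 3, 5])` gives 0.5: the means differ by 1 and both groups have an SD of 2. The sign of the result depends on the order in which the groups are passed.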

Results
A total of 227 patients with AIS were recruited to complete both the SRS-22r and the EQ-5D-5L questionnaires. All patients gave consent and agreed to participate. Hence, a total of 227 eligible patients were included in the psychometric validation of the EQ-5D-5L. The mean age was 15.6 (±SD: 4.5) years, 74.9 % were female, and 9.7 % had severe curvature with a Cobb angle of >40°. About 62 % were under Observation management with regular follow-up, while the remaining subjects had been braced before (5.7 %), were undergoing bracing (0.8 %) or had undergone surgery before (9.3 %). Baseline characteristics of the AIS patients are shown in Table 1. Table 2 summarizes the means, standard deviations, and floor and ceiling effects of the EQ-5D-5L and SRS-22r subscale scores, and the distribution of the EQ-5D-5L domain responses. No significant floor effect was observed for any HRQOL score, but ceiling effects were observed for the EQ-5D-5L (66 %), Function/Activity (67 %), Pain (45 %) and Mental Health scores. In every EQ-5D-5L domain, at least 70 % of patients reported "no problems" (70.0-96.3 %). No patient reported "extreme problems" in Mobility, Self-care, Usual Activities or Depression/Anxiety, or "unable to" in any EQ-5D-5L dimension.
The test-retest reliability of the EQ-5D-5L and SRS-22r is shown in Table 3. Twenty patients failed to comply with the telephone interviews, and one patient was excluded at retest owing to a change in treatment modality from preoperative regular follow-up to postoperative hospitalization. Among the 106 (83.5 %) patients assessed in both the baseline and 2-week retest interviews, the mean interval between interviews was 19.7 days (range: 15-36 days). The ICCs of the EQ-5D-5L and of the SRS-22r subscale and overall scores exceeded 0.7. Agreement of the EQ-5D-5L domain responses between the two interviews ranged from 76.4 % in Pain/discomfort to 98.1 % in Self-care. Cronbach's alpha coefficient was 0.78 for the EQ-5D-5L score, indicating acceptable internal consistency reliability.
Correlations between the EQ-5D-5L domain responses and SRS-22r domain scores are shown in Table 4. Patients reporting "no problems" in an EQ-5D-5L domain had significantly higher Function/Activity, Pain, Appearance and total scores on the SRS-22r than those reporting "any problems". However, patients with "no problems" in Self-care, Usual Activities and Pain/discomfort had significantly lower satisfaction with management than those with "any problems". Furthermore, the EQ-5D-5L score had a strong correlation with the Function/Activity (r = 0.715) and total (r = 0.735) scores of the SRS-22r, and a moderate correlation with the Pain (r = 0.594) and Appearance (r = 0.512) scores, all with p < 0.001.
The sensitivity of the EQ-5D-5L in differentiating known clinical groups is shown in Table 5. The EQ-5D-5L and SRS-22r scores were able to detect statistical differences between treatment modalities (i.e. observation management versus bracing or surgery, and observation management versus bracing). The EQ-5D-5L detected a statistical difference between bracing patients with a bracing duration of <1 year and those with ≥1 year, but the SRS-22r did not. Cobb angle and curve type according to the modified Lenke classification were not associated with the EQ-5D-5L and SRS-22r scores, with the exception that patients with severe curvature had worse mental health than those with mild or moderate curvature. No differences in the EQ-5D-5L and SRS-22r scores, apart from Appearance and Satisfaction with Management, were observed between patients undergoing bracing and those undergoing surgery.
Moreover, the profile of the studied population is presented in Table 6. The EQ-5D-5L was able to differentiate patients undergoing different treatments (Observation versus Bracing/Surgery, or Observation versus Bracing) based on the domains of Mobility, Self-care, Usual activities and Pain/discomfort (p ≤ 0.001). The EQ-5D-5L was also able to differentiate among patients undergoing bracing based on the duration of bracing, with the most discriminating significant domain being Pain/discomfort, with 70 % versus 35.3 % of patients for durations of <1 year and ≥1 year, respectively. However, the EQ-5D-5L could not differentiate among patients on the basis of the pattern/type (modified Lenke classification) or the magnitude of curvature.

Discussion
Adolescent idiopathic scoliosis is the most common spinal abnormality in the pediatric population as seen by pediatricians and spine surgeons [43], and it can account for 70 % of the structural deformities affecting the spine in children and adolescents [44]. Adolescent idiopathic scoliosis patients, whether compared to their healthy peers or across different types of treatment, often have various aspects of their lives affected by the spinal deformity. Therefore, it is desirable to have a reliable and suitable instrument to assess these patients' HRQOL. The estimated HRQOL not only reflects the impact of AIS; it may also become part of the basis upon which the cost-effectiveness of different scoliosis treatment options can be evaluated. This psychometric validation study is the first to report the validity, reliability and sensitivity of the EQ-5D-5L questionnaire in AIS patients of Chinese ethnicity. The reliability of an instrument, and whether it can reproduce consistent results, is important for the assessment of HRQOL. In this AIS population, the test-retest reliability of the EQ-5D-5L is shown to be good, despite an ICC lower than that of the SRS-22r. This is accompanied by strong agreement for all five domains of the EQ-5D-5L (Mobility, Self-Care, Usual Activities, Pain/Discomfort, Depression/Anxiety). Because the EQ-5D-5L is only a generic utility instrument of HRQOL, it is of utmost importance to ascertain whether it contains the essential elements required for assessing the HRQOL of AIS patients. For AIS, such elements should be important for this age group and tailored to how the scoliosis disease process or presentation can affect patients.
To substantiate the validity of the EQ-5D-5L specifically for AIS, the SRS-22r was used, as it is disease-specific for scoliosis and contains multiple items contributing to each domain score. Both the SRS-22r and EQ-5D-5L questionnaires are commonly used quality of life outcome tools for patients with spinal deformity [45]. The SRS-22r is a disease-specific outcome measure commonly used to assess the effects of treatment. It has been used previously to assess the pre- and postoperative quality of life of scoliosis patients and the treatment outcome of bracing [46,47], as well as in long-term follow-up to monitor the effects of surgery versus bracing over time or to compare untreated with brace-treated patients [11,48]. The EQ-5D-5L, on the other hand, is a utility measure that is more commonly used to facilitate the calculation of QALYs. This makes cost-utility analysis and economic evaluation of healthcare interventions possible, as in studies of different treatments for diseases such as rheumatoid arthritis [49] and juvenile idiopathic arthritis [50].
In this study, not only is the EQ-5D-5L found to correlate well with the overall score of the SRS-22r, but it is also worth emphasizing that the EQ-5D-5L reflects certain aspects of the SRS-22r with significant strength. The EQ-5D-5L has a strong correlation with the Function/Activity domain, and moderate correlations with the Pain, Appearance and Mental Health domains of the SRS-22r. On the EQ-5D-5L, patients with "no problems" in each domain scored better in the Function/Activity, Pain and Appearance domains of the SRS-22r than those with "any problems", resulting in a better overall SRS-22r score. Patients with "no problems" in the EQ-5D-5L domains of Self-care, Usual Activities and Pain/Discomfort were less satisfied with their scoliosis treatment than patients with "any problems". This demonstrates that, read alongside the SRS-22r, the EQ-5D-5L can reflect patients' treatment in the context of their disease presentation/symptoms, despite the absence of a domain representing patients' satisfaction with management in the EQ-5D-5L. It is worth taking into account the inherent differences between the domains covered by the EQ-5D-5L and the SRS-22r, as Appearance and Satisfaction with Management in the SRS-22r have no comparable items in the generic EQ-5D, whose domains each consist of a single item. At the same time, the visual analogue scale (VAS) of the EQ-5D does not have an equivalent item in the SRS-22r. The VAS is a subjective assessment of overall current health. It is of most value when looking at changes within individuals rather than cross-group comparisons [51], but may not produce health state utilities for calculating QALYs [52], and there are doubts about its estimation of the value function [53].
Thus, the EQ-5D VAS may not be of great interest in this case, as the perception of general health varies between patients, and analysis of a single score given by individuals is less conclusive than the total score compounded from the five domains covered by the EQ-5D-5L. Also, in studying the ceiling effect, two-thirds (66 %) of patients reported the EQ-5D-5L profile '11111', suggesting a substantial ceiling effect for the EQ-5D-5L scores. Our results are consistent with the high ceiling effects reported in previous studies of Chinese populations, both in the general population (78 %) [54] and in other chronic conditions [55-57]. Overall, the EQ-5D-5L is found to be sensible and appropriate for administration to AIS patients, based on the convergent validity demonstrated above from consideration of the individual dimensions of the instruments.
In addition, we investigated whether the EQ-5D-5L can distinguish AIS patients with different clinical parameters or severity. Both the EQ-5D-5L and SRS-22r are shown to be sensitive in differentiating known clinical groups receiving different treatment modalities. These include comparisons between observation treatment and bracing/surgery, and between observation treatment and bracing. Furthermore, patients receiving the same brace treatment can be further differentiated by the EQ-5D-5L based on the duration of bracing (less than versus at least one year) with statistical significance. This is not observed with the SRS-22r questionnaire. However, the severity of spinal curvature, as denoted by curve magnitude in Cobb angle, is reflected only by the Mental Health domain score of the SRS-22r, which is worse in severe curvature (>40°) than in mild/moderate curves (≤40°). Nevertheless, the overall scores of the SRS-22r and EQ-5D-5L do not differ significantly in this AIS population according to the type and magnitude of curvature. These results coincide with the findings of Lange et al. [48], in which there were no differences in HRQOL scores between patients with various types of curvature: thoracic, thoraco-lumbar, or lumbar major curve. In contrast to the difference seen in the Mental Health domain of the SRS-22r in our results, the Self-image and Pain domains of the SRS-22 questionnaire were significantly worse in their patients with large residual curve magnitudes, but otherwise the reported HRQOL was not related to curve size [48].
The main limitation of this study is that the test and retest questionnaires were administered by different methods (test by interview in person and retest by interview over the phone). This is because the routine follow-up visits of AIS patients are more than two weeks apart, making it impractical to ask patients to return in close succession for questionnaire interviews only. However, the overall scores of both the EQ-5D-5L and SRS-22r could still be reproduced at an acceptable and significant level. Finally, the derivation of the EQ-5D-5L score adopts an indirect two-step approach, rather than a direct valuation approach using the preference weights of an EQ-5D-5L value set. An indirect two-step approach may introduce measurement error into the EQ-5D-5L score, but it remains the best available approach for deriving EQ-5D-5L scores in Hong Kong, where a culture-specific value set is not available.