Sensitivity to Change and Minimal Important Differences of the LupusQoL in Patients With Systemic Lupus Erythematosus

Objective As a health‐related quality of life (HRQOL) measure, the LupusQoL is a reliable and valid measure for adults with systemic lupus erythematosus (SLE). This study evaluates the responsiveness and minimal important differences (MIDs) for the 8 LupusQoL domains. Methods Patients experiencing a flare were recruited from 9 UK centers. At each of the 10 monthly visits, HRQOL (LupusQoL, Short Form 36 health survey [SF‐36]), global rating of change (GRC), and disease activity using the British Isles Lupus Assessment Group 2004 index were assessed. The responsiveness of the LupusQoL and the SF‐36 was evaluated primarily when patients reported an improvement or deterioration on the GRC scale and additionally with changes in physician‐reported disease activity. MIDs were estimated as mean changes when minimal change was reported on the GRC scale. Results A total of 101 patients were recruited. For all LupusQoL domains, mean HRQOL worsened when patients reported deterioration and improved when patients reported an improvement in GRC; SF‐36 domains showed comparable responsiveness. Improvement in some domains of the LupusQoL/SF‐36 was observed with a decrease in disease activity, but when disease activity worsened, there was no significant change. LupusQoL MID estimates for deterioration ranged from −2.4 to −8.7, and for improvement from 3.5 to 7.3; for the SF‐36, the same MID estimates were −2.0 to −11.1 and 2.8 to 10.9, respectively. Conclusion All LupusQoL domains are sensitive to change with patient‐reported deterioration or improvement in health status. For disease activity, some LupusQoL domains showed responsiveness when there was improvement but none for deterioration. LupusQoL items were derived from SLE patients and provide the advantage of disease‐specific domains, important to the patients, not captured by the SF‐36.


INTRODUCTION
The survival of patients with systemic lupus erythematosus (SLE) has improved in the last 50 years from less than 50% at 5 years in 1955 to 85% at 10 years and recently, 75% at 20 years (1). The Outcome Measures in Rheumatology Clinical Trials group and the US Food and Drug Administration (FDA) have recommended that for clinical trials and observational studies, health-related quality of life (HRQOL) should be assessed using both generic and disease-specific measures, allowing comparison with healthy samples, estimates of health utilities, and disease-specific information known to be important to patients (2,3). HRQOL instruments provide a standardized, valid, and reliable way of gaining the patient's perspective as to how they are affected by SLE and the benefits and limitations of interventions. HRQOL in SLE is poorly correlated with the clinicians' assessment of disease activity and damage (4,5), as some symptoms are only known to the patient (e.g., fatigue, nausea). Therefore, HRQOL measurement can provide added value because it can supply information not captured by other outcome measures. HRQOL may be informative not only as an efficacy measure, but it also potentially reflects safety issues, and for these reasons HRQOL is becoming important in labeling claims (6,7).
The LupusQoL is a valid, reliable, patient-derived, disease-specific HRQOL measure for adults with SLE (8) that contains items/domains more relevant to patients with SLE than generic measures (9). As with many HRQOL measures, the interpretation of the data may be problematic and should not be based solely on P values, especially if HRQOL is a secondary outcome when a trial tends not to be powered for HRQOL. To aid the interpretation of the LupusQoL, evaluation is required to assess its sensitivity to change (the ability to detect an improvement or deterioration when patients deem themselves to have improved or deteriorated) (10) as advocated by the regulatory bodies (2), and to estimate the minimal important difference (MID), the smallest difference that patients perceive as beneficial or harmful (11).
This study aimed to evaluate these parameters, using both anchor-based and distribution-based methods, for each domain of the LupusQoL and the Short Form 36 health survey (SF-36). Specifically, the study looked at the ability of the scales to detect an improvement in HRQOL following treatment of a severe or moderate flare, to detect deterioration in HRQOL (e.g., when patients fail to have their disease controlled by their initial treatment plan), and to estimate the MIDs. The responsiveness of the LupusQoL and the SF-36 was evaluated primarily when patients reported an improvement or deterioration on the global rating of change (GRC) scale (12) and, second, with changes in physician-reported disease activity.

PATIENTS AND METHODS
Study design. This was a prospective, longitudinal, observational study. The study was granted multicenter Research Ethics Committee approval (MREC 02/05/035) and was carried out in compliance with the Helsinki Declaration at the following rheumatology units: Bangor, Birmingham (2 centers), Blackburn, University College London, Nottingham, Manchester, Doncaster, and Sheffield. All patients gave written informed consent.
Patient inclusion and exclusion. Patients were recruited over an 18-month period and were followed at 4-week (±2 weeks) intervals for 9 months. The inclusion criteria were fulfillment of ≥4 American College of Rheumatology (ACR) criteria for SLE (13,14), ages ≥16 years, literate ability in the English language, willingness to give written informed consent, and a flare of SLE requiring specific treatment. A flare was defined as a significant increase in disease activity, resulting in a British Isles Lupus Assessment Group 2004 (BILAG-2004) index A or B score, based on criteria that are new or worse (15-17). To be included in this study, patients had to require an increase in therapy defined as ≥1 of the following: an increase of oral prednisolone to ≥20 mg/day, introduction of methotrexate, parenteral methylprednisolone, and/or other immunosuppressive therapy (e.g., cyclophosphamide, rituximab). The exclusion criteria were ages <16 years, inability to read English, inability to give valid consent, and pregnancy.
Assessment measures. Demographic and clinical details were recorded at baseline by the clinician (date of birth, sex, date of diagnosis, fulfilment of ACR criteria for SLE, ethnic group [18], marital status, and current therapy). The Systemic Lupus International Collaborating Clinics/ACR Damage Index (SDI) (19) was reported twice, at baseline and at the end of the study. The BILAG-2004 disease activity index was assessed at each visit.

Significance & Innovations
The LupusQoL, a patient-derived, disease-specific health-related quality of life (HRQOL) measure for adults, is sensitive to change in health status and can be recommended for use in clinical trials.
The LupusQoL domain minimal important differences for deterioration range from −2.4 to −8.7 and for improvement from 3.5 to 7.3.
LupusQoL items were derived from systemic lupus erythematosus patients and provide the advantage of disease-specific domains, important to them, not captured by the Short Form 36 health survey.
These results will allow appropriate power calculations and interpretation of HRQOL measurements in clinical trials and longitudinal observational studies.
The original English version of the LupusQoL (4-week recall period) (8) was completed by the participant at each time point. It has 8 domains: physical health, pain, planning, body image, burden to others, intimate relationships, emotional health, and fatigue. This instrument has good internal reliability (Cronbach's α = 0.88-0.96), test-retest reliability (intraclass correlation coefficient [ICC] 0.72-0.93), and concurrent validity with comparable domains of the SF-36 (ICC 0.71-0.79). It has acceptable ceiling effects and minimal floor effects. Scoring of each domain of the LupusQoL is such that 0 = worst health and 100 = best health (8).
Patients completed the SF-36 (UK version 1) with a 4-week recall at each assessment (20). The SF-36 measures 8 dimensions of health: physical functioning, social functioning, role limitations due to physical problems, role limitations due to emotional problems, mental health, energy/vitality, bodily pain, and general health perception. Domain scores can range from 0 to 100 (higher scores indicate a better HRQOL).
To estimate patient-reported change, each domain of the LupusQoL and the SF-36 incorporated the GRC scale (12). Patients were asked to rate change in each domain over the past 4 weeks from 7 (a very great deal better) to −7 (a very great deal worse) with 0 indicating no change. Scores of −1 to 1 were classified as no change, with −7 to −2 as deterioration and 2 to 7 as improvement. Within the deterioration and improvement categories, scores of 2, 3, −2, and −3 were considered to represent minimal, but nevertheless important, changes.
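The GRC categories described above can be illustrated with a short sketch. The function name `classify_grc` and its return format are hypothetical and chosen only for illustration; the cut points are those stated in the text.

```python
def classify_grc(score: int) -> tuple[str, bool]:
    """Classify a global rating of change (GRC) score.

    Returns (category, is_minimal), where is_minimal flags the scores
    (2, 3, -2, -3) treated as minimal but important change.
    The function name and return shape are illustrative assumptions.
    """
    if not -7 <= score <= 7:
        raise ValueError("GRC score must lie between -7 and 7")
    if -1 <= score <= 1:
        # Scores of -1, 0, and 1 are classified as no change
        return ("no change", False)
    category = "improvement" if score >= 2 else "deterioration"
    # Only scores of magnitude 2 or 3 count as minimal important change
    return (category, abs(score) in (2, 3))
```

For example, `classify_grc(3)` falls in the improvement category and is flagged as minimal, whereas `classify_grc(-5)` is a deterioration that is not used for MID estimation.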
At baseline and each review visit, the clinician assessed the SLE disease activity using the BILAG-2004 index (15). The BILAG-2004 category scores A to E are based on intention to treat, where A = severe disease activity, B = moderate disease activity, C = mild, stable disease, D = inactive disease but previously affected system, and E = a system that has never been involved. Changes in overall disease activity between consecutive time points, as measured by the BILAG-2004 index, were defined as follows: 1) deterioration, with any system changing to A from B/C/D or to B from C/D (21); 2) improvement, with all systems changing from A to B/C/D and B scores changing to C/D (22), with no deterioration in any system (one persistent B score was allowed if there was improvement from A or B in at least 1 other system); 3) persistent inactive disease, with all systems scored as C/D/E at both time points; and 4) persistent active disease, with A or B system scores remaining unchanged but without overall improvement or deterioration. When changes of activity of a single BILAG system were analyzed, the above definitions applied, but only for that system, and no persistent B score was allowable for improvement.
Statistical methods. The sample size calculation was based on summary statistics from previous work during the LupusQoL development and based on the changes of the physical health domain. A sample size of 52 would have 80% power to detect a difference in means of 4, assuming an SD of 10, using a paired t-test with a 5% significance level. The intention was to recruit 104 patients, to allow for patients who did not report changes in HRQOL, expected smaller effect size for the other domains, missing data, and dropouts. All analyses were performed using Stata Release software, version 13 (23).
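As a rough check on the sample size above, the following sketch applies the standard normal-approximation formula for a paired comparison, n = ((z for 1 − α/2 + z for power) × SD / difference)². This is an assumption about the general form of the calculation, not the authors' actual procedure; the function name `paired_sample_size` is hypothetical. The approximation gives 50, slightly below the reported 52, a gap that would be consistent with an exact calculation based on the t distribution.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(diff: float, sd: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided paired t-test (normal approximation).

    diff  : difference in means to detect (4 in the text)
    sd    : assumed SD (10 in the text)
    The function name and the use of a normal approximation are
    illustrative assumptions, not the method used in the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    n = ((z_alpha + z_power) * sd / diff) ** 2
    return ceil(n)


print(paired_sample_size(4, 10))  # 50 under the normal approximation
```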
Determination of sensitivity to change (responsiveness). The primary method for assessing responsiveness was based on patient-reported GRC scores. Responsiveness was also examined using physician-reported disease activity change scores. Based on GRC or disease activity change scores, each domain of both HRQOL measures was evaluated to determine its ability to detect an improvement in HRQOL following treatment of a flare and to detect deterioration in HRQOL (e.g., when treatment has undesirable and troublesome side effects or the patients fail to have their disease controlled by their initial treatment plan). Responsiveness was estimated as the mean change in HRQOL domain score across participant-reported improvements or improvement of disease activity, and across participant-reported deteriorations or deterioration of disease activity, between consecutive assessments. We calculated 95% confidence intervals (95% CIs) using robust methods for estimating the standard error in Stata, based on the approach proposed by Huber (24).
Additionally, standardized response means (SRMs), the ratio of the mean change of the domain score between consecutive observations and the corresponding estimated SD of the change score, were reported based on GRC scores. Effect sizes, for which the mean change of each domain score was standardized using the estimated SD of the baseline score, were also reported based on GRC scores. Both are standardized measures of responsiveness, with the SRM having the advantage that it is less affected by the heterogeneity of the sample by using a more appropriate SD, namely that of the change score. SRMs or effect sizes of 0.2, 0.5, and 0.8 are deemed to demonstrate small, moderate, or large responsiveness, respectively (25). We also explored changes in relevant domains of the LupusQoL and SF-36 to changes of key systems in the BILAG-2004 index. The musculoskeletal and mucocutaneous systems are the most commonly affected systems in SLE patients, and therefore we explored the relationships between musculoskeletal system changes and the changes in physical health and pain domains of the LupusQoL and the changes in physical functioning and bodily pain domains of the SF-36. We also explored the mucocutaneous system changes and the changes in the body image domain of the LupusQoL.
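The two standardized measures defined above differ only in the SD used as the denominator, as the following sketch shows. The function names and the example scores are hypothetical and serve only to illustrate the definitions; they are not study data.

```python
from statistics import mean, stdev


def standardized_response_mean(changes):
    """SRM: mean change divided by the SD of the change scores."""
    return mean(changes) / stdev(changes)


def effect_size(changes, baseline_scores):
    """Effect size: mean change divided by the SD of the baseline scores."""
    return mean(changes) / stdev(baseline_scores)


# Hypothetical domain scores for 5 patients (not study data)
baseline = [40, 50, 60, 30, 70]
changes = [5, 10, 0, 15, 5]  # follow-up score minus previous score

srm = standardized_response_mean(changes)
es = effect_size(changes, baseline)
print(round(srm, 2), round(es, 2))
```

In this made-up example the change scores vary less than the baseline scores, so the SRM is larger than the effect size for the same mean change.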
Estimation of MIDs. Methods for estimating the MID are either anchor-based (sometimes referred to as minimum clinically important difference) or distribution-based. We use the term MID as this is the more dominant term in the current literature (26) for both approaches, as ultimately they seek to establish the same property. We will illustrate the difference in the methodology by using MID(a) for the anchor-based approach or MID(d) for the distribution-based approach. No single approach is perfect, and multiple strategies are likely to enhance the interpretability of changes in HRQOL scales (11,27). An anchor-based method was used as the primary approach (as preferred by the FDA) (2), based on the average change in LupusQoL or SF-36 scores for the subset of patients who were considered to have a small but discernible change in that particular HRQOL domain. These analyses were complemented by distribution-based approaches based around the common standards of 1 SEM, using data from McElhone et al (8) and 0.5 SD, which corresponds to a medium effect (28,29).
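A minimal sketch of the three MID estimates follows. It assumes that MID(a) is the mean change among observations with a GRC score of 2 or 3 (improvement) or −2 or −3 (deterioration), and that the SEM is obtained with the conventional formula SD × √(1 − reliability); the text does not state the exact SEM formula used, so that part is an assumption. All function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mid_anchor(records, direction):
    """MID(a): mean change among patients reporting minimal change.

    records   : iterable of (grc_score, change_in_domain_score) pairs
    direction : "improvement" or "deterioration"
    """
    if direction == "improvement":
        wanted = (2, 3)
    elif direction == "deterioration":
        wanted = (-2, -3)
    else:
        raise ValueError("direction must be 'improvement' or 'deterioration'")
    changes = [chg for grc, chg in records if grc in wanted]
    return mean(changes) if changes else None


def mid_half_sd(baseline_scores):
    """MID(d): 0.5 SD of the baseline scores."""
    return 0.5 * stdev(baseline_scores)


def mid_sem(sd, reliability):
    """MID(d): 1 SEM, assuming SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


# Hypothetical (GRC score, change) pairs, not study data
records = [(2, 4.0), (3, 6.0), (5, 20.0), (-2, -3.0), (-3, -5.0), (0, 1.0)]
print(mid_anchor(records, "improvement"))    # mean of 4.0 and 6.0
print(mid_anchor(records, "deterioration"))  # mean of -3.0 and -5.0
```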

RESULTS
Patient recruitment and followup. During the 18-month recruitment period, a total of 115 patients from 9 centers were deemed eligible for the study and approached. Four patients declined to participate and 111 patients were recruited. Of these, 101 patients completed the study and are reported here (Figure 1).
When the overall disease activity lessened, 6 domains of the LupusQoL (physical health, pain, planning, emotional health, body image, and fatigue) and 7 of the SF-36 domains (physical functioning, bodily pain, mental health, social functioning, role emotional, role physical, and vitality) showed an improvement. For the remaining LupusQoL and SF-36 domains, changes were small and nonsignificant. When overall disease activity increased,
MID results. Using the anchor-based approach, the MID(a)s for improvement and for deterioration for each of

DISCUSSION
Knowing whether, or to what extent, a patient has improved or deteriorated following a course of treatment is fundamental to clinical practice. This work has demonstrated that all 8 of the LupusQoL domains are sensitive to change and able to identify patient-reported improvements and deteriorations. With changes in physician-reported disease activity, there were less consistent findings of improvement in 6 of 8 LupusQoL domains when disease activity lessened but little or no responsiveness with worsening disease activity. There may be several reasons for this difference: 1) physician-reported disease activity measures a different concept from HRQOL, hence the FDA recommendation that responsiveness should be measured against the patient GRC, 2) patients may perceive improvement more clearly than deterioration, particularly after having a flare, 3) the number of patients in the deterioration subgroups, especially when single BILAG system changes were examined, may be insufficient to detect significant changes, and 4) the assessment over a month may be too short a time period for change to occur in some domains of the LupusQoL following a change in disease activity. Different LupusQoL domains had different patient-reported MID(a)s, which also differed for deterioration and improvement. When looking for an improvement in SLE, the MID ranges from 4 to 7 points, depending on the domain. For the SF-36, the MIDs range from 3 to 11 points. These results will allow appropriate power calculations and interpretation of HRQOL measurements in clinical trials and longitudinal observational studies. MIDs are not without problems, in that different methodologies (anchor-based or distribution-based) generated somewhat different MIDs, and the MID reflects the difference that is important at a group, but not the individual, level.
Regulatory bodies advocate the use of anchor-based methods in the estimation of responsiveness as they use patient ratings (2), even though the reliability of patients' estimates of their previous health status has been questioned (30,31).
This study recruited patients with moderate or severe flares and is likely to be representative of patients recruited into clinical trials. Notably, the original LupusQoL mean scores derived from consecutive outpatients at UK centers were strikingly higher (by approximately 20-25 points) across all domains (8) than the baseline values for these patients with moderate or severe flares. Such large differences suggest that a flare of SLE has a very significant impact on all aspects of HRQOL and may also explain why the LupusQoL is less responsive to deterioration of disease activity, as patients already have poor health.
There have been 2 publications regarding the sensitivity to change of the LupusQoL. Using the Canadian version of the LupusQoL, Touma et al (32) concluded that its responsiveness was similar to that of the SF-36 following a 12-month prospective cohort study of consecutive patients at a single tertiary center. However, only changes in the disease activity measure, the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2000) (33,34), were used to estimate responsiveness, while in our study the patient-reported GRC scale was used to estimate responsiveness as recommended by the regulatory bodies (2,3), in addition to a disease activity measure (the BILAG-2004 index). Results of a multitertiary center cohort study, recruiting consecutive patients using the French version of the LupusQoL, assessed patients at 3 and 6 months (35). The anchors for improvement and deterioration included a patient-reported 7-point Likert scale and visual analog scale (100 mm). A Likert scale of 5 patient-reported symptoms extracted from the Systemic Lupus Assessment Questionnaire (SLAQ) was also used (36,37). The French language LupusQoL and the SF-36 showed comparable responsiveness, and the MIDs were similar for both measures. Despite the different patient selection criteria (consecutive recruitment/SLE flare; single tertiary center/multicenter study), length of followup period, different methods (anchor-based, distribution-based), and scales to evaluate sensitivity to change (GRC scale, SLAQ, SLEDAI-2000), there is agreement that the LupusQoL demonstrates sensitivity to change in SLE.
In this study that recruited patients with active lupus, the LupusQoL and SF-36 appear to be more responsive to improvement than deterioration; this result was also noted in the French study (35). Researchers previously reported that patients with other conditions detected improvements following treatment more easily than deterioration. Patients reported that they often did not realize how much they had deteriorated until they started to improve (38). This recognition is an encouraging finding, especially when the LupusQoL is recommended for use in clinical trials. When patients improve during and after an intervention, the LupusQoL should be able to detect these changes. In contrast, in a study of SLE patients that employed the SF-36, deterioration of HRQOL was perceived more readily than improvement (39). However, that article described studies in a clinical trial setting, using an immunologic anchor as a marker of improvement.
In spite of a large data set and rigorous followup schedule, our study had little missing change score data on most domains (approximately 15%). The majority of patients were white (62.6%), but other groups were represented, including 15.2% of South Asian origin. Although monthly followups may not have allowed sufficient time for an intervention to take effect and for some HRQOL domains to change as different domains may change over different periods of time, monthly reviews did ensure that relapses and the effects of these on HRQOL were not missed.
The assessment of lupus disease in clinical trials should involve patient-reported outcomes, including a global assessment and specific instruments that capture the impact of the disease on the patient quality of life. The LupusQoL has previously demonstrated good construct, face, discriminative, and concurrent validity, and internal and test-retest reliability, and has been mapped to the Short Form in 6 dimensions (8,40,41). Linguistic validations have enabled the instrument to be employed successfully in 51 countries using 77 different languages (42). This study demonstrates the responsiveness of the instrument and further construct validity as compared with the SF-36 and provides the MIDs. The SF-36 and the LupusQoL are similar in terms of responsiveness, but the items on the LupusQoL were informed by patients with SLE, and therefore it has the advantage of several SLE-specific domains that are important to patients (planning, burden to others, intimate relationships, and body image) (9) that are not captured by the SF-36.