Responsiveness of the short-form health survey and the Parkinson’s disease questionnaire in patients with Parkinson’s disease

Background: The responsiveness of a measurement instrument is important for understanding its ability to detect changes in the progression of a disease. We examined and compared the internal and external responsiveness of the 36-item Short-Form Health Survey (SF-36) and the 39-item Parkinson’s Disease Questionnaire (PDQ-39) in patients with Parkinson’s Disease (PD).
Methods: Seventy-four patients with PD were evaluated using the SF-36 and PDQ-39 at baseline and again after one year. In addition, their motor signs, motor difficulties of daily living, and depressive symptoms were assessed as external criteria. The internal responsiveness was examined using effect size, standardized response mean, and the Wilcoxon signed rank test. The external responsiveness was examined using receiver operating characteristic curves, correlation analyses, and regression models.
Results: Both instruments were partially sensitive to changes during the 1-year follow-up and able to discriminate between patients with improved versus deteriorated motor signs. In addition, both were similarly responsive to changes in the motor difficulties of daily living; the SF-36 appeared to be more sensitive than the PDQ-39 to changes in depressive symptoms.
Conclusions: The SF-36 and the PDQ-39 were acceptably internally and externally responsive during the 1-year follow-up.
Electronic supplementary material: The online version of this article (doi:10.1186/s12955-017-0642-8) contains supplementary material, which is available to authorized users.


Background
Parkinson's disease (PD) is a progressive neurodegenerative disorder; therefore, measuring the patient's health-related quality of life (HRQoL) to reflect the progression of PD is important. HRQoL refers to one's health status in terms of physical, emotional, and social well-being [1]. Soh et al. [2] reported that the 36-item Short-Form Health Survey (SF-36) is the most frequently used generic HRQoL instrument, and that the 39-item Parkinson's Disease Questionnaire (PDQ-39) is the most frequently used disease-specific HRQoL instrument. Both measures have good internal consistency, stability, and discriminant validity [3,4], and both are recommended by the Movement Disorder Society [4,5].
The responsiveness of a measurement instrument is important because it reflects the extent to which the instrument can detect changes in the progression of a disease and whether it can show longitudinal validity [6,7]. Two major types of responsiveness have been recommended: internal and external [8]. Internal responsiveness refers to the ability of an instrument to detect change at two different time points. External responsiveness refers to the ability of an instrument to change relative to the change of a reference measure. These two types of responsiveness provide different but complementary information: internal responsiveness reflects patient-level changes, and external responsiveness reflects a different view of patient status using another available measure. Although the responsiveness of HRQoL instruments has been examined in several studies [9,10], few have examined the comparative responsiveness of the SF-36 and the PDQ-39 in patients with PD. Schrag et al. [11] reported that the PDQ-39 was acceptably responsive to small changes in PD progression during a 1-year follow-up, independent of any intervention or external measure. Using self-reported changes in health status as external criteria, Brown et al. [12] reported that the SF-36 was more responsive than was the PDQ-39 in an 18-month follow-up. However, neither study [11,12] included clinician-judged motor evaluation or psychological measures as external criteria to evaluate the responsiveness of the HRQoL instrument.
We assessed and compared the internal and external responsiveness of the SF-36 and PDQ-39 in patients with PD. Evaluations of motor signs and motor difficulties of daily living and a measure of depressive symptoms were used as external criteria. Many interventions for patients with PD have targeted motor problems and depression [13,14], but few have examined the comparative responsiveness of the SF-36 and the PDQ-39 against these external criteria to determine which one to use in longitudinal outcome research in patients with PD.

Study design and participants
Between September 2014 and August 2015, participants were consecutively recruited from the neurology departments of two medical centers in southern Taiwan. The inclusion criterion was a diagnosis of PD by a neurologist. The exclusion criterion was a Saint Louis University Mental Status Examination (SLUMS) [15] score that indicated severe cognitive impairment (a score < 19 for people with less than a high school education and < 20 for those with a high school education or above). The SLUMS is a 30-point questionnaire used to screen for cognitive impairment; it includes items for orientation, memory, attention, and executive function. Participants were concurrently interviewed and evaluated face-to-face at baseline and after 12 months of follow-up. The choice of a 1-year follow-up interval was based on a prior study [16]. Most (80.4%) evaluations were carried out when the participants were in the "on" phase of the medication cycle.
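The exclusion rule described above can be expressed as a small check. This is only a sketch of the cutoffs as stated in the text; the function name and the boolean education flag are hypothetical and do not come from the study.

```python
def is_excluded_by_slums(score: int, high_school_or_above: bool) -> bool:
    """Return True if a SLUMS score meets the exclusion rule quoted in the text.

    Assumption: the cutoffs are applied exactly as written in the paragraph
    (score < 19 with less than a high school education, score < 20 otherwise).
    """
    if not 0 <= score <= 30:
        raise ValueError("SLUMS scores range from 0 to 30")
    cutoff = 20 if high_school_or_above else 19
    return score < cutoff


# Hypothetical participants: (SLUMS score, high school education or above)
participants = [(18, False), (19, False), (19, True), (25, True)]
eligible = [p for p in participants if not is_excluded_by_slums(*p)]
```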
This study followed the principles of the Declaration of Helsinki. The study protocol was approved by the Institutional Review Board of National Cheng Kung University Hospital (B-ER-101-171) and the Institutional Review Board of E-Da Hospital (EMRP-102-024). Written informed consent was obtained from each participant.

Measures
HRQoL was examined using the SF-36 and PDQ-39. The SF-36 [17] includes eight domains: physical functioning, role limitations caused by physical problems, bodily pain, general health perceptions, mental health, role limitations caused by emotional problems, social functioning, and vitality. Each domain score and the summary score range from 0 to 100, with higher scores indicating a better HRQoL. The Taiwanese version of the SF-36 has been tested in middle-aged women [18] and stroke patients [19] and has been reported to be reliable and valid.
The PDQ-39 [20] has eight domains: mobility, activities of daily living, emotional well-being, stigma, social support, cognition, communication, and bodily discomfort. It is composed of a summary index and eight domain scores, each of which ranges from 0 to 100. A higher score indicates a more frequent self-perceived difficulty in HRQoL. The Taiwanese version of the PDQ-39 has been reported to be reliable and valid in patients with PD [21].
Parts II and III of the Movement Disorder Society Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) [22] and the Geriatric Depression Scale (GDS) [23] were used as external criteria. The MDS-UPDRS Part II includes 13 items that assess motor difficulties of daily living, and Part III contains 18 items that examine motor signs. Each item uses a 5-point Likert scale (0 = "no problems identified" and 4 = "severe problems identified"), and item scores are summed for the total score of each part [22]. The 30-item GDS is a yes/no screening scale for depressive symptoms. The GDS has good psychometric properties and has been reported as efficient in patients with PD [24,25].
According to Revicki et al. [26], the external criteria for assessing the responsiveness might be patient-rated global improvement, clinical measures with established responsiveness, or some combination of clinical and patient-based outcomes. Because it is important to select external criteria that are relevant for the disease indication and to use multiple independent external criteria, we chose the MDS-UPDRS and GDS, which are both recommended by the Movement Disorder Society [22,24] and frequently used for evaluating motor function and depressive symptoms in patients with PD. In addition, the minimal clinically important difference (MCID) has been established for the MDS-UPDRS Part III, which served as a reference for evaluating responsiveness of HRQoL measures in the present study.

Statistical analysis
We used the MCID of the MDS-UPDRS Part III [27] as a criterion for classifying patients with PD into Improving, Stable, and Worsening groups during the 1-year follow-up. Patients with MDS-UPDRS Part III change scores < −3.25 were considered improving, and those with change scores > 4.63 were considered worsening. The internal responsiveness of the SF-36 and PDQ-39 was then calculated using effect size (ES), standardized response mean (SRM), and the Wilcoxon signed rank test [8]. The latter was used because the change scores of the SF-36 and PDQ-39 did not follow the normal distribution. Significance was set at p < 0.05.
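The MCID-based grouping described above can be sketched as follows. The thresholds are the ones quoted in the text; the function name, the group labels as strings, and the convention that the change score is follow-up minus baseline (so that a higher MDS-UPDRS Part III score at follow-up means worsening) are assumptions made for illustration.

```python
IMPROVE_THRESHOLD = -3.25  # change scores below this count as improving
WORSEN_THRESHOLD = 4.63    # change scores above this count as worsening


def classify_change(baseline: float, follow_up: float) -> str:
    """Classify a patient by the MDS-UPDRS Part III change score.

    Assumption: change = follow_up - baseline, with higher scores
    indicating more severe motor signs.
    """
    change = follow_up - baseline
    if change < IMPROVE_THRESHOLD:
        return "improving"
    if change > WORSEN_THRESHOLD:
        return "worsening"
    return "stable"


# Hypothetical (baseline, follow-up) Part III totals, not study data
examples = [(30, 26), (30, 32), (30, 35)]
groups = [classify_change(b, f) for b, f in examples]
```

Under this convention, change scores between −3.25 and 4.63 inclusive fall into the stable group.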
ES is the mean change score between baseline and follow-up divided by the standard deviation (SD) of the baseline scores. The SRM is the mean change score between baseline and follow-up divided by the SD of the change scores. Negative values of the ES and SRM represent a worse HRQoL, and positive values represent a better HRQoL. For both the ES and SRM, a value of 0.2 represents a small sensitivity to change, 0.5 a moderate sensitivity, and 0.8 a large sensitivity [8].
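A minimal sketch of the two indices, assuming the sample SD (n − 1 denominator) and change scores computed as follow-up minus baseline. The `higher_is_better` flag is a hypothetical convenience for aligning the sign on instruments such as the PDQ-39, where higher scores indicate worse HRQoL; the text does not state how the signs were aligned. The scores in the example are invented.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up, higher_is_better=True):
    """Per-patient change, oriented so that negative means worse HRQoL."""
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (f - b) for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up, higher_is_better=True):
    """ES: mean change divided by the SD of the baseline scores."""
    changes = change_scores(baseline, follow_up, higher_is_better)
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up, higher_is_better=True):
    """SRM: mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up, higher_is_better)
    return mean(changes) / stdev(changes)


# Hypothetical scores for four patients on a higher-is-better scale
baseline = [50, 60, 70, 80]
follow_up = [45, 58, 62, 75]
es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
```

Because the SRM divides by the variability of the change itself, it can be much larger in magnitude than the ES when patients change by similar amounts.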
External responsiveness was evaluated using receiver operating characteristic curves (ROCs), correlation analyses, and regression models. The ROCs were used to assess the ability of a measure to reflect a change or lack of change in the external criteria [8]. The area under the ROC ranges from 0.5 (no better than chance at distinguishing improvers from non-improvers) to 1.0 (perfectly accurate) [28]. When calculating the ROC, the MCID of the MDS-UPDRS Part III was used to judge whether a patient improved, remained stable, or worsened. Correlation analyses of the change scores between the HRQoL measures and the GDS and MDS-UPDRS Part II were done using Spearman's rank correlation. Correlation coefficients of 0.25-0.49, 0.5-0.74, and ≥ 0.75 show small, moderate, and strong associations, respectively [29]. Simple linear regressions were used for a relative estimate of the degree of variance in the change of the external criterion that could be explained by the change score of the HRQoL instrument. SPSS 17.0 was used for statistical analyses (IBM, Chicago, IL, USA).
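The three external-responsiveness statistics can be illustrated with standard-library Python. This is a sketch, not the SPSS procedure used in the study: the area under the ROC is computed through its pairwise-comparison equivalent, Spearman's coefficient as the Pearson correlation of average ranks, and the variance explained in a simple linear regression as the squared Pearson correlation. All function names are hypothetical.

```python
from statistics import mean


def roc_auc(scores_positive, scores_negative):
    """Area under the ROC: share of (positive, negative) pairs in which the
    positive case scores higher, counting ties as one half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation between two sets of change scores."""
    return pearson(average_ranks(x), average_ranks(y))


def variance_explained(x, y):
    """R-squared of a simple linear regression of y on x."""
    return pearson(x, y) ** 2
```

For the ROC analysis, the two score lists would hold the HRQoL change scores of the groups being discriminated, for example patients classified as improved versus worsening by the external criterion.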

Results
At the 1-year follow-up, we had completed assessing 74 of the 95 enrolled patients with PD. Twenty-one (22.1%) patients were lost to follow-up: 2 had died, 7 could not be contacted, and 12 had dropped out. The baseline demographics and evaluation scores were not significantly different between the 74 patients who completed the study and the 21 who did not. Most of the patients were male (67.6%) and at Hoehn and Yahr stages I and II (75.6%) at baseline (Table 1). One year of follow-up showed that the severity of the PD and motor signs of the 74 patients had significantly increased.
The MCID of the MDS-UPDRS Part III showed that 16 of the 74 patients had improved and that 34 were worsening (Table 2). The improved group had significant differences between baseline and follow-up scores in the SF-36 general health and PDQ-39 mobility domains. The worsening group had significant differences between baseline and follow-up scores in the SF-36 social functioning, vitality, and total scores, and in the PDQ-39 emotional well-being, social support, communication, bodily discomfort, and summary index scores. The values of the area under the ROCs indicated that both the SF-36 and the PDQ-39 discriminated between patients with improved and with worsening motor signs (Table 3). In addition, the change in the MDS-UPDRS Part II scores between baseline and follow-up had similar degrees of association with the changes in the SF-36 (r = −0.40, p < 0.01) and PDQ-39 (r = 0.45, p < 0.01) scores. The changes in the GDS scores were moderately associated with the changes in the SF-36 (r = −0.53, p < 0.01) scores and only slightly associated with the changes in the PDQ-39 (r = 0.29, p < 0.05) scores. Regression analyses showed that the changes in the SF-36 and PDQ-39 scores explained more than 20% of the variance in the change scores of the MDS-UPDRS Part II (Table 3). A one-unit change in the SF-36 and in the PDQ-39 scores, on average, corresponded with approximately −0.15 and 0.25 points, respectively, in the changes in the MDS-UPDRS Part II scores. In addition, the change in the SF-36 scores explained 23% of the variance in the change scores of the GDS. A one-unit change in the SF-36 score corresponded with a change of approximately 0.18 points in the GDS score.

Discussion
We examined the internal and external responsiveness of the SF-36 and PDQ-39 in a 1-year follow-up of patients with PD. We found that both HRQoL instruments partially detected changes in patients with improved and worsening motor signs, and were able to discriminate between them. In addition, both instruments were similarly responsive to changes in the motor difficulties of daily living, and the SF-36 was more sensitive to change in depressive symptoms. Our findings support the longitudinal validity of the SF-36 and PDQ-39 for measuring the long-term health outcomes of patients with PD. However, our finding that the PDQ-39 was as responsive as the SF-36 for detecting 1-year change is inconsistent with Brown et al. [12]. The inconsistency might be attributable to the different criteria used: they used a subjective rating of overall HRQoL, but the present study used a clinician-judged evaluation of motor signs. The internal responsiveness of the HRQoL measures also seemed to depend upon the criteria used.
Table footnote: The minimal clinically important difference (MCID) of the Movement Disorder Society Revision of the Unified Parkinson's Disease Rating Scale Part III was used to classify PD patients into improved and worsening groups. *p < 0.05, **p < 0.01: significant differences between baseline and follow-up in the improved group (Wilcoxon test). †p < 0.05, ††p < 0.01: significant differences between baseline and follow-up in the worsening group (Wilcoxon test).
We examined the internal responsiveness of the domain scores to better understand the characteristics of these two HRQoL instruments. The significant and moderate changes in the social support and communication domains of the PDQ-39 in the worsening group signal the continuous decline in social interaction associated with worsening motor signs in patients with PD. The motor signs evaluated in the MDS-UPDRS Part III included speech, facial masking, rigidity, bradykinesia, gait, posture, and tremor. A quantitative study [30] reported motor signs as a significant predictor of communication in the PDQ-39. In addition, much qualitative research [31][32][33] has vividly described how patients with PD perceived their movement and communication difficulties to be distressing and socially embarrassing. Such stigmatized feelings might subsequently lead to decreased social interaction. Additional research is needed to quantitatively examine the relationships between motor signs, experienced stigma, and social interaction in patients with PD.
The results of the internal responsiveness of the two HRQoL scales might be related to the type of anchor used, the characteristics of the HRQoL scales, and the changes that occurred in our patients. We used the MCID of the MDS-UPDRS Part III as a criterion for classifying patients into improving, stable, and worsening groups. The post hoc calculation of the change scores of the other measures for these three groups indicated that while the stable group had smaller changes in the MDS-UPDRS Part II and PDQ-39 than did the other two groups, they had substantial changes in the GDS and SF-36 that were comparable to those of the other groups (Additional file 1).
Moreover, when we calculated the changes in domain scores (Additional file 2), we found that the stable group had smaller changes than did the other two groups in the SF-36 domains of physical functioning and role-physical, and in the PDQ-39 domains of mobility and ADL. In contrast, the stable group had a notable decline in the SF-36 domains of general health and role-emotional, and in the PDQ-39 domain of emotional well-being, which might be associated with increased depression as measured by the GDS.
Overall, the heterogeneous results among the domains of the HRQoL scales reflect the complex interplay between the physical, psychological, and social factors of an individual. Using the MDS-UPDRS Part III as the anchor has the strength of distinguishing those with changed motor signs, but the results might not be generalizable if other anchors (e.g., GDS) are used.
Using the MCID of the MDS-UPDRS Part III as a gold standard for evaluating external responsiveness, we found that both the SF-36 and the PDQ-39 discriminated between patients with improved and with deteriorated motor signs. In addition, when the MDS-UPDRS Part II was used as the external criterion, the SF-36 and PDQ-39 had similar degrees of sensitivity to changes in the motor difficulties of daily living. Our post hoc correlation analyses of the change scores between the HRQoL domains and the MDS-UPDRS Part II showed significant correlations in the SF-36 physical functioning, bodily pain, social functioning, and vitality domains (rs = −0.233~−0.487) and significant correlations in the PDQ-39 mobility, activities of daily living, emotional well-being, social support, and bodily discomfort domains (rs = 0.300~0.375). This suggests that the motor difficulties of daily living in patients with PD contribute to compromised HRQoL across physical, psychological, and social domains.
When the GDS was used as an external criterion, the SF-36 appeared to be more responsive to changes in depressive symptoms than did the PDQ-39. This might be because the SF-36 has three domains (mental health, role-emotional, and vitality) that ask questions directly related to psychological health, while the PDQ-39 has only one (emotional well-being). Our post hoc correlation analyses of the change scores between the HRQoL domains and the GDS also found that the SF-36 had more domains significantly correlated with the GDS than did the PDQ-39. Significant correlations were found in six of eight domains in the SF-36 (physical functioning, role-physical, general health, mental health, role-emotional, and vitality; rs = −0.237~−0.532), while in the PDQ-39, significant correlations were found in only three of eight domains (mobility, emotional well-being, and social support; rs = 0.285~0.316).
It is noteworthy that while the change scores of the SF-36 domains were significantly correlated with the change scores of either the MDS-UPDRS Part II or the GDS, some domains (stigma, cognition, and communication) of the PDQ-39 were correlated with neither one. Because the PDQ-39 items were generated from in-depth interviews with patients about their concerns regarding the effects of PD, future research should include other external criteria (e.g., social interaction) to validate these unique domains.
This study has some limitations. First, we used convenience sampling and recruited our participants from two medical centers. The generalizability of our findings might thus be limited to clinic-based patients with PD. In addition, 22.1% of participants at baseline were lost to the 1-year follow-up. Although their baseline demographics and evaluation scores were not significantly different from those of the patients who completed the study, the smaller sample size might affect the validity of our results and conclusion. Moreover, all patients were tested at a single point in time without controlling for the medication cycle phase during the interview or evaluation. We allowed medication effects to vary naturally, because the purpose of this study was not to test responsiveness to any intervention. However, this design might lead to some confounding of the evaluation results. Future research probably should control for participants' medication status to arrive at a more precise estimation of intervention effectiveness. Finally, psychometric testing of the Taiwanese version of the SF-36 has been done only in middle-aged women and stroke patients [18,19]. Future research probably should examine the psychometric properties of the SF-36 in Taiwanese patients with PD.