Self‐Report versus Clinician Examination in Early Parkinson's Disease

Abstract. Background: Evaluating the discrepancies between patient-reported measures and clinician examination has implications for formulating individual treatment regimens. Objective: This study investigated the association between health outcomes and the level of self-reported motor-related functional impairment relative to clinician-examined motor signs. Methods: Recently diagnosed PD patients were evaluated using data from the Parkinson's Progression Marker Initiative (PPMI, N = 420) and the PASADENA phase II clinical trial (N = 316). We calculated the average normalized difference between each participant's MDS-UPDRS (Movement Disorder Society Unified Parkinson's Disease Rating Scale) part II and part III scores. Individuals with score differences below the 25th or above the 75th percentile were labeled as low- and high-self-reporters, respectively (those in between were labeled intermediate-self-reporters). We compared a wide range of clinical and biomarker readouts among these three groups using Kruskal-Wallis nonparametric and Pearson's χ² tests. Spearman's correlations were used to test associations between MDS-UPDRS subscales. Results: In both cohorts, high-self-reporters reported the largest impairment/symptom experience for most motor and nonmotor patient-reported variables. By contrast, these high-self-reporters were similarly or less impaired on clinician-examined and biomarker measures. Patient-reported nonmotor symptoms on MDS-UPDRS part IB showed the strongest positive correlation with self-reported motor-related impairment (PPMI rs = 0.54, PASADENA rs = 0.52). This correlation was numerically stronger than the correlation between part II and clinician-examined MDS-UPDRS part III (PPMI rs = 0.38, PASADENA rs = 0.28). Conclusion: Self-reported motor-related impairments reflect not only motor signs/symptoms but also other self-reported nonmotor measures.
This may indicate (1) a direct impact of nonmotor symptoms on motor-related functioning and/or (2) the existence of general response tendencies in how patients self-rate symptoms. Our findings warrant further investigation into the suitability of MDS-UPDRS II for assessing motor-related impairments. © 2021 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society

When treating Parkinson's disease (PD), the decisions about when to prescribe symptomatic treatments and what agents to use are unique to each person. 1 To formulate the proper treatment regimen, physicians rely primarily on patient-reported symptoms supplemented by expert examination. Moreover, when new drugs are tested, the Food and Drug Administration (FDA) highlights the primacy of patient-reported scales complemented by clinician examinations to achieve the most meaningful outcomes for patients.
In PD, one of the most commonly used scales to assess motor impairment is the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), which includes a patient-reported questionnaire to assess motor aspects of experiences of daily living (ie, part II) and a clinician-administered motor examination (ie, part III). Because both parts are intended to assess aspects of motor impairment in PD, one would expect a close relationship between these subscales. However, it has previously been shown that self-reported measures may be associated with comorbid conditions such as anxiety, depression, [2][3][4][5] or cognitive deficits. 6,7 In particular, Weintraub and colleagues 8 suggested that nonmotor symptoms of PD may impact patients' endorsement of motor problems in daily life, supported by other studies demonstrating that anxiety and depression are strongly related to self-reported quality of life and motor severity. 9 Furthermore, unawareness or underrecognition of certain symptoms has been well documented in PD. [10][11][12][13][14] Other differences between patient-reported and clinician-examined measures may result from differences in the scale design itself. For instance, whereas MDS-UPDRS III is focused on motor examination only, MDS-UPDRS II was developed to assess the impact of motor symptoms (eg, tremor) on daily activities (eg, eating tasks). 15 Although MDS-UPDRS III cannot be considered objective, the standardized training for administration and the well-defined and differentiated response options help reduce the potential for subjective differences between or within raters. In comparison, MDS-UPDRS II provides a more subjective interpretation of the experienced impact of motor impairment that depends on each individual patient's reference system.
Given the extensive use of MDS-UPDRS in clinical practice and clinical trials, we aimed to investigate the relationship between MDS-UPDRS parts II and III specifically to identify (1) whether and to what extent discrepancies between self-assessed and clinician-examined measures exist and (2) how these differences are associated with demographic and clinical characteristics. This study analyzed data from individuals with early PD participating in two large studies: the Parkinson's Progression Marker Initiative (PPMI) observational clinical study and the phase II clinical trial PASADENA. Gaining a better understanding of potential differences between patients' self-reports and clinical motor examinations has implications for formulating individual treatment regimens and evaluating treatment success in clinical drug development. The findings of this study may thereby help to ensure that individuals with PD receive proper treatment for their needs.

Study Population
PPMI is an international multicenter cohort study to explore the cause and natural history of PD through longitudinal investigation of the progression of PD biomarkers. 16 The PPMI protocol and eligibility criteria are available elsewhere. 16 The PPMI protocol was reviewed and approved by the Institutional Review Board and the Independent Ethics Committee (IRB/IEC) at each center. Written informed consent was obtained from all participants. For the present analyses, of the 423 individuals with de novo diagnosed PD in the PPMI study, we selected the 420 who had baseline Hoehn and Yahr (HY) stage 1 or 2 and no missing baseline MDS-UPDRS data (Fig. 1). We used the baseline data (downloaded in June 2019), at which time participants were not taking any symptomatic PD medications.
PASADENA is a phase II clinical trial (NCT03100149) sponsored by F. Hoffmann-La Roche Ltd. The study was approved by the IRB or IEC. Further information on the protocol and eligibility criteria can be found at https://clinicaltrials.gov/ct2/show/NCT03100149 and in Pagano et al. 17 At baseline, data from 316 individuals with recently diagnosed PD (ie, disease duration <2 years) were available. All patients were either treatment naïve or treated with a stable dose of monoamine oxidase-B inhibitors (36.4%) and had HY stage 1 or 2, with a diagnosis of PD confirmed by dopamine transporter single-photon emission computed tomography (DaT-SPECT).
Only baseline data were considered for the present analysis; therefore, no drug effects were investigated.

Self-Reported Motor-Related Impact and Clinician-Examined Motor Signs
We investigated MDS-UPDRS II as the self-reported measurement of motor-related impact experienced by individuals with PD in their daily lives and MDS-UPDRS III as the clinician-examined measurement of motor signs.
Thirteen items in MDS-UPDRS II and 33 items in MDS-UPDRS III are each rated on a five-point Likert scale (0 = normal, 1 = slight, 2 = mild, 3 = moderate, and 4 = severe). The MDS-UPDRS II score ranges from 0 to 52, and the MDS-UPDRS III score ranges from 0 to 132. To facilitate comparisons of relative scores on parts II and III, each score was normalized by the number of items on its scale (ie, part II score/13 items and part III score/33 items). Thus, the normalized score represented the average severity rating (0-4 points) per item. The difference in normalized scores was then computed per participant (ie, individual normalized MDS-UPDRS II score − individual normalized MDS-UPDRS III score). The resulting difference score represented the average severity of part II relative to part III items, whereby negative values indicated higher average severity scores on part III than on part II and vice versa.
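The normalization and difference score described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function name `normalized_difference` and the range checks are our own assumptions, and part totals are assumed to be already summed per participant.

```python
# Hypothetical helper illustrating the per-participant difference score.
PART_II_ITEMS = 13   # MDS-UPDRS part II items, each scored 0-4
PART_III_ITEMS = 33  # MDS-UPDRS part III items, each scored 0-4


def normalized_difference(part_ii_total: float, part_iii_total: float) -> float:
    """Mean per-item severity on part II minus mean per-item severity on part III.

    Positive values: self-reported severity per item exceeds clinician-examined
    severity per item; negative values: the reverse.
    """
    if not 0 <= part_ii_total <= 4 * PART_II_ITEMS:
        raise ValueError("part II total must lie in 0-52")
    if not 0 <= part_iii_total <= 4 * PART_III_ITEMS:
        raise ValueError("part III total must lie in 0-132")
    return part_ii_total / PART_II_ITEMS - part_iii_total / PART_III_ITEMS


# Hypothetical participant: part II = 26 (2.0 per item), part III = 33 (1.0 per item)
print(normalized_difference(26, 33))  # 1.0
```

Normalizing by item count rather than by maximum score gives the same ordering here, because both scales use the same 0-4 item range.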
The categorization into groups of high-, intermediate-, and low-self-reporters was done separately for each cohort (ie, PPMI vs. PASADENA). Each participant was categorized by their difference score according to its position in the respective frequency distribution. Individuals with a difference score below the 25th percentile were labeled as low-self-reporters (ie, self-report ratings relatively lower than clinician ratings) and those above the 75th percentile as high-self-reporters (ie, self-report ratings relatively higher than clinician ratings). Individuals with score differences between the 25th and 75th percentiles were labeled as intermediate-self-reporters. This approach does not determine the "true" severity of motor symptoms; neither clinician-examined nor self-report markers were considered the gold standard. Rather, participants were compared with each other in their overall relative self-versus examiner ratings. The variables in each category and their classification and scoring are presented in detail in Appendix S1.
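The percentile-based grouping can be sketched as follows, using only the Python standard library. Two details are assumptions because the text does not specify them: percentiles are computed with the "inclusive" (linear-interpolation) method of `statistics.quantiles`, and a score lying exactly on a cut-off is labeled intermediate. The function name is hypothetical.

```python
from statistics import quantiles


def classify_self_reporters(diffs):
    """Label difference scores as 'low', 'intermediate', or 'high' self-reporters.

    Cut-offs are the 25th and 75th percentiles of this cohort's own scores, so
    the function is meant to be called once per cohort (eg, PPMI, then PASADENA).
    Assumptions: 'inclusive' percentiles; values on a cut-off count as intermediate.
    """
    q25, _, q75 = quantiles(diffs, n=4, method="inclusive")
    labels = []
    for d in diffs:
        if d < q25:
            labels.append("low")
        elif d > q75:
            labels.append("high")
        else:
            labels.append("intermediate")
    return labels


# Hypothetical cohort of five difference scores
print(classify_self_reporters([-2, -1, 0, 1, 2]))
# ['low', 'intermediate', 'intermediate', 'intermediate', 'high']
```

Because the cut-offs are recomputed from each sample, the same difference score can fall into different groups in different cohorts, which is the sample dependence discussed under Study Limitations.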

Statistical Analyses
Descriptive results comprised means and standard deviations for continuous variables and frequencies and percentages for categorical variables. Kruskal-Wallis nonparametric tests compared low-, intermediate-, and high-self-reporter scores on continuous variables. Categorical variables were analyzed using Pearson's χ² test. Because the groups were assigned according to each individual's place in the frequency distribution of the respective population and the borders between intermediate- and low-/high-self-reporters were somewhat arbitrary, the primary focus of this study was to compare the extreme ends of the distribution. Therefore, Cohen's d was used to quantify the effect sizes of differences between high- and low-self-reporters for continuous variables, and Cramér's V for categorical variables.
A significance level of α = 0.05 was used for all statistical tests, not corrected for multiple comparisons.
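The tests and effect sizes named above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code. Assumptions: Cohen's d uses the pooled standard deviation; Cramér's V uses the uncorrected Pearson χ²; the Kruskal-Wallis p-value uses the usual χ² approximation, which for three groups (2 degrees of freedom) reduces to exp(-H/2). Established implementations such as `scipy.stats.kruskal` and `scipy.stats.chi2_contingency` would normally be used instead.

```python
import math
from statistics import mean, variance


def cohens_d(high, low):
    """Cohen's d (high minus low), standardized by the pooled SD (assumed variant)."""
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / math.sqrt(pooled_var)


def cramers_v(table):
    """Cramér's V from an r x c table of counts (Pearson chi-square, no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3groups(low, intermediate, high):
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = 2) p-value."""
    groups = [list(low), list(intermediate), list(high)]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * sum_term - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h, math.exp(-h / 2)  # survival function of chi-square with 2 df


h, p = kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2), p < 0.05)  # 7.2 True
```

With the uncorrected α = 0.05 used here, each of the many compared variables is tested independently, so some nominally significant differences are expected by chance alone.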

Secondary Analysis
To obtain further insights into the potential intercorrelation between the patient-reported and clinician-examined scales, Spearman's correlations tested for associations between MDS-UPDRS parts II and III and for their respective associations with parts IA and IB. Because parts II and III are both intended to assess motor-related impairment, we expected a high correlation between these two scores. Moreover, to identify whether and which factors are associated with self-reported and clinician-examined ratings, we used multivariable linear regression models to predict MDS-UPDRS parts II and III, respectively, with parts IA, IB, and II or III, age, and sex as predictor variables. Variables were scaled before being entered into the regression model.
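The secondary analysis can be sketched with NumPy: Spearman's ρ as the Pearson correlation of average ranks, and an ordinary least squares model on z-scored variables, so that coefficients are standardized betas. Assumptions: both the outcome and the predictors are z-scored (the text says only that "variables were scaled"), an intercept is fitted, and binary sex is entered as a 0/1 column before scaling. All function names are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks with ties assigned the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def zscore(column):
    column = np.asarray(column, dtype=float)
    return (column - column.mean()) / column.std(ddof=1)


def standardized_ols(outcome, predictors):
    """OLS of z-scored outcome on z-scored predictors; returns {name: standardized beta}."""
    names = list(predictors)
    design = np.column_stack(
        [np.ones(len(outcome))] + [zscore(predictors[name]) for name in names]
    )
    coef, *_ = np.linalg.lstsq(design, zscore(outcome), rcond=None)
    return dict(zip(names, coef[1:]))  # drop the intercept


# Hypothetical usage: part II regressed on parts IA, IB, III, age, and sex
# betas = standardized_ols(part_ii, {"IA": ia, "IB": ib, "III": iii, "age": age, "sex": sex})
```

Because all variables share a unit variance, the standardized betas can be compared directly, which is what statements such as "part IB was the strongest predictor of part II" rely on.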

Parkinson's Progression Marker Initiative
The 420 individuals with PD from the PPMI cohort had a mean baseline age of 61.6 ± 9.7 years, and 65.7% were men. One hundred and three participants were categorized as low-self-reporters, 214 as intermediate-self-reporters, and 103 as high-self-reporters. There were no differences in age, sex, and other sociodemographic variables between the three groups (all P > 0.05; Table 1). Age of onset, duration of PD, the most affected side at the onset of disease, and common genetic variants (β-glucocerebrosidase and leucine-rich repeat kinase 2) were similar between these groups (all P > 0.05; Table 1).
High-, intermediate-, and low-self-reporters differed on almost all patient-reported scales, except for the Questionnaire for Impulsive-Compulsive Disorders in Parkinson's Disease (QUIP; P = 0.1).
Patient-reported measures such as anxiety, depression, somnolence, REM sleep behavior disorder (RBD), and autonomic nervous system symptoms were more frequently reported among high-self-reporters. (Further details are provided in Appendix S1.) For scales with clinician report based on patient information, findings were mixed. On MDS-UPDRS IA (complex behaviors), the highest impairment was observed in high-self-reporters, followed by intermediate- and low-self-reporters. On the Schwab and England Activities of Daily Living (ADL) scale, intermediate-self-reporters showed higher functionality than both low- and high-self-reporters, whereas no difference between low- and high-self-reporters was found (92.8 ± 5.5 vs. 94.2 ± 5.6 vs. 91.4 ± 6.5, in low-, intermediate-, and high-self-reporters, respectively, P < 0.001).
By contrast, the three groups did not differ on most clinician-examined and objective measures (other than MDS-UPDRS III), except for HY stage and University of Pennsylvania Smell Identification Test (UPSIT) score. Low-self-reporters were in higher HY stages than intermediate- and high-self-reporters, whereas no difference between intermediate- and high-self-reporters was found (1.8 ± 0.4 vs. 1.5 ± 0.5 vs. 1.5 ± 0.5, in low-, intermediate-, and high-self-reporters, respectively, P < 0.001). On the UPSIT olfactory test, high-self-reporters had the best performance compared to low- and intermediate-self-reporters, whereas intermediate- and low-self-reporters did not differ from each other (21.8 ± 7.7 vs. 21.7 ± 7.9 vs. 24.1 ± 9.1, in low-, intermediate-, and high-self-reporters, respectively, P = 0.044). Despite more patient-reported orthostatic symptoms among high-self-reporters than among low- and intermediate-self-reporters (MDS-UPDRS IB, lightheadedness on standing, P = 0.011), the objectively measured systolic blood pressure drop was not statistically different between the groups (P = 0.6). There were no differences between groups in self-reported cognitive impairment according to MDS-UPDRS IA (P = 0.3). Similarly, there was no difference in performance on any cognitive test (all P > 0.05; Table 2).

PASADENA
In the PASADENA cohort, mean baseline age was 59.9 ± 9.1 years, and 67.4% were men. The 316 participants were grouped into 80 low-self-reporters, 157 intermediate-self-reporters, and 79 high-self-reporters. There were no group differences in sociodemographic or disease-related characteristics (all P > 0.05; Table 1).
Regarding patient-reported measures, the MDS-UPDRS part I score was lowest in low-self-reporters, followed by intermediate-self-reporters and high-self-reporters (2.1 ± 1.8 vs. 3.3 ± 2.4 vs. 5.2 ± 3.4, in low-, intermediate-, and high-self-reporters, respectively, P < 0.001) (details at the item level are provided in Appendix S1). Patient-reported scores of depression and anxiety, RBD, Parkinson's Disease Questionnaire (PDQ-39) score, and autonomic symptoms were higher in high-self-reporters (details in Appendix S1).
For clinician-report scales based on patient information, the highest impairment on MDS-UPDRS IA was noted in high-self-reporters, followed by intermediate- and low-self-reporters. On the Schwab and England ADL scale, only high-self-reporters differed from the other groups, whereas no differences were observed between low- and intermediate-self-reporters (92.8 ± 5.9 vs. 93.7 ± 6.2 vs. 90.0 ± 5.8, in low-, intermediate-, and high-self-reporters, respectively, P < 0.001).
The only clinician-examined measure analyzed in the PASADENA cohort (other than MDS-UPDRS III) was HY stage, on which high-self-reporters were in lower stages than intermediate- and low-self-reporters (2.0 ± 0.2 vs. 1.7 ± 0.5 vs. 1.6 ± 0.5, in low-, intermediate-, and high-self-reporters, respectively, P = 0.0005).

Predictors of MDS-UPDRS Part II and Part III Scores
Summarized findings of the two cohorts are reported in Appendix S1.

Discussion
This study examined the relationship between the degree of self-reported motor-related functional impairment and clinician-examined motor sign severity, as evaluated by discrepancies between scores on MDS-UPDRS parts II and III. We found consistent response patterns across variables and in two independent cohorts: individuals who endorsed a high number and/or severity of experienced motor symptoms (part II) also endorsed a greater number of nonmotor symptoms across the spectrum of patient-reported questionnaires. However, on clinician-examined ratings, point estimates tended to go in the opposite direction: high-self-reporters tended to have less severe clinician-examined objective signs than low-self-reporters, and vice versa. Scores of intermediate-self-reporters were in the intermediate range on most measures, and this pattern was similar in both the PPMI and PASADENA populations.
In the PASADENA study, we found that MDS-UPDRS II, which is intended to measure patients' judgments of motor-related impact in daily life, is predicted not only by clinician-examined motor sign severity on part III but also, numerically even more strongly, by nonmotor symptoms reported in parts IA and IB. In the PPMI study, part IB was also the strongest predictor of part II, followed by part III scores (findings are reported in Appendix S1).
There are likely two primary (and not mutually exclusive) explanations for these findings.
The first explanation is that "motor" activities of daily living do not depend exclusively on motor function. For example, difficulties with hobbies, dressing, or hygiene activities may be caused directly by motor impairment but could also be caused by impaired cognition, fatigue, pain, and so on. This potential contribution is not reflected in how most of the items in MDS-UPDRS part II are phrased; that is, a participant responding to the questionnaire may not realize that an item is meant to capture the contribution of motor problems only. Moreover, some individuals might find it difficult to distinguish between the sources of their problems. Similar issues were also observed in part II's predecessor scale, the UPDRS-ADL. 18 Therefore, the discriminant validity of MDS-UPDRS part II as a scale to measure pure motor aspects of experiences of daily living is questionable and subject to jingle-jangle fallacies (ie, the name of a scale drives how it is interpreted). If MDS-UPDRS part II has stronger correlations with nonmotor symptoms than with motor symptoms, then it may be measuring more than what it was designed to measure.
The second explanation is the influence of response tendencies: patients who self-rate motor-related symptoms higher than their motor examination findings also consistently self-report greater severity/impairment across a diverse array of PD symptoms, despite no worsening of examination findings or objective tests. Indeed, response style biases could exist in patient-reported (or any self-reported) questionnaires, meaning that individuals endorse symptoms in a certain way regardless of the objective severity or content of the question. 19 This phenomenon has also been documented outside the field of PD. According to the Symptom Perception Hypothesis, patients with negative affectivity tend to report more physical health issues. 2,3 In particular, it has previously been shown that depression was associated with recalling more physical symptoms, whereas anxiety was associated with reporting more momentary physical symptoms. 4 Both explanations are supported by findings in previous studies. Indeed, treatment of depression was associated with improved self-reported quality of life and less self-reported functional motor-related impact on the UPDRS-ADL score but no change in the UPDRS motor score. 20 Weintraub and colleagues 8 also reported that nonmotor symptoms (ie, signs of depression or cognitive impairment), but not physician-assessed motor sign severity, were significant predictors of functional disability in PD as measured by the (self-reported) UPDRS-ADL subscale. Others have reported positive correlations between depression and higher self-reported functional disability, motor impairment, and disease severity. 21
These findings could support the first explanation; for example, someone who suffers from depression may perform daily (motor) activities slowly or with more difficulty due to apathy, subjective fatigue, and so on. In line with the second explanation, depression may also alter the degree to which patients recognize, perceive, or are affected by motor difficulties that make activities harder.
In addition to psychiatric comorbidities, data collection method, the language of the questionnaire, and cognitive load, characteristics of respondents such as age, sex, education, income, race, culture, and personality are among the variables that can affect response style. 19 However, in the present analyses we did not find any differences in the assessed demographic variables between groups, which may, in part, be due to the restricted range of demographic and cultural variables in these populations. It is important to note that most of the self-reported scales used in the present studies are intercorrelated, driven by the partially overlapping (or even identical) constructs being assessed. That high-self-reporters reported more symptoms overall indicates that patients were consistent in how they reported their symptoms.
The present results have implications for quantifying treatment response over time, both in the clinic and in clinical trials. If response tendencies, other nonmotor symptoms, and/or how a question is formulated drive higher values on a self-reported "motor-related" scale, this needs to be considered when interpreting findings. This consideration is intuitive for experienced clinicians; for example, if a patient who reports more motor symptoms has also experienced a major negative life event, clinicians consider the patient's emotional state rather than automatically increasing levodopa doses.
However, when individualized evaluation is impossible, as in clinical trials, important bias can occur. In particular, whereas a stable response pattern could potentially be adjusted for in trial design, any change in this effect during the trial can be problematic. One possible way to address this bias may be to correct self-reported measures like MDS-UPDRS part II for potential effects of noteworthy nonmotor symptoms, for instance, by a regression-based approach, similar to what is done in normative studies that correct cognitive scales for potential sociodemographic effects like level of education. Further research is required to better understand what drives the present results and the utility of such an approach.
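A regression-based correction of this kind could look like the following sketch. It is hypothetical and deliberately minimal: a single nonmotor predictor (part IB) is used, the slope is estimated by least squares in a reference sample, and each patient's part II score is shifted by the amount predicted from how far their part IB score deviates from the reference mean. The choice of predictor, reference sample, and function names are all assumptions, not a method proposed or validated in this study.

```python
from statistics import mean


def fit_reference_slope(part_ib, part_ii):
    """Least-squares slope of part II on part IB in a reference sample, plus mean part IB."""
    mean_ib, mean_ii = mean(part_ib), mean(part_ii)
    sxy = sum((x - mean_ib) * (y - mean_ii) for x, y in zip(part_ib, part_ii))
    sxx = sum((x - mean_ib) ** 2 for x in part_ib)
    return sxy / sxx, mean_ib


def adjust_part_ii(part_ii, part_ib, slope, reference_mean_ib):
    """Remove the part II shift predicted by part IB's deviation from the reference mean."""
    return part_ii - slope * (part_ib - reference_mean_ib)


# Hypothetical reference sample: slope 0.5, mean part IB 5
slope, ref_mean = fit_reference_slope([2, 4, 6, 8], [3, 4, 5, 6])
print(adjust_part_ii(7, 9, slope, ref_mean))  # 5.0
```

As in normative corrections for education, the adjusted score is only as valid as the reference model, so in a trial the slope would have to be estimated independently of the treatment effect being measured.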

Study Limitations
There are some limitations to this study. First, we analyzed data from individuals with early PD, who had low levels of impairment, particularly in MDS-UPDRS parts I and II; therefore, the dynamic range may be limited. The findings highlight the immediate need for patient-reported outcome measures that capture symptoms that meaningfully impact patients' lives in the earliest stages of the disease. The importance of patient-reported outcomes (ie, MDS-UPDRS parts I and II) may be higher in a population with more advanced disease stages. Only 4% of the PASADENA population showed abnormal scores of anxiety and depression at baseline, whereas, in the PPMI, 14% of the whole sample showed signs of depression (Geriatric Depression Scale ≥5), and 23.8% showed signs of anxiety (State-Trait Anxiety Inventory >75th percentile). Nevertheless, the observed results were similar between PPMI and PASADENA. It is possible that individuals with depression/functional impairment might be less likely to participate in a research study or clinical trial, leading to an underrepresentation of the role of depression or anxiety in our study. Moreover, our approach of assigning individuals to groups of low-, intermediate-, and high-self-reporters has limitations. We based the classification on the difference between normalized MDS-UPDRS part II and part III scores and labeled the lower and upper 25% of all individuals as low- and high-self-reporters, respectively, for each population separately. Thus, the classification is based on the distribution of the difference between part II and part III in the respective population and is therefore highly sample dependent. The results may differ if replicated in another study population. Nevertheless, we tested this approach in two independent samples (PPMI and PASADENA) and achieved overall consistent results, which corroborates the findings.
Future directions are explained in Appendix S1.

Conclusions
Self-reported motor-related impact in daily life, as assessed by MDS-UPDRS II, not only reflects examination-based severity (ie, MDS-UPDRS III) but is also strongly associated with nonmotor symptoms. This may reflect both a direct impact of nonmotor manifestations on motor function and differences in response tendencies to self-reported questionnaires. These factors must be considered when using self-reported rating scales such as MDS-UPDRS II in clinical research and drug trials.