Introduction

The unified multiple system atrophy (MSA) rating scale (UMSARS) was developed almost 20 years ago as a clinical rating scale to capture multiple aspects of the disease [4, 17]. It is easy to use in clinical practice, with an average administration time of ~15 min to complete four subscales: UMSARS-1 (12 questions) rates patient-reported functional disability; UMSARS-2 (14 questions) rates clinician-assessed motor impairment; UMSARS-3 records blood pressure and heart rate measurements in the supine and standing positions; and UMSARS-4 (1 question) rates chore-based disability. Higher scores on the UMSARS indicate more severe disease. The US Food and Drug Administration and the European Medicines Agency have relied on the UMSARS for drug development, and all recently completed clinical trials for MSA have used this scale as the primary endpoint [5, 7, 13, 14, 15].

With its widespread use, however, a number of shortcomings of the UMSARS as a clinical outcome assessment (COA) have become increasingly apparent. Here we summarize the limitations of the UMSARS, confirm some of these limitations with data from our ongoing Natural History Study of the Synucleinopathies (NHSS) (ClinicalTrials.gov: NCT01799915), and suggest a framework to develop and validate an improved COA to be used as the primary endpoint in future clinical trials of disease modification in patients with MSA.

Overall limitations of the UMSARS

Ability to detect change

Natural history studies have shown that the UMSARS is sensitive to change, as it increases over time with worsening disease severity [3, 6, 16]. However, in published studies, the standard deviation (SD) of annual UMSARS increases varied widely and in many cases exceeded the expected effect sizes of candidate drugs. Specifically, the annual UMSARS-1 increase in published studies ranged from 3.91 to 6.5, with an SD ranging from 0.6 to 6; and the annual UMSARS-2 increase ranged from 3.5 to 8.2, with an SD ranging from 0.6 to 7 [5, 6, 7, 13, 15, 16]. Therefore, to achieve sufficient statistical power, studies using UMSARS require large cohorts (typically at least 100 patients per group in parallel-group placebo-controlled clinical trials) and long study periods (at least 1 year), which is a significant hurdle for a rapidly progressive orphan disease.

To explore the contribution of each item of the scale to its sensitivity to change, we analyzed data from the baseline and 1-year visits of 70 patients with MSA enrolled in the NHSS and calculated, for each item, the mean change, standard deviation (SD), and standardized effect (mean change/SD) (Table 1). We identified several items with a low standardized effect (< 0.20), denoting little ability to detect change, including orthostatic symptoms (UMSARS-1 item 9), bowel function (UMSARS-1 item 12), and tremor at rest (UMSARS-2 item 5). Conversely, items with a high standardized effect (> 0.30), denoting good ability to detect change, were dressing and hygiene (UMSARS-1 items 5 and 6), and posture and gait (UMSARS-2 items 12 and 14).
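The standardized effect described above can be sketched in a few lines. This is a minimal illustration, assuming that the SD in the ratio is the sample SD of the individual 1-year changes; the function name and the six example scores are hypothetical and are not NHSS data.

```python
from statistics import mean, stdev


def standardized_effect(baseline, follow_up):
    """Return (mean change, SD of change, mean change / SD) for paired scores.

    Assumption: the SD is the sample SD of the per-patient changes.
    """
    changes = [after - before for before, after in zip(baseline, follow_up)]
    mean_change = mean(changes)
    sd_change = stdev(changes)
    # Guard against a zero SD (all patients changed by the same amount).
    effect = mean_change / sd_change if sd_change > 0 else float("nan")
    return mean_change, sd_change, effect


# Hypothetical scores (0-4) on a single item for six patients.
baseline_scores = [1, 2, 1, 0, 2, 1]
one_year_scores = [2, 2, 2, 1, 3, 1]

mean_change, sd_change, effect = standardized_effect(
    baseline_scores, one_year_scores
)
print(f"mean change={mean_change:.2f}, SD={sd_change:.2f}, effect={effect:.2f}")
```

An item whose ratio falls below 0.20 would be flagged as having little ability to detect change under the thresholds used in the text.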

Table 1 Mean annual change, standard deviation and standardized effect of each individual UMSARS item in patients with MSA enrolled in the Natural History Study of the Synucleinopathies (NHSS)

As a second step, we undertook a preliminary feasibility analysis to determine whether a modified, reduced version of the UMSARS could be more sensitive to change. Using data from the NHSS, we created multiple iterations of the scale by sequentially adding items. We created 24 iterations with different combinations of items of the UMSARS. An abridged UMSARS including 11 items with high standardized effect (UMSARS-1 items 2, 3, 6, 7, and 11; and UMSARS-2 items 1, 2, 9, 11, 12, and 14), with a total maximum score of 44 (shown in bold in Table 2), had the best ability to detect change over 1 year. Additional items did not improve the standardized effect or the % annual increase (Figs. 1, 2). This 11-item UMSARS had a mean annual increase of 13% (5.77 ± 4.88 points, standardized effect = 1.17) from baseline. In contrast, the current UMSARS with 24 items had a mean annual increase of 11% (10.44 ± 9.76 points, standardized effect = 1.07). While these differences of 2% in annual increase and 0.10 in standardized effect are small and could increase with further enhancement, they already yield a significant difference in power calculations for clinical trials. Using the current UMSARS, it is necessary to randomize 304 patients to detect a 30% difference between a disease-modifying drug and placebo (80% power, 0.05 alpha). In contrast, using the abridged 11-item UMSARS would require randomizing only 256 patients to detect the same difference. Assuming a $40,000 cost per patient in a 1-year disease-modifying clinical trial, this reduction of 48 patients (from 304 to 256) would result in ~$2 million in savings. Needless to say, a proper development and validation process for a new COA will be required, but this limited preliminary exercise already shows that the current UMSARS can be improved.
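The power calculation above can be approximated with the standard normal-approximation formula for comparing two means. The sketch below is an assumption about how such figures could be derived, not the authors' stated method: it treats a 30% slowing of progression as a between-arm difference of 0.30 × standardized effect (in SD units) and uses a two-sided alpha. The function name and defaults are illustrative. Under these assumptions it gives 256 patients in total for a standardized effect of 1.17 and 306 for 1.07, close to the 304 reported in the text.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(standardized_effect, treatment_effect=0.30,
                       alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-arm parallel trial.

    Assumption: the detectable difference between arms, in SD units, is
    treatment_effect * standardized_effect.
    """
    d = treatment_effect * standardized_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


total_current = 2 * patients_per_group(1.07)   # current UMSARS
total_abridged = 2 * patients_per_group(1.17)  # abridged 11-item UMSARS

# Cost arithmetic using the totals and per-patient cost quoted in the text.
savings_usd = (304 - 256) * 40_000

print(total_current, total_abridged, savings_usd)
```

The savings line evaluates to $1,920,000, which is the "~$2 million" quoted above.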

Table 2 Iterations of the UMSARS according to a subsequent addition of items ranked by their standardized mean differences
Fig. 1

Relationship between UMSARS, disease progression, and duration of disease. Panels a and b depict 143 patients with probable or possible MSA enrolled in the Natural History Study of the Synucleinopathies who completed at least a 1-year evaluation (61 completed a 2-year evaluation). Both panels illustrate how patients with lower UMSARS tend to have higher progression rates in the UMSARS, significantly for UMSARS-1 (R² = 0.041; P = 0.0153) and close to significance for UMSARS-2 (R² = 0.039; P = 0.0917). While this has been interpreted as faster progression in patients with earlier disease stages, an alternative plausible explanation is that the UMSARS may have poor ability to capture disease progression in advanced patients. To illustrate this, panels c and d depict correlations between the UMSARS at each visit (baseline, 1 year, and 2 years) and the patient’s duration of disease at each visit, defined as the time from onset of motor symptoms. There is an apparent ceiling effect at 43 in UMSARS-1 and at 49 in UMSARS-2 (denoted with a dashed line), meaning that no patient ever reached higher scores, despite the fact that these are not the highest possible UMSARS scores. This suggests that UMSARS is not a suitable tool to capture disease progression at advanced disease stages

Fig. 2

Percent mean annual change in the total score (a) and effect size of the yearly change in the total score (b) according to the number of items included in the scale. The dotted line shows the corresponding values for an 11-item abridged version of the UMSARS, showing that further addition of items does not increase its sensitivity to change. Note that this is only a quick example showcasing the feasibility of improving the UMSARS and developing a new clinical outcome assessment (COA) tool for MSA that can be used in future clinical trials of disease modification. We are not proposing using this 11-item UMSARS instead of the conventional UMSARS. A proper development and validation process for the new COA will be necessary
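The sequential addition of items summarized in Table 2 and Fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions: items are ranked by their individual standardized effect (mean change/SD of change) and added one at a time, and the standardized effect of the running total is recomputed at each step. The item labels and change scores are hypothetical, not NHSS data, and the procedure is not claimed to be the exact one used for Table 2.

```python
from statistics import mean, stdev


def effect(changes):
    """Standardized effect: mean change divided by the SD of the changes."""
    return mean(changes) / stdev(changes)


def sequential_scales(item_changes):
    """Rank items by standardized effect, then add them one at a time.

    item_changes maps an item label to a list of per-patient 1-year changes.
    Returns a list of (items included, standardized effect of the total).
    """
    ranked = sorted(item_changes,
                    key=lambda item: effect(item_changes[item]),
                    reverse=True)
    n_patients = len(next(iter(item_changes.values())))
    totals = [0] * n_patients
    results = []
    for k, item in enumerate(ranked, start=1):
        # Add this item's change to each patient's running total.
        totals = [t + c for t, c in zip(totals, item_changes[item])]
        results.append((ranked[:k], effect(totals)))
    return results


# Hypothetical 1-year changes for six patients on three items.
item_changes = {
    "dressing": [1, 1, 0, 1, 1, 1],
    "gait": [1, 0, 1, 1, 1, 0],
    "tremor_at_rest": [0, 1, -1, 0, 1, 0],
}

results = sequential_scales(item_changes)
for items, value in results:
    print(len(items), items, round(value, 2))
```

In this toy data set the total improves when the second item is added and worsens when the noisy third item is included, which is the pattern the dotted line in Fig. 2 is meant to convey.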

Ceiling effect

In past MSA clinical trials, only patients at an early disease stage were enrolled. This was because UMSARS increased faster early in the disease [6, 15, 16], and therefore it was during this period that a significant difference between active agent and placebo appeared easier to detect. We confirmed this faster rate of worsening of UMSARS in the NHSS (Fig. 1a, b), suggesting that the disease progresses faster early on. A different, yet underexplored, explanation is that the UMSARS might not be an ideal tool to detect change in patients with advanced disease. Thus, the faster progression of UMSARS early on and its lack of progression later in the disease may be due to a ceiling effect of the scale rather than a feature of the disease. To explore this, we performed a nonlinear regression between the UMSARS-1 and UMSARS-2 scores at each visit (baseline, 1 year, and 2 years) and the disease duration from the onset of the motor symptoms (Fig. 1c, d). The annual increases in UMSARS-1 and UMSARS-2 were +5.4 (SD: 5.1) and +5.9 (SD: 5.3), respectively, although scores stopped increasing after 5 years (slope of −0.36 points/year). Note in Fig. 1c that, while the maximum UMSARS-1 score is 48, no patient surpassed 44 points. Similarly, while the maximum UMSARS-2 score is 56, no patient surpassed 49 points.

To illustrate this, consider two patients with MSA, with the score of each UMSARS-1 item shown in parentheses. The first has unintelligible speech most but not all the time (item 1, score of 4), uses a nasogastric tube for liquids only but still enjoys solid food by mouth (item 2, score of 4), is unable to write (item 3, score of 4), needs help for feeding (item 4, score of 4), getting dressed (item 5, score of 4), and hygiene (item 6, score of 4), and is wheelchair-bound (item 7, score of 4; item 8, falling, score of 4) but does physical therapy in the sitting position every day, enjoys watching TV with his wife, and plays with his grandchildren. He has no orthostatic symptoms (item 9, score of 0), uses intermittent self-catheterization (item 10, score of 3), sexual activity is impossible (item 11, score of 4), and requires laxatives for constipation (item 12, score of 3). Thus, his UMSARS-1 is 37. The second patient is unable to speak (item 1, score of 4), uses a nasogastric tube for solids and liquids (item 2, score of 4), is unable to write (item 3, score of 4), needs help for feeding (item 4, score of 4), getting dressed (item 5, score of 4), and hygiene (item 6, score of 4), and is bed-bound and spends most of the day sleeping (item 7, score of 4). He has no orthostatic symptoms (item 9, score of 0), has a suprapubic catheter (item 10, score of 4), sexual activity is impossible (item 11, score of 4), and has occasional constipation but no medication is needed (item 12, score of 2). Thus, the second patient’s UMSARS-1 is also 37, like the first patient, but it is clear that the second patient is at a considerably more advanced disease stage than the first. When designing a new COA, considering the clinical and functional features of patients in advanced stages of MSA will be key to avoiding a ceiling effect.

It is also plausible that the UMSARS has a floor effect in early-stage patients, but the paucity of observations in early-stage patients in published natural history studies makes this difficult to prove.

Limitations of specific items

The limitations of UMSARS to detect change and its ceiling effect may be due to shortcomings in specific items. The following are some examples.

Unclear anchoring descriptions. Unclear descriptions of UMSARS items and answers may be limiting the accuracy of the scale to reflect the actual disease stage. For instance, the hygiene item (UMSARS-1 item 6) describes “difficulty with showering.” While many patients require help to get into the shower, most need no help with showering, which makes the scoring of this item challenging. In the same item, one of the anchors is “Patient requires assistance for washing, brushing teeth, combing hair, using the toilet” (3 points). However, it is unclear whether patients must require assistance with all four, or just one. “Assistance” is too wide a term, as it may indicate the need for human assistance or use of assistive devices. Patients may not require help, although they may welcome it as it makes their activities of daily living easier. Although this item had a high standardized effect (Table 1) in our NHSS data, its refinement might result in even higher standardized effect and thus even better ability to detect change.

Conversely, many patients may need the intervention described in the anchor, yet may not use it, as in the urinary dysfunction item (UMSARS-1 item 10), in which the anchors include “urgency and/or frequency and/or incomplete bladder emptying needing intermittent catheterization” (3 points). Indeed, many patients with severe urinary retention and a high post-void residual volume do not use catheterization for a number of reasons, e.g., aversion to catheters or lack of information about it being needed.

Finally, UMSARS-1 items 2 (swallowing) and 8 (falling) include anchors like choking/falling “less than once a week” and choking/falling “more than once a week.” Neither anchor, however, applies to a patient who chokes or falls exactly once a week.

Lack of correlation with disease severity. The anchors of the gait and walking items (UMSARS-1 item 7 and UMSARS-2 item 14) are based on the patient’s requirement for assistance and/or walking aid occasionally (2 points) or frequently (3 points). These, however, do not define the frequency, or whether the patient requires a cane (in which case the disease stage would be mild-moderate) or a walker/stroller (in which case the disease stage would be moderate-severe). Thus, a patient using a cane all the time would get a score of 3, but a more advanced patient using a walker/stroller occasionally who holds on to the walls the rest of the time would get a score of 2. Moreover, “assistance” is too wide a term, as it may indicate either requiring a holding hand or full support.

The swallowing item (UMSARS-1 item 2) does not take dietary changes into consideration. Patients with normal nutrition who have frequent aspiration (score of 3) may quickly improve (score of 0) when started on a blended liquid diet, even though the dysphagia is at the same stage.

The item on sexual function (UMSARS-1 item 11) is problematic. Many patients do not have sexual activity due to reduced sexual desire, or conjugal, medical, or psychological reasons other than MSA-related difficulties. Moreover, there is no satisfactory scoring system for female sexual dysfunction, and erectile dysfunction is common in elderly males and may not be due to MSA. Indeed, the sexual item was omitted from the UMSARS assessment to minimize variability and noise in past MSA placebo-controlled trials [7].

Susceptibility to improvement with symptomatic treatment. The changes induced by symptomatic treatments can reduce the accuracy of UMSARS to track neurodegeneration-related disability. The inability to separate disease-modifying vs. symptomatic effects is common in COAs for neurodegenerative disorders. In the case of MSA, this is particularly evident with items assessing autonomic symptoms. For instance, bowel function (UMSARS-1 item 12) can be improved with widely available nutritional or pharmacologic treatments for constipation. Likewise, orthostatic symptoms (UMSARS-1 item 9) can be improved with non-pharmacologic and pharmacologic therapies [10]. Although these treatments could result in improvement in the UMSARS, they do not indicate slowing of the disease and thus add unwanted “background noise” when tracking disease progression. Thus, designing a new COA for disease modification may require careful consideration of these items. The fact that the UMSARS does not include an item to assess mood is also problematic, given that patients with MSA report more burdensome symptoms when depressed [8].

Redundancy. The items on gait (UMSARS-1 item 7, and UMSARS-2 item 14) are the same, the former evaluated by history, and the latter by physical examination. While item duplication enhances internal consistency, it results in spurious inflation and increases the scale administration time. Conversely, the development of a new COA could consider assigning a heavier weight to clinically relevant items (e.g., gait), in contrast to other, less disabling items (e.g., oculomotor dysfunction).

Cultural bias. The cutting food/handling utensils item (UMSARS-1 item 4) assumes that food is regularly cut for eating and that utensils are used, although some cultures serve food in bite-sized portions, and some do not use utensils. In East Asian regions, chopsticks (requiring more dexterity) and spoons are frequently used instead of forks.

Difficult assessment of items. For item 3 of UMSARS-2, the examiner must assess whether the patient has specific oculomotor abnormalities. This item reached only a moderate intra-/inter-rater agreement during the initial UMSARS validation [4]. While extraocular movements are important in the differential diagnosis, they do not appear to be relevant to track progression of the disease.

Lack of assessment of some MSA-related features. The current UMSARS does not assess relevant MSA comorbidities impacting quality of life, such as sleep disorders, drooling, vocal cord dysfunction and stridor [1], depression [8], contractures and pain [9], or urinary tract infections [12]. Capturing these items may result in higher construct validity and sensitivity.

Lack of assessment of instrumental activities of daily living. UMSARS-4 evaluates the need for assistance with nondescript chores. Evaluating specific activities of daily living such as shopping, cooking, using the phone, driving, or using public transportation may provide a more precise representation of the patients’ progressive functional disability.

Orthostatic hypotension assessment. UMSARS-3 includes blood pressure and heart rate measurements in the supine position and after 2 min of standing. The consensus, widely accepted definition of orthostatic hypotension (OH), however, requires blood pressure readings within 3 min of standing [2]. As orthostatic blood pressure falls rapidly, ensuring consistency in the definition of OH would facilitate comparison and cross-validation of blood pressure data among studies.
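The practical consequence of the 2-min versus 3-min cut-off can be illustrated with a short sketch. It assumes the thresholds of the consensus definition cited as [2], a fall of at least 20 mmHg in systolic or at least 10 mmHg in diastolic blood pressure; these thresholds are not stated in the text above, and the sketch ignores the requirement that the fall be sustained. The function name and the example readings are hypothetical.

```python
def has_orthostatic_hypotension(supine, standing_readings, max_minutes=3,
                                systolic_drop=20, diastolic_drop=10):
    """Check standing readings taken up to max_minutes against supine values.

    supine is a (systolic, diastolic) tuple in mmHg; standing_readings maps
    minutes of standing to a (systolic, diastolic) tuple.
    Assumption: thresholds follow the consensus definition (20/10 mmHg).
    """
    supine_sys, supine_dia = supine
    for minute, (sys_bp, dia_bp) in standing_readings.items():
        if minute > max_minutes:
            continue  # reading taken after the allowed window
        if (supine_sys - sys_bp >= systolic_drop
                or supine_dia - dia_bp >= diastolic_drop):
            return True
    return False


# Hypothetical patient whose blood pressure keeps falling while standing.
supine = (140, 85)
standing = {1: (128, 80), 2: (124, 78), 3: (115, 74)}

oh_within_3_min = has_orthostatic_hypotension(supine, standing, max_minutes=3)
oh_within_2_min = has_orthostatic_hypotension(supine, standing, max_minutes=2)
print(oh_within_3_min, oh_within_2_min)
```

In this example the criteria are met only at the 3-min reading, so a protocol that stops at 2 min of standing would not classify the patient as having OH.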

Toward a new MSA clinical outcome assessment

Because of the shortcomings of the UMSARS, developing and validating an improved COA is an urgent priority. As shown here, developing an improved COA is a laborious but achievable goal. The time is right for MSA clinicians together with industry, professional societies, and patient advocacy groups to develop and validate a new COA. Based on UMSARS data from available MSA natural history studies and our combined clinical experience, we recommend the following: (1) Provide specific instructions with a video tutorial on how to acquire each item for both in-person and remote (telemedicine) visits. (2) Eliminate redundant items and refine anchors for more accurate scoring. (3) Carefully consider items that are susceptible to change with symptomatic treatment, such as orthostatic hypotension or constipation. Although these may not reflect progression of the underlying neurodegenerative process of MSA, these items may be required when assessing a hypothetical disease-modifying treatment that could selectively improve autonomic function. The use of symptomatic medication should be taken into consideration. If blood pressure readings are included in the COA, consider taking them in the supine position and after 3 min of standing (instead of 2 min). (4) Consider adding items on previously unassessed MSA features that impact a patient’s quality of life and disability (e.g., vocal cord dysfunction, depressed mood, and drooling). (5) Develop and validate the COA for both in-person and remote visits, not only because of the risks posed by the recent COVID-19 pandemic but also because it would facilitate enrolment and retention in clinical trials. (6) Obtain input from the regulatory agencies (i.e., FDA, EMA) to satisfy their requirements.

The process to develop and validate a new COA should be data-driven. It can address the conceptual limitations described above and use the large amount of data available from natural history studies and clinical trials [5, 6, 7, 11, 13, 14, 16]. Collectively, these available data sets include baseline and follow-up information on demographics, UMSARS, and other scales from more than 500 patients with MSA. The development of the new COA using historical data could be iterative, adding, subtracting, or refining items until the best sensitivity to change has been identified. This newly developed COA should then receive further input from a significant number of clinicians and industry experts to refine the items. With the collaboration of patient advocacy groups, the new COA must be examined in patient focus groups with cognitive interviewing to determine whether the new items are understandable, clear, and meaningful, thereby ensuring content validity. The new COA must be validated in a multicenter cross-sectional and prospective fashion to determine its acceptability, scaling assumptions, construct validity, measurement equivalence, internal consistency, inter-rater reliability, ability to detect change, and minimal clinically important difference. Finally, the new COA should be translated into other languages.

Conclusions

Several pharmaceutical companies are currently working toward MSA-targeted therapies. Late preclinical or early clinical studies of at least ten candidate compounds to slow or halt the progression of MSA are underway. It is expected that, if successful, these will proceed to late clinical development stages within 3–5 years, the period of time required to develop and validate a new COA. Collaboration between established MSA research networks, professional societies, the pharmaceutical industry, and patient advocacy groups can address this challenge.