Rapid motor fluctuations reveal short-timescale neurophysiological biomarkers of Parkinson’s disease

Objective. Identifying neural activity biomarkers of brain disease is essential to provide objective estimates of disease burden, obtain reliable feedback regarding therapeutic efficacy, and potentially to serve as a source of control for closed-loop neuromodulation. In Parkinson’s disease (PD), microelectrode recordings (MER) are routinely performed in the basal ganglia to guide electrode implantation for deep brain stimulation (DBS). While pathologically-excessive oscillatory activity has been observed and linked to PD motor dysfunction broadly, the extent to which these signals provide quantitative information about disease expression and fluctuations, particularly at short timescales, is unknown. Furthermore, the degree to which informative signal features are similar or different across patients has not been rigorously investigated. We sought to determine the extent to which motor error in PD across patients can be decoded on a rapid timescale using spectral features of neural activity. Approach. Here, we recorded neural activity from the subthalamic nucleus (STN) of subjects with PD undergoing awake DBS surgery while they performed an objective, continuous behavioral assessment that synthesized heterogenous PD motor manifestations to generate a scalar measure of motor dysfunction at short timescales. We then leveraged natural motor performance variations as a ‘ground truth’ to identify corresponding neurophysiological biomarkers. Main results. Support vector machines using multi-spectral decoding of neural signals from the STN succeeded in tracking the degree of motor impairment at short timescales (as short as one second). Spectral power across a wide range of frequencies, beyond the classic ‘β’ oscillations, contributed to this decoding, and multi-spectral models consistently outperformed those generated using more isolated frequency bands. While generalized decoding models derived across subjects were able to estimate motor impairment, patient-specific models typically performed better. Significance. These results demonstrate that quantitative information about short-timescale PD motor dysfunction is available in STN neural activity, distributed across various patient-specific spectral components, such that an individualized approach will be critical to fully harness this information for optimal disease tracking and closed-loop neuromodulation.


Introduction
Parkinson's disease (PD), one of the most prevalent neurodegenerative conditions [1], is typified by motor and cognitive dysfunction that occur in the setting of pathologicallyincreased oscillatory neural activity in the basal ganglia [2][3][4][5]. Lower frequency oscillations, particularly those in the β (~12-30 Hz) range, have emerged as potential biomarkers for PD motor dysfunction based primarily upon relatively longer timescale observations of abundant β oscillations in the unmedicated PD state and decreased β power in response to therapy (dopaminergic medications or DBS) [6][7][8][9][10][11]. Critically, the relevance of those observations for the moment-to-moment, ongoing manifestation of symptoms on shorter timescales is less well understood. Much as cold temperature is permissive but not causal of snow, it may be the case that β oscillations reflect conditions favorable for the expression of PD symptoms, while separate or additional features of neural activity modulate the immediate dynamics of motor dysfunction.
PD is characterized by a range of signs and symptoms that may include bradykinesia, rigidity, resting tremor, and postural/gait instability, which are heterogeneously manifested across individuals [12][13][14][15]. Disease severity is typically measured using the Unified Parkinson's Disease Rating Scale (UPDRS). Unfortunately, although well-validated [16][17][18], this scale ultimately relies upon subjective patient and clinician assessments and is not intended to capture fluctuations in motor behavior on the timescale of seconds. More objective, quantitative examination of PD patients' movements revealed that, within individuals, a large dynamic range of movement speed is preserved but shifted towards slower movements [19,20]. Remarkably, faster movements that approach normal subject velocities (and that maintain normal movement accuracies) can be elicited [21], highlighting the impressive residual motor capacity that persists in this condition. Meanwhile, PD patients are notably impaired in their ability to correct for visible target deviations during ongoing movement [22]. Based upon these observations, we hypothesized that quantifying natural motor variability in the context of a simple, continuous-performance, visual-motor task may allow the identification of neurophysiological biomarkers associated with fluctuating motor states.
Our overall strategy was as follows: (1) Engage PD subjects in a continuous motor performance task to elicit natural motor variability and quantify this variability with an array of metrics at short timescales; (2) Apply a machine learning algorithm to determine weights for each of these metrics that maximally differentiate each patient's motor performance from that of control subjects performing the same task to generate an objective, patient-specific, scalar measure of motor impairment in each short time interval; (3) Examine oscillatory activity in the subthalamic nucleus (STN) while subjects performed this task to determine the manner and extent to which neural signals capture each patient's fluctuating motor impairment.

Subjects
All patients undergoing routine, awake placement of deep brain stimulating electrodes for intractable, idiopathic PD between June 2014 and September 2018 were invited to participate in this study. PD patients were selected and offered the surgery by a multidisciplinary team based solely upon clinical criteria, and the choice of the target (STN vs. Globus pallidus internus) was made according to each patient's particular circumstance (disease manifestations, cognitive status and goals). In this report, we focused on patients undergoing STN DBS (n = 22). Patients were off all anti-Parkinsonian medications for at least 12 h in advance of the surgical procedure (mean and standard deviation of UPDRS III scores was 50.95 ± 13.6, n = 21). Approximately age-matched controls (often patients' partners) also participated in this study (n =15); patients were aged 47.5-78.5 years (mean 64.3), and controls were aged 48.3-79.2 years (mean 62.6) at the time of testing (t-test comparing the age distributions: p = 0.56). Controls were required simply to be free of any diagnosed or suspected movement disorder and to have no physical limitation preventing them from seeing the display or manipulating the joystick. There was a strong male-bias in the patient population (20 M, 2 F) and a female preponderance in the control population (3 M, 12 F), reflecting weaker overall biases in the prevalence of PD and the clinical utilization of DBS therapy [23].
Patients and other subjects agreeing to participate in this study signed informed consent, and experimental procedures were undertaken in accordance with an approved Rhode Island Hospital human research protocol (Lifespan IRB protocol #263157) and the Declaration of Helsinki. Data from all patients who enrolled and took part in this study are presented here.

Behavioral task
We employed a target-tracking task to estimate the degree of motor dysfunction in a continuous fashion. Specifically, while PD subjects reclined on the operating bed in a 'lawnchair' position, a joystick was positioned within their dominant hand, and a boom-mounted display was positioned within their direct line-of-sight at a distance of ~60-90 cm. The task was implemented in MonkeyLogic [24,25] and required subjects to follow a green target circle that moved smoothly around the screen by manipulating the joystick with the goal of keeping the white cursor within the circle ( figure 1(a)). The target circle followed one of several possible paths (invisible to the subject), with each trial lasting 10-30 s. Each session consisted of up to 36 trials (~13 min of tracking data), and subjects performed 1-4 sessions during the operation, as time and patient comfort allowed. Control subjects performed this task in an extra-operative setting.

Motor metrics & motor error score (MES)
Several measures of motor performance were applied to this task in a small time windows. These measures consisted of tremor (the magnitude of the 3-10 Hz band-pass filtered x-and y-joystick traces) and other metrics calculated after low-pass filtering the x-and y-joystick traces below 3 Hz. Eight metrics were defined from the movement of the cursor and the target. The cursor position at time t was defined as C(t) = (x c (t), y c (t)), and the target position was defined similarly as T(t) = (x T (t), y T (t)).
Tremor magnitude (TM) was calculated by taking ∥ C a (t) ∥, where ∥ · ∥ was defined as the Euclidean norm, taken of the analytic signal C a (t), which was derived from applying the Hilbert transform to the tremor-filtered components of C(t). Absolute distance (AD) was defined as ∥ C(t) − T(t) ∥. Vector Error (VE) was defined as the vector difference T t − C t between the cursor vector segment C t = C t C t + Δt and the target vector segment T t = T t T t + Δt , for Δt = 1 ms. Instantaneous cursor speed was defined as S t c = ∥ C t ∥ /Δt, and similarly, target speed was S t T = ∥ T t ∥ /Δt. Slowness (SL) was defined as an exponential decay of the cursor speed, exp(−0.042 · S t c ). Speed difference (SD) was defined as the difference in cursor speed and target speed at time t, S t c − S t T . Excursion difference (ED) was defined as the difference in the excursions of the cursor and target from the origin at the center of the display, ∥ T(t) ∥ − ∥ C(t) ∥. Vector angle (VA) was defined as the angle between C t and T t : Correction angle (CA) was defined as the angle between the T t and the vector segment representing the optimal path O t from C(t) to the target, C t T t : Metrics were initially calculated using 100 ms contiguous, non-overlapping epochs for the behavioral timescale analysis. Metrics were subsequently recalculated over longer epochs (1-10 s) to align with neural data at the relevant timescale for each analysis. Because the motor manifestations of PD are heterogenous, for each PD subject we sought to determine the combination of metric weights that maximally captured their individual motor signs. We applied a linear support vector machine (SVM) algorithm to define the hyperplane in the eight-dimensional metric-space that optimally separated a PD subject's performance from that of controls. In other words, for each PD subject, the distribution of eight-dimensional data points for each epoch of performance was compared to the aggregate distribution of control subject data points (sub-sampled as needed to maintain equal numbers across groups for SVM binary classification). In all these cases, 100-fold Monte Carlo cross validation was performed using a 2:1 split of the data into training and testing portions. The parameters defining this hyperplane corresponded to the weights applied to the different metrics. The normal distance from each PD subject's data points (each representing performance during one epoch) to the hyperplane was defined as the 'Motor Error Score' (MES).

Surgical procedure
Microelectrode recordings (MER) from the region of the STN of awake patients are routinely obtained in order to map the target area and guide DBS electrode implantation. A single dose of short-acting sedative medication (typically propofol or dexmedetomidine) was administered before the start of each procedure, at least 60-90 min prior to microelectrode recordings. The initial trajectory planned for three electrodes (anterior, center, and posterior) was determined on high-resolution (typically 3 T) magnetic resonance images (MRI) coregistered with CT images demonstrating previously-implanted skull-anchor fiducial markers. A 3D printed stereotactic platform (Star-Fix micro-targeting system, FHC Inc. Bowdoin, ME) was then created such that it could be affixed to these anchors, providing a precise trajectory to each target [17]. Microdrives were attached to the platform and then loaded with microelectrodes. Recordings were typically conducted along the anterior, center, and posterior trajectories, corresponding to the axis of highest anatomical uncertainty based upon the limited visualization of the STN on MRI. Additional lateral and medial electrodes were utilized when necessary for the clinical care. Electrodes were arranged in a Ben-gun configuration and were separated by 2 mm. Recordings were referenced to all six or more cannulae inserted bilaterally along each target-specific trajectory, with each terminating 15 mm above the MRI-estimated target.
MER began about 10-12 mm above the MRI-estimated target, which was chosen to lie near the inferior margin of the STN, about two-third of the distance laterally from its medial border. The STN was identified electrophysiologically as a hyperactive region typically first encountered about 3-6 mm above estimated target. At variable intervals, when at least one electrode was judged to be within the STN, electrode movement was paused in order to assess neural activity and determine somatotopic correspondence, as per routine clinical practice. At these times, if patients were willing and able, additional recordings were obtained in conjunction with patient performance of the visual-motor task.

Neurophysiological signals & analysis
Neural signals from the first four patients were acquired with FHC tungsten electrodes and an FHC data acquisition system (FHC Inc. Bowdoin, ME, USA), with signals split into a Plexon MAP system (Plexon Inc. Dallas, TX, USA); data for the subsequent 18 patients were recorded using 'Neuro-Probe' tungsten electrodes and Neuro Omega data acquisition systems (Alpha Omega, Inc. Nazareth, Israel). Electrode impedances were typically 400-700 kΩ.
Patients performed up to four sessions of the task, with electrodes positioned at different depths for each session. Electrodes were not independently positionable, so some signals were necessarily acquired outside of the STN. All recorded signals were nevertheless considered and analyzed. Behavioral data from each session were processed through an SVM model, as above, to generate the MES distributions, which were then compared against neural activity acquired simultaneously on individual electrodes.
Data were analyzed in MATLAB (Mathworks, Natick, MA) and Python 3 (python.org). Local field potentials (LFPs) from the microelectrodes were initially acquired at 40-44 kHz. Offline, signals were notch filtered at 60 Hz and its first four harmonics. Time series were zscored, and artifacts above four standard deviations were removed. These signals were bandpass filtered from 3-400 Hz and then down-sampled to 1 kHz. Spectral estimates were obtained through fast Fourier transform (FFT) convolution of LFPs with finite impulse response filters. Filters were calculated at 1 Hz intervals with a bandwidth of 2 Hz, covering 3-400 Hz (the use of non-overlapping frequency bins resulted in qualitatively similar results). The power was calculated by multiplying the conjugate of the Hilbert Transform by the filtered signal in each band. For visualization (figures 2(c) and 3(c)), spectral power was log-normalized and min-max scaled from 0 to 1 for each electrode. We defined six . The width of each band scaled approximately with the frequency because a particular absolute interval in a higher frequency range was likely to be less informative than the same interval in a lower frequency range. Activity in the vhf range, which is thought to reflect multiunit spiking activity, was included specifically in consideration of evidence that PD may be marked by pathological spike-field coupling with β oscillations [27][28][29][30].
Our goal was to determine which spectral features best corresponded to the MES on short timescales (corresponding to 1-10 s epochs of task performance and neural data acquisition). To this end, the Spearman correlation of MES with LFP power in particular frequency bands was calculated. Further, to understand if a combination of spectral features might serve as a better marker for MES, continuous decoding of MES was undertaken with a support vector regression (SVR) processing layer using the normalized power-spectral features of the LFP signals.
For any particular set of spectral features, SVR estimation accuracy was assessed with 100fold Monte Carlo cross-validation using a 2:1 training/testing split of each data set. The max-normalized SVR weights were examined to understand the contribution of particular frequency bands to MES classification accuracy [31]. Both non-linear (radial basis kernel, ε = 0.01) and linear SVR were employed, the former to assess the performance capability of this technique and the latter to allow examination of feature-weight coefficients. Both methods yielded qualitatively similar overall decoding performance across timescales (1-10 s).
To make certain machine learning models were not overtrained and/or reliant on spurious associations between behavior and neural activity, a bootstrap procedure was performed in which the correspondence between MES and neural activity in each epoch was randomized and additional SVR models were then generated using the same parameters and crossvalidation techniques. Decoding performance of these models was generally quite poor and therefore suggested the performance obtained in the original models was not spurious or due to overfitting. The results of these control analyses are shown in the relevant figures.
SVR coefficients ('weights') were examined to understand what neural features contributed to MES decoding. Permutation tests were used to determine the significance of these SVR weights and patterns. To assess whether there were clusters of frequency bands with higher or lower weights than expected by chance, a contiguity-sensitive permutation test was performed where the mean SVR weight values were shuffled with respect to frequency over 10 000 iterations and the null distribution of weights over 3-7 contiguous bands was computed. The probability of the observed weight distributions over multiple, adjacent frequency bands was then determined using this null distribution as reference. To determine if cross-frequency patterns of SVR weight assignments existed, a cross correlation between feature weights for every pair of frequencies was calculated, across SVR models; autocorrelations (points along the diagonal) were excluded from this calculation (diagonal points were filled-in with the means of the adjacent non-diagonal points for the purposes of visualization). Areas of significance within this matrix were computed using a bootstrap procedure in which the assignment of frequency feature weight to LFP signal was shuffled 1000 times to create null distributions for each point on the matrix; the probability of an observed value was then calculated with respect to this null distribution.
Unless otherwise specified, statistical tests comparing observed classification or estimation accuracies with those from shuffled MES-neural data sets were performed using a Mann-Whitney U test. The Wilcoxon paired test was employed when distinct models were run on identical cross-validation data sets. Where appropriate, p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure (false discovery rate, q = 0.05) [32].

Classification of Symptom Profiles
For some analyses, we assigned patients to one of two subtype categories: tremor dominant (TD) or non-tremor dominant (nTD) (21 total patients with UPDRS scores, 9 TD, 12 nTD), based on UPDRS sub-score ratios of Jancovic and colleagues [33]. All patients that failed to meet threshold for tremor dominance (ratio of tremor to PIGD sub-scores less than 1.5) were classified as nTD. To assess correlations between tracking task metrics and UPDRS-III components, we considered items (Rest/Postural tremor, Finger Taps, Hand Opening/ Closing, Rapid Alternating Movements, Rigidity) pertaining to the upper extremity relevant to the subject's task performance.

Anatomical reconstruction of recording sites
Patients typically underwent pre-, intra-and post-operative imaging. To reconstruct recording locations, MR and CT images were co-registered using the FHC Waypoint Planner software. The raw DICOM images and the linear transform matrices were exported and applied to reconstructed image volumes using the AFNI command '3dAllineate,' bringing them into a common coordinate space [34]. Depths were calculated by combining intraoperative recording depth information with electrode reconstructions obtained from intra-or postoperative images using methods described previously [35,36].
To determine the anatomical distribution of recording sites across patients, pre-operative T1weighted MR images were registered to a T1-weighted MNI reference volume (MNI152_T1_2009 c) using AFNI's '3dQwarp' command [26]. The resulting patientspecific transformation was then applied to recording site coordinates. MNI-warped recording coordinates were then assessed for proximity to the STN as delineated on the MNI PD25 atlas [38][39][40].

Data availability
De-identified data and relevant analysis code may be shared upon request for collaborative work with other research groups.

Quantification of PD motor behavior on short timescales
PD patients undergoing awake implantation of STN DBS electrodes (n = 22) and control subjects in the clinic setting (n = 15) performed a tracking task in which they followed an on-screen target using a joystick-controlled cursor ( figure 1(a)). Because the heterogeneity of motor impairment in PD was unlikely to be associated with some narrow aspect of performance on this task, we defined a library of eight-motor metrics that tiled the space of potential movement error. We sought to understand whether distinct symptoms of PD were manifested uniquely within this task. To understand how the metrics were related to symptom profile, across all subjects, we correlated the median of each metric distribution ( We next combined the individual metrics into a single score reflecting the degree of motor impairment within each time window. Specifically, a linear support vector machine (SVM) was applied to the metric values in each epoch to determine, for each patient, the optimal weighting of those metrics to achieve maximal separation of PD performance from control performance. Patients who were 'tremor dominant' (TD), based upon the ratio of their tremor related vs. postural/gait-related UPDRS III sub-scores (see Methods), tended to have greater weights assigned to the Tremor Magnitude metric, whereas 'non-tremor dominant' (nTD) patients tended to have greater weights assigned to Absolute Distance and Correction Angle (supplementary figure 4(b)). To obtain a scalar measure of movement quality for each epoch of motor performance, we defined the 'motor error score' (MES) as the normal distance from a motor performance epoch data point to the SVM hyperplane in metric space. Higher MES reflected increased motor impairment ( figure 1(b)). The MES served as a single composite measure of motor error weighted according to those particular metrics that best distinguished an individual patient's behavior from that of control subjects.
The PD vs. control MES distributions were distinct (Control: mean ± Std = − 1.99 ± 2.28; PD: 5.88 ± 10.66, p < 0.0001 by Mann-Whitney Utest). To determine the relevant timescale of motor fluctuations, the MES autocorrelation was computed for PD and control subjects ( figure 1(c)). The central peak was broader in the PD group, in which the timescale incorporating the significant portion (> 3 standard deviations above baseline) was ~7.2 s, while for controls it was only ~1 s. Importantly, the MES distributions for PD subjects and controls were highly discriminable ( figure 1(d) shows distributions at the 7 s timescale; other timescales presented in supplementary figure 1(B)); the receiver operating characteristic (ROC) area-under-the-curves (AUC) comparing each PD subject to aggregated control subject performance ranged from 0.95 to 0.99 (mean ROC AUC ± S.E.M. = 0.99 ± 0.01; all p-values < 0.001). These observations that the PD vs. control MES distributions differed in timescale, range, and value support the proposition that the tracking task provided an appropriate platform for detecting PD motor impairment.

Correlations of STN neural activity with short timescale motor fluctuations
STN local field potentials (LFPs) were recorded from microelectrodes traversing the region of the STN (figure 2(a)) from 22 PD patients while they performed the target tracking task. Patients performed 1-4 sessions of the task, with recordings obtained at different depths in each session, for a total of 53 intra-operative sessions. At any particular depth, data were acquired from 2-4 microelectrodes, resulting in a total of 161 recordings from the region of the STN across electrodes, sessions, and patients ( figure 2(b)). Neural signals from the STN contralateral to the hand used to perform the task were aligned to motor performance (the MES) for analysis ( figure 2(c)).
Oscillations in the β frequency band are considered a likely biomarker for PD symptom expression, and indeed there was often a correlation between β power and motor performance ( figure 3(a)). However, short timescale correlations between spectral power and MES were not restricted to the β-band. Significant positive Spearman correlations were observed on 49 electrodes for θ/α (4-12 Hz), compared to 35 for β (12-30 Hz), generally of similar magnitude, and significant correlations were often observed between MES and higher frequency bands as well ( figure 3(b)). Significant negative correlations were also observed, most often in the vhf band (200-400 Hz) on 32 electrodes, suggesting greater power in these higher frequencies may in some cases reflect better motor performance. In summary, these correlations revealed that information about motor performance was potentially available throughout the tested spectrum (4-400 Hz).
When the spectrogram was sorted not according to time but rather according to motor performance, some LFP signals yielded clear patterns where the power in particular frequency bands varied as a function of MES ( figure 3(c)). There was striking heterogeneity in the particular frequencies that comprised these patterns, but the fact that visually-evident patterns could emerge when spectrograms were sorted in this manner further supported the utility of natural motor fluctuations as a foundation to identify neurophysiological biomarkers of PD behavior.

Decoding the quality of motor performance using STN LFPs
We next sought to investigate the extent to which STN neural activity could be used to estimate fluctuating motor impairment. Support vector regression (SVR) was applied toward multi-spectral decoding of the MES. Decoding was performed using spectral power across six 'canonical' frequency bands (θ/α; β; low γ; mid γ; high γ; vhf; supplementary figure 2), each divided into seven sub-bands (yielding a total of 42 spectral features across 4-400 Hz). The data were separated into training and testing portions (2:1) and 100-fold cross-validation was performed. Because the main question related to how well, for any given patient, neural signals might be used to estimate motor impairment, we were interested specifically in the most informative signal from each patient. Note that because recordings were obtained opportunistically when the procedure allowed, and simultaneous electrodes were not independently positionable, many signals were necessarily acquired from suboptimal locations, including outside the STN, so we would not expect all signals to support highlevels of decoding performance.
Indeed, STN neural activity could be used to decode short-timescale motor impairment (figure 4). Significant decoding capability was evident in every PD subject, with MES decoding accuracy varying from 0.17 to 0.84 (r-values correlating SVR estimates to MES in 7 s epochs: mean = 0.49). Decoding accuracy was generally better at somewhat longer timescales ( figure 5(a)), reaching a plateau around an 8 s epoch size, but average decoding accuracy with even just 1 s of neural data was nevertheless 0.39 (range: 0.14 to 0.69), increasing to 0.53 (0.29 to 0.88) with 10 s epochs, across all PD subjects.
There was wide variation in the success of MES decoding across signals even within the same subject ( figure 5(b)). To determine if this heterogeneity might result from the anatomic source of the neural signals, we next determined the location of the most informative signals by contrasting the top vs. bottom thirds of the decoding models (using all 140 recorded signals across subjects with adequate neuroimaging). The best performing signals arose from the dorsolateral STN, corresponding roughly to the motor subdivision that is classically targeted for DBS therapeutic effect (figure 5(c)). Signals recorded within a region-of-interest (ROI) comprising a 2 mm radius around the center of this 'hot-spot' (MNI coordinate: x = ±14, y = −12, z = −6) yielded higher decoding accuracies than those recorded outside this ROI (Mann-Whitney U-test: p = 0.001; inside ROI: r = 0.385, n = 28; outside ROI: r = 0.238, n = 112).
In general, broadband, multi-spectral decoding of MES performed better than decoding by individual canonical frequency bands (figure 6(a)), consistent with the presence of significant power-MES correlations across all bands, above. To determine the contributions of particular frequency bands to MES decoding, the feature weights of a linear SVR model were examined ( figure 6(b)). High positive weights (signifying positive correlations between high LFP power and high MES) were typically assigned to the θ/α and low β bands. Meanwhile, negative weights were more typically assigned to the high β band and vhf bands, suggesting higher LFP power in these frequency ranges was often associated with better motor performance (lower MES). To assess whether a global relationship existed between the SVR spectral weights and peak spectral power in any canonical frequency band, we calculated a Spearman correlation across all recordings and observed no significant correlations (figure S3), suggesting decoding models did not simply leverage the most prominent spectral features.
We hypothesized patterns of weight assignments might exist across frequencies, such that weights assigned to particular bands might co-vary with weight assignments to other bands; the existence of such higher-order patterns would strengthen the notion that those involved bands were consistently meaningful and might provide insight into potentially shared oscillatory mechanisms. Indeed, significant positive correlations were observed across adjacent bands in the θ/α and β ranges, and within the vhf band (figure 6(c)); this reflected the coexistence of positive weights in the former and negative weights in the latter. In addition, local positive correlations were observed in the mid/high γ band as well.
Interestingly, relatively strong negative correlations were observed between the vhf and γ high bands, implying effective decoding relied concurrently on low vhf power as well as increased γ high power. Likewise, a relatively prominent negative correlation was observed between the γ low and θ ranges. A variety of other significant correlations were observed as well, suggesting these potentially interesting cross-frequency interactions may underlie the spectral 'fingerprint' of PD motor dysfunction.
We also examined whether feature patterns depended on disease subtype by examining SVR weights separately for TD and nTD subjects. We found that SVR decoding of MES using STN neural signals was generally better for TD compared to nTD patients ( figure 7(a)). Furthermore, TD patients' decoding models exhibited a more consistent spectral profile ( figure 7(b), top), with positive weights distributed throughout θ/ and low β frequency bands (~10-25 Hz) and significant negative weights in the high β/γ low range. Decoding models for nTD patients, on the other hand, did not exhibit a consistent pattern of weight assignments ( figure 7(b), bottom).
These averaged patterns of SVR weights belied noteworthy individual variability. 'Generic' models constructed from mean spectral feature weights significantly outperformed patientspecific decoders for 4 of 22 subjects with PD, while patient-specific decoders significantly outperformed generic models in 16 of 22 cases (Wilcoxon paired test). Nevertheless, generic models did demonstrate some success in estimating the MES, with significant decoding above chance for 16 of 22 patients (significant r values ranging from 0.07-0.82, Wilcoxon paired test, figure 8(a)). Generic models generated and tested independently for each PD subtype (TD or nTD) yielded similar patterns of results: When generic models were created from only the TD patients' data, significant decoding above chance was observed for 8 of 9 of these patients (r values: 0.07-0.86). For one patient, the generic model outperformed the patient-specific model, while for seven patients, the patient-specific model was better. Likewise, constructing a generic model for only the nTD patients, we observed that 10 out of 12 nTD patients had generic models that significantly outperformed chance (r values: 0.17-0.59), while generic models for three patients outperformed patient-specific models, and 8 patient-specific models outperformed generic models.
Likewise, patient-specificity was evident in a cross-decoding analysis where the training and testing of SVR models was conducted across pairs of subjects ( figure 8(b)). For any given patient, decoding models trained on that individual's signals ('autologous' conditions, on the diagonal) usually outperformed decoders trained using another's signals ('homologous' conditions, off-diagonal). It may be relevant, however, that the highest-performing quadrant of this matrix (top-right) had somewhat higher cross-decoding success than the lowestperforming quadrant (bottom-left) suggesting that, even though the autologous decoders (along the diagonal) had superior accuracy, there was nevertheless evidence for slightly improved generalizability across the best-performing multi-spectral SVR models and subjects.

Discussion
We examined endogenous fluctuations in movement quality to elucidate the neural correlates of motor dysfunction at short timescales. We used a naturalistic task that elicited goaldirected action such that deviations from the intended target could be quantified to provide an objective measure of immediate motor impairment in PD subjects. Multidimensional assessment of behavioral performance on this task discriminated between PD subjects and controls in a highly accurate manner and organized neural activity in a sufficiently meaningful way that it could be used to decode the quality of movement. Specifically, the power spectral features of the STN LFP correlated with fluctuating behavior and enabled the quality of motor performance to be decoded by SVR using as little as 1 s of neural data from a single microelectrode. Importantly, decoding relied upon a broad range of frequency bands in a patient-specific manner. The links between neural rhythms and motor dysfunction on these short timescales suggest that such oscillations do not simply set a broad state within which motor performance fluctuates but rather that the rapid dynamics of these oscillations correlate directly with impairments in immediate, ongoing motor behavior.
With a few notable exceptions, prior work examining the relationships between oscillatory activity and PD manifestations considered only longer timescales, corresponding to on vs. off medication states, or on vs. off DBS stimulation. Some of what those studies found mirrors what we observed here at shorter timescales, such as a possible distinction between high and low frequency β activity [41][42][43][44], the importance of sub-β (i.e. θ/α) activity [45], and the possible relationship between higher-frequency activity and more fluid motor output [46,47]. Phase amplitude coupling between cortical β-range and vhf activity has also been proposed as a potential biomarker of PD and has been shown to be reduced by DBS [29], and recent work has suggested that at least some symptom improvement due to dopaminergic medication is related to a shift in coupling in the basal ganglia rather than a total suppression [30]. Here we show that power across a wide gamut of frequencies was significantly associated with PD motor dysfunction. Furthermore, combining multiple spectral features significantly improved motor performance decoding. Interestingly, we observed novel interactions between activity across different portions of the spectrum, such as negative correlations between α/low-β and high-β and between mid/high γ and vhf. Such interactions may provide clues about the underlying mechanisms that generate PD-relevant activity.
The few notable studies that have examined PD neural activity in relation to shortertimescale motor behavior have focused on specific frequency bands. For example, β 'bursts' immediately preceding movement were found to be associated with impaired movement dynamics [48], and these bursts were linked to bradykinesia during repetitive movements [49], consistent with our results in the low, but not high, β range. Meanwhile, γ activity was observed to correlate positively with movement velocity at short timescales [50]. In contrast, we found that vhf activity was most consistently reflective of effective motor output.
The increased performance of multi-spectral motor performance decoding, compared to single-band decoding, may result from better estimation of multiple dimensions of motor signs, or it could reflect better estimation of individual motor features that have representations in multiple parts of the spectrum. In addition, the wide range of informative frequency bands could reflect cognitive processing related to the recognition, evaluation, or attempted correction of motor errors [51,52]. This might help to explain the presence of significant information about motor performance throughout the recorded region, outside the dorsolateral, sensorimotor 'hot spot.' The precise pattern of informative oscillations varied across individuals. Some of this may reflect anatomical differences (inexact correspondence of STN subregions recorded across patients). For example, the distributions of β and α activities may differ topographically across the STN [53]. Here, neural recordings were performed opportunistically when the operative procedure allowed; this may have been in part responsible for the variability of symptom state decoding accuracy and feature weights across patients. Furthermore, there is likely an interaction between spatial location and symptomatic subtype in determining the particular pattern of neural oscillations [46,54]. Nonetheless, we observed relatively poorer generalizability of neural decoders across individuals, suggesting that while there may be some common substrates of particular symptoms within certain regions of the STN, there is nevertheless significant heterogeneity that may be harnessed to yield more optimal decoders using patient-specific biomarkers. Furthermore, because microelectrode-derived LFP recordings detect more spatially-restricted signals, our use of micro-LFP signals rather than macro-electrode (e.g. DBS electrode) LFPs may have highlighted individual differences. Consistent with this, prominent STN signal heterogeneity has been observed using experimental DBS electrodes with smaller electrical contacts [55].
Our approach to decoding ongoing motor dysfunction using neural activity may potentially be further optimized. On the front-end, while the continuous visual-motor tracking task succeeded in differentiating the behavior of subjects with PD from controls to a high degree, it is unlikely this task captures the full 'ground truth' of motor disability, even for the single upper extremity for which it was evaluated. Improved methods to quantify motor signs, including perhaps over the entire span of bodily movement, may further enhance decoding accuracy.
Additional limitations may have also influenced these results. Control subjects performed the task in a clinical setting, and patients were necessarily in the intraoperative setting for these sessions. Short-acting medications such as propofol and dexmedetomidine are often administered at the beginning of a case; even though at least 60 min elapsed between drug administration and task performance, there is the possibility that lingering pharmacologic effects altered neural activity. Our patient population also demonstrated a stronger male bias than reported literature [23]. Lastly, the extent to which neurophysiological activity observed in any given moment is necessarily Parkinsonian or whether it might also reflect more general aspects of brain states associated with task engagement (e.g. attention) is unknown.
On the back-end, for MES decoding, we restricted the input feature space to LFP power at different frequencies. This ignored significant information that is likely present in the phases of these oscillations, particularly when combined with amplitude information [27,44,54,56]. Further optimization of the feature sets and algorithms (e.g. artificial neural network approaches) may yield more precise decoders of motor impairment, though whether decoding remains stable over longer periods of time remains to be investigated. Nonetheless, our use of simple features and models for neurophysiologically-based MES decoders allowed us to focus on the links between specific aspects of neural activity and quantified behavior in a relatively straightforward manner.
Short timescale neurophysiological biomarkers of PD motor dysfunction may enable several important, practical applications. Specifically, the ability to estimate and track motor quality continuously over time would allow objective feedback about real-world, 'in the wild' disease burden and so may facilitate adjustments of medical therapy. Likewise, closed-loop DBS, in which stimulation is delivered dynamically, as needed, requires a control signal that reflects the immediate or impending presence of symptoms. Our results suggest that proposed strategies focusing largely on broad β-band activity may be suboptimal and not fully generalizable across patients and STN sites. Note also that STN β decreases just before and during movement [57][58][59], including tremor [60], so its use as a surrogate for PD motor dysfunction during ongoing behavior might be limited [61]. Furthermore, β oscillations are not necessarily pathological, even in PD [62], and some evidence exists that β activity may not be causal of Parkinsonian symptoms [63]. Therapeutic stimulation may even enable the brain to regulate appropriate cortical β power [64], so stimulating simply to disrupt β might have unanticipated adverse influences on motor behavior. That and other work [11,51] have suggested targeting longer β episodes might more selectively address PD symptoms. We show here that patient-specific, multi-spectral decoding of motor impairment can further refine the strategy for implementing closed-loop DBS. Ultimately, a better understanding of the links between neural activity and motor behavior on the relevant timescales, and novel algorithms to fully exploit these links, will guide the design of more advanced therapies to realize the full potential of neurophysiology and neuromodulation for optimal patient benefit.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.  Motor task and behavior. (A) A continuous motor performance task was used to assess fluctuations in PD motor performance at short timescales. Subjects were instructed to manipulate a joystick to direct a cursor (white) to follow a moving target (green circle) as closely as possible. Yellow arrows depict example trajectories (not displayed to the subjects) with the target following a preset path, (also invisible to the subject, shown here as a dark lemniscate). (B) A library of eight motor metrics was created to quantify potential deviations of the cursor from the target. The calculation of three metrics is depicted here (all eight metrics are defined below). The per-epoch estimates of each metric created a distribution in 8-dimensional space; the distribution for a PD subject was compared to those of control subjects via SVM classification. The normal distance from the SVM-defined hyperplane to each point was then defined as the MES for that epoch.   Neural decoding of motor performance: time-series. Multi-spectral decoding of motor performance using neural signals obtained from the STN is demonstrated at the 7 s timescale (A) the 3 s timescale (B) and the 1 s timescale (C). In each case, time series for the 10 subjects with the highest decoding accuracies at each timescale are shown (in descending order) to provide a sense of the capabilities of this approach. In each panel, the observed MES (green) is plotted against the SVR-decoded MES (black); SVR estimates are plotted against MES segments with associated neural data that were not used for training the corresponding decoding models. Neural decoding of motor performance: timescales and anatomic localization. (A) The mean of subjects' peak decoding accuracies are shown as a function of timescale. Accuracy tended to increase with increasing window size, though a relative plateau in decoding accuracy was observed at about the 8 s timescale. (B) A wide range of decoder performance was observed across signals, within individuals. Decoding accuracies (r-values, here at the 7 s timescale) are grouped in increasing order by subject. The highest values, reflecting the best achieved decoding performance for each subject, are shown in red. Gray lines show the means of the control analysis in which decoding was performed on neural data shuffled with respect to the MES. (C) To determine if the best decoding performance tended to arise from signals that were anatomically clustered, recordings yielding the highest (top third) and lowest (bottom third) accuracies were localized on an MNI reference volume (approximate outline of STN overlaid in black), and the differences between these were plotted to reveal the sources of the most informative signals. Coronal, axial, and sagittal slices are shown through the region of peak accuracy, observed near the dorsolateral border of the STN (insets: L = lateral; A = anterior).
near the bottom represent regions where weights were lower, based upon a contiguitysensitive permutation test (p = 0.0002-0.0064; see Methods). (C) To determine if crossfrequency patterns of SVR weight assignments existed, a cross correlation between feature weights for every pair of frequencies was calculated, across SVR models. This is plotted (top) along with the mean absolute correlation for a particular frequency in relation to all others (bottom). The contours on the correlation matrix depict significance (α) levels of 0.001 and 0.01 (dark and light contours, respectively) computed using a bootstrap permutation test to create 'null' distributions for each point on the matrix (see Methods). The overlaid region on the lower area plot includes the mean (red line) and mean ± 3 standard deviations (yellow line) as determined using the same bootstrap distributions as for the correlation matrix. Several significant peaks are visible above this range, most notably in the θ/α and low β ranges, in the low/mid γ range, and in the vhf range, confirming crossfrequency interactions were most consistently observed across these ranges. Generalizability of MES decoding models. (A) A 'generic' classifier was constructed using the average, performance-weighted linear SVR model coefficients, across patients, for each frequency feature. The performance of the generic decoder (x-axis) was compared against SVR decoders generated for each subject (y-axis). Significant, above-chance decoding was observed using the generic decoder for 16 of 22 patients (red). Directly comparing the generic and patient-specific decoders, four patients had significantly greater decoding with the generic decoder compared to the patient-specific decoder, while 16 showed significantly greater decoding with the patient-specific decoder. (B) To determine if there might be particular decoding models derived from individual subjects that generalized well to others, decoders were trained on one subject's data and tested on another's data to estimate the latter's MES, across all pairs of subjects. The results are plotted in a cross-decoding matrix where the best result across each pair of subjects is shown (selected from among all the recording pairs between those subjects). The cases along the diagonal (where the same subject's data were used for training and testing) are referred to as 'autologous,' whereas the off-diagonal cases (reflecting training and testing across subjects) are termed 'homologous' conditions. Because there are more combinations of homologous electrode pairs across subjects than there are autologous cases within subjects, this analysis potentially was biased against the autologous condition. Nonetheless, the best-performing cases were typically found on the diagonal. Furthermore, the lack of horizontal banding in the matrix demonstrates there was no 'special' decoder that succeeded in generalizing across patients, and the lack of vertical banding demonstrates there was no particular patient whose neurophysiological 'fingerprint' was sufficiently simple that it could be estimated successfully using some broad set of homologous decoders that happened to emphasize some simple feature.