A machine-learning approach to volitional control of a closed-loop deep brain stimulation system

Objective. Deep brain stimulation (DBS) is a well-established treatment for essential tremor, but may not be an optimal therapy, as it is always on, regardless of symptoms. A closed-loop (CL) DBS, which uses a biosignal to determine when stimulation should be given, may be better. Cortical activity is a promising biosignal for use in a closed-loop system because it contains features that are correlated with pathological and normal movements. However, neural signals are different across individuals, making it difficult to create a ‘one size fits all’ closed-loop system. Approach. We used machine learning to create a patient-specific, CL DBS system. In this system, binary classifiers are used to extract patient-specific features from cortical signals and determine when volitional, tremor-evoking movement is occurring to alter stimulation voltage in real time. Main results. This system is able to deliver stimulation up to 87%–100% of the time that subjects are moving. Additionally, we show that the therapeutic effect of the system is at least as good as that of current, continuous-stimulation paradigms. Significance. These findings demonstrate the promise of CL DBS therapy and highlight the importance of using subject-specific models in these systems.


Introduction
Essential tremor (ET) is one of the most common neurological movement disorders, affecting up to 4.5% of the population over the age of 65, and up to 20% of the population over the age of 95 [1]. The cardinal symptom of ET is tremor during voluntary movement, typically in the dominant hand and arm [2]. A common therapy for treating the symptoms of ET is deep brain stimulation (DBS), which consists of an electrode implanted into a deep brain region (typically the ventral intermediate nucleus (VIM) of the thalamus) that continuously delivers high-frequency stimulation (130–180 Hz) to the area. Through a still-unknown mechanism, stimulation alleviates many of the symptoms of ET. Despite its therapeutic effect, continuous stimulation has several drawbacks, such as side effects [3–5] and an unnecessary depletion of battery life. Moreover, symptoms can change on both short (seconds to minutes, due to presence of symptoms only during movement) and long (months to years, due to the neurodegenerative nature of a disease) timescales [6].
Supplementary material for this article is available online. (Some figures may appear in colour only in the online journal.) Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
A way to improve DBS is to integrate it into a closed-loop (CL) system where stimulation is only delivered when the patient is experiencing symptoms. Such a system would increase battery life and minimize the time that the patient experiences stimulation-induced side effects. To determine the correct time to stimulate, a CL system requires a biosignal that is indicative of the presence of symptoms. Importantly, for ET, detecting symptoms themselves may not be necessary. Rather, the underlying volitional movement that evokes the tremor can be readily detected due to the stereotypical changes in the neural activity it causes, despite the presence of tremors [7,8]. These changes are most prominent at frequencies between 12–30 Hz, often termed 'β-band'. Additionally, there is evidence that there are similar changes in neural activity during tremors [9,10]. Thus, neural signals are a promising biofeedback signal for a CL DBS system since they are correlated with both normal and pathological movement. Indeed, successful CL control of DBS using activity recorded from the DBS electrode has been demonstrated in Parkinson's disease patients [11]. Additionally, CL stimulation using cortical signals for the treatment of epilepsy has seen marked success, with an implantable neurostimulator currently FDA approved for patient use [12]. Recently, a proof-of-concept study from our group, using a single ET patient, showed that cortical activity could also be used as a feedback signal for CL DBS [13]. However, these results still need to be extended to multiple patients.
Both aforementioned studies used manually-tuned thresholds to detect power changes in specific local field potential (LFP) frequency bands, which would be onerous for clinicians to perform and may not make use of all the neural features available. Furthermore, despite the occurrence of common phenomena, such as beta-band desynchronization [14], neural activity has subject-specific features, which make using a general rule for extracting content suboptimal [15]. Machine learning (ML), on the other hand, can be used to automate the process of building models that relate neural activity to symptoms and can determine which neural features are most useful. While ML has been used offline to detect movement disorder symptoms in previously recorded data [16][17][18][19][20][21], it has never been used to detect symptoms from neural activity in a real-time CL DBS system.
Herein we develop and test a CL DBS system that relies on ML to build subject-specific models for predicting volitional, tremor-evoking movement from cortical activity in several human ET patients and describe the performance of the system. We show that the system is capable of accurate detection of tremor-evoking movement and that the system exerts a high therapeutic effect. A preliminary form of this study in a single ET patient is found in [22].

Subjects
Three male human subjects (see demographics and stimulation settings in table 1) with ET were implanted with the Activa PC+S (Medtronic, Inc.) investigational DBS device for the treatment of symptoms [23]. Additionally, they were implanted with a four-contact ECoG strip electrode over the hand/arm area of the primary motor and somatosensory cortex. Electrode locations in all three subjects were verified by co-registered CT and MRI scans and functional screening (figure 1). Additionally, for subject S3, ECoG electrode location was intra-operatively verified by examining somatosensory evoked potentials during median nerve stimulation (subject S3 did not receive an improved outcome, compared to the other subjects). Subjects provided informed consent in accordance with the institutional review board and participated in the study for up to two years (S1 = 2 y, S2 = 1 y, S3 = 3 mo). Prior to experiments, each subject's DBS parameters (contacts, voltage, frequency and pulse width) were set by a trained clinician for optimal therapeutic benefit. During the experiments, only the DBS voltage was altered by the CL system, and this voltage was always between 0 V and the maximum voltage set by the clinician. The stimulation voltage was increased or decreased in 500 mV steps, with stimulation updates occurring every 400 ms. This ramping paradigm was chosen to allow a reasonable system response time while mitigating any paresthesias caused by increasing stimulation too quickly. The subjects did not discontinue any medication for the treatment of symptoms prior to the experiments.

Data collection
During the experiments, a single channel of cortical activity was sensed in a differential configuration from two electrode contacts by the Activa PC+S device. The cortical signal was low-pass filtered at 100 Hz, high-pass filtered at 0.5 Hz and sampled at 10-bit resolution and 422 Hz with a gain of 2000. The ECoG electrode contacts used for sensing cortical activity were selected for each subject by recording from all six potential electrode configurations and examining which signal had the highest power in the beta frequency range (12–30 Hz). The selection was confirmed by observing decreases in this power during movement of the right arm and/or hand. Compressed ECoG [...] A smartwatch (LG G-watch) was worn on the right wrist, and three-axis gyroscope signals were sampled at 100 Hz and streamed via Bluetooth onto the laptop computer.
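The contact-selection step described above can be illustrated with a short sketch. Four strip contacts yield six possible differential pairs, and the pair with the highest beta-band power is kept. The power values and the function name are hypothetical and are not data or code from the study.

```python
from itertools import combinations

N_CONTACTS = 4  # four-contact ECoG strip, as described in the text

# All differential sensing configurations: C(4, 2) = 6 contact pairs.
pairs = list(combinations(range(N_CONTACTS), 2))

# Hypothetical beta-band (12-30 Hz) power per pair, in arbitrary units.
beta_power = {
    (0, 1): 1.8, (0, 2): 2.4, (0, 3): 2.1,
    (1, 2): 3.2, (1, 3): 2.9, (2, 3): 1.5,
}

def select_sensing_pair(power_by_pair):
    """Return the contact pair with the highest beta-band power."""
    return max(power_by_pair, key=power_by_pair.get)

best_pair = select_sensing_pair(beta_power)
print(len(pairs), best_pair)
```

In practice the chosen pair would still be checked, as in the text, for a visible drop in beta power during movement.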

Experimental tasks
All experiments for a given subject were performed during a single visit. The subjects first engaged in a prompted movement (PM) task that entailed repeatedly performing a phasic or tonic movement that evoked tremors. The movements for subjects S1, S2 and S3, respectively, were holding his right hand close to his nose, holding a pen in his right hand close to a fixed point on a wall, mimicking using a screwdriver, and holding his right hand up near his head and slowly rotating it back and forth. Movement and rest periods were prompted by a voice saying 'rest' and 'move' and/or a computer screen displaying the same words (the choice of cues was determined by the equipment available at the time of the experiment), and the duration of each movement/rest period was approximately 30 s. A PM trial lasted approximately 4 min, and during training, a trial was carried out with stimulation both off and on. After classifier training, the PM task was repeated several more times to examine the performance of the CL system. During testing, the duration of each movement/rest period was drawn from a uniform distribution between 3 and 12 s. This distribution was chosen to reflect the relatively short duration of many natural movements, such as bringing a spoon to one's mouth, as well as to prevent fatigue from repeating long movements. Subjects performed approximately 16 movements over the course of about 4 min.
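The test-phase timing can be checked with a small sketch. With period durations drawn uniformly between 3 and 12 s, the mean period is 7.5 s, so a 4 min trial holds about 32 periods, or roughly 16 movements and 16 rests, which agrees with the count given above. The function name, the seed and the choice to start with a 'move' cue are assumptions made for illustration.

```python
import random

def make_schedule(total_s=240.0, low_s=3.0, high_s=12.0, seed=0):
    """Alternate 'move'/'rest' cues with durations drawn from U(low_s, high_s)."""
    rng = random.Random(seed)
    schedule, elapsed, state = [], 0.0, "move"  # starting cue is an assumption
    while elapsed < total_s:
        duration = rng.uniform(low_s, high_s)
        schedule.append((state, duration))
        elapsed += duration
        state = "rest" if state == "move" else "move"
    return schedule

# Expected number of periods: 240 s divided by the 7.5 s mean duration.
expected_periods = 240.0 / ((3.0 + 12.0) / 2.0)

schedule = make_schedule()
n_moves = sum(1 for state, _ in schedule if state == "move")
print(expected_periods, n_moves)
```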
Additionally, subjects engaged in the Fahn-Tolosa-Marin (FTM) tremor assessment [27], which scores tremor during various movements and postures. The subjects performed this assessment with stimulation off, stimulation on at the therapeutic voltage and stimulation under CL control. For the CL portion of the task, the classifiers used were the same as those used during each subject's PM task. A video of the subjects performing the FTM assessment was recorded and symptoms were evaluated by three blinded clinicians trained in evaluating movement disorders.

Classifier training
Using cortical data collected during the PM tasks, binary classifiers were trained offline to detect tremor-evoking movement from neural activity. Because stimulation evoked physiological responses that were visible in the signal, separate classifiers were trained for when stimulation was off and on. Power spectral density (PSD) features were extracted from cortical data over a sliding, 1 s window using Welch's method (50% overlap with a 0.5 s Hann window), resulting in frequency bins with a width of approximately 2 Hz. Features were extracted using this sliding window five times per second for offline training. This frame sampling was chosen to provide a sufficient time resolution of changes in the spectral features. Due to the presence of stimulation artifacts in various parts of the spectrum, only power in 2 Hz bins in the frequency range [4,28) Hz was used for the features. The features were normalized to have zero mean and unit standard deviation. Gyroscope signals recorded from the smartwatch were visually inspected to determine tremor-evoking movement/rest epochs for labeling neural data. These features and labels were used to train logistic regression classifiers. Logistic regression was chosen because its relatively low computational complexity is amenable to real-time processing. L2-norm regularization was used to prevent overfitting, and a grid search method was used to find the optimal hyperparameter for maximizing classifier sensitivity. Features and labels were partitioned into two equal halves (without shuffling) for cross-validation of the classifiers during training.
The classifiers were incorporated into the CL DBS system (figure 2). During real-time testing, a packet containing 400 ms of neural data was sent to the computer using the Nexus-D bridge. This packet was added to a buffer, and the last 1 s of neural data was preprocessed and fed into the system of classifiers. If the system classified the 1 s frame as 'movement', a command was sent back to the PC+S to increase the stimulation voltage by one step (500 mV). If the system classified the frame as 'no movement', a command was sent back to the PC+S to decrease the stimulation voltage by one step (500 mV). These updates occurred every 400 ms.
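The feature extraction described in this section can be sketched in plain Python. This is a minimal illustration and not the authors' code: it assumes a 211-sample (0.5 s) Hann segment with a 105-sample hop, per-segment mean removal and one-sided density scaling, none of which are specified in the text. The input is a synthetic 20 Hz sinusoid rather than ECoG.

```python
import math

FS = 422          # sampling rate (Hz), from the text
SEG = FS // 2     # 0.5 s Hann segment = 211 samples, giving 2 Hz bins
STEP = SEG // 2   # hop of 105 samples, i.e. roughly 50% overlap (assumed)

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def welch_band_power(frame, f_lo=4.0, f_hi=28.0):
    """Welch PSD estimate of one frame, restricted to bins in [f_lo, f_hi)."""
    win = hann(SEG)
    scale = 1.0 / (FS * sum(w * w for w in win))  # density scaling
    df = FS / SEG                                  # bin width (2 Hz)
    bins = [k for k in range(SEG // 2 + 1) if f_lo <= k * df < f_hi]
    starts = range(0, len(frame) - SEG + 1, STEP)
    psd = [0.0] * len(bins)
    for s in starts:
        seg = frame[s:s + SEG]
        mean = sum(seg) / SEG
        x = [(v - mean) * w for v, w in zip(seg, win)]  # remove mean, apply window
        for j, k in enumerate(bins):
            re = sum(v * math.cos(2 * math.pi * k * n / SEG) for n, v in enumerate(x))
            im = sum(v * math.sin(2 * math.pi * k * n / SEG) for n, v in enumerate(x))
            psd[j] += 2.0 * scale * (re * re + im * im)  # factor 2: one-sided, interior bins
    return [k * df for k in bins], [p / len(starts) for p in psd]

def zscore(rows):
    """Normalize each feature column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mu = [sum(c) / len(c) for c in cols]
    sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
          for c, m in zip(cols, mu)]
    return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

# Synthetic 1 s frame: a 20 Hz (beta-band) sinusoid, not real ECoG.
frame = [math.sin(2 * math.pi * 20 * n / FS) for n in range(FS)]
freqs, power = welch_band_power(frame)
peak_hz = freqs[power.index(max(power))]
print(len(freqs), peak_hz)
```

With these settings each 1 s frame yields 12 features (bins centred at 4, 6, ..., 26 Hz). In the real-time loop the same function would be applied to the last 1 s of buffered data after each 400 ms packet.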

System evaluation
The performance of the system was evaluated using several different metrics. First, was classifier performance, both in terms of overall accuracy and sensitivity. Due to the noninstantaneous changes in stimulation voltage in response to false negatives, the stimulation often remained on despite occasional errors by the classifiers. Due to this fact, the system was also evaluated in terms of the percentage of time that stimulation was on while the subject was moving (termed 'system sensitivity') and percentage of time that stimulation matched the movement condition (termed 'system accuracy'). Reduction in average stimulation amplitude (compared to stimulation at the therapeutic voltage) and system delay were also used to evaluate performance.
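The system-level metrics defined above can be made concrete with a short sketch. It assumes movement labels and stimulation voltage are sampled on a common time base (for example, once per 400 ms update); the function name and the toy traces are illustrative only.

```python
def system_metrics(moving, voltage, v_therapeutic):
    """Return (system sensitivity, system accuracy, fractional voltage reduction)."""
    stim_on = [v > 0 for v in voltage]
    n_move = sum(moving)
    # Fraction of movement samples during which stimulation was on.
    sens = sum(1 for m, s in zip(moving, stim_on) if m and s) / n_move
    # Fraction of all samples where stimulation state matched movement state.
    acc = sum(1 for m, s in zip(moving, stim_on) if m == s) / len(moving)
    # Reduction in mean voltage relative to continuous therapeutic stimulation.
    reduction = 1.0 - (sum(voltage) / len(voltage)) / v_therapeutic
    return sens, acc, reduction

# Toy traces: stimulation ramps up after movement onset and lingers afterwards.
moving = [False, False, True, True, True, True, False, False, False, False]
voltage = [0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 1.0, 0.5, 0.0, 0.0]

sens, acc, reduction = system_metrics(moving, voltage, v_therapeutic=1.5)
print(sens, acc, reduction)
```

In this toy case sensitivity is 0.75 and accuracy is 0.70: the ramp-down after movement ends counts against accuracy but not against sensitivity.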
Additionally, a video of the tremor assessment task was scored by three blinded clinicians to evaluate the effect of the different stimulation conditions. Scores from each clinician were normalized to be a percent reduction in score compared to the off-stimulation score, and then averaged across the three clinicians' scores. Kolmogorov-Smirnov tests did not reveal a significant departure from normal distributions (p > 0.05), so standard parametric statistical tests were used. Statistical packages for R and Python were used for analyses.
Figure 2. In the CL system, neural activity is sensed by the cortical electrode and streamed onto a laptop computer. Power features are extracted from the time-series cortical signal and are used by the pair of classifiers to determine if any tremor-evoking movement is detected. If so, a command is sent back to the implant to increase the stimulation voltage (up to a safe limit set by a clinician); if no movement is detected, a command is sent back to decrease the stimulation voltage (down to a minimum of 0 V). A smartwatch also collects movement data from the right arm to provide a ground truth for movement/rest detection.

CL system performance
Using neural data recorded during the PM task, the PSD was calculated and used as a feature set for creating logistic regression classifiers. There were large, consistent differences in spectra during rest and movement, both with stimulation OFF and ON (figure 3). However, the shape of the spectra, and the differences between rest and movement spectra, were different across subjects, both in power and frequency. These differences in neural activity between subjects are reflected in the coefficients learned by each subject's trained classifiers (figure 3, black traces).
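How such per-subject coefficients might be fitted can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it uses plain batch gradient descent, a hypothetical three-value grid for the L2 penalty, and synthetic two-feature data in which movement lowers the first ('beta-like') feature. As in the text, the data are split into two unshuffled halves and the penalty is chosen to maximize sensitivity on the held-out half.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on the L2-regularized logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """1 = 'movement', 0 = 'no movement' (probability threshold of 0.5)."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

def sensitivity(w, b, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if yi == 1 and predict(w, b, xi) == 1)
    return hits / sum(y)

def make_toy_data(n=40):
    """Synthetic features: blocks of 10 'move' frames alternate with 10 'rest' frames."""
    X, y = [], []
    for i in range(n):
        label = 1 if (i // 10) % 2 == 0 else 0
        beta = (-1.0 if label else 1.0) + 0.4 * math.sin(i)  # lower during movement
        X.append([beta, math.cos(3 * i)])                    # second feature is uninformative
        y.append(label)
    return X, y

X, y = make_toy_data()
half = len(X) // 2  # two equal halves, no shuffling
X_tr, y_tr, X_va, y_va = X[:half], y[:half], X[half:], y[half:]

results = []
for l2 in (0.01, 0.1, 1.0):  # hypothetical grid
    w, b = train_logreg(X_tr, y_tr, l2=l2)
    results.append((sensitivity(w, b, X_va, y_va), l2, w, b))
best_sens, best_l2, best_w, best_b = max(results, key=lambda r: r[0])
print(best_l2, best_sens, best_w)
```

The negative weight learned for the first feature mirrors the beta-band desynchronization that the trained classifiers rely on.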
Using the stimulation off and stimulation on classifiers created with the training data, each subject first repeated the PM task to assess the performance of the CL system when detecting the same tremor-evoking movements that were used to train the classifiers. During this task, the classifiers detected tremor-evoking movements in real time from cortical activity and the system altered stimulation voltage accordingly (figure 4, see supplementary video online at stacks.iop.org/JNE/16/016004/mmedia). To quantify the performance of the system, accuracy and sensitivity were calculated using the predictions from the classifiers. Across all subjects, the accuracy and sensitivity of the classifiers were 75.0% ± 5.6% and 76.5% ± 9.3%, respectively (table 2, Prompted movement (Fast)). Across all subjects, the system accuracy and sensitivity were 67.1% ± 4.6% and 87.7% ± 3.7% (table 2, Prompted movement (Fast)). So, the system was able to deliver stimulation almost 90% of the time that the subjects were performing tremor-evoking movements.
During the test PM task using CL control of DBS, the average stimulation voltage decreased by 38.3% ± 12.7%, compared to each subject's therapeutic stimulation voltage (table 2, Prompted movement (Fast)). Stimulation was on at some voltage (not necessarily the therapeutic voltage set by the clinician) during 81.7% ± 7.0% of the PM task. During the PM task, subjects were moving 64.4% ± 4.9% of the time, so stimulation was on more often than the subjects were moving. This was expected since the classifiers were optimized for sensitivity (i.e. prioritizing detecting all movements at the expense of potentially having more false positives) and because of the non-instantaneous changes in stimulation voltage. The delay between the initiation of movement and the first increase in stimulation voltage was 1.56 ± 0.28 s (movements that were initiated while stimulation was already on were not used for estimating the delay).
Figure 4. The CL DBS system is able to detect tremor-evoking movements (as recorded by a gyroscope, shown in black and gray; classifier detection shown in red) from changes in cortical activity (spectrogram) and respond by increasing the stimulation voltage (blue) during the PM task. Representative data shown are from subject S2.
To begin investigating possible design choices for optimizing the CL system, the control policy used to update the stimulation voltage was altered; instead of decreasing the stimulation voltage every time a negative classifier result was obtained (fast decrementing), the system only began decreasing stimulation after two consecutive negative classifier results were found (slow decrementing). This was done to increase the sensitivity of the system. The PM task was repeated using this updated control policy and, as expected, the sensitivity of the system increased (92.0% ± 3.4%).
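The two control policies can be compared with a small sketch. It assumes one classifier result per 400 ms update, a 500 mV step, and a hypothetical clinician limit of 1.5 V. The 'slow' policy is interpreted here as holding the voltage on the first negative result and decreasing on the second and any further consecutive negatives; the text does not spell out this detail.

```python
STEP_V = 0.5  # 500 mV per 400 ms update, from the text

def update_voltage(v, detected, neg_run, v_max, policy="fast"):
    """Apply one update; return (new voltage, consecutive-negative count)."""
    if detected:
        return min(v + STEP_V, v_max), 0
    neg_run += 1
    needed = 1 if policy == "fast" else 2  # negatives required before decrementing
    if neg_run >= needed:
        v = max(v - STEP_V, 0.0)
    return v, neg_run

def run(detections, v_max, policy):
    """Trace of stimulation voltage for a sequence of classifier outputs."""
    v, neg_run, trace = 0.0, 0, []
    for d in detections:
        v, neg_run = update_voltage(v, d, neg_run, v_max, policy)
        trace.append(v)
    return trace

# 1 = 'movement' detected, 0 = 'no movement'; the fourth result is an isolated miss.
detections = [1, 1, 1, 0, 1, 0, 0, 0]
fast = run(detections, v_max=1.5, policy="fast")
slow = run(detections, v_max=1.5, policy="slow")
print(fast)
print(slow)
```

Under the slow policy the isolated miss leaves the voltage at its maximum, which is the mechanism by which system sensitivity increases, at the cost of a slower ramp-down after movement ends.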
Next, the subjects engaged in drawing and writing tasks as part of the Fahn-Tolosa-Marin (FTM) tremor assessment. During these tasks, the classifiers that were trained and tested during the PM task were used to detect movement as part of the CL DBS system. These tasks were used not only to examine the therapeutic effect of the CL stimulation, but also to assess the performance of the system on movements that are more representative of movements the subjects might perform during everyday activities. For subjects S1 and S3, the fast decrementing control policy was used for updating the stimulation voltage during the FTM task, while for subject S2, the slow decrementing control policy was used. Different control policies were used for different subjects to see if they had an effect on performance. However, not all policies could be evaluated on all subjects due to the constraints on the duration of the experiment.
During the drawing/writing tasks, the accuracy of the classifiers improved compared to the accuracy during the PM task, with the classifiers correctly detecting movement around 85% of the time (table 2, Spiral, Line Drawing, Writing). Accordingly, because of the non-instantaneous changes in stimulation voltage, stimulation was on for the duration of the task in all subjects (table 2, Spiral, Line Drawing, Writing). There was only a decrease of between 1.8% and 4.3% in average stimulation voltage, compared to the therapeutic voltage set by the clinician, because stimulation was on at a voltage close to the therapeutic voltage set as a maximum.

Therapeutic effect
To assess the therapeutic effect of the CL DBS system, subjects engaged in the FTM tremor assessment with stimulation under CL control. The classifiers used for movement detection during the FTM assessment were the same as those used during the PM task (for subjects S1 and S3, the PM-fast system was used; for subject S2, the PM-slow system was used). For a comparison of therapeutic effect, subjects also engaged in the FTM assessment with stimulation off and with stimulation on at the therapeutic voltage. A video of the subjects performing this tremor assessment under the different stimulation conditions was recorded, and three blinded clinicians specializing in movement disorders rated the severity of tremor (intra-class correlation = 0.63).
There was a significant group effect of stimulation on clinical scores (figure 5) (within-subjects analysis of variance (ANOVA); F = 43.3, p = 0.0019). Continuous stimulation resulted in an improvement of 42.3% ± 7.5% compared to no stimulation (Student's t-test; t off-on = 10.0, p off-on = 0.0154). CL stimulation resulted in a significant improvement of 46.0% ± 4.9%, compared to no stimulation (Student's t-test; t off-cl = 13.4, p off-cl = 0.0055). While CL resulted in slightly more improvement in scores than continuous stimulation, this result was not significant (Student's paired t-test; t cl-on = 0.5, p cl-on = 0.65). This lack of a significant difference between CL and continuous stimulation scores is likely attributable to subject S2, who was the only subject out of the three whose scores during continuous stimulation were better than during CL DBS.
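As a sanity check on the CL comparisons, the p-values can be recomputed from the reported t statistics. The sketch assumes two-tailed tests with 2 degrees of freedom (three subjects, paired), which the text does not state explicitly. For 2 degrees of freedom the Student's t distribution has a closed-form CDF, so no statistics library is needed.

```python
import math

def two_tailed_p_df2(t):
    """Two-tailed p-value for Student's t with 2 degrees of freedom.

    For df = 2 the CDF is F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    so p = 2 * (1 - F(|t|)) = 1 - |t| / sqrt(t^2 + 2).
    """
    t = abs(t)
    return 1.0 - t / math.sqrt(t * t + 2.0)

p_off_cl = two_tailed_p_df2(13.4)  # CL vs. no stimulation
p_cl_on = two_tailed_p_df2(0.5)    # CL vs. continuous stimulation
print(round(p_off_cl, 4), round(p_cl_on, 2))
```

Under these assumptions t = 13.4 gives p of about 0.0055, matching the reported value, and t = 0.5 gives p of about 0.67, close to the reported 0.65 given that t is quoted to one decimal place.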

Discussion
To date, this work represents the first time that ML has been used in a chronically implanted, CL DBS system in human ET subjects. The CL system was evaluated on (1) its ability to accurately detect movement and deliver stimulation when movement was detected, (2) the system delay, and (3) the power savings achieved. The performance of the CL system was tested during a volitional PM task where subjects repeated a stereotyped movement that evoked tremor, as well as during more natural movements, such as drawing and writing. The system cannot necessarily be described as detecting tremor itself, since changes in cortical activity during movement and tremor are largely conflated [7–10]. However, these changes are robust and consistent and thus provide a feedback signal indicating when tremor is possibly occurring and when stimulation should be delivered. So, the system could be described as a CL brain-computer interface (BCI) system, but unlike typical BCI systems, the CL DBS system requires no additional mental effort other than what is normally required to move.
Across the subjects, the classifiers were accurate about 75% of the time during the PM task and 85% of the time during the natural movement tasks. While these accuracy values are not ideal, it is important to note that this was a pilot study involving only a few subjects, using a single channel of cortical data with a simple, linear classifier that was only trained using a few minutes of data. Despite these limitations, the ML approach taken resulted in classifiers that adequately captured each subject's unique, movement-related changes in neural activity. Additionally, the system of classifiers was able to cope with stimulation-related changes in neural activity that might have confused simple, power-based thresholding algorithms.
It is noteworthy that not only was detection robust across several different movements, despite only training classifiers on data from the PM task, but performance actually improved on the more natural movement tasks. A likely reason for this improvement is that the classifiers favored features in the beta band (12–30 Hz; see figure 3 for classifier weights). The changes in activity in this band are spatially widespread over the motor cortex and other areas and show characteristic changes that are not movement-specific [15]. Also, the movements performed during the PM task were tonic in nature for two of the three subjects, such as holding a hand to the face. When stimulation was off, this resulted in tremors. However, when stimulation was on, no tremor was present, and thus there was much less movement and therefore less desynchronization in beta bands. Future studies may benefit from using data from both tonic and phasic movements to train classifiers.
While the accuracy of classifiers is a common and useful metric, sensitivity may be a more appropriate metric since it only penalizes false negatives. Sensitivity of the entire CL system itself (the percentage of time that stimulation was on when movement was occurring) was also used to measure performance. The sensitivity of the system was around 90% during the PM task and 100% for the writing/drawing tasks. So, the system consistently delivered stimulation the majority of the time that the subjects were engaging in movements. It must be noted that these system sensitivities only reflect whether stimulation was on or not and do not consider the amplitude of the stimulation. There is typically no linear mapping between stimulation amplitude and therapeutic effect, so it is hard to know if transient, sub-therapeutic stimulation voltages still deliver an effect. However, the stimulation voltage typically stayed close to the maximum therapeutic value most of the time (see figure 4, for example) and there is also evidence that intermittent, non-constant voltage stimulation may be more therapeutically effective than continuous stimulation, if delivered at the appropriate time [11,24]. During both the PM task and the drawing/writing tasks, the average stimulation voltage was less than the therapeutic value set by the clinician for continuous stimulation. It is important to note that power consumption has only been discussed in terms of the average stimulation voltage during a task, which is only a fraction of the power consumption from streaming data to and from the device. The reason for portraying data in this manner is that future iterations of the system will embed CL capabilities onto the device itself, removing the power consumption due to telemetry from the equation.
Across subjects, the time between detecting the onset of a movement during the PM task and the response of the system to increase stimulation was, on average, about 1.5 s. Ideally, this delay should be as small as possible so that symptoms are mitigated quickly. A large portion of this delay (at least 1 s [7]) comes from transmission delays between the implanted device and external hardware. As mentioned above, future systems will circumvent this transmission time by using embedded algorithms to detect symptoms and trigger stimulation.
The preceding discussion has evaluated the performance of the CL system in the context of tremor-inducing movement detection, system delays and the correct delivery of stimulation. A related, and more important, performance metric for a CL DBS system is its therapeutic effect, that is, its ability to mitigate the symptoms of the disease that afflicts the DBS user. To examine this, the ET subjects underwent a clinical tremor assessment under different stimulation conditions and their performance on the test was rated by blinded clinicians. Across subjects, scores were lowest during CL DBS, although the difference in scores between CL DBS and continuous DBS was not significant. This result agrees with other published CL DBS studies using neural activity as a control signal, which also saw CL DBS being more therapeutically effective than continuous stimulation [11,25]. The reason for this apparent superiority is not clear; however, the intermittency of CL stimulation may prevent adaptation that diminishes the effect of the stimulation. More work is necessary to elucidate these mechanisms.
For both subjects S1 and S3, FTM scores were lowest during the CL stimulation. However, for S2, FTM scores were lowest during the constant stimulation. Although different control policies were used for S1/S3 and S2, it seems unlikely that this contributed to the different outcomes, since stimulation was on at the maximum level during the movement portions of the FTM for all three subjects (see table 2). The portion of the FTM evaluation that contributed most to the difference between CL and constant-stimulation scores was left-limb movements, which were worse during CL. For all three subjects, stimulation was often on during left-sided movements, but varied a little more in amplitude than during right-sided movements. It is possible that subject S2 was more sensitive than the other two subjects to these variations in amplitude, or to the slightly lower average amplitude, during left-sided movements.
It is important to note that the stimulation ramp rate was constant throughout the study (500 mV steps every 400 ms). While ramp rate is important for optimizing therapy, it can also have the undesirable effect of causing paresthesia when too large a value is used. All three subjects reported paresthesia during the frequent stimulation changes in the CL testing; however, this side effect was well tolerated and eventually diminished. This issue highlights one of the many potential problems that need to be addressed in future CL system design.
While the performance of this system may not currently be sufficient for replacing continuous stimulation therapy, this pilot study shows the transformative potential of ML-based, CL DBS systems. Future hardware iterations will provide multiple sensing channels and much larger frequency bands, making manual threshold tuning, the only other CL DBS feedback detection algorithm used thus far [11,13], untenable. The ML paradigm used in this system is much more amenable to such a large control feature space. Furthermore, the ML system used here could easily be altered to construct an adaptive system whose parameters change to optimize therapy over extended periods of time. The ease of use, automation and adaptability of the ML paradigm to patient-specific changes in feedback signals over time are definite benefits over threshold-based algorithms. Future hardware will also embed these algorithms on the device itself, greatly reducing or even eliminating the delay in the system. Several studies have used motion sensors or muscle activity from extremities to trigger stimulation, but these systems depend on tremor manifesting itself before they can work [24,26]. However, an embedded, brain-controlled system may be able to preempt symptoms, given the stereotyped changes in neural activity that occur hundreds of milliseconds before movement [9].
In summary, we demonstrated a novel CL DBS system for the treatment of ET that used ML to create subject-specific models relating cortical activity and tremor-inducing movement. As a proof-of-concept work, this system was created to be as simple as possible. Despite the simplicity of the system, it was able to reliably detect tremor-inducing movement and deliver stimulation at the appropriate time during a variety of tasks. Because of this, the CL system was able to deliver a therapeutic benefit that appears to be as good as that of continuous stimulation. This work adds to the growing literature demonstrating CL DBS systems as a promising technology for optimizing the tradeoffs between therapeutic benefit, side-effect mitigation and power consumption. Furthermore, it highlights the growing need for relationships between engineering and neuroscience disciplines to address issues arising during the development of next-generation medical devices and technologies.