Movement decoding using neural synchronization and inter-hemispheric connectivity from deep brain local field potentials

Objective. Correlating electrical activity within the human brain to movement is essential for developing and refining interventions (e.g. deep brain stimulation (DBS)) to treat central nervous system disorders. It also serves as a basis for next generation brain–machine interfaces (BMIs). This study highlights a new decoding strategy for capturing movement and its corresponding laterality from deep brain local field potentials (LFPs). Approach. LFPs were recorded with surgically implanted electrodes from the subthalamic nucleus or globus pallidus interna in twelve patients with Parkinson’s disease or dystonia during a visually cued finger-clicking task. We introduce a method to extract frequency dependent neural synchronization and inter-hemispheric connectivity features based upon wavelet packet transform (WPT) and Granger causality approaches. A novel weighted sequential feature selection algorithm has been developed to select optimal feature subsets through a feature contribution measure. This is particularly useful when faced with limited trials of high dimensionality data as it enables estimation of feature importance during the decoding process. Main results. This novel approach was able to accurately and informatively decode movement related behaviours from the recorded LFP activity. An average accuracy of 99.8% was achieved for movement identification, whilst subsequent laterality classification was 81.5%. Feature contribution analysis highlighted stronger contralateral causal driving between the basal ganglia hemispheres compared to ipsilateral driving, with causality measures considerably improving laterality discrimination. Significance. These findings demonstrate optimally selected neural synchronization alongside causality measures related to inter-hemispheric connectivity can provide an effective control signal for augmenting adaptive BMIs. In the case of DBS patients, acquiring such signals requires no additional surgery whilst providing a relatively stable and computationally inexpensive control signal. This has the potential to extend invasive BMI, based on recordings within the motor cortex, by providing additional information from subcortical regions.


Introduction
Brain-machine interfaces (BMI) with both neural decoding and stimulation have recently attracted great attention in the deep brain stimulation (DBS) and optogenetics research fields. Decoding of neural activities related to movement or pathological states is essential in developing bi-directional BMIs as well as adaptive neuroprosthetics [1][2][3][4][5][6]. A bidirectional interface not only enables the decoded brain activity to be used to drive intelligent stimulation patterns for treatment of neurodegenetive diseases but also provides tools for investigating neural circuits. Therefore, a key element of such an interface is the decoding algorithm, which extracts information from the raw neural signals to communicate with output devices. There has been substantial progress made in neural decoding algorithms in the last few decades, which has significantly advanced development of various neural interfaces [3,[7][8][9][10][11][12]. However, barriers remain in the transfer of neural interfaces from laboratory to useful clinical applications, especially when neural signals from real patients with neural implants are used as such data is limited and can be influenced by a variety of human factors due to pathological or individual effects.
One rare opportunity to monitor the neural activity of the human brain is to directly record local field potentials (LFPs) from patients following functional neurosurgery of DBS electrode implantation [13]. The most common target for DBS implantation is within the basal ganglia nuclei, specifically the subthalamic nucleus (STN) or globus pallidus interna (GPi) for treatment of movement disorders [14,15]. The LFP can be recorded through the implanted DBS electrodes. The function of the basal ganglia in motor control has been well studied through recordings of LFPs in patients with Parkinson's disease (PD) or dystonia when performing specific behavioural tasks [15][16][17][18][19][20]. The recordings are generally made within the week following electrode implantation, with the electrode leads externalized prior to connection to the neurostimulator. Such postoperative recordings are captured from patients during execution of purposely designed experimental paradigms [21].
Recently, there has been some work investigating the oscillations at ultra-high frequencies up to 500 Hz [15]. Such oscillatory activity not only shows functional connectivity with disparate cortical oscillations but also movement-related de-synchronization and synchronization in the STN and/or GPi during externally cued or self-paced movements [14,21,24]. This suggests that oscillations may be involved in motor preparation and control, and provides a means of communication of neuronal information, for example, through neural binding. Many previous studies have shown that both during and after externally cued or self-initiated movement, de-synchronization of the beta frequency band dominates in the STN and/or GPi [14,16]. Both contra-and ipsi-lateral gamma band synchronization have also been found in STN LFPs during wrist extensions [16]. However, during less well-defined real-world tasks involving complex behaviours such as turning or balance, these frequency specific synchronizations can be hidden beneath intrinsic or extrinsic neuronal noise, have high variability and/or encompass other brain areas [25]. Instead, Basal ganglia STN activity can also be modulated by imaginary movements. Motor imagery studies involve the subject imagining to perform a specified well-defined action or simply watching visual images of movements [26]. Such imagined movements lead to event-related synchronization (ERS) and de-synchronization (ERD) which have been found to have similar frequency and time course characteristics to that during actual voluntary movements [14,16,26]. Therefore, decoding algorithms appropriate for movement decoding are applicable to motor imagery induced ERS/ERD and will enable elucidation of motor intention without any physical movement occurring. This can potentially open up a communication and/or control pathway for individuals with little or no movement.
As stated above, basal ganglia frequency dependent oscillations are related to motor preparation, execution and decision [16,21,26]. Loukas et al have previously investigated the onset of voluntary hand-movement prediction using a neural network approach [21]. They evaluated features of the STN LFPs using three different spectral measures including the fast Fourier transform, continuous wavelet transform and the statistical properties of wavelet spectra. The wavelet transform features coupled with a linear vector quantization neural network achieved high accuracy (95% sensitivity and 77% specificity) for online movement (rest versus action) prediction of self-paced hand-movements prior to the event onset. However, decoding the laterality of movement (e.g. left versus right side) was not explored.
Human limb movements are controlled by the contralateral cerebral hemisphere [7]. Left and right hand movements are associated with different spatiotemporal patterns of movement related de-synchronization and resynchronization [27]. Reliably decoding movements from basal ganglia oscillatory neural activities for left and right hands will provide additional information for motor control and bilateral co-ordination [28]. Such basal ganglia movement onset information, specifically from the STN or GPi, incorporated with motor cortex information could potentially enhance movement decoding performance towards the development of robust BMI applications for neuroprothetics. Furthermore, real-time prediction and classification of movement and its laterality will open up several experimental and therapeutic possibilities for neural interface systems, as well as the treatment of diseases and investigation of underlying neural circuit mechanisms. A closed-loop intelligent stimulation strategy based on decoding of neural activity will not only improve the efficacy of the treatment, but also provide demand-driven stimulation according to symptoms, laterality, sleeping or drug intake [15].
The aim of the present study is to decode human voluntary movement information from basal ganglia LFPs. It addresses issues related to the limited amount of data and trails that are feasible from invasive recording of brain activity. By utilizing new feature measures based on causality related to inter-hemispheric connectivity and a weighted sequential feature selection (WSFS) method specifically designed for data sets with small amount of trials and high variability, improves decoding performance. Relevant neural synchronization and inter-hemispheric connectivity features were extracted from the LFPs by utilizing the WPT and Granger causality analysis, respectively. Feature selection and classification of LFPs based on support vector machine (SVM) or Gaussian naïve Bayes (GNB) classifiers sequentially recognize the occurrence of movement and further can discriminate the laterality of the movement. The findings of this study not only enhance our understanding on the underlying processes in the STN or GPi for voluntary movement control but also provide an approach and insight into the development of more advanced BMIs for neuroprosthetics and neuro-rehabilitation.

Patients and DBS electrode implantation
Twelve patients (age 49.6 ± 13.9 years, mean ±1 SD) with PD or dystonia (disease duration 14.8 ± 10.3 years) participated in the study. They underwent bilateral implantation of DBS electrodes in the STN or GPi. Table 1 summarizes their clinical details. The experiment was approved by the local research ethics committee and all participants gave informed consent.

DBS electrode implantation
The DBS electrodes (Model 3387, Medtronic Neurological Division, Minnesota, USA) were bilaterally implanted in the STN or GPi for the treatment of the disease (Parkinson's or dystonia). The electrode has four platinum-iridium cylindrical contacts (1.27 mm diameter and 1.5 mm length) and a contact-to-contact separation of 1.5 mm. The surgical procedure has been described in detail previously [29,30]. The exact position of the electrodes was chosen where a decrease in disease symptoms occurred during intra-operative electrical stimulation of the site and confirmed by examining the postoperative MRI scan or the fused images of pre-implantation MRI with post-implantation CT. All the patients included in this study had at least one pair of electrode contacts within or very close to the STN or GPi. LFPs were recorded via  Electrode pair  used for  analysis  Right  Left  Total   1  58F  10  PD  STN  L23/R12  41  52  93  2  53F  3  PD  STN  L12/R12  37  31  68  3  59M  7  PD  STN  L01/R01  84  71  155  4  60M  13  PD  STN  L12/R01  82  31  113  5  72F  21  PD  GPi  L01/R01  54  56  110  6  55M  10  PD  STN  L12/R01  31  25  56  7  36M  14  Dystonia  GPi  L12/R12  62  61  externalized leads from the STN or GPi during the week immediately following implantation before the pulse generator was implanted.

Cued movement experiment paradigm
LFPs from STN or GPi and surface EMGs were recorded together during a bilateral finger-pressing tasks. These experiments were carried out in a randomized manner with short resting periods between tasks and patients seated approximately 60 cm from a computer screen. Prior to each motor task, they were instructed to place their left and right index fingers on distinct keys on the left or right hand side of a standard keyboard, and to look at a 10 mm cross that was continuously displayed in the centre of the screen. A visual cue (the letter A, 8 mm in height and 7 mm in width) appeared on the screen for a period of 400 ms immediately to the left or right of the central cross, thus indicating which finger to move. The laterality and interval between cues was randomized. Subjects were instructed to press a key ipsilateral to the cue with the corresponding index finger as quickly as possible.

Signal recording
The recording of STN or GPi LFPs were made from the electrode leads which were externalized for 4-6 days postoperatively after the patients had been off medication overnight. Deep brain activity was recorded from the adjacent four contacts of each macroelecrode (0-1, 1-2, 2-3). LFP information between the available contact pairings of each electrode was highly correlated and therefore only one contact pair per electrode was utilized. The analysed contact pair is shown in table 1 and was selected based on postoperative imaging, using the contact pair placed within or very close to the STN or GPi, and which exhibited the highest beta power across the trials [21]. Signal segments containing premature, absent, or erroneous responses to visual cues were excluded. In addition to noting the timing of the key press as registration of the motor response, the onset of the motor response and other voluntary or involuntary movements were monitored using surface EMGs recorded from the index finger. Movements that were executed <1 s before or >5 s after the previous movement were excluded to ensure a limited range of inter-movement intervals and to specifically avoid rapid repetitive movements. Table 1 highlights the number of (right, left and total) trials for each subject used during the analysis. Signals were amplified using isolated CED 1902 amplifiers (×10 000 for LFPs and ×1000 for EMGs), filtered at 0.5-500 Hz and digitized using CED 1401 mark II at a rate of 2000 Hz, displayed online and saved onto a hard disk using a custom written program in Spike 2 (Cambridge Electronic Design, CED, Cambridge, UK).
3. Neural feature representation of movement and its laterality 3.1. Feature extraction 3.1.1. Frequency band synchronization. The WPT has received considerable attention in signal analysis and modelling, with diverse applications including signal detection, classification, compression, noise reduction and image processing [31,32]. The WPT method provides multiresolution analysis of a signal and has the ability to give improved time-frequency resolution over traditional methods, such as the short-term Fourier transform [33]. In particular, the WPT has the capability to localize specific frequency information especially in higher spectral bands as opposed to, for example. the discrete wavelet transform [34]. The WPT decomposes a non-stationary signal, using a binary tree structure, from the time domain to a time-scale representation by recursively using low-pass and high-pass analysing filters.
The efficacy of the WPT is dependent upon the wavelet basis or filter. One common approach to choosing the wavelet filter is to select one with minimum reconstruction error according to an entropy cost function [31,35,36]. Saito et al proposed an empirical algorithm to maximize the discriminant ability of the WPT by using a class separability cost function [37]. Using an empirical method for wavelet filter selection is often more appropriate for the purposes of feature extraction and classification, as opposed to signal denoizing or compression as it provides better matching between the signal of interest and the properties of the wavelet. In this study, the WPT was computed using the discrete Meyer wavelet as it broadly matches the oscillatory characteristics of LFP activities, is orthogonal and allows efficient localization of scale and temporal properties of neuro-electric signals [34]. It has been demonstrated that the discrete Meyer wavelet is capable of isolating frequency components of LFPs and EEG event related potentials [21,34]. Furthermore, STN LFP activities from PD patients have been previously effectively analysed using a discrete Meyer wavelet filter [21,38,39].

Movement laterality through inter-hemispheric
connectivity. Analysis of functional coupling in oscillatory neural activity has led not only to a better understanding of the inherent mechanisms of movement control but also obtaining discriminative information for decoding neural activity [41]. One widely used method of estimating the functional coupling between two oscillatory signals is coherence estimation [42]. The coherence value indicates the strength of coupling between two signals in the frequency domain but does not consider causality. Directional interaction analysis to reveal causal influence or coupling between neural signals is essential to uncover more specific information (e.g. laterality) associated with motor activity. Granger causality is widely used in neuroscience for analysing and identifying directional influences and connectivity between different brain areas and neural activity [42][43][44][45].
Let X t ( ) 1 and X t ( ) 2 denote the time series from two data channels. According to Granger causality, X 2 causes X 1 if the inclusion of past observations of X 2 reduces the prediction error of X 1 within a linear regression model of both X 1 and X , 2 compared to a model which includes only previous observations of X . 1 Similarly for X 1 causing X . 2 To illustrate Granger causality, the temporal dynamics of X t ( ) 1 and X (t) 2 with length T can be described by the following univariate autoregressive models.
Incorporating both X t ( ) 1 and X (t) 2 together in a bivariate autoregressive model yields the following equations.  where p is the model order (with ≪ p T), A ij is a coefficient of the model, and = e t k ( ), 1, 2, 3, 4 k are the prediction errors with variance e var( ) i for each time series. If X 2 Granger causes X 1 then < e e var( ) var( ) 3 1 i.e. the noise term is reduced due to the inclusion of X 2 in the linear regression model. Similarly for X 1 to Granger cause X . 2 Assuming that X 1 and X 2 are wide sense stationary, the magnitude of this interaction can be quantified by taking the log ratio of the variance of prediction errors.
there is a causal influence from X 2 to X 1 and similarly for To determine the appropriate model order the Akaike information criterion (AIC) can be used and helps prevent under-representation or overfitting of the model [41,46]. For n variables the AIC is defined as 2 where Σ is the estimation of the prediction error covariance matrix of the bivariate autoregressive model, p is the model order, and T is the length of the signal.
can be estimated using, for example, a Burg-type algorithm on multiple realizations of a common stochastic process (i.e. a single neural data set with similar temporal dynamics) [47]. Inter-hemispheric causality (or connectivity) features can then be estimated for corresponding left (X ) 1 and right (X ) 2 LFP signals by solving equations (1)-(4) for e t ( ), 3 and e t ( ), 4 and inputting these values into equations (5) and (6).

Selecting appropriate feature subsets
Although the described feature extraction methods provide a suitable set of features for neural decoding, it is useful to select a parsimonious subset of these features which can maximize discriminability and reduce dimensionality. Using a feature selection method will provide a set of the most informative features by mitigating redundancy and maximizing relevance, and further indicates individual feature contributions towards a given decoding task.
3.2.1. Sequential feature selection (SFS). The SFS approach is a univariate search algorithm and is a popular feature selection technique due to its simplicity and efficient computation [48]. This is especially important when used in conjunction with a computationally heavy wrapper method and/or the initial feature set is large. Starting from an initially empty set of features Z , 0 at each forward step l, feature is the next available feature from the set of ranked and ordered features [48]. Using this process all ranked features can be incorporated into the feature set Z l until = Z Z l M and, at each l, = … l M 1, 2, , the feature set is evaluated and classification accuracy (CA) recorded. Finally the optimal subset of features is selected at + l , where the CA reaches its maximum. The CA can be found using a variety of different classifiers (the wrapper function), for example, a SVM or GNB classification. Although, simple and efficient to compute the SFS approach does not take into account feature dependency. Therefore, it is unable to remove features that become obsolete after the addition of other features, thus increasing feature redundancy.
3.2.2. Weighted sequential feature selection. To counter the aforementioned drawbacks associated with the SFS algorithm a new method, the WSFS, is proposed in this paper. It selects the most relevant features and effectively reduces the number of redundant features by utilizing the SFS approach in combination with computing a contribution factor for each feature by using a resampling strategy during the evaluation process. Such an approach is expected to significantly improve the performance during decoding by increasing the searched feature space and reducing feature redundancy. Furthermore, it has similar computational overheads to that of associated greedy feature selection algorithms and potentially less than other ranking approaches [49] depending on dimensionality.
Initially, all features are ranked according to their individual classification accuracies. During the subsequent selection process, the SFS approach is evaluated repeatedly for Q times using randomized cross-validation. At each repetition, a binary record is kept to identify which feature have been selected (1) and which have not (0). The repetition value Q can be obtained through repeated k-fold cross-validation for q times for a given training data set, so that = × Q q k. The randomized cross-validation implies that − N k k ( 1) / random training and N k / random validation instances will be selected at each repetition with each instance used exactly q times for validation (where N is the total number of training instances). Due to different (but overlapping) training and validation subsets used at each repetition, some features will be consistently contributing across the repetitions while others will only contribute partially or potentially not at all. Based on this insight, a binary contribution matrix ( × Q M) is obtained for M features across Q repetitions. The total contribution Σ ( ) of each individual feature is then obtained by summing across all the binary values for each feature. To select the most robust subset of features for classification, regardless of specific training instances, all the features are grouped into Q overlapping subsets based on the total contribution of each feature, (A being a set of features) and ∈ Σ j is generated by assigning all the features l with Σ ⩾ j. l Therefore, the subset f j contains all the features who contributed at least j times during the evaluation of the Q repeated SFS subsets. After generation of Q feature subsets, each subset is evaluated using a classifier function and the subset with the highest CA is selected as the optimal feature subset for decoding.

Decoding of LFPs
3.3.1. Pre-processing and feature extraction of LFPs. The deep brain LFP recordings were selected by excluding the segments contaminated with unintended movements by analysing the associated surface EMGs. Trials contaminated with artifacts, for which the behavioral response was incorrect, were also removed during visual inspection. The LFPs were filtered with a low-pass Chebyshev Type I filter with 90 Hz cut-off and a bandstop filter centered at 50 Hz to remove the power line noise using a custom adaptive filter [19]. Figure 1 shows the recorded LFPs and associated spectrograms from both STN and GPi during externally-cued left and right clicking. The stimulus timings (dotted lines) and subsequent motor response (solid lines) are highlighted. Figure 2 highlights the decoding procedure (pre-processing, feature extraction, feature selection, and classification) of deep brain LFPs for both movement and subsequent laterality.
Prior to feature extraction, the filtered LFPs were resampled at 256 Hz and the frequency dependent components were extracted as δ = 0-4 Hz, θ = 4-8 Hz, α = 8-12 Hz, low β = 12-20 Hz, high β = 20-32 Hz, low γ = 32-60 Hz and high γ = 60-90 Hz using the WPT with a discrete Meyer wavelet at decomposition scale 5. In each component, the left/right events were segmented as 2 s preceding and following each motor response registration. Similarly, the resting activity was segmented as 2 s preceding and following each stimulus registration. For resting activity, it was assumed that during the stimulus timing, the subject was focused on the visual cue and did not perform any voluntary movement as monitored from the EMG recordings. Although this assumed eye fixation was not validated, for example, using gaze tracking, the patients were instructed to concentrate solely on the cross during rest periods. Segmented signal features were extracted based on neural synchronization and inter-hemispheric connectivity from each frequency band. To compute the neural synchronization features, the envelope of each component was computed using the HT. Figure 3 shows the amplitude modulation of each component during left and right clicking events over all trials for one subject. It can be seen that there is an amplitude decrease in the β band and increases in the δ, θ, α and γ bands, with the increase most visible in the δ band. Figure 4 highlights the average ERD and ERS in these bands based on the envelope of instantaneous power and are defined as the average amplitude within each of five consecutive 100 ms windows for each frequency band. The five windows for the resting condition ran from −750 to −250 ms before the stimulus and the five windows for the movement (clicking) events ran from −150 to 350 ms around the motor response (see figure 4). The time windows are mainly acausal as the movements are externally-cued with the patients instructed to press the corresponding button as quickly as possible given a single 'go' cue. Therefore, ERD/ ERS changes will begin, at the earliest, after this trigger at approximately the same time as the movement onset [50]. In total thirty-five features (each consisting of two values, one for left STN/GPi LFPs and the other for right STN/GPi LFPs) were obtained for movement classification.
It is assumed that synchronization of neural activity between different regions of the brain relates to specific movement states. Therefore, the investigation of changes in the causal relationships (connectivity) between neural signals recorded from the left and right STN or GPi can provide more discriminative information to decode movement laterality. Therefore, the causal strength between LFPs of left and right STN/GPi was evaluated by analysing Granger causality between the two hemispheres and denoted as the interhemispheric connectivity or neural causality features for decoding left and right clicking events. During causal analysis, the features were defined by computing contralateral and ipsilateral causal strength for each frequency band and movement event. The causal strengths for left movements were computed from the right → left (contralateral) and left → right (ipsilateral) hemispheres. For right movements, the causal strengths were computed left → right (contralateral) and right → left (ipsilateral). Analysis of Granger causality for each event was performed using the segmented LFPs and by varying a single analysis window. Two different window sizes were compared, ±0.5 s and ±1 s around each motor response registration. For each window size, the MVAR model was estimated from the windowed data and the optimal model order p for the MVAR model was identified by locating the minimum of the AIC. Model orders of 5, 8, 10, 12, 15, 20, 25 and 30 were compared, with a final model order of 10 estimated from the single ±0.5 s window selected as a trade-off as it provided sufficient discriminability without over-parameterization.
The subject-specific LFP signals were treated as realizations of a common stochastic process implying that a single underlying signal model could be estimated for each frequency band. Therefore, the 10th order MVAR model = … A j A j A j A j j ( ( ), ( ), ( ), ( ), 1, , 10) 11 12 21 22 were calculated using all (left and right) trials simultaneously for each frequency band using a single ±0.5 s window centred at the response onset (at t = 0). Figure 5 shows the causal strengths between the left and right STN hemispheres for one subject, for both left (a) and right (b) clicking events and in each frequency band. To demonstrate how the causal strength changes over time and across trials the causal strength is also shown for individual trials. The causality has been recalculated at each consecutive time step using equations (1)-(6), the MVAR models computed previously and LFP data segmented to a single ±0.5 s window centred at the specific time step ( figure 5 left-side).
For the purposes of decoding, fourteen inter-hemispheric connectivity features were obtained for classification by evaluating contralateral and ipsilateral causal strengths in each of the seven frequency bands for left (a) and right (b) clicking events ( figure 5 right-side). The causality strengths were calculated in a single ±0.5 s window centred on the response onset (i.e. at t = 0) using the MVAR model calculated previously. The Granger causality analysis for extracting inter-hemispheric connectivity features was performed using the GCCA MATLAB toolbox [46]. 3.3.2. Feature selection and classification of LFPs. During feature extraction, thirty-five synchronization and fourteen causality features were obtained for decoding movement and it's laterality from finger clicking events. As interhemispheric causality features primarily relate to laterality, only synchronization features were used to decode movement (clicking) from rest. During laterality analysis, different initial feature pools were tested using (1) synchronization features only and (2) synchronization and causality features together, to determine if inter-hemispheric causality information could improve laterality decoding. Both movement and laterality decoding were performed through selection of optimal feature subsets followed by classification using either a GNB or SVM classifier. The classifiers were trained, validated and evaluated using multiple k-fold cross-validation procedures [48]. An external one hundred repeated 10-fold cross-validation procedure was used to divide the LFP data sets for each subject into independent training and test sets. Two further internal and independent 10-fold cross-validation procedures were employed for (1) feature selection and (2) selecting the hyper-parameters of the SVM classifier, using data only from the training set.
During feature selection, the features are first ranked and reordered individually followed by feature subset evaluation based upon either the SFS or WSFS methodology. Evaluation of both individual and feature subsets was determined using a wrapper function. A wrapper function evaluates the predictive accuracy, based upon the selected classifier (GNB or SVM) in use. In this work, an internal five repeated 10-fold cross-validation (i.e. = × Q 5 10) procedure was employed for selecting the optimal feature subset from the training data. Based upon the selected feature subsets, GNB and SVM classifiers were trained. The final decoding performance was then evaluated in terms of average accuracy, sensitivity and specificity across all one thousand test set folds. The performance between the feature selection, classifier and initial feature pool strategies were compared for decoding both voluntary from no movement and movement laterality. Each of the subjects was evaluated separately for movement and laterality decoding, and performance across all subjects was averaged and presented as mean ±1 SD. Final comparison was

Results
In this study the decoding of voluntary movement activities was divided into two stages, i.e. classification between resting and clicking movements, and then laterality classification between left and right hand clicking. Deep brain LFPs recorded from twelve subjects (six PD and six dystonia subjects) were used for testing various decoding strategies. Figure 6 highlights example results related to the various stages of both the SFS and WSFS feature selection strategies in terms of movement laterality decoding for a typical subject. The average GNB CA of each individual synchronization feature ranges from 31.2% to 65.0% across all thirty-five features ( figure 6(a)). The performance of the SFS achieved a maximum 78.5% accuracy using the twelve top ranked features ( figure 6(b)). These included δ, θ, α, β and Hγ band features, during and immediately after the clicking event occurred and are shown as a binary matrix in figure 6(c) (with selected features shown in black).

Feature selection
For the WSFS strategy, the ranked features were repeatedly evaluated fifty-times using SFS ( = Q 50) and the contribution of each feature was measured as the number of times the feature was selected and is defined as the feature weight. Figure 6(d) shows the feature weight (contribution) for each feature with the features ordered from {δ, W1}, {θ, W1}, …, {Hγ, W5}. Based on the total feature weight, Q subsets of features were generated and each subset was evaluated to find the most consistent subset for providing better predictive accuracy (figure 6(e)). In this case, subset 29 containing only four features provided the highest accuracy (84.8%) including features from Hγ band during (W2: −50 to 50 ms) and δ, θ and Hγ bands after (W5: 250-350 ms) the clicking onset (figure 6(f)). The WSFS selected a smaller number of features than the SFS and provided better accuracy during training. Figure 7 presents the movement (clicking versus rest) decoding performance (accuracy, sensitivity (clicking) and  The causal strength for each event was computed within a 1 s window (−500 to +500 ms) at sequential time steps (left plots) or at the response onset (right plots). Therefore, the fourteen inter-hemispheric causality features are defined as seven contralateral (red) and seven ipsilateral (black), and are centred on the response onset. The y-axis represents both ipsilateral (lower) and contralateral (higher) trials with the left plots showing individual trials while the right plots highlight the average over all subject-specific trials. specificity (rest)) for all the subjects and their average. The optimal subset of features obtained through SFS and WSFS, with both GNB and SVM classifiers, achieved high decoding performance with average accuracies, sensitivities and specificities all greater than 98%. The SVM with WSFS achieved the best performance with less intra-subject variability. Overall, the WSFS outperformed the SFS method with no notable difference between the STN and GPi recordings.

Movement decoding
It should be noted that during the movement decoding process only 15 ± 12.7 features on average across all subjects were selected out of 35 features for both feature selection strategies using the GNB classifier. However, the SVM classifier selected less features, only 10 ± 12 and provided slightly better or similar performance. Both feature selection strategies selected a similar number of features and performed coherently.
Most of the features for movement decoding were selected from α, β and γ bands during and immediately after  clicking. Features from the Hγ band were consistently selected in nearly all cases. Lβ and Lγ bands were selected in some cases during and immediately after the event onset. Figure 8 shows the overall influence of the features during movement decoding.

Movement laterality decoding based upon synchronization features only
After movement decoding, movement-related events were passed to the laterality decoder. Figure 9 presents the laterality (left versus right) decoding performance in terms of accuracy, sensitivity (left) and specificity (right), for all subjects and their average, based upon synchronization features only. The SVM with WSFS achieved the highest performance overall with an average accuracy, sensitivity and specificity of 79.8 ± 2.8%, 75.5 ± 4.3% and 82.0 ± 4.3% respectively. The lowest performance was obtained by GNB, with SFS achieving an average accuracy, sensitivity and specificity of 65.8 ± 3.5%, 62.5 ± 5.5% and 69.5 ± 5.0% respectively. The SVM classifier performed significantly better than the GNB classifier (p = 0.003) while the WSFS provided significantly higher performance than the SFS method (p < 0.001). On average 30 ± 4.8 (GNB) and 24 ± 6.4 (SVM) features were selected out of a possible 35 using the SFS strategy. In contrast, the WSFS selected much fewer features with an average of 13 ± 9.6 (GNB) and 13 ± 9.1 (SVM), respectively.
It was observed that most of the features were consistently selected from θ (W3), α (W4), Lβ (W2, W3), Hβ (W2, W5) and γ (W5) bands during and immediately after the movement onset. Overall features from θ, β and Hγ bands during and immediately after event onset across all subjects were most influential in the decoding process. Figure 10 shows the overall influence of the synchronization features for laterality decoding.

Movement laterality decoding based upon synchronization and causality features
When both synchronization and causality features were combined and incorporated into the laterality decoding process, the classification performance was overall improved. Figure 11 presents the laterality (left versus right) decoding performance in terms of accuracy, sensitivity (left) and specificity (right), for all subjects and their average, based upon both feature sets. SVM with WSFS achieved the highest performance overall with an average accuracy, sensitivity and specificity of 81.5 ± 2.6%, 77.5 ± 4.4%, and 83.3 ± 4.2% respectively. The GNB with WSFS also had improved decoding performance and achieved more than 75% recognition accuracy, sensitivity and specificity. The SVM classifier performed significantly better than the GNB classifier (p = 0.001) while the WSFS provided significantly higher performance than the SFS method (p < 0.001). On average 40 ± 6.9 (GNB) and 36 ± 7.3 (SVM) features were selected out of a possible 49 using the SFS strategy. In contrast, the WSFS selected much fewer features with an average of 17 ± 9.1 (GNB) and 15 ± 8.1 (SVM), respectively.
In the selected feature subset, causality features contributed more dominantly and consistently than the synchronization features. Figure 12 highlights the overall influence of the synchronization and causality features for laterality decoding. The causality {contralateral: δ, θ, α, Lβ and ipsilateral: δ, θ, α} and synchronization {W2: β bands just before and during onset and W5: Hβ and Hγ bands immediately following onset} features were selected across all subjects. Beside this, δ and Lγ band features after onset also contributed occasionally. Overall, the contralateral θ, α and Lβ features associated with causality and β band features associated with neural synchronization were most influential for movement laterality decoding.

Discussion
DBS offers a unique opportunity to sense and control the human brain circuits. The function of the basal ganglia and thalamus in motor control has been studied using invasive LFP recordings from DBS stimulation macro-electrodes in PD patients. In this study, LFPs from the Basal Ganglia (STN or GPi) were investigated to decode voluntary cued movement and its laterality. Two types of features, namely (1) neural synchronization features, and (2) inter-hemispheric causality features, were extracted from the LFPs. Using efficient feature selection (i.e. SFS and WSFS) and parameter optimization during training, both GNB and SVM classifiers achieved high decoding accuracy to recognize movements (clicking versus rest) and laterality (left versus right). Movement decoding based on synchronization performed accurately while the movement laterality decoder although demonstrating increased error presents a significantly more challenging classification problem. The performance of the laterality decoding was improved by combining both the synchronization and causality features across all classification strategies. SVM classification outperformed GNB classification in all cases while the WSFS performed significantly better than the SFS. Overall, the WSFS with SVM classifier achieved the highest performance for decoding movement (99.8%), and in-conjunction with both synchronization and causality features achieved the highest performance for decoding movement laterality (81.5%).

LFP decoding
During feature extraction, the WPT in conjunction with the HT was used to analyse band-limited neural synchronization from movement-related LFPs [39]. This information highlights de-synchronization and re-synchronization in local neural populations following movement events. For decoding purposes, these local synchronization features were computed by averaging the amplitude in defined temporal windows with the optimal window evaluated by varying its length (50 ms or 100 ms) and position (between ±500 ms). To improve laterality (left versus right movements) decoding performance, inter-hemispheric causality features were also included. These features were computed using bivariate Granger causality between the left and right-side LFP signals. This considers the subcortical nuclei a 2-element network with the extracted information indicative of long range functional connectivity (neural binding) between these lateral brain areas. Overall, it was found that contralateral were more dominant than ipsilateral causality features and strongly contributed to the classification during laterality decoding ( figure 12).
Evaluation of other measures, such as local causality (e.g. between electrode channels), cross-modality (cross frequency band, phase, power-phase) synchronization and spectral Granger causality may enhance synchronization and connectivity information thus providing more discriminative features [51]. In the current work, a fixed MVAR model order was implemented across all subjects. However, dynamic subject-specific selection of the model order based on, for example, the AIC criterion could further enhance the decoding performance [52,53]. This would be at the expense of increased computation requirements during training.
To select the optimal subset of features for decoding, a wrapper approach was employed based on a SFS strategy. A wrapper methodology uses the predictive accuracy estimated from the training set to score the individual or subset of features. Compared to filter (or embedded) methods this generally provides a better feature set for classification of a particular decoding task but can be computationally heavy. A SFS strategy was employed to reduce computational load as it significantly reduces the feature subset search space by making naïve and univariate feature assumptions. It should be noted that searching all combinations of features or utilizing stochastic or even greedy feature selection algorithms, can quickly become intractable as the number of features increases.
To reduce redundancy in the chosen SFS feature subset, a new WSFS strategy has been developed. It is based on estimating the effective contribution of each feature during repeated evaluation of the SFS, relying on random resampling to achieve diversity across the feature subsets. The WSFS performs significantly better as it selects the most consistent features during both movement and laterality decoding (figures 7, 9 and 11). As the algorithmic parameter  Q (controlling the number of SFS repetitions) is increased, confidence in the feature contribution estimate is strengthened and the number of feature subsets evaluated is increased at the expense of increased computational load. Therefore, the WSFS strategy is useful for selecting the optimal subset of features from high dimensional correlated data sets as the computational intensity can be directly controlled and the feature redundancy reduced. Future work, will look to expand upon and validate the WSFS strategy in a broader perspective and compare its performance with other state of the art feature selection strategies, such as sequential floating forward or backward selection, minimum-redundancy-maximum-relevance feature selection and regularized trees.
Both the classifiers, GNB and SVM performed very well in movement and laterality decoding. The SVM outperformed the GNB classifier. One recent study on PD tremor classification showed that a SVM was able to perform better than a neural network classifier [54]. However, the SVM requires the tuning of hyper-parameters which can lead to model overfitting and is therefore sensitive to the size of the training set [31,52,55]. Conversely, a GNB classifier is simple, can generalize well with smaller training set sizes and is efficient for real-time applications [31,53]. According to the trade-off between decoding performance and complexity, the SVM performed better during offline evaluation, however to design an online decoder a GNB classifier may be preferred due to its simplicity and capacity to generalize better.

LFP feature influences
It was shown that the δ band amplitude was high during the event onset compared to the α, θ and γ bands. Nonetheless the contribution of δ band features within the synchronization feature domain was not as strong as other frequency band features for decoding of movement and laterality across all subjects. However, the contralateral δ band neural synchronization feature did contribute to the laterality decoding ( figure 12). For movement decoding, the Hγ feature was dominant with features from α, Lβ and Lγ during onset (−150 ms to +150 ms) also selected in most cases (figure 8). During laterality decoding, based on both synchronization and causality features, the most consistent were the causalitybased features including contralateral θ, α and Lβ band features ( figure 12). Ipsilateral θ, α and Lβ also contributed in some subjects. Most of the synchroniation features were selected from θ, β, and γ bands with strong contribution from Hβ during and immediately after event onset. The overall influence of these frequency band features broadly reflect the previous finding of movement-related frequency band synchronization within the basal ganglia [21]. In particular, ERD of the β-band within the STN has been shown to occur with the onset of voluntary movements [14] while ERS of the γband has been shown to occur contralaterally and in a prokinetic fashion [56]. However, there is also evidence that both contralateral and ipsilateral movement-related activity in the STN has, for the most part, a bilateral representation [16].

LFP decoding performance and limitations
In this study, high decoding performance was achieved across all twelve subjects during movement classification with sensitivity and specificity greater than 99%. This is a significant improvement on similar work performed by Loukas and Brown who achieved 95% sensitivity and 77% specificity for predicting self-paced hand-movements from rest states [21]. Moreover, their results are only from three subjects  (compared to twelve subjects in this study) with an average disease duration of 10.7 ± 2.1 years (compared with 14.8 ± 10.3 years in this study). It should be noted that although Loukas and Brown were able to predict movements (⩾1 s) in advance, these movements were much larger unilateral self-paced wrist and hand motions with the movement onset based on the position of a joystick. In this paper, subjects were instructed to press a key, as quickly as they could, with either their left or right index finger following the corresponding cue. This meant that no warning cue was given with the average time between the cue and the exact movement onset being 0.42 ± 0.11 s. Therefore, there was no possibility of ERD/ERS activity occurring before this time. The additional decision process of choosing the laterality of the movement would further cause a delay in the movementrelated neural activity.
The techniques that have been presented in this paper are easily extendible to the analysis of self-paced (and imagined movements) where there is the possibility of decoding the activity much earlier in time, for example, prior to the movement onset (as shown previously by Loukas and Brown) [21]. This would require the analysis windows (i.e. the five WPT synchronization windows: −150 ms to +350 ms, and Granger causality analysis window: −500 ms to +500 ms) to be shifted, based on the information content of the LFPs, so that they are now causal with respect to the movement. The exact size, number and timing of these analysis windows could be optimized using a stochastic search algorithm (e.g. a Genetic algorithm) enabling these parameters to be automatically selected assuming sufficient data was available.
In this study, compared to the movement classification, the performance of the subsequent laterality decoding degraded to around 81.5%. Several issues may account for this deterioration: (1) laterality presents a much more challenging decoding problem than simply classifying movement versus rest, (2) an unbalanced number of trials between the class data sets and (3) higher intra-class variability compared to inter-class differences. Other issues such as the subject's ability to concentrate and perform the actions according to the specific cue, subject's disease severity, lack of medication and ability to perform actions would also affect the decoding performance. It was decided that the patients should be offmedication during signal recording as this can have a profound effect on frequency specific synchronization especially in the β and γ bands [50]. For most of the subjects, the number of trials (both events {mean ±1 SD: 115.8 ± 43.6, min: 56, max: 202}, right event {mean ±1 SD: 58.7 ± 21.8, min: 28, max: 89} and left event {mean ±1 SD: 57.2 ± 25.3, min: 25, max: 113}) was less than the number of features (total initial dimensionality: 84). To estimate the classification model robustly a large sample size compared to feature dimensionality is desirable [52,57]. The WSFS feature selection strategy was designed to overcome the issue of high dimensionality, small sample size data sets and overall improved the decoding performance. The effective solution of many of these drawbacks are still open problems in the machine learning literature and further investigation is needed to minimize their effects in similar application domains [58,59].
The subjects were instructed to fixate on a central cross, however without Gaze tracking, the possibility that some of the ERS/ERD is from oculomotor activity cannot be fully eliminated. This would potentially enhance the movementrelated information assuming it has a relatively similar time course as the activity solely caused by the finger movements alone (i.e. any eye movement occurring would still occur after the cue and only slightly ahead of the finger movement). As the focus of this work is the decoding of movement and its laterality, as opposed to differentiating movement specifics, this would have no consequence on the movement source as long as the laterality of the movement is preserved (assuming the appropriate ERS/ERD response is still elicited). Conversely, if the eye movements are not correlated with the left/ right cue then it would create signals interfering with LFP activity, thus making the decoding task more challenging During laterality decoding, in most subjects, the detection of right movements (average specificity, 83.3 ± 4.2%) was higher than left movements (average sensitivity, 77.5 ± 4.4%). All participants in these experiments were right-handed and therefore this could be a major contributing factor for obtaining higher specificity than sensitivity. It has been highlighted in literature that a well-trained activity response is more stable and detectable than a less trained one [31]. Information regarding the patients' disease conditions such as disease severity and handedness are not considered, and as such correlational analysis of these factors with the decoding performance was considered beyond the scope of this work. However, based on the available demographic data (table 1), it was observed that patients with longer disease duration (i.e. more than 25 years) generally achieved lower decoding accuracy, and may reflect the influence of disease progression on decoding performance. In some cases, patients with an older age (greater than 53 years) also achieved lower decoding accuracy. According to disease types, patients with PD achieved much higher laterality decoding accuracy (87.6 ± 10.5%) on average compared to patients with dystonia (76.8 ± 14.3%). This can probably be attributed to the fact that the dystonic patients (18.8 ± 12.5 years) had a longer disease duration than the PD patients (10.7 ± 6.1 years). It should also be noted that dystonic episodes are worsened during the initiation of voluntary movement and would be further compounded by the patients' abstinence from their medications.

Potential future work
Recent progress in LFP-based BMIs is improving current DBS systems [4,5,[12][13][14]. By integrating decoding methods within DBS-driven neural interfaces, it should help to develop advanced BMI systems and will be useful not only for PD patients but for other pathologies also. For example, spinal cord injury patients could use such a system to control their limbs, prostheses, or even a brain-machine-brain interface [15,60]. In this study, LFPs were generally recorded as a single session (although some subjects required two-sessions of recording). To evaluate the stability in decoding basal ganglia LFPs for motor activities, analysis over multiple sessions should be performed by utilizing training and test data from different sessions [61]. Investigation of the decoding performance in other conditions such as self-paced and imagined movements from basal ganglia LFPs will also advance our understanding and facilitate its application towards development of an assistive neural interface system.
Potentially, combining information obtained from LFPs of the basal ganglia and cortex using the methods described here may advance the field significantly. To date, effective bidirectional BMIs for DBS have recorded action potentials or ECoG from the motor cortex, as they are highly informative about detailed movements [62][63][64][65][66]. However, these require dedicated electrodes to be placed into the cortex resulting in additional surgeries with associated health risks and costs, and as such have generally been limited to studies in primates. In this work, the DBS leads were already present in the human subjects and therefore the subjects do not require any additional surgical procedures. One idea to obtain cortical control signals whilst avoiding additional operations could be to position a small ECoG strip over the motor (or premotor) cortex through the same burr hole as the DBS electrode.

Summary and conclusions
We have introduced a new approach to decoding movement related deep brain LFPs in this study. The results not only enhance our understanding of human brain circuits related to motor operation but also facilitate development of advanced human-machine interfaces. The integration of neural synchronization and inter-hemispheric causality measures, relating to information flow and brain connectivity, provide useful discriminatory information for developing more reliable BMIs. The introduced WSFS method is particularly useful for robust feature selection, as training data available is often high in dimensionality and limited in size. This addresses a very significant gap with regard to the 'curse of dimensionality' in having enough trials and data samples from subjects with surgically implanted electrodes. Specific contributions of this work include: (1) A new method, weighted feature sequential selection (WSFS), for selecting an optimal feature subset when there is limited data with high initial dimensionality. This method highlights the contribution of each feature, further elucidating the underlying nature of the data. The computational overheads can be tuned based on desired performance, enabling computational expense and optimality to be balanced. (2) Implementation of this method for the first successful identification of movement laterality, as decoded from deep brain LFPs, recorded from clinically implanted macro-electrodes in the basal ganglia.
(3) The introduction of Granger causality features to extract inter-hemispheric connectivity related features for improved accuracy in movement laterality decoding. (4) Improvement in overall classificatio.n performance of movement (from rest) when compared with similar studies. (5) A comprehensive set of testing results of the decoder using data from twelve PD and dystonic patients.
Future work will focus on advancing implanted neural interfaces and correlation of movement to brain activity to introduce new therapies for neural dysfunction.