Low-Rank Linear Dynamical Systems for Motor Imagery EEG

The common spatial pattern (CSP) and other spatiospectral feature extraction methods have become the most effective and successful approaches to solve the problem of motor imagery electroencephalography (MI-EEG) pattern recognition from multichannel neural activity in recent years. However, these methods need a lot of preprocessing and postprocessing such as filtering, demean, and spatiospectral feature fusion, which influence the classification accuracy easily. In this paper, we utilize linear dynamical systems (LDSs) for EEG signals feature extraction and classification. LDSs model has lots of advantages such as simultaneous spatial and temporal feature matrix generation, free of preprocessing or postprocessing, and low cost. Furthermore, a low-rank matrix decomposition approach is introduced to get rid of noise and resting state component in order to improve the robustness of the system. Then, we propose a low-rank LDSs algorithm to decompose feature subspace of LDSs on finite Grassmannian and obtain a better performance. Extensive experiments are carried out on public dataset from “BCI Competition III Dataset IVa” and “BCI Competition IV Database 2a.” The results show that our proposed three methods yield higher accuracies compared with prevailing approaches such as CSP and CSSP.


Introduction
With the development of the simpler brain rhythm sampling technique and powerful low-cost computer equipment over the past two decades, a noninvasive brain-computer interface (BCI) called electroencephalography (EEG) has attracted more and more attention than other BCIs such as magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), and near infrared spectroscopy (NIRS). Among various EEG signals, certain neurophysiological patterns can be recognized to determine the user's intentions such as visual evoked potentials (VEPs), P300 evoked potentials, slow cortical potentials (SCPs), and sensorimotor rhythms. EEG brings hope to patients with amyotrophic lateral sclerosis, brainstem stroke, and spinal cord injury [1]. Motor imagery (MI), which is known as the mental rehearsal of a motor act without real body movement execution, represents a new approach to access the motor system for rehabilitation at all stages of the stroke recovery. People with severe motor disabilities can use EEG-BCI to realize the communication and control and even to restore their motor disabilities [2,3]. Therefore, an increasing number of researchers are working on MI-BCI for stroke patient rehabilitation [4,5].
MI-BCI concentrates on sensorimotor -or -rhythms that has the phenomenon known as event-related synchronization (ERS) or event-related desynchronization (ERD). However, the MI pattern recognition is still a challenge due to the low signal-to-noise ratio, highly subject-specific data, and low processing speed. For these reasons, more and more digital signal processing (DSP) methods and machine learning algorithms are applied to the MI-BCI analysis. Unlike static signals such as images and semantics, the EEG signals are dynamic that lie in a spatiotemporal feature space. Thus a large variety of feature extraction algorithms are proposed, including power spectral density (PSD) values [6,7], autoregressive (AR) parameters [8,9], and time-frequency features [10]. For MI-BCI pattern recognition, there are mainly three types of methods: autoregressive components (AR) [11], wavelet transform (WT) [12,13], and CSP [14,15]. Because of effectiveness and simplicity in extracting 2 Computational Intelligence and Neuroscience spatial features, CSP becomes one of the most popular and successful solutions for MI-BCI analysis according to the winners' methods analysis of "BCI Competition III Dataset IVa" [16,17] and "BCI Competition IV Database 2a" [18,19]. Therefore, many researchers devote theirselves to improving the original CSP method for better performances, such as common spatiospectral pattern (CSSP) [20], common sparse spectral spatial pattern (CSSSP) [21], subband common spatial pattern (SBCSP) [22], filter bank common spatial pattern (FBCSP) [23], wavelet common spatial pattern (WCSP) [24], and separable common spatiospectral patterns (SCSSP) [25]. Most of these improved CSP methods fuse spectral and spatial characteristics in the spatiospectral feature space and finally achieve success by comparison experiments.
Despite its effectiveness in extracting features of MI-BCI, CSP needs a lot of preprocessing and postprocessing such as filtering, demean, and spatiospectral feature fusion, which influence the classification accuracy easily. In this paper, we utilize linear dynamical systems (LDSs) for processing EEG signals in MI-BCI. Although LDSs succeed in the field of control, to the best of our knowledge, this model has barely been tried in the feature extraction of EEG analysis so far. Compared with CSP method, LDSs have the following advantages: first, LDS can simultaneously generate spatiospectral dual-feature matrix; second, there is no need to preprocess or postprocess signals, and the raw data can be directly fed into the model; third, it is easy to use and of low cost; last, the extracted features from the LDS are much more effective for classification.
Furthermore, we apply low-rank matrix decomposition approaches [26][27][28] that have the ability to learn representational matrix even in presence of corrupted data. The noise of the data can be get rid of, hence to improve the robustness. However, there are two ways for the EEG lowrank decomposition. One aims at the EEG raw data; the other aims at features extracted from LDSs, which is proposed by us and called low-rank LDSs (LR-LDSs).
This paper mainly has the following contributions. (1) We utilize LDSs for MI-EEG feature extraction to solve the MI pattern recognition problem. (2) Low-rank matrix decomposition method is applied to improve the robustness for the raw data analysis. (3) We propose LR-LDSs on finite Grassmannian feature space. (4) Plenty of comparison experiments demonstrate the effectiveness of these approaches.
The rest of this paper is organized as follows. Section 2 provides LDSs model to realize the feature extraction of EEG signals. Section 3 presents low-rank matrix decomposition method for the EEG raw data analysis. Section 4 introduces LR-LDSs method. Then, the proper classification algorithm is explained in Section 5. Section 6 compares the three proposed methods (LDSs, LR+CSP, and LR-LDSs) with other state-of-the-art algorithms in different databases. Finally, the summary and conclusion are presented in Section 7.

LDSs Modeling
LDSs, also known as linear Gaussian state-space models, have been used successfully in modeling and controlling dynamical systems. In recent few years, more and more problems extending to computer vision [29,30], speech recognition [31], and tactile perception [32] have been solved by LDSs model. EEG signals are sequences of brain electron sampling that have typical dynamic textures. We present the features of EEG dynamic textures by LDSs modeling and apply machine learning (ML) algorithms to capture the essence of dynamic textures for feature extraction and classification.
Let { ( )} =1,..., , ( ) ∈ be a sequence of EEG signal sample at each instant of time . If there is a set of spatial filters is an independent and identically distributed sequence drawn from a known distribution resulting in a positive measured sequence. We redefine the hidden state of ( ) to and consider a linear dynamic system as an autoregressive moving average process without firm input distribution: with V( ), ( ( )) distribution unknown, however. In order to solve the above problem, we can regard it as a white and zero-mean Gaussian noise linear dynamical system and propose a simplified and closed-form solution: where ∈ × is the transition matrix that describes the dynamics property, ∈ × is the measurement matrix that describes the spatial appearance, ∈ is the mean of ( ), and V( ) and ( ) are noise components. We should estimate the model parameters , , , from the measurements ( ), . . . , ( ) and transform them into the maximum-likelihood solution: and, however, optimal solutions of this problem bring computational complexity. We apply matrix decomposition to simplify the computation by the closed-form solution. The singular value Computational Intelligence and Neuroscience (4) Let = Σ , and we get the parameter estimation of̂, wherê= [ (1), (2), . . . , ( )].̂can be determined by Frobenius:̂( So the solution is in closed-form using the state estimated where † denotes matrix pseudoinverse. We

Low-Rank Matrix Decomposition
EEG signals have poor quality because they are usually recorded by electrodes placed on the scalp in a noninvasive manner that has to cross the scalp, skull, and many other layers. Therefore, they are moreover severely affected by background noise generated either inside the brain or externally over the scalp. Low-rank (LR) matrix decomposition can often capture the global information by reconstructing the top few singular values and the corresponding singular vectors. This method is widely applied in the field of image denoising and face recognition (FR). Concretely, low-rank (LR) matrix recovery seeks to decompose a data matrix into + , where is a low-rank matrix and is the associated sparse error. Candès et al. [33] propose to relax the original problem into the following tractable formulation: where the nuclear norm ‖ ‖ * (the sum of the singular values) approximates the rank of and the 1 -norm ‖ ‖ 1 is sparse constraint.
Then, Zhang and Li [34] decompose each image into common component, condition component, and a sparse residual. Siyahjani et al. [35] introduce the invariant components to the sparse representation and low-rank matrix decomposition approaches and successfully apply to solve computer vision problems. They add orthogonal constraint to assume that invariant and variant components are linear independent. Therefore, we decompose EEG signals as a combination of three components: resting state component, motor imagery component represented by low-rank matrix, and a sparse residual. However, in practice, it needs some digital signal processing (DSP), that is, wavelet transform or discrete Fourier transform before decomposition. Particularly, raw time-domain signals without any preprocessing are not suitable for low-rank matrix decomposition directly. The training dataset can be decomposed by fl + + , where ∈ × is a low-rank matrix and collects eventrelated EEG signal components, ∈ × approximates invariant and denotes resting state signal components that are sampled by subjects without any motor imagery, and ∈ × is the matrix of sparse noise. Therefore, training dataset can be decomposed as the following formulation: On ideal condition, each sampling channel of subject's brain EEG signals in resting state is similar. In other words, sum of each different raw is minimum. should add common constraint as the following formulation: We propose optimization problem formulation as min , , Then, augmented Lagrange multiplier (ALM) [36] method is utilized to solve the above problem. The augmented Lagrangian function ( , , , ) is given by where is a positive scalar and is a Lagrange multiplier matrix. We employ an inexact ALM (IALM) method described in Algorithm 1 to solve this problem, where ( ) = max(lansvd( ), −1 ‖ ‖ ) in the initialization 0 = / ( ) and lansvd(⋅) computes the largest singular value.
When low-rank matrix denoting event-related EEG signal components are generated, we can utilize some feature extraction methods such as CSP and CSSP to classify MI-BCI. In other words, low-rank matrix decomposition method in this section can be considered as a preprocessing part before feature extraction and classification.

LR-LDSs on Finite Grassmannian
Beginning at an initial state 1, the expected observation sequence generated by a time-invariant model = ( , ) is obtained as LDSs can apply the extended observability subspace O as descriptor, but it is hard to calculate. Turaga et al. [37,38] approximate the extended observability by taking the -order observability matrix; that is, ( , ) = [ , ( ) , . . . , ( −1 ) ] . In this way, an LDS model can be alternately identified as an -dimensional subspace of . Given a database of EEG, we can estimate LDSs model and calculate the finite observability matrix that span subspace as a point on the Riemannian manifold. Then, based on low-rank and sparse matrix decomposition, observability matrix can be decomposed into + as the following formulation: where is a low-rank matrix and is the associated sparse error.
The inexact ALM method can be also used to solve the optimization problem like Algorithm 1. The output represents low-rank descriptor for LDSs and can be employed for the classification of EEG trails.

Classification Algorithm
We extract features by the above LDSs model and get two feature matrices and . Unfortunately, and have different modal properties and dimensionalities. So they cannot be represented directly by a feature vector. Riemannian geometry metric for the space of LDSs is hard to determine and needs to satisfy several constraints. Common classifiers such as Nearest Neighbors (NNs), Linear Discriminant Analysis (LDA), and Support Vector Machines (SVM) cannot classify features in matrix form. The feature matrix must be mapped to vector space. We use Martin Distance [39,40], which is based on the principal angles between two subspaces of the extended observability matrices, as kernel to present distance of different LDS feature matrix. It can be defined as where Θ = { , }, Θ = { , }. is the eigenvalue solving as the following equation: We can classify EEG signals by comparing Martin Distance between training data and testing data. Nearest two samples mean that they may be of the same class. So the forecast label and predict accuracy can be calculated. Algorithm 3 in Supplementary Material is the classification method of KNN.
Considering LR-LDSs methods generating on Finite Grassmannian, unlike two feature matrices ( , ) by LDSs, Euclidean Distance and Mahalanobis Distance can describe Computational Intelligence and Neuroscience 5 the distance between two feature spaces of EEG trails after LR-LDS. They are simple, efficient, and common for measuring distance between two points. In order to improve the accuracy of classification, we can also employ metric learning methods using the label information to learn a new metric or pseudometric such as neighborhood components analysis and large margin nearest neighbor.

Experimental Evaluation
From the above sections, we propose three methods for EEG pattern recognition: LDSs, LR+CSP, and LR-LDSs. Two datasets of motor imagery EEG including BCI Competition III Dataset IVa and BCI Competition IV Database 2a are used to evaluate our three methods compared with other state-ofthe-art algorithms such as CSP and CSSP. All experiments are carried out with MATLAB on Intel Core i7, 2.90-GHz CPU with 8 GB RAM.

BCI Competition III Dataset IVa.
Dataset IVa is recorded from five healthy subjects, labeled as "aa," "al," "av," "aw," and "ay," with visual cues indicated for 3.5 s performing right hand and foot motor imagery. The EEG signal has 118 channels and markers that indicate the time points of 280 cues for each subject, band-pass filtered between 0.05 and 200 Hz, and downsampled to 100 Hz. Before feature extracting for comparison experiment, the raw data needs some preprocessing. Firstly, we extract a time segment located from 0.5 to 3 s and employ FastICA to remove artifacts arising from eye and muscle movements. Secondly, we chose 21 channels over the motor cortex (CP6, CP4, CP2, C6, C4, C2, FC6, FC4, FC2, CPZ, CZ, FCZ, CP1, CP3, CP5, C1, C3, C5, FC1, FC3, and FC5) that related to motor imagery.
In order to improve the performance of CSP and CSSP, we apply Butterworth filter for EEG signals filtering within a specific frequency band between 8 and 30 Hz, which encompasses both the alpha rhythm (8)(9)(10)(11)(12)(13) and the beta rhythm (14-30 Hz) that relate to motor imagery. Then, we program MATLAB code to get spatial filter parameters and feature vectors by variance. Finally, a LDA classifier is used to find a separating hyperplane of the feature vectors.
In LDSs model, the value of a hidden parameter describing dimension of Riemannian feature space is closely related to final accuracy. We chose the highest accuracy performance subject "al" and the lowest accuracy performance subject "av" to show the relationship between hidden parameter and  The relationship between hidden parameter and accuracy for LDSs. We choose "al" and "av," which are the highest and lowest accuracy performance, respectively, to show the relationship between hidden parameter and accuracy. classification accuracy. The result of experiment is presented in Figure 1, which indicates that the accuracy tends to increase when the value of hidden parameter augments approximately and the highest accuracy happens near hidden parameter value of 16.
Then five methods including CSP, CSSP, LDSs, LR+CSP, and LR-LDSs are compared with each other. The results are listed in Table 1.
From Table 1, the bold figures present the best performance results. LR-LDSs are in the majority. The last row shows that the mean of LR-LDS classification accuracy is much better than CSP and a little higher than the others. Comparing with CSP and LR+CSP, LR method is very efficient and useful to improve accuracy. LDSs related methods outperform CSP and CSSP due to their both spatial and temporal features extraction.

BCI Competition IV Database 2a.
Database 2a consists of EEG data from 9 subjects. There are four different motor imagery tasks including movement of the left hand, right hand, both feet, and tongue. At the beginning of each trial, a fixation cross and a short acoustic warning tone appear. After two seconds the subject is cued by an arrow pointing to either the left, right, down, or up that denote the movement of left hand, right hand, foot, or tongue for 1.25 s. Then the subjects carry out the motor imagery task for about 3 s. The BCI signals are sampled by 25 channels including 22 EEG channels and 3 EOG channels with 250 Hz and bandpassfiltered between 0.5 Hz and 100 Hz. Different from dataset IVa, database 2a is a multiclassification problem. However, LDA is a two-class classifier. Therefore, we choose -NN algorithms for CSP, CSSP, LDSs, LR+CSP, and LR-LDSs methods uniformly. Table 2 describes the classification accuracies results of five above concerned methods. Similar to the results of BCI Competition III Dataset IVa, the mean accuracies of LDSs, LR+CSP, and LR-LDSs are higher than CSP and CSSP methods. Furthermore, LR-LDSs method abstains the best performance.

Conclusion
CSP has gained much success in the past MI-BCI research. However, it is reported that CSP is only a spatial filter and sensitive to frequency band. It needs prior knowledge to choose channels and frequency bands. Without preprocessing, the result of classification accuracy may be poor. LDSs can overcome these problems by extracting both spatial and temporal features simultaneously to improve the classification performance. Furthermore, we utilize a low-rank matrix decomposition approach to get rid of noise and resting state component in order to improve the robustness of the system. Then LR+CSP and LR-LDSs methods are proposed. Comparison experiments are demonstrated on two datasets. The major contribution of our work is realization of LDSs model and LR algorithm for MI-BCI pattern recognition. The proposed LR-LDSs methods achieve a better performance than CSP and CSSP.