Using Higher Order Nonlinear Operators for SVM Classification of EEG Data


Introduction
A brain-computer interface (BCI) is a communication system that translates brain activity into commands for a computer or other digital devices [1]. The major goal of BCI research is to develop systems that allow disabled users to communicate with other people, control artificial limbs, or control their environment. Other applications include multimedia communication, augmented reality applications, robot control, and game development.
Most BCI systems work by reading and interpreting cortically evoked electrical potentials across the scalp via an electroencephalogram (EEG). The EEG signal has become the main data source in BCI research because it is low-cost and non-invasive. However, EEG data are inherently complex and difficult to analyze. Oscillatory activity in the EEG is classified into frequency bands or rhythms: delta (0.5-3.5 Hz), theta (4-8 Hz), alpha 1 (8-10.5 Hz), alpha 2 (10.5-13 Hz), beta 1 (13-21 Hz), beta 2 (20-32 Hz), and gamma (36-44 Hz) [1]. Because EEG signals are non-stationary and nonlinear, and are usually contaminated by eye-movement and muscle artifacts, it is difficult to distinguish between classes of mental tasks from the EEG [2]. Several types of features can be extracted from EEG data [1]: time-domain features, related to changes in the amplitude of neurophysiological signals that occur time-locked to the presentation of stimuli or to actions of the BCI user; frequency-domain features, related to changes in oscillatory activity; and spatial-domain features, extracted and combined from several electrodes.
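To make the notion of frequency-domain features concrete, the sketch below computes the signal power falling into each of the bands listed above. It is a minimal illustration, not part of the method described in this paper: it assumes a single-channel signal sampled at `fs` Hz, uses a naive discrete Fourier transform to stay dependency-free, and the names `EEG_BANDS` and `band_powers` are hypothetical.

```python
import math

# Band limits in Hz, as listed in the text (beta 1 and beta 2 overlap at 20-21 Hz).
EEG_BANDS = {
    "delta": (0.5, 3.5),
    "theta": (4.0, 8.0),
    "alpha1": (8.0, 10.5),
    "alpha2": (10.5, 13.0),
    "beta1": (13.0, 21.0),
    "beta2": (20.0, 32.0),
    "gamma": (36.0, 44.0),
}


def band_powers(x, fs):
    """Sum of squared DFT magnitudes of x whose bin frequency lies in each band.

    Uses a naive O(N^2) DFT for clarity; a real pipeline would use an FFT
    and a windowed estimator instead.
    """
    n = len(x)
    powers = {name: 0.0 for name in EEG_BANDS}
    for k in range(1, n // 2 + 1):          # skip the DC bin
        freq = k * fs / n                   # frequency of DFT bin k in Hz
        bands = [name for name, (lo, hi) in EEG_BANDS.items() if lo <= freq <= hi]
        if not bands:
            continue                        # bin outside every band: skip the DFT work
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
        for name in bands:
            powers[name] += re * re + im * im
    return powers


fs = 256
x = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 1 s of a 10 Hz sine
powers = band_powers(x, fs)
print(max(powers, key=powers.get))  # → alpha1
```

Because the band limits are treated as inclusive, a bin that falls on a shared edge (for example 8 Hz) or inside the 20-21 Hz overlap contributes to both adjacent bands.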
BCI systems require correct classification of the signals recorded from the brain in order to operate usefully. After the EEG data are acquired, pre-processing (filtering, denoising), feature extraction, and dimensionality reduction are performed; machine learning algorithms are then trained on a training dataset to classify the signals into classes, each of which corresponds to a specific action of the user. A variety of machine learning and artificial intelligence methods are used for EEG data analysis and classification, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), Bayesian Linear Discriminant Analysis (BLDA), Support Vector Machine (SVM), and Fisher Linear Discriminant Analysis (FLDA) [3]. Usually, the raw EEG data are denoised using digital signal processing (DSP) methods such as Fourier analysis or the wavelet transform, and the filtered signal is used as input to the classification method.
Recently, nonlinear operators such as the Teager-Kaiser energy operator (TKEO) [4,5] have attracted the attention of researchers in the BCI domain. So far, the TKEO has been applied to speech recognition [6], image enhancement [7], and EEG data analysis, including the detection of high-frequency oscillations [8], mental task classification [9], and sleep spindle detection [10].
The remainder of the paper is structured as follows. Section II discusses the Teager-Kaiser energy operator. Section III proposes and describes a higher-order nonlinear operator. Section IV describes data classification using the Support Vector Machine. Section V presents a case study. Section VI presents conclusions and outlines future work.

Teager-Kaiser Energy Operator
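Nonlinear models are systems for which either the additivity or the scalability (homogeneity) property does not hold in general:

$$ H[x_1(t) + x_2(t)] \neq H[x_1(t)] + H[x_2(t)] \quad \text{or} \quad H[\alpha\, x(t)] \neq \alpha\, H[x(t)], \tag{1} $$

where $x_1(t)$, $x_2(t)$, and $x(t)$ are signals, $H$ is a nonlinear model (operator), and $\alpha$ is a constant.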
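The two conditions can be checked numerically for a concrete operator. The sketch below (hypothetical helper names, arbitrarily chosen test signals) compares both sides of the additivity and scalability conditions for the pointwise squaring operator, i.e. conventional instantaneous energy, and for a linear gain as a contrast. A failed check on some test signals proves nonlinearity; passing checks on a few signals does not prove linearity.

```python
def _differs(u, v, tol):
    """True if any pair of corresponding samples differs by more than tol."""
    return any(abs(a - b) > tol for a, b in zip(u, v))


def check_linearity(op, x1, x2, alpha, tol=1e-9):
    """Return (additivity_fails, scalability_fails) for operator `op` on the given signals."""
    additivity_fails = _differs(op([a + b for a, b in zip(x1, x2)]),
                                [a + b for a, b in zip(op(x1), op(x2))], tol)
    scalability_fails = _differs(op([alpha * a for a in x1]),
                                 [alpha * a for a in op(x1)], tol)
    return additivity_fails, scalability_fails


def square(x):
    """Pointwise instantaneous energy x(n)^2: a nonlinear operator."""
    return [v * v for v in x]


def gain(x):
    """Multiplication by a constant 3: a linear operator."""
    return [3.0 * v for v in x]


x1, x2, alpha = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 2.0
print(check_linearity(square, x1, x2, alpha))  # → (True, True)
print(check_linearity(gain, x1, x2, alpha))    # → (False, False)
```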
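The Teager-Kaiser Energy Operator (TKEO), proposed by Teager [4] and further investigated by Kaiser [5], is a special case of nonlinear models. For a continuous-time signal $x(t)$, the TKEO is defined as

$$ \Psi[x(t)] = \dot{x}^2(t) - x(t)\,\ddot{x}(t), \tag{2} $$

where $\dot{x}$ and $\ddot{x}$ denote the first and second time derivatives of $x$. Approximating the derivatives by one-sample differences yields the definition of the TKEO for a discrete-time signal $x(n)$ [5]:

$$ \Psi[x(n)] = x^2(n) - x(n-1)\,x(n+1). \tag{3} $$

Moore et al. [11] propose a generalization of the Teager operator as a 1-D Volterra filter. Tomar et al. [12] introduce two generalizations of the TKEO. The variable length TKEO (VTEO) with lag $k$ is defined as

$$ \Psi_k[x(n)] = x^2(n) - x(n-k)\,x(n+k), \tag{4} $$

and the Summed-over Variable length Teager Energy Operator (S-VTEO) sums the VTEO over lags:

$$ S_K[x(n)] = \sum_{k=1}^{K} \Psi_k[x(n)]. \tag{5} $$

A combination of operators (4) and (5) is proposed in [13]. A generalization of the continuous TKEO, the higher-order energy operator (HOEO) $\Upsilon_k$, is proposed in [14]:

$$ \Upsilon_k[x(t)] = \dot{x}(t)\,x^{(k-1)}(t) - x(t)\,x^{(k)}(t), $$

where $x^{(k)}$ denotes the $k$-th derivative of $x$, so that $\Upsilon_2 = \Psi$. For discrete-time series, the HOEO can be rewritten as the discrete energy operator (DEO) [Maragos]:

$$ \Gamma_k[x(n)] = x(n)\,x(n+k-2) - x(n-1)\,x(n+k-1), $$

which reduces to (3) for $k = 2$. The advantage of the TKEO family of operators over traditional DSP analysis methods such as the Fourier transform or wavelet analysis is their ability to discover high-frequency, low-amplitude components [8] in the analyzed data. Unlike the conventional energy measure $x^2(n)$, the TKEO takes into account the frequency of the signal as well as its amplitude: for $x(n) = A\cos(\Omega n + \phi)$, (3) gives $\Psi[x(n)] = A^2 \sin^2\Omega$.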
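The discrete operators translate directly into code. The sketch below is a minimal pure-Python rendering of the discrete TKEO, the VTEO in the commonly cited form x²(n) - x(n-k)x(n+k), and the discrete energy operator Γ_k; the function names are hypothetical, and each output is evaluated only at indices where all required neighbouring samples exist. The final lines check two known properties: Γ_2 coincides with the TKEO, and for a sampled sinusoid A cos(Ωn + φ) the TKEO output is the constant A² sin² Ω, which depends on frequency as well as amplitude.

```python
import math


def tkeo(x):
    """Discrete TKEO: x(n)^2 - x(n-1) x(n+1), for n = 1 .. len(x) - 2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def vteo(x, k):
    """Variable-length TEO with lag k: x(n)^2 - x(n-k) x(n+k), for n = k .. len(x) - k - 1."""
    if k < 1:
        raise ValueError("lag k must be >= 1")
    return [x[n] * x[n] - x[n - k] * x[n + k] for n in range(k, len(x) - k)]


def deo(x, k):
    """Discrete energy operator Gamma_k: x(n) x(n+k-2) - x(n-1) x(n+k-1).

    Evaluated for n = 1 .. len(x) - k so that every index stays inside x;
    this sketch only supports k >= 2.
    """
    if k < 2:
        raise ValueError("this sketch only supports k >= 2")
    return [x[n] * x[n + k - 2] - x[n - 1] * x[n + k - 1] for n in range(1, len(x) - k + 1)]


# Sampled sinusoid x(n) = A cos(omega n + phi)
A, omega, phi = 2.0, 0.3, 0.1
x = [A * math.cos(omega * n + phi) for n in range(64)]

assert deo(x, 2) == tkeo(x)      # Gamma_2 is the TKEO
assert vteo(x, 1) == tkeo(x)     # VTEO with lag 1 is the TKEO
expected = A ** 2 * math.sin(omega) ** 2
assert all(abs(v - expected) < 1e-9 for v in tkeo(x))
```

For small Ω, sin² Ω ≈ Ω², so the TKEO output grows with both amplitude and frequency, whereas x²(n) depends on amplitude and phase only.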

Proposed Nonlinear Operators
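In the general case, the TKEO can be generalized to the Homogeneous Multivariate Polynomial Operator (HMPO). The 2nd-order HMPO is defined as

$$ H_2[x(n)] = \mathbf{z}^{T} A\, \mathbf{z} = \sum_{i}\sum_{j} a_{ij}\, z_i z_j, $$

where $\mathbf{z} = [x(n-m/2), \ldots, x(n), \ldots, x(n+m/2)]^{T}$ is a window of $m+1$ samples centred at $n$, and $A = [a_{ij}]$ is the coefficient matrix. The 3rd-order HMPO is defined as follows:

$$ H_3[x(n)] = \sum_{i}\sum_{j}\sum_{l} a_{ijl}\, z_i z_j z_l, $$

where $A = [a_{ijl}]$ is now a 3D coefficient matrix. For example, the TKEO (3) can be written as a 2nd-order HMPO with $m = 2$:

$$ \Psi[x(n)] = \begin{bmatrix} x(n-1) \\ x(n) \\ x(n+1) \end{bmatrix}^{T} \begin{bmatrix} 0 & 0 & -1/2 \\ 0 & 1 & 0 \\ -1/2 & 0 & 0 \end{bmatrix} \begin{bmatrix} x(n-1) \\ x(n) \\ x(n+1) \end{bmatrix} = x^2(n) - x(n-1)\,x(n+1). $$

Such an operator can also be seen as a special case of the 2D Volterra system, as noted by Kvedalen [13].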
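One convenient way to experiment with the HMPO is to store only the non-zero coefficients of A. The sketch below does this with a dictionary keyed by index tuples, so the same function covers the 2nd-order (pairs) and 3rd-order (triples) cases. It is a hypothetical illustration under stated assumptions: the window is indexed from 0, i.e. z_0 = x(n-h), ..., z_{2h} = x(n+h) with h = m/2 (so index 1 is the centre sample x(n) when h = 1), and the third-order coefficients shown are arbitrary examples, not the values used in the experiments of this paper.

```python
from math import prod


def hmpo(x, coeffs, h):
    """Homogeneous polynomial operator on the window z = [x(n-h), ..., x(n+h)].

    `coeffs` maps index tuples (0-based positions in z) to coefficients; the
    tuple length is the order of the operator (2 for a_ij, 3 for a_ijl).
    Output is produced for n = h .. len(x) - h - 1.
    """
    out = []
    for n in range(h, len(x) - h):
        z = x[n - h:n + h + 1]
        out.append(sum(a * prod(z[i] for i in idx) for idx, a in coeffs.items()))
    return out


# TKEO as a 2nd-order HMPO (h = 1): centre term 1, symmetric cross terms -1/2
TKEO_COEFFS = {(1, 1): 1.0, (0, 2): -0.5, (2, 0): -0.5}

# Hypothetical 3rd-order example: x(n)^3 - x(n-1) x(n) x(n+1), i.e. x(n) times the TKEO
CUBIC_COEFFS = {(1, 1, 1): 1.0, (0, 1, 2): -1.0}

x = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1]
print(hmpo(x, TKEO_COEFFS, 1))
print(hmpo(x, CUBIC_COEFFS, 1))
```

The sparse representation keeps the cost proportional to the number of non-zero coefficients rather than to (m+1)² or (m+1)³.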
The properties of the

Classification using Support Vector Machine
The Support Vector Machine (SVM) [15] is a binary classification algorithm based on structural risk minimization. First, the SVM implicitly maps the training data into a (usually higher-dimensional) feature space. A hyperplane (decision surface) is then constructed in this feature space that separates the two categories and maximizes the margin between itself and the points lying nearest to it (the support vectors). This decision surface can then be used to classify vectors of unknown class.
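Consider an input space $X$ with input vectors $x_i \in X$ and class labels $y_i \in \{-1, +1\}$, $i = 1, \ldots, N$. The SVM decision function for a vector $x_j$ is

$$ f(x_j) = \sum_{i \in SV} \alpha_i\, y_i\, K(x_i, x_j) + b, $$

where $SV$ is the set of support vectors, $K(x_i, x_j)$ is the kernel function, $\alpha_i$ are weights, and $b$ is the offset parameter. The vector $x_j$ is assigned to the positive class if $f(x_j) > 0$ and to the negative class if $f(x_j) < 0$; if $f(x_j) = 0$, $x_j$ lies on the decision boundary and cannot be classified.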
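The decision rule can be sketched in a few lines. The snippet below evaluates f(x) for a given set of support vectors, weights α_i, labels y_i, and offset b, and returns +1, -1, or 0 (on the boundary, i.e. not classified). It only illustrates the decision function: training, which determines the support vectors and α_i, is left to an SVM solver such as the one used in the case study, and the toy model and the default RBF `gamma` below are hypothetical.

```python
import math


def linear_kernel(u, v):
    """K(u, v) = <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2); gamma is an illustrative default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, alphas, labels, b, kernel=linear_kernel):
    """f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x), plus b."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


def classify(x, support_vectors, alphas, labels, b, kernel=linear_kernel):
    """+1 or -1 according to the sign of f(x); 0 if x lies on the decision boundary."""
    f = decision_value(x, support_vectors, alphas, labels, b, kernel)
    if f > 0:
        return 1
    if f < 0:
        return -1
    return 0


# Hypothetical toy model: with the linear kernel, f(x) = x[0] + x[1]
SVS, ALPHAS, LABELS, B = [(1.0, 1.0), (-1.0, -1.0)], [0.5, 0.5], [1, -1], 0.0
print(classify((2.0, 0.0), SVS, ALPHAS, LABELS, B))   # → 1
print(classify((1.0, -1.0), SVS, ALPHAS, LABELS, B))  # → 0
```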
This is therefore a binary classification problem in which the outcomes are labelled as either the positive (P) or the negative (N) class. There are four possible outcomes of a binary classifier. If the predicted outcome is P and the actual value is also P, the result is a true positive (TP); if the actual value is N, it is a false positive (FP). Conversely, a true negative (TN) occurs when both the predicted outcome and the actual value are N, and a false negative (FN) occurs when the predicted outcome is N while the actual value is P. The following metrics are commonly used to evaluate classification performance: 1) Precision is the proportion of positive predictions that are true positives, TP/(TP + FP); 2) Recall is the proportion of actual positive cases that are correctly identified, TP/(TP + FN); 3) Accuracy is the proportion of true results (both true positives and true negatives) in the test data, (TP + TN)/(TP + TN + FP + FN); 4) F-measure (F) is the harmonic mean of precision and recall, 2·Precision·Recall/(Precision + Recall); 5) Area Under Curve (AUC) is the area under the ROC curve, which equals the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
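The metrics follow directly from the four outcome counts and, for AUC, from the classifier's real-valued scores. Below is a minimal sketch with hypothetical function names; returning 0.0 for an undefined ratio is an assumed convention, and the AUC is computed from its probabilistic definition by comparing every positive-negative pair (ties count as one half), which is exact but quadratic in the number of instances.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy and F-measure from the confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}


def auc(pos_scores, neg_scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

With these counts, precision is 40/50 = 0.8 and recall is 40/45; the F-measure then equals 2·TP/(2·TP + FP + FN) = 80/95.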

Case study
For the experiments, Data set Ia (Tübingen, self-regulation of SCPs, subject 1) [16] from the BCI Competition datasets (http://bbci.de/competition/) was used. The data were recorded from a healthy subject, who was asked to move a cursor up and down on a computer screen while his cortical potentials were recorded. During the recording, the subject received visual feedback of his slow cortical potentials (SCPs). The dataset consists of 135 trials belonging to class 0 and 133 trials belonging to class 1. Each trial consists of 896 samples from each of 6 channels: the sampling rate is 256 Hz and each trial lasts 3.5 s (256 Hz × 3.5 s = 896 samples). The dataset was randomly partitioned into 5 parts, and 5-fold cross-validation was used to evaluate the classification results.
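The dataset dimensions and the evaluation protocol can be expressed compactly. The sketch below assumes the trials are indexed 0 to 267 (135 + 133) and shows one way to randomly partition them into five folds for cross-validation; the seed and the function name `k_fold_splits` are hypothetical, and the original experiments may have used a different partitioning routine.

```python
import random

N_TRIALS = 135 + 133                             # class 0 + class 1 trials
N_CHANNELS = 6
FS_HZ = 256
TRIAL_SECONDS = 3.5
SAMPLES_PER_TRIAL = int(FS_HZ * TRIAL_SECONDS)   # 896 samples per channel


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds of a random permutation."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]      # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


for train, test in k_fold_splits(N_TRIALS):
    print(len(train), len(test))
```

With 268 trials the test folds contain 54, 54, 54, 53, and 53 trials, and every trial is used for testing exactly once.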
The following nonlinear operators were applied to the raw EEG data: the TKEO [13] and the proposed 3rd-order HMPO with a 3D coefficient matrix A. The values of the non-zero elements of A were set based on the results of a grid search over all possible combinations of integers in the range [-3, 3]. Classification was performed using the SVMperf [17] implementation of the Support Vector Machine (available at http://svmlight.joachims.org/) with a linear kernel. The SVM parameters were optimized using the Nelder-Mead-based method described in [18]. The results of the experiments are summarized in Table 1 (the best results in each category are shown in bold).
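The integer coefficient search is an exhaustive enumeration. The sketch below shows the idea with a hypothetical scoring function `toy_score`; in the experiments the score would presumably be a cross-validated classification measure obtained with HMPO features, which is not reproduced here. The search space grows as 7^n for n free coefficients, so the approach is only practical for a handful of non-zero elements.

```python
from itertools import product


def integer_grid_search(score, n_coeffs, low=-3, high=3):
    """Evaluate `score` on every integer vector in [low, high]^n_coeffs.

    Returns (best_vector, best_score); the first maximiser found is kept on ties.
    """
    best_vec, best_score = None, float("-inf")
    for vec in product(range(low, high + 1), repeat=n_coeffs):
        s = score(vec)
        if s > best_score:
            best_vec, best_score = vec, s
    return best_vec, best_score


def toy_score(c):
    """Hypothetical stand-in for a cross-validated accuracy; maximised at (2, -1)."""
    return -((c[0] - 2) ** 2) - (c[1] + 1) ** 2


print(integer_grid_search(toy_score, 2))  # → ((2, -1), 0)
```

The continuous SVM parameters are a different case: they are not restricted to an integer grid, which is why a derivative-free local optimizer such as Nelder-Mead is a natural choice for them.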
The experimental results in Table 1 indicate that 3rd-order nonlinear operators such as the proposed HMPO can achieve better feature identification in EEG data than traditional 2nd-order operators such as the TKEO or its generalizations. Even a visual inspection of the positive and negative series in Fig. 1 shows that the HMPO operator allows for better identification of significant features (slow cortical potential signals with a frequency of ~1 Hz).

Fig. 1. Samples of raw EEG data after application of the HMPO operator: (a) positive instance; (b) negative instance.

Conclusions and future work
In this paper, we proposed a novel nonlinear operator, the Homogeneous Multivariate Polynomial Operator (HMPO), which generalizes the Teager-Kaiser Energy Operator. The applicability of the proposed operator is demonstrated for the classification of EEG signals. The experimental results obtained using a Support Vector Machine show an improvement in classification results.
The proposed operator can be used to develop new EEG signal processing algorithms for Brain-Computer Interface applications, e.g., robot control in noisy environments.
Future work will focus on integrating higher-order nonlinear operators with DSP-based filtering techniques to improve the classification accuracy of EEG data.