Picture-induced EEG Signal Classification Based on CVC Emotion Recognition System

Emotion recognition systems are helpful in human–machine interaction and intelligent medical applications. The electroencephalogram (EEG) is closely related to the central nervous system activity of the brain, and compared with other signals, EEG is more closely associated with emotional activity, so it is essential to study emotion recognition based on EEG. A common problem in EEG-based emotion recognition research is that individual emotion classification results vary greatly under the same recognition scheme, which hinders the engineering application of emotion recognition. In order to improve the overall emotion recognition rate of the emotion classification system, we propose the CSP_VAR_CNN (CVC) emotion recognition system, which classifies the emotions of EEG signals based on the convolutional neural network (CNN) algorithm. First, the system uses common spatial patterns (CSP) to reduce the dimensionality of the EEG data; the standardized variance (VAR) is then selected as the parameter to form the emotion feature vectors. Finally, a 5-layer CNN model is built to classify the EEG signals. The classification results show that this emotion recognition system improves the overall emotion recognition rate: the variance of the accuracy has been reduced to 0.0067, a decrease of 64% compared with that of the CSP_VAR_SVM (CVS) system. Meanwhile, the average accuracy reaches 69.84%, which is 0.79% higher than that of the CVS system. This shows that the overall emotion recognition rate of the proposed system is more stable and its recognition rate is higher.


Introduction
Emotion recognition is an essential subarea of affective computing. In recent years, researchers have tried many ways to identify the emotions of subjects. The main methods and techniques include facial expression recognition [Covic, Von Steinbüchel and Kiesehimmel (2019)], gesture recognition [Shao and Wang (2018)], speech recognition [Guo, Fu, He et al. (2019)], natural language processing [Batbaatar, Li and Ryu (2019)], human physiological signal recognition [Chao, Dong, Liu et al. (2019); Hsu, Wang, Chiang et al. (2020); Jerritta, Murugappan, Wan et al. (2014); Goshvarpour, Abbasi and Goshvarpour (2017)] and so on. Physiological signals recognize emotions mainly through electroencephalography (EEG) and electrocardiography (ECG) [Chao, Dong, Liu et al. (2019); Hsu, Wang, Chiang et al. (2020)]. Among these sources, EEG is easy to acquire and cost-effective, with very high temporal resolution. The flow of emotion analysis based on EEG is shown in Fig. 1. Previous studies on EEG emotion recognition mainly used traditional machine learning algorithms to identify emotional state. One of the most common methods is the support vector machine (SVM). Neshov et al. [Neshov, Manolova, Draganov et al. (2018)] proposed an algorithm that recognizes five mental tasks using 6-channel EEG data; the processed data are used to train an SVM classifier, and an average recognition rate of 82.7% was obtained under 5-fold cross-validation. Saccá et al. [Saccá, Campolo, Mirarchi et al. (2018)] focused on Creutzfeldt-Jakob disease (CJD) EEG signals; principal component analysis (PCA) is used to reduce the dimensionality of the dataset, and the resulting vectors are used as inputs to an SVM classifier, reaching a classification accuracy of 96.67%. Richhariya et al. [Richhariya and Tanveer (2018)] proposed a novel machine learning approach based on the universum support vector machine (USVM) for classification.
They also employed a universum selection approach with the universum twin support vector machine (UTSVM), which has a lower computational cost than the traditional SVM. The results show that USVM and UTSVM generalize better than SVM and twin SVM (TWSVM); the proposed UTSVM achieved the highest classification accuracy of 99% for the healthy and seizure EEG signals. Bengio [Bengio (2009)] found that traditional machine learning algorithms have limited ability to represent complex functions given finite samples and computational units, and their generalization ability is constrained for complex classification problems. Braverman [Braverman (2011)] also pointed out that there is a large class of models that cannot be represented by traditional learning networks. These mathematical results point to the limitations of traditional networks: the classification results of traditional machine learning algorithms are limited for some complex models. This prompted researchers to turn to deep neural networks to model and classify complex problems. With the advent of deep learning, it has replaced traditional methods due to its automatic feature extraction ability [Huang, Tian, Lan et al. (2019)]. Recently, many deep-learning-based EEG signal classification methods have demonstrated performance superior to traditional machine learning methods [Tabar and Halici (2017); Yang, Sakhavi, Ang et al. (2015)]. As one of the most widely used deep learning models, the CNN is commonly used in various fields [Zhang, Wang, Li et al. (2018); Liu, Yang, Lv et al. (2019)]. For EEG, it is usually combined with features extracted from the EEG signal data to improve classification results. Convolutional neural networks (CNNs) are biologically inspired variants of the multilayer perceptron (MLP) designed to use minimal amounts of preprocessing [Lecun, Bottou, Bengio et al. (1998)].
They have wide applications in image recognition [Nam, Choi, Cho et al. (2018)] and natural language processing [Zhang, Chen, Liu et al. (2017)]. Thanks to local receptive fields and weight sharing, the complexity of the network structure is reduced, i.e., the number of weights is reduced. Tang et al. [Tang, Li and Sun (2017)] proposed a method based on a deep convolutional neural network (CNN) to perform feature extraction and classification of motor imagery (MI) EEG signals and compared it with three conventional classification methods; the average accuracy using the CNN (86.41 ± 0.77%) is 9.24%, 3.80% and 5.16% higher than those using power+SVM, CSP+SVM and AR+SVM, respectively. Tabar et al. [Tabar and Halici (2017)] applied a CNN model to classify MI EEG signals and obtained a 9% accuracy improvement over traditional methods. Yang et al. [Yang, Sakhavi, Ang et al. (2015)] proposed a multi-class MI EEG signal classification method based on augmented CSP features and a CNN, with an accuracy of 69.27%. In this paper, we apply CSP to picture-stimulated EEG and put forward a new method based on a deep convolutional neural network (CNN) for EEG signal classification. First, CSP is used to reduce the dimensionality of the EEG data; the VAR is then selected as the parameter to form the feature vectors; third, a 5-layer CNN model is built to classify the EEG signals; finally, the results are compared with those of the conventional CVS classification method.

Related work
In this paper, we first use common spatial patterns (CSP) to reduce the dimensionality of the EEG data. The standardized variance (VAR) is then selected as the parameter to form the feature vector. Third, SVM and CNN are used as classifiers to classify the EEG signals. Finally, the two classification results are compared. The whole process is shown in Fig. 2.

Figure 2: Schematic diagram of the correlation method
In emotion recognition, emotion feature extraction and classification algorithms are the most important processes. We introduce them in detail in the following sections.

A new feature extraction method, CSP_VAR
In the processing of EEG signals, the common spatial pattern (CSP) has been widely used for feature extraction, and it has been proved to be a very useful algorithm for feature extraction from EEG data [Fu, Tian, Shi et al. (2020); Tang, Li, Wu et al. (2019); Zhang, Xiao, Liu et al. (2018)]. Its basic principle is to find a spatial transformation matrix and transform the EEG to obtain a new matrix. Each row vector of the original matrix influences the classification to a different degree, according to the size of its eigenvalue. The 2m row vectors with the largest influence are selected, so the 64×750 matrix is transformed into a 2m×750 matrix (where m is an integer and satisfies 1 ≤ m ≤ 32), thus achieving dimensionality reduction. The principle of the standard CSP is introduced below. The EEG signal of one trial is represented by an N×T matrix E, where N is the number of channels and T is the number of samples per EEG signal, with T ≥ N. The normalized covariance matrix is

C = \frac{E E^{T}}{\operatorname{trace}(E E^{T})}    (1)

where ^{T} denotes transposition and trace(·) is the trace of a matrix. The mean spatial covariance matrices for positive and negative emotions, denoted \bar{C}_1 and \bar{C}_2 respectively, are obtained by averaging the covariance matrices over the trials of each class. The composite of the two covariance matrices can then be expressed as

C_c = \bar{C}_1 + \bar{C}_2    (2)

and C_c can be decomposed as

C_c = U_c \lambda_c U_c^{T}    (3)

where U_c is the matrix of eigenvectors of C_c and \lambda_c is the diagonal matrix of its eigenvalues. The whitening matrix is

P = \lambda_c^{-1/2} U_c^{T}    (4)

After the whitening matrix is calculated, the mean covariance matrices are transformed with it:

S_1 = P \bar{C}_1 P^{T}    (5)
S_2 = P \bar{C}_2 P^{T}    (6)

S_1 and S_2 have the same eigenvectors, namely

S_1 = B \lambda_1 B^{T}    (7)
S_2 = B \lambda_2 B^{T}    (8)

where \lambda_1 and \lambda_2 satisfy \lambda_1 + \lambda_2 = I. That is, the maximum eigenvalue of S_1 corresponds to the minimum eigenvalue of S_2.
The eigenvalues in \lambda_1 are arranged from large to small, and the eigenvectors are sorted accordingly to obtain B_s, whose rows are the sorted eigenvectors; the optimally separating covariance matrices are obtained by this transformation of the whitened matrices. Taking the first m rows and the last m rows of B_s to form a new matrix B_{2m}, the projection matrix that transforms the original signal is

W = B_{2m} P    (9)

and the transformed matrix is

Z = W E    (10)

In this paper, features are extracted from the CSP-reduced EEG data according to Formula (11) to form the feature vectors, which are then learned and classified by SVM and CNN, respectively. Formula (11) normalizes the variance of each row of the dimensionality-reduced EEG data by its proportion of the total; this feature is referred to in the paper as the standardized variance (VAR):

f_p = \frac{\operatorname{var}(Z_p)}{\sum_{i=1}^{2m} \operatorname{var}(Z_i)}    (11)
where f_p represents the p-th element of the feature vector, var(·) is the variance operation, and Z_p and Z_i represent the p-th row and the i-th row of the reduced-dimensional EEG matrix Z, respectively.
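As a minimal sketch (not the authors' code), the CSP projection and VAR feature extraction described above can be implemented with NumPy as follows; the trial shapes, seeds and function names are illustrative assumptions:

```python
import numpy as np

def csp_projection(trials_1, trials_2, m):
    """Compute the CSP projection matrix W from two lists of N x T trials."""
    def mean_cov(trials):
        covs = []
        for E in trials:
            C = E @ E.T
            covs.append(C / np.trace(C))          # normalized covariance, Eq. (1)
        return np.mean(covs, axis=0)

    C1, C2 = mean_cov(trials_1), mean_cov(trials_2)
    Cc = C1 + C2                                   # composite covariance, Eq. (2)
    lam, Uc = np.linalg.eigh(Cc)                   # Cc = Uc diag(lam) Uc^T, Eq. (3)
    P = np.diag(lam ** -0.5) @ Uc.T                # whitening matrix, Eq. (4)
    S1 = P @ C1 @ P.T                              # whitened covariance, Eq. (5)
    lam1, B = np.linalg.eigh(S1)                   # S1 = B diag(lam1) B^T, Eq. (7)
    order = np.argsort(lam1)[::-1]                 # sort eigenvalues descending
    B = B[:, order]
    B2m = np.hstack([B[:, :m], B[:, -m:]])         # first m and last m components
    return B2m.T @ P                               # projection matrix W, Eq. (9)

def var_features(W, E):
    """Standardized-variance (VAR) feature vector of one trial, Eq. (11)."""
    Z = W @ E                                      # transformed trial, Eq. (10)
    v = np.var(Z, axis=1)
    return v / v.sum()                             # variances normalized to sum to 1
```

Note that the VAR feature vector sums to one by construction, which is what makes it a "standardized" variance.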

Classification algorithm based on CNN
According to Bengio [Bengio (2009)], we established a similar CNN model, the structure of which is shown in Fig. 3. This CNN model consists of two convolutional layers, one fully connected layer, one dropout layer, and a softmax layer. Because the samples have a small data dimension, we did not use a pooling layer for dimensionality reduction, in order to preserve more information. After repeated trials, we chose a 1×5 convolution kernel; the first convolutional layer has 16 kernels and the second has 32. The main parameters of the model are shown in Tab. 1. As can be seen from Tab. 1, although the CNN model has no pooling layer, the training samples are small, the number of convolution kernels is small, and the fully connected layer has few parameters, so the model has fewer than 5,000 main training parameters and its single-step training speed is relatively fast. When training the neural network, we record the value of the cross entropy every ten training steps. Fig. 4 shows how the cross entropy of one experimental training set changes with the number of training steps. The convolutional neural network of this model usually converges within 100,000 training steps, after which the value of the loss function stays low and stable. To ensure that the network is stable after training, we trained it for 110,000 steps and took the classification result at that point as the final result. The EEG signal is weak and very susceptible to changes in the internal or external environment during measurement, so the measured signal contains many electrical-activity disturbances that are not caused by the brain; these disturbances are called artefacts [Jiang, Zhang, Chen et al. (2017)].
In this paper, the preprocessing of the EEG is done with the Scan 4.5 software, which mainly removes bad segments and electro-ocular artefacts and performs digital filtering.
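To make the architecture concrete, the following is a rough NumPy sketch of an inference-time forward pass through the model described above (two 1×5 convolutional layers with 16 and 32 kernels, a fully connected layer and softmax; dropout is omitted at inference). The 16-dimensional input (from m = 8), the ReLU activations and the random weights are our illustrative assumptions, not details stated in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels, biases):
    """'Valid' 1-D convolution followed by ReLU; x is (channels_in, length),
    kernels is (channels_out, channels_in, 5)."""
    c_out, c_in, k = kernels.shape
    L = x.shape[1] - k + 1
    out = np.empty((c_out, L))
    for o in range(c_out):
        for i in range(L):
            out[o, i] = np.sum(kernels[o] * x[:, i:i + k]) + biases[o]
    return np.maximum(out, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# randomly initialized weights, for illustration only
W1, b1 = 0.1 * rng.standard_normal((16, 1, 5)), np.zeros(16)   # conv 1: 16 kernels
W2, b2 = 0.1 * rng.standard_normal((32, 16, 5)), np.zeros(32)  # conv 2: 32 kernels
Wf, bf = 0.1 * rng.standard_normal((2, 32 * 8)), np.zeros(2)   # fully connected

x = rng.standard_normal((1, 16))            # one 16-dim VAR feature vector (m = 8)
h = conv1d_relu(conv1d_relu(x, W1, b1), W2, b2)  # length 16 -> 12 -> 8
p = softmax(Wf @ h.ravel() + bf)            # two-class probabilities

n_params = sum(w.size for w in (W1, b1, W2, b2, Wf, bf))
```

Under these assumed sizes the total parameter count is 3,202, consistent with the paper's statement that the model has fewer than 5,000 main training parameters.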

Classification of Emotions
During the CNN training step, each subject's data were used to train his or her own classification model. The whole data set was separated into a training set (80% of the data) and a testing set (20% of the data). We performed a 10-fold cross-validation procedure in the model training step, i.e., nine folds of the training set were used for training and the remaining fold was used for validation. To compare with our proposed method, we employed the conventional CVS classification method.
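The split described above can be sketched as follows; the function name, shuffling and seed are our assumptions, not the authors' code:

```python
import numpy as np

def split_and_folds(n_samples, n_folds=10, test_frac=0.2, seed=0):
    """Shuffle sample indices, hold out test_frac of them for testing, and
    split the remaining training indices into n_folds cross-validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    folds = np.array_split(train_idx, n_folds)
    # each CV round: one fold validates, the other nine train
    rounds = [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
              for k in range(n_folds)]
    return train_idx, test_idx, rounds
```

Each of the ten rounds trains on 90% of the training set and validates on the held-out fold, while the test set never enters model selection.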

CSP_VAR with SVM (CVS)
The variance extracted from each row of the dimensionality-reduced matrix is used as a feature to form a 2m-dimensional feature vector, which is then trained and classified with the SVM. The classification accuracy of this model for different values of m is shown in Tab. 2 (to preserve more precision for the subsequent curve fitting, the accuracy is expressed in decimal form here). In the table, "aw", "ll" and so on denote the subjects, and "Ave" is the average accuracy. It is difficult to see the influence of m on the classification from Tab. 2 directly. Therefore, we averaged the accuracy over the subjects and performed curve fitting on the average. The processing results are shown in Fig. 6: the histogram shows the average accuracy as a function of m, from which it can be found that a larger m is not always better, and the dotted line is the fitted curve, which shows that the classification accuracy first increases and then decreases as m increases. When m=8, the average classification accuracy is the highest, and as m rises further, the dimensionality reduction effect of CSP becomes smaller and smaller. Therefore, we finally choose m=8 for the CSP dimensionality reduction. The final classification result of the CVS system is shown in Tab. 3.
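The curve-fitting step for selecting m can be sketched as follows. The accuracy values below are invented placeholders that merely mimic the described rise-then-fall shape (the paper's per-m accuracies are in Tab. 2), and the choice of a quadratic fit is our assumption:

```python
import numpy as np

# hypothetical average accuracies for m = 1..10 (illustrative values only)
m = np.arange(1, 11)
avg_acc = np.array([0.60, 0.63, 0.65, 0.67, 0.68,
                    0.69, 0.695, 0.698, 0.695, 0.69])

# fit a quadratic a*m^2 + b*m + c to the accuracy-vs-m curve
coeffs = np.polyfit(m, avg_acc, 2)
a, b, _ = coeffs
m_best = -b / (2 * a)   # vertex of the parabola: m with highest fitted accuracy
```

Because the fitted parabola opens downward, its vertex locates the m beyond which adding CSP components stops helping; rounding the vertex gives the chosen value of m.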

CSP_VAR with CNN (CVC)
The feature extraction process of the CVC system is the same as that of the CVS system, except that the CVC system uses a CNN instead of an SVM for classification. In the previous section, curve fitting showed that the CSP parameter m=8 gives the best dimensionality reduction effect for the EEG data. In this section, the dimensionality is reduced in the same way, and the standardized variance is used as the feature. The classification accuracy is shown in Tab. 4.

Classification results
The classification results of CVC and CVS are shown in Tab. 5. According to the results in Tab. 5, the CVC system performs slightly better than the CVS system.

Conclusion
In this paper, we apply the CVC system to picture-induced EEG emotion classification. In order to compare the classification effects of the two systems, we first classify the sample data with the CVS system and select the optimal dimensionality reduction by curve fitting. Next, we use the same dimensionality reduction and choose the standardized variance as the feature to establish the CVC system. The classification results show that this emotion recognition system improves the overall emotion recognition rate: the variance of the accuracy has been reduced to 0.0067, a decrease of 64% compared with that of the CVS system; meanwhile, the average accuracy reaches 69.84%, which is 0.79% higher than that of the CVS system. The results show that the overall recognition is more stable, making the proposed system more suitable as a method of emotion recognition.