Emotion Recognition Using WT-SVM in Human-Computer Interaction

Abstract: With the continuous development of computers, people's demands on them keep growing, and the brain-computer interface (BCI) has become an essential part of computer research. Emotion recognition is an important task for the computer to understand social status in BCI. Affective computing (AC) aims to develop models of emotion and to advance the affective intelligence of computers. Among the various emotion recognition approaches, the method based on the electroencephalogram (EEG) is more reliable, because it is more accurate and more objective than external cues such as facial expression and gesture. In this paper, we use the wavelet transform (WT) to extract three kinds of EEG features in the time and frequency domains: sub-band energy, energy ratio and root mean square of the wavelet coefficients. They reflect emotion-related EEG activity well. The average classification accuracy of the support vector machine (SVM) reaches 82.87%, which indicates that these three features are very effective for emotion recognition. In addition, compared with the international affective picture system (IAPs), EEG data collected under stimulation by the Chinese affective picture system (CAPs) yield a higher emotion recognition rate, indicating that emotions differ with cultural background.


Introduction
Research efforts in emotion recognition focus on ways to understand the working mechanisms of the human brain, e.g., through speech recognition and facial recognition systems. Despite the tremendous achievements in this area over the past few years, many problems remain, and many researchers are working to resolve them. Besides, there is another critical but neglected modality, physiological signals, that may be important for more natural interaction: emotion plays an essential role in understanding messages from others in many forms. Numerous areas in emotion recognition and brain-computer interface (BCI) research could make effective use of the capability to understand emotion. Although they are few compared with the efforts devoted to intention translation, some researchers are trying to realize BCI through the EEG signal alone.
Most scientists have focused on facial expression recognition and speech signal analysis. More and more researchers have realized, however, that emotional state can be expressed in a variety of biological signals. EEG studies of negative and positive emotions have found different responses to the two in the frontal region of the brain. In a positive mood, EEG activity in the left anterior hemisphere increased, and the asymmetry of hemispheric activity was higher than for negative emotion. Davidson and Fox also found that asymmetric activity in the frontal region for both positive and negative emotions already exists during infancy. They concluded that although both positive and negative emotions are processed in the left hemisphere, positive emotions cause more EEG activity in the left region than negative emotions. EEG has the following characteristics.
(1) Strong background noise and weak signal amplitude. The amplitude of the EEG signal is measured in μV, and the evoked potential (EP) signal is weaker still. The collected EEG signal contains many interference components, so the signal-to-noise ratio is low.
(2) Non-linearity. The brain is an advanced biological tissue whose regulation and adaptability are reflected in its electrophysiological signals, which makes these signals non-linear.
(3) The spectral characteristics of EEG are more prominent than its time-domain characteristics. Compared with other physiological signals, EEG benefits more from analysis methods based on spectral characteristics.
(4) EEG signals are usually recorded with multiple electrodes, and the leads are correlated. How to use these leads effectively to extract the crucial features hidden in multi-channel EEG signals is a significant problem in EEG research.
We believe that EEG offers a more natural means of emotion recognition: the influence of emotion on outward signs such as eye movement and facial expression can be suppressed relatively quickly, whereas emotional status is inherently reflected in the activity of the nervous system.

Related Work
Based on the above characteristics of EEG, many methods of EEG signal analysis and processing have been proposed. These methods are mainly divided into time-domain analysis, frequency-domain analysis, time-frequency analysis and non-linear analysis.
Time-domain analysis of EEG. Early EEG analysis mainly used time-domain methods. By studying the EEG time-domain waveform with techniques such as variance analysis, histogram analysis, correlation analysis, peak detection and zero-crossing detection, important time-domain features of the EEG are obtained.
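As an illustration of the time-domain descriptors named above, the following minimal sketch computes the variance, the zero-crossing count and the local peaks of a single channel. The function name and the toy samples are assumptions made for illustration; they are not taken from the experiment.

```python
import statistics


def time_domain_features(samples):
    """Return simple time-domain descriptors of one EEG channel."""
    mean = statistics.mean(samples)
    centered = [x - mean for x in samples]
    # Population variance of the waveform.
    variance = statistics.pvariance(samples)
    # Zero crossings: sign changes of the mean-removed signal.
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if a * b < 0
    )
    # Local peaks: samples strictly larger than both neighbours.
    peaks = [
        i for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] > samples[i + 1]
    ]
    return {
        "variance": variance,
        "zero_crossings": zero_crossings,
        "peaks": peaks,
    }


# Toy samples (hypothetical values in microvolts).
signal = [1.0, 2.0, -1.0, -2.0, 1.0, 3.0, -1.0, -2.0]
features = time_domain_features(signal)
```

Each descriptor is a single pass over the waveform, which is why such features were practical in early EEG analysis.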
Frequency-domain analysis of EEG. Power spectrum analysis of the EEG signal, using classical or modern spectrum estimation methods, yields the frequency characteristics of the signal.
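A classical spectrum estimate can be sketched with the periodogram, i.e. the squared magnitude of the discrete Fourier transform. The sketch below uses a direct DFT so that it needs only the standard library; the sampling rate of 128 Hz and the 10 Hz test tone are assumptions chosen for illustration, and the two-sided scaling is one of several conventions.

```python
import cmath
import math


def periodogram(samples, fs):
    """Classical periodogram estimate for frequencies 0 .. fs/2."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient at frequency bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        freqs.append(k * fs / n)
        # Two-sided power spectral density scaling.
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


fs = 128.0  # assumed sampling rate in Hz
n = 128     # one second of data
signal = [math.sin(2 * math.pi * 10.0 * t / fs) for t in range(n)]
freqs, power = periodogram(signal, fs)
peak_index = max(range(len(power)), key=power.__getitem__)
peak_freq = freqs[peak_index]
```

For long recordings a fast Fourier transform would replace the direct sum; the estimate itself is unchanged.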
Time-frequency analysis of EEG. Because the EEG signal is strongly non-stationary, the wavelet transform has received more and more attention in EEG signal processing and has become a distinct research direction.
Non-linear analysis of EEG. Given the non-linear characteristics of EEG, non-linear dynamic parameters are extracted from the perspective of non-linear dynamics, such as complexity measures, the Lorenz scatter plot, the correlation dimension D2, Kolmogorov entropy and the Lyapunov exponent.
Building on EEG studies of emotional and cognitive processes and on the dynamics of the generation and regulation of positive, neutral and negative emotions, a collaborative processing model is developed. First, stimuli that can effectively induce different emotions are chosen, and the emotional paradigm (matching of positive and negative stimuli) and the dynamic processing of emotional regulation are used to arrange the order of stimulus presentation and to vary the cognitive task requirements (guidelines). Then the EEG experiment on stimulus evaluation is carried out to obtain useful EEG emotion data. On this basis, EEG features of different emotional states are extracted, and corresponding classifiers are designed to improve the recognition rate of the feature classification algorithm.

Emotion Recognition System Based on EEG
The design of emotion recognition systems has become popular, and many such systems are described in the literature. Professor Kim designed an emotion recognition system based on physiological signals, with electrocardiogram, skin temperature variation and electrodermal activity as input signals. The correct classification ratios for 50 subjects were 78.4% and 61.8% for the recognition of three and four emotion categories, respectively [1].
Face emotion recognition (FER) is achieved in two parts: image processing and classification.
The first part investigates a set of image processing methods suitable for recognizing facial emotion [2]. The acquired pictures then undergo a few preprocessing steps, including Gaussian smoothing. An artificial neural network trained with the Levenberg-Marquardt algorithm was developed for emotion recognition through facial expression [3][4][5][6]. Chen et al. [7] proposed an enhanced speech emotion recognition system to improve recognition accuracy. Furthermore, they modified the traditional KNN classifier to reduce the disturbance caused by noise. Their experimental results show a 3%-5% relative improvement over the conventional method.
With the rising interest in brain-computer interaction (BCI), EEG has been analyzed as an essential factor for emotion recognition. Whether the EEG merely shows a physiological response or also gives insight into how the emotion is experienced mentally, the accuracy of EEG-based recognition of artificially evoked emotion is only about 60% or less. Still, References [8,9] show the suitability of EEG for this kind of task.
Although several automatic emotion recognition systems have explored the use of either facial expressions or speech to detect human affective states, relatively few efforts have focused on emotion recognition using both modalities [10,11]. It is hoped that the multimodal approach will give not only better performance but also more robustness when one of these modalities is acquired in a noisy environment [12]. The data acquisition system used in this study consists of a picture acquisition system, an EEG acquisition system and an eye movement acquisition system. In the EEG acquisition system, stimulation in the form of animation, video, pictures or sound generated by computer PC1 acts on the subject, and the subject's EEG signal is collected and transmitted to PC2 through an amplifier. The picture acquisition system is composed of a camera in front of the subject and computer PC3, which receives the subject's facial expression. All the collected information is stored in a multi-ethnic, large-scale EEG database built on Oracle and a Dell disk storage array (12 TB), as shown in Fig. 1. In order to acquire EEG, eye tracking and facial video synchronously, we have built a cognitive experimental platform with synchronized acquisition; the platform can easily be standardized for carrying out cognitive experiments.
On this platform, EEG, eye tracking and facial signals can be acquired under the same stimulus experiment, which will enhance the accuracy of emotion recognition. All information obtained will be analyzed with existing feature extraction, feature selection and classification algorithms.

Data Work Mode
During EEG acquisition, a marker value is sent to the data pins (pins 2 to 9) of the specified parallel port, and the corresponding binary number is stamped on the timeline of the EEG data. Markers are written at a defined interval, for example one mark every 100 frames, so that the video data and the EEG data can be synchronized through the shared marks. Marking every specified number of frames eliminates the accumulated timing error and improves the performance of the multimodal emotion system.
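The marker scheme can be sketched as follows. The first function splits an 8-bit marker value over data pins 2 to 9 of a standard parallel port (pin 2 carrying the least significant bit); the second maps a video frame to an EEG sample by interpolating between the two enclosing markers, so that timing error cannot accumulate beyond one marker interval. The function names and the marker positions are hypothetical and serve only as an example.

```python
def encode_marker(value):
    """Return the logic level of data pins 2-9 for an 8-bit marker."""
    if not 0 <= value <= 255:
        raise ValueError("marker must fit in 8 bits")
    # Pin 2 carries bit 0, pin 9 carries bit 7.
    return {pin: (value >> (pin - 2)) & 1 for pin in range(2, 10)}


def frame_to_eeg_sample(frame, marker_frames, marker_samples):
    """Map a video frame index to an EEG sample index.

    Linear interpolation between the two enclosing markers keeps the
    clock drift bounded by one marker interval.
    """
    pairs = list(zip(marker_frames, marker_samples))
    for (f0, s0), (f1, s1) in zip(pairs, pairs[1:]):
        if f0 <= frame <= f1:
            return s0 + (frame - f0) * (s1 - s0) / (f1 - f0)
    raise ValueError("frame lies outside the marked range")


# Hypothetical markers: one mark every 100 frames, with the EEG clock
# drifting slightly against the video clock.
marker_frames = [0, 100, 200]
marker_samples = [0, 4010, 7990]
pins = encode_marker(5)
sample = frame_to_eeg_sample(150, marker_frames, marker_samples)
```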
In this system, the EEG information database should contain not only EEG information but also the personal data and emotional state of the subjects. Because subjects often take more than one EEG test, separate tables must be defined for the subject's personal information and for the EEG data. Likewise, according to the correspondence between EEG information and emotional state, the subject's data and emotional state are listed separately. The final EEG information database contains three tables. The EEG data table stores the EEG data of each EEG test; its primary key is the EEG number, and its foreign key is the primary key of the test table. The test table stores the personal data of the subject; its primary key is the subject number. The emotional state table stores the subject numbers together with the emotional states; its primary key is also the subject number. The relationship between the tables is shown in Fig. 2.
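The three-table layout can be sketched with an in-memory SQLite database. This is only an illustration of the key relationships described above: the actual system uses Oracle, and all table and column names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Test table: personal data, keyed by subject number.
    CREATE TABLE subject (
        subject_no INTEGER PRIMARY KEY,
        name       TEXT,
        sex        TEXT
    );
    -- Emotional state table, also keyed by subject number.
    CREATE TABLE emotional_state (
        subject_no INTEGER PRIMARY KEY REFERENCES subject(subject_no),
        state      TEXT
    );
    -- EEG data table: one row per EEG test, keyed by EEG number,
    -- with a foreign key to the subject.
    CREATE TABLE eeg_data (
        eeg_no     INTEGER PRIMARY KEY,
        subject_no INTEGER NOT NULL REFERENCES subject(subject_no),
        recording  BLOB
    );
    """
)

# One subject with two EEG tests (hypothetical rows).
conn.execute("INSERT INTO subject VALUES (1, 'Gxb', 'F')")
conn.execute("INSERT INTO emotional_state VALUES (1, 'joy')")
conn.execute("INSERT INTO eeg_data VALUES (101, 1, NULL)")
conn.execute("INSERT INTO eeg_data VALUES (102, 1, NULL)")

rows = conn.execute(
    """
    SELECT s.name, e.state, COUNT(d.eeg_no)
    FROM subject AS s
    JOIN emotional_state AS e ON e.subject_no = s.subject_no
    JOIN eeg_data AS d ON d.subject_no = s.subject_no
    GROUP BY s.subject_no
    """
).fetchall()
```

Keeping the EEG tests in their own table is what allows one subject to have several recordings without duplicating the personal data.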

Method and Data
When designing the emotion stimulus experiment, E-Prime software was used to design and write the emotion-inducing files. E-Prime is an experiment generation system for computerized behavioral research jointly developed by the University of Pittsburgh and the PST company in the United States. It is a globally recognized professional design package for psychological experiments, and its timing accuracy reaches the millisecond level.
As mentioned earlier, picture stimulation can induce a relatively single emotion, so the experiment can exclude the influence of other factors and is more targeted; it has therefore been widely used. However, because pictures are presented only briefly, it is challenging to induce stable and sustained emotions. To solve this problem, we designed the induction files to produce emotion by continuous stimulation with similar pictures. Similar pictures are highly consistent in arousal and valence, so switching between them largely avoids fluctuation of the subjects' emotion. Besides, because the content of the pictures differs, this method avoids the fatigue or even restlessness that the monotonous content of a single picture presented for a long time would cause.
Taking 90 pictures each from IAPs and CAPs as stimulus materials, two stimulus files were designed with the same rules. Taking IAPs as an example, every 5 pictures of the same category, without repetition, were grouped into one stimulus unit, giving 18 groups in total. Within each group, each picture was shown for about 2 s (2000 ± 200 ms), with no interval between pictures. The stimulus file consists of two parts, practice and test. It opens with the instructions, which explain the experimental procedure and operating rules to the subjects. Then comes the practice part, comprising three stimulus units, so that the subjects become familiar with the task and can press the keys accurately. After the participants confirm that they have completed the practice, they enter the formal test. The test has 18 groups of stimulus units, which are presented in random order, and the subjects give feedback according to their emotional feelings. Before each group, a red cross is shown in the centre of the screen for 3 s to remind the subjects to pay attention. Then one group of stimuli, namely 5 consecutive pictures, is displayed. After the pictures, a 1 s prompt asks the subjects to give feedback according to their emotional experience. Finally, an 8 s black screen clears the influence of the previous group of stimuli. In the practice part, a message after each key press indicates whether the key was correct; there is no such message in the formal test. The detailed process is shown in Fig. 3.
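The timing of the formal test follows directly from the durations given above, as the short calculation below shows. It assumes that the feedback falls within the 1 s prompt and ignores the instruction and practice parts; the constant names are chosen for illustration.

```python
# Durations in milliseconds, as described in the text.
FIXATION_MS = 3000         # red cross before each group
PICTURE_MS = 2000          # nominal presentation time per picture
PICTURE_JITTER_MS = 200    # stated tolerance of the picture duration
PROMPT_MS = 1000           # feedback prompt
BLANK_MS = 8000            # black screen after each group
PICTURES_PER_GROUP = 5
TOTAL_PICTURES = 90

GROUPS = TOTAL_PICTURES // PICTURES_PER_GROUP


def trial_ms(picture_ms):
    """Duration of one stimulus unit for a given picture duration."""
    return (FIXATION_MS + PICTURES_PER_GROUP * picture_ms
            + PROMPT_MS + BLANK_MS)


nominal_trial = trial_ms(PICTURE_MS)
shortest_trial = trial_ms(PICTURE_MS - PICTURE_JITTER_MS)
longest_trial = trial_ms(PICTURE_MS + PICTURE_JITTER_MS)
session_minutes = GROUPS * nominal_trial / 60000
```

Under these assumptions one stimulus unit lasts 22 s nominally (between 21 s and 23 s with the stated tolerance), and the 18 units of one stimulus file take about 6.6 min.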

Practice Exercises
Requirements: the subjects become familiar with the experimental conditions; the results of the subjects' key responses are fed back to them, which helps them judge whether they can carry out the experimental operation correctly; and the subjects can control the pace of the experiment themselves, as shown in Fig. 4.

Formal Experiment
The formal experiment is divided into three parts: joy, disgust and neutral emotion. In this process, shown in Fig. 5, the subjects' key responses are recorded, and their EEG data are recorded synchronously. The experiment adopted a non-invasive EEG acquisition mode. Electrodes are placed on the principle of the 10-20 system: the relative distances between scalp electrodes are expressed as 10% and 20%, with the sagittal and coronal lines used as the marker lines.
The collected raw EEG signal also contains many interference signals, such as EOG and EMG, so the EEG signal must be de-noised.
The simplest way to reduce the noise in the EEG signal is to directly remove the segments containing EOG or EMG from the collected signal. However, these segments often also contain part of the EEG signal, so doing so inevitably loses some of the original EEG. If the interference is infrequent and the accuracy required by the test is not high, the loss of EEG signal will not have a significant impact on the results. However, if the interference is frequent, direct removal will severely affect the performance and accuracy of the subsequent EEG emotion classification algorithm. In order to improve the accuracy of noise reduction, many new noise reduction methods have been proposed.
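Direct removal can be sketched as amplitude-based epoch rejection: any epoch whose peak-to-peak amplitude exceeds a threshold is discarded as contaminated. The 100 μV threshold, the function name and the toy epochs are assumptions for illustration; the text does not state how contaminated segments were identified.

```python
def reject_artifact_epochs(epochs, threshold_uv=100.0):
    """Drop epochs whose peak-to-peak amplitude exceeds the threshold.

    Returns the kept epochs and the indices of the dropped ones.
    """
    kept, dropped = [], []
    for index, epoch in enumerate(epochs):
        if max(epoch) - min(epoch) > threshold_uv:
            dropped.append(index)
        else:
            kept.append(epoch)
    return kept, dropped


# Toy epochs in microvolts; the second one has a blink-like excursion.
epochs = [
    [5.0, -8.0, 12.0, -3.0],
    [4.0, 150.0, -90.0, 6.0],
    [-7.0, 9.0, -2.0, 3.0],
]
kept, dropped = reject_artifact_epochs(epochs)
loss_fraction = len(dropped) / len(epochs)
```

The value of `loss_fraction` makes the trade-off explicit: every rejected epoch also discards the EEG activity it contained, which is why this approach becomes costly when interference is frequent.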

Feature Extraction of EEG Signal
After the noise is removed, the features of the EEG signal are extracted for the subsequent classification.
Using suitable mathematical tools and the mapping between physiological signals and emotional state, we can describe the emotional state through physiological features extracted from the physiological signal. These features include the amplitude in the time domain, the frequency components in the frequency domain, the phase, and the spatial relationship between the electrodes [13,14]. The correspondence between emotional states and physical or behavioral characteristics is a basic premise of affective computing theory; however, this correspondence is not yet very clear and needs further exploration and research. Commonly used feature extraction methods can be divided into time-domain and frequency-domain methods [15][16][17], spatial methods [18,19], and combined spatiotemporal methods. Among them, time-domain and frequency-domain methods include the discrete Fourier transform, the wavelet transform and regression models. Spatial methods include independent component analysis and multiple regression models, and spatiotemporal methods include spatiotemporal filtering, spatiotemporal complexity and spatiotemporal synchronization models. Guyon et al. [20] used the spatial patterns method, which is common in EEG data processing, to extract characteristic signals; this approach proved to be very effective and adapts well.
Conventional methods of feature extraction include autoregressive models, the Fourier transform, the surface Laplacian, the wavelet transform, etc.
Considering that the EEG signal is a time-varying, non-stationary signal, traditional time-domain and frequency-domain features cannot clearly distinguish the frequency components and some transient details within a specific time range. The multi-resolution analysis of the wavelet transform can extract the time-frequency characteristics of EEG signals. Compared with the short-time Fourier transform, the basis function of the wavelet transform is not unique; this is the flexibility mentioned in the previous section. Different wavelet functions have their own characteristics and scope of application, so the selection of the wavelet basis function is an essential problem in wavelet transform analysis.
In this section, considering the orthogonality and compactness of Daubechies wavelet and the approximately optimal positioning characteristics of DB4 wavelet base [20], we use this function as a wavelet base function to transform the EEG signal and extract the time-frequency characteristics.
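A multi-level decomposition with this wavelet base can be sketched in plain Python. The sketch assumes that DB4 denotes the 8-tap Daubechies filter with four vanishing moments, as in common wavelet toolboxes, and it uses periodic signal extension, which is a simplification chosen here and not stated in the text. The sampling rate and the test signal are likewise illustrative.

```python
import math

# Low-pass (scaling) filter of the 8-tap Daubechies wavelet.
DB4_LO = [
    -0.010597401785069032, 0.0328830116668852,
    0.030841381835560764, -0.18703481171909309,
    -0.027983769416859854, 0.6308807679298589,
    0.7148465705529157, 0.2303778133088965,
]
# Quadrature mirror high-pass filter: g[n] = (-1)^n * h[L-1-n].
DB4_HI = [(-1) ** n * DB4_LO[len(DB4_LO) - 1 - n]
          for n in range(len(DB4_LO))]


def dwt_step(signal):
    """One analysis level with periodic extension."""
    n = len(signal)
    if n % 2:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for k in range(n // 2):
        window = [signal[(2 * k + i) % n] for i in range(len(DB4_LO))]
        approx.append(sum(h * x for h, x in zip(DB4_LO, window)))
        detail.append(sum(g * x for g, x in zip(DB4_HI, window)))
    return approx, detail


def wavedec(signal, levels=5):
    """Return the bands [D1, D2, ..., D_levels, A_levels]."""
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


fs = 256  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 10 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 40 * t / fs)
          for t in range(256)]
bands = wavedec(signal, levels=5)
energy_in = sum(x * x for x in signal)
energy_out = sum(c * c for band in bands for c in band)
```

Because the filter pair is orthonormal, `energy_out` matches `energy_in` up to rounding, so the energy of the signal is simply redistributed over the sub-bands.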
The time-frequency characteristics of each sub-band are calculated from the wavelet decomposition coefficients. The signal has five sub-bands, and each band has three kinds of features. The dimension of the feature vector is therefore 5 × 3 × 62 = 930, where 62 is the number of EEG leads.
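The three features can be sketched for one lead as follows, using common definitions that are assumed here because the text does not give explicit formulas: the energy of a sub-band is the sum of its squared coefficients, the energy ratio is that energy divided by the total over all sub-bands, and the root mean square is the square root of the mean squared coefficient. The coefficient values are toy numbers.

```python
import math


def subband_features(bands):
    """Energy, energy ratio and RMS for each sub-band of one lead."""
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    features = []
    for band, energy in zip(bands, energies):
        ratio = energy / total if total else 0.0
        rms = math.sqrt(energy / len(band))
        features.extend([energy, ratio, rms])
    return features


# Toy wavelet coefficients for five sub-bands of a single lead.
bands = [[1.0, -1.0, 1.0, -1.0], [2.0, 0.0], [0.0, 1.0], [3.0], [1.0]]
features = subband_features(bands)

N_LEADS = 62
vector_length = len(features) * N_LEADS
```

Concatenating the 15 values of each of the 62 leads gives the 930-dimensional feature vector.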

Classification of EEG Signals
After feature extraction, the EEG signals are classified according to their features. Generally speaking, classifiers can be divided into linear and non-linear classifiers. Conventional linear classifiers include linear discriminant analysis (LDA), logistic regression, etc.; standard non-linear classifiers include the support vector machine (SVM), neural networks (NN), etc.
When physiological signals are used to recognize emotional states, the physiological characteristics of a particular emotional state are not unique, owing to the complexity of emotion. While ensuring that the description of the emotional states meets a certain accuracy, we minimize the number of physiological features to reduce the complexity of the system and keep it realistic, and we look for the combination of features most representative of the corresponding emotional state. The feature selection problem is essentially a combinatorial problem over multidimensional variables and has been extensively studied and applied in many fields. In emotion recognition based on physiological signals [21], extracting more features should in principle improve the ability to distinguish emotions. In practice, however, owing to the limitations of the algorithms, more features do not necessarily give the system a stronger ability to discriminate, which shows that the feature set must contain some redundant features. Bian et al. [22] noted that different feature selection algorithms emphasize different benefits. The benefits of feature selection are that it aids data visualization and data understanding; reduces the requirements for data measurement and storage; reduces the training and application time; and counters the "curse of dimensionality", improving the prediction performance of the system.
The task of feature selection [22] is to select d optimal features from a set of D features (D > d). It is a combinatorial optimization problem, so methods for solving optimization problems can be applied to it. Commonly used methods are the branch and bound method, sequential forward selection (SFS), sequential backward selection (SBS), the plus-l-minus-r method, sequential floating forward selection (SFFS), sequential floating backward selection (SFBS), etc. Intelligent heuristic algorithms can also be used, such as simulated annealing, genetic algorithms, tabu search and particle swarm optimization.
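Sequential forward selection, one of the methods listed above, can be sketched as a greedy loop that repeatedly adds the feature giving the best score for the enlarged subset. The scoring function below is a hypothetical stand-in built from made-up relevance and redundancy values; in practice the score would be, for example, the cross-validated accuracy of the classifier on the candidate subset.

```python
def sequential_forward_selection(n_features, d, score):
    """Greedily pick d of n_features, maximizing score(subset)."""
    selected = []
    remaining = set(range(n_features))
    while len(selected) < d:
        # Try each remaining feature and keep the best addition.
        best = max(sorted(remaining),
                   key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical scoring: individual relevance minus a penalty for
# redundant pairs (features 0 and 1 carry nearly the same information).
RELEVANCE = [0.9, 0.85, 0.3, 0.6]
REDUNDANCY = {(0, 1): 0.8}


def toy_score(subset):
    total = sum(RELEVANCE[f] for f in subset)
    for i in subset:
        for j in subset:
            if i < j:
                total -= REDUNDANCY.get((i, j), 0.0)
    return total


chosen = sequential_forward_selection(len(RELEVANCE), d=2, score=toy_score)
```

In this toy setting the second most relevant feature is skipped because it is redundant with the first, which is the behaviour that motivates feature selection in the first place.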
In the process of emotion recognition, feature selection is only an intermediate stage of the learning system. The sample set corresponding to the finally chosen feature subset is then processed by classification or regression analysis, so the final result of feature selection is judged by the learning algorithm. The classification algorithms [23][24][25] often used in conjunction with feature selection are the K-nearest neighbor algorithm, the BP algorithm, the multilayer perceptron (MLPN), the Fisher projection criterion, the support vector machine (SVM), the linear discriminant function (LDF), the probabilistic neural network (PNN) and the Bayesian classifier. In recent years, researchers have mostly chosen deep learning networks for classification problems [26][27][28], but these algorithms are more suitable for large-sample data. Considering the data size in this article, we choose the more mature support vector machine algorithm to classify emotions.
There were 6 subjects in this experiment, and 12 groups of experimental data were collected under stimulation by the two sets of pictures. In each group of data, we randomly selected 4/5 of the samples for training and used the remaining 1/5 as test samples. Sample learning and classification were carried out with the LIBSVM tool developed by Dr. Chih-Jen Lin. We used cross-validation to optimize the SVM parameters C and γ.
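The evaluation protocol can be sketched as a random 4/5 to 1/5 split followed by a cross-validated grid search over C and γ. The exponential grid is the one suggested in the LIBSVM practical guide; the number of folds, the sample count and the function names are assumptions. The scoring callback is a placeholder with a known optimum: in a real run it would train an RBF-kernel SVM on the fitting indices and return the accuracy on the validation indices.

```python
import math
import random


def split_train_test(n_samples, train_fraction=0.8, seed=0):
    """Randomly split sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def k_folds(indices, k=5):
    """Partition indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]


def grid_search(train_indices, cv_score, k=5):
    """Return (mean score, C, gamma) of the best grid point."""
    folds = k_folds(train_indices, k)
    best = None
    for c_exp in range(-5, 16, 2):
        for g_exp in range(-15, 4, 2):
            c, gamma = 2.0 ** c_exp, 2.0 ** g_exp
            scores = []
            for i, val in enumerate(folds):
                fit = [idx for j, fold in enumerate(folds)
                       if j != i for idx in fold]
                scores.append(cv_score(fit, val, c, gamma))
            mean = sum(scores) / k
            if best is None or mean > best[0]:
                best = (mean, c, gamma)
    return best


def placeholder_score(fit, val, c, gamma):
    """Stand-in for SVM validation accuracy; peaks at C=2^3, gamma=2^-5."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


train, test = split_train_test(100)  # hypothetical sample count
best_score, best_c, best_gamma = grid_search(train, placeholder_score)
```

The test indices are never seen during the grid search, so the accuracy measured on them remains an unbiased estimate for the selected parameters.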
Tab. 1 shows the recognition accuracy for the three emotions of the six subjects under continuous picture stimulation with CAPs and IAPs, respectively. In the table, "Gxb", "Fym" and so on are the subject identifiers, and "Ave" is the average accuracy. The table shows that the average accuracy can be as high as 82.87%, which demonstrates the effectiveness of sub-band energy, energy ratio and root mean square as emotional features, and also reflects the excellent ability of the wavelet transform to analyze non-stationary signals such as EEG. In addition, inducing the subject's emotions by continuous stimulation with similar emotional pictures on the one hand enhances the arousal effect, and on the other hand avoids the emotional instability caused by constant switching or insufficient waiting. It avoids the situation in which the previous emotion has not yet been fully evoked when the next picture restarts the induction, so the emotion can be sustained for a time, which is consistent with people's cognitive habits. The table also compares men and women and the effectiveness of CAPs and IAPs on Chinese subjects. The emotion recognition accuracy of female subjects is higher than that of male subjects, both individually and on average. This suggests that women's emotions are more easily induced, which is in line with common expectation.

Conclusion
In this paper, we study the emotion classification of EEG, including experimental design, time-frequency feature extraction and feature classification.
First, in terms of experimental design, CAPs and IAPs, two authoritative emotional picture databases, were used as stimulus materials to induce three different emotions in the subjects, namely positive, neutral and negative emotions, by continuous stimulation with similar pictures. The emotion induced by picture stimulation is relatively pure and simple, and the continuous presentation of multiple pictures enhances the inducing effect, stabilizes the emotional state, and avoids the emotional fluctuation and even disorder caused by frequent switching. After collecting the EEG data, we preprocess them to ensure the validity of the data.
Then, we use the wavelet transform to extract three time-frequency features of EEG and use an SVM for training and emotion recognition; the recognition rate reaches 82.87%. For the extraction of time-frequency features, we choose DB4 as the wavelet base and carry out a 5-level wavelet transform. The energy of each sub-band, the energy ratio and the root mean square of the wavelet coefficients are extracted as emotional features. These three features reflect the degree of EEG activity and the weight of each sub-band in emotion recognition. Compared with IAPs, CAPs have a stronger inducing effect on Chinese subjects, reflecting the influence of cultural background on emotion. When extracting EEG signal samples, we took segments of the same length at the onset of picture stimulation and during picture presentation. This reflects the process of emotion induction and the non-linear, time-varying characteristics of EEG.