Elsevier

Neuropsychologia

Volume 146, September 2020, 107506

Emotion recognition with convolutional neural network and EEG-based EFDMs

https://doi.org/10.1016/j.neuropsychologia.2020.107506

Highlights

  • Proposed a novel concept of EFDMs with STFT based on multiple channel EEG signals.

  • Constructed four residual blocks based CNN for emotion recognition.

  • Performed cross-datasets emotion recognition based on deep model transfer learning.

  • Studied the number of training samples used for cross-datasets emotion recognition.

  • Obtained the key EEG information automatically based on EFDMs and Grad-CAM.

Abstract

Electroencephalogram (EEG), as a direct response to brain activity, can be used to detect mental states and physical conditions. Among the various EEG-based emotion recognition studies, owing to the nonlinearity, nonstationarity and individual differences of EEG signals, traditional recognition methods still suffer from complicated feature extraction and low recognition rates. Thus, this paper first proposes a novel concept of electrode-frequency distribution maps (EFDMs) based on the short-time Fourier transform (STFT). A residual-block-based deep convolutional neural network (CNN) is then proposed for automatic feature extraction and emotion classification with EFDMs. To address the shortcomings of small EEG sample sizes and the challenge of individual differences in emotion, which make it difficult to construct a universal model, this paper proposes a cross-datasets emotion recognition method based on deep model transfer learning. Experiments were carried out on two publicly available datasets. The proposed method achieved an average classification score of 90.59% based on short EEG segments on SEED, which is 4.51% higher than the baseline method. Then, the pre-trained model was applied to DEAP through deep model transfer learning with a few samples, resulting in an average accuracy of 82.84%. Finally, this paper adopts gradient-weighted class activation mapping (Grad-CAM) to get a glimpse of what features the CNN has learned from EFDMs during training, and concludes that the high-frequency bands are more favorable for emotion recognition.

Introduction

Human emotion plays an important role in affective computing and human-machine interaction (HMI) (Preethi et al., 2014). Moreover, many mental health issues are reported to be related to emotion, such as depression and attention deficit (Alkaysi et al., 2017), (Bocharov et al., 2017). Signals such as posture, facial expression, speech, skin responses, brain waves and heart rate are commonly used for emotion recognition (Liberati et al., 2015). There is evidence that electroencephalogram (EEG) based methods are more reliable, offering high accuracy and objective evaluation compared with external features (Zheng et al., 2015). Although EEG has poor spatial resolution and requires many sensors placed on the scalp, it provides excellent temporal resolution, allowing researchers to study phase changes related to emotion, and it is non-invasive, fast, and low-cost compared with other psychophysiological signals (Niemic, 2004). Various psychophysiological studies have demonstrated the relationship between human emotions and EEG signals (Sammler et al., 2007), (Mathersul et al., 2008), (Knyazev et al., 2010). With the wide adoption of machine learning in emotion recognition, many remarkable results have been achieved. Sebe et al. summarized studies of emotion recognition with a single modality and described the challenging problem of multimodal emotion recognition (Sebe et al., 2005). Alarcao et al. presented a comprehensive overview of recent work on EEG emotion recognition (Alarcao and Fonseca, 2019). A number of EEG datasets have been built with discrete emotion labels or scores in a continuous emotion space. However, the problem of modeling and detecting human emotions has not been fully investigated (Mühl et al., 2014).
EEG-based emotion recognition remains very challenging because of the fuzzy boundaries between emotion categories and the variability of EEG signals across subjects.

Various feature extraction, selection and classification methods have been proposed for EEG-based emotion recognition (Zhuang et al., 2017). Friston modeled the brain as a large number of interacting nonlinear dynamical systems and emphasized the labile nature of normal brain dynamics (Friston, 2001). Several studies have suggested that the human brain can be considered a chaotic system, i.e., a nonlinear system that exhibits particular sensitivity to initial conditions (Ezzatdoost et al., 2020). The nonlinear interaction between brain regions may reflect the unstable nature of brain dynamics. Thus, for such unstable and nonlinear EEG signals, a nonlinear analysis method such as sample entropy (Jie et al., 2014) is more appropriate than linear methods, which ignore information associated with the nonlinear dynamics of the human brain. Time-frequency analysis methods are based on the spectrum of EEG signals; the power spectral density and differential entropy of sub-band EEG rhythms are commonly used as emotional features (Duan et al., 2013), (Ang et al., 2017). In the last decade, a large number of studies have demonstrated that higher-frequency rhythms such as beta and gamma outperform lower rhythms, i.e., delta and theta, for emotion recognition. Traditional recognition methods are mainly based on the combination of hand-crafted features and shallow models such as k-nearest neighbors (KNN), support vector machines (SVM) and belief networks (BN) (Duan et al., 2012), (Sohaib et al., 2013), (Zubair and Yoon, 2018). However, EEG signals have a low signal-to-noise ratio (SNR) and are often mixed with noise generated during data collection. A more challenging problem is that, unlike image or speech signals, EEG signals are temporally asymmetric and nonstationary, which makes it difficult to obtain clean data for feature extraction during preprocessing.
Nonstationarity means that the statistical properties (mean, variance and covariance) of EEG signals vary with time, partly or totally. Temporal asymmetry refers to the fact that the activated lobes and the degree of activation differ across cognitive activities. Paluš identified these two nonlinear properties of EEG (Palus, 1996). Moreover, traditional manual feature extraction and selection methods are crucial to an affective model and require specific domain knowledge. The dimensionality reduction techniques commonly used for EEG signal analysis are principal component analysis (PCA) and Fisher projection. In general, the cost of these traditional feature selection methods increases quadratically with the number of features included (Dash and Liu, 1997).
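The differential entropy feature mentioned above has a convenient closed form when a band-filtered EEG segment is approximately Gaussian: h = ½ ln(2πeσ²). The following is a minimal illustrative sketch, not the authors' pipeline; the sampling rate and band edges are hypothetical choices.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def differential_entropy(x):
    """DE of a (near-)Gaussian signal: 0.5 * ln(2 * pi * e * variance)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def band_de(x, fs, low, high, order=4):
    """Band-pass filter one EEG channel, then compute its DE (illustrative)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return differential_entropy(filtfilt(b, a, x))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)                    # stand-in for one EEG channel
de_gamma = band_de(x, fs=200, low=30, high=45)   # hypothetical gamma band
```

For unit-variance Gaussian noise the DE approaches 0.5 · ln(2πe) ≈ 1.42, which gives a quick sanity check on any implementation.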

As a form of representation learning, deep learning can extract features automatically through model training (Zhang et al., 2018). Beyond its successful application in the image and speech domains, deep learning has in recent years been applied to physiological signals such as EEG for emotion recognition. Zheng et al. trained an efficient deep belief network (DBN) to classify three emotional states (negative, neutral, and positive) by extracting the differential entropy (DE) of different frequency bands and achieved an average recognition accuracy of 86.65% (Zheng and Lu, 2015). As a typical deep neural network model, the convolutional neural network (CNN) has achieved great progress in computer vision, image processing and speech recognition (Hatcher and Yu, 2018). Yanagimoto et al. built a CNN to recognize emotional valence on DEAP and analyze various emotions with EEG (Yanagimoto and Sugimoto, 2016). Wen et al. rearranged the original EEG signals using Pearson correlation coefficients and fed them into an end-to-end CNN-based model to reduce manual feature engineering, achieving accuracies of 77.98% for valence and 72.98% for arousal on DEAP (Wen et al., 2017).

Feature extraction methods for EEG signals can be divided into time-domain, frequency-domain, and time-frequency-domain approaches (Wang, 2011), (Chuang et al., 2014), (Li et al., 2017). Frequency analysis transforms the EEG signals into the frequency domain for further feature extraction. Since many studies have demonstrated that frequency-domain features are more discriminative, we first propose the novel concept of electrode-frequency distribution maps (EFDMs). Motivated by the successful application of CNNs in speech recognition (Abdelhamid et al., 2014), we build a deep neural network for emotion recognition based on EFDMs. The EFDMs of EEG signals can be regarded as grayscale images; therefore, the proposed EFDMs allow an emotion recognition model to be constructed with a CNN.
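The exact EFDM construction is given in the Methods section; one plausible sketch of the idea, assuming the map stacks a per-electrode time-averaged STFT magnitude spectrum into an electrode × frequency grayscale image (the function name, window length and frequency cutoff below are assumptions, not the paper's parameters):

```python
import numpy as np
from scipy.signal import stft

def efdm(eeg, fs, nperseg=256, fmax=50.0):
    """Illustrative electrode-frequency distribution map.

    eeg: (channels, samples) array. For each electrode, take the STFT
    magnitude, average it over time, keep frequency bins up to fmax, and
    rescale the whole map to [0, 1] so it can be treated as a grayscale image.
    """
    rows = []
    for ch in eeg:
        f, _, z = stft(ch, fs=fs, nperseg=nperseg)
        rows.append(np.abs(z)[f <= fmax].mean(axis=1))
    m = np.array(rows)                               # (electrodes, freq bins)
    return (m - m.min()) / (m.max() - m.min() + 1e-12)

rng = np.random.default_rng(1)
eeg = rng.standard_normal((62, 2000))   # 62-channel SEED-like segment (synthetic)
m = efdm(eeg, fs=200)
```

Treating the result as an image is what makes standard 2-D convolutional architectures directly applicable to multi-channel EEG.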

At present, studies on EEG emotion recognition mainly focus on subject-dependent tasks. For engineering applications, it is impractical to collect a huge amount of subjects' EEG signals in advance to build a universal emotion recognition model that identifies the emotions of every person. Therefore, how to realize subject-independent pattern classification is one tough issue in the practical application of emotion recognition. Traditional emotion recognition models are usually established for a specific task on a small dataset, so they often fail to perform well on new tasks, owing to possible differences in stimulus paradigm, subjects and EEG acquisition equipment. In addition, the training of deep neural networks generally requires a large amount of labeled data, while EEG signals are not as easy to acquire as image, speech or text data. Accordingly, how to train a highly effective classifier from a small number of labeled samples is another issue that needs to be considered. In this paper, transfer learning is employed to address these problems. Among various transfer learning methods, one is to reuse a pre-trained model from the source domain in the target domain, depending on the similarities of data, tasks and models between them (Pan and Yang, 2010). Transfer learning accelerates the training process by transferring pre-trained model parameters to a new domain task. Since Yosinski et al. published an article on how transferable the features in deep neural networks are, transfer learning has developed rapidly in the field of image processing (Yosinski et al., 2014).
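The transfer strategy described here — reuse a pre-trained model and adapt it with a few labeled target-domain samples — can be sketched generically: freeze the feature extractor and retrain only the classification head. In the toy numpy version below, a fixed random projection stands in for the frozen backbone (in the paper this would be the CNN trained on SEED); all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Frozen "pre-trained" feature extractor: a fixed random projection stands in
# for the backbone whose weights are not updated during adaptation.
W_frozen = rng.standard_normal((128, 32))
features = lambda x: np.tanh(x @ W_frozen)

def fit_head(x, y, lr=0.5, epochs=200):
    """Retrain only a logistic classification head on few target samples."""
    f = features(x)
    w = np.zeros(f.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-f @ w))      # sigmoid prediction
        w -= lr * f.T @ (p - y) / len(y)      # logistic-loss gradient step
    return w

# 12 labeled samples per class, mirroring the few-shot setting used on DEAP
x0 = rng.standard_normal((12, 128)) - 1.0
x1 = rng.standard_normal((12, 128)) + 1.0
x = np.vstack([x0, x1])
y = np.array([0] * 12 + [1] * 12)
w = fit_head(x, y)
pred = (1.0 / (1.0 + np.exp(-features(x) @ w)) > 0.5).astype(int)
train_acc = (pred == y).mean()
```

Because only the small head is trained, the few labeled target samples are far less likely to overfit than if the whole network were fine-tuned.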

We first propose a novel concept of EFDMs based on multi-channel EEG signals. Then a CNN with four residual blocks is built for automatic feature extraction and emotion classification with EFDMs as input. We set up two main experiments. The first evaluates the effectiveness of the proposed method on SEED. The second applies the pre-trained CNN from the first experiment to DEAP for cross-datasets emotion recognition, based on the deep model transfer learning strategy. Finally, we give a neuroscientific interpretation by revealing the key EEG electrodes and frequency bands corresponding to each emotion category, based on the attention mechanism of the deep neural network and the proposed EFDMs.
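The defining computation of a residual block is the identity shortcut y = ReLU(F(x) + x). The numpy sketch below illustrates that structure only; dense weight matrices stand in for the 2-D convolutions of the actual CNN, and the layer sizes are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): two transforms plus an identity shortcut.

    Dense weights stand in for the convolutions of the actual model; the
    shortcut requires F(x) to preserve the shape of x.
    """
    return relu(relu(x @ w1) @ w2 + x)

rng = np.random.default_rng(3)
x = rng.standard_normal((1, 64))                 # one flattened feature vector
ws = [(rng.standard_normal((64, 64)) * 0.1,
       rng.standard_normal((64, 64)) * 0.1) for _ in range(4)]
h = x
for w1, w2 in ws:                                # four stacked residual blocks
    h = residual_block(h, w1, w2)
```

The shortcut lets gradients flow directly through the addition, which is what makes deeper stacks of such blocks trainable in practice.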

Section snippets

Methods

In this section, we detail the general framework of the EFDM-based CNN for emotion recognition, including a short description of the short-time Fourier transform (STFT), the structure and key parameters of the proposed CNN, and a brief introduction to Grad-CAM.
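For readers unfamiliar with Grad-CAM: it weights each feature map of the last convolutional layer by the spatially averaged gradient of the class score, sums the weighted maps, and keeps only the positive part. A minimal sketch on synthetic activations and gradients (the map dimensions below are hypothetical, not the paper's):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from the last convolutional layer.

    activations: (K, H, W) feature maps A_k
    gradients:   (K, H, W) d(class score)/dA_k
    alpha_k is the spatial mean of the gradient for map k;
    the map is ReLU(sum_k alpha_k * A_k), rescaled to [0, 1].
    """
    alpha = gradients.mean(axis=(1, 2))               # (K,) channel weights
    cam = np.tensordot(alpha, activations, axes=1)    # (H, W) weighted sum
    cam = np.maximum(cam, 0.0)                        # keep positive evidence
    if cam.max() > 0:
        cam /= cam.max()                              # normalize for display
    return cam

rng = np.random.default_rng(4)
A = rng.random((8, 9, 62))             # synthetic frequency x electrode maps
G = rng.standard_normal((8, 9, 62))    # synthetic class-score gradients
heat = grad_cam(A, G)
```

Applied to EFDM inputs, hot regions of such a map point at the electrode-frequency combinations the network relied on, which is how the key EEG information is extracted later in the paper.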

Dataset description and analysis

In this section, we describe two EEG emotion recognition datasets, i.e., SEED and DEAP. Then some data preprocessing methods are presented to prepare samples for cross-datasets emotion recognition. Finally, the data distributions across subjects are analyzed.

Experiments and results analysis

We set up two experiments. First, the effectiveness of the proposed method for EEG-based emotion recognition is verified using SEED. Then, based on the deep neural network transfer learning strategy, the pre-trained model is applied to DEAP with 12 training samples of each emotion class.

Conclusion

In this paper, we have provided a solution to tackle the challenge of individual differences in emotion with deep model transfer learning, aiming to build a robust emotion recognition model independent of stimulus, subjects, EEG collection device, etc. We set up two main experiments: within- and cross-datasets emotion recognition. First, the effectiveness of the proposed approach is validated on SEED with an average accuracy of 90.59%. After that, the pre-trained CNN from the first experiment is applied to DEAP through transfer learning.

CRediT authorship contribution statement

Shichao Wu: Methodology, Software, Writing - original draft, Writing - review & editing. Weiwei Zhang: Data curation, Validation. Zongfeng Xu: Resources. Yahui Zhang: Visualization. Chengdong Wu: Investigation. Sonya Coleman: Formal analysis.

References (46)

  • C. Chuang et al.

    Independent component ensemble of EEG for brain–computer interface

    IEEE Transactions on Neural Systems and Rehabilitation Engineering

    (2014)
  • R. Duan et al.

    EEG-based emotion recognition in listening music by using support vector machine and linear dynamic system

International Conference on Neural Information Processing

    (2012)
  • R. Duan et al.

    Differential entropy feature for EEG-based emotion classification

International IEEE/EMBS Conference on Neural Engineering

    (2013)
  • K.J. Friston

    Book Review: Brain function, nonlinear coupling, and neuronal transients

    The Neuroscientist

    (2001)
  • W.G. Hatcher et al.

    A survey of deep learning: platforms, applications and emerging research trends

    IEEE Access

    (2018)
  • X. Jie et al.

    Emotion recognition based on the sample entropy of EEG

    Biomedical Materials and Engineering

    (2014)
  • G.G. Knyazev et al.

    Gender differences in implicit and explicit processing of emotional facial expressions as revealed by event-related theta synchronization

    Emotion

    (2010)
  • S. Koelstra et al.

DEAP: a database for emotion analysis using physiological signals

    IEEE Transactions on Affective Computing

    (2012)
  • Z. Lan et al.

    Domain adaptation techniques for EEG-based emotion recognition: a comparative study on two public datasets

    IEEE Trans. on Cognitive and Developmental Systems

    (2019)
  • Y. Li et al.

    A novel neural network model based on cerebral hemispheric asymmetry for EEG emotion recognition

International Joint Conference on Artificial Intelligence

    (2018)
  • G. Liberati et al.

    Extracting neurophysiological signals reflecting users’ emotional and affective responses to BCI use: a systematic literature review

    NeuroRehabilitation

    (2015)
  • W. Liu et al.

    Emotion recognition using multimodal deep learning

International Conference on Neural Information Processing

    (2016)
  • Y. Lu et al.

    Combining eye movements and EEG to enhance emotion recognition

International Conference on Artificial Intelligence

    (2015)

    This work was supported in part by the National Natural Science Foundation of China under Grant 61973065, Fundamental Research Funds for the Central Universities of China under Grant N172608005 and N182612002, Liaoning Provincial Natural Science Foundation of China under Grant 20180520007.
