EEG based mental state analysis

This work proposes a method for mental state analysis from electroencephalogram (EEG) signal data using a convolutional neural network (CNN). In recent years, deep learning techniques, particularly CNNs, have become very popular. Emotions play a crucial role in our daily life: they affect a person's cognition, behaviour and decision-making. In this paper, we analyse EEG signals and classify them into various emotions, working towards a real-time EEG-based emotion recognition system. The DEAP dataset, a multimodal dataset, has been used in this study for the analysis of human mental states.


Introduction
Emotions are psychological states that can impact a person's cognition, behaviour, decision-making and attention. Past decades focused on emotion recognition methods based on facial expression, speech patterns and body gestures. More recently, however, researchers have recognized that brain signals can be used to assess human emotions effectively. Electroencephalography is a non-invasive neuroimaging technique used to evaluate the electrical activity of the brain. Due to its ease of use, EEG-based mental state analysis has become popular. However, scalp EEG is a random dynamic signal arising from various cortical sources and therefore does not directly reveal exact information. The major challenge, then, is to identify how the EEG can be mapped to momentary mental states. This paper focuses on identifying discriminative features and classifying the emotions appropriately.
The paper proceeds as follows. Related works are summarized in Section II. The methodology is described in Section III. Dataset details, implementation of the model and results are presented in Section IV.

Related Works
An automated emotion recognition system based on a deep learning algorithm and higher-order statistics was proposed by Rahul Sharma et al. [5]. The work focused on higher-order statistics to explore the emotions. Signals were decomposed into sub-bands using the discrete wavelet transform (DWT). The non-linear dynamics of each sub-band signal were then explored with the use of third-order cumulants (ToC). The deep learning technique of long short-term memory (LSTM) was used to classify the EEG signals. The study achieved an accuracy of 82.01% on the DEAP dataset.
Hassouneh et al. [10] put forward a real-time emotion recognition system. The work focused on classifying the emotions of physically disabled people using facial expressions and EEG signals with CNN and LSTM classifiers respectively. The subjects were instructed to express six different emotions in front of the camera, and these were recorded. A mathematical model of the facial markers placed on the subjects' faces was generated, analysed and classified with the help of the CNN. Similarly, the EEG signals were fed to the LSTM classifier. The work achieved an accuracy of 99.81% on facial expressions using the CNN and 87.25% on EEG signals using the LSTM classifier.
In [13], Nitin Kumar et al. presented a bispectral analysis method for recognizing emotions using the DEAP dataset. The bispectrum, the third-order statistic of a signal and a means of identifying phase relationships between its frequency components, was used. The emotions were classified on the basis of a valence-arousal model. The brain signals were filtered into theta, alpha and beta bands, and the bispectrum was calculated using 2 EEG channels. Using the derived bispectrum features, emotion classification was performed with a linear-kernel least squares support vector machine (LS-SVM) and an artificial neural network (ANN) trained with back-propagation. Backward sequential search was employed for feature selection.

Proposed Methodology
Our methodology involves the following steps. First, the raw EEG signals are preprocessed. Second, the wavelet transform is applied to the preprocessed EEG signals. Third, scalogram images are generated. Finally, a CNN is trained on the scalogram images in order to classify the emotions.

Preprocessing
At first, the EEG signals were recorded from 32 subjects using 32 active AgCl electrodes [2] at a sampling rate of 512 Hz. The raw EEG data is then down-sampled to 128 Hz. EOG artefact removal was performed using a blind source separation technique in order to remove the noise caused by eye and muscle movements. A bandpass filter from 4.0 to 45.0 Hz was applied and the data was averaged to a common reference. The data was segmented into 60-second trials and a 3-second pre-trial baseline was removed.
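The chain above can be sketched in code. The following is a minimal NumPy illustration of the resampling, re-referencing and baseline-removal steps only; the EOG removal (a blind source separation routine) and the 4-45 Hz bandpass are deliberately omitted, and the naive every-fourth-sample decimation assumes the bandpass has already limited the signal below the new Nyquist rate. The function name and defaults are our own, not from the paper.

```python
import numpy as np

def preprocess(raw, fs_in=512, fs_out=128, trial_len_s=60, baseline_s=3):
    """Sketch of the preprocessing chain described above.

    raw: array of shape (n_channels, n_samples) at fs_in Hz.
    Returns the 60 s trial at fs_out Hz, common-average referenced,
    with the 3 s pre-trial baseline dropped. (EOG removal and the
    4-45 Hz bandpass are omitted; in practice they come first.)
    """
    # Down-sample 512 Hz -> 128 Hz by keeping every 4th sample
    # (valid only after band-limiting the signal).
    step = fs_in // fs_out
    x = raw[:, ::step]
    # Common average reference: subtract the mean over channels.
    x = x - x.mean(axis=0, keepdims=True)
    # Drop the 3 s pre-trial baseline, keep the 60 s trial.
    start = baseline_s * fs_out
    return x[:, start:start + trial_len_s * fs_out]

raw = np.random.randn(32, (60 + 3) * 512)   # 32 channels, 63 s at 512 Hz
trial = preprocess(raw)
print(trial.shape)                          # (32, 7680): 60 s at 128 Hz
```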

Dataset
The dataset we used is the DEAP dataset for the analysis of human emotions. The dataset was collected from 32 subjects and includes multichannel EEG signals and peripheral physiological signals. A music video clip was shown to each subject, and their feedback was recorded on a continuous 9-point scale in terms of valence, arousal, dominance and liking. A total of 8064 samples per channel were recorded for each of 40 trials per subject. The data is of the format 32×40×40×8064, i.e., 32 subjects, 40 trials/videos, 40 channels and 8064 samples of data [2].
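The layout above can be made concrete with a synthetic stand-in array (the real dataset is distributed per subject, and the loading details depend on the release format, so they are not reproduced here):

```python
import numpy as np

# Synthetic stand-in for one subject of the DEAP layout described
# above: 40 trials x 40 channels x 8064 samples. The full dataset
# stacks 32 such subjects. Note 8064 samples = 63 s at 128 Hz
# (3 s pre-trial baseline + 60 s trial).
n_trials, n_channels, n_samples = 40, 40, 8064
subject = np.zeros((n_trials, n_channels, n_samples), dtype=np.float32)

# One trial: a (channels x samples) EEG matrix.
trial = subject[0]
print(trial.shape)   # (40, 8064)
```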

Wavelet Transform
A wavelet transform (WT) decomposes a signal into a set of basis functions consisting of contractions, expansions, and translations of a mother function ϕ(t), called the wavelet [7]. Given f ∈ L²(R), the continuous wavelet transform (CWT) of f at time u and scale s is defined in terms of dilations and translations of the mother wavelet ϕ(t) as

Wf(u, s) = (1/√s) ∫ f(t) ϕ*((t − u)/s) dt   (1)

where ϕ* denotes the complex conjugate of ϕ. The CWT is used to analyse non-stationary signals at multiple scales: the signal is decomposed into wavelets, and a time-frequency representation is created that helps distinguish noise from the signal. The magnitude of the CWT yields the scalogram [6], which gives good time localization for short-duration, high-frequency events and good frequency localization for low-frequency, longer-duration events.
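A direct NumPy sketch of eq. (1) follows. For simplicity it uses a complex Morlet mother wavelet rather than the Morse wavelets used in the paper's experiments, and computes the CWT by correlating the signal with scaled, translated copies of the wavelet; the function names and the scale grid are illustrative choices, not the paper's settings.

```python
import numpy as np

def morlet(t, s, w0=6.0):
    """Complex Morlet wavelet at scale s (a stand-in mother wavelet;
    the paper's experiments use Morse wavelets instead)."""
    x = t / s
    return np.exp(1j * w0 * x) * np.exp(-x**2 / 2) / np.sqrt(s)

def cwt_scalogram(f, scales, fs=128.0):
    """|Wf(u, s)|: magnitude of the CWT of eq. (1), computed by
    correlating f with scaled/translated copies of the wavelet."""
    n = len(f)
    t = (np.arange(n) - n // 2) / fs          # centred time axis
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        psi = morlet(t, s)
        # cross-correlation via convolution with the reversed conjugate
        w = np.convolve(f, np.conj(psi)[::-1], mode="same") / fs
        out[i] = np.abs(w)
    return out

fs = 128.0
t = np.arange(0, 2, 1 / fs)
sig = np.sin(2 * np.pi * 10 * t)              # 10 Hz test tone
scales = np.linspace(0.02, 0.2, 20)
S = cwt_scalogram(sig, scales, fs)
print(S.shape)                                # (20, 256)
```

For the 10 Hz tone, the scalogram energy concentrates near the scale whose Morlet centre frequency w0/(2πs) is 10 Hz, i.e. s ≈ 0.095.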

Scalogram
Scalograms are used to obtain better time localization for short-duration, high-frequency events and better frequency localization for low-frequency, longer-duration events. Here we need better time localization for short-duration, high-frequency events. The scalogram is a time-frequency plot obtained from the absolute value of the continuous wavelet transform (CWT) of a signal.
The scalogram of f is defined as the magnitude of its continuous wavelet transform:

|Wf(u, s)|   (2)

Model Implementation

Architecture of AlexNet
AlexNet is a CNN architecture consisting of 8 layers: 5 convolutional layers and 3 fully-connected layers. Some of the convolutional layers are followed by maxpooling layers, and the fully-connected layers are followed by a final 1000-way softmax. It is one of the simplest architectures compared to the many more complicated ones introduced in recent years. The input size of the images to AlexNet is 224 x 224 x 3 without padding and 227 x 227 x 3 with padding. The architecture uses ReLU as the activation function for faster convergence and dropout layers to avoid over-fitting. Fully-connected layers are employed for classification. The first convolutional layer has a kernel size of 11 x 11 with stride 4, and the maxpooling uses a filter size of 3 x 3 with stride 2 [8].

Architecture of SqueezeNet
The SqueezeNet architecture consists of, as the name suggests, "squeeze" and "expand" layers. The squeeze convolutional layer has a filter size of 1 x 1; its output is fed into an expand layer that has a mix of 1 x 1 and 3 x 3 convolutional filters. SqueezeNet uses 50x fewer parameters to achieve AlexNet-level accuracy. The model makes use of fire modules: the input image is first fed into a standalone convolutional layer, followed by 8 fire modules. The architecture also utilizes maxpooling with a stride of 2.

Architecture of ShuffleNet
The ShuffleNet architecture was mainly designed for mobile devices with very limited computing power. The architecture makes use of two operations, pointwise group convolution and channel shuffle, through which the feature map channels can encode more information. The use of pointwise group convolution helps to reduce the cost of the dense 1 x 1 convolutions.
In order to overcome the issue of group convolutions blocking information flow between groups and weakening the representation, the channel shuffle operation is introduced. Channel shuffle lets a layer obtain input data from different groups by splitting the channels of each group and distributing them across the other groups; the result is then fed into the next layer with different subgroups.
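The channel shuffle operation described above reduces to a reshape-transpose-reshape on the channel axis. A minimal NumPy sketch (array shapes follow the usual NCHW convention; the function name is ours):

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle: split C channels into g groups,
    swap the group and per-group axes, and flatten back, so that
    channels from different groups are interleaved."""
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# 6 channels in 2 groups: [0 1 2 | 3 4 5] -> [0 3 1 4 2 5]
x = np.arange(6).reshape(1, 6, 1, 1).astype(float)
y = channel_shuffle(x, 2)
print(y.ravel())   # [0. 3. 1. 4. 2. 5.]
```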

Experiment and Results
This section describes the dataset division, the experiment and its setup, as well as the results and performance analysis.

Dataset Division
In order to obtain a greater number of samples to train the CNN, each trial was divided into 4 segments, i.e., the data is now in the format 4×40×40×2016 for each of the 32 subjects.
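The division of each 8064-sample trial into four 2016-sample segments is a single reshape per subject; a NumPy sketch with synthetic data:

```python
import numpy as np

# One subject's trials: 40 trials x 40 channels x 8064 samples.
subject = np.random.randn(40, 40, 8064)

# Split each trial into 4 non-overlapping segments of 2016 samples,
# then move the segment axis to the front: 4 x 40 x 40 x 2016.
segments = subject.reshape(40, 40, 4, 2016).transpose(2, 0, 1, 3)
print(segments.shape)   # (4, 40, 40, 2016)
```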
In our work, we classify the data into four different emotions, Happy, Scared, Sad and Calm, based on the valence and arousal features of the DEAP dataset. Valence and arousal were rated on a scale of 1 to 9. A valence rating greater than 5 indicates high valence, and less than or equal to 5 indicates low valence. Similarly, an arousal rating greater than 5 indicates high arousal, and less than or equal to 5 indicates low arousal. The data is then reorganised into 4 categories on the basis of these emotions.
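The thresholding above maps each (valence, arousal) pair to a quadrant. A small sketch, assuming the conventional quadrant assignment (high-valence/high-arousal = Happy, high/low = Calm, low/high = Scared, low/low = Sad), which the text implies but does not spell out:

```python
def emotion_label(valence, arousal):
    """Map 9-point valence/arousal ratings to the four classes,
    thresholding at 5 as described above. The quadrant-to-emotion
    assignment is the conventional one and is our assumption."""
    high_v = valence > 5
    high_a = arousal > 5
    if high_v and high_a:
        return "Happy"
    if high_v:
        return "Calm"
    if high_a:
        return "Scared"
    return "Sad"

print(emotion_label(7.2, 8.1))   # Happy
print(emotion_label(3.0, 2.5))   # Sad
```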

Experimental Setup
The experiments were performed on two different PCs, one with an Intel(R) Core(TM) i5-10210U CPU at 1.6 GHz and one with an AMD Ryzen 5 3500U CPU at 2.10 GHz. The tool used for generating the scalograms and training the CNN was MATLAB R2020b.

Experiment
The experiments are based on multi-channel EEG signals. The continuous wavelet transform (CWT) is applied to the sampled data. We have chosen Morse wavelets, which belong to the family of analytic wavelets and are useful for analysing modulated signals. Morse wavelets are defined by two parameters, the symmetry γ and the time-bandwidth product P², which determine the behaviour of the transform and the shape of the wavelet. For a generalized Morse wavelet, the Fourier transform is expressed as

Ψ_P,γ(ω) = U(ω) a_P,γ ω^(P²/γ) e^(−ω^γ)   (3)

where U(ω) is the unit step function and a_P,γ is a normalizing constant. Scalogram images are plotted for these wavelets as a function of time and frequency. These images are taken as inputs to the CNN model, which is trained to extract features from the scalograms in order to classify the emotions.
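The frequency response of eq. (3) is straightforward to evaluate numerically. A sketch, omitting the normalizing constant a_P,γ and using P² = 60, γ = 3 (common defaults, e.g. in MATLAB's cwt; the paper does not state its parameter values):

```python
import numpy as np

def morse_freq(omega, P2=60.0, gamma=3.0):
    """Frequency response of the generalized Morse wavelet of
    eq. (3), up to the normalizing constant a_{P,gamma}.
    U(omega) is the unit step (analytic wavelet: zero response
    at negative frequencies). P2 is the time-bandwidth product
    P^2; gamma controls the symmetry."""
    omega = np.asarray(omega, dtype=float)
    U = (omega > 0).astype(float)
    beta = P2 / gamma
    with np.errstate(invalid="ignore"):
        psi = U * omega**beta * np.exp(-omega**gamma)
    return np.nan_to_num(psi)

w = np.linspace(-1, 3, 401)
psi = morse_freq(w)
# The response peaks at omega = (P^2 / gamma^2)^(1/gamma),
# here (60/9)^(1/3) ~ 1.88, and is zero for omega <= 0.
print(w[np.argmax(psi)])
```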

Results
The results of the training performed are shown below:

Conclusion
In this research paper, we focused on classifying emotions into 4 categories based on the valence and arousal dimensions of the DEAP dataset. The proposed approach has not achieved the best accuracy and results. This could be due to the limited dataset size and hence the small number of scalogram images, which closely resemble each other. Future work aims at achieving better results with limited datasets and thereby classifying the emotions more accurately.

Acknowledgement
This work was supported by the Vellore Institute of Technology, Chennai, India. The authors would like to thank S. Koelstra et al. (2012), who proposed the multimodal dataset for emotion recognition and provided the dataset on request.