Improved Deep Feature Learning by Synchronization Measurements for Multi-Channel EEG Emotion Recognition

Emotion recognition based on multichannel electroencephalogram (EEG) signals is a key research area in the field of affective computing. Traditional methods extract EEG features from each channel based on extensive domain knowledge and ignore the spatial characteristics and global synchronization information across all channels. This paper proposes a global feature extraction method that encapsulates the multichannel EEG signals into gray images. The maximal information coefficient (MIC) for all channels was first measured. Subsequently, an MIC matrix was constructed according to the electrode arrangement rules and represented by an MIC gray image. Finally, a deep learning model designed with two principal component analysis convolutional layers and a nonlinear transformation operation extracted the spatial characteristics and global interchannel synchronization features from the constructed feature images, which were then input to support vector machines to perform the emotion recognition tasks. Experiments were conducted on the benchmark dataset for emotion analysis using EEG, physiological, and video signals. The experimental results demonstrated that the global synchronization features and spatial characteristics are beneficial for recognizing emotions and that the proposed deep learning model effectively mines and utilizes the two salient features.


Introduction
As an advanced function of the human brain, emotion plays an important role in daily human life. Emotional expressions enable easier communication, and emotional states can affect a person's work and learning. Therefore, emotion recognition offers high application value and broad prospects in the fields of medicine, education, intelligent systems, human-computer interaction, and commerce and has become a research area of great interest [1].
Transitions between emotion states are accompanied by complex neural processes and physiological changes. In addition to facial expressions [2], speech [3][4][5], and body movement [6,7], electrophysiological signals and endocrine-related indicators can reflect changes in emotion states [8][9][10] as well. However, physical characteristics are easily affected by the subjective will of a person as well as the external environment. On the other hand, emotion recognition through analyzing physiological electrical signals is relatively objective. Therefore, investigating the relationship between physiological signals, such as the electroencephalogram (EEG), and emotion states has garnered considerable attention [11][12][13]. A critical step in the EEG-based emotion recognition task is to extract features related to human emotional states from multichannel EEG signals. Various time-domain, frequency-domain, and time-frequency-domain features have been proposed in previous studies. The time-domain features include statistical features (such as power, mean, and standard deviation) [14][15][16], event-related potentials (ERPs) [17], Hjorth features (e.g., mobility, activity, and complexity) [18], the nonstationary index [19], and higher-order crossing features [20,21]. Frequency-domain features, such as power spectral density (PSD), power, and energy, were often utilized in existing studies [22][23][24]. Pan et al. [25] used the emotion-related common spatial pattern and differential entropy features of five frequency bands in their research. To acquire the time-varying characteristics reflected by EEG frequency data, the short-time Fourier transform was used to analyze EEG signals [26,27]. As EEG signals are time-varying, researchers proposed new methods to obtain additional information by combining time- and frequency-domain features.
The Hilbert-Huang transform and discrete wavelet transform (DWT) study EEG signals from both the time and frequency domains. Hadjidimitriou and Hadjileontiadis [28] used EEG features extracted by the Hilbert-Huang spectrum method for recognizing emotions. Yohanes et al. [29] proposed the use of DWT coefficients as features for emotion identification from EEG signals. These results show that EEG time-frequency features can provide salient information related to emotional states.
Many machine learning methods have been applied to emotion recognition tasks. Frantzidis et al. [17] obtained the amplitudes and latencies of ERP components and the event-related oscillation amplitude features and then employed a support vector machine (SVM) as the classifier. Murugappan et al. [15] proposed extracting linear and nonlinear statistical features from five frequency bands (delta, theta, alpha, beta, and gamma) via DWT and classifying them with k-nearest neighbor (k-NN) classifiers. Chun et al. [30] applied spectral power features and a Bayes classifier with a weighted-log-posterior function for emotion recognition. Various deep learning methods were also applied to EEG-based emotion recognition. Wang and Shang [31] presented an emotion recognition method based on deep belief networks (DBNs). Kwon et al. [32] used fusion features extracted from EEG signals and galvanic skin response signals and convolutional neural networks (CNNs) for emotion recognition. In our previous work, an emotion recognition method based on an improved deep belief network with glia chains (DBN-GC) and multiple-domain EEG features was proposed [23]. Suwicha et al. [33] utilized the fast Fourier transform to calculate power spectral density features from EEG signals and proposed a deep learning network based on a stacked autoencoder. Li et al. [34] used CNNs and a recurrent neural network to construct a hybrid deep learning model.
Compared with traditional machine learning methods, deep learning models have shown excellent performance and strong potential for multichannel EEG-based emotion recognition tasks [31][32][33][34][35][36]. However, two challenges remain in multichannel EEG-based emotion recognition. First, most studies only consider the salient information related to the emotional states of each channel's EEG signal in the time domain, frequency domain, and time-frequency domain.
The spatial characteristics and global synchronization changes across all channels' EEG signals under different emotion states were neglected, yet these may provide salient information related to emotion states and thus be beneficial to emotion recognition. Second, a simple and effective deep model is needed to mine and utilize these two sources of salient information for emotion recognition.
To address these challenges, a feature extraction method based on a global synchronization measurement from all EEG channels and a principal component analysis network-based (PCANet-based) deep learning model is proposed. First, the maximal information coefficient (MIC) for each EEG channel pair is measured by a synchronization dynamics method. Subsequently, the MIC values of all channel pairs are arranged according to the given electrode order to construct the MIC feature matrix, which is represented by an MIC gray image. Next, a novel deep learning model containing two PCA convolutional layers and a nonlinear transformation operation is designed to extract the spatial characteristics and global synchronization features from the MIC gray image as high-level abstract features. Finally, these high-level features are input to linear support vector machines to perform emotion recognition tasks. Synchronization analyses are extensively investigated in the neuroscience community, and synchronization measurements of EEG can characterize the underlying brain dynamics effectively [37][38][39]. Synchronization patterns of EEG signals change with changing emotion states. MIC is considered the best bivariate synchronization measurement method [40] and can find synchronization patterns related to emotion states in EEG. Compared with traditional deep learning methods for emotion recognition [31,35,36], the PCA network achieves similar or better emotion recognition performance with low computational complexity. Additionally, the PCA network effectively learns robust invariant features from the MIC gray images, and filter learning does not require regularized parameters or a numerical optimization solver [41]. The remainder of this paper is organized as follows. The emotion dataset and model, the MIC-based feature extraction method, and the PCANet-based emotion recognition method are introduced in Section 2. The experimental results are described in Section 3.
Section 4 provides a discussion. Section 5 presents a brief conclusion of this work.

Dataset and Emotion Model.
This research adopts the DEAP dataset [42] to evaluate the proposed emotion recognition methods. DEAP is a public dataset for emotion analysis that includes electroencephalogram, physiological, and video signals. DEAP recorded the EEG and peripheral physiological signals of 32 subjects (16 females and 16 males, aged 19 to 37 years with an average of 26.9) while watching 40 one-minute music videos as stimuli.
These videos were chosen from 120 YouTube videos, each selected to activate a related emotion state. Each subject conducted 40 trials, so there are 1,280 trials (32 subjects × 40 trials) in the dataset. The EEG signals in each trial included 3 s of baseline signals and 60 s of stimulation signals. For each trial, 32 EEG channels (Fp1, AF3, F7, F3, FC5, FC1, T7, C3, CP5, CP1, P7, P3, PO3, Pz, O1, Oz, Fp2, AF4, Fz, FC2, Cz, C4, T8, CP6, CP2, F4, F8, FC6, P4, P8, O2, and PO4) were used. After each trial, each subject was asked to complete the self-assessment manikin (SAM) for valence, arousal, dominance, and liking, presented in Figure 1, where the scale of each of the four self-assessment dimensions ranges from 1 to 9. The preprocessed EEG data and the corresponding emotion self-assessments in the DEAP dataset are used in this work. The EEG signals (512 Hz) were processed to remove electrooculogram (EOG) artifacts and then downsampled to 128 Hz. Band-pass filtering was applied with cut-off frequencies of 4.0 and 45.0 Hz.
In this study, the arousal-valence scale is used for emotion analysis because it measures emotions effectively and is widely used in similar research. As shown in Figure 2, a two-dimensional emotional model can be constructed. The arousal dimension ranges from inactive (e.g., bored, uninterested) to active (e.g., excited, alert), and the valence dimension ranges from negative (e.g., stressed, sad) to positive (e.g., elated, happy).
We define two emotion classes each for the valence and arousal scales. For each trial in the dataset, if the associated self-assessment value of arousal is greater than five, the trial is assigned to the high arousal (HA) emotion class. Otherwise, the trial is assigned to the low arousal (LA) emotion class. Similarly, there are low valence (LV) and high valence (HV) emotion classes in the valence dimension.
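The labeling rule above can be sketched as follows. This is an illustrative helper, not the actual DEAP file-handling code; the rating values in the example are hypothetical, while the threshold of five and the class names follow the text.

```python
# Hedged sketch of the binary labeling rule: SAM ratings above 5 map to the
# high class, all others to the low class, in each of the two dimensions.

def label_trial(arousal: float, valence: float) -> tuple[str, str]:
    """Map SAM self-assessment ratings (scale 1-9) to the four class labels."""
    arousal_class = "HA" if arousal > 5 else "LA"
    valence_class = "HV" if valence > 5 else "LV"
    return arousal_class, valence_class

# Example: a hypothetical trial rated arousal = 6.3, valence = 4.1
print(label_trial(6.3, 4.1))  # -> ('HA', 'LV')
```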

MIC-Based Deep Feature Learning Framework for Emotion Recognition.
To balance recognition performance and analysis efficiency, the synchronization between EEG signals is analyzed in this study using a simple and effective deep feature learning method. The phases of the proposed method consist of feature extraction based on the synchronization dynamics, followed by high-level feature learning and pattern classification with a linear SVM, as illustrated in Figure 3. The 32-channel EEG signals are segmented with the same window size of 12 seconds in the experiments. All MIC measurements of each channel pair within each time window are calculated and organized according to the electrode arrangement rules into the MIC gray image. These images are then processed and classified by the deep learning architecture based on a PCA network.

Synchronization Measurement Based on the Maximal Information Coefficient.
The brain is a large network of neurons, and the synchronous activities of neurons in different regions can provide useful information regarding the neural activity of interest. Relationships between brain regions can be described as synchronization between electrodes. Then, a connectivity matrix can be defined with elements representing the synchronization information between two electrodes. The order of the electrodes is also essential because it preserves the spatial features of brain regions in the connectivity matrix. Therefore, this work uses new methods to explore the synchronization and spatial features of brain regions as they relate to emotion states.

Maximal Information Coefficient.
The MIC is a synchronization measurement that quantifies the linear and nonlinear synchronization relationship between two random variables (e.g., bivariate EEG signal segments) [40]. This concept originates from maximal information-based nonparametric exploration statistics.
Specifically, according to [40], for a finite set S of ordered pairs, the x-values and y-values of S are partitioned into a grids and b grids (empty grids allowed), respectively, to form an a-by-b grid G. The maximal mutual information over all such grid partitions is

I^*(S, a, b) = \max_G I(S|G), \quad (1)

where I(S|G) represents the mutual information [43] of the distribution of S over the partitioned grid G. The characteristic matrix of S can then be expressed as

M(S)_{a,b} = \frac{I^*(S, a, b)}{\log \min\{a, b\}}, \quad (2)

and the value of MIC is obtained as

\mathrm{MIC}(S) = \max_{ab < B(n)} M(S)_{a,b}, \quad (3)

where n is the number of samples and the total number of partitioned grids is bounded by B(n), with B(n) = n^{0.6} in this paper.
The value of each element in the characteristic matrix M(S)_{a,b} is between 0 and 1, and MIC is symmetric: exchanging the order of the two variables leaves its value unchanged.
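The definitions in equations (1)-(3) can be sketched in code. The version below is a deliberately simplified illustration that only searches equipartitioned a-by-b grids via `numpy.histogram2d`, whereas the full MIC of [40] also optimizes the grid boundary positions; it is a sketch of the idea, not the algorithm used in the paper.

```python
import numpy as np

def mutual_information(x, y, a, b):
    """Grid-based mutual information I(S|G) for an equipartitioned a-by-b grid."""
    counts, _, _ = np.histogram2d(x, y, bins=(a, b))
    pxy = counts / counts.sum()                       # joint distribution
    px = pxy.sum(axis=1, keepdims=True)               # marginal over x-bins
    py = pxy.sum(axis=0, keepdims=True)               # marginal over y-bins
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mic(x, y):
    """Simplified MIC: maximize I(a, b) / log(min(a, b)) over grids with
    a * b < B(n) = n ** 0.6, as in equations (2) and (3)."""
    n = len(x)
    bound = n ** 0.6
    best = 0.0
    for a in range(2, int(bound) + 1):
        for b in range(2, int(bound) + 1):
            if a * b >= bound:
                break
            best = max(best, mutual_information(x, y, a, b) / np.log(min(a, b)))
    return best
```

A perfectly synchronized pair (identical signals) yields an MIC of 1, while an unrelated pair yields a markedly lower value, matching the 0-to-1 range stated above.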
The activation levels of neurons in brain regions are not synchronous under different emotion states. The MIC is a nonlinear bivariate synchronization measurement method, whereas EEG signals are nonstationary, nonlinear physiological signals.
Theoretically, MIC can effectively measure the synchronization features between brain regions. Therefore, using MIC to measure the synchronization of the EEG signals reflected by different brain regions can capture salient information related to the corresponding emotion states.

MIC Gray Image.
Studies have shown that the global information and spatial features of all EEG channels can improve the performance of emotion recognition [13,24]. To characterize the spatial information and global synchronization information of all EEG channels, an MIC gray image is constructed as the feature for all EEG channels. The construction process of an MIC gray image is shown in Figure 4. The MIC of each EEG channel pair is measured for each sample. Assuming each sample has c channels, every EEG channel pair requires an MIC measurement, giving c(c − 1)/2 measurements. In this work, the value of c is set to 32, so 496 MIC features are obtained for each sample.
When constructing the feature matrix, the arrangement order of the EEG channels is determined by the arrangement rules. According to the arrangement rules of the electrodes (Fp1, AF3, F7, F3, FC1, FC5, T7, C3, CP1, CP5, P7, P3, Pz, PO3, O1, Oz, O2, PO4, P4, P8, CP6, CP2, C4, T8, FC6, FC2, F4, F8, AF4, Fp2, Fz, and Cz), the MIC values for all EEG channel pairs are combined to construct the MIC feature matrix (MICFM), represented as U:

U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{bmatrix}, \quad (4)

where u_{ij} (i, j = 1, ..., n) represents the MIC synchronization measurement between EEG channels i and j. The MICFM is a symmetric matrix, and u_{ij} = 1 when i = j. An MICFM is constructed for each sample, as shown in part (b) of Figure 4, and each element in the matrix represents the MIC of the corresponding EEG channel pair. For example, the red elements in the matrix of Figure 4(b) represent the MIC of channels AF3 and Fp1.
To enhance the feature representation and simplify the extraction of high-level features, the feature matrix is represented as an MIC gray image. As shown in Figure 4(c), the value of each pixel is the MIC of the corresponding EEG channel pair.
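The construction of the MICFM and its gray image can be sketched as below. Here `pairwise_mic` is a placeholder for the MIC measurement of the previous section, and the 8-bit quantization is one plausible way to render the matrix as a gray image; the paper does not specify the pixel scaling.

```python
import numpy as np

def build_micfm(signals, pairwise_mic):
    """signals: (c, n_samples) array of channel signals in electrode order.
    Returns the symmetric c-by-c MIC feature matrix U of equation (4)."""
    c = signals.shape[0]
    u = np.eye(c)                      # u_ii = 1 by definition
    for i in range(c):
        for j in range(i + 1, c):      # c * (c - 1) / 2 measurements
            u[i, j] = u[j, i] = pairwise_mic(signals[i], signals[j])
    return u

def to_gray_image(u):
    """Map MIC values in [0, 1] to 8-bit gray levels (an assumed scaling)."""
    return np.round(u * 255).astype(np.uint8)
```

For c = 32 channels this performs the 496 pairwise measurements mentioned above and yields a 32 × 32 gray image.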

In brain regions, the EEG signals of two physically adjacent electrodes tend to be similar due to the volume conduction effect. Therefore, the electrode arrangement rules retain the similarity information within the same brain region as well as the difference information between different brain regions. The MIC gray image shows the synchronization variations of human EEG signals on the scalp directly and accurately. Moreover, compared with traditional features, the MIC gray image contains the spatial characteristics and global synchronization features of multichannel EEG signals, which is beneficial for identifying emotion states. Given these advantages, the MIC gray image of each sample is extracted for further analysis.

PCA Network for Deep Feature Learning and Emotion
Recognition. To utilize the spatial characteristics and global synchronization features, a deep learning model is introduced to extract high-level features from MIC gray images for emotion recognition. The proposed deep model is based on a PCA network [41], PCANet, and consists of a hierarchical feature learning layer and a linear SVM classifier.
The structure of the model is shown in Figure 5. First, the convolution filters in the feature learning layer are learned from the input MIC gray images through PCA. These PCA convolution filters extract the local patterns of the MIC gray images and the relationships among neighboring values. In CNNs, convolutional filters are initialized randomly and directly determine the learned features. In contrast, the primary convolution filters of this PCANet-based deep learning model are generated by PCA, allowing more discriminative features to be learned with a simple architecture.
These discriminative features can effectively represent the different synchronization patterns of EEG signals in various emotion states. As in CNNs, the first part of the PCANet-based deep learning model can comprise multiple layers of PCA filters; this work includes only two PCA convolution layers.
Second, a nonlinear processing layer that includes binarization and hash mapping enhances the separation of the discriminative features. Then, block-wise histograms are used to reduce the feature dimension.
Finally, the SVM classifier outputs the emotion recognition results based on the learned high-level features from the input images.
Suppose there exist N input MIC gray images \{I_i\}_{i=1}^{N} of size m × n, and the size of the patch in all convolution layers is k_1 × k_2. Only the PCA convolution filters need to be learned from the input MIC gray images. The components of the model structure are described in detail in the following sections.

First PCA Convolution Layer.
For each MIC gray image in the training set, a k_1 × k_2 patch is taken around each pixel, yielding the patches x_{i,1}, x_{i,2}, ..., x_{i,mn} \in \mathbb{R}^{k_1 k_2} from the i-th image. Subtracting the patch mean from each patch gives the mean-removed patches \bar{X}_i = [\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,mn}], where \bar{x}_{i,j} represents a mean-removed patch. Constructing the same matrix for every MIC gray image in the training set and putting them together gives

X = [\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_N] \in \mathbb{R}^{k_1 k_2 \times Nmn}. \quad (5)

Assuming that the number of filters in the i-th layer is L_i, PCA minimizes the reconstruction error within a family of orthonormal filters:

\min_{V \in \mathbb{R}^{k_1 k_2 \times L_1}} \|X - VV^{T}X\|_F^2, \quad \text{s.t.} \quad V^{T}V = I_{L_1}, \quad (6)

where I_{L_1} is an L_1 × L_1 identity matrix; the solution is given by the L_1 principal eigenvectors of XX^{T}. Thus, the PCA filters can be defined as

W_l^1 = \mathrm{mat}_{k_1, k_2}(q_l(XX^{T})) \in \mathbb{R}^{k_1 \times k_2}, \quad l = 1, 2, \ldots, L_1, \quad (7)

where q_l(XX^{T}) denotes the l-th principal eigenvector of XX^{T} and \mathrm{mat}_{k_1, k_2}(v) is a function that maps v \in \mathbb{R}^{k_1 k_2} to a matrix W \in \mathbb{R}^{k_1 \times k_2}. The dominating principal eigenvectors capture the variation of all the mean-removed training patches. The output of the first convolutional layer is

I_i^l = I_i * W_l^1, \quad i = 1, 2, \ldots, N, \quad (8)

where * denotes two-dimensional convolution, I_i^l is the output for the i-th input image, and W_l^1 is the l-th PCA filter of the first convolutional layer.
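The filter-learning step of equations (5)-(7) can be sketched as follows. This is a minimal illustration assuming valid (interior) patch positions rather than the padded "around each pixel" extraction, and it uses `numpy.linalg.eigh` on the small k_1k_2 × k_1k_2 matrix XX^T.

```python
import numpy as np

def pca_filters(images, k1, k2, L1):
    """Learn L1 PCA convolution filters from a list of (m, n) gray images.
    Returns an (L1, k1, k2) filter bank, i.e., mat_{k1,k2}(q_l(X X^T))."""
    patches = []
    for img in images:
        m, n = img.shape
        for r in range(m - k1 + 1):            # valid patch positions only
            for c in range(n - k2 + 1):
                p = img[r:r + k1, c:c + k2].ravel()
                patches.append(p - p.mean())   # mean-removed patch
    X = np.stack(patches, axis=1)              # shape (k1*k2, total patches)
    eigvals, eigvecs = np.linalg.eigh(X @ X.T) # symmetric; ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:L1]]  # L1 principal eigenvectors
    return top.T.reshape(L1, k1, k2)
```

Because the filters are eigenvectors of a symmetric matrix, the learned bank is orthonormal, which is exactly the constraint V^T V = I_{L_1} in equation (6).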

Second PCA Convolution Layer.
The operations of the second convolutional layer are similar to those of the first: all the patches of I_i^l are collected, the patch mean is subtracted from each patch, and the mean-removed patches of all the filter outputs are concatenated:

Y = [Y^1, Y^2, \ldots, Y^{L_1}] \in \mathbb{R}^{k_1 k_2 \times L_1 Nmn}, \quad (9)

where Y^l represents the mean-removed patches of the l-th filter output of the first layer. The PCA filters of the second layer are denoted as

W_l^2 = \mathrm{mat}_{k_1, k_2}(q_l(YY^{T})) \in \mathbb{R}^{k_1 \times k_2}, \quad l = 1, 2, \ldots, L_2. \quad (10)

Each input I_i^l of the second layer is convolved with W_\ell^2 for \ell = 1, 2, \ldots, L_2, so the output O_i^l = \{I_i^l * W_\ell^2\}_{\ell=1}^{L_2} comprises L_2 images of size m × n. The outputs are binarized using a Heaviside step function H(·), whose value is one for positive entries and zero otherwise. Around each pixel, the vector of L_2 binary bits is viewed as a decimal number, which maps the L_2 outputs back into a single integer-valued "image":

T_i^l = \sum_{\ell=1}^{L_2} 2^{\ell-1} H(I_i^l * W_\ell^2), \quad (11)

where every pixel value is an integer ranging from 0 to 2^{L_2} − 1.
To reduce the feature dimension, a block-based histogram is applied next. Each "image" T_i^l is partitioned into B blocks, the histogram of the decimal values in each block is computed, and all B histograms are concatenated into a single vector \mathrm{Bhist}(T_i^l). Thus, the high-level features of the input MIC gray image I_i are defined as the set of block-wise histograms:

f_i = [\mathrm{Bhist}(T_i^1), \ldots, \mathrm{Bhist}(T_i^{L_1})]^{T} \in \mathbb{R}^{(2^{L_2}) L_1 B}. \quad (12)

(Figure 5: PCA network-based deep learning model structured from a simple deep architecture for emotion recognition. The model consists of PCA convolution layers and a nonlinear processing layer to learn high-level features from MIC gray images.)
This block-wise histogram encodes spatial information and offers some degree of translation invariance in the obtained features within each block. The block size and the overlapping ratio of local blocks are important parameters of the PCANet-based deep learning model and are discussed in the next section.
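The nonlinear processing layer described by equations (11) and (12) can be sketched as follows. For brevity this illustration uses non-overlapping blocks; the block overlap ratio discussed later is omitted.

```python
import numpy as np

def hash_outputs(outputs):
    """outputs: (L2, m, n) second-layer responses for one first-layer map.
    Heaviside-binarize and hash into one integer image in [0, 2**L2 - 1]."""
    L2 = outputs.shape[0]
    bits = (outputs > 0).astype(np.int64)            # Heaviside step H(.)
    weights = 2 ** np.arange(L2).reshape(L2, 1, 1)   # 2**(l-1) per binary map
    return (bits * weights).sum(axis=0)              # equation (11)

def block_histograms(t, block, L2):
    """Concatenate the 2**L2-bin histograms of block-sized tiles of the
    hashed image t, i.e., Bhist(T) with non-overlapping blocks."""
    m, n = t.shape
    hists = []
    for r in range(0, m - block + 1, block):
        for c in range(0, n - block + 1, block):
            tile = t[r:r + block, c:c + block]
            hists.append(np.bincount(tile.ravel(), minlength=2 ** L2))
    return np.concatenate(hists)
```

Concatenating `block_histograms` over all L_1 hashed images yields the feature vector f_i of equation (12).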
After the high-level EEG features of the MIC gray images are learned through the process described above, an SVM with a linear kernel is introduced to process the extracted high-level features and perform the emotion classification tasks.

Results
When a deep learning model is employed for emotion recognition, adequate data is essential to achieve meaningful performance. In this work, we augment the available training dataset through a temporal segmentation method. A 3-second pretrial baseline is removed in the first stage.
Then, a sliding window divides the raw EEG signal of each channel into several segments.
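The augmentation steps above can be sketched as follows. The 3 s baseline, 128 Hz sampling rate, and 12 s window follow the text; the step size (here non-overlapping) is an assumption, since the paper does not state the window stride.

```python
import numpy as np

def segment_trial(eeg, fs=128, baseline_s=3, window_s=12, step_s=12):
    """eeg: (channels, samples) array for one trial.
    Removes the pretrial baseline and cuts each channel into fixed-length
    windows. Returns an (n_segments, channels, window_samples) array."""
    eeg = eeg[:, baseline_s * fs:]                 # drop 3 s pretrial baseline
    win, step = window_s * fs, step_s * fs
    n_seg = (eeg.shape[1] - win) // step + 1
    return np.stack([eeg[:, i * step:i * step + win] for i in range(n_seg)])
```

For a DEAP-style trial of 63 s (3 s baseline plus 60 s stimulation) at 128 Hz, this yields five non-overlapping 12 s segments per trial.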

Comparison between Global MIC Features and Common
To obtain convincing results, the dimensions of these two types of EEG features should be similar. Previous studies have also shown that the recognition performance of fusion features is better than that of a single feature [23,36]. The MIC features based on the synchronization measurement offer new ideas for feature extraction in emotion recognition using EEG signals. The 10-fold cross-validation technique is also used, and to verify the effectiveness of the proposed high-level features, different model parameters are set for the experiments. The parameters of the PCANet-based model include the number of filters in each layer (L_1 and L_2 denote the number of filters in the first and second layers, respectively), the filter size of each layer (k_1 and k_2 denote the filter sizes in the first and second layers, respectively), the block overlap ratio (BOR), and the block size in the nonlinear processing layer.

Impact of the Number of Filters.
Except for the number of filters, the other parameters initially remain unchanged. The block overlap ratio is set to 0.5, the filter size of both layers is k_1 = k_2 = 5, and the block size is 8 × 8. We alternately change the number of filters in one layer while keeping the number of filters in the other layer unchanged; for example, when we change the value of L_2, the value of L_1 remains the same. The results in Figure 9 show that when the values of L_1 and L_2 are within a certain range, their impact on the recognition performance in the two emotion dimensions changes significantly. Thus, the values of L_1 and L_2 are set to 1 through 21 and 7 through 15, respectively. As shown in Figure 9, for any value of L_2, the optimal interval of L_1 is from 9 to 17 in the arousal dimension and from 7 to 17 in the valence dimension. The recognition performance improves with increasing L_2. However, when the value of L_2 increases beyond 15, the recognition accuracy increases negligibly, while the computing resource and memory requirements increase dramatically with the number of filters. Therefore, it is appropriate to set the value of L_2 to 15. The highest accuracy in the arousal dimension, 0.7130, was achieved with the combination L_1 = 11 and L_2 = 15. The highest accuracy in the valence dimension, 0.6958, was achieved with the combination L_1 = 9 and L_2 = 15. In addition, these results show that as L_1 and L_2 increase, their impact on emotion recognition decreases gradually. Moreover, L_2 impacts emotion recognition more significantly than L_1 in both dimensions.

Impact of the Filter Size.
The optimum combination of the number of filters was obtained in the previous section. Under these parameters, the impact of the filter size is investigated. Specifically, the filter size k_2 of the second layer is set to 5, 7, 9, and 11, and the filter size k_1 of the first layer increases from 3 to 15 in steps of 2. The block overlap ratio is set to 0.5, and the block size is 8 × 8. The recognition results are shown in Figure 10. The filter size is relative to the input image size, and we selected appropriate ranges in our experiment (k_1 from 3 to 15, k_2 from 5 to 11) to show the influence of k_1 and k_2 on emotion recognition. As shown in Figure 10, as k_1 increases, the recognition accuracy at first rises rapidly; however, as the value of k_1 continues to increase, the recognition accuracy decreases slowly. In addition, as k_2 increases, the recognition performance of the model shows a downward trend. Whatever value k_2 takes, the optimal value of k_1 is 7 in the arousal dimension and 5 in the valence dimension.
On the other hand, as the filter size decreases, the computational costs increase significantly, but the recognition performance does not change much. The highest recognition accuracy in the arousal dimension, 0.7169, is achieved with the combination k_1 = 7 and k_2 = 5. The highest recognition accuracy in the valence dimension, 0.6958, is achieved with the combination k_1 = 5 and k_2 = 5. These results also show that k_1 and k_2 have a certain impact on the performance of emotion recognition, with the impact of k_1 being more pronounced.

Impact of the Block Overlap Ratio.
After determining the optimum values of the filter size and number of filters in the two emotion dimensions, the impact of the block overlap ratio on emotion recognition is considered next. The block size is kept at 8 × 8, and the recognition results for the different values of the block overlap ratio are listed in Table 1. As shown in Table 1, the recognition performance changes little as the block overlap ratio increases, which may be due to two reasons. First, the regularity between pixels in the MIC gray images of each trial is not apparent. Second, the global MIC features of each participant are different. In the arousal dimension, the maximum recognition accuracy is only 1.77% higher than the minimum, while the variance is only 0.2826. In the valence dimension, the maximum recognition accuracy is only 1.69% higher than the minimum, while the variance is only 0.1656. These results suggest that the block overlap ratio has no significant impact on the recognition performance in the two dimensions when using MIC gray images. The best recognition accuracies of the two dimensions are achieved when the block overlap ratio is 0.2, with values of 0.7169 and 0.6968 for the arousal and valence dimensions, respectively. To reduce the feature dimension without affecting the recognition accuracy, the block overlap ratio is recommended to be set to 0.2 in both dimensions.

Impact of the Block Size.

The results are shown in Table 2, where the block size is relative to the input image size. When the block size is smaller than 4 × 4 or larger than 13 × 13, the recognition performance changes little; therefore, block sizes ranging from 4 × 4 through 13 × 13 are considered. Initially, recognition accuracy gradually improves as the block size increases. As the size continues to increase, the recognition accuracy begins to decrease. Blocks reduce the feature dimension and offer some degree of translation invariance in the obtained features. Because of the complexity of the EEG signals, there may be various deformations in an MIC gray image. As the block size increases, the robustness of the model to such deformations strengthens, which explains the initial increase in recognition accuracy.
When the block size is too large, the number of features obtained by the model is small, so the recognition accuracy is unsatisfactory. In the arousal dimension, the maximum recognition accuracy is 0.7185 when the block size is 6 × 6; this maximum is 1.80% higher than the minimum, while the variance is 0.1776. In the valence dimension, the maximum recognition accuracy is 0.7021 when the block size is 5 × 5; this maximum is 1.53% higher than the minimum, and the variance is 0.1370. These results also demonstrate that the block size has only a slight impact on emotion recognition accuracy. At this point, all parameters of the PCA network have been analyzed, and the best recognition results (arousal: 0.7185, valence: 0.7021) and parameter settings are obtained.

Comparison between Global MIC Features and High-Level
Features. To illustrate the advantages of the high-level features extracted from MIC gray images over the global MIC features, the recognition performance of these features is compared in Table 3. First, we compare the recognition accuracies of the two types of features using a linear SVM with the same parameters. In the arousal dimension, the average recognition accuracy of the high-level features is 3.79% greater than that of the global MIC features. In the valence dimension, the average recognition accuracy of the high-level features is 4.23% greater than that of the global MIC features.
Second, a Wilcoxon signed-rank test (α = 0.05) is used to analyze the recognition performance of the high-level and global MIC features in both dimensions. The null hypothesis is that the recognition performances are similar, and it is accepted if the p value is larger than α. The p values of the arousal and valence dimensions are 0.002 and 0.0039, respectively, meaning that the recognition performance of the high-level features is significantly superior to that of the global MIC features. These comparisons show that the high-level features improve the performance of emotion recognition in the two dimensions. The MIC gray images include the global synchronization features as well as the spatial characteristics, which contain salient information related to the emotion states; this makes the recognition performance of the high-level features better than that of the global MIC features.
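The significance test above can be illustrated with `scipy.stats.wilcoxon`. The accuracy values below are synthetic stand-ins, not the paper's per-fold results; only the procedure (a paired Wilcoxon signed-rank test at α = 0.05) follows the text.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold accuracies for the two feature types (paired samples).
acc_high_level = np.array([0.72, 0.70, 0.74, 0.71, 0.73, 0.69, 0.72, 0.70, 0.75, 0.71])
acc_global_mic = np.array([0.68, 0.66, 0.70, 0.67, 0.69, 0.66, 0.68, 0.67, 0.70, 0.68])

# Paired two-sided Wilcoxon signed-rank test on the accuracy differences.
stat, p_value = wilcoxon(acc_high_level, acc_global_mic)
if p_value < 0.05:
    print("reject H0: the two feature sets perform differently")
```

Because the test is applied to paired per-fold accuracies, it accounts for fold-to-fold variation that an unpaired comparison would ignore.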

Comparison between CNN and PCA Network-Based Deep
Learning Model. The components constituting the PCA network-based deep learning model are basic and computationally efficient. To demonstrate its lightness and effectiveness, the PCA network-based deep learning model is compared with a traditional CNN. With 6,400 MIC gray images from all subjects, 10-fold cross-validation is used. To obtain convincing results, the number of convolution layers, the number of filters, and the filter sizes of the CNN are set similar to those of the PCA network. The CNN includes two convolution layers, the first employing a 5 × 5 kernel with a stride of 1, a ReLU activation function, and ten filters. The second convolutional layer has the same parameters as the first except for having 15 filters. The output layer is a softmax classifier, and the batch size and number of epochs are 120 and 500, respectively. The average recognition accuracies of the CNN in the arousal and valence dimensions are 0.6907 and 0.6853, respectively. The recognition performance of the PCA network in the two emotion dimensions is better than that of the CNN, with the average recognition accuracies of the PCA network (0.7185, 0.7021) being 4.03% and 2.45% higher than those of the CNN. In addition, the overall training time of the CNN is significantly longer than that of the PCA network. In one experiment, training the PCA network on 6,400 images of 32 × 32 pixels took approximately seven minutes, while training the CNN took about an hour. According to Chan et al. [41], the overall computational complexity of the two-layer PCA network in both the training and testing phases is low, as the PCA filters are obtained in closed form from an eigendecomposition, whereas the filters of a CNN require a numerical optimization solver during the training phase, which significantly increases the computational complexity. Compared to the CNN, the PCA network-based deep learning model offers better emotion recognition performance with lower computational complexity.
erefore, it is suitable for emotion recognition using MIC gray images.
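As a rough illustration of why the CNN's training is costlier, the trainable parameters of the comparison CNN described above can be tallied in a few lines. The input size (32 × 32 single-channel MIC gray images), "valid" convolutions without pooling, and a binary (high/low) output are assumptions, since the text does not specify padding or pooling; in the PCA network, by contrast, the 10 + 15 filters come from eigendecompositions rather than gradient-based optimization.

```python
# Trainable-parameter tally for the comparison CNN described above.
# Assumptions: 32x32 single-channel input, 'valid' convolutions, no pooling,
# stride 1 as stated, and a binary softmax output.

def conv_params(k, c_in, c_out):
    """Weights plus biases for a k x k convolution layer."""
    return k * k * c_in * c_out + c_out

conv1 = conv_params(5, 1, 10)       # 5x5 kernel, 1 -> 10 channels
conv2 = conv_params(5, 10, 15)      # 5x5 kernel, 10 -> 15 channels
# Feature map after two 'valid' 5x5 convolutions on a 32x32 input: 24x24x15.
fmap = (32 - 4 - 4) ** 2 * 15
softmax = fmap * 2 + 2              # binary (high/low) emotion label

total = conv1 + conv2 + softmax
print(conv1, conv2, total)          # 260 3765 21307
```

Every one of these parameters must be fitted by backpropagation over 500 epochs, whereas the PCA network's filters are solved for in closed form.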

Discussion
This study investigates the feasibility of MIC and PCA network-based deep learning approaches, which have recently been developed for big data relevance analysis and image classification. To account for multichannel interdependencies, data from all available channels are included in feature extraction by using the MIC and deep learning algorithms, as opposed to the traditional time-frequency analysis applied individually to each channel.

Synchronization Dynamics Related to Emotions Expressed in EEG.
Our first observation is that the emotion classification performance for all classifiers and emotion states is higher using MIC features than time-frequency features. Many studies have shown that brain regions often respond differently to various emotions. When an individual is in an emotional state related to avoidance motivation, such as disgust or fear, the right frontal lobe is clearly activated relative to the left frontal lobe. In emotional states related to approach motivation, such as pleasure, the left frontal lobe is more strongly activated relative to the right frontal lobe [45]. Therefore, the synchronization of brain regions in the corresponding emotion state can be used to represent salient emotional information. The global MIC features might reveal different varieties of dynamic processes of perceptual arousal and excitation, reflected in the nonlinear modes measured by MIC. Our experiments demonstrate that the global synchronization features measured by MIC are superior to those obtained by traditional time-frequency analysis, indicating that MIC can capture a variety of potentially interesting relationships between paired brain regions that traditional time-frequency analysis cannot.
Emotion states can be represented by the physiological electrical signals reflected from the cerebral cortex, which have a natural representation in 2D space. Preserving the spatial characteristics of multichannel EEG signals can enhance the separation of EEG features in different emotional states. Therefore, in this work, the MIC gray image is constructed from the global MIC features according to the electrode arrangement rules. The MIC gray image represents features closer to the real response of the brain and may contain additional emotion-related information compared with traditional features. The experiments in this paper also suggest that the preserved spatial characteristics are beneficial to emotion recognition.
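The construction described above can be sketched as follows. This is a minimal sketch, assuming 32 channels ordered according to the dataset's electrode arrangement; absolute Pearson correlation is substituted for the MIC as a stand-in pairwise measure (a real implementation would use an MIC routine, e.g. from the `minepy` package), and the function names are illustrative, not from the paper.

```python
import numpy as np

def synchronization_matrix(eeg, measure):
    """Build a symmetric channel-by-channel synchronization matrix.

    eeg: array of shape (n_channels, n_samples); measure: a symmetric
    pairwise dependence function in [0, 1], standing in for MIC here.
    """
    n = eeg.shape[0]
    m = np.eye(n)                      # self-synchronization is maximal (1.0)
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = measure(eeg[i], eeg[j])
    return m

def to_gray_image(m):
    """Scale a [0, 1]-valued matrix to an 8-bit gray image."""
    return np.round(m * 255).astype(np.uint8)

# Stand-in for MIC: absolute Pearson correlation (also bounded in [0, 1]).
abs_corr = lambda x, y: abs(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 512))   # 32 channels, 512 samples per trial
img = to_gray_image(synchronization_matrix(eeg, abs_corr))
print(img.shape, img.dtype)            # (32, 32) uint8
```

The resulting 32 × 32 gray image is what the PCA network consumes; because the matrix rows and columns follow the electrode arrangement, neighboring pixels correspond to spatially related channel pairs.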

Advantages of Unsupervised Deep Neural Network.
Second, the experimental results show that high-level features based on the synchronization and spatial characteristics of multichannel EEG can improve the performance of emotion recognition. Neural networks are successfully used in many fields because of their high nonlinearity, self-adaptive weight adjustment, anti-interference capability, and self-adaptive feature selection. This research uses a PCA network, an unsupervised deep network model, to process MIC gray images, and this unsupervised deep learning model effectively captures the synchronization and spatial characteristics of the MIC gray images.
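The unsupervised filter learning at the heart of the PCA network can be sketched briefly: following the PCANet construction of Chan et al. [41], the stage-one filters are the leading eigenvectors of the covariance of mean-removed image patches, so no labels and no gradient descent are involved. Patch size, filter count, and function names below are illustrative choices, not taken from the paper.

```python
import numpy as np

def pca_filters(images, k=5, n_filters=10):
    """Learn PCANet-style stage-one filters: the leading eigenvectors of
    the patch covariance matrix, reshaped to k x k convolution kernels."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel().astype(float)
                patches.append(p - p.mean())     # remove the patch mean
    x = np.stack(patches, axis=1)                # shape: (k*k, n_patches)
    cov = x @ x.T / x.shape[1]                   # patch covariance
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:n_filters]]
    return top.T.reshape(n_filters, k, k)

rng = np.random.default_rng(1)
imgs = rng.random((8, 32, 32))                   # 8 toy 32x32 gray images
filters = pca_filters(imgs, k=5, n_filters=10)
print(filters.shape)                             # (10, 5, 5)
```

Because the filters are eigenvectors of a single closed-form decomposition, "training" amounts to one pass over the patches, which is consistent with the short training times reported above.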
By verifying the influence of different network parameters on the recognition performance, we found that the

Advantages of Proposed Approach over Existing Methods.
The results of our proposed method are also compared with those of other emotion recognition methods based on the same dataset. Table 4 presents the details of the compared methods and their emotion recognition results. Among them, features extracted from the central nervous system (CNS) were used in reference [42], and a hierarchical bidirectional Gated Recurrent Unit network (H-ATT-BGRU) was used in reference [51].
As shown in Table 4, our proposed model outperforms most of the compared methods, achieving the highest recognition accuracy, with the exception of reference [48] in the arousal dimension. The reason may be that the method in reference [48] uses a subject-dependent emotion classification model that only classifies samples belonging to the corresponding subject; the average of all subjects' results was used as the final recognition result.
In contrast, this study uses samples from all subjects to build a general EEG classification model that detects the emotional states of different subjects accurately. This study also shows that changes in synchronization between EEG channels can be used to represent changes in a person's emotion. Moreover, conventional emotion classification approaches rely on time- and frequency-domain analyses of EEG, which require substantial a priori knowledge, whereas our method requires none. Furthermore, the proposed approach does not require intensive noise removal from the EEG, unlike other available methods.

Potential for EEG Data Applications.
Using the MIC to find complex associations in EEG data, the physiological and psychological information represented by these associations, such as emotion and disease, can now be further analyzed. This approach offers a new way to use EEG for related pattern classification and recognition tasks. In addition, a parallel MIC computing scheme can reduce the computational complexity of MIC [52] to enable real-time synchronization analysis of EEG data in real applications. The filter learning in PCANet involves neither regularization parameters nor a numerical optimization solver. Moreover, the construction of the PCANet includes only a cascaded linear map and a nonlinear output stage. Such simplicity offers an alternative and refreshing perspective on convolutional deep learning networks for processing EEG data. The overall work provides a general and cost-effective solution for EEG-based emotion classification and holds great potential for other EEG-related classification tasks, such as the detection of epilepsy, dementia, and Alzheimer's disease.

Conclusions
A novel feature extraction method based on synchronization dynamics and deep learning was proposed for multichannel EEG-based emotion recognition, comprising two primary tasks. First, a method based on synchronization dynamics extracts the global MIC features from all channel pairs of the EEG signals, which are then represented by an MIC gray image according to the proposed feature construction method. The MIC gray image reflects the global synchronization information as well as the spatial characteristics of all EEG signals. Thus, the image contains the spatial and global synchronization features that provide salient information related to emotional states. Second, a PCA network-based deep learning model and a linear SVM classifier perform high-level feature extraction and emotion classification, respectively. The experimental results suggest that the proposed feature extraction method achieves satisfactory results and that MIC features can automatically and effectively characterize salient information in EEG signals related to emotional states. In addition, this work demonstrates that the spatial and global synchronization features contained in the proposed MIC gray image are beneficial for recognizing human emotion. The deep learning model based on the PCA network can effectively mine and utilize these two salient information dimensions for emotion recognition.
Data Availability
The dataset used in this paper is derived from the Queen Mary University of London (http://www.eecs.qmul.ac.uk/mmv/datasets/deap/).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.