Feature Extraction Method of EEG Signals Evaluating Spatial Cognition of Community Elderly With Permutation Conditional Mutual Information Common Space Model

To improve the traditional common space pattern (CSP) algorithm for EEG feature extraction, this study proposes a feature extraction method for EEG signals based on the permutation conditional mutual information common space pattern (PCMICSP). The method replaces the mixed spatial covariance matrix of the traditional CSP algorithm with the sum of the permutation conditional mutual information matrices of each lead, and uses the eigenvectors and eigenvalues of that sum to construct a new spatial filter. The spatial features in different time domains and frequency domains are then combined to construct a two-dimensional pixel map. Finally, a convolutional neural network (CNN) is used for binary classification. The EEG signals of 7 community elderly before and after spatial cognitive training in virtual reality (VR) scenes were used as the test data set. The average classification accuracy of the PCMICSP algorithm for pre-test and post-test EEG signals was 98%, higher than that of CSP based on conditional mutual information (CMI), CSP based on mutual information (MI), and traditional CSP across the four frequency band combinations tested. Compared with the traditional CSP method, PCMICSP is a more effective method for extracting the spatial features of EEG signals. This paper therefore provides a new approach to relaxing the strict linearity assumption of CSP, and the extracted features can serve as a valuable biomarker for the spatial cognitive evaluation of the elderly in the community.


Index Terms-Permutation conditional mutual information common space pattern, EEG signals, community elderly, spatial cognitive evaluation, virtual reality.

I. INTRODUCTION
SPATIAL cognition is a relatively independent higher cognitive function that uses visual memory encoding to understand the spatial environment. Spatial cognitive impairment can lead to many diseases, such as Alzheimer's disease [1]. The incidence of Alzheimer's disease is increasing with the aging of the population, and the disease can lead to memory loss, impaired attention, impaired problem-solving ability, and varying degrees of degeneration in spatial memory, visual-spatial structure, orientation, and other abilities in older adults [2]. At present, there is no specific drug for such problems. Because community hospitals can provide auxiliary training services with portable devices, there is an urgent need to explore and promote methods for diagnosing and treating spatial cognitive impairment in the elderly that are based on spatial cognitive training and evaluation in the community. The development of spatial cognition training and evaluation methods is also of great significance for preventing the decline of spatial cognition in healthy subjects. Previous research shows a close correlation between spatial cognition and EEG signals [3], [4], and EEG signals are often used to evaluate the correlation between spatial cognitive function and brain response.
In recent years, spectral analysis [5], [6], [7], [8], blind source separation (BSS) [9], [10], brain region synchronization analysis [11], [12], [13], [14], [15], event-related potential (ERP) [16], [17], [18], and brain function network analysis [15], [19], [20] have been the main methods used to explore changes in the brain related to spatial cognition. However, these methods mainly analyze spatial cognitive EEG signals in terms of correlations in the time domain and frequency domain, without considering spatial characteristics. Spatial features play an important role in the two-class classification of EEG signals, because they describe the spatial distribution of multi-lead EEG signals in each category [21]. CSP is the most widely used spatial filtering technique and can effectively extract the features of different brain states [22], [23]. The classical CSP algorithm focuses on the linear relationship between EEG signals of different channels [24], whereas EEG signals are nonlinear and non-stationary [25]. A few researchers have therefore introduced the nonlinear relationship between EEG signals into the CSP algorithm to obtain nonlinear spatial filters. Sun and Zhang [26] proposed a kernel CSP (KCSP) method based on kernel optimization feature extraction. However, the KCSP method is difficult to generalize because it requires the same sample size for the two classes of data. Nasihatkon et al. [27] proposed using linear CSP and KCSP together to construct spatial filters. However, this method requires many eigenvalue decompositions of kernel matrices and has high computational complexity.
Mutual information (MI) uses the concept of entropy to quantify the nonlinear synchronization relationship between EEG signals [28]. However, MI needs a sufficiently long data record to achieve good statistical significance [29]. Moreover, the coupling strength calculated by MI carries no direction, so it cannot indicate the direction of the driving effect between brain regions [30]. Conditional mutual information (CMI) quantifies the relationship between two variables while eliminating the influence of a third variable, which better reflects the information drive between brain regions [30], but noise in the EEG signals has a great impact on the results [31]. Permutation conditional mutual information (PCMI) [32] can measure the linear or nonlinear coupling strength of two time series and is strongly robust to signal noise. Relevant studies show that PCMI performs better than CMI in the analysis of EEG signals [32]. Yuan [33] used PCMI to extract the features of EEG signals before and after spatial cognitive training. The effective evaluation of the training results in that work showed that PCMI is effective in analyzing spatial cognitive EEG signals.
This study proposes a permutation conditional mutual information common space pattern (PCMICSP) algorithm. The mixed spatial covariance matrix of the two classes of data, as calculated by the traditional CSP method, is replaced by the mixed spatial permutation conditional mutual information matrix, so that the resulting spatial filter contains both linear and nonlinear features. The algorithm is verified by comparing the EEG signals of the elderly before and after spatial cognitive training. First, the PCMICSP method is used to calculate the spatial features of multiple frequency bands, and then the spatial features in the different time and frequency domains are combined. Finally, we use a CNN to compare the performance of the proposed PCMICSP with the MI CSP (MICSP), CMI CSP (CMICSP), and traditional CSP algorithms.

A. Subject Information
Supported by the Ethics Committee of Qinhuangdao First People's Hospital, 7 healthy subjects with an average age of 67±6.81 years were recruited from nearby communities to participate in virtual reality spatial cognition training and testing tasks for 28 days [34]. The subjects signed the informed consent, had normal visual acuity or corrected visual acuity, had no history of major mental illness and had never participated in such spatial memory game training. The data collection was approved by the Ethics Committee of First Hospital of Qinhuangdao in Hebei Province, China (The approval number is 2018B006 in 2018).
In this study, two widely used scales, the Mini-Mental State Examination (MMSE) [35] and the Montreal Cognitive Assessment (MoCA) [36], were used to determine whether the subjects had cognitive dysfunction. The mean MMSE score of the subjects was 28.29±1.25 and the mean MoCA score was 24.7±2.56. The MMSE results showed that the cognitive function of the subjects was normal. Three spatial cognition scales, namely the Guilford Zimmerman Spatial Orientation Test (GZSOT) [37], the Perspective Taking/Spatial Orientation Test (PTSOT) [38], and the Corsi Block-Tapping Task (CBTT) [39], were used to assess the subjects' spatial positioning ability.

B. Experimental Design Scheme
In this paper, the 'virtual community' spatial cognitive training game and the virtual 'city roaming' test task, designed in literature [40], were used to conduct the spatial cognitive training task for 28 days in a relatively quiet laboratory. The experiment was carried out in four cycles, each cycle of training for seven days. The training process (as shown in Fig. 1) is divided into three stages.
(1) Pre-training test. Before the first training cycle, all subjects wore electrode caps and head-mounted VR glasses to conduct a pre-training spatial cognitive ability assessment test using a spatial cognitive scale. During the test, the EEG signals of the subjects were mainly collected.
(2) Training and testing. The subjects used 'virtual community' training during each cycle. To ensure the participation enthusiasm of the elderly, the experimental group used the three-target points version of the virtual community in the first week and the six-target points version in the last three weeks. The training session recorded the time spent by the subjects to complete each task.
(3) Post-training test. Three spatial cognition scales (GZSOT, PTSOT, CBTT) were used to evaluate the training effect of all subjects, and the VCW test route was used to complete the post-training task. The evaluation results of the spatial cognition scales, the behavioral data from the test game, and the EEG signals were archived for later evaluation of the training effectiveness. The VCW test route is shown in Fig. 2. The test task is divided into two modes: learning mode and testing mode. In the learning mode, users are asked to memorize, as well as they can, the reference buildings and routes in the virtual city while following clearly visible route guidance. This is the encoding (learning) stage. In the testing mode, the original route guidance is hidden, and the user is asked to repeat the original route based on spatial memory and positioning ability. To keep the difficulty of each test consistent, each test path is treated symmetrically.
Fig. 3 shows the experimental setup and scenario for the spatial cognition test, which consists of an OpenBCI EEG acquisition device, HTC Vive Focus head-mounted VR glasses, the matching control handles, and the user perspective. The OpenBCI device collects the users' EEG signals, and the OpenBCI USB dongle communicates with the EEG acquisition client over Bluetooth. The HTC Vive Focus head-mounted VR glasses provide the virtual scenes required by the test tasks, and the matching control handle is used for the control required in the tasks. In addition, the TCP/IP protocol is used to exchange data with the behavioral data acquisition client. With the VR glasses in USB debugging mode, the subject's view can be shared in real time through the screen assistant of the 360 mobile assistant software, which makes it easier for the experimental staff to guide the subjects.

C. Recording and Preprocessing of EEG Signals
1) Recording: The sampling rate is 125 Hz and the electrode impedance is less than 10 kΩ. Two 8-channel Cyton amplifiers are used to acquire the 16-channel EEG signals synchronously, and the data is transmitted to the computer via Bluetooth. Based on the literature on the evaluation of spatial coding and retrieval [15], [41], [42], the electrodes used in this experiment are those shown in blue in Fig. 4. In this experiment, the key response code of the subject in the VR task is used as an event marker to synchronize the EEG signal: it aligns the subject's key response with the EEG signal on the time axis and facilitates offline analysis of the EEG signal at the time of the action response. Fig. 5 shows the EEG signals of the subjects when performing the test task, where L, R, and R' represent the left, right, and right key event markers respectively, the abscissa represents the time dimension of the EEG signals, and the ordinate represents the 16 sampling channels. The data of each channel is the potential amplitude of the EEG signal.
2) Preprocessing: In this study, the EEGLAB [43] toolbox was used to preprocess the EEG signals, mainly including the following steps:
(1) Channel location. To ensure that independent component analysis (ICA) can estimate the source location of the independent components of the data, the relative coordinate information of each channel needs to be imported.
(2) Band-pass filtering. A 1-49 Hz band-pass filter is used to attenuate signals outside this frequency range (50 Hz power-line interference, etc.).
(3) Artifact removal. First, the artifacts caused by small relative motion between the scalp and the electrodes are removed by visual inspection; second, ICA is used to remove the artifacts of EEG and EMG.
(4) Data segmentation. The key events in the test game were taken as event markers, and the EEG signals were divided into segments covering one second before and one second after each event marker.
(5) Baseline correction. The data before 0 ms is used as the baseline: the average of the data points before 0 ms is subtracted from the data after 0 ms to eliminate part of the spontaneous EEG noise.
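The segmentation and baseline-correction steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the EEGLAB procedure used in the study: the function name `epoch_and_baseline` and its arguments are hypothetical, and the pre-event mean is subtracted from the whole epoch (a common convention) rather than only from the post-event half.

```python
import numpy as np

FS = 125  # sampling rate in Hz, as stated in the recording section


def epoch_and_baseline(eeg, event_samples, fs=FS, half_window_s=1.0):
    """Cut [-1 s, +1 s) epochs around each event and remove the pre-event mean.

    eeg           : array of shape (n_channels, n_samples)
    event_samples : sample indices of the key-press markers
    Returns an array of shape (n_epochs, n_channels, 2 * half_window).
    """
    half = int(round(half_window_s * fs))
    epochs = []
    for s in event_samples:
        if s - half < 0 or s + half > eeg.shape[1]:
            continue  # skip events too close to the edges of the recording
        segment = eeg[:, s - half:s + half].astype(float)
        # Baseline: mean of each channel over the second before the event.
        baseline = segment[:, :half].mean(axis=1, keepdims=True)
        epochs.append(segment - baseline)
    if not epochs:
        return np.empty((0, eeg.shape[0], 2 * half))
    return np.stack(epochs)
```

With 16 channels sampled at 125 Hz, each retained epoch has shape 16 × 250, which matches the segment size quoted later in the feature combination step.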

D. Relevant Feature Extraction Methods
1) CSP Method: The CSP algorithm diagonalizes the covariance matrices of the two classes of samples and uses principal component analysis to find the components in which the two classes differ most, from which the optimal spatial filter is constructed. After the two classes of sample data are processed by the spatial filter, the energy difference between the two classes in the spatial components is maximized [44]. However, the covariance can only reflect the linear relationship between the characteristics of EEG signals and cannot measure the nonlinear relationship between two time series.
2) MI, CMI, PCMI Methods: MI, which is based on information theory, reflects the information that one random variable carries about another and can measure the degree of interdependence of two linear or nonlinear time series. CMI can quantify the coupling relationship between two channels while eliminating the influence of a third channel, and so better reflects the information drive between brain regions [46]. In addition, it allows the coupling direction index of brain regions to be analyzed [30]. MI and CMI can be used to calculate the coupling degree and interdependence of brain regions in the task state [47]. PCMI is another effective information-theoretic method for analyzing coupling strength. It combines the permutation pattern method with CMI to analyze the linear and nonlinear coupling strength between different brain regions in EEG signals. Its effectiveness in EEG feature extraction has been verified in several studies [48], [49]. In principle, given any two-channel time series and the observation data at a specific time, this method determines the probability distributions of the permutation patterns, joint permutation patterns, and conditional permutation patterns in multi-dimensional space. The permutation entropy and PCMI value of the EEG signal can then be obtained from these three distributions [32]. In addition, PCMI is more robust to signal noise than the CMI method [32], [33].
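For reference, the textbook information-theoretic definitions behind MI and CMI can be written as follows. The symbols $H$ (Shannon entropy), $p$ (probability mass function), and $Z$ (the conditioning third channel) are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align}
I(X;Y) &= H(X) + H(Y) - H(X,Y)
        = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}, \\
I(X;Y \mid Z) &= H(X \mid Z) + H(Y \mid Z) - H(X,Y \mid Z).
\end{align}
```

MI is symmetric in $X$ and $Y$, which is why it cannot by itself indicate a coupling direction. PCMI applies the second identity to permutation patterns, with the future of one signal in the role of $Y$ and its present in the role of $Z$.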

E. Permutation Conditional Mutual Information Common Space Pattern Method
1) Method Description: The original EEG signals are segmented according to the pre-test and post-test categories. The two types of EEG signals are expressed as $E_1 = [e_{11}, e_{12}, \ldots, e_{1(p-1)}, e_{1p}]$ and $E_2 = [e_{21}, e_{22}, \ldots, e_{2(q-1)}, e_{2q}]$, with dimensions $N \times T \times p$ and $N \times T \times q$, respectively. Here $N$ represents the number of source signal channels ($N = 16$), $T$ represents the number of sampling points on each channel, and $p$ and $q$ represent the numbers of signal segments of the two types.
(1) Calculate the mixed spatial permutation conditional mutual information of the two kinds of EEG signals.
Take any two time series $X$ and $Y$, expressed as $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$, where $n$ is the number of observation points of the EEG signal. They are embedded into $m$-dimensional space to obtain new vectors, as shown in formulas (1) and (2):

$$X_i = \{x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\} \tag{1}$$

$$Y_j = \{y_j, y_{j+\tau}, \ldots, y_{j+(m-1)\tau}\} \tag{2}$$

where $m$ represents the embedding dimension and $\tau$ represents the delay time. The elements of $X_i$ and $Y_j$ are sorted in ascending order. If $X_i$ or $Y_j$ contains equal elements, their order is determined by their subscripts.
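A small worked example may help make the embedding and sorting step concrete. The sketch below is illustrative only: the helper names `embed` and `ordinal_pattern`, the sample series, and the choice $m = 3$, $\tau = 1$ are assumptions, not values from the paper.

```python
def embed(series, m, tau):
    """Return the delay vectors (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    last = len(series) - (m - 1) * tau
    return [tuple(series[i + k * tau] for k in range(m)) for i in range(last)]


def ordinal_pattern(vector):
    """Indices that sort the vector in ascending order; ties broken by index."""
    return tuple(sorted(range(len(vector)), key=lambda k: (vector[k], k)))


x = [4, 7, 9, 10, 6, 11, 3]
vectors = embed(x, m=3, tau=1)
patterns = [ordinal_pattern(v) for v in vectors]
for v, p in zip(vectors, patterns):
    print(v, "->", p)
```

Each pattern is one of the $m! = 6$ possible orderings for $m = 3$. Counting how often each pattern (or each pair of patterns across two channels) occurs gives the probabilities used in the entropy calculations that follow.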
Each vector in formulas (1) and (2) corresponds to one sorting pattern, and there are $m!$ possible sorting patterns for vectors in $m$-dimensional space. For two EEG signals $X$ and $Y$ from different channels, the vectors $X_k$ and $Y_k$, $k = 1, \ldots, n$, take sorting patterns $\pi_i$, $i = 1, \ldots, m!$, and $\pi_j$, $j = 1, \ldots, m!$, so there are $m! \times m!$ joint sorting patterns. Taking the vectors with the same sorting pattern as one class, the occurrence probability of each joint sorting pattern is obtained from its number of occurrences $C_{ij}$:

$$P(x_i, y_j) = \frac{C_{ij}}{\sum_{i=1}^{m!}\sum_{j=1}^{m!} C_{ij}} \tag{3}$$

$P(x_i, y_j)$ is defined as the joint probability distribution of the sorting patterns of the EEG signals $X$ and $Y$. Similarly, the conditional probability distribution $P(x_i \mid y_j)$ of the sorting pattern of $X$ given the sorting pattern of $Y$ can be calculated:

$$P(x_i \mid y_j) = \frac{P(x_i, y_j)}{\sum_{i=1}^{m!} P(x_i, y_j)} \tag{4}$$

According to formulas (3) and (4), combined with the Shannon entropy, the conditional permutation entropy of $X$ given $Y$ can be calculated, as shown in the following formula:

$$PE(X \mid Y) = -\sum_{i=1}^{m!}\sum_{j=1}^{m!} P(x_i, y_j)\,\log P(x_i \mid y_j) \tag{5}$$

Now let $Y_\delta$ denote the observation of $Y$ at $\delta$ steps in the future. According to formula (5), the conditional permutation entropy $PE(Y_\delta \mid Y)$ of $Y_\delta$ given $Y$ can be obtained in the same way. Then the joint sorting patterns of the vectors $X_i$, $Y_j$, and the vector of $Y$ at the $\delta$-th step in the future are analyzed. At most $m! \times m! \times m!$ joint sorting patterns can occur, and their joint permutation entropy is:

$$PE(X, Y_\delta, Y) = -\sum_{i}\sum_{k}\sum_{j} P(x_i, y^{\delta}_k, y_j)\,\log P(x_i, y^{\delta}_k, y_j) \tag{6}$$
Given $Y$, the conditional joint permutation entropy of $X$ and $Y_\delta$ is:

$$PE(X, Y_\delta \mid Y) = -\sum_{i}\sum_{k}\sum_{j} P(x_i, y^{\delta}_k, y_j)\,\log P(x_i, y^{\delta}_k \mid y_j) \tag{7}$$

Finally, the permutation conditional mutual information $PCMI^{\delta}_{X \to Y}$ of the EEG signals $X$ and $Y$ is obtained, as shown in formula (8):

$$PCMI^{\delta}_{X \to Y} = PE(X \mid Y) + PE(Y_\delta \mid Y) - PE(X, Y_\delta \mid Y) \tag{8}$$

Similarly, $PCMI^{\delta}_{Y \to X}$ can be obtained. The coupling strength of $X$ and $Y$ is calculated in the same way at subsequent time steps, and the information transfer between the signals is defined as:

$$PCMI_{X \to Y} = \frac{1}{N}\sum_{\delta=1}^{N} PCMI^{\delta}_{X \to Y} \tag{9}$$

Similarly, $PCMI_{Y \to X}$ can be obtained; the two quantities represent the coupling strength from $X$ to $Y$ and from $Y$ to $X$, respectively. In (9), $N$ refers to the maximum step size (not the number of channels); this paper uses the empirical value $N = 15$. The resulting permutation conditional mutual information matrix, formula (10), is an $N \times N$ matrix whose entries are the PCMI coupling strengths between pairs of channels. In formula (10), $i$ is the class index, taking the values 1 and 2 for the two classes; $j$ indexes the samples of each class; and $N$ is the number of channels. The matrix expectation is then calculated for each of the two classes of segmented raw data. Formula (11) gives the averaged, normalized permutation conditional mutual information matrices: $C_1$ is the expectation of the permutation conditional mutual information matrix of the first class of samples and $C_2$ is that of the second class. $C_c$ represents the mixed space matrix of the two classes of data, given by their sum:

$$C_c = C_1 + C_2 \tag{12}$$
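The chain from sorting patterns to a directed coupling value can be sketched with the standard library alone. This is a plug-in estimate under stated assumptions, not the authors' implementation: the function names, the defaults $m = 3$ and $\tau = 1$, and the averaging over $\delta = 1, \ldots, 15$ are illustrative choices. The code uses the identity $H(A \mid B) = H(A, B) - H(B)$ to obtain each conditional entropy from joint pattern counts.

```python
import math
from collections import Counter


def ordinal_symbols(series, m, tau):
    """Map each delay vector of the series to its sorting pattern."""
    n_vec = len(series) - (m - 1) * tau
    symbols = []
    for i in range(n_vec):
        v = [series[i + k * tau] for k in range(m)]
        symbols.append(tuple(sorted(range(m), key=lambda k: (v[k], k))))
    return symbols


def entropy(symbols):
    """Shannon entropy (natural log) of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())


def pcmi_delta(x, y, delta, m=3, tau=1):
    """PCMI at one step: H(X|Y) + H(Y_delta|Y) - H(X, Y_delta | Y)."""
    sx, sy = ordinal_symbols(x, m, tau), ordinal_symbols(y, m, tau)
    n = min(len(sx), len(sy)) - delta
    if n <= 0:
        raise ValueError("series too short for this delta")
    sx, sy_now, sy_future = sx[:n], sy[:n], sy[delta:delta + n]
    h_y = entropy(sy_now)
    h_x_given_y = entropy(list(zip(sx, sy_now))) - h_y
    h_f_given_y = entropy(list(zip(sy_future, sy_now))) - h_y
    h_xf_given_y = entropy(list(zip(sx, sy_future, sy_now))) - h_y
    return h_x_given_y + h_f_given_y - h_xf_given_y


def pcmi(x, y, max_delta=15, m=3, tau=1):
    """Average of the per-step PCMI over delta = 1..max_delta (X -> Y)."""
    steps = range(1, max_delta + 1)
    return sum(pcmi_delta(x, y, d, m, tau) for d in steps) / max_delta
```

Calling `pcmi(x, y)` and `pcmi(y, x)` on two channel segments gives the two directed coupling values; repeating this over all channel pairs fills one $N \times N$ matrix per segment.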
(2) Obtain the whitening matrix $P$ by principal component analysis.
Eigenvalue decomposition of $C_c$:

$$C_c = U_c \lambda U_c^{T} \tag{13}$$

In formula (13), $U_c$ is the eigenvector matrix of the matrix $C_c$ and $\lambda$ is the diagonal matrix composed of the corresponding eigenvalues. Arranging the eigenvalues in descending order gives the whitening matrix:

$$P = \lambda^{-1/2} U_c^{T} \tag{14}$$

$C_1$ and $C_2$ are then transformed as follows:

$$S_1 = P C_1 P^{T}, \quad S_2 = P C_2 P^{T} \tag{15}$$

$S_1$ and $S_2$ are the whitened matrices of the two classes of samples and share common eigenvectors. Therefore, $S_1$ and $S_2$ can be decomposed into two diagonal matrices $\lambda_1$, $\lambda_2$ and a common eigenvector matrix $B$:

$$S_1 = B \lambda_1 B^{T}, \quad S_2 = B \lambda_2 B^{T} \tag{16}$$

The two diagonal matrices $\lambda_1$, $\lambda_2$ satisfy formula (17), where $I$ is the identity matrix:

$$\lambda_1 + \lambda_2 = I \tag{17}$$
(3) Calculate the projection matrix.
For the eigenvector matrix $B$, the eigenvector for which $S_1$ has its largest eigenvalue is the one for which $S_2$ has its smallest eigenvalue. Therefore, the matrix $B$ can be used for binary classification. The projection matrix is obtained by the following formula:

$$W = B^{T} P \tag{18}$$

(4) Calculate the features by projection.
The original EEG signal $E_{N \times T}$ is projected through the projection matrix $W$ of formula (18) to obtain the feature matrix $Z$, whose dimension is $N \times T$:

$$Z_{N \times T} = W_{N \times N} E_{N \times T} \tag{19}$$

The calculated feature matrix is normalized as shown in formula (20), where $f_i$ is the normalized feature of the $i$-th sample. According to previous research, the spatial feature information is mainly concentrated in the head and tail of the feature matrix [50]. Therefore, we selected the first $n$ rows and the last $n$ rows of data as the feature matrix calculated by the PCMICSP algorithm, where $2n \le N$. $Z_i$ represents the feature matrix of the $i$-th sample, and $Z_{ij}$ represents the $j$-th feature value of the $i$-th sample.
$F_1$ represents the normalized spatial features of the first class of samples obtained by the PCMICSP algorithm, and $p$ represents the number of samples of the first class.
Similarly, the normalized spatial features $F_2$ of the second class of samples can be obtained by the PCMICSP algorithm.
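The whitening and joint-diagonalization steps follow the usual CSP recipe, so they can be illustrated with NumPy. The sketch below makes one loud assumption: the class matrices `c1` and `c2` are symmetric and positive definite (for instance, symmetrized PCMI matrices or ordinary covariance matrices), which is what `numpy.linalg.eigh` requires. The function names are hypothetical, and the random matrices in the demonstration merely stand in for real class matrices.

```python
import numpy as np


def csp_style_filter(c1, c2):
    """Jointly diagonalise two symmetric positive-definite class matrices.

    Returns the projection matrix W = B^T P and the eigenvalues of S1.
    """
    cc = c1 + c2                                   # mixed space matrix
    eigvals, u = np.linalg.eigh(cc)
    order = np.argsort(eigvals)[::-1]              # descending eigenvalues
    eigvals, u = eigvals[order], u[:, order]
    p = np.diag(eigvals ** -0.5) @ u.T             # whitening matrix P
    s1 = p @ c1 @ p.T                              # whitened class-1 matrix
    lam1, b = np.linalg.eigh(s1)
    order = np.argsort(lam1)[::-1]
    lam1, b = lam1[order], b[:, order]
    return b.T @ p, lam1


def select_head_tail(z, n):
    """Keep the first n and last n rows of the projected signal."""
    return np.vstack([z[:n], z[-n:]])


# Demonstration with random stand-in matrices (not EEG data).
rng = np.random.default_rng(0)
a1 = rng.standard_normal((16, 200))
a2 = rng.standard_normal((16, 200))
c1, c2 = a1 @ a1.T / 200, a2 @ a2.T / 200
w, lam1 = csp_style_filter(c1, c2)
z = w @ rng.standard_normal((16, 250))             # project one 16 x 250 epoch
features = select_head_tail(z, 5)
```

After projection, `w @ c1 @ w.T` is diagonal with entries `lam1` and `w @ c2 @ w.T` is diagonal with entries `1 - lam1`, which is the property that lets the first and last rows separate the two classes.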
2) Feature Combination Method: CSP can effectively extract spatial features from EEG signals that are easy to distinguish and classify. However, Alotaiby et al. [51] showed that the performance of the CSP spatial filter depends strongly on the working frequency band of the EEG, so this paper analyzes the spatial characteristics of PCMICSP in multiple frequency bands. The feature combination method designed by Zhang et al. [52] is used to combine the spatial features extracted by PCMICSP with the time and frequency features of the source signal. In this way, the time-domain, spatial-domain, and frequency-domain features of each category can be extracted from multi-channel EEG signals. The spatial-domain feature is the normalized spatial feature obtained in subsection 1). The time-domain feature comes from splitting the source EEG signal into the 1-second segments before and after the event. The frequency-domain feature is the combined feature of multiple frequency bands of the source signal. Fig. 6 is a flow chart of the feature combination method as applied to the data in this paper.
In this experiment, the preprocessed data are the EEG signals one second before and one second after the subjects press the key in the test game. Each segment has size 16 × 250, where 16 is the number of EEG leads and 250 is the number of sampling points in the 2 s event sequence. First, the data is divided into EEG_0 and EEG_1 according to the event markers, representing the data one second before and one second after the key press, respectively. Then, the PCMICSP algorithm proposed in this paper is used to obtain the normalized feature matrix of each of the two segments, whose dimension is 1 × 16 and whose entries represent the weights of the 16 channels' features in the spatial domain. Finally, the two-dimensional feature matrix is constructed according to the frequency band combination. In this paper, combinations of five frequency bands [45] are selected. To make the final two-dimensional feature matrix square, the last step of the PCMICSP algorithm selects the first five rows and the last five rows of data as the feature matrix. EEG_0 and EEG_1 thus each yield a 5 × 10 feature matrix, where 5 is the number of frequency bands and 10 is the number of selected spatial features. The two feature matrices are stacked vertically into one two-dimensional feature matrix, which serves as the input of the subsequent convolutional neural network classifier. The matrix contains the time-domain, spatial-domain, and frequency-domain characteristics of the EEG signals of each key event before and after spatial cognitive training.
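The stacking step described above amounts to placing two 5 × 10 blocks on top of each other. A minimal sketch, assuming the per-band spatial features have already been computed (the function name `combine_features` and the random placeholder values are hypothetical):

```python
import numpy as np


def combine_features(feats_before, feats_after):
    """Stack two (n_bands, n_features) blocks into one 2-D feature map.

    feats_before : features of EEG_0 (the second before the key press)
    feats_after  : features of EEG_1 (the second after the key press)
    """
    before = np.asarray(feats_before, dtype=float)
    after = np.asarray(feats_after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("both blocks must have the same shape")
    return np.vstack([before, after])


# Placeholder values standing in for 5 bands x 10 spatial features.
rng = np.random.default_rng(1)
eeg0_feats = rng.random((5, 10))
eeg1_feats = rng.random((5, 10))
feature_map = combine_features(eeg0_feats, eeg1_feats)   # shape (10, 10)
cnn_input = feature_map[..., np.newaxis]                 # shape (10, 10, 1)
```

The trailing singleton axis gives the 10 × 10 × 1 layout that an image-style CNN input layer expects.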

F. Classification and Statistical Methods
1) Classification Model: A CNN model is used to classify the EEG signals before and after spatial cognitive training. CNNs are widely used in computer vision and natural language processing [53]. This paper uses the Keras deep learning library to build the CNN model, with the Adam optimizer, a learning rate of $10^{-4}$, a batch size of 64, and 200 epochs per training run. The two-dimensional feature matrix is used as the input of a shallow CNN to construct the classification model of EEG signals before and after spatial cognitive training. Fig. 7 shows the structure of the CNN model.
The two-dimensional feature matrix obtained by the above feature combination method is used as the input layer of the CNN model, with dimension 10 × 10 × 1. The second layer is a convolution layer that applies 28 convolution kernels of size 3 × 3 with a stride of 1 to the two-dimensional feature matrix of the input layer, with the rectified linear unit (ReLU) as the activation function. The third and fifth layers are max pooling layers of size 2 × 2 with a stride of 2. Each pooling layer downsamples the output matrix of the previous layer. The fourth layer is a convolution layer that uses 56 convolution kernels of size 3 × 3 with a stride of 1, followed by the ReLU function for nonlinear transformation. The sixth and seventh layers are fully connected layers containing 56 and 14 neurons respectively, both with the ReLU activation function. The last layer is the output layer, which contains two neurons and performs the binary classification with the Softmax activation function. To increase the robustness and generalization of the model, Dropout with a rate of 0.5 was added to the fully connected sixth and seventh layers.
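As a quick sanity check on the layer sizes described above, the spatial dimensions can be traced by hand. The sketch below assumes unpadded ("valid") convolutions and pooling, which the text does not state; under that assumption the feature map shrinks to 1 × 1 × 56 before the fully connected layers. The helper names `conv_out` and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1


def trace_shapes(side=10):
    """Follow the feature-map shape through the conv and pooling layers."""
    shapes = [("input", side, side, 1)]
    side = conv_out(side, 3, 1)
    shapes.append(("conv 3x3, 28 kernels", side, side, 28))
    side = conv_out(side, 2, 2)
    shapes.append(("max pool 2x2", side, side, 28))
    side = conv_out(side, 3, 1)
    shapes.append(("conv 3x3, 56 kernels", side, side, 56))
    side = conv_out(side, 2, 2)
    shapes.append(("max pool 2x2", side, side, 56))
    return shapes


shapes = trace_shapes()
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")
print("flattened size fed to the fully connected layers:", flattened)
```

If the convolutions were zero-padded instead, the spatial sizes would be larger at every stage, so the padding mode is worth confirming against Fig. 7 before reproducing the model.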
2) Statistical Method: In this study, 5-fold cross-validation was used to evaluate the overall performance of the model. Four evaluation indicators were compared: precision, F1 value, recall, and AUC value. Because the accuracy curve and the loss curve can be used to evaluate how well the CNN model fits, this study reports the average accuracy curve and the average loss curve obtained through 5-fold cross-validation. The accuracy curve shows how the accuracy of the CNN model changes over the training iterations, and the loss curve shows how the loss of the model changes on the validation data set.
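The evaluation protocol can be illustrated with a short standard-library sketch. It shows how precision, recall, and F1 follow from predicted and true labels, and how sample indices can be split into five folds. The function names and the contiguous, unshuffled split are illustrative assumptions; the paper does not describe how its folds were drawn, and AUC is omitted because it needs predicted scores rather than labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Tiny illustration: 0 = pre-training epoch, 1 = post-training epoch.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(12, k=5))
```

In practice each fold's metrics would be computed on its held-out indices and then averaged across the five folds, which is how the averaged tables and curves in the results are described.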

III. RESULTS
In this study, the traditional CSP, mutual information CSP, conditional mutual information CSP, and the proposed permutation conditional mutual information CSP (CSP, MICSP, CMICSP, and PCMICSP) were each combined with the feature combination method to convert the EEG signals before and after spatial cognitive training into two-dimensional images. The MICSP algorithm is a CSP feature extraction algorithm that incorporates mutual information. The main difference between MICSP and PCMICSP is that MICSP uses the mutual information matrix, rather than the permutation conditional mutual information matrix, to calculate the mixed space matrix of the two classes of data. The same applies to the CMICSP algorithm. The image data produced by the four methods were then input into the shallow CNN model for classification. The superiority of the PCMICSP feature extraction algorithm in evaluating the effect of spatial cognitive training was verified by the CNN classification results. PCMICSP features can be combined in many ways across time domains and frequency domains. This paper reports the classification results for four frequency band combinations [33]: Delta-Theta-Alpha1-Beta1-Beta2, Theta-Alpha1-Beta1-Beta2-Gamma, Delta-Alpha2-Beta1-Beta2-Gamma, and Delta-Theta-Alpha2-Beta1-Gamma.
A. Delta-Theta-Alpha1-Beta1-Beta2 Band Combination
Fig. 8 and Fig. 9 show the average accuracy and loss curves of the CNN model for the CSP, MICSP, CMICSP, and PCMICSP feature sets obtained under the Delta-Theta-Alpha1-Beta1-Beta2 frequency band combination. The figures show that when the training iterations of the CNN model reach 110-120, 120-130, 150-160, and 160-170, the average accuracy of the PCMICSP, CMICSP, CSP, and MICSP feature sets stabilizes at 98%-98.5%, 96%-97%, 94.5%-95.5%, and 96%-97%, respectively. The corresponding average loss rate stabilizes at 4%-5%, 7%-8%, 12%-13%, and 8%-9%, respectively. Table I shows the average evaluation values of the four feature sets in the CNN model under the Delta-Theta-Alpha1-Beta1-Beta2 frequency band combination. As can be seen from Table I, every evaluation index of the PCMICSP feature set classified by the CNN is superior to that of CMICSP, MICSP, and CSP. Table II shows the classification accuracy of the CNN model under the Delta-Theta-Alpha1-Beta1-Beta2 frequency band combination. As can be seen from Table II, every accuracy index of the PCMICSP feature set classified by the CNN is superior to that of CMICSP, MICSP, and CSP.
B. Theta-Alpha1-Beta1-Beta2-Gamma Band Combination
Fig. 10 and Fig. 11 show the average accuracy and loss curves, respectively, of the CSP, MICSP, CMICSP, and PCMICSP feature sets in the CNN model under the Theta-Alpha1-Beta1-Beta2-Gamma band combination. The figures show that when the training iterations of the CNN reach 160-170, the average accuracy rate and average loss rate of MICSP and CMICSP are stable at 96%-96.5% and 10%-11%, respectively. In addition, when the training iterations of the CNN model reach 170-180 and 190-200, the average accuracy of the PCMICSP and CSP feature sets is stable at 98%-98.5% and 94.5%-95.5%, respectively, and the corresponding average loss rate is stable at 8%-9% and 12%-13%, respectively. Table III shows the average evaluation values of the four feature sets in the CNN model under the Theta-Alpha1-Beta1-Beta2-Gamma band combination. It can be seen from Table III that every evaluation index of the PCMICSP feature set classified by the CNN is superior to that of CMICSP, MICSP, and CSP. Table IV shows the classification accuracy of the CNN model under the Theta-Alpha1-Beta1-Beta2-Gamma frequency band combination. As can be seen from Table IV, every accuracy index of the PCMICSP feature set classified by the CNN is superior to that of CMICSP, MICSP, and CSP.
C. Delta-Alpha2-Beta1-Beta2-Gamma Band Combination
Fig. 12 and Fig. 13 respectively display the average accuracy and loss curves of the CSP, MICSP, CMICSP, and PCMICSP feature sets in the CNN model under the Delta-Alpha2-Beta1-Beta2-Gamma band combination. The figures show that when the training iterations of the CNN reach 181-190, the average accuracy rates of CSP and PCMICSP are stable at 94%-95% and 97%-97.5%, respectively, and the corresponding average loss rates are stable at 11%-12% and 7%-8%, respectively. Also, when the training iterations of the CNN model reach 180-190 and 190-200, the average accuracy of the MICSP and CMICSP feature sets is stable at 96.5%-97%, and the corresponding average loss rate is stable at 10%-11% and 9%-10%, respectively. [...] CNN classification is superior to that of CMICSP, MICSP, and CSP.
D. Delta-Theta-Alpha2-Beta1-Gamma Band Combination
Fig. 14 and Fig. 15 show the average accuracy and loss curves, respectively, of the CSP, MICSP, CMICSP, and PCMICSP feature sets obtained with the above feature combination method in the CNN model for the Delta-Theta-Alpha2-Beta1-Gamma band combination. The figure shows that when the training iterations of the CNN model reach 150-160, the [...]
Table IX shows that the P-values of the three spatial cognition scales for the 7 individuals before and after training were all less than 0.05, indicating statistically significant differences. The average CBTT score increased from 37.14 points before training to 50 points after training (a gain of 12.86 points). The average GZSOT score increased from 4.86 points before training to 9.89 points after training (a gain of 5.03 points). The mean error angle obtained by PTSOT decreased from 35.75 points before training to 19.32 points after training (a reduction of 16.43 points).
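To illustrate how a before/after comparison on 7 participants can yield a P-value, the following is a minimal sketch. This section does not state which statistical test produced the values in Table IX, so the sketch swaps in an exact two-sided sign-flip permutation test on the paired differences; the function name `paired_sign_flip_p` is hypothetical and the method is an assumption, not the authors' procedure.

```python
from itertools import product


def paired_sign_flip_p(before, after):
    """Exact two-sided sign-flip permutation P-value for paired scores.

    Assumption: under the null hypothesis each paired difference is equally
    likely to be positive or negative, so all 2**n sign patterns are
    enumerated (feasible for small n such as 7).
    """
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With 7 participants there are 2^7 = 128 sign patterns, so the smallest two-sided P-value this test can return is 2/128 (about 0.016), which occurs when every participant changes in the same direction.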

A. Feasibility and Effectiveness of PCMICSP
In this section, the CSP feature extraction method and the convolutional neural network classification method were combined to classify and evaluate the EEG signals of the spatial cognitive training group before and after training. The classification results show that the accuracy and other performance indexes of MICSP, CMICSP, and PCMICSP are better than those of traditional CSP, which indicates that combining the common spatial pattern feature extraction algorithm with MI theory [54] can effectively improve classification accuracy. MI-based measures capture both linear and nonlinear correlation in EEG signals, which is important because EEG signals are non-stationary and nonlinear. Comparing the classification accuracy and other performance indexes of MICSP, CMICSP, and PCMICSP shows that the proposed PCMICSP gives the best results under multiple frequency band combinations, while the results of MICSP and CMICSP are not significantly different. This is consistent with the conclusion of Li and Ouyang [32] that the permutation conditional mutual information algorithm is superior to the conditional mutual information method. PCMI can more effectively detect the interaction delay between the event sequences of two brain regions. In addition, PCMI is robust to EEG noise, so noise does not seriously damage the inherent timing pattern, especially when the coupling strength between channels is strong [48], [49], [55].
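The PCMI quantity discussed above can be sketched for a single pair of channels as follows. This is a minimal plug-in estimate, assuming ordinal (permutation) patterns of embedding order `order` and lag `lag`, a prediction delay `delta`, and the standard conditional mutual information identity I(X; Y_delta | Y) = H(X,Y) + H(Y_delta,Y) - H(Y) - H(X,Y_delta,Y). The function names and default parameters are illustrative assumptions, not the settings used in this study.

```python
from collections import Counter
from math import log


def ordinal_patterns(x, order=3, lag=1):
    """Map each embedded window of x to its rank-order (permutation) pattern."""
    n = len(x) - (order - 1) * lag
    patterns = []
    for i in range(n):
        window = [x[i + j * lag] for j in range(order)]
        # Indices of the window sorted by value define the ordinal pattern.
        patterns.append(tuple(sorted(range(order), key=window.__getitem__)))
    return patterns


def entropy(symbols):
    """Plug-in Shannon entropy (in nats) of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c / n * log(c / n) for c in counts.values())


def pcmi(x, y, delta=1, order=3, lag=1):
    """Estimate I(X; Y_delta | Y) on ordinal patterns (direction x -> y)."""
    px = ordinal_patterns(x, order, lag)
    py = ordinal_patterns(y, order, lag)
    n = min(len(px), len(py)) - delta
    if n <= 0:
        raise ValueError("series too short for the chosen order, lag, and delta")
    xs, ys, yd = px[:n], py[:n], py[delta:delta + n]
    return (entropy(list(zip(xs, ys))) + entropy(list(zip(yd, ys)))
            - entropy(ys) - entropy(list(zip(xs, yd, ys))))
```

Because the estimate depends only on the rank order within each window, it is unaffected by amplitude scaling of either channel. Evaluating `pcmi` for each ordered pair of leads would give one way to fill a channel-by-channel coupling matrix; how the entries are normalized or combined in PCMICSP is not specified in this section.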

B. The Classification Performance of PCMICSP
The classification results of the PCMICSP feature set under the different frequency band combinations were analyzed separately. The CNN model for the Delta-Theta-Alpha1-Beta1-Beta2 band combination had the best classification performance in terms of average loss, average accuracy, and the other evaluation indexes. The Theta-Alpha1-Beta1-Beta2-Gamma band combination ranked second, by a small margin. The variation in the Theta band is consistent with the findings of Guilford and Zimmerman [37]: Theta band oscillations in humans are correlated with the encoding and retrieval of spatial information. Koenig et al. found an increase in EEG synchronization measures in the Delta band in their study of spatial cognitive network function [56]. The Theta and Alpha1 bands are consistent with the results in [40]: the brain network attributes of the Theta and Alpha1 bands change most significantly before and after training. Yuan [33] reported good EEG classification performance in the Beta1 and Beta2 bands before and after spatial cognitive training, which is consistent with the band combination obtained in this study. It can be concluded that, among the band combinations tested, the PCMICSP EEG feature extraction method based on the Delta-Theta-Alpha1-Beta1-Beta2 band combination is the most effective for evaluating the effect of spatial cognitive training.
However, the sample size in this study is relatively small, so deep learning models with more hidden layers could not be used. The validity of the PCMICSP algorithm should therefore be further verified with more EEG data in future work.

V. CONCLUSION
In this paper, a common spatial pattern feature extraction algorithm based on permutation conditional mutual information (PCMICSP) is proposed. The covariance matrix in the original CSP algorithm is replaced by the permutation conditional mutual information matrix, so that CSP can construct spatial filters from both the linear and the nonlinear correlation of EEG signals. The EEG signals recorded before and after spatial cognitive training were used as the data set, and the feature combination method and convolutional neural network classification were used to verify the performance of the proposed feature extraction algorithm. The experiments show that PCMICSP achieves the highest classification accuracy under multiple frequency band combinations. The algorithm combines the advantages of PCMI and relaxes the strict linearity assumption of traditional CSP. The experimental results show that the PCMICSP algorithm can be used for the classification and evaluation of EEG recorded during spatial cognitive tasks.
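For reference, the filter construction summarized above can be written out in standard CSP notation. This is a sketch under stated assumptions: $M_1$ and $M_2$ denote the class-wise PCMI matrices (pre-test and post-test), taken to be symmetric with a positive definite sum so that the decompositions are real, and $X$ is a channels-by-samples EEG segment. The symbols are illustrative and may differ from those used in the paper's method section.

```latex
\begin{aligned}
M_c &= M_1 + M_2 = U \Lambda U^{\mathsf{T}}, \\
P   &= \Lambda^{-1/2} U^{\mathsf{T}}, \\
S_k &= P M_k P^{\mathsf{T}} = B \Lambda_k B^{\mathsf{T}}, \quad k \in \{1, 2\},
       \qquad \Lambda_1 + \Lambda_2 = I, \\
W   &= B^{\mathsf{T}} P, \qquad Z = W X .
\end{aligned}
```

Since $S_1 + S_2 = P M_c P^{\mathsf{T}} = I$, the two whitened matrices share the eigenvectors $B$, and a direction with a large eigenvalue for one class has a correspondingly small eigenvalue for the other. This is why the rows of $W$ associated with the extreme eigenvalues are the ones usually retained as spatial filters.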