Identifying Patients with Poststroke Mild Cognitive Impairment by Pattern Recognition of Working Memory Load-Related ERP

The early detection of subjects with probable cognitive deficits is crucial for effective appliance of treatment strategies. This paper explored a methodology used to discriminate between evoked related potential signals of stroke patients and their matched control subjects in a visual working memory paradigm. The proposed algorithm, which combined independent component analysis and orthogonal empirical mode decomposition, was applied to extract independent sources. Four types of target stimulus features including P300 peak latency, P300 peak amplitude, root mean square, and theta frequency band power were chosen. Evolutionary multiple kernel support vector machine (EMK-SVM) based on genetic programming was investigated to classify stroke patients and healthy controls. Based on 5-fold cross-validation runs, EMK-SVM provided better classification performance compared with other state-of-the-art algorithms. Comparing stroke patients with healthy controls using the proposed algorithm, we achieved the maximum classification accuracies of 91.76% and 82.23% for 0-back and 1-back tasks, respectively. Overall, the experimental results showed that the proposed method was effective. The approach in this study may eventually lead to a reliable tool for identifying suitable brain impairment candidates and assessing cognitive function.


Introduction
Cognitive impairment after a stroke can affect the activities of daily living. Specifically, stroke patients are often associated with the working memory loss compared to age-matched healthy controls. The working memory is used for temporary storage and manipulation of the information and plays a key role in long-term memory, language, and execution function. Mild cognitive impairment (MCI) is common in poststroke patients [1], and it is widely considered to be the clinical transition stage between normal aging and dementia. Therefore, early MCI detection is of crucial importance for preventing poststroke dementia onset [2][3][4][5][6].
The accurate identification and assessment of cognitive function present a major clinical challenge. At present, the medical diagnosis of MCI is usually performed by some extensive tests including neuropsychological tests such as Mini-Mental State Examination (MMSE), neurological examination, and electrophysiological signal detection such as EEG [4,[7][8][9]. Due to its temporal resolution in the millisecond range as well as its noninvasiveness, wide availability, and relatively low costs, EEG is a popular measurement technique containing a lot of information about the human brain function and neurological disorders [10,11]. It can also provide the objectivity and quantity evidence for the medical diagnosis [12,13].
Prior EEG studies have concentrated on measuring scalp P300 event related potential (ERP) and EEG frequency power in cognitive impairment patients. The P300 component is typically elicited approximately 300 ms after each infrequent target stimulus, with reflecting the context updating and the categorization of relevant tasks [14][15][16]. Parameters extracted from ERP signals are of clinical interest because they are useful in differentiating the healthy controls from cognitive impairment patients [8,11,17]. In the time domain, the majority of studies on P300 in cognitive impairment have reported prolonged latencies and reduced amplitudes in visual or auditory modality [15,18]. In the frequency domain, the spectral  studies on the cognition have reported theta power changes due to increased demands on cognitive processes, such as the presentation of infrequent target stimulus in an oddball paradigm. Therefore, it is possible that the reduction of EEG theta power is a feature of cognitive impairment [2,15,19,20].
Some studies have been motivated by the goal of using EEG to identify cognitive impairment patients with effective algorithms. Lehmann et al. explored the ability of a multitude of linear and nonlinear classification algorithms (i.e., linear discriminant analysis (LDA), neural network (NN), and support vector machine (SVM)) to discriminate between EEG signals of patients with varying degrees of cognitive impairment [9]. Dauwels et al. used LDA and quadratic discriminant analysis (QDA) to classify cognitive impairment [4]. Akrofi et al. studied the classification of cognitive impairment using Gaussian mixture model and selected features from relative average power and the coherence between intrahemispheric channel pairs [3]. Gallego-Jutglà et al. used theta band power and LDA to achieve the best accuracy for diagnosing cognitive impairment [21].
In those EEG-based classification algorithms, SVM based on structural risk minimization yields good performances in many applications, especially for solving problems with high dimension, nonlinearity, and small dataset. However, it is often unclear what the most suitable kernel in SVM is, and so the user may wish to combine several possible kernels. Multiple kernel learning SVM (MKL-SVM) is an efficient way of optimizing kernel weights [22,23]. Compared with one single kernel SVM, MKL-SVM can enhance the interpretability of the decision function and improve the performance [24,25].
Recently, nonlinear methods that include independent component analysis (ICA) and orthogonal empirical model decomposition (OEMD) have been proposed to extract parameters for the analysis and classification of EEG signals [11,16]. ICA is a kind of blind source separation technique that extracts statistically independent sources called independent components (ICs) from a set of recorded signals [26]. OEMD is a self-adaptive signal processing and data driven method. Compared with classical time-frequency analysis methods, such as short time Fourier transform (STFT) and Wavelet decomposition, it is based on the local characteristic time scales of a signal and could decompose the signal into a set of complete orthogonal components called intrinsic mode functions (IMFs) which are determined by the signal itself without prior knowledge about the signal [26,27]. OEMD can overcome the mode aliasing and avoid the occurrence of the fault mode [28].
In this study, we first applied the algorithm combining ICA and OEMD to the ERP data of stroke patients and healthy controls and used four types of features including P300 peak latency, P300 peak amplitude, root mean square (RMS), and theta frequency band power to separate stroke patients from healthy ones. Then the features and the evolutionary multiple kernel SVM (EMK-SVM) based on genetic programming (GP) were used to perform the recognition of stroke patients and healthy controls based on working memory tasks. These tasks that may elicit a P300 ERP component were 0-back and 1-back tasks.

Materials and Methods
We proposed a classification approach of stroke patients and healthy controls, as illustrated in Figure 1. The presented approach consisted of three main parts. (1) The preprocessing algorithm combining ICA and OEMD was used to extract independent source components from the 18-channel ERP signals. (2) Four types of features including P300 peak latency, P300 peak amplitude, RMS, and theta frequency band power were estimated, and they were differently chosen to compose a feature vector for further classification. (3) EMK-SVM was employed to perform the working memory task classification, and the classification accuracies were used to evaluate the performance of the proposed algorithm.

ERP Recordings.
Consecutive patients aged 50 years or older with a first-ever acute ischemic stroke at Huadong Hospital Affiliated to Fudan University between May 2012 and January 2013 were recruited. All patients underwent neuropsychological and neuroimaging assessments, and those who met the criteria for vascular MCI were included ( = 13) [29,30]. 13 age-and sex-matched healthy controls were enrolled in this cross-sectional study. All subjects were right handed and had normal vision. This study was approved by Huadong Hospital Affiliated to Fudan University Ethics Board, and all subjects gave written, informed consents before participation. As shown in Figure 2, the working memory was assessed using a verbal -back task [5,31]. A pseudorandom set of 4 digit numbers was displayed on a monitor, and the subjects were instructed to determine whether specific digit one appeared on the screen (0-back task); or the currently displayed number at any given time had been already displayed in the previous presentation (1-back task). Stimuli consisted in a 0.5 sec. Inter-stimulus interval (ISI) was 2.5 sec in all conditions. Subjects had to distinguish between targets and nontargets by pressing a keyboard. Continuous ERP signals were acquired using an EEG/ERP amplifier system (NATION Inc.). For all ERP recordings, 18 electrodes were placed according to the 10-20 international system. The chosen electrode positions were EOG1, EOG2, Fp1, Fp2, F3, F4, F7, F8, Fz, C3, Cz, C4, P3, Pz, P4, O1, Oz, and O2 (see Figure 2). The data were sampled at 256 Hz. Signals were recorded for 120 s during each task. Each task was repeated for three sessions. Each session contained 40 trials with a 1 : 1 target/nontarget relation. Namely, the total number of targets was 60, the same as that of nontargets. The hit rate and the reaction time were measured, as shown in Figure 3.

Preprocessing Algorithms.
The difficulty in developing a classification system based on EEG is to discriminate the responses from the background noise reliably because the signals are relatively weak and interfered easily by the artifacts, such as electromagnetic interference, powerline noise, EOG, EMG, ECG, and subject movements. Some preprocessing algorithms have been applied to the EEG data in order to extract more informative features, which can be used as inputs to a classifier to improve the classification accuracy of task-related activity, such as ICA and OEMD [32,33]. The EMD method may encounter the difficulty of mode mixing, and it is mainly caused by noise, boundary effect, and so forth. This type of mixing will lead to the presence of several components of the signal of interest on the same IMF, which can cause the difficulty in the physical discriminant of each mode. To solve these problems, ICA and OEMD were combined in this study. we first removed the artifacts from the given ERP signals to extract statistically independent sources by independent component analysis (ICA) and then decomposed them to extract real IMF components using OEMD algorithm.

Independent Component Analysis.
ICA was applied on the entire collection of raw ERP signals ( ) = [ 1 ( ), . . . , ( )] , where indicated the number of channels on the scalp. The goal of ICA is to find an unmixing matrix that initially produces the ERP signals ( ) based on statistically independent sources ( ) in the matrix form = → . In contrast to correlation-based transformations such as principal component analysis (PCA), ICA reduces the statistical dependencies of the signals and attempts to make them as independent as possible. This technique has shown great promise for analyzing EEG recordings [16,[34][35][36]. There are many ways for learning . We used the extended Infomax algorithm which minimizes the mutual information among the data projections in order to achieve the independence.
The learning rule of the unmixing matrix is [36] ∝  where are elements of the -dimensional diagonal matrix , is the iteration number, and is the step size. The switching parameter can be derived from the variation of the kurtosis sign. can be obtained as

Orthogonal Empirical Model
Decomposition. EMD is a data-adapted interactive method, which can decompose any complicated time series into additive components with multiscale features; that is, These components are denoted by IMFs, where is the number of IMFs, ( ) is the th IMF, and +1 ( ) is the final residue [37].
One of the major drawbacks of the original EMD algorithm is that the IMFs are not strictly orthogonal to each other [27], which can cause the energy leakage while decomposing. Therefore, it is necessary to perform the orthogonal processing for the IMFs from EMD in order to obtain the completely orthogonal IMFs [38].
In particular, once the first IMF is derived, define 1 ( ) = 1 ( ), which is the smallest temporal scale in ( ).
To determine the rest of the IMFs, generate the residue 1 ( ) = ( )− 1 ( ). 1 ( ) can be treated as the new signal and the EMD decomposing is performed for the second IMF 2 ( ). In order to achieve the orthogonal component, 1 ( ) has to be subtracted from 2 ( ); that is, where 2 ( ) is the second orthogonal IMF and 21 is the orthogonal parameter between 2 ( ) and 1 ( ). With the orthogonality between 2 ( ) and 1 ( ), 21 can be obtained as The above process is repeated until the expected index number of IMF is met [11,38].

Ensemble ICA-OEMD.
First, ICA based on the extended Infomax algorithm was applied to ERP signals, and the independent components were extracted. Second, OEMD was performed for each obtained source, and a set of orthogonal IMFs was derived, in which only the IMF of interest based on theta frequency band power and the peak between 200 ms and 450 ms was selected. The combining decomposition is adaptive to the time and frequency content of the data themselves and can separate original ERP signals into orthogonal components with different time scales. Meanwhile, Step 1. Initialize control parameters and 0 ; Step 2. Center data; Step 3. Compute SVD of the centered data matrix; define the th IMF ( ); Step 8. Update residual +1 ( ) = ( ) − ( ); Step 9. Repeat Step 7-8 with = + 1 until the expected index number of IMF is met.
Algorithm 1: ICA-OEMD. the orthogonality property implies that different IMFs do not have similar frequency content [26]. The algorithm is described in Algorithm 1.

Feature Extraction.
The efficient feature extraction from multichannel working memory task EEG signals is a major component for the cognitive state classification. In this study, four types of features including P300 peak latency, P300 peak amplitude, RMS, and theta frequency band power were chosen. They are classical, quantitative ERP measures that are commonly used in this field [14]. In particular, we extracted theta frequency band data of working memory task EEG signals through the wavelet packet transform (WPT) decomposition (4 levels) with the wavelet basis db4 and then calculated the energy spectrum.

Classification
Algorithm. SVM is a powerful approach for pattern recognition especially for high dimensional, nonlinear problems. Recent developments on SVM have shown that it is necessary to consider multiple kernels [22]. This provides flexibility and reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. Although the MKL-SVM algorithm is shown to improve the classification performance effectively, it relies more on the empirical kernel functions (e.g., polynomial function and radial basis function) and parameters (e.g., degree and Gaussian width), which can affect its effectiveness because different functions and parameters may result in different performances. A potential solution is to use GP to evolve the kernels and associated parameters automatically [39,40]. GP is an evolutionary algorithm inspired by biological evolution, where each individual is a computer program represented as a tree structure. These computer programs that solve a given problem are genetically bred. This breeding is done by using the genetic operations from Darwinian principle, such as selection, crossover, and mutation. The evolutionary process is repeated over many generations until the fittest individual computer program is found [40].
In this study, we evolved more effective kernel function using GP and kernel closure properties, where each tree structure represented a multiple kernel function [41]. EMK-SVM is described in Algorithm 2.
The algorithm was based on GP-kernel and SVM. First, GP started with an initial population of randomly generated computer programs which are composed of functions and terminals, and their individuals were some nonlinear combination trees of kernel functions. Second, the SVM learning was performed for each individual tree. The following steps Step 1. Create initial random population with ramped half and half; Step 2. Calculate the classification accuracy as the fitness f of each individual with SVM; Step 3. Repeat the following Step 4 until max , − is less than 0.01 or the maximum generation number is met. were performed iteratively until the termination criterion has been satisfied. Third, the best kernel function that appeared in any generation was designated as the result.
Some settings of control parameters used in this study are given in Table 1.

General Classification Performance.
In this study, 16channel ERP signals were classified via a 5-fold crossvalidation. First, the ERP data were partitioned into 5 equally sized folds. Second, 5 iterations of training and validation were performed such that within each iteration a different fold of the ERP data was held out for validation while the remaining 4 folds were used for learning. Finally, the classification results from 5 folds were averaged to produce the classification accuracy. Four types of features, including P300 peak latency, P300 peak amplitude, RMS, and theta frequency band power, were extracted for the classification using the EMK-SVM algorithm.
The results of the -test and -test are shown in Table 2. For 0-back and 1-back tasks, the differences between stroke patients and healthy controls were significant. There were differences of P300 latency, P300 amplitude, RMS, and theta band power in the stroke patients compared with healthy ones. In particular, the differences of P300 amplitude and RMS between stroke patients and healthy controls in 1-back task were larger.
For 0-back and 1-back tasks, the multiple comparison tests were also performed to determine which were significantly different from other groups on the basis of Table 2. The results of the multiple comparison tests with the Bonferroni procedure are shown in the Table 3. For four types of features, the differences between stroke patient group and healthy control group were obvious, which were consistent with the results in Table 2. These significant differences provide a basis for the further classification.  Table 4 shows the results achieved with four selected features using the EMK-SVM classifier distinguishing stroke patients versus healthy controls. The classification results obtained in 0-back and 1-back tasks ranged from 75% to 91.67%. In 0-back task, the accuracy for RMS or theta band power was the highest, 91.67%, while that for peak latency was the lowest, 78.4%. In 1-back task, the accuracy for theta band power was the highest, 82.23%, while that for peak latency was the lowest, 75%. As mentioned above, the general classification accuracy of 0-back task was higher than that of 1-back task. It can be seen that theta band power is the best feature used for the classification. Figures 4 and 5 show the classification comparison results of 0-back and 1-back tasks between the proposed algorithm and another two state-of-the-art algorithms (QDA and LDA) with the same features. As has been shown, the classification results based on EMK-SVM were better than those based on QDA and LDA.

Classifier Results.
EMK-SVM based on the parameters listed in Table 1 is performed for automatic classification of stroke patients and healthy controls. An initial population of 100 kernel function trees was created and then iteratively proceeded through 100 generations with the genetic operations. The initial generation consisted of highly unfit individuals. The intermediate generations contained a few somewhat fit individuals.    The final generation of each run contained at least one individual that was effective in solving the classification problem. The optimal kernel functions are shown in Table 5. For example, the optimal kernel function for 0-back task classification with theta band power feature was exp(−1.68 2 Poly 2 RBF ), which occured in generation 79.

Conclusions
In this paper, we presented the EMK-SVM algorithm for ERP-based signal classification for stroke patients and healthy controls with four features (i.e., P300 peak latency, P300 peak amplitude, RMS, and theta frequency band power). The proposed method had better performance than other typical methods (i.e., QDA and LDA). It achieved above 78.4% accuracy for 0-back task and above 75% for 1-back task. The statistical test results showed that the differences of selected features were significant. Therefore, it provides a powerful tool to assess cognitive function.
In sum, it is an effective method to implement working memory task-based BCI based on cognitive impairment. We applied the preprocessing algorithm combining ICA and OEMD to extract more informative features and also applied the recognition algorithm combining SVM and GP to discover better kernels to improve the classification accuracy. This study provides theoretical and experimental basis of the quantity diagnosis for cognitive impairment. It is helpful for the intelligent identification of cognitive function and appropriate rehabilitation training.