Elsevier

Expert Systems with Applications

Volume 113, 15 December 2018, Pages 18-32

Classification of focal and non-focal EEG signals using neighborhood component analysis and machine learning algorithms

https://doi.org/10.1016/j.eswa.2018.06.031

Highlights

  • CADFES tool is designed for automated classification of focal seizures

  • Significant features were selected using neighbourhood component analysis

  • Regularization parameter was optimized to ensure less classification loss

  • Classification accuracy of 96.1% was attained using support vector machine classifier

  • Results were better than existing approaches.

Abstract

Background: Classification and localization of focal epileptic seizures support a proper diagnostic procedure for epilepsy patients. Visual identification of seizure activity from long-term electroencephalography (EEG) is tedious, time-consuming, and prone to human error. Therefore, there is a need for an automated classification system.

Methods: In this paper, we introduce a tool called CADFES (computerized automated detection of focal epileptic seizures). A total of 41.66 hours of EEG data from the Bern-Barcelona database was used. A set of 28 features was extracted from the time, frequency, and statistical domains, and significant features were selected using neighborhood component analysis (NCA). In NCA, optimizing the regularization parameter yielded better classification accuracy (lower classification loss) with seven features. The performance of the algorithm was assessed using support vector machine (SVM), K-nearest neighbor (K-NN), random forest, and adaptive boosting (AdaBoost) classifiers.

Results: Experimental results showed sensitivity, specificity, accuracy, positive predictive rate, negative predictive rate, and area under the curve of 97.6%, 94.4%, 96.1%, 92.9%, 98.8%, and 0.96, respectively, using the SVM classifier. Finally, a MATLAB-based software tool, referred to as CADFES, was introduced for automated classification of focal and non-focal seizures. Comparison results indicate that the proposed approach outperforms existing methods. Hence, it is expected to be useful in hospitals for automated, real-time classification of focal epileptic seizures.
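
For readers less familiar with the reported measures, the following minimal sketch shows how sensitivity, specificity, accuracy, positive predictive rate, and negative predictive rate are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical and chosen for illustration only; they are not the study's data, and the study's exact evaluation protocol is not described here.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return standard binary-classification rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                       # positive predictive rate
        "npv": tn / (tn + fn),                       # negative predictive rate
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```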

Introduction

A seizure is the clinical manifestation of abnormal, excessive, hypersynchronous discharge of a population of cortical neurons (Bromfield, Cavazos, & Sirven, 2006). Due to this hyperactive and synchronous behavior, neurons fire action potentials at a higher rate than normal. Consequently, irregular brain patterns are generated. According to the international classification of seizure types, focal or partial seizures and non-focal or generalized seizures are predominant. A damaged neuron in the cortical brain network is an indication of focal seizures. Secondarily generalized seizures spread across 75% of the brain regions (Moshé, Perucca, Wiebe, & Mathern, 2011). One must understand the process of initiation and generalization to appreciate the effects and implications of focal seizures. While a focal seizure affects one cerebral hemisphere, a non-focal seizure influences both sides of the brain. Typical symptoms of a non-focal seizure include loss of consciousness followed by generalized body stiffening (Richard, Rosen, Mattson, & Novelly, 1980).

The need for efficient and optimized computerized automated detection techniques remains important in current clinical practice. Although the existing methods reported in the literature work well for off-line seizure detection, the transition from off-line to real-time requirements poses challenges for attaining optimal clinical diagnosis. One must therefore develop an optimized bio-marker tool that is well suited to both off-line and real-time clinical conditions.

Neighborhood component analysis (NCA) was utilized to identify the optimal feature set from time, frequency, and statistical domain based features. The well-known principal component analysis (PCA) and linear discriminant analysis (LDA) are optimization techniques for class separability and dimensionality reduction of high dimensional data. Results of PCA depend on the scaling of the variables, and the LDA algorithm is optimal only if all class distributions are Gaussian (parametric assumption) with a single shared covariance (Fisher, 1936). In contrast, the NCA algorithm is a non-parametric technique that does not require the feature set to follow a Gaussian distribution (Goldberger, Roweis, Hinton, Salakhutdinov, 2005, Yang, Wang, Zuo, 2012). Hence, significant features were selected using the NCA algorithm. To the best of the authors' knowledge, this is the first application of NCA to EEG signals.
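
To make the idea concrete, the sketch below evaluates one common formulation of the regularized NCA feature-weighting objective: each sample picks a reference neighbor with probability proportional to a kernel of a weighted distance, and the objective is the average probability of picking a same-class neighbor minus a penalty on the weights. The weighted L1 distance, the exponential kernel with width `sigma`, the penalty `lam`, and the toy data are all assumptions made for illustration; the paper's exact settings are not given in this text, and an actual implementation would maximize this objective over the weights rather than only evaluate it.

```python
import math


def nca_objective(X, y, w, lam=0.0, sigma=1.0):
    """Regularized leave-one-out objective for feature-weighted NCA (to be maximized)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        kernels = []
        for j in range(n):
            if j == i:
                kernels.append(0.0)  # a sample never selects itself as reference
                continue
            # Weighted L1 distance; each feature r contributes w_r^2 * |x_ir - x_jr|.
            d = sum((wr ** 2) * abs(a - b) for wr, a, b in zip(w, X[i], X[j]))
            kernels.append(math.exp(-d / sigma))
        z = sum(kernels)
        # Probability that sample i selects a reference point of its own class.
        total += sum(k for k, yj in zip(kernels, y) if yj == y[i]) / z
    return total / n - lam * sum(wr ** 2 for wr in w)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [1.1, 0.8]]
y = [0, 0, 1, 1]

score_informative = nca_objective(X, y, w=[3.0, 0.0])
score_noise = nca_objective(X, y, w=[0.0, 3.0])
print(score_informative, score_noise)
```

In this toy setting, putting the weight on the class-separating feature gives a higher objective than putting it on the noisy one, which is the behavior that lets the learned weights rank features. Increasing `lam` lowers the objective for large weights and so pushes uninformative weights toward zero.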

Several attempts have been made to classify focal and non-focal activities from EEG recordings (Das, Bhuiyan, 2016, Gupta, Priya, Yadav, Pachori, Acharya, 2017, Sharma, Dhere, Pachori, Acharya, 2017, Sriraam, Raghu, 2017). In Sharma et al. (2017), EEG signals were decomposed using a novel class of orthogonal wavelet filter banks. Various entropies were computed from the wavelet coefficients of the decomposed EEG signal, and the least squares-support vector machine (LS-SVM) classifier achieved an accuracy of 94.25%. The difference EEG signals were decomposed into 15 levels using a flexible analytic wavelet transform (FAWT) to compute Stein’s unbiased risk estimate (SURE) entropy and log energy entropy (Gupta et al., 2017). Empirical mode decomposition (EMD) followed by discrete wavelet transform (DWT) was applied to EEG signals to compute log energy entropy (Das & Bhuiyan, 2016). The obtained features were classified using a K-NN classifier, which yielded an accuracy of 89.4%. In Sharma, Pachori, and Acharya (2015b), sample entropy and variances of the intrinsic mode functions were extracted through EMD, and an accuracy of 85% was attained using the LS-SVM classifier. Nine statistical features were derived from EEG signals decomposed using various wavelet functions such as bior5.5, coif3, db2, dmey, haar, rbio6.8, and sym6 (Chen, Wan, & Bao, 2015). That study reported an accuracy of 83% using an SVM classifier with a radial basis function kernel. Focal and non-focal EEG signals were decomposed up to six levels using DWT to compute the approximation and detail coefficients of the sub-band signals (Sharma, Pachori, & Acharya, 2015a).

Several automated seizure detection algorithms have been proposed using different features and machine learning algorithms. Menshawy et al. proposed a method using features such as the mean, standard deviation, skewness, kurtosis, and median of the first and second derivatives of EEG signals (Menshawy, Benharref, & Serhani, 2015). These features were employed in mobile-based automated epileptic seizure detection using the k-means clustering technique. Bogaarts et al. extracted features such as curve length, root mean square, band power, zero crossing, Hjorth parameters, and Teager energy to classify epileptic EEG from normal EEG (Bogaarts, 2016) using the SVM classifier. The features were ranked using the t-test, receiver operating characteristic, Bhattacharyya, and Wilcoxon methods (Raghu, Sriraam, & Hegde, 2017). The effect of wavelet packet decomposition of EEG signals was studied using log energy entropy (Raghu, Sriraam, & Pradeep Kumar, 2015). Different entropies such as log energy and norm (Raghu, Sriraam, & Pradeep Kumar, 2016), wavelet, sample, and spectral (Pravin, Sriraam, Benakop, & Jinaga, 2010), approximate (Srinivasan, Eswaran, & Sriraam, 2007), Renyi and Shannon (Acharya, Molinari, Vinitha, Chattopadhyay, 2012a, Raghu, Sriraam, 2017), spectral and normalized spectral (Srinivasan, Eswaran, & Sriraam, 2005), and minimum variance modified fuzzy (Raghu, Sriraam, Padeep Kumar, & Hegde, 2018) entropies were applied to EEG for the classification of epileptic seizures.
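
Several of the time-domain features named above have simple closed forms. The sketch below gives textbook definitions of curve length, root mean square, zero crossings, the Hjorth parameters, and mean Teager energy. The function names and the short example signal are hypothetical, and the cited studies may use different normalizations or variants; band power is omitted because it requires a spectral estimate.

```python
import math


def variance(x):
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def diff(x):
    """First difference of a sequence."""
    return [b - a for a, b in zip(x, x[1:])]


def curve_length(x):
    """Sum of absolute sample-to-sample differences."""
    return sum(abs(d) for d in diff(x))


def rms(x):
    """Root mean square amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def zero_crossings(x):
    """Number of strict sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def hjorth(x):
    """Hjorth activity, mobility, and complexity."""
    dx = diff(x)
    ddx = diff(dx)
    activity = variance(x)
    mobility = math.sqrt(variance(dx) / activity)
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return activity, mobility, complexity


def mean_teager_energy(x):
    """Mean of the Teager-Kaiser operator x[n]^2 - x[n-1] * x[n+1]."""
    values = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return sum(values) / len(values)


# Short hypothetical signal, used only to exercise the functions.
signal = [1.0, 2.0, -1.0, -2.0, 1.0, 2.0]
activity, mobility, complexity = hjorth(signal)
print(curve_length(signal), rms(signal), zero_crossings(signal))
print(activity, mobility, complexity, mean_teager_energy(signal))
```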

Not all features extracted from EEG are suitable for classification. Additionally, the presence of insignificant features burdens any classifier. Therefore, several dimensionality reduction methods such as PCA, LDA, sequential forward selection (SFS), sequential backward selection (SBS), t-distributed stochastic neighbor embedding (t-SNE), lasso, evolutionary optimization, and feature ensemble techniques have been used to identify the best features (Geva, Kerem, 1998, Ghosh-Dastidar, Adeli, Dadmehr, 2008, Iasemidis, et al., 2003, Mao, Yao, Huang, 2017, Mirowski, Madhavan, LeCun, Kuzniecky, 2009, Wang, Lyu, 2015). Ghosh-Dastidar et al. (2008) used PCA for feature enhancement, which improved the classification accuracy of the cosine radial basis function neural network. This method showed a classification accuracy of 99.3% with normal and interictal EEGs. A variation of PCA called global modular PCA (GModPCA) was applied for epileptic seizure detection using the SVM classifier (Jaiswal & Banka, 2017). That study claimed that the time and space complexities of GModPCA were lower than those of PCA. Three dimensionality reduction techniques, namely PCA, independent component analysis (ICA), and LDA, were used to classify seizures from normal EEG (Subasi & Gursoy, 2010). In a study by Acharya, Sree, Alvin, and Suri (2012b), EEG time series were decomposed using wavelet packets, and eigenvalues were extracted from the resultant wavelet coefficients using PCA. A classification accuracy of 99% was achieved using the Gaussian mixture model classifier with 10-fold cross-validation.

Using the SFS and SBS feature selection methods with an SVM classifier, a sensitivity of 98.8% was achieved (Wang & Lyu, 2015). An attempt was made to extract the best features from a continuous wavelet transform using the SFS and SBS methods, and an overall accuracy of 99% was shown using the Bayes classifier (Yaghoobi, 2014). A simple random sampling technique was used to extract features from the time domain of EEG signals, and the SFS algorithm was applied to reduce the dimensionality of the data (Ghayab, Li, Abdulla, Diykh, & Wan, 2016). Classification accuracy, sensitivity, and specificity of 99.90%, 99.80%, and 100%, respectively, were achieved using the LS-SVM classifier. Channel selection algorithms such as PCA, SFS, SBS, and LDA for EEG signal processing were reviewed with regard to three goals: (i) reducing the computational complexity, (ii) minimizing overfitting, and (iii) reducing the setup time (Alotaiby, El-Samie, Alshebeili, & Ahmad, 2015).

A dimensionality reduction method called t-SNE was used to identify the significant features (Birjandtalab, Pouyan, & Nourani, 2016). The t-SNE technique was also applied to extract nonlinear features for motor imagery EEG (Li, Luo, & Yang, 2016) and for biometric identification using EEG with deep learning (Mao et al., 2017). Another dimensionality reduction method, called lasso or L1-norm weight selection, was used for seizure prediction from long-term multichannel EEG (Mirowski et al., 2009). Results using a convolutional neural network yielded a sensitivity of 71% with 0 false positives and false alarms. For further understanding, one can refer to other related feature dimensionality reduction techniques such as feature ensemble (Babloyantz, Destexhe, 1986, Kannathal, Choo, Acharya, 2005, Truccolo, et al., 2012, Tzallas, Tsipouras, Fotiadis, 2009) and evolutionary computational techniques (Geva, Kerem, 1998, Iasemidis, et al., 2003, Wendling, Hernandez, Bellanger, Chauvel, Bartolomei, 2005).

Goldberger et al. (2005) proposed the NCA algorithm for high dimensional data analysis in pattern recognition problems. In Weinberger and Sau (2009), NCA was used to identify the most significant Mahalanobis distance for a binary classification problem. Along with constructive neural networks, NCA was used to maximize a stochastic variant of the leave-one-out K-NN classifier performance (Qin, Song, & Huang, 2014). The dimension of the feature space was reduced from 50 to 22 using NCA (Rizwan & Anderson, 2016). The literature survey showed that most of the EEG related studies reported earlier focused on PCA, SFS, SBS, and LDA. To the best of the authors' knowledge, no attempts have been made using NCA. Therefore, in this study, the NCA algorithm was applied for the first time to EEG signals for feature selection.

Different software packages have been proposed for the EEG signal analysis (Delorme, Makeig, 2004, Oostenveld, Pascal, Maris, Schoffelen, 2011, Teixeiraa, 2011). EPILAB is one such software package for the automated prediction of epileptic seizures from multichannel EEG recordings (Teixeiraa, 2011). In the same way, EEGLAB performs pre-processing of EEG signals by implementing ICA, time/frequency evaluation, artifact rejection and information visualization (Delorme & Makeig, 2004). FieldTrip is another MATLAB toolbox for advanced and straightforward analysis of EEG and magnetoencephalography such as time/frequency analysis, non-parametric statistical testing, source reconstruction using dipoles and distributed sources (Oostenveld et al., 2011). Though these software packages were developed for EEG based studies, automated classification of focal and non-focal EEG was missing.

Selection of an appropriate segmentation length plays a vital role in the proper classification of EEG signals. Several studies (Cho, Min, Kim, Lee, 2017, Parvez, Paul, 2016, Srinivasan, Eswaran, Sriraam, 2007) have shown the effect of segmentation length on classifier performance. Different segmentation lengths such as 5s, 10s, and 15s were applied, and a segmentation length of 10s showed better results (Parvez & Paul, 2016). Srinivasan et al. (2007) used different frame sizes of 173, 256, 512, 1024, and 2048 data points to calculate the approximate entropy. Their results suggested that classification accuracy improves for frame sizes with more data points. Similarly, in Cho et al. (2017), the performance of seizure prediction and detection was evaluated using segmentation lengths of 1s, 5s, 10s, and 15s. The results suggested that varying the segmentation length of the EEG affected classification accuracy. These studies indicate that the choice of segmentation length can help to improve classification accuracy. Therefore, based on the evidence of these three studies, we varied the segmentation length among 2s, 5s, and 10s.
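
As an illustration of what varying the segmentation length means in practice, the sketch below splits one recording into non-overlapping windows of 2s, 5s, and 10s. The sampling rate of 512 Hz, the 20 s recording length, the use of non-overlapping windows, and the helper name `segment` are assumptions made for this example; they are not stated in this text.

```python
def segment(signal, fs, seconds):
    """Split a signal into non-overlapping windows of the given duration.

    Trailing samples that do not fill a whole window are dropped.
    """
    size = int(round(fs * seconds))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


fs = 512                      # assumed sampling rate in Hz
recording = [0.0] * (20 * fs)  # placeholder 20 s recording (10240 samples)

# Number of windows obtained for each segmentation length.
counts = {s: len(segment(recording, fs, s)) for s in (2, 5, 10)}
print(counts)
```

Shorter windows give more training examples and a shorter detection delay, while longer windows give each feature more samples to be estimated from, which is the trade-off the cited studies explore.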

According to the literature, the PCA algorithm relies on the scaling of variables and on orthogonal transformations of the original data (Malthouse, 1998). In contrast, the NCA algorithm is a non-parametric technique, preferred over other feature selection methods due to its supervised learning capability. On the other hand, a few studies (Das, Bhuiyan, 2016, Gupta, Priya, Yadav, Pachori, Acharya, 2017, Sharma, Dhere, Pachori, Acharya, 2017, Sharma, Pachori, Acharya, 2015a, Sharma, Pachori, Acharya, 2015b, Sharma, Pachori, Gautam, 2014) have repeatedly used DWT and EMD decomposition techniques with various entropy measures. Therefore, we aim to improve the current classification results with appropriate feature parameters, feature selection, and classifiers.

The rest of the paper is organized as follows: Section 2 describes the dataset and the methodology adopted in this study. Results are presented in Section 3 along with the CADFES tool. Comparison results are discussed in Section 4, and the work is concluded in Section 5.

Fig. 1 shows the step-by-step flow of the proposed CADFES algorithm using NCA and machine learning algorithms. Initially, a Butterworth low-pass filter followed by detrending was applied to the EEG signals. Subsequently, 28 features were extracted using segmentation lengths of 2s, 5s, and 10s. In this study, the NCA algorithm was applied for the first time to select significant features from EEG signals. Four machine learning algorithms, namely SVM, K-NN, random forest, and AdaBoost classifiers, were used to identify the best performance. A MATLAB-based CADFES tool was developed using the complete procedure to classify focal and non-focal EEG signals automatically. Finally, the results of the proposed method were compared with those of existing methods.
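
The detrending step of this pre-processing chain can be sketched as follows, assuming that detrending means removing a least-squares straight-line fit from each signal; the type of detrending, the filter order, and the cutoff frequency are not stated in this text. The function name `detrend_linear` is hypothetical, and the Butterworth filter is omitted because it would normally come from a signal-processing library.

```python
def detrend_linear(x):
    """Remove the least-squares straight-line fit from a sequence of samples."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    # Closed-form slope and intercept of the ordinary least-squares line.
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = x_mean - slope * t_mean
    return [v - (slope * t + intercept) for t, v in enumerate(x)]


# Hypothetical signal: a pure linear drift, which detrending should remove.
drift = [2.0 + 0.5 * t for t in range(10)]
residual = detrend_linear(drift)
print(max(abs(v) for v in residual))
```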

Database description

The study was conducted using intracranial EEG obtained from the publicly available Bern-Barcelona database (Andrzejak, Schindler, & Rummel, 2012). This dataset comprises EEG recordings from five pharmacoresistant temporal lobe epilepsy patients, with 3750 focal and 3750 non-focal bivariate EEG files. All five patients had an excellent surgical outcome. Three patients attained complete seizure freedom, and two patients had only auras but no other seizures following surgery.

Selection of different segmentation lengths

In seizure detection studies, the selection of an appropriate segmentation length is essential to attain better classification results. Furthermore, in a real-time scenario, detection delay and false alarms play a vital role in evaluating the performance of the algorithm. Hence, attention should be given to selecting a segmentation length that also yields good classification results. Some of the existing studies have used different segmentation lengths no longer than 10s (Iasemidis,

Discussion

The combination of multiple features showed better results compared to previous studies conducted using the Bern-Barcelona EEG database (Chen, Wan, Bao, 2015, Das, Bhuiyan, 2016, Gupta, Priya, Yadav, Pachori, Acharya, 2017, Sharma, Dhere, Pachori, Acharya, 2017, Sharma, Pachori, Acharya, 2015a, Sharma, Pachori, Acharya, 2015b). Few studies have been conducted on the classification of focal EEG signals using this database. Moreover, some of the studies have used first 50 EEG files for

Conclusion

The objective of this study is to provide an automated detection tool for bio-marking of focal and non-focal EEG signals. Such a technique helps neurologists in clinical decision making by reducing errors due to manual classification and allowing more subjects to be treated. Therefore, an application called CADFES was developed on the MATLAB platform, which can analyze 10s of EEG to classify focal and non-focal EEG signals. Neighborhood component analysis was used to assess the significant features with tuning of

Conflict of Interest

The authors do not have any conflict of interest to declare.

Acknowledgment

The authors would like to thank Dr. R.G. Andrzejak for granting permission to use the EEG database for this research work. The authors would also like to thank the anonymous reviewers for their helpful comments and suggestions, which significantly improved the quality and clarity of the manuscript.

References (74)

  • W. Li

    Outlier detection and removal improves accuracy of machine learning approach to multispectral burn diagnostic imaging

    Journal of Biomedical Optics

    (2015)
  • S.L. Moshé et al.

    The international league against epilepsy at the threshold of its second century:year 1

    Epilepsia

    (2011)
  • T.K. Padma Shri et al.

    Comparison of t-test ranking with PCA and SEPCOR feature selection for wake and stage 1 sleep pattern recognition in multichannel electroencephalograms

    Biomedical Signal Processing and Control

    (2017)
  • M.Z. Parvez et al.

    Epileptic seizure prediction by exploiting spatiotemporal relationship of EEG signals using phase correlation

    IEEE Transactions on Neural Systems and Rehabilitation Engineering

    (2016)
  • S. Raghu et al.

    Features ranking for the classification of epileptic seizure from temporal EEG

    IEEE International Conference on Circuits, Controls, Communications and Computing (i4c)

    (2017)
  • M. Sharma et al.

    An integrated index for the identification of focal electroencephalogram signals using discrete wavelet transform and entropy measures

    Entropy

    (2015)
  • C.A. Teixeiraa

    EPILAB: A software package for studies on the prediction of epileptic seizures

    Journal of Neuroscience Methods

    (2011)
  • U.R. Acharya et al.

    Use of principal component analysis for automatic classification of epileptic EEG activities in wavelet framework

    Expert Systems with Applications

    (2012)
  • D.W. Aha et al.

    Instance-based learning algorithms

    Machine Learning

    (1991)
  • T. Alotaiby et al.

    A review of channel selection algorithms for EEG signal processing

    EURASIP Journal on Advances in Signal Processing

    (2015)
  • R.G. Andrzejak et al.

    Nonrandomness, nonlinear dependence, and nonstationarity of electroencephalographic recordings from epilepsy patients

    Physical Review E

    (2012)
  • A. Babloyantz et al.

    Low-dimensional chaos in an instance of epilepsy

    PNAS, Proceedings of the National Academy of Sciences

    (1986)
  • M. Bedeeuzzaman et al.

    Automatic seizures detection using higher order moments

    In Proceedings of the International Conference on Recent Trends in Information, Telecommunication and Computing

    (2010)
  • J. Birjandtalab et al.

    Nonlinear dimension reduction for eeg-based epileptic seizure detection

    Biomedical and Health Informatics (BHI), 2016 IEEE-EMBS International Conference

    (2016)
  • J.G. Bogaarts

    Optimal training dataset composition for SVM based age independent, automated epileptic seizures detection

    Medical & Biological Engineering & Computing

    (2016)
  • Quick Introduction to Algorithms in Machine Learning....
  • E.B. Bromfield et al.

    Basic mechanisms underlying seizures and epilepsy

    An Introduction to Epilepsy, American Epilepsy Society West Hartford, USA

    (2006)
  • D. Chen et al.

    Epileptic focus localization using EEG based on discrete wavelet transform through full-level decomposition

    IEEE International Workshop on Machine Learning for Signal Processing

    (2015)
  • D. Cho et al.

    IEEE Transactions on Neural Systems and Rehabilitation Engineering

    (2017)
  • C.K. Chua et al.

    Automatic identification of epilepsy by HOS and power spectrum parameters using EEG signals: A comparative study

    Proc IEEE Engineering in Medicine and Biology Society Conference

    (2008)
  • A.B. Das et al.

    Discrimination and classification of focal and non-focal EEG signals using entropy-based features in the EMD-DWT domain

    Biomedical Signal Processing and Control

    (2016)
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Annals of Eugenics

    (1936)
  • Y. Freund et al.

    A decision-theoretic generalization of on-line learning and an application to boosting

    Journal of Computer and System Sciences

    (1997)
  • H.R.A.R. Ghayab et al.

    Classification of epileptic EEG signals based on simple random sampling and sequential feature selection

    Brain Informatics

    (2016)
  • H. Ghosh-Dastidar et al.

    Principal component analysis-enhanced cosine radial basis function neural network for robust epilepsy and seizure detection

    IEEE Transactions on Biomedical Engineering

    (2008)
  • J. Goldberger et al.

    Neighbourhood components analysis

    Advances in Neural Information Processing Systems

    (2005)
  • B.R. Greene

    A comparison of quantitative EEG features for neonatal seizures detection

    Clinical Neurophysiology

    (2008)