Two-Stage Feature Selection Algorithm Based on Supervised Classification Approach for Automated Epilepsy Diagnosis

Abstract--Epilepsy diagnosis is generally achieved by visual scanning of Interictal Epileptiform Discharges (IEDs) in EEG recordings. The main objective of this research is to select the smallest relevant feature subset from the original dataset in order to reduce the diagnosis time and increase classification accuracy by removing irrelevant and redundant features. For this purpose, we suggest a two-stage feature selection algorithm based on a supervised classification approach, successively adopting a wrapper feature selection method and a wrapper feature subset selection method. Matlab simulation results show that, when the two classifiers are compared, the high dimensionality is reduced to a single relevant feature that achieves classification metrics of 100%. The epilepsy diagnosis is successfully tested in the discriminant Fisher space with this single best relevant feature.
Keywords--Cross-validation, Classification metrics, EEG, Feature selection, IEDs, LDA, Mahalanobis Distance Classifier, QDA, Supervised classification.


I. INTRODUCTION
Epilepsy is a neurological disorder marked by sudden recurrent episodes of sensory disturbance, loss of consciousness, and convulsions, associated with abnormal electrical activity in the brain. The existence of an epileptic disease is confirmed by visual detection of isolated Interictal Epileptiform Discharges (IEDs) (spikes or spike-wave complexes) in EEG (electroencephalogram) signal recordings from certain brain areas. For example, the absence type of epilepsy is confirmed by the presence of rhythmic spike-waves at 3 Hz [1], [2]. This visual technique is inaccurate, tedious, and too time-consuming.
The aim of our research is to establish an automated diagnosis of epilepsy employing a supervised classification approach. Fig. 1 shows the different sections of the article:

Fig. 1 Block diagram of the automatic diagnosis process
To create a training set, we need to build a knowledge database composed of normal EEG samples and epileptic EEG samples. Feature extraction is an essential preprocessing step in pattern recognition and machine learning problems. To build the training set, the signal pattern may be described in three analysis fields: time, frequency, and time-frequency. The unknown EEG signal is then classified into the "Normal" or "Epileptic" class [4]. For an optimal visualization of both classes, the samples are projected into the linear Fisher space [18], [19] using Fisher linear Discriminant Analysis (FDA), which seeks the directions that are most efficient for discrimination.
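The Fisher projection mentioned above can be illustrated for the two-class case. The following Python sketch is not the Matlab implementation used in this work; it assumes the classical two-class Fisher direction, proportional to the inverse within-class scatter applied to the difference of class means. The function name `fisher_direction` and the synthetic two-feature data are hypothetical and serve only as an illustration.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher direction, proportional to pinv(Sw) @ (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # pinv tolerates a singular Sw (few samples, many features).
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic stand-in data (NOT real EEG features): 10 samples per class.
rng = np.random.default_rng(1)
X_normal = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(10, 2))
X_epileptic = rng.normal(loc=[4.0, 1.0], scale=1.0, size=(10, 2))

w = fisher_direction(X_normal, X_epileptic)
# One-dimensional coordinates of every sample along the Fisher direction.
z_normal, z_epileptic = X_normal @ w, X_epileptic @ w
```

Plotting `z_normal` and `z_epileptic` on a single axis gives the kind of one-dimensional view in which the two classes can be inspected visually.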

A. Knowledge Database
The selected population is composed of 20 labeled single-EEG signals (obtained from the Neurology department of the University Hospital of Sousse, Tunisia), sampled at a frequency F = 200 Hz, segmented into 1-second epochs, and filtered to remove artifacts. The signals are divided into two groups: 10 normal signals in the first group and 10 epileptic signals in the second group. These signals are modeled by a set of features to form the training set used in the feature selection process.
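The segmentation step can be sketched as follows in Python. The sampling frequency (200 Hz) and the epoch length (1 second) come from the text; the use of non-overlapping epochs, the function name `segment_epochs`, and the synthetic recording are assumptions made for illustration only.

```python
import numpy as np

FS = 200           # sampling frequency in Hz (from the text)
EPOCH_SECONDS = 1  # epoch length in seconds (from the text)

def segment_epochs(signal, fs=FS, epoch_seconds=EPOCH_SECONDS):
    """Split a 1-D signal into non-overlapping epochs, dropping an incomplete tail."""
    signal = np.asarray(signal, dtype=float)
    samples_per_epoch = int(fs * epoch_seconds)
    n_epochs = signal.size // samples_per_epoch
    return signal[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)

# Synthetic stand-in for a recording: 3.5 s of noise (NOT real EEG data).
rng = np.random.default_rng(0)
recording = rng.standard_normal(int(3.5 * FS))
epochs = segment_epochs(recording)
print(epochs.shape)  # (3, 200)
```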

B. Feature Extraction
In the feature extraction process, we adopt a statistical analysis approach applied to each single-EEG signal.
The feature vector is composed of 48 features extracted from the time, frequency, and time-frequency fields (Table 1).
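As an illustration of what statistical features of an epoch may look like, the sketch below computes a handful of time-field and frequency-field descriptors in Python. The specific features chosen here (mean, variance, skewness, kurtosis, dominant frequency) are assumptions for illustration; they are not claimed to be among the 48 features of Table 1, and the function name `example_features` is hypothetical.

```python
import numpy as np

def example_features(epoch, fs=200):
    """Return a few illustrative statistical features of one epoch.

    Assumes a non-constant epoch (a zero standard deviation would divide by zero).
    """
    x = np.asarray(epoch, dtype=float)
    mean = x.mean()
    std = x.std()
    centered = x - mean
    # Time field: standardized third and fourth moments.
    skewness = np.mean(centered**3) / std**3
    kurtosis = np.mean(centered**4) / std**4
    # Frequency field: power spectrum of the mean-removed epoch.
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    dominant_freq = freqs[np.argmax(power)]
    return np.array([mean, std**2, skewness, kurtosis, dominant_freq])

# Synthetic 1-second epoch: a 3 Hz sine sampled at 200 Hz (NOT real EEG data).
t = np.arange(200) / 200.0
features = example_features(np.sin(2 * np.pi * 3 * t))
```

Stacking one such vector per signal, row by row, yields a feature matrix with one row per signal and one column per feature.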

C. Training Dataset
The training dataset is represented as an (n x d) data pattern, where n is the number of signals (here 20), d is the number of features (here 48), and x denotes the general term of the training dataset. The signals are manually labeled and ordered into two groups, normal and epileptic, by an expert neurologist.
The "Normal" group and the "Epileptic" group are each defined by their own dataset.

2) Sequential Backward Selection stage
In the second algorithm stage, to reduce the dimensionality of

E. Mahalanobis distance classifier (MDC)
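The Mahalanobis Distance Classifier computes the distance $d(x_{unk}, m_k)$ between the unknown EEG feature vector and each of the two classes, "Normal" and "Epileptic", as follows:

$$d(x_{unk}, m_k) = \sqrt{(x_{unk} - m_k)^{\top}\, T^{-1}\, (x_{unk} - m_k)}$$

where $x_{unk}$ is the unknown feature vector, $m_k$ is the mean of the $k$-th class, and $T$ is the covariance matrix of the learning dataset $X_{TR}$.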
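A minimal Python sketch of such a classifier is given below. It assumes the standard square-root form of the Mahalanobis distance (the squared form leads to the same decision), takes T as the covariance of the whole learning set as stated above, and assigns the unknown vector to the class with the nearest mean. The function name `mahalanobis_classify`, the integer labels (0 for "Normal", 1 for "Epileptic"), and the toy data are hypothetical.

```python
import numpy as np

def mahalanobis_classify(x_unk, X_tr, y_tr):
    """Assign x_unk to the class whose mean is nearest in Mahalanobis distance."""
    X_tr = np.asarray(X_tr, dtype=float)
    y_tr = np.asarray(y_tr)
    # T: covariance of the whole learning set (rows are samples).
    T = np.atleast_2d(np.cov(X_tr, rowvar=False))
    T_inv = np.linalg.pinv(T)  # pinv tolerates a singular covariance
    distances = {}
    for k in np.unique(y_tr):
        diff = x_unk - X_tr[y_tr == k].mean(axis=0)
        distances[int(k)] = float(np.sqrt(diff @ T_inv @ diff))
    label = min(distances, key=distances.get)
    return label, distances

# Toy learning set (NOT real EEG features): 0 = "Normal", 1 = "Epileptic".
X_tr = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                 [4, 4], [5, 4], [4, 5], [5, 5]], dtype=float)
y_tr = np.array([0, 0, 0, 0, 1, 1, 1, 1])

label, distances = mahalanobis_classify(np.array([4.0, 5.0]), X_tr, y_tr)
```

Because T is estimated from both classes together, the spread between the two class means contributes to the covariance, so distances along the direction joining the means are scaled down.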

A. First-stage experimental results
In the first stage, a 5-fold cross-validation procedure is applied with the QDC classifier in order to estimate the metrics of each feature (Fig. 4), and the algorithm selects only the features having the highest metrics (Table 3). The diagnostic result was successfully tested (Fig. 6).
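The per-feature evaluation described above can be sketched in Python as follows. This is not the Matlab code used for the reported results; it assumes accuracy as the metric, stratified folds, and a univariate quadratic discriminant (one Gaussian per class with its own mean and variance). The function names and the synthetic 20 x 3 dataset are hypothetical.

```python
import numpy as np

def qda_1d_predict(x_train, y_train, x_test):
    """Univariate QDA: one Gaussian per class with its own mean and variance."""
    classes = np.unique(y_train)
    scores = []
    for k in classes:
        xk = x_train[y_train == k]
        mean = xk.mean()
        var = xk.var(ddof=1) + 1e-12  # small floor avoids division by zero
        prior = xk.size / x_train.size
        # Log of prior times Gaussian density, dropping the constant term.
        scores.append(np.log(prior) - 0.5 * np.log(var)
                      - 0.5 * (x_test - mean) ** 2 / var)
    return classes[np.argmax(np.vstack(scores), axis=0)]

def cv_accuracy_per_feature(X, y, n_folds=5, seed=0):
    """Mean k-fold accuracy of the 1-D QDA for every column of X."""
    rng = np.random.default_rng(seed)
    # Stratified folds: shuffle each class, then deal its indices round-robin.
    folds = np.empty(y.size, dtype=int)
    for k in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == k))
        folds[idx] = np.arange(idx.size) % n_folds
    acc = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for f in range(n_folds):
            test = folds == f
            pred = qda_1d_predict(X[~test, j], y[~test], X[test, j])
            acc[j] += np.mean(pred == y[test]) / n_folds
    return acc

# Synthetic data (NOT real EEG features): 10 samples per class, 3 features.
rng = np.random.default_rng(3)
y = np.array([0] * 10 + [1] * 10)
X = rng.standard_normal((20, 3))  # columns 1 and 2 stay pure noise
X[:, 0] = np.concatenate([np.linspace(-1, 1, 10), np.linspace(9, 11, 10)])

acc = cv_accuracy_per_feature(X, y)
best_feature = int(np.argmax(acc))  # index of the top-scoring feature
```

With 10 signals per class and 5 folds, each test fold holds 2 signals of each class, and the remaining 16 signals are used to fit the classifier.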