Abstract

The electroencephalogram (EEG) examination provides information on the brain’s electrical activity and is especially important in cases of epilepsy. Because EEG signals are nonlinear and nonstationary, visual inspection is very difficult. Digital EEG signal processing was developed to overcome this problem, and automatic epileptic EEG recognition has become an active area of research. Complexity-based analysis is attractive for feature extraction because it reflects the nonlinear characteristics of the signal. This study proposes an automatic epileptic EEG classification method based on the multiscale Hjorth descriptor. EEG signals from normal, interictal, and seizure (ictal) conditions were used in the simulation. Each signal is scaled into new signals using the coarse-grained procedure on scales 1–20. Then, the Hjorth parameters, namely activity, mobility, and complexity, are calculated on each new signal. This process produces a feature vector that is used in the classification stage. A support vector machine (SVM) is used to evaluate the proposed feature extraction method. Simulation results showed that the Hjorth parameters on scales 1–15 yield 99.5% accuracy. The proposed method is expected to be applicable to digital EEG for seizure detection and prediction.

1. Introduction

Bioelectrical signals carry information about a person’s health status [1]. They are an accumulation of biological processes at the cellular, tissue, organ, and organ system levels. Some, including the electroencephalogram (EEG), have very complex characteristics. EEG is a signal generated by the activity of brain cells coordinating body functions [2]. It is essential for analyzing abnormalities of brain function such as epilepsy. Epilepsy is one of the most common neurological disorders and is caused by the excessive activity of a group of neurons in the brain [3, 4]. Epilepsy can trigger unprovoked seizures [5] and requires treatment so that the worst conditions do not occur. Treatment can be determined by analyzing the pattern of the patient’s EEG signals. However, visual inspection requires excellent skill and precision and tends to be time-consuming [6, 7], which makes analyzing signals from a large patient population costly. With the development of computer-based signal processing methods, many researchers have developed automatic recognition methods.

Characterizing and detecting epileptic EEG is an essential step in studying epilepsy [8]. This will determine the proper treatment for seizure prevention [4, 9]. Detections include normal, preictal, interictal, and ictal stages. Researchers have proposed automatic detection algorithms with various mathematical approaches. This approach includes analysis of the time, frequency, and time-frequency domain. Recently, complexity analysis on EEG signals has been simulated [10]. Other methods such as deep learning have been proposed in the detection of epileptic EEG as reported in [11]. These studies aim to find the best performance with the highest accuracy with low-cost computing. To simulate seizure detection, the University of Bonn shared an epileptic EEG dataset, with many studies proposing various methods for detecting ictal EEGs on that dataset.

A study by Martis et al. proposed a method for feature extraction of epileptic EEG using empirical mode decomposition (EMD) and entropy, and it produced the highest accuracy of 95.3% [12]. Samiee et al. proposed the rational discrete short time Fourier transform (DSTFT) for feature extraction [13]. These features were classified using a multilayer perceptron (MLP) and produced the best accuracy of 99.8%. However, simulations are only carried out on two classes of data. Another study by Mohammadpoory et al. proposed the weighted visibility graph entropy (WVGE) to identify seizures from EEG signals. Experimental results show that the proposed method produces the highest accuracy of 97% for classifying normal, ictal, and interictal [14]. Bhattacharyya et al. presented the tunable Q-factor wavelet transform and KNN entropy for seizure detection [15]. This study provides 100% accuracy but only in two classes of normal and ictal data. However, feature extraction methods using a time-frequency domain approach tend to require high computational costs. A study by Tsipouras used the frequency approach for extracting EEG spectral information in epileptic EEG classification. This study obtained 98.8% accuracy for normal, interictal, and ictal EEG classification [16]. Previous studies related to epileptic EEG have shown high performance in classification or detection. However, there are still opportunities in developing methods to improve accuracy and solve classification problems with more than two classes. Another issue is developing a feature extraction method with low computational costs using a time domain analysis approach.

In this paper, we developed a new protocol for epileptic seizure detection. This topic is one of our research concerns; in our previous studies, wavelet- and entropy-based approaches were simulated. Here, we propose a multiscale Hjorth descriptor as a feature extraction method for epileptic seizure detection. The multiscale method was adopted from the previous study by Rizal et al. [17], in which new signals are generated using the coarse-grained procedure. Each new signal resembles the original signal resampled at one of N scales. The Hjorth parameters, namely, activity, mobility, and complexity, are calculated at each scale and form a feature vector. A support vector machine with 10-fold cross-validation was employed to evaluate the proposed method, with accuracy as the measured performance parameter. To test the robustness of the method in epileptic seizure detection, we simulated it in six scenarios, which are explained in Section 2. The protocol proposed in this paper can be recommended as a powerful method for detecting epileptic EEG.

2. Materials and Methods

Figure 1 displays the procedure of the proposed method. The multiscale process was the coarse-grained procedure often used to analyze ECG signals [18], lung sounds [19], and others. The Hjorth descriptor was measured on the signal from this process to get the signal characteristics [17]. Then, we used SVM with 10-fold cross-validation to classify the EEG signals. There was also a feature reduction in this classification process to see the effect of the number of scales on classification accuracy. Details of each method are explained in the next sections.

2.1. Epileptic EEG Database

In this study, an open-access EEG dataset was employed to evaluate the performance of the proposed method. The EEG data were recorded from epilepsy patients and healthy subjects at Bonn University. All EEG recordings were made using a 128-channel amplifier system with a sampling frequency of 173.61 Hz and 12-bit resolution. The multichannel recordings were cut into 23.6-second segments after artifacts from muscle and eye movements were removed. The dataset consists of five groups (F, N, O, Z, and S), each containing 100 EEG segments. The F subset contains recordings from seizure-free (interictal) intervals at the epileptogenic zone, while the N subset contains recordings from seizure-free (interictal) intervals at the hippocampal formation of the opposite hemisphere. The O and Z subsets are recordings of healthy volunteers with eyes closed and open, respectively. The S subset contains recordings made while seizures occurred (ictal activity) [20]. In general, the proposed method was evaluated for ictal vs. nonictal classification. The test scenarios are OZ-NF-S, OZNF-S, Z-N-S, Z-F-S, O-N-S, and O-F-S, where hyphens separate the classes and each letter denotes a subset. Figure 2 shows examples of the EEG signals analyzed in this study.
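The scenario names can be read mechanically. The following Python sketch assumes that hyphens separate classes and that each letter names one subset of 100 segments; the function names `parse_scenario` and `class_sizes` are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch: decode scenario names such as "OZ-NF-S".
# Assumption: "-" separates classes and every letter is one Bonn subset.
SCENARIOS = ["OZ-NF-S", "OZNF-S", "Z-N-S", "Z-F-S", "O-N-S", "O-F-S"]
SEGMENTS_PER_SUBSET = 100  # each subset contains 100 EEG segments


def parse_scenario(name):
    """Map every subset letter in a scenario to an integer class label."""
    labels = {}
    for class_index, group in enumerate(name.split("-")):
        for subset in group:
            labels[subset] = class_index
    return labels


def class_sizes(name):
    """Number of EEG segments available per class in a scenario."""
    return [len(group) * SEGMENTS_PER_SUBSET for group in name.split("-")]


for scenario in SCENARIOS:
    print(scenario, parse_scenario(scenario), class_sizes(scenario))
```

Read this way, OZNF-S is a two-class problem with unbalanced classes (400 nonictal segments against 100 ictal segments), whereas the single-letter scenarios are balanced three-class problems.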

2.2. Feature Extraction with Multiscale Hjorth Descriptor

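The Hjorth descriptor was originally introduced to analyze EEG signals in the time domain. It has since been used to analyze the electromyogram (EMG) [21], ventricular repolarization in the ECG signal [22], and lung sounds [17]. The method comprises three parameters, i.e., activity, mobility, and complexity [23], which are given by the following equations:

$$\text{Activity} = \sigma_x^2 = \frac{1}{N}\sum_{n=1}^{N}\left(x(n) - \bar{x}\right)^2,$$

where $x(n)$ is the signal of $N$ samples, $\bar{x}$ is its mean, and $\sigma_x$ is the standard deviation of $x(n)$, and

$$\text{Mobility} = \frac{\sigma_{x'}}{\sigma_x},$$

$$\text{Complexity} = \frac{\sigma_{x''}/\sigma_{x'}}{\sigma_{x'}/\sigma_x},$$

where $x'$ and $x''$ are the first-order and second-order differences of $x$, respectively.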
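As an illustration of the three parameters described above, here is a minimal Python sketch. It assumes the population variance (division by the number of samples) and simple first differences as the discrete derivative; neither choice is stated in the text, and the helper names are hypothetical.

```python
import math


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def first_difference(values):
    """Discrete derivative: differences between consecutive samples."""
    return [b - a for a, b in zip(values, values[1:])]


def hjorth(signal):
    """Return (activity, mobility, complexity) of a 1-D signal.

    Assumes the signal and its first difference are not constant;
    otherwise the ratios below divide by zero.
    """
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    var0, var1, var2 = variance(signal), variance(d1), variance(d2)
    activity = var0                                  # signal power
    mobility = math.sqrt(var1 / var0)                # sigma_x' / sigma_x
    complexity = math.sqrt(var2 / var1) / mobility   # mobility(x') / mobility(x)
    return activity, mobility, complexity


# Example input: a pure sinusoid with a period of 50 samples.
sine = [math.sin(2 * math.pi * n / 50) for n in range(1000)]
activity, mobility, complexity = hjorth(sine)
```

For a pure sinusoid, the complexity is close to 1, which is why lower complexity values are usually read as a more regular, sine-like waveform.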

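We modified the multiscale entropy used in previous research [17] by replacing the entropy with the Hjorth descriptor, which yields a multiscale Hjorth descriptor. The multiscale process is often referred to as a coarse-grained procedure, as shown in the following equation:

$$y_j^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x(i), \qquad 1 \le j \le \frac{N}{\tau},$$

where $\tau$ is the scale factor, $N$ is the number of samples in $x$, and $y^{(\tau)}$ is the coarse-grained signal at scale $\tau$.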

Following the previous simulation, up to 60 features were used, i.e., the three Hjorth parameters computed on each of scales 1–20. The coarse-grained procedure is illustrated in Figure 3.
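The coarse-grained procedure and the resulting feature vector described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it assumes non-overlapping windows, drops trailing samples that do not fill a window, uses the population variance, and orders the features scale by scale. The segment length of 4097 samples is derived from 23.6 s × 173.61 Hz, and the test signal is synthetic.

```python
import math


def coarse_grain(signal, scale):
    """Average consecutive, non-overlapping windows of `scale` samples."""
    n_windows = len(signal) // scale  # incomplete trailing window is dropped
    return [
        sum(signal[j * scale:(j + 1) * scale]) / scale
        for j in range(n_windows)
    ]


def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def _diff(values):
    return [b - a for a, b in zip(values, values[1:])]


def hjorth(signal):
    """Return (activity, mobility, complexity) of a 1-D signal."""
    d1 = _diff(signal)
    d2 = _diff(d1)
    v0, v1, v2 = _variance(signal), _variance(d1), _variance(d2)
    mobility = math.sqrt(v1 / v0)
    return v0, mobility, math.sqrt(v2 / v1) / mobility


def multiscale_hjorth(signal, max_scale=20):
    """Concatenate the Hjorth parameters of scales 1..max_scale."""
    features = []
    for scale in range(1, max_scale + 1):
        features.extend(hjorth(coarse_grain(signal, scale)))
    return features


# Synthetic stand-in for one EEG segment (not real data).
segment = [math.sin(0.05 * n) + 0.3 * math.sin(1.3 * n) for n in range(4097)]
features = multiscale_hjorth(segment, max_scale=20)
print(len(features))  # 3 parameters x 20 scales
```

At scale 1 the coarse-grained signal equals the original signal, so the first three features are the ordinary single-scale Hjorth parameters.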

2.3. Evaluation with Support Vector Machine (SVM) and N-Fold Cross-Validation

In this study, a support vector machine (SVM) is proposed to predict the qualitative characteristics of EEG signals (seizure, interictal, and normal). In this case, high accuracy is the main goal in research that relies heavily on the quality of features as SVM input. Therefore, feature selection scenarios are critical because they affect accuracy. SVM was chosen because it has an excellent ability in medical applications, one of which is to classify EEG signals, as reported in research [1, 24, 25].

SVM is a linear classifier that uses a hyperplane as the separator between classes. However, many problems are not linearly separable, so the SVM concept has been extended to nonlinear problems using the kernel trick, which maps the feature set into a new space in which the classes are easier to separate. In that space, SVM finds the best hyperplane to separate two data classes [26]. SVM was originally designed for binary classification problems. For multiclass applications, it is slightly modified by comparing each class against the others, which results in n classifiers with n hyperplanes, where n is the number of classes [27].

The best hyperplane is obtained by maximizing the margin between the sets of features from different classes. The margin is the distance between the hyperplane and the closest pattern in each data class (see Figure 4). The patterns of each class that lie closest to the hyperplane are called support vectors. In this research, quadratic and cubic SVMs are compared with the linear SVM.
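The three SVM variants differ only in the kernel function used to compare two feature vectors. The Python sketch below assumes that "quadratic" and "cubic" refer to polynomial kernels of degree 2 and 3; the offset of 1 and the absence of any scaling factor are assumptions, since the text does not give the kernel parameters.

```python
def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree, offset=1.0):
    """Polynomial kernel (x . y + offset) ** degree; offset is assumed."""
    return (linear_kernel(x, y) + offset) ** degree


# Assumed mapping from the kernel names in the text to kernel functions.
KERNELS = {
    "linear": linear_kernel,
    "quadratic": lambda x, y: polynomial_kernel(x, y, degree=2),
    "cubic": lambda x, y: polynomial_kernel(x, y, degree=3),
}


def gram_matrix(samples, kernel):
    """Pairwise kernel values, the only data an SVM solver needs."""
    return [[kernel(a, b) for b in samples] for a in samples]


samples = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
for name, kernel in KERNELS.items():
    print(name, gram_matrix(samples, kernel))
```

Because the solver sees the data only through these pairwise values, switching from a linear to a quadratic or cubic decision boundary does not require computing the higher-dimensional feature space explicitly.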

Since SVM is a method that requires training (supervised learning), this study uses N-fold cross-validation (NFCV) to split the data into training data and test data. In NFCV, each data class is divided into N data sets. N − 1 data sets are used as training data, and one data set is used as test data. The process is repeated N times so that each data set has served as both test data and training data [28]. The accuracy is the average over all trials [17]. An advantage of this method is that it helps to avoid overfitting. The performance parameter of the proposed method is accuracy, i.e., the proportion of samples correctly classified by the system.
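The N-fold procedure described above can be sketched in plain Python. The sketch assumes a deterministic, unshuffled split in which every class is distributed evenly over the folds. The classifier is passed in as a function, and `nearest_mean` is a hypothetical stand-in used only to make the example runnable; it is not an SVM.

```python
def stratified_folds(labels, n_folds=10):
    """Distribute the sample indices of every class evenly over the folds."""
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def cross_validate(features, labels, train_and_predict, n_folds=10):
    """Average accuracy over n_folds train/test rounds."""
    accuracies = []
    for test_idx in stratified_folds(labels, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predicted = train_and_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def nearest_mean(train_x, train_y, test_x):
    """Stand-in classifier (not an SVM): assign the nearest class centroid."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda y: distance(x, centroids[y]))
            for x in test_x]


# Toy data: three well-separated classes of 100 one-feature samples each.
labels = [0] * 100 + [1] * 100 + [2] * 100
features = [[10.0 * label + 0.1 * (i % 7)] for i, label in enumerate(labels)]
accuracy = cross_validate(features, labels, nearest_mean, n_folds=10)
```

With 100 segments per class and N = 10, every fold holds 10 test segments of each class, and every segment is used for testing exactly once.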

3. Results and Discussion

Figure 5 shows an example of a signal scaled using a coarse-grained procedure on ictal EEG. The sample size for each level becomes N/τ (level). The new signal is the average of the closest sequential samples; therefore, there is no significant change in signal shape visually. Then, activity, mobility, and complexity are calculated at each level as feature vectors. The average activity, mobility, and complexity values of normal, interictal, and ictal EEG are presented in Figure 6. The ictal state has the highest activity value compared with others, representing the excessive activity of neuronal cells. The activity value represents the power of the signal [23, 29, 30]. Figure 6 shows the activity values on all EEG stages tend to decrease because the signal variation is reduced on a larger scale. Hjorth mobility in the ictal state also tends to be higher than normal and interictal on a scale of more than 7.

In contrast, the Hjorth complexity of ictal EEG is the lowest of the three. This indicates that the complexity of EEG signals decreases during seizures, or, put differently, that EEG waves tend to be more stationary or more regular. According to studies [31, 32], the ictal state is less complex than the other states. Finally, the performance of the proposed method was evaluated with SVM and 10-fold cross-validation. We also divided the features into four scenarios based on scales 1–5, 1–10, 1–15, and 1–20 to evaluate the effect of scale on accuracy. Since the Hjorth descriptor produces three features at each scale, these scenarios contain 15, 30, 45, and 60 features, respectively. Figures 7–12 show the accuracy results for all scenarios.
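The feature counts for the scale ranges follow directly from three parameters per scale. The short Python sketch below assumes the feature vector is ordered scale by scale (activity, mobility, and complexity of scale 1, then scale 2, and so on); this layout and the helper name `select_scales` are assumptions made for illustration.

```python
FEATURES_PER_SCALE = 3  # activity, mobility, complexity


def select_scales(feature_vector, max_scale):
    """Keep the features of scales 1..max_scale from a scale-ordered vector."""
    return feature_vector[:FEATURES_PER_SCALE * max_scale]


full_vector = list(range(60))  # placeholder for the 60 features of scales 1-20
for max_scale in (5, 10, 15, 20):
    print(max_scale, len(select_scales(full_vector, max_scale)))
```

Under this layout, each smaller feature set is a prefix of the larger ones, so the 60-feature vector only needs to be computed once per segment.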

Figure 6 shows the average activity, mobility, and complexity values for each class on scales 1 to 20. The activity of ictal EEG decreases as the scale rises. The mobility increases and begins to flatten at a scale of about 10. The complexity behaves like the activity: it decreases at higher scales. The activity values for normal and interictal conditions, shown in Figure 6(a), tend to be the same on all scales. This suggests that, in interictal conditions, the EEG of people with epilepsy resembles that of normal conditions, where no seizure symptoms occur. These characteristics indicate that activity is good enough to distinguish between seizures and nonseizures (normal and interictal). To distinguish the three conditions (normal, interictal, and seizure), however, one Hjorth parameter is not enough; all three must be combined at particular scales. Roughly speaking, the three classes are separated in mobility on scales 9–11 and in complexity on scales 7–9. The 1–20 scale range is adopted from the study by Costa et al. [18]; by scale 20, the variance of the signal tends to decrease and level off. Performance was then measured using the same scales as in previous studies [17].

Figures 7–12 show the accuracy of the EEG classification using three SVM kernels: linear, quadratic, and cubic. The features are selected based on the scales of the multiscale process. The cubic kernel produces higher accuracy than the other two kernels. This study achieves higher accuracy than Dhar et al. [33], Wijayanto et al. [1], and Orhan et al. [34]. On the other hand, it has lower performance than the studies by Li et al. [6] and Bhattacharyya et al. [15]. However, this study remains competitive in terms of the number of features used.

Table 1 compares the proposed method with other studies that used multiscale analysis and the same dataset. Orhan et al. [34] used the discrete wavelet transform (DWT), K-means clustering, and a multilayer perceptron and obtained their highest accuracy using 56 and 18 features. Dhar et al. [33] used 42 features from the cross wavelet transform and achieved their highest accuracy for ZO-NF-S. Bhattacharyya et al. [15] obtained the highest accuracies of 99% and 98.6% for ZONF-S and ZO-NF-S, respectively, using 16 features. Li et al. used wavelet-based nonlinear analysis and SVM and obtained their highest accuracy using 24 features [37]. The two studies mentioned above report higher accuracy than this study. However, this study achieved a competitive result by using fewer features. In addition, the feature extraction method proposed in this study is computationally simpler. Nicolaou and Georgiou [38] and Wijayanto et al. [1] used fewer features than this study. Even so, our study achieves higher accuracy in the ZONF-S case than both of these studies. Furthermore, this study has competitive results in the other cases and performs better than Wijayanto et al. [1] in the O-F-S case. The confusion matrix of the result is shown in Figure 13.

We extended the evaluation with the CHB-MIT dataset. In this supplementary analysis, normal and ictal conditions were extracted from the dataset, which includes recordings from 139 patients with a total of 196 seizure conditions. The ictal conditions formed the seizure class, while the normal conditions were taken from the time immediately preceding each ictal event, with the same duration as the seizure. Applying the proposed method to this dataset yielded an accuracy of 82.7%.

This research focuses on the development of a feature extraction method, namely, the multiscale Hjorth descriptor. For this reason, the classifier used is a classifier commonly used today (SVM) so that it is easy to compare with other feature extraction methods. This is shown in Table 1, where many researchers use SVM as a classifier. The use of more advanced machine learning methods will be carried out in the next research.

4. Conclusions

This paper proposes a method for classifying EEG signals in epilepsy based on the Hjorth descriptor. The Hjorth descriptor is calculated on the EEG signal after it has passed through the coarse-grained procedure, which produces EEG signals at various scales. This process makes the signal dynamics more visible and therefore easier to distinguish using the Hjorth descriptor. The highest accuracy, 99.5% for three data classes, was obtained using 15 scales. These results indicate that the proposed method is competitive with previous studies and more accurate than several of them. Several directions remain for further research. In addition to exploring other machine learning methods, the investigation of other multiscale methods is an interesting topic for future research.

Data Availability

The data used to support the findings of this study are available from the open database at https://repositori.upf.edu/handle/10230/42894.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

AR conceptualized, supervised, and validated the study, was responsible for formal analysis, and reviewed the original draft. IW proposed the methodology, provided the software, validated and visualized the study, and edited the original draft. SH provided the software, validated and visualized the study, and wrote the original draft. SA proposed the methodology, validated the study, was responsible for formal analysis, and reviewed and edited the article. TT was responsible for formal analysis and reviewed and edited the article. ZS visualized the study, provided the software, and edited the article.

Acknowledgments

The authors would like to thank the BioSPIN Research Group, Telkom University, and the Biomedical Engineering and Pharmaceutical Sciences Research Group, Mohammed V University, for supporting this research. This research was funded by the Directorate of Research and Community Services of Telkom University.