Heartbeat type classification with optimized vectors.

In this study, a method based on feature vector optimization is proposed for classifying heartbeat types. Electrocardiogram (ECG) signals of five different heartbeat types were used for this aim. First, the wavelet transform (WT) was applied to these ECG signals to generate the full feature vectors. These feature vectors were then optimized with particle swarm optimization (PSO), genetic search, best first, greedy stepwise and multi-objective evolutionary algorithms. The optimized feature vectors were subsequently applied to the classifier inputs for performance evaluation. A comprehensive assessment is presented for determining the optimized feature vectors for ECG signals, and the best-performing classifier for these optimized feature vectors is identified.


Introduction
Electrocardiogram (ECG) signals are critically important for identifying abnormal heart conditions. Computer-aided analysis of these data is particularly important for the development of smart medical platforms. For this purpose, heartbeats must be automatically detected and recognized from ECG signals. Detecting heartbeats in ECG signals is not easy because of the various types of noise present in the signals [1]. Different techniques have been comprehensively analyzed for beat detection, such as digital filters [2,3], the wavelet transform (WT) [4][5][6][7], neural networks (NN) [8,9], hidden Markov models [10] and particle swarm optimization (PSO) [11]. Detected heartbeat signals are segmented for use in classification systems. ECG signal classification has two main stages: feature extraction and classification. In the feature extraction stage, the distinguishing features of the ECG signals are revealed, and these features are used to generate a feature vector for each signal. Feature extraction is required to remove unnecessary, noisy or corrupted inputs, which improves the accuracy of the classifiers used in the subsequent classification stage. In the classification stage, a suitable classifier is trained on the obtained features.
WT methods such as the multiresolution and discrete types [12][13][14][15][16] are frequently used for feature extraction. Other studies have employed higher-order statistics [17] and mathematical morphology [18]. In the classification stage, different classifiers can be applied, such as extreme learning machines [19] and support vector machines [20][21][22]. NN classifiers have also been widely used in classification studies [23][24][25]. In this study, a feature vector optimization approach is used to classify heartbeat signals. The PSO search [26], genetic search [27], best first and greedy stepwise [28], and multi-objective evolutionary search [29] methods are applied to the feature vectors to reduce the computational complexity of the overall process. The optimized vectors are passed as inputs to the classifier algorithms. Random forest (RF) [30] and least squares support vector machine (LS-SVM) [31] classifiers were selected for determining the heartbeat types. Experiments were conducted on an ECG dataset that includes five different heartbeat types. These ECG signals were collected from the MIT-BIH arrhythmia database. The rest of this paper is organized as follows: Section 2 briefly presents the feature optimization problem. Section 3 describes the proposed method in detail. Section 4 presents the experimental results and Section 5 concludes the paper.

Feature optimization
Feature optimization is one of the most significant challenges in data analysis because of the huge volume of data to be processed. Feature selection and optimization reduce the dimension of the data by removing unnecessary features, which improves the performance of the algorithms that consume them. Various supervised, semi-supervised and unsupervised feature optimization techniques exist in the literature. The main idea of the optimization process is to gather a subset of the existing features by eliminating features that contain relatively little information. To facilitate feature optimization, the relevance relation between a feature set and the target class should be defined. Let the set X denote the input features and the set Y denote the relevant classes.
Among the {x, y} pairs, the objective of feature selection is to find a subset of pairs, as defined in Equation (2), where the function O is the feature optimization function that calculates the accuracy of a feature subset. In feature optimization, the search space contains all possible subsets of features, so the outcome of feature optimization can significantly affect the performance of ECG signal classification. Feature selection and optimization is an NP-hard problem [32], so metaheuristics such as evolutionary algorithms are frequently used to search the solution space when the number of features is large [33]. It is essential to determine the best techniques for specific tasks such as beat detection and recognition. In this study, we analyze ECG signals with various optimization methods and classifier algorithms.
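To illustrate how a metaheuristic can search the space of feature subsets, the following is a minimal sketch of a genetic search over bit masks, where each bit marks whether a feature is kept. It is not the implementation used in this study: the function names, the population size, the mutation rate, and the toy objective (which rewards three hypothetical informative features and penalizes subset size) are all assumptions made for illustration. In practice the objective O would be the accuracy obtained with the selected subset.

```python
import random

RELEVANT = {0, 3, 5}  # hypothetical informative feature indices (assumption)


def toy_objective(mask):
    """Toy stand-in for O: reward relevant features, penalize subset size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    return hits - 0.1 * sum(mask)


def genetic_feature_search(n_features, objective, pop_size=20,
                           generations=30, mutation_rate=0.05, seed=0):
    """Return (best_mask, best_score) found by a simple genetic search."""
    rng = random.Random(seed)
    # Each individual is a tuple of 0/1 flags, one flag per feature.
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        ranked = sorted(population, key=objective, reverse=True)
        parents = ranked[:pop_size // 2]           # truncation selection
        children = [ranked[0]]                     # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        population = children
        best = max(population + [best], key=objective)
    return best, objective(best)


mask, score = genetic_feature_search(8, toy_objective)
print(mask, score)
```

Because the search is seeded, repeated runs return the same subset; a different seed or objective may of course select a different one.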

Feature vectors
The first step in determining the optimal feature vectors for ECG signals is to extract features from these signals. For this purpose, a 6-level Daubechies (Db6) [34] wavelet transform was used in this study. The wavelet transform separates each signal into lower frequency bands and yields a coefficient matrix for each band. The steps of this process are shown in detail in Figure 1. In the wavelet transform, the input signal is passed through high-pass and low-pass filters at each decomposition level. These filters provide a detailed analysis of the high- and low-frequency components of the signal. As a result, approximation (A_n) and detail (D_n) coefficients are formed at each level for the input signal.
The coefficient matrices are not suitable for direct use as classifier inputs because of the large amount of data they contain. For this reason, the data in these matrices have to be reduced to a lower-dimensional representation. Statistical methods are commonly employed for this purpose. In this study, the energy, mean, standard deviation and norm entropy were used. The energy of the coefficient matrices is defined in Equation (3), where C denotes the coefficient matrices, N is the size of these matrices, and M denotes the number of sub-bands. The mean of the coefficient matrices is calculated next, so that an average value is obtained for the coefficient matrices of each frequency sub-band. The standard deviation is another of the statistics employed. Lastly, the norm entropy is calculated for each coefficient matrix.
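The four statistics described above can be sketched in a few lines. This is a hedged illustration rather than the exact formulation of this study: the normalization of the energy by N, the population form of the standard deviation, the norm entropy exponent p = 1.5, and the ordering of the features in the output vector are all assumptions. The sub-band coefficients are taken as plain lists; with the PyWavelets library they could be obtained from a call such as `pywt.wavedec(signal, 'db6', level=6)`, which returns one approximation and six detail arrays.

```python
import math


def energy(coeffs):
    """Mean of squared magnitudes (normalization by N is an assumption)."""
    return sum(abs(c) ** 2 for c in coeffs) / len(coeffs)


def mean(coeffs):
    """Arithmetic mean of the coefficients in one sub-band."""
    return sum(coeffs) / len(coeffs)


def std_dev(coeffs):
    """Population standard deviation of the coefficients in one sub-band."""
    m = mean(coeffs)
    return math.sqrt(sum((c - m) ** 2 for c in coeffs) / len(coeffs))


def norm_entropy(coeffs, p=1.5):
    """Sum of |c|^p; the exponent p = 1.5 is an assumed value."""
    return sum(abs(c) ** p for c in coeffs)


def feature_vector(sub_bands):
    """Apply the four statistics to every sub-band and concatenate them.

    The ordering (all energies, then means, and so on) is an assumption.
    """
    features = []
    for stat in (energy, mean, std_dev, norm_entropy):
        features.extend(stat(band) for band in sub_bands)
    return features


# Seven hypothetical sub-bands, each holding four toy coefficients.
bands = [[1.0, -1.0, 1.0, -1.0] for _ in range(7)]
print(len(feature_vector(bands)))
```

With seven sub-bands and four statistics per sub-band, the resulting vector has 28 entries, which matches the feature vector dimension reported in the experiments.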
The wavelet decomposition of each input signal yields a total of 7 coefficient matrices, as given in Equation (7), where k is the number of feature vectors and j is the number of sub-bands.
Premature ventricular contraction (PVC) is an abnormal condition in which the heartbeat is initiated by the ventricular Purkinje fibers rather than by the sinoatrial node, which is the normal heartbeat initiator. As a result, extra contractions occur and the regular heart rhythm breaks down. An illustration of the PVC signal is given in Figure 3. An example ECG signal for a paced beat, another abnormal heartbeat type, is shown in Figure 4. A bundle branch block is a delay along the pathway of the electrical impulses that trigger a heartbeat. This delay can occur in the right or the left ventricle of the heart. If the delay occurs in the right ventricle, the right bundle branch block (RBBB) heartbeat shown in Figure 5 results; if it occurs in the left ventricle, the left bundle branch block (LBBB) heartbeat shown in Figure 6 results.

Experimental results
To determine the optimal features for ECG signals, five different heartbeat classes were selected from the MIT-BIH arrhythmia database. Half (50%) of the data were used in the training phase, and the rest were used for testing. The distribution of the data across classes is given in Table 1. Statistical methods were applied to the 6-level Db6 wavelet transform coefficients to generate 28-dimensional feature vectors for the ECG signals. The properties and definitions of the features are given in Table 2. A 28-dimensional feature vector is thus available as input data for each heartbeat signal. PSO, genetic search, best first, greedy stepwise and multi-objective evolutionary algorithms were used for feature optimization. The optimal features obtained after applying these methods to the feature vectors are given in Table 3. The PSO, best first and greedy stepwise methods each selected 17 features for these signals. The genetic search method selected the fewest features (eight), while the multi-objective evolutionary search algorithm selected nine. To evaluate the performance of the optimization algorithms, we tested the classification accuracy obtained with the optimized feature vectors. For this purpose, LS-SVM and RF classifiers were used to classify the ECG signals. The recognition accuracy was determined by applying the feature vectors obtained from the optimization algorithms to the inputs of these classifiers.
As seen in Table 4, the best result in terms of both feature size and classification performance was obtained with the genetic search method. The RF classifier achieved 98.95% accuracy on the ECG dataset with the eight features selected by genetic search, compared with 98.82% when the 28-dimensional feature vector containing all the features was given as input. As a result, the feature size was drastically reduced and the performance increased.
The PSO algorithm increased the performance of the LS-SVM classifier from 97.30% to 97.34% while removing 11 features from the feature vector. The other optimization techniques also reduced the feature size, but they reduced the performance at the same time.
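The evaluation protocol described above (half of the data for training, half for testing, and a comparison between the full and the reduced feature vectors) can be sketched as follows. This is only an illustration under stated assumptions: the split is stratified by class, which the text does not specify, and a simple nearest-centroid rule stands in for the RF and LS-SVM classifiers used in the study. All function names are hypothetical.

```python
import random


def stratified_half_split(labels, seed=0):
    """Split indices 50/50 within each class (stratification is assumed)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for ids in by_class.values():
        rng.shuffle(ids)
        half = len(ids) // 2
        train.extend(ids[:half])
        test.extend(ids[half:])
    return train, test


def select_features(samples, mask):
    """Keep only the columns whose mask bit is set."""
    return [[v for v, keep in zip(row, mask) if keep] for row in samples]


def fit_centroids(samples, labels):
    """Stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2
                                 for a, b in zip(row, centroids[y])))


def accuracy_percent(samples, labels, mask, seed=0):
    """Train on one half, test on the other, using the masked features."""
    reduced = select_features(samples, mask)
    train, test = stratified_half_split(labels, seed)
    centroids = fit_centroids([reduced[i] for i in train],
                              [labels[i] for i in train])
    hits = sum(predict(centroids, reduced[i]) == labels[i] for i in test)
    return 100.0 * hits / len(test)
```

Calling `accuracy_percent` once with an all-ones mask and once with the mask returned by an optimization algorithm gives the kind of full-versus-reduced comparison summarized in Table 4.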

Conclusion
In this study, feature vector optimization methods were analyzed for the classification of ECG signals. Five different heartbeat classes from the MIT-BIH arrhythmia database were used in the experiments. Feature vectors containing the distinctive features of the signals were obtained by applying the wavelet transform and statistical methods to the heartbeat signals. Various optimization algorithms were used to optimize the 28-dimensional feature vector, and the resulting feature vectors were given as input to LS-SVM and RF classifiers. The genetic search algorithm reduced the feature vector from 28 to 8 dimensions and yielded 98.95% accuracy, compared with 98.82% when all 28 features were used. With genetic search, therefore, the feature vector is reduced and the recognition performance is improved. In addition, the PSO algorithm reduces the dimension of the feature vector while preserving the recognition performance. Another important point is the selection of the classifier. The accuracy achieved with the genetic search features was as low as 88.52% with the LS-SVM classifier, but it rose to 98.95% with the RF classifier. The experiments show that the feature vectors significantly affect the performance in recognizing ECG signals, and that the choice of classifier can lead to better performance on optimized feature vectors.