An Empiric Analysis of Wavelet-Based Feature Extraction on Deep Learning and Machine Learning Algorithms for Arrhythmia Classification

Aberrations in the human electrocardiogram (ECG) are associated with cardiovascular events that may lead to arrhythmias. Many automated systems for ECG classification exist, but the question of when to rely on in-built feature extraction and when to apply expert-based manual feature extraction before classification still needs attention. The proposed work compares the performance of machine learning and deep learning classification on time series sequences. The two classifiers, namely the Support Vector Machine (SVM) and the Bi-directional Long Short-Term Memory (BiLSTM) network, are separately trained on direct ECG samples and on feature vectors extracted using the multiresolution analysis of the Maximal Overlap Discrete Wavelet Transform (MODWT). Single beat segmentation with R-peak and QRS detection is also involved, along with the extraction of 6 morphological and 12 statistical features. The two benchmark datasets, one multi-class and one binary, are acquired from the PhysioNet database. For the binary dataset, BiLSTM with direct samples and with feature extraction gives 58.1% and 80.7% testing accuracy, respectively, whereas SVM outperforms it with 99.88% accuracy. For the multi-class dataset, BiLSTM classification accuracy with direct samples and with extracted features is 49.6% and 95.4%, respectively, whereas SVM achieves 99.44%. The statistical analysis shows that feature-based selection of data can deliver better outcomes than raw ECG data or in-built automatic feature extraction. The machine learning classifiers like SVM with knowledge-based feature extraction can equally


I. Introduction
The automation of electrocardiogram (ECG) measurement enables users to monitor their cardiac signals using smart portable devices such as wearables [1]. Any cardiac complication can be immediately observed, reported, or referred to experts. With these advancements, ECG classification and analysis have progressed from machine learning to deep learning. Converting data from 1D to 2D or 3D, or vice versa, requires high accuracy and low computational time, and the computing hardware must be compatible with the new technologies.
The automatic detection and recognition of a cardiac anomaly has two phases: feature extraction and classification, which may be binary or multi-class. The feature extraction stage gives an algorithm the flexibility to become efficient and to increase its performance. It relies on a thorough knowledge of the inputs and the dataset, and with expert experience added it becomes a powerful tool for extracting the desired features. If the extracted features are large in dimension, or if direct data samples are acquired, feature compression [2] or reduction is needed. This feature selection retains the most significant features, which form a compact input for classifiers. The second stage is classification, where the classifier is trained on the collected input features to predict the test data and unknown data. This type of automation is seen in traditional models that use artificial intelligence and machine learning. Traditional models require a separate feature extraction module, with features derived from experience or signal processing techniques, followed by a classification algorithm. These features may include wavelet features [3], [4], [5], Principal Component Analysis (PCA) [6], Independent Component Analysis (ICA) [7], [8], and statistical features [9]. The Wavelet Transform (WT) has had a high impact on ECG analysis because wavelet decomposition yields sub-bands and coefficients at different levels. This decomposition helps in finding unique features for analysis. A wavelet design devoted to noise suppression, combined with the Hidden Markov Model (HMM), gives successful multi-class classification with distinctive feature extraction [10].
Recently, technological advances have produced deep learning algorithms that have a single end-to-end structure for feature extraction and classification. These innovations have yielded many new classification algorithms such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) [11], [12], [13], hybrid structures such as CNN with Bidirectional LSTM [14], and active classification using deep learning networks [15]. Another interesting combination of CNN and LSTM extracts features from and classifies ECG signals of variable length, achieving an accuracy of 98.10% [16]. These models learn features automatically during training.
This experimental study analyzes and compares the BiLSTM network and the SVM classification algorithm on 1D sequential ECG data. The paper contributes towards:
• Implementing discrete wavelet-based denoising and a Maximal Overlap Discrete Wavelet Transform (MODWT) based feature extraction method for extracting 6 morphological and 12 statistical ECG attributes.
• Avoiding information loss through the time-invariant, non-orthogonal, less variable estimation and the stationary detail time series achieved by the multi-resolution analysis of MODWT.
• Illustrating the application and the data-based choice between machine learning and deep learning for 1-D signals of arbitrary length.
• Conducting a systematic experiment demonstrating that SVM can perform as well as the BiLSTM network on the same benchmark PhysioNet ECG datasets under similar conditions.
In addition to this, the arrhythmic features are discussed and supervised by cardiac experts. The classification outcome shows that extracted featured ECG data yields higher performance than raw ECG data for deep learning and machine learning classification techniques.

A. Multi-resolution Wavelet Transform
Wavelet Transform (WT) has a wide application area for non-stationary electrical signals such as biomedical signals. WT provides time and frequency information simultaneously. Representing the signal at various frequency levels and analyzing it through high-pass and low-pass filters at different scales gives the concept of multi-resolution analysis. MODWT is insensitive to the choice of starting point of a time series sequence. MODWT implements DWT twice, once to the original series and once to its transformation, and then merges the outputs. The MODWT coefficients are scaling ($\tilde{s}_{k,m}$), wavelet ($\tilde{w}_{k,m}$), approximation ($\tilde{a}_{k,m}$), and detail ($\tilde{d}_{k,m}$). These coefficients are described as in [17], where $\tilde{g}^{\circ}$ is $\tilde{g}$ periodized to length N and $\tilde{h}^{\circ}$ is $\tilde{h}$ periodized to length N.
MODWT can handle arbitrary sample sizes because it is an undecimated type of wavelet transform. The multi-resolution analysis of MODWT exhibits zero-phase filtering, so the extracted features are time-aligned with the original signal. Characteristics such as less variable estimation and content retention make MODWT well suited to time series, as recommended in [18], [19].
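The undecimated, circular filtering behind MODWT can be illustrated with a minimal sketch. It assumes the Haar wavelet rather than the db4 wavelet used in this work, purely to keep the filters to two taps, and the function names are hypothetical. At level j the taps are spaced 2^(j-1) samples apart and the series is extended periodically, so the input may have any length and every level keeps the full number of samples.

```python
import math

# Haar MODWT filters (assumption: Haar stands in for db4 to keep the sketch short).
# They are the Haar DWT filters divided by sqrt(2).
H_TILDE = (0.5, -0.5)  # wavelet (high-pass) filter
G_TILDE = (0.5, 0.5)   # scaling (low-pass) filter


def modwt_haar(x, levels):
    """Return ([W_1, ..., W_J], V_J) computed with the pyramid recursion."""
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # spacing between filter taps at level j
        w_next = [0.0] * n
        v_next = [0.0] * n
        for t in range(n):
            for l in range(len(H_TILDE)):
                sample = v[(t - shift * l) % n]  # periodic (circular) extension
                w_next[t] += H_TILDE[l] * sample
                v_next[t] += G_TILDE[l] * sample
        details.append(w_next)
        v = v_next
    return details, v


def energy(seq):
    """Sum of squared values of a sequence."""
    return sum(s * s for s in seq)


# A synthetic test series whose length (50) is not a power of two.
signal = [math.sin(0.3 * t) + 0.1 * (t % 7) for t in range(50)]
details, approx = modwt_haar(signal, levels=4)
total = sum(energy(w) for w in details) + energy(approx)
print(len(details), len(approx), abs(total - energy(signal)) < 1e-9)
```

The final line checks the energy decomposition of the transform: the energies of the four wavelet levels and the level-4 scaling coefficients add up to the energy of the input, which is one way to see that no content is discarded.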

B. Support Vector Machine (SVM)
SVM is a family of supervised machine learning models that use kernel functions to map data into a space where non-linear problems become separable. SVM can handle binary and multi-class problems efficiently, and many real-world applications have been successfully implemented using support vector classification. It works by finding an optimal separating hyperplane [20], which corresponds to a non-linear decision boundary in the original input space.
SVM deals with noisy and sparse datasets efficiently and performs well on both large and small datasets.

C. Bidirectional Long Short-Term Memory (BiLSTM) Network
Following the growth of machine learning, the RNN was introduced to retain and utilize state information. Storing information from previous time steps gives the network a memory unit. The LSTM classifier, an improvement over the RNN, has a gating mechanism that manages long-term dependencies in the input data. It has three gates: input, forget, and output. To exploit a complete long sequence of data, the Bidirectional RNN combines a forward-state RNN and a backward-state RNN.
The BiLSTM network uses two LSTMs, one for the past token states and one for the future token states, so information is processed from left to right and from right to left. For each time step, a forward hidden layer contains a hidden unit function that operates on the previous hidden state and the current input, and a backward hidden layer contains a hidden unit that operates on the future hidden state and the current input. The forward and backward representations are concatenated into one long vector, from which the final predictions are produced [21]. The detailed feature extraction and classification modules are structured in Fig. 1.
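The forward and backward passes described above can be sketched with a toy scalar LSTM cell. This is an illustration only: the hidden state is a single number, the weights are hypothetical fixed values rather than trained parameters, and the names `lstm_pass` and `bilstm` are not taken from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_pass(sequence, weights):
    """Run a scalar LSTM over the sequence and return the hidden state at every step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        # Each gate sees the current input x and the previous hidden state h.
        i = sigmoid(weights["wi"] * x + weights["ui"] * h + weights["bi"])    # input gate
        f = sigmoid(weights["wf"] * x + weights["uf"] * h + weights["bf"])    # forget gate
        o = sigmoid(weights["wo"] * x + weights["uo"] * h + weights["bo"])    # output gate
        g = math.tanh(weights["wg"] * x + weights["ug"] * h + weights["bg"])  # candidate value
        c = f * c + i * g          # update the cell (memory) state
        h = o * math.tanh(c)       # expose part of the memory as the hidden state
        outputs.append(h)
    return outputs


def bilstm(sequence, fwd_weights, bwd_weights):
    """Pair, at every time step, the forward state with the backward state."""
    forward = lstm_pass(sequence, fwd_weights)
    backward = lstm_pass(sequence[::-1], bwd_weights)[::-1]  # read right to left, then realign
    return list(zip(forward, backward))


# Hypothetical weights, chosen only so that the example runs.
FWD = {"wi": 0.5, "ui": 0.1, "bi": 0.0, "wf": 0.4, "uf": 0.2, "bf": 1.0,
       "wo": 0.3, "uo": 0.1, "bo": 0.0, "wg": 0.8, "ug": 0.2, "bg": 0.0}
BWD = {"wi": 0.2, "ui": 0.3, "bi": 0.0, "wf": 0.6, "uf": 0.1, "bf": 1.0,
       "wo": 0.5, "uo": 0.2, "bo": 0.0, "wg": 0.7, "ug": 0.1, "bg": 0.0}

sequence = [0.1, -0.4, 0.9, 0.3]
states = bilstm(sequence, FWD, BWD)
print(states)
```

At the first time step the forward state has seen only the first sample, while the backward state has already seen the whole sequence, which is how the bidirectional arrangement supplies future context to every position.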

A. ECG Dataset Acquisition
The frequently used PhysioNet databases are involved in the present study. A detailed description of the dataset acquisition is tabulated in Table I. For the binary dataset, the PhysioNet 2017 Challenge [22] provides two types of ECG signals, namely Normal (N_S) and Atrial fibrillation (AFib_S). The data is stored at 300 Hz with a bandwidth of 0.5-40 Hz. The direct samples of each signal give accurate signal statistics. Each signal is trimmed to 9000 samples for balanced data collection. The multi-class dataset combines three different ECG signals from three different PhysioNet databases, namely the MIT-BIH Arrhythmia Database for the Arrhythmia ECG Signal (A_S), the BIDMC Congestive Heart Failure Database for the Congestive Heart Failure Signal (CHF_S), and the MIT-BIH Normal Sinus Rhythm Database for the Normal Sinus Signal (NS_S). Each ECG recording in this collection has 65536 samples and is sampled at 128 Hz [23].

B. Pre-processing Unit
During the pre-processing stage, the raw ECG samples are refined by two processes. The first is normalization, which rescales the data to zero mean and unit standard deviation. This minimizes amplitude variation and provides consistent data for further processing. The second is filtering, which removes noise artifacts such as baseline wander and power line interference. In the present work, the discrete wavelet transform (DWT) is implemented using the Daubechies wavelet family (db4). Wavelet decomposition, removal of the undesired detail and approximation coefficients, and reconstruction of the signal result in the filtered ECG signal [24]. Fig. 2 and Fig. 3 display the normalized and filtered ECG signals.
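The normalization step can be sketched as a z-score transform. The sketch assumes the population standard deviation is used (the source does not state which estimator was applied), and the function name and sample values are hypothetical.

```python
import statistics


def zscore(samples):
    """Rescale samples to zero mean and unit standard deviation."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # assumption: population standard deviation
    if std == 0:
        raise ValueError("a constant signal cannot be normalized")
    return [(s - mean) / std for s in samples]


# Illustrative amplitudes, not taken from any PhysioNet record.
raw = [0.82, 0.85, 1.90, 0.40, 0.79, 0.81, 0.95, 0.80]
normalized = zscore(raw)
print(statistics.fmean(normalized), statistics.pstdev(normalized))
```

After the transform, recordings with different gains or offsets share a common amplitude scale, which is what makes the subsequent wavelet coefficients comparable across signals.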

C. Feature Extraction
For the feature extraction process, a preliminary session was conducted to determine the differences between the arrhythmic conditions involved in the present study. Cardiac experts supervised the feature recognition exercise. MODWT and MODWT Multiresolution Analysis (MODWTMRA) are applied to extract the distinctive attributes. The filtered ECG signal is decomposed to level 4 using the Daubechies (db4) wavelet, and MRA is applied, resulting in detail coefficients (D1, D2, D3, D4) and an approximation coefficient (A4). D4 exactly matches the original sample coordinates, so it is used for extracting the morphological features using signal processing techniques [25].
For the binary dataset (DB1), 18 features comprising 6 morphological and 12 statistical features are extracted. The morphological features are the amplitudes of the prominent peaks (P, R, T, and the QRS complex), the RR interval, and the P count (Pcount). As observed in Fig. 2, in atrial fibrillation the P peaks are not prominent, and their count differs from that of a normal ECG. There is also a difference in the RR interval, the R amplitude (Ramp), and the T amplitude (Tamp). Statistics are applied to the coefficients to reduce their dimensions and achieve better results. The statistical attributes are the mean of A4; the standard deviation and variance of D1, D2, D3, D4, and A4; and the maximum MRA energy over all scales. The ECG signal dimension reduces from 5655 x 9000 to 5655 x 18 for use by the classifier.
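The count of 12 statistical attributes can be checked with a short sketch: one mean, five standard deviations, five variances, and one maximum energy. The sketch assumes sample (N-1) estimators and defines the energy of a band as its sum of squares; neither choice is stated in the source, and the function names and band values are hypothetical.

```python
import statistics


def band_energy(coefficients):
    """Energy of one MRA band, taken here as the sum of squared values (assumption)."""
    return sum(v * v for v in coefficients)


def statistical_features(d1, d2, d3, d4, a4):
    """Build the 12 statistical attributes from the five MRA bands."""
    bands = [d1, d2, d3, d4, a4]
    features = [statistics.fmean(a4)]               # 1 feature: mean of A4
    for band in bands:                              # 10 features: std and variance per band
        features.append(statistics.stdev(band))     # assumption: sample standard deviation
        features.append(statistics.variance(band))  # assumption: sample variance
    features.append(max(band_energy(b) for b in bands))  # 1 feature: maximum band energy
    return features


# Tiny illustrative bands standing in for the MRA of one 9000-sample signal.
d1 = [1.0, -1.0, 1.0, -1.0]
d2 = [0.5, -0.5, 0.5, -0.5]
d3 = [0.0, 1.0, 0.0, -1.0]
d4 = [2.0, 0.0, -2.0, 0.0]
a4 = [2.0, 2.0, 3.0, 3.0]
features = statistical_features(d1, d2, d3, d4, a4)
print(len(features), features[0], features[-1])
```

Appending the six morphological measurements to this vector would give the 18 values per signal that form one row of the feature matrix.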
For the multi-class dataset (DB2), the number of ECG signals is too small for classification, so beat segmentation of the 162 ECG signals is required. Beat segmentation uses the R peak location together with 99 samples before the R peak and 100 samples after the R peak, comprising 120 samples for each heartbeat. It is observed that the three ECG signal types, A_S, CHF_S, and NS_S, are very similar in morphological metrics, and only the slope and QRS width show variation, as presented in Fig. 3. These extracted 120 data points of every single heartbeat can therefore be used directly. The ECG data dimensions change from 162 x 65536 to 22400 x 120 ECG beats for use by the classifier.
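Beat segmentation around detected R peaks can be sketched as fixed-window slicing. The window sizes are left as parameters, the R peak positions are assumed to be already known, and the function name and toy values below are hypothetical; R peak detection itself is not shown.

```python
def segment_beats(signal, r_peaks, before, after):
    """Cut one window per R peak: `before` samples, the peak, then `after` samples."""
    beats = []
    for r in r_peaks:
        start, stop = r - before, r + after + 1
        if start < 0 or stop > len(signal):
            continue  # skip beats whose window would run off the recording
        beats.append(signal[start:stop])
    return beats


# Toy recording and peak positions; each window holds before + after + 1 samples.
recording = list(range(20))
beats = segment_beats(recording, r_peaks=[1, 5, 10, 18], before=2, after=3)
print(beats)
```

Peaks too close to either end of the recording are dropped rather than padded, so every retained beat has the same length and the beats can be stacked into a matrix with one row per heartbeat.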

D. Classification
The discriminating feature vectors of datasets DB1 and DB2 are fed to the classifiers, namely the BiLSTM network and SVM. The two categories of data are imported to a simple BiLSTM network layer. For DB1, the input to BiLSTM is direct samples (5655 x 9000) and featured data (5655 x 18). For DB2, the input to BiLSTM is direct samples (162 x 65536) and featured data (22400 x 120). The output size of the BiLSTM layer is kept at 100 units, and the output mode is set to 'last', which maps the input signal into 100 features. The other BiLSTM training settings are the adaptive moment estimation (Adam) optimizer, a mini-batch size of 150, a maximum of 10 epochs, an initial learning rate of 0.01, and a gradient threshold of 1 to stabilize training.
In parallel, SVM is also used as a classifier, with 5655 x 18 featured ECG signals and 22400 x 120 featured ECG beats as input. SVM uses three kernel functions: linear, radial basis function (rbf), and quadratic (polynomial).
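The three kernel families named above can be written out in their textbook forms. The sketch below is illustrative only: the parameter names (`degree`, `coef0`, `gamma`) and their default values are assumptions and do not reflect the settings used in the experiments.

```python
import math


def linear_kernel(x, y):
    """Plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel; degree 2 gives the quadratic case."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian radial basis function kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


u = [1.0, 2.0]
v = [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v, gamma=0.5))
```

Each function returns a similarity score between two feature vectors; the SVM only ever needs these pairwise scores, which is why swapping the kernel changes the shape of the decision boundary without changing the training procedure.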

IV. Experimental Results
The proposed classification setup requires both ECG signals and ECG beats, so features are extracted and beats are detected from the signals. For extensive performance analysis and evaluation, two different datasets are created from PhysioNet, namely DB1 (binary dataset) comprising normal (N_S) and abnormal (AFib_S) signals, and DB2 (multi-class dataset) comprising three different ECG beats, AB, CFB, and NSB. The classification results are obtained using MATLAB (R2018 working environment for academic use), and an NVIDIA discrete graphics card with GPU is used for the training process.
The ECG signals and beats are grouped into training and testing data. The training process fits the classifier to existing data, whereas the testing process checks the accuracy of the classifier on unknown or new data. In DB1, the AFib_S signals are very few compared to N_S (718:4937), so data augmentation, also known as oversampling, is applied using the MATLAB function 'repmat'. In DB2, the three different ECG beats are sufficient in number, so there is no need for data repetition. A data partitioning scheme is not required for the BiLSTM network as the neural network shuffles the data automatically. For SVM, however, 5-fold and 10-fold cross-validation schemes are implemented for DB1 and DB2, respectively. The proposed testing and training arrangement yields efficient results. Table IV summarizes the training and testing data.
Fig. 4 to Fig. 7 show the accuracy obtained with the BiLSTM network. Each plot is divided into two sections: the top section depicts the training progress, and the bottom section depicts the corresponding training loss. The respective confusion matrix is also shown. Fig. 4 presents the classification through the BiLSTM network for DB1 using direct ECG samples, showing training and testing accuracy of 61.6% and 58.1%, respectively. The same network fed with the featured dataset, as shown in Fig. 5, improves the training and testing accuracy to 81.5% and 80.7%, respectively. In the case of DB2, Fig. 6 shows the BiLSTM network with direct ECG samples, and Fig. 7 shows a vast improvement in training and testing accuracy, from 88.8% to 95.9% and from 49.6% to 95.4%, respectively. Unlike in the previous result, here the 120 segmented ECG data points per beat help improve the accuracy.
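Oversampling by repetition and k-fold partitioning can be sketched as follows. This is a Python analogue of tiling the minority class, not the MATLAB code used in the study: the repetition factor is not reported in the source, so deriving it from the class ratio is an assumption, and both function names are hypothetical.

```python
def oversample_by_repetition(minority, majority_count):
    """Tile the minority class until it roughly matches the majority class size."""
    repeats = max(1, majority_count // len(minority))  # assumption: integer class ratio
    return minority * repeats


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Class sizes reported for DB1: 718 AFib_S signals against 4937 N_S signals.
afib_ids = list(range(718))
balanced_afib = oversample_by_repetition(afib_ids, majority_count=4937)
print(len(balanced_afib))

# A 5-fold split over 10 toy samples.
splits = list(k_fold_indices(10, 5))
print(splits[0])
```

Repeated copies of the same signal carry no new information, so when oversampling is combined with cross-validation it is usually safer to oversample only the training portion of each fold, otherwise duplicates of a test signal may appear in the training data.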
The statistical parameters are Overall Accuracy Analysis (OAA), Precision (%), Recall (%), and F1 Score, which are defined by
\[ \mathrm{OAA} = \frac{TPR + TNR}{TPR + TNR + FPR + FNR}, \qquad \mathrm{Precision} = \frac{TPR}{TPR + FPR}, \]
\[ \mathrm{Recall} = \frac{TPR}{TPR + FNR}, \qquad \mathrm{F1\ Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \]
where TPR: True Positive Response, FPR: False Positive Response, FNR: False Negative Response, and TNR: True Negative Response. TPR means a truly existing and detected signal. FPR means not a true response but detected. FNR means a true response that is not detected. TNR means not a true response and not detected. The F1 Score is the harmonic mean of precision and recall. Table II and Table III tabulate the classification performance of the binary and multi-class SVM. Fig. 8 displays a scatter diagram for SVM that discriminates the coefficients of the binary and multi-class datasets. The experimental results indicate that with a large number of beats and a large dataset, SVM gives better accuracy for non-linear and non-stationary biological signals like ECG compared to the BiLSTM deep learning network.
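These four measures can be computed directly from the entries of a binary confusion matrix, using the standard definitions. The counts in the sketch below are hypothetical and are not results from this study; the function name is likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

For a multi-class problem the same function can be applied once per class in a one-versus-rest fashion, treating that class as positive and all others as negative.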

V. Discussion
The impact of employing different classification techniques on direct, in-built, and knowledge-based handcrafted features of binary and multi-class ECG datasets has shown consequential observations, as indicated in Table IV. The feature extraction before applying classification shows much better performance in the present study, and the same is also reported in [34]. For both the datasets, the accuracy rate of 95% and above is achieved only in the knowledge-based extracted features of ECG signals. The statistical variations can be justified by the points described below.

A. ECG Feature Set
In the case of a binary dataset, using knowledge-based 18 extracted attributes with the BiLSTM network results in an increase of 19.9 % training accuracy and 22.6 % of testing accuracy compared to using direct raw ECG samples.
The same feature set with SVM results in an increase of 19.18 % performance accuracy compared with the BiLSTM network. This means that if known features of arrhythmic ECG signal are differentiated and extracted, as shown in Fig. 2, machine learning can perform better than deep learning in such cases.
Similarly, for the multi-class dataset, ECG beat segmentation is done to demonstrate another positive impact of extracting the PQRST data points of a single beat. These are hand-crafted direct 120 ECG data points of each heartbeat, as shown in Fig. 3. Using these features with the BiLSTM network results in an increase of 7.1 % in training accuracy and 45.8 % in testing accuracy compared with using direct raw ECG samples or the whole signal as input. The same feature set with SVM results in an increase of only 4.04 % in accuracy compared with the BiLSTM network. This illustrates that instead of using all direct raw ECG samples, it is beneficial to use only the required and informative features with deep learning to increase performance accuracy above 95%. Also, machine learning algorithms like SVM can perform as well as or better than deep learning networks like BiLSTM.
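The accuracy gains quoted in this subsection follow from the accuracies reported in the experimental results, and the arithmetic can be reproduced in a few lines. All numbers are taken from the text; the dictionary keys are descriptive labels introduced here for illustration.

```python
# (before, after) accuracy pairs in percent, as reported in the text.
reported = {
    "DB1 BiLSTM training, raw to features": (61.6, 81.5),
    "DB1 BiLSTM testing, raw to features": (58.1, 80.7),
    "DB1 testing, BiLSTM to SVM": (80.7, 99.88),
    "DB2 BiLSTM training, raw to beats": (88.8, 95.9),
    "DB2 BiLSTM testing, raw to beats": (49.6, 95.4),
    "DB2 testing, BiLSTM to SVM": (95.4, 99.44),
}

# Differences in percentage points, rounded to remove floating-point noise.
gains = {name: round(after - before, 2) for name, (before, after) in reported.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

The differences are absolute gains in percentage points, not relative improvements.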

B. Performance Comparison with Existing Literature
Recent classification outcomes achieved by different methods are listed in Table V. Robust feature extraction techniques like wavelet decomposition are used before classifiers like SVM, as reported in [26], [28]. Sahoo et al. [26] detected the QRS complex using MRA of WT with SVM classification on the MIT-BIH ECG database of PhysioNet, achieving 98.39% accuracy and a low error rate of 0.42%. In 2018, Pawel Pławiak achieved 98.85% accuracy on ECG fragments using feature extraction with pre-processing. ECG characteristics were estimated using PSD and tested using genetic optimization and selection before employing SVM classification on 1000 cardiac beats [27]. An ensemble SVM, i.e., a multi-SVM approach, is demonstrated with wavelet-based, HOS, LBP, and several amplitude values for feature extraction with specific SVMs [28]. This ensemble methodology showed a satisfactory accuracy of 94.50 %.
The automatic in-built feature extraction concept, also known as the end-to-end technique, is used in deep learning algorithms, as reported in [29]-[33]. Zubair et al. [29] employed a small patient-specific ECG dataset to implement a CNN, achieving a classification accuracy of 92.50 % for five different beats. Acharya et al. [30] proposed a CNN to diagnose normal and myocardial beats with an accuracy of 95.22%. They investigated ECG beats with and without noise removed. Another CNN model was designed by Acharya et al. [31]  The accuracy of 80.7% achieved by the proposed BiLSTM networks using hand-crafted feature extraction, yet it is lower than the accuracy of 95.4 % achieved by proposed BiLSTM network using informative beat segmented direct ECG data points. Besides, the proposed SVM with MODWT-extracted features outperforms CNN and BiLSTM networks with built-in or hand-crafted features by achieving an accuracy rate of 99.88% for binary and 99.44% for multi-class classification, respectively. More evidence is reported in [35], where the combination of MRA of DWT with the Online Sequential Extreme Learning Machine (OSELM) as classifier achieved a 99.44% accuracy rate for two classes and a 98.51% accuracy rate for multi-class.

C. Limitations
In the present study, data augmentation is used for the BiLSTM networks, and 5-fold and 10-fold cross-validation are used for SVM, owing to small sample size constraints. Overfitting issues can therefore exist. This limitation can be addressed by experimenting with larger datasets. Moreover, using the same datasets and the same validation methods as other studies would allow the results to be compared directly under similar conditions.

VI. Conclusion
The proposed work is an experimental study comparing the classification capability of deep learning with in-built feature extraction against machine learning with distinctive knowledge-based feature extraction on time series sequential ECG data. A BiLSTM network with automatic feature extraction is implemented on the publicly available PhysioNet 2017 Challenge dataset, and the same two-class dataset is then classified with SVM using manual feature extraction based on MODWT and MODWTMRA. The 18 features of normal and Atrial Fibrillation ECG signals are extracted under the supervision of cardiac experts. Another dataset comprising three different classes from the PhysioNet database is also used. For this dataset, feature extraction involves beat segmentation yielding 120 informative data points for each category of ECG beat. In both cases, under similar experimental scenarios, the raw ECG data is first fed to the BiLSTM network, and the hand-crafted ECG features are then fed to the BiLSTM network and SVM. The research outcomes suggest that deep learning with in-built feature extraction is not always an efficient method for all types of ECG datasets, whereas machine learning with manual feature extraction can show better performance in certain experimental conditions. Pre-processing and feature extraction are two significant preliminaries before classification for one-dimensional data. Hand-crafted feature extraction involves expert experience and control of the signal data. It is observed that for a long-duration dataset, instead of training BiLSTM with raw ECG samples, it is justified to train with informative segmented beat data points or a distinctive vital feature set to obtain the desired outcomes. Also, appropriate feature extraction such as wavelet decomposition can be incorporated in deep learning algorithms to achieve high-performance classification.
As a future direction, the featured input data can be made more robust and refined by applying dimensionality reduction techniques, to achieve higher accuracy with network classifiers.