Faster Detection of Abnormal Electrocardiogram (ECG) Signals Using Fewer Features of Heart Rate Variability (HRV)

To reduce the effect of noise in raw electrocardiogram (ECG) data and to speed up the detection of cardiac arrhythmia, heart rate variability (HRV) features are a good choice. This work extracted 34 popular HRV features from the MIT-BIH Arrhythmia Database. Combinations of 11 feature selection algorithms and 2 classification algorithms were used to identify the features that are effective for detecting abnormal ECG signals. The systematic comparison shows that the combination of the 34 original features has a stable classification performance for 3 different time windows, i.e., 32 RR-intervals, 5 minutes, and 30 minutes of raw ECG records. A 10-feature combination (RMSSD, SDNN, CV, TINN, HF, SampEn, SD1/SD2, VAI, ED, and DC) can rapidly distinguish arrhythmia from the normal state based on the shortest ECG records (32 RR-intervals). Future work will implement this feature combination in portable ECG equipment for clinical on-line arrhythmia detection.


Introduction
Cardiovascular disease (CVD) is the leading cause of death worldwide. According to the World Health Organization (WHO), approximately 17.7 million people died of cardiovascular disease in 2015 [1]. Automatic arrhythmia detection and classification can be achieved with many kinds of features, such as morphological features [2] or a 2D representation of electrograms known as the Spatial Projection of Tachycardia (SPOT) [3]. Different kinds of machine learning algorithms have been used to achieve better arrhythmia detection performance based on ECG morphological features, such as the wavelet transform [4], fuzzy C-means (FCM) clustering [5], independent component analysis (ICA) [6], and support vector machines (SVM) [7][8][9][10]. Moaveian et al. performed a qualitative comparison of a neural network and an SVM for classifying ECG arrhythmia beats in terms of performance and training time [11], and they found that the SVM was faster but the neural network was more stable. Arrhythmia classification based on morphological features is a time-consuming process, and the results are very sensitive to the amount of noise. Heart rate variability (HRV) [12] is the slight fluctuation of the instantaneous heart rate between consecutive heartbeats. Features extracted from the HRV signal of an ECG recording can overcome the influence of noise in the original ECG data. A combination of linear and nonlinear features appeared to be a better choice for detecting arrhythmia [13]. The deceleration capacity (DC) of the heart rate was also a good predictor of arrhythmia or even sudden cardiac death [14,15].
Recent developments in electronic technology and algorithms have made it possible to offer monitoring and early warning of cardiovascular disease using portable ECG equipment. Martis et al. reviewed the current ECG methods (linear and non-linear features) that could be used on portable ECG equipment [16]. A deep convolutional neural network was introduced for automated myocardial infarction detection [17]. He et al. developed a lightweight neural network for detecting arrhythmias that might be used on mobile devices [18]. Detecting arrhythmia easily, effectively, and quickly with good features remains a challenging task. This work uses the well-known MIT-BIH database [19], which has been used for automatic arrhythmia detection in many studies [7,18]. Overall, 34 features are extracted, including linear and non-linear features as well as the DC of the heart rate. A systematic performance comparison of combinations of 11 feature selection and 2 classification algorithms is carried out to find the most effective HRV features. We investigate the influence of feature selection on the training and testing performance as well as on the computation time of the SVM and the Back Propagation Neural Network (BPN) for detecting arrhythmia. Finally, we find a 10-feature combination that may be suitable for a portable ECG device, offering rapid computation and high prediction accuracy with a short data acquisition time.

Database
The MIT-BIH Arrhythmia database was collected from 25 males aged between 32 and 89 years and 22 females aged between 23 and 89 years. This database has a total of 48 ECG records, with two of the records obtained from the same subject. Each recording contains ECG data with a duration of more than 30 minutes. The MIT-BIH Normal Sinus Rhythm database was collected from 5 males aged between 26 and 45 years and 13 females aged between 20 and 50 years, for a total of 18 ECG recordings, each containing 24 hours of ECG data. To ensure that the sizes of the data sets were consistent, we selected 1.5 hours of ECG data for each normal subject and segmented the data

F5-CV:
The Coefficient of Variation, which is the overall standard deviation (SDNN) divided by the mean of the RR intervals recorded over the same period.

F6-TRI:
The HRV triangular index, which is the total number of RR intervals divided by the height of the RR interval histogram.

F7-TINN:
The baseline width of the triangle fitted to the RR interval histogram.
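The time-domain definitions above can be sketched as follows. This is a minimal illustration, assuming the RR intervals are given in milliseconds; the function name and the 7.8125 ms histogram bin width (1/128 s, a common convention) are assumptions and are not taken from this paper. TINN is omitted because it requires a triangular fit to the histogram.

```python
import numpy as np

def time_domain_features(rr_ms, bin_width_ms=7.8125):
    """Return SDNN, RMSSD, CV and the triangular index of an RR series (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                        # overall standard deviation
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))   # RMS of successive differences
    cv = sdnn / rr.mean()                        # coefficient of variation (F5)
    # RR histogram; the extra bin widths guarantee that the maximum is covered.
    edges = np.arange(rr.min(), rr.max() + 2 * bin_width_ms, bin_width_ms)
    counts, _ = np.histogram(rr, bins=edges)
    tri = len(rr) / counts.max()                 # triangular index (F6)
    return {"sdnn": sdnn, "rmssd": rmssd, "cv": cv, "tri": tri}
```

The bin width strongly affects the triangular index, so it should be fixed once and reused for all recordings.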
Frequency domain analysis: Previous studies found that the normal heart rate power spectrum lies between 0 and 0.4 Hz, with a low-frequency band of 0.04-0.15 Hz (F8-LF) and a high-frequency band of 0.15-0.4 Hz (F9-HF). The total power between 0 and 0.4 Hz (F10-TP) reflects the overall magnitude of the HRV. In addition, the ratio of LF to HF (F11-LF/HF) reflects the balance of the regulation exerted by the human autonomic nervous system.
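As a rough sketch of how the band powers could be computed, the example below resamples the RR series onto an even 4 Hz grid by linear interpolation and integrates a plain FFT periodogram over each band. The spectral estimator used in this paper is not stated, so the resampling rate, the periodogram, and the function name are all assumptions.

```python
import numpy as np

def band_powers(rr_ms, fs=4.0):
    """Return LF, HF, total power and LF/HF from an RR series (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr) / 1000.0                  # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)     # evenly spaced time axis
    x = np.interp(grid, t, rr)                  # evenly resampled RR series
    x = x - x.mean()                            # remove the mean (DC) level
    spec = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    spec[1:] *= 2.0   # one-sided spectrum; the Nyquist bin is outside 0-0.4 Hz
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    df = freqs[1] - freqs[0]

    def power(lo, hi):
        mask = (freqs >= lo) & (freqs < hi)
        return spec[mask].sum() * df

    lf, hf, tp = power(0.04, 0.15), power(0.15, 0.4), power(0.0, 0.4)
    ratio = lf / hf if hf > 0 else float("nan")
    return {"LF": lf, "HF": hf, "TP": tp, "LF/HF": ratio}
```

For very short windows such as 32 RR-intervals the frequency resolution is coarse, so the LF estimate in particular should be read with caution.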

Non-linear parameters:
Many parameters have been used to measure nonlinear properties of HRV, including 1/f scaling of Fourier spectra, the H scaling exponent, and Coarse Graining Spectral Analysis (CGSA). For data representation, there are Poincaré sections, low-dimensional attractor plots, singular value decomposition, and attractor trajectories. The D2 correlation dimension, Lyapunov exponents, and Kolmogorov entropy have been used as other quantitative descriptions. In this work, 22 nonlinear features, from F12 to F33, were calculated.

F12-ApEn:
The Approximate Entropy quantifies the complexity of a time series as the probability that new patterns are generated in the signal. In this paper, the approximate entropy is calculated with the model dimension m=2 and the similarity tolerance r=0.2.

F13-SampEn:
The Sample Entropy is an improvement on the approximate entropy. It is more consistent and more accurate than the approximate entropy and is thus suitable for analysing bio-medical signal sequences.
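A minimal sketch of the sample entropy is given below. It assumes the same settings quoted for the approximate entropy (m=2, tolerance 0.2) and interprets the tolerance as 0.2 times the standard deviation of the series; that interpretation, and the function name, are assumptions.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance; self-matches are excluded."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * x.std()          # tolerance, assumed relative to the SD

    def count_matches(length):
        # The same n - m starting points are used for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)            # template pairs matching over m points
    a = count_matches(m + 1)        # template pairs matching over m + 1 points
    if a == 0 or b == 0:
        return float("inf")         # undefined for too short or irregular data
    return -np.log(a / b)
```

A perfectly periodic series gives a value of zero, while irregular series give larger values.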

F14-DFA:
Detrended Fluctuation Analysis is a non-linear method that finds and removes the trend components of the integrated signal at different scales, thereby cancelling non-stationary artefacts in the signal, and then analyses the remaining fluctuations.
A Poincaré scatter plot [22] is a cross-sectional view of a multidimensional 'spatial structure' with a non-linear blend of properties that is used to observe and study the evolution of nonlinear systems. The Poincaré scatter plot has been applied to the analysis of heart rate variability, and the mapping method is as follows.
The coordinates of any point in the scatter plot are formed by a pair of RR intervals of the HRV time series. The previous RR interval is the abscissa, and the following RR interval is the ordinate. The number of beats between the two RR intervals is the delay of the scatter plot. A plot with a delay=1 graphically shows the correlation of adjacent RR intervals in the HRV sequence.
For long and short axis measurements, the scatter plot is fitted with an ellipse (the long axis of the ellipse is along the 45-degree contour line), SD1 is its half-short axis length, F15-SD2 is the half-long axis length, and F16-SD1/SD2 is the ratio between the half-short axis SD1 and the half-long axis SD2.
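The ellipse axes can be estimated without an explicit fit by rotating the scatter plot by 45 degrees and taking the standard deviation along each rotated axis. The sketch below follows that common approach; the function name is hypothetical, and the paper does not state which estimator it used.

```python
import numpy as np

def poincare_sd(rr_ms, delay=1):
    """Return SD1, SD2 and SD1/SD2 of the Poincare plot of an RR series."""
    rr = np.asarray(rr_ms, dtype=float)
    x, y = rr[:-delay], rr[delay:]                # (previous, following) pairs
    sd1 = np.std((y - x) / np.sqrt(2), ddof=1)    # spread across the 45-degree line
    sd2 = np.std((y + x) / np.sqrt(2), ddof=1)    # spread along the 45-degree line
    return sd1, sd2, sd1 / sd2
```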

F19-Vector-angle index (VAI):
The VAI is used to measure the spread of the scatter plot on both sides of the 45-degree contour.

F20-Complex Correlation Measure (CCM):
The CCM is calculated using a sliding window in which the time information of the sequence is embedded, which consists of three consecutive points in the scatter plot [23].
Heart Rate Asymmetry (HRA) analysis: In the healthy state, the physiological system is in a non-equilibrium state, and asymmetry is the basic nature of the non-equilibrium system [24]. This asymmetry also exists in a healthy heart system and is known as Heart Rate Asymmetry (HRA). The main measures of heart rate asymmetry (HRA) are the Guzik index (F21-GI), Porta index (F22-PI) and Ehler index (F23-EI).

F21-GI:
The distance D to the contours.

F22-PI:
The ratio of the number of scatter points below the contour to the number of scatter points that are not on the contour.

F23-EI:
The first-order difference transformation of the RR interval sequence to measure the asymmetry of the scatter distribution.
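As an illustration of the asymmetry measures, the Porta index as described above can be sketched in a few lines. The result is returned as a fraction rather than a percentage, and the function name is hypothetical.

```python
import numpy as np

def porta_index(rr_ms):
    """Fraction of off-diagonal Poincare points lying below the 45-degree line."""
    rr = np.asarray(rr_ms, dtype=float)
    d = np.diff(rr)                   # RR(n+1) - RR(n) for each plotted point
    below = np.sum(d < 0)             # points below the line RR(n+1) = RR(n)
    off_line = np.sum(d != 0)         # points that are not on the line
    return below / off_line if off_line else float("nan")
```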
The high dimension time irreversible (HDTI) analysis is used on a Poincaré scatter plot to measure the different delays, the synthesis of the delay in the scatter plot of PI and GI and the new asymmetry index F24-D t .
Another commonly used form of the Poincaré scatter plot is known as the differential Poincaré scatter plot. Compared to the standard scatter plot, the difference scatter plot removes the high correlation between consecutive heartbeat intervals and highlights the interrelationships in the variability of consecutive heartbeat intervals (or heart rate). According to the positive and negative values of the differences, the points in the scatter plot are distributed over the four quadrants of the Cartesian coordinate system. F25-Pa, F26-Pb, F27-Pc, and F28-Pd indicate the ratio between the number of scattered points in quadrants A(+,+), B(-,+), C(-,-), and D(+,-), respectively, and the total number of points.
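The quadrant ratios can be sketched as follows, plotting each RR difference against the next one. Points lying exactly on an axis are not assigned to any quadrant here, which is an assumption; the function name is also hypothetical.

```python
import numpy as np

def quadrant_ratios(rr_ms):
    """Share of differential Poincare points in quadrants A, B, C and D."""
    d = np.diff(np.asarray(rr_ms, dtype=float))
    x, y = d[:-1], d[1:]              # (current difference, next difference)
    n = len(x)
    return {
        "Pa": np.sum((x > 0) & (y > 0)) / n,   # A(+,+)
        "Pb": np.sum((x < 0) & (y > 0)) / n,   # B(-,+)
        "Pc": np.sum((x < 0) & (y < 0)) / n,   # C(-,-)
        "Pd": np.sum((x > 0) & (y < 0)) / n,   # D(+,-)
    }
```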

F29-Distribution Entropy (ED):
The distribution entropy reflects the distribution of the scatter points in the differential Poincaré scatter plot at different distances from the origin.
F30-ED1, F31-ED2, F32-ED3 and F33-ED4 are, respectively, the distribution entropies of the four quadrants in the Poincaré scatter plot (including the points on the coordinate axes), known as the quadrant distribution entropies.
Deceleration capacity (DC) of heart rate: F34-DC is determined by following the calculation procedure in the literature [15].

The phase-rectified signal averaging (PRSA) signal X(i) is obtained by averaging the signals within the aligned segments; i.e., X(0) is the average of the RR intervals at all anchors, X(1) and X(-1) are the averages of the RR intervals immediately following and preceding the anchors, and so on. This method of quantification corresponds to the quantification of X by a Haar wavelet analysis, for which a scale of 2 is used.
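A minimal sketch of the deceleration capacity is shown below, using the commonly cited form DC = [X(0) + X(1) - X(-1) - X(-2)] / 4, which matches a Haar wavelet of scale 2. Anchors are taken as RR intervals longer than the preceding one, and increases above 5% are excluded as likely artefacts; that threshold and the function name are assumptions, and the exact procedure of [15] may differ in detail.

```python
import numpy as np

def deceleration_capacity(rr_ms, max_change=0.05):
    """Deceleration capacity (ms) from PRSA with deceleration anchors."""
    rr = np.asarray(rr_ms, dtype=float)
    idx = np.arange(2, len(rr) - 1)            # need rr[i-2] ... rr[i+1]
    ratio = rr[idx] / rr[idx - 1]
    # Anchors: intervals longer than the previous one, within the change limit.
    anchors = idx[(ratio > 1.0) & (ratio <= 1.0 + max_change)]
    if len(anchors) == 0:
        return float("nan")
    # PRSA signal X(k): average of the RR intervals k beats from every anchor.
    x = {k: rr[anchors + k].mean() for k in (-2, -1, 0, 1)}
    return (x[0] + x[1] - x[-1] - x[-2]) / 4.0
```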
Normalization: In this paper, the range scaling method was used to bring the values of each feature to a common range.
This can accelerate the convergence of SVM and BPN and improve the classification accuracy while preserving the original distribution characteristics of the data.
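Range scaling can be sketched as below, assuming a target range of [0, 1] (the paper does not state the range) and fitting the minimum and maximum on the training set only so that the test set is scaled with the same parameters. The function names are hypothetical.

```python
import numpy as np

def fit_range(X_train):
    """Return the per-feature minimum and span of the training matrix."""
    X_train = np.asarray(X_train, dtype=float)
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return lo, span

def apply_range(X, lo, span):
    """Scale each feature column with the parameters from fit_range."""
    return (np.asarray(X, dtype=float) - lo) / span
```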
Feature selection: Feature selection improves the classification efficiency and accuracy by selecting from the original feature set a subset that carries the important information, thereby excluding noisy, redundant and irrelevant features. In this study, two standard types of feature selection methods are used, the Filter and Wrapper methods. Further, to evaluate the best feature combination, the authors use histogram analysis to select features manually based on the 32 RR-interval ECG records.

T-test analysis:
The filter method is a relatively simple type of feature selection. Each feature is evaluated and ranked individually, and the features are selected according to their rank. In this work, the ranking is based on the two-sample T-test statistic, in which $\bar{X}_1$ and $\bar{X}_2$ are the average values of the two samples.
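The filter step can be sketched as a ranking of features by the absolute value of a two-sample t statistic. The unequal-variance (Welch) form is used here as an assumption, since the exact formula of the paper is not recoverable from the text; the function names are hypothetical.

```python
import numpy as np

def t_scores(X_a, X_b):
    """Absolute Welch t statistic of every feature column for two groups."""
    X_a = np.asarray(X_a, dtype=float)
    X_b = np.asarray(X_b, dtype=float)
    mean_diff = X_a.mean(axis=0) - X_b.mean(axis=0)
    se = np.sqrt(X_a.var(axis=0, ddof=1) / len(X_a)
                 + X_b.var(axis=0, ddof=1) / len(X_b))
    return np.abs(mean_diff) / se

def rank_features(X_a, X_b):
    """Feature indices ordered from most to least discriminative."""
    return np.argsort(-t_scores(X_a, X_b))
```

The top-ranked indices can then be used to select the columns passed to the classifier.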

SVM-RFE:
A wrapper feature selection method uses the performance of a classifier model to evaluate candidate feature subsets and to guide the search through them. During the search, the method must train the classification model on the training samples to judge whether a subset is good or bad. The process generally requires many such calls to give an accurate evaluation, so the selected feature subset tends to be well adapted to the subsequent classifier. In this work, the wrapper method used is Support Vector Machine-Recursive Feature Elimination (SVM-RFE) [25].

Histogram-manual feature selection:
A third feature selection method is manual feature selection by a histogram analysis, as shown in Figure 2. The feature distributions of the two groups are illustrated in each histogram. We chose the top 10 features that, according to our experience, show a visible delineation of the two groups in the histogram:
Automated feature selection: There are large differences in the patterns of the features extracted from ECG records. There may be an overlap between the classes in the feature space, as Figure 2 shows. Under such a condition, a feature transformation that minimizes the within-class scatter and maximizes the between-class scatter is very useful. In this work, two traditional methods are used to reduce the dimension of the features: PCA and LDA.

Principal component analysis (PCA):
PCA is an unsupervised multivariate analysis method. The original feature space is projected into a new feature space through a series of linear algebraic operations based on the covariance matrix of the original feature set. The new features are uncorrelated with each other while still reflecting the differences present in the original features. A larger variance along a new feature indicates that it carries more of these differences. The dimensionality reduction of principal component analysis discards the secondary components, which contain little information, and retains the main components, which contain most of the information of the original feature set.
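The projection described above can be sketched through an eigendecomposition of the covariance matrix. The number of retained components is a free parameter here because the paper does not state how many were kept; the function name is hypothetical.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre every feature
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # directions of largest variance
    explained = eigvals[order] / eigvals.sum()   # share of variance retained
    return Xc @ components, explained
```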

Linear discriminant analysis (LDA):
The LDA assumes that intra-class variations lie in a linear subspace of the original feature space, so that the classes are convex and linearly separable. A linear projection can then be used to reduce the dimensionality. The goal of a linear discriminant analysis is to find the projection direction that maximizes the ratio of the between-class dispersion to the intra-class dispersion. That is, the between-class dispersion is made as large as possible and the intra-class dispersion as small as possible, which can improve the pattern classification performance.
Arrhythmia classifiers: Many different types of machine learning algorithms have been employed to detect arrhythmia automatically. Of them, SVM and Artificial Neural Networks have been shown to be effective classification schemes for arrhythmia detection. To find an easy and fast detection algorithm, SVM and BPN are used in this work.
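For the two-class case considered in this work, the direction that maximizes the ratio of between-class to intra-class dispersion has a closed form, sketched below. The use of a pseudo-inverse for numerical safety and the function name are assumptions.

```python
import numpy as np

def fisher_direction(X_a, X_b):
    """Unit vector maximizing between-class over within-class scatter."""
    X_a = np.asarray(X_a, dtype=float)
    X_b = np.asarray(X_b, dtype=float)
    mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Within-class scatter matrix, summed over both classes.
    s_w = (X_a - mu_a).T @ (X_a - mu_a) + (X_b - mu_b).T @ (X_b - mu_b)
    w = np.linalg.pinv(s_w) @ (mu_a - mu_b)
    return w / np.linalg.norm(w)
```

Projecting the samples with `X @ w` gives a single feature on which the two groups are separated as far as a linear projection allows.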

Support vector machine (SVM):
The SVM maps the training sample points of the low-dimensional space into a corresponding high-dimensional space through an appropriate kernel function, so the original linearly inseparable sample points are linearly separable in the high-dimensional space. Then, a hyper-plane is constructed in this high-dimensional space, so as many of the two types of data points as possible are correctly separated. The hyper-plane that shows the largest distance from the nearest data point to the classification surface is the optimal hyper-plane, which is the plane that SVM seeks, and it can ultimately be transformed into a convex quadratic programming problem.
Based on previous experience in the design of ECG signal classification systems, this work chose the RBF kernel function $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$. A grid search is used to determine the penalty factor C and the kernel parameter γ.
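The RBF kernel in its usual form, K(x, y) = exp(-γ‖x - y‖²), can be evaluated for whole feature matrices as sketched below. The function name is hypothetical, and no particular value of γ is implied; in practice γ and C would be chosen by the grid search mentioned above.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared Euclidean distance between every row of X and every row of Y.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off
```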

Back propagation neural network (BPN):
The BPN is a supervised learning method for training. First, the network is given a set of initial weights, then a sample is entered and the output is calculated. According to the difference between the actual output and the expected output, the appropriate method is used to change the network weight, thereby reducing the difference by repeating the calculation until the difference is less than a predetermined value. A BP neural network consists of an input layer, a hidden layer and an output layer, of which the hidden layer can be extended to multi-layers. This study uses a single hidden layer network.
Performance evaluation: To assess the performance of the classifiers, each classification algorithm was run 10 times with 8-fold cross validation. Then, a confusion matrix (Table 2) containing the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts was used to calculate the following performance parameters, with an abnormal heart rate taken as the positive class:
Sensitivity: Sensitivity refers to the ability of the classifier to correctly identify subjects with an abnormal heart rate.

TPR=TP/(TP+FN)
Specificity: Specificity refers to the ability of the classifier to correctly identify subjects with a normal heart rate.

TNR=TN/(TN+FP)
Positive Predictive Value (PPV): The proportion of subjects identified by the classifier as having an abnormal heart rate whose actual heart rate is abnormal.

PPV=TP/(TP+FP)
Accuracy: The proportion of correct predictions of the classifier among the total number of tests.

Accuracy=(TP+TN)/(TP+TN+FN+FP)
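The four formulas above translate directly into code. The helper below is a small sketch with a hypothetical name, taking the four confusion matrix counts as input.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and accuracy from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
    }
```

For example, 45 true positives, 5 false negatives, 48 true negatives and 2 false positives give a sensitivity of 0.90, a specificity of 0.96 and an accuracy of 0.93.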
Computational time: The computational time was measured with a stopwatch timer on an ordinary PC with a 2.4 GHz CPU and 2.0 GB of RAM.

Results
The classification performance based on 30-minute ECG recordings: In this study, 96 ECG recordings of 30 minutes each are used (normal:arrhythmia = 1:1); 80% form the training set and the rest form the testing set. The results of 10 repetitions of 8-fold cross validation are shown in Tables 3 and 4. Table 3 shows the arrhythmia classification performance of the SVM for the different input feature combinations obtained by the different methods, based on the 30-minute ECG recordings. The classification accuracies on the training set are similar, and all values are greater than 98.8%. The accuracy on the test set ranges from 95.6 ± 6.5% (SVM-RFE with LDA) to 98.1 ± 4.2% (34 original features). In this case, in terms of the computation time, the PCA method is more effective than the T-test with PCA, even though the accuracy is almost the same (97.3 ± 4.4% by T-test with PCA and 97.1 ± 4.8% by PCA alone).

The classification performance based on 5 minute ECG recordings
During data pre-processing, the 30-minute recordings are segmented into 5-minute recordings, producing 288 (48 recordings × 6 segments) 5-minute samples for each of the normal and arrhythmia groups. Of the total of 576 samples, 80% were used for training and the remaining 20% for testing, and 8-fold cross validation was carried out 10 times.
Based on the analysis of the 30-minute recordings, the authors chose the 4 best combinations of algorithms (SVM with 34 original features, SVM with PCA, BPN with 34 original features and BPN with T-test) to analyse the 5-minute recordings. The comparison results are shown in Figure 3. After increasing the number of samples from 48 to 288 for each group, the best classification accuracy improved from 98.3% (30-minute recordings, T-test feature selection, classified by BPN) to 99.6% (5-minute recordings with the original 34 features). Each of the four combinations of algorithms produces a classification accuracy greater than 99.0%.

The classification performance of 32 RR intervals ECG recordings
To achieve a good solution for rapid arrhythmia detection, the third part of this work is a systematic analysis of the classification performance based on samples of 32 RR-intervals of the HRV signal, with 2000 samples per group. In total, 4000 ECG recordings were used (normal:arrhythmia = 1:1); 80% formed the training set and the rest formed the testing set, and 8-fold cross validation was carried out 10 times.
Because this is a systematic analysis of arrhythmia detection, two feature combinations are added: the Manual (10) and Manual (16) features, selected by a histogram analysis guided by experience. In total, there are 11 input feature combinations for both SVM and BPN. The results are illustrated in Table 5. The SVM classifier requires more than 10 times as much computation time for training as the BPN classifier, whereas the testing computation time of BPN is approximately twice that of SVM. The fastest computational time is 7.0 ms, achieved by the Manual (16) features classified by SVM, with a classification accuracy of 97.2 ± 0.8%. The best classification accuracy is 97.9 ± 0.7%, achieved by the 34 features using BPN, which took 22.0 ms.
Based on the 32 RR-interval ECG recordings, many feature combinations failed to produce classification results. When BPN is used as the classifier, 3 input feature combinations fail: SVM-RFE, SVM-RFE with PCA, and SVM-RFE with LDA. When SVM is used as the classifier, 6 feature combinations fail: T-test, SVM-RFE, T-test with PCA, T-test with LDA, SVM-RFE with PCA, and SVM-RFE with LDA. Meanwhile, the manual feature combinations achieve a classification performance similar to that of the original 34-feature combination. For SVM, the manual feature combinations even reduce the computation cost during both training and testing.

The comparison between this work with others in the literature
Several researchers have studied arrhythmia detection using ECG signals directly or by analysing HRV signals. Table 6 summarizes the comparison between this work and the two-group (arrhythmia and normal) classification results proposed in the literature. To find better features for portable ECG equipment, a two-group classification is considered in this study. In the literature, the works of Elif [7], Jenerfar [9], and Nuryani [10] all focus on QRS wave analysis based on HRV signals of different durations. In this work, the best two-group classification performance is 99.6%, obtained with the 34 original features on the 5-minute ECG records. Compared with the manually selected 16-feature combination, the manually selected 10-feature combination performs a little better whether it is used with SVM or BPN as the classifier. The manually selected 10-feature combination achieves a classification performance (97.4%) similar to that of the 34 original features (97.9%) with a fast testing time (about 10.0 ms) on the 32 RR-interval ECG records.

Discussion
To suit the online screening of arrhythmia, a simple and rapid algorithm is required. The aim of this work is to find better combinations of features for distinguishing abnormal from normal ECG records [26]. Tsipouras et al. [27] calculated different characteristics of the HRV signals to serve as the classification features. Compared with morphological classification, a classification based on an analysis of HRV features appears to be effective for noise suppression, even for very short recordings of raw ECG data (i.e., 32 RR intervals). Such features were investigated by Babak et al. [13], who classified 6 subclasses of arrhythmia with an accuracy of approximately 99.6%. Compared with the 15 features used in Babak's work [13], this study increased the number of features to 34, covering linear features, nonlinear features and the DC of the heart rate, which provide useful information about the ECG signal for detecting arrhythmia. The 34 original features employed in this work are effective and stable as the input for detecting arrhythmia in ECG recordings of 3 different lengths.
During this systematic comparison, both the classification accuracy and the computational time are considered. For both the SVM and BPN, the testing computational time is only several milliseconds on short ECG records. When some features are distorted, BPN is more stable than SVM. With the same input feature combination, BPN requires a shorter training computational time than SVM does.
To find the best and simplest feature combination, the influence of different feature combinations was investigated, together with the computational time for both the training and testing sets. When an ECG recording is short (32 RR-intervals), some feature combinations fail to separate the two groups. The combination of 10 features manually selected by histogram analysis (F2-RMSSD, F3-SDNN, F5-CV, F7-TINN, F9-HF, F13-SampEn, F16-SD1/SD2, F19-VAI, F29-ED, and F34-DC) achieves the optimal classification performance, similar to that of the 34 original features. The additional 6 features (F11-LF/HF, F14-DFA, F18-VLI, F20-CCM, F26-Pb, and F28-Pd) do not noticeably improve the classification performance. These manually selected features might be the most important features for distinguishing an arrhythmia from a normal ECG. They include linear features, nonlinear features and the DC of the heart rate. This combination of 10 manually selected features may be used in the future to detect arrhythmia faster with fewer features.

Conclusion
A systematic performance comparison of combinations of 11 input feature sets selected by different methods and 2 classifiers (SVM and BPN) was performed. BPN requires a shorter training computational time than SVM but offers a similar classification performance. On raw ECG recordings of different durations, the 34 original features proposed in this paper had a very stable classification performance, and the best performance of 99.6% was achieved by BPN using the 5-minute ECG recordings. We found that the combination of 10 manually selected features (RMSSD, SDNN, CV, TINN, HF, SampEn, SD1/SD2, VAI, ED, and DC) provides relevant information for detecting arrhythmia and can achieve a classification performance similar to that of the 34 original features on the 32 RR-interval ECG recordings. It requires several milliseconds of testing computational time and several seconds of training computational time. These 10 manually selected features, used with the SVM classifier, are expected to be suitable for portable ECG monitoring equipment.