ECG Signal Classification Based on Fusion of Hybrid CNN and Wavelet Features by D-S Evidence Theory

At present, cardiovascular disease is regarded as one of the most dangerous diseases threatening human life. The morbidity and mortality caused by cardiovascular disease increase every year. In this paper, we propose a two-stream approach to electrocardiogram (ECG) classification: one stream for time-domain features and another for frequency-domain features. For the time-domain features, convolutional neural networks (CNN) are constructed for feature learning and classification of ECG signals. For the frequency-domain features, support vector regression (SVR) machines are designed to perform regression prediction on each signal. Finally, D-S evidence theory is adopted to fuse the time-domain and frequency-domain classification results at the decision level. On the MIT-BIH arrhythmia database, the D-S evidence theory recognition system achieves a recognition accuracy of 99.64%. A comparison with various ECG classification methods shows that the proposed model delivers superior performance across all these scenarios.


Introduction
Cardiovascular disease has the highest incidence and mortality among human noncommunicable diseases [1]; it threatens the lives of millions of people. Therefore, it is of great significance to diagnose ECG signals efficiently and accurately. The ECG signal records the rhythm of the heart's activity and is useful for monitoring heart diseases [2]; it contains a wealth of basic physiological information about the heart, so it is often used to detect health disorders. At present, ECG signals are mostly analysed manually, which is very tedious and time-consuming. With the continuous progress of deep learning, the efficiency of ECG analysis and processing has been greatly improved.
There are various methods to automatically detect and classify abnormal ECG signals, such as wavelet analysis [3], deep belief networks (DBN), and support vector machines (SVM) [4]. For the time-domain features, Mehrdad Javadi et al. [5] used the complementary features of Mixture of Experts (ME) and Negatively Correlated Learning (NCL) to classify ECG signals; they obtained a recognition rate of 96.02%. Rajpurkar et al. [6] constructed a CNN model that diagnoses 14 types of arrhythmias and achieved a high accuracy rate. Li et al. [7] employed a 1-dimensional CNN and introduced the SMOTE algorithm to augment the data, achieving 98.12% accuracy. Alif et al. [8] developed a 2D CNN for extracting shape-related features to detect arrhythmia. In their study, the classifier contains six convolutional layers, and the method shows an accuracy of 94.37%. Marinho et al. [9] introduced the Structural Co-Occurrence Matrix (SCM) to extract features for the first time, and it was demonstrated to be promising for ECG classification.
Because these time-domain methods analyse the ECG signal only in the time domain, their results are limited. For the frequency-domain features, Faziludeen et al. [10] used wavelets and SVM for classification and achieved 98% accuracy, but they only identified two types of abnormal ECG signals. Radovan et al. [11] utilized PQ intervals and QRS complexes to extract features and used a genetic algorithm to select features and train an SVM. They obtained a recognition rate of 84%. To reduce the segmentation step, Mondéjar-Guerra et al. [12] constructed multiple SVMs for ECG signal classification, each SVM trained with a different feature. Compared with a single SVM model, their approach offered a satisfactory performance, achieving 94.5% accuracy. The highlights and drawbacks of these papers are shown in Table 1.
D-S evidence theory can fuse the prediction results that multiple classification models produce for the same target, and it is usually applied at the decision layer to perform the final reasoning and decision-making process. Zhang Lizhi et al. [13] combined CNN and D-S evidence theory for gearbox composite fault diagnosis and obtained an accuracy of 84.58%. Moreover, Geng Changxin [14] proposed a fused gearbox fault diagnosis model based on D-S evidence theory, and this method showed 90% accuracy.
Due to the limitations of time-domain signal analysis, some researchers use frequency-domain information to assist the signal analysis process. Compared with the time-domain method, frequency-domain signal processing has many advantages. However, frequency-domain processing alone is not enough to complete signal classification tasks, so time-domain information is added to complement it. In this paper, we propose a novel way to integrate estimates made in the frequency domain (such as from the wavelet packet) and in the time domain. Furthermore, compared with a single classifier, multiple classifier systems (MCS) are more robust and achieve higher classification accuracy [15]. Motivated by the fusion of multiple classifier systems, we propose to fuse the estimate of the CNN classifier with the SVR output obtained from the wavelet packets.
In this paper, we propose a new classification system that uses a 1D CNN and SVR to learn hybrid heartbeat features and combines their outputs by D-S evidence theory, as shown in Figure 1. The CNN has an advantage in high-level feature mining due to its complex network structure. For time-domain features, this paper builds a 1D CNN to classify the ECG signals. For frequency-domain features, wavelet packets and multiple SVR machines are designed for feature learning and classification. Both the 1D CNN and SVR perform relatively well on ECG signals. The innovation of this paper is that D-S evidence theory is used to fuse the recognition results of the 1D CNN and SVR; because the results come from different domains, the fusion helps to improve the accuracy. Compared with a single classifier, this method has a higher classification performance. Finally, we compare our method with previous work, and the results show that the proposed method has better accuracy. The main contributions of the proposed method are as follows: (1) This paper proposes a novel multiclassifier ECG signal recognition framework; specifically, we utilize a 1D CNN to extract time-domain features, while frequency-domain features are extracted by wavelet packets and classified by SVR. Then, we introduce D-S evidence theory to fuse the recognition results of the 1D CNN and SVR, which yields a better classification performance. (2) The introduction of D-S evidence theory compensates for the singleness and incompleteness of feature extraction in a single domain. (3) Since the classification results are derived from both the time domain and the frequency domain, the method in this paper achieves better performance than other methods on the MIT-BIH dataset.
The remainder of this article is organized as follows. Methods are presented in Section 2, followed by the experimental design and classification models in Section 3. Experimental results and discussion of the proposed model are given in Section 4. Finally, this paper gives the conclusion and the future directions.

Dataset.
In this paper, we used the MIT-BIH Arrhythmia Database for experiment and analysis [16]. This database contains 48 records obtained from 47 subjects, each recorded on two leads (lead II (MLII) and lead V1). Lead II was used by default because its QRS complex is more prominent. The ECG labels were annotated independently by multiple cardiologists, which reduces diagnostic error. In this paper, five ECG signal categories are considered; each segment contains one beat type and consists of 250 samples. The created fragments contain 99 samples before the R peak and 150 samples following the R peak. Table 2 shows the details of the ECG signals in this study; typical ECG segments of the A, L, R, V, and N signals are shown in Figure 2.

ECG Signal Preprocessing.
The raw ECG signals obtained from the MIT-BIH dataset contain low-frequency and high-frequency noise, such as machine interference [17] and muscle movement. For the low-frequency noise, both mean filtering and median filtering are candidates; because median filtering generally performs better than mean filtering, this paper uses the median filtering algorithm to remove low-frequency noise, with the median filter window sizes set to 50 and 150 steps [18]. For the high-frequency noise, we eliminate the power frequency interference from the ECG signals with a wavelet transform, applying the soft threshold method to the wavelet coefficients. This effectively removes the high-frequency noise while retaining the original features of the ECG signals. Figure 3 shows the original and reconstructed ECG signal.
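The two preprocessing steps can be illustrated with a minimal NumPy sketch. It assumes that the two median windows (50 and 150 samples) are applied in cascade to estimate the baseline, which the text does not state explicitly, and it shows only the soft-threshold operator that would be applied to the wavelet coefficients; the function names and the threshold value are illustrative and are not taken from the paper.

```python
import numpy as np

def median_filter(x, window):
    """Sliding median with edge padding; `window` is a number of samples."""
    half = window // 2
    padded = np.pad(x, (half, window - 1 - half), mode="edge")
    # Each row of `windows` is one sliding window over the padded signal.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

def remove_baseline(x, first=50, second=150):
    """Estimate the low-frequency baseline with two cascaded median
    filters (an assumption about how the two windows are used) and
    subtract it from the signal."""
    x = np.asarray(x, dtype=float)
    baseline = median_filter(median_filter(x, first), second)
    return x - baseline

def soft_threshold(coeffs, thr):
    """Soft thresholding: shrink every coefficient toward zero by `thr`
    and set coefficients smaller than `thr` in magnitude to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)
```

In a full pipeline, `soft_threshold` would be applied to the detail coefficients of a wavelet decomposition before reconstruction; the choice of threshold is left open here because the paper does not report it.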

R-Wave Detection and Segmentation.
After the noise is removed, the ECG signals need to be segmented by waveform detection. In this paper, we utilize the QRS waves to segment ECG signals. Since the QRS complex is a dominant feature and a crucial part of ECG signals, we use the difference threshold algorithm [19] to detect QRS waves. This algorithm consists of the difference, squaring, and moving-window integration operations, which correspond to formulas (1)-(3) [19], respectively. Finally, the R-wave is extracted at the valid position determined by the dual-threshold method (formulas (4) and (5)) [19].
In formulas (1)-(3), x(n) is the input ECG signal, y(n) is the output signal, n is the point of the difference operation, and N is the width of the moving window. In formulas (4) and (5), TH2 and TH1 are the high threshold value and the low threshold value, respectively, SPKI is the value of the QRS peak, and NPKI is the first value of the noise peak.
Journal of Healthcare Engineering
Figure 1: The framework of the proposed classification model.
Then, the locations of the QRS waves were marked in the original time-domain signal. With the MIT-BIH sampling rate of 360 Hz, the R-wave was used as the benchmark. We take 100 points to the left and 150 points to the right of it; that is, the 250 points contain a complete beat sample. Figure 4 shows an individual heartbeat waveform of the ECG signal sample.
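The detection and segmentation steps can be sketched as follows. This is a simplified illustration and not the exact algorithm of [19]: the difference, squaring, and moving-window integration stages follow the description above, but the two thresholds are taken as fixed fractions of the maximum of the integrated signal (an assumption), and the names `detect_r_peaks`, `segment_beats`, `window_s`, `high`, and `low` are hypothetical. The segment layout uses 99 samples before the R peak, the R peak itself, and 150 samples after it, giving 250 samples per beat.

```python
import numpy as np

def detect_r_peaks(ecg, fs=360, window_s=0.15, high=0.4, low=0.2):
    """Simplified difference / square / moving-window-integration detector
    with a high and a low threshold (hysteresis)."""
    ecg = np.asarray(ecg, dtype=float)
    diff = np.diff(ecg, prepend=ecg[0])           # difference operation
    squared = diff ** 2                           # squaring
    n = max(1, int(round(window_s * fs)))         # window width in samples
    integrated = np.convolve(squared, np.ones(n) / n, mode="same")
    th_high = high * integrated.max()
    th_low = low * integrated.max()
    peaks, i = [], 0
    while i < len(integrated):
        if integrated[i] > th_high:
            # Grow the candidate region while it stays above the low threshold.
            start, end = i, i
            while start > 0 and integrated[start - 1] > th_low:
                start -= 1
            while end + 1 < len(integrated) and integrated[end + 1] > th_low:
                end += 1
            # The R peak is taken as the largest sample inside the region.
            peaks.append(start + int(np.argmax(ecg[start:end + 1])))
            i = end + 1
        else:
            i += 1
    return np.array(peaks, dtype=int)

def segment_beats(ecg, r_peaks, before=99, after=150):
    """Cut one fixed-length beat around every R peak that has enough
    samples on both sides; each beat has before + 1 + after samples."""
    ecg = np.asarray(ecg, dtype=float)
    beats = [ecg[r - before:r + after + 1] for r in r_peaks
             if r - before >= 0 and r + after + 1 <= len(ecg)]
    return np.array(beats)
```

A production detector would normally adapt its thresholds from running estimates of the signal and noise peaks rather than use fixed fractions.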

Feature Extraction.
In the time domain, the 1D CNN model is applied to automatically extract features from the preprocessed ECG signals. After the signals are detected and segmented into individual waveforms, we obtain complete time-domain features, including the P wave, QRS wave, and T wave. The arrhythmia type of each ECG signal in the MIT-BIH database has been labeled by medical experts. Ten-fold cross-validation is employed in this study, and the average result of all 10 folds is taken as the final performance of the system. As shown in Table 2, we choose five types of heartbeat: N, R, L, V, and A; then we randomly select 2000 samples from each type as the sample set, with 90% of them as the training set and 10% as the test set. Then, we use a 1D CNN for classification; it contains four convolution layers, four max-pooling layers, and a fully connected layer.
The wavelet packet analysis technique is utilized in the frequency domain. It is well suited to time-frequency analysis of ECG signals and provides a more detailed method of signal analysis. As shown in Figure 5, the wavelet packet decomposes the high-frequency part more finely than the wavelet transform. Moreover, according to the characteristics of the ECG signals, it can select the corresponding frequency band adaptively and match it to the signal spectrum, so as to improve the time-frequency resolution. In this paper, we utilize Daubechies 6 (db6) [20] as the wavelet basis to decompose the signals at five scales and obtain 32 wavelet packet coefficients, including the high-frequency components of the ECG signals. Then, we extract the variance, mean, and maximum value of each of the 32 coefficients as the transform-domain features of the ECG signals. SVR is trained on these features to classify the five arrhythmia types. The sample set is the same as that of the CNN.
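The feature construction can be sketched as below. To keep the example self-contained in NumPy, the Haar wavelet is substituted for the db6 basis used in the paper, and the 250-sample beat is edge-padded to 256 samples so that it can be halved five times; both choices are assumptions made for illustration only. A five-level packet tree has 2^5 = 32 terminal nodes, and three statistics per node give the 96-dimensional feature vector.

```python
import numpy as np

def haar_wavelet_packet(x, levels=5):
    """Full wavelet packet tree with the Haar filter pair (a stand-in for
    db6). Returns the 2**levels terminal nodes in natural (not
    frequency) order."""
    x = np.asarray(x, dtype=float)
    block = 2 ** levels
    # Pad so that the length is divisible by 2**levels (250 -> 256).
    x = np.pad(x, (0, (-len(x)) % block), mode="edge")
    nodes = [x]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2))  # high-pass branch
        nodes = next_nodes
    return nodes

def packet_features(beat, levels=5):
    """Variance, mean, and maximum of every terminal node, concatenated
    node by node."""
    nodes = haar_wavelet_packet(beat, levels)
    return np.array([stat(n) for n in nodes for stat in (np.var, np.mean, np.max)])
```

With a wavelet library, the same structure applies with db6 filters in place of the Haar pair; only the filtering step inside the loop changes.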

Construction and Fusion of the Classification Model
In this section, we propose a novel framework to automatically recognize five classes of arrhythmias. This hybrid system combines a 1D CNN with wavelet features to meet this requirement. The classification results of these two models are fused by D-S evidence theory.

Classification Model Based on 1D CNN.
CNN is a popular artificial neural network [21]; because of its high learning efficiency, it has been widely used in image recognition with great success [22]. In the 1990s, LeCun et al. [23] designed and trained the classic CNN model LeNet-5. The CNN model is composed of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer [24]. The convolutional and pooling layers are arranged alternately; this structure reduces the number of network parameters and gives the CNN translational and rotational invariance [25].

Convolution Layer.
The convolutional layer consists of multiple feature maps, each feature map is composed of multiple neurons, and each neuron is connected to a local area of the previous feature map through the convolution kernel. The function of the convolution layer is to learn a feature representation of the input data, enhance the information of the original data, and suppress the noise. The convolutional layer reflects the advantages of local connection and weight sharing [26]. In convolution, the network learns features automatically, without manual feature selection [26]. The convolution layer is calculated by the following formula [27]:
$$X_j^l = f\Big(\sum_{i \in M_j} X_i^{l-1} * K_{ij}^l + b_j^l\Big)$$
where $X_j^l$ represents the $j$-th feature map of layer $l$, $K_{ij}^l$ is the convolution kernel, $f$ is the activation function, $b_j^l$ is the bias parameter, and $M_j$ represents the set of selected input feature maps.
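As an illustration of the convolution formula, the following NumPy sketch computes one 1D convolutional layer for a multi-channel input. The ReLU activation, the "valid" boundary handling, and the array layout are assumptions chosen for the example; the paper does not specify them here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_layer(inputs, kernels, biases, activation=relu):
    """inputs:  (C_in, L) feature maps of the previous layer
    kernels: (C_out, C_in, K) one kernel per (output, input) pair
    biases:  (C_out,) one additive bias per output feature map"""
    inputs = np.asarray(inputs, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    c_out, c_in, k = kernels.shape
    out = np.zeros((c_out, inputs.shape[1] - k + 1))
    for j in range(c_out):
        for i in range(c_in):
            # Sum over the input maps; np.convolve flips the kernel, so this
            # is a true convolution, and "valid" keeps full overlaps only.
            out[j] += np.convolve(inputs[i], kernels[j, i], mode="valid")
        out[j] += biases[j]
    return activation(out)
```

Here every output map uses all input maps, that is, the set of selected input maps is taken to be the full set of channels.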

Max-Pooling Layer.
By reducing the resolution of the feature map and the number of parameters, the max-pooling layer obtains spatially invariant features. The max-pooling layer plays an auxiliary role in feature extraction. In this paper, we utilize the max-pooling operation to calculate the maximum value of a nearby set of inputs. It is defined as
$$X_j^l = f\big(\beta_j^l \max(X_j^{l-1}) + b_j^l\big)$$
Here, max is the max sampling function, and the size of the sampling window is n × n. The output feature map is therefore reduced by a factor of n in each dimension. Each output feature map has its own multiplicative bias parameter $\beta$ and additive bias parameter $b$.
Finally, after the alternating convolution and pooling layers, softmax regression is adopted to return the probability that the input data belongs to each category.
In the CNN classifier, the convolutional layers and the max-pooling layers are alternately connected to form a deep neural network, which can efficiently extract the basic features of ECG signals. The ECG signals are one-dimensional discrete data, so it is necessary to build a 1D CNN. Table 3 shows the layer details and the parameters of the 1D CNN.
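For a one-dimensional feature map the sampling window reduces to n consecutive samples. A minimal sketch, assuming non-overlapping windows and discarding any incomplete final window (both assumptions), with the multiplicative bias, additive bias, and activation defaulting to the identity:

```python
import numpy as np

def max_pool1d(feature_map, n, beta=1.0, b=0.0, f=lambda z: z):
    """Non-overlapping max pooling over windows of n samples, followed by
    the multiplicative bias beta, the additive bias b, and activation f."""
    feature_map = np.asarray(feature_map, dtype=float)
    usable = (len(feature_map) // n) * n      # drop an incomplete last window
    pooled = feature_map[:usable].reshape(-1, n).max(axis=1)
    return f(beta * pooled + b)
```

For example, pooling a map of seven samples with n = 2 yields three outputs, one for each complete pair.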

Classification Model Based on SVR.
The SVM technique is a classical machine learning method proposed by Vapnik and co-workers [28]. This technology has been further modified in various fields. As the SVM variant for fitting problems, support vector regression (SVR) predicts the test data by establishing a nonlinear relationship between the support vectors and the predicted vector in the training data; SVR solves the nonlinear fitting problem well by introducing a kernel function in place of explicit computation in the high-dimensional feature space. The regression function in the feature space is expressed as [29]
$$f(x) = w^{T}\phi(x) + b$$
where $\phi(x)$ is the mapping into the feature space, $w$ is the weight parameter, and $b$ is the bias parameter. According to the structural risk minimization principle, and introducing relaxation variables $\xi_i$ and $\xi_i^*$, $w$ and $b$ are estimated by minimizing the following objective function [29]:
$$\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)$$
which is subject to the constraints
$$y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \qquad w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0$$
where $(x_i, y_i)$, $i = 1, \dots, n$, are the training samples, $C$ and $\varepsilon$ are the prescribed parameters, $\frac{1}{2}\|w\|^2$ measures the flatness of the function, and $C$ represents the trade-off between the flatness and the empirical risk. In this paper, the ECG signal is decomposed into five-scale wavelet packets, and we obtain 32 wavelet packet coefficients, which make up the sample space. Besides, we construct five subclassifiers for ECG signal classification. Each classifier is trained to classify one ECG signal type as the positive sample and the rest as negative samples. After the training step, each classifier can output the probability of the classification result of the test sample.
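The one-versus-rest setup and the SVR objective can be illustrated with a short sketch. It assumes regression targets of 1 for the positive class and 0 for the rest, which the paper does not state, and it uses the fact that at the optimum the sum of the two relaxation variables of a sample equals its ε-insensitive loss. All function names are hypothetical.

```python
import numpy as np

def one_vs_rest_targets(labels, classes=("N", "L", "R", "V", "A")):
    """One regression target vector per class: 1.0 for beats of that class
    and 0.0 for all others (assumed target coding)."""
    labels = np.asarray(labels)
    return {c: (labels == c).astype(float) for c in classes}

def epsilon_insensitive_loss(y_true, y_pred, eps):
    """max(0, |y - f(x)| - eps): errors inside the eps-tube cost nothing."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

def svr_objective(w, y_true, y_pred, C, eps):
    """Value of 0.5 * ||w||^2 + C * sum of eps-insensitive losses."""
    w = np.asarray(w, dtype=float)
    return 0.5 * np.dot(w, w) + C * epsilon_insensitive_loss(y_true, y_pred, eps).sum()
```

Each of the five subclassifiers would be fitted to one of the target vectors returned by `one_vs_rest_targets`.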
In this paper, we use SVR for classification. Kernel functions for SVR include the polynomial kernel, the linear kernel, the multilayer perceptron kernel, and the radial basis kernel. The extracted transform-domain features, namely, the 96 wavelet statistical features, are normalized and composed into a sample set. We utilize the radial basis kernel function for classification. This kernel function has two parameters: the kernel parameter g and the penalty parameter c. The optimal values of g and c are obtained by cross-validation, and the results are g = 3.2 and c = 2.9.
Figure 5: The structure diagram of wavelet packet decomposition [20].
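For reference, a radial basis kernel with kernel parameter g can be sketched as below. The form K(x, y) = exp(-g * ||x - y||^2) is an assumption about the convention behind the reported value g = 3.2; it is the form used by common SVM toolboxes, but the paper does not spell it out.

```python
import numpy as np

def rbf_kernel(X, Y, g=3.2):
    """Gram matrix K[i, j] = exp(-g * ||X[i] - Y[j]||^2) for row-wise
    feature vectors X (n, d) and Y (m, d)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-g * np.maximum(sq_dist, 0.0))
```

Because the kernel depends on Euclidean distances, the normalization of the 96 features mentioned above directly affects which value of g works well.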

Principle of Evidence Theory.
D-S evidence theory is an uncertainty inference theory proposed by Dempster in 1967 and further developed by his student Shafer. It can deal effectively with the uncertainty of results and with the fusion of the outputs of multiple classifiers [30]. The basic principle of D-S evidence theory is to define the frame of discernment $\Omega$, which is usually a finite set of mutually exclusive basic propositions or assumptions about the problem. In general, the set of all subsets of $\Omega$ is denoted by the power set $2^{\Omega}$, and the basic probability distribution is defined as follows [31].
Definition 1. (see [31], basic probability distribution). If $m$ is a mapping from $2^{\Omega}$ to $[0, 1]$ on the frame of discernment, where $A$ stands for any subset of $\Omega$, and it satisfies
$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Omega} m(A) = 1$$
then $m$ is the basic probability assignment function (BPAF) on $\Omega$.
Definition 2. (see [31]). The belief (trust) function Bel is usually utilized to define the lower bound of the probability, and the upper bound of the probability is determined by the plausibility (likelihood) function Pls. The belief function (Bel) and plausibility function (Pls) are shown in the following formula:
$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad \mathrm{Pls}(A) = \sum_{B \cap A \neq \emptyset} m(B)$$
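The two definitions translate directly into code when a basic probability assignment is stored as a mapping from subsets of the frame to masses. The sketch below represents subsets as `frozenset` objects; the example masses are invented for illustration.

```python
def belief(m, A):
    """Bel(A): total mass of all focal elements contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    """Pls(A): total mass of all focal elements that intersect A."""
    return sum(v for B, v in m.items() if B & A)

# Illustrative assignment over the frame {N, L, R, V, A}.
omega = frozenset({"N", "L", "R", "V", "A"})
m_example = {
    frozenset({"N"}): 0.6,        # evidence for N alone
    frozenset({"N", "V"}): 0.3,   # evidence that cannot separate N from V
    omega: 0.1,                   # mass left on the whole frame (ignorance)
}
```

For any set A, Bel(A) never exceeds Pls(A), and the gap between them reflects the mass that has not been committed to or against A.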

Evidence Theory Fusion of CNN and SVR Classifier Results.
In this paper, we propose a two-stream approach to electrocardiogram (ECG) classification: one stream for time-domain features (obtained and classified by the CNN) and the other for frequency-domain features (obtained by wavelet packets and classified by SVR). The fusion of the CNN and SVR classifiers is a fusion of two bodies of evidence. The formula of the combination rule [31] is as follows:
$$m(C) = \frac{1}{1 - K} \sum_{A_i \cap B_j = C} m_1(A_i)\, m_2(B_j), \quad C \neq \emptyset, \qquad K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)$$
where $m_1$ and $m_2$ are the probability distribution functions of the predicted results of the CNN and SVR for the same test sample, $A_1 \sim A_5$ and $B_1 \sim B_5$ are the corresponding focal elements, respectively, and $K$ measures the conflict between the two bodies of evidence.
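The combination rule can be sketched in a few lines. The helper `to_mass`, which clips negative scores and normalizes the rest into singleton masses, is an assumption about how classifier outputs are turned into basic probability assignments; the class scores in the usage note are invented for illustration.

```python
from itertools import product

def to_mass(scores):
    """Turn per-class scores into masses on singleton focal elements
    (hypothetical conversion: clip at zero, then normalize)."""
    clipped = {k: max(v, 0.0) for k, v in scores.items()}
    total = sum(clipped.values())
    if total <= 0.0:
        raise ValueError("at least one score must be positive")
    return {frozenset([k]): v / total for k, v in clipped.items() if v > 0.0}

def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal elements,
    collect the conflicting mass K, and renormalize by 1 - K."""
    combined, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb
    if not combined:
        raise ValueError("the two bodies of evidence are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

def decide(mass):
    """Pick the focal element with the largest fused mass."""
    return max(mass, key=mass.get)
```

As a usage example, fusing CNN scores {N: 0.7, V: 0.2, A: 0.1} with SVR scores {N: 0.5, V: 0.4, A: 0.1} gives agreeing products 0.35, 0.08, and 0.01, so the fused mass of N is 0.35 / 0.44, about 0.80, which is higher than either classifier assigned on its own.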

Experimental Setting.
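The experimental software is MATLAB R2018a, and the hardware environment is an Intel i7-10875H processor with 16 GB memory running the Windows 10 operating system. The model is trained for 600 epochs each time, and the batch size is set to 64. We used adaptive moment estimation (Adam) to update the CNN weights, and the initial learning rate was 0.01.
In this paper, 10-fold cross-validation is employed, and the average result of all 10 folds is taken as the final performance. We use the positive predictivity (PPV), sensitivity (SEN), specificity (SPE), and accuracy (Acc) to evaluate the effectiveness of the model. Moreover, the F1-score is used to evaluate the model; the formulas are as follows:
$$\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{SEN} = \frac{TP}{TP + FN}, \qquad \mathrm{SPE} = \frac{TN}{TN + FP}$$
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F1 = \frac{2 \times \mathrm{PPV} \times \mathrm{SEN}}{\mathrm{PPV} + \mathrm{SEN}}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.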
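The per-class metrics can be computed from a multiclass confusion matrix as sketched below. The convention that rows are true classes and columns are predicted classes is an assumption, as is reporting the overall accuracy as the trace divided by the total number of samples.

```python
import numpy as np

def classification_metrics(cm):
    """Per-class PPV, SEN, SPE, Acc, and F1 from a confusion matrix whose
    rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp      # belonging to the class but predicted elsewhere
    tn = cm.sum() - tp - fp - fn
    ppv = tp / (tp + fp)
    sen = tp / (tp + fn)
    return {
        "PPV": ppv,
        "SEN": sen,
        "SPE": tn / (tn + fp),
        "Acc": (tp + tn) / cm.sum(),
        "F1": 2 * ppv * sen / (ppv + sen),
        "overall_acc": tp.sum() / cm.sum(),
    }
```

Under 10-fold cross-validation, these values would be computed on each held-out fold and then averaged.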

Experimental Results and Analysis.
This section mainly compares the performance of the different classifiers. We use Acc as the evaluation indicator to compare the performance of the 1D CNN, the SVR, and their fusion by D-S evidence theory (D-S model). The classification results are presented in Tables 4-6.
We utilize the 1D CNN and SVR to classify five types of ECG signals. The prediction results are shown in Tables 4 and 5.
The results reported in Table 4 show that the 1D CNN model successfully classifies the ECG signals. The F1-scores of N and L are relatively high, reaching 99.34% and 99.65%. The F1-score of A is low, only 97.43%, and the overall Acc of the model is 99.50%. Table 5 shows that, for the SVR model, the F1-scores of R and L are higher than the others. The F1-score of A is relatively low, only 90.12%, and the overall Acc is 96.46%. The results of the 1D CNN and SVR models are fused by D-S evidence theory. Figure 6 shows the final confusion matrix of this classification system. Table 6 shows that the F1-scores of R and L are relatively high, reaching 99.98% and 99.76%. The F1-score of A is low, only 98.20%, and the overall accuracy of the model is 99.64%, an increase of about 0.14 percentage points over the 1D CNN. Table 7 lists the 16 metrics of the classification results, including the PPV, SEN, and SPE of each beat type and the average Acc. The D-S model has the best comprehensive classification ability, yielding the highest score on 11 of the 16 metrics. It achieves the best average F1-score and Acc of 99.3% and 99.6%. From Table 6 and Table 7, it can be deduced that D-S evidence theory improves the classification performance.
As can be seen from Figure 7, the recognition rate of the evidence-based fusion method is better than that of the 1D CNN and SVR classifiers; it has a high and stable recognition accuracy.
In summary, the classification performance of the method is superior to that of the methods listed in Table 8 for the task of ECG signal classification. Our study (marked in bold, Table 8) differs from the other studies in that it utilizes D-S evidence theory to fuse the classification results of the 1D CNN and SVR to obtain the final classification results. The experiment shows that, by introducing D-S evidence theory, the multiclassifier system becomes more robust and the classification accuracy improves effectively. The main advantages of our proposed system are summarized as follows: (1) We utilize a 1D CNN and wavelet packets to extract time-domain and frequency-domain features, respectively. Then, we introduce D-S evidence theory to fuse the recognition results, which obtains better classification performance. The limitations are as follows: (1) The classification of ECG signals needs to be improved, for example by adopting the AAMI classification. (2) This method cannot deal with negative peaks effectively.

Conclusions
This paper proposed an innovative method for ECG classification, which combines a 1D CNN and SVR. The proposed method generalizes well and can effectively deal with classification problems in biomedical applications. The 1D CNN classifier is constructed in the time domain for feature learning. The SVR classifier is constructed in the frequency domain on a five-scale wavelet packet decomposition of the ECG signals, from which we obtained 32 arrays of wavelet packet coefficients to construct the sample space. Finally, we used D-S evidence theory to fuse the predicted results of the 1D CNN and SVR classifiers, which further improved the recognition accuracy. One of the most significant contributions of this study is that we propose D-S evidence fusion to classify ECG signals. To our knowledge, this is the first effort to use D-S evidence fusion for ECG signal detection. The classification accuracy of the proposed model reached 99.64%. In the future, we intend to improve the preprocessing of ECG signals to solve the problem of negative peaks. Moreover, we will try to recognize more types of ECG signals in different datasets.

Data Availability
The MIT-BIH Arrhythmia Database used to support the findings of this study is publicly available and can be downloaded at https://physionet.org/content/mitdb/1.0.0/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.