ECG Classification Using Deep CNN Improved by Wavelet Transform

Atrial fibrillation is the most common persistent form of arrhythmia. A method based on the wavelet transform combined with a deep convolutional neural network is applied to the automatic classification of electrocardiograms. Since the ECG signal is easily corrupted by interference, the signal is decomposed by a wavelet function into 9 sub-signals at different frequency scales, and wavelet reconstruction is then carried out after segmented filtering to eliminate the influence of noise. A 24-layer convolutional neural network is used to extract hierarchical features with convolution kernels of different sizes, and finally a softmax classifier is used for classification. This method is applied to the ECG data set provided by the 2017 PhysioNet/CinC Challenge. After cross validation, it achieves 87.1% accuracy and an F1 score of 86.46%. Compared with existing classification methods, the proposed algorithm has higher accuracy and better generalization ability for ECG signal classification.


Introduction
The most common manifestation of heart disease in the clinic is persistent arrhythmia, and atrial fibrillation (AF) occurs most frequently in heart disease [Berenfeld and Jalife (2011)]. The main hazard of atrial fibrillation is the increased risk of vascular embolism, which is one of the main causes of ischemic stroke [Nielsen and Chao (2015)]. Atrial fibrillation manifests as the disappearance of the sinus P wave in each lead; the shape and amplitude of the QRS complex are basically the same as in sinus rhythm, while the R-R interval is absolutely irregular [Lip and Tse (2007)]. Automatic analysis and classification systems for electrocardiograms can greatly help doctors diagnose heart disease and are of great significance in improving medical efficiency, reducing medical costs, and preventing heart disease. In recent years, extensive research has been carried out worldwide on the automatic identification and classification of ECG signals. Machine learning is an important method for solving artificial intelligence problems. Support vector machines (SVMs) are used to perform multiclassification tasks. The main idea is to maximize the margin between support vectors, so as to find the largest-margin hyperplane in the feature space for classification [Manavalan and Lee (2017); Peng, Zhang, Zhao et al. (2012)] and realize an efficient "inference transformation" from training samples to prediction samples [Shah, Oehmen and Webb-Robertson (2008)]. Kumar proposed an automatic ECG classification method based on the wavelet transform [Kumar, Pachori and Acharya (2017)], which performs beat segmentation on ECG signals. A wavelet transform is applied to each ECG beat [Cvetkovic, Übeyli and Cosic (2008)], which is finally classified by a support vector machine [Peng, Wu and Jiang (2010)].
Ramírez analyzed the electrocardiographic records of 597 patients with chronic heart failure and sinus rhythm [Burattini, Zareba and Moss (1999)], and finally used a support vector machine to divide the patients into three groups [Ramírez, Monasterio, Mincholé et al. (2015)], achieving good results. With the continuous application and development of deep learning in various fields in recent years, the advantages of neural networks for feature extraction have gradually become obvious [Zhao and Du (2016)]. Deep learning usually combines simple models and transfers data from one layer to another to build more complex models [Kim (2014)]. Convolutional neural networks (CNNs) have been widely used in the image field, driving their continued development [Zhang, Lu, Ou et al. (2019)]. CNNs are especially suitable for discovering patterns in images [Palaz, Collobert and Doss (2013)]: they learn directly from image data, use the discovered patterns to classify images, and have shown good results in various applications. Hannun et al. [Hannun, Rajpurkar, Haghpanahi et al. (2019)] developed a deep neural network (DNN) that classifies 12 rhythm classes in single-lead ECG signals with a classification accuracy of 83.7%, better than the 78% achieved by human cardiologists. ECG recordings inevitably contain a large amount of noise, such as myoelectric interference and machine interference, among others [Manikandan and Soman (2012)]. Ari et al. [Ari, Das and Chacko (2013)] proposed using the S-transform to filter electrocardiogram data; experimental comparison showed that its noise-filtering effect is better than that of threshold-based wavelet transforms. Wu decomposed the ECG signal through the wavelet transform and combined least-mean-square adaptive filtering with spatially selective noise filtering to remove noise and obtain stable ECG signals [Wu, Shen, Zhou et al. (2013)].
In the above two methods, while filtering out noise, singular points in the ECG signal are treated as noise and filtered out together. In this paper, the wavelet transform is used for data preprocessing: by performing a stationary wavelet transform (SWT) on the ECG signal, the original signal is decomposed and reconstructed to eliminate the effects of noise interference. For feature extraction, a deep convolutional neural network (DCNN) is used, in which each layer detects different characteristics of the signal. Convolutional filters are applied to the training data at different resolutions to extract deep ECG features, and the method shows good performance on ECG signal classification.

Related work

Data introduction
In this paper, we used the ECG data set provided by the 2017 PhysioNet/CinC Challenge [Clifford, Liu, Moody et al. (2017)]. The data set provides 8,528 single-lead ECG records, including normal sinus rhythm, atrial fibrillation, other rhythms, and noise. The length of the ECG records ranges from 9 s to 60 s, with an average length of about 30 s. The sampling frequency of each record is 300 Hz, so the data dimension ranges from 2,700 to 18,000 samples.

Convolutional neural network
CNNs usually consist of multiple convolutional layers and pooling layers that perform feature extraction [Sainath, Mohamed, Kingsbury et al. (2013); Hong, Zheng, Xia et al. (2019)]. Compared with ordinary neural networks, the neurons of a convolutional neural network are connected locally: only adjacent neurons are connected. Another major advantage of CNNs is that neurons on the same feature plane share weights, which greatly reduces the amount of computation and the number of connections between network layers. The feature dimension reduction performed by the pooling layer effectively removes redundant information, prevents overfitting to a certain extent, and makes the network easier to optimize; it greatly simplifies the complexity of the model and reduces its parameters. Convolutional neural networks generally include the following five parts.
(1) Input layer: receives the raw input data. (2) Convolution layer: convolution kernels are used for feature extraction and feature mapping; different data features are extracted through multiple convolution kernels. (3) Pooling layer (max pooling): reduces the dimensionality of the data features, compresses the number of data and parameters, performs downsampling, refines the feature map, reduces the number of operations, and reduces overfitting. (4) Fully-connected layer: integrates the highly abstract features obtained after multiple convolutions, normalizes them, and sends them to the classifier. (5) Output layer: used to output the results. Eqs. (1) and (2) give the input and output of each convolutional layer: y = conv(w, x) + b, where conv denotes the convolution operation, w is the convolution kernel matrix, x is the input matrix, and b is the bias; each convolutional layer has its own weight matrix w, and w, x, and y are all in matrix form. The last fully-connected layer is denoted the L-th layer and outputs an N-dimensional vector y. Let the expected output be h. The total error is then E = (1/2) ||y - h||_2^2, where y and h are the vectors of the network output and the desired output, and ||x||_2 denotes the 2-norm of the vector x.
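As an illustration (not the authors' code), the per-layer computation y = conv(w, x) + b and the squared-error objective above can be sketched in NumPy; the cross-correlation form common in deep learning frameworks is used here:

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution y = conv(w, x) + b (cross-correlation form)."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n)]) + b

def total_error(y, h):
    """Squared-error objective E = 0.5 * ||y - h||_2^2."""
    return 0.5 * np.sum((y - h) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy input
w = np.array([1.0, -1.0])            # toy difference kernel
y = conv1d(x, w, b=0.0)              # → [-1., -1., -1.]
```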

Classifier
The learning strategy of an SVM is margin maximization: the support vectors define the classification margin, and they are used to find the optimal separating hyperplane that achieves the classification task. Without loss of generality, given a training sample set for a linear binary classification problem, the hyperplane equation is generally w^T x + b = 0, where w is the weight vector and b is the bias term. After normalizing the hyperplane, the margin equals 2/||w||. Solving for the optimal separating hyperplane is equivalent to minimizing ||w||, that is, solving the constrained extremum problem of formula (5); the Lagrange multiplier method then yields the optimal classification function, as shown in formula (6), where the Lagrange multipliers appear. When solving for the support vectors, quadratic programming over an m-th order matrix is often required. Since data sets in machine learning are generally large, this matrix computation consumes a lot of time. For multi-classification tasks, the huge amount of computation greatly reduces the classification efficiency and accuracy of the algorithm and places high demands on computer hardware. The advantages of deep learning are gradually being reflected in applications: neural networks are used for feature extraction, and pairing them with a softmax classifier shows excellent performance on multiclassification problems. Softmax maps each output to the (0, 1) interval, which can be interpreted as a probability; for multiple classes, the output for class j is proportional to exp(θ_j^T x), normalized so that all outputs sum to exactly 1.
Here θ_j^T x represents the score of the input for class j; training amounts to approximating the best parameters θ.
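The softmax mapping described above can be sketched as follows (a minimal NumPy version, with the standard max-shift for numerical stability):

```python
import numpy as np

def softmax(z):
    """Map a score vector to probabilities in (0, 1) that sum to 1."""
    z = z - np.max(z)        # shift for numerical stability; output unchanged
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# every output lies in (0, 1), and the outputs sum to exactly 1
```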

Method
In this paper, we propose an automatic classification algorithm for ECG signals: the wavelet transform is used to filter the ECG signal, and a DCNN is used for feature extraction. The method achieves excellent performance on the ECG dataset provided by the 2017 PhysioNet/CinC Challenge. Given that the ECG signal is weak and insufficient extraction of hierarchical features limits classification accuracy, improvements are made mainly in the following aspects. First, the ECG signal is filtered: the wavelet transform localizes the analysis in both time and frequency, and its scaling and translation operations gradually refine the signal at multiple scales, adapting to the needs of time-frequency analysis while effectively retaining the signal's characteristic values, so the denoising effect is better. Second, considering the weak ECG signal and the long time series, a deeper convolutional neural network is designed to better extract the hierarchical characteristics of ECG signals. Because ECG data differ from image data, we designed large one-dimensional convolution kernels to increase the receptive field when extracting ECG features. Third, since the data dimension is relatively high, there is a trade-off between convergence quality and convergence speed. We use RAdam as the optimizer, which converges quickly while not easily falling into a local optimum, and whose convergence is insensitive to the initial learning rate; this improves efficiency and helps optimize the classification result.

Data preprocessing
Since the ECG signal is very susceptible to interference, even signals labeled as normal in the dataset can have serious noise interference at the beginning of the recording. A doctor can discount such interference by reading the entire record, but for a computer, serious noise directly affects the identification and classification of ECG signals. In this paper, the ECG signal is filtered by the wavelet transform in the data preprocessing stage to remove interfering waveforms from the ECG data. The wavelet transform works well for filtering time-sensitive data. We use the wavelet transform to decompose the original ECG signal, setting the number of decomposition levels to 9, so that the original signal is decomposed into wavelet components up to the selected level. After filtering, the signal is reconstructed by the inverse wavelet transform, yielding ECG signals reconstructed at different scales. The wavelet transform filtering process is shown in Fig. 2.
Each level of the decomposition acts as a bandpass filter on the signal, so the wavelet transform is more stable when filtering the original ECG signal. As shown in Fig. 3, a small segment of ECG signal was extracted for wavelet transform processing; comparison shows that the filtered signal is more stable.
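The decompose-filter-reconstruct idea can be sketched as follows. This is a minimal illustration using a hand-rolled Haar wavelet and a crude "zero unwanted detail bands" filter; the paper's actual wavelet basis, 9-level setup, and segmented filtering rules are not reproduced here:

```python
import numpy as np

def haar_decompose(x):
    """One level of the Haar DWT: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse of one Haar level (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def denoise(x, levels=3, keep_details=()):
    """Decompose `levels` times, zero detail bands not in `keep_details`,
    then reconstruct -- a stand-in for the paper's segmented filtering."""
    details = []
    for _ in range(levels):
        x, d = haar_decompose(x)
        details.append(d)
    for i, d in enumerate(details):
        if i not in keep_details:
            details[i] = np.zeros_like(d)
    for d in reversed(details):
        x = haar_reconstruct(x, d)
    return x
```

Keeping all detail bands reproduces the input exactly, which is why a well-chosen band selection removes noise without distorting the retained scales.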

Deep convolutional network design
For the ECG data, this paper designs a DCNN for feature extraction. Compared with a traditional convolutional neural network, convolution kernels larger than those typically used for image features are applied, expanding the receptive field of the kernels to match the temporal characteristics of ECG signals. We designed a network with 24 convolutional layers, using convolution kernels of different sizes and numbers to mine as many data features as possible. The network structure design is shown in Fig. 4. Neurons are connected only to their neighboring upper-layer neurons, and the learned local features are combined to form the final global feature. Pooling keeps the deepened network efficient: it applies a scalar transformation to each local region of the data. In this paper, a pooling layer is added after every two convolutional layers to perform subsampling and ensure the efficiency of the algorithm. To prevent the model from overfitting, we add dropout after each convolutional layer, which randomly sets some activations to 0 and forces the network to explore more ways of classifying the data rather than relying excessively on a few features. When a deep network extracts features, the gradient can vanish because the network is too deep. To prevent this, we use RAdam as the optimizer. RAdam uses a dynamic rectifier to adjust Adam's adaptive learning rate according to the variance of the momentum, effectively providing an automatic warm-up customized to the current data set and ensuring a solid training start. Additionally, we add batch normalization between every two convolutional layers. Batch normalization normalizes the output mean and variance of each layer, using normalization to constrain the input values of the network's intermediate layers.
The normalized output follows a distribution with a mean of 0 and a variance of 1, which avoids the problem of vanishing gradients and permits a larger learning rate, so learning converges faster and training is greatly accelerated. Experimental tests show that setting the number of training steps to 300 can shorten the training time by half.
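The batch normalization step described above can be sketched in NumPy. This is a minimal training-mode version; the learnable scale and shift parameters (gamma, beta) are included for completeness:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch to mean 0 / variance 1,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(32, 8) * 5.0 + 3.0   # skewed, high-variance activations
out = batch_norm(batch)
# out now has per-feature mean ~0 and variance ~1
```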

Dataset
In this paper, we used the ECG data set provided by the 2017 PhysioNet/CinC Challenge. The data distribution is shown in Tab. 1. Each ECG record consists of two parts: an ECG data .mat file and an ECG annotation .hea file. We used 90% of the ECG data for training and the remaining 10% as a test set for cross validation.
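The 90/10 split can be sketched as follows. The record IDs below are placeholders in the challenge's naming style, not the actual file list:

```python
import random

def train_test_split(records, test_frac=0.10, seed=42):
    """Shuffle the record list and hold out `test_frac` of it for testing."""
    rng = random.Random(seed)
    shuffled = records[:]            # copy; keep the caller's list intact
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

records = [f"A{i:05d}" for i in range(1, 8529)]   # 8,528 record IDs
train, test = train_test_split(records)
# 7,676 training records and 852 test records
```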

Evaluation standard
The sampling dimension of this experiment is taken as the average value, and a three-level evaluation scheme is used to assess the classification results. The first-level evaluation uses a confusion matrix (also called an error matrix) to display the classification effect [Ting (2017)]; the classification results are shown in Fig. 5. The second-level evaluation index, accuracy (AC), is used to evaluate the whole model; its formula is given in Eq. (11), and the training and test results are shown in Fig. 6. Accuracy is the ratio of correct classifications made by the automatic ECG classification model to the true results. Here, a true positive (TP) is a sample whose actual value is positive and which the model judges positive; a false negative (FN) is a sample whose actual value is positive but which the model judges negative; a false positive (FP) is a sample whose actual value is negative but which the model judges positive; and a true negative (TN) is a sample whose actual value is negative and which the model judges negative.
The third-level evaluation index, the F1 score, is used to evaluate classification performance. Its calculation formula is given in Eq. (12), and the training and test results are shown in Fig. 7, where P denotes precision and R denotes recall. The F1 score combines the precision and recall results.
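From the confusion-matrix counts above, accuracy and F1 can be computed as follows (binary case for simplicity; Eq. (11) and Eq. (12) are assumed here to be the standard definitions):

```python
def accuracy(tp, fn, fp, tn):
    """AC = (TP + TN) / (TP + FN + FP + TN)."""
    return (tp + tn) / (tp + fn + fp + tn)

def f1_score(tp, fn, fp):
    """F1 = 2*P*R / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

acc = accuracy(tp=80, fn=10, fp=10, tn=100)   # → 0.9
f1 = f1_score(tp=80, fn=10, fp=10)            # → 0.888...
```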

Method comparison
Smisek performed classification in two parts: feature calculation and data classification. For feature calculation, single beats from the single-lead ECG and the entire ECG record are used to synthesize ECG features; classification combines a support vector machine, a decision tree, and threshold-based rules [Smisek, Hejc, Ronzhina et al. (2018)]. Rubin used a signal quality index (SQI) algorithm to assess noise and filter noisy segments accordingly, then used densely connected convolutional neural networks: ECGs of different durations are passed through two convolutional neural network models for feature extraction and classification [Rubin, Parvaneh, Rahman et al. (2018)]. Pyakillya et al. [Pyakillya, Kazachenko and Mikhailovsky (2017)] used a deep learning framework for feature extraction: a one-dimensional convolutional neural network cooperates with fully connected layers to mine data features, and the ECG signals are finally classified. Warrick also used a deep learning framework to extract ECG features, combining convolutional neural networks and LSTMs to mine data features, with dropout and normalization to prevent overfitting and improve the efficiency of the algorithm [Warrick and Homsi (2018)]. Rizwan used sparse coding as an unsupervised feature-extraction tool, processed the features through dimensionality reduction, and ultimately used a decision tree to classify the ECG data [Rizwan, Whitaker and Anderson (2018)]. In this paper, the wavelet transform is used to filter the ECG signal, localizing the analysis in time and frequency; the signal is gradually refined at multiple scales by scaling and translation operations. The signal's characteristic values are effectively retained, a deep convolutional neural network is used to better extract the hierarchical features of ECG signals, and the method finally achieves good performance on the test data set.

Conclusion
This paper proposes an automatic ECG signal classification method based on a deep convolutional neural network, using the wavelet transform for data filtering. A wavelet basis function decomposes the ECG signal into 9 levels of sub-signals according to the sampling frequency; after segmented filtering, wavelet reconstruction is performed. A 24-layer CNN extracts features using convolution kernels of different sizes; dropout is applied as feature information is transmitted, batch normalization is used to prevent overfitting, and finally a softmax classifier performs the classification. The method is validated on the ECG dataset provided by the 2017 PhysioNet/CinC Challenge with an accuracy of 0.871 and an F1 score of 0.8652. The main conclusions of this study are as follows: the wavelet transform can effectively eliminate ECG signal noise, and the 24-layer CNN can extract multilevel features, with enlarged convolution kernels increasing the receptive field and improving the classification performance of the model.