
ECG classification using 1-D convolutional deep residual neural network

  • Fahad Khan,

    Roles Data curation, Investigation, Methodology, Writing – original draft

    Affiliations School of Automation, Northwestern Polytechnical University, Xi’an, China, Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Pakistan

  • Xiaojun Yu,

    Roles Conceptualization, Formal analysis, Supervision, Validation

    Affiliation School of Automation, Northwestern Polytechnical University, Xi’an, China

  • Zhaohui Yuan,

    Roles Conceptualization, Methodology, Supervision, Validation

    Affiliation School of Automation, Northwestern Polytechnical University, Xi’an, China

  • Atiq ur Rehman

    Roles Methodology, Visualization, Writing – review & editing

    atiq.ur.rehman@mdu.se

    Affiliations Artificial Intelligence and Intelligent Systems Research Group, School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden, Department of Electrical and Computer Engineering, Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur, Pakistan

Abstract

An electrocardiograph (ECG) is widely used in the diagnosis and prediction of cardiovascular diseases (CVDs). Traditional ECG classification methods have complex signal processing phases that lead to expensive designs. This paper provides a deep learning (DL) based system that employs convolutional neural networks (CNNs) for the classification of ECG signals in the PhysioNet MIT-BIH Arrhythmia database. The proposed system implements a 1-D convolutional deep residual neural network (ResNet) model that performs feature extraction directly on the input heartbeats. We use the synthetic minority oversampling technique (SMOTE) to address the class-imbalance problem in the training dataset, so that the model effectively classifies the five heartbeat types in the test dataset. The classifier’s performance is evaluated with ten-fold cross validation (CV) using accuracy, precision, sensitivity, specificity, F1-score, and kappa. We obtained an average accuracy of 98.63%, precision of 92.86%, sensitivity of 92.41%, and specificity of 99.06%. The average F1-score and kappa obtained were 92.63% and 95.5%, respectively. The study shows that the proposed ResNet performs well with deep layers compared to other 1-D CNNs.

Introduction

Cardiovascular diseases (CVDs) are one of the major threats faced by humans [1, 2]. A normal heartbeat depends on factors such as age, body size, diet, emotions, and activity. A heartbeat that is too fast or too slow is medically known as palpitations. Rhythmic irregularities of the heart are commonly known as arrhythmias or cardiac dysrhythmia. Arrhythmia is broadly categorized into two types: non-life-threatening and life-threatening.

ECG is an easy, non-invasive, and highly efficient tool to monitor and identify arrhythmia by measuring the electrical activity of the heart [3]. In arrhythmias, there are three main malfunctions of the heart: the heartbeat becomes slow, i.e., below 60 bpm (bradycardia), fast, i.e., over 100 bpm (tachycardia), or irregular (fibrillation), such that the heart cannot maintain a proper supply of blood to the body. Precise and early-stage detection and classification of ECG signals is essential for the treatment of patients with acute heart conditions [4]. To obtain a proper ECG record for suspected arrhythmias, doctors use Holter and loop recorders for a minimum duration of 24 hours. Later on, the ECG record is analyzed using computer programs to detect the specific type of arrhythmia, which is a time-consuming procedure. Although arrhythmias are divided into non-life-threatening and life-threatening types, they are mostly detrimental. The type of arrhythmia is determined by the ECG signal shape and other morphological factors.

The Association for the Advancement of Medical Instrumentation (AAMI) has categorized non-life-threatening arrhythmia signals into five types: non-ectopic beat (N), supraventricular ectopic beat (SVEB or S), ventricular ectopic beat (VEB or V), fusion (F), and unknown beat (Q). The typical ECG waveform consists of the primary wave groups, namely the P wave, the QRS wave group, and the T wave, which are all represented in a single ECG period. The energy and physiological implications of each waveform differ. The QRS wave group has more energy and amplitude than the P and T waves.

Feature extraction and pattern classification are employed in the classification of signals or images. The easiest way to extract ECG features is to take sampled points directly from a raw ECG signal. However, this yields a large number of features, which severely affects classifier efficiency. Another approach to feature extraction from raw signals is to use morphological and/or statistical techniques. An example is the RR interval measurement, which uses the time between the R peaks of two heartbeats as a feature. Another statistical method for obtaining ECG features is Independent Component Analysis (ICA). In [5], an ECG classifier is trained using morphological and statistical features.

The literature presents numerous techniques for the detection of heartbeat diseases, including threshold-based methods [6], wavelet transform (WT) [7, 8], digital filter-based methods [9], morphology-based methods [10], non-invasive methods [11], and so on. Transform-based feature extraction from ECG signals includes the discrete cosine transform (DCT), continuous wavelet transform (CWT), and discrete wavelet transform (DWT). For instance, Khorrami et al. extracted features for ECG classification using DCT coefficients [12]. Furthermore, they applied CWT and DWT to obtain features for ECG classification, and provided a classification performance comparison among DCT, CWT, and DWT. In [13], wavelet packet decomposition (WPD) is used to obtain ECG features, and classification is performed using wavelet packet entropy (WPE) and random forests (RF).

Machine learning (ML), which is a subset of artificial intelligence, is being utilized for the detection and analysis of different diseases. ML-based diagnostic tools are used for the examination of diseases in the human body [14, 15]. Deep learning (DL), a subset of ML, has been widely used for the diagnosis of ECG signals and other diseases. In bio-informatics, DL techniques are extensively utilized due to their remarkable performance [14]. DL models utilize deep neural networks (DNNs), which are further categorized into CNNs, long short-term memory (LSTM) networks, and recurrent neural networks (RNNs). Among these, CNNs are extensively applied in various fields. CNNs [16] and DNNs [17] have been successfully applied in the classification of ECG signals.

In radiological image analysis, DL approaches have been devised that provide outstanding performance [18, 19]. Recently, CNNs have shown great research potential since they are suitable for multi-dimensional inputs, such as ECG time-series data (1-D) and images (2-D and 3-D) [20]. Beyond their wide utilization in diverse applications [21, 22], CNNs have proved effective in the classification of ECG signals [23, 24]. CNNs implemented for ECG classification provide better accuracy and perform feature extraction directly on the input heartbeats [25, 26].

In [27], an RNN is investigated in which training was performed on extracted features, obtaining an average accuracy of 98.06% for the classification of four different types of arrhythmias. The classification and feature extraction of 1-D ECG is performed in [26], where an adaptive CNN model achieves a classification accuracy of 96.72%. Moreover, their CNN model is generic due to its parameter invariance, making it applicable to any real ECG dataset. The authors in [28] proposed an ECG classification model that comprises a homeomorphically irreducible tree (HIT) pattern feature generator, a Chi2 selector, and an SVM classifier. On a sizable ECG dataset with more than 10,000 12-lead ECGs, their model achieved accuracies of 92.95% and 97.18% for seven- and four-class arrhythmia classification, respectively, which is on a level with DL models. A novel transformer-based DL model is presented in [29] that performed remarkably well on the MIT-BIH arrhythmia and MIT-BIH atrial fibrillation databases. In [30], an ECG classification model is investigated that employs bidirectional long short-term memory networks (Bi-LSTMs) and a generative adversarial network (GAN). As reported therein, their model achieved an overall accuracy of 98.7%.

A deeper 34-layer 1-D CNN model was proposed for the classification of twelve arrhythmia types present in time-series data and obtained an average accuracy of 97.03% [31]. Li presented a 1-D CNN with five layers in addition to the input and output layers for the classification of five typical types of arrhythmias, i.e., normal, left bundle branch block, right bundle branch block, atrial premature contraction, and ventricular premature contraction, achieving an accuracy of 97.5% [32]. A nine-layer 2-D CNN model was applied for the automatic classification of five different heartbeat arrhythmia types, achieving accuracies of 94.03% and 93.47% on original and de-noised heartbeats, respectively [33]. An ECG monitoring system integrated with Impulse Radio Ultra Wideband (IR-UWB) radar using a CNN achieved an accuracy of 88.89% [34].

Motivated by the above developments in ECG classification, we have developed an ECG classification mechanism using a 1-D convolutional deep ResNet. The main contributions of this study are as follows:

  • Development of a model that does not require a separate feature extraction procedure; instead, it employs convolutional and pooling layers in succession for robust feature extraction from the input ECG signals, so that the preprocessed ECG signals are trained on and classified directly.
  • Integration of SMOTE, which generates synthetic minority data samples using the k-nearest neighbor technique, yielding an equal number of samples for all five heartbeat classes for appropriate training of the ResNet model.
  • Design of a deeper ResNet model for ECG classification that performs well as the depth of the network increases, providing high classification accuracy on the real testing ECG dataset.

The rest of the paper is organized as follows. Section II provides materials and methods. In Section III, the proposed ResNet model is presented. The experimental setup and results are discussed in Section IV, and conclusions are highlighted in Section V.

Materials and Methods

The main procedure involved in the classification of ECG is provided in Fig 1.

Fig 1. Main procedure involved in the classification of ECG.

https://doi.org/10.1371/journal.pone.0284791.g001

MIT-BIH dataset

The MIT-BIH databases comprise numerous sub-databases that record specific types of ECG. We have utilized the PhysioNet MIT-BIH Arrhythmia database [35], which is a freely available and widely utilized heartbeat dataset for the performance evaluation of ECG categorization algorithms. This standard database comprises a total of 48 two-channel ECG recordings taken from 47 individuals. The length and sampling rate of each recording are thirty minutes and 360 Hz, respectively.

The MIT-BIH ECG dataset is a collection of processed heartbeat signals, which were manually interpreted and labeled by experts into 15 arrhythmia classes. However, the AAMI standard groups these 15 arrhythmia classes into five types (one normal and four with arrhythmia), which are described in Table 1.

Table 1. MIT-BIH versus AAMI 5 heartbeat classes grouping.

https://doi.org/10.1371/journal.pone.0284791.t001

ECG data preprocessing

The preprocessing of ECG signal is very useful in improving the efficiency of the dataset and enables the extraction of different heartbeats from a particular ECG waveform. The preprocessing stage usually involves the steps of noise removal from ECG waveform and peak detection for segmentation of ECG signal into different heartbeat classes.

In this study, the single heartbeats were obtained from the continuous ECG using the Pan-Tompkins algorithm. Since the QRS complex is the most prominent portion of the ECG, it serves as the foundation for practically all computerized ECG diagnostic methods. Here, detecting every R peak is equivalent to obtaining a single heartbeat. The Pan-Tompkins algorithm uses a number of steps to find R-peaks in the ECG signal, including derivative, squaring, integration, edge recognition, and search approaches for R-peaks. Finally, after detecting the QRS waveform and obtaining the P, R, and T peaks, segmentation of the single heartbeat is completed. The majority of pathological information is contained in each heartbeat, which is used for disease detection. In this paper, a total of 109446 heartbeats with a sampling frequency of 125 Hz, taken from 44 records, are utilized in the training and testing analysis of the proposed 1-D convolutional neural network model.
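The derivative, squaring, and integration stages named above can be illustrated with a short Python sketch. It is a simplified illustration rather than the authors' implementation: the band-pass filtering and adaptive thresholds of the full Pan-Tompkins algorithm are omitted, and the fixed `threshold_ratio`, the 150 ms integration window, and the 200 ms refractory period are assumptions chosen for the example.

```python
def five_point_derivative(x):
    """Five-point derivative used in the Pan-Tompkins QRS detector."""
    d = [0.0] * len(x)
    for n in range(2, len(x) - 2):
        d[n] = (-x[n - 2] - 2 * x[n - 1] + 2 * x[n + 1] + x[n + 2]) / 8.0
    return d


def moving_window_integration(x, window):
    """Average of the last `window` samples at every position."""
    return [sum(x[max(0, n - window + 1):n + 1]) / window for n in range(len(x))]


def qrs_energy(signal, fs):
    """Derivative -> squaring -> moving-window integration."""
    window = max(1, int(0.150 * fs))  # assumed 150 ms window
    squared = [v * v for v in five_point_derivative(signal)]
    return moving_window_integration(squared, window)


def find_r_peaks(signal, fs, threshold_ratio=0.3):
    """Locate R peaks with a fixed threshold (a simplification, see lead-in)."""
    energy = qrs_energy(signal, fs)
    threshold = threshold_ratio * max(energy)
    window = max(1, int(0.150 * fs))
    refractory = int(0.200 * fs)  # assumed 200 ms refractory period
    peaks, n = [], 0
    while n < len(energy):
        if energy[n] > threshold:
            # Search the raw signal around the energy burst for its maximum.
            lo, hi = max(0, n - window), min(len(signal), n + window)
            r = max(range(lo, hi), key=lambda j: signal[j])
            peaks.append(r)
            n = max(n + 1, r + refractory)  # skip the rest of this beat
        else:
            n += 1
    return peaks
```

Each detected R-peak index can then serve as the reference point for cutting a fixed-length heartbeat segment out of the continuous record.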

Data imbalance process using SMOTE

The distribution of heartbeats across the different classes of the MIT-BIH ECG database is not uniform, as shown in Fig 2. According to ANSI-AAMI, approximately 80% of heartbeats belong to the N class, while the remaining 20% are from the V, S, F, and Q classes. Since the number of heartbeat samples in the N class is far higher than in the minority classes, the MIT-BIH database is a highly imbalanced heartbeat dataset. Such a class-imbalanced dataset results in misclassification due to decisions biased in favor of the majority class. There are different methods to solve the issue of class imbalance in datasets, which employ balancing at the algorithm level, at the data level, through cost-sensitive methods, or through integration methods [36]. Data-level approaches are extensively applied because they are algorithm-independent and simple to operate. Their core concept is re-sampling, which includes both oversampling and under-sampling. Random oversampling (ROS) and random under-sampling (RUS) are the most basic re-sampling techniques. Other re-sampling techniques are EasyEnsemble [37], KNNOR [38], and SMOTE [39].

Fig 2. Distribution of heartbeats in different classes of the MIT-BIH ECG database.

https://doi.org/10.1371/journal.pone.0284791.g002

SMOTE is a well-known technique for addressing the class-imbalance problem in a dataset. In the traditional oversampling process, the minority data is merely duplicated from the minority dataset. Although the number of samples increases, such oversampling does not provide any additional knowledge or variation to the classification model.

SMOTE generates synthetic minority data samples using the k-nearest neighbor technique. SMOTE begins by selecting a random sample from the minority class, after which its k nearest neighbors are determined. The synthetic sample $x_{syn}$ is generated by the following relation: $x_{syn} = x_i + (x_j - x_i) \times \delta$ (1), where $x_i$ is the minority-class sample under consideration, $x_j$ is one of the k nearest neighbors of $x_i$, and $\delta$ is a vector whose elements are random values in [0, 1]. Therefore, there are two steps to generate synthetic data samples in SMOTE:

  • Firstly, obtain the difference between the minority class (sample) under consideration and its nearest neighbor. The obtained difference is multiplied by a vector δ.
  • The calculated value is added to the minority class (sample) under consideration to generate xsyn along the line between vectors xi and xj.
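The two steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the Euclidean distance, and the default `k=5` are choices made for the example, and δ is drawn element-wise as described in the text.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def smote_sample(minority, k, rng):
    """Create one synthetic sample from a list of minority-class vectors."""
    x_i = rng.choice(minority)
    # k nearest neighbours of x_i among the remaining minority samples
    others = [x for x in minority if x is not x_i]
    neighbours = sorted(others, key=lambda x: euclidean(x_i, x))[:k]
    x_j = rng.choice(neighbours)
    # Step 1: difference (x_j - x_i) scaled by random values in [0, 1].
    # Step 2: add the scaled difference to x_i.
    delta = [rng.random() for _ in x_i]
    return [a + (b - a) * d for a, b, d in zip(x_i, x_j, delta)]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (k=5 is an assumed default)."""
    rng = random.Random(seed)
    return [smote_sample(minority, k, rng) for _ in range(n_new)]
```

Because every element of δ lies in [0, 1], each coordinate of a synthetic sample falls between the corresponding coordinates of the chosen sample and its neighbor, so the new data stays within the region occupied by the minority class.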

Table 2 provides the class-wise heartbeat counts before and after applying SMOTE to the training dataset. SMOTE balances the data so that all heartbeat classes have an equal number of samples in the training dataset. It is important to note that all four classes other than the majority class (N), i.e., S, V, F, and Q, are oversampled using SMOTE.

Table 2. Total heartbeats in training dataset classes before and after SMOTE.

https://doi.org/10.1371/journal.pone.0284791.t002

Convolution neural network and residual neural networks

CNNs are capable of extracting the most appropriate features from input data by using the convolution operation. Since ECG and electroencephalogram (EEG) signals are time series, 1-D convolution is applied to process them.

In a CNN, the input of each layer is obtained from the output of the preceding layer [40]. The fundamental unit of a CNN comprises an input layer, a convolution layer, an activation function, and an output layer. For the input x, the overall operation can be expressed as: $y = f(W \ast x + b)$ (2), where y represents the output, f denotes the ReLU function, W is the convolution matrix, $\ast$ denotes the convolution operation, and b is the bias.
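As a small worked illustration of this operation for a single 1-D filter, the following Python sketch applies a kernel to a short signal, adds the bias, and passes the result through ReLU. The kernel values, the bias, and the absence of padding are assumptions chosen for the example; as in most DL frameworks, the kernel is slid over the input without being flipped.

```python
def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


def conv1d(x, w, b=0.0, stride=1):
    """1-D convolution without padding: one weighted sum per kernel position."""
    k = len(w)
    return [
        sum(w[i] * x[start + i] for i in range(k)) + b
        for start in range(0, len(x) - k + 1, stride)
    ]


x = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0]   # toy input signal
w = [-1.0, 0.0, 1.0]                  # illustrative kernel weights
y = relu(conv1d(x, w, b=0.5))
print(y)  # [3.5, 0.5, 0.0, 0.0]
```

The two negative pre-activations are clipped to zero by ReLU, which is how the layer keeps only the responses that match the kernel's pattern.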

In complex CNN architectures, numerous other layers are incorporated into the basic CNN structure. These include multiple convolutional layers, pooling (down-sampling) layers, a flatten layer, fully connected (FC) layers, and finally an output layer. A CNN with its sequence of layers from input to output is shown in Fig 3. A brief description of the CNN layers is provided here:

  1. Input layer-It is the first layer of a CNN and contains the input data, which may be time series (1-D) or images (2-D or 3-D).
  2. Convolutional layers-These layers perform convolution operations on the input data to extract significant features.
  3. A convolution layer employs kernels (filters) that have weights and a bias. The kernel's matrix of weights is multiplied with the input data to obtain features. For time-series input data, 1-D convolution is used, while images use 2-D and 3-D convolutions.
  4. Activation function-In a CNN model, each convolution layer is usually followed by an activation function. The Rectified Linear Unit (ReLU) function is a popular choice for most CNN models.
  5. Pooling layers-Because the output of a convolutional layer contains redundancy, extracting relevant features from the input data is difficult. The pooling layer reduces the number of parameters by repeatedly extracting a feature value from a group of cells using some pooling method. A variety of pooling methods is available, including tree pooling, gated pooling, average pooling, minimum pooling, maximum pooling, global average pooling (GAP), and global maximum pooling. The most popular pooling methods are max, min, and GAP. Over-fitting and computational cost are reduced after processing through the pooling layer.
  6. Dropout-The term “dropout” refers to the process of removing units (both hidden and visible) from a neural network to help prevent over-fitting of the underlying model.
  7. Flatten and FC layers-The flatten layer is placed before the FC layers and converts the multidimensional output of the pooling layer into a one-dimensional vector. This vector is used as input for the FC layers, which use the extracted features to perform early classification of the input data.
  8. Softmax/logistic layer-It is connected at the end of the FC layers and is used to finally classify the data into classes. For a binary classification problem, a logistic layer with the sigmoid activation function is used, whereas softmax is used for multi-class classification.
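Two of the layers listed above, pooling and softmax, are simple enough to write out directly. The following Python sketch is illustrative only; the function names and the example values are assumptions, and no padding is applied in the pooling step.

```python
import math


def max_pool1d(x, pool_size, stride):
    """Keep the maximum of each window of `pool_size` samples (no padding)."""
    return [max(x[i:i + pool_size])
            for i in range(0, len(x) - pool_size + 1, stride)]


def global_average_pool(x):
    """Global average pooling (GAP): one value per feature map."""
    return sum(x) / len(x)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 7.0]
print(max_pool1d(feature_map, pool_size=5, stride=2))  # [5.0, 7.0]
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])            # five heartbeat classes
print(probs.index(max(probs)))                         # 0
```

Pooling shortens the seven-sample feature map to two values, and softmax assigns the highest probability to the class with the largest score.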

The structure of a plain CNN is shown in Fig 4(a). Filtering occurs whenever the data passes through the convolution or pooling operations of the previous layers. The overall objective of this processing is to reduce the size of the input vector. In general, it is desirable to decrease the number of network parameters so that over-fitting does not occur. Although the learning potential of a neural network increases as the network deepens, increasing the number of layers may cause gradient dissipation (vanishing) or gradient explosion, which degrades performance and affects convergence [41]. To overcome vanishing and exploding gradients in deeper networks, we employ residual networks (ResNets). ResNets avoid the gradient dissipation and gradient explosion issues in deep networks, thereby enabling improved accuracy and optimised performance.

The ResNet building block is shown in Fig 4(b). The ResNet has input x and target output H(x), with a shortcut (skip) connection structure. With these shortcut connections, the stacked layers learn the residual given by: $F(x) = H(x) - x$ (3). The targeted output H(x), therefore, becomes: $H(x) = F(x) + x$ (4).

In a plain network structure, the input of each layer comes from the output of the previous layer. In a ResNet, the input to a layer depends not only on the output of the previous layer but also on the preceding network structure. As a consequence, sufficient information is extracted from the input features [40]. The shortcut connections bypass two or more layers and directly perform identity mapping. Therefore, such networks avoid the performance degradation and accuracy reduction issues faced in plain networks with many convolution layers.
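The identity mapping performed by the shortcut connection can be shown with a few lines of Python. This is a toy sketch: `residual_block` and the stand-in functions passed as `f` are hypothetical, and in a real network F(x) would be the stacked convolution, activation, and normalization layers.

```python
def residual_block(x, f):
    """Return H(x) = F(x) + x, where `f` stands in for the stacked layers."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same shape to be added")
    return [a + b for a, b in zip(fx, x)]


x = [1.0, 2.0, 3.0]

# If the stacked layers learn F(x) = 0, the block reduces to the identity.
print(residual_block(x, lambda v: [0.0] * len(v)))      # [1.0, 2.0, 3.0]

# Otherwise the layers only need to model the difference from the input.
print(residual_block(x, lambda v: [0.5 * a for a in v]))  # [1.5, 3.0, 4.5]
```

The first call illustrates why added depth need not hurt: a block that contributes nothing simply passes its input through unchanged.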

Proposed ResNet model

In DL, CNNs have gained popularity in recent years due to their outstanding performance in image and speech recognition applications. In a CNN, feature learning is achieved by automatically extracting useful local features from the input data.

Fig 5 shows the architecture of the proposed deep ResNet model. The proposed model has a deep-layer architecture with three residual convolution blocks preceding a classification block. The input layer takes 1-D ECG data with 188 samples. The number of channels is one because the ECG data is taken from a single lead. The proposed model has six convolutional layers and three max-pooling layers, providing robust feature extraction from the input ECG signals. The layer-wise architecture of the proposed model is explained as follows.

The ECG data is passed to a down-sampling block, which consists of a 1-D convolution layer, a batch normalization (BN) layer, and a ReLU activation function. The convolution layer has 64 filters, a kernel size of 3, and a stride of 2. The first residual block consists of a series of two sets of a convolutional layer followed by ReLU as the activation function and BN. The skip connection follows this, and then a ReLU activation function is used to reduce over-fitting. After that, BN is used to accelerate the CNN training process by reducing internal covariate shift. A maximum-pooling (max-pool) layer is then added, which computes the maximum value in each patch of the feature map and thereby reduces the size of the feature map. MaxPool1D with a pool size of 5 and a stride of 2 is used to perform the max-pooling operation on the signal.

The second and third residual blocks have the same structure as the first residual block, i.e., convolutional layer, ReLU, and BN, whose output is added to the output of the down-sampling block through a skip connection; thereafter, ReLU, BN, and max pooling are performed.

Finally, the classification stage has a flatten layer that translates the multi-dimensional information into 1-D information. Following the flatten layer, there are two fully connected (dense) layers with the ReLU function and one dense layer with softmax for classification into the five heartbeat classes.
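To make the effect of the stride and pooling settings concrete, the following Python sketch traces the signal length through the stages described above. Several details are assumptions rather than facts from the text: the padding mode is not stated, so "same" padding is assumed for the convolutions and no padding for the max-pooling layers, and all convolution layers are assumed to use 64 filters (the text gives this number only for the first one).

```python
import math


def conv_same_length(n, stride):
    """Output length of a convolution with 'same' padding (assumed)."""
    return math.ceil(n / stride)


def pool_valid_length(n, pool_size, stride):
    """Output length of a pooling layer without padding (assumed)."""
    return (n - pool_size) // stride + 1


def trace_lengths(n_input=188, filters=64):
    """Follow the sequence length from the input to the flatten layer."""
    trace = [("input", n_input)]
    n = conv_same_length(n_input, stride=2)       # kernel 3, stride 2
    trace.append(("down-sampling conv", n))
    for block in (1, 2, 3):
        # The stride-1 convolutions and the skip addition keep the length;
        # only the max-pool (size 5, stride 2) shortens the sequence.
        n = pool_valid_length(n, pool_size=5, stride=2)
        trace.append((f"residual block {block} + max-pool", n))
    trace.append(("flatten", n * filters))        # assumes 64 filters throughout
    return trace


for name, length in trace_lengths():
    print(f"{name}: {length}")
```

Under these assumptions the length goes 188, 94, 45, 21, 9, so the flatten layer would pass a vector of 576 values to the dense layers; a different padding choice would change these numbers.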

The proposed ResNet model has an advantage over the plain network in that the input to a layer depends not only on the output of the previous layer but also on the preceding network structure. As a result, sufficient useful information is extracted from the input ECG features. Moreover, the shortcut connections in the ResNet architecture bypass two or more layers and directly perform identity mapping. Therefore, the proposed network avoids the performance degradation and accuracy reduction issues faced in plain networks with many convolution layers.

Experimental setup and results

Training and testing

In this work, the real dataset is split into training and testing portions with an 80:20 train-test split. 10-fold cross validation was used during the training process. The training data is further split into 80% actual training data and 20% validation data. The validation data enables monitoring of the training process and prevents the model from over-fitting. Due to the imbalanced nature of the heartbeat classes in the original data, the real training data was oversampled using SMOTE. This data was provided as input to the proposed deep ResNet model. After training the model, testing is conducted using the 20% real test dataset, which was not oversampled by any method. The testing data is, therefore, original data that the model has not seen in its learning or training phase.

The robustness of the suggested model was assessed using the ten-fold cross validation technique [42]. First, using stratified random sampling, the ECG signals are randomly divided into 10 equal parts while maintaining the class label distribution in each fold. The model is trained using 80% of the ECG segments from each fold, with the remaining 20% being utilized to evaluate the performance of the suggested model. To monitor the training process and avoid over-fitting, we further divided the training data into actual training data (90%) and validation data (10%). The procedure is repeated ten times, with each iteration training a fresh model with fresh training and testing data. The validation set’s classification outcomes are used to optimize the model after it has been trained using the training set.
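The stratified division into ten parts can be sketched in plain Python as below. This is a hedged illustration, not the study's code: the function names are hypothetical, and the split generator follows the standard k-fold scheme in which each part serves once as the held-out set, which may differ from the exact train/test proportions described above.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def cross_validation_splits(labels, n_folds=10, seed=0):
    """Yield (train_indices, test_indices); each fold is held out once."""
    folds = stratified_folds(labels, n_folds, seed)
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With this scheme, oversampling such as SMOTE would be applied only to the training indices of each split, so the held-out beats remain original data.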

In this study, four evaluation measures were used: accuracy, precision, sensitivity, and specificity.

  • The proportion of correctly identified instances to the total number of instances is represented by the accuracy (Acc). For the multi-class problem, Acc was calculated as follows: $Acc = \frac{1}{N} \sum_{C=1}^{N} \frac{TP_C + TN_C}{TP_C + TN_C + FP_C + FN_C}$ (5), where TP stands for true positives, FP stands for false positives, TN represents true negatives, FN stands for false negatives, C denotes the class index, and N specifies the total number of classes.
  • Error rate or classification error is the percentage of predictions that were incorrect. It is calculated by (FP + FN)/(TP + TN + FP + FN).
  • The precision (Prec) or positive predictive value (PPV) is the fraction of positive predictions that are actually positive. Prec is calculated as: $Prec = \frac{TP}{TP + FP}$ (6)
  • Sensitivity (Sen) or recall is also called the True Positive Rate (TPR) or probability of detection. Sen is the percentage of all positive samples that the classifier has correctly predicted as positive. Sen is estimated as follows: $Sen = \frac{TP}{TP + FN}$ (7)
  • Specificity (Sp) is also called the True Negative Rate (TNR) or selectivity. Sp is the percentage of all negative samples that the classifier has correctly predicted as negative. Sp is estimated as follows: $Sp = \frac{TN}{TN + FP}$ (8)
  • False Positive Rate (FPR), or Type I error, is the fraction of actual negatives incorrectly predicted as positive. FPR is calculated by: $FPR = \frac{FP}{FP + TN}$ (9)
  • False Negative Rate (FNR), or Type II error, is the fraction of actual positives incorrectly predicted as negative. FNR is calculated by: $FNR = \frac{FN}{FN + TP}$ (10)
  • The F1 score was calculated using the precision (Prec) and sensitivity (Sen) as: $F1 = \frac{2 \times Prec \times Sen}{Prec + Sen}$ (11)
  • The kappa coefficient (K) measures the agreement between predicted and true values. The higher the K value, the better the performance of the classifier, i.e., K = 1 represents perfect agreement and K = 0 represents no agreement. K is computed as follows: $K = \frac{c \times s - \sum_{n=1}^{N} p_n \times t_n}{s^2 - \sum_{n=1}^{N} p_n \times t_n}$ (12), where c is the total number of elements that are correctly predicted, s is the total number of elements, $p_n$ denotes the number of times that class n was predicted (the sum of column n), $t_n$ is the number of times that class n truly occurs (the sum of row n), and N is the total number of classes.
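The measures listed above can all be derived from a confusion matrix. The following Python sketch computes them for a small, made-up two-class matrix in which rows are true classes and columns are predicted classes; the function names and the example counts are illustrative assumptions and are not results from this study.

```python
def class_counts(cm, c):
    """Return (TP, FP, TN, FN) for class index c of confusion matrix cm."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                   # rest of the true-class row
    fp = sum(row[c] for row in cm) - tp    # rest of the predicted-class column
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def class_metrics(cm, c):
    """Per-class accuracy, precision, sensitivity, specificity, and F1."""
    tp, fp, tn, fn = class_counts(cm, c)
    prec = tp / (tp + fp) if tp + fp else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * prec * sen / (prec + sen) if prec + sen else 0.0
    acc = (tp + tn) / (tp + tn + fp + fn)
    return {"acc": acc, "prec": prec, "sen": sen, "sp": sp, "f1": f1}


def kappa(cm):
    """Kappa coefficient from correct count c, total s, and row/column sums."""
    n = len(cm)
    s = sum(sum(row) for row in cm)
    c = sum(cm[i][i] for i in range(n))
    t = [sum(cm[i]) for i in range(n)]                  # true counts (rows)
    p = [sum(row[i] for row in cm) for i in range(n)]   # predicted counts (columns)
    chance = sum(p[i] * t[i] for i in range(n))
    return (c * s - chance) / (s * s - chance)


cm = [[8, 2],
      [1, 9]]
print(class_metrics(cm, 0))
print(kappa(cm))
```

For a multi-class problem, averaging the per-class values returned by `class_metrics` over all classes gives the average figures of the kind reported in the results.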

Results

The ResNet model is built using Keras with the TensorFlow GPU backend. After training the model, the network parameters were saved in HDF5 format. The learning rate and the batch size play a key role in achieving the best accuracy in automatic ECG classification. The suggested model was tested in a variety of experiments with varied learning parameter values. Convergence was quite slow for smaller learning rates (i.e., less than 0.0005). Larger values also led to poor convergence. After several tests, the learning rate was set to 0.001 and the Adam optimizer was used.

By selecting a batch size of 32 and 50 epochs, the plots for training and testing accuracy were obtained as shown in Fig 6. It is evident from Fig 6 that training and testing accuracy increase with epochs. Initially, there were a few valleys in the testing accuracy, but after the 9th epoch both curves converge and reach a stable state. Fig 7 shows the training and testing loss curves. Initially, the testing loss fluctuates abruptly, but after the 9th epoch it becomes steady with no abnormal fluctuations. 10-fold cross validation is employed to evaluate the model performance. The fold-wise accuracy plot is shown in Fig 8, with a highest accuracy of 99.05% and an average accuracy of 98.62% over the ten folds. Without SMOTE, the average accuracy achieved is 95%, which is considerably lower than the accuracy obtained with SMOTE. Fig 9 shows the performance of the proposed model using confusion matrices for all five classes, without and with normalization. The diagonal elements reflect correctly classified beats, while anything off the diagonal indicates misclassification. Averaging the diagonal values of the normalized confusion matrix provides the average accuracy of the classification system. Using 10-fold CV on the ECG test dataset, Table 3 provides average values of different metrics such as precision, recall, F1-score, and K. Prec and Sen values for the five classes are depicted in Fig 10. The 10-fold cross validation provides average Prec and Sen values of 92.86% and 92.41%, respectively, while the average K value is 95.5% (0.955).

In DL, there are several challenges involved in designing a robust CNN architecture. Among them, appropriate hyper-parameter adjustment is crucial, as it has an impact on the network’s performance as it approaches convergence. The batch size, i.e., the number of ECG samples utilized in each gradient estimation step, is one of the most important hyper-parameters to tune before starting the training process. By setting different batch sizes, we have evaluated the impact of batch size on network performance in terms of overall accuracy and convergence. In general, a small batch can converge more quickly than a large batch, while a large batch can achieve an optimum minimum that is not possible with a small batch.

Fig 9. Classifier performance using confusion matrix (a) Without normalization (b) With normalization.

https://doi.org/10.1371/journal.pone.0284791.g009

Fig 10. Precision and sensitivity values for five classes.

https://doi.org/10.1371/journal.pone.0284791.g010

In this paper, the learning rate and the batch size are two important optimization factors used to evaluate the performance of the proposed model. These two optimization parameters must be carefully chosen to obtain the best accuracy in the automatic categorization of arrhythmia using ECG signals. The suggested model was evaluated with different learning rates and batch sizes using the Adam optimizer.

First, we used different learning rates with the batch sizes B = [32, 64, 100, 500, 1024, 2048, 3000, 4000, 6000, 8000, 10000] to fine-tune the network. The number of epochs was fixed at 50 for consistency of results and because of the large size of the ECG dataset. For learning rates below 0.0001, convergence is very slow. The stability and speed of convergence improve as the learning rate is increased.
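
To make the size of this search concrete, the following sketch enumerates the learning rate and batch size combinations and counts the optimizer updates each run performs in 50 epochs. The training set size `N_TRAIN_ASSUMED` is a hypothetical value chosen only for the arithmetic, the two learning rates are the ones tabulated in this section, and `sweep_plan` is an illustrative helper rather than part of the original experiments.

```python
import math
from itertools import product

BATCH_SIZES = [32, 64, 100, 500, 1024, 2048, 3000, 4000, 6000, 8000, 10000]
LEARNING_RATES = [0.0001, 0.001]
EPOCHS = 50
N_TRAIN_ASSUMED = 100_000  # hypothetical number of training beats (assumption)

def sweep_plan(n_train, learning_rates=LEARNING_RATES,
               batch_sizes=BATCH_SIZES, epochs=EPOCHS):
    """List every (learning rate, batch size) run with its optimizer step count."""
    plan = []
    for lr, batch in product(learning_rates, batch_sizes):
        # One optimizer update per mini-batch; the last batch may be partial.
        steps_per_epoch = math.ceil(n_train / batch)
        plan.append({
            "lr": lr,
            "batch_size": batch,
            "steps_per_epoch": steps_per_epoch,
            "total_steps": steps_per_epoch * epochs,
        })
    return plan

plan = sweep_plan(N_TRAIN_ASSUMED)
for run in plan:
    print(run)
```

With the number of epochs fixed, a larger batch size means far fewer gradient updates per run, which is one plausible reason why very large batches may need a different learning rate to reach the same accuracy.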

Table 4 provides the accuracy for different batch sizes with the learning rate fixed at 0.0001. In this case, the accuracy was highest and reached a stable state for a batch size of 1024. Table 5 provides the accuracy for different batch sizes with the learning rate fixed at 0.001. Here, the best accuracy, with a stable state, is obtained for a batch size of 32, and accuracy generally decreases as the batch size increases.

Table 4. Performance on ECG test dataset using different batch size for a learning rate of 0.0001.

https://doi.org/10.1371/journal.pone.0284791.t004

Table 5. Performance on ECG test dataset using different batch size for a learning rate of 0.001.

https://doi.org/10.1371/journal.pone.0284791.t005

Discussion

Table 6 compares the performance of numerous ECG classification algorithms published in the literature that employ ML and DL concepts. These approaches reported the best overall classification performance on the MIT-BIH arrhythmia database. The existing models use different training and testing datasets and diverse CNN architectures with varying numbers of classes, so a direct comparison with the proposed deep ResNet model is not appropriate. Many ECG classification approaches for arrhythmia categorization have applied 1-D models based on techniques such as SVM, K-NN, LSTM and CNN. In [43], the authors proposed a DL based ResNet-LSTM classifier combined with a genetic algorithm for optimal feature combination. Their model is computationally complex and provides an average classification accuracy of 98%. In comparison, our proposed 1-D deep ResNet classifier provides a higher average accuracy.

Conclusion

In this paper, a 1-D convolutional ResNet model is proposed for the classification of five heartbeat types taken from the PhysioNet MIT-BIH Arrhythmia database, a freely available and widely used database for arrhythmia classification. The entire dataset was divided using an 80-20 train-test split. We applied SMOTE to the training dataset only; the testing dataset was not oversampled, so that it remains unaltered. SMOTE balances the data so that all heartbeat classes have an equal number of samples in the training dataset. The training data is then passed to the deep 1-D convolutional ResNet classifier. The model provides an average accuracy of 98.63% in classifying the different heartbeat signals. Therefore, the proposed model can provide an effective diagnostic mechanism for heartbeat classification problems.
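
The order of operations summarized above, splitting first and oversampling only the training portion, can be sketched as follows. This is a simplified, self-contained illustration of the SMOTE idea (new minority samples are interpolated between a sample and one of its nearest same-class neighbours), not the implementation used in this study; the function names, the neighbour count `k`, and the 2-D toy data are assumptions.

```python
import math
import random
from collections import Counter

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    n_test = int(round(len(X) * test_fraction))
    test, train = order[:n_test], order[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def smote_oversample(X, y, k=5, seed=0):
    """Balance classes by interpolating between same-class nearest neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # no neighbour available to interpolate with
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest same-class neighbours of the chosen sample, excluding itself
            neighbours = sorted((m for m in members if m is not base),
                                key=lambda m: math.dist(base, m))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from base to other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out

# Toy data (assumption): two imbalanced classes with 2-D features, not ECG beats.
data_rng = random.Random(1)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(70)]
     + [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(30)])
y = [0] * 70 + [1] * 30

X_train, y_train, X_test, y_test = train_test_split(X, y, test_fraction=0.2)
X_balanced, y_balanced = smote_oversample(X_train, y_train)  # training data only

print("train before:", Counter(y_train))
print("train after: ", Counter(y_balanced))
print("test (untouched):", Counter(y_test))
```

Because the synthetic samples are generated after the split, no interpolated beat can leak into the test set, and the test set keeps its original class proportions.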

Finding cardiac arrhythmias is a time-consuming process that requires a clinical professional to carefully examine recordings that can last for hours. CNN classifiers, which learn features automatically, can support clinical specialists in this task. This would help to enhance the clinical diagnosis and treatment of some of the most serious cardiovascular diseases.

References

  1. McNamara K, Alzubaidi H, Jackson JK. Cardiovascular disease as a leading cause of death: how are pharmacists getting involved? Integrated Pharmacy Research and Practice. 2021;9:1–12.
  2. Al-Absi HR, Refaee MA, Rehman AU, Islam MT, Belhaouari SB, Alam T. Risk factors and comorbidities associated to cardiovascular disease in Qatar: A machine learning based case-control study. IEEE Access. 2021;9:29929–29941.
  3. Oresko JJ, Jin Z, Cheng J, Huang S, Sun Y, Duschl H, et al. A wearable smartphone-based platform for real-time cardiovascular disease detection via electrocardiogram processing. IEEE Transactions on Information Technology in Biomedicine. 2010;14(3):734–740. pmid:20388600
  4. Mustaqeem A, Anwar SM, Majid M. A modular cluster based collaborative recommender system for cardiac patients. Artificial Intelligence in Medicine. 2020;102:101761. pmid:31980098
  5. Afkhami RG, Azarnia G, Tinati MA. Cardiac arrhythmia classification using statistical and mixture modeling features of ECG signals. Pattern Recognition Letters. 2016;70:45–51.
  6. Plaza-Florido A, Alcantara JM, Amaro-Gahete FJ, Sacha J, Ortega FB. Cardiovascular risk factors and heart rate variability: impact of the level of the threshold-based artefact correction used to process the heart rate variability signal. Journal of Medical Systems. 2021;45(1):1–12.
  7. Merah M, Abdelmalik T, Larbi B. R-peaks detection based on stationary wavelet transform. Computer Methods and Programs in Biomedicine. 2015;121(3):149–160. pmid:26105724
  8. Yochum M, Renaud C, Jacquir S. Automatic detection of P, QRS and T patterns in 12 leads ECG signal based on CWT. Biomedical Signal Processing and Control. 2016;25:46–52.
  9. Phukpattaranont P. QRS detection algorithm based on the quadratic filter. Expert Systems with Applications. 2015;42(11):4867–4877.
  10. Yazdani S, Vesin JM. Extraction of QRS fiducial points from the ECG using adaptive mathematical morphology. Digital Signal Processing. 2016;56:100–109.
  11. Ijaz M, Rehman AU, Bermak A. Prediction of heart rate and blood oxygen from physiological signals. In: 2021 4th International Conference on Circuits, Systems and Simulation (ICCSS). IEEE; 2021. p. 244–248.
  12. Khorrami H, Moavenian M. A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification. Expert Systems with Applications. 2010;37(8):5751–5757.
  13. Li T, Zhou M. ECG classification using wavelet packet entropy and random forests. Entropy. 2016;18(8):285.
  14. Rehman AU, Alam T, Belhaouari SB. Investigating potential risk factors for cardiovascular diseases in adult Qatari population. In: 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT). IEEE; 2020. p. 267–270.
  15. Shafiq M, Yu X, Bashir AK, Chaudhry HN, Wang D. A machine learning approach for feature selection traffic classification using security analysis. The Journal of Supercomputing. 2018;74(10):4867–4892.
  16. Andersen RS, Peimankar A, Puthusserypady S. A deep learning approach for real-time detection of atrial fibrillation. Expert Systems with Applications. 2019;115:465–473.
  17. Pourbabaee B, Roshtkhari MJ, Khorasani K. Deep convolutional neural networks and learning ECG features for screening paroxysmal atrial fibrillation patients. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2018;48(12):2095–2104.
  18. Anwar SM, Majid M, Qayyum A, Awais M, Alnowami M, Khan MK. Medical image analysis using convolutional neural networks: a review. Journal of Medical Systems. 2018;42(11):1–13. pmid:30298337
  19. Irmakci I, Anwar SM, Torigian DA, Bagci U. Deep learning for musculoskeletal image analysis. In: 2019 53rd Asilomar Conference on Signals, Systems, and Computers. IEEE; 2019. p. 1481–1485.
  20. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, et al. Recent advances in convolutional neural networks. Pattern Recognition. 2018;77:354–377.
  21. Faust O, Hagiwara Y, Hong TJ, Lih OS, Acharya UR. Deep learning for healthcare applications based on physiological signals: A review. Computer Methods and Programs in Biomedicine. 2018;161:1–13. pmid:29852952
  22. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
  23. Xiang Y, Luo J, Zhu T, Wang S, Xiang X, Meng J. ECG-based heartbeat classification using two-level convolutional neural network and RR interval difference. IEICE Transactions on Information and Systems. 2018;101(4):1189–1198.
  24. Ebrahimi Z, Loni M, Daneshtalab M, Gharehbaghi A. A review on deep learning methods for ECG arrhythmia classification. Expert Systems with Applications: X. 2020;7:100033.
  25. Acharya UR, Fujita H, Lih OS, Hagiwara Y, Tan JH, Adam M. Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network. Information Sciences. 2017;405:81–90.
  26. Kiranyaz S, Ince T, Gabbouj M. Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Transactions on Biomedical Engineering. 2015;63(3):664–675. pmid:26285054
  27. Salvatore C, Cerasa A, Battista P, Gilardi MC, Quattrone A, Castiglioni I, et al. Magnetic resonance imaging biomarkers for the early diagnosis of Alzheimer’s disease: a machine learning approach. Frontiers in Neuroscience. 2015;9:307. pmid:26388719
  28. Baygin M, Tuncer T, Dogan S, Tan RS, Acharya UR. Automated arrhythmia detection with homeomorphically irreducible tree technique using more than 10,000 individual subject ECG records. Information Sciences. 2021;575:323–337.
  29. Hu R, Chen J, Zhou L. A transformer-based deep neural network for arrhythmia detection using continuous ECG signals. Computers in Biology and Medicine. 2022;144:105325. pmid:35227968
  30. Sun L, Wang Y, Qu Z, Xiong NN. BeatClass: a sustainable ECG classification system in IoT-based eHealth. IEEE Internet of Things Journal. 2021;9(10):7178–7195.
  31. Rajpurkar P, Hannun AY, Haghpanahi M, Bourn C, Ng AY. Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836. 2017.
  32. Li D, Zhang J, Zhang Q, Wei X. Classification of ECG signals based on 1D convolution neural network. In: 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom). IEEE; 2017. p. 1–6.
  33. Acharya UR, Oh SL, Hagiwara Y, Tan JH, Adam M, Gertych A, et al. A deep convolutional neural network model to classify heartbeats. Computers in Biology and Medicine. 2017;89:389–396. pmid:28869899
  34. Yin W, Yang X, Zhang L, Oki E. ECG monitoring system integrated with IR-UWB radar based on CNN. IEEE Access. 2016;4:6344–6351.
  35. Moody GB, Mark RG. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine. 2001;20(3):45–50. pmid:11446209
  36. Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N. A survey on addressing high-class imbalance in big data. Journal of Big Data. 2018;5(1):1–30.
  37. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F. A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2011;42(4):463–484.
  38. Islam A, Belhaouari SB, Rehman AU, Bensmail H. KNNOR: An oversampling technique for imbalanced datasets. Applied Soft Computing. 2022;115:108288.
  39. Ishaq A, Sadiq S, Umer M, Ullah S, Mirjalili S, Rupapara V, et al. Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE Access. 2021;9:39707–39716.
  40. Huang JS, Chen BQ, Zeng NY, Cao XC, Li Y. Accurate classification of ECG arrhythmia using MOWPT enhanced fast compression deep learning networks. Journal of Ambient Intelligence and Humanized Computing. 2020; p. 1–18.
  41. Xie L, Li Z, Zhou Y, He Y, Zhu J. Computational diagnostic techniques for electrocardiogram signal analysis. Sensors. 2020;20(21):6318. pmid:33167558
  42. Oh SL, Ng EY, San Tan R, Acharya UR. Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats. Computers in Biology and Medicine. 2018;102:278–287. pmid:29903630
  43. Hammad M, Iliyasu AM, Subasi A, Ho ES, Abd El-Latif AA. A multitier deep learning model for arrhythmia detection. IEEE Transactions on Instrumentation and Measurement. 2020;70:1–9.
  44. Kallas M, Francis C, Kanaan L, Merheb D, Honeine P, Amoud H. Multi-class SVM classification combined with kernel PCA feature extraction of ECG signals. In: 2012 19th International Conference on Telecommunications (ICT). IEEE; 2012. p. 1–5.
  45. Kumar RG, Kumaraswamy Y, et al. Investigating cardiac arrhythmia in ECG using random forest classification. International Journal of Computer Applications. 2012;37(4):31–34.
  46. Martis RJ, Acharya UR, Mandana K, Ray AK, Chakraborty C. Cardiac decision making using higher order spectra. Biomedical Signal Processing and Control. 2013;8(2):193–203.
  47. Park J, Lee K, Kang K. Arrhythmia detection from heartbeat using k-nearest neighbor classifier. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine. IEEE; 2013. p. 15–22.
  48. Lin CC, Yang CM. Heartbeat classification using normalized RR intervals and morphological features. Mathematical Problems in Engineering. 2014;2014.
  49. Raj S, Maurya K, Ray KC. A knowledge-based real time embedded platform for arrhythmia beat classification. Biomedical Engineering Letters. 2015;5(4):271–280.
  50. Sahoo S, Kanungo B, Behera S, Sabut S. Multiresolution wavelet transform based feature extraction and ECG classification to detect cardiac abnormalities. Measurement. 2017;108:55–66.
  51. Yang W, Si Y, Wang D, Guo B. Automatic recognition of arrhythmia based on principal component analysis network and linear support vector machine. Computers in Biology and Medicine. 2018;101:22–32. pmid:30098452
  52. Kachuee M, Fazeli S, Sarrafzadeh M. ECG heartbeat classification: A deep transferable representation. In: 2018 IEEE International Conference on Healthcare Informatics (ICHI). IEEE; 2018. p. 443–444.
  53. Rajkumar A, Ganesan M, Lavanya R. Arrhythmia classification on ECG using Deep Learning. In: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS). IEEE; 2019. p. 365–369.
  54. Izci E, Ozdemir MA, Degirmenci M, Akan A. Cardiac arrhythmia detection from 2D ECG images by using deep learning technique. In: 2019 Medical Technologies Congress (TIPTEKNO). IEEE; 2019. p. 1–4.
  55. Pandey SK, Janghel RR. Automatic detection of arrhythmia from imbalanced ECG database using CNN model with SMOTE. Australasian Physical & Engineering Sciences in Medicine. 2019;42(4):1129–1139. pmid:31728941