Automated Classification Model With OTSU and CNN Method for Premature Ventricular Contraction Detection

Premature ventricular contraction (PVC) is one of the most common arrhythmias and can cause palpitation, cardiac arrest, and other symptoms that affect the work and rest of a patient. However, patients can hardly judge the severity of the disease from their own symptoms, so a professional medical diagnosis is required. This study proposes a novel method based on image processing and a convolutional neural network (CNN) to extract electrocardiography (ECG) curves from scanned ECG images derived from clinical ECG reports, and to segment and classify heartbeats in the absence of digital ECG data. The ECG curve is extracted using a comprehensive algorithm that combines the OTSU algorithm with erosion and dilation. This algorithm can efficiently and accurately separate the ECG curve from the ECG background grid. The performance of the classification model was evaluated and optimized using hundreds of clinical ECG records collected from Fujian Provincial Hospital. Additionally, thousands of clinical ECG reports were scanned to digital images as the test set to confirm the accuracy of the algorithm for practical application. Results showed that the average sensitivity, specificity, positive predictive value, and accuracy of the proposed model on the MIT-BIH dataset were 95.47%, 97.72%, 98.75%, and 98.25%, respectively. The average classification sensitivity, specificity, positive predictive value, and accuracy on clinical scanned ECG images reached 97.24%, 81.6%, 83.8%, and 89.33%, respectively, which indicates high clinical feasibility. Overall, the proposed method can extract ECG curves from scanned ECG images efficiently and accurately. Furthermore, it performs well on the classification of normal (N) and ventricular premature heartbeats.


I. INTRODUCTION
Cardiovascular disease (CVD) has become one of the main causes of death worldwide in the past decade [1]. The vast majority of heart diseases are chronic diseases and often present with symptoms of arrhythmia [2]. Therefore, real-time and accurate arrhythmia recognition can provide doctors with accurate information, which can not only effectively prevent the occurrence of heart disease [3] but also support a targeted treatment program for patients with heart disease. At present, the electrocardiogram (ECG) is the most commonly used clinical method for diagnosing heart disease [4], [5]. However, cardiologists have limited time to spend analyzing millions of heartbeats of patients [6], [7]. Therefore, an effective solution, such as an automatic arrhythmia detection system, should be developed. Data mining is becoming popular in the healthcare field. It can extract useful information from large datasets and determine the relationships between data attributes, which are further utilized for disease diagnosis and classification. Many learning machines such as artificial neural networks [8], the hidden Markov model [9], the K-nearest neighbors (KNN) algorithm [10], [11], and the support vector machine (SVM) [12]- [14] have been proposed. Qaisar and Hussain [15] set up an automated arrhythmia classification system based on random forests by extracting wavelet decomposition and sub-band statistical features. In another work [16], DL-CCANet and TL-CCANet were proposed to extract abstract discriminating features from dual-lead and three-lead ECGs, respectively. Then, a linear SVM specializing in high-dimensional features was used as the classifier model. A composite classification and prediction model based on SVM for atrial fibrillation (AF) detection was proposed in [17]. Mohamed et al. [18] used an artificial neural network classifier to classify positive and abnormal (AN) ECG with an accuracy rate that could reach 96%.
The associate editor coordinating the review of this manuscript and approving it for publication was Giovanni Dimauro.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The above experimental results proved that the use of machine learning as an ECG automatic classification algorithm is feasible, but a few points need to be clarified. Most research included a feature extraction step, so the performance of the classifier depends on the features extracted.
In the early days, when computing power was insufficient, the low training efficiency and long training time of complex models made it difficult to train a model repeatedly on large amounts of data. After repeated training with a small amount of data, a complex model learns the features of the training set by rote and even regards the noise in the training data as a feature, which easily causes overfitting.
The development of cloud computing, big data, and other technologies, together with the significant improvement of computing performance, makes it possible to repeatedly train models using large amounts of data, thus effectively preventing overfitting. Meanwhile, an increasing number of scholars have incorporated deep learning [19]- [21] into ECG classification. Oh et al. [22] proposed an automated system using a combination of a convolutional neural network (CNN) and long short-term memory (LSTM), which demonstrated high classification performance in handling variable-length data. Hsieh et al. [23] proposed an atrial fibrillation (AF) detection method based on an end-to-end 1D CNN architecture to raise the detection accuracy and reduce network complexity. Lynn et al. [24] proposed a deep bidirectional recurrent neural network (RNN) based on the gated recurrent unit (GRU), denoted BGRU, for human identification from ECG-based biometrics. Zhou et al. [25] proposed a new approach that combined CNN, LSTM, and rules inference for PVC detection.
However, the proposed ECG automatic classification algorithms still need improvement in accuracy and real-time performance. Moreover, most of the studies are based on numerical ECG data, and little research has been conducted on ECG image diagnosis. Several studies have been reported in the literature on ECG curve extraction and digitization from clinical ECG images. Patil and Karandikar [26] proposed an entropy-based bit plane slicing algorithm to realize ECG extraction from electrocardiogram paper records. Mishra et al. [27] proposed a deep learning model to obtain the threshold value for dividing clinical ECG images into foreground and background. However, the threshold must be found manually to generate the training set for their model. Therefore, the combination of ECG curve extraction with cardiac classification through image processing in the absence of digital ECG data should be explored to improve the accuracy and intelligence of ECG automatic diagnosis. The proposed classification method is divided into 3 parts as follows:
1) Automatic extraction of ECG waveforms. The ECG curve needs to be separated from the background grid of the ECG before the classification of heartbeats. We proposed a comprehensive algorithm that combines Gamma transform and the OTSU algorithm [28] to separate the curve.
2) Construction of an automatic heartbeat segmentation and classification model for premature ventricular contraction (PVC) and normal heartbeat (N). In this study, a more intelligent convolutional neural network classification model was proposed. After the extraction of the ECG curve, the heartbeat can be segmented and classified without any further processing.
3) Evaluation and optimization of the model performance by applying the model to clinical data. Most existing experiments are based on standard databases such as the MIT-BIH database [29] and lack clinical trials and evaluation.
In this study, thousands of paper-based ECG pictures obtained from the hospital were used to verify and optimize the algorithm, which demonstrates its practicability. Based on the experimental results, the average classification accuracy for normal and ventricular premature heartbeats is 98.25% on MIT-BIH. In addition, the proposed method can extract the ECG curve waveform, and the classification accuracy on clinical scanned ECG can reach 89.33%. In comparison with the existing classification methods, the method proposed in this study is more intelligent, is clinically feasible, and has good generalization ability.
The remainder of this paper is organized as follows: Section II describes the materials and methods adopted in heartbeat classification including database, ECG curve extraction, and CNN classifier. Evaluation experiment results of heartbeat classification are presented in Section III and discussed in Section IV. Finally, Section V concludes this paper.

II. METHODS TO ECG CLASSIFICATION MODEL
The block diagram of the proposed ECG classification model is shown in Figure 1. This study presents a novel approach for the automatic detection of ventricular premature beats by using the OTSU algorithm and CNN model. The proposed system consists of three parts, namely, the ECG curve extraction, segmentation and feature classification with CNN algorithm.
1) ECG signals from the numerical databases were transformed into images by plotting each heartbeat as an individual 128 × 128 grayscale image with a duration of 1 second.
2) The datasets were divided into 3 independent, non-overlapping sets of images: the training, validation, and test sets. The test set was fixed by record before training so that a separate test set is used to evaluate the model performance. The training and validation sets were used to build the CNN classifier based on AlexNet [30], and the test set was used to verify how well the classifier recognizes ventricular premature beats.
3) The OTSU algorithm, erosion, dilation, and other image-processing methods were used to extract ECG curves from the scanned clinical ECG images. After ECG curve extraction, a fixed-length sliding window scanned the long ECG image with a certain sliding step and cut it into ECG segments of the same length.
The classification results were obtained by feeding the equally cut ECG images into the trained classification model. In this section, we describe in detail the dataset and methods used for ECG classification.
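The fixed-length sliding window in step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window width and sliding step are hypothetical values, because the text does not state them, and the image is represented as a plain list of pixel rows.

```python
def sliding_windows(width, window, step):
    """Return (start, end) column ranges of fixed-length windows over `width` columns."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, width - window + 1, step)]


def cut_image(image, window, step):
    """Cut a long image (list of pixel rows) into equally wide segments."""
    width = len(image[0]) if image else 0
    return [[row[a:b] for row in image]
            for a, b in sliding_windows(width, window, step)]


# Hypothetical example: a 2-row, 10-column strip, window of 4 columns, step of 3.
strip = [list(range(10)), list(range(10, 20))]
segments = cut_image(strip, window=4, step=3)
print(len(segments), segments[0])
```

Each segment has the same width, so it can be resized to the classifier input size before prediction.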

A. DATASET
Two types of data sets, namely, numeric and image, were used in this paper. The numerical type of ECG data was obtained from the MIT-BIH [29] and FZU-FPH arrhythmia databases. The image datasets were all obtained from the scanned clinical ECG signals provided by Fujian Provincial Hospital.
The MIT-BIH arrhythmia database contains 48 two-channel ECG recording subjects collected from 47 individuals, where each recording is a half-hour Holter recording digitized at 360 Hz. In the current study, subjects 102 and 104, which lack lead II data, were excluded. Of the remaining 46 subjects, 14 were selected for the test set, and 25 and 7 for the training and validation sets, respectively. The samples in the training, validation, and test sets were collected from different individuals. Because the database contains only 48 recording subjects, selecting 14 subjects as the test set is suitable.
These 14 subjects were not randomly selected. The numbers of normal heartbeats and PVC heartbeats in each recording subject were counted. According to the number of PVC samples, the recordings of the MIT-BIH database can be divided into three types: A-type, 24 recording subjects with almost no PVC samples (fewer than 20); B-type, 7 recording subjects with some PVC samples (between 20 and 100); and C-type, 17 recording subjects with abundant PVC samples (more than 100). In this study, 14 subjects were selected to generate the test set of the model: 4 from the A-type, 5 from the B-type, and 5 from the C-type.
The MIT-BIH arrhythmia database contains 74790 normal samples and 7124 PVC samples in total (excluding subjects 102 and 104), and the chosen test set contains 22735 normal samples and 2187 PVC samples. In terms of sample size, the ratio of the training and validation sets to the test set is 7:3 [31]- [34]. In addition, the training and validation sets have a ratio of approximately 3:1, containing 3689 and 1248 PVC samples, respectively.
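The stated split ratios can be checked directly from the sample counts given above. The short script below only restates those counts; the variable names are illustrative.

```python
# Sample counts quoted in the text (subjects 102 and 104 excluded).
total_normal, total_pvc = 74790, 7124
test_normal, test_pvc = 22735, 2187
train_pvc, val_pvc = 3689, 1248

# Share of all samples that falls in the test set (expected to be close to 0.3).
test_fraction = (test_normal + test_pvc) / (total_normal + total_pvc)

# Training-to-validation ratio for PVC samples (expected to be close to 3).
train_val_ratio = train_pvc / val_pvc

print(f"test fraction: {test_fraction:.3f}")
print(f"train/validation PVC ratio: {train_val_ratio:.2f}")
print("PVC counts add up:", train_pvc + val_pvc + test_pvc == total_pvc)
```

The test set holds roughly 30% of the samples, and the three PVC counts sum to the 7124 PVC samples in the database.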

The FZU-FPH arrhythmia database was established by Fuzhou University and Fujian Provincial Hospital through a clinical experiment under ethics committee number K2019-03-009. The database has more than 500 ECG records of patients aged 18-65 years from all regions of Fujian Province. Detailed information on the equipment used in the ECG acquisition process can be found in [35], [36]. The length of each record is approximately 10 min, and the sampling rate is 100 Hz. The ECG data have seven leads, namely limb leads I, II, III, aVR, aVL, aVF, and chest lead V1. All ECG data were filtered and the QRS complex was detected. Each record includes details such as sex, age, acquisition time, ECG data, label of each heartbeat, and record conclusion. The label and conclusion of each record were completed by the doctors in Fujian Provincial Hospital. The symbols used in the database are the same as those in the MIT-BIH database. Considering that the database is still expanding and improving, it is not publicly available at this time.
The ECG image data used in this study were obtained from digitally scanned versions of the clinical ECG drawings provided by Fujian Provincial Hospital, without any sensitive information. Exactly 2,128 ECG sheets, 1 sheet per patient, were used to verify the performance of the model, including 1,078 sinus ECG sheets and 1,050 ventricular premature beat ECG sheets. Each sheet contains 10 seconds of ECG signal collected from lead II.
The CNN classifier was designed to classify 2 types of heartbeats, namely the normal beat (N) and the ventricular premature heartbeat (V), as shown in Figure 2. The sample size of the ventricular premature beats is much lower than that of the sinus beats, which reduces the performance of the classification model. For the numerical dataset, this study adopts 4 methods to address the class imbalance problem. For the image training set, this study adopted methods 3 and 4 to solve the class imbalance problem.

B. ECG CURVE EXTRACTION
The clinical ECG image consists of two parts, namely, the ECG curve and the background grid. Curve separation refers to separating the foreground curve from the background grid in order to obtain a complete ECG curve. ECG curve extraction involves 4 steps, namely, image graying, Gamma correction and OTSU algorithm, erosion and dilation, and image thinning. The process of ECG curve extraction from clinical scanned images is shown in Figure 3.

1) IMAGE GRAYING
Image graying refers to the conversion of color images into 8-bit grayscale images. Grayscale images contain only brightness information; their color information is removed. Considering that color provides very little information in medical images, the image can be converted directly to its grayscale equivalent to facilitate later image processing.
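A minimal, dependency-free sketch of the graying step is given below. The luma weights 0.299, 0.587, and 0.114 are a common convention and are an assumption here, since the text does not state which conversion was used.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) into rows of 8-bit gray values.

    The weights are an assumed, commonly used luma convention.
    """
    return [[min(255, int(round(0.299 * r + 0.587 * g + 0.114 * b)))
             for (r, g, b) in row]
            for row in rgb_image]


# Hypothetical 1 x 3 image: white paper, pure red grid line, black curve.
gray = to_gray([[(255, 255, 255), (255, 0, 0), (0, 0, 0)]])
print(gray)
```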

2) GAMMA CORRECTION AND OTSU ALGORITHM
The OTSU algorithm divides the original image into a foreground and a background by means of a threshold. Given a threshold value, the portion whose gray value is greater than the threshold is called the background; otherwise, the portion is called the foreground. The segmentation threshold between foreground and background is denoted as $T$. The proportion of foreground pixels in the whole image is denoted as $w_0$, and their average gray value is denoted as $u_0$. The proportion of background pixels in the whole image is $w_1$, and their average gray value is denoted as $u_1$. The total mean gray value of the image is denoted as $u$, and the variance between classes is denoted as $g$. The core formula of the OTSU algorithm is as follows:
$$u = w_0 u_0 + w_1 u_1 \tag{1}$$
Then, the formula for calculating the variance in mathematical statistics is as follows:
$$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 \tag{2}$$
Substituting (1) into (2) and using $w_0 + w_1 = 1$, the equivalent formula below is obtained:
$$g = w_0 w_1 (u_0 - u_1)^2 \tag{3}$$
The traversal method was used to obtain the threshold $T$ that maximizes the variance $g$ between classes.
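The traversal over all candidate thresholds can be sketched in plain Python as follows. This is an illustrative version that maximizes the between-class variance $g = w_0 w_1 (u_0 - u_1)^2$ over a 256-bin histogram; the pixel values in the example are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold T (0-255) that maximizes the between-class variance.

    Pixels with gray value <= T are treated as foreground (the dark curve),
    pixels with gray value > T as background.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_g = 0, -1.0
    count0, sum0 = 0, 0
    for t in range(256):
        count0 += hist[t]          # number of foreground pixels for this T
        sum0 += t * hist[t]        # sum of foreground gray values
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue               # one class is empty, variance undefined
        w0, w1 = count0 / total, count1 / total
        u0, u1 = sum0 / count0, (sum_all - sum0) / count1
        g = w0 * w1 * (u0 - u1) ** 2
        if g > best_g:
            best_g, best_t = g, t
    return best_t


# Hypothetical bimodal data: 20 dark curve pixels, 80 bright background pixels.
pixels = [30] * 20 + [200] * 80
t = otsu_threshold(pixels)
foreground = [p <= t for p in pixels]
print(t, sum(foreground))
```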
However, the traditional OTSU algorithm has some limitations. It is applicable only under two conditions. First, the gray histogram of the original image exhibits a bimodal distribution. Second, the gray histogram is unimodal and the threshold is selected at the edge of the segmentation region. This study proposes a comprehensive algorithm that combines Gamma correction and the OTSU algorithm to separate the ECG curve. Gamma correction enlarges the difference between foreground and background so that the two peaks of the gray histogram lie farther apart. As a result, a more precise threshold is obtained, which increases the success rate of the adaptive separation between the ECG curve and the background grid. The formula for gamma correction is as follows:
$$I_{\text{out}} = I_{\text{in}}^{\gamma} \tag{4}$$
where $\gamma$ is gamma, and $I_{\text{in}}$ and $I_{\text{out}}$ denote the gray value before and after correction. After the ECG image has undergone this transformation, the difference between the ECG curve, whose gray value is low, and the background grid is increased. In addition, this transformation does not cause a loss of image details.
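A small sketch of gamma correction on 8-bit gray values follows. Normalizing to [0, 1] before applying the power law and the value gamma = 2 are both assumptions made for illustration; the text does not give the value used in the study.

```python
def gamma_correct(gray, gamma):
    """Apply power-law (gamma) correction to rows of 8-bit gray values.

    Values are normalized to [0, 1], raised to `gamma`, and rescaled to 0-255.
    """
    return [[int(round(255 * (p / 255) ** gamma)) for p in row] for row in gray]


# Hypothetical gray levels: black, a darker curve pixel, a lighter grid pixel, white.
before = [[0, 100, 220, 255]]
after = gamma_correct(before, gamma=2.0)
print(after)
```

For these example values the gap between the two middle gray levels widens from 120 to 151, while black and white stay fixed.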

3) EROSION AND DILATION
After the ECG curve is separated from the background grid using the threshold determined by gamma correction and the OTSU algorithm, the ECG curve may have break points. Erosion and dilation were used to connect discontinuities in the curve and ensure the integrity of the ECG information.
The image erosion operation and the subsequent dilation operation mainly aim to eliminate noise in the image and to connect the break points caused by noise, ensuring the integrity of the curve. Erosion is an operation that determines a local minimum. Boundary points that are considered useless are eliminated, making the boundary of the target appear to shrink. Dilation determines a local maximum, and this process expands a highlighted area according to the size of the custom kernel. Generally, erosion and dilation can connect adjacent break points.
If $f(x, y)$ is the grayscale function of the input image and $b(x, y)$ is a structural element, both functions are defined on $R^2$ or $Z^2$. The grayscale erosion operation of the input image $f(x, y)$ with structural element $b$ can be defined as follows:
$$(f \ominus b)(s, t) = \min\{\, f(s + x, t + y) - b(x, y) \mid (s + x, t + y) \in D_f,\ (x, y) \in D_b \,\} \tag{5}$$
where $D_f$ and $D_b$ are the domains of $f(x, y)$ and $b(x, y)$, respectively. The grayscale dilation operation of the input image $f(x, y)$ with structural element $b$ can be defined as follows:
$$(f \oplus b)(s, t) = \max\{\, f(s - x, t - y) + b(x, y) \mid (s - x, t - y) \in D_f,\ (x, y) \in D_b \,\} \tag{6}$$
The structure element $b$ performs erosion followed by dilation on an input image, which can be defined as follows:
$$f \circ b = (f \ominus b) \oplus b \tag{7}$$
Figure 4 shows the effect of erosion and dilation on given ECG data using a kernel with a size of 2×2 and a circular shape. It shows that erosion and dilation can fully separate the white part and connect the black part, thereby connecting the intermittent ECG curves.
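The following sketch illustrates how erosion followed by dilation bridges a small break in a dark curve on a bright background. It assumes a flat structuring element (all values of $b$ equal to zero) and a hypothetical 1×3 window chosen for readability, not the 2×2 circular kernel used in the study.

```python
def _morph(image, kh, kw, pick):
    """Apply `pick` (min or max) over a kh x kw window centred on each pixel.

    Window positions that fall outside the image are ignored.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [image[rr][cc]
                    for rr in range(r - kh // 2, r - kh // 2 + kh)
                    for cc in range(c - kw // 2, c - kw // 2 + kw)
                    if 0 <= rr < h and 0 <= cc < w]
            row.append(pick(vals))
        out.append(row)
    return out


def erode(image, kh=1, kw=3):
    """Grayscale erosion with a flat window: local minimum."""
    return _morph(image, kh, kw, min)


def dilate(image, kh=1, kw=3):
    """Grayscale dilation with a flat window: local maximum."""
    return _morph(image, kh, kw, max)


# Dark curve (0) on white paper (255) with a one-pixel break at column 4.
broken = [[255, 255, 0, 0, 255, 0, 0, 255, 255]]
repaired = dilate(erode(broken))
print(repaired)
```

Erosion spreads the dark pixels across the break, and the subsequent dilation restores the original outer extent of the curve.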

4) IMAGE THINNING
Although refining the ECG curve has little influence on the shape of the ECG waveform, an unrefined curve has a greater effect if the ECG waveform later needs to be converted into numerical ECG data. This study adopts the method of skeleton refinement based on the binary edge image [37]. The basic principle of this method is to determine the points that can be deleted and then delete them over multiple iterations until no pixel satisfies the elimination rules. When the image no longer changes, the skeleton refinement is complete.
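The iterative delete-until-stable principle can be illustrated with the well-known Zhang-Suen thinning algorithm. This is a substitute chosen for illustration and is not necessarily the method of [37]; the binary image uses 1 for curve pixels and 0 for background.

```python
def zhang_suen_thinning(image):
    """Thin a binary image (1 = curve, 0 = background) with Zhang-Suen thinning."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def neighbours(r, c):
        """Return P2..P9: the 8 neighbours clockwise, starting from north."""
        def at(rr, cc):
            return img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
        return [at(r - 1, c), at(r - 1, c + 1), at(r, c + 1), at(r + 1, c + 1),
                at(r + 1, c), at(r + 1, c - 1), at(r, c - 1), at(r - 1, c - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(h):
                for c in range(w):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of foreground neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(1 for x, y in zip(n, n[1:] + n[:1]) if x == 0 and y == 1)
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # delete in parallel after the whole pass has been evaluated
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# Hypothetical thick stroke: a 3 x 7 bar inside a 5 x 9 image.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
thin = zhang_suen_thinning(bar)
for row in thin:
    print("".join("#" if v else "." for v in row))
```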

C. CNN CLASSIFIER
A CNN is an artificial neural network, specifically a very effective structured multi-layer feedforward neural network. CNNs are mainly designed to recognize two-dimensional images. The network structure has strong invariance to image deformations such as translation, tilting, and scaling. Nowadays, the application of CNNs is not limited to image recognition but also extends to speech signal processing and text recognition. Generally, a CNN has three types of layers, namely, the convolution, pooling, and fully connected layers.
In this study, the basic structure of AlexNet is followed and an optimized CNN model is used to obtain the optimal performance for ECG arrhythmia classification. The ECG images in this paper are relatively simple 128×128 grayscale images. Hence, a very deep network is not needed, and an unrestricted increase in parameters may cause overfitting and degrade the performance.
The overall architecture of the proposed CNN model is presented in Figure 5. The model has 4 convolution layers and 3 max-pooling layers. The numbers in the figure represent the sizes of the outputs and the kernels. In this study, the lead II ECG was used as input in the form of a two-dimensional image, and the dimension of the input layer is 128 × 128. The output layer uses a Softmax activation function, and all other layers use ReLU activation functions. The cross-entropy cost function is chosen as the loss function in the output layer.
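The activation and loss functions named above can be written out in a few lines. The sketch below is a generic illustration for a two-class output (N and V) with hypothetical logit values; it does not reproduce the network in Figure 5.

```python
import math


def relu(x):
    """ReLU activation: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def softmax(logits):
    """Softmax over a list of logits, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(probs, true_index):
    """Cross-entropy loss for one sample with a one-hot target."""
    return -math.log(probs[true_index])


# Hypothetical logits for the classes [N, V]; the true class is V (index 1).
probs = softmax([0.5, 2.5])
loss = cross_entropy(probs, true_index=1)
print([round(p, 3) for p in probs], round(loss, 3))
```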

D. CLASSIFICATION PERFORMANCE EVALUATION
The technical indicators of sensitivity (Se), specificity (Sp), positive predictive value (Ppv), and accuracy (Acc) are used to evaluate the performance of the proposed models. The Se, Sp, Ppv, and Acc of the classification are defined as follows:
$$Se = \frac{TP}{TP + FN} \tag{8}$$
$$Sp = \frac{TN}{TN + FP} \tag{9}$$
$$Ppv = \frac{TP}{TP + FP} \tag{10}$$
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
where TP (i.e., true positive) and TN (i.e., true negative) denote the numbers of correct classifications. In contrast, FP (i.e., false positive) and FN (i.e., false negative) denote the numbers of incorrect classifications. The detailed description is listed in Table 1.
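The four indicators follow directly from the confusion-matrix counts. In the sketch below the counts are hypothetical, and returning None for an undefined ratio is a design choice of this example for records in which one class is absent.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def evaluate(tp, tn, fp, fn):
    """Compute Se, Sp, Ppv, and Acc from confusion-matrix counts."""
    return {
        "Se": _ratio(tp, tp + fn),
        "Sp": _ratio(tn, tn + fp),
        "Ppv": _ratio(tp, tp + fp),
        "Acc": _ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical counts, with V taken as the positive class.
scores = evaluate(tp=90, tn=180, fp=20, fn=10)
print({k: round(v, 4) for k, v in scores.items()})
```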

III. EXPERIMENTAL RESULTS
This study utilized the OpenCV software to perform a series of image processing techniques on the ECG for the extraction of the ECG curve to use it as the input to the subsequent classification model. This study used the MIT-BIH and FZU-FPH databases and clinical scanned ECG data to verify the proposed method. This section will show the experimental results of the ECG curve extraction and the heartbeat classification experiments.

A. EXPERIMENTAL RESULTS OF ECG CURVE EXTRACTION
The scanned ECG sheets obtained from the hospital were first cropped to obtain the long-strip ECG image of lead II, as shown in Figure 6 (a); the image after simple grayscale conversion is shown in Figure 6 (b). Then, Gamma correction was performed on the gray image, and the ECG curve was extracted using the OTSU algorithm, as shown in Figure 6 (c). Finally, a continuous, smooth, and correct ECG curve was obtained by erosion, dilation, and image thinning, as shown in Figure 6 (d). Figure 6 shows that this series of methods, implemented with OpenCV, can extract the ECG curve efficiently and accurately. The original image was compared with the curve-extracted image. According to the medical diagnosis results, the main waveforms were similar and the diagnosed symptoms were the same, so the extracted curve meets the requirements and is well prepared for the classification of heartbeats.

B. EXPERIMENTAL RESULTS OF HEARTBEAT CLASSIFICATION
After the ECG curve is extracted, a fixed sliding window is used to scan the image and then the scanned image is used as an input into the trained classification model to automatically classify the heartbeats. The processing and classification results of the model on different training sets are summarized below.

1) HEARTBEAT CLASSIFICATION IN MIT-BIH AND FZU-FPH DATABASE
The ECG data may have noise interference during the acquisition process. The noise interference of the ECG signal has 3 parts, namely, power frequency noise, baseline drift, and electromyography (EMG) interference (i.e., motion artifact). Some ECG images converted from data with significantly large noise interference could not be identified even by a professional cardiologist. Such data could neither help the model to learn nor evaluate the performance of the model; thus, data with large noise interference were removed. After translation, noise reduction, and screening, the resulting training and validation sets consist of 148,901 samples, including 90,000 normal heartbeats and 58,901 ventricular premature beats.
As shown in Figure 7, the accuracy on the training set continuously increases and gradually stabilizes after the 4th epoch. The accuracy on the validation set is at 100% before the 4th epoch and then declines in an oscillating trend. That is, from the 4th epoch onward, the training accuracy keeps increasing while the validation accuracy declines, which indicates that the complex classification model overfitted the training data. For instance, the noise of the training set was regarded as a feature to learn, which reduced the generalization ability of the model. Therefore, by analyzing the accuracy curves of the training and validation sets and terminating training early at the critical epoch where the validation accuracy begins to decline (the 4th epoch in this case), overfitting can be effectively prevented.
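The early-termination rule described above can be expressed as a small helper. The accuracy values below are hypothetical and are not the values in Figure 7; the `patience` parameter is an assumption added so that a single noisy epoch need not stop training.

```python
def early_stop_epoch(val_acc, patience=1):
    """Return the 1-based epoch to keep, given per-epoch validation accuracy.

    Training is considered finished once the validation accuracy has failed
    to reach its best value for `patience` consecutive epochs.
    """
    best_epoch, best = 1, val_acc[0]
    bad = 0
    for epoch, acc in enumerate(val_acc[1:], start=2):
        if acc >= best:
            best, best_epoch, bad = acc, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch


# Hypothetical validation curve that peaks at epoch 4 and then declines.
curve = [0.97, 0.98, 0.99, 0.995, 0.98, 0.985, 0.97]
print(early_stop_epoch(curve))
```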
The test set was adopted to further evaluate the performance of the model. Table 2 shows the performance of the model on the test set of MIT-BIH database. Records 107, 109 and 124 do not have sinus beats. Hence, their specificity cannot be calculated. Records 212 and 220 do not have ventricular premature beats. Hence, their sensitivity cannot be calculated as well. As listed in Table 2, the average sensitivity, specificity, positive predictive value, and accuracy are 95.47%, 97.72%, 98.75%, and 98.25%, respectively.
Moreover, the performance of the model was evaluated on the FZU-FPH database. As shown in Table 3, the sensitivity, specificity, positive predictive value, and accuracy of the model are 94.97%, 99.79%, 89.29%, and 99.73%, respectively. However, considering that the FZU-FPH database was collected from the hospital, the number of sinus beats objectively exceeded that of the premature ventricular beats to a large extent, resulting in a lower positive predictive value compared with that of the MIT-BIH database.

2) HEARTBEAT CLASSIFICATION IN SCANNED CLINICAL ECG
The performance of the proposed model on the ECG provided by the hospital was explored by adding a Q class to the classification results of premature ventricular beats and sinus beats. A threshold was set; if the predicted value was less than or equal to this threshold, the image was assigned to the Q class, meaning that it could not be classified as a heartbeat. Otherwise, the classification result of the model was not modified. Table 4 shows the classification results of the model on the test set. The sensitivity, specificity, positive predictive value, and accuracy of the model were 97.24%, 81.6%, 83.8%, and 89.33%, respectively. Notably, each clinical ECG drawing, which contains 10 s of ECG data, corresponds to only one label. For example, if a 'V' is detected in a clinical ECG drawing, the model marks the whole 10 s image as 'V', similar to how a doctor interprets a real ECG. Hence, the specificity will decrease and the sensitivity will increase. In addition, because the baseline drift interference in some clinical ECG drawings could not be eliminated, 'N' is more likely to be classified as 'V', which reduces the specificity, positive predictive value, and accuracy of the model. Although an accuracy of 89.33% is higher than that of the doctors' interpretation, various performance indicators of clinical ECG drawing classification can still be improved by adjusting the model architecture and by other methods.
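The Q-class rule and the one-label-per-sheet rule can be sketched as follows. The threshold value of 0.8 and the handling of sheets with no 'V' segment are assumptions of this example; the text specifies neither.

```python
def classify_segment(probs, threshold):
    """Return 'N' or 'V', or 'Q' when the top predicted value is <= threshold.

    `probs` maps each class label to its predicted value.
    """
    label = max(probs, key=probs.get)
    return label if probs[label] > threshold else "Q"


def label_sheet(segment_labels):
    """Assign one label to a 10 s sheet: 'V' if any segment is 'V'.

    Falling back to 'N', then 'Q', is an assumption of this sketch.
    """
    if "V" in segment_labels:
        return "V"
    if "N" in segment_labels:
        return "N"
    return "Q"


# Hypothetical predictions for the segments cut from one scanned sheet.
predictions = [
    {"N": 0.95, "V": 0.05},
    {"N": 0.55, "V": 0.45},   # not confident enough, becomes 'Q'
    {"N": 0.10, "V": 0.90},
]
labels = [classify_segment(p, threshold=0.8) for p in predictions]
print(labels, label_sheet(labels))
```

Because a single 'V' segment labels the whole sheet, false 'V' detections lower the specificity while missed PVC sheets become less likely, which matches the trade-off described above.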

IV. DISCUSSION
An automatic ECG curve extraction, heartbeat segmentation, and classification model based on CNN is proposed in this study. The experimental results indicate that the ECG classification model proposed in this study performs well in detecting ventricular premature beats. A comparison between the proposed ECG classification model and other systems was conducted, and the results are summarized in Table 5.
Most of the papers use the MIT-BIH arrhythmia database for testing, except for He et al. [40]. In [40], although the test data were collected from 214 patients, the test set contains only 291 PVC samples. Oh et al. [38] randomly selected 10% of heartbeats from the MIT-BIH database as the test set. Krishnan et al. [41] used only 436 PVC samples for testing. Although Chen et al. [39] and Malik et al. [42] used 22 patient records, only 7 of these records have abundant premature ventricular contraction samples (i.e., more than 100). In contrast, the test set used in the proposed method contains 2187 PVC samples. In this study, a large number of samples were used in the test set, which better demonstrates the generalization performance of the model.
Oh et al. [38] used a U-net autoencoder for beat-wise arrhythmia detection. The sensitivity, accuracy, and positive predictive values were slightly lower than those of the proposed method in the detection of ventricular premature beats. Chen et al. [39] proposed a two-stage classification structure with global and customized classifiers. They used the KNN algorithm as a global classifier and applied a set of decision rules to the beats predicted as normal by the global classifier to determine whether they constitute an abnormal alarm. For the ventricular type, the average accuracy of the model was 96.26%. He et al. [40] presented an automatic algorithm for the recognition of PVC beats based on long-term 12-lead ECG. An accuracy of 97.2% was achieved with the SVM classifier. Krishnan et al. [41] presented a Simulink model-based approach using fuzzy logic for the accurate detection of PVC beats in ECG signals and formulated a severity index of the PVC. Malik et al. [42] built 5 simple, interpretable, and computationally efficient features from each cardiac cycle, which allowed the ventricular ectopy detector to obtain high precision.
Although the proposed model achieved good performance, it needs to be further improved in terms of the following aspects: 1) The original images used in this study were all obtained from the electronic version or scanned version. Telemedicine and intelligent medical treatment will benefit if the original image can be obtained using a mobile phone.