An Intelligent Heartbeat Classification System Based on Attributable Features with AdaBoost+Random Forest Algorithm

Arrhythmia is a common cardiovascular disease that can threaten human life. To help doctors diagnose arrhythmia accurately, an intelligent heartbeat classification system based on selected optimal feature sets and an AdaBoost + Random Forest model is developed. The system acquires ECG signals with a Holter monitor and transmits them to a cloud platform for preprocessing and feature extraction. The features are then input into AdaBoost + Random Forest for heartbeat classification, and the analysis results are output as reports. By comparing the classification accuracy of different feature sets and classifiers, the optimal classification algorithm is obtained and applied to the system. The algorithm accuracy of the system is tested on the MIT-BIH data set, where AdaBoost + Random Forest achieves 99.11% accuracy with the optimal feature set. The intelligent heartbeat classification system based on this algorithm also achieves good results on clinical data.


Introduction
In recent years, the incidence of cardiovascular diseases has been increasing, seriously threatening human life [1]. Arrhythmias can be divided into two types: life-threatening and non-life-threatening. Life-threatening arrhythmias can lead to cardiac arrest and sudden death, and these patients need urgent treatment. Although non-life-threatening arrhythmias may not endanger life immediately, they still need timely treatment to avoid further deterioration. Therefore, the intelligent detection and diagnosis of arrhythmia is of great significance for monitoring, for preventing heart disease, and for improving doctors' work efficiency.
Long-term continuous electrocardiogram (ECG) monitoring [2,3] provides valuable information for preventing heart attacks [4]. Doctors can diagnose the nature of an arrhythmia by analyzing the ECG, and identifying abnormal heartbeats requires analyzing the electrical signal of each heartbeat. However, analyzing long-term ECG records is very time-consuming for doctors, and occasional human errors are inevitable. Therefore, accurate intelligent classification of arrhythmia can improve doctors' efficiency and reduce misdiagnosis and missed diagnosis [5].
This paper presents a heartbeat classification system based on multiple-feature fusion and an improved random forest, which can classify arrhythmias from real data collected by ECG acquisition equipment. The system covers the whole process from ECG signal collection to heartbeat classification and presentation of the classification results to the doctor. The contributions of this research are as follows: (1) We developed an intelligent heartbeat classification system covering collection, analysis, and result presentation, which can effectively improve doctors' work efficiency.
(2) Through feature comparison with multiple classifiers, arrhythmia classification analysis, and comparison of feature combinations, an optimal attributable heartbeat feature set is obtained for heartbeat classification. (3) After testing, AdaBoost + Random Forest was found to be the best heartbeat classification method. It can handle input samples with high-dimensional features and, through its random forest base learner, also handles imbalanced data effectively, which helps balance the classification error across the data set. (4) By constructing feature sets that embody different ECG prior knowledge, we analyze the interpretability of what the AdaBoost + Random Forest heartbeat classifier learns and improve the accuracy of heartbeat classification, which has important clinical significance.
The rest of this paper is organized as follows: Part 2 briefly introduces related work. Part 3 gives an overview of the system architecture. Part 4 details the main process of the system, namely the heartbeat classification algorithm. Part 5 presents the system performance analysis. Part 6 concludes the paper.

Related Work
In the past, the diagnosis of arrhythmia depended mainly on doctors' experience. With the development of artificial intelligence, automatic classification has been applied in various industries in recent years [6][7][8][9], and researchers have proposed a variety of systems for automatic heartbeat classification. These systems fall into two types: those based on deep learning and those based on feature engineering.
Several systems [10][11][12][13][14] use deep neural networks (DNN) to classify heartbeats, but DNNs suffer from parameter redundancy. Hilera et al. [15] used an artificial neural network (ANN) to detect arrhythmia and studied the usefulness of ANNs in clinical diagnosis, but the performance indicators of such a network are difficult to analyze accurately. Li et al. [16] merged the shape and rhythm of the heartbeat into a two-dimensional information vector and used a convolutional neural network to classify heartbeats; the results show better classification for the V and S categories. Wang et al. [17] proposed a globally updatable classification system to address the large individual differences and the high cost of annotating ECG data, and it achieved good classification results. Deep-learning-based heartbeat classification algorithms can effectively distinguish different types of arrhythmia. This matters for diagnosis, since precise classification helps doctors diagnose accurately and draw up appropriate treatment plans. However, deep-learning-based methods cannot analyze how individual features affect heartbeat classification performance.
Feature extraction is an important step in accurately classifying arrhythmia. Over the past few decades, researchers have used a variety of features to classify arrhythmia automatically, including ECG morphology [18,19], intervals [20][21][22][23], QRS area [24,25], wavelet coefficients [26,27], Hermite coefficients [28], and higher-order statistics [28,29]. The relevant features are extracted from the ECG signal and then input into learning algorithms that induce models to classify arrhythmia. Machine learning is now widely used in medical diagnosis to help doctors improve the efficiency of diagnosis and treatment. Yildirim [30] used the wavelet transform to detect heartbeats in ECG data, divided the heartbeats into segments according to certain cycles, applied a multiresolution wavelet transform to the segmented signals to obtain wavelet coefficients at different frequencies, and then developed a heartbeat classification system; the test results confirm the effectiveness of this approach. However, this method focuses only on wavelet coefficients and ignores other features, which leaves room for further improvement. Alickovic and Subasi [31] applied random forest to ECG arrhythmia diagnosis, using the discrete wavelet transform to decompose the ECG signal into successive frequency bands, with the aim of distinguishing each type of heartbeat accurately. These studies have achieved results in heartbeat classification using morphological features, interval features, QRS area, wavelet coefficients, and other features, and such intuitive features make full use of both doctors' diagnostic reasoning and the ECG data. There is still room to improve heartbeat classification accuracy, and this work builds on combining intuitive, attributable features.
To select the best feature combination, the influence of different feature combinations on heartbeat classification is analyzed. A new heartbeat classification system based on an attributable intelligent classification method is presented in this paper, which improves both classification accuracy and doctors' work efficiency.

System Architecture
The fundamental purpose of the ECG intelligent diagnosis system is to distinguish each kind of heartbeat accurately.
There is still room to improve heartbeat classification accuracy. The research value of this project lies in the effective selection and combination of intuitive, attributable features, together with interpretable reasoning about classification performance. A new heartbeat classification system based on multi-feature fusion and the AdaBoost + Random Forest method is presented in this paper. To select the best feature combination, the influence of different feature combinations on heartbeat classification is analyzed, and the best combination is input into AdaBoost + Random Forest for classification. This makes effective use of the ECG data and improves classification accuracy.
Journal of Healthcare Engineering

The core of the system is the heartbeat classification algorithm based on attributable features and AdaBoost + Random Forest, which the next section describes in detail. This section introduces the overall structure of the system, shown in Figure 1. The main components are the Holter monitor, smartphones, and the software system. Data collected by the Holter are uploaded to the server for analysis, which requires the user to install the app on a mobile phone. Before classification, the system automatically extracts features from the denoised ECG data; these features form the best combination selected in testing. Compared with other systems, the classification system based on these features makes full and effective use of the ECG data and is more interpretable. After classification, the report is sent to the patient once it has been checked by a doctor.

Figure 2 shows the logical design of the system, and Figure 3 shows the ECG system function diagram. The system is built on a microservice architecture [32], which offers good scalability, easy deployment, and low coupling. The Holter collects data from the patient and transmits them to the subject's mobile phone via Bluetooth. The acquired data are stored over HTTP/HTTPS in the ECG parameter center on the cloud platform, namely, ECG-DatabaseGateway Server. The collected ECG data are transmitted, stored, and analyzed on the cloud platform as follows: ECGDeviceGateway receives and distributes the ECG data; the ECGDataRouter server forwards the ECG data using the publish-subscribe mechanism of NATS; ECGAnalyzeGateway sends the analysis request to ECGAnalyzeServer, which hosts the proposed core algorithm; after receiving the request, ECGAnalyzeServer passes the ECG parameters to the algorithm program for analysis.
The transmission uses RPC [33] to achieve high transmission efficiency with low performance overhead.
The ECGAnalyzeGateway server uses NATS [34] to send the analysis results to the ECGPipeGateway server for real-time transmission during diagnosis. The ECGPipeGateway server displays the real-time ECG monitoring results on each terminal through the WebSocket protocol [35], which lets the server push messages to the client without waiting for a request, reducing latency. At the same time, ECGAnalyzeServer sends the results to the ECGDataBaseGateway server, which stores the ECG analysis and classification results. Because ECG data are complex, MongoDB [36] is used to store them.
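The data flow above follows a publish-subscribe pattern. The sketch below is a toy, in-process stand-in for NATS that only illustrates the routing; it does not use the NATS client API. The subject names (`ecg.raw`, `ecg.result`), the `classify` stub, and the two lists standing in for the ECGPipeGateway (real-time push) and ECGDataBaseGateway (storage) services are all hypothetical.

```python
from collections import defaultdict


class ToyBroker:
    """Minimal in-process publish/subscribe broker (a stand-in for NATS, not its API)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, subject, handler):
        self._handlers[subject].append(handler)

    def publish(self, subject, message):
        # Deliver the message to every handler subscribed to this subject.
        for handler in self._handlers[subject]:
            handler(message)


def classify(record):
    """Hypothetical placeholder for the algorithm call inside ECGAnalyzeServer."""
    return {"record_id": record["record_id"], "labels": ["N"] * len(record["beats"])}


broker = ToyBroker()
pushed_to_clients = []  # stands in for ECGPipeGateway (real-time push to terminals)
stored_results = []     # stands in for ECGDataBaseGateway (persistent storage)

# Analysis service: consume raw ECG records and publish their results.
broker.subscribe("ecg.raw", lambda rec: broker.publish("ecg.result", classify(rec)))
# Two independent consumers of the same result subject.
broker.subscribe("ecg.result", pushed_to_clients.append)
broker.subscribe("ecg.result", stored_results.append)

# Device-gateway side: publish one uploaded record.
broker.publish("ecg.raw", {"record_id": "demo-001", "beats": [0, 1, 2]})
```

Because the analyzer and both sinks know only subject names, a new consumer could subscribe without any change to the producer, which is the low-coupling property the text attributes to the microservice design.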

System Process
This process classifies heartbeats using multi-feature fusion and the AdaBoost + Random Forest method. It comprises ECG signal acquisition, preprocessing, feature extraction, and heartbeat classification. Figure 4 shows the framework of the heartbeat classification algorithm.

ECG Signal Preprocessing.
To classify heartbeats more accurately, the ECG data must be denoised and the waveforms detected. The wavelet transform retains the features and important physiological details of the ECG signal and is simple to compute [37]. Therefore, the continuous wavelet transform (CWT) is used to remove noise from the raw ECG signal and to detect the boundaries and peak positions of the three waves (P, QRS, and T). The wavelet transform is a time-frequency analysis method [38] that provides both time-domain and frequency-domain features. The principle of the CWT [39] is shown in formulas (1) and (2). In this paper, the wavelet transform is used for denoising and R-wave detection.
In formula (1), a is the scale factor and b is the translation factor (the displacement is also written τ). The scale factor stretches or compresses the mother wavelet ψ(t), while the translation factor shifts it; both are continuous variables. The CWT result can therefore be expressed as a function of a and b. The translation factor lets the wavelet traverse the signal along the time axis, and in each traversal the scale factor, by stretching the wavelet, tunes it to a different frequency band.
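For reference, the standard textbook CWT, which we assume formulas (1) and (2) follow, is W(a, b) = |a|^(-1/2) ∫ x(t) ψ*((t - b)/a) dt. The NumPy sketch below discretizes that integral directly so the roles of a and b are visible. The Mexican-hat (Ricker) mother wavelet, left unnormalized, is an illustrative choice and not necessarily the wavelet used in the paper; the double loop is O(n²) per scale and suits only short segments.

```python
import numpy as np


def ricker(t):
    """Unnormalized Mexican-hat (Ricker) mother wavelet; an assumed choice for illustration."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)


def cwt(x, scales, fs):
    """Direct discretization of W(a, b) = |a|^(-1/2) * integral of x(t) * psi((t - b) / a) dt.

    x: 1-D signal; scales: scale factors a in seconds; fs: sampling rate in Hz.
    Returns an array of shape (len(scales), len(x)): one row per scale a,
    one column per translation b (each sample time).
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs           # sample times in seconds
    dt = 1.0 / fs
    out = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        for j, b in enumerate(t):        # translation b sweeps the whole time axis
            psi = ricker((t - b) / a)    # wavelet stretched by a and shifted by b (real, so psi* = psi)
            out[i, j] = np.sum(x * psi) * dt / np.sqrt(abs(a))
    return out
```

At small scales the wavelet is narrow and responds to sharp events such as the QRS complex; at large scales it is wide and follows slower components such as baseline drift, which is what makes the transform usable for both R-wave detection and denoising.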

Heartbeat Feature Extraction and Combination.
A complete cardiac cycle on the ECG consists of the P wave, the QRS complex, and the T wave. The time intervals between the feature points of these waves directly reflect the contraction and relaxation of the atria and ventricles, which is of great value in diagnosing heart disease [40]. The extracted ECG features are described as follows. In this paper, the sampling rate is 360 Hz. Based on the R-peak position, 235 single-heartbeat morphological features are extracted [41]: 90 samples before the R peak, the R peak itself, and 144 samples after it. If a full 235-sample window cannot be taken around the first or last QRS complex detected in a record file, that heartbeat is ignored. Each record in the MIT-BIH arrhythmia database contains two leads: lead A is lead II and lead B is lead V1, although in some records lead B is V2, V5, or V4. According to [41,42], the QRS complex is more prominent in lead A, so lead A is usually used to detect heartbeats, whereas lead B is more useful for distinguishing S and V arrhythmias. Taking 235 features from each of the two leads yields 470 single-heartbeat morphological features. Figure 5 shows the two leads of MIT-BIH arrhythmia database record 210 (10 s). The P wave interval, QRS interval, T wave interval, PR segment, ST-T interval, QT interval, and RR interval are common ECG features. With a sampling rate (SR) of 360 Hz, Figure 6 shows the intervals of a normal heartbeat, and they are calculated with equations (3)-(9).
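The 235-sample window described above (90 samples before the R peak, the peak, and 144 after) can be sketched as follows, assuming R-peak sample indices are already available from the detection step and both leads are NumPy arrays at the same 360 Hz rate. The function name `beat_windows` is ours.

```python
import numpy as np

PRE, POST = 90, 144          # samples before / after the R peak at 360 Hz
WIN = PRE + 1 + POST         # 235 samples per lead, R peak included


def beat_windows(lead_a, lead_b, r_peaks):
    """Return an (n_beats, 470) matrix: 235 samples of lead A followed by 235 of lead B.

    Beats whose window would run past either end of the record are skipped.
    """
    n = min(len(lead_a), len(lead_b))
    rows = []
    for r in r_peaks:
        start, stop = r - PRE, r + POST + 1
        if start < 0 or stop > n:
            continue                     # incomplete window: ignore this heartbeat
        rows.append(np.concatenate([lead_a[start:stop], lead_b[start:stop]]))
    return np.array(rows).reshape(-1, 2 * WIN)
```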

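The explicit forms of equations (3)-(9) are not reproduced here. The sketch below computes the seven listed intervals in one common way, each as a sample difference divided by SR. The fiducial-point key names and the choice of endpoints for each interval are assumptions, not the paper's definitions.

```python
SR = 360  # sampling rate in Hz, as stated in the text


def interval_features(fp, r_prev, sr=SR):
    """Interval features in seconds for one beat.

    fp: dict of fiducial sample indices with hypothetical keys
        'p_on', 'p_off', 'qrs_on', 'r', 'qrs_off', 't_on', 't_off'.
    r_prev: sample index of the previous beat's R peak.
    """
    def dur(start, end):
        return (end - start) / sr

    return {
        "P": dur(fp["p_on"], fp["p_off"]),             # P wave duration
        "QRS": dur(fp["qrs_on"], fp["qrs_off"]),       # QRS duration
        "T": dur(fp["t_on"], fp["t_off"]),             # T wave duration
        "PR_segment": dur(fp["p_off"], fp["qrs_on"]),  # end of P to start of QRS
        "ST_T": dur(fp["qrs_off"], fp["t_off"]),       # end of QRS to end of T
        "QT": dur(fp["qrs_on"], fp["t_off"]),          # start of QRS to end of T
        "RR": dur(r_prev, fp["r"]),                    # previous R peak to this R peak
    }
```

For example, R peaks 288 samples apart give an RR interval of 288 / 360 = 0.8 s, i.e., 75 beats per minute.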
The QRS complex is the most energetic part of the ECG signal and carries most of the information of the heartbeat. The time from QRS onset to QRS offset is the QRS duration, and the QRS area is the integral of the QRS complex from onset to offset. The QRS complex reflects the magnitude and timing of left and right ventricular depolarization.
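A minimal sketch of the QRS duration and area definitions above, assuming onset and offset sample indices come from the wavelet-based delineation step. The trapezoidal rule and the use of signed amplitude are our choices; the text does not say whether the area is taken over raw or absolute amplitude.

```python
import numpy as np


def qrs_duration_and_area(x, qrs_on, qrs_off, sr=360):
    """QRS duration (s) and QRS area (amplitude x s) between onset and offset, inclusive.

    The area integrates the raw amplitude with the trapezoidal rule (dt = 1/sr);
    replace seg with np.abs(seg) for an unsigned area.
    """
    seg = np.asarray(x[qrs_on:qrs_off + 1], dtype=float)
    duration = (qrs_off - qrs_on) / sr
    area = float(np.sum(seg[:-1] + seg[1:]) / 2.0 / sr)
    return duration, area
```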
The first downward deflection is the Q wave, the following upward deflection is the R wave, and the downward deflection after it is the S wave. Therefore, the shape of the QRS complex is the main consideration when selecting the wavelet basis. Based on the waveform of a cardiac cycle, we choose the db6 wavelet as the basis function for wavelet decomposition [43], because the db6 wavelet has good regularity, which makes the … Figure 7 shows the original ECG signal and its wavelet transform at scales 1-6, and Figure 8 shows them at scales 7-12. S denotes the input ECG signal, A the approximation coefficients of the wavelet packet decomposition, and D the detail coefficients. In this paper, the wavelet coefficients are extracted as features.
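A NumPy-only sketch of a multilevel decomposition into approximation (A) and detail (D) coefficients is below. It substitutes the Haar wavelet for db6 purely so the block has no dependencies, and it is a standard DWT rather than the full wavelet packet decomposition mentioned in the text. With PyWavelets installed, `pywt.wavedec(signal, 'db6', level=n)` performs the db6 version and returns coefficients in the same [A_n, D_n, ..., D_1] order.

```python
import numpy as np


def haar_wavedec(x, level):
    """Multilevel orthonormal Haar DWT returning [A_level, D_level, ..., D_1].

    Stand-in for a db6 decomposition; odd-length inputs are padded by
    repeating the last sample at each level.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        if len(a) % 2:
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail (high-pass) coefficients D
        a = (even + odd) / np.sqrt(2.0)              # approximation (low-pass) coefficients A
    return [a] + details[::-1]
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the input (for even-length inputs without padding), so the coefficients can be used as features without rescaling the signal.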
In feature engineering, a single feature cannot fully describe the properties of the ECG, so first-order discrete features are often combined into higher-order combined features to better fit complex relationships. The features are therefore divided into five sets, A, B, C, D, and E: sets A and B are morphological features, set C is interval information, set D is area information, and set E is frequency features (wavelet coefficients). The definition and division of the five feature sets are as follows: …

AdaBoost is an iterative ensemble learning algorithm [44][45][46]. It trains different base learners on the same training set, changing the sample weights at each iteration, combines the base learners through weighted voting, and finally obtains a strong learner with the best overall classification performance. In the AdaBoost + RF algorithm, the base learner is a random forest. Random forest is an important bagging-based ensemble learning method whose final prediction is obtained by voting. Compared with other classification algorithms, random forest maintains high accuracy and good stability [47]. The steps of the AdaBoost algorithm are described in Algorithm 1, and Figure 9 shows how the final classifier is generated. Equations (10)-(13) follow [45].
In the AdaBoost ensemble algorithm, the training data set is $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i$ is a sample and $y_i$ its category. The importance of the base classifier $G_m$ depends on its error rate $e_m$, defined as

$$e_m = \sum_{i=1}^{N} w_{m,i}\, I\big(G_m(x_i) \neq y_i\big) \qquad (10)$$

where $w_{m,i}$ is the weight of sample $i$ in round $m$, and $I(P) = 1$ if the predicate $P$ is true and 0 otherwise. The weight of the base classifier is defined as

$$\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m} \qquad (11)$$

By the definitions of $\alpha_m$ and $e_m$, a lower error rate makes the base classifier more important. If $e_m$ of a base classifier exceeds 50%, the weights for that round are restored to their initial values and the data are resampled. The weight distribution of the training data set is updated as

$$w_{m+1,i} = \frac{w_{m,i}}{Z_m}\exp\big(-\alpha_m y_i G_m(x_i)\big), \qquad Z_m = \sum_{i=1}^{N} w_{m,i}\exp\big(-\alpha_m y_i G_m(x_i)\big) \qquad (12)$$

where $Z_m$ is the normalization factor that makes the weights sum to 1. The final output function is

$$G(x) = \operatorname{sign}\Big(\sum_{m=1}^{M} \alpha_m G_m(x)\Big) \qquad (13)$$

Equations (10)-(13) are written here in the standard binary form with $y_i \in \{-1, +1\}$.

When the data of a multi-class problem are imbalanced, the classes with many samples tend to be overfitted during training and the classes with few samples underfitted, producing a misleadingly high overall heartbeat classification accuracy. This paper addresses data imbalance from the algorithmic side by combining AdaBoost and random forest. Random forests are relatively robust to missing and imbalanced data and can estimate the effects of up to thousands of input variables [48]. The principle is to keep all samples of the small classes when generating each training set and to randomly draw samples from the large classes to combine with them, yielding multiple training sets and decision tree models. Integrating multiple decision trees in this way effectively alleviates data imbalance.
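A minimal NumPy sketch of the binary AdaBoost loop in equations (10)-(13) follows. Assumptions: labels are in {-1, +1} (the five-class AAMI problem would need a multi-class extension such as SAMME), and a weighted decision stump stands in for the random forest base learner so the block stays self-contained. Any object with `fit(X, y, sample_weight)` returning itself and `predict(X)`, such as scikit-learn's `RandomForestClassifier(n_estimators=70)`, could be passed through `make_learner` instead. When $e_m \geq 0.5$ the sketch restores the initial weights and skips the round; the resampling step mentioned in the text is omitted.

```python
import numpy as np


class WeightedStump:
    """Weighted one-feature decision stump; a lightweight stand-in for the random forest."""

    def fit(self, X, y, sample_weight):
        best_err = np.inf
        for f in range(X.shape[1]):
            for thr in np.unique(X[:, f]):
                for sign in (1, -1):
                    pred = np.where(X[:, f] <= thr, sign, -sign)
                    err = sample_weight[pred != y].sum()
                    if err < best_err:
                        best_err, self.f, self.thr, self.sign = err, f, thr, sign
        return self

    def predict(self, X):
        return np.where(X[:, self.f] <= self.thr, self.sign, -self.sign)


def adaboost_fit(X, y, make_learner, n_rounds=10):
    """Binary AdaBoost following equations (10)-(13); y must contain -1/+1 labels."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # initial uniform weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        g = make_learner().fit(X, y, sample_weight=w)
        pred = g.predict(X)
        e = w[pred != y].sum()                # (10) weighted error rate e_m
        if e >= 0.5:                          # no better than chance:
            w = np.full(n, 1.0 / n)           # restore initial weights and skip this round
            continue
        e = max(e, 1e-10)                     # avoid division by zero for a perfect learner
        alpha = 0.5 * np.log((1.0 - e) / e)   # (11) learner weight alpha_m
        w = w * np.exp(-alpha * y * pred)     # (12) raise weights of mistakes, lower hits
        w = w / w.sum()                       # divide by Z_m so the weights sum to 1
        learners.append(g)
        alphas.append(alpha)
    return learners, alphas


def adaboost_predict(learners, alphas, X):
    """(13) sign of the alpha-weighted vote of all base classifiers."""
    score = np.zeros(len(X))
    for g, a in zip(learners, alphas):
        score += a * g.predict(X)
    return np.where(score >= 0, 1, -1)
```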
AdaBoost can independently and randomly draw several subsets from the majority class, combine each subset with the minority-class data to train multiple base classifiers, and then weight them to form a new classifier. After comparing the performance of multiple classifiers and considering their characteristics, we select random forest as the base classifier.
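The undersampling idea described above (keep the minority class, draw an equal-sized random subset from each larger class, repeat) can be sketched as below. For the five-class AAMI setting we assume every class is drawn down to the size of the smallest one; the function name and this multi-class generalization are ours.

```python
import numpy as np


def balanced_subsets(y, n_subsets, seed=0):
    """Return n_subsets sorted index arrays, each class-balanced.

    Every sample of the smallest class is kept in each subset; larger classes
    are randomly undersampled (without replacement) to that same size.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    subsets = []
    for _ in range(n_subsets):
        idx = []
        for c, n_c in zip(classes, counts):
            members = np.flatnonzero(y == c)
            if n_c == n_min:
                idx.append(members)  # minority class: keep everything
            else:
                idx.append(rng.choice(members, size=n_min, replace=False))
        subsets.append(np.sort(np.concatenate(idx)))
    return subsets
```

Each subset would train one base classifier, and the classifiers are then combined by weighted voting; different random draws let the ensemble still see most majority-class samples overall.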

System Performance Assessment Analysis
Ensemble learning is used to alleviate data imbalance and improve classifier performance. This section describes the test data and evaluation criteria and presents the results of 16 tests that use different combinations of heartbeat feature sets. The results of the different tests are compared and analyzed, and clinical data are used to verify system performance.
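The text does not list tests (1)-(8) explicitly, but the numbering of tests (9)-(16) is consistent with enumerating set A alone followed by every non-empty combination of sets B, C, D, and E in order of size. The sketch below generates that order; the ordering is our inference, not a statement from the paper.

```python
from itertools import combinations


def feature_set_tests():
    """Set A alone, then every non-empty combination of B, C, D, E by increasing size."""
    tests = [("A",)]
    for k in range(1, 5):
        tests.extend(combinations("BCDE", k))
    return tests
```

Under this assumption, test (n) corresponds to `feature_set_tests()[n - 1]`; for instance, index 8 gives `('C', 'D')`, matching test (9), and the last entry is `('B', 'C', 'D', 'E')`, matching test (16).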

Test Data.
In this paper, all tests are carried out on the MIT-BIH arrhythmia database, a standard database for evaluating arrhythmia detection that is widely used for algorithm verification. The database contains 48 records from 47 subjects, each containing two 30-minute ECG lead signals (lead A and lead B). Of the 48 records, 23 include normal sinus rhythm (NSR) and a representative set of common arrhythmias; the other 25 include uncommon but clinically significant cardiac abnormalities [49].
According to the ANSI/AAMI EC57:2012 standard of the Association for the Advancement of Medical Instrumentation (AAMI), heartbeats are grouped into five classes: N (nonectopic beats), S (supraventricular ectopic beats), V (ventricular ectopic beats), F (fusion beats), and Q (unknown beats) [50]. In this paper, 90% of the MIT-BIH arrhythmia database is randomly selected as the training set and the remaining 10% as the test set, with no overlap between the two. The statistics of the experimental data are given in Table 1, and examples of the common heartbeat categories are shown in Figure 10.

Evaluation Measurement.
As shown in formulas (14)-(17), TP, FP, TN, and FN are calculated to obtain the heartbeat classification results. TP_N denotes the true-positive N heartbeats, FP_N the false-positive N heartbeats, TN_N the true-negative N heartbeats, and FN_N the false-negative N heartbeats; the other heartbeat categories are computed in the same way [39]. Table 2 shows the confusion matrix of the classification results, where N, S, V, F, and Q denote the true heartbeat type and n, s, v, f, and q the predicted type.
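A minimal sketch of how per-class TP, FP, TN, and FN, and from them Se = TP/(TP+FN), Sp = TN/(TN+FP), +P = TP/(TP+FP), and Acc = (TP+TN)/(TP+TN+FP+FN), can be computed one-vs-rest from a confusion matrix laid out as in Table 2 (true classes as rows, predictions as columns). The function name is ours.

```python
import numpy as np


def per_class_metrics(cm, classes=("N", "S", "V", "F", "Q")):
    """Per-class Se, Sp, +P and Acc from a confusion matrix.

    cm[i, j] counts beats of true class i predicted as class j.
    Each class is scored one-vs-rest.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined if the denominator is zero

    metrics = {}
    for k, name in enumerate(classes):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp   # true class k predicted as another class
        fp = cm[:, k].sum() - tp   # other classes predicted as class k
        tn = total - tp - fn - fp
        metrics[name] = {
            "Se": ratio(tp, tp + fn),
            "Sp": ratio(tn, tn + fp),
            "+P": ratio(tp, tp + fp),
            "Acc": ratio(tp + tn, total),
        }
    return metrics
```

For example, if 2 S beats are predicted as N, those 2 beats count as FN for S and as FP for N, lowering the sensitivity of S and the positive predictivity of N at the same time.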
In this paper, sensitivity, specificity, positive predictivity, and accuracy are used to evaluate classifier performance [39]. Sensitivity (Se) is the proportion of positive cases that are judged positive. Specificity (Sp) is the proportion of negative cases that are judged negative. The positive predictive value (+P), also known as precision, is the proportion of samples judged positive that are truly positive. Accuracy (Acc) is the ratio of correctly classified samples to all samples. The four indicators are calculated with formulas (18)-(21):

$$Se = \frac{TP}{TP + FN} \qquad (18)$$

$$Sp = \frac{TN}{TN + FP} \qquad (19)$$

$$+P = \frac{TP}{TP + FP} \qquad (20)$$

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \qquad (21)$$

… 98.59%. … (6) and test (7), the deficiency of this feature combination is the lack of interval features, which are an important basis for identifying S and F heartbeats. In test (9), heartbeats are classified based on sets C and D. The average classification accuracy is 96.59%, but the sensitivity for F heartbeats is only 39.24%. Table 11 shows the classification results of the test on the interval features and QRS area. Compared with test (7), this shows that the 470 single-heartbeat morphological features are better at distinguishing F heartbeats. Test (10) classifies heartbeats based on sets C and E, with an average classification accuracy of 98.51%. Table 12 shows the classification results of the test on the wavelet coefficients and interval features. However, this feature combination cannot effectively distinguish morphologically similar heartbeat categories, such as N and S, or V and F. In test (11), heartbeats are classified based on sets D and E, with an average classification accuracy of 98.24%. Table 13 shows the classification results of the test on the QRS area and wavelet coefficients.
Similar to test (10), this feature combination also cannot effectively distinguish morphologically similar heartbeat categories.
Test (12) classifies heartbeats based on sets B, C, and D, with an average classification accuracy of 99.09%. Table 14 shows the classification results of the test on the interval features, QRS area, and the 470 single-heartbeat morphological features. Compared with test (9), the 470 single-heartbeat morphological features effectively improve the overall recognition rate. Test (13) classifies heartbeats based on sets B, C, and E, with an average accuracy of 99.00%. Table 15 shows the classification results of the test on the 470 single-heartbeat morphological features, interval features, and wavelet coefficients. Compared with test (14), the interval features perform slightly better overall than the QRS area. In test (14), heartbeats are classified based on sets B, D, and E; the sensitivity for S and F heartbeats is not high, and the average classification accuracy is 98.92%. Table 16 shows the classification results of the test on the 470 single-heartbeat morphological features, QRS area, and wavelet coefficients. Compared with test (16), this combination lacks the interval features, which are an important basis for judging the heartbeat category. In test (15), heartbeats are classified based on sets C, D, and E, with an average classification accuracy of 98.44%. Table 17 shows the classification results of the test on the interval features, QRS area, and wavelet coefficients. Compared with test (16), this shows that the 470 single-heartbeat morphological features are necessary for distinguishing heartbeat categories. In test (16), heartbeats are classified based on sets B, C, D, and E, with an average classification accuracy of 99.11%.
Table 18 shows the classification results of the test on the 470 single-heartbeat morphological features, interval features, QRS area, and wavelet coefficients. All the above experiments show that the optimal feature combination is sets B, C, D, and E. Figure 11 presents the classification results and performance of the optimal feature combination with the AdaBoost + Random Forest model.
Because the number of trees (n_estimators) in the base random forest is a free parameter, different settings produce different classification results. In general, too few trees underfit, while too many do not improve the model significantly, so this parameter must be chosen carefully. Table 19 gives the classification results of the optimal feature combination for different numbers of trees; the classifier performs best when n_estimators is 70. The purpose of this experiment is to compare the classification performance of multiple classifiers. The random forest algorithm recognizes the small-sample classes in the imbalanced experimental data of this article better, so the random forest model is used as the base classifier for AdaBoost ensemble learning. In test (17), accuracy is used as the evaluation indicator to … In this paper, after comparing the classification performance of these classifiers under each feature combination, the random forest with the best performance is selected as the base classifier for ensemble learning. Tests (1)-(16) give the detailed results of AdaBoost + Random Forest classification.

Comparative Results with Other Implemented Classification Methods.
The system developed in this paper performs better on the classification indicators than [43,51-54]. Table 21 presents a comparison with previous studies. Compared with [21,55-58], the sensitivity and positive predictive value for N, S, and V heartbeats are greatly improved, and compared with other methods, the heartbeat classification accuracy is improved. The experimental results show that the method is advantageous in distinguishing N (nonectopic beats), V (ventricular ectopic beats), and Q (unknown beats). Using the AdaBoost + Random Forest model to classify arrhythmia yields an accurate and objective heartbeat analysis system.

Clinical Data Test.
To verify the practical effect, real data were collected for testing, with disease labels given by doctors. The data are stored in 32-bit floating-point binary format, sampled at 1000 Hz, with 12 leads per record. Figure 12 presents the ECG classification results. These results indicate that the system has clinical significance and practical value in the diagnosis of arrhythmia.
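The text specifies only 32-bit floats, 1000 Hz, and 12 leads for the clinical recordings. A minimal parsing sketch follows; little-endian byte order and sample-by-sample interleaving of the leads are assumptions, and the function name is hypothetical.

```python
import numpy as np


def load_12lead_float32(raw: bytes, n_leads: int = 12) -> np.ndarray:
    """Parse raw little-endian float32 samples into an (n_samples, n_leads) array.

    Assumes the samples are interleaved lead by lead:
    s0_lead1, ..., s0_lead12, s1_lead1, ...
    """
    data = np.frombuffer(raw, dtype="<f4")
    if data.size % n_leads:
        raise ValueError("byte stream is not a whole number of multi-lead samples")
    return data.reshape(-1, n_leads)
```

If the files turn out to be stored lead by lead instead (all samples of lead 1, then lead 2, ...), `data.reshape(n_leads, -1).T` would give the same (n_samples, n_leads) layout.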

Conclusion
This paper presents a novel, effective arrhythmia classification system based on multi-feature fusion with optimal feature selection and an AdaBoost + Random Forest model. With this system, doctors can examine the similarities and differences in features through the machine learning model. The proposed arrhythmia classification system achieves a high recognition rate and is of great significance for clinical application. Although heartbeat classification has made significant progress in diagnosing cardiovascular diseases, the sensitivity of this method for S and F heartbeats needs to be improved. To achieve better classification, future research will focus on improving the recognition of S and F heartbeats.
Highlights of this paper are as follows: (1) The system automates the whole process from ECG signal collection and intelligent analysis to result presentation, effectively improving the efficiency of doctors' diagnosis. (2) For the imbalanced ECG data set, a novel AdaBoost + Random Forest approach is proposed for the heartbeat classification system. (3) The framework learns the potential correlations within the data of an individual heartbeat and the relationships between different heartbeats. (4) Among the eight classifiers examined, AdaBoost + Random Forest achieves the best performance on the obtained optimal feature set.

Data Availability
(1) All data sets used to support the findings of this study are included within the paper. (2) All data sets used to support the findings of this study were obtained from the publicly available MIT-BIH database from the Massachusetts Institute of Technology, accessible at https://www.physionet.org/cgi-bin/atm/ATM. (3) The code used to support the findings of this study has not been made available because it is part of a national project and is a trade secret.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work. The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.