Support Vector Machine-based Multi-scale Entropy of Curves Recognition for Electrocardiogram Data

Authors CCW and CDC designed the study and the protocol and wrote the first draft of the manuscript. Author CDC collected the data and searched the literature. Authors CCW and CDC analyzed the results.

ABSTRACT
Objective: Multiscale entropy (MSE) analysis has been widely used to analyze the physiological signals in the frequency domain. A physiological system whose MSE curve shows higher complexity is better able to adapt to environmental change. Most people rely on subjective experience to distinguish different complexity groups of MSE curves. When the difference between curves is hard to distinguish, the results are often misinterpreted.
Methodology: In this study, four features were designed so that the support vector machine technique could be used to develop an automatic recognition procedure for the MSE curve.
Results: An electrocardiogram dataset was used to illustrate the proposed analytical process. The results show that AUC is not the only MSE curve feature that should be employed, and that the newly designed features may increase the recognition ability of MSE curves for electrocardiogram data.
Conclusion: The study results imply that the proposed process can facilitate MSE recognition among nonprofessionals.


INTRODUCTION
Physiological signals provide valuable information for analyzing and monitoring human health status. Environmental changes or physiological conditions may cause physiological systems to be up- or down-regulated by interacting mechanisms that operate across multiple spatial and temporal scales. Examples of such signals are the outputs of physiological monitoring systems such as electromyography (EMG), electroencephalography (EEG), and electrocardiography (ECG). Costa et al. [1] stated that these signals frequently exhibit complex fluctuations that contain information regarding the underlying dynamics. Multiscale entropy (MSE) analysis, as proposed by Costa et al. [2], is a common method for quantifying the complexity, irregularity, or randomness of physiological time series signals. Physiological signals with higher complexity typically indicate a healthier physiological system [2]. Norris et al. [3] suggested that complexity might be a new clinical biomarker for outcomes: the MSE curves of a patient's heart rate within hours of hospital admission can be used to predict his or her mortality.
Traditional MSE methods that define complexity involve calculating the area under the MSE curve (AUC) or comparing the sample entropy (SampEn) values on the same scale. Trunkvalterova et al. [4] used MSE to detect autonomic dysregulation in young patients with type 1 diabetes mellitus (DM). They found that the MSE of a young patient with DM was significantly reduced on scales 2 and 3. Similarly, SampEn values of systolic blood pressure (SBP) and diastolic blood pressure (DBP) on scale 3 were significantly lower in patients with DM than in healthy participants. Park et al. [5] used MSE to analyze EEG signals from patients with various pathological conditions of Alzheimer's disease (AD) to measure the complexity of the signal. They found that the MSE curves of patients with severe AD showed lower levels of entropy than those of healthy participants and patients with mild cognitive impairment (MCI). Hung and Jiang [6] used MSE to investigate the effect of fatigue on cardiac dynamics during long-term web browsing. They found that the cardiac dynamics of participants who were browsing the web were less complex than those of healthy young participants under free-running conditions. MSE analysis can be used to analyze physiological time series signals as well as to investigate balance problems. Jiang et al. [7] used three cases to introduce MSE analysis applied to the center of pressure (COP) signal and to compare the COP signals of young and elderly participants. They found higher levels of complexity in young participants than in elderly participants.
Although the MSE analysis was often sufficient for distinguishing physiological conditions, Park et al. [5] showed that different physiological conditions may have similar AUCs. In such situations, using only AUC cannot distinguish between the MSE curves during varying physiological conditions; thus, other features of the MSE curves must be considered. This study examined four features from the MSE curves and used a support vector machine (SVM) to classify the various patterns of the MSE curves according to combinations of the four features. The analytical results indicate that the new design features may increase the ability to recognize data from the MSE curves.

METHODOLOGY
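To evaluate a one-dimensional discrete time series signal $\{x_1, \ldots, x_i, \ldots, x_N\}$, we constructed consecutive coarse-grained time series $\{y^{(\tau)}\}$ determined by the scale factor $\tau$:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le N/\tau \tag{1}$$

The SampEn of each coarse-grained time series was then calculated and plotted as a function of the scale factor. SampEn is a modification of the approximate entropy (ApEn). The greatest difference between these two entropies is that SampEn does not count self-matches, whereas ApEn does. SampEn has the advantage of being less dependent on the time series length and has greater consistency over a broad range of possible r, m, and N values. For pattern length m, tolerance r, and finite length N, SampEn is calculated as follows:

$$\mathrm{SampEn}(m, r, N) = \ln \frac{\sum_{i=1}^{N-m} n_i'^{\,m}}{\sum_{i=1}^{N-m} n_i'^{\,m+1}} \tag{2}$$

where $n_i'^{\,m}$ is the number of length-$m$ vectors within tolerance $r$ of the $i$th length-$m$ vector; it differs from the corresponding ApEn count $n_i^{m}$ to the extent that for SampEn self-matches are not counted ($i \neq j$) and $1 \le i \le N - m$.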
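The coarse-graining and SampEn steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the distance between templates is assumed to be the maximum absolute difference with a match when it is at most r, and r is assumed to be 0.15 times the standard deviation of the original series, kept fixed across scales.

```python
import math

def coarse_grain(x, tau):
    """Average consecutive non-overlapping windows of length tau."""
    n_windows = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]

def sample_entropy(x, m, r):
    """SampEn(m, r, N) = ln(B / A), where B and A count template matches
    of length m and m + 1; self-matches are never counted."""
    n_templates = len(x) - m  # same N - m templates for both lengths

    def matches(length):
        count = 0
        for i in range(n_templates):
            for j in range(i + 1, n_templates):  # j > i excludes self-matches
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = matches(m)
    a = matches(m + 1)
    if a == 0 or b == 0:
        return float("nan")  # undefined when no matches are found
    return math.log(b / a)

def mse_curve(x, max_scale=20, m=2, r_factor=0.15):
    """SampEn of the coarse-grained series for scales 1..max_scale.
    Assumption: r = r_factor * SD of the original series at every scale."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    r = r_factor * sd
    return [sample_entropy(coarse_grain(x, tau), m, r)
            for tau in range(1, max_scale + 1)]
```

The pairwise loop is quadratic in the series length, so it is suitable only for short illustrative series; long R-R recordings would need a faster matching routine.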
A two-stage process was proposed to analyze the MSE curve. The first stage emphasizes describing the features of the MSE curves, and the second stage classifies the MSE curves using SVM.

Fig. 1. The coarse-graining procedure for scales 2 and 3

Features of the MSE Curves
The MSE curves are used to compare the relative complexity of normalized time series (same variance at scale 1) by the following guidelines: (1) If the entropy values for the majority of the scales are highest for one time series, that series is considered the most complex. (2) A monotonic decrease in the entropy values indicates that the original signal contains information only on the smallest scale. Because higher entropy values connect and form a curve with a larger area, guideline (1) can be interpreted to mean that a larger area under the MSE curve represents a more complex physiological signal.
This study designed four features of the MSE curves to form a coordinated matrix and used this matrix to describe the curve. The features are defined as follows.

Feature 1: AUC
AUC is the most commonly used feature to describe the complexity of an MSE curve. A larger AUC indicates a physiological signal with higher complexity. We approximated the AUC by summing trapezoidal areas:

$$\mathrm{area}(i) = \sum_{j=1}^{L-1} \frac{\mathrm{SampEn}_i(j) + \mathrm{SampEn}_i(j+1)}{2} \tag{3}$$

where area(i) is the AUC between curve i and the X axis, j represents the scale number, and L is the maximum scale number. In this study, L is set to 20. $\mathrm{SampEn}_i(j)$ represents the ith curve's SampEn value in scale j. Fig. 2a shows the gray area under the MSE curve between scales 1 and 20.
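As a small illustration of the trapezoidal approximation, the sketch below sums the trapezoids between adjacent scales. The function name `mse_auc` is hypothetical, and a unit spacing between scales is assumed.

```python
def mse_auc(sampen):
    """Trapezoidal area under an MSE curve.

    sampen[j] is the SampEn value at scale j + 1; adjacent scales are
    assumed to be one unit apart, so each trapezoid has width 1.
    """
    return sum((sampen[j] + sampen[j + 1]) / 2 for j in range(len(sampen) - 1))
```

For a 20-scale curve this adds 19 trapezoids, covering scales 1 to 20.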

Feature 2: The slope of maximum difference of small-scale entropies
The MSE curves for physiological signals typically show a relatively substantial increase or decrease on small scales and gradually stabilize on large scales. Therefore, we selected the slope of the maximum difference of small-scale entropies as a feature. According to the MSE curve trend, in this study, the first seven scales (approximately one-third of all scales) were defined as small scales. Fig. 2b shows the slope of maximum difference in the first seven scales. Feature 2 is calculated as follows:

$$\mathrm{slope}(i) = \frac{\mathrm{SampEn}_i(j) - \mathrm{SampEn}_i(k)}{j - k} \tag{4}$$

where slope(i) means the ith curve's maximum slope of the SampEn of the first seven scales, $\mathrm{SampEn}_i(j)$ is the maximum SampEn value of the first seven scales, and $\mathrm{SampEn}_i(k)$ represents the minimum SampEn value of the first seven scales. The j and k variables are the numbers of the scales that have the maximum and minimum SampEn values, respectively.
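A minimal sketch of Feature 2 follows. The function name is hypothetical, and two details are assumptions not stated in the text: when several scales share the maximum (or minimum) value the first one is used, and a flat curve is given a slope of zero.

```python
def small_scale_slope(sampen, n_small=7):
    """Slope between the maximum and minimum SampEn values among the
    first n_small scales (the 'small scales')."""
    small = sampen[:n_small]
    positions = range(len(small))
    j = max(positions, key=lambda s: small[s])  # position of the maximum
    k = min(positions, key=lambda s: small[s])  # position of the minimum
    if j == k:
        return 0.0  # flat curve: no difference to measure (assumption)
    # Positions are zero-based, but j - k equals the scale-number difference.
    return (small[j] - small[k]) / (j - k)
```

The sign of the result shows whether the curve rises (positive) or falls (negative) over the small scales.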

Feature 3: Average entropy value on large scales
Although the values of entropy increase in smaller scales for healthy participants, in larger scales the entropy values become stable. Moreover, the values of entropy obtained from elderly participants are notably lower than those from younger participants. Therefore, different physiological conditions, as well as aging, may be reflected in average entropy values on large scales. According to the MSE curve trend, this study used the average entropy of the last five scales (one-fourth of all scales) as the third feature, as shown in Fig. 2c. Feature 3 is calculated as follows:

$$\mathrm{ael}(i) = \frac{1}{5} \sum_{j=16}^{20} \mathrm{SampEn}_i(j) \tag{5}$$

where $\mathrm{SampEn}_i(j)$ is the ith curve's SampEn value in scale j, and the ael(i) variable is the average SampEn value of the ith MSE curve over the last five scales.
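Feature 3 reduces to a plain average over the tail of the curve. The sketch below assumes a curve stored as a list of SampEn values ordered by scale; the function name is hypothetical.

```python
def large_scale_mean(sampen, n_last=5):
    """Average SampEn over the last n_last scales of an MSE curve
    (scales 16 to 20 for a 20-scale curve)."""
    tail = sampen[-n_last:]
    return sum(tail) / len(tail)
```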

Feature 4: The variation in the absolute difference between the two scales that comprise the first half of the scale
In this study, although the slope of the maximum difference in the first seven scales is a feature, MSE curves may have differing patterns but similar slopes. Feature 4 was therefore selected to separate such curves. Feature 4 is the variation in the absolute difference between the entropy values of adjacent scales in the first half of the scale range. The left side of Fig. 2d is an MSE curve plot. Each dot on the right side of Fig. 2d is the absolute value of the difference between two adjacent SampEn values. Feature 4 is calculated as follows:

$$\mathrm{vhs}(i) = \sigma\left(\left|\mathrm{SampEn}_i(j+1) - \mathrm{SampEn}_i(j)\right|\right), \quad j = 1, \ldots, 9 \tag{6}$$

where the suffix j is the number of the scale, $\mathrm{SampEn}_i(j)$ is the ith curve's SampEn value in scale j, and vhs(i) means the ith curve's standard deviation of the absolute difference between adjacent scales in the first half of the scale range. The $\sigma(\cdot)$ operator represents the standard deviation of all the absolute differences from j = 1 to j = 9 in the ith curve. Because of limited data length, signals are used only to calculate the entropies of the first 20 scales; thus, feature 4 focuses on the variations of the first ten scales.
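Feature 4 can be illustrated as follows. This sketch rests on two assumptions that the text does not settle: the differences are taken between adjacent scales within the first ten scales, and the sample (n - 1) standard deviation is used. The function name is hypothetical.

```python
import statistics

def half_scale_variation(sampen, n_half=10):
    """Standard deviation of the absolute differences between adjacent
    SampEn values within the first n_half scales."""
    first = sampen[:n_half]
    diffs = [abs(b - a) for a, b in zip(first, first[1:])]
    return statistics.stdev(diffs)  # sample standard deviation (assumption)
```

A curve that rises or falls in equal steps gives a value of zero, whereas a curve with uneven steps gives a larger value even when its overall slope is the same.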
After calculating the four features of the MSE curves in an MSE plot, we normalized each feature vector so that the features were not measured on differing scales. The normalization formula is as follows:

$$z(i) = \frac{x(i) - \mathrm{mean}(x)}{\mathrm{stdev}(x)} \tag{7}$$

where z(i) and x(i) are the standardized value and real value of the ith curve, respectively. For example,

$$\mathrm{norm\_area}(i) = \frac{\mathrm{area}(i) - \mathrm{mean}(\mathrm{area})}{\mathrm{stdev}(\mathrm{area})} \tag{8}$$

Here norm_area(i) is the normalized value of the area of the ith curve. The mean(area) variable refers to the average of all the curve areas, and stdev(area) is the standard deviation of all the curve areas. After normalization, the analysis produces four normalized feature vectors; each curve can be drawn in spatial coordinates according to the four features.
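The normalization step amounts to a z-score applied to each feature separately. A minimal sketch, with hypothetical function names and the sample standard deviation assumed:

```python
import statistics

def zscore(values):
    """Standardize one feature across all curves: (x - mean) / stdev."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]

def normalize_features(rows):
    """rows[i] holds the features of curve i; each column (feature)
    is standardized independently."""
    columns = [zscore(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

A feature that takes the same value for every curve has zero standard deviation and cannot be standardized this way; it would need to be dropped or handled separately.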

Using SVM to Classify the MSE Curves
SVM is a supervised learning technique for classification [9,10]. Chen et al. [11] stated that SVM performs well for problems with small training sets and nonlinear, multidimensional data. Thus, we used SVM to test the classification accuracy of various physiological states using different feature combinations of the MSE curves. Assume a set of points as follows:

$$D = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p\}, \quad i = 1, 2, \ldots, n \tag{9}$$

where $y_i$ indicates the class of participant $i$, each $\mathbf{x}_i$ is a p-dimensional real-valued feature vector, and $i = 1, 2, \ldots, n$ is the participant's number. The aim of the SVM is to identify a maximum-margin hyperplane that divides the points into different physiological states (healthy middle-aged, healthy elderly, or patients with congestive heart failure, CHF). This hyperplane can be written as the set of points $\mathbf{x}$ satisfying

$$\mathbf{w} \cdot \mathbf{x} - b = 0 \tag{10}$$

where $\mathbf{w}$ is a normal vector perpendicular to the hyperplane, and $b / \lVert \mathbf{w} \rVert$ is the offset of the hyperplane from the origin along the normal vector $\mathbf{w}$. Geometrically, the width of the margin is $2 / \lVert \mathbf{w} \rVert$, so the margin is widest under the minimum $\lVert \mathbf{w} \rVert$.
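For reference, the textbook hard-margin problem behind a maximum-margin hyperplane can be written as below. This is the standard two-class form with labels assumed to be $y_i \in \{-1, +1\}$; the text does not state how the three physiological states were combined (for example, through pairwise classifiers), nor which kernel or soft-margin setting was used, so the block should be read as background and not as the exact model fitted in this study.

```latex
\begin{aligned}
& \mathbf{w} \cdot \mathbf{x}_i - b \ge +1 \quad \text{if } y_i = +1, \\
& \mathbf{w} \cdot \mathbf{x}_i - b \le -1 \quad \text{if } y_i = -1, \\
& \text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}, \\
& \min_{\mathbf{w},\, b} \ \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
  \quad \text{subject to} \quad
  y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1
  \quad \text{for all } i .
\end{aligned}
```

Minimizing $\lVert \mathbf{w} \rVert$ is therefore equivalent to maximizing the distance between the two bounding hyperplanes.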

ANALYSIS AND DISCUSSION
In this study, a dataset from PhysioNet [12] was used to demonstrate the performance of the proposed method. The dataset included two subsets of participants. The first subset was labeled as the regular sinus rhythm R-R interval database. The data comprised the heart rate R-R intervals of 54 healthy participants. The 46 elderly participants were aged 65.87±3.97 years, ranging between 58 and 76 years, and the eight middle-aged male participants were aged 35.44±4.52 years, ranging between 28.5 and 40 years. The other subset was labeled the CHF R-R interval database and contained 29 patients (aged 55.28±11.60 years) diagnosed with CHF. All data were recorded using an ECG Holter monitor (sampled at 128 Hz) for six hours while the participants were awake. This study used the R-R intervals of healthy participants and patients with CHF and calculated the MSE values of each participant by setting the parameters of MSE to m = 2 and r = 0.15. The MSE curves of the R-R interval signals for all participants are presented in Fig. 3.
Fig. 3 shows the MSE curves of the heart rates from participants in three physiological states. As illustrated in Fig. 3a, most of the MSE curves of patients with CHF were lower than those of the other two groups. For healthy middle-aged participants, the MSE curves were mostly higher than the other curves. However, some of the patients with CHF and healthy elderly participants had AUCs similar to those of the healthy middle-aged participants. The average MSE curve for the healthy middle-aged participants was higher than that for the other two groups. The patients with CHF had the lowest average MSE curve, as shown in Fig. 3b. The healthy middle-aged participants had R-R interval signals with greater complexity than those of the patients with CHF. The complexities of the R-R signals of healthy elderly participants were at an intermediate level.
After extracting the four features of the MSE curves (discussed above), we used the SVM method to compare the classification accuracy of different feature combinations. Using random sampling, we selected 70% of the samples as the training data; the remaining 30% of the data set was used to test the classification effect. Each feature combination was tested ten times, and we used the average accuracy rate of the SVM classification results to compare the effectiveness of all feature combinations. The classification results of all the feature combinations are shown in Table 1.

As shown in Table 1, the average accuracy rate for ten replications of the SVM classification using feature 1 (AUC) alone was 60.1%. When all the features were used to classify the MSE curves by SVM, the average accuracy rate of ten replications was 65.5%. However, when we used only features 1, 2, and 4 as a feature combination, we achieved an average accuracy rate of 68.795%. Thus, using fewer features than the full set gave a higher accuracy rate.
Among all the analysis results, the highest accuracy rate, 72.3%, was obtained using features 1, 2, and 4 as the feature combination. A comparison of the classification results of using features 1, 2, and 4 as a feature combination with that of using only AUC as the feature shows that using features 1, 2, and 4 as a feature combination for SVM provides a higher accuracy rate.
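The evaluation protocol described above (a random 70/30 split, repeated ten times for each feature combination) can be sketched as follows. The function names and the `fit_predict` callable are hypothetical: any classifier that trains on the first two arguments and returns predicted labels for the third can be plugged in, for example a wrapper around an SVM implementation such as scikit-learn's `sklearn.svm.SVC`. Enumerating every non-empty subset of the four features is also an assumption about which combinations were compared.

```python
import random
from itertools import combinations

def feature_combinations(n_features=4):
    """All non-empty subsets of feature indices, smallest subsets first."""
    indices = range(n_features)
    return [c for k in range(1, n_features + 1)
            for c in combinations(indices, k)]

def mean_accuracy(X, y, columns, fit_predict, n_runs=10, train_frac=0.7, seed=0):
    """Average test accuracy over n_runs random train/test splits,
    using only the feature columns listed in `columns`."""
    rng = random.Random(seed)
    rows = [[row[c] for c in columns] for row in X]
    accuracies = []
    for _ in range(n_runs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        cut = int(round(train_frac * len(order)))
        train, test = order[:cut], order[cut:]
        predicted = fit_predict([rows[i] for i in train],
                                [y[i] for i in train],
                                [rows[i] for i in test])
        hits = sum(p == y[i] for p, i in zip(predicted, test))
        accuracies.append(hits / len(test))
    return sum(accuracies) / n_runs
```

Calling `mean_accuracy` once per entry of `feature_combinations(4)` yields one average accuracy per feature combination, which is the kind of comparison summarized in Table 1.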

CONCLUSION
MSE analysis is frequently used to measure system complexity. Although MSE is easily employed when comparing various physiological conditions, feature selection for the MSE curves is a critical task. This study identified four features to describe MSE curves and compared the feature combinations applied to classify the MSE curves with SVM. The electrocardiogram data from PhysioNet were used to illustrate the proposed analytical process. The results showed that the feature combination of AUC, the slope of maximum entropy difference in scales 1 to 7, and the variation in the absolute difference between adjacent scales for scales 1 to 10 provided the highest accuracy rate of all the feature combinations. The contributions of this paper can be summarized in two points. First, AUC is not the only feature that can be used for classifying the MSE curves. We suggest the slope of maximum entropy difference for scales 1 to 7 and the variation in the absolute difference between adjacent scales for scales 1 to 10 as additional features. Second, using fewer features than the full set can give a higher accuracy rate.
After classifying the MSE curves into several groups, creating estimation models with groups is a critical task. The use of estimation models can help to assess the complexity and physiological condition of patients and can provide invaluable information.

CONSENT
It is not applicable.

ETHICAL APPROVAL
It is not applicable.

COMPETING INTERESTS
Authors have declared that no competing interests exist.