Heart rate dynamics distinguish among atrial fibrillation, normal sinus rhythm and sinus rhythm with frequent ectopy

Atrial fibrillation (AF) is usually detected by inspection of the electrocardiogram waveform, a task made difficult when the signal is distorted by noise. The RR interval time series is more frequently available and accurate, yet linear and nonlinear time series analyses that detect highly varying and irregular AF are vulnerable to the common finding of frequent ectopy. We hypothesized that different nonlinear measures might capture characteristic features of AF, normal sinus rhythm (NSR), and sinus rhythm (SR) with frequent ectopy in ways that linear measures might not. To test this, we studied 2722 patients with 24 h ECG recordings in the University of Virginia Holter database. We found dynamical phenotypes for the three rhythm classifications. As expected, AF records had the highest variability and entropy, and NSR the lowest. SR with ectopy could be distinguished from AF, which had higher entropy, and from NSR, which had different fractal scaling, measured as higher detrended fluctuation analysis slope. With these dynamical phenotypes, we developed successful classification strategies, and the nonlinear measures improved on the use of mean and variability alone, even after adjusting for age. Final models using all variables had excellent performance, with positive predictive values for AF, NSR and SR with ectopy as high as 97, 98 and 90%, respectively. Since these classifiers can reliably detect rhythm changes utilizing segments as short as 10 min, we envision their application in noisy settings and in personal monitoring devices where only RR interval time series may be available.

Keywords: cardiac rhythm classification, RR series, nonlinear analysis, sample entropy, atrial fibrillation
(Some figures may appear in colour only in the online journal)

Introduction
The dynamics of the heart beat have been studied for several decades and there is consensus that advanced mathematical analyses that describe its hallmark features, variability and complexity, yield important clinical information. While the majority of analyses have been of normal sinus rhythm (NSR), dynamical analyses of the RR interbeat intervals time series have also been advanced toward classifying cardiac rhythms. For example, it is reasonable to think that NSR might be distinguished from atrial fibrillation using nonlinear measures, and we previously developed the coefficient of sample entropy for this purpose (Lake and Moorman 2011). Similarly, there have been successful methods developed for atrial fibrillation (AF) detection based on Poincaré plots (Sarkar et al 2008) and cumulative distribution functions (Tateno and Glass 2001), as well as linear measures. These approaches might be confused, though, by sinus rhythm (SR) with frequent ectopy, for here the overall variability might rise to the level of atrial fibrillation. Since AF, SR with frequent ectopy, and NSR account for the majority of non-paced stable cardiac rhythms, methods in this area are of widespread importance.
AF commonly eludes diagnosis because of its paroxysmal nature, and, once found, decisions about its therapy are best informed by knowledge of its burden (Prystowsky 2000). For patients with AF, clinicians must make complex decisions about oral anticoagulation, rate control with either drugs or cardiac ablation with pacemaker therapy, and rhythm control with either drugs or ablation procedures. The implications for anticoagulation therapy alone are enormously important, as the rate of stroke goes up sharply with even short episodes of AF (Healey et al 2012).
Another clinical diagnosis of importance is SR with frequent ectopic beats. Atrial ectopy can be a harbinger of AF, and ventricular ectopy leads to poorer prognosis even in patients with no obvious heart disease (Lee et al 2012), possibly as a result of reversible cardiomyopathy (Yokokawa et al 2013).
Usually, AF is clinically diagnosed by the absence of P waves in the electrocardiogram (ECG) and a rapid irregular ventricular rate. However, even high fidelity monitoring can perform poorly when the ECG waveform is noisy or distorted. On the other hand, the heart beat patterns in the ECG are less prone to artifacts and noise, such as baseline wandering, and they could be used in clinical contexts where the ECG is noisy or not available and other pulsatile signals are used, such as arterial blood pressure or the plethysmograph (Park et al 2009).
The analysis of the RR interbeat interval time series is frequently applied for rhythm detection in implanted devices, where only ventricular electrographic recordings and RR intervals are available (Lake and Moorman 2011, DeMazumder et al 2013). However, noninvasive devices have reduced confidence in detecting AF based on the heart rate or RR series alone, and thus they are not commonly adopted for heart rhythm identification.
We propose a new approach in which entropy measurements are used in conjunction with other time series measurements in the time and nonlinear dynamical domains. The insight is that higher dimensional characterizations of the time series properties successfully separate the classes of rhythm. Here, we test three dynamical measures: entropy, local dynamics and fractal scaling. The coefficient of sample entropy (COSEn) is efficient at distinguishing AF from NSR, but yields intermediate values for SR with ectopy (Lake and Moorman 2011). The local dynamics score (LDs) detects the combination of reduced heart rate variability (HRV) with ectopy, and is a powerful predictor of four year mortality in ambulatory patients undergoing Holter monitoring (Moss et al 2014). Finally, the familiar detrended fluctuation analysis (DFA) (Peng et al 1995) informs on long-range correlations and has been studied in aging (Schmitt and Ivanov 2007), in heart failure (Huikuri et al 2003, Cerutti et al 2007) and in predicting AF after coronary bypass surgery (Tarkiainen et al 2008).
One approach to rhythm classification using only RR intervals is to examine the dynamics of the time series. For example, we expect AF to be irregular, SR with ectopy to have a regular baseline with intermittent departures, and NSR to be more consistently regular than either. Bravi and coworkers have reviewed the lexicon of RR interval time series variabilities and dynamics, categorizing more than 100 measures into the time, frequency, phase, nonlinear dynamical, statistical, geometric, energetic, informational and invariant domains (Bravi et al 2011). Many of these methods are at least modestly effective in distinguishing patients at high or low risk, or degrees of illness severity for patients in NSR (that is, with little or no ectopy). Some have been effective in distinguishing AF from other rhythms, principally NSR. Examples include the coefficient of sample entropy (Lake and Moorman 2011, DeMazumder et al 2013), a histographic method (Tateno and Glass 2001), and a Poincaré plot-based method (Sarkar et al 2008). There is, however, no unified and effective algorithm that distinguishes NSR and AF from the condition of SR with atrial and ventricular ectopy.

Study population
We studied RR interval time series from 24 h Holter recordings collected from 2722 consecutive patients at the University of Virginia (UVa) Heart Station from 12/2004 to 10/2010, and clinical characteristics of these patients are reported in a previous work (Moss et al 2014).
The age of the patients varied from 0 to 100 years, with an average value of 47 ± 25 years. AF, premature atrial contraction (PAC) and premature ventricular contraction (PVC) labels were obtained from an automatic classifier (Philips Holter Software). The RR series were subdivided into 377 285 10 min segments, accounting for 96% of the maximum possible recording time. Each 10 min segment was classified as AF if the burden of AF was greater than 5% (i.e. for more than 30 s), as SR with ectopy if the burden of PAC or PVC was more than 10%, as NSR otherwise. After this classification of individual segments, we found that the dataset was composed of 79% NSR, 8% AF and 13% SR with ectopy. The three categories are mutually exclusive and reflect clinical practice.
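The segment labeling rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the fractional burden inputs are assumptions, and checking AF before ectopy is an assumed precedence that the text implies only by stating that the categories are mutually exclusive. How PAC and PVC burdens are combined into a single ectopy burden is left to the caller.

```python
def label_segment(af_burden, ectopy_burden):
    """Label one 10 min segment from burdens given as fractions in [0, 1].

    Thresholds follow the text: AF if AF burden > 5%, SR with ectopy if
    ectopic (PAC or PVC) burden > 10%, NSR otherwise.
    Assumption: AF takes precedence when both thresholds are exceeded.
    """
    if af_burden > 0.05:
        return "AF"
    if ectopy_burden > 0.10:
        return "SR with ectopy"
    return "NSR"
```

For example, a segment with 6% AF burden is labeled AF regardless of its ectopy burden, while a segment with no AF and exactly 10% ectopy remains NSR because the comparison is strict.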

Heart rate metrics
We computed means and standard deviations (SD) for 30 s segments and averaged the results for each 10 min segment. Thus our measure of standard deviation is an average of the standard deviations of up to twenty 30 s segments. To investigate the nonlinear dynamics, we computed the coefficient of sample entropy (COSEn), detrended fluctuation analysis (DFA) and the local dynamics score (LDs).
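A minimal sketch of this two-level averaging follows. Several details are assumptions not stated in the text: windows are formed by accumulating RR intervals until 30 s have elapsed, a trailing partial window is discarded, and the sample (n − 1) standard deviation is used. The function name is hypothetical.

```python
import statistics

def windowed_mean_sd(rr_ms, window_ms=30_000):
    """Return (mean of window means, mean of window SDs) for RR intervals in ms.

    Assumptions: consecutive windows are closed once their cumulative RR
    duration reaches window_ms; an incomplete final window is dropped;
    SD is the sample standard deviation.
    """
    means, sds = [], []
    window, elapsed = [], 0.0
    for rr in rr_ms:
        window.append(rr)
        elapsed += rr
        if elapsed >= window_ms:
            if len(window) >= 2:  # stdev needs at least two beats
                means.append(statistics.fmean(window))
                sds.append(statistics.stdev(window))
            window, elapsed = [], 0.0
    if not means:
        raise ValueError("series shorter than one full window")
    return statistics.fmean(means), statistics.fmean(sds)
```

Averaging the SDs of short windows, rather than taking one SD over the whole 10 min, makes the measure less sensitive to slow drifts in heart rate within the segment.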

Coefficient of sample entropy (COSEn)
COSEn is an entropy measure derived from the sample entropy (SampEn) and designed specifically to detect AF in very short RR interval time series (Lake and Moorman 2011). Generally, entropy estimators measure the degree of regularity of a signal by counting how many template patterns repeat themselves. Repeated patterns imply order and lead to low values of entropy. In particular, sample entropy is the negative natural logarithm of the conditional probability that two sequences of length m that match within tolerance r will also match at the (m + 1)st point. Defining A as the total number of matches of length m + 1 and B as the total number of matches of length m, the conditional probability (cp) is equal to A/B. The sample entropy is computed as:

$$\mathrm{SampEn} = -\ln(cp) = -\ln\left(\frac{A}{B}\right)$$

If A and B are equal, which means that the time series is very regular, the entropy measure is zero, whereas if A is smaller than B, this leads to a higher value of entropy. The choice of the parameters m and r is crucial in order to obtain a reliable estimation of the conditional probability, especially in very short time series. Lake (2006) proposed the conversion of the measured conditional probability to a probability density. The new measure was named quadratic sample entropy (QSE) and it consists of normalizing the sample entropy by the volume of each matching region, $(2r)^m$:

$$\mathrm{QSE} = -\ln\left(\frac{cp}{(2r)^m}\right) = \mathrm{SampEn} + m\ln(2r)$$

QSE allows direct comparison of results obtained by using different values of r. Regression analyses showed that heart rate was an important independent predictor of AF (Lake and Moorman 2011). Hence, the COSEn measure requires the subtraction of the natural logarithm of the mean RR interval:

$$\mathrm{COSEn} = \mathrm{QSE} - \ln\left(\overline{RR}\right)$$

where $\overline{RR}$ is the mean RR interval. In this work the choice of r and m reflects the standards and findings already described in Lake and Moorman (2011); therefore we chose m = 1 and r = 30 ms. We calculated COSEn over 30 s segments, consistent with the clinical idea that AF must usually last 30 s to be considered (Fuster et al 2006).
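The chain of definitions above (sample entropy from the match counts A and B, normalization by the matching volume, then subtraction of the log mean RR interval) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: a match is taken as an absolute difference of at most r at every template position, self-matches are excluded, and the function name is hypothetical. RR intervals and r are both in milliseconds, so the units cancel.

```python
import math

def cosen(rr_ms, m=1, r=30.0):
    """Coefficient of sample entropy for a short RR series (ms).

    A = template pairs matching over m + 1 points, B = pairs matching over m.
    Assumptions: match means |difference| <= r at each position; i < j only,
    so self-matches are never counted.
    """
    n = len(rr_ms)
    a = b = 0
    for i in range(n - m):
        for j in range(i + 1, n - m):
            if all(abs(rr_ms[i + k] - rr_ms[j + k]) <= r for k in range(m)):
                b += 1
                if abs(rr_ms[i + m] - rr_ms[j + m]) <= r:
                    a += 1
    if a == 0 or b == 0:
        raise ValueError("no matches at this tolerance; entropy undefined")
    samp_en = -math.log(a / b)            # sample entropy
    qse = samp_en + m * math.log(2 * r)   # normalize by volume (2r)^m
    mean_rr = sum(rr_ms) / n
    return qse - math.log(mean_rr)        # subtract log mean RR interval
```

For a perfectly regular series every template matches, so A = B and the sample entropy term is zero; the result then reduces to ln(2r) minus the log of the mean RR interval.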

Detrended fluctuation analysis
DFA quantifies fractal-like scaling properties of RR interval time series (Peng et al 1995). The interbeat interval time series of total length N is first integrated:

$$y(k) = \sum_{i=1}^{k} \left[ B(i) - B_{\mathrm{ave}} \right]$$

where B(i) is the ith interbeat interval and $B_{\mathrm{ave}}$ is the average interbeat interval. Next, the integrated time series is divided into non-overlapping boxes of equal length, n. In each box a least-squares line is fitted to represent the trend in that box. Let $y_n(k)$ be the y coordinate of the straight line segments. The integrated time series, y(k), is detrended by subtracting the local trend $y_n(k)$ in each box. The root-mean-square fluctuation of this integrated and detrended time series is:

$$F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left[ y(k) - y_n(k) \right]^2}$$

This computation is repeated over all the box sizes. The results from each box length are averaged; fewer points contribute to the plotted value for higher n. Typically, F(n) will increase with the box size n. A linear relationship on a log-log plot of F(n) versus the box size n indicates the presence of power law (fractal) scaling. Under such conditions, the fluctuations can be characterized by a scaling exponent. In fact, if the data are long-range correlated, F(n) increases as a power law with respect to n:

$$F(n) \propto n^{\alpha}$$

Thus, the fluctuation scaling exponent α can be determined by a linear fit on the log-log plot. For uncorrelated data, such as white noise, the scaling exponent is α = 0.5. A slope larger than 0.5 indicates persistent long-range correlations. In contrast, 0 < α < 0.5 indicates an anti-persistent type of correlation. A slope equal to 1 is the theoretical value that corresponds to 1/f noise, and α = 1.5 corresponds to Brownian noise. The original calculation of the DFA was proposed for long signals, such as 24 h Holter ECG recordings, but it has also been used in shorter interval time series (Peña et al 2009). In this work the scaling exponent α was calculated on the 10 min segments only over the box size range of 4 to 12, consistent with other measures computed here on short segments.
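The DFA steps described above (integrate, split into boxes, detrend each box with a least-squares line, compute the root-mean-square fluctuation, fit the log-log slope) can be sketched as below. This is a plain-Python illustration under assumptions, not the authors' implementation: beats that do not fill a final box are ignored, and the function names are hypothetical. The default box sizes 4 to 12 follow the text.

```python
import math

def _linfit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def dfa_alpha(rr, box_sizes=range(4, 13)):
    """Scaling exponent alpha from first-order DFA over the given box sizes."""
    mean_rr = sum(rr) / len(rr)
    y, total = [], 0.0
    for b in rr:                      # integrate the mean-subtracted series
        total += b - mean_rr
        y.append(total)
    log_n, log_f = [], []
    for n in box_sizes:
        n_boxes = len(y) // n         # assumption: leftover points are dropped
        ks = list(range(n))
        sq = 0.0
        for i in range(n_boxes):
            seg = y[i * n:(i + 1) * n]
            slope, icpt = _linfit(ks, seg)   # local linear trend in this box
            sq += sum((v - (slope * k + icpt)) ** 2 for k, v in zip(ks, seg))
        f = math.sqrt(sq / (n_boxes * n))    # RMS fluctuation F(n)
        if f == 0.0:
            raise ValueError("zero fluctuation; log F(n) undefined")
        log_n.append(math.log(n))
        log_f.append(math.log(f))
    alpha, _ = _linfit(log_n, log_f)         # slope on the log-log plot
    return alpha
```

With box sizes this small, the estimated slope for white noise tends to sit somewhat above the asymptotic value of 0.5, so values from short boxes are best compared with each other rather than with the theoretical constants.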

Local dynamics score (LDs)
The LDs is a new index to investigate the local dynamics of short RR series (Moss et al 2014). The new idea is to examine how often individual templates in a short series match each other. Given a 12-beat segment, the algorithm consists mainly of counting the number of times each sample matches the other 11 within a tolerance r of 20 ms. A histogram of the count of templates as a function of the number of matches is constructed. If no points match, a bar of 12 counts appears in bin 0 and all the other bins are empty. When all 12 points match each other, the histogram has a bar of 12 counts in bin 11.
The LDs is computed as a linear combination of the values in bin 0, bin 10 and bin 11; the coefficients were normalized so as to sum to 1. For uniform distribution of matches (the counts in all bins are 1), the LDs is 1. Lower scores imply a bell-shaped distribution, and higher scores imply a distribution concentrated on either or both extremes of the histogram. Here, we calculated the score for every 10 min segment from the average of the 12-beat histograms.
This analysis is expected to report mostly on differentiating NSR from SR with PVCs. We note that the value of LDs in AF is not a useful classifier, and that extremely frequent ectopy such as a bigeminal pattern yields a low value, as beats are concentrated in the middle bins: in a 12-beat bigeminal series, each template matches its five siblings, so all 12 counts fall in bin 5 and bins 0, 10 and 11 are empty.
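The match-count histogram that underlies the LDs can be sketched as follows. The sketch stops at the histogram, because the coefficients of the linear combination of bins 0, 10 and 11 are not given in this text. Treating a match as an absolute difference of at most r is an assumption, and the function name is hypothetical.

```python
def match_histogram(beats, r=20.0):
    """Histogram of match counts for a short RR segment (ms).

    hist[c] is the number of beats that match exactly c of the other beats.
    Assumption: two beats match when they differ by at most r.
    """
    n = len(beats)
    hist = [0] * n                     # bins 0 .. n - 1
    for i in range(n):
        matches = sum(
            1 for j in range(n)
            if j != i and abs(beats[i] - beats[j]) <= r
        )
        hist[matches] += 1
    return hist

# Bigeminy: alternating short and long intervals over 12 beats.
bigeminy = [500.0, 1000.0] * 6
print(match_histogram(bigeminy))       # all 12 counts fall in bin 5
```

The bigeminal example reproduces the behavior described in the text: each beat matches its five siblings, so bins 0, 10 and 11 stay empty.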

Statistical analysis
One-way Kruskal-Wallis ANOVA was used to compare the index values among the three groups (NSR, AF, SR with ectopy) and post-hoc multiple comparisons were performed by the Wilcoxon rank-sum test using the Bonferroni correction. For this univariate statistical analysis only a single 10 min segment for each patient was randomly selected. To test the overall hypothesis that dynamical measures were useful for rhythm classification, we used several schemes. The strategy was to compare the accuracy of classification using means and SD alone with the accuracy after addition of the dynamical measures COSEn, LDs and DFA. The first scheme was a system of three multivariate logistic regression models, each used to distinguish one rhythm class from the other two. The final classification of the RR series was obtained by using the highest probability estimate among the three models.
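The way three one-versus-rest logistic models are combined into a single label can be sketched as below. The coefficients in the usage example are invented purely for illustration and are not fitted values from this study; the function names and the dictionary layout are likewise assumptions.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def classify_one_vs_rest(features, models):
    """Pick the class whose one-vs-rest model gives the highest probability.

    models maps label -> (intercept, [coefficients]); hypothetical layout.
    """
    probs = {}
    for label, (intercept, coefs) in models.items():
        z = intercept + sum(c * f for c, f in zip(coefs, features))
        probs[label] = logistic(z)
    best = max(probs, key=probs.get)
    return best, probs

# Illustrative coefficients only (not from the paper); features = [x1, x2].
toy_models = {
    "AF": (0.0, [2.0, 0.0]),
    "NSR": (0.0, [-1.0, 1.0]),
    "SR with ectopy": (0.0, [-1.0, -1.0]),
}
label, _ = classify_one_vs_rest([1.0, 0.0], toy_models)
print(label)  # AF
```

Because each model is fitted separately, the three probabilities need not sum to 1; only their ranking is used for the final label.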
The second approach was a K-nearest neighbors (K-NN) technique (Xiao et al 2010), which requires no statistical model. The strategy was to classify a 10 min segment based on the classification of the majority of its neighbors. For a new test record x i , distances from all the points of the training set were sorted in ascending order. The points with the K smallest distances were chosen and their classifications were used in a popular vote: the new record was assigned the most populous classification among the neighbors. We analysed three distance metrics (the Euclidean distance, the standardized Euclidean distance and the Mahalanobis distance) and, on the basis of a pilot study, selected the standardized Euclidean distance. The accuracy of classification did not increase for K > 15, and we used K = 25.
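A minimal sketch of K-NN voting with the standardized Euclidean distance follows. It assumes that each feature difference is scaled by that feature's sample standard deviation in the training set, which is the usual definition of this distance; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, x, k=25):
    """Majority vote among the k nearest training points.

    Distance is the standardized Euclidean distance: each feature
    difference is divided by that feature's training-set sample SD.
    """
    n_feat = len(x)
    sds = []
    for f in range(n_feat):
        col = [row[f] for row in train_x]
        mu = sum(col) / len(col)
        var = sum((v - mu) ** 2 for v in col) / (len(col) - 1)
        sds.append(math.sqrt(var) or 1.0)   # guard against zero spread
    dists = []
    for row, label in zip(train_x, train_y):
        d = math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(row, x, sds)))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])          # ascending distance
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters (illustrative only).
toy_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_y = ["NSR", "NSR", "NSR", "AF", "AF", "AF"]
print(knn_classify(toy_x, toy_y, [0.5, 0.5], k=3))  # NSR
```

Standardizing matters here because the features have very different scales (for example, mean RR in milliseconds against a DFA slope near 1); without it the largest-valued feature would dominate the distance.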
The third approach was random forests (Breiman 2001), an ensemble of tree predictors. The principle behind this method is that a group of weak learners can result in a strong learner. Every element is a normal decision tree where, at each node, a number of predictor variables are selected at random from all the predictors. The number of predictors randomly selected at each node was the square root of the number of variables available, and the predictor that provided the best split in terms of maximizing the separation between the observations was chosen. A forest of 100 trees had maximal performance. Each 10 min segment was evaluated by every tree in the forest, and the final classification was based on the popular vote.
All models were validated using a 10-fold cross-validation procedure on the entire dataset.
Figure 1 shows ECGs of four patients from the UVa Holter database. Four different rhythms are illustrated: NSR (a), AF (b), SR with PVCs (c) and SR with PACs (d). Figure 2 shows 10 min segments of the RR interval time series from the same ECG recordings displayed in figure 1. In this example we display 10 min records of SR with a high burden of PVCs, equal to 57% (figure 2(c)), and a very high burden of PACs, equal to 73% (figure 2(d)).

An example of analysis
The means (SD) of the four series are: NSR 1170 (73), AF 749 (161), SR with PVCs 696 (151) and SR with PACs 607 (185) ms. It is noteworthy that the SD is as high for the records of SR with ectopy as it is for the AF record, more than twice the variability of NSR. This points to the fundamental limitation of heart rate variability (HRV) as standardly practiced for this task of rhythm classification.
The NSR series led to the lowest value of COSEn, −2.1, and the AF series led to the highest, −0.5. SR with PVCs and PACs led to intermediate values: −1.3 and −1.8, respectively. Figure 3 shows the DFA results, plotted as the average variance as a function of the box length. As expected, NSR is readily distinguished from AF by its higher slope: 1.1 and 0.5, respectively. Both NSR and AF, though, have higher slopes than either of the records with ectopy: 0.2 for the record with PVCs, and 0.3 for the one with PACs. The phenotype of the records with ectopy is a high and unvarying fluctuation, a consequence of the frequent excursions from the baseline. Thus even short boxes contain outliers and lead to large fluctuation values. Since the outlying ectopic beats are uniform in their values and density, the fluctuation values do not change as a function of window length.
LDs was computed on the averaged histogram of template count as a function of template matches calculated every 12 beats. SR with PVCs led to a higher value than NSR, as expected: 1.2 and 0.3, respectively. AF led to variable results (here a high value, 2.0) and LDs is not useful in its classification (Moss et al 2014). This record of SR with PACs in a bigeminal pattern led to an intermediate value of 0.5. Figure 4 shows the distribution of each index as a function of the burden of ectopy calculated on all 10 min records, with the 95% confidence interval. Each bin in the figure represents the mean and CI computed every 2% after ranking according to ectopy burden. As expected, ectopic beats decreased the mean RR interval and increased the standard deviation.

Effect of ectopy on dynamical measures
Likewise, as expected, ectopic beats increased entropy and the related LDs. After a point, though, entropy fell with increased ectopy. This may be caused by regularized trigeminal and bigeminal rhythms, or by non-sustained atrial and ventricular tachycardia. The DFA slope fell with increasing ectopic burden, as the fluctuation was maximally increased in even the shortest windows. Table 1 shows the mean values for each group and for each parameter, and the phenotypes of the example series in figures 1 and 2 are recapitulated. NSR had the lowest variability, entropy and LDs, and the highest DFA. AF was found in older patients, and had higher variability and higher entropy. Ectopy was just as variable as AF but had lower DFA. Figure 5 plots the linear (HRV as a function of HR, panels A to C) and nonlinear (COSEn as a function of LDs, panels D to F) measures for all segments. The color represents age (blue is older, A to C) or DFA (blue is higher, D to F).

Univariate analysis
The goal of the analysis is to distinguish the rhythm categories atrial fibrillation (A and D), normal sinus rhythm (B and E), and sinus rhythm with ectopy (C and F). The major findings are all visually apparent. In the linear and age domain, AF increases with age (A), heart rates in NSR slow with age (B), and SR with ectopy overlaps AF a great deal but occurs earlier in life (C). In the nonlinear domain, AF is distinguished by high COSEn (D), and NSR has higher DFA than SR with ectopy (E and F).
Thus no single univariate metric suffices to separate the groups, and we sought to capture these patterns for classification using multivariable statistical methods.

Multivariate analyses
The separation of the three rhythm classes in figure 5, especially using nonlinear dynamical measures (panels D to F), suggests that all three multivariable methods should be suitable. The monotonic relationships of the parameters to the classes justify logistic regression, the distinct separation of neighborhoods justifies K-NN analysis, and the simple two-step process of separating AF from non-AF by high entropy, and NSR from SR with ectopy by high DFA, justifies decision trees and random forests. Tables 2 and 3 show the accuracy of labeling 10 min records using the linear measures of HR and HRV along with age (table 2) and the nonlinear dynamical measures of COSEn, LDs and DFA (table 3). Table 4 shows the results for models that combine all features. The columns give the correct classification based on the labels provided by the Holter software. The rows show the classification results from the models we developed. Positive predictive value (PPV) is the proportion of true positives over the total of true and false positives in a classification problem. The averaged values in the contingency tables are rounded to the nearest whole number. The major finding is that the nonlinear dynamical measures led to classification models with higher positive predictive values, and the models using all measures had good accuracy. The random forests analysis led to positive predictive values of 97, 98 and 90% for AF, NSR, and SR with ectopy, respectively.
Clinical practice considers a burden of ectopy to be significant at 10%, and one of our most common misclassification errors was calling a segment NSR when in fact there was slightly more than 10% ectopy. Thus we tested the effect of varying the diagnostic threshold for ectopy from 4 to 20% in steps of 2%. As we raised the threshold, the accuracy for detecting SR with ectopy improved from 55.5% (4% burden) to 89.7% (20% burden), while the accuracy for detecting NSR decreased by 4%. The overall accuracy peaked at 94.4% for a threshold of 14% burden.
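With the table layout described above (rows are model classifications, columns are reference labels), the PPV for a class is its diagonal entry divided by its row total. The sketch below illustrates this; the counts are invented for the example and are not taken from tables 2 to 4, and the function name is hypothetical.

```python
def ppv_per_class(table, labels):
    """PPV per class from a contingency table.

    table[i][j] = number of segments the model called labels[i]
    whose reference label is labels[j] (rows: model, columns: reference).
    """
    result = {}
    for i, label in enumerate(labels):
        row_total = sum(table[i])          # true + false positives
        result[label] = table[i][i] / row_total if row_total else float("nan")
    return result

# Illustrative counts only (not the study's results).
labels = ["AF", "NSR", "SR with ectopy"]
table = [
    [90, 5, 5],
    [2, 96, 2],
    [10, 10, 80],
]
print(ppv_per_class(table, labels))
```

Reading along a row answers the question a user of the classifier cares about: given that the model reported a rhythm, how often was that report correct.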

Continuous behavior of the classifier
We tested the effectiveness of the classifier in a continuous context, mimicking real-time implementation. Figure 6 shows the performance of the 10 min model with a 2 min update in one of the Holter recordings. Changes in heart rhythm are faithfully reported.
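The continuous scheme, a 10 min analysis window advanced every 2 min, can be sketched as a generator over the RR series. This is an illustration under assumptions: beat times are reconstructed from cumulative RR intervals, only full windows are emitted, and the function name is hypothetical. Each emitted window would then be passed to the feature computation and classifier.

```python
def sliding_windows(rr_ms, window_s=600.0, step_s=120.0):
    """Yield (start_s, rr_window) for each full window, advancing by step_s.

    Assumption: a beat belongs to a window when its end time t satisfies
    start < t <= start + window_s, with times from cumulative RR intervals.
    """
    times, t = [], 0.0
    for rr in rr_ms:
        t += rr / 1000.0
        times.append(t)
    if not times:
        return
    start = 0.0
    while start + window_s <= times[-1]:
        end = start + window_s
        yield start, [rr for rr, bt in zip(rr_ms, times) if start < bt <= end]
        start += step_s

# 30 min of beats at exactly 60 beats per minute.
windows = list(sliding_windows([1000.0] * 1800))
print(len(windows))  # 11
```

Overlapping windows mean each beat contributes to up to five consecutive classifications, which smooths the output at the cost of a delay of up to 10 min before a rhythm change fully dominates a window.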

Discussion
We studied cardiac rhythm classification using linear and dynamical measures of RR interval time series to discriminate among NSR, AF and SR with ectopy. In 2722 24 h ambulatory ECG recordings from patients, divided into 10 min RR series, we measured entropy using the coefficient of sample entropy, local dynamics using a new score, and fractal scaling using detrended fluctuation analysis. We tested the hypothesis that these measures added to standard measures of HR and HRV using model-based and model-free statistical classifiers. Our major finding is improved rhythm classification using the nonlinear dynamical measures. Of particular importance was accuracy in classifying SR with ectopy, a clinical finding with prognostic impact. The threshold to classify a segment as SR with ectopy, as opposed to NSR, was only a 10% ectopic burden. This reflects clinical practice, and the accuracy increased further if the threshold for diagnosing ectopy was higher.

Comparison with prior studies
Several groups have addressed the arrhythmia detection problem based on HRV signals using linear and nonlinear measures and different classifiers (table 5). All the papers cited include AF in their database, but few of them take into account atrial or ventricular ectopy. Only Huang et al (2011) and Bárdossy et al (2014) consider PVCs and PACs; the first employed linear measures to identify the transitions between AF and NSR, whereas the second proposed a diagnostic algorithm for implantable cardioverter defibrillators. The latter reached a very high accuracy, but the real innovation of that work lay in applying a new fuzzy logic-based scheme to standard measures of onset and instability.

Figure 6. A 24 h Holter recording with demonstration of model output, updated every 2 min. Blue dots represent RR intervals labeled as NSR, green as PVCs, and red as AF. The model output depicted along the lower edge of the plot correctly identifies distinct changes between all three rhythms.

Limitations
Our clinical emphasis constrains our analysis in several ways. First, we assigned the diagnosis of AF when as little as 30 s of AF, or 5% of a 10 min segment, was present. This can lead to misclassification of some AF segments as NSR or SR with ectopy, but it is consistent with clinical practice, where episodes lasting this long elicit full consideration in authoritative guidelines (Fuster et al 2006), count toward AF burden (Glotzer et al 2009), and thus can lead to full AF treatment measures. Second, we consider atrial flutter (AFL) to be the same as AF. This approach, which is based on the similarities in clinical management, is certain to lead to misclassifications. The dynamics of AFL when AV conduction is fixed can never be considered the same as those of AF, and the regularity with which AFL can present is responsible for most of the 7 to 10% misclassification error of AF as NSR. Finally, we assign the diagnosis of ectopy when as little as 10% is present, as noted above. This causes an erroneous classification of some of these low-burden records as NSR, but a burden of 10% can lead a clinician to consider anti-arrhythmic drugs or ablation procedures, especially if PVC cardiomyopathy is suspected.
As a result of these classification decisions, we diagnose AF or SR with ectopy even when the rhythm is 90% or more purely normal SR.
Surprisingly, these measures do not distinguish well between atrial and ventricular premature beats, a potentially important clinical distinction. Atrial premature beats may be harbingers of AF, and ventricular premature beats are associated with increased mortality, especially when structural heart disease is present.

Clinical implications
The clinical importance of accurate cardiac rhythm classification is related to the specific treatments that the presence of the arrhythmia requires. For example, the distinction between AF and SR with ectopy can be difficult without PQRST waveforms, but is nonetheless important for two reasons. First, the diagnosis of AF calls for decisions about anticoagulation, rate control and rhythm control. Second, atrial ectopy may presage AF, and ventricular ectopy may lead to cardiomyopathy or in other ways increase the risk of mortality (Lee et al 2012a, Lee et al 2012b). An atrial ectopy burden of even less than 1% increases the risk of AF over the next 5 to 15 years (Dewland et al 2013). More than 10% ventricular ectopy can be associated with LV dysfunction and clinical heart failure syndromes that are reversed by ablation of the ectopic site (Baman et al 2010, Lee et al 2012a, Yokokawa et al 2013).

Applications
The utility of this classification lies in its ability to work on segments of only 10 min, which makes it potentially useful in a real-time context. Thus these algorithms should be useful for personal monitoring devices. In addition, continuous monitoring in hospitalized patients can be crucially important, since changes in the cardiac rhythm can be very abrupt and life-threatening. An example of this application is reported in figure 6, where the proposed classifier tracked sudden changes in heart rhythm.

Conclusion
Multivariable statistical pattern recognition techniques improve on simple measures in correctly distinguishing SR with frequent ectopy, AF and NSR. Moreover, nonlinear measures add significantly to the mean and the variability. These results should be useful in designing rhythm classifiers from heart rate devices where high fidelity ECG waveforms are not available.