Area 1 of Approximate Entropy as a Fast and Robust Tool to Address Temporal Organization

Aims: To evaluate the consistency and robustness of an informational entropy analytical tool derived from Approximate Entropy (ApEn). Study Design: A set of in machina time-series of known properties was generated to test and compare the proposed tool with the standard ApEn and with peak-ApEn. (b) is that a1ApEn is able to correct inconsistencies found when using peak-ApEn (all P < .01, Student's t-test). Conclusion: The proposed tool, the area under the curve for ApEn of window 1 (a1ApEn), is objective and more consistent than both the ApEn and the peak-ApEn estimators.


INTRODUCTION
Approximate Entropy (ApEn) is a widely employed tool to characterize temporal organization in time-series (S). This method seeks to estimate the degree of organization by counting the number of equal events (matches) of a sub-vector i of size m along the original vector, where a match between two sub-vectors is decided by the Heaviside function of their distance for a certain tolerance r. More details can be found in [1,2], but the central idea is to count all the matches (#) for a certain i, m and r as:
$\#_i(m,r) = \sum_{j=1}^{N-m+1} \Theta\left(r - d(i,j)\right)$ (1)
where $\Theta$ is the Heaviside function and d(i,j) is the distance between sub-vectors i and j. From the counts of each sub-vector i, another value, $\phi$, is computed:
$\phi^{m}(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln \frac{\#_i(m,r)}{N-m+1}$ (2)
Finally, ApEn is obtained through:
$ApEn(m,r,N) = \phi^{m}(r) - \phi^{m+1}(r)$ (3)
On the one hand, ApEn putatively achieves its objective in many cases, for example in the prediction of survivability through body temperature regularity and in the estimation of machine health via analysis of vibration in rolling bearings [3-9]. On the other hand, drawbacks in the estimator are present [2,10,11] and, consequently, there is room for improvement.
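As a minimal sketch of how ApEn is computed from match counts, the following plain-Python code assumes the standard definition in [1]: self-matches are counted, and two sub-vectors match when their largest coordinate difference does not exceed r. The function names `phi` and `apen` are illustrative choices, not taken from the original study.

```python
import math


def phi(series, m, r):
    """Mean log-frequency of matches for sub-vectors of size m."""
    n_vec = len(series) - m + 1
    vectors = [series[i:i + m] for i in range(n_vec)]
    total = 0.0
    for v_i in vectors:
        # Count the sub-vectors whose largest coordinate difference from v_i
        # does not exceed r (the self-match is included, so matches >= 1).
        matches = sum(
            1 for v_j in vectors
            if max(abs(a - b) for a, b in zip(v_i, v_j)) <= r
        )
        total += math.log(matches / n_vec)
    return total / n_vec


def apen(series, m, r):
    """ApEn(m, r, N) as the difference between phi at sizes m and m + 1."""
    return phi(series, m, r) - phi(series, m + 1, r)


# A constant series is perfectly regular: every sub-vector matches every other.
flat = [1.0] * 50
print(apen(flat, m=2, r=0.1))
```

The double loop costs on the order of N squared comparisons per call, which is acceptable for the short series discussed here but would need vectorization for long ones.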
A first issue is what we call "positional sensitivity": two arbitrarily chosen fragments of a larger original data set may possess very distinct ApEn values. This problem holds true even for highly organized series such as the sum of two sine waves (Fig. 1A), where it is possible to observe a twofold increase in ApEn values in different sampling intervals of the original data set (Fig. 1B). This problem is usually mitigated, as shown in Fig. 1B, by the use of a moving window to obtain the mean ApEn of a given time-series [6,8,9].
A second, and much more important, issue is the lack of an objective procedure to obtain the value of the estimator. ApEn relies on two arbitrary choices of parameters, namely, the window size of comparison, m, and the tolerance for distinguishing two vectors as non-equals, r (equations 1-3). This arbitrariness is a major shortcoming of the tool, since two very distinct series can be classified differently depending on the choice of the parameters [2]. A well-known variation of ApEn, sample entropy (developed by Richman & Moorman [10]), also suffers from this weakness. There are two attempts to overcome such a problem, as presented below.
One is based on the computation of ApEn for a large set of tolerance values in order to obtain the highest (peak) ApEn for a given m (peak-ApEn [11-13]). The logic behind this approach is clear when comparing peak-ApEn values in different time-series (Fig. 2). Due to the formulation of equation 3, small tolerances are associated with small ApEn values (since the counting is as small for m = k as it is for m = k+1). Therefore, as the tolerance increases, ApEn values rise to a peak and then decrease with further tolerance increases (since for large tolerances all sub-vectors would be considered as equal for both m = k and m = k+1, resulting in ApEn → 0). Consequently, the use of a single value of r, as suggested by Pincus [1] (e.g., 0.15 as illustrated in Fig. 2), can lead to sampling different regions of the ApEn curves of different time-series.
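The peak search described above can be sketched as a scan over a grid of tolerances. This is only an illustration: the evenly spaced grid, its size `n_grid`, and the logistic-map test series are assumptions of this sketch, and the cited peak-ApEn studies may select tolerances differently.

```python
import math


def apen(series, m, r):
    """Standard ApEn with self-matches and a maximum-difference distance."""
    def phi(size):
        n_vec = len(series) - size + 1
        vecs = [series[i:i + size] for i in range(n_vec)]
        logs = [
            math.log(sum(
                max(abs(a - b) for a, b in zip(u, v)) <= r for v in vecs
            ) / n_vec)
            for u in vecs
        ]
        return sum(logs) / n_vec
    return phi(m) - phi(m + 1)


def peak_apen(series, m, n_grid=30):
    """Scan a tolerance grid and return (tolerance at the peak, peak ApEn)."""
    span = max(series) - min(series)
    # Hypothetical grid: n_grid evenly spaced tolerances from 0 to the range.
    grid = [span * k / (n_grid - 1) for k in range(n_grid)]
    curve = [apen(series, m, r) for r in grid]
    best = max(range(n_grid), key=curve.__getitem__)
    return grid[best], curve[best]


# Logistic map x -> 4x(1 - x), used here only as a convenient test series.
x, series = 0.3, []
for _ in range(100):
    series.append(x)
    x = 4.0 * x * (1.0 - x)

r_peak, peak = peak_apen(series, m=1)
print(r_peak, peak)
```

At a tolerance equal to the full range of the series every sub-vector matches every other, so the curve returns to zero there, in line with the large-tolerance argument in the text.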
The other alternative approach is based on a double summation of ApEn values along all suitable m and r, resulting in a pseudo-volume below the surface thus obtained (vApEn, developed by Santos et al. [2]). vApEn is much more robust than ApEn and Sample ApEn. However, it is extremely demanding on computational time/resources, and becomes prohibitive for series containing more than 400 points even on powerful conventional computers.
Likewise, peak-ApEn is much more robust than ApEn and Sample ApEn. Nevertheless, some inconsistencies still persist, as we describe next (and exemplify in Table 1; see Table 2 for nomenclature). Firstly, there remains subjectivity in the choice of the value of m, since the classification may change depending on this parameter (Table 1 subset A). Secondly, there is a high dependency on the size of the vector analyzed, and reversed results are obtained with small differences in size (Table 1 subset B). Finally, well-organized time-series (e.g., a sine wave) analyzed through only one period may present higher peak-ApEn values than a more variable series (Table 1 subset C).
In short, peak-ApEn and vApEn are much more reliable than ApEn, but the tools deserve further improvement.

A NEW TOOL: A1APEN
Here we propose an approach that might be considered as a step forward in relation to peak-ApEn and a step backward in relation to vApEn. a1ApEn is based on the construction of the area under the curve of ApEn versus tolerance r* (see Fig. 2) and is defined for a time-series S of size N as:
$a1ApEn(S, N) = \int_{0}^{1} ApEn(m = 1, r^{*}, N)\, dr^{*}$ (4)
where the integral stands for a numerical integration. The meaning of r* is given in section 2.1. The reason for using the window m = 1 is presented in section 2.2.
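The numerical integration can be illustrated with the trapezoidal rule applied to sampled points of the ApEn versus r* curve. The choice of the trapezoidal rule is an assumption of this sketch (the text does not state which quadrature is used), and the triangular curve below is made-up illustrative data, not a result.

```python
def area_under_curve(r_star, apen_values):
    """Trapezoidal area under a curve sampled at increasing r* values."""
    area = 0.0
    for k in range(1, len(r_star)):
        width = r_star[k] - r_star[k - 1]
        area += 0.5 * width * (apen_values[k] + apen_values[k - 1])
    return area


# Made-up triangular curve: rises to 0.6 at r* = 0.2 and falls to 0 at r* = 1.
r_star = [0.0, 0.2, 1.0]
curve = [0.0, 0.6, 0.0]
print(area_under_curve(r_star, curve))
```

Because the rule handles uneven spacing, it fits a tolerance vector that is dense at low r* and sparse at high r*.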

The Tolerance Vector r and the Normalized Tolerance Vector r*
The first crucial step to obtain an accurate a1ApEn value is to construct a detailed tolerance vector, r. The usual practice to establish a value of tolerance for comparison is to compute it as a fraction of the standard deviation of the time-series (see [1]). For instance, some figure between 15% and 25% of the standard deviation is the most typical choice. Here, we do not use the standard deviation as a reference to compute r values.
Because the Heaviside distance d between two vectors i = (i_1, i_2, …, i_n) and j = (j_1, j_2, …, j_n) is given as $d(i,j) = \max_k |i_k - j_k|$, k = 1, 2, …, n, there is, for the entire time-series, a pair of points (henceforth we will use "point" to refer to a certain datum value in the time-series S) that has a minimal distance greater than zero, and another pair that has a maximal distance, namely |max(S) - min(S)|. (An exception is a binary series: for r lower than some critical value no matches can occur, while for r higher than that critical value all the sub-vectors are equal; ApEn should not be employed to characterize a binary series at all.) Why we are considering a minimal distance greater than zero becomes clear shortly.
The detailed tolerance vector r. Initially, the time-series is sorted in ascending order and the absolute difference between each pair of consecutive sorted points is used to create a delta-vector, D. The zeros are then excluded from the delta-vector, leaving D with size n (unknown before these procedures). Then, the tolerance vector r is constructed. The first value of r is zero. Therefore, for r_1 = r(1) = 0, only completely equal sub-vectors will count as a match. There are two paths to construct the subsequent r values.
If the time-series size N ≤ 300, r will have size n+1, with $r(x) = \sum_{k=1}^{x-1} D(k)$ for x = 2, 3, …, n+1, i.e., each value in r is the sum of the previous ones in D. Notice that the last value of the tolerance vector is r(n+1) = |max(S) - min(S)|, i.e., the maximal possible Heaviside distance among all the pairings in the original time-series.
If N > 300, then the first 51 values of r are obtained as described above. Then, depending on the fraction of the maximal distance reached by the 51st value, different partitioning procedures are taken in order to populate the remainder of the tolerance vector r. The main focus is to have a large number of r values up to 35% of the maximal distance, since the ApEn versus r* curve has its most significant part within this range (personal observations; see Fig. 2 as an example). At the same time, from 35% to 100% of the maximal distance, fewer values are needed, so computational time should not be greatly affected. The ApEn values are computed using each r from the tolerance vector.
Then, the area is obtained through a numerical integration (eq. 4) using a normalized tolerance vector r* = r/max(r) as the basis. This is very important in order to compare different time-series on an equal footing. Otherwise, high-amplitude series would inherently end up with higher a1ApEn values.
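The construction of r and r* for the short-series path (N ≤ 300) can be sketched as follows. The function names are illustrative, the five-point series is a toy input, and the partitioning used for N > 300 is not attempted here because its details are not given in the text.

```python
from itertools import accumulate


def tolerance_vector(series):
    """Tolerance vector r: zero followed by the cumulative sums of D."""
    ordered = sorted(series)
    # Delta-vector D: gaps between consecutive sorted points, zeros removed.
    deltas = [b - a for a, b in zip(ordered, ordered[1:]) if b - a > 0]
    return [0.0] + list(accumulate(deltas))


def normalize(r):
    """Normalized tolerance vector r* = r / max(r)."""
    top = max(r)
    return [value / top for value in r]


r = tolerance_vector([3, 1, 4, 1, 5])
print(r)             # [0.0, 2, 3, 4]
print(normalize(r))  # [0.0, 0.5, 0.75, 1.0]
```

The last entry of r equals max(S) - min(S), as stated in the text. A constant series would give max(r) = 0 and is not handled by this sketch.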

The Window Size m = 1
If the window size is kept fixed, a1ApEn has no subjectivity in the choice of parameters for the analysis, a much desired condition. In this sense, the window size 1 is chosen because it can be shown that it results in the largest area compared to m = 2, 3, 4, … up to m << N. In other words, m = 1 gives complete objectivity to the a1ApEn measure because one can fix such a window size beforehand, knowing that this choice will result in the largest area for the time-series under analysis. We show this result below.
Equation 2 can be rewritten as:
$\phi^{m}(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \left[ \ln \#_i(m) - \ln(N-m+1) \right]$ (5a)
$\phi^{m}(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln \#_i(m) - \ln(N-m+1)$ (5b)
Let X(m) be a mean count. Substituting this mean count in place of the true counts #_i(m):
$\#_i(m) \approx X(m), \quad i = 1, \ldots, N-m+1$ (6)
Then, inserting (6) in equation (5b), we obtain a mean $\phi$, related to the mean counting:
$\bar{\phi}^{m} = \ln X(m) - \ln(N-m+1)$ (7)
Consequently, from equation (3):
$\overline{ApEn} = \bar{\phi}^{m} - \bar{\phi}^{m+1}$ (8)
Replacing (7) in (8) results in an ApEn value related to the mean counts in each window:
$\overline{ApEn} = \ln \frac{X(m)}{X(m+1)} + \ln \frac{N-m}{N-m+1}$ (9)
Let us call $\overline{ApEn}$ the expected ApEn value. If N → ∞ is considered, then:
$\overline{ApEn} = \ln \frac{X(m)}{X(m+1)}$ (10)
It is clear that the expected ApEn depends on how X(m) increases in proportion to X(m+1). It is not possible to define the rule for such a relation, but it is known that it must respect the following conditions: r* = 0 implies X(m,0) = 1 and r* = 1 implies X(m,1) = N. It is interesting that even an arbitrary rule may still give valuable information about the behavior of the curve of ApEn versus tolerance. Consider the following formulation that obeys the above conditions and is able to reproduce the curves in Fig. 2: (11)
With this rule, it is possible to observe that if q(m) has a linear behavior (i.e., q(m) = a + bm), ApEn (m = 1) encompasses the ApEn for all other window sizes (see Fig. 3). Furthermore, the only situation in which an area computed for m = 2 is greater than an area computed for m = 1 is when q(3) ⁄ q(2) ≫ q(2) ⁄ q(1).
Generally, this last situation is not expected for m << N since, for each inclusion of a new dimension in the state-space (i.e., m, m+1, m+2, ...), there would almost always be a proportional decrease in the number of counts (C_i). This is particularly accurate for less organized time-series, given that the probability of decreasing the number of counts remains the same with the addition of a new dimension.
Therefore, we are led to conclude that the area under the curve of ApEn versus tolerance r for m = 1 is appropriate, leaving no subjectivity in the choice of the parameters for analysis.
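The large-N limit of the expected ApEn can be checked numerically. The sketch below assumes the standard normalization of the counts by N - m + 1 and a hypothetical situation in which every sub-vector has the same count (`x_m` at window m, `x_m1` at window m + 1); the values 20 and 5 are arbitrary.

```python
import math


def expected_apen(x_m, x_m1, n, m=1):
    """ApEn when all counts equal x_m at size m and x_m1 at size m + 1."""
    phi_m = math.log(x_m / (n - m + 1))
    phi_m1 = math.log(x_m1 / (n - m))
    return phi_m - phi_m1


limit = math.log(20.0 / 5.0)
for n in (100, 10_000, 1_000_000):
    finite = expected_apen(x_m=20.0, x_m1=5.0, n=n)
    # The gap to the limit is ln((N - m) / (N - m + 1)), which shrinks with N.
    print(n, finite, limit - finite)
```

Under these assumptions the finite-size value differs from the log-ratio of the mean counts only by a term of order 1/N.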

Testing a1ApEn
Seventeen time-series from different generating processes were created in machina (Matlab R2013a). Table 2 presents a description of the generating processes. For the purposes of testing the "positional sensitivity", each generated series had 360 points and a moving sampling window of N = 180 was run along these 360 points. Therefore, we obtained 181 samples of each original series, and each sample had its ApEn (r* = 0.15), peak-ApEn and a1ApEn values computed. The variances of the 181 values obtained for each estimator were then compared using an F-test for sample variance.
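The sampling scheme can be sketched with a generic moving-window helper. To keep the example short, the two estimators used here (window mean and window standard deviation) are stand-ins for ApEn, peak-ApEn and a1ApEn, and the sine periods are arbitrary choices; only the windowing and the variance ratio follow the description above.

```python
import math
import statistics


def window_estimates(series, width, estimator):
    """Apply an estimator to every window of the given width, sliding by one."""
    return [
        estimator(series[start:start + width])
        for start in range(len(series) - width + 1)
    ]


# 360 points of a sum of two sine waves (the periods are arbitrary choices).
series = [math.sin(2 * math.pi * k / 40) + math.sin(2 * math.pi * k / 9)
          for k in range(360)]

# Stand-in estimators; the study uses ApEn, peak-ApEn and a1ApEn instead.
values_a = window_estimates(series, 180, statistics.pstdev)
values_b = window_estimates(series, 180, statistics.mean)

print(len(values_a))  # 181
# F statistic: ratio of the two sample variances.
f_stat = statistics.variance(values_a) / statistics.variance(values_b)
print(f_stat)
```

Turning the F statistic into a P value requires the F distribution with 180 and 180 degrees of freedom, which the standard library does not provide; `scipy.stats.f.sf` is one option.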
For the purposes of testing the consistency of the tool, we generated time-series of varying sizes (2N = 240, 420 and 600). A moving window of half size (N = 120, 210 and 300) was run along the series. Therefore, for each N, we computed N values of the peak-ApEn and a1ApEn estimators. Two things are expected: a series known beforehand to be well-behaved should have lower estimator values, and a series classified as less organized than another series for a given N should remain classified as less organized for a different N.

Results of the Positional Sensitivity
The issue of positional sensitivity is still present in a1ApEn. However, the variance across the moving sampling windows is significantly lower (all P < .01) than that obtained using peak-ApEn for all the 17 data sets analyzed. Regarding ApEn, the variance of a1ApEn was significantly lower in 16 cases (P < .01, Fig. 4A).
The only exception was the tent map C (Fig. 4B). This time-series is poorly characterized by ApEn using r > 0.15 (Fig. 4C): all moving sampling windows had a very low and similar ApEn value (≈ 1.56 × 10^-5) and, therefore, a close to zero variance resulted. Hence, this exception in fact reinforces the benefits of the a1ApEn approach.

Results of the Consistency of a1ApEn
Tables 3 and 4 present mean values of peak-ApEn and a1ApEn, respectively, obtained for each process. The series are ranked in ascending order according to their mean values for N = 120. Table 5 presents the pairwise comparison between the classifications given by peak-ApEn and a1ApEn. Coincidences are highlighted. Notice that the extremities of the classifications are coincident between peak-ApEn and a1ApEn. Table 6 shows a symbolic pattern of the changes in classification that would occur using peak-ApEn if, instead of sorting the processes using the 120-point size (Table 3), one had chosen the results from N = 210 or N = 300. There are two main columns, denoted by Δ1 and Δ2. Δ1 is the difference between a line "j" and the line "j-1" in Table 3. Δ2 is the difference between a line "j" and the line "j-2" in Table 3. The "+" symbol indicates that the process in line "j" maintains its classification in relation to the preceding process when 210 or 300 points are analyzed. On the other hand, the "-" symbol indicates that a change in classification would occur. Similarly, Table 7 shows the symbolic pattern of changes in classification for a1ApEn.
Two points are to be noted. Firstly, changes in classification using peak-ApEn are much more pronounced than using a1ApEn. In fact, while the latter gives no second-degree changes (Δ2), the former presents two changes in Δ2. Secondly, while most of the changes in classification by peak-ApEn occur for non-coincident processes (Table 5), changes using a1ApEn occur at the more disorganized extremity of the processes. We discuss the importance of this fact shortly.
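The symbolic patterns in Δ1 and Δ2 can be sketched as sign checks on the differences between ranked processes. The process names and mean values below are hypothetical placeholders, not entries from Tables 3 to 7, and treating a tie as a change is an assumption of this sketch.

```python
def change_pattern(reference_means, other_means, lag):
    """Rank by the reference size, then flag order changes at another size.

    lag=1 corresponds to the first-degree pattern, lag=2 to the second-degree one.
    """
    order = sorted(reference_means, key=reference_means.get)
    symbols = []
    for j in range(lag, len(order)):
        diff = other_means[order[j]] - other_means[order[j - lag]]
        # "+" keeps the reference ordering; "-" marks a change (ties included).
        symbols.append("+" if diff > 0 else "-")
    return order, symbols


# Hypothetical mean estimator values for four made-up processes.
means_n120 = {"P1": 0.10, "P2": 0.20, "P3": 0.30, "P4": 0.40}
means_n210 = {"P1": 0.12, "P2": 0.25, "P3": 0.22, "P4": 0.45}

print(change_pattern(means_n120, means_n210, lag=1))
print(change_pattern(means_n120, means_n210, lag=2))
```

In this made-up example P3 drops below P2 at the larger size, which gives one first-degree change and no second-degree change.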

Discussion
ApEn is a widely employed estimator of organization (complexity) of time-series (cf. Introduction). As discussed in [14], ApEn was devised as an alternative method to approach short time-series (< 1,000 data points) originated from unknown underlying processes. At that time, there was a struggle in empirical time-series analysis regarding the recognition of strange attractors (deterministic chaos) from other processes, and ApEn proved a valuable tool as an estimator of the rate of (informational) entropy production of a Markov chain approximating a given process (see [14]). However, as pointed out in a number of studies (cf. Introduction), this estimator suffers from problems of consistency and objectivity. Fig. 2 illustrates these problems well, and more robust tools are, therefore, relevant. In this study, we present a new estimator that fulfils this task, i.e., it has no subjectivity and, at the same time, is consistent.
To obtain the estimator, one constructs, numerically, the area under the curve of ApEn values for m = 1, obtained in a detailed range of tolerances, along a normalized tolerance vector. To supply the tool with a completely objective procedure, we show that the area obtained with the window size m = 1 will almost always be the largest one in relation to other window sizes greater than 1. Thus, in face of this, we name the estimator a1ApEn.

[Figure caption: Comparison of the variance of 17 data-sets with ApEn and a1ApEn]
It is expected that a time-series from a given generating process could have different values of a certain estimator depending on the sample interval. Our first goal was to show that a1ApEn is less sensitive to the sample interval than ApEn. Thus, the tool is not only objective and more consistent, but also more precise. Next, we proceed to show that the new estimator is more consistent than, and without the subjectivities of, another estimator derived from ApEn, namely, peak-ApEn. Some of the problems with this estimator are illustrated in Table 1.
The results from peak-ApEn (m = 1) and a1ApEn are shown in Tables 3 and 4, respectively. There, the time-series were sorted according to the values obtained for each estimator with N = 120.
The first important point to be noted is that the extremities (i.e., low and high values of the estimators) are coincident. This indicates that, in general, these two tools are apparently able to recognize time organization in similar ways (Table 5).
Table 5. Comparison of the classification given by peak-ApEn and a1ApEn as presented in Tables 3 and 4
On the other hand, when we address the issue of changes in classification that would occur if another size was employed, we find that peak-ApEn presents many more changes than a1ApEn (Tables 6 and 7). This feature highlights that a1ApEn is a more robust tool for analysis.
Another fact is as important as the issue of changes in classification above. Regardless of the size N, a1ApEn segregates the organized deterministic process of sine waves from the maps, while peak-ApEn mixes up the ordering of these different processes. As we stated, this is a very relevant issue since, even for small series (N = 120), a1ApEn can correctly identify different generating processes. At the same time, it should be noted that both tools correctly classify the random processes as the least organized ones.
Finally, plain inspection of Tables 3 and 4 reveals another significant result. As can be observed, the standard deviations of a1ApEn are all lower than those of peak-ApEn, with a 4.8 times lower median. This implies, as in the case of positional sensitivity, that a1ApEn is much less prone to variations than peak-ApEn when evaluating the organization of a time-series.

CONCLUSION
The analytical tool a1ApEn is consistent and has a completely objective procedure to address time-series temporal organization. The tool is able to discriminate adequately different generating processes and presents less variance than ApEn and peak-ApEn.