CONTROL ENTROPY: A COMPLEXITY MEASURE FOR NONSTATIONARY SIGNALS

. We propose an entropy statistic designed to assess the behavior of slowly varying parameters of real systems. Based on correlation entropy, the method uses symbol dynamics and analysis of increments to achieve suﬃcient recurrence in a short time series to enable entropy measurements on small data sets. We analyze entropy along a moving window of a time series, the entropy statistic tracking the behavior of slow variables of the data series. We employ the technique against several physiological time series to illustrate its utility in characterizing the constraints on a physiological time series. We propose that changes in the entropy of measured physiological signal (e.g. power output) during dynamic exercise will indicate changes in underlying constraint of the system of interest. This is compelling because CE may serve as a non-invasive, objective means of determining physiologicalstress under non-steadystate conditions such as competition or acute clinical pathologies. If so, CE could serve as a valuable tool for dynamically monitoring health status in a wide range of non-stationary systems.

1. Introduction.Information theoretic analysis of dynamical data has become a popular approach to leverage thermodynamic concepts, information theory, and statistical mechanics in an attempt to quantify the so-called "complexity" of the system which is responsible for generating the data.Particularly for the biological and physiological data-sets , quantifying disorder of the system has become popular as an intense area of promising recent research; see for example cardiac variability analysis in [19,16,9,13], gait analysis [15,8], circadian rhythm [1], postural control [6,7], and other important physiological experimental time series which have been analyzed by such methods.At the heart of such analysis, is the concept of quantifying the information evolution of transitions associated with probabilities assigned to each state, with a goal of providing single value (an entropy) to describe this information content.For example, with an appropriate finite partition of labeled states, i = 1, 2, ..., n, and a probability measure p i on that partition, the Shannon entropy of a random variable is defined [30,10] by However, given a finite data set (for instance, a portion of a time series), the issue of how to partition and how to best estimate these quantities p i is not trivial.When the data comes from a dynamical system, one may proceed as Grassberger and Proccia to generalize Shannon entropy by constructing the the Renyi spectrum of entropies, [14,26].From a time series of measurements, {x(t i )} N i=1 , after constructing delay vectors, X(t i ) = (x(t i ), x(t i + τ ), x(t i + (m − 1)τ ), i = 1, ..., n, n = N − (m − 1)τ , to create the m-dimensional Taken's embedding [32,29,12]. .Then in terms of the m-dimensional partition of uniformly sized hypercubes of side rhypercubes, a relative probability p i may be associated to be the relative occupancy of the orbit in the i-th hyper-cube.Then Renyi-entropies are defined, K q = lim r→0 lim m→∞ 1 1 − q ln I q (r), where, I q (r) = (p j ) q , q ≥ 0, although, more generally one would need to take the supremum over all possible partition refininements, rather than simply refining the partition.As a special case, Komolgorov-Sinai entropy [31,12], h KS -also called measure theoretic entropy, associated with K 1 , provides an ergodic quantifier of information theoretic complexity of a dynamical system.It defines a rate at which a dynamical systems looses initial precision, and amplifies external noise, where this rate is emphasized according to the system's natural measure [22].However h KS is generally not easy to accurately estimate from data, with difficulty in actually quantifying the appropriate probabilities.Another special case is the correlation entropy (K 2 ) [12,17], defined by (2) with q = 2.In addition to being a lower bound of KS-entropy as discussed in [17], due to convexity [35], it serves in its own right as a suitable quantifier of diversity of a data set.Most important to application is that K 2 can be quickly and accurately computed from a correlation integral, for which there are excellent algorithms for estimation from finite discrete data sets, such as in [33].
A great deal of attention has been paid lately toward a particular statistical estimator of an entropy of a given time-series, called Approximate Entropy (ApEn) by Pincus [23,25].Recently, an "unbiased" estimator of an entropy called Sample Entropy (SampEn) was developed by Richman and Moorman [27] as a counter-point of ApEn.We do not present the details of SampEn of a signal here, since it is well defined in [27].We will say only that the codes are widely accessible on the authors webpage, and they serve as a good estimator of K 2 .However, any code which adequately estimates K 2 (for some q) could be substituted in its place, such as the subroutines of the TISEAN package [17] or the maximum likelihood estimator of [21].A typical application is to use the algorithms to compute a regularity measure on a time series and then to use that computed value to classify the time series as being of one type or another.In many biological applications, they have been used to distinguish "healthy" from "unhealthy" biological signals [24,34,13].
As an extension of these applications, we consider the problem of continuous health monitoring, where the time series is not a fixed and complete set, but is "streaming."If we can associate a change in signal complexity with a change in health of the system, then we might hope that an entropy like measure might detect a developing problem (and possibly provide some warning before system failure).An inherent difficulty is that the basis for the entropy estimators makes an assumption of sufficient stationarity of the process such that the estimates of probabilities p i can be accurately and efficiently collected from the finite data sample.Moreover, our goal problem of detecting a change in the system implies an essential non-stationary at some time scale.We remark that that generally empirical KS analysis assume that the process that generates the time-series is sufficiently stationary that the underlying attractor in the phase space of the dynamical system becomes sufficiently sampled so that probability estimates become representative of the ergodic statistic.However, estimators such as ApEn and SampEn may be viewed as simply a statistic of the finite sample, without requiring stationarity.But even under this less formal interpretation, for these statistics to reasonably interpret the complexity of the signal requires that there is sufficient recurrence so that computed values can be interpreted as estimates of transition probabilities.Our main question of issue in this work will be the development of a method to infer by observations from a time series, of solutions from the system, whether there has been a significant parameter drift within the system.
Many real world systems which evolve in time may be idealized by a general deterministic model of the form, In this form, we may explicitly define a stationary system to be a dynamical system of the form Eq. ( 3), where the parameter(s) λ is (are) constant and where there is no explicit dependence on time in the description of f.One may argue that no real world system maintains its "constant" parameters λ, to be constant indefinitely.For example, consider the simple forced spring oscillator, ÿ + γ ẏ + λ = A sin wt, which may be written in the form of Eq. ( 3) with constant parameters for short times scales, but on much longer time scales, the spring constant λ may actually be time varying and perhaps decaying.To emphasize fast variables x and slow variables λ, it is common to write the standard singularly perturbed system [5], whose long term dynamics of the slow variable λ corresponds to the = 0 slow manifold, with corresponding differential-algebraic system, λ = f 1 (λ, x), f 2 (λ, x) = 0.However, when calling λ to be approximately constant, we are referring to the so-called associated system [2]; in the fast time scale, s = t/ , whence Eq. ( 4) becomes, and in the limit → 0, Eq. ( 5) yields the associate system, In this sense, it can be said that the parameter λ changes slowly in time.
In the rest of this paper, we will write x(t) to denote a vector valued function parameterized by time-t, where Ω is the scalar time domain.Similarly, x(t) will be written to denote a scalar measurement of the vector value, typically one of the vector components.Our measurements may come from a dynamical system, or a stochastic process, with measurement error.It is also notable that real world data sets from a continuous process are necessarily discretely sampled, and we will write, where typically, t i+1 − t i = constant, is a uniform time grid controlled largely by the engineering of the experimental measuring devices.For this reason, it is natural to consider not just continuous time processes such as flows from differential equations, or solutions from stochastic differential equations.For the rest of the presentation of this paper, we restrict our model problems to be discrete time maps, F : m → m , x i → x i+1 = F(x i ), which may be related to a flow by the Poincare' section method or by the stroboscopic method.

Development of Control Entropy (CE) .
Our goal is to develop an entropy like regularity statistic that can be applied against (possibly) non-stationary time series data in a way that allows insight into the (possibly) time varying parameters of the system.While in the case of a chaotic dynamical system with an invariant measure, it is suitable to discuss the full Renyi spectrum of entropies [17], it becomes less well founded for our goal problem.For this reason, we adopt the philosophy of ApEn [23,25], and similarly SampEn [27], developing a statistical measure of a data stream without a specific claim that the statistics should be converging to some "true" entropy value.Consequently, we may tune the algorithm to improve its utility in a particular setting.As such, our techniques are meant to allow the user to exploit (where possible) any expert knowledge of the underlying physical system that generates the data.The general characteristics of our statistic, which we call control entropy, are as follows: 1.The control entropy should be calculated in a moving window along the time series, generating an entropy value for each of many (possibly overlapping) windows.As such, the original data time series may be viewed as generating an associated control entropy series.In some cases, the physical system that generates the data might lead to reasoned choices of window, but one might also consider evaluating under multiple window sizes provides a multiscale view of the data.2. Computing entropy on only a window of the data necessitates a rapidly converging statistic.As such, we use a entropy based on correlation sums [17] , closely related to the technique of SampEn, but where we allow for removing the effects of short time correlation by using a Theiler window [33].3. To increase the stationarity of the analyzed signal, when we consider time series signal x(t), control entropy of x(t) is based on a correlation entropy of dx dt .For data signals sampled in discrete time, we simply compute on the finite difference (or increment) of the signal.That differences often render the signal stationary can be viewed as an empirical observation of many real world time series [11,17] without trying to establish a rigorous basis.We remark that many of the stochastic processes that arise in modeling (Levi process, Brownian motion, Poisson process, etc) have stationary increments as a fundamental assumption of the model, making our choice to analyze the differences a very natural approach to the non-stationarity.4. Standard methods for computing correlation entropy estimates on a continuous space require the choice of a "neighborhood size" which is related to the partitioning required to compute a numerical approximation of (2).As an alternative, one may transform the continuous signal into a sequence of symbols, and analyze the entropy of the symbol stream.We will often apply this symbolization approach, using the methodology of [20], with a primary goal of reducing the number of tunable parameters to increase the robustness of the approach [18].
To define control entropy via correlation sums, we proceed as follows: Let {z i } N i=1 be a scalar time series where data is taken on a uniform time grid.Without loss of generality, we assume unit spacing in time.For embedding dimension of m, let v i = (z i , z i−1 , . . ., z i−m+1 ) be a delay embedding, where we assume a unit embedding delay.The correlation sum [17] is given by where Θ is the heaviside function, r is a parameter defining a neighborhood, and is the total number of pairs of delay vectors.Integer parameter T ≥ 1 is the Theiler window, which mitigates the effects of time correlation in the data.We define Using this formulation as a regularity measure (without trying to consider scaling regimes and limiting values) is essentially equivalent to SampEn, which motivates are choice of symbology for the windowed version.Consider time series where J represents a time offset.SE assigns an entropy value to a time window of the dataset.J allows us to associate that entropy with a specific instant in time.
Choosing J = w associates the entropy of the window with the ending time of that window, while J = w/2 would associate that entropy with the time at the middle of the window.We now define control entropy CE by applying SE to the first difference, The CE statistic may be reasonable applied to both deterministic and stochastic process.In Sec.2.1, we show a short example based on Pincus' M IX p signal [23] and a stochastic random walk.Additionally, in Appendix A, we apply the method to a simple system of deterministic chaos.That example focuses on the robustness of the procedures in the presence of significant observational noise and added nonstationarity of the measured variable.is mean 0 with unit variance.Pincus showed that an entropy measure could be used to assess the randomness of the signal.We will use M IX p as a building block to create a signal that will illustrate the CE tool.
Let us assume that instead of being fixed, the parameter p is slowly varying in time.(For our example, p(t) is a prescribed function, graphed in Fig 1 .At each instant in time (t = j), the M IX process uses the current value of p(t) to generate a step size m j which feeds a random walk, given by where we take x(0) = 0 as a starting point.The resultant sequence is a sample from a generalized random walk, with strong correlations in time.Although the increments are not stationary, they are always mean 0 with unit variance.(See Appendix B for a detailed description of this process.)The goal of the CE analysis attempts to recover the shape of p(t) through entropy measurement of {x j }. Figure 1 shows a plot of signal {x}, parameter p(t) and an windowed SE and CE analysis.We remark that on windows that are small enough to follow the changes in p, there is insufficient recurrence in x(t) to provide reliable entropy analysis, and SE provides little insight to the parameter, while CE provides a reasonable recovery of the variations in p(t).

2.2.
Symbolization and SAX Partitions .Symbolic methods have a long history in application to dynamical systems.In many statistical and datamining environment, converting from continuous signals to a symbol space has been shown to be a robust way of detecting changes in signal characteristics [20,18].Let us assume that we have a reasonable method for converting continuous values x i to symbols s i , where the cardinality of the symbol set is b.If we assume s i − s j = 0 when s i = s j , and 1 otherwise, then we can compute control entropy on the symbolized space by We acknowledge that the symbolization procedure may can give misleading results; we have written extensively about the role of a generating partitions in dynamical systems and the computation of entropy with so-called "misplaced" partitions [4,3].The relevant message here, from our work, is that the role of the partition is important to the specific results.At best, one expects the numerical values to be biased, but it may also possible lead to nonintuitive non-monotonicity (inconsistency) of the results.With respect to symbolic dynamics from dynamical systems, we take a broad view here, toward a principle that the partition should be designed to allow the data set to demonstrate its full "complexity."While, for an infinite data set, refining the partition to allow more symbols may lead to better results, refinement eventually leads to poor results for finite data sets because of an under-sampling of recurrences through the partition.On the other hand, as analyzed extensively in [4,3], when investigating a highly complex signal with a theoretical entropy > ln b, then an b-symbol partition cannot reveal the full complexity.We take the design goal of a useful partition to be to facilitate the finite signal to "maximally express itself" in the sense of frequent recurrences through the designed partition while maintaining a simplicity of implementation.As a general approach, one would like to have approximately equal usage of each of the symbols in the chosen symbol set.In Fig. 2, we show an empirical histogram of the power data, {x(t i )}, for the Giro d'Italia data set from Fig. 10, which suggests a normal The signal {x j } to be analyzed.The "constant width" of the noise (small scale oscillations) is due to unit variance of the M ix p signal which provides the increment for the random walk.(Bottom) Drive signal p(t) (gray) was specified.The black signal p 3 00 shows resultant (random) realization, where the plotted value shows the fraction of random steps within a length w = 300 moving window.CE 8 (red) (using a 8 symbol discretization of data, as described in Sec.2.2, and window size w = 300) provides a very reasonable tracking of p(t).However SE 8 (blue) is unable to follow the changes in the parameter.distribution, in this case.Further, in Fig. 10(right), we show a windowed collection of histograms resulting from sliding block windows across the same data set.In this case, we see a generally unimodal distribution is maintained throughout.This motivates our standard partitioning method, called SAX -Symbolic Aggregate ap-proXimation, [20].The method fits an appropriate normal distribution, N (µ, σ), to the data set {x(t i )}, and then converting to the standard normal distribution N (0, 1), by the usual Z = x(ti)−x σ , we may associate a integer j ∈ 1, 2, ..., b by a uniform partition of the z-score of each {x(t i )}.In particular applications, we may find that the data set requires other partitioning schemes, but the SAX approach provides an "off-the-shelf" approach that appears to perform satisfactorily against a wide variety of signals.A pdf suggestive of a normal distribution, supporting the use of the SAX algorithm.The notable deviation from normality is the p(t) = 0 bin, which is due to the moments when the bicycle racer "coasts" during a ride.(right) Windowed Histograms, moving in time, maintains a generally unimodal distribution, for this data set, suggesting some validity in this case in maintaining the partition developed using the full time series.
Although these symbolic procedures reduce the number of parameters, we must still choose an appropriate alphabet size.We note that the theoretical maximum entropy on b symbols is ln b.In practice, if we find CE b ({x(t i )} ≈ ln b, then we say that the signal is saturating the entropy measurement, indicating that the partition is not fine enough to accurately capture the relevant dynamics.The competing limitations to balance in choosing b are that, • If b is too small, there may be saturation, but, • If b is too large, we may observe lack of recurrence, making estimates of transition probabilities (and the entropy estimate) unreliable.See the example data set in Fig. 10(Left) of a power data set which suggested to us the need of just such a non-saturating CE b as in Fig. 10(Right) in which b = 8 symbols are chosen for SAX-8.

Biomedical and Physiological Examples .
In demonstration of the kind of natural data sets with nonstationarity which we are now able to address, consider the samples demonstrated in Figs.3-8.These are each biomedical and physiological examples for purpose of this paper, each with the characteristic nonstationarity which renders traditional entropy methods more difficult.These examples range from cyclists power output during exercise in the lab during a Vo2Max test, Fig. 3, a maximal aerobic power test Fig. 4, during an intervals session in the lab, Fig. 5, to heart rate data taken during ergonomic testing, Fig. 6, to accelerometer on muscles during isometric exercise data, Fig. 7, and accelerometer data taken during treadmill testing for VO2max Fig. 8.One may compare this to non-laboratory data taken on the open road during a professional Grande Tour bicycle race, Fig. 10.2.4.Time-scales and Variability.We are primarily interested in how the data set may evolve in time.In particular, we hope to assess the behavior of some underlying, slowly varying parameter.We use a windowed fraction of the data, we compute the statistic CE on windows of size w, and then observe the windowed CE value as it evolves in time.The choice of window size must balances competing factors of: • Smoothing and convergence of the estimated statistic CE with larger w.
Larger numerical samples reduce statistical fluctuations in the computed statistic for each sliding window.If w is too small, the fluctuations may be so large as to mask the underlying parameter change of interest.• Detecting changes on (relatively) short time scales.If w is chosen to large, the window may stretch across several "features" of a parameter change.The CE algorithm assigns an "average" value to the window.Real fluctuations of the parameter that occur on time scales that are shorter than the window length may be difficult to discern.
The balance is similar to that of standard moving average filters, with longer windows providing smoother curves while limiting the capability to follow higher frequency fluctuations in the signal of interest.The examples, shown in Fig. 12, as well as Fig. 11, illustrate the smoothness gained by longer windows, while shorter time scales pushes the limits of limited data for convergence of the entropy statistic.We remark that the average period of the underlying system drives the sampling rate that is needed for the discrete time series to capture the full dynamics, and such sampling requirements are well understood.However, since the signal of interest is the changing behavior of some system parameter, that theory cannot be directly applied.To find the appropriate data scale, we run the the windowed CE analysis, across many time scales, as shown in Figures 11 and 12.
3. The Name Control Entropy.In this section, we motivate the choice of the term Control Entropy to define our statistic.Although the formulation of Sec. 2 may be viewed simply as an estimator of correlation entropy of the derivative of a time series, we focus our description on a primary application of interest, where the measured time series is taken from some physical process where control is required.We remark that a vast number of physical systems of interest can be interpreted within the framework of control systems theory such that our discussions are relevant to the broader set of examples.
For specificity, let us suppose that we desire to analyze power signal P (t), the measured output by a bicycle rider.Furthermore, suppose the rider is trying to achieving some specified power level, denoted P ref .
Obviously, over the course of time, that value may change, but let us treat it as a constant over some time interval of interest, with the rider doing something to try to reach and maintain that level.We would expect that some type of feedback control description of the system would provide a reasonable characterization of the important dynamics.(See Fig 9 .)Let x(t) be defined by where we now view x(t) as carrying the state variable information (where some embedding may be required to properly characterize the state equations).A standard control theory approach is to assume that we are near the desired control point, Figure 9. Basic feedback controller.e(t) is the error between the reference power and the actual power.u(t) is the control signal.so that a linearized approximation is reasonable, with the system governed by the dynamical system ẋ = Ax + Bu (16) where ẋ denotes derivative with respect to time.u = u(t) is a control signal, generated based on the sensed error.In our simple description, x(t) is the difference from reference power, and can be treated as the error, and the controller generates signal u(t) as some function g of the error.If we continue the assumed simplification of a linear form then this relationship can be described by where K is a matrix that defines the controller.Then ( 16) can be rewritten as The control system designer chooses K so that the resultant system of ( 18) is stable.CE is computed as an entropy of Ṗ .If we assume that the rider is trying to maintain a constant power, then dP ref /dt = 0 most of the time, deviating only during those short intervals when the rider decides to ride at a different power level.Differentiating (15), we have which we interpret as indicating that the CE computation as allowing direct insight into the control system behavior, providing an entropy measure of the control signal.The linear analysis above meant to justify the view that CE is descriptive of the controller but is not rich enough to show why we consider the evolution of CE, as described in the time-window statistic of (13).To provide a heuristic argument, we consider the following general description of the controlled dynamical system: If x describes a vector of state variables for some system, then the controlled behavior is ẋ = f(x, u s (t), u c (t), t), (20) where f gives the functional dependence of the evolution of the system on state x at time t, with control u decomposed into a subconscious component u s , and conscious component u c .While the control signals will certainly depend upon some desired reference behavior, we make no specific effort to distinguish feedback and feedforward controllers, allowing that these control signals might also depend upon the state x.Then CE can be viewed as measuring the complexity of the signal Changes in the complexity of this signal would indicate changes in the characteristics of the attractor of this system, which would logically be associated with either changes in the control system or changes in the underlying physiology of the system.
Analysis of human gait experiments indicate that subjects walking at their natural pace (called unconstrained walking) show a complex pattern in stride interval, with long time correlations in the time series data, with these correlations disappearing when the person was required to walk to a metronome [15].CE measurement of the signals indicates higher entropy for the metronome walking as compared to the unconstrained case.One interpretation of the stride walking data is that without a metronome, the body is required to generate a clocking signal to drive the overall cadence.With the goal of a steady walking pace, this requires periodic control signal u c .The resultant signal is correlated in time, which implies that the future states is correlated with current state, so that less information (per unit time) is required to describe the evolution of the system (a lowe entropy).When stepping with a metronome, the periodic drive is generated by an external source, with entrainment to that signal requiring only a small amount of conscious (periodic) control.In other dynamic exercise situations, the "constraint" might be the individuals desire to maintain a certain work output (e.g.running speed).During extended efforts, as the body fatigues, the natural (unconstrained) action by the body would tend to slow down the system.Anecdotal evidence indicates that the athlete must concentrate harder to maintain a constant work rate.If we associate this concentration with an increased conscious control u c , (asking the body to maintain a steady level), then the entropy of signal s(t) would fall as the athlete becomes more fatigued.Additionally, we might expect that if the athlete loses concentration, u c would be small, and the signal entropy would rebound to previous levels, though now the work rate output by the system might be significantly reduced.Although we use the u c and u s designations for this heuristic justification, from a practical standpoint, CE does not depend on conscious control.Simply stated, the time-varying methodology is meant to detect complexity shifts that would be indicative of changes in an underlying system parameter.
4. An effect of observation noise on CE .Since our methods are focused on application to measured data from physical systems, the effect of noisy observations must be understood.The methodology of [27] provides a means of assessing the expected fluctuations in the measurement due to the randomness of the model, but they do not account for noise.Specifically, if a process has a continuous noise input, its "true" entropy is infinite [17], with the measured entropy strongly dependent upon the observation scale.Although our choice of symbolization mitigates some of these effects, if one is attempting to assess parameter changes of the system through entropy measurement, one must be able to quantify the effects of observational noise.CE computations are fundamentally counting statistics.Counting a particular symbolization amounts to application of a characteristic function, which is inherently a nonlinear function, where standard statistical methods are difficult to apply.The goal of this section is to present a method to develop an algorithmic approach to assess an empirical distribution of error in the noisy CE computation.We model the noise of the time dependent random variables X(t) leading to individual measurement realizations x(t), where we acknowledge both a continuous measurement noise (error) as well as discretization errors which may occur in certain digital devices that quantize the reported signals: Let, where X (t) is the "true" physical value being measured, with error due to added random noise variable ξ(t).We take ξ(t) to be a time dependent random variable, with a time dependent probability distribution function ν(t).Notice that we have assumed added noise to the signal, which should be sufficient for modeling measurement noise, (whereas other models might be required to study the stochastic evolution of the system.)Suppose we wish to compute a time series of CE from measured data {x(t i )}.Since different realizations of the added noise ξ(t) at each instant t may lead to different symbolizations, different computed control entropy values could result.Therefore, if we have a good model for the noise profile, ν(t), we may use a random number generator to produce appropriately distributed instances of possible, −ξ(t).Thus we will define a Monte-Carlo style simulation of the noise, from which we will simulate appropriate samples.In practice as computer code, we declare pseudo random numbers ξ(t) (a coded subroutine to be built), to be distributed as ν(t) at each instant t.With ξ(t), we may generate the Monte-Carlo instance, For each sampled instance, x(t i ), we may compute instances of CE, where all parameters are chosen as in the analysis of the original time series.Repeated sampling, Monte-Carlo style, of this generated time series may be performed to sample the associate distribution of ĈE(t), at each t, with error bounds based on these distributions.For the simulations shown here, we specialize to assume a typical form of ν(t).We take as an example the power in watts from the professional bicyclist in Fig. 10 as measured by a device called an SRM Pro powermeter, which uses several piezoelectrical strain gauges built into the bicycle crank armature to infer applied torque.Measured values are reported in whole integer valued numbers of watts, in the range 0-2500watts, with a margin of error imprinted from the factory of +/-2.5%, which we assume to indicate normally distributed and unbiased noise, which we take (as worst case) to be identically distributed.So we take ξ(t) ∼ N (0, σ) + U (−0.5, 0.5), again where, σ = 0.025x(t), (25) where the uniform noise term is meant to describe the discretization error by assuming a uniform rounding procedure.
Proceeding with the given measured signal, {x(t i )}, and computing ĈE by Eq. ( 24) and with discretization and noise errors modeled (25), we generate many samples, N , of possible instances of ĈE time series.In Fig. 10(Right), we see the result of such repeated Monte-Carlo simulations of possible CE, resulting from possible original signals, through N = 250 simulated pseudo randomized signals.In the Figure, we have displayed 95% (blue) error bars from the simulation, which is the CE-value for each t such that 0.025 of the simulated pseudo randomized signals lie above (below) the sampled population.Also shown in red is the computed CE[j; w, {x(t i )}] from the measured signal {x(t i )} without any pseudo randomization.The expectation is that the CE-signal will lie within the blue error bars roughly 95% of the time.We observe in Fig. 11 that as the window length increases, the error bands of CE become tighter.This is due in part to the particular form of the noise, for which repeated sampling gives better averages, and the essentially counting nature of the CE computation.
On the other hand, as the window length becomes too long, the computation averages through the slower-time scaled parameter variations, which are the very events which are being investigated.That, the time scale of slow varying parameters is changed with window lengths, is also discussed in Sec.2.4.See also Fig. 12 for further study of this time scales issue.5. Discussion.We have introduced a new entropy statistic designed to assess the behavior of real systems under the condition of nonstationarity.In particular the focus is on those systems where the nonstationarity is due to some slowly varying parameter of the system.Our focus is on application to systems where the change in parameter would be indicative of a change in the "health" of the system.Control Entropy is based on a windowed application of correlation entropy, but uses techniques from symbol dynamics (as well as the increased stationarity of signal increments) to improve the robustness of the statistic.A key component of the approach is that to analyze for a slowly changing parameter, one must employ some type of windowed analysis of the signal.CE provides a sufficiently rapid convergence of the statistic to allow exploitation of this irregularity measure on real signals.In Sec.2.3, we have shown a vast array of examples on physiological data.In Sec. 3 we give an heuristic argument as to how changes in CE might be related to the underlying control mechanism.These initial studies seem to indicate that CE may serve as a non-invasive, objective means of determining physiological stress under non-steady state conditions such as competition or acute clinical pathologies.Crucial to being able to properly interpret these signals, especially in a clinical setting, requires that the effects of measurement noise be understood.Section 4 explores a statistical methodology for determining when the changes in CE can be interpreted as "real" changes in the underlying system.With appropriate statistical tools in hand, it appears that CE could serve as a valuable tool for dynamically monitoring health status in a wide range of non-stationary systems.However, at some point, when the window is longer than natural trends in parameter variations, there may be a loss in variability of CE computation due to averaging between several trends.By the second and third CE plots from the bottom-left, we see a general decline in CE as the ride progresses, overlaying some general oscillations which perhaps correspond to ebbs and waves of the race pace due to mountains and strategy.
Appendix A. Example of CE applied to deterministic chaos system with noisy observation.Consider a discrete-time chaotic map, for which we choose the specific example, a family of skew tent maps, F a,b : When a = 1/2, b = 0, it is known [28,4] that this map has entropy exactly h 2 (F ) = ln(2).In Fig. 13, we show two different tent maps, one with a = 1/2, b = 0, and the other with slightly lower entropy, which will be the subject of demonstration in Fig. 14.The time series experiment in Fig. 14 is run for half of the time with the higher entropy blue map, and then is switched abruptly to the lower entropy red map.
In Fig. 14(Upper Left), we show a chaotic time-series x(t) of 25, 000 iterates, where the parameters of the underlying dynamical system are abruptly changed at time 12, 500, from the maximal entropy tent map, to the submaximal tent map shown in Fig. 13.Because the data-set is stationary, a windowed application of the popular SE successfully detects the parameter change, as Fig. 14(Middle Left) demonstrates an abrupt decrease in measured entropy.For reference, the vertical blue line is placed at the moment of the theoretical change in the map's parameters.
Next, we modify the stationary chaotic signal to produce what our "benchmark" of a nonstationary signal, shown in Fig. 14(Upper Right).The dark curve, which we name y(t), running through the middle of the erratic signal shown is an artificial and deliberately wandering baseline signal designed explicitly to produce a definitive nonstationarity.The erratic signal, x(t), shown is then a sum of erratic components, x(t) = x(t) + y(t) + ξ(t), (28) where x(t) is the deterministic chaotic signal, y(t) is the wandering baseline signal, and ξ(t) is a random component as follows: Let N (0, σ) be a normal random variable for which we choose a large variance, U (0, 1) the uniform random variable, and which yields a noise signal which is "off" 95% of the time, but gives a large normal noise component on about 5% of the measurements.The designed signal x(t) is highly erratic with both chaotic and large wandering random components.Due to the high degree of nonstationarity, application of SE algorithm to the x(t) signal fails to indicate the moment of the changed parameters, as no apparent change in SE at the time of parameter change (see Fig. 14(Middle Right)).However, as seen in Fig. 14, CE also detects the event change in both the nonstationary and the stationary signal.

Figure 1 .
Figure1.A random walk from M ix p .(Top) The signal {x j } to be analyzed.The "constant width" of the noise (small scale oscillations) is due to unit variance of the M ix p signal which provides the increment for the random walk.(Bottom) Drive signal p(t) (gray) was specified.The black signal p 3 00 shows resultant (random) realization, where the plotted value shows the fraction of random steps within a length w = 300 moving window.CE 8 (red) (using a 8 symbol discretization of data, as described in Sec.2.2, and window size w = 300) provides a very reasonable tracking of p(t).However SE 8 (blue) is unable to follow the changes in the parameter.

Figure 2 .
Figure 2. Data histogram(s) of the power data from a professional bicycle rider on the open road, as shown in Fig. 10.(Top)A pdf suggestive of a normal distribution, supporting the use of the SAX algorithm.The notable deviation from normality is the p(t) = 0 bin, which is due to the moments when the bicycle racer "coasts" during a ride.(right) Windowed Histograms, moving in time, maintains a generally unimodal distribution, for this data set, suggesting some validity in this case in maintaining the partition developed using the full time series.

Figure 3 .
Figure 3. CE of power output during incremental cycling VO2max test on laboratory ergometer.(Upper) Plot of p(t), measured power in Watts, versus t in minutes.The subject completes a warm up at 60 Watts.At 5 minutes, the workrate is increased to 100 Watts, and increases thereafter 15 Watts per minute until exhaustion while metabolic gasses are collected (data not shown).(Bottom) The CE of power as a function of time t displays what we have observed to be a typical scenario during stressful exertions, that being a decline in CE.

Figure 4 .
Figure 4.A cycling effort on laboratory ergometer at maximal aerobic power (MAP) established by VO2max test shown in Fig. 3.(Upper).Again, p(t), measured power in Watts, plotted versus t in minutes.The subject performs a warm up at 25%, ramp to 50%, then maximal effort at MAP until exhaustion.(Bottom).CE of power is elevated during the warm up, and declines as workrate is ramped to 50%, then reaches a nadir during the effort at MAP despite the relatively short duration of the maximal effort.

Figure 5 .Figure 6 .
Figure 5.An intermittent cycling effort on laboratory ergometer at MAP. (Upper).p(t), measured power in Watts, plotted versus t in minutes.The subject performs a warm up at 25%, ramp to 50%, then effort is alternated between 100 and 50% of MAP until exhaustion.(Bottom).Again, despite the variable nature of the effort, CE of power declines at the end of the warm up and stays suppressed during the remainder of the protocol.

Figure 7 .Figure 8 .
Figure 7. CE of mechanomyography (MMG) during isometric muscle action at 100% maximal voluntary contraction.(Upper).V (t), measured voltage, versus t in seconds.Voltage is signal from a uniaxial accelerometer placed on the surface of the Vastus lateralis muscle of the thigh.Signal is collected during a 10 sec baseline rest, 70 sec maximal isometric leg extension, and 10 sec recovery.The accelerometer measures muscle vibration as a result of muscle action.(Bottom).CE of accelerometer voltage declines dramatically at the start of the contraction, and remains depressed until fatigue and return to baseline rest.

Figure 10 .
Figure 10.Power data from a professional bicycle rider on the open road, during a major European Grande-Tour, the 2001 Giro d'Italia.(Upper Row) Power p(t) data set in watts is plotted versus time t in minutes.Apparently, and notably, there is a great deal more variability in the power of a riding in the professional peleton compared to a amateur in the laboratory setting from Figs. 3-3.(Bottom-Left) CE computed by sign-partition method, Eq. (2).With essentially 2-symbols allow, and the highly variable and complex signal of the open road racing, a great deal of time yields saturation of the computed CE≈ ln 2, suggesting the need for a finer partition.(Bottom-Right) CE 8 , using the SAX method and 8-symbols reveals a common scenario here , which is decreasing entropy with rider stress as the control signal becomes more constrained.Statistical error bands are shown, as explained in Sec. 4.

Figure 11 .
Figure 11.Window length study of CE of power during submaximal cycle ergometer trial.The subject performed warm up and cool down at 200 Watts, with three 15 min efforts at 315 Watts and 20 min at 280 Watts (50, 78 and 68% of VO2max, respectively).(Top) Power p(t) in watts as a function of time t in minutes.(Successively from 2nd to Bottom) CE for increasing window lengths w = 201, 401, 801, 1601, 3201 data points, with w = 201 corresponding to a window duration of ≈ 4.2 minutes.Error bands (black) are as described in Sec. 4. Observe that smaller windows are able to detect the high intensity intervals, but at the cost of greater statistical fluctuation.The longer windows provide a timeaveraged interpretation that indicates the overall trend toward exhaustion without regard to changes in intensity of the workout.

Figure 12 .
Figure 12.CE time-scale study of power data from a professional bicycle rider on the open road.(Left) CE[j; w, {p(t i )}], according to Eq. (13), for increasing time windows, from top, w = 401, 801, ..., 12801, for a CE filtering of data in windows of natural units, W = 8.41sec, 16.821sec, ..., 268.821sec.(Right) The same data plotted simultaneously in the same window.We see for small w variability in CE[j; w, {p(t i )}] due both to statistical variability of the data and corresponding convergence of the CE statistic, as well true parameter drift.On the other hand, as w is increased, we see smoothing of the CE variability, to the point that we can see variations due to slow drift in parameters more clearly.However, at some point, when the window is longer than natural trends in parameter variations, there may be a loss in variability of CE computation due to averaging between several trends.By the second and third CE plots from the bottom-left, we see a general decline in CE as the ride progresses, overlaying some general oscillations which perhaps correspond to ebbs and waves of the race pace due to mountains and strategy.

Figure 13 .
Figure 13.Two Tent Maps with different underlying entropy.The time series experiment in Fig.14is run for half of the time with the higher entropy blue map, and then is switched abruptly to the lower entropy red map.
(t)]Straight SampEn[x(t)] fails to identify underlying entropy event of nonstationary signal (t)] successfully identifies underlying entropy event

Figure 14 .
Figure 14.CE locates changes in entropy of nonstationary signals.(Top Left) A chaotic signal, determined by a chaotic tent map is designed with a change in from the larger entropy blue chaotic tent map to the smaller entropy red map as shown in Fig. 13.(Mid Left) a moving window estimate of entropy of the time-series signal using SampEn method successfully detects the change in the driving process.(Lower Left) CE also successfully detects the change in parameter corresponding to changed topological entropy.(Top Right) The stationary signal x(t) (shown Top Left) is added to riding signal -shown in blue through the middle -and the resultant signal is contaminated with measurement noise, which gives the nonstationary stochastic signal shown in red.(Mid Right) SampEn of this noisy signal; note that SampEn cannot "find" the parameter change in the nonstationary signal.(Lower Right) CE detects the change, despite the highly nonstationary signal.

E
.M. BOLLT, J.D. SKUFCA AND S.J. MCGREGOR generalized random walk is time-inhomogeneous Markov chain with strong time correlations and non-stationary increments.