A Novel Statistical Approach To The Use of Control Entropy
Parshad, McGregor, Busa, Skufca, Bollt

Control entropy (CE) is a complexity analysis suited to dynamic, non-stationary conditions that allows inference of the control effort of the dynamical system generating a signal. These characteristics make CE a time-varying quantity highly relevant to the dynamic physiological responses associated with running. Using high resolution accelerometry (HRA) signals, we evaluate constraints on running gait in two groups: highly trained collegiate runners and untrained runners. To this end, we further develop the CE statistic to allow group analysis, comparing the non-linear characteristics of movement patterns in highly trained runners with those of untrained runners to gain insight into gaits that are optimal for running. Specifically, CE produces for each individual a response time series that describes the control effort; the group analysis of these response shapes developed here uses the Karhunen Loeve (KL) modes of these time series, which are compared between groups by applying a Hotelling T² test. We find that differences in the shape of the CE response exist within groups between axes, for untrained runners (vertical vs anterior-posterior and mediolateral vs anterior-posterior) and for trained runners (mediolateral vs anterior-posterior). Between groups, the shape of the CE response differs in the vertical axis but not in the mediolateral axis. Further, CE as a whole was higher in each axis in trained than in untrained runners. These results indicate that the approach can provide unique insight into the differing constraints on running gait in highly trained and untrained runners running under dynamic conditions. Further, the final point indicates that trained runners are less constrained than untrained runners across all running speeds.

1. Introduction. It has been estimated that approximately 10.5 million Americans run at least 100 days/year [23]. In spite of this popularity, there are few data on the gait patterns of highly trained or elite runners compared with untrained runners. It might be anticipated that highly trained runners develop, through practice, an optimal pattern of movement and a corresponding variability of movement for running [8], [35]; comparisons with untrained runners would therefore be of not only performance but also clinical value. It has been argued that a dynamical systems approach to gait analysis is more appropriate than traditional linear approaches, and as such there has been increasing interest in the variability of gait. It is not clear whether higher or lower variability is optimal for performance, nor whether changes in variability of movement depend on practice/training [8], [26]. This ambiguity may, in part, be due to the nature of the variability that is identified (i.e., linear vs non-linear) and the differences that lie therein. To add clarity to this area, comparing the non-linear characteristics of movement patterns in highly trained runners with those of untrained runners may give insight into aspects of gait that are optimal for running. This could be of value for clinical comparisons or as models of optimization for the development of robotic locomotion systems [6], [9].
Previously, various tools from non-linear dynamical systems analysis have been applied to human gait data [1], [5], [10], [16], [17], [18], [19]. In particular, regularity/complexity statistics such as Approximate Entropy (AE) and Sample Entropy (SE) have been used to study the complexity of gait [5], [10], [16], [17], [19], [24], but many of these tools require stationarity, which makes them ill suited to data collected under dynamic conditions. Recently, we developed a novel tool for complexity analysis under dynamic, non-stationary conditions, termed control entropy (CE) [4]. A central characteristic of CE is that it allows inference of the control effort of the dynamical system generating the signal, without requiring the signal to be stationary [4].
We have also previously used high resolution accelerometers (HRA) to characterize differences between trained and untrained runners using linear approaches [21]. Further, we have used HRA and CE to examine differences in constraints between planes of movement during walking and running in highly trained runners [22]. In that work, however, appropriate tools for rigorously testing differences between groups were limited. In the current study, we use HRA and CE of the acceleration signal to compare the complexity of gait patterns during running in highly trained versus untrained runners. We hypothesized that 1) differences would exist in CE of acceleration between axes in both trained and untrained runners, 2) trained runners would exhibit higher CE values than untrained runners at comparable speeds, and 3) decreases in CE from peak values would occur at higher speeds in trained vs untrained runners. To test these hypotheses between groups, we applied statistical tools that complement the CE analysis by allowing shape analysis of each group's CE profile, which describes the complexity of gait control. Shape analysis here decomposes the profiles into dominant modes by Karhunen Loeve (KL) analysis, and these shapes are compared between groups by a Hotelling T² test. Details of this new approach to group analysis of the shapes of system responses are given in the Materials and Methods section.

Results.
Control entropy responses in untrained and trained runners by axis. Results of Karhunen Loeve analysis of CE of accelerations for individual axes in untrained runners are shown in Figure 1. For each axis, a dominant mode was identified and its likelihood determined and presented as a power spectrum. Our analysis indicates a significant difference in the shape of the CE response in untrained runners for vertical (V) vs anterior-posterior (AP) and for mediolateral (M) vs anterior-posterior (Figure 1, Table 1). A non-significant trend was also observed between vertical and mediolateral.

Table 1. Mean vector results were not significant when comparing vertical (V) vs mediolateral (M), but were significant when comparing both vertical (V) vs anterior-posterior (AP) and mediolateral (M) vs anterior-posterior (AP).
In Figure 1, it can be seen that the CE response in vertical (blue) remains more consistently high than mediolateral (red) and anterior-posterior (green), which start higher than vertical at 8 kph but decline below vertical by the 16 kph stage. The results of Karhunen Loeve analysis of CE of accelerations for individual axes in trained runners are shown in Figure 2. As in untrained runners, a dominant mode was identified for each axis and its likelihood presented as a power spectrum. In Figure 2, it can be seen that the CE response in anterior-posterior (green) remains more consistently high than vertical (blue) and mediolateral (red). In contrast to untrained runners, in trained runners anterior-posterior started lower than vertical and mediolateral at 8 kph and increased slightly until declining late in the test, but remained higher than vertical and mediolateral (Figure 2).

Control entropy response of trained versus untrained runners by axis.
When untrained runners were compared to trained runners using the developed shape analysis, the shape of the CE response was significantly different between groups in the vertical axis (Figure 3), but not in the mediolateral axis (Figure 4). There was a non-significant trend for the CE response to differ between groups in the anterior-posterior axis (Figure 5).

Analysis of scatter plots. In [21] we brought some of the techniques of proper orthogonal decomposition into our work. If we treat the CE time series data as a statistically sampled ensemble under the standard assumption of normally distributed i.i.d. data, the singular value decomposition (SVD) yields the least squares solution to a parametric fit describing how each individual signal is a linear combination of the singular vectors, through an equation (1) describing level curves of the χ² distribution [20], where δa is the projection of a particular data point (here, a newly sampled and processed CE time series) onto the singular vectors. Since a fast-decaying average power spectrum implies that only a few singular vectors describe the dominant modes, we provide scatter plots (Figures 6-11) of δa₁ and δa₂, the coefficients of the first two modes. We consider a direct description of how points reside in a simple region. That is, given a distribution ρ(δa₁, δa₂) and a box-shaped region

$$B = [l_1, u_1] \times [l_2, u_2], \qquad (2)$$

the probability that a single sample resides in the box is

$$p = \iint_B \rho(\delta a_1, \delta a_2)\, d\delta a_1\, d\delta a_2. \qquad (3)$$

In any case $0 \le p \le 1$, so the probability that all $n$ independently sampled experiments land in the box is

$$P = p^n, \qquad (4)$$

and, for $p < 1$, this quantity approaches 0 as $n \to \infty$. Rather, we may ask what is the probability that $m \le n$ of the $n$ trials land in the box. This is a Bernoulli trials experiment.
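A minimal sketch of the box-occupancy reasoning above, assuming independent samples that each land in the box with the same probability p of Eq. (3). The function names are hypothetical. The sketch evaluates Eq. (4), the binomial probability that at least m of n samples land in the box, and the one-sided lower confidence bound on p implied when all n samples land in it (the Clopper-Pearson bound for n successes in n trials).

```python
from math import comb


def prob_all_in_box(p: float, n: int) -> float:
    """Eq. (4): probability that all n independent samples land in the box."""
    return p ** n


def prob_at_least_m_in_box(p: float, n: int, m: int) -> float:
    """Binomial (Bernoulli-trials) probability that at least m of n samples land in the box."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(m, n + 1))


def lower_bound_p_if_all_in_box(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower confidence bound on p when all n samples landed in the box.

    Solves p_L ** n = alpha: the smallest p under which observing n successes
    out of n trials still has probability at least alpha.
    """
    return alpha ** (1.0 / n)


if __name__ == "__main__":
    print(prob_all_in_box(0.9, 11))           # shrinks toward 0 as n grows
    print(prob_at_least_m_in_box(0.9, 11, 9))
    print(lower_bound_p_if_all_in_box(11))     # about 0.76
```

For example, with n = 11 and every sample inside the box, p is bounded below by roughly 0.76 at the 95% level, which is one way to quantify "p is rather large".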
Without developing here the straightforward statistic for testing confidence in a given number k or fewer of failures of the sample to occupy a chosen box, we present the results for our two groups using the same box in each group. That every one of the n = 11 samples of the trained group (Figures 9-11) lands in the box shown suggests that p is rather large and close to 1. In contrast, in the untrained group (Figures 6-8) we see several, and often many, samples lying outside the box, which argues against this sample coming from a similar distribution ρ with the same large p.
In all axes, the untrained runners are more scattered than the trained runners, whose scatter plots are quite tightly clustered. In the anterior-posterior axis in particular, we do not see a significant difference between trained runners and untrained runners in the Karhunen Loeve analysis. There is a large variance in the untrained runners. Although the responses appear quite different, we proceed with a more rigorous statistical hypothesis test. We also provide scatter plots (Figures 6-11) of these modes for the runners by axis, obtained via Karhunen Loeve analysis followed by singular value decomposition. Some details of the theory of Karhunen Loeve analysis as applicable in this context are provided in the Materials and Methods section; for complete details the reader is referred to [4], [21].

Table 3. Mean vector results were significant when comparing untrained vs trained runners in the vertical axis, and not significant when comparing untrained vs trained runners in either the mediolateral or the anterior-posterior axis.

Means comparison for untrained vs trained. Mean vector results were significant in vertical, mediolateral and anterior-posterior. When no significant difference is observed for the Hotelling T² test, it is appropriate to perform a simple means comparison between groups to determine the average difference in CE. P values of simple means comparisons are shown. For untrained vs trained in the mediolateral and anterior-posterior axes, p values were too small to measure and are therefore reported as 0. Note that for the vertical axis a significant Hotelling T² test result was observed (Table 3), so simple mean differences in this case should be viewed with caution.
Discussion. We present results of a comparison of the non-linear dynamics of running gait between trained and untrained runners under non-stationary conditions, using the regularity statistic control entropy. We hypothesized that 1) Differences would exist in CE of acceleration between axes in both trained and untrained runners.
2) Trained runners would exhibit higher CE values than untrained runners at comparable speeds.
3) Decreases in CE from peak values would occur at higher speeds in trained vs untrained runners.
To test these hypotheses, it was necessary to develop new statistical tools to complement the CE analysis. A unique, beneficial characteristic of CE compared with other regularity statistics is that it mitigates the requirement for stationarity [4]. That said, because of the dynamically changing characteristics of the non-stationary systems examined in this study, traditional statistical approaches to means comparisons were not appropriate. We therefore developed code to apply the Hotelling T² test, which is appropriate in our setting, to the multivariate data. This test was used to rigorously assess the similarity of the shapes of the Karhunen Loeve transforms, i.e., whether the dynamics of the systems behave similarly or differently. With these tools we identified within-group and between-group differences in patterns of regularity/complexity in untrained and trained runners. It was anticipated that differences would be observed in CE profiles between axes in both trained and untrained runners.
In our previous work [22], we determined that differences in CE profile existed between axes in highly trained runners. In the current study, this was confirmed in both trained and untrained runners. It has been proposed that constraints on human motion can be categorized as organismic, environmental or task oriented [27]. The predominant constraints in the vertical axis are likely due to gravity (environmental) and, to some extent, the energy required to overcome it (organismic). Support for this may come from the apparent association between the reduction in RMS of vertical accelerations and speed, particularly in untrained runners [21]. In that work, we observed that RMS of vertical accelerations was lower in trained than in untrained runners. In the current study, the CE response in vertical was significantly different between trained and untrained runners (Fig 3). Since CE is indicative of control constraints, and trained runners exhibit lower accelerations in the vertical plane [21], CE appears able to distinguish the reduced constraints in trained versus untrained runners in this parameter. Constraints in the anterior-posterior axis, on the other hand, are likely more organismic or possibly task oriented in nature. When running on a treadmill, environmental constraints in the anterior-posterior axis are minimal, as there is no wind. It is therefore more plausible that the ability to increase stride length and/or frequency in proportion to speed is constrained [11]. Such constraints could be biomechanical and/or metabolic in nature. For example, the decline in CE in the anterior-posterior axis appears to occur much later, and at higher speeds, in the trained compared to the untrained runners (Fig 5). This apparent difference was not statistically significant by the Hotelling T² test, but it is intriguing to consider that this decline in CE in this axis might be related to the lactate threshold, an important metabolic determinant of performance in trained endurance athletes [37]. Since we did not investigate the lactate threshold in this study, further work will be necessary to determine this with certainty.

Figure caption (scatter plots, Figures 6-11; beginning lost): … (2)-(5), tight clustering within the boxes shown indicates a strongly homogeneous group, measured here within the dominant modal description of the singular value decomposition, i.e., the first two modes δa₁ and δa₂ of the CE response profile of the labelled accelerometry axis. In this presentation it is immediately apparent that the trained group presents a highly homogeneous response, whereas the untrained group does not.
In the current study, metabolic fitness levels differed significantly between trained and untrained runners, as indicated by RER at a given speed (lower in trained than in untrained runners), aside from cardiorespiratory fitness as represented by VO2max (Table 5); this discrepancy could suggest differences in metabolic constraints between groups. To that end, we have also compared highly trained runners with triathletes who are less trained at running yet have equivalent metabolic fitness. In these populations, accelerations in the anterior-posterior axis are still significantly different (McGregor et al., unpublished data), which argues against metabolic constraints on accelerations in the plane of progression. It may therefore be that neuromuscular recruitment patterns learned by highly trained runners through practice reduce accelerations in the plane of progression and allow such runners to run faster than less trained runners despite comparable fitness. In the current study, although the CE response in the anterior-posterior axis appeared qualitatively different between trained and untrained runners, this difference did not reach statistical significance (p = 0.1; Table 3). Still, a means comparison revealed that CE was higher in trained than in untrained runners, again indicating reduced constraints in trained runners. The apparently different CE response in the plane of progression, together with the significantly higher average CE in trained runners, supports the notion that constraints in the anterior-posterior axis are lower in trained than in untrained runners, and that this contributes to the ability to run faster. Further work in comparably fit populations will be required to answer this point unequivocally.
An alternative explanation relates to the role of executive function (EF) in gait. Although much attention has been paid to the role of executive function in walking (reviewed in [36]), there is a paucity of data in this regard for running, in trained or untrained individuals. For walking, though, numerous investigators have reported altered gait characteristics (e.g., stride frequency/length, speed) when a cognitive task is added during walking [2], [3], [33]. Since untrained runners are less practiced at running, it may be that trained runners are more skilled at the task and devote less executive function to it. This also agrees with the view of Davids et al. [8], who have argued that individuals skilled at a given task generally exhibit higher levels of variability (both linear and non-linear), as they are freer to explore options for solving Bernstein's problem, that of reducing the degrees of freedom in a highly complex system. So, in the current work, since trained runners are highly practiced at the task of running, they are in general less constrained and exhibit higher CE as a result. In contrast, as Davids et al. point out [8], unskilled practitioners solve Bernstein's problem of reducing degrees of freedom by rigidly restricting segmental movements. This results in less variability in general, and also in lower CE due to the increased constraints of this approach. Hence, trained runners are less constrained and in general exhibit higher CE than untrained runners. Recently, Nakayama et al. [39] investigated the variability of stride interval in trained vs untrained runners. In that work, when compared at equivalent speeds, trained runners exhibited a significantly lower coefficient of variation and an apparently, but not statistically significantly, lower alpha (α) exponent of detrended fluctuation analysis (DFA). It is difficult to compare these results directly because of the different technical approaches, but the tendency for trained runners to exhibit lower α across all speeds is conceptually similar to our results: α indicates long-range correlations, and a lower α would be associated with reduced constraints, as α is lowest when running at the preferred running speed [40]. More work, possibly adding cognitive tasks while running, will be required to elucidate the nature of the constraints in the anterior-posterior axis, but trained runners appear to be less constrained than untrained runners in this parameter.
Another novel aspect of this study, which provides intriguing insight into differences in variability between trained and untrained runners, can be found in the scatter plots of the dominant modes of Karhunen Loeve analysis (Figures 6-11). In the untrained runners (Figures 6-8), the dominant modes exhibited quite a broad scatter, particularly in the mediolateral and anterior-posterior axes. In contrast, the trained runners (Figures 9-11) exhibit a much tighter scatter in all axes. In particular, the contrast in scatter between untrained and trained runners in the mediolateral (Figures 7 and 10, respectively) and anterior-posterior axes (Figures 8 and 11, respectively) is quite striking. The reason for this observation is not clear, but it may contribute to the lack of statistical significance between trained and untrained runners in the anterior-posterior axis (Figure 5). For example, the dominant modes of the Karhunen Loeve transforms in the anterior-posterior axis appear quite distinctive, more so than in the other axes, yet the T² test did not find a significant difference between groups. The large spread of the scatter for untrained runners in Figure 8 indicates a high variance that would confound the statistical test; this was not apparent in trained runners for the same axis (Figure 11). This may provide an interesting avenue for future studies: if it reflects heterogeneity (untrained runners) or, conversely, homogeneity (trained runners) of CE responses within groups, it may provide additional insight in experimental interventions or in comparisons between different clinical groups.
The observation of differing CE responses between individual axes of acceleration at the same time/speed may be of value. Because CE measures system constraint and the system controller's effort to maintain the current state of the system or to respond to perturbations [4], this tool may prove useful in the clinical context. If certain pathologies impose constraints on gait, these may be more apparent when contrasting the CE of accelerations in individual axes within an individual, or when comparing against putative normative data. This could serve as a prospective diagnostic tool to identify pathologies that may not be apparent with other approaches (e.g., variability of stride interval). Typically, in contrast to some other biological parameters (e.g., cardiovascular), greater non-linear measures of variability in gait patterns are associated with diseased states or poor health outcomes [14]. The reason for this discrepancy is not clear, and it will be necessary to perform studies in clinical populations, in contrast to healthy and/or highly trained individuals, under similar experimental conditions to determine whether this generalization applies to CE analysis of gait parameters. This study is the first to compare the CE of accelerations of running gait between trained and untrained runners under non-stationary conditions. Further, we apply the Hotelling T² test to rigorously assess the similarity of the shapes of the Karhunen Loeve transforms, i.e., whether the dynamics of the systems behave similarly or differently. Using this approach, differences were observed between axes within groups, as well as by axis between groups. These differences could be used to identify characteristic constraints in clinical populations and to assist in treatment/rehabilitation. Additionally, these distinctions could be used to determine optimized patterns of complexity that could serve as models for the development of robotic locomotor systems.

Materials and Methods.
Ethics Statement. Subjects gave written informed consent to take part in this study, which was approved by the Eastern Michigan University, College of Health and Human Services - Human Subjects Review Committee. All procedures were conducted in accordance with the principles expressed in the Declaration of Helsinki.
Subjects. Fourteen subjects, consisting of seven male NCAA Intercollegiate Division 1 distance runners (trained) and seven recreationally active college students considered untrained for running (untrained runners) ( … ) high resolution accelerometers, walking and running speed, which are presented elsewhere [22]. Subjects reported to the laboratory on the day of testing having refrained from strenuous exercise, alcohol, and caffeine for the preceding 24 hours and having fasted for 3 hr.
Incremental exercise tests to volitional exhaustion. All subjects performed a standardized pre-run phase that consisted of walking, initially at 2 km/h, with speed increased by 2 km/h every 2 min. Subjects began running at 8 km/h and continued until volitional exhaustion. During tests, metabolic data were collected breath by breath using portable open circuit spirometry (Jaeger Oxycon Mobile, CA). VO2max was determined as the highest 30 s average of the test. From these data, maximal aerobic speed (the lowest speed eliciting VO2max) and maximal speed (the maximal speed attained before exhaustion) were determined.
Metabolic Measurements. Indirect calorimetry was used to collect breath-by-breath measurements of VO2 and VCO2 using an electrochemical oxygen measuring cell (SBx) in an Oxycon Mobile (Cardinal Health, OH), averaged over 5 s. The oxygen and carbon dioxide sensors were calibrated before each test for ambient conditions (temperature and barometric pressure), volume, and gas content against precision-analyzed gas mixtures.
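The averaging described above can be sketched as follows. This is a minimal illustration, assuming that the 5 s averaging is a simple mean of the breaths falling in each consecutive 5 s bin and that the "highest 30 s average" is a rolling window over contiguous bins; the text specifies neither detail, and the function names are hypothetical.

```python
def bin_breaths(times_s, vo2, bin_s=5.0):
    """Average breath-by-breath VO2 values into consecutive bins of bin_s seconds.

    Returns a list of (bin_start_s, mean_vo2); bins with no breaths are omitted.
    """
    bins = {}
    for t, v in zip(times_s, vo2):
        bins.setdefault(int(t // bin_s), []).append(v)
    return [(k * bin_s, sum(vals) / len(vals)) for k, vals in sorted(bins.items())]


def highest_30s_average(binned_vo2, bin_s=5.0, window_s=30.0):
    """Highest rolling 30 s average over contiguous 5 s bins (assumed VO2max criterion)."""
    width = int(round(window_s / bin_s))          # 6 bins of 5 s each
    if len(binned_vo2) < width:
        raise ValueError("test shorter than the averaging window")
    return max(sum(binned_vo2[i:i + width]) / width
               for i in range(len(binned_vo2) - width + 1))


if __name__ == "__main__":
    binned = bin_breaths([0.5, 2.0, 5.5, 9.0, 10.1], [1.0, 3.0, 2.0, 4.0, 10.0])
    print(binned)                                  # [(0.0, 2.0), (5.0, 3.0), (10.0, 10.0)]
    print(highest_30s_average([1.0] * 6 + [2.0] * 6))
```

A rolling window is used here because it finds the best 30 s span wherever it falls; a fixed, non-overlapping 30 s grid would be an equally plausible reading of the protocol.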
Accelerometry. The high resolution accelerometer device consisted of a triaxial MEMS accelerometer, model ADXL210 (G-link Wireless Accelerometer Node 10g, Microstrain, Inc., VT). The device was mounted on a semi-rigid strap and placed superficial to the L3/L4 vertebrae on the posterior side of the body to approximate the subject's center of mass [25]. It was additionally secured with elastic athletic tape to remove extraneous movement of the device not associated with locomotion. Accelerations in g were streamed in real time by telemetry to a base station at a frequency of 617 Hz. For comparison between groups, data were compared only for speeds between 8 km/h and 16 km/h, stages that all individuals in both groups could complete.
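Restricting the 617 Hz record to the 8-16 km/h stages amounts to simple index arithmetic. The sketch below assumes that the 2 km/h every 2 min increments stated for the pre-run phase continue through the running phase, that the test starts at 2 km/h at t = 0, and that there is no pause at the walk-run transition; all of these, and the function names, are assumptions for illustration.

```python
FS_HZ = 617  # accelerometer sampling rate stated in the text


def stage_window_s(speed_kmh, first_kmh=2, step_kmh=2, stage_s=120):
    """(start_s, end_s) of the stage at speed_kmh, assuming 2 km/h steps every 2 min from t = 0."""
    k, rem = divmod(speed_kmh - first_kmh, step_kmh)
    if rem != 0 or k < 0:
        raise ValueError("speed is not on the assumed stage grid")
    return k * stage_s, (k + 1) * stage_s


def comparison_sample_range(lo_kmh=8, hi_kmh=16, fs=FS_HZ):
    """Sample indices [start, stop) covering every stage from lo_kmh through hi_kmh."""
    start_s, _ = stage_window_s(lo_kmh)
    _, stop_s = stage_window_s(hi_kmh)
    return start_s * fs, stop_s * fs


if __name__ == "__main__":
    print(stage_window_s(8), stage_window_s(16))   # (360, 480) (840, 960)
    print(comparison_sample_range())               # (222120, 592320)
```

Under these assumptions, the 8 km/h stage begins 6 min into the test and the 16 km/h stage ends at 16 min, so the compared segment spans 600 s, or 370,200 samples per axis.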
Control Entropy. We begin by describing some popular forms of entropy found in the literature. From an information theory standpoint, the Shannon entropy [7], [34] is defined as

$$H = -\sum_i p_i \ln p_i,$$

where $p_i$ is the probability of being in state $i$. This motivates the so-called Renyi entropy [30],

$$H_q = \frac{1}{1-q} \ln \sum_i p_i^q,$$

where the $p_i$ are the relative occupancy probabilities of an $m$-dimensional partition into uniformly sized hypercubes of side $r$, although in general one must take the supremum over all possible partitions and their refinements.
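The two definitions above can be evaluated directly once occupancy probabilities are available. This is a minimal sketch: it uses natural logarithms, drops empty states, and, as a hypothetical helper, estimates the $p_i$ by counting points in a fixed grid of hypercubes of side $r$ (a single partition, not the supremum over refinements mentioned above).

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """H = -sum_i p_i ln p_i (natural log); states with p_i = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def renyi_entropy(probs, q):
    """H_q = ln(sum_i p_i^q) / (1 - q); the q -> 1 limit recovers the Shannon entropy."""
    if math.isclose(q, 1.0):
        return shannon_entropy(probs)
    return math.log(sum(p ** q for p in probs if p > 0)) / (1.0 - q)


def occupancy_probabilities(points, r):
    """Hypothetical helper: relative occupancy p_i of a grid of hypercubes of side r."""
    cells = Counter(tuple(math.floor(x / r) for x in pt) for pt in points)
    total = sum(cells.values())
    return [count / total for count in cells.values()]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (1.6, 1.7)]
    probs = occupancy_probabilities(pts, r=1.0)   # cells (0,0), (0,0), (1,0), (1,1)
    # H_q is non-increasing in q, so H_2 <= H_1 (Shannon)
    print(shannon_entropy(probs), renyi_entropy(probs, 2))
```

For a uniform distribution over k states every $H_q$ equals $\ln k$; differences between orders q appear only for non-uniform occupancy.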
Of particular importance in the current context is the Kolmogorov-Sinai (KS) entropy, often called the measure theoretic entropy [13]. Because it is difficult to estimate, we often consider the so-called correlation entropy [15] instead, which is often preferred for data because it is quick to compute. Much attention has recently been drawn to the approximate entropy (AE) of Pincus [28], [29], and a modification of it called sample entropy (SampE), designed to remove the self-matching bias inherent in approximate entropy, has also been considered [31].
The aforementioned methods rest on an inherent assumption of stationarity. Estimators such as approximate entropy and sample entropy may be regarded as statistics of the finite sample without requiring stationarity, but even they require sufficient recurrence for the computed values to be interpreted as estimates of transition probabilities. In [4] we developed a regularity statistic and coined the term control entropy (CE). Our aim was to construct a tool that is entropy-like but applicable to non-stationary time series data. Non-stationarity is observed in a large number of real-world processes, which merits the use of a tool such as CE. Furthermore, part of our goal was to understand parameter changes within the system as a way of detecting developing problems, or to serve as a warning before system failure, and CE is well suited to this. We now recap certain essentials from [4].
Consider a data set $\{z_i\}_{i=1}^N$, a scalar time series from an ergodic process sampled on a uniform time grid. Form vectors of embedding dimension $m$ by delay embedding, $v_i = (z_i, z_{i-1}, \ldots, z_{i-m+1})$, with unit index delay. The correlation sum is defined as

$$C(m, r, T) = \frac{1}{N_{\text{pairs}}} \sum_{i} \sum_{j \ge i + T} \Theta\left(r - \lVert v_i - v_j \rVert\right).$$

Here $\Theta$ is the Heaviside function, $r$ is a parameter defining a neighbourhood, and $N_{\text{pairs}}$ is the total number of pairs of delay vectors. The integer parameter $T \ge 1$ is a Theiler window, used to mitigate the effects of near-time correlations in the data. We define

$$h(\{z_i\}, m, r, T) = \ln C(m, r, T) - \ln C(m + 1, r, T),$$

which leads to the development of sample entropy. Thus define [27]

$$SE(j + J, w, \{z_i\}_{i=1}^{n}, m, r, T) = h(\{z_i\}_{i=1+j}^{w+j}, m, r, T), \quad 0 \le j \le n - w, \qquad (11)$$

where $J$ represents time offsets. With these arguments, sample entropy represents an entropy assignment to each time window of the data set, associated with each time instant $J$. From the sample entropy of a signal $z_i$, we define the control entropy of the signal, Eq. (12), for $0 \le j \le n - w$. We adopt the SAX method [38]: here, $b$ is chosen to consist of $n$ symbols, and $x_i$ is mapped to $s_i$ according to an equipartition of Z-values from a normal model on the data set. We use the SAX symbolization in computing $CE_b$ according to Eq. (12), where $n$ is chosen to satisfy the saturation criterion described in [4]: if the measured $CE_b$ reaches $\ln(n)$ at any time $t$, then $n$ is assumed to be too small, the cause being overloading of the symbolization [38].
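The ingredients above (correlation sum with a Theiler window, $h$, its windowed form in Eq. (11), SAX-style symbolization, and the $\ln(n)$ saturation check) can be sketched as follows. Assumptions: the max norm is used for $\lVert \cdot \rVert$ (the text does not name a norm), $\Theta(0) = 0$, and both $C(m)$ and $C(m+1)$ are taken over the same vector end indices so that $N_{\text{pairs}}$ cancels. How Eq. (12) combines these into $CE_b$ is not reproduced here, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def correlation_sum(z, m, r, T, first):
    """C(m, r, T) over delay vectors ending at indices i >= first.

    Counts pairs (a, b) with b >= a + T (Theiler window) whose max-norm distance is < r.
    Slices are in forward order, which does not change the distance.
    """
    vecs = [z[i - m + 1:i + 1] for i in range(first, len(z))]
    close = pairs = 0
    for a in range(len(vecs)):
        for b in range(a + T, len(vecs)):
            pairs += 1
            if max(abs(x - y) for x, y in zip(vecs[a], vecs[b])) < r:
                close += 1
    return close / pairs if pairs else 0.0


def sample_entropy(z, m, r, T=1):
    """h = ln C(m, r, T) - ln C(m+1, r, T), both over the same end indices i >= m."""
    c_m = correlation_sum(z, m, r, T, first=m)
    c_m1 = correlation_sum(z, m + 1, r, T, first=m)
    if c_m == 0.0 or c_m1 == 0.0:
        return math.inf  # no matches at tolerance r: the estimate is undefined
    return math.log(c_m) - math.log(c_m1)


def windowed_sample_entropy(z, w, m, r, T=1):
    """Eq. (11)-style assignment: h on each window z[j:j+w] for 0 <= j <= n - w."""
    return [sample_entropy(z[j:j + w], m, r, T) for j in range(len(z) - w + 1)]


def sax_symbols(x, n_symbols):
    """SAX-style map x_i -> s_i in {0, ..., n_symbols-1}: z-score each value, then
    cut at the equiprobable breakpoints of a standard normal distribution."""
    mu, sd = mean(x), pstdev(x)
    if sd == 0:
        raise ValueError("constant signal cannot be z-scored")
    cuts = [NormalDist().inv_cdf(k / n_symbols) for k in range(1, n_symbols)]
    return [sum((xi - mu) / sd > c for c in cuts) for xi in x]


def saturated(ce_value, n_symbols, tol=1e-9):
    """Saturation check from the text: CE_b reaching ln(n) suggests n is too small."""
    return ce_value >= math.log(n_symbols) - tol


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.random() for _ in range(200)]
    periodic = [float(i % 2) for i in range(200)]
    print(sample_entropy(noise, 2, 0.2), sample_entropy(periodic, 2, 0.5))
    print(sax_symbols([0.1, 2.5, -1.0, 0.7], 4))   # [1, 3, 0, 2]
```

The pure-Python double loop is quadratic in the window length; it is meant to make the definitions concrete, not to process full 617 Hz recordings efficiently.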
Karhunen Loeve Analysis. We now turn briefly to the KL analysis used to conduct a pattern analysis of the response profiles; it is detailed in [4]. Principal Component Analysis (PCA), also known as Proper Orthogonal Decomposition (POD) or the Karhunen-Loeve Transform, has a long-standing history in the fields of partial differential equations and infinite dimensional dynamical systems. The analysis of PDEs most often involves truncating the equation under consideration, making a priori estimates on this truncation, and then extracting the right subsequences to answer questions regarding well-posedness and regularity. This follows via standard functional analysis theory [32]. The analysis of many time series can also be cast into this form. In [4] we brought some of these techniques into our work. We first discuss the Singular Value Decomposition (SVD) [12]. Consider a population of $p$ members, each of which presents a signal, giving a theoretical data set $\{z_i(t)\}$, $1 \le i \le p$. In practice the signals are discretely sampled in time, $\{z_i(t_j)\}$, $1 \le j \le N$, $1 \le i \le p$, and may be considered as a data array

$$Z_{p,N} = \big[ z_i(t_j) \big]_{1 \le i \le p,\ 1 \le j \le N}. \qquad (13)$$

Although we have written this in general terms, here each $z_i(t)$ is always the CE time series processed from the $i$th sampled member. We then subtract the mean from the data, denoting

$$\hat{Z}_{i,j} = z_i(t_j) - \frac{1}{N} \sum_{j'=1}^{N} z_i(t_{j'}). \qquad (14)$$

To compare with common spatio-temporal notation, $w(i, t) = \hat{Z}_{i,t}$, where, due to sampling, $t$ is one of $t_j$, $1 \le j \le N$. The K-L eigenmodes are then the eigenfunctions of the autocorrelation matrix

$$K_{j,j'} = \langle \hat{Z}_{i,j}\, \hat{Z}_{i,j'} \rangle = \frac{1}{p} \sum_{i=1}^{p} \hat{Z}_{i,j}\, \hat{Z}_{i,j'}, \qquad (15)$$

which denotes products at each time pairing $t_j$ and $t_{j'}$, averaged across the sample indexed by $i$; the brackets $\langle \cdot \rangle$ denote this averaging across the sample set. The spectral decomposition theorem [12] tells us that, since $K$ is symmetric (and positive semi-definite, so its eigenvalues are non-negative), its eigenfunctions are orthogonal, and they represent an optimal basis in population average. Therefore, as in Karhunen Loeve analysis, we write

$$w(i, t) = \sum_n a_n(i)\, \phi_n(t). \qquad (16)$$

Here $\phi_n(t)$ denotes the eigenfunction, a function of time, and $a_n(i)$ is the coefficient of projection for each sample. The relevance of this modal analysis is that the modes $\phi_n(t)$ are known to be orthogonal and optimal in average; that is, the power spectrum decays fastest in average when compared with the power spectrum developed from any other basis set. See [12] for details. An eigenvalue may be written as

$$\lambda_n = \langle (w(i, t), \phi_n(t))^2 \rangle, \qquad (17)$$

where round parentheses $(\cdot, \cdot)$ denote an inner product with respect to integration in time, $(f(t), g(t)) = \int_0^T f(t) g(t)\, dt$, and $\langle \cdot \rangle$ denotes the average across the samples.
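Equations (13)-(17) map onto a few lines of NumPy. This sketch follows Eq. (14) as written, removing each member's time mean (subtracting the ensemble mean across members is a common alternative), and uses a plain discrete sum in place of the time integral, i.e. no $dt$ weighting. The function name `kl_modes` is hypothetical.

```python
import numpy as np


def kl_modes(Z):
    """KL (POD) modes of a (p, N) array whose row i is the CE time series of member i.

    Returns eigenvalues lam (descending), orthonormal modes phi (as columns),
    and projection coefficients a with a[i, n] = a_n(i).
    """
    Z = np.asarray(Z, dtype=float)
    p = Z.shape[0]
    Zhat = Z - Z.mean(axis=1, keepdims=True)   # Eq. (14): remove each member's time mean
    K = Zhat.T @ Zhat / p                      # Eq. (15): K[j, j'] = <Zhat_ij Zhat_ij'>
    lam, phi = np.linalg.eigh(K)               # K symmetric -> real eigenvalues, orthonormal vectors
    order = np.argsort(lam)[::-1]              # sort modes by decreasing eigenvalue
    lam, phi = lam[order], phi[:, order]
    a = Zhat @ phi                             # Eq. (16) coefficients: discrete (w(i, .), phi_n)
    return lam, phi, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 40)
    Z = np.outer(rng.normal(size=7), np.sin(2 * np.pi * t)) + 0.05 * rng.normal(size=(7, 40))
    lam, phi, a = kl_modes(Z)
    print(lam[:3] / lam.sum())   # normalized power spectrum: first mode dominates
    da1, da2 = a[:, 0], a[:, 1]  # coordinates for a (delta a_1, delta a_2) scatter plot
```

Consistent with Eq. (17), each eigenvalue equals the sample average of the squared coefficients, `lam[n] == (a[:, n] ** 2).mean()` up to rounding, and `a @ phi.T` reconstructs the mean-removed data exactly.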
Statistical Hypothesis Testing. The singular value decomposition method explained previously essentially treats the sample of data as an ellipsoidal cloud in the parameter space, which amounts to a normality assumption. The lengths of the principal axes of this ellipsoid are proportional to the corresponding singular values $s_n$. If we further assume that the data are normally distributed and i.i.d., the singular value decomposition yields the least-squares solution to a parametric fit describing each individual signal as a linear combination of the singular vectors. The corresponding equation, Eq (18), describes level curves of the $\chi^2$ distribution [20]. Consider Eq (18) for projections onto the few major axes (in our case, the first two). When the data are tightly correlated, they are all contained within the region bounded by an ellipse, and data points lying outside this ellipse are identified as outliers. This method was adopted in [4].
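A minimal sketch of the ellipse criterion follows, under stated assumptions. The ellipse is taken to be a level set of the squared Mahalanobis distance of the deviations $(\delta a_1, \delta a_2)$ from their sample mean, computed with the sample covariance. A point is flagged when this distance exceeds the $(1-\alpha)$ quantile of the $\chi^2$ distribution with two degrees of freedom. That quantile has the closed form $-2\ln\alpha$, about 5.99 at $\alpha = 0.05$.

This is only an approximation, because the covariance is estimated from the same small sample. It may also differ in scaling from Eq (18) and from the procedure used in [4]. The function name `chi2_ellipse_outliers` is hypothetical.

```python
import numpy as np

def chi2_ellipse_outliers(coeffs, alpha=0.05):
    """Flag samples whose first two modal coefficients lie outside a chi^2 ellipse.

    coeffs: p x 2 array, row i = (a_1(i), a_2(i)) for sample i.
    Returns the squared Mahalanobis distances and a boolean outlier mask.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    delta = coeffs - coeffs.mean(axis=0)                   # deviations (delta a_1, delta a_2)
    cov_inv = np.linalg.inv(np.cov(delta, rowvar=False))   # inverse of the 2 x 2 sample covariance
    d2 = np.einsum("ij,jk,ik->i", delta, cov_inv, delta)   # squared Mahalanobis distance per sample
    threshold = -2.0 * np.log(alpha)                       # chi^2 (2 dof) quantile: P(chi2_2 > x) = exp(-x/2)
    return d2, d2 > threshold

if __name__ == "__main__":
    angles = np.arange(8) * np.pi / 4
    pts = np.vstack([np.column_stack([np.cos(angles), np.sin(angles)]), [8.0, 0.0]])
    d2, flags = chi2_ellipse_outliers(pts)
    print(np.round(d2, 2), flags)
```

With very few samples, a single extreme point inflates the covariance estimate and can mask itself, so this rule is best read as a visual screening aid.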
Our current goal is to adopt a formal statistical approach to continue the agenda of [4]. We would now like to construct something stronger than the "ellipsoid" approach that the proper orthogonal decomposition provides: we want to state, with a given statistical confidence, whether and how two groups of runners differ. Since we are considering the first two modes jointly, we resort to multivariate statistical analysis.
A statistical hypothesis test is a means of making a statistical decision from experimental data [20]. We say a result is statistically significant if it is unlikely to have occurred by chance [20]. Every hypothesis test begins by formulating a hypothesis, that is, deciding what we are trying to test for. In statistical language this is the so-called alternative hypothesis, $H_1$. Its antithesis is the null hypothesis, $H_0$, which states that our initial claim is wrong. The test produces a test statistic, from which a probability, commonly referred to as the p-value, is computed. The null hypothesis is rejected if the p-value falls below a chosen threshold, the significance level $\alpha$. If the p-value is above this critical level, we cannot reject the null hypothesis and nothing significant can be concluded from the test. Thus, as modellers, our role is to formulate a hypothesis, devise the right test, and then carry out the procedure above.
Under the assumption of normality we are dealing with a projected data cloud, and we choose to use Hotelling's $T^2$ test, a multivariate version of Student's t-test. Student's t distribution is a continuous probability distribution that arises when one estimates the mean of a normally distributed population. It is used when the sample size is small and the population standard deviation is unknown [20]. The distribution has the density function

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$

where $\nu$ is the number of degrees of freedom and $\Gamma$ is the standard gamma function.
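For reference, the density above can be evaluated directly. The sketch below uses only the standard library; the function name `student_t_pdf` is hypothetical. It works with `math.lgamma` to avoid overflow of $\Gamma$ for large $\nu$. As checks, for $\nu = 1$ the density reduces to the Cauchy density $1/(\pi(1+t^2))$, and as $\nu$ grows it approaches the standard normal density.

```python
import math

def student_t_pdf(t, nu):
    """Density f(t) of Student's t distribution with nu > 0 degrees of freedom."""
    # log of Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)), via lgamma to avoid overflow
    log_norm = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_norm) * (1.0 + t * t / nu) ** (-(nu + 1) / 2)

if __name__ == "__main__":
    for nu in (1, 5, 30):
        print(nu, student_t_pdf(0.0, nu))
```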
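Recall the univariate t-test for the mean of a sample $X = \{x_1, x_2, \ldots, x_n\}$ with sample mean $\bar{x}$ and sample standard deviation $s$. The variable

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has a t distribution with $n - 1$ degrees of freedom if $X$ is normally distributed. Suppose we want to test a hypothesis such as $\mu = \mu_0$ (or, later, that the means of two groups are equal). Then we would use

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$

so that we would obtain

$$t^2 = n\,(\bar{x} - \mu_0)\,(s^2)^{-1}\,(\bar{x} - \mu_0).$$

In the event that we generalise to $p$ variables, we obtain

$$T^2 = n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^{T}\, S^{-1}\, (\bar{\mathbf{x}} - \boldsymbol{\mu}_0).$$

Here $S$ is the sample covariance matrix and $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$ is the sample mean vector. It is known that when $\boldsymbol{\mu} = \boldsymbol{\mu}_0$ we have

$$\frac{n-p}{p(n-1)}\,T^2 \sim F_{p,\,n-p},$$

where $F$ is the standard F distribution. Thus the hypothesis $\boldsymbol{\mu} = \boldsymbol{\mu}_0$ can be tested by taking a single $p$-variate sample of size $n$, computing $T^2$, and comparing it to

$$\frac{p(n-1)}{n-p}\,F_{p,\,n-p}(\alpha)$$

for a suitable choice of $\alpha$, where $F_{p,n-p}(\alpha)$ denotes the upper-$\alpha$ critical value. In our case we have to extend this to the comparison of two groups. We now clarify the methodology. Instead of single observations $x$, we now have vector observations, obtained by applying the proper orthogonal decomposition routine to the CE signal computed from the raw data of each runner.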
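Let $\mathbf{X}_1 = (x_{11}, x_{12})^{T}$ represent a particular runner in, say, the first group, where $x_{11}$ and $x_{12}$ are the projection coefficients of that runner's CE response onto the first two modes. Similarly, there are $\mathbf{X}_2, \mathbf{X}_3, \ldots, \mathbf{X}_n$ for the remaining runners of the first group and $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n$ for the second group. The scalar population means are replaced by population mean vectors: $\boldsymbol{\mu}_1$ is the population mean vector for the first group and $\boldsymbol{\mu}_2$ is the population mean vector for the second group. We formulate our goal as follows.

Goal: We are interested in testing the null hypothesis that the population mean vectors for the two groups of runners are equal, against the alternative hypothesis that these mean vectors are not equal. Thus we are testing

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{against} \quad H_1: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2. \qquad (28)$$

This can be carried out via the following procedure. Under the null hypothesis the two mean vectors are equal element by element, so we look at the differences between the observations. We define

$$\mathbf{Z}_i = \mathbf{X}_i - \mathbf{Y}_i, \qquad 1 \le i \le n,$$

and we also define the vector

$$\boldsymbol{\mu}_Z = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2.$$

We have thus converted our original problem into testing the null hypothesis that the population mean vector $\boldsymbol{\mu}_Z = \mathbf{0}$:

$$H_0: \boldsymbol{\mu}_Z = \mathbf{0} \quad \text{against} \quad H_1: \boldsymbol{\mu}_Z \neq \mathbf{0}. \qquad (31)$$

This hypothesis is tested using the paired Hotelling's $T^2$ test. We define the sample mean vector

$$\bar{\mathbf{z}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{Z}_i,$$

and let $S_Z$ denote the sample variance-covariance matrix of the vectors $\mathbf{Z}_i$,

$$S_Z = \frac{1}{n-1}\sum_{i=1}^{n} (\mathbf{Z}_i - \bar{\mathbf{z}})(\mathbf{Z}_i - \bar{\mathbf{z}})^{T}.$$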
Several assumptions are required for Hotelling's $T^2$ test to be carried out. We assume normality and independence, that is, the $\mathbf{Z}_i$ are independent and multivariate normally distributed. The paired Hotelling's $T^2$ test statistic is given by

$$T^2 = n\,\bar{\mathbf{z}}^{T} S_Z^{-1}\,\bar{\mathbf{z}}. \qquad (34)$$

This is a function of the sample size $n$, the sample mean vector $\bar{\mathbf{z}}$, and the inverse of the variance-covariance matrix $S_Z$.
We next define an F-statistic:

$$F = \frac{n-p}{p(n-1)}\,T^2 \sim F_{p,\,n-p}. \qquad (35)$$

We reject the null hypothesis at level $\alpha$ if the F-value exceeds the critical value of the F distribution with $p$ and $n-p$ degrees of freedom at level $\alpha$. For our purposes (as in most cases) $\alpha$ is set at 0.05. The computations above were carried out in MATLAB. We developed code to symbolise the raw data, from which the CE is first calculated. This is passed into a second routine, which performs the proper orthogonal decomposition and yields the dominant modes for the runners of the groups in question. These are finally passed, pairwise, into a routine that carries out the multivariate Hotelling $T^2$ test. This yields the statistics of interest, which allow us to compare the groups.
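The testing procedure of Eqs. (28)-(35) can be sketched in a few lines of NumPy and SciPy. This is an illustration under stated assumptions, not the MATLAB routine described above. The function names `hotelling_one_sample` and `paired_hotelling` are hypothetical. Each row of `x` and `y` is assumed to hold the first $p$ modal coefficients (here $p = 2$) for one member of a matched pair.

The paired test is implemented as the one-sample test of $\mathbf{Z}_i = \mathbf{X}_i - \mathbf{Y}_i$ against $\boldsymbol{\mu}_Z = \mathbf{0}$. It applies the conversion $F = \frac{n-p}{p(n-1)}T^2$, takes the upper-tail probability from `scipy.stats.f.sf`, and rejects at $\alpha = 0.05$ by default.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(x, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0 for an n x p sample x."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    if n <= p:
        raise ValueError("need more observations than variables (n > p)")
    diff = x.mean(axis=0) - np.asarray(mu0, dtype=float)   # x_bar - mu0
    S = np.cov(x, rowvar=False).reshape(p, p)               # sample covariance, divisor n - 1
    t2 = n * diff @ np.linalg.solve(S, diff)                # T^2 = n (x_bar - mu0)^T S^{-1} (x_bar - mu0)
    f_stat = (n - p) / (p * (n - 1)) * t2                   # F statistic, Eq. (35)
    p_value = stats.f.sf(f_stat, p, n - p)                  # upper-tail probability of F(p, n - p)
    return t2, f_stat, p_value, p_value < alpha             # last entry: reject H0 at level alpha?

def paired_hotelling(x, y, alpha=0.05):
    """Paired Hotelling T^2 test of H0: mu_1 = mu_2.

    Row i of x and row i of y are the paired observations X_i and Y_i
    (e.g. the first two KL coefficients of two CE responses).
    """
    z = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)  # Z_i = X_i - Y_i
    return hotelling_one_sample(z, np.zeros(z.shape[1]), alpha)   # test mu_Z = 0

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(12, 2))                                   # hypothetical coefficients, condition 1
    y = x - np.array([1.0, 0.5]) + 0.1 * rng.normal(size=(12, 2))  # shifted coefficients, condition 2
    t2, f_stat, p_value, reject = paired_hotelling(x, y)
    print(f"T^2 = {t2:.1f}, F = {f_stat:.1f}, p = {p_value:.2e}, reject H0: {reject}")
```

For $p = 1$ the procedure reduces to the familiar paired t-test, which is a convenient check of any implementation. The test requires more pairs than variables ($n > p$) and a nonsingular $S_Z$.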

Figure 1. Dominant modes of control entropy responses for untrained runners by axis. Control entropy (CE) of accelerations collected in high resolution at the approximate center of mass from untrained runners during an incremental test. Karhunen-Loeve transformation was performed to generate a dominant mode for the CE response in each of three axes (vertical = blue; mediolateral = red; anterior-posterior = green). Like symbols (*, ±) indicate significantly different shapes of dominant modes between axes.

Figure 2. Dominant modes of control entropy responses for trained runners by axis. Control entropy (CE) of accelerations collected in high resolution at the approximate center of mass from trained runners during an incremental test. Karhunen-Loeve transformation was performed to generate a dominant mode for the CE response in each of three axes (vertical = blue; mediolateral = red; anterior-posterior = green). Like symbols (*, ±) indicate significantly different shapes of dominant modes between axes.

Figure 3. Dominant modes of Karhunen-Loeve transformations generated from control entropy (CE) responses of accelerations, compared between trained and untrained runners. Accelerations were collected in high resolution at the approximate center of mass from trained (T) and untrained (UT) runners during an incremental test. The CE of accelerations was compared between groups for the vertical, mediolateral, and anterior-posterior axes at equivalent speeds (untrained = red, trained = blue). In Figure 3, * indicates a significantly different shape of dominant modes between trained and untrained runners.

Figure 4. Dominant mode of trained (blue) vs untrained (red) runners for the mediolateral axis. In Figure 4, # indicates significantly different mean CE values between dominant modes for trained and untrained runners.

Figure 5. Dominant mode of trained (blue) vs untrained (red) runners for the anterior-posterior axis. In Figure 5, # indicates significantly different mean CE values between dominant modes for trained and untrained runners.

Figure 6. Scatter plots for untrained runners in the vertical channel. Scatter plots of clustering in untrained runners (Figures 6-8) versus trained runners (Figures 9-11) in the vertical, mediolateral, and anterior-posterior channels. According to the discussion in equations (2)-(5), tight clustering within the boxes shown indicates a strongly homogeneous group. Here homogeneity is measured within the singular value decomposition dominant modal description, using the first two modal coefficients $\delta a_1$ and $\delta a_2$ of the CE response profile of the labelled accelerometry axis. In this presentation it is immediately apparent that the trained group presents a highly homogeneous response, whereas the untrained group does not.

Figure 7. Scatter plots for untrained runners in the mediolateral channel.

Figure 8. Scatter plots for untrained runners in the anterior-posterior channel.

Figure 9. Scatter plots for trained runners in the vertical channel.

Figure 10. Scatter plots for trained runners in the mediolateral channel.

Figure 11. Scatter plots for trained runners in the anterior-posterior channel.

Table 1. Statistical comparison of dominant modes of CE response of accelerometry in untrained runners between axes.

(Table 5) gave written informed consent to take part in this study, which was approved by the Eastern

Table 5. Physical characteristics of subjects.

Subjects completed two continuous, incremental exercise tests on a motorized treadmill (True ZX-9, St. Louis, MO), with at least 6 days separating the trials. Exercise tests were performed to volitional exhaustion while high-resolution triaxial accelerometry and open-circuit spirometry data were collected to determine relationships between metabolic parameters (e.g. VE, VO₂, VCO₂