Entropy determination based on the ordinal structure of a dynamical system

The ordinal approach to evaluating time series, which goes back to the innovative work of Bandt and Pompe, has increasingly established itself among other techniques of nonlinear time series analysis. In this paper, we summarize and generalize the theory of determining the Kolmogorov-Sinai entropy of a measure-preserving dynamical system via increasing sequences of order-generated partitions of the state space. Our main focus is on measuring processes without information loss. In particular, we consider the question of the minimal number of measurements that is necessary, in relation to the properties of a given dynamical system.


Introduction
Since the invention of permutation entropy by Bandt and Pompe [8] and the proof of its coincidence with Kolmogorov-Sinai entropy for piecewise monotone interval maps by Bandt et al. in [7], there has been increasing interest in considering time series and dynamical systems from the purely ordinal point of view (see Amigó [4]). The idea behind this viewpoint is that much of the information of a system is already contained in the ordinal patterns describing the ups and downs of its orbits. This ordinal view can be particularly useful for physical quantities for which the statement that one measured value is larger than another is well interpretable, but concrete differences between measured values are not. A prominent example is the (indirect) measurement of temperature, as the mean kinetic energy of the particles of a system, by a thermometer. One can make statements about what is warmer or colder, but, for example, interpreting an increase by 1 °C without knowing the baseline value is complicated. This paper discusses the Kolmogorov-Sinai entropy from the ordinal viewpoint. It reviews and, in particular, extends and generalizes former results given by Antoniouk et al. [6], Amigó [3], Keller [14], Keller and Sinn [15,16] and Amigó et al. [5]. Aspects of entropy estimation are also touched upon.
The framework. The basic model of our discussion is a measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$, i.e. $\Omega$ is a non-empty set whose elements are interpreted as the states of a system, $\mathcal{A}$ is a σ-algebra on $\Omega$, $\mu : \mathcal{A} \to [0, 1]$ is a probability measure, and $T : \Omega \to \Omega$ is an $\mathcal{A}$-$\mathcal{A}$-measurable $\mu$-preserving map describing the dynamics of the system. Here $\mu$-preserving means that $\mu(T^{-1}(A)) = \mu(A)$ for all $A \in \mathcal{A}$; the measure $\mu$ is then called $T$-invariant.
We want to have some kind of regularity of $T$ by assuming at least one of the following conditions:
$T$ is ergodic with respect to $\mu$, i.e. $\mu(A) \in \{0, 1\}$ for all $A \in \mathcal{A}$ with $T^{-1}(A) = A$, (1)
$\Omega$ can be embedded into some compact metrizable space so that $\mathcal{A} = \mathcal{B}(\Omega)$. (2)
Here and in the whole paper, $\mathcal{B}(\Omega)$ denotes the Borel σ-algebra in the case that $\Omega$ is a topological space. As usual, instead of saying that $T$ is ergodic with respect to $\mu$, we also say that $\mu$ is ergodic for $T$.
Often the states of a system, whatever they are, cannot be accessed directly, but information on them can be obtained by measurements. In this paper such measurements are assumed to be given via observables $X_1, X_2, X_3, \ldots$ defined as $\mathbb{R}$-valued random variables on the probability space $(\Omega, \mathcal{A}, \mu)$. So the measurements are provided by a stochastic process, which we call a sequence of observables $\mathbf{X} = (X_i)_{i\in\mathbb{N}}$, whose realization has components $(X_i(T^{\circ t}(\omega)))_{t\in\mathbb{N}_0}$. Here $X_i(T^{\circ t}(\omega))$ is interpreted as the $i$-th measured value from the system at time $t$ when starting in state $\omega \in \Omega$.
A priori we have infinitely many observables providing more and more information. The finite case, however, is included by requiring that all $X_i$ with $i \geq n$ are equal for some $n \in \mathbb{N}$. We will write $\mathbf{X} = (X_i)_{i=1}^{n}$ in the case of finitely many observables and $\mathbf{X} = X$ in the case of only one observable $X$.
Unless otherwise stated, in the following $(\Omega, \mathcal{A}, \mu, T)$ is a measure-preserving dynamical system and $\mathbf{X} = (X_i)_{i\in\mathbb{N}}$ a sequence of observables.
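To make the measuring process concrete, the following minimal sketch generates the values $X_i(T^{\circ t}(\omega))$ for finitely many observables along an initial piece of an orbit. The function name `measure`, the rotation used as $T$ and the two observables are hypothetical choices made only for illustration; they are not taken from the theory above.

```python
from typing import Callable, List, Sequence


def measure(T: Callable[[float], float],
            observables: Sequence[Callable[[float], float]],
            omega: float,
            steps: int) -> List[List[float]]:
    """Return the rows [X_1(T^t(omega)), ..., X_n(T^t(omega))] for t = 0, ..., steps - 1."""
    rows = []
    state = omega
    for _ in range(steps):
        rows.append([X(state) for X in observables])
        state = T(state)  # advance the system by one time step
    return rows


# Hypothetical example system: rotation by 1/4 on [0, 1), which preserves Lebesgue measure.
def rotation(x: float) -> float:
    return (x + 0.25) % 1.0


# Hypothetical observables: the identity and the square.
def identity(x: float) -> float:
    return x


def square(x: float) -> float:
    return x * x


if __name__ == "__main__":
    for t, row in enumerate(measure(rotation, [identity, square], 0.0, 4)):
        print(t, row)
```

Row $t$ of the result collects what all observables report at time $t$; column $i$ is the time series delivered by the observable $X_i$.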
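Kolmogorov-Sinai entropy. In order to recall the Kolmogorov-Sinai entropy, let $q \in \mathbb{N}$ and let $\mathcal{P} = \{P_1, P_2, \ldots, P_q\} \subset \mathcal{A}$ be a finite partition of $\Omega$, i.e. $\Omega = \bigcup_{l=1}^{q} P_l$, $P_l \neq \emptyset$ for $l = 1, 2, \ldots, q$, and $P_{l_1} \cap P_{l_2} = \emptyset$ for different $l_1, l_2 \in \{1, 2, \ldots, q\}$, and let $A = \{1, 2, \ldots, q\}$ be the corresponding alphabet (a set of symbols, not to be confused with the σ-algebra $\mathcal{A}$). Each word $a_1 a_2 \ldots a_t$ of length $t \in \mathbb{N}$ over $A$ defines a set
$$P_{a_1 a_2 \ldots a_t} := \{\omega \in \Omega \mid (\omega, T(\omega), \ldots, T^{\circ t-1}(\omega)) \in P_{a_1} \times P_{a_2} \times \cdots \times P_{a_t}\},$$
and the collection of all non-empty sets obtained for such words of length $t$ provides a partition $\mathcal{P}_t \subset \mathcal{A}$ of $\Omega$. In particular, $\mathcal{P}_1 = \mathcal{P}$.
The entropy rate of $T$ with respect to an initial partition $\mathcal{P}$ is given by
$$h_\mu(T, \mathcal{P}) = \lim_{t\to\infty} \frac{1}{t} H_\mu(\mathcal{P}_t),$$
where $H_\mu(\mathcal{C})$ denotes the (Shannon) entropy of a finite partition $\mathcal{C} = \{C_1, C_2, \ldots, C_q\} \subset \mathcal{A}$ of $\Omega$,
$$H_\mu(\mathcal{C}) = -\sum_{l=1}^{q} \mu(C_l) \ln(\mu(C_l))$$
(with $0 \ln(0) := 0$), and the Kolmogorov-Sinai entropy is defined by
$$h^{\mathrm{KS}}_\mu(T) = \sup_{\mathcal{P}} h_\mu(T, \mathcal{P}),$$
where the supremum is taken over all finite partitions $\mathcal{P} \subset \mathcal{A}$ of $\Omega$. Although the Kolmogorov-Sinai entropy is well-defined, its determination is not easy.
In some special cases one can find finite partitions that already determine it, called generating partitions (see Definition 6.3). Usually, however, such partitions do not exist or are not accessible. As a substitute, we want to consider special sequences of partitions only depending on the ordinal structure of a dynamical system.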
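The quantities entering the entropy rate can be illustrated numerically. The sketch below computes the Shannon entropy of a finite probability vector with the convention $0 \ln(0) := 0$ and, as a purely empirical stand-in for $\frac{1}{t} H_\mu(\mathcal{P}_t)$, the entropy of the relative frequencies of words of length $t$ in a given symbol sequence, divided by $t$. The function names are hypothetical, and replacing the measure of the word sets by relative frequencies along one orbit is an assumption that is only reasonable in the ergodic case.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def shannon_entropy(probabilities: Iterable[float]) -> float:
    """Return -sum p ln(p), skipping zero probabilities (convention 0 ln 0 := 0)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def empirical_block_entropy_rate(symbols: Sequence[Hashable], t: int) -> float:
    """Estimate H(P_t) / t from the relative frequencies of the words of length t.

    symbols[s] is assumed to be the index of the partition set visited at time s.
    """
    words = [tuple(symbols[s:s + t]) for s in range(len(symbols) - t + 1)]
    counts = Counter(words)
    total = len(words)
    return shannon_entropy(c / total for c in counts.values()) / t


if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.5]))                 # entropy of a fair bisection
    print(empirical_block_entropy_rate([0, 1] * 50, 3))
```

For a fixed partition the estimate has to be evaluated for growing $t$, which already indicates why the determination of the supremum over all finite partitions is not easy in practice.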
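Ordinal partitioning. For a single observable $X$ on $(\Omega, \mathcal{A}, \mu, T)$ and $s, t \in \mathbb{N}_0$ with $s < t$, consider the bisection
$$\mathcal{P}^{X,T}_{s,t} = \{\{\omega \in \Omega \mid X(T^{\circ s}(\omega)) < X(T^{\circ t}(\omega))\}, \{\omega \in \Omega \mid X(T^{\circ s}(\omega)) \geq X(T^{\circ t}(\omega))\}\} \quad (3)$$
of $\Omega$ and, for observables $X_1, X_2, \ldots, X_n$ on $(\Omega, \mathcal{A}, \mu, T)$ and $d, n \in \mathbb{N}$, the partition
$$\mathcal{P}^{(X_i)_{i=1}^{n},T}_{d} = \bigvee_{i=1}^{n} \bigvee_{0 \leq s < t \leq d} \mathcal{P}^{X_i,T}_{s,t}, \quad (4)$$
i.e. the coarsest partition refining all bisections $\mathcal{P}^{X_i,T}_{s,t}$; $i = 1, 2, \ldots, n$, $0 \leq s < t \leq d$. (If one of the sets on the right-hand side of (3) is empty, $\mathcal{P}^{X,T}_{s,t}$ is considered to consist of only one set.) The partition $\mathcal{P}^{(X_i)_{i=1}^{n},T}_{d}$ is called the ordinal partition of order $d$. By definition, its parts contain all states with equal ordinal measurement structure for an initial orbit part.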
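The membership of a state in a part of such a partition can be encoded by the outcomes of all pairwise comparisons along an initial orbit piece. The following sketch assumes that each bisection separates states according to whether $X(T^{\circ s}(\omega)) < X(T^{\circ t}(\omega))$ holds or not; this treatment of ties, as well as the function names, are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Sequence, Tuple


def bisection_code(values: Sequence[float]) -> Tuple[bool, ...]:
    """Encode one observable's orbit piece (X(w), X(T(w)), ..., X(T^d(w))).

    The entry for the pair s < t is True iff values[s] < values[t] (assumed convention).
    """
    d = len(values) - 1
    return tuple(values[s] < values[t] for s, t in combinations(range(d + 1), 2))


def ordinal_part(rows: Sequence[Sequence[float]]) -> Tuple[Tuple[bool, ...], ...]:
    """Label of the part for several observables; rows[i] holds the values of observable i + 1."""
    return tuple(bisection_code(row) for row in rows)


if __name__ == "__main__":
    # Two hypothetical states measured with one observable over d = 2 time steps.
    print(ordinal_part([[0.1, 0.5, 0.3]]) == ordinal_part([[1.0, 9.0, 4.0]]))
```

Two states receive the same label exactly when all compared pairs are ordered in the same way for every observable, i.e. when no bisection in the join separates them.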
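A central statement. Clearly, in order to preserve information of the given system, the observables should separate orbits of the system in a certain sense. In order to give a precise description, let in the following $\sigma((\mathbf{X} \circ T^{\circ t})_{t\in\mathbb{N}_0})$ be the σ-algebra generated by all random variables $X_i \circ T^{\circ t}$; $i \in \mathbb{N}$, $t \in \mathbb{N}_0$. Moreover, for families $\mathcal{F}, \mathcal{G}$ of subsets of $\Omega$ we write $\mathcal{F} \overset{\mu}{\supset} \mathcal{G}$ if for each $G \in \mathcal{G}$ there exists some $F \in \mathcal{F}$ with $\mu(F \,\triangle\, G) = 0$.
The following generalization of a statement in Antoniouk et al. [6] says that if there is no information loss by measuring with observables, then all information is also preserved when measurements are considered only from the ordinal viewpoint.
Theorem 1.1. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system and $\mathbf{X} = (X_i)_{i\in\mathbb{N}}$ be a sequence of observables such that $\sigma((\mathbf{X} \circ T^{\circ t})_{t\in\mathbb{N}_0}) \overset{\mu}{\supset} \mathcal{A}$. Assume that (1) or (2) holds. Then
$$h^{\mathrm{KS}}_\mu(T) = \lim_{d,n\to\infty} h_\mu\big(T, \mathcal{P}^{(X_i)_{i=1}^{n},T}_{d}\big) = \sup_{d,n\in\mathbb{N}} h_\mu\big(T, \mathcal{P}^{(X_i)_{i=1}^{n},T}_{d}\big). \quad (5)$$
When Bandt and Pompe [8] invented the permutation entropy, they considered one-dimensional systems with coincidence of states and measurements. This fits into the given general approach as follows: $\Omega$ is a Borel subset of $\mathbb{R}$ and only one observable is considered, namely the identity map $\mathrm{id}$ from $\Omega$ into $\mathbb{R}$. In this situation the assumptions of Theorem 1.1 are satisfied and so it holds
$$h^{\mathrm{KS}}_\mu(T) = \lim_{d\to\infty} h_\mu\big(T, \mathcal{P}^{\mathrm{id},T}_{d}\big)$$
(see [15,16]).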
Structure of the paper. The paper is organized as follows. In Section 2 we provide a proof of Theorem 1.1 on the basis of Antoniouk et al. [6]. We, moreover, discuss this statement from different perspectives in Section 3 by presenting its modifications and variants. Section 4 is devoted to the concept of permutation entropy, in particular to the two different approaches to it given by Bandt et al. in [7] and Amigó et al. in [5], respectively, and to its relation to the Kolmogorov-Sinai entropy. The ordinal approach to dynamical systems opens new perspectives to the estimation of system complexity. Advantages and limitations of this approach are discussed in Section 5. The natural question of how many observables are necessary for satisfying the assumptions of Theorem 1.1 is in the focus of Section 6. The corresponding discussion is strongly related to Takens' delay embedding and similar ideas (see Takens [23] and Sauer [22]).

Kolmogorov-Sinai entropy from the ordinal viewpoint
This section is devoted to the proof of Theorem 1.1.
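Preliminaries. In the following we write $\mathcal{F} \overset{\mu}{=} \mathcal{G}$ if $\mathcal{F} \overset{\mu}{\supset} \mathcal{G}$ and $\mathcal{F} \overset{\mu}{\subset} \mathcal{G}$, and denote by $1_A$ the indicator function of a subset $A \subset \Omega$. Moreover, σ(♦) denotes the σ-algebra generated by a set ♦ of subsets of $\Omega$, by a sequence or double sequence ♦ of sets of subsets of $\Omega$, or by a random variable ♦ on $\Omega$.
Given two finite partitions $\mathcal{C}, \mathcal{D} \subset \mathcal{A}$ of $\Omega$, we write $\mathcal{C} \prec \mathcal{D}$ if $\mathcal{D}$ is finer than $\mathcal{C}$ or, equivalently, if $\mathcal{C}$ is coarser than $\mathcal{D}$, that is, each element $C \in \mathcal{C}$ is a finite union of some elements of $\mathcal{D}$. Note that $\prec$ is a partial order on the set of finite partitions of $\Omega$ contained in $\mathcal{A}$.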
The join m r=1 C r of m ∈ N finite partitions C r = {C   (4)), for d, n ∈ N we are interested in the finite partitions Furthermore, we need the following σ-algebras associated to these partitions: The proof. Although we consider dynamical systems equipped with infinitely many observables, we can follow closely the argumentation in the paper Antoniouk et al. [6]. So let us first recall or modify those statements of that paper used in our proof.  By very slight modifications we can extend [6, Corollary 3.4 and Corollary 3.5] to countably many observables: Proof. Compare to [6,Corollary 3.4]. The σ-algebra Σ X,T is generated by the σ-algebras follows the assumption. This is true since I d d : Ω → [0, 1] is Σ X,T -B([0, 1])-measurable for all d ∈ N and hence so is F • X and X by Lemma 2.1 and Lemma 2.2. The inclusion Σ X,T ⊂ Σ X,T is given by construction (compare (4) and (6)).
Proof. For fixed n ∈ N, in [6, Proof of Corollary 3.5] it is shown that Moreover, Corollary 2.3 gives n∈N is an increasing sequence in n for fixed d, and for fixed n it is an increasing sequence in d.
In particular, (P implying P ,T d and so the above statements. For completing the proof of Theorem 1.1, we apply the following statement (see Walters [27,Theorem 4.22]): First suppose that T is an ergodic map. Then under the assumptions of Theorem

and by Corollary 2.4 it holds
is an increasing sequence in j with respect to ≺ for increasing sequences (d j ) j∈N and (n j ) j∈N in N, the assertion of Theorem 1.1 follows from Lemma 2.6.
In the non-ergodic case we use the ergodic decomposition theorem. For a thorough treatment we refer the reader to Einsiedler and Ward [10] and Einsiedler et al. [9].
In particular, the ergodic decomposition theorem states that under certain conditions any T-invariant measure µ can be decomposed into ergodic components, and that subsequently the entropy rate as well as the Kolmogorov-Sinai entropy of T with respect to µ can be written as the integral of the corresponding entropies with respect to the decomposition.

Moreover, it holds
Altogether we obtain Here (n j ) j∈N and (d j ) j∈N are strictly increasing sequences of natural numbers.

Modifications and consequences of Theorem 1.1
We want to have a closer look at Theorem 1.1. For this recall that X • T •t can be interpreted as a measurement of a system at time t. As discussed in Section 1, there is no information loss when taking a pure ordinal viewpoint in the case that these measurements have 'separating properties'.
Fewer comparisons. Theorem 1.1 can be given in a relaxed version if the considered observables provide a 'separation' from the outset (compare also [16,17]). This means that, in the case of 'separating' original observables, one does not need all comparisons between the elements of an orbit in order to determine the Kolmogorov-Sinai entropy, but only comparisons between points and their iterates.
Theorem 3.1. Let (Ω, A, µ, T ) be a measure-preserving dynamical system and X = (X i ) i∈N be a sequence of observables such that σ(X) For an ergodic map T we have that A µ ⊂ Σ X,T , which follows from Corollary 2.3 and the assumption σ(X) n∈N is an increasing sequence in d and N with respect to ≺, as can be shown analogously to the proof of Lemma 2.5. Thus, for ergodic T the assertion follows by Lemma 2.6. To show the non-ergodic case one can use the ergodic decomposition theorem as in the proof of Theorem 1.1.
It seems that the assumption σ(X) µ ⊃ A in Theorem 3.1 cannot be replaced by the is true as (9) is, the analogue d+1 for all d ∈ N and i = 1, 2, . . . , n of (7) is false. Therefore the analogue (T is the tent map preserving the equidistribution on [0, 1].) Let It follows that ω 1 and ω 2 are separated by P Y •T,T Other partitions. For a single observable X on a measure-preserving dynamical system (Ω, A, µ, T ) and s, t ∈ N 0 with s < t, let Further, for observables X 1 , X 2 , . . . , X n on (Ω, A, µ, T ) and d ∈ N, let (If one of the sets of the right hand side of (12) or (12) is empty, then it is not considered in order to have only nonempty sets.) Then the following is valid: The statement of Theorem 1.1 remains true when substituting P The existence of the limit lim and its coincidence with the corresponding supremum is obvious (compare discussion for P Let us consider an order ≺ between observables X, Y by X ≺ Y iff for all ω 1 , ω 2 ∈ Ω the following holds (compare [3]): One easily shows the following: After the following corollary being an immediate consequence of Theorem 1.1, we will illustrate this point by an example.
Corollary 3.5. Let (Ω, A, µ, T ) be a measure-preserving dynamical system and X = (X i ) i∈N be a sequence of observables with
Remark 3.7. Each finite partition $\mathcal{C} = \{C_1, C_2, \ldots, C_q\} \subset \mathcal{A}$; $q \in \mathbb{N}$ is generated by observables of the form $X = \sum_{l=1}^{q} \alpha_l \cdot 1_{C_l}$ in the sense that $C_l = X^{-1}(\alpha_l)$ for all $l = 1, 2, \ldots, q$, where $\alpha_l$; $l = 1, 2, \ldots, q$ are different real numbers. If a partition $\mathcal{D} \subset \mathcal{A}$ is finer than $\mathcal{C}$, then it can be written as j . If $X = \sum_{l=1}^{q} \alpha_l \cdot 1_{C_l}$ for different $\alpha_l \in \mathbb{N}$ and if $m > m_l$ for all $l = 1, 2, \ldots, q$, then for it holds $X \prec Y$. This shows that an increasing sequence $(\mathcal{C}_d)_{d\in\mathbb{N}}$ can be 'generated' by a sequence $(X_d)_{d\in\mathbb{N}}$ of observables with $X_1 \prec X_2 \prec X_3 \prec \ldots$.

Permutation entropy
The idea of considering dynamical systems from the ordinal viewpoint is strongly related to the invention of the permutation entropy, which we want to discuss now. We first give a definition of it in our general framework: Definition 4.1. Given a sequence X = (X i ) i∈N of observables on a measure-preserving dynamical system (Ω, A, µ, T ), we define the permutation entropy h µ (T, X) with respect to X by (14) h Originally, by Bandt et al. in [7] the definition of permutation entropy was given directly for one-dimensional systems. In our framework, this is h * µ (T, id) with T being an interval map.
Permutation and Kolmogorov-Sinai entropy. One reason for investigating the permutation entropy is its close relationship to the well-established Kolmogorov-Sinai entropy, first observed by Bandt et al. in [7]. In their seminal paper they showed that both entropies coincide for piecewise monotone interval maps T , i.e. for self-maps T of intervals that split into finitely many subintervals on each of which T is continuous and monotone.
Moreover, in the case that σ((X • T •t ) t∈N 0 ) µ ⊃ A and that (1) or (2) holds, the Kolmogorov-Sinai entropy is not larger than permutation entropy. It holds for finitely many observables ) for all n ∈ N (see Keller et al. [18,Corollary 3]), hence the corresponding inequality for infinitely many ones follows by n approaching to infinity. So let us summarize: Corollary 4.2. Let (Ω, A, µ, T ) be a measure-preserving dynamical system and X = (X i ) i∈N be a sequence of observables such that σ((X The approach of Amigó et al. [3,5]. This approach to permutation entropy different to the original is based on a refining sequence of finite partitions and is justified by the following statement due to Amigó et al. [3,5]. We express the statement by finite-valued observables and refer here to Remark 3.7. Theorem 4.3. For a measure-preserving dynamical system (Ω, A, µ, T ) the following is valid: (i) If X is a finitely-valued observable, and P the finite partition generated by X, then h µ (T, P) = h * µ (T, X).
(ii) If (X i ) i∈N is a sequence of finitely-valued observables with X 1 ≺ X 2 ≺ X 3 ≺ . . . and the corresponding sequence of finite partitions generates A, then One immediately sees that by Lemma 2.6 assertion (ii) follows directly from statement (i). Amigó et al. took the right hand side of (15) as their modified concept of permutation entropy before showing its equality to Kolmogorov-Sinai entropy.
We want to finish this section by stating the following general problem, which is interesting on different levels, from the original one-dimensional definition of permutation entropy to the generalization to finitely or infinitely many observables.
Problem. Do the Kolmogorov-Sinai entropy and the permutation entropy coincide in general and, if not, under which assumptions do they coincide?
Note that the purely combinatorial part of the problem is relatively well understood (see Unakafova et al. [26], Keller et al. [18]).

Ordinal time series analysis
Ever since Bandt and Pompe [8] proposed to consider the rank order of consecutive values of a time series instead of the values themselves, the ordinal approach has attracted increasing attention and has been applied in many scientific fields, for example in biomedical research, engineering and econophysics (see Amigó et al. [1,2], Zanin et al. [28] and the references given there).
The reason is that the ordinal viewpoint brings many advantages, especially for measuring complexity, such as robustness against small noise, simplicity of application and interpretation, and low computational costs. As mentioned, the determination of Kolmogorov-Sinai entropy is usually not easy. Our discussion above, however, suggests that the ordinal approach can be used as a framework for estimating the Kolmogorov-Sinai entropy of dynamical systems, and related quantities, from real-world data.
In the following we consider the theory developed in the previous sections in an applied context and discuss the pros and cons of using this approach for studying long and complex time series. A detailed exposition of this ordinal pattern approach is provided in Keller et al. [17].
Ordinal patterns. The task of gaining information about an underlying system via measurements is a common everyday problem. As already mentioned, this issue is increasingly addressed by using the information contained in the ordinal structure of a system or of a time series obtained from it. This leads to considering the ups and downs in a time series, which can be described via so-called ordinal patterns.
Given a time series $(x_t)_{t\in\mathbb{N}_0}$, the ordinal pattern of order $d$ at time $t$ is defined as that of $(x_{t+s})_{s=0}^{d}$ and denoted by $\pi_t$.
Example 5.2. In Figure 1 we consider a time series of 50 data points in which, as an example, the ordinal pattern $\pi_{10} = (0, 5, 3, 4, 6, 1, 2) \in \Pi_6$ is emphasized. It corresponds to the order relation of the seven successive values $x_{10}, x_{11}, \ldots, x_{16}$ starting at $t = 10$.
It is easily seen that, following the framework given in Section 1, two states $\omega_1 \in \Omega$ and $\omega_2 \in \Omega$ belong to the same part of some ordinal partition iff the ordinal patterns of the vectors $(X_i(\omega_1), X_i(T(\omega_1)), \ldots, X_i(T^{\circ d}(\omega_1)))$ and $(X_i(\omega_2), X_i(T(\omega_2)), \ldots, X_i(T^{\circ d}(\omega_2)))$ coincide. Clearly, the other previously considered partitions (see Equations (6), (12) and (13)), despite some adjustments in terms of equality, can be coherently assimilated to this ordinal approach by redefining ordinal patterns in terms of the equality of values. The setting (16) is in some sense arbitrary; however, the proposed definition of ordinal patterns has established itself. We will use it in the following to demonstrate how the previously covered theory provides interesting and promising tools for extracting the information stored in an ordinal pattern sequence, for example, by estimating the permutation entropy (see Equation (14)) or by approximating the Kolmogorov-Sinai entropy.
In order to utilize ordinal patterns for the analysis of a system, sequential data $(x_t)_{t\in\mathbb{N}_0}$ obtained from a given measurement are transformed into a series $(\pi_t)_{t\in\mathbb{N}_0}$ of ordinal patterns. Distributions of ordinal patterns obtained from this approach are the central objects of exploration.
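The transformation of a time series into a series of ordinal patterns can be sketched as follows. The sketch assumes one common convention: the pattern of a window $(x_0, \ldots, x_d)$ is the tuple $(r_0, \ldots, r_d)$ of indices with $x_{r_0} \geq x_{r_1} \geq \ldots \geq x_{r_d}$, where in case of equal values the larger index is listed first. Both this tie rule and the function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ordinal_pattern(window: Sequence[float]) -> Tuple[int, ...]:
    """Indices sorted by descending value; among equal values the larger index comes first."""
    return tuple(sorted(range(len(window)), key=lambda r: (-window[r], -r)))


def pattern_series(series: Sequence[float], d: int) -> List[Tuple[int, ...]]:
    """Ordinal patterns of order d of all windows (x_t, ..., x_{t+d}) of the series."""
    return [ordinal_pattern(series[t:t + d + 1]) for t in range(len(series) - d)]


if __name__ == "__main__":
    data = [3.0, 1.0, 2.0, 5.0]  # hypothetical measurements
    for t, pattern in enumerate(pattern_series(data, 2)):
        print(t, pattern)
```

A series of length $N$ yields $N - d$ patterns of order $d$, since every window needs $d + 1$ consecutive values.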
Note that ordinal patterns do not provide a symbolic representation as it is usually considered, since partitions of the state space are not given a priori, but are created on the basis of the given dynamics. However, the ordinal patterns as 'symbols' are very simple objects being directly obtained from the orbits of the system and containing intrinsic causal information. For the relationship of symbolic dynamics and representations and ordinal time series analysis see Amigó et al. [1].
For simplicity, we now restrict our exposition to the one-dimensional case with only one measurement. What we have in mind is a measure preserving dynamical system (Ω, A, µ, T ), where Ω is a Borel subset of R, acting as the model of a system, with a single observable X being the identity map. The extension of the ideas to the general case is obvious.
Estimation of ordinal quantities. The naive and most widely used estimator of ordinal pattern probabilities, that is, of the probabilities of the parts of the ordinal partition, is the relative frequency of ordinal patterns in an orbit of some length. For some $t, d \in \mathbb{N}$ with $t \geq d$, some ordinal pattern $\pi$ of order $d$ and some $\omega \in \Omega$, the estimate is given by the number
$$\hat{p}_\pi = \frac{1}{t - d + 1}\, \#\{s \in \{0, 1, \ldots, t - d\} \mid (X(T^{\circ s}(\omega)), X(T^{\circ s+1}(\omega)), \ldots, X(T^{\circ s+d}(\omega))) \text{ has ordinal pattern } \pi\}.$$
Here $t + 1$ is the length of the considered orbit of $\omega$. Clearly, the estimation only makes sense in the ergodic case. Then, by Birkhoff's ergodic theorem, the corresponding estimator is consistent. If in the ergodic case all $\hat{p}_\pi$; $\pi \in \Pi_d$ are determined, it follows immediately that in the simple case considered a reasonable estimator for (14) is given by the empirical permutation entropy of order $d \in \mathbb{N}$:
$$\hat{h}^{(d)} = -\frac{1}{d} \sum_{\pi \in \Pi_d} \hat{p}_\pi \ln(\hat{p}_\pi).$$
It also provides some information on the Kolmogorov-Sinai entropy.
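The estimation procedure described above can be sketched in a few lines. The sketch assumes that the empirical permutation entropy of order $d$ is the Shannon entropy of the relative pattern frequencies divided by $d$, and it uses one particular tie rule for ordinal patterns (descending values, larger index first for equal values). The normalization, the tie rule and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Pattern = Tuple[int, ...]


def ordinal_pattern(window: Sequence[float]) -> Pattern:
    """Indices sorted by descending value; among equal values the larger index comes first."""
    return tuple(sorted(range(len(window)), key=lambda r: (-window[r], -r)))


def pattern_frequencies(series: Sequence[float], d: int) -> Dict[Pattern, float]:
    """Relative frequencies of the ordinal patterns of order d in the series."""
    n_windows = len(series) - d
    if d < 1 or n_windows <= 0:
        raise ValueError("need d >= 1 and at least d + 1 values")
    counts = Counter(ordinal_pattern(series[s:s + d + 1]) for s in range(n_windows))
    return {pattern: c / n_windows for pattern, c in counts.items()}


def empirical_permutation_entropy(series: Sequence[float], d: int) -> float:
    """Shannon entropy of the pattern frequencies, divided by d (assumed normalization)."""
    freqs = pattern_frequencies(series, d)
    return -sum(p * math.log(p) for p in freqs.values()) / d


if __name__ == "__main__":
    data = [1.0, 2.0] * 4 + [1.0]  # hypothetical alternating series
    print(pattern_frequencies(data, 1))
    print(empirical_permutation_entropy(data, 1))
```

Only patterns that actually occur enter the sum, which is consistent with the convention $0 \ln(0) := 0$ for unobserved patterns.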
Assets and drawbacks. Irrespective of the considered ordinal partition, the ordinal approach brings along some practical advantages and disadvantages. Note that most difficulties to overcome are common to any sort of time series analysis.
Since only the order relation between the values of a time series is considered, small inaccuracies in measurements (e.g. errors between the state of a system and its observed value) are mostly negligible. Hence, the methods considered are relatively robust with respect to calibration differences of measuring instruments. Furthermore, the ordinal approach is easily interpretable, and there already exist efficient methods to perform an ordinal time series analysis in real time. For a deeper discussion we refer to Riedl et al. [20] as well as Unakafova and Keller [25]. Last but not least, prior knowledge of the data range is usually not necessary when analyzing data.
In contrast, the ordinal analysis of time series can perform rather poorly if the underlying system is so complex that the order d needed is too large for the available computational capacity. If, for example, the permutation entropy of a dynamical system is very large, its estimation by the empirical permutation entropy is problematic. Note that, in general, even for simple systems the convergence of the empirical permutation entropies of order d to the permutation entropy can be rather slow, which is the reason for considering a conditional adaptation of the permutation entropy (see Unakafov and Keller [24]).
In addition, the choice of a suitable order d with respect to the length of the original time series is affected by common problems. Large values of d are needed to evaluate the encapsulated information as accurately as possible, but a large d yields (d + 1)! possible ordinal patterns, all of which have to be considered if nothing is known about the original time series. If one chooses d too large relative to the length of a time series, it can happen that not all ordinal patterns which are substantial for describing the underlying dynamics are observed in the ordinal pattern distribution. This is known as undersampling.
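The tension between the order $d$ and the length of a time series can be quantified by comparing the $(d + 1)!$ possible ordinal patterns with the number of windows the series provides. The following sketch does this; the selection rule in `largest_order_without_undersampling` (at least `factor` windows per possible pattern) is a hypothetical rule of thumb introduced here for illustration, not a criterion taken from the text.

```python
import math


def possible_patterns(d: int) -> int:
    """Number of ordinal patterns of order d, i.e. (d + 1)!."""
    return math.factorial(d + 1)


def available_windows(series_length: int, d: int) -> int:
    """Number of windows of d + 1 consecutive values in a series of the given length."""
    return max(series_length - d, 0)


def largest_order_without_undersampling(series_length: int, factor: int = 1) -> int:
    """Largest d with factor * (d + 1)! <= number of windows (hypothetical rule of thumb)."""
    d = 0
    while factor * possible_patterns(d + 1) <= available_windows(series_length, d + 1):
        d += 1
    return d


if __name__ == "__main__":
    # A series of 50 data points analyzed with order 6, as in Example 5.2.
    print(possible_patterns(6), available_windows(50, 6))
    print(largest_order_without_undersampling(50))
```

For a series of 50 data points and order 6 there are 5040 possible patterns but only 44 windows, so the pattern distribution is necessarily undersampled unless the dynamics admits very few patterns.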
Moreover, ordinal time series analysis can lead to an arbitrarily poor approximation of the Kolmogorov-Sinai entropy, or to a poor representation of the underlying dynamics by the statistics, especially when working under wrong assumptions, e.g. when a given system fails to be ergodic or the chosen observables cause information loss while measuring. The next section addresses the latter problem.

Algebra reconstruction dimension
Theorem 1.1 states that the Kolmogorov-Sinai entropy of T can be computed provided that we have sufficiently many observables "generating" A up to µ-measure zero. For applications it is essential to ask how far the number of observables can be decreased. In this section we briefly review the known results in this direction.
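Only one observable. The following example shows that, theoretically, a single such observable suffices in most cases of practical relevance.
Example 6.1. Let $Z$ be a Polish space, $I = [0, 1]$, $\Omega \subset Z$ be an uncountable Borel subset of $Z$, and $\mathcal{A} := \mathcal{B}(\Omega)$ be the Borel σ-algebra of $\Omega$. Then the pair $(\Omega, \mathcal{B}(\Omega))$ is called a standard Borel space. It is well known, see e.g. Kechris [13, Proposition 12.1], that then there exists a measurable isomorphism of $(\Omega, \mathcal{B}(\Omega))$ onto the space $(I, \mathcal{B}(I))$, that is, a bijection $X : \Omega \to I$ such that $X^{-1}(\mathcal{B}(I)) = \mathcal{B}(\Omega)$.
Let $\mu$ be a measure on $(\Omega, \mathcal{B}(\Omega))$ and $T : \Omega \to \Omega$ be any $\mu$-preserving map. Then $\sigma(X) = X^{-1}(\mathcal{B}(I)) = \mathcal{B}(\Omega)$, so the σ-algebra generated by the random variables $X \circ T^{\circ t}$; $t \in \mathbb{N}_0$ already contains $\mathcal{B}(\Omega)$ for the single observable $X$.
Notice that the function $X : \Omega \to [0, 1] \subset \mathbb{R}$ from Example 6.1 is in general not continuous and its explicit construction is very complicated. Therefore it is not useful for real applications. This leads to the following notion.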
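Definition 6.2. Let $(\Omega, \mathcal{B}(\Omega))$ be a standard Borel space with measure $\mu$ on $\mathcal{B}(\Omega)$, and $T : \Omega \to \Omega$ be a $\mathcal{B}(\Omega)$-$\mathcal{B}(\Omega)$-measurable map. By the algebra reconstruction dimension of $T$ with respect to $\mu$ we will mean the minimal integer number $n \geq 1$ such that there exists a continuous map $X : \Omega \to \mathbb{R}^n$ satisfying
$$\sigma((X \circ T^{\circ t})_{t\in\mathbb{N}_0}) \overset{\mu}{\supset} \mathcal{B}(\Omega). \quad (17)$$
This number will be denoted by $\mathrm{ard}_\mu(T)$. If such $n$ does not exist, then we will assume that $\mathrm{ard}_\mu(T) = \infty$.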
Thus ard µ (T ) is the minimal number of continuous observables needed to approximate the Kolmogorov-Sinai entropy via (5).
Given a map $T : \Omega \to \Omega$, a map $X : \Omega \to \mathbb{R}^n$ and $t \in \mathbb{N}$ one can define the following t-reconstruction map $\Lambda_{X,T,t} = (X, X \circ T, \ldots, X \circ T^{\circ t-1}) : \Omega \to \mathbb{R}^{nt}$ and an ∞-reconstruction map In particular, (17) can be reformulated as follows: Before discussing $\mathrm{ard}_\mu(T)$ we will present an example for the existence of one separating observable, that is $X : \Omega \to \mathbb{R}$ satisfying (18), and therefore allowing one to approximate the Kolmogorov-Sinai entropy by formula (5), see Theorem 6.5 below. However, now this observable is "discrete", i.e. it takes at most countably many values.
Definition 6.3. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system. An at most countable partition $\mathcal{C} = \{C_l\}_{l=1}^{q} \subset \mathcal{A}$ of $\Omega$ for some $q \in \mathbb{N} \cup \{\infty\}$ is called generating with respect to $T$, if The following lemma is evident.
Lemma 6.4. Suppose a measure-preserving dynamical system (Ω, A, µ, T ) has a gen- In general, a µ-preserving map does not have a generating partition. Nevertheless, for non-singular ergodic automorphisms of standard probability spaces such partitions do exist, which we discuss now. First we recall the necessary definitions.
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. The measure $\mu$ is called complete if for any set $A \in \mathcal{A}$ with $\mu(A) = 0$ every subset $B \subset A$ also belongs to $\mathcal{A}$.
A countable family of sets $\{A_l\}_{l\in\mathbb{N}} \subset \mathcal{A}$ is called a complete basis of $(\Omega, \mathcal{A}, \mu)$ if (a) for each $A \in \mathcal{A}$ there exists a $B \in \sigma(\{A_l\}_{l=1}^{\infty})$ with $A \subset B$ and $\mu(B \setminus A) = 0$; (b) for any different $\omega_1, \omega_2 \in \Omega$ there exists an $l \in \mathbb{N}$ such that $\omega_1 \in A_l$ and $\omega_2 \in \Omega \setminus A_l$; (c) each intersection $\bigcap_{l\in\mathbb{N}} B_l$, where every $B_l$ is either $A_l$ or $\Omega \setminus A_l$, is non-empty. A probability space $(\Omega, \mathcal{A}, \mu)$ is called standard if it has a complete basis and $\mu$ is complete.
It has been proved by Rohlin [21] that every standard probability space with non-atomic measure is isomorphic to the probability space $(I, \mathcal{B}(I), \lambda)$, where $\lambda$ is the Lebesgue measure on $I$.
Recall also that a one-to-one transformation $T : \Omega \to \Omega$ is non-singular with respect to a measure $\mu$ if it is bi-measurable, i.e. $T^{-1}\mathcal{A} = \mathcal{A}$ and $T\mathcal{A} = \mathcal{A}$, and $\mu(A) = 0$ if and only if $\mu(T(A)) = 0$ for all $A \in \mathcal{A}$.
The following theorem is a consequence of results by Rohlin [21], Parry [19] and Krieger [12] about the existence of countable and finite generating partitions of ergodic maps.
Moreover, if h KS (T ) < ∞, then T admits a finite generating partition, and so X can be assumed to take only finitely many distinct values.
The continuous case. Notice that the function $X$ from Theorem 6.5 is slightly better than the one from Example 6.1, as it takes a discrete set of values that are mutually distinct for distinct elements of the generating partition $\mathcal{C}$. Nevertheless, it is hard to construct, as it requires knowing a generating partition for $T$, and so it is not useful for applications either. Now we will consider the opposite situation, in which almost any continuous map $X : \Omega \to \mathbb{R}^n$ satisfies (18).
Lemma 6.6. Let $\Omega$ be a Polish space admitting an embedding $X : \Omega \to \mathbb{R}^n$. Then for any measure $\mu$ on $\mathcal{B}(\Omega)$ and any $\mu$-preserving map $T$, we have that $\mathrm{ard}_\mu(T) \leq n$. In particular, if $\dim \Omega = k$; $k \in \mathbb{N}$, then $\mathrm{ard}_\mu(T) \leq 2k + 1$.
The second statement follows from the well-known fact that every $k$-dimensional separable metric space $\Omega$ can be embedded into $\mathbb{R}^{2k+1}$, see [11, Chapter V, §4, Theorem V3]. Moreover, by the same theorem the set of embeddings $\mathrm{Emb}(\Omega, \mathbb{R}^{2k+1})$ is residual (and, in particular, dense) in the space $C(\Omega, \mathbb{R}^{2k+1})$ of all continuous maps. Therefore almost every family of $2k + 1$ continuous observables allows one to approximate the Kolmogorov-Sinai entropy of $T$.
The next statement is a slight generalization of Theorem 2.2 from Keller [14].
Theorem 6.7. Let Ω be a smooth manifold and D(Ω) be the group of its C ∞ diffeomorphisms. Then there exists a residual subset W of D(Ω) such that ard µ (T ) = 1 for each T ∈ W and any measure µ preserved by T .
It is proved by Takens [23] that if $n \geq 2k + 1$, then $E_n$ is residual (and in particular non-empty and everywhere dense) in $C^{\infty}(\Omega, \mathbb{R}) \times D(\Omega)$. Thus we have that $E_{2k+1} = \bigcap_{l=1}^{\infty} U_l$, where each $U_l$ is open and everywhere dense in the space $C^{\infty}(\Omega, \mathbb{R}) \times D(\Omega)$. Let $p : C^{\infty}(\Omega, \mathbb{R}) \times D(\Omega) \to D(\Omega)$ be the natural projection, i.e. $p(X, T) = T$. It is a standard fact from general topology that $p$ is an open map, whence is a residual subset of $D(\Omega)$. Then $\mathrm{ard}_\mu(T) = 1$ for each $T \in W$ and any measure $\mu$ preserved by $T$.
Notice that the latter result does not guarantee that for any measure µ on B(Ω) preserved by some diffeomorphism T there exists some other µ-preserving diffeomorphism T ′ with ard µ (T ′ ) = 1.
The following notion allows one to decrease the dimension $2k + 1$ in Lemma 6.6 by putting some restrictions on $\mu$.
Definition 6.8. Let $X : \Omega \to \mathbb{R}$ be a continuous map between topological spaces. Then the following subset of $\Omega$, $N_X = \{\omega \in \Omega \mid X^{-1}(X(\omega)) \neq \{\omega\}\}$, will be called the set of non-injectivity of $X$. Let $\Omega$ be a smooth manifold of dimension $k$. We say that a subset $Q \subset \Omega$ has Lebesgue measure zero if for any local chart $\varphi : \Omega \supset U \to \mathbb{R}^k$ in $\Omega$ the set $\varphi(Q \cap U)$ has Lebesgue measure zero in $\mathbb{R}^k$. Notice that there is no natural definition of a set of fixed positive Lebesgue measure.
A measure $\mu$ on $\mathcal{B}(\Omega)$ will be called Lebesgue absolutely continuous if $\mu(Q) = 0$ for each subset $Q \subset \Omega$ of Lebesgue measure zero. If $n > k$, then $V_n$ is residual in $C^{\infty}(\Omega, \mathbb{R}^n)$. Hence $\mathrm{ard}_\mu(T) \leq k + 1$ for any (not necessarily continuous) $\mu$-preserving map $T : \Omega \to \Omega$.