On the Relation between Topological Entropy and Restoration Entropy

In the context of state estimation under communication constraints, several notions of dynamical entropy play a fundamental role, among them topological entropy and restoration entropy. In this paper, we present a theorem demonstrating that, for most dynamical systems, restoration entropy strictly exceeds topological entropy. This implies that robust estimation policies in general require a higher rate of data transmission than non-robust ones. The proof of our theorem is quite short, but it uses sophisticated tools from the theory of smooth dynamical systems.


Introduction
This paper compares two notions of entropy that are relevant in the context of state estimation under communication constraints. Since the work of Savkin [1], it has been well known that the topological entropy of a dynamical system characterizes the smallest rate of information above which an estimator, receiving its state information at the corresponding rate, is able to generate a state estimate of arbitrary precision. Topological entropy has been studied in the mathematical field of dynamical systems since the 1960s and has turned out to be a useful tool for solving many theoretical and practical problems, cf. the survey [2] and the monograph [3]. A major drawback of this notion in the context of state estimation is that topological entropy is highly discontinuous with respect to the dynamical system under consideration in any reasonable topology, cf. [4]. As a consequence, estimation policies based on topological entropy are likely to lack robustness. Additionally, topological entropy is very hard to compute or estimate. There are only a few numerical approaches that potentially work for multi-dimensional systems, cf. [5][6][7][8], and each of them has its drawbacks and restrictions.
A possible remedy for these problems is provided in the works [9,10] of Matveev and Pogromsky. One of the main ideas in these papers is to replace the topological entropy as a figure-of-merit for the necessary rate of data transmission with a possibly larger quantity, named restoration entropy, which describes the smallest data rate above which a more robust form of state estimation can be achieved (called regular observability in [9,10]).
For one of the simplest classes of nonlinear dynamical systems, namely Anosov diffeomorphisms, the main result of the paper at hand demonstrates that, for most systems, we have to expect the restoration entropy to strictly exceed the topological entropy. That is, to achieve a state estimation objective that is more robust with respect to perturbations, one has to pay the price of using a channel that allows for a larger rate of data transmission. More specifically, our result shows that the equality of topological and restoration entropy implies a great amount of uniformity in the dynamical system under consideration. This uniformity can be expressed in terms of the unstable Lyapunov exponents at each point, whose sum essentially has to be constant. Such a property can easily be destroyed by a small perturbation, showing that arbitrarily close to the given system, there are systems whose restoration entropy strictly exceeds their topological entropy. Since Anosov diffeomorphisms are considered a paradigmatic class of chaotic dynamical systems, this behavior can be expected for a much larger class of systems.
To prove our result, we need a number of high-level concepts and results from the theory of topological, measurable, and smooth dynamical systems. This includes the concepts of topological and metric pressure, Lyapunov exponents, SRB measures, and uniform hyperbolicity.
The structure of this paper is as follows: In Section 2, we collect all necessary definitions and results from the theory of dynamical systems. Section 3 introduces the concept of restoration entropy and explains its operational meaning in the context of estimation under communication constraints. In Section 4, we prove our main result and provide some interpretation and an example. Finally, Section 5 contains some concluding remarks.

Tools from Dynamical Systems
Notation: By $\mathbb{Z}$, we denote the set of all integers, by $\mathbb{N}$ the set of positive integers, and $\mathbb{N}_0 := \{0\} \cup \mathbb{N}$. All logarithms are taken to the base two. If $M$ is a Riemannian manifold, we write $|\cdot|$ for the induced norm on any tangent space $T_xM$, $x \in M$. The notation $\|\cdot\|$ is reserved for operator norms. We write $\mathrm{cl}\,A$ and $\mathrm{int}\,A$ for the closure and the interior of a set $A$ in a metric space, respectively. Finally, the notation $A \subset B$ ($A$ subset of $B$) does not exclude the case $A = B$.
In this paper, we use several sophisticated results from the theory of dynamical systems, in particular from smooth ergodic theory. In the following, we try to explain these results without going too much into technical details.
Let $T : X \to X$ be a continuous map on a compact metric space $(X, d)$. Via its iterates $T^0 := \mathrm{id}_X$, $T^{n+1} := T \circ T^n$, $n = 0, 1, 2, \ldots$, the map $T$ generates a discrete-time dynamical system on $X$ with associated orbits $\{T^n(x)\}_{n \in \mathbb{N}_0}$, $x \in X$. We call the pair $(X, T)$ a topological dynamical system (TDS).

Entropy and Pressure
Let $(X, T)$ be a TDS. The topological entropy $h_{\mathrm{top}}(T)$ measures the total exponential complexity of the orbit structure of $(X, T)$ in terms of the maximal number of finite-time orbits that are distinguishable w.r.t. a finite resolution. One of several possible formal definitions is as follows. For $n \in \mathbb{N}$ and $\varepsilon > 0$, a set $E \subset X$ is called $(n, \varepsilon, T)$-separated if for any $x, y \in E$ with $x \neq y$, we have:
$$\max_{0 \le i < n} d\big(T^i(x), T^i(y)\big) > \varepsilon.$$
That is, we can distinguish any two points in $E$ at a resolution of $\varepsilon$ by looking at their length-$n$ finite-time orbits. By the compactness of $X$, there is a uniform upper bound on the cardinality of any $(n, \varepsilon, T)$-separated set. Writing $r(n, \varepsilon, T)$ for the maximal possible cardinality, we define:
$$h_{\mathrm{top}}(T) := \lim_{\varepsilon \downarrow 0} \limsup_{n \to \infty} \frac{1}{n} \log r(n, \varepsilon, T).$$
This definition is due to Bowen [15] and (independently) Dinaburg [16]. However, it should be noted that the first definition of topological entropy, given by Adler, Konheim, and McAndrew [17], was in terms of open covers of $X$ and was modeled in strict analogy to the metric (= measure-theoretic) entropy defined earlier by Kolmogorov and Sinai [18,19].
To define metric entropy, one additionally needs a Borel probability measure $\mu$ on $X$ that is preserved by $T$ in the sense that $\mu(A) = \mu(T^{-1}(A))$ for every Borel set $A$. By the theorem of Krylov-Bogolyubov, every continuous map on a compact metric space admits at least one such measure, cf. [20], Theorem 4.1.1. We write $\mathcal{M}_T$ for the set of all $T$-invariant Borel probability measures. For any finite measurable partition $\mathcal{P}$ of $X$, we define the entropy of $T$ on $\mathcal{P}$ by:
$$h_\mu(T; \mathcal{P}) := \lim_{n \to \infty} \frac{1}{n} H_\mu\Big(\bigvee_{i=0}^{n-1} T^{-i}\mathcal{P}\Big).$$
Here, $\bigvee$ denotes the join operation. That is, $\bigvee_{i=0}^{n-1} T^{-i}\mathcal{P}$ is the partition of $X$ whose elements are all intersections of the form $P_0 \cap T^{-1}(P_1) \cap \ldots \cap T^{-n+1}(P_{n-1})$ with $P_i \in \mathcal{P}$. Moreover, $H_\mu(\cdot)$ denotes the Shannon entropy of a partition, i.e., $H_\mu(\mathcal{Q}) = -\sum_{Q \in \mathcal{Q}} \mu(Q) \log \mu(Q)$ for any finite partition $\mathcal{Q}$. The metric entropy of $T$ w.r.t. $\mu$ is then defined by:
$$h_\mu(T) := \sup_{\mathcal{P}} h_\mu(T; \mathcal{P}),$$
the supremum taken over all finite measurable partitions $\mathcal{P}$ of $X$ (replacing measurable partitions with open covers and Shannon entropy with the logarithm of the cardinality of a minimal finite subcover, the same construction yields the topological entropy as defined in [17]).
To understand the meaning of $h_\mu$, note that $H_\mu(\mathcal{Q})$ is the average amount of uncertainty in predicting the element of $\mathcal{Q}$ to which a randomly chosen point belongs. Hence, $h_\mu(T)$ measures the average uncertainty per iteration in guessing the sequence of partition elements visited by a typical length-$n$ orbit.
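As a sanity check of the definitions of $H_\mu$ and $h_\mu$, the sketch below assumes the one-sided full shift on two symbols (equivalently, the doubling map with the binary partition) equipped with a hypothetical Bernoulli measure with weights $(p, 1-p)$. The atoms of $\bigvee_{i=0}^{n-1} T^{-i}\mathcal{P}$ are then the length-$n$ cylinders, whose measures are products of the symbol weights, so $H_\mu$ of the join grows linearly in $n$ with slope equal to the binary entropy of $p$. The function names are illustrative only.

```python
import itertools
import math

def shannon_entropy(probs):
    """H = -sum q log2 q over the atoms of a finite partition (0 log 0 := 0)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def join_entropy_bernoulli(p, n):
    """H_mu of the join of T^{-i}P, i = 0, ..., n-1, for a Bernoulli(p, 1-p) measure.

    Each atom of the join is a length-n cylinder [w_0 ... w_{n-1}], and its
    measure is the product of the weights of the symbols w_i.
    """
    weights = (p, 1.0 - p)
    probs = [math.prod(weights[s] for s in word)
             for word in itertools.product((0, 1), repeat=n)]
    return shannon_entropy(probs)

p = 0.3
for n in (1, 2, 4, 8):
    print(n, join_entropy_bernoulli(p, n) / n)  # the same value for every n ...
print(shannon_entropy([p, 1 - p]))              # ... namely the binary entropy of p
```

Maximizing this slope over $p$ gives $p = 1/2$ and the value $1 = \log 2$, which is the topological entropy of the full 2-shift. Restricting attention to Bernoulli measures is a simplification made here; the variational principle takes the supremum over all invariant measures.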
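The variational principle for entropy states that:
$$h_{\mathrm{top}}(T) = \sup_{\mu \in \mathcal{M}_T} h_\mu(T), \tag{1}$$
where the supremum is not necessarily a maximum. This variational principle can be regarded as a quantitative version of the theorem of Krylov-Bogolyubov. Another concept (of which entropy is a special case) used in dynamical systems and inspired by ideas from thermodynamics is pressure. In this context, any continuous function $\varphi : X \to \mathbb{R}$, also called a potential or an observable, gives rise to the metric pressure of $T$ w.r.t. $\varphi$ for a given $\mu \in \mathcal{M}_T$, defined as:
$$P_\mu(T, \varphi) := h_\mu(T) + \int \varphi \, d\mu.$$
To define an associated notion of topological pressure, put $S_n\varphi(x) := \sum_{i=0}^{n-1} \varphi(T^i(x))$ and:
$$P_n(T, \varphi, \varepsilon) := \sup\Big\{ \sum_{x \in E} 2^{S_n\varphi(x)} : E \subset X \text{ is } (n, \varepsilon, T)\text{-separated} \Big\}.$$
Then, the topological pressure of $T$ w.r.t. $\varphi$ is given by:
$$P_{\mathrm{top}}(T, \varphi) := \lim_{\varepsilon \downarrow 0} \limsup_{n \to \infty} \frac{1}{n} \log P_n(T, \varphi, \varepsilon).$$
The associated variational principle, first proven in [21], reads:
$$P_{\mathrm{top}}(T, \varphi) = \sup_{\mu \in \mathcal{M}_T} \Big( h_\mu(T) + \int \varphi \, d\mu \Big), \tag{2}$$
which includes (1) as a special case (simply put $\varphi = 0$).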
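The definitions of topological and metric pressure can be compared in a toy model where everything is explicit. The sketch assumes the one-sided full shift on $\{0, 1\}$ with the metric $d(x, y) = 2^{-\min\{i : x_i \neq y_i\}}$ and a hypothetical potential $\varphi(x) = a$ if $x_0 = 0$ and $\varphi(x) = b$ otherwise. For $\varepsilon \in [1/2, 1)$, a set is $(n, \varepsilon, T)$-separated exactly when its points have pairwise different first $n$ symbols, so the supremum in $P_n$ is attained by one point per cylinder and equals $(2^a + 2^b)^n$ (a finer resolution only adds a bounded number of free symbols and does not change the growth rate). On the metric side, the Bernoulli measure with weight $2^a/(2^a + 2^b)$ on the symbol $0$ attains the supremum in (2).

```python
import itertools
import math

def pressure_by_counting(a, b, n):
    """(1/n) log2 of the sum of 2^{S_n phi} over one point per length-n cylinder.

    phi(x) = a if x_0 == 0 else b, so S_n phi depends only on the first n symbols.
    """
    total = 0.0
    for word in itertools.product((0, 1), repeat=n):
        birkhoff_sum = sum(a if s == 0 else b for s in word)
        total += 2.0 ** birkhoff_sum
    return math.log2(total) / n

def pressure_closed_form(a, b):
    return math.log2(2.0 ** a + 2.0 ** b)

def metric_pressure_bernoulli(p, a, b):
    """h_mu(T) + int phi dmu for the Bernoulli(p, 1-p) measure."""
    h = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return h + p * a + (1 - p) * b

a, b = 0.7, -0.4
best_p = 2.0 ** a / (2.0 ** a + 2.0 ** b)    # weight of the candidate equilibrium state
print(pressure_by_counting(a, b, 12))
print(pressure_closed_form(a, b))
print(metric_pressure_bernoulli(best_p, a, b))
```

All three printed values agree, and any other choice of $p$ gives a smaller metric pressure. With $a = b = 0$, the closed form returns $1 = \log 2$, recovering (1) for this shift.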

Subadditive Cocycles
Let $T : X \to X$ be a map. A subadditive cocycle over $(X, T)$ is a sequence $(f_n)_{n \in \mathbb{N}_0}$ of functions $f_n : X \to \mathbb{R}$ satisfying:
$$f_{n+m}(x) \le f_n(x) + f_m(T^n(x)) \quad \text{for all } x \in X,\ n, m \in \mathbb{N}_0.$$
If equality holds in this relation, we call $(f_n)_{n \in \mathbb{N}_0}$ an additive cocycle over $(X, T)$. If $X$ has the structure of a probability space with a $\sigma$-algebra $\mathcal{F}$ and a probability measure $\mu$ on $\mathcal{F}$, $T$ is measurable, and $\mu$ is $T$-invariant, we speak of a measurable subadditive cocycle provided that all $f_n$ are measurable. In the context of a TDS $(X, T)$, we speak of a continuous subadditive cocycle if all $f_n$ are continuous.
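The most fundamental result about subadditive cocycles is Kingman's subadditive ergodic theorem, cf. [3], Theorem 2.1.4:
Theorem 1. Let $T : X \to X$ be a measure-preserving map on a probability space $(X, \mathcal{F}, \mu)$ and $(f_n)_{n \in \mathbb{N}_0}$ a measurable subadditive cocycle over $(X, T)$ such that each $f_n$ is integrable. Then, the limit:
$$\lim_{n \to \infty} \frac{1}{n} f_n(x)$$
exists for $\mu$-almost every $x \in X$. If, additionally, $\mu$ is ergodic, then the limit is constant with:
$$\lim_{n \to \infty} \frac{1}{n} f_n(x) = \lim_{n \to \infty} \frac{1}{n} \int f_n \, d\mu = \inf_{n \in \mathbb{N}} \frac{1}{n} \int f_n \, d\mu \quad \text{for } \mu\text{-a.e. } x \in X. \tag{3}$$
Observe that the limit on the right-hand side of (3) always exists by Fekete's subadditivity lemma (see [3], Fact 2.1.1), because the sequence $a_n := \int f_n \, d\mu$ is subadditive, i.e., $a_{n+m} \le a_n + a_m$ (integrate the cocycle inequality and use the $T$-invariance of $\mu$). Kingman's theorem can, in particular, be applied if $(X, T)$ is a TDS, $\mu \in \mathcal{M}_T$, and $(f_n)_{n \in \mathbb{N}_0}$ is a continuous subadditive cocycle.
Now, we consider again a TDS $(X, T)$ and a continuous subadditive cocycle $(f_n)_{n \in \mathbb{N}_0}$ over $(X, T)$. We define the extremal growth rate of $(f_n)$ by:
$$\beta\big((f_n)\big) := \inf_{n \in \mathbb{N}} \frac{1}{n} \sup_{x \in X} f_n(x).$$
The following result is well known and can be found in [22], Theorem A.3, for instance:
Lemma 1. Let $(f_n)_{n \in \mathbb{N}_0}$ be a continuous subadditive cocycle over a TDS $(X, T)$. Then:
$$\beta\big((f_n)\big) = \sup_{\mu \in \mathcal{M}_T} \inf_{n \in \mathbb{N}} \frac{1}{n} \int f_n \, d\mu.$$
Here, all infima can be replaced with limits. Moreover, every supremum is attained.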
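Fekete's lemma and the extremal growth rate can be illustrated on the simplest continuous subadditive cocycle: a constant linear map, $f_n(x) := \log \|B^n\|$, with a hypothetical integer matrix $B = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix}$ (determinant $1$, eigenvalues $2 \pm \sqrt{3}$). Submultiplicativity of the operator norm makes $a_n := \sup_x f_n(x) = \log\|B^n\|$ subadditive, so $a_n/n$ converges to its infimum, which here is $\log(2 + \sqrt{3})$, the logarithm of the spectral radius. The 2x2 helpers below are written out by hand to keep the sketch dependency-free.

```python
import math

def mat_mul(m, n):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return ((m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
            (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]))

def mat_pow(m, k):
    result = ((1, 0), (0, 1))
    for _ in range(k):
        result = mat_mul(result, m)
    return result

def spectral_norm(m):
    """Operator norm (largest singular value) of a 2x2 matrix."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d        # trace of m^T m
    det = a * d - b * c                      # det(m^T m) = det(m)^2
    return math.sqrt((s + math.sqrt(s * s - 4 * det * det)) / 2)

B = ((2, 3), (1, 2))                         # hypothetical example, det B = 1
a = {n: math.log2(spectral_norm(mat_pow(B, n))) for n in range(1, 41)}  # a_n = f_n
rate = math.log2(2 + math.sqrt(3))           # log2 of the spectral radius of B

for n in (1, 5, 10, 20, 40):
    print(n, a[n] / n, rate)                 # a_n / n >= rate, and a_n / n -> rate
```

Because $B$ is not symmetric, $a_n/n$ is strictly above the limit for every $n$ and approaches it at speed $O(1/n)$, which is why the infimum in Lemma 1 is in general not attained at any finite $n$.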

Lyapunov Exponents, SRB Measures, and Pesin's Formula
To describe the long-term dynamical behavior of smooth systems, the notion of Lyapunov exponents is crucial. Given a $C^1$-diffeomorphism $T : M \to M$ on a compact Riemannian manifold $M$, the Lyapunov exponent at $x \in M$ in direction $0 \neq v \in T_xM$ is the number:
$$\lambda(x, v) := \lim_{n \to \infty} \frac{1}{n} \log |DT^n(x)v|,$$
provided that the limit exists. Lyapunov exponents measure how fast nearby solutions diverge from each other. The most general result on their existence and their properties is the multiplicative ergodic theorem (MET), also known as Oseledets theorem, cf. [23,24]. We need the following version of the theorem (which is not the most general):
Theorem 2. Let $T : M \to M$ be a $C^1$-diffeomorphism of a compact Riemannian manifold $M$ and $\mu \in \mathcal{M}_T$. Then, there exists a Borel set $\Omega \subset M$ with $\mu(\Omega) = 1$ and $T(\Omega) = \Omega$ such that the following holds: for every $x \in \Omega$, there exist numbers $\lambda_1(x) > \ldots > \lambda_{r(x)}(x)$, and the tangent space at $x$ splits into linear subspaces as:
$$T_xM = E_1(x) \oplus \cdots \oplus E_{r(x)}(x)$$
such that the following properties hold:
(i) $DT(x)E_i(x) = E_i(T(x))$ and $\lim_{n \to \pm\infty} \frac{1}{n} \log |DT^n(x)v| = \lambda_i(x)$ for all $0 \neq v \in E_i(x)$, $1 \le i \le r(x)$.
(ii) The functions $r(\cdot)$, $\dim E_i(\cdot)$, and $\lambda_i(\cdot)$ are measurable and constant along orbits.
(iii) For every $x \in \Omega$, the limit:
$$\Lambda_x := \lim_{n \to \infty} \big(DT^n(x)^* DT^n(x)\big)^{1/(2n)}$$
exists, and the logarithms of the eigenvalues of $\Lambda_x$ are the Lyapunov exponents $\lambda_i(x)$, with multiplicities $\dim E_i(x)$.
Typically, a given map has a huge number of associated invariant measures. To obtain a good description of the global dynamical behavior, one has to select specific invariant measures that determine the behavior of the system on a large set of initial states. In this context, the notion of an SRB measure (Sinai-Ruelle-Bowen measure) comes into play. An SRB measure is a measure with at least one positive Lyapunov exponent almost everywhere, having absolutely continuous conditional measures on unstable manifolds. We are not going to give a technical definition of the latter property. Instead, we state the following celebrated theorem due to Ledrappier and Young [25], which characterizes this property in terms of metric entropy. Here, we use the shortcut:
$$\lambda^+(x) := \sum_{i :\, \lambda_i(x) > 0} \dim E_i(x)\, \lambda_i(x)$$
for the sum of all positive Lyapunov exponents at a point $x \in \Omega$, counted with multiplicities.
Theorem 3. Let $T : M \to M$ be a $C^2$-diffeomorphism of a compact Riemannian manifold $M$ and $\mu \in \mathcal{M}_T$. Then, Pesin's formula:
$$h_\mu(T) = \int \lambda^+ \, d\mu \tag{4}$$
holds if and only if $\mu$ has absolutely continuous conditional measures on unstable manifolds.
Additionally, note that for any $C^1$-diffeomorphism $T$ and any $\mu \in \mathcal{M}_T$, the inequality:
$$h_\mu(T) \le \int \lambda^+ \, d\mu \tag{5}$$
holds, which is known as Ruelle's inequality or Ruelle-Margulis inequality [26] (Formula (4) was first proven by Pesin for smooth invariant measures).
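To make Lyapunov exponents and the quantity $\lambda^+$ concrete, here is a minimal numerical sketch based on the standard QR (Gram-Schmidt) method, which the text itself does not use. We apply it, as an assumption for illustration, to the perturbed cat map $T_A^\varepsilon(x, y) = (2x + y + \varepsilon \sin(2\pi x), x + y) \bmod \mathbb{Z}^2$ from the example at the end of the Results section, with a hypothetical $\varepsilon = 0.05$. The outputs are finite-time estimates along one floating-point (pseudo-)orbit, not the limits themselves; for a two-dimensional map, the sum of the two estimates equals the orbit average of $\log|\det DT|$, which serves as a consistency check, and the larger estimate approximates $\lambda^+$ whenever the smaller one is negative.

```python
import math

EPS = 0.05  # hypothetical perturbation size

def step(x, y, eps=EPS):
    """One iterate of T_A^eps(x, y) = (2x + y + eps sin(2 pi x), x + y) mod 1."""
    return (2 * x + y + eps * math.sin(2 * math.pi * x)) % 1.0, (x + y) % 1.0

def jacobian(x, eps=EPS):
    """DT_A^eps at a point; it depends only on the first coordinate."""
    return ((2 + 2 * math.pi * eps * math.cos(2 * math.pi * x), 1.0), (1.0, 1.0))

def lyapunov_qr(x, y, n, eps=EPS):
    """Finite-time Lyapunov exponents (base 2) along n steps, via Gram-Schmidt (QR).

    Returns (l1, l2, mean of log2|det DT|); for a 2D map l1 + l2 equals the last value.
    """
    q = ((1.0, 0.0), (0.0, 1.0))   # orthonormal frame; its columns are the frame vectors
    log_r = [0.0, 0.0]
    log_det = 0.0
    for _ in range(n):
        (a, b), (c, d) = jacobian(x, eps)
        log_det += math.log2(abs(a * d - b * c))
        # Push both frame vectors forward: m0 = DT q[:, 0], m1 = DT q[:, 1]
        m0 = (a * q[0][0] + b * q[1][0], c * q[0][0] + d * q[1][0])
        m1 = (a * q[0][1] + b * q[1][1], c * q[0][1] + d * q[1][1])
        # Gram-Schmidt: [m0 m1] = [e0 e1] @ [[r00, r01], [0, r11]] with r00, r11 > 0
        r00 = math.hypot(*m0)
        e0 = (m0[0] / r00, m0[1] / r00)
        r01 = e0[0] * m1[0] + e0[1] * m1[1]
        u = (m1[0] - r01 * e0[0], m1[1] - r01 * e0[1])
        r11 = math.hypot(*u)
        e1 = (u[0] / r11, u[1] / r11)
        q = ((e0[0], e1[0]), (e0[1], e1[1]))
        log_r[0] += math.log2(r00)
        log_r[1] += math.log2(r11)
        x, y = step(x, y, eps)
    return log_r[0] / n, log_r[1] / n, log_det / n

l1, l2, mean_log_det = lyapunov_qr(0.1234, 0.5678, 10_000)
print(l1, l2, l1 + l2, mean_log_det)
```

Setting `eps=0.0` recovers the unperturbed cat map, for which the first estimate approaches $\log((3 + \sqrt{5})/2)$.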

Anosov Diffeomorphisms
One of the simplest classes of smooth dynamical systems with complicated dynamical behavior is the class of Anosov diffeomorphisms. In this paper, we use these systems for two reasons. First, they have positive topological entropy, and second, they are very well understood and there are many tools available to describe their properties.
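Let $M$ be a compact Riemannian manifold. A $C^1$-diffeomorphism $T : M \to M$ is called an Anosov diffeomorphism if there exists a splitting:
$$T_xM = E^s_x \oplus E^u_x, \quad x \in M,$$
into linear subspaces such that the following conditions are satisfied:
(A1) $DT(x)E^s_x = E^s_{T(x)}$ and $DT(x)E^u_x = E^u_{T(x)}$ for all $x \in M$.
(A2) There are constants $c \ge 1$ and $\lambda \in (0, 1)$, so that, for all $x \in M$ and $n \in \mathbb{N}_0$,
$$|DT^n(x)v| \le c\lambda^n|v| \ \text{ for all } v \in E^s_x, \qquad |DT^{-n}(x)v| \le c\lambda^n|v| \ \text{ for all } v \in E^u_x.$$
From (A1) and (A2), it automatically follows that $E^s_x$ and $E^u_x$ vary continuously with $x$, cf. [20], Proposition 6.4.4. The existence of a splitting as above is also known as uniform hyperbolicity.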
The simplest examples of Anosov diffeomorphisms are hyperbolic linear torus automorphisms, i.e., maps on the $n$-dimensional torus $\mathbb{T}^n = \mathbb{R}^n/\mathbb{Z}^n$ of the form:
$$T_A : \mathbb{T}^n \to \mathbb{T}^n, \quad T_A(x + \mathbb{Z}^n) := Ax + \mathbb{Z}^n,$$
where $A \in \mathbb{Z}^{n \times n}$ is an integer matrix satisfying $|\det A| = 1$ and $|\lambda| \neq 1$ for all eigenvalues $\lambda$ of $A$. Observe that the assumption $|\det A| = 1$ guarantees that $T_A$ is invertible with inverse $T_A^{-1} = T_{A^{-1}}$ (because $A^{-1}$ also has integer entries) and at the same time implies that $T_A$ is volume-preserving. That is, the normalized Lebesgue measure on $\mathbb{T}^n$ is an element of $\mathcal{M}_{T_A}$. The assumption on the eigenvalues of $A$, together with the fact that the derivative $DT_A(x)$ at any point $x \in \mathbb{T}^n$ can be identified with $A$ itself, implies the Anosov properties (A1) and (A2).
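For $n = 2$, the two conditions on $A$ reduce to elementary tests on the trace and determinant, since the eigenvalues solve $z^2 - \operatorname{tr}(A) z + \det A = 0$. The helper below is a hypothetical sketch restricted to the two-torus: for $\det A = 1$, the eigenvalues avoid the unit circle exactly when $|\operatorname{tr} A| > 2$; for $\det A = -1$, they are real with product $-1$ and one of them equals $\pm 1$ exactly when $\operatorname{tr} A = 0$.

```python
def is_hyperbolic_toral_automorphism(a, b, c, d):
    """Check that A = [[a, b], [c, d]] in Z^{2x2} defines a hyperbolic automorphism of T^2.

    Conditions from the text: |det A| = 1 and no eigenvalue of modulus 1.
    """
    det = a * d - b * c
    tr = a + d
    if abs(det) != 1:
        return False          # T_A would not be invertible on the torus
    if det == 1:
        # |tr| < 2: complex eigenvalues of modulus 1; |tr| == 2: eigenvalues +-1.
        return abs(tr) > 2
    # det == -1: real eigenvalues with product -1; +-1 is an eigenvalue iff tr == 0.
    return tr != 0

print(is_hyperbolic_toral_automorphism(2, 1, 1, 1))  # Arnold's cat map
print(is_hyperbolic_toral_automorphism(1, 1, 0, 1))  # a shear: eigenvalue 1, not hyperbolic
```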
It is well known that Anosov diffeomorphisms are structurally stable, i.e., any sufficiently small $C^1$-perturbation $T_\varepsilon$ of an Anosov diffeomorphism $T : M \to M$ is also an Anosov diffeomorphism, which is topologically conjugate to $T$, see [20], Proposition 6.4.6 and Corollary 18.2.2. That is, there exists a homeomorphism $h : M \to M$, so that:
$$h \circ T_\varepsilon = T \circ h.$$
If we assume that $T$ is an arbitrary Anosov diffeomorphism of the torus, the existence of a unique entropy-maximizing measure $\mu$ follows. That is, $\mu$ is the unique element of $\mathcal{M}_T$ satisfying:
$$h_\mu(T) = h_{\mathrm{top}}(T).$$
This follows from a combination of results that can be found in Katok and Hasselblatt [20], namely Theorem 20.3.7, Proposition 18.6.5, Theorem 18.3.9, and Corollary 6.4.10. The entropy-maximizing measure $\mu$ is also known as the Bowen measure.
In this context, the notion of topological mixing is also important. An Anosov diffeomorphism (or, more generally, a continuous map) $T : M \to M$ is called topologically mixing if for any two nonempty open sets $A, B \subset M$, there exists an integer $N$ such that $T^n(A) \cap B \neq \emptyset$ for all $n \ge N$. All Anosov diffeomorphisms on $\mathbb{T}^n$ are topologically mixing ([20], Proposition 18.6.5).
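For hyperbolic toral automorphisms, the abundance of periodic points can be made quantitative with the standard fact (not stated in the text) that $T_A^n$ has exactly $|\det(A^n - I)|$ fixed points. The sketch compares this formula with a brute-force count: every fixed point satisfies $(A^n - I)x \in \mathbb{Z}^2$, so by Cramer's rule it lies on the grid $\frac{1}{D}\mathbb{Z}^2$ with $D = |\det(A^n - I)|$. The function names are hypothetical.

```python
import math

def mat_mul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_pow(m, k):
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = mat_mul(result, m)
    return result

def fixed_point_count_formula(A, n):
    """Number of fixed points of T_A^n, i.e. |det(A^n - I)|."""
    (p, q), (r, s) = mat_pow(A, n)
    return abs((p - 1) * (s - 1) - q * r)

def fixed_point_count_bruteforce(A, n):
    """Count k in {0,...,D-1}^2 with (A^n - I) k = 0 mod D, i.e. x = k/D fixed by T_A^n."""
    (p, q), (r, s) = mat_pow(A, n)
    D = abs((p - 1) * (s - 1) - q * r)
    count = 0
    for k1 in range(D):
        for k2 in range(D):
            if ((p - 1) * k1 + q * k2) % D == 0 and (r * k1 + (s - 1) * k2) % D == 0:
                count += 1
    return count

A = [[2, 1], [1, 1]]  # Arnold's cat map
for n in range(1, 9):
    N = fixed_point_count_formula(A, n)
    print(n, N, math.log2(N) / n)  # growth rate approaches log2((3 + sqrt(5)) / 2)
```

The exponential growth rate of the number of periodic points matches $\log((3 + \sqrt{5})/2)$, the topological entropy of the cat map.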

State Estimation and Restoration Entropy
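The notion of restoration entropy was introduced in [10] for systems given by ODEs on $\mathbb{R}^n$. However, it is immediately clear from the definition that restoration entropy can be defined for any continuous map on a compact metric space as follows. Let $T : X \to X$ be a continuous map on a metric space $(X, d)$ and $K \subset X$ a compact set with $T(K) \subset K$. For every $x \in X$, $n \in \mathbb{N}$, and $\varepsilon > 0$, let $p(n, x, \varepsilon)$ denote the smallest number of $\varepsilon$-balls needed to cover the image $T^n(B_\varepsilon(x) \cap K)$. If the map is not clear from the context, we also write $p(n, x, \varepsilon; T)$. Then:
$$h_{\mathrm{res}}(T|_K) := \lim_{n \to \infty} \frac{1}{n} \limsup_{\varepsilon \downarrow 0} \sup_{x \in X} \log p(n, x, \varepsilon).$$
The existence of the limit in $n$ follows from the subadditivity of the sequence $a_n := \limsup_{\varepsilon \downarrow 0} \sup_{x \in X} \log p(n, x, \varepsilon)$ (using Fekete's lemma). If we assume that $T$ is a $C^1$-diffeomorphism of a compact Riemannian manifold, the numbers $p(n, x, \varepsilon)$ can be estimated in terms of the unstable singular values of $DT^n(x)$. This is related to the simple fact that the image of a ball under a linear map (in our case, the local linear approximation $DT^n(x)$ to $T^n$) is an ellipsoid with semi-axes of lengths proportional to the singular values. This leads to the following result, proven in [10], Theorem 11, for continuous-time systems. The proof carries over to discrete-time systems on Riemannian manifolds without any problem.
Theorem 4. Let $T : M \to M$ be a $C^1$-diffeomorphism of a compact Riemannian manifold $M$ with $d := \dim M$, and let $K \subset M$ be a compact set with $T(K) \subset K$. Writing $\alpha_1(n, x) \ge \ldots \ge \alpha_d(n, x)$ for the singular values of $DT^n(x)$, we have:
$$h_{\mathrm{res}}(T|_K) = \lim_{n \to \infty} \frac{1}{n} \max_{x \in K} \sum_{i=1}^{d} \max\{0, \log \alpha_i(n, x)\}.$$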
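For a linear torus map $T_A$, the derivative $DT_A^n(x) = A^n$ does not depend on $x$, so the singular-value estimate described in this paragraph reduces to a single sequence. The sketch below (pure Python, 2x2 only) evaluates $\frac{1}{n} \sum_i \max\{0, \log \alpha_i(n)\}$, with $\alpha_i(n)$ the singular values of $A^n$, for the cat map and for a hypothetical non-symmetric matrix $B$, and compares the result with the logarithm of the unstable eigenvalue.

```python
import math

def mat_mul(m, n):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return ((m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
            (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]))

def singular_values_2x2(m):
    """Singular values (largest first) of a 2x2 matrix."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    sigma_max = math.sqrt((s + math.sqrt(s * s - 4 * det * det)) / 2)
    return sigma_max, det / sigma_max        # sigma_min = |det| / sigma_max avoids cancellation

def restoration_rate(A, n):
    """(1/n) * sum_i max(0, log2 alpha_i(n)) with alpha_i(n) the singular values of A^n."""
    P = ((1, 0), (0, 1))
    for _ in range(n):
        P = mat_mul(P, A)
    return sum(max(0.0, math.log2(s)) for s in singular_values_2x2(P)) / n

cat = ((2, 1), (1, 1))
B = ((2, 3), (1, 2))                         # hypothetical non-symmetric example
for n in (1, 5, 20):
    print(n, restoration_rate(cat, n), restoration_rate(B, n))
print(math.log2((3 + math.sqrt(5)) / 2), math.log2(2 + math.sqrt(3)))  # unstable eigenvalues
```

Since the cat-map matrix is symmetric and positive definite, its largest singular value at step $n$ equals the $n$-th power of its unstable eigenvalue, so the rate is exact already at $n = 1$; for $B$ the rate only converges as $n \to \infty$.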
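For the analysis of $h_{\mathrm{res}}$ based on the above formula, the following observations are crucial:
• For all $x \in M$ and $n \in \mathbb{N}$,
$$\sum_{i=1}^{d} \max\{0, \log \alpha_i(n, x)\} = \log \max_{0 \le k \le d} \alpha_1(n, x) \cdots \alpha_k(n, x) = \log \|DT^n(x)^{\wedge}\|,$$
where the empty product for $k = 0$ equals $1$, and $DT^n(x)^{\wedge}$ denotes the linear map induced by $DT^n(x)$ between the full exterior algebras of the tangent spaces $T_xM$ and $T_{T^n(x)}M$, respectively; see [27], Chapter I, Proposition 7.4.2.
• The sequence $f_n(x) := \log \|DT^n(x)^{\wedge}\|$, $f_n : M \to \mathbb{R}$, is a continuous subadditive cocycle over $(K, T|_K)$, since, by the chain rule and the multiplicativity of $(\cdot)^{\wedge}$:
$$\|DT^{n+m}(x)^{\wedge}\| = \|DT^m(T^n(x))^{\wedge}\, DT^n(x)^{\wedge}\| \le \|DT^m(T^n(x))^{\wedge}\| \, \|DT^n(x)^{\wedge}\|.$$
Alternatively, this follows from Horn's inequality for singular values; see [27], Chapter I, Proposition 2.3.1.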
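Both observations above can be checked numerically. The sketch below uses, as a hypothetical test case, the perturbed cat map $T_A^\varepsilon$ from the example at the end of the Results section with $\varepsilon = 0.05$. It computes $f_n(x) = \log\|DT^n(x)^{\wedge}\|$ as the maximum over $k$ of the partial sums $\sum_{i \le k} \log \alpha_i(n, x)$, confirms that this coincides with $\sum_i \max\{0, \log \alpha_i(n, x)\}$, and tests the cocycle inequality $f_{n+m}(x) \le f_n(x) + f_m(T^n(x))$ at a few sample points (up to floating-point rounding).

```python
import math

EPS = 0.05  # hypothetical perturbation size

def step(x, y, eps=EPS):
    """One iterate of T_A^eps(x, y) = (2x + y + eps sin(2 pi x), x + y) mod 1."""
    return (2 * x + y + eps * math.sin(2 * math.pi * x)) % 1.0, (x + y) % 1.0

def jacobian(x, eps=EPS):
    return ((2 + 2 * math.pi * eps * math.cos(2 * math.pi * x), 1.0), (1.0, 1.0))

def mat_mul(m, n):
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def derivative(x, y, n):
    """DT^n(x, y) = DT(T^{n-1}(p)) ... DT(p), computed with the chain rule."""
    D = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        D = mat_mul(jacobian(x), D)
        x, y = step(x, y)
    return D

def log_singular_values(m):
    """log2 of the singular values (largest first) of a 2x2 matrix."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    smax = math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)
    return math.log2(smax), math.log2(det / smax)

def f_wedge(x, y, n):
    """log2 ||DT^n(x)^wedge|| = max over k of the partial sums of log2 singular values."""
    l1, l2 = log_singular_values(derivative(x, y, n))
    return max(0.0, l1, l1 + l2)             # k = 0, 1, 2

def f_positive(x, y, n):
    """sum_i max(0, log2 alpha_i(n, x))."""
    return sum(max(0.0, l) for l in log_singular_values(derivative(x, y, n)))

def orbit_point(x, y, n):
    """T^n(x, y), iterated with the same floating-point steps as derivative()."""
    for _ in range(n):
        x, y = step(x, y)
    return x, y

x, y, n, m = 0.3, 0.7, 6, 4
print(f_wedge(x, y, n), f_positive(x, y, n))                               # equal
print(f_wedge(x, y, n + m), f_wedge(x, y, n) + f_wedge(*orbit_point(x, y, n), m))  # left <= right
```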
In the following, we explain the operational meaning of the quantity $h_{\mathrm{res}}(T|_K)$. Consider the dynamical system given by:
$$x_{t+1} = T(x_t), \quad x_0 \in K, \quad t \in \mathbb{N}_0. \tag{6}$$
Suppose that a sensor, fully observing the state $x_t$, sends its data to an encoder. At the sampling times $t = 0, 1, 2, \ldots$, the encoder sends a signal $e_t$ through a noise-free discrete channel to a decoder (without transmission delay). The decoder acts as an observer of the system, trying to reconstruct the state from the received data. We write $\hat{x}_t$ for the estimate generated by the observer at time $t$. Moreover, we assume that we start with an initial estimate $\hat{x}_0 \in K$ of a specified accuracy.
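With $\mathfrak{M}$ denoting the coding alphabet, the encoder and the observer are described by mappings:
$$e_t = \mathcal{E}(t, x_0, \ldots, x_t, \hat{x}_0, \delta) \in \mathfrak{M}, \qquad \hat{x}_t = \mathcal{O}(t, e_0, \ldots, e_t, \hat{x}_0, \delta).$$
The argument $\delta$ corresponds to the initial error at time zero, i.e., $d(x_0, \hat{x}_0) \le \delta$. In particular, we assume that both the encoder and the observer are given the data $\hat{x}_0$ and $\delta$. We assume that the channel can transmit at least $b_-(r)$ and at most $b_+(r)$ bits in any time interval of length $r$. The capacity of the channel is then defined by:
$$C := \lim_{r \to \infty} \frac{b_-(r)}{r} = \lim_{r \to \infty} \frac{b_+(r)}{r},$$
assuming that these limits exist and coincide. We consider the following two observation objectives:
(O1) The observer observes the system with exactness $\varepsilon > 0$ if there exists $\delta > 0$ such that $d(x_t, \hat{x}_t) \le \varepsilon$ for all $t \in \mathbb{N}_0$ whenever $x_0 \in K$ and $d(x_0, \hat{x}_0) \le \delta$.
(O2) The observer regularly observes the system if there exist $G, \delta_* > 0$, so that for all $\delta \in (0, \delta_*)$ and all $x_0 \in K$ with $d(x_0, \hat{x}_0) \le \delta$, we have $d(x_t, \hat{x}_t) \le G\delta$ for all $t \in \mathbb{N}_0$.
We say that the system is:
• observable on $K$ over a channel of capacity $C$ if for every $\varepsilon > 0$, an observer exists that observes the system with exactness $\varepsilon$ over this channel;
• regularly observable on $K$ over a channel of capacity $C$ if there exists an observer that regularly observes the system over this channel.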
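To illustrate objective (O2), consider a deliberately simple case that the text does not treat: the doubling map $T(x) = 2x \bmod 1$ on the circle $\mathbb{R}/\mathbb{Z}$ with the arc-length metric, together with a hypothetical one-bit-per-step scheme. If $d(x_t, \hat{x}_t) \le \delta < 1/4$, then the error after one step is at most $2\delta$. The encoder sends one bit saying on which side of the prediction $2\hat{x}_t$ the true state $x_{t+1}$ lies (it can reproduce $\hat{x}_t$ itself, since it knows $\hat{x}_0$, $\delta$, and the bits it has sent), and the observer shifts its prediction by $\pm\delta$. This restores $d(x_{t+1}, \hat{x}_{t+1}) \le \delta$, so the system is regularly observed with $G = 1$ at one bit per time step. Exact rational arithmetic is used because the error dynamics is itself expanding, so floating-point rounding could be amplified.

```python
import random
from fractions import Fraction

HALF = Fraction(1, 2)

def wrap(u):
    """Signed representative of u mod 1 in [-1/2, 1/2)."""
    return (u + HALF) % 1 - HALF

def circle_dist(u, v):
    """Arc-length distance on the circle R/Z."""
    return abs(wrap(u - v))

def encoder(x_next, x_hat):
    """Send 1 bit: does T(x_t) lie ahead of (or at) the prediction 2 * x_hat?"""
    return 1 if wrap(x_next - 2 * x_hat) >= 0 else 0

def observer(x_hat, bit, delta):
    """Move the prediction 2 * x_hat by +delta or -delta according to the bit."""
    return (2 * x_hat + (delta if bit else -delta)) % 1

def simulate(x0, x_hat0, delta, steps):
    """Run the scheme and return the largest estimation error observed."""
    if not 0 < delta < Fraction(1, 4):
        raise ValueError("the scheme assumes 0 < delta < 1/4")
    x, x_hat = x0, x_hat0
    worst = circle_dist(x, x_hat)
    for _ in range(steps):
        x = (2 * x) % 1                       # true state: x_{t+1} = T(x_t)
        bit = encoder(x, x_hat)               # one bit per time step over the channel
        x_hat = observer(x_hat, bit, delta)   # observer update
        worst = max(worst, circle_dist(x, x_hat))
    return worst

random.seed(1)
q = 10007                                     # odd denominator: the orbit of x0 is periodic
x0 = Fraction(random.randrange(q), q)
delta = Fraction(1, 100)
print(simulate(x0, (x0 + delta / 2) % 1, delta, 500) <= delta)  # True
```

Whether one bit per step is close to optimal for this map is not claimed here; the sketch only shows what "the error never exceeds a fixed multiple of the initial error" means operationally.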
Theorem 5. The smallest channel capacity $C_0$, so that System (6) is:
• observable on $K$ over every channel of capacity $C > C_0$, is given by $C_0 = h_{\mathrm{top}}(T|_K)$;
• regularly observable on $K$ over every channel of capacity $C > C_0$, is given by $C_0 = h_{\mathrm{res}}(T|_K)$.
Since regular observability implies observability, it is clear that:
$$h_{\mathrm{top}}(T|_K) \le h_{\mathrm{res}}(T|_K).$$
As already pointed out in the Introduction, the quantity $h_{\mathrm{top}}(\cdot)$ is highly discontinuous w.r.t. the dynamical system. Moreover, the corresponding data-rate theorem has the disadvantage that the final error $\varepsilon$ may be much larger than the initial error $\delta$, which cannot happen in the case of regular observability. From Theorem 4 in combination with Lemma 1, one sees that in the smooth case, $h_{\mathrm{res}}$ is an infimum over functions that depend continuously on $T$ in the $C^1$-topology. This implies at least upper semicontinuity. Hence, we can expect that coding and estimation strategies based on restoration entropy enjoy better properties than those based on topological entropy.

Results
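Before we present our main result, we prove two lemmas, which are of independent interest.
Lemma 2. Let $T : M \to M$ be a $C^2$-diffeomorphism on a compact Riemannian manifold $M$. Then, for any $\mu \in \mathcal{M}_T$, we have:
$$\lim_{n \to \infty} \frac{1}{n} \int \log \|DT^n(x)^{\wedge}\| \, d\mu(x) = \int \lambda^+ \, d\mu.$$
Proof. Let $d = \dim M$. First observe that we have the identity:
$$\|DT^n(x)^{\wedge}\| = \max_{0 \le k \le d} \alpha_1(n, x) \cdots \alpha_k(n, x),$$
where $\alpha_1(n, x) \ge \ldots \ge \alpha_d(n, x)$ are the singular values of $DT^n(x)$ (and the empty product for $k = 0$ equals $1$), see [27], Chapter I, Proposition 7.4.2. Hence,
$$\frac{1}{n} \log \|DT^n(x)^{\wedge}\| = \max_{0 \le k \le d} \frac{1}{n} \sum_{i=1}^{k} \log \alpha_i(n, x).$$
The maximum over $k$ is clearly attained when $k$ is the maximal number such that $\alpha_i(n, x) > 1$ for all $1 \le i \le k$. Hence,
$$\frac{1}{n} \log \|DT^n(x)^{\wedge}\| = \frac{1}{n} \sum_{i=1}^{d} \max\{0, \log \alpha_i(n, x)\}.$$
The numbers $\alpha_i(n, x)$ are the eigenvalues of $A_n(x) := (DT^n(x)^* DT^n(x))^{1/2}$. Theorem 2 states that $A_n(x)^{1/n} \to \Lambda_x$ for $\mu$-almost every $x \in M$ and that the logarithms of the eigenvalues of $\Lambda_x$ are the Lyapunov exponents at $x$. Since eigenvalues depend continuously on the matrix, it follows that:
$$\lim_{n \to \infty} \frac{1}{n} \log \alpha_i(n, x) = \chi_i(x), \quad 1 \le i \le d,$$
where $\chi_1(x) \ge \ldots \ge \chi_d(x)$ denote the Lyapunov exponents at $x$, listed with multiplicities, and consequently
$$\lim_{n \to \infty} \frac{1}{n} \log \|DT^n(x)^{\wedge}\| = \sum_{i=1}^{d} \max\{0, \chi_i(x)\} = \lambda^+(x) \quad \text{for } \mu\text{-a.e. } x \in M.$$
Since $0 \le \frac{1}{n} \log \|DT^n(x)^{\wedge}\| \le d \max\{0, \log \max_{y \in M} \|DT(y)\|\}$ for all $n$ and $x$, applying the theorem of dominated convergence then yields the result.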
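Lemma 3. Let $T : M \to M$ be a $C^2$-diffeomorphism on a compact Riemannian manifold $M$ such that $h_{\mathrm{top}}(T) = h_{\mathrm{res}}(T)$, and let $\mu^* \in \mathcal{M}_T$ be an entropy-maximizing measure, i.e., $h_{\mu^*}(T) = h_{\mathrm{top}}(T)$. Then, $\mu^*$ satisfies Pesin's formula $h_{\mu^*}(T) = \int \lambda^+ \, d\mu^*$.
Proof. Assume to the contrary that $h_{\mu^*}(T) < \int \lambda^+ \, d\mu^*$ (the reverse strict inequality is excluded by Ruelle's inequality (5)). Then, Lemma 2 implies:
$$h_{\mathrm{top}}(T) = h_{\mu^*}(T) < \int \lambda^+ \, d\mu^* = \lim_{n \to \infty} \frac{1}{n} \int \log \|DT^n(x)^{\wedge}\| \, d\mu^*(x).$$
According to Theorem 4 and the subsequent observation, an application of Lemma 1 yields:
$$h_{\mathrm{res}}(T) = \sup_{\mu \in \mathcal{M}_T} \lim_{n \to \infty} \frac{1}{n} \int \log \|DT^n(x)^{\wedge}\| \, d\mu(x) \ge \lim_{n \to \infty} \frac{1}{n} \int \log \|DT^n(x)^{\wedge}\| \, d\mu^*(x).$$
Combining these observations gives $h_{\mathrm{top}}(T) < h_{\mathrm{res}}(T)$, in contradiction to our assumption. Now, we are in a position to state our main result.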
Theorem 6. Let $T : M \to M$ be a topologically mixing $C^2$-Anosov diffeomorphism on a compact Riemannian manifold $M$ such that $h_{\mathrm{top}}(T) = h_{\mathrm{res}}(T)$. Then, the unique entropy-maximizing measure $\mu^* \in \mathcal{M}_T$ is an SRB measure. Moreover, the function:
$$\mathcal{M}_T \to \mathbb{R}, \quad \mu \mapsto \int \lambda^+ \, d\mu,$$
is constant.
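Proof. First note that the existence and uniqueness of an entropy-maximizing measure $\mu^*$ follows from [20], Theorem 20.3.7, Theorem 18.3.9, and Corollary 6.4.10. Here, the assumption that $T$ is topologically mixing is crucial. By the preceding lemma combined with Theorem 3, we already know that $\mu^*$ has absolutely continuous conditional measures on unstable manifolds. Since an Anosov diffeomorphism has positive Lyapunov exponents everywhere (where they exist), attained in all directions of the unstable subspace $E^u_x$, it follows that $\mu^*$ is an SRB measure. Now, let $\mu \in \mathcal{M}_T$ be chosen arbitrarily, and write $J^uT(x) := \log |\det DT(x)|_{E^u_x}|$. By the chain rule and (A1), $\log |\det DT^n(x)|_{E^u_x}| = \sum_{i=0}^{n-1} J^uT(T^i(x))$. Due to the invariance of $\mu$, we therefore have:
$$\int \log |\det DT^n(x)|_{E^u_x}| \, d\mu(x) = n \int J^uT \, d\mu$$
for every $n \in \mathbb{N}$, implying:
$$\int \lambda^+ \, d\mu = \lim_{n \to \infty} \frac{1}{n} \int \log |\det DT^n(x)|_{E^u_x}| \, d\mu(x) = \int J^uT \, d\mu,$$
where we use Kingman's subadditive ergodic theorem, applied to the continuous additive cocycle $f_n(x) := \log |\det DT^n(x)|_{E^u_x}|$ ($n \in \mathbb{N}_0$), and the theorem of dominated convergence. Observe that the function $J^uT$ is continuous (using the fact that $x \mapsto E^u_x$ is continuous). Hence, we can consider the affine function:
$$\alpha_\mu : \mathbb{R} \to \mathbb{R}, \quad \alpha_\mu(t) := h_\mu(T) - t \int \lambda^+ \, d\mu = h_\mu(T) + \int (-tJ^uT) \, d\mu.$$
Writing $P_{\mathrm{top}}(\varphi)$ for $P_{\mathrm{top}}(T, \varphi)$, the variational principle (2) for pressure tells us that:
$$P_{\mathrm{top}}(-tJ^uT) = \sup_{\mu \in \mathcal{M}_T} \alpha_\mu(t) \quad \text{for all } t \in \mathbb{R}. \tag{7}$$
Hence, $t \mapsto P_{\mathrm{top}}(-tJ^uT)$, as the supremum over affine functions, is a convex function.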
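Using that $\mu^*$ is the entropy-maximizing measure and Theorem 3, respectively, we obtain:
$$\alpha_{\mu^*}(0) = h_{\mu^*}(T) = h_{\mathrm{top}}(T), \qquad \alpha_{\mu^*}(1) = h_{\mu^*}(T) - \int \lambda^+ \, d\mu^* = 0.$$
On the other hand, also:
$$P_{\mathrm{top}}(-0 \cdot J^uT) = h_{\mathrm{top}}(T), \qquad P_{\mathrm{top}}(-1 \cdot J^uT) = 0.$$
The second identity here follows from the fact that $P_{\mathrm{top}}(-1 \cdot J^uT) = \sup_{\mu \in \mathcal{M}_T} \big(h_\mu(T) - \int \lambda^+ \, d\mu\big)$, where $h_\mu(T) \le \int \lambda^+ \, d\mu$ for every $\mu$ by Ruelle's inequality (5), and the value $0$ is attained at $\mu^*$. Hence,
$$\alpha_{\mu^*}(0) = P_{\mathrm{top}}(-0 \cdot J^uT) \quad \text{and} \quad \alpha_{\mu^*}(1) = P_{\mathrm{top}}(-1 \cdot J^uT).$$
By convexity of $t \mapsto P_{\mathrm{top}}(-tJ^uT)$ (a convex function lies below its chords) and (7) (which gives $P_{\mathrm{top}}(-tJ^uT) \ge \alpha_{\mu^*}(t)$), this implies:
$$P_{\mathrm{top}}(-tJ^uT) = \alpha_{\mu^*}(t) \quad \text{for all } t \in [0, 1].$$
From (7), it now follows that all of the maps $\alpha_\mu$ have the same slope, i.e., $\int \lambda^+ \, d\mu$ is independent of $\mu$.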
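The convexity argument in the proof above can be visualized in a symbolic toy model. This model is not an Anosov diffeomorphism and is used here only as an assumption for illustration: the full 2-shift with a hypothetical "expansion rate" $J(x) = j_0$ or $j_1$ depending on the first symbol, playing the role of $J^uT$. For this shift, $P_{\mathrm{top}}(-tJ) = \log(2^{-tj_0} + 2^{-tj_1})$, each Bernoulli$(p)$ measure contributes the affine function $\alpha_p(t) = h(p) - t(pj_0 + (1-p)j_1)$, and the pressure is the upper envelope of these lines. The slopes $-(pj_0 + (1-p)j_1)$ all coincide exactly when $j_0 = j_1$, which is also exactly when $t \mapsto P_{\mathrm{top}}(-tJ)$ is affine.

```python
import math

def pressure(t, j0, j1):
    """P_top(-t J) for J depending only on the first symbol of the full 2-shift."""
    return math.log2(2.0 ** (-t * j0) + 2.0 ** (-t * j1))

def alpha(p, t, j0, j1):
    """Affine function t -> h_mu - t * int J dmu for the Bernoulli(p, 1-p) measure."""
    h = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return h - t * (p * j0 + (1 - p) * j1)

def is_affine_on_grid(f, ts, tol=1e-9):
    """Check that the second differences of f vanish on an equally spaced grid."""
    vals = [f(t) for t in ts]
    return all(abs(vals[i - 1] - 2 * vals[i] + vals[i + 1]) < tol
               for i in range(1, len(vals) - 1))

ts = [k / 10 for k in range(21)]             # t in [0, 2]
for j0, j1 in ((1.0, 1.0), (1.5, 0.5)):
    print(j0, j1, is_affine_on_grid(lambda t: pressure(t, j0, j1), ts))
```

With $j_0 = j_1 = 1$, the pressure is the line $1 - t$, and every Bernoulli line touching it has slope $-1$; with $j_0 \neq j_1$, the envelope is strictly convex and different measures realize it at different values of $t$.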
The above theorem shows that the equality $h_{\mathrm{top}}(T) = h_{\mathrm{res}}(T)$ is a very restrictive condition. This can be seen as follows. Any topologically mixing Anosov diffeomorphism has an abundance of periodic points; indeed, the set of periodic points is dense in $M$, see [20], Corollary 6.4.19. If we consider a periodic point $p \in M$ of period $n_p \in \mathbb{N}$, we can consider the invariant measure $\mu_p$ given by:
$$\mu_p := \frac{1}{n_p} \sum_{i=0}^{n_p - 1} \delta_{T^i(p)},$$
with $\delta_{(\cdot)}$ being the Dirac measure at a point. The above theorem implies that, under $h_{\mathrm{top}}(T) = h_{\mathrm{res}}(T)$, the number:
$$\gamma(p) := \int \lambda^+ \, d\mu_p = \frac{1}{n_p} \log \big|\det DT^{n_p}(p)|_{E^u_p}\big|$$
is independent of the periodic point $p$ chosen. On the other hand, we know that every sufficiently small $C^2$-perturbation of $T$ yields another $C^2$-Anosov diffeomorphism, topologically conjugate to $T$, hence also topologically mixing. If this perturbation is only performed in a small vicinity of a fixed periodic orbit, it can easily change the number $\gamma(p)$, while not changing it for most of the other periodic orbits. As a consequence, the perturbed diffeomorphism $T_\varepsilon$ cannot satisfy $h_{\mathrm{top}}(T_\varepsilon) = h_{\mathrm{res}}(T_\varepsilon)$.
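The following corollary gives another characterization of Anosov diffeomorphisms with $h_{\mathrm{top}} = h_{\mathrm{res}}$ in a two-dimensional case.
Corollary 1. Consider a $C^2$ area-preserving Anosov diffeomorphism $T : \mathbb{T}^2 \to \mathbb{T}^2$ of the two-torus. Then, the equality $h_{\mathrm{top}}(T) = h_{\mathrm{res}}(T)$ is equivalent to the existence of a hyperbolic linear automorphism $T_A : \mathbb{T}^2 \to \mathbb{T}^2$ and a $C^1$-diffeomorphism $h : \mathbb{T}^2 \to \mathbb{T}^2$ such that $h^{-1} \circ T \circ h = T_A$.
Proof. It follows immediately from Theorem 6 in combination with [20], Corollary 20.4.4, that the identity $h_{\mathrm{top}}(T) = h_{\mathrm{res}}(T)$ implies the existence of a $C^1$-conjugacy, as asserted. The other direction is easy to see, using the definition of restoration entropy. If $h^{-1} \circ T \circ h = T_A$, then also $h^{-1} \circ T^n \circ h = T_A^n$ for all $n \in \mathbb{N}$. We use that a $C^1$-map on a compact manifold has a global Lipschitz constant. Let $L := \mathrm{Lip}(h)$ and $L' := \mathrm{Lip}(h^{-1})$ be Lipschitz constants of $h$ and $h^{-1}$, respectively. Then:
$$T^n(B_\varepsilon(x)) = h\big(T_A^n(h^{-1}(B_\varepsilon(x)))\big).$$
Observe that $h^{-1}(B_\varepsilon(x)) \subset B_{L'\varepsilon}(h^{-1}(x))$. Let $N(l)$ denote the minimal number of $\varepsilon$-balls needed to cover an $l\varepsilon$-ball in $\mathbb{T}^2$ for any $l > 0$ (for the standard flat metric on $\mathbb{T}^2$ and all sufficiently small $\varepsilon$, this number does not depend on $\varepsilon$). Then, the minimal number of $\varepsilon$-balls needed to cover $T_A^n(h^{-1}(B_\varepsilon(x)))$ is bounded from above by $N(L') \sup_{z \in \mathbb{T}^2} p(n, z, \varepsilon; T_A)$. Since $h$ maps every $\varepsilon$-ball into an $L\varepsilon$-ball, it follows that:
$$p(n, x, \varepsilon; T) \le N(L)\, N(L') \sup_{z \in \mathbb{T}^2} p(n, z, \varepsilon; T_A).$$
Taking logarithms, dividing by $n$, taking the $\limsup$ for $\varepsilon \downarrow 0$, and subsequently the limit for $n \to \infty$, we obtain $h_{\mathrm{res}}(T) \le h_{\mathrm{res}}(T_A)$. The other inequality can be proven analogously, so $h_{\mathrm{res}}(T) = h_{\mathrm{res}}(T_A)$.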
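Since $T$ and $T_A$ are topologically conjugate (the $C^1$-diffeomorphism $h$ is, in particular, a homeomorphism), they also have the same topological entropy:
$$h_{\mathrm{top}}(T) = h_{\mathrm{top}}(T_A).$$
To complete the proof, it now suffices to show that $h_{\mathrm{res}}(T_A) = h_{\mathrm{top}}(T_A)$. We can compute $h_{\mathrm{res}}(T_A)$ using Theorem 4. To this end, observe that $A$ is a hyperbolic matrix and that $DT_A^n(x)$ can be identified with $A^n$ for every $x \in \mathbb{T}^2$. Since $\alpha_1(n, x)\,\alpha_2(n, x) = |\det A^n| = 1$, the sum $\sum_{i=1}^{2} \max\{0, \log \alpha_i(n, x)\}$ equals $\log \|A^n\|$, and Gelfand's formula shows that, if $|\lambda_1| > 1 > |\lambda_2|$ are the eigenvalues of $A$, we obtain:
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{2} \max\{0, \log \alpha_i(n, x)\} = \log |\lambda_1| \quad \text{for all } x \in \mathbb{T}^2,$$
implying $h_{\mathrm{res}}(T_A) = \log |\lambda_1|$. It is well known that this is also the value of the topological entropy $h_{\mathrm{top}}(T_A)$; see [20], Section 4. This also follows from the combination of the variational principle with Theorem 3.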
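The following example demonstrates how restrictive the condition $h_{\mathrm{res}}(T) = h_{\mathrm{top}}(T)$ is by looking at small perturbations of Arnold's cat map, i.e., the torus automorphism $T_A : \mathbb{T}^2 \to \mathbb{T}^2$, $T_A(x, y) := (2x + y, x + y) \pmod{\mathbb{Z}^2}$, given by the matrix
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$
with determinant $\det A = 1$. Observe that the derivative $DT_A(x)$ can be identified with $A$ for each $x \in \mathbb{T}^2$. Since $A$ is a hyperbolic matrix with eigenvalues:
$$\gamma_1 = \frac{3 - \sqrt{5}}{2}, \qquad \gamma_2 = \frac{3 + \sqrt{5}}{2},$$
satisfying $|\gamma_2| > 1 > |\gamma_1|$, it follows that $T_A$ is a $C^\infty$ area-preserving Anosov diffeomorphism. Hence, Corollary 1 yields $h_{\mathrm{top}}(T_A) = h_{\mathrm{res}}(T_A) = \log |\gamma_2|$. Now, we consider a perturbation of the form:
$$T_A^\varepsilon(x, y) := \big(2x + y + \varepsilon \sin(2\pi x),\ x + y\big) \pmod{\mathbb{Z}^2}, \quad \varepsilon > 0,$$
which is well defined as a torus map, since $x \mapsto \sin(2\pi x)$ is $1$-periodic. By the structural stability of Anosov diffeomorphisms, for sufficiently small $\varepsilon$, this map is topologically conjugate to $T_A$ and hence has the same topological entropy $\log |\gamma_2|$. However, its restoration entropy is strictly greater. This can be seen by looking at the fixed point $(0, 0)$ with the associated derivative:
$$DT_A^\varepsilon(0, 0) = \begin{pmatrix} 2 + 2\pi\varepsilon & 1 \\ 1 & 1 \end{pmatrix}.$$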
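The comparison at the fixed point can be carried out explicitly. Since $(0, 0)$ is fixed by $T_A^\varepsilon$, the Dirac measure at $(0, 0)$ is invariant, and $\lambda^+$ at this point is the logarithm of the unstable eigenvalue of $DT_A^\varepsilon(0, 0)$. The sketch computes this number for a few hypothetical values of $\varepsilon$ and compares it with $\log \gamma_2$. That $\lambda^+$ integrated against any invariant measure bounds $h_{\mathrm{res}}$ from below is taken here from the combination of Theorem 4, Lemma 1, and Lemma 2 used in the proof of the main result; the code only does the eigenvalue arithmetic.

```python
import math

def unstable_eigenvalue(eps):
    """Larger eigenvalue of DT_A^eps(0, 0) = [[2 + 2 pi eps, 1], [1, 1]]."""
    a = 2 + 2 * math.pi * eps
    tr, det = a + 1, a - 1                   # trace and determinant of the 2x2 matrix
    return (tr + math.sqrt(tr * tr - 4 * det)) / 2

h_top = math.log2((3 + math.sqrt(5)) / 2)    # log2 gamma_2, shared by T_A and small T_A^eps
for eps in (0.0, 0.001, 0.01, 0.05):
    lam_plus = math.log2(unstable_eigenvalue(eps))
    print(eps, lam_plus, lam_plus - h_top)   # difference is 0 for eps = 0, > 0 for eps > 0
```

The discriminant $(a - 1)^2 + 4$ is always positive and the unstable eigenvalue $\big(a + 1 + \sqrt{(a-1)^2 + 4}\big)/2$ is strictly increasing in $a = 2 + 2\pi\varepsilon$, so the printed difference is positive for every $\varepsilon > 0$.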