Entropy in Dynamical Systems

In this article, the word entropy is used exclusively to refer to the entropy of a dynamical system, i.e. a map or a flow. It measures the rate of increase in dynamical complexity as the system evolves with time. This is not to be confused with other notions of entropy connected with spatial complexity. I will attempt to give a brief survey of the role of entropy in dynamical systems and especially in smooth ergodic theory. The topics are chosen to give a flavor of this invariant; they are also somewhat biased toward my own interests. This article is aimed at nonexperts. After a review of some basic definitions and ideas, I will focus on one main topic, namely the relation of entropy to Lyapunov exponents and dimension, which I will discuss in some depth. Several other interpretations of entropy, including its relations to volume growth, periodic orbits and horseshoes, large deviations and rates of escape are treated briefly.


Topological and metric entropies: a review
We begin with topological entropy because it is simpler, even though chronologically metric entropy was defined first. Topological entropy was first introduced in 1965 by Adler, Konheim and McAndrew [AKM]. The following definition is due to Bowen and Dinaburg. A good reference for this section is [W], which contains many of the early references.
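Definition 1.1 Let $f : X \to X$ be a continuous map of a compact metric space $X$. For $\varepsilon > 0$ and $n \in \mathbb{Z}^+$, we say $E \subset X$ is an $(n, \varepsilon)$-separated set if for every pair of distinct points $x, y \in E$, there exists $i$, $0 \le i < n$, such that $d(f^i x, f^i y) > \varepsilon$. Then the topological entropy of $f$, denoted by $h_{top}(f)$, is defined to be
$$h_{top}(f) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log N(n, \varepsilon),$$
where $N(n, \varepsilon)$ is the maximum cardinality of all $(n, \varepsilon)$-separated sets.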
Roughly speaking, we would like to equate "higher entropy" with "more orbits", but since the number of orbits is usually infinite, we need to fix a finite "resolution", i.e. a scale below which we are unable to tell points apart. Suppose we do not distinguish between points that are less than $\varepsilon$ apart. Then $N(n, \varepsilon)$ represents the number of distinguishable orbits of length $n$, and if this number grows like $\sim e^{nh}$, then $h$ is the topological entropy. Another way of counting the number of distinguishable $n$-orbits is to use $(n, \varepsilon)$-spanning sets, i.e. sets $E$ with the property that for every $x \in X$, there exists $y \in E$ such that $d(f^i x, f^i y) < \varepsilon$ for all $0 \le i < n$; $N(n, \varepsilon)$ is then taken to be the minimum cardinality of these sets. The original definition in [AKM] uses open covers but it conveys essentially the same idea.
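The counting of distinguishable orbits can be tried out numerically. The following is a minimal sketch, not taken from the references: it assumes the doubling map $x \mapsto 2x$ mod 1 on the circle, discretized to a grid of 1024 points so that all arithmetic is exact, and it builds an $(n, \varepsilon)$-separated set greedily. A greedy set is maximal but not necessarily of maximum cardinality, so the counts are lower bounds for $N(n, \varepsilon)$. The names `M`, `EPS` and `separated_count` are hypothetical.

```python
import math

M = 1024   # grid size: the circle is modelled as Z/M (an assumption of this sketch)
EPS = 100  # resolution in grid steps, i.e. epsilon = 100/1024 < 1/4


def circle_dist(a, b):
    """Distance between grid points a and b on the circle Z/M."""
    d = abs(a - b) % M
    return min(d, M - d)


def orbit(x, n):
    """First n points of the orbit of x under the doubling map x -> 2x mod 1."""
    pts = []
    for _ in range(n):
        pts.append(x)
        x = (2 * x) % M
    return pts


def separated_count(n):
    """Size of a greedily built (n, EPS)-separated subset of the grid."""
    chosen = []  # orbits of the points selected so far
    for x in range(M):
        ox = orbit(x, n)
        # Keep x only if its n-orbit is EPS-separated from every chosen orbit.
        if all(any(circle_dist(u, v) > EPS for u, v in zip(ox, oy)) for oy in chosen):
            chosen.append(ox)
    return len(chosen)


counts = [separated_count(n) for n in range(1, 6)]
# Average growth rate of log(count) between n = 1 and n = 5.
rate = (math.log(counts[-1]) - math.log(counts[0])) / (len(counts) - 1)
print(counts, rate, math.log(2))
```

The growth rate is measured as a difference of logarithms rather than as $\frac{1}{n} \log N(n, \varepsilon)$ itself, since at small $n$ the latter is dominated by a term of order $\frac{1}{n}\log(1/\varepsilon)$.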

Facts It follows immediately from Definition 1.1 that (i)
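Metric or measure-theoretic entropy for a transformation was introduced by Kolmogorov and Sinai in 1959; the ideas go back to Shannon's information theory. We begin with some notation. Let $(X, \mathcal{B}, \mu)$ be a probability space, i.e. $X$ is a set, $\mathcal{B}$ a $\sigma$-algebra of subsets of $X$, and $\mu$ a probability measure on $(X, \mathcal{B})$. Let $f : X \to X$ be a measure-preserving transformation, i.e. for all $A \in \mathcal{B}$, we have $f^{-1}A \in \mathcal{B}$ and $\mu(f^{-1}A) = \mu(A)$. Let $\alpha = \{A_1, \dots, A_k\}$ be a finite partition of $X$ into measurable sets, and write $\alpha(x)$ for the element of $\alpha$ containing $x$. To each $x \in X$ and $n \in \mathbb{Z}^+$ we associate the sequence $(\alpha(x), \alpha(fx), \dots, \alpha(f^{n-1}x))$, which we call the $\alpha$-addresses of the $n$-orbit starting at $x$.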

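Definition 1.2 The metric entropy of $f$, written $h_\mu(f)$, is defined as follows:
$$H(\alpha) = -\sum_{i=1}^{k} \mu(A_i) \log \mu(A_i),$$
$$h_\mu(f, \alpha) = \lim_{n \to \infty} \frac{1}{n} H\Big(\bigvee_{i=0}^{n-1} f^{-i}\alpha\Big) = \lim_{n \to \infty} H\Big(f^{-n}\alpha \,\Big|\, \bigvee_{i=0}^{n-1} f^{-i}\alpha\Big),$$
$$h_\mu(f) = \sup_{\alpha} h_\mu(f, \alpha).$$
Here $\alpha = \{A_1, \dots, A_k\}$ is a finite measurable partition of $X$, and the supremum is taken over all such partitions.
The objects in the definition above have the following interpretations: $H(\alpha)$ measures the amount of uncertainty, on average, as one attempts to predict the $\alpha$-address of a randomly chosen point. The two limits in the second line can be shown to be equal. The quantity in the middle is the average uncertainty per iteration in guessing the $\alpha$-addresses of a typical $n$-orbit, while the one on the right is the uncertainty in guessing the $\alpha$-address of $f^n x$ conditioned on the knowledge of the $\alpha$-addresses of $x, fx, \dots, f^{n-1}x$. The following facts make it easier to compute the quantity in the third line: $h_\mu(f) = h_\mu(f, \alpha)$ if $\alpha$ is a generator; it can also be realized as $\lim_n h_\mu(f, \alpha_n)$ where $\alpha_n$ is an increasingly refined sequence of partitions such that $\bigvee_n \alpha_n$ partitions $X$ into points.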
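The following theorem gives a very nice interpretation of metric entropy:
Theorem 1 (The Shannon-McMillan-Breiman Theorem) Let $f$ and $\alpha$ be as above, and write $h = h_\mu(f, \alpha)$. Assume for simplicity that $(f, \mu)$ is ergodic. Then given $\varepsilon > 0$, there exists $N$ such that the following holds for all $n \ge N$:
(i) there exists $Y_n \subset X$ with $\mu(Y_n) > 1 - \varepsilon$ such that $Y_n$ consists of $\sim e^{n(h \pm \varepsilon)}$ elements of $\bigvee_{i=0}^{n-1} f^{-i}\alpha$, each having $\mu$-measure $\sim e^{-n(h \pm \varepsilon)}$;
(ii) $-\frac{1}{n} \log \mu\big((\bigvee_{i=0}^{n-1} f^{-i}\alpha)(x)\big) \to h$ a.e. and in $L^1$, where $\alpha(x)$ is the element of $\alpha$ containing $x$.
SUMMARY Both $h_\mu$ and $h_{top}$ measure the exponential rates of growth of $n$-orbits:
- $h_\mu$ counts the number of typical $n$-orbits, while
- $h_{top}$ counts all distinguishable $n$-orbits.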
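We illustrate these ideas with the following coin-tossing example. Consider $\sigma : \prod_0^\infty \{H, T\} \to \prod_0^\infty \{H, T\}$, where each element of $\prod_0^\infty \{H, T\}$ represents the outcome of an infinite series of trials and $\sigma$ is the shift operator. Since the total number of possible outcomes in $n$ trials is $2^n$, $h_{top}(\sigma) = \log 2$. If $\mu$ is the product measure with $P(H) = p$ and $P(T) = 1 - p$, then $h_\mu(\sigma) = -p \log p - (1-p) \log (1-p)$. In particular, if the coin is biased, then the number of typical outcomes in $n$ trials is $\sim e^{nh}$ for some $h < \log 2$.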
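The coin-tossing example is easy to check by direct counting. The sketch below assumes the standard entropy $-p \log p - (1-p)\log(1-p)$ of a coin with $P(H) = p$, and takes "typical" to mean that the fraction of heads in $n$ trials lies within a tolerance `delta` of $p$. The function names and the values $p = 0.9$, $n = 1000$, `delta = 0.01` are illustrative choices, not part of the text.

```python
import math


def bernoulli_entropy(p):
    """Entropy -p log p - (1-p) log(1-p) of a coin with P(H) = p (natural log)."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def typical_growth_rate(p, n, delta):
    """(1/n) log of the number of length-n outcomes whose fraction of heads
    lies within delta of p."""
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(k / n - p) <= delta)
    return math.log(count) / n


h_fair = bernoulli_entropy(0.5)    # equals log 2, the topological entropy
h_biased = bernoulli_entropy(0.9)  # strictly less than log 2
rate = typical_growth_rate(0.9, 1000, 0.01)
print(h_fair, h_biased, rate)
```

For the biased coin the counted growth rate stays close to `h_biased` and well below $\log 2$, in line with the distinction between typical and distinguishable orbits.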
From the discussion above, it is clear that $h_\mu \le h_{top}$. We, in fact, have the following variational principle:
Theorem 2 Let $f$ be a continuous map of a compact metric space. Then
$$h_{top}(f) = \sup_\mu h_\mu(f),$$
where the supremum is taken over all $f$-invariant Borel probability measures $\mu$.
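The idea of expressing entropy in terms of local information has already appeared in Theorem 1 (ii). Here is another version of this idea, more convenient for certain purposes. In the context of continuous maps of compact metric spaces, for $x \in X$, $n \in \mathbb{Z}^+$ and $\varepsilon > 0$, define
$$B(x, n, \varepsilon) = \{y \in X : d(f^i x, f^i y) < \varepsilon, \ 0 \le i < n\}.$$
Theorem 3 [BK] Assume $(f, \mu)$ is ergodic. Then for $\mu$-a.e. $x$,
$$h_\mu(f) = \lim_{\varepsilon \to 0} \liminf_{n \to \infty} -\frac{1}{n} \log \mu B(x, n, \varepsilon) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty} -\frac{1}{n} \log \mu B(x, n, \varepsilon).$$
Thus metric entropy also has the interpretation of being the rate of loss of information on nearby orbits. This leads naturally to its relation to Lyapunov exponents, which is the topic of the next section.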

Entropy, Lyapunov Exponents and Dimension
In this section, we focus on a set of ideas in which entropy plays a central role.
Setting for this section: We consider $(f, \mu)$ where $f : M \to M$ is a $C^2$ diffeomorphism of a compact Riemannian manifold $M$ and $\mu$ is an $f$-invariant Borel probability measure. For simplicity, we assume $(f, \mu)$ is ergodic. (This assumption is not needed, but without it the results below are a little more cumbersome to state.) Let $\lambda_1 > \lambda_2 > \cdots > \lambda_r$ denote the distinct Lyapunov exponents of $(f, \mu)$, and let $E_i$ be the linear subspaces corresponding to $\lambda_i$, so that the dimension of $E_i$ is the multiplicity of $\lambda_i$.
We have before us two ways of measuring dynamical complexity: metric entropy, which measures the growth in randomness or number of "typical" orbits, and Lyapunov exponents, which measure the rates at which nearby orbits diverge. The first is a purely probabilistic concept, while the second is primarily geometric. We now compare these two invariants. Write $a^+ = \max(a, 0)$.
Theorem 4 (Pesin's Formula) [P] If $\mu$ is equivalent to the Riemannian measure on $M$, then
$$h_\mu(f) = \sum_i \lambda_i^+ \dim E_i.$$
Theorem 5 (Ruelle's Inequality) [R2] Without any assumptions on $\mu$, we have
$$h_\mu(f) \le \sum_i \lambda_i^+ \dim E_i.$$
To illustrate what is going on, consider the following two examples: Assume for simplicity that both maps are affine on each of the shaded rectangles and their images are as shown. The first map is called the "baker's transformation". Lebesgue measure is preserved, and $h_\mu(f) = \log 2 = \lambda_1$, the positive Lyapunov exponent. The second map is easily extended to Smale's horseshoe; we are interested only in points that remain in the shaded vertical strips in all (forward and backward) times. As with the first map, we let $\mu$ be the Bernoulli measure that assigns mass $\frac{1}{2}$ to each of the two vertical strips. Here $h_\mu(f) = \log 2 < \lambda_1$.
Theorems 4 and 5 suggest that entropy is created by the exponential divergence of nearby orbits. In a conservative system, i.e. in the setting of Theorem 4, all the expansion goes back into the system to make entropy, leading to the equality in Pesin's entropy formula. A strict inequality occurs when some of the expansion is "wasted", and that can happen only if there is "leakage" or dissipation from the system.
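For affine examples of the kind just described, the comparison between entropy and expansion reduces to arithmetic on the constant derivative. The sketch below is only an illustration under stated assumptions: the baker's transformation is taken to have factors $2$ and $\frac{1}{2}$, while the horseshoe factors $3$ and $\frac{1}{4}$ are hypothetical values chosen so that the two strips are expanded by more than $2$. In both cases the entropy of the $(\frac{1}{2}, \frac{1}{2})$ Bernoulli measure is $\log 2$.

```python
import math

LOG2 = math.log(2)  # entropy of the (1/2, 1/2) Bernoulli measure in both examples

# Constant derivative of each affine map, as (contracting factor, expanding factor).
# The horseshoe factors 1/4 and 3 are illustrative assumptions, not from the text.
examples = {"baker": (0.5, 2.0), "horseshoe": (0.25, 3.0)}


def positive_exponent_sum(factors):
    """Sum of the positive Lyapunov exponents log|a| of a constant diagonal derivative."""
    return sum(max(math.log(abs(a)), 0.0) for a in factors)


for name, factors in examples.items():
    bound = positive_exponent_sum(factors)
    # Equality corresponds to Pesin's formula, strict inequality to "wasted" expansion.
    print(name, LOG2, bound, abs(LOG2 - bound) < 1e-12)
```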
The reader may have noticed that the results above involve only positive Lyapunov exponents. Indeed there is the following complete characterization: The last sentence is an abbreviated way of saying the following: If $f$ has a positive Lyapunov exponent $\mu$-a.e., then local unstable manifolds are defined $\mu$-a.e. Each one of these submanifolds inherits from $M$ a Riemannian structure, which in turn induces on it a Riemannian measure. We call $\mu$ an SRB measure if the conditional measures of $\mu$ on local unstable manifolds are absolutely continuous with respect to these Riemannian measures.
SRB measures are important for the following reasons: (i) They are more general than invariant measures that are equivalent to Lebesgue (they are allowed to be singular in stable directions); (ii) they can live on attractors (which cannot support invariant measures equivalent to Lebesgue); and (iii) singular as they may be, they reflect the properties of positive Lebesgue measure sets in the sense below:
Fact [PS] If $\mu$ is an ergodic SRB measure with no zero Lyapunov exponents, then there is a positive Lebesgue measure set $V$ with the property that for every continuous function $\varphi : M \to \mathbb{R}$,
$$\frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i x) \to \int \varphi \, d\mu$$
for Lebesgue-a.e. $x \in V$.
Theorems 4, 5 and 6 and Definition 2.1 are generalizations of ideas first worked out for Anosov and Axiom A systems by Sinai, Ruelle and Bowen. See [S], [R1] and [B] and the references therein.
Next we investigate the discrepancy between entropy and the sum of positive Lyapunov exponents and show that it can be expressed precisely in terms of fractal dimension. Let $\nu$ be a probability measure on a metric space, and let $B(x, r)$ denote the ball of radius $r$ centered at $x$.
Definition 2.2 We say that the dimension of $\nu$, written $\dim(\nu)$, is well defined and equal to $\alpha$ if for $\nu$-a.e. $x$,
$$\lim_{r \to 0} \frac{\log \nu B(x, r)}{\log r} = \alpha.$$
The existence of $\dim(\nu)$ means that locally $\nu$ has a scaling property. This is clearly not true for all measures.
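To see the scaling property of Definition 2.2 in a concrete case, one can take the natural measure on the middle-third Cantor set, whose dimension is $\log 2 / \log 3$. The sketch below is an illustration under assumptions of my own choosing: the measure is approximated by equal point masses on the $2^{12}$ left endpoints of the 12th construction stage, and the ratio $\log \nu B(x, r) / \log r$ is evaluated only at the scales $r = 3^{-k}$, $k = 1, \dots, 8$, and only at a few sample points.

```python
import math
from itertools import product

LEVEL = 12  # depth of the approximation (an assumption of this sketch)

# Left endpoints of the 2**LEVEL intervals in the LEVEL-th stage of the
# middle-third Cantor construction; each carries mass 2**-LEVEL.
points = [sum(d * 3.0 ** -(j + 1) for j, d in enumerate(digits))
          for digits in product((0, 2), repeat=LEVEL)]


def ball_mass(x, r):
    """Mass of the open ball B(x, r) under the uniform measure on `points`."""
    return sum(1 for y in points if abs(y - x) < r) / len(points)


def local_dimension(x, k):
    """log nu(B(x, r)) / log r at the scale r = 3**-k."""
    r = 3.0 ** -k
    return math.log(ball_mass(x, r)) / math.log(r)


sample = points[::512]  # eight sample points of the support
estimates = [local_dimension(x, k) for x in sample for k in range(1, 9)]
target = math.log(2) / math.log(3)
print(min(estimates), max(estimates), target)
```

At these scales a ball of radius $3^{-k}$ around a point of the support captures exactly one stage-$k$ interval, of mass $2^{-k}$, which is why the ratio is constant here; at intermediate radii it would oscillate around the same value.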
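Theorem 7 [LY2] Corresponding to every $\lambda_i \ne 0$, there is a number $\delta_i$ with $0 \le \delta_i \le \dim E_i$ such that
(a) $h_\mu(f) = \sum_i \lambda_i^+ \delta_i$;
(b) $\dim(\mu|W^u) = \sum_{\lambda_i > 0} \delta_i$.
The numbers $\delta_i$ have the interpretation of being the partial dimensions of $\mu$ in the directions of the subspaces $E_i$. Here $\mu|W^u$ refers to the conditional measures of $\mu$ on unstable manifolds. The fact that these conditional measures have well defined dimensions is part of the assertion of the theorem.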
Observe that the entropy formula in Theorem 7 can be thought of as a refinement of Theorems 4 and 5: When $\mu$ is equivalent to the Riemannian volume on $M$, $\delta_i = \dim E_i$ and the formula in Theorem 7(a) is Pesin's formula. In general, we have $\delta_i \le \dim E_i$. Plugged into (a) above, this gives Ruelle's Inequality. We may think of $\dim(\mu|W^u)$ as a measure of the dissipativeness of $(f, \mu)$ in forward time.
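The way partial dimensions interpolate between Pesin's formula and Ruelle's Inequality can be spelled out in a few lines. The sketch below reads the entropy formula of Theorem 7(a) as $h_\mu(f) = \sum_i \lambda_i^+ \delta_i$, which is how the text uses it; the function names and the sample data (an affine horseshoe expanding by $3$ and contracting by $\frac{1}{4}$, carrying the $(\frac{1}{2}, \frac{1}{2})$ Bernoulli measure) are hypothetical.

```python
import math


def entropy_from_partial_dimensions(exponents, partial_dims):
    """Sum of lambda_i^+ * delta_i over the distinct exponents."""
    return sum(max(lam, 0.0) * d for lam, d in zip(exponents, partial_dims))


def ruelle_bound(exponents, multiplicities):
    """Sum of lambda_i^+ * dim E_i, the right side of Ruelle's Inequality."""
    return sum(max(lam, 0.0) * m for lam, m in zip(exponents, multiplicities))


# Hypothetical data for an affine horseshoe: one expanding and one contracting
# direction, each of multiplicity one.
exponents = [math.log(3), -math.log(4)]
multiplicities = [1, 1]
# Unstable partial dimension log 2 / log 3. The entry for the negative exponent
# is an assumed value and does not enter the sum of positive terms.
partial_dims = [math.log(2) / math.log(3), math.log(2) / math.log(4)]

h = entropy_from_partial_dimensions(exponents, partial_dims)
bound = ruelle_bound(exponents, multiplicities)
print(h, bound)  # h is log 2 up to rounding, strictly below the bound log 3
```

Replacing `partial_dims` by `multiplicities` makes the two functions agree, which is the case of Pesin's formula.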
I would like to include a very brief sketch of a proof of Theorem 7, using it as an excuse to introduce the notion of entropy along invariant foliations.
Sketch of Proof I. The "conformal" case. By "conformal", we refer here to situations where all the Lyapunov exponents are equal to $\lambda$ for some $\lambda > 0$. This means in particular that $f$ cannot be invertible, but let us not be bothered by that. In fact, let us pretend that locally, $f$ is a dilation by $e^\lambda$. Let $B(x, n, \varepsilon)$ be as defined toward the end of Section 1. Then
$$B(x, n, \varepsilon) = B(x, \varepsilon e^{-\lambda(n-1)}) \quad \text{and} \quad \mu B(x, n, \varepsilon) \sim e^{-nh},$$
where $h = h_\mu(f)$. Comparing the expressions above and setting $r = e^{-\lambda n}$, we obtain
$$\mu B(x, r) \sim r^{h/\lambda},$$
which proves that $\dim(\mu)$ exists and equals $h/\lambda$.
II. The general picture. Let $\lambda_1 > \cdots > \lambda_u$ be our positive Lyapunov exponents. It is a fact from nonuniform hyperbolic theory that for each $i \le u$, there exists at $\mu$-a.e. $x$ an immersed submanifold $W^i(x)$ passing through $x$ and tangent to $E_1(x) + \cdots + E_i(x)$. These manifolds are invariant, i.e. $f W^i(x) = W^i(fx)$, and the leaves of $W^i$ are contained in those of $W^{i+1}$. Our strategy here is to work our way up this hierarchy of $W^i$'s, dealing with one exponent at a time.
For each $i$, we introduce a notion of entropy along $W^i$, written $h_i$. Intuitively, this number is a measure of the randomness of $f$ along the leaves of $W^i$; it ignores what happens in transverse directions. Technically, it is defined using (infinite) partitions whose elements are contained in the leaves of $W^i$. We prove that for each $i$ there exists $\delta_i$ such that (i) $h_1 = \delta_1 \lambda_1$; (ii) $h_i - h_{i-1} = \delta_i \lambda_i$ for $i = 2, \dots, u$; and (iii) $h_u = h_\mu(f)$.
The proof of (i) is similar to that in the "conformal" case since it involves only one exponent. To give an idea of why (ii) is true, consider the action of $f$ on the leaves of $W^i$, and pretend somehow that a quotient dynamical system can be defined by collapsing the leaves of $W^{i-1}$ inside $W^i$. This "quotient" dynamical system has exactly one Lyapunov exponent, namely $\lambda_i$. It behaves as though it leaves invariant a measure with dimension $\delta_i$ and has entropy $h_i - h_{i-1}$. A fair amount of technical work is needed to make this precise, but once properly done, it is again the single exponent principle at work. Summing the equations in (ii) over $i$, we obtain
$$h_u = \sum_{i=1}^{u} \delta_i \lambda_i.$$
Step (iii) says that zero and negative exponents do not contribute to entropy. This completes the outline of the proof in [LY2].

Random dynamical systems
We close this section with a brief discussion of dynamical systems subjected to random noise. Let $\nu$ be a probability measure on Diff($M$), the space of diffeomorphisms of a compact manifold $M$. Consider one- or two-sided sequences of diffeomorphisms chosen independently with law $\nu$. This setup applies to stochastic differential equations where the $X_i$ are time-independent vector fields and the $f_i$ are time-one maps of the stochastic flow. The random dynamical system above defines a Markov process on $M$ with $P(E|x) = \nu\{f : f(x) \in E\}$. Let $\mu$ be the marginal of a stationary measure of this process, i.e. $\mu = \int f_*\mu \, d\nu(f)$. As individual realizations of this process, our random maps also have a system of invariant measures $\{\mu_{\mathbf{f}}\}$ defined for $\nu^{\mathbb{Z}}$-a.e. $\mathbf{f} = (f_i)_{i=-\infty}^{\infty}$. These measures are invariant in the sense that $(f_0)_*\mu_{\mathbf{f}} = \mu_{\sigma \mathbf{f}}$ where $\sigma$ is the shift operator, and they are related to $\mu$ by $\mu = \int \mu_{\mathbf{f}} \, d\nu^{\mathbb{Z}}$.
Extending slightly ideas for a single diffeomorphism, it is easy to see that Lyapunov exponents and entropy are well defined for almost every sequence $\mathbf{f}$ and are nonrandom. We continue to use the notation $\lambda_1 > \cdots > \lambda_r$ and $h$.
Theorem 8 [LY3] Assume that $\mu$ has a density with respect to Lebesgue measure. If $\lambda_1 > 0$, then
$$h = \sum_i \lambda_i^+ \dim E_i.$$
We remark that $\mu$ has a density if the transition probabilities $P(\cdot|x)$ do. In light of Theorem 6, when $\lambda_1 > 0$ the $\mu_{\mathbf{f}}$ may be regarded as random SRB measures.

Entropy and volume growth
Let $M$ be a compact $m$-dimensional $C^\infty$ Riemannian manifold, and let $f : M \to M$ be a $C^1$ mapping. In this subsection, $h(f)$ is the topological entropy of $f$. As we will see, there is a strong relation between $h(f)$ and the rates of growth of areas or volumes of $f^n$-images of embedded submanifolds. To make these ideas precise, we begin with the following definitions: For $\ell, k \ge 1$, let $\Sigma(k, \ell)$ be the set of $C^k$ mappings $\sigma : Q^\ell \to M$ where $Q^\ell$ is the $\ell$-dimensional unit cube. Let $\omega(\sigma)$ be the $\ell$-dimensional volume of the image of $\sigma$ in $M$ counted with multiplicity, i.e. if $\sigma$ is not one-to-one, and the image of one part coincides with that from another part, then we will count the set as many times as it is covered.
For (i), Newhouse showed that $V(f)$ as a volume growth rate is in fact attained by a large family of disks of a certain dimension. The factor $\frac{2\ell}{k} R(f)$ in (ii) is a correction term for pathologies that may (and do) occur in low differentiability.
Ideas related to (ii) are also used to resolve a version of the Entropy Conjecture. Let $S_\ell(f)$, $\ell = 0, 1, \dots, m$, denote the logarithm of the spectral radius of $f_* : H_\ell(M, \mathbb{R}) \to H_\ell(M, \mathbb{R})$ where $H_\ell(M, \mathbb{R})$ is the $\ell$th homology group of $M$, and let $S(f) = \max_\ell S_\ell(f)$. Intuitively, $S(f)$ measures the complexity of $f$ on a global, topological level; it tells us which handles wrap around which handles and how many times. These crossings create "generalized horseshoes" which contribute to entropy. This explains why $S(f)$ is a lower bound for $h(f)$. On the other hand, entropy can also be created locally, say, inside a disk, without involving any action on homology. This is why $h(f)$ can be strictly larger.
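A standard case in which the homological lower bound is attained is a hyperbolic automorphism of the 2-torus. The sketch below is an illustration that is not drawn from the text: it assumes the automorphism induced by the matrix with rows $(2, 1)$ and $(1, 1)$, whose action on first homology is given by the same matrix and whose topological entropy is the logarithm of its expanding eigenvalue. The variable names are hypothetical.

```python
import math


def spectral_radius_2x2(a, b, c, d):
    """Largest modulus of an eigenvalue of [[a, b], [c, d]] (real eigenvalues assumed)."""
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


a, b, c, d = 2, 1, 1, 1
s0 = 0.0                                         # f_* is the identity on H_0
s1 = math.log(spectral_radius_2x2(a, b, c, d))   # action on H_1(T^2, R) = R^2
s2 = math.log(abs(a * d - b * c))                # f_* on H_2 is multiplication by det
S = max(s0, s1, s2)

h_top = math.log((3 + math.sqrt(5)) / 2)  # log of the expanding eigenvalue
print(S, h_top)
```

Here the two numbers coincide, so no entropy is created locally; the strict inequality described above needs examples with trivial action on homology.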

Horseshoes and growth of periodic points
Recall that a "horseshoe" is a uniformly hyperbolic invariant set on which the map is topologically conjugate to $\sigma : \Sigma_2 \to \Sigma_2$, the full shift on two symbols. The next theorem says that in the absence of neutral directions, entropy can be thought of as carried by sets of this type.
Theorem 11 [K] Let $f : M \to M$ be a $C^2$ diffeomorphism of a compact manifold, and let $\mu$ be an invariant Borel probability measure with no zero Lyapunov exponents. If $h_\mu(f) > 0$, then given $\varepsilon > 0$, there exist $N \in \mathbb{Z}^+$ and $\Lambda \subset M$ such that
(i) $f^N(\Lambda) = \Lambda$ and $f^N|\Lambda$ is uniformly hyperbolic,
(ii) $f^N|\Lambda$ is topologically conjugate to $\sigma : \Sigma_s \to \Sigma_s$ for some $s$,
(iii) $\frac{1}{N} h_{top}(f^N|\Lambda) > h_\mu(f) - \varepsilon$.
In particular, if $\dim(M) = 2$ and $h_{top}(f) > 0$, then (i) and (ii) hold and the inequality in (iii) is valid with $h_\mu(f)$ replaced by $h_{top}(f)$.
The assertion for $\dim(M) = 2$ follows from the first part of the theorem because in dimension two, any $\mu$ with $h_\mu(f) > 0$ has one positive and one negative Lyapunov exponent.
Since shift spaces contain many periodic orbits, the following is an immediate consequence of Theorem 11:
Theorem 12 [K] Let $f$ be a $C^2$ surface diffeomorphism. Then
$$\limsup_{n \to \infty} \frac{1}{n} \log P_n \ge h_{top}(f),$$
where $P_n$ is the number of fixed points of $f^n$.
For Anosov and Axiom A diffeomorphisms in any dimension, we in fact have (see [B])
$$\lim_{n \to \infty} \frac{1}{n} \log P_n = h_{top}(f),$$
but this is a somewhat special situation. Recent results of Kaloshin show that in general $P_n$ can grow much faster than entropy.
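The Anosov case can be checked by hand on a toral automorphism. The sketch below is an illustration under assumptions that are not part of the text: it uses the automorphism of the 2-torus induced by the matrix $A$ with rows $(2, 1)$ and $(1, 1)$, the standard count $P_n = |\det(A^n - I)|$ of its fixed points, and the standard value $h_{top} = \log \frac{3 + \sqrt{5}}{2}$.

```python
import math


def mat_mul(x, y):
    """Product of two 2x2 integer matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fixed_points_of_power(a, n):
    """Number of fixed points of the toral automorphism A^n: |det(A^n - I)|."""
    p = [[1, 0], [0, 1]]
    for _ in range(n):
        p = mat_mul(p, a)
    return abs((p[0][0] - 1) * (p[1][1] - 1) - p[0][1] * p[1][0])


A = [[2, 1], [1, 1]]
h_top = math.log((3 + math.sqrt(5)) / 2)  # log of the expanding eigenvalue
for n in (1, 5, 10, 20, 40):
    print(n, math.log(fixed_points_of_power(A, n)) / n, h_top)
```

The printed growth rates approach `h_top` from below as $n$ increases.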
For more information on the relations between topological entropy and various growth properties of a map or flow, see [KH].

Large deviations and rates of escape
We will state some weak results that hold quite generally and some stronger results that hold in specialized situations. This subsection is taken largely from [You].
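Let $f : X \to X$ be a continuous map of a compact metric space, and let $m$ be a Borel probability measure on $X$. We think of $m$ as a reference measure, and assume that there is an invariant probability measure $\mu$ such that for all continuous functions $\varphi : X \to \mathbb{R}$, the following holds for $m$-a.e. $x$:
$$\frac{1}{n} S_n\varphi(x) := \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i x) \to \int \varphi \, d\mu \quad \text{as } n \to \infty.$$
For $\delta > 0$, we define
$$E_{n,\delta} = \Big\{ x \in X : \Big| \frac{1}{n} S_n\varphi(x) - \int \varphi \, d\mu \Big| > \delta \Big\}$$
and ask how fast $m E_{n,\delta}$ decreases to zero as $n \to \infty$.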
Following standard large deviation theory, we introduce a dynamical version of relative entropy: Let $\nu$ be a probability measure on $X$. We let $h_m(f; \nu)$ be the essential supremum with respect to $\nu$ of the function
$$h_m(f; x) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty} -\frac{1}{n} \log m B(x, n, \varepsilon),$$
where $B(x, n, \varepsilon)$ is as defined in Section 1. (See also Theorem 3.) Let $\mathcal{M}$ denote the set of all $f$-invariant Borel probability measures on $X$, and let $\mathcal{M}_e$ be the set of ergodic invariant measures. The following is straightforward:
Without further conditions on the dynamics, a reasonable upper bound cannot be expected. More can be said, however, for systems whose dynamics have better statistical properties. A version of the following result was first proved by Orey and Pelikan. For $\nu \in \mathcal{M}_e$, let $\lambda_\nu$ be the sum of the positive Lyapunov exponents of $(f, \nu)$ counted with multiplicity.
Theorem 13 Let $f|\Lambda$ be an Axiom A attractor, and let $U \supset \Lambda$ be its basin of attraction. Let $m$ be the (normalized) Riemannian measure on $U$, and let $\mu$ be the SRB measure on the attractor (see Section 2). Then $\frac{1}{n} S_n\varphi$ satisfies a large deviation principle with rate function
A similar set of ideas applies to rates of escape problems. Consider a differentiable map $f : M \to M$ of a manifold with a compact invariant set $\Lambda$ (which is not necessarily attracting). Let $U$ be a compact neighborhood of $\Lambda$ and assume that $\Lambda$ is the maximal invariant set in $\bar{U}$, the closure of $U$. Define
Here, $\mathcal{M}$ is the set of invariant measures in $\bar{U}$. For $\nu \in \mathcal{M}$, let $\lambda_\nu$ be as before, averaging over ergodic components if $\nu$ is not ergodic. Let $m$ be Lebesgue measure as before. The uniformly hyperbolic case of the next theorem was first proved in [B].
If $\Lambda$ is uniformly hyperbolic or partially uniformly hyperbolic, then the limit above exists and is equal to the right side.
The reader may recall that $h_\nu(f) \le \lambda_\nu$ is Ruelle's Inequality (Theorem 5). The quantity on the right side of Theorem 14 is called topological pressure. There is a variational principle similar to that in Theorem 2 but with a potential (see [R1] or [B] for the theory of equilibrium states of uniformly hyperbolic systems).
We finish with the following interpretations: The numbers $\lambda_\nu$, $\nu \in \mathcal{M}$, describe the forces that push a point away from $\Lambda$. For example, if $\Lambda$ consists of a single fixed point of saddle type, then the sum of the logarithms of its eigenvalues of modulus greater than one is precisely the rate of escape from a neighborhood of $\Lambda$. The entropies of the system, $h_\nu(f)$, represent, in some ways, the forces that keep a point in $U$, so that $h_\nu(f) - \lambda_\nu$ gives the net escape rate. One cannot, however, expect equality in general in Theorem 14, the reason being that not all parts of an invariant set are "seen" by invariant measures. For a simple example, consider the time-one map of the "figure 8" flow where $\Lambda \subset \mathbb{R}^2$ consists of a saddle fixed point $p$ together with its separatrices, both of which are homoclinic orbits. If $|\det Df(p)| < 1$, then $\Lambda$ is an attractor, from whose neighborhood there is no escape. For its unique invariant measure $\nu = \delta_p$, however, $0 = h_\nu(f) < \lambda_\nu$.
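The balance between entropy and expansion in an escape rate can be computed exactly in a simple expanding model. The sketch below is a one-dimensional analogue chosen for illustration, not an example from the text: it assumes the map $x \mapsto 3x$ mod 1 restricted to $U = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$, whose maximal invariant set is the middle-third Cantor set. Every invariant measure has exponent $\log 3$ and the largest entropy is $\log 2$, so the expected rate is $\log 2 - \log 3$. The function names are hypothetical.

```python
import math
from fractions import Fraction


def survivors(n):
    """Intervals of points x in U whose iterates x, f(x), ..., f^(n-1)(x) all lie
    in U = [0, 1/3] U [2/3, 1], for f(x) = 3x mod 1 and n >= 1."""
    intervals = [(Fraction(0), Fraction(1, 3)), (Fraction(2, 3), Fraction(1))]
    for _ in range(n - 1):
        # Pull back through the two inverse branches x -> x/3 and x -> (x + 2)/3.
        intervals = [((a + s) / 3, (b + s) / 3) for s in (0, 2) for a, b in intervals]
    return intervals


def measure(intervals):
    """Total length (Lebesgue measure) of a list of disjoint intervals."""
    return sum(b - a for a, b in intervals)


n = 10
rate = math.log(float(measure(survivors(n)))) / n
print(rate, math.log(2) - math.log(3))
```

The surviving set after $n$ steps consists of $2^n$ intervals of length $3^{-n}$, so the two printed numbers agree up to rounding.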

Theorem 6 Pesin's entropy formula holds if and only if $\mu$ is an SRB measure.
Definition 2.1 An $f$-invariant Borel probability measure $\mu$ is called a Sinai-Ruelle-Bowen measure or SRB measure if $f$ has a positive Lyapunov exponent $\mu$-a.e. and $\mu$ has absolutely continuous conditional measures on unstable manifolds.