Symbolic Extensions of Smooth Interval Maps

In this course we will present the full proof of the fact that every smooth dynamical system on the interval or circle X, given by the forward iterates of a map f : X → X of class C^r with r > 1, admits a symbolic extension, i.e., there exist a bilateral subshift (Y, S), with Y a closed shift-invariant subset of Λ^Z for a finite alphabet Λ, and a continuous surjection π : Y → X which intertwines the action of f (on X) with that of the shift map S (on Y). Moreover, we give a precise estimate (from above) on the entropy of each invariant measure ν supported by Y in an optimized symbolic extension. This estimate depends on the entropy of the underlying measure µ on X, the "Lyapunov exponent" of µ (the genuine Lyapunov exponent for ergodic µ, otherwise its analog), and the smoothness parameter r. The estimate agrees with a conjecture formulated in [15] around 2003 for smooth dynamical systems on manifolds.

Forecasting is done by complicated software which must be fed information in digital form. Modern black boxes that register the history of airplane flights or truck rides do it in digital form. Even our mathematical work is stored mainly as computer files. Analog information is nearly an extinct form.
While studying dynamical systems (in any understanding of this term), sooner or later one is forced to face the following question: "How can the information about the evolution of a given dynamical system be most precisely turned into a digital form?" As researchers specializing in dynamical systems, we are responsible for providing the theoretical background for such a transition.
So suppose that we are observing a dynamical system, and that we are indeed turning our observation into digital form. That means, from time to time, we produce a digital "report", a computer file, containing all our observations since the last report. Suppose for simplicity that such reports are produced at equal time distances, say at integer times. Of course, due to the bounded capacity of our recording devices and the limited time between the reports, our files have bounded size (in bits). Because the variety of digital files of bounded size is finite, we can say that at every integer moment of time we produce just one symbol, where the collection of all possible symbols (called the alphabet and denoted by Λ) is finite.
An illustrative example is filming a scene using a digital camera. Every unit of time, the camera registers an image, which is in fact a bitmap of some fixed size (the camera resolution). The camera turns the live scene into a sequence of bitmaps. If the scene is filmed with sound, each bitmap is enhanced by a small sound file, also of bounded size. We can treat every such enhanced bitmap as a single symbol in the alphabet of the "language" of the camera.
The sequence of symbols is produced as long as the observation is being conducted. We have no reason to restrict the global observation time, and we can agree that it goes on forever. Sometimes (but not necessarily), we can also admit that the observation has been conducted forever in the past as well. In this manner, the history of our recording takes on the form of a unilateral or bilateral sequence of symbols from some finite alphabet Λ. Advancing in time by a unit corresponds, on one hand, to the unit-time evolution of the dynamical system, and on the other, to shifting the enumeration of our sequence of symbols. In this manner we have come to the conclusion that the digital form of the observation is nothing else but an element of the symbolic space Λ^S, where S stands either for the set of all integers Z or the nonnegative integers N_0. The action on this space is the familiar shift transformation σ given by σ(x) = y, where x = (x_n)_{n∈S} and y = (x_{n+1})_{n∈S}. Now, in most situations, such an observation of the dynamical system will be lossy, i.e., it will capture only some aspects of the observed dynamical system. Much of the dynamics will be lost. For example, the digital camera will not be able to register objects hidden behind other objects; moreover, it will not see objects smaller than one pixel, or their movements, until they pass from one pixel to another. However, it may happen that after a while each object will eventually become visible, and that we will be able to reconstruct its trajectory from the recorded information.
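In code, the shift transformation is nothing more than re-indexing. A minimal sketch (the alphabet and sample sequences are illustrative, not from the text): for unilateral sequences we can only display σ on a finite window, while a periodic bilateral sequence is represented exactly by its repeating block, on which σ acts as a rotation.

```python
def shift(window):
    """One application of the shift σ to a finite window (x_0, ..., x_{n-1})
    of a unilateral sequence: the result is (x_1, ..., x_{n-1}); one symbol
    falls off the edge because the window is finite."""
    return window[1:]

def shift_periodic(block):
    """σ acting on the periodic bilateral sequence ...block block block... :
    shifting the enumeration by one rotates the repeating block."""
    return block[1:] + block[:1]

x = (0, 1, 1, 0, 1)          # a window over the alphabet Λ = {0, 1}
print(shift(x))              # (1, 1, 0, 1)
print(shift_periodic(x))     # (1, 1, 0, 1, 0)
```

Note that on the full bilateral space Λ^Z the shift is invertible, while the finite-window version above loses one symbol per step, mirroring the unilateral case.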
Of course, lossy digitalization is always possible and hence presents a lesser kind of challenge. We will be much more interested in lossless digitalization. When is it possible to digitalize a dynamical system so that no information is lost, i.e., in such a way that after viewing the entire sequence of symbols we can reconstruct the trajectory of every smallest particle in the system? Well, it is certainly so when the dynamical system under observation is not too complicated: when its rigidly moving particles are few and large, the motion between the integer time moments is fully determined by the positions at the integer moments, and at such moments each particle has only finitely many available positions. In other words, when the system is discrete in every aspect. But is this the only case?
The answer is no. At least at the purely theoretical level, the variety of systems that allow lossless digitalization is much larger. The class depends on the kind of approach we assume. We will concentrate on two levels: measure-theoretic and topological. Assuming the measure-theoretic point of view, each discrete time dynamical system is the action of a measure-preserving transformation on a measure space. We do not care about distances between particles; all we care about is partitions and the probabilities with which the particles occupy the cells of these partitions. Here we are completely settled within the realm of ergodic theory. Assuming the topological point of view we do care about distances, but only up to preservation of convergence, i.e., we respect open and closed sets. In this setup we are within the realm of topological dynamics.
In the first, ergodic theoretic context, the question about "lossless digitalizability" of a system is relatively easy to answer. For automorphisms of probability spaces it is completely solved by Krieger's celebrated Generator Theorem: an automorphism T of a probability space (X, F, µ) is isomorphic to the shift on a symbolic space Λ^Z (equipped with some shift-invariant measure) if and only if the Kolmogorov-Sinai entropy h_µ(T) of the automorphism is finite.
For endomorphisms, although the theorem no longer applies (in full generality), we can employ the notion of natural extension. If T is an endomorphism of a probability space and has finite entropy, then its natural extension is an automorphism and has the same finite entropy. By the Krieger Theorem, this natural extension is isomorphic to a symbolic system. The original endomorphism becomes a measure-theoretic factor of the symbolic system. The natural extension in its digital (i.e., symbolic) form clearly contains complete information about all its factors, in particular about the original endomorphism, which in this manner becomes losslessly digitalized.
On the other hand, any system (automorphism or endomorphism) of infinite entropy can be neither represented nor embedded in a symbolic system, because all symbolic systems have finite entropy. So, any digitalization of an infinite entropy system must be lossy. We have thus fully characterized the measure-theoretic systems (on probability spaces) which are losslessly digitalizable: these are precisely the systems of finite Kolmogorov-Sinai entropy. The digitalization is then isomorphic either to the system itself or, at worst, to its natural extension.
At the level of topological dynamics this problem is much more complicated. Here, given a topological dynamical system (X, T) (X is a compact metric space and T : X → X is a continuous map, perhaps a homeomorphism), we seek its digitalization in the form of some, also topological, symbolic system. These are constituted by compact, shift-invariant subsets of the symbolic spaces Λ^S equipped with the action of the shift transformation σ. Such systems are called subshifts for short. There are slight differences in our understanding of shift-invariance for unilateral and bilateral sequences, but we skip these details here.
If we desire a symbolic system (subshift) (Y, σ) that carries all the information about a given topological dynamical system (X, T), respecting its topological structure, a number of rather obvious limitations immediately pop out. First of all, we have very little chance to create a symbolic system that would be topologically isomorphic (i.e., conjugate) to (X, T). Only expansive maps on zero-dimensional spaces are conjugate to subshifts, and these properties are rather exceptional among topological dynamical systems. In every other case we can only hope to build a symbolic extension, i.e., a subshift (Y, σ) of which (X, T) would be a topological factor. There are equally little chances that the extension will be conjugate to the topological natural extension of (X, T). The natural extension would have to be zero-dimensional and expansive, which implies that X is itself zero-dimensional and T nearly (not exactly but close to) expansive. So, the symbolic extension (Y, σ), if one exists, will usually be something else than (X, T) or its natural extension. Such a (Y, σ) will contain other "unwanted" dynamics joined with the dynamics of (X, T). It may even have necessarily larger topological entropy! Unlike in the measure-theoretic case, finite entropy (this time topological) does not even guarantee the existence of a symbolic extension. This is a phenomenon first discovered by Mike Boyle, whose interest in this subject was provoked by a question of Joe Auslander. Mike Boyle also indicated examples of systems with finite topological entropy such that symbolic extensions do exist, but all have topological entropy essentially larger (by some constant) than that of (X, T).
In this manner we are led to studying the following general problem, which we can summarize in the two questions below, concerning a given topological dynamical system (X, T). QUESTION 1: Does there exist a topological symbolic extension (Y, σ) of (X, T)? In other words, is (X, T) a topological factor of some subshift? QUESTION 2: If yes, what is the infimum of the topological entropies of all its symbolic extensions?
These two questions (and some related ones) have triggered the creation of a relatively new branch in topological dynamics, the theory of symbolic extensions. It should not be surprising that this theory is embedded in the theory of entropy of topological dynamical systems. In fact, it led to some new developments in this theory, namely the discovery of some new entropy-related notions and invariants of topological conjugacy. It turns out that in order to handle the two major questions posed above, one needs to focus not only on the topological entropies of the involved systems (the system (X, T) and its symbolic extensions (Y, σ)), but also on the measure-theoretic (Kolmogorov-Sinai) entropies of all invariant measures supported by these systems. The two key notions of the theory are defined below. Definition 1.1. Let (X, T) be a topological dynamical system. The topological symbolic extension entropy of (X, T) is defined as follows: h_sex(X, T) = inf{h_top(Y, σ) : (Y, σ) is a symbolic extension of (X, T)} (with the convention inf ∅ = ∞). A refinement of this notion at the level of invariant measures is provided below.
Definition 1.2. Let (X, T) be a topological dynamical system and let P_T(X) denote the set of all T-invariant measures µ on X. Let (Y, S) be a topological extension of (X, T) and let π : Y → X be the corresponding factor map. On P_T(X) we define the extension entropy function by the formula h^π_ext(µ) = sup{h_ν(S) : ν ∈ P_S(Y), πν = µ}. Then, on P_T(X) we define the symbolic extension entropy function by h_sex(µ) = inf{h^π_ext(µ) : π ranges over all symbolic extensions of (X, T)}. One of the fundamental tools in the theory of symbolic extensions is the following theorem (one inequality is obvious, the other requires some machinery): Theorem 1.3 (the symbolic extension entropy variational principle). h_sex(X, T) = sup{h_sex(µ) : µ ∈ P_T(X)}. The main task of the theory of symbolic extensions reduces to solving the following problem: PROBLEM 1: Compute (or estimate) h_sex for a given system (X, T) using its internal properties.
Notice that the definition of h_sex is so constructed that solving Problem 1 answers both of the formerly formulated Questions 1 and 2. In full generality, the problem so phrased has been solved in the paper [3], and then refined in [12]. The solution is in terms of the so-called entropy structure, a carefully selected sequence of functions on P_T(X), which reflects the emergence of the entropy of different measures at refining scales. Crucial are the upper semicontinuity properties of these functions and the multiple defect of uniformity in their convergence. The reason why these items are so essential can very roughly and briefly be explained as follows: In the system (X, T), some invariant measures may reveal all of their entropy already at large scale (as in expansive systems); other measures may need a very small scale (i.e., fine covers) for their entropy to be detected. Now, in the symbolic extension (Y, σ), the small scale dynamics must be "magnified" and become visible at the large scale of the symbolic system (in symbolic systems all dynamics happens at large scale). If the "large scale measures" are approximated in P_T(X) by the "small scale measures", the magnification of small scale dynamics may lead to enlarging the entropy of large scale dynamics. This causes the overall entropy of the symbolic extension to grow.
In this course we will concentrate on a more particular problem, concerning smooth maps. We will show how this problem is solved in dimension one, i.e., for smooth maps of the interval or of the circle, in terms of much more familiar parameters, such as the degree of smoothness r and the (slightly refined) Lipschitz constant.

The history of research on topological symbolic extensions
The first result concerning symbolic extensions in topological dynamics is due to William Reddy and goes back to 1968 ([21]). It says that every expansive homeomorphism T on a compact metric space has a symbolic extension. The construction provided no control over the entropy of this extension.
It was clear that expansiveness was a much too strong requirement. All known examples of finite entropy systems seemed to admit symbolic extensions. One of the spectacular applications of symbolic extensions occurs in the study of hyperbolic systems. Using Markov partitions, such systems can be lifted to subshifts of finite type, which allows one to apply symbolic dynamical methods to the hyperbolic systems. This approach belongs to the classics; it is described for example in Bowen's book [1]. Generally, however, very little was known. The natural question whether all finite entropy systems indeed have symbolic extensions presumably puzzled many people between the years 1970 and 1990. Around 1989, Joe Auslander addressed this question to Mike Boyle, one of the best experts in symbolic dynamics. Within some time (less than a year), Boyle came up with the negative answer, by constructing an appropriate example. A version of the same example showed that even if a system does admit a symbolic extension, there may exist a necessary gap between the entropy of the system and that of any symbolic extension. He called this gap the residual entropy. These examples were presented at the Adler conference in 1991, but not published until 2002 (after the author of this note had already published his own version of Boyle's examples in 2001). These examples proved only one thing: there is no easy answer to the Questions 1 and 2 stated in the preceding section.
For the next 8 years, the progress was rather limited and not published. Mike Boyle collaborated in this matter with Doris and Ulf Fiebig. They tried to construct symbolic extensions by means of symbolic and topological methods (without using invariant measures), which, from today's perspective, explains why their results were so restricted.
Around 1998 the same problem was encountered by the author of this note. Together with Fabien Durand, they were characterizing all factors of so-called Toeplitz flows, and one of the three conditions for a system to be such a factor was that it admits some symbolic extension ([13]). It soon turned out that nobody knew any general criteria for that. Mike Boyle was able to say that any system of entropy zero has a symbolic extension also of entropy zero ([2]), which was very useful for the study of factors of Toeplitz flows.
In 1999, the author of this note spent a month in Marseille, devoting all his energy to trying to understand why some systems have, and others do not have, symbolic extensions. For simplicity, he focused on zero-dimensional systems, which seemed to be the best class to study. He discovered that the existence of symbolic extensions depends on the distribution of entropy among the invariant measures, which led to the first result containing criteria for the existence, and an estimate of the topological entropy, of symbolic extensions for general zero-dimensional systems ([11]). In particular, he showed that an asymptotically h-expansive zero-dimensional system admits a symbolic extension of the same topological entropy. In the same paper he published the already mentioned examples based on those by Mike Boyle.
A year later, Boyle and the Fiebigs published a long paper containing the results of their long-lasting collaboration ([4]). The old examples appear there in the original version, next to new ones, where the transformation is on a disc and is differentiable at all but one point. In terms of positive results, all asymptotically h-expansive systems (not necessarily zero-dimensional) are shown to possess principal symbolic extensions, i.e., such that not only the topological entropy is the same as that of (X, T), but also the Kolmogorov-Sinai entropy of every invariant measure is the same as that of its image in the system (X, T). Since expansive systems are asymptotically h-expansive, we recover here a refined version of Reddy's first result. Since any system of entropy zero is asymptotically h-expansive, we also recover the fact communicated earlier by Boyle to the author of this note. Another spectacular application, neatly included in [4], concerns smooth maps. Shortly before, Jerome Buzzi had proved that any C^∞ map on a Riemannian manifold is in fact asymptotically h-expansive ([9]). (Many years earlier Sheldon Newhouse proved a seemingly weaker statement [20], which from today's perspective is equivalent to Buzzi's result.) Now, this fact receives a new meaning: every C^∞ map on a manifold admits a principal symbolic extension. If we agree that symbolic extensions are "lossless digitalizations", then principal symbolic extensions can be regarded as "gainless" (without superfluous information) digitalizations. The fact that all C^∞ maps can be losslessly and gainlessly digitalized became one of the iconic achievements of the theory of symbolic extensions. However, an immediate question arises: what about C^r maps, where r < ∞?
In 2001, the author of this note visited Mike Boyle. Leaving the smooth systems aside, they worked on the general theory. Their work [3] contains the complete and general characterization of the symbolic extension entropy function h_sex. It also contains the aforementioned variational principle for the symbolic extension entropy. Problem 1, and both questions stated in the preceding section, became completely solved. The solution still refers to zero-dimensional systems: each system with finite entropy is first shown to possess a principal zero-dimensional extension (using the theory of mean dimension, by E. Lindenstrauss and B. Weiss [17, 16]), and then it is shown how to build a symbolic extension of a zero-dimensional system. The notion of an entropy structure, the key tool to compute the symbolic extension entropy function, is introduced for zero-dimensional systems. A criterion is provided for when the symbolic extension entropy function is attained, i.e., when a symbolic extension exists whose extension entropy function matches the symbolic extension entropy function (an "optimal" digitalization).
The next year, the author of this work developed a consistent theory of entropy structures for general topological dynamical systems ([12]). Among other things, this allowed the phrasing of several results from the preceding work to be simplified, by skipping the intermediate stage of a zero-dimensional extension. The theory of entropy structures, although its importance rests upon the application to symbolic extensions, has gained an independent interest, and several papers appeared devoted to other aspects of the entropy structure theory ([8, 18]).
At the same time the author collaborated with Sheldon Newhouse. The focus of this collaboration was on smooth maps on Riemannian manifolds. The obtained results ([15]) are of a negative nature: roughly speaking, they prove that (in some class) a typical C^1 system in dimension d ≥ 2 admits no symbolic extension at all (infinite symbolic extension entropy), while a typical C^r map, where 1 < r < ∞ (also for d ≥ 2), does not admit a principal symbolic extension (without saying that it does admit a symbolic extension). In their examples the gap between the entropy of the system and the entropy of a symbolic extension (the residual entropy) is bounded below by some term (which we denote here by R) proportional to the logarithm of the Lipschitz constant and inversely proportional to r − 1. They formulated a conjecture that the residual entropy in their examples is the worst possible, i.e., that every C^r map with r > 1 does admit a symbolic extension, and the symbolic extension entropy is, in the worst case, equal to the entropy plus R.
This conjecture triggered a number of papers containing partial results, and in all cases the conjecture has been confirmed. In 2005 the author of this note, jointly with Alejandro Maass, proved the conjecture in dimension d = 1 ([14], the subject of this course). This result was then complemented by David Burguet, who provided examples of C^r interval maps showing the estimate R for the residual entropy to be sharp ([5]). In the meantime, Lorenzo Díaz and Todd Fisher proved related results for partially hyperbolic diffeomorphisms ([10]). Recently, Burguet proved the conjecture in two more cases: for C^r nonuniformly expanding maps (such that every invariant measure of positive entropy has all Lyapunov exponents nonnegative) on manifolds of any dimension and for any r > 1 ([6]), and, even more recently, for any C^2 surface diffeomorphism ([7]). The general case of a C^r map (or diffeomorphism) on a compact manifold of dimension d remains an open problem, and Burguet's latest result for d = 2 is the most advanced step toward the full solution.

Introduction to entropy structures
For the purposes of this course, we will not need the general definition of an entropy structure. It suffices to know that any entropy structure has the form of a sequence of functions h_k : P_T(X) → [0, ∞) such that h_k(µ) ↗ h(µ) for every invariant measure µ. Sometimes it is better to consider the tails θ_k = h − h_k. Then we have θ_k ↘ 0 pointwise. Not all sequences (θ_k)_{k≥1} converging monotonically to zero are entropy structures; there are additional conditions on how they converge in reference to the dynamics. Still, there are many possible entropy structures in one dynamical system, but they are all equivalent to each other in a specific sense. Instead of listing the conditions which classify a given sequence (θ_k) as an entropy structure, we will simply specify one particular such sequence, which has been proved to satisfy these conditions in the paper [12]. Only this entropy structure will be used throughout this course. The precise description of this sequence will be given in the next section.
So suppose we have already chosen an entropy structure (θ_k). This allows us to compute the symbolic extension entropy function h_sex. The derivation of h_sex from the entropy structure is via the "transfinite sequence" (u_α) of functions on P_T(X), defined as follows. Step 0: u_0 ≡ 0. Successor step: u_{α+1} = lim_k (u_α + θ_k)~, where g~ denotes the upper semicontinuous envelope of a function g (recall that g~(x) = lim sup_{y→x} g(y)). Limit step: for a limit ordinal α, u_α = (sup_{β<α} u_β)~.

Theorem 3.1 ([3]). There exists a countable ordinal α_0 such that u_α = u_{α_0} for every α ≥ α_0, and h_sex(µ) = u_{α_0}(µ) + h(µ) for every µ ∈ P_T(X). Combining this with Theorem 1.3 we get h_sex(X, T) = sup{u_{α_0}(µ) + h(µ) : µ ∈ P_T(X)}. As a digression, let us mention that the theory of entropy structures allows one to characterize the famous Misiurewicz parameter h*(T) (the one used to define asymptotically h-expansive systems, by h*(T) = 0) as the pointwise supremum of the function u_1: h*(T) = sup{u_1(µ) : µ ∈ P_T(X)}. The two parameters appear at opposite poles of the transfinite sequence: h*(T) is the "supremum of the first order", while h_sex(T) is the "supremum of all orders". The participation of h(µ) in only one of the suprema causes the two notions not to be related by any inequality. Only one implication holds in general: h*(T) = 0 =⇒ h_sex(T) = h_top(T). In fact we have the equivalence: h*(T) = 0 ⟺ (X, T) admits a principal symbolic extension. This is to say, asymptotically h-expansive systems are exactly those which admit a principal symbolic extension ([4]).

The Newhouse entropy structure
Now we provide the definition of local entropy, created by Sheldon Newhouse in 1989. Later, the author of this note verified that local entropy with respect to a refining sequence of open covers becomes an entropy structure. Below we use the following notation: F is any Borel subset of X, V is an open cover of X, and V^n_x is any set containing x and having the form V_0 ∩ T^{-1}(V_1) ∩ ··· ∩ T^{-(n-1)}(V_{n-1}), where V_i ∈ V. A set E is (n, δ)-separated if for any two points y, y′ ∈ E there is some i ∈ {0, 1, ..., n − 1} with d(T^i y, T^i y′) ≥ δ. Also, µ denotes an invariant measure and σ is a positive number smaller than 1. We put h_New(X|F, V) = lim_{δ→0} lim sup_n (1/n) log sup_x max{#E : E ⊂ V^n_x ∩ F, E is (n, δ)-separated}, and, for ergodic µ, h_New(X|µ, V) = lim_{σ→0} inf{h_New(X|F, V) : µ(F) > 1 − σ}.
We extend the function h_New(X|·, V) to all of P_T(X) by averaging over the ergodic decomposition. This function is called the local entropy function given the cover V.
The Newhouse entropy structure is obtained as the sequence of tails θ_k(µ) = h_New(X|µ, V_k), where (V_k) is a sequence of open covers, each finer than the preceding one, and with the maximal diameters of their elements decreasing to zero. This is indeed an entropy structure ([12]).
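The counting behind these definitions can be made concrete. A small numerical sketch (the map, grid and parameters are illustrative, not from the text): a greedy construction of (n, δ)-separated sets for the tent map on [0, 1], whose exponential growth rate in n reflects the topological entropy log 2. The localization to sets V^n_x is omitted, so this illustrates only the separation counting.

```python
import math

def tent(x):
    # The tent map, a piecewise linear interval map with entropy log 2.
    return 2 * x if x < 0.5 else 2 * (1 - x)

def orbit(f, x, n):
    ys = []
    for _ in range(n):
        ys.append(x)
        x = f(x)
    return ys

def greedy_separated(f, points, n, delta):
    """Greedily extract an (n, delta)-separated subset E of `points`:
    any two y, y' in E satisfy |f^i(y) - f^i(y')| >= delta for some
    0 <= i < n. Greedy selection yields a valid (not necessarily
    maximal) separated set."""
    kept_orbits = []
    E = []
    for y in points:
        o = orbit(f, y, n)
        if all(max(abs(a - b) for a, b in zip(o, o2)) >= delta
               for o2 in kept_orbits):
            E.append(y)
            kept_orbits.append(o)
    return E

grid = [k / 4096 for k in range(4096)]
for n in (4, 6, 8):
    E = greedy_separated(tent, grid, n, 0.5)
    print(n, len(E), round(math.log(len(E)) / n, 3))
```

The printed rate (1/n) log #E stays below log 2 (plus a correction coming from δ), in line with the variational picture.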

Key ingredients in the one-dimensional result
In this section we state the main result of [14] and two key theorems leading to it. The first one, called "The Antarctic Theorem", is an estimate of local entropy for C^r interval (or circle) maps. The exotic name of the theorem comes from the fact that the breakthrough in proving it was made during the author's trip to Antarctica, in fact while he was spending a sleepless night camping on the snow on one of the Antarctic islands. This is the only statement in this course which uses the specific properties of the interval. The second intermediate result, called "The Passage Theorem", can be phrased as a completely general fact and in this form has already been used by Burguet in his two latest results. Its name reflects the fact that it provides a "bridge" between the local entropy estimate of the preceding theorem and the final estimate of the symbolic extension entropy function, given in the main result. One can also associate the name with the Drake Passage, where, returning from Antarctica, the author attempted to apply his discovery to symbolic extensions (which was accomplished after returning to Santiago, with the help of the coauthor A. Maass). This section also contains the derivation of the Estimate Theorem from the two intermediate theorems.
The detailed proofs of the Antarctic, Passage and Estimate Theorems can be found in [14]. In this course, we sketch these proofs, skipping some details. Instead, we will try to be more convincing by illustrating some of the arguments with figures.
Let f be a C^r transformation of the interval or of the circle X, where r > 1. Let µ ∈ P_f(X). We denote χ(µ) = ∫ log |f′| dµ and χ_0(µ) = max{χ(µ), 0}. (For ergodic measures in dimension one, χ(µ) is the Lyapunov exponent.) Theorem 5.1 (The Antarctic Theorem). Fix some γ > 0. For each µ ∈ P_f(X) there exists an open cover V of X such that h_New(X|ν, V) ≤ (χ_0(µ) − χ_0(ν))/(r − 1) + γ for every ergodic measure ν in an open neighborhood of µ in P_f(X).
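The integral χ(µ) = ∫ log |f′| dµ can be approximated, for an ergodic µ, by a Birkhoff average along a typical orbit. A minimal numerical sketch (the map and starting point are illustrative, not from the text): for the logistic map f(x) = 4x(1 − x), the absolutely continuous invariant measure has Lyapunov exponent log 2.

```python
import math

def logistic(x):
    return 4.0 * x * (1.0 - x)

def birkhoff_lyapunov(f, log_df, x0, n):
    """Estimate χ(ν) = ∫ log|f'| dν by the Birkhoff average
    (1/n) Σ log|f'(x_i)| along the orbit x_0, f(x_0), f^2(x_0), ..."""
    x, total = x0, 0.0
    for _ in range(n):
        total += log_df(x)
        x = f(x)
    return total / n

# |f'(x)| = |4 - 8x| for the logistic map.
est = birkhoff_lyapunov(logistic,
                        lambda x: math.log(abs(4.0 - 8.0 * x)),
                        0.123, 200_000)
print(est)  # close to log 2 ≈ 0.693
```

For non-ergodic µ the analogous quantity is obtained by averaging the ergodic values, which is exactly the role played by the tilde operation below.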
The Passage Theorem says the same, but without assuming ergodicity of ν. The function χ̃_0(µ) is defined by averaging χ_0 over the ergodic decomposition of µ. Since χ_0 is evidently convex, χ̃_0 is usually slightly larger than χ_0 (except at ergodic measures, where the two are equal).
Theorem 5.2 (The Passage Theorem). Fix some γ > 0. For each µ ∈ P_f(X) there exists an open cover V of X such that h_New(X|ν, V) ≤ (χ̃_0(µ) − χ̃_0(ν))/(r − 1) + γ for every invariant measure ν in an open neighborhood of µ in P_f(X).
The main result is this: Theorem 5.3 (The Estimate Theorem). Let f be a C^r transformation of the interval or of the circle X, where r > 1. Then h_sex(µ) ≤ h(µ) + χ̃_0(µ)/(r − 1) for every µ ∈ P_f(X). As a consequence, by the symbolic extension entropy variational principle (and the usual variational principle), h_sex(X, f) ≤ h_top(f) + log⁺ L(f)/(r − 1), where L(f) denotes the Lipschitz constant of f. Remark 5.4. The Lipschitz constant can easily be replaced by the smaller constant R(f) = lim_n (1/n) log L(f^n), where f^n denotes the composition power of f. We now describe how the Estimate Theorem is deduced from the Passage Theorem. This is fairly easy.
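The passage from the measure-wise estimate to the topological one is only arithmetic on top of the two variational principles; a sketch (using that the derivative is bounded by the Lipschitz constant, so χ̃_0(µ) ≤ log⁺ L(f) for every invariant µ):

```latex
\begin{align*}
h_{\mathrm{sex}}(X,f)
  &= \sup_{\mu \in P_f(X)} h_{\mathrm{sex}}(\mu)
      &&\text{(symbolic extension entropy variational principle)}\\
  &\le \sup_{\mu \in P_f(X)} \Big( h(\mu) + \tfrac{\widetilde{\chi_0}(\mu)}{r-1} \Big)
      &&\text{(the measure-wise estimate)}\\
  &\le \sup_{\mu \in P_f(X)} h(\mu) + \tfrac{\log^{+} L(f)}{r-1}
      &&\big(|f'| \le L(f)\ \text{implies}\ \widetilde{\chi_0}(\mu) \le \log^{+} L(f)\big)\\
  &= h_{\mathrm{top}}(f) + \tfrac{\log^{+} L(f)}{r-1}
      &&\text{(usual variational principle)}.
\end{align*}
```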
So, assume the Passage Theorem holds. The Ruelle inequality (h(ν) ≤ χ_0(ν) for ergodic ν, see [22]) easily implies that for any invariant measure ν we also have h(ν) ≤ χ̃_0(ν). Since the tails of an entropy structure never exceed the entropy function, for ν sufficiently close to µ we thus have h_New(X|ν, V) ≤ min{χ̃_0(ν), (χ̃_0(µ) − χ̃_0(ν))/(r − 1) + γ}. Clearly, h_New(X|ν, V) (as well as χ̃_0(ν)) cannot be negative. The situation is illustrated on the figure below. The horizontal axis represents all measures ν in the vicinity of µ, parametrized by χ̃_0(ν); on the vertical axis we have the upper bound for h_New(X|ν, V): the minimum of the increasing line χ̃_0(ν) and the decreasing line (χ̃_0(µ) − χ̃_0(ν))/(r − 1) + γ. It is seen from this picture (which replaces elementary calculations) that, for all considered measures ν, h_New(X|ν, V) ≤ χ̃_0(µ)/r + γ. Plugging this into the definition of the transfinite sequence, we obtain u_1(µ) ≤ χ̃_0(µ)/r + γ. We proceed by transfinite induction. Suppose u_β(ν) ≤ χ̃_0(ν)/(r − 1) for every invariant measure ν and all ordinals β < α. Then, near the measure µ, there holds u_β(ν) + h_New(X|ν, V) ≤ χ̃_0(ν)/(r − 1) + min{χ̃_0(ν), (χ̃_0(µ) − χ̃_0(ν))/(r − 1) + γ} ≤ χ̃_0(µ)/(r − 1) + γ. The situation is shown on the figure below.
We are using the fact that χ̃_0 is an upper semicontinuous function; hence in a sufficiently small vicinity of µ all measures ν satisfy χ̃_0(ν) ≤ χ̃_0(µ) + γ. This is why the domain of the graph extends only a bit beyond χ̃_0(µ) (further to the right the bound would grow, so we are happy not to have to include that part). By passing to the upper limit as ν approaches µ, we get u_α(µ) ≤ χ̃_0(µ)/(r − 1) + γ. Since γ was arbitrary, by transfinite induction, u_α(µ) ≤ χ̃_0(µ)/(r − 1) for all ordinals α, including α_0. Now, using the transfinite characterization of the symbolic extension entropy, we get the desired result: h_sex(µ) = u_{α_0}(µ) + h(µ) ≤ h(µ) + χ̃_0(µ)/(r − 1).

Sketch of the proof of the Antarctic Theorem
The proof relies on the following, fairly elementary counting lemma. Lemma 6.1. Let g : [0, 1] → R be a C^r function, where r > 0. Then there exists a constant c > 0 such that for every 0 < s < 1 the number of components of the set {x : g(x) ≠ 0} on which |g| reaches or exceeds the value s is at most c · s^{−1/r}. Proof. For 0 < r ≤ 1, g is Hölder, i.e., there exists a constant c_1 such that |g(x) − g(y)| ≤ c_1 |x − y|^r for all x, y. If |g(x)| ≥ s, then the distance from x to the nearest zero of g is at least (s/c_1)^{1/r}. The component containing x is at least that long, and the number of such components is at most c · s^{−1/r}, where c = c_1^{1/r}. For r > 1 one reduces to the previous case by considering the derivatives of g on the components determined by its critical points. Jointly, the number of all components on which |g| reaches s is at most 2 + (c + 1) · s^{−1/r} ≤ c′ · s^{−1/r} (the number 2 is added because the above argument does not apply to the extreme components, which need not contain critical points).
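Lemma 6.1 can be checked numerically on an example. In the sketch below (the test function and grid are illustrative, not from the text), g(x) = x³ sin(1/x) is of class C¹ on [0, 1], so with r = 1 the lemma permits on the order of s^{−1} qualifying components; for this particular g the count actually grows like s^{−1/3}, comfortably within the bound.

```python
import math

def g(x):
    # C^1 on [0, 1] (extended by g(0) = 0); oscillates with amplitude ~ x^3.
    return x ** 3 * math.sin(1.0 / x)

def count_components(func, s, xs):
    """Count the components of {x : func(x) != 0} (approximated by sign
    changes on the grid xs) on which |func| reaches or exceeds s."""
    count = 0
    cur_max = 0.0
    prev_sign = 0
    for x in xs:
        v = func(x)
        sign = (v > 0) - (v < 0)
        if prev_sign != 0 and sign != 0 and sign != prev_sign:
            if cur_max >= s:
                count += 1
            cur_max = 0.0
        if sign != 0:
            prev_sign = sign
        cur_max = max(cur_max, abs(v))
    if cur_max >= s:
        count += 1
    return count

xs = [i * 5e-6 for i in range(1, 200001)]   # grid on (0, 1]
c_mild = count_components(g, 1e-3, xs)
c_fine = count_components(g, 1e-6, xs)
print(c_mild, c_fine)   # the count grows as s shrinks
```

The oscillations of g accumulating at 0 show why the extreme components need separate treatment in the proof: near the endpoint the argument via critical points does not apply.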
For g = f′ we obtain the following. Corollary 6.2. Let f : [0, 1] → [0, 1] be a C^r function, where r > 1. Then there exists a constant c > 0 such that for every s > 0 the number of branches of monotonicity of f on which |f′| reaches or exceeds s is at most c · s^{−1/(r−1)}. Next we apply the above to counting the possible ways by which a point, whose derivative along the composition power of f is bounded below, may traverse the branches of monotonicity. We make a formal definition. Definition 6.3. Let f be as in the formulation of Corollary 6.2. Let I = (I_1, I_2, ..., I_n) be a finite sequence of branches of monotonicity of f (i.e., any formal finite sequence whose elements belong to the countable set of branches, admitting repetitions). Denote a_i = min{−1, max_{x∈I_i} log |f′(x)|}. Choose S ≤ −1. We say that I admits the value S if (1/n) Σ_{i=1}^n a_i ≥ S. Notice that if there exists a sequence of points y_i ∈ I_i with log |f′(y_i)| ≤ −1 for each i, satisfying (1/n) Σ_{i=1}^n log |f′(y_i)| ≥ S, then I admits the value S. Lemma 6.4. Let f : [0, 1] → [0, 1] be a C^r function, where r > 1. Fix ǫ > 0. Then there exists S_ǫ ≤ −1 such that for every n and S < S_ǫ the logarithm of the number of sequences I of length n which admit the value S is at most −nS(1/(r − 1) + ǫ). Proof. Without loss of generality assume that S is a negative integer. Let I be a sequence of n branches of monotonicity which admits the value S.
Denote k_i = ⌊a_i⌋. Then (−k_i) is a sequence of n positive integers with sum at most n(1 − S). Now, in a given sequence (k_i), each value k_i may be realized by any branch of monotonicity on which max log |f′| lies between k_i and k_i + 1 (or just exceeds −1 if k_i = −1). From Corollary 6.2 it follows that there are no more than c · e^{-k_i/(r-1)} such branches for each k_i. Jointly, the logarithm of the number of sequences of branches of monotonicity corresponding to one sequence (k_i) is at most n log c + (1/(r − 1)) Σ_{i=1}^n (−k_i) ≤ n log c + n(1 − S)/(r − 1).

Lemma 6.5. Let f be a C^r transformation of the interval or of the circle X, where r > 1. Let U and V be as described above. Let ν be an ergodic measure and let S(ν) = ∫_U log |f′| dν. Then

h_New(X|ν, V) ≤ (1 + ε)(−S(ν))/(r − 1). (6.1)

Proof. Let F be the set of points on which the nth Cesaro means of the function 1_U log |f′| are close to S(ν) for n larger than some threshold integer (we are using the ergodic theorem; such a set F can have measure larger than 1 − σ). For x ∈ F and large n consider a set V^n_x = ∩_{i=0}^{n-1} f^{-i} V_i containing x, with V_i ∈ V (as in the definition of local entropy). Consider the finite subsequence of times 0 ≤ i_j ≤ n − 1 at which V_{i_j} = U. Let nζ denote the length of this subsequence and assume ζ > 0. For a fixed δ let E be an (n, δ)-separated set in V^n_x ∩ F and let y ∈ E.
The sequence (i_j) contains only some (usually not all) of the times i at which f^i(y) ∈ U. Thus, since y ∈ F, we have, up to a small error,

(1/n)(Σ_j log |f′(f^{i_j}(y))| + A) ≥ S(ν),

where A is the similar sum over the times of visits to U not included in the sequence (i_j). Clearly A ≤ 0, so it can be skipped. Dividing by ζ we obtain

(1/(nζ)) Σ_j log |f′(f^{i_j}(y))| ≥ S(ν)/ζ.

The right-hand side above is smaller than S_ε. This implies that along the subsequence (i_j) the trajectory of y traverses a sequence I (of length nζ) of branches of monotonicity of f admitting the value S(ν)/ζ smaller than S_ε. By Lemma 6.4, the logarithm of the number of such sequences I is dominated by

nζ(1 + ε)(−S(ν)/ζ)/(r − 1) = n(1 + ε)(−S(ν))/(r − 1). (6.2)

At times i other than i_j the set V_i contains only one branch, so if two points from V^n_x ∩ F traverse the same sequence of branches along the times (i_j), they traverse the same full sequence of branches along all times i = 0, 1, ..., n − 1. The number of (n, δ)-separated points which, along all times i = 0, 1, ..., n − 1, traverse the same given sequence of branches of monotonicity is negligibly small. This, together with (6.2), implies that the logarithm of the cardinality of E can be only negligibly larger than (6.2). The proof is concluded by dividing by n and letting n → ∞.
Proof of the Antarctic Theorem. Fix an invariant measure µ and some γ > 0. We need to consider only ergodic measures ν close to µ. If χ(µ) < 0 then, by upper semicontinuity of the function χ, for ν sufficiently close to µ also χ(ν) < 0, so by the Ruelle inequality (and since always h_New(X|ν, V) ≤ h(ν)) we get h_New(X|ν, V) = 0, and the assertion holds for any open cover.

Now suppose that χ(µ) ≥ 0. Clearly, then µ(C) = 0. Since log |f′| is µ-integrable, the open neighborhood U of C (on which log |f′| < S_ε) can be made so small that the (negative) integral of log |f′| over the closure Ū of U is very close to zero (closer than some ε). Then

∫_{X∖Ū} log |f′| dµ < χ(µ) + ε. (6.3)

The integral in (6.3) is an upper semicontinuous function of the measure (X∖Ū is an open set on which log |f′| is finite and continuous, and log |f′| is negative on its boundary), hence (6.3) holds for all invariant measures ν in a neighborhood of µ. All the more, ∫_{X∖U} log |f′| dν < χ(µ) + ε (we have included the boundary in the set of integration, and the function is negative on that boundary). Then

−S(ν) = ∫_{X∖U} log |f′| dν − χ(ν) < χ(µ) − χ(ν) + ε. (6.4)

We define the cover V with the above choice of the set U (recall, V consists of U and some intervals on which f is monotone). We can now apply Lemma 6.5. Substituting (6.4) into (6.1) we get

h_New(X|ν, V) ≤ (1 + ε)(χ(µ) − χ(ν) + ε)/(r − 1).

Of course, χ(µ) can be replaced by the not smaller number χ_0(µ). If χ(ν) < 0 then h_New(X|ν, V) = 0 ≤ (χ_0(µ) − χ_0(ν))/(r − 1), so in any case we can write

h_New(X|ν, V) ≤ (1 + ε)(χ_0(µ) − χ_0(ν) + ε)/(r − 1).

Because the function χ_0(·)/(r − 1) is bounded, the contribution of the error terms ε can be made smaller than the additive term γ.
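With the labels as reconstructed above (so this chain relies on our reconstructed forms of (6.1) and (6.4)), the estimates closing the proof can be written in one line; the last step replaces χ by χ_0 and absorbs the ε-terms into γ, using boundedness of χ_0/(r − 1):

```latex
h_{\mathrm{New}}(X\mid\nu,\mathcal{V})
  \;\overset{(6.1)}{\le}\; (1+\epsilon)\,\frac{-S(\nu)}{r-1}
  \;\overset{(6.4)}{\le}\; (1+\epsilon)\,\frac{\chi(\mu)-\chi(\nu)+\epsilon}{r-1}
  \;\le\; \frac{\chi_0(\mu)-\chi_0(\nu)}{r-1}+\gamma .
```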

Sketch of the proof of the Passage Theorem
In the Passage Theorem we need to drop the assumption that ν is ergodic. The key tool is the lemma below. Recall that the ergodic decomposition allows us to represent each invariant measure ν as the barycenter of a probability measure M_ν supported by the set of ergodic measures. In order to distinguish more easily between probability measures on P_T(X) and invariant measures on X (which are points in P_T(X)), we will consistently use the term "distribution" for probability measures on P_T(X), in particular for the ergodic (and nonergodic) decompositions of invariant measures. Below, by a joining of two distributions M, M′ on some space we understand any distribution on the Cartesian square of the space with marginals M and M′.

Lemma 7.1. In a topological dynamical system (X, T), let µ, ν_n ∈ P_T(X), and ν_n → µ in the weak* topology. Choosing a subsequence, we can assume that the ergodic decompositions M_{ν_n} converge to some distribution M on P_T(X). By continuity of the barycenter map, bar(M) = µ. Then, given any ε > 0, for n large enough there exists a joining J_n of M_{ν_n} and M such that J_n(∆^ε_e) > 1 − ε, where ∆^ε_e = {(ν, τ) ∈ P_T(X) × P_T(X) : ν is ergodic and dist(ν, τ) < ε}.
Proof. The proof is elementary, and we only sketch it. We partition P_T(X) into finitely many Borel sets F_i of diameter smaller than ε and with boundaries of M-measure zero. Then, for large n, the numbers M_{ν_n}(F_i) are very close to M(F_i) (for every index i).

for any ergodic ν in the ε_τ-neighborhood of τ. For each τ the Lebesgue number of V_τ is a positive number ξ_τ. Let ε be so small that ε_τ > ε and ξ_τ > ε for M-nearly all τ (belonging to a set P ⊂ P_T(X) with M(P) ≈ 1). We let V be an open cover by sets of diameter smaller than ε. This cover is finer than V_τ for M-nearly each τ, hence (7.1) holds for such τ, V and ε. By Lemma 7.1, for n large enough there exists a joining J_n of M_{ν_n} and M satisfying J_n(∆^ε_e) > 1 − ε. We fix such an n, let J_τ be the conditional of J_n given τ fixed on the second coordinate, and let ν_τ denote bar(J_τ). We have

∫ ν_τ dM(τ) = ν_n. (7.2)

By the properties of the joining J_n, for M-nearly all τ the distribution J_τ is nearly supported by the ε-neighborhood of τ. These conditions together imply that for M-nearly every τ the distribution J_τ is nearly supported by the ergodic measures ν which satisfy (7.1) for the cover V.
The idea of the above argument is presented in the figure. For simplicity, ν_n is shown as a convex combination of two ergodic measures ν_a and ν_b (the distribution M_{ν_n} is supported by these two points). The limit distribution M has barycenter µ and in this figure is also supported by two points, τ_a and τ_b (not necessarily ergodic; M need not be the ergodic decomposition of µ, which in this case is a convex combination of completely different measures µ_a and µ_b). The role of the joining J_n is to associate with each τ in the support of M a "part" of the distribution M_{ν_n}, called J_τ, nearly supported by a small neighborhood of τ. In the case shown in the figure, it associates to τ_a the point mass at ν_a, and to τ_b the point mass at ν_b.
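The cellwise scaled-product construction behind Lemma 7.1 can be sketched on a toy finite example (the points, cells and masses below are our own illustration, not data from the text): points stand for measures, the sets F_i of the partition become pairs of points, and J is built cell by cell.

```python
# Toy sketch of the joining of Lemma 7.1: J = sum_i  M(.|F_i) x (M_nu | F_i).
# Points 0..5 stand for (ergodic) measures; each pair of points is one
# partition cell F_i of diameter < epsilon.  Masses are illustrative.
cells = [(0, 1), (2, 3), (4, 5)]
M    = {0: 0.20, 1: 0.30, 2: 0.10, 3: 0.15, 4: 0.15, 5: 0.10}  # limit distribution M
M_nu = {0: 0.22, 1: 0.29, 2: 0.11, 3: 0.13, 4: 0.16, 5: 0.09}  # M_{nu_n}: cellwise close to M

J = {}
for cell in cells:
    m_cell = sum(M[a] for a in cell)                 # M(F_i)
    for a in cell:
        for b in cell:
            # scaled product: M normalized on the cell, paired with M_nu on the cell
            J[(a, b)] = (M[a] / m_cell) * M_nu[b]

# The second marginal of J is exactly M_nu; the first one is M reweighted
# cellwise by M_nu(F_i)/M(F_i), hence close to M.  All mass of J sits on
# pairs (a, b) inside one cell, i.e. within epsilon of the diagonal.
second = {b: sum(J[(a, b)] for a in cell) for cell in cells for b in cell}
first  = {a: sum(J[(a, b)] for b in cell) for cell in cells for a in cell}
assert all(abs(second[b] - M_nu[b]) < 1e-12 for b in M_nu)
assert all(abs(first[a] - M[a]) < 0.02 for a in M)
```

Because the cell masses M(F_i) and M_{ν_n}(F_i) agree only approximately, one marginal of this joining is exact and the other only approximate, which is why the lemma asserts J_n(∆^ε_e) > 1 − ε rather than a joining with exact marginals supported on the diagonal.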
Integrating both sides of (7.1) with respect to J_τ we get, for M-nearly every τ:

For larger r we proceed inductively: suppose that the lemma holds for r − 1. Let g be of class C^r. By elementary considerations of the graph of g, with each component I = (a_I, b_I) of the set {x : g(x) ≠ 0} we can disjointly associate an interval (x_I, y_I), so that |g| attains at x_I its maximum on I and y_I is a critical point lying to the right of I (see the figure below). There are two possible cases: either (a) y_I − x_I > s^{1/r}, or (b) y_I − x_I ≤ s^{1/r}. Clearly, the number of components I satisfying (a) is smaller than s^{-1/r}. If a component satisfies (b) and |g| exceeds s on it, then, by the mean value theorem, |g′| attains on (x_I, b_I) a value at least s/s^{1/r} = s^{(r-1)/r}. Because g′ is of class C^{r-1}, by the inductive assumption the number of such intervals (x_I, y_I) (hence of components I) does not exceed c · (s^{(r-1)/r})^{-1/(r-1)} = c · s^{-1/r}.

For large values of −S, the first term and the last 1 can be skipped at the cost of multiplying −S by (1 + ε). The number of all possible sequences (k_i) with sum at most n(1 − S) contributes negligibly on the logarithmic scale. So the logarithm of the number of all sequences of branches of monotonicity which admit the value S is, regardless of n, estimated from above as in the assertion.

Regardless of whether f is a transformation of the interval or of the circle X, the derivative f′ can be regarded as a function defined on the interval [0, 1]. Let C = {x : f′(x) = 0} be the critical set. Fix ε > 0. Fix some open neighborhood U of C on which log |f′| < S_ε. Then the complement of U can be covered by finitely many open intervals on which f is monotone. Let V be the cover consisting of U and these intervals. The figure below shows f and the set U.
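The combinatorial remark in the proof of Lemma 6.4 (that the count of the integer sequences (k_i) is negligible on the logarithmic scale) is easy to check numerically. The sketch below is our own; r and the constant c play no role, only the combinatorial count matters.

```python
import math

# Sequences (k_1, ..., k_n) with all -k_i positive integers and sum of the
# -k_i at most N = n(1 - S): by the hockey-stick identity there are C(N, n)
# of them.
def log_num_integer_sequences(n, S):
    N = n * (1 - S)                  # S is a negative integer
    return math.log(math.comb(N, n))

n = 10
for S in (-50, -500, -5000):
    ratio = log_num_integer_sequences(n, S) / (n * (-S))
    # the log-count is a vanishing fraction of the scale n(-S) of the main term
    print(f"S = {S:6d}:  log-count / n(-S) = {ratio:.4f}")
```

The printed ratios shrink as −S grows, which is exactly why the choice of the branches, not the choice of the integers (k_i), dominates the estimate of Lemma 6.4.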
The joining J_n is obtained as the sum of appropriately scaled product distributions M|_{F_i} × M_{ν_n}|_{F_i}. Such a joining is supported by the ε-neighborhood of the diagonal (see the figure below).

Proof of the Passage Theorem. Suppose that there exist γ > 0 and a sequence ν_n converging to µ which, for any choice of an open cover V, eventually does not satisfy the assertion of the Passage Theorem. By choosing a subsequence we can assume that the ergodic decompositions M_{ν_n} converge to some distribution M on P_T(X) with bar(M) = µ. By the Antarctic Theorem, for every τ in the support of M there are an open cover V_τ and ε_τ > 0 such that

h_New(X|ν, V_τ) ≤ (χ_0(τ) − χ_0(ν))/(r − 1) + γ (7.1)