Lectures on dynamics, fractal geometry and metric number theory

These notes are based on lectures delivered in the summer school Modern Dynamics and its Interaction with Analysis, Geometry and Number Theory, held in Bedlewo, Poland, in the summer of 2011. The course is an exposition of Furstenberg's conjectures on transversality of the maps x ↦ ax mod 1 and x ↦ bx mod 1 for multiplicatively independent integers a, b, and of the associated problems on intersections and sums of invariant sets for these maps. The first part of the course is a short introduction to fractal geometry. The second part develops the theory of Furstenberg's CP-chains and local entropy averages, ending in proofs of the sumset problem and of the known case of the intersections conjecture.

For fractions of the form k/b^n there are two possible expansions; we choose the one ending in 0s. These notes are about the deceptively simple question: what is the relation between [x]_a and [x]_b for a ≠ b? Algorithmically, converting between bases is a trivial operation. But in most cases it is entirely non-trivial to discern any relation between the statistical or combinatorial properties of the expansions in different bases. There are two trivial cases where expansions in different bases are closely related. The first is when x is rational, in which case the sequence of digits is eventually periodic in every base (there remain subtle questions about the period, but qualitatively these expansions are all similar).
The second trivial case is when there is an algebraic relation between the bases. Specifically, if [x]_b = 0.x_1 x_2 . . . and a = b^2, then the expansion [x]_a arises by grouping the digits of [x]_b into pairs: writing y_i = b x_{2i−1} + x_{2i}, we have [x]_a = 0.y_1 y_2 . . .. In a similar way, if a = b^n then we obtain [x]_a from [x]_b by grouping digits into blocks of length n. In other words, low complexity in one base b implies correspondingly low complexity in every base a ∼ b.
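The digit-grouping rule for a = b^2 is easy to check numerically. The following sketch uses helper names of our own choosing (digits_in_base, group_pairs), not notation from the text, and floating-point arithmetic, which is adequate for a few digits:

```python
def digits_in_base(x, b, n):
    """First n digits of the base-b expansion of x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= b
        d = int(x)
        digits.append(d)
        x -= d
    return digits

def group_pairs(digits, b):
    """y_i = b*x_{2i-1} + x_{2i}: base-b digit pairs become single base-b**2 digits."""
    return [b * digits[2 * i] + digits[2 * i + 1] for i in range(len(digits) // 2)]

# Grouping the binary digits of x in pairs gives its base-4 digits.
x = 0.6
assert group_pairs(digits_in_base(x, 2, 10), 2) == digits_in_base(x, 4, 5)
```

The same grouping with blocks of length n converts base b to base b^n.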
It is worth noting that this conjecture is related to problems about integer expansions. For example, Erdős has conjectured that there is an n_0 such that for n > n_0, the digit 2 appears in the base-3 expansion of 2^n (see [4,13]). Though, as far as we know, these two conjectures are not formally related, Conjecture 1.3 does imply a stronger fact for certain other pairs of bases: for example, that for every block w of binary digits, w appears in [2^n]_10 for n > n_0(w). See [6].
Little is known about Conjecture 1.3 itself, and we shall have little to say about it here. However, in its place Furstenberg proposed two geometric conjectures. These concern the intersections and linear projections of certain fractal sets, and their validity would provide some support for the conjecture above. The purpose of these notes is to present the state of the art on those problems (we postpone their precise statement to Section 5).

Organization
We begin in Sections 2-4 with a brief introduction to dimension theory. In Section 5 we state the geometric conjectures and discuss some related problems. In Section 6 we develop Furstenberg's notion of a CP-chain. In Section 7 we prove what is known about the intersections conjecture. In Section 8 we develop the method of local entropy averages, and in Section 9 present the proof of the projections problem.

Pre-requisites
We assume the reader has some background in analysis and ergodic theory. Specifically, we freely use standard results in measure theory and ergodic theory, in particular the ergodic theorem, the ergodic decomposition theorem, conditional expectation, and the martingale convergence theorem. Some less well-known results of this nature are presented, but without proofs. We also rely on the basic properties of Shannon entropy, stating the properties we need without proofs.
No background is assumed in fractal geometry.

Conventions and notation
N = {0, 1, 2, . . .} and N_+ = {1, 2, 3, . . .}. We equip R^d with the metric induced by the sup norm ‖·‖_∞. When convenient we omit mention of the σ-algebra of a measurable space (it is by default the Borel algebra when the space is a topological space), and sets and functions are implicitly assumed to be measurable when this is required. Spaces of probability measures are given the weak-* topology when this makes sense. We follow standard "big O" and "little o" notation. For the reader's convenience we summarize our main notation in the table below.

Notions of dimension for sets
Fractal geometry is a branch of analysis concerned with the fine-scale structure of sets and measures, usually in Euclidean spaces. The most basic quantity of interest is the dimension of a set. In this section we recall the definitions of Minkowski (or box) dimension and Hausdorff dimension, and the relations between them. In the next section we discuss the dimension of measures. For a more thorough introduction to fractal geometry see Falconer [5] or the monograph of Mattila [15].

First example: middle-α Cantor sets
The word "fractal" is not a well-defined mathematical notion, and many of the tools of fractal geometry apply to arbitrary subsets of Euclidean space or of a metric space. The term often refers, however, to sets which possess some hierarchical structure or are invariant under some hyperbolic dynamics. Before giving general definitions, we begin with the simplest examples. Let 0 < α < 1. The middle-α Cantor set C_α ⊆ [0, 1] is defined by a recursive procedure. For n = 0, 1, 2, . . . we construct a set C^n_α which is a union of 2^n closed intervals, each of length ((1 − α)/2)^n. To begin, let C^0_α = [0, 1]. Assuming that C^n_α has been defined and is the disjoint union of the closed intervals I_1, . . . , I_{2^n}, set C^{n+1}_α = ⋃_{i=1}^{2^n} (I^−_i ∪ I^+_i), where I^±_i ⊆ I_i are the closed sub-intervals that remain after one removes from I_i the central open sub-interval of relative length α (thus, if I = [a, a + r], then I^− = [a, a + ((1−α)/2)r] and I^+ = [a + ((1+α)/2)r, a + r]). Clearly C^0_α ⊇ C^1_α ⊇ . . ., and the sets are compact, so the set C_α = ⋂_{n=0}^∞ C^n_α is compact and nonempty. All of the sets C_α, 0 < α < 1, are mutually homeomorphic, since all are topologically Cantor sets (i.e. compact, totally disconnected, and without isolated points). They are all of the first Baire category. And they all have Lebesgue measure 0, since one may verify that Leb(C^n_α) = (1 − α)^n → 0. Hence none of these theories (topology, category, measure) can distinguish between them.
Nevertheless, qualitatively it is clear that C α becomes "larger" as α → 0, since decreasing α results in removing shorter intervals in the course of the construction. In order to quantify this one uses dimension.
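The construction is easy to transcribe directly. The sketch below (with a hypothetical helper name cantor_intervals of our own choosing) checks that stage n consists of 2^n intervals of total length (1 − α)^n, witnessing Leb(C_α) = 0:

```python
def cantor_intervals(alpha, n):
    """The 2**n closed intervals making up the stage-n set C^n_alpha."""
    intervals = [(0.0, 1.0)]
    for _ in range(n):
        new = []
        for a, b in intervals:
            r = b - a
            new.append((a, a + (1 - alpha) / 2 * r))   # left piece I^-
            new.append((b - (1 - alpha) / 2 * r, b))   # right piece I^+
        intervals = new
    return intervals

ivs = cantor_intervals(1/3, 5)
assert len(ivs) == 2**5
# total length (1 - alpha)**n -> 0 as n grows
assert abs(sum(b - a for a, b in ivs) - (2/3)**5) < 1e-12
```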

Minkowski dimension
Let (X, d) be a metric space. For A ⊆ X, the diameter of A is denoted |A| and given by |A| = sup{d(x, y) : x, y ∈ A}. The simplest notion of dimension measures the growth of the number of sets of a given diameter needed to cover a set: writing N(A, δ) for the minimal number of sets of diameter ≤ δ needed to cover A, the Minkowski (box) dimension is dim_M A = lim_{δ→0} log N(A, δ)/log(1/δ), when the limit exists; the upper and lower Minkowski dimensions are defined using limsup and liminf, respectively. 3. It is possible that dim_M A = ∞. In fact, dim_M A < ∞ implies that A is totally bounded, and this is the same as compactness of the closure of A.
4. Dimension is not a topological notion; rather, it depends on the metric. In R^d we use the metric induced from the norm ‖·‖_∞, but it is not hard to verify that changing the norm changes N(A, δ) by at most a multiplicative constant, hence does not change dim_M. 3. If A ⊆ R^d has dim_M A < d then Leb(A) = 0. Indeed, choose ε = (d − dim_M A)/2. Then, for all small enough δ, there is a cover of A by δ^{−(dim_M A + ε)} sets of diameter ≤ δ. Since a set of diameter ≤ δ can itself be covered by a set of volume < cδ^d, we find that there is a cover of A of total volume ≤ cδ^d · δ^{−(dim_M A + ε)} = cδ^ε. Since this holds for arbitrarily small δ, we conclude that Leb(A) = 0.
In particular, for a bounded set E ⊆ R^d with non-empty interior we have dim_M E ≥ d, and also, since every bounded subset of R^d has Minkowski dimension at most d, in fact dim_M E = d.

4. A line segment in R^d has Minkowski dimension 1. More generally, any bounded k-dimensional embedded C^1-submanifold of R^d has box dimension k.
3. dim_M A depends only on the induced metric on A.
The proofs are easy consequences of the definition and are omitted (see the closely related proof of Proposition 2.12 below).
Here is a simple but nontrivial application: Proof.

Covering with cubes
We now specialize to Euclidean space and show that in the definition of Minkowski dimension, one can restrict to covers by convenient families of cubes, rather than arbitrary sets. This is why Minkowski dimension is often called box dimension.
Definition 2.6. Let b ≥ 2 be an integer. The level-n partition of R into b-adic intervals is D_b^n = {[k/b^n, (k + 1)/b^n) : k ∈ Z}. The corresponding partition of R^d into b-adic cubes is D_b^n = {I_1 × · · · × I_d : I_j ∈ D_b^n} (we suppress the superscript d when it is clear from the context). The covering number N(A, D_b^n) is the number of cells of D_b^n which intersect A, and one has dim_M A = lim_{n→∞} log N(A, D_b^n)/(n log b), in the sense that one limit exists if and only if the other does and then they coincide, and similarly for the upper and lower Minkowski dimensions.
Proof. Since |D| = b^{−n} for any D ∈ D_b^n (recall that we are using the sup metric), N(A, b^{−n}) ≤ N(A, D_b^n). On the other hand, every set B with |B| ≤ b^{−n} can be covered by at most 2^d cubes of D_b^n, so N(A, D_b^n) ≤ 2^d N(A, b^{−n}). Substituting this into the limit defining dim_M, and interpolating for b^{−n−1} ≤ δ < b^{−n} as in Example 2.3(5) above, the lemma follows.
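As a numerical sanity check, one can box-count the stage-n middle-third Cantor set along the triadic grid. The sketch below (helper names are ours) uses exact rational arithmetic to avoid endpoint rounding artifacts, and recovers log N(A, D_3^n)/(n log 3) = log 2/log 3:

```python
import math
from fractions import Fraction

def cantor_stage(alpha, n):
    """Stage-n intervals (a, b) of the middle-alpha construction, exactly."""
    r = (1 - alpha) / 2                      # relative length of the kept pieces
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + r * (b - a)), (b - r * (b - a), b))]
    return intervals

def box_count(intervals, delta):
    """Number of grid cells [k*delta, (k+1)*delta) meeting some [a, b)."""
    boxes = set()
    for a, b in intervals:
        k = math.floor(a / delta)
        while k * delta < b:
            boxes.add(k)
            k += 1
    return len(boxes)

n = 8
N = box_count(cantor_stage(Fraction(1, 3), n), Fraction(1, 3**n))
slope = math.log(N) / (n * math.log(3))      # log N(A, D_3^n) / log 3**n
assert N == 2**n
assert abs(slope - math.log(2) / math.log(3)) < 0.01
```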

Hausdorff dimension
Minkowski dimension is relatively simple to compute, but it is a rather coarse quantity that is sometimes "too large". For example, countable sets may have positive dimension: dim_M(Q ∩ [0, 1]) = 1, since the covering numbers of a set and of its closure coincide. Worse yet, this can occur for closed countable sets: for example, the Minkowski dimension of {0} ∪ {1/n : n ∈ N_+} is 1/2. We leave the verification to the reader.
Hausdorff dimension provides a better, albeit somewhat more complicated, notion of dimension. To motivate the definition, observe that sets of positive Lebesgue measure in R^d are natural candidates to be considered fully d-dimensional, so one should look for sets of dimension < d among the Lebesgue nullsets. Recall that such a nullset is just a set with the property that it can be covered by balls whose total volume is arbitrarily small, where the volume of a ball of radius r is proportional to r^d. Imagine now that we have a notion of "volume" for which the mass of a ball of radius r is of order r^α. Then a set of positive "volume" would be a candidate to have dimension ≥ α, and a set of "volume" zero would be a candidate to have dimension ≤ α.
Although for α < d there is no canonical locally finite measure on R^d for which mass decays in this way, one can use this heuristic to define the notion of a null set. The following definition is the same as the definition of Lebesgue-null sets in R^d, except that the contribution of each ball is r^α instead of r^d.
1. H^α is not a measure, and it is usually denoted H^α_∞ in order to distinguish it from the Hausdorff measure. We shall not discuss Hausdorff measures here, and adopt the simpler notation without the superscript ∞.
2. The definition of H^α does not require that the sets A_i have small diameter. Whenever A is bounded one can cover it with a single set, and then H^α(A) is finite. For unbounded sets H^α may be finite or infinite. Lemma. If 0 ≤ α < β and H^α(A) = 0, then H^β(A) = 0. Proof. Let 0 < ε < 1. Then there is a cover A ⊆ ⋃ A_i with ∑ |A_i|^α < ε. Since ε < 1, we know |A_i| ≤ 1 for all i. Hence ∑ |A_i|^β ≤ ∑ |A_i|^α < ε, and since ε was arbitrary, H^β(A) = 0. From the lemma it follows that for any A ≠ ∅ there is a unique α_0 such that H^α(A) = 0 for α > α_0 and H^α(A) > 0 for 0 ≤ α < α_0.
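To illustrate how one uses H^α, consider covering the middle-third Cantor set C_{1/3} by the 2^n stage-n intervals, each of diameter 3^{−n}; this gives

```latex
H^{\alpha}(C_{1/3}) \;\le\; 2^{n}\,\bigl(3^{-n}\bigr)^{\alpha}
  \;=\; e^{\,n(\log 2 \,-\, \alpha\log 3)} \;\xrightarrow[n\to\infty]{}\; 0
  \qquad\text{for } \alpha > \log 2/\log 3 ,
```

so C_{1/3} is α-null for every α > log 2/log 3, and hence dim C_{1/3} ≤ log 2/log 3. The matching lower bound requires a measure on the set and is carried out in the next section.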
4. dim A depends only on the induced metric on A.
5. If f : X → X′ is a Lipschitz map then dim f(X) ≤ dim X, and bi-Lipschitz maps preserve dimension.

1. Clearly if B is α-null and A ⊆ B then A is α-null, and the claim follows.

2. This follows from the fact that each A_i is α-null, by the same argument that shows that a countable union of Lebesgue-null sets is Lebesgue-null. Specifically, given ε > 0, cover each A_i by sets A_{i,j} with ∑_j |A_{i,j}|^α < ε/2^i; together these cover A with ∑_{i,j} |A_{i,j}|^α < ε. Since ε was arbitrary, H^α(A) = 0.
3. Let β > α > dim_M A and fix any small δ > 0 with N(A, δ) ≤ δ^{−α}; such δ exist by the definition of the lower Minkowski dimension. Then there is a cover of A by δ^{−α} sets of diameter ≤ δ, so ∑ |A_i|^β ≤ δ^{−α} · δ^β = δ^{β−α}. Since δ was arbitrary, H^β(A) = 0. Since β > dim_M A was arbitrary (we can always find suitable α), dim A ≤ dim_M A.
We leave the proof of (4) and (5) to the reader.
Analogous to the fact that Minkowski dimension can be defined using boxes, Hausdorff dimension is unchanged if in the definition of H^α one allows only covers {A_i} by b-adic cells, of possibly different levels. We leave the proof to the reader. Note, however, that if we reverse the quantifiers and consider covers {A_i} such that there is an n with A_i ∈ D^n for all i, then rather than Hausdorff dimension one ends up with lower Minkowski dimension.

If M is an embedded k-dimensional C^1 submanifold of R^d, then it is bi-Lipschitz equivalent to a subset of R^k with non-empty interior, so dim M = k.

Notions of dimension for measures
The Hausdorff dimension of a set is usually more difficult to compute than the Minkowski dimension. This is true even for very simple sets like the middle-α Cantor sets. One can often obtain an upper bound on the Hausdorff dimension by computing the Minkowski dimension, but in order to get a matching lower bound, when one exists, the appropriate tool is often the construction of suitable measures on the set. In this section we develop this connection between the dimension of sets and measures.

The pointwise dimension of a measure
The definition of Hausdorff dimension of sets in R^d was motivated by an imaginary "volume" which decays like r^α on balls of radius r. Although there is no canonical locally finite measure with this property for α < d, we shall see below that there is a precise connection between the dimension of a set and the decay of mass of measures supported on the set. We restrict the discussion to sets and measures on Euclidean space. As usual, B_r(x) denotes the closed ball of radius r around x with respect to the norm ‖·‖_∞, although one could use any other norm with no change to the results. The (lower) pointwise dimension of a measure µ at x is dim(µ, x) = liminf_{r→0} log µ(B_r(x))/log r. (1)
Thus dim(µ, x) = α means that the decay of µ-mass of balls around x scales no slower than r^α, i.e. for every ε > 0 we have µ(B_r(x)) ≤ r^{α−ε} for all small enough r, and that α is the largest number with this property.

1. There is an analogous notion of upper pointwise dimension, defined using limsup, but we shall not have use for it here.
2. In many of the cases we consider, the limit (1) exists. In that case µ is said to have exact dimension α at x.
3. There is a natural stronger notion of decay of mass at a point: namely, it may happen that for some α, the limit lim µ(B_r(x))/r^α exists and is positive and finite. For α = d and a measure µ on R^d absolutely continuous with respect to Lebesgue measure, or to a smooth volume on a submanifold, such decay is guaranteed µ-a.e. by the Lebesgue differentiation theorem. It is a remarkable fact due to D. Preiss [17] that if α is not an integer, then for any measure µ on R^d the limit lim µ(B_r(x))/r^α can exist only for x in a µ-nullset. 1. If µ = δ_u is the point mass at u, then µ(B_r(u)) = 1 for all r, hence dim(µ, u) = 0.
2. If λ is Lebesgue measure on R^d then λ(B_r(x)) = cr^d for every x, and dim(λ, x) = d.
4. Let µ_α denote the probability measure on C_α which gives equal mass to each of the 2^n intervals in the set C^n_α introduced in the construction of C_α. Let δ_n = ((1 − α)/2)^n be the length of these intervals. Then for every x ∈ C_α, one sees that B_{δ_n}(x) intersects at most two of the stage-n intervals and contains one of them, so 2^{−n} ≤ µ_α(B_{δ_n}(x)) ≤ 2 · 2^{−n}. Hence lim_{n→∞} log µ_α(B_{δ_n}(x))/log δ_n = log 2/log(2/(1 − α)). One obtains the same limit as r → 0 continuously by observing that B_{δ_{n+1}}(x) ⊆ B_r(x) ⊆ B_{δ_n}(x) whenever δ_{n+1} ≤ r < δ_n. Hence dim(µ_α, x) = log 2/log(2/(1−α)) for every x ∈ C_α.
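The computation in Example (4) can be checked numerically for α = 1/3. The sketch below approximates µ_{1/3} at stage n by giving each of the 2^n stage intervals mass 2^{−n} (helper names are ours), and evaluates log µ(B_r(x))/log r at r = δ_n:

```python
import math
from itertools import product

n = 10
# stage-n intervals are [k/3**n, (k+1)/3**n] with all ternary digits of k in {0, 2}
ks = sorted(sum(d * 3**i for i, d in enumerate(digs))
            for digs in product((0, 2), repeat=n))
delta = 3.0 ** -n                      # delta_n = ((1 - alpha)/2)**n = 3**-n

def mu_ball(x, r):
    """Approximate mu_{1/3}-mass of B_r(x), resolved at stage n."""
    hits = sum(1 for k in ks if k / 3**n <= x + r and (k + 1) / 3**n >= x - r)
    return hits / 2**n

x = ks[0] / 3**n                       # a point of C_{1/3} (here x = 0)
val = math.log(mu_ball(x, delta)) / math.log(delta)
assert abs(val - math.log(2) / math.log(3)) < 0.1   # log 2 / log(2/(1-1/3))
```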
The fundamental relation between the pointwise dimension of a measure and the Hausdorff dimension of sets is given in the next proposition, before which we recall the well-known Vitali covering lemma, whose proof can be found e.g. in [15]. Lemma 3.4 (Vitali covering lemma). Let {B_i}_{i∈I} be a collection of balls in R^d whose radii are all less than some R. Then there is a subset J ⊆ I such that the balls {B_j : j ∈ J} are pairwise disjoint and ⋃_{i∈I} B_i ⊆ ⋃_{j∈J} 5B_j, where 5B_j is the ball with the same center as B_j and 5 times the radius. Proposition 3.5. Let µ be a finite measure on R^d. (1) If µ(A) > 0 and dim(µ, x) ≥ α for every x ∈ A, then dim A ≥ α. (2) If dim(µ, x) ≤ α for every x ∈ A, then dim A ≤ α. Remark 3.6. In the first part one can clearly relax the hypothesis and only require it to hold for µ-a.e. x, or even for a set of x of positive µ-mass, since then the bound applies to the subset A_0 ⊆ A of points x for which it holds, and then dim A ≥ dim A_0 ≥ α. It is not possible to similarly relax the second part.
Proof. We prove the first statement. Suppose by way of contradiction that dim A < α, and let dim A < β < α. Applying Egorov's theorem to the limit in the definition of dim(µ, x), we can find a subset of A of positive (actually, arbitrarily large) measure on which the convergence in (1) is uniform, and of course this set still has dimension < α. Replacing A with this set, we can assume that there is an r_0 such that if r < r_0 then µ(B_r(x)) < r^β for all x ∈ A.
For every δ > 0 there is a countable cover A ⊆ ⋃ A_i such that ∑ |A_i|^β < δ. We may assume A_i ∩ A ≠ ∅, since otherwise we can throw that set out. Let x_i ∈ A_i ∩ A and r_i = |A_i|, so that A_i ⊆ B_{r_i}(x_i). Also note that |A_i|^β < δ, so r_i < δ^{1/β}; hence, assuming δ is small enough, r_i < r_0. We now have µ(A) ≤ ∑_i µ(B_{r_i}(x_i)) ≤ ∑_i r_i^β < δ. Since δ was an arbitrary small number we get µ(A) = 0, a contradiction. Now for the second statement. Let ε > 0 and fix r_0 > 0. Then by assumption, for every x ∈ A we can find an r = r(x) < r_0 such that µ(B_{r(x)}(x)) > r(x)^{α+ε}. By the Vitali covering lemma there is a countable pairwise disjoint subfamily B_j = B_{r_j}(x_j) of these balls with A ⊆ ⋃_j 5B_j, and then ∑_j |5B_j|^{α+2ε} = ∑_j (10 r_j)^{α+2ε} ≤ 10^{α+2ε} r_0^ε ∑_j r_j^{α+ε} ≤ 10^{α+2ε} r_0^ε ∑_j µ(B_j) ≤ 10^{α+2ε} r_0^ε µ(R^d). Since µ is finite and r_0 was arbitrary, we find that H^{α+2ε}(A) = 0. Hence dim A ≤ α + 2ε, and since ε was arbitrary, dim A ≤ α.
As an application we can now compute the dimension of the sets C_α from Section 2.1: Proof. Let β = log 2/log(2/(1 − α)). We saw already that dim_M C_α ≤ β, and so dim C_α ≤ β. We also saw in Example 3.3(4) that there is a measure µ_α on C_α with dim(µ_α, x) ≥ β for x ∈ C_α, so by the proposition dim C_α ≥ β. The claim follows.
The last argument is typical of computing the dimension of a set: generally one obtains an upper bound using Minkowski dimension, and tries to find a measure on the set which gives a matching lower bound.

Dimension of measures
Having defined dimension at a point, we now turn to global notions of dimension for measures. These are defined as the largest and smallest pointwise dimension, after ignoring a measure-zero set of points.
If the pointwise dimension is µ-a.s. constant, i.e. if its essential infimum and essential supremum coincide, then their common value is the pointwise dimension of µ and is denoted dim_H µ.
There is a stronger notion of dimension which is not always defined but, when it is, is sometimes useful: Definition 3.9. If the limit in Equation (1) exists and is µ-a.s. independent of x, then this value is called the exact dimension of µ and is denoted dim µ.
Clearly if µ is exact dimensional then the upper and lower dimensions of µ coincide with dim µ, but the converse implication is false.
Proof. Since µ is σ-finite, it is easy to reduce to the case that µ is a probability measure, which we now assume. Write α = dim µ. If A is a Borel set with µ(A) = 1, then by definition of dim µ, for every ε > 0 there is a subset A_ε ⊆ A such that dim(µ, x) ≥ α − ε for x ∈ A_ε, and µ(A_ε) > 0. From Proposition 3.5 we get dim A ≥ dim A_ε ≥ α − ε, and since ε was arbitrary, dim A ≥ α.
We have seen that the dimension of a set is no smaller than the dimension of the measures it supports. There is a converse result which we do not prove, see [15]: Theorem 3.12 (Frostman's lemma). If X ⊆ R d is a Borel set and H α (X) > 0 then there is a measure µ on X such that dim µ ≥ α. In particular, for every ε > 0 there is a probability measure µ supported on X such that dim µ > dim X − ε.
One cannot in general find a measure µ on X with dim µ = dim X. Indeed, if X = ⋃ X_n and X_n has dimension α − 1/n, then dim X = α, but by Theorem 3.11 any measure of dimension α will satisfy µ(X_n) = 0 for all n, and hence µ(X) ≤ ∑ µ(X_n) = 0. Corollary 3.13. For a Borel set X, dim X = sup{dim µ : µ ∈ P(X)}. Proof. For µ ∈ P(X) we have dim X ≥ dim µ by Proposition 3.11, giving dim X ≥ sup{dim µ : µ ∈ P(X)}. The reverse inequality follows from Theorem 3.12.

Density theorems
For λ = Lebesgue measure on R^d, the Lebesgue density theorem states that if f ∈ L^1(λ) then for λ-a.e. x, lim_{r→0} c r^{−d} ∫_{B_r(x)} f dλ = f(x) (here c is the inverse of the volume of the unit ball, which in the ‖·‖_∞ norm is just c = 2^{−d}).
For other measures µ one might expect that, if dim(µ, x) = α, then the same would hold with r^α in the denominator rather than r^d. This is almost never the case (see Remark 3.2(3)), but we have the following, where r^α is replaced by µ(B_r(x)), and similarly along b-adic cells (rather than balls). We write D_b^n(x) for the unique D ∈ D_b^n containing x. Theorem 3.14 (Differentiation theorems for measures). Let µ be a locally finite measure on R^d and f ∈ L^1(µ). Then for µ-a.e. x we have lim_{r→0} (1/µ(B_r(x))) ∫_{B_r(x)} f dµ = f(x), and for any integer b ≥ 2, lim_{n→∞} (1/µ(D_b^n(x))) ∫_{D_b^n(x)} f dµ = f(x). Remark 3.15.
1. The first of these results is due to Besicovitch and can be found e.g. in [15]. The formulation makes sense in a general metric space, but the theorem does not hold in this generality. The two main cases in which it holds are Euclidean spaces and ultrametric spaces, in which balls of a fixed radius form a partition of the space.
2. The second statement is a consequence of the martingale convergence theorem, since the ratio whose limit we are taking is nothing other than E(f | D_b^n)(x).
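A minimal illustration of the b-adic averages, assuming µ = Lebesgue measure on [0, 1) and f(t) = t² (both chosen arbitrarily by us): here E(f | D_b^n)(x) is the average of f over the level-n cell containing x, computable in closed form, and martingale convergence says it tends to f(x) as n grows.

```python
b = 2

def cell_average(x, n):
    """Average of f(t) = t**2 over the level-n b-adic cell containing x."""
    k = int(x * b**n)                        # x lies in D = [k*b**-n, (k+1)*b**-n)
    lo, hi = k / b**n, (k + 1) / b**n
    return (hi**3 - lo**3) / (3 * (hi - lo))  # exact integral of t**2, normalized

x = 0.3
gaps = [abs(cell_average(x, n) - x**2) for n in (4, 8, 12)]
assert gaps[0] > gaps[1] > gaps[2] < 1e-3     # averages approach f(x)
```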
Let µ be a measure on R d and A a set with µ(A) > 0 and µ(R d \A) > 0. Topologically, A and its complement can be very much intertwined: for example both may be dense, or even have positive measure in every open set. However, from the point of view of µ, they become nearly separated when one gets to small enough scales.
Corollary 3.16 (Density theorems). If µ is a locally finite measure on R^d and µ(A) > 0, then for µ-a.e. x ∈ A, lim_{r→0} µ(A ∩ B_r(x))/µ(B_r(x)) = 1 and, for any integer b ≥ 2, lim_{n→∞} µ(A ∩ D_b^n(x))/µ(D_b^n(x)) = 1. Proof. Apply the previous theorem to the indicator functions 1_A and 1_{R^d \ A}.
Since 0 < f (x) < ∞ for ν-a.e. x, upon dividing the expression in the limit by log r the difference tends to 0, so the pointwise dimensions of µ, ν at x coincide. The second statement follows from the first.
The claim follows from the definitions.
Pointwise dimension of a measure can also be defined using decay of mass along b-adic cells rather than balls, setting dim_b(µ, x) = liminf_{n→∞} log µ(D_b^n(x))/log(b^{−n}). Since D_b^n(x) ⊆ B_{b^{−n}}(x), we always have dim_b(µ, x) ≥ dim(µ, x). We want to prove that equality holds a.e., hence suppose it does not. Then we can find an α and ε > 0, and a set A with µ(A) > 0, such that dim_b(µ, x) > α + 2ε and dim(µ, x) < α + ε for x ∈ A. By further reducing the set A, we may, by Egorov's theorem, assume that the limits defining the pointwise dimensions converge uniformly for x ∈ A.
Let ν = µ|_A. By the previous corollary, dim(ν, x) = dim(µ, x) < α + ε for ν-a.e. x ∈ A. On the other hand, the ball B_{b^{−(k+1)}}(x) has diameter at most b^{−k}, so in each coordinate it meets at most two level-k b-adic intervals; hence it is covered by a union of level-k cells, each of which either misses A or is of the form D_b^k(y) for some y ∈ A. The union contains 2^d sets, and by uniformity, for k large enough, each has ν-mass at most b^{−k(α+2ε)}. Hence ν(B_{b^{−(k+1)}}(x)) ≤ 2^d b^{−k(α+2ε)}, giving dim(ν, x) ≥ α + 2ε, which is a contradiction.

Product sets
The following holds in general metric spaces but for simplicity we prove it for R d .
Proof. If {A_i} is a cover of X by N(X, δ) sets of diameter ≤ δ and {B_j} is a cover of Y by N(Y, δ) sets of diameter ≤ δ, then {A_i × B_j} is a cover of X × Y by sets of diameter ≤ δ (in the sup metric), so N(X × Y, δ) ≤ N(X, δ) · N(Y, δ). Taking logarithms and inserting this into the definition of dim_M gives the claim.
The behavior of Hausdorff dimension with respect to products is more complicated than that of Minkowski dimension. In general, we have dim X + dim Y ≤ dim(X × Y) ≤ dim X + dim_M Y, the last being the upper Minkowski dimension of Y.
For the first statement, apply Frostman's lemma (Theorem 3.12) to obtain, for each ε > 0, probability measures µ on X and ν on Y with dim µ > dim X − ε and dim ν > dim Y − ε. Since in the sup metric B_r((x, y)) = B_r(x) × B_r(y), the product measure satisfies dim(µ × ν, (x, y)) ≥ dim(µ, x) + dim(ν, y), so by Proposition 3.5, dim(X × Y) ≥ dim X + dim Y − 2ε. As ε was arbitrary the claim follows.
There are examples in which the inequality is strict, see [15]. However, we have the following condition for equality: it is enough to require equality of the Minkowski and Hausdorff dimensions of one of the sets X, Y, but we will not prove this fact here. See [15].

Projections and slices
A classical and much-studied aspect of fractal geometry concerns the behavior of sets A ⊆ R^d under intersection with affine subspaces ("slices" of the set), and under taking the image by a linear map π : R^d → R^k ("projection"). These problems are dual in the sense that for linear maps π, the preimages π^{−1}(y) are affine subspaces, and heuristically the size of the fibers/slices A ∩ π^{−1}(y) should complement the size of the image π(A), as occurs by basic linear algebra when A = R^d or when A is itself an affine subspace of R^d.
Proof. Since πA ⊆ R^k we have dim πA ≤ k. Since linear maps are Lipschitz, dim πA ≤ dim A. The first claim follows. For the second, observe that there is a constant c > 0 such that for every x ∈ supp µ and r > 0, πµ(B_r(πx)) ≥ µ(B_{cr}(x)) (one may take c = 1/‖π‖, since then π(B_{cr}(x)) ⊆ B_r(πx)). The inequality dim(πµ, πx) ≤ dim(µ, x) is a consequence of this, and from this the inequality dim πµ ≤ dim µ follows.

Strict inequality can occur: for example, if π is the coordinate projection R^2 → R onto the x-axis and A = {0} × [0, 1], then dim πA = 0 while dim A = 1.
However, strict inequality dim πA < dim A is a rather exceptional situation. To motivate this statement, consider a set X ⊆ R^2 and let π_θ be the orthogonal projection to the line making angle θ with the x-axis. Then for x, y ∈ X, the distance of the images π_θ(x), π_θ(y) is usually of order ‖x − y‖: e.g. |π_θ x − π_θ y| ≥ δ‖x − y‖ for all but a δ-fraction of the directions θ. Heuristically, this means that for a randomly chosen θ, the map π_θ will behave, with high probability, like a bi-Lipschitz map when restricted to any "large" subset of X. This is, essentially, why one expects the image to be as "large as it can be".
This heuristic takes the following precise form. Let Π_{d,k} denote the space of surjective linear maps R^d → R^k, and parametrize it as the set of k × d matrices of rank k, which is an open subset of R^{dk}. The volume measure on R^{dk} then induces a measure class on Π_{d,k}, and it is this measure class we refer to whenever speaking of a.e. projection. The following is known generically as Marstrand's theorem; see e.g. [15] for sets, and [12] for measures. Theorem 4.6 (Marstrand [15]). Let A ⊆ R^d be a Borel set. Then for a.e. π ∈ Π_{d,k}, dim πA = min{k, dim A}. Together with the previous lemma this says that the image of a set is typically "as large as it can possibly be".
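A concrete sketch of this phenomenon: C × C, for the middle-third Cantor set C, has dimension 2 log 2/log 3 > 1, so the typical linear image should have dimension 1. The particular direction π(x, y) = x + y happens to realize this typical behavior (indeed C + C = [0, 2], a classical fact), and with integer endpoints the box count of the image can be done exactly; the helper names below are ours.

```python
import math
from itertools import product

n = 8
# stage-n intervals of C are [k, k+1]*3**-n with all ternary digits of k in {0, 2}
endpoints = [sum(d * 3**i for i, d in enumerate(digs))
             for digs in product((0, 2), repeat=n)]
# 3**-n-grid cells hit by pi(x, y) = x + y on the stage-n endpoint grid
cells = {a + b for a in endpoints for b in endpoints}
slope = math.log(len(cells)) / math.log(3**n)   # box-dimension estimate of the image
assert abs(slope - 1.0) < 1e-9                  # dimension 1, as Marstrand predicts
```

Here the 4^n pairs collapse onto only 3^n grid cells: the image cannot have dimension larger than 1, and for this direction it achieves it.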
To motivate the dual statement about intersections, let us start with an apparently different problem: estimating the (box) dimension of the intersection of two sets A, B ⊆ [0, 1] whose (box) dimensions are α, β, respectively. Choose an interval I ∈ D_n, I ⊆ [0, 1], randomly and uniformly, where D_n here denotes a partition of [0, 1] into n intervals of equal length. Each interval is chosen with probability 1/n, and A intersects roughly n^α of them, so the probability of a random interval intersecting A is n^{α−1}. Similarly the probability of intersecting B is n^{β−1}. Now, suppose that A and B are "independent" at scale 1/n, in the sense that the probability that a random interval I ∈ D_n, I ⊆ [0, 1], intersects both A and B is the product of the probabilities that it intersects each individually. Then this probability is n^{α−1} · n^{β−1} = n^{(α+β−1)−1}. If α + β − 1 > 0, this is the probability associated to a set of box dimension α + β − 1. If α + β − 1 ≤ 0, this is (less than) the probability associated to a set of box dimension 0. Thus, under the stated independence assumption, we expect dim(A ∩ B) = max{α + β − 1, 0}. To relate this to the slice problem, note that the line ℓ = {y = ux + v} intersects X = A × B in a set that is, up to a scaling of the metric, the same as (uA + v) ∩ B. When u, v are chosen randomly it is at least plausible that uA + v and B may display the kind of independence needed in the discussion above. This leads one to expect that for a generic line ℓ, dim(ℓ ∩ (A × B)) = max{dim A + dim B − 1, 0}. Something like this is indeed the case. Parametrize (d − k)-dimensional affine subspaces of R^d as W = π^{−1}(y), where π ∈ Π_{d,k} and y ∈ R^k are distributed independently according to Lebesgue measure (this measure is equivalent to the usual measure class on the Grassmannian). The following is Marstrand's slice theorem (more refined versions exist for measures, but we omit them): for a Borel set A ⊆ R^d and a.e. such subspace W, dim(A ∩ W) ≤ max{dim A − k, 0}.

1.
We cannot expect an equality here, since there will generally be an infinite-measure set of affine subspaces which do not intersect A at all. Strict inequality can also happen for subspaces W which intersect A non-trivially. A counterexample is again given by product sets: if A = A 1 × A 2 ⊆ R 2 and dim A < 1 then the theorem predicts that typically dim(A ∩ W ) = 0, while some lines parallel to the axes intersect A in copies of A 1 and A 2 , and these may have positive dimension.
2. Combining the two theorems, for a.e. π ∈ Π_{d,k} and a.e. y ∈ R^k, writing W = π^{−1}(y), we find dim(A ∩ W) + dim πA ≤ dim A. The projections π and subspaces W for which the conclusions of the theorems above fail are said to be exceptional. In general, the exceptional set can be badly behaved from a topological point of view. In particular, the map π ↦ dim πA is measurable but does not generally have any continuity properties, and likewise the map W ↦ dim(W ∩ A).
Bounds exist for the dimension of the set of exceptional maps π and subspaces W, but in general the exceptional sets can be large, e.g. uncountable, dense G_δ subsets of their respective spaces, etc. For more information see e.g. [15]. Contrary to the "wild" situation for general sets, for "naturally defined" sets it is believed that the only exceptions should be those that are necessary by algebraic or combinatorial reasons. Much progress has been made in this direction recently, at least with regard to projections. We will see one such case in Section 9. We now return to Conjecture 1.3. We shall re-state it in terms of the dynamics of the maps f_b : [0, 1] → [0, 1] given by f_b(x) = bx mod 1. By an invariant set for f_b we mean a closed non-empty subset X ⊆ [0, 1] satisfying f_b X ⊆ X. Such sets represent sets of constraints on digit expansions: for any invariant set X there is a set L of finite words in the symbols 0, . . . , b − 1 such that X is precisely the set of points x ∈ [0, 1] which can be represented in base b by a sequence containing no word w ∈ L as a sub-block. Conversely, any such set L gives rise, by this procedure, to a closed and f_b-invariant set X (although it may be empty). For example, for b = 3 and L consisting of the single length-1 word 1, the corresponding set X is the middle-1/3 Cantor set C_{1/3}. This method of defining invariant sets is very flexible and hints at the richness of the family of invariant sets; indeed there is a great variety of them. Nevertheless, in many ways these sets are well behaved.
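The forbidden-word description can be made concrete for the example above; a small sketch (helper names ours), using the fact that f_3 acts as the shift on ternary digit sequences:

```python
# C_{1/3} is the f_3-invariant set cut out by the forbidden-word set L = {1}:
# x lies in C_{1/3} iff some ternary expansion of x avoids the digit 1.
def from_digits(digits, b=3):
    """x in [0,1) with the given (finite) base-b digit sequence, rest zeros."""
    x = 0.0
    for d in reversed(digits):
        x = (x + d) / b
    return x

def f3(x):
    return (3 * x) % 1

digits = [2, 0, 2, 2, 0, 0, 2, 0]          # avoids the forbidden digit 1
x = from_digits(digits)
# f_3 shifts the digit sequence, so f_3(x) again avoids the digit 1:
assert abs(f3(x) - from_digits(digits[1:])) < 1e-12
```

The same pattern works for any forbidden-word set L: membership is a condition on digit blocks, and the shift preserves it.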
We will prove this in Section 7.2, but note here that the existence of dim_M can be proved by showing that log N(X, D_b^n) is a subadditive sequence, much as was done for c_n(x; b) in Section 1.
Proof. Combine the previous proposition and Proposition 4.3.

Dynamical re-statement of Conjecture 1.3
The complexity of digit expansions was defined in the introduction. We now re-interpret it in terms of the orbit of x under the map f_b(x) = bx mod 1, whose orbit closure we denote by O_b(x) = closure{f_b^n(x) : n ≥ 0}. Since m is in 1-1 correspondence with its digit sequence ω_1 . . . ω_k, the claim follows.
Proof. By the definition of c(x; b), the previous lemma, and Proposition 5.1. Thus, Conjecture 1.3 is equivalent to the following: Remark 5.6. Let us show again, in dynamical language this time, that the two hypotheses are necessary.
1. If x = k/m ∈ Q for k, m ∈ N, then b^n x mod 1 can be written as k′/m for some integer 0 ≤ k′ < m. Therefore the orbit of x under any of the maps f_b is a closed, finite set of dimension 0, so the conclusion of the conjecture is false.

2. For any m, n ∈ N, the orbit closure O_{b^m}(x) is the union of affine images of the elements of a countable (in fact, finite) decomposition of O_{b^n}(x). Since affine maps preserve dimension, dim O_{b^m}(x) = dim O_{b^n}(x). Hence, if x is a point with dim O_b(x) < 1/2, then the conclusion of the conjecture fails for the bases b^n and b^m for any m, n ∈ N. Hence the assumption a ≁ b cannot be weakened to a ≠ b.
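Point (1) of the remark is easy to check by direct computation with exact rationals; a minimal sketch:

```python
from fractions import Fraction

def f(b, x):
    """f_b(x) = bx mod 1."""
    return (b * x) % 1

# The f_2-orbit of 1/7 consists of fractions with denominator 7, hence is finite.
x = Fraction(1, 7)
orbit = []
while x not in orbit:
    orbit.append(x)
    x = f(2, x)
assert orbit == [Fraction(1, 7), Fraction(2, 7), Fraction(4, 7)]
```

The orbit closure is this finite set, which has dimension 0.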
Essentially all the instances in which we can confirm Conjecture 5.5 occur when x has dense orbit under one of the maps, say f b . In this case dim O b (x) = 1 and the conjecture holds trivially for every other base a. Since Lebesgue-a.e. x has a dense orbit, and, by general results in topological dynamics, the set of points with dense orbit is a dense G δ , it follows that the conjecture is satisfied by typical points both in the sense of measure and topology. It is important to note, however, that the set of points with non-dense orbit is large in many senses, e.g. it is dense, uncountable and has full Hausdorff dimension. Almost nothing is known about the conjecture for such points.
One way to re-phrase a special case of the conjecture is as follows. Consider the middle-1/3 Cantor set C_{1/3}. Since the f_3-orbit of every x ∈ C_{1/3} remains in C_{1/3}, a priori dim O_3(x) ≤ log 2/log 3, so the conjecture predicts that the f_2-orbit of every irrational x ∈ C_{1/3} is large. No such estimates are known, and, again, what we do know arises from the existence of points in C_{1/3} whose f_2-orbit is dense. Questions about the existence of such points have a long history, going back to Cassels and Schmidt [2,18,11,10]. At the same time, many f_b-invariant sets also contain points which do not have dense f_a-orbits. For instance, the following was proved by Broderick, Bugeaud, Fishman, Kleinbock and Weiss [1]. Theorem 5.8. The set of numbers in C_{1/3} which are not normal in any base has full dimension (i.e. log 2/log 3).
Thus, the situation in C_{1/3} vis-a-vis density or non-density of orbits under f_2 is precisely the relativization of the situation in the interval [0, 1]: almost every point, with respect to natural measures, has dense f_2-orbit, but there is a full-dimensional set of exceptions. It is a remarkable fact that, as far as we know, there are no explicit examples, either of a point x ∈ C_{1/3} whose f_2-orbit is dense, or of a point x ∈ C_{1/3} \ Q whose f_2-orbit is not dense!

Furstenberg's conjectures on projections and intersections
which is impossible. In particular, the conjecture implies that dim(X ∩ Y ) = 0.
Now, X ∩ Y is, up to a linear change of coordinates, the intersection of the product set X × Y with the diagonal line ℓ = {x = y}. In other words, the particular line ℓ = {x = y} behaves like a Lebesgue-typical line, since, by Theorem 4.7, the analogous bound holds for a.e. line ℓ. Furstenberg has proposed that for products X × Y as above, the exceptional set of lines should not only have measure zero, but should in fact consist only of the trivial exceptions (i.e. lines parallel to the axes). In view of the heuristic for the slice theorem described in Section 4.2, this conjecture is another expression of the mutual independence of the structure of f_a- and f_b-invariant sets.
While much is known about generic slices, very little is known about specific slices, and the conjecture remains open except for a partial result by Furstenberg, which is an easy consequence of the main result of [6, Theorem 4], though apparently the derivation has not appeared in print.
We prove this in Section 7.4. The case dim X + dim Y > 1/2 remains completely open. In view of the heuristic relation between slices and projections, it is natural to ask about the "dual" version of the conjecture. This problem, also raised by Furstenberg, was recently settled by Hochman and Shmerkin [9], following earlier work by Peres and Shmerkin [16]. Let π_u : R^2 → R be given by π_u(x, y) = ux + y. Theorem 5.11. If X, Y are as in Conjecture 5.9, then for every u ≠ 0, dim π_u(X × Y) = min(1, dim X + dim Y). The proof is given in Section 9.4.

Warm-up: a random walk on measures
In our study of f_b-invariant sets, a central tool will be Furstenberg's notion of a CP-chain [6, 7]. Roughly speaking, this is a random walk on the space of probability measures which at each step jumps from a measure to a suitably re-scaled "piece" of the measure. This framework allows one to view a measure on R^d as a point in an appropriate dynamical system, with the dynamics representing magnification, and provides useful language for describing the recurrence of features of the measure at smaller and smaller scales. Sufficiently regular recurrence of features at different scales gives a very powerful generalization of "self-similarity", or of the hierarchical structure that is present in many examples (such as the sets C_α from Section 2.1). Furthermore, the method of local entropy averages, developed in Section 8, allows one to derive geometric information about the initial measure from the statistics of these orbits.
To fix notation, let b ≥ 2 be an integer; for µ ∈ P([0, 1]^d) and D ∈ D_b with µ(D) > 0, denote the conditional measure of µ on D by µ_D = (1/µ(D)) · µ restricted to D. This measure is, naturally, supported on D, and it is useful to "re-scale" it back to the unit cube. Thus, let L_D : D → [0, 1)^d be the unique homothety from D onto [0, 1)^d and let µ^D = L_D(µ_D). The random walk on measures, alluded to above, can now be described as follows. Starting at some µ_0 ∈ P([0, 1]^d), we jump to µ_1 = (µ_0)^{D_1}, where D_1 ∈ D_b is chosen randomly with probability proportional to its mass µ_0(D_1). Repeating this process, from µ_1 we jump to µ_2 = (µ_1)^{D_2} for a b-adic cell D_2 ∈ D_b chosen randomly with probability proportional to µ_1(D_2). Continuing in this way we obtain a random sequence of measures µ_n, each of which is of the form µ_{n+1} = (µ_n)^{D_{n+1}} for some D_{n+1} ∈ D_b. It is not hard to check that µ_n = (µ_0)^{D^n}, where D^n ∈ D_{b^n} is a decreasing sequence of b-adic cubes whose intersection is a point x. Thus (µ_n)_{n=1}^∞ describes the "scenery" that is observed as one descends to x along b-adic cubes. One can also verify that the random point x arising as above is distributed according to the original measure µ_0 (this is proved, in a slightly modified setting, in Proposition 6.18 below).
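As a toy model of this random walk, one can represent a measure by its masses on b-adic cells at a fixed depth and perform one zoom-in step. The following sketch (the depth, the Cantor-type example measure, and all function names are illustrative assumptions, not notation from the text) does this for a measure in base 3:

```python
import itertools
import random

def zoom(weights, b):
    """One step of the random walk: `weights` maps base-b digit strings of a fixed
    length (b-adic cells) to masses summing to 1.  Pick the first digit D with
    probability mu(D), condition on it, and rescale by stripping that digit."""
    mass = {d: sum(w for s, w in weights.items() if s[0] == str(d)) for d in range(b)}
    r, acc, digit = random.random(), 0.0, 0
    for d in range(b):
        acc += mass[d]
        if r < acc:  # cell chosen with probability proportional to its mass
            digit = d
            break
    return digit, {s[1:]: w / mass[digit] for s, w in weights.items() if s[0] == str(digit)}

random.seed(0)
# Cantor-like measure at depth 5 in base 3: digits in {0, 2}, each cell of mass 2^-5
mu = {''.join(s): 1 / 32 for s in itertools.product('02', repeat=5)}
d1, nu = zoom(mu, 3)
```

Iterating `zoom` descends along a nested sequence of b-adic cells; here the rescaled piece of the Cantor-like measure is again Cantor-like, a first glimpse of the self-similarity that CP-chains capture.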
While this description is heuristically correct, there are various complications which require us to replace the random walk above with a random walk on a suitable symbolic space. The next few sections are devoted to describing this setup more precisely, and to a discussion of some elementary geometric implications.

Measures, distributions and measure-valued integration
For a compact metric space X let P(X) denote the space of Borel probability measures on X, with the weak-* topology: µ_n → µ if and only if ∫ f dµ_n → ∫ f dµ for every f ∈ C(X). This topology is compact and metrizable.
If (X, B, Q) is a probability space then a measurable function X → P(X), x ↦ µ_x, can be integrated to give a measure R = ∫ µ_x dQ(x), defined by R(A) = ∫ µ_x(A) dQ(x). It is a direct verification that this is a probability measure on (X, B). Alternatively, when X is compact one can also use the Riesz representation theorem to define R as the measure corresponding to the positive linear functional C(X) → R given by f ↦ ∫ (∫ f dµ_x) dQ(x). In what follows, we shall use the terms measure and distribution both to refer to probability measures. The term measure will refer to measures on R^d or on sequence spaces, while the term distribution will refer to measures on larger spaces, such as P(R^d) (in this example a distribution is a measure on the space of measures).
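In the purely atomic case the integration of a measure-valued function is just a weighted sum. A minimal sketch (the finite index set and dictionary representation are illustrative assumptions) computing R(A) = ∫ µ_x(A) dQ(x):

```python
def integrate(measure_family, Q):
    """Given an atomic distribution Q = {x: mass} on a finite index set and a
    family x -> mu_x of measures (dicts on a common finite space), return
    R = integral of mu_x dQ(x), i.e. R(A) = sum_x Q({x}) * mu_x(A)."""
    R = {}
    for x, qx in Q.items():
        for a, m in measure_family[x].items():
            R[a] = R.get(a, 0.0) + qx * m
    return R

Q = {'x1': 0.5, 'x2': 0.5}
fam = {'x1': {'a': 1.0}, 'x2': {'a': 0.25, 'b': 0.75}}
R = integrate(fam, Q)
```

The resulting R is again a probability measure, as the direct verification in the text asserts.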
In this section we recall some basic definitions and properties relating to Markov chains, which are processes describing a "random walk" on a space X, in which, from a point x ∈ X, one jumps to a randomly chosen point whose distribution depends (only) on x. These probabilities are encoded in a Markov kernel: Definition 6.1. A Markov kernel on a compact metric space X is a continuous map P : X → P(X), denoted P = {P_x}_{x∈X}, which to each point x ∈ X assigns a distribution P_x ∈ P(X).
Given a Markov kernel P = {P x } x∈X and a random (or non-random) initial point ξ 0 ∈ X, a random walk ξ 0 , ξ 1 , . . . can be generated inductively: assuming we have reached ξ n at time n, jump to a random point ξ n+1 whose distribution is P ξn . The resulting sequence (ξ n ) ∞ n=0 is characterized as follows.
Definition 6.2. A process (ξ_n)_{n=0}^∞ of X-valued random variables is a Markov chain with transition kernel P = {P_x}_{x∈X} and initial distribution Q ∈ P(X) if ξ_0 has distribution Q and Dist(ξ_{n+1} | ξ_0, . . . , ξ_n) = P_{ξ_n} a.s. for every n. It is often convenient to have a more concrete representation of the random variables ξ_n and of the underlying probability space. The standard way to do this is to consider the space X^N of infinite paths (x_0, x_1, . . .) whose coordinates are in X, and let ξ_n : X^N → X denote the coordinate projections, ξ_n(x) = x_n. Definition 6.3. The Markov chain distribution with transition kernel {P_x}_{x∈X} and initial distribution Q ∈ P(X) is the unique distribution Q ∈ P(X^N) such that the coordinate projections ξ_n : X^N → X form a Markov chain with transition kernel {P_x}_{x∈X} and initial distribution Q. Remark 6.4.

1. Given Q and {P_x}_{x∈X}, the existence and uniqueness of Q is demonstrated as follows. For uniqueness, note that Q is determined by its marginals Q_n = Dist(ξ_0, . . . , ξ_n) on X^{n+1}, and by the properties in Definition 6.2 these marginals are characterized by their values on functions f ∈ C(X^{n+1}). For existence, one can check that for Q_n ∈ P(X^{n+1}) defined as above, the distribution Q_{n+1} extends Q_n in the obvious sense, and hence by standard measure theory they have a (unique) extension to X^N.
2. If Q is as in the definition, then the random variables ξ n on the probability space (X N , Q) form a Markov chain in the sense of Definition 6.2. Conversely if (ξ n ) ∞ n=0 is a Markov chain in the sense of Definition 6.2, then their joint distribution is a Markov chain distribution.
Define an operator T_P : P(X) → P(X) by T_P Q = ∫ P_x dQ(x). This is a continuous and affine map. Note that if Q = δ_{x_0} then T_P Q = P_{x_0}. More generally, if (ξ_n)_{n=0}^∞ is a Markov chain and we denote Q_n = Dist(ξ_n), then we have the relation Q_{n+1} = T_P Q_n, because for Borel A ⊆ X, Q_{n+1}(A) = E(P_{ξ_n}(A)) = (T_P Q_n)(A). In particular, by induction Q_n = T_P^n Q_0.
Definition 6.5. A stationary distribution Q for the transition kernel {P x } x∈X is a fixed point for T P .
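On a finite state space these notions become concrete: T_P is multiplication of a row vector by the transition matrix, and Cesàro averages of T_P^n Q yield a stationary distribution. A minimal sketch (the 3-state kernel is an arbitrary illustrative choice):

```python
# A Markov kernel on the finite space X = {0, 1, 2}: row x is the distribution P_x.
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.0, 0.4, 0.6]]

def T(Q):
    """The induced operator on distributions: (T_P Q)(y) = sum_x Q(x) P_x(y)."""
    return [sum(Q[x] * P[x][y] for x in range(len(Q))) for y in range(len(Q))]

def cesaro(Q, N):
    """Krylov-Bogolyubov averaging: Q_N = (1/N) sum_{n<N} T_P^n Q; any weak-*
    limit point of the Q_N is a fixed point of T_P, i.e. stationary."""
    acc = [0.0] * len(Q)
    for _ in range(N):
        acc = [a + q for a, q in zip(acc, Q)]
        Q = T(Q)
    return [a / N for a in acc]

Q_star = cesaro([1.0, 0.0, 0.0], 5000)  # approximately stationary for P
```

On a finite space the averages actually converge; in the compact-metric setting of the text one passes to a weak-* convergent subsequence instead.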
Lemma 6.6. Every Markov kernel on a compact metric space has a stationary distribution. Proof. Begin with any initial distribution Q, and let Q_N = (1/N) Σ_{n=0}^{N−1} T_P^n Q. Then Q_N ∈ P(X). Since P(X) is compact, there is a convergent subsequence Q_{N_k} → Q* ∈ P(X). Then by continuity of T_P, T_P Q* = lim_k T_P Q_{N_k} = lim_k (Q_{N_k} + (1/N_k)(T_P^{N_k} Q − Q)) = Q*. Remark 6.7.

1. In general there can be many stationary distributions.
2. In the proof one could also define each Q_N using a different initial distribution. Proposition 6.8. The Markov chain distribution on X^N is shift-invariant if and only if its initial distribution Q is stationary. Proof. Endow X^N with the Markov chain distribution and let ξ_n denote the random variables given by the coordinate projections from X^N. Note that shift-invariance is equivalent to Dist(ξ_0, . . . , ξ_k) = Dist(ξ_n, . . . , ξ_{n+k}) for all n, k ∈ N. Suppose that the chain distribution is shift-invariant. Since Dist(ξ_n) = T_P^n Q, applying the above with n = 1 and k = 0 gives T_P Q = Q, i.e. Q is stationary. Suppose now that Q is stationary. Fix n and k and let Q_n = T_P^n Q denote the distribution of ξ_n. By the defining properties of the chain it is clear that Dist(ξ_n, . . . , ξ_{n+k}) is the same as the distribution of the first k + 1 terms of the Markov chain when started from Q_n. If Q is stationary then Q_n = Q_0, so Dist(ξ_0, . . . , ξ_k) = Dist(ξ_n, . . . , ξ_{n+k}), and since n, k were arbitrary this implies shift invariance. Definition 6.9. A stationary distribution Q is ergodic if the associated Markov chain distribution is ergodic with respect to the shift.
More intrinsically, Q is ergodic if for every A ⊆ X with Q(A) > 0, for Q-a.e. x, the random walk started from x will reach A after finitely many steps.
Our last task in this section is to show that the ergodic components of a stationary Markov chain distribution are also Markov chain distributions, and for the same kernel. In order to establish this it is necessary to extend our definitions to allow Markov chains that extend backward in time as well as forward.
Definition 6.10. A distribution R ∈ P(X Z ) is a Markov chain distribution for a transition kernel {P x } x∈X if Dist(ξ n+1 |ξ n−k , . . . , ξ n ) = P ξn a.s., for all n ∈ Z and k ∈ N.
Evidently, the restriction of a two-sided Markov chain distribution to the positive coordinates is a Markov chain distribution in the previous sense. One cannot always extend a Markov chain distribution Q ∈ P(X^N) to a two-sided one, but if Q is shift-invariant then one can always do so. Indeed, it is a general fact that if R ∈ P(X^N) is shift-invariant then there is a unique shift-invariant distribution R^± ∈ P(X^Z), called the natural extension of R, characterized by the property that Dist_{R^±}(ξ_n, . . . , ξ_{n+k}) = Dist_R(ξ_0, . . . , ξ_k). Evidently, if Q is Markov then Q^± is a Markov chain in the sense just defined.
Lemma 6.11. A distribution R ∈ P(X^Z) is a Markov chain distribution for {P_x}_{x∈X} if and only if Dist(ξ_n | ξ_{n−1}, ξ_{n−2}, . . .) = P_{ξ_{n−1}} a.s. for all n ∈ Z. (2) Proof. If (2) holds for some n then we obtain Dist_R(ξ_n | ξ_{n−1}, . . . , ξ_{n−k}) = P_{ξ_{n−1}} for all k by taking expectation over the variables (ξ_i)_{i=−∞}^{n−k−1}. On the other hand, if R is a Markov chain with transitions {P_x}, then for any Borel set A ⊆ X, by the martingale convergence theorem, with R-probability one we have P_{ξ_{n−1}}(A) = P_R(ξ_n ∈ A | ξ_{n−1}, ξ_{n−2}, . . . , ξ_{n−k}) → P_R(ξ_n ∈ A | ξ_{n−1}, ξ_{n−2}, . . .) as k → ∞, which gives the other direction. Theorem 6.12. Let Q ∈ P(X^N) be a stationary Markov chain distribution for transition kernel P. Then the ergodic components of Q are a.s. Markov chain distributions for P.
Proof. Consider the distribution R = Q^± ∈ P(X^Z) which is the natural extension of Q. Let I denote the σ-algebra of σ-invariant Borel sets in X^Z. For a sequence x = (x_i)_{i=−∞}^∞, let R_x denote the ergodic component of R to which x belongs. Now, for any n ∈ Z the sequence (x_i)_{i=−∞}^n determines the atom of I to which x belongs (up to R-probability zero), or equivalently, it determines R_x. This can be seen by applying the ergodic theorem "backwards" in time to a dense countable set of functions f ∈ C(X^Z), and noting that (x_i)_{i=−∞}^n determines their ergodic averages and hence the ergodic component. Therefore, by Lemma 6.11, for any Borel set A ⊆ X, with R-probability one, P_{x_{n−1}}(A) = P_R(ξ_n ∈ A | ξ_{n−1} = x_{n−1}, ξ_{n−2} = x_{n−2}, . . .) = P_R(ξ_n ∈ A | ξ_{n−1} = x_{n−1}, ξ_{n−2} = x_{n−2}, . . . , I) = P_{R_x}(ξ_n ∈ A | ξ_{n−1} = x_{n−1}, ξ_{n−2} = x_{n−2}, . . .), which means, by the same lemma, that R_x is Markov with kernel P.
As a corollary, we find that the ergodic stationary distributions for P are precisely the extreme points of the convex, compact set of stationary distributions for P .

Symbolic coding
If one tries to describe the random walk outlined in Section 6.1 using the formalism of the last section, one arrives at the kernel (F_µ)_{µ∈P([0,1]^d)} given by F_µ = Σ_{D∈D_b} µ(D) · δ_{µ^D}, under which µ ∈ P([0, 1]^d) goes to µ^D with probability µ(D). Unfortunately this is not really a kernel, since µ ↦ F_µ is discontinuous. For this reason we work instead in a symbolic space which represents [0, 1]^d, and in which the random walk corresponding to the one above becomes a bona fide Markov chain.
We begin by describing the symbolic coding. Fix a base b and the dimension d of the Euclidean space we work in, and let Λ = {0, 1, . . . , b − 1}^d. This is a set of integer vectors in R^d, and will serve as digits in the b-adic representation of points in [0, 1]^d. Let Ω = Λ^{N_+} endowed with the product topology (with Λ discrete), which makes Ω compact and metrizable. We often denote elements of Ω by ī = (i_1, i_2, . . .). On the other hand we denote finite sequences without parentheses: a = a_1 . . . a_k ∈ Λ^k. The cylinder corresponding to such an a = a_1 . . . a_n is the closed and open set [a] = {ī ∈ Ω : i_k = a_k for k = 1, . . . , n}. Define γ : Ω → [0, 1]^d as follows: for ī ∈ Ω with coordinates i_k = (i_{k,1}, . . . , i_{k,d}) ∈ R^d we define γ(ī) = Σ_{k=1}^∞ b^{−k} i_k. Thus the i-th coordinate of γ(ī) is given in base-b notation by 0.i_{1,i} i_{2,i} i_{3,i} . . .. In particular this shows that the map γ : Ω → [0, 1]^d is surjective. On the other hand, since numbers of the form k/b^n, k, n ∈ N, have two base-b representations, it also shows that γ is not 1-1. Rather, the set of points x ∈ [0, 1]^d with multiple preimages under γ is precisely the set of x having a coordinate of the form k/b^n. This set is a countable union of affine subspaces which form the boundaries of the b-adic cubes.
In the presence of a measure the non-injectivity of γ can often be corrected by ignoring a nullset. For µ ∈ P(R^d), we say that γ is 1-1 µ-a.e. if γ^{−1}(x) is a singleton for µ-a.e. x. By the above this is the same as requiring that µ(∂D) = 0 for all D ∈ D_{b^n}, n ∈ N. If this is the case, then there is a unique ν ∈ P(Ω) with γν = µ, and we then sometimes say that γ is 1-1 µ-a.e.
For a sequence a ∈ Λ^n, it is also clear that γ([a]) = D, where D ∈ D_{b^n} is the unique element containing Σ_{k=1}^n a_k b^{−k}. Thus, up to topological boundaries, the partitions C_n and D_{b^n} are identified under γ, and in particular, if γ is 1-1 µ-a.e. for some µ ∈ P([0, 1]^d) then γ([a]) and D as above agree up to a µ-nullset, and the partitions C_n and D_{b^n} are identified up to nullsets by γ.
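For d = 1 the coding map and the correspondence between cylinders and b-adic intervals can be checked numerically; a small sketch (function names are illustrative):

```python
def gamma(digits, b):
    """Coding map for d = 1: a finite digit sequence (i_1, i_2, ...) is sent to
    sum_k i_k * b^{-k}, the left endpoint of the corresponding cylinder's image."""
    return sum(d * b ** -(k + 1) for k, d in enumerate(digits))

def cylinder_interval(a, b):
    """The cylinder [a] is carried by gamma onto (the closure of) the b-adic
    interval of generation n = len(a) with left endpoint gamma(a)."""
    left = gamma(a, b)
    return left, left + b ** -len(a)

lo_end, hi_end = cylinder_interval([1, 0, 2], 3)  # the triadic interval [11/27, 12/27)
```

The endpoints k/b^n, where two base-b expansions coexist, are exactly where this identification fails to be injective.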

Symbolic magnification of measures
Let σ : Ω → Ω again denote the shift map, (σī)_k = i_{k+1}. For a ∈ Λ^n define the map L_a : [a] → Ω by L_a(ī) = σ^n ī, i.e. L_a deletes the initial block a. This is a homeomorphism [a] → Ω preserving the sequence structure. The map L_a induces a map on measures, P([a]) → P(Ω), by push-forward; we denote this map also by L_a. Given a measure µ ∈ P(Ω) and a ∈ Λ^n we often write µ[a] instead of µ([a]). Assuming that µ[a] > 0, we define the conditional measure µ|_[a] = (1/µ[a]) · µ restricted to [a], and the magnified measure µ^a = L_a(µ|_[a]). These measures satisfy, for words a, c and ī ∈ Ω, µ^a[c] = µ[ac]/µ[a] (3), (µ^a)^c = µ^{ac} (4), and µ[i_1 . . . i_n] = Π_{k=1}^n µ^{i_1...i_{k−1}}[i_k] (5). Proof. For the first identity, calculate: For the second, note that for any c_1 . . . c_r ∈ Λ^r, by several applications of (3),
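The magnification operation and the telescoping identity (5) are easy to verify numerically on an assumed toy Bernoulli measure (all names and parameters below are illustrative):

```python
from math import prod

def magnify(mu, a):
    """Symbolic magnification: mu is given as a function assigning mass to finite
    words (tuples); mu^a = L_a(mu|_[a]) satisfies mu^a[c] = mu[a+c] / mu[a]."""
    total = mu(a)
    return lambda c: mu(a + c) / total

# Example: a Bernoulli(1/4, 3/4) measure on {0,1}^N (a toy choice).
p = (0.25, 0.75)
mu = lambda w: prod(p[s] for s in w)  # prod over the empty word is 1.0

nu = magnify(mu, (0, 1))  # for a Bernoulli measure, mu^a = mu for every word a

# identity (5): mu[i_1..i_n] = prod_k mu^{i_1..i_{k-1}}[i_k]
w = (0, 1, 1, 0)
telescoped = 1.0
for k in range(len(w)):
    telescoped *= magnify(mu, w[:k])((w[k],))
```

The invariance mu^a = mu for Bernoulli measures is exactly the degeneracy that motivates enlarging the state space in the next section.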

CP-chains
Let us now return to the random walk on measures that was outlined in Section 6.1. In symbolic terms, it corresponds to the kernel {P_µ}_{µ∈P(Ω)} given by P_µ = Σ_{a∈Λ} µ[a] · δ_{µ^a}. Unlike its Euclidean relative, the map µ ↦ P_µ is continuous, so P is a true kernel, but it is still not the "right" random walk to consider. The reason is that the sequence of measures that one sees when one descends along nested cylinder sets does not tell us which cylinder sets were chosen, and this information will be important to us later on. To demonstrate this shortcoming, consider Ω = {0, 1}^{N_+} with the uniform product measure µ. Then µ^a = µ for every finite word a, and so Q = δ_µ is stationary for the kernel described above and the associated Markov chain is trivial. On the other hand, in the course of generating the Markov chain in this example, one chooses, at each step, a symbol a ∈ {0, 1} uniformly and independently of previous choices. This random sequence of symbols mirrors µ itself, and we shall see that this connection is general and can be exploited to great benefit.
Thus, in order to keep track of these choices, we enlarge the state space to Φ = Λ × P(Ω) and consider the kernel F = {F_{(i,µ)}}_{(i,µ)∈Φ} given by F_{(i,µ)} = Σ_{j∈Λ} µ[j] · δ_{(j, µ^j)}. Some remarks:
1. There may be j ∈ Λ for which µ j is undefined, but in this case the transition to (j, µ j ) occurs with probability 0.
2. The symbol i does not play any role in the definition of F (i,µ) . Rather, it records "where we came from". The symbol j ∈ Λ "to which we go" is recorded in the resulting state (j, µ j ).

3. The map (i, µ) ↦ F_{(i,µ)} is continuous.
Definition 6.16. A (symbolic) CP-distribution is a stationary distribution for F . A sequence of random variables (ξ n ) ∞ n=0 representing the associated Markov chain is called a CP-chain. The associated measure on Φ N is called the CP-chain distribution.
If P ∈ P(Φ) = P(Λ × P(Ω)) is a CP-distribution, we often shall identify it with the marginal distribution of P on its second coordinate, P(Ω). Thus for f : P(Ω) → R we may write ∫ f(ν) dP(ν) instead of ∫ f(ν) dP(i, ν). 2. More generally, any σ-invariant measure µ ∈ P(Ω) gives rise to two kinds of stationary distributions. The first is P = ∫ δ_{(ω_1, δ_{σω})} dµ(ω), which is by definition supported on atomic measures of the form δ_ω. Then, where in the second-to-last equality we used the shift-invariance of µ, P is stationary.
3. The second distribution arising from a σ-invariant measure µ is more interesting.
Consider the σ-algebra F^− generated by the "past" coordinates (ω_i)_{i≤0} of the natural extension. There is a family of conditional measures {µ_ω}, measurable with respect to F^−, such that µ_ω is supported on the atom [ω]_{F^−} of ω, and µ = ∫ µ_ω dµ(ω). This family is defined a.e. and is unique up to measure 0 changes. Informally, given coordinates (ω_i)_{i≤0} describing the "past", the measure µ_ω ∈ P(Ω) is the conditional distribution of (ω_i)_{i≥1} (note that µ_ω depends only on the negative coordinates).
Since µ is σ-invariant, if ω is distributed according to µ, then the distribution of µ_{σω} is the same as that of µ_ω. On the other hand, clearly µ_{σω} = (µ_ω)^{ω_1}, and the conditional probability of the symbol ω_1 given the past is µ_ω[ω_1]. It is interesting to note that this distribution coincides with the previous one when µ has entropy 0 with respect to the shift (equivalently, when πµ ∈ P([0, 1]) has dimension 0). Then the measures µ_ω reduce to points: the infinite past completely determines the future, and P is again supported on point masses distributed according to µ.
One of the crucial properties of CP-chains is that they describe "zooming in" on a measure along nested cylinders which are chosen with the probabilities assigned by the original measure. This property is called adaptedness. Proposition 6.18. Let (i_n, µ_n)_{n=0}^∞ denote the CP-chain with initial distribution Q ∈ P(Φ) (so here i_n, µ_n denote random variables). Then for every n and a_1 . . . a_n ∈ Λ^n, P(i_1 . . . i_n = a_1 . . . a_n | µ_0) = µ_0[a_1 . . . a_n]. In particular, conditioned on µ_0, the random point ī = (i_1, i_2, . . .) ∈ Ω is distributed according to µ_0.
Proof. By definition of the transition kernel F, with probability one, µ_k = (µ_{k−1})^{i_k} for all k, so by iterating Equation (4), µ_k = (µ_0)^{i_1...i_k}, which, using Equation (5) and the law of total probability, implies P(i_1 . . . i_n = a_1 . . . a_n | µ_0) = Π_{k=1}^n P(i_k = a_k | µ_0, (i_1 . . . i_{k−1}) = (a_1 . . . a_{k−1})) = Π_{k=1}^n (µ_0)^{a_1...a_{k−1}}[a_k] = µ_0[a_1 . . . a_n]. This gives the first statement. The second is immediate from the first, since, conditioned on µ_0, the distribution of ī = (i_1, i_2, . . .) is determined by the probabilities P(ī ∈ [a_1 . . . a_n] | µ_0), which by the above are the same as µ_0[a_1 . . . a_n].
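Adaptedness is easy to observe in the product-measure example mentioned earlier: for a Bernoulli measure µ^a = µ for every word a, so the measure component of the chain is constant and the symbol component is i.i.d. with marginal µ. A simulation sketch (parameters and names are illustrative):

```python
import random

def run_cp_chain(p, steps, rng):
    """Sketch of the CP-chain for a Bernoulli(p) measure on {0,1}^N: since
    mu^a = mu for every finite word a, the measure component never changes,
    and the symbol component is i.i.d. with marginal mu (adaptedness)."""
    symbols = []
    for _ in range(steps):
        symbols.append(0 if rng.random() < p[0] else 1)
        # mu_{n+1} = (mu_n)^{i_{n+1}} = mu_n here, so nothing to update
    return symbols

rng = random.Random(1)
syms = run_cp_chain((0.25, 0.75), 20000, rng)
freq1 = sum(syms) / len(syms)  # empirical frequency of the symbol 1, near 0.75
```

The empirical digit frequencies match µ_0, which is exactly the content of the proposition in this degenerate case.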

Shannon information and entropy
Let µ be a probability measure on a probability space (X, F) and A = {A_i}_{i∈N} a finite or countable measurable partition of X. The information function I_{µ,A} : X → R of µ and A is I_{µ,A}(x) = − log µ(A(x)), where as usual A(x) is the atom of A containing x. The Shannon entropy of A is the mean value of the information function: H(µ, A) = ∫ I_{µ,A} dµ = − Σ_i µ(A_i) log µ(A_i), with the convention 0 log 0 = 0. Intuitively, H(µ, A) measures how "finely" A partitions the probability space (X, µ), or how uniformly µ is spread out among the atoms. This is evident from the following basic properties, which we do not prove (see e.g. [3]). One technical problem which we shall encounter later when estimating entropy is that the function (µ, m) ↦ (1/m) H(µ, D_m) is not continuous (it is continuous when µ is restricted to the space of non-atomic measures, but not uniformly so). However, continuity does hold in an asymptotic sense: if m is large then small changes to µ and m have only a mild effect on the entropy. The following lemmas make this precise. Lemma 6.20. Let µ ∈ P(R^d) and m ∈ N.
where C 1 depends only on d.

(Translation) If ν(·)
Finally, the following important inequality is essentially a consequence of convexity of the information function: Lemma 6.21. Let (p_j), (q_j) be probability vectors with q_j = 0 =⇒ p_j = 0. Then − Σ_j p_j log q_j ≥ − Σ_j p_j log p_j.
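The entropy definition and the inequality of Lemma 6.21 can be checked directly for finite probability vectors; a minimal sketch (the example vectors are arbitrary):

```python
from math import log

def entropy(ps):
    """Shannon entropy H = -sum_j p_j log p_j, with the convention 0 log 0 = 0."""
    return -sum(p * log(p) for p in ps if p > 0)

def cross_entropy(ps, qs):
    """-sum_j p_j log q_j, defined when q_j = 0 forces p_j = 0 (Lemma 6.21)."""
    return -sum(p * log(q) for p, q in zip(ps, qs) if p > 0)

p = (0.5, 0.25, 0.25)
q = (0.8, 0.1, 0.1)
H_p = entropy(p)
CE = cross_entropy(p, q)  # always >= H_p, with equality iff p = q
```

The inequality says that the "code" adapted to q is suboptimal for p, which is the convexity statement of the lemma in information-theoretic dress.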

Geometric properties of CP-distributions
Recall that γ : Ω = Λ^{N_+} → [0, 1]^d is the geometric coding map. We denote elements of Φ^N by (ī, µ̄) = (i_n, µ_n)_{n=0}^∞ ∈ Φ^N (these are now elements of the sequence space, not a sequence of random variables). Definition 6.22. If P ∈ P(Φ) is a CP-distribution we denote by P̃ ∈ P(P([0, 1]^d)) the distribution P̃ = γτP, where τ : Φ → P(Ω) is the projection to the second component. We call P̃ the geometric version of P, and say that it is a geometric CP-distribution.
Our first task is to address the non-injectivity of γ. For k = 1, . . . , d let Ω^(k) = {ī ∈ Ω : i_{n,k} = b − 1 for all n}. Note that γ(Ω^(k)) is a face of the cube [0, 1]^d, namely {x : x_k = 1}. The next lemma allows us to assume that the measures of a CP-distribution make γ : Ω → [0, 1]^d a.e. injective.
Lemma 6.23. Let P be an ergodic CP-distribution. Then the probability that γµ, µ ∼ P, gives positive mass to ∂D for some D ∈ D_b is 0 or 1. In the latter case γµ is P-a.s. supported on a face of the cube of the form x_k = 1 for some k = 1, . . . , d, and correspondingly µ is supported on the set Ω^(k). In this case P can be identified with a CP-distribution constructed in dimension d − 1. Proof. Consider the shift-invariant and ergodic distribution on Φ^Z corresponding to P. For each k, the event A_k that the measure component gives positive mass to Ω^(k) is shift-invariant, and hence by ergodicity P(A_k) = 0 or 1. By the previous proposition, either P(A_k) = 1, in which case µ is supported on Ω^(k), P-a.s., or else P(A_k) = 0, in which case µ gives Ω^(k) mass 0, P-a.s. The corresponding statement for γµ and faces of [0, 1]^d follows.
Finally, if P(A_k) = 1 one can use the natural identification of Ω^(k) with ({0, . . . , b − 1}^{d−1})^N to identify P with a CP-distribution of dimension d − 1.
Our next goal is to obtain an expression for the dimension of γµ when µ ∈ P(Ω) is a typical measure for a CP-distribution P. A key lemma for us will be the representation of the mass of long cylinders as an ergodic-like average. Define the function I : Φ^Z → R by I((i_n, µ_n)_n) = − log µ_0[i_1]. This is of course just the information function I_{µ_0,C_1} evaluated at ī (see Section 6.7). Lemma 6.24. If (i_n, µ_n) ∈ Φ^N satisfies µ_n = (µ_{n−1})^{i_n} for all n, then, writing µ = µ_0 and ī = (i_1, i_2, . . .), − log µ[i_1 . . . i_n] = Σ_{j=0}^{n−1} I(σ^j(ī, µ̄)). Proof. Immediate by taking logarithms in the identity µ[i_1 . . . i_n] = Π_{k=1}^n µ^{i_1...i_{k−1}}[i_k] (Equation (5)), and using the fact that µ^{i_1...i_{k−1}} = µ_{k−1} (which follows from the definition of the kernel F and Equation (4), as in the proof of Proposition 6.18).
Lemma 6.25. Let P be a CP-distribution and let P̄ denote the corresponding shift-invariant distribution. Then ∫ I dP̄ = ∫ H(µ, C_1) dP(µ). Proof. Using Proposition 6.18, we calculate: ∫ I dP̄ = E(−log µ_0[i_1]) = E(−Σ_{j∈Λ} µ_0[j] log µ_0[j]) = ∫ H(µ, C_1) dP(µ). Proposition 6.26. Let P be an ergodic CP-distribution with geometric version P̃. Then P-a.e. µ is exact dimensional and the dimension is given by dim γµ = (1/log b) ∫ H(ν, C_1) dP(ν). Proof. By the previous proposition, we may assume that γµ(∂D_{b^n}) = 0 for P-a.e. µ, since otherwise we reduce to a lower-dimensional situation. Let us first re-state our objective, which is to show that for P-typical µ, for γµ-a.e. x, lim_{n→∞} −(1/(n log b)) log γµ(D_{b^n}(x)) = (1/log b) ∫ H(ν, C_1) dP(ν). By definition, the point x = γ(ī) is distributed according to γµ if ī ∈ Ω is distributed according to µ. Hence, using the fact that γµ(D_{b^n}(x)) = µ[i_1 . . . i_n], what we need to prove is that for P-a.e. µ, for µ-a.e. ī ∈ Ω, lim_{n→∞} −(1/n) log µ[i_1 . . . i_n] = ∫ H(ν, C_1) dP(ν). (8) Let P̄ ∈ P(Φ^N) be the CP-chain distribution corresponding to P. Then by Proposition 6.18, choosing µ according to P and ī ∈ Ω according to µ is the same as choosing (i_n, µ_n)_{n=0}^∞ according to P̄ and taking µ = µ_0 and ī = (i_1 i_2 . . .). Thus we need to prove (8) for a.e. µ, ī chosen in this way.
The proof is now completed by noting that by (7), −(1/n) log µ[i_1 . . . i_n] = (1/n) Σ_{j=0}^{n−1} I(σ^j(ī, µ̄)), which, by the ergodic theorem, converges to ∫ I dP̄ a.s. over the choice of (ī, µ̄). By Lemma 6.25, this integral is just ∫ H(µ, C_1) dP(µ), as claimed. Definition 6.27. If P is an ergodic CP-distribution we denote by dim P the a.s. dimension of γµ for µ ∼ P.
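The dimension formula can be probed numerically: for the Cantor measure in base 3 (mass 1/2 on each of the digits 0 and 2), the information averages converge to H/log 3 = log 2/log 3. A sketch (the sampling scheme and names are illustrative):

```python
import random
from math import log

def information_average(p, b, n, rng):
    """Estimate -1/(n log b) * log mu[i_1..i_n] along a mu-random word for a
    Bernoulli(p) measure in base b; by the dimension formula this tends to
    H(mu, C_1) / log b, the dimension of the projected measure."""
    s = 0.0
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for w in p:
            acc += w
            if r < acc:
                s += -log(w)  # the information function I = -log mu_0[i_1]
                break
    return s / (n * log(b))

rng = random.Random(0)
est = information_average((0.5, 0.0, 0.5), 3, 1000, rng)
# est = log 2 / log 3 ~ 0.6309, the dimension of the Cantor measure on C_{1/3}
```

Here every chosen digit carries mass 1/2, so the average is exactly log 2/log 3; for a non-uniform p the same estimator converges by the ergodic theorem.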

Constructing CP-distributions from f b -invariant sets
Recall that C_n is the partition of Ω = Λ^N into cylinders of length n. We generally denote elements of Ω by ī = (i_1, i_2, . . .). Lemma 7.1. Let µ ∈ P(Ω). Then Proof. The proof is a computation based on taking logarithms in the identity µ([i_1 . . . i_n]) = Π_{k=1}^n µ^{i_1...i_{k−1}}[i_k] (Equation (5)) and integrating. In more detail, using the identity µ = Σ_{[a]∈C_n} µ[a] · µ|_[a], we have The second claim follows from the first, since by Proposition 6.18, and writing i_{−1} = 0 (arbitrarily), we have

Dimension of invariant sets
Before discussing intersections of sets we prove a result about a single f b -invariant set which we shall later use, and which also provides a self-contained proof of the coincidence of Minkowski and Hausdorff dimension for such sets.
Theorem 7.2. Let X ⊆ [0, 1] be a closed, f_b-invariant set with dim_M X = α. Then there is a b-adic ergodic CP-distribution P such that πν is supported on X and dim ν = α, P-a.s.
We pass to Ω. Let U_n = {a ∈ Λ^n : π[a] ∩ I ≠ ∅ for some I ∈ I_n}, so that 1 ≤ |U_k|/|I_k| ≤ 2, and hence (1/N_k) log |U_k| → α. For a ∈ U_k let y_a ∈ [a] ∩ π^{−1}X be a representative point and set Next, run the Furstenberg chain from time 0 to time N_k starting at (0, ν_k). We obtain distributions P_k given by Since P(Φ) is compact, by passing to a further subsequence we may assume that P_k → P, and we have seen in the proof of Lemma 6.6 and the remark following it that P is F-stationary, i.e. is a CP-distribution. We claim that P-a.e. ν is supported on X. Indeed, since X is closed and π is continuous, the set {ν ∈ P(Λ^N) : γν(X) = 1} is closed in the weak-* topology, and so it is enough to show that P_k-a.e. ν satisfies γν(X) = 1. To see this we must show that for each 0 ≤ n ≤ N_k and a ∈ Λ^n, the measure γ(ν_k^a) is supported on X. Indeed, π(ν_k), and hence π((ν_k)|_[a]), are supported on X, and since f_b^n X ⊆ X, we also have that π(ν_k^a) = π(σ^n((ν_k)|_[a])) = f_b^n(π((ν_k)|_[a])) is supported on X, as desired.
On the other hand, H(·, C_1) : P(Ω) → R is continuous. We thus have ∫ H(τ, C_1) dP(τ) = lim_k ∫ H(τ, C_1) dP_k(τ) ≥ α. Since P is the integral of its ergodic components, there is a set of positive measure of ergodic components P' of P with ∫ H(τ, C_1) dP'(τ) ≥ α and such that P'-a.e. ν is supported on X. Let ν be a typical measure for P', and µ = πν. By Corollary 6.26, dim µ ≥ α, as required.
Proposition 4.3 follows from the theorem above.

Eigenfunctions
Let X 0 be a compact metric space, X = X N 0 , and σ : X → X the shift map defined in the usual way. Let µ ∈ P(X) be a σ-invariant and ergodic probability measure. A function f : X → S 1 = {z ∈ C : |z| = 1} is called an eigenfunction for (X, µ, σ) with eigenvalue λ ∈ S 1 if f (σx) = λf (x) for µ-a.e. x.
In the situation above, write R : S^1 → S^1 for the rotation map R(z) = λz. Then ν = fµ is an R-invariant measure on S^1. In particular, if λ is not a root of unity then the only R-invariant measure on S^1 is normalized Lebesgue measure, and so ν must be this measure.
We require a slight generalization of the situation above where f is set-valued. Let H denote the space of closed, non-empty subsets of S^1, which can be made into a compact metric space using the Hausdorff metric. We say that a measurable function f : X → H is an eigenfunction with eigenvalue λ if f(σx) = λf(x) for µ-a.e. x, where on the right-hand side λf(x) = {λz : z ∈ f(x)}. We exclude the trivial case that f(x) = S^1 a.e., for which the equation holds for any λ ∈ S^1. Lemma 7.3. Let f : X → H be an eigenfunction. Then there is a set E ∈ H such that f(x) is a rotation of E for µ-a.e. x.
Proof. S^1 acts continuously on H by rotations, with ρ ∈ S^1 acting by E ↦ ρE. By the eigenfunction property, f(x) and f(σx) lie in the same S^1-orbit, so by ergodicity fµ must be supported on a single S^1-orbit in H. This was the claim. Lemma 7.4. Let f : X → H be an eigenfunction whose eigenvalue λ is not a root of unity, and let U ⊆ S^1 have positive Lebesgue measure. Then µ(x : f(x) ∩ U ≠ ∅) > 0. Proof. Let E ∈ H be as in the previous lemma. Suppose first that E has no rotational symmetries, i.e. ρE ≠ E for all ρ ∈ S^1 \ {1}. Then for µ-a.e. x we have f(x) = ρE for a unique ρ = ρ(x) ∈ S^1. It is easy to see that this implies that ρ = ρ(x) is measurable in x (this uses the fact that E is closed), and we have ρ(σx)E = f(σx) = λf(x) = λρ(x)E, so ρ is an eigenfunction with eigenvalue λ. Choose z_0 ∈ E and set f'(x) = ρ(x)z_0, which is also an eigenfunction with eigenvalue λ and satisfies f'(x) ∈ f(x) a.s. Now, f'µ is normalized Lebesgue measure on S^1, hence f'µ(U) > 0. This means by definition that µ(x : f'(x) ∈ U) > 0. But f'(x) ∈ f(x) µ-a.s., so the event {x : f'(x) ∈ U} is a.s. contained in the event {x : f(x) ∩ U ≠ ∅}, and the lemma follows.
In general, let G denote the group of rotational symmetries of E, i.e. those ρ ∈ S^1 such that ρE = E. Since E is closed so is G, and since E ≠ S^1 also G ≠ S^1; so G, being a proper closed subgroup of S^1, is finite, and consists of roots of unity of some order N. Let ϕ : S^1 → S^1 be the map z ↦ z^N. It is then easy to check that E' = ϕE has no rotational symmetries (any such symmetry could be lifted to a symmetry of E that is not in G, a contradiction). Now define f' = ϕ ∘ f. This is an H-valued eigenfunction with eigenvalue λ^N, and f'(x) is a rotation of E' µ-a.e. Thus by the first case discussed above, if V ⊆ S^1 has positive Lebesgue measure then µ(x : f'(x) ∩ V ≠ ∅) > 0. Taking V = ϕU (which is measurable since ϕ is a local homeomorphism) and using the fact that f'(x) ∩ V ≠ ∅ if and only if f(x) ∩ U ≠ ∅, we obtain the claim.
Corollary 7.5. For f, λ as in the previous lemma, and any set X' ⊆ X of full measure, the set f(X') = ∪_{x∈X'} f(x) has full Lebesgue measure (and is Lebesgue measurable).
Proof. The only subtlety here is the issue of measurability. By the theorems of Egorov and Lusin, we can find compact subsets X_n ⊆ X' on which f is continuous, with µ(X' \ ∪ X_n) = 0. Write X'' = ∪ X_n, so X'' has full measure. Also, the sets f(X_n) are compact, so f(X'') = ∪ f(X_n) is measurable. By the previous lemma (applied to U = S^1 \ f(X'')) we find that f(X'') has full Lebesgue measure. Since f(X') ⊇ f(X''), this implies that f(X') is Lebesgue measurable and of full measure.

Furstenberg's intersection theorem
In this section we prove Theorem 5.10. As a first observation, we claim that if U ≠ ∅ then U is dense in [0, ∞). Indeed, suppose that u ∈ U and write E = (X × Y) ∩ ℓ_{u,v}. Applying the map f_a × id to E and using the invariance of X × Y under this map, we obtain that f_a × id(E) ⊆ X × Y. The set f_a × id(ℓ_{u,v}) is the union of finitely many line segments of slope u/a, hence, by the above, f_a × id(E) is a subset of a union of the form ∪_{i=1}^k ℓ_{u/a,v_i}. Since f_a × id is piecewise bi-Lipschitz, dim(f_a × id(E)) = dim E = α. Hence one of the line segments ℓ_{u/a,v_i} intersects X × Y in a set of dimension ≥ α, i.e., u/a ∈ U. Similarly, applying id × f_b to E, we find that there is a line segment ℓ_{bu,v'} which intersects X × Y in a set of dimension ≥ α, so bu ∈ U. In short, U is invariant under multiplication by b and 1/a, or equivalently, log U = {log u : u ∈ U} is invariant under addition of log b and subtraction of log a. Since log b/log a ∉ Q, it follows from a well known fact that log U is dense in R, i.e. that U is dense in [0, ∞).
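The density claim rests on the classical fact that, since log b/log a is irrational, the additive group generated by log a and log b projects to a dense orbit of an irrational circle rotation. This is easy to probe numerically (the function name is an illustrative choice):

```python
from math import log

def min_gap(alpha, N):
    """Minimal gap between the points {n*alpha mod 1}, n = 0..N-1, on the circle.
    For irrational alpha the orbit is dense, so the minimal gap tends to 0."""
    pts = sorted((n * alpha) % 1.0 for n in range(N))
    gaps = [q - p for p, q in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])  # wrap-around gap on the circle
    return min(gaps)

alpha = log(2) / log(3)  # irrational ratio log b / log a for, e.g., a = 3, b = 2
```

Since the N gaps sum to 1, the minimal gap is at most 1/N, and for irrational alpha it keeps shrinking, giving the density of log U used above.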
The next theorem says that the density established above can be improved to full Lebesgue measure. We first consider how a measure µ ∈ P([0, 1]) can be affinely embedded in X × Y. Let ϕ_{u,v} : [0, 1] → T^2 denote the affine embedding ϕ_{u,v}(t) = (t, ut + v mod 1), and let L(µ) denote the set of u ∈ R for which there is a v such that ϕ_{u,v}µ is supported on X × Y. This is a closed set. We make two observations. Lemma 7.6. If u ∈ L(µ) then bu ∈ L(µ). Similarly, if ν ∈ P(Λ^N) and u ∈ L(πν) then bu ∈ L(πν).
Proof. For any u, v, observe that (id × f_b) ∘ ϕ_{u,v} = ϕ_{bu,v′} for some v′. The claim follows.
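The identity in this proof is elementary and easy to check numerically. In the following sketch (the parameter values are arbitrary), the shifted intercept is v′ = bv mod 1, which works because b is an integer:

```python
import random

# Numerical check (illustrative only) of the identity behind Lemma 7.6:
# composing phi_{u,v} with id x f_b gives phi_{bu, v'} with v' = b*v mod 1.
def phi(u, v, t):
    return (t, (u * t + v) % 1.0)

def circle_dist(x, y):          # distance on R/Z
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

b, u, v = 3, 0.7, 0.31
vp = (b * v) % 1.0              # the shifted intercept v'
random.seed(0)
for _ in range(1000):
    t = random.random()
    x1, y1 = phi(u, v, t)
    lhs = (x1, (b * y1) % 1.0)  # (id x f_b) applied to phi_{u,v}(t)
    rhs = phi(b * u, vp, t)
    assert lhs[0] == rhs[0] and circle_dist(lhs[1], rhs[1]) < 1e-9
print("identity verified on 1000 random points")
```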
Lemma 7.7. If u ∈ L(µ) and I = [k/a, (k+1)/a) satisfies µ(I) > 0, then u/a ∈ L(µ^I); the analogous statement holds in the symbolic setting.
Proof. Let I = [k/a, (k+1)/a) and ψ(t) = (1/a)t + k/a. Let v ∈ R be such that ϕ_{u,v}µ is supported on X × Y. Since ψµ^I = µ_I, it follows that ϕ_{u,v}ψµ^I is also supported on X × Y. But a calculation shows that, after applying f_a × id (which preserves X × Y), ϕ_{u,v} ∘ ψ becomes ϕ_{u/a,v′} for some v′ ∈ R. The claim follows, and the second part is proved similarly.
Theorem 7.8 (Furstenberg 1970). Let X be closed and f_a-invariant, let Y be closed and f_b-invariant, with a ≁ b, and suppose that dim((u_0X + v_0) ∩ Y) ≥ α for some u_0 ≠ 0 and v_0. Then for Lebesgue-a.e. u there exist a measure µ with dim µ ≥ α and a v such that ϕ_{u,v}µ is supported on X × Y.
Proof. Assume without loss of generality that b > a. We begin as in the proof of Theorem 7.2. Start with measures µ_k supported on (uX + v) ∩ Y with (1/N_k) H(µ_k, D_{a^{N_k}}) → α. Lifting µ_k to ν_k ∈ P(Ω) using a-adic coding and running the a-adic Furstenberg operator N_k steps starting from (0, ν_k), we obtain a sequence P_k ∈ P(Φ) of distributions; after passing to a subsequence we can assume they converge to an a-adic CP-distribution P with ∫ I dP ≥ α. Replacing P by an appropriate ergodic component we can assume that P is an ergodic CP-distribution and ∫ I dP ≥ α, hence by Corollary 6.26, dim γν ≥ α for P-a.e. ν. As in the previous proof, for P-a.e. ν the measure γν is supported on X.
Since µ_k is supported on (uX + v) ∩ Y, we have u ∈ L(µ_k), so by the lemmas preceding the theorem, for every i ∈ Λ^n with ν_k([i]) > 0 we have u/a^n ∈ L(γ(ν_k^{i_1...i_n})), and hence b^m u/a^n ∈ L(γ(ν_k^{i_1...i_n})) for all m. If n is large enough that u/a^n < 1, then there is an m such that b^m u/a^n ∈ [1, b]. Thus, if for µ ∈ P([0, 1]) we set U(µ) = L(µ) ∩ [1, b], then U(γ(ν_k^{i_1...i_n})) ≠ ∅ for all large enough k, n and i ∈ Λ^n for which ν_k^{i} is defined. It follows that P_k(ν : U(γν) ≠ ∅) → 1 as k → ∞, and since µ ↦ U(µ) is continuous, we find that P(ν : U(γν) ≠ ∅) = 1. Next, note that if ω = (i_n, ν_n)_{n=0}^∞ is a typical sequence in the Markov chain started from P, then again by the lemmas preceding the theorem, since ν_1 = ν_0^{i_1} and b > a, U(γν_1) ⊇ {b^m u / a : u ∈ U(γν_0), m ∈ Z} ∩ [1, b]. Thus, defining the set-valued function f : Φ^N → H by f(ω) = {e^{2πi log_b u} : u ∈ U(γν_0)}, we have f(σω) ⊇ e^{−2πi log_b a} f(ω); by ergodicity, we must a.s. have f(σω) = e^{−2πi log_b a} f(ω). Finally, with respect to the ergodic shift-invariant distribution P̃ ∈ P(Φ^N) corresponding to P, the function f is an H-valued eigenfunction with eigenvalue e^{−2πi log_b a}, which, since a ≁ b, is not a root of unity. By Corollary 7.5, the image under f of any set of full measure has full Lebesgue measure. But this precisely means that for Lebesgue-a.e. u there is a measure µ with dim µ ≥ α, and a v, such that ϕ_{u,v}µ is supported on X × Y. This proves the theorem.
We can now prove the results on intersections that we stated earlier: Theorem 7.9 (Furstenberg). Let X be closed and f_a-invariant, let Y be closed and f_b-invariant, with a ≁ b and dim X + dim Y < 1/2. Then dim((uX + v) ∩ Y) = 0 for all u ≠ 0 and all v.
Proof. Suppose the conclusion were false. Then by the preceding theorem, in a.e. direction there is a line intersecting X × Y in a set of positive dimension. In other words, for a.e. direction u ∈ S^1 there exist x, y ∈ X × Y such that 0 ≠ x − y has direction u, so dim({x − y : x, y ∈ X × Y}) ≥ 1. On the other hand, the map (R^2)^2 → R^2, (x, y) ↦ x − y, is a Lipschitz map, so the image of (X × Y)^2 has dimension at most dim((X × Y)^2) = 2 dim(X × Y) = 2(dim X + dim Y). By assumption this is less than 1. This contradiction proves the theorem.

Kakeya-type problems
The argument used in the last theorem solves the intersections conjecture when dim X + dim Y < 1/2 and raises the following problem: Problem 7.10. Suppose Z ⊆ R^2 is a set such that in every (or almost every) direction there is a line ℓ with dim(Z ∩ ℓ) ≥ α. When can one conclude that dim Z ≥ 1 + α?
If the answer were affirmative for products of the form Z = X × Y with X, Y as in Theorem 5.10, then the intersections conjecture would follow from that theorem.
Fubini-type heuristics would lead one to believe that the answer is affirmative in general, but this is not the case; see [19]. It is an open problem to find the best lower bound on dim Z in terms of α. However, known examples do not rule out the possibility that the answer is affirmative for the sets of the form X × Y that interest us.
It is worth noting that the problem is related to the following well-known problem: Conjecture 7.11 (Kakeya). If Z ⊆ R d is a set which contains a line segment in every direction, then dim Z = d.
In dimension d = 2 there is a relatively elementary proof, see e.g. Falconer [5]. For d ≥ 3 the conjecture remains open. For a comprehensive, though slightly outdated, survey, see Tom Wolff's article [19], which also contains a discussion of Problem 7.10.

Martingale differences and their averages
We recall some standard tools from probability and analysis.
Definition 8.2. Let (Ω, B, µ) be a probability space and (F_n) a filtration. A sequence (f_n) of L^1-functions is called a martingale difference sequence^10 if it is adapted to (F_n) and E(f_n | F_{n−1}) = 0.
Starting with an L 1 sequence (g n ) adapted to (F n ), one obtains a martingale difference sequence by setting f n = g n − E(g n |F n−1 ).
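As a concrete (toy) illustration of this construction, take fair coin flips with the natural filtration; the conditional expectations can then be computed exactly by enumerating outcomes:

```python
from itertools import product

# Illustrative sketch: for fair coin flips w_1..w_N with the natural
# filtration F_n = sigma(w_1..w_n), take g_n = w_1 + ... + w_n.  Then
# E(g_n | F_{n-1}) = g_{n-1} + 1/2, so f_n = g_n - E(g_n | F_{n-1})
#                 = w_n - 1/2 is a martingale difference sequence.
N = 6
omegas = list(product([0, 1], repeat=N))  # all 2^N equally likely outcomes

def cond_exp(f, n, omega):
    """E(f | F_{n-1}) at omega: average f over outcomes agreeing with
    omega in the first n-1 coordinates."""
    block = [w for w in omegas if w[: n - 1] == omega[: n - 1]]
    return sum(f(w) for w in block) / len(block)

for n in range(1, N + 1):
    f_n = lambda w: w[n - 1] - 0.5   # candidate martingale difference
    for omega in omegas:
        assert abs(cond_exp(f_n, n, omega)) < 1e-12
print("E(f_n | F_{n-1}) = 0 for all n, as required")
```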
The only fact we need about martingale differences is a consequence of the following ergodic-like theorem for orthogonal functions. The proof is similar to the standard proof of the law of large numbers for independent random variables using Kolmogorov's inequality (which is usually stated for i.i.d. random variables, but is valid with the same proof for martingale differences). Note also that L^2-martingale difference sequences form an orthogonal sequence in L^2, and together with norm-boundedness this is enough to ensure that the averages converge a.e. to 0. In fact one can make do with even weaker non-correlation conditions, see e.g. [14].
Corollary 8.4. Let (g_n) be a sequence of functions and (F_n) a filtration such that for some p and every 0 ≤ k < p, the sequence (g_{np+k})_n is a martingale difference sequence for (F_{np+k})_n, and sup_n ‖g_n‖_2 < ∞. Then (1/N) Σ_{i=1}^N g_i → 0 a.s. and in L^2.
Proof. For any N we can write N = N_0 p + k_0 with 0 ≤ k_0 < p, and then (1/N) Σ_{i=1}^N g_i = Σ_{k=0}^{p−1} (N_0/N) · (1/N_0) Σ_{i=1}^{N_0} g_{ip+k} + o(1). Since by the previous theorem (1/N_0) Σ_{i=1}^{N_0} g_{ip+k} → 0 a.s. and in L^2 for each 0 ≤ k < p, and since there are p terms in the sum and N_0/N → 1/p as N → ∞, the corollary follows.
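The decomposition of the average into residue classes mod p is a purely algebraic identity, which the following sketch verifies on random data (the data itself plays no role):

```python
import random

# Sanity check (illustration, not the proof) of the decomposition used
# above: an average over 1..N splits into p averages along the residue
# classes k, p+k, 2p+k, ..., each weighted by (size of class)/N ~ 1/p.
random.seed(1)
p, N = 3, 1000
g = [random.gauss(0, 1) for _ in range(N + 1)]   # g[1..N]; g[0] unused

total = sum(g[1 : N + 1]) / N
by_class = 0.0
for k in range(p):
    idx = [i for i in range(1, N + 1) if i % p == k]
    by_class += (len(idx) / N) * (sum(g[i] for i in idx) / len(idx))

assert abs(total - by_class) < 1e-9   # exact algebraic identity
print(total, by_class)
```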

Local entropy averages
Throughout this section and the coming ones we fix an implicit (arbitrary) integer parameter b ≥ 2 and suppress it in our notation.
The following theorem allows one to compute the dimension of a measure µ at a typical point x via the average behavior of the measure on the b-adic cells D_{b^n}(x) descending to x. The motivation is dynamical, inasmuch as one can think of this sequence of measures as an orbit in a dynamical system, and this dynamical viewpoint is precisely what underlies the computation of dimension in Proposition 6.26. Unlike that proposition, however, the theorem below works in complete generality with no dynamical assumptions, and this is precisely its utility.
^10 The reason for this terminology is that if (f_n) is a martingale difference sequence, then the partial sums F_N = Σ_{n=1}^N f_n form a martingale.
Theorem 8.5 (Local entropy averages lemma). Let µ ∈ P(R^d) and p ∈ N. Then for µ-a.e. x,
Proof. For convenience, for n < 0 we re-define D_n to be the trivial partition of R^d. Consider the information function of µ_{D_{b^n}(x)} with respect to the partition D_{b^{n+p}}. The terms of the averages (1) are a sequence of L^2-bounded^11 martingale differences for the filtration^12 (D_n). By Corollary 8.4 they converge µ-a.e. to 0. Finally, we have already encountered the identity which, combined with the above a.s. limit, shows the claim for µ-a.e. x.
It is often better to average in single steps rather than steps of p. For this we have: Lemma 8.6. Let µ ∈ P(R^d) and p ∈ N. Then for µ-a.e. x,
^11 To verify L^2-boundedness, note that the function x log^2 x, which arises when integrating the second power of the information function, is bounded on [0, 1].
^12 We identify D_n with the σ-algebra generated by its atoms.
Proof. The proof of the last theorem is easily adapted to show, for every 0 ≤ k < p, the analogous statement with the average restricted to indices n ≡ k (mod p). Averaging over k gives the claim.
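To see the mechanism in the simplest case, consider the measure µ on [0,1] whose base-2 digits are i.i.d. with P(digit = 1) = p (a standard example, our own choice, not from the text). Every conditional measure µ_{x,n} again has i.i.d. digits, so each one-step entropy equals H(p) = −p log₂ p − (1−p) log₂(1−p); the local entropy averages are then constantly H(p), and this agrees with the local dimension −log₂ µ(D_{2^N}(x))/N at a typical x:

```python
import math, random

# Illustration (assumptions: base b = 2, mu the Bernoulli(p) measure whose
# binary digits are i.i.d. with P(digit = 1) = p).  We check that the
# local dimension -log2 mu(D_{2^N}(x)) / N at a mu-typical x agrees with
# the constant local entropy average H(p).
random.seed(2)
p, N = 0.3, 200000
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

info = 0.0
for _ in range(N):                      # sample the digits of a typical x
    digit = 1 if random.random() < p else 0
    info += -math.log2(p if digit == 1 else 1 - p)

local_dim = info / N
print(local_dim, H)                     # close to each other
```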

Dimension of coordinate projections
The local entropy averages lemma bounds dim µ in terms of the average entropy of the measures µ_{D_{b^n}(x)}, n ∈ N. In the next three sections our objective is to obtain an analogue for linear images of measures. Thus, for µ ∈ P(R^d) and π ∈ Π_{d,k} a linear map R^d → R^k, we would like to bound dim πµ in terms of the mean behavior of the sequences µ_{D_{b^n}(x)} for µ-typical x and, specifically, the entropy of their π-images.
Definition 8.7. If µ ∈ P(R^d) and π : R^d → R^k is a linear map, then for x ∈ R^d and m ∈ N write e_m(µ, π, x) for the m-th local entropy average of the projected cell measures π(µ_{D_{b^n}(x)}), n ≤ m, and set e(µ, π, x) = lim sup_{m→∞} e_m(µ, π, x). Although it is not obvious from the definition, the sequence e_m(µ, π, x), m ∈ N, is µ-a.e. convergent, but we will not use this fact.
Proof. Let E_i = π^{-1}D^k_i, so that µ(E_i(x)) = πµ(D^k_i(πx)). Since a πµ-typical point y ∈ R^k is obtained as the projection πx of a µ-typical point x ∈ R^d, our goal is to show the corresponding bound at µ-typical points. Arguing now just as in the proof of the local entropy averages lemma (Theorem 8.5), we obtain the analogous inequality for every 0 ≤ k < p. Now fix x and let D = D_{b^i}(x) and E = E_{b^i}(x), and let E_1, ..., E_r ∈ E_{b^{i+p}} denote the cells with µ(E_j) > 0. Write q_j = µ_E(E_j) and p_j = µ_D(E_j), so that the information function J_{b^i} takes the value q_j on E_j. Both (q_j) and (p_j) are probability vectors, and since D ⊆ E also µ_D ≪ µ_E, hence q_j = 0 implies p_j = 0. Thus, from the definitions and Lemma 6.21 applied to the vectors (p_j), (q_j), we obtain the required estimate; inserting it into Equation (9) completes the proof.

Changing coordinates
The proof of Theorem 8.8 relied on the fact that D^d_{b^{n+p}} refines π^{-1}D^k_{b^n}. This holds when π is a coordinate projection, but not for general linear maps. In order to treat the general case we now investigate how the local behavior of entropy changes when we pass to b-adic partitions in a new coordinate system. We shall state things a little more generally, since it is not much harder to do so.
Recall that a partition B refines a partition A if every A ∈ A is a union of elements of B. A sequence (A n ) of partitions is refining if A n+1 refines A n for all n. Definition 8.9. Let (X, µ) be a probability space. Let (A n ), (B n ) be refining sequences of partitions of X. We say that (B n ) asymptotically refines (A n ) (with respect to µ) if for every ε > 0 there is an s ∈ N such that Trivial situations aside, the simplest method to ensure that one partition asymptotically refines another is to randomly perturb one of the partitions. The example that interests us is that of b-adic partitions for different coordinate systems on R d . To be precise, fix some orthogonal basis u 1 , . . . , u d of R d and let ξ ∈ [0, 1] d be chosen randomly according to Lebesgue measure. Let E n = E n (ξ) denote the (random) partition of R d which is the n-adic partition with respect to the coordinate system whose origin is ξ and whose principal axes are in directions u 1 , . . . , u d (we continue to write D d n for standard n-adic partitions). Observe that E n (ξ) = E n (0) + ξ, where for a partition E and x ∈ R d we write E + x = {E + x : E ∈ E}.
Proposition 8.10. Let µ ∈ P(R^d) and let E_n = E_n(ξ) be the random partitions described above for a given orthogonal basis of R^d. Then almost surely (over the choice of ξ), for every base b, the partitions (D^d_{b^n}) asymptotically refine (E_{b^n}).
Proof. Denote by U_ξ the isometry of R^d given by the composition of translation by −ξ and the linear map u_i ↦ e_i. Note that U_ξ maps E_{b^n} = E_{b^n}(ξ) to D_{b^n}. Since ξ is chosen from an absolutely continuous distribution, for any fixed x the distribution of U_ξ x is absolutely continuous, and hence U_ξ x is a.s. (over the choice of ξ) normal. Choosing x randomly according to µ and applying Fubini's theorem, for a.e. choice of ξ we find that U_ξ x is normal for µ-a.e. x. Thus (10) implies that for every ε > 0 there is a δ > 0 such that Fix ε and a corresponding δ as above, and choose s so that every I ∈ D^d_{b^{n+s}} has diameter less than b^{-n}δ. Observe that if x, n are such that b^n · d(x, ∂E_{b^n}(x)) > δ then D^d_{b^{n+s}}(x) ⊆ E_{b^n}(x). From this and the inequality (6) the proposition follows.
It is elementary that if G has convexity defect δ, then In our application we will consider functions G : P([0,1]^d) → R of the form (1/p)H(·, E_n), for suitable partitions E_n and parameters p, n. Since the entropy function H has convexity defect 1, such functions all have the same defect δ = 1/p (uniformly in n). Theorem 8.13. Let µ ∈ P([0,1]^d) and let (A_n), (B_n) be refining sequences of partitions such that (B_n) asymptotically refines (A_n) (w.r.t. µ). Let C_n = A_n ∨ B_n. Then for every ε > 0 there is an s such that the following holds.
1. For any sequence G n : G n (µ Bn(x) )+εM +δ log 2 µ-a.e. x 3. If and G n satisfy the combined hypotheses of (1) and (2), then The same statements hold with lim sup in place of lim inf.
Proof. Fix ε, and choose s as in Definition 8.9 for the sequences (A_n), (B_n). Define f_n, g_n : By our choice of s, Since C_n = A_n ∨ B_n and the sequences (A_n) and (B_n) are refining, B_{n+s}(x) ⊆ A_n(x) if and only if B_{n+s}(x) ⊆ C_n(x), so Finally, note that f_n, g_n are C_{n+s}-measurable (because C_{n+s} refines B_{n+s}). We prove the first claim. Write By non-negativity and concavity of G_{n+s}, or equivalently (using f_n(x) = 1_{B_{n+s}(x) ⊆ C_n(x)}), Since g_n is C_{n+s}-measurable and bounded uniformly in n, by the last inequality and the ergodic theorem for martingale differences (Corollary 8.4), Using 0 ≤ G_n ≤ M, we have so with the help of Equation (12), Combined with (13), this completes the proof of the first part. For the second part, write c = δ log 2. Using almost-convexity and G_{n+s} ≤ M we have Since f_n, g_n are C_{n+s}-measurable, we again use the ergodic theorem for martingale differences (Corollary 8.4), Equation (12), and the trivial inequality g_n(x) ≤ G_{n+s}(µ_{B_{n+s}(x)}). Changing the index from n + s to n in the last inequality gives the claim. The third statement is a formal consequence of the first two. The versions using lim sup instead of lim inf are identical.
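The almost-convexity of entropy underlying the notion of convexity defect can be checked numerically: the entropy of a mixture of probability vectors exceeds the weighted average of their entropies by at most the entropy of the mixing weights, hence by at most log 2 for a two-fold mixture (a standard fact; the sketch below tests it on random examples):

```python
import math, random

# Numerical check of the two-sided bounds behind 'convexity defect':
# for probability vectors p, q and 0 <= lam <= 1,
#   lam*H(p) + (1-lam)*H(q) <= H(lam*p + (1-lam)*q)
#                           <= lam*H(p) + (1-lam)*H(q) + log 2,
# i.e. entropy is concave and convex up to an additive defect (in nats).
def H(p):
    return -sum(x * math.log(x) for x in p if x > 0)

random.seed(4)
for _ in range(1000):
    k = 8
    p = [random.random() for _ in range(k)]; s = sum(p); p = [x / s for x in p]
    q = [random.random() for _ in range(k)]; s = sum(q); q = [x / s for x in q]
    lam = random.random()
    mix = [lam * a + (1 - lam) * b for a, b in zip(p, q)]
    lower = lam * H(p) + (1 - lam) * H(q)
    assert lower - 1e-9 <= H(mix) <= lower + math.log(2) + 1e-9
print("concavity and almost-convexity hold on random examples")
```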
Proof. Let (E_{b^n})_{n=1}^∞ be the b-adic partitions of a randomly perturbed coordinate system. By Proposition 8.10, applied in each of the two coordinate systems, both (D_{b^n}) and (D′_{b^n}) asymptotically refine (E_{b^n}). The corollary now follows by part (3) of the previous theorem applied to the pairs (D_{b^n}), (E_{b^n}) and (D′_{b^n}), (E_{b^n}), and from the triangle inequality.

Dimension of general projections
We now give the general case of Theorem 8.8 for non-coordinate projections. As before we fix an integer base b ≥ 2 and suppress it in our notation.
Proof. Choose a coordinate system in R^d with respect to which π is the coordinate projection to R^k, and let E_n be the corresponding n-adic partition of R^d. We may assume that π^{-1}D^k_{b^n} refines E_{b^n} (if this is not the case initially, a translation and scaling of the coordinates in R^k achieve it without changing dim πµ). By basic properties of entropy, this function is concave and has convexity defect δ = 1/(m log b). By Corollary 8.14, and assuming m is also large in a manner depending on ε, for µ-a.e. x, By our choice of E_n and Theorem 8.8 this implies Now taking the limsup over m, and then the infimum over ε, for µ-a.e. x, dim πµ ≥ lim sup_{m→∞} e_m(µ, π, x) = e(µ, π, x). The claim follows.

Projections of dynamically defined sets and measures
We are finally ready to study the dimension of projections of typical measures for CPdistributions, and prove Theorem 5.11.

More on entropy and dimension
A natural notion of dimension is the following: Definition 9.1. The entropy dimension dim_e µ of µ ∈ P(R^d) is lim_{n→∞} H(µ, D_n)/log n, assuming the limit exists; if not, we define the upper and lower entropy dimensions using lim sup and lim inf, respectively. Often it is convenient to compute entropy dimension along an exponential subsequence of n's: Lemma 9.2. For any integer b ≥ 2, the upper entropy dimension equals lim sup_{n→∞} H(µ, D_{b^n})/(n log b), and similarly for the lower entropy dimension.
Proof. Each m is bounded between b^{n−1} and b^n for some n = n(m). Using Lemma 6.20, for such a pair we see that |H(µ, D_{b^n}) − H(µ, D_m)| < C. The desired equality follows since n(m) log b / log m → 1 as m → ∞.
Entropy dimension and pointwise dimension are related by the following inequality: dim_e µ ≥ dim µ. Remark 9.4. The inequality above can be strict, and in general there is no relation between entropy dimension and dim µ. However, if α(x) = lim_{r→0} log µ(B_r(x))/log r exists at µ-a.e. point then dim_e µ = ∫ α(x) dµ(x).

Dimension of projections of measures with local statistics
We have seen that for measures on [0, 1] d arising from ergodic CP-distributions, the dimension can be expressed in terms of the mean entropy of D d b (Corollary 6.26). Our goal in this section and the next is to obtain a similar formula for the dimension of linear projections.
Recall the notation µ_D, µ^D from Section 6.6. It is convenient to introduce a shorthand notation: Definition 9.5. For a fixed base b ≥ 2 and µ ∈ P(R^d), write µ_{x,n} = µ_{D_{b^n}(x)} and µ^{x,n} = µ^{D_{b^n}(x)} whenever they are defined.
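For a purely atomic measure the two operations in this definition are easy to compute explicitly; the following sketch (the measure and the function names are ours, for illustration) conditions on a b-adic cell and rescales the result back to the unit interval:

```python
from fractions import Fraction as F

# Computational sketch of the notation: for a purely atomic mu on [0,1),
# mu_cond plays the role of mu_{x,n} (mu conditioned on the b-adic cell
# containing x) and mu_resc the role of mu^{x,n} (the same measure
# rescaled affinely back to [0,1)).
b = 3
mu = {F(1, 9): F(1, 2), F(2, 9): F(1, 4), F(5, 9): F(1, 4)}  # atom: weight

def cell_measures(mu, x, n):
    k = int(x * b**n)                       # index of the D_{b^n} cell of x
    lo, hi = F(k, b**n), F(k + 1, b**n)
    mass = sum(w for a, w in mu.items() if lo <= a < hi)
    mu_cond = {a: w / mass for a, w in mu.items() if lo <= a < hi}
    mu_resc = {(a - lo) * b**n: w for a, w in mu_cond.items()}
    return mu_cond, mu_resc

mu_cond, mu_resc = cell_measures(mu, F(1, 9), 1)   # the cell [0, 1/3)
assert sum(mu_cond.values()) == 1                  # a probability measure
assert all(0 <= a < 1 for a in mu_resc)            # rescaled into [0,1)
print(mu_resc)
```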
Note that we have suppressed the base b in the notation. Definition 9.6. A measure µ ∈ P([0,1]^d) generates a distribution P ∈ P(P([0,1]^d)) in base b if for µ-a.e. x the sequence (µ^{x,n})_{n=0}^∞ equidistributes for P, i.e. (1/N) Σ_{n=0}^{N−1} δ_{µ^{x,n}} → P weak-*. The main examples of measures satisfying the previous definition arise from geometric versions of CP-distributions (recall Definition 6.22): Lemma 9.7. Let P ∈ P(Φ) be an ergodic base-b symbolic CP-distribution and P′ its geometric marginal. Then for P′-a.e. µ, the measure µ generates P′ at µ-a.e. x.
Proof. We assume as always that P -a.e. µ gives no mass to the boundaries of b-adic cells.
Let P̃ ∈ P(Φ^N) correspond to P and let Q ∈ P(P(Ω)) denote the projection of P to the second coordinate of Φ = Λ × P(Ω). By the ergodic theorem, for P̃-a.e. (i, µ) ∈ Φ^N, the averages (1/N) Σ_{n=0}^{N−1} δ_{µ_n} converge weak-* to Q. Write π : Ω → [0,1]^d for the symbolic coding. Since π is continuous we can apply it to the limit above and conclude that for P̃-a.e. (i, µ), the averages (1/N) Σ_{n=0}^{N−1} δ_{πµ_n} also converge weak-*. Since µ_n = µ_0^{i_1...i_n} (see the proof of Proposition 6.18) and π(µ_0^{i_1...i_n}) = (πµ_0)^{πi,n} (since the boundaries of b-adic cells are µ-null), this implies that πµ_0 generates the limit distribution at x = πi. Conditioned on µ_0 the point i is distributed according to µ_0 (Proposition 6.18), so x = πi is distributed according to πµ_0, hence πµ_0 generates the geometric marginal. This happens for P̃-a.e. (i, µ), which is equivalent to what we wanted to prove.
Remark 9.8. There is also a converse: if µ ∈ P(R d ) generates a distribution P at µ-a.e. point, then P is the geometric marginal of a CP-distribution. We do not use or prove this fact, see [8].
Proof. Write P_N = (1/N) Σ_{n=0}^{N−1} δ_{µ^{x,n}}, so P_N → P weak-*. Note that by Lemma 6.20, Therefore, by the same lemma and the fact that P_N → P weak-*, The second statement is immediate from the first, using the fact that a.e. measure for a geometric, ergodic CP-distribution generates the distribution along b-adic cells.

Semicontinuity of dimension for CP-distributions
We now consider typical measures for an ergodic CP-distribution which, by Lemma 9.7, generate the distribution at a.e. point. The following proposition shows that for such measures the lower bound on dimension given in Theorem 9.10 is an equality. Proposition 9.11. Let P ∈ P(P([0,1]^d)) be the geometric marginal of an ergodic base-b CP-distribution and π ∈ Π_{d,k}. Then dim πµ = e(P, π) for P-a.e. µ, and e(P, π) = lim_{n→∞} e_n(P, π) (i.e., the limsup in the definition of e(P, π) is a limit).
Therefore for large r, e_r(P, π′) ≥ e_k(P, π) − δ_k − C_k. Hence e(P, π′) = lim_{r→∞} e_r(P, π′) ≥ e_k(P, π) − δ_k − C_k. This inequality holds for all π′ ∈ U_{π,k}, and since the right-hand side tends to e(P, π) as k → ∞, the claim follows.
Remark 9.13. For P -typical µ we have dim πµ = e(P, π) (Proposition 9.11). Hence there is semicontinuity of the projected dimension when one randomizes over µ. It is not known if for P -a.e. µ the function π → dim πµ coincides with π → e(P, π).
Lemma 9.14. If P is the geometric version of an ergodic CP-distribution then e(P, π) = min{k, dim P } for a.e. π ∈ Π d,k .
Corollary 9.15. Let P ∈ P(P([0,1]^d)) be the geometric version of an ergodic CP-distribution, and µ a measure which generates P at a.e. point. Then for every ε > 0 there is a dense open set of projections π ∈ Π_{d,k} such that dim πµ > min{k, dim P} − ε. In particular, the set {π ∈ Π_{d,k} : dim πµ = min{k, dim P}} contains a dense G_δ.
Proof. Let α denote the dimension of P -typical measures. By Lemma 4.5 e(P, π) ≤ min{k, α} for every π ∈ Π d,k . Thus min{k, α} is an upper bound for e(P, ·) : Π d,k → R, and by the last theorem this upper bound is attained on a set of full measure, and hence on a dense subset of Π d,k . Since the set of maxima of a lower semi-continuous function is a G δ and e(P, ·) is lower semi-continuous, the conclusion follows.

Projections of products of f a -and f b -invariant sets
For u ∈ R we again write π_u(x, y) = ux + y. Lemma 9.16. For E ⊆ R^2, u ∈ R and s, t ∈ N we have dim π_u((f_s × f_t)(E)) = dim π_{us/t}(E).
Proof. On each cell I × J, I ∈ D_s, J ∈ D_t, the map (f_s × f_t)|_{[0,1]^2} is affine and given by (x, y) ↦ (sx, ty) + a for some a = a_{I,J} ∈ R^2. Thus π_u((f_s × f_t)|_{I×J}(x, y)) = π_u((sx, ty) + a) = usx + ty + π_u(a) = t · π_{us/t}(x, y) + π_u(a) = ψ_{I,J} ∘ π_{us/t}(x, y), where ψ_{I,J} is an affine map of R which, being bi-Lipschitz, preserves dimension. Therefore dim π_u((f_s × f_t)(E ∩ (I × J))) = dim π_{us/t}(E ∩ (I × J)). Since E = ⋃_{I∈D_s, J∈D_t} (E ∩ (I × J)), the claim follows by Lemma 2.12 (2).
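The affine relation in this proof can be checked numerically on a fixed cell (the parameter values below are arbitrary):

```python
import random

# Numerical check (illustrative) of the computation in the proof of
# Lemma 9.16: on a fixed cell I x J, with I in D_s and J in D_t, the map
# f_s x f_t is (x, y) -> (s*x - i, t*y - j), and
#   pi_u(f_s x f_t(x, y)) = t * pi_{u s / t}(x, y) - (u*i + j),
# an affine (hence dimension-preserving) image of pi_{us/t}.
def pi(u, x, y):
    return u * x + y

random.seed(3)
s, t, u = 2, 3, 1.7
i, j = 1, 2                      # the cell [i/s,(i+1)/s) x [j/t,(j+1)/t)
for _ in range(1000):
    x = (i + random.random()) / s
    y = (j + random.random()) / t
    lhs = pi(u, (s * x) % 1.0, (t * y) % 1.0)   # pi_u after f_s x f_t
    rhs = t * pi(u * s / t, x, y) - (u * i + j)
    assert abs(lhs - rhs) < 1e-9
print("affine relation verified on the cell")
```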
Theorem 9.17. Let X be closed and f_a-invariant, let Y be closed and f_b-invariant, and suppose a ≁ b. Then dim π_u(X × Y) = min{1, dim X + dim Y} for every u ≠ 0.
Proof. Let Z = X × Y and fix ε > 0. We wish to show that dim π_u Z > min{1, dim Z} − ε. Now, for any m, n ∈ N the set Z is invariant under f_{a^m} × f_{b^n} = f_a^m × f_b^n, so by Lemma 9.16, dim π_u Z = dim π_u((f_a^m × f_b^n)(Z)) = dim π_{u·a^m/b^n} Z for all m, n ∈ N. Therefore it suffices to show that dim π_{ua^m/b^n} Z > min{1, dim Z} − ε for some m, n ∈ N.
By assumption log a / log b ∉ Q, so {a^m/b^n : m, n ∈ N} is dense in R^+. Therefore it suffices to show that the set U_ε = {π ∈ Π_{2,1} : dim πZ > min{1, dim Z} − ε} has non-empty interior.
To show this we construct an ergodic base-a CP-distribution P such that dim P = dim Z and for P-a.e. µ there is a u ∈ R^+ such that, writing L(x, y) = (x, uy) mod 1, the measure Lµ is supported on Z. We first note that Z has equal box and Hausdorff dimension (since X, Y have this property), so (1/(k log a)) log N(Z, D_{a^k}) → dim Z. We construct a CP-distribution as in the proof of Theorem 7.8, starting from measures µ_k ∈ P(Z) such that H(µ_k, D_{a^k}) = log N(Z, D_{a^k}), and passing to an ergodic component, for which dim P ≥ dim Z; in fact there is equality because P-a.e. µ satisfies Lµ(Z) = 1, a fact also proved as in Theorem 7.8.
Let us now replace P with its geometric version. Fixing a P-typical µ, we know from Theorems 9.10 and 9.12 that π_v ↦ dim π_v µ is bounded below by a lower semi-continuous function which is a.e. equal to min{1, dim Z}. Hence, for the measure µ′ = Lµ|_Z, the map π_v ↦ dim π_v µ′ is bounded below by a similar function, and in particular the set V_ε = {π ∈ Π_{2,1} : dim πµ′ > min{1, dim Z} − ε} is open and non-empty (in fact dense) in Π_{2,1}. Since dim πZ ≥ dim πµ′ for all π ∈ Π_{2,1}, we have V_ε ⊆ U_ε, so U_ε has non-empty interior, as desired.
Remark 9.18. One can show that the same result holds for products of invariant measures, but establishing a relation between the product measure and an appropriate CPdistribution requires a little more work, see [9].