1 Introduction

1.1 Background

A number \(x\in [0,1]\) is called \(n\)-normal, or normal in base \(n\), if \(\{n^{k}x\}_{k\in \mathbb {N}}\) equidistributes modulo 1 for Lebesgue measure. This is the same as saying that the sequence of digits in the base-\(n\) expansion of \(x\) has the same limiting statistics as an i.i.d. sequence of digits with uniform marginals. It was E. Borel who first showed that Lebesgue-a.e. \(x\) is normal in every base; thus the \(n\)-ary expansion of a typical number is maximally random. It is generally believed that this phenomenon persists when relativised to “naturally” defined subsets of the reals, i.e. that typical elements of well-structured sets, with respect to appropriate measures, are normal, unless the set displays an obvious obstruction. Taking this to the extreme and applying it to singletons, one arrives at the folklore conjecture that natural constants such as \(\pi ,e,\sqrt{2}\) are normal in every base. While the last conjecture seems very much out of reach of current methods, there are various positive results known for more substantial sets, often “fractal” sets. The present paper is a contribution in this direction.
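To make the digit statistics in this definition concrete, here is a small Python sketch (ours, purely illustrative; the function name is not from the text). It computes digit frequencies of a base-\(n\) expansion exactly using rational arithmetic, and illustrates that uniform single-digit frequencies alone do not imply normality, since normality requires every block of every length to equidistribute.

```python
from fractions import Fraction
from collections import Counter

def digit_frequencies(x, base, k):
    """Empirical frequencies of the first k digits in the base-`base`
    expansion of x in [0,1), computed exactly for rational x."""
    counts = Counter()
    for _ in range(k):
        x *= base
        d = int(x)          # next digit: floor(base * x)
        counts[d] += 1
        x -= d              # keep the fractional part (x -> base*x mod 1)
    return {d: c / k for d, c in sorted(counts.items())}

# 1/3 = 0.010101... in base 2: digits 0 and 1 each occur half the time,
# yet 1/3 is not 2-normal: the block "11" never occurs.
print(digit_frequencies(Fraction(1, 3), 2, 1000))   # {0: 0.5, 1: 0.5}
```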

It is better to work with measures than with sets, and it will be convenient to say that a measure \(\mu \) is pointwise \(n\)-normal if it gives full measure to the set of \(n\)-normal numbers. The first results on the problem above were obtained independently by Cassels and Schmidt in the late 1950s [13, 51]. Motivated by a question of Steinhaus, who asked whether normality in infinitely many bases implies normality in all bases, they showed that the Cantor-Lebesgue measure \(\mu \) on the middle-\(\frac{1}{3}\) Cantor set is pointwise \(m\)-normal whenever \(m\) is not a power of \(3\). This answers Steinhaus’s question negatively, since no number in the middle-\(\frac{1}{3}\) Cantor set is \(3\)-normal.

The proofs of Cassels and Schmidt are analytical: they establish rapid decay, as \(N\rightarrow \infty \), of the \(L^{2}(\mu )\) norms of the trigonometric polynomials \(\frac{1}{N}\sum _{k=0}^{N-1}e(mn^kt)\) appearing in Weyl’s equidistribution criterion (here and in what follows, \(e(s)=\exp (2\pi is)\)). An essentially sharp condition for pointwise \(n\)-normality in terms of these norms was provided a few years later by Davenport et al. [14]. The latter theorem underlies most subsequent work on the subject and is particularly effective when the measures are constructed with this method in mind, for example Riesz products, which are defined in terms of their Fourier transform. Many results have been obtained in this way by Brown, Pearce, Pollington, and Moran [9–11, 42, 46, 47]. However, for most “natural” measures the required norm bounds are nontrivial to obtain, if they can be obtained at all. They are also fragile, in the sense that they do not persist when the measure is perturbed. The book [12] contains a thorough overview of many classical equidistribution results.
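For illustration (our own sketch, not the authors' method), the Weyl averages \(\frac{1}{N}\sum _{k=0}^{N-1}e(mn^kt)\) can be evaluated exactly along the orbit \(n^{k}t \bmod 1\) when \(t\) is rational. For \(t=1/3\) and \(n=2\) the orbit is periodic and the averages have modulus \(1/2\) instead of tending to \(0\), witnessing the failure of \(2\)-normality:

```python
import math
import cmath
from fractions import Fraction

def weyl_average(t, n, m, N):
    """(1/N) * sum_{k=0}^{N-1} e(m * n^k * t), where e(s) = exp(2*pi*i*s).
    Taking t as a Fraction keeps the orbit n^k * t mod 1 exact."""
    total = 0j
    x = t % 1
    for _ in range(N):
        total += cmath.exp(2j * math.pi * m * float(x))
        x = (n * x) % 1          # next orbit point: n^{k+1} t mod 1
    return total / N

# The doubling orbit of 1/3 is 1/3, 2/3, 1/3, ...; the Weyl averages
# converge to -1/2 rather than 0, so 1/3 is not 2-normal.
print(abs(weyl_average(Fraction(1, 3), 2, 1, 1000)))   # 0.5 (up to rounding)
```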

1.2 Main results

In this paper we give a new sufficient condition for pointwise \(n\)-normality, which is more dynamical and geometric in nature, and captures the spirit of the conjecture stated at the beginning of this introduction. Roughly speaking, we show that if the process of continuously magnifying the measure around a typical point does not exhibit any almost-periodic features at frequency \(1/\log n\), then the measure is pointwise \(n\)-normal. While the condition is not a necessary one, it is a natural one in many of the most interesting examples, and can be verified relatively easily in many cases where other methods fail. It also leads to many applications which we discuss below.

The condition is formulated in terms of an auxiliary measure-valued flow which arises from the process of “zooming in” on \(\mu \)-typical points. This procedure has a long history, going back variously to Furstenberg [22, 23], Zähle [55], Bedford and Fisher [3], Mörters and Preiss [43], and Gavish [25]; the following definitions are adapted from [27], where further references can be found. Let \(\mathcal {P}(X)\) denote the space of Borel probability measures on a metric space \(X\); when \(X\) is compact we equip \(\mathcal {P}(X)\) with the weak-* topology, making it compact and metrizable. Write \(\mathcal {M}\) for the space of Radon (locally finite Borel) measures on \(\mathbb {R}\) and \({{\mathrm{supp}}}\mu \) for the topological support of a measure \(\mu \in \mathcal {M}\). Let

$$\begin{aligned} \mathcal {M}^{{}^{{}_\square }}=\{\mu \in \mathcal {P}([-1,1])\,:\,0\in {{\mathrm{supp}}}\mu \} \end{aligned}$$

and for \(\mu \in \mathcal {M}^{{}^{{}_\square }}\) and \(t\in \mathbb {R}\), define \(S_{t}\mu \in \mathcal {M}^{{}^{{}_\square }}\) by

$$\begin{aligned} S_{t}\mu (E)=c\cdot \mu (e^{-t}E\cap [-1,1]) \end{aligned}$$

where \(c=c(\mu ,t)\) is a normalizing constant. For \(x\in {{\mathrm{supp}}}\mu \), similarly define the translated measure by \(\mu ^{x}(E)=c'\cdot \mu ((E+x)\cap [-1,1])\). The scaling flow is the Borel \(\mathbb {R}^{+}\)-flow \(S=(S_{t})_{t>0}\) acting on \(\mathcal {M}^{{}^{{}_\square }}\). The scenery of \(\mu \) at \(x\in {{\mathrm{supp}}}\mu \) is the orbit of \(\mu ^{x}\) under \(S\), that is, the one-parameter family of measures \(\mu _{x,t}=S_{t}(\mu ^{x})\), \(t\ge 0\).
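As a concrete illustration of the definition (a sketch of ours, with a measure crudely represented by finitely many weighted atoms): \(S_t\) magnifies by \(e^t\), discards the mass carried outside \([-1,1]\), and renormalizes what remains.

```python
import math

def S_t(atoms, t):
    """One step of the scaling flow on a purely atomic measure on [-1,1].
    atoms: list of (position, mass) pairs.  Since S_t mu(E) is a normalized
    version of mu(e^{-t} E), an atom at x becomes an atom at e^t * x; atoms
    pushed outside [-1,1] are discarded and the rest renormalized."""
    scaled = [(math.exp(t) * x, w) for x, w in atoms]
    kept = [(x, w) for x, w in scaled if -1 <= x <= 1]
    total = sum(w for _, w in kept)
    return [(x, w / total) for x, w in kept]

# Magnify by e^{0.7} ~ 2.01: the two outer atoms leave [-1,1] and the
# remaining two are rescaled to carry mass 1/2 each.
print(S_t([(-0.6, 0.25), (0.1, 0.25), (0.2, 0.25), (0.8, 0.25)], 0.7))
```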

Write \(\mathcal {D}=\mathcal {P}(\mathcal {P}([-1,1]))\), which is again compact and metrizable, and note that \(\mathcal {P}(\mathcal {M}^{{}^{{}_\square }})\subseteq \mathcal {D}\). For clarity we refer to elements of \(\mathcal {D}\) as distributions, whereas we continue to refer to the elements of \(\mathcal {M}^{{}^{{}_\square }}\) as measures. A measure \(\mu \in \mathcal {P}(\mathbb {R})\) generates a distribution \(P\in \mathcal {D}\) at \(x\in {{\mathrm{supp}}}\mu \) if the scenery at \(x\) equidistributes for \(P\) in \(\mathcal {D}\), i.e. if

$$\begin{aligned} \lim _{T\rightarrow \infty }\frac{1}{T}\int _{0}^{T}f(\mu _{x,t})\, dt=\int f(\nu )\, dP(\nu )\qquad \text{ for } \text{ all } f\in C(\mathcal {P}([-1,1])), \end{aligned}$$

and \(\mu \) generates \(P\) if it generates \(P\) at \(\mu \)-a.e. \(x\).

If \(\mu \) generates \(P\), then \(P\) is supported on \(\mathcal {M}^{{}^{{}_\square }}\) and is \(S\)-invariant (while unsurprising, this is not completely trivial, since \(S\) acts discontinuously; see [27, Theorem 1.7] for the proof). We say that \(P\) is trivial if it is the distribution supported on the measure \(\delta _{0}\in \mathcal {M}^{{}^{{}_\square }}\), which is a fixed point of \(S\). It can be shown that if \(\mu \) generates a distribution, then the distribution is trivial if and only if \(\mu \) gives full mass to a set of zero Hausdorff dimension (this follows from [27, Proposition 1.19]).

To an \(S\)-invariant distribution \(P\) we associate its pure-point spectrum \(\Sigma (P,S)\). This is the set of \(\alpha \in \mathbb {R}\) for which there exists a non-zero measurable function \(\varphi :\mathcal {M}^{{}^{{}_\square }}\rightarrow \mathbb {C}\) satisfying \(\varphi \circ S_{t}=e(\alpha t)\varphi \), \(t\in \mathbb {R}\), on a set of full \(P\)-measure. The existence of such an eigenfunction indicates that some non-trivial feature of the measures of \(P\) repeats periodically when the measures are magnified by a factor of \(e^\alpha \).

Finally, let \(f\mu \) denote the push-forward of the measure \(\mu \), i.e. \((f\mu )(A)=\mu (f^{-1}A)\). We note this is sometimes denoted \(f_{\#}\mu \).

Theorem 1.1

Let \(\mu \in \mathcal {M}\) be a measure generating a non-trivial \(S\)-ergodic distribution \(P\in \mathcal {D}\), and let \(n\in \mathbb {N}\), \(n\ge 2\). If \(\Sigma (P,S)\) does not contain a non-zero integer multiple of \(1/\log n\), then \(\mu \) is pointwise \(n\)-normal. Furthermore, the same is true for \(f\mu \) for all \(f\in {{\mathrm{diff}}}^{1}(\mathbb {R})\).

The non-triviality assumption means that the theorem does not apply to measures supported on zero-dimensional sets. This limitation is intrinsic to our methods.

The hypotheses of the theorem may seem restrictive, since general measures do not generate any distribution, let alone an ergodic one satisfying the spectral condition. However, “natural” measures arising in dynamics, fractal geometry or arithmetic very often do generate an \(S\)-ergodic distribution (see e.g. [25–27] for many examples), and the important hypothesis becomes the spectral one. It is possible to formulate a version of the theorem that applies to measures which do not generate a distribution in the above sense, but the result is less useful; see the remark at the end of Sect. 5.4. In Sect. 8 we give some stronger versions of the theorem, which are used in some of the later applications.

Finally, note that the theorem is not a characterization: the presence of \(k/\log n\) in the pure point spectrum of \(P\) does not rule out pointwise \(n\)-normality. Indeed, if a measure is translated by a random, uniformly chosen distance, then the sceneries are not affected, but almost surely the measure becomes pointwise normal in every base (see also Theorem 1.7 below). It is worth mentioning, though, that the canonical example of a measure that is not pointwise \(n\)-normal is a singular measure on \([0,1]\) invariant and ergodic for \(x\mapsto nx\mod 1\). For such \(\mu \), the first author showed in [26] that, when the entropy is positive, the generated distribution indeed has a multiple of \(1/\log n\) in its spectrum.

There is some interest also in expansions of numbers in non-integer bases. Following Rényi [49], for \(\beta >1\) we define the \(\beta \)-expansion of \(x\in [0,1)\) to be the lexicographically least sequence \(x_n\in \{0,1,\ldots ,\lceil \beta -1\rceil \}\) such that \(x=\sum _{n=1}^\infty x_n\beta ^{-n}\). This sequence is obtained from the orbit of \(x\) under \(T_\beta :x\mapsto \beta x\mod 1\) in a manner similar to the integer case. It is known that \(T_\beta \) has a unique absolutely continuous invariant measure, called the Parry measure, and we shall say that \(x\) is \(\beta \)-normal if under \(T_\beta \) it equidistributes for this measure.
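For concreteness, here is a minimal sketch (ours) of the greedy digit algorithm: the digits are read off the orbit of \(x\) under \(T_\beta\), exactly as in the integer case. In the golden-ratio base, the resulting digit strings never contain two consecutive \(1\)s, the Parry condition for this \(\beta\).

```python
def beta_expansion(x, beta, k):
    """First k digits of the greedy beta-expansion of x in [0,1): the n-th
    digit is floor(beta * T^{n-1} x) along the orbit of T: x -> beta*x mod 1."""
    digits = []
    for _ in range(k):
        x *= beta
        d = int(x)          # digit in {0, 1, ..., ceil(beta) - 1}
        digits.append(d)
        x -= d              # fractional part: one step of T_beta
    return digits

# For integer beta this reduces to ordinary base-n digits; for the golden
# ratio the digits are 0/1 with no "11" block.
phi = (1 + 5 ** 0.5) / 2
print(beta_expansion(0.7, phi, 12))
```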

Recall that \(\beta >1\) is called a Pisot number if it is an algebraic integer whose algebraic conjugates are of modulus strictly smaller than \(1\). We adopt the convention that integers \(\ge 2\) are Pisot numbers. The dynamics of \(T_\beta \) is best understood for this class of numbers, and our results extend to them:
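As a quick illustration of the definition (a sketch of ours, restricted to the quadratic case and assuming the given polynomial is the minimal one), the Pisot property can be checked from \(x^2-bx-c\); a well-known hallmark of Pisot numbers is that the distance from \(\beta^n\) to the nearest integer tends to \(0\), since \(\beta^n\) plus the powers of its conjugates is an integer.

```python
def is_quadratic_pisot(b, c):
    """Is the larger root of x^2 - b*x - c (b, c integers, polynomial assumed
    irreducible over Q) a Pisot number: a real root > 1 whose conjugate has
    modulus < 1?"""
    disc = b * b + 4 * c
    if disc <= 0:
        return False        # roots are not real, hence not Pisot
    r = disc ** 0.5
    big, small = (b + r) / 2, (b - r) / 2
    return big > 1 and abs(small) < 1

# x^2 - x - 1: golden ratio, Pisot; x^2 - 2: sqrt(2), not Pisot.
print(is_quadratic_pisot(1, 1), is_quadratic_pisot(0, 2))   # True False
# Hallmark: powers of a Pisot number approach integers.
phi = (1 + 5 ** 0.5) / 2
print(abs(phi ** 20 - round(phi ** 20)))                    # ~6.6e-05
```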

Theorem 1.2

Theorem 1.1 holds as stated for a Pisot number \(\beta >1\) in place of \(n\).

It is possible that the Pisot assumption is unnecessary, but currently we are unable to prove this (see, however, the discussion following Corollary 1.11 below). On the other hand, Bertrand-Mathis [5] proved that if \(\beta \) is Pisot and \(x\) is \(\beta \)-normal, then \(\{ \beta ^n x\}_{n=1}^\infty \) equidistributes on the circle. Hence we have:

Corollary 1.3

If \(\beta >1\) is Pisot and \(\mu \) satisfies the hypothesis of Theorem 1.1 with \(\beta \) in place of \(n\), then \(\{\beta ^nx\}_{n=1}^\infty \) equidistributes modulo 1 for \(\mu \)-a.e. \(x\).

Before turning to applications, let us say a few words about what goes into the proof of Theorem 1.2 (a more detailed sketch of the proof is given in Sect. 5.1). There are two main ingredients. The first involves the behavior of the dimension of measures under convolution. Specifically, among the measures of positive dimension invariant under \(x\mapsto \beta x\mod 1\), one can characterize Lebesgue measure (or the Parry measure) in terms of its dimension growth under convolutions. This part of the argument is special to the dynamics of \(x\mapsto \beta x\mod 1\), and is the main place where the Pisot property is used in the non-integer case. Most of the work then goes into showing that, if there were a measure \(\mu \) satisfying the hypothesis of the theorems above but not their conclusion, then one could concoct an invariant measure \(\eta \) violating the characterization alluded to above. This scheme is a refinement of ideas we have used before in [27, 29].

The second ingredient in the proof, and one of the main innovations in this paper, applies in a more general setting than invariant measures for piecewise-affine maps of \([0,1]\). The proper context is that of a Borel map \(T\) of a compact metric space \(X\). Roughly speaking, we show how to relate the small-scale structure of a measure \(\mu \) on \(X\) to the distribution of \(T\)-orbits of \(\mu \)-typical points. This result, while elementary in nature, appears to be new, and we believe it may find further applications. We leave the discussion and precise statement to Sect. 2.

1.3 Applications

1.3.1 Normal numbers in fractals

As our first application we consider sets arising as attractors of iterated function systems, or, equivalently, repellers of uniformly expanding maps on the line (see below for definitions). We show that, under some weak regularity assumptions, if such a set is defined by nonlinear dynamics, or if the contraction rates of the defining maps satisfy a natural algebraic condition, then typical points in the set are \(n\)-normal. This should be interpreted in terms of the conjecture stated earlier: indeed, it implies that if such a set contains no \(n\)-normal numbers, then the set is essentially defined by linear maps whose slopes are rational powers of \(n\), and in this sense the dynamics is similar to the canonical examples of sets without \(n\)-normal numbers, namely closed subsets of \([0,1]\) that are invariant under the piecewise-linear map \(x\mapsto nx\mod 1\).

We start with the relevant definitions. An iterated function system (IFS) is a finite family \(\mathcal {I}=\{f_{0},\ldots ,f_{r-1}\}\) of strictly contracting maps \(f_{i}:I\rightarrow I\) for a compact interval \(I\subseteq \mathbb {R}\) (of course one can define IFSs in general metric spaces). The IFS is of class \(C^\alpha \) if all the \(f_i\) are. We shall say that the IFS \(\mathcal {I}\) is regular if the maps \(f_i\) are orientation-preserving injections, and the intervals \(f_i(I)\) are disjoint except possibly at their endpoints so, in particular, the so-called open set condition is satisfied. In this article we will only consider \(C^{1+\varepsilon }\) regular IFSs, but some of the assumptions can be relaxed. For example, the orientation-preserving assumption is just for simplicity and can be easily dropped.

The attractor of \(\mathcal {I}\) is the unique nonempty compact set \(X\subseteq I\) satisfying

$$\begin{aligned} X=\bigcup _{i\in [r]} f_{i}(X) \end{aligned}$$

(here and throughout the paper, \([r]=\{0,\ldots ,r-1\}\)). There are a number of natural measures one can place on \(X\). One is the \(\dim X\)-dimensional Hausdorff measure, which for a \(C^{1+\varepsilon }\)-IFS is positive and finite on \(X\). Another good class is that of the self-conformal measures (called self-similar measures if the maps \(f_i\) are linear), that is, measures satisfying the relation

$$\begin{aligned} \mu =\sum _{i\in [r]} p_i \cdot f_i\mu \end{aligned}$$

for a positive probability vector \((p_0,\ldots ,p_{r-1})\). Both of the examples above are special cases of Gibbs measures for Hölder potentials \(\varphi :X\rightarrow \mathbb {R}\). We will not define Gibbs measures, but rather rely on a standard property of such measures \(\mu \), namely, that there is a constant \(C>1\) such that for all finite sequences \(i_1,\ldots ,i_k\), \(j_1,\ldots ,j_\ell \in [r]\),

$$\begin{aligned} C^{-1} \le \frac{\mu (f_{i_1}\cdots f_{i_k}I)\mu (f_{j_1}\cdots f_{j_\ell }I)}{\mu (f_{i_1}\cdots f_{i_k}f_{j_1}\cdots f_{j_\ell }I)}\le C. \end{aligned}$$
(1)

We shall call measures satisfying this property quasi-product measures (or quasi-Bernoulli measures). This is a broader class than Gibbs measures for Hölder potentials; for example, it contains Gibbs measures for almost-additive sequences of potentials, see [2].
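To make the relation \(\mu =\sum _i p_i\cdot f_i\mu \) concrete, the following small simulation (ours; not part of the argument) samples the Cantor-Lebesgue measure by random iteration of the defining maps and checks two cylinder masses, which for a self-similar (Bernoulli) measure are exact products, i.e. (1) holds with \(C=1\).

```python
import random

def chaos_game(maps, probs, n, burn=20, seed=0):
    """Draw n (correlated) samples from the self-similar measure
    mu = sum_i p_i * f_i(mu) by iterating randomly chosen maps."""
    rng = random.Random(seed)
    x, pts = 0.0, []
    for k in range(n + burn):
        x = rng.choices(maps, weights=probs)[0](x)
        if k >= burn:
            pts.append(x)
    return pts

# Cantor-Lebesgue measure: f0(x) = x/3, f1(x) = x/3 + 2/3, weights 1/2, 1/2.
pts = chaos_game([lambda x: x / 3, lambda x: x / 3 + 2 / 3], [0.5, 0.5], 20000)
# Cylinder masses are exact products here: mu(f0 I) = 1/2, mu(f0 f0 I) = 1/4.
print(sum(p <= 1 / 3 for p in pts) / len(pts))   # ~0.5
print(sum(p <= 1 / 9 for p in pts) / len(pts))   # ~0.25
```

Since the starting point \(0\) already lies in the attractor, every sample lies exactly in the Cantor set; in particular no sample falls in the deleted middle interval \((1/3,2/3)\).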

Our first result assumes an algebraic condition on the contractions. For a \(C^1\)-contraction \(f\) on \(\mathbb {R}\), we define its (asymptotic) contraction ratio to be \(\lambda (f)=f'(p)\), where \(p\) is the unique fixed point of \(f\). For affine \(f\) this is just the usual contraction ratio; to justify the name in the nonlinear case note that for every distinct pair of points \(x,y\),

$$\begin{aligned} \lambda (f)=\lim _{n\rightarrow \infty } |f^n(x)-f^n(y)|^{1/n}. \end{aligned}$$

Write \(a\sim b\) if \(a,b\) are integer powers of a common number, equivalently \(\log a/\log b\in \mathbb {Q}\); otherwise write \(a\not \sim b\), in which case \(a,b\) are said to be multiplicatively independent.
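These notions are easy to probe numerically; the sketch below (illustrative, with hypothetical helper names of our choosing) estimates \(\lambda (f)\) at the attracting fixed point and gives a heuristic test of multiplicative independence via rational approximation of \(\log a/\log b\).

```python
import math
from fractions import Fraction

def contraction_ratio(f, x0=0.5, iters=200, h=1e-6):
    """Estimate lambda(f) = f'(p) at the attracting fixed point p of a
    C^1 contraction f, locating p by straightforward iteration."""
    p = x0
    for _ in range(iters):
        p = f(p)                                  # p converges to the fixed point
    return (f(p + h) - f(p - h)) / (2 * h)        # central difference for f'(p)

def mult_independent(a, b, qmax=64, tol=1e-9):
    """Heuristic check that a !~ b: is log a / log b far from every rational
    with denominator <= qmax?  (a ~ b iff log a / log b is rational.)"""
    r = math.log(a) / math.log(b)
    return abs(Fraction(r).limit_denominator(qmax) - r) > tol

print(contraction_ratio(lambda x: x / 3 + 0.1))   # ~0.3333 (affine, ratio 1/3)
print(mult_independent(1 / 3, 1 / 2))             # True: log 3 / log 2 is irrational
print(mult_independent(1 / 9, 1 / 3))             # False: 1/9 = (1/3)^2
```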

Theorem 1.4

Let \(\mathcal {I}\) be a \(C^{1+\varepsilon }\) IFS that is regular in the sense above, and \(\beta >1\) a Pisot number. If there exists an \(f\in \mathcal {I}\) with \(\lambda (f)\not \sim \beta \), then any quasi-product measure \(\mu \) for \(\mathcal {I}\) is pointwise \(\beta \)-normal, and so is \(g\mu \) for all \(g\in {{\mathrm{diff}}}^1(\mathbb {R})\).

The classical results of Cassels and Schmidt are special cases of this for certain IFSs consisting of affine maps with the same contraction ratio. We note that the result above is new even when the IFS is affine and contains maps with two multiplicatively independent contraction ratios; classical methods break down since nothing seems to be known about the decay (or lack thereof) of the Fourier transform of natural measures on such attractors.

Our second result says that nonlinearity and enough regularity are sufficient for pointwise normality, irrespective of algebraic considerations. More precisely, we say an IFS \(\mathcal {I}=\{f_i\}\) is linear if all of the maps in \(\mathcal {I}\) are affine maps, and non-linear otherwise. We say that \(\mathcal {I}\) is totally non-linear if it is not conjugate to a linear IFS via a \(C^1\) map; here an IFS \(\mathcal {J}\) is \(C^\alpha \)-conjugate to \(\mathcal {I}\) if it has the form \(\mathcal {J}=g\mathcal {I}=\{gf_ig^{-1}\}\) for a \(C^\alpha \)-diffeomorphism \(g\).

Theorem 1.5

Let \(\mathcal {I}\) be a \(C^{\omega }\) IFS that is regular in the sense above and \(\beta >1\) a Pisot number. If \(\mathcal {I}\) is totally non-linear, then any quasi-product measure \(\mu \) for \(\mathcal {I}\) is pointwise \(\beta \)-normal, and so is \(g\mu \) for all \(g\in {{\mathrm{diff}}}^1(\mathbb {R})\).

The two theorems above have substantial overlap, and each holds generically in the appropriate space of IFSs. The algebraic condition is generally the easier one to verify, and its regularity assumptions are weaker, though it seems very probable that weaker regularity assumptions suffice in the totally non-linear case as well.

It also seems very likely that non-linearity, rather than total non-linearity, should suffice in Theorem 1.5. We are able to prove such a result for a smaller class of measures. Namely,

Theorem 1.6

Let \(\mathcal {I}\) be a \(C^\omega \) IFS that is regular in the sense above, and \(\beta >1\) a Pisot number. If \(\mathcal {I}\) is non-linear, then every self-conformal measure for \(\mathcal {I}\) is pointwise \(\beta \)-normal.

In this theorem, the totally non-linear case is covered by Theorem 1.5. In the conjugate-to-linear case, if \(g\in {{\mathrm{diff}}}^\omega (\mathbb {R})\) conjugates \(\mathcal {I}\) to a linear IFS \(\mathcal {J}=g\mathcal {I}\), then \(\mu =g^{-1}\nu \) where \(\nu \) is a self-similar measure for \(\mathcal {J}\), and \(g\) is not affine (since \(\mathcal {J}\) is linear and \(\mathcal {I}=g^{-1}\mathcal {J}\) is not). Also, it is a remarkable consequence of the work of Sullivan [53] and Bedford-Fisher [3] that if a \(C^{\alpha }\)-IFS, \(\alpha >1\), is \(C^1\)-conjugate to a linear IFS, then it is also \(C^\alpha \)-conjugate to a linear IFS (see [3, Theorem 7.5]). Thus, Theorem 1.6 follows from the following one:

Theorem 1.7

Let \(\mu \) be a self-similar measure for a linear IFS that is regular in the sense above. Then for any non-affine real-analytic \(g\in {{\mathrm{diff}}}^\omega (\mathbb {R})\), \(g\mu \) is pointwise \(\beta \)-normal for every Pisot \(\beta >1\).

Here is one concrete consequence of the results above.

Corollary 1.8

Let \(\mu \) denote the Cantor-Lebesgue measure on the middle-1/3 Cantor set. Then \(x^2\) is \(3\)-normal for \(\mu \)-a.e. \(x\).

The point is, of course, that no points in the middle-1/3 Cantor set are \(3\)-normal themselves.

The corollary above is immediate from the previous theorem, and the use of the square function is incidental. In fact we could replace \(x^2\) with \(f(x)\) for any \(f\in {{\mathrm{diff}}}^2\). From Theorem 1.4 we can reduce the regularity to \({{\mathrm{diff}}}^1\) if we only want \(n\)-normality for \(n\not \sim 3\). These differences perhaps indicate that our regularity assumptions are suboptimal. Note that when \(f\) is the identity, this again is the theorem of Cassels and Schmidt, and their spectral methods carry over to translations, but the stability under perturbation is new even for affine \(f\). Related to this question, we note that Bugeaud, Fishman, Kleinbock and Weiss [8] have shown that for many fractal sets, including self-similar sets satisfying the open set condition, there is a full-dimension subset consisting of numbers which are not normal in any integer base. Moreover, their result holds for any bi-Lipschitz image of the set. The stability of our results under bi-Lipschitz transformations remains open.

While this paper was in revision we learned of Kaufman’s [35] paper. Kaufman studies differentiable images of certain Bernoulli convolutions, and obtains polynomial decay of the Fourier transform of their image under \(C^2\) diffeomorphisms, implying in particular pointwise normality of the images. His results apply to linear self-similar measures defined by two maps with the same contraction ratio, and equal weights (it is likely the method can be adapted to more than two maps, but unlikely that the equicontraction assumption can be dropped with current methods). In particular the last corollary follows from Kaufman’s work.

1.3.2 Host’s theorem and measure rigidity

Let \(n\in \mathbb {N}\) and let \(T_n:[0,1]\rightarrow [0,1]\) denote the map \(T_nx=nx\mod 1\). An important phenomenon concerning these maps is measure rigidity: a well-known conjecture of Furstenberg states that, if \(m\not \sim n\), then the only probability measures jointly invariant under \(T_m\) and \(T_n\) are combinations of Lebesgue measure and atomic measures on rational points. This conjecture, known as the times-2, times-3 conjecture, is the prototype for many similar conjectures in other contexts, see e.g. [38]. The best result towards it is due to Rudolph and Johnson [33, 50]: if a measure has positive entropy and is jointly invariant and ergodic under \(T_m,T_n\) for \(m\not \sim n\), then it is Lebesgue. Although nothing is known about the zero-entropy case, in the positive entropy case there is a pointwise strengthening of the Rudolph-Johnson theorem for \(\gcd (m,n)=1\), due to Host [31, Théorème 1]:

Theorem 1.9

Let \(m,n\ge 2\) be integers and \(\gcd (m,n)=1\). Suppose \(\mu \) is an invariant and ergodic measure for \(T_{n}\) of positive entropy. Then \(\mu \) is pointwise \(m\)-normal.

This implies the Rudolph-Johnson theorem in the case \(\gcd (m,n)=1\): if \(\mu \) is a jointly \(T_m,T_n\) invariant measure and all \(T_n\) ergodic components have positive entropy, then by the theorem \(\mu \)-a.e. point equidistributes for Lebesgue under \(T_m\). But by the ergodic theorem, it also equidistributes for the ergodic component of \(\mu \) to which it belongs; hence \(\mu \) is Lebesgue.

The hypothesis of Host’s theorem, however, is stronger than it “should” be, i.e. it is stronger than the hypothesis of the Rudolph-Johnson theorem. Lindenstrauss [37] showed that the conclusion holds under the weaker assumption that \(n\) does not divide any power of \(m\), but this is still too strong. On the other hand, Feldman and Smorodinsky [20] had earlier proved a similar result assuming only that \(m\not \sim n\), but under the strong assumption that the measure \(\mu \) is weak Bernoulli. In that work it is conjectured that the same holds assuming only that \(\mu \) is ergodic and has positive entropy. The following theorem gives the result in its “correct” generality and for some non-integer bases, and also shows that it is stable under smooth enough perturbation.

Theorem 1.10

Let \(\beta ,\gamma >1\) with \(\beta \) a Pisot number, and \(\beta \not \sim \gamma \). Then any \(T_\gamma \)-invariant and ergodic measure \(\mu \) with positive entropy is pointwise \(\beta \)-normal. Furthermore the same remains true for \(g\mu \) for any \(g\in {{\mathrm{diff}}}^2(\mathbb {R})\).

Of course, the same is true under the assumption that all \(T_\gamma \)-ergodic components of \(\mu \) have positive entropy. Note the asymmetry in the requirement from \(\beta ,\gamma \). We do not know whether the Pisot assumption is unnecessary, but we note that Bertrand-Mathis [4] has obtained some complementary results for \(\gamma \) Pisot and \(\beta \) arbitrary, though only for measures that satisfy the weak-Bernoulli property with respect to the natural symbolic coding of \(T_\gamma \).

From this one derives a new measure rigidity result for \(\beta \)-maps.

Corollary 1.11

Let \(\beta ,\gamma >1\) with \(\beta \not \sim \gamma \) and \(\beta \) Pisot. If \(\mu \) is jointly invariant under \(T_{\beta },T_{\gamma }\), and if all ergodic components of \(\mu \) under \(T_\gamma \) have positive entropy, then \(\mu \) is the common Parry measure for \(\beta \) and \(\gamma \); in particular, it is absolutely continuous. The same holds if \(T_{\beta },T_{\gamma }\) are conjugated separately by \(C^{2}\)-diffeomorphisms.

Proof

If \(\mu \) is as in the statement, then by Theorem 1.10, \(\mu \)-almost all \(x\) equidistribute under \(T_\beta \) for the \(\beta \)-Parry measure (i.e. an absolutely continuous measure). On the other hand, by the ergodic theorem, \(\mu \)-a.e. \(x\) equidistributes under \(T_\beta \) for the \(T_\beta \)-ergodic component to which it belongs; hence \(\mu \) is the Parry measure for \(T_\beta \). Being absolutely continuous and \(T_\gamma \)-invariant, \(\mu \) is then also the Parry measure for \(T_\gamma \).

The latter assertion follows in the same way, using that \(g\mu \) is pointwise \(\beta \)-normal for all \(g\in {{\mathrm{diff}}}^2(\mathbb {R})\). \(\square \)

We hope to be able to eliminate the Pisot assumption in this result; this will be addressed in a forthcoming paper. Corollary 1.11 also improves [26, Corollary 1.5] by eliminating the ergodicity assumption. We do not know for what pairs \((\beta ,\gamma )\) the Parry measures coincide, or even whether this may happen for different non-integer \(\beta ,\gamma \).

1.3.3 Badly approximable normal numbers

Another application concerns continued fraction representations and their relation to integer expansions. Let \(\Lambda \subseteq \mathbb {N}\) be a finite set with at least two elements, and set

$$\begin{aligned} C_{\Lambda }=\{x\in [0,1]\,:\, x \text{ has } \text{ only } \text{ symbols } \text{ from } \Lambda \text{ in } \text{ its } \text{ continued } \text{ fraction } \text{ expansion }\}. \end{aligned}$$

These sets are natural in Diophantine approximation since their union over all finite \(\Lambda \subseteq \mathbb {N}\) is the set of badly approximable numbers. The question of whether there are badly approximable normal numbers reduces to asking whether any of the \(C_\Lambda \) contain normal numbers. An affirmative answer follows from work of Kaufman [34], who, assuming \(\dim C_\Lambda >2/3\), constructed probability measures on \(C_{\Lambda }\) whose Fourier transform decays polynomially. The bound on the dimension was relaxed to \(\dim C_\Lambda >1/2\) by Queffélec and Ramaré [48]. Thus, for example, there are normal numbers whose continued fraction expansions consist only of the digits \(1,2\) (because \(\dim C_{\{1,2\}}>1/2\)). However, the methods from those papers fail below dimension \(1/2\), so, for example, it was not known whether there are normal numbers with continued fraction coefficients \(5,6\).

We note that \(C_\Lambda \) is the attractor of a regular IFS, namely \(\{ f_i\circ f_j:i,j\in \Lambda \}\), where \(\{ f_i\}\) are the inverse branches of the Gauss map (the reason for the compositions is that, although \(f_1\) is not a strict contraction, all the compositions \(f_i\circ f_j\) are). As an application of Theorem 1.4, we have:
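By way of illustration (our own sketch, with helper names of our choosing), points of \(C_\Lambda \) can be produced by iterating the inverse branches \(f_i(x)=1/(i+x)\) of the Gauss map, and their partial quotients read back off the forward orbit; rational arithmetic keeps everything exact.

```python
from fractions import Fraction

def cf_digits(x, k):
    """First k partial quotients of x in (0,1), read off the orbit of the
    Gauss map G(x) = 1/x mod 1 (exact when x is a Fraction)."""
    digits = []
    for _ in range(k):
        if x == 0:
            break
        a = int(1 / x)      # partial quotient: floor(1/x)
        digits.append(a)
        x = 1 / x - a       # apply the Gauss map
    return digits

def branch(i):
    """Inverse branch of the Gauss map prepending the partial quotient i."""
    return lambda x: 1 / (i + x)

# A point of C_{5,6}, built by iterating the composition f5 o f6:
x = Fraction(0)
for _ in range(15):
    x = branch(5)(branch(6)(x))
print(cf_digits(x, 10))    # [5, 6, 5, 6, 5, 6, 5, 6, 5, 6]
```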

Theorem 1.12

Any quasi-product measure on \(C_\Lambda \) (in particular the \(\dim C_\Lambda \)-dimensional Hausdorff measure) is pointwise \(\beta \)-normal for any Pisot \(\beta >1\).

Even when \(\dim C_\Lambda >1/2\), this improves the results of Kaufman, Queffélec and Ramaré, in that the result holds for a broader and more natural class of measures. The result on normality in non-integer Pisot bases is new in all cases. It seems very likely that the result holds also for Gibbs measures when \(\Lambda \subseteq \mathbb {N}\) is infinite, under standard assumptions on the Gibbs potential, but we do not pursue this.

One natural question is whether a converse of Theorem 1.12 holds. For example, is it true that almost all points in the middle-\(1/3\) Cantor set are normal with respect to the Gauss map \(G\), i.e. that they equidistribute under \(G\) for the Gauss measure, the unique absolutely continuous \(G\)-invariant measure? To the best of our knowledge, it is not even known whether there exists a point which is Gauss normal but not \(n\)-normal for any \(n\) (in the positive direction, Einsiedler et al. [15] recently proved that almost all points in the middle-\(1/3\) Cantor set have unbounded partial quotients, i.e. are not contained in any \(C_\Lambda \)). Unfortunately, our methods do not seem to help with this problem. The (piecewise) linearity of \(T_\beta \) is strongly used in the part of the proof that deals with the geometric behavior of invariant measures under convolution. In particular, Theorem 5.5 seems to fail for the Gauss map, and likely for most non-linear and many piecewise-linear maps.

1.4 Organization of the paper

In the next section we state and prove a general result relating orbits of \(\mu \)-typical points to the structure of \(\mu \); this is the second main component of the proof of Theorem 1.2 referred to above. Section 3 collects some background on dimension. In Sect. 4 we recall some background on the pure point spectrum and eigenfunctions of flows, and discuss the class of distributions arising from scenery flows, called ergodic fractal distributions. We also introduce the concept of phase measure and its main properties. We prove Theorem 1.2 in Sect. 5 (with a key component postponed to Sect. 6). In Sect. 7 we derive Theorems 1.4, 1.5 and 1.12. Finally, in Sect. 8 we prove some variants of Theorem 1.1, and employ them to prove Theorems 1.10 and 1.7.

2 Relating the distribution of orbits to the measure

While most of our considerations in this paper are special to \(\mathbb {R}\), those in this section apply in the following very general setting. Let \(X\) be a compact metric space and \(T:X\rightarrow X\) a Borel measurable map. For Borel probability measures \(\mu ,\nu \) on \(X\), let us say that \(\mu \) is pointwise generic for \(\nu \) if \(\mu \)-a.e. \(x\) equidistributes for \(\nu \) under \(T\), that is,

$$\begin{aligned} \frac{1}{N}\sum _{n=0}^{N-1}f(T^nx)\rightarrow \int f\,d\nu \;\;\;\; \text{ for } \text{ every } f\in C(X). \end{aligned}$$
(2)

This notion appears in many contexts, although the name is not standard. Clearly when \(Tx=nx\mod 1\) and \(\nu \) is Lebesgue measure on \([0,1]\), this is the same as pointwise \(n\)-normality. A well-known variant appears in smooth dynamics: when \(X\) is a manifold, a measure \(\nu \) is called the Sinai-Ruelle-Bowen (SRB) measure if the volume measure on \(X\) is pointwise generic for \(\nu \). Other examples include the study of badly approximable points on analytic curves in \(\mathbb {R}^d\), and similar applications in arithmetic contexts.

While one does not expect to be able to say very much for arbitrary maps and measures, there is an obvious formal strategy to follow if one wants to prove that \(\mu \) is pointwise generic for \(\nu \): it is sufficient to show that for \(\mu \)-a.e. \(x\), if \(x\) equidistributes for a measure \(\eta \) along some subsequence of times (i.e. (2) holds along some \(N_k\rightarrow \infty \)), then \(\eta =\nu \).

To go any further with this scheme, one needs a way to relate measures \(\eta \) arising as above to the original measure \(\mu \). It is not obvious that such a relation exists: \(\eta \) is determined primarily by the point \(x\), and although \(x\) is \(\mu \)-typical, once it is selected, it would appear that the role of \(\mu \) has ended. However, it turns out that there is a very close connection between \(\eta \) and \(\mu \), provided by the theorem below. Roughly speaking, it shows that, under a mild technical condition, one can express \(\eta \) as a weak limit of “pieces” of \(\mu \), “magnified” via the dynamics.

For a finite measurable partition \(\mathcal {A}\) of \(X\), write \(T^{i}\mathcal {A}=\{T^{-i}A\,:\, A\in \mathcal {A}\}\) and \(\mathcal {A}^{n}=\bigvee _{i=0}^{n}T^{i}\mathcal {A}\) for the coarsest common refinement of \(\mathcal {A},T\mathcal {A},\ldots ,T^{n}\mathcal {A}\). Also let \(\mathcal {A}^{\infty }=\bigvee _{i=0}^{\infty }T^{i}\mathcal {A}\) denote the \(\sigma \)-algebra generated by the partitions \(\mathcal {A}^n\), \(n\ge 0\). We say that \(\mathcal {A}\) is a generator for \(T\) if \(\mathcal {A}^{\infty }\) is the full Borel algebra. When \(T\) is invertible, we similarly define \(\mathcal {A}^{\pm n}=\bigvee _{i=-n}^n T^i\mathcal {A}\) and \(\mathcal {A}^{\pm \infty }=\bigvee _{i=-\infty }^{\infty }T^{i}\mathcal {A}\), and say that \(\mathcal {A}\) is a generator if \(\mathcal {A}^{\pm \infty }\) is the full Borel algebra. Finally, we say that \(\mathcal {A}\) is a topological generator if \(\sup \{{{\mathrm{diam}}}A \,:\,A\in \mathcal {A}^n\} \rightarrow 0\) as \(n\rightarrow \infty \) (or, in the invertible case, the sup is over \(A\in \mathcal {A}^{\pm n}\)). A topological generator is clearly a generator.

Write \(\mathcal {A}(x)\in \mathcal {A}\) for the unique element \(A\in \mathcal {A}\) containing \(x\). Given \(\mu \in \mathcal {P}(X)\) and a point \(x\in X\) such that \(\mu (\mathcal {A}^{n}(x))>0\), let

$$\begin{aligned} \mu _{\mathcal {A}^{n}(x)}=c\cdot T^{n}(\mu |_{\mathcal {A}^{n}(x)}) \end{aligned}$$

where \(c=\mu (\mathcal {A}^{n}(x))^{-1}\) is a normalizing constant. For \(\mu \)-a.e. \(x\), this is well defined for all \(n\).
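To fix ideas, here is a worked example (our illustration; the base-\(b\) digit notation is not used elsewhere in the text). Take \(Tx=bx\bmod 1\) with the \(b\)-ary partition \(\mathcal {A}=\{[j/b,(j+1)/b)\}_{j=0}^{b-1}\):

```latex
% If x has base-b digits d_1 d_2 d_3 \dots, then \mathcal{A}^{n}(x) equals
% I_{n+1}(x), the interval of length b^{-(n+1)} whose points share the
% digits d_1,\dots,d_{n+1} with x, and
\mu_{\mathcal{A}^{n}(x)}
  \;=\; \frac{1}{\mu\bigl(I_{n+1}(x)\bigr)}\,
        T^{n}\bigl(\mu|_{I_{n+1}(x)}\bigr),
% i.e. the piece of \mu on I_{n+1}(x), expanded affinely by the factor b^{n}
% onto the interval [d_{n+1}/b,\,(d_{n+1}+1)/b) and normalized.
% For \mu = Lebesgue measure, each \mu_{\mathcal{A}^{n}(x)} is simply the
% uniform probability measure on [d_{n+1}/b,\,(d_{n+1}+1)/b).
```

Thus \(\mu _{\mathcal {A}^{n}(x)}\) is exactly a "piece" of \(\mu \) magnified via the dynamics, as described above.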

Theorem 2.1

Let \(T:X\rightarrow X\) be a Borel-measurable map of a compact metric space, \(\mu \) be a Borel probability measure on \(X\) and \(\mathcal {A}\) a generating partition. Then for \(\mu \)-a.e. \(x\), if \(x\) equidistributes for \(\nu \in \mathcal {P}(X)\) along some \(N_{k}\rightarrow \infty \), and if \(\nu (\partial A)=0\) for all \(A\in \mathcal {A}^{n}, n\in \mathbb {N}\), then

$$\begin{aligned} \nu =\lim _{k\rightarrow \infty }\frac{1}{N_{k}}\sum _{n=1}^{N_{k}}\mu _{\mathcal {A}^{n}(x)}\qquad \text{ weak-* } \text{ in } \mathcal {P}(X). \end{aligned}$$
(3)

If, furthermore, \(\mathcal {A}\) is a topological generator, then the hypothesis on \(\nu \) follows if, for all \(m\),

$$\begin{aligned} \limsup _{k\rightarrow \infty }\frac{1}{N_{k}}\sum _{n=1}^{N_{k}}\mu _{\mathcal {A}^{n}(x)}(C_m^{(\varepsilon )})\;=\;o(1) \qquad \text{ as } \varepsilon \rightarrow 0, \end{aligned}$$
(4)

where \(C_m=\bigcup _{A\in \mathcal {A}^m}\partial A\), and \(C_m^{(\varepsilon )}\) is its \(\varepsilon \)-neighborhood.

Note that if in the right hand side of (3) we replace \(\mu _{\mathcal {A}^{n}(x)}\) by \(\delta _{T^{n}x}\), then the convergence to \(\nu \) is just a reformulation of the definition of equidistribution. Generally \(\mu _{\mathcal {A}^{n}(x)}\) and \(\delta _{T^{n}x}\) are very different measures and the content of the theorem is that these two sequences are nevertheless asymptotic in the Cesàro sense. This is quite surprising, and such a general fact can only be due to very general principles, as we shall see in the proof.

Proof

We give the proof assuming that \(\mathcal {A}\) is forward generating and comment on the invertible case at the end.

Let \(\mathcal {F}\) denote the set of linear combinations of indicator functions of \(A\in \mathcal {A}^{n}\), \(n\in \mathbb {N}\), with coefficients in \(\mathbb {Q}\). This is a countable algebra and, for \(x\in X\) and \(\nu \in \mathcal {P}(X)\) such that \(\nu (\partial A)=0\) for \(A\in \mathcal {A}^{n}\), it is well known that \(x\) equidistributes for \(\nu \) along \(N_{k}\) if and only if \(\lim \frac{1}{N_{k}}\sum _{n=1}^{N_{k}}f(T^{n}x) = \int f\, d\nu \) for every \(f\in \mathcal {F}\) (it is here that we use the assumption that \(\nu \) gives zero mass to the boundaries of \(A\in \mathcal {A}^n\)). Similarly, the limit in the conclusion of the theorem holds if and only if \(\lim \frac{1}{N_{k}}\sum _{n=1}^{N_{k}}\int f\,d\mu _{\mathcal {A}^n(x)} = \int f\, d\nu \) for all \(f\in \mathcal {F}\). It follows that to prove the theorem it suffices to show that for \(\mu \)-a.e. \(x\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=0}^{N-1}\left( \int f\,d\mu _{\mathcal {A}^{n}(x)}-f(T^{n}x)\right) = 0\qquad \text{ for } \text{ every } f\in \mathcal {F}. \end{aligned}$$
(5)

Suppose that \(f=\sum a_{i}1_{A_{i}}\) where \(a_{i}\in \mathbb {Q}\) and \(A_{i}\in \mathcal {A}^{k}\) for some \(k\). Notice that by definition of \(\mu _{\mathcal {A}^{n}(x)}\),

$$\begin{aligned} \int f\, d\mu _{\mathcal {A}^{n}(x)}&= \frac{1}{\mu (\mathcal {A}^{n}(x))}\int _{\mathcal {A}^{n}(x)}T^{n}f\, d\mu \\&= \mathbb {E}_\mu (T^{n}f\,|\,\mathcal {A}^{n})(x) \end{aligned}$$

Writing \(g_{n}=\mathbb {E}_{\mu }(T^{n}f\,|\,\mathcal {A}^{n})-T^{n}f\), it suffices to show that \(\lim \frac{1}{N}\sum _{n=0}^{N-1}g_{n}=0\) \(\mu \)-a.e., and for this it clearly suffices to prove that \(\lim \frac{1}{N}\sum _{n=0}^{N-1}g_{kn+p}=0\) \(\mu \)-a.e. for \(0\le p\le k-1\).

Now, \(g_{n}\) is \(\mathcal {A}^{n+k}\)-measurable (because \(T^{n}f\) is \(\mathcal {A}^{n+k}\)-measurable); and on the other hand

$$\begin{aligned} \mathbb {E}_{\mu }(g_{n}\,|\,\mathcal {A}^{n})=\left( \mathbb {E}_{\mu }(T^{n}f\,|\,\mathcal {A}^{n})-\mathbb {E}_{\mu }(T^{n}f\,|\,\mathcal {A}^{n})\right) =0. \end{aligned}$$

Therefore, \(\{g_{p+kn}\}_{n=0}^{\infty }\) is an orthogonal system in \(L^{2}(\mu )\), since if \(j>i\) then

$$\begin{aligned} \int g_{p+ki}\,g_{p+kj}\, d\mu&= \int \mathbb {E}_{\mu }\left( g_{p+ki}\,g_{p+kj}|\mathcal {A}^{p+k(i+1)}\right) \, d\mu \\&= \int g_{p+ki}\cdot \mathbb {E}_{\mu }\left( g_{p+kj}|\mathcal {A}^{p+k(i+1)}\right) \, d\mu \\&= \int g_{p+ki}\cdot 0\, d\mu \\&= 0. \end{aligned}$$

Since the sequence \(\{ g_{p+kn}\}_{n=0}^\infty \) is also uniformly bounded in \(L^2(\mu )\), we conclude that \(\frac{1}{N}\sum _{n=0}^{N-1}g_{p+kn}\rightarrow 0\) a.e., see for instance [39]. (Alternatively, \(\{g_{p+kn}\}_{n=1}^{\infty }\) form a sequence of bounded martingale differences for the filtration \(\{\mathcal {A}^{p+kn}\}\), hence their averages converge a.e. to \(0\), see [21, Chapter 9, Theorem 3].)

We turn to the second statement. Assume that \(\mathcal {A}\) is a topological generator. We will show that the assumption (4) implies that \(\nu (\partial A)=0\) for \(A\in \mathcal {A}^m, m\in \mathbb {N}\). Fix \(m\) and \(C=C_m\) as in the statement. For \(\varepsilon >0\) let \(f_\varepsilon \in \mathcal {F}\) be such that \(1_C\le f_\varepsilon \le 1_{C^{(\varepsilon )}}\). Then, using (5) and the hypothesis (4), we get

$$\begin{aligned} \limsup _{k\rightarrow \infty } \frac{1}{N_k}\sum _{n=0}^{N_k-1}f_\varepsilon (T^n x)&= \limsup _{k\rightarrow \infty }\frac{1}{N_{k}}\sum _{n=0}^{N_{k}-1}\int f_\varepsilon \,d\mu _{\mathcal {A}^{n}(x)}\\&\le \limsup _{k\rightarrow \infty }\frac{1}{N_{k}}\sum _{n=0}^{N_{k}-1} \mu _{\mathcal {A}^{n}(x)}(C^{(\varepsilon )}) \\&= o(1)\qquad \text{ as } \varepsilon \rightarrow 0. \end{aligned}$$

Since \(\mathcal {A}\) is a topological generator, \(\mathcal {F}\) is uniformly dense in \(C(X)\), so the above conclusion holds also for \(f\in C(X)\) satisfying \(1_C\le f\le 1_{C^{(\varepsilon )}}\). Since \(x\) equidistributes for \(\nu \) along \(\{N_k\}\), this implies that \(\nu (C)=0\). \(\square \)

In the case that \(T\) is invertible we consider instead the algebra \(\mathcal {F}^\pm \) of \(\mathbb {Q}\)-linear combinations of indicators of sets from \(\mathcal {A}^{\pm n}=\bigvee _{i=-n}^{n}T^{i}\mathcal {A}\). The rest of the proof proceeds as before using the filtration \(\mathcal {A}^{\pm n}\).

3 Preliminaries on dimension

In this section we summarize some standard and some less well known facts about dimension.

3.1 Dimension of measures

The (lower) Hausdorff dimension of a finite non-zero Borel measure \(\theta \) on some metric space is defined by

$$\begin{aligned} \dim \theta =\inf \{\dim A\,:\,\theta (A)>0\;,\;A \text{ is } \text{ Borel }\}. \end{aligned}$$

Here \(\dim A\) is the Hausdorff dimension of \(A\). We note that this is only one of many possible concepts of dimension of a measure, but it turns out to be the appropriate one for our purposes because of the way it behaves under convolutions, i.e. the resonance and dissonance phenomena discussed in the following sections.

An alternative characterization that we will have occasion to use is given in terms of local dimensions:

$$\begin{aligned} \dim \theta ={{\mathrm{essinf}}}_{x\sim \theta }\underline{\dim }(\theta ,x), \end{aligned}$$
(6)

where

$$\begin{aligned} \underline{\dim }(\theta ,x)= \liminf _{r\downarrow 0} \frac{\log \theta (B(x,r))}{\log r} \end{aligned}$$

is the lower local dimension of \(\theta \) at \(x\). The equivalence is a version of the mass distribution principle, see [19, Proposition 4.9]. Note that this characterization shows that (when the underlying space is compact) the dimension is a Borel function of the measure in the weak\(^*\) topology.
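As a numerical illustration of (6) (our sketch, not part of the text): the Cantor–Lebesgue measure \(\mu \) on the middle-\(1/3\) Cantor set satisfies \(\mu (B(x,r))\approx r^{\log 2/\log 3}\) for \(x\) in the Cantor set, and its local dimension can be estimated from the self-similarity relation \(\mu ([a,b])=\frac{1}{2}\mu ([3a,3b])+\frac{1}{2}\mu ([3a-2,3b-2])\).

```python
import math
from fractions import Fraction

# Sketch (our illustration): estimate the lower local dimension of the
# Cantor-Lebesgue measure mu at x = 1/4, a point of the middle-1/3 Cantor
# set.  mu is the self-similar measure for f1(x) = x/3, f2(x) = x/3 + 2/3
# with equal weights, which gives the exact recursion used below.

def cantor_measure(a, b):
    """mu([a,b]) computed exactly with rational arithmetic."""
    if b <= 0 or a >= 1:
        return Fraction(0)          # interval misses the support [0,1]
    if a <= 0 and b >= 1:
        return Fraction(1)          # interval covers the support
    return (cantor_measure(3 * a, 3 * b)
            + cantor_measure(3 * a - 2, 3 * b - 2)) / 2

x, k = Fraction(1, 4), 20
r = Fraction(1, 3 ** k)
m = cantor_measure(x - r, x + r)    # mu(B(x, r))
est = math.log(m) / math.log(r)     # local-dimension estimate at scale r
# The true local dimension is log 2 / log 3 = 0.6309...
print(round(est, 3))
```

Since \(\mu \) is Ahlfors regular of exponent \(\log 2/\log 3\), the estimate at scale \(r=3^{-k}\) lies within \(O(1/k)\) of the true value.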

We briefly recall some other properties of the dimension which will be used throughout the paper without further reference. Clearly \(\dim (\theta |_E)\ge \dim \theta \) for any set \(E\) of positive measure, and \(\dim \) is invariant under bi-Lipschitz maps (since this is true for the dimension of sets); in particular it is invariant under diffeomorphisms. Dimension also satisfies the relations

$$\begin{aligned} \dim \sum _i \theta _i&=\inf _i\dim \theta _i,\\ \dim \int \theta _\omega \,dQ(\omega )&\ge {{\mathrm{essinf}}}_{\omega \sim Q}\dim \theta _\omega . \end{aligned}$$

In particular, \(\dim \mu =\dim T\mu \) for any map \(T\) between intervals that is a piecewise diffeomorphism (such as the maps \(T_\beta \) or the Gauss map), as can be seen by writing the measure as a sum over countably many domains where the map is bi-Lipschitz. The same argument shows that dimension is invariant under the quotient map \(\mathbb {R}\rightarrow \mathbb {R}/\mathbb {Z}\).

Finally, we note that (6) implies that \(\dim \mu \times \nu \ge \dim \mu +\dim \nu \) (strict inequality is possible).

3.2 Projection theorems

It is a general principle that if \(\mu \) is a measure on some space \(X\) and \(f:X\rightarrow Y\) is a “typical” Lipschitz map, then the image measure \(f\mu \) will have dimension that is “as large” as possible: namely, it will have the same dimension as \(\mu \) itself if \(Y\) is large enough to accommodate this, and otherwise it will be as large as a subset of \(Y\) can possibly be, that is, it will have the same dimension as \(Y\). Thus one expects \(\dim f\mu =\min \{\dim \mu ,\dim Y\}\). There are many precise versions of this fact. The most classical is Marstrand’s projection theorem, concerning linear images of sets and measures on \(\mathbb {R}^2\). The following version is due to Hunt and Kaloshin [32, Theorem 4.1].

Theorem 3.1

If \(\eta \) is a probability measure on \(\mathbb {R}^2\), then for a.e. \(\alpha \in [0,\pi )\), \(\dim \pi _\alpha \eta =\min \{1,\dim \eta \}\), where \(\pi _\alpha \) is the orthogonal projection onto a line making angle \(\alpha \) with the \(x\)-axis.

In our applications, \(\eta \) will be a product \(\mu \times \nu \). In this particular case, we obtain

Corollary 3.2

Let \(\mu ,\nu \in \mathcal {P}(\mathbb {R})\). Then for almost all \(t\in \mathbb {R}\),

$$\begin{aligned} \dim (\mu *S_t\nu ) \ge \min (1,\dim \mu +\dim \nu ). \end{aligned}$$

Proof

The family of linear maps \(\{ P_t(x,y)=x+ty\}\) is a smooth reparametrization of the orthogonal projections \(\{ \pi _\alpha \}\), up to affine changes of coordinates which do not affect dimension. Hence, by Theorem 3.1,

$$\begin{aligned} \dim P_t(\mu \times \nu ) = \min (1,\dim (\mu \times \nu )) \ge \min (1,\dim \mu +\dim \nu )\quad \text {for a.e. } t. \end{aligned}$$

The corollary follows since \(\mu *S_t\nu \) is a restriction of \(P_t(\mu \times \nu )\) to a set of positive measure, and restriction does not decrease dimension. \(\square \)
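The reparametrization invoked in the proof can be written out explicitly. With the normalization \(\pi _\alpha (x,y)=x\cos \alpha +y\sin \alpha \) (a choice of coordinate on the target line, not fixed in the text):

```latex
\pi_\alpha(x,y) \;=\; x\cos\alpha + y\sin\alpha
             \;=\; \cos\alpha\,\bigl(x + y\tan\alpha\bigr)
             \;=\; \cos\alpha \cdot P_{\tan\alpha}(x,y),
\qquad \alpha\in\bigl(-\tfrac{\pi}{2},\tfrac{\pi}{2}\bigr).
```

Thus \(P_t\) and \(\pi _{\arctan t}\) differ by multiplication by the nonzero scalar \(\cos (\arctan t)\), which affects neither dimension nor absolute continuity; since \(t\mapsto \arctan t\) is a diffeomorphism of \(\mathbb {R}\) onto \((-\pi /2,\pi /2)\), null sets of parameters correspond, and only the single direction \(\alpha =\pi /2\) is lost.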

We will have occasion to use the following refinement of the above.

Theorem 3.3

If \(\mu ,\nu \) are Borel probability measures on \(\mathbb {R}\) such that \(\dim \mu +\dim \nu >1\), then

$$\begin{aligned} \dim \{t\in \mathbb R: \dim (\mu *S_t\nu ) < 1\} < 1. \end{aligned}$$

Proof

Falconer [17] essentially proved the corresponding result for Hausdorff dimensions of sets; we indicate how to modify his proof to work with dimensions of measures (the argument is standard). Let \(P_t(x,y)=x+ty\). In the course of the proof of [17, Theorem 1] it is shown that if \(\eta \) is a Borel probability measure on \(\mathbb {R}^2\) such that

$$\begin{aligned} \int \int \frac{d\eta (x)d\eta (y)}{|x-y|^s} < \infty \end{aligned}$$
(7)

for some \(s>1\), then the set \(E\) of parameters \(t\) for which the projection \(P_t\eta \) is not absolutely continuous satisfies \(\dim (E)\le 2-s<1\) (as above, Falconer worked with orthogonal projections, but by reparametrization the same holds for the family \(\{P_t\}\)).

Let \(\rho =\mu \times \nu \). We only need to show that \(\dim (E)<1\), where

$$\begin{aligned} E=\{t : P_t\rho \text { is not absolutely continuous}\}. \end{aligned}$$

We have \(\dim \rho \ge \dim \mu +\dim \nu >1\). Using Eq. (6), it follows that there is \(s_0>1\) such that

$$\begin{aligned} \liminf _{r\downarrow 0} \frac{\log \rho (B(x,r))}{\log r} \ge s_0\quad \text {for }\rho \text {-a.e. }x. \end{aligned}$$

By Egorov’s Theorem, for any \(\varepsilon >0\) there are a set \(A_\varepsilon \) with \(\rho (A_\varepsilon )>1-\varepsilon \) and a constant \(r_\varepsilon >0\) such that

$$\begin{aligned} \rho (B(x,r)) \le r^{(1+s_0)/2}\quad \text {for all }x\in A_\varepsilon , 0<r<r_\varepsilon . \end{aligned}$$

It follows that \(\eta :=\rho |_{A_\varepsilon }\) satisfies (7) with \(s=1+(s_0-1)/4\) (say). Hence \(\dim (E_\varepsilon )\le 2-s\), where \(E_\varepsilon =\{t : P_t(\rho |_{A_\varepsilon }) \text { is singular}\}\). Since \(E\subseteq \bigcup _{n\in \mathbb {N}} E_{1/n}\), the result follows. \(\square \)
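The passage from the Frostman-type bound to the energy condition (7) is the standard dyadic-annuli computation; a sketch in the notation above, with \(s=1+(s_0-1)/4\) and \(s_1=(1+s_0)/2\), so that \(s<s_1\):

```latex
% For x \in A_\varepsilon, split the inner integral over dyadic annuli
% \{ y : 2^{-(k+1)} \le |x-y| < 2^{-k} \}:
\int \frac{d\eta(y)}{|x-y|^{s}}
  \;\le\; r_\varepsilon^{-s}
  \;+\; \sum_{k:\,2^{-k}<r_\varepsilon} 2^{(k+1)s}\,
        \eta\bigl(B(x,2^{-k})\bigr)
  \;\le\; r_\varepsilon^{-s} + 2^{s}\sum_{k\ge 0} 2^{-k(s_1-s)}
  \;<\;\infty,
% using \eta \le \rho and \rho(B(x,2^{-k})) \le (2^{-k})^{s_1}
% for x \in A_\varepsilon and 2^{-k} < r_\varepsilon.
```

The bound is uniform over \(x\in A_\varepsilon \), so integrating \(d\eta (x)\) gives (7).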

3.3 Further facts on dimension

For the part of the proof of Theorem 1.10 dealing with invariance under \(C^2\) diffeomorphisms, we will need some classical but perhaps less well-known facts about dimension. The material below will not be used anywhere except in this application.

It is always true that \(\dim (A\times B)\ge \dim (A)+\dim (B)\) for Borel sets \(A,B\); however, the inequality may be strict. More generally, there is a “Cavalieri inequality” for Hausdorff dimensions. To get inequalities in the opposite direction, one needs to consider also the packing dimension \(\dim _P\). The interested reader may consult e.g. [40] for its definition; we shall only require the property given in the following proposition.

Proposition 3.4

Let \(E\subseteq \mathbb {R}^{d_1+d_2}\) be a Borel set.

  1.

    Suppose there is a set \(A\subseteq \mathbb {R}^{d_1}\) of positive Lebesgue measure, such that for every \(x_0\in A\), the fiber \(\{y:(x_0,y)\in E\}\) has Hausdorff dimension at least \(\alpha \). Then \(\dim (E)\ge d_1+\alpha \).

  2.

    Let \(P_i\) be the coordinate projection onto \(\mathbb {R}^{d_i}\). Then \(\dim (E)\le \dim (P_1 E)+\dim _P(P_2 E)\).

The first part follows from [40, Theorem 7.7], and the second from [40, Theorem 8.10].

We now turn to measures. In a similar way to our definition of lower Hausdorff dimension \(\dim \), we may define upper packing dimension \(\dim _P\) as

$$\begin{aligned} \dim _P(\mu ) =\inf \{ \dim _P(E):\mu (E)=1 \}. \end{aligned}$$

(Note that in the definition of \(\dim \) the infimum is taken over sets of positive measure; here, it is taken over sets of full measure.) The following is an analog of Proposition 3.4 for measures.

Lemma 3.5

Let \(\mu \) be a measure on \(\mathbb {R}^{d_1+d_2}\), and let \(P_i\) be the coordinate projection onto \(\mathbb {R}^{d_i}\).

  1.

    Suppose \(P_1\mu \) is absolutely continuous, and \(\dim (\mu _{x_0})\ge \alpha \) for \(P_1\mu \)-a.e. \(x_0\), where \(\mu _{x_0}\) is the conditional measure on the fiber \(\{(x,y):x=x_0\}\). Then \(\dim \mu \ge d_1+\alpha \).

  2.

    \(\dim \mu \le \dim (P_1 \mu )+\dim _P(P_2 \mu )\).

Proof

For the first part, suppose \(\mu (E)>0\). Then there is a set \(A\) with \(P_1\mu (A)>0\) (and hence \(A\) has positive Lebesgue measure) such that \(\mu _x(E)>0\) for almost all \(x\in A\). The claim then follows from the corresponding statement for sets. The second part is established in a similar manner. \(\square \)

Finally, recall that a measure \(\mu \) is exact dimensional if the local dimension

$$\begin{aligned} \lim _{r\downarrow 0} \frac{\log \mu (B(x,r))}{\log r} \end{aligned}$$

exists and is \(\mu \)-a.e. constant. For exact dimensional measures \(\mu \), it is well known that \(\dim \mu =\dim _P\mu \), with both dimensions agreeing with the almost sure value of the local dimension; see [18, Proposition 2.3]. If \(\beta >1\) is Pisot and \(\mu \) is \(T_\beta \)-ergodic, then \(\mu \) is exact dimensional. This well-known fact follows from the Shannon–McMillan–Breiman theorem and, in the Pisot case, a classical lemma of Garsia (see Lemma 6.2 below).

4 Ergodic fractal distributions, spectra and phase

4.1 Ergodicity and spectrum

Below we prove some basic facts relating the spectrum of a flow to the equidistribution properties of points under individual maps in the flow. The discussion is mostly valid for general flows on metric spaces, but for simplicity we formulate the statements for \((\mathcal {M}^\square ,S)\).

Proposition 4.1

If \(P\in \mathcal {D}\) is \(S\)-ergodic and \(t_0>0\), then \(P\) is \(S_{t_0}\)-ergodic if and only if no non-zero integer multiple of \(1/t_0\) is in the pure point spectrum of \(P\).

Proof

\(S\) acts on the ergodic decomposition with respect to \(S_{t_0}\): \(P=\int P_\mu \,dP(\mu )\). Clearly this action is \(t_0\)-periodic. Thus the factor of \((P,S)\) with respect to the \(\sigma \)-algebra \(\mathcal {E}\) of invariant sets factors through the standard translation action of \(\mathbb {R}\) on \(\mathbb {R}/t_0\mathbb {Z}\). The only factors of the latter action are the trivial one, in which case \(\mathcal {E}\) is trivial and \(P\) is \(S_{t_0}\)-ergodic, or an action isomorphic to the translation action of \(\mathbb {R}\) on \(\mathbb {R}/(t_0/k)\mathbb {Z}\) for some \(k\in \mathbb {Z}\setminus \{0\}\), in which case this factor map defines an eigenfunction with eigenvalue \(k/t_0\). \(\square \)

Lemma 4.2

Let \(P\) be \(S\)-ergodic and \(t_0>0\). Then \(P\)-a.e. \(\mu \) equidistributes under \(S_{t_0}\) for an \(S_{t_0}\)-ergodic distribution \(P_\mu \), and \(P=\int P_\mu \,dP(\mu )\) is the ergodic decomposition of \(P\) under \(S_{t_0}\). If no non-zero integer multiple of \(1/t_0\) is in \(\Sigma (P,S)\), then \(P_\mu =P\) a.s.

Proof

Let \(P=\int P_\mu \,dP(\mu )\) be the ergodic decomposition of \(P\) with respect to the measure-preserving map \(S_{t_0}\). By the ergodic theorem, for \(P\)-a.e. \(\mu \), \(P_\mu \)-a.e. \(\nu \) equidistributes for \(P_\nu \) under \(S_{t_0}\); the first statement follows. For the second statement, if \(k/t_0\notin \Sigma (P,S)\) for all non-zero integers \(k\), then by Proposition 4.1 \(P\) is \(S_{t_0}\)-ergodic, and so \(P_\mu =P\) a.s. \(\square \)

We turn to distributions generated by a measure \(\mu \in \mathcal {P}(\mathbb {R})\). Given \(t_0>0\), we say that a distribution \(P\) is \(t_0\)-generated by \(\mu \) at \(x\) if \(\mu ^x\) equidistributes for \(P\) under the discrete semigroup \(\{S_{kt_0}\}_{k\in \mathbb {N}}\), that is, the sequence \(\{\mu _{x,kt_0}\}_{k=0}^\infty \) equidistributes for \(P\).

We have seen that if \(k/t_0\not \in \Sigma (P,S)\) for all non-zero integers \(k\), then \(P\)-a.e. \(\mu \) equidistributes for \(P\) under \(S_{t_0}\). The next result says that the same is true for any measure \(\mu \) that generates \(P\).

Lemma 4.3

Suppose \(\mu \) generates an \(S\)-ergodic distribution \(P\) and no non-zero integer multiple of \(1/t_0\) is an eigenvalue of \((P,S)\). Then \(P\) is \(t_0\)-generated by \(\mu \) at \(\mu \)-a.e. \(x\).

This is, essentially, the following well-known fact from ergodic theory, whose proof we provide for completeness:

Lemma 4.4

Let \(W=(W_{t})_{t>0}\) be a continuous flow on a compact metric space \(X\). Suppose \(\theta \) is a \(W\)-invariant and ergodic measure which does not have \(k/t_{0}\) in its pure point spectrum for any \(k\in \mathbb {Z}\setminus \{0\}\). Then any point \(x\) which equidistributes for \(\theta \) under \(W\) also equidistributes for \(\theta \) under the “time \(t_{0}\)” map \(W_{t_{0}}\).

Proof

As in Lemma 4.2, the spectral hypothesis implies ergodicity of \(\theta \) under the map \(W_{t_{0}}\). Now suppose that \(x\) equidistributes for a measure \(\theta '\) under \(W_{t_{0}}\) along a sequence \(N_{k}\rightarrow \infty \); it suffices to prove \(\theta '=\theta \). By continuity, \(\theta '\) is \(W_{t_{0}}\)-invariant, so \(W_t\theta '\) is \(W_{t_0}\)-invariant for every \(t\). Let \(\rho =\frac{1}{t_{0}}\int _{0}^{t_{0}}W_{t}\theta '\, dt\). Then for every \(f\in C(X)\),

$$\begin{aligned} \int f\, d\rho&= \lim _{k\rightarrow \infty }\frac{1}{t_{0}}\int _{0}^{t_0}\frac{1}{N_{k}}\sum _{n=0}^{N_{k}-1}f(W_{t_{0}}^{n} W_t x) dt\\&= \lim _{k\rightarrow \infty }\frac{1}{N_{k}t_{0}}\int _{0}^{N_{k}t_{0}}f(W_{t}x)\, dt\\&= \int f\, d\theta , \end{aligned}$$

where the last equality is because \(x\) equidistributes for \(\theta \). Thus \(\rho =\theta \), i.e. \(\frac{1}{t_{0}}\int _{0}^{t_{0}}W_{t}\theta 'dt=\theta \). Since \(\theta \) is \(W_{t_{0}}\)-ergodic and this is a representation of \(\theta \) as the integral of \(W_{t_{0}}\)-invariant measures, we conclude that \(W_{t}\theta '=\theta \) for a.e. \(t\). Since \(\theta \) is \(W\)-invariant this holds for \(t=0\), i.e. \(\theta '=\theta \), as desired. \(\square \)

A priori this does not apply in our situation, because the topological assumptions are not satisfied (\(S\) acts discontinuously, and is not everywhere defined on \(\mathcal {P}(\mathcal {P}([-1,1]))\)). However, the only place in the proof that continuity was used was in the assertion that \(\theta '\) is \(W_{t_0}\)-invariant. In the context of Lemma 4.3 this is true at \(\mu \)-a.e. point by [27, Theorem 1.7]. Thus, we have proved Lemma 4.3.

4.2 Ergodic fractal distributions

Definition 4.5

An \(S\)-invariant distribution \(P\in \mathcal {D}\) is \(S\)-quasi-Palm if for every Borel set \(B\subseteq \mathcal {M}^\square \), \(P(B)=1\) if and only if for every \(t>0\), \(P\)-almost every measure \(\eta \) satisfies \(\eta _{x,t}\in B\) for \(\eta \)-almost all \(x\) such that \([x-e^{-t},x+e^{-t}]\subseteq [-1,1]\).

Definition 4.6

A distribution \(P\in \mathcal {D}\) which is supported on \(\mathcal {M}^\square \), \(S\)-invariant and satisfies the \(S\)-quasi-Palm property is called a fractal distribution, or FD. If, in addition, \(P\) is \(S\)-ergodic, then \(P\) is called an ergodic fractal distribution, or EFD.

This definition differs slightly from the one introduced and studied in [27]. More precisely, the notion of quasi-Palm in [27] is suited for distributions on Radon measures on \(\mathbb {R}\), rather than distributions on probability measures on \([-1,1]\), and the notion of EFDs there is for distributions on Radon measures that are invariant under the action of a semigroup \(S^*\), which is defined similarly to \(S\) but without restricting the measures to a bounded interval, so that \(S^*\) acts on measures of unbounded support (our \(S\) is denoted by \(S^\Box \) in [27]). For this reason, in the definition of the quasi-Palm property given in [27] there is no need to assume that \([x-e^{-t},x+e^{-t}]\subseteq [-1,1]\), and it has \(\mu ^x\) in place of \(\mu _{x,t}\). However, it is proved in [27, Lemma 3.1] that \(S\)-invariant and \(S^*\)-invariant distributions are canonically in one-to-one correspondence. Hence any EFD according to our definition arises as the push-forward of an EFD in the sense of [27] under the map \(\mu \mapsto \mu |_{[-1,1]}\). Therefore all results proved for EFDs in [27] continue to be valid with our definition of EFD. In particular, the following is proved in [27, Theorem 1.7].

Theorem 4.7

For \(\mu \)-almost all \(x\), any distribution \(P\) generated by \(\mu \) at \(x\) along a sequence of times \(T_i\) is an FD (i.e. it is \(S\)-invariant and automatically satisfies the \(S\)-quasi-Palm property).

In particular, if \(\mu \) generates an \(S\)-ergodic distribution \(P\), then \(P\) is an EFD.

For the rest of the section we fix an EFD \(P\), and shall draw some simple but important conclusions about it. We will repeatedly use the following consequence of the \(S\)-quasi-Palm property:

Lemma 4.8

Let \(P\) be an EFD, and \(B\subseteq \mathcal {M}^\square \) a Borel set with the property that \(\eta \in B\) whenever \(S_t\eta \in B\) for some \(t\). Then \(P(B)=1\) if and only if for \(P\)-almost all \(\eta \) and \(\eta \)-almost all \(x\), the translation \(\eta ^x\) is in \(B\).

As a first application, we have:

Lemma 4.9

Let \(P\) be an EFD and fix \(t_0>0\). Then \(P\)-a.e. \(\mu \) generates \(P\), and \(t_0\)-generates an \(S_{t_0}\)-ergodic component of \(P\), at \(\mu \)-a.e. point \(x\).

Proof

Let \(B\) be the set of \(\mu \) such that \(\mu \) generates \(P\) and \(t_0\)-generates an \(S_{t_0}\)-ergodic component of \(P\) at \(0\); then \(P(B)=1\) by ergodicity. Also, \(\eta \in B\) whenever \(S_t\eta \in B\), so the lemma follows from Lemma 4.8. \(\square \)

Recall that an \(S\)-invariant distribution is trivial if it is supported on the \(S\)-fixed point \(\delta _0\).

Lemma 4.10

If \(P\) is a non-trivial EFD then \(P\)-almost all measures are non-atomic.

Proof

Let \(a(\mu )=\mu (\{0\})\). It is clear that \(a(S_t\mu )\ge a(\mu )\), by definition of \(S\), so \(a\) is a.s. constant. Also it is clear that if \(a(\mu )>0\) then \(a(S_t\mu )\rightarrow 1\) as \(t\rightarrow \infty \), so if that were the case, \(a=1\) \(P\)-a.s. But this would imply that \(\mu =\delta _0\) a.s. and so \(P\) is trivial, contrary to assumption. Hence \(a=0\) \(P\)-a.s.; using Lemma 4.8 applied to the set \(\{\nu \,:\,a(\nu )=0\}\), we find that \(P\)-a.e. \(\nu \) satisfies \(a(\nu ^x)=0\) for \(\nu \)-a.e. \(x\), so \(\nu \) is non-atomic. \(\square \)

Lemma 4.11

Suppose that \(\mu \) \(t_0\)-generates \(P\) and \(P\) is supported on non-atomic measures. For every \(\varepsilon >0\) there is a \(\rho >0\) such that

$$\begin{aligned} \limsup _{N\rightarrow \infty } \frac{1}{N}\sum _{n=0}^{N-1}\sup \mu _{x,t_0 n}(I)<\varepsilon \quad \text {for }\mu \text {-almost all } x, \end{aligned}$$

where the supremum is over intervals \(I\subseteq [-1,1]\) of length \(|I|<\rho \).

Proof

Fix \(\varepsilon >0\) and let \(\mathcal {C}_\rho \) denote the set of measures \(\eta \) such that \(\eta (I)<\varepsilon \) for every open interval of length \(|I|<\rho \). Note that \(\mathcal {C}_\rho \) is open.

By the fact that \(P\) gives no mass to measures with atoms, for \(P\)-a.e. \(\eta \) there is a \(\rho =\rho _\eta >0\), depending on \(\eta \), such that \(\eta (I)<\varepsilon \) for all open intervals \(I\) of length \(|I|<\rho _\eta \). It follows that there is a \(\rho >0\) such that with \(P\)-probability \(>1-\varepsilon \) we have \(\rho <\rho _\eta \), and in particular \(P(\mathcal {C}_\rho )>1-\varepsilon \). Since \(\mu \) \(t_0\)-generates \(P\) and \(\mathcal {C}_\rho \) is open, we find that for \(\mu \)-a.e. \(x\),

$$\begin{aligned} \liminf _{N\rightarrow \infty } \frac{1}{N}\sum _{n=0}^{N-1}\delta _{\mu _{x,t_0 n}\in \mathcal {C}_\rho }\ge P(\mathcal {C}_\rho )>1-\varepsilon . \end{aligned}$$

Since \(\sup \eta (I)<\varepsilon \) for \(\eta \in \mathcal {C}_\rho \), while trivially \(\sup \eta (I)\le 1\) for any probability measure \(\eta \), the averages in the statement of the lemma are asymptotically bounded by \(\varepsilon +\varepsilon =2\varepsilon \). As \(\varepsilon \) was arbitrary, this proves the lemma (applying the above with \(\varepsilon /3\) in place of \(\varepsilon \)). \(\square \)

It is not hard to show that the same conclusion holds if one assumes only that \(\mu \) generates a non-trivial \(P\) (without necessarily \(t_0\)-generating it), but we will not use this fact.

In fact, not only are \(P\)-typical measures non-atomic; they also have positive dimension:

Proposition 4.12

Let \(P\) be an EFD. There is a number \(\delta \) such that \(P\)-a.e. \(\nu \) has \(\dim \nu =\delta \). If \(P\) is nontrivial then \(\delta >0\).

Proof

This follows from [27, Lemma 1.18]; we include a proof for completeness. For the first statement, restriction can only increase dimension, and scaling does not affect it, so for any measure \(\nu \) we have \(\dim S_t\nu \ge \dim \nu \). By ergodicity, the dimension is \(P\)-a.s. equal to some constant \(\delta \ge 0\).

Now assume that \(P\) is nontrivial; we need to show that \(\delta >0\). We will use the characterization of dimension via local dimensions, recall Eq. (6). Write

$$\begin{aligned} f(\nu ) = \liminf _{r\downarrow 0} \frac{\log \nu ([-r,r])}{\log r}. \end{aligned}$$

By Lemma 4.8, it is enough to verify that there is \(\delta >0\) such that \(f(\nu )\ge \delta \) for \(P\)-a.e. \(\nu \) (note that the set \(B=\{ \nu : f(\nu )\ge \delta \}\) satisfies \(S_t\nu \in B\Rightarrow \nu \in B\)). But \(f\) is \(S\)-invariant, whence by ergodicity we only need to check that \(f(\nu )>0\) on a set of positive \(P\)-measure.

Now Lemma 4.10 and \(S\)-invariance ensure that \(g(\nu )\!=\!-\log \nu ([-1/2,1/2])\) satisfies \(\int g\,dP>0\). By the ergodic theorem applied to the (possibly non-ergodic) discrete-time system \(S_{\log 2}\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\log \nu ([-2^{-N},2^{-N}])}{\log 2^{-N}} = \lim _{N\rightarrow \infty } \frac{1}{N\log 2} \sum _{n=0}^{N-1} g\left( S_{n\log 2}\nu \right) \end{aligned}$$

converges almost everywhere to a function of \(\nu \) with strictly positive integral; but the left-hand side equals \(f(\nu )\), so this completes the proof. \(\square \)
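The telescoping behind this identity can be unwound explicitly; assuming the convention that \(S_t\nu \) is the restriction of \(\nu \) to \([-e^{-t},e^{-t}]\), rescaled by \(e^{t}\) and normalized:

```latex
% One step of the scaling flow at dyadic scales:
(S_{n\log 2}\,\nu)\bigl([-\tfrac12,\tfrac12]\bigr)
  \;=\; \frac{\nu\bigl([-2^{-(n+1)},\,2^{-(n+1)}]\bigr)}
             {\nu\bigl([-2^{-n},\,2^{-n}]\bigr)},
% so, since \nu([-1,1]) = 1, the sum of g = -\log \nu([-1/2,1/2]) telescopes:
\sum_{n=0}^{N-1} g\bigl(S_{n\log 2}\,\nu\bigr)
  \;=\; -\log\nu\bigl([-2^{-N},2^{-N}]\bigr).
```

Dividing by \(N\log 2\) recovers, in the limit, the quantity whose liminf defines \(f(\nu )\).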

We will also need to know that \(P\)-typical measures are not “one-sided at small scales”.

Proposition 4.13

Let \(P\) be an EFD. For every \(\rho >0\), for \(P\)-a.e. \(\nu \) we have \(\inf \nu (I)>0\), where \(I\subseteq [-1,1]\) ranges over closed intervals of length \(\rho \) containing \(0\).

Proof

Let \(B=\{\nu :\nu [-\varepsilon ,0]=0\text { for some }\varepsilon >0\}\). It is enough to show that \(P(B)=0\). Indeed, if this is true then by symmetry also \(P(B')=0\) where \(B'=\{\nu :\nu ([0,\varepsilon ])=0\text { for some }\varepsilon >0\}\), and the claim follows since any interval of length \(\rho \) containing \(0\) contains either \([-\rho /2,0]\) or \([0,\rho /2]\).

Since \(B\) is \(S\)-invariant, by ergodicity we only need to show that \(P(B)<1\). Suppose otherwise. Since \(S_t\mu \in B\) implies that \(\mu \in B\), it follows from Lemma 4.8 that, for \(P\)-typical \(\nu \) and \(\nu \)-typical \(x\), there is \(\varepsilon (x)>0\) such that \(\nu ([x-\varepsilon (x),x])=0\). Take \(\varepsilon >0\) such that \(\nu (A)>0\), where \(A=\{x:\varepsilon (x)\ge \varepsilon \}\). The restriction \(\nu |_A\) has the property that the distance between any two distinct points in its support is at least \(\varepsilon \). However, this can only happen for discrete measures, and we have already established in Lemma 4.10 that \(P\)-typical measures have no atoms. Hence \(P(B)<1\) and therefore \(P(B)=0\), as claimed. \(\square \)

4.3 Phase and synchronization

Suppose that \(\mu \) generates \(P\) and \(t_0\)-generates an \(S_{t_0}\)-ergodic distribution \(P_x\) at \(\mu \)-typical points \(x\). Let \(\varphi \) be an eigenfunction of the flow \((P,S)\) for some eigenvalue \(k/t_0\). Since \(\varphi \) is \(S_{t_0}\)-invariant, it is almost surely constant on each ergodic component of \(P\) under \(S_{t_0}\), hence it is \(P_x\)-a.s. constant for \(\mu \)-a.e. \(x\). This allows us to define the phase of \(\mu \) at \(x\) to be the a.s. value of \(\varphi \) on \(P_x\). We denote the phase by \(\varphi _\mu (x)\), and claim that it is a measurable function of \(x\). Indeed, write \(\varphi \) as the increasing limit of simple functions \(\varphi _n\), and note that \(\varphi _\mu (x) = \int \varphi \, dP_x = \lim _{n\rightarrow \infty } \int \varphi _n\, dP_x\). The map \(x\mapsto P_x\) is measurable in \(x\), since \(P_x\) arises as an almost-sure limit of measurable functions of \(x\), and hence \(x\mapsto \int \varphi _n\,dP_x\) is measurable for each \(n\). By the limit above, \(\varphi _\mu \) is measurable as well.

The push-forward of \(\mu \) to the unit circle by \(x\mapsto \varphi _\mu (x)\) gives a measure \(\theta =\theta _\mu \) which describes the distribution of phases, and is called the phase measure.

Lemma 4.14

For \(P\)-typical \(\nu \), let \(P_\nu \) denote the \(S_{t_0}\)-ergodic component of \(P\) to which \(\nu \) belongs. Then for \(P\)-a.e. \(\nu \), the phase of \(\nu \) is well defined at \(0\) and is equal to \(\varphi (\nu )\).

Proof

Fix an \(S_{t_0}\)-ergodic component \(P'\) of \(P\). Let \(z\) denote the \(P'\)-a.s. value of \(\varphi \). Now, for \(P'\)-a.e. \(\nu \) we know that \(\varphi (\nu )=z\) and, by the ergodic theorem, that \(\nu \) equidistributes for \(P'\) under \(S_{t_0}\). This shows that the phase of \(\nu \) is well defined at \(0\) and equal to \(z\). Since \(P\) is the integral of its ergodic components, the claim follows.\(\square \)

Proposition 4.15

For \(P\)-a.e. \(\nu \), the function \(\varphi _\nu \) is \(\nu \)-a.e. constant and \(\theta _\nu =\delta _{\varphi (\nu )}\).

Proof

By the \(S\)-quasi-Palm property and the last lemma, it is clear that for \(P\)-a.e. \(\nu \) and \(\nu \)-a.e. \(x\), the eigenfunction \(\varphi \) is well-defined on \(S_t(\nu ^x)\) for all large enough \(t\), and that this value is the phase of the distribution that is \(t_0\)-generated by \(\nu ^x\). Since \(x\mapsto \varphi _\nu (x)\) is measurable, it is enough to show that for \(P\)-almost all \(\nu \) and all \(\varepsilon >0\),

$$\begin{aligned} \int \int |\varphi _\nu (y')-\varphi _\nu (y'')|\,d\nu (y')\,d\nu (y'') < \varepsilon \end{aligned}$$

Write \(A_\varepsilon \) for the set of \(\nu \) for which the above holds; we aim to show \(P(A_\varepsilon )=1\). Let

$$\begin{aligned} B_\varepsilon = \{ \nu : S_t\nu \in A_\varepsilon \text { for sufficiently large } t\}. \end{aligned}$$

By invariance, it is enough to show that \(P(B_\varepsilon )=1\).

By the Besicovitch differentiation theorem [40, Corollary 2.14(2)], for \(\nu \)-almost all \(x\),

$$\begin{aligned} \frac{\int _{[x-e^{-t},x+e^{-t}]} |\varphi _\nu (x)-\varphi _\nu (y)|\,d\nu (y)}{\nu ([x-e^{-t},x+e^{-t}])} \rightarrow 0\quad \text {as }t\rightarrow \infty , \end{aligned}$$

and therefore

$$\begin{aligned} \frac{\int _{[x-e^{-t},x+e^{-t}]^2} |\varphi _\nu (y')-\varphi _\nu (y'')|\,d\nu (y')\,d\nu (y'')}{\nu ([x-e^{-t},x+e^{-t}])^2} \rightarrow 0\quad \text {as }t\rightarrow \infty . \end{aligned}$$

From the eigenfunction property, \(\varphi _{\nu _{x,t}}(y)=e(-\alpha t)\varphi _\nu (x+e^{-t}y)\) for all \(t\), \(\nu \)-a.e. \(x\) and \(\nu _{x,t}\)-a.e. \(y\). It follows that for \(\nu \)-a.e. \(x\), the measure \(\nu _{x,t}\) is in \(A_\varepsilon \) for sufficiently large \(t\), i.e. \(\nu ^x\in B_\varepsilon \). But then we conclude from Lemma 4.8 that \(P(B_\varepsilon )=1\), as desired. \(\square \)

Finally we consider the effect of perturbation on the generated distributions and the phase of a measure.

Lemma 4.16

Let \(\nu \in \mathcal {P}(\mathbb {R})\).

  1. 1.

    Let \(f\in L^1(\nu )\), \(f\ge 0\) and \(\int f d\nu >0\), and write \(d\nu '=f\,d\nu \). Then for \(\nu '\)-a.e. \(x\), the sceneries of \(\nu \) and of \(\nu '\) at \(x\) are asymptotic. In particular, if \(\nu \) generates \(P\), then so does \(\nu '\).

  2. 2.

    Let \(I\) be an interval and \(f:I\rightarrow J\) an orientation-preserving diffeomorphism. Let \(\nu '=f(\nu )\). Then for \(\nu \)-a.e. \(x\), the sceneries \(\nu _{x,t}\) and \(\nu '_{f(x),t-\log f'(x)}\) are mean-asymptotic in \(\mathcal {P}([-1,1])\) in the sense that

    $$\begin{aligned} \lim _{T\rightarrow \infty }\frac{1}{T} \left( \int _0^T F(\nu _{x,t})\,dt - \int _0^T F(\nu '_{f(x),t-\log f'(x)})\,dt\right) = 0\quad \text {for all } F\in C([-1,1]) \end{aligned}$$

    and similarly when one averages at discrete time steps of some size \(t_0\). In particular, if \(\nu \) generates \(P\) at \(x\) then \(\nu '\) generates \(P\) at \(f(x)\).

Proof

The first part is an immediate consequence of the Besicovitch differentiation theorem (see Mattila [40, Corollary 2.14(2)], or [27] for more detail).

The second part can be proved by adapting the argument in Proposition 1.9 of [27] or the forthcoming paper of Aspenberg, Ekström, Persson and Schmeling [1]. Here we only give a sketch. Consider the maps \(g_t(y)= e^t\cdot (y-x)\) and \(h_t(y)= e^t\cdot (f(y)-f(x))\), so that \(\nu _{x,t}=a_t\cdot g_t(\nu )|_{[-1,1]}\) and \(\nu '_{f(x),t}=b_t\cdot h_t(\nu )|_{[-1,1]}\) for normalizing constants \(a_t,b_t\) (we suppress the dependence on \(x\) in the notation). Using the linear approximation of \(f\) at \(x\), we see that the uniform distance \(\varepsilon (t)\) between the maps \(g_t\) and \(h_{t-\log f'(x)}\) on \([x-2e^{-t},x+2e^{-t}]\) tends to \(0\) as \(t\rightarrow \infty \). Thus we will be done if we show that for \(\nu \)-a.e. \(x\) we have \(a_t/b_t\rightarrow 1\) in the mean (Cesàro) sense. Now, for a given \(\delta >0\), in order to have \(|a_t/b_t-1|>\delta \), we must have \(\left| \frac{\nu (B_{e^{-t-\varepsilon (t)}}(x))}{\nu (B_{e^{-t+\varepsilon (t)}}(x))}-1\right| >\frac{\delta }{100}\). If this were to happen for a non-negligible proportion of \(t\)s in arbitrarily long intervals \([0,T_i]\), we would conclude that there is a distribution \(P\) generated by \(\nu \) at \(x\) along the times \(T_i\) such that, with positive \(P\)-probability, a measure \(\theta \) satisfies \(\theta (\{-1,1\})>0\). This is impossible by Theorem 4.7, Lemma 4.10 and the ergodic decomposition.

In the discrete time case, suppose that, when averaged at steps of size \(t_0\), the two sceneries are not a.s. mean-asymptotic. Passing to a subsequence, we find that for a positive \(\mu \)-proportion of \(x\), there is a subsequence along which \(\mu \) generates some distribution \(P_x\) \(t_0\)-discretely at \(x\), and \(P_x\) gives positive mass to measures with atoms at \(\pm 1\). But then for \(\mu \)-a.e. such \(x\) one sees that \(P'_x=\int _{-t_0}^{0}S_tP_x\,dt\) is an FD supported on measures that have atoms at non-zero points, and we know this is impossible, because each ergodic component of \(P'_x\) is an EFD [26] and is either trivial, in which case its measures have an atom only at \(0\), or non-trivial, in which case Lemma 4.10 applies. (Since the set of measures with atoms is not closed, some more care must be taken in the last step: one uses the fact that for \(P_x\) there is already a positive probability of finding atoms of mass bounded away from zero at locations bounded away from \(0,\pm 1\), and this property passes to \(P'_x\). We omit the details.) \(\square \)

Corollary 4.17

If \(\mu \) generates \(P\) \(t_0\)-discretely, \(P\) is \(S_{t_0}\)-ergodic and \(t_0\in \Sigma (P,S)\) with eigenfunction \(\varphi \), then

  1. 1.

    If \(\nu \ll \mu \), then \(\theta _\nu \) is well defined and \(\theta _\nu \ll \theta _\mu \).

  2. 2.

    If \(f\in {{\mathrm{diff}}}^1(\mathbb {R})\) and \(\nu =f(\mu )\), then \(\theta _\nu \) is well defined and

    $$\begin{aligned} \theta _\nu = \int \delta _{e(-t_0\log f'(x))\varphi _\mu (x)}\,d\mu (x). \end{aligned}$$

Proof

For (1), by the previous lemma, if \(\nu \ll \mu \) then for \(\nu \)-a.e. \(y\), the distribution \(t_0\)-generated by \(\mu \) and by \(\nu \) at \(y\) is the same, and the claim follows. For (2), fixing a \(\mu \)-typical \(x\), by the second part of the previous lemma, the sceneries \(\mu _{x,t}\) and \(\nu _{f(x),t-\log f'(x)}\) generate the same distribution \(t_0\)-discretely. Hence \(\nu \) generates \(S_{-\log f'(x)}P\) \(t_0\)-discretely at \(f(x)\), and by the eigenfunction property,

$$\begin{aligned} \varphi _\nu (fx)=e(-t_0 \log f'(x)) \varphi _\mu (x), \end{aligned}$$

from which we deduce (2). \(\square \)

5 Proof of Theorem 1.2

5.1 A sketch of the proof

We start by explaining the main steps involved in the proof of Theorem 1.2. This strategy will also apply for the generalizations considered in Sect. 8, with suitable modifications.

We start with a measure \(\mu \) on \([0,1]\) generating an EFD \(P\) such that \(k/\log \beta \notin \Sigma (P,S)\) for all \(k\in \mathbb {Z}\setminus \{0\}\), where \(\beta >1\) is a Pisot number. We fix a \(\mu \)-typical \(x\) and suppose that \(x\) equidistributes under \(T_\beta \) for a measure \(\nu \) along some subsequence \(N_j\); our job is to show that \(\nu \) is in fact the Parry measure \(\lambda _\beta \). To accomplish this, there are three main steps:

  1. 1.

    The first step is to use Theorem 2.1 and the spectral hypothesis to establish that \(\nu \) can be represented as a superposition of measures drawn according to \(P\), each of them suitably translated, restricted and normalized. See Theorem 5.1 and the ensuing discussion for the general Pisot case.

  2. 2.

    We show that any \(T_\beta \)-invariant measure of positive dimension, other than the Parry measure, resonates with measures of arbitrarily large dimension (see Sect. 5.3 for the definition of resonance and dissonance). This is stated in Theorem 5.5 and proved in Sect. 6.

  3. 3.

    Using the first step, the \(S\)-invariance of \(P\) and Marstrand’s Theorem, we show that \(\nu \) dissonates with arbitrary measures of sufficiently large dimension (this step uses the nontriviality of \(P\)). Hence, in light of the second step, \(\nu \) must be the Parry measure. This step is carried out in Sect. 5.4.

We note that both the first and second steps use the algebraic assumption on \(\beta \) (in each case it can be slightly relaxed, but in different directions).

5.2 An integral representation

We begin with the details. From now on, we specialize to the interval \([0,1]\) and to maps of the form \(T_n:x\mapsto n x\mod 1\) for an integer \(n\ge 2\). We comment on the Pisot case afterwards. Let \(\mu \in \mathcal {P}([0,1])\) be a measure that generates a distribution \(P\) satisfying the spectral hypothesis in Theorem 1.1 (we do not assume that \(\mu \) is \(T_\beta \)-invariant; in fact, we will eventually apply the result of this section to measures \(\mu \) which are invariant under a different dynamics). We shall obtain a certain integral representation of the measures for which \(\mu \)-typical points equidistribute along sub-sequences.

Let \(\mathcal {A}\) denote the partition of \([0,1]\) into \(n\)-adic intervals, \([j/n,(j+1)/n)\). Note that \(\delta _y*\nu \) is the translate of the measure \(\nu \) by \(y\). Fixing \(x\), we claim that

$$\begin{aligned} \mu _{\mathcal {A}^k(x)} = c_k\cdot (\delta _{y_k} *\mu _{x,k\log n})|_{[0,1]} \end{aligned}$$
(8)

for some normalizing constant \(c_k\) and a number \(y_k\in [0,1]\). Indeed, \(\mu _{x,k\log n}\) is the restriction of \(\mu \) to the interval \(I\) of length \(2\cdot n^{-k}\) centered at \(x\), re-scaled to \([-1,1]\) and normalized; while \(\mu _{\mathcal {A}^k(x)}\) is obtained similarly from the restriction of \(\mu \) to an interval \(J=\mathcal {A}^k(x)\) of length \(n^{-k}\) around \(x\), re-scaled to the interval \([0,1]\) and normalized. Since \(J\subseteq I\), the representation (8) follows.
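The identity (8) can be verified concretely for a discrete approximation of \(\mu \). The following sketch (Python; the toy atomic measure and the rounding tolerance are our own choices, not from the text) builds the conditional measure on \(\mathcal {A}^k(x)\), the scenery, and the translated, restricted and renormalized scenery, and checks that they coincide.

```python
import math

def normalize(atoms):
    """Rescale an atomic measure (dict position -> mass) to total mass 1."""
    s = sum(atoms.values())
    return {p: w / s for p, w in atoms.items()}

n, k = 3, 2                        # base and generation
x = 0.47                           # the zoom-in point
mu = normalize({i / 200: (i % 7) + 1 for i in range(200)})   # toy atomic measure on [0,1]

left = math.floor(x * n**k) / n**k          # left endpoint of the cell A^k(x)

# conditional measure on A^k(x), affinely rescaled onto [0,1]  (LHS of (8))
lhs = normalize({round((p - left) * n**k, 9): w
                 for p, w in mu.items() if left <= p < left + n**-k})

# scenery mu_{x, k log n}: restriction to [x - n^-k, x + n^-k], rescaled to [-1,1]
scen = {round((p - x) * n**k, 9): w
        for p, w in mu.items() if abs(p - x) <= n**-k}

# RHS of (8): translate the scenery by y_k, restrict to [0,1), renormalize
y_k = round((x - left) * n**k, 9)
rhs = normalize({round(q + y_k, 9): w for q, q_w in ()} if False else
                {round(q + y_k, 9): w for q, w in scen.items() if 0 <= q + y_k < 1})

assert set(lhs) == set(rhs)
assert all(abs(lhs[p] - rhs[p]) < 1e-12 for p in lhs)
```

The translation parameter here is \(y_k=n^k\,(x-\min \mathcal {A}^k(x))\in [0,1)\), matching the role it plays in (8).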

For a \(\mu \)-typical \(x\), suppose that \(x\) equidistributes for some measure \(\nu \) under \(T_n\), along a sequence \(N_j\). Since \(P\) is nontrivial, Lemma 4.11 applied with \(t_0=\log n\), together with the representation (8), implies that condition (4) in Theorem 2.1 holds. Thus for some sequence \(N_j\rightarrow \infty \),

$$\begin{aligned} \nu = \lim _{j\rightarrow \infty }\frac{1}{N_j}\sum _{k=1}^{N_j}c_k\cdot (\delta _{y_k} *\mu _{x,k\log n})|_{[0,1]}\qquad \text{ weak-* } \text{ in } \mathcal {P}([0,1]) \end{aligned}$$
(9)

Passing to a further subsequence we may assume that the joint distribution of \(c_k,y_k\) and the measures converges, i.e. that \(\frac{1}{N_j}\sum _{k=0}^{N_j-1}\delta _{(c_k,y_k,\mu _{x,k\log n})}\) converges to a probability measure \(Q\) on \(\Omega =\mathbb {R}\times [-1,1]\times \mathcal {P}([-1,1])\) (to see that the distribution of the \(c_k\)’s is tight, we use Proposition 4.13). Moreover, thanks to Lemma 4.3, the measure marginal of \(Q\) is \(P\): this is the point of the proof where the spectral assumption is used.

Taking stock, we have proved the following representation of \(\nu \).

Theorem 5.1

Let \(\mu \) be a measure on \([0,1]\) which generates a distribution \(P\) at \(\mu \)-a.e. point, and \(\Sigma (P,S)\cap \frac{1}{\log n}\mathbb {Z}=\{0\}\). Then for \(\mu \)-a.e. \(x\), if \(x\) equidistributes under \(T_n\) for \(\nu \) along some subsequence, then there is an auxiliary probability space \((\Omega ,\mathcal {F},Q)\) and measurable functions \(c:\Omega \rightarrow (0,\infty )\), \(y:\Omega \rightarrow [-1,1]\) and \(\eta :\Omega \rightarrow \mathcal {P}([-1,1])\), such that \(\eta \) is distributed according to \(P\), and

$$\begin{aligned} \nu = \int c_\omega \cdot (\delta _{y_\omega }*\eta _\omega )|_{[0,1]}\,dQ(\omega ). \end{aligned}$$

Invoking Proposition 4.12, we immediately get:

Corollary 5.2

A measure \(\nu \) as in the theorem is of dimension at least \(\delta \) (the a.s. dimension of measures drawn according to \(P\)); in particular, \(\dim \nu >0\).

The changes needed to prove this for \(T_\beta \) and non-integral Pisot \(\beta \) are minimal. In this case one uses the partition \(\mathcal {A}\) of \([0,1]\) into intervals \([j/\beta ,(j+1)/\beta )\cap [0,1]\). The main difference is that now the identity (8) is not always true because the length of \(\mathcal {A}^k(x)\) is no longer constant, and so the left hand side of (8) is generally the restriction of the right hand side to a shorter interval (followed by normalization). If in (8) we replace the restriction on the right hand side with restriction to the appropriate interval \(I_k(x)\subseteq [0,1]\), then we obtain a representation of the same kind as in Theorem 5.1 but of the form

$$\begin{aligned} \nu = \int c_\omega \cdot (\delta _{y_\omega }*\eta _\omega )|_{I_\omega }\,dQ(\omega ) \end{aligned}$$
(10)

where \(I_\omega \subseteq [0,1]\) is a random interval. The missing ingredient in this argument is that a-priori the intervals \(I_k\) may be vanishingly short for a positive frequency of \(k\); we must therefore ensure that the distribution of lengths does not concentrate at \(0\), i.e. that \(I_\omega \) is a.s. of positive length. This is where the Pisot property of \(\beta \) comes into play, via

Lemma 5.3

There is a constant \(c>1\) such that for any \(k\) and interval \(I\in \mathcal {A}^k\), the length of \(I\) satisfies \(c^{-1}\beta ^{-k}<|I|<c\beta ^{-k}\).

This is a consequence of a classical lemma of Garsia [24], stated more completely below, see Lemma 6.2. We note that the weaker version stated here continues to hold for the larger class of \(\beta \) for which the \(\beta \)-shift \(T_\beta \) satisfies the specification property, but for these numbers the results of the next section do not appear to hold.
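For the golden ratio \(\beta =(1+\sqrt{5})/2\), the conclusion of Lemma 5.3 can be checked directly: the endpoints of the cells of \(\mathcal {A}^k\) are \(0\), \(1\) and the preimages \(T_\beta ^{-j}(1/\beta )\) for \(j<k\). A minimal numerical sketch (the value \(2\) below is our choice of the constant \(c\), not claimed optimal):

```python
import math

beta = (1 + math.sqrt(5)) / 2      # golden ratio, a Pisot number

def partition_lengths(k):
    """Lengths of the cells of A^k for T_beta, via preimages of the cut point 1/beta."""
    pts = {0.0, 1.0}
    level = {1 / beta}             # discontinuity of T_beta on [0,1]
    for _ in range(k):
        pts |= level
        # preimages of y under T_beta: (y + m)/beta for m = 0, ..., ceil(beta) - 1
        level = {(y + m) / beta for y in level
                 for m in range(math.ceil(beta)) if (y + m) / beta <= 1}
    xs = sorted(pts)
    return [b - a for a, b in zip(xs, xs[1:]) if b - a > 1e-12]

for k in range(1, 12):
    assert all(beta**-k / 2 < L < 2 * beta**-k for L in partition_lengths(k))
```

For this \(\beta \) every cell of \(\mathcal {A}^k\) has length \(\beta ^{-k}\) or \(\beta ^{-k-1}\), so \(c=2\) suffices; for general Pisot \(\beta \) the constant comes from Garsia's lemma (Lemma 6.2 below).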

5.3 Resonance and dissonance

As indicated in Sect. 5.1, the second idea we need for the proof of Theorem 1.1 is that, among invariant measures for \(T_\beta \) of positive dimension, the Parry measure can be identified by the behavior of its dimension under convolutions. Following terminology of Peres and Shmerkin [45], we say that measures \(\mu ,\nu \in \mathcal {P}(\mathbb {R})\) resonate if

$$\begin{aligned} \dim \mu *\nu <\min \{1,\dim \mu +\dim \nu \} \end{aligned}$$
(11)

otherwise they dissonate.

As a general rule, measures should dissonate; resonance requires, heuristically, that they have some common structure. This heuristic can be made precise in many ways. For example, as an immediate consequence of Corollary 3.2, we have

Theorem 5.4

If \(\mu ,\nu \) are Borel probability measures on \(\mathbb {R}\), then for Lebesgue-a.e. \(t\in \mathbb {R}\), the measures \(\mu \) and \(S_{t}\nu \) dissonate.

Moreover, suppose that \(\dim \mu |_I=\dim \mu \) for any interval \(I\) of positive \(\mu \)-measure. Then for a.e. \(t\), if \(I\) is any set of positive \(S_{t}\mu \)-measure, then \((S_{t}\mu )|_{I}\) and \(\nu \) dissonate.

Proof

This is a consequence of Theorem 3.1, and elementary properties of \(\dim \).\(\square \)

Unlike the “generic” case, where dissonance is the rule, for integer \(n\), \(T_n\)-invariant measures of dimension strictly between \(0\) and \(1\) do resonate, often with themselves and always with other \(T_n\)-invariant measures. For example, consider a \(T_n\)-invariant measure \(\mu \) with \(1/2<\dim \mu <1\). That \(\mu \) resonates with itself can be seen as follows. First, the convolution \(\mu *\mu \) on \(\mathbb {R}\) has the same dimension as the self-convolution \(\nu \) of \(\mu \) taken in \(\mathbb {R}/\mathbb {Z}\) (this is because the map \(\mathbb {R}\rightarrow \mathbb {R}/\mathbb {Z}\) is a countable-to-one local isometry). Consider the Fourier transform: \(\hat{\nu }(k)=\hat{\mu }(k)^2\). Since \(\mu \) is not Lebesgue measure it has a non-zero coefficient \(\hat{\mu }(k)\) for some \(k\ne 0\), hence so does \(\nu \), and therefore \(\nu \) is not Lebesgue measure. But it is a well known fact that the only \(T_n\)-invariant measure of dimension \(1\) is Lebesgue measure, and \(\nu \) is \(T_n\)-invariant; hence \(\dim \nu =\dim \mu *\mu <1=\min \{1,\dim \mu +\dim \mu \}\).
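The identity \(\hat{\nu }(k)=\hat{\mu }(k)^2\), and the survival of a non-zero coefficient under squaring, can be illustrated numerically with a level-\(m\) discrete approximation of the Cantor-Lebesgue measure (an illustrative sketch; the truncation level and tolerances are our own choices):

```python
import math, cmath, itertools

def e(s):                          # e(s) = exp(2*pi*i*s), as in the text
    return cmath.exp(2j * math.pi * s)

# level-m discrete approximation of the Cantor-Lebesgue measure: uniform on
# the points sum_j d_j 3^{-j} with digits d_j in {0, 2}
m = 8
atoms = [sum(d / 3**(j + 1) for j, d in enumerate(ds))
         for ds in itertools.product((0, 2), repeat=m)]
w = 1 / len(atoms)

def mu_hat(k):                     # k-th Fourier coefficient of the atomic measure
    return sum(w * e(-k * p) for p in atoms)

def conv_hat(k):                   # same coefficient for the self-convolution mod 1
    return sum(w * w * e(-k * ((p + q) % 1)) for p in atoms for q in atoms)

assert abs(conv_hat(1) - mu_hat(1) ** 2) < 1e-9   # nu_hat(k) = mu_hat(k)^2
assert abs(mu_hat(1)) > 0.1        # a non-zero coefficient survives squaring
```

Since characters are multiplicative, the identity is exact for atomic measures; the point of the check is that \(|\hat{\mu }(1)|\) stays bounded away from \(0\), so the self-convolution cannot be Lebesgue measure.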

We will require the following strengthening of the fact above.

Theorem 5.5

Let \(\beta >1\) be a Pisot number. Then there is a sequence of probability measures \(\tau _{1},\tau _{2},\ldots \) on \(\mathbb {R}\) with

$$\begin{aligned} \dim \tau _n\rightarrow 1 \quad \text {as } n\rightarrow \infty , \end{aligned}$$

such that any \(T_{\beta }\)-invariant measure \(\nu \) with \(0<\dim \nu <1\) resonates with \(\tau _{n}\) for all large enough \(n\).

In order not to interrupt the main line of argument, we postpone the proof to Sect. 6.

5.4 Proof of Theorem 1.1

Let \(\beta >1\) be a Pisot number. Let \(\mu \in \mathcal {P}([0,1])\) generate an \(S\)-ergodic and non-trivial distribution \(P\), and suppose that \(k/\log \beta \) is not in \(\Sigma (P,S)\) for any \(k\in \mathbb {Z}\setminus \{0\}\).

Let \(\nu _\beta \) be the unique absolutely continuous invariant measure for \(T_\beta \) (the Parry measure). The following fact is standard, but we include a proof as we have not been able to find a reference.

Lemma 5.6

The measure \(\nu _\beta \) is also the unique \(T_\beta \)-invariant measure of dimension \(1\).

Proof

It is well known that \(\nu _\beta \) is the only measure of maximal entropy \(\log \beta \) ([30], see also [52, Remark 2.4]). Let \(\theta \ne \nu _\beta \) be another invariant measure. By the Shannon–McMillan–Breiman theorem applied to the (generating) partition \(\{[k/\beta ,(k+1)/\beta )\cap [0,1]\}\), and Lemma 5.3,

$$\begin{aligned} \underline{\dim }(\theta ,x)\le \lim _{n\rightarrow \infty } \frac{-\log \theta ([x-\beta ^{-n},x+\beta ^{-n}])}{n\log \beta } = \frac{h(\theta ,x)}{\log \beta }, \end{aligned}$$

for \(\theta \)-almost all \(x\), where \(h(\theta ,x)\) is the entropy of the ergodic component of \(x\). Since \(h(\theta )<h(\nu _\beta )=\log \beta \), there is a set of positive measure where the right-hand side above is \(<1\). In light of the characterization of \(\dim \) using local dimensions given in Eq. (6), \(\dim \theta <1\), as desired. \(\square \)

Fix a \(\mu \)-typical \(x\). It suffices to show that if \(x\) equidistributes under \(T_\beta \) along a sub-sequence for a measure \(\nu \), then \(\nu \) is the unique absolutely continuous \(T_\beta \)-invariant measure \(\nu _\beta \).

From Theorem 5.1 (and the discussion following it for the general Pisot case), we have the representation

$$\begin{aligned} \nu = \int c_\omega \cdot (\delta _{y_\omega }*\eta _\omega )|_{I_\omega }\,dQ(\omega ) \end{aligned}$$

where \(c_\omega ,y_\omega ,\eta _\omega ,I_\omega \) are defined for \(\omega \) in some auxiliary probability space \((\Omega ,\mathcal {F},Q)\), and the distribution of \(\eta _\omega \) is \(P\). Recalling Proposition 4.12, let \(\delta >0\) denote the a.s. dimension of measures drawn according to \(P\), so also \(\dim \eta _\omega =\delta \) a.s. In particular, \(\dim \nu >0\) and \(\nu \) is non-atomic.

Lemma 5.7

\(\nu \) is \(T_\beta \)-invariant.

Proof

\(T_\beta \) has finitely many discontinuities, and \(\nu \) is non-atomic, so the set of discontinuities has \(\nu \)-measure zero. Since \(\nu \) arises as a subsequential limit of the empirical measures \(\frac{1}{N_j}\sum _{k=0}^{N_j-1}\delta _{T_\beta ^k x}\), and \(T_\beta \) is continuous \(\nu \)-a.e., the standard argument shows that \(\nu \) is \(T_\beta \)-invariant. \(\square \)

Lemma 5.8

Let \(\tau \) be a probability measure on \(\mathbb {R}\) with \(\dim \tau \ge 1-\delta \). Then \(\dim \tau *\eta _\omega =1\) for \(Q\)-a.e. \(\omega \).

Proof

Using \(S\)-invariance of \(P\), Fubini and Theorem 5.4,

$$\begin{aligned} \int \dim (\tau *\eta _\omega )\,dQ(\omega )&= \int \dim (\tau *\eta )\,dP(\eta ) \\&= \int _0^1 \int \dim (\tau *\eta )\,dS_tP(\eta )\,dt\\&= \int \int _0^1\dim (\tau *S_t\eta )\,dt\,dP(\eta )\\&= \int \min \{1,\dim \tau +\dim \eta \}\,dP(\eta )\\&= 1. \end{aligned}$$

Since the integrand on the left hand side is \(\le 1\), it is a.s. equal to \(1\), as claimed. \(\square \)

Now let \(\{\tau _n\}\) be the sequence of resonant measures provided by Theorem 5.5. Then \(\dim \tau _n\rightarrow 1\), so for \(n\) large enough we have \(\dim \tau _n>1-\delta \), hence by linearity of convolution, basic properties of dimension, and the previous lemma,

$$\begin{aligned} \dim \tau _n * \nu&= \dim \left( \tau _n * \int c_\omega \cdot (\delta _{y_\omega }*\eta _\omega )|_{I_\omega }\,dQ(\omega )\right) \\&= \dim \left( \int c_\omega \cdot \tau _n*\left( (\delta _{y_\omega }*\eta _\omega )|_{I_\omega }\right) \,dQ(\omega )\right) \\&\ge {{\mathrm{essinf}}}_{\omega \sim Q} \dim \left( \tau _n*\left( (\delta _{y_\omega }*\eta _\omega )|_{I_\omega }\right) \right) \\&\ge {{\mathrm{essinf}}}_{\omega \sim Q} \dim (\tau _n*\eta _\omega )\\&= {{\mathrm{essinf}}}_{\eta \sim P} \dim (\tau _n*\eta )\\&= 1. \end{aligned}$$

But by choice of \(\tau _n\), this is possible only if \(\dim \nu =0\) or \(1\). Since \(\dim \nu >0\), we must have \(\dim \nu =1\). Lemma 5.6 then allows us to conclude that \(\nu \) is the Parry measure for \(T_\beta \), as desired.

This completes the proof of Theorem 1.1.

There is a version of Theorem 1.1 for measures which do not generate a distribution. For a measure \(\mu \) and a typical point \(x\), let \(\mathcal {D}(\mu ,x)\subseteq \mathcal {D}\) denote the set of accumulation points of \(\frac{1}{T}\int _0^T\delta _{\mu _{x,t}}\,dt\) as \(T\rightarrow \infty \). In [27, Theorem 1.7] it was shown that for \(\mu \)-a.e. \(x\), this set consists of EFDs. An easy adaptation of the proof of the theorem above shows that if \(\mu \) is a measure such that a.s. \(\mathcal {D}(\mu ,x)\) contains only non-trivial ergodic distributions which do not have \(k/\log n\), \(k\in \mathbb {Z}\setminus \{0\}\), in their spectrum, then \(\mu \) is pointwise \(n\)-normal. We shall not give the proof of this in detail.

6 Construction of resonant measures

The proof of Theorem 5.5 is slightly more transparent in the case that \(\beta \) is an integer. After some preliminaries we will prove this case, since it is shorter and may shed light on the general case.

6.1 Preliminaries on entropy

We use standard notation and properties for the entropy \(H(\mu ,\mathcal {P})\) of a measure \(\mu \) with respect to a partition \(\mathcal {P}\). See [54] or any textbook in ergodic theory for details.

Let \(\mathcal {A}^k\) be the partition of \(\mathbb {R}\) into \(k\)-generation \(n\)-adic intervals, that is, intervals \([r/n^k,(r+1)/n^k)\) for \(r\in \mathbb {Z}\). For a \(T_n\)-invariant measure \(\mu \), the Kolmogorov-Sinai entropy is given by

$$\begin{aligned} h(\mu ) = \lim _{k\rightarrow \infty }\frac{1}{k}H\left( \mu ,\mathcal {A}^k\right) \end{aligned}$$

and the limit is also the infimum. In general, \(h(\mu )\le \log n\), with equality if and only if \(\mu \) is Lebesgue measure \(\lambda \). We also have

$$\begin{aligned} \frac{1}{\log n}h(\mu ) \ge \dim \mu \end{aligned}$$

with equality if \(\mu \) is ergodic; in general \(\dim \mu \) is the essential infimum over the dimensions (=normalized entropies) of the ergodic components of \(\mu \). This follows e.g. from the proof of Lemma 5.6.

The quantity \(H(\mu ,\mathcal {A}^k)\) is not continuous in \(\mu \); however, we have the following approximate continuity under translation: if \(\eta \) is a measure supported on an interval of length \(<1/n^k\), then

$$\begin{aligned} \left| H\left( \eta *\mu ,\mathcal {A}^k\right) -H\left( \mu ,\mathcal {A}^k\right) \right| <c \end{aligned}$$

where \(c\) is a universal constant.
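A quick numerical sanity check of this approximate continuity, with a toy Bernoulli-type measure \(\mu \) and a two-atom \(\eta \) supported on an interval shorter than \(n^{-k}\) (the threshold \(\log 2\) below is ample for this example; it is not claimed to be the universal constant \(c\)):

```python
import math
from collections import defaultdict

def H(atoms, n, k):
    """Entropy of an atomic measure on [0,1) w.r.t. the n-adic partition A^k."""
    cells = defaultdict(float)
    for p, w in atoms.items():
        cells[math.floor(p * n**k)] += w
    return -sum(w * math.log(w) for w in cells.values() if w > 0)

n, k = 2, 6
# mu: a biased Bernoulli-type atomic measure on dyadic rationals of depth 10
mu = {}
for i in range(2**10):
    w = 1.0
    for j in range(10):
        w *= 0.8 if (i >> j) & 1 == 0 else 0.2
    mu[i / 2**10] = w

# eta: supported on an interval of length < n^-k
eta = {0.0: 0.5, 0.4 * n**-k: 0.5}

conv = defaultdict(float)          # the convolution eta * mu, taken mod 1
for p, wp in eta.items():
    for q, wq in mu.items():
        conv[(p + q) % 1] += wp * wq

assert abs(H(conv, n, k) - H(mu, n, k)) < math.log(2)
```

The heuristic behind the universal constant: translating by less than one cell width lets each cell's mass spill into at most one neighboring cell, which changes the entropy by a bounded amount.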

6.2 The integer case

Fix an integer \(n\ge 2\). Our goal is to construct a sequence of probability measures \(\tau _{1},\tau _{2},\ldots \) on \(\mathbb {R}\) such that \(\dim \tau _i\rightarrow 1\) and any \(T_{n}\)-invariant measure \(\nu \) with \(0<\dim \nu <1\) resonates with \(\tau _{i}\) for all large enough \(i\).

We will use the standard identification of the map \(T_n\) on \([0,1]\) with the shift map on the sequence space \(\{0,\ldots ,n-1\}^\mathbb {N}\), given by the base-\(n\) expansion. This is defined uniquely off a countable set of points and hence for non-atomic measures is an a.e. isomorphism, so we will not distinguish between the models.

Let \(N\) be an integer and define a measure \(\nu _N\) on infinite sequences of digits \(\{0,\ldots ,n-1\}\) as follows. Set the first \(N\) digits to be \(0\). Let the next \(N^2\) digits be chosen independently and equiprobably from \(\{0,\ldots ,n-1\}\). Repeat this procedure, independently of previous choices, for each subsequent block of \(N+N^2\) symbols. Write \(\nu _N\) also for the corresponding measure on \([0,1]\). Now, this measure is not \(T_n\)-invariant but it is \(T_n ^{N+N^2}\)-invariant, so the measure

$$\begin{aligned} \tau _N=\frac{1}{N+N^2}\sum _{i=0}^{N+N^2-1}T_{n}^{i}\nu _N \end{aligned}$$

is \(T_n\)-invariant.

It is elementary to use Eq. (6) to show that \(\dim \nu _N=N^2/(N+N^2)\); the same is then true for each \(T_n^i \nu _N\), and hence for \(\tau _N\). Thus \(\dim \tau _N\rightarrow 1\) as \(N\rightarrow \infty \).
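The dimension count for \(\nu _N\) amounts to counting the “free” coordinates in a \(k\)-cylinder: each free coordinate contributes \(\log n\) to \(-\log \nu _N\) of the cylinder, while the deterministic zero-blocks contribute nothing. A minimal sketch:

```python
import math

def log_cyl_measure(N, n, k):
    """-log nu_N of a k-cylinder: each free coordinate contributes log n,
    coordinates inside the deterministic 0-blocks contribute nothing."""
    free = sum(1 for i in range(k) if i % (N + N**2) >= N)
    return free * math.log(n)

n, N = 2, 4
period = N + N**2
for blocks in (10, 100, 1000):
    k = blocks * period
    local_dim = log_cyl_measure(N, n, k) / (k * math.log(n))
    assert abs(local_dim - N**2 / (N + N**2)) < 1e-12   # = 0.8 for N = 4
```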

Now let \(\mu \) be a \(T_n\)-invariant measure and suppose that it is not Lebesgue measure. We aim to show that \(\dim \tau _N*\mu < 1\) for large enough \(N\). Using the \(T_n\)-invariance of \(\mu \) and the fact that \(T_n^i\) is piecewise affine with constant expansion, we have

$$\begin{aligned} \dim \tau _N*\mu&= \dim \left( \frac{1}{N+N^2}\sum _{i=0}^{N+N^2-1}(T_n^i\nu _N)*\mu \right) \\&= \inf _{0\le i<N+N^2} \dim (T_n^i\nu _N)*\mu \\&= \inf _{0\le i<N+N^2} \dim (T_n^i\nu _N)*(T_n^i\mu )\\&= \inf _{0\le i<N+N^2} \dim (\nu _N*\mu )\\&= \dim \nu _N*\mu . \end{aligned}$$

Thus it is enough to show that \(\dim (\nu _N*\mu )<1\) for large enough \(N\), and since \(\nu _N*\mu \) is \(T_n^{N+N^2}\)-invariant, we only need to show that \(\nu _N*\mu \) is not Lebesgue. \(\nu _N\) is concentrated on the interval \([0,n^{-N})\), so we know that

$$\begin{aligned} H(\nu _N*\mu ,\mathcal {A}^N) < H(\mu ,\mathcal {A}^N) + c \end{aligned}$$

where \(c\) is a universal constant. Since \(\mu \) is not Lebesgue, it has less than full entropy, and hence \(H(\mu ,\mathcal {A}^N)<(1-\varepsilon )N\log n\) for some \(\varepsilon >0\) independent of \(N\). Thus

$$\begin{aligned} H(\nu _N*\mu ,\mathcal {A}^N) < (1-\varepsilon )N\log n+c< N\log n \end{aligned}$$

for large enough \(N\). Dividing by \(N\) and taking the infimum over \(N\) we find that \(h(\tau _N*\nu )<\log n=h(\lambda )\), where \(\lambda \) is Lebesgue measure, so \(\tau _N*\nu \ne \lambda \), as desired.

6.3 Dynamics of beta transformations

We review some basic facts about the beta transformations \(T_\beta :x\mapsto \beta x\mod 1\). We refer the reader to the surveys [7, 52] for further information and references.

Recall that \([m]=\{0,\ldots ,m-1\}\). For each \(\beta >1\), there is a closed subset \(X_\beta \subseteq [\lceil \beta \rceil ]^\mathbb {N}\), invariant under the shift map \(T\) (known as the \(\beta \)-shift), such that the beta expansion map \(\pi :X_\beta \rightarrow [0,1]\), \(\pi (x)=\sum _{n=1}^\infty x_n \beta ^{-n}\), semi-conjugates the action of \(T\) on \(X_\beta \) with the action of \(T_\beta \) on \([0,1]\). Further, \(\pi \) is injective on \(X_\beta \), except at countably many points where it is two-to-one. In particular, any non-atomic \(T_\beta \)-invariant measure lifts uniquely to a shift-invariant measure on the \(\beta \)-shift.

We will require the following lemma on the structure of the \(\beta \)-shift for Pisot \(\beta \).

Lemma 6.1

Let \(\beta \) be a Pisot number. There exists \(N_0=N_0(\beta )\in \mathbb {N}\) with the following property: let \(\{ x_i\}\) be finite words in \(X_\beta \) (i.e. \(X_\beta \) contains infinite words starting with each of the \(x_i\)). Then the infinite concatenation \((0^N x_1 0^N x_2 \ldots )\) is in \(X_\beta \) for \(N> N_0\).

Proof

The following characterization of \(X_\beta \) is essentially due to Parry [44], see also [7, Proposition 2.3]. Let \(a\) be the quasi-greedy \(\beta \)-expansion of \(1\), i.e. the lexicographically largest sequence \(a\in [\lceil \beta \rceil ]^\mathbb {N}\) not ending in \(0^\infty \) such that \(1=\sum _{i=1}^\infty a_i \beta ^{-i}\). Then \(x\in X_\beta \) if and only if \(T^k x\prec a\) for all \(k\), where \(\prec \) denotes lexicographically smaller or equal.

On the other hand, if \(\beta \) is Pisot, then the sequence \(a\) is eventually periodic, see [7, Sect. 4.1]. It does not end in infinitely many zeros: an expansion of the form \(a_1\ldots a_k 0^\infty \) with \(a_k>0\) represents the same number in base \(\beta \) as the lexicographically smaller periodic sequence \((a_1\ldots a_{k-1} (a_k-1))^\infty \), which does not terminate in zeros. It follows that the number of consecutive zeros in \(a\) is bounded by some integer \(N_0\). But then it is clear that for any finite words \(\{y_i\}\) in \(X_\beta \), any \(N> N_0\) and any \(\ell \ge 0\), we have \((0^\ell y_1 0^N y_2 0^N\ldots )\prec a\). This gives the claim. \(\square \)
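For the golden ratio, where the quasi-greedy expansion of \(1\) is \((10)^\infty \), both Parry's criterion and the concatenation property of Lemma 6.1 can be tested directly; here \(N_0=1\), since \(a\) has no two consecutive zeros (the helper names below are ours):

```python
def leq_a(word):
    """Compare a finite 0-1 word against a = (10)^infty lexicographically."""
    a = [1, 0] * ((len(word) + 1) // 2)
    return list(word) <= a[:len(word)]

def admissible(word):              # Parry's criterion: every shift is <= a
    return all(leq_a(word[k:]) for k in range(len(word)))

x1, x2 = (1, 0, 0, 1), (1, 0, 1, 0)           # two admissible words
assert admissible(x1) and admissible(x2)

N = 2                                          # any N > N_0 = 1 works here
cat = (0,) * N + x1 + (0,) * N + x2
assert admissible(cat)                         # Lemma 6.1 for this example

assert not admissible((1, 1))                  # "11" is forbidden in the golden shift
```

For this \(\beta \) the criterion reduces to the familiar description of the golden shift: a sequence is admissible if and only if it contains no two consecutive \(1\)s.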

6.4 Resonance in the Pisot case

The proof of the Pisot case is not unlike the integer one. The main difference is that convolutions of \(T_\beta \)-invariant measures are no longer invariant or related in any obvious way to an invariant measure. This makes estimating their dimension more involved.

For the rest of this section we fix a Pisot number \(\beta >1\), and write \(B=\lceil \beta \rceil \). Given an integer \(D\ge B\) (often implicit), and a \([D]\)-valued finite or infinite sequence \(x\) of length \(|x|\), we let \(\pi \) be the \(\beta \)-expansion map, i.e. \(\pi (x) =\sum _{k=1}^{|x|} x_k\,\beta ^{-k}\). If \(|x|=\infty \), we also write \(\pi _k(x)=\pi (x|_{\{1,\ldots ,k\}})\). A key role will be played by the following partition of \([D]^k\):

$$\begin{aligned} \mathcal {P}_k = \{ \pi ^{-1}(\pi x): x\in [D]^k\}. \end{aligned}$$

The property of Pisot numbers that will be used in the proof is given in the following classical Lemma of Garsia [24, Lemma 1.51]:

Lemma 6.2

There exists \(c>0\) (depending on \(\beta \) and \(D\)) such that for any \(x,y\in [D]^k\), either \(\pi (x)=\pi (y)\), or \(|\pi (x)-\pi (y)|\ge c \beta ^{-k}\).
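The separation can be verified numerically for a specific Pisot \(\beta \). For the golden ratio with \(D=2\), the rescaled differences \(\beta ^k(\pi (x)-\pi (y))\) lie in \(\mathbb {Z}[\beta ]\), and a short Galois-conjugate computation (ours, not from the text) bounds any nonzero such element below by \(\beta ^{-2}\approx 0.38\). A Python sketch:

```python
from itertools import product

beta = (1 + 5 ** 0.5) / 2  # golden ratio: beta^2 = beta + 1, Pisot

k = 8
# Rescaled differences beta^k (pi(x) - pi(y)), over all x, y in [2]^k,
# are the sums sum_i d_i beta^(k-i) with digit differences d_i in
# {-1, 0, 1}; Garsia's lemma says the nonzero ones are bounded away
# from 0 uniformly in k.
gaps = []
for d in product((-1, 0, 1), repeat=k):
    val = abs(sum(di * beta ** (k - i) for i, di in enumerate(d, start=1)))
    if val > 1e-9:  # skip exact collisions (up to float error)
        gaps.append(val)

print(min(gaps))  # about 0.618..., i.e. beta^{-1}; a uniform gap
```

For an integer base \(\beta =n\) the gap is trivially \(1\); the content of the lemma is that Pisot bases behave almost as well.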

We quote a basic fact for later reference:

Lemma 6.3

Let \(\widetilde{\mu }\) be any measure on \([D]^\mathbb {N}\), and set

$$\begin{aligned} a = a_{\beta ,D}= \frac{(D-1)\beta ^{-1}}{1-\beta ^{-1}}. \end{aligned}$$

Then for any Borel set \(A\subseteq \mathbb {R}\) and any \(k\in \mathbb {N}\),

  1. \(\pi \widetilde{\mu }(A) \le \pi _k\widetilde{\mu }(A^{(a\beta ^{-k})})\),

  2. \(\pi \widetilde{\mu }(A^{(a\beta ^{-k})}) \ge \pi _k\widetilde{\mu }(A)\),

where \(A^{(\delta )}\) denotes the \(\delta \)-neighborhood of \(A\).

Proof

Immediate from the fact that if \(x\in [D]^\mathbb {N}\), then

$$\begin{aligned} |\pi (x)-\pi _k(x)| \le \sum _{i=k+1}^\infty (D-1)\beta ^{-i} = a\,\beta ^{-k}. \end{aligned}$$

\(\square \)

The following lemma is similar to [36, Lemma 3].

Lemma 6.4

Let \(\widetilde{\mu }\) be a \(T\)-invariant measure on \([D]^\mathbb {N}\) (as before \(T\) is the shift map). Then

$$\begin{aligned} \dim \pi \widetilde{\mu } \le \lim _{k\rightarrow \infty } \frac{ H(\widetilde{\mu },\mathcal {P}_k)}{k\log \beta }= \inf _{k\ge 1}\frac{H(\widetilde{\mu },\mathcal {P}_k)}{k\log \beta }. \end{aligned}$$

Proof

Write \(\mu =\pi \widetilde{\mu }\). For the first inequality, note first that

$$\begin{aligned} \mu (B(\pi x,(1+a)\beta ^{-k})) \ge \widetilde{\mu }(\mathcal {P}_k(x)), \end{aligned}$$

for any \(x\in [D]^\mathbb {N}\), where \(a\) is the constant from Lemma 6.3. The inequality follows by combining this and Fatou’s lemma applied to the sequence

$$\begin{aligned} g_k(x) = \frac{\log \widetilde{\mu }(\mathcal {P}_k(x))}{-k\log \beta }. \end{aligned}$$

For the second equality, it is enough to show that the sequence \(H(\mathcal {P}_k,\widetilde{\mu })\) is sub-additive. The partition \(\mathcal {P}_k \vee T^{-k}\mathcal {P}_m\) is a refinement of \(\mathcal {P}_{m+k}\), since \(\pi _k(x)\) and \(\pi _m(T^k x)\) determine \(\pi _{m+k}(x)\). Thus

$$\begin{aligned} H(\mathcal {P}_{m+k},\widetilde{\mu })&\le H(\mathcal {P}_k \vee T^{-k}\mathcal {P}_m,\widetilde{\mu })\\&\le H(\mathcal {P}_k,\widetilde{\mu }) + H(\mathcal {P}_m,\widetilde{\mu }), \end{aligned}$$

using the invariance of \(\widetilde{\mu }\). \(\square \)
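The step from sub-additivity to \(\lim =\inf \) is an instance of Fekete's lemma. A quick Python illustration (our choice of example, not from the text) using the golden-mean shift, where \(H_k=\log \#\{\text{admissible words of length }k\}\) is sub-additive and \(H_k/k\) tends to the topological entropy \(\log \beta \):

```python
from math import log

def count_admissible(k):
    """Number of 0-1 words of length k with no two consecutive 1s
    (= F_{k+2}, the (k+2)-nd Fibonacci number)."""
    a, b = 1, 2  # counts for k = 0 and k = 1
    for _ in range(k):
        a, b = b, a + b
    return a

H = [log(count_admissible(k)) for k in range(1, 31)]

# Sub-additivity: H_{m+k} <= H_m + H_k, since an admissible word of
# length m+k has an admissible prefix and an admissible suffix.
for m in range(1, 15):
    for k in range(1, 15):
        assert H[m + k - 1] <= H[m - 1] + H[k - 1] + 1e-12

# Fekete: H_k / k converges to its infimum, here log beta.
beta = (1 + 5 ** 0.5) / 2
print(H[29] / 30 - log(beta))  # small positive number, tending to 0
```

The same mechanism drives the proof above, with \(H(\mathcal {P}_k,\widetilde{\mu })\) in place of the word counts.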

Note that in the above we do not assume that \(\pi \widetilde{\mu }\) is \(T_\beta \)-invariant.

Lemma 6.5

Let \(\widetilde{\mu }\) be the lift to \(X_\beta \subseteq [B]^\mathbb {N}\) of a non-atomic \(T_\beta \)-invariant measure \(\mu \). If \(\mu \) is not the Parry measure, then

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{ H(\mathcal {P}_k,\widetilde{\mu })}{k} < \log \beta . \end{aligned}$$

Proof

We claim that the limit on the left-hand side equals the entropy of \(\mu \) under \(T_\beta \); this will imply the lemma, since the Parry measure is the unique measure of maximal entropy \(\log \beta \).

As before, let \(\mathcal {A}\) be the partition of \([0,1]\) into intervals \([j/\beta ,(j+1)/\beta )\cap [0,1]\), and \(\mathcal {A}^k = \mathcal {A}\vee T_\beta ^{-1}\mathcal {A}\vee \cdots \vee T_\beta ^{-(k-1)} \mathcal {A}\). Let also \(\mathcal {Q}_k\) be the partition of \([0,1]\) into half-open intervals determined by the points \(\{ \pi (x):x\in [D]^k\}\). Since \(\widetilde{\mu }\) is supported on \(X_\beta \), \(H(\mathcal {P}_k,\widetilde{\mu })=H(\mathcal {Q}_k,\mu )\) (the correspondence between the elements of both partitions follows from the fact that \(X_\beta \) is composed of the lexicographically largest sequences with a given image under \(\pi \), which implies that the lexicographic order on \(X_\beta \) projects onto the usual order of \([0,1]\), see e.g. [52]).

On the other hand, it is easy to see that \(\mathcal {Q}_k\) refines \(\mathcal {A}^k\) and, thanks to Garsia’s Lemma, each atom of \(\mathcal {A}^k\) is the union of a uniformly bounded number of atoms of \(\mathcal {Q}_k\). Hence

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{1}{k} H(\mathcal {P}_k,\widetilde{\mu }) = \lim _{k\rightarrow \infty } \frac{1}{k} H(\mathcal {A}^k,\mu ) = h(\mu ), \end{aligned}$$

as claimed. \(\square \)

Proof of Theorem 5.5

Let \(M \gg N \gg 1\) be large numbers; \(M\) will be chosen as a function of \(N\) later. We construct a measure \(\widetilde{\tau }=\widetilde{\tau }_{M,N}\) on \([B]^\mathbb {N}\) as follows. Let \(\widetilde{\lambda }\) be the lift of the Parry measure \(\lambda _\beta \) to the code space. First, let \(\widetilde{\tau }_0\) be the measure on \([B]^\mathbb {N}\) defined as follows (compare with the measure constructed in the integer case): the first \(N\) digits are \(0\); the next \(M\) digits are distributed as the initial \(M\) digits of a \(\widetilde{\lambda }\)-random sequence; and this procedure is continued for each successive block of \(N+M\) digits, with all the choices independent.

As in the integer case, this measure is \(T^{M+N}\)-invariant but not \(T\)-invariant, so we define

$$\begin{aligned} \widetilde{\tau } = \widetilde{\tau }_{M,N}=\frac{1}{M+N} \sum _{i=0}^{M+N-1} T^i\widetilde{\tau }_0, \end{aligned}$$

which is shift-invariant and ergodic. Lemma 6.1 shows that, provided \(N\) is large enough, \(\widetilde{\tau }_0\), and hence also \(\widetilde{\tau }\), is supported on the \(\beta \)-shift \(X_\beta \). In particular \(\tau =\tau _{M,N}:=\pi \widetilde{\tau }\) is \(T_\beta \)-invariant.

Let \(\tau _0=\pi \widetilde{\tau }_0\). The Parry measure \(\lambda _\beta \) has a bounded density with respect to Lebesgue measure (in the Pisot case it is actually piecewise constant). It follows that if \(I\) is an interval determined by two consecutive points of the form \(\sum _{i=1}^{(N+M)k} x_i\beta ^{-i}\), then \(\tau _0(I) \le c^k\beta ^{-Mk}\), where \(c>0\) is a constant that depends only on \(\beta \) (in particular, it is independent of \(M\), \(N\) and \(I\)). By Garsia’s Lemma 6.2, any interval of length \(2\beta ^{-(M+N)k}\) can be covered by a uniformly bounded number of such \(I\), and we conclude that

$$\begin{aligned} \liminf _{r\downarrow 0}\frac{\log \tau _0([x-r,x+r])}{\log r} \ge \frac{\log c+M\log \beta }{(M+N)\log \beta }. \end{aligned}$$

Thus for any \(N\), by taking \(M=M(N,c)\) large enough, we can ensure that \(\dim \tau =\dim \tau _0>1-1/N\).
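The required \(M\) can be made explicit: rearranging the previous bound, with the same constant \(c\),

$$\begin{aligned} \frac{\log c+M\log \beta }{(M+N)\log \beta }> 1-\frac{1}{N} \quad \Longleftrightarrow \quad M > N(N-1)-\frac{N\log c}{\log \beta }, \end{aligned}$$

so, for instance, any \(M\ge N^2+N|\log c|/\log \beta \) suffices.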

It remains to show that if \(N\) is large enough, then for any \(M\), \(\dim (\mu *\tau )<1\). Since \(\mu \) is invariant, arguing as in the integer case we see that it suffices to show this with \(\tau _0\) in place of \(\tau \) (note that the argument does not use invariance of the convolved measure, only the identity \(\dim (T_\beta ^i\mu *T_\beta ^i\nu )=\dim (\mu *\nu )\), which holds for any map that is piecewise affine with constant slope, in particular \(T_\beta \)).

Note that \(\mu *\tau _0\) is the projection of \(\mu \times \tau _0\) under the addition map \((x,y)\mapsto x+y\), and hence \(\mu *\tau _0 = \pi \widetilde{\rho }\), where \(\widetilde{\rho }\) is the image of \(\widetilde{\mu }\times \widetilde{\tau }_0\) on \([B]^\mathbb {N}\times [B]^\mathbb {N}\) under the map \((x,y)\mapsto (x_i+y_i)_i\) (so that \(\widetilde{\rho }\) is defined on \([2B-1]^\mathbb {N}\), and it is \(T^{M+N}\)-invariant). It follows from Lemma 6.4 (applied with \(D=2B-1\) and the partitions \(\mathcal {P}_k\) defined in terms of \(D\), using sub-additivity along multiples of \(M+N\)) that

$$\begin{aligned} \dim (\mu *\tau _0) \le \frac{H(\mathcal {P}_{M+N},\widetilde{\rho })}{(M+N)\log \beta }. \end{aligned}$$
(12)

Since, by assumption, \(\dim \mu <1\), we know from Lemma 6.5 that there is \(\varepsilon >0\) such that, if \(N\) is large enough, then

$$\begin{aligned} H(\mathcal {P}_N,\widetilde{\mu }) < (1-\varepsilon ) N\log \beta . \end{aligned}$$

Using this, the fact that \(\mathcal {P}_N\vee T^{-N}\mathcal {P}_M\) refines \(\mathcal {P}_{M+N}\), that \(|\mathcal {P}_M|\le C\,\beta ^M\) (by Garsia’s Lemma), and that \(\widetilde{\tau }_0\) is concentrated on \(\{ x\in [B]^\mathbb {N}: x_1=\cdots =x_N=0\}\) (which implies that \(\widetilde{\rho }\) and \(\widetilde{\mu }\) coincide on \(\mathcal {P}_N\)), we estimate

$$\begin{aligned} H(\mathcal {P}_{M+N},\widetilde{\rho })&\le H(\mathcal {P}_N,\widetilde{\rho }) + H(T^{-N}\mathcal {P}_M,\widetilde{\rho })\\&\le H(\mathcal {P}_N,\widetilde{\rho }) + \log |\mathcal {P}_M|\\&\le H(\mathcal {P}_N,\widetilde{\mu }) + M\log \beta + \log C\\&\le ((1-\varepsilon )N+M)\log \beta + \log C. \end{aligned}$$

Recalling (12), we conclude that there is \(N_0\) such that for all \(N\ge N_0\) and all \(M\in \mathbb {N}\),

$$\begin{aligned} \dim (\mu *\tau _0)< 1. \end{aligned}$$
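Indeed, dividing the entropy bound by \((M+N)\log \beta \) and recalling (12),

$$\begin{aligned} \dim (\mu *\tau _0) \le \frac{(1-\varepsilon )N+M+C'}{M+N} = 1-\frac{\varepsilon N-C'}{M+N}, \end{aligned}$$

where \(C'\) is a constant depending only on \(C\) and \(\beta \); the right-hand side is smaller than \(1\), uniformly in \(M\), as soon as \(N>C'/\varepsilon \).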

This completes the proof. \(\square \)

7 Application to iterated function systems: Theorems 1.4, 1.5 and 1.12

7.1 Limit geometries

In this section we fix the following notation. Let \(\mathcal {I}=\{f_0,\ldots ,f_{r-1}\}\) be an IFS on an interval which, without loss of generality, we assume is \([0,1]\). We will henceforth assume that \(\mathcal {I}\) is \(C^\alpha \) for some \(\alpha >1\) or \(\alpha =\omega \), and regular as defined in the introduction.

Let \(\mu \) be a quasi-product measure for \(\mathcal {I}\). The next lemma contains the key structural information we shall require about \(\mu \); it is a manifestation of ideas that go back to Sullivan [53]. We write \(\nu _1\sim _C\nu _2\) to denote that the measures \(\nu _1,\nu _2\) are mutually absolutely continuous with both Radon-Nikodym densities bounded by \(C\), i.e. \(1/C\le d\nu _1/d\nu _2 \le C\).

Lemma 7.1

There is \(C=C(\mu )>0\) such that the following holds. Let \(x\in {{\mathrm{supp}}}\mu \) and let \(\nu \) be an accumulation point of \(\mu _{x,t}\) as \(t\rightarrow \infty \). Then \(\nu \sim _C (g\mu )|_{[-1,1]}\) and \(\mu \sim _C (h\nu )|_{[0,1]}\) for some \(g,h\in {{\mathrm{diff}}}^\alpha (\mathbb {R})\).

Proof

Given a finite sequence \(y\in [r]^n\), let \(f_y = f_{y_1}\circ \cdots \circ f_{y_n}\) and \(f_y^*=A_y f_y\), where \(A_y\) is the renormalizing homothety mapping \(f_y([0,1])\) back to \([0,1]\). It is proved in [3, Theorems 5.9 and 6.1] that for a left-infinite sequence \(y=(y_i)_{i=-\infty }^0\), the sequence \(f_{y_{-n}\ldots y_{0}}^*\) converges, in the \(C^1\) topology, to a \(C^\alpha \) diffeomorphism \(F^*_y\) (these are known as limit diffeomorphisms). Moreover, the dependence of \(F^*_y\) on \(y\) is uniformly continuous. In particular, the family \(\{ f^*_y\}\), where \(y\) ranges over all finite words, is relatively compact in the \(C^1\) topology.

For the first part, \(\nu \sim _C (g\mu )|_{[-1,1]}\), one notes that \(\mu _{x,t}\) is \(C\)-equivalent to a measure obtained from \(S_s f^*_y\mu \) by a bounded translation, restriction and normalization, for an appropriate word \(y=y(t)\), whose length tends to \(\infty \) with \(t\), and some \(s=s(t)\in [0,L]\), where \(L\) depends only on the IFS (one can take \(L\) to be the maximum of \(|\log f'_i(x)|\) over \(i\in [r]\) and \(x\in [0,1]\)). Also, the space of measures \(C\)-equivalent to \(\mu \) is weak-* closed. Thus, after passing to a subsequence, \(\nu \) is \(C\)-equivalent to a translation, restriction and normalization of \(S_s F\mu \) for some limit diffeomorphism \(F\) and \(s\in [0,L]\).

For the second statement, note that it follows from the first part that \(\nu |_I \sim _C g(\mu |_J)\) for suitable intervals \(I,J\). Moreover, we can take \(J=f_y([0,1])\) for some word \(y\). In this case \(\mu |_J \sim _C f_y\mu \) by the quasi-product property, so the claim follows with \(h=(g f_y)^{-1}\) (and a different value of \(C\)). \(\square \)

Observe that by Lemmas 4.16 and 7.1, if \(\nu \) is any accumulation point of \(\mu _{x,t}\), and \(\nu \) generates \(P\), then so does \(\mu \). This fact will be key in the proof of the following theorem.

Theorem 7.2

\(\mu \) generates an \(S\)-ergodic, non-trivial distribution \(P\). Furthermore, there is \(C>0\), such that \(P\)-a.e. \(\nu \) satisfies \(\nu \sim _C (g\mu )|_{[-1,1]}\) for some \(g\in {{\mathrm{diff}}}^\alpha (\mathbb {R})\).

Proof

The proof of the first part is essentially identical to [27, Proposition 1.36]. We sketch the details for completeness.

For \(\mu \)-a.e. \(x\), if the scenery at \(x\) equidistributes for \(P\) along a subsequence of times \(T_k\rightarrow \infty \), then \(P\) is \(S\)-invariant and \(S\)-quasi-Palm [27, Theorem 1.7]. By the ergodic theorem and the \(S\)-quasi-Palm property, \(P\)-almost all measures \(\nu \) generate the \(S\)-ergodic component \(P_\nu \) of \(\nu \) (this argument holds for general measures \(\mu \) with no additional assumptions).

Let \(x\) be a \(\mu \)-typical point, and let the scenery at \(x\) equidistribute for \(P\) along a subsequence (such a subsequence exists by compactness). By the previous paragraph, a \(P\)-typical measure \(\nu \) generates an \(S\)-ergodic distribution \(P_\nu \). By the remark preceding the theorem, this means that \(\mu \) generates \(P_\nu \), and the generated distribution is \(S\)-ergodic. The second part is just Lemma 7.1 together with the fact that \(P\) is supported on accumulation points of sceneries of \(\mu \). \(\square \)

7.2 Proofs of Theorems 1.4, 1.5 and 1.12

Proof of Theorem 1.4

Let \(\mu \) be a quasi-product measure for a \(C^{1+\varepsilon }\)-IFS \(\mathcal {I}\) such that \(\lambda (f)\not \sim \lambda (g)\) for some \(f,g\in \mathcal {I}\). We want to show that \(\mu \) is pointwise \(\beta \)-normal for any Pisot \(\beta >1\). We have already established in Theorem 7.2 that \(\mu \) generates an EFD \(P\), and our aim is to apply Theorem 1.2 to \(\mu \), so we must show that \(\Sigma (P,S)\cap \frac{1}{\log \beta }\mathbb {Z}=\{0\}\). To this end, we shall show that if \(k/\log \beta \in \Sigma (P,S)\) for some \(k\ne 0\), then \(\lambda (f)\sim \beta \) for all \(f\in \mathcal {I}\).

We argue by contradiction: suppose that \(t_0=k/\log \beta \in \Sigma (P,S)\) with \(k\ne 0\), and that \(\lambda (f)\not \sim \beta \) for some \(f\in \mathcal {I}\). Hence

$$\begin{aligned} e(t_0\log \lambda (f))\ne 1. \end{aligned}$$
(13)

Let \(\nu \) be a \(P\)-typical measure; we know from Proposition 4.15 that the phase measure \(\theta _\nu \) of the eigenvalue corresponding to \(t_0\) is (well defined, and) a single atom. Now by Lemma 7.1, \(\nu \) is also (the restriction of) a quasi-product measure for a conjugated \(C^\alpha \) IFS \(\mathcal {J}= g\mathcal {I} =\{g f g^{-1}:f\in \mathcal {I}\}\). Since \(\lambda (g f g^{-1})=\lambda (f)\), we may assume without loss of generality that, already for the original measure \(\mu \), the phase measure is an atom, say \(\delta _z\).

Let \(x_0\) be the fixed point of \(f\) (this is in the support of \(\mu \)). Let \(U\) be a small interval centered at \(x_0\). Since \(\mu _U\ll \mu \) and \(f(\mu _U)\ll \mu \), by the first part of Corollary 4.17, the phase measures of \(\frac{1}{\mu (U)}\mu |_U\) and \(\nu _U:=\frac{1}{\mu (U)}f(\mu |_U)\) are (well defined and) equal to \(\delta _z\). By the second part of Corollary 4.17, the phase measure of \(\nu _U\) equals \(\frac{1}{\mu (U)}\int _U \delta _{e(-t_0 \log f'(x))z} d\mu (x)\). Since \(f'\) is continuous, this shows that as the size of \(U\) tends to \(0\), the support of \(\theta _{\nu _U}\) tends to \(\{e(-t_0 \log \lambda (f)) z\}\). In light of (13), this is a contradiction, as desired. \(\square \)

We sketch an alternative proof of Theorem 1.4 which does not directly use EFDs and instead relies on the results from [29] on dissonance between quasi-product measures for regular IFSs. The overall strategy is the same, with the main difference coming in the part that establishes dissonance. Namely, for a \(\mu \)-typical \(x\), suppose \(x\) equidistributes for \(\nu \) under \(T_\beta \) along a sequence \(N_k\rightarrow \infty \). Using Lemma 7.1, one can check that a representation (10) still holds, except that a priori we do not know that the measure marginal of \(Q\) is \(P\); however, it is easily seen to be supported on limits of sceneries of \(\mu \) which, as we know from Lemma 7.1, are restrictions of quasi-product measures for a smoothly conjugated IFS. Now since the measures \(\tau _n\) constructed in Theorem 5.5 are (convex combinations of) quasi-product measures for a homogeneous affine IFS with contraction ratio \(\sim \beta \), it follows from [29, Theorem 1.4] that the measures \(\eta _\omega \) in (10) dissonate with the \(\tau _n\) (this is the step that uses that \(\lambda (f)\not \sim \beta \) for some \(f\in \mathcal {I}\)), and hence so does \(\nu \) if \(\dim (\tau _n)\) is large enough. This contradicts Theorem 5.5 unless \(\nu \) is the Parry measure.

Proof of Theorem 1.5

Let \(\mu \) be a quasi-product measure for a real-analytic, totally non-linear IFS \(\mathcal {I}\). We know that \(\mu \) generates an EFD \(P\); we will again show that \(P\) is weak-mixing, i.e. \(\Sigma (P,S)=\{0\}\). Together with Theorem 1.2, this will yield the result.

Suppose for contradiction that \(0\ne t_0\in \Sigma (P,S)\). We follow the scheme of the proof of Theorem 1.4: instead of working with the original measure \(\mu \), we consider a \(P\)-typical measure \(\nu \) such that the phase measure is an atom \(\delta _z\). This measure is (the restriction of) a quasi-product measure for a conjugated IFS \(g\mathcal {I}\), which is also real-analytic. Since \(\mathcal {I}\) is totally non-linear, \(g\mathcal {I}\) is non-linear, hence it contains a non-affine analytic map \(h\). We can then find a non-trivial interval \(U\) meeting the support of \(\nu \), on which \(h'\) is strictly monotone (this is the point where analyticity gets used; if the IFS were merely \(C^2\), a priori \(h'\) might have no point of strict monotonicity on the Cantor set \({{\mathrm{supp}}}\nu \)). Now arguing as in the proof of Theorem 1.4, on one hand the phase measure of \(\frac{1}{\nu (U)}h(\nu |_U)\) is \(\delta _z\), and on the other hand it equals \(\frac{1}{\nu (U)}\int _U \delta _{e(-t_0\log h'(x)) z} d\nu (x)\). The latter measure clearly cannot be atomic, so we have reached the desired contradiction. \(\square \)

The facts on the spectrum \(\Sigma (P,S)\) that emerged in the above proofs may find other applications, so we summarize them below.

Theorem 7.3

Let \(\mathcal {I}\) be a \(C^{1+\varepsilon }\) IFS, \(\mu \) a quasi-product measure for it, and \(P\) the distribution generated by \(\mu \).

  1. Suppose that \(\lambda (f)\not \sim t_0\) for some \(f\in \mathcal {I}\). Then \(k/\log t_0\notin \Sigma (P,S)\) for any \(k\in \mathbb {Z}\setminus \{0\}\). In particular, if \(\lambda (f_1)\not \sim \lambda (f_2)\) for some \(f_1,f_2\in \mathcal {I}\), then \(\Sigma (P,S)=\{0\}\).

  2. If \(\mathcal {I}\) is \(C^\omega \) and totally non-linear or, more generally, if \(\mathcal {I}\) has the property that for any limit diffeomorphism \(g\), the conjugated IFS \(g\mathcal {I}\) contains a map \(h\) such that \(h'\) is a local diffeomorphism, then \(\Sigma (P,S)=\{0\}\).

To conclude this section, we present the deduction of Theorem 1.12 from Theorem 1.2.

Proof of Theorem 1.12

Let \(a,b\) be two distinct elements of \(\Lambda \). Write \(x_i\) for the fixed point of the inverse branch of the Gauss map \(f_i(x)=1/(x+i)\). Then \(x_a\) and \(x_b\) are quadratic numbers generating distinct quadratic fields. It follows that \(\lambda (f_a^2)=x_a^4\not \sim x_b^4=\lambda (f_b^2)\). Hence for any Pisot \(\beta >1\), either \(\beta \not \sim \lambda (f_a^2)\) or \(\beta \not \sim \lambda (f_b^2)\); by Theorem 1.4, any quasi-product measure on \(C_\Lambda \) is pointwise \(\beta \)-normal. \(\square \)
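The identity \(\lambda (f_i^2)=x_i^4\) can be checked directly: the fixed point of \(f_i(x)=1/(x+i)\) is \(x_i=(\sqrt{i^2+4}-i)/2\), and since \(1/(x_i+i)=x_i\) we get \(f_i'(x_i)=-1/(x_i+i)^2=-x_i^2\), so \((f_i^2)'(x_i)=x_i^4\). A numerical sanity check in Python (an illustration only):

```python
def gauss_branch(i):
    """Inverse branch f_i(x) = 1/(x+i) of the Gauss map."""
    return lambda x: 1.0 / (x + i)

def fixed_point(i):
    """Positive solution of x = 1/(x+i), i.e. x^2 + ix - 1 = 0; this is
    the quadratic number with continued fraction [0; i, i, i, ...]."""
    return ((i * i + 4) ** 0.5 - i) / 2

for i in (1, 2, 3):
    f = gauss_branch(i)
    x = fixed_point(i)
    assert abs(f(x) - x) < 1e-12             # x is fixed by f_i
    deriv = -1.0 / (x + i) ** 2              # f_i'(x) = -1/(x+i)^2
    assert abs(deriv + x * x) < 1e-12        # equals -x_i^2
    assert abs(deriv ** 2 - x ** 4) < 1e-12  # (f_i^2)'(x_i) = x_i^4
```

For \(i=1\) the fixed point is \(1/\varphi \approx 0.618\), the golden-ratio case of the continued fraction \([0;1,1,1,\ldots ]\).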

8 A refinement of Theorem 1.1 and applications

8.1 Relaxing the spectral hypothesis

For an integer \(n\), any \(T_n\)-invariant and ergodic measure \(\mu \) generates an EFD \(P\), see [26]. This \(P\) can be rather explicitly described, and its spectrum can be shown to contain non-zero integer multiples of \(\frac{1}{\log m}\) only if either \(m\sim n\) or \(\log n/\log m\in \Sigma (T,\mu )\). Thus in many cases the pointwise \(m\)-normality of \(\mu \) follows directly from Theorem 1.2. In order to deal with the remaining cases we now present some refinements of Theorem 1.2, in which \(k/\log \beta \) may be present in the spectrum of \(P\), but in which we instead assume that the phase is “sufficiently spread out”. We give two versions, the first being simpler to state:

Theorem 8.1

Let \(\beta >1\) be a Pisot number. Let \(\mu \in \mathcal {P}([0,1])\) and suppose that \(\mu \) generates an \(S\)-ergodic and non-trivial distribution \(P\) which is not \(S_{\log \beta }\)-ergodic (so that \(k/\log \beta \in \Sigma (P,S)\) for some \(0\ne k\in \mathbb {Z}\)). Further, assume that \(\mu \) \(\log \beta \)-generates an \(S_{\log \beta }\)-ergodic distribution \(P_x\) at \(\mu \)-a.e. point \(x\). Let \(\theta =\theta _\mu \) denote the associated phase measure as described in Sect. 4.3. If \(\dim \theta =1\), then \(\mu \) is pointwise \(\beta \)-normal.

One consequence is that if \(\int P_x\,d\mu (x)\) is \(S\)-invariant, then \(\mu \) is pointwise \(\beta \)-normal: in this case the phase measure is invariant under rotations of the circle, hence is normalized length measure. Although the theorem above is strong enough for applications, the proofs become simpler using the following variant:

Theorem 8.2

Let \(\beta >1\) be a Pisot number. Let \(\{\mu _\omega \}_{\omega \in \Omega }\subseteq \mathcal {P}(\mathbb {R})\) be a measurable family defined on a probability space \((\Omega ,\mathcal {F},Q)\). Suppose that there is an \(S\)-ergodic and non-trivial distribution \(P\), which is not \(S_{\log \beta }\)-ergodic, and such that \(Q\)-a.e. \(\mu _\omega \) generates \(P\) and at a.e. point \(\log \beta \)-generates an \(S_{\log \beta }\)-ergodic distribution. Let \(\theta _{\mu _\omega }\) denote the associated phase measures and \(\theta =\int \theta _{\mu _\omega }\,dQ(\omega )\) the “cumulative” phase measure. If \(\dim \theta =1\), then \(\mu _\omega \) is pointwise \(\beta \)-normal for \(Q\)-a.e. \(\omega \), and hence also \(\mu =\int \mu _\omega \,dQ(\omega )\) is pointwise \(\beta \)-normal.

It is clear that the first theorem follows from the second by taking \(\mu _\omega =\mu \) for all \(\omega \). Nevertheless we shall prove the first, and then explain the changes needed for the second. The proof of Theorem 8.1 follows the scheme of the proof of Theorem 1.2 detailed in Sect. 5.1, with a minimal change to the first step and a more significant change in the proof of the third step, in particular making use of the stronger version of Marstrand’s projection theorem given in Theorem 3.3.

For the rest of this section, suppose that \(\beta \) and \(\mu \) are as in the statement of Theorem 8.1. In particular, let \(\theta =\theta _\mu \) be the associated phase measure with respect to an appropriate eigenfunction \(\varphi \) of \((P,S)\), as in Sect. 4.3. Fix a \(\mu \)-typical \(x_0\) for which \(P_{x_0}\) is defined. For \(\mu \)-typical \(x\), define a function \(\ell (x)\in [0,1)\) by \(\varphi _\mu (x)=e(\ell (x))\varphi _\mu (x_0)\). It follows from the fact that \(P=\frac{1}{\log \beta }\int _0^{\log \beta } S_t P_x\,dt\) and the eigenfunction property that \(P_x = S_{\ell (x)} P_{x_0}\). Since \(\ell (x)\) depends only on \(\varphi _\mu (x)\), we will also write \(\ell (z)=\ell (x)\) where \(z=\varphi _\mu (x)\), or in other words \(e(\ell (z))=z/\varphi _\mu (x_0)\). In particular, \(\dim (\ell \theta )=\dim \theta =1\).

Let \(\delta \) denote the almost-sure dimension of measures drawn from \(P\); it is also the a.s. dimension of measures drawn from \(P_x\) for \(\mu \)-almost all \(x\). Recall from Proposition 4.12 that \(\delta >0\). The following is a refined version of Lemma 5.8.

Lemma 8.3

Let \(\tau \) be a probability measure on \(\mathbb {R}\) with \(\dim \tau \ge 1-\delta \). Then \(\dim \tau *\eta =1\) for \(\mu \)-a.e. \(x\) and \(P_x\)-a.e. \(\eta \).

Proof

Using Fubini, the fact that \(\dim (\ell \theta )=1\), and Theorem 3.3,

$$\begin{aligned} \int \int \dim (\tau * \eta ) \,dP_x(\eta )\,d\mu (x)&= \int \int \dim (\tau * \eta ) \,dS_{\ell (x)}P_{x_0}(\eta ) \,d\mu (x) \nonumber \\&= \int \int \dim (\tau * \eta ) \,dS_{\ell (z)}P_{x_0}(\eta ) \,d\theta (z)\nonumber \\&= \int \int \dim (\tau * S_{\ell (z)}\eta ) \,d\theta (z)\,dP_{x_0}(\eta )\nonumber \\&= \int \int \dim (\tau * S_t\eta ) \,d\ell \theta (t)\,dP_{x_0}(\eta )\nonumber \\&\ge \int \min \{1,\dim \tau +\dim \eta \} \,dP_{x_0}(\eta )\nonumber \\&= \min \{1,\dim \tau +\delta \} \nonumber \\&= 1. \end{aligned}$$
(14)

But the integrand on the left hand side is \(\le 1\), so it is a.s. equal to \(1\), as claimed. \(\square \)

We can now finish the proof of the theorem.

Proof of Theorem 8.1

For \(\mu \)-typical \(x\), the analog of Lemma 4.3 holds for the distributions \(P_x\) by assumption. It follows that for \(\mu \)-a.e. \(x\) and any measure \(\nu \) for which \(x\) equidistributes under \(T_\beta \) sub-sequentially, we have a representation similar to Theorem 5.1:

$$\begin{aligned} \nu = \int c_\omega \cdot (\delta _{y_\omega }*\eta _\omega )|_{I_\omega }\,dQ(\omega ) \end{aligned}$$

where \(c_\omega ,y_\omega ,\eta _\omega , I_\omega \) are defined on some auxiliary probability space \((\Omega ,\mathcal {F},Q)\), and \(\eta _\omega \) is distributed as \(P_x\) (rather than \(P\)).

The proof is now concluded in the same way as that of Theorem 1.1. Combining the integral representation with Lemma 8.3, for \(\mu \)-a.e. \(x\) and any \(\nu \) for which \(x\) equidistributes sub-sequentially under \(T_\beta \), we have that \(\nu \) dissonates with every measure of large enough dimension, and also that \(\dim \nu \ge \delta \). But, by Theorem 5.5, this is possible only if \(\nu \) has dimension \(1\), hence is the unique absolutely continuous invariant measure for \(T_\beta \). This completes the proof. \(\square \)

As for Theorem 8.2, the argument is identical, except that in Eq. (14) one replaces \(\mu \) by \(\mu _\omega \) and integrates \(dQ(\omega )\). We leave the remaining details to the reader.

8.2 Distributions associated to \(T_\gamma \) invariant measures

Let \(\gamma >1\) and let \(\mu \) be a \(T_\gamma \)-invariant and ergodic measure with \(\dim \mu >0\). In this section we develop some background about such measures and the distributions associated to them. This is a minor adaptation of [26, Sect. 3], which dealt with the integer case (though the language we employ here is slightly different).

Let \(G=\lceil \gamma \rceil \). We have already met the \(\gamma \)-shift \(X_\gamma \subseteq [G]^\mathbb {N}\), which, together with the shift map \(T\), factors onto \(([0,1],T_\gamma )\), and have noted that \(\mu \) lifts uniquely to \(X_\gamma \). We will also need the so-called natural extension: let \(\widetilde{X}_\gamma \subseteq [G]^\mathbb {Z}\) denote the two-sided \(\gamma \)-shift, i.e. the set of bi-infinite sequences all of whose subwords appear in the one-sided \(\gamma \)-shift \(X_\gamma \). For \(\omega \in \widetilde{X}_\gamma \) let \(\omega ^+=(\omega _1,\omega _2,\ldots )\) and \(\omega ^-=(\ldots ,\omega _{-1},\omega _0)\), and also write \(x(\omega )=\pi (\omega ^+)\), where \(\pi :X_\gamma \rightarrow [0,1]\) is the usual base-\(\gamma \) coding map. It is a standard fact that \(\mu \) lifts uniquely to a \(T\)-invariant measure \(\widetilde{\mu }\) on \((\widetilde{X}_\gamma ,T)\).

For \(\widetilde{\mu }\)-typical \(\omega \), let \(\mu _\omega \) denote the conditional measure of \(\widetilde{\mu }\) given the “past” \((\ldots ,\omega _{-1},\omega _{0})\). These conditional measures can be defined abstractly as the disintegration of \(\widetilde{\mu }\) given the measurable and countably generated partition into different pasts, see [16, Theorem 5.14], or more concretely by the conditions

$$\begin{aligned} \mu _\omega [i_1\cdots i_k] = \lim _{n\rightarrow \infty }\frac{\mu [\omega _{-n}\ldots \omega _0 i_1\ldots i_k]}{\mu [\omega _{-n}\ldots \omega _0]}. \end{aligned}$$

(That the limit exists for \(\widetilde{\mu }\)-a.e. \(\omega \) can be seen from a martingale argument.)
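For a concrete case in which the limit is transparent, take \(\widetilde{\mu }\) a two-state stationary Markov measure: the ratio stabilizes after conditioning on a single symbol of the past, since \(\mu [\omega _{-n}\ldots \omega _0 i_1\ldots i_k]/\mu [\omega _{-n}\ldots \omega _0]= P(\omega _0,i_1)P(i_1,i_2)\cdots P(i_{k-1},i_k)\) for every \(n\). A Python sketch with an arbitrary illustrative transition matrix (not from the text):

```python
# Transition matrix (rows sum to 1) and its stationary vector; both
# are arbitrary illustrative choices.
P = [[0.7, 0.3],
     [0.4, 0.6]]
pi = [4 / 7, 3 / 7]  # solves pi P = pi

def cylinder(word):
    """Mass of the cylinder [word] under the stationary Markov measure."""
    m = pi[word[0]]
    for a, b in zip(word, word[1:]):
        m *= P[a][b]
    return m

past = [0, 1, 1, 0]   # ... omega_{-3} omega_{-2} omega_{-1} omega_0
future = [1, 0]       # i_1 i_2

# The martingale ratio defining mu_omega is independent of how much of
# the past we condition on: it equals P[omega_0][i_1] * P[i_1][i_2].
ratios = [cylinder(past[-n:] + future) / cylinder(past[-n:])
          for n in range(1, len(past) + 1)]
print(ratios)  # each ratio equals 0.3 * 0.4 = 0.12 (up to rounding)
```

For general invariant measures the ratios need not stabilize, and the martingale convergence theorem supplies the almost-sure limit.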

These conditional measures are measures on the “future” \([G]^\mathbb {N}\) and are almost surely supported on the one-sided \(\gamma \)-shift. We shall silently identify \(\mu _\omega \) with the corresponding measure \(\pi \mu _\omega \) on \([0,1]\). It is well known that \(\dim \mu _\omega =\dim \mu \) a.s.

Definition 8.4

A distribution \(P_0\in \mathcal {D}\) is \(S_{t_0}\)-quasi-Palm if it is \(S_{t_0}\)-invariant, gives full mass to \(\mathcal {M}^{{}^{{}_\square }}\), and for every Borel set \(B\subseteq \mathcal {M}^{{}^{{}_\square }}\) with \(P_0(B)=1\) and every \(k\in \mathbb {N}\), \(P_0\)-almost every measure \(\eta \) satisfies \(\eta _{x,kt_0}\in B\) for \(\eta \)-almost all \(x\) such that \([x-e^{-kt_0},x+e^{-kt_0}]\subseteq [-1,1]\).

If \(P_0\) is \(S_{t_0}\)-quasi-Palm, it is easy to see that \(P=\frac{1}{t_0}\int _0^{t_0}S_tP_0\,dt\) is \(S\)-quasi-Palm.

Definition 8.5

An \(S_{t_0}\)-invariant and ergodic distribution which is also \(S_{t_0}\)-quasi-Palm is a \(t_0\)-discrete ergodic fractal distribution.

It is again clear that if \(P_0\) is such a distribution then \(P=\frac{1}{t_0}\int _0^{t_0}S_tP_0\,dt\) is an EFD.

Theorem 8.6

Let \(\mu \) be \(T_\gamma \)-invariant and ergodic. Then there is a \(\log \gamma \)-discrete EFD \(P_0\) and a factor map \(\sigma : (\widetilde{X}_\gamma ,\widetilde{\mu },T)\rightarrow (\mathcal {M}^{{}^{{}_\square }},P_0,S_{\log \gamma })\), such that for \(\widetilde{\mu }\)-a.e. \(\omega \),

$$\begin{aligned} (\mu _\omega )_{x(\omega )}\ll \sigma (\omega ). \end{aligned}$$
(15)

(actually the two measures are proportional on the interval \([-x(\omega ),1-x(\omega )]\)).

Proof

The factor map in question is defined by

$$\begin{aligned} \sigma (\omega )=\lim _{n\rightarrow \infty }S_{n\log \gamma }((\mu _{T^{-n}\omega })_{x(T^{-n}\omega )}), \end{aligned}$$

and \(P_0\) is the push-forward of \(\widetilde{\mu }\) through this map. The \(S_{\log \gamma }\)-quasi-Palm property is a consequence of the fact that \(\mu _{T\omega }\), for \(\omega \sim \widetilde{\mu }\), has the same distribution as \(\mu _\omega \) for \(\omega \sim \widetilde{\mu }\). For a more detailed verification in the integer case, see [26, Theorem 3.1]; there are no substantial changes when passing to a general \(\gamma >1\). \(\square \)

Let \(P_0\) be as in Theorem 8.6, and

$$\begin{aligned} P=\frac{1}{\log \gamma }\int _0^{\log \gamma }S_tP_0\,dt \end{aligned}$$

which, as was already noted, is an EFD.

Proposition 8.7

  1. \(P_0\) is \(\log \gamma \)-generated by \(\mu _\omega \) at \(\mu _\omega \)-a.e. point, for \(\widetilde{\mu }\)-a.e. \(\omega \).

  2. \(\mu _\omega \) generates \(P\) for \(\widetilde{\mu }\)-a.e. \(\omega \).

Proof

(1) By the ergodic theorem, \(P_0\)-a.e. measure \(\nu \) generates \(P_0\) \(\log \gamma \)-discretely at \(0\), and by the \(S_{\log \gamma }\)-quasi-Palm property of \(P_0\), \(0\) can be replaced by \(\nu \)-typical \(x\) (this argument is the same as the proof of Lemma 4.9). Thus \(\sigma (\omega )\) generates \(P_0\) \(\log \gamma \)-discretely, and using (15), the same is true for \(\mu _\omega \). (2) is a consequence of (1). Let \(f\in C(\mathcal {P}([-1,1]))\) and let

$$\begin{aligned} F(\nu )=\frac{1}{\log \gamma }\int _0^{\log \gamma }f(S_t\nu )\,dt. \end{aligned}$$

Let \(\nu =\mu _\omega \) for a typical \(\omega \). We must show that

$$\begin{aligned} \lim _{T\rightarrow \infty }\frac{1}{T}\int _0^Tf(\nu _{x,t})\,dt = \int f\,dP \qquad \text {for }\nu \text {-a.e. }x. \end{aligned}$$

But

$$\begin{aligned} \frac{1}{T}\int _0^Tf(\nu _{x,t})\,dt = \frac{1}{K}\sum _{n=0}^{K-1}F(\nu _{x,n\log \gamma }) + O\left( \frac{\Vert f\Vert _\infty }{T}\right) , \qquad K=\lfloor T/\log \gamma \rfloor . \end{aligned}$$

If \(F\) were continuous on \(\mathcal {P}([-1,1])\), the convergence above would follow immediately from the fact that \(P_0\) is \(\log \gamma \)-discretely generated by \(\nu \) at \(\nu \)-a.e. point. In fact, \(F\) is defined only on \(\mathcal {M}^{{}^{{}_\square }}\), but it is continuous on a \(P_0\)-full measure set (such as the set of atomless measures in \(\mathcal {M}^{{}^{{}_\square }}\)), so the result still follows, see e.g. [6, Theorem 2.7]. \(\square \)

8.3 Proof of Theorem 1.10: normality of \(\mu \)

Let \(\gamma >1\), let \(\mu \) be a \(T_\gamma \)-invariant and ergodic measure, and let \(\beta >1\) be a Pisot number with \(\gamma \not \sim \beta \). Our goal is to show that \(\mu \) is pointwise \(\beta \)-normal, and so is \(f\mu \) when \(f\in {{\mathrm{diff}}}^2(\mathbb {R})\). We continue with the notation of the previous section: \(\widetilde{\mu },\mu _\omega ,P_0,P\) etc.

We will first show that \(\mu \) is pointwise \(\beta \)-normal; the case of \(f\mu \) for \(f\in {{\mathrm{diff}}}^2(\mathbb {R})\) will be handled in the next section.

Suppose first that \(\Sigma (P,S)\) does not contain non-zero integer multiples of \(1/\log \beta \). Then by Proposition 8.7(2) and Theorem 1.2, for \(\widetilde{\mu }\) a.e. \(\omega \) the conditional measure \(\mu _\omega \) is pointwise \(\beta \)-normal. But then so is \(\mu \) since \(\mu =\int \mu _\omega \, d\widetilde{\mu }(\omega )\).

Therefore, assume that there is some integer \(k\ne 0\) with \(k/\log \beta \in \Sigma (P,S)\) or, equivalently, that \(P\) is not \(S_{\log \beta }\)-ergodic. Our goal is to verify that the assumptions of Theorem 8.2 are met. In light of Proposition 8.7, and setting \(\Omega =\widetilde{X}_\gamma \) and \(Q=\widetilde{\mu }\), all that remains to be checked is that the cumulative phase measure \(\theta =\int \theta _{\mu _\omega } \,d\widetilde{\mu }(\omega )\) has full dimension.
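To spell out the equivalence used here (a standard fact about ergodic flows, included for the reader's convenience): if \(\varphi \) is a non-constant eigenfunction of the flow with \(S_t\varphi =e(kt/\log \beta )\varphi \), then

```latex
S_{\log\beta}\,\varphi
\;=\; e\!\Bigl(\frac{k}{\log\beta}\cdot\log\beta\Bigr)\varphi
\;=\; e(k)\,\varphi
\;=\; \varphi ,
```

so \(\varphi \) is a non-constant \(S_{\log \beta }\)-invariant function, witnessing non-ergodicity of the time-\(\log \beta \) map; conversely, for \(S\)-ergodic \(P\), non-ergodicity of \(S_{\log \beta }\) forces a non-zero eigenvalue in \((1/\log \beta )\mathbb {Z}\).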

Recall that \(P\)-a.e. \(\nu \) equidistributes under \(S_{\log \beta }\) for an \(S_{\log \beta }\)-ergodic distribution (Lemma 4.2). This is an \(S\)-invariant property, so it holds for \(P_0\)-a.e. \(\nu \), and by the \(S_{\log \gamma }\)-quasi-Palm property, the relation (15), and Proposition 8.7, the same holds for \(\nu ^x\) at \(\nu \)-a.e. \(x\) for \(P_0\)-a.e. \(\nu \).

Fix an eigenfunction \(\varphi \) for the eigenvalue \(k/\log \beta \). The assumption \(\beta \not \sim \gamma \) enters in the proof of the next lemma.

Lemma 8.8

\(\theta '=\int \theta _\nu \,dP_0(\nu )\) is Lebesgue measure on the circle.

Proof

To begin, note that either \(P_0=P\), or else \(P_0\) is a level set of an eigenfunction \(\psi \) with eigenvalue \(m/\log \gamma \) for some non-zero integer \(m\). In the first case the assertion is clear, since \(\theta '\) is then a translation-invariant measure on the circle, so we consider only the second case. Switching to additive notation, the map \(\nu \mapsto (\varphi (\nu ),\psi (\nu ))\) defines a factor map from \((P,S)\) to the torus equipped with the translation flow in direction \((k/\log \beta ,m/\log \gamma )\). Since \(\beta \not \sim \gamma \), Lebesgue measure is the unique invariant measure for this translation, and we deduce that the distribution of \(\varphi \) conditioned on any level set of \(\psi \) is uniform on the circle; in particular, the distribution of \(\varphi \) on \(P_0\) is uniform on the circle. Now the lemma follows since, by Proposition 4.15, \(\theta _\nu =\delta _{\varphi (\nu )}\) for \(P\)-a.e. \(\nu \) and hence, by \(S\)-invariance, for \(P_0\)-a.e. \(\nu \). \(\square \)
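The role of \(\beta \not \sim \gamma \) can be made explicit (a routine check, writing \(\beta \sim \gamma \) for \(\log \beta /\log \gamma \in \mathbb {Q}\)): the linear flow on the torus in direction \((k/\log \beta ,m/\log \gamma )\) is uniquely ergodic precisely when the slope is irrational, and a rational slope would give

```latex
\frac{m/\log\gamma}{k/\log\beta}
\;=\;\frac{m\,\log\beta}{k\,\log\gamma}\in\mathbb{Q}
\quad\Longrightarrow\quad
\frac{\log\beta}{\log\gamma}\in\mathbb{Q},
```

that is, \(\beta \sim \gamma \), contrary to assumption.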

Since \(\mu _\omega \ll \sigma (\omega )\) for \(\widetilde{\mu }\)-a.e. \(\omega \), and \(\theta _\nu \) is a single atom, we have \(\theta _{\mu _\omega }=\theta _{\sigma (\omega )}\); hence \(\theta =\int \theta _{\mu _\omega }\,d\widetilde{\mu }(\omega )\) is uniform on the circle, and in particular of dimension \(1\). We have verified all the hypotheses of Theorem 8.2, which proves the pointwise \(\beta \)-normality of \(\mu \).

8.4 Conclusion of the proof of Theorem 1.10: normality of \(f\mu \)

It remains for us to prove pointwise \(\beta \)-normality of \(f\mu \) for \(f\in {{\mathrm{diff}}}^2(\mathbb {R})\); again we will do so by applying Theorem 8.2. Since \(\mu _\omega \) and \(f\mu _\omega \) generate and \(\log \beta \)-generate the same distributions, the task is to show that the cumulative phase measure has dimension \(1\). By Corollary 4.17, this measure is given by

$$\begin{aligned} \theta '=\int \int \delta _{e(\log (-f'(x))/\log \gamma )\cdot \varphi (\mu _\omega )} \,d\mu _\omega (x) \,d\widetilde{\mu }(\omega ). \end{aligned}$$
(16)

In order to be able to apply projection results, we pass from multiplicative to additive notation; in particular the range of \(\varphi \) becomes the unit interval \([0,1]\). Define the measure \(\eta \) on \([0,1]^2\) by

$$\begin{aligned} \eta = \int \mu _\omega \times \delta _{\varphi (\mu _\omega )} \, d\widetilde{\mu }(\omega ) \end{aligned}$$

Then \(\theta '\) is the projection of \(\eta \) by the map

$$\begin{aligned} \pi (x,y)=y-\log f'(x)/\log \gamma . \end{aligned}$$

Now, by definition the projection \(P_2\eta \) of \(\eta \) to the \(y\)-axis is \(\int \delta _{\varphi (\mu _\omega )}d\widetilde{\mu }(\omega )\), which we have seen is Lebesgue measure. Also, since \(\dim \mu _\omega =\dim \mu \) a.s., by Lemma 3.5(1) we find that

$$\begin{aligned} \dim \eta \ge 1+\dim \mu . \end{aligned}$$

Note that the map \(F(x,y)=(x,\pi (x,y))\) preserves dimension: indeed, since \(f\in {{\mathrm{diff}}}^2(\mathbb {R})\), the derivative \(f'\) is continuously differentiable, and a direct computation shows that \(F\) is nonsingular. Thus the image \(\widetilde{\eta }=F\eta \) has dimension \(\dim \widetilde{\eta }=\dim \eta \ge 1+\dim \mu \). Also note that \(P_1\widetilde{\eta }=P_1\eta =\int \mu _\omega \,d\widetilde{\mu }(\omega )=\mu \). Since \(\mu \) is exact-dimensional, \(\dim _P(P_1\widetilde{\eta })=\dim \mu \) (see the discussion at the end of Sect. 3.3).
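The nonsingularity of \(F\) can be checked directly (a sketch, assuming \(f'>0\) so that \(\log f'\) is defined; in general one works with \(|f'|\)): with \(\pi (x,y)=y-\log f'(x)/\log \gamma \),

```latex
DF(x,y)\;=\;
\begin{pmatrix}
1 & 0\\[4pt]
-\dfrac{f''(x)}{f'(x)\log\gamma} & 1
\end{pmatrix},
\qquad
\det DF(x,y)\;=\;1 .
```

Since \(f\in {{\mathrm{diff}}}^2(\mathbb {R})\), the off-diagonal entry is continuous, hence bounded on compact sets, so \(F\) is locally bi-Lipschitz and preserves the dimension of measures.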

On the other hand, \(P_2\widetilde{\eta }=\theta '\) by definition. Thus, applying Lemma 3.5(2) to \(\widetilde{\eta }\), we conclude

$$\begin{aligned} \dim \theta '&\ge \dim \widetilde{\eta } - \dim _P (P_1\widetilde{\eta }) \\&\ge \dim \eta - \dim \mu \\&\ge 1. \end{aligned}$$

Summarizing, we have shown that \(\dim \theta '\ge 1\), and hence \(\dim \theta '=1\) since \(\theta '\) is a measure on the circle. We can therefore apply Theorem 8.2 to \(f\mu \). This completes the proof of Theorem 1.10.

8.5 Proof of Theorem 1.7

We now assume that \(\mathcal {I}\) consists of linear maps, \(\mu \) is self-similar, and \(f\in {{\mathrm{diff}}}^\omega (\mathbb {R})\) is not affine. Our aim is to prove Theorem 1.7, asserting the \(\beta \)-normality of \(f\mu \) for all Pisot \(\beta >1\). The argument is similar to what we have already seen, except that the classical projection theorems are not strong enough and we rely instead on a recent result from [28] that gives stronger bounds for self-similar measures.

If \(\mathcal {I}\) contains two maps with contraction ratios \(\lambda _1\not \sim \lambda _2\), then it follows from Theorem 1.4 that \(f\mu \) is pointwise normal in every Pisot base for every \(f\in {{\mathrm{diff}}}^2(\mathbb {R})\) (in fact, \(f\in {{\mathrm{diff}}}^1(\mathbb {R})\) suffices in this case). Thus we may assume that there is a \(\gamma >1\) such that \(\lambda (f_i)\sim \gamma ^{p_i}\) for all \(f_i\in \mathcal {I}\) and some integers \(p_i\). Let \(\beta >1\) be Pisot. Again, if \(\gamma \not \sim \beta \) then we are done by Theorem 1.4, so we assume \(\beta \sim \gamma \).

Lemma 8.9

\(\mu \) generates an ergodic distribution \(P\) with \(\Sigma (P,S)\subseteq (1/\log \gamma )\mathbb {Q}\). For each eigenvalue, the phase measure is well defined and consists of a single atom.

Proof

The first statement follows from the fact that the measure in question is a quasi-product measure, and from Theorems 7.2 and 7.3. Because the contractions are linear, it is elementary to see that for every accumulation point \(\nu \) of \(\mu _{x,t}\), there is a linear map \(f\) and an interval \(I\) with \(f\mu =c\cdot \nu |_I\) for a normalizing constant \(c\). The statement about the phase then follows from Proposition 4.15. \(\square \)

Let \(P\) be as in the lemma, and fix an eigenvalue \(\alpha \) of \(P\) with associated eigenfunction \(\varphi \). Let \(f\in {{\mathrm{diff}}}^\omega (\mathbb {R})\) be non-linear. By Corollary 4.17, we know that the phase distribution of \(f\mu \) is, up to a smooth coordinate change and in additive notation, the push-forward of \(\mu \) through \(f'\). Since \(f\) is real analytic and non-linear, \(f'\) is a piecewise diffeomorphism, and so \(\dim f'\mu = \dim \mu >0\).

Now let \(\tau _n\) be the sequence of eventually-resonant measures for \(T_\beta \)-invariant measures provided by Theorem 5.5, and observe from the construction of \(\tau _n\) that they are in fact affine combinations of self-similar measures with uniform contraction ratio \(\sim \beta \) satisfying the open set condition. Arguing through the proof of Theorem 8.1, we find that the following lemma, which replaces Lemma 8.3, allows the proof to carry through.

Lemma 8.10

Let \(\nu =f\mu \) and \(\theta =\theta _{\nu }\). Let \(P_x\) denote the distribution that is \(\log \gamma \)-generated by \(\nu \) at \(x\). Let \(\tau \) be a self-similar measure for an IFS with uniform contraction ratio a power of \(\gamma \), satisfying the open set condition, and satisfying \(\dim \tau +\dim \nu \ge 1\). Then for \(\nu \)-a.e. \(x\) and \(P_x\)-a.e. \(\eta \), we have \(\dim \tau *\eta = \min \{1,\dim \tau +\dim \eta \}\).

Proof

We have already noted that \(\dim \theta >0\). We now calculate exactly as in the proof of Lemma 8.3. The only change is that we cannot use Theorem 3.3 to deduce that \(\dim \tau *S_{\ell (z)}\eta =1\) for \(\theta \)-almost all \(z\), since all we know about \(\theta \) is that \(\dim \theta >0\).

Instead, note that up to a translation, \(S_t\eta \) is absolutely continuous with respect to the measure \(\eta \) scaled by \(e^{-t}\); since, as noted in the proof of Lemma 8.9, \(\eta \) is (up to an affine map and restriction) a copy of \(\mu \), it follows that \(\tau *S_t\eta \) is absolutely continuous with respect to the image of \(\tau \times \mu \) under the linear map \((x,y)\mapsto x+e^{-t}y\). Now, for \(\tau =\tau _n\), both \(\mu \) and \(\tau \) are self-similar with contraction ratios \(\sim \beta \). When the contraction ratios of \(\mathcal {I}\) are uniform, \(\tau \times \mu \) is also self-similar and of dimension \(>1\) (for large \(n\)), and we can invoke [28, Theorem 1.8], which implies that \(\dim \tau *S_t\eta =1\) for all \(t\) outside a set of Hausdorff dimension \(0\), and hence of \(\theta \)-measure \(0\). In the non-uniformly contracting case a minor (but not short) modification of the arguments in [28] is needed; this will appear separately. \(\square \)

This completes the proof of Theorem 1.7.