Non-normal numbers in dynamical systems fulfilling the specification property

In the present paper we focus on this dichotomy of the non-normal numbers (on the one hand they form a set of measure zero, on the other hand they are residual) for dynamical systems fulfilling the specification property. These dynamical systems are motivated by $\beta$-expansions. We consider the limiting frequencies of digits in the words of the languages arising from these dynamical systems, and show that not only is a typical $x$ in the sense of Baire non-normal, but also the iterated Cesàro averages of its digit frequencies diverge.


Introduction
Let $N \ge 2$ be an integer, called the base, and let $\Sigma := \{0, 1, \dots, N-1\}$ be the set of digits. For every $x \in [0,1)$ we denote by
$$x = \sum_{h \ge 1} \frac{d_h(x)}{N^h},$$
where $d_h(x) \in \Sigma$ for all $h \ge 1$, the unique non-terminating $N$-ary expansion of $x$. For every positive integer $n$ and every block of digits $b = b_1 \dots b_k \in \Sigma^k$ we write
$$\Pi(x, b, n) := \frac{|\{0 \le i < n : d_{i+1}(x) = b_1, \dots, d_{i+k}(x) = b_k\}|}{n}$$
for the frequency of the block $b$ among the first $n$ digits of the $N$-ary expansion of $x$. Furthermore, let $\Pi_k(x, n) := (\Pi(x, b, n))_{b \in \Sigma^k}$ be the vector of frequencies of all blocks $b$ of length $k$. We call a number $x$ $k$-normal if for every block $b \in \Sigma^k$ the limit $\lim_{n \to \infty} \Pi(x, b, n)$ exists and equals $N^{-k}$. A number is called normal with respect to base $N$ if it is $k$-normal for all $k \ge 1$, and absolutely normal if it is normal with respect to every base $N \ge 2$.
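The frequency $\Pi(x, b, n)$ can be computed directly from its definition. The following Python sketch is our own illustration, not part of the paper: it assumes a rational $x$, uses exact rational arithmetic and the greedy digit map $x \mapsto Nx - \lfloor Nx \rfloor$, and its function names are hypothetical. Note that for an $N$-adic rational this greedy procedure yields the terminating expansion rather than the non-terminating one used above, and that counting $n$ starting positions for a block of length $k$ needs $n + k - 1$ digits.

```python
from fractions import Fraction


def nary_digits(x: Fraction, base: int, count: int) -> list[int]:
    """Return the first `count` digits d_1, d_2, ... of x in [0, 1) in the given base.

    Greedy digits: d = floor(base * x), then x <- base * x - d.
    For N-adic rationals this gives the terminating expansion.
    """
    digits = []
    for _ in range(count):
        x *= base
        d = int(x)  # floor, since x >= 0
        digits.append(d)
        x -= d
    return digits


def block_frequency(digits: list[int], block: tuple[int, ...], n: int) -> Fraction:
    """Pi(x, b, n): the number of 0 <= i < n with d_{i+1} ... d_{i+k} = b, divided by n."""
    k = len(block)
    if len(digits) < n + k - 1:
        raise ValueError("need at least n + k - 1 digits")
    hits = sum(1 for i in range(n) if tuple(digits[i:i + k]) == block)
    return Fraction(hits, n)


# Example: 1/3 = 0.010101... in base 2.
digits = nary_digits(Fraction(1, 3), base=2, count=11)
print(block_frequency(digits, (0, 1), 10))  # 1/2
```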
On the one hand, it is a classical result due to Borel [6] that Lebesgue almost all numbers are absolutely normal. So the set of normal numbers is large from a measure theoretical viewpoint.
On the other hand, for a number not to be normal it suffices that, for some $k$, the frequency vector $\Pi_k(x, n)$ does not converge to the uniform vector. First results concerning the Hausdorff dimension or the Baire category of non-normal numbers were obtained by Šalát [22] and Volkmann [25]. Stronger variants of non-normality have attracted recent interest. In particular, Albeverio et al. [1,2] considered the fractal structure of essentially non-normal numbers and their variants. The theory of multifractal divergence points led to the investigation of extremely non-normal numbers by Olsen [14,15] and by Olsen and Winter [17]. The result that is important for our considerations is that both essentially and extremely non-normal numbers are large from a topological point of view.

Definitions and statement of result
We start with the definition of a dynamical system. Let M be a compact metric space and φ : M → M be a continuous map. Then we call the pair (M, φ) a (topological) dynamical system.
The second ingredient is the notion of a topological partition. Let $M$ be a metric space and let $\mathcal{P} = \{P_0, \dots, P_{N-1}\}$ be a finite collection of pairwise disjoint open sets. We call $\mathcal{P}$ a topological partition (of $M$) if $M$ is the union of the closures $\overline{P_i}$ for $i = 0, \dots, N-1$, i.e. $M = \overline{P_0} \cup \dots \cup \overline{P_{N-1}}$.
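Suppose now that a dynamical system $(M, \varphi)$ and a topological partition $\mathcal{P} = \{P_0, \dots, P_{N-1}\}$ of $M$ are given. We want to consider the symbolic dynamical system behind them. Therefore, let $\Sigma = \{0, \dots, N-1\}$ be the alphabet corresponding to the topological partition $\mathcal{P}$. Furthermore, define
$$\Sigma^k := \{a_1 a_2 \dots a_k : a_i \in \Sigma\}, \qquad \Sigma^* := \{\epsilon\} \cup \bigcup_{k \ge 1} \Sigma^k, \qquad \Sigma^{\mathbb{N}} := \{a_1 a_2 a_3 \dots : a_i \in \Sigma\}$$
to be the set of words of length $k$, the set of finite words and the set of infinite words over $\Sigma$, respectively, where $\epsilon$ is the empty word. For an infinite word $\omega = a_1 a_2 a_3 \dots \in \Sigma^{\mathbb{N}}$ and a positive integer $n$, let $\omega|n = a_1 a_2 \dots a_n$ denote the truncation of $\omega$ to the $n$-th place. Finally, for $\omega = a_1 \dots a_n \in \Sigma^*$ we denote by $[\omega]$ the cylinder set of all infinite words starting with the same letters as $\omega$, i.e.
$$[a_1 \dots a_n] := \{\eta \in \Sigma^{\mathbb{N}} : \eta|n = a_1 \dots a_n\}.$$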
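Now we describe the shift space generated by our topological partition. We call a word $\omega = a_1 a_2 \dots a_n \in \Sigma^n$ allowed for $(\mathcal{P}, \varphi)$ if
$$\bigcap_{i=0}^{n-1} \varphi^{-i}(P_{a_{i+1}}) \neq \emptyset.$$
Let $L_{\mathcal{P},\varphi}$ be the set of allowed words. Then $L_{\mathcal{P},\varphi}$ is a language and there is a unique shift space $X_{\mathcal{P},\varphi} \subseteq \Sigma^{\mathbb{N}}$ whose language is $L_{\mathcal{P},\varphi}$. We call $X_{\mathcal{P},\varphi}$ the one-sided symbolic dynamical system corresponding to $(\mathcal{P}, \varphi)$.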
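Furthermore, we split the language according to the length of the words. For $k \ge 1$ we write $L_k := \{\omega \in L_{\mathcal{P},\varphi} : |\omega| = k\}$, so that $L_{\mathcal{P},\varphi} = \bigcup_{k=1}^{\infty} L_k$. Finally, for each $\omega = a_1 a_2 a_3 \dots \in X_{\mathcal{P},\varphi}$ and $n \ge 0$ we denote by $D_n(\omega)$ the cylinder set of order $n$ corresponding to $\omega$ in $M$, i.e.
$$D_n(\omega) := \bigcap_{i=0}^{n} \varphi^{-i}(P_{a_{i+1}}).$$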
After providing all the ingredients necessary for the statement of our result, we link this concept with the $N$-ary representations of Section 1.

Example 1. Let $M = \mathbb{R}/\mathbb{Z}$ be the circle and let $\varphi : M \to M$ be defined by $\varphi(x) = Nx \pmod 1$. We divide $M$ into $N$ subintervals $P_0, \dots, P_{N-1}$ of the form $P_i = (i/N, (i+1)/N)$ and let $\Sigma = \{0, \dots, N-1\}$. Then the underlying system is the $N$-ary representation. Furthermore, it is easy to verify that the language $L_{\mathcal{P},\varphi}$ is the set of all words over $\Sigma$, so that the one-sided symbolic dynamical system $X_{\mathcal{P},\varphi}$ is the full one-sided $N$-shift $\Sigma^{\mathbb{N}}$.
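In terms of the partition, the $i$th letter of the symbolic expansion of $x$ is the index of the partition element containing $\varphi^{i-1}(x)$. The following is a minimal Python sketch of this itinerary for Example 1, under our own assumptions: the function name is hypothetical, exact rationals are used to avoid rounding, and a point whose orbit hits a boundary point $i/N$ (and therefore lies in no open $P_i$) is rejected.

```python
from fractions import Fraction


def itinerary(x: Fraction, base: int, length: int) -> list[int]:
    """Letters a_1 ... a_length with phi^(i-1)(x) in P_{a_i} = (a_i/N, (a_i+1)/N)."""
    word = []
    for _ in range(length):
        scaled = base * x
        a = int(scaled)  # candidate index: a/N <= x < (a+1)/N
        if scaled == a:  # x = a/N lies on a boundary, not in any open P_i
            raise ValueError("orbit hits the boundary of the partition")
        word.append(a)
        x = scaled - a  # phi(x) = N x mod 1
    return word


print(itinerary(Fraction(2, 7), base=3, length=6))  # [0, 2, 1, 2, 0, 1]
```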
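Our second example is the main motivation for this paper. In particular, we consider β-expansions, where $\beta > 1$ is not necessarily an integer. These systems are of special interest, since the underlying symbolic dynamical system is not the full shift. The first authors investigating these number systems were Parry [19] and Rényi [20]. For a more modern account of these number systems we refer the interested reader to the book of Dajani and Kraaikamp [7].

Example 2. Let $\beta > 1$ be a real number, let $M = [0,1)$ and let $\varphi(x) = \beta x \pmod 1$. Then the intervals
$$P_i = \left(\frac{i}{\beta}, \frac{i+1}{\beta}\right) \text{ for } 0 \le i < \lceil \beta \rceil - 1, \qquad P_{\lceil \beta \rceil - 1} = \left(\frac{\lceil \beta \rceil - 1}{\beta}, 1\right)$$
together with $\varphi$ form a number system partition of $M$. The corresponding language is called the β-shift (cf. [7,19,20]).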
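For the β-expansions of Example 2, the letters of the symbolic expansion are produced by the greedy digit map $d = \lfloor \beta x \rfloor$, $x \mapsto \beta x - d$, assuming the standard β-transformation $\varphi(x) = \beta x \pmod 1$ on $[0,1)$. The Python sketch below is an illustration under our own assumptions (floating-point arithmetic, hypothetical function names). Rounding errors grow by a factor of roughly $\beta$ per step, so only moderately many digits are reliable. For the golden mean the digits never contain the block 11, and $\sum_{h \le n} d_h \beta^{-h}$ approximates $x$ to within $\beta^{-n}$.

```python
import math


def beta_digits(x: float, beta: float, count: int) -> list[int]:
    """Greedy beta-expansion digits of x in [0, 1) (floating point, so approximate)."""
    digits = []
    for _ in range(count):
        x *= beta
        d = math.floor(x)
        digits.append(d)
        x -= d  # phi(x) = beta * x mod 1
    return digits


def beta_value(digits: list[int], beta: float) -> float:
    """Evaluate sum_h d_h * beta^(-h) for h = 1, 2, ..."""
    return sum(d * beta ** -(h + 1) for h, d in enumerate(digits))


golden = (1 + math.sqrt(5)) / 2
ds = beta_digits(0.3, golden, 30)
print("".join(map(str, ds)))
print(abs(beta_value(ds, golden) - 0.3))  # below golden ** -30
```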
Before extending the notions of normal and non-normal numbers, we investigate the properties of the β-shift in more detail. We say that a language $L$ fulfills the specification property if there exists an integer $j \ge 0$ such that any two words $a$ and $b$ can be concatenated by inserting a word of length at most $j$ between them, i.e. if for every pair $a, b \in L$ there exists a word $u \in L$ with $|u| \le j$ such that $aub \in L$. Furthermore, we call the language connected of order $j$ if this padding word can always be chosen of length exactly $j$. Note that the β-shift does not fulfill the specification property for every $\beta > 1$, but it does, for example, whenever the β-shift is of finite type, as in the case of the golden mean.
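For a shift defined by a finite list of forbidden words, the shortest admissible padding word can be found by brute force. The Python sketch below does this for the golden mean shift (forbidden word 11), which is our own illustrative choice; the function names are hypothetical. It searches padding words in order of increasing length and confirms that $j = 1$ suffices for all pairs of allowed words of length at most 4. It only checks finitely many pairs and is therefore an illustration, not a proof of the specification property.

```python
from itertools import product
from typing import Optional

FORBIDDEN = ("11",)  # golden mean shift (illustrative choice)
ALPHABET = "01"


def is_allowed(word: str) -> bool:
    """A word is allowed if it contains no forbidden block."""
    return not any(f in word for f in FORBIDDEN)


def allowed_words(length: int) -> list[str]:
    return [w for w in map("".join, product(ALPHABET, repeat=length)) if is_allowed(w)]


def shortest_padding(a: str, b: str, max_len: int = 3) -> Optional[str]:
    """Shortest u (at most max_len letters) with a + u + b allowed, or None."""
    for length in range(max_len + 1):
        for u in map("".join, product(ALPHABET, repeat=length)):
            if is_allowed(a + u + b):
                return u
    return None


pairs = [(a, b) for m in range(1, 5) for n in range(1, 5)
         for a in allowed_words(m) for b in allowed_words(n)]
print(max(len(shortest_padding(a, b)) for a, b in pairs))  # 1
```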
For the rest of the paper we suppose that $\mathcal{P}$ is a number system partition for $(M, \varphi)$ and that the corresponding symbolic dynamical system $X_{\mathcal{P},\varphi}$ fulfills the specification property with parameter $j$. Since the partition $\mathcal{P}$ and the transformation $\varphi$ are fixed, we write $X = X_{\mathcal{P},\varphi}$ and $L = L_{\mathcal{P},\varphi}$ for short.
In order to extend the definition of normal, and thus of non-normal, numbers to $M$, we need the expansion to be unique. Therefore, we suppose that $\bigcap_{n=0}^{\infty} \overline{D_n(\omega)}$ consists of exactly one point for every $\omega \in X$. This motivates the definition of the map $\pi_{\mathcal{P},\varphi} : X \to M$ by
$$\{\pi_{\mathcal{P},\varphi}(\omega)\} = \bigcap_{n=0}^{\infty} \overline{D_n(\omega)}.$$
Thus every word in $X$ is mapped to a unique point of $M$. However, the converse need not be true: a point of $M$ may have several symbolic expansions. In particular, consider Example 2 with $\beta = \frac{1+\sqrt{5}}{2}$ (the golden mean), so that $\beta^2 - \beta - 1 = 0$. The point $\frac{1}{\beta}$ lies on the common boundary of the two intervals $P_0$ and $P_1$. Since
$$\sum_{h=1}^{\infty} \beta^{-2h} = \frac{1}{\beta^2 - 1} = \frac{1}{\beta},$$
both $010101\dots$ and $100000\dots$ are expansions of $\frac{1}{\beta}$. Similarly, both $001010\dots$ and $010000\dots$ are expansions of $\frac{1}{\beta^2}$. One observes that these ambiguities originate from the common boundary points $\overline{P_i} \cap \overline{P_j}$ with $i \neq j$. Thus we concentrate on the inner points, which, in the case of the $N$-ary expansion of Example 1, correspond to the numbers that are not of the form $a/N^m$. Let
$$U := \bigcup_{i=0}^{N-1} P_i,$$
which is an open and dense ($\overline{U} = M$) set. Then for each $n \ge 1$ the set
$$U_n := \bigcap_{i=0}^{n-1} \varphi^{-i}(U)$$
is open and dense in $M$. Thus, by the Baire Category Theorem, the set
$$U_\infty := \bigcap_{n=1}^{\infty} U_n$$
is dense. Since $M \setminus U_\infty$ is a countable union of nowhere dense sets, in order to show that a set is residual in $M$ it suffices to show that it is residual in $U_\infty$. Furthermore, for $x \in U_\infty$ we call $\omega$ the symbolic expansion of $x$ if $\pi_{\mathcal{P},\varphi}(\omega) = x$. In the following we tacitly assume that $x \in U_\infty$.
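The ambiguous expansions for the golden mean can be checked numerically by evaluating $\sum_h a_h \beta^{-h}$ on long truncations of the words involved; this assumes, as is standard for β-expansions, that the word $a_1 a_2 \dots$ represents that sum. The Python sketch below (hypothetical helper name, floating-point arithmetic) confirms that $0101\dots$ and $1000\dots$ both represent $\frac{1}{\beta}$, that $001010\dots$ and $0100\dots$ both represent $\frac{1}{\beta^2}$, and that $1010\dots$ represents $1$.

```python
import math

BETA = (1 + math.sqrt(5)) / 2  # golden mean, beta^2 = beta + 1


def value(word: str, beta: float = BETA) -> float:
    """Evaluate sum_h a_h * beta^(-h) for a finite word a_1 a_2 ... a_n."""
    return sum(int(a) * beta ** -(h + 1) for h, a in enumerate(word))


# Truncations after 60 letters; the neglected tails are smaller than beta^(-58).
print(value("01" * 30), value("1"), 1 / BETA)
print(value("00" + "10" * 29), value("01"), 1 / BETA ** 2)
print(value("10" * 30))
```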
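After defining the environment, we carry the definitions of normal and non-normal numbers over to the symbolic dynamical system. To this end let $b = b_1 \dots b_k \in \Sigma^k$ be a block of letters of length $k$ and let $\omega = a_1 a_2 a_3 \dots \in X$ be the symbolic representation of an element. Then we write
$$P(\omega, b, n) := \frac{|\{0 \le i < n : a_{i+1} = b_1, \dots, a_{i+k} = b_k\}|}{n}$$
for the frequency of the block $b$ among the first $n$ letters of $\omega$. As above, let $P_k(\omega, n) := (P(\omega, b, n))_{b \in \Sigma^k}$ be the vector of frequencies of all blocks $b$ of length $k$ among the first $n$ letters of $\omega$. Let $\mu$ be a $\varphi$-invariant probability measure on $X$ and let $\omega \in X$. We call the measure $\mu$ associated to $\omega$ if there exists an infinite subset $F \subseteq \mathbb{N}$ such that for every $k \ge 1$ and every block $b \in \Sigma^k$
$$\lim_{\substack{n \to \infty \\ n \in F}} P(\omega, b, n) = \mu([b]).$$
Furthermore, we call $\omega$ a generic point for $\mu$ if we can take $F = \mathbb{N}$; then $\mu$ is the only measure associated with $\omega$. For a $\varphi$-invariant probability measure $\mu$ on $X$ we define its entropy by
$$h(\mu) := \lim_{n \to \infty} -\frac{1}{n} \sum_{a_1 \dots a_n \in \Sigma^n} \mu([a_1 \dots a_n]) \log \mu([a_1 \dots a_n]).$$
We call $\mu$ a maximal measure if its entropy is maximal among all $\varphi$-invariant probability measures on $X$, and we call $\omega$ normal if it is a generic point for a maximal measure.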
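To make the entropy formula concrete, the following Python sketch evaluates $-\frac{1}{n} \sum \mu([a_1 \dots a_n]) \log \mu([a_1 \dots a_n])$ for a Bernoulli (i.i.d.) measure on the full shift, where the value does not depend on $n$. The Bernoulli example and the function names are our own illustrative choices, not taken from the paper. For the uniform measure on $\{0,1\}^{\mathbb{N}}$, which is the maximal measure of the full 2-shift, the value is $\log 2$.

```python
import math
from itertools import product


def bernoulli(probs: dict[int, float]):
    """Return the cylinder measure mu([a_1 ... a_n]) = probs[a_1] * ... * probs[a_n]."""
    def mu(word):
        return math.prod(probs[a] for a in word)
    return mu


def block_entropy(mu, alphabet, n: int) -> float:
    """-(1/n) * sum over all words w of length n of mu([w]) * log mu([w])."""
    total = 0.0
    for word in product(alphabet, repeat=n):
        p = mu(word)
        if p > 0:
            total -= p * math.log(p)
    return total / n


uniform = bernoulli({0: 0.5, 1: 0.5})
print(block_entropy(uniform, [0, 1], 4), math.log(2))
```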
The existence of a maximal measure for the β-shift was proven independently by Gelfond [8] and Parry [19]. Bertrand-Mathis [4] constructed such an invariant measure for any dynamical system fulfilling the specification property by generalizing the construction of Champernowne. She also showed that this measure is ergodic and strongly mixing, that its entropy is log β, and that it is generic for the maximal measure. An application of Birkhoff's ergodic theorem yields that almost all ω ∈ X are normal (cf. Section 3.1.2 of [7]).
Normal sequences for β-shifts were constructed by Ito and Shiokawa [10]; however, these expansions provide no admissible numbers. Furthermore, Bertrand-Mathis and Volkmann [5] constructed normal numbers on connected dynamical systems.
We note that we equivalently could have defined the measure-theoretic dynamical system with respect to M instead of X. However, since the definition of essentially and extremely non-normal numbers does not depend on this, we will not consider this in the following.
As already mentioned above, the aim of the present paper is to show that the non-normal numbers form a large set in the topological sense. Sigmund [24] showed that for any dynamical system fulfilling the specification property the set of non-normal numbers is residual. In the present paper we show that even smaller sets, namely the essentially and the extremely non-normal numbers, are also residual.
We start by defining the simplex of all probability vectors
$$\Delta_k := \Big\{ p = (p_b)_{b \in \Sigma^k} : p_b \ge 0 \text{ for all } b \in \Sigma^k, \ \sum_{b \in \Sigma^k} p_b = 1 \Big\}.$$
Let $\|\cdot\|_1$ denote the 1-norm; then $(\Delta_k, \|\cdot\|_1)$ is a metric space. On the one hand, every vector $P_k(\omega, n)$ of frequencies of blocks of length $k$ clearly belongs to $\Delta_k$. On the other hand, not every vector of $\Delta_k$ can be reached. Suppose, for example, that the word 11 is forbidden in the expansion. Then the maximal frequency of the single letter 0 is 1, whereas that of the letter 1 is $\frac{1}{2}$. Therefore, the probability vector $(0, 1)$ cannot be reached.
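The bound for the frequency of the letter 1 when 11 is forbidden can be confirmed by brute force over all words of a fixed length. In the short Python sketch below (hypothetical function name), the maximum is attained by words such as $1010\dots$ and equals $\lceil n/2 \rceil / n$, which tends to $\frac{1}{2}$.

```python
from fractions import Fraction
from itertools import product


def max_frequency_of_one(n: int) -> Fraction:
    """Largest proportion of 1s among words of length n avoiding the block 11."""
    best = 0
    for letters in product("01", repeat=n):
        word = "".join(letters)
        if "11" not in word:
            best = max(best, word.count("1"))
    return Fraction(best, n)


print([max_frequency_of_one(n) for n in range(1, 7)])
```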
Let $A_k(\omega)$ be the set of accumulation points of the sequence $(P_k(\omega, n))_n$ with respect to $\|\cdot\|_1$, i.e. for $\omega \in X$ we set
$$A_k(\omega) := \{p \in \Delta_k : p \text{ is an accumulation point of } (P_k(\omega, n))_n\}.$$
Then we define $S_k$ as the union of all possible accumulation points, i.e.
$$S_k := \bigcup_{\omega \in X} A_k(\omega).$$

We note that in the case of $N$-ary expansions this definition yields exactly the shift-invariant probability vectors (cf. Theorem 0 of Olsen [16]).
We call a number $\omega \in X$ essentially non-normal if for every $i \in \Sigma$ the limit $\lim_{n \to \infty} P(\omega, i, n)$ does not exist. For the case of $N$-ary expansions Albeverio et al. [1,2] proved the following. This result has been generalized by the first author [11] to Markov partitions whose underlying language is the full shift. Our first result is the following generalization.
Remark 2.2. The requirement that for each digit there are at least two possible distributions ensures that the underlying language is not too simple. For example, we want to exclude the shift over the alphabet $\{0, 1\}$ with forbidden words 00 and 11: its only infinite words are $0101\dots$ and $1010\dots$, so the frequency of each digit converges to $\frac{1}{2}$ and no essentially non-normal word exists.
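A toy example in the full binary shift of Example 1 shows how the limit of $P(\omega, i, n)$ can fail to exist for every digit $i$. This is our own illustration, not the Baire-category construction of this paper: concatenate blocks $0^1 1^2 0^4 1^8 \dots$ of doubling length. At the end of each block of 1s the frequency of the digit 1 equals exactly $\frac{2}{3}$, while at the end of each block of 0s it tends to $\frac{1}{3}$, so neither digit frequency converges. The function names below are hypothetical.

```python
from fractions import Fraction


def oscillating_word(num_blocks: int) -> list[int]:
    """Blocks 0^1 1^2 0^4 1^8 ...: block m has length 2**m and consists of the digit m % 2."""
    word = []
    for m in range(num_blocks):
        word.extend([m % 2] * 2 ** m)
    return word


def running_frequency_of_one(word: list[int]) -> list[Fraction]:
    """The n-th entry is P(omega, 1, n) for n = 1, ..., len(word)."""
    ones, freqs = 0, []
    for n, digit in enumerate(word, start=1):
        ones += digit
        freqs.append(Fraction(ones, n))
    return freqs


f = running_frequency_of_one(oscillating_word(11))
# Block m ends at n = 2**(m + 1) - 1, i.e. at list index 2**(m + 1) - 2.
print(f[2 ** 10 - 2], f[2 ** 11 - 2])  # 2/3 682/2047
```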
A different concept of non-normal numbers are those being arbitrarily close to any given configuration. In particular, we want to generalize the idea of extremely non-normal numbers and their Cesàro variants to the setting of number system partitions.
For every infinite word $\omega \in X$ we clearly have $A_k(\omega) \subseteq S_k$. We call $\omega \in X$ extremely non-$k$-normal if the set of accumulation points of the sequence $(P_k(\omega, n))_n$ (with respect to $\|\cdot\|_1$) is as large as possible, i.e. $A_k(\omega) = S_k$. Furthermore, we call a number extremely non-normal if it is extremely non-$k$-normal for all $k \ge 1$.
The set of extremely non-normal numbers for the $N$-ary representation has been considered by Olsen [16]. Theorem ([16, Theorem 1]). Let $(\mathcal{P}, \varphi)$ be the $N$-ary expansion of Example 1. Then the set of extremely non-normal numbers is residual in $M$.
This result was generalized to iterated function systems by Baek and Olsen [3] and to finite Markov partitions by the first author [11]. Furthermore, number systems with an infinite set of digits, like the continued fraction expansion and the Lüroth expansion, were considered by Olsen [12] and Šalát [21], respectively. Finally, Šalát [23] considered the Hausdorff dimension of sets with digital restrictions with respect to the Cantor series expansion.
We want to extend this notion to the Cesàro averages of the frequencies. To this end, for a fixed block $b = b_1 \dots b_k \in \Sigma^k$ let $P^{(0)}(\omega, b, n) := P(\omega, b, n)$.
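For $r \ge 1$ we recursively define
$$P^{(r)}(\omega, b, n) := \frac{1}{n} \sum_{j=1}^{n} P^{(r-1)}(\omega, b, j)$$
to be the $r$th iterated Cesàro average of the frequency of the block $b$ among the first $n$ digits. Furthermore, we denote by $P_k^{(r)}(\omega, n) := (P^{(r)}(\omega, b, n))_{b \in \Sigma^k}$ the vector of $r$th iterated Cesàro averages. As above, we are interested in the accumulation points. Thus, similarly to above, let $A_k^{(r)}(\omega)$ be the set of accumulation points of the sequence $(P_k^{(r)}(\omega, n))_n$ with respect to $\|\cdot\|_1$, and set
$$E_k^{(r)} := \{\omega \in X : A_k^{(r)}(\omega) = S_k\}, \qquad E^{(r)} := \bigcap_{k \ge 1} E_k^{(r)}, \qquad E := \bigcap_{r \ge 1} E^{(r)}.$$
As above, this has already been considered for the case of the $N$-ary expansion by Hyde et al. [9].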
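The recursion for $P^{(r)}$ is an $r$-fold running mean. The Python sketch below (hypothetical function names, exact rationals) applies it to a finite sequence of frequencies. Repeated averaging smooths oscillations, which is why it is notable that, for a typical point in the sense of Baire, the iterated averages still fail to converge.

```python
from fractions import Fraction


def cesaro(seq):
    """Running means: the n-th output is (seq[0] + ... + seq[n-1]) / n."""
    out, total = [], Fraction(0)
    for n, value in enumerate(seq, start=1):
        total += value
        out.append(total / n)
    return out


def iterated_cesaro(seq, r: int):
    """Apply the running mean r times; r = 0 returns the sequence unchanged."""
    seq = list(seq)
    for _ in range(r):
        seq = cesaro(seq)
    return seq


print(iterated_cesaro([1, 0, 1, 0], 2))
```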
Theorem ([9, Theorem 1.1]). Let $(\mathcal{P}, \varphi)$ be the $N$-ary representation of Example 1. Then for all $r \ge 1$ the set $E_1^{(r)}$ is residual.

In the context of extremely non-normal numbers our result is the following.

Theorem 2.3. Let $k$, $r$ and $N$ be positive integers. Furthermore, let $\mathcal{P} = \{P_0, \dots, P_{N-1}\}$ be a number system partition for $(M, \varphi)$. Suppose that $L_{\mathcal{P},\varphi}$ fulfills the specification property. Then the set $E_k^{(r)}$ is residual.

Since the sets $E^{(r)}$ and $E$ are countable intersections of sets $E_k^{(r)}$, we get the following.

Corollary 2.4. Let $N$ be a positive integer and let $\mathcal{P} = \{P_0, \dots, P_{N-1}\}$ be a number system partition for $(M, \varphi)$. Suppose that $L_{\mathcal{P},\varphi}$ fulfills the specification property. Then the sets $E^{(r)}$ and $E$ are residual.

Proof of Theorem 2.1
Before we start proving Theorem 2.1, we construct sets $Z_n$ which we use in order to "measure" the distance between the proportion of occurrences of blocks and $q$. Let $k \in \mathbb{N}$ and $q \in S_k$ be fixed. For $n \ge 1$ let The main idea now consists in the construction of a word having the desired frequencies. In particular, for a given word $\omega$ we want to show that we can append a word from $Z_n$ to obtain a word whose frequency vector is sufficiently close to $q$. Therefore, we first need to know that $Z_n$ is nonempty. Proof. This is essentially Theorem 6 of [13] (see also Theorem 7.1 of [18]).
Our setting differs from that of Olsen in two main respects. On the one hand, we consider dynamical systems fulfilling the specification property, whereas Olsen [13] investigates subshifts of finite type modelled by a directed and strongly connected multigraph. However, his proof never uses the finiteness of the set of forbidden words, so his results remain true if we replace the subshift of finite type by one fulfilling the specification property.
Another difference is that we have a number system partition, whereas Olsen [13] analyses a graph directed self-conformal iterated function system satisfying the Strong Open Set Condition. In an iterated function system one starts with the maps, which then determine the partition, whereas in our case we start with a partition and then restrict the map to its elements. Therefore, this only changes the point of view. Furthermore, the Strong Open Set Condition is satisfied by the topological partition we use.
After adapting these differences the proof runs along the same lines.
Lemma 3.2. For all $k \in \mathbb{N}$, $q \in S_k$ and $n \ge 1$ we have $Z_n(q, k) \neq \emptyset$.
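Proof. It follows from Theorem 3.1, via its formula for $\dim\{\omega \in X : \lim_{n\to\infty} P_k(\omega, n) = q\}$, that $\{\omega \in X : \lim_{n \to \infty} P_k(\omega, n) = q\} \neq \emptyset$. Thus we choose $\omega \in X$ such that $\lim_{n \to \infty} P_k(\omega, n) = q$. Then for sufficiently large $\ell$ the truncated word $\omega|\ell$ lies in $Z_n$.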
Since we may not concatenate arbitrary words, we use the specification property to define a modified concatenation. For any pair of words $a, b \in L$ we fix a word $u_{a,b}$ with $|u_{a,b}| \le j$ such that $a u_{a,b} b \in L$. Then for $a_1, \dots, a_m \in L$ and $n \in \mathbb{N}$ we write
$$a_1 \odot a_2 \odot \dots \odot a_m := a_1 u_{a_1,a_2} a_2 u_{a_2,a_3} a_3 \cdots a_{m-1} u_{a_{m-1},a_m} a_m$$
and
$$a^{\odot n} := \underbrace{a \odot a \odot \dots \odot a}_{n \text{ times}}.$$
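The modified concatenation can be sketched in Python for the golden mean shift with $j = 1$; the shift, the constant `J` and the function names are our own illustrative assumptions. One design choice differs slightly from the text: the sketch chooses each padding word against the whole word built so far, which guarantees that the result is allowed. For the golden mean shift this coincides with choosing $u_{a_i,a_{i+1}}$ pairwise, since only the two letters at each junction matter.

```python
from functools import reduce
from itertools import product

FORBIDDEN = ("11",)  # golden mean shift (illustrative choice)
ALPHABET = "01"
J = 1  # specification constant for this shift


def is_allowed(word: str) -> bool:
    return not any(f in word for f in FORBIDDEN)


def padding(a: str, b: str) -> str:
    """A shortest word u with |u| <= J such that a + u + b is allowed."""
    for length in range(J + 1):
        for u in map("".join, product(ALPHABET, repeat=length)):
            if is_allowed(a + u + b):
                return u
    raise ValueError("no padding word of length <= J")


def odot(*words: str) -> str:
    """a_1 (.) a_2 (.) ... (.) a_m, inserting a padding word at each junction."""
    return reduce(lambda left, right: left + padding(left, right) + right, words)


def odot_power(a: str, n: int) -> str:
    """a^(.n) = a (.) a (.) ... (.) a with n copies of a."""
    return odot(*([a] * n))


print(odot("1", "1"), odot_power("101", 4))  # 101 101010101010101
```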
Then we have the following result.
Let $I_n$ be the interior of $C_n$. Let $D_n = \bigcup_{k=n}^{\infty} I_k$ and $F = \bigcap_{n=1}^{\infty} D_n$. Clearly, each $D_n$ is open and dense in $X_{\mathcal{P},\varphi}$. Since $F$ is a countable intersection of open and dense sets, it is residual. It remains to show that every $w \in F$ is essentially non-normal. Let $w \in F$. Since $w \in D_n$ for every $n$, there exists a strictly increasing sequence $(n_k)_k$ of positive integers such that $w \in C_{n_k}$ for all $k$. Lemma 3.3 then implies that for each digit $i \in \Sigma$ the sequence $(P(w, i, n))_n$ does not converge.

Proof of Theorem 2.3
Now we draw our attention to the case of extremely non-normal numbers and their Cesàro variants. Let k ∈ N and q ∈ S k be fixed throughout the rest of this section. We consider how many copies of elements in Z n we have to add in order to get the desired properties.
Lemma 4.1. Let $q \in S_k$ and let $n, t$ be positive integers. Furthermore, let $\omega = \omega_1 \dots \omega_t \in L_t$ be a word of length $t$. Then, for any $\gamma \in Z_n(q, k)$ and any Proof. We set $s := |\gamma|$, $\sigma := \omega \odot \gamma^{\odot \ell}$ and $L := |\sigma|$. For a fixed block $i \in \Sigma^k$, an occurrence of $i$ in $\sigma$ lies inside $\omega$, inside one of the copies of $\gamma$, or overlaps one of the junctions between them. Thus for every $i \in \Sigma^k$ we clearly have that Now we concentrate on the occurrences inside the copies of $\gamma$ and show that we may neglect all other occurrences, i.e.
We will estimate both parts separately. For the first one we get that where we have used that L ≥ ℓs ≥ ℓn(k + j) |L k |.
For the second part we get that Putting these together yields By our assumptions on the size of ℓ in (4.1) this proves the lemma.
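The mechanism behind Lemma 4.1 can be observed numerically: appending many padded copies of $\gamma$ to a fixed word $\omega$ drives the frequency vector of $\sigma = \omega \odot \gamma^{\odot \ell}$ towards that of $\gamma$, because the contributions of $\omega$ and of the junctions stay bounded while $|\sigma|$ grows linearly in $\ell$. The Python sketch below is our own illustration for the golden mean shift with $k = 1$, $\omega = 0000000$ and $\gamma = 10$ (all hypothetical choices); here no padding is needed and the $\|\cdot\|_1$-distance equals $7/(7 + 2\ell)$. Block frequencies of a finite word are normalised by the number of starting positions.

```python
from fractions import Fraction
from functools import reduce
from itertools import product

FORBIDDEN = ("11",)  # golden mean shift (illustrative choice)
ALPHABET = "01"


def is_allowed(word: str) -> bool:
    return not any(f in word for f in FORBIDDEN)


def padding(a: str, b: str, j: int = 1) -> str:
    """A shortest word u with |u| <= j such that a + u + b is allowed."""
    for length in range(j + 1):
        for u in map("".join, product(ALPHABET, repeat=length)):
            if is_allowed(a + u + b):
                return u
    raise ValueError("no padding word of length <= j")


def odot(*words: str) -> str:
    return reduce(lambda left, right: left + padding(left, right) + right, words)


def frequency_vector(word: str, k: int) -> dict[str, Fraction]:
    """Frequencies of all blocks of length k, divided by the number of positions."""
    positions = len(word) - k + 1
    counts = {"".join(b): 0 for b in product(ALPHABET, repeat=k)}
    for i in range(positions):
        counts[word[i:i + k]] += 1
    return {b: Fraction(c, positions) for b, c in counts.items()}


def l1_distance(p: dict[str, Fraction], q: dict[str, Fraction]) -> Fraction:
    return sum((abs(p[b] - q[b]) for b in p), Fraction(0))


omega, gamma = "0" * 7, "10"
target = frequency_vector(gamma, 1)
for ell in (1, 10, 100):
    sigma = odot(omega, *([gamma] * ell))
    print(ell, l1_distance(frequency_vector(sigma, 1), target))
```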
As in the papers of Olsen [12,16], our main idea is to construct a residual set $E \subseteq E_k^{(r)}$. First we simplify the notation. To this end we recursively define the functions $\phi_1(x) = 2^x$ and $\phi_m(x) = \phi_1(\phi_{m-1}(x))$ for $m \ge 2$. Furthermore, we set $D = \mathbb{Q}^{N^k} \cap S_k$. Since $D$ is countable and dense in $S_k$, we may concentrate on the probability vectors $q \in D$.
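The functions $\phi_m$ are iterated exponentials (towers of 2s), so the windows $j < n < \phi_m(j)$ in the definition of property P become enormous very quickly. A direct Python transcription (hypothetical function name) is:

```python
def phi(m: int, x: int) -> int:
    """phi_1(x) = 2**x and phi_m(x) = phi_1(phi_{m-1}(x)) for m >= 2."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    value = x
    for _ in range(m):
        value = 2 ** value
    return value


print([phi(1, 3), phi(2, 3), phi(3, 2)])  # [8, 256, 65536]
```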
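Now we say that a sequence $(x_n)_n$ in $\mathbb{R}^{N^k}$ has property P if for all $q \in D$, $m \in \mathbb{N}$, $i \in \mathbb{N}$ and $\varepsilon > 0$ there exists $j \in \mathbb{N}$ satisfying:
(1) $j \ge i$;
(2) if $j < n < \phi_m(j)$, then $\|x_n - q\| < \varepsilon$.
Then we define our set $E$ to consist of all points whose sequence of frequency vectors has property P, i.e.
$$E := \{x = \pi_{\mathcal{P},\varphi}(\omega) \in U_\infty : (P_k(\omega, n))_n \text{ has property P}\}.$$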
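Proof. For fixed $h, m, i \in \mathbb{N}$ and $q \in D$ we say that a sequence $(x_n)_n$ in $\mathbb{R}^{N^k}$ has property $P_{h,m,q,i}$ if for every $\varepsilon > 1/h$ there exists $j \in \mathbb{N}$ with $j \ge i$ such that $\|x_n - q\| < \varepsilon$ whenever $j < n < \phi_m(2^j)$. Now let $E_{h,m,q,i}$ be the set of all points whose sequence of frequency vectors satisfies property $P_{h,m,q,i}$, i.e.
$$E_{h,m,q,i} := \{x = \pi_{\mathcal{P},\varphi}(\omega) \in U_\infty : (P_k(\omega, n))_n \text{ has property } P_{h,m,q,i}\}.$$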
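(1) $E_{h,m,q,i}$ is open. Let $x \in E_{h,m,q,i}$, let $j$ be as in property $P_{h,m,q,i}$, let $\omega \in X$ be such that $x = \pi(\omega)$, and set $t := \phi_m(2^j)$. Since $D_t(\omega)$ is open, there exists $\delta > 0$ such that the ball $B(x, \delta) \subseteq D_t(\omega)$. Furthermore, since all $y \in D_t(\omega)$ have their first $\phi_m(2^j)$ digits in common with $x$, we get that $B(x, \delta) \subseteq E_{h,m,q,i}$.

(2) $E_{h,m,q,i}$ is dense. Let $x \in U_\infty$ and $\delta > 0$. We must find $y \in B(x, \delta) \cap E_{h,m,q,i}$.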
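Let $\omega \in X$ be such that $x = \pi(\omega)$. Since $\operatorname{diam} D_t(\omega) \to 0$ as $t \to \infty$ and $x \in D_t(\omega)$, there exists $t$ such that $D_t(\omega) \subseteq B(x, \delta)$. Let $\sigma = \omega|t$ be the word formed by the first $t$ digits of $x$.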
Now an application of Lemma 3.2 yields that there exists a finite word $\gamma$ such that Let $\varepsilon \ge 1/h$ and let $L$ be as in the statement of Lemma 4.1. Then we choose $j$ such that $j/2^j < \varepsilon$ and $j \ge \max(L, i)$. An application of Lemma 4.1 then gives us that Thus we choose $y \in D_j(\sigma\gamma^*)$. Then, on the one hand, $y \in D_j(\sigma\gamma^*) \subseteq D_t(\omega) \subseteq B(x, \delta)$ and, on the other hand, $y \in D_j(\sigma\gamma^*) \subseteq E_{h,m,q,i}$.

It follows that $E$ is a countable intersection of open and dense sets and is therefore residual in $U_\infty$.

Lemma 4.3. Let $\omega \in X_{\mathcal{P},\varphi}$. If $(P_k^{(r)}(\omega, n))_{n=1}^{\infty}$ has property P, then $(P_k^{(r+1)}(\omega, n))_{n=1}^{\infty}$ also has property P. This is Lemma 2.2 of [9]. However, the proof is short, so we present it here for completeness.

Proof of Theorem 2.3. Since by Lemma 4.2 the set $E$ is residual in $U_\infty$ and by Lemma 4.4 $E$ is a subset of $E_k^{(r)}$, the set $E_k^{(r)}$ is residual in $U_\infty$. Again we note that $M \setminus U_\infty$ is a countable union of nowhere dense sets, and therefore $E_k^{(r)}$ is also residual in $M$.