Non-normal numbers with respect to Markov partitions

We call a real number normal if for any block of digits the asymptotic 
frequency of this block in the $N$-adic expansion equals the expected 
one. In the present paper we consider non-normal numbers and, in 
particular, essentially and extremely non-normal numbers. We call a 
real number essentially non-normal if for each single digit there 
exists no asymptotic frequency of its occurrence. Furthermore we call 
a real number extremely non-normal if all possible probability vectors 
are accumulation points of the sequence of frequency vectors. Our aim 
now is to extend and generalize these results to Markov partitions.


Introduction
Let $N \ge 2$ be an integer, called the base, and let $D := \{0, 1, \ldots, N-1\}$ be the set of digits. For every $x \in [0,1)$ we denote by
$$x = \sum_{h=1}^{\infty} d_h(x) N^{-h},$$
where $d_h(x) \in D$ for all $h \ge 1$, the unique non-terminating $N$-ary expansion of $x$. For every positive integer $n$ and every block of digits $b = b_1 \ldots b_k \in D^k$ we write
$$\Pi(x, b, n) := \frac{|\{0 \le i < n : d_{i+1}(x) = b_1, \ldots, d_{i+k}(x) = b_k\}|}{n}$$
for the frequency of the block $b$ among the first $n$ digits of the $N$-ary expansion of $x$. Furthermore let $\Pi_k(x, n) := (\Pi(x, b, n))_{b \in D^k}$ be the vector of frequencies of all blocks $b$ of length $k$. We call a number $k$-normal if for every block $b \in D^k$ the limit of the frequency $\Pi(x, b, n)$ exists and equals $N^{-k}$. A number is called normal with respect to base $N$ if it is $k$-normal for all $k \ge 1$, and absolutely normal if it is normal to every base $N \ge 2$.
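The block frequency defined above can be made concrete with a small computation. The following is a minimal sketch, not part of the original paper: the function names `digits` and `block_frequency` are hypothetical, and the digits are produced by the greedy algorithm, which agrees with the non-terminating expansion except for rationals whose expansion terminates.

```python
from fractions import Fraction

def digits(x, base, n):
    """Return the first n digits of the greedy base-`base` expansion of x in [0, 1)."""
    out = []
    for _ in range(n):
        x *= base
        d = int(x)          # the integer part is the next digit
        out.append(d)
        x -= d
    return out

def block_frequency(ds, block, n):
    """Frequency of `block` among starting positions 0..n-1 of the digit list ds."""
    k = len(block)
    assert len(ds) >= n + k - 1, "need n + k - 1 digits"
    hits = sum(1 for i in range(n) if tuple(ds[i:i + k]) == tuple(block))
    return Fraction(hits, n)

ds = digits(Fraction(1, 3), 2, 11)        # 1/3 = 0.010101... in base 2
print(block_frequency(ds, (0, 1), 10))    # block 01 starts at every second position
print(block_frequency(ds, (0, 0), 10))    # block 00 never occurs
```

For $x = 1/3$ in base 2 the block 01 has frequency $1/2$ rather than $1/4$, so this number is not 2-normal.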
On the one hand it is a classical result due to Borel [5] that Lebesgue almost all numbers are absolutely normal. So the set of normal numbers is large from a measure theoretical viewpoint.
On the other hand, a number already fails to be normal if the limit of the frequency vector is not the uniform one. First results concerning the Hausdorff dimension or the Baire category of non-normal numbers were obtained by Šalát [16] and Volkmann [19]. Stronger variants of non-normality have attracted recent interest. In particular, Albeverio et al. [1,2] considered the fractal structure of essentially non-normal numbers and their variants. The theory of multifractal divergence points led to the investigation of extremely non-normal numbers by Olsen [11,12] and Olsen and Winter [14]. The important result for our considerations is that both essentially and extremely non-normal numbers are large from a topological point of view. The present paper focuses on this dichotomy of the non-normal numbers: on the one hand they form a set of measure zero, and on the other hand they are residual.
Before we get to the statements and their proofs we define essentially and extremely non-normal numbers. We call a number essentially non-normal if no single digit has a limiting frequency, i.e. for every $b \in D$ the limit $\lim_{n\to\infty} \Pi(x, b, n)$ does not exist. Let $L$ be the set of essentially non-normal numbers in base $N$. Then Albeverio et al. [1,2] proved, amongst other results, the following.
Theorem ([1, 2, Theorem 1]). The set $L$ is residual, i.e. $[0,1) \setminus L$ is of the first category. In particular, the set $L$ is of the second category.
Another class of non-normal numbers are the extremely non-normal numbers. Let $S_k$ be the set of shift invariant probability vectors, i.e.
$$S_k := \Big\{ (p_{\mathbf{i}})_{\mathbf{i} \in D^k} : p_{\mathbf{i}} \ge 0,\ \sum_{\mathbf{i} \in D^k} p_{\mathbf{i}} = 1,\ \sum_{i \in D} p_{i\mathbf{i}} = \sum_{i \in D} p_{\mathbf{i}i} \text{ for all } \mathbf{i} \in D^{k-1} \Big\}.$$
Then we call a number $x \in [0,1]$ extremely non-$k$-normal (with respect to base $N$) if each shift invariant probability vector $p \in S_k$ occurs as an accumulation point of the sequence $(\Pi_k(x, n))_n$. Furthermore we call a number $x \in [0,1]$ extremely non-normal (with respect to $N$) if it is extremely non-$k$-normal (with respect to $N$) for every $k \ge 1$. We denote by $E_{k,N}$ and $E_N$ the sets of extremely non-$k$-normal and extremely non-normal numbers, respectively. Finally let $E$ be the set of extremely non-normal numbers to all bases $N \ge 2$, i.e.
$$E := \bigcap_{N \ge 2} E_N.$$
Olsen [13] proved that the set $E$ is large in the topological sense.
Theorem ([13, Theorem 1]). The set $E$ is residual, i.e. $[0,1) \setminus E$ is of the first category. In particular, the set $E$ is of the second category.
This result was generalized to iterated function systems by Baek and Olsen [3] and is closely related to our Theorem 2.4. However, on the one hand we use a direct approach for the proof, whereas Baek and Olsen map the iterated function system to the $N$-ary number system. On the other hand we also consider Cesàro variants of the extremely non-normal numbers. Furthermore, number systems with an infinite set of digits, like the continued fraction expansion and the Lüroth expansion, were considered by Olsen [10] and Šalát [15], respectively. Finally, Šalát [17] considered the Hausdorff dimension of sets with digital restrictions with respect to the Cantor series expansion.
We now want to introduce the Cesàro variants of extremely non-normal numbers. The main idea is that if the limit of $\Pi(x, b, n)$ does not exist, the Cesàro limit might. In particular, for $i \in D$ let
$$\Pi^{(1)}(x, i, n) := \Pi(x, i, n)$$
and, for $r \ge 2$, let
$$\Pi^{(r)}(x, i, n) := \frac{1}{n} \sum_{j=1}^{n} \Pi^{(r-1)}(x, i, j)$$
be the $r$th iterated Cesàro average. As above let $\Pi^{(r)}(x, n) := (\Pi^{(r)}(x, i, n))_{i \in D}$ denote the vector of $r$th iterated Cesàro averages. Hyde et al. [8] considered the Cesàro averages of the frequencies and showed that the corresponding extremely non-normal numbers again form a residual set.
The main goal of this paper is to generalize and extend the above mentioned results to Markov partitions. In our definitions regarding symbolic dynamical systems we mainly follow Chapter 6 of Lind and Marcus [9].
We start with the definition of a dynamical system. Let M be a compact metric space and φ : M → M be a continuous map. Then we call the pair (M, φ) a dynamical system.
The second ingredient is the definition of a topological partition. Let $M$ be a metric space and let $\mathcal{P} = \{P_0, \ldots, P_{N-1}\}$ be a finite collection of disjoint open sets. Then we call $\mathcal{P}$ a topological partition (of $M$) if $M$ is the union of the closures $\overline{P_i}$ for $i = 0, \ldots, N-1$, i.e.
$$M = \overline{P_0} \cup \cdots \cup \overline{P_{N-1}}.$$
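Suppose now that a dynamical system $(M, \varphi)$ and a topological partition $\mathcal{P} = \{P_0, \ldots, P_{N-1}\}$ of $M$ are given. We want to consider the underlying symbolic dynamical system. To this end let $\Sigma = \{0, \ldots, N-1\}$ be the alphabet corresponding to the topological partition $\mathcal{P}$. Furthermore define
$$\Sigma^k := \{0, \ldots, N-1\}^k, \qquad \Sigma^* := \bigcup_{k \ge 1} \Sigma^k, \qquad \Sigma^{\mathbb{N}} := \{0, \ldots, N-1\}^{\mathbb{N}}$$
to be the set of words of length $k$, the set of finite words and the set of infinite words over $\Sigma$, respectively. For an infinite word $\omega = a_1 a_2 a_3 \ldots \in \Sigma^{\mathbb{N}}$ and a positive integer $n$, let $\omega|n = a_1 a_2 \ldots a_n$ denote the truncation of $\omega$ to the $n$-th place. Finally, for $\omega \in \Sigma^*$ of length $n$ we denote by $[\omega]$ the cylinder set of all infinite words starting with the same letters as $\omega$, i.e.
$$[\omega] := \{\gamma \in \Sigma^{\mathbb{N}} : \gamma|n = \omega\}.$$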
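Now we want to describe the shift space that is generated by our partition. We call a word $a_1 a_2 \ldots a_n \in \Sigma^n$ allowed for $(\mathcal{P}, \varphi)$ if $\bigcap_{k=1}^{n} \varphi^{-(k-1)}(P_{a_k}) \neq \emptyset$. Let $L_{\mathcal{P},\varphi}$ be the set of allowed words. Then $L_{\mathcal{P},\varphi}$ is a language and there is a unique shift space $X_{\mathcal{P},\varphi} \subseteq \Sigma^{\mathbb{N}}$ whose language is $L_{\mathcal{P},\varphi}$. We call $X_{\mathcal{P},\varphi}$ the one-sided symbolic dynamical system corresponding to $(\mathcal{P}, \varphi)$. Finally, for each $\omega = a_1 a_2 a_3 \ldots \in X_{\mathcal{P},\varphi}$ and $n \ge 0$ we denote by $D_n(\omega)$ the cylinder set of order $n$ corresponding to $\omega$ in $M$, i.e.,
$$D_n(\omega) := \bigcap_{k=1}^{n} \varphi^{-(k-1)}(P_{a_k}) \subseteq M.$$
Now we can state the definition of a Markov partition.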
Definition 2.1. Let $(M, \varphi)$ be a dynamical system and $\mathcal{P} = \{P_0, \ldots, P_{N-1}\}$ be a topological partition of $M$. Then we call $\mathcal{P}$ a Markov partition if the generated shift space $X_{\mathcal{P},\varphi}$ is of finite type and for every $\omega \in X_{\mathcal{P},\varphi}$ the intersection $\bigcap_{n=0}^{\infty} \overline{D_n(\omega)}$ consists of exactly one point.
Having provided all the ingredients necessary for the statement of our result, we want to link the concept of Markov partitions with the $N$-ary representations of Section 1 and with matrix number systems (cf. Gröchenig and Haas [7]).
Example 1. Let $M$ be the unit interval with $\varphi(x) = Nx \bmod 1$. We divide $M$ into $N$ subintervals $P_0, \ldots, P_{N-1}$ of the form $P_i = (i/N, (i+1)/N)$ and let $\Sigma = \{0, \ldots, N-1\}$. Then the underlying system is the $N$-ary representation. Furthermore it is easy to verify that the language $L_{\mathcal{P},\varphi}$ is the set of all words over $\Sigma$, so that the one-sided symbolic dynamical system $X_{\mathcal{P},\varphi}$ is the full one-sided $N$-shift $\Sigma^{\mathbb{N}}$.
Example 2. Let $A \in \mathbb{Z}^{n \times n}$ be an expanding matrix and let $D \subset \mathbb{Z}^n$ be a complete set of residues modulo $A\mathbb{Z}^n$. Then there exists a unique compact set $M$ such that
$$M = \bigcup_{d \in D} A^{-1}(M + d).$$

hal-00782110, version 1 -29 Jan 2013
The sets $P_d := (A^{-1}(M + d))^{\circ}$ together with $\varphi : x \mapsto Ax \bmod A\mathbb{Z}^n$ form a Markov partition of $M$. It is again easy to verify that the corresponding language is the set of all words over $D$, which can be mapped to $\{0, \ldots, N-1\}$ for some integer $N$.
We note that Example 2 contains canonical number systems as a special case. For further information on the different dynamical aspects we refer the interested reader to the survey of Barat et al. [4] and the references therein.
In order to extend the definition of normal and thus non-normal numbers to $M$ we need the map $\pi_{\mathcal{P},\varphi} : X_{\mathcal{P},\varphi} \to M$ defined by
$$\{\pi_{\mathcal{P},\varphi}(\omega)\} := \bigcap_{n=0}^{\infty} \overline{D_n(\omega)}.$$
By the definition of a Markov partition every $\omega \in X_{\mathcal{P},\varphi}$ maps to a unique element $x \in M$. However, the converse need not be true. For instance, consider Example 1 with $N = 10$ (the decimal expansion in the unit interval). On the one hand every expansion is mapped to a unique real number. On the other hand the expansions $0.99999\ldots$ and $1.00000\ldots$ correspond to the same element, and similarly $0.39999\ldots = 0.40000\ldots$. These ambiguities originate from the intersections $\overline{P_i} \cap \overline{P_j}$ for $i \neq j$. Thus we concentrate on the inner points, which loosely correspond to the irrational numbers in the above case of the decimal expansion. Let
$$U := \bigcup_{i=0}^{N-1} P_i,$$
which is an open and dense ($\overline{U} = M$) set. Then for each $n \ge 1$ the set
$$U_n := \bigcap_{k=0}^{n-1} \varphi^{-k}(U)$$
is open and dense in $M$. Thus by the Baire Category Theorem the set
$$U_\infty := \bigcap_{n=1}^{\infty} U_n \tag{2.1}$$
is dense. Since $M \setminus U_\infty$ is a countable union of nowhere dense sets, it suffices to show that a set is residual in $U_\infty$ in order to show that it is residual in $M$. Furthermore, for $x \in U_\infty$ we may call $\omega$ the symbolic expansion of $x$ if $\pi_{\mathcal{P},\varphi}(\omega) = x$. Thus in the following we will silently suppose that every point under consideration lies in $U_\infty$.
Having defined the environment, we want to carry the definitions of normal and non-normal numbers over to the symbolic dynamical system. To this end let $b \in \Sigma^k$ be a block of letters of length $k$ and let $\omega = a_1 a_2 a_3 \ldots \in X_{\mathcal{P},\varphi}$ be the symbolic representation of an element. Then we write
$$\mathrm{P}(\omega, b, n) := \frac{|\{0 \le i < n : a_{i+1} = b_1, \ldots, a_{i+k} = b_k\}|}{n}$$
for the frequency of the block $b$ among the first $n$ letters of $\omega$. In the same manner as above let
$$\mathrm{P}_k(\omega, n) := (\mathrm{P}(\omega, b, n))_{b \in \Sigma^k}$$
be the vector of all frequencies of blocks $b$ of length $k$ among the first $n$ letters of $\omega$.
In order to properly define normal numbers we need a probability measure on M . Therefore let B be the σ-algebra generated by the cylinder sets of M and µ be a probability measure defined on B.

Furthermore, a number $\omega \in X_{\mathcal{P},\varphi}$ is called normal (with respect to $(M, \varphi, \mathcal{B}, \mu)$) if it is $k$-normal for every $k \ge 1$. If $\mu$ is ergodic, an application of Birkhoff's ergodic theorem yields that almost all numbers $\omega \in X_{\mathcal{P},\varphi}$ are normal (cf. Chapter 3.1.2 of [6]). We note that we could equivalently have defined the measure-theoretic dynamical system with respect to $X_{\mathcal{P},\varphi}$ instead of $M$. However, since the definition of essentially and extremely non-normal numbers does not depend on this, we will not pursue this in the following.
As already mentioned above, the aim of the present paper is to show that the non-normal numbers form a large set in the topological sense. In particular, we want to generalize the theorems of Albeverio et al. [2] and Olsen [13] stated above to this new setting. Thus we call a number $\omega \in X_{\mathcal{P},\varphi}$ essentially non-normal if for all $i \in \Sigma$ the limit $\lim_{n\to\infty} \mathrm{P}(\omega, i, n)$ does not exist. By abuse of notation we denote by $L$ the set of essentially non-normal numbers in $M$. Then our first result is the following.
Theorem 2.1. Let $\mathcal{P} = \{P_0, \ldots, P_{N-1}\}$ be a one-sided Markov partition for $(M, \varphi)$. Suppose that $X_{\mathcal{P},\varphi}$ is the one-sided full shift. Then the set of essentially non-normal numbers is residual.
Remark 2.2. The requirement that $X_{\mathcal{P},\varphi}$ is the full shift is an artifact of the construction used. In particular, we expect that a similar result can be shown for any shift of finite type fulfilling some mild conditions which exclude trivial cases. For example, we want to exclude the shift over the alphabet $\{0, 1\}$ with forbidden words 00 and 11.
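To see why the shift mentioned in the remark is a trivial case, one can enumerate its allowed words. The following is a minimal sketch with hypothetical helper names `allowed` and `allowed_words`; it only illustrates that the two alternating words are the only allowed ones, so every digit frequency converges to $1/2$ and no point can be essentially non-normal.

```python
from itertools import product

FORBIDDEN = {(0, 0), (1, 1)}   # forbidden words 00 and 11 over the alphabet {0, 1}

def allowed(word):
    """A word is allowed if none of its length-2 factors is forbidden."""
    return all((a, b) not in FORBIDDEN for a, b in zip(word, word[1:]))

def allowed_words(n):
    """All allowed words of length n, by brute-force enumeration."""
    return [w for w in product((0, 1), repeat=n) if allowed(w)]

for n in (3, 9):
    words = allowed_words(n)
    # number of allowed words and the number of zeros in each of them
    print(n, len(words), [w.count(0) for w in words])
```

For every length there are exactly two allowed words, and the number of zeros differs from $n/2$ by at most $1/2$.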
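A different concept of non-normality concerns numbers whose frequency vectors come arbitrarily close to any given configuration. In particular, we want to generalize the idea of extremely non-normal numbers and their Cesàro variants to the setting of Markov partitions. Following Olsen's paper [13] we start by defining the simplex of all probability vectors $\Delta_k$ by
$$\Delta_k := \Big\{ (p_{\mathbf{i}})_{\mathbf{i} \in \Sigma^k} : p_{\mathbf{i}} \ge 0,\ \sum_{\mathbf{i} \in \Sigma^k} p_{\mathbf{i}} = 1 \Big\}.$$
Let $\|\cdot\|_1$ denote the 1-norm; then $(\Delta_k, \|\cdot\|_1)$ is a metric space. On the one hand we clearly have that any vector $\mathrm{P}_k(\omega, n)$ of frequencies of blocks of digits of length $k$ belongs to $\Delta_k$. On the other hand we want to quantify the non-normality of a number by considering the extent to which the sequence $(\mathrm{P}_k(\omega, n))_n$ fills up the simplex $\Delta_k$. Following the arguments of Volkmann [18] or Olsen [10,13] we get that the sequence $(\mathrm{P}_k(\omega, n))_n$ can fill up only a part of $\Delta_k$ for any $\omega$ and $k$. In particular, let us consider all possible ways in which a block of length 2, such as 28, can give rise to one of length 3. We get that
$$\Big| \sum_{i \in \Sigma} \mathrm{P}(\omega, i28, n) - \sum_{i \in \Sigma} \mathrm{P}(\omega, 28i, n) \Big| \le \frac{1}{n} \tag{2.2}$$
for all $\omega$. This implies that, for each $\omega$, all but finitely many points in the sequence $(\mathrm{P}_3(\omega, n))_n$ will be very close to the subsimplex
$$\Big\{ (p_{\mathbf{i}})_{\mathbf{i} \in \Sigma^3} : p_{\mathbf{i}} \ge 0,\ \sum_{\mathbf{i} \in \Sigma^3} p_{\mathbf{i}} = 1,\ \sum_{i \in \Sigma} p_{i28} = \sum_{i \in \Sigma} p_{28i} \Big\}.$$
Thus we get, as Olsen [13] did, that the sequence $(\mathrm{P}_k(\omega, n))_n$ does not fill up a significant part of $\Delta_k$. In particular, the simplex $\Delta_k$ is not the "correct" object to consider. Rather we need to consider the subsimplex of shift invariant probability vectors $S_k$, i.e.
$$S_k := \Big\{ (p_{\mathbf{i}})_{\mathbf{i} \in \Sigma^k} : p_{\mathbf{i}} \ge 0,\ \sum_{\mathbf{i} \in \Sigma^k} p_{\mathbf{i}} = 1,\ \sum_{i \in \Sigma} p_{i\mathbf{i}} = \sum_{i \in \Sigma} p_{\mathbf{i}i} \text{ for all } \mathbf{i} \in \Sigma^{k-1} \Big\}.$$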
Now we define the second ingredient for extremely non-normal numbers, namely the accumulation points of the frequency vectors. Let $A_k(\omega)$ be the set of accumulation points of the sequence $(\mathrm{P}_k(\omega, n))_n$ with respect to $\|\cdot\|_1$, i.e. for $\omega \in X_{\mathcal{P},\varphi}$ we set
$$A_k(\omega) := \{p \in \Delta_k : p \text{ is an accumulation point of } (\mathrm{P}_k(\omega, n))_n\}.$$
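Our first step is to show that the subsimplex $S_k$ is really the right object of investigation. To this end we have the following.
Lemma 2.3. Let $k \ge 1$ be an integer and $\omega \in X_{\mathcal{P},\varphi}$. Then $A_k(\omega) \subseteq S_k$.
Proof. We prove this by showing that every accumulation point satisfies the shift invariance condition. Let $p = (p_{\mathbf{i}})_{\mathbf{i} \in \Sigma^k}$ be an accumulation point of the sequence $(\mathrm{P}_k(\omega, n))_n$ with respect to the 1-norm. Then there exists an increasing sequence $(n_m)_m$ of positive integers such that $\|\mathrm{P}_k(\omega, n_m) - p\|_1 \to 0$.
Now we use the idea of (2.2) and consider all possible ways in which a block of length $k-1$ can be extended to one of length $k$. Thus, for every $\mathbf{i} \in \Sigma^{k-1}$,
$$\Big| \sum_{i \in \Sigma} \mathrm{P}(\omega, i\mathbf{i}, n_m) - \sum_{i \in \Sigma} \mathrm{P}(\omega, \mathbf{i}i, n_m) \Big| \le \frac{1}{n_m},$$
which, letting $m \to \infty$, implies that $\sum_{i \in \Sigma} p_{i\mathbf{i}} = \sum_{i \in \Sigma} p_{\mathbf{i}i}$ for all $\mathbf{i} \in \Sigma^{k-1}$.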
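The bound of type (2.2) used in this argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: the names `freq` and `shift_defect` are hypothetical, and the word is a finite random sample that is long enough for all blocks starting at the first $n$ positions to be read in full.

```python
import random
from fractions import Fraction

def freq(word, block, n):
    """Occurrences of `block` starting at positions 0..n-1 of `word`, divided by n."""
    k = len(block)
    return Fraction(sum(1 for i in range(n) if tuple(word[i:i + k]) == block), n)

def shift_defect(word, block, n, alphabet):
    """Absolute difference between the left- and right-extension sums of `block`."""
    left = sum(freq(word, (a,) + block, n) for a in alphabet)
    right = sum(freq(word, block + (a,), n) for a in alphabet)
    return abs(left - right)

random.seed(0)
alphabet = range(10)
n = 200
word = [random.choice(alphabet) for _ in range(n + 3)]   # n + 3 letters cover blocks of length 3
print(shift_defect(word, (2, 8), n, alphabet) <= Fraction(1, n))
```

The two sums count the occurrences of the shorter block starting at positions $1, \ldots, n$ and $0, \ldots, n-1$ respectively, so they can differ by at most one occurrence, which is the source of the bound $1/n$.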
We call an infinite word $\omega \in X_{\mathcal{P},\varphi}$ extremely non-$k$-normal if the set of accumulation points of the sequence $(\mathrm{P}_k(\omega, n))_n$ (with respect to $\|\cdot\|_1$) equals $S_k$, i.e. $A_k(\omega) = S_k$. Furthermore we call a number extremely non-normal if it is extremely non-$k$-normal for all $k \ge 1$.
As above we also want to extend this notion to the Cesàro averages of the frequencies. To this end, for a fixed block $b = b_1 \ldots b_k \in \Sigma^k$ let
$$\mathrm{P}^{(1)}(\omega, b, n) := \mathrm{P}(\omega, b, n).$$
For $r \ge 2$ we recursively define
$$\mathrm{P}^{(r)}(\omega, b, n) := \frac{1}{n} \sum_{j=1}^{n} \mathrm{P}^{(r-1)}(\omega, b, j)$$
to be the $r$th iterated Cesàro average of the frequency of the block of digits $b$ among the first $n$ digits. Furthermore we denote by $\mathrm{P}^{(r)}_k(\omega, n) := (\mathrm{P}^{(r)}(\omega, b, n))_{b \in \Sigma^k}$ the vector of $r$th iterated Cesàro averages. As above we are interested in the accumulation points. Thus let $A^{(r)}_k(\omega)$ denote the set of accumulation points of the sequence $(\mathrm{P}^{(r)}_k(\omega, n))_n$ with respect to $\|\cdot\|_1$, i.e.
$$A^{(r)}_k(\omega) := \{p \in \Delta_k : p \text{ is an accumulation point of } (\mathrm{P}^{(r)}_k(\omega, n))_n\}.$$
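The recursive definition of the iterated Cesàro averages translates directly into repeated running means. The following is a minimal sketch assuming a finite prefix of the word is given as a Python list; the function names are hypothetical and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import accumulate

def block_frequencies(word, block, n_max):
    """Frequencies of `block` among the first n letters, for n = 1..n_max."""
    k = len(block)
    hits = [1 if tuple(word[i:i + k]) == tuple(block) else 0 for i in range(n_max)]
    return [Fraction(c, n) for n, c in enumerate(accumulate(hits), start=1)]

def cesaro(seq):
    """Running means: the n-th entry is the average of the first n entries of seq."""
    return [s / n for n, s in enumerate(accumulate(seq), start=1)]

def iterated_cesaro(word, block, r, n_max):
    """r-th iterated Cesaro averages of the block frequency, for n = 1..n_max."""
    seq = block_frequencies(word, block, n_max)
    for _ in range(r - 1):      # r = 1 returns the plain frequencies
        seq = cesaro(seq)
    return seq

word = [0, 1] * 10
print(iterated_cesaro(word, (0,), 1, 4))
print(iterated_cesaro(word, (0,), 2, 4))
```

Each application of `cesaro` smooths the sequence once more, which is why a frequency sequence without a limit may still have a limit after averaging.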
We will denote the set of extremely non-$k$-normal numbers of $M$ by $E^{(1)}_k$. Similarly, for $r \ge 1$ and $k \ge 1$ we denote by $E^{(r)}_k$ the set of $r$th iterated Cesàro extremely non-$k$-normal numbers of $M$. Furthermore, for $r \ge 1$ we denote by $E^{(r)}$ the set of $r$th iterated Cesàro extremely non-normal numbers and by $E$ the set of completely Cesàro extremely non-normal numbers, i.e.
$$E^{(r)} := \bigcap_{k \ge 1} E^{(r)}_k \quad\text{and}\quad E := \bigcap_{r \ge 1} E^{(r)}.$$
Then our result is the following.
Theorem 2.4. Let $k$, $r$ and $N$ be positive integers. Furthermore let $\mathcal{P} = \{P_0, \ldots, P_{N-1}\}$ be a one-sided Markov partition for $(M, \varphi)$. Suppose that $X_{\mathcal{P},\varphi}$ is the one-sided full shift. Then the set $E^{(r)}_k$ is residual.
Since the sets $E^{(r)}$ and $E$ are countable intersections of sets $E^{(r)}_k$, we get the following generalization of the result by Hyde et al. [8].
Corollary 2.5. Let N be a positive integer and P = {P 0 , . . . , P N −1 } be a one-sided Markov partition for (M, φ). Suppose that X P,φ is the one-sided full-shift. Then the sets E (r) and E are residual.

Proof of Theorem 2.1
We borrow the method of Albeverio et al. [2] in order to construct a set of infinite words for which no limiting frequency of any single digit exists. This set is a subset of the set of essentially non-normal numbers, and we show that it is already residual.
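For $n \ge 1$ let $\gamma_n$ be the word
$$\gamma_n := \underbrace{0 \ldots 0}_{n}\,\underbrace{1 \ldots 1}_{n} \cdots \underbrace{(N-1) \ldots (N-1)}_{n}$$
and $C_n$ be the set
$$C_n := \bigcup_{\sigma \in \Sigma^n} [\sigma\gamma_n].$$
We fix $\omega \in C_n$ and a digit $i \in \Sigma$ and show that $C_n$ has the desired properties. In particular, we compare the frequency of $i$ in $\omega$ just after the block of $i$s in $\gamma_n$ and after the final block of $(N-1)$s. Thus for $i \in \{0, \ldots, N-2\}$ we set $k_n(i) = n(i+2)$ and $k'_n(i) = n(N+1)$. Then we get that
$$\mathrm{P}(\omega, i, k_n(i)) = \frac{n + n\,\mathrm{P}(\omega, i, n)}{n(i+2)}, \qquad \mathrm{P}(\omega, i, k'_n(i)) = \frac{n + n\,\mathrm{P}(\omega, i, n)}{n(N+1)}.$$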
Since $\omega$ was arbitrary, we get for every $i \in \Sigma$ two sequences of positions $k_n(i)$ and $k'_n(i)$ along which the frequencies of the digit $i$ cannot tend to a common limit.
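In the next step we carry this construction over to the set $U_\infty$ defined in (2.1), which is sufficient for the proof. For $n \ge 1$ we let $I_n$ be the set of points of $U_\infty$ whose symbolic expansion lies in $C_n$. We note that these sets are the "inner points" of the sets $C_n$; any point in $I_n$ has a corresponding representation in $C_n$. Furthermore we set
$$F_n := \bigcup_{k \ge n} I_k \quad\text{and}\quad L' := \bigcap_{n \ge 1} F_n.$$
In order to show that $L'$ is residual in $U_\infty$ it suffices to show that every $F_n$ is open and dense.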

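Since $L' = \bigcap_{n} F_n$ is a countable intersection of open and dense sets, $L'$ is residual in $U_\infty$. Finally we note that if $x \in L'$, then $x \in F_n$ for all $n \in \mathbb{N}$. Therefore there exists an increasing sequence of integers $(n_m)_m$ such that $x \in I_{n_m}$ for all $m \in \mathbb{N}$. Thus for any fixed $x \in L'$ and every $i \in \Sigma$ one can choose two sequences $k_{n_m}(i)$ and $k'_{n_m}(i)$ such that $\mathrm{P}(\omega, i, k_{n_m}(i))$ and $\mathrm{P}(\omega, i, k'_{n_m}(i))$ do not tend to a common limit as $m \to \infty$, where $\omega$ is the symbolic expansion of $x$. Hence $L' \subset L$, which proves Theorem 2.1.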

Proof of Theorem 2.4
Now we turn our attention to extremely non-normal numbers and their Cesàro variants. Let $k \in \mathbb{N}$ and $q \in S_k$ be fixed throughout the rest of this section. For $n \ge 1$ we define a set $Z_n = Z_n(q, k)$ of finite words, which we will use in order to "measure" the distance between the block frequencies of a word and $q$.
The main idea now consists in the construction of a word having the desired frequencies. In particular, for a given word $\omega$ we want to show that we can append sufficiently many copies of a word from $Z_n$ to get a word whose frequency vector is sufficiently close to $q$. For this we first need that $Z_n$ is not empty.
Lemma 4.1. For all $n \ge 1$, $k \in \mathbb{N}$ and $q \in S_k$ we have $Z_n(q, k) \neq \emptyset$.
Now we consider how many copies of elements of $Z_n$ we have to append in order to get the desired properties.
Furthermore we set q and 0 ≤ r < s such that m = t + qs + r. For a fixed block i an occurrence can happen in ω, in γ, somewhere in between or at the end. Thus for every i ∈ Σ k we clearly have that Now we concentrate on the occurrences in multiples of γ and show that we may neglect those outside of γ, i.e., We will estimate both parts separately. For the first one we get that where we have used that ≥ qs ≥ qnkN k .

For the second part we get that Putting these together yields By our assumptions on the size of in (4.1) this proves the lemma.
As in the papers of Olsen [10,13] our main idea is to construct a residual set $E \subset E^{(r)}_k$. Before we start we want to simplify notation. To this end we recursively define the functions $\phi_1(x) = 2^x$ and $\phi_m(x) = \phi_1(\phi_{m-1}(x))$ for $m \ge 2$. Furthermore we set $D = (\mathbb{Q}^N \cap \Delta_N)$. Since $D$ is countable and dense in $\Delta_N$ we may concentrate on the probability vectors $q \in D$.
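To get a feeling for how quickly the recursively defined functions grow, the recursion can be unrolled into a loop. This is a small sketch with a hypothetical function name `phi`; it uses Python's arbitrary-precision integers and only evaluates small arguments.

```python
def phi(m, x):
    """m-fold iterate of x -> 2**x, i.e. phi_1(x) = 2**x and phi_m = phi_1(phi_{m-1})."""
    for _ in range(m):
        x = 2 ** x
    return x

for j in (1, 2):
    # upper end phi_m(2**j) of the window j < n < phi_m(2**j), for m = 1 and m = 2
    print(j, phi(1, 2 ** j), phi(2, 2 ** j))
```

Already for $m = 2$ and $j = 2$ the upper end of the window is $2^{16} = 65536$, so the windows on which the frequency vectors are controlled become very long.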
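Now we say that a sequence $(x_n)_n$ in $\mathbb{R}^{N^k}$ has property P if for all $q \in D$, $m \in \mathbb{N}$, $i \in \mathbb{N}$, and $\varepsilon > 0$, there exists a $j \in \mathbb{N}$ satisfying: (1) $j \ge i$, (2) if $j < n < \phi_m(2^j)$, then $\|x_n - q\|_1 < \varepsilon$. Then we define our set $E$ to consist of all points whose frequency vectors have property P, i.e.
$$E := \{x \in U_\infty : (\mathrm{P}^{(r)}_k(\omega, n))_n \text{ has property P}\},$$
where $\omega$ denotes the symbolic expansion of $x$.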
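Lemma 4.3. The set $E$ is residual in $U_\infty$.
Proof. For fixed $h, m, i \in \mathbb{N}$ and $q \in D$, we say that a sequence $(x_n)_n$ in $\mathbb{R}^{N^k}$ has property $P_{h,m,q,i}$ if for every $\varepsilon > 1/h$ there exists $j \in \mathbb{N}$ satisfying: (1) $j \ge i$, (2) if $j < n < \phi_m(2^j)$, then $\|x_n - q\|_1 < \varepsilon$. Now let $E_{h,m,q,i}$ be the set of all points whose frequency vector satisfies property $P_{h,m,q,i}$, i.e.
$$E_{h,m,q,i} := \{x \in U_\infty : (\mathrm{P}^{(r)}_k(\omega, n))_n \text{ has property } P_{h,m,q,i}\}.$$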
Let $\omega \in X$ be such that $x = \pi(\omega)$ and set $t := \phi_m(2^j)$. Since $D_t(\omega)$ is open, there exists a $\delta > 0$ such that the ball $B(x, \delta) \subseteq D_t(\omega)$. Furthermore, since all $y \in D_t(\omega)$ have their first $\phi_m(2^j)$ digits in common with $x$, we get that $B(x, \delta) \subseteq E_{h,m,q,i}$.
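Let $\omega \in X$ be such that $x = \pi(\omega)$. Since $\operatorname{diam} D_t(\omega) \to 0$ as $t \to \infty$ and $x \in D_t(\omega)$, there exists a $t$ such that $D_t(\omega) \subset B(x, \delta)$. Let $\sigma = \omega|t$ be the first $t$ digits of $x$.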
Now, an application of Lemma 4.1 yields that there exists a finite word $\gamma \in Z_n(q, k)$. Let $\varepsilon \ge \frac{1}{h}$ and $L$ be as in the statement of Lemma 4.2. Then we choose $j$ such that $\frac{j}{2^j} < \varepsilon$ and $j \ge \max(L, i)$. An application of Lemma 4.2 then gives us the required estimate. Thus we choose $y \in D_j(\sigma\gamma^*)$. Then on the one hand $y \in D_j(\sigma\gamma^*) \subset D_t(\omega) \subset B(x, \delta)$, and on the other hand $y \in D_j(\sigma\gamma^*) \subset E_{h,m,q,i}$.
It follows that $E$ is a countable intersection of open and dense sets and therefore $E$ is residual in $U_\infty$.

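Thus it suffices to show that $p$ is an accumulation point of $(\mathrm{P}^{(r)}_k(\omega, n))_n$ for any $p \in S_k$. To this end we fix $h \in \mathbb{N}$ and find a $q \in D$ such that $\|p - q\|_1 < \frac{1}{h}$.
Since $(\mathrm{P}^{(r)}_k(\omega, n))_n$ has property P, for any $m \in \mathbb{N}$ we find $j \in \mathbb{N}$ with $j \ge h$ such that if $j < n < \phi_m(2^j)$ then $\|\mathrm{P}^{(r)}_k(\omega, n) - q\|_1 < \frac{1}{h}$. Hence let $n_h$ be any integer with $j < n_h < \phi_m(2^j)$; then $\|\mathrm{P}^{(r)}_k(\omega, n_h) - q\|_1 < \frac{1}{h}$. Thus each $n_h$ in the sequence $(n_h)_h$ satisfies
$$\|\mathrm{P}^{(r)}_k(\omega, n_h) - p\|_1 \le \|\mathrm{P}^{(r)}_k(\omega, n_h) - q\|_1 + \|q - p\|_1 < \frac{2}{h}.$$
Since $n_h > h$ we may extract an increasing subsequence $(n_{h_u})_u$ such that $\mathrm{P}^{(r)}_k(\omega, n_{h_u}) \to p$ for $u \to \infty$. Thus $p$ is an accumulation point of $(\mathrm{P}^{(r)}_k(\omega, n))_n$, which proves the lemma.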
Proof of Theorem 2.4. Since by Lemma 4.3 E is residual in U ∞ and by Lemma 4.5 E is a subset of E (r) k we get that E (r) k is residual in U ∞ . Again we note that M \ U ∞ is the countable union of nowhere dense sets and therefore E (r) k is also residual in M .