Heaviness in Symbolic Dynamics: Substitution and Sturmian Systems

Heaviness refers to a sequence of partial sums maintaining a certain lower bound and was recently introduced and studied in "Heaviness: An Extension of a Lemma of Y. Peres." After a review of basic properties to familiarize the reader with the ideas of heaviness, general principles of heaviness in symbolic dynamics are introduced. The classical Morse sequence is used to study a specific example of heaviness in a system with nontrivial rational eigenvalues. To contrast, Sturmian sequences are examined, including a new condition for a sequence to be Sturmian.


Introduction
The study of dynamical systems devotes much attention to the asymptotic behavior of points or other elements of a system. While asymptotic properties are extremely important, they are in a sense not observable; an observer monitoring a closed system can only ever observe a finite window of time. Suppose that an observer is capable of monitoring the output of a function $f$ over a finite portion of an orbit $x, T(x), T^2(x), \ldots, T^n(x)$, and keeps a record of the associated partial sums. Finite observations do not lend themselves to discussion of limits, but any observer might be concerned with extremal behavior of the partial sums (a motivation similar to, but distinct from, that in the study of large deviations). With this restriction and motivation in mind, we define the heavy set (subject to various restrictions to be outlined later) to be those points in a system for which these partial sums maintain a natural lower bound over a natural collection of finite ranges.
In applying these notions to symbolic dynamics, we will note a distinction between systems with rational eigenvalues (§3) and a class of systems with no rational eigenvalues (§4). Specifically, the Morse sequence defines a nontrivial system with an abundance of heavy points, while Sturmian sequences have a scarcity of heavy points. Furthermore, Sturmian sequences are most frequently defined with a restriction on the allowable weights of subwords, and heaviness will be concerned with establishing bounds on weights of words, so a connection between the two ideas is developed, most significantly in Theorem 7.
1.1. Background and terminology. Before defining heaviness formally in §2, it is necessary to establish our framework and notation. Let $\{\Omega, \mu\}$ be a probability measure space ($\mu(\Omega) = 1$). If $T : \Omega \to \Omega$ is $\mu$-measurable, and $\mu(T^{-1}\Gamma) = \mu(\Gamma)$ for all $\mu$-measurable $\Gamma \subset \Omega$, then we say that $T$ preserves $\mu$, and $\{\Omega, \mu, T\}$ is a probability measure preserving system. In this situation, let $f \in L^1(\Omega, \mu)$. If the only functions $f$ such that $f \circ T = f$ almost everywhere are themselves almost everywhere constant, then $T$ is called ergodic, and if $\mu$ is the only preserved probability measure, $T$ is called uniquely ergodic.
For $\omega \in \Omega$, define $S_n(\omega)$ recursively: $S_0(\omega) = 0$, $S_{n+1}(\omega) = S_n(\omega) + f \circ T^n(\omega)$ (if $T$ is invertible, we may use this relation to define $S_n(\omega)$ for all $n \in \mathbb{Z}$):
(1) $S_n(\omega) = \sum_{i=0}^{n-1} f(T^i \omega)$ for $n > 0$, and $S_n(\omega) = -\sum_{i=n}^{-1} f(T^i \omega)$ for $n < 0$.
In the vein of classical concerns of asymptotic behavior over infinite time frames, define:
(2) $f^*(\omega) = \liminf_{n \to \infty} \frac{1}{n} S_n(\omega)$,
noting that the Birkhoff Ergodic Theorem guarantees that $\lim_{n \to \infty} n^{-1} S_n(\omega)$ exists almost everywhere. We may use the fact that $T$ preserves $\mu$ to derive $\int_\Omega S_n(x)\,d\mu = n \int_\Omega f\,d\mu$. So, in line with our model of an observer of finite time periods, we define:
Definition 1. The heavy set for $f$ (relative to $T$) between times $m$ and $n$ ($m, n \in \mathbb{Z}$, $m \le n$) is given by:
$H^f_T(m, n) = \{\omega \in \Omega : S_i(\omega) \ge i \int_\Omega f\,d\mu, \ i = m, m+1, \ldots, n\}$.
In the common event that $m = 0$, the set is called heavy through time $n$. We use the shorthand:
$H^f_T(\mathbb{N}) = \bigcap_{n=0}^{\infty} H^f_T(0, n)$, $H^f_T(\mathbb{Z}) = \bigcap_{n=0}^{\infty} H^f_T(-n, n)$
to define the heavy sets over $\mathbb{N}$ and $\mathbb{Z}$. Any use of negative times requires $T^{-1}$ to exist.
If $T$ and $f$ are clear from the context, we will simply write $H(m, n)$, $H(\mathbb{N})$, or $H(\mathbb{Z})$. These sets represent points whose partial sums meet or exceed the average value of the partial sums over the range of prescribed times. Given the emphasis on finite time periods in defining heavy sets, it is worth pointing out that stating $x \in H(\mathbb{N})$ or $x \in H(\mathbb{Z})$ should not be read as a statement about the behavior of $S_n(x)$ on an infinite time frame, but rather about all finite times. We now present a pair of theorems regarding the existence of such points.
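As a concrete illustration of heaviness through time $n$ (the case $m = 0$), the following minimal Python sketch checks whether a finite record of observations $f(\omega), f(T\omega), \ldots$ keeps every partial sum at or above $i$ times the mean. The helper names `partial_sums` and `heavy_through` and the sample data are illustrative assumptions, not notation from the text; exact rational arithmetic is used so that the boundary case of equality is handled without rounding.

```python
from fractions import Fraction

def partial_sums(values):
    """Return [S_0, S_1, ..., S_n] for values = [f(w), f(Tw), ..., f(T^(n-1) w)]."""
    sums = [Fraction(0)]
    for v in values:
        sums.append(sums[-1] + Fraction(v))
    return sums

def heavy_through(values, mean):
    """True if S_i >= i * mean for every i = 0, ..., len(values)."""
    mean = Fraction(mean)
    return all(s >= i * mean for i, s in enumerate(partial_sums(values)))

# Hypothetical observations along an orbit, with the integral of f equal to 1/2.
observed = [1, 0, 1, 1, 0, 0]
print(heavy_through(observed, Fraction(1, 2)))      # True
print(heavy_through(observed[1:], Fraction(1, 2)))  # False: S_1 = 0 < 1/2
```

The second call shows that shifting the starting point by one step can destroy heaviness, which is why the heavy set is a statement about the point and not about the orbit.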
Proof. For a proof of this theorem, see [11].
Theorem 2. If $\{\Omega, \mu, T\}$ is a continuous measure-preserving system on a compact probability space, then $H^f_T(\mathbb{N}) \ne \emptyset$ for any upper semi-continuous $f \in L^1(\Omega, \mu)$. Furthermore, if $f$ is continuous and $T$ is invertible and transitive, then $H^f_T(\mathbb{Z}) \ne \emptyset$.
Proof. For a proof of this theorem, see [11]. The first statement follows as a corollary from Theorem 1, but an alternate proof may be found as a lemma of Y. Peres [10].
Remark. If $\Gamma$ is closed, then $H^{\chi_\Gamma}_T(\mathbb{N}) \ne \emptyset$ (as $\chi_\Gamma$ is upper semi-continuous), but $H^{\chi_\Gamma}_T(\mathbb{Z}) \ne \emptyset$ is true in general only if $\Gamma$ is clopen (in this case, $\chi_\Gamma$ is continuous).
Remark. It is a common mistake to assume that $\mu(H^f_T(\mathbb{N})) = 0$. This claim is obviously false for functions $f$ which are constant almost everywhere, for any function $f$ on an atomic system (any nonempty set is of positive measure), and in general for nonergodic $T$ (see Corollary 1).
In §3.1, we give an example of a uniquely ergodic system without atoms, and a function $f$ which is not constant, such that $\mu(H(\mathbb{N})) > 0$. However, as proved in [7], if $T$ is ergodic, then for almost every $\omega \in \Omega$, there exists some $N = N(\omega)$, $0 < N < \infty$, for which $S_N(\omega) \le N \int_\Omega f\,d\mu$. There is no contradiction between this fact and Theorem 1; if $H(\mathbb{N})$ is not a null set, then for almost all $\omega \in H(\mathbb{N})$ there is some finite $N > 0$ such that $S_N(\omega) = N \int_\Omega f\,d\mu$. This situation will be investigated in §3.2.
Proceeding to definitions specific to symbolic dynamics, the reader familiar with standard terminology may skip to §2, except to note the nonstandard Definitions 2 and 3. We let an alphabet $\mathcal{A}$ be a subset of $\mathbb{R}$, and elements of $\mathcal{A}$ are called letters. An element $A \in \mathcal{A}^n$ is called a word of length $n$ over $\mathcal{A}$. We sometimes write $|A|$ for the length of $A$. An element $X \in \mathcal{A}^{\mathbb{N}}$ is said to be a sequence, and $X \in \mathcal{A}^{\mathbb{Z}}$ is a bi-sequence. In the frequent event that $\mathcal{A} = \{0, 1\}$, the word is called binary. The somewhat nonstandard definition of an alphabet to be an arbitrary subset of $\mathbb{R}$ is motivated by the canonical relation of sequences in $\mathcal{A}^{\mathbb{N}}$ to sequences $\{f(T^i \omega)\}_{i=0,1,2,\ldots}$ in a probability measure preserving system $\{\Omega, \mu, T\}$ along with $f \in L^1(\Omega, \mu)$ and $\omega \in \Omega$. In this case, $\mathcal{A}$ is the range of $f$. By assuming $\Omega$ to be compact, continuity of $f$ implies compactness of $\mathcal{A}$, and in the common scenario that $f$ is the characteristic function of a set, $\mathcal{A} = \{0, 1\}$. By relating points in $\mathcal{A}^{\mathbb{N}}$ and sequences $\{f(T^i \omega)\}$ ($i = 0, 1, \ldots$), then, the shift operator on the space $\mathcal{A}^{\mathbb{N}}$ is analogous to the transformation $T : \Omega \to \Omega$, and heaviness statements about measure preserving systems in general may be interpreted as statements about heaviness in shift systems.
For a binary word, we define the conjugate of $A = a_0 a_1 \ldots$, denoted $\overline{A} = \overline{a_0}\,\overline{a_1} \ldots$, by setting $\overline{0} = 1$ and $\overline{1} = 0$. For any word $A = a_0 \ldots a_{n-1}$ of length $n < \infty$ over any alphabet, the transpose of $A$ is denoted and defined by $A^T = a_{n-1} a_{n-2} \ldots a_0$ (so that $(A^T)_i = a_{n-1-i}$ for $i = 0, \ldots, n-1$). Given two words $A = a_0 \ldots a_{m-1}$, $B = b_0 \ldots b_{n-1}$ of finite lengths $m$ and $n$, the concatenation of $A$ and $B$ is the word of length $m + n$ given by $AB = a_0 \ldots a_{m-1} b_0 \ldots b_{n-1}$. Given a word $A \in \mathcal{A}^n$, the weight and average weight, respectively, are:
$w(A) = \sum_{i=0}^{n-1} a_i$, $\overline{w}(A) = \frac{1}{n} w(A)$.
A word $A$ of length $n$ is said to be a factor of another word (or sequence) $B$ of length $m \ge n$ if there is some $j \in \mathbb{N}$ so that $a_i = b_{i+j}$ for $i = 0, 1, \ldots, n-1$. The complexity function for a sequence or bi-sequence $X$ (over a finite alphabet $\mathcal{A}$) is given by $p(n) = \#\{A \in \mathcal{A}^n : A \text{ is a factor of } X\}$.
A binary sequence $X$ is said to be of minimal complexity if $p(n) = n + 1$ for all $n$ (any sequence $X$ for which $p(n) \le n$ for some $n$ is eventually periodic; see [4]). If $A$ is a word of length $n < \infty$ such that there are two distinct letters $\alpha, \beta \in \mathcal{A}$ such that $A\alpha$ and $A\beta$ are both factors of $X$, then $A$ is said to be a right special factor. If there are two distinct letters $\alpha$ and $\beta$ such that $\alpha A$ and $\beta A$ are both factors of $X$, then $A$ is said to be a left special factor. Given a sequence $X \in \mathcal{A}^{\mathbb{N}}$ or $\mathcal{A}^{\mathbb{Z}}$, we define the sequence $\sigma(X)$ by $\sigma(X)_n = X_{n+1}$. If $\mathcal{A}$ is compact, then so are $\mathcal{A}^{\mathbb{N}}$ and $\mathcal{A}^{\mathbb{Z}}$ (in the product topology), and therefore the orbit closure $\Omega = \overline{\{\sigma^n X\}_{n=0}^{\infty}}$ is compact. It is seen that $\sigma$ is now a continuous map of a compact space. The system generated by the sequence $X$ is the topological dynamical system $\{\overline{\{\sigma^n X\}_{n=0}^{\infty}}, \sigma\}$.
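The complexity function and right special factors can be explored numerically on a finite prefix. The sketch below is a minimal illustration under stated assumptions: it uses the Fibonacci word (the fixed point of $0 \to 01$, $1 \to 0$), a standard example of a sequence of minimal complexity that is not taken from the text, and the helper names are hypothetical. A finite prefix can only give a lower bound on $p(n)$, so the prefix is chosen much longer than the factor lengths examined.

```python
def fibonacci_word(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]

def factors(word, n):
    """Set of all factors of length n occurring in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def complexity(word, n):
    """Number of distinct factors of length n seen in the finite word."""
    return len(factors(word, n))

def right_special(word, n):
    """Factors A of length n such that both A0 and A1 occur in the word."""
    longer = factors(word, n + 1)
    return {a for a in factors(word, n) if a + "0" in longer and a + "1" in longer}

x = fibonacci_word(500)
for n in range(1, 9):
    print(n, complexity(x, n), sorted(right_special(x, n)))
```

Each extra factor counted by $p(n+1) - p(n)$ corresponds to a right special factor of length $n$, which is why a sequence of minimal complexity shows exactly one such factor per length.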
Definition 2. Let $A = a_0 \ldots a_{n-1}$ and $0 \le i \le j \le n$. Then $A_{i,j} = a_i \ldots a_{j-1}$ is a word of length $j - i$, beginning at index $i$ (note that $A = A_{0,n}$, and $A_{i,i}$ is the empty word of length zero).
Definition 3. Let $A = a_0 \ldots a_{n-1}$ be a word of length $n$, over alphabet $\mathcal{A}$. Then the reversal of $A$ is the word over the alphabet $-\mathcal{A} = \{-\alpha : \alpha \in \mathcal{A}\}$, defined and notated by $\rho(A)_i = -a_{n-1-i}$, $i = 0, 1, \ldots, n-1$.
That is, $\rho(A)$ is the transpose of $A$, with a negative sign on all entries.
Remark. Compare with (1), our partial sums over negative times. If we define $f : \mathcal{A}^{\mathbb{Z}} \to \mathbb{R}$ by $f(X) = x_0$ and let $\sigma$ be the shift operator, then for $n \ge 0$, $S_n(X) = w(X_{0,n})$, and for $n \le 0$, $S_n(X) = w(\rho(X_{n,0}))$ (for $n = 0$, both equal 0, the weight of the empty word). This relation is the motivation for defining $\rho$.
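The word operations above translate directly into list manipulations. The following Python sketch is a minimal illustration (the function names and the sample window are assumptions made for this example): it implements the weight, transpose, conjugate, and reversal, and checks on a small hypothetical window of a bi-sequence that the partial sum over negative times agrees with the weight of the reversal, as in the preceding remark.

```python
def weight(word):
    """w(A): the sum of the letters of A."""
    return sum(word)

def transpose(word):
    """A^T: the letters of A in the opposite order."""
    return list(reversed(word))

def conjugate(word):
    """Conjugate of a binary word: exchange 0 and 1."""
    return [1 - a for a in word]

def reversal(word):
    """rho(A): the transpose of A with every letter negated."""
    return [-a for a in reversed(word)]

def partial_sum(x, n):
    """S_n for f(X) = x_0 under the shift; x maps integer indices to letters."""
    if n >= 0:
        return sum(x[i] for i in range(n))
    return -sum(x[i] for i in range(n, 0))

# A hypothetical window of a bi-sequence, indexed from -4 to 3.
x = dict(zip(range(-4, 4), [1, 0, 0, 1, 1, 1, 0, 1]))
for n in range(-4, 0):
    window = [x[i] for i in range(n, 0)]  # the factor X_{n,0}
    assert partial_sum(x, n) == weight(reversal(window))
print([partial_sum(x, n) for n in range(-4, 5)])
```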

Heaviness in symbolic dynamics
There are two fruitful ways to define heaviness in symbolic dynamics. The first, α-heaviness (§2.1), is a direct analogue of the definition of heaviness in Definition 1. The second way to view heaviness, local heaviness (§2.2), is more combinatorial in nature. In presenting both views, we will spend some time familiarizing the reader with the definitions by presenting several theorems regarding the existence of such phenomena in very general settings, before we proceed to considering any specific systems.
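2.1. α-Heaviness. In situations where we are interested in a global target for heaviness, some fixed α which will act as a lower bound on our partial averages, we proceed as follows:
Definition 4. A word $A$ of length $m$ is α-heavy (α-light) if $w(A_{0,i}) \ge i\alpha$ ($w(A_{0,i}) \le i\alpha$) for all $1 \le i \le m$. If $X$ is a sequence, then $X$ is α-heavy (α-light) if for all $i \in \mathbb{N}$: $w(X_{0,i}) \ge i\alpha$ ($w(X_{0,i}) \le i\alpha$).
Remark. Trivially, if $A$ is α-heavy (or light), then so is the initial factor $A_{0,j}$ for all $0 \le j \le m$.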
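The following lemma is not difficult, but will be of great use in §4.1:
Lemma 1 (The Reversing Principle). Assume that $A = a_0 \ldots a_n$ is of length $n + 1$, such that $A_{0,n} = a_0 \ldots a_{n-1}$ is α-heavy, but the average weight of $a_0 \ldots a_{n-1} a_n$ is at most α, that is, $w(a_0 \ldots a_n) \le (n+1)\alpha$. Then $\rho(A)$ is $(-\alpha)$-heavy.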
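The Reversing Principle can be checked exhaustively on short binary words. The sketch below assumes the form in which the principle is invoked in §4.1: if every proper prefix of a word meets the α-heaviness bound while the average weight of the whole word is at most α, then the reversal of the word is $(-\alpha)$-heavy. The function names and the sampled rational values of α are illustrative assumptions; rationals are used only so that comparisons are exact.

```python
from fractions import Fraction
from itertools import product

def is_alpha_heavy(word, alpha):
    """True if every prefix of length i has weight at least i * alpha."""
    total = 0
    for i, a in enumerate(word, start=1):
        total += a
        if total < i * alpha:
            return False
    return True

def reversal(word):
    """rho(A): reverse the word and negate every letter."""
    return [-a for a in reversed(word)]

def reversing_principle_holds(word, alpha):
    """Check the implication for one word (vacuously True if the hypotheses fail)."""
    hypothesis = (is_alpha_heavy(word[:-1], alpha)
                  and sum(word) <= len(word) * alpha)
    return (not hypothesis) or is_alpha_heavy(reversal(word), -alpha)

alphas = [Fraction(1, 3), Fraction(2, 5), Fraction(1, 2), Fraction(3, 4)]
violations = [(w, a) for n in range(1, 9)
              for w in product((0, 1), repeat=n)
              for a in alphas
              if not reversing_principle_holds(list(w), a)]
print(len(violations))  # 0
```

The check reflects the one-line argument behind the lemma: a suffix sum is the total minus a prefix sum, so a small total and large prefixes force every suffix to be small.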
From this point, we refrain from statements in terms of both lightness and heaviness; we refer only to heaviness properties, but analogous statements regarding lightness are all possible. Recall that a set $A \subset \mathbb{R}$ is well-ordered by '≥' if it contains no infinite increasing sequence. We take '≥' to be our standard ordering; for lightness, '≤' would be the relevant ordering.
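Proof. It suffices to prove that if both $A$ and $B$ are well-ordered, then so is the set $A + B = \{\alpha + \beta : \alpha \in A, \beta \in B\}$. Suppose to the contrary that $\gamma_n = \alpha_n + \beta_n$ is an infinite increasing sequence in $A + B$. As $A$ is well-ordered, we may pass to a subsequence $\gamma_{n(i)}$ such that $\alpha_{n(i)}$ is monotonically decreasing (not necessarily strictly): define $n(i+1) = \min\{n > n(i) : \alpha_n = \max\{\alpha_j : j > n(i)\}\}$ and see that $\alpha_{n(i+1)} \le \alpha_{n(i)}$, and each $n(i)$ is defined by well-orderedness of $A$. As the $\gamma_{n(i)}$ are increasing, and the $\alpha_{n(i)}$ are nonincreasing, the $\beta_{n(i)}$ must be an infinite increasing sequence in $B$, a contradiction.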
We now investigate just how frequently one may expect to find α-heavy factors of arbitrary sequences, depending on the target value α.
Lemma 3. Let $\alpha > -\infty$ and $X$ be a sequence such that:
$\liminf_{n \to \infty} \overline{w}(X_{0,n}) = \alpha$.
Then for any $\delta < \alpha$, there is an $N \in \mathbb{N}$ such that the sequence $\sigma^N(X) = x_N x_{N+1} \ldots$ is δ-heavy.
Proof. Fix $\delta < \alpha$, and assume to the contrary that for every $N$, there is some $f(N) > N$ such that the average weight satisfies $\overline{w}(X_{N,f(N)}) < \delta$. Set $k_0 = 0$ and recursively define $k_i = f(k_{i-1})$. Then represent $X = X_{k_0,f(k_0)} X_{k_1,f(k_1)} \ldots$. It is seen that for all $i$, $\overline{w}(X_{0,f(k_i)}) < \delta < \alpha$, contrary to our assumption that $\liminf_{n \to \infty} \overline{w}(X_{0,n}) = \alpha$.
Corollary 1. Let $\{\Omega, \mu, T\}$ be a probability measure preserving system which is not ergodic, and let $f \in L^1(\Omega, \mu)$ be such that $f^*(\omega)$ is not almost everywhere equal to a constant. Then $\mu(H(\mathbb{N})) > 0$.
It is not difficult to construct a sequence $X$ of finite upper density $d^*(X) = \limsup_{n \to \infty} \overline{w}(X_{0,n}) = \alpha$ such that $X$ does not have arbitrarily long α-heavy factors (for example, $x_n = n/(n+1)$ will construct a sequence of upper density one with no 1-heavy factors whatsoever), but the following theorem will extend the idea of Lemma 3 as far as possible:
Theorem 3. The alphabet $\mathcal{A}$ has the property that every $X \in \mathcal{A}^{\mathbb{N}}$ contains arbitrarily long $d^*_B(X)$-heavy factors if and only if $\mathcal{A}$ is well-ordered, where $d^*_B(X)$ denotes the upper Banach density of $X$.
Proof. Assume first that $\mathcal{A}$ is well-ordered, fix $\delta < d^*_B(X)$, and suppose that there is a bound $N$ on the length of δ-heavy factors of $X$. Similarly to Lemma 3, represent $X$ as a string of concatenated words of average weight strictly less than δ, but note that there is now a universal bound on the length of the words. It follows that $d^*_B(X) \le \delta$: for very large $b_i - a_i$, words of length $b_i - a_i$ may be considered as a concatenation of factors of length no larger than $N$, of average weight less than δ, plus small extra pieces at the end of bounded length and weight. So, $X$ must contain arbitrarily long δ-heavy factors for arbitrary $\delta < d^*_B(X)$.
Now assume that there is an $X$ and a bound $N$ on the length of any $d^*_B(X)$-heavy factors of $X$. For any $\varepsilon > 0$, let $s = s(\varepsilon)$ be a factor of length $N$ which is $(d^*_B(X) - \varepsilon)$-heavy (but by assumption, not $d^*_B(X)$-heavy). Define a decreasing sequence $\varepsilon_i$ by fixing an arbitrary $\varepsilon_0 > 0$, and defining each subsequent $\varepsilon_{i+1}$ recursively. Continuing this process, create a sequence of words $\{s(\varepsilon_i)\}_{i=0}^{\infty}$ of length $N$ whose average weights are strictly increasing. By Lemma 3, there is a contradiction. Therefore, $X$ must contain arbitrarily long $d^*_B(X)$-heavy factors.
The proof of the converse is much shorter: let $\{\alpha_i\}_{i=0}^{\infty}$ be a sequence in $\mathcal{A}$ which is strictly increasing. Then $X = \alpha_0 \alpha_1 \ldots$ does not contain any $d^*_B(X)$-heavy factors of any length.
2.2. Local Heaviness. In the following section, we introduce a version of heaviness which does not depend on an arbitrary constant α:
Definition 6. A word $A = a_0 a_1 \ldots a_{n-1}$ is said to be heavy, or locally heavy (light, or locally light), if for all $1 \le i \le n$:
$\overline{w}(A_{0,i}) \ge \overline{w}(A)$ (respectively, $\overline{w}(A_{0,i}) \le \overline{w}(A)$).
Equivalently, for all $1 \le i \le n - 1$:
$\overline{w}(A_{0,i}) \ge \overline{w}(A_{i,n})$ (respectively, $\overline{w}(A_{0,i}) \le \overline{w}(A_{i,n})$).
A sequence $X$ is heavy (light) if for every $i \ge 0$:
$w(X_{0,i}) \ge i \cdot \limsup_{n \to \infty} \overline{w}(X_{0,n})$ (with the analogous inequality for lightness).
The word $A$ is locally heavy if and only if $A$ is $\overline{w}(A)$-heavy. However, the 'target value' in this case varies with the word in question, whereas in Definition 4 there was a preordained α. In the case $\mathcal{A} = \{0, 1\}$, light words are called Lyndon words, an object of study in combinatorics and computer science (see [9]). Again, however, we will suppress statements regarding light words.
Remark. In contrast to α-heaviness, it is generally the case that for a given heavy $A$ of length $n$, there may be some $1 < i < n$ for which $A_{0,i}$ is not locally heavy. Consider the word 1010, which is heavy, and the initial factor 101, which is not.
Proof. First, assume that $\mathcal{A}$ is well-ordered. Let $X \in \mathcal{A}^{\mathbb{N}}$ and $N < \infty$ be such that $X$ contains no heavy factors of length longer than $N$. Then represent $X$ as a chain of heavy words in the following manner: Let $A_1 = x_0 \ldots x_{n_1 - 1}$ be the longest possible heavy factor beginning at $x_0$. By assumption, $n_1 \le N$. Let $A_2 = x_{n_1} \ldots x_{n_1 + n_2 - 1}$ be the longest heavy factor beginning at $x_{n_1}$ and again note that $n_2 \le N$. Continue in this manner to write $X = A_1 A_2 A_3 \ldots$ where each $|A_i| \le N$. In light of Lemma 4, the average weights of these blocks must be strictly increasing. Furthermore, because the length of each $A_i$ is bounded, there must be some specific length which occurs infinitely often, so there is an infinite collection of words of the same length, with strictly increasing average weight. By Lemma 2, this is impossible.
Now, suppose that $\mathcal{A}$ has an infinite subsequence $\{\alpha_i\}_{i=0}^{\infty}$ which is strictly increasing. Then the sequence $X = \alpha_0 \alpha_1 \ldots$ is seen to have no heavy words of length longer than one.
Remark. We do not claim that every X has heavy factors of every length! Consider the sequence 101010 . . .; the alphabet {0, 1} is certainly well-ordered, but X does not have any heavy factors of odd length larger than one.
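The two remarks above are easy to confirm by direct computation. The following Python sketch is a minimal illustration with hypothetical helper names: `is_heavy` tests local heaviness of a finite word by comparing each prefix average with the average of the whole word (cross-multiplied to stay in integers), and `heavy_factor_lengths` lists the lengths at which a finite word has at least one heavy factor.

```python
def is_heavy(word):
    """True if every prefix has average weight >= the average weight of the word."""
    n, total = len(word), sum(word)
    partial = 0
    for i, a in enumerate(word, start=1):
        partial += a
        # partial / i >= total / n, cross-multiplied to avoid division
        if partial * n < i * total:
            return False
    return True

def heavy_factor_lengths(word, max_len):
    """Lengths n <= max_len for which the word has at least one heavy factor."""
    return [n for n in range(1, max_len + 1)
            if any(is_heavy(word[i:i + n]) for i in range(len(word) - n + 1))]

print(is_heavy([1, 0, 1, 0]))   # True
print(is_heavy([1, 0, 1]))      # False: the prefix 10 has average 1/2 < 2/3
print(heavy_factor_lengths([1, 0] * 10, 8))  # [1, 2, 4, 6, 8]
```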

The Morse-Thue sequence and substitution systems
In this section, we will define the classical Morse sequence, using it as an example to discuss certain aspects of heaviness. After discussing this sequence, we will make brief remarks extending these properties to a general class of substitution systems.
Corollary 3. Let $X$ be any sequence which is a factor of the Morse sequence such that $X_{0,2} = 11$. Then $X$ is $\frac{1}{2}$-heavy. Similarly, if $X$ is a sequence which is a factor of the Morse sequence, and $X$ begins with 00, then $X$ is $\frac{1}{2}$-light.
Proof. Pick an initial word $X_{0,i}$ of the form $11X'$ where $2k < |X'| \le 2k + 2$. Then $w(X_{0,i}) = 2 + w(X') \ge k + 2$ while $|X_{0,i}| \le 2k + 4$, so the average weight satisfies $\overline{w}(X_{0,i}) \ge \frac{1}{2}$. The proof is similar for $X_{0,2} = 00$.
We now appeal to a well-known result: the system generated by M is uniquely ergodic (see [ 0) is of positive measure; infinite words which end in 00. Therefore, in the two-sided sequence, any bisequence $X$ with $x_{-2} x_{-1} x_0 x_1 = 0011$ will be in $H^{\chi_\Gamma}_\sigma(\mathbb{Z})$, and the set $\Delta' = \{X : x_{-2} x_{-1} x_0 x_1 = 0011\}$ is also seen to be of positive measure (to be precise, $\mu(\Delta') = \frac{1}{12}$).
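Corollary 3 can be tested on a long prefix of the Morse sequence. The sketch below assumes the standard description of the Morse sequence as the parity of the binary digit sum of $n$ (beginning 0110 1001 ...); this description and the helper names are assumptions made for the example. Every occurrence of 11 is used as a starting point, and each prefix of the remaining tail is compared against the bound $i/2$; occurrences of 00 are tested against the reversed inequality.

```python
def morse(length):
    """First `length` terms of the Morse sequence via binary digit-sum parity."""
    return [bin(n).count("1") % 2 for n in range(length)]

def is_half_heavy(word):
    """True if every prefix of length i has weight at least i / 2."""
    total = 0
    for i, a in enumerate(word, start=1):
        total += a
        if 2 * total < i:
            return False
    return True

def is_half_light(word):
    """True if every prefix of length i has weight at most i / 2."""
    total = 0
    for i, a in enumerate(word, start=1):
        total += a
        if 2 * total > i:
            return False
    return True

M = morse(1 << 10)
starts_11 = [j for j in range(len(M) - 1) if M[j] == 1 and M[j + 1] == 1]
starts_00 = [j for j in range(len(M) - 1) if M[j] == 0 and M[j + 1] == 0]
print(all(is_half_heavy(M[j:]) for j in starts_11))  # True
print(all(is_half_light(M[j:]) for j in starts_00))  # True
```

The check succeeds because, after an occurrence of 11 or 00, the remaining letters fall into aligned pairs 01 or 10, each of which contributes exactly one to the weight.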
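Proof. The quantities $S_n(\omega) - n\alpha$ are bounded almost everywhere (from Theorem 5) and discrete ($S_n(\omega) \in \mathbb{Z}$, and $\alpha \in \mathbb{Q}$). Therefore, the minimum value of the sequence $\{S_n(\omega) - n\alpha\}_{n=0}^{\infty}$ is achieved at some minimal time $N(\omega)$ for almost all $\omega \in \Omega$. So, $\omega \in T^{-N(\omega)}(H(\mathbb{N}))$. As almost every $\omega$ therefore lies in the countable union $\bigcup_{N \ge 0} T^{-N}(H(\mathbb{N}))$ and $T$ preserves $\mu$, it follows that $\mu(H(\mathbb{N})) > 0$.
We will provide a brief overview of a class of systems in symbolic dynamics with rational eigenvalues, as well as illustrating why satisfying (5) with an irrational eigenvalue does not guarantee positive measure heavy sets.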
The Morse sequence may also be viewed as a fixed point of the substitution defined by $0 \to 01$ and $1 \to 10$:
$0 \to 01 \to 0110 \to 01101001 \to \cdots$
In general, define a substitution system $\Sigma$ on $\Omega_k = \{0, 1, \ldots, k-1\}$ by assigning $\Sigma(a) \in \Omega_k^{n(a)}$ for all $a \in \Omega_k$, with $n(a) < \infty$ for all $a$ (that is, $\Sigma$ assigns a word to each letter). The substitution matrix for $\Sigma$ is $A_{i,j} = \#\{j \text{ in } \Sigma(i)\}$. In the case of the Morse sequence, the matrix is given by $A_{i,j} \equiv 1$.
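A substitution and its matrix are straightforward to compute. The following Python sketch is a minimal illustration in which a substitution is represented as a dictionary from letters to words; the names `apply_substitution`, `iterate`, `substitution_matrix`, and `MORSE` are hypothetical choices for this example.

```python
def apply_substitution(rule, word):
    """Replace every letter a of the word by the word rule[a]."""
    return [b for a in word for b in rule[a]]

def iterate(rule, seed, times):
    """Apply the substitution `times` times, starting from the word `seed`."""
    word = list(seed)
    for _ in range(times):
        word = apply_substitution(rule, word)
    return word

def substitution_matrix(rule):
    """Entry (i, j) counts the occurrences of the letter j in rule[i]."""
    k = len(rule)
    return [[rule[i].count(j) for j in range(k)] for i in range(k)]

MORSE = {0: [0, 1], 1: [1, 0]}
print("".join(map(str, iterate(MORSE, [0], 4))))  # 0110100110010110
print(substitution_matrix(MORSE))                 # [[1, 1], [1, 1]]
```

Because each image word begins with the letter it replaces, every iterate is a prefix of the next, which is what makes the limiting sequence a fixed point.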
Two different substitution systems might have the same matrix. However, if the matrix $A$ is primitive ($\exists n : (A^n)_{i,j} > 0 \ \forall i, j$), then the shift map defines a uniquely ergodic system $\{\Omega, \sigma\}$ on some limiting sequence $X \in \overline{\{\Sigma^n(x_0)\}_{n=1}^{\infty}}$ such that $\Sigma^N(X) = X$ for some $N \in \mathbb{N}$ (see [6, Ch. 5]).
In the event that $\Sigma$ is a substitution of constant length ($n(a)$ is constant over $\Omega_k$), and $X$ is a periodic point under the substitution $\Sigma$, then the system $\{\overline{O^+(X)}, \mu, \sigma\}$ has nontrivial rational eigenvalues (see [6, Ch. 7]), and therefore there are nontrivial $\mu$-integrable functions $f : \overline{O^+(X)} \to \mathbb{R}$ with positive-measure heavy sets (in light of Corollary 5).
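Primitivity of a small nonnegative integer matrix can be tested by examining finitely many powers. The sketch below is an illustration under a stated assumption: it relies on Wielandt's bound, a standard fact not taken from the text, that a primitive $k \times k$ matrix already has all entries of its $((k-1)^2 + 1)$-th power positive. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]

def is_primitive(matrix):
    """True if some power of the matrix has all entries strictly positive."""
    k = len(matrix)
    bound = (k - 1) ** 2 + 1  # Wielandt's bound on the exponent
    power = matrix
    for _ in range(bound):
        if all(entry > 0 for row in power for entry in row):
            return True
        power = mat_mul(power, matrix)
    return False

print(is_primitive([[1, 1], [1, 1]]))  # True  (Morse substitution)
print(is_primitive([[1, 1], [1, 0]]))  # True  (its square is positive)
print(is_primitive([[0, 1], [1, 0]]))  # False (powers alternate)
```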

Sturmian sequences
Definition 7. A binary word, sequence, or bi-sequence X is called Sturmian if factors of the same length differ in weight by at most one.
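For a finite binary word, Definition 7 can be checked directly by comparing the weights of all factors of each length. The following Python sketch is a minimal illustration (the name `is_sturmian` and the sample words are assumptions for this example); prefix sums are used so that each factor weight is obtained by a single subtraction.

```python
def is_sturmian(word):
    """True if any two factors of equal length differ in weight by at most one."""
    n = len(word)
    prefix = [0]
    for a in word:
        prefix.append(prefix[-1] + a)
    for length in range(1, n + 1):
        weights = [prefix[i + length] - prefix[i] for i in range(n - length + 1)]
        if max(weights) - min(weights) > 1:
            return False
    return True

print(is_sturmian([0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1]))  # True
print(is_sturmian([1, 1, 0, 0]))  # False: 11 and 00 differ in weight by two
```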
Remark. In a Sturmian sequence, the density of the sequence exists [6, Ch. 6]:
$d(X) = \lim_{n \to \infty} \overline{w}(X_{0,n})$.
We restrict our attention to those Sturmian $X$ of density $\alpha \notin \mathbb{Q}$; Sturmian sequences of rational density are eventually periodic.
4.1. α-Heaviness in Sturmian Sequences. We note the following theorem:
Theorem 6 (E. Coven, G. Hedlund [4]). A bi-infinite binary sequence $X$ of minimal complexity ($p(n) = n + 1$) is Sturmian if and only if for every $A$ which is a factor of $X$, $A^T$ is also a factor of $X$.
We use it to derive the following:
Theorem 7. A bi-infinite binary sequence $X$ is Sturmian of irrational density α if and only if for any $n \in \mathbb{N}$, there is a unique factor $A$ of length $n$ such that $A$ is α-heavy, and a unique factor $A$ of length $n$ such that $\rho(A)$ is $(-\alpha)$-heavy.
Proof. Assume $X$ is Sturmian, and let $\alpha \notin \mathbb{Q}$ be its density. Then by Theorem 3, $X$ contains arbitrarily long α-heavy factors ($\mathcal{A} = \{0, 1\}$ is certainly well-ordered). It is also seen that $X$ contains as factors arbitrarily long transposes of α-light factors (using the 'α-light' version of Theorem 3, and noting that $x_{-1} x_{-2} x_{-3} \ldots$ is also Sturmian of density α). So, $X$ has arbitrarily long factors $A$ such that $\rho(A)$ is $(-\alpha)$-heavy. Therefore, there exists at least one word of each length which satisfies our criteria.
Let $A$, $B$ be two distinct factors of length $n$ which are α-heavy. Assume they are of minimal length $n > 0$, so that $A = C0$ and $B = C1$ for some factor $C$. Then $w(A)$ and $w(B)$ are the two possible weights for factors of length $n$ in $X$, and both weights are at least as large as $n\alpha$. As $\alpha \notin \mathbb{Q}$, both are strictly larger than $n\alpha$. Therefore, every factor of length $n$ has average weight larger than some $\alpha + \varepsilon$, contradicting the fact that $X$ was of density α. The proof is similar for two factors of length $n > 0$ whose reversals are $(-\alpha)$-heavy. If $\alpha \in \mathbb{Q}$, the result does not hold: consider $X = \ldots (10)(10)1(10)(10)(10) \ldots$. This sequence is Sturmian, and the factors 10 and 11 are both $\frac{1}{2}$-heavy.
For the converse, it will suffice, in light of Theorem 6, to show that $X$ is of minimal complexity and all transposes of factors are also factors. Assume that the following conditions all hold for $1 \le i \le n - 1$ (they are easy to verify for $i = 1$):
I. There is a unique α-heavy factor of length $i$, and a unique factor of length $i$ whose reversal is $(-\alpha)$-heavy. This condition has been assumed for all $i$.
II. If $A$ is a factor of length $i$, then $A^T$ is a factor.
III. The factor $A$ of length $i - 1$ is a right (left) special factor if and only if $1A^T$ is α-heavy ($\rho(1A)$ is $(-\alpha)$-heavy).
By combining (I) and (II), $X$ contains unique α-light factors of length $i$, and unique words of length $i$ whose reversals are $(-\alpha)$-light. Adding (III), $p(n - 1) = n$, as there is a unique right (or left) special factor for all $i \le n - 1$. Establishing (II) and (III) for all $n$, then, would ensure both minimal complexity and admissibility of transposes, sufficient to show that $X$ is Sturmian.
Assume, then, that there is a factor $A$ of length $n \ge 2$ such that $A^T$ is not a factor. Then, by our inductive hypothesis, $A^T_{0,n-1}$ and $A^T_{1,n}$ are both factors. As $X$ is a bisequence, both of these words have precursors and successors (they do not begin or end $X$), and as $A^T$ itself is not a factor, the words $A^T_{0,n-1}\,\overline{a_0} = a_{n-1} A^T_{1,n-1}\,\overline{a_0}$ and $\overline{a_{n-1}}\,A^T_{1,n} = \overline{a_{n-1}}\,A^T_{1,n-1}\,a_0$ are factors. Note, then, that $A^T_{1,n-1}$ is both a left and right special factor. Therefore, by (III), $1A_{1,n-1}$ is the unique α-heavy factor of length $n - 1$, and $A_{1,n-1}1$ is the unique α-light factor of length $n - 1$. Then $1A_{1,n-1} = (A_{1,n-1}1)^T \Rightarrow A^T_{1,n-1} = A_{1,n-1}$. As $A^T$ is not a factor, $A \ne A^T$, so $a_0 \ne a_{n-1}$. Without loss of generality, let $a_0 = 1$ and $a_{n-1} = 0$. Let $B = B^T = A_{1,n-1}$, for convenience.
The following have been shown to be factors of $X$: $1B0$, $0B0$, and $1B1$. As $1B$ was α-heavy, $1B1$ is the unique α-heavy factor of length $n$. Therefore, $1B0$ is not α-heavy, and by the Reversing Principle (Lemma 1), the word $\rho(1B0)$ is $(-\alpha)$-heavy. Then certainly $\rho(0B0)$ is $(-\alpha)$-heavy and of the same length, contradicting (I), our original assumption. Therefore, (II) holds for factors of length $n$ as well.
It remains only to show (III) for factors of length $n$. Let $A$ be the unique right special factor of length $n - 1$; $A1$ and $A0$ both appear. Inductively, then, $1A^T$ is the unique α-heavy factor of length $n - 1$. However, by (II), both $1A^T$ and $0A^T$ appear. So, if $1A^T$ is not α-heavy, it follows (again, by the Reversing Principle) that $A1$ and $A0$ are both α-light, so that the reversals of the transposes are both $(-\alpha)$-heavy, contradicting (I).
In the reverse, let $1A$ be the unique α-heavy factor of length $n$. Then $1A_{0,n-2}$ is the unique α-heavy factor of length $n - 1$, so $A^T_{0,n-2}$ is a right special factor. If $A^T$ is not a right special factor, then $a_{n-2} A^T_{1,n-2}$ is, and the previous reasoning would ensure that $1A_{1,n-2} a_{n-2}$ would be α-heavy, contradicting (I).
So, there is a one-to-one correspondence between α-heavy factors of length $n$ (which exist and are unique) and right special factors of length $n - 1$. Therefore, $p(n) = n + 1$ for all $n$, and as $X$ contains as factors all transposes of factors, $X$ is Sturmian.
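The uniqueness statement of Theorem 7 can be observed numerically. The sketch below is an illustration under stated assumptions: it uses the Fibonacci word (fixed point of $0 \to 01$, $1 \to 0$), a standard Sturmian example not taken from the text, whose density of ones is $(3 - \sqrt{5})/2$; the factors of a long prefix stand in for the factors of the bi-infinite sequence; and floating-point comparison is used, which is safe here only because $i\alpha$ stays well away from the integers for the short lengths examined. All function names are hypothetical.

```python
import math

def fibonacci_word(length):
    """Prefix of the Fibonacci word as a list of 0/1 letters."""
    word = [0]
    while len(word) < length:
        word = [b for a in word for b in ((0, 1) if a == 0 else (0,))]
    return word[:length]

def factors(word, n):
    """Set of factors of length n, each stored as a tuple."""
    return {tuple(word[i:i + n]) for i in range(len(word) - n + 1)}

def is_alpha_heavy(word, alpha):
    """True if every prefix of length i has weight at least i * alpha."""
    total = 0
    for i, a in enumerate(word, start=1):
        total += a
        if total < i * alpha:
            return False
    return True

def reversal(word):
    """rho(A): reverse the word and negate every letter."""
    return [-a for a in reversed(word)]

ALPHA = (3 - math.sqrt(5)) / 2  # density of ones in the Fibonacci word
x = fibonacci_word(2000)
for n in range(1, 11):
    heavy = [A for A in factors(x, n) if is_alpha_heavy(A, ALPHA)]
    rev_heavy = [A for A in factors(x, n) if is_alpha_heavy(reversal(A), -ALPHA)]
    print(n, len(heavy), len(rev_heavy))
```

For each length the two counts are both one, matching the theorem; for instance, the α-heavy factor of length 4 is 1010 and the factor of length 4 whose reversal is $(-\alpha)$-heavy is 0100.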
Remark. The heart of the above theorem is that Sturmian words can be characterized as having unique special factors, and when special factors are unique, they can be characterized by a heaviness condition.
Corollary 6. Let $X$ be a Sturmian sequence (bisequence), of density $\alpha \notin \mathbb{Q}$. Then $\overline{O^+(X)}$ contains exactly one sequence (bisequence) which is α-heavy.
Proof. Let $x(n)$ be the α-heavy factor of $X$ which is of length $n$. It is seen that for $m > n$, $m, n \in \mathbb{N}$, $x(m) = x(n)A$ for some word $A$ of length $m - n$. Therefore, in the compact space $\{0, 1\}^{\mathbb{N}}$, let $X' = \lim_{n \to \infty} x(n)$. The sequence $X'$ is unique, is in the system generated by $X$, and $X'$ uniquely extends on the left as a Sturmian bisequence (see [6, Ch. 6]).

4.2. Local Heaviness in Sturmian Sequences. We will now approach the subject of Sturmian sequences using local heaviness (Definition 6), rather than α-heaviness.
Lemma 6. There is exactly one heavy Sturmian word of length n and weight m for any choice 0 ≤ m ≤ n < ∞.
Proof. For convenience, denote a word of length n and weight m as type (m, n), and the claim is apparent if m = 0 or n = 1. Assume, then, that the claim is true for all words of length smaller than n.
Let $A$ be Sturmian and heavy, of type $(pk_1, pk_2)$, where $k_1$ and $k_2$ are relatively prime. Then applying the pigeonhole principle and Lemma 4, $A = A_1 A_2 \ldots A_p$, where each $A_i$ is Sturmian and heavy, of type $(k_1, k_2)$. Therefore, it is sufficient to prove the claim in the event when $m$ and $n$ are relatively prime. If $m > n/2$, $A$ must be of the form $1^{n_0} 0\, 1^{n_1} 0 \ldots 1^{n_k} 0$, where the Sturmian condition requires that each $n_i = N$ or $N + 1$ for some $N$. As $\gcd(m, n) = 1$, both values occur (if all $n_i = N$, for instance, $n = (k+1)(N+1)$ and $m = (k+1)N$), and the heaviness condition requires that $n_0 = N + 1$ and $n_k = N$. Define $f(1^{n_i} 0) = n_i - N$, and associate to $A$ the smaller word $B = f(1^{n_0} 0) f(1^{n_1} 0) \ldots f(1^{n_k} 0)$.
We now show that $B$ is Sturmian. Let $B'$ and $B''$ be factors of $B$ of equal length such that $w(B') = w(B'') + 2$. Then consider the two factors $A'$ and $A''$ of $A$ with $f(A') = B'$ and $f(A'') = B''$: $w(A') = w(A'') + 2$, but $|A'| = |A''| + 2$ as well. However, by assuming the minimality on the length of $B'$ and $B''$, $B''$ begins with a zero. Therefore, $A''$ is not an initial factor of $A$; $A''$ is preceded by a zero. Let $C_1 = 0A''$. Also, the last element of $A'$ must be a zero, so let $C_2 = A'_{0,|A'|-1}$. Then $w(C_2) = w(C_1) + 2$, but $|C_1| = |C_2|$, contradicting the assumption that $A$ is Sturmian. Now, to show that $B$ is heavy, begin with knowledge that $A$ is heavy and recall $n_i = N + f(1^{n_i} 0)$. So $B$ is a heavy Sturmian word of smaller length, and by the inductive hypothesis $B$ is unique, and therefore $A$ is unique.
The proof works similarly for m < n/2, by considering the lengths of blocks of zeroes.
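Lemma 6 lends itself to a brute-force check for short words. The following Python sketch is a minimal illustration (the helper names are hypothetical): it enumerates all binary words of each type $(m, n)$ with $n \le 9$ and counts those that are both locally heavy and Sturmian in the sense of Definition 7.

```python
from itertools import product

def is_heavy(word):
    """True if every prefix has average weight >= the average weight of the word."""
    n, total = len(word), sum(word)
    partial = 0
    for i, a in enumerate(word, start=1):
        partial += a
        if partial * n < i * total:
            return False
    return True

def is_sturmian(word):
    """True if any two factors of equal length differ in weight by at most one."""
    n = len(word)
    for length in range(1, n + 1):
        weights = [sum(word[i:i + length]) for i in range(n - length + 1)]
        if max(weights) - min(weights) > 1:
            return False
    return True

def heavy_sturmian_words(n, m):
    """All binary words of length n and weight m that are heavy and Sturmian."""
    return [w for w in product((0, 1), repeat=n)
            if sum(w) == m and is_heavy(w) and is_sturmian(w)]

for n in range(1, 10):
    counts = [len(heavy_sturmian_words(n, m)) for m in range(n + 1)]
    print(n, counts)
print(heavy_sturmian_words(5, 2))  # [(1, 0, 1, 0, 0)]
```

Every count printed is one, in agreement with the lemma; the word of type $(2, 4)$ is 1010, the concatenation of two copies of the word of type $(1, 2)$, as in the first step of the proof.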
Corollary 7. Let $X$ be a Sturmian sequence which contains at least two ones and two zeroes. Let $N = \max\{n : X \text{ contains } 1^n \text{ or } 0^n \text{ as a factor}\}$, noting that our assumptions guarantee $N < \infty$ (and either 11 or 00 is not a factor of $X$). Then $X$ has exactly two distinct heavy factors of lengths $n \le N$, and at most one heavy factor of all other lengths.
Proof. All factors of $X$ are Sturmian, and given a fixed length $n$, there are at most two weights possible for factors of length $n$. It is therefore clear in light of Lemma 6 that for any $n$, there are at most two heavy factors of length $n$. Assume that $d(X) \ge 1/2$, so that $N$ is the length of the longest string of ones. Then $1^N 0$ (where $1^N$ represents a string of $N$ consecutive ones) is a factor, so $1^{N-i-1} 0$ and $1^{N-i}$ for $i = 0, \ldots, N - 1$ are also factors, giving two heavy factors of lengths $1, \ldots, N$. The proof is similar if $N$ is the length of the longest string of zeroes.
If $X$ has two distinct heavy factors of length $n > N$, then as they are Sturmian heavy words and distinct, they must be of two different weights, and the weights therefore differ by one. Let $B$ and $C$ be the two factors, and let $w(B) = w(C) + 1$. That $n > N$ ensures that each of them begins with a one and ends with a zero; a heavy binary word containing both 1 and 0 must begin with a one and end with a zero. Then it is seen that $w(B_{0,n-1}) = w(B) = w(C) + 1 = w(C_{1,n}) + 2$, which contradicts that $X$ is Sturmian.
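The count of heavy factors in Corollary 7 can be observed on a concrete Sturmian sequence. The sketch below is an illustration under stated assumptions: it again uses the Fibonacci word (fixed point of $0 \to 01$, $1 \to 0$), which is not taken from the text, with the factors of a long prefix standing in for the factors of the full sequence. In this word 00 occurs but neither 000 nor 11 does, so $N = 2$. The helper names are hypothetical.

```python
def fibonacci_word(length):
    """Prefix of the Fibonacci word as a list of 0/1 letters."""
    word = [0]
    while len(word) < length:
        word = [b for a in word for b in ((0, 1) if a == 0 else (0,))]
    return word[:length]

def is_heavy(word):
    """True if every prefix has average weight >= the average weight of the word."""
    n, total = len(word), sum(word)
    partial = 0
    for i, a in enumerate(word, start=1):
        partial += a
        if partial * n < i * total:
            return False
    return True

def heavy_factors(word, n):
    """Set of distinct heavy factors of length n in the finite word."""
    return {tuple(word[i:i + n]) for i in range(len(word) - n + 1)
            if is_heavy(word[i:i + n])}

def longest_run(word):
    """Length of the longest block of equal consecutive letters."""
    best = run = 1
    for prev, cur in zip(word, word[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

x = fibonacci_word(2000)
N = longest_run(x)
counts = {n: len(heavy_factors(x, n)) for n in range(1, 13)}
print(N, counts)
```

The output shows two heavy factors for each of the lengths 1 and 2 and at most one for every longer length examined.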
Corollary 8. Let $X$ be a Sturmian sequence whose density is $\alpha \notin \mathbb{Q}$. Then there is a unique locally heavy sequence in $\overline{O^+(X)}$.
Proof. Recall that for a sequence to be locally heavy is, by definition, the same as the sequence being $\limsup_{n \to \infty} \overline{w}(X_{0,n})$-heavy. We appeal to Corollary 6.