Initial nonrepetitive complexity of regular episturmian words and their Diophantine exponents

Regular episturmian words are episturmian words whose directive words have a regular and restricted form, making them behave more like Sturmian words than general episturmian words. We present a method to evaluate the initial nonrepetitive complexity of regular episturmian words, extending the work of Wojcik on Sturmian words. For this, we develop a theory of generalized Ostrowski numeration systems and show how to associate with each episturmian word a unique sequence of numbers written in this numeration system. The description of the initial nonrepetitive complexity allows us to obtain novel results on the Diophantine exponents of regular episturmian words. We prove that the Diophantine exponent of a regular episturmian word is finite if and only if its directive word has bounded partial quotients. Moreover, we prove that the Diophantine exponent of a regular episturmian word is strictly greater than 2 if the sequence of partial quotients is eventually at least 3. Given an infinite word $x$ over an integer alphabet, we may consider a real number $\xi_x$ having $x$ as a fractional part. The Diophantine exponent of $x$ is a lower bound for the irrationality exponent of $\xi_x$. Our results thus yield nontrivial lower bounds for the irrationality exponents of real numbers whose fractional parts are regular episturmian words. As a consequence, we identify a new uncountable class of transcendental numbers whose irrationality exponents are strictly greater than 2. This class contains an uncountable subclass of Liouville numbers.


Introduction
The fractional part of the expansion of a real number in some base can be interpreted as a right-infinite word. Major open problems in number theory concern the expansions of well-known numbers such as $\sqrt{2}$, $\pi$, or $e$. The inverse problem of inferring properties of a number $\xi_x$ whose fractional part matches a prescribed infinite word $x$ has attracted much attention, especially in the last two decades. One of the most significant results is that of Adamczewski and Bugeaud [3] from 2007 stating that if the factor complexity function $p(x, n)$ of an aperiodic infinite word $x$ (a purely combinatorial notion) is sublinear, then $\xi_x$ is transcendental. More recently, Bugeaud and Kim [13] introduced the notion of the exponent of repetition of an infinite word and studied it in relation to Sturmian words, proving, among other results, that if $\lim_{n\to\infty} (p(x, n) - n) < \infty$, then the irrationality exponent of $\xi_x$ is at least $5/3 + 4\sqrt{10}/15$. The exponent of repetition is closely linked to the notion of the Diophantine exponent of an infinite word. The significance of this notion is that the Diophantine exponent of $x$ is a lower bound for the irrationality exponent of $\xi_x$. In this paper, we consider the class of regular episturmian words and prove results on their Diophantine exponents by characterizing their initial nonrepetitive complexity function. This provides novel results on the irrationality exponents of numbers whose fractional parts match a regular episturmian word.

Episturmian Words
An infinite word is Sturmian if it has exactly $n + 1$ distinct factors (subwords) of length $n$ for all $n$. Sturmian words have many equivalent definitions and they can be generalized in various ways depending on the definition being used. Generalizing the work of de Luca on iterated palindromic closure, Droubay, Justin, and Pirillo introduced in [16] episturmian words that further generalize the so-called Arnoux-Rauzy words [8]. This purely combinatorial generalization is defined as follows. For a finite word $w$, let $w^{(+)}$ be the shortest palindrome having $w$ as a prefix. Let $\Delta = y_1 y_2 \cdots$ be an infinite word and define a sequence $(u_k)$ of finite words as follows: $u_1 = \varepsilon$ and $u_{k+1} = (u_k y_k)^{(+)}$ for $k \geq 1$. The limit $c_\Delta$ of the words $(u_k)$ is the standard episturmian word with directive word $\Delta$. An episturmian word with directive word $\Delta$ is an infinite word sharing the set of factors with $c_\Delta$. Sturmian words correspond to directive words that are binary and not ultimately constant. The most famous standard episturmian word that is not Sturmian is the Tribonacci word $01020100102010102010010201020100102010102010010201 \cdots$ having directive word $012012012 \cdots$. The main results of this paper are stated for regular episturmian words whose directive words are of a special form. Write $\Delta$ in the form $x_1^{a_1} x_2^{a_2} \cdots$ with $x_k \neq x_{k+1}$ and $a_k > 0$ for all $k$. If the sequence $x_1 x_2 \cdots$ equals the periodic sequence with period $012 \cdots (d-1)$ for some $d$, then we say that the episturmian words with directive word $\Delta$ are regular with period $d$. This class of regular episturmian words contains Sturmian words (case $d = 2$) and $d$-bonacci words, generalizations of the Fibonacci and Tribonacci words.
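The construction by iterated palindromic closure can be tried out on finite directive prefixes. The following is a minimal sketch, assuming the usual convention that the first word of the sequence is empty and that each step appends the next directive letter before closing; the function names `palindromic_closure` and `central_words` are illustrative and not taken from the paper.

```python
from typing import List


def palindromic_closure(w: str) -> str:
    """Return the shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        suffix = w[i:]
        if suffix == suffix[::-1]:
            # w[i:] is the longest palindromic suffix of w;
            # mirror the part of w that precedes it.
            return w + w[:i][::-1]
    return w  # not reached: the empty suffix is a palindrome


def central_words(directive: str) -> List[str]:
    """Return the words obtained from a finite directive prefix.

    The list starts with the empty word and each further entry is the
    palindromic closure of the previous entry followed by the next letter.
    """
    words = [""]
    for letter in directive:
        words.append(palindromic_closure(words[-1] + letter))
    return words


# Directive word 012012012 (a prefix of the Tribonacci directive word).
tribonacci_prefix = central_words("012" * 3)[-1]
print(tribonacci_prefix[:50])
```

Each word in the returned list is a palindromic prefix of the limit word, so the last entry gives a long prefix of the standard episturmian word.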
Episturmian words enjoy many of the good properties of Sturmian words as described in the foundational papers [16,23,24] and the survey [19], but some properties, such as the interpretation as codings of irrational rotations, are lost. Standard references for Sturmian words are [26, Ch. 2] and [32, Ch. 6]. We refer the reader to [31, Ch. 4] for an introduction to Sturmian words as codings of irrational rotations.

Initial Nonrepetitive Complexity
We define the initial nonrepetitive complexity function $\mathrm{inrc}(x, n)$ of an infinite word $x$ by $\mathrm{inrc}(x, n) = \max\{m : x[i, i+n-1] \neq x[j, j+n-1] \text{ for all } i, j \text{ with } 1 \leq i < j \leq m\}$.
The number $\mathrm{inrc}(x, n)$ is the maximum number of factors of length $n$ seen when $x$ is read from left to right prior to the first repeated factor of length $n$. In other words, the prefix of $x$ of length $\mathrm{inrc}(x, n) + n$ is the shortest prefix of $x$ containing two occurrences of some factor of length $n$.
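The definition translates directly into a left-to-right scan. The sketch below works on a finite prefix only, so it returns `None` when the prefix is too short to contain a repeated factor; the helper `fibonacci_word` and the function name `inrc` are illustrative choices.

```python
from typing import Optional


def fibonacci_word(min_length: int) -> str:
    """Return a prefix of the Fibonacci word of length at least min_length."""
    a, b = "0", "01"
    while len(b) < min_length:
        a, b = b, b + a
    return b


def inrc(prefix: str, n: int) -> Optional[int]:
    """Number of length-n factors read before the first repeated one.

    Returns None if no factor of length n repeats inside the given prefix.
    """
    seen = set()
    for i in range(len(prefix) - n + 1):
        factor = prefix[i:i + n]
        if factor in seen:
            # Factors at 0-based positions 0..i-1 are pairwise distinct,
            # so exactly i factors precede the first repetition.
            return i
        seen.add(factor)
    return None


fib = fibonacci_word(100)
print([inrc(fib, n) for n in range(1, 5)])
```

For the Fibonacci word the factors of length 2 are read as 01, 10, 00, 01, so the scan stops at the fourth factor and reports the value 3.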
The notion of initial nonrepetitive complexity was introduced independently by Moothatu [28] and Bugeaud and Kim [13]. Nicholson and Rampersad [29] examine the general properties of this function and determine it explicitly for certain words such as the Thue-Morse word, the Fibonacci word, and the Tribonacci word. Their results were generalized to all standard Arnoux-Rauzy words in [27] by Medková et al. In this paper, we set up a framework that allows us, in principle, to determine the initial nonrepetitive complexity of any episturmian word. In order to do this, we generalize the S-adic expansion of Sturmian words derived in [10] to all episturmian words and prove in Theorem 3.13 that an episturmian word $t$ can be expressed in the form $t = \lim_{k\to\infty} T^{\rho_k}(c)$, where $c$ is the corresponding standard episturmian word and $(\rho_k)$ is an integer sequence expressed in a generalized Ostrowski numeration system. In other words, we show that each episturmian word is a limit of appropriate shifts of the corresponding standard word $c$. This means that studying a property of episturmian words can be reduced to studying the property on shifts of standard episturmian words. Here we apply this principle and determine the initial nonrepetitive complexity of regular episturmian words, generalizing a result of Wojcik [35] on Sturmian words. This results in Theorem 5.10, which is too complicated to be stated here. We leave the characterization of this function for all episturmian words open.

Diophantine Exponents and Main Results
The Diophantine exponent is a combinatorial exponent of infinite words. It was introduced in [2], but it is used implicitly in earlier works by the same authors. Definition 1.1. Let $x$ be an infinite word. We define its Diophantine exponent, denoted by $\mathrm{dio}(x)$, to be the supremum of all real numbers $\rho$ for which there exist arbitrarily long prefixes of $x$ of the form $UV^e$, where $U$ and $V$ are finite words and $e$ is a real number, such that $|UV^e| / |UV| \geq \rho$. The concept of Diophantine exponent has inherent interest to a word-combinatorist, and it has connections to other combinatorial exponents widely studied in combinatorics on words. What makes it special is the ingenious and simple result that $\mathrm{dio}(x)$ is a lower bound for the irrationality exponent $\mu(\xi_{x,b})$ of the real number $\xi_{x,b}$ having $x$ as the fractional part of its base-$b$ expansion.
The crucial result here is that of Bugeaud and Kim [13] stating that $\mathrm{dio}(x) = 1 + \limsup_{n\to\infty} \frac{n}{\mathrm{inrc}(x, n)}$.
This, together with the characterization of the function $\mathrm{inrc}(x, n)$ for regular episturmian words (Theorem 5.10), provides a means to find lower bounds for $\mu(\xi_{x,b})$ when $x$ is a regular episturmian word.
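The ratio appearing in the formula of Bugeaud and Kim can be tabulated on a finite prefix. The sketch below is only an experiment: a limit superior cannot be read off from finitely many values, so the returned number merely indicates how large the ratio gets on the inspected range. The names `fibonacci_word`, `inrc`, and `max_ratio_plus_one` are illustrative.

```python
from fractions import Fraction
from typing import Iterable, Optional


def fibonacci_word(min_length: int) -> str:
    """Return a prefix of the Fibonacci word of length at least min_length."""
    a, b = "0", "01"
    while len(b) < min_length:
        a, b = b, b + a
    return b


def inrc(prefix: str, n: int) -> Optional[int]:
    """Number of length-n factors read before the first repeated one."""
    seen = set()
    for i in range(len(prefix) - n + 1):
        factor = prefix[i:i + n]
        if factor in seen:
            return i
        seen.add(factor)
    return None


def max_ratio_plus_one(prefix: str, lengths: Iterable[int]) -> Optional[Fraction]:
    """Return 1 + max n / inrc(prefix, n) over the given lengths n.

    Lengths for which the prefix is too short are skipped. This is a
    finite experiment, not the limit superior itself.
    """
    best = None
    for n in lengths:
        m = inrc(prefix, n)
        if m is None:
            continue
        ratio = Fraction(n, m)
        if best is None or ratio > best:
            best = ratio
    return None if best is None else 1 + best


print(max_ratio_plus_one(fibonacci_word(200), range(1, 5)))
```

Exact rational arithmetic is used so that ratios for different lengths can be compared without rounding.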
Our two main results are as follows. The results were previously proved for Sturmian words by Adamczewski (based on the results of [10]) and Komatsu [25], respectively. Theorem 1.2. Let $t$ be a regular episturmian word of period $d$ with directive word $\Delta = x_1^{a_1} x_2^{a_2} \cdots$. If $d = 2$ or $\limsup_k a_k \geq 3$, then $\mu(\xi_{t,b}) > 2$.
Theorem 1.3. Let $t$ be a regular episturmian word with directive word $\Delta = x_1^{a_1} x_2^{a_2} \cdots$. Then $\xi_{t,b}$ is a Liouville number if and only if the sequence $(a_k)$ is unbounded.
We thus identify a new uncountable class of numbers with irrationality exponent strictly greater than 2 and a new uncountable class of Liouville numbers.
In addition, we show in Section 7 that it is possible that $\mu(\xi_{t,b}) > \mathrm{dio}(t)$ for an episturmian word over a 3-letter alphabet. For Sturmian words, we have the equality $\mu(\xi_{t,b}) = \mathrm{dio}(t)$ by a result of Bugeaud and Kim [13]. Additional results on the Diophantine exponents of $d$-bonacci words are provided in Section 6.

Preliminaries from Combinatorics on Words
We use standard notions and notations from combinatorics on words. For a general reference, see, e.g., [26].
A word is a finite sequence of symbols from some finite set of letters called an alphabet. If $w = w_1 \cdots w_n$ with $w_i \in A$, then we say that $w$ is a word of length $n$ over $A$, and we set $|w| = n$. The empty word, the unique word of length 0, is denoted by $\varepsilon$. The set of words over $A$ is denoted by $A^*$. If $u$, $z$, and $v$ are words such that $w = uzv$, then $u$ is a prefix of $w$, $v$ is a suffix of $w$, and $z$ is a factor of $w$. If $u \neq w$, then $u$ is a proper prefix of $w$; proper suffix is defined analogously. If $w = uv$, then by $u^{-1}w$ and $wv^{-1}$ we respectively refer to the words $v$ and $u$. An occurrence of $u$ in $w_1 \cdots w_n$ is an index $i$ such that $u$ is a prefix of $w_i w_{i+1} \cdots w_n$. By $w[i, j]$ we mean the factor $w_i \cdots w_j$ whenever the indices $i$ and $j$ make sense.
Let w be a word over A. By w n we refer to the concatenation w • • • w where w is repeated n times.This is an integer power, and by a fractional power w e , e ≥ 1, we mean the word (uv) n u with uv = w and e = n + |u|/|w|.If w = u n only if n = 1, then we say that w is primitive.The word w is primitive if and only if w occurs exactly twice in w 2 .If w = uv, then the word vu is a conjugate of w.If w = w 1 • • • w n , w i ∈ A, then the reversal of w is the word w n • • • w 1 .If a word equals its reversal, then we say that it is a palindrome (we count the empty word as a palindrome).If w i = w i+p for all i such that 0 ≤ i < |w| − p, then w has period p.A mapping τ : An infinite sequence of letters x over A is called an infinite word.The set of infinite words over A is denoted by A ω .This set is naturally equipped with the product topology.
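The notions of fractional power, primitivity, and period are easy to experiment with on strings. The following sketch uses 0-based string indexing and the characterization of primitivity by occurrences in the square; the function names are illustrative.

```python
from fractions import Fraction


def fractional_power(w: str, e: Fraction) -> str:
    """Return w^e, the prefix of www... of length e * |w| (needs e >= 1)."""
    length = e * len(w)
    if e < 1 or length.denominator != 1:
        raise ValueError("e must be at least 1 and e * |w| must be an integer")
    n = int(length)
    return (w * (n // len(w) + 1))[:n]


def is_primitive(w: str) -> bool:
    """A nonempty word w is primitive iff it occurs exactly twice in ww."""
    # The occurrences at positions 0 and |w| always exist; w is primitive
    # exactly when the first occurrence after position 0 is the one at |w|.
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def has_period(w: str, p: int) -> bool:
    """Check w[i] == w[i + p] for all 0 <= i < |w| - p."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


print(fractional_power("01", Fraction(5, 2)))  # exponent 2 + 1/2
print(is_primitive("010"), is_primitive("0101"))
```

A fractional power $w^e$ always has period $|w|$, which is how such powers arise as prefixes with a given period later on.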
The map T is continuous with respect to the product topology.The language L x of x is its set of factors (the preceding notions of prefix, suffix, factor, and occurrence directly generalize to infinite words), and by L x (n) we refer to the set of factors of x of length n.We say that the language L x is closed under reversal if the reversal of each w in L x is also in L x .If wa, wb ∈ L x for distinct letters a and b, then we say that the factor w of x is right special; left special factors are defined analogously.If a factor is both right and left special, we say it is bispecial.The set {w ∈ A ω : L w ⊆ L x } is called the subshift generated by x.The subshift is a T-invariant and closed set.An infinite word x is ultimately periodic if x = uv ω where u and v are finite words, v ̸ = ε, and

Episturmian Words and Generalized Ostrowski Numeration Systems
Episturmian words were introduced in [16] as generalizations of Sturmian words based on palindromic closure.
Let $A$ be the integer alphabet $\{0, 1, \ldots, d-1\}$ of $d$ letters. Let $w^{(+)}$ be the shortest palindrome having the word $w$ as a prefix. Let $\Delta = y_1 y_2 \cdots$ be an infinite word over $A$ and define a sequence $(u_k)$ of finite words as follows: $u_1 = \varepsilon$ and $u_{k+1} = (u_k y_k)^{(+)}$ for $k \geq 1$, and set $c_\Delta = \lim_{k\to\infty} u_k$. We say that $c_\Delta$ is a standard episturmian word with directive word $\Delta$.
In what follows, we often use the name epistandard for a standard episturmian word. Each epistandard word has a unique directive word. The words $u_k$ are called central words and they are exactly the palindromic prefixes of $c_\Delta$. In fact, the words $u_k$ are exactly the bispecial factors of $c_\Delta$. This means in particular that every prefix of $c_\Delta$ is left special.
An infinite word $t$ is episturmian with directive word $\Delta$ if $L_t = L_{c_\Delta}$. Equivalently, a word $t$ is episturmian if $L_t$ is closed under reversal and $t$ has at most one right special factor of length $n$ for all $n$ [16, Thm. 5]. If $t$ is binary and aperiodic, then we call $t$ Sturmian. It is equivalent to require that $\Delta$ is binary and contains both 0 and 1 infinitely often.
An episturmian word is ultimately periodic if and only if the directive word $\Delta$ is eventually constant, that is, $y_n = a$ for some $a \in A$ for all $n$ large enough [16, Thm. 3]. In this paper, we consider only aperiodic episturmian words, so we assume that $(y_n)$ is not eventually constant.

The Intercept of an Episturmian Word
Our next aim is to desubstitute an episturmian word with episturmian morphisms in a certain way that gives rise to the concept of the intercept of an episturmian word. This generalizes the arguments of [10] for Sturmian words.
Let $A$ be the integer alphabet $\{0, 1, \ldots, d-1\}$ as before. For each $y \in A$, define the morphism $L_y$ as follows: $L_y(y) = y$ and $L_y(x) = yx$ for $x \neq y$. These morphisms belong to the class of episturmian morphisms; see [19, Sect. 3]. Let $t$ be an episturmian word over $A$ with directive word $y_1 y_2 \cdots$. Then, depending on whether the first letter of $t$ is $y_1$, we have $t = L_{y_1}(t_1)$ or $t = T(L_{y_1}(t_1))$ for some unique infinite word $t_1$. It is well known that $t_1$ is also an episturmian word over $A$ (possibly over a strict subalphabet of $A$) [23, Thm. 3.10]. Thus there exist an integer $b_1$ in $\{0, 1\}$ and a unique episturmian word $t_1$ such that $t = T^{b_1} \circ L_{y_1}(t_1)$. By repeating this decoding, we see that there exist a unique integer sequence $(b_k)$ and a unique sequence $(t_k)$ of episturmian words such that $t_{k-1} = T^{b_k} \circ L_{y_k}(t_k)$ for all $k \geq 1$, where $t_0 = t$. It is easy to see that if $y_k = y_{k+1}$ and $b_{k+1} = 0$, then $b_k = 0$. Indeed, if $b_k = 1$, then $t_{k-1}$ does not begin with $y_k$ so, by the form of the morphism $L_{y_k}$, neither does $t_k$ begin with $y_k$. Since $y_k = y_{k+1}$, this implies that $b_{k+1} = 1$.
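One decoding step can be illustrated on finite words. The sketch below assumes the usual form of the episturmian morphism, namely that $L_y$ fixes $y$ and maps every other letter $x$ to $yx$; this form is an assumption consistent with the surrounding text. The names `L`, `shift`, and `desubstitute` are illustrative, and on a finite prefix the last decoded letter may be undetermined when the prefix ends with $y$.

```python
from typing import Tuple


def L(y: str, w: str) -> str:
    """Apply L_y letter by letter: y -> y and x -> yx for x != y (assumed form)."""
    return "".join(c if c == y else y + c for c in w)


def shift(w: str) -> str:
    """Finite analogue of the shift map T: drop the first letter."""
    return w[1:]


def desubstitute(t: str, y: str) -> Tuple[int, str]:
    """Return (b, t1) such that t equals shift^b(L(y, t1)) on this prefix."""
    b = 0 if t.startswith(y) else 1
    padded = y + t if b else t  # now padded is a prefix of an L_y-image
    out = []
    i = 0
    while i < len(padded):
        if padded[i] != y:
            raise ValueError("word is not a prefix of an image of L_y")
        if i + 1 < len(padded) and padded[i + 1] != y:
            out.append(padded[i + 1])  # block y x decodes to x
            i += 2
        else:
            out.append(y)  # a lone y decodes to y
            i += 1
    return b, "".join(out)


print(L("0", "0102"))
print(desubstitute("10020", "0"))
# Finite check of the identity L_y T L_y = T L_y L_y used for the intercept.
print(L("0", shift(L("0", "120012"))) == shift(L("0", L("0", "120012"))))
```

Iterating `desubstitute` along the directive word produces the bits $b_1, b_2, \ldots$ from which the intercept is assembled.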
Let us write the directive word ∆ more compactly as follows: with a k ≥ 1, and x k ̸ = x k+1 for all k.We call the sequence (a k ) the sequence of partial quotients of t (the choice of the name will become apparent below).Let r 0 = 0 and By the property given at the end of the previous paragraph, we see that b r k +1 • • • b r k+1 (viewed as a word over {0, 1}) is of the form 0 * 1 * , so we may write for all k.It is easy to verify that L y • T • L y = T • L y • L y for all letters y, so we find that for all k.We call the sequence c 1 c 2 • • • the intercept of t.The choice of the name will become apparent after we discuss Sturmian words below.Notice that the intercept of t is unique.Notice also the important fact that the above derivation guarantees that x k (z) is nonempty when z is the first letter of t r k .Lemma 3.1.Let ∆ be a directive word as in (2).The intercept of the epistandard word c ∆ is 0 ω .Proof.The word c ∆ begins with x 1 , so b 1 = 0.The claim follows by induction because t 1 is epistandard by [16,Thm. 9].
Next we introduce several sequences of morphisms and words in order to define the important generalized standard words and show their connection to the central words $u_k$. See [23, Sect. 2] for a slightly more elaborate presentation.
Let ∆ be a directive word as in (2).Set with µ 0 and τ 0 being the identity map, and define the sequences (h k ) k≥0 , (s k ) k≥0 , and (q k ) k≥0 by setting ), and q k = |s k |.
The words $s_k$ are the (finite) standard words associated with the directive word $\Delta$. By definition, the epistandard word $c_\Delta$ is the limit of both $(h_k)$ and $(s_k)$.
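The standard words and their lengths can be computed by composing the morphisms. The sketch below assumes that $s_k$ is the image of the letter $x_{k+1}$ under $L_{x_1}^{a_1} \circ \cdots \circ L_{x_k}^{a_k}$, which agrees with the identity $s_1 = x_1^{a_1} x_2$ used later in the text, and that $L_y$ fixes $y$ and maps $x \neq y$ to $yx$. The directive word is passed as a list of hypothetical `(letter, exponent)` blocks.

```python
from typing import List, Tuple


def L(y: str, w: str) -> str:
    """Apply L_y letter by letter: y -> y and x -> yx for x != y (assumed form)."""
    return "".join(c if c == y else y + c for c in w)


def standard_word(blocks: List[Tuple[str, int]], k: int) -> str:
    """Return s_k for the directive word x_1^{a_1} x_2^{a_2} ... given as blocks.

    blocks[i] holds (x_{i+1}, a_{i+1}); blocks[k] must exist.
    """
    w = blocks[k][0]  # the letter x_{k+1}
    # Apply the innermost morphism first: L_{x_k}^{a_k}, then L_{x_{k-1}}^{a_{k-1}}, ...
    for letter, exponent in reversed(blocks[:k]):
        for _ in range(exponent):
            w = L(letter, w)
    return w


# Tribonacci directive word: x_1 x_2 ... = 012012... with all a_k = 1.
tribonacci_blocks = [("0", 1), ("1", 1), ("2", 1), ("0", 1), ("1", 1)]
words = [standard_word(tribonacci_blocks, k) for k in range(5)]
print(words)
print([len(s) for s in words])  # the lengths q_0, q_1, ...
```

For the Tribonacci directive word the lengths follow the Tribonacci numbers, and each computed word is a prefix of the next one.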
Let (c n ) be the sequence of epistandard words such that c ∆ = µ n (c n ) as in (1) (see Lemma 3.1) and (µ n,k ), (τ n,k ), (u n,k ), (h n,k ), and (s n,k ) be the respective sequences for c n .A simple induction argument (see [23,Eq. 3]) shows that for all p such that 0 ≤ p < k.Replacing k by k + 1 and p by k − 1 in (4) yields for all k ≥ 1.Using (5) repeatedly, we obtain for k ≥ 2. The equation ( 6) directly implies the following important formula for k ≥ 1: Definition 3.2.Let ∆ be as in (2), and let P(k) = max{p < k : y p = y k } if this integer exists, and leave P(k) undefined otherwise.Define j(k) as the largest j such that j ≤ k and x j = x k+1 when P(r k + 1) exists and leave j(k) undefined otherwise.
For the next lemma, we make the following convention which makes our formulas less "noisy".We often have formulas involving s a k+1 k where the subscript k + 1 in the superscript a k+1 is one greater than in the subscript k.This conveys no essential information, so we will write s a * k instead whenever there is no risk of confusion.We also take s a If P(r k + 1) does not exist, then Proof.Suppose first that P(r k + 1) exists.Set j = j(k).First of all, as ) by a similar computation.Hence we find that . Say P(r k + 1) does not exist.Then x k−t ̸ = x k+1 for all t such that 0 ≤ t < k and the above arguments yield The claim follows as τ 0 (x k+1 ) = x k+1 .
The equations ( 8) and ( 9) imply that when P(r k + 1) exists and when P(r k + 1) does not exist.If ∆ is periodic with period d, then we see that (q k ) satisfies a linear recurrence of order d.

Regular Episturmian Words
In this section, we define regular episturmian words, which have directive words of a special form. This subclass of episturmian words has received little specific attention apart from the paper of Glen [18], where the powers occurring in these words are studied.
Definition 3.4. Let $\Delta$ be a directive word as in (2). If there exists an integer $d$ such that $d \geq 2$, the letters $x_1, \ldots, x_d$ are pairwise distinct, and $x_1 x_2 \cdots = (x_1 x_2 \cdots x_d)^\omega$, then we say that the directive word $\Delta$ is regular with period $d$. An episturmian word is regular if its directive word is regular.
In what follows, we often assume that $x_1 x_2 \cdots = (012 \cdots (d-1))^\omega$ for some integer $d$ such that $d \geq 2$. Notice that regular episturmian words are exactly the Sturmian words when $d = 2$. This class includes the $d$-bonacci words $f_d$, which are the epistandard words having directive words $(012 \cdots (d-1))^\omega$. The 2-bonacci word is called the Fibonacci word, and the 3-bonacci word is called the Tribonacci word.
The main advantage in studying regular episturmian words is that the function j(k) is simple: j(k) = k − (d − 1) when j(k) is defined, i.e., when k ≥ d.This simplifies many properties.For example, from ( 9) and ( 8), we have for 1 ≤ k < d, and for k ≥ d.Two consecutive applications of (13) show that In particular, we see that for k ≥ d.In a similar fashion, combining ( 13) and ( 7) yields that for k ≥ d + 2. We believe that most of the results of this paper can be carried out for general episturmian words.However, this leads to very complicated arguments; the arguments are already tedious and complicated in the regular case.

Generalized Ostrowski Numeration Systems
Let us now define a representation for a nonnegative integer n in terms of the shift T n (c ∆ ) of the epistandard word c ∆ .First, we prove a generalization of a famous result of Brown [11,Thm. 2].Proposition 3.5.Let ∆ be a directive word as in (2), and let n be a positive integer.Let c 1 c 2 • • • be the intercept of T n (c ∆ ).Then there exists an integer k such that c k ̸ = 0 and c i = 0 for all i > k.Moreover the prefix of c ∆ of length n equals s We prove the claim by induction on n.From Lemma 3.1, it follows that c ∆ = τ 1 (c ∆ ′ ) where ∆ ′ = T a 1 (∆).Thus if n ≤ a 1 , then n0 ω is a valid intercept for T n (c ∆ ), and the uniqueness of the intercept implies that c 1 = n.By definition, the word s 1 is a prefix of c ∆ and s 1 = τ 1 (x 2 ) = x a 1 1 x 2 , so s c 1 0 is a prefix of c ∆ .This establishes the base case.Suppose that n > a 1 .Then it follows from (3) and the arguments preceding it that T n−c 1 q 0 (c ∆ ) is a τ 1 -image of an episturmian word t with intercept c 2 c 3 • • • (recall from above that q 0 = 1).In fact, as in the proof of Lemma 3.1, the word t is a suffix of the epistandard word c ∆ ′ .Let w be the prefix of c ∆ ′ such that |τ 1 (w)| = n − c 1 q 0 , that is, say t = T |w| (c ∆ ′ ).The word w must be nonempty as n > a 1 and c 1 ≤ a 1 .Now 0 < |w| < n, so the induction hypothesis implies that there exists an integer k such that c k ̸ = 0, c i = 0 for all i > k, and where s ∆ ′ ,j is the jth standard word for the directive word ∆ ′ .By definition, we have τ 1 (s ∆ ′ ,j ) = s j+1 for all j, so Since T n−c 1 q 0 (c ∆ ) = τ 1 (t), the word T n−c 1 q 0 (c ∆ ) must begin with x a 1 1 .It follows that τ 1 (w)s c 1 0 is a prefix of T n (c ∆ ).Since |τ 1 (w)s c 1 0 | = n, the claim follows.
The connection to Brown's result becomes clearer as we study greedy expansions below.Thanks to Proposition 3.5, we can give the following definitions.Definition 3.6.Let ∆ be a directive word as in (2), and let n be a positive integer.We let the representation, or the Ostrowski expansion, rep ∆ (n) of n to be the word c 1 • • • c k where c k ̸ = 0 and c 1 • • • c k 0 ω is the intercept of the word T n (c ∆ ).In addition, we set rep ∆ (0) = ε.Definition 3.7.Let ∆ be a directive word as in (2).If We often omit the subscript ∆ in rep ∆ (n) and val ∆ (w) if the directive word ∆ is clear from context.
It follows from Proposition 3.
Therefore the Ostrowski expansion of an integer can be viewed as an expansion with respect to the numeration system associated with the sequence $(q_k)$ (for a gentle introduction to numeration systems, see the book [33]). However, we emphasize that the Ostrowski expansion of a number $n$ does not necessarily coincide with the greedy expansion of $n$ with respect to $(q_k)$, as indicated by the following example.
Our next aim is to prove that the Ostrowski expansion of an integer coincides with the greedy expansion in the important special case of regular episturmian words.We leave the characterization open in the case of nonregular directive words.Whenever we discuss greedy expansions below, we assume that the greedy expansion is written the least significant digit first and without trailing zeros.Definition 3.9.Given a directive word ∆ as in (2), an infinite word c 1 c 2 • • • over the alphabet {0, 1, 2, . ..} satisfies the Ostrowski conditions if 0 ≤ c k ≤ a k for all k and for all k ≥ 1 the following implication holds: is a prefix of an infinite word satisfying the Ostrowski conditions.
If $\Delta$ is regular with period $d$, then the Ostrowski conditions state that $0 \leq c_k \leq a_k$ for all $k$ and that if $c_i = a_i$ for all $i$ such that $k - (d-1) < i \leq k$ with $k \geq d$, then $c_{k-(d-1)} = 0$.
Lemma 3.10. The intercept of a regular episturmian word satisfies the Ostrowski conditions.
Proof.Let c 1 c 2 • • • be the intercept of a regular episturmian word t with directive word ∆ as in (2).The property that 0 ≤ c k ≤ a k for all k follows directly from the derivation of the intercept preceding Lemma 3.1.Suppose that P(r k + 1) exists, that is, say k ≥ d, and assume that c i = a i for all i such that k − (d − 1) < i ≤ k.Let z be the first letter of t r k .From (3) and the discussion following it, we see that is a nonempty prefix of t.Suppose for a contradiction that there exists a largest ℓ such that z x ℓ (z) = ε because z = x ℓ and c ℓ = a ℓ > 0. This contradicts that ( 16) is nonempty.Since the directive word ∆ contains d distinct letters, we deduce that z = x k−(d−1) .Thus z = x k+1 for Since the prefix (16) must be nonempty, we deduce that The arguments presented so far generalize those of [10].In the case of Sturmian words, the sequence of partial quotients (a k ) can be viewed as the continued fraction expansion [0; a 1 , a 2 , . ..] of a number α which equals the frequency of the letter x 2 in t.A Sturmian word t can be seen as a coding of the rotation x → x + α of a point ρ in the torus [−α, 1 − α).The point ρ, often called the intercept of t due to the connection to so-called mechanical words, can be expressed as a sum where p k−1 /q k−1 are the convergents of α and (c k ) is an integer sequence satisfying the conditions 0 ≤ c k ≤ a k for all k and c k+1 = a k+1 =⇒ c k = 0 for all k.The sequence (c k ) is exactly the intercept of t in the sense we defined above and the conditions match the Ostrowski conditions.Moreover, the denominators of the convergents match the lengths of the associated standard words.The sum representation of ρ often goes by the name of Ostrowski expansion of a real number.For the proofs of these facts and more details, see [10].Thus the concepts we defined have nice number-theoretic interpretations in the context of Sturmian words.Unfortunately no such interpretations are known for general episturmian words.See 
[16,Sect. 5] for an intercept defined for episturmian words that are fixed points of morphisms.
Equalities like (3) make sense even when the word c 1 c 2 • • • does not satisfy the Ostrowski conditions.A whole theory of so-called spinned directive words has been developed for studying alternative representations of episturmian words; see [19,Sect. 4].Our notion of intercept coincides with the normalized directive word of [20].Lemma 3.11.Let ∆ be a directive word as in (2).
Proof.Let us prove the claim by induction on k.The base case is established by observing that val(ε) = 0 < q 0 = 1.Say P(r k + 1) exists, and set j = j(k).Assume first that there exists a largest i such that j < i ≤ k and c i < a i .Then where the first inequality follows from the induction hypothesis and the final inequality follows from (10).Suppose then that no i like above exists.The Ostrowski conditions now imply that c j = 0, so by the induction hypothesis and (10).
Suppose then that P(r k + 1) does not exist.Then by (11).The claim follows.
Proposition 3.12.Let ∆ be a regular directive word.Let c 1 • • • c k be the greedy expansion of a nonnegative integer n with respect to the numeration system associated with (q k ).
Proof.If n = 0, then the claim is clear as the greedy expansion of 0 is ε by convention.Suppose that n > 0. Let ℓ be the largest integer such that q ℓ−1 ≤ n.Then there exists a unique nonnegative integers b ℓ and Without loss of generality, we may assume that k ≤ ℓ.
Observe how the proof of Proposition 3.12 fails in Example 3.8. We have $\mathrm{rep}(6) = 111$, but it is not true that $\mathrm{val}(11) = q_0 + q_1 = 1 + 2 = 3 < q_2 = 3$. Thus a result akin to Lemma 3.11 is not valid for general directive words.
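Greedy expansions with respect to a sequence $(q_k)$ are simple to compute, which makes comparisons of this kind easy to replay. The sketch below writes the least significant digit first and drops trailing zeros, following the convention stated above. It assumes that the value of a digit word $c_1 \cdots c_k$ is $\sum_i c_i q_{i-1}$, in line with the computation $\mathrm{val}(11) = q_0 + q_1$; the sample sequence is the list of Tribonacci lengths $1, 2, 4, 7, 13$, chosen only for illustration.

```python
from typing import List


def val(digits: List[int], q: List[int]) -> int:
    """Value of the digit word c_1 ... c_k, assumed to be sum of c_i * q_{i-1}."""
    return sum(c * q[i] for i, c in enumerate(digits))


def greedy_expansion(n: int, q: List[int]) -> List[int]:
    """Greedy expansion of n with respect to q, least significant digit first.

    Requires q[0] == 1 and an increasing q; n should be smaller than the
    first term of the sequence that is not included in q.
    """
    if n < 0 or not q or q[0] != 1:
        raise ValueError("need n >= 0 and q[0] == 1")
    digits = [0] * len(q)
    for i in range(len(q) - 1, -1, -1):
        # Take as many copies of q[i] as fit into the remainder.
        digits[i], n = divmod(n, q[i])
    while digits and digits[-1] == 0:
        digits.pop()  # no trailing zeros; 0 expands to the empty word
    return digits


q = [1, 2, 4, 7, 13]
print(greedy_expansion(6, q))   # 6 = 2 + 4
print(greedy_expansion(12, q))  # 12 = 1 + 4 + 7
```

Comparing the output of `greedy_expansion` with an Ostrowski expansion computed from intercepts is one way to test on small cases whether the two coincide for a given directive word.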

Auxiliary Results on Generalized Standard Words
In this section, we prove further results on generalized standard words needed in this paper.Most of the presented results appear in some form in [23,Sect. 3.3]; our proofs follow a somewhat different philosophy being based purely on properties of the numeration system.See also [24] on additional results on the numeration system.Proposition 3.5 states that if in an episturmian word has intercept We generalize this to arbitrary intercepts as follows.This result is found as [16,Thm. 3.20].Theorem 3.13.If t is an episturmian word with directive word ∆ as in (2) For the proof, we need the following lemma.Lemma 3.14.Let ∆ be a directive word as in (2) and c 1 c 2 • • • be an intercept.Then there exists an integer k such that c k < a k .
Proof.This proof is similar to that of Lemma 3.10.Let z k be the first letter of t r k where t r k is as in (3).Since ∆ contains finitely many distinct letters and is not eventually constant, there must exist integers j and k such that j < k, x j = z k , and 3) that T c j (z k ) must be nonempty, so we conclude that c j = 0.The claim follows.
Proof of Theorem 3.13.Since the suffix of an intercept is a valid intercept, Lemma 3.14 implies that there exists an increasing integer sequence (k n ) such that c k n < a k n for all n.From (3), we have for a letter z k n .Thus t and the episturmian word with intercept By definition, we have The length of this common prefix is thus has at least |τ k n −1 (z k n )|, and this length tends to infinity as n → ∞ provided that the directive sequence ∆ is not ultimately constant (we always assume this).If we denote the episturmian word with intercept The significance of Theorem 3.13 is that, in principle, the properties of a general episturmian word reduce to those of shifts of an epistandard word.This result suggests to consider the longest common prefixes of the words in the sequence (T val(c 1 •••c k ) (c ∆ )) k .This is found in Lemma 3.20, but we need several auxiliary lemmas for the proof.
Lemma 3.15.The word u r k is a proper prefix of s k for all k ≥ 1. 5) and ( 7) and s 1 = s a 1 0 x 2 , so the claim holds.Assume that k > 1. Say P(r k + 1) exists.Since both u r k and s k are prefixes of c ∆ , it suffices to show that |u r k | < |s k |.By ( 5) and ( 7), we have |u does not exist, the claim follows by similar arguments.Lemma 3.16.We have the following implications for all k ≥ 0.
(iv) If neither P(r k+1 + 1) nor P(r k + 1) exists, then Proof.Suppose that P(r k+1 + 1) exists.As j(k + 1) < k + 1, we see that u r j(k+1) is a prefix of s k .Applying ( 8) and ( 7), we have and (i) holds.If P(r k + 1) does not exist, then we deduce from (9) that Suppose then that P(r k+1 + 1) does not exist.Then like above, and we have and we see that (iv) holds.
Lemma 3.17.For all k ≥ 0, the word s a * +1 k is a prefix of c ∆ if and only if P(r k + 1) exists.If P(r k + 1) Proof.By Lemma 3.15, the word u r j(k+1) is a proper prefix of s k .Say P(r k + 1) exists.Lemma 3.16 implies that the word k is a prefix of c ∆ .Suppose that P(r k + 1) does not exist.Then (ii) or (iv) of Lemma 3.16 holds.In the latter case, the word s a * +1 k cannot be a prefix of s k+1 s k because x k+1 ̸ = x k+2 by definition.In the former case, we have we see that the words s k and u r j(k+1) +1 share the prefix u r j(k+1) x j(k+1) .Therefore s k+1 s k has prefix s a * +1 k x −1 k+1 x j(k+1) .By definition, we have x j(k+1) = x k+2 , so we see that s a * +1 k x −1 k+1 x k+2 is a prefix of c ∆ .Like above, the word s a * +1 k is not a prefix of c ∆ .Thus we have proved the first and second claims.Let us then prove the final claim.If P(r k + 1) does not exist, then the second claim shows that s a * +1 k is not a prefix of c ∆ , so assume that P(r k + 1) exists.Suppose first that P(r k+1 + 1) also exists.The first claim shows that s 2 k+1 is a prefix of c ∆ .From Lemma 3.16, we see that Because s k is a prefix of c ∆ and u r j(k+1) is a proper prefix of s k by Lemma 3.15, we see that the prefix s a * +1 k of s 2 k+1 is followed by u r j(k) x k+2 .Similarly, the word s k has the word u r j(k) x k+1 as a prefix.Since x k+1 ̸ = x k+2 , we conclude that s a k+1 +2 k is not a prefix of c ∆ .Assume then that P(r k+1 + 1) does not exist.Then Based on Lemma 3.17, we give the following definition.Definition 3.18.If P(r k + 1) exists, then we let t k to be the longest word such that s a * +1 k t k is a prefix of c ∆ with period q k .If P(r k + 1) does not exist, then we set Notice that the word t k is a proper prefix of s k when P(r k + 1) exists, but t k can be empty.
Lemma 3.19.For all k ≥ 0, we have s i k t k = u r k +i for all i such that 1 ≤ i ≤ a k+1 .Proof.Suppose that P(r k + 1) exists.By Lemma 3.17, the word s 2 k is a prefix of c ∆ , so s k t k is a right special prefix of c ∆ .Therefore s k t k = u ℓ+2 for some integer ℓ (recall that the central words u k are as bispecial factors exactly the right special prefixes of c ∆ ).It follows from [23, Eq. 2] that ℓ+1 is the (ℓ + 1)th central word associated with the directive word T(∆).The word u ℓ+2 can be repeatedly decoded like this for a total of ℓ + 1 times to obtain the empty word u (ℓ+1) 1 .On the other hand, we have s k t k = µ r k (x k+1 )t k , so it must be possible to decode s k t k at least r k times before obtaining an empty word.Therefore Assume for a contradiction that r k + 1 < ℓ + 2, that is, r k ≤ ℓ.By (8), we may write On the other hand, by ( 6), we have This contradicts the definition of t k .We have thus proved that s k t k = u r k +1 .The rest of the claim follows by observing from ( 5) that the differences |u Let us then assume that P(r k + 1) does not exist.Lemma 3.17 again implies that s k t k is a central word.By (9), we have k+1 .The rest of the claim follows as above. The As ∆ is regular, it follows from Lemmas 3.10 and 3.11 that Suppose that P(r k+n−1 + 1) exists.Then Lemma 3.17 implies that s a * +1 k+n−1 t k+n−1 is the longest prefix of c ∆ with period q k+n−1 .Therefore the word k+n−1 t k+n−1 .Since c k+n > 0, we have by the definition of the word t k+n−1 that the longest common prefix of the words The claim follows by a short computation applying Lemma 3.19.The same arguments apply when P(r k+n−1 + 1) does not exist as then the longest prefix of c ∆ with period q k+n−1 equals s a * +1 k+n−1 t k+n−1 and t k+n−1 = x −1 k+n .Notice that |x −1 k+n | = −1.Let us then find out when the length of the longest common prefix increases.Let k ≥ 1 and n 1 and n 2 be the two least positive integers such that n 1 < n 2 , c k+n 1 ̸ = 0, and c k+n 2 ̸ = 0. 
Let v 1 be the longest common prefix of T val(c Lemma 3.21.Let ∆ be a regular word and c 1 c 2 • • • be an intercept.Define Proof.Let k ≥ 0. Then From the discussion preceding this lemma, we see that η k+1 = η k if and only if a k+2 = c k+2 .By Lemma 3.14, there exist infinitely many k such that c k+2 < a k+2 .Since (η k ) is nondecreasing, this shows that lim k→∞ η k = ∞.
Many additional properties of the words s k could be easily derived, but we do not need them in this paper, so we will stop after the following required result.Lemma 3.22.[23,Thm. 3.17] Let ∆ be a directive word as in (2) and y be a letter occurring infinitely many times in ∆.Define an intercept c Proof.Since y occurs infinitely many times in ∆, arbitrarily long central words u k are followed by the letter y.As the language of c ∆ is closed under reversal and u k are palindromes, we see that yu k is a factor of c ∆ for infinitely many k.Since c ∆ is the limit of the sequence (u k ), it follows that yc ∆ is an episturmian word with directive word ∆. Let

Rauzy Graphs of Episturmian Words
Let $x$ be an infinite word. The Rauzy graph $\Gamma(n)$ of order $n$ associated with the language of $x$ is a directed graph with vertices $\mathcal{L}_x(n)$ and edges $\mathcal{L}_x(n+1)$. There is an edge $e$ from vertex $u$ to vertex $v$ if and only if $e$ has prefix $u$ and suffix $v$. Each word with language $\mathcal{L}_x$ corresponds to an infinite path in the graph $\Gamma(n)$ starting from its prefix of length $n$. The initial nonrepetitive complexity $\mathrm{inrc}(x, n)$ can be determined from $\Gamma(n)$: start from the vertex corresponding to the prefix of $x$ of length $n$ and follow the path dictated by $x$ until a vertex is repeated for the first time. In general, this is of no help as the graph $\Gamma(n)$ can be very complicated. However, when $x$ has low factor complexity, there are only a few right special factors of length $n$ and the analysis is more likely to succeed. This is indeed the case with episturmian words, whose Rauzy graphs have an especially nice form.
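The path-following description above can be tried out on a finite prefix. The following is a minimal sketch, not part of the paper's method: it uses the Fibonacci word (generated by the morphism 0 -> 01, 1 -> 0) as a test case, and the helper names `fibonacci_prefix`, `rauzy_graph`, and `inrc` are illustrative assumptions. It only sees the factors occurring in the chosen prefix, so the prefix must be long enough.

```python
def fibonacci_prefix(iterations):
    """Prefix of the Fibonacci word obtained by iterating 0 -> 01, 1 -> 0."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word


def rauzy_graph(prefix, n):
    """Edges u -> v of the Rauzy graph of order n, as seen in a finite prefix."""
    graph = {}
    for i in range(len(prefix) - n):
        edge = prefix[i:i + n + 1]  # a factor of length n + 1
        graph.setdefault(edge[:-1], set()).add(edge[1:])
    return graph


def inrc(prefix, n):
    """Number of edges followed from the first vertex until a vertex repeats."""
    seen = set()
    for m in range(len(prefix) - n + 1):
        vertex = prefix[m:m + n]
        if vertex in seen:
            return m
        seen.add(vertex)
    raise ValueError("prefix too short to contain a repeated factor")


f = fibonacci_prefix(10)
print(rauzy_graph(f, 2))
print([inrc(f, n) for n in range(1, 8)])
```

The function `inrc` never builds the graph explicitly; it simply records the visited vertices, which is all that the definition requires.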
An episturmian word $t$ with directive word $\Delta$ can be equivalently defined as an infinite word whose language is closed under reversal and that has exactly one right special factor of each length [16, Thm. 5]. The reversal of the right special factor must thus be left special, so there is exactly one left special factor and exactly one right special factor of each length. A moment's thought shows that this means that $\Gamma(n)$ is composed of cycles sharing a common part, called the central path, as in Figure 1. The number of cycles depends on the number of letters that eventually appear in the directive word. If $t$ is regular with period $d$, then there are exactly $d$ cycles. Indeed, each central word $u_k$ has all shorter central words as suffixes, so each central word is followed by each letter $0, 1, \ldots, d - 1$ in $c_\Delta$. The suffixes of the central words yield right special factors for each length, and the claim follows. Notice that we just argued that the central words are right special. This means that they are left special because they are palindromes. Therefore the central path of $\Gamma(|u_k|)$ reduces to a single vertex. Notice that the graph $\Gamma(n)$ "evolves" to $\Gamma(n + 1)$ in a deterministic fashion whenever $n$ does not equal the length of a central word: the central path is shortened by one edge and all cycles maintain their number of edges. When $n$ equals some $|u_k|$, the evolution or "bursting of the bispecial factor" is determined by $\Delta$. In the case of episturmian words, determining $\mathrm{inrc}(t, n)$ is thus rather straightforward: find out the location of the vertex $v$ corresponding to the prefix of $t$ of length $n$ and determine the length $L$ of the next cycle taken. If $v$ is on the central path, then $\mathrm{inrc}(t, n)$ equals $L$.
Otherwise we need to add to $L$ the number of edges that need to be traversed from $v$ to the vertex of the left special factor.

Figure 1: The Rauzy graph of an episturmian word. The left special factor corresponds to the vertex $\ell$ and the right special factor to the vertex $r$. The directed path from $\ell$ to $r$ is the central path.
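The structural facts used in this section (a single right special factor of each length, equal to the reversal of a prefix of the standard word, followed by each of the $d$ letters) can be observed on a concrete example. The sketch below is an illustration under stated assumptions: it uses the Tribonacci word, generated by the morphism 0 -> 01, 1 -> 02, 2 -> 0, which is the standard episturmian word with directive word $(012)^\omega$, and the helper names are hypothetical. A finite prefix can only miss extensions, so it must be long enough for the lengths tested.

```python
def tribonacci_prefix(iterations):
    """Prefix of the Tribonacci word, the fixed point of 0->01, 1->02, 2->0."""
    images = {"0": "01", "1": "02", "2": "0"}
    word = "0"
    for _ in range(iterations):
        word = "".join(images[letter] for letter in word)
    return word


def right_extensions(prefix, n):
    """Map each factor of length n to the set of letters following it."""
    extensions = {}
    for i in range(len(prefix) - n):
        extensions.setdefault(prefix[i:i + n], set()).add(prefix[i + n])
    return extensions


def right_special_factors(prefix, n):
    """Factors of length n followed by at least two distinct letters."""
    extensions = right_extensions(prefix, n)
    return sorted(w for w, letters in extensions.items() if len(letters) > 1)


t = tribonacci_prefix(12)
for n in range(1, 7):
    special = right_special_factors(t, n)
    # Expect one right special factor: the reversed prefix of length n.
    print(n, special, special == [t[:n][::-1]])
```

Each reversed prefix is followed by all three letters, which corresponds to the $d = 3$ cycles leaving the right special vertex.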

Initial Nonrepetitive Complexity of Regular Episturmian Words
In this section, we derive a complete description of the initial nonrepetitive complexity of regular episturmian words; see Theorem 5.10. We specialize the most significant propositions to the case of Sturmian words. Our proof method generalizes that of Wojcik, who determined the initial nonrepetitive complexity of Sturmian words [35, Sect. 5.3].
The majority of the results presented need the assumption that the directive word $\Delta$ is regular. We make the convention that this is implicitly assumed in the following discussion, but we make the assumption explicit in the statements of lemmas, propositions, etc.
It is natural to partition the positive integers according to the sequence (q k ), but it is in fact better to do it using the central words u k .We set Clearly N is a disjoint union of these intervals.We further subdivide each interval I k into a k+1 subintervals by setting for ℓ = 0, . . ., a k+1 − 1.Notice the following peculiarity: the first subinterval I k,0 has q k−1 elements while the remaining intervals have q k elements (when k > 0).Indeed by (5), we have The chief reason for defining the intervals like this is that when the directive word ∆ is regular with period d, we are guaranteed that q k+d−1 > |u r k+1 |, that is, q k+d−1 exceeds the right endpoint of I k .If we had defined the right endpoint of I k to be |u r k+1 +1 |, which is a priori more natural, this is not true when d = 2 as indicated by the proof of the following lemma.Lemma 5.1.Let ∆ be a regular directive word with period d.We have q k+d−1 Proof.Let us first prove the latter claim.By (7) and ( 13), we have  We set out to figure out inrc(t, n) for a regular episturmian word t when n ∈ I k for k ≥ 0. In view of Theorem 3.13, the aim is to reduce finding this number to the study of shifts of c ∆ .In fact, if t is regular with period d and intercept c 1 c 2 • • • , then in most cases inrc(t, n) is determined by the word T val(c 1 •••c k+d−1 ) (c ∆ ); see Proposition 5.7 for the complete details.See Figure 2 for example plots of the function inrc(t, n).
Let $n \in I_{k,\ell}$ for $k$ and $\ell$ such that $k \geq 0$ and $0 \leq \ell < a_{k+1}$, and let $\theta_n$ be the length of the central path of the Rauzy graph $\Gamma(n)$ of $c_\Delta$ (the number of edges on the central path). The number $\mathrm{inrc}(T^m(c_\Delta), n)$ is determined by the cycle sequence taken in the Rauzy graph $\Gamma(n)$ when the word $c_\Delta$ is read. We denote by $C_y$ the cycle of $\Gamma(|u_{r_k+\ell+1}|)$ containing the edge corresponding to the factor $u_{r_k+\ell+1} y$. We denote the length of $C_y$ by $\|C_y\|$. Notice that the graph $\Gamma(n)$ has the same cycle lengths as the graph $\Gamma(|u_{r_k+\ell+1}|)$.
Let us next show that $\|C_y\| = |\mu_{r_k+\ell}(y)|$. Say we start at the vertex $u_{r_k+\ell+1}$ of $\Gamma(|u_{r_k+\ell+1}|)$, take the cycle $C_y$, and return to the vertex $u_{r_k+\ell+1}$. This sequence of vertices corresponds to a factor $w$ of length $|u_{r_k+\ell+1}| + \|C_y\|$ such that $w$ contains exactly two occurrences of $u_{r_k+\ell+1}$, one as a prefix and one as a suffix. It follows from [23, Eq. 2] that $u_{r_k+\ell+1} = L_{x_1}(v_1) x_1$ where $v_1$ is the $(r_k + \ell)$th central word associated with the directive word $T(\Delta)$. Hence the word obtained from $w$ by removing its last letter decodes to a word $w_1$ such that $w_1$ has exactly two occurrences of $v_1$, one as a prefix and one as a suffix. Moreover, we deduce from the form of the morphism $L_{x_1}$ that the prefix $v_1$ of $w_1$ is followed by $y$. This procedure may be repeated $r_k + \ell$ times to obtain $w_{r_k+\ell} = y$. The procedure removes the suffix $u_{r_k+\ell+1}$ completely, so it must be that $|\mu_{r_k+\ell}(y)| = \|C_y\|$.

Next we partition the interval $\{0, 1, \ldots, q_{k+d-1} - 1\}$ into intervals $\lambda_i$, and we further divide these intervals into subintervals $\lambda_{i,j}$ according to the cycle sequence as described below. Our aim is to show that the initial nonrepetitive complexity has a simple description on each $\lambda_{i,j}$; see Proposition 5.3.

Let $\Delta'' = T^{r_k+\ell}(\Delta) = x_{k+1}^{a_{k+1}-\ell} x_{k+2}^{a_{k+2}} \cdots$ and $\Delta' = T^{r_{k+1}}(\Delta)$. Since $c_{\Delta''} = L_{x_{k+1}}^{a_{k+1}-\ell}(c_{\Delta'})$, the standard word $c_{\Delta''}$ is formed of blocks $x_{k+1}^{a_{k+1}-\ell} y$ with $y \neq x_{k+1}$. We say that such a block is of type $y$. Notice that the block types are given by the letters of the standard word with directive word $\Delta'$. It is straightforward to verify that the number of blocks up to and including the first block of type $x_{k+d}$ equals $K_d$. In particular, $K_2 = 1$.
Corresponding to the ith block, 1 ≤ i ≤ K d , we define an interval λ i as follows.If i = 1, then we let L i = 0, and otherwise we let L i − 1 to be the largest element of λ i−1 .We define where y i is the type of the ith block.The number of elements of λ i is simply the length of the µ r k +ℓ -image of the block since µ r k +ℓ (x Next we subdivide the interval λ i into four adjacent intervals that respectively have sizes (a k+1 − ℓ − 1)∥C x k+1 ∥ + θ n + 1, ∥C x k+1 ∥ − (θ n + 1), θ n + 1, and ∥C y i ∥ − (θ n + 1).More formally, we define Let us find out the size of the union of the intervals λ i for i = 1, . . ., K d .The intervals are clearly disjoint and adjacent, so the size equals |τ k+1 (y)| summed over the block types y.Thus the size of the union equals the length of the τ k+1 -image of the prefix of c ∆ ′ having the first occurrence of x k+d as a suffix.By (12), this prefix equals v d−2 where v d−2 is the (d − 2)th standard word for the directive word ∆ ′ .Then the τ k+1 -image of v d−2 equals s k+d−1 .Indeed, by definition, we have Example 5.2.(Sturmian Case) When ∆ is binary and d = 2, we have K d = 1, so there is only one block.Now ∥C x k+1 ∥ = |τ k (x k+1 )| = q k .The type y of the block is clearly x k+2 , so By recalling that L 1 = 0, we find that the intervals λ 1,j are as follows: The union of the intervals equals {0, 1, . . ., q k+1 − 1}.Proposition 5.3.Let ∆ be a regular directive word.Let n and i be integers such that n ∈ I k,ℓ with k ≥ 0 and 0 ≤ ℓ < a k+1 and 1 ≤ i ≤ K d .Suppose that the ith block has type y i .
Proof.We are concerned with the cycles taken in the graph Γ(n), which evolves to Γ(u r k +1+ℓ ).Recall that the cycle sequence taken in Γ(u r k +1+ℓ ) is determined by the letters of the standard word c ∆ ′′ with ∆ ′′ = T r k +ℓ (∆) = x a k+1 −ℓ k+1 x a k+2 k+2 • • • and that the cycle lengths are given by the lengths of the µ r k +ℓ -images of these letters.Let v be the prefix of T m (c ∆ ) of length n.
Case A. Suppose that m ∈ λ i,1 .By the discussion preceding Example 5.2, the number L i equals the length of the µ r k +ℓ -image of the first i − 1 blocks.Then, from the remark at the beginning of this proof, we see that reading off L i letters from the beginning of c ∆ amounts to traveling complete cycles in Γ(u r k +1+ℓ ).Since all prefixes of c ∆ are left special, we see that the prefix of Then v lies on the central path of Γ(n) during the (a k+1 − ℓ)th traversal of C 1 .Hence inrc(T m (c ∆ ), n) = q k in this case as well.
Case B. Suppose that m ∈ λ i,2 .By the arguments in the latter case of the previous paragraph, we see that v lies on the cycle C 1 during the (a k+1 − ℓ)th traversal of C 1 .Moreover, the vertex v is not on the central path of Γ(n) and exactly ∥C 1 ∥ − (m − (L i + (a k+1 − ℓ − 1)∥C 1 ∥)) edges need to be traversed to return to the left special vertex of Γ(n).The ith block equals x a k+1 −ℓ k+1 y i , so the cycle C 1 is followed by a cycle C y having length |µ r k +ℓ (y i )|.Therefore the initial nonrepetitive complexity is determined by the return to the left special vertex of Γ(n) when traversing the cycle C y , that is, we have Case C. Suppose that m ∈ λ i,3 .Now v lies on the central path of Γ(n) and the next cycle to be traversed is C y .Since v is on the central path, the initial nonrepetitive complexity is simply ∥C y ∥.The claim follows from the preceding computations.
Case D. Suppose that m ∈ λ i,4 .In this case v lies on C y but not on the central path.From the form of the word c ∆ ′′ , we deduce that the next cycle taken is C 1 .Exactly L i+1 − m edges need to be traversed to arrive at the left special factor of Γ(n).Therefore Proof.The claim follows by short computations using the information provided in Example 5.2.
The following proposition gives the initial nonrepetitive complexity of certain shifts of the regular standard episturmian words for the lengths in the interval $I_k$. The statement is quite complicated. We advise the reader to read the proof and study the implications (18), (19), (20), and (21) rather than spending much time on the statement itself.

Proposition 5.5. Let $\Delta$ be a regular directive word and $\Delta' = T^{r_{k+1}}(\Delta)$ for $k \geq 0$. Suppose that $n \in I_k$, and let $m$ be an integer such that $0 \leq m < q_{k+d-1}$ and $\mathrm{rep}_\Delta(m) = c_1 \cdots c_{k+d-1}$ (possibly with trailing zeros). Let $y_i$ be the $i$th letter of $c_{\Delta'}$ where $i = \mathrm{val}_{\Delta'}(c_{k+2} \cdots c_{k+d-1}) + 1$.
0 and c k = 0, then we have the following implications: (ii) If c k+1 = a k+1 , then we have the following implications: (iii) If c k+1 = a k+1 − 1 > 0 and c k ̸ = 0, then we have the following implications: (iv) If c k+1 = 0 and c k = 0, then we have the following implications: (v) If c k+1 = a k+1 − 1 = 0 and c k ̸ = 0, then we have the following implications: (vi) If a k+1 − 1 > c k+1 = 0 and c k ̸ = 0, then we have the following implications: Proof.Since 0 ≤ m < q k+d−1 , we see that m ∈ λ i for some i.The discussion preceding Example 5.2 tells us that the left endpoint (here < lex is the lexicographic order on N).Since v 1 and v 2 are the representations of two consecutive integers, we see that rep ∆ (m) must end with v 1 .In other words, we have showed that Thus the type of the ith block is y i where y i is the ith letter of c ∆ ′ .Assume that n ∈ I k,l for some ℓ such that 0 ≤ ℓ < a k+1 , and write Recall that θ n is the length of the central path of Γ(n) and that the intervals I k,ℓ have size q k except in the case ℓ = 0 when the size is q k−1 .
Case A. Let us first consider the case where m ∈ λ i,1 .Now The left inequality val ∆ (c 1 • • • c k+1 ) ≥ 0 is trivially true, so we conclude using Proposition 5.3 that The only option is that c k+1 = a k+1 − ℓ − 1, which cannot happen if a k+1 = c k+1 .Substituting θ n from (17) to the left inequality implies that n > (a k+1 − 1) The right inequality val Notice indeed that here . This holds if and only if Since θ n < q k , it must be that c k+1 = a k+1 − ℓ, and this cannot happen if c k+1 = 0 as 0 ≤ ℓ < a k+1 .The left inequality is trivial.Utilizing again (17), the right inequality transforms into Case D. Assume finally that m ∈ λ i,4 .This is true only if From the left inequality, we obtain the following inequality: Since n ∈ I k,ℓ , we see that this is possible only when The right inequality holds trivially, so we have Here the facts Let us then put the above results together.If c k+1 satisfies 0 < c k+1 < a k+1 − 1, then the antecedents of ( 18), ( 19), (20), and ( 21) are all satisfied.Clearly the interval {|u r k | + 1, . . ., |u r k+1 |} is partitioned by these four cases and the initial nonrepetitive complexity is determined on each partition.Exactly the same happens if c k+1 = a k+1 − 1 and c k = 0.This gives (i).Suppose then that c k+1 = a k+1 .Then, as we saw above, the antecedents of ( 18) and ( 19) are not satisfied, so these cases are omitted.The left inequality is trivial in the Case C and n ∈ I k,0 , so we may now deduce that inrc(T The Case D directly applies, and we have (ii).The remaining cases are similar.

Proposition 5.6 (Sturmian Case).
Let ∆ be a binary directive word.Suppose that n ∈ I k for k ≥ 0, and let m be an integer such that 0 ≤ m < q k+1 and rep(m) = c 1 • • • c k+1 (possibly with trailing zeros).
(i) If 0 < c k+1 < a k+1 − 1 or c k+1 = a k+1 − 1 > 0 and c k = 0, then we have the following implications: (ii) If c k+1 = a k+1 , then we have the following implications: (iii) If c k+1 = a k+1 − 1 > 0 and c k ̸ = 0, then we have the following implications: (iv) If c k+1 = 0 and c k = 0, then we have the following implications: (v) If c k+1 = a k+1 − 1 = 0 and c k ̸ = 0, then we have the following implications: (vi) If a k+1 − 1 > c k+1 = 0 and c k ̸ = 0, then we have the following implications: Proof.An easy induction argument shows that |u r k | = q k − 2 for all k ≥ 1 in the Sturmian case.Utilize again the computations of Example 5.2.
Keeping in mind Theorem 3.13, we now determine which shifts of c ∆ need to be considered in order to determine the initial nonrepetitive complexity.Notice that if precise information is not required, the first case of the following proposition can be omitted since the longest common prefix of t and T val(c 1 •••c k+d ) (c ∆ ) is at least as long as that of t and T val(c 1 •••c k+d−1 ) (c ∆ ) according to Lemma 3.21.Notice also that when the case (i) of the proposition applies, we can compute the initial nonrepetitive complexity using Proposition 5.5.This result was proved for Sturmian words in [35,Prop. 5.3.0.3].Proposition 5.7.Let t be a regular episturmian word with directive word as in (2) For the proof, we need some auxiliary lemmas.They include more information than we need, but the complete statements could be useful in some other contexts.Lemma 5.8.Let ∆ = x a 1 1 x a 2 2 • • • be a regular directive word with period d.Let k and i be such that , and the claim follows from the induction hypothesis.If i = 0, then and the claim follows.Lemma 5.9.Let ∆ = x a 1 1 x a 2 2 • • • be a regular directive word with period d.For k and ℓ such that 0 ≤ k < d and 1 ≤ ℓ ≤ a k+1 , we have Let k, ℓ, and i be such that k ≥ d, 1 ≤ ℓ ≤ a k+1 , and and j is the smallest integer such that k − j ≡ i (mod d).
Proof.Assume that k and ℓ are such that 0 ≤ k < d and 1 ≤ ℓ ≤ a k+1 .Notice that since ∆ is regular, we have This proves the first part of the claim.Suppose then that k ≥ d, 1 ≤ ℓ ≤ a k+1 , and 1 ≤ i ≤ d.Again we have τ k L ℓ x k+1 (x k+1 ) = s k which gives (iv).Assume that i ̸ ≡ k + 1 (mod d).Let j be the smallest number such that k − j ≡ i (mod d).Applying computations as above, we obtain that This proves (v).
The point of Lemma 5.9 is that the words $\tau_k L_{x_{k+1}}^{\ell}(x_i)$ are ordered by length in a predictable pattern. Let us consider the words $\tau_k L_{x_{k+1}}^{\ell}(x_i)$ ordered by the index $i$ in the natural order of $\{1, 2, \ldots, d\}$. Then the first part of Lemma 5.9 states that the lengths $|\tau_k L_{x_{k+1}}^{\ell}(x_i)|$ strictly decrease when $i$ increases from $1$ to $k + 1$. The remaining words are of equal length that is strictly greater than the preceding values. In other words, the shortest word $\tau_k L_{x_{k+1}}^{\ell}(x_{k+1})$ is pushed to the right in a cyclical fashion. The second part of the lemma states that this cyclical pattern continues modulo $d$.
Proof of Proposition 5.7.Suppose that n ∈ I k,ℓ for some ℓ such that 0 ≤ ℓ < a k+1 .Let m = val(c 1 • • • c k+d−1 ).Now m belongs to the ith block where i = val ∆ ′ (c k+2 • • • c k+d−1 ) + 1 and ∆ ′ = T r k+1 (∆) (see the first paragraph of the proof of Proposition 5.5).Say the block has type We assume first that there exists a largest i such that k + 2 ≤ i ≤ k + d and c i < a i .In order to show that inrc(t, n) = inrc(T m (c ∆ ), n), it suffices to demonstrate that the longest common prefix of these words has length at least inrc(T m (c ∆ ), n) + n.Below we do this depending on which interval λ i,j the number m belongs to.If t = T m (c ∆ ), there is nothing to prove, so we assume that there exists a least positive integer j such that c k+d−1+j ̸ = 0.By Lemma 3.20, the longest common prefix of T m (c ∆ ) and It follows from Lemma 3.21 that t and As in the proof of Proposition 5.5, we see that this means that 0 By applying Lemma 3.21 and the preceding inequality, we obtain As in the proof of Proposition 5.5, we have Recall that there exists a largest i such that k + 2 ≤ i ≤ k + d and c i < a i .An application of Lemma 3.21 yields 12) and ( 13), it follows that Like in the previous case, we have Case D. Suppose that m ∈ λ i,4 .Now and c k+1 ̸ = 0. 
From Proposition 5.3, we obtain that We have now finished the first part of the proof.Let m ′ = val(c 1 • • • c k+d ), and assume that c i = a i for all i such that k + 2 ≤ i ≤ k + d.Since the intercept c 1 c 2 • • • satisfies the Ostrowski conditions, it follows that c k+1 = 0 and c k+d+1 < a k+d+1 .Thus we immediately see that the preceding Cases C and D do not occur as then we had c k+1 ̸ = 0.The arguments given in the Case A still work, and we see that inrc(t, n) = inrc(T m (c ∆ ), n) when m ∈ λ i,1 .Since the longest common prefix of t and T m ′ (c ∆ ) is at least as long as that of t and T m (c ∆ ) by Lemma 3.21, we deduce that inrc(t, If T m ′ (c ∆ ) = t, then the claim is clear, so we assume that there exists a least positive integer j ′ such that c k+d+j ′ ̸ = 0 (notice that since c k+d = a k+d , we have j = 1).By Lemma 3.20, the longest common prefix of t and T m ′ (c ∆ ) has length P ′ where We used above the facts that c k+d+1 < a k+d+1 , c k+1 = 0, and It follows that inrc(t, n) = inrc(T m ′ (c ∆ ), n).This establishes (ii) and ends the proof.
We are finally in a position to give a complete description of the initial nonrepetitive complexity of a regular episturmian word. This mostly amounts to putting together Propositions 5.7 and 5.5. The statement is again complicated, but all listed cases are different. The initial nonrepetitive complexity of a general epistandard word is determined in [27, Thm. 16].
Theorem 5.10.Let t be a regular episturmian word with a directive word ∆ of period d and intercept c 1 c 2 • • • .Suppose that n ∈ I k for k ≥ 0, and let ∆ ′ = T r k+1 (∆) and y i be the ith letter of c then we have the following implications: Recall that the number inrc(T m (c ∆ ), n) is determined by the cycle sequence taken in the graph Γ(n) and that the cycle sequence is determined by the letters of the standard word c ∆ ′′ with ∆ ′′ = T r k +ℓ (∆) = x k+1 x a k+2 k+2 • • • .Recall also that the prefix of T L i (c ∆ ) of length n corresponds to the left special vertex of Γ(n) and that L i equals the length of the µ r k +ℓ -images of the first i − 1 blocks.Phrased alternatively, L i is the length of the τ k+1 -image of the prefix of ∆ ′ of length i − 1 where ∆ ′ = T r k+1 (∆) = T(∆ ′′ ).Now m ′ − m = c k+d q k+d−1 , so reading off the first L i + m ′ − m letters of c ∆ corresponds exactly to taking the cycles determined by the L a k+1 −ℓ Let us then find out the letter y of c ∆ ′ at position val ∆ ′ (c k+2 • • • c k+d ) + 1.By (12), we have x k+1 for standard words v t with the directive word ∆ ′ .The length of the prefix of v d−1 obtained by removing the suffix x k+1 equals val ∆ ′ (c k+2 • • • c k+d ), so y = x k+1 .The next cycles taken after reading off the first L i + m ′ − m letters of c ∆ are thus determined by the L a k+1 −ℓ x k+1 -image of the letter y and the letter immediately after it.Since the L a k+1 −ℓ x k+1 -image of any letter begins with x k+1 , we see that T L i +m ′ −m (c ∆ ) initially takes the cycle C x k+1 twice.
Since m ∈ λ i,2 , we see as in the Case B of the proof of Proposition 5.3 that the prefix of T m (c ∆ ) of length n lies on the cycle C x k+1 but not on the central path.In other words, when m − L i edges are traversed on the cycle C x k+1 , starting from the left special vertex, the whole cycle is not traversed.Consider now the word T L i +m ′ −m (c ∆ ) that takes the cycle C x k+1 twice.When m − L i more letters are read, the current vertex is vertex v on the cycle C x k+1 .When ∥C x k+1 ∥ more letters are read, the vertex v is encountered again.(i) If 0 < c k+1 < a k+1 − 1 or c k+1 = a k+1 − 1 > 0 and c k = 0, then we have the following implications: (ii) If c k+1 = a k+1 , then we have the following implications: (iii) If c k+1 = a k+1 − 1 > 0 and c k ̸ = 0, then we have the following implications: (iv) Suppose that c k+1 = 0 and c k+2 < a k+2 .
(iv.a)If c k = 0, then we have the following implications: (iv.b)If a k+1 − 1 = c k+1 and c k ̸ = 0, then we have the following implication: (iv.c)If a k+1 − 1 > c k+1 and c k ̸ = 0, then we have the following implications: (v) If c k+1 = 0 and a k+2 = c k+2 , then we have the following implication: Proof.Utilize again the computations of Example 5.2.

Diophantine Exponents
Recall the definition of the Diophantine exponent $\mathrm{dio}(x)$ of an infinite word $x$ from Subsection 1.3.
It is clear from the definition that $\mathrm{dio}(x) \geq 1$ for all infinite words $x$. It is possible that $\mathrm{dio}(x) = \infty$ (as we shall see). It is proved in [17] that for each real number $\theta$ such that $\theta \geq 1$, there exists an infinite binary word $x$ such that $\mathrm{dio}(x) = \theta$. Propositions 4.3 and 6.2 of the recent paper [6] provide lower and upper bounds for Diophantine exponents of infinite words generated by morphisms.
Diophantine exponents relate to our work through the following result, which allows us to compute the Diophantine exponent of a regular episturmian word with the help of Theorem 5.10.

Proposition 6.1. Let $x$ be an infinite word. Then
$$\mathrm{dio}(x) = 1 + \limsup_{n \to \infty} \frac{n}{\mathrm{inrc}(x, n)}.$$

Proof. There are some notational differences. The claim follows directly from [13, Lemma 10.3] with the observation that $\mathrm{inrc}(x, n) = r(n, x) - n$, where $r$ is defined as on p. 3282 of [13].
By Proposition 6.1 and Theorem 5.10, finding dio(t) for a regular episturmian word t amounts to determining the ratio n/inrc(t, n) on the right endpoint of each of the subintervals of I k described by Theorem 5.10.
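The ratio $n/\mathrm{inrc}(t, n)$ mentioned above can be tabulated numerically on a finite prefix. The sketch below is only an experiment, not a computation of $\mathrm{dio}(t)$: it evaluates $1 + n/\mathrm{inrc}(t, n)$ for small $n$ on the Fibonacci word (an assumed test case), whereas the Diophantine exponent depends on the behaviour of the ratio as $n \to \infty$. The names `fibonacci_prefix`, `inrc`, and `dio_ratios` are hypothetical.

```python
def fibonacci_prefix(iterations):
    """Prefix of the Fibonacci word obtained by iterating 0 -> 01, 1 -> 0."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word


def inrc(prefix, n):
    """Largest m such that the first m factors of length n are distinct."""
    seen = set()
    for m in range(len(prefix) - n + 1):
        factor = prefix[m:m + n]
        if factor in seen:
            return m
        seen.add(factor)
    raise ValueError("prefix too short to contain a repeated factor")


def dio_ratios(prefix, max_n):
    """Table n -> 1 + n / inrc(prefix, n) for 1 <= n <= max_n."""
    return {n: 1 + n / inrc(prefix, n) for n in range(1, max_n + 1)}


ratios = dio_ratios(fibonacci_prefix(12), 20)
for n, value in ratios.items():
    print(n, round(value, 4))
```

In such a table the local maxima are the values of interest, in line with the remark that the ratio should be examined at the right endpoints of the subintervals.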
Let us then define the closely related notions of initial critical exponent and index.

Definition 6.2. Let $x$ be an infinite word. The initial critical exponent of $x$, denoted by $\mathrm{ice}(x)$, is the supremum of the rational numbers $\rho$ for which there exist arbitrarily long prefixes $V$ of $x$ such that $V^\rho$ is a prefix of $x$.
A formula for the initial critical exponent of a Sturmian word is given in [10, Cor. 3.5]. To our knowledge, no attempt has been made to study the initial critical exponents of general episturmian words. The methods used to prove Theorem 5.10 could be used for such a study. It is worth mentioning that a slight modification of the proof of [13, Lemma 10.3] yields
$$\mathrm{ice}(x) = 1 + \limsup_{n \to \infty} \frac{n}{\mathrm{pnrc}(x, n)}$$
for an infinite word $x$. Here $\mathrm{pnrc}(x, n)$ is the prefix nonrepetitive complexity function of $x$, defined so that the prefix of $x$ of length $\mathrm{pnrc}(x, n) + n$ is the shortest prefix of $x$ containing two occurrences of the prefix of $x$ of length $n$. We have the obvious relation $\mathrm{inrc}(x, n) \leq \mathrm{pnrc}(x, n)$ for all $n$.

Definition 6.3. Let $x$ be an infinite word. The index (or critical exponent) of $x$, denoted by $\mathrm{ind}(x)$, is the number $\sup\{e \in \mathbb{Q} \cap [1, \infty) : x \text{ has an } e\text{-power as a factor}\}$.
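The relation between the initial and the prefix nonrepetitive complexity stated before Definition 6.3 can be illustrated on a finite prefix. The following sketch assumes the verbal description of $\mathrm{pnrc}$ given in the text (the position of the second occurrence of the prefix of length $n$) and uses a shift of the Fibonacci word as a test case; the function names are illustrative.

```python
def fibonacci_prefix(iterations):
    """Prefix of the Fibonacci word obtained by iterating 0 -> 01, 1 -> 0."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word


def inrc(prefix, n):
    """Largest m such that the first m factors of length n are distinct."""
    seen = set()
    for m in range(len(prefix) - n + 1):
        factor = prefix[m:m + n]
        if factor in seen:
            return m
        seen.add(factor)
    raise ValueError("prefix too short to contain a repeated factor")


def pnrc(prefix, n):
    """Position of the second (possibly overlapping) occurrence of prefix[:n]."""
    position = prefix.find(prefix[:n], 1)
    if position == -1:
        raise ValueError("prefix too short to contain a second occurrence")
    return position


shifted = fibonacci_prefix(12)[1:]  # the Fibonacci word with its first letter removed
for n in range(1, 10):
    print(n, inrc(shifted, n), pnrc(shifted, n))
```

For the shifted word the two functions already differ at $n = 1$, which shows that the inequality can be strict.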
The indices have been determined for various classes of infinite words. It suffices to say here that a formula for the index of Sturmian words was derived independently in [14, 22, 15] (see also [30]). The index of a regular episturmian word can be found by applying [18, Lemma 6.5, Thm. 6.19]. It seems that the discussion in Sections 4.1 and 5.5 of [23] is all that has been said of the indices of general episturmian words.
Let us then show that the Diophantine exponent is shift-invariant. The same is not true for the initial critical exponent; see [10, Prop. 2.1, Sect. 4.2].

Proposition 6.4. Let $x$ be an infinite word. Then $\mathrm{dio}(T^m(x)) = \mathrm{dio}(x)$ for all $m \geq 0$.
n for a conjugate V ′ n of V n for n large enough.For n large enough, we have Therefore dio(T m (x)) ≥ dio(x).A similar argument establishes that ice(T m (x)) ≥ ice(x).
A symmetric argument where we prepend a factor of length $m$ to words $U_n V_n^{e_n}$ such that $(|U_n V_n|)$ is increasing and $|U_n V_n^{e_n}| / |U_n V_n|$ converges to $\mathrm{dio}(T^m(x))$ establishes that $\mathrm{dio}(x) \geq \mathrm{dio}(T^m(x))$. This argument does not apply to the initial critical exponent unless the prepended factor is compatible with the prefix powers of $T^m(x)$.

Diophantine Exponents of Regular Episturmian Words
As the first application of Theorem 5.10, we find the Diophantine exponent of a regular epistandard word.Proposition 6.5.Let ∆ be a regular directive word as in (2).Then Proof.The intercept of c ∆ is 0 ω .Thus we find from the first implication of Theorem 5.10 (iv.a) that max 7) and ( 13), it follows from Proposition 6.1 that dio(c ∆ ) = 1 + lim sup k→∞ (a k+1 + |u r k−(d−1) |/q k ).If c ∆ has unbounded partial quotients, that is, the sequence (a k ) is unbounded, then dio(c ∆ ) = ∞.Since dio(c ∆ ) ≤ ind(c ∆ ), the claim follows.If c ∆ has bounded partial quotients, then it follows from [18,Lemma 6.5,Thm. 6.19] that We saw in the preceding proof that dio(c ∆ ) = ∞ if and only if c ∆ has unbounded partial quotients.This is in fact true for all words in the subshift generated by c ∆ as the next theorem states.This theorem was originally proved for Sturmian words in [4,Prop. 11.1] and an alternative proof was given in [13,Thm. 3.3].Our more general proof uses yet another approach.Theorem 6.6.Let t be a regular episturmian word.Then dio(t) < ∞ if and only if t has bounded partial quotients.
Before the proof, let us derive some helpful inequalities.Lemma 6.7.Let t be a regular episturmian word of period d with directive word ∆ and intercept c 1 c 2 • • • .Let y be a letter occurring in ∆ and Q a nonnegative integer.Then for all k large enough.
Proof.From Lemma 3.11, (14), Lemma 5.8, and (15), we find that for all k large enough.This proves the first inequality.The second inequality is derived similarly: for all k large enough.The third inequality is derived analogously by using the inequality |u r k+1 | ≥ a k+1 q k + q k−(d+1) in the numerator and the inequalities |τ k (y)| ≤ q k and val(c  We may thus assume that t has unbounded partial quotients.Thus there exists a sequence (k j ) such that a k j +1 → ∞ as j → ∞.Below we impose restrictions on (k j ) and show that under each restriction dio(t) = ∞.
It is straightforward to see that (by taking an appropriate subsequence) some (k j ) must satisfy at least one of the restrictions or c k = 0 for k large enough.When c k = 0 for all k large enough, the word t is a shift of c ∆ , and Propositions 6.5 and 6.4 imply that dio(t) = ∞.Case A. Assume that 0 < c k j +1 < a k j +1 − 1 for all j.Suppose additionally that there exists a nonnegative constant M such that a k j +1 − c k j +1 ≤ M for all j.By the fourth implication of Theorem 5.10 (i), it suffices to show that in order to conclude that dio(t) = ∞ (here y i is as in Theorem 5.10).By (22), we have for j large enough.As a k j +1 → ∞ as j → ∞, it follows that dio(t) = ∞.Assume then that the difference a k j +1 − c k j +1 is unbounded.This time around, we consider the first implication of Theorem 5.10 (i).Notice that we need to ensure that the interval {|u r because the difference a k j +1 − c k j +1 is unbounded.Hence dio(t) = ∞ in this case as well.Case B. If c k j +1 = a k j +1 − 1 > 0 and c k j = 0 for all j, then the difference a k j +1 − c k j +1 is bounded.By repeating the arguments of Case A, we see that dio(t) = ∞.
Case C. Suppose that c k j +1 = a k j +1 for all j.From (24), we see that (a k j +1 + q k−(d+1) /q k ) so dio(t) = ∞ by the last implication of Theorem 5.10 (ii).Case D. Assume that c k j +1 = a k j +1 − 1 > 0 and c k j ̸ = 0 for all j.We apply the last implication of Theorem 5.11 (iii) and obtain similar to the Case C that Hence again dio(t) = ∞.
Interestingly, a statement analogous to Theorem 6.6 is not true for the initial critical exponent. It is shown in [10, Prop. 4.1] that every Sturmian subshift contains a word $t$ such that $\mathrm{ice}(t) \leq 1 + \varphi \approx 2.6180$, where $\varphi$ is the golden ratio. Even more interestingly, it is possible that $\mathrm{ice}(t) = 2$ for certain Sturmian words $t$ with unbounded partial quotients [10, Thm. 1.1]. We suspect that it is equally possible that the initial critical exponent is finite while the Diophantine exponent is infinite when $d > 2$.
Next we show that dio(t) > 2 for essentially all regular episturmian words.For Sturmian words, this result can be inferred from the results of [10] as indicated in the proof of [1,Prop. 4].Theorem 6.8.Let t be a regular episturmian word of period d with directive word as in (2).
Proof. The claim follows from Theorem 6.6 if $t$ has unbounded partial quotients. Suppose that $t$ has bounded partial quotients and intercept $c_1 c_2 \cdots$. Let $C = M + 1$ where $M = \limsup_k a_k$. It follows from (14) that $q_{k+1}/q_k \leq C$ for all $k$ large enough.
Case A. Assume first that there exists infinitely many k such that 0 < c k+1 < a k+1 − 1.By the first implication of Theorem 5.10 (i), we have q k where we consider the limit superior over an appropriate subsequence like in the proof of Theorem 6.6.From (23), we find that The claim follows.Case B. Let us assume that c k+1 = a k+1 − 1 > 0 and c k = 0 for infinitely many k.Suppose in addition that a k+1 ≥ 3.By the final implication of Theorem 5.10 (i), we have .
By applying (22), we see that
Case C. Assume that c k+1 = a k+1 − 1 > 0 and c k ≠ 0 for infinitely many k. Assume moreover that a k+1 ≥ 3. The last implication of Theorem 5.10 (iii) gives From (24), we obtain that so the claim holds in this case as well.
Case D. Suppose that c k+1 = a k+1 for infinitely many k. Assume moreover that a k+1 ≥ 2. By the second implication of Theorem 5.10 (ii), we have Again, from (24), we see that
Case E. Suppose that c k+1 = c k = 0 for infinitely many k. Suppose additionally that a k+1 ≥ 2. Then the first implication of Theorem 5.10 (iv.a) gives q k . Now from (23), we find that so the claim follows.
Case F. Assume finally that c k+1 = 0 and c k ≠ 0 for infinitely many k. Suppose in addition that a k+1 ≥ 2. We deduce from the first implication of Theorem 5.10 (iv.c) that Thus the claim follows by an application of (23) as in Case E.
Cases A-F prove the claim when lim sup k a k ≥ 3, so we may consider the special case d = 2. Next we handle Cases B-D unconditionally. Consider Case D first. The Ostrowski conditions imply that c k = 0. Thus the first implication of Theorem 5.11 (ii) gives The only case left is the case where c 1 c 2 ⋯ has suffix (01)^ω. Say k is large enough and c k+1 = 0 and c k+2 = c k = 1. Since c k+2 = 1, we have y i ≠ x k+2 , and so |τ k+1 (y i )| = q k + q k−1 . From Theorem 5.10 (iv.b), we have
By Lemma 6.10, the conclusion of Theorem 6.8 can fail only if lim sup k a k = 2 when d = 3. The next proposition shows that this may happen. For the proof, recall that the Stolz-Cesàro Theorem states that lim k→∞ a k /b k = lim k→∞ (a k+1 − a k )/(b k+1 − b k ) whenever the right side limit exists and the sequence (b k ) is strictly monotone.
Proposition 6.11. Let t be the episturmian word with directive word (001122)^ω having intercept 1^ω. Then where β, approximately 2.8312, is the real root of the polynomial x^3 − 2x^2 − 2x − 1.
Proof. Suppose that k is such that k ≥ 4. It follows from Theorem 5.10 (iii) that dio(t) − 1 equals the largest of the limits of the ratios as k → ∞. Here ∆′ = T r k+1 (∆) = T 2(k+1) (∆), so the words ∆ and ∆′ are isomorphic. Therefore the Ostrowski numeration systems associated with ∆ and ∆′ are the same. It follows that val ∆′ (c k+2 ⋯ c k+d−1 ) = val ∆ (1) = 1 meaning that y i is the second letter of the word c ∆′ , that is, y i = x k+2 . Thus τ k (y i ) = s 2 k−1 s k−2 . The numbers q i satisfy the linear recurrence q i = 2q i−1 + 2q i−2 + q i−3 , so it follows from the theory of linear recurrences that q k+ℓ /q k → β^ℓ as k → ∞.
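The convergence of q k+ℓ /q k to the ℓth power of β can be checked numerically. The following Python sketch is only an illustration: the initial values (1, 1, 1) are an assumption made here (the actual initial values of the sequence (q i ) are fixed earlier in the paper), and the helper names `dominant_root` and `q_sequence` are hypothetical. Since the recurrence has positive coefficients, the limit ratio should not depend on the choice of positive initial values.

```python
def dominant_root(lo=2.0, hi=4.0, steps=200):
    """Bisection for the real root of x^3 - 2x^2 - 2x - 1 in [lo, hi]."""
    f = lambda x: x**3 - 2 * x**2 - 2 * x - 1
    for _ in range(steps):
        mid = (lo + hi) / 2
        # Keep the half-interval on which f changes sign.
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def q_sequence(n, init=(1, 1, 1)):
    """First n terms of q_i = 2 q_{i-1} + 2 q_{i-2} + q_{i-3}.

    The initial values are an assumption made for illustration only.
    """
    q = list(init)
    while len(q) < n:
        q.append(2 * q[-1] + 2 * q[-2] + q[-3])
    return q


beta = dominant_root()
q = q_sequence(60)
ratio1 = q[-1] / q[-2]  # approximates beta
ratio3 = q[-1] / q[-4]  # approximates beta**3
print(beta, ratio1, ratio3, beta**3)
```

Integer arithmetic is used for the sequence itself so that only the final divisions introduce rounding.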
Therefore dio(t) is at least as large as claimed. It thus suffices to show that the other two limits do not exceed this value. For the first ratio, we find like above that lim and, for the second ratio, we have
Notice that Proposition 6.11 shows that the Diophantine exponent can be less than that of the corresponding standard word. The next proposition demonstrates that Lemma 6.10 does not generalize to d > 3. Therefore the assumptions of Theorem 6.8 are necessary.
Proposition 6.12. Let t be the episturmian word with directive word (0123)^ω having intercept (001)^ω, (010)^ω, or (100)^ω. Then dio(t) = 1 + 1 27 (−7ζ
Proof. Assume that the intercept c 1 c 2 ⋯ equals (001)^ω. Suppose that k is such that k ≥ 3 and c k+1 = 1. It follows from Theorem 5.10 (ii) that dio(t) − 1 is at least as large as the limits of as k → ∞ (along appropriate subsequences). Now c k+2 ⋯ c k+d−2 = 00, so y i is the first letter of the epistandard word with intercept x k+2 x k+3 ⋯ , that is, y i = x k+2 (notice that the word ∆′ of Theorem 5.10 is isomorphic to the directive word (0123)^ω, so the numeration systems associated to both directive words are the same). It follows that τ k (y i ) = s k−1 s k−2 s k−3 . The numbers q i satisfy the linear recurrence q i = q i−1 + q i−2 + q i−3 + q i−4 , so it follows from the theory of linear recurrences that q k+ℓ /q k → ζ_4^ℓ as k → ∞. From (7), we find that |u r k+1 | − |u r k−2 | = q k − q k−4 . In addition, val(c 1 ⋯ c k ) − val(c 1 ⋯ c k−3 ) = q k−3 . Using the Stolz-Cesàro Theorem, the limit of the first ratio is found as follows: Therefore dio(t) is at least as large as claimed. It thus suffices to show that the analogous limits do not exceed this value in the other cases. For the latter ratio, we find that = lim k→∞ (q k − q k−4 )/(q k + q k−1 + q k−2 − 2q k−3 + q k−7 ) ≈ 0.6107.
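The numerical value 0.6107 of the last limit can be reproduced directly from the recurrence q i = q i−1 + q i−2 + q i−3 + q i−4 . The Python sketch below is an illustration under stated assumptions: the initial values (1, 1, 1, 1) are chosen here for convenience and are not taken from the paper, and the function names `q_sequence` and `ratio` are hypothetical. The limit does not depend on the positive initial values because the recurrence has a unique dominant root.

```python
from fractions import Fraction


def q_sequence(n, init=(1, 1, 1, 1)):
    """First n terms of q_i = q_{i-1} + q_{i-2} + q_{i-3} + q_{i-4}.

    The initial values are an assumption made for illustration only.
    """
    q = list(init)
    while len(q) < n:
        q.append(q[-1] + q[-2] + q[-3] + q[-4])
    return q


def ratio(q, k):
    """Exact value of (q_k - q_{k-4}) / (q_k + q_{k-1} + q_{k-2} - 2 q_{k-3} + q_{k-7})."""
    num = q[k] - q[k - 4]
    den = q[k] + q[k - 1] + q[k - 2] - 2 * q[k - 3] + q[k - 7]
    return Fraction(num, den)


q = q_sequence(80)
approx = float(ratio(q, 79))
print(round(approx, 4))  # 0.6107
```

Exact rational arithmetic is used until the final conversion to a float, so the printed value reflects the recurrence rather than accumulated rounding.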
Assume then that c k+1 = 0 and c k = 1. By Theorem 5.10 (iv.b), we need to find the limit of Proceeding as above, it is straightforward to show that the limits are approximately 0.7653 and 0.7309. This proves the claim. It is straightforward to check that the intercepts (010)^ω and (100)^ω lead to exactly the same result.
Since ice(x) ≤ dio(x) for any infinite word x, Proposition 6.11 has the following remarkable consequence.
Corollary 6.13. There exists an episturmian word over a 3-letter alphabet having only finitely many square prefixes.
This is indeed unexpected since every Sturmian word and every regular epistandard word has arbitrarily long square prefixes [18, Lemma 6.5]. We expect that a regular episturmian word has infinitely many square prefixes when lim sup k→∞ a k ≥ 3, but we have not attempted to prove this. We also expect that every infinite word in the Tribonacci subshift has arbitrarily long square prefixes.
Proof. Since t is not in the shift orbit of f d , the intercept does not end with 0^ω. Therefore it contains 0^ℓ 1 infinitely many times for some ℓ. Let k be such that c k+1 = 1 and c k = ⋯ = c k−(ℓ−1) = 0. Using Lemma 5.9, we see that Now −val(c , so if ℓ can be taken arbitrarily large, we have from Theorem 5.10 (ii) that where the last equality follows from the proof of Proposition 6.5. The latter claim now follows from Proposition 6.15 and simplification. If ℓ ≥ d, then we deduce from (25) that It is easy to check that this lower bound is larger than dio(f d ). Thus the first claim is proved.
We believe that there always exists a word in the subshift of the d-bonacci word such that dio(t) = 2 +

Lemma 3.3.
Let ∆ be a directive word as in (2) and k ≥ 0. If P(r k + 1) exists, then the same number which is impossible by what we just argued. Thus b ℓ = c k . By repeating the argument for the words c 1 the word s k+d−1 has the factors s k−1+d−1 and s a * k s k−1 .