On Arithmetically Progressed Suffix Arrays and related Burrows-Wheeler Transforms

We characterize those strings whose suffix arrays are based on arithmetic progressions, in particular, arithmetically progressed permutations where all pairs of successive entries of the permutation have the same difference modulo the respective string length. We show that an arithmetically progressed permutation $P$ coincides with the suffix array of a unary, binary, or ternary string. We further analyze the conditions of a given $P$ under which we can find a uniquely defined string over either a binary or ternary alphabet having $P$ as its suffix array. For the binary case, we show its connection to lower Christoffel words, balanced words, and Fibonacci words. In addition to solving the arithmetically progressed suffix array problem, we give the shape of the Burrows-Wheeler transform of those strings solving this problem. These results give rise to numerous future research directions.


Introduction
The integral relationship between the suffix array [36] (SA) and the Burrows-Wheeler transform [10] (BWT) is explored by Adjeroh et al. [1], who also illustrate the versatility of the BWT beyond its original motivation in lossless block compression [10]. BWT applications include compressed index structures using backward-search pattern matching, multimedia information retrieval, and bioinformatics sequence processing; the BWT is also at the heart of the bzip2 suite of text compressors. Through its association with the BWT, the SA data structure is equally important, which motivates our interest in exploring its combinatorial properties.
These combinatorial properties can be useful for checking the performance or integrity of string algorithms or data structures in testbeds built from string sequences whose properties are well understood.
In particular, due to current trends involving massive data sets, indexing data structures need to work in external memory (e.g., [8]) or on distributed systems (e.g., [24]). To devise a solution adaptable to these scenarios, it is crucial to test whether the computed index (consisting of the suffix array or the BWT, for instance) is stored correctly on the hard disk or on the computing nodes, respectively. This test is more cumbersome than on a single machine working only with its own RAM. One way to test is to compute the index for an instance whose index shape can be easily verified. For example, one could check the validity of the computed BWT on a Fibonacci word, since the shape of its BWT is known [39,13,45].
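As a minimal sketch of such a check (our own illustration, not code from [39,13,45]), the following Python snippet builds Fibonacci words under the assumed convention f_1 = a, f_2 = ab, f_j = f_{j−1} f_{j−2}, computes the BWT naively from the sorted cyclic rotations, and compares it with the known shape in which all b's precede all a's. The helper names are hypothetical, and the naive construction is only meant for small test instances.

```python
def fibonacci_words(count):
    """First `count` Fibonacci words, assuming f1 = "a", f2 = "ab", f_j = f_{j-1} f_{j-2}."""
    words = ["a", "ab"]
    while len(words) < count:
        words.append(words[-1] + words[-2])
    return words[:count]


def bwt_from_rotations(t):
    """Last column of the BWT matrix, i.e., of the sorted cyclic rotations of t."""
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def has_expected_shape(t):
    """True if the BWT of t is b^q a^p, where q and p count the b's and a's of t."""
    return bwt_from_rotations(t) == "b" * t.count("b") + "a" * t.count("a")


for w in fibonacci_words(10)[1:]:
    print(len(w), has_expected_shape(w))
```

A real testbed would replace bwt_from_rotations by the BWT produced by the external-memory or distributed system under test and keep has_expected_shape as the oracle.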
Other studies based on Fibonacci words cover the suffix tree [43] and the Lempel-Ziv 77 (LZ77) factorization [7]. In [34], the suffix array of each even Fibonacci word and its inverse are studied as arithmetic progressions. In this study, the authors, like many at that time, did not append the artificial $ delimiter (also known as a sentinel) to the input string, thus allowing suffixes to be prefixes of other suffixes. This small fact makes the definition of the BWT as BWT_T[i] = T[SA_T[i] − 1] for a string T with suffix array SA_T incompatible with the traditional BWT defined on the BWT matrix, namely the lexicographic sorting of all cyclic rotations of the string T.
Note that not all strings are in the image of the BWT. Giuliani et al. [28] give an O(n log n)-time algorithm for identifying all the positions in a string S where a $ can be inserted so that S becomes the BWT image of a string ending with $.
Despite this incompatibility in the suffix-array-based definition of the BWT, we can still observe a regularity for even Fibonacci words [34, Sect. 5]. Similarly, both methods for constructing the BWT are compatible when the string T is a Lyndon word. The authors of [34, Remark 1] also observed similar characteristics for other, more peculiar string sequences. In the general case, the $ delimiter makes both methods equivalent; however, the suffix array approach is typically preferred, as it requires O(n) time [41] compared to O(n^2) time with the BWT matrix method [10]. By utilizing combinatorial properties of the BWT, Crochemore et al. [15] give an in-place algorithm that avoids explicit storage for the suffix sort and output arrays, and runs in O(n^2) time using O(1) extra memory (apart from storing the input text). Köppl et al. [33, Sect. 5.1] adapted this algorithm to compute the traditional BWT within the same space and time bounds.
Until now, it has remained unknown whether we can formulate a class of string sequences for which the suffix array has the shape of an arithmetic progression (independent of the $ delimiter). In this article, we address this question and establish a correspondence between strings and suffix arrays generated by arithmetic progressions. Calling a permutation of the integers [1..n] arithmetically progressed if all pairs of successive entries of the permutation have the same difference modulo n, we show that an arithmetically progressed permutation coincides with the suffix array of a unary, binary, or ternary string. We analyze the conditions on a given arithmetically progressed permutation P under which we can find a uniquely defined string T over a unary, binary, or ternary alphabet having P as its suffix array.
The simplest case is that of unary alphabets: given the unary alphabet Σ := {a} and a string T of length n over Σ, SA_T = [n, n − 1, . . ., 1] is an arithmetically progressed permutation with ratio −1 ≡ n − 1 mod n.
For the case of a binary alphabet {a, b}, several strings of length n solve the problem. Trivially, the solutions for the unary alphabet also solve the problem for the binary alphabet. However, among the strings of length n whose suffix array is [n, n − 1, . . ., 1], there are now multiple solutions: each T = b^r a^s with r, s ∈ [0..n] such that r + s = n has this suffix array. Similarly, T = a^{n−1} b has the suffix array SA_T = [1, 2, . . ., n], which is an arithmetically progressed permutation with ratio 1.
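These small cases are easy to confirm by brute force. The sketch below uses a naive suffix array construction (sorting 1-based start positions by their suffixes); suffix_array is our own helper and is only suitable for short strings.

```python
def suffix_array(t):
    """1-based starting positions of the suffixes of t in lexicographic order (naive)."""
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


n = 6
for r in range(n + 1):
    t = "b" * r + "a" * (n - r)                      # T = b^r a^s with r + s = n
    assert suffix_array(t) == list(range(n, 0, -1))  # [n, n-1, ..., 1], ratio -1

print(suffix_array("a" * (n - 1) + "b"))             # T = a^{n-1} b: [1, 2, 3, 4, 5, 6]
```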
A non-lexicographic ordering on strings, the V-order, considered for a modified FM-index [2], provides a curious example for the case with ratio −1: if S is a proper subsequence of a string T of length n, then S precedes T in V-order, written S ≺_V T. Since each suffix T[i + 1..n] is a proper subsequence of T[i..n], this implies that T[n..n] ≺_V T[n − 1..n] ≺_V · · · ≺_V T[1..n], so that SA_T[i] = n − i + 1 for every i ∈ [1..n], thus enabling trivial suffix sorting.
Clearly, any string ordering method that prioritizes minimal length within its definition will have a suffix array progression ratio of −1. Binary Gray codes (originally called the Reflected Binary Code by their inventor, Frank Gray) have this property and are ordered so that adjacent strings differ in exactly one bit [29]. These codes, which exhibit a non-lexicographic order, have numerous applications, notably in error detection schemes, such as flagging unexpected changes in data for digital communication systems, and in logic circuit minimization. While Gray code order seems not to have been applied directly to order the rows of a BWT matrix, just four years after the BWT appeared in 1994 [10], Chapin and Tate applied the concept when they investigated the effect of both alphabet and string reordering on BWT-based compressibility [11]. Their string sorting technique for the BWT matrix ordered the strings in a manner analogous to reflected Gray codes, but for more general alphabets. This modification inverted the sorting order for alternating character positions, with which they demonstrated improved compression.
To date, the only known BWT designed specifically for binary strings is the binary block order B-BWT [19]. This binary string sorting method prioritizes length, thus also exhibiting a suffix array with progression ratio −1. Strings of the same length, as needed to form the BWT matrix, are ordered by a combination of decreasing and increasing lexicographic ordering of the run-length exponents of blocks of bits. Experiments showed compressibility roughly equal to that of the original BWT in binary lexicographic order, with some instances where the B-BWT gave better compression.
To see that the above two binary orders are distinct: 1101 comes before 1110 in Gray code order, whereas 1110 comes before 1101 in binary block order.
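For the Gray code side of this example, the position of a codeword in the reflected binary Gray code sequence can be recovered by the standard Gray-to-binary decoding (XOR of the codeword with all of its right shifts). The helper name gray_rank is ours; this sketch does not implement the binary block order of [19].

```python
def gray_rank(bits):
    """0-based index of `bits` in the reflected binary Gray code sequence."""
    g = int(bits, 2)
    rank = g
    while g:
        g >>= 1
        rank ^= g          # binary value = XOR of the codeword with all its right shifts
    return rank


print(gray_rank("1101"), gray_rank("1110"))   # 9 11, so 1101 precedes 1110
```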
In what follows, we present a comprehensive analysis of strings whose suffix arrays are arithmetically progressed permutations (under the standard lexicographic order). In practice, such knowledge can reduce the O(n) space for storing the suffix array to O(1), since the first entry and the ratio suffice to recover every entry.
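A minimal sketch of this O(1)-space representation, assuming the suffix array is already known to be arithmetically progressed: only n, the first entry p_1, and the ratio k are stored, and each entry is computed on demand. The class name is hypothetical; the example values n = 8, p_1 = 1, k = 5 give the suffix array of the binary string ababbabb, one of the strings studied in Section 3.3.

```python
class ArithmeticSuffixArray:
    """Suffix array stored as (n, p1, k): entry i is p1 + (i - 1) * k reduced into [1..n]."""

    def __init__(self, n, p1, k):
        self.n, self.p1, self.k = n, p1, k

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        """1-based access, matching the SA_T[i] notation of the text."""
        if not 1 <= i <= self.n:
            raise IndexError(i)
        return (self.p1 - 1 + (i - 1) * self.k) % self.n + 1


sa = ArithmeticSuffixArray(n=8, p1=1, k=5)
print([sa[i] for i in range(1, len(sa) + 1)])   # [1, 6, 3, 8, 5, 2, 7, 4]
```

Because indexing is 1-based, the entries are read via range(1, len(sa) + 1) rather than by iterating over sa directly.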
The structure of the paper is as follows. In Section 2, we give the basic definitions and background, and also deal with the elementary case of a unary alphabet. We present the main results in Section 3, where we cover ternary and binary alphabets and consider inverse permutations. In Section 4, we illustrate the binary case for Christoffel words and in particular establish that every (lower) Christoffel word has an arithmetically progressed suffix array. We go on to link the binary characterization to balanced words and Fibonacci words. A characterization of strings with larger alphabets follows in Section 5. We overview the theme concepts for meta strings in Section 6. We conclude in Section 7 and propose a list of open problems and research directions, showing that there is plenty of scope for further investigation. We now proceed to the foundational concepts presented in the following section, starting with the case of a unary alphabet.

Preliminaries
Let Σ be an alphabet with size σ := |Σ|. An element of Σ is called a character. Let Σ^+ denote the set of all nonempty finite strings over Σ. The empty string of length zero is denoted by ε; we write Σ^* = Σ^+ ∪ {ε}. Given an integer n ≥ 1, a string of length n over Σ takes the form T = t_1 · · · t_n with each t_i ∈ Σ. We write T = T[1..n] with T[i] = t_i. The length n of a string T is denoted by |T|. If T = uwv for some strings u, w, v ∈ Σ^*, then u is a prefix, w is a substring, and v is a suffix of T; we say u (resp. w and v) is proper if u ≠ T (resp. w ≠ T and v ≠ T). We say that a string T of length n has period p ∈ [1..n] if T[i] = T[i + p] for all i ∈ [1..n − p] (note that we allow periods larger than n/2). If T = uv, then vu is said to be a cyclic rotation of T. A string T is said to be a repetition if and only if it has a factorization T = S^k for some string S and some integer k ≥ 2; otherwise, T is said to be primitive. A string that is both a proper prefix and a proper suffix of a string T ≠ ε is called a border of T; a string is border-free if the only border it has is the empty string ε.
If Σ is a totally ordered alphabet with order <, then this order induces the lexicographic order ≺ on Σ^* such that u ≺ v for two strings u, v ∈ Σ^* if and only if either u is a proper prefix of v, or u = ras and v = rbt for two characters a, b ∈ Σ with a < b and some strings r, s, t ∈ Σ^*. In the following, we select a totally ordered alphabet Σ having three characters a, b, c with a < b < c.
A string T is a Lyndon word if it is strictly least in the lexicographic order among all its cyclic rotations [35].For instance, abcac and aaacbaabaaacc are Lyndon words, while the string aaacbaabaaac with border aaac is not.
A reciprocal relationship exists between the suffix array of a text and its Lyndon factorization, that is, the unique factorization of the text by greedily choosing the maximal length Lyndon prefix while processing the text from start to end: the Lyndon factorization of a text can be obtained from its suffix array [31]; conversely, the suffix array of a text can be constructed iteratively from its Lyndon factorization [38].
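To experiment with these notions on small inputs, the sketch below tests the Lyndon property directly from the definition (strictly smaller than every proper cyclic rotation) and computes the Lyndon factorization with Duval's linear-time algorithm. Duval's algorithm is a standard alternative that works on the characters directly; it is not the suffix-array-based method of [31]. The function names are our own.

```python
def is_lyndon(t):
    """True if t is strictly smaller than each of its proper cyclic rotations."""
    return all(t < t[i:] + t[:i] for i in range(1, len(t)))


def lyndon_factorization(t):
    """Duval's algorithm: factorize t into a non-increasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(t)
    while i < n:
        j, k = i + 1, i
        while j < n and t[k] <= t[j]:
            k = i if t[k] < t[j] else k + 1   # restart the comparison or extend the period
            j += 1
        while i <= k:                         # emit copies of the Lyndon word of length j - k
            factors.append(t[i:i + j - k])
            i += j - k
    return factors


print(is_lyndon("abcac"), is_lyndon("aaacbaabaaacc"), is_lyndon("aaacbaabaaac"))
print(lyndon_factorization("aaacbaabaaac"))
```

For the last example, the factorization is aaacbaab followed by aaac, which reflects why the string with border aaac is not a Lyndon word itself.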
Lyndon words have numerous applications in combinatorics and algebra, and they prove to be challenging objects due to their non-commutativity. Additionally, Lyndon words can arise naturally in Big Data: an instance in a biological sequence over the DNA alphabet, with Σ = {A < C < G < T}, is the following substring occurring in a SARS-CoV-2 genome:

AAAAACAGTAAAGTACAAATAGGAGAGTACACCTTTGAAAAAGGTGACTATGGTGAT
For the rest of the article, we take a string T of length n ≥ 2. The suffix array SA_T[1..n] of T is the permutation of [1..n] such that T[SA_T[i]..n] ≺ T[SA_T[i + 1]..n] for all i ∈ [1..n − 1]. The focus of this paper is on arithmetic progressions. An arithmetic progression is a sequence of numbers such that the difference between any two consecutive terms is the same: given an arithmetic progression {p_i}_{i≥1}, there is an integer k ≥ 1 such that p_{i+1} = p_i + k for all i ≥ 1. We call k the ratio of this arithmetic progression. Similarly to sequences, we can define permutations that are based on arithmetic progressions: an arithmetically progressed permutation with ratio k ∈ [1..n − 1] is an array P := [p_1, . . ., p_n] with p_{i+1} = p_i + k mod n for all i ∈ [1..n], where we stipulate that p_{n+1} := p_1. Here, for an integer x, x mod n := x if 1 ≤ x ≤ n, x mod n := (x − n) mod n if x > n, and x mod n := (x + n) mod n if x < 1. In particular, n mod n = 0 mod n = n.
In what follows, we want to study (a) strings whose suffix arrays are arithmetically progressed permutations, and (b) the shape of these suffix arrays. As a warm-up, we start with the unary alphabet:
Theorem 2.1. Given the unary alphabet {a}, the suffix array of a string of length n over {a} is uniquely defined by the arithmetically progressed permutation [n, n − 1, . . ., 1] with ratio n − 1.
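The following sketch mirrors these definitions in Python: mod1 implements the convention x mod n ∈ [1..n] (so that n mod n = 0 mod n = n), and is_arithmetically_progressed checks p_{i+1} = p_i + k mod n including the wrap-around p_{n+1} := p_1. The function names are our own, and the naive suffix array is only meant for small tests. The final loop checks Theorem 2.1 for a few lengths.

```python
def mod1(x, n):
    """x mod n with representatives in [1..n], so that n mod n = 0 mod n = n."""
    return (x - 1) % n + 1


def is_arithmetically_progressed(P, k):
    """Check that P is a permutation of [1..n] with p_{i+1} = p_i + k mod n, p_{n+1} := p_1."""
    n = len(P)
    return sorted(P) == list(range(1, n + 1)) and all(
        P[(i + 1) % n] == mod1(P[i] + k, n) for i in range(n))


def suffix_array(t):
    """1-based starting positions of the suffixes of t in lexicographic order (naive)."""
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


# Theorem 2.1: over a unary alphabet the suffix array is [n, n-1, ..., 1] with ratio n - 1.
for n in range(2, 10):
    assert is_arithmetically_progressed(suffix_array("a" * n), n - 1)
```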
Conversely, given the arithmetically progressed permutation P = [n, n − 1, . . ., 1], we want to know the number of strings over a general totally ordered alphabet Σ = [1..σ] with the natural order 1 < 2 < · · · < σ having P as their suffix array. For that, we fix a string T of length n with SA_T = P. Let s_j ≥ 0 be the number of occurrences of the character j ∈ Σ in T. Then ∑_{j=1}^{σ} s_j = n. By construction, each character j has to appear after all characters k with k > j. Therefore, T = σ^{s_σ} (σ − 1)^{s_{σ−1}} · · · 1^{s_1}, so that the positions of the characters are uniquely determined. As described above, strings of Theorem 2.2 have the form σ^{s_σ} (σ − 1)^{s_{σ−1}} · · · 1^{s_1}. The BWT based on the suffix array is 1^{s_1−1} 2^{s_2} · · · σ^{s_σ} 1. For s_1 ≥ 2, it does not coincide with the BWT based on the rotations, since the lexicographically smallest rotation is 1^{s_1} σ^{s_σ} · · · 2^{s_2}, and hence the first entry of this BWT is 2. For s_1 = 1, the last character '1' acts as the dollar sign, being unique and least among all characters, which makes both BWT definitions equivalent.
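Both BWT definitions can be compared on a small instance. In this sketch (helper names are ours), the characters a < b < c stand for 1 < 2 < 3, the suffix-array-based BWT reads T cyclically so that T[0] := T[n], and the rotation-based BWT is read from the last column of the sorted rotations.

```python
def suffix_array(t):
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def bwt_from_sa(t):
    """BWT_T[i] = T[SA_T[i] - 1] with T[0] := T[n] (Python index -1 for SA_T[i] = 1)."""
    return "".join(t[i - 2] for i in suffix_array(t))


def bwt_from_rotations(t):
    """Last column of the sorted cyclic rotations (the BWT matrix)."""
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


for s1 in (2, 1):
    t = "cc" + "bbb" + "a" * s1          # T = 3^2 2^3 1^{s1}
    print(t, suffix_array(t), bwt_from_sa(t), bwt_from_rotations(t))
```

For s_1 = 2, the two BWTs differ (abbbcca versus babbcca), while for s_1 = 1 both give bbbcca, in line with the argument above.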
For the rest of the analysis, we omit the arithmetically progressed permutation [n, n − 1, . . ., 1] of ratio k = n − 1, as this case is complete. All other permutations (including the remaining ones of ratio k = n − 1) are covered by the following theorems, whose results we summarize in Fig. 1.

Arithmetically Progressed Suffix Arrays
We start with the claim that each arithmetically progressed permutation coincides with the suffix array of a string over a ternary alphabet. Subsequently, given an arithmetically progressed permutation P, we show that either there is precisely one string T with SA_T = P whose characters are drawn from a ternary alphabet, or, if there are multiple candidate strings, then there is precisely one whose characters are drawn from a binary alphabet. To this end, we start with the restriction that k and n be coprime:

Coprimality
Two integers are coprime if their greatest common divisor is 1. Each arithmetically progressed permutation with ratio k induces an ideal kℕ (the induced ideals are the same for two arithmetically progressed permutations that are shifted). Consequently, there is no arithmetically progressed permutation with ratio k if k and n are not coprime, since in this case {(ki) mod n | i ≥ 1} ⊊ [1..n], from which we obtain:
Lemma 3.1. The numbers k and n must be coprime if there exists an arithmetically progressed permutation of length n with ratio k.
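A quick numerical illustration of Lemma 3.1 (a sketch with our own helper name): the values reached from p_1 = 1 by repeatedly adding k modulo n cover all of [1..n] exactly when gcd(k, n) = 1.

```python
from math import gcd


def progression_values(n, k, p1=1):
    """The set {p1 + i*k mod n : i >= 0}, reduced into [1..n]; n steps suffice."""
    return {(p1 - 1 + i * k) % n + 1 for i in range(n)}


n = 12
for k in range(1, n):
    covers_all = progression_values(n, k) == set(range(1, n + 1))
    assert covers_all == (gcd(k, n) == 1)
print([k for k in range(1, n) if gcd(k, n) == 1])   # ratios admitting a permutation: [1, 5, 7, 11]
```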

Ternary Alphabet
Given an arithmetically progressed permutation P := [p_1, . . ., p_n] with ratio k, we define the ternary string T[1..n] by splitting P right after the values n − k and (p_1 − k − 1) mod n into the three subarrays A, B, and C (one of which is possibly empty) such that P = ABC. Subsequently, we set
T[j] := a if j ∈ A, T[j] := b if j ∈ B, and T[j] := c if j ∈ C, for every j ∈ [1..n].   (1)
Figure 2 gives an example of induced ternary/binary strings.
Figure 2: … In Rotations (5) and (8), the alphabet is binary and the BWTs are the same. The strings of Rotations (3) and (8) are periodic with period n − k, since the last text position of each subarray is at most as large as n − k = 3 (cf. the proof of Theorem 3.9).
… The split at the value n − k ensures that when reaching p_i = n − k and p_{i+1} = n, we can stop the comparison here, as there is no character following T[p_{i+1}]. The split here ensures that we can compare the suffixes …n] (which yields a contradiction unless p_n = n).
To sum up, the text positions stored in each of A, B, and C are in the same order as in SA_T, since the j − 1 subsequent text positions of each consecutive pair of entries p_i and p_{i+1} are consecutive in P for the smallest integer j ∈ [1..n] such that p_{i+1} + jk ∈ {p_1 − 1, n}.
Figure 4: … Case (2) from Fig. 6. Reading the last column of a BWT matrix (whose characters are italic) from top to bottom yields the BWT defined on the BWT matrix. While the BWT defined on the BWT matrix and the one defined by the suffix array coincide for the strings of Eq. (1) due to Theorem 3.3, this is not the case in general for the binary strings studied in Section 3.3, where we observe that BWT_bbabbabb = bbbbbaab, defined by the suffix array, differs from bbbbbaba (the last column on the right).
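A minimal sketch of the construction of Eq. (1): P is cut right after the values n − k and (p_1 − k − 1) mod n, and the text positions in A, B, and C receive a, b, and c, respectively. If both cut values coincide, one subarray is empty and the result is binary. The names ap_permutation and induced_string are hypothetical helpers; the loop confirms SA_T = P for every start value with n = 8 and k = 5.

```python
def mod1(x, n):
    """x mod n with representatives in [1..n]."""
    return (x - 1) % n + 1


def suffix_array(t):
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def ap_permutation(n, p1, k):
    """[p1, p1 + k, p1 + 2k, ...] reduced into [1..n]."""
    return [mod1(p1 + i * k, n) for i in range(n)]


def induced_string(P, k):
    """Cut P after the values n - k and (p1 - k - 1) mod n into A, B, C and
    assign a, b, c to the text positions stored in A, B, C, respectively."""
    n = len(P)
    cut_after = {n - k, mod1(P[0] - k - 1, n)}   # one or two distinct cut values
    letters = iter("abc")
    letter = next(letters)
    t = [""] * n
    for p in P:
        t[p - 1] = letter
        if p in cut_after:                       # the current subarray ends here
            letter = next(letters)
    return "".join(t)


n, k = 8, 5
for p1 in range(1, n + 1):
    P = ap_permutation(n, p1, k)
    t = induced_string(P, k)
    assert suffix_array(t) == P
    print(p1, t)
```

At most two cut values exist and each appears once in P, so the iterator over "abc" is never exhausted.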
Knowing the suffix array of the ternary string T of Eq. (1), we can characterize its BWT. We start with the observation that both BWT definitions (rotation-based and suffix-array-based) coincide for the strings of Eq. (1) (but not in general, as highlighted in the introduction, cf. Fig. 4), and then continue with insights into what the BWT looks like.
Theorem 3.3. Given an arithmetically progressed permutation P := [p_1, . . ., p_n] ≠ [n, n − 1, . . ., 1] with ratio k and the string T of Eq. (1), the BWT of T defined on the BWT matrix coincides with the BWT of T defined on the suffix array.
Proof. According to Theorem 3.2, SA_T = P, and therefore the BWT of T defined on the suffix array is given by BWT_T[i] = T[p_i − 1 mod n] for all i ∈ [1..n]. The BWT matrix consists of the lexicographically ordered cyclic rotations of T. The BWT based on the BWT matrix is obtained by reading the last column of the BWT matrix from top to bottom (see Fig. 4). Formally, let Q[i] be the starting position of the lexicographically i-th smallest rotation. We do that by comparing both rotations R_i and R_{i+1} characterwise: let j be the first position where R_i and R_{i+1} differ, i.e., R_i First we show that j = p_n − p_i + 1 mod n by a contradiction: Assuming that j = p_n − p_i + 1 mod n, we conclude that j = 1 by the definition of Since k is the ratio of P, we have, contradicting the choice of j as the first position where R_i and R_{i+1} differ.
This concludes that j ≤ n (hence, and by definition of j we have
Lemma 3.4. Let P := [p_1, . . ., p_n] be an arithmetically progressed permutation with ratio k. Further, let T[1..n] be given by Eq. (1) such that SA_T = P according to Theorem 3.2. Given that
Proof. Since P is an arithmetically progressed permutation with ratio k, so is the sequence P' := [p'_1, . . ., p'_n] with p'_i = p_i − 1 mod n. In particular, P' is a cyclic shift of P with p'_n = p_1 − 1 − k mod n because p'_1 = p_1 − 1. However, p_1 − 1 − k is a split position of one of the subarrays A, B, or C, meaning that P' starts with one of these subarrays and ends with another of them (cf. Fig. 5).
Consequently, there is a t such that p_t = p'_n, and we have the property that We will determine the parameter t = n − k^{−1} mod n after Eq. (3) in Section 3.4, where k^{−1} is defined such that k · k^{−1} mod n = 1 mod n. With Lemma 3.4, we obtain the following corollary, which shows that the number of runs in BWT_T for T defined in Eq. (1) is minimal:
Corollary 3.5. For an arithmetically progressed permutation P := [p_1, . . ., p_n] ≠ [n, n − 1, . . ., 1] and the string T defined by Eq. (1), BWT_T consists of exactly 2 runs if T is binary, while it consists of exactly 3 runs if T is ternary.
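The sketch below checks Theorem 3.3 and Corollary 3.5 numerically for n = 8 and k = 5, building T by the rule of Eq. (1). It is a sanity check on one instance, not a proof, and all helper names are our own; the run count uses itertools.groupby.

```python
from itertools import groupby


def mod1(x, n):
    return (x - 1) % n + 1


def induced_string(n, p1, k):
    """String of Eq. (1) for P = [p1, p1 + k, ...]: the letter changes right after
    the values n - k and (p1 - k - 1) mod n."""
    cut_after = {n - k, mod1(p1 - k - 1, n)}
    letters, t = iter("abc"), [""] * n
    letter = next(letters)
    for i in range(n):
        p = mod1(p1 + i * k, n)
        t[p - 1] = letter
        if p in cut_after:
            letter = next(letters)
    return "".join(t)


def suffix_array(t):
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def bwt_from_sa(t):
    return "".join(t[i - 2] for i in suffix_array(t))


def bwt_from_rotations(t):
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def number_of_runs(s):
    return sum(1 for _ in groupby(s))


n, k = 8, 5
for p1 in range(1, n + 1):
    t = induced_string(n, p1, k)
    bwt = bwt_from_sa(t)
    assert bwt == bwt_from_rotations(t)          # Theorem 3.3
    assert number_of_runs(bwt) == len(set(t))    # Corollary 3.5: 2 or 3 runs
    print(p1, t, bwt)
```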
Proof. The only possible way to define another string T would be to change the borders of the subarrays A, B, and C. Since p_1 ∈ {1, n}, n − k and n, as well as p_1 − k − 1 and p_1 − 1, are stored as consecutive pairs of text positions in P.
• If P is not split between its consecutive text positions (p …n], cf. Fig. 3.
Following this analysis of the ternary case, we proceed to consider binary strings. A preliminary observation is given in Fig. 2, which shows, for the cases p_1 = 1 and p_1 = n in Theorem 3.6, namely Rotations (5) and (8), that a rotation by n − k in the permutation gives a rotation by one in the corresponding binary strings. We formalize this observation in the following lemma, drawing a connection between binary strings whose suffix arrays are arithmetically progressed and start with 1 or n.
Lemma 3.7. Let P := [p_1, . . ., p_n] be an arithmetically progressed permutation with ratio k and p_1 = 1 for a binary string T over Σ = {a, b} with SA_T = P. Suppose that the number of a's in T is m and that In the following, we show that P' = SA_{T'} for P' For that, we show that each pair of suffixes in SA_T is kept in the same relative order in P' (excluding SA_T[1] = 1): Hence the relative order of these suffixes given by [p_2, . . ., p_n] and [p'_2, . . ., p'_n] is the same. In total, we have …n], hence P' is an arithmetically progressed permutation with ratio k. Given that the first m entries in P represent all suffixes of T starting with a, P' is the m-th rotation of P, since p'_1 = n is the (m + 1)-th entry of P, i.e., the smallest suffix starting with b in T.
Finally, since the strings T and T' are rotations of each other, their BWTs are the same.
Like the parameter t of Lemma 3.4, we will determine the parameter m after Eq. (3) in Section 3.4.
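The sketch below checks this relationship on the Case (3) string ababbabb from Fig. 6, where m = 3. We assume that T' denotes the rotation T[2..n] T[1] (our assumption), so that its suffix array P' should satisfy p'_i = p_i − 1 mod n, start with n, and equal the m-th rotation of P; helper names are our own.

```python
def suffix_array(t):
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def bwt_from_sa(t):
    return "".join(t[i - 2] for i in suffix_array(t))


t = "ababbabb"                       # binary, SA_T starts with p1 = 1
n, m = len(t), t.count("a")          # m = number of a's in T
P = suffix_array(t)
t_rot = t[1:] + t[0]                 # assumed T' = T[2..n] T[1]
P_rot = suffix_array(t_rot)

assert P_rot == [(p - 2) % n + 1 for p in P]   # p'_i = p_i - 1 mod n
assert P_rot == P[m:] + P[:m]                 # m-th rotation of P
assert P_rot[0] == n                          # P' starts with n
assert bwt_from_sa(t_rot) == bwt_from_sa(t)   # same BWT
print(P, P_rot)
```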

Binary Alphabet
We start with the construction of a binary string from an arithmetically progressed permutation:
Figure 6: All binary strings of length 8 whose suffix arrays are arithmetically progressed permutations with ratio k = 5. Theorem 3.9 characterizes these strings (and also gives the definition of p_s). Cases (1) and (3) also appear in Fig. 2 at Rotations (8) and (5), respectively, while Case (2) can be obtained from Rotation (6) by exchanging the last character with c. Cases (1) and (2) both have period n − k = 3, and Case (3) is a Lyndon word.
Figure 7 (panels: Case 1, Case 2, Case 3): Sketches of the cases of Theorem 3.9. T_s is uniquely determined if the suffix array SA of T_s is arithmetically progressed with ratio k and the first entry …
… mutations. More precisely, we identify which of them are unique (the exact number of these binary strings is not covered by Theorem 3.6), periodic, or a Lyndon word.
Theorem 3.9. Let n and k ∈ [1..n − 1] be two coprime integers. If k ≠ n − 1, there are exactly three binary strings of length n whose suffix arrays are arithmetically progressed permutations with ratio k. Each such solution T_s ∈ {a, b}^+ is characterized by for all text positions i ∈ [1..n].
The individual solutions are obtained by fixing the values for p_1 and p_s, the position of the lexicographically largest suffix starting with a, of SA_{T_s} = [p_1, . . ., p_n]: 2. p_1 = k + 1 and p_s = n − k, and 3. p_1 = 1 and p_s = n − k.
The string T_s has period n − k in Cases 1 and 2, while T_s of Case 3 is a Lyndon word, which is not periodic by definition.
For k = n − 1, Cases 2 and 3 each yield exactly one binary string, but Case 1 yields n binary strings according to Theorem 2.2.
Proof. Let S be a binary string of length n, and suppose that SA_S = P := [p_1, . . ., p_n] is an arithmetically progressed permutation with ratio k. Further, let p_s be the position of the largest suffix of S starting with a. Then S[p_i..n] ≺ S[p_{i+1}..n] and thus Case is the smallest suffix starting with b, and consequently p_s = n − k.
For the periodicity, with …n] \ {p_1, p_s}, we need to check two conditions: …n] thus becomes the second smallest suffix). Finally, we need to show that no other value for p_1 admits a binary string S having an arithmetically progressed permutation P := [p_1, . . ., p_n] with ratio k as its suffix array. So suppose that p_1 ∉ {1, k + 1, n}; then this would imply the following: and thus the smallest suffix starting with b is located at index n (this is a contradiction, since we assumed that this suffix starts at p_1 − 1 ∈ [1..n − 1]). This establishes the claim for p_1.
For a given arithmetically progressed permutation with ratio k and first entry p_1 ∈ {1, k + 1, n}, the string T_s of Theorem 3.9 coincides with the string T of Theorem 3.8.
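Theorem 3.9 can be sanity-checked by brute force for small n. The sketch enumerates all binary strings of length 8 and keeps those whose suffix array is arithmetically progressed with ratio k = 5, together with the first suffix array entry p_1; the result should reproduce the three strings of Fig. 6. The helper ap_ratio is our own, and exhaustive search is exponential in n, so it is only meant for tiny inputs.

```python
from itertools import product


def suffix_array(t):
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def ap_ratio(P):
    """Ratio k if P is an arithmetically progressed permutation, else None."""
    n = len(P)
    k = (P[1] - P[0]) % n
    if all((P[(i + 1) % n] - P[i]) % n == k for i in range(n)):
        return k
    return None


n, k = 8, 5
solutions = []
for letters in product("ab", repeat=n):
    t = "".join(letters)
    sa = suffix_array(t)
    if ap_ratio(sa) == k:
        solutions.append((t, sa[0]))
print(solutions)
```

The first entries found are p_1 = 1, k + 1 = 6, and n = 8, matching the three cases of the theorem.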

Inverse Permutations
Since the inverse P^{−1} of a permutation P, with P^{−1}[P[i]] = i, is also a permutation, one may wonder whether the inverse P^{−1} of an arithmetically progressed permutation is also arithmetically progressed. We answer this question affirmatively in the following. For that, we use the notion of the multiplicative inverse k^{−1} of an integer k in the ring Z/nZ (whose residues we represent by [1..n]), which is given by k^{−1} · k mod n = 1 mod n. The multiplicative inverse k^{−1} is uniquely defined if k and n are coprime.
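A quick numerical check of this question (a sketch with our own helper names): invert arithmetically progressed permutations and compare the ratio of each inverse with the multiplicative inverse k^{−1}, which Python computes as pow(k, -1, n) (available since Python 3.8).

```python
from math import gcd


def ap_permutation(n, p1, k):
    return [(p1 - 1 + i * k) % n + 1 for i in range(n)]


def inverse(P):
    """P^{-1} with P^{-1}[P[i]] = i, all indices 1-based."""
    inv = [0] * len(P)
    for i, p in enumerate(P, start=1):
        inv[p - 1] = i
    return inv


def ratio(P):
    """Common difference mod n of P (including the wrap-around), or None."""
    n = len(P)
    diffs = {(P[(i + 1) % n] - P[i]) % n for i in range(n)}
    return diffs.pop() if len(diffs) == 1 else None


for n in range(2, 13):
    for k in range(1, n):
        if gcd(k, n) == 1:
            for p1 in range(1, n + 1):
                assert ratio(inverse(ap_permutation(n, p1, k))) == pow(k, -1, n)

P = ap_permutation(7, 1, 3)
print(P, inverse(P), ratio(inverse(P)), pow(3, -1, 7))
```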
Theorem 3.10. The inverse P^{−1} of an arithmetically progressed permutation P with ratio k is an arithmetically progressed permutation with ratio k^{−1} and P For the inverse permutation P^{−1}, this means that P^{−1}[x] = i and P Since P[i] = j ⇐⇒ P[n] + ik mod n = j holds for all indices i ∈ [1..n], we have (using i ← P^{−1}[1] and j ← 1 in the above equivalence) Consequently, using the split index s of p_s for SA and we can rewrite T_s defined in Eq. (2) as where SA and ISA denote the suffix array and the inverse suffix array of T_s, respectively. Another result is that ISA[p_s] = s is the number of a's in T_s, for which we split the study into the cases of Theorem 3.9: For Fig. 6 with k = 5 and n = 8, we know that the number of a's is ISA[p_s] = 3 in Cases (1) and (3), and This also determines the constant m used in Lemma 3.7. Finally, we can fix the parameter t in Lemma 3.4, defined such that p_t = p_1 − 1 − k mod n. For that, write
Figure 8: Cayley graph of Z/(p + q)Z generated by q. See [6, Fig. 1.4] for a concrete example.

Applications
We conclude our main results of Section 3 by drawing connections between strings having arithmetically progressed suffix arrays and Christoffel words (Section 4.1), balanced words (Section 4.2), and Fibonacci words (Section 4.3).

Christoffel Words
Christoffel words are binary strings whose origins are considered to date back to Bernoulli's 1771 work [5]. Christoffel words can be described geometrically in terms of a line segment and an associated polygon traversal [9]: let (p, q) ∈ ℕ^2, where p and q are coprime, and let S be the line segment with endpoints (0, 0) and (p, q). The induced path of a binary string T of length n = p + q is a list of points v_0, . . ., v_n ∈ ℕ^2 such that v_0 = (0, 0), v_n = (p, q), and for each i ∈ [1..n], v_i − v_{i−1} = (1, 0) if T[i] = a and v_i − v_{i−1} = (0, 1) if T[i] = b. The string T ∈ {a, b}^* is a lower Christoffel word if the path induced by T from the origin to (p, q) is below the line segment S and the path and S determine a simple polygon which contains no other point of ℕ^2. An upper Christoffel word is defined analogously by taking the path above S. Hence, a Christoffel word is defined by a direction (above or below) and the slope q/p, which determines p and q since p and q are coprime.
For instance, Case (3) of Fig. 6 determines a Christoffel word with slope 5/3. It follows from the coprimality of p and q that Christoffel words are necessarily primitive. In what follows, we focus on lower Christoffel words, and drop the adjective lower when speaking of (lower) Christoffel words.
To show that every lower Christoffel word has an arithmetically progressed suffix array, we use an alternative characterization of Christoffel words based on Cayley graphs [6, Def. 1.4]. Let again p, q ∈ ℕ be coprime. Fig. 8 shows the Cayley graph of Z/(p + q)Z generated by q. This Cayley graph is always a simple cycle, since q and p + q are coprime. In what follows, we use mod_0 with n mod_0 n = 0 mod_0 n = 0 to match the definition in [6], as opposed to mod with 0 mod n = n mod n = n elsewhere.
An edge s → t in the Cayley graph has the label a if s < t; otherwise, it has the label b. Reading the edge labels, starting from node 0, following the edges of the Cayley graph, and stopping when reaching node 0 again yields the lower Christoffel word T_c parametrized by p and q. The i-th node in the Cayley graph (0 being the first node) is (i − 1)q mod_0 (p + q). Hence the i-th character of T_c is
T_c[i] = a if (i − 1)q mod_0 (p + q) < iq mod_0 (p + q), and T_c[i] = b otherwise.   (4)
Given the suffix array SA_{T_c} of a lower Christoffel word T_c, the split index s (defined in Theorem 3.9) is given by p, the total number of units along the x-axis in the polygonal path. All lower Christoffel words are Lyndon words [9] and are therefore border-free. Hence SA_{T_c}[1] = 1 (that is, p_1 = 1), and SA_{T_c}[s + 1] = n, since the string T_c must end with b. The following theorem now gives the connection between lower Christoffel words and the strings of Section 3:
Theorem 4.1. Let p, q ∈ ℕ be coprime. Then the lower Christoffel word T_c characterized by p and q has an arithmetically progressed suffix array. The suffix array is given by the arithmetic progression P with p_1 = 1 and k = q^{−1}, where q^{−1} is the multiplicative inverse of q in Z/nZ with n = p + q. The string T_c is identical to Case 3 of Theorem 3.9 characterizing the binary case.
Proof. We prove the theorem by showing that the Christoffel word T_c is equal to the string described in Theorem 3.9 as Case 3, whose suffix array is the arithmetically progressed array P.
Let n = p + q. By Eq. (4), the i-th character of T_c is an a if and only if (i − 1)q mod_0 n < iq mod_0 n. We can rewrite this as (i − 1)q mod_0 n < ((i − 1)q mod_0 n + q) mod_0 n. This condition is fulfilled if and only if ((i − 1)q mod_0 n) + q < n, which we can rewrite as (i − 1)q mod_0 n < n − q. Replacing mod_0 with mod, we obtain Let k = q^{−1}. Let T_s denote the string of Case 3 in Theorem 3.9, which is characterized by p_1 = 1 and p_s = n − k = n − q^{−1}. Using the results from Section 3.4, Eq. (3), T_s can be written as we can rewrite the condition for a in Eq. (5) as 1 + (i − 1)q mod n ≤ n − q. Replacing ≤ with <, we obtain the same definition for T_s as for T_c, concluding the proof.
In the geometric setting, traversing the path of the polygon from (0, 0) to (p, q) is equivalent to scanning the characters of the defining Christoffel word T_c; hence we can associate each character with its polygon coordinates. The BWT can be obtained by iterating over the suffix array and, for each index, selecting the preceding character in the text: the BWT for the first string in Figure 9 has maximal runs of the form b^5 a^7; more generally, the BWT of a Christoffel word over Σ = {a, b} with slope q/p takes the form b^q a^p [6]. Further, since the BWT is injective on Lyndon words (see [6]), it follows that this property holds for Christoffel words.
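The following sketch (our own helper names) builds lower Christoffel words from the Cayley graph rule of Eq. (4), where Python's % operator matches mod_0, and checks Theorem 4.1 together with the BWT shape b^q a^p on two examples: (p, q) = (3, 5), which gives Case (3) of Fig. 6, and (p, q) = (7, 5).

```python
from math import gcd


def christoffel_word(p, q):
    """Lower Christoffel word: T_c[i] = a iff (i-1)q mod_0 n < iq mod_0 n, n = p + q."""
    assert gcd(p, q) == 1
    n = p + q
    return "".join("a" if (i - 1) * q % n < i * q % n else "b" for i in range(1, n + 1))


def suffix_array(t):
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])


def bwt_from_sa(t):
    return "".join(t[i - 2] for i in suffix_array(t))


for p, q in [(3, 5), (7, 5)]:
    t = christoffel_word(p, q)
    n, k = p + q, pow(q, -1, p + q)                 # ratio k = q^{-1} in Z/nZ
    expected = [(i * k) % n + 1 for i in range(n)]  # p_1 = 1, p_{i+1} = p_i + k
    print(t, suffix_array(t) == expected, bwt_from_sa(t) == "b" * q + "a" * p)
```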
The progression ratio q^{−1} of Theorem 4.1 may be useful for accessing geometries of interest in the polygon or for aiding the discovery of repetitive geometric structures. For instance, to access the 'steps' in the polygon in decreasing width, start at the origin and apply increments of k: for the first string in Figure 9, the widest steps are at coordinates (0, 0) and (3, 2) (see Figure 10). Note that the geometric width of this polygon at a certain height reflects a run of the character a in its representing Christoffel word T_c. Hence, our question of steps in decreasing width can be translated to finding text positions i such that T_c[i..] has a long common prefix with a run of a's.
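A small sketch of this traversal for (p, q) = (7, 5), which we assume is the first string of Figure 9 since its BWT has the form b^5 a^7: starting at text position 1 and repeatedly adding k = q^{−1} modulo n visits the suffixes starting with a in suffix array order. For each visited position, the sketch reports the polygon coordinates (numbers of a's and b's read so far) and the length of the run of a's starting there; the helper names are our own.

```python
def christoffel_word(p, q):
    n = p + q
    return "".join("a" if (i - 1) * q % n < i * q % n else "b" for i in range(1, n + 1))


def coordinates(t, i):
    """Polygon point reached just before reading T[i] (1-based): (#a, #b) in T[1..i-1]."""
    prefix = t[:i - 1]
    return prefix.count("a"), prefix.count("b")


def a_run(t, i):
    """Number of consecutive a's starting at T[i] (1-based)."""
    j = i - 1
    while j < len(t) and t[j] == "a":
        j += 1
    return j - (i - 1)


p, q = 7, 5
t, n = christoffel_word(p, q), p + q
k = pow(q, -1, n)
positions = [(i * k) % n + 1 for i in range(p)]   # the p suffixes starting with a
print([(coordinates(t, i), a_run(t, i)) for i in positions])
```

The first two visited positions lie at (0, 0) and (3, 2), each starting a run of two a's, and the run lengths never increase along the traversal.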
We now extend a known result for lower Christoffel words [6, Sect. 6.1], which distinguishes consecutive rows in an associated BWT matrix:
Lemma 4.2. Let T_c be a Christoffel word of length n ≥ 2 over Σ = {a, b}. Suppose T_c has an arithmetically progressed suffix array with progression ratio k. Then, in the BWT matrix, rows i and i + 1 for 1 ≤ i < n differ in exactly two consecutive positions. For each row i, these positions are i(n − k) mod n and (i(n − k) mod n) + 1.
Proof. The Christoffel word T_c is characterized by Theorem 3.9 (3). Further, since T_c is also a Lyndon word, it starts with a and ends with b. The second row in the BWT matrix where the prefix T_c[SA_{T_c}[2]..n] has length n − k and ends with b.
From Case 3 of Theorem 3.9, $p_s = n - k$, and so $T_c[n-k] = a$; since $T_c[n-k..n]$ is the largest suffix starting with a and $n \ge 2$, it has the prefix ab. This gives the arrangement $A = \begin{pmatrix} a & b \\ b & a \end{pmatrix}$ in the first two rows, starting at position $n - k$ in row 1; the next occurrence of $A$, in rows 2 and 3, starts at position $2(n - k) \bmod n$ in row 2. Similarly, for each subsequent row $i < n$ of the BWT matrix $M$, the position of the occurrence of $A$ shifts by an additional $(n - k) \bmod n$.
We proceed to show that there are no differences between adjacent rows other than those occurring in $A$. So suppose instead that $M[i, j] = c$ and $M[i+1, j] = d$ with $c \ne d$ for a position $j$ outside of $A$. Then the character $d$ in row $i + 1$ is the same character of $T_c$ as the one at position $j - (n-k) \bmod n$ in row $i$, which must be the same character as the one at position $(j - (n-k) + (n-k)) \bmod n = j$ in row $i$, namely $c$, contradicting the assumption.
Hence, the only differences between adjacent rows occur at $A$, which coincides with $p_s$ and the end of $T_c$. In particular, $A$ determines the lexicographic ordering of the rows of the matrix $M$.
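Lemma 4.2 can be tested numerically. The sketch below uses hypothetical helper names and takes the rows to be the sorted cyclic rotations, which for a Lyndon word coincide with the suffix-array order (Theorem 3.3). It lists, for every pair of adjacent rows, the columns in which they differ and compares them with the positions $i(n-k) \bmod n$ and $(i(n-k) \bmod n) + 1$.

```python
def bwt_matrix(t):
    """All cyclic rotations of t in lexicographic order (the BWT matrix M)."""
    return sorted(t[i:] + t[:i] for i in range(len(t)))


def adjacent_row_differences(m):
    """For each i, the 1-based columns in which rows i and i+1 of m differ."""
    return [
        [j + 1 for j, (x, y) in enumerate(zip(r, s)) if x != y]
        for r, s in zip(m, m[1:])
    ]


t, k = "aababaababab", 5             # Christoffel word of Figure 9 (1), ratio 5
n = len(t)
diffs = adjacent_row_differences(bwt_matrix(t))
for i, cols in enumerate(diffs, start=1):
    first = (i * (n - k) - 1) % n + 1    # i(n-k) mod n, taken in [1..n]
    assert cols == [first, first + 1], (i, cols)
print(diffs[:3])                         # [[7, 8], [2, 3], [9, 10]]
```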
The arithmetic progression of arithmetically progressed suffix arrays of Christoffel words allows us to determine the following factorizations of such words in constant time (without looking at the explicit characters of the word):
• Right factorization [6]; originally called the standard factorization [12,22].
If w = uv is a Lyndon word with v its lexicographically least proper suffix, then u and v are also Lyndon words and u < v. Equivalently, the right factor v of the standard factorization can be defined to be the longest proper suffix of w that is a Lyndon word [3].
• Left factorization [44,46].If w = uv is a Lyndon word with u a proper Lyndon prefix of maximal length, then v is also a Lyndon word and u < v.
• Balanced$_2$ factorization [40]. A Lyndon word $w$ is balanced$_2$ if $w$ is a character or there is a factorization $w = (u, v)$ that is simultaneously the left and the right factorization of $w$, and $u$ and $v$ are also balanced$_2$.
For any of the three above factorizations $w = (u, v)$, we say that the factorization index is the index of the last character of $u$ in $w$. The factorization index determines the split position and hence the factorization.

Proof. We apply the result that a word is a lower Christoffel word if and only if it is a balanced$_2$ Lyndon word [40] or [6, Theorem 6.7]. So given $T_c$, it remains to prove that
Since $T_u$ and $T_v$ are Lyndon words, we have that
• The second smallest suffix starting with b is
To see that, observe that, because $T_c$ is a Lyndon word, $\mathrm{SA}_{T_c}[1] = 1$, and hence
Finally, .n] since otherwise we obtain a contradiction to the right factorization, in which $T_v$ is the lexicographically least proper suffix. So $j = \mathrm{SA}_{T_c}[(s + 2) \bmod n]$.
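To illustrate Lemma 4.3, the following brute-force Python sketch computes the right (standard) and left factorization indices directly from their definitions and compares them with $\mathrm{SA}_{T_c}[(s+2) \bmod n]$. It is meant only for small examples; the Lyndon test via proper rotations and all function names are our own choices, not the authors' method.

```python
def is_lyndon(w):
    """A nonempty word is Lyndon iff it is strictly smaller than all its proper rotations."""
    return len(w) > 0 and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def right_factorization_index(w):
    """|u| for the standard (right) factorization: v = longest proper Lyndon suffix."""
    for i in range(1, len(w)):          # smaller i means a longer suffix w[i:]
        if is_lyndon(w[i:]):
            return i
    raise ValueError("expected a Lyndon word of length >= 2")


def left_factorization_index(w):
    """|u| for the left factorization: u = longest proper Lyndon prefix."""
    for i in range(len(w) - 1, 0, -1):  # try the longest proper prefix first
        if is_lyndon(w[:i]):
            return i
    raise ValueError("expected a Lyndon word of length >= 2")


def index_from_suffix_array(w):
    """SA[(s + 2) mod n] as in Lemma 4.3, with split index s defined by p_s = n - k."""
    n = len(w)
    sa = [i + 1 for i in sorted(range(n), key=lambda i: w[i:])]
    k = (sa[1] - sa[0]) % n             # progression ratio
    s = sa.index(n - k) + 1             # 1-based split index
    return sa[(s + 1) % n]              # 0-based slot of the 1-based index (s + 2) mod n


w = "aababaababab"                      # Figure 9 (1)
print(right_factorization_index(w), left_factorization_index(w), index_from_suffix_array(w))
# 5 5 5, i.e. the balanced_2 factorization (aabab)(aababab)
```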
For example, the Christoffel word in Fig. 9 Case (1) has factorization index $\mathrm{SA}_{T_c}[9] = 5$ with the balanced$_2$ factorization (aabab)(aababab); the first level of recursion gives the balanced$_2$ factorizations (aab)(ab) and (aabab)(ab). Conditions for the factorization of a Lyndon word into exactly two non-overlapping Lyndon factors are given in [16], where overlapping factors have non-empty suffixes and prefixes sharing the same characters. We can geometrically consider the balanced$_2$ factorization of Christoffel words as follows:

Lemma 4.4. Let $p, q \in \mathbb{N}$ be coprime and $T_c$ be the lower Christoffel word characterized by $p$ and $q$ with factorization index $i$. Further, let the line segment $S = ((0, 0), (p, q))$ be associated with $T_c$. Then, among all points on the path determined by $T_c$ (apart from the end points), the point representing $T_c[1..i]$ has the shortest Euclidean distance to $S$.
Proof.Since T c is a Christoffel word it is also a balanced 2 Lyndon word.Let the balanced 2 factorization be T c = T u T v , where the factors have defining line segments S u , S v respectively, i.e., S u is the line from (0, 0) to the point s being the geometric representation of T c [1..|T u |], and S v is the line from s to (p, q), cf.Fig. 10 with S u = aabab and s = (3, 2).Now assume that there exists a point r that geometrically represents a prefix T c [1..i] with the property that r does not have the shortest Euclidean distance to S. Then at least one of the paths for T u and T v must cross their associated line S u or S v , contradicting the geometric definition of a Christoffel word.

Balanced Words
A binary string $T$ is called a balanced word if, for each character $c \in \{a, b\}$ and for all pairs of substrings $U$ and $V$ with $|U| = |V|$ of the infinite concatenation $T \cdot T \cdots$ of $T$, the numbers of occurrences of $c$ in $U$ and in $V$ differ by at most one.

Lemma 4.5. Let $T$ be a string over the binary alphabet $\{a, b\}$ with an arithmetically progressed suffix array $P = [p_1, \ldots, p_n]$ with ratio $k$. Given that $p_1 \ne k + 1$, $T$ is balanced.

Proof. According to Theorem 3.3, for $T$ the BWT defined on the suffix array and the BWT defined on the BWT matrix are identical. From Corollary 3.5, we know that the BWT of $T$ has the shape $b^x a^y$. By [42, Thm. 2], for binary words the following conditions are equivalent:
1. There are two coprime numbers $p$ and $q$ such that $\mathrm{BWT}_T = b^p a^q$;
2. $T$ is balanced.
We conclude the proof by showing that $x$ and $y$ are coprime. Since $T$ is binary, $p_1 \in \{1, k + 1, n\}$ (cf. Theorem 3.9), and as $p_1 \ne k + 1$, we have $p_1 \in \{1, n\}$. Using the results from Section 3.4, the number of a's in $T$
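A direct check of the balance property and of the BWT shape used in the proof of Lemma 4.5 is sketched below. The helpers are hypothetical; the BWT is read off the sorted rotations (the BWT-matrix definition), and only cyclic factors of length at most $n$ are compared, which suffices for the reason given in the docstring.

```python
from math import gcd


def is_balanced(t):
    """Check the balance property for the infinite word t t t ...

    Every factor of length L <= n of t t t ... is a cyclic factor of t, and a
    factor of length L > n is a full period t plus a factor of length L - n,
    so it suffices to compare the number of a's over all cyclic factors of
    each length L in [1..n] (the number of b's is then L minus that count).
    """
    n = len(t)
    tt = t + t
    for length in range(1, n + 1):
        counts = {tt[i:i + length].count("a") for i in range(n)}
        if max(counts) - min(counts) > 1:
            return False
    return True


def bwt(t):
    """BWT read off the last column of the sorted rotations of t."""
    return "".join(r[-1] for r in sorted(t[i:] + t[:i] for i in range(len(t))))


def b_then_a(s):
    """Return (x, y) if s has the shape b^x a^y, else None."""
    x = len(s) - len(s.lstrip("b"))
    y = len(s) - x
    return (x, y) if s == "b" * x + "a" * y else None


t = "aababaababab"                       # Figure 9 (1), a case with p_1 = 1
x, y = b_then_a(bwt(t))
print(is_balanced(t), x, y, gcd(x, y))   # True 5 7 1
print(is_balanced("aabb"))               # False
```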

Relation to the Fibonacci word sequence
The Fibonacci sequence is a sequence of binary strings $\{F_m\}_{m \ge 1}$ with $F_1 := b$, $F_2 := a$, and $F_m := F_{m-1} F_{m-2}$; then $F_m$ and $f_m := |F_m|$ are called the $m$-th Fibonacci word and the $m$-th Fibonacci number, respectively. Köppl and I [34, Thm. 1] observed that the suffix array of $F_m$ for even $m$ is the arithmetically progressed permutation $\mathrm{SA}_{F_m}$ with ratio $f_{m-2} \bmod f_m$ and $\mathrm{SA}_{F_m}[1] = f_m$. Theorem 3.9 generalizes this observation by characterizing all binary strings whose suffix arrays are arithmetically progressed. Hence, $F_m$ (for even $m$) must coincide with Case 1 of Theorem 3.9, since it ends with the character a.
Lemma 4.6.The Fibonacci word F m for even m is given by Proof.We use the following facts: • The greatest common divisor of f i and f j is the Fibonacci number whose index is the greatest common divisor of i and j [47, Fibonacci numbers].Hence, f m−1 and f m are coprime for every m ≥ 2.
• $f_{m-2}^2 \bmod f_m = 1$ holds for every even $m \ge 3$ [30]. Hence,
• By definition, $F_m[f_m] = a$ if $m$ is even, and therefore $\mathrm{SA}_{F_m}[1]$
If SA is arithmetically progressed with ratio $k$, then its split position must be $p_s = n - k - 1$ according to Theorem 3.9. We show that $k = f_{m-2}$ by proving in a way similar to [34, Lemma 8]. For that, let $\overline{S}$ of a binary string $S \in \{a, b\}^*$ denote $S$ after exchanging a's and b's (i.e., $\overline{S}[i] = a \Leftrightarrow S[i] = b$). Further, let $\lhd$ be the relation on strings such that $S \lhd T$ if and only if $S \prec T$ and $S$ is not a prefix of $T$. We need this relation since $S \lhd T \Leftrightarrow \overline{T} \lhd \overline{S}$, while both $S \prec T$ and $\overline{S} \prec \overline{T}$ hold if $S$ is a prefix of $T$. [34, Lemma 7], thus. By using one of the two above items, we can show that these arithmetic progression steps yield a list of suffixes sorted in lexicographically ascending order.
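The statement of Köppl and I for even $m$, and Lemma 4.7 for odd $m$ (with a's and b's exchanged), can be confirmed for small $m$ with the following sketch. It assumes the definition $F_1 = b$, $F_2 = a$, $F_m = F_{m-1}F_{m-2}$ and uses a naive suffix sort; `fibonacci_word` and `is_progressed` are hypothetical helper names.

```python
def fibonacci_word(m):
    """F_1 = b, F_2 = a and F_m = F_{m-1} F_{m-2} for m >= 3."""
    if m == 1:
        return "b"
    prev, cur = "b", "a"
    for _ in range(m - 2):
        prev, cur = cur, cur + prev
    return cur


def suffix_array(t):
    """1-based suffix array by naive sorting (small inputs only)."""
    return [i + 1 for i in sorted(range(len(t)), key=lambda i: t[i:])]


def is_progressed(sa, k):
    """True iff all successive entries of sa differ by k modulo n."""
    n = len(sa)
    return all((sa[j + 1] - sa[j]) % n == k % n for j in range(n - 1))


swap = str.maketrans("ab", "ba")
for m in range(4, 9):
    w = fibonacci_word(m)
    if m % 2 == 1:
        w = w.translate(swap)           # odd m: exchange a's and b's (Lemma 4.7)
    n, k = len(w), len(fibonacci_word(m - 2))   # f_m and f_{m-2}
    sa = suffix_array(w)
    print(m, sa[0] == n, is_progressed(sa, k))  # expected: m True True
```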

Extension to Larger Alphabets
In this section, we extend our results of Section 3 to alphabets of arbitrary size. Let $P = [p_1, \ldots, p_n]$ be an arithmetically progressed permutation with ratio $k$. Let $\Sigma = \{1, 2, \ldots, \sigma\}$ be an alphabet of size $\sigma$ where the order is given by $1 < 2 < \cdots < \sigma$. To construct a string $T$ over the alphabet $\Sigma$ having $P$ as its suffix array, we proceed similarly to the construction presented in the previous sections: First, we split $P$ into subarrays $S_1, \ldots, S_\sigma$; then, for each subarray $S_i$, we assign the character $i$ to each position $p \in S_i$. When splitting $P$ into subarrays, there are some fixed positions where we are required to split $P$, while the remaining splitting positions can be chosen freely. Let $\sigma_{\min}$ be the size of the minimal alphabet over which there is a string having $P$ as its suffix array. Then there are $\sigma_{\min} - 1$ positions where we are required to split $P$. Those required splitting positions are after the following entries, modulo $n$:

At the beginning of this paper, we looked at the strings over an alphabet of size $\sigma$ having the suffix array $[n, n-1, \ldots, 1]$. The number of those strings is given by Theorem 2.2 and is bounded by a polynomial in $n$ and $\sigma$. We extend upon that result: For an arbitrary permutation, which is not necessarily arithmetically progressed, we give a bound on the number of strings with that permutation as suffix array. For a fixed arithmetically progressed permutation $P$, we give an exact formula for the number of strings having $P$ as their suffix array. We conclude that, in total, the number of strings having an arithmetically progressed suffix array is bounded by a polynomial in $n$ and $\sigma$. As the permutation $P$ is a bijection on $\Sigma^n$, there is only one string $T$ that, permuted by $P$, gives the string $T'$. Thus, there are at most $\binom{n+\sigma-1}{n}$ strings having $P$ as suffix array. There may be fewer, as it is possible that a string $V$, which is a pre-image of some $T' = 1^{t_1} \cdots \sigma^{t_\sigma}$ under the permutation $P$, has a suffix array different from $P$.

Lemma 5.3. Let $\Sigma$ be an alphabet of size $\sigma = |\Sigma|$. Given a permutation $P$ of length $n$, let $\sigma_{\min}$ be the size of the smallest alphabet over which there exists a string having $P$ as its suffix array. Then there are at most $\binom{n+\sigma-1}{\sigma-\sigma_{\min}}$ strings of length $n$ over the alphabet $\Sigma$ having $P$ as their suffix array.
Proof.As described at the beginning of this section, all strings over the alphabet Σ having P as suffix array can be constructed by splitting P into σ−1 subarrays.There are σ min − 1 positions where we are required to split P and σ − σ min splitting positions that can be chosen freely.We model the selection of the positions that can be chosen freely using the stars and bars problem [23, Chp.II, Sect.5].The stars represent the entries of the permutation, the bars the splitting positions.Assume we start with n + σ − 1 stars.Then σ − σ min bars (splitting positions) can be freely chosen.There are n+σ−1 σ−σmin ways to do this.Then we replace the stars at the σ min −1 required splitting positions with bars.This gives us n stars and σ −1 bars, which we map to a partition of P into σ subarrays.Lemma 5.4.Let Σ be an alphabet of size σ = |Σ|.Then the number of strings of length n with an arithmetically progressed suffix array is at most p(n, σ) = n(n − 1) n+σ−1 σ−σmin .Proof.Each arithmetically progressed permutation P of length n can be described by two parameters: Its first entry P [1] a its ratio k.For P [1] there are n different options, for k there are at most n − 1 different options.Thus the number of arithmetically progressed permutations of length n is at most n(n − 1).By Lemma 5.3 the number of strings of length n having a specific permutation as suffix array is bounded by n+σ−1 σ−σmin .Putting those two facts together we obtain that the number of strings of length n with an arithmetically progressed suffix array is bounded by p Given a fixed alphabet Σ, by the above lemma, the number of strings of length n with arithmetically progressed suffix array is bounded by a polynomial in n and σ.

Applications on Meta Strings
In this final section, we overview some connections of suffix arrays with generalized forms of words and meta strings. A generalization of strings was proposed in the groundbreaking 1974 paper of Fischer and Paterson [25], where string matching was considered in a more general setting than with the usual solid letters: either string could contain don't care symbols, and matching was achieved in nearly linear time. Uncertain sequences, including indeterminate strings, have applications in inexact matching tasks, for instance, allowing for errors such as with Web interface data entry and Internet search. They are also useful for expressing DNA polymorphisms, that is, biological sequence positions that can have multiple possibilities and are encoded with IUPAC meta characters; for example, N denotes any of the DNA nucleotides. A codon is a form of meta character whereby a sequence of three nucleotides encodes a specific amino acid; codons are used for protein expression, so a genetic code can be composed of concatenated codon units. The truncated generalized suffix automaton was introduced for indexing length-bounded k-factors of degenerate DNA and RNA sequences [26]. An elastic-degenerate string is a sequence of sets of strings used for succinctly representing a multiple alignment of a collection of closely related sequences (e.g., a pan-genome, that is, all genes and genetic variation within a species), and it also supports approximate pattern matching [4]. Sequence alignment is useful for inferring evolutionary relationships between biological sequences.
Daykin and Watson proposed a simple degenerate BWT, the D-BWT [21], constructed by applying lex-extension order (cf. Example 6.1) to relabel the sets and order the degenerate strings in the D-BWT matrix. Subsequently, in [18], the D-BWT was applied to pattern matching using the backward search technique. This formalized and extended the work implemented in [32], which presented a bioinformatics software tool, BWBBLE, for pattern matching on a pan-genome, which its authors called a reference multi-genome.
In what follows, we first formally define indeterminate strings, and subsequently illustrate various approaches for defining an indeterminate suffix array. An indeterminate string $T_I = T_I[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$. Specifically, an indeterminate string $T_I$ has the form $T_I = t_1 \cdots t_n$, where each $t_i$ is a set of characters over $\Sigma$; a singleton is known as a solid letter. For example, $T = \{c, a, b, e\}\{d\}\{c, a, b, e\}\{d\}\{d\}\{c, a, b, e, f, g\}$ is a ternary border-free indeterminate string of length 6, with a singleton $\{d\}$.

Example 6.1. Let $T_I = \{b\}\{b\}\{c,a\}\{b,d,a\}$ be an indeterminate string; then $\mathrm{SA}_{T_I}$ can be alternatively defined by the following approaches:
• We can apply the lex-extension order by treating each character set as a string whose characters are sorted, and use the lexicographic order of these strings to sort the sets, resulting in ranked meta characters: $\{a,b,d\} = A$; $\{a,c\} = B$; $\{b\} = C$. Then $T_I = CCBA$, and $\mathrm{SA}_{T_I} = [4, 3, 2, 1]$, an arithmetically progressed permutation. This method enables linear-time construction of an indeterminate suffix array.
• If we do not want the input to be rearranged, then we can form the meta characters as follows: $\{b\} = A$; $\{b,d,a\} = B$; $\{c,a\} = C$. Here, $T_I = AACB$ with suffix array $\mathrm{SA}_{T_I} = [1, 2, 4, 3]$, which is not arithmetically progressed.
• Finally, we can incorporate the suffix arrays of the individual sets treated as strings, so the above suffix array $[1, 2, 4, 3]$, with both indeterminate and solid letter positions, becomes
Clearly, there is a natural generalization of many types of well-known patterned strings over solid letters to the indeterminate form, such as indeterminate Fibonacci words, where the solid form can give a meta representation of the generalized form.
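Both relabellings of Example 6.1 can be reproduced with a few lines of Python. The function `meta_string` is a hypothetical helper: with `rearrange=True` it sorts the characters inside each set (the lex-extension order of the first approach); with `rearrange=False` it keeps each set in the order in which it is written, which is our reading of the second approach.

```python
def suffix_array(t):
    """1-based suffix array by naive sorting (small inputs only)."""
    return [i + 1 for i in sorted(range(len(t)), key=lambda i: t[i:])]


def meta_string(sets, rearrange=True):
    """Relabel the character sets of an indeterminate string by A < B < C < ...

    Each set is turned into a key string (sorted if rearrange=True, otherwise
    in the written order), and the distinct keys are ranked lexicographically.
    """
    keys = ["".join(sorted(s)) if rearrange else "".join(s) for s in sets]
    label = {key: chr(ord("A") + r) for r, key in enumerate(sorted(set(keys)))}
    return "".join(label[key] for key in keys)


T_I = [["b"], ["b"], ["c", "a"], ["b", "d", "a"]]   # {b}{b}{c,a}{b,d,a}
for rearrange in (True, False):
    m = meta_string(T_I, rearrange)
    print(m, suffix_array(m))
# CCBA [4, 3, 2, 1]
# AACB [1, 2, 4, 3]
```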

Conclusion and Problems
Given an arithmetically progressed permutation $P$ with ratio $k$, we studied the minimum alphabet size and the shape of those strings having $P$ as their suffix array. Only for $P = [n, n-1, \ldots, 1]$ does a unary alphabet suffice. For general $P = [p_1, \ldots, p_n] \ne [n, n-1, \ldots, 1]$, there is exactly one such string on the binary alphabet if and only if $p_1 \in \{1, k+1, n\}$. In all other cases, there is exactly one such string on the ternary alphabet. We conclude by proposing some research directions.
• A natural question arising from this research is to characterize strings having arithmetic progression properties for the run length exponents of their BWTs, particularly for the bijective [27] or extended BWT [37], which are always invertible.
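For example, given the arithmetically progressed permutation 3214, the run-length compressed string $a^3 c^2 \$ b^4$ (a) matches the permutation 3214 and (b) is a BWT image, because its inverse is $b^2 c b^2 c a^3 \$$, which can be computed by the Last-First mapping. However, for the same permutation, $a^3 b^2 \$ b^4$ does not work since it is not a BWT image. Further examples of arithmetically progressed BWT exponents are $a^3 c^4 \$ b^2$, $a^4 c^3 e^2 \$ d^6 b^5$, and $a^4 c^3 e^2 \$ f^7 d^6 b^5$.
• Arithmetic properties can likewise be considered for the following stringology integer arrays:
- Firstly, the longest common prefix (LCP) array LCP, whose entry $\mathrm{LCP}[i]$ is the length of the longest common prefix of the lexicographically $i$-th smallest suffix with its lexicographic predecessor for $i \in [2..n]$.
- Given a string $T \in \Sigma^+$ of length $n$, the prefix table $P_T$ of $T$ is given by $P_T[i] = \mathrm{LCP}(T, T[i..n])$ for $i \in [1..n]$; equivalently, the border table $B_T$ of $T$ is defined by letting $B_T[i]$ be the length of the longest proper border of $T[1..i]$ for $i \in [1..n]$.
- Integer prefix lists are more concise than prefix tables and give the lengths of overlapping LCPs of $T$ and suffixes of $T$ (cf. [14]).
- The $i$-th entry of the Lyndon array $\lambda = \lambda_T[1..n]$ of a given string $T = T[1..n]$ is the length of the longest Lyndon word that is a prefix of $T[i..]$; reverse engineering in [17] includes a linear-time test for whether an integer array is a Lyndon array.
• A challenging research direction is to consider arithmetic progressions for multi-dimensional suffix arrays and Fibonacci word sequences.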
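The 3214 example can be verified with the Last-First mapping. The sketch below is a textbook LF-based BWT inversion under the assumption that the end marker $ occurs exactly once and is the smallest letter; it reports a string as not being a BWT image when the LF permutation does not consist of a single cycle. All names are hypothetical.

```python
from itertools import groupby


def inverse_bwt(L, end="$"):
    """Invert L via the Last-First (LF) mapping, or return None if L is not a
    BWT image. Assumes `end` occurs once and sorts before every other letter."""
    n = len(L)
    if L.count(end) != 1 or min(L) != end:
        return None
    # C[c] = number of letters in L that are smaller than c
    C, total = {}, 0
    for c in sorted(set(L)):
        C[c] = total
        total += L.count(c)
    # LF[i] = row of the F column holding the same text letter as L[i]
    seen, LF = {}, []
    for c in L:
        LF.append(C[c] + seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    # Row 0 is the rotation starting with `end`; walk the text backwards.
    out, row = [], 0
    for step in range(n):
        out.append(L[row])
        row = LF[row]
        if row == 0 and step < n - 1:
            return None                 # LF has several cycles: not a BWT image
    s = "".join(reversed(out))          # the rotation that starts with `end`
    return s[1:] + s[0]


def run_exponents(L):
    """Run lengths of L, e.g. aaacc$bbbb -> [3, 2, 1, 4]."""
    return [len(list(group)) for _, group in groupby(L)]


def is_arithmetically_progressed(seq):
    """seq is a permutation of [1..n] with a constant difference modulo n."""
    n = len(seq)
    return sorted(seq) == list(range(1, n + 1)) and len(
        {(seq[j + 1] - seq[j]) % n for j in range(n - 1)}) == 1


for L in ("aaacc$bbbb", "aaabb$bbbb"):
    e = run_exponents(L)
    print(L, e, is_arithmetically_progressed(e), inverse_bwt(L))
# aaacc$bbbb [3, 2, 1, 4] True bbcbbcaaa$
# aaabb$bbbb [3, 2, 1, 4] True None
```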

In other words, we can reduce this problem to the classic stars and bars problem [23, Chp. II, Sect. 5] with $n$ stars and $\sigma - 1$ bars, yielding $\binom{n+\sigma-1}{n}$ possible strings. Hence we obtain:

Theorem 2.2. There are $\binom{n+\sigma-1}{n}$ strings of length $n$ over an alphabet of size $\sigma$ having the suffix array $[n, n-1, \ldots, 1]$.
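Theorem 2.2 can be confirmed by brute force for small $n$ and $\sigma$. The sketch below (hypothetical helper names) relies on the observation that a string has suffix array $[n, n-1, \ldots, 1]$ exactly when it is non-increasing, so these strings correspond to multisets of $n$ letters, i.e., to stars-and-bars configurations.

```python
from itertools import combinations_with_replacement, product
from math import comb


def suffix_array(t):
    """1-based suffix array by naive sorting (works for tuples)."""
    return [i + 1 for i in sorted(range(len(t)), key=lambda i: t[i:])]


def strings_with_decreasing_sa(n, sigma):
    """Brute force: all strings over {0..sigma-1} with suffix array [n, ..., 1]."""
    target = list(range(n, 0, -1))
    return {t for t in product(range(sigma), repeat=n) if suffix_array(t) == target}


def non_increasing_strings(n, sigma):
    """One string per multiset of n letters, written in non-increasing order."""
    return {tuple(sorted(c, reverse=True))
            for c in combinations_with_replacement(range(sigma), n)}


for n, sigma in [(3, 2), (4, 3), (5, 3)]:
    found = strings_with_decreasing_sa(n, sigma)
    assert found == non_increasing_strings(n, sigma)
    print(n, sigma, len(found), comb(n + sigma - 1, n))
# 3 2 4 4
# 4 3 15 15
# 5 3 21 21
```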

Figure 2: $T$ of Eq. (1) for each arithmetically progressed permutation $P$ of length $n = 8$ with ratio $k = 5$, starting with $p_1 := P[1] = k = 5$. The permutation of the $k$-th row is the $k$-th cyclic rotation of the permutation $P$ in the first row. The splitting of $P$ into the subarrays is visualized by the vertical bar (|) symbol. For (5) and (8), the alphabet is binary and the BWTs are the same. The strings of (3) and (8) are periodic with period $n - k$, since the last text position of each subarray is at most as large as $n - k = 3$ (cf. the proof of Theorem 3.9). For $i \in [1..n]$, $\mathrm{BWT}_T[i] = T[P[i + n - k^{-1} \bmod n]] = T[P[i + 3 \bmod n]]$ with $k^{-1} = k = 5$ defined in Section 3.4.

Figure 3: Setting of the proof of Theorem 3.2 with the condition $p_i + 1 = p_1 - k = p_n$. In the figure, we assume that the entry $p_1 - k - 1$ appears before $n - k$ in $P$.
Figure 4: BWT matrix of babbabac.

Figure 5: Setting of Eq. (1) with the distinction whether the entry $p_1 - k - 1$ appears before (left) or after (right) $n - k$ in $P$, yielding a different shape of $\mathrm{BWT}_T$, defined as $\mathrm{BWT}_T[i] = T[\mathrm{SA}_T[i] - 1 \bmod n]$.

Theorem 3.8. Given an arithmetically progressed permutation $P := [p_1, \ldots, p_n] \ne [n, n-1, \ldots, 1]$ with ratio $k$ such that $p_1 \in \{1, k+1, n\}$, we can modify $T$ of Eq. (1) to be a string over the binary alphabet $\{a, b\}$ with $\mathrm{SA}_T = P$.

Proof. If $p_1 = 1$, then $P$ is split after the occurrences of the values $n - k$ and $p_1 - k - 1 = -k \equiv n - k \pmod n$; since these coincide, we obtain only two non-empty subarrays. If $p_1 = n$, $P$ is split after the occurrence of $n - k - 1$, which implies that $C$ is empty since $p_n = n - k$. Hence, $T$ can be constructed with a binary alphabet in those cases, cf. Fig. 2. For the case $p_1 = k + 1$, $P$ is split after the occurrences of the values $n - k$ and $k + 1 - k - 1 \bmod n = n$, so $B$ contains only the text position $n$. By construction, the requirement is that the suffix $T[n]$ is smaller than all other suffixes starting with c. So instead of assigning the unique character $T[n] \leftarrow b$ as in Theorem 3.2, we can assign $T[n] \leftarrow c$, which still makes $T[n]$ the smallest suffix starting with c. We conclude this case by converting the binary alphabet $\{a, c\}$ to $\{a, b\}$. Cf. Fig. 2, where $T$ in Rotation (6) has become bbabbabb with period $n - k = 3$.

The main result of this section is the following theorem. There, we characterize all binary strings whose suffix arrays are arithmetically progressed permutations.
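Theorem 3.8 and the binary/ternary distinction can be explored with a generic greedy labelling, which is not the construction of Eq. (1): letters are assigned to the positions in suffix-array order, and a new letter is opened only when the suffixes following two consecutive entries are ranked in the opposite order. Since every new letter is forced, this yields a string with suffix array $P$ over the smallest possible alphabet. The sketch below (hypothetical names) applies it to all arithmetically progressed permutations with $n = 8$ and $k = 5$.

```python
def suffix_array(t):
    """1-based suffix array by naive sorting (small inputs only)."""
    return [i + 1 for i in sorted(range(len(t)), key=lambda i: t[i:])]


def ap_permutation(p1, k, n):
    """[p1, p1 + k, p1 + 2k, ...] with values taken modulo n in [1..n]."""
    return [(p1 - 1 + i * k) % n + 1 for i in range(n)]


def min_alphabet_string(P):
    """Greedy string over a, b, c, ... whose suffix array is P (1-based)."""
    n = len(P)
    rank = [0] * (n + 2)             # rank[n + 1] = 0 stands for the empty suffix
    for r, p in enumerate(P, start=1):
        rank[p] = r
    letter = [0] * (n + 1)           # letter[P[0]] stays 0, i.e. 'a'
    c = 0
    for i in range(1, n):
        # Equal letters at P[i-1] and P[i] are only allowed if the suffixes that
        # follow them are already in the right order; otherwise open a new letter.
        if rank[P[i - 1] + 1] > rank[P[i] + 1]:
            c += 1
        letter[P[i]] = c
    return "".join(chr(ord("a") + letter[p]) for p in range(1, n + 1))


n, k = 8, 5
sizes = {}
for p1 in range(1, n + 1):
    P = ap_permutation(p1, k, n)
    t = min_alphabet_string(P)
    assert suffix_array(t) == P
    sizes[p1] = len(set(t))
print(sizes)                         # {1: 2, 2: 3, 3: 3, 4: 3, 5: 3, 6: 2, 7: 3, 8: 2}
print(min_alphabet_string(ap_permutation(6, 5, 8)))   # bbabbabb
```

Binary strings appear exactly for $p_1 \in \{1, k+1, n\} = \{1, 6, 8\}$, in line with the theorem.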

Case 1:
is the starting position of the largest suffix ($S[p_n] = b \ne a = S[p_1] = S[p_n + k]$).
• $S[p_s..n]$ and $S[p_{s+1}..n]$ are the lexicographically largest suffix starting with a and the lexicographically smallest suffix starting with b, respectively, such that $S[p_s] = a \ne b = S[p_{s+1}]$. To sum up, since $S[p_s..n] \prec S[p_{s+1}..n]$ by construction, $S[p_i..n] \prec S[p_{i+1}..n]$ holds for $p_i > p_{i+1}$ whenever $p_i = p_n$. This, together with the coprimality of $n$ and $k$, determines $p_s$ uniquely in the three cases (cf. Fig. 6 for the case that $n = 8$ and $k = 5$, and Fig. 7 for sketches of the proof): We first observe that the case $k = n - 1$ gives us $P = [n, n-1, \ldots, 1]$, and this case was already treated with Theorem 2.1. In the following, we assume $k < n - 1$; under this assumption, we have $s > 1$, $T_s[n] = T_s[p_1] = a$ and $T_s[n-1] = b$ (otherwise $T_s[n-1..n]$ would be the second smallest suffix, i.e., $P[2] = n - 1$ and hence $k = n - 1$). Consequently, $T_s[n-1..n] = T_s[p_{s+1}..n]$ is the smallest suffix starting with b, namely ba, and therefore $p_s = n - 1 - k$.

breaks the periodicity. For Case 1, $p_n = n - k$ and $p_s = n - k - 1$ (hence $p_n - (n - k) = 0$ and $p_s - (n - k) = -1$); thus, Case 1 is periodic. Case 2 is analogous to Case 1. For Case 3, $T_s$ does not have period $n - k$ as $p_n = n - k + 1$, and hence $p_n - (n - k) > 0$. It cannot have any other period since Case 3 yields a Lyndon word (because the lexicographically smallest suffix $T_s[p_1..n] = T_s[1..n]$ starts at the first text position). Note that Case 3 can be obtained from Case 2 by setting $T_s[1] \leftarrow a$ (the smallest suffix $T_s[k + 1.$ a because the smallest suffix starts at text position $p_1$, and
• $S[p_1 - 1] = b$ because of the following: First, the text position $p_1 - 1$ exists due to $p_1 > 1$. Second, since $p_1 < n$, there is a text position $j \in [p_1 + 1..n]$ such that $S[p_1] = \cdots = S[j - 1] = a$ and $S[j] = b$ (otherwise $S[n]$ would be the smallest suffix). If $S[p_1 - 1] = a$, then the suffix $S[p_1 - 1..n]$ starting with $a^{j - p_1 + 1} b$ is lexicographically smaller than the suffix $S[p_1..n]$ starting with $a^{j - p_1} b$. Hence, $S[p_1 - 1] = b$ must hold.
• $p_n - 1 \ge 1$ (since $p_1 = k + 1$) and $S[p_n - 1] = a$. If $S[p_n - 1] = b$, then the suffix $S[p_n - 1..n]$ has a longer prefix of b's than the suffix $S[p_n..n]$, and is therefore lexicographically larger. Since $S[p_n - 1] = a$ and $S[p_1 - 1] = b$ with $p_n - 1 + k \bmod n = p_1 - 1$, the smallest suffix starting with b is located at index $p_1 - 1$. This is a contradiction as

Figure 9: Two Christoffel words with $S = ((0, 0), (7, 5))$, slope $5/7$, and length 12. The suffix array of the lower Christoffel word (1) is an arithmetically progressed permutation with ratio $k = 5$ and $p_s = 7 = n - k$, and hence is an instance of Theorem 3.9, Case 3. Then, assuming b < a, the suffix array of the upper Christoffel word (2) has an arithmetic progression ratio of $n - k = 7$ with split index $n - s$ and the largest suffix starting with b at position $n - p_s$.

Lemma 4.3. Let $T_c$ be a Christoffel word of length $n \ge 3$ over $\Sigma = \{a, b\}$. Suppose $T_c$ has an arithmetically progressed suffix array $\mathrm{SA}_{T_c}$ with progression ratio $k \ge 1$ and split index $s$. Then $T_c$ is a balanced$_2$ Lyndon word with a factorization $T_c = T_u T_v$ such that $|T_u| = \mathrm{SA}_{T_c}[(s + 2) \bmod n]$. For $n = 1, 2$, we have that $T_c \in \{a, b, ab\}$, $k = 1$, and the split index $s$ is 1.
a and $T_u$ would have a border.
• $\mathrm{SA}_{T_c}[(s + 1) \bmod n] = n$ with $T_c[n] = b$, since $T_c$ is a Lyndon word of length larger than one, and the smallest suffix starting with b is $T_c[n..n]$.

Figure 10: An upper Christoffel word $T_U$ = bababaababaa and its reversal, the lower Christoffel word $T_L$ = aababaababab, shown geometrically with respect to the line segment $S = ((0, 0), (7, 5))$. The dotted lines indicate the balanced$_2$ factorization of $T_L$ into the factors aabab and aababab.

$F_2 := a$, and $F_m := F_{m-1} F_{m-2}$. Then $F_m$ and $f_m := |F_m|$ are called the $m$-th Fibonacci word and the $m$-th Fibonacci number, respectively.
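The rest follows from Eq. (3). Let $\overline{F}_m$ denote the $m$-th Fibonacci word whose a's and b's are exchanged, i.e., $\overline{F}_m[i] = a \Leftrightarrow F_m[i] = b$.

Lemma 4.7. $\mathrm{SA}_{\overline{F}_m}$ is arithmetically progressed with ratio $f_{m-2}$ for odd $m$.

Proof. Since $\overline{F}_m[|\overline{F}_m|] = a$ for odd $m$, $\overline{F}_m[f_m..]$ is the lexicographically smallest suffix. Hence, $\mathrm{SA}[1] = |\overline{F}_m|$ for $\mathrm{SA} := \mathrm{SA}_{\overline{F}_m}$.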

Since $f_m$ and $f_{m-2}$ are coprime, $\{i \cdot f_{m-2} \bmod f_m \mid i > 0\} = [1..f_m]$. Starting with the smallest suffix $\overline{F}_m[f_m..]$, we end up at the largest suffix $\overline{F}_m[f_m + (f_m - 1) f_{m-2} \bmod f_m..]$ after $f_m - 1$ arithmetic progression steps of the form $\overline{F}_m$

Lemma 5.2. Let $\Sigma$ be an alphabet of size $\sigma = |\Sigma|$. Given a permutation $P$ of length $n$, there are at most $\binom{n+\sigma-1}{n}$ strings of length $n$ over the alphabet $\Sigma$ having $P$ as their suffix array.

Proof. Let the alphabet $\Sigma$ be given by $\Sigma = \{1, \ldots, \sigma\}$ with the order $1 < 2 < \cdots < \sigma$. Let $T$ be a string over $\Sigma$ of length $n$ whose suffix array is $P$. Permuting $T$ by its suffix array $P$, we obtain the string $T'$ with $T'[i] = T[P[i]]$. As $P$ is the suffix array of $T$, we have $T[P[i]..n] \prec T[P[j]..n]$ and thus $T'[i] \le T'[j]$ for all $i < j$. Therefore, $T'$ is given by $T' = 1^{t_1} \cdots \sigma^{t_\sigma}$, where $t_i$ describes the number of occurrences of $i$ in $T$. The problem of finding the number of strings with this particular shape can be reduced to the classic stars and bars problem [23, Chp. II, Sect. 5] with $n$ stars and $\sigma - 1$ bars, yielding $\binom{n+\sigma-1}{n}$ possible strings.

The run-length compressed string $a^3 c^2 \$ b^4$ (a) matches the permutation 3214 and (b) is a BWT image because its inverse is $b^2 c b^2 c a^3 \$$, which can be computed by the Last-First mapping. However, for the same permutation, $a^3 b^2 \$ b^4$ does not work since it is not a BWT image. Further examples of arithmetically progressed BWT exponents are: $a^3 c^4 \$ b^2$, $a^4 c^3 e^2 \$ d^6 b^5$, and $a^4 c^3 e^2 \$ f^7 d^6 b^5$.
includes a linear-time test for whether an integer array is a Lyndon array. Likewise, the Lyndon factorization array $F = F_T[1..n]$ of $T$ stores in its $i$-th entry the size of the Lyndon factorization (i.e., the number of factors) of the suffix $T[i..n]$. The problems are to characterize those arithmetic progressions that define a valid Lyndon array, respectively Lyndon factorization array. For example, consider the string $T = azyx$; then its Lyndon array is $\lambda_T = [4, 1, 1, 1]$, while its Lyndon factorization array is $F_T = [1, 3, 2, 1]$. Trivially, for $T = abc \cdots z$ the Lyndon array is an arithmetic progression, and likewise for the Lyndon factorization array of $T = z^t y^t x^t \cdots a^t$ for $z > y > x > \cdots > a$.
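The two arrays of the example above can be computed with the following sketch (hypothetical names): the Lyndon array by a brute-force search over prefixes, and the Lyndon factorization array by running Duval's algorithm on every suffix.

```python
def is_lyndon(w):
    """A nonempty word is Lyndon iff it is strictly smaller than all its proper rotations."""
    return len(w) > 0 and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def lyndon_array(t):
    """lambda_T[i]: length of the longest Lyndon prefix of T[i..] (brute force)."""
    n = len(t)
    return [max(length for length in range(1, n - i + 1) if is_lyndon(t[i:i + length]))
            for i in range(n)]


def duval(s):
    """Lyndon factorization of s (Duval's algorithm)."""
    n, i, factors = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def lyndon_factorization_array(t):
    """F_T[i]: number of factors in the Lyndon factorization of T[i..n]."""
    return [len(duval(t[i:])) for i in range(len(t))]


print(lyndon_array("azyx"), lyndon_factorization_array("azyx"))    # [4, 1, 1, 1] [1, 3, 2, 1]
print(lyndon_array("abcd"), lyndon_factorization_array("ccbbaa"))  # [4, 3, 2, 1] [6, 5, 4, 3, 2, 1]
```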
$\ldots, p_n]$ of ratio $k$. The choice of $p_1$ determines the minimum size of the alphabet and whether a string is unique, periodic, or a Lyndon word. The column Min. $\sigma$ denotes the smallest possible size $\sigma$ for which there exists such a string whose characters are drawn from an alphabet $\Sigma$ with $|\Sigma| = \sigma$.