Nondeterministic automatic complexity of overlap-free and almost square-free words

Shallit and Wang studied deterministic automatic complexity of words. They showed that the automatic Hausdorff dimension $I(\mathbf t)$ of the infinite Thue word satisfies $1/3\le I(\mathbf t)\le 2/3$. We improve that result by showing that $I(\mathbf t)\ge 1/2$. For nondeterministic automatic complexity we show $I(\mathbf t)=1/2$. We prove that such complexity $A_N$ of a word $x$ of length $n$ satisfies $A_N(x)\le b(n):=\lfloor n/2\rfloor + 1$. This enables us to define the complexity deficiency $D(x)=b(n)-A_N(x)$. If $x$ is square-free then $D(x)=0$. If $x$ is almost square-free in the sense of Fraenkel and Simpson, or if $x$ is a strongly cube-free binary word such as the infinite Thue word, then $D(x)\le 1$. On the other hand, there is no constant upper bound on $D$ for strongly cube-free words in a ternary alphabet, nor for cube-free words in a binary alphabet. The decision problem whether $D(x)\ge d$ for given $x$, $d$ belongs to $NP\cap E$.


Introduction
The Kolmogorov complexity of a finite word w is, roughly speaking, the length of the shortest description w * of w in a fixed formal language. The description w * can be thought of as an optimally compressed version of w. Motivated by the non-computability of Kolmogorov complexity, Shallit and Wang [9] studied a deterministic finite automaton analogue. A more recent approach is due to Calude, Salomaa, and Roblot [1].
Definition 1 (Shallit and Wang [9]). The automatic complexity of a finite binary string $x = x_1 \cdots x_n$ is the least number $A_D(x)$ of states of a deterministic finite automaton $M$ such that $x$ is the only string of length $n$ in the language accepted by $M$. This complexity notion has the following two properties:
1. Most of the relevant automata end up having a "dead state" whose sole purpose is to absorb any irrelevant or unacceptable transitions.

2. The complexity of a string can be changed by reversing it. For instance,
$$A_D(011100) = 4 < 5 = A_D(001110). \qquad (1)$$
Equation (1) was verified by a computer program; for the idea and a partial proof see Figure 1. The anonymous referee of this article raised the question, which we have not been able to answer, whether the complexity of a string and its reverse can be arbitrarily far apart.
If we replace deterministic finite automata by nondeterministic ones, these properties disappear. The nondeterministic automatic complexity turns out to have other pleasant properties, such as a sharp linear upper bound.
Technical ideas and results. In this paper we develop some of the properties of nondeterministic automatic complexity. As a corollary we get a strengthening of a result of Shallit and Wang [9] on the complexity of the infinite Thue-Morse word $\mathbf t$. Moreover, viewed through an NFA lens we can, in a sense, characterize the complexity of $\mathbf t$ exactly. A main technical idea is to extend [9, Theorem 9], which said not only that squares, cubes and higher powers of a word have low complexity, but also that, conversely, a word completely free of such powers must have high complexity. We strengthen their results by considering a variation on square-freeness and cube-freeness, namely overlap-freeness; this notion also goes by the names of irreducibility and strong cube-freeness in the combinatorial literature.

We also take up an idea from [9, Theorem 8] and use it to show that the natural decision problem associated with nondeterministic automatic complexity is in $E = \mathrm{DTIME}(2^{O(n)})$. This result is a theoretical complement to the practical fact that nondeterministic automatic complexity can be computed reasonably quickly. To see it in action, for strings of length up to 23 one can view automaton witnesses and check complexity using the following URL format
http://math.hawaii.edu/wordpress/bjoern/complexity-of-110101101/
and check one's comprehension by playing a Complexity Guessing Game at
http://math.hawaii.edu/wordpress/bjoern/software/web/complexity-guessing-game/

Let us now define our central notion and get started on developing its properties. Recall that a nondeterministic finite automaton (NFA) is assumed to have no $\varepsilon$-transitions, i.e., it is not an NFA-$\varepsilon$.
Definition 2. The nondeterministic automatic complexity A N (w) of a word w is the minimum number of states of an NFA M accepting w such that there is only one accepting path in M of length |w|.
The minimum complexity $A_N(w) = 1$ is only achieved by words of the form $a^n$ where $a$ is a single letter.
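For very short words, $A_N$ can be computed directly from Definition 2 by exhaustive search over NFAs. The following Python sketch is our own illustration (it is unrelated to the software at the URLs above, and all function names are ours): a machine uniquely accepts $x$ iff it has exactly one accepting path of length $|x|$ in total, and that path spells $x$; both conditions are checked by dynamic programming over states.

```python
from itertools import product

def count_paths(delta, q, accept, word):
    """Number of paths spelling `word` from state 0 to `accept`."""
    counts = [0] * q
    counts[0] = 1  # state 0 is the start state
    for c in word:
        new = [0] * q
        for s in range(q):
            if counts[s]:
                for t in delta[(s, c)]:
                    new[t] += counts[s]
        counts = new
    return counts[accept]

def count_all_paths(delta, q, accept, n, alphabet):
    """Number of accepting paths of length n over ALL input words."""
    counts = [0] * q
    counts[0] = 1
    for _ in range(n):
        new = [0] * q
        for s in range(q):
            if counts[s]:
                for c in alphabet:
                    for t in delta[(s, c)]:
                        new[t] += counts[s]
        counts = new
    return counts[accept]

def A_N(x, alphabet=("0", "1")):
    """Least q such that some q-state NFA uniquely accepts x (brute force)."""
    n = len(x)
    for q in range(1, n // 2 + 2):  # Hyde's bound: A_N(x) <= floor(n/2) + 1
        subsets = [tuple(t for t in range(q) if mask >> t & 1)
                   for mask in range(1 << q)]
        keys = [(s, c) for s in range(q) for c in alphabet]
        for accept in range(q):
            for choice in product(subsets, repeat=len(keys)):
                delta = dict(zip(keys, choice))
                # unique acceptance: exactly one accepting path of length n
                # in total, and it spells x
                if (count_paths(delta, q, accept, x) == 1
                        and count_all_paths(delta, q, accept, n, alphabet) == 1):
                    return q
    return None
```

Since the number of $q$-state NFAs grows doubly exponentially in $q$, this is only feasible for $q \le 3$ or so, but it suffices to confirm small cases of the bound $b(n) = \lfloor n/2 \rfloor + 1$.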
Theorem 3 (Hyde [5]). Every word $x$ of length $n$ satisfies $A_N(x) \le b(n) := \lfloor n/2 \rfloor + 1$.

Proof sketch. If $x$ has odd length, it suffices to carefully consider the automaton in Figure 2. If $x$ has even length, a slightly modified automaton can be used.
Figure 2: A nondeterministic finite automaton that only accepts one string $x = x_1 x_2 x_3 x_4 \cdots x_n$ of length $n = 2m + 1$.
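Our reading of Figure 2 (forward edges on $x_1 \cdots x_m$, a loop at the last state on $x_{m+1}$, and return edges on $x_{m+2} \cdots x_n$ back to the start state, which is also the accept state) can be checked mechanically. The sketch below encodes that reading with hypothetical helper names and verifies unique acceptance by path counting; it is an illustration of the construction as we understand it, not the authors' code.

```python
def figure2_nfa(x):
    """Build the (m+1)-state NFA of Figure 2 for x of odd length 2m+1:
    forward edges on x_1..x_m, a loop at state m on x_{m+1}, and
    return edges on x_{m+2}..x_n down to state 0, the start/accept state."""
    n = len(x)
    assert n % 2 == 1
    m = n // 2
    delta = {}
    for i in range(m):                            # q_i --x_{i+1}--> q_{i+1}
        delta.setdefault((i, x[i]), set()).add(i + 1)
    delta.setdefault((m, x[m]), set()).add(m)     # loop at q_m on x_{m+1}
    for i in range(m):                            # q_{m-i} --x_{m+2+i}--> q_{m-i-1}
        delta.setdefault((m - i, x[m + 1 + i]), set()).add(m - i - 1)
    return delta, m + 1, 0                        # transitions, #states, accept state

def accepts_uniquely(delta, q, accept, x, alphabet=("0", "1")):
    """x is uniquely accepted iff the total number of accepting paths of
    length |x| (over all words) is 1, and one path spells x."""
    def run(fix_word):
        counts = [0] * q
        counts[0] = 1
        for k in range(len(x)):
            new = [0] * q
            letters = [x[k]] if fix_word else alphabet
            for s in range(q):
                for c in letters:
                    for t in delta.get((s, c), ()):
                        new[t] += counts[s]
            counts = new
        return counts[accept]
    return run(True) == 1 and run(False) == 1
```

The uniqueness is structural: any closed walk of odd length $2m+1$ from state 0 to itself must climb straight to state $m$, use the loop exactly once, and descend straight back, so there is only one accepting path regardless of the labels.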
Definition 4. The complexity deficiency of a word $x$ of length $n$ is $D(x) = b(n) - A_N(x)$.

The distribution of $A_N(w)$ for $w$ of length $n \le 23$ is given in Table 1.

Theorem 6. The decision problem of whether $D(x) \ge d$, given $x$ and $d$, belongs to $NP \cap E$.

Proof. Shallit and Wang [9, Theorem 2] showed that one can efficiently determine whether a given DFA uniquely accepts $w$ among strings of length $|w|$. Hyde [5, Theorem 2.2] extended that result to NFAs, from which the result easily follows.
Definition 7. Suppose M is an NFA with q states that uniquely accepts a word x of length n. Throughout this paper we may assume that M contains no edges except those traversed on input x. Consider the almost unlabeled transition diagram of M, which is a directed graph whose vertices are the states of M and whose edges correspond to transitions. Each edge is labeled with a 0 except for an edge entering the initial state as described below.
We define the accepting path $P$ for $x$ to be the sequence of $n + 1$ edges traversed in this graph, where we include as first element an edge labeled with the empty string $\varepsilon$ that enters the initial state $q_0$ of $M$.
We define the abbreviated accepting path P ′ to be the sequence of edges obtained from P by considering each edge in order and deleting it if it has previously been traversed.
Lemma 8. Let $v$ be a vertex of the transition diagram of $M$. Then $v$ is of one of the following five types.
4. In-degree 2 (edges $e_i$ and $e_j$ with $j > i$), out-degree 2 ($e_{i+1}$ and $e_{j+1}$).
5. In-degree 1 (edge $e_t$), out-degree 0.

Proof. The out-degree and in-degree of each vertex encountered along $P'$ are both at most 2, since a failure of this would imply non-uniqueness of the accepting path. Since all the edges of $M$ are included in $P$, the list includes all the possible in-degree, out-degree combinations. We can define $i$ by the rule that $e_i$ is the first edge in $P'$ entering $v$. Again, since all the edges of $M$ are included in $P$, $e_{i+1}$ must be one of the edges contributing to the out-degree of $v$, if any, and $e_j$ must also be as specified in the types.

Lemma 8 implies that Definition 9 makes sense.

Definition 9. For $0 \le i \le t + 1$ and $0 \le n \le t + 1$ we let $E(i, n)$ be a string representing the edges $(e_i, \ldots, e_n)$. The meaning of the symbols is as follows: 0 represents an edge. A left bracket [ represents a vertex that is the target of a backedge. A right bracket ] represents a backedge. The symbol + represents a vertex of out-degree 2. When $i > n$, we set $E(i, n) = \varepsilon$. Next, assuming we have defined $E(j, m)$ for all $m$ and all $j > i$, we can define $E(i, n)$ by considering the type of the vertex reached by the edge $e_i$. Let $a_i \in \{0, \varepsilon\}$ be the label of $e_i$.
Lemma 10. The abbreviated accepting path can be reconstructed from E(0, t).
We do not include the proof of Lemma 10; instead, Figure 3 gives an example of an automaton and the computation of E(0, t).
Remark 13. The bound $16^n$ counts many automata that are not uniquely accepting; the actual number may be closer to $3^n$ based on computational evidence.

Powers and complexity
In this section we shall exhibit infinite words all of whose prefixes have complexity deficiency bounded by 1. We say that such a word has a hereditary deficiency bound of 1.

Square-free words
Lemma 14. Let x and y be strings over an arbitrary alphabet with xy = yx. Then there is a string z and integers k and ℓ such that x = z k and y = z ℓ .
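Lemma 14 (the Lyndon-Schützenberger commutation lemma) can be tested empirically: when $xy = yx$, the common root $z$ may be taken to be the prefix of length $\gcd(|x|, |y|)$ (a standard sharpening via the Fine-Wilf theorem). A minimal sketch, assuming nonempty inputs; the function name is ours:

```python
from math import gcd

def common_root(x, y):
    """If xy == yx (x, y nonempty), return z with x = z^k and y = z^l,
    namely the prefix of length gcd(|x|, |y|); otherwise return None."""
    if x + y != y + x:
        return None
    z = x[: gcd(len(x), len(y))]
    # sanity check that both inputs really are powers of z
    assert x == z * (len(x) // len(z)) and y == z * (len(y) // len(z))
    return z
```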
Definition 15. A word x is a factor in a word y if y = uxv for some words u and v. In this case we also say that y contains x.
We will use the following simple strengthening from DFAs to NFAs of a fact used in [9, Theorem 9].
Theorem 16. If an NFA $M$ uniquely accepts $w$ of length $n$, and visits a state $p$ at least $k + 1$ times, where $k \ge 2$, during its computation on input $w$, then $w$ contains a $k$th power.
• $w_0$ is the portion of $w$ read before the first visit to the state $p$,
• $w_i$ is the portion of $w$ read between visits number $i$ and $i + 1$ to the state $p$, for $1 \le i \le k$, and
• $w_{k+1}$ is the portion of $w$ read after the last visit to the state $p$.

the electronic journal of combinatorics 22 (2015), #P00

Thus $|w_i| \ge 1$ for each $1 \le i \le k$, but it is possible to have $|w_0| = 0$ (respectively $|w_{k+1}| = 0$), since the initial (final) state of $M$'s computation on input $w$ may be $p$.
For any permutation $\pi$ of $\{1, \ldots, k\}$, $M$ accepts $w_0 w_{\pi(1)} \cdots w_{\pi(k)} w_{k+1}$. Let $1 \le j \le k$ be such that $w_j$ has minimal length, and let $\hat w_j$ denote the concatenation of the remaining blocks $w_i$, $i \ne j$, in order. Then $M$ also accepts $w_0 w_j \hat w_j w_{k+1}$ and $w_0 \hat w_j w_j w_{k+1}$.
By uniqueness, $w_0 w_j \hat w_j w_{k+1} = w_0 \hat w_j w_j w_{k+1} = w$, and so $w_j \hat w_j = \hat w_j w_j$.
By Lemma 14, $w_j$ and $\hat w_j$ are both powers of a string $z$. Since $|\hat w_j| \ge (k-1)|w_j|$, the word $w_j \hat w_j$ is at least a $k$th power of $z$, so $w$ contains a $k$th power of $z$. We next strengthen a particular case of [9, Theorem 9] to NFAs.
Theorem 18. A square-free word has deficiency 0.
Proof. Suppose $w$ is a square-free word of length $n = 2k$ or $n = 2k + 1$, of deficiency $d \ge 1$. Then there is a witnessing automaton $M$ with $q = k + 1 - d$ states. Since
$$n + 1 \ge 2k + 1 = 2(k + 1 - d) + 2d - 1 = 2q + (2d - 1),$$
by the Extended Pigeonhole Principle (Theorem 17) there is a state $p$ which is visited at least $2 + (2d - 1) \ge 3$ times, say at times $t_1 < t_2 < t_3$, during the $n + 1$ steps of the computation of $M$ on input $w$ (and is not visited at any other times in the interval $[t_1, t_3]$). By Theorem 16, $w$ contains a square, a contradiction. Hence $d = 0$.
Corollary 19. There exists an infinite word of hereditary deficiency 0.
Proof. There is an infinite square-free word over the alphabet {0, 1, 2} as shown by Thue [11]. The result follows from Theorem 18.
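Thue's square-free ternary word can be generated by iterating a square-free morphism. The sketch below uses the classical choice $0 \mapsto 012$, $1 \mapsto 02$, $2 \mapsto 1$ (a standard example, not necessarily the construction in [11]) and checks square-freeness of a prefix by brute force; the function names are ours.

```python
def is_squarefree(w):
    """True iff w contains no factor uu with u nonempty."""
    n = len(w)
    for i in range(n):
        for L in range(1, (n - i) // 2 + 1):
            if w[i:i + L] == w[i + L:i + 2 * L]:
                return False
    return True

def thue_ternary(iterations=6):
    """Iterate the square-free morphism 0 -> 012, 1 -> 02, 2 -> 1."""
    sigma = {"0": "012", "1": "02", "2": "1"}
    w = "0"
    for _ in range(iterations):
        w = "".join(sigma[c] for c in w)
    return w
```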

Cube-free and overlap-free words
Definition 20. For a word $u$, let $\mathrm{first}(u)$ and $\mathrm{last}(u)$ denote the first and last letters of $u$, respectively. An overlap is a word of the form $uu\,\mathrm{first}(u)$ (or, equivalently, $\mathrm{last}(u)\,uu$). A word $w$ is overlap-free if it does not contain any overlaps.

Theorem 24 (Shelton and Soni [10]). Let $\ell$ be a positive integer. The following are equivalent.
1. There exists an overlap-free binary word y and a word x such that y contains xx and ℓ = |xx|.
Lemma 25. If a cube www contains another cube xxx then either |x| = |w|, or xx first(x) is contained in the first two consecutive occurrences of w, or last(x)xx is contained in the last two occurrences of w.
Proof. We prove the contrapositive. Suppose $xx\,\mathrm{first}(x)$ is not contained in the first two consecutive occurrences of $w$, and $\mathrm{last}(x)\,xx$ is not contained in the last two occurrences of $w$. Then the middle $\mathrm{last}(x)\,x\,\mathrm{first}(x)$ of the factor $xxx$ has $\mathrm{last}(w)\,w\,\mathrm{first}(w)$ as a factor, and hence $|x| \ge |w|$. Since $|xxx| \le |www|$ forces $|x| \le |w|$, we conclude $|x| = |w|$.
Theorem 26. The deficiency of cube-free binary words is unbounded.
Proof. Given $k$, we shall find a cube-free word $x$ with $D(x) \ge k$. Pick a number $n$ such that $2^n \ge 2k + 1$. Let $w := \mu^n(0)$, which is a word of length $\ell := 2^n$. By Theorem 22, $ww$ is overlap-free. Let $x = ww\hat w$ where $\hat w$ is the proper prefix of $w$ of length $|w| - 1$. By Lemma 25, $x$ is cube-free. The complexity of $x$ is at most $|w|$, as we can just make one loop of length $\ell$, with code as in Theorem 12. And so
$$D(x) \ge b(3\ell - 1) - \ell = \frac{3\ell}{2} - \ell = \frac{\ell}{2} \ge k.$$
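The construction in this proof is easy to replicate for small $n$: with $\mu$ the Thue-Morse morphism $0 \mapsto 01$, $1 \mapsto 10$, the word $x = ww\hat w$ has length $3 \cdot 2^n - 1$ and is cube-free. A sketch with our own function names:

```python
def mu(w):
    """Thue-Morse morphism: 0 -> 01, 1 -> 10."""
    return "".join("01" if c == "0" else "10" for c in w)

def is_cubefree(w):
    """True iff w contains no factor uuu with u nonempty."""
    n = len(w)
    for i in range(n):
        for L in range(1, (n - i) // 3 + 1):
            if w[i:i + L] == w[i + L:i + 2 * L] == w[i + 2 * L:i + 3 * L]:
                return False
    return True

def theorem26_word(n):
    """x = w w w-hat with w = mu^n(0); |x| = 3 * 2^n - 1."""
    w = "0"
    for _ in range(n):
        w = mu(w)
    return w + w + w[:-1]
```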
Lemma 29. For each $k \ge 1$ there is a sequence $x_{1,k}, \ldots, x_{k,k}$ of positive integers such that ... Let $t_j$ denote bit $j$ of the infinite Thue-Morse word. Then we can ensure that ...

Proof. Given $x_{1,k-1}, \ldots, x_{k-1,k-1}$, we let $x_{i,k} = 3 x_{i,k-1}$ for $i < k$ and $x_{k,k} = 3 u_{k-1} + 2$ for a sufficiently large number $u_{k-1}$. Reducing the equation modulo 3, we conclude $a_k = 2$. Then we can cancel $a_k$, divide by three, and reduce to the induction hypothesis. Thus our numbers are $x_{1,2} = 3$, $x_{2,2} = 3 u_1 + 2$, and so on. To ensure (1) we just take $u_{j-1}$ sufficiently big. To ensure (2), we apply Lemma 28.
Theorem 30. The complexity deficiency of overlap-free words over an alphabet of size three is unbounded.
Proof. Let $d \ge 1$. We will show that there is a word $w$ of deficiency $D(w) \ge d$. Let $k = 2d - 1$. For each $1 \le i \le k$ let $x_i = x_{k+1-i,k}$, where the $x_{j,k}$ are as in Lemma 29. Note that since $x_{i,k} + 1 < x_{i+1,k}$, we have $x_i > x_{i+1} + 1$. The witnessing automaton $M$ consists of $k$ loops of lengths $x_1, \ldots, x_k$; its code ends in [+0, where * indicates the accept state. Let $X = \sum_{i=1}^k x_i$. Then $M$ has $k - 1 + X$ edges but only $q = X$ states, and $w$, which traverses each loop twice, has length $2X + k - 1$. Suppose $v$ is a word accepted by $M$. Then $M$ on input $v$ goes through each loop of length $x_i$ some number of times $a_i \ge 0$, where $\sum_{i=1}^k a_i x_i + (k - 1) = |v|$. If additionally $|v| = |w|$, then by Lemma 29 we have $a_1 = a_2 = \cdots = a_k$, and hence $v = w$. Thus $A_N(w) \le X$ and $D(w) \ge b(2X + k - 1) - X = d$. Below we prove that $w$ is overlap-free.
Proof that the word w in Theorem 30 is overlap-free. Suppose a word uu is contained in w.
Proof that the number of 2s in uu is either 0 or 2. Let $o_1, \ldots, o_{2a}$ denote the occurrences of 2s in $uu$ and suppose $a \ge 1$. Let $\delta_i = o_{i+1} - o_i$. Then the sequence $(\delta_1, \ldots, \delta_{2a-1})$ is an interval in the sequence ... Since $x_i > x_{i+1} + 1$, in particular $|x_i - x_{i+1}| > 1$, and so this sequence is injective, i.e., no two entries are the same.
So either Case 1 or Case 2 below obtains. Case 1: The number of 2s in uu is zero. Then certainly uu first(u) is not contained in w, since the infinite Thue-Morse word is overlap-free. Case 2: The number of 2s in uu is two. Then we have one of the following two cases.
1. $uu$ is contained in a word of the form ... We guard against that by making sure that
• $t_{x_i} = t_{x_{i+1}-1}$ (Lemma 29), and
• $2 \ne t_{x_{i+1}}$ (the Thue-Morse word uses only the letters 0 and 1).
2. $uu$ is contained in a word of the form ... Since $uu$ contains exactly two 2s and the $t_j$ are not 2s, it follows that $uu = a2b2c$ where $a$, $b$, $c$ are words over the binary alphabet $\{0, 1\}$. Then $u = a2b_1 = b_2 2c$ where $b = b_1 b_2$, so $a = b_2$, $c = b_1$, and so actually $u = a2c$ and $t_1 \cdots t_{x_i} = b = ca$. Here then $|ca| = x_i$. If $|a| \ge 2$ then ..., which contradicts $x_{i+1} < x_i - 1$. If $|a| < 2$ then we appeal to Lemma 31.
Lemma 31. $t_{x_i - 2}\, t_{x_i - 1}\, 2\, t_1 \cdots t_{x_i}\, 2$ cannot be a factor of a square having only two 2s.
Proof. The Thue-Morse word is a concatenation of disjoint occurrences of the words 01 and 10. Each of these two words is of the form $z\bar z$ where $\bar z = 1 - z$. The idea now is that if $x_i$ is odd then, say, the word ends in a lone 0 followed by 2, i.e., in 02; then adding the next bit gives something ending in 012, preventing a square.
More precisely, $t_1 \cdots t_{x_i - 1} 2$, having odd or even length, ends in say $z\bar z 2$ or $z \bar z a 2$ respectively, and then $t_1 \cdots t_{x_i - 1} t_{x_i} 2$ ends in $z \bar z b 2$ or $z \bar z a \bar a 2$, respectively; either way $t_1 \cdots t_{x_i - 1} 2$ and $t_1 \cdots t_{x_i - 1} t_{x_i} 2$ are incompatible. Definition 2 yields the following lemma.
Note that in Lemma 32, it may very well be that t 1 = t 2 .
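The structural fact driving the proof of Lemma 31, that the Thue-Morse word is a concatenation of disjoint blocks 01 and 10, follows from the recurrences $t_{2i} = t_i$ and $t_{2i+1} = 1 - t_i$ (0-indexed) and can be checked directly; the helper name below is ours.

```python
def thue_morse_prefix(n):
    """First n bits of the Thue-Morse word: t_i = parity of ones in binary i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

# every aligned pair (t_{2i}, t_{2i+1}) is 01 or 10, never 00 or 11
t = thue_morse_prefix(64)
blocks = [t[i:i + 2] for i in range(0, len(t), 2)]
assert all(b in ("01", "10") for b in blocks)
```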
Proof. Suppose $w$ is a word satisfying $D(w) \ge 2$ and consider the sequence of states visited in a witnessing computation. As in the proof of Theorem 41, either there is a state that is visited four times, and hence there is a cube in $w$, or there are three state cubes (states that are visited three times each), and hence there are three squares in $w$. By Theorem 24, an overlap-free binary word can only contain squares of length $2^a$, $3 \cdot 2^a$, and hence can only contain powers $u^i$ where $|u|$ is of the form $2^a$, $3 \cdot 2^a$, and $i \le 2$.
In particular, the length of one of the squares in the three state cubes must divide the length of another. So if these two state cubes are disjoint then the shorter one repeated can replace one occurrence of the longer one, contradicting Lemma 32.
So suppose we have two state cubes, at states $p_1$ and $p_2$, that overlap. At $p_1$ we then read consecutive words $a, b$ that are powers $a = u^i$, $b = u^j$ of a word $u$, and since there are no cubes in $w$ it must be that $i = j = 1$ and so actually $a = b$. And at $p_2$ we have words $c, d$ that are powers of a word $v$ and again the exponents are 1 and $c = d$.
The overlap means that in one of the two excursions of the same length starting and ending at p 1 , we visit p 2 . By uniqueness of the accepting path we then visit p 2 in both of these excursions. If we suppose the state cubes are chosen to be of minimal length then we only visit p 2 once in each excursion. If we write a = rs where r is the word read when going from p 1 to p 2 , and s is the word going from p 2 to p 1 , then c = sr and w contains rsrsr. In particular, w contains an overlap.
Remark 34. In computability theory, the effective Hausdorff dimension $\dim$ and effective packing dimension $\mathrm{Dim}$ of a single infinite binary sequence $u$ are defined and related to Kolmogorov complexity $C$. It is shown (see [2, Theorem 13.3.4 and Corollary 13.11.12]) that
$$\dim(u) = \liminf_n \frac{C(u_1 \cdots u_n)}{n}, \qquad \mathrm{Dim}(u) = \limsup_n \frac{C(u_1 \cdots u_n)}{n}.$$
These results, together with the idea that automatic complexity is a miniaturization of Kolmogorov complexity, constitute our motivation for making Definitions 35 and 37 below.
Definition 35. For an infinite word $u$, define the deterministic automatic Hausdorff dimension of $u$ by
$$I(u) = \liminf_n \frac{A_D(u_1 \cdots u_n)}{n},$$
and the deterministic automatic packing dimension of $u$ by the corresponding $\limsup$. The connection between effective dimension and automatic dimension is not merely an analogy, as Theorem 36 shows.

Almost square-free words
Definition 40 (Fraenkel and Simpson [3]). A word whose square factors all belong to the set {00, 11, 0101} is called almost square-free.
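Definition 40 is straightforward to test by brute force: collect every square factor and compare against the allowed set. A minimal sketch (function names ours):

```python
def square_factors(w):
    """Set of all factors of w of the form uu, u nonempty."""
    n = len(w)
    found = set()
    for i in range(n):
        for L in range(1, (n - i) // 2 + 1):
            if w[i:i + L] == w[i + L:i + 2 * L]:
                found.add(w[i:i + 2 * L])
    return found

def is_almost_squarefree(w):
    """Fraenkel-Simpson: every square factor lies in {00, 11, 0101}."""
    return square_factors(w) <= {"00", "11", "0101"}
```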
Theorem 41. A word that is almost square-free has a deficiency bound of 1.
Proof. The claim is easy to verify for words of length at most 3, so suppose $w$ has length at least 4. Suppose $w$ is a word of length $n \in \{2k, 2k+1\}$, where $k \ge 2$, with deficiency at least 2. Then there is a witnessing automaton with $q = k - 1 \ge 1$ states, and its states are occupied at $n + 1 \in \{2k + 1, 2k + 2\} = \{2q + 3, 2q + 4\}$ times. There are at least $2q + 3$ times and only $q$ states, so by the Extended Pigeonhole Principle (Theorem 17), we are in one of the following Cases 1-3.
• Case 1. There is at least one state that is visited at least 5 times. Then by Theorem 16, w contains a fourth power.
• Case 2. There is at least one state $p_1$ that is visited at least 4 times and another state $p_2 \ne p_1$ that is visited at least 3 times. Then by Theorem 16, there is a cube $xxx$ and a square $yy$ in $w$. Since $w$ has no squares of length $> 4$, we must have $|xx| \le 4$ and $|yy| \le 4$, and hence $1 \le |x| \le 2$ and $1 \le |y| \le 2$. (The subcase $|x| = 2$ cannot occur: $xx$ would then be a square of length 4, forcing $xx = 0101$, but $xxx = 010101$ contains the forbidden square 1010.) We next consider the possible lengths of $x$ and $y$.
-Subcase $|x| = 1$, $|y| = 2$: In this case, the $xxx$ and $yy$ occurrences must be disjoint, because the states in a $yy$ occurrence are $p_2 p_3 p_2 p_3 p_2$ for some $p_3$, which must be disjoint from $p_1 p_1 p_1 p_1$ since $p_1 \ne p_2$. But then we can replace these by $p_2 p_3 p_2 p_3 p_2 p_3 p_2$ and $p_1 p_1$, respectively, giving two distinct state sequences leading to acceptance, contradicting Lemma 32.
-Subcase $|x| = 1$, $|y| = 1$: In this case again the occurrences of $xxx$ and $yy$ must be disjoint, since $p_1 \ne p_2$. We can replace $p_1^4$ and $p_2^3$ by $p_1$ and $p_2^6$, respectively, again contradicting Lemma 32.
• Case 3. There are at least 3 states $p_1, p_2, p_3$ (all distinct) that are each visited at least 3 times. Then by Theorem 16, there are three squares $u_i u_i$ at the three distinct states $p_i$, $1 \le i \le 3$. By assumption $|u_i u_i| \le 4$, so $|u_i| \le 2$.
-Subcase 3.1. $|u_i| = |u_j| = 1$ for two values $1 \le i < j \le 3$. Then the argument is entirely analogous to that in Case 2.
-Subcase 3.2. $|u_j| = |u_k| = 2$ for two values $1 \le j < k \le 3$.
* Subsubcase 3.2.1. If disjoint, we can replace $u_j^2$ by $u_k^2$ to get $u_k^4$, again a fourth power, by the argument of Subcase 3.1.
* Subsubcase 3.2.2. If nondisjoint with full overlap, then $p_j a_1 p_j a_2 p_j$ and $p_k b_1 p_k b_2 p_k$ become $p_j p_k p_j p_k p_j p_k$, and immediately we get 10101 or 01010 or a fourth power in $w$.
* Subsubcase 3.2.3. If partial overlap only, then $p_j a_1 p_j a_2 p_j$ and $p_k b_1 p_k b_2 p_k$ become, by Lemma 32, $p_j a p_j a p_j$ and $p_k b p_k b p_k$, and then $p_j a p_j p_k p_j p_k b p_k$. By Lemma 32 again, this must be $p_j p_k p_j p_k p_j p_k p_j p_k = (p_j p_k)^4$, and so the read word must be of the form $abababa$, giving an occurrence of 1010 (if $a \ne b$) or of a 7th power (if $a = b$) in $w$.
Thus all cases are covered and the Theorem is proved.
Corollary 42. There is an infinite binary word having a hereditary deficiency bound of 1.
Proof. We have two distinct proofs. On the one hand, Fraenkel and Simpson [3] show there is an infinite almost square-free binary word, and the result follows from Theorem 41. On the other hand, the infinite Thue-Morse word is overlap-free (Theorem 27) and the result follows from Theorem 33.
Conjecture 43. There is an infinite binary word having hereditary deficiency 0.
Remark 44. We obtained some numerical evidence for Conjecture 43. For instance, we found that there are 108 binary words of length 18 having hereditary deficiency 0.