Pathwidth and nonrepetitive list coloring

A vertex coloring of a graph is nonrepetitive if there is no path in the graph whose first half receives the same sequence of colors as the second half. While every tree can be nonrepetitively colored with a bounded number of colors (4 colors is enough), Fiorenzi, Ochem, Ossona de Mendez, and Zhu recently showed that this does not extend to the list version of the problem, that is, for every $\ell \geq 1$ there is a tree that is not nonrepetitively $\ell$-choosable. In this paper we prove the following positive result, which complements the result of Fiorenzi et al.: There exists a function $f$ such that every tree of pathwidth $k$ is nonrepetitively $f(k)$-choosable. We also show that such a property is specific to trees by constructing a family of pathwidth-2 graphs that are not nonrepetitively $\ell$-choosable for any fixed $\ell$.


Introduction
A repetition of length $r$ ($r \geq 1$) in a sequence of symbols is a subsequence of consecutive terms of the form $x_1 \dots x_r x_1 \dots x_r$. A sequence is nonrepetitive (or square-free) if it does not contain a repetition of any length. In 1906 Thue proved that there exist arbitrarily long nonrepetitive sequences over an alphabet of size 3 (see [2,16]). The method discovered by Thue is constructive and uses substitutions over a given set of symbols.
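The definition of a repetition can be checked directly by brute force. The following is a minimal sketch; the function names `find_repetition` and `is_nonrepetitive` are illustrative and do not come from the paper.

```python
def find_repetition(seq):
    """Return (start, r) such that seq[start:start+2r] is a repetition
    x_1...x_r x_1...x_r, or None if the sequence is nonrepetitive."""
    n = len(seq)
    for r in range(1, n // 2 + 1):
        for start in range(0, n - 2 * r + 1):
            # Compare the two consecutive blocks of length r.
            if seq[start:start + r] == seq[start + r:start + 2 * r]:
                return start, r
    return None


def is_nonrepetitive(seq):
    """True if seq contains no repetition of any length (square-free)."""
    return find_repetition(seq) is None
```

For instance, `is_nonrepetitive("abcacb")` holds, while `find_repetition("abcbc")` reports the block `bcbc` starting at position 1.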
A different approach to creating long nonrepetitive sequences was recently introduced by Grytczuk, Kozik, and Micek [10]: Generate a sequence by iteratively appending a random symbol at the end, and each time a repetition appears erase the repeated block. (For instance, if the sequence generated so far is abcb and we add c, then we erase the last two symbols, bringing us back to abc.) By a simple counting argument one can prove that with positive probability the length of the constructed sequence eventually exceeds any finite bound, provided the alphabet has size at least 4. This is one more than in Thue's result but the proof is more flexible and can be adapted to other settings. For instance, it led to a very short proof that for every $n \geq 1$ and every sequence of sets $L_1, \dots, L_n$, each of size at least 4, there exists a nonrepetitive sequence $s_1 s_2 \dots s_n$ where $s_i \in L_i$ for all $i$ (see [10]), a theorem first proved by Grytczuk, Przybyło, and Zhu [11] via an intricate application of the Lefthanded Local Lemma. Whether the analogous statement for lists of size 3 is true remains an exciting open problem.
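The append-and-erase procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed seed, and the step cap `max_steps` are hypothetical choices, and the sketch makes no attempt to reproduce the counting argument of [10].

```python
import random


def append_and_erase(seq, symbol):
    """Append symbol to a nonrepetitive list seq. If a repetition now ends
    the sequence, erase its second half (the repeated block)."""
    seq = seq + [symbol]
    n = len(seq)
    for r in range(1, n // 2 + 1):
        # Any new repetition must be a suffix, since seq was nonrepetitive.
        if seq[n - 2 * r:n - r] == seq[n - r:]:
            return seq[:n - r]
    return seq


def random_nonrepetitive(target_length, alphabet="abcd", seed=0, max_steps=10**6):
    """Run the procedure until the sequence reaches target_length
    (or max_steps random symbols have been used)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(max_steps):
        if len(seq) >= target_length:
            break
        seq = append_and_erase(seq, rng.choice(alphabet))
    return seq
```

With the example from the text, `append_and_erase(list("abcb"), "c")` returns `['a', 'b', 'c']`. The sequence stays nonrepetitive throughout, because after each erasure it is a prefix of an earlier nonrepetitive sequence.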
In this paper we make use of the above-mentioned approach to color trees nonrepetitively. Given an (undirected, simple) graph $G$, we denote by $V(G)$ and $E(G)$ its vertex set and edge set, respectively. A coloring $\varphi : V(G) \to \mathbb{N}$ of the vertices of $G$ is nonrepetitive if there is no repetition in the color sequence of any path in $G$; that is, $\varphi$ is nonrepetitive if for every path $P$ with an even number of vertices the sequence of colors on the first half of $P$ is distinct from the sequence of colors on the second half of $P$. (We remark that all paths in this paper are simple, that is, contain no repeated vertex.) The minimum number of colors used in a nonrepetitive coloring of $G$ is called the Thue chromatic number of $G$ and is denoted by $\pi(G)$. Now, given a graph $G$, suppose that each vertex $v \in V(G)$ has a preassigned list of available colors $L_v \subset \mathbb{N}$. A coloring of $G$ with these lists is a coloring $\varphi$ of $G$ such that $\varphi(v) \in L_v$ for each vertex $v \in V(G)$. The Thue choice number of $G$, denoted by $\pi_l(G)$, is the minimum $\ell$ such that, for every list assignment $\{L_v\}_{v \in V(G)}$ with $|L_v| \geq \ell$ for each $v \in V(G)$, there is a nonrepetitive coloring of $G$ with these lists. As with many graph coloring parameters, the Thue chromatic (choice) number can be bounded from above by a function of the maximum degree: Alon, Grytczuk, Hałuszczak, and Riordan [1] proved that for every graph $G$ with maximum degree $\Delta$ we have $\pi(G) \leq \pi_l(G) \leq c \cdot \Delta^2$ for some absolute constant $c$. A number of subsequent works [6,8,9,12] focused on reducing the value of the constant $c$, the current best bound being $\pi_l(G) \leq (1 + o(1))\Delta^2$ (see [6]).
Alon et al. [1] also showed that there are graphs with maximum degree $\Delta$ with $\pi(G) = \Omega(\Delta^2 / \log \Delta)$. (Whether this can be improved by a $\log \Delta$ factor remains an open problem.) It is not difficult to show that every tree has Thue chromatic number at most 4 (see [3]), which is best possible. This result was generalized to graphs of bounded treewidth by Kündgen and Pelsmajer [14]. They proved that $\pi(G) \leq 4^k$ for every graph $G$ of treewidth $k$. It is not known whether this upper bound can be improved to a polynomial in $k$. However, if one considers graphs of pathwidth $k$ instead, a polynomial bound is known: It was shown by Dujmović et al. [6] that $\pi(G) \leq 2k^2 + 6k + 1$ for every graph $G$ of pathwidth $k$. (We note that quadratic might not be the right order of magnitude here.) Probably the most intriguing open problem regarding the Thue chromatic number is whether it is bounded for all planar graphs, a question originally asked by Grytczuk [9]. An $O(\log n)$ upper bound is known [5], and from below Ochem constructed a planar graph requiring 11 colors (see [5]).
The main focus of this paper is the list version of the parameter, the Thue choice number. As mentioned at the beginning of the introduction, we have $\pi_l(P) \leq 4$ for every path $P$, and it is open whether this bound can be improved to 3. Fiorenzi, Ochem, Ossona de Mendez, and Zhu [7] gave the first example of a class of graphs where the Thue chromatic and Thue choice numbers behave very differently: While trees have Thue chromatic number at most 4, they showed that the Thue choice number of trees is unbounded. Clearly, trees with large Thue choice number must have large maximum degree, and in fact one can deduce from the proof in [7] that there are trees with maximum degree $\Delta$ and Thue choice number $\Omega(\log \Delta / \log \log \Delta)$. Kozik and Micek [13] subsequently showed that a better-than-quadratic upper bound in terms of the maximum degree exists for trees: For every $\varepsilon > 0$ there exists $c > 0$ such that $\pi_l(T) \leq c\Delta^{1+\varepsilon}$ for every tree $T$ of maximum degree $\Delta$. (Bridging the significant gap between the upper and lower bounds remains an open problem.) Note that graphs of bounded treewidth have unbounded Thue choice number since this is already the case for trees. On the other hand, Dujmović et al. [6] observed that $\pi_l(G)$ is bounded when $G$ is a graph of pathwidth 1. This prompted the authors of [6] to ask whether $\pi_l(G)$ is bounded more generally when $G$ has bounded pathwidth (which is the case for the Thue chromatic number). Also, since connected graphs $G$ of pathwidth 1 are caterpillars, and thus trees in particular, they also asked the same question but with $G$ moreover required to be a tree. A second motivation for the latter question was that the trees with arbitrarily large Thue choice number constructed by Fiorenzi et al. [7] also have unbounded pathwidth.
In this paper we answer both questions. First, we give a simple construction showing that the Thue choice number is unbounded for graphs of bounded pathwidth; in fact, this is true even for graphs of pathwidth 2 (which is best possible as noted above):
Theorem 1. For every $\ell \geq 1$, there is a graph $G$ of pathwidth 2 with $\pi_l(G) \geq \ell$.
Next, we address the case of trees and prove that their Thue choice number is bounded from above by a function of their pathwidth:
Theorem 2. There is a function $b : \mathbb{N} \to \mathbb{N}$ such that $\pi_l(T) \leq b(k)$ for every tree $T$ of pathwidth $k$.
The proof of Theorem 2 combines an induction on the pathwidth with the algorithmic method of Grytczuk et al. [10] to produce arbitrarily long nonrepetitive sequences described at the beginning of the introduction. This method, which finds its roots in the celebrated algorithmic proof of the Local Lemma by Moser and Tardos [15], was extended to produce nonrepetitive colorings of graphs (in [6]) and trees (in [13]). Part of our proof consists in adapting the ideas from [6,13] to the situation under consideration.
We note that the bounding function $b(k)$ in Theorem 2 stemming from our proof is quite large: it is doubly exponential in $k$.
The paper is organized as follows: In Section 2 we introduce definitions and terminology. Then we prove Theorem 1 in Section 3, and Theorem 2 in Section 4.
Graphs in this paper are finite, simple, and undirected. The vertex set and edge set of a graph $G$ are denoted $V(G)$ and $E(G)$, respectively. Note that, since only simple graphs are considered, resulting loops and parallel edges are removed when contracting edges in a graph. A graph $H$ is a minor of a graph $G$ if $H$ can be obtained from a subgraph of $G$ by contracting edges.
A tree decomposition of a graph $G$ is a pair $(T, \mathcal{C})$ where $T$ is a tree and $\mathcal{C}$ is a collection $\{T_v : v \in V(G)\}$ of non-empty subtrees of $T$ such that $T_u$ and $T_v$ have a vertex in common for every edge $uv \in E(G)$. The width of the tree decomposition $(T, \mathcal{C})$ is the maximum, over every $x \in V(T)$, of the number of subtrees in $\mathcal{C}$ containing $x$, minus 1. The treewidth of $G$ is the minimum width of a tree decomposition of $G$. Path decompositions and pathwidth are defined analogously with the tree $T$ required instead to be a path. Treewidth and pathwidth are minor-closed parameters, in the sense that every minor of a graph $G$ has treewidth (pathwidth) at most that of $G$. We refer the reader to Diestel's textbook [4] for an introduction to the theory of treewidth and graph minors.
The length of a path is the number of its edges. The height of a rooted tree T is the maximum length of a path from the root to a leaf of T . (Thus T has height 0 if it consists of a unique vertex.) The height of a vertex v of T is the length of the path from the root to v in T .

Graphs of pathwidth 2
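Let $G_{n,\ell}$ be the graph constructed from the path on $2n$ vertices where every second vertex is blown up to $\binom{\ell n}{\ell}$ vertices forming an independent set. Formally, $V(G_{n,\ell}) = V_1 \cup \dots \cup V_{2n}$, where $V_i = \{v_i\}$ for each odd index $i \in [2n]$ and $V_i = \{v_i^j : j \in [\binom{\ell n}{\ell}]\}$ for each even index $i \in [2n]$, and two vertices are adjacent in $G_{n,\ell}$ if and only if their lower indices differ by exactly 1.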
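To make the construction concrete, here is a small sketch that builds the graph for given parameters. It assumes, as the proof of Theorem 3 suggests, that each even-indexed vertex is replaced by one copy per $\ell$-subset of $[\ell n]$; the vertex encoding `(i, j)` and the name `build_graph` are illustrative choices only.

```python
from math import comb


def build_graph(n, ell):
    """Return (layers, edges) for the blown-up path on 2n positions.

    Odd position i holds the single vertex (i, 0); even position i holds
    the copies (i, 1), ..., (i, C(ell*n, ell)). Two vertices are adjacent
    exactly when their positions differ by 1.
    """
    copies = comb(ell * n, ell)  # assumed blow-up size, one copy per ell-subset
    layers = {}
    for i in range(1, 2 * n + 1):
        if i % 2 == 1:
            layers[i] = [(i, 0)]
        else:
            layers[i] = [(i, j) for j in range(1, copies + 1)]
    edges = {
        frozenset((u, w))
        for i in range(1, 2 * n)
        for u in layers[i]
        for w in layers[i + 1]
    }
    return layers, edges
```

For example, `build_graph(2, 1)` has 6 vertices and 6 edges: the positions hold 1, 2, 1, 2 vertices, and each consecutive pair of positions is joined completely.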
It is not difficult to check that $G_{n,\ell}$ has pathwidth at most 2 (with equality for $n \geq 2$ and $\ell \geq 1$). Thus Theorem 1 follows from the following theorem.
Theorem 3. Let $\ell$ and $n$ be integers such that $\ell \geq 1$ and $n > e^{\ell+2}$. Then $\pi_l(G_{n,\ell}) > \ell$.
Proof. Consider the following list assignment for the vertices of $G_{n,\ell}$. For each odd index $i = 2t + 1 \in [2n]$, vertex $v_i$ is assigned the list $\{\ell t + 1, \ell t + 2, \dots, \ell t + \ell\}$. Thus these $n$ lists have size $\ell$, are pairwise disjoint, and their union is $[\ell n]$. Next, enumerate the $\ell$-subsets of $[\ell n]$ in an arbitrary way. Then, for each even index $i \in [2n]$ and index $j \in [\binom{\ell n}{\ell}]$, vertex $v_i^j$ is assigned as its list the $j$-th set in that enumeration. We claim that, because $n$ was chosen to be strictly larger than $e^{\ell+2}$, there cannot be a nonrepetitive coloring of $G_{n,\ell}$ with these lists. Arguing by contradiction, let us suppose that $\varphi$ is such a coloring.
With a slight abuse of notation, for $i \in [2n]$ we use the shorthand $\varphi(V_i)$ for the set $\bigcup_{u \in V_i} \{\varphi(u)\}$. Consider an interval $I \subseteq [2n]$ of the form $I = [a, a + 4k + 1]$ with $k \geq 0$. Suppose that the following two conditions are satisfied: (i) $\varphi(v_i) \in \varphi(V_{i+2k+1})$ for every odd index $i \in [a, a + 2k]$, and (ii) $\varphi(v_{i+2k+1}) \in \varphi(V_i)$ for every even index $i \in [a, a + 2k]$. Then it is easy to check that there exists a path $w_a, \dots, w_{a+4k+1}$ in $G_{n,\ell}$ with $w_j = v_j$ for $j$ odd and $w_j \in V_j$ for $j$ even, for all $j \in I$, such that the color sequence $\varphi(w_a), \dots, \varphi(w_{a+4k+1})$ is a repetition (of size $2k + 1$). Since this cannot happen, it follows that there exists an index $i \in [a, a + 2k]$ for which one of the above two conditions is not satisfied. For every such index $i$, we say that the pair $(p, q)$ is a witness (for interval $I$), where $\{p, q\} = \{i, i + 2k + 1\}$ with $p$ odd and $q$ even.
Next, consider an even index $q \in [2n]$. Observe that $|[\ell n] - \varphi(V_q)| \leq \ell - 1$, since $\varphi(V_q)$ contains at least one color from each $\ell$-subset of $[\ell n]$. Combining this with the fact that vertices $v_p$ with odd index $p \in [2n]$ have pairwise disjoint lists, we deduce that there are at most $\ell - 1$ odd indices $p \in [2n]$ such that the pair $(p, q)$ is a witness. Summing up over every even index $q \in [2n]$, it follows that there are at most $n(\ell - 1)$ witnesses in total. Now consider a witness $(p, q)$ and let $|p - q| = 2k + 1$. The pair $(p, q)$ is a witness for at most $2k + 1$ intervals $I \subseteq [2n]$ of the form $I = [a, a + 4k + 1]$. Since there are exactly $2n - 4k - 1$ intervals $I$ of the latter form and each interval of that form must have a witness, it follows that the number of witnesses $(p, q)$ with $|p - q| = 2k + 1$ is at least $\frac{2n-4k-1}{2k+1}$. Summing up over every possible value of $k$ (that is, $k = 0, 1, \dots, \lfloor (n - 1)/2 \rfloor$), we obtain that the total number of witnesses is at least $\sum_{k=0}^{\lfloor (n-1)/2 \rfloor} \frac{2n-4k-1}{2k+1} \geq n \ln n - 3n$. It follows that $n(\ell - 1) \geq n \ln n - 3n$, which contradicts the assumption that $n > e^{\ell+2}$.
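The counting step at the end of the proof can be sanity-checked numerically. The sketch below sums the per-$k$ lower bounds $\frac{2n-4k-1}{2k+1}$ exactly and compares the total with $n \ln n - 3n$ and with the upper bound $n(\ell - 1)$. It is a numerical illustration for small parameters, not a proof, and the function names are hypothetical.

```python
from fractions import Fraction
from math import exp, floor, log


def witness_lower_bound(n):
    """Exact value of the sum over k = 0..floor((n-1)/2) of (2n-4k-1)/(2k+1)."""
    return sum(Fraction(2 * n - 4 * k - 1, 2 * k + 1) for k in range((n - 1) // 2 + 1))


def counts_are_contradictory(n, ell):
    """True if the lower bound on the number of witnesses exceeds
    the upper bound n(ell - 1)."""
    return witness_lower_bound(n) > n * (ell - 1)


if __name__ == "__main__":
    for ell in (1, 2, 3):
        n = floor(exp(ell + 2)) + 1  # smallest integer n with n > e^(ell+2)
        print(ell, n, float(witness_lower_bound(n)), n * (ell - 1))
```

Running the loop shows, for each of the three values of $\ell$, a lower bound that is strictly larger than $n(\ell - 1)$, in line with the contradiction reached in the proof.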

Trees of bounded pathwidth
A path-partition of a tree $T$ is a pair $(\mathcal{T}, \mathcal{P})$ where $\mathcal{T}$ is a rooted tree and $\mathcal{P}$ is a collection $\{P_x : x \in V(\mathcal{T})\}$ of vertex-disjoint paths of $T$ which collectively partition the vertex set of $T$ and such that $xy \in E(\mathcal{T})$ if and only if there is an edge between a vertex from $P_x$ and a vertex from $P_y$ in $T$. Observe that a consequence of this definition is that $\mathcal{T}$ is a minor of $T$. The root-path of $(\mathcal{T}, \mathcal{P})$ is the path $P_x$ where $x$ is the root of $\mathcal{T}$. Now, consider a path $P_x$ with $x$ distinct from the root. The path $P_x$ has a center, defined as the endpoint in $P_x$ of the edge in $T$ linking $P_x$ to $P_y$ where $y$ is the parent of $x$ in $\mathcal{T}$. The height of the path-partition $(\mathcal{T}, \mathcal{P})$ is the height of $\mathcal{T}$. When considering a path-partition $(\mathcal{T}, \mathcal{P})$ of a tree $T$, it will be useful to embed $T$ itself in the plane in a way that is 'faithful' to the path-partition. This leads to the following definition: An embedding of $T$ in the plane is faithful to the path-partition $(\mathcal{T}, \mathcal{P})$ if each path in $\mathcal{P}$ is drawn horizontally, and contracting each such path into one of its vertices we obtain some plane embedding of $\mathcal{T}$, with its root drawn at the bottom and its edges going up. See Figure 1 for an illustration. As the paths in $\mathcal{P}$ are drawn horizontally, they have a natural orientation from left to right. Every edge $e$ of $T$ is either horizontal or vertical, depending on whether $e$ belongs to some path in $\mathcal{P}$ or not.
Our motivation for considering path-partitions is the following lemma.
Lemma 4. Every tree of pathwidth k has a path-partition of height at most 2k.
Proof. We prove the following stronger statement: For every tree $T$ of pathwidth $k$ and every vertex $u \in V(T)$, there is a path-partition of $T$ of height at most $2k$ with $u$ in the root-path. The proof is by induction on $k$. For $k = 0$, the tree $T$ consists of the single vertex $u$. Clearly, it has a path-partition of height 0 with $u$ in the root-path. Now suppose $k > 0$ for the inductive case. Let $(P, \mathcal{C})$ be a path decomposition of $T$ of width $k$, where $T_v \in \mathcal{C}$ denotes the path associated to vertex $v \in V(T)$. Enumerate the vertices of the path $P$ indexing the path decomposition as $p_1, \dots, p_n$, in order. We may assume without loss of generality that there are (not necessarily distinct) vertices $x, y \in V(T)$ such that ) and let $D_1, \dots, D_c$ denote its components. Observe that each tree $D_j$ ($j \in \{1, \dots, c\}$) has pathwidth at most $k - 1$. Indeed, For each $j \in \{1, \dots, c\}$, let $d_j$ denote the unique vertex of $D_j$ having a neighbor in $Q_1 \cup Q_2$ in $T$. By induction, each tree $D_j$ ($j \in \{1, \dots, c\}$) has a path-partition $(\mathcal{T}_j, \mathcal{P}_j)$ of height at most $2(k - 1)$ such that $d_j$ is in the root-path. Let $r_j$ denote the root of $\mathcal{T}_j$.
We construct a path-partition $(\mathcal{T}, \mathcal{P})$ of $T$ as follows: $\mathcal{T}$ consists of the disjoint union of $\mathcal{T}_1, \dots, \mathcal{T}_c$ plus two extra vertices $q_1$ and $q_2$ with $q_1$ the root of $\mathcal{T}$ and $q_2$ a child of $q_1$. The paths associated to $q_1$ and $q_2$ are $Q_1$ and $Q_2$, respectively. For each $j \in \{1, \dots, c\}$, we make $q_1$ or $q_2$ adjacent to $r_j$, depending on whether $d_j$ has a neighbor in $Q_1$ or $Q_2$ in $T$. It is easy to verify that $(\mathcal{T}, \mathcal{P})$ is a path-partition of $T$ of height at most $2(k - 1) + 2 = 2k$.
The fact that trees of bounded pathwidth have path-partitions of bounded height is a natural observation. It is thus likely that this observation was made before though we are not aware of any relevant reference. We also note that we made no effort to optimize the bound in Lemma 4 and we do not know whether the factor 2 is unavoidable.
Let $T$ be a tree and fix a path-partition $(\mathcal{T}, \mathcal{P})$ of $T$. Suppose further that $T$ is embedded in the plane faithfully to $(\mathcal{T}, \mathcal{P})$. We use the following terminology when discussing paths in $T$. First, every path $P \in \mathcal{P}$ has a corresponding level, which is defined as the height of the corresponding vertex in $\mathcal{T}$. By extension, every vertex of $T$ has a level, the level of the path in $\mathcal{P}$ it belongs to. Define the base of an arbitrary path $P$ in $T$ as the subpath induced by the vertices of $P$ of minimum level. Since the base of $P$ is a subpath of a path in $\mathcal{P}$, its vertices are ordered from left to right by the plane embedding of $T$. The path $P$ is said to be ascending if at least one of its two endpoints belongs to its base. (We note that in particular all paths in $\mathcal{P}$ are ascending, even though they are drawn horizontally in the embedding of $T$.) Each ascending path $P$ in $T$ has a source, defined as the endpoint of $P$ that is in the base of $P$; in case both endpoints are in the base, the left-most one is selected as the source. We typically think of ascending paths $P$ as being directed from their source to their other endpoint so that the notion of $i$th vertex of $P$ is well defined, the first vertex being the source. An ascending path $P$ with at least two vertices either goes right or goes left or goes up, depending on whether the second vertex of $P$ is on the base and to the right of the source, or on the base and to the left of the source, or is one level higher.
Next we generalize the notion of repetition as follows. A near repetition is a sequence of the form $x_1 \dots x_r y_1 \dots y_g x_1 \dots x_r$, where $r \geq 1$ is its length, and $g \geq 0$ is said to be its gap. (Thus for $g = 0$ this is the usual notion of repetition.) Now, let us return to our tree $T$ from the previous paragraph, and let $\varphi$ denote an arbitrary coloring of its vertices. A slightly technical but key definition for our purposes is the following: An ascending path $P$ of $T$ is said to be $\varphi$-bad if, enumerating its vertices as $v_1, v_2, \dots, v_p$ starting from its source, the sequence $\varphi(v_1)\varphi(v_2) \dots \varphi(v_p)$ forms a near repetition $x_1 \dots x_r y_1 \dots y_g x_1 \dots x_r$ of length $r$ and gap $g$ where at most $r$ vertices from $v_{r+1}, \dots, v_{r+g}$ lie in the base of $P$. (That is, either $g \leq r$, or $g > r$ but at most $r$ vertices from the 'gap' section are in the base of $P$.) An ascending path that is not $\varphi$-bad is said to be $\varphi$-good. We sometimes drop $\varphi$ when using these two adjectives if the coloring $\varphi$ they refer to is clear from the context. Equipped with these definitions we may now state the following technical lemma, which turns out to be the heart of the proof.

Lemma 5.
There is a function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ such that, for every $\ell \geq 1$, every $h \geq 0$, and every tree $T$ faithfully embedded according to a path-partition $(\mathcal{T}, \mathcal{P})$ of $T$ of height $h$ with lists $L_v$ ($v \in V(T)$) of colors of size $f(\ell, h)$, one can find sublists $S_v \subseteq L_v$ ($v \in V(T)$) of size $\ell$ such that, for every coloring $\varphi$ of $T$ with these sublists, every ascending path of $T$ is $\varphi$-good.
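The notion of a bad ascending path used in Lemma 5 can be tested mechanically once the colors along the path and the positions of its base vertices are known. The sketch below assumes the path is given as two parallel lists, ordered from the source; the name `find_bad_split` and this input format are illustrative assumptions.

```python
def find_bad_split(colors, in_base):
    """colors[i] is the color of the (i+1)-th vertex of an ascending path,
    in_base[i] says whether that vertex lies in the base of the path.

    Return (r, g) if the color sequence is a near repetition of length r
    and gap g with at most r gap vertices in the base (the path is bad),
    or None if no such split exists (the path is good).
    """
    p = len(colors)
    for r in range(1, p // 2 + 1):
        g = p - 2 * r
        # The first r colors must reappear as the last r colors.
        if colors[:r] == colors[r + g:]:
            base_in_gap = sum(1 for flag in in_base[r:r + g] if flag)
            if base_in_gap <= r:
                return r, g
    return None
```

For example, the color sequence `a x y z a` is a near repetition with $r = 1$ and $g = 3$. It is good when all five vertices lie in the base (three base vertices in the gap), but bad when only the first two do (one base vertex in the gap).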
In order to motivate Lemma 5, we show that with only a little extra effort (greedy coloring from the sublists) it implies Theorem 2. with $S_v \subseteq L_v$ and $|S_v| = 2k + 1$ for each vertex $v \in V(T)$, such that in any coloring $\varphi$ of $T$ with these sublists, all ascending paths in $T$ are $\varphi$-good.
We define a nonrepetitive coloring $\varphi$ of $T$ with the lists $S_v$ ($v \in V(T)$) in a greedy manner. We color the vertices of $T$ one by one in non-decreasing order of their levels. Let $v \in V(T)$ be a vertex under consideration. Let $v_1 v_2 \dots v_p$ denote the shortest path in $T$ from $v_1 = v$ to the root-path (thus it enters the root-path in vertex $v_p$). Recall that every edge of the form $v_{i-1} v_i$ with $i \in \{2, \dots, p\}$ is either horizontal or vertical in $T$. Let $G(v) := \{v_i : i \in \{2, \dots, p\}, \ v_{i-1} v_i \text{ is vertical}\}$. Each vertex in $G(v)$ is said to be a guard for vertex $v$. Note that $|G(v)|$ is exactly the level of vertex $v$ in $T$, and thus in particular $|G(v)| \leq 2k$. We color $v$ as follows: Let $\varphi(v)$ be an arbitrarily chosen color from the non-empty set $S_v - \varphi(G(v))$. (Here, $\varphi(G(v))$ denotes the set of colors used for vertices in $G(v)$; note that these vertices are already colored since they lie on lower levels.) We claim that $\varphi$ is a nonrepetitive coloring of $T$. Arguing by contradiction, suppose that there is a repetitively colored path $P = v_1 \dots v_p w_1 \dots w_p$. Consider the edge $e = v_p w_1$. First we show that $e$ belongs to the base of $P$. Suppose not, and consider the shortest subpath of $P$ that includes the edge $e$ and has one endpoint in the base of $P$. Reversing $P$ if necessary, we may assume without loss of generality that this subpath is of the form $v_p w_1 \dots w_m$, with $w_m$ being the only vertex on the base. Observe that $w_{m-1} w_m$ is a vertical edge. This implies that $w_m$ is a guard for all the vertices in $\{v_1, \dots, v_p, w_1, \dots, w_{m-1}\}$. In particular, the color $\varphi(w_m)$ cannot have been used for vertex $v_m$, which contradicts $\varphi(v_m) = \varphi(w_m)$. Therefore, the edge $e$ must lie in the base of $P$.
Let $\ell$ and $r$ be the number of vertices in $\{v_1, \dots, v_p\}$ and $\{w_1, \dots, w_p\}$, respectively, that are in the base of $P$. Reversing $P$ if necessary, we may assume without loss of generality that $\ell \leq r$. Consider the path $P' = w_r w_{r-1} \dots w_1 v_p \dots v_1$. Observe that $P'$ is an ascending path as one of its endpoints, namely $w_r$, is in the base of $P'$. Now, $\varphi(w_r)\varphi(w_{r-1}) \dots \varphi(w_1)\varphi(v_p) \dots \varphi(v_1)$ is a near repetition of length $r$ with gap $p - r$, and at most $\ell$ vertices from the gap section are in the base of $P'$. Since $\ell \leq r$, we deduce that $P'$ is $\varphi$-bad, contradicting the fact that every ascending path is $\varphi$-good.
An arborescence is a rooted directed tree where the edges are directed away from the root. It will be convenient to consider arborescences that are embedded in the plane without edge crossings in such a way that the root is drawn at the bottom and all arcs go up (thus the source of an arc is drawn below its sink), which we simply call plane arborescences. The height of a vertex in an arborescence is defined as its distance to the root, thus in particular the root has height 0. The rightmost path of a plane arborescence is the path obtained by starting from the root and always taking the rightmost arc going up, until reaching a leaf.
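As a small illustration of these definitions, a plane arborescence can be stored as a mapping from each vertex to its children in left-to-right order. This representation and the function names below are assumptions made for the sketch.

```python
def rightmost_path(children, root):
    """Follow the rightmost arc from the root until a leaf is reached.

    children maps a vertex to the list of its children, ordered from left
    to right as in the plane embedding; leaves may be missing from the map.
    """
    path = [root]
    while children.get(path[-1]):
        path.append(children[path[-1]][-1])
    return path


def vertex_heights(children, root):
    """Height of every vertex, that is, its distance to the root."""
    height = {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            height[c] = height[v] + 1
            stack.append(c)
    return height
```

With `children = {"r": ["a", "b"], "a": ["c"], "b": ["d", "e"]}`, the rightmost path is `r, b, e`, and the vertices `c`, `d`, `e` all have height 2.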
We classify directed paths in a plane arborescence $A$ as being good or bad w.r.t. a given coloring $\varphi$ of $A$, similarly as for ascending paths: Say that a directed path $P$ is $\varphi$-bad if, enumerating its vertices as $v_1, v_2, \dots, v_p$ in order along $P$, the sequence $\varphi(v_1)\varphi(v_2) \dots \varphi(v_p)$ can be written as a near repetition $x_1 \dots x_r y_1 \dots y_g x_1 \dots x_r$ of length $r$ and gap $g$ where at most $r$ vertices from $v_{r+1}, \dots, v_{r+g}$ lie on the rightmost path of $A$. (That is, either $g \leq r$, or $g > r$ but at most $r$ vertices from the 'gap' section are on the rightmost path.) If the directed path $P$ is not $\varphi$-bad then it is $\varphi$-good.
Lemma 6. Let $\ell \geq 1$, let $A$ be a plane arborescence, and let $L_v$ ($v \in V(A)$) be lists of colors of size $32\ell^3 + 1$. Then one can find sublists $S_v \subseteq L_v$ ($v \in V(A)$) of size $\ell$ such that, for every coloring $\varphi$ of $A$ with these sublists, every directed path starting on the rightmost path is $\varphi$-good.
The interest of Lemma 6 is that Lemma 5 can be proved by iterated applications of Lemma 6, as we now show.
Proof of Lemma 5 (assuming Lemma 6). The function $f(\ell, h)$ that will be used is defined inductively on $h$ as follows: $f(\ell, 0) := 32\ell^3 + 1$, and $f(\ell, h) := f(32(32\ell^3 + 1)^3 + 1, h - 1)$ for $h \geq 1$. Let $T$ be a tree with a path-partition $(\mathcal{T}, \mathcal{P})$ of height $h$ and let $L_v$ ($v \in V(T)$) be lists of colors of size $f(\ell, h)$. Suppose further that $T$ is faithfully embedded according to $(\mathcal{T}, \mathcal{P})$. We prove the lemma by induction on $h$. For the base case of the induction, $h = 0$, we observe that $T$ is then a path and all ascending paths in $T$ are simply subpaths of $T$. As $f(\ell, 0) = 32\ell^3 + 1$, by Lemma 6 there are sublists $S_v \subseteq L_v$ for each vertex $v \in V(T)$ with $|S_v| = \ell$ such that, for every coloring $\varphi$ of $T$ with these sublists, all ascending paths of $T$ are $\varphi$-good, as required. (As expected, when applying Lemma 6 we first turn the path $T$ into an arborescence by directing it from left to right.) For the inductive case $h > 0$, let $x$ be the root of $\mathcal{T}$ and let $P_x$ be the root-path of $T$. Let also $D_1, \dots, D_c$ be the components of the forest $T - V(P_x)$. (Note that there is at least one component.) For each $i \in \{1, \dots, c\}$, the path-partition $(\mathcal{T}, \mathcal{P})$ induces in a natural way a path-partition $(\mathcal{T}_i, \mathcal{P}_i)$ of $D_i$ of height at most $h - 1$, with $\mathcal{T}_i$ rooted at the only vertex that is a neighbor of $x$ in $\mathcal{T}$. Since $f(\ell, h) = f(32(32\ell^3 + 1)^3 + 1, h - 1)$, applying induction on $D_i$ we obtain for each vertex $v \in V(D_i)$ a sublist $S_v \subseteq L_v$ of size $32(32\ell^3 + 1)^3 + 1$ such that, for every coloring $\varphi$ of $D_i$ with these sublists, every ascending path of $D_i$ is $\varphi$-good.
Next, for each vertex $v \in V(P_x)$ let $S_v$ be an arbitrary subset of $L_v$ of size $32(32\ell^3 + 1)^3 + 1$. Thus, every vertex $v$ of $T$ now has a corresponding sublist $S_v \subseteq L_v$ of size $32(32\ell^3 + 1)^3 + 1$. Moreover, given any coloring $\varphi$ of the tree $T$ with these sublists, the only ascending paths that could possibly be $\varphi$-bad are those having their sources in $P_x$. We shall refer to these ascending paths as the risky paths of $T$.
Enumerate the vertices of the root-path $P_x$ as $v_1 v_2 \dots v_n$, from left to right. Define two plane arborescences $A$ and $A'$ from $T$ by rooting $T$ at $v_1$ and $v_n$, respectively, and ensuring that $P_x$ is a prefix of the rightmost path in both instances. Note that the rightmost path of $A$ could extend beyond $P_x$ (in case $v_n$ is not a leaf of $T$), and the same is true for the rightmost path of $A'$ (if $v_1$ is not a leaf). What is important for our purposes is to observe that each risky path of $T$ starts on the rightmost path in both $A$ and $A'$. Observe also that each risky path of $T$ that goes right (left) is a directed path in $A$ (respectively $A'$), and risky paths that go up are directed in both $A$ and $A'$.
First, apply Lemma 6 on $A$ with list assignment $S_v$ ($v \in V(A)$), giving for each vertex $v \in V(T)$ a sublist $S'_v \subseteq S_v \subseteq L_v$ of size $32\ell^3 + 1$. Next, apply Lemma 6 on $A'$ with list assignment $S'_v$ ($v \in V(A')$), giving for each vertex $v \in V(T)$ a sublist $S''_v \subseteq S'_v$ of size $\ell$. Since every risky path of $T$ is mapped to a directed path starting on the rightmost path in $A$ or $A'$, by the properties of the sublists $S'_v$ and $S''_v$ ($v \in V(T)$) guaranteed by Lemma 6 we know that, for every coloring $\varphi$ of $T$ with the lists $S''_v$ ($v \in V(T)$), all risky paths of $T$ are $\varphi$-good. Therefore, the lists $S''_v$ ($v \in V(T)$) have the desired properties.
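To get a feeling for how fast the list size in Lemma 5 grows, the recursive definition of $f$ used in the proof can be evaluated directly. The sketch below unrolls the recursion into a loop; the helper name `lemma6_list_size` is illustrative.

```python
def lemma6_list_size(ell):
    """List size required by Lemma 6 to extract sublists of size ell."""
    return 32 * ell**3 + 1


def f(ell, h):
    """f(ell, 0) = 32*ell^3 + 1 and
    f(ell, h) = f(32*(32*ell^3 + 1)^3 + 1, h - 1) for h >= 1."""
    for _ in range(h):
        # One level of the induction: two nested applications of Lemma 6.
        ell = 32 * lemma6_list_size(ell)**3 + 1
    return lemma6_list_size(ell)


if __name__ == "__main__":
    for h in range(4):
        print(h, len(str(f(1, h))))  # number of decimal digits of f(1, h)
```

Each additional level raises the previous sublist size to roughly the ninth power, so the number of digits of $f(\ell, h)$ grows exponentially in $h$. This is consistent with the remark that the bound obtained from the proof is doubly exponential.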
It remains to prove Lemma 6. As alluded to in the introduction, we will do so by adapting the algorithmic method used in [6,10,13].
Proof of Lemma 6. Let $N := 32\ell^3 + 1$ denote the size of the lists. For $v \in V(A)$, let $\mathrm{UP}(v)$ denote the set of vertices $w \in V(A)$ that can be reached via a directed path from $v$ in $A$. (Note that $v \in \mathrm{UP}(v)$.) In the proof, we will often abbreviate 'subset of size $k$' and 'sublist of size $k$' into '$k$-subset' and '$k$-sublist', respectively.
We define a simple randomized algorithm, Algorithm 1, that tries to find an $\ell$-sublist $S_v$ of $L_v$ for each vertex $v \in V(A)$ such that, for every coloring $\varphi$ of $A$ with these sublists, every directed path starting on the rightmost path of $A$ is $\varphi$-good. The following informal description of the algorithm is complemented by the more formal description given in Algorithm 1. The algorithm explores the arborescence $A$ via a depth-first, left-to-right search starting from the root. The algorithm maintains at all times $\ell$-sublists $S_v \subseteq L_v$ for all vertices $v$ encountered before the current vertex $u$ in the depth-first search of $A$. These sublists have the following property: For every coloring $\varphi$ of these vertices with these sublists ($\varphi$ being thus a partial coloring of $A$), every directed path starting on the rightmost path of $A$ that is fully colored is $\varphi$-good. We say that such a partial sublist assignment is valid.
Next, the algorithm treats the current vertex $u$ and tries to maintain the above property. To do so, the algorithm first chooses an $\ell$-sublist $S_u \subseteq L_u$ uniformly at random. If this new sublist $S_u$ triggers the existence of a $\varphi$-bad path in $A$ for some (partial) coloring $\varphi$ with the current sublists (that is, the current sublist assignment is no longer valid), it erases some of these sublists as follows: Say $v_1 \dots v_{2r+g}$ with $v_{2r+g} = u$ is a $\varphi$-bad path with color sequence $\varphi(v_1) \dots \varphi(v_{2r+g})$ of the form $x_1 \dots x_r y_1 \dots y_g x_1 \dots x_r$. The algorithm then erases the choice for the list $S_v$ for all vertices $v$ contributing to the second occurrence of the repeated sequence and their descendants, that is, for all $v \in \mathrm{UP}(v_{r+g+1})$. At the next iteration, $v_{r+g+1}$ becomes the new current vertex, that is, the next vertex to be treated. Notice that this makes the algorithm backtrack a number of steps w.r.t. the depth-first left-to-right search of $A$.
If, on the other hand, the new sublist $S_u$ does not trigger any such bad configuration, then the current sublist assignment remains valid. In this case, before proceeding to the next random choice the algorithm first tries to extend the current sublist assignment deterministically as much as possible. (While it might not be clear at first glance why this deterministic extension step is needed, we remark that it is actually a key feature of the algorithm without which we could not do the analysis below.) This is done as follows: The algorithm considers the children $u_1, \dots, u_k$ of $u$ one by one in left-to-right order, until a problematic child is identified: When considering $u_j$, the algorithm checks whether there exist $\ell$-subsets $S_v \subseteq L_v$ for all $v \in \mathrm{UP}(u_j)$ such that, taken together, they extend the current sublist assignment in such a way that it remains valid. If these subsets exist, the current sublist assignment is extended in this way to the whole subtree rooted at $u_j$, and the algorithm considers the next child of $u$. (If there is more than one valid choice for these sublists, the algorithm chooses one according to a deterministic rule.) If no such extension of the current sublist assignment can be found for vertices in $\mathrm{UP}(u_j)$, then $u_j$ is identified as being a problematic child of $u$, and $u_j$ becomes the next vertex to be treated. Observe that this effectively makes the algorithm proceed with the depth-first left-to-right search of $A$ for some number of steps.
Let us make some observations concerning the algorithm: Right at the beginning, after selecting a sublist for the root of $A$, two situations can occur: (1) No child of the root is problematic. Thus a valid sublist assignment for all vertices of $A$ has been found, and the lemma is proved. (2) Some child of the root is problematic. In this case, it is important to observe that later on each vertex $u$ for which a random sublist $S_u$ is chosen was problematic when its parent was considered. It follows in turn that some child of $u$ will be problematic, since otherwise we could extend the sublist assignment to the whole subtree rooted at $u$.
To summarize, we may assume that we are in case (2) at the beginning, since otherwise we are done. This implies that the vertex $u$ that is currently being treated by the algorithm always has a problematic child. Moreover, the algorithm will never stop, simply because while it can erase the choices of sublists for some vertices of $A$ it cannot do so for the root, as is easily checked. Our proof will then proceed in the following way: We run the algorithm until it has made $M$ random choices of sublists and then stop it, where $M$ will be some large number which is a function of $|V(A)|$ and $\ell$. We then carefully set up a concise description (called log) of its execution that is precise enough to allow us to recover from it all random choices that were made by the algorithm. Finally, we count the number of distinct logs that can occur after $M$ random choices, and show that, for sufficiently large $M$, this number is strictly less than $\binom{N}{\ell}^M$. From this we deduce that not all sequences of $M$ random choices of sublists can occur in case (2). In other words, there is a choice for the sublist of the root of $A$ leaving us in case (1), which then finishes the proof.
This concludes our informal description of the algorithm; see Algorithm 1 for the pseudocode. A few remarks about the latter are in order: First, we assume that the $\ell$-subsets of $L_v$ have been enumerated for each $v \in V(A)$, so that the $j$-th $\ell$-subset of $L_v$ is well defined for $j \in [\binom{N}{\ell}]$. This ordering also induces an ordering on every subcollection of the collection of $\ell$-subsets of $L_v$. We also use this enumeration in the proof. Second, for simplicity we model the random choices made by the algorithm by a sequence $r_1, r_2, \dots, r_M$ of numbers given in input, each between 1 and $\binom{N}{\ell}$, where $r_i$ will be the number used for the $i$-th random choice. We call this sequence the random input. Third, in line 8, the $\varphi$-bad path is chosen according to some fixed rule. Similarly, in line 15, the sublists $S_v$ are chosen according to some fixed rule. (In each case, the actual rule is irrelevant, as long as it is deterministic.) In the following, by the $i$-th iteration of the algorithm, we mean the $i$-th iteration
Algorithm 1: Attempts to find sublists $S_v$ of $L_v$ for all $v \in V(A)$, each of size $\ell$, such that for every coloring $\varphi$ of $A$ with these sublists, every directed path starting on the rightmost path of $A$ is $\varphi$-good.
$\dots x_r$. With a slight abuse of terminology, we will also say that the corresponding $\varphi$-bad path has been retracted.
From now on we argue by contradiction and suppose that the desired sublists for the vertices of $A$ do not exist. In other words, we assume that every choice for the sublist of the root at the beginning of the algorithm leaves us in case (2) described above. In particular, for all $M$ and all random inputs $r_1, r_2, \dots, r_M$, Algorithm 1 runs for $M$ steps and then reports failure.
We are going to create a concise description of what Algorithm 1 does during the $M$ steps of its execution. This description is completely determined by the lists and the random input. We see the lists $L_v$ ($v \in V(A)$) as being fixed and thus treat the description as a function of the random input $r_1, r_2, \dots, r_M$. The description, which we call an $M$-log, is a tuple $(D, S, B, \Gamma)$.

We start by estimating the number of $M$-tuples $D = (d_1, \dots, d_M)$. Each sequence $D = (d_1, \dots, d_M)$ can be injectively mapped to its sequence of differences $(d_2 - d_1, \dots, d_M - d_{M-1})$. (Note that $d_1 = 1$.) All numbers in this new sequence belong to the set $\{1, 0, -1, -2, \dots\}$, as is easily seen. Next we transform that sequence into yet another sequence by replacing each number $k$ by $1$ followed by $1 - k$ consecutive occurrences of $-1$. For instance, the sequence of differences $(1, 1, 1, 1, 1, -2, -1, 1)$ gets mapped to $(1, 1, 1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1)$. It is easy to see that the second transformation is also injective. The resulting sequence $D'$ is a sequence over the alphabet $\{-1, 1\}$. The number of $1$'s in $D'$ corresponds to the number of times the algorithm assigns a value to some variable $S_u$ in line 6, and is thus equal to the number of iterations, that is, $M$. The number of $-1$'s in $D'$ is the sum of all values of $r$ over all bad paths $v_1 \dots v_{2r+g}$ considered in lines 8-9 during the execution. One can see this as the number of times the algorithm 'erases' a value of $S_v$ for some $v \in V(A)$ that was set earlier using the random input (note that an execution of line 9 erases $r$ such values). Thus, this number is at most the total number of executions of line 6, that is, the number of $1$'s in $D'$, which is $M$. Hence, $D'$ has size between $M$ and $2M$, and there are at most $2^M + 2^{M+1} + \dots + 2^{2M} \leq 2^{2M+1}$ such sequences $D'$.
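The second transformation above can be sketched in a few lines of Python. This is only an illustration of the encoding described in the text, not part of Algorithm 1; the function names `differences`, `encode`, and `decode` are hypothetical. The decoder shows why the map is injective: every $1$ opens a new entry, and every following $-1$ lowers that entry by one.

```python
def differences(d):
    """Sequence of consecutive differences (d_2 - d_1, ..., d_M - d_{M-1})."""
    return [d[i + 1] - d[i] for i in range(len(d) - 1)]


def encode(diffs):
    """Replace each number k <= 1 by a 1 followed by (1 - k) copies of -1."""
    out = []
    for k in diffs:
        if k > 1:
            raise ValueError("every difference must be at most 1")
        out.append(1)
        out.extend([-1] * (1 - k))
    return out


def decode(signs):
    """Inverse of encode: each 1 starts a new entry, each -1 decrements it."""
    diffs = []
    for s in signs:
        if s == 1:
            diffs.append(1)
        elif s == -1 and diffs:
            diffs[-1] -= 1
        else:
            raise ValueError("not a valid encoding")
    return diffs


# The example from the text.
example = [1, 1, 1, 1, 1, -2, -1, 1]
encoded = encode(example)
```

Since `decode(encode(x))` returns `x` for every valid sequence of differences, two distinct sequences cannot share an encoding.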
Next we bound the number of different functions $S$. Note that this number depends only on $N$, $\ell$, and $|V(A)|$, so it can be treated as a constant with respect to $M$. We denote this number by $c$ (its exact value being irrelevant for the analysis).
Since $N$ is much bigger than $\ell$ (recall that $N > 32\ell^3$), each term in the previous sum is upper bounded by the first term $\ell \cdot \binom{N-1}{\ell-1}$, and we obtain that the sum is at most $2\ell \cdot \binom{N-1}{\ell-1}$. (Of course, this is a rather crude upper bound but it is good enough for our purposes.) Hence all numbers in the sequence $\Gamma_i$ are between $1$ and $2\ell \cdot \binom{N-1}{\ell-1}$. Note also that the length of the sequence $\Gamma_i$ is exactly the length of the near repetition retracted during the $i$-th iteration. Hence, given $D$ we know exactly which $\Gamma_i$ are defined and what their lengths are. The sum of these lengths is the total number of $-1$'s in $D'$, which is at most $M$. Therefore, for a fixed $D$ there can be at most $\left(2\ell \cdot \binom{N-1}{\ell-1}\right)^M$ distinct sequences $\Gamma$.
Putting all the previous observations together, we deduce that the number of distinct tuples $(D, S, B, \Gamma)$ is at most as desired. (The $o(\cdot)$ follows from the fact that $N > 32\ell^3$.) This shows that, if $M$ is sufficiently large, then the number of possible $M$-logs is strictly smaller than $\binom{N}{\ell}^M$, the number of random inputs of length $M$. To obtain the desired contradiction, it remains to show that runs of the algorithm on different random inputs produce distinct $M$-logs, that is, that any $M$-log $(D, S, B, \Gamma)$ uniquely determines the random input used by the algorithm to produce it. This is exactly what we show next.
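The idea of recovering the random input from a log can be illustrated in the much simpler setting of sequences described in the introduction (append a symbol, and erase the repeated block whenever a repetition appears). The sketch below is not Algorithm 1: it works on sequences rather than trees, and its log consists only of the sequence lengths after each step together with the final sequence. The names `run`, `recover`, and `all_inputs_recoverable` are hypothetical.

```python
from itertools import product


def run(random_input):
    """Append symbols one by one; when the sequence ends with a repetition
    x x, erase the second copy of x. Returns (lengths, final sequence)."""
    seq, lengths = [], []
    for symbol in random_input:
        seq.append(symbol)
        n = len(seq)
        for h in range(1, n // 2 + 1):
            if seq[n - h:] == seq[n - 2 * h:n - h]:
                del seq[n - h:]  # retract the repeated block
                break
        lengths.append(len(seq))
    return lengths, seq


def recover(lengths, final_seq):
    """Rebuild the random input from the log, going backwards in time."""
    seq = list(final_seq)
    recovered = []
    for i in range(len(lengths) - 1, -1, -1):
        previous_length = lengths[i - 1] if i > 0 else 0
        h = previous_length + 1 - lengths[i]  # size of the block erased at step i
        if h > 0:
            # The erased block was a copy of the h symbols before it.
            seq.extend(seq[-h:])
        recovered.append(seq.pop())  # the symbol appended at step i
    recovered.reverse()
    return recovered


def all_inputs_recoverable(alphabet_size, m):
    """Brute-force check that every input of length m is determined by its log."""
    return all(
        recover(*run(x)) == list(x)
        for x in product(range(alphabet_size), repeat=m)
    )
```

For the example from the introduction, `run(list("abcbc"))` ends with the sequence `a b c` and lengths `1, 2, 3, 4, 3`, and `recover` returns the original five symbols. The proof that follows has the same backward structure, with the additional components $S$, $B$, and $\Gamma$ supplying the information that the tree setting requires.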
Consider an $M$-log $(D, S, B, \Gamma)$ and let $r_1, \dots, r_M$ be any random input that can lead to its production. We prove by induction on $M$ that $r_1, \dots, r_M$ are uniquely determined. This is clearly true if $M = 1$, since the function $S$ tells us explicitly which sublist was chosen for the root of $A$. So assume $M > 1$ for the inductive case. Let $D = (d_1, \dots, d_M)$, $B = (b_1, \dots, b_M)$, and $\Gamma = (\Gamma_1, \dots, \Gamma_M)$.
First suppose that $d_M = d_{M-1} + 1$. Then no near repetition was retracted during the $M$-th iteration. The vertex $u$ that was the current vertex at the beginning of the $M$-th iteration is determined by the function $S$: It is the last vertex $w \in V(A)$ in the depth-first left-to-right search order from the root such that $S(w) \neq$ undefined and $w$ has a child $w'$ with $S(w') =$ undefined. (Note that the first such child $w'$ is the problematic child $u_j$ of $u$ identified when exiting the inner while-loop.) Now, observe that $r_M$ is simply the index of $S(u)$ among the $\ell$-subsets of $L_u$, and is thus completely determined by our $M$-log $(D, S, B, \Gamma)$.
Having determined $r_M$, we can use the inductive hypothesis to deduce that $r_1, \dots, r_{M-1}$ are also fully determined by the log, as follows. Let $(D^*, S^*, B^*, \Gamma^*)$ be the $(M-1)$-log resulting from the execution of the algorithm for $M - 1$ iterations on random input $r_1, \dots, r_{M-1}$. By induction, the latter sequence is uniquely determined by the $(M-1)$-log $(D^*, S^*, B^*, \Gamma^*)$. Hence, it is enough to show that $(D^*, S^*, B^*, \Gamma^*)$ is in turn uniquely determined by our initial $M$-log $(D, S, B, \Gamma)$. Clearly, $D^* = (d_1, \dots, d_{M-1})$, $B^* = (b_1, \dots, b_{M-1})$, and $\Gamma^* = (\Gamma_1, \dots, \Gamma_{M-1})$. As for $S^*$, it is simply obtained from $S$ by letting the value of each vertex $v \in \mathrm{UP}(u)$ be undefined. That is, $S^*(v)$ is undefined if $v \in \mathrm{UP}(u)$ and $S^*(v) = S(v)$ otherwise, for each $v \in V(A)$ (recall that $u \in \mathrm{UP}(u)$).
In the case when $d_M = d_{M-1} - r + 1$ with $r > 0$, a near repetition was retracted during the $M$-th iteration with a repeated part of size $r$. Here we first show that $r_1, \dots, r_{M-1}$ are uniquely determined, and then we prove that the same holds for $r_M$.