Advice Complexity of Adaptive Priority Algorithms

The priority model was introduced to capture "greedy-like" algorithms. Motivated by the success of advice complexity in the area of online algorithms, the fixed priority model was extended to include advice, and a reduction-based framework was developed for proving lower bounds on the amount of advice required to achieve certain approximation ratios in this rather powerful model. To capture most of the algorithms that are considered greedy-like, the even stronger model of adaptive priority algorithms is needed. We extend the adaptive priority model to include advice. We modify the reduction-based framework from the fixed priority case to work with the more powerful adaptive priority algorithms, simplifying the proof of correctness and strengthening all previous lower bounds by a factor of two in the process. We also present a purely combinatorial adaptive priority algorithm with advice for Minimum Vertex Cover on triangle-free graphs of maximum degree three. Our algorithm achieves optimality and uses at most 7n/22 bits of advice. No adaptive priority algorithm can achieve optimality without advice, and we prove that an online algorithm with advice needs more than 7n/22 bits of advice to reach optimality. We show connections between exact algorithms and priority algorithms with advice. The branching in branch-and-reduce algorithms can be seen as trying all possible advice strings, and all priority algorithms with advice that achieve optimality define corresponding exact algorithms, which we call priority exact algorithms. Lower bounds on advice-based adaptive priority algorithms imply lower bounds on the running times of exact algorithms designed in this way.


Introduction
Today everybody who has studied algorithms is familiar with an intuitive notion of a greedy algorithm. A greedy algorithm adheres to the philosophy succinctly stated as "live for today." In many discrete optimization problems, the input can be represented as a sequence of items coming from some infinite universe, and the output of an algorithm can be represented as a sequence of decisions, one decision per item, for example, to accept an item or to reject it. For every input item, a greedy algorithm makes a locally best choice. This can mean different things in different contexts, but most often it means that the algorithm pretends that each input item is the last one it is going to receive and makes its choice about that item so as to optimize the objective under that assumption. This is only an intuitive understanding of greedy algorithms; how are they defined formally? One of the earliest attempts to answer this question can be attributed to the development of the theory of matroids by Whitney in 1935 [31]. More recently, this theory has been extended to greedoids by Korte and Lovász [24,25,26,23]. In spite of the profound connection between greedoids and optimization problems admitting optimal greedy algorithms, greedoids do not give a complete characterization of greedy algorithms. In fact, to this day, almost 85 years after the introduction of matroids, there is still no consensus in the research community on a formal definition of greedy algorithms. Priority algorithms were introduced by Borodin, Nielsen, and Rackoff [9] in an attempt to formalize "greedy-like" or "myopic" algorithms. This model has been studied in the context of many combinatorial optimization topics, including classic graph problems [1,11,6,2], makespan minimization [30], satisfiability [29], and auctions [8]; general results appear in many of these contributions as well as in [27].
Many classical greedy algorithms have a simple structure consisting of two components: (1) a sorting/ordering/priority component, and (2) an online/irrevocable decision component. The second component was described in the previous paragraph, while the first component determines the order in which the items are processed by the second component. Priority algorithms have this structure, and they come in two flavors: fixed and adaptive. We illustrate these models with two well-known examples.
First, consider the earliest finishing time (EFT) algorithm for the interval scheduling problem. The universe of input items is $U = \{(s, f) \mid s, f \in \mathbb{Q},\ 0 \le s < f\}$. An input instance is a finite subset $I \subset U$. The EFT algorithm can be thought of as ordering the entire universe U (by ascending finishing times f) prior to seeing any of the inputs. The adversary then feeds the input I to the algorithm, but in the order specified by the algorithm. The algorithm makes an irrevocable decision about each newly arriving item: accept the interval if it does not overlap the partial solution so far, and reject it otherwise. This describes the typical framework of fixed priority algorithms. An extension of this basic setup leads to adaptive priority algorithms: algorithms that can change the ordering of the universe after processing each input item. An example of an adaptive priority algorithm is Prim's algorithm for the minimum spanning tree problem. The universe consists of triples $U = \{(u, v, w) \mid u, v \in \mathbb{N},\ w \in \mathbb{Q},\ u \neq v\}$, where (u, v) indicates an edge between vertices u and v, and w is the weight of this edge. Prim's algorithm orders edges by increasing weight, but it has to maintain a single connected component. Thus, the algorithm gives higher priority to edges incident to vertices already added to the solution. Since the set of vertices in the solution keeps growing, the ordering (the priority function) keeps changing while input items are being processed.
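The two examples can be made concrete with a short Python sketch. It is illustrative only and rests on two assumptions of ours: intervals are treated as half-open, so touching intervals do not overlap, and the graph given to Prim's algorithm is connected. The function names `eft_schedule` and `prim_adaptive` are hypothetical. The EFT sketch fixes its ordering once, before any decision, while the Prim sketch recomputes its priority function after every decision.

```python
from typing import Iterable

Interval = tuple[float, float]
Edge = tuple[int, int, float]


def eft_schedule(intervals: Iterable[Interval]) -> list[Interval]:
    """Fixed priority: the order (ascending finishing time) is set in advance."""
    accepted: list[Interval] = []
    last_finish = float("-inf")
    for s, f in sorted(intervals, key=lambda iv: iv[1]):
        # Irrevocable decision: accept iff the interval does not overlap
        # the partial solution (intervals are treated as half-open [s, f)).
        if s >= last_finish:
            accepted.append((s, f))
            last_finish = f
    return accepted


def prim_adaptive(edges: list[Edge], root: int) -> list[Edge]:
    """Adaptive priority: the priority function changes after every decision.

    Assumes the input graph is connected."""
    remaining = list(edges)
    in_tree = {root}
    accepted: list[Edge] = []
    while remaining:
        def priority(e: Edge) -> tuple[int, float]:
            u, v, w = e
            crossing = (u in in_tree) != (v in in_tree)
            # Edges leaving the current tree come first, then by increasing weight.
            return (1 if crossing else 0, -w)

        item = max(remaining, key=priority)
        remaining.remove(item)
        u, v, _ = item
        if (u in in_tree) != (v in in_tree):
            accepted.append(item)  # accept: extends the single component
            in_tree |= {u, v}
        # Otherwise reject: both endpoints are already in the tree.
    return accepted
```

For example, `eft_schedule([(0, 3), (2, 5), (4, 7), (6, 9)])` keeps `(0, 3)` and `(4, 7)`.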
The priority model has a lot in common with the model of online algorithms. Priority algorithms can be seen either as extending the power of online algorithms by allowing a limited ordering of input items, or as limiting the power of the adversary by not giving it full control over the order of items. An online algorithm assumes no knowledge of future input items and has no control over the order in which the items arrive. Nonetheless, an online algorithm is required to make an irrevocable decision for each input item. The assumption that an online algorithm does not see the future at all is quite restrictive and in many cases impractical. It is often the case that some information about the input sequence is known in advance, e.g., its length, the largest weight of an item, etc. An information-theoretic way of capturing this side knowledge is given by the advice tape model of Hromkovič et al. [19] (further developed by Böckenhauer et al. [5]). In this model, an all-powerful oracle that sees the entire input sequence creates a short advice string that is written on an infinite tape. The algorithm uses the advice string in processing the online items. The main object of interest is the trade-off between the amount of advice and the competitive ratio of an online algorithm. The worst-case length of advice is called the advice complexity. Often, a short advice string results in a dramatic improvement over the best competitive ratio that is achievable by an online algorithm without advice. Of course, a short advice string can be computationally difficult to obtain, since the oracle has unlimited computational power. Advice complexity of online algorithms has recently become a very active research area; see [10] for a survey on online algorithms with advice, including an extensive list of articles. Of most relevance to us are results concerning graph algorithms [3,12,13,18,17,21,22,28].
Recently, a superset of the current authors introduced an extension of the fixed priority model to include advice [7]. As with online algorithms in the advice tape model, an oracle sees the entire input and writes an advice string on the tape. The advice is then read by the priority algorithm at its discretion during the runtime. The oracle and the algorithm cooperate and are in full agreement about how the advice is generated and used. In this model, we are interested in the trade-off between the length of the advice and the approximation ratio achieved by such an advice-based algorithm. In addition to introducing this model, [7] also developed a general framework for proving lower bounds in this model and applied this framework to several classical problems, such as maximum independent set, maximum bipartite matching, and vertex cover. That research parallels and extends recent developments and successes in the area of online algorithms with advice. However, it left open the question of whether the ideas can be extended to the (arguably more useful) adaptive priority model. This paper addresses that question and presents three main contributions. The first contribution is conceptual: we introduce the notion of advice in the adaptive priority model and identify three natural models based on how the priority function is allowed to depend on the advice. The second contribution is technical: we study the classical vertex cover problem on graphs of maximum degree 3. We present an adaptive priority algorithm with advice that achieves optimality. It was known that adaptive priority algorithms for this problem cannot achieve optimality without advice [6]. In addition, we show that online algorithms must use more advice than our algorithm to achieve optimality. Thus, adaptive priority and advice together can be strictly more powerful than either one by itself.
Our algorithm is purely combinatorial, with an involved analysis. A large part of the proof relies on a thorough case analysis. This is the most technical of our contributions. The third main contribution is both technical and conceptual: we extend the general lower bound framework of [7] to work in the most powerful of the newly introduced adaptive priority models with advice. When we apply this framework to the same classical problems (independent set, bipartite matching, etc.), we immediately obtain results similar to, but stronger than, those in [7]. In addition, we simplify the proof that the framework works, and we strengthen the lower bounds implied by the framework by a factor of 2. The reason for these improvements is our observation that it is possible to derive lower bounds by reducing directly from the online problem of binary string guessing, instead of going through an intermediate pair matching problem in the priority model as in [7].
The remainder of the paper is organized as follows. Section 2 introduces the three adaptive priority models with advice. In Section 3 we introduce an artificial problem called "thorny path" and show how upper and lower bounds can be derived in adaptive priority models with advice. In Section 4 we present our adaptive priority algorithm for the vertex cover problem on graphs of degree at most 3 and analyze its advice complexity. Section 5 presents the extension of the general lower bound framework of [7] to adaptive priority with advice. The paper ends with conclusions and some open problems in Section 6.

Models
A request-answer game is specified by the universe of input items $U$, the universe of decisions $D$, the objective function $\mathrm{OBJ} : U^n \times D^n \to \mathbb{R} \cup \{\pm\infty\}$, and the type of the problem, which is either "maximization" or "minimization". An input to the request-answer game is a finite multi-set of items from the universe, i.e., $X = \{x_1, \ldots, x_n\}$ where $x_i \in U$. We assume that the objective function is invariant under simultaneous permutations of input items and decisions, i.e., for all $x_1, \ldots, x_n$, all $d_1, \ldots, d_n$, and all $\pi \in S_n$ we have

$$\mathrm{OBJ}(x_1, \ldots, x_n, d_1, \ldots, d_n) = \mathrm{OBJ}(x_{\pi(1)}, \ldots, x_{\pi(n)}, d_{\pi(1)}, \ldots, d_{\pi(n)}).$$
Note that the objective function is actually a family of functions, one for each input length $n \in \mathbb{N}$. The values ±∞ in the objective can be used to specify infeasible inputs. The setting of request-answer games is very general and includes most problems of interest in the areas of online and priority algorithms.
A function $P : U \to \mathbb{R}$ is called a priority function. We introduce the shorthand notation $\max_P X := \arg\max\{P(x) : x \in X\}$ for the element of highest priority in the multi-set X. If there are multiple elements of highest priority, we assume that ties are broken adversarially, i.e., in the way that is most unfavorable for our algorithms.
A priority algorithm ALG does not see all of the input X at once. Instead, ALG receives X one item at a time. The priority algorithm controls the order in which X is revealed by specifying priority functions. The next input item from X is revealed according to the specified priority function. We shall consider priority algorithms in the advice tape model [19,5].
In the advice tape model, there are two cooperating players: the oracle and the algorithm. The oracle sees the entire input X and writes advice to the algorithm on the infinite advice tape over the binary alphabet. The contents of the infinite tape are denoted by $a$, which is an infinite string $a_1 a_2 a_3 \cdots$ with $a_i \in \{0, 1\}$. The algorithm can decide to read zero or more bits from the advice tape (sequentially, from left to right) before making each decision. We use $s_i$ to refer to the prefix of the advice tape that has been read so far by the algorithm. The maximum number of advice bits read, i.e., $|s_n|$ (where the maximum is taken over all inputs of length n), is the advice complexity of the algorithm. See Algorithm 1 for the template of a priority algorithm with advice.
Algorithm 1 Template of a Priority Algorithm ALG with Advice
1: $X$ is the input
2: $P_0$ is the initial priority function
3: $i \leftarrow 1$
4: while $X \neq \emptyset$ do
5:     read zero or more advice bits from $a$
6:     $x_i \leftarrow \max_{P_{i-1}} X$
7:     $s_i \leftarrow$ the known contents of the advice string
8:     $d_i \leftarrow$ decision of ALG for input $x_i$
9:     $X \leftarrow X \setminus \{x_i\}$
10:    $P_i \leftarrow$ the updated priority function
11:    $i \leftarrow i + 1$

The priority algorithm is specified by three elements: (1) how the priority functions $P_i$ are chosen in each step, (2) how the advice is read, and (3) how the decisions $d_i$ are made. The decisions are always functions of the input seen so far and the advice read so far, i.e., $d_i = d_i(x_1, \ldots, x_i, s_i)$. Depending on the details of (1), we distinguish the following three models.

Model 1. We allow the priority functions to depend on the input received so far and the advice read so far.
Model 2. We allow the priority functions to depend on the input received so far and the decisions made so far.
Model 3. We allow the priority functions to depend only on the input received so far.

Observe that any algorithm that works in Model 2 also works in Model 1 (since the input and advice determine the decisions), and similarly, any algorithm that works in Model 3 can be simulated by an algorithm in Model 2. Thus, we refer to Model 1 as the strongest model of priority algorithms with advice, Model 3 as the weakest model, and Model 2 as the intermediate model. Observe that Models 1 and 2 coincide when the advice encodes the decisions to be made about some input items. For example, decisions could be "accept/reject", and the advice could determine exactly what to do when an optimal decision cannot be inferred from past input items.
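A minimal Python sketch of the Algorithm 1 loop follows; it is not code from the paper. All names (`AdviceTape`, `run_priority_algorithm`, `decide`, `update_priority`) are hypothetical, and the sketch implements Model 1, since `update_priority` sees both the history and the advice prefix. Ties are broken by position in the input list, whereas the model itself assumes adversarial tie-breaking; a caller can emulate the adversary by choosing that order.

```python
from typing import Any, Callable, Sequence

Item = Any
Decision = Any


class AdviceTape:
    """Binary advice tape read sequentially from left to right.

    The oracle writes only finitely many bits; reading past them returns 0,
    which stands in for the rest of the infinite tape (an assumption)."""

    def __init__(self, bits: Sequence[int]) -> None:
        self._bits = list(bits)
        self._read: list[int] = []

    def read_bit(self) -> int:
        i = len(self._read)
        bit = self._bits[i] if i < len(self._bits) else 0
        self._read.append(bit)
        return bit

    def prefix(self) -> tuple[int, ...]:
        """s_i: the part of the tape read so far."""
        return tuple(self._read)


def run_priority_algorithm(
    items: Sequence[Item],
    initial_priority: Callable[[Item], float],
    decide: Callable[[Item, list, AdviceTape], Decision],
    update_priority: Callable[[list, tuple], Callable[[Item], float]],
    advice: Sequence[int],
) -> tuple[list, int]:
    """Template loop of Algorithm 1 (Model 1: priorities may use input and advice).

    Returns the list of (x_i, d_i) pairs and the number of advice bits read."""
    remaining = list(items)
    tape = AdviceTape(advice)
    priority = initial_priority  # P_0
    history: list = []
    while remaining:
        x = max(remaining, key=priority)  # x_i = max_{P_{i-1}} X (first maximum wins)
        d = decide(x, history, tape)      # may read zero or more advice bits
        history.append((x, d))
        remaining.remove(x)               # X <- X \ {x_i}
        priority = update_priority(history, tape.prefix())  # P_i
    return history, len(tape.prefix())
```

Restricting what `update_priority` may inspect (the items and decisions in `history` only, or the items only) would give sketches of Models 2 and 3.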
To separate Models 1 and 2, consider the artificial problem of computing a spanning tree whose average edge weight is as far as possible from the average edge weight of the entire graph. Since every spanning tree has the same number of edges, its average edge weight is determined by its total weight, so one must compute either a minimum or a maximum spanning tree. One bit of advice is required to define a priority function that makes even the very first decision correctly.
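A small sketch of this separating example, under stated assumptions: the graph is connected with at least two vertices, and Kruskal's algorithm is used as a matter of convenience, since its single sorted edge order is a fixed priority function. The single advice bit selects ascending or descending weight order before the first item is requested, which is possible in Model 1 but not in Model 2. The helper names (`kruskal`, `oracle_bit`, `far_average_tree`) are hypothetical.

```python
Edge = tuple[int, int, float]


def kruskal(n: int, edges: list[Edge], descending: bool) -> list[Edge]:
    """Fixed order by weight; accept an edge iff it joins two components."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree: list[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=descending):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def average(ws: list[float]) -> float:
    return sum(ws) / len(ws)


def oracle_bit(n: int, edges: list[Edge]) -> int:
    """Sees the whole graph; writes 1 iff the maximum spanning tree is farther
    from the graph's average edge weight than the minimum spanning tree."""
    graph_avg = average([w for _, _, w in edges])

    def distance(tree: list[Edge]) -> float:
        return abs(average([w for _, _, w in tree]) - graph_avg)

    far_max = distance(kruskal(n, edges, descending=True))
    far_min = distance(kruskal(n, edges, descending=False))
    return 1 if far_max > far_min else 0


def far_average_tree(n: int, edges: list[Edge], bit: int) -> list[Edge]:
    # Model 1: the very first priority function already depends on the advice.
    return kruskal(n, edges, descending=bool(bit))
```

On a triangle with weights 1, 1, 10, the oracle writes 0 and the two weight-1 edges are kept; with weights 1, 10, 10, it writes 1 and the two weight-10 edges are kept.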
We have presented the most general version of priority algorithms, called adaptive, since the $P_i$ can adapt to the input being revealed. In fixed priority algorithms, we insist that $P_i = P_0$ for all $i \ge 1$.
When including advice, one can ask how computationally expensive it is to generate that advice. This can vary significantly from one algorithm or application to the next, but the model allows it to be very expensive. This is in line with the information-theoretic nature of the priority model itself. Observe that the priority model does not impose any computational restrictions on the priority functions or even on the decisions of the algorithm. This is similar to other areas of theoretical computer science, such as online algorithms, communication complexity, and decision tree complexity. These models sidestep hard computational questions, such as P vs. NP, by introducing informational bottlenecks. The strengths of this information-theoretic modeling are that it makes the proven lower bounds stronger and that it makes it possible to prove results that do not depend on unproven assumptions in complexity theory. Its main weakness is that the algorithms designed might be impractical. However, priority algorithms achieving good approximation ratios tend to have easily computable priority functions and easily computable decisions. Advice, unfortunately, is often not easily computable. There is a generic trick for going from a priority algorithm with advice to an offline algorithm with the same approximation ratio: if the algorithm uses advice of length $\ell$, then one can enumerate all $2^\ell$ advice strings and execute the algorithm on each of them, keeping track of the best run. This conversion is only efficient for small values of $\ell$, but even larger values of $\ell$ might lead to interesting offline algorithms. This is the case for algorithms that achieve optimality for NP-hard problems, where the conversion can lead to exponential-time algorithms with improved runtimes compared to brute force. This is, indeed, the case for our adaptive priority algorithm with advice for vertex cover on graphs of maximum degree 3 (see Section 4). It uses at most 15n/46 bits of advice on graphs with n vertices. Using the generic trick, we obtain an optimal offline algorithm that runs in time $2^{15n/46}\,\mathrm{poly}(n) < 1.254^n\,\mathrm{poly}(n)$. This is much better than the naive $2^n\,\mathrm{poly}(n)$ brute-force method; however, there are other, more involved, optimal offline algorithms achieving even better runtimes. To the best of our knowledge, the current state-of-the-art offline algorithm for this problem has runtime $\approx 1.08^n\,\mathrm{poly}(n)$ [20], but that algorithm does not arise out of a priority algorithm with advice, and no priority algorithm without advice can achieve an approximation ratio better than 4/3 [6].
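The generic trick can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's construction: `run_with_advice` is a hypothetical callable that runs some priority algorithm with advice on one fixed instance and reports (objective value, solution), or `(None, None)` for an infeasible run. The toy "algorithm" below, which simply accepts vertex i of a path iff advice bit i is 1, is also hypothetical and only exercises the enumeration; it is unrelated to the algorithm of Section 4.

```python
from itertools import product
from typing import Callable, Optional, Tuple

# A run maps an advice string to (objective value, solution); the value is
# None when the run produced an infeasible solution.
Run = Callable[[Tuple[int, ...]], Tuple[Optional[float], object]]


def exact_via_advice(run_with_advice: Run, advice_length: int,
                     minimize: bool = True):
    """Try all 2**advice_length advice strings and keep the best feasible run."""
    best = None
    for bits in product((0, 1), repeat=advice_length):
        value, solution = run_with_advice(bits)
        if value is None:
            continue
        if best is None or (value < best[0] if minimize else value > best[0]):
            best = (value, solution)
    return best


# Hypothetical toy algorithm with advice: vertex i of the path 0-1-2-3 is
# accepted iff advice bit i is 1.
PATH_EDGES = [(0, 1), (1, 2), (2, 3)]


def toy_run(bits: Tuple[int, ...]) -> Tuple[Optional[float], object]:
    cover = {i for i, b in enumerate(bits) if b}
    if all(u in cover or v in cover for u, v in PATH_EDGES):
        return len(cover), cover
    return None, None
```

The total cost is $2^\ell$ times the cost of one run. For the exponent quoted above, `2 ** (15 / 46)` evaluates to about 1.2536, which is consistent with the bound of 1.254.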
A significant motivation for originally introducing and studying priority algorithms was to develop a framework for proving lower bounds for a large collection of algorithms at the same time: establishing that no fixed (or adaptive) priority algorithm can attain a certain approximation ratio implies that one has to look beyond this fairly broad design pattern to possibly discover an algorithm with a better approximation ratio. We note that this motivation is just as relevant for the design of exact or approximation algorithms using the framework outlined above.

Warm-Up: Thorny Path
We call a graph G a thorny path if its vertices can be arranged in layers such that each layer except the first has exactly two vertices. One of the vertices in layer i is connected to the two vertices in layer i + 1, while the other vertex in layer i is not connected to any other vertex except its parent. The first layer is special, as it consists of a single vertex, and so is the last layer, whose two vertices are not connected to any other vertex except their common parent. An example of a thorny path graph is shown in Figure 1. We define the thorny path problem as follows. You are given a graph G that consists of several components, each of which is a thorny path. You are also given a start vertex s of one of the components of G. Your goal is to construct a path from s to one of the leaves in the last layer. The universe of input items is $U = \mathbb{Z}^3$. An input item (u, v, w) is interpreted as vertex labels such that vertex u is in some layer i and its two children are v and w. The universe of decisions is $D = \{0, 1, \bot\}$.
Given an input item (u, v, w), the decision 0 means to include edge (u, v) in the solution, the decision 1 means to include edge (u, w) in the solution, and the decision ⊥ means to include neither of the two edges in the solution. The thorny path problem is parameterized by a single parameter $k \in \mathbb{N}$, which is one less than the number of layers in the thorny path starting with vertex s. We refer to the parameterized thorny path problem as the k-thorny path problem. We begin with a simple observation.

Proof. Starting from s, use advice bits to decide which of the two children to follow. Adaptivity in the priority function is used to re-sort input items by the ID of the child being followed.

Proof. Assume that we have adaptive priority algorithms without advice called $\mathrm{ALG}_1, \ldots, \mathrm{ALG}_\ell$. We fix n large enough (to be specified later) and let $x_1, \ldots, x_n \in \mathbb{Z} \setminus \{1\}$ be distinct. Let S be the set of all triples formed from $\{x_1, \ldots, x_n\}$, together with all triples where 1 is the first element of the triple and the other two elements come from $\{x_1, \ldots, x_n\}$. We construct a thorny path instance I (with one connected component) such that each of the algorithms $\mathrm{ALG}_1, \ldots, \mathrm{ALG}_\ell$ makes a mistake on I. We construct I iteratively. In step j, we construct a subinstance $I_j$ that guarantees that algorithm $\mathrm{ALG}_j$ makes a mistake. The thorny path $I_j$ starts at vertex 1 and ends in two leaves. In addition to $I_j$, we keep track of a leaf $v_j$ that is going to be extended in step j + 1. We also keep track of a set of input items $S_j \subseteq S$ that can be used in the extension of $I_j$. The condition that $\mathrm{ALG}_j$ makes a mistake on $I_j$ continues to hold no matter how $I_j$ is extended with elements from $S_j$.
Initially, $I_0$ is a thorny path consisting of the single vertex 1, and none of the algorithms has made a mistake yet. We have $v_0 = 1$ and $S_0 = S$. This is the base case.
Assume that we have constructed a thorny path $I_j$ out of some items from $S_j$ and that the leaf of $I_j$ to be extended is $v_j$. Moreover, each of $\mathrm{ALG}_1, \ldots, \mathrm{ALG}_j$ makes a mistake on $I_j$ and continues to make a mistake no matter how $I_j$ is extended by elements from $S_j$. Consider running $\mathrm{ALG}_{j+1}$ on input $I_j \cup S_j$ (even though it is an invalid input). In each iteration, it either requests an element from $I_j$, updates the priority function, and makes a decision, or it requests an element from $S_j$. Consider the first time $\mathrm{ALG}_{j+1}$ requests an element from $S_j$. If $\mathrm{ALG}_{j+1}$ has made a mistake on an element from $I_j$ by that time, then we can simply take $I_{j+1} = I_j$, $v_{j+1} = v_j$, and $S_{j+1} = S_j$. All the properties are easy to verify in this case. Otherwise, let (x, y, z) be the first element from $S_j$ that is requested by $\mathrm{ALG}_{j+1}$. Without loss of generality, assume that the decision of $\mathrm{ALG}_{j+1}$ is to accept edge (x, y) and not (x, z). If $x = v_j$, then we extend $I_{j+1} = I_j \cup \{(x, y, z)\}$, and $S_{j+1}$ is $S_j$ with all items involving y or x removed, as well as those items that have z as the second or third coordinate. Observe that this makes sure that $\mathrm{ALG}_{j+1}$ makes a mistake on item (x, y, z) and that this fact is unaffected by further extensions of $I_{j+1}$. In this case, we have $v_{j+1} = z$. The last case to consider is when $\mathrm{ALG}_{j+1}$ requests (x, y, z) from $S_j$ and $x \neq v_j$. In this case, we also consider an item $(v_j, x, w) \in S_j$ for some w that is different from any other value appearing in the construction so far. By the way the sets $S_j$ are constructed, and by taking n large enough, such a w is guaranteed to exist. We extend $I_{j+1} = I_j \cup \{(v_j, x, w), (x, y, z)\}$. Again, without loss of generality, assume that $\mathrm{ALG}_{j+1}$ accepts (x, y) rather than (x, z). We again set $v_{j+1} = z$ and let $S_{j+1}$ be the set $S_j$ with all items involving x, y, w, or $v_j$ removed, as well as those items that have z as the second or third coordinate. This guarantees that $\mathrm{ALG}_{j+1}$ makes a mistake on item (x, y, z) and continues to make a mistake on this item no matter how $I_{j+1}$ is extended with elements from $S_{j+1}$.
Lastly, observe that each $S_j$ can be defined by some subset $F \subseteq \{x_1, \ldots, x_n\}$. Namely, $S_j$ consists of all triples formed from F, as well as the triples whose first coordinate equals $v_j$ and whose remaining two coordinates come from F. In each iteration going from j to j + 1, at most 4 elements are removed from F. Therefore, we can fix $n = 4\ell$ to guarantee that the construction terminates only after all algorithms are fooled by the instance.
Finally, an algorithm in Model 1 with b bits of advice is equivalent to running $2^b$ algorithms in parallel. Thus, we have $2^b$ algorithms that can all be fooled simultaneously by an instance of size $4 \cdot 2^b$. Letting $N = 2^{b+2}$, we see that $\log N - 2$ bits of advice are not enough to solve the thorny path problem.
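The upper-bound strategy from the first proof in this section can be sketched in Python: follow the path from s, spend one advice bit at each vertex to pick a child, and re-prioritize the remaining items by the ID of the child being followed. Because the priority depends on the vertex being followed, which is fixed by earlier decisions, the sketch fits Model 2. The helper names are hypothetical, all labels are assumed distinct, and the one-bit-per-path-vertex budget is just what this sketch uses; it is not claimed to be the exact bound of the observation.

```python
import random
from typing import Callable, Optional

Item = tuple[int, int, int]


def make_thorny_path(k: int, fresh: Callable[[], int], rng: random.Random):
    """Build the k items of a k-thorny path (k + 1 layers) with fresh labels.

    Returns (items, start, last_layer, advice); advice[j] says which child of
    the j-th path vertex continues the path (0: second, 1: third coordinate)."""
    start = fresh()
    current, items, advice = start, [], []
    for _ in range(k):
        v, w = fresh(), fresh()
        items.append((current, v, w))
        bit = rng.randrange(2)
        advice.append(bit)
        current = (v, w)[bit]  # the other child stays a leaf
    last_layer = set(items[-1][1:])
    return items, start, last_layer, advice


def follow_with_advice(items: list[Item], start: int, advice: list[int]):
    """Adaptive priority sketch: the item whose first coordinate is the vertex
    currently being followed always has the highest priority."""
    remaining = list(items)
    bits = iter(advice)
    current = start
    decisions: dict[Item, Optional[int]] = {}
    while remaining:
        x = max(remaining, key=lambda it: it[0] == current)
        remaining.remove(x)
        if x[0] == current:
            d = next(bits)       # read one advice bit: which child to follow
            decisions[x] = d     # 0 -> edge (u, v), 1 -> edge (u, w)
            current = x[1 + d]
        else:
            decisions[x] = None  # bottom: include neither edge
    return current, decisions
```

In use, the oracle's side is `make_thorny_path`, which knows the continuing child at each layer. The algorithm's side, `follow_with_advice`, sees only the item list (possibly mixed with other components), the start vertex, and the advice bits.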

Solving Vertex Cover to Optimality
Given a simple undirected graph G = (V, E), a subset of vertices S ⊆ V is called a vertex cover if every edge is incident to at least one vertex from S. The vertex cover problem is to find a vertex cover of minimum possible size. We consider this problem on graphs of maximum degree 3 in the online and priority settings. An input item is a vertex together with a complete list of its neighbors (including those vertices that have not even appeared as part of the input yet); this is known as the vertex arrival, vertex adjacency model. As mentioned earlier, no adaptive priority algorithm without advice can achieve an approximation ratio for this problem better than 4/3 [6].
In this section, we show that asymptotically this problem requires at least n/3 bits of advice to solve optimally in the online setting, while it can be solved optimally with ≤ 15n/46 ≈ 0.326n bits of advice in the adaptive priority setting.
We begin with the negative result for the online setting.
Theorem 4.1 An online algorithm that accepts a minimum size vertex cover on any graph with maximum degree 3 requires at least (n − 4)/3 bits of advice.
Proof. The graphs constructed by the adversary will have $n = 6n' + 1$ vertices, for $n' \ge 2$. Let $S = \{v_1, v_2, \ldots, v_{2n'}\}$ be the first $2n'$ vertices to arrive. All vertices in S will have degree 2, and their neighbors will be vertices never seen before. Each graph G that may be created from these vertices will have a set $I_G$ of independent copies of paths of length two, where the middle vertex is in S and its two neighbors have degree 1. All other vertices will be in another set $C_G$, which will be a cycle (if there are at least 7 vertices not in $I_G$), plus one extra vertex w of degree 1, adjacent to some vertex $v \in C_G \setminus S$. The vertex v will have degree 3, the extra vertex will have degree 1, and all other vertices in $C_G$ will have degree 2. Within $C_G$, the paths of length two defined by vertices in S will be joined by one edge. There will be an even number of vertices from S in $I_G$ and an even number in $C_G$. The construction is illustrated in Figure 2.

Figure 2: The construction used in Theorem 4.1. Here, we have $n' = 4$. The optimal vertex cover is shown in green. Nodes with a single arrow pointing into them are those nodes from S that were selected to be at odd distance from node v. Nodes with two arrows pointing into them are those nodes from S that were selected to be at even distance from node v. Here, we have $n^G_I = 4$, $n^G_C = 4$, and $r = 2$.
Let $n^G_I$ denote $|S \cap I_G|$ and $n^G_C$ denote $|S \cap C_G|$, and let $n' = (n^G_I + n^G_C)/2$. Note that this graph G has a unique minimum size vertex cover, since the middle vertex of each path in $I_G$ is in that cover, and every other vertex in $C_G$, starting with v, is also in that cover. This means that every second vertex in $S \cap C_G$ is in that cover, with the neighbor of v in S not being in it. Thus, the optimal vertex cover has size $n^G_I + 3n^G_C/2$. The number of vertices in S that are in the optimal vertex cover is $n^G_I + n^G_C/2$. For each vertex $u \in S$ (all of which have degree 2), ALG must decide whether to accept or reject u without knowing whether u is in $I_G$ or in $C_G$. Moreover, within $C_G$, ALG does not know whether u will have an even or an odd distance from v.
Note that it is possible that $n^G_C = 0$. Suppose we want to create a graph G with $0 \le r \le n'$ vertices from S not in the optimal vertex cover. We can choose any subset R of r vertices in S to be at odd distances from v in $C_G$. Among the other vertices, r can be placed at the even locations in $C_G$, and the other $2n' - 2r$ can be in $I_G$. (Note that the placement of v is also arbitrary, but we fix a placement in this counting.) For fixed r, there are $\binom{2n'}{r}$ different possibilities for the subset R. In all, there are $\sum_{r=0}^{n'} \binom{2n'}{r}$ different possibilities for the subset R, each with a different optimal vertex cover. Any online algorithm that gets the same advice for two of them must give a suboptimal cover for at least one of them. Thus, such an algorithm needs at least

$$\log_2 \sum_{r=0}^{n'} \binom{2n'}{r} > \log_2 2^{2n'-1} = 2n' - 1 = \frac{n-4}{3}$$

bits of advice for optimality.
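The structural claims and the counting step of this proof can be checked on small cases with the Python sketch below. It fixes one placement: the first $2r$ vertices of S go on the cycle in order, starting at v, so they alternate between odd and even distance from v. Brute-force search is feasible only for tiny $n'$. The function names are hypothetical, and the sketch only tests particular instances; it does not replace the proof.

```python
from itertools import combinations
from math import comb, log2


def build_instance(n_prime: int, r: int) -> dict[int, set[int]]:
    """One adversary graph from the proof (a fixed placement, 0 <= r <= n').

    Vertices 0..2n'-1 form S. The first 2r of them are put on the cycle C_G,
    alternately at odd and even distance from v; the rest form isolated paths
    of length two in I_G."""
    adj: dict[int, set[int]] = {}
    labels = iter(range(2 * n_prime, 6 * n_prime + 1))  # 4n' + 1 new vertices

    def add_edge(a: int, b: int) -> None:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    cycle: list[int] = []
    for u in range(2 * r):  # path a - u - b; consecutive paths are joined
        cycle += [next(labels), u, next(labels)]
    for i in range(len(cycle)):
        add_edge(cycle[i], cycle[(i + 1) % len(cycle)])
    for u in range(2 * r, 2 * n_prime):  # isolated paths in I_G
        add_edge(next(labels), u)
        add_edge(u, next(labels))
    w = next(labels)
    adj.setdefault(w, set())
    if cycle:
        add_edge(cycle[0], w)  # v = cycle[0] is not in S
    return adj


def minimum_vertex_covers(adj: dict[int, set[int]]) -> list[set[int]]:
    """All minimum vertex covers, by brute force (tiny graphs only)."""
    vertices = sorted(adj)
    edges = [(a, b) for a in adj for b in adj[a] if a < b]
    for size in range(len(vertices) + 1):
        covers = [set(c) for c in combinations(vertices, size)
                  if all(a in c or b in c for a, b in edges)]
        if covers:
            return covers
    return []


def counting_bound(n_prime: int) -> tuple[int, float, float]:
    """Number of distinguishable instances, its log2, and (n - 4) / 3."""
    count = sum(comb(2 * n_prime, r) for r in range(n_prime + 1))
    n = 6 * n_prime + 1
    return count, log2(count), (n - 4) / 3
```

For $n' = 2$, each of the three choices $r = 0, 1, 2$ gives a 13-vertex graph with a single minimum vertex cover of size $n^G_I + 3n^G_C/2$. The count for $n' = 2$ is 11, and $\log_2 11 > 3 = (13 - 4)/3$.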
Observe that we have crucially used the fact that all input items in S are fixed to be exactly the same in all instances that we consider, i.e., data items in S do not depend on the choice of R, v, and w. Thus, an online algorithm receiving items from S can only rely on advice to act differently on S from instance to instance. Now, we present an algorithm that uses fewer than n/3 bits of advice and achieves optimality in the adaptive priority setting with advice tape, more specifically, in Model 2 as defined in Section 2.
The high-level idea is to process all degree 1 and degree 3 vertices first. Then we are left with a graph in which every vertex has degree 2. Such a graph consists of disjoint cycles. Finding a vertex cover in such graphs is easy; we just need to take a slight precaution with regard to odd versus even cycles. Processing vertices of degree 1 does not require advice: they can be rejected and their unique neighbors accepted. Thus, the rest of the algorithm boils down to using as little advice as possible for vertices of degree 3. We use several tricks to avoid giving advice to some vertices of degree 3. The first trick is to insist that if a vertex of degree 3 does not have to participate in a minimum vertex cover, the corresponding advice bit is to reject this vertex. Rejecting such a vertex is always beneficial, as it immediately leads to accepting 3 more vertices (its neighbors) without any extra advice. In particular, any vertex of degree 3 that has at least two neighbors in the final vertex cover can be rejected, since otherwise there is another vertex cover, no larger than this final one, in which that vertex is rejected and all three of its neighbors are accepted. Thus, if a vertex v at any point receives the advice "accept", then it has at most one neighbor in the final minimum vertex cover. This is essentially the only way we use the condition that the oracle prefers rejecting vertices of degree 3, subject to still obtaining a minimum vertex cover. This observation implies that if v is accepted, then as soon as one of the neighbors of v is accepted, we can reject the other neighbors of v. It also leads to another trick: if a vertex v of degree 3 receives advice to accept and there is another vertex w of degree 3 such that w and v have at least two neighbors in common, then w can be accepted without advice. This holds because at least two neighbors of v are rejected, so w is going to have a rejected neighbor, and therefore it must be accepted.
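The exchange argument and the shared-neighbor trick can be illustrated with a small Python sketch. Graphs are assumed to be adjacency dictionaries, and all function names are hypothetical. `reject_instead` performs the swap described above: it drops a degree-3 vertex that has at least two neighbors in the cover and adds its remaining neighbors, which keeps the set a vertex cover without increasing its size. `forced_accept` only checks the structural condition under which w needs no advice.

```python
def is_vertex_cover(cover: set[int], adj: dict[int, set[int]]) -> bool:
    """True iff every edge has at least one endpoint in `cover`."""
    return all(a in cover or b in cover for a in adj for b in adj[a])


def reject_instead(cover: set[int], adj: dict[int, set[int]], v: int) -> set[int]:
    """Exchange step: if v (of degree 3) is in the cover and at least two of its
    neighbors are too, drop v and add its remaining neighbors. Every edge at v
    stays covered, other edges are unaffected, and the size does not grow."""
    missing = adj[v] - cover
    if v in cover and len(adj[v]) == 3 and len(missing) <= 1:
        return (cover - {v}) | missing
    return cover


def forced_accept(adj: dict[int, set[int]], v: int, w: int) -> bool:
    """Condition of the second trick: if v received the advice "accept" (so at
    most one neighbor of v is accepted) and v, w share at least two neighbors,
    then some shared neighbor is rejected, so w must be accepted."""
    return len(adj[v] & adj[w]) >= 2
```

For example, in the graph with edges 0-1, 0-2, 0-3, 1-4, 2-5, the cover {0, 1, 2} becomes {1, 2, 3}, which is again a cover of size 3 in which vertex 0 is rejected.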
The following is a more formal description of the algorithm. The algorithm ALG processes the vertex with the highest priority at any point in time. After a vertex is processed, the current degree of each of its neighbors is decreased by 1. We also provide a schematic illustration of several cases with the following meaning: a vertex with an arrow pointing into it is the one currently being processed, red vertices are rejected, green vertices are accepted, yellow vertices received the advice "accept", and the status of white vertices is irrelevant to the case being demonstrated.
• The highest priority items are those with a rejected neighbor. They are accepted.
• The next highest priority items are those of current degree zero; they are rejected.
• The next highest priority items are those of degree 1; they are rejected.
• The next highest priority items are the remaining neighbors of a vertex v for which advice was given (the advice must have been to accept, or there would be higher priority vertices), once v already has one accepted neighbor. They are rejected (if two neighbors of v needed to be accepted, the advice for v could have been to reject).
• The next highest priority items are those with degree 3; advice tells whether or not to accept them, if we cannot figure it out without advice.
If there are at least two minimum size vertex covers (given what has already been accepted and rejected), one where the vertex is accepted and another where the vertex is rejected, the advice the oracle gives is to reject.
Among the vertices with current degree 3, higher priority is given to those with more neighbors in common with a former degree 3 vertex v for which there was advice to accept. If there are 2 or more neighbors in common with such a vertex v, the current vertex must be accepted because it has a neighbor which has been (or will be) rejected (if v had two neighbors which were accepted, it could have been rejected).
If there are no current degree 3 vertices with a neighbor in common with a former degree 3 vertex v for which there was advice to accept, higher priority is given to those which have a neighbor that has already been seen as a neighbor of a previously processed vertex.
• The lowest priority items are those of degree 2. After the first of these arrives, we have cycles/chains and follow them by continually giving highest priority to the neighbor of the vertex just processed, accepting every other vertex. The first vertex in a cycle is rejected. When reaching a vertex with current degree zero in this way (the last vertex in the cycle), it is accepted. This leads to accepting two in a row at the end of an odd cycle.
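The degree 2 phase in the last rule can be sketched as follows, assuming a cycle is given as a list of vertex names in traversal order. This is a simplified illustration of the resulting accept/reject decisions only (it does not model the priority queue), and the name cover_cycle is hypothetical. The decisions cover every edge and accept ⌈L/2⌉ of the L vertices, which is the size of a minimum vertex cover of a cycle.

```python
def cover_cycle(cycle):
    """Accept/reject decisions of the degree 2 phase on one cycle.

    `cycle` lists the vertices in traversal order; consecutive vertices are
    adjacent and the last vertex is adjacent to the first. True = accept.
    """
    decisions = {}
    last = len(cycle) - 1
    for i, v in enumerate(cycle):
        if i == 0:
            decisions[v] = False                        # first vertex is rejected
        elif i == last:
            decisions[v] = True                         # degree zero when reached: accept
        else:
            decisions[v] = not decisions[cycle[i - 1]]  # alternate along the cycle
    return decisions


for length in (4, 5):
    dec = cover_cycle(list(range(length)))
    print(length, [int(dec[v]) for v in range(length)])
# 4 [0, 1, 0, 1]
# 5 [0, 1, 0, 1, 1]   (two accepted in a row at the end of an odd cycle)
```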
The analysis of this algorithm is rather involved, so we present a brief overview first. The high-level idea is to attribute many edges to each vertex that gets an advice bit. Since a graph of maximum degree 3 has at most 3n/2 edges in total, this would establish an upper bound on the total number of advice bits. If a vertex of degree 3 receives advice to reject, then we show that five edges can be attributed to it, which is more than sufficient for our bound. The problematic case is when a vertex v receives advice to accept. We have to make sure that no edge is attributed more than once. First, we show that the case where one of the neighbors of v is also a neighbor of an already processed vertex does not pose trouble for our argument: we can attribute sufficiently many edges to v in this case. The problem is that this argument has to start somewhere, but what if there are no degree 3 vertices which were neighbors of a previously processed vertex? This naturally leads to the idea of connected components, which are defined by the subgraphs induced on the vertices already seen (processed or neighbors of processed vertices). It turns out that if a connected component is large enough, then the initial phase, in which the vertex receiving advice has no neighbor that was already seen, gets amortized, so we can still attribute many edges to each vertex receiving advice on average. To finish the argument, it remains to analyze the cases of small connected components. In principle, the remaining issue is a finite problem and could be settled by an exhaustive computer search; however, the number of possible connected components is still huge. In our estimation, this approach would still require a lot of careful optimization and potentially a large computer cluster to perform the calculations. Thus, we decided in favor of a careful theoretical analysis, because it lets us eliminate many possibilities in an ad-hoc manner.
In addition, it is easier to verify correctness of our analysis than correctness of a complicated enumeration algorithm. By a thorough case analysis we are able to show that our algorithm manages to use sufficiently few bits of advice in all such cases.
The proof very heavily uses the fact that, when a vertex of degree 3 is processed using advice, it still has degree 3 and none of its neighbors currently have degree 1, since otherwise that neighbor would have already been processed. This leads to referring to "the remaining neighbor" or "the third neighbor", which must exist according to this logic. This logic also applies to vertices which are not themselves neighbors of these vertices of degree 3. The formal proof is given below.
Theorem 4.2 The above adaptive priority algorithm ALG with advice works in Model 2 and uses at most 15n/46 ≈ 0.326n bits of advice for simple graphs with n vertices and maximum degree 3.
Proof. The priority functions ensure that, for any rejected vertex, its neighbors are accepted. Advice is read only when a vertex with current degree 3 is processed, and advice of 0 tells ALG to reject, while 1 tells ALG to accept. Assuming that the advice is correct, this algorithm is clearly optimal. In addition, since the advice can be deduced from the decisions made, it works in Model 2. We now argue that the number s of vertices which need advice is at most 15n/46 < 0.3261n.
Since the maximum degree is 3, there are at most 3n/2 edges in all. Suppose that, on average, 5 edges are removed for each vertex that gets advice. Then the graph has at least 5s edges, so 5s ≤ 3n/2 and s ≤ 3n/10. Thus, to obtain a result less than n/3, it is sufficient to show that an average of 5 edges are removed for each vertex that gets advice. We do this below, except for the first vertex, where only three edges are removed.
If the advice for a vertex is reject, that vertex is rejected and its three neighbors are accepted. At least five edges must be removed: the three edges to the neighbors, plus at least two more, since each neighbor has degree at least 2 and a single edge can be incident to at most two of the three neighbors.
For each degree 3 vertex v for which ALG reads an advice bit, 3 edges are removed immediately. The three neighbors, x, y, and z of v each have degree at least 2 just before v is processed, or they would have already been removed. Assume that one of v's neighbors, say x, is also the neighbor of some vertex w which was processed previously. The vertex x has degree 1 after v is processed. The other neighbor of x could be either a neighbor of v or w, or some other vertex u.
Case 1: the other neighbor (call it y) of x is another neighbor of v or w: Vertex x is rejected and y is accepted (or vice versa if y also has degree 1 and gets higher priority). One of these vertices is rejected and then the other is accepted. The edge between them is removed. Now the vertex a with highest priority is either the third neighbor of x or w (since there was advice with v and w, and a neighbor of it has been accepted) or a remaining neighbor of the vertex just accepted. In the former case, a is rejected and it has at least one neighbor which is accepted, so one more edge can be attributed to v. In the latter case, the edge to the remaining neighbor of the vertex just accepted can be attributed to v. In both cases, five edges have been attributed to v.
Case 2: the other neighbor of x is some other vertex u. In this case, x is rejected and u is accepted. The vertex u has at least one neighbor other than x, so besides the edge from x to u, there is another edge which is removed and can be attributed to v. Thus, in all cases there are five edges attributed to v.
Note that we assumed that x was a neighbor of a vertex that had already been processed. If it was not, then there were no degree 3 vertices which were neighbors of any vertex previously processed. The connected components of the subgraph induced on the vertices already seen (processed vertices or neighbors of processed vertices) could be disconnected in the original graph, or they could be connected by paths through degree 2 vertices. A vertex v which is in a component C but has not yet been processed has remaining degree 2, so it will never receive advice.
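The components used in the amortization can be computed directly from their definition. In this minimal Python sketch (the function name and the example graph are hypothetical), the seen vertices are the processed vertices together with their neighbors, and components are found by breadth-first search restricted to seen vertices. In the example, the two groups of seen vertices are joined in the original graph only through the unseen degree 2 vertex 8, so they form separate components.

```python
from collections import deque


def seen_components(adj, processed):
    """Components of the subgraph induced on seen vertices
    (processed vertices and their neighbors)."""
    seen = set(processed)
    for v in processed:
        seen.update(adj[v])
    components, visited = [], set()
    for start in seen:
        if start in visited:
            continue
        visited.add(start)
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w in seen and w not in visited:  # stay inside the induced subgraph
                    visited.add(w)
                    queue.append(w)
        components.append(comp)
    return components


# Hypothetical graph: 0 and 5 are processed; vertex 8 (unseen, degree 2) links 3 and 4.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 8], 8: [3, 4],
       4: [8, 5], 5: [4, 6, 7], 6: [5], 7: [5]}
comps = sorted(sorted(c) for c in seen_components(adj, [0, 5]))
print(comps)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```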
Each component has one vertex (the first processed in the component) with only three edges attributed to it, and the others have five edges attributed to each of them. Consider the possibility of small components.
Case: a component has one or two vertices that get advice. A component must have at least one vertex with advice and at least its three neighbors, so at least four vertices in all. If it only has one vertex with advice, then at most 1/4 of the vertices in the component get advice. If two vertices get advice, those two vertices cannot share more than one neighbor, so there are at least seven vertices in all. In this case at most 2/7 of the vertices in the component get advice.

Case: a component has three vertices that get advice. If three vertices get advice, the third one can share at most one neighbor with each of the other two vertices. If there are at least ten vertices in the component, the ratio is at most 3/10. Otherwise, there are exactly nine vertices. Call the first two vertices with advice a and b and their shared neighbor x. The vertex x is rejected. Its remaining neighbor, y, is accepted. This y cannot be a neighbor of the third vertex to get advice, or that vertex would not have had degree 3. Thus, it has to be the "unshared" neighbor of a or b, say a. After y is accepted, vertex a has an accepted neighbor, so its remaining neighbor is rejected. This remaining neighbor is the shared neighbor with the remaining degree 3 vertex, which then no longer has degree 3, giving a contradiction; so the component cannot contain only nine vertices.
Case: a component has at least five vertices that get advice. Suppose k ≥ 5 vertices in a component get advice. Then there are at least 3 + 5(k − 1) = 5k − 2 edges attributed. Using that the number of edges is at most 3n/2, where n is the number of vertices in the component, the ratio k/n is at most 3k/(10k − 4), which is largest at k = 5, where it equals 15/46 ≈ 0.326.

Case: a component has exactly four vertices that get advice. Consider the remaining case when we have k = 4, i.e., four vertices in a connected component receive advice to accept. Let A denote the set of vertices receiving advice to accept. Let $A = \{a_1, a_2, a_3, a_4\}$ and suppose that the order in which the vertices of A receive advice is, in fact, $a_1, a_2, a_3, a_4$. Let S denote the set of all other vertices in the connected component. We say that the type of a vertex $v \in S$ is $|N(v) \cap A|$, i.e., the number of neighbors of v among the vertices in A. Let $k_i$ denote the number of vertices of type i.
Suppose that a vertex $v \in S$ has type 3 and suppose that it is adjacent to vertices $a_{i_1}, a_{i_2}, a_{i_3}$, with $i_1 < i_2 < i_3$. Then after $a_{i_1}$ and $a_{i_2}$ receive advice, the degree of v would drop down to 1 and it would be eliminated before $a_{i_3}$ receives its advice. However, eliminating v means that the current degree of $a_{i_3}$ drops and consequently $a_{i_3}$ would not be receiving advice. This leads to a contradiction. We conclude that v of type 3 does not exist and therefore $k_3 = 0$.
Each vertex of type 2 is adjacent to two distinct vertices from A. Note that no two of the vertices $a_1, a_2, a_3, a_4$ can have two or more neighbors in common. Thus, no two vertices of type 2 can be adjacent to exactly the same set of two vertices from A. Therefore, $k_2 \le \binom{4}{2} = 6$. Note that each vertex v of type 2 has to have degree 3, since otherwise after processing its first neighbor $a_j$, its degree would drop down to 1 and it would get processed before its second neighbor in A. The second neighbor in A then would have its degree decreased and would not receive advice.
We observe that $k_1 = 12 - 2k_2$, since the four vertices in A have twelve neighbors in total, counted with multiplicity, each vertex of type 2 is counted twice, each vertex of type 1 is counted once, and $k_3 = 0$. Our goal is to show that such a connected component has at least 13 vertices. The size of the connected component is $|A \cup S| = |A| + |S| = 4 + k_2 + k_1 + k_0 = 4 + 12 - k_2 + k_0$. It suffices to show that $|S| = 12 - k_2 + k_0 \ge 9$. We consider several cases:

Case: $k_2 \le 3$. We have $|S| \ge 12 - k_2 \ge 9$ and we are done.
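The case split that follows is driven by the identity |S| = 12 − k_2 + k_0 together with the bound k_2 ≤ 6. A short Python tabulation (the function name is hypothetical) lists, for each value of k_2, the forced value of k_1 and the smallest k_0 that yields at least 13 vertices in the component. It reproduces the cases k_2 ≤ 3 (nothing to show), k_2 = 4 (need k_0 ≥ 1), k_2 = 5 (need k_0 ≥ 2), and k_2 = 6 (need k_0 ≥ 3).

```python
from math import comb


def k4_case_table():
    """Map k_2 -> (k_1, smallest k_0 giving at least 13 vertices) when |A| = 4."""
    table = {}
    for k2 in range(comb(4, 2) + 1):   # distinct pairs of A, so k_2 <= 6
        k1 = 12 - 2 * k2               # 12 neighbor slots of A; type 2 fills two, type 1 one
        size_without_k0 = 4 + k2 + k1  # |A| + k_2 + k_1 = 16 - k_2
        table[k2] = (k1, max(0, 13 - size_without_k0))
    return table


for k2, (k1, need_k0) in k4_case_table().items():
    print(f"k2={k2}  k1={k1}  need k0>={need_k0}")
```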
Case: $k_2 = 6$, $k_1 = 0$. The following description refers to the figure below: After $a_1$, with neighbors $v_1$, $v_2$, and $v_3$, is processed, the algorithm would select a vertex that has one neighbor in common with $a_1$. This is vertex $a_2$. Without loss of generality the common neighbor is $v_1$. After processing $a_2$, the degree of $v_1$ becomes 1. Then $v_1$ is rejected and its remaining neighbor is accepted. The remaining neighbor cannot be any of $v_2, \dots, v_6$, since otherwise accepting it would reduce the degree of either $a_3$ or $a_4$ prior to seeing them. Thus, the third neighbor of $v_1$ has to be a new vertex of type 0. The other neighbors of $a_2$ are called $v_4$ and $v_5$. The vertex $a_3$ is processed next, and it must share neighbors with both $a_1$ and $a_2$; both $a_3$ and $a_4$ must share neighbors with both $a_1$ and $a_2$, since all of these neighbors are of type 2 and there are four neighbors of $a_1$ and $a_2$ remaining. Say that $a_3$ is adjacent to $v_2$ and $v_4$. After accepting $a_3$, the degrees of $v_2$ and $v_4$ both become 1. One of them is rejected first, say $v_2$ (essentially the same argument works if it was $v_4$, switching the roles of $a_1$ and $a_2$, as well as those of $v_2$ and $v_4$), and its remaining neighbor is accepted. That remaining neighbor cannot be any neighbor of $a_4$, or that would reduce the degree of $a_4$. Since $v_1$ no longer has any neighbors, only $v_4$ is possible among the type 2 vertices. If $v_4$ is accepted, though, the remaining neighbor of $a_3$ is rejected. Since this remaining neighbor of $a_3$ is also a neighbor of $a_4$, the degree of $a_4$ is reduced. Thus, the third neighbor of $v_2$ has to have type 0. This third neighbor cannot have been the third neighbor of $v_1$, since otherwise the degree of $v_2$ would be reduced to 1 when that neighbor was accepted. After processing $a_1, a_2, a_3$, the neighborhood of $a_4$ consists of $v_3, v_5, v_6$, at most two of which could be neighbors of each other.

The third neighbor of $a_4$, say $v_6$, must have a third neighbor which cannot be one of the other type 2 vertices $v_1$, $v_2$, or $v_4$, since they all had two neighbors among $a_1$, $a_2$, and $a_3$ and have been processed themselves. Thus, the third neighbor of $v_6$ is of type 0. It is not either of the previous third neighbors of $v_1$ and $v_2$, since these have also already been processed. This implies that $k_0 \ge 3$, giving at least 13 vertices in the component.

Case: $k_2 = 5$, $k_1 = 2$. For this case, it is only necessary to show that $k_0 \ge 2$.
As in the previous case, after $a_1$, with neighbors $v_1$, $v_2$, and $v_3$, is processed, the algorithm would select a vertex that has one neighbor in common with $a_1$. This is vertex $a_2$. Without loss of generality the common neighbor is $v_1$. After processing $a_2$, the degree of $v_1$ becomes 1. Then $v_1$ is rejected and its remaining neighbor is accepted. The remaining neighbor cannot be any of the four other vertices of type 2, or any vertex of type 1 that is a neighbor of $a_3$ or $a_4$, since otherwise accepting it would reduce the degree of either $a_3$ or $a_4$ prior to seeing them. Call this vertex u.
Subcase: u is not a neighbor of any vertex in A. Then u has to be a new vertex of type 0.
After $a_2$ is processed, there are four neighbors of $a_1$ and $a_2$ that might still be neighbors of $a_3$ and $a_4$. At most one of these four has type 1, so between them, $a_3$ and $a_4$ must be adjacent to at least three of these vertices. Since, according to the priorities, $a_3$ has at least as many neighbors already seen as $a_4$, it must share exactly one neighbor with each of $a_1$ and $a_2$. Suppose, without loss of generality, that $a_3$ is adjacent to $v_2$ and $v_4$. Both of these are reduced to degree 1 and processed before $a_4$ is processed. One of $v_2$ and $v_4$ is processed first and rejected, and its remaining neighbor is accepted.
If the accepted neighbor is the other of $v_2$ and $v_4$, then $v_3$ (or $v_5$) is rejected, since $a_1$ (or $a_2$) cannot have two accepted neighbors. Given the symmetries, assume without loss of generality that the accepted neighbor is $v_2$ and $v_3$ is rejected. In addition, since $v_2$ is adjacent to $a_3$, the remaining neighbor of $a_3$, $v_6$, is also rejected. Now, the only unprocessed vertices that $a_4$ can be adjacent to are $v_5$ and $v_7$, so there must be an extra vertex not considered yet, giving 13 vertices.
If the accepted neighbor of $v_2$ or $v_4$ is not the other of $v_2$ and $v_4$, they each have an edge to some other vertex u, possibly the same one. That vertex u is accepted, so it cannot be a neighbor of $a_4$, or $a_4$ would not have had degree 3 when it was processed. In addition, it cannot have been any already processed vertex, since then $v_2$ or $v_4$ would have had degree only 1 and been processed before $a_3$ was processed. The only other possibility remaining is that they are adjacent to $v_3$ (or $v_5$). Then, there are already three neighbors defined for each of $v_1$, $v_2$, $v_3$ (or $v_5$), and $v_4$. Thus, $a_4$ is adjacent to $v_6$, $v_7$, and $v_5$ (or $v_3$). Note that u cannot have had degree 1 initially, or the degree of $v_1$ would have been reduced to 1 before $a_2$ was processed. If u was adjacent to $v_5$ (or $v_3$) or $v_6$, then the degree of that vertex would have been reduced to 1 before $a_4$ was processed, so if there are only twelve vertices in the component, it is adjacent to $v_7$. When $a_4$ is processed, each of its three neighbors must still have an additional neighbor not discussed yet. Since all three of them have two neighbors already discussed, this is impossible; one of them would get degree larger than 3.
Subcase: u is a neighbor of $a_1$ or $a_2$. Suppose for the sake of contradiction that each of the vertices of type 1 has one neighbor in $\{a_1, a_2\}$. Since $a_1$ and $a_2$ share a neighbor, they have five distinct neighbors between them, three of which are type 2. Thus, there are two type 2 vertices, say $v_6$ and $v_7$, which are not neighbors of $a_1$ or $a_2$. For this to happen, both $v_6$ and $v_7$ must be neighbors of both $a_3$ and $a_4$, but then $a_4$ would have been accepted without advice. Thus, at least one of the vertices of type 1 is a neighbor of $a_3$ or $a_4$.
Without loss of generality, assume that u is $v_2$. After $v_2$ is accepted and before $a_3$ arrives, according to the algorithm, $v_3$, the remaining neighbor of $a_1$, is rejected, since $a_1$ should not have two accepted neighbors. If $v_2$ has any neighbor other than $v_1$ and $a_1$, that neighbor's degree is decreased by 1, so $v_2$ has type 1. The vertex $v_3$ must have some neighbor other than $a_1$, since $v_3$ was not rejected before $a_1$ was processed. Any such neighbors of $v_3$ must be accepted before $a_3$ arrives. Thus, $v_3$ has type 1 also. Now there are no vertices of type 1 adjacent to $a_3$ or $a_4$, and we know that this cannot happen.
Case: $k_2 = 4$, $k_1 = 4$. In this case we want to argue that $k_0 \ge 1$. Let $v_1, v_2, v_3, v_4$ be the vertices of type 2, and $u_1, u_2, u_3, u_4$ be the vertices of type 1. We can split this into further subcases depending on the number of neighbors of type 1 of each vertex in A:

Subcase: 3, 1, 0, 0. This means that one of the $a_i$ has 3 neighbors of type 1, another $a_i$ has 1 neighbor of type 1, and the remaining $a_i$ have no neighbors of type 1. Because of other restrictions, this case is actually impossible. To see this, suppose $a_{i_1}$ is adjacent to $u_1, u_2, u_3$, and $a_{i_2}$ is adjacent to $u_4$ and $v_1, v_2$. Since $a_{i_3}$ can have only one neighbor in common with $a_{i_2}$, the neighborhood of $a_{i_3}$ consists of $v_1, v_3, v_4$ or $v_2, v_3, v_4$. In either case, $a_{i_4}$ can have at most one neighbor in common with each of $a_{i_2}$ and $a_{i_3}$. This means that $a_{i_4}$ can have at most one neighbor from $v_1, v_2$ and at most one neighbor from $v_3, v_4$, but all three of its neighbors must come from $v_1, v_2, v_3, v_4$, which is impossible. Note that this argument is independent of the order in which the $a_i$ receive the advice. Thus, our assumptions about the neighborhoods of $a_1, a_2, a_3, a_4$ were without loss of generality. This case is illustrated in a figure below (the offending topology, two neighbors in common, is highlighted in red).
Subcase: 2, 2, 0, 0. This subcase is impossible as well, similarly to subcase 3, 1, 0, 0. The argument is completely topological, based on the types of neighborhoods possible. Assume that the two vertices that have two neighbors of type 1 are $a_{i_1}$ and $a_{i_2}$. Suppose that the neighborhood of $a_{i_1}$ is $u_1, u_2, v_1$. The neighborhood of $a_{i_2}$ is $u_3, u_4$ together with some vertex $v_j$. This $v_j$ cannot be $v_1$, since otherwise vertices $a_{i_3}$ and $a_{i_4}$ would have three neighbors in common: $v_2, v_3, v_4$. Thus, $v_j \ne v_1$. Without loss of generality assume that it is $v_2$. If $a_{i_3}$ has $v_1, v_2$ as neighbors, then $a_{i_4}$ can only be adjacent to $v_3$ and $v_4$ (since $v_1, v_2$ are of type 2 and already have two neighbors among the $a_i$), but $a_{i_4}$ has to have three neighbors among $v_1, \dots, v_4$. Therefore, $a_{i_3}$ can only be adjacent to one of $v_1, v_2$. Without loss of generality assume it is $v_1$. Thus, the neighborhood of $a_{i_3}$ is exactly $v_1, v_3, v_4$. This implies that $a_{i_4}$ has neighborhood $v_2, v_3, v_4$, but then $a_{i_3}$ and $a_{i_4}$ have two neighbors in common. This is a contradiction. This situation is shown in the figure below.
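The purely topological part of the two impossibility arguments can be cross-checked by brute force. In the Python sketch below (a hypothetical helper, not part of the algorithm), each of the four type 2 vertices is identified with the pair of vertices of A it is adjacent to; these pairs must be distinct, since otherwise two vertices of A would share two neighbors. A vertex of A with d type 2 neighbors then has 3 − d type 1 neighbors. Enumerating all choices of four distinct pairs shows that only the profiles 1, 1, 1, 1 and 2, 1, 1, 0 can occur, consistent with the impossibility of 3, 1, 0, 0 and 2, 2, 0, 0; the two remaining subcases need arguments about the order of processing.

```python
from itertools import combinations


def type1_profiles():
    """Possible numbers of type 1 neighbors of the vertices of A when k_2 = k_1 = 4.

    A is {0, 1, 2, 3}; each type 2 vertex is encoded by the pair of vertices
    of A it is adjacent to. Pairs must be distinct, since otherwise two
    vertices of A would share two neighbors.
    """
    pairs = list(combinations(range(4), 2))   # the 6 possible pairs
    profiles = set()
    for chosen in combinations(pairs, 4):     # four distinct type 2 vertices
        deg = [0] * 4                         # type 2 neighbors of each vertex of A
        for i, j in chosen:
            deg[i] += 1
            deg[j] += 1
        # each vertex of A has degree 3; its other neighbors are of type 1
        profiles.add(tuple(sorted((3 - d for d in deg), reverse=True)))
    return profiles


print(sorted(type1_profiles()))  # [(1, 1, 1, 1), (2, 1, 1, 0)]
```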
Subcase: 1, 1, 1, 1. In this case we assume that $a_i$ is adjacent to $u_i$. Each $a_i$ has two neighbors among $v_1, \dots, v_4$. Without loss of generality assume that $a_1$ has neighborhood $u_1, v_1, v_2$. The vertex $a_2$ has exactly one neighbor in common with $a_1$ among $v_1, \dots, v_4$. Without loss of generality assume that $a_2$ has neighborhood $u_2, v_1, v_3$. Thus, $v_1$ already has two neighbors, $a_1$ and $a_2$. After processing $a_1$ and $a_2$, the degree of $v_1$ is decreased to 1. It will be rejected and its remaining neighbor will be accepted. This neighbor of $v_1$ cannot be any of the $v_j$, since otherwise the degree of $a_3$ or $a_4$ would decrease and they would not be receiving advice. It also cannot be $u_3$ or $u_4$ for the same reason. If the third neighbor of $v_1$ is $u_1$, then one of $u_1$ and $v_1$ will be accepted, resulting in $v_2$ being rejected next. Then, the remaining neighbors of $v_2$ would be accepted, so there would be no advice for $a_3$ or $a_4$, whichever is a neighbor of $v_2$. This is a contradiction. The only remaining possibility among these vertices is that the third neighbor of $v_1$ is $u_2$. Similarly, one of $v_1$ and $u_2$ will be accepted, so the third neighbor of $a_2$, namely $v_3$, will be rejected. This again results in either $a_3$ or $a_4$ being accepted without needing advice. This is a contradiction again. The only option to make this subcase feasible is if the third neighbor of $v_1$ is some other vertex. But this implies that $k_0 \ge 1$. This situation is shown in the figure below.

Subcase: 2, 1, 1, 0. Let $a_{i_1}$ be the vertex with two type 1 neighbors, $a_{i_2}$ and $a_{i_3}$ be the vertices with one type 1 neighbor each, and $a_{i_4}$ be the vertex with no type 1 neighbors. Without loss of generality we can assume that the neighborhood of $a_{i_4}$ is $v_2, v_3, v_4$.
Note that $a_{i_1}$ cannot have $v_1$ in its neighborhood, for otherwise either $a_{i_2}$ or $a_{i_3}$ would have both of its type 2 neighbors among $v_2, v_3, v_4$, which would mean that $a_{i_2}$ or $a_{i_3}$ shares at least two neighbors with $a_{i_4}$. Thus, the neighborhood of $a_{i_2}$ can be taken to be $u_3, v_1, v_2$ and the neighborhood of $a_{i_3}$ is $u_4, v_1, v_3$. This means that the neighborhood of $a_{i_1}$ has to be $u_1, u_2, v_4$. This is shown in the figure below.
After processing $a_1, a_2$, the degree of one of the $v_j$ is decreased to 1; it is rejected, and its remaining neighbor is accepted (or vice versa). Suppose first that this vertex is $v_1$, so that $\{a_1, a_2\} = \{a_{i_2}, a_{i_3}\}$.
Then the third neighbor of $v_1$ cannot be any of $u_1, u_2, v_2, v_3, v_4$, for otherwise it would decrease the degree of either $a_{i_1}$ or $a_{i_4}$ prior to them receiving advice. The third neighbor of $v_1$ cannot be $u_3$, for otherwise $a_{i_2}$ would have an accepted neighbor, $u_3$ or $v_1$, resulting in rejecting $v_2$ and decreasing the degree of $a_{i_4}$ prior to $a_{i_4}$ receiving advice. Similarly, the third neighbor of $v_1$ cannot be $u_4$. Thus, it has to be a new vertex that is different from $u_1, \dots, u_4, v_1, \dots, v_4$. This implies that $k_0 \ge 1$.

Now, suppose that the first vertex from $v_1, \dots, v_4$ whose degree is decreased to 1 after processing $a_1$ and $a_2$ is $v_2$. Then $\{a_1, a_2\} = \{a_{i_2}, a_{i_4}\}$. The argument that $k_0 \ge 1$ is similar to the one in the previous paragraph. Consider the third neighbor of $v_2$. It cannot be any of $u_1, u_2, u_4, v_1, v_3, v_4$, for otherwise the degree of either $a_{i_1}$ or $a_{i_3}$ would decrease prior to them receiving advice. It also cannot be $u_3$, for otherwise $a_{i_2}$ would have an accepted neighbor, and then $v_1$ would be rejected and the degree of $a_{i_3}$ would drop prior to it receiving advice. Thus, the third neighbor of $v_2$ has to be a new vertex different from $u_1, \dots, u_4, v_1, \dots, v_4$, and $k_0 \ge 1$.
If the first vertex from $v_1, \dots, v_4$ whose degree is decreased to 1 after processing $a_1$ and $a_2$ is $v_3$, the analysis is symmetric to the one in the previous paragraph, exchanging the roles of $a_{i_2}$ and $a_{i_3}$. So we can conclude that $k_0 \ge 1$ in this case as well.
Lastly, consider the case where the first vertex from $v_1, \dots, v_4$ whose degree is decreased to 1 after processing $a_1, a_2$ is $v_4$. In this case, $\{a_1, a_2\} = \{a_{i_1}, a_{i_4}\}$ and we consider the third neighbor of $v_4$. It cannot be any of $u_3, u_4, v_1, v_2, v_3$, for otherwise the degree of either $a_{i_2}$ or $a_{i_3}$ would drop prior to the vertex receiving advice. Without loss of generality, assume that the third neighbor of $v_4$ is $u_1$ (if it is a new vertex, then $k_0 \ge 1$ and we are done). Then, since $a_{i_1}$ would already have an accepted neighbor, $u_2$ would be rejected prior to processing $a_{i_2}$ or $a_{i_3}$. This means that the remaining neighbor(s) of $u_2$ would be accepted. Such a neighbor cannot be any of $u_3, u_4, v_1, \dots, v_4$. Thus, the neighbor is $u_1$ unless $k_0 \ge 1$. Also, $u_2$ cannot have a third neighbor unless $k_0 \ge 1$. This topology is depicted in the figure below (what happens to the topology after processing $a_1, a_2$ is shown on the right).

The next vertex to receive advice is either $a_{i_2}$ or $a_{i_3}$. Their roles are symmetric, so assume it is $a_{i_2}$. The degree of $v_2$ then drops to 1 (since $a_{i_4}$ has also already been processed), it is rejected, and its remaining neighbor is accepted. This neighbor cannot be any of $u_4, v_1, v_3$, for otherwise the degree of $a_{i_3}$ would be decreased prior to it receiving advice. If the third neighbor of $v_2$ is $u_3$, then $a_{i_2}$ would have one accepted and one rejected neighbor, so $v_1$ would be rejected prior to $a_{i_3}$ receiving advice. Hence, the only option is for the third neighbor of $v_2$ to be a new vertex that is not among $u_1, \dots, u_4, v_1, \dots, v_4$. This implies that $k_0 \ge 1$.
Thus, this case with $k_2 = 4$ and $k_1 = 4$ is done, since we have shown that $k_0 \ge 1$ in all subcases.
End of proof of Theorem 4.2.
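The per-component bounds from the proof can be collected and compared with exact rational arithmetic. The Python sketch below (the function name is hypothetical) records the bounds 1/4, 2/7, 3/10, and 4/13 for components with one to four vertices receiving advice, and 3k/(10k − 4) for k ≥ 5. Since 3k/(10k − 4) = 3/10 + 3/(25k − 10) is strictly decreasing in k, checking a finite range suffices to see that the maximum over all cases is 15/46, attained at k = 5.

```python
from fractions import Fraction


def component_ratio_bounds(k_max=50):
    """Upper bounds on (advice vertices) / (component size), keyed by k = #advice vertices."""
    bounds = {
        1: Fraction(1, 4),   # at least 4 vertices
        2: Fraction(2, 7),   # at least 7 vertices
        3: Fraction(3, 10),  # at least 10 vertices
        4: Fraction(4, 13),  # at least 13 vertices
    }
    for k in range(5, k_max + 1):
        # at least 5k - 2 edges and at most 3n/2 edges, so n >= (10k - 4)/3
        bounds[k] = Fraction(3 * k, 10 * k - 4)
    return bounds


bounds = component_ratio_bounds()
worst = max(bounds, key=bounds.get)
print(worst, bounds[worst])  # 5 15/46
```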

Hardness results using a template
In this section, we present a template for proving lower bounds on how much advice is needed for an adaptive priority algorithm to achieve a certain competitive ratio. The results hold in the strongest model for adaptive priority algorithms with advice. Many proofs in this section are very similar to those presented in [7]. There are two major differences: (1) the proof of the general framework result proceeds by reducing from string guessing directly, and (2) we show how to handle adaptive priorities. Thus, we present the proofs here in their entirety for completeness.
The following online problem, while seemingly artificial, has been used extensively in proving lower bounds for online algorithms with advice. Here we use it for adaptive priority algorithms with advice.

Definition 5.1
The Binary String Guessing Problem [4] with known history (2-SGKH) is the following online problem. The input consists of $(n, \sigma = (x_1, \dots, x_n))$, where $x_i \in \{0, 1\}$. Upon seeing $x_1, \dots, x_{i-1}$, an algorithm guesses the value of $x_i$. The actual value of $x_i$ is revealed after the guess. The goal is to maximize the number of correct guesses.
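To make the online protocol of 2-SGKH concrete, here is a minimal Python simulation (all names are hypothetical). The guesser sees only the history $x_1, \dots, x_{i-1}$ before each guess. A guesser that reads one advice bit per item is trivially optimal, whereas without advice an adversary can choose every $x_i$ to be the opposite of a deterministic guess; the simple guesser below, which repeats the last revealed bit, already errs often on a short example.

```python
from typing import Callable, Sequence

Guesser = Callable[[Sequence[int]], int]


def play_2sgkh(sigma: Sequence[int], guess: Guesser) -> int:
    """Number of correct guesses; before x_i is revealed the guesser sees x_1..x_{i-1}."""
    correct = 0
    for i, x in enumerate(sigma):
        if guess(sigma[:i]) == x:  # guess is made from the known history only
            correct += 1
    return correct


def repeat_last(history: Sequence[int]) -> int:
    """Guess without advice: repeat the last revealed bit (0 at the start)."""
    return history[-1] if history else 0


def with_advice(advice: Sequence[int]) -> Guesser:
    """Guesser that reads one advice bit per item (hypothetical oracle output)."""
    return lambda history: advice[len(history)]


sigma = [1, 0, 0, 1, 1, 0]
print(play_2sgkh(sigma, repeat_last), play_2sgkh(sigma, with_advice(sigma)))  # 2 6
```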
Böckenhauer et al. [4] provide a trade-off between the number of advice bits and the approximation ratio for the binary string guessing problem. This can be used to show that a linear number of advice bits is necessary for many problems. Our template is restricted to binary decision problems, since the goal is to derive inapproximability results based on the 2-SGKH problem, where guesses (answers) are either 0 or 1. In our reduction from 2-SGKH to a problem B, we assume that we have a priority algorithm ALG with advice for problem B, with priorities defined by priority functions which may vary between inputs to ALG. The current priority function will generally be referred to as P. Based on ALG, its advice, and its priority functions, we define an online algorithm ALG′ with advice (the reduction algorithm) for 2-SGKH. The reduction is advice-preserving, since ALG′ only uses the advice that ALG does, no more. The input items $n, x_1, x_2, \dots, x_n$ with $x_i \in \{0, 1\}$ to 2-SGKH arrive in an online manner, so after n arrives, ALG′ must guess $x_1$, and then the actual value of $x_1$ is revealed. More generally, immediately after the value $x_i$ is revealed, ALG′ must guess $x_{i+1}$, and then the actual value $x_{i+1}$ is revealed. When $x_n$ is revealed, ALG′ knows that this is the end of the input. At the end, there is some post-processing to allow ALG to complete its computation.
From the value n, ALG′ initially creates a superset of the input items to problem B, and those items in the subset that is eventually chosen have to be presented to ALG, one at a time, always respecting the current priority function P. Responses from ALG on some of these input items are then used by ALG′ to help it answer 0 or 1 for its current $x_i$. The main challenge is to ensure that the input items to ALG are presented in the order determined by the priority functions, which may change over time.
Here, we give a high level description of a specific kind of gadget reduction. A gadget G for problem B is simply some constant-sized instance for B, i.e., a collection of input items that satisfy the consistency condition for problem B. For example, if B is a graph problem in the vertex arrival, vertex adjacency model, G could be a constant-sized graph, and the universe then contains all possible pairs of the form: a vertex name coupled with a list of possible neighboring vertex names. Note that each possible vertex name exists many times as the vertex of an input in the universe, because it can be coupled with many different possible lists of neighboring vertex names, to make all isomorphic instances of the gadget possible. The consistency condition must apply to the actual input chosen, so for each vertex name u which is listed as a neighbor of v, it must be the case that v is listed as a neighbor of u.
The gadgets used in a reduction can be thought of as being created in pairs (gadgets in a pair may be isomorphic to each other, so that they are the same up to renaming), one pair for each $x_i$. The universe of input items corresponding to a gadget pair $(G^a, G^r)$ must be the same for both gadgets in the pair. When the first of the input items from $(G^a, G^r)$ has highest priority according to the current P and is given to ALG, that item could theoretically be from either $G^a$ or $G^r$. Then, depending on whether the actual value of the next input $x_i$ to ALG′ is 0 or 1 (and possibly also depending on the form of the item, e.g., the degree of the vertex for a graph), one of $G^a$ and $G^r$ is chosen, and the remaining input items from that gadget are eventually presented to ALG, when their priorities are high enough. The reduction algorithm removes the other input items from the universe corresponding to $G^a$ and $G^r$ once one gadget of the pair is chosen.
Recall that $\max_P R$ denotes the first item in a set $R$ according to the current priority function $P$, i.e., the highest priority item. For now, assume that ALG responds "accept" or "reject" to every possible input item. This captures problems such as vertex cover, independent set, and clique.
Suppose that the universes for the gadget pairs created are $G_1, G_2, \ldots, G_n$. The universes of input items for the $n$ different pairs must be disjoint, so that an input item identifies which gadget pair it belongs to. Each gadget pair satisfies two conditions: the first item condition and the distinguishing decision condition. The first item condition says that the first input item chosen from a gadget pair during the execution of ALG, $\mathrm{first}(G_j)$, gives no information about which of the two gadgets, $G^a_j$ or $G^r_j$, it belongs to. Suppose $P$ is the priority function when this first item has highest priority, and suppose that $x_i$ is the value $\overline{\mathrm{ALG}}$ should guess at that point. Then $\mathrm{first}(G_j) = \max_P G^a_j = \max_P G^r_j$ (the second equality holds since we assume the two gadgets have the same input universe). The distinguishing decision condition says that the decision on $\mathrm{first}(G_j)$ that yields the optimal value of the objective function in $G^a_j$ differs from the decision that yields the optimal value in $G^r_j$. One gadget, $G^a_j$, is presented to ALG when $x_i$ is revealed to be 0, and the other, $G^r_j$, when it is 1. To be concrete, we start with an example reduction from 2-SGKH to Vertex Cover.

Example: Vertex Cover
To illustrate how gadgets are used in the general reductions, we consider the Vertex Cover problem, as in Section 4, in the vertex arrival, vertex adjacency input model, with adaptive priority algorithms with advice. Thus, whenever a vertex becomes the highest priority vertex, the priority algorithm must decide whether to accept or reject it, and at the end, at least one endpoint of every edge in the graph must be accepted.
We use the construction from [6] (which was reused in [7]) to obtain two gadget pairs: one for the case where the highest priority input item has degree 2 and one for degree 3. For each gadget pair, the universe of input items contains the names of seven vertices (the same names are used for both gadget pairs) and, for each of these vertices, all possible input items (vertex name and list of neighbors) of degree two and of degree three. The gadgets are based on the graphs in Fig. 3, both of which have minimum vertex covers of size 3. It is important to keep in mind that the numbers (or rather labels) shown in the figure are for reference only and do not represent actual input items given to an algorithm; the figure represents the topological structure of the inputs. The actual input items are created from all consistent labelings of such graphs. To illustrate this point, consider vertex 1 in Graph 1. It is adjacent to vertices 2 and 6. The corresponding input item could happen to be (1, {2, 6}), but it could also be, for example, (5, {2, 3}). In the latter case, the actual input vertex 5 would be mapped to the vertex labeled 1 in the figure, vertex 2 to label 2, and vertex 3 to label 6. In total, there are 7 × 6 × 5 possible input items that could be associated with the vertex labeled 1 in Graph 1. Once a particular item is associated with a particular vertex, the consistency requirements reduce the number of items that could be associated with the remaining vertices. To obtain a vertex cover of size 3, it is necessary to accept vertex 1 in Graph 1 and to reject vertex 2 in Graph 1. Thus, the gadget pair for the case where the first vertex processed from the pair has degree 2 consists of two copies of Graph 1, where the first vertex processed is vertex 1 in the first gadget and vertex 2 in the second.
Similarly, to obtain a vertex cover of size 3, it is necessary to accept vertex 3 in Graph 1 and to reject vertex 1 in Graph 2. Thus, the gadget pair for vertices of degree 3 consists of Graph 1, where the first vertex processed is vertex 3, and Graph 2, where the first vertex processed is vertex 1. In both cases, making the correct decision on that first vertex enables a priority algorithm to obtain a vertex cover of size 3: after that, it gives highest priority to neighbors of vertices that have already been processed, accepting if the known neighbor was rejected and rejecting if the known neighbor was accepted. If the priority algorithm makes the wrong decision, the vertex cover it produces has size at least 4.
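The completion rule after the first decision can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is connected, "highest priority to neighbors of processed vertices" is modeled by breadth-first order from the first vertex, and the five-vertex path used below is a hypothetical stand-in, not one of the graphs in Fig. 3.

```python
from collections import deque
from typing import Dict, Set

def greedy_completion(adj: Dict[int, Set[int]], first: int, accept_first: bool) -> Set[int]:
    """After the decision on `first`, process vertices in BFS order (neighbors of
    processed vertices first) and accept a vertex iff some already-processed
    neighbor was rejected. Assumes `adj` is connected and symmetric."""
    decision = {first: accept_first}
    queue = deque([first])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in decision:
                # a rejected processed neighbor leaves an edge only w can cover
                decision[w] = any(not decision[x] for x in adj[w] if x in decision)
                queue.append(w)
    return {v for v, accepted in decision.items() if accepted}

# Hypothetical gadget: the path 1-2-3-4-5, whose minimum vertex cover is {2, 4}.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
good = greedy_completion(path, 2, True)   # correct decision on vertex 2
bad = greedy_completion(path, 2, False)   # wrong decision on vertex 2
```

On this path, accepting vertex 2 first yields the cover {2, 4}, while rejecting it forces {1, 3, 5}, one vertex more; the output is always a valid cover, since for every edge the later-processed endpoint is accepted whenever the earlier one was rejected.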
The first vertex processed from the universe for these gadgets must have either degree 2 or degree 3, so the reduction can continue with the correct gadget pair for that degree. Consider gadget pair $j$. If the first vertex chosen from this gadget pair has degree 2, then it must be possible for that vertex to be either vertex 1 or vertex 2 in Graph 1. Similarly, if it has degree 3, then it must be possible for it to be either vertex 3 in Graph 1 or vertex 1 in Graph 2. Thus, the vertices in this pair could be named $v^j_1, v^j_2, \ldots, v^j_7$. In the universe of input items for this pair, each of these 7 vertices could be present as a degree-2 vertex with each pair of the remaining vertices as its neighbors, and also as a degree-3 vertex with each triple of the remaining vertices as its neighbors. Then, no matter which of these vertices gets highest priority first, the reduction algorithm can give that vertex to ALG, use ALG's answer to respond 0 or 1 to the next 2-SGKH input $x_i$, receive the correct value of $x_i$, and choose the input items for gadget $G^a_j$ or $G^r_j$ so that ALG can obtain a vertex cover of size 3 on this gadget if the guess for $x_i$ was correct, and obtains one of size at least 4 otherwise. For each value $x_i$ guessed, ALG is given seven input items, so the input length for Vertex Cover is $7n$. The number of gadgets on which ALG obtains a vertex cover of size 3, instead of at least 4, is at most the number of inputs to 2-SGKH that are guessed correctly. Thus, if $\varepsilon n$ of the $x_i$ are guessed incorrectly, ALG obtains an approximation ratio of at least $\frac{3+\varepsilon}{3} = 1 + \frac{\varepsilon}{3}$.
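The ratio comes from counting cover sizes over the $n$ disjoint gadgets. As a sketch, assume ALG decides optimally on every item other than the first items, which can only favor ALG; by additivity, the optimum is $3n$:

$$\frac{\mathrm{ALG}(I)}{\mathrm{OPT}(I)} \ge \frac{3(1-\varepsilon)n + 4\varepsilon n}{3n} = \frac{(3+\varepsilon)n}{3n} = 1 + \frac{\varepsilon}{3}.$$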
Since Theorem 5.2 limits how many correct guesses can be made without a linear amount of advice, it also limits how good an approximation ALG can achieve with less advice than that:
Theorem 5.3 For Minimum Vertex Cover and any $\varepsilon \in (0, \frac{1}{2}]$, no adaptive priority algorithm reading fewer than $(1 - H(\varepsilon))n/7$ advice bits can achieve an approximation ratio smaller than $1 + \frac{\varepsilon}{3}$.
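To see the trade-off in Theorem 5.3 numerically, the sketch below evaluates the advice bound per input item, $(1 - H(\varepsilon))/7$, and the ratio $1 + \varepsilon/3$ for a few values of $\varepsilon$. It assumes $H$ is the binary entropy function with logarithms to base 2; the function names are hypothetical.

```python
import math

def binary_entropy(eps: float) -> float:
    """H(eps) = -eps*log2(eps) - (1 - eps)*log2(1 - eps), with H(0) = H(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def vertex_cover_tradeoff(eps: float) -> tuple:
    """(advice bits per input item, ratio) from Theorem 5.3: fewer than
    (1 - H(eps)) n / 7 advice bits cannot give a ratio below 1 + eps/3."""
    if not 0 < eps <= 0.5:
        raise ValueError("Theorem 5.3 is stated for eps in (0, 1/2]")
    return (1 - binary_entropy(eps)) / 7, 1 + eps / 3

for eps in (0.05, 0.1, 0.25, 0.5):
    bits_per_item, ratio = vertex_cover_tradeoff(eps)
    print(f"eps={eps:<5} advice < {bits_per_item:.4f} n bits  =>  ratio >= {ratio:.4f}")
```

As $\varepsilon$ decreases, the ratio bound approaches 1 while the advice threshold approaches $n/7$; at $\varepsilon = 1/2$ the threshold is 0 bits.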

General Results
In this subsection, we establish two theorems that give general templates for gadget reductions from 2-SGKH: one for minimization problems and one for maximization problems. A high-level overview was given at the beginning of this section.
We let $\mathrm{ALG}(I)$ denote the value of the objective function for ALG on input $I$. The size of a gadget $G$, denoted by $|G|$, is the number of input items specifying the gadget. We write $\mathrm{OPT}(G)$ for the best value of the objective function on $G$. Recall that we focus on problems where a solution is specified by an accept/reject decision for each input item. We write $\mathrm{BAD}(G)$ for the best value of the objective function attainable on $G$ after making the wrong decision on the first item, $\mathrm{first}(G)$ (the item presented to ALG first, due to having highest priority); i.e., if there is an optimal solution that accepts (rejects) $\mathrm{first}(G)$, then $\mathrm{BAD}(G)$ is the best value of the objective function given that $\mathrm{first}(G)$ is rejected (accepted). We say that the objective function for a problem $B$ is additive if for any two non-interacting instances $I_1$ and $I_2$ of $B$ (for example, two disjoint connected components of a graph), we have $\mathrm{OPT}(I_1 \cup I_2) = \mathrm{OPT}(I_1) + \mathrm{OPT}(I_2)$.
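For a constant-sized gadget, $\mathrm{OPT}(G)$ and $\mathrm{BAD}(G)$ can be computed by brute force. The sketch below does this for Minimum Vertex Cover only; it enumerates all $2^{|V|}$ vertex subsets, which is acceptable for constant-sized gadgets but not in general. It also checks that the optimal decision on the first item is unique, as the distinguishing decision condition requires. The names `vertex_covers` and `opt_and_bad`, and the example graphs, are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterator, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]

def vertex_covers(vertices: Sequence[Hashable], edges: List[Edge]) -> Iterator[FrozenSet]:
    """Yield every vertex cover of the graph by brute force."""
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = frozenset(subset)
            if all(u in s or v in s for u, v in edges):
                yield s

def opt_and_bad(vertices, edges, first):
    """Return (OPT(G), BAD(G), optimal decision on `first`) for Minimum Vertex Cover."""
    covers = list(vertex_covers(vertices, edges))
    opt = min(len(c) for c in covers)
    decisions = {first in c for c in covers if len(c) == opt}
    if len(decisions) != 1:
        raise ValueError("optimal decision on `first` is not unique")
    accept = decisions.pop()
    # BAD(G): best cover size after the opposite decision on `first`
    bad = min(len(c) for c in covers if (first in c) != accept)
    return opt, bad, "accept" if accept else "reject"
```

On the hypothetical path 1-2-3, `opt_and_bad([1, 2, 3], [(1, 2), (2, 3)], 2)` gives OPT 1, BAD 2 and decision "accept", while taking vertex 1 as the first item gives OPT 1, BAD 2 and "reject". A single edge raises an error, since each endpoint is accepted by some minimum cover and rejected by another.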
Theorem 5.4 Let $B$ be a minimization problem with an additive objective function. Let ALG be an adaptive priority algorithm with advice for $B$ in Model 1. Suppose that for each positive integer $j$, one can construct a pair of gadgets $(G^a_j, G^r_j)$ from a common universe of input items, $G_j$, satisfying the following conditions:
The first item condition: Let $P$ denote the priority function in effect when $\mathrm{first}(G_j)$ is chosen. Then $\mathrm{first}(G_j) = \max_P G^a_j = \max_P G^r_j$.
The distinguishing decision condition: The optimal decision for $\mathrm{first}(G_j)$ in $G^a_j$ is different from the optimal decision for $\mathrm{first}(G_j)$ in $G^r_j$ (in particular, the optimal decision is unique for each gadget). Without loss of generality, we assume that $\mathrm{first}(G_j)$ is accepted in an optimal solution for $G^a_j$.
The size condition: The gadgets have finite sizes; we let $s = \max_j \max\{|G^a_j|, |G^r_j|\}$, where the size of a gadget is the number of input items it consists of.
The disjoint copies condition: For $j_1 \neq j_2$ and $x, y \in \{a, r\}$, the input items making up $G^x_{j_1}$ and $G^y_{j_2}$ are disjoint.
The gadget OPT and BAD condition: The values $\mathrm{OPT}(G^a_j)$, $\mathrm{BAD}(G^a_j)$, $\mathrm{OPT}(G^r_j)$, and $\mathrm{BAD}(G^r_j)$ are independent of $j$; we denote them by $\mathrm{OPT}(G^a)$, $\mathrm{BAD}(G^a)$, $\mathrm{OPT}(G^r)$, and $\mathrm{BAD}(G^r)$, and we assume that $\mathrm{OPT}(G^r) \ge \mathrm{OPT}(G^a)$.
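Define $r = \min\left\{\frac{\mathrm{BAD}(G^a)}{\mathrm{OPT}(G^a)}, \frac{\mathrm{BAD}(G^r)}{\mathrm{OPT}(G^r)}\right\}$. Then for any $\varepsilon \in (0, \frac{1}{2}]$, no adaptive priority algorithm reading fewer than $(1 - H(\varepsilon))n/s$ advice bits can achieve an approximation ratio smaller than
$$1 + \frac{\varepsilon (r - 1)\,\mathrm{OPT}(G^a)}{\varepsilon\,\mathrm{OPT}(G^a) + (1 - \varepsilon)\,\mathrm{OPT}(G^r)}.$$
Proof The reduction is given in Algorithm 2.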
Initially, all possible input items for all $n$ gadget pairs are in the set $R$ of remaining input items. The set $Q$ of unprocessed input items from the chosen gadgets, for all gadget pairs whose first input item has already been processed, is initially empty. At any point in time, the highest priority input item still available in either $R$ or $Q$ is presented to ALG. If this item is the first input item from a gadget pair, $\mathrm{first}(G_j)$ from the $j$th pair, and the next input to 2-SGKH to be processed is $x_i$, then $\overline{\mathrm{ALG}}$ guesses 0 for $x_i$ if ALG accepts $\mathrm{first}(G_j)$ and 1 if ALG rejects it. If the value of $x_i$ is revealed to be 0 (1), all of the input items of $G^a_j$ ($G^r_j$) except $\mathrm{first}(G_j)$ are inserted into $Q$, and all input items corresponding to that gadget pair are removed from $R$. Now only the actual input items of the correct gadget in the $j$th pair remain, and those not yet processed are in $Q$. Input items from $Q$ are presented to ALG, and removed from $Q$, when they have the highest current priority. Thus, input items are presented to ALG in the order defined by its priority functions.

Algorithm 2 Reduction Algorithm, $\overline{\mathrm{ALG}}$
Given: ALG for problem $B$; the inputs to 2-SGKH are $X = x_1, \ldots, x_n$.
    Create a set $R$ of input items from $n$ disjoint gadget pair universes $\{G_1, G_2, \ldots, G_n\}$
    $Q = \emptyset$    (inputs from chosen gadgets still to be processed)
    $i = 1$    (current index of the 2-SGKH input)
    while $i \le n$ do
        Let $P$ be the current priority function for ALG    (update the priority function)
        if $Q = \emptyset$ or $\max_P(Q) < \max_P(R)$ then    (highest priority item is from $R$)
            $v = \max_P(R)$, and let $j$ be such that $v \in G_j$
            present $v$ to ALG
            answer 0 for $x_i$ if ALG answers "accept" and 1 if it answers "reject"
            receive the actual value of $x_i$
            if $x_i = 0$ then $Q = Q \cup (G^a_j \setminus \{v\})$ else $Q = Q \cup (G^r_j \setminus \{v\})$
            $R = R \setminus G_j$
            $i = i + 1$
        else    (highest priority item is from $Q$)
            $q = \max_P(Q)$; present $q$ to ALG; $Q = Q \setminus \{q\}$
    while $Q \neq \emptyset$ do
        Let $P$ be the current priority function for ALG
        $q = \max_P(Q)$
        present $q$ to ALG
        $Q = Q \setminus \{q\}$
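A minimal Python sketch of Algorithm 2 follows. It is illustrative rather than the authors' implementation and rests on assumptions not in the text: ALG is modeled as an object with hypothetical methods `priority(item)` (larger means higher priority; the value may change after each decision, which models adaptivity) and `decide(item)`; priorities are assumed injective so that `max` has no ties; and the hypothetical callback `build_pair(j, v)` returns the two gadgets of pair $j$ built around the first item $v$ (both containing $v$). Indices are 0-based.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Item = Hashable

def reduction_2sgkh(universes: List[Set[Item]],
                    build_pair: Callable[[int, Item], Tuple[Set[Item], Set[Item]]],
                    alg, xs: List[int]) -> Tuple[List[int], List[Item]]:
    """Run the reduction on hidden 2-SGKH bits `xs`; return the guesses made
    and the order in which items were presented to ALG."""
    owner: Dict[Item, int] = {v: j for j, u in enumerate(universes) for v in u}
    R: Set[Item] = set().union(*universes)  # items of pairs not yet started
    Q: Set[Item] = set()                    # chosen gadget items not yet presented
    guesses, order = [], []
    i = 0
    while i < len(xs):
        top_r = max(R, key=alg.priority)    # P is re-read at every step
        if Q:
            top_q = max(Q, key=alg.priority)
            if alg.priority(top_q) > alg.priority(top_r):
                Q.remove(top_q)
                order.append(top_q)
                alg.decide(top_q)
                continue
        j = owner[top_r]                    # top_r is first(G_j)
        order.append(top_r)
        guesses.append(0 if alg.decide(top_r) == "accept" else 1)
        g_a, g_r = build_pair(j, top_r)
        chosen = g_a if xs[i] == 0 else g_r  # x_i is revealed after the guess
        Q |= chosen - {top_r}
        R -= universes[j]
        i += 1
    while Q:                                # drain the remaining gadget items
        q = max(Q, key=alg.priority)
        Q.remove(q)
        order.append(q)
        alg.decide(q)
    return guesses, order

class ToyAlg:
    """Hypothetical adaptive priority algorithm on items (j, k): prefers items of
    pairs it has already touched, then smaller (j, k); accepts iff k is even."""
    def __init__(self):
        self.seen = set()
    def priority(self, item):
        j, k = item
        return (j in self.seen, -j, -k)
    def decide(self, item):
        self.seen.add(item[0])
        return "accept" if item[1] % 2 == 0 else "reject"

universes = [{(j, k) for k in range(3)} for j in range(2)]

def build_pair(j, v):
    rest = sorted(universes[j] - {v})  # hypothetical two-item gadgets around v
    return {v, rest[0]}, {v, rest[1]}

guesses, order = reduction_2sgkh(universes, build_pair, ToyAlg(), [1, 0])
```

In this toy run the guesses are [0, 0], so the guess for $x_1 = 1$ is wrong and the one for $x_2 = 0$ is right. The item (0, 2) from $Q$ is presented before (1, 0) from $R$ only because the toy priority function changed after ALG's first decision, which is exactly the situation the $Q$/$R$ comparison handles.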
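The two algorithms read the same advice. Since the input to $B$ consists of $n' \le sn$ items (so $n'$ plays the role of $n$ in the statement of Theorem 5.4), an algorithm for $B$ that reads fewer than $(1 - H(\varepsilon))n'/s$ advice bits reads fewer than $(1 - H(\varepsilon))n$ bits, which is the advice bound of Theorem 5.2 for the $n$ inputs to 2-SGKH.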
Now we turn to the approximation ratio obtained. We want to lower-bound the number of incorrect decisions by ALG. We focus on the input items $\mathrm{first}(G_j)$ and let $x_{f(j)}$ denote the next input to 2-SGKH when $\mathrm{first}(G_j)$ is processed. We assume that ALG answers correctly on all input items that are not of the form $\mathrm{first}(G_j)$; additional errors can only worsen ALG's objective value, so this assumption does not weaken the lower bound.
Note that the gadget pairs are designed such that if the correct answer by ALG for $\mathrm{first}(G_j)$ is "accept", then $G^a_j$ is presented to ALG, and if it is "reject", then $G^r_j$ is presented. Since $G^a_j$ is given when $x_{f(j)} = 0$ and $G^r_j$ when $x_{f(j)} = 1$, and ALG answers "accept" if and only if $\overline{\mathrm{ALG}}$ answers 0, ALG answers correctly on $\mathrm{first}(G^x_j)$, $x \in \{a, r\}$, if and only if $\overline{\mathrm{ALG}}$ answers correctly for $x_{f(j)}$. We know from Theorem 5.2 that for any $\varepsilon \in (0, \frac{1}{2}]$, any online algorithm with advice length less than $(1 - H(\varepsilon))n$ makes at least $\varepsilon n$ mistakes on 2-SGKH. Since we want to lower-bound the performance ratio of ALG, and since a ratio larger than one decreases when the numerator and denominator are increased by equal quantities, we can assume that whenever ALG answers correctly, it is on the gadget with the larger OPT value, $G^r$. For the same reason, we can assume that the at least $\varepsilon n$ incorrect answers are in fact exactly $\varepsilon n$, since classifying some of the incorrect answers as correct only lowers the ratio. Among the incorrect answers, assume that gadget $G^a$ is presented $w$ times, and thus gadget $G^r$ is presented $\varepsilon n - w$ times.
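Denoting the input created by $\overline{\mathrm{ALG}}$ for ALG by $I$, and using additivity together with $\mathrm{BAD}(G^x_j) \ge r\,\mathrm{OPT}(G^x_j)$ for $x \in \{a, r\}$, we obtain
$$\frac{\mathrm{ALG}(I)}{\mathrm{OPT}(I)} \ge \frac{(1-\varepsilon)n\,\mathrm{OPT}(G^r) + w\,\mathrm{BAD}(G^a) + (\varepsilon n - w)\,\mathrm{BAD}(G^r)}{(1-\varepsilon)n\,\mathrm{OPT}(G^r) + w\,\mathrm{OPT}(G^a) + (\varepsilon n - w)\,\mathrm{OPT}(G^r)} \ge \frac{(1-\varepsilon)n\,\mathrm{OPT}(G^r) + r w\,\mathrm{OPT}(G^a) + r(\varepsilon n - w)\,\mathrm{OPT}(G^r)}{(1-\varepsilon)n\,\mathrm{OPT}(G^r) + w\,\mathrm{OPT}(G^a) + (\varepsilon n - w)\,\mathrm{OPT}(G^r)}.$$
The right-hand side is a quotient of two affine functions of $w$, so its derivative with respect to $w$ has constant sign and the extreme values are attained at the endpoints of the range $[0, \varepsilon n]$ for $w$. For $w = 0$ the bound equals $1 + \varepsilon(r-1)$, and for $w = \varepsilon n$ it equals
$$\frac{(1-\varepsilon)\,\mathrm{OPT}(G^r) + \varepsilon r\,\mathrm{OPT}(G^a)}{(1-\varepsilon)\,\mathrm{OPT}(G^r) + \varepsilon\,\mathrm{OPT}(G^a)} = 1 + \frac{\varepsilon(r-1)\,\mathrm{OPT}(G^a)}{\varepsilon\,\mathrm{OPT}(G^a) + (1-\varepsilon)\,\mathrm{OPT}(G^r)}.$$
Since $\mathrm{OPT}(G^r) \ge \mathrm{OPT}(G^a)$, the denominator $\varepsilon\,\mathrm{OPT}(G^a) + (1-\varepsilon)\,\mathrm{OPT}(G^r)$ is at least $\mathrm{OPT}(G^a)$, so the latter is the smaller ratio and thus the lower bound we can provide.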
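The endpoint argument can be sanity-checked with exact rational arithmetic. The sketch below evaluates the right-hand side of the bound for every integer $w \in [0, \varepsilon n]$ and compares its minimum with the closed form of Theorem 5.4. The gadget values (OPT 2 and 3, BAD 3 and 5) are hypothetical, chosen only so that $\mathrm{OPT}(G^r) \ge \mathrm{OPT}(G^a)$ and $\varepsilon n$ is an integer.

```python
from fractions import Fraction as F

def min_ratio_lower_bound(eps, n, w, opt_a, opt_r, r):
    """Right-hand side of the bound on ALG(I)/OPT(I) in the proof of Theorem 5.4:
    (1 - eps) n correct answers on G^r, w wrong answers on G^a and
    eps*n - w wrong answers on G^r, with BAD replaced by r * OPT."""
    good = (1 - eps) * n * opt_r
    num = good + r * w * opt_a + r * (eps * n - w) * opt_r
    den = good + w * opt_a + (eps * n - w) * opt_r
    return num / den

def theorem_5_4_bound(eps, opt_a, opt_r, bad_a, bad_r):
    """Closed-form ratio of Theorem 5.4 (minimization)."""
    r = min(F(bad_a, opt_a), F(bad_r, opt_r))
    return 1 + eps * (r - 1) * opt_a / (eps * opt_a + (1 - eps) * opt_r)

eps, n = F(1, 4), 100  # hypothetical parameters with eps * n integral
r = min(F(3, 2), F(5, 3))
values = [min_ratio_lower_bound(eps, n, w, 2, 3, r) for w in range(int(eps * n) + 1)]
print(min(values) == theorem_5_4_bound(eps, 2, 3, 3, 5))  # minimum attained at w = eps*n
```

With these values the bound is $9/8$ at $w = 0$ and $12/11$ at $w = \varepsilon n$, matching the closed form; for the vertex cover gadgets (OPT 3, BAD 4) the closed form reduces to $1 + \varepsilon/3$.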
The following theorem for maximization problems is proved analogously.
Theorem 5.5 Let $B$ be a maximization problem with an additive objective function. Let ALG be an adaptive priority algorithm with advice for $B$ in Model 1. Suppose that for each positive integer $j$, one can construct a pair of gadgets $(G^a_j, G^r_j)$ satisfying the conditions in Theorem 5.4. Then for any $\varepsilon \in (0, \frac{1}{2}]$, no adaptive priority algorithm reading fewer than $(1 - H(\varepsilon))n/s$ advice bits can achieve an approximation ratio smaller than
$$1 + \frac{\varepsilon (r - 1)\,\mathrm{OPT}(G^a)}{\varepsilon\,\mathrm{OPT}(G^a) + (1 - \varepsilon) r\,\mathrm{OPT}(G^r)},$$
where $r = \min\left\{\frac{\mathrm{OPT}(G^a)}{\mathrm{BAD}(G^a)}, \frac{\mathrm{OPT}(G^r)}{\mathrm{BAD}(G^r)}\right\}$.
Proof The proof proceeds as for the minimization case in Theorem 5.4 until the calculation of the lower bound on $\frac{\mathrm{ALG}(I)}{\mathrm{OPT}(I)}$. We continue from that point, using the inverse ratio $\frac{\mathrm{OPT}(I)}{\mathrm{ALG}(I)}$ to get values larger than one.
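We use that for $x \in \{a, r\}$, $\mathrm{BAD}(G^x) \le \mathrm{OPT}(G^x)/r$. Hence
$$\frac{\mathrm{OPT}(I)}{\mathrm{ALG}(I)} \ge \frac{(1-\varepsilon)n\,\mathrm{OPT}(G^r) + w\,\mathrm{OPT}(G^a) + (\varepsilon n - w)\,\mathrm{OPT}(G^r)}{(1-\varepsilon)n\,\mathrm{OPT}(G^r) + w\,\mathrm{OPT}(G^a)/r + (\varepsilon n - w)\,\mathrm{OPT}(G^r)/r}.$$
As before, this is a quotient of two affine functions of $w$, so its minimum over $[0, \varepsilon n]$ is attained at an endpoint. For $w = 0$ it equals $1 + \frac{\varepsilon(r-1)}{(1-\varepsilon)r + \varepsilon}$, and for $w = \varepsilon n$ it equals $1 + \frac{\varepsilon(r-1)\,\mathrm{OPT}(G^a)}{\varepsilon\,\mathrm{OPT}(G^a) + (1-\varepsilon)r\,\mathrm{OPT}(G^r)}$. Since $\mathrm{OPT}(G^r) \ge \mathrm{OPT}(G^a)$, the latter is the smaller ratio and thus the lower bound we can provide.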
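The same exact-arithmetic check applies to the maximization case. The sketch evaluates the lower bound on $\mathrm{OPT}(I)/\mathrm{ALG}(I)$ for every integer $w \in [0, \varepsilon n]$ and compares its minimum with the closed form of Theorem 5.5; the gadget values (OPT 4 and 6, BAD 3 and 5) are hypothetical.

```python
from fractions import Fraction as F

def max_inverse_ratio_lower_bound(eps, n, w, opt_a, opt_r, r):
    """Lower bound on OPT(I)/ALG(I) in the proof of Theorem 5.5, with BAD <= OPT / r."""
    good = (1 - eps) * n * opt_r
    num = good + w * opt_a + (eps * n - w) * opt_r
    den = good + w * opt_a / r + (eps * n - w) * opt_r / r
    return num / den

def theorem_5_5_bound(eps, opt_a, opt_r, bad_a, bad_r):
    """Closed-form ratio of Theorem 5.5 (maximization)."""
    r = min(F(opt_a, bad_a), F(opt_r, bad_r))
    return 1 + eps * (r - 1) * opt_a / (eps * opt_a + (1 - eps) * r * opt_r)

eps, n = F(1, 4), 100  # hypothetical parameters with eps * n integral
r = min(F(4, 3), F(6, 5))
values = [max_inverse_ratio_lower_bound(eps, n, w, 4, 6, r) for w in range(int(eps * n) + 1)]
print(min(values) == theorem_5_5_bound(eps, 4, 6, 3, 5))  # minimum attained at w = eps*n
```

Here the bound is $24/23$ at $w = 0$ and $33/32$ at $w = \varepsilon n$, and the closed form gives $33/32$.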
We mostly use Theorems 5.4 and 5.5 in the following specialized form. For a minimization problem, if $\mathrm{OPT}(G^a) = \mathrm{OPT}(G^r) = \mathrm{BAD}(G^a) - 1 = \mathrm{BAD}(G^r) - 1$, then no adaptive priority algorithm reading fewer than $(1 - H(\varepsilon))n/s$ advice bits can achieve an approximation ratio smaller than $1 + \frac{\varepsilon}{\mathrm{OPT}(G^a)}$. For a maximization problem, if $\mathrm{OPT}(G^a) = \mathrm{OPT}(G^r) = \mathrm{BAD}(G^a) + 1 = \mathrm{BAD}(G^r) + 1$, then no adaptive priority algorithm reading fewer than $(1 - H(\varepsilon))n/s$ advice bits can achieve an approximation ratio smaller than $1 + \frac{\varepsilon}{\mathrm{OPT}(G^a) - \varepsilon}$. All of the gadget pairs used in [7] to prove lower bounds in the fixed priority model also work in the adaptive priority model, since the proof here uses no additional restrictions. The reductions here are directly from 2-SGKH, rather than going through the Pair Matching problem as in [7]. This makes the proofs simpler in most respects (except for having to take changing priority functions into account), and it means that one does not lose a factor of 2 in the amount of advice required. Thus, the results from [7] can be expressed as lower bounds for adaptive priority algorithms with advice, as summarized in Table 1. All of the ratios obtained approach 1 as the amount of advice approaches some fraction of $n$.
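The specialized forms follow by direct substitution. Writing $O$ as shorthand (introduced here) for $\mathrm{OPT}(G^a) = \mathrm{OPT}(G^r)$, the minimization case has $r = \frac{O+1}{O}$ and the maximization case has $r = \frac{O}{O-1}$, so the bounds of Theorems 5.4 and 5.5 become

$$1 + \frac{\varepsilon\left(\frac{O+1}{O} - 1\right) O}{\varepsilon O + (1-\varepsilon) O} = 1 + \frac{\varepsilon}{O}, \qquad 1 + \frac{\varepsilon\left(\frac{O}{O-1} - 1\right) O}{\varepsilon O + (1-\varepsilon)\frac{O}{O-1}\, O} = 1 + \frac{\varepsilon}{\varepsilon(O-1) + (1-\varepsilon) O} = 1 + \frac{\varepsilon}{O - \varepsilon}.$$

For instance, the vertex cover gadgets have $O = 3$, $s = 7$, and the minimization form gives the bound $1 + \varepsilon/3$ of Theorem 5.3.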

Conclusion and Open Problems
Table 1: Summary of results. For a given problem and any $\varepsilon \in (0, \frac{1}{2}]$, no adaptive priority algorithm (in Model 1) reading fewer than the specified number of bits of advice can achieve an approximation ratio smaller than the ratio listed.
Problem | Advice bits | Approximation ratio
Unit Job Scheduling with Precedence Constraints | $(1 - H(\varepsilon))n/9$ | $1 + \frac{\varepsilon}{6 - \varepsilon}$

The extension of the adaptive priority model to the advice tape model leads to many new research directions. We consider the following open problems to be of particular interest:
1. Design and analyze new adaptive priority algorithms with advice for (special cases of) classical optimization problems, and convert them to offline algorithms by trying all possible advice strings. In particular, are there priority algorithms with advice that lead to exact exponential-time offline algorithms that are faster (in terms of the base of the exponent) than the best known ones?
2. The previous question also applies to approximation algorithms when the best known offline approximation algorithm has exponential running time.
3. Suggest and investigate other extensions of the adaptive priority framework besides the information-theoretic advice tape extension. For instance, one could consider a class of adaptive priority algorithms where advice is given by an $\mathrm{AC}^0$ circuit. What can be said about the power and limitations of such algorithms?
4. More generally, study the structural complexity of priority algorithms with advice. What reasonable complexity classes can be defined based on advice complexity and approximation ratio?
5. The lower bounds implied by our reduction-based framework are of the form "constant inapproximability even given linear advice." Can this framework be extended to handle superconstant inapproximability with sublinear advice? More generally, the goal is to design a framework that works in this other regime of parameters. A good starting point would be to show that maximum independent set cannot be approximated to within $n^{1-\varepsilon}$ with $O(\log n)$ bits of advice, for any fixed $\varepsilon \in (0, 1]$.
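To illustrate problem 1, and the observation that the branching in branch-and-reduce algorithms amounts to trying all advice strings, here is a hedged sketch for Minimum Vertex Cover. The priority algorithm is hypothetical and deliberately simple: it processes vertices in a caller-supplied order (a fixed rather than adaptive priority function), accepts a vertex without reading advice when an already-processed neighbor was rejected, and otherwise reads one advice bit. With advice encoding any minimum cover it outputs that cover, so exploring both values of every advice bit it would read yields an exact algorithm whose number of branches equals the number of advice strings tried.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]

def exact_vc_by_advice_branching(g: Graph, order: List[int]) -> Tuple[Set[int], int]:
    """Try every advice string the hypothetical priority algorithm could read.
    Returns a minimum vertex cover and the number of advice strings tried."""
    best: Optional[Set[int]] = None
    leaves = 0

    def branch(idx: int, cover: Set[int], rejected: Set[int]) -> None:
        nonlocal best, leaves
        if idx == len(order):
            leaves += 1
            if best is None or len(cover) < len(best):
                best = set(cover)
            return
        v = order[idx]
        if g[v] & rejected:
            # forced move, no advice read: an edge to a rejected neighbor needs v
            cover.add(v)
            branch(idx + 1, cover, rejected)
            cover.discard(v)
            return
        # unforced move: the algorithm would read one advice bit here
        cover.add(v)                     # advice bit 1: accept v
        branch(idx + 1, cover, rejected)
        cover.discard(v)
        rejected.add(v)                  # advice bit 0: reject v
        branch(idx + 1, cover, rejected)
        rejected.discard(v)

    branch(0, set(), set())
    assert best is not None
    return best, leaves
```

On the path 0-1-2 processed in order 0, 1, 2, the sketch tries 5 advice strings rather than all 8 subsets and returns the cover {1}; fewer advice bits per vertex translate directly into fewer branches, which is the connection between advice complexity and exact running time.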