Adaptive Majority Problems for Restricted Query Graphs and for Weighted Sets

Suppose that the vertices of a graph $G$ are colored with two colors in an unknown way. The color that occurs on more than half of the vertices is called the majority color (if it exists), and any vertex of this color is called a majority vertex. We study the problem of finding a majority vertex (or show that none exists) if we can query edges to learn whether their endpoints have the same or different colors. Denote the least number of queries needed in the worst case by $m(G)$. It was shown by Saks and Werman that $m(K_n)=n-b(n)$, where $b(n)$ is the number of 1's in the binary representation of $n$. In this paper, we initiate the study of the problem for general graphs. The obvious bounds for a connected graph $G$ on $n$ vertices are $n-b(n)\le m(G)\le n-1$. We show that for any tree $T$ on an even number of vertices we have $m(T)=n-1$ and that for any tree $T$ on an odd number of vertices, we have $n-65\le m(T)\le n-2$. Our proof uses results about the weighted version of the problem for $K_n$, which may be of independent interest. We also exhibit a sequence $G_n$ of graphs with $m(G_n)=n-b(n)$ such that $G_n$ has $O(nb(n))$ edges and $n$ vertices.


Introduction
Given a set X of n balls and an unknown coloring of X with a fixed set of colors, we say that a ball x ∈ X is a majority ball if its color class contains more than |X|/2 balls. The majority problem is to find a majority ball (or show that none exists). In the basic model of majority problems, one is allowed to ask queries of pairs (x, y) of balls in X to which the answer tells whether the color of x and y is the same or not, which we denote by SAME and DIFF, respectively. The answers are given by an adversary whose goal is to force us to use as many questions as possible. It is an easy exercise to see that if the number of colors is two, then in a non-adaptive search (all queries must be asked at once) the minimum number of queries to solve the majority problem is n − 1, unless n is odd, in which case n − 2 queries suffice. On the other hand, Fisher and Salzberg [9] proved that if we do not have any restriction on the number of colors, ⌈3n/2⌉ − 2 queries are necessary and sufficient to solve the majority problem adaptively (the answer to a query is known before asking the next one). If the number of colors is two, then Saks and Werman [16] proved that the minimum number of queries needed in an adaptive search is n − b(n), where b(n) is the number of 1's in the binary form of n (we note that there are simpler proofs of this result, see [1,14,17]). There are several other generalizations of the problem, which include more colors [2,4,10,12], larger queries [3,4,6,8,11,12,13], non-adaptive [1,5,10] and weighted versions [10].
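The query counts quoted above are simple closed formulas, so a small helper can tabulate them. The following is a minimal sketch; the function names `b`, `adaptive_two_colors`, `nonadaptive_two_colors` and `adaptive_any_colors` are illustrative and do not come from the cited papers.

```python
def b(n: int) -> int:
    """Number of 1's in the binary representation of n."""
    return bin(n).count("1")


def adaptive_two_colors(n: int) -> int:
    """Saks-Werman value n - b(n) for two colors, adaptive search."""
    return n - b(n)


def nonadaptive_two_colors(n: int) -> int:
    """Non-adaptive search with two colors: n - 1 for even n, n - 2 for odd n."""
    return n - 1 if n % 2 == 0 else n - 2


def adaptive_any_colors(n: int) -> int:
    """Fisher-Salzberg value ceil(3n/2) - 2 for an unrestricted number of colors."""
    return (3 * n + 1) // 2 - 2


if __name__ == "__main__":
    for n in (4, 7, 8, 15, 16):
        print(n, adaptive_two_colors(n), nonadaptive_two_colors(n), adaptive_any_colors(n))
```

For instance, n = 15 has binary form 1111, so the adaptive two-color value is 15 − 4 = 11.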
In the present paper we study the adaptive majority problem for two colors when we restrict the set of pairs that can be queried to the edges of some graph G on n vertices. The original majority problem, where we can ask any pair, corresponds to $G = K_n$. To distinguish between the version when we are restricted to the edges of a graph, and the original, unrestricted version, we call the colored objects vertices and balls, respectively. To the best of our knowledge, the only similar result is [7], where it was shown that if the size of the two color classes differs only by a constant, then Ω(n) queries might be needed on any graph even for a randomized algorithm to solve the majority problem, but if the sizes differ by Ω(n), then randomized algorithms can do better for several graph classes. In this paper, however, we only deal with the worst-case performance of deterministic algorithms, which allows us to obtain better bounds.
Notice that it is possible to solve the majority problem (with any number of queries) if and only if G is connected when n is even, and if and only if G has at most two components when n is odd. For any such graph, denote the minimum number of queries needed to solve the majority problem in the worst case by m(G). Obviously we have $n - b(n) = m(K_n) \le m(G) \le n - 1$ (moreover, m(G) ≤ n − 2 when n is odd). Our main results are the following.
Theorem 1.1. For every tree T on an even number n of vertices m(T) = n − 1, and for every tree T on an odd number n of vertices m(T) ≥ n − 65.
The constant 65 is probably far from optimal. We will see trees T on n vertices with m(T) = n − 3, but it is possible that m(T) ≥ n − 3 holds for every tree. We have a better lower bound, n − 6, for paths. We also study the least number of edges a graph must have if we can solve the majority problem as fast as in the unrestricted case, i.e., when m(G) = n − b(n).
Theorem 1.2. For every n, there is a graph G with n vertices and n(1 + b(n)) edges such that m(G) = n − b(n).
It would be interesting to determine whether this bound can be improved to O(n), or show a superlinear lower bound.
The proof of Theorem 1.1 uses a weighted version of the original majority problem (i.e., of the case $G = K_n$), which is defined in the next section. We think these results are interesting on their own.
In the following, we always suppose that only two colors are used, which we call red and blue. When both colors contain the same number of balls, then we call the coloring balanced.

Weighted majority problems
Now we define a variant of the majority problem, where the balls are given different weights. More precisely, given k balls with non-negative integer weights $w_1, \dots, w_k$, a ball is a (weighted) majority ball if the weight of its color class is more than $\sum_{i=1}^{k} w_i/2$. The (weighted) majority problem is to find a majority ball (or show that none exists). We will often identify a ball with its weight or its index and talk about a ball with weight $w_i$, or the ball $w_i$, or the ball i.
Note that during the running of an adaptive algorithm solving the non-weighted majority problem, at any point the information obtained so far can be represented by a graph whose vertices are all balls, and the queries asked are edges labeled with DIFF or SAME. Since now we study the majority problem only for two colors, we can deduce from the labels of the edges the color partition inside every component. Denote the difference between the sizes of the color classes in each component by $w_i$. Finishing the algorithm from a given state is equivalent to solving the majority problem with the weights $w_i$. Similarly, in the weighted version when we ask a ball with weight $w_i$ and a ball with weight $w_j$, we can consider the answer as merging the two balls into a ball with weight $w_i + w_j$ or $|w_i - w_j|$, depending on the answer. We will say that the new ball contains the two previous balls.
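The bookkeeping described in this paragraph can be sketched with a union-find structure that stores, for each ball, its color parity relative to its parent. This is only an illustration under stated assumptions: the class name `QueryState` and its methods are hypothetical, and consistency of repeated queries inside one component is not checked.

```python
class QueryState:
    """Tracks components and their weights from SAME/DIFF answers (illustrative)."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.flip = [0] * n  # 1 if a ball's color differs from its parent's color
        # For each root: [number of balls colored like the root, number colored differently]
        self.count = [[1, 0] for _ in range(n)]

    def find(self, x: int):
        """Return (root of x, parity of x's color relative to the root)."""
        parity = 0
        while self.parent[x] != x:
            parity ^= self.flip[x]
            x = self.parent[x]
        return x, parity

    def answer(self, x: int, y: int, same: bool) -> None:
        """Record the answer SAME (same=True) or DIFF (same=False) for the query (x, y)."""
        rx, px = self.find(x)
        ry, py = self.find(y)
        if rx == ry:
            return  # query inside one component: no new information
        rel = px ^ py ^ (0 if same else 1)  # color of ry relative to rx
        self.parent[ry] = rx
        self.flip[ry] = rel
        like_root, unlike_root = self.count[ry]
        if rel:
            like_root, unlike_root = unlike_root, like_root
        self.count[rx][0] += like_root
        self.count[rx][1] += unlike_root

    def weights(self):
        """Sorted list of |#one color - #other color| over all components."""
        return sorted(abs(c[0] - c[1])
                      for i, c in enumerate(self.count) if self.parent[i] == i)


if __name__ == "__main__":
    s = QueryState(4)
    s.answer(0, 1, True)   # two balls of weight 1 merge into weight 2
    s.answer(2, 3, False)  # two balls of weight 1 merge into weight 0
    print(s.weights())
```

Each recorded answer replaces two component weights by their sum or by the absolute value of their difference, which is exactly the merging rule of the weighted problem.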
A set of k balls with given weights $w_1, \dots, w_k$ can be represented by a vector $w = (w_1, \dots, w_k)$. We denote the number of queries needed to solve the weighted majority problem in the worst case by m(w). So, with this notation, the result of Saks and Werman for the non-weighted problem can be written as m(1, . . . , 1) = n − b(n).
The weighted problem was first studied in [10], where the following proposition (which also implies the result of Saks and Werman) was proved, generalizing a result of [14] (which built on [15]) about the non-weighted variant. Let µ(k) denote the largest l such that $2^l$ divides k (and define µ(0) = ∞). For $w = (w_1, \dots, w_k)$ denote by p the number of balanced colorings and by $p_i$ the number of (non-balanced) colorings such that $w_i$ is in the majority class.²
It was also shown in [10] that m(w) = k − µ(p) for µ(p) ≤ 2, but not in general, e.g., for w = (1, 2, 3, 4, 5, 6, 7) we have 8 balanced colorings, but m(w) = 5 > 7 − µ(8).
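For small weight vectors, the quantities p, µ(p) and m(w) can be computed by exhaustive search. The sketch below is an assumption-laden illustration rather than the method of [10]: the names `mu`, `balanced_colorings` and `weighted_m` are hypothetical, and `weighted_m` uses the merging view described earlier, treating a state as finished when one ball outweighs all the others together or when all weights are zero.

```python
from functools import lru_cache
from itertools import product


def mu(p: int):
    """Largest l such that 2**l divides p; infinity for p = 0."""
    if p == 0:
        return float("inf")
    return (p & -p).bit_length() - 1


def balanced_colorings(w) -> int:
    """Number of red/blue colorings in which both color classes have equal weight."""
    return sum(1 for signs in product((1, -1), repeat=len(w))
               if sum(s * x for s, x in zip(signs, w)) == 0)


@lru_cache(maxsize=None)
def weighted_m(w) -> int:
    """Worst-case number of queries for the sorted weight tuple w (brute force)."""
    total = sum(w)
    if total == 0 or 2 * max(w) > total:
        return 0  # no majority in any coloring, or one ball is always in the majority
    best = len(w)
    for i in range(len(w)):
        for j in range(i + 1, len(w)):
            rest = w[:i] + w[i + 1:j] + w[j + 1:]
            same = tuple(sorted(rest + (w[i] + w[j],)))
            diff = tuple(sorted(rest + (abs(w[i] - w[j]),)))
            best = min(best, 1 + max(weighted_m(same), weighted_m(diff)))
    return best


if __name__ == "__main__":
    w = (1, 2, 3, 4, 5, 6, 7)
    p = balanced_colorings(w)
    print(p, mu(p))
    print([weighted_m((1,) * n) for n in range(1, 9)])
```

Calling `weighted_m((1, 2, 3, 4, 5, 6, 7))` is one way to recheck the value m(w) = 5 reported in the text; the search is exponential, so it is only practical for a handful of balls.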
Our main results about the weighted majority problem are exact bounds for some special w. They are based on the following lemma.
Lemma 2.2. (i) If $w_1 = \dots = w_{2^n} = 1$ and $\sum_{i=1}^{k} w_i = 2^{n+1}$, then m(w) = k − 1.
(ii) If $w_1 = \dots = w_{2^n} = 1$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 1$ and $k \ne 2^n + 1$, then m(w) = k − 2.
Proof. For both statements we use Proposition 2.1. Let us start with (i). We are going to calculate µ(p). First color only the balls whose index is from $\{2^n + 1, \dots, k\}$. Denote by B and R, respectively, the blue and red balls whose index is from $\{2^n + 1, \dots, k\}$. Let $x := |\sum_{i \in B} w_i - \sum_{i \in R} w_i|$. Note that x is an even integer, and for a fixed x there are $2\binom{2^n}{2^{n-1} - x/2}$ ways in which we can color $\{1, \dots, 2^n\}$ to make the coloring balanced (note that this value is obtained by considering both the cases $\sum_{i \in B} w_i - \sum_{i \in R} w_i \le 0$ and $\sum_{i \in B} w_i - \sum_{i \in R} w_i \ge 0$). It is easy to see that $2\binom{2^n}{2^{n-1} - x/2}$ is divisible by 4 except for the case $x = 2^n$, when this number is exactly 2. This means that the number of balanced colorings is 2 mod 4. Thus Proposition 2.1 (i) gives m(w) ≥ k − 1, and since m(w) ≤ k − 1 always holds, we have equality.
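As a small worked instance of the counting in part (i), take n = 2 and the vector w = (1, 1, 1, 1, 2, 2), chosen here only for illustration. Its total weight is 8 = 2^{n+1}, and k = 6. Writing d for the signed difference (blue minus red) among the two balls of weight 2, the four unit balls must contribute −d for the coloring to be balanced:

```latex
\begin{align*}
d = 4  &:\quad \binom{4}{0} = 1 \text{ coloring of the unit balls (all red)},\\
d = -4 &:\quad \binom{4}{4} = 1 \text{ coloring of the unit balls (all blue)},\\
d = 0  &:\quad \binom{4}{2} = 6 \text{ colorings, for each of the two mixed colorings of the heavy balls},\\
p &= 1 + 1 + 6 + 6 = 14 \equiv 2 \pmod 4, \qquad \mu(p) = 1 .
\end{align*}
```

So in this instance the number of balanced colorings is 2 mod 4, as the proof predicts, and the lemma gives m(w) = k − 1 = 5.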
The proof of (ii) goes similarly. We are going to calculate $\mu(p_k)$. We define B and R in the same way. Without loss of generality we can assume that ball k is blue and let $y = \sum_{i \in B} w_i - \sum_{i \in R} w_i$. Then the number of colorings of the first $2^n$ balls such that blue is the majority color is $\sum_{i = 2^{n-1} - \lfloor y/2 \rfloor}^{2^n} \binom{2^n}{i}$. If $y < 2^n$, each term here is divisible by 2, except the last one, while if $y > 2^n$, each term is divisible by 2. There are $2^{k-1-2^n}$ ways to color the balls from $\{2^n + 1, \dots, k - 1\}$ out of which $y > 2^n$ only once. This means that when ball k is blue, then the number of colorings (of all the balls) where blue is the majority color is odd. Of course, the same is true when ball k is red. Thus Proposition 2.1 (ii) gives m(w) ≥ k − 2, and since m(w) ≤ k − 2 holds whenever $\sum_{i=1}^{k} w_i$ is odd, we have equality.
² Beware that in [10] a slightly different notation was used, where p denoted the number of balanced 2-partitions, which is half of the number of balanced colorings, and part (ii) of Proposition 2.1 was not explicitly stated.
Note that in the above proof we do not need at all that the first few weights are 1, we only need that they are the same and that their number is a power of two. We also do not need in (i) that the sum of all the weights is exactly $2^{n+1}$, only that $x = \sum_{i \in B} w_i - \sum_{i \in R} w_i = 2^n$ can happen in an odd number of ways. We also do not need in (ii) that the sum of all the weights is exactly $2^{n+1} + 1$, only that $-2^n < y = \sum_{i \in B} w_i - \sum_{i \in R} w_i \le 2^n$ can happen in an odd number of ways (with fixed ball k always blue). A sufficient condition for this is that $y > -2^n$ always holds, while $y \le 2^n$ holds except when all balls are blue, i.e., we need $w_k - \sum_{i=2^n+1}^{k-1} w_i > -2^n$ and $\sum_{i=2^n+1}^{k} w_i - w_j \le 2^n + w_j$ for any $2^n < j < k$. To summarize, this proves the following.
Lemma 2.3. Let $w = (w_1, \dots, w_k)$ and $k > 2^n + 1$.
For the bound m(w) ≥ k − 1 − s we need one more trick. We run the adversarial algorithm that gives the lower bound in Lemma 2.2, until the first ball of weight 1 is queried. When this happens, we reveal that it is red, and we also reveal that another ball of weight 1 is blue. We do this s times, and after that proceed according to the adversarial algorithm.
Call a vector $w = (w_1, \dots, w_k)$ hard if m(w) = k − 1 and $\sum_{i=1}^{k} w_i$ is even, or $\sum_{i=1}^{k} w_i$ is odd and m(w) = k − 2. Thus Lemma 2.2 states that the vectors satisfying its conditions are hard.
Proof. If k = 2, then w′ is hard by definition. Let us assume that k ≥ 3 and w′ is not hard. We will show that w is not hard either. We ask $w_1$ and $w_2$ in the first query. If the answer is DIFF, we can obviously finish with k − 3 further queries if $\sum_{i=1}^{k} w_i$ is even and k − 4 further queries if $\sum_{i=1}^{k} w_i$ is odd, thus w cannot be hard. If the answer is SAME, we apply our algorithm for w′ to reach the same conclusion using that w′ is not hard.
Question 2.6. Does the reverse direction also hold in the above observation?
Also, the respective statement might hold when m(w) is smaller, but we do not know of other, generally applicable sufficient conditions. Combining Observation 2.5 with Lemma 2.2, we obtain the following statement.
Lemma 2.7. If $w_1, \dots, w_j$ are each powers of two and
Note that (i) of Lemma 2.2 states that if the sum of the weights in w is a power of two, and at least half of that weight is given by balls of weight 1, then w is hard. Lemma 2.7 shows that weight 1 can be replaced by any weights that are powers of two, and (ii) of Lemma 2.2 can be similarly improved. If Observation 2.5 held for non-hard vectors as well, then we could obtain an improvement of Lemma 2.7, similar to Corollary 2.4. Instead, we state the following weaker statement.
Corollary 2.8. If $w_1, \dots, w_j$ are each powers of two and $\sum_{i=1}^{j} w_i = 2^n$, $w_{j+1} = 1$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 3$ and $k > 2^n + 2$, then m(w) ≥ k − 3.
Proof. If $w_i = 1$ for every i > j, then we know that m(w) = k − 2. Otherwise, before the start of the algorithm, we can reveal that ball j + 1 and the heaviest ball with index > j + 1 have different colors, reducing the problem to (ii) of Lemma 2.7.
Combining Lemma 2.3 and Corollary 2.8 we obtain the following.
If there is some i > j such that $w_i = 1$, we are done using Corollary 2.8. Otherwise, for all j < i ≤ k we have $w_i \ge 2$, and we can apply Lemma 2.3.
The next subsection, contrary to its title, is not so relevant to our proof, but it helps to understand better what can happen before the final steps of an optimal algorithm that solves the majority problem.

Relevant balls
Given $w = (w_1, \dots, w_k)$, we say that $w_i$ is relevant if there is a coloring of the other balls such that the color of $w_i$ changes what the majority color is, or whether a majority color exists. In other words, there is a coloring of the other balls, such that either $w_i$ is red means red is majority and $w_i$ is blue means blue is majority, or one color of $w_i$ means there is no majority, the other color means there is majority. In this subsection we prove some simple facts about relevant balls. We start with some simple observations.
Proposition 2.10. (i) If m(w) = 0, then either there is no relevant ball and the answer is that there is no majority, or there is one relevant ball and that is the majority ball.
(ii) If we obtain w ′ from w by any answer to a query (a, b) such that b is non-relevant, then m(w ′ ) = m(w).
(iii) If we increase the weight of a relevant ball, it remains relevant. In other words, if we obtain w′ from w by replacing one ball
(v) If a ball x is relevant before a query Q not containing x, there is at least one answer to Q such that x is still relevant afterwards. If a ball x is relevant before a query (x, y), then after the answer SAME the resulting ball with weight x + y is relevant.
(vi) For any query (a, b) there is an answer such that the number of relevant balls decreases by at most two. Proof. To prove (i), observe that if we color all the balls blue, there is a majority, unless all the balls have zero weight, in which case there is no relevant ball. Thus if there is no majority in any coloring of w, then there cannot be relevant balls. If there is a majority ball a, then it has to be the only relevant ball. Indeed, if b is relevant, then changing the color of b must change the answer, unless b was the answer.
To prove (ii), let c be the ball that the answer to the query (a, b) gives, thus it has weight either a + b or |a − b|. Any algorithm that gives a solution for w gives a solution for w′, where asking a ball that contains a is replaced by the query that contains c, and ignoring queries that involve b. A similar argument shows that a solution for w′ gives a solution for w. Therefore, m(w′) = m(w).
To prove (iii), assume for a contradiction that $w'_i$ is not relevant and take a coloring of the other balls that shows this. But then taking the same coloring for the other balls also shows that $w_i$ is not relevant, a contradiction.
To prove (iv), observe first that it is equivalent to the statement that a non-relevant ball cannot have weight larger than or equal to that of a relevant ball. Indeed, this is obviously implied by (iv), and if this holds, then t can be chosen as the largest weight of a non-relevant ball. Now assume a is relevant and w(b) ≥ w(a). Then consider the coloring that shows a is relevant, and exchange the color of b and the color of a. This coloring clearly shows b must be also relevant.
To prove (v), assume x is not in the query and consider a coloring of the other balls such that the color of x decides the majority. In that coloring the balls in the query have different or same color; answer accordingly. Then the same coloring shows x is still relevant.
Assume now x is in the query (x, y) and the answer is SAME. Consider a coloring of the other balls such that the color of x decides the majority. Taking the same coloring, the new ball x + y will decide majority.
To prove (vi), consider the relevant ball x not in the query with the smallest weight. By (v) there is an answer such that x remains relevant. As other relevant balls have larger weight, they also remain relevant, except for a and b (whose total weight can go below the weight of x).
We remark that (vi) of the above proposition gives a new proof of a proposition from [10], which states that if all the n balls are relevant, we need at least ⌊n/2⌋ queries.
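The definition of a relevant ball can be checked directly on small vectors by trying every coloring of the other balls. This is a brute-force sketch with hypothetical helper names (`outcome`, `relevant`, `relevant_balls`); it encodes blue as +1 and red as −1 and compares the outcome for the two colors of the tested ball.

```python
from itertools import product


def outcome(signed_total: int) -> int:
    """+1 if blue is the majority, -1 if red is the majority, 0 if there is none."""
    return (signed_total > 0) - (signed_total < 0)


def relevant(w, i: int) -> bool:
    """True if some coloring of the other balls makes the color of ball i matter."""
    others = tuple(w[:i]) + tuple(w[i + 1:])
    for signs in product((1, -1), repeat=len(others)):
        s = sum(sign * x for sign, x in zip(signs, others))
        if outcome(s + w[i]) != outcome(s - w[i]):
            return True
    return False


def relevant_balls(w):
    """Indices of all relevant balls of the weight vector w."""
    return [i for i in range(len(w)) if relevant(w, i)]


if __name__ == "__main__":
    for w in [(5, 1, 1), (1, 1, 1), (2, 1, 1), (0, 3)]:
        print(w, relevant_balls(w))
```

In the vector (5, 1, 1) only the heavy ball is relevant, which matches part (i) of the proposition: no query is needed, and the single relevant ball is the majority ball.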
Proposition 2.11. Before the last query of an optimal algorithm, there are either two relevant balls, they are of equal weight and there are no other balls with non-zero weight, or there are three relevant balls, and any query that compares two of them finishes the algorithm.
Proof. Proposition 2.10 implies that there are at most three relevant balls. Observe that if there are two relevant balls a and b in w, they must have the same weight. Indeed, if w(a) > w(b) and the total weight m of the other balls is smaller than w(a) − w(b), then a is the only relevant ball. If m ≥ w(a) − w(b), then color b red, a blue and go through the other balls in increasing order of their weight, without the last ball c. We give each of them the color which has the smaller weight at that point. The first ball gets the color red, but as m ≥ w(a) − w(b), at one point the total weight of red balls becomes at least the total weight of blue balls. From that point, the difference between the classes is at most the weight of the current ball, which is at most w(c). This coloring shows c is relevant.
It is left to show that if there are exactly three relevant balls, a, b and c, querying any two of them (say a and b) finishes the algorithm. Let m be the sum of the weights of the other balls. If m = 0, we are done unless both a + b and |a − b| are equal to c, which means b = 0, but a ball with zero weight cannot be relevant. Thus we can assume m > 0. We have a ≤ b + c + m, otherwise we are done (which contradicts our assumption that we are before the last query). But we also have a + m ≤ b + c, because the other balls are not relevant. Moreover, if a + m = b + c, then again, some of the other balls would be relevant, thus we have a + m < b + c. This implies c > a − b + m, thus we are done if the answer is DIFF, as c is a majority ball. We also have a + b + m ≥ c, otherwise we are done without the last query, and it implies a + b ≥ c + m, moreover a + b > c + m, otherwise some of the remaining balls are relevant. Thus we are done if the answer is SAME.

Graphs
Let us start this section with describing in detail how the weighted majority problems are connected to the majority problem on graphs. Consider an algorithm solving the majority problem on a graph G. Let $G_i$ be the subgraph of G formed by the first i queries. We call the vertex set of a connected component of $G_i$ a q-component. Observe that knowing the answer to the first i queries, for every q-component U we know a partition of U into a blue subset $U_1$ and a red subset depending on the answer to the (i + 1)st query (u, u′). For other q-components U″ we have $w_{i+1}(U'') = w_i(U'')$. Hence an algorithm that finishes solving the majority problem on G after the k-th query also solves the weighted majority problem for the vector having the weights of the q-components of $G_k$ as coordinates. However, this does not work in the other direction, as we do not have the restriction of the graph structure in the weighted problem. Thus we can only prove lower bounds for m(G) this way.
We will omit i and simply talk about w(U) instead of $w_i(U)$ because i will always be clear from the context. For a ball u ∈ U, let w(u) := w(U). If a q-component X has weight zero, we say that X is balanced. Similarly to vectors, we say that a graph G on n vertices is hard if m(G) = n − 1 for even n and m(G) = n − 2 for odd n.
Proof. We show more: the adversary can pick in advance a coloring c of the vertices such that no matter what edge is missing from the queries, we cannot find out if there is a majority or not. All this coloring needs to satisfy is that we have n/2 blue balls, and if we remove any edge from T, both the resulting subtrees are unbalanced, i.e., the number of blue and red balls is not the same in them. Indeed, if an edge of T is not asked, then the adversary can either claim that the real coloring of the vertices is c and thus no majority vertex exists, or the coloring coincides with c on one component, but is exactly the flipped version of c on the other component and thus there is majority.
Equivalently, we want to find a balanced 2-coloring such that each edge of T cuts it into two non-balanced parts. We start with an arbitrary balanced coloring of T . If an edge connecting a red ball u and a blue ball v cuts T into two balanced parts, we can simply change the color of u to blue and the color of v to red. Observe that any other edge e cuts T into two parts such that u and v belong to the same part, hence it does not change whether e cuts T into balanced parts.
Let us assume now that u and v are both red, and the edge uv cuts T into two balanced parts A and A ′ . Then we change the color of every ball in A. We claim that for any edge e that cuts T into parts B and B ′ , it does not change whether B and B ′ are balanced. Indeed, either B is completely inside A ′ , in which case no color inside B is changed, or B contains A, in which case some colors have changed, but the number of blue balls turning red is the same as the number of red balls turning blue, as A is balanced. As B ′ is balanced if and only if B is balanced, B ′ is also unaffected. Now u and v have different colors, so we can again exchange their color.
Hence we obtained that we can decrease the number of edges that cut T into balanced parts. It is easy to see that the coloring remains balanced. After applying this operation finitely many times we obtain the desired coloring, finishing the proof.
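The recoloring procedure from this proof can be sketched as follows. The code assumes a tree on an even number n of vertices labeled 0, ..., n − 1 and given as an edge list; the function names are illustrative, and colors are encoded as +1 (blue) and −1 (red).

```python
def build_adj(n, edges):
    """Adjacency lists of the tree."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def side_of(adj, u, v):
    """Vertices on u's side after deleting the tree edge uv."""
    seen, stack = {u}, [u]
    while stack:
        a = stack.pop()
        for c in adj[a]:
            if c not in seen and not (a == u and c == v):
                seen.add(c)
                stack.append(c)
    return seen


def balanced_cut_edges(n, edges, color):
    """Edges whose removal leaves two balanced parts (the total coloring is balanced)."""
    adj = build_adj(n, edges)
    return [(u, v) for u, v in edges
            if sum(color[x] for x in side_of(adj, u, v)) == 0]


def adversary_coloring(n, edges, color=None):
    """Balanced coloring in which no edge cuts the tree into two balanced parts."""
    adj = build_adj(n, edges)
    color = list(color) if color is not None else [1] * (n // 2) + [-1] * (n // 2)
    while True:
        bad = balanced_cut_edges(n, edges, color)
        if not bad:
            return color
        u, v = bad[0]
        if color[u] == color[v]:
            # Flip the balanced part containing u, so that u and v get different colors.
            for x in side_of(adj, u, v):
                color[x] = -color[x]
        # Exchange the colors of u and v; this unbalances both sides of the edge uv.
        color[u], color[v] = color[v], color[u]


if __name__ == "__main__":
    path = [(i, i + 1) for i in range(5)]
    print(adversary_coloring(6, path, [1, -1, 1, -1, 1, -1]))
```

Each pass of the loop removes one edge from the list of balanced cuts and creates no new one, which is the termination argument given in the proof.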
Surprisingly, it is much harder to give a lower bound for trees on an odd number of vertices. For paths, for example, we have $m(P_n) = n - b(n)$ for all odd n ≤ 13, while $m(P_{15}) = 12 = n - b(n) + 1 = n - 3$. (This we have verified with a computer program.) We conjecture that n − 3 might be a lower bound for all trees, but we can only prove the weaker bound n − 65.
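A computer check of this kind can be sketched for small paths with a memoized game search. The code below is not the program used for the verification mentioned above; it is a hypothetical reconstruction that represents a state by the weights of the consecutive q-components along the path and only allows queries between adjacent q-components.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def finish(segments) -> int:
    """Worst-case number of further queries from a state on a path.

    `segments` lists the weights of the consecutive q-components from left to right.
    """
    total = sum(segments)
    if total == 0 or 2 * max(segments) > total:
        return 0  # the answer is already determined
    best = len(segments)
    for i in range(len(segments) - 1):
        a, c = segments[i], segments[i + 1]
        same = segments[:i] + (a + c,) + segments[i + 2:]
        diff = segments[:i] + (abs(a - c),) + segments[i + 2:]
        best = min(best, 1 + max(finish(same), finish(diff)))
    return best


def m_path(n: int) -> int:
    """m(P_n) in this model, computed by exhaustive search."""
    return finish((1,) * n)


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, m_path(n))
```

The number of states grows quickly with n, so the sketch is intended for small paths only; larger cases such as n = 15 would likely need a more careful implementation.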
To prove the lower bound of n − 65 for odd n, we start with a lemma that gives another proof for Proposition 3.1. First, we introduce a notation. In a graph G, for a subset of its vertices X ⊂ V we denote by δ(X) the parity of the number of edges between X and V \ X. If G is a tree and X is a connected subset of vertices, then δ(X) equals the parity of the number of components of V \ X. Recall that the weight of a q-component X, denoted by w(X) is the difference between the number of the blue balls and the number of the red balls in it.
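The parity δ(X) and its description for connected subsets of a tree can be illustrated with a few lines of code. The helper names `delta` and `components_outside` are hypothetical; vertices are assumed to be labeled 0, ..., n − 1.

```python
def delta(edges, X) -> int:
    """Parity of the number of edges with exactly one endpoint in X."""
    X = set(X)
    return sum((u in X) != (v in X) for u, v in edges) % 2


def components_outside(n, edges, X) -> int:
    """Number of connected components of the graph induced on V minus X."""
    X = set(X)
    adj = {v: [] for v in range(n) if v not in X}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            a = stack.pop()
            for c in adj[a]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
    return count


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    print(delta(star, {0}), components_outside(4, star, {0}) % 2)
```

In a tree, every component of V \ X is joined to a connected set X by exactly one edge, which is why the two parities agree.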
Lemma 3.2. We can answer to any sequence of queries in any graph G such that for any q-
Note that if T is a tree on an even number of vertices, then 0 ≤ w(X) ≤ 2 implies that the game goes on until we have at most one non-balanced q-component. Assume at that point there would also be some balanced q-components. Observe that there is a tree-structure on the q-components of a tree. Then at least one of the balanced q-components would be a leaf-component, but that contradicts condition (ii). Thus there can be only one component, which implies Proposition 3.1.
For trees on an odd number of vertices, a similar argument cannot work, as for example in $P_n$, it can happen that the first two vertices form a q-component of weight 2, followed by (n − 3)/2 pairs of vertices that each form a balanced q-component, and the last vertex is a q-component of weight 1. In this case we have solved the majority problem with only (n − 1)/2 queries. In fact, according to the conditions of Lemma 3.2, our algorithm would answer exactly so that it would produce such weights for the q-components. For paths, there is no way to keep the weight function bounded without allowing an arbitrary number of adjacent balanced q-components; but if this happened, then we could merge all the q-components to their left, and all the q-components to their right, so that only two non-balanced q-components remain; after this we are done if n is odd, saving an arbitrarily large number of queries. This is why the proof will be more complicated for trees on an odd number of vertices; we will need to use our results about weighted balls.
Proof of Lemma 3.2. Initially the conditions are satisfied. Suppose that the query is between two q-components, X and Y.
If |X| + |Y | is odd, then exactly one of w(X) and w(Y ) equals 1, while the other equals 0 or 2, so we can achieve w(X ∪ Y ) = 1 to satisfy condition (i).
If |X| and |Y | are both odd, then we can choose the weight of X ∪ Y to be 0 or 2; one of those is equal to 2δ(X).
If |X| and |Y | are both even, then since δ( Thus we can answer so that w(X ∪ Y ) becomes 0, 2 or 0, respectively, to satisfy condition (ii).
For the lower bound of n − 65 for trees on an odd number of vertices, we need another theorem. Before that, we prove a simpler result that contains an important ingredient of the proof, and is of independent interest.
Theorem 3.3. Let $n = 2^k + l$, where $l < 2^k$. If G has a set U of vertices such that $|U| \le 2^{k-2}$ and the components of G \ U are single vertices (i.e., every edge is incident to a vertex in U), then G is hard, i.e., m(G) = n − 1 if n is even and m(G) = n − 2 if n is odd.
Proof. Denoting by w(X) the weight of a q-component X, we initially have $\sum_X w(X) = n$. The adversary will maintain in the first part of the algorithm that $w(X) \ne 0$ for every q-component X.
Let us now describe a strategy of the adversary for the first part of the algorithm. Whenever for some v ∈ G \ U we ask the first query containing v, if the other vertex in the query u ∈ U is such that w(u) = p ≥ 2, the answer is such that the weight of the new q-component is p − 1, thus $\sum_X w(X)$ decreases by 2. In every other case the answer is such that the weights are added up, i.e., $\sum_X w(X)$ remains the same.
Introduce the potential function $\Psi = \sum_X w(X) + |\{X \mid X \cap U \ne \emptyset,\ w(X) = 1\}|$. The adversary's strategy is such that every time we ask the first query containing a v ∈ G \ U, the function Ψ decreases by at least 1. Since initially Ψ = n + |U|, after |U| + l queries involving some vertex of V \ U, we would have $2^k \ge \Psi \ge \sum_X w(X)$. But the adversary stops executing this algorithm the moment we have $\sum_X w(X) = 2^k$ or $\sum_X w(X) = 2^k + 1$; this surely happens, as $\sum_X w(X)$ can only decrease by 2. Let us consider the vertices from G \ U that were merged into some q-components (i.e., those that appeared in queries). Let x denote the number of those where the total weight did not decrease when they first appeared in a query, and y denote the number of those where the total weight decreased when they first appeared in a query. Then we have x ≤ y + |U|. Indeed, consider a q-component containing a vertex u ∈ U, and observe that whenever the weight of this component increased by merging it with a vertex from G \ U, the next time its weight decreased.
This implies that at the point where the adversary stops executing the algorithm, the number of vertices in G \ U that have not appeared in any query is at least $n - |U| - (|U| + l) \ge 2^{k-1}$. Now we can apply Lemma 2.2 to the current q-components as weighted balls. Indeed, we have at least $2^{k-1}$ q-components of weight 1, and the total weight is $2^k$ or $2^k + 1$. By Lemma 2.2 the number of queries needed is the number of components minus 1 or minus 2, depending on the parity. Hence even if we could compare any two q-components from now, we still could not solve the majority problem with fewer queries.
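The counting step at the end of this proof can be written out explicitly, using only $n = 2^k + l$ and the assumption $|U| \le 2^{k-2}$ from the statement of the theorem:

```latex
\begin{align*}
n - |U| - (|U| + l) &= (2^k + l) - 2|U| - l \\
                    &= 2^k - 2|U| \\
                    &\ge 2^k - 2 \cdot 2^{k-2} = 2^{k-1} .
\end{align*}
```

This is the number of untouched vertices of G \ U that serve as the weight-1 balls required by Lemma 2.2.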
With a similar method, we can obtain the following lower bound for odd paths.
Theorem 3.4. $m(P_n) \ge n - 6$. Moreover, $m(P_n) \ge n - 5$ unless n + 1 or n + 3 is a power of two.
Proof. We have already seen that this holds if n is even, so it is enough to prove the theorem for n odd. First we prove the weaker claim $m(P_n) \ge n - 10$. The statement holds for n < 1000 as $m(P_n) \ge n - b(n)$. Let U include every 9th vertex of $P_n$, starting with the first, and also the last vertex of $P_n$, so $\lceil n/9 \rceil \le |U| \le \lceil n/9 \rceil + 1$, and $P_n \setminus U$ consists of paths on 8 vertices (and possibly one shorter path at the end). We answer each query such that for any q-component X if $X \cap U = \emptyset$, then $w(X) \le 1$, while if $X \cap U \ne \emptyset$, then $1 \le w(X) \le 2$. In each step the total weight decreases by 0 or 2, so after a while it becomes $2^k + 1$ for $k = \lfloor \log n \rfloor$. When this happens, we apply Lemma 2.2 to the current q-components as weighted balls. Indeed, $\sum_{X : X \cap U \ne \emptyset} w(X) \le 2|U| \le 2\lceil n/9 \rceil + 2 \le n/4 \le 2^{k-1}$ if n > 1000, so $\sum_{X : X \cap U = \emptyset} 1 > 2^{k-1}$. By Lemma 2.2, the number of queries needed to finish is at least the number of components minus 2. Equivalently, Lemma 2.2 states that we need to connect the weighted balls until at most two components remain, thus, we need to connect all of U into at most two components. This means querying all the edges between any two vertices of U that are in the same component at the end. That means the edges we have not queried are all on a path of length 9 (between two vertices from U). This proves $m(P_n) \ge n - 10$.
But we can do even better, because out of the 9 edges of the path at least 4 must be queried if the path contains no non-balanced components. This proves $m(P_n) \ge n - 6$ if n is large enough, but now we have to be more careful with the calculations. Because of this, we also change how we select U; instead of starting with the first vertex, we start with the second vertex of the path, then take every 9th vertex, and finally the last but one vertex. We can afford to skip the endvertices, as a single vertex cannot form a balanced component anyhow; we can only compare it to its adjacent vertex from U. This gives $|U| = \lfloor (n+14)/9 \rfloor$, and $(n+14)/9 \le n/8$ if n ≥ 112, while for n < 127 the lower bound n − b(n) ≥ n − 6 holds.
The proof of the moreover part is similar, except that after we start with the second vertex, we take every 8th vertex, and finally the last but one vertex. This way only 4 edges can remain unqueried between two different components. This gives $|U| = \lfloor (n+12)/8 \rfloor$, and this is at most $2^{\lfloor \log_2 n \rfloor - 2}$ unless n + 1 or n + 3 is a power of two.
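The numeric side conditions used in the proof of Theorem 3.4 are easy to test by machine over a finite range. The sketch below only checks the arithmetic, not the adversary argument; the bound `LIMIT` and the helper names are arbitrary choices made for illustration. For the last condition it uses the non-strict comparison $|U| \le 2^{\lfloor \log_2 n \rfloor - 2}$, which is the form that matches the two stated exceptions exactly.

```python
LIMIT = 1 << 13  # arbitrary upper end of the tested range


def b(n: int) -> int:
    """Number of 1's in the binary representation of n."""
    return bin(n).count("1")


def is_power_of_two(x: int) -> bool:
    return x > 0 and x & (x - 1) == 0


# n - b(n) >= n - 10 for every n < 1000.
small_case_n_minus_10 = all(b(n) <= 10 for n in range(1, 1000))

# 2*ceil(n/9) + 2 <= n/4 for n > 1000 (tested up to LIMIT).
u_bound_n_minus_10 = all(4 * (2 * -(-n // 9) + 2) <= n for n in range(1001, LIMIT))

# (n + 14)/9 <= n/8 for n >= 112 (tested up to LIMIT).
u_bound_n_minus_6 = all(8 * (n + 14) <= 9 * n for n in range(112, LIMIT))

# n - b(n) >= n - 6 for every n < 127.
small_case_n_minus_6 = all(b(n) <= 6 for n in range(1, 127))


def moreover_condition(n: int) -> bool:
    """floor((n + 12)/8) <= 2**(floor(log2 n) - 2), for n >= 4."""
    k = n.bit_length() - 1
    return (n + 12) // 8 <= 2 ** (k - 2)


# For odd n the condition fails exactly when n + 1 or n + 3 is a power of two.
moreover_matches = all(
    moreover_condition(n) == (not (is_power_of_two(n + 1) or is_power_of_two(n + 3)))
    for n in range(5, LIMIT, 2)
)

if __name__ == "__main__":
    print(small_case_n_minus_10, u_bound_n_minus_10, u_bound_n_minus_6,
          small_case_n_minus_6, moreover_matches)
```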
It is an interesting question where the truth is between n − 3 and n − 6 for $P_n$ for odd n. Our only (computer verified) case is $m(P_{15}) = n - 3$.
Now we present the lower bound for general trees. The proof of Theorem 3.6 is based on a similar idea as that of Theorem 3.3, but also combines ideas from Theorem 3.4 and uses Proposition 2.9. We also need the following version of the folklore generalization of the concept of centroid for trees, known as centroid decomposition.
Proposition 3.5. In every tree on n vertices, for every integer p, there is a subset U of at most 2n/p vertices such that every component of G \ U has at most p edges (including the edges from the components to U).
Theorem 3.6. If G is a tree on n vertices, then m(G) ≥ n − 65.
Proof. Let $n = 2^k + l$, where $l < 2^k$ is odd. Observe that the statement is trivial if n ≤ 65, thus we can assume k ≥ 6. Apply Proposition 3.5 with p = 32 to obtain a set U of vertices such that $|U| \le 2^{k-3} - 1$ and each component T of G \ U has at most p edges. (We write p instead of 32 throughout the proof.) We proceed as in the proof of Theorem 3.3. We denote by w(X) the weight of a q-component X, and for a vertex u of G, w(u) denotes the weight of the q-component containing u. We initially have $\sum_X w(X) = n$. The adversary will maintain in the first part of the algorithm that $w(X) \ne 0$ for every q-component X that intersects U.
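The size bound on U used in this proof follows from Proposition 3.5 by a short calculation, spelled out here for p = 32 and $2^k \le n < 2^{k+1}$:

```latex
\begin{align*}
|U| \le \frac{2n}{p} = \frac{n}{16} \le \frac{2^{k+1} - 1}{16} < 2^{k-3},
\qquad\text{hence}\qquad |U| \le 2^{k-3} - 1,
\end{align*}
```

since |U| is an integer and $2^{k-3}$ is an integer for k ≥ 6.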
We split each component T into a connecting part T′ and some hanging parts T_1, T_2, . . . , any of which can be empty, as follows. If v ∈ T separates some vertices of U from each other, then it goes to T′. Each connected component of T \ T′ forms a different T_i. Notice that each hanging part T_i is a subtree of T, thus it has a unique vertex r(T_i) that separates T_i \ {r(T_i)} from T \ T_i; we call r(T_i) the root of T_i.
We answer queries inside T_i according to Lemma 3.2 (applied only to T_i), while if the query involves a q-component X with X ∩ T′ ≠ ∅, we answer such that w(X) ≤ 2 (which is similar to Theorem 3.4). This way the weight of any X ⊂ G \ U will be at most 2. The crucial property is that the balanced q-components of T will always separate either two U vertices, or some positive weight part of a T_i from a U vertex. This way they are "in the way" to compare these parts with the rest of the graph, so they cannot be simply ignored. The strategy of the adversary will be to make sure that the game cannot end while there are many unbalanced q-components. After there are only few unbalanced q-components the game might end, but in this case the graph could be made into a single q-component by adding O(p) further edges to it. This shows that at most this many queries can be saved.
Also, in case we merge all of some T_i into one q-component, the adversary would like to avoid w(T_i) = 0. This cannot happen if T_i has an odd number of vertices; if T_i has an even number of vertices, the adversary adds an (imaginary) extra degree one vertex r′(T_i) to T_i that is adjacent only to r(T_i), to obtain T*_i, and applies Lemma 3.2 to T*_i instead of T_i. Since r′(T_i) is never compared with anything, merging all of T_i into a q-component cannot give w(T_i) = 0, because T′ = T*_i \ T_i has only one component, {r′(T_i)}. Therefore, in case the whole tree T_i is merged, we get w(T_i) = 2.
Whenever we compare some Y ⊂ G \ U with an X intersecting U such that w(X) ≥ 3, the adversary answers such that the weight of the new q-component is w(X) − w(Y), thus ∑_X w(X) decreases by 2w(Y) ≤ 4. In every other case the adversary answers so that the weights are added up, i.e., ∑_X w(X) remains the same. This way the weight of a q-component can never exceed 4, unless we merge two q-components that both intersect U. Because of this, we can conclude that The adversary stops executing this algorithm the moment we have ∑_X w(X) = 2^k + 1 or 2^k + 3; this surely happens, as ∑_X w(X) is odd and can decrease by at most 4 in a single step. As we have seen in the earlier proofs, if ∑_X w(X) = 2^k + 1, then we will have two non-balanced q-components when the algorithm is done. If ∑_X w(X) = 2^k + 3, then we can apply Proposition 2.9, whose conditions are shaped to work here, to conclude that we will have at most three non-balanced q-components when the algorithm is done.
Moreover, these few remaining non-balanced q-components need to cover U, as the weights of sets intersecting U stay positive throughout the algorithm. If at the end we have at most ℓ components, then by adding ℓ − 1 of the original tree components T, we can make the q-graph connected. As every tree has at most p vertices, and in our case ℓ ≤ 3, adding 2p edges can make the q-graph connected.
To summarize, instead of asking all n − 1 edges, we might save at most 2p = 64 queries, which gives m(G) ≥ n − 1 − 64 = n − 65.

Remark.
We could get a better constant by considering the number of yet unqueried edges we need to add to connect the remaining non-balanced q-components (as at the end of the proof of Theorem 3.4). Here we will not go into details, as our bound is probably far from optimal anyway, but this would give something like n − 33 as a lower bound.

Non-deterministic complexity of odd trees
We have seen that it is much harder to prove the lower bound m(T) ≥ n − 65 for trees of odd order n than the lower bound m(T) ≥ n − 1 for trees of even order n. Even more surprisingly, there is a significant difference between the so-called non-deterministic complexities of trees of even and odd order. The non-deterministic complexity m_nd(G) of a graph G is defined as the minimum number of queries needed to find a majority vertex in the worst case, provided we know the color of each vertex beforehand from an unreliable source and we just have to verify (some of) this information. Let us observe that in the proof of Proposition 3.1 we actually showed m_nd(T) = n − 1 for any tree T of even order n.
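For very short paths of odd order the definition can be explored by brute force. The following is a minimal sketch under one assumption that is not spelled out in the text above: a set of verified edges is taken to certify a majority vertex exactly when one q-component has weight larger than the total weight of all the others. All function names are hypothetical.

```python
from itertools import product


def component_weights(colors, asked):
    """Weights |#red - #blue| of the q-components of a path; asked[i] is True
    when the edge between positions i and i + 1 was queried."""
    weights, balance = [], colors[0]
    for i, was_asked in enumerate(asked):
        if was_asked:
            balance += colors[i + 1]     # same q-component continues
        else:
            weights.append(abs(balance)) # q-component ends at position i
            balance = colors[i + 1]
    weights.append(abs(balance))
    return weights


def certifies_majority(colors, asked):
    """Assumed criterion: one q-component outweighs all the others together."""
    weights = component_weights(colors, asked)
    return 2 * max(weights) > sum(weights)


def nondeterministic_complexity_of_path(n):
    """Brute force for odd n over all colorings (+1 red, -1 blue) and query sets."""
    worst = 0
    for colors in product((1, -1), repeat=n):
        best = min(sum(asked)
                   for asked in product((False, True), repeat=n - 1)
                   if certifies_majority(colors, asked))
        worst = max(worst, best)
    return worst


assert nondeterministic_complexity_of_path(3) == 1
```

The search is exponential in n, so it is only meant for experimenting with paths on a handful of vertices.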
Proposition 3.7. Let P be a path of order n, such that n is odd. Then m_nd(P) = n − Θ(√n).
Proof. Let us denote the i-th vertex of P by x_i for i = 1, 2, . . . , n. For the lower bound let us suppose that n = k^2 + 1 for some even k (this is possible, since we are only interested in the order of magnitude of n − m_nd(P)) and let us call the batch of vertices x_{(i−1)k+1}, x_{(i−1)k+2}, . . . , x_{ik} Batch i for i = 1, 2, . . . , k. Now let us color the vertices of Batch i red if and only if i is odd and let x_{k^2+1} be blue (just like the vertices of Batch k, since k is even). We claim that in order to find a majority vertex, one needs at least n − 2k − 1 comparisons (that is, one has to verify the result of this many comparisons). Assume to the contrary that fewer comparisons suffice. Then the number of edges not asked as queries is p ≥ 2k, hence the number of q-components after the last query is p + 1 ≥ 2k + 1. It is easy to see that the number of balanced q-components is at most k − 1, since a balanced q-component must contain both x_{ik} and x_{ik+1} for some i < k.
Thus at least k + 2 of the q-components are unbalanced. It is also easy to see that the weight of any q-component is at most k + 1 (since k vertices of the same color are always followed and preceded by k vertices of the opposite color, except for the first k and last k + 1 vertices). Now if one could show a majority vertex, the weight of its q-component should be more than the sum of the weights of the other q-components, which is impossible, since the latter sum is at least k + 1. This finishes the proof of the lower bound.
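The coloring used in the lower bound is easy to generate, and the claim that no q-component can have weight above k + 1 can be checked with prefix sums. This is a small sketch with hypothetical function names; red is encoded as +1 and blue as −1.

```python
def batch_coloring(k):
    """Coloring of the path on k*k + 1 vertices for even k: +1 red, -1 blue."""
    colors = []
    for i in range(1, k + 1):
        colors += [1 if i % 2 == 1 else -1] * k   # Batch i is red iff i is odd
    colors.append(-1)                              # the last vertex is blue
    return colors


def max_subpath_weight(colors):
    """Largest |#red - #blue| over all contiguous subpaths, via prefix sums."""
    prefix, lowest, highest = 0, 0, 0
    for c in colors:
        prefix += c
        lowest, highest = min(lowest, prefix), max(highest, prefix)
    return highest - lowest


for k in (2, 4, 6, 8):
    colors = batch_coloring(k)
    assert len(colors) == k * k + 1
    assert sum(colors) == -1                 # blue wins by exactly one vertex
    assert max_subpath_weight(colors) == k + 1
```

Every q-component of a path is a contiguous subpath, so the last assertion is exactly the weight bound used in the argument.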
Next we prove the upper bound. Consider any coloring of the vertices of P and let us denote the number of red (resp. blue) vertices among {x_1, x_2, . . . , x_i} by R(i) (resp. B(i)) and let us suppose without loss of generality that d := R(n) − B(n) > 0 (notice that n is odd). Observe that if d ≥ √n, then by asking the first n − ⌈d/2⌉ edges (or any consecutive n − ⌈d/2⌉ edges) of P, we obtain a q-component of weight at least d − (⌈d/2⌉ − 1) = ⌈d/2⌉ (recall that d is odd) and ⌈d/2⌉ − 1 q-components of weight (and also cardinality) 1. Thus any majority vertex of the large q-component is also a majority vertex of the whole graph, and we are done. Therefore, we might assume that d < √n.
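The easy case d ≥ √n can be replayed on concrete colorings. The sketch below (hypothetical function name, red = +1, blue = −1, red assumed to be the majority) asks the first n − ⌈d/2⌉ edges and reports the weight of the resulting large q-component together with the number of untouched vertices.

```python
def easy_case_queries(colors):
    """colors: +1 (red) / -1 (blue) along the path, with more red than blue.
    Returns (number of asked edges, weight of the large q-component,
    number of single-vertex q-components)."""
    n, d = len(colors), sum(colors)
    asked = n - (d + 1) // 2              # n - ceil(d/2) edges from the start
    big = abs(sum(colors[:asked + 1]))    # q-component on x_1, ..., x_{asked+1}
    singles = n - (asked + 1)             # untouched vertices, each of weight 1
    return asked, big, singles


asked, big, singles = easy_case_queries([-1, -1] + [1] * 7)   # n = 9, d = 5
assert asked == 6 and singles == 2
assert big > singles    # the large q-component outweighs all singletons
```

Whenever the final comparison holds, every vertex on the majority side of the large q-component is a majority vertex of the whole path, whatever the colors of the untouched vertices are.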
Let D(i) := R(i) − B(i) (so d = D(n)), ∆ := max_{1≤i≤n} |D(i)|, and let j be the smallest number such that |D(j)| = ∆. Since d > 0, we may suppose that D(j) = ∆, otherwise we can reverse the order of the vertices and obtain a situation where the similarly obtained D(j′) is positive (the value ∆ would be different then, but d remains the same). Now we consider two cases, based on the value of ∆.
In this way we obtain r or r + 1 q-components, of which r − 1 are balanced, therefore any majority vertex of the largest non-balanced q-component (which exists, since n is odd) is a majority vertex of the whole graph as well. Since r > √n/5, we are done with this case.
Case 2. ∆ ≥ 2√n. Recall that j is the smallest number such that D(j) = ∆, and that d < √n, i.e., the number of red vertices is ∆ more than the number of blue vertices by x_j, but then the difference drops to d by the end of P. Thus there must exist a smallest number k such that k > j and D(k) < √n. Then the subpath P′ between the vertices x_{j+1} and x_k contains at least ∆ − √n ≥ √n more blue vertices than red vertices. Let now j_1 be the smallest index such that the subpath of P′ between x_{j+1} and x_{j_1} contains exactly one more blue vertex than red vertices. Similarly, let j_2 be the smallest index such that the subpath of P′ between x_{j_1+1} and x_{j_2} contains one more blue vertex than red vertices, and so on. It is clear that the indices j_1, j_2, . . . , j_{⌊√n⌋} are well-defined. Now let us query all edges of P, except the edges (x_j, x_{j+1}), (x_{j_1}, x_{j_1+1}), . . . , (x_{j_{⌊√n⌋}}, x_{j_{⌊√n⌋}+1}). In this way we obtain ⌊√n⌋ + 2 q-components, such that one of them has weight ∆, ⌊√n⌋ of them have weight 1, and one of them has weight smaller than ⌊√n⌋, thus any majority vertex of the q-component of weight ∆ is also a majority vertex of the whole graph, finishing the proof of Proposition 3.7.

Optimal graph with few edges
In this subsection we prove Theorem 1.2 which states that for every n there is a graph G with n vertices and n(1 + b(n)) edges such that m(G) = n − b(n).
Proof of Theorem 1.2. Let F_k be the graph obtained from a path v_1 v_2 . . . v_k by adding k vertices of degree 1, u_1, u_2, . . . , u_k, to it such that u_i is connected to v_i for each 1 ≤ i ≤ k. Let k = ⌊n/2⌋ and G be the graph we obtain from F_k by adding all the possible edges incident to any of the vertices v_{k−b(n)+1}, . . . , v_k.
We are going to define an algorithm A_l for l = 2^i. First we describe some properties of the algorithm. It uses the edges of F_l and either gets back a DIFF answer at some point for an edge that connects two monochromatic q-components of the same size (which is a power of two), or shows that F_l is monochromatic. Moreover, at any point it uses only the first j vertices of the path v_1 . . . v_j and the leaves u_1, . . . , u_j connected to them, for some j. Therefore, if there are vertices not appearing in any query, they form a connected graph.
We define algorithm A_l recursively. Algorithm A_1 is trivial: it has only one query, u_1 v_1. Assume we have defined algorithm A_l and we are given F_{2l}. The graph F_{2l} consists of two copies of F_l and an additional edge, where the first copy has vertices v_1, . . . , v_l, u_1, . . . , u_l, the second copy has the remaining vertices, and the additional edge is v_l v_{l+1}. Algorithm A_{2l} runs algorithm A_l separately for the first, and then for the second copy of F_l, and finally asks (v_l, v_{l+1}). If for either copy of F_l we get back a DIFF answer at some point for an edge that connects two monochromatic q-components of the same size, we are done. Otherwise, both copies of F_l are monochromatic. In this case a DIFF answer to the last query connects two monochromatic q-components of the same size, while a SAME answer shows F_{2l} is monochromatic.
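The recursion can be simulated on a fixed coloring of F_l. In the sketch below the function name, the 0-based indexing and the tuple encoding of queries are assumptions made for illustration; the colors are given explicitly, so each query is answered by comparing two entries, and the run stops at the first DIFF answer.

```python
def run_algorithm(v, u):
    """v[i], u[i]: colors of the path vertex v_{i+1} and of its leaf u_{i+1};
    len(v) must be a power of two. Returns (queries, result), where result is
    None if F_l turns out monochromatic, and otherwise the common size of the
    two monochromatic q-components joined by the DIFF answer."""
    queries = []

    def solve(lo, hi):                       # handles the copy on v[lo:hi], u[lo:hi]
        if hi - lo == 1:                     # A_1: the single query u_i v_i
            queries.append(("u", lo, "v", lo))
            return 1 if u[lo] != v[lo] else None
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid, hi)):  # first copy, then second copy
            found = solve(a, b)
            if found is not None:
                return found                 # stop at the first DIFF answer
        queries.append(("v", mid - 1, "v", mid))   # the connecting path edge
        # Each copy is monochromatic here and has hi - lo vertices.
        return (hi - lo) if v[mid - 1] != v[mid] else None

    return queries, solve(0, len(v))


queries, result = run_algorithm([0, 0, 0, 0], [0, 0, 1, 0])
assert result == 1 and len(queries) == 4     # DIFF on the edge u_3 v_3
```

On a monochromatic F_l the simulation asks all 2l − 1 edges of F_l and returns None, matching the second alternative described above.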
The algorithm showing m(G) ≤ n−b(n) for even n is based on an idea similar to the algorithm showing m(K n ) ≤ n − b(n): we ask queries such that if the answer is DIFF, we obtain a balanced q-component (that we can discard), and otherwise we build larger and larger monochromatic q-