On the maximum mean subtree order of trees

A subtree of a tree is any induced subgraph that is again a tree (i.e., connected). The mean subtree order of a tree is the average number of vertices of its subtrees. This invariant was first analyzed in the 1980s by Jamison. An intriguing open question raised by Jamison asks whether the maximum of the mean subtree order, given the order of the tree, is always attained by some caterpillar. While we do not completely resolve this conjecture, we find some evidence in its favor by proving different features of trees that attain the maximum. For example, we show that the diameter of a tree of order $n$ with maximum mean subtree order must be very close to $n$. Moreover, we show that the maximum mean subtree order is equal to $n - 2\log_2 n + O(1)$. For the local mean subtree order, which is the average order of all subtrees containing a fixed vertex, we can be even more precise: we show that its maximum is always attained by a broom and that it is equal to $n - \log_2 n + O(1)$.


Introduction
A subtree of a tree $T$ is any induced subgraph that is connected and thus again a tree. In this paper, we will be concerned with the average number of vertices in a subtree (averaged over all subtrees), which is known as the mean subtree order of $T$ and denoted $\mu_T$. A normalized version of the mean subtree order, called the subtree density, is obtained by dividing by the number of vertices in $T$: if $T$ has $n$ vertices, then the subtree density is $\mu_T / n$. This quantity clearly always lies between 0 and 1. The concepts of mean subtree order and subtree density were introduced to the literature by Jamison in the 1980s [2,3]. Both papers contain a number of interesting questions and conjectures, many of which were only resolved very recently [1,4,6,7,8].
The biggest open problem concerning mean subtree order and subtree density is certainly the natural question: which trees yield the maximum for a given number of vertices? The (highly nontrivial) fact that the minimum is always attained by the path was already proven by Jamison in his first paper [2]. Initially, for a small number of vertices, the tree with greatest subtree density is a star. The first exception occurs when the number of vertices $n$ is equal to 9: here, a double-star obtained by connecting the centers of two stars with four and five vertices respectively has mean subtree order $\frac{779}{159} \approx 4.89937$ and subtree density $\frac{779}{1431} \approx 0.54437$. Based on further computational evidence, Jamison put forward the conjecture that the maximum mean subtree order is attained by a caterpillar (i.e., a tree that becomes a path when all leaves are removed) for every possible number of vertices. This conjecture has been open ever since, and rather little progress has been made. It is not difficult to show that the maximum subtree density approaches 1 as the number of vertices tends to infinity. There are many possible constructions that yield this limit, the simplest perhaps being "double-brooms" (or "batons"), which consist of a long path with a suitable number of leaves attached to both ends.
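The value $779/159$ for the double-star on 9 vertices is easy to reproduce. The following Python sketch is not taken from the paper: it counts subtrees with a standard bottom-up recursion, and the names `subtree_stats` and `double_star` as well as the adjacency-dictionary encoding are our own illustrative choices.

```python
from fractions import Fraction

def subtree_stats(adj, root=0):
    """Return (sigma, tau): the number of subtrees of the tree `adj`
    (dict: vertex -> list of neighbours) and their total order."""
    parent, order = {root: None}, [root]
    for v in order:                      # breadth-first order from the root
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    # s[v], t[v]: number / total order of the subtrees whose topmost vertex is v
    s = {v: 1 for v in order}
    t = {v: 1 for v in order}
    sigma = tau = 0
    for v in reversed(order):            # children are finished before parents
        sigma += s[v]
        tau += t[v]
        p = parent[v]
        if p is not None:
            # either skip the branch at v, or glue on one of its s[v] subtrees
            s[p], t[p] = s[p] * (1 + s[v]), t[p] * (1 + s[v]) + s[p] * t[v]
    return sigma, tau

def double_star(p, q):
    """Two adjacent centres 0 and 1 carrying p and q leaves respectively."""
    adj = {0: [1], 1: [0]}
    for i in range(p + q):
        c = 0 if i < p else 1
        adj[c].append(i + 2)
        adj[i + 2] = [c]
    return adj

sigma, tau = subtree_stats(double_star(3, 4))
print(sigma, tau, Fraction(tau, sigma))          # 159 779 779/159

star = {0: list(range(1, 9)), **{i: [0] for i in range(1, 9)}}
sigma, tau = subtree_stats(star)
print(Fraction(tau, sigma))                      # 161/33, slightly smaller
```

For comparison, the star on 9 vertices has mean subtree order $161/33 \approx 4.879$, which is indeed below $779/159$.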
Mol and Oellermann [4] considered this construction in greater detail and found that the optimal choice of double-broom for a given number of vertices is essentially to attach (approximately) $2\log_2 n$ leaves at each end of a path of length (approximately) $n - 4\log_2 n$. This immediately yields a lower bound for the maximum of the mean subtree order, which is asymptotically
$$n - 2\log_2 n + O(1), \tag{1}$$
as one finds by a relatively straightforward calculation; see also Corollary 9 and Theorem 18 below for a more precise estimate.
One of the goals of this paper is to show that double-brooms are indeed "close to optimal". Specifically, we prove
Theorem 1. The maximum of the mean subtree order over all trees with $n$ vertices is $n - 2\log_2 n + O(1)$.
We will see, however, that double-brooms do not attain the maximum for sufficiently large n. See Section 5 for more details.
Our proof of Theorem 1 is, like many other results on the mean subtree order, based on a "local" version. Define the local mean subtree order $\mu_T(v)$ at a vertex $v$ of a tree $T$ to be the average number of vertices in a subtree that contains $v$. More generally, one can consider the average of the order of all subtrees containing a specific set $A$ of vertices, denoted $\mu_T(A)$ (so that $\mu_T(v) = \mu_T(\{v\})$ and $\mu_T = \mu_T(\emptyset)$). The following useful monotonicity property was already proven by Jamison [2]:
Theorem 2 ([2, Theorem 4.5]). We have $\mu_T(A) \le \mu_T(B)$ whenever $A$ is a subset of $B$. Equality holds if and only if the smallest subtree containing all of $A$ is the same as the smallest subtree containing all of $B$.
As an important special case [2, Theorem 3.9],
$$\mu_T(v) \ge \mu_T \tag{2}$$
holds for every vertex $v$ of $T$ (even with strict inequality unless $T$ only has one vertex). Therefore, any upper bound on local mean subtree orders immediately yields an upper bound on the (global) mean subtree order. The local mean subtree order is often easier to deal with, and in fact we will be able to resolve the local analogue of Jamison's caterpillar conjecture.
We even obtain the following stronger result:
Theorem 3. If the local mean subtree order $\mu_T(r)$ attains its maximum among all choices of an $n$-vertex tree $T$ and a vertex $r$ of the tree, then $T$ has to be a broom, i.e., a tree consisting of a path and leaves attached to one end of the path, and $r$ has to be the other end of the path.
The formal proof of this result will be presented in the next section. Let us briefly give an intuitive explanation why brooms are strong candidates for the maximum local mean subtree order. The leaves at one end create a large number of subtrees that all have to contain the end of the path that these leaves are attached to. Since we are only counting subtrees that also contain the other end (the vertex $r$), most of the subtrees under consideration contain the entire path, thus providing a large contribution to the local mean subtree order. One also finds (a precise discussion is given later) that the optimal choice for the number of leaves is about $2\log_2 n$. The same reasoning also explains why the double-brooms that were mentioned earlier are close to optimal.
Building on Theorem 3, Theorem 1 will be proven in Section 3. In Section 4, we add further evidence in favor of Jamison's caterpillar conjecture by proving that an optimal tree (a tree that attains the maximum mean subtree order) with $n$ vertices, or even a tree that comes close to the maximum, must have a diameter that is very close to $n$. Moreover, we will be able to bound the number of subtrees in an optimal tree both from above and below, showing that it is necessarily of order $\Theta(n^4)$.
Finally, we look closer at caterpillars. We show that with a suitable choice of caterpillar, one can obtain a greater mean subtree order than with a double-broom when the number of vertices is sufficiently large, even though the improvement is modest (of order $O(1)$). The structure of the caterpillars that achieve this feat is somewhat surprising; see Section 5 for details.

Extremal local mean subtree order
This section is devoted to the proof of Theorem 3. While the calculations are somewhat technical, the main idea is rather straightforward: we prove that a tree that is not a broom can always be improved by replacing a subtree that is the union of two brooms by a single one (keeping the total number of vertices the same) in such a way that the local mean subtree order increases.
To this end, let us define two general configurations $T_1$ and $T_2$. The tree $T_1$ consists of a rooted tree $T'$ with root $r$, with a broom attached to a vertex $v$ (of $T'$) whose length is $a \ge 0$ and which has $b \ge 1$ leaves. Note that $a = 0$ is allowed, in which case the broom becomes a star. Similarly, $T_2$ contains the tree $T'$ with root $r$, with two brooms attached at the same vertex $v$. These brooms have paths of length $a$ and $c$ respectively and $b$ and $d$ leaves respectively. Here we assume $a \ge 0$ and $b, c, d \ge 1$, since for $a = c = 0$ both brooms would degenerate to stars and could be regarded as a single star. Let us remark that $v$ is not necessarily a leaf of $T'$ in this setup. The two constructions are presented in Figure 1. In both cases, we will mainly be interested in the local mean subtree order at the root.
So let $\ell$ be the number of subtrees of $T'$ that contain $r$ and $v$, and let $s$ be their total order (number of vertices). Likewise, let $m$ be the number of subtrees of $T'$ that contain $r$, but not $v$, and let $t$ be their total order. Note that $\mu_{T'}(\{r, v\}) = \frac{s}{\ell}$ and $\mu_{T'}(\{r\}) = \frac{s+t}{\ell+m}$ by definition, and recall from Theorem 2 that $\mu_{T'}(\{r, v\}) \ge \mu_{T'}(\{r\})$, where equality only holds when $v = r$. This is equivalent to
$$\frac{s}{\ell} \ge \frac{t}{m}, \quad\text{that is,}\quad sm \ge t\ell.$$
We can now express the local mean subtree order of $T_1$ at $r$ in terms of the variables $a, b, \ell, m, s, t$. Note that there are $m$ subtrees of $T_1$ containing $r$, but not $v$, and $\ell(a + 2^b)$ subtrees of $T_1$ containing both $r$ and $v$: any subtree of $T'$ that contains both vertices can be combined with any of the $a + 2^b$ subtrees of the attached broom that contain $v$ (here, $a$ is the number of proper subpaths of the path of length $a$, and $2^b$ is the number of subtrees containing the entire path of length $a$ and a subset of the $b$ leaves). The total order of the former is $t$ by definition, the total order of the latter is $s(a + 2^b) + \frac{\ell}{2}\bigl(a^2 - a + (2a + b)2^b\bigr)$ (the first term being the contribution from $T'$, the latter the contribution from the broom). It follows that
$$\mu_{T_1}(r) = \frac{t + s(a + 2^b) + \frac{\ell}{2}\bigl(a^2 - a + (2a + b)2^b\bigr)}{m + \ell(a + 2^b)}. \tag{3}$$
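The two broom quantities used in this count, namely the number $a + 2^b$ of subtrees of the broom that contain $v$ and the total $\frac12\bigl(a^2 - a + (2a + b)2^b\bigr)$ of broom vertices other than $v$ in these subtrees, can be checked by brute force on small cases. The Python sketch below is only an illustration: the vertex labelling and the helper names `broom` and `rooted_subtrees` are our own assumptions, and the enumeration is exponential, so it is meant for very small brooms only.

```python
from itertools import combinations

def broom(a, b):
    """Edges of a broom: path 0-1-...-a (vertex 0 plays the role of v),
    with b leaves a+1, ..., a+b attached to the far end a."""
    edges = [(i, i + 1) for i in range(a)]
    edges += [(a, a + j) for j in range(1, b + 1)]
    return edges

def rooted_subtrees(edges, n, root=0):
    """Brute force: all vertex sets containing `root` that induce a connected graph."""
    found = []
    others = [v for v in range(n) if v != root]
    for size in range(n):
        for extra in combinations(others, size):
            chosen = {root, *extra}
            reached, frontier = {root}, [root]
            while frontier:                       # graph search inside `chosen`
                x = frontier.pop()
                for u, w in edges:
                    for y, z in ((u, w), (w, u)):
                        if y == x and z in chosen and z not in reached:
                            reached.add(z)
                            frontier.append(z)
            if reached == chosen:
                found.append(chosen)
    return found

a, b = 2, 3
trees = rooted_subtrees(broom(a, b), a + b + 1)
count = len(trees)
extra = sum(len(tr) - 1 for tr in trees)          # broom vertices other than v
print(count, a + 2**b)                            # 10 10
print(extra, (a * a - a + (2 * a + b) * 2**b) // 2)   # 29 29
```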
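The number of variables can be reduced using the abbreviations $k = \frac{m}{\ell}$ and $C = \frac{2(sm - t\ell)}{\ell^2}$:
$$\mu_{T_1}(r) = \frac{s}{\ell} + \frac{a^2 - a + (2a + b)2^b - C}{2(a + 2^b + k)}.$$
Note here that $C \ge 0$ by the earlier observation that $\frac{s}{\ell} \ge \frac{t}{m}$. Using analogous reasoning, we obtain
$$\mu_{T_2}(r) = \frac{s}{\ell} + \frac{\bigl(a^2 - a + (2a + b)2^b\bigr)(c + 2^d) + \bigl(c^2 - c + (2c + d)2^d\bigr)(a + 2^b) - C}{2\bigl((a + 2^b)(c + 2^d) + k\bigr)}.$$
Since $\frac{s}{\ell}$ is determined by the structure of $T'$ alone, we focus on the remaining parts of $\mu_{T_1}(r)$ and $\mu_{T_2}(r)$ and write these as functions $f_1$ and $f_2$ of the six variables $k, C, a, b, c, d$, i.e.,
$$\mu_{T_1}(r) = \frac{s}{\ell} + f_1(k, C, a, b), \qquad \mu_{T_2}(r) = \frac{s}{\ell} + f_2(k, C, a, b, c, d).$$
Ultimately, we would like to obtain an inequality of the form $f_1(k, C, \alpha, \beta) \ge f_2(k, C, a, b, c, d)$ for a suitable choice of $\alpha, \beta$ that may depend on $k, C, a, b, c, d$. We first prove some auxiliary lemmas for this purpose.
Lemma 4. Let $k, C \ge 0$ be fixed constants, and let $N \ge 2$ be a fixed integer. Suppose that the integers $a$ and $b$ maximize the function $f_1(k, C, a, b)$ under the conditions $a + b = N$, $a \ge 0$ and $b \ge 1$. Then we have $2^b \ge 3a$, except when $k = C = 0$ and $N = 2$, in which case $a = 0$, $b = 2$ and $a = b = 1$ both maximize $f_1(k, C, a, b)$.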
Proof. We have
Proposition 6. Let $k, C \ge 0$ be fixed constants. Let $a, b, c, d$ be integers with $a \ge 0$ and $b, c, d \ge 1$ such that $2^b \ge 3a$ and $2^d \ge 3c$ hold. Then we have
Proof. We prove that there is a linear combination with nonnegative coefficients of $\Delta_1$ and $\Delta_2$ which is strictly positive. This implies that either $\Delta_1 > 0$ or $\Delta_2 > 0$, from which the result follows. Let us write $x = 2^b$ and $y = 2^d$. We take the following coefficients: Note that $\lambda_1 \ge 0$ by Lemma 5, and clearly also $\lambda_2 > 0$. The linear combination $\lambda_1 \Delta_1 + \lambda_2 \Delta_2$ is indeed strictly positive for all integers $a, b, c, d, x, y$ with $a \ge 0$ and $b, c, d \ge 1$ as well as $x \ge 1 + b \ge 2$ and $y \ge 2$. It follows that either $\Delta_1 > 0$ or $\Delta_2 > 0$ (or both), completing the proof.
We are now ready to prove the main result of this section.
Proof of Theorem 3. Suppose there is an optimal tree for the local mean subtree order with respect to the vertex $r$ (the root) that is not a broom. Then there is a vertex $v$, possibly equal to $r$, which is the root of at least two brooms (take the vertex at greatest distance from the root for which the tree consisting of this vertex and all its successors is not a broom). Hence the optimal tree can be described in the way we defined the general construction $T_2$. At the same time, it can also be regarded as a $T_1$ (with the other broom becoming part of $T'$), so Lemma 4 applies: $a, b$ and $c, d$ have to maximize $f_1(k, C, \cdot, \cdot)$ for the corresponding $k$ and $C$ under the fixed sum condition, as the tree is assumed to be optimal. We can conclude that $2^b \ge 3a$, unless $k = C = 0$ (which happens only if $v = r$) and $N = 2$. In that case, we may assume $a = 0$ and $b = 2$ without loss of generality, as both this choice and $a = b = 1$ yield the same local density. So we always have $2^b \ge 3a$, and likewise $2^d \ge 3c$. We can also assume $a \ge 0$ and $b, c, d \ge 1$ as mentioned before.
But now we can apply Proposition 6 and conclude that we can replace the two brooms by a single one such that the order is the same and the local mean subtree order increases. So the original tree was not optimal, which contradicts the initial assumption. We conclude that an optimal tree has to be a broom as described in the statement of the theorem.
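Corollary 7. Among all trees of order $n$, the maximum local mean subtree order is of the form $n - \log_2 n + O(1)$. More precisely, it is of the form
$$n - \log_2 n + \tfrac12 f(2\log_2 n) + o(1),$$
where $f$ is the continuous 1-periodic function given by $f(x) = \{x\} - 2^{\{x\}}$, with $\{x\}$ denoting the fractional part of $x$.
Proof. We know that the maximum local mean subtree order is attained by a broom. Let $a$ denote the length of its "handle" and $b$ the number of leaves, so that $n = a + b + 1$. We can take $\ell = 1$, $m = 0$, $s = 1$ and $t = 0$ in (3) to see that the average subtree order is $n - \frac12\bigl(b + \frac{an}{a + 2^b}\bigr)$. Hence we need to minimize $b + \frac{an}{a + 2^b}$, subject to the condition that $a + b = n - 1$. If we take $b = \lceil 2\log_2 n \rceil$, then $\frac{an}{a + 2^b} \le \frac{an}{2^b} \le \frac{n^2}{n^2} = 1$. So the minimum of our expression is at most $\lceil 2\log_2 n \rceil + 1 \le 2\log_2 n + 2$. Consequently, any choice of $a$ and $b$ where $b > 2\log_2 n + 2$ cannot be optimal. It follows that $a = n - O(\log n)$ for the optimal choice of $a$ and $b$.
Likewise, we must have $\frac{an}{a + 2^b} \le 2\log_2 n + 2$ for the optimal choice of $a$ and $b$, which implies that $2^b \ge \frac{n^2}{2\log_2 n + 2} - O(n)$. From these estimates, we obtain
$$\frac{an}{a + 2^b} = \frac{n^2}{2^b}\Bigl(1 + O\Bigl(\frac{\log n}{n}\Bigr)\Bigr) = \frac{n^2}{2^b} + o(1).$$
In other words, the minimum of $b + \frac{an}{a + 2^b}$, subject to the condition that $a + b = n - 1$, differs from the minimum of $b + \frac{n^2}{2^b}$ only by $o(1)$. The minimum of $b + \frac{n^2}{2^b}$ occurs (by convexity) for a value of $b$ for which we simultaneously have
$$b + \frac{n^2}{2^b} \le (b - 1) + \frac{n^2}{2^{b-1}} \quad\text{and}\quad b + \frac{n^2}{2^b} \le (b + 1) + \frac{n^2}{2^{b+1}},$$
i.e., $2^b \le n^2 \le 2^{b+1}$, which is satisfied by $b = \lfloor 2\log_2 n \rfloor$. We obtain that the maximum local mean subtree order of a broom (and thus an arbitrary tree) of order $n$ is
$$n - \tfrac12\bigl(2\log_2 n - x + 2^x\bigr) + o(1),$$
where $x = \{2\log_2 n\}$ is the fractional part of $2\log_2 n$. This can also be written as
$$n - \log_2 n + \tfrac12 f(2\log_2 n) + o(1),$$
which completes the proof of our asymptotic formula.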
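The optimization over brooms in this proof can also be explored numerically. The Python sketch below is an illustration rather than part of the argument: it evaluates the closed form $n - \frac12\bigl(b + \frac{an}{a + 2^b}\bigr)$ for the local mean subtree order at the root of a broom with handle length $a$ and $b$ leaves, and searches all admissible $b$ for a given $n$. The function names are our own.

```python
from fractions import Fraction
from math import floor, log2

def broom_local_mean(a, b):
    """Local mean subtree order at the root of a broom with handle length a
    and b leaves (n = a + b + 1), via n - (b + a*n / (a + 2^b)) / 2."""
    n = a + b + 1
    return n - Fraction(1, 2) * (b + Fraction(a * n, a + 2**b))

def best_broom(n):
    """Number of leaves b (1 <= b <= n - 1) that maximizes the local mean,
    together with that maximum (exact rational arithmetic)."""
    candidates = ((b, broom_local_mean(n - 1 - b, b)) for b in range(1, n))
    return max(candidates, key=lambda pair: pair[1])

b, mu = best_broom(100)
print(b, floor(2 * log2(100)))     # 13 13
print(float(mu))                   # roughly 92.98
```

For $n = 100$ the optimal number of leaves coincides with $\lfloor 2\log_2 n \rfloor = 13$; for other values of $n$ the two need not agree exactly, since the corollary only controls the optimum up to $o(1)$.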
In the following section, we will use our knowledge on the maximum local mean subtree order to bound the global mean subtree order as well.

The maximum mean subtree order
We now make the step from the local to the global mean subtree order. Recall that the global mean subtree order of a tree $T$ is no greater than the local mean subtree order at any vertex of $T$ (inequality (2)). This combined with Corollary 7 shows immediately that
$$\mu_T \le n - \log_2 n + O(1)$$
for every tree $T$ of order $n$. However, in order to match the lower bound (1) due to Mol and Oellermann, we have to refine the argument to prove our main result on the mean subtree order (Theorem 1).
A vertex of a tree is said to be a centroid vertex if none of the components that result when the vertex is removed contains more than half of the vertices. It is well known that every tree has either one or two centroid vertices, and that a centroid vertex minimizes the sum of all distances to the other vertices [9].
Proposition 8. Let $T$ be a tree of order $n$, and let $v$ be a centroid vertex of $T$. Then the local mean subtree order of $T$ at $v$ is at most
$$n - 2\log_2 n + 2 + f(2\log_2 n) + o(1),$$
with $f$ as in Corollary 7.
Proof. We can assume that the local mean subtree order of $T$ at $v$ is greatest among all choices of a tree $T$ and a centroid vertex $v$. Let the components of $T - v$ be $T_1, T_2, \ldots, T_k$. Moreover, let $T'_i$ be the tree that results when $v$ is added back to $T_i$ (along with the edge connecting it to its neighbor in $T_i$). Clearly, every subtree of $T$ that contains $v$ induces a subtree in each $T'_i$ that contains $v$, and conversely every subtree of $T$ that contains $v$ can be obtained by merging subtrees of $T'_1, T'_2, \ldots, T'_k$. It follows easily from this observation that
$$\mu_T(v) = \sum_{i=1}^{k} \mu_{T'_i}(v) - (k - 1). \tag{4}$$
We use this representation combined with the upper bound on the local mean subtree order to bound the local mean subtree order at the centroid vertex $v$. Assume without loss of generality that $|T_1| \le |T_2| \le \cdots \le |T_k|$, and suppose that $k \ge 4$. By Theorem 3, we can replace $T'_1 \cup T'_2$ by a single broom of the same order whose local mean subtree order at the root is greater than the local mean subtree order of $T'_1 \cup T'_2$. Thus the local mean subtree order increases after this replacement, while vertex $v$ remains a centroid since $T_1$ and $T_2$ together do not contain more than half of the vertices. This contradicts our choice of $T$ and $v$. So we know that $k \le 3$.
If $k = 2$, then $T'_1$ and $T'_2$ each contain between $\frac{n}{2}$ and $\frac{n}{2} + 1$ vertices, and we can apply (4) and Corollary 7 to complete the proof. If $k = 3$, then we note first that $|T'_3| \ge |T'_2| \ge \frac{n}{4} + \frac12$ (since $T_3$ contains at most half of the vertices, and $T'_2$ more than half of the rest). If $|T'_1| \ge \sqrt{n}$, then we can already apply (4) and Corollary 7 to $T'_1$, $T'_2$ and $T'_3$ to show that the local mean subtree order at $v$ is at most
$$n - \tfrac52 \log_2 n + O(1).$$
In this case, we are done. Otherwise, $T'_2$ and $T'_3$ each contain at least $\frac{n}{2} - \sqrt{n}$ vertices (and at most $\frac{n}{2} + 1$), by the choice of $v$ as a centroid. Applying Corollary 7 to those two, we find that the local mean subtree order at $v$ is indeed at most
$$n - 2\log_2 n + 2 + f(2\log_2 n) + o(1).$$
Note here that $\log_2\bigl(\frac{n}{2} - \sqrt{n}\bigr) = \log_2 n - 1 + O(n^{-1/2})$, and that $f$ is a continuous 1-periodic function. This completes the proof.
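The decomposition of the local mean at $v$ into branch contributions, $\mu_T(v) = \sum_i \mu_{T'_i}(v) - (k - 1)$, where $T'_1, \ldots, T'_k$ are the branches at $v$, can be verified on examples. The following Python sketch is an illustration under our own conventions (adjacency dictionaries, recursive traversal suitable for small trees only); the helper names are hypothetical.

```python
from fractions import Fraction

def rooted_counts(adj, v, parent=None):
    """(s, t): number and total order of the subtrees that contain v and lie
    in the part of the tree hanging below v (away from `parent`)."""
    s, t = 1, 1
    for w in adj[v]:
        if w != parent:
            sw, tw = rooted_counts(adj, w, v)
            # skip the branch at w, or attach one of its sw rooted subtrees
            s, t = s * (1 + sw), t * (1 + sw) + s * tw
    return s, t

def local_mean(adj, v):
    """Local mean subtree order of the whole tree at v."""
    s, t = rooted_counts(adj, v)
    return Fraction(t, s)

def branch_means(adj, v):
    """Local means at v of the branches T'_i (a component of T - v plus v)."""
    means = []
    for w in adj[v]:
        sw, tw = rooted_counts(adj, w, v)
        # subtrees of T'_i containing v: {v} alone, or v joined to a subtree at w
        means.append(Fraction(1 + sw + tw, 1 + sw))
    return means

# Double-star on 9 vertices: centre 0 with leaves 1,2,3 and centre 4 with leaves 5..8.
adj = {0: [1, 2, 3, 4], 4: [0, 5, 6, 7, 8]}
adj.update({i: [0] for i in (1, 2, 3)})
adj.update({i: [4] for i in (5, 6, 7, 8)})

lhs = local_mean(adj, 0)
rhs = sum(branch_means(adj, 0)) - (len(adj[0]) - 1)
print(lhs, rhs)        # 181/34 181/34
```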
Proposition 8 tells us that every tree has a vertex such that the local mean subtree order at that vertex is at most $n - 2\log_2 n + O(1)$. Specifically, every centroid vertex has this property. Combined with (2), this immediately implies Theorem 1.
It is worth studying the construction of Mol and Oellermann given in [4] a little further: if we take a double-broom consisting of a path of $n - 2s$ vertices, with $s$ leaves attached at each end, then there are
$$2^{2s} + 2^{s+1}(n - 2s - 1) + \binom{n - 2s - 1}{2} + 2s$$
subtrees with a total of
$$2^{2s}(n - s) + 2^s(n - s)(n - 2s - 1) + \binom{n - 2s}{3} + 2s$$
vertices. The optimal choice of $s$ is $2\log_2 n + O(1)$, in which case we have $2^s = \Theta(n^2)$. The mean subtree order becomes
$$\frac{2^{2s}(n - s) + 2^s(n - s)(n - 2s - 1) + \binom{n - 2s}{3} + 2s}{2^{2s} + 2^{s+1}(n - 2s - 1) + \binom{n - 2s - 1}{2} + 2s} = n - s - \frac{n^2}{2^s} + o(1).$$
The expression $s + \frac{n^2}{2^s}$ was already analyzed in the proof of Corollary 7. Its minimum is obtained when $s = \lfloor 2\log_2 n \rfloor$ and has a value of $2\log_2 n - f(2\log_2 n)$, with the same function $f$ as in that corollary. So we have the following refinement of Theorem 1:
Corollary 9. The maximum of the mean subtree order over all trees with $n$ vertices lies between $n - 2\log_2 n + f(2\log_2 n) + o(1)$ and $n - 2\log_2 n + 2 + f(2\log_2 n) + o(1)$.
We have thus reduced the gap between the upper and lower bound to $2 + o(1)$.
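The double-broom counts can be evaluated exactly for concrete $n$ and $s$. The Python sketch below is our own illustration: it uses the subtree count $2^{2s} + 2^{s+1}(p - 1) + \binom{p-1}{2} + 2s$ and the total order $2^{2s}(n - s) + 2^s(n - s)(p - 1) + \binom{p}{3} + 2s$ for a path on $p = n - 2s \ge 2$ vertices with $s$ leaves at each end, and searches for the best $s$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log2

def double_broom_stats(n, s):
    """(sigma, tau) for a path on p = n - 2s >= 2 vertices with s leaves at each end."""
    p = n - 2 * s
    if s < 1 or p < 2:
        raise ValueError("need s >= 1 and a path with at least two vertices")
    sigma = 4**s + 2**(s + 1) * (p - 1) + comb(p - 1, 2) + 2 * s
    tau = 4**s * (n - s) + 2**s * (n - s) * (p - 1) + comb(p, 3) + 2 * s
    return sigma, tau

def best_double_broom(n):
    """Value of s that maximizes the mean subtree order, and that mean."""
    best = None
    for s in range(1, (n - 2) // 2 + 1):
        sigma, tau = double_broom_stats(n, s)
        mean = Fraction(tau, sigma)
        if best is None or mean > best[1]:
            best = (s, mean)
    return best

print(double_broom_stats(6, 2))          # (28, 84): two adjacent centres, two leaves each
s, mean = best_double_broom(1000)
print(s, float(mean), 1000 - 2 * log2(1000))
```

The last line prints the optimal $s$ and the resulting mean next to $n - 2\log_2 n$, so that the $O(1)$ difference can be inspected directly.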

Characteristics of optimal trees
We call a tree optimal if it has the greatest possible mean subtree order among all trees of the same size. In this section, we prove several structural properties of optimal trees. Specifically, we show that the diameter of optimal trees with $n$ vertices is very close to $n$, and that the number of subtrees is of the order of magnitude $\Theta(n^4)$. Another feature of optimal trees was proved by Mol and Oellermann in [4], namely that the number of leaves cannot be too large: This estimate is based on a result of Jamison [2, Lemma 6.1] stating that the mean subtree order of a tree with $n$ vertices and $\ell$ leaves is at most $n - \ell/2$.
The following notation will be useful for our arguments. For an arbitrary tree $T$, we denote the total number of subtrees by $\sigma(T)$, and their total order (i.e., the total number of vertices in all these subtrees) by $\tau(T)$, so that the mean subtree order is given by
$$\mu_T = \frac{\tau(T)}{\sigma(T)}.$$
For a rooted tree $T$ with root $r$, we let $s(T)$ be the number of subtrees of $T$ that contain the root, and let $t(T)$ be their total order (we suppress the dependence on the root for simplicity). The local subtree order at $r$ can then be expressed as the quotient of these two:
$$\mu_T(r) = \frac{t(T)}{s(T)}.$$
We also define the defect $\Delta(T)$ as the average number of vertices not contained in a randomly chosen subtree that contains the root $r$, i.e.,
$$\Delta(T) = |T| - \mu_T(r) = |T| - \frac{t(T)}{s(T)}.$$
Note in particular that the defect of a single-vertex tree is 0, while the defect of a (rooted) tree with exactly two vertices is $\frac12$. Let $T_1, T_2, \ldots, T_k$ be the components resulting when the root is removed. Each of them is endowed with a natural root, namely the unique neighbor of $T$'s root. Let moreover $T'_1, T'_2, \ldots, T'_k$ be the trees obtained from $T_1, T_2, \ldots, T_k$ by adding back the root of $T$ (and an edge connecting it to the root of the respective component, similar to the proof of Proposition 8). We have
$$s(T) = \prod_{i=1}^{k} s(T'_i), \tag{5}$$
since each subtree that contains the root of $T$ can be decomposed into subtrees of $T'_1, T'_2, \ldots, T'_k$ in a natural way. Furthermore, when a uniformly random subtree of $T$ that contains the root is chosen, the induced subtrees in $T'_1, T'_2, \ldots, T'_k$ are independent uniformly random subtrees containing the root in the respective branches. It follows that
$$\Delta(T) = \sum_{i=1}^{k} \Delta(T'_i). \tag{6}$$
The following inequality between $\Delta(T)$ and $s(T)$ will be crucial:
Lemma 11. For every rooted tree $T$, we have
$$\Delta(T) \ge \tfrac12 \log_2 s(T).$$
Proof. We proceed by induction on $|T|$. For a single-vertex tree, the inequality is trivial as both sides are equal to 0. Next we distinguish two different cases: if there are two or more branches, then we can use the induction hypothesis together with (5) and (6):
$$\Delta(T) = \sum_{i=1}^{k} \Delta(T'_i) \ge \sum_{i=1}^{k} \tfrac12 \log_2 s(T'_i) = \tfrac12 \log_2 s(T).$$
Otherwise, there is only a single component $T_1$, and we obtain $s(T) = 1 + s(T_1)$, $t(T) = 1 + s(T_1) + t(T_1)$, and consequently
$$\Delta(T) = |T_1| + 1 - \frac{1 + s(T_1) + t(T_1)}{1 + s(T_1)} = \frac{|T_1| + s(T_1)\Delta(T_1)}{1 + s(T_1)}.$$
Thus we can again invoke the induction hypothesis to complete the proof of the inequality.
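The quantities $s(T)$, $t(T)$ and the defect are simple to compute recursively, which makes it easy to test the inequality $\Delta(T) \ge \frac12 \log_2 s(T)$ on random rooted trees. The Python sketch below is an illustration only: the parent-array encoding (vertex $i \ge 1$ has a parent with a smaller label, root 0) and the function name `defect` are our own assumptions, and the comparison uses floating-point logarithms with a small tolerance for the equality cases.

```python
import random
from fractions import Fraction
from math import log2

def defect(parent):
    """Return (defect at the root 0, s(T)) for the tree in which vertex i >= 1
    has parent parent[i] < i (parent[0] is ignored)."""
    n = len(parent)
    s, t = [1] * n, [1] * n
    for v in range(n - 1, 0, -1):        # children have larger labels than parents
        p = parent[v]
        # skip the branch at v, or attach one of its s[v] rooted subtrees
        s[p], t[p] = s[p] * (1 + s[v]), t[p] * (1 + s[v]) + s[p] * t[v]
    return n - Fraction(t[0], s[0]), s[0]

print(defect([None, 0]))                 # (Fraction(1, 2), 2)

random.seed(1)
for _ in range(1000):
    n = random.randint(1, 30)
    parent = [None] + [random.randrange(i) for i in range(1, n)]
    d, s_root = defect(parent)
    assert d >= 0.5 * log2(s_root) - 1e-9   # inequality of Lemma 11
```

Stars rooted at their center attain equality: with $q$ leaves one gets $s = 2^q$ and defect $q/2$.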
As an immediate consequence of Lemma 11, we already get an upper bound on the number of subtrees of an optimal tree.
Proposition 12. There is a constant $C_1 > 0$ such that every optimal tree $T$ has at most $C_1 n^4$ subtrees, where $n$ is the number of vertices of $T$.
Proof. It is clearly enough to prove the statement for sufficiently large $n$. Let $T$ be an optimal tree, and suppose first that none of the vertices of $T$ is contained in more than half of the subtrees. Then the mean subtree order is clearly at most $\frac{n}{2}$, which contradicts Theorem 1 (at least for sufficiently large $n$). Thus we can select a vertex $r$ as the root that is contained in more than $\sigma(T)/2$ subtrees. By (2) and Lemma 11, we have
$$\mu_T \le \mu_T(r) = n - \Delta(T) \le n - \tfrac12 \log_2 s(T) \le n - \tfrac12 \log_2 \frac{\sigma(T)}{2}.$$
Combining this inequality with Theorem 1, we obtain $\log_2 \sigma(T) \le 4\log_2 n + O(1)$, which implies the statement.
Next we show that every optimal tree has a large central part that is contained in most subtrees. Formally, we define the central part $C(T)$ of a tree $T$ to be the set of all vertices that are contained in at least $\frac{1}{1 + n^{-1/4}}\sigma(T)$ of all subtrees. The constant $\frac14$ is somewhat arbitrary in this definition and can be replaced by any other number less than $\frac12$. We remark that the definition of the central part is conceptually similar to that of the subtree core as defined in [5]: the subtree core contains those vertices that are contained in the greatest number of subtrees. It can be shown that there are always either one or two vertices in the subtree core; our central part turns out to be much larger for optimal trees.
Lemma 13. Let $T$ be an optimal tree with $n$ vertices, where $n$ is sufficiently large. The vertices of the central part $C(T)$ induce a connected graph, i.e., a subtree of $T$, with at least $n - n^{1/3}$ vertices. Moreover, the subtree induced by $C(T)$ has at most 16 leaves.
Proof. Let $\sigma_v(T)$ denote the number of subtrees containing a vertex $v$, so that $C(T)$ consists of all vertices for which $\sigma_v(T) \ge \frac{1}{1 + n^{-1/4}}\sigma(T)$. It was shown in [5, Theorem 9.1] that $\sigma_v(T)$ is unimodal along paths: for any path from one leaf to another, it first increases, then decreases. Hence the minimum of $\sigma_v(T)$ among all vertices on an arbitrary path is attained at one or both ends. Consequently, for any two vertices $v$ and $w$ that belong to $C(T)$, the entire path between $v$ and $w$ is also contained in $C(T)$. This proves that $C(T)$ induces a connected graph. With some minor abuse of notation, we will also write $C(T)$ for the graph induced by $C(T)$. To bound the number of vertices in $C(T)$, we use the following crude bounds:
• Every vertex outside of $C(T)$ is contained in at most $\frac{1}{1 + n^{-1/4}}\sigma(T)$ subtrees.
• Every vertex in $C(T)$ is contained in at most $\sigma(T)$ subtrees.
So
$$\mu_T \le |C(T)| + \frac{n - |C(T)|}{1 + n^{-1/4}} = n - \frac{n - |C(T)|}{n^{1/4} + 1},$$
and by Theorem 1,
$$\frac{n - |C(T)|}{n^{1/4} + 1} \le 2\log_2 n + O(1),$$
which implies that $n - |C(T)| \le 2n^{1/4}\log_2 n + O(n^{1/4}) \le n^{1/3}$ for sufficiently large $n$, so that $|C(T)| \ge n - n^{1/3}$. It remains to prove the statement on the number of leaves.
Consider any leaf v of the subtree C(T ) and let w be its unique neighbor in C(T ). When the edge vw is removed from T , we obtain two components. The component containing v is called the branch bundle of v, denoted B(v); the branch bundle B(v) has v as a natural root.
Let $a$ be the number of subtrees of $T - B(v)$ that contain $w$, and let $b = s(B(v))$ be the number of subtrees of $B(v)$ that contain $v$. Then the total number of subtrees of $T$ that contain $v$ is $(a + 1)b$, as every subtree of $B(v)$ containing $v$ is such a subtree itself, and can also be combined with an arbitrary subtree of $T - B(v)$ containing $w$. We observe that $T$ has at least $a + 1$ subtrees that do not contain $v$: all $a$ subtrees of $T - B(v)$ that contain $w$, and any single vertex of $T$ other than $v$ and $w$ can also be regarded as such a subtree. Thus we have $\sigma(T) \ge (a + 1) + (a + 1)b = (a + 1)(b + 1)$, which implies
$$\frac{\sigma_v(T)}{\sigma(T)} \le \frac{(a + 1)b}{(a + 1)(b + 1)} = \frac{b}{b + 1}.$$
By definition of the central part $C(T)$, we must have $(a + 1)b \ge \frac{\sigma(T)}{1 + n^{-1/4}}$ and therefore
$$\frac{b}{b + 1} \ge \frac{1}{1 + n^{-1/4}},$$
which is ultimately equivalent to
$$s(B(v)) = b \ge n^{1/4}. \tag{7}$$
Now suppose that the subtree $C(T)$ has more than 16 leaves. The branch bundles associated with these leaves are all disjoint, since each of them only contains one vertex of $C(T)$. We can choose a subtree containing the root in each of these branch bundles and take their union with the tree $C(T)$ to obtain a subtree of $T$. In view of (7), this gives us at least $(n^{1/4})^{17} = n^{4.25}$ different subtrees of $T$, which contradicts Proposition 12 for sufficiently large $n$.
Now we can obtain a matching lower bound for Proposition 12, showing that an optimal tree with $n$ vertices has $\Theta(n^4)$ subtrees.
Proposition 14. There is a constant $C_2 > 0$ such that every optimal tree $T$ has at least $C_2 n^4$ subtrees, where $n$ is the number of vertices of $T$.
Proof. We build our proof on the structural properties established in Lemma 13. Once again, it suffices to prove the statement for sufficiently large $n$. Consider two families of subtrees of an optimal tree $T$ with $n$ vertices:
• Let $\mathcal{F}_1$ be the family of subtrees that contain all leaves of the tree $C(T)$, and thus all of $C(T)$. Note that each of these leaves is, by definition, contained in all but at most $\frac{\sigma(T)}{n^{1/4} + 1}$ of $T$'s subtrees. Thus the number of subtrees of $T$ that do not contain one of these leaves is at most $\frac{16\sigma(T)}{n^{1/4} + 1}$, which in turn means that $\mathcal{F}_1$ contains at least $(1 - 16n^{-1/4})\sigma(T)$ subtrees. If we contract all vertices of $C(T)$ to a single vertex $r$ and call the resulting tree $T'$, then each of the subtrees in $\mathcal{F}_1$ becomes a subtree of $T'$ that contains $r$ (which we take as the root of $T'$), and this correspondence is clearly bijective. The average number of vertices not contained in subtrees of $\mathcal{F}_1$ thus becomes exactly the defect $\Delta(T')$, which we can bound in the following way by Lemma 11:
$$\Delta(T') \ge \tfrac12 \log_2 s(T') = \tfrac12 \log_2 |\mathcal{F}_1| \ge \tfrac12 \log_2\bigl((1 - 16n^{-1/4})\sigma(T)\bigr).$$
In summary, we find that the total number of vertices not contained in subtrees that belong to $\mathcal{F}_1$ is at least
$$\tfrac12 (1 - 16n^{-1/4})\,\sigma(T)\,\log_2\bigl((1 - 16n^{-1/4})\sigma(T)\bigr).$$
• In order to define the second family, we consider a centroid vertex $v$ of $T$ (cf. Proposition 8). Let the components of $T - v$ be $T_1, T_2, \ldots, T_k$, and let $T'_1, T'_2, \ldots, T'_k$ be obtained from these components by adding back the vertex $v$ in the same way as in Proposition 8. Recall that none of $T_1, T_2, \ldots, T_k$ can contain more than half of the vertices of $T$. Since we know that the central part $C(T)$ induces a subtree and contains at least $n - n^{1/3}$ vertices, we can conclude (for sufficiently large $n$) that $v$ lies in $C(T)$. Moreover, since the tree $C(T)$ has no more than 16 leaves, $v$ cannot have more than 16 neighbors in $C(T)$. Those of the components $T_1, T_2, \ldots, T_k$ whose corresponding neighbor of $v$ does not lie in $C(T)$ cannot have more than $n^{1/3}$ vertices in total.
Each of the remaining at most 16 components contains at most $\frac{n}{2}$ vertices, so by the pigeonhole principle, there are at least two that contain at least $\frac{n}{32}$ vertices each (for sufficiently large $n$). Without loss of generality, let those be $T_1$ and $T_2$. If the union of all remaining branches (which is $T - (T_1 \cup T_2)$) contains more than $\sqrt{n}$ vertices in total, then we can regard the tree $T$ as the union of $T'_1$, $T'_2$ and $T - (T_1 \cup T_2)$ and apply the argument of Proposition 8 to show that the local mean subtree order at $v$ is at most
$$n - \tfrac52 \log_2 n + O(1).$$
This gives us a contradiction for sufficiently large $n$. So both $T'_1$ and $T'_2$ contain in fact at least $\frac{n}{2} - \sqrt{n}$ vertices each. Next, we use (5), which shows that
$$s(T'_1)\,s(T'_2) \le \prod_{i=1}^{k} s(T'_i) = s(T) \le \sigma(T),$$
regarding $T$ and $T'_1, T'_2, \ldots$ as rooted at $v$. This implies that either $s(T'_1) \le \sqrt{\sigma(T)}$ or $s(T'_2) \le \sqrt{\sigma(T)}$ (or both). Let us assume that the former holds. We now define $\mathcal{F}_2$ as the family of all subtrees of $T$ that contain $v$, but not all vertices of $T_1 \cap C(T)$. Clearly, $\mathcal{F}_2$ is disjoint from $\mathcal{F}_1$ (whose members need to contain all of $C(T)$).
Note that $T_1 \cap C(T)$ contains at least $|T_1| - n^{1/3} \ge \frac{n}{2} - \sqrt{n} - 1 - n^{1/3} > \frac{n}{3}$ vertices (for sufficiently large $n$). For every $\ell \in \{0, 1, \ldots, \lfloor \frac{n}{3} \rfloor\}$, we can find a subtree of $T'_1$ that contains $v$ and $\ell$ vertices of $T_1 \cap C(T)$ (by successively adding vertices). Each of these can be merged with an arbitrary subtree of $T - T_1$ that contains $v$ to form an element of $\mathcal{F}_2$, so for each $\ell$ we obtain at least
$$\frac{s(T)}{s(T'_1)} \ge \frac{\sqrt{\sigma(T)}}{1 + n^{-1/4}}$$
such trees, since $s(T) \ge \sigma(T)/(1 + n^{-1/4})$ (as we established that $v$ belongs to $C(T)$) and $s(T'_1) \le \sqrt{\sigma(T)}$ by assumption. Since each such tree does not contain at least $\frac{n}{3} - \ell$ vertices of $T$, we find that the total number of vertices not contained in subtrees that belong to $\mathcal{F}_2$ is at least
$$\sum_{\ell=0}^{\lfloor n/3 \rfloor} \Bigl(\frac{n}{3} - \ell\Bigr) \frac{\sqrt{\sigma(T)}}{1 + n^{-1/4}} \ge \frac{n^2 \sqrt{\sigma(T)}}{20}$$
for sufficiently large $n$, by the same inequalities as before.
Now we put together the contributions of $\mathcal{F}_1$ and $\mathcal{F}_2$: the total number of vertices not contained in subtrees of $T$ is at least
$$\tfrac12 (1 - 16n^{-1/4})\,\sigma(T)\,\log_2\bigl((1 - 16n^{-1/4})\sigma(T)\bigr) + \frac{n^2 \sqrt{\sigma(T)}}{20}.$$
Thus the mean subtree order of $T$ can be bounded above as follows:
$$\mu_T \le n - \tfrac12 (1 - 16n^{-1/4}) \log_2\bigl((1 - 16n^{-1/4})\sigma(T)\bigr) - \frac{n^2}{20\sqrt{\sigma(T)}} = n - \tfrac12 \log_2 \sigma(T) - \frac{n^2}{20\sqrt{\sigma(T)}} + o(1).$$
In the last step, we used the fact that $\log_2 \sigma(T) = O(\log n)$ by Proposition 12. Writing $\sigma(T) = xn^4$, we get
$$\mu_T \le n - 2\log_2 n - \tfrac12 \log_2 x - \frac{1}{20\sqrt{x}} + o(1).$$
On the other hand, Theorem 1 tells us that $\mu_T = n - 2\log_2 n + O(1)$. As the function $x \mapsto \frac12 \log_2 x + \frac{1}{20\sqrt{x}}$ tends to $\infty$ as $x \to 0$, $x$ must in fact be bounded below by some constant $C_2$, which completes the proof.
Summarizing, we have shown the following:
Corollary 15. The number of subtrees in an optimal tree with $n$ vertices is $\Theta(n^4)$.
We remark that the approach that gave us Proposition 12 and Proposition 14 could in principle also be used to prove Theorem 1. Of course, the $O$-constants that occur are not nearly optimal. As the final main result of this section, we are able to provide information on the diameter of optimal trees.
Theorem 16. There exists an absolute constant $C$ such that the following statement holds: for every tree $T$ with $n$ vertices and diameter $d$, we have $\mu_T \le n - \log_2 n - 2\log_2(n - d) + C$.
Proof. If d ≥ n − √ n, then the statement follows directly from Theorem 1, so we assume that d ≤ n − √ n. Fix a diameter (path of maximum length) of T ; its length is d, so there are d + 1 vertices on it, and n − d − 1 vertices that do not lie on it. Let us again consider the central part C(T ) as in the previous proof. If µ T < n − 3 log 2 n, we are done again, so we can assume that µ T ≥ n − 3 log 2 n. Then we can apply the same arguments as in Lemma 13, where T was assumed to be optimal, but all that was actually used was a lower bound on µ T . So we can conclude that C(T ) contains at least n − n 1/3 vertices (if n is sufficiently large, as we can always assume). Thus there are at least n − d − n 1/3 − 1 vertices of C(T ) that do not lie on the diameter. Each of these lies on at least one path from a leaf of the tree C(T ) to the diameter.
If the tree C(T ) has 24 or more leaves, then we can adapt the argument in the proof of Lemma 13 to show that σ(T ) ≥ (n 1/4 ) 24 = n 6 , and the proof of Proposition 12 to show that In this case we are done, so assume that there are at most 23 such leaves. By the pigeonhole principle at least one of the paths from a leaf of C(T ) to the diameter must have length at least 1 23 (n − d − n 1/3 − 1). For sufficiently large n, we have n 1/3 + 1 ≤ 1 2 √ n ≤ 1 2 (n − d), so this path has at least length 1 46 (n − d).
Let $v$ be the vertex where this path meets the diameter. We know now that there is a path emanating from $v$ that does not have any vertices in common with the diameter other than $v$ and whose length is at least $\frac{1}{46}(n-d)$. Moreover, the vertex $v$ divides the diameter into two pieces. Both pieces must also have a length of at least $\frac{1}{46}(n-d)$, since there would otherwise be a path through $v$ that is longer than the diameter.
We can therefore split $T$ into three subtrees $T'_1$, $T'_2$, $T'_3$ whose union is $T$, whose pairwise intersection is only the vertex $v$, and each of which contains at least $\frac{1}{46}(n-d)$ vertices. The largest of these three subtrees certainly contains at least $\frac{n}{3}$ vertices. We can now use equation (4) from the proof of Proposition 8 and then apply Corollary 7 to obtain the claimed bound. Note that the O-constant does not depend on $d$, so the proof is complete.
Corollary 17. For every positive integer $n$, let $T_n$ be an optimal tree with $n$ vertices. Then we have, for every $\delta > 0$,
$$\lim_{n\to\infty} \frac{n - \operatorname{diam}(T_n)}{n^{1/2+\delta}} = 0.$$
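One way to see how Corollary 17 follows is to compare the upper bound of Theorem 16 with the asymptotic value $n - 2\log_2 n + O(1)$ of the maximum from Theorem 1. In the sketch below, $d_n$ denotes the diameter of $T_n$ and $C'$ is an unspecified constant; both symbols are introduced here only for illustration.

```latex
n - 2\log_2 n - C' \;\le\; \mu_{T_n} \;\le\; n - \log_2 n - 2\log_2(n - d_n) + C
\quad\Longrightarrow\quad
2\log_2(n - d_n) \;\le\; \log_2 n + C + C'
\quad\Longrightarrow\quad
n - d_n \;\le\; 2^{(C + C')/2}\,\sqrt{n}.
```

Dividing by $n^{1/2+\delta}$ leaves at most $2^{(C+C')/2}\, n^{-\delta}$, which tends to $0$ for every fixed $\delta > 0$.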
In plain words, optimal trees must have a diameter that is close to the number of vertices. We remark that there are trees whose diameter is about $n - \sqrt{n}$ for which the asymptotic formula of Theorem 1 is attained. The construction is fairly simple: merge three brooms at their roots; two of them have length approximately $\sqrt{n}$ and $\log_2 n$ leaves each. The third one consists of a path of length about $n - 2\sqrt{n} - 4\log_2 n$, with approximately $2\log_2 n$ leaves at the end (Figure 2). It is not difficult to check that the resulting tree satisfies the above-mentioned properties: the diameter is $n - \sqrt{n} + O(\log n)$, and the mean subtree order is $n - 2\log_2 n + O(1)$.
Figure 2: Example of a tree with near-maximum mean subtree order.
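The vertex count and the diameter of the three-broom construction can be checked with a short back-of-the-envelope computation. The following is only a sketch that ignores rounding and additive constants; the grouping into "short brooms" and "long broom" is our own labelling.

```latex
\underbrace{2\bigl(\sqrt{n} + \log_2 n\bigr)}_{\text{two short brooms}}
\;+\;
\underbrace{\bigl(n - 2\sqrt{n} - 4\log_2 n\bigr) + 2\log_2 n}_{\text{long broom}}
\;=\; n,
\qquad
\operatorname{diam} \;\approx\; \bigl(n - 2\sqrt{n} - 4\log_2 n\bigr) + \sqrt{n}
\;=\; n - \sqrt{n} - 4\log_2 n .
```

The longest path runs from a leaf of the long broom through the common root to a leaf of one of the short brooms, which is where the value $n - \sqrt{n} + O(\log n)$ comes from.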

Constructing better caterpillars
In this section, we analyze a construction of caterpillars that achieve a slightly higher mean subtree order than double-brooms. This allows us to improve the final constant in the lower bound for the maximum mean subtree order. We conjecture that our construction in fact accurately reflects the shape of optimal trees for large n.
For n ≤ 24 the optimal trees have been computed explicitly. In particular, see [4, Figure 1] for the optimal trees when 16 ≤ n ≤ 24. None of these are double-brooms.
Theorem 18. For large $n$, there is always a tree $T$ with $n$ vertices such that $\mu_T \ge n - 2\log_2(0.9n) + f(2\log_2(0.9n)) + o(1)$, where $f$ is the same 1-periodic function as in Corollary 7.

Proof. We provide an explicit construction. Let $T$ be a caterpillar consisting of a path with $\ell+1$ vertices (which will be called the stem), $m$ leaves attached at each end, and $k$ additional leaves, where $n = \ell + 2m + k + 1$. We will choose $m$ and $k$ in such a way that the number of leaves is $2m + k = 4\log_2 n + O(1)$ and $k = c\log_2 n + O(1)$ for some fixed constant $c \in (0,1)$.
Fix an end of the stem and call it the left end; the other one will be called the right end. The $k$ additional leaves are attached to vertices of the stem whose distances from the left end are $a_1\ell, a_2\ell, \ldots, a_k\ell$ respectively, where $0 \le a_1 \le a_2 \le \cdots \le a_k \le 1$. These will be called support vertices. Moreover, we set $a_0 = 0$ and $a_{k+1} = 1$.
Let us now determine the number of subtrees as well as their total order. Firstly, there are $2^{2m+k}$ subtrees that contain the entire stem. Next, we consider subtrees containing the left end, but not the right. Here, the number of subtrees containing precisely the first $i$ support vertices is $2^{m+i}(a_{i+1}\ell - a_i\ell)$, since there are $m+i$ leaves that can potentially be added, and $a_{i+1}\ell - a_i\ell$ ways to add a path segment between the $i$-th and the $(i+1)$-th support vertex. Using the same reasoning, we find that there are $2^{m+k-i}(a_{i+1}\ell - a_i\ell)$ subtrees containing the right end, but not the left, and precisely the last $k-i$ support vertices.
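As a sanity check of the count $2^{m+i}(a_{i+1}\ell - a_i\ell)$, the following brute-force sketch enumerates all vertex subsets of a small caterpillar and compares the result with the formula. The function names and the sample parameters ($\ell = 5$, $m = 1$, support vertices at distances 2 and 3 from the left end) are chosen here purely for illustration and are not part of the construction in the proof; the support positions are assumed to be distinct and strictly between the two ends.

```python
from itertools import combinations


def build_caterpillar(ell, m, positions):
    """Adjacency sets of a caterpillar: stem 0..ell, m leaves at each end,
    and one extra leaf at each stem vertex listed in `positions`."""
    adj = {v: set() for v in range(ell + 1)}
    for v in range(ell):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    next_label = ell + 1
    for stem_vertex in [0] * m + [ell] * m + list(positions):
        adj[next_label] = {stem_vertex}
        adj[stem_vertex].add(next_label)
        next_label += 1
    return adj


def is_connected(vertices, adj):
    """Check connectivity of the subgraph induced by a non-empty vertex set."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def left_end_counts(ell, m, positions):
    """counts[i] = number of subtrees that contain the left end (vertex 0),
    avoid the right end (vertex ell) and contain exactly i support vertices."""
    adj = build_caterpillar(ell, m, positions)
    others = [v for v in adj if v not in (0, ell)]
    counts = [0] * (len(positions) + 1)
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            subset = (0,) + extra
            if is_connected(subset, adj):
                counts[sum(p in subset for p in positions)] += 1
    return counts


def predicted_counts(ell, m, positions):
    """The formula 2^(m+i) * (a_{i+1} ell - a_i ell), with a_0 ell = 0
    and a_{k+1} ell = ell."""
    p = [0] + list(positions) + [ell]
    return [2 ** (m + i) * (p[i + 1] - p[i]) for i in range(len(positions) + 1)]


print(left_end_counts(5, 1, [2, 3]), predicted_counts(5, 1, [2, 3]))
```

The enumeration is exponential in the number of vertices, so it is only meant for very small instances such as the one above.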
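Finally, the number of subtrees containing neither of the two ends of the stem can be bounded by $O(2^k\ell^2)$, as those are either single leaves or consist of part of the stem and a subset of the $k$ additional leaves. So the total number of subtrees of $T$ is
$$\sigma(T) = 2^{2m+k} + \sum_{i=0}^{k}\bigl(2^{m+i} + 2^{m+k-i}\bigr)(a_{i+1} - a_i)\ell + O(2^k\ell^2).$$
In a similar fashion, we can compute $\tau(T)$, the sum of the orders of the subtrees of $T$. The subtrees containing both ends of the stem contribute $2^{2m+k}(\ell + 1 + m + \frac{k}{2}) = 2^{2m+k}(n - m - \frac{k}{2})$. The subtrees that contain the left end and the first $i$ support vertices, but not the right end, contribute a total of
$$2^{m+i}(a_{i+1}\ell - a_i\ell)\Bigl(\frac{(a_i + a_{i+1})\ell}{2} + O(\log n)\Bigr).$$
Likewise, subtrees that contain the right end and the last $k-i$ support vertices, but not the left end, contribute a total of
$$2^{m+k-i}(a_{i+1}\ell - a_i\ell)\Bigl(\frac{(2 - a_i - a_{i+1})\ell}{2} + O(\log n)\Bigr).$$
The contribution of all subtrees that do not contain either of the ends is bounded by $O(2^k\ell^3)$ by the same reasoning as before. So we have
$$\tau(T) = 2^{2m+k}\Bigl(n - m - \frac{k}{2}\Bigr) + \sum_{i=0}^{k}\Bigl(2^{m+i}\,\frac{a_{i+1}^2 - a_i^2}{2} + 2^{m+k-i}\,\frac{(1-a_i)^2 - (1-a_{i+1})^2}{2}\Bigr)\ell^2 + O(2^{m+k}\ell\log n) + O(2^k\ell^3).$$
Recall now the assumptions we made on $m$ and $k$. With those, we find that $2^{2m+k} = \Theta(n^4)$ and $2^k\ell^2 = \Theta(n^{2+c})$ as well as $\ell = n - O(\log n)$. Consequently,
$$\sigma(T) = 2^{2m+k} + 2^{m+k}\ell\, p(k; a_1, a_2, \ldots, a_k) + O(2^k\ell^2) = 2^{2m+k}\bigl(1 + 2^{-m}\ell\, p(k; a_1, a_2, \ldots, a_k) + O(n^{c-2})\bigr),$$
where
$$p(k; a_1, a_2, \ldots, a_k) = \sum_{i=0}^{k}\bigl(2^{i-k} + 2^{-i}\bigr)(a_{i+1} - a_i)$$
is easily seen to be bounded. Similarly,
$$\tau(T) = 2^{2m+k}\Bigl(n - m - \frac{k}{2} + 2^{-m}\ell^2 q(k; a_1, a_2, \ldots, a_k) + o(1)\Bigr),$$
where
$$q(k; a_1, a_2, \ldots, a_k) = \frac{1}{2}\sum_{i=0}^{k}\Bigl(2^{i-k}\bigl(a_{i+1}^2 - a_i^2\bigr) + 2^{-i}\bigl((1-a_i)^2 - (1-a_{i+1})^2\bigr)\Bigr),$$
so that $\mu_T = \tau(T)/\sigma(T) = n - m - \frac{k}{2} + 2^{-m}\ell^2\bigl(q(k; a_1, \ldots, a_k) - p(k; a_1, \ldots, a_k)\bigr) + o(1)$. We can still choose $a_1, a_2, \ldots, a_k$, and we make the choice in such a way that $q(k; a_1, a_2, \ldots, a_k) - p(k; a_1, a_2, \ldots, a_k)$ is (near) maximal. To this end, note that
$$q(k; a_1, a_2, \ldots, a_k) - p(k; a_1, a_2, \ldots, a_k) = -\frac{1}{2} - 2^{-k-1} + \sum_{i=1}^{k}\Bigl(2^{i-k-1}a_i - \bigl(2^{i-k-1} + 2^{-i}\bigr)\frac{a_i^2}{2}\Bigr).$$
The sum of quadratic polynomials attains its maximum when $a_i = \frac{1}{2^{k-2i+1}+1}$ for every $i$. However, since $a_i\ell$ needs to be an integer for each $i$, we can only choose it in such a way that $a_i = \frac{1}{2^{k-2i+1}+1} + O(n^{-1})$. The error term here has an asymptotically negligible impact on $\mu_T$. With this choice, we arrive at
$$\mu_T = n - m - \frac{k}{2} - 2^{-m-k/2}\ell^2\Bigl[2^{-k/2} + \sum_{i=1}^{k}\frac{2^{k/2-i-1}}{1 + 2^{k-2i+1}}\Bigr] + o(1).$$
The expression in brackets simplifies as follows:
$$2^{-k/2} + \sum_{i=1}^{k}\frac{2^{k/2-i-1}}{1 + 2^{k-2i+1}} = \sum_{i=-\infty}^{\infty}\frac{2^{k/2-i-1}}{1 + 2^{k-2i+1}} + O(2^{-k/2}).$$
The value of the infinite sum depends only on the residue class of $k$ modulo 2: it is approximately 0.801214 for even $k$ and approximately 0.801218 for odd $k$. In particular, it is less than 0.81.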
Therefore, we have
$$\mu_T \ge n - m' - 0.81\cdot 2^{-m'}n^2 + o(1),$$
where $m' = m + \frac{k}{2}$. It remains to minimize the expression $m' + 0.81\cdot 2^{-m'}n^2 = m' + \frac{(0.9n)^2}{2^{m'}}$, and (assuming for simplicity that $k$ is chosen to be even) we already know from the proof of Corollary 7 that this can be achieved by taking $m' = \lfloor 2\log_2(0.9n)\rfloor$, resulting in the lower bound $\mu_T \ge n - 2\log_2(0.9n) + f(2\log_2(0.9n)) + o(1)$, which completes the proof.

Figure 3 shows the rough structure of the trees that are constructed in the proof of Theorem 18. Note the wide variety of potential choices for both $k$ and $a_1, a_2, \ldots, a_k$, which results in a large number of trees that satisfy the asymptotic inequality in Theorem 18. As an immediate corollary, we obtain the following result.
Corollary 19. For sufficiently large n, no double-broom is an optimal tree.
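The two numerical ingredients in the proof of Theorem 18 can be checked with a few lines of code. The sketch below assumes that the infinite sum in the proof has the form $\sum_{i\in\mathbb{Z}} 2^{k/2-i-1}/(1+2^{k-2i+1})$, which is what the stated positions $a_i = 1/(2^{k-2i+1}+1)$ lead to in our reading. It also checks by brute force that $m' = \lfloor 2\log_2(0.9n)\rfloor$ minimizes $m' + (0.9n)^2/2^{m'}$ over the integers for a few sample values of $n$. The function names, the truncation parameter and the sample values are hypothetical.

```python
import math


def series_value(parity, terms=60):
    """Approximate the sum over all integers i of
    2^(k/2 - i - 1) / (1 + 2^(k - 2i + 1)) for k of the given parity
    (0 = even, 1 = odd).

    Substituting i = (k + parity)/2 + t removes k from every term, which
    is why only the parity of k matters."""
    total = 0.0
    for t in range(-terms, terms + 1):
        numerator = 2.0 ** (-t - 1 - parity / 2)
        denominator = 1.0 + 2.0 ** (1 - parity - 2 * t)
        total += numerator / denominator
    return total


def best_m_prime(n, upper=200):
    """Integer m' in [1, upper) minimizing m' + (0.9 n)^2 / 2^m' (brute force)."""
    target = (0.9 * n) ** 2
    return min(range(1, upper), key=lambda mp: mp + target / 2.0 ** mp)


print(series_value(0), series_value(1))
for n in (100, 1000, 12345):
    print(n, best_m_prime(n), math.floor(2 * math.log2(0.9 * n)))
```

The series converges geometrically in both directions, so the truncation at `terms=60` is far more accurate than the six decimal places quoted in the proof.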
By means of a computer program, it can also be checked that for $25 \le n \le 1000$, the best balanced double-broom is not optimal either.
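The program used for this check is not reproduced here. As a rough illustration of how the mean subtree order of a given tree can be computed exactly, the following sketch uses a standard rooted dynamic program: for each vertex $v$ it tracks the number of subtrees whose topmost vertex is $v$ and the sum of their orders. The function names `subtree_stats`, `mean_subtree_order` and `double_broom` are hypothetical, and this is not the authors' code.

```python
from fractions import Fraction


def subtree_stats(adj):
    """Return (sigma, tau): the number of subtrees of the tree given by the
    adjacency lists `adj` and the sum of their orders."""
    root = next(iter(adj))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:  # iterative DFS; every vertex is listed after its parent
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)
    count, total = {}, {}
    for v in reversed(order):  # children are processed before parents
        children = [w for w in adj[v] if w != parent[v]]
        n_v = 1
        for c in children:
            n_v *= 1 + count[c]  # each child branch is absent or one of count[c]
        s_v = n_v  # v itself is counted once in every such subtree
        for c in children:
            s_v += total[c] * (n_v // (1 + count[c]))
        count[v], total[v] = n_v, s_v
    return sum(count.values()), sum(total.values())


def mean_subtree_order(adj):
    sigma, tau = subtree_stats(adj)
    return Fraction(tau, sigma)


def double_broom(path_vertices, left_leaves, right_leaves):
    """Path on `path_vertices` vertices with leaves attached to both ends."""
    n = path_vertices + left_leaves + right_leaves
    adj = {v: [] for v in range(n)}

    def add_edge(u, v):
        adj[u].append(v)
        adj[v].append(u)

    for v in range(path_vertices - 1):
        add_edge(v, v + 1)
    label = path_vertices
    for end, leaves in ((0, left_leaves), (path_vertices - 1, right_leaves)):
        for _ in range(leaves):
            add_edge(end, label)
            label += 1
    return adj


# The 9-vertex double-star from the introduction: two adjacent centers
# with 3 and 4 leaves respectively.
print(mean_subtree_order(double_broom(2, 3, 4)))
```

For the double-star from the introduction this reproduces the value $779/159$. Exact integer arithmetic keeps the comparison between different trees free of rounding issues, at the cost of working with large integers for larger $n$.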

Optimal trees are nearly caterpillars
In this concluding section, we summarize the progress towards the last major open question by Jamison on the mean subtree order of trees. Although there is not yet a proof that the optimal tree is a caterpillar, the evidence listed below shows that the optimal trees are indeed very much like caterpillars. In Corollary 15, we proved that the number of subtrees in an optimal tree is of the same order as the number of subtrees in the optimal double-broom. To further analyze the structure of the optimal tree $T_n$, we combine Corollary 9 and Theorem 16 to see that $n - \operatorname{diam}(T_n) = O(\sqrt{n})$. In particular, this implies that almost all, up to at most $O(\sqrt{n})$, vertices belong to one path.