Paths vs. stars in the local profile of trees

The aim of this paper is to provide an affirmative answer to a recent question by Bubeck and Linial on the local profile of trees. For a tree $T$, let $p^{(k)}_1(T)$ be the proportion of paths among all $k$-vertex subtrees (induced connected subgraphs) of $T$, and let $p^{(k)}_2(T)$ be the proportion of stars. Our main theorem states: if $p^{(k)}_1(T_n) \to 0$ for a sequence of trees $T_1,T_2,\ldots$ whose size tends to infinity, then $p^{(k)}_2(T_n) \to 1$. Both are also shown to be equivalent to the statement that the number of $k$-vertex subtrees grows superlinearly and the statement that the $(k-1)$th degree moment grows superlinearly.


Introduction
In their recent paper [2], Bubeck and Linial studied what they call the local profile of trees. For two trees $S$ and $T$, we denote the number of copies of $S$ in $T$ by $c(S,T)$ (formally, the number of vertex subsets of $T$ that induce a tree isomorphic to $S$). For an integer $k \ge 4$, let $T^k_1, T^k_2, \ldots$ be a list of all $k$-vertex trees (up to isomorphism), such that $T^k_1 = P_k$ is the path and $T^k_2 = S_k$ is the star, and set
$$p^{(k)}_i(T) = \frac{c(T^k_i, T)}{Z_k(T)}, \qquad \text{where } Z_k(T) = \sum_j c(T^k_j, T).$$
In words, $Z_k(T)$ is the number of $k$-vertex subtrees of $T$ (the number of $k$-vertex subsets that induce a tree), and $p^{(k)}_i(T)$ is the proportion of copies of $T^k_i$ among those subtrees. In particular, $p^{(k)}_1(T)$ and $p^{(k)}_2(T)$ are the proportions of paths and stars, respectively. Bubeck and Linial specifically study the limit set $\Delta(k)$ of $k$-profiles $p^{(k)}(T) = (p^{(k)}_1(T), p^{(k)}_2(T), \ldots)$ as the number of vertices of $T$ tends to infinity. Their main result is that $\Delta(k)$ is convex for every $k$. This contrasts with the situation for general graphs, where the analogously defined set is not convex and even determining its convex hull is computationally infeasible [3]. Even in special cases, fairly little is known about $k$-profiles (see [4] for a study of 3-profiles). We remark that there is also a notable difference between the definitions of $k$-profiles of general graphs and of trees: for graphs, the proportion is taken among all vertex sets of cardinality $k$, while for trees it makes more sense to only consider those $k$-vertex sets that actually induce a tree. For general graphs, this would amount to considering only those subsets that induce a connected graph.
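To make the definitions of $Z_k(T)$, $p^{(k)}_1(T)$ and $p^{(k)}_2(T)$ concrete, here is a minimal brute-force sketch of our own (not taken from [2]). It enumerates all $k$-subsets of a small tree, keeps those that induce a connected subgraph, and classifies each as a path (maximum induced degree at most 2) or a star (one vertex adjacent to all others); this classification is only valid for $k \ge 4$. The adjacency-dict representation and all function names are assumptions, and the running time is exponential, so it is meant for small examples only.

```python
from itertools import combinations

def tree_from_edges(edges):
    """Adjacency dict {vertex: [neighbours]} of the tree with the given edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def k_subtrees(adj, k):
    """Yield every k-vertex subset of the tree that induces a connected subgraph."""
    for subset in combinations(sorted(adj), k):
        inside = set(subset)
        stack, seen = [subset[0]], {subset[0]}
        while stack:  # depth-first search restricted to the subset
            u = stack.pop()
            for w in adj[u]:
                if w in inside and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) == k:
            yield inside

def path_star_profile(adj, k):
    """Return (Z_k(T), p_1^(k)(T), p_2^(k)(T)) by brute force; assumes k >= 4."""
    total = paths = stars = 0
    for s in k_subtrees(adj, k):
        max_deg = max(sum(1 for w in adj[u] if w in s) for u in s)
        total += 1
        if max_deg <= 2:        # a tree with maximum degree 2 is a path
            paths += 1
        elif max_deg == k - 1:  # one vertex adjacent to all others: a star
            stars += 1
    if total == 0:
        raise ValueError("T has fewer than k vertices")
    return total, paths / total, stars / total

# Vertex 0 has neighbours 1, 2, 3, and vertex 3 has a further neighbour 4.
example = tree_from_edges([(0, 1), (0, 2), (0, 3), (3, 4)])
print(path_star_profile(example, 4))  # prints (3, 0.6666666666666666, 0.3333333333333333)
```

In the example, the three 4-vertex subtrees are one star (centred at 0) and two paths through the edge between 0 and 3.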
Furthermore, Bubeck and Linial show that the sum of the first two components (corresponding to the path and the star respectively) is strictly positive for every point in the limit set ∆(k) and in fact bounded below by an explicit constant that only depends on k (see the discussion at the end of Section 2 and in particular Corollary 11 for an equivalent statement). They also obtain a somewhat stronger inequality in the special case k = 5.
Bubeck and Linial list a number of open problems at the end of their article, and one of them will be the main topic of this paper. It can be expressed as follows:
Question. Let $T_1, T_2, \ldots$ be a sequence of trees such that the number of vertices of $T_n$ tends to infinity as $n \to \infty$. Given that $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$, is it true that $\lim_{n\to\infty} p^{(k)}_2(T_n) = 1$?
In somewhat more informal terms: if only few of the $k$-vertex subtrees of a large tree are paths, must almost all of those subtrees be stars? We remark that the statement is not true if $p^{(k)}_1$ and $p^{(k)}_2$ are interchanged. For example, consider the sequence of caterpillars shown in Figure 1. Obviously, $p^{(5)}_2(T_n) = 0$ for every $n$ in this example: the maximum degree is 3, so $T_n$ does not contain any 5-vertex stars. On the other hand, simple calculations show that $\lim_{n\to\infty} p^{(5)}_1(T_n) = \frac{1}{2}$.
In the following, we provide an affirmative answer to the question raised by Bubeck and Linial, and even prove a slight extension involving the total number of $k$-vertex subtrees and the degree moments. Here and in the following, we write $V(T)$ and $E(T)$ for the vertex set and edge set of a tree $T$, $|T|$ for the number of vertices of $T$, and $d(v)$ for the degree of a vertex $v$; whenever we speak about the degree of a vertex, we always mean the degree in the underlying tree $T$, not in a subtree.
Theorem 1. Let $T_1, T_2, \ldots$ be a sequence of trees such that $|T_n| \to \infty$ as $n \to \infty$. For every $k \ge 4$, the following four statements are equivalent:
(M1) $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$,
(M2) $\lim_{n\to\infty} Z_k(T_n)/|T_n| = \infty$,
(M3) $\lim_{n\to\infty} \frac{1}{|T_n|} \sum_{v \in V(T_n)} d(v)^{k-1} = \infty$,
(M4) $\lim_{n\to\infty} p^{(k)}_2(T_n) = 1$.
Informally, statement (M2) says that $T_n$ contains more than linearly many $k$-vertex subtrees, and (M3) states that the $(k-1)$-th degree moment tends to infinity.
The implication (M4) ⇒ (M1) is trivial, since $p^{(k)}_1(T_n) + p^{(k)}_2(T_n) \le 1$, so our main task will be to prove the implications (M1) ⇒ (M2) ⇒ (M4) as well as the equivalence (M2) ⇔ (M3).
Shortly after a first version of this paper was published online, the equivalence of (M1) and (M4) was shown independently by Bubeck, Edwards, Mania and Supko [1], who also provided an explicit (nonlinear) inequality between $p^{(k)}_1(T)$ and $p^{(k)}_2(T)$ that implies the equivalence.

Proof of the main theorem
Theorem 1 will follow from a sequence of lemmas. As a first step, we estimate the total number of k-vertex subtrees.
Lemma 2. Let $k$ be a positive integer. The total number of $k$-vertex subtrees of any tree $T$ can be bounded above as follows:
$$Z_k(T) \le (k-1)! \sum_{v \in V(T)} d(v)^{k-1}.$$
Proof. For every vertex $v$ of $T$, we count the number of $k$-vertex subtrees that contain $v$ and in which $v$ has maximum degree (in $T$, not in the subtree!) among all vertices of the subtree. Every such subtree can be constructed by repeatedly adding a leaf, starting with the single vertex $v$. At the $j$-th step, there are at most $j$ vertices to attach a leaf to, and at most $d(v)$ choices for the new leaf (since $v$ was assumed to have maximum degree). Therefore, there are at most $(k-1)! \cdot d(v)^{k-1}$ possible subtrees of this kind for any fixed vertex $v$. Summing over all $v$, we obtain the desired result. Clearly every subtree is counted at least once in the sum, possibly even several times, but since we are only interested in an upper bound, this is immaterial.
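Lemma 3. Let $k \ge 3$ be an integer. The number of $k$-vertex stars in a tree $T$ is
$$c(S_k, T) = \sum_{v \in V(T)} \binom{d(v)}{k-1}.$$
Proof. The number of $k$-vertex stars contained in $T$ whose center is $v$ is given by $\binom{d(v)}{k-1}$, the number of ways to choose $k-1$ of its neighbors. Since $k \ge 3$, the center of a $k$-vertex star is unique, and the desired statement follows immediately.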
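Both Lemma 2's bound $Z_k(T) \le (k-1)!\sum_v d(v)^{k-1}$ and the star count $c(S_k,T) = \sum_v \binom{d(v)}{k-1}$ are easy to test numerically. The sketch below is our own sanity check, not part of the proof: it assumes random recursive trees as test cases, counts subtrees by brute force, and uses hypothetical helper names.

```python
import random
from itertools import combinations
from math import comb, factorial

def random_tree(n, rng):
    """Random recursive tree: vertex i is joined to a uniformly chosen earlier vertex."""
    adj = {0: []}
    for i in range(1, n):
        j = rng.randrange(i)
        adj[i] = [j]
        adj[j].append(i)
    return adj

def subtree_and_star_counts(adj, k):
    """Brute-force (Z_k(T), c(S_k, T)) for k >= 3."""
    total = stars = 0
    for subset in combinations(adj, k):
        inside = set(subset)
        stack, seen = [subset[0]], {subset[0]}
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in inside and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) < k:
            continue  # not connected, hence not a subtree
        total += 1
        # A k-vertex tree is a star iff some vertex is adjacent to all the others.
        if max(sum(1 for w in adj[u] if w in inside) for u in inside) == k - 1:
            stars += 1
    return total, stars

def check_lemmas_2_and_3(adj, k):
    total, stars = subtree_and_star_counts(adj, k)
    degrees = [len(nbrs) for nbrs in adj.values()]
    lemma2_ok = total <= factorial(k - 1) * sum(d ** (k - 1) for d in degrees)
    lemma3_ok = stars == sum(comb(d, k - 1) for d in degrees)
    return lemma2_ok and lemma3_ok

rng = random.Random(2024)
print(all(check_lemmas_2_and_3(random_tree(rng.randint(6, 11), rng), k)
          for _ in range(20) for k in (3, 4, 5)))  # prints True
```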
Note that $p^{(k)}_1(T_n) = 0$ if the diameter of $T_n$ is at most $k-2$ (in this case, there are certainly no induced $k$-vertex paths), so this provides a simple construction for which condition (M1) holds. We treat this case separately and show that it implies (M2):
Lemma 4. Fix an integer $k \ge 3$, and let $T_1, T_2, \ldots$ be a sequence of trees whose diameter is bounded above by some fixed constant $D$. If $|T_n| \to \infty$ as $n \to \infty$, then (M2) holds, i.e.
$$\lim_{n\to\infty} \frac{Z_k(T_n)}{|T_n|} = \infty.$$
Proof. We prove the slightly stronger statement that
$$\lim_{n\to\infty} \frac{c(S_k, T_n)}{|T_n|} = \infty,$$
i.e. the number of induced $k$-vertex stars grows faster than linearly. To this end, it will be useful to consider all trees as rooted (at an arbitrary vertex). Clearly, if the diameter is bounded by $D$, the height of any rooted version is also bounded by $D$. We prove the following by induction on $D$, from which the statement of the lemma follows immediately:
Claim. For every positive integer $D$, there exist positive constants $\alpha_D, \beta_D$ with $\beta_D > 1$ and a positive integer $N_D$, depending only on $D$ and $k$, such that
$$c(S_k, T) \ge \alpha_D |T|^{\beta_D}$$
for every rooted tree $T$ of height at most $D$ with $|T| \ge N_D$.
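First note that the claim is trivial for $D = 1$: for a given number of vertices, there is only one possible rooted tree in this case, namely a star rooted at its center. Thus we have in this case
$$c(S_k, T) = \binom{|T|-1}{k-1} \ge \Big(\frac{|T|-1}{k-1}\Big)^{k-1} \ge \Big(\frac{|T|}{2(k-1)}\Big)^{k-1}$$
as soon as $|T| \ge k$, which gives us the desired inequality with $\alpha_1 = (2(k-1))^{-(k-1)}$, $\beta_1 = k-1$ and $N_1 = k$.
Now we turn to the induction step. Let $r$ be the root degree, and let $T_1, T_2, \ldots, T_r$ be the root branches, each endowed with the natural root (the neighbor of $T$'s root). The number of copies of $S_k$ in $T$ for which the root is the center is given by $\binom{r}{k-1}$, so
$$c(S_k, T) \ge \binom{r}{k-1} + \sum_{i=1}^{r} c(S_k, T_i).$$
Each of the branches has height at most $D-1$, so we can apply the induction hypothesis to them: $c(S_k, T_i) \ge f(|T_i|)$ for all $i$, where $f(x) = \alpha_{D-1} \max(x - N_{D-1}, 0)^{\beta_{D-1}}$ (for $|T_i| < N_{D-1}$ the right-hand side is $0$). In addition, we note that $f$ is a convex function, so Jensen's inequality gives us
$$c(S_k, T) \ge \binom{r}{k-1} + r f\Big(\frac{|T|-1}{r}\Big).$$
If $r \ge |T|^{2/3}$ and $|T| \ge (k-1)^{3/2}$, then $r \ge k-1$ and the first term is
$$\binom{r}{k-1} \ge \Big(\frac{r}{k-1}\Big)^{k-1} \ge (k-1)^{-(k-1)} |T|^{2(k-1)/3}.$$
If, on the other hand, $r < |T|^{2/3}$ and $|T| \ge (N_{D-1}+2)^3$, then the second term is
$$r f\Big(\frac{|T|-1}{r}\Big) \ge |T|^{2/3} f\Big(\frac{|T|-1}{|T|^{2/3}}\Big) \ge \alpha_{D-1} (N_{D-1}+2)^{-\beta_{D-1}} |T|^{(2+\beta_{D-1})/3};$$
here the first inequality holds because $r \mapsto r f(y/r)$ is nonincreasing for a convex function $f$ with $f(0) = 0$, and the second because $(|T|-1)/|T|^{2/3} - N_{D-1} \ge |T|^{1/3} - 1 - N_{D-1} \ge |T|^{1/3}/(N_{D-1}+2)$. Thus we obtain the desired inequality with
$$\alpha_D = \min\big((k-1)^{-(k-1)}, \alpha_{D-1}(N_{D-1}+2)^{-\beta_{D-1}}\big), \quad \beta_D = \min\Big(\frac{2(k-1)}{3}, \frac{2+\beta_{D-1}}{3}\Big), \quad N_D = \max\big(\lceil (k-1)^{3/2} \rceil, (N_{D-1}+2)^3\big).$$
Since $k \ge 3$ and we were assuming $\beta_{D-1} > 1$, we also have $\beta_D > 1$. This completes the induction and thus the proof of the lemma.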
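As a numerical illustration of the superlinear star count behind Lemma 4 (an example of our own choosing, not from the proof), the sketch below evaluates $c(S_k,T)/|T|$ for the rooted tree of height 2 in which the root has $b$ children and each child has $b$ children, so that the diameter is 4 for every $b$. It counts stars by their centres, using that the number of $k$-vertex stars centred at $v$ is $\binom{d(v)}{k-1}$; the function names are hypothetical.

```python
from math import comb

def star_count(degrees, k):
    """c(S_k, T) as the sum of C(d(v), k-1) over all vertices (stars counted at their centre, k >= 3)."""
    return sum(comb(d, k - 1) for d in degrees)

def height2_degrees(b):
    """Degree sequence: root (degree b), b middle vertices (degree b + 1), b*b leaves."""
    return [b] + [b + 1] * b + [1] * (b * b)

def star_ratio(b, k):
    degrees = height2_degrees(b)
    return star_count(degrees, k) / len(degrees)

for b in (5, 10, 20, 40):
    print(b, round(star_ratio(b, 4), 2))
```

The ratio keeps growing with $b$ although the diameter stays fixed, which is the behaviour the Claim quantifies.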
Lemma 4 shows that (M2) always holds for sequences of trees with bounded diameter, even without the assumption (M1). On the other hand, if the diameter is sufficiently large, then it turns out that there must always be at least linearly many $k$-vertex paths. In fact, we have the following simple lemma:
Lemma 5. Let $k$ be a positive integer. If the diameter of a tree $T$ is at least $2k-2$, then $c(P_k, T) \ge |T|/2$.
Proof. Since the diameter is assumed to be at least $2k-2$ and the radius of a tree is at least half its diameter, the radius must be at least $k-1$. Therefore, for every vertex $v$ of $T$, there is a vertex at distance at least $k-1$ from $v$, and hence a $k$-vertex path in $T$ starting at $v$. Since every path has only two ends, no path is counted more than twice in this argument, so there must be at least $|T|/2$ $k$-vertex paths in $T$.
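Lemma 5 can be checked on random trees with a few breadth-first searches. The sketch below is our own (helper names are hypothetical): it uses that in a tree every unordered pair of vertices at distance $k-1$ spans exactly one $k$-vertex path, and computes the diameter with the standard two-sweep BFS, which is exact for trees.

```python
import random
from collections import deque

def distances_from(adj, source):
    """Breadth-first search distances from source in a tree given as an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def count_k_vertex_paths(adj, k):
    """c(P_k, T) for k >= 2: each unordered pair at distance k - 1 spans exactly one such path."""
    ordered = sum(
        sum(1 for d in distances_from(adj, v).values() if d == k - 1) for v in adj
    )
    return ordered // 2

def diameter(adj):
    """Two BFS sweeps: a vertex farthest from any start vertex is an end of a longest path."""
    dist = distances_from(adj, next(iter(adj)))
    far = max(dist, key=dist.get)
    return max(distances_from(adj, far).values())

def random_tree(n, rng):
    """Random recursive tree: vertex i is joined to a uniformly chosen earlier vertex."""
    adj = {0: []}
    for i in range(1, n):
        j = rng.randrange(i)
        adj[i] = [j]
        adj[j].append(i)
    return adj

rng = random.Random(7)
for _ in range(200):
    tree, k = random_tree(rng.randint(2, 40), rng), rng.randint(2, 5)
    if diameter(tree) >= 2 * k - 2:
        assert count_k_vertex_paths(tree, k) >= len(tree) / 2  # Lemma 5
```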

Corollary 6. For every integer k ≥ 4, the implication (M1) ⇒ (M2) holds.
Proof. Consider a sequence $T_1, T_2, \ldots$ of trees with $|T_n| \to \infty$ for which (M1) holds. For the subsequence consisting of trees whose diameter is at most $2k-3$, (M2) follows from Lemma 4, regardless of whether (M1) is true or not. For the remaining subsequence, Lemma 5 gives $p^{(k)}_1(T_n) = c(P_k, T_n)/Z_k(T_n) \ge |T_n|/(2 Z_k(T_n))$, so $Z_k(T_n)/|T_n| \ge 1/(2p^{(k)}_1(T_n))$, which tends to infinity by the assumption (M1).
As a next step, we show the equivalence of (M2) and (M3), which is quite straightforward: Lemma 7. For every integer k ≥ 3, the two statements (M2) and (M3) are equivalent.
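Proof. Condition (M2), combined with Lemma 2, implies that
$$\frac{1}{|T_n|} \sum_{v \in V(T_n)} d(v)^{k-1} \ge \frac{Z_k(T_n)}{(k-1)!\,|T_n|} \to \infty,$$
which is (M3). Conversely, since $\binom{d}{k-1} \ge \big(\frac{d}{k-1}\big)^{k-1}$ for $d \ge k-1$, Lemma 3 yields
$$Z_k(T) \ge c(S_k, T) = \sum_{v \in V(T)} \binom{d(v)}{k-1} \ge (k-1)^{-(k-1)} \sum_{v \in V(T)} d(v)^{k-1} - |T|,$$
where the final term stems from vertices whose degree is less than $k-1$. Therefore, if (M3) holds, then we also have
$$\frac{Z_k(T_n)}{|T_n|} \ge \frac{(k-1)^{-(k-1)}}{|T_n|} \sum_{v \in V(T_n)} d(v)^{k-1} - 1 \to \infty,$$
which is (M2).
Now we would like to bound the number of non-star $k$-vertex subtrees from above to obtain the implication (M2) ⇒ (M4). To this end, we first introduce the notion of edge weights: define the weight of an edge $e = vu$ as
$$w(e) = \frac{\max(d(u), d(v))}{\min(d(u), d(v))}.$$
In words: take the degrees of the two endpoints of $e$ and divide the higher degree by the lower degree. For a real number $a > 1$, call a subtree $S$ of a tree $T$ an $a$-unbalanced subtree if it contains at least one edge that is not a pendant edge (incident to a leaf) of $S$ and that has a weight of at least $a$ in $T$. Denote the total number of $a$-unbalanced $k$-vertex subtrees of $T$ by $Z_k(T,a)$. The following lemma is in some sense a refinement of Lemma 2.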
Lemma 8. For every integer $k \ge 4$, every real number $a > 1$, and every tree $T$, we have
$$Z_k(T, a) \le \frac{(k-1)!}{a} \sum_{v \in V(T)} d(v)^{k-1}.$$
Proof. We can follow the proof of Lemma 2. The only change in the argument is that at least one vertex of degree at most $d(v)/a$ has to be added to the subtree at some point so as to include an edge of weight at least $a$. Since we also require such an edge not to be a pendant edge of the subtree, at some stage a neighbor of a vertex of degree at most $d(v)/a$ has to be added to the subtree as well, for which there are at most $d(v)/a$ possibilities. This gives us the same inequality as in Lemma 2, but with an extra factor $a$ in the denominator.
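The edge weight and the notion of an $a$-unbalanced subtree translate directly into code. In this hypothetical sketch a subtree is given by its vertex set and vertices are integers; as in the definition, the degrees entering the weight are taken in $T$, while whether an edge is pendant is decided inside the subtree.

```python
def edge_weight(adj, u, v):
    """max(d(u), d(v)) / min(d(u), d(v)), with degrees taken in the whole tree T."""
    du, dv = len(adj[u]), len(adj[v])
    return max(du, dv) / min(du, dv)

def is_a_unbalanced(adj, subtree, a):
    """True if some non-pendant edge of the subtree has weight at least a in T."""
    inner = {u: sum(1 for w in adj[u] if w in subtree) for u in subtree}
    for u in subtree:
        for v in adj[u]:
            if v in subtree and u < v:  # visit each edge once (integer labels assumed)
                non_pendant = inner[u] >= 2 and inner[v] >= 2
                if non_pendant and edge_weight(adj, u, v) >= a:
                    return True
    return False

# Vertex 0 (degree 3) is joined to 1, 2 and 3; vertex 3 (degree 2) is joined to 4.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
print(edge_weight(adj, 0, 3))                   # 1.5
print(is_a_unbalanced(adj, {1, 0, 3, 4}, 1.5))  # True: 0-3 is a non-pendant edge of this path
print(is_a_unbalanced(adj, {0, 1, 2, 3}, 1.1))  # False: a star has only pendant edges
```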
It remains to bound the number of $k$-vertex subtrees that are neither stars nor $a$-unbalanced; we denote this number by $\overline{Z}_k(T,a)$. Our next lemma provides a suitable bound:
Lemma 9. For every integer $k \ge 4$, every real number $a > 1$, and every tree $T$, we have
$$\overline{Z}_k(T, a) \le 2(k-1)!\, a^{(k-2)^2} \sum_{v \in V(T)} d(v)^{k-2}.$$
Proof. Consider any edge $e$ whose weight is at most $a$. It is not difficult to see that there exists some nonnegative integer $\ell$ such that the degrees of both its endpoints lie in the interval $[a^\ell, a^{\ell+2})$: simply take $\ell$ in such a way that the smaller of the two degrees lies in $[a^\ell, a^{\ell+1})$. Now consider any subtree $S$ that is not $a$-unbalanced and contains $e$ as a non-pendant edge (it automatically follows that $S$ is not a star). Every internal vertex $v$ of $S$ can be reached from $e$ by a path of non-pendant edges whose length is at most $k-4$. Since $S$ was assumed not to be $a$-unbalanced, none of these edges can have a weight greater than $a$, so the degree of $v$ in $T$ is at most $a^{\ell+2} \cdot a^{k-4} = a^{\ell+k-2}$. Now we count all subtrees $S$ that are not $a$-unbalanced and contain $e$ as a non-pendant edge. Every such subtree can be obtained by repeatedly adding leaves, starting from $e$; this is done $k-2$ times. At the $j$-th step, we have a choice of $j+1$ vertices to attach a leaf to, and at most $a^{\ell+k-2}$ possible choices for the leaf by the observation on degrees of internal vertices in $S$. It follows that there are no more than
$$\prod_{j=1}^{k-2} (j+1)\, a^{\ell+k-2} = (k-1)!\, a^{(\ell+k-2)(k-2)}$$
such subtrees.
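The first step of this proof, placing both endpoint degrees of an edge of weight at most $a$ in a common window $[a^\ell, a^{\ell+2})$, can be checked mechanically. The sketch below is our own check, with hypothetical helper names; it uses exact rational arithmetic (`fractions.Fraction`) so that powers of $a$ are compared without rounding errors.

```python
from fractions import Fraction

def window_index(smaller_degree, a):
    """Largest integer l >= 0 with a**l <= smaller_degree (needs smaller_degree >= 1, a > 1)."""
    l = 0
    while a ** (l + 1) <= smaller_degree:
        l += 1
    return l

def endpoints_in_window(d1, d2, a):
    """Both degrees lie in [a**l, a**(l+2)), where l is chosen from the smaller degree."""
    lo, hi = min(d1, d2), max(d1, d2)
    l = window_index(lo, a)
    return a ** l <= lo < a ** (l + 1) and a ** l <= hi < a ** (l + 2)

for a in (Fraction(5, 4), Fraction(3, 2), Fraction(2)):
    for d1 in range(1, 80):
        for d2 in range(1, 80):
            if max(d1, d2) <= a * min(d1, d2):  # edge weight at most a
                assert endpoints_in_window(d1, d2, a)
```

The check works because the larger degree is at most $a$ times the smaller one, which is below $a^{\ell+1}$.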
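The number of edges whose ends both have degrees in $[a^\ell, a^{\ell+2})$ is less than the number of vertices whose degrees lie in this interval, since these edges induce a forest on the set of these vertices. Therefore, we obtain the following upper bound for the number $\overline{Z}_k(T,a)$ of $k$-vertex subtrees that are neither stars nor $a$-unbalanced (note that every non-star has at least one non-pendant edge):
$$\overline{Z}_k(T,a) \le \sum_{\ell \ge 0} (k-1)!\, a^{(\ell+k-2)(k-2)} \big|\{v : a^\ell \le d(v) < a^{\ell+2}\}\big| \le (k-1)!\, a^{(k-2)^2} \sum_{\ell \ge 0} \sum_{v:\, a^\ell \le d(v) < a^{\ell+2}} d(v)^{k-2} \le 2(k-1)!\, a^{(k-2)^2} \sum_{v \in V(T)} d(v)^{k-2}.$$
The last inequality holds since every vertex is counted at most twice in the double sum.
We combine Lemma 8 and Lemma 9 to find that the total number of $k$-vertex subtrees of a tree $T$ that are not stars can be bounded by
$$Z_k(T) - c(S_k, T) \le Z_k(T,a) + \overline{Z}_k(T,a) \le \frac{(k-1)!}{a} \sum_{v \in V(T)} d(v)^{k-1} + 2(k-1)!\, a^{(k-2)^2} \sum_{v \in V(T)} d(v)^{k-2}.$$
Hölder's inequality gives us $\sum_{v} d(v)^{k-2} \le |T|^{1/(k-1)} \big(\sum_{v} d(v)^{k-1}\big)^{(k-2)/(k-1)}$, so putting everything together (and using that, under (M2), $Z_k(T_n)$ and $\sum_{v} d(v)^{k-1}$ differ by at most a constant factor for large $n$, by Lemma 2 and the proof of Lemma 7), we obtain
$$1 - p^{(k)}_2(T_n) = O\Big(\frac{1}{a} + a^{(k-2)^2}\Big(\frac{|T_n|}{Z_k(T_n)}\Big)^{1/(k-1)}\Big).$$
The $O$-constant depends on $k$ and the specific sequence of trees, but notably not on $a$, which we can still choose freely. Taking
$$a = \Big(\frac{Z_k(T_n)}{|T_n|}\Big)^{1/((k-1)((k-2)^2+1))},$$
which is greater than 1 for sufficiently large $n$ in view of condition (M2), the two terms in the estimate balance, and we end up with
$$1 - p^{(k)}_2(T_n) = O\Big(\Big(\frac{|T_n|}{Z_k(T_n)}\Big)^{1/((k-1)((k-2)^2+1))}\Big) \to 0,$$
which is exactly (M4).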
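The Hölder step used in this argument is the inequality $\sum_v d(v)^{k-2} \le |T|^{1/(k-1)} \big(\sum_v d(v)^{k-1}\big)^{(k-2)/(k-1)}$, i.e. Hölder's inequality with exponents $(k-1)/(k-2)$ and $k-1$ applied to the product $d(v)^{k-2} \cdot 1$. A quick floating-point sanity check on random degree sequences, as a sketch with names of our own choosing:

```python
import random

def hoelder_sides(degrees, k):
    """Both sides of sum d^(k-2) <= n^(1/(k-1)) * (sum d^(k-1))^((k-2)/(k-1))."""
    n = len(degrees)
    lhs = sum(d ** (k - 2) for d in degrees)
    rhs = n ** (1 / (k - 1)) * sum(d ** (k - 1) for d in degrees) ** ((k - 2) / (k - 1))
    return lhs, rhs

rng = random.Random(3)
for _ in range(1000):
    k = rng.randint(4, 8)
    degrees = [rng.randint(1, 50) for _ in range(rng.randint(1, 30))]
    lhs, rhs = hoelder_sides(degrees, k)
    assert lhs <= rhs * (1 + 1e-9)  # small tolerance for floating-point rounding

# Equality holds when all degrees are equal (here both sides are 3430 up to rounding).
print(hoelder_sides([7] * 10, 5))
```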
Our ideas can also be used to re-prove a result of Bubeck and Linial [2, Theorem 2], even with a slightly improved constant: namely, they showed that $\liminf_{n\to\infty} \big(p^{(k)}_1(T_n) + p^{(k)}_2(T_n)\big)$ is bounded below by an explicit positive constant, expressed in terms of $N_k$, for any sequence $T_1, T_2, \ldots$ of trees with $|T_n| \to \infty$, where $N_k$ is the number of nonisomorphic trees with $k$ vertices.
Making use of the arguments used to prove Theorem 1, we obtain the following: Corollary 11. For every sequence T 1 , T 2 , . . . of trees with |T n | → ∞, we have Proof. Lemma 2 gives us Combining this inequality with (1) and Lemma 5 (we may assume that the diameter is not bounded in view of Lemma 4) yields Therefore, , and the desired result follows immediately.
With more careful estimates, it is certainly possible to improve further on the lower bound in Corollary 11.

Subtrees of different sizes
So far, we have only compared subtrees of the same fixed size $k$. However, it is natural to expect that $\lim_{n\to\infty} p^{(k)}_1(T_n) = 0$ for some $k$ (in words: the proportion of paths among $k$-vertex subtrees goes to 0) should also imply $\lim_{n\to\infty} p^{(\ell)}_2(T_n) = 1$ (the proportion of stars among $\ell$-vertex subtrees goes to 1) for values of $\ell$ that are not necessarily equal to $k$. Indeed, this is true if $k \le \ell$: since $d(v) \ge 1$ for all $v$, we trivially have
$$\sum_{v \in V(T)} d(v)^{\ell-1} \ge \sum_{v \in V(T)} d(v)^{k-1}$$
in this case, so condition (M3) is satisfied for $\ell$ if it is satisfied for $k$. Therefore, we immediately obtain a slight extension of Theorem 1:
Theorem 12. Let $T_1, T_2, \ldots$ be a sequence of trees such that $|T_n| \to \infty$ as $n \to \infty$. Let $k, \ell$ be integers such that $\ell \ge k \ge 4$, and assume that one (and hence each) of the equivalent statements (M1)$_k$ to (M4)$_k$ holds, where (M$i$)$_k$ denotes statement (M$i$) of Theorem 1 for the given value of $k$. Then the statements (M1)$_\ell$ to (M4)$_\ell$ hold as well; in particular, $\lim_{n\to\infty} p^{(\ell)}_1(T_n) = 0$ and $\lim_{n\to\infty} p^{(\ell)}_2(T_n) = 1$.
In heuristic terms: if most $k$-vertex subtrees are stars, then this is also the case for $\ell$-vertex subtrees, provided $\ell \ge k$. On the other hand, if only very few of the $k$-vertex subtrees are paths, then the same applies to $\ell$-vertex subtrees for every $\ell \ge k$. It is noteworthy, however, that the converse is not true, and counterexamples are very easy to construct.
Consider for instance a family of extended stars constructed as follows (Figure 2): $T_n$ has $n$ vertices, of which the central vertex has degree (approximately) $n^{2/(2k-1)}$ for some $k \ge 4$, while all other vertices have degree 1 or 2. The actual lengths of the paths around the central vertex are irrelevant. It is easy to see in this example that (M3)$_k$ is not satisfied, and that in fact $\lim_{n\to\infty} p^{(k)}_2(T_n) = 0$, while on the other hand (M3)$_{k+1}$ is satisfied, so that $\lim_{n\to\infty} p^{(k+1)}_2(T_n) = 1$.
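The degree moments in this example can be computed in closed form. In the following sketch we assume, for concreteness, that the centre degree is $D = \operatorname{round}(n^{2/(2k-1)})$, so that $T_n$ has one vertex of degree $D$, $D$ leg ends of degree 1 and $n-1-D$ vertices of degree 2. The normalised $(k-1)$-th moment then stays bounded, while the normalised $k$-th moment grows roughly like $n^{1/(2k-1)}$, matching the failure of (M3)$_k$ and the validity of (M3)$_{k+1}$. The function name is hypothetical.

```python
def normalized_moment(n, k, m):
    """(1/n) * sum of d(v)**m for an n-vertex extended star with centre degree round(n**(2/(2k-1)))."""
    D = max(1, round(n ** (2 / (2 * k - 1))))
    # centre of degree D, D leg ends of degree 1, and n - 1 - D vertices of degree 2
    total = D ** m + D + (n - 1 - D) * 2 ** m
    return total / n

k = 4
for n in (10 ** 7, 10 ** 14, 10 ** 21):
    print(n, round(normalized_moment(n, k, k - 1), 3), round(normalized_moment(n, k, k), 1))
```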