Asymptotic distribution of two-protected nodes in ternary search trees

We study protected nodes in $m$-ary search trees by putting them in the context of generalised P\'olya urns. We show that the number of two-protected nodes (the nodes that are neither leaves nor parents of leaves) in a random ternary search tree is asymptotically normal. The methods apply in principle to $m$-ary search trees with larger $m$ as well, although the size of the matrices used in the calculations grows rapidly with $m$; we conjecture that the method yields an asymptotically normal distribution for all $m\leq 26$. The one-protected nodes, and their complement, i.e., the leaves, are easier to analyze. By using a simpler P\'olya urn (similar to the one that has earlier been used to study the total number of nodes in $m$-ary search trees), we prove normal limit laws for the number of one-protected nodes and the number of leaves for all $m\leq 26$.


1 Introduction
There are many recent studies of so-called protected nodes in various classes of random trees, see e.g. [1,3,6,8,11,18,19]. A node is protected (more precisely, two-protected) if it is not a leaf and none of its children is a leaf.
In this paper we consider the number of protected nodes in $m$-ary search trees (see Section 1.1.2 for definitions), by putting them in the context of generalised Pólya urns. The following result is our main theorem. We let $\xrightarrow{d}$ denote convergence in distribution and denote a normal distribution by $N(\mu, \sigma^2)$.
For a binary search tree, we obtain by the same method a new proof of the following result, which has earlier been obtained by different methods, first by Mahmoud and Ward [18] (using generating functions), and later in [11] (using fringe trees).
Recall that necessary and sufficient conditions for $L^1$ convergence of a sequence $(X_n)$ of random variables are that $X_n \xrightarrow{p} X$ (where $\xrightarrow{p}$ denotes convergence in probability) and that the sequence $(X_n)$ is uniformly integrable. Note that Theorems 1.1 and 1.2, which concern the numbers $Z_n$ and $Y_n$ of protected nodes in a ternary and a binary search tree, respectively, imply $Z_n/n \xrightarrow{p} 57/700$ and $Y_n/n \xrightarrow{p} 11/30$, since a sequence of random variables that converges in distribution to a constant also converges in probability to that constant. Since $0 \le Z_n/n \le 1$ and $0 \le Y_n/n \le 1$, uniform integrability of the sequences $(Z_n/n)$ and $(Y_n/n)$ obviously holds. Hence, $(Z_n/n)$ and $(Y_n/n)$ converge in $L^1$ to $57/700$ and $11/30$, respectively; in particular, $E(Z_n)/n \to 57/700$ and $E(Y_n)/n \to 11/30$. We conjecture that also the variances (and higher moments) converge in Theorems 1.1 and 1.2.
The methods apply to larger m too, at least in principle, see Sections 1.1.3 and 5. Similarly, we may consider the one-protected nodes, i.e. the non-leaves. These are easier to analyze than the two-protected nodes and using a minor variation of a Pólya urn earlier used to study the total number of nodes [15,12,16], we prove in Sections 4 and 5.2 normal limit laws for the number of one-protected nodes and the number of leaves in an m-ary search tree for all m ≤ 26.
1.1 Protected nodes in m-ary search trees described as generalised Pólya urns

1.1.1 A generalised Pólya urn
A (generalised) Pólya urn process is defined as follows, see e.g. [12] or [16]. There are balls of $q$ types (or colours) $1, \dots, q$, and for each $n$ a random vector $X_n = (X_{n,1}, \dots, X_{n,q})$, where $X_{n,i}$ is the number of balls of type $i$ in the urn at time $n$. The urn starts with a given vector $X_0$. For each type $i$, there is an activity (or weight) $a_i \ge 0$ and a random vector $\xi_i = (\xi_{i1}, \dots, \xi_{iq})$. The urn evolves according to a Markov process. At each time $n \ge 1$, one ball is drawn at random from the urn, with the probability of any ball proportional to its activity. Thus, the drawn ball has type $i$ with probability $a_i X_{n-1,i} / \sum_j a_j X_{n-1,j}$. If the drawn ball has type $i$, it is replaced together with $\Delta X_{n,j}$ balls of type $j$, $j = 1, \dots, q$, where the random vector $\Delta X_n = (\Delta X_{n,1}, \dots, \Delta X_{n,q})$ has the same distribution as $\xi_i$ and is independent of everything else that has happened so far. (We allow $\Delta X_{n,i} = -1$, which means that the drawn ball is not replaced.) We let $A$ denote the $q \times q$ matrix $A = (a_j E\,\xi_{ji})_{i,j=1}^{q}$ (1.1). The matrix $A$ with its eigenvalues and eigenvectors is central for proving limit theorems. We say that a type $i$ is dominating if every other type $j$ may appear at some time in an urn started with a single ball of type $i$. The basic assumptions in [12] are the following.
(A3) The largest eigenvalue λ 1 of A is positive.
(A5) There exists a dominating type i with X 0i > 0, i.e., we start with at least one ball of a dominating type.
(A6) λ 1 is an eigenvalue of the submatrix of A given by the dominating types.
Furthermore, [12] says that the process becomes essentially extinct if at some time there are no balls of any dominating type left. We will also use the following simplifying assumption.
In the Pólya urns used in this paper, it is easily seen (from the definitions using trees) that every type with non-zero activity is dominating. If we remove rows and columns corresponding to the types with activity 0 from A, then the removed columns are identically 0, so the set of non-zero eigenvalues of A is not changed. The remaining matrix is irreducible, and using the Perron-Frobenius theorem, it is easy to verify all conditions (A1)-(A6), see [12,Lemma 2.1]. Furthermore, in our urns there will always be a ball of positive activity, so essential extinction is impossible.
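The drawing rule in the definition above can be illustrated by a small simulation. The following is only a minimal sketch, not code from [12]: the names `urn_step`, `toy_change` and `run` are hypothetical, and the two-type toy urn (activities 1 and 2, with replacement vectors chosen so that the total activity grows by 1 at every draw) is an assumption made purely for illustration.

```python
import random


def urn_step(counts, activities, sample_change, rng):
    """One draw: pick type i with probability proportional to a_i * X_i,
    then add a change vector distributed as xi_i."""
    weights = [a * x for a, x in zip(activities, counts)]
    if sum(weights) <= 0:
        raise ValueError("no ball of positive activity left in the urn")
    i = rng.choices(range(len(counts)), weights=weights)[0]
    change = sample_change(i, rng)
    return [x + d for x, d in zip(counts, change)]


# Hypothetical toy urn with q = 2 types and activities a = (1, 2).
# Drawing type 0 removes the drawn ball and adds one ball of type 1;
# drawing type 1 removes the drawn ball and adds three balls of type 0.
# In both cases the total activity a . X increases by exactly 1.
def toy_change(i, rng):
    return (-1, 1) if i == 0 else (3, -1)


def run(n_steps, seed=0):
    rng = random.Random(seed)
    counts = [1, 0]
    for _ in range(n_steps):
        counts = urn_step(counts, (1, 2), toy_change, rng)
    return counts
```

In this toy urn the replacement vectors happen to be deterministic; a random `sample_change` would use `rng` to choose among several vectors, as is needed for the urns describing search trees.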
Before stating the results that we use, we need some notation. By a vector $v$ we mean a column vector, and we write $v'$ for its transpose (a row vector). More generally, we denote the transpose of a matrix $A$ by $A'$. By an eigenvector of $A$ we mean a right eigenvector; a left eigenvector is the same as an eigenvector of the matrix $A'$. If $u$ and $v$ are vectors then $u'v$ is a scalar while $uv'$ is a $q \times q$ matrix. We also use the notation $u \cdot v$ for $u'v$. We let $\lambda_1$ denote the largest eigenvalue. Let $a = (a_1, \dots, a_q)'$ denote the (column) vector of activities, and let $u_1$ and $v_1$ denote left and right eigenvectors of $A$ corresponding to the largest eigenvalue $\lambda_1$, i.e., vectors satisfying $u_1' A = \lambda_1 u_1'$ and $A v_1 = \lambda_1 v_1$. We assume that $v_1$ and $u_1$ are normalized as in (1.2), and we write $P_I = I_q - P_{\lambda_1}$, where $I_q$ is the $q \times q$ identity matrix. (Thus $P_{\lambda_1}$ is a one-dimensional projection onto the eigenspace corresponding to $\lambda_1$, such that $P_{\lambda_1}$ commutes with the matrix $A$, see [12, equation (2.2)].) We define the matrices $B$ and $\Sigma_I$ as in (1.4) and (1.5), where we recall that $e^{tA} = \sum_{j=0}^{\infty} t^j A^j / j!$.
It is proved in [12] that, under assumptions (A1)-(A7), $X_n$ is asymptotically normal if $\mathrm{Re}\,\lambda \le \lambda_1/2$ for each eigenvalue $\lambda \ne \lambda_1$; more precisely, if $\mathrm{Re}\,\lambda < \lambda_1/2$ for each such $\lambda$, then $n^{-1/2}(X_n - n\mu) \xrightarrow{d} N(0, \Sigma)$ for some $\mu$ and $\Sigma$. The asymptotic covariance matrix $\Sigma$ may be calculated in different ways; we use the following results from [12], which apply under different additional assumptions.
Theorem 1.4. Assume (A1)-(A7) and that we have normalized as in (1.2). Also assume that $\mathrm{Re}\,\lambda < \lambda_1/2$ for each eigenvalue $\lambda \ne \lambda_1$. Suppose that $a \cdot E(\xi_i) = m$ for some $m > 0$ and every $i$. Then, as $n \to \infty$, $n^{-1/2}(X_n - n\mu) \xrightarrow{d} N(0, \Sigma)$, with $\mu = \lambda_1 v_1$ and covariance matrix $\Sigma$ equal to $m\Sigma_I$, with $\Sigma_I$ as in (1.5).
Theorem 1.5. Assume (A1)-(A7), and that we have normalized as in (1.2). Also assume that $\mathrm{Re}\,\lambda < \lambda_1/2$ for each eigenvalue $\lambda \ne \lambda_1$. Suppose further that the matrix $A$ is diagonalisable, and that $\{u_i\}_{i=1}^{q}$ and $\{v_i\}_{i=1}^{q}$ are dual bases of left and right eigenvectors, respectively, i.e., $u_i' A = \lambda_i u_i'$, $A v_i = \lambda_i v_i$ and $u_i \cdot v_j = \delta_{ij}$ (where $\delta_{ij}$ is the Kronecker delta, and the $\lambda_i$, $i = 1, \dots, q$, do not have to be distinct). Then, as $n \to \infty$, $n^{-1/2}(X_n - n\mu) \xrightarrow{d} N(0, \Sigma)$, with the covariance matrix $\Sigma$ given by (1.6), with the matrix $B$ as in (1.4).

1.1.2 m-ary search trees
We recall the definition of $m$-ary search trees, see e.g. [14] or [7]. An $m$-ary search tree, where $m \ge 2$, is constructed recursively from a sequence of $n$ keys (numbers). We assume that the keys are i.i.d. uniform random numbers in $[0, 1]$. (Only the order of the keys matters, so alternatively, we may assume that the keys form a uniformly random permutation of $\{1, \dots, n\}$.) Each node may contain up to $m - 1$ keys. We start with a tree containing just an empty root. The first $m - 1$ keys are put in the root, and are placed in increasing order from left to right; they divide the set of real numbers into $m$ intervals $J_1, \dots, J_m$. When the root is full (after the first $m - 1$ keys are added), it gets $m$ children that are initially empty, and each further key is passed to one of the children depending on which interval it belongs to; a key in $J_i$ is passed to the $i$th child. (The binary search tree is the simplest case, where a key is passed to the left or right child depending on whether it is smaller or larger than the key in the root.) The procedure repeats recursively in the subtrees until all keys are added to the tree. Nodes that contain at least one key are called internal, while empty nodes are called external. We regard the $m$-ary search tree as consisting only of the internal nodes; the external nodes are places for potential additions, and are useful when discussing the tree (e.g. below), but are not really part of the tree. Thus, a leaf is an internal node that has no internal children, but it may have external children. (It will have external children if it is full, but not otherwise.) Similarly, a protected node is an internal node that is not a leaf, and has no child that is a leaf. (It may have external nodes as children.) We say that a node with $i \le m - 2$ keys has $i + 1$ gaps, while a full node has no gaps.
It is easily seen that an $m$-ary search tree with $n$ keys has $n + 1$ gaps; the gaps correspond to the intervals of real numbers between the keys (and $\pm\infty$), and a new key has the same probability $1/(n+1)$ of belonging to any of the gaps. Thus the evolution of the $m$-ary search tree may be described by choosing a gap uniformly at random at each step. Equivalently, the probability that the next key is added to a node is proportional to the number of gaps at that node.
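The construction above, and the claim that a tree with $n$ keys has $n + 1$ gaps, can be checked on small examples. The following is a minimal sketch under stated assumptions: the class `Node` and the helpers `insert`, `build`, `random_tree`, `gaps`, `is_leaf` and `count_protected` are hypothetical names introduced here, and keys are assumed to be distinct.

```python
import random
from bisect import bisect_left, insort


class Node:
    def __init__(self, m):
        self.m = m
        self.keys = []        # sorted keys, at most m - 1 of them
        self.children = None  # list of m nodes once the node is full


def insert(root, key):
    node = root
    while len(node.keys) == node.m - 1:   # full node: pass the key down
        node = node.children[bisect_left(node.keys, key)]
    insort(node.keys, key)
    if len(node.keys) == node.m - 1:      # the node just became full
        node.children = [Node(node.m) for _ in range(node.m)]


def build(m, keys):
    root = Node(m)
    for key in keys:
        insert(root, key)
    return root


def random_tree(m, n, seed=0):
    rng = random.Random(seed)
    return build(m, rng.sample(range(10 * n), n))  # n distinct keys


def gaps(node):
    """A node with i <= m - 2 keys has i + 1 gaps; a full node has none."""
    if node.children is None:
        return len(node.keys) + 1
    return sum(gaps(child) for child in node.children)


def is_leaf(node):
    """Internal node (at least one key) without internal children."""
    if not node.keys:
        return False
    return node.children is None or not any(c.keys for c in node.children)


def count_protected(node):
    """Number of internal non-leaves with no child that is a leaf."""
    if not node.keys or is_leaf(node):
        return 0
    here = 0 if any(is_leaf(c) for c in node.children) else 1
    return here + sum(count_protected(c) for c in node.children)
```

For instance, inserting the keys 3, 2, 1 into a binary search tree gives a path; only the root is protected there, since the middle node is the parent of a leaf.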
Pólya urns have been used in some earlier studies, e.g. [15,12], to describe the number of nodes in $m$-ary search trees containing $i$ keys, where $0 \le i \le m - 1$; a node containing $i$ keys is then called a node of type $i$, and thus the generalised Pólya urn has $m$ different types. It has been shown that for this process, when $m \le 26$ the vector of the numbers of nodes of the different types has an asymptotic multivariate normal distribution, but this does not hold for larger $m$ (since the condition $\mathrm{Re}\,\lambda < \lambda_1/2$ for $\lambda \ne \lambda_1$ on the eigenvalues of the matrix $A$ in (1.1) holds only if $m \le 26$). Since the number of nodes in the whole tree is a linear combination of these numbers, this implies in particular that the distribution of the random number of nodes in an $m$-ary search tree containing $n$ keys is asymptotically normal for $m \le 26$. In this Pólya urn, with one ball representing each node, the activity of a ball is the number of gaps, i.e., $i + 1$ for a ball of type $i \le m - 2$, and 0 for a ball of type $m - 1$.
Alternatively, see [12], we can use a Pólya urn where each ball represents a gap; thus a node with i keys corresponds to i + 1 balls for 0 ≤ i ≤ m − 2, and these balls are all given type i. (Full nodes are ignored.) This is thus an urn with m − 1 types, all with activities 1.

1.1.3 Protected nodes and generalised Pólya urns
We will see that it is possible to use a generalised Pólya urn also to study protected nodes in an m-ary search tree, although the urn consists of quite a few different types.
Description of the types in the Pólya urn. Given an $m$-ary search tree $T$ with $n$ keys together with its external nodes, erase all edges that connect two internal non-leaves. This yields a forest of small trees, where (assuming $n \ge m$) each tree has a root that is a non-leaf in $T$ while all other nodes are leaves or external nodes in $T$. We regard these small trees as the balls in our generalised Pólya urn. The type of a ball (tree) is the type of the tree as an unordered tree, i.e., up to permutations of the children. The type of a tree in the urn is thus described by the numbers $k_i$, $i = 0, \dots, m - 1$, of children of the root with $i$ keys; each of these children is an external node ($i = 0$) or a leaf ($i \ge 1$), and it has itself children only when $i = m - 1$, in which case it has $m$ external children; thus the type is uniquely determined by $k_0, \dots, k_{m-1}$, and we can label the type by $(k_0, \dots, k_{m-1})$. Since the root of any of the small trees has $m$ children (including external ones) in the original tree $T$, we have $\sum_{i=0}^{m-1} k_i \le m$ (with the remainder $m - \sum_{i=0}^{m-1} k_i$ equal to the number of erased edges to children in the original tree $T$ that are non-leaves). Furthermore, the case $k_0 = m$ is excluded, since the root of the small tree is a non-leaf in $T$. The total number of types is thus one less than the number of compositions of $m$ into $m + 1$ non-negative parts, i.e., $\binom{2m}{m} - 1$. The activity in the Pólya urn of one of these types is the number of gaps that it contains. The root has no gaps, so a tree with type $(k_0, \dots, k_{m-1})$ has activity $\sum_{i=0}^{m-1} (i + 1) k_i$.
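The labelling of the types and their activities can be enumerated directly for small $m$. The sketch below is only an illustration; the function names `urn_types` and `activity` are hypothetical, and the brute-force enumeration over all tuples is an assumption that is practical only for small $m$.

```python
from itertools import product
from math import comb


def urn_types(m):
    """All labels (k_0, ..., k_{m-1}) with k_0 + ... + k_{m-1} <= m,
    excluding the case k_0 = m (the root would then be a leaf)."""
    return [k for k in product(range(m + 1), repeat=m)
            if sum(k) <= m and k[0] != m]


def activity(k):
    """Number of gaps of a small tree: a child with i keys carries i + 1."""
    return sum((i + 1) * ki for i, ki in enumerate(k))


# Number of types for the binary, ternary and quaternary cases.
type_counts = {m: len(urn_types(m)) for m in (2, 3, 4)}
```

The counts agree with $\binom{2m}{m} - 1$, i.e. 5, 19 and 69 types for $m = 2, 3, 4$, and for example `activity((0, 1, 2))` gives the activity 8 of the ternary type $(0, 1, 2)$.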

Figure 1: The different types (Type 1 to Type 5) characterizing protected and unprotected nodes in binary search trees. Type 4 and type 5 are the only ones that include protected nodes.
Moreover, if we add a new key to a leaf, it is still a leaf, so in the Pólya urn, this corresponds to replacing a tree by another tree where we have increased by 1 the number of keys of one of the children of the root. The same holds if we add a key to an external node that is a child of the root. However, if we add a key to an external node that is a child of a leaf, then that leaf becomes a non-leaf, so the edge from it to the root is erased and the tree is split into two (one of which always has the type $(m - 1, 1, 0, \dots, 0)$). See Section 2 for examples. Note that in general, a small tree may be transformed in several different ways when we add a new key, depending on which gap it goes into. Hence, the additions $\xi_i$ in the Pólya urn will be random. A protected node in $T$ is a non-leaf, and is therefore a root in one of the small trees. Moreover, it must not have any child that is a leaf, so all its children in the small tree are external nodes. Thus, the number of protected nodes in $T$ equals the number of balls in the urn that have a type of the form $(k_0, 0, \dots, 0)$, i.e., with $k_1 = \dots = k_{m-1} = 0$.
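The replacement rules just described determine the matrix $A$ in (1.1) for any $m$, and hence the asymptotic proportion of protected nodes. The following is a minimal sketch under stated assumptions: all function names are hypothetical; the types are ordered as produced by the enumeration, not as in the figures; and the eigenvector $v_1$ is assumed to be normalized by $a \cdot v_1 = 1$ with $\lambda_1 = 1$, so that $\mu = v_1$. Under these assumptions the sketch reproduces the limits $11/30$ and $57/700$ quoted in the Introduction.

```python
from fractions import Fraction
from itertools import product


def urn_types(m):
    """Labels (k_0, ..., k_{m-1}) with sum <= m, excluding k_0 = m."""
    return [k for k in product(range(m + 1), repeat=m)
            if sum(k) <= m and k[0] != m]


def activity(k):
    return sum((i + 1) * ki for i, ki in enumerate(k))


def transitions(k, m):
    """List of (weight, new_trees): weight is the number of gaps for which
    the drawn tree of type k is replaced by the trees in new_trees."""
    result = []
    for i, ki in enumerate(k):
        if ki == 0:
            continue
        rest = list(k)
        rest[i] -= 1
        if i < m - 1:
            rest[i + 1] += 1          # a child with i keys receives a key
            result.append(((i + 1) * ki, [tuple(rest)]))
        else:
            born = [0] * m            # a full leaf becomes a non-leaf and
            born[0] = m - 1           # roots a new tree (m-1, 1, 0, ..., 0)
            born[1] += 1
            result.append((m * ki, [tuple(rest), tuple(born)]))
    return result


def intensity_matrix(m):
    """Types and the matrix A, whose column j equals a_j E(xi_j)."""
    types = urn_types(m)
    index = {k: j for j, k in enumerate(types)}
    A = [[Fraction(0)] * len(types) for _ in types]
    for j, k in enumerate(types):
        for weight, new_trees in transitions(k, m):
            A[j][j] -= weight
            for t in new_trees:
                A[index[t]][j] += weight
    return types, A


def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    rows = [[Fraction(x) for x in M[i]] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        pivot = rows[c][c]
        rows[c] = [x / pivot for x in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                f = rows[r][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [rows[i][n] for i in range(n)]


def protected_fraction(m):
    """Sum of the entries of v_1 (A v_1 = v_1, a . v_1 = 1) over the types
    with k_1 = ... = k_{m-1} = 0."""
    types, A = intensity_matrix(m)
    q = len(types)
    a = [activity(k) for k in types]
    M = [[A[i][j] - (1 if i == j else 0) for j in range(q)] for i in range(q)]
    r = max(range(q), key=lambda i: a[i])  # a redundant row with a_r > 0
    M[r] = a                               # replace it by a . v = 1
    b = [0] * q
    b[r] = 1
    v = solve(M, b)
    return sum(x for k, x in zip(types, v) if sum(k[1:]) == 0)
```

Exact rational arithmetic is used so that the limits come out as fractions; the row that is replaced by the normalization must belong to a type with positive activity, since the rows of $A - I$ are linearly dependent with coefficients given by the activities.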

2 Protected nodes in binary search trees and Pólya urns
In this section we demonstrate the technique of using the Pólya urn defined above to study the number of protected nodes, by applying it to the simplest case m = 2, the binary search tree. This gives us a new proof of Theorem 1.2; for earlier proofs, see [18] and [11].
For a binary tree, the number of types in the Pólya urn defined above is $\binom{4}{2} - 1 = 5$. We show the different types in Figure 1, with a numbering that will be used below. (For convenience we omit the external nodes in the figures. We use dotted lines for edges attached to external nodes.) With our characterization of the types in Section 1.1.3, the types $i \in \{1, \dots, 5\}$ correspond to $(0, 2)$, $(1, 1)$, $(0, 1)$, $(1, 0)$ and $(0, 0)$, respectively.
In a binary search tree, each leaf contains one key, so it has two external children, whereas other internal nodes have either 1 or 0 external children. There is one gap at each external node, and no gaps at any internal node. As explained in Section 1.1.2, each gap (i.e. external node) has activity 1.
When a ball is drawn from the urn (i.e., a new key is added to the tree), as explained in general in Section 1.1.3, a key is either added to an external node that is a child of the root (we return a ball of another type), or to an external node that is a child of a leaf (we return two balls). The accompanying figures show the transitions in the Pólya urn when a ball of type $i$, $i \in \{1, 2, 3, 4, 5\}$, is drawn (where the types are shown in Figure 1), so that the drawn ball is replaced by a new set of balls. (As said above, this set could depend on which of the nodes in the drawn type the key is added to, see Figure 3.) The activities of the different types are given by their numbers of gaps; the activities of the types 1, 2, 3, 4, 5 are 4, 3, 2, 1, 0, respectively; thus $a = (4, 3, 2, 1, 0)'$.
To do the matrix operations in this paper we use Mathematica, but one could alternatively use e.g. Maple.
The eigenvalues of $A$ are $1, 0, -2, -3, -4$, and corresponding right eigenvectors $v_1, \dots, v_5$ and left eigenvectors $u_1, \dots, u_5$ of $A$ can be computed explicitly. Since the eigenvalues of the matrix $A$ are distinct, it follows automatically that $u_i \cdot v_j = 0$ for $i \ne j$. Note that we have scaled the eigenvectors so that $u_i \cdot v_i = 1$ and (1.2) hold. Note also that $u_1$ is equal to the activity vector $a$. This is a consequence of the fact that the total activity always increases by 1 when we draw a ball from the urn, and thus $a \cdot E\,\xi_i = 1$ for each $i$, see [12, Lemma 5.4].
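The eigenvalues and the left eigenvector $u_1 = a$ can be verified with exact arithmetic. In the sketch below the $5 \times 5$ matrix is an assumption: it is reconstructed here from the transitions described in this section (column $j$ equal to $a_j E\,\xi_j$, with the types numbered as in Figure 1) rather than copied from a displayed matrix, and the helper names `det` and `char_poly_at` are hypothetical.

```python
from fractions import Fraction

# Assumed reconstruction of A for the binary search tree; column j is
# a_j * E(xi_j) for the types j = 1, ..., 5 of Figure 1.
A = [
    [-4,  1,  0,  0, 0],
    [ 4, -1,  2,  0, 0],
    [ 4,  0, -2,  1, 0],
    [ 0,  2,  0, -1, 0],
    [ 0,  0,  2,  0, 0],
]
a = [4, 3, 2, 1, 0]  # activity vector


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d


def char_poly_at(M, lam):
    """Value of det(M - lam * I)."""
    n = len(M)
    return det([[M[i][j] - (lam if i == j else 0) for j in range(n)]
                for i in range(n)])


# Integer roots of the characteristic polynomial in a small search window.
eigenvalues = [lam for lam in range(-10, 11) if char_poly_at(A, lam) == 0]

# Row vector a'A; it equals a' when a is a left eigenvector for eigenvalue 1.
a_times_A = [sum(a[i] * A[i][j] for i in range(5)) for j in range(5)]
```

Since five distinct integer roots are found for a polynomial of degree five, the search window contains the whole spectrum.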
It is easy to see that we can apply Theorem 1.5 for this generalised Pólya urn. Note that the matrix $A$ is diagonalisable since all eigenvalues are simple. From Theorem 1.5 we obtain that $X_n = (X_{n1}, X_{n2}, X_{n3}, X_{n4}, X_{n5})$, where $X_{ni}$ is the number of balls of type $i$ (in our case the number of trees that correspond to type $i$ in our forest), has asymptotically a multivariate normal distribution. Let $Y_n$ be the number of protected nodes in the binary search tree with $n$ nodes. Since type 4 and type 5 each contain exactly one protected node, while the other types contain no protected nodes, $Y_n = X_{n4} + X_{n5}$. Thus, Theorem 1.5 implies that $n^{-1/2}(Y_n - n\mu_Y) \xrightarrow{d} N(0, \sigma_Y^2)$, with parameters $\mu_Y = \mu_4 + \mu_5$ and $\sigma_Y^2 = \sigma_{4,4} + 2\sigma_{4,5} + \sigma_{5,5}$, where $\Sigma = (\sigma_{i,j})_{i,j=1}^{5}$.
To calculate the matrix $B$ in (1.4) we need the matrices $B_i$, $i = 1, \dots, 5$. In all cases except for $B_2$ the vectors $\xi_i$ are deterministic and $B_i$ equals $\xi_i \xi_i'$. We only show how to obtain $B_2$ (since the other cases are simpler). As shown in Figure 3, when adding a key to type 2 we can either add it to the leaf or to the external node. In case we add it to the external node (which happens with probability $1/3$) a node of type 2 is replaced by a node of type 1; this change corresponds to the column vector $(1, -1, 0, 0, 0)'$. If the key is instead added to the leaf (which happens with probability $2/3$) a node of type 2 is replaced by another node of type 2 (the change of type 2 is 0) and a node of type 4; this change corresponds to the column vector $(0, 0, 0, 1, 0)'$. Hence $B_2 = \frac{1}{3}(1, -1, 0, 0, 0)'(1, -1, 0, 0, 0) + \frac{2}{3}(0, 0, 0, 1, 0)'(0, 0, 0, 1, 0)$. By calculating the $B_i$'s we obtain the matrix $B$ in (1.4). From (1.6) in Theorem 1.5 we then obtain the covariance matrix $\Sigma$ for the asymptotic multivariate normal distribution of $X_n = (X_{n1}, X_{n2}, X_{n3}, X_{n4}, X_{n5})$, see (2.9), and thus the value of $\sigma_Y^2$.
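The computation of $B_2$ amounts to averaging outer products of the two possible change vectors. The sketch below assumes that $B_2$ denotes $E(\xi_2 \xi_2')$, which is how the description above is read here; the helper names `outer` and `expected_outer` are hypothetical.

```python
from fractions import Fraction


def outer(x):
    """Outer product x x' of a vector with itself, as a nested list."""
    return [[xi * xj for xj in x] for xi in x]


def expected_outer(outcomes):
    """E(xi xi') for a list of (probability, change vector) pairs."""
    q = len(outcomes[0][1])
    B = [[Fraction(0)] * q for _ in range(q)]
    for prob, change in outcomes:
        O = outer(change)
        for i in range(q):
            for j in range(q):
                B[i][j] += prob * O[i][j]
    return B


# The two ways of adding a key to a tree of type 2 in the binary case:
# to the external node (probability 1/3) or below the leaf (probability 2/3).
xi_2 = [
    (Fraction(1, 3), (1, -1, 0, 0, 0)),
    (Fraction(2, 3), (0, 0, 0, 1, 0)),
]
B_2 = expected_outer(xi_2)
```

The only non-zero entries of `B_2` are in the $2 \times 2$ block for types 1 and 2 and in the diagonal entry for type 4.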

3 Protected nodes in ternary search trees and Pólya urns
We now proceed by analyzing the number of protected nodes in ternary search trees, by using the Pólya urn in Section 1.1.3 (described for general $m$-ary search trees) with $m = 3$. The 19 different types we get are shown in Figure 6 (with a numbering that will be used below). From our characterization of the types in Section 1.1.3, for example type 2 corresponds to $(0, 1, 2)$. Note that type 17, type 18 and type 19 contain one protected node each, while the other types contain no protected nodes.
To determine the matrix $A$ we proceed (as for the binary search tree) to find the transitions when a ball (in our case one of the 19 trees in our forest) of type $i$ is chosen. Figure 7 illustrates the different situations for how a new key could be added to a ball (a tree) of type 2. All the other cases are similar, and we leave these cases as an exercise to the reader. From the different transitions for changing a node of type $i$ we get the matrix $A$ for ternary search trees in Figure 8. The example in Figure 7 gives the second column of $A$. The tree of type 2 has activity 8. If it is drawn, and the new key is added to the node with only one key, which happens with probability $2/8$, then a tree of type 2 is replaced with a tree of type 1. If the new key is instead added to one of the nodes containing two keys, which happens with probability $6/8$, then the tree of type 2 is replaced by a tree of type 8 and one tree of type 13. Thus, the second column of the matrix $A$ for the ternary search tree has the entries $2$, $-8$, $6$ and $6$ in rows 1, 2, 8 and 13, respectively, and $0$ in all other rows.
Figure 7: The two possibilities for adding a key to a node in a tree of type 2 of a ternary search tree.
The eigenspace belonging to the eigenvalue $-4$ (which has algebraic multiplicity 4) has dimension 3. Since the dimension of the eigenspace belonging to the eigenvalue $-4$ is not equal to the algebraic multiplicity, the matrix $A$ is not diagonalisable. (However, all other eigenspaces have full dimension.) Hence, we cannot apply Theorem 1.5. However, Theorem 1.4 can be applied since $a \cdot E(\xi_i) = 1$ for each $i$ (this follows since we always add exactly one key when a tree of type $i$ is chosen). From Theorem 1.4 we obtain that the vector $X_n = (X_{n1}, \dots, X_{n19})$, where $X_{ni}$ is the number of balls of type $i$ (in our case the number of trees that correspond to type $i$ in our forest obtained from the ternary search tree), has asymptotically a multivariate normal distribution. Let $Z_n$ be the number of protected nodes in the ternary search tree with $n$ nodes. Since type 17, type 18 and type 19 each contain exactly one protected node, while the other types contain no protected nodes, $Z_n = X_{n,17} + X_{n,18} + X_{n,19}$. Thus, Theorem 1.4 implies that $n^{-1/2}(Z_n - n\mu_Z) \xrightarrow{d} N(0, \sigma_Z^2)$, where $\mu_Z = \mu_{17} + \mu_{18} + \mu_{19}$ and, writing $\Sigma = (\sigma_{i,j})_{i,j=1}^{19}$, $\sigma_Z^2 = \sum_{i,j=17}^{19} \sigma_{i,j}$ (3.6). Using the normalization in (1.2), we obtain the vector $\mu = \lambda_1 v_1$ in (3.4), which gives $\mu_Z = 57/700$.
We only describe how to get $B_2$ since the other cases are analogous. From Figure 7 (and the explanation of that figure above) it is easy to see that $\xi_2$ takes the value $(1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)'$ with probability $2/8$ and the value $(0, -1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)'$ with probability $6/8$, which determines $B_2$. Summing the $\sigma_{i,j}$ in (3.6), which is equivalent to calculating $(1, 1, 1)\Sigma_p(1, 1, 1)'$ with $\Sigma_p = (\sigma_{i,j})_{i,j=17}^{19}$, we find the value of $\sigma_Z^2$, which completes the proof of Theorem 1.1.
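The second column of $A$ discussed above can be recomputed from the replacement rules of Section 1.1.3. The following is a minimal sketch: the function names `transitions` and `column` are hypothetical, and the column is returned as a dictionary indexed by type labels $(k_0, k_1, k_2)$ instead of the numbering of Figure 6 (so type 1 is $(0, 0, 3)$, type 2 is $(0, 1, 2)$, and types 8 and 13 are taken to be $(0, 1, 1)$ and $(2, 1, 0)$, as the description of Figure 7 suggests).

```python
def transitions(k, m):
    """List of (weight, new_trees): weight is the number of gaps for which
    the drawn tree of type k is replaced by the trees in new_trees."""
    result = []
    for i, ki in enumerate(k):
        if ki == 0:
            continue
        rest = list(k)
        rest[i] -= 1
        if i < m - 1:
            rest[i + 1] += 1          # a child with i keys receives a key
            result.append(((i + 1) * ki, [tuple(rest)]))
        else:
            born = [0] * m            # new tree (m-1, 1, 0, ..., 0) rooted
            born[0] = m - 1           # at the former full leaf
            born[1] += 1
            result.append((m * ki, [tuple(rest), tuple(born)]))
    return result


def column(k, m):
    """Column a_k * E(xi_k) of A, as a dict from type labels to entries."""
    col = {}
    for weight, new_trees in transitions(k, m):
        col[k] = col.get(k, 0) - weight
        for t in new_trees:
            col[t] = col.get(t, 0) + weight
    return col


second_column = column((0, 1, 2), 3)
```

The result has the entry $-8$ for the drawn type itself, $2$ for $(0, 0, 3)$, and $6$ for each of $(0, 1, 1)$ and $(2, 1, 0)$, in agreement with the probabilities $2/8$ and $6/8$ and the activity 8.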

4 Leaves in ternary search trees
Recall that a leaf is an internal node without internal children, i.e., a node that contains at least one key and has no children except possibly external ones. The proof of Theorem 1.1 yields also the following theorem. (The corresponding result for a binary search tree was considered already by Devroye [5] using two different methods, one of them a Pólya urn as here.)
First proof. Let $L_n$ denote the number of leaves. Counting the number of leaves (of the original ternary search tree) in each type in Figure 6, we see that the number of leaves in a subtree of type $i$, $i = 1, \dots, 19$, is given by the vector $\ell = (3, 3, 3, 2, 3, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 0, 0, 0)'$ (4.1). Hence, $L_n = \ell \cdot X_n$. By the proof of Theorem 1.1, the vector $X_n$ has asymptotically a multivariate normal distribution, and it follows that $n^{-1/2}(L_n - n\mu_L) \xrightarrow{d} N(0, \sigma_L^2)$ (4.2) with, using (3.4) and (4.1), $\mu_L = \ell \cdot \mu$ (4.3), and, using the covariance matrix $\Sigma$ shown in the appendix, $\sigma_L^2 = \ell' \Sigma \ell$.
Figure 9: An external node which is not a child of a leaf.
Figure 10: A leaf containing one key.
Figure 11: A leaf containing two keys and its three external children.
However, it is also possible to show Theorem 4.1 using a much simpler Pólya urn process, where we only need to consider four different types. We again chop up the ternary search tree into small subtrees, now using the following types of subtrees.
Type 1 is an external node which is not a child of a leaf. Type 2 is a node containing one key. Type 3 is a leaf containing two keys together with its three external children. Type 4 is an internal node containing two keys which is not a leaf (i.e., it has fewer than three external children). The types are shown in Figure 13. Note that all nodes in the ternary search tree belong to exactly one such subtree.
A ball of type 1 has activity 1; when it is drawn it is replaced by one ball of type 2. A ball of type 2 has activity 2; when it is drawn it is replaced by one ball of type 3. A ball of type 3 has activity 3; when it is drawn it is replaced by one ball of type 2, two balls of type 1 and one ball of type 4. A ball of type 4 has activity 0 and is thus never drawn. The types that contain leaves are type 2 and type 3.
To simplify we can study another urn using the gaps as balls. Type 1 has one gap, type 2 has two gaps, type 3 has three gaps and type 4 has no gaps. We label each gap with the type it belongs to; thus the gaps have only the three types 1-3. The gaps evolve as an urn with three types, with all activities 1 and the matrix $A$ in (1.1) given by $A = \begin{pmatrix} -1 & 0 & 2 \\ 2 & -2 & 2 \\ 0 & 3 & -3 \end{pmatrix}$. Since we consider the gaps (with activity 1) it is obvious that all columns add to 1 (since we always add one ball to the urn). The eigenvalues of $A$ are $1, -3, -4$. Theorem 1.5 shows that $(X_{n1}, X_{n2}, X_{n3})$ has asymptotically a multivariate normal distribution, where $X_{ni}$ is the number of balls of type $i$ in the Pólya urn, i.e., the number of gaps of type $i$. Note that the number of subtrees of types 1-3 thus is $(X_{n1}, X_{n2}/2, X_{n3}/3)$, which thus also is asymptotically multivariate normal. Since the number of leaves is $L_n = X_{n2}/2 + X_{n3}/3$, it follows that $L_n$ has asymptotically a normal distribution (4.2).
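The three-type gap urn is small enough to check by hand, and the check can be written down as follows. This is a sketch under stated assumptions: the matrix is the one that follows from the replacement rules for types 1-3 described above, the eigenvector $v_1$ is assumed to be normalized by $a \cdot v_1 = 1$ with $a = (1, 1, 1)'$ and $\lambda_1 = 1$, and the names `det3`, `cross`, `v1` and `mean_leaves` are hypothetical. The resulting mean is a value computed by this sketch under these assumptions.

```python
from fractions import Fraction

# Column j is the expected change in the numbers of gaps of types 1-3
# when a gap of type j is drawn (all activities are 1).
A = [[-1,  0,  2],
     [ 2, -2,  2],
     [ 0,  3, -3]]

column_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def shifted(M, lam):
    """The matrix M - lam * I."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(3)]
            for i in range(3)]


eigenvalues = [lam for lam in range(-10, 11) if det3(shifted(A, lam)) == 0]


def cross(x, y):
    """Cross product; orthogonal to both x and y."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])


# A vector orthogonal to two independent rows of A - I spans its kernel.
M = shifted(A, 1)
w = cross(M[0], M[1])
v1 = [Fraction(x, sum(w)) for x in w]   # normalized so that a . v1 = 1

# Leaves per gap: type 2 has two gaps per leaf, type 3 has three.
mean_leaves = v1[1] / 2 + v1[2] / 3     # evaluates to 3/10 here
```

Together with the asymptotic proportion of non-leaves this is consistent with a total of about $3n/5$ internal nodes in a ternary search tree.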
To find the parameters $\mu_L$ and $\sigma_L^2$, we compute right eigenvectors $v_1, v_2, v_3$ and left eigenvectors $u_1, u_2, u_3$ of $A$ corresponding to the eigenvalues $1, -3, -4$, scaled so that $u_i \cdot v_j = \delta_{ij}$ and (1.2) holds. We have $a = (1, 1, 1)'$. Since type 2 has two gaps and one leaf and type 3 has three gaps and one leaf, it follows that $\mu_L = \mu_2/2 + \mu_3/3$, where $\mu = (\mu_1, \mu_2, \mu_3)' = \lambda_1 v_1$, corresponding to (4.3). By calculating $B$, we get the covariance matrix $\Sigma$ from Theorem 1.5.
The Pólya urn defined in Section 1.1.3 can be used for any given $m$, although the size of the matrices used in the calculations grows rapidly with $m$. (For $m = 4$ we have 69 types; for $m = 10$ we would have 184755.) However, the central condition $\mathrm{Re}\,\lambda < \lambda_1/2$ is not satisfied for large $m$. We do not know any general formula for the eigenvalues of the matrix $A$, but some of them are given as follows.
Lemma 5.1. Every root of the polynomial $\phi_m(\lambda) := \prod_{i=1}^{m-1} (\lambda + i) - m!$ (5.1) is an eigenvalue of the matrix $A$ for the Pólya urn in Section 1.1.3.
Proof. Let $V_{i,n}$ be the number of nodes containing exactly $i$ keys (thus $V_{0,n}$ is the number of external nodes), and consider the vector $W_n = (W_{1,n}, \dots, W_{m-1,n})$ where $W_{i,n} = iV_{i-1,n}$; thus $W_{i,n}$ is the total number of gaps at nodes with $i$ gaps. The random vector $W_n$ can also be described by a Pólya urn, see e.g. [12]; we denote the activity vector and the matrix (1.1) for this urn by $a_W = (1, \dots, 1)'$ and $A_W$. This means that the expected changes of the two vectors when a new key is added are given by $E(X_{n+1} - X_n \mid X_n) = A X_n / (a \cdot X_n)$ (5.2) and $E(W_{n+1} - W_n \mid W_n) = A_W W_n / (a_W \cdot W_n)$ (5.3), where $a \cdot X_n = a_W \cdot W_n$ is the total number of gaps. Furthermore, the vector $X_n$ determines the number of nodes with different numbers of keys, so there is a linear map $T$ such that $W_n = T X_n$. Consequently, by (5.2)-(5.3), $T A X_n = A_W T X_n$ for any $X_n$, and thus $TA = A_W T$. The $(m - 1) \times (m - 1)$ matrix $A_W = (a_{i,j})$ is constructed as follows. Let $a_{i,i} = -i$ for $i \in \{1, \dots, m - 1\}$, $a_{i,i-1} = i$ for $i \in \{2, \dots, m - 1\}$, $a_{1,m-1} = m$ and all other elements $a_{i,j} = 0$. I.e., $A_W = \begin{pmatrix} -1 & 0 & \cdots & 0 & m \\ 2 & -2 & \cdots & 0 & 0 \\ 0 & 3 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & -(m-2) & 0 \\ 0 & 0 & \cdots & m-1 & -(m-1) \end{pmatrix}$ (5.4). As is well known, the matrix $A_W$ has characteristic polynomial $\phi_m(\lambda)$, see e.g. [12, Example 7.8] or [16, Section 8.1.3]. In particular, 0 is not an eigenvalue so $A_W$ is non-singular. The column vectors of $A_W$ are in the range of $T$, and thus $T$ is onto.
Suppose that $\lambda$ is a root of $\phi_m(\lambda) = 0$. Then $\lambda$ is an eigenvalue of $A_W$ and thus there exists a left eigenvector $u$ with $u' A_W = \lambda u'$. Consequently, $u' T A = u' A_W T = \lambda u' T$ (5.5), so $(u'T)' = T'u$ is a left eigenvector of $A$, provided that it is non-zero. Since $T$ is onto, $T'$ is injective and thus $T'u \ne 0$. This shows that $\lambda$ is an eigenvalue of $A$ too.
Recall that $\lambda_1 = 1$ for the matrix $A$, since the total activity increases by 1 at each step. Let $\lambda_1, \lambda_2, \dots, \lambda_{m-1}$ be the roots of (5.1) in order of decreasing real parts. It is well known that $\lambda_1 = 1$ and, moreover, that $\mathrm{Re}\,\lambda_2 \le 1/2$ if and only if $m \le 26$, see [17] and [9]. Consequently, if $m \ge 27$, then Lemma 5.1 shows that $A$ has an eigenvalue $\lambda = \lambda_2 \ne \lambda_1$ with $\mathrm{Re}\,\lambda_2 > 1/2$, and then $X_n$ is not asymptotically normal. (See [12] for general results suggesting this, and [4] for a rigorous proof in the present case, showing that the total number of internal nodes is not asymptotically normal.) Furthermore, if $\alpha := \mathrm{Re}\,\lambda_2 > 1/2$, then $(X_n - E X_n)/n^{\alpha}$ is stochastically bounded, but has no limit in distribution (the distribution oscillates), see [4,2,12].
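The threshold at $m = 26$ can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes the polynomial to be $\prod_{i=1}^{m-1}(\lambda + i) - m!$, which is the characteristic polynomial of the matrix $A_W$ as described in the proof of Lemma 5.1, and it locates $\lambda_2$ by Newton's method applied to the logarithmic form $\sum_i \log(\lambda + i) = \log m! + 2\pi \mathrm{i}$, started from a linearization at $\lambda = 1$. The function name `second_root` is hypothetical, and the iteration is intended for $m$ in the range of roughly 20 to 30 only; it is not a general root finder.

```python
import cmath
import math


def second_root(m, iterations=60):
    """Approximate the root of prod_{i=1}^{m-1} (lam + i) = m! in the upper
    half-plane with the largest real part below 1 (the candidate lambda_2)."""
    log_target = math.lgamma(m + 1) + 2j * math.pi   # log(m!) + 2*pi*i
    slope = sum(1.0 / (1 + i) for i in range(1, m))  # derivative at lam = 1
    lam = 1 + 2j * math.pi / slope                   # first-order guess
    for _ in range(iterations):
        f = sum(cmath.log(lam + i) for i in range(1, m)) - log_target
        df = sum(1 / (lam + i) for i in range(1, m))
        lam -= f / df                                # Newton step
    return lam


def relative_residual(m, lam):
    """|prod (lam + i) / m! - 1|, which vanishes at an exact root."""
    return abs(math.prod(lam + i for i in range(1, m)) / math.factorial(m) - 1)


root_26 = second_root(26)
root_27 = second_root(27)
```

Comparing `root_26.real` and `root_27.real` with $1/2$ illustrates the phase change between $m = 26$ and $m = 27$ stated above.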
Some exceptional linear combinations of the variables X ni are asymptotically normal also in such cases [12], but we conjecture that for any m ≥ 27, the number of protected nodes is not one of these exceptional cases and that it has the same non-normal behaviour as just described for the number of internal nodes.
On the other hand, if $m \le 26$, although $A$ has a much larger dimension than $A_W$, and thus presumably many more eigenvalues, we conjecture that all additional eigenvalues also have $\mathrm{Re}\,\lambda < 1/2$, so that Theorem 1.4 applies, showing that the number of protected nodes is asymptotically normal, with asymptotic variance linear in $n$, just as for $m = 2$ and 3 in Theorems 1.2 and 1.1. (This conjecture has been verified for $m \le 6$ by Heimbürger [10].)

5.2 One-protected nodes and leaves in m-ary search trees
As mentioned in Section 1, the number of one-protected nodes and the number of leaves (the complement of the one-protected nodes) are easier to analyze than the two-protected nodes, and we prove normal limit laws for all m-ary search trees where m ≤ 26. In these cases we can use a Pólya urn that is similar to the Pólya urn that has earlier been used to study the total number of internal nodes in an m-ary search tree, see e.g. Mahmoud [15] and [16,Section 8.1.3] or [12,Example 7.8].
We can generalise the study of the number of leaves in ternary search trees in Section 4 to arbitrary $m \ge 2$. (For $m = 2$, there are minor modifications in the formulas below; we leave these to the reader. As mentioned above, the case $m = 2$ was considered by Devroye [5].) We have in general $m + 1$ types, defined in analogy with Figure 13: Type 1 is as before, Type $i$ with $2 \le i \le m - 1$ is a leaf with $i - 1$ keys, Type $m$ is a leaf with $m - 1$ keys together with its $m$ external children, and Type $m + 1$ is an internal non-leaf.
Let $\tilde V_{i,n} = V_{i,n}$ be the number of nodes containing exactly $i$ keys for $i \in \{1, \dots, m-2\}$; let $\tilde V_{0,n}$ be the number of nodes containing 0 keys (external nodes) that are not children of leaves; let $\tilde V_{m-1,n}$ be the number of nodes containing $m - 1$ keys that are leaves (i.e., they have only external children); finally, let $\tilde V_{m,n}$ be the number of internal nodes that are not leaves (all containing $m - 1$ keys). We consider again another, slightly simpler, urn with the balls representing the gaps, giving them types $1, \dots, m$, and consider the vector $W_n = (W_{1,n}, \dots, W_{m,n})$ where $W_{i,n} = i\tilde V_{i-1,n}$ is the total number of gaps of type $i$.
The random vector $W_n$ can be described by a Pólya urn, with all activities 1. We denote the $m \times m$ matrix (1.1) for this urn by $A_L$. It is a minor modification of the matrix $A_W$ described in Section 5.1, see (5.4); the entries of $A_L$ are given by $a_{i,i} = -i$ for $i \in \{1, \dots, m\}$, $a_{i,i-1} = i$ for $i \in \{2, \dots, m\}$, $a_{1,m} = m - 1$, $a_{2,m} = 2$, and all other entries $a_{i,j} = 0$. We can easily calculate the characteristic polynomial of $A_L$ and find that it is $\phi^L_m(\lambda) = (\lambda + m)\phi_m(\lambda)$, where $\phi_m(\lambda)$ is the characteristic polynomial of $A_W$ in (5.1). Thus, $A_L$ has the same eigenvalues as $A_W$, plus the additional eigenvalue $\lambda = -m$. Since $\phi_m$ has only simple roots [14, Section 3.3], and $-m$ is not one of them, also $\phi^L_m$ has only simple roots. Hence, $A_L$ has $m$ distinct eigenvalues, and is thus diagonalisable.
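The relation between the characteristic polynomials of $A_L$ and $A_W$ can be checked exactly for small $m$. The following is a minimal sketch under stated assumptions: it takes $\phi_m(\lambda) = \prod_{i=1}^{m-1}(\lambda + i) - m!$ and the factorization $\phi^L_m(\lambda) = (\lambda + m)\phi_m(\lambda)$, it is restricted to $m \ge 3$ (the case $m = 2$ needs the modifications mentioned above), and the names `leaf_gap_matrix`, `det`, `char_poly_at` and `phi` are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def leaf_gap_matrix(m):
    """The m x m matrix A_L for m >= 3 (1-based entries stored 0-based)."""
    if m < 3:
        raise ValueError("this sketch assumes m >= 3")
    A = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        A[i - 1][i - 1] = -i          # a_{i,i} = -i
        if i >= 2:
            A[i - 1][i - 2] = i       # a_{i,i-1} = i
    A[0][m - 1] = m - 1               # a_{1,m} = m - 1
    A[1][m - 1] = 2                   # a_{2,m} = 2
    return A


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d


def char_poly_at(A, x):
    """Value of det(x * I - A)."""
    n = len(A)
    return det([[(x if i == j else 0) - A[i][j] for j in range(n)]
                for i in range(n)])


def phi(m, x):
    """Assumed form of the polynomial in (5.1)."""
    return prod(x + i for i in range(1, m)) - factorial(m)


def column_sums(A):
    return [sum(row[j] for row in A) for j in range(len(A))]
```

For $m = 3$ the function `leaf_gap_matrix` returns the matrix of the three-type gap urn of Section 4, and every column of $A_L$ sums to 1 because each added key increases the number of gaps by one.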
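The largest eigenvalue of $A_L$ is $\lambda_1 = 1$ (as for $A_W$), and we denote the corresponding right and left eigenvectors, normalized as in (1.2), by $v_1$ and $u_1$; see (5.8).
Theorem 5.2. Let $m \le 26$ and let $L_n$ denote the number of leaves in an $m$-ary search tree with $n$ keys. Then, as $n \to \infty$, $n^{-1/2}(L_n - n\mu_{L,m}) \xrightarrow{d} N(0, \sigma^2_{L,m})$, where $\mu_{L,m}$ is given by (5.11) and $\sigma^2_{L,m}$ can be evaluated from the covariance matrix $(\sigma_{ij})_{i,j=1}^{m}$ given by (1.6).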
Proof. As said above, for $m \le 26$, $\mathrm{Re}\,\lambda < \lambda_1/2 = 1/2$ for all eigenvalues $\lambda \ne \lambda_1$ of $A_W$, and thus also of $A_L$. Furthermore, $A_L$ is diagonalisable. Hence, Theorem 1.5 applies and shows asymptotic normality of $W_n$. The result follows by (5.9), using $v_1$ in (5.8). For $m \ge 27$, we expect the same non-normal asymptotic behaviour as for the number of internal nodes [4,2], see Section 5.1.
For the one-protected nodes we can use the first Pólya urn described above for the leaves, with m + 1 types. For the leaves we could simplify by considering the gaps and use a Pólya urn with m types, with all activities 1. However, now we also need to consider type m + 1, which has 0 gaps. So in the analysis of the one-protected nodes we use the urn with m + 1 different types (as explained in the beginning of this subsection) where types i ∈ {1, . . . , m} have activities 1, 2, . . . , m and type m + 1 has activity 0. In this Pólya urn, the one-protected nodes correspond to type m + 1. All other types correspond to leaves or external nodes. Theorem 1.5 implies the following result (the proof is analogous to the proof of Theorem 5.2).