K-cut on paths and some trees

We define the (random) $k$-cut number of a rooted graph to model the difficulty of the destruction of a resilient network. The process is the same as the cut model of Meir and Moon, except that a node must now be cut $k$ times before it is destroyed. The first order terms of the expectation and variance of $\mathcal{X}_{n}$, the $k$-cut number of a path of length $n$, are derived. We also show that $\mathcal{X}_{n}$, after rescaling, converges in distribution to a limit $\mathcal{B}_{k}$, which has a complicated representation. The paper then briefly discusses the $k$-cut number of some trees and general graphs. We conclude with some analytic results which may be of interest.

We call the (random) total number of cuts needed to end this procedure the $k$-cut number and denote it by $K(G_n)$. (Note that in traditional cutting models, nodes are removed as soon as they are cut once, i.e., $k = 1$. But in our model, a node is only removed after being cut $k$ times.) One can also define an edge version of this process. Instead of cutting nodes, each time we choose an edge uniformly at random from the component that contains the root and cut it once. If the edge has been cut $k$ times then we remove it. The process stops when the root is isolated. We let $K_e(G_n)$ denote the number of cuts needed for the process to end.
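The node version of the procedure can be sketched in code. This is a minimal illustration under stated assumptions: the node process is taken to mirror the edge process just described (pick a node uniformly at random from the component containing the root, cut it once, and remove it, together with everything it disconnects from the root, once it has been cut $k$ times), and the graph is assumed to be a rooted tree encoded as a parent array. The function name `k_cut_number` and the encoding are hypothetical, not taken from the paper.

```python
import random


def k_cut_number(parent, k, rng):
    """Simulate the node k-cut process on a rooted tree.

    parent[v] is the parent of node v; node 0 is the root and parent[0] = -1.
    Returns the total number of cuts made until the root is removed.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)

    alive = list(range(n))   # nodes currently in the root component
    pos = list(range(n))     # index of each node in `alive`, -1 once removed
    cuts = [0] * n
    total = 0
    while pos[0] != -1:
        v = alive[rng.randrange(len(alive))]  # uniform choice among alive nodes
        cuts[v] += 1
        total += 1
        if cuts[v] == k:
            # Remove v and every descendant that is still alive.
            stack = [v]
            while stack:
                u = stack.pop()
                if pos[u] == -1:
                    continue
                last = alive[-1]          # swap-remove u from `alive`
                alive[pos[u]] = last
                pos[last] = pos[u]
                alive.pop()
                pos[u] = -1
                stack.extend(children[u])
    return total
```

For a path with six nodes one would call `k_cut_number([-1, 0, 1, 2, 3, 4], 2, random.Random(0))`; the result always lies between $k$ and $kn$.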
Throughout human history, various secret societies of very different structures existed [14]. Nonetheless, most such societies have a few leaders who are critical for the organizations to function properly. The $k$-cut process can be seen as a simplified model of the destruction process of a resilient secret network. The graph $G_n$ represents the structure of the network and the root node represents the leader. We assume that active members of the network are chosen uniformly at random to be investigated by the authority, and that a member stops operating after having been investigated $k$ times. The network completely breaks down when the root (leader) stops working. Thus the random number $K(G_n)$ models how much effort it takes to destroy the network.
Remark 1. A model of similar flavor was introduced for the destruction of terrorist cells with tree-like structures [15]. In this model, each node of a tree is removed in one step, independently at random with some fixed probability. The quantity being studied is the probability that the root node (leader) is separated from all the leaves (operatives). This model has been studied for deterministic trees [6] and for conditioned Galton-Watson trees [10].
Our model can also be applied to botnets, i.e., malicious computer networks consisting of compromised machines which are often used in spamming or attacks. The nodes in $G_n$ represent the computers in a botnet, and the root represents the bot-master. The effectiveness of a botnet can be measured using the size of the component containing the root, which indicates the resources available to the bot-master [8]. To take down a botnet means to reduce the size of this root component as much as possible. If we assume that we target infected computers uniformly at random and it takes at least $k$ attempts to fix a computer, then the $k$-cut number measures how difficult it is to completely isolate the bot-master.
The case where $k = 1$ and $G_n$ is a rooted tree has aroused great interest among mathematicians in the past few decades. The edge version of one-cut was first introduced by Meir and Moon [29] for the uniform random Cayley tree. Janson [23,24] noticed the equivalence between one-cuts and records in trees and studied them in binary trees and conditional Galton-Watson trees. Later Addario-Berry, Broutin, and Holmgren [1] gave a simpler proof for the limit distribution of one-cuts in conditional Galton-Watson trees. For one-cuts in random recursive trees, see Meir and Moon [30], Iksanov and Möhle [22], and Drmota, Iksanov, Moehle, and Roesler [12]. For binary search trees and split trees, see Holmgren [19,20].

The k-cut number of a tree
One of the most interesting cases is when $G_n = T_n$, where $T_n$ is a rooted tree with $n$ nodes.
There is an equivalent way to define $K(T_n)$. Imagine that each node is given an alarm clock. At time zero, the alarm clock of node $j$ is set to ring at time $T_{1,j}$, where $(T_{i,j})_{i\ge1,j\ge1}$ are i.i.d. (independent and identically distributed) $\mathrm{Exp}(1)$ random variables. After the alarm clock of node $j$ rings the $i$-th time, we set it to ring again after a further time $T_{i+1,j}$. Due to the memoryless property of exponential random variables (see [13, p. 134]), at any moment, which alarm clock rings next is always uniformly distributed. Thus if we cut a node that is still in the tree when its alarm clock rings, and remove the node with its descendants if it has already been cut $k$ times, then we get exactly the $k$-cut model.
How can we tell if a node is still in the tree? When node $j$'s alarm clock rings for the $r$-th time for some $r \le k$, and the alarm clock of no node above $j$ has already rung $k$ times, we say $j$ has become an $r$-record. And when a node becomes an $r$-record, it must still be in the tree. Thus summing the number of $r$-records over $r \in \{1, \dots, k\}$, we again get the $k$-cut number $K(T_n)$. One node can be a 1-record, a 2-record, etc., at the same time, so it can be counted multiple times. Note that if a node is an $r$-record, then it must also be a $j$-record for $j \in \{1, \dots, r-1\}$.
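To be more precise, we define $K(T_n)$ as a function of $(T_{i,j})_{i\ge1,j\ge1}$. Let
$$G_{r,j} := \sum_{1 \le i \le r} T_{i,j}, \tag{1.1}$$
i.e., $G_{r,j}$ is the moment when the alarm clock of node $j$ rings for the $r$-th time. Then $G_{r,j}$ has a gamma distribution with parameters $(r, 1)$ (see [13, Thm. 2.1.12]), which we denote by $\mathrm{Gamma}(r)$. In other words, $G_{r,j}$ has the density function
$$f(x) = \frac{x^{r-1} e^{-x}}{\Gamma(r)}, \qquad x \ge 0, \tag{1.2}$$
where $\Gamma(z)$ denotes the gamma function [11, 5.2.1]. Let
$$I_{r,j} := [\![\, G_{r,j} < \min\{ G_{k,i} : i \text{ is an ancestor of } j \} \,]\!], \tag{1.3}$$
where $[\![\cdot]\!]$ denotes the Iverson bracket, i.e., $[\![S]\!] = 1$ if the statement $S$ is true and $[\![S]\!] = 0$ otherwise, and the minimum over an empty set is taken to be $\infty$. In other words, $I_{r,j}$ is the indicator random variable for node $j$ being an $r$-record. Let
$$K_r(T_n) := \sum_{1 \le j \le n} I_{r,j}, \qquad K(T_n) := \sum_{1 \le r \le k} K_r(T_n).$$
Then $K_r(T_n)$ is the number of $r$-records and $K(T_n)$ is the total number of records.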
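The record description translates directly into a short simulation. The sketch below is illustrative only: it assumes the tree is given as a parent array with `parent[v] < v` (so that ancestors are processed before descendants), and the name `record_counts` is hypothetical. It draws the alarm-clock times, tracks for each node the earliest $k$-th ring among its ancestors, and counts $r$-records for each $r$.

```python
import random


def record_counts(parent, k, rng):
    """Return [K_1, ..., K_k], the number of r-records for r = 1..k.

    parent[v] is the parent of node v, with parent[0] = -1 for the root.
    This sketch assumes parent[v] < v for every non-root node.
    """
    n = len(parent)
    counts = [0] * k
    min_anc = [float("inf")] * n   # earliest k-th ring among strict ancestors
    kth_ring = [0.0] * n           # time of the k-th ring of each node
    for v in range(n):
        if v > 0:
            u = parent[v]
            min_anc[v] = min(min_anc[u], kth_ring[u])
        t = 0.0
        for r in range(1, k + 1):
            t += rng.expovariate(1.0)      # add one Exp(1) gap: t is the r-th ring time
            if t < min_anc[v]:
                counts[r - 1] += 1         # v is an r-record
        kth_ring[v] = t
    return counts
```

Summing the returned list gives one sample of the $k$-cut number. Because an $r$-record is also a $j$-record for every $j < r$, the returned counts are non-increasing in $r$, and the root is always counted for every $r$.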

The k-cut number of a path
Let $P_n$ be a one-ary tree (a path) consisting of $n$ nodes labeled $1, \dots, n$ from the root to the leaf. Let $\mathcal{X}_n := K(P_n)$ and $\mathcal{X}_n^r := K_r(P_n)$. In this paper, we mainly consider $\mathcal{X}_n$ and we let $k \ge 2$ be a fixed integer.
The first motivation of this choice is that, as shown in Section 5, $P_n$ is the fastest to cut among all graphs. (We make this statement precise in Lemma 12.) Thus $\mathcal{X}_n$ provides a universal stochastic lower bound for $K(G_n)$. Moreover, our results on $\mathcal{X}_n$ can be immediately extended to some trees of simple structures: see Section 5. Finally, as shown below, $\mathcal{X}_n$ generalizes the well-known record number in permutations and has very different behavior when $k = 1$, the usual cut-model, and $k \ge 2$, our extended model.
The name record comes from the classic definition of record in random permutations. Let $\sigma_1, \dots, \sigma_n$ be a uniform random permutation of $\{1, \dots, n\}$. If $\sigma_i < \min_{1 \le j < i} \sigma_j$, then $i$ is called a (strictly lower) record. Let $K_n$ denote the number of records in $\sigma_1, \dots, \sigma_n$. Let $W_1, \dots, W_n$ be i.i.d. random variables with a common continuous distribution. Since the relative order of $W_1, \dots, W_n$ also gives a uniform random permutation, we can equivalently define $\sigma_i$ as the rank of $W_i$. As gamma distributions are continuous, we can in fact let $W_i = G_{k,i}$. Thus being a record in a uniform permutation is equivalent to being a $k$-record and $K_n \overset{\mathcal{L}}{=} \mathcal{X}_n^k$. Moreover, when $k = 1$, $K_n \overset{\mathcal{L}}{=} \mathcal{X}_n$. Starting from Chandler's article [7] in 1952, the theory of records has been widely studied due to its applications in statistics, computer science, and physics. For more recent surveys on this topic, see [3], [4], [32] and [5].
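As a small sanity check on the definition of a strictly lower record, the following sketch counts records in a sequence and averages the count over all permutations of a small $n$. The function names are illustrative. Since position $i$ is a record with probability $1/i$, the exact mean is the harmonic number $H_n = 1 + 1/2 + \dots + 1/n$.

```python
from fractions import Fraction
from itertools import permutations


def count_lower_records(sigma):
    """Count positions i with sigma[i] smaller than every earlier entry."""
    records = 0
    current_min = float("inf")
    for value in sigma:
        if value < current_min:
            records += 1
            current_min = value
    return records


def mean_records(n):
    """Exact mean number of records over all n! permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(count_lower_records(p) for p in perms)
    return Fraction(total, len(perms))
```

For example, `count_lower_records([3, 1, 2])` counts the first entry (trivially a record) and the entry `1`, and `mean_records(4)` equals $H_4 = 25/12$.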
A well-known and surprising result on $K_n$ by Rényi [33] is that $(I_{k,j})_{1 \le j \le n}$ are mutually independent. It follows easily that
$$\frac{K_n - \log n}{\sqrt{\log n}} \xrightarrow{d} N(0,1),$$
where $N(0, 1)$ denotes the standard normal distribution [13, p. 111].
The following theorem shows that only one-records actually matter.
where the constants $\eta_{k,r}$ are defined by Therefore In particular, when $k = 2$ The previous two theorems imply that the correct rescaling parameter should be $n^{1-1/k}$. However, unlike the record number in permutations, the limit distribution of $\mathcal{X}_n / n^{1-1/k}$ has a rather complicated representation.
Theorem 3. Let $(U_j, E_j)_{j \ge 1}$ be mutually independent random variables with $E_j \overset{\mathcal{L}}{=} \mathrm{Exp}(1)$ and $U_j \overset{\mathcal{L}}{=} \mathrm{Unif}[0, 1]$. We define the $k$-cut distribution $\mathcal{B}_k$ by (1.8) (We use the convention that an empty product equals one.) Then

Remark 2. An equivalent recursive definition of $S_p$ is
Remark 3. It is easy to see that $\mathcal{X}^e_{n+1} := K_e(P_{n+1}) \overset{\mathcal{L}}{=} \mathcal{X}_n$ by treating each edge on a length $n + 1$ path as a node on a length $n$ path.

Outline
In Sections 2, 3, and 4, we prove Theorems 1, 2, and 3, respectively. In Section 5, we discuss some easy extensions of our results to other graphs including binary trees, split trees and Galton-Watson trees. Finally, in Section 6, we collect some auxiliary results used in our proofs.

The expectation
In this section we prove Theorem 1. Since $\mathbb{E}\mathcal{X}_n^k \sim \log n$ is well-known, we only prove (1.4) for $r < k$.
Throughout this paper, we use the notation $O(f(z))$ to denote a function $g(z)$ for which there exists a constant $C > 0$ such that $|g(z)| \le C f(z)$ for all $z$ in a given set $S$. Sometimes we do not explicitly state the set $S$ when it is clear from the context.
Proof. Conditioning on $G_{r,i+1} = x \ge 0$, for $I_{r,i+1} = 1$, i.e., for node $i + 1$ to be an $r$-record, we need to have $G_{k,1}, \dots, G_{k,i}$ all greater than $x$. Since these $i$ random variables are i.i.d. $\mathrm{Gamma}(k)$, the probability of this event equals $\mathbb{P}\{\mathrm{Gamma}(k) > x\}^i$.
Since $G_{r,i+1} \overset{\mathcal{L}}{=} \mathrm{Gamma}(r)$, using its density function (1.2), where the estimation of the integral comes from Lemma 16.
where $\eta_{k,r}$ is defined in (1.5).
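The conditioning argument in the proof says that the probability of node $i + 1$ being an $r$-record is the integral of the $\mathrm{Gamma}(r)$ density (1.2) against $\mathbb{P}\{\mathrm{Gamma}(k) > x\}^i$. The sketch below evaluates that integral numerically; it is not the asymptotic estimate of the paper. It assumes integer $k$, uses the standard identity $\mathbb{P}\{\mathrm{Gamma}(k) > x\} = e^{-x}\sum_{\ell < k} x^{\ell}/\ell!$, and applies composite Simpson's rule on a truncated range; the function names, the cutoff `upper` and the grid size `steps` are arbitrary choices.

```python
import math


def gamma_tail(k, x):
    """P{Gamma(k) > x} for integer k >= 1, via the Poisson identity."""
    term, total = 1.0, 1.0
    for l in range(1, k):
        term *= x / l
        total += term
    return math.exp(-x) * total


def record_probability(r, k, i, upper=60.0, steps=20000):
    """Approximate the probability that node i+1 on a path is an r-record.

    Integrates the Gamma(r) density times P{Gamma(k) > x}**i over [0, upper]
    with composite Simpson's rule (steps must be even).
    """
    def integrand(x):
        density = x ** (r - 1) * math.exp(-x) / math.gamma(r)
        return density * gamma_tail(k, x) ** i

    h = upper / steps
    s = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        s += (4 if j % 2 else 2) * integrand(j * h)
    return s * h / 3
```

A convenient check is the case $r = k$: node $i + 1$ is a $k$-record exactly when $G_{k,i+1}$ is the smallest of $i + 1$ i.i.d. continuous random variables, so the probability is $1/(i+1)$, and `record_probability(2, 2, 5)` should be close to $1/6$.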

The variance
In this section we prove Theorem 2.
First we estimate $\mathbb{E}[I_{1,i+1} I_{1,j+1}]$ for $j > i \ge 0$. For the moment we condition on $G_{1,i+1} = x$ and $G_{1,j+1} = y$. For $I_{1,i+1} I_{1,j+1} = 1$ to happen, both node $i + 1$ and node $j + 1$ must be one-records. Recalling the definition of one-records in (1.3), this event can be written as , and all these random variables are independent, we have It follows from $G_{1,i+1}$ and $G_{1,j+1}$ having $\mathrm{Exp}(1)$ distribution that

Proof. In this case, $\max(x, y) = y$, thus by (3.1) Let $\mathrm{Poi}(c)$ denote a Poisson distribution with mean $c$. Using the connection between Poisson processes and gamma distributions [13, Section 3.6.3], the inner integral equals It follows from Lemma 16 that where the last step uses the fact that only the summand for $\ell = 0$ matters.

. Let $x_1 = a^{\beta}/k!$ and $y_1 = b^{\beta}/k!$. Then for all $a \ge 1$ and $b \ge 1$,

Proof. In the case of $A_{1,i,j}$, $\max(x, y) = x$ and $y - x < 0$. Thus by (3.1), It follows from Lemma 16 that where $\Gamma(\ell, z)$ denotes the upper incomplete gamma function [11, 8.2.2]. Let $S$ be the integral area of (3.3). We will choose $S_0$, an appropriate subset of $S$, in which the integrand of (3.3) can be well approximated by $\exp\left(-\frac{a x^k + b y^k}{k!}\right)$. Then we will show that the part outside $S_0$ can be absorbed by the error term in (3.2). More precisely, let $x_0 = a^{-\alpha}$ and $y_0 = b^{-\alpha}$ where α = 1 In other words, $S$ is the integration area of (3.3) and $S_0$ is the part of $S$ in which $x < x_0$ and $y < y_0$. Note that the shape of $S_0$ is different when $a < b$ and $a > b$, see Figure 1. We split the integral into two parts

Throughout the proof of this lemma, the $O(f(a, b))$ notation applies to $(a, b) \in [1, \infty)^2$. For $A_{1,1}$, i.e., the part inside $S_0$, we can well approximate the integrand using Lemma 15 and get Assume for now that $a > b$, i.e., case (i) in Figure 1. Since $\exp\left(-\frac{a x^k + b y^k}{k!}\right)$ is monotonically decreasing in both $x$ and $y$, where the last step uses $\xi_k(a/2, b) = O(1)$ by Lemma 18, and that $a x_0^k/k! = x_1$. For the case (ii) in Figure 1, we can further divide $S \setminus S_0$ into two parts, as depicted by the dotted line, to show that Together with (3.5), we can see that (3.6) is valid regardless of the order of $a$ and $b$. Putting (3.4) and (3.6) together, we have

Note that $\Gamma(k, x)/\Gamma(k)$ is monotonically decreasing in $x$. Therefore, if $x > x_0$, then by Lemma 15, and that $\Gamma(k, 0) = \Gamma(k)$, Thus when $a > b$, i.e., case (i) in Figure 1, where we again use Lemma 15. Together with a similar analysis for $a < b$, we can see that, regardless of the order of $a$ and $b$, It follows from (3.7) and (3.8) that from Lemma 18 and that $e^{-x_1/2}$ and $e^{-y_1/2}$ are exponentially small.

Now we are ready to finish the proof of Theorem 2. Since we have Thus by Lemma 2, where the last step follows from Lemma 19.
Also, in the above computation, we can approximate the double sum by an integral because $\xi_k(a, b)$ is monotonically decreasing in both $a$ and $b$ (see Lemma 18). Plugging (3.10) and (3.11) into (3.9), where $\gamma_k$ and $\lambda_k$ are defined in Theorem 2.

Convergence to the k-cut distribution
By Theorem 1 and Markov's inequality [13, Thm. 1.6.4], $\mathcal{X}_n^r / n^{1-1/k} \xrightarrow{p} 0$ for $r \in \{2, \dots, k\}$. So instead of proving Theorem 3 for $\mathcal{X}_n$, it suffices to prove it for $\mathcal{X}_n^1$. The idea of the proof is to condition on the positions and values of the $k$-records, and study the distribution of the number of one-records between two consecutive $k$-records.
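We use $(R_{n,j})_{j \ge 1}$ to denote the $k$-record values and $(P_{n,j})_{j \ge 1}$ the positions of these $k$-records. To define them more precisely, recall that $G_{r,j}$ is the moment when the alarm clock of $j$ rings for the $r$-th time, see (1.1). Let $R_{n,0} := 0$ and $P_{n,0} := n + 1$. For $p \ge 1$, if $P_{n,p-1} > 1$, then let
$$R_{n,p} := \min\{ G_{k,j} : 1 \le j < P_{n,p-1} \}, \qquad P_{n,p} := \operatorname{argmin}\{ G_{k,j} : 1 \le j < P_{n,p-1} \}; \tag{4.1}$$
otherwise let $P_{n,p} = 1$ and $R_{n,p} = \infty$. Note that $R_{n,1}$ is simply the minimum of $n$ i.i.d. $\mathrm{Gamma}(k)$ random variables.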
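The positions and values of the $k$-records can be read off from the $k$-th ring times by repeatedly taking a minimum over the nodes closer to the root. The sketch below assumes the reading of (4.1) suggested by the surrounding text, namely $P_{n,0} = n + 1$ and $R_{n,p}$ equal to the minimum of $G_{k,j}$ over $1 \le j < P_{n,p-1}$, attained at $P_{n,p}$; the function name and the list encoding are illustrative.

```python
def k_record_positions(g_k):
    """List the k-records of a path from the deepest one up to the root.

    g_k[j - 1] is the k-th ring time of node j, for nodes j = 1..n with the
    root first. Returns [(P_1, R_1), (P_2, R_2), ...], where P_p is a
    1-based position and R_p the corresponding record value.
    """
    out = []
    upper = len(g_k) + 1                  # plays the role of P_{n,0} = n + 1
    while upper > 1:
        # Nodes 1 .. upper-1 correspond to list indices 0 .. upper-2.
        best = min(range(upper - 1), key=lambda idx: g_k[idx])
        out.append((best + 1, g_k[best]))
        upper = best + 1                  # continue strictly above this record
    return out
```

For the ring times `[5.0, 3.0, 4.0, 1.0, 2.0]` the result is `[(4, 1.0), (2, 3.0), (1, 5.0)]`: the positions decrease towards the root while the record values increase, and the root is always the last entry.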
Recall that $[\![S]\!] = 1$ if $S$ is true and $[\![S]\!] = 0$ otherwise. According to $(P_{n,j})_{j \ge 1}$, we can split $\mathcal{X}_n^1$ into the following sum where $I_{1,j}$ is the indicator for $j$ being a one-record and we also use the fact that a $k$-record must be a one-record. Figure 2 gives an example of $(B_{n,p})_{p \ge 1}$ for $n = 12$.
It depicts the positions of the $k$-records and the one-records. It also shows the values and the summation ranges for $(B_{n,p})_{p \ge 1}$.

[Figure 2: the positions $P_{n,3}$, $P_{n,2}$, $P_{n,1}$, $n$ and $P_{n,0}$ marked on an axis from 0 to 13.]

Recall that $T_{i,j}$ is the lapse of time between the $(i-1)$-th and the $i$-th ring of the alarm clock of $j$, see (1.1). Conditioning on $(P_{n,j})_{j \ge 1}$ and $(R_{n,j})_{j \ge 1}$, for $j \in (P_{n,p}, P_{n,p-1})$, we must have $\sum_{1 \le i \le k} T_{i,j} > R_{n,p}$. (Otherwise $j$ would have become a $k$-record.) And for $j$ to be a one-record, we need $T_{1,j} < R_{n,p}$. Since $(T_{i,j})_{i \ge 1, j \ge 1}$ are i.i.d. $\mathrm{Exp}(1)$ before the conditioning, we have where the last step uses Lemma 15. Then the distribution of $B_{n,p}$ is just where $\mathrm{Bin}(m, p)$ denotes a binomial $(m, p)$ distribution. When $R_{n,p-1}$ is small and $P_{n,p-1} - P_{n,p}$ is large, this is roughly Therefore, to simplify the computations, we first study a slightly modified model.
We say a node $j$ is an alt-one-record if $I_j^* = 1$. As in (4.2), we can write Then conditioning on $(R_{n,j}, P_{n,j})_{n \ge 1, j \ge 1}$, $B^*_{n,p}$ has exactly the distribution in (4.3). Figure 3 gives an example of $(B^*_{n,p})_{p \ge 1}$ for $n = 12$. It shows the positions of alt-one-records, as well as the values and the summation ranges of $(B^*_{n,p})_{p \ge 1}$. Note that the positions and the number of one-records and alt-one-records are not necessarily the same.
Most of this section is devoted to showing that We will argue at the end of this section that $\mathcal{X}_n^1 / n^{1-1/k}$ and $\mathcal{X}_n^* / n^{1-1/k}$ converge to the same limit.
Then we show that if we choose $p$ large enough, then the leftovers, i.e., $\sum_{p < j} B_j$ and $\sum_{p < j} B^*_{n,j} / n^{1-1/k}$, are negligible.

Proof of Proposition 1
The first step to prove (4.7) is to construct a coupling by defining all the random variables that we are studying in one probability space. Let since $P_{n,0} = n + 1$. And we do not change the definition of any other random variables.
Recall that $R_{m,1}$ is the minimum of $m$ independent $\mathrm{Gamma}(k)$ random variables. In other words, $M(m, t)$ has the distribution of $R_{m,1}$ conditioned on $R_{m,1} > t$.
Moreover, the density function of $H_m$ converges point-wise to the density function of $H$. The lemma also holds if we replace $H_m$ by Proof. We only prove the lemma for $H_m$. A similar argument works for $H'_m$. We show that for all fixed $x \ge t$, $\mathbb{P}\{H_m > x\}$ converges to Let y m = x/r Thus we have the point-wise convergence of density functions.
We define the auxiliary random variables The second step is to show that where $(S_p)_{p \ge 1}$ are defined by (1.8) in Theorem 3. Moreover, the joint density function of $(S_{n,1}, \dots, S_{n,p})$ converges point-wise to the joint density function of $(S_1, \dots, S_p)$. The lemma also holds if we replace $S_{n,j}$ by $S^*_{n,j}$.
Proof. We only prove the lemma for $S_{n,j}$. The same argument works for $S^*_{n,j}$. Let $\mathcal{F} = \sigma((U_j)_{j \ge 1})$ denote the sigma algebra generated by $(U_j)_{j \ge 1}$. To prove Lemma 5, we will condition on $\mathcal{F}$ and treat $(U_p, P_{n,p}, L^*_{n,p}, L_{n,p})_{p \ge 0, n \ge 1}$ as deterministic numbers. If we can show the convergence of distribution in (4.7) conditioning on $\mathcal{F}$, i.e., if for all $(x_1, \dots, x_p) \in \mathbb{R}^p$, $\mathbb{P}\{S_{n,1} > x_1, \dots, S_{n,p} > x_p \mid \mathcal{F}\} - \mathbb{P}\{S_1 > x_1, \dots, S_p > x_p \mid \mathcal{F}\} \to 0$, then we have, for all fixed $(x_1, \dots, x_p)$,

Recall that $R_{n,1}$ is the minimum of $n$ i.i.d. $\mathrm{Gamma}(k)$ random variables and $P_{n,0} = n + 1$, see (4.1). Then Let $f_{n,1}(\cdot)$ and $f_1(\cdot)$ denote the density functions of $S_{n,1}$ and $S_1$ respectively. It follows from Lemma 4 that where $(E_j)_{j \ge 1}$ are i.i.d. $\mathrm{Exp}(1)$ random variables, and for all $y_1 \in \mathbb{R}$, $f_{n,1}(y_1) \to f_1(y_1)$. (4.13)

For $p > 1$, we condition on $S_{n,p-1} = y_{p-1} \in [0, \infty)$. We will apply Lemma 4 to $P_{n,p}$, by taking Recall that $R_{n,p}$ is the minimum of $(P_{n,p-1} - 1)$ i.i.d. $\mathrm{Gamma}(k)$ random variables restricted to $(R_{n,p-1}, \infty)$, see (4.1). Thus where we use the definition of $L^*_{n,p}$ and $S_{n,p}$ in (4.11) to get Also note that by (4.9), Let $f_{n,p}(\cdot \mid y_{p-1})$ and $f_p(\cdot \mid y_{p-1})$ denote the density functions of $S_{n,p} \mid S_{n,p-1} = y_{p-1}$ and $S_p \mid S_{p-1} = y_{p-1}$ respectively. It follows from Lemma 4 that and for all $y_p \in [0, \infty)$, $f_{n,p}(y_p \mid y_{p-1}) \to f_p(y_p \mid y_{p-1})$. (4.14)

Then by (4.13) and (4.14), for all $(y_1, \dots, y_p) \in [0, \infty)^p$, $g_{n,p}(y_1, \dots, y_p) := f_{n,p}(y_p \mid y_{p-1}) f_{n,p-1}(y_{p-1} \mid y_{p-2}) \cdots f_{n,1}(y_1)$ In other words, the joint density function of $(S_{n,1}, \dots, S_{n,p})$ converges point-wise to the joint density function of $(S_1, \dots, S_p)$. Thus by Scheffé's lemma [17, p. 227], we have the convergence in distribution in (4.12).
Now it is quite easy to finish the proof of Proposition 1 using the following lemma. Proof. For all fixed $\varepsilon > 0$, where the second step uses Chernoff's bound [31, p. 43].
We will apply Lemma 6 to $B^*_{n,j}$ by taking $m = P_{n,j-1} - P_{n,j}$, $\ell_m = L^*_{n,j}$, $p_m = y_j / L^*_{n,j}$, $c = y_j$.
Note that by (4.9) and $L^*_{n,p}$ It follows from Lemma 6 that conditioning on $A(y_1, \dots, y_p)$ defined in (4.15) Combining the two above expressions, Let $g^*_{n,p}(y_1, y_2, \dots, y_p)$ and $g^*_p(y_1, y_2, \dots, y_p)$ be the joint density functions of $(S^*_{n,1}, \dots, S^*_{n,p})$ and $(S_1, \dots, S_p)$ respectively. Then for all (x 1 , . . . , In other words, jointly, conditioning on $\mathcal{F} = \sigma((U_j)_{j \ge 1})$, and the convergence also holds without conditioning on $\mathcal{F}$ by the same argument as for Lemma 5. Thus we are done proving Proposition 1.

The leftovers
In this section, we show that for $p$ large enough, $\sum_{s > p} B_s$, $\sum_{s > p} B^*_{n,s} / n^{1-1/k}$, and $\sum_{s > p} B_{n,s} / n^{1-1/k}$ are all negligible.
Lemma 7. For all $\varepsilon > 0$ and $\delta > 0$, there exists a $p \in \mathbb{N}$ such that We are done since To deal with $\sum_{s > p} B^*_{n,s}$ and $\sum_{s > p} B_{n,s}$, the next lemma allows us to choose an appropriate $p$. Lemma 8. Uniformly for all $p \in \mathbb{N}$ and $n \in \mathbb{N}$, P P n,p n ∈ e −3p/2 , e −p/2 ∩ P n,p R k n,p < k!
where the last step uses (4.18) in Lemma 7.
On the other hand, by Lemma 5, Recall that $\preceq$ denotes stochastically smaller than. Then by the definition of $S_p$ (see (1.8)), $S_p^k \preceq k!\, W_p$ where $W_p \overset{\mathcal{L}}{=} \mathrm{Gamma}(p)$. It follows from (4.18) that P P n,p R k n,p ≥ k!
Lemma 9. For all $\varepsilon > 0$ and $\delta > 0$, there exist $p \in \mathbb{N}$ and $n_0 \in \mathbb{N}$ such that for all $n > n_0$, Proof. Let $A_p$ denote the event in (4.19) for a $p$ chosen later. We condition on for $(m, y)$ satisfying $(m, y) \in S^* := \{(m, y) \in \mathbb{R}^2 : n e^{-3p/2} \le m \le n e^{-p/2},\ m y^k \le k!\,\tfrac{3p}{2}\}$. (4.23) (If $(m, y) \notin S^*$, the event $A_p(m, y)$ is empty.) Note that this changes the distribution of $G_{k,j}$ for $j < m$, from $\mathrm{Gamma}(k)$ to $\mathrm{Gamma}(k)$ restricted to $(y, \infty)$. Thus by the definition of $I^*_j$ in (4.4) where the last step follows from Lemma 15. Thus for $n$ large enough, for all $(m, y) \in S^*$, where the second inequality uses [11, 8.10.11]. Together with (2.1) in Lemma 1, Thus for $n$ large enough, by Theorem 1, for all $(m, y) \in S^*$, if we take $p$ large enough. This implies that there exist a $p \in \mathbb{N}$ and $n_0$ such that for all $n > n_0$, Now we are done since by Lemma 8, $\mathbb{P}\{A_p^c\} = O(p^{-2})$.
Lemma 10. For all $\varepsilon > 0$ and $\delta > 0$, there exist $p \in \mathbb{N}$ and $n_0 \in \mathbb{N}$ such that for all $n > n_0$, Proof. We again condition on $A_p(m, y)$, as defined by (4.22) in Lemma 9, for $(m, y)$ satisfying (4.23).
Let $(E'_s)_{s \ge 1}$ be i.i.d. $\mathrm{Exp}(1)$ random variables. By our conditioning, for $j < m$, the distribution of $T_{1,j+1}$ has changed from that of $E'_1$ to that of $E'_1$ conditioned on $E'_1 + \dots + E'_k > y$. Let $f(x)$ be the density function of $T_{1,j+1}$ conditioning on $A_p(m, y)$. Then by Lemma 15, for $x \ge y$, and for $x < y$ By (4.23), $m y^k \le k!\, 3p/2$ and $m \ge e^{-3p/2} n$. So for $n$ large enough $y < 1/2$. Thus by [11, 8.10.11], In other words $f(x) \le 2e^{-x}$. Thus by (4.24) in Lemma 9. From now on the proof simply follows the same argument as Lemma 9.

Finishing the proof of Theorem 3
By Proposition 1 and Lemma 9, for all $x > 0$ and $\delta > 0$, there exist $\varepsilon > 0$, $p \in \mathbb{N}$ and $n_0 \in \mathbb{N}$, such that for all $n \ge n_0$, On the other hand, we can choose $\varepsilon$ small enough such that And by Lemma 7, we can choose $p$ such that $\mathbb{P}\{\sum_{p < j} B_j \ge \varepsilon\} < \delta/3$. Thus by Proposition 1, In other words, we have Now we fill the gap between $\mathcal{X}_n^*$ and $\mathcal{X}_n^1$ as we promised.
Lemma 11. There exists a coupling such that Note that in the following proof, we construct $(P_{n,j}, R_{n,j})_{j \ge 0}$ as in (4.1). In other words, we do not use the coupling constructed in subsection 4.1.
Proof. Recall that $(T^*_{i,j})_{i \ge 1, j \ge 1}$ are i.i.d. $\mathrm{Exp}(1)$ random variables that we used, together with $(P_{n,j}, R_{n,j})_{j \ge 0}$, to define $\mathcal{X}_n^*$. Now we modify $(T_{i,j})_{i \ge 1, j \ge 1}$ by letting $T_{i,j} = T^*_{i,j}$ for all $i \in \mathbb{N}$ and j ∈ {P n,j } j≥0 , unless there is a discrepancy, i.e., if for some $p \ge 1$, P n,p−1 < j < P n,p , and This may change the value of $(B_{n,j})_{j \ge 1}$ but not its distribution. Let $J_{n,p}$ denote the number of discrepancies between $P_{n,p-1}$ and $P_{n,p}$, i.e., Recall that (see (4.2) and (4.5)) Then with the above coupling, for all fixed $p \in \mathbb{N}$, By Theorem 1, we have $\mathcal{X}_n^k / n^{1-1/k} \xrightarrow{p} 0$. It follows from Lemma 9 and Lemma 10 that by choosing $p$ large enough, the last two terms of (4.26) divided by $n^{1-1/k}$ are all negligible. Thus, it suffices to only consider $\sum_{1 \le j \le p} J_{n,j}$.
Therefore, it follows from Lemma 15 and the series expansion of the incomplete gamma function (see (6.1)) that ≤ P n,p−1 R k n,p ≤ 2 P n,p−1 (L * n,p ) k (L * n,p R n,p ) k a.s.
for $n$ large enough, where $\preceq$ denotes stochastically smaller than. In other words, for all fixed $p \in \mathbb{N}$, $\sup_{n \ge 1} \mathbb{E} J_{n,p} < \infty$. Thus $\sum_{1 \le i \le p} J_{n,i}$ Then by (4.26) we are done.

Remark 4. It is not obvious how to directly compute $\mathbb{E}[\mathcal{B}_k]$ from its representation $\mathcal{B}_k = \sum_{p \ge 1} B_p$ (see (1.6)). In fact, it is not difficult to show that Thus $((\mathcal{X}_n^1 / n^{1-1/k})^2)_{n \ge 1}$ is uniformly integrable and by Theorem 2 We leave the details to the reader.

Some extensions
In this section we briefly discuss some easy implications of our main results.

A lower bound and an upper bound for general graphs
Let $\mathcal{G}_n$ be the set of rooted graphs with $n$ nodes. Recall that for $G_n \in \mathcal{G}_n$, $K(G_n)$ denotes the $k$-cut number of $G_n$. The following lemma shows that $P_n$, a path of length $n$ with one endpoint as the root, is the easiest to break down among all graphs in $\mathcal{G}_n$.
Lemma 12. For all $G_n \in \mathcal{G}_n$,
$$K(P_n) \preceq K(G_n), \tag{5.1}$$
i.e., the left hand side is stochastically dominated by the right hand side [18, p. 68]. Therefore,
$$\min_{G_n \in \mathcal{G}_n} \mathbb{E} K(G_n) = \mathbb{E} K(P_n) \sim \eta_{k,1}\, n^{1-1/k},$$
where $\eta_{k,1}$ is as in Theorem 1.
Proof. Let $T_n$ be an arbitrary spanning tree of $G_n$ with the root of $G_n$ marked as the root of $T_n$. It is not difficult to see that $K(T_n) \preceq K(G_n)$: adding edges to the graph certainly would not decrease the $k$-cut number.
Consider the simple case when $T_n$ is a tree that consists of only two paths connected to the root. If we disconnect one of these two paths from the root and connect it to the leaf of the other path, we can only decrease the number of records in the tree. In other words, we have a coupling which implies $K(P_n) \preceq K(T_n)$.
For a more complicated $T_n$, we can repeat the above transformation for subtrees that consist of a root and two paths connected to the root, until the whole tree becomes a path. In other words, for all trees $T_n \in \mathcal{G}_n$, $K(P_n) \preceq K(T_n)$. This proves (5.1). The second result of the lemma follows trivially from Theorem 1 (see e.g., [18, Theorem 2.15, p. 71]).
The most resilient graph is obviously $K_n$, the complete graph with $n$ vertices. Thus we have the following upper bound.
Proof. Let $S_n$ be the tree of $n$ nodes with one root and $n - 1$ leaves. Obviously $K(K_n) \overset{\mathcal{L}}{=} K(S_n)$. So we can prove the lemma for $K(S_n)$ instead. Let $Y$ be the time when the alarm clock of the root rings for the $k$-th time. Let $W_{1,n}, \dots, W_{n-1,n}$ be the numbers of cuts that the leaves receive. Conditioning on the event $Y = y$, $W_{1,n}, \dots, W_{n-1,n}$ are i.i.d. with the same distribution as $\min(\mathrm{Poi}(y), k)$ (each node can receive at most $k$ cuts). In other words, conditioning on $Y = y$, by the law of large numbers, Therefore, (5.2) follows. Since $K(S_n)/n \le k$, i.e., it is bounded, we also have where we omit the computation for the last step.
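The quantity driving the law of large numbers step is the mean number of cuts a single leaf receives given $Y = y$. Since a leaf's alarm clock rings a $\mathrm{Poi}(y)$ number of times before time $y$ and a node can be cut at most $k$ times, this is the mean of a Poisson variable capped at $k$. The helper below is a small illustrative sketch (the name `mean_capped_poisson` is hypothetical) that computes this mean directly from the Poisson probability mass function; it does not reproduce the omitted computation of the proof.

```python
import math


def mean_capped_poisson(y, k):
    """Mean of min(N, k) where N is Poisson with mean y and k >= 1 is an integer."""
    pmf = math.exp(-y)   # P{N = 0}
    below = 0.0          # running value of P{N < k}
    mean = 0.0
    for l in range(k):
        mean += l * pmf
        below += pmf
        pmf *= y / (l + 1)   # move from P{N = l} to P{N = l + 1}
    return mean + k * (1.0 - below)
```

For $k = 1$ this reduces to $1 - e^{-y}$, the probability that a leaf is cut at all before the root is removed, and for large $y$ it approaches the cap $k$.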

Path-like graphs
If a graph $G_n$ consists of only long paths, then the limit distribution of $K(G_n)$ should be related to $\mathcal{B}_k$, the limit distribution of $K(P_n)/n^{1-1/k}$ (see Theorem 3). We give a few simple examples in this section whose details we leave to the reader. Example 1 (Long path). Let $(G_n)_{n \ge 1}$ be a sequence of rooted graphs such that $G_n$ contains a path of length $m(n)$ starting from the root with $n - m(n) = o(n^{1-1/k})$. Since it takes at most $k(n - m(n))$ cuts to remove all the nodes outside the long path, $K(P_{m(n)}) \preceq K(G_n) \preceq K(P_{m(n)}) + k\, o(n^{1-1/k})$.
Together with Lemma 12, this implies that $K(G_n)/n^{1-1/k}$ converges in distribution to $\mathcal{B}_k$.
Example 2 (Caterpillar). Let $T_n^{[\ell]}$ be a tree with $n$ nodes. If $T_n^{[\ell]}$ consists of only a path connected to the root such that $\ell - 1$ leaves are attached to each node on this path, except the last one which may have between $1$ and $\ell$ leaves attached, then we call $T_n^{[\ell]}$ an $\ell$-caterpillar. We call this path of length $n/\ell + O(1)$ the spine. It is easy to see that the number of one-records in $T_n^{[\ell]}$ is about $\ell$ times the number of one-records in $P_{\lceil n/\ell \rceil}$. Therefore, it is not difficult to show that K(T
Example 3 (Curtain). Let $\ell \ge 2$ be a fixed integer. Let $T_n^{(\ell)}$ be a graph that consists of only $\ell$ paths connected to the root, with the first $\ell - 1$ of them having length $\frac{n-1}{\ell}$. We call $T_n^{(\ell)}$ an $\ell$-curtain. It is easy to see that cutting $T_n^{(\ell)}$ is very similar to cutting $\ell$ separated paths of length $\frac{n}{\ell}$. Therefore, we can show that k are i.i.d. copies of $\mathcal{B}_k$.

Deterministic and random trees
The approximation given in Lemma 1 can be used to compute the expectation of $k$-cut numbers in many deterministic or random trees. We give three examples: complete binary trees, split trees, and Galton-Watson trees.

Complete binary trees
Let $T_n^{\mathrm{bi}}$ be a complete binary tree with $n = 2^{m+1} - 1$ nodes, i.e., its height is $m$. Observe that for a node at depth $i$, i.e., at distance $i$ to the root, to be an $r$-record we require that it has been cut $r$ times before any of the $i$ nodes above it have been cut $k$ times. This has exactly the same probability as for the $(i+1)$-th node in a path to be an $r$-record. Hence the random variable $I_{r,i+1}$ in Lemma 1 is also the indicator that a node at depth $i$ is an $r$-record. Then by Lemma 1, for $r \le k$, Thus only the one-records matter as in the case of $P_n$ and

Split trees
Split trees were first defined by Devroye [9] to encompass many families of trees that are frequently used in algorithm analysis, e.g., binary search trees. The random split tree $T_n^{\mathrm{sp}}$ has parameters $b, s, s_0, s_1, V$ and $n$ which are required to satisfy the inequalities To define the random split tree consider an infinite $b$-ary tree $U$. The split tree $T_n^{\mathrm{sp}}$ is constructed by distributing $n$ balls among nodes of $U$. For a node $u$, let $n_u$ be the number of balls stored in the subtree rooted at $u$. Once $n_u$ are all decided, we take $T_n^{\mathrm{sp}}$ to be the largest subtree of $U$ such that $n_u > 0$ for all $u \in T_n^{\mathrm{sp}}$. Let $V_u = (V_{u,1}, \dots, V_{u,b})$ be the independent copy of $V$ assigned to $u$. Let $u_1, \dots, u_b$ be the child nodes of $u$. Conditioning on $n_u$ and $V_u$, if $n_u \le s$, then $n_{u_i} = 0$ for all $i$; if $n_u > s$, then $(n_{u_1}, \dots, n_{u_b}) \sim \mathrm{Mult}(n_u - s_0 - b s_1, V_{u,1}, \dots, V_{u,b}) + (s_1, s_1, \dots, s_1)$, where $\mathrm{Mult}$ denotes the multinomial distribution, and $b, s, s_0, s_1$ are integers satisfying (5.3).
In the setup of split trees (and other random trees), we obtain $K(T_n^{\mathrm{sp}})$ by picking a random tree $T_n^{\mathrm{sp}}$ and a random $k$-cut of it. We let $K_r(T_n^{\mathrm{sp}})$ be the total number of $r$-records, just as we did for fixed trees.
In the study of split trees, the following condition is often assumed: Let $N$ be the number of nodes in T Holmgren [20, Thm. 1.1] also showed that for $k = 1$, condition A implies that $K(T_n^{\mathrm{sp}})$ converges to a weakly 1-stable distribution after normalization, and that Lemma 14. Let $T_n^{\mathrm{sp}}$ be a split tree defined as above. Assuming condition A, we have $(1 \le r \le k)$, (5.5) otherwise we say it is bad. Let $B_n^{\mathrm{sp}}$ be the number of bad nodes in $T_n^{\mathrm{sp}}$. It is known that there are not so many bad nodes. More specifically, by [21][Thm.
Let $X_n^{\mathrm{sp}}$ be the number of $r$-records that are also good nodes. By (5.6), which is negligible. Thus it suffices to prove the lemma for $X_n^{\mathrm{sp}}$. By Lemma 1 and the definition of good nodes, we have Taking expectations, we get (5.5).

Galton-Watson trees
A Galton-Watson tree $T^{\mathrm{gw}}$ is a random tree that starts with the root node and recursively attaches a random number of children to each node in the tree, where the numbers of children are drawn independently from the same distribution $\xi$. A conditional Galton-Watson tree $T_n^{\mathrm{gw}}$ is $T^{\mathrm{gw}}$ restricted to size $n$. Conditional Galton-Watson trees have been well-studied, see, e.g., [25]. We assume throughout that $\mathbb{E}\xi = 1$ and $\sigma^2 := \mathrm{Var}(\xi) \in (0, \infty)$. Let $Z_i(T_n^{\mathrm{gw}})$ be the number of nodes of depth $i$ (at distance $i$ to the root). Let $H(T_n^{\mathrm{gw}})$ be the height of $T_n^{\mathrm{gw}}$. Then, by Lemma 1, conditioning on $T_n^{\mathrm{gw}}$, It has been shown that [24, Theorem 1.13] uniformly for all $i \ge 1$ and $n \ge 1$. It is also well-known that $H(T_n^{\mathrm{gw}})$ is of the order $\sqrt{n}$. More precisely, there exist constants $C'$ and $c'$ such that uniformly for all $n \ge 1$.
In fact, we conjecture that $n^{1-\frac{1}{2k}}$ is actually the right order of $\mathbb{E}K(T_n^{\mathrm{gw}})$. Let $v_0, \dots, v_{n-1}$ be the nodes of $T_n$ in depth first order. (In other words, $v_0$ is the root of the tree. Assuming that $v_0$ has $d$ subtrees $T_1, \dots, T_d$ attached to it, then $v_1, \dots, v_{n-1}$ are the nodes of $T_1$ in depth first order, followed by nodes in $T_2$ in depth first order, and so on, until $T_d$. This continues recursively.) Let $D_n(i)$ be the depth of $v_i$. As is well-known, when $\mathbb{E}\xi = 1$ and $\sigma^2 := \mathrm{Var}(\xi) \in (0, \infty)$, and $e(t)$ is a Brownian excursion. See [28] for details. Therefore, the expected number of $k$-records in $T_n^{\mathrm{gw}}$ conditioned on $T_n^{\mathrm{gw}}$ satisfies Thus it is natural to expect that This has indeed been proved with other methods [24, Theorem A.1]. As a result, we conjecture that for $r \in \{1, \dots, k\}$, and as a result, we further conjecture that
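The construction of a conditional Galton-Watson tree can be sketched by rejection sampling: grow unconditioned trees and keep the first one with exactly $n$ nodes. This is only an illustrative sketch, not an efficient sampler; the function names are hypothetical, and the example offspring law (0 or 2 children with probability 1/2 each, so $\mathbb{E}\xi = 1$ and $\mathrm{Var}(\xi) = 1$) is just one choice satisfying the assumptions above.

```python
import random


def galton_watson_tree(offspring, max_nodes, rng):
    """Grow a Galton-Watson tree breadth-first and return its parent array.

    offspring(rng) draws one sample of the number of children.
    Returns None as soon as the tree would exceed max_nodes nodes.
    """
    parent = [-1]                     # node 0 is the root
    v = 0
    while v < len(parent):
        for _ in range(offspring(rng)):
            parent.append(v)
            if len(parent) > max_nodes:
                return None
        v += 1
    return parent


def conditional_galton_watson_tree(offspring, n, rng):
    """Rejection sampling: retry until the tree has exactly n nodes.

    The caller must choose n so that size n has positive probability
    (for binary_offspring below, n must be odd), else this never returns.
    """
    while True:
        tree = galton_watson_tree(offspring, n, rng)
        if tree is not None and len(tree) == n:
            return tree


def binary_offspring(rng):
    """0 or 2 children with probability 1/2 each (mean 1, variance 1)."""
    return 2 * rng.randrange(2)
```

The returned parent array satisfies `parent[v] < v`, so it can be passed to any routine that processes ancestors before descendants.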

Some auxiliary results
In this section, we collect some lemmas that are used in previous sections.
Lemma 18. For $a > 0$, $b > 0$ and $k \ge 2$, Moreover, $\xi_k(a, b)$ is monotonically decreasing in both $a$ and $b$.