The Aldous-Shields model revisited (with application to cellular ageing)

In Aldous and Shields (1988), a model for a rooted, growing random binary tree was presented. For some c>0, an external vertex splits at rate c^(-i) (and becomes internal) if its distance from the root (depth) is i. For c>1, we reanalyse the tree profile, i.e. the numbers of external vertices in depth i=1,2,.... Our main results are concrete formulas for the expectation and covariance structure of the profile. In addition, we present an application of the model to cellular ageing. Here, we assume that nodes in depth h+1 are senescent, i.e. do not split. We obtain a limit result for the proportion of non-senescent vertices for large h.


Introduction
Trees arise in several applied sciences: in linguistics and biology, trees describe the relationship of items (languages, species), and in computer science, trees are used as data structures, e.g. for sorting. Randomizing the input leads to random trees, which are the object of a large body of research. For applications in biology, see e.g. [6,14]. Here, important examples are trees arising from branching processes (e.g. Yule trees). In computer science, prominent examples are search trees; see e.g. [18,10].
In this note, we are concerned with an application of random trees in cellular biology. In the 1960s it was known that eukaryotic cells have a limited replication capacity [16]. The number of generations until cells do not proliferate any more is today known as the Hayflick limit, and the phenomenon that cells lose their ability to proliferate is called cellular senescence. The molecular basis for cellular senescence was uncovered starting in the 1970s. A theory was developed which argued that during each round of replication, the telomeres (which are the end parts of each chromosome) are shortened due to physical constraints of the DNA copying mechanism [20]. In humans, these telomeres are a more than 1000-fold repetition of the base sequence TTAGGG, and up to 200 bases are lost in each replication round [17]. Most importantly, telomeres have a stabilizing effect on the DNA. The DNA repair mechanism of a cell must be able to distinguish between usual DNA breaks (which it is supposed to repair) and the telomeres (which it is supposed to ignore). Hence, when telomeres become shorter, this stabilizing effect ceases and ageing occurs. It can be observed that telomeres shrink from 15 kilobases at birth to less than 5 kilobases during a lifetime [24]. However, the enzyme telomerase is known to be able to decrease the loss of telomeres during replication. This enzyme has been found to be active in stem cells and cancer cells, both of which are cell types with an (almost) unbounded replication potential. A deeper understanding of the role of telomeres and telomerase is an active field of research because of the medical implications for ageing and cancer. In particular, research in this field was awarded the Nobel Prize in medicine in 2009 [25]. We study the model of random trees introduced by Aldous and Shields in [1] (hereafter referred to as [AS]) and extend it for an application to cellular ageing.
Given some c > 0 and a full binary tree T, the model introduced in [AS] describes the evolution of the vertices of the tree. Here, we distinguish internal, external and prospective vertices. At t = 0, the root is the only external vertex (and there are no internal vertices). An external vertex u ∈ T in depth |u| becomes internal at rate $c^{-|u|}$. At the time it becomes internal, its two daughter vertices in depth |u| + 1 become external. We present our result on the profile of the Aldous-Shields model in Theorem 1.
For our application to cellular senescence, we will analyze a relative of the Aldous-Shields model for c > 1. Here, a critical depth h is fixed, and only external vertices in depth at most h can become internal. External vertices in depth h + 1 never become internal. External vertices can be thought of as cells, and the depth of a vertex is the number of generations from the first cell. Vertices in depth at most h represent proliferating cells, because they are able to produce offspring (i.e. daughter cells). Vertices in depth h + 1 represent senescent cells. This model has two features which appear to be realistic in cellular senescence. First, the rate of cell proliferation decreases with the generation of a cell, parameterized by c > 1. Second, cells which have already split too often lose their ability to proliferate at all. For this model, we obtain a limit result for the frequency of proliferating cells in Theorem 2.
The paper is organized as follows: In Section 2, we state our results on the Aldous-Shields model. The application to cellular senescence is carried out in Section 3, where we also give an overview of other models for cellular senescence in the literature. Section 4 contains the proofs for our results on the Aldous-Shields model (Theorem 1), and in Section 5, we give proofs for the results on the model of cellular ageing (Theorem 2).

Model and results
We start by introducing some notation. Let $T$ be the complete binary tree, given through
$$T := \bigcup_{n=0}^{\infty} T_n, \qquad T_0 := \{\emptyset\}, \qquad T_n := \{0,1\}^n \text{ for } n \ge 1.$$
We refer to elements of $T$ as vertices and identify $u \in T_n$ with a word of length $n$ over the alphabet $\{0,1\}$, whose $i$th letter is $u_i$, $n \ge 1$. The vertex $\emptyset$ is the root of the tree and each vertex $u \in T$ has two daughter vertices, $u0$ and $u1$. (We make the convention that $\emptyset 0 := 0$, $\emptyset 1 := 1$.) For $u \in T$ we set $|u| = n$ iff $u \in T_n$. We say that $u$ is an ancestor of $v$ if $|u| < |v|$ and there are $i_1, \dots, i_{|v|-|u|} \in \{0,1\}$ with $v = u i_1 \cdots i_{|v|-|u|}$. The ancestor relation is a transitive order relation on $T$, and we write $u \prec v$ iff $u$ is an ancestor of $v$.
Given $Y(t) = y \in \{0,1\}^T$ and $u \in T$ with $y_u = 1$, it jumps to $(\tilde y_v)_{v \in T}$, given by
$$\tilde y_v = \begin{cases} 0, & v = u, \\ 1, & v \in \{u0, u1\}, \\ y_v, & \text{otherwise}, \end{cases}$$
at rate $c^{-|u|}$. In this case, we say that vertex $u$ splits.
Remark 2.2 (Internal and external vertices). Let $Y = (Y(t))_{t \ge 0}$ be the Aldous-Shields model and $Y = Y(t)$ for some $t \ge 0$. It is important to note that the dynamics is such that any path $\emptyset, i_1, i_1 i_2, \dots \in T$ with $i_1, i_2, \dots \in \{0,1\}$, starting at the root, has exactly one element $u$ with $Y_u = 1$. In particular, the sets of internal, external and prospective vertices are disjoint.
Definition 2.3 (Profile). Let $Y = (Y(t))_{t \ge 0}$ be the Aldous-Shields model and $Y = Y(t)$ for some $t \ge 0$. We define
$$X_n := \sum_{u \in T_n} Y_u, \qquad \widetilde X_n := 2^{-n} X_n, \tag{2.1}$$
the total number of external vertices and the relative proportion of external vertices in depth $n$, respectively. The vector $(X_n)_{n=0,1,2,\dots}$ is also called the profile, and $X := \sum_{n=0}^{\infty} X_n$ is the total number of external vertices.
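For illustration, the profile can be simulated directly. The following is a minimal Python sketch that assumes only the dynamics described above: since every external vertex in depth n splits at the same rate c^(-n), the profile is itself a Markov chain in which, at total rate X_n c^(-n), one external vertex in depth n is replaced by two external vertices in depth n+1. The function name `simulate_profile` and its parameters are our own illustrative choices and do not appear in [AS].

```python
import random


def simulate_profile(c, t_end, rng):
    """Gillespie-type simulation of the profile (X_0, X_1, ...) up to t_end.

    Hypothetical helper for illustration: entry n of the returned list is
    the number of external vertices in depth n at time t_end.
    """
    profile = [1]   # at t = 0 the root is the only external vertex
    t = 0.0
    while True:
        # total split rate contributed by each depth
        rates = [x * c ** (-n) for n, x in enumerate(profile)]
        total = sum(rates)              # positive: the deepest level is never empty
        t += rng.expovariate(total)     # waiting time until the next split
        if t > t_end:
            return profile
        # depth of the splitting vertex, chosen proportionally to its rate
        n = rng.choices(range(len(profile)), weights=rates)[0]
        if n + 1 == len(profile):
            profile.append(0)           # open a new deepest level
        profile[n] -= 1                 # the vertex becomes internal
        profile[n + 1] += 2             # its two daughters become external
```

A call such as `simulate_profile(2.0, 50.0, random.Random(1))` returns one realization of the profile at time 50 for c = 2. Because a split removes mass 2^(-n) from depth n and adds 2 · 2^(-(n+1)) to depth n+1, the weighted sum of 2^(-n) X_n over all depths equals 1 for every realization, which gives a quick sanity check of the sketch.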
Remark 2.4 (Dependence on c). The behaviour of the Aldous-Shields model strongly depends on c. A larger c implies that the profile is more concentrated around certain depths. This is because a larger c means that external vertices in smaller depth have a higher chance to be the next to split. See Figure 1 for an illustration. Two values of c are of particular importance in applications from computer science: for c = 1 and X = n, the set of external vertices is a binary search tree with n external vertices. For c = 2 and X = n, the set of external vertices is a digital search tree with n external vertices; see e.g. [10].
Remark 2.5 (Relative frequencies). We observe that for all $t$,
$$\sum_{n=0}^{\infty} \widetilde X_n(t) = 1 \tag{2.2}$$
almost surely. To see this, note that $\widetilde X_0(0) = 1$ and $\widetilde X_n(0) = 0$ for $n > 0$, i.e. (2.2) holds at $t = 0$. Additionally, assume that some $u$ with $|u| = n$ splits at time $t$. Then, we have that
$$\widetilde X_n(t) = \widetilde X_n(t-) - 2^{-n}, \qquad \widetilde X_{n+1}(t) = \widetilde X_{n+1}(t-) + 2 \cdot 2^{-(n+1)} = \widetilde X_{n+1}(t-) + 2^{-n}.$$
In particular, every split leaves $\sum_{n=0}^{\infty} \widetilde X_n$ unchanged, which shows (2.2).
Remark 2.6 (Notation). In our results, we will give asymptotics of moments of $X_{n+i}(tc^n)$ for large $n$. Generally, for two sequences $(x_n)_{n=1,2,\dots}$ and $(y_n)_{n=1,2,\dots}$, which may depend on other parameters, we write
Theorem 1 (Moments of the profile and their limits). Let $c > 1$.
and $a_0 = b_0 = 1$ (with the convention that an empty product equals 1). For negative $k$, we set Then, for $t > 0$ and $i > -n$,
Remark 2.7 (Convergence and covariance).
1. It is immediate from the Theorem that in probability, for all $t > 0$ and all $c > 1$.
2. The covariances given in (2.5) show a phase transition at $c = \sqrt{2}$. Such a phase transition is already known from results by [AS] and [9]. However, these papers do not give explicit formulas for the covariance structure.
Remark 2.8 (Connection to results from Aldous and Shields (1988)). In [AS], the evolution of the vector $(\widetilde X_n(t))_{n=0,1,2,\dots}$ is studied. In their theorems, a law of large numbers for the convergence of $\widetilde X_n(t)$ to some deterministic limit $x_n(t)$ is stated and proved using martingale methods. Their result implies that (2.7) holds even almost surely on compact time intervals. In particular, they claim that the limit $x_i(t)$ of $\widetilde X_{n+i}(tc^n)$ must satisfy $x_i(t) = x_{i+1}(ct)$, which clearly holds for the right hand side of (2.7). In addition, they show that a suitably rescaled process, $2^{n/2}(\widetilde X_{n+i}(t) - x_i(t))_{t \ge 0}$, converges weakly to a diffusion as $n \to \infty$ for $c > \sqrt{2}$. The rescaling factor $2^{n/2}$ can also be seen from the above Theorem. Moreover, (2.5) shows that a convergence of $c^n(\widetilde X_{n+i}(t) - x_i(t))_{t \ge 0}$ to a diffusion can be conjectured for $1 < c < \sqrt{2}$.
Remark 2.9 (Connection to work of Dean and Majumdar (2006)). In [9], the total number of external vertices, $X$, was studied in the context of the Aldous-Shields model on an $m$-ary tree. In the binary case, a functional equation (their equation (2)) for the Laplace transform of $X(t)$ was shown to hold true. This equation uses the following fact: Given that $T$ is the random time of the first split in the model, it is clear that $T$ is mean one exponential and, in addition,
$$X(t) \stackrel{d}{=} 1_{\{T > t\}} + 1_{\{T \le t\}} \bigl( X'((t-T)/c) + X''((t-T)/c) \bigr),$$
where $X'$ and $X''$ are independent of $T$ and of each other and distributed like $X$. From their identity on Laplace transforms, [9] show the phase transition for the variance of the number of occupied vertices at $c = \sqrt{2}$, which is also seen from Theorem 1.
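The first moment of the profile can also be checked numerically. The sketch below rests on an assumption of ours: we read $y_n(t)$ as $E[\widetilde X_n(t)]$, which by Remark 2.2 is the probability that the external vertex on a fixed path from the root sits in depth n. Under this reading, the depth along a fixed path is a pure-birth chain with rates c^(-n), so that $y_n' = c^{-(n-1)} y_{n-1} - c^{-n} y_n$ with $y_0(0) = 1$. The code integrates this system with a classical Runge-Kutta scheme and compares it with the standard closed form for pure-birth chains with distinct rates. This closed form is a textbook formula and is not claimed to coincide term by term with the notation of Theorem 1; all function names are illustrative.

```python
import math


def mean_profile_ode(c, t_end, depth, steps=2000):
    """Integrate y_n' = c**-(n-1) * y_{n-1} - c**-n * y_n by classical RK4.

    Returns [y_0(t_end), ..., y_depth(t_end)]. The system is triangular,
    so truncating at `depth` does not change the returned entries.
    """
    def rhs(y):
        out = []
        for n in range(depth + 1):
            inflow = c ** (-(n - 1)) * y[n - 1] if n > 0 else 0.0
            out.append(inflow - c ** (-n) * y[n])
        return out

    y = [1.0] + [0.0] * depth          # y_0(0) = 1, y_n(0) = 0 for n > 0
    dt = t_end / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = rhs([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = rhs([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def mean_profile_closed_form(c, t, n):
    """Pure-birth formula for distinct rates lam_j = c**-j (requires c != 1)."""
    lam = [c ** (-j) for j in range(n + 1)]
    prefactor = math.prod(lam[:n])     # rates of the n jumps needed to reach depth n
    total = 0.0
    for j in range(n + 1):
        denom = math.prod(lam[l] - lam[j] for l in range(n + 1) if l != j)
        total += math.exp(-lam[j] * t) / denom
    return prefactor * total
```

For c = 2 and t = 4, the two functions should agree up to the discretization error of the integrator for each depth n ≤ 5, and the entries returned by `mean_profile_ode` sum to at most 1, in line with Remark 2.5.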

Application: cellular ageing
The first mathematical model for cellular senescence was given in [17]. It takes several biological facts into account. When DNA is copied, the double helix is unfolded and both strands of DNA are copied. Only in one of the two strands are there physical constraints by which the end of a chromosome cannot be perfectly copied. This shortening of telomeres is independent for all chromosomes. In [17], it is assumed that telomeres have a fixed initial length, which decreases by a fixed amount at each proliferation event for one of the daughter cells, and that proliferation occurs along a full binary tree. If the length of a telomere of one chromosome falls below a threshold, a cell cannot replicate any more and becomes senescent. This threshold takes the Hayflick limit into account, which states that a cell line can only live for a limited number of generations before it becomes senescent. The model by [17] was extended in several directions. A stochastic amount of loss of telomeres was studied in [2]. In [3] and [19], the binary tree of proliferating cells from [17] was replaced by a branching model. In particular, [19] took cell death into account, with different death rates above and below a critical threshold of telomere length. The age structure of cells (i.e. the phase of the cell cycle a cell is in) is taken into account by [11,12]. Moreover, [4] extend the model of [17] by explicitly taking telomerase activity (which is present in stem cells and cancer cells) into account.
The idea to use the Aldous-Shields model for cellular ageing was influenced by the following recent results:
1. In [7], a model is proposed which distinguishes two states of telomeres: capped and uncapped. Only in the capped state is proliferation of the cell still possible. In somatic cells, an uncapped telomere cannot be transformed back to the capped state, which leads to senescent cells; see the model of [22]. In stem and tumor cells, telomerase is (among other things) responsible for transitions from uncapped back to capped telomeres. Following [23], the transition rate from the uncapped to the capped state in stem cells decreases with shorter telomeres.
2. In data, it has been observed that proliferating cells can behave differently. Motivated by data from [5,8,15], it is argued in [21] that the rate of proliferation decreases for shorter telomeres. Their model produces a Gompertzian growth model, which is known to fit empirical data for somatic and tumor cells.
In stem cells, the decreasing rate at which an uncapped telomere re-enters the capped state for shorter telomeres from [23] shows exactly the behaviour of the Aldous-Shields model: cells with a long replicative history proliferate more slowly. While [21] use a linear decrease in replication rate, depending on telomere length, the Aldous-Shields model uses a geometric decay of the proliferation rate. Note that short telomeres can be seen as a form of damage. In [13], models for cellular damage were introduced. In their model, cells pass damage on to their daughter cells. This model and the Aldous-Shields model are among the analytically tractable ones.
We state our model of cellular ageing: at rate $rc^{-|u|}$. (Note that vertices $u \in T_{h+1}$ do not split.) Informally, every external vertex $u$ in this process represents a cell. If $|u| = n$, we say that the cell is in generation $n$. The process starts with a single mother cell, which proliferates at rate $r$. All cells up to generation $h$ from the mother cell follow the usual dynamics of the Aldous-Shields model (with time rescaled by a factor of $r$), such that cells in generation $n$ proliferate at rate $r c^{-n}$. If a cell is in generation $h + 1$ from the mother cell, its telomeres have reached the Hayflick limit and the cell is not able to proliferate any more.
is of particular importance. Here, $Z_p(t)$ and $Z_s(t)$ are the numbers of proliferating and senescent cells at time $t$, respectively.
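The model of cellular ageing can be explored by simulation in the same spirit. The following Python sketch assumes the dynamics described informally above: cells in generation n ≤ h divide at rate r c^(-n), and cells in generation h+1 never divide. It further assumes that the quantity of interest is the fraction Z_p/(Z_p + Z_s) of proliferating cells, which is our reading of Definition 3.2. The function names are hypothetical.

```python
import random


def simulate_ageing(c, r, h, t_end, rng):
    """Gillespie-type simulation of the cell counts per generation 0..h+1.

    Cells in generation n <= h divide at rate r * c**(-n); cells in
    generation h + 1 are senescent and never divide.
    """
    cells = [0] * (h + 2)
    cells[0] = 1                        # single mother cell
    t = 0.0
    while True:
        rates = [cells[n] * r * c ** (-n) for n in range(h + 1)]
        total = sum(rates)
        if total == 0.0:                # every cell is senescent
            return cells
        t += rng.expovariate(total)     # waiting time until the next division
        if t > t_end:
            return cells
        # generation of the dividing cell, chosen proportionally to its rate
        n = rng.choices(range(h + 1), weights=rates)[0]
        cells[n] -= 1
        cells[n + 1] += 2


def proliferating_fraction(cells):
    """Assumed reading of L: Z_p / (Z_p + Z_s) for one realization."""
    z_s = cells[-1]                     # senescent cells (generation h + 1)
    z_p = sum(cells[:-1])               # cells that can still divide
    return z_p / (z_p + z_s)
```

Averaging `proliferating_fraction` over many runs at times of order c^h gives an empirical counterpart to the limit discussed in Theorem 2. For very large times, every run ends with all 2^(h+1) cells senescent and the fraction equal to 0.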
Theorem 2 (Frequency of replicating cells). For $Z$ and $L$ as in Definitions 3.1 and 3.2, in probability, for all $t > 0$.
Proof. To see that $E$ contains the eigenvectors of $A$, note that for $i \ge j$
To see that $E$ and $F$ are inverse to each other, it follows from the definition of $a_k$ and $b_k$ that
For $i > j$, we set $n := i - j > 0$ and obtain
where we have reversed the order of the summands in the last equality. We rewrite $(c^{l} - 1) \cdots (c^{l'} - 1) = \prod_{j=l}^{l'} (c^j - 1) = 1$ for $l > l'$ and claim that for all $n > 0$ $\sum_{k=0}^{n}$
which implies that $F$ and $E$ are inverse to each other. We use induction and note that the assertion is clear for $n = 1$. Given it is true for $n$, we have
where we have used the induction hypothesis in the second and in the last equality. Hence, we have shown (4.3) and the proof is complete.

First order structure; proof of (2.4)
By linearity, we can now explicitly solve (4.2) using Lemma 4.1. Since $E$ contains the eigenvectors of $A$ and $F$ is inverse to $E$, we immediately write, using $y(t) := (y_0(t), y_1(t), y_2(t), \dots)$ and $D$ :
because $z(0) = (1, 0, 0, \dots)$ as the process starts with $Y(0) = (1_{u=\emptyset})_{u \in T}$. We have shown the first part of (2.4). In order to prove the second part, fix $i, k, t$ and note that $n \mapsto |a_k| b_{n+i-k} e^{-c^{-i+k} t}$ is increasing with a summable limit. Hence, by dominated convergence,

Second order structure; proof of (2.5)
Now we come to the second order structure. Similar to the definition of $y_n$ in (4.1), we set for $n \le n'$ $y_{n,n',m}(t)$
In order to see the connection of $y_{n,n',m}(t)$ and $\mathrm{COV}[X_n(t), X_{n'}(t)]$, we define the depth of the most recent common ancestor of $u, u'$ as
We write
$$\cdots = 2^{n+n'} \sum_{m=0}^{n} 2^{-((m+1) \wedge n)} y_{n,n',m}(t). \tag{4.5}$$
Using the last equations we now prove (2.5) in three steps. First, we give a representation of $y_{n,n',m}$ in terms of a functional of $T_m$. Second, we derive the asymptotics of $y_{n+i,n+i',m}(tc^n)$ for large $n$ using this representation. Last, we plug these asymptotics into (4.4).
Step 1 (Exact representation of $y_{n,n',m}(t)$). In this step we show that for $m < n \le n'$
$y_{n,n',m}(t) =$
and for $m = n \le n'$
$$y_{n,n',m}(t) = \delta_{n,n'} y_n(t) - y_n(t) y_{n'}(t), \tag{4.7}$$
where $\delta_{n,n'}$ is Kronecker's $\delta$.
Proof. For (4.6), observe that $Y_{0^n}(t)$ and $Y_{0^m 1^{n'-m}}(t)$ are independent, given $T_m$. Moreover, it is
For (4.7), note that $m = n$ implies that $Y_{0^n}(t) Y_{0^m 1^{n'-m}}(t) = \delta_{n,n'} Y_{0^n}(t)$ and the result follows.

Proof of Theorem 2
The parameter $r$ is only a rescaling of time. Hence, we can safely assume $r = 1$ in our proof. Let $Y = (Y(t))_{t \ge 0}$ be the Aldous-Shields model with parameter $c$ and $X_n(t)$ as in (2.1). Defining
it is important to note that $X_{h+1}(t) = \#\{u \in T_{h+1} : \exists v : u \preceq v, Y_v(t) = 1\}$, almost surely; see also Remark 2.5. This implies that we can couple $Y$ and $Z$ in the sense that $(X_1(t), \dots, X_h(t), X_{h+1}(t))_{t \ge 0}$
$2 a_k e^{-c^{-i+k} t}$ in probability, for all $t > 0$. Using the last two limits in the definition of $L$ in (3.1) gives the result.