Universality of clonal dynamics poses fundamental limits to identify stem cell self-renewal strategies

How adult stem cells maintain self-renewing tissues is commonly assessed by analysing clonal data from in vivo cell lineage-tracing assays. To identify strategies of stem cell self-renewal requires that different models of stem cell fate choice predict sufficiently different clonal statistics. Here, we show that models of cell fate choice can, in homeostatic tissues, be categorized by exactly two ‘universality classes’, whereby models of the same class predict, under asymptotic conditions, the same clonal statistics. Those classes relate to generalizations of the canonical asymmetric vs. symmetric stem cell self-renewal strategies and are distinguished by a conservation law. This poses both challenges and opportunities to identify stem cell self-renewal strategies: while under asymptotic conditions, self-renewal models of the same universality class cannot be distinguished by clonal data only, models of different classes can be distinguished by simple means.


Introduction
Adult stem cells are the key players for maintaining and renewing biological tissue, due to their ability to persistently produce tissue cells through cell division and differentiation (National Institute of Health, 2009). For maintaining tissues in a homeostatic state, it is crucial that stem cells adopt suitable self-renewal strategies, a pattern of stem cell fate choices that balances proliferation and differentiation; otherwise, imbalanced proliferation may lead to hyperplasia and cancer. Therefore, the understanding and identification of stem cell self-renewal strategies has been a major goal of stem cell biology ever since the discovery of adult stem cells.
Classically, two stem cell self-renewal strategies have been proposed (Potten and Loeffler, 1990;Simons and Clevers, 2011a): following the Invariant Asymmetric division (IA) strategy, stem cells undertake only asymmetric divisions, whose outcome is one differentiating cell and one stem cell as daughter cells. The other proposed strategy, Population Asymmetry (PA) (Potten and Loeffler, 1990;Simons and Clevers, 2011a;Watt and Hogan, 2000;Klein and Simons, 2011), features additionally symmetric divisions, which produce either two stem cells or two differentiating cells as daughters, yet in balanced proportions. Both patterns of cell fate choice leave the number of cells on average unchanged and thus can maintain homeostasis. Assessing stem cell self-renewal strategies experimentally is difficult in vivo, since direct observation of cell divisions is rarely possible. Yet, through genetic cell lineage-tracing assays, the statistics of clones -the progeny of individual cellscan be obtained, and via mathematical modeling assessing cell fate dynamics became possible. With such an approach several studies suggested that population asymmetry prevails in many mouse tissues (e.g. Clayton et al., 2007;Lopez-Garcia et al., 2010;Simons and Clevers, 2011b;Doupé et al., 2012;Klein et al., 2010).
However, the interpretation of those studies has been challenged by a suggested alternative selfrenewal strategy, called Dynamic Heterogeneity (DH), featuring some degree of cell fate plasticity (Greulich and Simons, 2016). In this model, all stem cell divisions are asymmetric, yet it is in agreement with the experimental clonal data that had previously been shown to agree also with the population asymmetry strategy. Thus, those two strategies are not distinguishable in view of the clonal data.
This raises the question to what extent different stem cell self-renewal strategies can be distinguished at all via clonal data (Klein and Simons, 2011;Greulich, 2019). Here, we address this question by studying models for stem cell fate choice, which define the self-renewal strategies, in their most generic form. We show that many cell fate models predict, under asymptotic conditions, the same clonal statistics and thus cannot be distinguished via clonal data from cell lineage-tracing experiments. In particular, we find that there exist two particular classes of stem cell self-renewal strategies: one class of models which all generate an Exponential distribution of clone sizes (the number of cells in a clone) after sufficiently large time, and one which generates a Normal distribution under sufficiently fast stem cell proliferation. Crucially, these two classes are not differentiated via the classical definitions of symmetric and asymmetric stem cell divisions, but by whether or not a subset of cells is conserved. These classes thus bear resemblance to 'universality classes' known from statistical physics, as suggested in Klein and Simons, 2011. This leads us to a more generic, and in this context more useful, definition of the terms 'symmetric' and 'asymmetric' divisions. Notably, however, we find that the conditions for the emergence of universality are not always fulfilled in real tissues, which provides chances, but also further challenges, for the identification of stem cell fate choices in homeostatic tissues.

Strategies for stem cell self-renewal
The two classical stem cell self-renewal strategies, Invariant Asymmetry (IA) and Population Asymmetry (PA) (Potten and Loeffler, 1990;Simons and Clevers, 2011a;Watt and Hogan, 2000;Klein and Simons, 2011), are commonly described in terms of two cell types: stem cells (S) which can self-renew (i.e. divide without reducing their potential to divide in the future); and differentiating cells (D). Both strategies can be expressed in terms of a single parametrized stochastic model, a multi-type branching process (Haccou et al., 2005), defined by the outcomes of cell divisions (the cell fate choices), SÀ ! l S þ S with probability r S þ D with probability 1-2r D þ D with probability r 8 < : ; (1) where cells of type S divide with rate l. Here, a daughter cell configuration S þ S corresponds to symmetric self-renewal division and D þ D to symmetric differentiation, while daughter cells of different type, S þ D, marks an asymmetric division. In the basic model version, a cell of type D is eventually lost with rate g, DÀ ! g ; (corresponding to death, shedding, or emigration of D-cells), while other versions may include the possibility of limited proliferation as committed progenitor cells. The two self-renewal strategies, IA and PA, are distinguished by the value of the symmetric division fraction r: the PA model corresponds to any 0<r 1 2 ; the IA model is defined by r ¼ 0, that is, only asymmetric divisions occur.
To maintain homeostasis, the number of cells must stay, on average, constant. Thus cells following the PA strategy must regulate the probabilities of symmetric self-renewal and differentiation to be exactly equal, whereas for the IA model this is trivially assured. However, only for the IA model is the number of stem cells strictly conserved, that is, no gain or loss of stem cells is possible.
A way to assess self-renewal strategies experimentally is via genetic cell-lineage tracing (Kretzschmar and Watt, 2012;Blanpain and Simons, 2013): By marking single cells with an inheritable genetic marker (through a Cre-Lox system [Soriano, 1999;Sauer, 1998]) each cell's progeny, called a clone, which retain that marker, can be traced. The number of cells per clone, that is the clone size, is measured and the statistical frequency distribution of clone sizes (clone size distribution) determined. To test the cell fate choice models on that data, one evaluates the models with a single cell as initial condition and samples the outcome in terms of the final cell numbers -the size of a virtual clone. In the basic version of the model (i.e. when DÀ ! g ;), the IA and PA models predict, respectively, a Poisson and an Exponential clone size distribution for large times (Klein and Simons, 2011;Antal and Krapivsky, 2010) (see also the Appendix, 'Invariant Asymmetry and Population Asymmetry models'). Thus, they are fundamentally different and can easily be distinguished when compared with clonal data. By a series of lineage-tracing experiments it was confirmed that Exponential clone size distributions prevail for most mouse tissues, which thus exclude the IA model and support the PA strategy (Clayton et al., 2007;Lopez-Garcia et al., 2010;Simons and Clevers, 2011b;Doupé et al., 2012;Klein et al., 2010). While this seemed to settle the case in favour of the PA strategy, at least for most adult mouse tissues, this was challenged by a third type of strategy, the DH model (Greulich and Simons, 2016). Motivated by the emerging view of prevailing cell plasticity (Blanpain and Fuchs, 2014;Tetteh et al., 2015;Tetteh et al., 2016;Donati and Watt, 2015), the DH model considers the possibility of reversible switching between two cell types: where symbols at arrows denote the process rates (frequency of events). This strategy is also capable of maintaining a homeostatic population if g=l ¼ ! S =! D . Notably, the DH model only features asymmetric divisions (in that daughter cells are of different type), like the IA model, yet the DH model predicts clonal statistics that are indistinguishable from the PA model (Greulich and Simons, 2016). This means that in view of the existing clonal data for mouse tissues, the DH model, may as well describe the real cell fate dynamics. More fundamentally, this implies that the PA and DH model cannot be distinguished via plain clonal data, which poses fundamental limitations to the common approach to use lineage tracing for determining cell fate choices. This demonstrates that the classical definition of asymmetric and symmetric divisions is not always suitable to distinguish cell fate strategies in view of clonal data alone. In general, cell fate dynamics may be much more complex than the simplified models described above, as there may exists a plethora of cell (sub-)types in a tissue. However, to what extent would it be possible to distinguish details of potentially rather complex cell fate dynamics models through comparison with clonal data at all? This is only the case if the clonal statistics are sufficiently different. In the following, we study cell fate models in their most generic form, and analyze what clonal statistics would be expected.

Model generalization
Let us consider the dynamics of a generic system of cells, characterized by a number m of possible cell states X i , i ¼ 1; :::; m. We define a cell state here as a group of cells showing common properties (e.g. any cell sub-type classification). Most generally, cells in a state X i may be able to divide, producing daughter cells of any cell states X j and X k (where i ¼ j ¼ k, that is, simple cell duplication, is possible). Furthermore, any cell state X i may turn into another state X j or may be lost (through emigration, shedding, or death). Hence, we can write a generic cell fate model as, cell loss: where i; j; k ¼ 1; :::; m. In this model, l i is the rate of division of cells in state X i and the parameter r jk i corresponds to the proportion of division outcomes producing daughter cells of state X j and state X k ; w ij is the transition rate from state X i to state X j and g i the loss rate from state X i . The dynamics of each cell in Equations 3-5 could depend on the cell environment through spatial, cell-extrinsic regulation of cell fate. However, the clonal statistics of spatial models that include cell-extrinsic regulation of cell fate (models of the voter type [Clifford and Sudbury, 1973]) are, in the long term, the same as for the corresponding branching process models (Haccou et al., 2005), as Equations 3-5 are, except for one-dimensional arrangements of cells (as shown in Klein and Simons, 2011;Bramson and Griffeath, 1980). Here, we are focussing on the long-term clonal statistics of self-renewal strategies, and since this is not affected by cell-extrinsic regulation, for tissues with two-dimensional or three-dimensional arrangements of dividing cells (like epithelial sheets, and volumnar tissue), we wish to keep the analysis simple and therefore choose dynamics (and thus the parameters l i ; ! ij ; r jk i ; g i ) to be independent of the cell environment. In the following, we study the dynamics of cell numbers in each state X i , n i . To gain initial insight into those dynamics, let us first consider the time evolution of the mean cell numbers, n i ¼ hn i i, given by, in which r j i ¼ P k ðr jk i þ r kj i Þ=2 is the probability of having a daughter cell in state X j produced upon division of a cell in state X i . This linear system of differential equations can be written more compactly in terms of the mean cell number vector n ¼ ð n 1 ; n 2 ; :::; n m Þ, with A being the m Â m matrix where we defined the total transition rate k ij ¼ l i 2r j i þ ! ij , combining all transitions from X i to X j by cell divisions and direct transitions, and the local loss rate d i ¼ l i þ P j ! ij þ g i . Models of the form Equations 3-5 are not generally in homeostasis, which in this context is defined by the existence of a stationary state n Ã , with d n Ã =dt ¼ 0, that is (Lyapunov) stable and nontrivial (for a discussion, see the Appendix 'Conditions for homeostasis'). This can in principle be assessed through the spectral properties of A (Å strö m and Murray, 2008), but applying spectral conditions explicitly is unwieldy and difficult to interpret biologically. For a more intuitive view, we interpret the system, Equation 7, as a network (graph): the matrix A can be interpreted as the adjacency matrix of the cell state network. This is a weighted directed graph in which cell states correspond to the graph's nodes and a link from state X i to X j exists where a transition is possible, that is, when k ij >0. The value of k ij also denotes the link weights (diagonal elements of A can be considered as self-links). Now, we note that Equation 7 is linear and cooperative, that is, the off-diagonal elements of matrix A are non-negative, and for such systems more simple and intuitive conditions for homeostasis exist , based on a decomposition into the network's Strongly Connected Component (SCC). An SCC is a sub-graph that groups nodes which are strongly connected, that is, which are mutually connected by paths (more accurately: two nodes, X i and X j are strongly connected if there exists a path from X i to X j and from X j to X i on the network). An example of such a decomposition, which yields an acyclic condensed network that contains SCCs as nodes and directed links between them, is shown in Figure 1.
The stability of systems like Equation 7 is then determined by the dominant eigenvalues k of each strongly connected component k, for k ¼ 1; :::; m S where m S is the number of SCCs, and their topological arrangement (the Perron-Frobenius theorem assures that for adjacency matrices of SCCs of cooperative systems, a unique, real, maximal eigenvalue exists, which is the dominant eigenvalue [Arrow, 1989;Greulich et al., 2019]). In brief, according to Greulich et al., 2019, the conditions for existence of a homeostatic state are that, at the apex of each lineage (the condensed cell state network), there must be an SCC with dominant eigenvalue k ¼ 0, while all SCCs downstream of the former must have k <0 (see detailed discussion in the Appendix, 'Conditions for homeostasis'). Given this structure of homeostatic models, we can define two compartments in the cell state transition network: (1) the (self-) Renewing compartment (R), which is the SCC at the apex of the lineage tree; and (2) the Committed compartment (C), which consists of all SCCs with k <0, that is, those downstream of the apex SCC. Importantly, cells in states forming R have the potential to return to any state within the same compartment and this population maintains itself. Instead, the cell population in C would vanish without external input, since the combined dominant eigenvalue of all those SCCs is negative (it is the maximum of all SCCs' k <0), thus the progeny of each cell in the committed compartment will eventually be lost. We can thereby classify cells as being of a (self-)Renewing type (R) if their state is within R, and of a Committed type (C) if their state is in C. With this coarse-grained classification, a generic homeostatic model can be represented in terms of compartments R and C as, ; (9) where the symbols above arrows are the effective rates of those events, denoting the average frequency at which they occur (loss events R ! ; are not explicitly included, since they can be approximated by a short lived state X d in C, as R ! X d ! ;). To be compatible with a homeostatic condition, it is further required that (i) the R-population remains on average constant ( k ¼ 0), that is, l R r RR ¼ l R r CC þ !, and (ii) the loss rate of C must exceed its proliferation rate ( k <0), that is, g C >l C . Figure 1 shows how a generic homeostatic cell state network can be condensed into an effective model of renewing and committed cell states, according to Equation 9. It has to be noted, however, that the events depicted in Equation 9 are not Markovian, that is, the timing of events is not independent from each other and depends on their history. Thus, the 'rates' l R , l C , ! RC , and g C are not constant rates in the Markovian sense, yet we can define them by the mean frequency of events occurring (see Appendix 'Approximation of generic GIA models' and 'Asymptotic clone size distributions: mathematical analysis').
The formulation in terms of renewing and committed states can help us to gain insights into potential behaviors of generic homeostatic cell fate models. In particular, we define generalized asymmetric divisions as events of the type R ! R þ C, and generalized symmetric divisions as events of the type R ! R þ R (symmetric renewal) and R ! C þ C (symmetric commitment). With these definitions, we can categorize homeostatic cell fate models into two classes: Generalized Invariant Asymmetry (GIA) models are those which only exhibit R ! R þ C divisions in the renewing compartment, while Generalized Population Asymmetry (GPA) are models for which such restriction does not hold. We note that the two classes are equivalently characterized by a conservation law: For GIA models, the number of cells in R is strictly conserved, while for GPA models, no such conservation law holds. Since ¼ 0 is necessary for conservation, the only possible conserved cell states in homeostasis are those in R. Naturally, the previously discussed IA model is a GIA model and the PA model is a GPA model. Notably, the DH model (Equation 2) is of the GPA category, since in that model S and D cells form a single SCC at the apex of the lineage hierarchy, and thus they are both part of R. Therefore, a division S ! S þ D in the DH model, which is asymmetric in the conventional sense, corresponds to R ! R þ R in terms of compartments (Equation 9) and thus it is a generalised symmetric division. According to this classification, PA and DH models are both in the same category (GPA), and indeed, both predict the same type of clone size distribution, an Exponential one (Greulich and Simons, 2016).

Numerical simulation of random cell fate models
To check whether the correspondence between model class, GIA vs. GPA, and predicted clonal statistics holds in general, we analyze the clonal dynamics numerically, by generating and testing a large number of random stochastic models, implemented via random generation of the parameters l i , w ij , g i and r jk i . To simulate clones, we perform stochastic simulations based on the Gillespie algorithm (Gillespie, 1977), assuming a Markov process following the rules of Equation 3-5. We run, for each model, a large number of simulations with initially one cell in the compartment R, thus the cell population of each simulation run represents one clone. Then we sample their outcomes, the total cell numbers per clone (the clone size) n ¼ P i n i , to obtain predictions for clonal statistics, namely the frequency distribution of clone sizes (clone size distribution) and mean clone sizes (see Materials and methods).
We first study the mean clone size of surviving clones (with n>0), n s ¼ hnij n>0 , shown in Figure 2, respectively, for the GIA and GPA models, as a function of time (the final time t ¼ 20=a min where a min is the minimal process rate, a min ¼ minðl 1 ; :::; ! 12 ; :::; d m Þ). We note that indeed a common behavior is seen in each case. While for every simulated GIA model, n s saturates at a plateau value, it steadily increases for every GPA model. This is expected, and can be understood given that clones in a GPA model can go extinct while those in a GIA model not. Assume that there are initially a large number N c of clones, such that the total number of cells is n tot ¼ N c n s . Since the system is homeostatic, it will reach a constant steady state n Ã tot after a sufficient amount of time, meaning that the mean clone size is n s ¼ n Ã tot =N c . If no clones go extinct, as in GIA models, N c is constant and thus n s approaches a constant. However, in non-conserved multi-type branching processes, as GPA models are, the clone number N c decreases through progressive extinction of clones (Haccou et al., 2005), and therefore n s increases, despite the cell population as a whole staying stationary.
The resulting clone size distributions for the two model classes are shown in Figure 3. Here, clones sizes n are rescaled by the mean value n s and compared to an Exponential distribution of unitary mean (green curve). As conjectured, all simulated GPA models shown in panel (b) predict asymptotically the same rescaled clone size distribution, namely a standard Exponential distribution. Deviations exist for small times and small clone sizes, but these deviations vanish in the large time limit (details on the convergence are shown in the Appendix, 'Analysis of the generalized Population Asymmetry model'). This means that different models within the GPA class cannot be distinguished in the long-term limit, since they differ only by the mean clone size, which is a free fit parameter. In analogy to statistical physics, we can categorize them as a universality class (Klein and Simons, 2011), meaning that the details of the model do not affect the (scaled) outcomes for assymptotic conditions, which is a form of weak convergence of random variables (Billingsley, 1968). However, the same cannot be said about the GIA models. In fact, we see all kind of shapes in the clone size distributions, both peaked distributions and non-peaked ones, and in fact, some distributions are even close to an Exponential form, and can thus not be distinguished from GPA models. The question is whether we can yet find other parameters for which, when large, also GIA models exhibit universality, that is, yield the same rescaled clone size distribution. For this purpose, we will in the following sections develop a deeper theoretical understanding of the model classes.

Mathematical analysis: Markovian approximation of compartment model
To obtain a deeper understanding of the numerical results, we study the cell fate models in terms of the compartment representation, Equation 9. In this representation models are not Markovian, yet we can study their Markovian counterpart, as an approximation. While this is not expected to yield accurate clone size distributions in general, the limiting distributions of non-Markovian processes are commonly well estimated by their Markovian counterparts.
For GIA models, which only feature R ! R þ C transitions between the renewing compartment, C, and the committed compartment, C, a corresponding Markovian model reads, in which X 1 represents a single state in R and X 2 in C, and symbols at arrows are the process rates. The number of cells in X 1 , n 1 , is conserved, that is, given an single X 1 -cell initially, it always remains at n 1 ¼ 1. Thus, we only need to consider the dynamics of cells in X 2 , n 2 . This Markov process can be solved analytically, and for sufficiently large steady state mean number of X 2 -cells, n 2 ¼ hn 2 i ¼ l 1 =ðg À l 2 Þ (see Appendix, 'GIA 0 test case: steady state distribution and limiting behavior'), the rescaled distribution of cells in X 2 is, in which x 2 ¼ n 2 = n 2 ,l 1 ¼ l 1 =g andl 2 ¼ l 2 =g and Gð:::Þ is the Gamma function (Abramowitz and Stegun, 1972). We note that this distribution exhibits a large variety of shapes: for largel 1 the distribution is peaked, while for smalll 1 is loses its peak. Notably, forl 1 ! 1 andl 2 ! 1, the distribution becomes Exponential and in this case it cannot be distinguished from the GPA case. On the other hand, forl 1 ! ¥, that is, when the ratio of asymmetric divisions over the loss rate is high, this Figure 2. Mean size of surviving clones, n s , as a function of time for random GIA models (a), and GPA models (b). In (a), t ¼ 20=a min , in (b), t is the time at 98% clone extinction. The grey shade represents the percentile of all the simulations (black lines limit the 5-95%ile range); the blue curves correspond to some illustrative selected simulations. Simulations for which the final mean is below two and where the final condition is not achieved (due to computational limitations) are not included: this results in 238 and 571 models, respectively for the GIA and GPA cases.
distribution tends to a Normal distribution with unitary mean and variance equal to 1=l 1 . These different behaviors are graphically shown in the Appendix (see Appendix 1-figure 6, 7 and 8).
For the GPA models, a Markovian approximation reads, accordingly, X 1 þ X 1 with probability r 1 X 1 þ X 2 with probability 1 À r 1 À r 2 X 2 þ X 2 with probability r 2 8 > < > : ; (12) whereby for homeostasis to prevail, l 1 r 1 ¼ l 1 r 2 þ ! and l 2 <g must hold. We note that the dynamics of X 1 are independent of X 2 and thus for the number of cells in X 1 in homeostasis holds which corresponds to a simple continuous-time branching process with two offspring, for which it is known that the resulting distribution of cell numbers is Exponential, that is, P 1 ðn 1 Þ ¼ n À1 1;s e Àn1= n1;s , where n 1;s ' l 1 r 1 t is the mean number of cells in the surviving clones (Haccou et al., 2005).
X 2 cells produced according to 12 follow the same fate as in the two-state GIA model above. While it is not assured that the distribution of X 2 cells is identical to that of Equation 11 (due to simultaneous production events of type X 1 ! X 2 þ X 2 ), we show in the Appendix, 'Asymptotic clone size distributions: mathematical analysis', that for large rates of production of C-cells, the distribution of C-cells -here: cells in state X 2 -attains a Normal distribution with mean n 2 equal to its variance As each X 1 cell contributes independently to the production of X 2 -cells, we have that n 2~n1;s~t . Crucially, this means that in terms of the rescaled variable x 2 ¼ n 2 = n s the standard deviation s x2 ¼ sn 2 ns 1 ffiffiffi ffi n2 p~t À1=2 vanishes for large times, since n 2~n1;s~t ! ¥. Hence, given fixed x 1 , x 2 can be approximated by a constant random number x 2 j x1~ x 1 ¼ n 1 = n s . Therefore, the Figure 3. Rescaled clone size distributions (expected relative frequency P of clone sizes) for random GIA models (a), and GPA models (b), in terms of the rescaled clone size x ¼ n= n s , at final time t ¼ t (see Figure 2 for definition). The grey shade represents the percentile of all the simulations (black lines limit the 5-95%ile range); the blue curves correspond to some selected simulations. A reference curve corresponding to an Exponential distribution of unitary mean ('Exp(1)') is shown in green.
rescaled distribution of the total number of cells is Thus, the rescaled distribution of the total clone size, x ¼ n= n s , is as well an Exponential.

Universality of generic cell fate models
For generic GIA or GPA models, the compartment representation, Equation 9, is not Markovian and one would not expect exactly the distributions we found in the previous section. Fortunately, the limiting distributions of non-Markovian processes and their Markovian counterparts are often, under certain conditions on the parameters, the same. While we reserve the technical arguments for the Appendix ('Asymptotic clone size distributions: mathematical analysis'), we note that this independence of the limiting distribution on the Markov property related to the central limit theorem, which does not rely on the Markov property.
To identify the correct limiting parameters for more complex cell fate models, we need to express the effective non-Markovian rates (i.e. the mean frequency of events) of representation nine in terms of the original model, 3-5. As discussed in the Appendix ('Approximation of generic GIA models' and 'Asymptotic clone size distributions: Mathematical analysis'), we identify those effective rates by the total rates of cell divisions, ;C n j is the probability of a single cell being in state X i of R, respectively ( n i are the solutions to Equation 6). In the Appendix, 'Asymptotic clone size distributions: mathematical analysis', we reason that all GPA models are expected to generate Exponential clone size distributions for large times t. This is indeed what is observed in Figure 3(b). Correspondingly, for GIA models we expect that for largel R ¼ l R =g C the clone size distribution of GIA models would tend to a Normal distribution. To test this prediction, we simulated the same GIA models as for Figure 3 before, but we tuned parameters in R such that the effective parameterl R becomes large (see details in the Appendix, 'GIA model for largel R '). The result is shown in Figure 4: for an illustrative case shown in panel (a), increasingl R changes the distribution from an exponential form to a peaked form akin to a Normal distribution, and for all simulated random GIA models, shown in panel (b), a Normal distribution is approached whenl R becomes large.
We note that when taking the limit of largel R , as shown in Figure 4, also all other process rates w ij with i,j within R increased as well. What if instead some process rates in R do not scale to become large withl R ? To assess this situation, we studied a simple test case similar to model 10 but containing two states in R, connected via direct state transition (see Appendix, 'GIA B test case: bimodal distribution'). As discussed there, if all rates within R are large compared to the rates in C then indeed we observe a Normal clone size distribution, as expected. However, if the direct transition rates between the states of R are smaller or of equal magnitude as g C , and in addition, one of the two division rates is higher then the other, then we observe a bimodal clone size distribution. The reason is that if the transitions between the two states in R are rare compared to the life time of cells, 1=g C , they become essentially separated and each of those states generate separate Normal distributions with different mean (due to different cell division rates in those two states) which, when overlaid, generate a bimodal clone size distribution (see detailed arguments in the Appendix, 'Asymptotic clone size distributions: mathematical analysis').
Finally, from those considerations follows: 1. GPA models attain an Exponential clone size distribution for time t ! ¥. 2. GIA models attain a Normal clone size distribution if all process rates within R are much larger than the inverse lifetime of C-cells, g C .
Hence, the GIA and GPA model classes, each represent a universality class, that is, a scaling limit exists in which all models of the same class yield the same rescaled clonal statistics.

Discussion
Our analysis shows that intrinsic limitations exist for identifying strategies of stem cell self-renewal through clonal data from cell lineage-tracing experiments. This is due to different models of cell fate choice generating the same type of clonal statistics (clone size distributions), so that model inference based on clonal statistics -currently still the most prevalent method to determine stem cell selfrenewal strategies -fails to distinguish them. The feature that different models asymptotically generate the same statistics is a form of weak convergence of random variables (Billingsley, 1968) and corresponds to universality, as known from statistical physics.
Cell fate models can in principle be very complex, with a plethora of cell (sub-)types in a tissue. We introduced a new categorization of cell types, distinguishing between cell states that are committed (C-cells), whose progeny is inevitably lost eventually, and non-committed or (self-)renewing cell states (R-cells), which retain the potential to remain or return to the apex of the lineage hierarchy. According to this categorization we classified generic models of cell fate choice as Generalized Invariant Asymmetry (GIA), if only generalized asymmetric divisions of the form R ! R þ C occur for R-cells, and Generalized Population Asymmetry (GPA), when all kind of divisions can occur, as long as gain and loss of R-cells are balanced. Models of the GIA category are also characterized by a conservation law, since the number of R-cells is strictly conserved, while GPA models do not exhibit such a conservation law.
We found that the classification in GIA and GPA models mirrors the clonal statistics generated by them: models of the GPA class all generate clonal statistics which with time converge to an Exponential clone size distribution. Thus, two GPA models can therefore not be distinguished through clonal data, once some time has passed after induction of clones. For GIA models, distributions can generally vary, but if the rates of divisions and transitions in the R compartment are much larger that the rate of cell loss, the clone size distribution of all those models becomes a Normal distribution. In that case, two GIA models can not be distinguished by the clonal data. While here we do not explicitly consider cell-extrinsic regulation of cell fate, this kind of regulation does not affect long-term clone size distributions, except when cells are arranged one-dimensionally (Klein and Simons, 2011;Bramson and Griffeath, 1980). Thus, our results cover cell dynamics in most renewing tissues, such as epithelial sheets or volumnar organs, but not (quasi-)one-dimensional arrangements of stem cells, as found in the seminiferous tubule, or in intestinal crypts, where clonal statistics may differ. Hence, our analysis shows that models of cell fate choice cannot in general be distinguished with further resolution beyond the R vs. C categorization of cell types. The universality of the model dynamics also . Rescaled clone size distributions (expected relative frequency P of clone sizes) for random GIA models as in Figure 3, at time t ¼ t (see definition in Figure 2). Sensitivity to parameterl R is shown for one illustrative case in panel (a), and all GIA models forl R ¼ 30 in panel (b). The distributions are shown in terms of the rescaled variables x ¼ n= n s for panel (a) andx ¼ ðn À n s Þ=s n , where s n is the distributions variance, in panel (b). In (b), the grey shade represents the percentile of all simulations (black lines limit the 5-95%ile range); the blue curves correspond to some selected simulations. A reference curve corresponding to a Normal distribution of zero mean and unitary variance is shown in green. Simulations for which t ¼ t is not reached (due to computational limitations) are not included, resulting in 922 model instances.
shows that effective, simplistic models are often equally accurate to model experimental data, yet with a higher statistical power due to less free parameters.
While at first glance, this analysis seems to discourage efforts to unravel details of cell fate dynamics, room remains in regimes where the limiting conditions for asymptotic distributions are not fulfilled. In particular, if fast cycling committed progenitor cells are present, while stem cells are slow cycling, then the condition that the division rate of R-cells is much larger than the cell loss rate is not fulfilled. In that case, details of the model dynamics may affect the shape of the clone size distribution and thus allow distinction between models. However, caution should be given when an Exponential clone size distribution is observed, since this could indicate either a GIA model with high activity of committed progenitor cells, or a GPA model. In that case, the mean clone size needs to be consulted to distinguish models (see Figure 2). Differentiating between models within the GPA category is more difficult, since the predicted statistics from different models always become more similar over time. Short-term measurements would in principle allow such a distinction, but since in reality the underlying processes are not truly Markovian (as assumed for the modeling purpose) they are not necessarily a good representation of the real cell dynamics at short times. At long times, however, Markovian approximations are increasingly accurate, precisely because of the feature of universality.
How could the resolution of cell fate modeling be improved? The state-of-the-art approach to determine cell fate trajectories is via analysis and modeling of single-cell RNA-sequencing (scRNAseq) data. However, many limitations to this method exist, discussed in Weinreb et al., 2018, and neither reversible trajectories nor the modes of cell division, such as asymmetric vs symmetric divisions, can be inferred. Intravital live imaging, on the other hand, allows to trace individual clones over time (Ritsma et al., 2012;Pittet and Weissleder, 2011;Hara et al., 2014;Rompolas et al., 2016), and thus can obtain details of cell fate trajectories, yet this technique is limited to few tissue types which are accessible for invasive long-term imaging. Nonetheless, while each of those experimental assays alone is prone to limitations in defining self-renewal strategies, advanced model inference schemes, that integrate data from different experimental sources, might be the way forward in the future to finally reveal the details of stem cell self-renewal strategies.

Materials and methods
The numerical analysis of the random cell fate model was implemented in Matlab. The description of the stochastic models definition, the random model generation and the simulation campaign is detailed in the Appendix, 'Stochastic process modelling'. Additionally, as a validation of the implemented simulator, based on the Gillespie algorithm (Gillespie, 1977), the IA and PA models were simulated and the results analyzed in the Appendix, 'Invariant Asymmetry and Population Asymmetry models'.
Analytical solutions were partially obtained using Mathematica. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.   into the biological context of clonal dynamics. A linear cooperative system is one of the form d dt xðtÞ ¼ AxðtÞ where xðtÞ ¼ ðx 1 ðtÞ; x 2 ðtÞ; :::; x m ðtÞÞ are functions of time t and A is a constant m Â m matrix for which all off-diagonal elements are non-negative (the latter condition defines the cooperativity of the system) (Hirsch and Smith, 2006;Greulich et al., 2019). We note that the dynamics of mean cell numbers, Equations 6 and 7 in the main text, indeed describe an LCS according to this definition. Now we use the following definitions:

Additional information
. GðAÞ is the graph of A, that is, the graph for which A is the adjacency matrix, whose elements a ij give the weight of the links from i to j (a ij ¼ 0 means that no link exists). In the following, we use the terms graph and network synonymously.
. If in GðAÞ there exists a path from node i to node j and from j to i, then we call those nodes strongly connected, i j, which is an equivalence relation. A maximal set of nodes which are are strongly connected with each other are called a Strongly Connected Component (SCC) of the graph (the equivalence class of the equivalence relation '').
. The graph GðAÞ can be decomposed into its N S SCCs, S k , k ¼ 1; :::; N S (Cormen, 2009), which are sub-graphs associated with an adjacency matrix A k , such that GðA k Þ ¼ S k . Since the A k have non-negative off-diagonal elements, they are Metzler matrices for which the Perron-Frobenius theorem ensures that a unique, simple and real maximal eigenvalue k exists (Arrow, 1989). . If there is a path from SCC S k to SCC S l , then we call S k upstream of S l and accordingly S l downstream of S k . We note that there can never exist paths from S k to S l and from S l to S k , since otherwise, by definition, their nodes would be strongly connected and both together would form a single SCC (Cormen, 2009). Thus, there is a unique hierarchy of SCCs.
. A stationary state x Ã of a dynamical system is Lyapunov stable if a small initial deviation from x Ã leads to a small final deviation xðtÞ (i.e. x Ã is not unstable). More accurately: there exists a constant C>0 such that jxðtÞ À x Ã j<Cjx 0 À x Ã j for all times t, where x 0 ¼ xðt ¼ t 0 Þ is the initial condition, sufficiently close to x Ã . A stationary state of a linear system that is Lyapunov stable, yet neither asymptotically stable nor has a limit cycle, is neutrally stable.
. Homeostasis means that the cell numbers in each state, n ¼ ðn 1 ; :::; n m Þ, stay on average constant, d n dt ¼ 0 (where n ¼ hni), and that this state is not unstable towards perturbations. This condition corresponds to a Lyapunov-stable stationary state. Note that a linear system, as the one described by Equations 6 and 7, main text, cannot have an asymptotically stable state except for the trivial state n Ã ¼ 0, which corresponds to a vanishing cell population. We note that when considering the tissue cell population as a whole, dynamics can be non-linear through interactions between cells and a non-vanishing asymptotically stable state may then exist. However, since single clones do not significantly affect the total configuration of cells in a tissue, the clones compete neutrally, when embedded in a homeostatic cell population, which corresponds to a Lyapunov stable, but not asymptotically stable state. We therefore use Lyapunov stability, a weaker form of stability, to define homeostasis, since an asymptotically stable vanishing state is not a biologically viable state. 2. There is at least one SCC, S k , with k ¼ 0.
3. There is no path between any two SCCs, S k and S l , which have k ¼ 0 and l ¼ 0.

Theorem 2
All nodes i upstream of an SCC S l with l ¼ 0 must be empty in the the stationary state, that is, Since Equation 7, main text, is an LCS, we can apply theorems 1 and 2 to find conditions for homeostasis, defined by a Lyapunov-stable configuration of mean cell numbers n Ã ¼ ð n 1 ; n 2 ; :::Þ. According to theorem 1 at least one SCC with k ¼ 0 must then exist, and according to theorem 2 the stationary state of nodes upstream of it must be empty, that is, they do not exist in homeostasis. Since the condensed graph of the SCCs does not have cyclic paths, an SCC S k with k ¼ 0 must therefore always reside at the apex of all non-vanishing cell types. In principle, an acyclic graph may have more than one apex, however, since, by definition, a stem cell clone always starts with a single stem cell, and no other SCC with ¼ 0 may be downstream of the latter, we only consider one apex SCC with one initial cell when studying clonal dynamics.
Hence, in the context of homeostatic clonal dynamics, we can assume that there is a single SCC, S k with k ¼ 0 at the apex of the cell state graph, while all other SCCs, S l are downstream of it and have l <0. Since there are no paths from the non-apex SCC to the apex SCC (as the condensed graph is acyclic) we can distinguish the two separate compartments R (the renewing compartment) consisting of all nodes of the apex SCC, S k , and C (the committed compartment), consisting of all other nodes, whereby due to l <0 for all SCCs in C, all progeny of cells in C will vanish in the long term.

Model description
Since clonal dynamics start, by definition, with a single cell, we use stochastic dynamics to model clones. Thus, we model cell fate dynamics as a continuous-time multi-type branching process (Haccou et al., 2005), a Markov process following the rules of Equations 3-5, main text. As shown later, without losing generality, here only two types of events are modeled; considering an arbitrary number m of cell states, X i , for i ¼ 1; :::m, the model includes . Cell divisions: a cell in state X i divides in two cells with rate l i , respectively in state X j and X k at a ratio r jk i .
where l i ¼ 0 if state X i does not allow division. In this formulation of cell division events, which we use for the generation and numerical simulations of random models, only one division outcome is possible upon division of a particular cell state X i . Nonetheless, multiple division outcomes per state can be implemented as single outcomes if additional metastates are introduced, which represent priming of a state X i towards a certain division outcome option. For example, if in the original model, state X i has different outcome options, X j1 þ X k1 ; X j2 þ X k2 ; :::, we can substitute this by, first, transitions from X i to (new) states X m1 ; X m2 ; ::: and subsequent divisions X ml ! X jl þ X kl . The use of metastates to model more complex processes is discussed in detail in 'Population Asymmetry model using metastates'.
. Direct state transitions: a cell in state X i changes to state X j at a given rate w ij .
X i À ! !ij X j , i; j ¼ 1; :::; m; i 6 ¼ j; where ! ij ¼ 0 means that no transition from X i to X j is possible. Additionally, we include cell loss in this scheme, by treating it as a transition to an additional special state, called hereafter death and denoted by ; (cells in this state do not enter in the counting of the total number of cells). In that formulation, the loss rates of the original model are d i ¼ ! i; .
These events define a Markov process, which can be represented as a stochastic network (Bang-Jensen and Gutin, 2007). In this view, each node can be related to a cell state, while the links represent transitions between states via cell divisions and the direct state transitions. It is noted that this stochastic network is different from the network defined in the main text and in 'Conditions for homeostasis' of this SI, which describes the dynamics of mean cell number instead. Here, for the stochastic modelling, let us define the adjacency matrix K of this network, through the elements k ij ¼ l i 2r j i þ ! ij i; j ¼ 1; :::; m, in which k ij are the total transition rates as defined in the main text. We note that K is related to the matrix A used in the main text by A ¼ K T À D, where D is the diagonal matrix with entries d i ; i ¼ 1; :::; m, as defined in the main text, with the slight difference that here the loss state ; is treated as a separate state. Additionally, it is remarked that in this model interpretation, where only one division option for each state is possible, the term r j i 1 is not a continuum value, but instead it can only take the values 0; 1=2; 1 depending on the specific outcome of the division of the cells in state X i . Notably, more than one stochastic network may result in the same matrix K, therefore, to uniquely define a process, we distinguish a matrix D which describes cell division events (note that this is possible with just a single matrix as there is only one division option per state) and a matrix T which describes direct transition events. The matrix K is the sum of both,

Generation of random models
To test the behavior of the clonal dynamics in a generic homeostatic model, a large number of random stochastic networks was generated, whereby each stochastic network corresponds to a distinct set of parameters l 1 ; :::; l m ; ! 12 ; :::; ! m; for the stochastic stem cell fate choice model. The strategy detailed below is based on the following considerations which summarize the key requirements to achieve homeostasis detailed in 'Conditions for homeostasis': (a) each network is composed of Strongly Connected Components (SCCs) that are randomly connected; (b) only one SCC, the one at the apex of the network, forms the renewing compartment, R, (i.e. it is characterized by a dominant eigenvalue ¼ 0 with respect to A) and all the others form the committed compartment, C, (i.e. they are characterized by a dominant eigenvalues <0). It is further noted that the SCCs of the stochastic network GðKÞ are the same as those of the matrix GðAÞ, where A ¼ K T À D defines the dynamics of mean cell numbers. This is, since transposition of an adjacency matrix and altering of diagonal elements does not affect the network topology.
To generate the stochastic network, a two-step process is followed: (1) a large number of (random) SCCs are generated; (2) a condensed network is randomly constructed and filled with randomly picked SCC from step 1.
It is noted that unitary rates are assumed in step (1) and they are successively randomly modified in step (2) to achieve the desired properties of the dominant eigenvalue m while ensuring randomness.
Focusing now on step (1), that is, the generation of single SCCs, the following procedure is used.
a. The total number of states composing the SCC is defined, indicated as m S . An additional state is added to represent whatever is outside the SCC. In the current analysis, we set 1 m S 4. If not, then this model is discarded. In case GPA networks are generated, a further check is performed to discard also those models consistent with a GIA network (they cannot be a priori excluded as done in point 2 for the GIA ones). These pools of models are indicated as M GIA and M GPA for the GIA and GPA models, respectively. e. For each SCC in M GIA and M GPA , the dominant eigenvalue is estimated. For construction, the generated GIA networks are all characterized by ¼ 0, while in general any value can be obtained within M GPA . f. The SCCs in M GPA are additionally processed to check whether the network is compatible with homeostasis by tuning the rates. Networks satisfying this condition are additionally stored under a new pool of SCCs, called M Ã GPA . If not, then they are discarded when >0 (i.e. for any combination of rates the number of cells in these networks is expected to grow).
This process results in three pools of SCCs classified for m S , N T and N D (i.e. number of states, transitions and divisions): (1) M GIA contains GIA models; (2) M Ã GPA contains GPA models that can be tuned to have ¼ 0 and (3) M GPA contains GPA models characterized by <0 or that can be tuned to meet this condition.
In step (2), the generation of random networks starting from the individual SCCs is implemented as follows. a. A number of committed SCCs, N c , between 1 and 3 is randomly chosen. b. N c SCCs are randomly picked from the pool of models M GPA . The selection is done considering equal probability in m S , N T and N D . For each SCC, the unitary rates a (where a stands for any rate l i or ! ij ) are modified by multiplying them for random numbers (exponentially distributed with mean a ¼ 1 and minimum a m ¼ 0:3). Additionally, a threshold on the dominant eigenvalue is set, max ¼ À1; if this condition is not satisfied, then the rates are tuned to meet this requirement while maintaining the rates above the minimum. c. The committed compartment of the condensed network is generated by randomly connecting all the outgoing components of the k-SCC with states in the l-SCC for l ¼ k þ 1; ::; N c . In this way, the transposed adjacency matrix of the stochastic network has triangular block form: C 12 B 2 0 ::: C 1;Nc C 2;Nc B Nc C 1; C 2; C Nc;; 0 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 : The last SCC is forced to be linked to a single death state. d. With a similar procedure described in point 2, two SCCs are randomly picked respectively from the pool of SCCs in M Ã GPA and M GIA ; the unitary rates are modified (exponentially distributed with mean a ¼ 1 and minimum a m ¼ 0:3) and, in the GPA case, tuned to meet the condition ¼ 0. They represent the renewing part of the network. e. Two networks (one for the GIA and one for the GPA models) are produced by attaching the selected renewing network upstream the committed one; this is done based on an analogous procedure as described in step 3.
At the end of this process, we have two networks which are different in just the renewing part, being one consistent with the GIA model and the other with the GPA one. In total 2000 networks were built and analyzed.

Simulation campaign
An extensive simulation campaign was run to model the clone dynamics. The code implemented to numerically simulate the stochastic process defined by events of type 1 and 2 is based on the Gillespie algorithm (Gillespie, 1977). Since a clone is by definition the progeny of a single cell, we choose as initial condition a single cell put randomly in a state within R. Concerning the final condition, given the substantial difference in the dynamics in the two models, the final time, indicated by t , is set equal to 20 times the inverse of the minimum process rate, a min ¼ minðl 1 ; :::; l m ; ! 12 ; :::; ! m;; Þ, in the GIA models, and to the time at which the fraction of extinct clones reaches 98% in the GPA models. Note that all critical branching processes, as homeostatic clonal dynamics are, will go extinct almost surely at some point in time (Haccou et al., 2005).
To determine the clone size distribution, 10 3 and 5 Á 10 4 simulations were run respectively in for each GIA and GPA model (in this way, both models result in the same final number of clones when 98% extinction is taken into account).

Invariant Asymmetry and Population Asymmetry models
To validate the simulation approach, we tested the procedure on simple cell fate models for which analytical results are known, the Invariant Asymmetry (IA) and Population Asymmetry (PA) models. As described in the main text, in the simplest version, these are defined as, In these processes, cells of type S represent the stem cells (called hereafter also progenitor), which divide with stochastic rate l, and cells of type D are the differentiated cells, which are shed with rate g. While in the PA model the three possible outcomes of the division of a progenitor are controlled by a probability parameter 0<r 1=2, in the IA model r = 0, meaning that there are strictly asymmetric division and the number of S-cells is conserved. It is remarked that in the definition of the stochastic networks given in 'Model description' only one division option for each state is modelled; however, the code implemented for the numerical simulations of the stochastic process allows for an arbitrary number of division options for each state as well (see 'Population Asymmetry model using metastates').
Considering the dynamics at tissue level, the system of ODEs describing the average number of cell n S and n D respectively of type S and D is, : It is clear that, on average, the number of S-cells remains constant. Additionally, in homeostasis, the average total number of D-cells stabilizes around a constant value n Ã D ¼ ðl=gÞ n S that uniquely depends on the number of stem cells, n S which equals the initial number of stem cells n S;0 ¼ n S ðt ¼ 0Þ, Thus, the (Lyapunov stable) stationary state of total cell numbers n ¼ n S þ n D is given by, Based on Equation 6, the process rates l and g determine the proportion of cells of type D with respect to cells of type S. Importantly, there is no difference at tissue level between the IA and PA models.
A distinction is instead evident when we look at the dynamics at the single-cell level, and study the clone size distribution, that is, the distribution of the progeny of a single cell. For the IA model, the number of S-cells is strictly constant, and thus the joint probability distribution Pðn S ; n D Þ of both S-cells and D-cells, respectively indicated as n S and n D , is fully determined by the distribution of Dcells, Pðn D Þ. The IA model's master equation for Pðn D Þ, considering a single initial cell of type S, is given by, This corresponds to a simple birth-and-death process for which the distribution is Poissonian with mean l=g, (Van Kampen, 1981).
Considering now the PA model, the master equation is instead given by, dPðn S ; n D Þ dt ¼ lðrðn S À 1ÞPðn S À 1; n D Þ þ ð1 À 2rÞn S Pðn S ; n D À 1Þ þ rðn S þ 1ÞPðn S þ 1; n D À 2ÞÞ þgðn D þ 1ÞPðn S ; n D þ 1Þ À ln S þ gn D ð ÞPðn S ; n D Þ: In Antal and Krapivsky, 2010, an exact result for the distribution of total cell numbers n ¼ n S þ n D is found when l ¼ g and r ¼ 1=4. For different values of the process parameters, the long-term distribution is shown to be Exponential.
Numerical simulations for the clonal dynamics were run, considering the above models and three different sets of test parameters each, indicated as IA# and PA#i for i ¼ 1; 2; 3, which are reported in Appendix 1-table 1. It is noted that the time unit is arbitrary and therefore omitted. Simulations are based on 10 4 and 5 Á 10 4 runs respectively for the IA and PA test cases. The initial condition is a single stem cell and the final simulation time, indicated as t , is equal to 10: this value is well representative of a steady state condition (for the IA test cases) and at which the total extinction of the process is not yet achieved (for PA test cases only). The clone size distribution at t in the IA test cases is shown in Appendix 1-figure 1: in this figure, each profile is compared to the corresponding Poisson distribution shifted by one (i.e. plus the stem cell). Concerning the results for the PA test cases, they are shown in Appendix 1-figure 2. In this case, the profiles are compared to the numerical integration of the master Equation 8. Additionally, for the PA# test case, where l ¼ g and r ¼ 1=4, the reference analytic solution provided in Antal and Krapivsky, 2010 is also shown. In general, a good agreement is obtained in all of the cases. Appendix 1-figure 1. Invariant Asymmetry (IA) test cases clone size distribution PðnÞ, that is the distribution of the total number of cells n forming the progeny of a single initial cell in R. For each case, the distribution is shown at t (defined in Figure 2, main text), which is well representative of the steady state condition. Tested parameters for cases IA#1-3 are provided in Appendix 1-

Population Asymmetry model using metastates
As argued before, we assume in the random model generation that cell division in state X i has a unique outcome, X i ! X j þ X k (Equation 1), since thereby the stochastic process can be uniquely defined by the two matrices D and T. To accommodate for the possibility of different division outcomes from the same state X i , as in Equation 4 and Equations 3-5 in the main text, we introduce metastates, which represent short-lived states that indicate priming for either outcome, from which the cell division outcomes are unique. This is a small modification of the original model, which, however, does not lead to significant deviations if the metastates are traversed sufficiently quickly (which can be assured by a choice of high direct state transition rates in the metastates).
To illustrate this, let us consider the PA model described by 4; instead of having three different outcomes upon division of an S-cell we define the corresponding Metastate (MS) model with three primed states, M 1;2;3 , as, in which S and D correspond to the same cell type of the PA model (i.e. the stem and the differentiated cells, respectively), while M i , for i ¼ 1; 2; 3, represent the metastates. These states are temporary states that are used to model each one of the three different possible division options of the S-cells. The rates l i and ! i , for i ¼ 1; 2; 3, are chosen such that the time scales of division and outcome probabilities are the same as in the original PA model: Equations 10 assure that outcome probabilities are the same as in the original model, while Equations 11 are needed to have the same total average time between two consecutive events. As there are six unknowns and only five relations, the following additional equation is added in which D is an additional parameter that is used to control how fast cells in metastate M 1 divide. Low values of D imply that as soon as an S-cell transits to the metastate M 1 , it divides in two S-cells.
Globally, this results in Numerical simulations for the two models were run and compared, based on the parameters reported in Appendix 1-table 1, and specifically the PA#1 and PA#3 test cases. The time unit, which is arbitrary, is omitted. The process rates for the corresponding MS model, which are indicated in the figures as MS#1 and MS#3, are computed based on Equation 13 and D ¼ 1=500. As well as for the PA test cases, the initial condition is one cell of type S and the final time, t, is equal to 10; simulations are based on 5 Á 10 4 trajectories.
Appendix 1-table 1. IA and PA test cases simulation parameters (see 'Invariant Asymmetry and Population Asymmetry models').

. Metastate (MS) test cases simulation results in terms clone size distribution
PðnÞ, that is the distribution of the total number of cells n forming the progeny of a single initial stem cell. As well as for the PA test cases, the distribution is shown at the final time, t, at which the total extinction of the process is not yet achieved. Profiles from the numerical simulation for cases MS#,3 are compared to the corresponding PA#1,3 test cases which are based on parameters provided in Appendix 1-table 1. The detailed discussion is reported in 'Population Asymmetry model using metastates'.

GIA 0 test case: Steady state distribution and limiting behavior
A simple Generalized Invariant Asymmetric model, indicated hereafter as GIA 0 , was analyzed to identify the causes of the different clone size distribution behaviors observed in the randomly generated models (see main text). Thus, in this section, we study the Markov process defined by, Here, the renewing compartment is composed of just a single state X 1 and cells in this state asymmetrically divide with rate l 1 . The committed compartment is formed of state X 2 ; cells in this state can either divide to duplicate, with rate l 2 , or die, with rate g. It is noted that for l 2 ¼ 0, this model is reduced to the previously analyzed Invariant Asymmetric (IA) model (see 'Invariant Asymmetry and Population Asymmetry models').
As for the IA model, here the number of cells in state X 1 , indicated as n 1 , is conserved. It is therefore sufficient to determine the statistics of n 2 , defined by the master equation for Pðn 2 Þ, the probability of having n 2 cells in state X 2 , provided that there are n 1 cells in state X 1 . The master equation is given by, dPðn 2 Þ dt ¼ À l 1 n 1 þ l 2 n 2 þ gn 2 ð ÞPðn 2 Þ þ l 1 n 1 þ l 2 ðn 2 À 1Þ ð ÞPðn 2 À 1Þ þgðn 2 þ 1ÞPðn 2 þ 1Þ; (15) also written as, in which rðn 2 Þ ¼ gn 2 and gðn 2 Þ ¼ l 1 n 1 þ l 2 n 2 . Considering that we are interested in clonal dynamics, meaning that we start from a single stem cell, n 1 is equal to one. In this simple case, the steady state distribution P Ã ðn 2 Þ, corresponding to the solution of dPðn 2 Þ=dt ¼ 0, can be analytically derived. Defining the net flux between states n 2 and n 2 À 1 as I n2 ¼ rðn 2 ÞP Ã ðn 2 Þ À gðn 2 À 1ÞP Ã ðn 2 À 1Þ; (17) and considering that I n2þ1 ¼ I n2 for every n 2 , it follows that I n2 where P Ã ð0Þ is the steady state probability of having 0 cells in state X 2 . Finally, by applying the conservation of the total probability, P ¥ n2¼0 P Ã ðn 2 Þ ¼ 1, and rearranging the terms we obtain, In the main text, we defined the dimensionless parametersl 1 ¼ l 1 =g andl 2 ¼ l 2 =g, representing the rescaled division rates for cells in state X 1 and X 2 , respectively. For clarity and readability, in this section, we simplify the notation using p ¼l 1 and q ¼l 2 . Equation 19 is then rewritten as, It is noted that while p varies between 0 and ¥, q is defined between 0 and 1. The mean number of cells in each state, indicated respectively as n 1 and n 2 , satisfies the system of ODEs Based on this, the steady state average number of cells is When the mean number of cells in state X 2 is sufficiently large, that is, for large p or in case q is close to one, the discrete distribution given by Equation 20, can be approximated by a continuous probability density function P Ã ðx 2 Þ, given by, in which x 2 ¼ n 2 = n Ã 2 . We note that Equation 23 corresponds to Equation 11 in the main text. To better understand the distribution for different values of the parameters p and q, the limit behavior are analyzed below.
1. q ! 0 (i.e.l 2 ! 0Þ When q ! 0, Equation 20 can be simplified considering that and Thus, the distribution results in lim q!0 P Ã ðn 2 Þ ¼ p n2 e Àp n 2 ! ¼ PoissonðpÞ; that is a Poisson distribution with mean equal to p. This agrees with what we were expecting considering that when q ¼ 0 the model is reduced to the IA model for which the distribution in n 2 is known to be poissonian. Additionally, for large mean number of cells, which are obtained for large p (when q ¼ 0, then n Ã 2 ¼ p), the Poisson distribution tends to a Normal distribution with mean and variance equal to p. Therefore, lim ðq;pÞ!ð0;¥Þ P Ã ðn 2 Þ ¼ 1 ffiffiffiffiffiffiffiffi ffi 2pp p e À ðn 2 À pÞ 2 2p ¼ Normalðp; pÞ: Rescaling the distribution, and considering x 2 ¼ n 2 = n Ã 2 , results in lim ðq;pÞ!ð0;¥Þ that is a Normal distribution with unitary mean and variance equal to 1=p.
2. q ! 1 (i.e.l 2 ! 1Þ For q ! 1 the steady state mean number of cells n Ã 2 ! ¥ and Equation 23 holds. This equation can be rewritten as, If the Stirling's approximation is applied we obtain, Considering now that it follows that that is a Gamma distribution with unitary mean and shape parameter given by p. Importantly, the Gamma distribution for p ! ¥ tends to a Normal distribution with unitary mean and variance 1/p. For p ¼ 1, it corresponds instead to an Exponential distribution with unitary mean.
3. p ! ¥ (i.e.l 1 ! ¥Þ When p is large, the mean number of cells is large for any value of q. Thus, Equation 32 is valid. By applying the Stirling's approximation also to the term Gðp=qÞ, we obtain, This expression can be also rewritten as, r e p=ð1ÀqÞððx2À1þ1=qÞ logðqðx2À1Þþ1ÞÀx2 logðx2ÞÞÀ1=2ðlogðx2Þþlogðqðx2À1Þþ1ÞÞ : Considering now that p is large, then À1=2ðlogðx 2 Þ þ logðqðx 2 À 1Þ þ 1ÞÞ (p=ð1 À qÞððx 2 À 1 þ 1=qÞ logðqðx 2 À 1Þ þ 1Þ À x 2 logðx 2 ÞÞ, so the term on the right can be neglected. Additionally, for x 2 ! 1 the following expansions can be applied: and Finally, if we consider that then Equation 36 results in that is a Normal distribution with unitary mean and variance equal to 1=p. Importantly, it is noted that the limiting behavior of P Ã ðx 2 Þ for q ! 0 and q ! 1 in case of large p, are both consistent with the results obtained for p ! ¥ and any q. In other words, remembering that p ¼l 1 and q ¼l 2 , the steady state distribution forl 1 ! ¥ and any value ofl 2 is a Normal distribution of unitary mean and variance equal to 1=l 1 .
To globally verify these results, numerical simulations of the stochastic process associated with model 14 for different values ofl 1 andl 2 were run. The following curves were compared: . Stochastic simulation: distribution at the final simulation time, t, of the number of cells in state X 2 . The final time was chosen here as t ¼ 20=a min , where a min ¼ minðl 1 ; l 2 ; gÞ; this value is well representative of a steady state condition. Furthermore, the process rates considered are based on a unitary g (i.e. l 1 ¼l 1 , l 2 ¼l 2 and g ¼ 1). It is noted that the time unit is arbitrary and therefore omitted.
. Analytic distribution: based on Equations 20, for low mean values, and 23, for large mean values.
. Approximate distributions: Poisson, Gamma and Normal distributions respectively given by Equations 27, 34 and 40.
The tested parametersl 1 andl 2 are graphically shown in Appendix 1- figure 5 a contour map showing the expected steady state mean number of cells n Ã 2 over the ðl 1 ;l 2 Þ-parameter plane. The curves from the numerical simulations and the corresponding exact and approximated solutions are shown in Appendix 1-figure 6, Appendix 1- figure 7 and Appendix 1- figure 8: the tested conditions are divided into three groups (one figure each) representing the limiting behaviors discussed above. Generally, analytical and numerical results agree very well. This also demonstrates that GIA models can show both peaked and non-peaked distributions, depending on the model parameters.

Approximation of generic GIA models
As shown in the main text, a generic GIA model can be expressed in terms of the compartments R and C (Equation 9 in the main text). We note that the the GIA 0 model discussed in the previous section corresponds to the general compartment dynamics of GIA models, Equation 9, main text, if the dynamics of compartments are assumed to be Markovian. Thus, we can treat the GIA 0 model as a Markovian approximation of generic GIA models. In this section, we test this approximation numerically.
To this end, we first wish to relate the effective (non-Markovian) rates l R;C and g C of a generic GIA model to the rates of the Markovian approximation, the GIA 0 model. We refer to this modelthe GIA 0 model matched to the effective rates of a particular more complex GIA model -as the equivalent model to the latter. The equivalent rates l R , l C and g C are computed considering the same steady state condition in terms of mean number of cells. To this aim, we rewrite the dynamics of mean cell numbers, Equation 7 in the main text, in block form as, in which n R;C denote the vectors of mean cell numbers of states restricted to compartments R; C, respectively, and n ; the number of lost cells (not considered for total cell numbers and homeostasis condition). It is noted that A RC ¼ 0, since there cannot be links from C to R. Also A ;R ¼ 0 as we do not consider loss from R (see main text for the arguments). Thus, summing up all the components in each compartment, n R ¼ P i ð n R Þ i ¼ 1 and n C ¼ The equivalent parameters are then estimated from the steady state condition n Ã X and n Ã X , for X ¼ R; C; ;, as, The applicability of this approximation was evaluated by comparing the clone size distribution obtained from the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text) with that from the corresponding equivalent GIA 0 model with param- The values ofl 1 andl 2 for all the GIA random models are shown in Appendix 1- figure 9 in the contour map of the expected mean number of cells in C (in compartment R there is always one single cell). In general,l 1 remains below five andl 2 is spread between zero and one. As measure of the error of the equivalent model, , we choose the maximum difference between the distributions of a particular random GIA model and that of the corresponding equivalent model, relative to the peak of the distribution of the random model. For low mean cell numbers, the distribution is compared to Equation 20; for large mean number instead, the rescaled distribution is compared to Equation 23. A threshold on the mean cell number equal to 10 was chosen to distinguish between the two cases. This relative error as function ofl 2 is presented in Appendix 1-figure 10, where it is evident that large errors are obtained only for large values of this parameters. Some illustrative cases, representative of different value ofl 2 , were selected and their distribution is shown in Appendix 1-figure 11, Appendix 1- figure 12 and Appendix 1-figure 13. The following considerations are made: . Two cases forl 2 <0:2 are presented in Appendix 1-figure 11. In these cases, the distribution obtained from the random models agrees with the analytic solution from the equivalent model, which in turn is well approximated by a Poisson distribution. As expected, larger deviations between the equivalent model's analytic solution and the approximation are noted for increasing values ofl 2 . In general, all the models in this range are well approximated by the equivalent model.
. The two cases presented in Appendix 1-figure 12 havel 2 >0:8, for which the Gamma distribution is an approximation of the equivalent model's analytic solution. The distribution in some cases (see for instance the top figure), presents some deviations with respect to the equivalent model. However, globally a good agreement is obtained in most of the cases (failing ratio, based on a 0.5 maximum error is 21.7%).
. Two cases in an intermediate range 0:2<l 2 <0:8 are shown in Appendix 1- figure 13. Again, the equivalent model's analytic solution is well representative of the distribution (failing ratio, based on a 0.5 maximum error is 3.2%). It is noted that for such values ofl 2 an approximation of the equivalent model analytic solution is not available. Thus, in most of the tested cases the equivalent model is able to catch the behavior of a generic random GIA model, and thus represents a good approximation (global failing ratio, based on a 0.5 maximum error is 6%). In the cases where the equivalent model does not yield a good approximation, the internal structure of the R and C compartments become relevant and subsequent events that affect n R and n C become dependent on each other, and thus are non-Markovian.

GIA model for largel R
To test the behavior of a generic GIA model in case of largel R , the GIA random models (generated as described in 'Generation of random models' and analyzed in the main text) were modified by changing the process rates associated to the renewing compartment to achievel R ¼ 30. To this aim, considering that infinite solutions are possible, we applied a global search method, and more specifically a Genetic Algorithm (Goldberg, 1989). We therefore setup an optimization problem, where the process parameters are the optimization variables and the cost function is the error of the currentl R with respect to the target.
The envelope of curves obtained in all the random models and some illustrative profiles are shown in Appendix 1-figure 14. A reference Normal distribution, characterized by unitary mean and variance equal to 1=l R ¼ 1=30 is also reported: this curve corresponds to the distribution expected in the equivalent model for whichl 1 ¼l R . Deviations become relevant, when the internal structure of compartments in a random model leads to subsequent events that are not independent from each other. These effects alter the variance of the Normal distribution. In fact, Figure 4 in the main text is based on the same simulation results, but in this case the rescaling is done considering both the mean number of cells and its variance (a Normal distribution is a two-parameter distribution).
Appendix 1-figure 14. Rescaled clone size distribution for the random GIA models whenl R ¼ 30 at the final simulation time, which corresponds to 20=a min (a min is the minimum process rate). The grey shade represents the percentile of all the simulations (black lines limit the 5-95%ile range); the blue curves correspond to some illustrative selected simulations. A reference curve corresponding to a Normal distribution of unitary mean and variance equal to 1=l R ¼ 1=30 is shown in green. Distributions of the total number of cells n are scaled by the mean number of cells n, being x ¼ n= n. Simulations for which the final condition (20 times the inverse of the minimum process rate) is not achieved (due to computational limitations) are not included, resulting in 922 models. Results discussion in provided in 'GIA model for large'.

GIA B test case: bimodal distribution
In the previous subsection we increased l R in a way which assures that other parameters within R stay of the same order of magnitude. Here, we address the question what happens if some parameters within R are much smaller than parameters of C, such as g C . For that purpose, we study another simple GIA model, let us call it GIA B , as a modification of the GIA 0 test model defined by 14. In the GIA B model the renewing compartment is composed by two states X 1 and X 2 , instead of only one. Cells in these states divide asymmetrically (i.e. one daughter cell remains within the renewing compartment while the other enters the committed compartment) or change state between X 1 and X 2 (cell state switching) while still remaining within the renewing compartment. The committed compartment of the system is composed just by a single state, X 3 , and cells in this state either duplicate or die (as the previous state X 2 in Equation 14). This corresponds to the model X 1 À ! l1 X 1 þ X 3 ; X 2 À ! l2 X 2 þ X 3 ; X 1 À ! !12 X 2 ; X 2 À ! !21 X 1 ; X 3 À ! l3 X 3 þ X 3 ; X 3 À ! g ;: In this model, the effective parameters as defined in 'Approximation of generic GIA models', !ijþ!ji , i; j ¼ 1; 2; i 6 ¼ j, and g C ¼ g. As before, we define the non-dimensionalized parametersl R ¼ l R =g C and here we also define! ¼ ! 12 =g C , and further the parameter ratios a ¼ l 1 =l 2 and b ¼ ! 12 =! 21 . In the following, we test this model for different values of a and! as reported in Appendix 1-table 2, while fixingl R ¼ 30, which is our main scaling parameter, as well asl C ¼ 0 and b ¼ 1. The rescaled distribution of the number of cells in the committed compartment C (i.e. in state X 3 ), n C , obtained at the final simulation time t , is shown in Appendix 1- figure 15. A value of t equal to 20=a min (where a min is the minimum of all rate parameters) was chosen to assure that the steady state is reached. Considering first the test cases GIA B #1 and GIA B #2 according to Appendix 1table 2, which are characterized by a ¼ 1 (i.e. there is no difference in the division timescales for the two renewing states), they both lead to a Normal distribution, independently on the value assumed by!. Test cases GIA B #3 to GIA B #7 instead are all characterized by a ¼ 10, and different orders of magnitude for! are tested. The distribution in these cases is Normal until! !l R =10 (see cases GIA B #3 to GIA B #5); when! is significantly lower thanl R , then bimodality emerges (see cases GIA B #6 and GIA B #7). Looking at the extreme case, GIA B #7, cells in each renewing state, if analyzed independently, would result in a Poisson distribution in the committed compartment with different mean values (low for the slow-dividing state and large for the fast-dividing one). Thus, globally the distribution is in line with a bimodal distribution computed as Appendix 1- figure 15. Rescaled distribution of the cells number in the committed compartment in the GIA B test cases at time t , which is 20=a min (a min is the minimum process rate). The distributions Pðx C Þ of the number of cells in the committed compartment n C is rescaled considering that x c ¼ ðn C À n C Þ=s nc , where s nc is the variance of n c . In addition to the stochastic simulation results for different settings (see Appendix 1-table 2), the reference Normal and bimodal distributions are also shown. Results discussion is provided in 'GIA B test case: bimodal distribution'.
in which b is the mixing parameter, computed as, b ¼ n À n 2 n 1 À n 2 ; and the parametersl ðiÞ R and n i for i ¼ 1; 2 correspond to the parameterl R and to the mean number of cells of a system in which the renewing compartment would be composed just by state X i . The total mean number of cells is instead indicated by n. The bimodal distribution given by Equation 45 is indicated as a black dashed-dotted line in Appendix 1-figure 15.

Analysis of the Generalized Population Asymmetry model
In the main text, it is shown that GPA models predict asymptotically, for large times t, the same rescaled clone size distribution, that is, an Exponential distribution of unitary mean.
In Appendix 1- figure 16, the 50%tile distribution of all the GPA models analyzed is shown at different levels of extinction (which are related to the different time points), showing a gradual convergence to the expected Exponential distribution. Thus, the Markov approximation to all GPA models, Equation 12 in the main text (the equivalent model of GPA models), becomes accurate for sufficiently large t and no significant deviations are observed. This also means that for large t, the distribution is independent of the choice of parameters, since only the mean value of surviving clones, n s , depends on parameters, which however, does not affect the rescaled distribution in terms of x ¼ n ns . We can therefore abstain from an extended study of different parameter regimes. This is in contrast to the GIA model class where distributions depend sensitively in the choice of parameters if we are not in the scaling regime of largel R , and the non-Markovian nature of GIA models can become relevant, as we showed in the previous section.

Asymptotic clone size distributions: Mathematical analysis
In the previous two sections, we studied numerically how a Markovian representation can approximate general cell fate models (GIA and GPA) models. Here, we study from an analytical view point how generic GIA and GPA models converge to the respective limiting distributions, for large time t (GPA models) and largel R (GIA models).
Similar to 'Approximation of generic GIA models', we define n R and n C as the cell number vectors (here: actual cell numbers of the stochastic model, not mean cell numbers) restricted to the states of compartments R and C, respectively. We further define the accumulated cell numbers n R ¼ P i ðn R Þ i and n C ¼ P i ðn C Þ i in R and C, respectively. Considering n R and n C as observables of our compartment model, this corresponds to a Hidden Markov Model in that the dynamics of the observables are not Markovian, yet they are entirely determined by a set of states which follow a Markov process.
General dynamics of C-cells for GIA and GPA models Comments on the effective rate parameter l R For general GIA and GPA models in the compartment representation of Equation 9, main text, the effective rate parameter l R (i.e. the frequency of cell divisions in R per cell), is defined similar as in 'Approximation of generic GIA Models', yet, here we take into account that l R can depend on time via the -not necessarily stationary -distribution of cells within R (since the process is non-Markovian). Hence, in these more general terms, we define l R ðtÞ ¼ P i2R l i P R i ðtÞ where P R i ðtÞ ¼ niðtÞ nRðtÞ is the probability of a single cell to be in state i at time t. P R i ðtÞ may variate after each event E, as the conditional probability P R j E , provided that an event E has just occurred, differs from the stationary state distribution.
In homeostasis, where the number of R-cells must on average stay constant, l R is also the rate, per R-cell, at which C-cells are created from R-cells, via events R ! R þ C; R ! C þ C, or direct transition, R ! C. Thus, the total rate of C-cells being created from the R-cells by such events -let us call them RC-events -is l R n R . While the non-Markovian nature of the process does not assure that such events are distributed exponentially, we can state that, by definition, the number of such creation events in a time period Dt, N RC , has mean value hN RC ðDtÞi ¼ R Dt 0 l R ðtÞn R ðtÞ dt. While, l R ðtÞ may in principle depend on time, we note that when internal rates of R are fast compared to the time period Dt above (an internal rate of R is a rate ! ij where states i,j are both in R), then l R ðtÞ fluctuates quickly and we can make an adiabatic approximation, replacing l R ðtÞ by its is the steady state value of P R i ðtÞ (this corresponds for GIA models to the definition of l R in 'Approximation of generic GIA models'). This is fulfilled in our simulations of largel R , since internal rates, such as! defined in 'GIA B test case: bimodal distribution', scale withl R when l R ! ¥ (see 'GIA model for large'). Hence, the time scales of internal rates are substantially smaller than the relevant time scale Dt ¼ 1= g C , the lifetime of generated C-cells. Therefore, when comparing with simulation results, it is generally appropriate to assume that l R ðtÞ » l R is constant. In the following subsection, we will discuss this case. The case when internal rates are slower than the time scale g C is discussed in the subsequent subsection.

Asymptotic distributions of C-cells
Each C-cell created by an RC-event initiates a sub-clone within C, defined through its progeny, which then follows the dynamics of C. Such sub-clones evolve independently of each other (a defining characteristic of branching processes [Haccou et al., 2005]). Let us call the number of cells of a sub-clone created by an RC-event at time t i , which evolves over time t, as i ðtÞ. We denote two RC-events which happen at the same time via a symmetric division of type R ! C þ C by different indices i and i þ 1, yet with t i ¼ t iþ1 . Therefore, the total number of cells in C is the sum of independent random numbers i , Note that the random numbers i ðtÞ are not identically distributed, since their statistics depend on the time point of the i-th RC-event. In particular, the mean value, i ðt À t i Þ ¼ h i ðtÞi and variance s 2 ðt À t i Þ ¼ hð i ðtÞ À i Þ 2 i depend on the time passed since the respective RC-event at time t i . Thus, we cannot apply the central limit theorem in its original form to the sum of random numbers, Equation 47. However, a variation of the central limit theorem states that sums of non-identically distributed random variables, P i i , converge to normally distributed random variables, if mean and variance of i are finite, and they fulfill Lindeberg's condition (Billingsley, 1995).
The (strict) Lindeberg's condition is said to be fulfilled for a sequence of random numbers i ,i ¼ 1; :::; N, if where s 2 i ¼ hð i À i Þ 2 i and s 2 N ¼ P N i¼1 s 2 i . If this is fulfilled, then n C ¼ P N i¼1 i converges for N ! ¥ to a random variable that is normal distributed.
To show that the i fulfill Lindeberg's condition, we note that i ðt À t i Þ follow a sub-critical multitype branching process, for which i ðtÞ ! 0 for t ! ¥, which is assured since the eigenvalues of the adjacency matrix of C are all negative (since dominant eigenvalues of all SCCs in C are negative [astrom_murray_feedback_book]). For multi-type branching processes, the variance s 2 is proportional to the mean value, hence s 2 i ðt À t i Þ~ ðt À t i Þ. Therefore, s 2 i ! 0 for t ! ¥, hence it is bounded, i.e there exists C > 0 such that s 2 i ðtÞ < C for all t. Furthermore, since initially, at t ¼ t i , i ðt i Þ ¼ 1, we know that there exist t 1 > 0 and d > 0 such that i ðtÞ > d for t À t i <t 1 . Now we recall that, since here we assume the validity of the adiabatic approximation discussed in the previous subsection, the number of RC-events within a time period Dt is N RC ðDtÞ~l R R Dt 0 n R ðt 0 Þ dt 0 . For generic l R , N RC is finite and thus is s N , since all s i ðtÞ ! 0 for large t. However, for l R ! ¥ or n R ! ¥, we get that N RC ðt 1 Þ~ l R n R ! ¥ and thus s 2 N ¼ P NRC i¼1 s 2 i ðtÞ > N RC d ! ¥. On the other hand, all s 2 i < C, which means that all s 2 i s 2 N < C s 2 N ! 0 for l R ! ¥ or n R ! ¥. Hence, Lindeberg's condition is fulfilled if l R ! ¥ or n R ! ¥ and thus, n C becomes normally distributed, The variance scales with n C since variances of independent random numbers add linearly and each s 2 i~ i . The exact value of n C and the pre-factor of the variance of n C in this limit depend on the (non-Markovian) model details.

Deviations from a normal distribution in the asymptotic case
The arguments leading to Equation 49 hold for largel R if the internal rates of R are comparable to , which is satisfied for all cases we sampled randomly for numerical simulations, see 'GIA model for large'. However, if internal rates of R are much smaller than l R , then the adiabatic approx- does not apply and l R ðtÞ may vary slower than the time scale 1= g C . For example, consider a GIA model in which R can be decomposed into two sub-compartments, say R 1 and R 2 , whereby any rates ! ij ; ! ji with i 2 R 1 ; j 2 R 2 have ! ij ; ! ji ( l R , as the example discussed in 'GIA B test case: bimodal distribution'. Then, the single cell in R (note that always n R ¼ 1 in GIA models) may spend long time periods in R 1 and R 2 respectively. Now, if then, for time periods exceeding 1= g C , the effective asymmetric division rates are l R1 and l R2 respectively, and during these time periods the distribution of n C cells has mean n ð1Þ C~ l R1 and n ð2Þ C~ l R2 respectively. Hence, the total clone size distribution will be the mix of two Normal distributions with mean n ð1Þ C and n ð2Þ C , respectively, that is, a bimodal distribution. This scenario is discussed in 'GIA B test case: bimodal distribution', for the specific case of two states in R.

GIA models
In GIA models, the number of R-cells is conserved, and in particular, for clones, we have n R ¼ 1 for all times. Hence, the rate of RC-events is simply l R . Now, if internal rates are fast and l R ! ¥, then n C becomes normally distributed, as argued above. Hence, also n ¼ n R þ n C ¼ 1 þ n C follows a Normal distribution, with mean n C þ 1 instead.
Nonetheless, if internal rates are less than g C then bimodal distributions may be observed, as discussed in 'GIA B test case: bimodal distribution'.

GPA models
The dynamics of GPA models read, in compartment formulation, R À ! lR R þ R Pr: r RR R þ C Pr: 1 À r RR À r CC C þ C Pr: r CC 8 > < > : ; (50) Since the dynamics of R-cells do not depend on C-cells, we can first consider the formers' dynamics separately. In homeostasis, where l R r RR ¼ l R r CC þ ! RC , we have thus for R-cells, This is a simple continuous time branching process with two offspring; yet it is non-Markovian: subsequent events may be correlated, since each event imbalances the internal distribution P R i of cells in the compartment R. Yet, as for C-cells, we can write the number of R-cells as a sum of independent (but not identically distributed) random variables. Let us consider for each R-cells, born at time t i , the random variable R i describing its 'survival' state, that is, R i ¼ 1 if that cell is still in R, and R i ¼ 0 if that cell has left R via symmetric differentiation, R ! C þ C or direct transition, R ! C. Essentially, the random numbers R i are the 'branches' of the branching process. Since these events do not depend on other cells, the random numbers R i are independent of each other, and thus, is a sum of independent, not identically distributed random variables. Here,N b ðtÞ is the total number of birth events occurring at rate l R r RR n R , R ! R þ R, up to time t. Since R i ðtÞ 1 and R i ðt ¼ t i Þ ¼ 1, we can argue analogue to above for Equation 49 that the sequence of R i fulfills Lindeberg's condition and thus n R converges to a Normal distribution, whereby the mean value n R ¼ 1 (since due to homeostasis the mean number is constant and the initial condition is n R ðt ¼ 0Þ ¼ 1). Hence, the probability to have n R cells in R is Pðn R Þ / e À ðn R À1Þ 2 2s 2 R~e À n 2 R 2s 2 R for n R ) 1 : However, here, the variance s 2 R is a random variable itself: Since the R i are independent, s 2 R ¼ P NbðtÞ i¼1 s 2 i , where s 2 i ¼ hð R i À R Þ 2 i, and where N b ðtÞ is a random variable. The random numbers R i can only have the values i ¼ 1 or R i ¼ 0 and they follow a simple death process, so for R ¼ 0, it must be s 2 i ¼ 0, while for R i ¼ 1, the variance must be finite, let's say, s 2 i ¼ bðtÞ>0 where b can in principle depend on time, yet is not known (it depends on the non-Markovian details of the model). Hence, since the number of summands with R i ¼ 1 is the number of surviving R-cells, that is, n R . Substituting s 2 R ¼ bðtÞn R into Equation 54 gives, This is an Exponential distribution with mean value n R ¼ hn R i ¼ 2bðtÞ. Finally, when we enforce normalisation of the probability distribution, we get, Pðn R Þ ¼ 1 n R ðtÞ e À n R n R ðtÞ for n R ) 1 : Eventually, we also have to 'add' the C-cells. Since for t ) 1, also n R ) 1, individual events n R ! n R AE 1 do not significantly affect the distribution of R-cells, P R i ¼ ni nR (in contrast to the case of n R ¼ 1 for GIA models), and hence we can assume the adiabatic approximation discussed above, where P R i » P RÃ i and thus l R » const. Therefore, C-cells are distributed according to a Normal distribution with mean n C and variance s 2 n2~ n C~lR n R . As argued in the main text, the mean value of n R , conditionend on survival of a clone, n R >0, must grow over time, without bound if t ! ¥. Therefore, we can generally assume that n R ) 1, and hence both n C~nR ! ¥ and s 2 C~n R ! ¥. However, if we express the