Cox Markov models for estimating single cell growth

Recent experimental techniques produce thousands of data of single cell growth, consequently stochastic models of growth can be validated on true data and used to understand the main mechanisms that control the cell cycle. A sequence of growing cells is usually modeled by a suitable Markov chain. In this framework, the most interesting goal is to infer the distribution of the doubling time (or of the added size) of a cell given its initial size and its elongation rate. In the literature, these distributions are described in terms of the corresponding conditional hazard function, referred as division hazard rate. In this work we propose a simple but effective way to estimate the division hazard by using extended Cox modeling. We investigate the convergence to the stationary distribution of the Markov chain describing the sequence of growing cells and we prove that, under reasonable conditions, the proposed estimators of the division hazard rates are asymptotically consistent. Finally, we apply our model to study some published datasets of E-Coli cells. MSC 2010 subject classifications: Primary 60J05, 62N02, 62P10; secondary 62F12, 62M05.


Introduction
The understanding of the cell cycle is a classic topic in biology. Most bacteria, such as the Escherichia coli and Bacillus subtilis, divide when they have approximately doubled their volume, but there is always a range of variability of cell sizes. In order to explain this variability, many models for cell size regulation have been introduced in the literature. The key idea is that cells need a feedback mechanism that controls their size distribution and that leads to size homeostasis. Most of the models proposed for the control of the cell size can be grouped in three major paradigms: sizer, when the cell monitors its size and triggers the cell cycle once it reaches a critical size [14]; timer, when the cell attempts to grow for a specific amount of time before the division [14]; adder, when the cell attempts to add a constant volume to its newborn size [12]. It is possible also to imagine various combinations of the previous models to describe more complex mechanisms. A proposal in this direction is contained in [9], where a mechanism operating concertedly on the size and the age of the cell is discussed. Other recent proposals that lie between the timer and the adder are contained in [1,7]. We will refer to these kind of models generically as concerted control models.
Using the experimental technique called mother machine of [17], it is possible nowadays to follow thousands Gram-negative Escherichia coli and Gram-positive Bacillus subtilis cells for hundreds of generations under various growth conditions and hence it is now feasible to infer the main mechanism of the growth process in cells.
Δ i := X F,i − X I,i = X I,i (e αiτi − 1) is the size increment for the i-th cell. The dynamic of the new triplet is the same as the one described above, where f τ (·|α i+1 , X I,i+1 ) in (c) is replaced by the conditional density f Δ (·|α i , X I,i+1 ) of Δ i+1 given (X I,i+1 , α i+1 ).
From a biological point of view the most interesting goal is to understand the doubling time (or the increment size) distribution, that is the distribution of τ i (or equivalently Δ i ), given the initial size X I,i and the elongation rate α i , see [1,7,9,12]. These distributions are typically described in terms of the corresponding hazard functions, sometimes referred as division hazard rate functions. An important point is to understand the shape of the hazard function and its dependency on the other variables of interest, such as cell instantaneous and initial size, added volume, elapsed time from the previous cell division, and growth rate. Different dependencies correspond to different paradigms: sizer, timer, adder and various forms of concerted control.
For example, [9] considers a concerted control model on τ i and the inference on the division hazard rate is performed by first choosing a specific parametric form for the conditional hazard h τ of τ i , and then estimating the parameters by best fitting of the empirical (conditional) hazard.
Recently, [11] and [12] have shown that data are well reproduced by adder models, where the division rate depends on the added size Δ i . In particular, [12] suggests a complete independence of the increment Δ i from any other variables. Specifically, the model proposed in [12] to explain the cell growth is a perfect adder, i.e. the conditional hazard of Δ i has the form h Δ (δ|X I,i , α i ) = h Δ,0 (δ).
The empirical distribution of the Δ i 's is used to infer the shape of the division hazard rate h Δ . However the authors admit that there is no strong evidence against a more complex concerted control mechanism.
The partially contradictory results of the previous studies show the importance of studying efficient and rigorous statistical procedures to infer the shape of the hazard and to validate or reject the proposed models.
The aim of this paper is to propose a new effective method for estimating the division hazard rate using Cox modeling. In particular we shall consider a concerted control timer Cox model with hazard of the type h τ (t|X I,i , α i ) = h 0 (t)e Z(t|X I,i ,αi)β0 (1.2) or, alternatively, a concerted control adder Cox model with hazard of the type where both Z(t|X I,i , α i ) and Z(δ|X I,i , α i ) are possibly non constant covariates. Clearly, if β 0 is zero, we recover the pure timer model in the former case and the pure adder mode of [12] in the latter. Provided β 0 = 0, a correction to the pure timer (pure adder, respectively) is set in, giving rise to particular forms of concerted control models. We propose to estimate the regression parameter β 0 and the baseline hazard h 0 from data using the classic Cox partial-likelihood and Breslow estimator. It is worth noticing that the Cox model usually considers conditionally independent observations and exogenous covariates, whereas here we extend it to Markovian observations and endogenous covariates depending on α i and X I,i . The main theoretical contribution of this paper is the asymptotic consistency of the proposed estimators in this Markovian framework, see Propositions 2.1 and 2.3. A crucial assumption for consistency turns out to be the positive Harris recurrence of the growth process chain. For this reason we investigate some conditions for positive Harris recurrence, see Propositions 3.4 and 3.7. These results are of their own interest, since the convergence of the growth process to a stationary distribution ensures size homeostasis.
We applied our estimation methodology to investigate the size regulation mechanism of Escherichia coli, analyzing some real data taken from [4] and [17] and we explored what kinds of hypotheses the data support. We found that a Cox model applied to the increments Δ i (i.e. the concerted control adder Cox model) better explains the data, confirming the fact that the adder paradigm can be a valid choice for modeling the growth mechanism of cells. We compared a pure adder model (which in our context plays the role of the null model) with the concerted control adder Cox model. In almost all the sets of data we analyzed, we found that the model with the hazard depending on the initial size of the cell produces a better fit than the pure incremental model. We also observed that the stationary distribution of the estimated growth process is not very sensitive to the particular choice of the distribution of the elongation rates α i , while a good estimate of the distribution of the ratios η i is important to well reproduce the empirical distributions of the data. More precisely, allowing the distribution of η i to depend on the final size of the mother cell improves the fitting in presence of filaments (exceptionally long cells).
Some recent studies claim the need for more refined stochastic models that consider the dynamics in cell-cycle sub-periods related to DNA replication and segregation (see [10] and the reference therein). Until now measures of the length of these sub-periods are available only for a small number of cells. However, it is believed that these data will be more and more accessible in the next years. A natural approach to model a growth processes with sub-periods might consider a concatenation of Markovian processes similar to the ones we are proposing for a single-period model.
The rest of the paper is organized as follows. In Section 2 we detail the Markov chain for the growth process of a sequence of cells. In the same section, we propose a Cox model for the hazard rate of the interdivision times or of the size increments, we construct the estimators of the regression parameter and of the baseline cumulative hazard and, finally, we prove their asymptotic consistency. In Section 3 we studied the Positive Harris recurrence of the growth process chain. Section 4 is devoted to the applicability of our theoretical results to some specific Cox models selected for the data analysis. In Section 5 we develop an accurate statistical analysis of the cell data.
The Appendix contains all the proofs and some additional tables. In more details, Appendix A contains the proofs of the results stated in Section 2, while Appendix B contains the proofs of the results stated in Section 3. Some elementary properties and results on Markov chains are recalled in Appendix C. Finally, Appendix D collects some supplementary tables.

Extended Cox Markov models for cell size growth data
In this section we face the problem of using an extended Cox model to estimate the conditional hazard either of the interdivision times τ i or, alternatively, of the size increments Δ i , and we establish the asymptotic consistency of the proposed estimators.

Model description
From now on, we shall briefly refer to (X I,i , α i , τ i ) i≥0 as the time increment representation of the growth process and to (X I,i , α i , Δ i ) i≥0 as the space increment representation.
To simplify the presentation, we start noticing that these two chains share the same probabilistic structure. In order to see this, observe that the dynamic (a)-(c) described in the introduction corresponds to the transition rule where f X I is the conditional density of X I,i given X F,i−1 , f α is as in (b) and f τ as in (c). As for the initial condition, we assume (X I,0 , α 0 , τ 0 ) = (x 0 , a 0 , t 0 ). Analogously, the transition kernel of the chain ( Hence, if one sets Y i = (X I,i , α i ) and T i is either τ i in the time increment representation or Δ i in the space increment representation, it follows that in both cases (Y i , T i ) i≥0 is a Markov chain with transition kernel for suitable transition densities f (·|·) and g(·|·).
In order to introduce the Cox model, it is useful to express the conditional survival function of the positive r.v.'s T i in terms of the conditional hazard λ T (u|y), i.e., for every t > 0, we set The hazard function λ T (·|·) : R + ×R 2 + → R + is such that +∞ 0 λ T (s|y)ds = +∞ for every y ∈ R 2 + . Moreover, we shall denote by Λ T (t|y) = t 0 λ T (s|y)ds the corresponding cumulative conditional hazard. Thus, in the case of the time increment representation, one has for y = (x, a) (2.5) Analogously, for the space increment representation, one has Our proposal is to model the conditional hazard of the variable T i with a Cox model with covariate process {Z i (t) = Z(t|Y i ) : 0 ≤ t ≤ T i }. In other words, we assume that the conditional cumulative hazard of the variable T i is where λ 0 is a suitable unknown baseline hazard function, β 0 an unknown real number and Z(·|·) a given covariate function. If we apply this construction to the time increment representation, this leads to the following form for the survival function: Analogously, the survival function for the space increment representation turns out to be It is worth noticing that specifying either the conditional law of Δ i or of τ i by a Cox model leads to non equivalent laws for the growth process.

Timer and adder Cox models
In our framework the model for the cell growth is specified once f α , f η and the hazards H Δ or H τ are given. We assume that there are no parameters shared by f α , f η and H Δ , H τ , thus neither f α nor f η are involved in the estimation procedure of the hazards. Hence, we classify the models in terms of the conditional hazard, introducing six different choices of the covariate process Z. This leads to six families of models that we will apply to the cell growth data in Section 5. In all the cases we shall consider, the covariate process Z is built starting from a given function Z : Some remarks on f η and on the shape of Z are presented in Section 4, with a view to verifying the conditions for stationarity of the chain and the consistency of the estimators proposed in the following.
The model [τ |X I ] This model is a special case of concerted control depending only on the initial size of the cell. Here, Z(t|x, a) = Z(x) and hence In this case the covariate is independent of t and the model reduces to a Cox model with time independent covariates.
The model [τ |X F ] Also this model is a special case of concerted control. Here the conditional law of the interdivision time depends on the current size of the cell, but it does not depend directly on the growth rate. More precisely, we choose Z(t|x, a) = Z(xe at ) and, in terms of hazard, we get Z(x(e at − 1)) and hence The model [Δ|X I ] The model [Δ|X I ] is a concerted control model for the size increments, where the conditional law of the increment depends on the initial size of the cell, that is Z(δ|x, a) = Z(x) and hence The model [Δ|null] For the sake of comparison we shall also consider the null model h Δ (δ|x, a) = h 0,Δ (δ), that corresponds to a pure adder model.

Inference and consistency results
We now deal with the problem of estimating the parameters of the Cox model and we describe the estimation procedure for the process (Y i , T i ) i≥0 defined by  .
Typically the partial likelihood approach is used for independent failures (given the covariates), see e.g. [5] and the references therein. Since here we use the partial likelihood and Breslow estimators for a class of Markovian data, we have to investigate under which assumptions these estimators are consistent. The result is stated in Proposition 2.1 below.
We assume that the reader is familiar with basic notions of the theory of homogeneous Markov chains. However, for the sake of completeness, some definitions and basic results are included in Appendix C. An important hypothesis we need for proving consistency is the Harris positivity of the chain (Y i , T i ) i≥0 .
In particular, if (Y i , T i ) i≥0 is ϕ-irreducible positive Harris, then it is easy to see that its unique invariant distribution π can be disintegrated as Denote by π T the second marginal distribution, i.e. π T (dt) := R 2 + π(dydt), and define Thenβ n converges a.s. to β 0 andΛ 0,n (t) converges to Λ 0 (t) for every t > 0 a.s..
In Section 3 we shall discuss reasonable conditions under which assumption (a) holds true. Hypotheses (b)-(d) are, in general, easy to check and the next proposition provides a sufficient condition for the validity of (e). , and the Cox-Snell residuals, i.e.R i :=Λ n (T i |Y i ). (2.12) The residualsR i turn out to be very useful for checking the overall fit of the model.

Proposition 2.3.
Under the assumptions of Proposition 2.1,Λ n (t|y) converges to Λ T (t|y) for every t and y in R + a.s.. Furthermore, the Cox-Snell residualsR i converge a.s. to Λ T (T i |Y i ) and are asymptotically independent and exponentially distributed with mean one.

Existence of the stationary distribution
The aim of this section is to provide simple criteria in order to ensure that the growth process is a positive Harris Markov chain, both in the time and in the space increment representation. As we have seen in the previous section, this is a fundamental assumption to prove the consistency of the estimators of β 0 and Λ 0 . Moreover, the convergence of a growth process to a steady state is important also from a biological point of view. Indeed, if one shows that a steady state is reached, then homeostasis is guaranteed. First of all, let us note that the special form of the transition kernel essentially reduces the study of the asymptotic behavior of the whole growth process to the study of the embedded chain of the initial sizes (X I,i ) i≥0 . Apropos of this, in the space increment representation, it will be useful to consider the X I,i 's as random variables taking values in [0, +∞). In point of fact, the space increment representation is well-defined even if X I,i = 0, provided that the conditional law of Δ i given X I,i = 0 is properly assigned. Clearly, the case X I,i = 0 and Δ i > 0 is not consistent with (1.1), so that the space increment representation and the time increment representation are no longer equivalent. Although the situation in which X I,i = 0 has not biological interest, it will be considered below in order to simplify the study of the Harris recurrence. Thus, in this section, we assume the embedded chain (X I,i ) i≥0 to be a sequence taking values in R + = (0, +∞), when we deal with the time increment representation, and in [0, +∞), when we consider the space increment representation. In the first case, the embedded chain has kernel while, in the case of the space increment representation, the kernel of the embedded chain is and f Δ (·|0, a) is assumed to be well defined for every a > 0. By direct computations, it is easy to see that, if π X is an invariant probability measure for the embedded chain (X I,i ) i≥0 , then π X is absolutely continuous with respect to the Lebesgue measure and the Markov chain (X I,i , α i , τ i ) i≥0 has invariant probability measure Analogous considerations hold for the chain (X I,i , α i , Δ i ) i≥0 in the space increment representation. In this case the stationary distribution is Similar results hold for irreducibility and, under some additional assumptions, also for Harris recurrence, as it is stated in the following proposition.
is an irreducible Markov chain with respect to some suitable (σ-finite non trivial) measure. If moreover (X I,i ) i≥0 is positive Harris and irreducible with respect to the Lebesgue measure, then (X I,i , α i , τ i ) i≥0 is positive Harris too. The same statement holds in the space increment representation for the chain ( Note that the restriction of the kernel K * to (0, +∞) is a kernel too and it is absolutely continuous with respect to the restriction of the Lebesgue measure to (0, +∞). As already noted, if an invariant probability measure exists then it is absolutely continuous with respect to the Lebesgue measure, thus the restriction of the invariant probability measure to (0, +∞) is still an invariant probability measure for the kernel restricted to (0, +∞). This shows that also the Harris positivity is preserved by the restricted chain (see Corollary 1 in [13]). From now on, we denote by L both the Lebesgue measure on (0, +∞) and on [0, +∞), according to the considered representation. Moreover, by L-irreducible chain we mean a chain irreducible with respect to the Lebesgue measure L.
In the light of the previous considerations, in order to prove that the entire growth process is a positive Harris Markov chain, it is sufficient to prove that the kernel K (or K * ) of the embedded chain (X I,i ) i≥0 is positive Harris and Lirreducible. A classic strategy to prove that a chain (X I,i ) i≥0 is positive Harris is to show that: (i) it is ϕ-irreducible with the support of ϕ which has non empty interior, (ii) it is weakly Feller, (iii) it satisfies the so called drift condition. See Appendix C.

Sufficient condition for Harris positivity in the time increment representation
The following proposition gives a sufficient condition for L-irreducibility.
for every x > 0 and some x, a), then it is easy to see that the assumptions of Proposition 3.2 are satisfied provided that for some˜ > 0 and for every In the next proposition we consider simple hypotheses yielding that the kernel of the embedded chain is weakly Feller.

Proposition 3.3.
Assume that one of the following conditions holds:

Then, the transition kernel of the embedded chain is weakly Feller.
Finally, an application of the drift condition in Proposition C.1 (Appendix C) gives the next result.

Lemma 3.1. Let the transition kernel of the embedded chain
then the chain (X I,i ) i≥0 is positive Harris.

Remark 2. Recall that in the time increment representation
We focus now on time increment models in which the conditional survival function of τ is specified by an extended Cox model, i.e. we assume that S τ (·|·) is given by (2.7) with baseline survival function Applying Lemma 3.1 and Remark 2 one obtains the following criterion to check the Harris positivity for the time increment Cox model.
Assume also that, for If in addition K is weakly Feller and L−irreducible, then the chain (X I,i ) i≥0 is positive Harris recurrent.

Sufficient condition for Harris positivity in the space increment representation
Let us now consider the growth model in its space increment representation.

Remark 3. Following the same steps of the proof of Proposition 3.5, if, for all
Sufficient conditions in order to ensure that the embedded chain is weakly Feller are contained in the next proposition. Proposition 3.6. Assume that one of the following conditions holds: Then, the transition kernel of the embedded chain is weakly Feller.
As for the drift condition one can use the next Lemma 3.2. Let the transition kernel K * of the embedded chain (X I,i ) i≥0 be weakly Feller and L-irreducible. Assume that for some γ ∈ (0, 1] and that lim sup Then, the chain (X I,i ) i is positive Harris recurrent.
Let us consider the baseline survival function then the next proposition summarizes the main result for the space increment Cox model.

Applicability of the theoretical results
In this section we discuss the applicability of the theoretical results, concerning the existence of a stationary distribution and the asymptotic consistency of the estimators, to the timer and adder Cox models in Subsection 2.2 under the following conditions:

Stationarity
Let us investigate conditions for recurrence, Feller continuity and Harris positivity. a) if and only if h 0,τ (t) > 0, and this simplifies the task of verifying the sufficient conditions for the Lirreducibility stated in Proposition 3.2. As t 0 h 0,τ (s)ds < +∞ for every t > 0 and +∞ 0 h 0,τ (s)ds = +∞, one can always find 0 ≤ t 1 < t 2 ≤ +∞, with t 1 and t 2 as big as one likes, and 0 ≤ a 1 < a 2 ≤ +∞ such that If in addition there exist u 1 and u 2 such that then it is easy to show that the assumptions of Proposition 3.2 are satisfied and hence the L-irreducibility is guaranteed for all the timer Cox models.
Moreover, under (H1) and (H2), by Proposition 3.3 the embedded chain turns out to be weakly Feller. To see this, for example, let us consider the [τ |X I ] model. In this case and the first part of condition (c2) is satisfied since Z is continuous. Moreover, for every x, we may take for a suitable positive constant C and z − = min{β 0 M p , 0}. Hence also the second part of (c2) is satisfied and the chain is weakly Feller.
Finally the assumptions of Proposition 3.4 simplify as follows. One has z 0 = 0 and z ∞ = M p , hence the conditions in (3.6) become Hence an invariant probability measure π X for K should solve the distributional equation where X and Y are independent, X has law π X and Y has density g. We have already noticed that, if a stationary distribution π X exists, it must be absolutely continuous with respect to L. In this case it is easy to see that this equation do not have solutions. Thus, the chain X I,i has no invariant probability measure. For this reason, the model [τ |null] has not been included in Subsection 2.2 and will not be considered in the data analysis.
In the case of the model [Δ|null], if a stationary distribution of the process (X I,i ) i≥0 exists, then it is a perpetuity, see e.g. [3]. More precisely, it is a solution of the distributional equation where X ∞ and (η, Δ) are independent and (η, Δ) has the same distribution of each (η i , Δ i ). By Theorem 3.1 and Corollary 4.1 in [3], if E[log + (ηΔ)] < +∞, the previous equation has a unique solution (in law) and hence K * has a stationary distribution. Under condition (4.1), the kernel K * is L-irreducible and, since for every x, K * (x, ·) is absolutely continuous with respect to L, it follows from Proposition C.2 that K * is positive Harris recurrent.

Consistency
We now deal with the applicability of Proposition 2.1 in order to obtain the consistency of the estimators. Assumptions (b) and (c) are an immediate consequence of (H1). If in addition, we assume (H2) and (d), the properties of Harris positivity and L-irreducibility of the chain (X I,i ) i , which imply (a), are discussed in Subsection 4.1. Below, we study the validity of (e) by means of Proposition 2.2.

The models
Now chooseγ such that {a:γt(a)<γ} f α (a)da > 0 and note that Hence, in order to show that π Y {y : Z(t|y) = c} < 1, it suffices to prove that The last claim follows easily since by (3.1) one can write π Y (dxda) = π X (dx)f α (a)da and then To conclude, note that if the assumptions discussed in the previous subsection hold, the chain (X I,i ) i is L-irreducible and hence L is absolutely continuous with respect to π X (see Appendix C), which yields that π X {0 < x < c 1/p /γ} > 0.

Statistical analysis of some cell size growth data
In this section we apply the proposed extended Cox models to analyse some cell growth data taken from [4], [9] and [17].

Data description
The data in [9] have been elaborated from the publicly available dataset [17]. The original crude sample consists of measurements of size (length, width and projected area) at each time frame of 1 min from segmented images of cells growing in steady exponential conditions in a microfluidic device made of micron-sized channels. All the data correspond to the strain E. coli MG1655. To reduce the noise in the size measurements, the Authors in [9] extracted the cell size at a given time point from the fit of exponential cell growth, instead of using the value obtained for cell size at that frame. To this end, a linear fit was performed on the log of cell size against the time, by the least-squares method. This procedure uniquely associates to each cell its initial size X 0 (expressed in μm), final size X F (in μm), doubling time τ (in minutes) and growth rate α. For more details the reader is referred to the supplementary material of [9]. We perform our analysis on three datasets, each of 10000 cells, from now on denoted by MG-A, MG-B, MG-C. All the data refer to the so called old-pole cells, i.e. the cells at the closed end of each channel in the mother machine.
The second group of data are taken from [4] and have been obtained by using agarose pad microscopy under varying nutrient quality. We consider four datasets corresponding to two different strains (referred to as P5-ori and MRR), and three media (denoted by LB, CAA and Glc). For details on the strains and the media see [4]. In the following the four datasets are denoted by CAA-P5ori, Glc-P5ori, Glc-MRR, LB-MRR and contain 5905, 979, 2073, 2296 observations, respectively. As a first approximation, we treated each dataset as a unique sequence of mother-daughter cells.

Goodness of fit by Cox-Snell residuals
Let us recall that in all the models we assume that the growth rates α i are independent and identically distributed and that there are no parameters shared by the densities of α and η and the hazards of Δ and τ . Hence, as noted in the previous section, neither f α nor f η are involved in the estimation procedure of the hazards H Δ and H τ , which we are mainly interested in. For this reason, at a first stage, we may not specify f α and f η . As for the function Z, in line with hypothesis (H1) of Section 4, in order to satisfy assumption (b) of Proposition 2. In order to assess the fit of each alternate model listed in Subsection 2.2, we first estimated the hazard functions and then we computed the Cox-Snell residualsR i as in (2.12). Then, we checked whether the residualsR i 's behave as a sample from a unit exponential distribution, by means of the Kolmogorov-Smirnoff test. Indeed, thanks to Proposition 2.3, if a Cox model fits the data, then the one-sample Kolmogorov-Smirnov test should fail to reject the null hypothesis that the residuals are exponentially distributed with mean one and the corresponding p-value should be high. Tables 1 and 2 report p-values and Kolmogorov-Smirnoff statistics for the datasets MG-A, MG-B, MG-C and CAA-P5ori, Glc-P5ori, Glc-MRR, LB-MRR, respectively.
Concerning the datasets MG-A, MG-B and MG-C, the summaries in Table 1 < 0.05). This supports the idea that the Cox model behaves quite well when applied to the increments Δ i . We also computed the Nelson-Aalen estimator of the cumulative hazard rate for each of the candidate Cox models and compared them to the unit exponential cumulative hazard rate. Results give similar evidence (plots not shown). As regards Cox-Snell residuals for the data CAA-P5ori, Glc-P5ori, Glc-MRR, LB-MRR reported in Table 2, it is worth noticing that their sample sizes are smaller than in the previous datasets and that they are more noisy. The Cox-Snell residuals of this second group of data do not lead to an interpretation as clean as for the first group.

Comparison between simulated processes and empirical observations
In the perspective of deselecting poor models, as a further measure of fit, we compare the actual marginal empirical distributions of the interdivision times τ i , the increments Δ i , the initial sizes X I,i and the final sizes It is important to note that in order to get a sample from an estimated model one needs to take into account the distributions of α and η. According to (H2) in Section 4, the simplest choice assumes η independent of the final size X F . In this way one can estimate the laws of α and η by the corresponding empirical distributions. As mentioned in the introduction, the law of α is quite peaked, hence another possibility is to estimate it by a (truncated) gaussian distribution with the sample mean and variance as parameters. Apropos of this, let us recall that [9] assumes a gaussian model for α. Indeed, an exploratory analysis of the growth rates α i shows that, if one excludes the extreme values (around 0.1% of the largest and 0.1% of the smallest ones), then a gaussian model fits the data. Once f α and f η has been estimated, we generated the artificial dataset and computed the two-sample Kolmogorov-Smirnoff distance between the actual and the simulated data. As the actual and simulated samples are Markovian, we do not perform a homogeneity Kolmogorov-Smirnoff test.
The results in Table 3 confirm that the incremental Cox models fit better the datasets MG-A, MG-B and MG-C than the interdivision time Cox models, for α and η generated from the empirical distributions. Note that the models [ All these findings are confirmed by the plots of the histograms of the actual data compared with the ones of the artificial data. Figures 1(a), 1

(b) and 3(a) show the plots for MG-A, MG-B and MG-C datasets, respectively.
It is worth mentioning that the comparison between artificial and actual data are not very sensitive to the choice of the law of α. For example, Table 8 in Appendix D reports similar values for Kolmogorov-Smirnoff distances under a gaussian fit of α. Conversely, the results are more affected by the choice of the law of η. For instance, the fit is considerably worse if one generates η from a (truncated) gaussian distribution with sample mean and variance as parameters.
As regards the datasets CAA-P5ori, Glc-P5ori, Glc-MRR, LB-MRR from [4], the comparison between simulated and true observations is less easy to interpret.  First of all, it turns out that the fitting is strongly affected by the variance of the simulated η. In particular, for all models listed in Subsection 2.2, we get better results under a constant η equal to the empirical mean, than for η simulated from the empirical distribution. This may be due to the unexpected high value of η's variance, probably caused by the presence of outliers and errors in the reconstruction of the genealogy of the cells.
In Table 4 we report the two-sample Kolmogorov-Smirnoff statistics, when the artificial η i 's are equal to the empirical mean and the artificial α i 's are sampled from the empirical distribution. The model [Δ|X I ] confirms a good fit for all four datasets.  Table 2). Thus, also for these second group of datasets the Cox model seems to behave quite well when applied to the increments Δ i , while it does not work for the interdivision time τ i . Table 9 in Appendix D gives the two-sample Kolmogorov-Smirnoff statistics when both α and η are generated from the empirical distributions. Similar conclusions can be drawn, even if the Kolmogorov-Smirnoff values are in general higher, indicating a worse fit.
As for the null incremental model [Δ|null], we observe that the two-sample Kolmogorov-Smirnoff statistics are very similar to those obtained with the nonnull models [Δ|X I ] and [Δ|τ ], regardless of the choice of η. In particular, [Δ|null] and [Δ|X I ] essentially yield identical results for Glc-P5ori dataset. As one can seen in Table 10 in Appendix D, this fact is consistent with theβ n estimate that is very close to zero, one order of magnitude less than theβ n estimates for the remaining datasets.

Pure adder versus concerted control adder
As mentioned in the introduction, from a biological point of view it is interesting to understand if a pure adder mechanism explains the growth process or if a more complex mechanism, as a generalized Cox model, is needed. On the basis of the results of Subsection 5.
for almost all x. Then we estimate the L 1 error where model stands for [Δ|null] or [Δ|X I ] and E π denotes the expectation with respect to the stationary distribution of the process. We estimated M [Δ|null] by the empirical meanM [Δ|null] whereŜ 0,Δ,n (δ) = exp{−Ĥ 0,Δ,n (δ)},Ĥ 0,Δ,n is the Breslow estimator of H 0,Δ andβ n is the estimator of β 0 (see [19]). Lastly, we estimated the "true" conditional mean E[Δ i |X I,i = x] non-parametrically by the Nadaraya-Watson esti-matorM with G a bounded symmetric kernel density with bandwidth b (see [18]). Here we assumed the Epanechnikov kernel G(ν) = (3/4)(1 − ν 2 ) + and we chose the bandwidth b given in Equation (5.5) page 127 in [16]. In this way we obtained the following plug-in estimator The estimatesM [Δ|X I ] (x) andM ker (x) may be very poor in the tail of the empirical distribution of the X I,i 's and, as a consequence, the empirical estimate of L model turns out to be sensitive to few extreme X I,i values. To alleviate this problem, we considered the following truncated version where X (i) is the i-th order statistic of the X I,i 's, p ∈ (0, 1) and np is the integer part of np. Note thatL model (p) is a plug-in estimator of where q p is the p-th quantile of the stationary distribution of the X I,i 's. Table 5 Table 6. In particular, the third column in both tables showsL [Δ|null] , while the fourth one reportsL [Δ|X I ] in the case Z(x) = min(x, 350) =: Z 1 (x). Comparing these two columns, we conclude that the model [Δ|X I ] works better than the null model, except for Glc-P5ori. It seems that the values ofL model for Glc-P5ori are not affected by the choice of any model. This matches the fact that theβ n estimate under [Δ|X I ] is very close to zero, suggesting that [Δ|null] is the true model for Glc-P5ori.
We performed the same analysis with Z(x) = min(x γ , 350 γ ) for different choices of γ, without observing significant differences. For the MG datasets all these choices give an estimated conditional meanM [Δ|X I ] that is close tô M ker (x) for x smaller than a suitable threshold (between 4 and 5), while they are not completely satisfactory for larger values. See, e.g., black and blue trianglelines in Figure 2(a). For the group of data CAA-P5ori, Glc-MRR, LB-MRR and Glc-P5ori, the behavior ofM ker (x) is more irregular. The estimated conditional mean of Glc-MRR exhibits a pattern similar to the MG datasets. The   estimated conditional mean of Glc-P5ori is essentially constant, while the patterns of CAA-P5ori and LB-MRR are quite oscillatory. In order to save space, in Figure 2(b) we display the parametric and non-parametric estimates of the conditional mean of Glc-MRR dataset only. Summarizing, at least for the MG group of data,M ker is monotonic in the range corresponding to the 90-95% smallest X I,i 's (the percentage depending on the dataset), while for the remaining values of x the behavior is more irregular. Although the estimatesM ker (x) are less reliable for large x, they suggest that the "true" conditional mean E[Δ|X I ](x) is bounded. On the contrary, one can easily see that if β 0 < 0 and Z is strictly increasing, then the conditional mean is strictly increasing with x; the opposite is true if β 0 > 0. Here all theβ n estimates are negative and coherently the corresponding estimatesM [Δ|X I ] (x) are strictly increasing. Unfortunately, this fact is supported neither by the data nor by biological considerations. In the light of these findings, we explored the impact of a "real" truncation in Z, and we con- . As a rule of thumb, we choosed m and M based on some suitable quantiles of the X I,i 's. For instance, the results shown on the fifth column in Tables 5 and Table 6 Figure 2 confirms these findings. We computed the Cox-Snell residuals of the model [Δ|X I ] with such truncated Z's functions and we found that the corresponding p-values are smaller than those from the non truncated case, although they are bigger than 0.05. See Table 11 in Appendix D.
We can conclude that the non-null model [Δ|X I ] behaves better than the null model [Δ|null] in all considered datasets with the exception of Glc-P5ori.

Modeling the dependence between η and X F
In some growth conditions and for some strains, exceptionally long cells (filaments) can be observed. These cells produce a second minor mode in the distribution of cell size, as can be clearly seen for the MG-C dataset in Figure 3. But this second peak is definitely not reproduced by the models of the previous subsections, as in those applications we assumed the independence of η from the final size of the cell. Already in [17] it has been observed that filaments are probably related to aging, an effect that cannot be captured by any homogenous Markov model. Nevertheless, there is another interesting feature that one can try and take into account in the models we are considering: there is a striking difference in the division patterns of the cells depending on their final size. In the note accompanying the datasets of [17], it is pointed out that filaments longer than 15μm never divide at the midcell but at the quarter positions. This is evident for all the MG datasets, see for example Figure 4 for MG-C. This fact suggests that, in modeling the law of η, one should account for some dependence on X F . Roughly speaking, the η's behave in different manners as the final size X F increases and, in particular, for final size bigger than 13μm we observe a bimodalshaped distribution of η with modes around 0.25 and 0.75. In order to bring this feature into our model as simply as possible, we assume that   wherex 0 = 0 <x 1 < · · · <x K = +∞ and f η,j (u) is a density on the unit interval. In what follows K = 3, f η,j (u) = N [0,1],μj ,σj (u) for j = 1, 2 and f η, 3 where N [0,1],a,b denotes a gaussian density with mean a and standard deviation b truncated on [0, 1] and 0 < p 3 < 1.
Step 1 Using the k-means algorithm, we first split the data into three clusters: cl 1 corresponding to small values of X F,i , cl 2 corresponding to the medium ones and cl 3 corresponding to the large ones.
Step 2 As the third cluster cl 3 highlights two point-clouds, apparently based on the values of η i , we perform another k-means algorithm with k = 2 to the only η i 's of the data in cl 3 and denote by cl 31 , cl 32 the clusters induced in cl 3 by the last 2-means classification of the η i 's.
The result of the above clustering procedure applied to MG-C dataset is shown in Figure 4. Hence we takex 1 andx 2 equal to the maximum values attained by the X F,i 's in cl 1 and cl 2 , respectively, and we set p 3 equal to the proportion of the data in cl 3 belonging to cl 31 . Finally, we estimate μ j by the sample mean and σ j by the sample variance of the η i 's such that the couple (η i , X F,i ) belongs to cl j , whereas μ 3j and σ 3j are estimated by the sample mean and variance of η i 's such that (η i , X F,i ) is in cl 3j , for j = 1, 2.
We performed the validation procedure described in Subsection 5.3 using this estimate of the conditional density f η , for the MG datasets. The corresponding two-sample Kolmogorov-Smirnoff distances between actual and simulated data are collected in Table 7. Even with this choice of f η , Cox models applied to the  Table 3 and Table 7 points out that the introduction of a dependence between η and X F improves the fitting of all the marginal distributions for all the MG datasets. In particular, the strongest improvement is obtained for the MG-C dataset, which contains the highest percentage of filaments (8.5% of cells longer than 13μm versus 1.7% in MG-A and 2.8% in MG-B). The histograms of actual and simulated MG-C data in Figure 3(a) (for η sampled from the empirical distribution) and in Figure 3(b) (for η sampled from the distribution described above) reveal that the Δ-models with η dependent on X F reproduce more accurately the tails of the true data. In particular, the resulting distributions for X I and τ turn out to be more peaked and the marginal distribution of X F better mimics the second minor mode of the true data. We did not apply this statistical analysis to the datasets CAA-P5ori, Glc-P5ori, Glc-MRR, LB-MRR, since they are not large enough to contain a non-negligible number of filaments.

Appendix A: Proofs of Section 2
The proof of Proposition 2.1 follows the same line of the analogous results in [2] and it will be carried out in a series of lemmata. The proofs of Proposition 2.2 and Proposition 2.3 are given at the end of this section.

Proof of Proposition 2.1. Let
In order to show thatβ n converges to β 0 P-a.s., one observes that that M n is a concave function andβ n ∈ argmax M n (β) (see Lemma A. 2). Then one shows that M n (β) converges P-a.s. to M (β) for all β (see Lemma A.6) and that β → M (β) has a unique maximum in β 0 (see Lemma A.4). At this stage the first part of the thesis follows by Thm. 7.7 in [6]. The second part of the thesis is proved in Lemma A.7.
The second part of the thesis is the law of large number for a (ϕ-irreducible and) positive Harris Markov chain. See Appendix C.
for every x ∈ R + see e.g. Proposition 4.2.1 in [8]. We now claim that (B.1) implies n≥1 P{(X I,n+j , α n+j , τ n+j ) ∈ C|X I,j = x} for every x ∈ R + and j = 0, 1, . . .. Because of the homogeneity of the chain (X I,n , α n , τ n ), it is sufficient to prove that (B.1) implies (B.2) when j = 0. To see this, note that n≥1 P{(X I,n , α n , τ n ) ∈ C|X I,0 = x} by (B.2) and theφ-irreducibility of (X I,n , α n , τ n ) n≥0 follows. Let now (X I,i ) i≥0 be ϕ-irreducible and positive Harris. Then there exists a (unique) invariant probability measure π X , and it follows from the first part of the proof that (X I,i , α i , τ i ) i≥0 isφ-irreducible with invariant probability measure π given by (3.1). Hence by Proposition C.2 it suffices to show that for all x K(x, ·) is absolutely continuous with respect to π. To this aim consider A ∈ B(R + ) such that where I(x) = {(a,t)∈R 2 + | (x,a,t)∈A} f α (a)f τ (t|x, a)dadt. Then I(x) = 0 π X -a.s.. Since (X I,i ) i≥0 is L-irreducible, then L is absolutely continuous with respect to π X (see Appendix C) and then I(x) = 0 a.s.. It follows that which completes the proof.
The statement for the chain (X I,n , α n , Δ n ) n≥0 can be proved in a similar way.
Proof. By induction it is easy to show that for every x ∈ N c and every y ∈ This shows that, for every x > 0, y > 0 and x, y / ∈ N , there is n = n(x, y) such that k n (x, y) > 0, so that  k(x, y). The L-irreducibility of the imbedded chain follows by Lemma B.1.
Proof of Proposition 3.3. Using the continuity assumption (c1) and the Scheffé's theorem, one gets that for every measurable set A the function x → P{X I,i+1 ∈ A|X I,i = x} is continuous on R + and hence the chain is strongly (and a fortiori weakly) Feller. Alternatively, if (c2) holds, the dominated convergence yields that lim x→x g(y)K(x, dy) = g(y)K(x, dy), for everyx > 0 and every bounded continuous function g. Thus the chain is weakly Feller.
Proof of Lemma 3.1. Condition (3.2) yields the drift condition (C.1) of Proposition C.1 in Appendix C for the chain W n = X I,n , for suitable positive constants and C, a compact set A = [a, b] in R + and the test function V (x) = log 2 (x). As the Lebesgue measure L on R + has support with non-empty interior, L[a, b] > 0 if a < b, and the sub-levels of V are clearly compact sets, then the thesis follows.

Proof of Proposition 3.5.
Recall that the kernel K * of the imbedded chain (X I,i ) i≥0 in the space increment representation is x + δ f Δ (δ|x, a)dδf α (a)da dy and hence, for every x and almost all y, its kernel density k * satisfies x + δ f Δ (δ|x, a)dδf α (a)da.
Thanks to the assumptions and the definition of B x , if y ∈ (0, u 2 (0)d 1 (0)) and x = 0 or if y ∈ (x(1 − ), x(1 + )) and x > 0, the integrand is positive on B * x and so does k * (x, y). The L-irreducibility of the imbedded chain follows again by Lemma B.1.
Proof of Proposition 3.6. The proof follows the same lines of the proof of Proposition 3.3.
Proof of Lemma 3.2. We shall apply Proposition C.1 in Appendix C to the chain W n = X I,n for V (x) = x γ . Since the Lebesgue measure L on [0, +∞) has support with non-empty interior and the sub-levels of V are compact sets, it is enough to verify (C.1) for a suitable closed interval A. As 0 < γ < 1 and 0 ≤ η n ≤ 1 a.s., one has

Appendix C: Some definitions and properties of homogenous Markov chains
A homogeneous Markov chain (W n ) n≥0 taking values in a measurable space (U, U) with transition kernel K(u 0 , du) = P{W n+1 ∈ du|W n = u 0 } is said to be ϕ-irreducible if there is a (σ-finite non trivial) measure ϕ such that n≥1 K n (u 0 , A) > 0 for every u 0 ∈ U and A ∈ U with ϕ(A) > 0. A ϕ-irreducible Markov chain with kernel K is said to be Harris recurrent if, for every A such that ψ(A) = n≥0 K n (x, A)2 −(n+1) ϕ(dx) > 0 and every u 0 ∈ A, P(W n ∈ A i.o. |W 0 = u 0 ) = 1. Finally a ϕ-irreducible Markov chain is said to be positive Harris if it is Harris recurrent and it admits an invariant probability measure, that is if there is a probability measure π W such that π W (A) = K(u, A)π W (du) for every A ∈ U. In point of fact, if K is positive Harris, the invariant probability measure π W is unique and every irriducibility measure ϕ is absolutely continuous with respect to π. See e.g. Thm. 10.4.4 and Thm. 10.4.9 in [8].
Furthermore, recall that if (U, U) is a topological space with the Borel σ−field, the chain (W n ) n≥0 , or equivalently its transition kernel K, is said to be weakly Feller if K maps C b (U) in C b (U) that is w → K(w, du)g(u) is continuous for every bounded and continuous function g. Analogously, (W n ) n≥0 , or equivalently its transition kernel K, is said to be strongly Feller if w → K(w, du)g(u) is continuous for every bounded and measurable function g.
Let now suppose that the homogenous Markov chain (W n ) n≥0 takes values in a Borel subset of R n and let K denote its transition kernel as above. A sufficient condition for (W n ) n≥0 to be Harris positive is the following version of the so-called "drift" criteria.
Proposition C.1. Let the chain (W n ) n≥0 be weakly Feller and ϕ-irreducible with ϕ such that the support of ϕ has non-empty interior. If the drift conditions hold for some compact set A with ϕ(A) > 0, some constants > 0 and C < +∞, and some non-negative, measurable function V such that the sets {w : V (w) ≤ λ} are compact sets for every λ > 0, then (W n ) n≥1 (or equivalently K) is positive Harris.
Proof. Since the chain is weakly Feller and in view of the property of the irreducibility measure ϕ, it follows from Thm. 6.0.1 and Thm. 6.2.7 in [8] that every compact set is petite. Hence, the function V satisfies that {w : V (w) ≤ λ} is petite for every λ > 0, i.e. V are unbounded off petite sets (see p. 191 in [8]). As (C.1) implies that with A compact and hence petite set, then by Thm. 8.4.3 and Prop. 9.1.7 (ii) in [8] we obtain that the chain (W n ) n≥1 is Harris recurrent. Finally, by Thm. 11.0.1 in [8], the drift conditions (C.1) imply that the invariant measure has finite total mass, i.e. up to a constant it is a probability measure. That is the chain (W n ) n≥1 is positive Harris.
Finally, let us recall Corollary 1 in [13], which gives a sufficient condition for Harris recurrence of the kernel K when there exists an invariant σ-finite measure.
Proposition C.2. Suppose K is π-irreducible with πK = π. If K(x, ·) is absolutely continuous with respect to π for all x, then K is Harris recurrent.