Parameters estimation for asymmetric bifurcating autoregressive processes with missing data

We estimate the unknown parameters of an asymmetric bifurcating autoregressive process (BAR) when some of the data are missing. In this aim, we model the observed data by a two-type Galton-Watson process consistent with the binary tree structure of the data. Under independence between the process leading to the missing data and the BAR process and suitable assumptions on the driven noise, we establish the strong consistency of our estimators on the set of non-extinction of the Galton-Watson, via a martingale approach. We also prove a quadratic strong law and the asymptotic normality.


Introduction
Bifurcating autoregressive processes (BAR) generalize autoregressive (AR) processes, when the data have a binary tree structure. Typically, they are involved in modeling cell lineage data, since each cell in one generation gives birth to two offspring in the next one. Cell lineage data usually consist of observations of some quantitative characteristic of the cells, over several generations descended from an initial cell. BAR processes take into account both inherited and environmental effects to explain the evolution of the quantitative characteristic under study. They were first introduced by Cowan and Staudte [4]. In their paper, the original BAR process was defined as follows. The initial cell is labelled 1, and the two offspring of cell k are labelled 2k and 2k + 1. If X k denotes the quantitative characteristic of individual k, then the first-order BAR process is given, for all k ≥ 1, by X 2k = a + bX k + ε 2k , X 2k+1 = a + bX k + ε 2k+1 .
The noise sequence (ε 2k , ε 2k+1 ) represents environmental effects, while a, b are unknown real parameters, with |b| < 1, related to the inherited effects. The driven noise (ε 2k , ε 2k+1 ) was originally supposed to be independent and identically distributed with normal distribution. But since two sister cells are in the same environment at their birth, ε 2k and ε 2k+1 are allowed to be correlated, inducing a correlation between sister cells, distinct from the correlation inherited from their mother.
Recently, experiments made by biologists on aging of Escherichia coli [15], motivated mathematical and statistical studies of the asymmetric BAR process, that is when the quantitative characteristics of the even and odd sisters are allowed to depend on their mother's through different sets of parameters (a, b), see Equation (2.1) below. In [9,8], Guyon proposes an interpretation of the asymmetric BAR process as a bifurcating Markov chain, which allows him to derive laws of large numbers and central limit theorems for the least squares estimators of the unknown parameters of the process. This Markov chain approach was further developed by Bansaye [2] in the context of cell division with parasite infection, and by Delmas and Marsalle [5], where the cells are allowed to die. Another approach based on martingales theory was proposed by Bercu, de Saporta and Gégout-Petit [3], to sharpen the asymptotic analysis of Guyon under weaker assumptions.
The originality of this paper is that we take into account possibly missing data in the estimation procedure of the parameters of the asymmetric BAR process, see Figure 1 for an example. This is a problem of practical interest, as experimental data are often incomplete, either because some cells died, or because the measurement of the characteristic under study was impossible or faulty. For instance, among the 94 colonies dividing up to 9 times studied in [15], in average, there are about 47% of missing data. It is important to take this phenomenon into account in the model for a rigorous statistical study.
Missing data in bifurcating processes were first modeled by Delmas and Marsalle [5]. They defined the genealogy of the cells through a Galton-Watson process, but they took into account the possible asymmetry problem only by differentiating the reproduction laws according to the daughter's type (even or odd). The bifurcating process was thus still a Markov chain. However, considering the biological issue of aging in E. coli naturally leads to introduce the possibility that two cells of different types may not have the same reproduction law. In this paper, we thus introduce a two-type Galton-Watson process to model the genealogy, and lose the Markovian structure of the bifurcating chain, so that we cannot use the same approach as [5]. Instead, we use the martingale approach introduced in [3]. It must be pointed out that missing data are not dealt with in [3], so that we cannot directly use their results either. In particular, the observation process is another source of randomness that requires stronger moment assumptions on the driven noise of the BAR process and careful choice between various filtrations. In addition, the normalizing terms are now random and the convergences are only available on the random non-extinction set of the observed process.
The naive approach to handle missing data would be to replace the sums over all data in the estimators by sums over the observed data only. Our approach is slightly more subtle, as we distinguish whether a cell has even or odd daughters. We propose a joint model where the structure for the observed data is based on a two-type Galton-Watson process consistent with the possibly asymmetric structure of the BAR process. See e.g. [12,1,10] for a presentation of multi-type Galton-Watson processes and general branching processes. Note also that our estimation procedure does not require the previous knowledge of the parameters of the two-type Galton-Watson process. This paper is organized as follows. In Section 2, we first introduce our BAR model as well as related notation, then we define and recall results on the twotype Galton-Watson process used to model the observation process. In Section 3, we give the least square estimator for the parameters of observed BAR process and we state our main results on the convergence and asymptotic normality of our estimators as well as estimation results on data. The proofs are detailed in the following sections.

Joint model
We now introduce our joint model, starting with the asymmetric BAR process for the variables of interest.

Bifurcating autoregressive processes
On the probability space (Ω, A, P), we consider the first-order asymmetric BAR process given, for all k ≥ 1, by (2.1) The initial state X 1 is the characteristic of the ancestor, while (ε 2k , ε 2k+1 ) is the driven noise of the process. In all the sequel, we shall assume that E[X 8 1 ] < ∞. Moreover, as in the previous literature, the parameters (a, b, c, d) belong to R 4 with 0 < max(|b|, |d|) < 1.
This assumption ensures the stability (non explosion) of the BAR process. As explained in the introduction, one can see this BAR process as a first-order autoregressive process on a binary tree, where each vertex represents an individual or cell, vertex 1 being the original ancestor, see Figure 2 for an illustration. We  Figure 2. The tree associated with the bifurcating auto-regressive process.
Let G r k be the generation of individual k, which means that r k = [log 2 (k)], where [x] denotes the largest integer less than or equal to x. Recall that the two offspring of individual k are labelled 2k and 2k + 1, or conversely, the mother of individual k is [k/2]. More generally, the ancestors of individual k are [k/2], [k/2 2 ], . . . , [k/2 r k ]. Denote by T n = n =0 G ,the sub-tree of all individuals from the original individual up to the n-th generation. Note that the cardinality |G n | of G n is 2 n , while that of T n is |T n | = 2 n+1 −1. Next, T denotes the complete tree, so to speak T = n≥0 G n = n≥0 T n = N * = N\{0}. Finally, we need to distinguish the individuals in G n and T n according to their type. Since we are dealing with the types even and odd, that we will also label 0 and 1, we set We now state our assumptions on the noise sequence. Denote by F = (F n ) the natural filtration associated with the first-order BAR process, which means that F n is the σ-algebra generated by all individuals up to the n-th generation, F n = σ{X k , k ∈ T n }. In all the sequel, we shall make use of the following moment and independence hypotheses.

Observation process
We now turn to the modeling of the observation process. The observation process is intended to encode if a datum is missing or not. The natural property it has thus to satisfy is the following: if the datum is missing for some individual, it is also missing for all its descendants. Indeed, the datum may be missing because of the death of the individual, or because the individual is the last of its lineage at the end of the data's gathering, see Figure 3 for an example of partially observed tree.

Definition of the observation process
Mathematically, we define the observation process, (δ k ) k∈T , as follows. We set δ 1 = 1 and define recursively the sequence through the following equalities: where (ζ k = (ζ 0 k , ζ 1 k )) is a sequence of independent random vectors of {0, 1} 2 , ζ i k standing for the number (0 or 1) of descendants of type i of individual k. The sequences (ζ k , k ∈ 2N * ) and (ζ k , k ∈ 2N + 1) are sequences of identically distributed random vectors. We specify the common laws of these two sequences using their generating functions, f (0) and f (1) respectively: where p (i) (j 0 , j 1 ) is the probability that an individual of type i gives birth to j 0 descendants of type 0, and j 1 of type 1. The sequence (δ k ) is thus completely defined. We also assume that the observation process is independent from the BAR process.
(HI) The sequences (δ k ) and (ζ k ) are independent from the sequences (X k ) and (ε k ).
Remark that, since both ζ 0 k and ζ 1 k take values in {0, 1} for all k, the observation process (δ k ) is itself taking values in {0, 1}. Finally, Equation (2.3) ensures that if δ k = 0 for some k ≥ 2, then for all its descendants j, δ j = 0. In relation with the observation process (δ k ), we introduce two filtrations: Z n = σ{ζ k , k ∈ T n }, O n = σ{δ k , k ∈ T n }, and the sigma field O = σ{δ k , k ∈ T}. Notice that O n+1 ⊂ Z n . We also define the sets of observed individuals as follows: Finally, let E be the event corresponding to the cases when there are no individual left to observe. More precisely, We will denote E the complementary set of E.

Results on the observation process
Let us introduce some additional notation. For n ≥ 1, we define the number of observed individuals among the n-th generation, distinguishing according to their types: and we set, for all n ≥ 1, Z n = (Z 0 n , Z 1 n ). Note that for i ∈ {0, 1} and n ≥ 1 one has One has G * 0 = G 0 = {1}, but, even if 1 is odd, the individual whose lineage we study may as well be of type 0 as of type 1. Consequently, we will work with possibly two different initial laws: P(·|Z 0 = e i ), for i ∈ {0, 1}, where e 0 = (1, 0) and e 1 = (0, 1). The process (Z n , n ≥ 0) is thus a two-type Galton-Watson process, and all the results we are giving in this section mainly come from [12]. Notice that the law of ζ k , for even k, is the law of reproduction of an individual of type 0, the first component of ζ k giving the number of children of type 0, the second the number of children of type 1. The same holds for ζ k , with odd k, mutatis mutandis. This ensures the existence of moments of all order for these reproduction laws, and we can thus define the descendants matrix P P = p 00 p 01 p 10 p 11 , is thus the expected number of descendants of type j of an individual of type i. We also introduce the variance of the laws of reproduction: It is well-known (see e.g. Theorem 5.1 of [12]) that when all the entries of the matrix P are positive, P has a positive strictly dominant eigenvalue, denoted π, which is also simple. We make the following main assumptions on the matrix P .
Hence, still following Theorem 5.1 of [12], we know that there exist left and right eigenvectors for π which are positive, in the sense that each component of the vector is positive. We call y = (y 0 , y 1 ) t such a right eigenvector, and z = (z 0 , z 1 ) such a left one; without loss of generality, we choose z such that z 0 + z 1 = 1. Regarding the two-type Galton-Watson process (Z n ), π plays the same role as the expected number of offspring, in the case of standard Galton-Watson processes. In particular, π is related to the extinction of the process, where the set of extinction of (Z n ) is defined as ∪ n≥1 {Z n = (0, 0)}. Notice that {Z n = (0, 0)} = {Z 0 n + Z 1 n = 0} = {|G * n | = 0}, so that this set coincides with E, defined by Eq. (2.4). Now let q = (q 0 , q 1 ), where, for i ∈ {0, 1}, The probability q i is thus the extinction probability if initially there is one individual of type i. These two probabilities allow to compute the extinction probability under any initial distribution, since P(E) = E[(q 0 ) Z 0 0 (q 1 ) Z 1 0 ], thanks to the branching property. Hypothesis (HO) means that the Galton-Watson process (Z n ) is super-critical, and ensures that 0 ≤ q i < 1, for both i = 0 and i = 1. This immediately yields Under that condition, we also have the existence of a non-negative random variable W such that for any initial distribution of Z 0 lim n→+∞ Z n π n = lim n→+∞ π − 1 π n+1 − 1 n =0 Z = W z a.s.. (2.7) It is well-known that {W = 0} = E a.s., so that the set {W > 0} can be viewed as the set of non-extinction E of (Z n ), up to a negligible set. These results give the asymptotic behavior of the number of observed individuals, since |G * n | = Z 0 n + Z 1 n , and |T * n | = n =0 (Z 0 + Z 1 ): Roughly speaking, this means that π n is a deterministic equivalent of |T * n | and Eq. (2.7) implies that z i is the asymptotic proportion of cells of type i in a given generation. We will thus very often replace |T * n | by π n for computations, and the next lemma will be used frequently to replace π n by |T * n |. Lemma 2.1 Under assumption (HO), we have

Joint model
The model under study in this paper is therefore the observed BAR process defined by The aim of this paper is to study the sharp asymptotic properties of the leastsquares estimators of the parameters (a, b, c, d) and the variance matrix of the noise process.

Least-squares estimation
Our goal is to estimate θ = (a, b, c, d) t from the observed individuals up to the n-th generation, that is the observed sub-tree T * n .

Definition of the estimators
We propose to make use of the standard least-squares (LS) estimator θ n which minimizes Consequently, we obviously have for all n ≥ 1 where, for all n ≥ 0, for i ∈ {0, 1}. In order to avoid intricate invertibility assumption, we shall assume, without loss of generality, that for all n ≥ 0, Σ n is invertible. Otherwise, we only have to add the identity matrix I 4 to Σ n , as Proposition 4.2 states that the normalized limit of Σ n is positive definite.
Remark 3.1 Note that when all data are observed, that is when all δ k equal 1, this is simply the least squares estimator described in the previous literature. However, one must be careful here with the indices in the normalizing matrix, as there are now two different matrices S 0 n and S 1 n , while there was only one in the fully observed problem. The intuitive way to deal with missing data would be to restrict the sums to the observed data only. Note that our estimator is more complex as it involves sums depending on the absence or presence of even-or odd-type daughters of the available data.
We now turn to the estimation of the parameters σ 2 and ρ. We propose to estimate the conditional variance σ 2 and the conditional covariance ρ by where for all k ∈ G n , and T * 01 n = {k ∈ T n : δ 2k δ 2k+1 = 1}, so to speak T * 01 n−1 is the set of the cells of the tree T n−1 which have exactly two offspring.

Main results
We can now state the sharp convergence results we obtain for the estimators above. We introduce additional notation. For i ∈ {0, 1}, let us denote : where z = (z 0 , z 1 ) is the left eigenvector for the dominant eigenvalue π of the descendants matrix P introduced in section 2.2.2, h i , k i are defined in Propositions 6.3 and 6.5 and the four terms of L 0,1 defined in Proposition 6.6. We also define the 4 × 4 matrices Our first result deals with the strong consistency of the LS estimator θ n .
In addition, we also have the quadratic strong law Our second result is devoted to the almost sure asymptotic properties of the variance and covariance estimators σ 2 n and ρ n . Let In addition, ρ n converges almost surely to ρ on E and one has Our third result concerns the asymptotic normality for all our estimators θ n , σ 2 n and ρ n given the non-extinction of the underlying Galton-Watson process. For this, using the fact that P(E) = 0 thanks to Eq. (2.6), we define the probability P E by In addition, we also have (3.10) wherep(1, 1) is defined in Eq. (6.6) and The proof of our main results is going to be detailed in the next sections. It is based on martingale properties, and we will exhibit our main martingale (M n ) in Section 4. Sections 5 to 7 are devoted proving to the sharp asymptotic properties of (M n ). Finally, in Section 8 we prove our main results. Before turning to the definition of the martingale (M n ), we present a short application of our estimation procedure on data.

Results on real data
The biological issue addressed by Stewart et al. in [15] is aging in the single cell organism Escherichia coli, see also [7] for further biological details. E. coli is a rod-shaped bacterium that reproduces by dividing in the middle. Each cell has thus a new end (or pole), and an older one. The cell that inherits the old pole of its mother is called the old pole cell, the cell that inherits the new pole of its mother is called the new pole cell. Therefore, each cell has a type: old pole (even) or new pole (odd) cell, inducing asymmetry in the cell division.
Stewart et al. filmed colonies of dividing cells, determining the complete lineage and the growth rate of each cell. Their statistical study of the averaged genealogy and pair-wise comparison of sister cells showed that the old pole cells exhibit cumulatively slowed growth, less offspring biomass production and an increased probability of death. Note that their test assumes independence between the averaged pairs of sister cells which is not verified in the lineage.
Another analysis was proposed in [9]. They model the growth rate by a Markovian bifurcating process, allowing single-experiment statistical analysis instead of averaging all the genealogical trees. Asymptotic properties of a more general asymmetric Markovian bifurcating autoregressive process are then investigated in [8], where a Wald's type test is rigorously constructed to study the asymmetry of the process. These results cannot be compared to ours because this model does not take into account the possibly missing data from the genealogies, and it is not clear how the author manages them, as not a single tree from the data of [15] is complete. In [5], the authors take missing data into account but, contrary to our approach, they allow different sets of parameters for cells with two, one or no offspring, making the direct comparison with our estimator again impossible.
We have applied our methodology on the set of data penna-2002-10-04-4 from the experiments of [15]. It is the largest data set of the experiment. It contains 663 cells up to generation 9 (note that there would be 1023 cells in a full tree up to generation 9). In particular, we have performed • point estimation of the vector θ, • interval estimation for the coefficients (a, b, c, d), • Wald's type symmetry tests for the entries of θ n .  Table 1 gives the estimation θ 9 of θ with the 95% Confidence Interval (C.I.) of each coefficient. The variance given by the CLT for θ in Eq. (3.9), is approximated by Σ −1 n Γ n Σ −1 n thanks to the convergence given in Corollary 4.3. The confidence intervals of b and d show that the non explosion assumption (|b| < 1 and |d| < 1) is satisfied. Some empirical computation on the process (δ k ) gives the following estimation for the highest eigenvalue of the Galton-Watson process :π = 1.35669 (with confidence interval [1.27979, 1.43361], see [14]), also satisfying the super-criticality assumption. Wald tests of comparison between the coefficients of θ have been deduced of the CLT. The null hypotheses (a, b) = (c, d) (resp. a = c, b = d) are rejected with p-values p= 0.0211 (resp. p= 0.0158 and p=0.0244). Hence on this data set the cell division is indeed statistically asymmetric.

Martingale approach
To establish all the asymptotic properties of our estimators, we shall make use of a martingale approach, similar to [3]. However, their results cannot be used in our framework, since the randomness comes now not only from the state process, but also from the time space (genealogy). These two mixed randomness sources require careful choice between various filtrations, and stronger moment assumptions on the driven noise of the BAR process. For all n ≥ 1, denote Thus, for all n ≥ 2, we readily deduce from Equations (3.1) and (2.1) that The key point of our approach is that (M n ) is a martingale for a well chosen filtration.

Martingale property
Recall that O = σ{δ k , k ∈ T} is the σ-field generated by the observation process.
We shall assume that all the history of the process (δ k ) is known at time 0 and use the filtration where S 0 n and S 1 n are defined in section 3.1 and Proof : First, notice that for all n ≥ 1, one has Now, we use the fact that for all n, F O n is a sub-σ field of O ∨ F n , the independence between O and F n under assumption (HI) and the moment hypothesis (HN.1) to obtain We obtain similar results for the other entries of ∆M n as δ 2k+1 and X k are It is clearly square integrable from assumption (HN.1). The same measurability arguments together with assumption (HN.2) yield Hence the result as Our main results are direct consequences of the sharp asymptotic properties of the martingale (M n ). In particular, we will extensively use the strong law of large numbers for locally square integrable real martingales given in Theorem 1.3.15 of [6]. Throughout this paper, we shall also use other auxiliary martingales, either with respect to the same filtration F O , or with respect to other filtrations naturally embedded in our process, see Lemma 5.1.

Asymptotic results
We first give the asymptotic behavior of the matrices S 0 n , S 1 n and S 0,1 n . This is the first step of our asymptotic results.
In addition, L 0 and L 1 , hence Σ are definite positive.
A consequence of this proposition is the asymptotic behavior of the increasing process of the martingale (M n ).
This result is the keystone of our asymptotic analysis. It enables us to prove sharp asymptotic properties for the martingale (M n ).
In addition, we also have Moreover, we have the central limit theorem on (E, P E ) As seen in Eq. (4.1), ( θ n − θ) is closely linked to M n and this last theorem is then the major step to establish the asymptotic properties of our estimators. The proof of this Theorem is given in Section 7. As explained before, it is a consequence of Proposition 4.2 which proof is detailed in Section 6. In between, Section 5 presents preliminary results in the form of laws of large number for the observation, noise and BAR processes.

Laws of large numbers
We now state some laws of large numbers involving the observation, noise and BAR processes. They are based on martingale convergence results, and we start with giving a general result of convergence for martingales adapted to our framework.

Martingale convergence results
The following result is nothing but the strong law of large numbers for square integrable martingales, written in our peculiar setting, and will be repeatedly used.
Lemma 5.1 Let G = (G n ) be some filtration, (H n ) and (G n ) be two sequences of random variables satisfying the following hypotheses: there exists a sequence of real numbers (a n ) that tends to ∞ such that k∈Tn H 2 k = O(a n ). Then k∈Tn H k G k is a G-martingale and one has lim n→∞ 1 a n k∈Tn H k G k = 0 a.s.
Proof: Define D n = k∈Tn H k G k . Assumptions (i) and (ii) clearly yield that (D n ) is a square integrable martingale with respect to the filtration (G n ). Thanks to (ii), its increasing process satisfies and now, (iii) implies that < D > n = O(a n ). Finally, since the sequence (a n ) tends to ∞, Theorem 1.3.15 of [6] ensures that D n = o(a n ) a.s.
We also recall Lemma A.3 of [3] that will be useful in the sequel. In addition, let (X n ) be a sequence of real-valued vectors which converges to a limiting value X. Then, imsart-ejs ver. 2009/08/13 file: EJS1106-014-revised.tex date: September 28, 2011

Laws of large numbers for the observation process
We now give more specific results on the asymptotic behavior of the observation process (δ k ) k≥1 . Recall the notation T i n defined in (2.2). Lemma 5.3 Under the assumption (HO), we have the following convergences, since G 0 = {1}, so that T i n contains 1 or not, according to i = 1 or not, and where Assumption (i) of Lemma 5.1 is obviously satisfied, since δ k , for k ∈ G n , is Z n−1 -measurable. Regarding (ii), since the sequence (ζ j k ) is a sequence of i.i.d. random variables with expectation p ij and variance σ 2 ij , we have E[G k |Z n−1 ] = 0 and E[G 2 k |Z n−1 ] = σ 2 ij , for k ∈ G n , and E[G k G p |Z n−1 ] = 0, for k = p ∈ G n . Finally, we turn to assumption (iii): thanks to (HO) and Eq. (2.7). Finally, D n = o(π n ), and again using Eq. (2.7), we obtain the first limit. The proof of the second one is similar using the Zmartingale: and Lemma 5.1 again.

Laws of large numbers for the noise process
We need to establish strong laws of large numbers for the noise sequence (ε n ) restricted to the observed indices.
Finally, we turn to assumption (iii): as n tends to infinity, thanks to Eq. (2.7). We obtain the result.
Proof: In order to prove the first convergence, we apply again Lemma 5.1 to the F O -martingale: which both implies assumption (iii) and the first convergence. To prove the second convergence, we write imsart-ejs ver. 2009/08/13 file: EJS1106-014-revised.tex date: September 28, 2011 We use Lemma 5.1 to prove that the first term converges to 0 ; Lemma 5.3 gives the limit of the second term.
Proof: The proof of the first limit is similar to the preceding ones, using the decomposition δ 2k+j = δ k ζ j k and the properties of the sequence (ζ j n ). Using Lemma 5.5 the second one is straightforward.
Proof : The proof follows essentially the same lines as the proof of Lemma 5.5 using the square integrable real martingales It is therefore left to the reader.

Convergence of the increasing process
We can now turn to the proof of our keystone result, the convergence of the increasing process of the main martingale (M n ).

Preliminary results
We first need an upper bound of the normalized sums of the δ 2n+i X 2 n , and δ 2n δ 2n+1 X 2 n before being able to deduce their limits. Proof: In all the sequel, for all k ≥ 1, define a 2k = a, b 2k = b, a 2k+1 = c, b 2k+1 = d and η k = a k + ε k with the convention that η 1 = 0. It follows from a recursive application of relation (2.1) that, for all k ≥ 1, with the convention that an empty product equals 1. Set α = max(|a|, |c|), β = max(|b|, |d|) and notice that 0 < β < 1. The proof of Lemma A.5 in [3] yields where, for i ∈ {0, 1}, The last two terms above are readily evaluated by splitting the sums generation-wise. Indeed, the last term can be rewritten as We now use Lemma 5.2 with A n = π −n and X n = β 2n Z i n+1 π −n . On the one hand, the series of (π −n ) converges to π/(π − 1) as π > 1 by assumption; on the other hand, β 2n tends to 0 as n tends to infinity as β < 1, and Z i n π −n converges a.s. to W z i according to Eq. (2.7), hence β 2n Z i n+1 π −n tends to 0 as n tends to infinity. Lemma 5.2 thus yields We now turn to the term B i n : due to Lemma 2.1. It remains to control the first term A i n . Note that ε k appears in A i n as many times as it has descendants up to the n-th generation, and its multiplicative factor for its p-th generation descendant k is β p δ 2k . This leads to

Now, note that
m=0 δ 2(2 p k+m)+i is the number of descendants of type i of individual k after p + 1 generations. We denote it Z i p+1 (k), and split A i n the following way: We first deal with the second term of the above sum.
. Tedious but straightforward computations lead to the following expression for the second order moment of Y i ,p , relying on assumptions (HI), (HN.1) and (HN.2). We also use the fact that, for k ∈ G , conditionally to {δ k = 1}, Z i p+1 (k) follows the same law as Z i p+1 , and is independent of any Z i p+1 (k ), for k = k ∈ G .
since k∈G −1 δ 2k δ 2k+1 ≤ k∈G −1 (δ 2k + δ 2k+1 ) = Z 0 + Z 1 . Now, using results on the moments of a two-type Galton-Watson process (see e.g. [12]), we know that , which immediately entails that |Y i ,p | = o(π α π γp ) a.s., for any α > 1/2 and γ > 1. We thus one gets a.s., since we can choose γ close enough to 1 to get βπ γ ≤ π, as β < 1. We have thus proved that the second term in the sum in (6.3) is O(π n ), we now turn to the first one n =1 k∈G Finally, A i n = O(π n ), and the first result of the Lemma is proved. The second result follows immediately from the remark that the second sum in Lemma 6.1 is clearly smaller than the first one.
We can easily prove that (B i n + C i n ) = O(π n ). Therefore, we only need a sharper estimate for A i n . Via the same lines as in the proof of Lemma 6.1, but dealing with ε 4 k instead of ε 2 k , we can show that A i n = O(π n ) a.s. which immediately yields the first result. The second one is obtained by remarking that the second sum is less than the first one.

Asymptotic behavior of the sum of observed data
We now turn to the asymptotic behavior of the sums of the observed data. More precisely, set H i n = k∈Tn δ 2k+i X k , for i in {0, 1}, and H n = (H 0 n , H 1 n ) t . The following result gives the asymptotic behavior of (H n ).
Proof: We first prove that the sequence (H n ) satisfies a recursive property using Equation (2.1).
Similarly, we have Let us denote B n = (B 0 n , B 1 n ) t . The last equations yield in the matrix form: with P 1 = 1 π bp 00 dp 10 bp 01 dp 11 = 1 π P t b 0 0 d .
One has P n 1 ≤ π −n β n P n . It is well known that π −n P n converges to a fixed matrix (see e.g. [13]) as P is a positive matrix with dominant eigenvalue π. Since β < 1, the sequence P is bounded, I 2 − P 1 is invertible and n≥0 P n 1 converges to (I 2 − P 1 ) −1 . In order to use Lemma 5.2, we need to compute the limit of B n /π n . First, we prove that k∈Tn\T0 ε k δ 2k+i = o(π n ), (6.4) for i ∈ {0, 1}, thanks to Lemma 5.1.

Remark 6.4
Putting together Proposition 6.3 and Eq. (6.5) above, we immediately get that under the same assumptions as that of Proposition 6.3, for all (i, j) ∈ {0, 1} 2 , result we will use for the study of the limit of X 2 k δ 2k+i .

Asymptotic behavior of the sum of squared observed data
We now turn to the asymptotic behavior of the sums of the squared observed data. Set K i n = k∈Tn δ 2k+i X 2 k , for i in {0, 1}, and K n = (K 0 n , K 1 n ) t . The following result gives the asymptotic behavior of (K n ). (HN.1), (HN.2), (HI) and (HO), we have the convergence:

Proposition 6.5 Under assumptions
Proof: We use again Equation (2.1) to prove a recursive property for the sequence (K n ). Following the same lines as in the proof of Proposition 6.3, we obtain: where C n = (C 0 n , C 1 n ) t is defined by for i ∈ {0, 1}. Note that P n 2 ≤ π −n β 2n P n , so that P n 2 converges to 0. In addition, P n 2 is bounded, I 2 − P 2 is invertible and n≥0 P n 2 converges to (I 2 − P 2 ) −1 . In order to use Lemma 5.2, we have to compute the limit of C n /π n . Following the proof of (6.4), we already have, for (i, j) ∈ {0, 1} 2 , k∈T j n ε k δ 2k+i = o(π n ) a.s.
We now turn to the terms k∈Tn−1 X 2 k δ 2k+i (δ 2(2k+i)+j − p ij ),for (i, j) ∈ {0, 1} 2 . To deal with these terms, we use Lemma 5.1 with the same setting we used to prove Eq. (6.5), except that we replace X k with X 2 k . Assumptions (i) and (ii) of Lemma 5.1 have thus already been checked, and regarding (iii), we have k∈Tn−1 X 4 k δ 2k+i = O(π n ) a.s. thanks to Lemma 6.2. We conclude that We use the same martingale tool, so to speak Lemma 5.
And we conclude using Lemma 5.2 again. Propositions 6.3 and 6.5 together with Equation (2.7) give the asymptotic behavior of the matrices S 0 n and S 1 n . The next result gives the behavior of matrix S 0,1 n given through the quantities k∈Tn δ 2k δ 2k+1 X k and k∈Tn δ 2k δ 2k+1 X 2 k . It is an easy consequence of Propositions 6.3 and 6.5, together with Lemma 5.3 for the first limit.

Proof of Proposition 4.2:
We are now in a position to complete the proof of Proposition 4.2. Simply notice that we have proved in Propositions 6.3, 6.5 and 6.6 all the wished convergences, except that we normalized the sums with π n . Thanks to Lemma 2.1, we end the proof.
Remark 6.7 In the case of fully observed date, the matrix P is a 2 × 2 matrix with all entries equal to 1, π equals 2 and the normalized eigenvector z equals (1/2, 1/2). One can check that in that case, our limits correspond to those of [3].

Asymptotic behavior of the main martingale
Theorem 4.4 is a strong law of large numbers for the martingale (M n ). The standard strong law for martingales is unhelpful here. Indeed, it is valid for martingales that can be decomposed in a sum of the form n =1 Ψ −1 ξ where (Ψ ) is predictable and (ξ ) is a martingale difference sequence. In addition, (Ψ ) and (ξ ) are required to be sequences of fixed-size vectors. Such a decomposition with fixed-sized vectors is impossible in our context (see Lemma A.2), essentially because the number of observed data in each generation asymptotically grows exponentially fast as π n . Consequently, we are led to propose a new strong law of large numbers for (M n ), adapted to our framework.
Note that M t n Σ −1 n ∆M n+1 and ∆M t n Σ −1 n M n+1 are scalars, hence they are equal to their own transpose and as a result, one has M t n Σ −1 n ∆M n+1 = ∆M t n Σ −1 n M n+1 . By summing over the identity above, we obtain the main decomposition The asymptotic behavior of the left-hand side of (7.1) is as follows.
Proof : Thanks to the laws of large numbers derived in Sections 5 and 6, the proof of Proposition 7.1 follows essentially the same lines as [3] and is given in Appendix A.
Since (V n ) and (A n ) are two sequences of non negative real numbers, Proposition 7.1 yields that 1 {|G * n |>0} V n = O(n) a.s. which proves Equation (4.2). We now turn to the proof of Equation (4.3). We start with a sharp rate of convergence for (M n ). Proof : The result is obvious on E. On E, the proof follows again the same lines as [3] thanks to the laws of large numbers derived in Sections 5 and 6. It is given in Appendix B.
Besides, Corollary 7.3 yields that A n ∼ n π−1 π 4σ 2 a.s. on E. Plugging these two results into the equality π−1 a.s. on E and convergence (4.3) directly follows.

Proof of the main results
We can now proceed to proving our main results.
Consequently, the asymptotic behavior of θ n − θ is clearly related to the one of V n . More precisely, we can deduce from Corollary 4.3 and the fact that the eigenvalues of a matrix are continuous functions of its coefficients the following result lim where λ min (A) denotes the smallest eigenvalue of matrix A. Since L as well as Σ is definite positive, one has λ min (Σ) > 0. Therefore, as we use Result (4.2) of Theorem 4.4 to conclude that which completes the proof.

Strong consistency for the variance estimators
For n ≥ 1, set The almost sure convergence of σ 2 n and ρ n is strongly related to that of V k −V k .
Proof of result (3.6) of Theorem 3.3: First of all, one has We clearly have imsart-ejs ver. 2009/08/13 file: EJS1106-014-revised.tex date: September 28, 2011 One can observe that for all k ∈ G n , V k − V k is F O n -measurable. Consequently, (P n ) is a real martingale transform for the filtration F O . Hence, we can deduce from the strong law of large numbers for martingale transforms given in Theorem 1.3.24 of [6] together with (3.5) that a.s.
It ensures once again via convergence (3.5) that which completes the proof of result (3.6).
Proof of results (3.7) and (3.8) of Theorem 3.3: We now turn to the study of the covariance estimator ρ n . We have The process (Q n ) is a real martingale transform for the filtration F O satisfying It now remains to prove that where where ⊗ denotes the Kronecker product of matrices, i.e.
and Φ 01 is defined similarly as Φ 0 and Φ 1 by the collection of (δ 2k δ 2k+1 , δ 2k δ 2k+1 X k ) t , k ∈ G . As Φ 01 n (Φ 01 n ) t = S 01 n − S 01 n−1 , proposition 4.2 implies that so that the asymptotic behavior of R n /n boils down to that of A proof along the same lines as in Section 7 finally yields the expected results, i.e.
which completes the proof of convergence (8.1). We then obtain which completes the proof of Theorem 3.3.

Asymptotic normality
Contrary to the previous literature on BAR processes, we cannot use the central limit theorem given by Propositions 7.8 and 7.9 of [11] as in [8,3] because the normalizing term is now the number of observations and is therefore random. The approach used in [5] strongly relies on the gaussian assumption for the noise sequence that does not hold here. Instead, we use the central limit theorem for martingales given in Theorem 3.II.10 of Duflo [6]. However, unlike the previous sections, this theorem can not be directly applied to the martingale (M n ) because the number of observed data in a given generation grows exponentially fast and the Lindeberg condition does not hold. The solution is to use a new filtration. Namely, instead of using the observed generation-wise filtration, we will use the sister pair-wise one. Let for some r > 2 and thanks to Hölder and Chebyshev inequalities. Besides, using Eq. (6.1) and similar calculations as in Lemma 6.1, one readily obtains Now, assumption (HN.1) together with β < 1 yield the existence of a constant C such that sup and recall that E[X 8 1 ] < ∞. Finally, since the entries of D k are combinations of ε 2k+i and X k , using again (HN.1) and (HI), one obtains that with r = 8. The Lindeberg condition is thus proved, plugging the convergence (8.2) into the following equality: We can now conclude that under P E Finally, result (3.9) follows from Eq. (4.1) and Corollary 4.3 together with Slutsky's Lemma.
Proof of Theorem 3.4, second step: On the one hand, we apply Theorem 3.II.10 of [6] As above, one clearly has Using assumptions (HN.1), (HI) and Lemma 5.3 we compute the limit of the increasing process under P E Therefore, the first assumption of Theorem 3.II.10 of [6] holds under P E . Thanks to assumptions (HN.2) and (HI) we can prove that for some r > 2, which implies the Lindeberg condition. Therefore, we obtain that under P E Furthermore, we infer from Eq. (3.6) that which yields result (3.10).
We turn now to the proof of result (3.11) with another G O p -martingale (M (n) ) defined by As above, one easily shows that Using assumptions (HN.1) and (H.I), we compute the limit of the increasing process lim n→∞ < M (n) > νn = ν 2 τ 4 − ρ 2 P E a.s.
Finally, result (3.11) follows, which completes the proof of Theorem 3.4.

Appendix A: Quadratic strong law
In order to establish the quadratic strong law for the main martingale (M n ), we are going to study separately the asymptotic behavior of (W n ) and (B n ) which appear in the main decomposition given by Equation (7.1). 1 n W n = 4(π − 1) π σ 2 1 E a.s.
Proof : First of all, we have the decomposition W n+1 = T n+1 + R n+1 where We first prove that As T n is a scalar and the trace is commutative, one can rewrite T n+1 as T n+1 = tr(Σ −1/2 H n+1 Σ −1/2 ) where Our goal is to make use of the strong law of large numbers for martingale transforms, so we start by adding and subtracting a term involving the conditional expectation of ∆H n+1 given F O n . We have already seen in Section 4.1 that for all n, E[∆M n+1 ∆M t n+1 |F O n ] = Γ n − Γ n−1 . Consequently, we can split H n+1 into two terms On the one hand, it follows from Corollary 4.3 and Lemma 2.1 that Thus, Cesaro convergence and the remark that {|G * | = 0} ⊂ {|G * n | = 0} for all ≤ n yield On the other hand, the sequence (K n ) is obviously a matrix martingale transform and tedious but straightforward calculations, together with Lemmas 6.1 and 6.2 and the strong law of large numbers for martingale transforms given in Theorem 1.3.24 of [6] imply that 1 {|G * n |>0} K n = o(n) a.s. Hence, we infer from the equation above that 1 n T n = π − 1 π tr(Σ −1/2 ΓΣ −1/2 )1 E = π − 1 π 4σ 2 1 E a.s.
which proves (A.1). We now turn to the asymptotic behavior of R n+1 . We know from Proposition 4.2 that 1 {|G * n |>0} (|T * n |Σ −1 n − Σ −1 ) goes to 0 as n goes to infinity. Hence, for all positive and for large enough n, one has Using again that {|G * | = 0} ⊂ {|G * n+1 | = 0} for all ≤ n + 1, we rewrite Hence, ). This last inequality holding for any positive and large enough n, the limit given by Equation (A.2) entails that Proof : The result is obvious on the extinction set E. Now let us work on E. Now for i ∈ {0, 1} and n ≥ 1, let ξ i n = (ε 2 n +i , ε 2 n +2+i , . . . , ε 2 n+1 −2+i ) t , be the collection of ε k , k ∈ G i n , and set ξ n = ξ 0 n , ξ 1 n t . Note that ξ n is a column vector of size 2 n+1 . With these notation, one has The sequence (B n ) is a real martingale transform satisfying ∆B n+1 = B n+1 − B n = 2M t n Σ −1 n Ψ n ξ n+1 .
Consequently, via the strong law of large numbers for martingale transforms, we find that either (B n ) converges a.s. or B n = o(< B > n ) a.s. where As C is definite positive under assumption (HN.1), one has C ≤ 2σ 2 I 2 n+1 in the sense that 2σ 2 I 2 n+1 − C is semi definite positive. Hence, one has

Now, by definition, one has
We now use Lemma B.1 of [3] on each entry to obtain as the matrix l k in that lemma is definite positive. Therefore, we obtain that Finally, we deduce from the main decomposition given by Equation (