Asymptotic analysis for bifurcating autoregressive processes via a martingale approach

We study the asymptotic behavior of the least squares estimators of the unknown parameters of bifurcating autoregressive processes. Under very weak assumptions on the driven noise of the process, namely conditional pair-wise independence and suitable moment conditions, we establish the almost sure convergence of our estimators together with the quadratic strong law and the central limit theorem. All our analysis relies on non-standard asymptotic results for martingales.

1. Introduction. Bifurcating autoregressive (BAR) processes are an adaptation of autoregressive (AR) processes to binary tree structured data. They were first introduced by Cowan and Staudte [2] for cell lineage data, where each individual in one generation gives birth to two offspring in the next generation. Cell lineage data typically consist of observations of some quantitative characteristic of the cells over several generations of descendants from an initial cell. BAR processes take into account both inherited and environmental effects to explain the evolution of the quantitative characteristic under study.
More precisely, the original BAR process is defined as follows. The initial cell is labelled 1, and the two offspring of cell n are labelled 2n and 2n + 1. Denote by X_n the quantitative characteristic of individual n. Then, the first-order BAR process is given, for all n ≥ 1, by

X_{2n} = a + b X_n + ε_{2n}   and   X_{2n+1} = a + b X_n + ε_{2n+1}.

The noise sequence (ε_{2n}, ε_{2n+1}) represents environmental effects, while a, b are unknown real parameters with |b| < 1. The driven noise (ε_{2n}, ε_{2n+1}) was originally supposed to be independent and identically distributed with normal distribution. However, since two sister cells share the same environment early in their lives, ε_{2n} and ε_{2n+1} are allowed to be correlated, inducing a correlation between sister cells distinct from the correlation inherited from their mother.
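For concreteness, here is a minimal simulation sketch of this first-order model in Python. The Gaussian noise with within-pair correlation and the parameter values are illustrative assumptions, and the function name is ours, not from the paper.

```python
import numpy as np

def simulate_bar1(n_generations, a=1.0, b=0.5, sigma=1.0, rho=0.3, x1=0.0, seed=0):
    """Simulate a first-order BAR process X_{2n} = a + b X_n + eps_{2n},
    X_{2n+1} = a + b X_n + eps_{2n+1} on the binary tree, indexed 1 .. 2^{n+1}-1."""
    rng = np.random.default_rng(seed)
    size = 2 ** (n_generations + 1)           # indices 1 .. 2^{n_generations+1}-1 are used
    x = np.zeros(size)
    x[1] = x1                                 # the original ancestor is labelled 1
    cov = [[sigma**2, rho], [rho, sigma**2]]  # correlated sister noises (eps_{2n}, eps_{2n+1})
    for n in range(1, 2 ** n_generations):
        e = rng.multivariate_normal([0.0, 0.0], cov)
        x[2 * n] = a + b * x[n] + e[0]
        x[2 * n + 1] = a + b * x[n] + e[1]
    return x  # x[k] is the characteristic of individual k

# Example: 10 generations of descendants from a single ancestor
X = simulate_bar1(10)
```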
Several extensions of the model have been proposed. On the one hand, we refer the reader to Huggins and Basawa [10] and Basawa and Zhou [1, 15] for statistical inference on symmetric bifurcating processes. On the other hand, higher-order processes, in which not only the effects of the mother but also those of the grandmother and earlier ancestors are taken into account, have been investigated by Huggins and Basawa [11]. Recently, an asymmetric model has been introduced by Guyon [5, 6] where only the effects of the mother are considered, but sister cells are allowed to have different conditional distributions. We can also mention a recent work of Delmas and Marsalle [3] dealing with a model of asymmetric bifurcating Markov chains on a Galton-Watson tree instead of a regular binary tree.
The purpose of this paper is to carry out a sharp analysis of the asymptotic properties of the least squares (LS) estimators of the unknown parameters of general asymmetric pth-order BAR processes. There are several results on statistical inference and asymptotic properties of estimators for BAR models in the literature. For maximum likelihood inference on small independent trees, see Huggins and Basawa [10]. For maximum likelihood inference on a single large tree, see Huggins [9] for the original BAR model, Huggins and Basawa [11] for higher-order Gaussian BAR models, and Zhou and Basawa [15] for exponential first-order BAR processes. We also refer the reader to Zhou and Basawa [14] for the LS parameter estimation, and to Hwang, Basawa and Yeo [12] for the local asymptotic normality for BAR processes and related asymptotic inference. In all those papers, the process is supposed to be stationary. Consequently, X_n has a time-series representation involving a holomorphic function. In Guyon [5], the LS estimator is also investigated, but the process is not stationary, and the author makes intensive use of the tree structure and Markov chain theory. Our goal is to improve and extend the previous results of Guyon [5] via a martingale approach. As previously done by Basawa and Zhou [1, 14, 15], we shall make use of the strong law of large numbers [4] as well as the central limit theorem [7, 8] for martingales. This will allow us to go further in the analysis of general pth-order BAR processes. We shall establish the almost sure convergence of the LS estimators together with the quadratic strong law and the central limit theorem.
The paper is organised as follows. Section 2 is devoted to the presentation of the asymmetric pth-order BAR process under study, while Section 3 deals with the LS estimators of the unknown parameters. In Section 4, we explain our strategy based on martingale theory. Our main results about the asymptotic properties of the LS estimators are given in Section 5. More precisely, we shall establish the almost sure convergence, the quadratic strong law (QSL) and the central limit theorem (CLT) for the LS estimators. The proofs of our main results are detailed in Sections 6 to 10, the more technical ones being gathered in the appendices.
2. Bifurcating autoregressive processes. In all the sequel, let p be a fixed positive integer. We consider the asymmetric BAR(p) process given, for all n ≥ 2^{p−1}, by

(2.1)  X_{2n} = a_0 + Σ_{k=1}^{p} a_k X_{[n/2^{k−1}]} + ε_{2n}   and   X_{2n+1} = b_0 + Σ_{k=1}^{p} b_k X_{[n/2^{k−1}]} + ε_{2n+1},

where [x] stands for the largest integer less than or equal to x. The initial states {X_k, 1 ≤ k ≤ 2^{p−1} − 1} are the ancestors, while (ε_{2n}, ε_{2n+1}) is the driven noise of the process. The parameters (a_0, a_1, ..., a_p) and (b_0, b_1, ..., b_p) are unknown real numbers. The BAR(p) process can be rewritten in the abbreviated vector form given, for all n ≥ 2^{p−1}, by

(2.2)  𝕏_{2n} = A 𝕏_n + η_{2n}   and   𝕏_{2n+1} = B 𝕏_n + η_{2n+1},

where the regression vector 𝕏_n = (X_n, X_{[n/2]}, ..., X_{[n/2^{p−1}]})^t, η_{2n} = (a_0 + ε_{2n}) e_1 and η_{2n+1} = (b_0 + ε_{2n+1}) e_1 with e_1 = (1, 0, ..., 0)^t ∈ R^p, and where A and B are the p × p companion-type matrices whose first rows are (a_1, ..., a_p) and (b_1, ..., b_p), respectively. This process is a direct generalization of the symmetric BAR(p) process studied by Huggins, Basawa and Zhou [10, 14]. One can also observe that, in the particular case p = 1, it is the asymmetric BAR process studied by Guyon [5, 6]. In all the sequel, we shall assume that E[X_k^8] < ∞ for all 1 ≤ k ≤ 2^{p−1} − 1 and that the matrices A and B satisfy the contracting property

β = max(||A||, ||B||) < 1,   where   ||A|| = sup{||Au||, u ∈ R^p with ||u|| = 1}.

As explained in the introduction, one can see this BAR(p) process as a pth-order autoregressive process on a binary tree, where each vertex represents an individual or cell, vertex 1 being the original ancestor; see Figure 1 for an illustration. For all n ≥ 1, denote the nth generation by

G_n = {2^n, 2^n + 1, ..., 2^{n+1} − 1}.

In particular, G_0 = {1} is the initial generation and G_1 = {2, 3} is the first generation of offspring from the first ancestor. Let G_{r_n} be the generation of individual n, which means that r_n = [log_2(n)]. Recall that the two offspring of individual n are labelled 2n and 2n + 1. Denote by

T_n = ⋃_{k=0}^{n} G_k

the sub-tree of all individuals from the original individual up to the nth generation. It is clear that the cardinality of T_n is |T_n| = 2^{n+1} − 1. Finally, we denote by T_{n,p} = {k ∈ T_n, k ≥ 2^p} the sub-tree of all individuals up to the nth generation without T_{p−1}. One can observe that, for all n ≥ 1, T_{n,0} = T_n and, for all p ≥ 1, T_{p,p} = G_p.
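As a side illustration of these indexing conventions, here is a small Python helper sketch; the function names are ours, not from the paper.

```python
import math

def generation(n):
    """r_n: index of the generation containing individual n (the ancestor 1 is in G_0)."""
    return int(math.floor(math.log2(n)))

def G(n):
    """The nth generation G_n = {2^n, ..., 2^{n+1} - 1}."""
    return range(2 ** n, 2 ** (n + 1))

def T(n):
    """The sub-tree T_n of all individuals up to the nth generation; |T_n| = 2^{n+1} - 1."""
    return range(1, 2 ** (n + 1))

def T_np(n, p):
    """T_{n,p} = {k in T_n : k >= 2^p}: T_n without the first p generations T_{p-1}."""
    return range(2 ** p, 2 ** (n + 1))

assert list(G(1)) == [2, 3] and len(list(T(3))) == 2 ** 4 - 1
```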
3. Least-squares estimation. The BAR(p) process (2.1) can be rewritten, for all n ≥ 2^{p−1}, in the matrix form

(3.1)  Z_n = θ^t Y_n + V_n,

where Z_n = (X_{2n}, X_{2n+1})^t, Y_n = (1, 𝕏_n^t)^t, V_n = (ε_{2n}, ε_{2n+1})^t, and the (p + 1) × 2 matrix parameter θ is given by

θ = ( a_0  b_0 ; a_1  b_1 ; ... ; a_p  b_p ).

Our goal is to estimate θ from the observation of all individuals up to the nth generation, that is the complete sub-tree T_n. Each new generation G_n contains half the globally available information. Consequently, we shall show that observing the whole tree T_n or only the generation G_n is almost the same. We propose to make use of the standard LS estimator θ̂_n which minimizes

Δ_n(θ) = Σ_{k∈T_{n−1,p−1}} ||Z_k − θ^t Y_k||².

Consequently, we obviously have, for all n ≥ p,

(3.2)  θ̂_n = S_{n−1}^{−1} Σ_{k∈T_{n−1,p−1}} Y_k Z_k^t,

where the (p + 1) × (p + 1) matrix S_n is defined as

S_n = Σ_{k∈T_{n,p−1}} Y_k Y_k^t.

In the special case where p = 1, S_n simply reduces to

S_n = Σ_{k∈T_n} ( 1  X_k ; X_k  X_k² ).

In order to avoid a useless invertibility assumption, we shall assume, without loss of generality, that for all n ≥ p − 1, S_n is invertible. Otherwise, we only have to add the identity matrix I_{p+1} to S_n. In what follows, we shall make a slight abuse of notation by identifying θ as well as θ̂_n with vec(θ) and vec(θ̂_n).
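A minimal numerical sketch of the LS estimator in the case p = 1, assuming the formulas reconstructed above (S_{n−1} summing Y_k Y_k^t over the observed mothers); the function name is ours, and it reuses the simulate_bar1 sketch from the introduction.

```python
import numpy as np

def ls_estimator_bar1(x, n_gen):
    """Least squares estimator of theta for an asymmetric BAR(1) process.
    x[k] holds the characteristic of individual k; the pairs (X_{2k}, X_{2k+1})
    for k in T_{n-1} are regressed on Y_k = (1, X_k)^t."""
    ks = np.arange(1, 2 ** n_gen)              # T_{n-1}: all mothers observed so far
    Y = np.column_stack([np.ones_like(ks, dtype=float), x[ks]])   # |T_{n-1}| x 2
    Z = np.column_stack([x[2 * ks], x[2 * ks + 1]])               # |T_{n-1}| x 2
    S = Y.T @ Y                                # S_{n-1} = sum of Y_k Y_k^t
    theta_hat = np.linalg.solve(S, Y.T @ Z)    # 2 x 2, columns (a_0, a_1) and (b_0, b_1)
    return theta_hat

# With the simulated tree from above (symmetric case a_0 = b_0, a_1 = b_1):
# theta_hat = ls_estimator_bar1(X, 10)
```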
The reason for this vec identification will be explained in Section 4. Hence, we readily deduce from (3.2) that

θ̂_n = (I_2 ⊗ S_{n−1}^{−1}) Σ_{k∈T_{n−1,p−1}} vec(Y_k Z_k^t),

where ⊗ stands for the matrix Kronecker product. Consequently, it follows from (3.1) that

θ̂_n − θ = (I_2 ⊗ S_{n−1}^{−1}) Σ_{k∈T_{n−1,p−1}} vec(Y_k V_k^t).

Denote by F = (F_n) the natural filtration associated with the BAR(p) process, which means that F_n is the σ-algebra generated by all individuals up to the nth generation, F_n = σ{X_k, k ∈ T_n}. In all the sequel, we shall make use of the five following moment hypotheses.
(H.1) One can find σ² > 0 such that, for all n ≥ p − 1 and for all k ∈ G_{n+1}, ε_k belongs to L² with E[ε_k | F_n] = 0 and E[ε_k² | F_n] = σ² a.s.
Remark 3.1. In contrast with [14], one can observe that we do not assume that (ε_{2n}, ε_{2n+1}) is a sequence of independent and identically distributed bivariate random vectors. The price to pay for giving up this iid assumption is higher moments, namely assumptions (H.3) and (H.5). Indeed, we need them to make use of the strong law of large numbers and the central limit theorem for martingales. However, we do not require any normality assumption on (ε_{2n}, ε_{2n+1}). Consequently, our assumptions are much weaker than the existing ones in the previous literature.
We now turn to the estimation of the parameters σ² and ρ. On the one hand, we propose to estimate the conditional variance σ² by the average of the squared residuals over the sub-tree T_{n−1,p−1}, where, for all n ≥ p − 1 and for all k ∈ G_n, the residual associated with the pair (X_{2k}, X_{2k+1}) is built from the past observations only. One can observe that we make use of only the past observations for the estimation of the parameters. This will be crucial in the asymptotic analysis. On the other hand, we estimate the conditional covariance ρ by the corresponding average of the products of the two residuals within each pair.
4. Martingale approach. In order to establish all the asymptotic properties of our estimators, we shall make use of a martingale approach. It allows us to impose a very mild restriction on the driven noise (ε_n) compared with the previous results in the literature. As a matter of fact, we only assume suitable moment conditions on (ε_n) and that (ε_{2n}, ε_{2n+1}) are conditionally independent, while it is assumed in [14] that (ε_{2n}, ε_{2n+1}) is a sequence of independent identically distributed random vectors. For all n ≥ p, denote

(4.1)  M_n = Σ_{k∈T_{n−1,p−1}} vec(Y_k V_k^t),

so that, by Section 3, θ̂_n − θ = (I_2 ⊗ S_{n−1}^{−1}) M_n. The key point of our approach is that (M_n) is a martingale. Most of the asymptotic results for martingales were established for vector-valued martingales. That is the reason why we have chosen to make use of vector notation in Section 3. In order to show that (M_n) is a martingale adapted to the filtration F = (F_n), we rewrite it in a compact form. Let Ψ_n = I_2 ⊗ Φ_n, where Φ_n is the rectangular matrix of dimension (p + 1) × δ_n, with δ_n = 2^n, given by

Φ_n = (Y_{2^n}, Y_{2^n+1}, ..., Y_{2^{n+1}−1}).

It contains the individuals of generations G_{n−p+1} up to G_n and is also the collection of all Y_k, k ∈ G_n. Let ξ_n be the random vector of dimension δ_n given by

ξ_n^t = (ε_{2^n}, ε_{2^n+2}, ..., ε_{2^{n+1}−2}, ε_{2^n+1}, ε_{2^n+3}, ..., ε_{2^{n+1}−1}).
The vector ξ_n gathers the noise variables of generation G_n. The special ordering separating odd and even indices is tailor-made so that M_n can be written as

M_n = Σ_{k=p−1}^{n−1} Ψ_k ξ_{k+1}.

By the same token, one can observe that

S_n = Σ_{k=p−1}^{n} Φ_k Φ_k^t.

Under (H.1) and (H.2), we clearly have, for all n ≥ 0, E[ξ_{n+1} | F_n] = 0 and Ψ_n is F_n-measurable. In addition, it is not hard to see that, for all n ≥ 0,

E[ξ_{n+1} ξ_{n+1}^t | F_n] = Γ ⊗ I_{δ_n},   where   Γ = ( σ²  ρ ; ρ  σ² ).

We shall also prove that (M_n) is a square integrable martingale. Its increasing process is given, for all n ≥ p + 1, by

<M>_n = Γ ⊗ S_{n−1}.

It is necessary to establish the convergence of S_n, properly normalized, in order to prove the asymptotic results for the BAR(p) estimators θ̂_n, σ̂²_n and ρ̂_n. One can observe that the sizes of Ψ_n and ξ_n are not fixed and double at each generation. This is why we have to adapt the proof of the vector-valued martingale convergence given in [4] to our framework.

5. Main results.
We now state our main results, first on the martingale (M_n) and then on our estimators.
Proposition 5.1. Assume that (ε_n) satisfies (H.1) to (H.3). Then, we have

(5.1)  lim_{n→∞} S_n / |T_{n,p−1}| = L   a.s.,

where L is a positive definite matrix specified in Section 7.
This result is the keystone of our asymptotic analysis. It enables us to prove sharp asymptotic properties for (M_n).
Theorem 5.1. Assume that (ε_n) satisfies (H.1) to (H.3). Then, we have the strong law (5.2) together with the quadratic strong law (5.3) for the martingale (M_n). In addition, if (ε_n) also satisfies (H.4) and (H.5), we have the central limit theorem (5.4).
From the asymptotic properties of (M_n), we deduce the asymptotic behavior of our estimators. Our first result deals with the almost sure asymptotic properties of the LS estimator θ̂_n.
Theorem 5.2. Assume that (ε_n) satisfies (H.1) to (H.3). Then, θ̂_n converges almost surely to θ with the rate of convergence given in (5.5). In addition, we also have the quadratic strong law (5.6), where Λ = I_2 ⊗ L.
Our second result is devoted to the almost sure asymptotic properties of the variance and covariance estimators σ̂²_n and ρ̂_n.
Theorem 5.3. Assume that (ε_n) satisfies (H.1) to (H.3). Then, σ̂²_n converges almost surely to σ²; more precise rates are given in (5.7) and (5.8). In addition, ρ̂_n converges almost surely to ρ, with the corresponding rate given in (5.10).
Our third result concerns the asymptotic normality of all our estimators θ̂_n, σ̂²_n and ρ̂_n.
Theorem 5.4. Assume that (ε_n) satisfies (H.1) to (H.5). Then, we have the central limit theorem (5.11) for θ̂_n. In addition, we also have the central limit theorems (5.12) and (5.13) for σ̂²_n and ρ̂_n, respectively.
The rest of the paper is dedicated to the proof of our main results. We start by giving laws of large numbers for the noise sequence (ε_n) in Section 6. In Section 7, we give the proof of Proposition 5.1. Sections 8, 9 and 10 are devoted to the proofs of Theorems 5.2, 5.3 and 5.4, respectively. The more technical proofs, including that of Theorem 5.1, are postponed to the appendices.
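As a purely illustrative complement to these statements, the following Monte Carlo sketch (reusing the hypothetical helpers simulate_bar1 and ls_estimator_bar1 introduced earlier) shows the almost sure convergence of the estimators on a simulated symmetric BAR(1) tree; the residual-based estimators of σ² and ρ below are simple plug-in versions, not the exact past-observation construction of Section 3.

```python
import numpy as np

# Empirical illustration of Theorems 5.2 and 5.3 on a simulated symmetric BAR(1) tree.
a, b, sigma, rho = 1.0, 0.5, 1.0, 0.3
X = simulate_bar1(14, a=a, b=b, sigma=sigma, rho=rho, seed=1)

for n in (6, 10, 14):
    theta_hat = ls_estimator_bar1(X, n)            # approaches [[a, a], [b, b]]
    ks = np.arange(1, 2 ** n)                      # mothers in T_{n-1}
    Y = np.column_stack([np.ones(ks.size), X[ks]])
    Z = np.column_stack([X[2 * ks], X[2 * ks + 1]])
    V_hat = Z - Y @ theta_hat                      # plug-in residual pairs
    sigma2_hat = np.mean(V_hat ** 2)               # approaches sigma^2
    rho_hat = np.mean(V_hat[:, 0] * V_hat[:, 1])   # approaches rho
    print(n, theta_hat.ravel(), sigma2_hat, rho_hat)
```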
6. Laws of large numbers for the noise sequence. We first need to establish strong laws of large numbers for the noise sequence (ε_n). These results will be useful in all the sequel. We will extensively use the strong law of large numbers for locally square integrable real martingales given in Theorem 1.3.15 of [4].
Lemma 6.1. Assume that (ε_n) satisfies (H.1) and (H.2). Then, we have

(6.1)  lim_{n→∞} (1/|T_n|) Σ_{k=p}^{n} Σ_{i∈G_k} ε_i = 0   a.s.

In addition, if (H.3) holds, we also have

(6.2)  lim_{n→∞} (1/|T_n|) Σ_{k=p}^{n} Σ_{i∈G_k} ε_i² = σ²   a.s.

together with a third convergence (6.3) of the same kind.
Proof: On the one hand, let

P_n = Σ_{k=p}^{n} Σ_{i∈G_k} ε_i.

Hence, it follows from (H.1) and (H.2) that (P_n) is a square integrable real martingale whose increasing process satisfies <P>_n = O(|T_n|). Consequently, we deduce from Theorem 1.3.15 of [4] that P_n = o(<P>_n) a.s., which implies (6.1). On the other hand, denote

Q_n = Σ_{k=p}^{n} (1/|G_k|) Σ_{i∈G_k} e_i,

where e_n = ε_n² − σ². First of all, it follows from (H.1) that, for all k ∈ G_{n+1}, E[e_k | F_n] = 0 a.s. In addition, for all different k, l, E[e_k e_l | F_n] = 0 a.s.
thanks to the conditional independence given by (H.2). Furthermore, we readily deduce from (H.3) that the increments of (Q_n) have bounded conditional second moments. Therefore, (Q_n) is a square integrable real martingale with convergent increasing process. Consequently, we obtain from the strong law of large numbers for martingales that (Q_n) converges almost surely. Finally, as (|G_n|) is a positive real sequence which increases to infinity, we find from Lemma A.1 in Appendix A that Σ_{k=p}^{n} Σ_{i∈G_k} e_i = o(|G_n|) a.s., leading to Σ_{k=p}^{n} Σ_{i∈G_k} e_i = o(|T_n|) a.s., as |T_n| − 1 = 2|G_n|, which implies (6.2). We also establish (6.3) in a similar way. As a matter of fact, it suffices to apply the same argument to a suitable martingale (R_n); then, (R_n) is a square integrable real martingale which converges almost surely, leading to (6.3).
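A quick numerical sanity check of these noise laws of large numbers; the Gaussian pair noise with within-pair correlation is an illustrative choice, not required by the hypotheses.

```python
import numpy as np

# Empirical check in the spirit of Lemma 6.1: averages over the tree of eps_k and eps_k^2
# approach 0 and sigma^2 (cf. (6.1) and (6.2)); the within-pair products concentrate
# around the conditional covariance rho.
rng = np.random.default_rng(2)
sigma, rho, n = 1.0, 0.3, 16
pairs = 2 ** n - 1                                   # one noise pair per mother
cov = [[sigma**2, rho], [rho, sigma**2]]
eps = rng.multivariate_normal([0.0, 0.0], cov, size=pairs)  # rows: (eps_{2k}, eps_{2k+1})

print(eps.mean())                      # close to 0
print((eps ** 2).mean())               # close to sigma^2
print((eps[:, 0] * eps[:, 1]).mean())  # close to rho
```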
Remark 6.2. Note that, via Lemma A.2, the convergences of Lemma 6.1 also hold when the sums over the whole tree are replaced by sums over the last generation G_n only. In fact, each new generation contains half the globally available information, so observing the whole tree T_n or only the generation G_n is essentially the same.
For the CLT, we will also need the convergence of higher moments of the driven noise (ε_n).
Lemma 6.3. Assume that (ε_n) satisfies (H.1) to (H.5). Then, the analogous strong laws of large numbers hold for the higher moments of (ε_n).
Proof: The proof is left to the reader, as it follows essentially the same lines as the proof of Lemma 6.1, using suitable square integrable real martingales.
Remark 6.4. Note that, here again, Lemma A.2 allows us to replace the sums over the whole tree by sums over the last generation only.
7. Proof of Proposition 5.1. Proposition 5.1 is a direct application of the two following lemmas, which provide two strong laws of large numbers for the sequence of random vectors (𝕏_n).
The statements of these lemmas involve the following notation: {A; B}^k means the set of all products of A and B with exactly k terms. For example, we have {A; B}^0 = {I_p}, {A; B}^1 = {A, B}, {A; B}^2 = {A², AB, BA, B²}, and so on. The cardinality of {A; B}^k is obviously 2^k.
Remark 7.4. One can observe that, in the special case p = 1, the limits above take a simple explicit form.
8. Proof of Theorems 5.1 and 5.2. Theorem 5.2 is a consequence of Theorem 5.1. The first result of Theorem 5.1 is a strong law of large numbers for the martingale (M_n). We already mentioned that the standard strong law is useless here. This is due to the fact that the dimension of the random vector ξ_n grows exponentially fast, as 2^n. Consequently, we are led to propose a new strong law of large numbers for (M_n), adapted to our framework.
Proof of result (5.2) of Theorem 5.1: By summing a suitable identity satisfied by the increments of (M_n), we obtain the main decomposition (8.1), which involves the processes (V_n), (A_n), (W_n) and (B_n). The asymptotic behavior of the left-hand side of (8.1) is as follows.
Lemma 8.1. Assume that (ε_n) satisfies (H.1) to (H.3). Then, we have

V_{n+1} + A_n = O(n)   a.s.

Proof: The proof is given in Appendix B. It relies on the Riccati equation associated with (S_n) and the strong law of large numbers for (W_n).
Since (V_n) and (A_n) are two sequences of positive real numbers, we infer from Lemma 8.1 that V_{n+1} = O(n) a.s., which ends the proof of (5.2).
Proof of result (5.5) of Theorem 5.2: It clearly follows from (4.1) that

θ̂_n − θ = (I_2 ⊗ S_{n−1}^{−1}) M_n.

Consequently, the asymptotic behavior of θ̂_n − θ is clearly related to that of (V_n). More precisely, we can deduce from convergence (5.1) that the smallest eigenvalue of I_2 ⊗ S_{n−1} grows like |T_{n−1}|, since L as well as Λ = I_2 ⊗ L are positive definite matrices. Here, λ_min(Λ) stands for the smallest eigenvalue of the matrix Λ. Therefore, we use (5.2) to conclude, which completes the proof of (5.5).
We now turn to the proof of the quadratic strong law. To this end, we need a sharper estimate of the asymptotic behavior of (V_n), which is given by Lemma 8.2.
Proof: The proof is given in Appendix C. A direct application of Lemma 8.2 ensures that V_n = o(n^δ) a.s. for all δ > 1/2.
Proof of result (5.3) of Theorem 5.1: First of all, A_n may be rewritten in a more tractable form, and convergence (5.3) then directly follows from Corollary 8.3.
We are now in a position to prove the QSL.
Proof of result (5.6) of Theorem 5.2: The QSL is a direct consequence of (5.3) together with the previous convergence results, which completes the proof of Theorem 5.2.
9. Proof of Theorem 5.3. The almost sure convergence of σ̂²_n and ρ̂_n is strongly related to that of the differences V̂_n − V_n.
Proof of result (5.7) of Theorem 5.3: Once again, we are searching for a link between the sums of the differences V̂_k − V_k and the processes (A_n) and (V_n), whose convergence properties were previously investigated. For all n ≥ p, such a decomposition is available. Then, we can deduce from convergence (8.5), together with convergence (5.3), that (5.7) holds.
Proof of result (5.8) of Theorem 5.3: First of all, one can check that (P_n) is a real martingale transform. Hence, we can deduce from the strong law of large numbers for martingale transforms, given in Theorem 1.3.24 of [4], together with (9.1), its asymptotic behavior; this ensures, once again via convergence (9.1), that (5.8) holds. We now turn to the study of the covariance estimator ρ̂_n. One can observe that J_2 Γ J_2 = Γ. Hence, as before, (Q_n) is a real martingale transform, whose asymptotic behavior is analysed in Appendix D. Finally, we conclude from (9.2), which completes the proof of Theorem 5.3.
10. Proof of Theorem 5.4. In order to prove the CLT for the BAR(p) estimators, we will use the central limit theorem for martingale difference sequences given in Propositions 7.8 and 7.9 of Hamilton [8]. It can be stated as follows.
Proposition 10.1. Let (D_n) be a vector martingale difference sequence satisfying the following three conditions:
(a) For all n ≥ 1, E[D_n D_n^t] = Ω_n, where Ω_n is a positive definite matrix, and (1/n) Σ_{k=1}^{n} Ω_k converges to Ω, where Ω is also a positive definite matrix.
(b) For all n ≥ 1 and for all i, j, k, l, the fourth-order cross-moments E[D_{in} D_{jn} D_{kn} D_{ln}] are finite.
(c) (1/n) Σ_{k=1}^{n} D_k D_k^t converges in probability to Ω.
Then, we have the central limit theorem

(1/√n) Σ_{k=1}^{n} D_k → N(0, Ω)   in distribution.

We wish to point out that, for BAR(p) processes, it seems impossible to make use of the standard CLT for martingales. This is due to the fact that Lindeberg's condition is not satisfied in our framework. Moreover, as the size of (ξ_n) doubles at each generation, it is also impossible to check condition (c). To overcome this problem, we simply change the filtration. Instead of using the generation-wise filtration, we will use the sister pair-wise one. Let 𝒢_n be the σ-algebra generated by all pairs of individuals up to the offspring of individual n. Hence, (ε_{2n}, ε_{2n+1}) is 𝒢_n-measurable. Note that 𝒢_n is also the σ-algebra generated by, on the one hand, all the past generations up to that of individual n, i.e. the r_n-th generation, and, on the other hand, all pairs of the (r_n + 1)-th generation with ancestors less than or equal to n. In short,

𝒢_n = F_{r_n} ∨ σ{(X_{2k}, X_{2k+1}), 2^{r_n} ≤ k ≤ n}.

Therefore, (H.2) implies that the pair processes built from (ε_{2n}, ε_{2n+1}) which we consider below are 𝒢_n-martingale difference sequences.
Proof of result (5.4) of Theorem 5.1: First, recall that Y_n = (1, 𝕏_n^t)^t. We apply Proposition 10.1 to the 𝒢_n-martingale difference sequence (D_n) given by D_n = vec(Y_n V_n^t), the contribution of the pair (ε_{2n}, ε_{2n+1}) to the martingale (M_n).
We clearly have E[D_{n+1} | 𝒢_n] = 0. Hence, it follows from (H.1) and (H.2) that the covariance matrices E[D_n D_n^t] are well defined. Moreover, we can show, by a slight change in the proof of Lemmas 7.1 and 7.2, that their Cesàro mean converges to a limit which is positive definite, so that condition (a) holds. Condition (b) also clearly holds under (H.3). We now turn to condition (c). Under (H.1) to (H.5), we can show that the associated remainder (R_n) is a martingale transform. Moreover, we can prove that R_n = o(n) a.s. using Lemma A.6 and calculations similar to those of Appendix B, where a more complicated martingale transform (K_n) is studied. Consequently, condition (c) also holds and we can conclude that (5.4) holds.
Proof of result (5.11) of Theorem 5.4: We deduce from (4.1) that θ̂_n − θ = (I_2 ⊗ S_{n−1}^{−1}) M_n. Hence, (5.11) directly follows from (5.4) and convergence (8.5), together with Slutsky's Lemma.
Proof of results (5.12) and (5.13) of Theorem 5.4: On the one hand, we apply Proposition 10.1 to the 𝒢_n-martingale difference sequence (v_n) built from the centered squared noises of the pair (ε_{2n}, ε_{2n+1}). Hence, condition (a) holds. Once again, condition (b) clearly holds under (H.5), and Lemma 6.3 together with Remark 6.4 imply condition (c). Therefore, we obtain the central limit theorem (10.2). Furthermore, we infer from (5.8) the convergence (10.3). Finally, (10.2) and (10.3) imply (5.12). On the other hand, we apply again Proposition 10.1 to the 𝒢_n-martingale difference sequence (w_n) given by w_n = ε_{2n} ε_{2n+1} − ρ. Under (H.4), one has E[w_n²] = ν² − ρ², which implies that condition (a) holds. Once again, condition (b) clearly holds under (H.5), and Lemmas 6.1 and 6.3 yield condition (c). Consequently, we obtain the central limit theorem (10.4). Furthermore, we infer from (5.10) the convergence (10.5). Finally, (5.13) follows from (10.4) and (10.5), which completes the proof of Theorem 5.4.
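As an illustration of the central limit theorem (5.11), one can repeat the simulation and inspect the fluctuations of the rescaled estimation error. The following Monte Carlo sketch reuses the hypothetical helpers simulate_bar1 and ls_estimator_bar1 from above, with illustrative parameter values.

```python
import numpy as np

# Monte Carlo illustration of the CLT (5.11): the error theta_hat - theta, rescaled by
# the square root of the number of observed pairs, has fluctuations of order one.
a, b, sigma, rho, n = 1.0, 0.5, 1.0, 0.3, 12
errors = []
for rep in range(200):
    X = simulate_bar1(n, a=a, b=b, sigma=sigma, rho=rho, seed=rep)
    theta_hat = ls_estimator_bar1(X, n)
    pairs = 2 ** n - 1                          # |T_{n-1}|, the number of regressed pairs
    errors.append(np.sqrt(pairs) * (theta_hat[1, 0] - b))  # rescaled error on a_1 = b
errors = np.array(errors)
print(errors.mean(), errors.std())              # roughly centred, with stable spread
```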

Appendix A: Laws of large numbers for the BAR process
We start with some technical lemmas which we shall make repeated use of: the well-known Kronecker lemma, given in Lemma 1.3.14 of [4], together with some related results.
Lemma A.1. Let (α_n) be a sequence of positive real numbers increasing to infinity. In addition, let (x_n) be a sequence of real numbers such that the series Σ_{n≥1} x_n/α_n converges. Then, one has

lim_{n→∞} (1/α_n) Σ_{k=1}^{n} x_k = 0.
Lemma A.2. Let (x_n) be a sequence of real numbers. Then, the averages of (x_n) over the generation G_n converge to a limit ℓ if and only if the averages over the whole tree T_n converge to the same limit ℓ.
Proof: First of all, recall that |T_n| = 2^{n+1} − 1 and |G_n| = 2^n. We have the decomposition

Σ_{i∈T_n} x_i = Σ_{k=0}^{n} Σ_{i∈G_k} x_i.

Consequently, if the averages over T_n converge to ℓ, then so do the averages over G_n, since |T_n|/|G_n| converges to 2. Conversely, suppose that the averages over G_n converge to ℓ. A direct application of the Toeplitz lemma, given in Lemma 2.2.13 of [4], yields the convergence of the averages over T_n to ℓ.
Lemma A.3. Let (A_n) be a sequence of real-valued matrices with summable norms. In addition, let (X_n) be a sequence of real-valued vectors which converges to a limiting value X. Then, the weighted sums Σ_{k=0}^{n} A_k X_{n−k} converge to (Σ_{k=0}^{∞} A_k) X.
Proof: For all n ≥ 0, the difference between Σ_{k=0}^{n} A_k X_{n−k} and (Σ_{k=0}^{n} A_k) X is controlled by the summability of (||A_n||) and the convergence of (X_n).
Proof of Lemma A.4: First of all, recall that β = max(||A||, ||B||) < 1. The cardinality of {A; B}^k is obviously 2^k. Consequently, it is not hard to see that the quantity (U_n) under study converges to zero, which completes the proof of Lemma A.4.
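A small numerical illustration of this tree-versus-generation averaging, under the reading of Lemma A.2 given above; the data and sample sizes are arbitrary.

```python
import numpy as np

# Numerical illustration of the averaging phenomenon behind Lemma A.2 and Remark 6.2:
# if the averages over the generations G_n stabilise, so does the average over the
# whole tree T_n, because |G_n| is roughly half of |T_n|.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=2 ** 16)   # x[k] attached to individual k >= 1

for n in (5, 10, 15):
    G_n = np.arange(2 ** n, 2 ** (n + 1))
    T_n = np.arange(1, 2 ** (n + 1))
    print(n, x[G_n].mean(), x[T_n].mean())         # both approach 2.0 as n grows
```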
We now return to the BAR process. We first need an estimate of the sums of the X_n² before being able to investigate the limits.
It follows from a recursive application of relation (2.2) that, for all n ≥ 2^{p−1}, the regression vector 𝕏_n can be expanded along its line of ancestors (A.5), with the convention that an empty product equals 1. Then, we can deduce from the Cauchy-Schwarz inequality a bound on the corresponding squared norms, valid for all n ≥ 2^{p−1}. Hence, we obtain, for all n ≥ 2^p, the estimate (A.6), whose last two terms are readily evaluated by splitting the sums generation-wise, as in (A.7). It remains to control the first term P_n. One can observe that ε_k appears in P_n as many times as it has descendants up to the nth generation, and its multiplicative factor for its ith-generation descendant is (2β)^i. The evaluation of P_n depends on the value of 0 < β < 1. On the one hand, if β = 1/2, P_n reduces to an unweighted sum over the descendants. Hence, using Remark 6.2 together with Lemma A.3, we infer that (A.9) holds. On the other hand, if β ≠ 1/2, a direct computation of the geometric sums leads to the same kind of estimate, which ends the proof of Lemma 7.1.
Proof of Lemma 7.2: We shall proceed as in the proof of Lemma 7.1 and use the same notation. We infer again from (2.2) a similar expansion, and the first two results (6.1) and (6.2) of Lemma 6.1, together with Remark 6.2, allow us to conclude.

Appendix B

In order to establish the quadratic strong law for (M_n), we are going to study separately the asymptotic behaviour of (W_n) and (B_n), which appear in the main decomposition (8.1).
Lemma B.3. Assume that (ε_n) satisfies (H.1) to (H.3). Then, the announced convergence for (W_n) holds.
Proof: First of all, we have the decomposition W_{n+1} = T_{n+1} + R_{n+1}. We claim that the limit of (T_n), suitably normalized, exists. Our goal is to make use of the strong law of large numbers for martingale transforms, so we start by adding and subtracting a term involving the conditional expectation of ΔH_{n+1} given F_n. We have already seen in Section 4 that, for all n ≥ p − 1, E[ΔM_{n+1} ΔM^t_{n+1} | F_n] = Γ ⊗ Φ_n Φ^t_n. Consequently, we can split H_{n+1} into two terms. On the one hand, it follows from convergence (5.1) and the lemmas of Appendix A that the first term converges. On the other hand, the sequence (K_n) is obviously a matrix martingale transform. For all u ∈ R^{2(p+1)}, let K_n(u) = u^t K_n u. It then follows from tedious but straightforward calculations, together with (A.4), (A.13) and the strong law of large numbers for martingale transforms given in Theorem 1.3.24 of [4], that the contribution of (K_n) is negligible.

Appendix C: On Wei's Lemma
In order to prove (8.3), we shall apply Wei's Lemma, given in [13], page 1672, to each entry of the vector-valued martingale (M_n).
We shall only carry out the proof for the first (p + 1) components of M_n, inasmuch as the proof for the last (p + 1) components follows exactly the same lines. Denote by (P_n) the corresponding scalar martingale. On the one hand, P_n can then be rewritten in a form to which Wei's Lemma applies.

Figure 1. The tree associated with the bifurcating autoregressive process.