Some limit results for Markov chains indexed by trees

We consider a sequence of Markov chains $(\mathcal X^n)_{n=1,2,...}$ with $\mathcal X^n = (X^n_\sigma)_{\sigma\in\mathcal T}$, indexed by the full binary tree $\mathcal T = \mathcal T_0 \cup \mathcal T_1 \cup ...$, where $\mathcal T_k$ is the $k$th generation of $\mathcal T$. In addition, let $(\Sigma_k)_{k=0,1,2,...}$ be a random walk on $\mathcal T$ with $\Sigma_k \in \mathcal T_k$ and $\widetilde{\mathcal R}^n = (\widetilde R_t^n)_{t\geq 0}$ with $\widetilde R_t^n := X^n_{\Sigma_{[tn]}}$, arising by observing the Markov chain $\mathcal X^n$ along the random walk. We present a law of large numbers concerning the empirical measure process $\widetilde{\mathcal Z}^n = (\widetilde Z_t^n)_{t\geq 0}$ where $\widetilde{Z}_t^n = 2^{-[tn]}\sum_{\sigma\in\mathcal T_{[tn]}} \delta_{X_\sigma^n}$ as $n\to\infty$. Precisely, we show that if $\widetilde{\mathcal R}^n \to \mathcal R$ for some Feller process $\mathcal R = (R_t)_{t\geq 0}$ with deterministic initial condition, then $\widetilde{\mathcal Z}^n \to \mathcal Z$ with $Z_t = \delta_{\mathcal L(R_t)}$.


Introduction
In [BP94], Benjamini and Peres introduced the notion of a tree-indexed Markov chain. Since then, a lot of effort has been spent on weak and strong laws of large numbers for very general types of trees, even possibly random ones [LW03,LY04,Yan03,YL06,Tak06,Guy07].
Our work is motivated by an observation in microbiology, where a population of bacteria is growing (along a binary tree, say), and every individual bacterial cell is in a certain state (e.g. some gene expression profile), which can be, at least partially, inherited. It has been observed for a long time that such populations tend to be heterogeneous although all cells carry the same genome; see [SK76] for an early reference.
The question that arises concerns the mechanisms responsible for such phenotypic heterogeneity. Two competing views exist: either random fluctuations lead to heterogeneity [MA99,ELSS02], or social interactions of cells together with a regulatory mechanism are key drivers for heterogeneity [SP11,Pel12]. Several examples are known today to fall into one of the two categories; see the review [Ave06].
In this manuscript, we analyse one consequence of the first view, i.e. a law of large numbers. This result entails that the dynamics of single cells can be stochastic while the behavior of the whole population becomes deterministic. We will define a Markov kernel depending on some scaling parameter n (which will tend to infinity) and look at the empirical measure in the [nt]-th generation of the population, t ≥ 0, which corresponds to a time-scaling of the process of empirical measures. We will prove the weak convergence of the empirical measure process, whose limit will be deterministic (if the initial distribution is deterministic).

Keywords: Tree-indexed Markov chain, weak convergence, tightness, random measure, empirical measure
AMS Subject classification: 60F15; 60F05
After presenting the general setup in Section 2, we present our main result in Theorem 1 in Section 3, together with two simple examples. Then, we give the proof of Theorem 1 in Section 4.

Setup
Let $\mathcal T = \mathcal T_0 \cup \mathcal T_1 \cup \cdots$ be a complete binary tree, rooted at $\emptyset \in \mathcal T_0$, where $\sigma 0, \sigma 1 \in \mathcal T_{k+1}$ are the two children of $\sigma \in \mathcal T_k$, $k = 0, 1, 2, \dots$ For $\sigma \in \mathcal T_k$ and $j \leq k$, we denote by $\pi_j \sigma$ the prefix of $\sigma$ of length $j$. On $\mathcal T$, we set $|\sigma| = k$ iff $\sigma \in \mathcal T_k$ and, in addition, set $\pi_{-1}\sigma := \pi_{|\sigma|-1}\sigma$, the immediate ancestor of $\sigma$. Define the $\leq$-relation by writing $\sigma \leq \tau$ iff $\pi_{|\sigma|}\tau = \sigma$, and write $\tau \wedge \tau'$ for the most recent common ancestor of $\tau$ and $\tau'$.
Let $(E, r)$ be a complete and separable metric space, and denote by $\mathcal B(E)$ the Borel-$\sigma$-field, or the set of bounded measurable functions (with an abuse of notation). A stochastic process $\mathcal X = (X_\sigma)_{\sigma\in\mathcal T}$ is called a time-homogeneous, tree-indexed Markov chain (extending a notion introduced in [BP94]) if there is a Markov transition kernel $p$ from $E$ to $\mathcal B(E^2)$ (the Borel-$\sigma$-field on $E^2$) such that for all $\sigma \in \mathcal T$ and $A_0, A_1 \in \mathcal B(E)$,
$$\mathbf P(X_{\sigma 0} \in A_0, X_{\sigma 1} \in A_1 \mid X_\sigma = x) = p(x, A_0 \times A_1).$$
With $\mathcal X$, we connect the Markov chain $\mathcal R = (R_n)_{n=0,1,2,\dots}$ with transition kernel
$$p_R(x, A) := \tfrac 12 p(x, A \times E) + \tfrac 12 p(x, E \times A).$$
Here, $\mathcal R$ arises from observing the state of $\mathcal X$ when walking along $\mathcal T$, starting from the root and moving from $\sigma$ to $\sigma 0$ or $\sigma 1$ purely at random. Another representation of $\mathcal R$ is as follows: Let $(\Sigma_k)_{k=0,1,2,\dots}$ be a symmetric random walk on $\mathcal T$ (independent of $\mathcal X$), i.e. $\Sigma_k \in \mathcal T_k$ almost surely and $\mathbf P(\Sigma_{k+1} = \sigma 0 \mid \Sigma_k = \sigma) = \mathbf P(\Sigma_{k+1} = \sigma 1 \mid \Sigma_k = \sigma) = \frac 12$. Then, $\mathcal R \stackrel{d}{=} (X_{\Sigma_k})_{k=0,1,2,\dots}$. If $(X_\sigma)_{\sigma\in\mathcal T}$ is a (time-homogeneous) tree-indexed Markov chain, we then define the process of empirical measures $\mathcal Z = (Z_k)_{k=0,1,2,\dots}$ through
$$Z_k := 2^{-k} \sum_{\sigma\in\mathcal T_k} \delta_{X_\sigma}.$$
Note that $\mathcal Z$ takes values in $\mathcal P(E)$, the set of probability measures on $\mathcal B(E)$, and that $\mathcal Z$ is a non-homogeneous Markov chain (indexed by $k = 0, 1, 2, \dots$).
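A minimal simulation may help to fix ideas. The following sketch is purely illustrative and not part of the setup above: as a toy transition kernel we take the dependent pair of children states $(x + y, x - y)$ with $y = \pm 1$ chosen at random; all function names are hypothetical.

```python
import random

def simulate_tree(depth, rng):
    """Simulate X = (X_sigma) on the full binary tree, generation by generation.
    Generation k is a list of 2**k states; a vertex corresponds to the binary
    expansion of its index within the list."""
    gens = [[0.0]]  # state at the root
    for _ in range(depth):
        nxt = []
        for x in gens[-1]:
            y = rng.choice((-1.0, 1.0))
            nxt.extend((x + y, x - y))  # children sigma0, sigma1: a dependent pair
        gens.append(nxt)
    return gens

def observe_along_walk(gens, rng):
    """R_k = X_{Sigma_k}: read off the chain along a uniform root-to-leaf walk."""
    idx, path = 0, []
    for gen in gens:
        path.append(gen[idx])
        idx = 2 * idx + rng.randint(0, 1)  # step to child sigma0 or sigma1
    return path

rng = random.Random(1)
gens = simulate_tree(6, rng)       # generations T_0, ..., T_6
R = observe_along_walk(gens, rng)  # one realisation of (R_0, ..., R_6)
Z6_atoms = gens[6]                 # atoms of Z_6, each carrying weight 2**-6
```

The empirical measure $Z_k$ is represented here simply by the list of generation-$k$ states, each atom carrying weight $2^{-k}$.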
Remark 2.1 (Symmetric, tree-indexed Markov chains). The idea to consider different transition mechanisms for the two children comes from the work of [Guy07]. A special, classical case is that of a symmetric tree-indexed Markov chain, as follows: We call a time-homogeneous (tree-indexed) Markov chain with transition kernel $p$ (from $E$ to $\mathcal B(E^2)$) symmetric if there is a Markov transition kernel $q$ (from $E$ to $\mathcal B(E)$) such that for all $x \in E$ and $A_0, A_1 \in \mathcal B(E)$,
$$p(x, A_0 \times A_1) = q(x, A_0)\, q(x, A_1).$$
In other words, the transitions from $X_\sigma$ to $X_{\sigma 0}$ and to $X_{\sigma 1}$ are independent. In this case, we have that $R_k \stackrel{d}{=} X_\sigma$ for all $\sigma \in \mathcal T_k$.
In the next section, we will deal with a sequence $(\mathcal X^n)_{n=1,2,\dots}$ of tree-indexed Markov chains.

Results
Now, we state our main limit theorem for the setup given in the last section. To this end, let $(\mathcal X^n)_{n=1,2,\dots}$ be a sequence of tree-indexed Markov chains with complete and separable metric state spaces $(E_n, r_n)_{n=1,2,\dots}$. As a limiting state space, we have a complete and separable metric space $(E, r)$, together with Borel-measurable maps $\eta_n : E_n \to E$.
Let $\mathcal R^n$ be the process of observing $\mathcal X^n$ when moving randomly along the tree. We denote the corresponding transition kernels by $p^n$ (for $\mathcal X^n$) and $p_{R^n}$ (for $\mathcal R^n$), respectively. Moreover, let $\mathcal Z^n$ be the process of empirical measures based on $\mathcal X^n$, which has state space $\mathcal P(E_n)$, $n = 1, 2, \dots$ Our goal is to find sufficient conditions on $\mathcal X^n$ (via $\mathcal R^n$) such that the process of empirical measures $\mathcal Z^n$ converges, and to characterize the limit process. We first recall some basic notation.
We need two more notions.
Definition 3.2 (Feller property, compact containment condition). Recall that (E, r) is complete and separable.
1. A Markov process $\mathcal X = (X_t)_{t\geq 0}$ with state space $E$ and càdlàg paths satisfies the Feller property if its semigroup $(T_t)_{t\geq 0}$ maps $\mathcal C_b(E)$ into $\mathcal C_b(E)$ and $T_t\varphi(x) \to \varphi(x)$ as $t \to 0$ for all $x \in E$ and $\varphi \in \mathcal C_b(E)$.
2. A sequence of processes $\mathcal X^1, \mathcal X^2, \dots$ with state space $E$ satisfies the compact containment condition if for all $\varepsilon > 0$ and $T > 0$ there is a compact set $K \subseteq E$ such that $\inf_{n=1,2,\dots} \mathbf P(X^n_t \in K \text{ for all } t \leq T) \geq 1 - \varepsilon$.

Recall that if $(E, r)$ is locally compact, every Feller semigroup is a strongly continuous contraction semigroup ([Kal02], Theorem 17.6) and is uniquely characterized by its generator. Now we can formulate our main result.

Theorem 1 (Convergence of empirical measure processes). Let $\widetilde{\mathcal R}^n = (\widetilde R^n_t)_{t\geq 0}$ with $\widetilde R^n_t := R^n_{[tn]}$ and $\widetilde{\mathcal Z}^n = (\widetilde Z^n_t)_{t\geq 0}$ with $\widetilde Z^n_t := Z^n_{[tn]}$. Assume that $\eta_n(\widetilde{\mathcal R}^n) \Rightarrow \mathcal R$ as $n\to\infty$ for some Feller process $\mathcal R = (R_t)_{t\geq 0}$ with deterministic initial condition, and that $(\eta_n(\widetilde{\mathcal R}^n))_{n=1,2,\dots}$ satisfies the compact containment condition. Then, $\eta_{n*}\widetilde{\mathcal Z}^n \Rightarrow \mathcal Z$ as $n\to\infty$, where $\mathcal Z = (Z_t)_{t\geq 0}$ is deterministic with $Z_t = \mathcal L(R_t)$.
2. As the Theorem shows, the limiting process of empirical measures $\mathcal Z$ is deterministic (if the initial distribution is a Dirac measure). The heuristics behind this result is that two distinct values $X^n_\sigma, X^n_\tau$ with $\sigma, \tau \in \mathcal T_{[nt]}$ have already evolved independently for $O(n)$ steps. Hence, $Z^n_t$ is approximately given by the empirical measure of $2^{[nt]}$ independent processes, which leads to a deterministic limit. This argument will be made precise below.
3. Having obtained a law of large numbers, it would be interesting to see a central limit theorem as well. In the present context, this would require a fine analysis of the error terms $\varepsilon^n$ appearing in (4.6). We leave this study to future research.
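The heuristic in item 2 can be illustrated numerically. The following toy sketch (a Gaussian example assumed purely for illustration) shows the familiar $N^{-1/2}$ decay of the fluctuations of an empirical mean over independent copies, which is exactly why the limit becomes deterministic:

```python
import random
import statistics

def empirical_mean(N, rng):
    """<Z, phi> for the empirical measure of N independent copies, phi = identity."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(N)) / N

rng = random.Random(0)
# spread of the empirical mean over 200 repetitions, for two population sizes
spread_small = statistics.pstdev(empirical_mean(16, rng) for _ in range(200))
spread_large = statistics.pstdev(empirical_mean(1024, rng) for _ in range(200))
```

With $N = 2^{[nt]}$ approximately independent particles, these fluctuations vanish as $n \to \infty$.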
We now give two simple examples for normal and Poisson convergence.
1. (In other words, the states of the two children of $\sigma$ are a pair of dependent random variables.) Then, the process $\widetilde{\mathcal R}^n = (\widetilde R^n_t)_{t\geq 0}$ can be written as a rescaled random walk, and Donsker's theorem yields the convergence $\widetilde{\mathcal R}^n \Rightarrow B$ as $n\to\infty$, where $B$ is a standard Brownian motion. Our theorem now says that the limiting process $\mathcal Z$ is the law of $B$, so we find that $\mathcal Z = (\mathcal N(0, t))_{t\geq 0}$, where $\mathcal N(0, t)$ is the normal distribution with mean $0$ and variance $t$.
2. Let $(Y^n_\sigma)_{\sigma\in\mathcal T}$ be a family of independent, identically distributed random variables with values in $\mathbb Z_+$ and $\mathbf P(Y^n_\sigma > 0) = 2\lambda/n + o(1/n)$, $\mathbf P(Y^n_\sigma > 1) = o(1/n)$. Moreover, let $X^n_\emptyset := 0$ and $(X^n_{\sigma 0}, X^n_{\sigma 1}) := (X^n_\sigma, X^n_\sigma + Y^n_\sigma)$. (In other words, the state of the left child equals the state of its parent, while the state of the right child has a small probability of having increased by $1$.) Then, the process $\mathcal R^n = (R^n_k)_{k=0,1,2,\dots}$ can be written as $R^n_k = \sum_{j=1}^k \widetilde Y^n_j$, where $\widetilde Y^n_1, \widetilde Y^n_2, \dots$ are independent and identically distributed with $(\widetilde Y^n_k)_*\mathbf P = \frac 12 \delta_0 + \frac 12 (Y^n_\sigma)_*\mathbf P$, i.e. $\mathbf P(\widetilde Y^n_k > 0) = \lambda/n + o(1/n)$, $\mathbf P(\widetilde Y^n_k > 1) = o(1/n)$. Classical convergence results (see e.g. [Kal02], Theorem 5.7) then show that $\widetilde{\mathcal R}^n$ converges weakly to a Poisson process with rate $\lambda$. Consequently, we then have by the above theorem that $\mathcal Z = (Z_t)_{t\geq 0}$ with $Z_t = \text{Poi}(\lambda t)$.
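Both examples admit a quick numerical sanity check. In the sketch below, the $\pm 1$ increments in the first part and the Bernoulli increments in the second are assumed concrete choices consistent with the descriptions above; we compare the simulated marginals with $\mathcal N(0, 1)$ and $\text{Poi}(\lambda t)$:

```python
import random
import statistics

rng = random.Random(2)

# Example 1: R^n at t = 1, i.e. n +-1 steps along the walk, rescaled by n**-0.5
def rescaled_walk_endpoint(n):
    return sum(rng.choice((-1.0, 1.0)) for _ in range(n)) / n ** 0.5

normal_samples = [rescaled_walk_endpoint(400) for _ in range(2000)]
normal_mean = statistics.fmean(normal_samples)     # close to 0
normal_var = statistics.pvariance(normal_samples)  # close to t = 1

# Example 2: R^n_t counts [nt] i.i.d. increments, each positive w.p. lam/n
def counting_walk(n, t, lam):
    return sum(1 for _ in range(int(n * t)) if rng.random() < lam / n)

lam, t = 2.0, 1.5
poisson_samples = [counting_walk(500, t, lam) for _ in range(3000)]
poisson_mean = statistics.fmean(poisson_samples)     # Poi(lam*t): mean lam*t = 3
poisson_var = statistics.pvariance(poisson_samples)  # and variance lam*t as well
```

The agreement of mean and variance in the second part also reflects the Poisson property of the limit.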

Proof of Theorem 1
Throughout this section, we work under the same assumptions as in Theorem 1. In the sequel, we will replace $\eta_n(\widetilde{\mathcal R}^n)$ by $\widetilde{\mathcal R}^n$ and $\eta_{n*}\widetilde{\mathcal Z}^n$ by $\widetilde{\mathcal Z}^n$ (and similarly for the processes without $\sim$). This should not cause confusion, and it increases readability.
Before we start, we give basic relationships between the processes $\mathcal R^n$ and $\mathcal Z^n$, which we will frequently use. (Some more refined relationships will be given in the proof of Lemma 4.2.) Let $\varphi \in \mathcal C_b(E)$. Then,
$$\mathbf E[\langle Z^n_k, \varphi\rangle] = \mathbf E[\varphi(R^n_k)]. \qquad (4.1)$$
Similarly, we write
$$\mathbf E[\langle Z^n_{k+1}, \varphi\rangle \mid Z^n_k] = \langle Z^n_k, p_{R^n}\varphi\rangle. \qquad (4.2)$$
In the proof of Theorem 1, it suffices to assume that $\nu = \delta_x$, i.e. deterministic starting conditions. (The general case then follows by mixing over the initial condition.) We need to show two assertions: 1. The sequence $(\widetilde{\mathcal Z}^n)_{n=1,2,\dots}$ is tight. 2. The finite-dimensional distributions of $\widetilde{\mathcal Z}^n$ converge to those of $\mathcal Z$.
For 2., we will show in Lemma 4.2 that $\widetilde Z^n_t \Rightarrow \delta_{\mathcal L(R_t)}$ as $n\to\infty$ holds for all $t \geq 0$. Since the right hand side is deterministic, this already yields convergence of the finite-dimensional distributions, and we are left with showing 1. Here, we use Jakubowski's tightness criterion, which is recalled in Proposition A.3 in the appendix. For this criterion, we have to show (i) that $\widetilde{\mathcal Z}^n$ satisfies the compact containment condition in $\mathcal P(E)$ (see Definition 3.2) and (ii) that the sequence $(\langle \widetilde Z^1_t, \varphi\rangle)_{t\geq 0}, (\langle \widetilde Z^2_t, \varphi\rangle)_{t\geq 0}, \dots$ is tight for all $\varphi \in \Pi'$ (a vector space which separates points). Assertion (i) will be resolved in Lemma 4.3, while (ii) is shown in Lemma 4.4. Hence, we are done once we have shown Lemmas 4.2, 4.3 and 4.4.
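The identity (4.1) rests only on the fact that a uniformly chosen vertex of $\mathcal T_k$ has the same law as the random-walk position $\Sigma_k$. A toy Monte Carlo check of this (the dynamics and the test function $\varphi$ below are assumed purely for illustration):

```python
import random
import statistics

def generation(depth, rng):
    """Generation `depth` of a toy tree-indexed chain with dependent
    children (x + y, x - y), y = +-1 chosen at random."""
    states = [0.0]
    for _ in range(depth):
        nxt = []
        for x in states:
            y = rng.choice((-1.0, 1.0))
            nxt.extend((x + y, x - y))
        states = nxt
    return states

def phi(x):
    return x * x

rng = random.Random(4)
depth, reps = 5, 400
# left side of (4.1): average phi over the whole generation, then over trees
lhs = statistics.fmean(statistics.fmean(map(phi, generation(depth, rng)))
                       for _ in range(reps))
# right side of (4.1): phi evaluated at a uniformly chosen generation-k vertex
rhs = statistics.fmean(phi(rng.choice(generation(depth, rng)))
                       for _ in range(reps))
```

Both averages estimate the same quantity; in this toy model each root-to-leaf path is a simple random walk, so both should be close to $\mathbf E[X^2] = 5$ at depth $5$.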
We start with a fundamental fact, which is based on the observation that two random leaves of $\mathcal T_n$ have a most recent common ancestor which is close to the root. Recall that by [EK86], Corollary 4.8.9, we already have that $\widetilde{\mathcal R}^n \Rightarrow \mathcal R$ as $n\to\infty$ for a Feller (hence càdlàg) process $\mathcal R$.

Lemma 4.1 (States close to the root). Let $t > 0$.
1. For every fixed $\sigma \in \mathcal T$, $X^n_\sigma \Rightarrow R_0$ as $n\to\infty$.
2. Let $\Sigma^n_1, \Sigma^n_2$ be two vertices, chosen uniformly at random from $\mathcal T_{[nt]}$. Then, $X^n_{\Sigma^n_1 \wedge \Sigma^n_2} \xrightarrow{n\to\infty} R_0$ in probability.
Proof. Recall that for the (independent) random walk $(\Sigma_k)_{k=0,1,\dots}$ on $\mathcal T$ we have that $R^n_k = X^n_{\Sigma_k}$. It suffices to prove the result for deterministic $R_0 \in E$. By assumption, for all $m \in \mathbb N$,
$$R^n_m \Rightarrow R_0 \text{ as } n\to\infty, \qquad (4.3)$$
since $\mathcal R$ has càdlàg paths.

1. Let $\sigma \in \mathcal T$ with $|\sigma| = m$. Assume that the assertion does not hold, i.e. $X^n_\sigma$ does not converge weakly to $R_0$. Let $\varepsilon > 0$ be such that $\mathbf P(r(X^n_\sigma, R_0) > \varepsilon) > \varepsilon$ for all $n$. We have that $\mathbf P(r(R^n_m, X^n_\sigma) \leq \varepsilon) \geq \mathbf P(r(R^n_m, X^n_\sigma) \leq \varepsilon, R^n_m = X^n_\sigma) = \mathbf P(R^n_m = X^n_\sigma) \geq 2^{-m}$ for all $\varepsilon > 0$, since the random walk $(\Sigma_m)_{m=0,1,2,\dots}$, along which we read off $\mathcal R^n$, has a chance of $2^{-m}$ to pass through vertex $\sigma$. By the independence of the walk and $\mathcal X^n$, this implies that for $\varepsilon > 0$ as above
$$\mathbf P(r(R^n_m, R_0) > \varepsilon) \geq \mathbf P\bigl(R^n_m = X^n_\sigma,\, r(X^n_\sigma, R_0) > \varepsilon\bigr) \geq 2^{-m}\varepsilon > 0,$$
in contradiction to (4.3). Hence, 1. follows.

2. Let $\varepsilon > 0$ and let $m$ be large enough such that $2^{-m} < 2\varepsilon$. Since two vertices chosen uniformly at random from $\mathcal T_{[nt]}$ share their first $m$ ancestral steps with probability at most $2^{-m}$, the event $\{|\Sigma^n_1 \wedge \Sigma^n_2| < m\}$ has probability at least $1 - 2\varepsilon$. From 1., we have that $(X^n_\sigma)_{\sigma\in\mathcal T_0\cup\cdots\cup\mathcal T_{m-1}}$ converges jointly to the constant family with value $R_0$ (weak convergence to a constant implies convergence in probability). Combining both facts, we have shown that $X^n_{\Sigma^n_1 \wedge \Sigma^n_2} \xrightarrow{n\to\infty} R_0$ in probability. By the same arguments, we also find that $X^n_{(\Sigma^n_1\wedge\Sigma^n_2)i} \xrightarrow{n\to\infty} R_0$ in probability for $i = 0, 1$, and we are done.
Lemma 4.2 (Convergence of $\mathcal Z^n$ at fixed times). Consider the same situation as in Theorem 1 and let $t \geq 0$. Then, $\widetilde Z^n_t \Rightarrow \delta_{\mathcal L(R_t)}$ as $n\to\infty$.
Proof. Note that the assertion holds once we show that
$$\langle \widetilde Z^n_t, \varphi\rangle \xrightarrow{\;n\to\infty\;} \mathbf E[\varphi(R_t)] \quad \text{in probability} \qquad (4.4)$$
for all $\varphi \in \mathcal C_b(E)$. (Indeed, the family $(\langle \widetilde Z^n_t, \varphi\rangle)_{n=1,2,\dots}$ is tight by the boundedness of $\varphi$, and any subsequential limit point is deterministic by Lemma A.2.) For this, we already know from (4.1) that $\mathbf E[\langle \widetilde Z^n_t, \varphi\rangle] = \mathbf E[\varphi(\widetilde R^n_t)] \xrightarrow{n\to\infty} \mathbf E[\varphi(R_t)]$. Further, we will show that
$$\mathbf V[\langle \widetilde Z^n_t, \varphi\rangle] \xrightarrow{\;n\to\infty\;} 0, \qquad (4.5)$$
which then implies (4.4). For this, consider two randomly picked vertices $\Sigma^n_1, \Sigma^n_2 \in \mathcal T_{[nt]}$ with $\Sigma^n_1 \neq \Sigma^n_2$. Then, without loss of generality, we assume that $X^n_{\pi_{|\Sigma^n_1\wedge\Sigma^n_2|+1}\Sigma^n_1} = X^n_{(\Sigma^n_1\wedge\Sigma^n_2)0}$ and $X^n_{\pi_{|\Sigma^n_1\wedge\Sigma^n_2|+1}\Sigma^n_2} = X^n_{(\Sigma^n_1\wedge\Sigma^n_2)1}$, and decompose the covariance of $\varphi(X^n_{\Sigma^n_1})$ and $\varphi(X^n_{\Sigma^n_2})$ by conditioning on the states of the two children of the most recent common ancestor, up to an error term $\varepsilon^n_t$ (4.6). Hence, we must show $\varepsilon^n_t \xrightarrow{n\to\infty} 0$ for (4.5), which is implied by the boundedness of $\varphi$ (showing convergence to $0$ of the last term in the last line of (4.6)), by the Cauchy-Schwarz inequality and the convergence (4.7) in probability. We already know from Lemma 4.1 that $X^n_{\pi_{|\Sigma^n_1\wedge\Sigma^n_2|+1}\Sigma^n_i} \xrightarrow{n\to\infty} R_0$ in probability for $i = 1, 2$, such that, since $\mathcal R$ has càdlàg paths, by convergence of semigroups and [EK86], Theorem 1.6.1 (see also Remark 4.8.8), and the strong continuity of the semigroup of $\mathcal R$, the corresponding conditional expectations converge in probability, which shows (4.7). This completes the proof.

Now, we come to the proof of the compact containment condition for $(\widetilde{\mathcal Z}^n)_{n=1,2,\dots}$.

Lemma 4.3 (Compact containment condition for $\mathcal Z^n$). If $(\widetilde{\mathcal R}^n)_{n=1,2,\dots}$ satisfies the compact containment condition (in $E$), then $(\widetilde{\mathcal Z}^n)_{n=1,2,\dots}$ satisfies the compact containment condition (in $\mathcal P(E)$) as well.
Proof. Assume that the assertion is false, i.e. there are $\varepsilon, T > 0$ such that
$$\sup_{n=1,2,\dots} \mathbf P\bigl(\widetilde Z^n_t \notin L \text{ for some } t \leq T\bigr) > \varepsilon \qquad (4.8)$$
for all $L \subseteq \mathcal P(E)$ compact. (Such an $\varepsilon$ exists since the compact containment condition for $\widetilde{\mathcal Z}^n$ does not hold.) For all $\delta > 0$, let $K_\delta \subseteq E$ be compact and such that
$$\sup_{n=1,2,\dots} \mathbf P\bigl(\widetilde R^n_t \notin K_\delta \text{ for some } t \leq T\bigr) \leq \delta,$$
which is possible by the compact containment condition for $(\widetilde{\mathcal R}^n)_{n=1,2,\dots}$.

For $\delta > 0$ and $\varepsilon > 0$ as above, set $L := \{\mu \in \mathcal P(E) : \mu(K_\delta^c) \leq \sqrt\delta \text{ for all } \delta > 0 \text{ as above}\}$. Then, the closure of $L$ is a compact subset of $\mathcal P(E)$ by Prohorov's theorem, and by (4.8) there exist indices $n_k$ and random times $\tau_k$, bounded by $T$, such that $\sup_{k=1,2,\dots}$
Lemma 4.4 (Tightness of the evaluated processes). For every $\varphi \in \Pi'$, the sequence $(\langle \widetilde Z^1_t, \varphi\rangle)_{t\geq 0}, (\langle \widetilde Z^2_t, \varphi\rangle)_{t\geq 0}, \dots$ is tight.

Proof. We start by reformulating, using (4.2),
$$\langle \widetilde Z^n_t, \varphi_n\rangle = \langle \widetilde Z^n_0, \varphi_n\rangle + M^{n,\varphi_n}_t + \int_0^t n \cdot \mathbf E\bigl[\varphi_n(\widetilde R^n_{s+1/n}) - \varphi_n(\widetilde R^n_s) \,\big|\, \widetilde Z^n_s\bigr]\, ds, \qquad (4.11)$$
with a martingale $M^{n,\varphi_n}$. We now show that $M^{n,\varphi_n} \Rightarrow 0$ as $n\to\infty$. From Lemma 4.2, we already know that $\widetilde Z^n_t \Rightarrow \mathcal L(R_t)$ as $n\to\infty$. We complement this by showing that the integrand in (4.11) converges, for all $s \geq 0$, in probability to a deterministic limit (4.12) (note that the right hand side of (4.12) is deterministic), by (3.3), Lemma 4.2 (which shows that the limit of $\widetilde Z^n_s$ is deterministic, and hence the second to last line in (4.12) converges to $0$), and the weak convergence $\widetilde{\mathcal R}^n \Rightarrow \mathcal R$. For every $t \geq 0$, we then obtain the convergence of second moments in (4.13). Hence, we can write, by Doob's inequality, the bound (4.14) on $\mathbf E[\sup_{s\leq t}(M^{n,\varphi_n}_s)^2]$, since $M^{n,\varphi}_t$ is bounded in $n$ and the convergence in (4.13) also holds in probability. Then, using (4.11), $\mathbf P(\sup_{s\leq t} |M^{n,\varphi_n}_s| > \varepsilon) \xrightarrow{n\to\infty} 0$ by (4.14) and (4.12).

A Random probability measures
In the following, (E, r) is a complete and separable metric space and P(E) is the set of probability measures on (the Borel σ-algebra of) E, equipped with the topology of weak convergence. We will state some results about random measures.
Lemma A.2 (Characterisation of deterministic random measures). Let $Z$ be a random variable taking values in $\mathcal P(E)$ with first and second moment measures $\mu := \mu^{(1)}$ and $\mu^{(2)}$, given by $\mu^{(1)}(A) := \mathbf E[Z(A)]$ and $\mu^{(2)}(A \times B) := \mathbf E[Z(A)\,Z(B)]$ for $A, B \in \mathcal B(E)$. Then, the following assertions are equivalent:
1. There is $\nu \in \mathcal P(E)$ with $Z = \nu$, almost surely.
2. $\mu^{(2)} = \mu \otimes \mu$.
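The equivalence can be seen from a one-line variance computation; the following display is added for illustration and carries no equation number in the original:

```latex
\mathbf{V}\bigl[\langle Z,\varphi\rangle\bigr]
  = \mathbf{E}\bigl[\langle Z,\varphi\rangle^{2}\bigr]
    - \mathbf{E}\bigl[\langle Z,\varphi\rangle\bigr]^{2}
  = \bigl\langle \mu^{(2)},\,\varphi\otimes\varphi\bigr\rangle
    - \bigl\langle \mu\otimes\mu,\,\varphi\otimes\varphi\bigr\rangle,
  \qquad \varphi\in\mathcal{B}(E).
```

Hence $Z$ is almost surely deterministic iff the right hand side vanishes for all $\varphi$, i.e. iff $\mu^{(2)} = \mu \otimes \mu$; this is also the mechanism behind (4.5) in the proof of Lemma 4.2.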