A spectral decomposition for a simple mutation model

We consider a population of N individuals. Each individual has a type belonging to some at most countable type space K. At each time step each individual of type k ∈ K mutates to type l ∈ K, independently of the other individuals, with probability m_{k,l}. It is shown that the associated empirical measure process is Markovian. For the two-type case K = {0, 1} we derive an explicit spectral decomposition for the transition matrix P of the Markov chain Y = (Y_n)_{n≥0}, where Y_n denotes the number of individuals of type 1 at time n. The result in particular shows that P has eigenvalues (1 − m_{0,1} − m_{1,0})^i, i ∈ {0, . . . , N}. Applications to mean first passage times are provided.


Introduction and main results
Consider a population consisting of N ∈ N := {1, 2, . . .} individuals. It is assumed that each individual has a type belonging to some at most countable type space K. At each time step each individual of type k ∈ K mutates its type independently of the other individuals to type l ∈ K with given probability m k,l ∈ [0, 1]. We call the stochastic matrix M := (m k,l ) k,l∈K the mutation matrix.
For n ∈ N_0 := {0, 1, . . .} and r ∈ {1, . . . , N} let X_n^{(r)} denote the type of individual r at time step n. Clearly, X := (X_n)_{n∈N_0}, defined via X_n := (X_n^{(1)}, . . . , X_n^{(N)}) for all n ∈ N_0, is a homogeneous discrete-time Markov chain with state space K^N. Since for each r ∈ {1, . . . , N} the process (X_n^{(r)})_{n∈N_0} is a Markov chain and since the types of the N individuals evolve independently, the Markov chain X is a so-called product chain having n-step transition probabilities

    π^{(n)}_{i,j} := P(X_n = j | X_0 = i) = ∏_{r=1}^{N} P(X_n^{(r)} = j_r | X_0^{(r)} = i_r) = ∏_{r=1}^{N} m^{(n)}_{i_r,j_r},   (1.1)

for all i = (i_1, . . . , i_N), j = (j_1, . . . , j_N) ∈ K^N, where the m^{(n)}_{k,l} denote the entries of the n-th power M^n of the mutation matrix. The process η := (η_n)_{n∈N_0}, defined via η_n := ∑_{r=1}^{N} δ_{X_n^{(r)}} for all n ∈ N_0, is called the empirical measure process of X. Let K denote the power set of K. Clearly, η has state space M, the set of measures μ on (K, K) with values in {0, . . . , N} and total mass μ(K) = N. It is a priori not obvious that η is still Markovian, since functions of Markov chains are in general not Markovian anymore. Proposition 1.1 below shows that η still enjoys the Markov property and provides a formula for its n-step transition probabilities. The passage from X to η is an example of what is called projection or lumping of Markov chains in the literature. The proof of Proposition 1.1 is provided in Section 4. In the following, for any measure μ ∈ M, we use the notation μ_k := μ({k}), k ∈ K.
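The product-chain property (1.1) says that the n-step matrix of X is the N-fold Kronecker power of M^n. A small numerical sanity check of this identity for the two-type case (the parameter values and helper names are ours, chosen for illustration):

```python
import numpy as np
from functools import reduce

a, b, N, n = 0.3, 0.2, 3, 5
M = np.array([[1 - a, a], [b, 1 - b]])   # two-type mutation matrix

kron_pow = lambda A, k: reduce(np.kron, [A] * k)
Pi = kron_pow(M, N)                      # one-step matrix of the product chain X on {0,1}^N
# product-chain property: the n-step matrix equals the Kronecker power of M^n
lhs = np.linalg.matrix_power(Pi, n)
rhs = kron_pow(np.linalg.matrix_power(M, n), N)
assert np.allclose(lhs, rhs)
```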
where the sum ∑_T extends over all T = (t_{k,l})_{k,l∈K} ∈ N_0^{K×K} with marginals ∑_{l∈K} t_{k,l} = ν_k, k ∈ K, and ∑_{k∈K} t_{k,l} = μ_l, l ∈ K.
From now on we restrict to the two-type situation K = {0, 1} and, to avoid indices, write the mutation matrix M = (m_{k,l})_{k,l∈{0,1}} in the form

    M = ( 1−a    a  )
        (  b    1−b ).

The model hence has three parameters: the population size N and the two mutation probabilities a = m_{0,1} and b = m_{1,0}. We exclude the two trivial cases a = b = 0 and a = b = 1. Note that the entries of

    M^n = (m^{(n)}_{k,l})_{k,l∈{0,1}} = ( 1−a_n    a_n  )
                                        (  b_n    1−b_n )

are explicitly known, namely a_n = m^{(n)}_{0,1} = (a/(a+b))(1 − (1−a−b)^n) and b_n = m^{(n)}_{1,0} = (b/(a+b))(1 − (1−a−b)^n).

For the particular situation a = b this model was studied by Scoppola [12] and in a preprint of Berger and Cerf [1]. We allow here for a general stochastic mutation matrix M. Nestoridi [10] studies a different but slightly related random walk on the hypercube {0, 1}^N which at each step flips a fixed number k of randomly chosen coordinates.

As in [1] we are interested in the stochastic process Y := (Y_n)_{n∈N_0}, defined via Y_n := η_n({1}) = ∑_{r=1}^{N} X_n^{(r)} =: |X_n|_1 for all n ∈ N_0. Note that Y_n counts the individuals of type 1 at time n. Corollary 1.2 below is a straightforward consequence of Proposition 1.1. In the following we use for the binomial probabilities the notation B(n, p, k) := (n choose k) p^k (1−p)^{n−k}.

The process Y is a homogeneous discrete-time Markov chain with state space S := {0, . . . , N} and n-step transition probabilities p^{(n)}_{i,j} = P(K_n + L_n = j), where K_n and L_n are independent random variables with K_n binomially distributed with parameters i and 1 − b_n and L_n binomially distributed with parameters N − i and a_n.

From |1 − a − b| < 1 we conclude that lim_{n→∞} a_n = a/(a+b) and lim_{n→∞} b_n = b/(a+b). Thus, K_n → K_∞ and L_n → L_∞ in distribution as n → ∞, where K_∞ and L_∞ are independent with K_∞ binomially distributed with parameters i and a/(a+b) and L_∞ binomially distributed with parameters N − i and a/(a+b). It follows that lim_{n→∞} p^{(n)}_{i,j} = P(K_∞ + L_∞ = j) = B(N, a/(a+b), j). The stationary distribution (π_j)_{j∈S} of Y is hence the binomial distribution with parameters N and a/(a+b).
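The representation of the n-step transition probabilities as the law of a sum of two independent binomials, and the binomial stationary distribution, are easy to verify numerically. A minimal sketch (function names and parameter values are ours):

```python
import numpy as np
from math import comb

def binom_pmf(n, p, k):
    """B(n, p, k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def transition_matrix(N, a, b, n=1):
    """n-step transition matrix of Y: row i is the law of K_n + L_n with
    K_n ~ Bin(i, 1 - b_n) and L_n ~ Bin(N - i, a_n)."""
    an = a / (a + b) * (1 - (1 - a - b)**n)
    bn = b / (a + b) * (1 - (1 - a - b)**n)
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(N + 1):
            # convolution of the two independent binomials
            P[i, j] = sum(binom_pmf(i, 1 - bn, k) * binom_pmf(N - i, an, j - k)
                          for k in range(max(0, j - (N - i)), min(i, j) + 1))
    return P

N, a, b = 10, 0.3, 0.2
P = transition_matrix(N, a, b)
# the n-step formula is consistent with the n-th power of the 1-step matrix
P3 = transition_matrix(N, a, b, n=3)
assert np.allclose(P3, np.linalg.matrix_power(P, 3))
# the binomial(N, a/(a+b)) distribution is stationary
pi = np.array([binom_pmf(N, a / (a + b), j) for j in range(N + 1)])
assert np.allclose(pi @ P, pi)
```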
Let us now turn to spectral analysis. Spectral decompositions of transition matrices P for Ehrenfest-type models [5] have been provided by Kac [8]. Ehrenfest-type models belong to the class of nearest-neighbor or local Markov chains, allowing for jumps only to the nearest neighbors (or at least to close neighbors) with positive probability. In contrast, the Markov chain Y under consideration is non-local, satisfying p_{i,j} > 0 for all i, j ∈ S. In this situation it is usually much harder to find the eigenvalues and corresponding eigenvectors of P. The literature on (examples of) such classes of fully occupied matrices with explicitly known spectral decomposition is hence somewhat more sparse. One example from mathematical population genetics are the transition matrices of the forward process of exchangeable Cannings population models [3,4,6]. Another well-known example, important in the theory of discrete Fourier transforms, are matrices of the form P = (a_{(i+j) mod (N+1)})_{i,j∈{0,...,N}} for some given sequence (a_0, . . . , a_N). In the following it is assumed that a ≠ 0 and b ≠ 0 to avoid trivialities. In order to state the main result we introduce the triangular transformation matrices A and B defined via (1.5). Note that A and B are non-singular with inverses A^{−1} and B^{−1} whose entries are given in (1.6). The main result (Theorem 1.4) provides an explicit spectral decomposition for the transition matrix of the Markov chain Y. Its proof is provided in Section 4.
The last column of R contains the (obvious) right eigenvector (1, . . . , 1)^T to the eigenvalue (1 − a − b)^0 = 1. The last row of L contains the probabilities l_{N,j} = B(N, a/(a+b), j) = π_j, j ∈ S, of the stationary distribution of Y. For a = b the matrices R and L depend only on the population size N but not on the parameters a and b.
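The spectrum asserted by Theorem 1.4, and the explicit right eigenvector to the eigenvalue 1 − a − b used later in Section 3, can be checked numerically. A small sketch (parameter values are ours):

```python
import numpy as np
from math import comb

N, a, b = 8, 0.3, 0.2
# one-step transition matrix of Y (law of K_1 + L_1, cf. Corollary 1.2)
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    for k in range(i + 1):          # K_1 ~ Bin(i, 1 - b)
        for l in range(N - i + 1):  # L_1 ~ Bin(N - i, a)
            P[i, k + l] += (comb(i, k) * (1 - b)**k * b**(i - k)
                            * comb(N - i, l) * a**l * (1 - a)**(N - i - l))

# spectrum: exactly the powers (1 - a - b)^i, i = 0, ..., N
eigs = np.sort(np.linalg.eigvals(P).real)
expected = np.sort([(1 - a - b)**i for i in range(N + 1)])
assert np.allclose(eigs, expected)

# right eigenvector to the eigenvalue 1 - a - b:
phi = np.array([N - (a + b) / a * i for i in range(N + 1)])
assert np.allclose(P @ phi, (1 - a - b) * phi)
```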
Lemma 5.2 in the appendix (Section 5) shows that R and L have explicit horizontal generating functions, from which it follows that the entries of the matrices R and L are closely related.

Applications to mean passage times
Spectral decompositions of transition matrices P = (p_{i,j})_{i,j∈S} of Markov chains have many applications, in particular in the context of potential theory of Markov chains. For example, each spectral decomposition P = RDL with L = R^{−1} implies as well a spectral decomposition of the resolvent R_α. For some more information on the resolvent we refer the reader to Norris [11, p. 146].
We provide here another application concerning mean passage times. Define W := (w_{i,j})_{i,j∈S} := lim_{n→∞} P^n. Clearly, WP = W and W^2 = W, and hence the mean passage times can, for all i ≠ j, be expressed in terms of the series ∑_{n=0}^{∞} (p^{(n)}_{i,j} − w_{i,j}); we summarize these results in the form (2.1). Explicit formulas for p^{(n)}_{i,j} are available for the present example (see (1.3)). However, the direct calculation of the series on the right-hand side in (2.1) is inconvenient. Fortunately, thanks to the spectral decomposition of P, this series can be expressed as a finite sum and can hence be calculated rather easily as follows. We have r_{i,N} = 1 for all i ∈ S and, hence,

    ∑_{n=0}^{∞} (p^{(n)}_{i,j} − w_{i,j}) = ∑_{m=0}^{N−1} r_{i,m} l_{m,j} / (1 − (1 − a − b)^{N−m}).
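The reduction of the series to a finite sum over the subunit eigenvalues can be demonstrated numerically with any diagonalization of P (here computed with numpy rather than with the explicit matrices R and L of Theorem 1.4; parameter values are ours):

```python
import numpy as np
from math import comb

N, a, b = 6, 0.3, 0.2
# one-step matrix from Corollary 1.2
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    for k in range(i + 1):
        for l in range(N - i + 1):
            P[i, k + l] += (comb(i, k) * (1 - b)**k * b**(i - k)
                            * comb(N - i, l) * a**l * (1 - a)**(N - i - l))

pi = np.array([comb(N, j) * (a/(a+b))**j * (b/(a+b))**(N-j) for j in range(N + 1)])
W = np.outer(np.ones(N + 1), pi)          # W = lim P^n
assert np.allclose(W @ P, W) and np.allclose(W @ W, W)

# the series sum_{n>=0} (P^n - W), truncated (eigenvalue gap makes this converge fast)
S_series = sum(np.linalg.matrix_power(P, n) - W for n in range(200))

# finite-sum evaluation via a spectral decomposition P = R D L, L = R^{-1}
lam, R = np.linalg.eig(P)
L = np.linalg.inv(R)
mask = np.abs(lam) < 1 - 1e-9             # drop the eigenvalue 1
S_spec = (R[:, mask] * (1.0 / (1 - lam[mask]))) @ L[mask, :]
assert np.allclose(S_series, S_spec.real, atol=1e-8)
```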

Final remarks and open problems
1. It is natural to conjecture that the transition matrix Π = (π_{i,j})_{i,j∈{0,1}^N} of the original product Markov chain X has, as well as Y, the eigenvalues λ_i := (1 − a − b)^i, i ∈ {0, . . . , N}.

2. For the multi-type model, Proposition 1.1 yields the n-step transition probabilities for all n ∈ N_0 and all i = (i_k)_{k∈K}, j = (j_k)_{k∈K} ∈ S, where the sum ∑_T extends over all T := (t_{k,l})_{k,l∈K} ∈ N_0^{K×K} with marginals ∑_{l∈K} t_{k,l} = i_k, k ∈ K, and ∑_{k∈K} t_{k,l} = j_l, l ∈ K. Finding spectral decompositions of the transition matrices of X or Y for the multi-type model is a challenging open problem.
3. Let us finally provide some (mainly non-rigorous) information on the mixing time of the chain Y. Assume that P(Y_0 = i) = 1 for some given fixed state i ∈ S. Let Y_∞ be binomially distributed with parameters N and a/(a+b). Theorem 1.4 yields an exact formula for the total variation distance between Y_n and Y_∞, since r_{i,N} = 1, d_{N,N} = λ_0 = 1 and l_{N,j} = π_j. In the following it is assumed for simplicity that a + b < 1, such that all eigenvalues are positive.

Fix ε ∈ (0, 1). Based on (3.4) it is reasonable to conjecture that (3.5) is a good approximation for the mixing time t_mix(ε) := min{n ∈ N_0 : ||P(Y_n ∈ ·) − P(Y_∞ ∈ ·)||_TV ≤ ε}. Note, however, that this approximation is a non-rigorous result. For large N one may further simplify the approximation (3.5) as follows. From well-known results for asymptotically normal random variables, together with |r_{i,N−1}| ∼ N as N → ∞, it follows that c_{N,i} ∼ (aN)/(2πb) as N → ∞. Thus, for large N, one can expect that (3.6) is, independently of the fixed initial state i, a good approximation for the mixing time. Numerical comparisons with the exact value t_mix(ε), which can be computed via (3.2) using Corollary 1.2 or via (3.3) using Eqs. (1.8) and (1.9), show that both approximations (3.5) and (3.6) are rather sharp. Based on these approximations we conjecture that Y is rapidly mixing with t_mix(ε) = Θ(log N) as N → ∞. We leave a rigorous proof of this conjecture as an open problem.

Since Y is non-reversible, it is not straightforward to use the eigenvalues and eigenvectors to rigorously provide upper or lower bounds for the mixing time. For example, Wilson's lower bound formula (see, for example, [9, Theorem 13.28]) gives, for 1/2 < λ < 1, a lower bound in terms of a right eigenvector Φ := (Φ(0), . . . , Φ(N))^T to the eigenvalue λ and R := max_{0≤i≤N} E_i((Φ(Y_1) − Φ(i))^2). In our situation, Φ(i) = r_{i,N−1} = N − ((a+b)/a) i, i ∈ {0, . . . , N}, and hence Φ(i) ∼ N as N → ∞. Straightforward calculations show that R = O(N^2) as N → ∞.
Wilson's lower bound is hence bounded in N and therefore not helpful to prove the conjecture that t mix (ε) = Θ(log N ) as N → ∞.
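The exact worst-case mixing time is easy to compute by brute force for moderate N, which makes the conjectured Θ(log N) growth visible. A small sketch (function name and parameter values are ours; the computation follows (3.2) via Corollary 1.2):

```python
import numpy as np
from math import comb

def tmix(N, a, b, eps=0.25):
    """Exact t_mix via total variation distance, worst case over initial states."""
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for k in range(i + 1):
            for l in range(N - i + 1):
                P[i, k + l] += (comb(i, k) * (1 - b)**k * b**(i - k)
                                * comb(N - i, l) * a**l * (1 - a)**(N - i - l))
    pi = np.array([comb(N, j) * (a/(a+b))**j * (b/(a+b))**(N-j) for j in range(N + 1)])
    Pn = np.eye(N + 1)
    for n in range(1, 10_000):
        Pn = Pn @ P
        if 0.5 * np.abs(Pn - pi).sum(axis=1).max() <= eps:
            return n
    raise RuntimeError("not mixed")

a, b = 0.3, 0.2
for N in (8, 32, 128):
    print(N, tmix(N, a, b))   # slow growth in N, consistent with t_mix = Theta(log N)
```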

Proofs
In this section, proofs of Proposition 1.1, Corollary 1.2 and Theorem 1.4 are provided.
Proof (of Proposition 1.1). We have η_n = f ∘ X_n, where f : K^N → M is defined via f(k) := ∑_{r=1}^{N} δ_{k_r} for all k = (k_1, . . . , k_N) ∈ K^N. By a criterion of Burke and Rosenblatt [2], provided for convenience in Lemma 5.1 in the appendix, the process η is Markovian if, for every i = (i_1, . . . , i_N) ∈ K^N and every μ ∈ M, the sum ∑_{j∈f^{−1}(μ)} π^{(n)}_{i,j} depends on i only via f(i). In this case η has n-step transition probabilities p^{(n)}_{ν,μ} = ∑_{j∈f^{−1}(μ)} π^{(n)}_{i,j}, where i ∈ f^{−1}(ν) can be chosen arbitrarily. We therefore fix n ∈ N_0, i = (i_1, . . . , i_N) ∈ K^N and μ ∈ M, and focus on the sum ∑_{j∈f^{−1}(μ)} π^{(n)}_{i,j}. Define ν := f(i). By (1.1), the summands factorize over the individuals, and grouping the individuals according to their types turns the sum into a sum ∑_T extending over all T = (t_{k,l})_{k,l∈K} ∈ N_0^{K×K} with marginals ∑_{l∈K} t_{k,l} = ν_k, k ∈ K, and ∑_{k∈K} t_{k,l} = μ_l, l ∈ K. Since there exist exactly (∏_{k∈K} ν_k!)/(∏_{k,l∈K} t_{k,l}!) vectors j = (j_1, . . . , j_N) ∈ K^N satisfying N(i, j) = T, the sum depends on i only via ν = f(i), which proves the Markov property and the formula for the transition probabilities. For K = {0, 1}, using the notation k := t_{1,1}, i := ν_1 and j := μ_1, this turns into the right-hand side in (1.3).

We now turn to the proof of Theorem 1.4. Two proofs are provided. The first proof verifies that the generating functions of P_{i,·} and (RDL)_{i,·} coincide. This proof is relatively short but somewhat intransparent. In particular, the spectral decomposition, i.e. the matrices R, D and L, has to be known in advance. The second proof is more technical and hence longer, but has the advantage that a recipe is provided for how to obtain, recursively over l ∈ {0, . . . , N}, a right eigenvector to the eigenvalue λ_{N−l}. Hence, the matrices R, D and L do not need to be known in advance.
The formulas for the entries of these three matrices pop up naturally while performing the calculations.

Proof 1.
It suffices to verify that the generating functions of P_{i,·} and (RDL)_{i,·} coincide. From Corollary 1.2 it follows that the ith row of the transition matrix P = (p_{i,j})_{i,j∈S} has probability generating function (pgf)

    ∑_{j∈S} p_{i,j} s^j = (b + (1 − b)s)^i (1 − a + as)^{N−i}.   (4.1)

Plugging in d_{m,m} = (1 − a − b)^{N−m} and the formula for r_{i,m} (see (1.8)) and interchanging the sums over m and k yields an expression for the pgf of (RDL)_{i,·} that coincides with (4.1), and the result is shown.
Proof 2. This proof of Theorem 1.4 is somewhat technical in detail, but there is a straightforward approach behind the technical calculations. We therefore first describe the basic method and work out the details afterwards. We proceed as follows.

Step 1. As in the first proof the starting point is formula (4.1) for the pgf of the ith row of P. Now choose in (4.1) s = s_0 with s_0 := −b/a. Since b + (1 − b)s_0 = s_0(1 − a − b) and 1 − a + as_0 = 1 − a − b, it follows that ∑_{j∈S} p_{i,j} s_0^j = (1 − a − b)^N s_0^i. Thus, λ_N = (1 − a − b)^N is an eigenvalue of P with corresponding right eigenvector x_0 = (x_{0,0}, . . . , x_{0,N})^T having entries x_{0,j} := s_0^j, j ∈ S. Taking the derivative with respect to s in (4.1) (product rule) and choosing again s = s_0 = −b/a, it follows that ∑_{j∈S} p_{i,j} j s_0^{j−1} = λ_{N−1}(i(1 − b)s_0^{i−1} + (N − i)a s_0^i). Adding to this equation the C-fold (C ∈ R) of the equation ∑_{j∈S} p_{i,j} s_0^j = λ_N s_0^i and choosing the constant C such that C = Na + C(1 − a − b), so C = (a/(a+b))N, yields a right eigenvector x_1 to the eigenvalue λ_{N−1}, namely x_1 = (x_{1,0}, . . . , x_{1,N})^T with entries x_{1,j} := j s_0^{j−1} + C s_0^j, j ∈ S. The general method is now obvious. We differentiate (4.1), successively for l = 1, 2, . . . , N, exactly l times with respect to s and choose afterwards s = s_0. With some skill one arrives at the system of equations (4.2). We will see in the following step that these are the key equations in order to find the eigenvectors of the transition matrix P.
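The pivotal substitution s_0 = −b/a in Step 1 can be verified symbolically. A small sketch (the concrete choice i = 3, N = 7 is ours; the identity holds for any 0 ≤ i ≤ N):

```python
import sympy as sp

a, b, s = sp.symbols('a b s', positive=True)
i, N = 3, 7
# pgf of the i-th row of P, cf. (4.1): K_1 ~ Bin(i, 1-b), L_1 ~ Bin(N-i, a)
g = (b + (1 - b)*s)**i * (1 - a + a*s)**(N - i)
s0 = -b/a
# evaluating at s0 = -b/a turns the pgf identity into the eigenvalue equation
lhs = g.subs(s, s0)
rhs = (1 - a - b)**N * s0**i
assert sp.expand(lhs - rhs) == 0   # lhs equals lambda_N * s0^i
```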
Step 2. We will now use the system of equations (4.2) in order to derive, successively for j = 0, . . . , N, a right eigenvector x_j to the eigenvalue λ_{N−j} = (1 − a − b)^{N−j}. We make the ansatz that x_j = (x_{j,0}, . . . , x_{j,N})^T is of the form (4.3), where the coefficients b_{k,j} may depend on the parameters N, a and b of the model but not on the state i. Since eigenvectors can be scaled arbitrarily, we can in principle choose b_{j,j} arbitrarily (not equal to zero), but we shall see soon that b_{j,j} := ((a+b)/a)^j is a convenient choice. Since x_j should be(come) a right eigenvector to the eigenvalue λ_{N−j}, the chain of equalities P x_j = λ_{N−j} x_j must hold, which leads to a system of equations for the coefficients. We now solve this system of equations. Defining suitably rescaled coefficients c_{k,j}, the system reduces to

    c_{k,j} = ∑_{l=k}^{j} c_{l,j} λ_{j−l} a^{l−k} / (l − k)!,   j ∈ S, k ∈ {0, . . . , j}.
For k = j this holds since c_{j,j} = ((a+b)/a)^j. The induction step from j, j − 1, . . . , k + 1 to k follows by inserting the already-determined coefficients. In summary, it is shown that x_j := (x_{j,0}, . . . , x_{j,N})^T, defined via (4.3) with coefficients a_{i,k} := (i choose k) s_0^{i−k} and b_{k,j} as in (1.5), is a right eigenvector to the eigenvalue λ_{N−j}.

Step 3. It is now straightforward to derive the desired spectral decomposition of the transition matrix P. Let R = (r_{i,j})_{i,j∈S} denote the matrix which contains in the jth column the right eigenvector x_j to the eigenvalue λ_{N−j}, i.e. r_{i,j} := x_{j,i} = ∑_{k=0}^{j} a_{i,k} b_{k,j} or, in matrix notation, R := AB, where A := (a_{i,j})_{i,j∈S} is the lower triangular matrix and B := (b_{i,j})_{i,j∈S} the upper triangular matrix defined via (1.5). Note that A and B are non-singular with inverses (1.6). Since R is the matrix containing in the jth column the right eigenvector x_j to the eigenvalue λ_{N−j}, we have P R = RD, where D is the diagonal matrix with entries d_{i,i} := λ_{N−i} = (1 − a − b)^{N−i}, i ∈ S. Multiplying from the right with L := R^{−1} yields the spectral decomposition P = RDL. The proof is complete.

Appendix
For convenience we record a criterion which ensures that a function of a Markov chain is still Markovian. The result is well known in the probability community (see, for example, Levin and Peres [9, Lemma 2.5]) and essentially goes back to Burke and Rosenblatt [2]. Lemma 5.1. Let X = (X_n)_{n∈N_0} be a homogeneous discrete-time Markov chain with state space S and transition probabilities π_{i,j}, i, j ∈ S. Moreover, let f : S → S′ be a surjective map onto a space S′. Define Y_n := f ∘ X_n for all n ∈ N_0. If, for every i ∈ S and every v ∈ S′, the sum ∑_{j∈f^{−1}(v)} π_{i,j} depends on i only via f(i), then Y = (Y_n)_{n∈N_0} is a homogeneous discrete-time Markov chain with state space S′. In this case Y has n-step transition probabilities p^{(n)}_{u,v} = ∑_{j∈f^{−1}(v)} π^{(n)}_{i,j}, n ∈ N_0, u, v ∈ S′, where i ∈ f^{−1}(u) can be chosen arbitrarily. Here π^{(n)}_{i,j}, n ∈ N_0, i, j ∈ S, denote the n-step transition probabilities of X.
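The Burke–Rosenblatt criterion can be checked by brute force for the two-type product chain with f the number of type-1 individuals. A minimal sketch (parameter values are ours):

```python
import itertools
import numpy as np

N, a, b = 4, 0.3, 0.2
M = np.array([[1 - a, a], [b, 1 - b]])              # two-type mutation matrix
states = list(itertools.product((0, 1), repeat=N))  # state space {0,1}^N of X

# product-chain transition probabilities pi_{i,j} = prod_r m_{i_r, j_r}
Pi = np.array([[np.prod([M[ir, jr] for ir, jr in zip(i, j)]) for j in states]
               for i in states])

f = lambda i: sum(i)   # lumping map, f(X_n) = Y_n
# criterion: sum_{j in f^{-1}(v)} pi_{i,j} depends on i only via f(i)
for v in range(N + 1):
    block = [sum(Pi[r, c] for c, j in enumerate(states) if f(j) == v)
             for r, i in enumerate(states)]
    by_class = {}
    for r, i in enumerate(states):
        by_class.setdefault(f(i), set()).add(round(block[r], 12))
    # within each class {i : f(i) = u} all block sums coincide
    assert all(len(vals) == 1 for vals in by_class.values())
print("lumpability criterion satisfied")
```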
The following lemma provides the horizontal generating functions of the transformation matrices R and L from Theorem 1.4. Lemma 5.2. Let R and L be the matrices defined via R := AB and L := R^{−1}, where A and B are defined via (1.5), and let r_i : C → C and l_i : C → C, i ∈ S, denote the associated horizontal generating functions defined via r_i(z) := ∑_{j∈S} r_{i,j} z^j and l_i(z) := ∑_{j∈S} l_{i,j} z^j for all z ∈ C and i ∈ S. Then r_i and l_i admit closed-form expressions. Proof. From (1.8) the stated formula for r_i follows by a direct computation. Similarly, we deduce from (1.9) the desired formula for the horizontal generating function l_i.