Embeddable Markov Matrices

We give an account of some results, both old and new, about any $n\times n$ Markov matrix that is embeddable in a one-parameter Markov semigroup. These include the fact that its eigenvalues must lie in a certain region in the unit ball. We prove that a well-known procedure for approximating a non-embeddable Markov matrix by an embeddable one is optimal in a certain sense.


Introduction
A Markov matrix A is defined to be a real n × n matrix with non-negative entries satisfying ∑_{j=1}^n A_{i,j} = 1 for all i. The spectral properties of non-negative matrices and linear operators, and in particular of Markov matrices, have been studied in great detail, because of their great importance in finance, population dynamics, medical statistics, sociology and many other areas of probability and statistics. Theoretical accounts of parts of the subject may be found in [1,5,16,17]. This paper develops ideas of [10], which investigated when the pth roots of Markov matrices are also Markov; this problem is related to the possibility of passing from statistics gathered at certain time intervals, for example every year, to the corresponding data for shorter time intervals.
Given an empirical Markov matrix, three major issues discussed in [19] are embeddability, uniqueness of the embedding and the effects of data/sampling error. All of these are also considered here. We call a Markov matrix A embeddable if there exists a matrix B such that A = e^B and e^{Bt} is Markov for all t ≥ 0. The matrix B involved need not be unique, but it must have non-negative off-diagonal entries and all its row sums must vanish; see [1] or [5, Section 12.3]. In probabilistic terms a Markov matrix A is embeddable if it is obtained by taking a snapshot at a particular time of an autonomous finite state Markov process that develops continuously in time. On the other hand a Markov matrix might not be embeddable if it describes the annual changes in a population that has a strongly seasonal breeding pattern; in such cases one might construct a more elaborate model that incorporates the seasonal variations. Embeddability may also fail because the matrix entries are not accurate; in such cases a regularization technique might yield a very similar Markov matrix that is embeddable; see [15] for examples arising in finance.
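The definition can be checked numerically for any candidate generator. The following sketch verifies that e^{Bt} is Markov for several values of t ≥ 0; the 3 × 3 generator B is an illustrative assumption, not taken from the text.

```python
# A minimal numerical sketch of the embeddability definition.  The generator
# B below is an illustrative assumption: non-negative off-diagonal entries
# and vanishing row sums.
import numpy as np
from scipy.linalg import expm

B = np.array([[-1.0,  0.7,  0.3],
              [ 0.2, -0.5,  0.3],
              [ 0.4,  0.6, -1.0]])
assert np.allclose(B.sum(axis=1), 0.0)          # row sums of a generator vanish

def is_markov(M, tol=1e-12):
    """True if M has non-negative entries and unit row sums."""
    return M.min() >= -tol and np.allclose(M.sum(axis=1), 1.0)

A = expm(B)                                     # A = e^B is embeddable by construction
samples = [expm(t * B) for t in (0.1, 0.5, 2.0)]  # e^{Bt} should be Markov for all t >= 0
```

Deciding whether a given Markov matrix arises this way is, of course, the hard direction; the sketch only illustrates the easy one.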
Theorem 9 describes some spectral consequences of embeddability. The earliest analysis of the structure of the set E of embeddable n × n Markov matrices and its topological boundary within the set of all Markov matrices was given by Kingman [14], who concluded that, except in the case n = 2, it seemed unlikely that any very explicit characterisation of E could be given; see [12] for further work on this problem. Theorem 12 proves that a well-known method of approximating a Markov matrix by an embeddable Markov matrix is optimal in a certain sense. Many of the results in the present paper appear in one form or another in papers devoted to the wide variety of applications, and it is hoped that collecting them in one place may be of value.

The main theorem
For the sake of definiteness we define the principal logarithm of a number z ∈ C\(−∞, 0] to be the branch of the logarithm with values in {w : |Im(w)| < π}. We define the principal logarithm of an n × n matrix A such that Spec(A) ∩ (−∞, 0] = ∅ by the holomorphic functional calculus integral

log(A) = (2πi)^{−1} ∮_γ log(z) (zI − A)^{−1} dz,   (1)

using the principal logarithm of z and a simple closed contour γ in C\(−∞, 0] that encloses the spectrum of A. This formula goes back to Giorgi in 1926; see [6, Theorem VII.1.10 and notes, p.607]. If A = T DT^{−1} where D is diagonal, this is equivalent to log(A) = T log(D)T^{−1}, where log(D) is obtained from D by applying log to each diagonal entry of D. The non-diagonalisable case is discussed in some detail in [19] and yields the same matrix as (1).
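For a diagonalisable matrix the two descriptions of log(A) can be compared directly. The sketch below checks scipy's Schur-based logm against the eigendecomposition formula on an illustrative Markov matrix with spectrum {1, 0.6, 0.4} (our choice, not from the text).

```python
# Sketch: principal logarithm of a diagonalisable Markov matrix, computed two
# ways.  The matrix A (spectrum {1, 0.6, 0.4}) is an illustrative choice.
import numpy as np
from scipy.linalg import logm

A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])

evals, T = np.linalg.eig(A)                     # A = T D T^{-1}
log_D = np.diag(np.log(evals.astype(complex)))  # apply log to each eigenvalue
L_diag = (T @ log_D @ np.linalg.inv(T)).real    # T log(D) T^{-1}
L_schur = logm(A).real                          # the contour-integral definition, via scipy
```

Both computations agree to numerical precision, and (consistent with Lemma 1 below) the resulting matrix is real with vanishing row sums.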

Lemma 1 If A is a Markov matrix and Spec(A) ∩ (−∞, 0] = ∅ then the principal logarithm L = log(A) lies in the set L of all real n × n matrices L such that ∑_{1≤j≤n} L_{i,j} = 0 for every i.
Proof We use the formula (1) and take the contour γ to be symmetrical with respect to reflection about the x-axis. The statements of the lemma follow directly from two properties of the resolvent matrices.
The first is the identity

(zI − A)^{−1} 𝟙 = (z − 1)^{−1} 𝟙,   (2)

where 𝟙 denotes the column vector all of whose entries equal 1. This holds for large |z| by virtue of the identity

(zI − A)^{−1} = ∑_{m=0}^∞ A^m z^{−m−1}   (3)

and the fact that A^m 𝟙 = 𝟙 for every m ≥ 0; (2) then extends to all z ∉ Spec(A) by analytic continuation. Substituting (2) into (1) gives L𝟙 = log(1)𝟙 = 0, so all the row sums of L vanish.
The second identity needed states that the complex conjugate of (zI − A)^{−1} equals (z̄I − A)^{−1}; its proof follows the same route, using (3) and analytic continuation. Together with the symmetry of γ it implies that L is real.
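The two resolvent identities can be spot-checked numerically. The sketch below tests the identity (zI − A)^{−1}𝟙 = (z − 1)^{−1}𝟙 and the conjugation property of the resolvent for one random Markov matrix and one value of z off the spectrum (illustrative only).

```python
# Numerical spot-check of the two resolvent identities used in the proof,
# for one random Markov matrix and one z off the spectrum (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
A /= A.sum(axis=1, keepdims=True)               # random 4 x 4 Markov matrix
one = np.ones(4)
z = 2.5 + 1.3j

R = np.linalg.inv(z * np.eye(4) - A)            # resolvent (zI - A)^{-1}
rhs = one / (z - 1.0)                           # (z - 1)^{-1} * (vector of ones)
R_conj_z = np.linalg.inv(np.conj(z) * np.eye(4) - A)
```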
The results in our next lemma are all well known and are included for completeness.

Lemma 2 If A is embeddable then 0 is not an eigenvalue of A and every negative eigenvalue of A has even algebraic multiplicity. Moreover det(A) > 0. If A is embeddable and A_{i,j} > 0, A_{j,k} > 0 then A_{i,k} > 0.
Proof The first statement follows from the fact that Spec(A) = exp(Spec(B)).
Given an eigenvalue λ < 0 of A let S_+ = {z ∈ Spec(B) : e^z = λ and Im(z) > 0}, S_− = {z ∈ Spec(B) : e^z = λ and Im(z) < 0}, and let L_± be the spectral projections of B associated with S_±. Since e^z = λ < 0 implies that Im(z) ≠ 0, we can deduce that L_− L_+ = 0 and that M = L_− + L_+ is the spectral projection of A associated with the eigenvalue λ. Since B is real, L_− may be obtained from L_+ by complex conjugation, so rank(M) = 2 rank(L_+) and the algebraic multiplicity of λ is even. See [7].
By combining the reality of B with the formula det(A) = e^{tr(B)} we obtain det(A) > 0. See [14].
The last statement follows from the general theory of Markov chains and is due to Ornstein and Levy, independently; see [1, Section 2.5, Theorem 2] and [5, Theorem 13.2.4]. We first note that one may write B = C − δI where all the entries of C are non-negative and δ ≥ 0. Hence

e^{Bt} = e^{−δt} ∑_{m=0}^∞ C^m t^m / m!

where each entry of each C^m is non-negative. This implies that if (e^{Bt})_{i,j} > 0 for some t > 0 then the same holds for all t > 0. This quickly yields the final statement.
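Both parts of the lemma can be illustrated numerically. The sketch below, using an assumed 3 × 3 generator, checks the determinant formula det(e^B) = e^{tr(B)} and the fact that the set of positive entries of e^{Bt} does not depend on t > 0.

```python
# Sketch of det(e^B) = e^{tr(B)} > 0 and of the t-independence of the support
# of e^{Bt}, for one assumed 3 x 3 Markov generator.
import numpy as np
from scipy.linalg import expm

B = np.array([[-2.0,  2.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0]])

A = expm(B)
det_ok = np.isclose(np.linalg.det(A), np.exp(np.trace(B)))

# Writing B = C - delta*I with C >= 0 shows that the set of indices (i, j)
# with (e^{Bt})_{i,j} > 0 does not depend on t > 0:
supports = [expm(t * B) > 1e-12 for t in (0.5, 1.0, 3.0)]
```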
Kingman [14] has shown that the set E of embeddable Markov matrices is a closed subset of the set of all n × n Markov matrices. The matrix norm used throughout this paper is

‖A‖ = max_{1≤i≤n} ∑_{j=1}^n |A_{i,j}|,   (4)

the norm of A regarded as an operator on C^n equipped with the supremum norm.

Lemma 3 The set S of all A ∈ E with no negative eigenvalues is a dense relatively open subset of E.
Proof If A ∈ S then a simple perturbation theoretic argument implies that there exists ε > 0 such that C has no negative eigenvalues for any n × n matrix C satisfying ‖A − C‖ < ε. This implies that S is relatively open in E.
If A ∈ E then A = e^B for some Markov generator B. If {x_r + iy_r}_{r=1}^n is the set of eigenvalues of B and t > 0 then e^{Bt} has a negative eigenvalue if and only if t y_r = (2m + 1)π for some r and some integer m. The set of such t is clearly discrete. It follows that e^{Bt} ∈ S for all t close enough to 1, except possibly for t = 1 itself. Since lim_{t→1} ‖e^{Bt} − A‖ = 0, we conclude that S is dense in E.
The following example shows that the density property in Lemma 3 depends on the embeddability hypothesis.

Example 4
Let A be a 2 × 2 Markov matrix with Spec(A) = {1, −1/3}, for example the symmetric one with both diagonal entries equal to 1/3. If 0 < ε < 1/3, any matrix close enough to A has exactly one eigenvalue λ satisfying |λ + 1/3| < ε, by a standard perturbation theoretic argument. Since such a matrix has real entries the complex conjugate of λ is also an eigenvalue, so λ must be real and negative. Therefore the set of Markov matrices with no negative eigenvalues is relatively open but not dense in the set of all Markov matrices, at least for n = 2. The example may be used to construct a similar example for every n > 2.
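A concrete instance can be tested numerically. We choose the symmetric 2 × 2 Markov matrix with spectrum {1, −1/3}; any Markov matrix with that spectrum would serve equally well. A small perturbation with zero row sums leaves one eigenvalue real and negative.

```python
# Example 4 made concrete.  The symmetric matrix below has spectrum {1, -1/3};
# this particular choice is ours, any Markov matrix with that spectrum works.
import numpy as np

A = np.array([[1/3, 2/3],
              [2/3, 1/3]])
eigs = np.sort(np.linalg.eigvals(A).real)       # spectrum {-1/3, 1}

# Perturb within the zero-row-sum directions, as for Markov matrices:
rng = np.random.default_rng(1)
E = rng.normal(scale=1e-3, size=(2, 2))
E -= E.mean(axis=1, keepdims=True)              # perturbation has zero row sums
neg_eig = np.linalg.eigvals(A + E).real.min()   # the negative eigenvalue persists
```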
We will need Lemma 5 and its corollary in the proof of Theorem 7.
Lemma 5 There exists a polynomial p in the entries of an n × n matrix A such that A has a multiple eigenvalue in the algebraic sense if and only if p = 0. We call p the discriminant of A.
Proof A has a multiple eigenvalue if and only if its characteristic polynomial q(z) = z^n + a_1 z^{n−1} + . . . + a_n has a multiple root; the coefficients of q are themselves polynomials in the entries of A. Moreover q has a multiple root if and only if its discriminant (the square of the Vandermonde determinant of its roots) vanishes, and the discriminant of q is a polynomial in a_1, . . . , a_n.
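For numerical purposes the discriminant can be evaluated as the squared Vandermonde product of the computed eigenvalues; a sketch, with two illustrative 2 × 2 matrices:

```python
# The discriminant evaluated as the squared Vandermonde product of the
# eigenvalues; illustrative 2 x 2 inputs.
import numpy as np
from itertools import combinations

def discriminant(M):
    lam = np.linalg.eigvals(M)
    return np.prod([(a - b) ** 2 for a, b in combinations(lam, 2)])

A_distinct = np.array([[0.0, 1.0],
                       [1.0, 0.5]])             # two distinct eigenvalues
A_multiple = np.eye(2)                          # eigenvalue 1 with multiplicity 2

d_distinct = discriminant(A_distinct)           # (lam1 - lam2)^2 = 0.25 + 4 = 4.25
d_multiple = discriminant(A_multiple)           # vanishes
```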
Corollary 6 If A 0 , A 1 are two n×n matrices then either A z = (1−z)A 0 +zA 1 has a multiple eigenvalue for all z ∈ C or this happens only for a finite number of z.
Proof The discriminant of A z is a polynomial in z, which has a finite number of roots unless it vanishes identically.

Theorem 7
The set T of all n × n embeddable Markov matrices that have n distinct eigenvalues is relatively open and dense in the set E of all embeddable Markov matrices.
Proof A standard argument from perturbation theory establishes that T is relatively open in E, so we only need to prove its density.
Let A = e^{B_0} where B_0 is a Markov generator, and let ε > 0. Then put

B_t = (1 − t)B_0 + tB_1,

where B_1 is any fixed Markov generator with n distinct eigenvalues. One sees immediately that B_t is a Markov generator for all t ∈ [0, 1] and that it has n distinct eigenvalues if t = 1. Corollary 6 now implies that the eigenvalues of B_t are distinct for all sufficiently small t > 0. By further restricting the size of t > 0 we may also ensure that ‖e^{B_t} − A‖ < ε/2.
Having chosen t, we put B = sB_t, where s ∈ R is close enough to 1 so that ‖e^{B_t} − e^B‖ < ε/2; we also choose s so that if λ_1, λ_2 are any two distinct eigenvalues of B_t then s(λ_1 − λ_2) ∉ 2πiZ. These conditions ensure that ‖e^B − A‖ < ε and that e^B has n distinct eigenvalues.
The following lemma may be contrasted with the fact that a complex number λ such that |λ| = 1 is an eigenvalue of some n × n Markov matrix if and only if λ^r = 1 for some r ∈ {1, 2, . . . , n}; see [16, Chap. 7, Theorem 1.4]. Permutation matrices provide examples of such spectral behaviour. The lemma has been extended to an infinite-dimensional context in [4].
Lemma 8 (Elfving, [7]) If A is an embeddable Markov matrix and λ ≠ 1 is an eigenvalue of A then |λ| < 1.
The main application of the following theorem may be to establish that certain Markov matrices arising in applications are not embeddable, and hence either that the entries are not numerically accurate or that the underlying process is not autonomous. The theorem is a quantitative strengthening of Lemma 8. It is of limited value except when n is fairly small, but this is often the case in applications.

Theorem 9 If B is an n × n Markov generator then every eigenvalue μ of B lies in the sector

{−u + iv : u ≥ 0, |v| ≤ u cot(π/n)}.   (5)

Hence if A is an embeddable n × n Markov matrix then every eigenvalue λ of A lies in the region

{e^{−u+iv} : u ≥ 0, |v| ≤ u cot(π/n)}.   (6)
Proof This depends on two facts: firstly, that Spec(A) = exp(Spec(B)), where B = c(C − I), c > 0 and C is a Markov matrix; secondly, that (5) follows by applying a theorem of Karpelevič to C, and (6) then follows from (5); see [13].
We turn now to the question of uniqueness. The first example of a Markov matrix A that can be written in the form A = e^B for two different Markov generators was given by Speakman in [20]; Example 17 provides another. The initial hypothesis of our next result holds for most embeddable Markov matrices by Theorem 7.

Corollary 10 Let A be an invertible Markov matrix with n distinct eigenvalues λ_1, . . . , λ_n. Then
1. The solutions B of e^B = A form a discrete set and they all commute with each other and with A.
2. Only a finite number of the solutions of e^B = A can be Markov generators.
3. If

|λ_r| > exp(−π tan(π/n))   (7)

for all r then only one of the solutions of e^B = A can be a Markov generator, namely the principal logarithm.
Proof 1. Since A has n distinct eigenvalues, every solution B of e^B = A commutes with A and therefore leaves each eigenspace of A invariant. Hence Be_r = μ_r e_r for each eigenvector e_r of A, where

e^{μ_r} = λ_r   (8)

for all r. It follows that B can be written as a polynomial function of A. For each λ_r, the equation (8) has a discrete set of solutions μ_r, so the set of all such B is discrete; any two solutions commute with each other because each is a polynomial in A.

2. If A is an invertible Markov matrix with distinct eigenvalues and the solution B of e^B = A is a Markov generator then every eigenvalue μ_r of B lies in the sector {−u + iv : u ≥ 0, |v| ≤ u cot(π/n)} by (5). Combining this restriction on the imaginary parts of the eigenvalues with (8) reduces the set of such B to a finite set. See [11, Theorem 6.1] for items 1 and 2, and for an algorithm implementing item 2.
3. We continue with the assumptions and notation of item 2. The assumption (7) implies that if μ_r = −u_r + iv_r then u_r < π tan(π/n). Item 2 now yields |v_r| ≤ u_r cot(π/n) < π. Hence μ_r is the principal logarithm of λ_r and B is the principal logarithm of A.
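The sector restriction on the eigenvalues of a Markov generator, namely that every eigenvalue −u + iv satisfies u ≥ 0 and |v| ≤ u cot(π/n), can be spot-checked by random sampling; a sketch for n = 4:

```python
# Random spot-check of the sector condition for generator eigenvalues:
# every eigenvalue -u + iv of an n x n Markov generator should satisfy
# u >= 0 and |v| <= u * cot(pi/n).  Sampling sketch for n = 4.
import numpy as np

rng = np.random.default_rng(2)
n = 4
cot = 1.0 / np.tan(np.pi / n)
ok = True
for _ in range(200):
    G = rng.random((n, n))
    np.fill_diagonal(G, 0.0)
    np.fill_diagonal(G, -G.sum(axis=1))         # random Markov generator
    for mu in np.linalg.eigvals(G):
        u, v = -mu.real, mu.imag
        ok = ok and u >= -1e-9 and abs(v) <= u * cot + 1e-9
```

Random sampling of course proves nothing; it merely fails to find a counterexample, as the theory predicts.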
The conclusions of the above corollary do not hold if A has repeated eigenvalues or a non-trivial Jordan form; see [3,11]. For example the n × n identity matrix has a continuum of distinct logarithms B which do not all commute; if the eigenvalues of B are chosen to be {2πri : 1 ≤ r ≤ n}, then the possible B are parametrized by the choice of an arbitrary basis as its set of eigenvectors. The general classification of logarithms is given in [8] and [9, Theorem 1.28]. These comments reveal a numerical instability in the logarithm of a matrix if it has two or more eigenvalues that are very close to each other.
The following provides a few other conditions that imply the uniqueness of a Markov generator B such that A = e B .
Theorem 11 (Cuthbert, [2,3]) Let A = e^B where B is a Markov generator. Then (9) ⇒ (10) ⇒ (11) ⇒ (12), where the conditions (9)–(12) are those formulated in [2,3]. If A is a Markov matrix that has distinct eigenvalues and det(A) > e^{−π} then its only possible Markov generator is its principal logarithm log(A). If A_t is a one-parameter Markov semigroup then for every t > 0 one may define L(t) to be the number of Markov generators B such that e^{Bt} = A_t. Some general theorems concerning the dependence of L(t) on t may be found in [3,19].

Regularization
Let G denote the set of n × n Markov generators; following the notation of Lemma 1, G is the set of G ∈ L such that G_{i,j} ≥ 0 whenever i ≠ j.
Let A be a Markov matrix satisfying the assumptions of Lemma 1, for which L = log(A) does not lie in G. There are several regularizations of L, that is, algorithms that replace L by some G ∈ G that is (nearly) as close to L as possible. Kreinin and Sidelnikova [15] have compared different regularization algorithms for several empirical examples arising in finance, and it appears that they all have similar accuracy. The best approximation must depend on the matrix norm used, but if one considers the physically relevant matrix norm (4) then we prove that the simplest method, called diagonal adjustment in [15], also produces a best possible approximation. We emphasize that although G is a closed convex cone, this does not imply that the best approximation is unique, because the matrix norm (4) is not strictly convex.
Theorem 12 Let L ∈ L and define B ∈ G by

B_{i,j} = max(L_{i,j}, 0) for all j ≠ i,

together with the constraint ∑_{j=1}^n B_{i,j} = 0 for all i. Then

‖L − B‖ = min{‖L − G‖ : G ∈ G}.

Proof It follows from the definition of the matrix norm that we can deal with the matrix rows one at a time. We therefore fix i and put ℓ_j = L_{i,j},

P = {j : j ≠ i and ℓ_j ≥ 0},  N = {j : j ≠ i and ℓ_j < 0},

ℓ_P = ∑_{j∈P} ℓ_j and ℓ_N = −∑_{j∈N} ℓ_j, so that ℓ_i = ℓ_N − ℓ_P. We next put b_j = B_{i,j}, where B is defined as in the statement of the theorem. Thus b_j = ℓ_j for all j ∈ P, b_j = 0 for all j ∈ N and b_i = −ℓ_P. A direct calculation shows that

∑_{j=1}^n |ℓ_j − b_j| = ∑_{j∈N} |ℓ_j| + |ℓ_i − b_i| = ℓ_N + ℓ_N = 2ℓ_N.

Finally, given G ∈ G we define g_j = G_{i,j} for all j. We have ∑_{j=1}^n (ℓ_j − g_j) = 0, so

∑_{j=1}^n |ℓ_j − g_j| = 2 ∑_{j=1}^n max(g_j − ℓ_j, 0) ≥ 2 ∑_{j∈N} (g_j − ℓ_j) ≥ 2ℓ_N,

because g_j ≥ 0 for all j ∈ N. Hence ‖L − G‖ ≥ ‖L − B‖.
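Diagonal adjustment is straightforward to implement: clip the negative off-diagonal entries of L to zero and restore the zero row sums on the diagonal. The sketch below also compares ‖L − B‖ with ‖L − G‖ for randomly sampled generators G, using the maximum absolute row sum as the matrix norm.

```python
# Diagonal adjustment and a Monte Carlo check of its optimality in the
# maximum-absolute-row-sum norm, against randomly sampled generators G.
import numpy as np

def diagonal_adjustment(L):
    B = np.maximum(L, 0.0)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))         # restore zero row sums
    return B

def row_norm(M):
    return np.abs(M).sum(axis=1).max()

rng = np.random.default_rng(3)
L = rng.normal(size=(4, 4))
L -= L.mean(axis=1, keepdims=True)              # L has zero row sums
B = diagonal_adjustment(L)
dist_B = row_norm(L - B)

dists = []
for _ in range(300):
    G = rng.random((4, 4))
    np.fill_diagonal(G, 0.0)
    np.fill_diagonal(G, -G.sum(axis=1))         # random Markov generator
    dists.append(row_norm(L - G))
```

As the theorem asserts, no sampled generator comes closer to L than B does.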
The following exactly soluble example illustrates the use of some of our theorems.

Theorem 15
Let

L_s = ( −(1+s)    1       s    )
      (    s    −(1+s)    1    )
      (    1       s    −(1+s) )

where s ∈ R, and let A_s = e^{L_s}. Then:
1. If s ≥ 0 then A_s is an embeddable Markov matrix.
2. If s < σ ∼ −0.5712 then A s has at least one negative entry.
3. If σ ≤ s < 0 then A s is Markov but not embeddable.
Proof We first note that L_s𝟙 = 0 for every s ∈ R, where 𝟙 is the column vector all of whose entries equal 1, so A_s𝟙 = 𝟙.
Item 1 follows from the fact that L s satisfies all the conditions for the generator of a Markov semigroup.
To prove item 2 we note that L_s = −(1 + s)I + F + sB, where F, B are permutation matrices that commute. Let S be the set of all s such that e^{L_s} is non-negative. If t ∈ S and s ≥ t then

e^{L_s} = e^{L_t} e^{(s−t)(B−I)}

is non-negative, since both factors on the right-hand side are; hence S is a closed interval of the form [σ, ∞). A numerical computation gives σ ∼ −0.5712, which proves item 2.
To prove item 3, suppose that σ ≤ s < 0 and that A_s = e^{B_s} for some Markov generator B_s. Since e^{B_s} = e^{L_s}, we conclude that every eigenvalue of L_s differs from an eigenvalue of B_s by an integral multiple of 2πi. A direct computation shows that for s in the stated range, each non-zero eigenvalue λ of L_s satisfies |arg(λ)| < 5π/6, and the same applies if one adds an integral multiple of 2πi to the eigenvalue. Hence each non-zero eigenvalue λ of B_s satisfies |arg(λ)| < 5π/6 and (5) implies that B_s cannot be a generator.
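The threshold σ can be estimated by bisection on the minimum entry of e^{L_s}. The sketch below takes F to be the 3 × 3 cyclic shift and B = F², a concrete reading of the decomposition L_s = −(1 + s)I + F + sB used in the proof; this precise form of L_s is our assumption.

```python
# Bisection estimate of sigma in Theorem 15, taking F to be the 3 x 3 cyclic
# shift and B = F^2; this concrete form of L_s is our assumption.
import numpy as np
from scipy.linalg import expm

F = np.roll(np.eye(3), 1, axis=1)               # cyclic shift permutation

def L(s):
    return -(1.0 + s) * np.eye(3) + F + s * (F @ F)

def min_entry(s):
    return expm(L(s)).min()

lo, hi = -1.0, 0.0                              # min_entry(-1) < 0 <= min_entry(0)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if min_entry(mid) >= 0.0:
        hi = mid
    else:
        lo = mid
sigma = hi                                      # left endpoint of the interval S
```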

Example 16
The following illustrates the difficulties in dealing with Markov matrices that have negative eigenvalues. If c = 2π/√3 and

B = c ( −1   1   0 )
      (  0  −1   1 )
      (  1   0  −1 )

then the eigenvalues of B are 0, −√3π ± πi. The matrix A = e^B is self-adjoint with eigenvalues 1, −e^{−√3π}, −e^{−√3π}. If one uses Matlab's 'logm' command to compute log(A), one obtains a matrix with complex entries that is not close to B; it might be considered that 'logm' should produce a real logarithm of a real matrix if one exists, but it is not easy to see how to achieve this.
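In code, reading B as c(F − I) with F the 3 × 3 cyclic shift (a concrete form consistent with the stated eigenvalues), one can confirm the spectral claims about A = e^B:

```python
# Example 16 with B read as c(F - I), F the 3 x 3 cyclic shift; this concrete
# form is our assumption, chosen to match the stated eigenvalues.
import numpy as np
from scipy.linalg import expm

c = 2.0 * np.pi / np.sqrt(3.0)
F = np.roll(np.eye(3), 1, axis=1)
B = c * (F - np.eye(3))                         # eigenvalues 0, -sqrt(3)*pi +/- pi*i

A = expm(B)
eigs = np.sort(np.linalg.eigvals(A).real)
target = -np.exp(-np.sqrt(3.0) * np.pi)         # the double negative eigenvalue of A
```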

Example 17
We continue with the above example, but with the more typical choice c = 4. The eigenvalues of B are now 0, −6 ± 3.4641i. Clearly A = e^B is an embeddable Markov matrix. Since 3.4641 > π, B is not the principal logarithm of A; if one computes the principal logarithm log(A) and rounds its entries to four digits, one obtains a matrix which is also a Markov generator. We conclude that A is an embeddable Markov matrix in (at least) two distinct ways.
This is not an instance of a general phenomenon. If one defines the 5 × 5 cyclic matrix B by

B_{r,s} = −4 if r = s, 4 if s = r + 1, 4 if r = 5 and s = 1, 0 otherwise,

then B is a Markov generator with eigenvalues 0, −7.2361 ± 2.3511i, −2.7639 ± 3.8042i. However L = log(exp(B)), with the principal choice of the matrix logarithm, is a cyclic matrix with some negative off-diagonal entries, so it cannot be a Markov generator.
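The same computation can be carried out in scipy rather than Matlab. The sketch below builds the 5 × 5 cyclic generator, forms A = exp(B), and checks that the principal logarithm of A is a different real logarithm of A that fails to be a Markov generator.

```python
# The 5 x 5 cyclic generator of Example 17 and its principal logarithm.
import numpy as np
from scipy.linalg import expm, logm

B = 4.0 * (np.roll(np.eye(5), 1, axis=1) - np.eye(5))   # B_{r,r+1} = 4 cyclically, B_{r,r} = -4

A = expm(B)                                     # an embeddable Markov matrix
L = logm(A).real                                # principal logarithm (real in this case)

off_diag_min = L[~np.eye(5, dtype=bool)].min()  # some off-diagonal entry is negative,
                                                # so L is not a Markov generator
```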