Why the Kemeny Time is a Constant

We present a new fundamental intuition for why the Kemeny feature of a Markov chain is a constant. This new perspective has interesting further implications.


Introduction
The second-named author has long been interested in the properties of the Kemeny constant in Markov chains; see Hunter [1] and citations therein. At the 22nd IWMS Conference in Toronto in 2013 he introduced the Kemeny constant to the first-named author and emphasized especially the lack of a reasoned, plausible, intuitive argument, apart from purely mathematical justifications, for why this feature of a Markov chain should be a constant. Subsequently, in Gialampoukidis, Gustafson, Antoniou [2] we accepted its constancy and established the relationship of the Kemeny time to a maximum mixing time for a two-state Markov chain to achieve a total variation distance no greater than any chosen tolerance ϵ from the final stationary vector π. Then at the 24th IWMS Conference in Haikou in 2015 the two authors of this paper had further discussions of various issues surrounding the Kemeny constant. As a result of those discussions we found a new intuition from which to view the issue. The purpose of this short paper is to present that new perspective and some reasoned and plausible supporting arguments.
The new intuition is to see the well-known basic mean first passage time matrix equation Mπ = Ke as a change-of-basis procedure. Once that is carefully written out, but as M̄π = k, where we call k the Kemeny vector and where M̄ is M with its diagonal deleted, an insistence on viewing M̄ as the change-of-basis matrix from the M̄ column basis to the natural basis, and M̄^{-1} as the change-of-basis matrix from the natural basis to the M̄ column basis, intuits that one must "end up with equally probable pure states". For brevity, we will not survey the literature, that having been provided in [1]. Again for brevity and convenience we will rely upon that paper for notation, basic facts, and previously known interpretations of the Kemeny constant in Markov chains. However, here is some quick background. The pioneering book Kemeny and Snell [3] is the origin of the Kemeny feature: the average mean first passage time from any state i with respect to the equilibrium probability π does not depend on the state i. Here P is the row-stochastic n × n transition matrix for a regular Markov chain with equilibrium (and stationary) probability π. The most relevant pages in [3] are pp. 75-82 and we will refer to those. In particular, the Kemeny feature is embodied in [3, Theorem 4.4.10]: Mα^T = cζ. This we have written above in more modern notation and as in [1] as Mπ = Ke, where e is the column vector e = (1, ..., 1)^T and M is the matrix [m_ij] of first passage times. K is commonly called the Kemeny constant and was shown in [3] to be K = trace(Z), where Z = [I − (P − A)]^{-1} is a resolvent operator and A = lim P^n as n → ∞.
In the ensuing years there arose some disquiet about the meanings of this result, and those concerns are detailed in [1]. A small prize was offered and eventually given to Peter Doyle, who showed that the vector components k_i of Mπ = k satisfy the averaging property k_i = Σ_j p_ij k_j and thus, by the maximum principle, must be constant. However, this is more in the way of proof than some deeper intuition, so the issue remained still somewhat open. An interesting interpretation of K is the mean number of links a random surfer will encounter when navigating a random walk on a Markov web until reaching an unknown destination state. See [1] and [3] for further background information.
We will prefer to present our new intuition with the always-invertible matrix M̄, which is M with its diagonal elements set to zero. This matrix enters also into the proof in [3] and just reduces the Kemeny constant to K − 1. To conclude this introduction, let us note that it is quite elementary to see from the original treatment in [3] that M satisfies

M = P(M − D) + E,   (1.1)

where D = diag(m_11, ..., m_nn) is the diagonal part of M and E = ee^T is the all-ones matrix. Since the mean recurrence times are m_jj = 1/π_j, we have Dπ = e and also Eπ = e, so applying both sides of (1.1) to π gives Mπ = P(Mπ). In other words, Mπ is in the principal eigenspace sp[e] of P and is therefore a constant times e.
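This elementary invariance is easy to check numerically. The following sketch (ours, not from [3]) computes M for the Land of Oz chain via the resolvent Z and the standard formula m_ij = (z_jj − z_ij)/π_j, then verifies equation (1.1) and that Mπ is fixed by P:

```python
import numpy as np

# Land of Oz transition matrix, the running example of Kemeny-Snell [3].
P = np.array([[1/2, 1/4, 1/4],
              [1/2, 0,   1/2],
              [1/4, 1/4, 1/2]])
n = P.shape[0]
e = np.ones(n)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

# Mean first passage times via the resolvent Z = [I - (P - A)]^{-1}.
A = np.outer(e, pi)
Z = np.linalg.inv(np.eye(n) - (P - A))
M = np.empty((n, n))
for i in range(n):
    for j in range(n):
        M[i, j] = (Z[j, j] - Z[i, j]) / pi[j] if i != j else 1 / pi[j]

# Equation (1.1): M = P(M - D) + E, with D = diag(M) and E the all-ones matrix.
D = np.diag(np.diag(M))
E = np.ones((n, n))
assert np.allclose(M, P @ (M - D) + E)

# Hence M pi is invariant under P, i.e. it lies in sp[e]:
k = M @ pi
assert np.allclose(P @ k, k)
print(k)   # all coordinates equal the Kemeny constant K
```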

Why the Kemeny Vector has Equal Coordinates
Our approach starts with no Kemeny constant K at all. As if we were teaching the introductory linear algebra course, we write the invertible equation M̄π = k as the change of basis

π_1 m̄_1 + π_2 m̄_2 + π_3 m̄_3 = k_1 (1, 0, 0)^T + k_2 (0, 1, 0)^T + k_3 (0, 0, 1)^T,   (2.1)

where m̄_1, m̄_2, m̄_3 denote the columns of M̄. We have written (2.1) in three dimensions for clarity but the argument is the same in all dimensions. We call the columns on the left the M̄ column basis and the three columns on the right the natural basis or the pure states, e_1, e_2, e_3 or s_1, s_2, s_3, whatever be your predilection. This is why our intuition said: there is an equiprobable pure state assumption somewhere underlying the fact that k has equal coordinates. Stated another way, in the way physicists like to claim that one should always work in a "coordinate-free" way: π is "just" k but now expressed in the M̄ column basis rather than in the pure state "natural" basis. Stated a third way: the stationary probability π, which is the fundamental measure for the process at equilibrium, is really the equiprobability measure in disguise. This is a strong claim and a new outcome that we will support in the rest of this paper. To begin, our new intuition originated from thinking of (2.1) from the change-of-basis procedure as implemented by Gauss row reduction, e.g. see Lay [4, Section 4.7]. To invert a matrix equation Ax = b one forms the tableau [A | I] and row reduces that to [I | A^{-1}]. This is a special case of a general change-of-basis procedure where P_{C←B} transforms any vector from its representation in the B column basis to its representation in the C column basis. In the special case one can say that x is merely b changed from its representation in the natural basis to its representation in the A column basis.
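The change-of-basis reading of Ax = b can be sketched in a few lines; the matrix A and vector b below are our own illustrative choices, not from [3] or [4]:

```python
import numpy as np

# Change of basis as in the tableau [A|I] -> [I|A^{-1}]: the coordinates of b
# in the A column basis are x = A^{-1} b.  A and b here are illustrative only.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
b = np.array([3., 5., 3.])

x = np.linalg.solve(A, b)        # b rewritten in the A column basis
assert np.allclose(A @ x, b)     # x1*col1 + x2*col2 + x3*col3 reproduces b
print(x)
```

Here b, written in the natural basis, has coordinates x in the A column basis, exactly as π is "just" k re-expressed in the M̄ column basis.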
We will illustrate this in the next section by explicitly carrying it out for the Land of Oz example of [3].
Of course the change-of-basis matrix inversion perspective applied to M̄π = k, with π = M̄^{-1}k, is just a special case of representing any vector b, written as usual in the natural basis, by changing its representation to x = A^{-1}b, where x is now its coordinates in the A column basis. The key here is that π is a very special equilibrium probability measure.

The Change-of-Basis Picture
Because our new intuition arose out of insisting that we view the remarkable Kemeny-Snell equation Mπ = Ke as a change-of-basis statement, we elaborate by specific example here. A good elementary reference is the book [4, Section 4.7, pp. 239-242]. We may immediately get into the spirit by doing the key example used throughout [3], the Land of Oz example

P = ( 1/2  1/4  1/4
      1/2   0   1/2
      1/4  1/4  1/2 ).   (3.1)

We know that Pe = e and P^T π = π = (π_1, π_2, π_3)^T = (2/5, 1/5, 2/5)^T, and as computed in [3] via the resolvent operator Z, the mean first passage time matrix M is

M = ( 5/2   4   10/3
      8/3   5   8/3
      10/3  4   5/2 ).   (3.2)

To calculate M̄^{-1} by the Gauss procedure, one row reduces the tableau [M̄ | I] to [I | M̄^{-1}]. This is a special case of the more general change-of-basis in which one drives the tableau [M̄ | k] to [I | π], carrying the right side, a multiple of the equiprobable measure e = (1, 1, 1)^T, back to the stationary measure (π_1, π_2, π_3)^T. Generally, for the n × n case where M̄π = (K − 1)e, we make the right side of measure one by dividing both sides by n(K − 1), a factor which can be absorbed by M̄ and its inverse. One easily calculates that K − 1 = 32/15 for the Land of Oz chain, so the normalizing factor is n(K − 1) = 32/5.
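These Land of Oz numbers can be checked directly; the following sketch verifies M̄π = (K − 1)e with K − 1 = 32/15, and K = 47/15 via the eigenvalue formula K = 1 + Σ_{i>1} (1 − λ_i)^{-1}:

```python
import numpy as np

# Land of Oz data as displayed in (3.1) and (3.2).
P = np.array([[1/2, 1/4, 1/4],
              [1/2, 0,   1/2],
              [1/4, 1/4, 1/2]])
pi = np.array([2/5, 1/5, 2/5])
M = np.array([[5/2,  4, 10/3],
              [8/3,  5, 8/3 ],
              [10/3, 4, 5/2 ]])

Mbar = M - np.diag(np.diag(M))          # M with its diagonal deleted
k = Mbar @ pi
assert np.allclose(k, 32/15)            # K - 1 = 32/15, same in every coordinate

lam = np.linalg.eigvals(P)              # eigenvalues 1, 1/4, -1/4
K = 1 + sum(1/(1 - l) for l in lam if not np.isclose(l, 1))
assert np.isclose(K, 47/15)
```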
While this change-of-basis picture brings to the fore that the right side of (2.1) is actually a representation of the Kemeny-Snell vector k in terms of the pure states s_1, s_2, s_3, it does not prove that k_1 = k_2 = k_3. That fact was already established in [3] and has been shown in other ways; see [1]. We gave a very simple proof at the end of Section 1. Here is another one, which we wish to mention in order to bring us to the point we emphasized at the end of Section 2: π is a very special vector measure-theoretically.
Just apply P^n to both sides of the change-of-basis equation (2.1) and go to the limit as n → ∞. The left side is invariant, since P^n(M̄π) = M̄π as we showed in Section 1. The right side converges to

lim_{n→∞} P^n k = Ak = eπ^T k = (π^T k)e,   (3.4)

so the left side M̄π = k is a constant multiple (K − 1) of e. Here we have used the fact that lim P^n = A in the Kemeny-Snell notation [3] is the rank-one oblique projection given by A = eπ^T. For the Perron convergence theory see Hunter [5, Chapter 7] and Horn and Johnson [6, Chapter 8] and especially their wonderful Lemma 8.2.7 on pages 497-498. In their notation lim P^n is L = xy^T = eπ^T here, and we sometimes like to go further, see Gustafson [7, p. 206], to regard the normalized version xy^T/(y^T x) as the oblique projection onto the span of x from the direction perpendicular to y. That L^2 = L projection view provides a strictly geometrical new view of K: the amplitude of the oblique rank-one projection L(M̄π) onto sp[e].
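The Perron limit and the projection view can both be seen numerically on the Land of Oz chain; a small sketch:

```python
import numpy as np

# The Perron limit: P^n -> L = e pi^T, a rank-one oblique projection (L^2 = L).
# Applying L to k = Mbar pi recovers the constant multiple (K - 1) of e.
P = np.array([[1/2, 1/4, 1/4],
              [1/2, 0,   1/2],
              [1/4, 1/4, 1/2]])
pi = np.array([2/5, 1/5, 2/5])
e = np.ones(3)

L = np.outer(e, pi)                                    # e pi^T
assert np.allclose(np.linalg.matrix_power(P, 50), L)   # P^n converges to L
assert np.allclose(L @ L, L)                           # L is a projection

M = np.array([[5/2,  4, 10/3],
              [8/3,  5, 8/3 ],
              [10/3, 4, 5/2 ]])
Mbar = M - np.diag(np.diag(M))
assert np.allclose(L @ (Mbar @ pi), (32/15) * e)       # amplitude K - 1 on sp[e]
```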
Thus the change-of-basis equation (2.1), by the invariance of its left side M̄π under the Markov chain's transition matrix iterates P^m as shown in equation (1.1), has led us to the fact (3.4) that the Markov process must "end up with equally probable pure states". The latter are the essence of the at-first seemingly harmless eigenvector e. The fact this occurs rests principally upon the stationary probability π.

Discussion
Our new perspective raises a number of interesting implications. Some of these may be worthy of further study but we can only mention a couple of them here in this brief paper.
Why equi-probability? The reply: Kemeny-Snell's [3] remarkable equation Mπ = Ke is only a statement at equilibrium. Everyone knows that one can start a regular Markov chain with any initial probability and iterate until you get to the limit distribution π. This is generalized in the famous Perron Theorem, e.g. [6, p. 499], and the point is that the limit of P^n as n → ∞ is L = xy^T = eπ^T in our case. L is a rank-one oblique projector and in fact it itself represents an independent trials process: a transition matrix with Perron eigenvector Le = e and stationary equilibrium probability L^T π = πe^T π = π.

An MCMC implication? The widely acclaimed Markov Chain Monte Carlo method, see e.g. Antoniou, Christidis, Gustafson [8], assumes you can find an initial distribution which after a sufficient number of iterations is close to the invariant distribution π which is believed to represent the physical process being modeled. One then performs Monte Carlo simulations on the latter. Our interpretation in [8] is that the iterations generate sufficient mixing so that the subsequent sampling stage represents adequately the regular probability distribution of the application. We go further [8] and hope that there exists a deeper underlying physical dynamics. Here we say: do your Monte Carlo equiprobably.
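A minimal sketch of the "iterate until mixed, then sample" idea on the Land of Oz chain (our illustration, with an arbitrary burn-in length, not the method of [8]): after a burn-in, empirical state frequencies approach π.

```python
import numpy as np

# Simulate the Land of Oz chain; burn-in length and sample size are arbitrary
# illustrative choices.
rng = np.random.default_rng(0)
P = np.array([[1/2, 1/4, 1/4],
              [1/2, 0,   1/2],
              [1/4, 1/4, 1/2]])

state, burn_in, samples = 0, 100, 100_000
counts = np.zeros(3)
for t in range(burn_in + samples):
    state = rng.choice(3, p=P[state])   # one step of the chain
    if t >= burn_in:
        counts[state] += 1

print(counts / samples)   # close to pi = (0.4, 0.2, 0.4)
```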
Next, we mention that we became curious about how Kemeny-Snell [3] somehow were able to move effortlessly between P and P^T, or if you wish between M and M^T, viz. between [3, Theorems 4.4.9 and 4.4.10]. The technical secret seems to lie in the second term of equation (1.1) in our introduction. Namely, the symmetric operator D − E has null space sp{π}. One could go a bit further intuitively and assert that D represents the probability of the self-loops of the pure states s_1, s_2, s_3 and E represents random equiprobable noise, and the two cancel on the stationary distribution π.
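The symmetry and null space claims are immediate to verify for the Land of Oz chain, since D = diag(m_11, m_22, m_33) = diag(1/π_1, 1/π_2, 1/π_3):

```python
import numpy as np

# D - E is symmetric and annihilates pi: (D - E) pi = e - e = 0.
pi = np.array([2/5, 1/5, 2/5])
D = np.diag(1 / pi)        # diagonal of M: mean recurrence times 1/pi_i
E = np.ones((3, 3))        # all-ones matrix

S = D - E
assert np.allclose(S, S.T)        # symmetric
assert np.allclose(S @ pi, 0)     # null space contains pi
```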
We may ask how our column bases (the columns of M̄) behave as the Markov process progresses. That is, we expect the Kemeny time K to 'decrease' as we step forward in the chain P, P^2, ···. To make this precise, recall K = 1 + Σ_{i=2}^{n} (1 − λ_i)^{-1}, and let us make the additional assumption that P is primitive so that |λ_i| < 1 for all i > 1. The Kemeny-Snell equation M_m π = k_m = K_m e at the mth step of the chain has Kemeny time K_m = 1 + Σ_{i=2}^{n} (1 − λ_i^m)^{-1}, which converges down to K_L = n as the |λ_i|^m all go to zero. The column bases of M_m converge to those of M_L, which for n = 3 are

M_L = ( 1/π_1  1/π_2  1/π_3
        1/π_1  1/π_2  1/π_3
        1/π_1  1/π_2  1/π_3 ),

that is, the constant columns (1/π_j)e, j = 1, 2, 3, of the independent trials process L.
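The decrease of K_m can be watched directly for the Land of Oz chain, whose subdominant eigenvalues are ±1/4:

```python
import numpy as np

# Kemeny time of the iterated chain: K_m = 1 + sum_{i>1} (1 - lambda_i^m)^{-1}
# converges down to K_L = n for primitive P.
P = np.array([[1/2, 1/4, 1/4],
              [1/2, 0,   1/2],
              [1/4, 1/4, 1/2]])
lam = [l for l in np.linalg.eigvals(P) if not np.isclose(l, 1)]

for m in (1, 2, 4, 8, 16):
    K_m = 1 + sum(1 / (1 - l**m) for l in lam)
    print(m, np.real(K_m))
# K_1 = 47/15 = 3.133..., and K_m -> n = 3 as m grows
```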

Conclusions
In the recent paper [1] and before that, it has been emphasized that there was still needed a better reasoned, plausible, intuitive argument, apart from purely mathematical justifications, for why the Kemeny feature of a Markov chain should be constant. Here we have shared with you a new intuition, reasoned arguments supporting that intuition, and a perhaps unexpected plausible fundamental outcome. The intuition was to insist on viewing the remarkable Kemeny-Snell first passage time equation M̄π = k as an M̄-column basis representation of k, then wonder why the new coordinates k_1, k_2, k_3 on the natural basis side need to be equal. Of course that perspective holds for arbitrary dimension n. The resulting reasoned arguments followed closely the original treatment in [3] and, by the way, completely avoided the machineries of operator resolvents or generalized group inverses. The other perspective in our reasoned arguments was the Perron Theorem and especially its limit oblique projection eπ^T. The plausible outcome was that the Markov chain in the limit must converge to equally probable pure states. This equiprobability measure is hidden within the equilibrium measure π. In important applications it is postulated to represent a deeper underlying chaos [8].