Accurate calculations of stationary distributions and mean first passage times in Markov renewal processes and Markov chains

This article describes an accurate procedure for computing the mean first passage times of a finite irreducible Markov chain and a Markov renewal process. The method is a refinement of the procedure of Kohlas (Zeitschrift für Operations Research, 30, 197-207, 1986). The technique is numerically stable in that it involves no subtractions. Algebraic expressions for the special cases of one, two, three and four states are derived. A consequence of the procedure is that the stationary distribution of the embedded Markov chain does not need to be derived in advance but can be found accurately from the derived mean first passage times. MatLab is utilized to carry out the computations on some test problems from the literature.


Introduction
A variety of techniques have been developed for computing stationary distributions and mean first passage times (MFPTs) of Markov chains (MCs). In this paper we focus primarily on the accurate computation of these key properties based upon the well-known state reduction procedure of Grassmann, Taksar and Heyman (the GTH algorithm) ([4]), or the equivalent Sheskin ([15]) procedure, which were developed primarily for the computation of the stationary distributions of irreducible MCs. The stability of the procedure results from the observation that no subtractions need be carried out. This is discussed in Section 2. Kohlas [13] developed a related procedure for the computation of the MFPTs, based mainly on the computation of mean times to absorption, by showing that the computations are more naturally carried out by regarding the underlying model as a Markov renewal process (MRP) rather than as a MC. We delve into these procedures in more detail after first summarizing, in Section 3, the key properties of MRPs. In Section 4 we work through the ideas of the Kohlas algorithm and give a general procedure for computing the mean passage times between any two states, rather than considering mean times to absorption as in Kohlas [13]. We explore in some detail, in Sections 5 to 8, procedures for the special cases of finite state spaces of one, two, three and four states, obtaining expressions for the MFPTs, some of which have previously been given in the literature. We see that the Kohlas procedure is not ideal for the global derivation of the MFPT matrix, but in Section 9 we develop a modification of the Kohlas procedure, an Extended GTH procedure, that leads to expressions for the MFPTs using effectively the same calculations as in the GTH algorithm.
In the final section we explore some calculations, using MatLab, of the key properties for some ill-conditioned transition matrices that have previously been considered as test problems in the literature.
Note that, due to space considerations, we do not explore other procedures for finding MFPTs. We leave this for a further paper, where we compare the procedure of this paper with the well-known approaches of Kemeny and Snell [12] and others, for example in Meyer [14], Stewart [16], Hunter [9] and Dayar and Akar [3], as well as some new perturbation procedures under development by the author. This also enables us to make additional comparisons using the test problems of this paper to validate the stability of this new procedure.

Computation of the stationary probabilities
Let $P^{(N)} = [p_{ij}^{(N)}]$ be the transition matrix of the $N$-state MC, and let $p_{N-1}^{(N)(r)T} = (p_{N,1}^{(N)}, p_{N,2}^{(N)}, \ldots, p_{N,N-1}^{(N)})$ and $p_{N-1}^{(N)(c)T} = (p_{1,N}^{(N)}, \ldots, p_{N-1,N}^{(N)})$ be $1 \times (N-1)$ row vectors, with $r$ and $c$ denoting, respectively, row and column elements of the probabilities, with the superscript $N$ denoting that they are from the $N$-th row or $N$-th column of the $P^{(N)}$ matrix and the subscript $N-1$ that they are vectors of length $N-1$. Similarly, we use the superscript $N$ in the sub-matrix $Q_{N-1}^{(N)}$ to denote that it is a sub-matrix of the transition matrix $P^{(N)}$ associated with an $N$-state MC, and the subscript $N-1$ to denote that the matrix is of order $(N-1) \times (N-1)$.
Let $e^{(N)T} = (1, 1, \ldots, 1)$ be a $1 \times N$ vector and $I_N$ be the $N \times N$ identity matrix.
In the procedures that we consider for finding stationary distributions and the MFPTs of MCs, we start with an $N$-state MC $\{X_k^{(N)}, k \ge 0\}$ and reduce the state space by one state at a time. Once we get to two states we expand the state space one state at a time until we return to the final set of $N$ states. We concentrate on the sequential state reduction process at first, starting with the $N$ states $1, 2, \ldots, N$ and initially reducing the state space to $1, 2, \ldots, N-1$. For simplicity, when there is no ambiguity, we write $p_{ij}^{(N)}$ simply as $p_{ij}$. Note that $Q_{N-1}^{(N)}$ is not stochastic, since $Q_{N-1}^{(N)} e^{(N-1)} + p_{N-1}^{(N)(c)} = e^{(N-1)}$, and that $p_{N-1}^{(N)(r)T} e^{(N-1)} + p_{NN}^{(N)} = 1$.
Partition $P^{(N)}$ as
$$P^{(N)} = \begin{pmatrix} Q_{N-1}^{(N)} & p_{N-1}^{(N)(c)} \\ p_{N-1}^{(N)(r)T} & p_{NN}^{(N)} \end{pmatrix},$$
and let $S(N) = \sum_{j=1}^{N-1} p_{Nj}^{(N)} = 1 - p_{NN}^{(N)}$, which is strictly positive since the MC is irreducible. Define
$$P^{(N-1)} = Q_{N-1}^{(N)} + \frac{1}{S(N)}\, p_{N-1}^{(N)(c)}\, p_{N-1}^{(N)(r)T}, \qquad (8)$$
so that, elementwise, for $1 \le i, j \le N-1$,
$$p_{ij}^{(N-1)} = p_{ij}^{(N)} + \frac{p_{iN}^{(N)}\, p_{Nj}^{(N)}}{S(N)}. \qquad (9)$$
Note that $P^{(N-1)}$ is a stochastic matrix with $N-1$ states, since its row sums are $\sum_{j=1}^{N-1} p_{ij}^{(N)} + p_{iN}^{(N)} \sum_{j=1}^{N-1} p_{Nj}^{(N)}/S(N) = (1 - p_{iN}^{(N)}) + p_{iN}^{(N)} = 1$. Let $\{X_k^{(N-1)}, k \ge 0\}$ be the MC that has $P^{(N-1)}$ as its transition matrix. Note also that, since $S(N)$ is defined as a sum of probabilities, the computation of the quantities $S(N)$ can be carried out without any subtraction.
We can interpret the transition probabilities $p_{ij}^{(N-1)}$ of the MC $\{X_k^{(N-1)}, k \ge 0\}$ on the state space $S_{N-1}$ as the transition probabilities from state $i$ to $j$ of the MC $\{X_k^{(N)}, k \ge 0\}$ on $S_N$ restricted to $S_{N-1}$, i.e. the "censored" MC (see also pg. 17, Bini, Latouche and Meini [2]). For $(i, j) \in S_{N-1} \times S_{N-1}$ it is possible to jump directly from $i$ to $j$ with probability $p_{ij}^{(N)}$.
Alternatively, it is possible to jump from $i$ to $j$ via state $N$, being held at state $N$ for $t$ steps $(t = 0, 1, 2, \ldots)$ followed by a one-step jump from $N$ to $j$, with probability $p_{iN}^{(N)} \big(p_{NN}^{(N)}\big)^t\, p_{Nj}^{(N)}$. Summing over $t$ gives $p_{iN}^{(N)} p_{Nj}^{(N)}\big/\big(1 - p_{NN}^{(N)}\big) = p_{iN}^{(N)} p_{Nj}^{(N)}\big/S(N)$, leading to the general expression (9) for $p_{ij}^{(N-1)}$.
Note that there is a connection between equation (9) and Schur complementation. This is discussed in Bini, Latouche and Meini (pg. 17, [2]).
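The censoring step (9) is easily verified numerically. The following Python sketch (illustrative matrix; the paper's own computations use MatLab) applies one reduction step and confirms that the resulting matrix is stochastic:

```python
import numpy as np

# One GTH reduction step (censoring): removing state N from an N-state chain
# gives p_ij' = p_ij + p_iN * p_Nj / S(N), with S(N) formed as a sum of
# probabilities, so no subtraction is needed.

def censor_last_state(P):
    """Censor the last state of the stochastic matrix P."""
    N = P.shape[0]
    S = P[N - 1, :N - 1].sum()               # S(N), subtraction-free
    return P[:N - 1, :N - 1] + np.outer(P[:N - 1, N - 1], P[N - 1, :N - 1]) / S

P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.6, 0.3, 0.1]])
P2 = censor_last_state(P)
print(P2)
print(P2.sum(axis=1))                        # rows sum to 1: still stochastic
```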
Further, combining the stationarity equations for $\pi^{(N)T} = (\pi_1^{(N)}, \ldots, \pi_N^{(N)})$ with (8) yields
$$\pi_i^{(N-1)} = \frac{\pi_i^{(N)}}{\sum_{j=1}^{N-1} \pi_j^{(N)}}, \quad i = 1, \ldots, N-1. \qquad (11)$$
Thus we have reduced the state space from $N$ to $N-1$ with the resulting MC $\{X_k^{(N-1)}, k \ge 0\}$ having a stationary distribution $\{\pi_i^{(N-1)}\}$ that is a scaled version of the first $N-1$ components of the stationary distribution of the MC $\{X_k^{(N)}, k \ge 0\}$ with $N$ states, as given by (11).
Let us define the stationary probability vector of the MC $\{X_k^{(N)}, k \ge 0\}$ as $\pi^T = (\pi_1, \pi_2, \ldots, \pi_N) = \pi^{(N)T}$. As we continue to reduce the state space to $S_n$ $(n = 1, 2, \ldots, N-1)$ it is clear, from an extension of (11), that
$$\pi^{(n)T} = (\pi_1^{(n)}, \pi_2^{(n)}, \ldots, \pi_n^{(n)}) = k_n(\pi_1, \pi_2, \ldots, \pi_n), \text{ where } k_n = 1\Big/\sum_{j=1}^{n} \pi_j, \qquad (12)$$
i.e. the stationary probabilities of the MC $\{X_k^{(n)}, k \ge 0\}$ on $S_n$ are scaled versions of the first $n$ stationary probabilities of the original MC. Let us now consider expanding the state space from $S_{N-1}$ to $S_N$. Note that, from (11), $\pi_i^{(N)} = c_{N-1}\, \pi_i^{(N-1)}$ for $i = 1, \ldots, N-1$, i.e. the first $N-1$ terms of $\pi_i^{(N)}$ are a multiple of $\pi_i^{(N-1)}$. Further, from the stationarity equations and the definition of $S(N)$,
$$\pi_N^{(N)} = \frac{1}{S(N)} \sum_{i=1}^{N-1} \pi_i^{(N)} p_{iN}^{(N)} = \frac{c_{N-1}}{S(N)} \sum_{i=1}^{N-1} \pi_i^{(N-1)} p_{iN}^{(N)}.$$
The constant $c_{N-1}$ is determined from the fact that $\sum_{i=1}^{N} \pi_i^{(N)} = 1$, and the stationary distribution for the MC on $S_N$ can be determined from the MC on $S_{N-1}$, leading to a procedure for determining the stationary distribution on the expanded state space.

Compute, for $n = 2, 3, \ldots, N$, the quantities $r_n = \Big(\sum_{j=1}^{n-1} r_j\, p_{jn}^{(n)}\Big)\Big/S(n)$, starting with $r_1 = 1$.
This is a formal derivation of the Grassmann, Taksar and Heyman (GTH) algorithm ([4]), or the equivalent Sheskin state reduction procedure ([15]), for finding the stationary distribution of an irreducible finite MC. The procedure is numerically stable and accurate in that no subtractions need be carried out.
Note that, as a result of equation (12), the stationary distribution for the derived MC $\{X_k^{(n)}, k \ge 0\}$ with transition probability matrix $P^{(n)} = [p_{ij}^{(n)}]$ on the reduced state space $S_n$ is given as $\pi_i^{(n)} = r_i\Big/\sum_{j=1}^{n} r_j$, $i = 1, 2, \ldots, n$.
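The complete reduction and back-substitution can be summarized in a short sketch. The following Python rendering is illustrative (the paper's computations use MatLab, and the test matrix here is arbitrary); every divisor $S(n)$ is formed as a sum of off-diagonal row entries, so no subtraction occurs:

```python
import numpy as np

def gth_stationary(P):
    """Stationary distribution of an irreducible MC by GTH state reduction."""
    A = np.array(P, dtype=float)
    N = A.shape[0]
    # Reduction sweep: censor states N, N-1, ..., 2 in turn.
    for n in range(N - 1, 0, -1):
        S = A[n, :n].sum()                   # S(n+1) computed as a sum
        A[:n, :n] += np.outer(A[:n, n], A[n, :n]) / S
    # Back-substitution: r_1 = 1, r_n = (sum_{j<n} r_j p_jn^{(n)}) / S(n).
    r = np.zeros(N)
    r[0] = 1.0
    for n in range(1, N):
        r[n] = r[:n] @ A[:n, n] / A[n, :n].sum()
    return r / r.sum()

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
pi = gth_stationary(P)
print(pi)                                    # satisfies pi @ P == pi
```

Note that only the $n$-th row and $n$-th column of each $P^{(n)}$ are needed for the back-substitution, exactly as observed in the text.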

Markov renewal processes
We give a brief review of some of the key properties of Markov renewal processes. We refer the reader to Section 2.2 of [9] where the following general concepts and notation are presented.
We consider a MRP $\{(X_n, T_n), n \ge 0\}$, with state space $S = \{1, 2, \ldots, N\}$ and semi-Markov kernel $Q(t) = [Q_{ij}(t)]$, where $Q_{ij}(t) = P\{X_{n+1} = j, T_{n+1} - T_n \le t \mid X_n = i\}$; $\{X_n\}$, $(n \ge 0)$, tracks the states successively visited and $T_n$ is the time of the $n$-th transition.
Observe that $Q_{ij}(+\infty) = P\{X_{n+1} = j \mid X_n = i\} = p_{ij}$, so that $\{X_n\}$ is a MC, the embedded MC, with transition matrix $P = [p_{ij}]$. Further, $F_{ij}(t) = Q_{ij}(t)/p_{ij}$ (when $p_{ij} > 0$) is the distribution function of the "holding time" $T_{n+1} - T_n$ in state $X_n$ until the transition into state $X_{n+1}$, given that the MRP makes a transition from $X_n$ to $X_{n+1}$.

Computation of the Mean First Passage Times
We seek a computational procedure that will enable us to calculate all the MFPTs in a MC.
As Kohlas [13] pointed out in his pioneering paper, it is more natural to consider the Markov renewal setting. Let $m_{ij}$ be the MFPT from state $i$ to state $j$ and let $M_n = [m_{ij}]$ be the matrix of MFPTs of an MRP $\{(X_k^{(n)}, T_k^{(n)}), k \ge 0\}$ with $n$ states, transition matrix $P^{(n)}$ and mean holding time vector $\mu^{(n)}$. From (20), the matrix $M_n$ satisfies
$$m_{ij} = \mu_i^{(n)} + \sum_{k \ne j} p_{ik}^{(n)} m_{kj}, \quad 1 \le i, j \le n. \qquad (22)$$
We partition $M_n$, $P^{(n)}$ and $\mu^{(n)}$ by separating out state $n$, so that block multiplication of (22) yields, for $1 \le i, j \le n-1$,
$$\text{(1,1) Block: } m_{ij} = \mu_i^{(n)} + \sum_{k=1,\,k \ne j}^{n-1} p_{ik}^{(n)} m_{kj} + p_{in}^{(n)} m_{nj}, \qquad (24)$$
$$\text{(1,2) Block: } m_{in} = \mu_i^{(n)} + \sum_{k=1}^{n-1} p_{ik}^{(n)} m_{kn}, \qquad (25)$$
$$\text{(2,1) Block: } m_{nj} = \mu_n^{(n)} + \sum_{k=1,\,k \ne j}^{n-1} p_{nk}^{(n)} m_{kj} + p_{nn}^{(n)} m_{nj}, \qquad (26)$$
$$\text{(2,2) Block: } m_{nn} = \mu_n^{(n)} + \sum_{k=1}^{n-1} p_{nk}^{(n)} m_{kn}. \qquad (27)$$
From (26), since $1 - p_{nn}^{(n)} = S(n)$,
$$m_{nj} = \frac{1}{S(n)}\Big(\mu_n^{(n)} + \sum_{k=1,\,k \ne j}^{n-1} p_{nk}^{(n)} m_{kj}\Big). \qquad (28)$$
Substituting (28) into (24) gives
$$m_{ij} = \Big(\mu_i^{(n)} + \frac{p_{in}^{(n)} \mu_n^{(n)}}{S(n)}\Big) + \sum_{k=1,\,k \ne j}^{n-1} \Big(p_{ik}^{(n)} + \frac{p_{in}^{(n)} p_{nk}^{(n)}}{S(n)}\Big) m_{kj}. \qquad (29)$$
Thus, using the expression for $P^{(n-1)}$ as derived earlier (cf. equation (8)), this is of similar form to the $n$-state case as given by (22) but with the state space reduced to $n-1$ and a changed form for $\mu^{(n-1)}$.
This leads to the following structural result.
If $M_n$ is the matrix of MFPTs of the MRP with $n$ states, transition matrix $P^{(n)} = [p_{ij}^{(n)}]$ and vector of mean holding times $\mu^{(n)T} = (\mu_1^{(n)}, \ldots, \mu_{n-1}^{(n)}, \mu_n^{(n)})$, then $M_n$ satisfies equation (22). Then, under the state reduction process as carried out under the GTH algorithm, the upper $(n-1) \times (n-1)$ block of $M_n$ satisfies (22) with $n$ replaced by $n-1$, where the transition probabilities $p_{ij}^{(n-1)}$ are given by
$$p_{ij}^{(n-1)} = p_{ij}^{(n)} + \frac{p_{in}^{(n)} p_{nj}^{(n)}}{S(n)}, \qquad (31)$$
and the elements of the mean holding time vector $\mu^{(n-1)T} = (\mu_1^{(n-1)}, \ldots, \mu_{n-1}^{(n-1)})$ are given by
$$\mu_i^{(n-1)} = \mu_i^{(n)} + \frac{p_{in}^{(n)} \mu_n^{(n)}}{S(n)}. \qquad (32) \quad \square$$
Note that equation (31) is identical in format to the transition probabilities as used in the GTH algorithm, with the derivation given by equations (8) and (9) with $N$ replaced by $n$. The elemental expressions (32) for the mean holding times follow from (29).
This means that we can reduce the state space by successive steps, retaining the same MFPTs for the reduced state space in the upper block of $M_n$, although the mean holding times in the states are modified, as given by equation (32).
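This invariance can be checked directly: censoring the last state with the GTH update for the transition probabilities and the corresponding update for the holding times leaves the MFPTs among the surviving states unchanged. A Python sketch with illustrative numbers (not from the paper):

```python
import numpy as np

def mfpt(P, mu):
    """Solve m_ij = mu_i + sum_{k != j} p_ik m_kj directly, column by column."""
    n = P.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        Pj = P.copy()
        Pj[:, j] = 0.0                       # forbid passing through target j
        M[:, j] = np.linalg.solve(np.eye(n) - Pj, mu)
    return M

P3 = np.array([[0.1, 0.6, 0.3],
               [0.5, 0.2, 0.3],
               [0.3, 0.3, 0.4]])
mu3 = np.array([1.0, 2.0, 0.5])

# GTH censoring of state 3, with the modified holding times
# mu_i <- mu_i + p_i3 * mu_3 / S(3).
S = P3[2, :2].sum()
P2 = P3[:2, :2] + np.outer(P3[:2, 2], P3[2, :2]) / S
mu2 = mu3[:2] + P3[:2, 2] * mu3[2] / S

print(mfpt(P3, mu3)[:2, :2])
print(mfpt(P2, mu2))                         # identical 2x2 block
```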
Upon increasing the state space from $S_{n-1}$ to $S_n$, as in the GTH algorithm, we wish to find an expression for the elements of $M_n$, given $M_{n-1}$. From the properties of MCs and MRPs the following results for the mean recurrence times $m_{nn}$ can be deduced. Let $\{\pi_i^{(n)}\}$ be the stationary distribution of the embedded MC $\{X_k^{(n)}, k \ge 0\}$ and let $\lambda^{(n)} = \sum_{i=1}^{n} \pi_i^{(n)} \mu_i^{(n)}$ be the mean asymptotic increment. Then
$$m_{ii} = \frac{\lambda^{(n)}}{\pi_i^{(n)}}, \quad i = 1, 2, \ldots, n, \qquad (33)$$
so that, in particular, $m_{nn} = \lambda^{(n)}/\pi_n^{(n)}$ (34), and, for $n = 1$, $m_{11} = \mu_1^{(1)}$, since $\pi_1^{(1)} = 1$ and $\lambda^{(1)} = \mu_1^{(1)}$.

Theorem 4: The mean recurrence time $m_{nn}$ satisfies
$$m_{nn} = \mu_n^{(n)} + \sum_{i=1}^{n-1} p_{ni}^{(n)} m_{in}. \qquad (36)$$
Proof: Equation (36) follows from an elemental expression of equation (27). The result for $n = 1$ follows from equation (34), as $\pi_1^{(1)} = 1$ and hence $\lambda^{(1)} = \mu_1^{(1)}$. $\square$
Theorem 4 gives an additional useful computational procedure for $m_{nn}$. While it does require knowledge of the $m_{in}$ for $i = 1, 2, \ldots, n-1$, it avoids the calculation of the stationary distribution, which is an advantage in the Markov renewal setting. The computation of the $m_{in}$ for $i < n$ requires some additional computational effort, as we shall see shortly.
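Theorem 4 can be illustrated numerically. The following Python fragment (illustrative data) recovers $m_{nn}$ from the first passage times into state $n$, with no stationary distribution computed:

```python
import numpy as np

def mfpt_column(P, mu, j):
    """Mean first passage times into state j (the j-th entry is m_jj)."""
    Pj = P.copy()
    Pj[:, j] = 0.0                           # forbid passing through target j
    return np.linalg.solve(np.eye(P.shape[0]) - Pj, mu)

P = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4],
              [0.5, 0.1, 0.4]])
mu = np.array([1.0, 1.5, 2.0])

m_col = mfpt_column(P, mu, 2)                # m_13, m_23, m_33
m_33 = mu[2] + P[2, :2] @ m_col[:2]          # Theorem 4 identity
print(m_33, m_col[2])                        # the two values agree
```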
With knowledge of the elements of $M_{n-1}$, expressions for the elements of $m_{n-1}^{(n)(r)T} = (m_{n1}, m_{n2}, \ldots, m_{n,n-1})$ can easily be deduced directly from equation (28).
From (25), $(I_{n-1} - Q_{n-1}^{(n)})\, m_{n-1}^{(n)(c)} = \mu_{n-1}^{(n)}$. Even though $(I_{n-1} - Q_{n-1}^{(n)})^{-1}$ exists, we use the reduction procedure used above, eliminating $m_{n-1,n}$ from $m_{n-1}^{(n)(c)T}$ and replacing it in the expressions for the elements $m_{1n}, m_{2n}, \ldots, m_{n-2,n}$.
The following theorem enables us to develop expressions for the $m_{in}$ for $i < n$. Proof: (a) Expression (39) is equation (38) in element form, using equations (42) and (44).
In the calculations expressed by equation (47) it would be advantageous if we could express R(i,n) as a sum of terms, with no subtraction, as was the case for the S(n).
The state reduction process can continue to a single state, $n = 1$, where, from (30), $m_{11} = \mu_1^{(1)}$ (see Section 5 for a further discussion of this result). We can, however, finish the state reduction process when we are left with $n = 2$ states. From (22) with $n = 2$ we obtain equations that are easily solved, using the observations that $1 - p_{11}^{(2)} = p_{12}^{(2)}$ and $1 - p_{22}^{(2)} = p_{21}^{(2)}$, to yield
$$M_2 = \begin{pmatrix} \mu_1^{(2)} + \dfrac{p_{12}^{(2)} \mu_2^{(2)}}{p_{21}^{(2)}} & \dfrac{\mu_1^{(2)}}{p_{12}^{(2)}} \\[2mm] \dfrac{\mu_2^{(2)}}{p_{21}^{(2)}} & \mu_2^{(2)} + \dfrac{p_{21}^{(2)} \mu_1^{(2)}}{p_{12}^{(2)}} \end{pmatrix}. \qquad (49)$$
Note, from equation (32) with $n = 2$, $\mu_1^{(1)} = \mu_1^{(2)} + p_{12}^{(2)} \mu_2^{(2)}/S(2) = \mu_1^{(2)} + p_{12}^{(2)} \mu_2^{(2)}/p_{21}^{(2)}$, leading to the expression for $m_{11}$ in (49). Following the state reduction process to $S_2$ we now need to increase the state space to $S_N$ through the inclusion of successive additional states.
From the process outlined in Theorem 2, the state space can now be expanded step by step, where the $M_2$ matrix is given by (49).
Thus the process can be progressed from M n-1 to M n using Theorems 4, 5 and 6.

Special case N = 2
We consider the MRP on the state space $S_2 = \{1, 2\}$ with embedded irreducible MC having transition matrix $P^{(2)} = [p_{ij}^{(2)}]$ and mean state holding times $\mu_i^{(2)}$, $i = 1, 2$.
For the $N = 2$ state situation, we solved the matrix equation (22) with $n = 2$ in Section 4, in element form, leading to equation (49) for $M_2$.
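The element form of $M_2$ can be confirmed against a direct solution of (22). The Python sketch below uses illustrative numbers; the expressions $m_{12} = \mu_1/p_{12}$, $m_{21} = \mu_2/p_{21}$, $m_{11} = \mu_1 + p_{12}\mu_2/p_{21}$ and $m_{22} = \mu_2 + p_{21}\mu_1/p_{12}$ are the standard two-state forms consistent with (49):

```python
import numpy as np

p12, p21 = 0.4, 0.7
mu = np.array([1.0, 3.0])
P = np.array([[1 - p12, p12],
              [p21, 1 - p21]])

# Closed-form two-state MFPT matrix.
M2 = np.array([[mu[0] + p12 * mu[1] / p21, mu[0] / p12],
               [mu[1] / p21, mu[1] + p21 * mu[0] / p12]])

# Direct solve of m_ij = mu_i + sum_{k != j} p_ik m_kj for comparison.
M = np.zeros((2, 2))
for j in range(2):
    Pj = P.copy()
    Pj[:, j] = 0.0                           # forbid passing through target j
    M[:, j] = np.linalg.solve(np.eye(2) - Pj, mu)

print(M2)
print(M)                                     # the two matrices agree
```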

Special case N = 3
We consider the MRP on the state space $S_3 = \{1, 2, 3\}$ with embedded irreducible MC having transition matrix $P^{(3)} = [p_{ij}^{(3)}]$ and mean state holding times $\mu_i^{(3)}$, $i = 1, 2, 3$. Working through the state reduction from $S_3$ to $S_2$ and back, and using appropriate simplifications, we can express all the elemental expressions of the $M_3$ matrix for the MFPTs in terms of the $p_{ij}^{(3)}$ and the $\mu_i^{(3)}$, leading to (53). Note that the expression for $m_{33}$ can also be deduced from either equation (34) or (36). Note also that, from the properties of MRPs, $m_{ii} = \lambda^{(3)}/\pi_i^{(3)}$. The diagonal elements of (53) are consistent with this observation since, using (52), the mean asymptotic increment is given by $\lambda^{(3)} = \sum_{i=1}^{3} \pi_i^{(3)} \mu_i^{(3)}$.
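The MRP identity for the diagonal elements is easy to check numerically. The following Python sketch (illustrative example, not from the paper) confirms that each mean recurrence time satisfies $m_{ii} = \lambda/\pi_i$:

```python
import numpy as np

P = np.array([[0.0, 0.7, 0.3],
              [0.4, 0.1, 0.5],
              [0.5, 0.5, 0.0]])
mu = np.array([2.0, 1.0, 3.0])

# Stationary distribution of the embedded chain (left eigenvector for 1).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()
lam = pi @ mu                                # mean asymptotic increment

# Mean recurrence times m_ii from the first-passage equations.
m_rec = np.zeros(3)
for i in range(3):
    Pi = P.copy()
    Pi[:, i] = 0.0                           # forbid passing through target i
    m_rec[i] = np.linalg.solve(np.eye(3) - Pi, mu)[i]

print(m_rec)
print(lam / pi)                              # agrees with m_rec
```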

Algorithms for the computation of the matrix of MFPTs
From the special cases considered in Sections 5 to 8, we have typically extended the calculations for the elements of $M_n$ from the matrix $M_{n-1}$ by appending the elements $m_{in}$ $(i = 1, \ldots, n-1)$, $m_{nn}$ and $m_{nj}$ $(j = 1, \ldots, n-1)$. However, by exploring these calculations in more depth, it is apparent that a recursive process for computing the $m_{ij}$ elements of the original $M_N$ matrix can be constructed through three different procedures corresponding to the three cases $i < j$, $i = j$ and $i > j$. We can separate these three cases as follows, using the notation developed earlier, viz. $p_{ij}^{(n)}$ and $\mu_i^{(n)}$ as given by equations (31) and (32), $q_{ij}^{(t-1,n)}$ with $q_{ij}^{(n-1,n)} = p_{ij}^{(n)}$ as given by equations (41) and (42), $\nu_i^{(t-1,n)}$ as given by equations (43) and (44), and $R(i,n)$ as given by equation (46).
Proof: The expressions in (a) are for the indices $i > j$, in (b) for $i = j$, and in (c) for $i < j$. Equation (55) follows from (37) and (56) from (49); equations (57) and (58) follow from (36); equations (59) and (60) follow from (47). $\square$ The general procedure described by Theorem 7 is difficult to program, for a general state space, using MatLab. In particular, the computation of the $m_{ij}$ when $j > i$ demands additional computations. Typically the elements of $P^{(n)} = [p_{ij}^{(n)}]$, an $n \times n$ stochastic matrix, are easily found by the GTH algorithm. However, computing the $q_{ij}^{(t,n)}$ requires first identifying the starting elements of $Q_n^{(n-1)} = [q_{ij}^{(n-1,n)}]$, i.e. the elements of the sub-stochastic matrix formed from the first $n-1$ rows and $n-1$ columns of $P^{(n)}$. The sequence of GTH reduction procedures leads from $Q_n^{(n-1)}$ eventually to $Q_n^{(2)} = [q_{ij}^{(2,n)}]_{2 \times 2}$ and finally to $Q_n^{(1)} = [q_{ij}^{(1,n)}]_{1 \times 1}$. Because of the truncation of $P^{(n)}$, for each $n$, to start with $Q_n^{(n-1)}$, this process has to be implemented for each value of $n = N, N-1, \ldots, 2$. Thus the GTH algorithmic reduction has to be carried out a number of times, as in the following grid: $P^{(N)} \to Q_N^{(N-1)} \to \cdots \to Q_N^{(1)}$, $P^{(N-1)} \to Q_{N-1}^{(N-2)} \to \cdots \to Q_{N-1}^{(1)}$, down to $P^{(3)} \to Q_3^{(2)} \to Q_3^{(1)}$, and finally $P^{(2)} \to Q_2^{(1)}$. From the initial $N-1$ GTH algorithmic procedures for the $p_{ij}^{(n)}$, followed by $N-1$ matrix truncations to start with the initial $q_{ij}^{(n-1,n)}$, there are a further $(N-2) + (N-3) + \cdots + 1$ reductions, for a total of $N(N-1)/2$ separate GTH procedures. For the original GTH procedure for finding the stationary probabilities we only needed to retain the $p_{in}^{(n)}$ and $p_{nj}^{(n)}$ boundary terms of the $P^{(n)}$ matrices, whereas for the MFPTs we need to retain additional elements of the $P^{(n)}$, leading to the $Q_i^{(n)}$ matrices.
In the computation of the $R(i,n) = 1 - q_{ii}^{(1,n)}$ expressions we have no easy technique to ensure that no subtraction is required. This is because the elements in the last row of the $Q_i^{(n)}$ matrix do not sum to 1, as they do in the $P^{(n)}$ matrices.
The $q_{ik}^{(t,n)}$ terms only arise in the computation of the MFPTs $m_{ij}$ when $i < j$, whereas the $p_{ij}^{(n)}$ are all that is needed to compute the $m_{ij}$ when $i > j$, and these probabilities are also all that is needed to compute the mean holding times $\mu_i^{(n)}$. The above observations lead to the following general technique for finding all the elements of $M$ for a given MRP. Since we effectively use the computations of the GTH procedure, we call this the "Extended GTH" (EGTH) algorithm.

EGTH Algorithm
Step 1(i): Starting with $P^{(N)}$, carry out the GTH algorithm by calculating successively, for $n = N, N-1, \ldots, 2$, the matrices $P^{(n-1)}$ with elements given by (31), where $S(n) = \sum_{j=1}^{n-1} p_{nj}^{(n)}$. (Note that we only have to retain the $p_{in}^{(n)}$ and $p_{nj}^{(n)}$, i.e. the $n$-th row and $n$-th column of $P^{(n)}$, for $n = 2, \ldots, N$, as in the GTH algorithm.) Step 1 Comment: The steps that follow are based on the observation that by starting with $P^{(N)}$, which we define as $P^{(N)(1)}$, we are able to find expressions for the first column of $M$, giving the MFPTs to state 1 from all the other states. Successively we permute the elements of $P^{(N)}$ so as to do this for each of the states $2, \ldots, N$. For state 2 we can do this by moving the first column of $P^{(N)(1)}$ to after the $N$-th column, followed by moving the first row to the last row, to obtain a new transition matrix $P^{(N)(2)}$.
There are a variety of ways we can carry out this permutation. Here are three such ways: (i) construct $P^{(N)(2)}$ directly by reordering the rows and columns of $P^{(N)(1)}$ under the index permutation $(2, 3, \ldots, N, 1)$; (ii) set $P^{(N)(2)}(\mathrm{mod}(row + N - 2, N) + 1, \mathrm{mod}(col + N - 2, N) + 1) = P^{(N)(1)}(row, col)$, which can be done in stages if necessary, with, say, the row shift followed by the column shift; (iii) in MatLab, use the "circshift" operator with a shift of $[-1, -1]$ applied to the rows and columns. Step 2: For $k = 2, 3, 4, \ldots, N-1, N$: (i) Repeat Step 1(i) but with $P^{(N)(k)}$ in place of $P^{(N)(1)}$. (Comment: This step leads to the appropriate $m_{ik}$ and $m_{ki}$ elements.) Another key observation is that the EGTH algorithm, as outlined above, retains the calculation accuracy, with no subtractions being involved.
Note also that we do not compute the stationary probabilities in determining the MFPTs. In the MC setting they would typically be found using the basic GTH algorithm. However, in this setting the stationary probabilities can also be found directly as the inverses of the $m_{ii}$, alleviating the necessity of any prior calculation. For example, in the MC setting the initial holding times are $\mu_i^{(N)} = 1$ for all $i$, and we have from the first step of the EGTH algorithm an alternative derivation of $\pi_i$ as $1/m_{ii}$. Once again, no subtraction operation need be performed.
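This byproduct is easily demonstrated in the MC setting. A Python sketch (illustrative chain, not one of the test problems):

```python
import numpy as np

# With unit holding times (the MC setting) the stationary probabilities fall
# out of the MFPTs as pi_i = 1/m_ii, so no separate GTH pass for pi is needed.

P = np.array([[0.3, 0.4, 0.3],
              [0.2, 0.5, 0.3],
              [0.5, 0.2, 0.3]])
N = P.shape[0]
mu = np.ones(N)                              # unit holding times: a MC

# Mean recurrence times from the first-passage equations.
m_diag = np.zeros(N)
for i in range(N):
    Pi = P.copy()
    Pi[:, i] = 0.0                           # forbid passing through target i
    m_diag[i] = np.linalg.solve(np.eye(N) - Pi, mu)[i]

pi_from_m = 1.0 / m_diag
print(pi_from_m)                             # stationary: sums to 1
print(pi_from_m @ P)                         # equals pi_from_m
```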

The Test Problems
The following test problems were introduced by Harrod and Plemmons ([5]) and have been considered by others in different contexts. They were initially introduced as poorly conditioned examples for computing the stationary distribution of the underlying irreducible MC. While the dimensions of the state space are relatively small, the test problems lead to some computational difficulties. TP1: (As modified by Heyman and Reeves ([8]).) The errors for this problem are given in Tables 1 and 2 below. Heyman and Reeves ([8]) also considered four different techniques for M, with their most accurate procedure based upon a state reduction procedure. We do not go into details of the procedures that they used, but they compared the accuracy of the procedures by evaluating the number of accurate digits. The most accurate procedure in [7] was based upon an LU factorization and normalization related to a state reduction procedure; in [8] the comparable procedure was also a state reduction procedure. The double precision result was used as the "true" result and the single precision result as the "computed" result. Considering the results of Heyman and O'Leary [7] and Heyman and Reeves [8], no procedure that they considered shows any improvement over the results of this paper: Heyman and O'Leary have values between 6 and 7 for their favoured algorithm, while the favoured algorithm of Heyman and Reeves appears to have values in the range of 7.2 to 7.4. Thus, the algorithm given in this paper produces results that have not been achieved in the past.
Using the computations for the MFPT matrix $M$, as calculated using the EGTH algorithm, we compute the elements of the stationary distribution as the reciprocals of the diagonal elements. We calculate the following errors for the stationary distribution, both in single and double precision: the minimum absolute error $\min_{1 \le j \le m} |\hat{\pi}_j - \pi_j|$ and the maximum absolute error $\max_{1 \le j \le m} |\hat{\pi}_j - \pi_j|$, where the $\hat{\pi}_j$ are given by the calculations. We also compute the overall residual error when the stationary distribution is computed using the standard GTH algorithm. These calculations are given in Tables 4 and 5. As can be expected, the errors for computing the stationary distributions using the well-established GTH algorithm are very comparable with those of the EGTH procedure of this paper, with the EGTH procedure giving only a marginal reduction but in some isolated cases a slightly improved result.
In order to make comparisons with the stationary distribution calculations that appear in the literature, we also compare the errors between performing the calculations for both the EGTH and the original GTH algorithms in double and single precision, as follows.
Let $\pi_S$ and $\pi_D$ be the stationary distributions as computed under single and double precision. As used in Harrod and Plemmons ([5]), the residual error is, in effect, the residual error computed under single precision, i.e. $\|\pi_S^T - \pi_S^T P\|_1$. The relative error is computed as $\sum_{j=1}^{m} |\pi_{S,j} - \pi_{D,j}|$. We also compute the minimum absolute error $\min_{1 \le j \le m} |\pi_{S,j} - \pi_{D,j}|$ and the maximum absolute error $\max_{1 \le j \le m} |\pi_{S,j} - \pi_{D,j}|$.
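These error measures are straightforward to compute. The following Python sketch uses hypothetical data (not one of the paper's test problems), with the "single precision" vector simulated by rounding the double precision one to float32:

```python
import numpy as np

P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])

# Double precision stationary vector (left eigenvector for eigenvalue 1).
w, v = np.linalg.eig(P.T)
pi_D = np.real(v[:, np.argmin(np.abs(w - 1))])
pi_D /= pi_D.sum()
# Simulated single precision result.
pi_S = pi_D.astype(np.float32).astype(np.float64)

residual = np.abs(pi_S @ P - pi_S).sum()     # || pi_S^T - pi_S^T P ||_1
relative = np.abs(pi_S - pi_D).sum()
min_abs = np.abs(pi_S - pi_D).min()
max_abs = np.abs(pi_S - pi_D).max()
print(residual, relative, min_abs, max_abs)
```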
Harrod and Plemmons ([5]) gave the exact solution for the stationary probabilities using some direct methods; however, the transition matrix above is not irreducible, and consequently some of the entries of the stationary probability vector should have been zero. Heyman ([6]) commented that the GTH algorithm determines that states 1, 3, 4 and 8 are transient, although this is also apparent from an examination of the transition graph. Heyman showed that the GTH algorithm, under single precision, on the above transition matrix produces 6 significant decimal digits (while some alternatives produce only 5), with GTHRE = 4.5 E-08. These were compared with a range of procedures considered by Harrod and Plemmons ([5]) that yielded MINRE = 6.9 E-08, MAXRE = 3.7 E-08.
As was done in Heyman and Reeves ([8]), we discard the transient states. With the state space S = {2, 5, 6, 7, 9, 10} we consider the irreducible transition matrix as stated. Using MatLab single precision, our residual error (1.2080E-08) was an improvement over those stated above.

TP2:
Harrod and Plemmons ([5]) stated the exact solution for the stationary distribution to 9 significant figures and showed that the smallest relative error they could achieve was of the order of 9.9 E-07. Heyman ([6]) claimed that the GTH algorithm produces 6 significant decimal digits with a residual error of 9.64 E-08, comparable to the figure of 8.56 E-08 that we have achieved. Under double precision we have been able to achieve 14 significant figures.
TP3: Harrod and Plemmons ([5]) give the exact solution for the stationary distribution to 9 significant figures and, using a variety of procedures, obtain a smallest residual error of the order of 3.0 E-08. Heyman, using the GTH algorithm, produces 6 significant decimal digits, with alternatives giving only 1 or 2. He obtains a residual error of 3.1 E-08 for the GTH algorithm. We have improved this to 14 significant figures, once again with improved accuracy, achieving a residual error of 1.4 E-17.

TP4: In Harrod and Plemmons [5] the original matrices were not stochastic. Heyman ([6]) corrected this to ensure stochasticity and showed that the stationary distributions of the MCs with these four transition matrices are all the same. He showed that the residual error for the GTH algorithm, under single precision, is 1.38 E-07 for all four test problems, whereas we achieve accuracy within the range 5.49 E-08 to 7.96 E-08.
All in all, the extraction of the stationary distribution as a byproduct of our EGTH algorithm gives accuracy comparable to that previously obtained.
In a sequel to this paper we explore some other techniques for computing the MFPTs for these matrices. The results of this paper are required as a benchmark in order to carry out comparisons of accuracy of the alternative procedures.