The Markov Chain Tree Theorem in commutative semirings and the State Reduction Algorithm in commutative semifields

We extend the Markov Chain Tree Theorem to general commutative semirings, and we generalize the State Reduction Algorithm to general commutative semifields. This leads to a new universal algorithm, whose prototype is the State Reduction Algorithm, which computes the Markov chain tree vector of a stochastic matrix.


Introduction
The Markov Chain Tree Theorem states that each (row) stochastic matrix A has a left eigenvector x, such that each entry $x_i$ is the sum of the weights of all spanning trees rooted at i and with edges directed towards i. This vector has all components positive if A is irreducible, and it can be 0 in the general case. It can be computed by means of the State Reduction Algorithm formulated independently by Sheskin [27] and Grassman, Taksar and Heyman [12]; see also Sonin [28] for more information on this.
In the present paper, our main goal is to generalize this algorithm to matrices over commutative semifields, inspired by the ideas of Litvinov et al. [16,18,20].
To this end, let us first mention tropical mathematics [1,5,13], a relatively new branch of mathematics developed over idempotent semirings, of which the tropical semifield, also known as the max algebra, is the most useful example. In one of its equivalent realizations (see Bapat [3]), the max algebra is just the set of nonnegative real numbers equipped with the two operations a ⊕ b = max(a, b) and a • b = ab; these operations extend to matrices and vectors in the usual way. Much of the initial development of max algebra was motivated by applications in scheduling and discrete event systems [1,13]. While this original motivation remains, the area is also a fertile source of problems for specialists in combinatorics and other areas of pure mathematics. See, in particular, [17,21].
According to Litvinov and Maslov [16], tropical mathematics (also called idempotent mathematics due to the idempotency law a ⊕ a = a) can be developed in parallel with traditional mathematics, so that many useful constructions and results can be translated from traditional mathematics to a tropical/idempotent "shadow" and back. Applying this principle to algorithms gives rise to the programme of making some algorithms universal, so that they work in traditional mathematics, tropical mathematics, and over a wider class of semirings.
There is a well-known universal algorithm, which derives from Gaussian elimination without pivoting. This universal version of Gaussian elimination was developed by Backhouse and Carré [2]; see also Gondran [11] and Rote [26]. Based on it, Litvinov et al. [16,18,20] formulated a wider concept of a universal algorithm, and discovered some new universal versions of Gaussian elimination for Toeplitz matrices and other special kinds of matrices. The semiring version of the State Reduction Algorithm found in the present paper can be seen as a new development in the framework of those ideas.
The present paper is also a sequel to our earlier work [4], where the Markov Chain Tree Theorem was proved over the max algebra. To this end, we remark that the max-algebraic analogue of probability is known and has been studied, e.g., by Puhalskii [25] as idempotent probability. Our work is also related to the papers of Minoux [23,24]. However, the Markov Chain Tree Theorem established in the present paper is different from the theorem of [23], which establishes a relation between the spanning tree vector and bi-determinants of associated matrices of higher dimension. Also, no algorithms for computing the spanning tree vector are offered in [23,24].
Let us mention that the proof of the universal Markov Chain Tree Theorem given in the present paper generalizes a proof that can be found in a technical report of Fenner and Westerdale [9]. In our development of the universal State Reduction Algorithm we build upon the above mentioned State Reduction Algorithm of [27,12,28]. The work of Sonin [28] appears to be particularly useful here, since it provides most of the necessary elements of the proof. We recommend both the works of Fenner-Westerdale [9] and Sonin [28] to the reader as well-written explanations of the Markov Chain Tree Theorem and the State Reduction Algorithm in the setting of classical probability. The proofs we give here are predominantly based on combining the arguments of these earlier works and verifying that they generalize to the abstract setting of commutative semirings and semifields.
When specialized to the max algebra, the universal State Reduction Algorithm provides a method for computing the maximal weight of a spanning tree in a directed network. Of course, the problems of minimal and maximal spanning trees in graphs, particularly undirected graphs, have attracted much attention [14]. Recall that in the case of directed graphs, the best known algorithm is the one suggested by Edmonds [8] and, independently, Chu and Liu [6]. This algorithm has some similarities with the universal State Reduction Algorithm (when the latter is specialized to the max algebra), but we will not give any further details on this.
Let us also mention that the State Reduction algorithm can be seen as a special case of the stochastic complements technique, see Meyer [22].
The rest of the paper is organized as follows. In Section 2 we obtain the universal version of the Markov Chain Tree Theorem. In Section 3 we formulate the universal State Reduction Algorithm and provide a part of its proof. Section 4 is devoted to the proof of a particularly technical lemma (basically following Sonin [28]).

Markov Chain Tree Theorem in Semirings
A semiring (S, +, •) consists of a set S equipped with two (abstract) binary operations +, •. The generalized addition, +, is commutative and associative and has an identity element 0. The generalized multiplication • is associative and distributes over + on both the left and the right. There also exists a multiplicative identity element 1, and the additive identity is absorbing in the sense that a • 0 = 0 for all a ∈ S. We shall only be concerned with commutative semirings, in which • is also commutative. Next we list some well-known examples of semirings where Theorem 2.6 is valid.
Example 2.1. Classical nonnegative algebra, which consists of the set of all nonnegative real numbers together with the usual addition and multiplication, is a commutative (but not idempotent) semiring.
Example 2.5. Given a semiring S with idempotent addition (a + a = a), equipped with the canonical partial order a ≤ b iff a + b = b, an Interval Semiring I(S) (see [19]) can be constructed as follows. I(S) consists of order-intervals $[a_1, a_2]$ (where $a_1 \le a_2$) and is equipped with the operations + and • defined by
$$[a_1, a_2] + [b_1, b_2] = [a_1 + b_1,\ a_2 + b_2], \qquad [a_1, a_2] \cdot [b_1, b_2] = [a_1 \cdot b_1,\ a_2 \cdot b_2].$$
We define addition A + B and multiplication AB of matrices over S in the standard fashion. Given a matrix $A \in S^{n\times n}$, the weighted directed graph D(A) is defined in exactly the same way as for matrices with real entries.
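To make the abstract definitions concrete, the following Python sketch represents a commutative semiring by its two operations and identities and multiplies matrices "in the standard fashion" over it. The names `Semiring`, `CLASSICAL`, `MAX_TIMES` and `matmul` are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring described by its operations and identities."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Classical nonnegative algebra: usual addition and multiplication.
CLASSICAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

# Max-times realization of the max algebra: a (+) b = max(a, b), a (.) b = ab.
MAX_TIMES = Semiring(max, lambda a, b: a * b, 0, 1)


def matmul(S, A, B):
    """Matrix product over S: c_ij is the S-sum over k of a_ik (.) b_kj."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[S.zero] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = S.zero
            for k in range(inner):
                acc = S.add(acc, S.mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
P_CLASSICAL = matmul(CLASSICAL, A, B)  # [[19, 22], [43, 50]]
P_MAX = matmul(MAX_TIMES, A, B)        # [[14, 16], [28, 32]]
```

The same `matmul` code serves both arithmetics; only the semiring object changes, which is the sense in which an algorithm is called universal here.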
Let us proceed with some graph-theoretic definitions. By a (spanning) i-tree we mean a (directed) spanning tree rooted at i and directed towards i. A functional graph (V, E) is a directed graph in which each vertex has exactly one outgoing edge. Such graphs are referred to as "sunflower graphs" in [13]. It is easy to see that a functional graph in general contains several cycles, which do not intersect each other. A functional graph having only one cycle that goes through i and is not a loop (that is, not an edge of the form (i, i)) will be called i-unicyclic.
Let T be a subgraph of D(A). Define its weight π(T) as the product of the weights of the edges in T. We will use this definition only in the cases when T is a directed spanning tree or a unicyclic functional graph. By the total weight of a set of graphs (for example, the set of all i-trees or all i-unicyclic functional graphs) we mean the sum of the weights of all graphs in the set.
We now present a semiring version of the Markov Chain Tree Theorem. Its proof is a semiring extension of the proof in Fenner-Westerdale [9]. See also Freȋdlin-Wentzell [10], Lemma 3.2, and Sonin [28], Lemma 6.
We denote the set of all i-trees in D(A) by $\mathcal{T}_i$. The Rooted Spanning Tree (RST) vector $w \in S^n$ is defined by
$$w_i = \sum_{T \in \mathcal{T}_i} \pi(T), \qquad i = 1, \dots, n. \tag{1}$$
In general, the set $\mathcal{T}_i$ may be empty, and then $w_i = 0$. In the usual algebra and in the max algebra, w is positive when A is irreducible.
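The definition of the RST vector can be evaluated directly for very small matrices by enumerating all i-trees. The sketch below does this by brute force; it is only an illustration of the definition (exponential in n), the function names are hypothetical, and nodes are numbered from 0 rather than from 1.

```python
from fractions import Fraction
from itertools import product


def reaches(v, root, succ, n):
    """Follow outgoing edges from v for at most n steps; True if root is hit."""
    for _ in range(n):
        if v == root:
            return True
        v = succ[v]
    return v == root


def rst_vector(A, add, mul, zero, one):
    """w[i] = semiring sum of the weights of all spanning trees directed to i."""
    n = len(A)
    w = []
    for root in range(n):
        others = [v for v in range(n) if v != root]
        total = zero
        # Every non-root node chooses exactly one outgoing edge.
        for targets in product(range(n), repeat=len(others)):
            succ = dict(zip(others, targets))
            if any(v == t for v, t in succ.items()):
                continue  # loops cannot be tree edges
            if not all(reaches(v, root, succ, n) for v in others):
                continue  # some node is trapped on a cycle
            weight = one
            for v, t in succ.items():
                weight = mul(weight, A[v][t])
            total = add(total, weight)
        w.append(total)
    return w


F = Fraction
A = [[F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 4), F(3, 4), F(0)]]
W_CLASSICAL = rst_vector(A, lambda a, b: a + b, lambda a, b: a * b, F(0), F(1))
W_MAX = rst_vector(A, max, lambda a, b: a * b, F(0), F(1))
```

For this sample stochastic matrix the classical RST vector is (1/2, 7/8, 5/6), while in the max-times algebra each entry becomes the largest weight of a single tree, giving (1/4, 3/8, 1/3).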
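A matrix $A \in S^{n\times n}$ is said to be stochastic if
$$\sum_{j=1}^{n} a_{ij} = 1 \qquad \text{for all } i = 1, \dots, n.$$
Theorem 2.6 (Markov Chain Tree Theorem in Semirings). Let $A \in S^{n\times n}$ and let w be defined by (1). Then for each i = 1, …, n, we have
$$w_i \cdot \sum_{j \neq i} a_{ij} = \sum_{j \neq i} w_j a_{ji}. \tag{2}$$
If A is stochastic then
$$w_i = \sum_{j=1}^{n} w_j a_{ji}, \qquad i = 1, \dots, n. \tag{3}$$
Proof: To prove (2) we will argue that both sides are equal to the total weight of all i-unicyclic functional digraphs, which we further denote by π[i].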
On the one hand, every combination of an i-tree and an edge (i, j) with j ≠ i results in an i-unicyclic functional digraph. Indeed, the resulting digraph is clearly functional; moreover, every cycle in it has to contain the edge (i, j), so there is only one cycle. Hence, using the distributivity, the left hand side of (2) can be represented as a sum of weights of some i-unicyclic functional digraphs. As each i-unicyclic functional digraph is uniquely determined by an i-tree and an edge (i, j) where j ≠ i, the above mentioned sum contains all weights of such digraphs, with no repetitions. Thus the left hand side of (2) is equal to π[i].
On the other hand, every combination of a j-tree and an edge (j, i) with j ≠ i also results in an i-unicyclic functional digraph (since every cycle in the resulting functional graph has to contain the edge (j, i)). Hence, using the distributivity, the right hand side of (2) can also be represented as a sum of weights of some i-unicyclic functional digraphs. If we take an i-unicyclic functional graph then i may have several incoming edges, but only one of them belongs to the (unique) cycle. Hence there is only one j such that there is an edge (j, i) and a path from i to j so that a j-tree exists. Thus an i-unicyclic functional digraph is uniquely determined by a j-tree and an edge (j, i) where j ≠ i, and the right hand side of (2) is also equal to π[i].
Equation (3) results from adding $w_i a_{ii}$ to both sides of (2) for each i, and using the stochasticity of A.
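As a small sanity check of the balance identity between the weight leaving a state and the weight entering it, the smallest cases can be written out by hand. The display below is a worked illustration under the definitions above, not part of the original proof; it lists the RST vector for n = 2, verifies the identity for i = 1, and lists the three 1-trees for n = 3.

```latex
% n = 2: the only 1-tree is the edge (2,1), the only 2-tree is the edge (1,2).
\[
  w = (w_1, w_2) = (a_{21},\, a_{12}), \qquad
  w_1 \cdot a_{12} = a_{21} a_{12} = w_2 \cdot a_{21}.
\]
% n = 3: the 1-trees are {(2,1),(3,1)}, {(2,1),(3,2)} and {(3,1),(2,3)}.
\[
  w_1 = a_{21} a_{31} + a_{21} a_{32} + a_{23} a_{31}.
\]
```

Only commutativity of multiplication is used in the n = 2 check, which is why the identity survives the passage from real numbers to an arbitrary commutative semiring.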
Example 2.7. Consider the Boolean algebra over the two-element set $U = \{\sigma_1, \sigma_2\}$. Observe that the 3 × 3 matrix $A_1$ [...]. Referring to (1), it is readily determined that the rooted spanning tree vector for $A_1$ is the zero vector.
On the other hand, for the stochastic matrix $A_2$ [...] we find that the rooted spanning tree vector is $(0,\ \sigma_2,\ \sigma_2)$. We note in passing that for the matrix $A_2$, the techniques of [15] can be used to show that the vectors $(1,\ 1,\ \sigma_2)$ and $(0,\ \sigma_2,\ 1)$ form a basis for the left eigenspace of $A_2$ corresponding to the eigenvalue 1.

State reduction algorithm in semifields
In this section, we describe an algorithm for computing the spanning tree vector w in antinegative semifields. We first recall some necessary definitions. A semiring (S, +, •) is called a semifield if every nonzero element of S has a multiplicative inverse. The semirings in Examples 2.1 and 2.2 are commutative semifields.
A semifield S is antinegative if a + b = 0 implies that a = b = 0 for a, b ∈ S.
Algorithm 3.1 below provides a universal version of the state reduction algorithm. Following [20] we describe this in a language derived from MATLAB. The basic arithmetic operations here are a + b, ab and inv(a) := $a^{-1}$. For simplicity, we avoid making too much use of MATLAB vectorisation here. However, we exploit the functions "sum" and "prod", which respectively sum up and take the product of all the entries of a given vector.
Algorithm 3.1. State reduction algorithm for antinegative semifields.
Input: An n × n matrix A with entries a(i, j) and at least one non-zero off-diagonal entry in each row; A is also used to store intermediate results of the computation process.
Phase 1: State Reduction
In order for the algorithm to work, it is necessary to ensure that the elements $s_i$ are non-zero at each step. To this end, we assume that the matrix A has at least one non-zero off-diagonal element in each row. Formally, for 1 ≤ i ≤ n, there exists some j ≠ i such that $a_{ij} \neq 0$. A simple induction using the next lemma then shows that $s_i$ will be non-zero at each stage of Algorithm 3.1, Phase 1.
Lemma 3.2. Let $A \in S^{n\times n}$ have at least one non-zero off-diagonal element in each row. Let $s = \sum_{j=2}^{n} a_{1j}$ and define $\hat{A} \in S^{n\times n}$ as follows: (i) $\hat{a}_{ij} = a_{ij} + s^{-1} a_{i1} a_{1j}$ for i, j ≥ 2; (ii) $\hat{a}_{ij} = a_{ij}$ otherwise.
Then for 2 ≤ i ≤ n, there is some j ≥ 2, j ≠ i, with $\hat{a}_{ij} \neq 0$.
Proof: Let i ≥ 2 be given. By assumption, there is some j ≠ i with $a_{ij} \neq 0$. If j ≥ 2, (i) combined with the antinegativity of S implies that $\hat{a}_{ij} \neq 0$. If not, then it follows that $a_{i1} \neq 0$ and again by assumption there is some j with $a_{1j} \neq 0$. As S is antinegative, it is immediate from (i) that $\hat{a}_{ij} \neq 0$.
Remark 3.3. Phase 1 is, in fact, similar to the universal LDM decomposition described in [20], with algebraic inversion operations instead of algebraic closure (Kleene star). Algorithm 3.1 requires $n^3/3 + O(n^2)$ operations of addition, $2n^3/3 + O(n^2)$ operations of multiplication and n − 1 operations of taking inverse.
The operation performed in Phase 1 can be seen as a state reduction, where a selected state of the network is suppressed, while the weights of the edges not using that state are modified. Recall that in the usual arithmetic and if A is stochastic, the weights of edges are transition probabilities.
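For instance, on the first step of Phase 1 we suppress state 1 and obtain a network with weights
$$a^{(1)}_{kl} = a_{kl} + s_1^{-1} a_{k1} a_{1l}, \qquad k, l > 1.$$
We inductively define
$$a^{(i)}_{kl} = a^{(i-1)}_{kl} + s_i^{-1}\, a^{(i-1)}_{ki}\, a^{(i-1)}_{il}, \qquad k, l > i,$$
for i = 1, …, n − 1, where $A^{(0)} = A$ and $s_i = \sum_{l > i} a^{(i-1)}_{il}$. So $A^{(i)} = (a^{(i)}_{kl})$ is the matrix of the reduced network obtained on the ith step of Phase 1, by forgetting the states 1, …, i.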
Denote by $w^{(i)}$ the spanning tree vector of the ith reduced Markov model (with n − i states), with $w^{(0)} = w$. This vector has components $w^{(i)}_{i+1}, \dots, w^{(i)}_n$. We will further use the following nontrivial statement, whose proof (following Sonin [28]) will be recalled below in Section 4.
Lemma 3.4. For all i < k we have $s_i \cdot w^{(i)}_k = w^{(i-1)}_k$.
Let us show (modulo this Lemma) that Algorithm 3.1 actually works.
Theorem 3.5. Let S be a commutative antinegative semifield and $A \in S^{n\times n}$ be such that every row contains at least one nonzero off-diagonal element. Then Algorithm 3.1 computes the spanning tree vector of A. If A is stochastic then this vector is a left eigenvector of A.
Proof: We will prove this theorem by induction, analyzing Phase 2 of Algorithm 3.1.
To begin, we show that after initializing $w(n) = s_{n-1}$ and performing one step of Phase 2, (w(n − 1), w(n)) is the spanning tree vector of the reduced matrix $A^{(n-2)}$ on the two states n − 1, n. It is easy to check that in this case we obtain $w(n-1) = a^{(n-2)}_{n,n-1}$ and $w(n) = s_{n-1} = a^{(n-2)}_{n-1,n}$, so that (w(n − 1), w(n)) is indeed the spanning tree vector of $A^{(n-2)}$ as claimed.
For the inductive step, let us make the following assertion: if we initialize $w(n) = s_{i+1} \cdot \ldots \cdot s_{n-1}$ at the beginning of Phase 2, then the vector (w(i + 1), …, w(n)) obtained on the (n − i − 1)th step of Phase 2 is the spanning tree vector $(w^{(i)}_{i+1}, \dots, w^{(i)}_n)$ of the ith reduced network, with the states 1, …, i suppressed.
We have to show that, given the above assertion, if we initialize $w(n) = s_i \cdot \ldots \cdot s_{n-1}$ then the vector (w(i), …, w(n)) obtained on the (n − i)th step of Phase 2 is the spanning tree vector of the (i − 1)th reduced network.
Indeed, we have $s_i \cdot w^{(i)}_k = w^{(i-1)}_k$ for all k > i by Lemma 3.4. Combining this with the induction hypothesis and our choice of w(n), we see that the components w(i + 1), …, w(n) are indeed equal to the entries $w^{(i-1)}_{i+1}, \dots, w^{(i-1)}_n$ of the spanning tree vector. Next, observe that Algorithm 3.1 computes w(i) using w(i + 1), …, w(n) via the balance equation:
$$w(i) \cdot s_i = \sum_{k=i+1}^{n} w(k)\, a^{(i-1)}_{ki}.$$
As $s_i$ is invertible, it now follows from Theorem 2.6 that $w(i) = w^{(i-1)}_i$.
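The two phases discussed in this section can be sketched in Python as follows. This is not the authors' MATLAB-style listing; it is a minimal sketch assembled from the update rule of Lemma 3.2, the initialization of w(n) as the product of all $s_i$, and the balance equation used in the proof of Theorem 3.5. The function name `state_reduction`, the way the semifield operations are passed in, and the 0-based indexing are assumptions made for illustration.

```python
from fractions import Fraction


def state_reduction(A, add, mul, inv, one):
    """Sketch of a universal state reduction over a commutative semifield.

    A is a square list of lists with a nonzero off-diagonal entry in each row.
    """
    n = len(A)
    a = [row[:] for row in A]  # working copy, overwritten by reduced networks
    s = [None] * n

    # Phase 1: suppress states 0, 1, ..., n-2 one at a time.
    for i in range(n - 1):
        si = a[i][i + 1]
        for l in range(i + 2, n):
            si = add(si, a[i][l])  # s_i = sum of row i to the right of a_ii
        s[i] = si
        si_inv = inv(si)
        for k in range(i + 1, n):
            for l in range(i + 1, n):
                a[k][l] = add(a[k][l], mul(mul(a[k][i], a[i][l]), si_inv))

    # Phase 2: back substitution using the balance equation.
    w = [None] * n
    w[n - 1] = one
    for i in range(n - 1):
        w[n - 1] = mul(w[n - 1], s[i])  # w(n) = s_1 * ... * s_{n-1}
    for i in range(n - 2, -1, -1):
        acc = mul(w[i + 1], a[i + 1][i])
        for k in range(i + 2, n):
            acc = add(acc, mul(w[k], a[k][i]))
        w[i] = mul(acc, inv(s[i]))
    return w


F = Fraction
A = [[F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 4), F(3, 4), F(0)]]
W_CLASSICAL = state_reduction(A, lambda a, b: a + b, lambda a, b: a * b,
                              lambda a: 1 / a, F(1))
W_MAX = state_reduction(A, max, lambda a, b: a * b, lambda a: 1 / a, F(1))
```

For this sample stochastic matrix the classical run returns (1/2, 7/8, 5/6), which is a left eigenvector of A, and the max-times run returns (1/4, 3/8, 1/3), the maximal tree weights for each root. Exact rational arithmetic is used so that the results can be compared with a direct enumeration of spanning trees without rounding concerns.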

Proof of Lemma 3.4
This proof follows closely that given in Sonin [28], Section 5. Our main reason for including it in full is to verify that it generalizes to an arbitrary antinegative semifield and to give, in our view, a different and more transparent explanation of the initial proof.
We have to show that $s_i \cdot w^{(i)}_k = w^{(i-1)}_k$. It is enough to consider the case when i = 1 and k > 1. For convenience, let us assume k = n, so we are to prove that $s_1 w^{(1)}_n = w_n$. Recall that here $w_n$ is the total weight of all n-trees, $s_1 = \sum_{j>1} a_{1j}$, and $w^{(1)}_n$ is the total weight of all n-trees in the reduced Markov model where the weight of any edge (k, l) for k, l > 1 equals
$$a^{(1)}_{kl} = a_{kl} + s_1^{-1} a_{k1} a_{1l}. \tag{8}$$
In every tree T = (V(T), E(T)) that contributes to $w_n$ we can identify the set D of nodes i such that (i, 1) ∈ E(T) (the edge originating at i terminates at 1). Further, each tree contributing to $w_n$ is uniquely determined by 1) the set D, 2) the forest F whose (directed) trees are rooted at the nodes of D ∪ {n}, and 3) the edge starting at node 1 and ending at a node of the tree rooted at n.
In contrast to the case of $w_n$, $w^{(1)}_n$ (using the distributivity property of S) can be written as a sum of terms, where each term is determined not only by an n-tree on the set {2, …, n}, but also by the choice of the first or the second term in (8), made for each edge of the tree. For every such term we can identify the set of nodes $\tilde{D}$ such that for each edge starting at one of these nodes the second term in (8) is chosen. Further, each term contributing to $w^{(1)}_n$ is uniquely determined by 1) the set $\tilde{D}$, 2) the forest $\tilde{F}$ whose trees are rooted at the nodes of $\tilde{D} \cup \{n\}$ and 3) the mapping τ from $\tilde{D}$ to {2, …, n} (which is, in general, neither surjective nor injective).
Given a forest F on the set D ∪ {n} and k ∈ D ∪ {n}, we denote by $T_k(F)$ the tree rooted at k.
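In view of the above and making use of the distributivity property of S, the equation $s_1 w^{(1)}_n = w_n$ is equivalent to the following:
$$s_1 \sum_{(\tilde{D}, \tilde{F})} \pi(\tilde{F}) \prod_{k \in \tilde{D}} \bigl(s_1^{-1} a_{k1}\bigr) \sum_{\tau:\, \tilde{D} \to \{2,\dots,n\}} \prod_{k \in \tilde{D}} a_{1\tau(k)} = \sum_{(D, F)} \pi(F) \prod_{k \in D} a_{k1} \sum_{l \in T_n(F)} a_{1l}. \tag{9}$$
As the set of all pairs (D, F) and the set of all pairs $(\tilde{D}, \tilde{F})$ are identical, we are left to prove the following identity:
$$s_1^{|D|-1} \sum_{l \in T_n(F)} a_{1l} = \sum_{\tau:\, D \to \{2,\dots,n\}} \prod_{k \in D} a_{1\tau(k)}. \tag{10}$$
The proof of (10) makes use of the following well-known combinatorial identity, whose derivation we will briefly explain, for the reader's convenience. Let T be an n-tree on {1, …, n}, and let $\mathcal{T}$ be the set of all n-trees. For each node k ∈ {1, …, n}, its indegree indeg(k, T) in T is defined as the number of ingoing edges. Let $x_1, \dots, x_n$ be arbitrary scalars from S. We will use the following version of Cayley's tree enumerator formula:
$$x_n (x_1 + \dots + x_n)^{n-2} = \sum_{T \in \mathcal{T}} \prod_{k=1}^{n} x_k^{\operatorname{indeg}(k,T)}. \tag{11}$$
Recall that this formula admits a classical proof which works in any commutative semiring. Indeed, observe that for each term on the right hand side of (11), there is at least one variable among $x_1, \dots, x_{n-1}$ which does not appear, since each tree has at least one leaf. The same is true about the left hand side of (11), since any monomial in the expansion of $(x_1 + \dots + x_n)^{n-2}$ has total degree n − 2, which is one less than n − 1. Due to this observation, it suffices to prove
$$x_n (x_2 + \dots + x_n)^{n-2} = \sum_{T \in \mathcal{T}:\ \operatorname{indeg}(1,T) = 0} \ \prod_{k=2}^{n} x_k^{\operatorname{indeg}(k,T)}. \tag{12}$$
Observe that by induction (whose basis for n = 2 is trivial) we have
$$x_n (x_2 + \dots + x_n)^{n-3} = \sum_{T' \in \mathcal{T}'} \prod_{k=2}^{n} x_k^{\operatorname{indeg}(k,T')}, \tag{13}$$
where $\mathcal{T}'$ is the set of all (directed) n-trees on nodes 2, …, n. Multiplying both parts of (13) by $(x_2 + \dots + x_n)$ and using the identity
$$(x_2 + \dots + x_n) \sum_{T' \in \mathcal{T}'} \prod_{k=2}^{n} x_k^{\operatorname{indeg}(k,T')} = \sum_{T \in \mathcal{T}:\ \operatorname{indeg}(1,T) = 0} \ \prod_{k=2}^{n} x_k^{\operatorname{indeg}(k,T)}, \tag{14}$$
which is due to the bijective correspondence between the trees in $\mathcal{T}$ having node 1 as a leaf and the combinations of trees in $\mathcal{T}'$ and edges issuing from node 1, we obtain (12) and hence (11).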
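The tree enumerator formula can be checked numerically for small n by enumerating all trees directed towards the last node. The sketch below reads the formula as: $x_n (x_1 + \dots + x_n)^{n-2}$ equals the sum, over all n-trees T, of the product of $x_k^{\mathrm{indeg}(k,T)}$; this reading, the function name `cayley_sides`, the 0-based node labels and the use of nonnegative integers as a sample commutative semiring are assumptions made for illustration.

```python
from itertools import product


def cayley_sides(x):
    """Return (lhs, rhs) of the enumerator identity for the values in x.

    Nodes are 0..n-1 and the root is node n-1; x[k] plays the role of x_{k+1}.
    """
    n = len(x)
    root = n - 1
    lhs = x[root] * sum(x) ** (n - 2)
    rhs = 0
    # targets[v] is the head of the single edge leaving the non-root node v.
    for targets in product(range(n), repeat=n - 1):
        is_tree = True
        for v in range(n - 1):
            u = v
            for _ in range(n):
                if u == root:
                    break
                u = targets[u]
            if u != root:
                is_tree = False  # v is trapped on a cycle (or a loop)
                break
        if not is_tree:
            continue
        term = 1
        for v in range(n - 1):
            term *= x[targets[v]]  # each edge into k contributes one factor x_k
        rhs += term
    return lhs, rhs


LHS3, RHS3 = cayley_sides([1, 2, 3])
LHS4, RHS4 = cayley_sides([2, 3, 5, 7])
```

With all variables equal to 1 both sides reduce to the number of trees rooted at a fixed node, for example 16 when n = 4.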
To apply (11), observe first that each mapping τ in (10) defines a mapping on D ∪ {n}: we put an edge (u, v) for u, v ∈ D ∪ {n} if τ(u) belongs to the tree rooted at v. Further, this mapping defines a directed tree on D ∪ {n}, rooted at n. In particular, observe that any cycle induced by τ would yield a cycle in the original graph (which is a spanning tree on the nodes 2, …, n rooted at n). Also, none of the nodes except for n can be a root since τ is defined for all nodes of D. We will refer to such a tree on D ∪ {n} as a τ-induced tree, or just induced tree if the mapping is not specified.
For any pair (D, F) and for any n-tree T on D ∪ {n} we can find a mapping τ : D → {2, …, n} which yields T as a τ-induced tree. Thus for any given pair (D, F), the set of all possible induced trees (with all possible τ) coincides with the set of all n-trees on D ∪ {n}. This set will be further denoted by $\mathcal{T}_{\mathrm{induced}}$.
Let us set $x_l = \sum_{k \in T_l(F)} a_{1k}$ for all l ∈ D ∪ {n}. Note that $\sum_{l \in D \cup \{n\}} x_l = \sum_{k=2}^{n} a_{1k} = s_1$, since the trees of F cover {2, …, n}. Applying (11) to the set of all induced trees, with these $x_l$, a fixed pair (D, F), and |D| + 1 instead of n, we have
$$x_n \Bigl(\sum_{l \in D \cup \{n\}} x_l\Bigr)^{|D|-1} = \sum_{T \in \mathcal{T}_{\mathrm{induced}}} \ \prod_{l \in D \cup \{n\}} \Bigl(\sum_{k \in T_l(F)} a_{1k}\Bigr)^{\operatorname{indeg}(l,T)}. \tag{15}$$
The chain of equalities (16) can be explained as follows. On the first step, we classify mappings τ according to the induced trees that they yield. On the next step, D is represented as a union over all sets in(l, T) where l ∈ D ∪ {n}, and we use the fact that each $\tau: D \xrightarrow{T} \{2, \dots, n\}$ can be decomposed into a set of some "partial" mappings σ : in(l, T) → $T_l(F)$, and vice versa: every combination of such "partial" mappings gives rise to a mapping τ that yields T (as a τ-induced tree). On the last step we use the multinomial semiring identity
$$\Bigl(\sum_{k \in T_l(F)} a_{1k}\Bigr)^{\operatorname{indeg}(l,T)} = \sum_{\sigma:\, \mathrm{in}(l,T) \to T_l(F)} \ \prod_{s \in \mathrm{in}(l,T)} a_{1\sigma(s)}. \tag{17}$$
To understand this identity observe that the left hand side of (17) is a product of indeg(l, T) = |in(l, T)| identical sums of $|T_l(F)|$ terms. By distributivity, this product can be written as a sum of monomials, where each monomial corresponds to a combination of choices made in each bracket, and hence to a mapping σ : in(l, T) → $T_l(F)$. Finally, by (16) the right-hand sides of (10) and (15) are equal, and this completes the proof.
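The multinomial semiring identity is easiest to see on the smallest nontrivial instance. In the following worked example the labels are hypothetical: the node l has two in-neighbours, in(l, T) = {s, t}, and the tree rooted at l has two nodes, $T_l(F)$ = {p, q}.

```latex
\[
  (a_{1p} + a_{1q})^{2}
  = \underbrace{a_{1p}\,a_{1p}}_{\sigma(s)=p,\ \sigma(t)=p}
  + \underbrace{a_{1p}\,a_{1q}}_{\sigma(s)=p,\ \sigma(t)=q}
  + \underbrace{a_{1q}\,a_{1p}}_{\sigma(s)=q,\ \sigma(t)=p}
  + \underbrace{a_{1q}\,a_{1q}}_{\sigma(s)=q,\ \sigma(t)=q}.
\]
```

The four monomials correspond to the four mappings σ from {s, t} to {p, q}. No cancellation or subtraction is involved, so the expansion holds in any commutative semiring; in an idempotent semiring the two equal middle terms simply merge into one.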

Example 2.2. What we are referring to as the max algebra is often called the max-times algebra to distinguish it from other isomorphic realisations. The max-plus algebra (isomorphic to max algebra via the mapping x ↦ exp(x)) consists of S = R ∪ {−∞} with the operations a + b = max(a, b) and a • b = a + b (where the sum on the right is ordinary addition). The min-plus algebra (isomorphic to max-plus algebra by the mapping x ↦ −x) consists of S = R ∪ {+∞} with the operations a + b = min(a, b) and a • b = a + b. All of these realisations are commutative idempotent semirings.
Example 2.3. Let U be a set, and consider a Boolean algebra of subsets of U. This is an idempotent semiring where a + b = a ∪ b and a • b = a ∩ b for any two subsets a, b ⊆ U. In the case of finite U, matrix algebra over U was considered, e.g., by Kirkland and Pullman [15].

Example 2.4. The max-min algebra consisting of S = R ∪ {−∞} ∪ {+∞} equipped with a + b = max(a, b) and a • b = min(a, b) for all a, b ∈ S is another commutative idempotent semiring.
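The idempotent examples above can be spot-checked in a few lines of Python. The sketch below encodes the max-plus, min-plus, max-min and Boolean (subset) semirings as tuples of operations and tests idempotency, the identities, absorption and distributivity on a handful of sample elements. The dictionary layout, the sample values and the two-element ground set are illustrative assumptions, and a finite spot check is of course not a proof.

```python
import math

NEG_INF, POS_INF = -math.inf, math.inf
U = frozenset({"s1", "s2"})  # hypothetical two-element ground set

# name -> (add, mul, zero, one, sample elements)
SEMIRINGS = {
    "max-plus": (max, lambda a, b: a + b, NEG_INF, 0, [NEG_INF, -3, 0, 2]),
    "min-plus": (min, lambda a, b: a + b, POS_INF, 0, [POS_INF, -3, 0, 2]),
    "max-min": (max, min, NEG_INF, POS_INF, [NEG_INF, -3, 0, 2, POS_INF]),
    "boolean": (lambda a, b: a | b, lambda a, b: a & b, frozenset(), U,
                [frozenset(), frozenset({"s1"}), frozenset({"s2"}), U]),
}


def is_idempotent_semiring(add, mul, zero, one, samples):
    """Spot-check idempotency, identities, absorption and distributivity."""
    for a in samples:
        if add(a, a) != a or add(a, zero) != a:
            return False
        if mul(a, one) != a or mul(a, zero) != zero:
            return False
        for b in samples:
            if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
                return False
            for c in samples:
                if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
                    return False
    return True


CHECKS = {name: is_idempotent_semiring(*spec) for name, spec in SEMIRINGS.items()}
```

Of these four, only max-plus and min-plus are semifields (every element other than the zero has an inverse under ordinary addition), so they are the ones to which a state reduction with inversion applies directly.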

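We are left to show that the right-hand sides of (10) and (15) coincide. For an induced tree T, let $\tau: D \xrightarrow{T} \{2, \dots, n\}$ denote the fact that T is τ-induced. For each l ∈ D ∪ {n} let in(l, T) denote the set of in-neighbours of l. Consider the following chain of equalities, with D and F fixed:
$$\sum_{\tau:\, D \to \{2,\dots,n\}} \prod_{k \in D} a_{1\tau(k)} = \sum_{T \in \mathcal{T}_{\mathrm{induced}}} \ \sum_{\tau:\, D \xrightarrow{T} \{2,\dots,n\}} \prod_{k \in D} a_{1\tau(k)} = \sum_{T \in \mathcal{T}_{\mathrm{induced}}} \ \prod_{l \in D \cup \{n\}} \ \sum_{\sigma:\, \mathrm{in}(l,T) \to T_l(F)} \ \prod_{s \in \mathrm{in}(l,T)} a_{1\sigma(s)} = \sum_{T \in \mathcal{T}_{\mathrm{induced}}} \ \prod_{l \in D \cup \{n\}} \Bigl(\sum_{k \in T_l(F)} a_{1k}\Bigr)^{\operatorname{indeg}(l,T)}. \tag{16}$$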