An Application of Graph Theory in Markov Chains Reliability Analysis

The paper presents a reliability analysis that was carried out for an industrial company. The aim of the paper is to present the use of discrete time Markov chains together with the flow in network approach. Discrete time Markov chains, a well-known method of stochastic modelling, are used to describe the problem. The method is suitable for many systems occurring in practice in which a finite number of states can easily be distinguished. Markov chains are used to describe transitions between the states of the process. The industrial process is described as a graph network. The maximal flow in the network corresponds to the production. The Ford-Fulkerson algorithm is used to quantify the production for each state. The combination of both methods is used to quantify the expected number of manufactured products for a given time period.


Introduction
The reliability of production plays a fundamental role in industry. Nowadays the reliability of industrial processes is at a high level. It is increased by improving the quality of each component or by adding redundancy to the production process. Even in a highly reliable process there is still a chance that the system fails. In our case we analyse a process which has no redundancy. Thus information about the probability of system failures is very valuable.
In the previous work [1] a reliability analysis of a part of an industrial process was carried out. In that application the analysed part of the industrial process was arranged in parallel. The main aim of the cited paper was to estimate the probability that the production will be equal to or greater than the demand of the industrial partner. To deal with the task, a Monte Carlo simulation of discrete time Markov chains was used.
The goal of this paper is to present a discrete time Markov chain analysis carried out on a system which is not arranged in parallel. Since the analysed process is more complicated, the graph theory approach seems to be an appropriate tool.
Graph theory has previously been applied to reliability, but for different purposes than we intend. One example of an application of graph theory in reliability is the reliability polynomial [3], or network reliability. In this approach a graph describes a network where each component (edge) has the same probability of failure. The work most closely related to our paper is probably the research of Christopher Dabrowski [4], [5]. As in our paper, Dabrowski uses graph theory as a tool for computing with discrete time Markov chains. In contrast to our work, the Markov chains in Dabrowski's papers are used for different purposes. Dabrowski analyses a large scale grid. His study describes an application for finding states of the grid which could lead to system degradation. Dabrowski uses graph theory to describe the transitions of the Markov chain model between the initial state and the absorbing state. We use graph theory to describe the analysed process to which the discrete time Markov chain is applied.

Markov Chains
A Markov chain is a random process with a discrete time set T ⊂ N ∪ {0} which satisfies the so-called "Markov property". The Markov property means that the future evolution of the system depends only on the current state of the system and not on its past history:

$$P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n),$$

where $X_1, \ldots, X_n$ is a sequence of random variables, the index denotes a certain time t ∈ T, and $x_1, \ldots, x_n$ is a sequence of states in time t ∈ T. As the transition probability $p_{ij}$ we regard the probability that the system changes from the state i to the state j:

$$p_{ij} = P(X_{n+1} = j \mid X_n = i).$$

The matrix P, where $p_{ij}$ is placed in row i and column j for all admissible i and j, is called the transition probability matrix:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots \\ p_{21} & p_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}.$$

Clearly all elements of the matrix P satisfy the following property:

$$0 \le p_{ij} \le 1, \qquad \sum_{j} p_{ij} = 1 \ \text{ for all } i.$$

As a probability vector we will understand a vector v whose number of elements $v_i$ is equal to the number of states of the Markov chain and for which the following holds:

$$v_i \ge 0, \qquad \sum_{i} v_i = 1.$$

By v(t) we denote the vector of probabilities of all the states in time t ∈ T. By v(0) we denote the initial probability vector. Usually, all its coordinates are equal to zero except the first, which is equal to 1. The vector v(0) gives the probabilities of the states in which the system occurs in time t = 0. It is easy to prove that [6]:

$$v(t) = v(0) P^{t}.$$

As a stationary distribution we will understand a vector z which satisfies the property:

$$z = z P.$$

Suppose that the limit $\pi = \lim_{n \to \infty} v(n)$ exists; then the probability vector π is called the limiting distribution (limiting vector):

$$\pi = \lim_{n \to \infty} v(n) = \lim_{n \to \infty} v(0) P^{n}.$$

In some literature it is stated that the stationary distribution is equal to the limiting distribution. In fact this is true only if the discrete time Markov chain (further DTMC) satisfies the condition of ergodicity (irreducibility, aperiodicity). To be able to calculate the limiting distribution π we will describe further properties of DTMC.
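The relation v(t) = v(0) P^t can be illustrated by a short sketch. The two-state matrix below (state 0 working, state 1 failed) and its numbers are hypothetical and serve only as an illustration; they are not data from the analysed process.

```python
def step(v, P):
    """One DTMC step: return the row vector v * P."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution(v0, P, t):
    """Return v(t) = v(0) * P^t by applying t single steps."""
    v = list(v0)
    for _ in range(t):
        v = step(v, P)
    return v


# Hypothetical transition probability matrix (each row sums to 1).
P = [[0.9, 0.1],
     [0.5, 0.5]]
v0 = [1.0, 0.0]          # the system starts in the first state

v2 = distribution(v0, P, 2)
print(v2)                # probabilities of both states after two steps
```

Repeated vector-matrix products avoid forming P^t explicitly, which is sufficient when only one initial vector is of interest.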
A state j of a Markov chain is periodic with period p if, on leaving state j, a return is possible only in a number of transitions that is a multiple of the integer p > 1.
For example, a two-state Markov chain that switches to the other state with probability 1 in every step is periodic with period p = 2. The DTMC is said to be irreducible if every state can be reached from every other state.
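As an illustration, the period of an irreducible chain can be computed from its transition graph. The sketch below is one possible approach (a breadth-first search combined with a greatest common divisor) and assumes that the chain is irreducible; the matrices are hypothetical examples, not matrices from the analysed process.

```python
from collections import deque
from math import gcd


def period(P, start=0):
    """Period of an irreducible chain with transition matrix P.

    Uses BFS levels from `start`: the period is the gcd of
    level[u] + 1 - level[v] over all transitions u -> v with p_uv > 0.
    """
    n = len(P)
    level = {start: 0}
    queue = deque([start])
    g = 0
    while queue:
        u = queue.popleft()
        for v in range(n):
            if P[u][v] > 0:
                if v not in level:
                    level[v] = level[u] + 1
                    queue.append(v)
                g = gcd(g, level[u] + 1 - level[v])
    return g


# Hypothetical two-state chain that always switches state: period 2.
flip = [[0.0, 1.0],
        [1.0, 0.0]]
# Hypothetical chain with a self-loop in state 0: aperiodic (period 1).
lazy = [[0.5, 0.5],
        [1.0, 0.0]]

print(period(flip), period(lazy))
```

A chain in which at least one diagonal element of P is positive, as in the second example, is aperiodic whenever it is irreducible.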
The probability of ever returning to state j is denoted by $f_{jj}$ and is given by:

$$f_{jj} = \sum_{n=1}^{\infty} f_{jj}^{(n)},$$

where $f_{jj}^{(n)}$ is the probability that the first return to state j occurs exactly n transitions after leaving it. If $f_{jj} < 1$, then state j is said to be transient. If $f_{jj} = 1$, the state is recurrent and we can define the mean recurrence time.
As the mean recurrence time $M_{jj}$ we will understand:

$$M_{jj} = \sum_{n=1}^{\infty} n \, f_{jj}^{(n)}.$$

If $M_{jj} < \infty$ (or $M_{jj} = \infty$), we say that the state is positive recurrent (or null recurrent). In a finite Markov chain, each state is either positive recurrent or transient, and furthermore, at least one state must be positive recurrent [6].
Theorem. Let a Markov chain C be irreducible. Then C is positive recurrent, null recurrent or transient, i.e.:
• all the states are positive recurrent, or
• all the states are null recurrent, or
• all the states are transient.
If all the states of a DTMC are positive recurrent and aperiodic, then the probability distribution v(n) converges to a limiting distribution π which is independent of the initial distribution v(0). The vector π can be obtained by solving the following equations:

$$\pi = \pi P, \qquad \sum_{i} \pi_i = 1. \tag{11}$$
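A minimal sketch of how the equations for π might be solved as a linear system is given below. It replaces one of the (linearly dependent) balance equations by the normalisation condition and uses plain Gaussian elimination; the function names and the two-state matrix are assumptions made for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def limiting_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1."""
    n = len(P)
    # Rows of (P^T - I) pi^T = 0; the last row is replaced by sum(pi) = 1.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)


# Hypothetical two-state chain (irreducible and aperiodic).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = limiting_distribution(P)
print(pi)
```

For this example the balance equation gives 0.1 π_1 = 0.5 π_2, so π = (5/6, 1/6).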

Preliminaries
In this section we will establish fundamental terminology of graph theory which will be used further in this paper.

Definition of an Oriented Graph
An oriented graph is an ordered pair G = (V; E), where V is the set of vertices and E ⊆ V × V is the set of edges.

Definition of Network
A network is a four-tuple S = (G; s; t; x), where:
• G is an oriented graph,
• the vertices s ∈ V(G), t ∈ V(G) are the source and the sink,
• $x : E(G) \to \mathbb{R}^{+}$ is a positive labelling of the edges, called edge capacities.

Definition of Flow in Network
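A flow in a network S = (G; s; t; x) is a function $f : E(G) \to \mathbb{R}_0^{+}$, where:
• no edge capacity is exceeded: $$\forall e \in E(G): \ 0 \le f(e) \le x(e),$$
• the conservation of flow equation holds: $$\forall v \in V(G),\ v \ne s, t: \ \sum_{e \to v} f(e) = \sum_{e \leftarrow v} f(e).$$
Here e → v denotes an edge e entering the vertex v and e ← v an edge e leaving the vertex v. The value of a flow f is $$\|f\| = \sum_{e \leftarrow s} f(e) - \sum_{e \to s} f(e).$$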
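The two conditions of the definition and the value of a flow can be checked mechanically. The following sketch assumes that capacities and flows are stored as dictionaries keyed by edges (u, v); the small network is a hypothetical example.

```python
def is_flow(capacity, flow, s, t):
    """Check the capacity and conservation conditions of a flow."""
    # No edge capacity is exceeded.
    if any(not (0 <= flow[e] <= capacity[e]) for e in capacity):
        return False
    # Conservation of flow in every vertex except the source and the sink.
    vertices = {v for edge in capacity for v in edge}
    for v in vertices - {s, t}:
        inflow = sum(f for (a, b), f in flow.items() if b == v)
        outflow = sum(f for (a, b), f in flow.items() if a == v)
        if inflow != outflow:   # exact comparison; use a tolerance for floats
            return False
    return True


def flow_value(flow, s):
    """Flow leaving the source minus flow entering the source."""
    leaving = sum(f for (a, b), f in flow.items() if a == s)
    entering = sum(f for (a, b), f in flow.items() if b == s)
    return leaving - entering


# Hypothetical network with integer capacities.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2,
            ("a", "b"): 1, ("b", "t"): 3}
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2,
        ("a", "b"): 1, ("b", "t"): 3}

print(is_flow(capacity, flow, "s", "t"), flow_value(flow, "s"))
```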

Definition of Cut
A cut in the network S = (G; s; t; x) is a set of edges C ⊆ E(G) such that in the factor G − C of the graph G no oriented path from s to t remains. As a minimal cut we will understand a cut C such that no proper subset of C is a cut in the network.

Problem Formulation
The research presented here was motivated by a practical problem. The analysed company was asked to quantify the probability that the production fails. Knowledge of the risk that an order will not be delivered in time is important for the partner firm in order to establish sufficient supplies of goods. In the previous application, see [1], a reliability analysis of a part of an industrial process was carried out. For each machine we could distinguish two modes: in order '1' or in fail '0'. Thus the whole system could occur in one of $2^n$ states, where n is the number of machines of the analysed industrial process. Since the machines were organized in parallel, it was easy to calculate the whole production of each state. The production of a given state was calculated as the sum of the productions of the functional machines. A more complicated situation occurs when the system is not connected in parallel. The aim of this paper is to present a calculation of the maximal production of each of the $2^n$ states by using the theory of flows in networks. In the following application we simplify the process by assuming that, even at the beginning, goods can pass from the beginning to the end of the process. We assume that for each state the process delivers the maximum possible production w.

Application
In this section we present an example of how to use the theory of flows in networks in a reliability analysis. First, we define the states of the industrial process and calculate the maximum production of each state by using the Ford-Fulkerson algorithm. At the end we demonstrate on concrete data how to calculate the expected value of production for a given time t.
The analysed industrial process consists of six machines. For each machine we distinguish two different states: one (working) and zero (in fail). Thus the industrial process has $2^6 = 64$ different states. A state is an ordered six-tuple of ones and zeroes.
Let us describe the industrial process of the firm by an oriented network. Every machine is represented as a vertex. The beginning and the end of the process are represented by the source and the sink of the network. The beginning of the process consists of the acceptance of goods and their division between the machines of the process. The end of the process consists of product inspection and dispatch to the customers. Oriented edges describe the direction of the production process. The labelling of the vertices represents the maximum amount of goods processed by the given machine. To be able to work with labelled edges, each vertex $V_1, V_2, \ldots, V_6$ of the structure of the process (Fig. 1) is replaced by two vertices connected by an edge labelled with the same value as the former vertex.
For our purposes all edges other than the newly created ones (which carry the original vertex labels), including the edges incident with the source and the sink, are labelled by ∞. For each of the $2^6$ states we find the maximum flow in the network. For each state the edges incident with the vertices representing the machines in failure are removed from the network. To calculate the maximal production of each state we find the maximal flow in the resulting network. In our application we assume that the maximal flow is achieved for any possible state. Without this simplification it would be nearly impossible to estimate the production of a given state.

Fig. 1: Structure of process (the network leads from the vertex Begin to the vertex End).
To find the maximal flow we use the well-known Ford-Fulkerson algorithm. For searching the graph edges, a breadth-first search was used. Tab. 1 demonstrates several states and their production $w_i$.
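The procedure described above (vertex splitting, removal of failed machines, Ford-Fulkerson with a breadth-first search) might be sketched as follows. The layout LINKS and the machine capacities CAPACITY are purely hypothetical, since the structure of Fig. 1 and the real capacities are not reproduced here; the function names are likewise assumptions.

```python
from collections import deque

INF = float("inf")

# Hypothetical machine capacities and links between machines.
CAPACITY = {1: 100, 2: 70, 3: 60, 4: 120, 5: 90, 6: 80}
LINKS = [("s", 1), ("s", 2), (1, 3), (1, 4), (2, 4),
         (3, 5), (4, 5), (4, 6), (5, "t"), (6, "t")]


def build_network(state):
    """Residual capacities for a state (six 0/1 flags, machine 1 first)."""
    working = {m for m, flag in zip(sorted(CAPACITY), state) if flag == 1}
    cap = {"s": {}, "t": {}}

    def add_edge(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)   # reverse residual edge
        cap[u][v] += c

    # Each working machine m becomes the edge ("in", m) -> ("out", m).
    for m in working:
        add_edge(("in", m), ("out", m), CAPACITY[m])
    # Links touching a failed machine are left out of the network.
    for u, v in LINKS:
        if (u == "s" or u in working) and (v == "t" or v in working):
            tail = u if u == "s" else ("out", u)
            head = v if v == "t" else ("in", v)
            add_edge(tail, head, INF)
    return cap


def max_flow(cap, s, t):
    """Ford-Fulkerson with breadth-first search for augmenting paths."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Find the bottleneck of the augmenting path, then update residuals.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck


def production(state):
    return max_flow(build_network(state), "s", "t")


print(production((1, 1, 1, 1, 1, 1)), production((0, 0, 1, 1, 1, 1)))
```

In this hypothetical layout only the first two machines are connected to the source, so every state in which both of them fail has zero production. The Ford-Fulkerson method with breadth-first search is also known as the Edmonds-Karp algorithm.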
To be able to compute with the DTMC, we need to calculate the elements of the transition probability matrix. To calculate the transition probability matrix P we need the probability of failure $p_f$ and the probability of repair $p_r$ for each machine. To estimate the probability $p_f$ that a machine fails during one hour, we calculated the expected value as the average length of the period to failure. Using the maximum likelihood method we estimated the probability $p_r$ that a machine will be repaired within one hour:

$$p_r = \frac{\Delta V}{V},$$

where V is the number of all repairs that were carried out and ∆V is the number of repairs that lasted less than one hour.
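The estimate of p_r is a simple relative frequency. A minimal sketch, assuming that the repair records are available as a list of repair durations in hours (the data below are invented for illustration):

```python
def repair_probability(repair_durations_hours):
    """Estimate p_r as (repairs shorter than one hour) / (all repairs)."""
    total = len(repair_durations_hours)                            # V
    short = sum(1 for d in repair_durations_hours if d < 1.0)      # Delta V
    return short / total


# Hypothetical repair durations of one machine, in hours.
durations = [0.5, 2.0, 0.25, 1.5]
p_r = repair_probability(durations)
print(p_r)
```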
The calculated probabilities for the machines $V_1, V_2, \ldots, V_6$ are presented in Tab. 2. Since there are $2^6$ different states, the transition probability matrix consists of $2^{6+6} = 4096$ elements. With the presented probabilities $p_f$, $p_r$ we can calculate the transition probability matrix. For example, $p_{11} = \prod_{i=1}^{6} p_{f_i}$, and $p_{89}$, the probability that the system changes from state 8 to state 9, is given in Eq. 14, see Tab. 1.

Tab. 1: Production of states.

Each element of the transition probability matrix is a product of non-zero probabilities. Since the transition probability matrix consists only of non-zero elements (some of them are close to zero), every state can be reached from every other state in time t = 1, and so the analysed DTMC is irreducible and aperiodic. Because the DTMC is finite and irreducible, it is also positive recurrent. Then we can use Eq. 6 to calculate the limiting distribution π. Eq. 11 was solved numerically by using the "backslash" operator (the built-in solver of linear equations) of the Matlab program. Tab. 4 presents a few elements of the first row of the transition probability matrix and of the limiting distribution π.
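One way in which a transition probability matrix can be assembled as a product of per-machine probabilities is sketched below. The sketch rests on assumptions that are not confirmed by the text: the machines change state independently, a working machine fails with probability p_f and a failed machine is repaired with probability p_r within one time step, and the states are ordered so that the first state has all machines working. The numerical values are hypothetical.

```python
from itertools import product


def transition_matrix(p_f, p_r):
    """Return (states, P) for machines that fail and recover independently."""
    n = len(p_f)
    # Assumed ordering: the first state is (1, ..., 1), all machines working.
    states = list(product((1, 0), repeat=n))

    def machine_step(i, now, nxt):
        if now == 1:
            return p_f[i] if nxt == 0 else 1.0 - p_f[i]
        return p_r[i] if nxt == 1 else 1.0 - p_r[i]

    P = []
    for a in states:
        row = []
        for b in states:
            prob = 1.0
            for i in range(n):
                prob *= machine_step(i, a[i], b[i])
            row.append(prob)
        P.append(row)
    return states, P


# Hypothetical probabilities for two machines (4 states, 16 elements).
states, P = transition_matrix([0.1, 0.2], [0.5, 0.4])
print(states[0], P[0][0])
```

With probabilities strictly between 0 and 1 every element of P is positive, which is the property used above to argue that the chain is irreducible and aperiodic.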
After calculating the transition probability matrix P and the production of each state we can quantify the reliability of the industrial process. To describe the reliability of the process we calculate the expected value of production. The expected value of production within time T is the sum of the expected values for each time step t ∈ {1, 2, ..., T}. In our case, the expected value E(W)(t) of production for a certain time t is equal to:

$$E(W)(t) = v(t) \cdot W = v(0) P^{t} W,$$

where W is the vector of productions w of all $2^6$ states. For v(0) = (1, 0, ..., 0) this is the product of the first row of $P^t$ and the vector W.
The results of calculating the expected value of production for one time step (time t) are presented in Tab. 3 (the maximal production for one time step is 170).
According to Tab. 3 we can conclude that the expected value of production for a certain time quickly converges to the limit expected value (π · W).
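The calculation of E(W)(t) and its convergence can be sketched on a small example. The two-state matrix and the production vector below are hypothetical (only the maximal production 170 is taken from the text) and the function name is an assumption.

```python
def expected_production(v0, P, W, T):
    """Return [E(W)(1), ..., E(W)(T)] where E(W)(t) = v(0) P^t W."""
    n = len(P)
    v = list(v0)
    values = []
    for _ in range(T):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        values.append(sum(vi * wi for vi, wi in zip(v, W)))
    return values


# Hypothetical chain: state 0 produces 170 units per step, state 1 nothing.
P = [[0.9, 0.1],
     [0.5, 0.5]]
W = [170.0, 0.0]
v0 = [1.0, 0.0]

per_step = expected_production(v0, P, W, 50)
total = sum(per_step)            # expected production within T = 50 steps
print(per_step[0], per_step[1], per_step[-1])
```

In this example the limiting distribution is (5/6, 1/6), so the per-step values approach 170 · 5/6, which is roughly 141.67.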

Complexity of the Algorithm
In this section we estimate the complexity of the Ford-Fulkerson (F-F) algorithm for calculating the production W of all states. By T(n) we denote the worst-case time complexity of an algorithm with an input of size n. The complexity of the F-F algorithm is $T(n) = O(n^3)$. In our case the size of the input n is equal to the number of vertices |V| of the network. The algorithm should run up to $2^{|V|}$ times, once for each state, which gives

$$T(n) = O\left(2^{|V|} \, |V|^{3}\right).$$

The given formula is an upper bound of the complexity of the algorithm for finding the vector of production W in a network. To estimate the complexity of the whole application we have to add the complexity of the matrix exponentiation, which is bounded by $T(n) < \left(2^{|V|}\right)^{3} t$, and of the multiplication of the first row of the transition probability matrix by the vector W, which is equal to $2^{|V|}$. The last term is not very significant in comparison with the complexity of the matrix exponentiation and of calculating the vector W of production.
The complexity of calculating the production W also depends on the number of vertices of the network and on the character of the network. Another way to decrease the amount of computation is to use the minimal cut. Let the fail state $S_f$ be a state with production w equal to zero. Let C denote the set of failed vertices of the state $S_f$. Clearly the edges incident with C form a cut of the network. Any other state in which the vertices of C are in failure also has zero production.
For example, in Tab. 1 we can see that the production of the 8th state (0,0,1,1,1,1) is zero. Thus any state in which the first and the second vertex are in fail (for example the 23rd state) also has zero production. For those states there is no need to use the Ford-Fulkerson algorithm to calculate the maximal flow. For a large time t it would be more suitable to use the limiting probability distribution than to exponentiate the matrix.
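The saving described above can be sketched as follows. The function `production` stands for any routine returning the maximal flow of a state; here it is replaced by a hypothetical toy system of three machines (machine 0 in series with the parallel pair 1, 2) so that the example is self-contained.

```python
from itertools import product


def productions_with_pruning(n, production):
    """Compute the production of all 2^n states, skipping known zero states."""
    zero_sets = []   # sets of failed machines known to give zero production
    result = {}
    calls = 0        # number of maximal flow computations actually performed
    # States with fewer failures first, so small failure sets are found early.
    states = sorted(product((0, 1), repeat=n), key=sum, reverse=True)
    for state in states:
        failed = frozenset(i for i, flag in enumerate(state) if flag == 0)
        if any(z <= failed for z in zero_sets):
            result[state] = 0          # contains a known zero-production set
            continue
        calls += 1
        result[state] = production(state)
        if result[state] == 0:
            zero_sets.append(failed)
    return result, calls


def toy_production(state):
    """Hypothetical system: machine 0 (capacity 5) feeds machines 1 and 2 (3 each)."""
    return min(5 * state[0], 3 * (state[1] + state[2]))


result, calls = productions_with_pruning(3, toy_production)
print(calls, "of", len(result), "states needed a maximal flow computation")
```

The pruning is safe because removing further machines from the network can never increase the maximal flow.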
A problem could occur with storing the transition probability matrix in computer memory. The number of elements of the matrix grows exponentially ($2^{2n}$) with the number of elements n of the analysed system. The problem could be partially solved by rounding small values to zero and storing the matrix as a sparse matrix. In this case it would be more complicated to prove the convergence of the DTMC. Another way to work with large matrices is to divide the DTMC into several separate systems and to calculate with their transition probability matrices separately. Another possibility to reduce the transition probability matrix is to shorten the time step and to allow only one change within a time step. This simplifies the transition matrix.

Conclusion
In the paper we have presented an application of flows in networks in reliability analysis. The main contribution of this work is an innovative use of discrete time Markov chains together with the theory of flows in networks in reliability analysis. The DTMC was used to describe the changes of the analysed process. For calculations over a larger time t it is possible to use the limiting distribution instead of exponentiating the probability matrix. The Ford-Fulkerson algorithm was used to compute the production of the analysed system. The main disadvantage of the presented approach is its high computational complexity. In further research it is possible to modify the Ford-Fulkerson algorithm to make it more suitable for our purposes.