Development issues in algorithms for system level self-diagnosis

The paper deals with the problem of developing a probabilistic algorithm for system-level self-diagnosis. The main goal of the suggested algorithm is to minimize its mean execution time. The algorithm is based on computing the posterior probability of the fault-free state of each system unit. The final decision about each unit's state is made according to a chosen decision rule. The execution of the probabilistic algorithm is illustrated with a simple example and then explained for the case of more complex systems.


Introduction
In real complex systems, units are not necessarily homogeneous and can operate under different conditions. Therefore, units can have different levels of reliability. This can be accounted for by assigning each unit a probability of being fault-free. The probabilistic approach to system-level self-diagnosis [2] does not deal with such problems as t-diagnosability [1] and testing assignment [6]. Self-diagnosis that focuses on the probabilities of the fault-free and faulty states of system units is called probabilistic diagnosis.
The benefits of probabilistic diagnosis are as follows:
- it is simpler than system-level self-diagnosis methods based on the PMC model [6];
- probabilistic algorithms are faster and make no restrictive assumptions about the testing assignment or the fault sets (i.e., they do not place an upper bound on the number of permitted faulty units).
Among the first to investigate probabilistic algorithms were H. Fujiwara and K. Kinoshita [3]. Probabilistic algorithms are based on computing the posterior probabilities of the states of system units, from which the decision about these states is then made. The algorithm presented here aims at minimizing its mean execution time. To this end, it is structured into several branches.

Probabilistic algorithm
It is assumed that statistical information is available about the average execution time $t_{B_i}$ of each branch $B_i$. Each branch can also be assigned a probability $P_{B_i}$ that, after it is executed, the states of all system units will be identified (i.e., the algorithm ends). The branch corresponding to the most probable situation in the system is executed first. If the states of the system units are not identified after the first branch, the second branch is executed; it is chosen according to the same criterion. Finally, there is also a probability that the third branch will be used.
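According to the described approach, the mean execution time of the algorithm is
$$\bar{t}_{alg} = P_{B_1}\, t_{B_1} + P_{B_2}\,(t_{B_1} + t_{B_2}) + P_{B_3}\,(t_{B_1} + t_{B_2} + t_{B_3}),$$
where $t_{alg}$ is the execution time of the algorithm and $P_{B_i}$ is understood as the probability that the algorithm ends in branch $B_i$, so that ending in $B_i$ requires executing all preceding branches.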
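A minimal Python sketch of this expectation follows, assuming $P_{B_k}$ is read as the probability that the algorithm terminates in branch $k$ (so the probabilities sum to 1 and terminating in branch $k$ costs $t_{B_1} + \dots + t_{B_k}$). The function name and the sample timings are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def mean_execution_time(branch_times: Sequence[float],
                        end_probs: Sequence[float]) -> float:
    """Mean running time of an algorithm whose branches run in a fixed order.

    branch_times[k]: average time t_{B_k} of branch k
    end_probs[k]:    probability P_{B_k} that the algorithm ends in branch k
    Ending in branch k means that branches 1..k have all been executed.
    """
    if len(branch_times) != len(end_probs):
        raise ValueError("one probability per branch is required")
    if abs(sum(end_probs) - 1.0) > 1e-9:
        raise ValueError("end probabilities must sum to 1")
    mean, elapsed = 0.0, 0.0
    for t, p in zip(branch_times, end_probs):
        elapsed += t          # cumulative time up to and including this branch
        mean += p * elapsed   # weighted by the probability of stopping here
    return mean


# Hypothetical figures: a cheap first branch, a dearer second, an expensive third.
print(mean_execution_time([1.0, 3.0, 20.0], [0.5, 0.3, 0.2]))
```

With these hypothetical numbers the mean is about 6.5 time units, well below the 24 units needed when all three branches run; ordering the branches so that the most probable situation is handled by the cheapest branch is what keeps the mean low.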
Branches of the algorithm correspond to particular fault situations in the system. The most probable situation is that all system units are fault-free; it corresponds to the first branch of the algorithm. The second most probable situation is that exactly one system unit is faulty. All situations in which more than one unit is faulty are treated as a single combined situation, which corresponds to the third branch of the algorithm (see fig. 1).
Generally, probabilistic algorithms are developed under the assumption that a test does not provide perfect fault coverage (i.e., a fault-free testing unit does not always detect a fault in the tested unit). For the algorithm under consideration, however, fault coverage is assumed to be 100%. Under this assumption, the first branch of the algorithm simply computes the sum of the test results, i.e., the number of results equal to 1. If this sum equals 0, the algorithm ends by reporting that all system units are fault-free. Otherwise, the second branch is executed. It also computes the sum of the test results equal to 1, but this time not all test results are taken into account.
Both the results of tests performed on the i-th unit and the results of tests performed by the i-th unit are excluded. Unit $u_i$ is found by scanning, in sequence, all test results obtained in the system: as soon as a test result equal to 1 is found, the scan stops, and this result (e.g., $r_{ji} = 1$) identifies the sought unit $u_i$.
As in the first branch, if the resulting sum equals 0, the algorithm ends by reporting that unit $u_i$ is faulty. Otherwise, the third branch is executed, which computes the posterior probabilities of the fault-free states of the system units. Once all these posterior probabilities have been determined, a decision about the state of each unit can be made.
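The first two branches can be sketched together in Python. The syndrome encoding (a dict from a (tester $j$, tested $i$) pair to the result $r_{ji}$), the function name and the returned labels are hypothetical. Following the text, the candidate is taken to be the tested unit of the first result equal to 1, where "first" here means the dict's insertion order.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: (tester j, tested i) -> result r_ji (0 or 1).
Syndrome = Dict[Tuple[int, int], int]


def fast_branches(syndrome: Syndrome) -> Tuple[str, Optional[int]]:
    """Run branches 1 and 2.

    Returns ("all fault-free", None), ("one faulty", i) or ("branch 3", None).
    """
    # Branch 1: with 100% coverage, a zero sum means no unit is faulty.
    if sum(syndrome.values()) == 0:
        return "all fault-free", None
    # Branch 2: scan the results in recorded order; the first r_ji = 1
    # names the candidate unit u_i (one exists because the sum is positive).
    candidate = next(i for (j, i), r in syndrome.items() if r == 1)
    # Drop the tests performed on u_i and by u_i, then sum the rest.
    remaining = sum(r for (j, i), r in syndrome.items() if candidate not in (j, i))
    if remaining == 0:
        return "one faulty", candidate
    # Otherwise the posterior probabilities must be computed (branch 3).
    return "branch 3", None


ring = {(0, 1): 0, (1, 2): 0, (2, 0): 0}                 # 0 -> 1 -> 2 -> 0
print(fast_branches(ring))                               # ('all fault-free', None)
print(fast_branches({(0, 1): 1, (1, 2): 1, (2, 0): 0}))  # ('one faulty', 1)
print(fast_branches({(0, 1): 1, (1, 2): 0, (2, 0): 1}))  # ('branch 3', None)
```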

Simple example
Let us consider a simple example of how the posterior probabilities are determined. Here there are only two units, $u_1$ and $u_2$, whose prior probabilities of being fault-free, $P_1$ and $P_2$, are known. Their probabilities of being faulty are $q_1 = 1 - P_1$ and $q_2 = 1 - P_2$, respectively.
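Let us assume that only one test, $\tau_{12}$, was performed, with result $r_{12} = 0$ (see fig. 2), and denote this outcome by event $A$. The probability of event $A$ can be determined by the formula of total probability over the four hypotheses about the states of $u_1$ and $u_2$:
$$P(A) = P_1 P_2 \cdot 1 + P_1 q_2 \cdot 0 + q_1 P_2 \cdot \tfrac{1}{2} + q_1 q_2 \cdot \tfrac{1}{2},$$
where the factors $1, 0, \tfrac{1}{2}, \tfrac{1}{2}$ are the probabilities of $r_{12} = 0$ under each hypothesis: a fault-free $u_1$ detects a fault in $u_2$ with certainty (100% fault coverage), while a faulty $u_1$ is assumed to return 0 or 1 with equal probability. By Bayes' formula, the posterior probability that $u_2$ is fault-free is
$$P_2^* = \frac{P_1 P_2 + \tfrac{1}{2} q_1 P_2}{P(A)}.$$
If we assume that the prior probabilities of the fault-free states of units $u_1$ and $u_2$ are equal, $P_1 = P_2 = 0.8$, then $P(A) = 0.74$ and $P_2^* = 0.72/0.74 \approx 0.973$. This example shows that a single test with result 0 considerably increases the probabilities of the units' fault-free states.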
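The same calculation can be checked numerically. The sketch below applies the formula of total probability and Bayes' formula by enumerating every combination of unit states, assuming 100% fault coverage and a faulty tester that returns 0 or 1 with probability 1/2 each (the test model that reproduces the value 0.973). Exhaustive enumeration costs $2^N$ steps, so it is only a cross-check for small systems, not the cycle-based algorithm of this paper; the function name and the (tester, tested) dict encoding are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def posteriors(priors: Sequence[float],
               syndrome: Dict[Tuple[int, int], int]) -> Tuple[List[float], float]:
    """Posterior fault-free probabilities P_i* and the evidence P(syndrome).

    syndrome maps (tester j, tested i) to the observed result r_ji.
    Assumed test model: a fault-free tester reports r = 1 exactly when the
    tested unit is faulty (100% coverage); a faulty tester reports 0 or 1
    with probability 1/2 each.
    """
    n = len(priors)
    evidence = 0.0
    fault_free_mass = [0.0] * n
    for states in product((True, False), repeat=n):  # True = fault-free
        weight = 1.0
        for p, ok in zip(priors, states):           # prior of this hypothesis
            weight *= p if ok else 1.0 - p
        for (j, i), r in syndrome.items():          # likelihood of the syndrome
            if states[j]:
                expected = 0 if states[i] else 1
                weight *= 1.0 if r == expected else 0.0
            else:
                weight *= 0.5
        evidence += weight                          # formula of total probability
        for k in range(n):
            if states[k]:
                fault_free_mass[k] += weight
    if evidence == 0.0:
        # Only possible when some prior is exactly 0 or 1.
        raise ValueError("syndrome has zero probability under these priors")
    return [m / evidence for m in fault_free_mass], evidence  # Bayes' formula


post, p_a = posteriors([0.8, 0.8], {(0, 1): 0})
print(round(p_a, 3), [round(x, 3) for x in post])  # 0.74 [0.865, 0.973]
```

The second posterior reproduces $P_2^* \approx 0.973$; the first shows that the tester's own probability of being fault-free also rises, to about 0.865.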

General case
Let us now consider a more complex example. Assume that the system consists of five units and that its testing assignment is as shown in fig. 3.

Fig. 3. Complex example
The test results are shown in the testing graph next to the corresponding edges. Let us also assume that the prior probabilities of the fault-free states of the units are $P_i = 0.8$, $i = 1, \dots, 5$.
Using the method presented above, the following posterior probabilities of the units' fault-free states are obtained: $P_1^* = 0.883$, $P_2^* = 0.939$, $P_3^* = 0.941$, $P_4^* = 0.055$, $P_5^* = 0.008$. Analysis of these values leads to the decision that units $u_1$, $u_2$ and $u_3$ are fault-free, whereas units $u_4$ and $u_5$ are faulty.
Usually, the decision about the states of system units is made according to a chosen decision rule. A simple likelihood ratio can be suggested, such as
$$\lambda_i = \frac{P_i^*}{1 - P_i^*},$$
which is compared with a threshold to decide whether unit $u_i$ is fault-free (for example, a threshold of 1 declares $u_i$ fault-free whenever $P_i^* > 0.5$).
Following the method presented above, an algorithm can be developed for determining the probability $P_i^*$ for arbitrary testing graphs. It works as follows.
Given the testing graph $G(V, E)$ and the actual syndrome $R_F$ as input, the algorithm first determines the number $Z$ of simple cycles whose edges all have zero weight (i.e., test result 0). For each such cycle $C_z$, $z = 1, \dots, Z$, the number of vertices $L_z$ and the number of edges $K_z$ it contains are determined. Then, for each vertex of cycle $C_z$, the posterior probability of the fault-free state of the corresponding unit is computed by a formula that uses $L_z$ and $K_z$. At the last step, all the remaining tests, with results $r_{ji} = 1$, $r_{ij} = 1$ and $r_{ij} = 0$, are taken into account, and the probability $P_i^*$ is corrected accordingly. The probability $P_i^*$ is determined in this way for each system unit $u_i$, $i = 1, \dots, N$.
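As a hedged illustration of what a per-cycle formula can look like, consider a directed cycle (each vertex tests its successor) in which every test among its vertices has result 0, and ignore all other tests. Under the assumptions of the two-unit example (100% coverage; a faulty tester returns 0 or 1 with probability 1/2), a fault-free vertex forces its successor to be fault-free, so going around the cycle only two hypotheses remain: all $L_z$ units fault-free, or all faulty. Every vertex of the cycle then receives
$$P^* = \frac{\prod_{j \in C_z} P_j}{\prod_{j \in C_z} P_j + 2^{-K_z} \prod_{j \in C_z} q_j},$$
where $K_z$ counts the zero-result tests among the cycle's vertices, chords included. This is our own derivation under those assumptions and may differ from the formula used in the paper; the function name is hypothetical.

```python
from math import prod
from typing import Sequence


def cycle_posterior(cycle_priors: Sequence[float], k_edges: int) -> float:
    """Fault-free posterior shared by every vertex of a zero-weight directed cycle.

    cycle_priors: prior fault-free probabilities P_j of the L_z cycle vertices
    k_edges:      number K_z of tests (all with result 0) among those vertices
    Only two hypotheses fit the results: all vertices fault-free (likelihood 1)
    or all vertices faulty (each of the K_z results is 0 with probability 1/2).
    """
    all_ok = prod(cycle_priors)
    all_faulty = prod(1.0 - p for p in cycle_priors)
    return all_ok / (all_ok + all_faulty * 0.5 ** k_edges)


# Three units of prior 0.8 tested in a ring (L_z = K_z = 3).
print(round(cycle_posterior([0.8, 0.8, 0.8], 3), 4))
# The same ring with one extra zero-result chord (K_z = 4).
print(round(cycle_posterior([0.8, 0.8, 0.8], 4), 4))
```

The two calls give roughly 0.998 and 0.999: each additional zero-result test inside the cycle makes the all-faulty explanation less likely and so raises $P^*$.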
Having determined the posterior probabilities of the fault-free states of all system units, we can decide which units are fault-free and which are faulty. This decision is made according to the chosen decision rule.
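One possible decision rule, sketched below, compares the likelihood ratio $P_i^*/(1 - P_i^*)$ with a threshold. The default threshold of 1 (equivalent to declaring $u_i$ fault-free when $P_i^* \ge 0.5$) and the function name are assumptions made for illustration; applied to the posterior probabilities of the five-unit example, it reproduces the decision stated there.

```python
from typing import List, Sequence


def decide(posteriors: Sequence[float], threshold: float = 1.0) -> List[bool]:
    """Likelihood-ratio rule: True (fault-free) when P*/(1 - P*) >= threshold."""
    decisions = []
    for p in posteriors:
        if p >= 1.0:
            decisions.append(True)  # ratio is unbounded: certainly fault-free
        else:
            decisions.append(p / (1.0 - p) >= threshold)
    return decisions


# Posterior probabilities P_1*..P_5* of the five-unit example.
print(decide([0.883, 0.939, 0.941, 0.055, 0.008]))
# [True, True, True, False, False]
```

Raising the threshold makes the rule more conservative about declaring a unit fault-free.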
To determine the posterior probabilities of the fault-free states of system units, different hypotheses about the set of faulty units in the system were considered. Note, however, that a syndrome $R_F$ may be obtained that does not correspond to any of these hypotheses. In this case, we say that a conflict situation has occurred. Such a situation indicates that there are intermittent faults in the system [4]. Intermittently faulty units should be diagnosed with special diagnosis algorithms [5].

Conclusions
Before developing a diagnosis algorithm for system-level self-diagnosis, several issues have to be decided. First, it is necessary to choose the diagnosis strategy (i.e., unique diagnosis, sequential diagnosis, or excess diagnosis). The next issue concerns the similarity of system units (i.e., whether the system is homogeneous or heterogeneous). Finally, an assumption must be made about the allowable fault situations in the system (i.e., whether only permanent faults, only intermittent faults, or hybrid fault situations are allowed). The probabilistic algorithm suggested in this paper is developed for unique diagnosis of heterogeneous systems in which only permanent faults are allowed. The correctness of diagnosis performed with probabilistic algorithms depends considerably on the chosen decision rule and on the assumptions made about test results (e.g., the fault coverage of the test). Consequently, probabilistic algorithms give correct results only as long as these assumptions are met.