A Method for Rapid Evaluation of k-out-of-n Systems Reliability

The IEC 61508 standard can be used to evaluate the safety of k-out-of-n technical systems whose elements may remain in one of four different reliability states. Such a model makes the analytical calculations highly complex and limits its practical applicability. Therefore, a computerised method using Markov processes for estimating the reliability of k-out-of-n systems was developed. The applied computational procedure was algorithmized, which makes it possible to analyse systems with a large number of elements. An algorithm applicable to complex k-out-of-n systems was developed and used for exemplary calculations. The developed method was verified by comparing its results with those obtained from the analytical method and from a simulation method. The agreement of the results obtained with the two methods confirms the correctness of the developed procedure and of the proposed computer program, which now makes it possible to perform calculations for k-out-of-n structures with more than three elements required for the system's proper functioning and significantly accelerates the calculations. Reliability and safety are priorities in the operation of technical systems, which determines the applicability of the calculation methods described. Operational safety aspects are of particular significance when the occurrence of a failure poses a hazard to people's health and life, an ecological risk or a considerable financial loss.


Metoda szybkiej oceny niezawodności układów typu k z n

Notation used in Markov process modelling:
$S_i$: i-th state of the system; $S_P$: absorbing state; $n_U$: number of forms (types) of failures of a component; $\lambda_j$: type j failure rate; $\mu_j$: rate of repair in reference to type j failures.

Introduction
Markov processes are successfully employed [3,11,14] to assess the reliability of repairable systems with a complex reliability structure whose components' times-to-failure and times-to-repair can be described by the exponential distribution. The limitations resulting from the use of the exponential distribution to describe failure and repair processes do not significantly affect the practical applicability of these calculation procedures. An interesting approach to failure prediction is presented in [2]. In the calculation models used it is generally assumed that technical components are two-state components, i.e. they can be either in the state of availability or in the state of unavailability.
Reliability is strictly connected with functional safety of complex technical systems and the correlation between the two concepts is clear in the IEC 61508 and IEC 61511 standards [7,9].
The aim of the present article is to discuss the problems related to the safety of complex technical systems: first, to characterize the assumptions made in the calculation procedures employed in the aforementioned standards, and next, to explain how these assumptions complicate the graphs of states and the reliability calculations. With the above in view, the authors have formulated the assumptions for the algorithmization of calculation procedures using Markov processes. On the basis of this analysis they have developed a calculation program which can be used to verify calculations done following the IEC 61508 standard.
The correct operation of the proposed calculation program has been verified on several examples by comparing the results obtained with those of analytical calculations. Additionally, comparative calculations have been performed using the BlockSim (ReliaSoft) software.

Assumptions for the assessment of safety-related systems failure probability
The IEC 61508 and IEC 61511 standards were developed to meet the need for ensuring the functional safety of technical systems whose failure during operation might pose a severe hazard to the environment, and even to people's health and life.
In many cases meeting the operational safety requirements, which are often very rigorous, calls for the introduction of additional systems, known as safety-related systems. Their task is to continuously monitor selected parameters of the system used and, when these reach boundary values or when specified symptoms occur, to perform the programmed functions that prevent a hazardous event.
Therefore, the calculation methodology of the aforementioned standards is based on the operational safety requirements of the given system, most frequently described as the risk level tolerable for this system. The risk is understood here as the product of the frequency of occurrence of hazardous events and their consequences [7]. Assuming that the consequences of hazardous events are invariable, the possibility of reducing the risk from the level generated by the system used to the tolerable level depends on the functioning reliability of the additional safety-related system. Consequently, limits of the so-called mean likelihood of failure are defined for a safety-related system, and the reduction of risk to at least an admissible level depends on them [4]. This likelihood is expressed either as PFD (Probability of Failure on Demand), which is dimensionless and is determined when the safety function is invoked less frequently than once a year, or as PFH (Probability of Failure per Hour). These values are determined on the basis of the quotient [7,11]:

$$PFD \le \frac{P_A}{P_B} \qquad (1)$$

where: $P_A$ is the tolerable frequency of hazardous events and $P_B$ is the frequency of hazardous events generated by the system used.
To facilitate the use of these values and the classification of safety systems by their risk reduction potential, ranges of PFD and PFH were adopted as the so-called Safety Integrity Levels, at four levels from SIL 1 to SIL 4 [5,7,9]. In the design and monitoring of safety-related systems employed in industry it is necessary to specify the PFD or PFH values and to check whether they fall within the SIL required for the necessary risk reduction.
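The classification step can be illustrated with a short sketch. It assumes that the PFD limit is taken as the quotient $P_A / P_B$ and uses the low-demand PFD bands commonly tabulated for SIL 1 to SIL 4 in IEC 61508; the function names and the numerical frequencies are hypothetical and serve only as an illustration.

```python
# Low-demand PFD bands per SIL as commonly tabulated in IEC 61508
# (lower bound inclusive, upper bound exclusive).
SIL_BANDS_PFD = {
    4: (1e-5, 1e-4),
    3: (1e-4, 1e-3),
    2: (1e-3, 1e-2),
    1: (1e-2, 1e-1),
}


def required_pfd(tolerable_freq, generated_freq):
    """Assumed PFD limit: quotient of tolerable and generated event frequencies."""
    if generated_freq <= 0:
        raise ValueError("generated frequency must be positive")
    return tolerable_freq / generated_freq


def sil_for_pfd(pfd):
    """Return the SIL whose band contains pfd, or None if it lies outside all bands."""
    for sil, (low, high) in SIL_BANDS_PFD.items():
        if low <= pfd < high:
            return sil
    return None


# Hypothetical frequencies (events per year), chosen only for illustration.
limit = required_pfd(tolerable_freq=1e-5, generated_freq=2e-2)
print(limit, sil_for_pfd(limit))
```

A PFD value computed for a concrete subsystem can be passed to `sil_for_pfd` in the same way to check whether it falls within the required level.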
The safety-related systems are usually composed of three series-connected subsystems (figure 1) [7]: a subsystem of sensors, i.e. elements that measure the values of stated parameters; a logic subsystem, which processes the signals from the sensors and, based on programmed functions and depending on the value and number of signals, actuates the executive subsystem; and an executive subsystem, i.e. elements performing the specific safety function of preventing a hazardous event. Each subsystem has a specific k-out-of-n reliability structure, and the reliability analysis covers only the period of ordinary operation, when the component failure rate $\lambda$ is constant (non-aging components, cf. [1]). Since the subsystems are composed of electrical, electronic and programmable electronic devices (E/E/PE), they can be diagnosed during operation. This is done by diagnostic tests at time intervals $T_2$. This is important because it enables fast detection of a portion of component failures, which reduces component downtime. The portion of all dangerous failures of a component that can be detected by a diagnostic test is defined by the diagnostic coverage of the component (DC). Assuming that failures detectable by a diagnostic test and those that are undetectable occur independently, the following can be written:

$$\lambda_D = \lambda_{DD} + \lambda_{DU} \qquad (2)$$

When $\lambda_D$ and DC are known, one can calculate:

$$\lambda_{DD} = DC \cdot \lambda_D \qquad (3)$$

and:

$$\lambda_{DU} = (1 - DC) \cdot \lambda_D \qquad (4)$$

All component failures undetected by the diagnostic test are detected in a periodical test carried out at time interval $T_1$, where $T_1 \gg T_2$. In the periodical test all components are checked, and the test is assumed to be 100% efficient in detecting any type of component failure.
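The split of the dangerous failure rate by diagnostic coverage can be sketched in a few lines. The function name and the numerical values below are hypothetical; only the relations between $\lambda_D$, DC, $\lambda_{DD}$ and $\lambda_{DU}$ follow the assumptions described above.

```python
def split_dangerous_failure_rate(lambda_d, dc):
    """Split lambda_D into detectable (DD) and undetectable (DU) parts.

    lambda_d -- dangerous failure rate of a component [1/h]
    dc       -- diagnostic coverage, a fraction between 0 and 1
    """
    if not 0.0 <= dc <= 1.0:
        raise ValueError("diagnostic coverage must lie in [0, 1]")
    lambda_dd = dc * lambda_d
    lambda_du = (1.0 - dc) * lambda_d
    return lambda_dd, lambda_du


# Hypothetical component: lambda_D = 2e-6 1/h with 90% diagnostic coverage.
lambda_dd, lambda_du = split_dangerous_failure_rate(2e-6, 0.9)
print(lambda_dd, lambda_du)
```

The two returned rates always add up to the original $\lambda_D$, so the split does not change the total rate of dangerous failures.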
On the basis of the above assumptions it can be stated that any component can be in the reliability state of availability or in one of the unavailability states that result from the various possible types of failures. These states are: availability; unavailability due to a failure detectable by a diagnostic test; unavailability due to a failure undetectable by a diagnostic test; and unavailability due to both types of failure. Corresponding component repair rates are allocated to the unavailability states. In practice a single event may cause a failure of all the components in the subsystem at the same time, which, independently of the k-out-of-n structure, leads to the state of unavailability of the entire system. Failures of this type are called common cause failures. Their contribution is included in both the failures detectable ($\beta_D$) and undetectable ($\beta$) by a diagnostic test and is taken into account in the calculations [11].
In the reliability analysis of a safety-related system, the probability that the system is in the state of unavailability in the interval $(0, T_1)$ is predicted. This probability is predicted separately for each of the subsystems of a safety-related system shown in figure 1.
For such analyses, assuming constant rates of transitions (failures and repairs of components), Markov processes can be employed. In the presented case this is difficult because even a single element has four possible states on the transition graph (Fig. 2) instead of the two states, i.e. availability and unavailability, usually adopted in reliability calculations. Moreover, an increase in the number of components of the analysed system leads to a fast growth in the number of these states, which in consequence makes the analysis and calculations more difficult.

Application of Markov processes in the reliability assessment of threshold k-out-of-n structures
The reliability of threshold k-out-of-n structures in which components can be repaired while the system is being used can be assessed employing Markov processes [5,14]. Depending on the assumptions, based on either operational practice or the recommendations of reliability assessment standards, a calculation model can be built in various ways. Consequently, the number of states $n_{ES}$ in which a single component can be found is:

$$n_{ES} = n_U + 1 \qquad (7)$$

while the number of states $n_S$ in which the given system can remain is:

$$n_S = n_{ES}^{\,n} \qquad (8)$$

The absorbing state ($S_P$) is a state of total unavailability of the system. If the system is in this state, the whole of it (including all the components) is qualified for repair. In practice, the absorbing state may, but need not, be considered in the reliability model. If we assume that the system can be repaired at any time, regardless of the number of failed components and of the criterion of its availability, the absorbing state need not be included in the model. In such a case the states of availability of the system are distinguished from among all its states (from $i = 1$ to $i = n_S$) on the basis of the availability criterion adopted, in reference to the separate components that make up the system.
For the three possible forms of failures of components (described in the previous section, when $n_U = 3$), the number of states in which a component can be found is four (following formula (7)), which is shown in figure 2 [10].
Fig. 2. Transition graph of a system of 1-out-of-1 structure ($S_1$: availability state) [10]
Fig. 3. Transition graph for a system of n = 2, built of components of $n_U = 3$ (with interconnected states equal as to the system's availability and unavailability) [10]

Examples of the number of states of the entire system $n_S$, depending on the number of components n, are given in table 1. With an increasing number of a system's elements the number of its states grows fast. This makes it difficult to assess the reliability of more complex systems, since it is necessary to create a graph of transitions between the states and to identify which transitions are possible. In the case of four possible states of each component, creating a transition graph manually, even for a three-element structure, becomes very time consuming.
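The growth in the number of states can be reproduced with a minimal sketch. It assumes that each of the n components can independently be in $n_U + 1$ states, so that the system has $(n_U + 1)^n$ states without an absorbing state; the function names are hypothetical.

```python
def component_state_count(n_u):
    """Number of states of one component with n_u forms of failure."""
    return n_u + 1


def system_state_count(n, n_u=3):
    """Number of system states for n components, absorbing state not counted."""
    return component_state_count(n_u) ** n


# State counts for systems of one to six components with n_U = 3.
for n in range(1, 7):
    print(n, system_state_count(n))
```

Under this assumption every additional component multiplies the number of states by four, which is why manual construction of the transition graph quickly becomes impractical.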
An example of such a graph (created after introducing a simplified interconnection of the system's states as to its availability) for a system composed of only two components is shown in figure 3 [10], and a system of differential equations was developed on its basis [10]. The complexity and time consumption of the reliability assessment procedure employing the presented method make it necessary to search for a method of computer generation of systems of differential equations for systems built of a significant number of components, which is discussed in what follows.
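For orientation, a transition graph with constant rates leads to Kolmogorov equations of the general form sketched here. The symbols $P_i(t)$ (probability that the system is in state $S_i$ at time t) and $q_{ij}$ (rate of transition from $S_i$ to $S_j$) are notation assumed for this illustration and are not taken from [10]:

```latex
\frac{dP_i(t)}{dt} = -P_i(t) \sum_{j \ne i} q_{ij} + \sum_{j \ne i} P_j(t)\, q_{ji},
\qquad i = 1, \dots, n_S,
\qquad \sum_{i=1}^{n_S} P_i(t) = 1 .
```

The first term collects the outflow from state $S_i$ and the second the inflow from all other states, so one equation has to be written for every vertex of the graph.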

Algorithmization of reliability assessment procedure using Markov processes
Since the number of a system's states is large even when the number of components constituting the system is small, it is quite difficult to evaluate the probability of the system's transition into the state of unavailability. An example of the complexity of the problem for a system composed of one, two and three components is shown in the form of simplified graphs of transitions between states in figure 4. The graph vertices representing the system's states are numbered and marked with points, and the edges indicating the paths of the system's state transitions are marked with lines. For a large number of components the figures become less legible.
The rising level of complexity of the considered systems justifies the need to develop a computation program, which, due to the large number of variables, requires an adequate form of their notation. A system's state is a sequence of the states of its particular components, which can be noted as:

$$S_{SYS,i} = (S_{OT,1}, S_{OT,2}, \dots, S_{OT,n}) \qquad (9)$$

where: $S_{SYS,i}$ is the i-th state of the system and $S_{OT,j}$ are the states of particular components.
Each of the components can be in one of four states, which is noted in binary code using two digits (00 represents both types of failures, 01 or 10 represents one of the types of failures, and 11 represents the state of availability). With this notation rule adopted, particular states of the system are state vectors whose number of elements is twice the number of components in the system, and each element of such a sequence has the value of zero or one. Assuming that the system's states are numbered from zero to $4^n - 1$, the proposed form of notation is a one-to-one transformation between the state number i and its state vector, in which the decimal value is converted into a binary value, or conversely (e.g. i = 0 is transformed into (00) in binary and i = 3 into (11), as in figure 4a). The proposed form of notation provides a considerable saving in computer memory and simplifies the computation algorithm.
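The notation described above can be sketched as follows. The function names are hypothetical, and the ordering of bits (most significant first, two consecutive bits per component) is an assumption made for this illustration.

```python
def index_to_state_vector(i, n):
    """Convert state number i (0 .. 4**n - 1) into 2n bits, most significant first."""
    if not 0 <= i < 4 ** n:
        raise ValueError("state number out of range")
    return [(i >> shift) & 1 for shift in range(2 * n - 1, -1, -1)]


def state_vector_to_index(bits):
    """Inverse conversion: list of bits back to the state number."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def component_states(i, n):
    """Split the state vector into n two-bit component codes."""
    bits = index_to_state_vector(i, n)
    return [(bits[2 * j], bits[2 * j + 1]) for j in range(n)]


def is_available(i, n, k):
    """k-out-of-n criterion: at least k components carry the code 11."""
    up = sum(1 for code in component_states(i, n) if code == (1, 1))
    return up >= k


print(index_to_state_vector(3, 1))   # single component, availability state
print(component_states(27, 3))       # three components
print(is_available(7, 2, 1), is_available(7, 2, 2))
```

Because the state number itself carries all the information, no separate table of state vectors has to be stored, which is consistent with the memory saving mentioned above.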

Fig. 4. Graphs of transitions between states of systems with different numbers of components (a: single-element system, b: two-element system, c: three-element system)
The algorithm written as a block diagram is shown in figure 5 and its procedure is described below. The input data are the failure and repair rates $\lambda_{DD}$, $\lambda_{DU}$, $\mu_{DD}$, $\mu_{DU}$ of particular components of the system, the number of the system's components n, the minimal number of available components required for the system's state of availability k, and the time horizon $T_H$. An intermediate result of the program's operation is the matrix of coefficients of the Kolmogorov system of differential equations, and the final result is the probability of the system being in the unavailability state $P_{Nzdat}$.
To determine the values of the matrix of coefficients $M_{wsp}$, the possible transitions between particular states and the rates of these transitions have to be determined. This is performed in two loops. In the loop whose condition is the expression $i < 4^n$, index i is transformed into the corresponding state vector $S_{akt}$ through the operation $[\,\cdot\,]_2$, which denotes the conversion of a number in the decimal system into a number in the binary system.
When the state vector $S_{akt}$ and the minimal number of components required for the system's state of availability k are known, we can verify whether it is a state of availability of the system. If it is, an element equal to the state number plus one is appended to vector $S_{zdat}$. The next loop, conditioned by the expression $j \le n$, verifies for each of the n components all the possible transitions from state $S_{akt}$ to states $S_{jNast1}$ and $S_{jNast2}$, following the cycle in figure 5, and allocates the sum of the rates of these transitions in the adequate place of the matrix of coefficients $M_{wsp}$. The rates of returns are also allocated in the adequate fields of matrix $M_{wsp}$. After all the states have been verified, matrix $M_{wsp}$ contains all the coefficients of the Kolmogorov system of equations, which is solved employing the Runge-Kutta algorithm [12], indicated in the diagram as function ODE23.
The arguments of the ODE23 function are: the matrix of coefficients $M_{wsp}$, the time horizon $T_H$ and the initial conditions $W_{t0}$. The results of the function's operation are the probabilities of the system staying in particular states, compiled in matrix $P_{zdat}$. With the numbers of the system's availability states known and compiled in vector $S_{zdat}$, the fields with the probabilities of the system being in an availability state can be separated out of matrix $P_{zdat}$. When the sum of these fields is subtracted from one, the probability of the system staying in the unavailability state $P_{Nzdat}$ is obtained. The computations and results obtained using this program are presented in section 5.
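The procedure described above can be sketched in plain Python under simplifying assumptions. This is not the authors' program: a fixed-step classical fourth-order Runge-Kutta scheme is used in place of ODE23, common cause failures are omitted, the assignment of bits to failure types is assumed, and all function names and numerical values are hypothetical.

```python
def build_rate_matrix(n, lam_dd, lam_du, mu_dd, mu_du):
    """q[i][j] is the transition rate from state i to state j; rows sum to zero."""
    size = 4 ** n
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for bit in range(2 * n):
            mask = 1 << bit
            # Assumption: odd bit positions track DD failures, even ones DU failures.
            detected = bit % 2 == 1
            if i & mask:   # bit set: failure mode absent, so it may occur
                rate = lam_dd if detected else lam_du
            else:          # bit cleared: failure present, so it may be repaired
                rate = mu_dd if detected else mu_du
            q[i][i ^ mask] += rate
            q[i][i] -= rate
    return q


def available_states(n, k):
    """States in which at least k components carry the two-bit code 11."""
    return [i for i in range(4 ** n)
            if sum(1 for c in range(n) if (i >> (2 * c)) & 3 == 3) >= k]


def derivative(p, q):
    """Right-hand side of the Kolmogorov equations: dP_j/dt = sum_i P_i q[i][j]."""
    size = len(p)
    return [sum(p[i] * q[i][j] for i in range(size)) for j in range(size)]


def rk4_solve(q, p0, horizon, step):
    """Integrate the state probabilities from 0 to horizon with fixed-step RK4."""
    p = list(p0)
    for _ in range(int(round(horizon / step))):
        k1 = derivative(p, q)
        k2 = derivative([x + 0.5 * step * d for x, d in zip(p, k1)], q)
        k3 = derivative([x + 0.5 * step * d for x, d in zip(p, k2)], q)
        k4 = derivative([x + step * d for x, d in zip(p, k3)], q)
        p = [x + step / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def koon_unavailability(k, n, lam_dd, lam_du, mu_dd, mu_du, horizon, step=0.5):
    """Probability that a k-out-of-n system is unavailable at time horizon."""
    q = build_rate_matrix(n, lam_dd, lam_du, mu_dd, mu_du)
    p0 = [0.0] * (4 ** n)
    p0[-1] = 1.0  # all bits set: every component available at t = 0
    p = rk4_solve(q, p0, horizon, step)
    return 1.0 - sum(p[i] for i in available_states(n, k))


# Hypothetical rates [1/h] and a 200 h horizon for a 1-out-of-2 structure.
print(koon_unavailability(1, 2, 1e-3, 5e-4, 0.1, 0.01, 200.0))
```

The step size has to be small relative to the largest transition rate for the fixed-step scheme to remain stable; an adaptive solver such as the one named in the text handles this automatically.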

Verification of the proposed program and comparison of computation results
In the study for this article the indices necessary for the assessment of the safety integrity level were also calculated with the use of the BlockSim (ReliaSoft) software. Applying a simulation method to the analysis of the period of correct operation in the BlockSim environment requires the entry of input data. These data basically include the system's reliability structure, the reliability characteristics of the components of the system, the simulation duration expressed in the adopted units of operation, and the number of simulation repetitions. The computations yield the values of the probability of failures of the structure's components. Further operations of the simulation program lead to the computation of the system's reliability on the basis of its reliability structure pre-declared by the analyst.
In this study the interval of the periodical test $T_1$ = 17520 h was adopted, and next the Monte Carlo simulation parameters were set. The adopted end time of operation and maintenance was 20000 h. This period was longer than $T_1$ because the results of the computation process demonstrate a considerable scatter of values at the interval boundaries compared with the values within the interval. The computation step, expressed by a time increment of 10 h, and the number of repetitions, equal to 100000 cycles of transitions, were selected as a compromise between the accuracy and the duration of the simulation. Since it was necessary to include the specified failure fractions $\lambda_{DD}$ and $\lambda_{DU}$, which make up $\lambda_D$, the computations were performed on specially prepared k-out-of-n structures of equivalent systems. A selection of these structures is presented in figure 6.
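The idea of a simulation-based estimate can be illustrated with a very small Monte Carlo sketch. It does not reproduce the BlockSim model: each failure type of a component is treated as an independent alternating process with exponentially distributed times to failure and repair, and the function names, rates, horizon and number of runs are hypothetical.

```python
import random


def bit_is_up(lam, mu, horizon, rng):
    """Simulate one failure mode up to horizon; return True if it is absent then."""
    t, up = 0.0, True
    while True:
        # Exponential sojourn time: failure rate when up, repair rate when down.
        t += rng.expovariate(lam if up else mu)
        if t >= horizon:
            return up
        up = not up


def simulate_unavailability(k, n, lam_dd, lam_du, mu_dd, mu_du,
                            horizon, runs, seed=0):
    """Estimate the probability that fewer than k of n components are up."""
    rng = random.Random(seed)
    down_runs = 0
    for _run in range(runs):
        up_components = 0
        for _component in range(n):
            dd_ok = bit_is_up(lam_dd, mu_dd, horizon, rng)
            du_ok = bit_is_up(lam_du, mu_du, horizon, rng)
            if dd_ok and du_ok:
                up_components += 1
        if up_components < k:
            down_runs += 1
    return down_runs / runs


# Hypothetical rates [1/h], 200 h horizon, 20000 repetitions, 1-out-of-1 structure.
print(simulate_unavailability(1, 1, 1e-3, 5e-4, 0.1, 0.01, 200.0, 20000))
```

The estimate fluctuates with the number of repetitions and the seed, which illustrates why simulation results are compared with analytical values only within a tolerance.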
The computations were done for two values of time $T_1$, 8760 h and 17520 h, and two values of the failure rate $\lambda_D$, expressed by the $\lambda_{DD}$ and $\lambda_{DU}$ fractions. All the components in the k-out-of-n structure were assumed to be identical and to have the same reliability parameters. The values of the parameters are presented in table 2.

Fig. 5. Simplified block diagram of the computation algorithm
Fig. 6. Selected equivalent block diagrams of the k-out-of-n structures analysed
The results obtained with the use of the method of algorithmization of Markov processes, proposed in section 4 of the article, were compared with the results of analytical calculations characterised in sections 2 and 3 and with the BlockSim simulation results.
The results of the calculations of the probability of systems of specified structures staying in unavailability state are presented in tables 3 and 4. Notation "koon" used in tables 3 and 4 and in figures 7 and 8 means "k-out-of-n".
The results of the study are presented as characteristics of the probability of a system's remaining in the unavailability state, plotted in figures with a semi-logarithmic scale.
The results of the analytical calculations and those obtained with the proposed program are comparable. The differences may result only from rounding in the numerical computations.
The results of calculations performed with the BlockSim program are pessimistic compared with those obtained by the other methods. This provides a large safety margin, which is especially justified since the simulation results are not repeatable. Significant advantages of the simulation method are that it is not particularly sensitive to the complexity of the analysed reliability structure and the related calculation difficulties, and that it generates results quickly. This method can also be employed to simulate the reliability indices of systems when the knowledge of the processes occurring during their operation is insufficient.
The simulation method used in the BlockSim package, which employs a random number generator based on L'Ecuyer's algorithm with Bays-Durham shuffling, allows the prediction of the values of reliability indices [8,13]. The calculation model built on selected operation characteristics enables the simulation of the functioning and servicing processes of components [6].

Conclusions
To verify the program written following the above algorithm, six basic k-out-of-n structures used in safety-related systems were selected. The results of the calculations performed using the proposed program and using the classical method of building a transition graph, writing the equations and solving them are presented in tables 3 and 4 and in figures 7 and 8.
The agreement of the results obtained with the two methods confirms the correctness of the developed procedure and of the proposed computer program, which now makes it possible to perform calculations for k-out-of-n structures with k > 3 and significantly accelerates the calculations. The results of the calculations performed with the BlockSim program differ from those of the other two methods, but yield values with a large safety margin, which is favourable from the practical point of view.
Reliability and safety are priorities in the operation of technical systems, which determines the applicability of the calculation methods described. Operational safety aspects are of particular significance when the occurrence of a failure poses a hazard to people's health and life, an ecological risk or a considerable financial loss.

E-mails: mlynarski_st@poczta.onet.pl, pilch@agh.edu.pl, smolnik@agh.edu.pl, szybja@agh.edu.pl, wiazania@agh.edu.pl

Fig. 7. Results of calculations of the probability of unavailability states of k-out-of-n systems after the proposed algorithm

Table 1. The number of states $n_S$ of a system built of components for which $n_U = 3$, for various numbers n of components making up the system

Table 3. Comparison of results of calculations for data group 1 and selected reliability structures (a: algorithmization method, b: analytical method, c: simulation in the BlockSim program)

Table 2. Adopted values of input parameters and formulas for computations

This work was financed by AGH University of Science and Technology, Faculty of Mechanical Engineering and Robotics, research program No. 11.11.130.174.

Table 4. Comparison of results of calculations for data group 2 and selected reliability structures (a: algorithmization method, b: analytical method, c: simulation in the BlockSim program)