On the Calculation of Functional Safety Parameters of Technical Systems

A scientific methodology, together with the theory and practice of analyzing and synthesizing the functional safety of critical programmable electronic devices and systems at all stages of their life cycle, has now been developed. The basics of the methodology are fixed in standards, and the methods of analysis and synthesis of functional safety are strictly formalized. They are based on calculating functional safety indicators with respect to failures of constituent elements and, especially, dangerous and protective failures of the system. Known calculation methods focus on determining the rate and probability of dangerous failures. The objective of the proposed method is to establish, in graph form and without solving a system of equations in operator transforms, the distribution function of the time to a dangerous or protective failure, or to any inoperable state of the system. These distribution functions determine all the necessary indicators: the mean time (and, if necessary, the variance of this time) to a dangerous or protective failure. The proposed semi-Markov (Markov) operator method makes it possible to solve a number of problems in calculating and predicting the functional safety of critical systems. The method is formalized and suitable for subsequent computer implementation. Its shortcoming is the complexity of the preparatory work needed to determine analytical expressions for the transition probabilities in Laplace-Stieltjes transforms, which testifies to the expediency of further developing graph methods that are convenient for studying the safety of complex critical systems and free of this shortcoming.
The example of applying the method is of independent value: it allows the advantages and disadvantages of ensuring functional safety by building a two-channel system without channel restart to be assessed.

Keywords: functional safety parameters, hazardous and protective failures, Markov and semi-Markov stochastic processes, weight of a path in a graph, weight of decomposition on a graph.


Introduction
The functional safety of safety-critical technical systems has been a focus of expert attention since the last century (Swir, 1986; Guller, 1991; Braband and Lennartz, 1999). The functional safety of control systems has been investigated by a number of leading scientists (Braband, 2001; Schäbe, 2002; Smith and Simpson, 2004; Gulker and Schäbe, 2006; Bouwman et al., 2009; Kayen and Schäbe, 2009). Critical (dangerous) system faults cause control errors that lead or may lead to fatalities and to unacceptable damage to the environment, the economy and the industry's public image. The widespread introduction of information technology and the development of safety-critical multi-functional hardware and software control systems have eliminated the possibility of manual or even automated identification of all possible causes of dangerous failures. A scientific methodology, theory and practice of analyzing and synthesizing the functional safety of critical programmable electronic devices and systems at all stages of their life cycle have now been developed. The basics of this methodology have been standardized across industries (IEC 61508-(1-7)-2012), including railways (EN 50126-(1-5):2017, IEC 62278-2098, IEC 62279-2016, IEC 62280-2017), nuclear energy (IEC 61513-2011, etc.), industrial networks (IEC 61784-2016, etc.) and other industries. The methods of analysis and synthesis of functional safety parameters are strictly formalized. They are based on calculating the functional safety of technical systems that may be in the following states:
i. Functioning or defective state: the state of the system in which, respectively, all requirements of the technical documentation are met or at least one of these requirements is not met.
ii. Healthy or unhealthy state: the state of the system in which, respectively, the values of all parameters characterizing the ability to perform the specified functions meet the requirements of the technical documentation or the value of at least one parameter does not.
iii. Protective state: the state of the system in which the performance of all planned system functions is disabled upon timely detection of a failure of any control element or of a breach of control safety.
iv. Hazardous state: a down state of the system in which at least one safety function is not performed.
v. Non-hazardous state: the operational or protective state of the system.
These states and their interrelations can be illustrated in set-theoretic terms. Let the complete set of safety states of the system (or of the software, when it is considered on its own) be denoted by S.
Each control system has at least two safe states: normal operation state and shutdown state (the system is off).
It is assumed that a system is free from failures in the normal operating state. A shutdown state is typically a state, in which a system does not perform system functions. A safe shutdown state is to be achieved within a relatively short period of time through the termination of system functions. The termination of system functions can be an active process (an additional function of the system).

Problem Definition
It is assumed that the mathematical modeling of functional safety parameters of the system under consideration is carried out using a semi-Markov or Markov random process and a system state graph.
The initial data are as follows:
• The system state graph $(S, H)$, where $S$ is a finite set of vertices (states) of the system and $H$ is a finite set of arcs between vertices $i, j$ (states $S_i, S_j$).
• The criterion of a dangerous failure, in the form of a set of operable or non-dangerous states $S_N \subset S$ and a set of dangerous failure states.
• The criterion of a protective failure, in the form of a set of protective states $S_P \subset S_N$ and a set of operable or non-dangerous and dangerous states.
• The square matrix $(F_{ij}(t))$ of conditional distribution functions of the time the system spends in specific states (vertices) of the graph, the adjacency matrix and the vector of initial probabilities for the ergodic or transient states.
If the behavior of the system is described by a Markov random process, it suffices to specify the matrix of transition intensities between adjacent vertices $(\lambda_{ij})$, where $\lambda_{ij}$ is the rate of failure or recovery of one element of the system in the i-th state, as a result of which the system goes into the adjacent j-th state.
The problem consists in finding formulas that make it possible, using standard procedures for finding paths and contours, to calculate a number of indicators that are essential for the rational design of safe systems: the mean time to dangerous failure $T_D$; the mean time to protective failure $T_P$; and the variance of the time to dangerous failure $D_D$ or to protective failure $D_P$ (if required for the study).
The list above does not include the indicators of the rate and probability of hazardous failures set forth in standards (IEC 61508-(1-7)-2012). They can be identified using the above safety indicators.
It is assumed that the occurrence and elimination of hazardous and protective failures can be modeled using Markov or, more generally, semi-Markov random processes. The systems under study have a large number of states and, consequently, a large number of equations, and solving large systems of equations is in many cases difficult. It is therefore preferable to determine the desired safety and dependability indicators directly on the state graph. The well-known Markov and semi-Markov graph methods (Shubinsky, 1985; Rinske and Ushakov, 1988; Shubinsky and Zamyshlyaev, 2012; Pronevich and Shved, 2018) have great advantages in terms of solution technique, since it suffices to perform well-formalized procedures for finding paths and contours on the graph once in order to identify the dependability indicators of a complex system. However, these methods are not geared towards solving problems of functional safety and, in addition, are not universal enough for a wide class of technical systems.
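The procedures for finding paths and contours on a state graph can be sketched as follows. This is a minimal illustration only, not the procedure of the cited works: the function names `simple_paths` and `simple_contours` and the four-vertex example graph are assumptions introduced here, and the sketch only enumerates paths and contours without computing their weights.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # vertex -> list of successor vertices


def simple_paths(graph: Graph, start: int, goal: int) -> List[List[int]]:
    """Enumerate all simple paths (no repeated vertex) from start to goal by DFS."""
    paths: List[List[int]] = []

    def dfs(node: int, visited: List[int]) -> None:
        if node == goal:
            paths.append(list(visited))
            return
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.append(nxt)
                dfs(nxt, visited)
                visited.pop()

    dfs(start, [start])
    return paths


def simple_contours(graph: Graph) -> List[List[int]]:
    """Enumerate simple directed cycles (contours).

    Each contour is reported once, as the vertex sequence starting
    at its smallest vertex; the closing arc back to that vertex is implied.
    """
    contours: List[List[int]] = []

    def dfs(root: int, node: int, visited: List[int]) -> None:
        for nxt in graph.get(node, []):
            if nxt == root:
                contours.append(list(visited))
            elif nxt > root and nxt not in visited:
                visited.append(nxt)
                dfs(root, nxt, visited)
                visited.pop()

    for root in sorted(graph):
        dfs(root, root, [root])
    return contours


# Hypothetical four-state graph (an assumption for illustration only).
example: Graph = {0: [1, 2, 3], 1: [3], 2: [0], 3: []}
paths_to_3 = simple_paths(example, 0, 3)
contours = simple_contours(example)
print(paths_to_3)
print(contours)
```

Exhaustive enumeration of this kind grows quickly with the number of states, so it is suited to small illustrative graphs rather than to large models.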

The Method of Solving the Problem
In some problems of calculating the safety and reliability indicators of systems, it is practical to move from describing the Markov or semi-Markov random process of system behavior with differential equations to describing it with Laplace-Stieltjes transforms (Korolyuk, 1965; Kashtanov and Kondrashova, 2012; Viktorova and Stepanyants, 2014; Schäbe and Shubinsky, 2016). The objective of the proposed method is to establish in graph form, without solving a system of equations in operator transforms, the distribution functions of the time to a dangerous or protective failure or to any non-working state of the system. From such distribution functions, all safety indicators listed in the Problem Definition above are obtained in operator form. To do this, the following input data are to be specified: the system state graph $(S, H)$, where $S$ is a finite set of vertices (states) of the system and $H$ is a finite set of arcs between vertices $i, j$; the square matrix $(F_{ij}(t))$ of conditional distribution functions of the time the system spends in particular states (vertices) of the graph; the adjacency matrix; and the vector of initial probabilities for the ergodic or transient states.
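As a reminder of how indicators follow from a distribution function given in operator form, the standard moment relations for a Laplace-Stieltjes transform can be written out. The symbol $\varphi_i(s)$ is an assumption introduced here for illustration: it denotes the Laplace-Stieltjes transform of the distribution function of the time to dangerous failure from the i-th initial state.

$$
T_D = -\left.\frac{d\varphi_i(s)}{ds}\right|_{s=0}, \qquad
D_D = \left.\frac{d^2\varphi_i(s)}{ds^2}\right|_{s=0} - T_D^2 .
$$

As a check, for an exponentially distributed time with rate $\lambda$ the transform is $\varphi(s) = \lambda/(\lambda + s)$, which gives a mean of $1/\lambda$ and a variance of $2/\lambda^2 - 1/\lambda^2 = 1/\lambda^2$, as expected. The same relations apply to the time to protective failure.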
Theorem. The distribution function of time to dangerous failure of a system, the behavior of which is described by a semi-Markov random process, in the Laplace-Stieltjes transforms at the i-th initial state (

etc. are the weights of independent contours on a graph in Laplace-Stieltjes transforms.
Proof. Korolyuk (1965) showed that the distribution function of the time a system spends in a fixed set of states $S_N$ can be obtained, in Laplace-Stieltjes transforms, from a system of equations. We transform this system into matrix form, bearing in mind that its right side is a column vector of free terms made up of one-step semi-Markov transition probabilities. In the system of equations, the elements of the column vector are the unknowns. After grouping them on the left side, we solve the system by Cramer's rule and, replacing index l with j, obtain the desired result. The theorem is proved.
• mean time to failure (any down state)

Example of Calculation
Let us illustrate the analytical findings with an example of calculating the mean time to dangerous failure. The system contains two identical and independent data processing channels, as well as diagnostic facilities that test the status of each channel and compare their outputs with a period shorter than the allowable time for detecting single failures. Information is read if the channel outputs match. Channel failures are asymmetrical. If the diagnostic facilities are operable, the failure of any one channel is identified, upon which the system is put into the state of protective failure. A failure of the diagnostic facilities is a non-dangerous failure. A subsequent channel failure then causes a dangerous failure of the system. A dangerous failure can also occur if the diagnostic facilities miss a channel failure. The graph of the safety states of a two-channel system with built-in diagnostic facilities is shown in Figure 1. The states are as follows: 0: operable state; 1: failure of the diagnostic facilities; 2: protective failure initiated by the user or by an automatic circuit when the failure of any one of the channels is detected by the standard diagnostic facilities with probability ν; 3: undetected failure of a channel due to failure or insufficient effectiveness of the diagnostic facilities (dangerous failure).
The sets of non-hazardous and protective states are, respectively, $S_N = \{0, 1, 2\}$ and $S_P = \{2\}$. The functional safety model of the two-channel system in Figure 1 involves the following logic of operation. State 0 is the initial state (all elements of the system are operable). If the diagnostic facilities fail, the system goes into state 1. If any one channel has failed and the failure was detected in time, which happens with probability ν, the system goes into the protective failure state 2 (the system does not function, the channel is under maintenance). In the case of a hidden channel failure, with probability $1 - \nu$, or of a channel failure after a failure of the diagnostic facilities (path 0-1-3), the system is put into state 3 of dangerous failure.
Solving this model consists in deriving analytically the mean time to wrong-side (dangerous) failure of such a two-channel system. Given the above, the behavior of the system is described by a Markov random process. Solving the problem requires first determining the distribution functions of the unconditional time the system spends in the specific states of the graph and, in Laplace-Stieltjes transforms, the source parameters, that is, the transition probabilities.
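For a numerical cross-check of such a model, the mean time to dangerous failure can also be obtained the conventional way, by solving the linear equations of the Markov chain with state 3 treated as absorbing. The sketch below is not the graph method proposed here. All arcs and rates in it are assumptions made for illustration, since Figure 1 is not reproduced: `lam` (failure rate of one channel), `lam_d` (failure rate of the diagnostic facilities), `nu` (detection probability) and `mu` (rate of return from the protective state 2 to state 0) are hypothetical names and values.

```python
from typing import Dict, List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mean_time_to_absorption(
    rates: Dict[Tuple[int, int], float], transient: List[int]
) -> Dict[int, float]:
    """Mean time until the chain leaves the transient set, per starting state.

    For each transient state i: q_i * m_i - sum_j q_ij * m_j = 1,
    where q_i is the total exit rate of i and j runs over transient states.
    """
    idx = {s: k for k, s in enumerate(transient)}
    n = len(transient)
    a = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        if i in idx:
            a[idx[i]][idx[i]] += rate
            if j in idx:
                a[idx[i]][idx[j]] -= rate
    m = solve_linear(a, [1.0] * n)
    return {s: m[idx[s]] for s in transient}


# Hypothetical parameters (per hour); assumptions, not values from the paper.
lam, lam_d, nu, mu = 1e-4, 1e-5, 0.99, 0.5

# Assumed arcs of the four-state graph; state 3 (dangerous failure) is absorbing.
rates = {
    (0, 1): lam_d,                # diagnostic facilities fail
    (0, 2): 2 * lam * nu,         # channel failure detected -> protective failure
    (0, 3): 2 * lam * (1 - nu),   # channel failure missed -> dangerous failure
    (1, 3): 2 * lam,              # channel failure with diagnostics down
    (2, 0): mu,                   # assumed return from the protective state
}

mean_times = mean_time_to_absorption(rates, transient=[0, 1, 2])
print(mean_times[0])  # mean time to dangerous failure from state 0, in hours
```

Under these assumed arcs the result from state 0 can be compared with the closed form $(1 + \lambda_d/(2\lambda) + 2\lambda\nu/\mu) / (\lambda_d + 2\lambda(1-\nu))$, which follows from the same three equations.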
The distribution functions of the time of the system being in the graph states in Figure  . If the elimination of a wrong-side failure does not require modification of the device, then c = 1 and the rate of elimination of the wrong-side failure is equal to the rate of recovery of the device. If the device requires a modification,