SFRs-based numerical simulation for the reliability of highly-coupled DFTS Metoda symulacji numerycznej oparta na pojęciu zakresów uszkodzeń sekwencyjnych służąca do obliczania niezawodności układów modelowanych

Event tree/ fault tree (E/FT) method is the most recognized probabilistic risk assessment tool for complex large engineering systems, while its classical formalism most often only considers pivotal events (PEs) being independent or time-independent. However, the practical difficulty regarding phased-mission system (PMS) is that the PEs always modelled by fault trees (FTs) are explicit dependent caused by shared basic events, and phase-dependent when the time interval between PEs is not negligible. In this paper, we combine the Bayesian networks (BN) with the E/FT analysis to figure such types of PMS based on the conditional probability to give expression of the phase-dependency, and further expand it by the dynamic Bayesian networks (DBN) to cope with more complex time-dependency such as functional dependency and spares. Then, two detailed examples are used to demonstrate the application of the proposed approach in complex event tree time-dependency analysis.


Introduction
Dynamic fault trees have been presented [6,7,8] as an extension of traditional static fault trees with the aim to model complex systems having sequence-and function-dependent failure behaviors.Such modeling techniques are widely used in Nuclear Power Plant (NPP) industry, space mission systems and chemical process plant where systems safety is emphatically focused.The problem is how to quantify the reliability index of complex systems modeled by highlycoupled DFTs.Markov-based methods [1,15,19] have been proved to be efficient and versatile.But these approaches are subjected to the problem of "state space explosion".To mitigate the scale of the system state space to be considered, some hierarchical methods [11,23,24] (i.e., modularization techniques) are developed.Such hierarchical approaches can greatly reduce the Markov state space using a "divide and conquer" strategy under some circumstances.Yet these techniques become unfeasible when the independent sub-modules are placed under a dynamic gate.The IE-based approach [14,18] is a combinatorial method based on enumerating the complete minimal cut sequences/sets (MCSs) of a considered DFT.In contrast to Markov-based methods, the IE-based approach is efficient since it does not require highly-coupled DFTs converted to state space forms.To calculate the reliability of a considered DFT, the complete minimal cut sequences/sets would be rewritten using the inclusion-exclusion principle.Given that a DFT has n MCSs, the IE formula would generate 2 n -1 logic products.Hence the IE-based approach is vulnerable to the problem of combinatorial explosion.
To overcome the shortcomings of existing methods mentioned above, sequence failure regions (SFR)-based numerical simulation approach is first proposed.This method relies on the complete MCSs, i.e., minimal cut sequence set (MCSS), of a considered DFT.Some achievements for finding the MCSS have been made as: Liu et al [13] proposed a series of inference rules to generate the MCSS of a given DFT; Shrestha et al [20] put forward a sorting algorithm to enumerate the MCSS; Merle et al [17] presented several temporal operators and related operation rules to deduce the structure function of a DFT which finally can be reduced to the MCSS.Actually, as to numerical simulation for reliability of a system modeled by DFT, some researchers [2,9,12,25] have made prospective studies and applications.Unfortunately, such simulation-based methods are just based on different dynamic gates.To the author's knowledge, no articles have presented a numerical simulation approach for a generalized minimal cut sequence (GMCS) as well as a highly-coupled DFT.By contrast, the proposed method is considered to be a universal numerical simulation tool for non-repairable DFTs with arbitrary distributions, including a GMCS.Results of the validation example indicate the proposed method is reasonable.

Dynamic Logic Gates
To characterize sequence-and function-dependent failure behaviors existing in many real-life systems, Dugan et al [5,6] introduced several new dynamic gates, such as Sequence Enforcing (SEQ) gate, Priority AND (PAND) gate, Function Dependent (FDEP) gate, Cold Spare (CSP) gate, Warm Spare (WSP) gate, and Hot Spare (HSP) gate.Such dynamic gates are integrated into static fault trees to form DFTs. Hence, the occurrence of a considered DFT not only depends on the combinations of basic events, but also depends on their failure orders.Figure 1 shows the commonly-used dynamic gates for a DFT.

SEQ gate: (a)
SEQ gate forces the input events to fail in a left to right order.That is to say an input event never fails before all the input ones to its left hand have already failed.As to the SEQ gate in Fig. 1  rence of a trigger event may cause other dependent components unusable, but the occurrence of dependent events does not have any effect on the trigger event.As to the FDEP gate in Fig. 1 (c), T is a trigger event of which the occurrence would cause dependent components A, B, and C unusable.CSP gate: CSP gate allows modeling of the case where the spares (d) stay at an unpowered state when the primary event operates normally, That is to say cold spares never fail before the ones to its left.Hence, the failure behavior of CSP gate is similar to SEQ gate.For the CSP gate in Fig. 1 (d), A as the primary event fails first, then the first cold spare B fails, at last the second cold spare C fails.WSP gate: Unlike CSP gate, the spares in WSP gate operate at a (e) reduced power when the primary event operates successfully.It means that warm spares can fail independently in standby state and all of the possible failure sequences may occur.HSP gate: In a HSP gate, the spares run at a full power when the (f) primary event operates normally.Its failure behaviors are logically equivalent to static AND gate.

Minimal Cut Sequence Set
In DFTs, the occurrence of the top event not only relies on the combinations of basic events, but also relies on their failure sequences.Apparently, traditional minimal cut set is unable to express such failure behaviors.To solve this problem, the concept of a minimal cut sequence (MCS) is first proposed by Tang and Dugan [21] to express the minimal failure sequence that leads to an occurrence of a DFT's top event.As to a general MCS, it can be written as A 1 →A 2 →⋯→A n where the capital letter A i represents a basic event denoting an occurrence of a failure, and the symbol "→" indicates the order of failure precedence, i.e., the left event failing before the right one.Hence, a specific MCS expresses what events and in what ways of failing sequences that leads to an occurrence of a DFT or a module.As mentioned in section 1.1, the failure of the spares always depends on the primary event.To reflect such dependence in a MCS expression, three special symbols are introduced as: A i A 0 j , A i A α j and A i A 1 j .The symbol A i A 0 j represents A j fails as a cold spare of A i and means A j fails after A i , A i A α j denotes A j fails as a warm spare of A i in standby state and implies A j fails before A i fails, and A i A 1 j indicates A j fails as a warm spare of A i after replacing the faulty primary unit and implies A j fails after A i .Obviously, as to a non-repairable DFT, the complete MCSs, i.e., minimal cut sequence set (MCSS), can characterize its failure logic.Supposing that a DFT has n MCSs, the system failure logic (SFL) can be expressed by: where, MCS i represents the i th MCS.Hence, for the WSP gate in Fig. 1 (e), the SFL can be written as: 3. SFLD and SFR

Sequence Failure Logic Diagram
A specific MCS is just a logic relationship and only provide qualitative information.To reflect the inner failure mechanisms of a MCS, a sequence failure logic diagram (SFLD) is introduced which is a graphical description of a MCS.In a SFLD, the failure behavior of an event is expressed by its time to failure, the vertical axis represents the failure sequence of a specific MCS where each event is placed according to its position located in the considered MCS, and the horizontal axis indicates time.To illustrate such SFLD, a complex DFT is introduced in Fig. 2, where three typical dynamic gates are highlycoupled together.
Applying the inference rules presented in Ref. [13], the SFL of the considered DFT can be expressed as: In this article, we use τ X to represent the time-to-failure of X in a working state at full power, and use τ X to express the time-to-failure of X in a standby state at a reduced power.Assume the system starts at t=0, and mission time is t m .Take the first MCS A→ 1  A B→C→ 0 C D for example: A starts at t=0, and then fails in the region (0, t m ); B also starts at t=0, first it must survive the primary A, and then fails after A in working state; C starts at t=0 as well, and then fails after B; D starts after C fails, and then fails before t m .And its SFLD is drawn in Fig. 3.

Sequence Failure Region
In our previous paper [10], we put forward probabilistic modelbased multi-integration formulas to quantify a GMCS and pointed out that the occurrence probability of a GMCS can be obtained by doing integration of the random variables over the valid sequential intervals referring to time to failure of components involved in a GMCS.That is, if and only if the events occur in their valid intervals that leads to occurrence of the considered GMCS.Hence, for a GMCS, such valid sequential intervals can be considered as a sequence failure region (SFR).As to a GMCS: A 1 →A 2 →⋯→A n , the SFLD with a SFR is shown in Fig. 4.
The R(τ A i ) represents the valid failure region of τ A i ; a i (1<i≤n) denotes the start point of A i considering some components do not need to start at t =0, such as cold spares, and 0≤a i ≤t m .As discussed in Ref. [10], the R(τ A i ) can be always expressed by: where φ ϕ i i is a linear expression representing the lower / upper boundary of R(τ A i ), and the explicit expressions of ϕ i and φ i are defined by the specific MCS.Note that the Eq. ( 4) never considers the cases where some warm spares fail after replacing the faulty primary units.For such cases, the reference [10] points out that it is okay to add prerequisites that the warm spares survive the faulty ones in a standby state.Supposing that A 2 is a warm spare of A 1 , then the Eq. ( 4) should be rewritten as: , , , , , , , , , , , ,

sciENcE aNd tEchNology
As to the MCS expressed by Eq. ( 4), the corresponding SFR can be expressed as: where the Ω f indicates the failure region of the considered MCS.And for the MCS expressed by Eq. ( 5), the failure region can be represented by: For a general MCS with k (k<n) warm spares failing in a working state at full power, its failure region can be also obtained inferentially from the Eq. ( 5).Here, we assume that a DFT has m MCSs, and the sequence failure region of the j th MCS can be expressed as Ω f-j .According to the Eq.(1), the system sequence failure regions (SFRs) can be expressed as: where the Ω sys represents the failure region of a system.

Theoretical Foundation-CMC
The crude Monte Carlo (CMC) method is often used to study a probability problem with a statistical simulation through converting the analytical model under study into a probabilistic model.Given a set of variables sector X={x 1 , x 2 , …, x n }, and X⊆R (n) where R (n) represents a n-dimensional real space, the failure probability of which the X occur in the failure region Ω f ={X| g(X)<0} can be calculated by: ( ) ( ) ( ), where f (X) is the joint probability density function (PDF), and I[g(X)] is an indicator function which is defined as: However, the Eq. ( 9) cannot be solved analytically when the explicit inverse function of f(X) does not exist.Thanks to the rule of large numbers, the P f can be evaluated approximately by a statistical simulation approach, i.e., CMC method, using the following statistical expression: Based on the Central Limit Theorem, the following equation must hold for any nonnegative number x: where σ P f  2 is the variance of the P f  , and As N is chosen large enough, we can get the approximate equation as: lim , where (1−α) is the confidence level.Then, the absolute error for the P f  can be evaluated by: where the z α /2 is the quantile of the α 2 .And the relative error for the P f  can be also expressed by: Considering P f  is a small amount, the simulation number N is approximately expressed as: Obviously, given a relative error ε r and a confidence level (1−α), the simulation number N is inversely proportional to P f  .In general, the value of ε r is set as 0.1 and the confidence level is defined as 0.95, then the simulation number N should be chosen as: N P f = 384  .

SFRs-based CMC for a highly coupled DFT
To explain the proposed method, the GMCS indicated by Eq. ( 4) is considered once again.The analytical solution to the considered GMCS can be obtained using a sequential multi-integration by: Where R +(n) represents a n-dimensional positive real space; f(τ) is the joint PDF; is the indicator func- tion, and cannot be found explicitly in some cases, and the Eq. ( 17) is calculated numerically.Note that the numerical computation complexity would reach up O (M n ), where the M is the number of equal slices of dividing R i i ( ) τ A .Hence, solving such n- embedded integral by numerical integration method is very time consuming, especially a result with a high accuracy is needed.In this section, a SFRs-based CMC for simulating the occurrence probability of GMCS is proposed.Suppose that the simulated sample sciENcE aNd tEchNology , , , n .Then, the P GMCS_f can be evaluated using the CMC method as: where the statistical indicator function (SIF) I[h(τ τ  )] is defined as: The simulated sample point τ τ  can be obtained using a random sampling approach.Given that the cumulative distribution function (CDF) of τ A i is F (τ A i ), then the τ A i can be always expressed as: And the sample point can be sampled by where ε is a uniform random number used to replace F(τ A i ) in Eq. ( 20), and ε can be obtained in [0,1] by any standard random number generator.Suppose that τ A i follows the exponential distribution with a fail- ure rate parameter λ, and the f(τ A i ), F(τ A i ) of τ A i provided by the following expressions: Then the τ A i is expressed as a function of F(τ A i ), i.e., G(F(τ A i )).
The simulation procedures for the SFRs-based CMC for a GMCS are shown in Algorithm 1. Algorithm 1.
Step 1.Let the failure number N GMCS _ f =0.
Step 2. Generate the sample point τ τ     
Step 5. Transfer to Step 2 in case that the total simulated number does not equal a given N.
Step 6.Output the occurrence probability of a given GMCS: Now we will discuss how to simulate a highly coupled DFT.As mentioned in section 3.2, an occurrence of any MCS can lead to the failure of a considered highly coupled DFT, and system failure region can be expressed as: . Given that the DFT under study is non-repairable, the system fails only once in its lifespan.That is to say at most one MCS occurs in a simulation.Given that a DFT has m MCSs and n input events.Then referring to the Algorithm 1 for a GMCS, the complete SFRs-based numerical simulation procedures for a highly coupled DFT are shown in Fig. 5, where the P syst_f is the simulated unreliability of a considered system.

Validation Example
In this section, the illustrative example in Fig. 2 is considered for a validation purpose.In the first case, suppose that all of the components are exponentially time-to-failure distributed, and their failure parameters are listed in Table 1.
Given the MCSS expressed by Eq. ( 3), the SFLD of each MCS with its SFR can be drawn.As an illustration, we present the specific SFLD of the third MCS with its SFR (Fig. 6), which represents the Obviously, the results obtained by the SFRs-based simulation method are in good agreement with those derived by the IE-based method.For the computational efficiency, the average computing time for SFRs-based approach (N=1.0e+6) is about 3.09 mins, yet the average computing time for IE-based method (M=100) reaches up 324.7 mins.Hence compared with the IE-based method, the SFRsbased simulation approach is more efficient.

A case study
The WPS (water pumping system) is a critical-safety system for PWR (Pressurized Water Reactor) and it is used to carry of the reaction heat of reactor core by pumping coolant from the water source.If the system loses its function, it will cause a severe consequence.Hence, it is quite significant to analyze the reliability of the system.
The system is operational requiring at least two pumps to be successful.The system consists of three pumps among which pump A and B are operating under normal circumstances, and C as a cold spare stays at an unpowered state.Once some pump fails, the pump C will be started by a switch D to replace the faulty one.The switch is controlled by a sensor system E which is used to detect the failure signal of the active pumps.As soon as a failure signal is received, the sensor system E will activate the cold spare C through controlling the switch D. Hence, the WSP fails if pump A or B fails after D or E fails.In addition, the sensor system E is dependent on the power suppliers P 1 and P 2 among which P 1 is the primary supplier and P 2 is a cold spare as P 1 .The simplified DFT model of the system is shown in Fig. 8.
Given that the time to failure of pumps follows the lognormal distributions, the failure parameters are: mean: μ A,B,C =15, variances: σ A =25, σ B =30, σ C =35.The switch D follows the uniform distribution in the lifespan [0, 10 4 h].Power suppliers P 1 and P 2 are the Weibull distributions with the arguments: m P 1 =2 (shape), η P 1 =80 (scale); m P 2 =2 (shape), η P 2 =100 (scale), and the sensor system E is expo- nentially distributed with failure rate λ E =1.0e-4.The system failure logic can be expressed using its MCSS by: Considering that there exist non-exponential distributions in the considered system, Markov-based approaches are not longer applicable.The IE-based method is suitable for such case, yet the IE (inclusion-exclusion) formula would generates 2 10 -1 (1023) logic terms, most complex failure behavior.Note that the component B failing in working state means that the B must survive the primary A. That is to say τ τ B > A .Hence, the SFR of the MCS can be expressed as Similarly, the specific SFRs for other MCSs are also obtained.Now we use the SFRs-based simulation method to evaluate the reliability of the considered example system.For a comparison purpose, the Markov-based approach is adopted as a benchmark.The results obtained by the two methods are shown in Fig. 7.
With the simulation sample size N=100, the ε r (relative error) of the results obtained by SFRs-based simulation and Markov-based approaches is notable.Yet with the increasing of N, ε r becomes smaller and smaller.As the simulation sample size reaches up 1.0e+4, the results derived from the two methods are matched.
Without loss of generality, the case with general distributions is also considered, where A follows the Weibull distribution with arguments (shape: m=2, scale: η=80), B is the exponential distribution (s*=4.0e-3/h,a*=2.0e-2/h),C follows the lognormal distributions with parameters (μ=15, δ=10) and D is the exponential distribution with failure rate 1.5e-2/h.Given that the Markov approach is not applicable for non-exponential distribution situations, we adopt the IE-based method as a benchmark where each cut sequence is solved numerically.The results obtained at different mission times are listed in Table 2.  and the logic terms should be further expanded into disjoint cut sequences as the repeated events appearing in different MCSs.Hence, to calculate the unreliability of the WPS system, the IE-based approach would produce tens of thousands cut sequences.It is a very tedious and error-prone process, and furthermore, as mentioned in section 4.2, the computational complexity to solve a cut sequence would reach up O (M n ).Hence, it is very time-consuming by applying the IE-based method.To make an efficient analysis of the system reliability, the SFRs-based simulation approach is applied.The results at different sampling sizes are listed in Table 3. Obviously, the SFRs-based simulation method can offer reasonable solutions efficiently.

Conclusion
In this paper, the SFRs-based numerical simulation approach is proposed to analyze a highly coupled DFT on its MCSS.This method is not only applicable for a DFT, but also applicable for a GMCS which is a significant contribution of this paper.The complete simulation procedures are provided.The results of the case study indicate the proposed method can offer reasonable solutions with an affordable computing time.
As to low probability events, the proposed method is time-consuming, which can be viewed as a disadvantage of this approach.In the feature work, we are focused on advanced sampling techniques to improve its efficiency, such as importance sampling [4,22], adaptive importance sampling [16,3], and etc.
(a), the only failure sequence is that A fails first, then B fails, and C fails last.PAND gate: PAND gate is used to detect certain failure sequences (b) of input events [7].In a PAND gate, different failure orders are permitted, but only a specific failure sequence (from left to right) leads to the fire of the gate.For the PAND gate in Fig.1 (b), there exist two failure sequences: A fails first, then B fails; B fails first, the A fails.But only the former failure order can lead to the firing of the gate.FDEP gate: FDEP gate characterizes a situation where the occur-(c)

Fig. 1
Fig. 1 Dynamic gates used in DFT

Fig. 5 .
Fig. 5. Flow chart of SFRs-based CMC method for a highly coupled DFT

Fig. 7 .
Fig. 7. Comparisons of the results under exponential distributions

Table 3 .
the results obtained by SFR simulation