Availability analysis for a multi-component system with different k-out-of-n:G warm standby subsystems subject to suspended animation

Yu Wang, Linhan Guo, Meilin Wen, Yi Yang

Industrial equipment or systems are usually constructed as a multi-component series system with k-out-of-n:G subsystems to fulfill a specified function. As a common type of standby, warm standby is considered in the multi-component series system with k-out-of-n:G standby subsystems. When a subsystem fails, the non-failed subsystems are shut off and cannot fail, which is defined as suspended animation (SA). If the SA is ignored, the non-failed subsystems are assumed to keep working during the SA time, which causes inaccuracy in the availability analysis of the system. In this paper, we focus on the SA to construct an availability model for a multi-component series system with k-out-of-n:G warm standby subsystems. Multiple continuous-time Markov chains are constructed to model the system availability. A Monte Carlo simulation has been carried out to verify our method. Several findings are obtained. 1) The failure rates of subsystems with SA and their limits are derived. 2) The closed-form expressions for the stationary availability of the system and subsystems, mean time to failure, mean time to repair and stationary failure frequency are obtained considering SA. 3) The system stationary availability is a monotone function of its parameters. 4) The SA effect on the stationary availability should be emphasized in two cases: when the value of n/k and the failure rate of active components in a k-out-of-n subsystem are both relatively large or both relatively small, and when the value of n/k and the repair rate are both relatively small.

Keywords: availability; multi-component series system; k-out-of-n:G warm standby subsystem; suspended animation; Markov process.

Science and Technology

Introduction
Industrial equipment or systems are usually constructed with k-out-of-n:G subsystems in series to fulfill a specified function [14]. The k-out-of-n:G structure is a common type of redundancy used to improve the reliability and availability of engineering systems. A k-out-of-n:G system is functional if and only if at least k out of the n components within the system are functional [31]. Two types of components exist in the k-out-of-n:G system: active components and standby components [23]. A standby component switches into the active state upon an active component failure [29]. According to the level of working load on the standby component, standby components are classified into three types: hot, cold and warm standby [1,6]. Hot standby implies that the standby component has the same failure rate as the active component. Cold standby implies that the standby component has a zero failure rate. The failure rate of a warm standby component lies between those of cold and hot standby. Warm standby thus gives a general expression for the system reliability and availability, so it is worthwhile to study the multi-component system with warm standby.
The subsystem-independence assumption can cause inaccuracy in the system availability of a multi-component system. Some studies on a series system with k-out-of-n:G subsystems assumed that the subsystems work independently [7,8,11]. When a subsystem fails, the non-failed subsystems are shut off and cannot fail, which is defined as suspended animation (SA) [4,18]. This phenomenon indicates that the subsystems are dependent. If we assume that the subsystems are independent, SA is ignored. That is to say, the non-failed subsystems are assumed to keep working during the SA time, which could result in an inaccurate estimate of the system availability.
In recent studies, some scholars have investigated the shut-off rule. The shut-off rules include SA and continuous operation (CO) [9,12]. The SA rule specifies that no component operates when the system is down. The CO rule specifies that non-failed components continue to operate even if the system is down, so a functional subsystem is not shut off because of a failed subsystem. The subsystem-independence assumption therefore has no impact on the accuracy of the system availability subject to CO. However, to obtain a more accurate availability, SA should be considered when we analyze the availability of a multi-component series system with different k-out-of-n:G warm standby subsystems. SA has been analyzed by some scholars in the series system and in the single k-out-of-n:G system.
The availability analysis for SA in a series system consisting of multiple components has been studied. Most of the studies obtained closed-form expressions for the system stationary availability. SA was first defined by Barlow and Proschan [4]. They analyzed the SA states of components in a series system and derived the system availability.
The system structure has two levels: one is the system, the other is the components. Khalil [13] studied the shut-off rules of SA and CO in a series system. The availability model was constructed for the series system with exponential lifetime components. The closed-form availability was derived based on the convolution integral. Sherwin [24] discussed the calculation of the steady-state availability for a series system with SA. Pham [22] proved that the steady-state availability of a series system subject to SA is always larger than that subject to CO. Wang and Pham [26] analyzed a series system subject to SA considering imperfect repair and the correlation of failure and repair. They assumed an arbitrary distribution of uptimes and downtimes of components and derived availability indices including the system stationary availability, mean time to failure (MTTF), mean time to repair (MTTR) and stationary failure frequency (SFF).
The following studies have considered SA in single k-out-of-n:G systems with hot standby [3], cold standby [17,25] or warm standby [27,28] components. Moghaddass et al. [20] studied a k-out-of-n:G system with hot standby components and R repairmen. They investigated the system availability under different shut-off rules and derived closed-form expressions for the system stationary availability, MTTF, and mean time to first failure. Amiri and Ghassemi-Tari [2] performed a transient analysis for the k-out-of-n:G system subject to SA. A Markov model was constructed, and the diagonalization method was used for the transient analysis. They obtained the transient availability and MTTF of the system. Moghaddass et al. [21] analyzed the availability of a homogeneous k-out-of-n:G system with hot standby components under SA considering repair priority and finite repairmen. Moghaddass and Zuo [19] modeled the SA to analyze the availability of a k-out-of-n:G cold standby system considering repair priority. Kuo et al. [15] focused on SA to analyze the availability of a k-out-of-n:G system with warm and cold standby components. The availability model was constructed using a retrial queue at the repair facility, and the stationary availability, MTTF, and MTTR were derived. Zhang et al. [32] investigated a k-out-of-(M+N):G warm standby system with two different types of components subject to SA. Xie et al. [30] analyzed a k-out-of-n:G system jointly with hot standby redundancy and spare parts. A shut-off rule mixing SA and CO was considered to analyze the system availability, and an approximation of the system stationary availability was obtained.
Recently, some researchers have studied the availability model of the multi-component series system with k-out-of-n:G subsystems. However, most of the models failed to consider the dependence among subsystems due to SA [7,8,11]. Two articles that consider SA in such a system are most related to our work. First, Cekyay and Ozekici [5] investigated the availability of a multi-component series system with k-out-of-n:G subsystems with exponential lifetime components considering SA. Only a continuous-time Markov chain (CTMC) describing the system available states at the component level was constructed, and the system stationary availability was obtained. Second, Huffman [10] studied a multi-component series system with k-out-of-n:G hot standby subsystems considering SA. The repair begins when a component fails, and the repair makes the failed component brand new. The mean up-time and down-time of the subsystems were calculated based on the result derived by Li et al. [18] and substituted into the equation proposed by Barlow and Proschan [4]:

$$A_S = \left(1 + \sum_{i=1}^{m} \frac{\theta_i}{\varphi_i}\right)^{-1} \qquad (1)$$

where $\theta_i$ and $\varphi_i$ are the failure rate and repair rate of components, respectively.
Our work is different from the works mentioned above. Most existing studies considered SA to investigate the single k-out-of-n:G standby system or the series system consisting of multiple components. We focus on the multi-component series system with different k-out-of-n:G warm standby subsystems considering SA. Although two studies [5,10] are closely related to ours, the assumption of Eq. (1) was not satisfied in Huffman's model [10]. Eq. (1) assumes that the repair times of the subsystems are independent of each other, whereas in Huffman's model the repair time of a subsystem can overlap with that of the other subsystems. The CTMC constructed by Cekyay and Ozekici [5] suffers from state space explosion if the number of component types is large. Moreover, both studies failed to obtain closed-form solutions. In our paper, we model the dependence among the repair times of the multi-component k-out-of-n:G warm standby subsystems to analyze the system availability. We use multiple CTMCs to derive the system availability at the subsystem level to avoid the state space explosion. Moreover, the closed-form solution of the system availability is obtained.
The contributions of this paper can be summarized as follows.
1) We consider the SA in a multi-component series system with different k-out-of-n:G warm standby subsystems and use multiple CTMCs to model the system.
2) We effectively avoid the state explosion by constructing the CTMC model at the subsystem level.
3) We derive the closed-form expressions for the stationary availability of the system and subsystems, MTTF, MTTR, and SFF based on the proposed CTMC model.
4) We discuss the property of the stationary availability function for k, n, failure rate, and repair rate.

The remainder of the paper is organized as follows. In section 2, the problem is described in detail, and the assumptions and notations are provided. In section 3, the mathematical model is given for the subsystem and system transition process. Then, the closed-form expressions of the stationary availability of the system and subsystems, SFF, MTTF, and MTTR are derived from the model. We also discuss the monotonicity of the system stationary availability function. In section 4, we carry out three numerical examples. The first example is a Monte Carlo simulation to verify our model. The second example is a comparison between the method with the subsystem-independence assumption and the proposed method. The third example is a sensitivity analysis for the difference between the two methods in terms of the system stationary availability. Finally, conclusions and future research are presented in section 5.

System description
We consider a multi-component series system consisting of m different k-out-of-n:G warm standby subsystems, as shown in Fig. 1. Subsystem $i$ ($i = 1, 2, \dots, m$) has $n_i$ identical and independent components. There are $k_i$ active components and $n_i - k_i$ warm standby components in subsystem $i$. Subsystem $i$ fails when fewer than $k_i$ out of $n_i$ components are functional. The system is functional only if all subsystems are functional. When a subsystem fails, the other subsystems are in the SA state. In the SA state, the non-failed subsystems cannot operate or fail. We assume that the repair of subsystem $i$ begins when the number of failed components in the subsystem becomes greater than $n_i - k_i$. When the repair is complete, the system restarts operation.
We define a transition process $\{S(t), t \ge 0\}$ to describe the system states with SA. Let $S(t) = 0$ denote that the system is operating at time $t$, and $S(t) = i$ ($i \in \{1, 2, \dots, m\}$) denote that the system is down at time $t$ due to the failure of subsystem $i$. The operating state can transit to any one of the failure states, and vice versa. A failure state cannot transit to another failure state since no failure occurs when the system is down.
The aim of this work is to consider SA in the availability analysis of a multi-component system with different k-out-of-n:G warm standby subsystems. The system availability is the probability that $S(t) = 0$. The state probability can be solved if we have the transition rates between state $S(t) = 0$ and state $S(t) = i$. The transition rate from $S(t) = i$ to $S(t) = 0$ is the repair rate of subsystem $i$. In this paper, we assume the repair rate of subsystem $i$ is a constant $\mu_i$. However, the transition rate from state $S(t) = 0$ to state $S(t) = i$ is not given directly and has to be derived.

Assumptions
(1) The system fails when any one of the subsystems fails. Subsystem $i$ fails when fewer than $k_i$ out of $n_i$ components are functional.
(2) The lifetimes of active components and standby components in subsystem $i$ are independent and exponentially distributed with the parameters $\lambda_i$ and $\lambda_i^-$, respectively, where $\lambda_i^-$ lies between 0 and $\lambda_i$. The failure of active or standby components occurs only when the components operate.
(3) Upon a failure of an active component, a standby component instantly switches into the active state with probability 1, if a standby component is available.
(4) The repair of a subsystem will not start until the number of failed components in the subsystem reaches $n_i - k_i + 1$, and the repair makes all failed components in the subsystem brand new. The repair time of subsystem $i$ is exponentially distributed with the parameter $\mu_i$.
(5) The occurrence of more than one failed subsystem is an impossible event.
As an example, we plot a sample path of $S(t)$ for a series system with two k-out-of-n:G subsystems and a corresponding path of $I_i(t)$. We describe the transition process $(X_i(t), I_i(t))$ in more detail as follows. Denote $r_i$ as the number of failed components in subsystem $i$. A failure of one component in subsystem $i$ increases $r_i$ by one, and the repair of the subsystem returns $r_i$ to zero. We present two lemmas to derive the expression of $\alpha_i(t)$ and its limit based on the process $(X_i(t), I_i(t))$.

Lemma 1
The failure rate of subsystem $i$ can be calculated by

Proof
At the state with $X_i(t) = n_i - k_i$ and $I_i(t) \neq 0$, all standby components in subsystem $i$ have failed, and one more failure of the active components in subsystem $i$ results in the failure of subsystem $i$. Therefore, the subsystem failure rate is governed by the probability of occupying this state when the subsystem is operating at time $t$. The details are provided in the following deduction:

Lemma 1 is proved.
To analyze the system stationary availability, we study the limit of $\alpha_i(t)$ by analyzing the limiting behavior of $(X_i(t), I_i(t))$. Note that Eq. (4) contains no terms for the SA states; SA affects it only through the time $t$. We first set aside the SA time spent in state $(X_i(t), I_i(t))$ and consider the operating time spent in state

On the other hand, since the state space of $Y_i(T)$ does not include the SA states corresponding to $I_i(t) = 0$, we have the conditional probability equation:

Multiplying both sides of Eq. (6) by $P(I_i(t) \neq 0)$, we have:

Substituting Eq. (7) into Eq. (3) and dividing the numerator and denominator by $P(I_i(t) \neq 0)$, we have:

Based on the CTMC $Y_i(T)$ and Eq. (8), we propose Lemma 2 to obtain the formula for the limit of $\alpha_i(t)$ as time $t$ tends to infinity.

Lemma 2
The limit of $\alpha_i(t)$ is a constant, computed as follows:

Proof
According to Eq. (8), the limit of $\alpha_i(t)$ can be derived if the limit of the state probabilities of $Y_i(T)$ is known. When the time $t$ tends to infinity, we have:

where $\mathrm{CWT}(t)$ and $\mathrm{CRT}_i(t)$ respectively denote the cumulative working time and the cumulative repair time of subsystem $i$ by time $t$, and $A_S$ is the system stationary availability. Then, we have the limiting state probabilities, which can be calculated using the Chapman-Kolmogorov equation of $Y_i(T)$:

As $Y_i(T)$ is an irreducible and aperiodic CTMC with finite state space, the stationary state probability does not depend on the initial distribution. Solving Eq. (12), we obtain the stationary state probabilities. Then, we obtain the limit of $\alpha_i(t)$ based on Eqs. (8) and (13):

Lemma 2 is proved.
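The limit in Lemma 2 can be illustrated numerically. The following is a minimal Python sketch, not the authors' derivation: it assumes that, during operating time, subsystem $i$ moves through the cycle 0, 1, ..., $n_i - k_i$, $n_i - k_i + 1$, 0 implied by Assumptions (2) to (4), leaving the state with $r$ failed components at rate $k_i \lambda_i + (n_i - k_i - r)\lambda_i^-$. The function names and the parameter values are hypothetical.

```python
def exit_rates(n, k, lam, lam_standby):
    """Rate of leaving the state with r failed components, r = 0..n-k.

    Assumed form: k active components fail at rate lam each and the
    remaining (n - k - r) warm standby components at rate lam_standby each.
    """
    return [k * lam + (n - k - r) * lam_standby for r in range(n - k + 1)]


def limiting_failure_rate(n, k, lam, lam_standby, mu):
    """Limiting subsystem failure rate, conditional on the subsystem operating."""
    rates = exit_rates(n, k, lam, lam_standby)
    # In a cyclic chain the stationary probability of a state is
    # proportional to its mean sojourn time; the last entry is the repair state.
    weights = [1.0 / q for q in rates] + [1.0 / mu]
    total = sum(weights)
    pi = [w / total for w in weights]
    operating = sum(pi[:-1])
    # One more active failure from the state with n - k failed components
    # takes the subsystem down.
    return k * lam * pi[n - k] / operating


# Hypothetical 1-out-of-2 subsystem
alpha_example = limiting_failure_rate(n=2, k=1, lam=0.1, lam_standby=0.05, mu=1.0)
```

Under these assumptions the result does not depend on the repair rate and equals the reciprocal of the subsystem's mean time to failure from the as-new state; for $n_i = k_i = 1$ it reduces to $\lambda_i$.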
Lemma 2 indicates that, after a long time, the transition rates from $S(t) = 0$ to $S(t) = i$ become the constants $\alpha_i$.

Stationary availability and other characteristics
In this subsection, we present the closed-form expressions for the stationary availability of the system and subsystems, SFF, MTTF, and MTTR.
The system availability is the probability that $S(t) = 0$. The availability of subsystem $i$ is the probability that $S(t) \neq i$. After a long time, the behavior of the stochastic process $S(t)$ can be described using a CTMC in which the system state $S(t)$ transits from 0 to $i$ with the transition rate $\alpha_i$ and from $i$ to 0 with the transition rate $\mu_i$. By solving for the limiting state probabilities, we derive the stationary availability of the system and subsystems.
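The limiting probabilities of this chain can be written out explicitly. The following worked sketch uses notation introduced here for illustration ($\pi_0$ and $\pi_i$ for the limiting probabilities of states 0 and $i$). It relies only on the transition rates $\alpha_i$ and $\mu_i$ described above and is not a transcription of the paper's numbered equations.

```latex
\begin{aligned}
\pi_0\,\alpha_i &= \pi_i\,\mu_i, \quad i = 1,\dots,m,
\qquad \pi_0 + \sum_{i=1}^{m}\pi_i = 1, \\
\Rightarrow\quad
\pi_0 &= \left(1 + \sum_{j=1}^{m}\frac{\alpha_j}{\mu_j}\right)^{-1},
\qquad
\pi_i = \frac{\alpha_i}{\mu_i}\,\pi_0 .
\end{aligned}
```

Under this sketch the system is available with probability $\pi_0$, and subsystem $i$ is available with probability $1 - \pi_i$.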
Theorem
The stationary availability of the system is:

And the stationary availability of subsystem $i$ is:

Proof
The stationary availability of the system is the probability that $S(t) = 0$ as time $t$ tends to infinity:

As to the stationary availability of subsystem $i$, the suspended time of subsystem $i$ counts as available time of subsystem $i$ because subsystem $i$ is functional in the SA state. Then, we have the stationary availability of subsystem $i$ as:

The Chapman-Kolmogorov equation of the CTMC is as follows:

As the limiting behavior of $S(t)$ is an irreducible and aperiodic CTMC with finite state space, the stationary state probability does not depend on the initial distribution. Solving Eq. (19), we have:

Substituting Eq. (20) into Eqs. (17) and (18), we have the closed-form expressions of $A_S$ and $A_i$. Then the theorem is proved.
The other characteristics, including SFF, MTTF, and MTTR, can also be derived based on the limiting behavior of $S(t)$. The failure frequency of the system is the total of the failure rates of the subsystems on condition that the system is available. MTTF and MTTR are also related to the failure rates of the subsystems. Based on the two lemmas and the theorem, we propose the following corollary for the formulas of SFF, MTTF, and MTTR.
Corollary 1
Based on the limiting behavior of $S(t)$, we have the formulas of SFF, MTTF, and MTTR as follows:

Proof
According to Lemma 1 and Lemma 2, $\alpha_i$ is defined as the failure rate of subsystem $i$ on condition that the system is operating. SFF is the total failure rate of the system when the system is operating. Thus, SFF can be calculated as the sum of the $\alpha_i$ multiplied by the probability that $S(t) = 0$ as time $t$ tends to infinity:

According to the concept of MTTF and MTTR, we have:

Based on Lemma 2, the Theorem and Eqs. (23)-(25), we have the formulas of SFF, MTTF, and MTTR. Corollary 1 is proved.
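The indices discussed in the Theorem and Corollary 1 can be computed in a few lines. This Python sketch is an illustration under stated assumptions, not the paper's own formulas: it takes the limiting rates $\alpha_i$ and repair rates $\mu_i$ as inputs, uses the stationary distribution of the two-level chain for $S(t)$, and assumes the usual renewal relations MTTF = $A_S$/SFF and MTTR = $(1 - A_S)$/SFF. The function name and the input values are hypothetical.

```python
def stationary_indices(alpha, mu):
    """alpha[i]: limiting failure rate of subsystem i; mu[i]: its repair rate."""
    ratios = [a / m for a, m in zip(alpha, mu)]
    a_sys = 1.0 / (1.0 + sum(ratios))          # P(S = 0)
    a_sub = [1.0 - r * a_sys for r in ratios]  # 1 - P(S = i)
    sff = a_sys * sum(alpha)                   # system failures per unit time
    mttf = a_sys / sff                         # mean up time per failure
    mttr = (1.0 - a_sys) / sff                 # mean down time per failure
    return {"A_S": a_sys, "A_i": a_sub, "SFF": sff, "MTTF": mttf, "MTTR": mttr}


# Hypothetical two-subsystem example
result = stationary_indices(alpha=[0.1, 0.2], mu=[1.0, 2.0])
```

With these relations MTTF simplifies to the reciprocal of the sum of the $\alpha_i$, and MTTR to the $\alpha_i$-weighted average of the mean repair times $1/\mu_i$.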
A multi-component series system with m components is a special case of the considered system. We can set $n_i = k_i = 1$, and according to Eqs. (15) and (21)-(23), we have:

The above results coincide with the work of Kuo and Zuo, in which a CTMC at the component level is constructed to solve these characteristics [16].

Analysis of the system stationary availability function
To apply the proposed method in product development, we need to analyze the effect of the input parameters, including the failure rate and repair rate of the components and the redundancy level, on the system availability so that we can improve the reliability of the product. We therefore discuss the properties of the system stationary availability function.

Corollary 2
The system stationary availability is a monotone decreasing function of $k_i$.

Proof
For convenience, we denote $\alpha_i$ as a function of $k_i$. Then, the first-order difference of $\alpha_i$ with respect to $k_i$ is positive, so $\alpha_i$ increases with $k_i$. $A_S$ is a monotone decreasing function of $\alpha_i$ according to Eq. (19). Thus, $A_S$ is a monotone decreasing function of $k_i$. Corollary 2 is proved.

We can similarly derive that $A_S$ is a monotone increasing function of $n_i$.

Corollary 3
The system stationary availability is a monotone decreasing function of $\lambda_i$.

Proof
The first-order difference of $A_S$ with respect to $\lambda_i$ is:

For the other parameters $\lambda_i^-$ and $\mu_i$, the monotonicity of $A_S$ can be similarly derived from the partial difference.
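The monotonicity statements of Corollaries 2 and 3 can be spot-checked numerically. The sketch below is only an illustration: it assumes the limiting failure rate of a subsystem is the reciprocal of the sum of the mean sojourn times $1/(k\lambda + j\lambda^-)$ for $j = 0, \dots, n - k$, and that $A_S = 1/(1 + \sum_i \alpha_i/\mu_i)$. The names `BASE`, `OTHER` and `vary` and all parameter values are hypothetical.

```python
def alpha(n, k, lam, lam_standby):
    """Assumed limiting failure rate of a k-out-of-n warm standby subsystem."""
    return 1.0 / sum(1.0 / (k * lam + j * lam_standby) for j in range(n - k + 1))


def availability(subsystems):
    """Assumed system stationary availability with suspended animation."""
    return 1.0 / (1.0 + sum(alpha(n, k, lam, ls) / mu
                            for n, k, lam, ls, mu in subsystems))


# Hypothetical (n, k, lambda, lambda_standby, mu) tuples
BASE = (8, 4, 0.01, 0.005, 0.1)
OTHER = (5, 2, 0.02, 0.01, 0.2)


def vary(field, values):
    """A_S as one field of the first subsystem sweeps over values."""
    results = []
    for value in values:
        first = list(BASE)
        first[field] = value
        results.append(availability([tuple(first), OTHER]))
    return results


by_k = vary(1, range(1, 9))                # k = 1..8 with n = 8
by_n = vary(0, range(4, 11))               # n = 4..10 with k = 4
by_lam = vary(2, [0.005, 0.01, 0.02, 0.05])
by_mu = vary(4, [0.05, 0.1, 0.2, 0.5])
```

For these inputs the sweeps over k and over the failure rate decrease, while the sweeps over n and over the repair rate increase, in line with the corollaries.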

Numerical examples

Inputs of the numerical examples
In this part, we present some examples to illustrate the proposed method. A multi-component series system with 10 k-out-of-n:G warm standby subsystems is considered as the object in the following examples. The input parameters of each subsystem, including $n_i$, $k_i$, $\lambda_i$, and $\mu_i$, are shown in Table 1. We use the same inputs for verification using a Monte Carlo simulation developed in MATLAB 2016a. We compare the methods assuming subsystem independence and subsystem dependence due to SA to find the difference between the two methods. A sensitivity analysis is performed with respect to the input parameters, including $n_i$, $k_i$, $\lambda_i$, and $\mu_i$, to provide technical insight for the reliability engineer. We coded the numerical algorithm in MATLAB 2016a. The programs were run on a PC with a 2.50 GHz processor and 4 GB of RAM.

Model verification by a Monte Carlo simulation
In this subsection, we calculate $A_S$, SFF, MTTF, and MTTR based on the analytical method proposed in section 3. A Monte Carlo simulation is carried out to verify our method. The number of replications in the simulation is $10^4$, and the time period is $10^5$ hours. The time unit is the hour. In the simulation, we record whether the system is working or under repair in each hour, so that we can carry out the instantaneous analysis. $A_S$ is calculated as the average ratio of the cumulative working time to the time period. The instantaneous availability is calculated as the average ratio of the number of working systems to the number of replications at time $t$ in the simulation. Failure frequency (FF) is calculated as the average ratio of the number of failures to the time period. MTTF (MTTR) is calculated as the average ratio of the cumulative working (repair) time to the number of failures in the time period.
We choose 5 to 10 subsystems as the object systems and compare the analytical model with the simulation to verify the analytical model proposed in this paper. The results of the analytical model and the simulation at $t = 10^5$ hours are shown in Table 2. All the relative errors between the analytical model and the simulation are at or below the $10^{-3}$ level. For the 7th combination, we plot the instantaneous availability and the FF output by the simulation in Fig. 4. The other combinations are similar to the 7th combination.
In Fig. 4(a), the simulation curve drops quickly at first and then tends to be stable. The simulation curve covers the analytical result of 0.8259, which indicates that the system tends to a stable state from the initial state. The relative error between FF and the analytical result of SFF is 0.5440% at $t = 10^5$ hours. Thus, it can be concluded that our method is accurate.
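The verification idea can be sketched as a small event-driven simulation. This is a Python illustration written for this text, not the authors' MATLAB program: it uses one long run instead of $10^4$ replications, relies on the exponential assumptions (2) to (4), and freezes all other subsystems during a repair to represent SA. The function name, the parameter tuples and the seed are hypothetical.

```python
import random


def simulate_sa(subsystems, horizon, seed=0):
    """subsystems: list of (n, k, lam, lam_standby, mu) tuples."""
    rng = random.Random(seed)
    failed = [0] * len(subsystems)   # failed components per subsystem
    up_time = down_time = 0.0
    failures = 0
    while up_time + down_time < horizon:
        # Component-failure rate of each subsystem while the system operates
        rates = [k * lam + (n - k - r) * ls
                 for (n, k, lam, ls, _), r in zip(subsystems, failed)]
        up_time += rng.expovariate(sum(rates))
        i = rng.choices(range(len(rates)), weights=rates)[0]
        failed[i] += 1
        n, k, _, _, mu = subsystems[i]
        if failed[i] == n - k + 1:   # subsystem i fails, so the system is down
            failures += 1
            # Suspended animation: no other component ages during the repair
            down_time += rng.expovariate(mu)
            failed[i] = 0            # repair renews all failed components
    total = up_time + down_time
    return up_time / total, failures / total


# Hypothetical two-subsystem system
SYSTEM = [(2, 1, 0.1, 0.05, 1.0), (3, 2, 0.05, 0.02, 0.5)]
availability_estimate, failure_frequency = simulate_sa(SYSTEM, horizon=2e5)
```

For this hypothetical system the closed-form sketch $A_S = 1/(1 + \sum_i \alpha_i/\mu_i)$ gives about 0.855, and the estimate from a long run should be close to that value.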

Comparison of the method assuming subsystem independence and the proposed method
In this subsection, we compare the methods assuming subsystem independence and subsystem dependence due to SA to find the error of the method assuming subsystem independence. We calculate the stationary availability of the system and subsystem $i$ under the condition of no suspended animation (NSA) and SA. We denote the stationary availabilities of the system and subsystem $i$ based on the NSA method as $A_S^{NSA}$ and $A_i^{NSA}$. If the SA is not considered, the stationary availability of the system and subsystems is always underestimated. From point $i = 10$ to $i = 6$ in Fig. 5, the difference decreases, which implies that SA states have a greater impact on the subsystem with lower availability.
In addition, a subsystem with lower availability causes a longer SA time for the other subsystems, which indicates the dependency among subsystems. This dependency can result in a larger difference in the availability of the other subsystems.
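The direction of the error can be illustrated with a short comparison. The sketch below rests on assumptions made here: under NSA each subsystem is treated as an independent alternating process with availability $1/(1 + \alpha_i/\mu_i)$ and the system availability is the product of these values, while under SA the system availability is $1/(1 + \sum_i \alpha_i/\mu_i)$. The NSA formulas actually used in the paper come from reference [16] and may be stated differently. The function name and inputs are hypothetical.

```python
def compare_sa_nsa(alpha, mu):
    """Return (A_S with SA, A_S with NSA, A_i with SA, A_i with NSA)."""
    ratios = [a / m for a, m in zip(alpha, mu)]
    a_sys_sa = 1.0 / (1.0 + sum(ratios))
    a_sub_sa = [1.0 - r * a_sys_sa for r in ratios]
    # Independence assumption: every subsystem alternates on its own clock
    a_sub_nsa = [1.0 / (1.0 + r) for r in ratios]
    a_sys_nsa = 1.0
    for a in a_sub_nsa:
        a_sys_nsa *= a
    return a_sys_sa, a_sys_nsa, a_sub_sa, a_sub_nsa


# Hypothetical two-subsystem example
sa, nsa, sub_sa, sub_nsa = compare_sa_nsa(alpha=[0.1, 0.2], mu=[1.0, 2.0])
```

Because the product of the terms $(1 + \alpha_i/\mu_i)$ is never smaller than $1 + \sum_i \alpha_i/\mu_i$, the NSA value cannot exceed the SA value under these assumptions, which matches the underestimation noted above.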

Sensitivity analysis for the difference between $A_S^{NSA}$ and $A_S^{SA}$
The $(n_1, k_1)$ sequence is arranged in descending order of the value of $n_1/k_1$. That is to say, for a system with a high value of n/k in the subsystem, SA should be emphasized when the failure rate is large, while for a system with a low value of n/k, SA should be emphasized when the failure rate is small.
Therefore, the SA effect is relatively strong for a system with a relatively large value of n/k in a k-out-of-n subsystem. In addition, for each curve, the difference decreases as $\mu_1$ increases, but the difference has a minimum. That is to say, the SA effect should be emphasized if both the value of n/k in a k-out-of-n subsystem and the repair rate of the subsystem are relatively small.

Conclusion
In this paper, we focus on the SA of subsystems in a multi-component series system with different k-out-of-n:G warm standby subsystems. We relax the assumption that the k-out-of-n:G subsystems are independent in the system availability analysis. A transition process focusing on one subsystem with SA is constructed to analyze its failure rate. Then, a CTMC modeling the system failure and repair transitions is constructed to derive the system stationary availability. We discuss the monotonicity of the system stationary availability function based on the obtained expression. In the numerical examples, we first verify our method by a Monte Carlo simulation. All relative errors are at or below the $10^{-3}$ level. Then, we make a comparison between the method assuming subsystem independence and the proposed method. We also perform a sensitivity analysis for the difference between the two in terms of the system stationary availability. The findings of this paper are as follows. 1) The failure rates of subsystems with SA and their limits are derived. 2) The closed-form expressions for $A_S$, $A_i$, MTTF, MTTR, and SFF considering SA are obtained. 3) The system stationary availability is a monotone function of its parameters. 4) The SA effect on the stationary availability should be emphasized in two cases: when the value of n/k and the failure rate of active components in a k-out-of-n subsystem are both relatively large or both relatively small, and when the value of n/k and the repair rate are both relatively small. In future work, arbitrarily distributed failure times and repair times will be studied for practical application. Different repair strategies for the k-out-of-n:G subsystems will be modeled for availability analysis.

Fig. 1. Configuration of the multi-component series system consisting of m different k-out-of-n:G warm standby subsystems

System availability modeling and solution
repair of the failed subsystem leads to the inverse transition. The transition diagram of the process

Fig. 2. Sample path of $S(t)$ and corresponding $I_i(t)$

follows an exponential distribution with parameter $\mu_i$. Thus, the operating and repair times can be analyzed in a new CTMC. The transition diagram of the new CTMC is shown in the dotted rectangle in Fig. 3. We denote the new CTMC as $\{Y_i(T), T \ge 0\}$ to describe the transition process among the states

Fig. 3. Transition diagram of the process

compute the mean of the instantaneous availability output by the simulation; the result is 0.8260. The relative error between the mean and the analytical result of $A_S$ is 0.012%. In Fig. 4(b), the simulation curve changes significantly from

In this subsection, we perform a sensitivity analysis for the difference between $A_S^{NSA}$ and $A_S^{SA}$ with respect to different types of parameters. We choose two combinations of parameters, $(n_1, k_1, \lambda_1)$ and $(n_1, k_1, \mu_1)$, to observe the variation of the

Fig. 4. Comparison between the analytical method and simulation in terms of the output

In Fig. 4(a), the simulation curve covers the output of the analytical method from

The results are plotted in Fig. 6. The difference $\Delta(n_1, k_1, \lambda_1)$ between $A_S^{SA}$ and $A_S^{NSA}$ is plotted in Fig. 7. Each curve in Fig. 7 rises at first and then drops, which indicates that the SA effect on availability becomes weak when the failure rate is relatively small or relatively large. On the other hand, in Fig. 7, for a low failure rate $\lambda_1$, $\Delta(n_1, k_1, \lambda_1)$ increases in the order of (8, 2), (5, 2), (8, 4), (5, 4), while $\Delta(n_1, k_1, \lambda_1)$ increases in the inverse order for a high failure rate.

(2) Difference analysis for $(n_1, k_1, \mu_1)$
The difference $\Delta(n_1, k_1, \mu_1)$ is plotted in Fig. 9. For each $\mu_1$ in this range, $\Delta(n_1, k_1, \mu_1)$ increases with the value of $n_1/k_1$.

Fig. 6. $A_S^{SA}$ and $A_S^{NSA}$

$S(t) = 0$ to $S(t) = i$ is the failure rate $\alpha_i(t)$. Finally, we can solve the state probability based on $S(t)$ after obtaining $\alpha_i(t)$.
only considers the states of the components in subsystem $i$ without the states of the components in other subsystems. The effect of SA among the subsystems can be described by $(X_i(t), I_i(t))$, derived from $S(t)$, to obtain the subsystem failure rate $\alpha_i(t)$. The relationship of $I_i(t)$ and $S(t)$ is as follows:

Table 1. Input parameters of the system

Table 2. The results of the analytical model and simulation

We denote the stationary availabilities of the system and subsystem $i$ based on the SA method as $A_S^{SA}$ and $A_i^{SA}$, respectively. According to reference [16], the formulas of NSA

Table 3. The results of $\Delta(n_1, k_1, \lambda_1)$, where $\lambda_1$ ranges from