Availability and cost-benefit evaluation for a repairable retrial system with warm standbys and priority

This paper investigates a warm standby repairable retrial system with two types of components and a single repairman, in which type 1 components have priority over type 2 components in use. The failure and repair times of each type of component are assumed to be exponentially distributed. Failed components that find the repairman busy join a retrial orbit, and the retrial time of each failed component is exponentially distributed. Using Markov process theory and the matrix-analytic method, the steady-state probabilities of the system are derived, and the steady-state availability and several other steady-state performance indices are obtained. The system parameters are estimated by a Bayesian approach. A cost-benefit ratio function is constructed from the numbers of failed components and the states of the repairman. Numerical experiments evaluate the effect of each parameter on the steady-state availability and optimize the cost-benefit ratio with the repair rate as the decision variable.


Introduction
In practical engineering applications, redundancy or standby design is commonly used to improve system reliability and availability; typical configurations include the k-out-of-n: G system, the cold standby system, the warm standby system and the hot standby system. In a k-out-of-n: G warm standby system, when a working component fails, a warm standby component, if one is available, replaces it immediately. The system works if and only if at least k components work, and each warm standby component has a non-zero failure rate that is lower than that of a working component. Such systems are applied in network design, power plants, aerospace and other fields.
Redundant systems have been widely discussed by researchers such as Singh (1989), She and Pecht (1992), Srinivasan and Subramanian (2006), Hsu et al. (2009), Amari et al. (2012), Zhao and Wu et al. (2020) and Levitin et al. (2021). In addition, the reliability modeling and analysis of redundant systems with different types of components have attracted the attention of many scholars. Zhang and Horigome (2000) analyzed the availability of a 3-out-of-4: G warm standby repairable system consisting of two types of components. Zhang et al. (2006) later extended this model to a general k-out-of-(m+n): G warm standby repairable system. El-Damcese (2009) studied a warm standby system with two types of components and common-cause failures. Wu et al. (2016) investigated a k-out-of-n: G warm standby system with many non-identical components. Recently, Wang et al. (2021) modeled a $(k_1, k_2)$-out-of-n: G system with two different types of components and the CO rule.
Retrial queues have attracted wide attention in queueing theory. Many studies describe retrial queues in applications such as cellular mobile networks in Tran-Gia and Mandjes (1997), call centers in Pustova (2010), and local area networks in Janssens (1997) and Wang and Zhang (2016). In some practical systems, there is no waiting space for failed components. When a component fails while the repair facility is busy, the failed component has to retry for repair later; including this retrial feature in reliability modeling allows the reliability of the system to be evaluated more accurately. Krishnamoorthy and Ushakumari (1999) introduced the retrial of failed components into repairable systems and analyzed the reliability of three different models. Ke et al. (2013) investigated a warm standby repairable retrial system and proposed an efficient method to calculate the system availability. Yang and Tsao (2019) considered the working vacation policy in a warm standby repairable retrial system. Gao (2021) considered a second optional repair service in a retrial system with warm standby components. Chen and Wang (2018) and  considered a retrial machine problem with working breakdowns under the N-policy and the F-policy, respectively. Although repairable retrial system models have been studied extensively, few studies consider retrial systems with different types of components, even though different types of components may serve different functions and thus better reflect practical situations.
Motivated by this, this paper studies a warm standby repairable retrial system with two types of components and priority. The contributions of this paper are as follows: (1) the retrial of failed components is introduced into a warm standby repairable system with different types of components; (2) a cost-benefit ratio function based on the failed components and the repairman's states is developed to determine the optimal repair rate.
The rest of the paper is organized as follows. Section 2 describes the model in detail. In Section 3, the steady-state availability and several steady-state performance indices are derived by probability analysis, the system parameters are estimated by a Bayesian approach, and the cost-benefit ratio function of the system is developed. Section 4 presents a numerical analysis of a retrial machine repair problem with warm standby units to illustrate the proposed model and examines the effects of different parameters on the performance indices of the system. Section 5 concludes the paper.

Model description
Consider a warm standby repairable retrial system with two types of components, named type 1 and type 2, with N components of each type. Type 1 components have priority over type 2 components in use. The system operates normally if and only if the total number of normal (non-failed) type 1 and type 2 components is at least N. When the system fails, the non-failed components remain subject to failure. The assumptions of the proposed model are as follows.
(1) At time t = 0, all type 1 components are in operating state and all type 2 components are in warm standby state. The system is in operation and the repairman is idle.
(2) If an operating component fails, it is replaced by an available warm standby type 2 component, which then becomes active.
(3) Failed components are repaired by a single repairman who repairs one component at a time. If the repairman is free when a component fails, the failed component is repaired immediately; if the repairman is busy with another failed component, the newly failed component enters the retrial orbit and retries after a random period of time until it is repaired. The retrial discipline for failed components in the orbit is first in, first out (FIFO). Retrial times are exponentially distributed with parameter $\gamma$.
(4) The operating times of each type 1 component and each type 2 component are exponentially distributed with rates $\lambda_1$ and $\lambda_2$, respectively. The failure time of each warm standby type 2 component is exponentially distributed with rate $\lambda$ $(\lambda<\lambda_1<\lambda_2)$. The repair time of each failed component is exponentially distributed with rate $\mu$, and a repaired component is as good as new.
(5) Since type 1 components have a lower failure rate than type 2 components, type 1 components are used preferentially. The priority discipline is as follows: type 2 components are kept in standby when all type 1 components are normal; type 2 components start operating only when the number of normal type 1 components falls below N, the minimum number required for the system to operate; and during normal operation, once a failed type 1 component is repaired, it replaces one of the operating type 2 components, which returns to standby.
(6) All random variables are mutually independent.
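To make the priority rule of assumptions (2) and (5) concrete, the following Python sketch computes, for a given system state, how many components of each type are normal, how many type 2 components are operating or in warm standby, and whether the system is up. It is our own illustration rather than the authors' code; the names `Status` and `component_status` and the integer encoding of the repairman state (0 for idle, 1 for repairing a type 1 component, 2 for repairing a type 2 component) are hypothetical. When the system is down, the sketch lets every normal type 2 component operate, which is our reading of "the non-failed components remain subject to failure".

```python
from typing import NamedTuple


class Status(NamedTuple):
    normal1: int     # non-failed type 1 components (all of them operate)
    operating2: int  # type 2 components switched on to reach N working units
    standby2: int    # remaining normal type 2 components, kept in warm standby
    up: bool         # True if at least N components are normal


def component_status(k: int, i: int, j: int, N: int) -> Status:
    """Hypothetical helper. k: repairman state (0 idle, 1 repairing type 1,
    2 repairing type 2); i, j: failed type 1 / type 2 components in the orbit."""
    normal1 = N - i - (1 if k == 1 else 0)
    normal2 = N - j - (1 if k == 2 else 0)
    # Priority rule: all normal type 1 units operate; type 2 units only fill
    # the shortfall below N, and any left over stay in warm standby.
    operating2 = min(N - normal1, normal2)
    standby2 = normal2 - operating2
    return Status(normal1, operating2, standby2, normal1 + normal2 >= N)
```

For example, with N = 2, `component_status(0, 1, 0, 2)` returns `Status(normal1=1, operating2=1, standby2=1, up=True)`: one type 1 component is in the orbit, so one type 2 component operates and the other stays in warm standby.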

Model analysis
At time $t$, let $I(t)$ and $J(t)$ denote the numbers of failed type 1 and type 2 components in the retrial orbit, respectively, and let $K(t)$ describe the state of the repairman:
$$K(t)=\begin{cases}0, & \text{if the repairman is idle at time } t,\\ 1_1, & \text{if the repairman is repairing a failed type 1 component at time } t,\\ 1_2, & \text{if the repairman is repairing a failed type 2 component at time } t.\end{cases}$$
The state of the system at time $t$ is then $\{K(t), I(t), J(t)\}$, which is a continuous-time Markov process. Based on the model description and assumptions, the repair schematic diagram of failed components is shown in Figure 1. The transition rates between the system states are as follows.
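The state space follows from the model: with the repairman idle, at most 2N − 1 components can be in the orbit, because the orbit only grows while the repairman is busy and every busy period ends with a repaired (normal) component, so (0, N, N) cannot occur; while a type 1 (type 2) component is under repair, at most N − 1 other type 1 (type 2) components can be in the orbit. The Python sketch below enumerates these states and checks that their number equals 3N² + 4N, the order of the transition rate matrix Q. The function name `state_space` and the encoding k ∈ {0, 1, 2} for the repairman states 0, 1_1, 1_2 are our own.

```python
def state_space(N: int) -> list[tuple[int, int, int]]:
    """States (k, i, j): k = 0 idle, 1 repairing type 1 (1_1), 2 repairing type 2 (1_2);
    i, j = failed type 1 / type 2 components in the retrial orbit."""
    # Idle: any (i, j) except (N, N), since a busy period always ends with a repaired unit.
    idle = [(0, i, j) for i in range(N + 1) for j in range(N + 1) if (i, j) != (N, N)]
    # Repairing type 1: at most N - 1 other type 1 components can be in the orbit.
    busy1 = [(1, i, j) for i in range(N) for j in range(N + 1)]
    # Repairing type 2: at most N - 1 other type 2 components can be in the orbit.
    busy2 = [(2, i, j) for i in range(N + 1) for j in range(N)]
    return idle + busy1 + busy2


for N in (1, 2, 3):
    print(N, len(state_space(N)), 3 * N * N + 4 * N)  # prints 1 7 7, 2 20 20, 3 39 39
```

Counting the three groups gives (N + 1)² − 1 + 2N(N + 1) = 3N² + 4N.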
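(1) $(0,i,j)\to(1_1,i,j)$, $i=0,1,\ldots,N-1$, $j=0,1,\ldots,N$: a type 1 component fails while the repairman is idle, and the repairman starts repairing it. The transition rate is $(N-i)\lambda_1$.
(2) $(0,i,j)\to(1_2,i,j)$, $i=0,1,\ldots,N$, $j=0,1,\ldots,N-1$: a type 2 component (operating or in warm standby) fails while the repairman is idle, and the repairman starts repairing it. The transition rate is $\min(i,N-j)\lambda_2+\max(N-i-j,0)\lambda$.
(3) $(1_1,i,j)\to(1_1,i+1,j)$, $i=0,1,\ldots,N-2$, $j=0,1,\ldots,N$: the repairman is repairing a failed type 1 component, and another type 1 component fails and enters the retrial orbit. The transition rate is $(N-i-1)\lambda_1$.
(4) $(1_1,i,j)\to(1_1,i,j+1)$, $i=0,1,\ldots,N-1$, $j=0,1,\ldots,N-1$: the repairman is repairing a failed type 1 component, and a type 2 component fails and enters the retrial orbit. The transition rate is $\min(i+1,N-j)\lambda_2+\max(N-i-j-1,0)\lambda$.
(5) $(1_2,i,j)\to(1_2,i+1,j)$, $i=0,1,\ldots,N-1$, $j=0,1,\ldots,N-1$: the repairman is repairing a failed type 2 component, and a type 1 component fails and enters the retrial orbit. The transition rate is $(N-i)\lambda_1$.
(6) $(1_2,i,j)\to(1_2,i,j+1)$, $i=0,1,\ldots,N$, $j=0,1,\ldots,N-2$: the repairman is repairing a failed type 2 component, and another type 2 component fails and enters the retrial orbit. The transition rate is $\min(i,N-j-1)\lambda_2+\max(N-i-j-1,0)\lambda$.
(7) $(1_1,i,j)\to(0,i,j)$ and $(1_2,i,j)\to(0,i,j)$: the repair is completed, the number of failed components in the orbit remains unchanged, and the repairman changes from repairing a failed type 1 or type 2 component to idle. The transition rate is $\mu$.
(8) $(0,i,j)\to(1_1,i-1,j)$, $i=1,2,\ldots,N$, $(i,j)\neq(N,N)$: a failed type 1 component in the orbit retries successfully, and the repairman starts repairing it. The transition rate is $\gamma$.
(9) $(0,i,j)\to(1_2,i,j-1)$, $j=1,2,\ldots,N$, $(i,j)\neq(N,N)$: a failed type 2 component in the orbit retries successfully, and the repairman starts repairing it. The transition rate is $\gamma$.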
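The transition rates above can be collected into one function that returns the outgoing rates of a state. In this Python sketch, each non-empty class of the orbit (type 1 or type 2) makes one retrial attempt at rate γ, which is our reading of the FIFO retrial discipline and agrees with the diagonal block $Q_N$ of the transition rate matrix; the function name `transitions` and the state encoding (k = 0, 1, 2 for 0, 1_1, 1_2) are hypothetical.

```python
def transitions(state, N, lam1, lam2, lam, mu, gamma):
    """Return {target_state: rate} for one state (k, i, j).
    k = 0: idle, 1: repairing type 1 (1_1), 2: repairing type 2 (1_2)."""
    k, i, j = state
    n1 = N - i - (k == 1)                 # normal type 1 components, all operating
    n2 = N - j - (k == 2)                 # normal type 2 components
    op2 = min(N - n1, n2)                 # type 2 units operating under the priority rule
    r1 = n1 * lam1                        # rate of some type 1 failure
    r2 = op2 * lam2 + (n2 - op2) * lam    # operating or warm standby type 2 failure
    out = {}
    if k == 0:
        out[(1, i, j)] = r1               # failed type 1 goes straight to repair
        out[(2, i, j)] = r2               # failed type 2 goes straight to repair
        if i > 0:
            out[(1, i - 1, j)] = gamma    # successful type 1 retrial
        if j > 0:
            out[(2, i, j - 1)] = gamma    # successful type 2 retrial
    else:
        out[(k, i + 1, j)] = r1           # new type 1 failure joins the orbit
        out[(k, i, j + 1)] = r2           # new type 2 failure joins the orbit
        out[(0, i, j)] = mu               # repair completion
    # Drop zero-rate entries (e.g. no normal components of a type are left).
    return {t: r for t, r in out.items() if r > 0}
```

For instance, from $(0,N,0)$ the total outflow rate is $\gamma+N\lambda_2$, the first diagonal entry of $Q_N$ with the sign reversed.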

System transition rate matrix
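Based on the state transitions of the system and the division of the states into idle and busy states of the repairman, the transition rate matrix $Q$ of order $3N^2+4N$ can be written as the block matrix
$$Q=\begin{pmatrix}Q_1 & Q_2\\ Q_3 & Q_4\end{pmatrix},$$
where the first $N^2+2N$ rows and columns correspond to the idle states $(0,i,j)$ and the remaining $2N^2+2N$ to the busy states $(1_1,i,j)$ and $(1_2,i,j)$, each group ordered lexicographically in $(i,j)$. Partitioned further according to the numbers of failed type 1 and type 2 components in the retrial orbit, the sub-blocks $Q_1$, $Q_2$, $Q_3$ and $Q_4$ are as follows.
(1) Since no transition leads from one idle state to another, $Q_1$ is block diagonal with diagonal sub-blocks $Q_0$, $Q_i$ $(1\le i\le N-1)$ and $Q_N$, one for each number $i$ of failed type 1 components in the orbit. For $0\le i\le N-1$, $Q_i=\mathrm{diag}(q_{i,0},q_{i,1},\ldots,q_{i,N})$ has order $N+1$, and $Q_N=\mathrm{diag}(q_{N,0},q_{N,1},\ldots,q_{N,N-1})$ has order $N$, where
$$q_{i,j}=-\big[(\mathbb{1}_{\{i>0\}}+\mathbb{1}_{\{j>0\}})\gamma+(N-i)\lambda_1+\min(i,N-j)\lambda_2+\max(N-i-j,0)\lambda\big]$$
and $\mathbb{1}_{\{\cdot\}}$ is the indicator function. In particular, $Q_0=\mathrm{diag}\big(-N(\lambda_1+\lambda),\,-(\gamma+N\lambda_1+(N-1)\lambda),\,\ldots,\,-(\gamma+N\lambda_1)\big)$ and $Q_N=\mathrm{diag}\big(-(\gamma+N\lambda_2),\,-(2\gamma+(N-1)\lambda_2),\,\ldots,\,-(2\gamma+\lambda_2)\big)$.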
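(2) $Q_2=(Q_{21},\,Q_{22})$ is an $(N^2+2N)\times(2N^2+2N)$ matrix, where $Q_{21}$ collects the transitions from the idle states to the states $(1_1,i,j)$ and $Q_{22}$ those to the states $(1_2,i,j)$. The nonzero entries of $Q_{21}$ are $(N-i)\lambda_1$ from $(0,i,j)$ to $(1_1,i,j)$ and $\gamma$ from $(0,i,j)$ to $(1_1,i-1,j)$; the nonzero entries of $Q_{22}$ are $\min(i,N-j)\lambda_2+\max(N-i-j,0)\lambda$ from $(0,i,j)$ to $(1_2,i,j)$ and $\gamma$ from $(0,i,j)$ to $(1_2,i,j-1)$. Both sub-blocks can be further divided according to the numbers of failed type 1 and type 2 components in the retrial orbit.
(3) $Q_3=\begin{pmatrix}Q_{31}\\ Q_{32}\end{pmatrix}$, where $Q_{31}$ and $Q_{32}$ contain the repair completion rate $\mu$ from $(1_1,i,j)$ and $(1_2,i,j)$, respectively, to $(0,i,j)$, and zeros elsewhere.
(4) $Q_4=\mathrm{diag}(Q_{41},\,Q_{42})$, because the repairman cannot switch directly from repairing one type of component to repairing the other. $Q_{41}$ contains the rates $(N-i-1)\lambda_1$ from $(1_1,i,j)$ to $(1_1,i+1,j)$ and $\min(i+1,N-j)\lambda_2+\max(N-i-j-1,0)\lambda$ from $(1_1,i,j)$ to $(1_1,i,j+1)$; $Q_{42}$ contains the rates $(N-i)\lambda_1$ from $(1_2,i,j)$ to $(1_2,i+1,j)$ and $\min(i,N-j-1)\lambda_2+\max(N-i-j-1,0)\lambda$ from $(1_2,i,j)$ to $(1_2,i,j+1)$. Each diagonal entry of $Q_4$ equals minus the total outflow rate of the corresponding state, that is, $-\mu$ minus the total failure rate in that state.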
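The block structure of $Q$ can be checked numerically. The Python sketch below (NumPy; the function name `build_Q` is hypothetical) orders the idle states first and then the states $1_1$ and $1_2$, each lexicographically in $(i,j)$, which is our assumption about the ordering. It fills in the transition rates of the model, treating each non-empty class of the orbit as retrying at rate $\gamma$, and sets each diagonal entry to minus the total outflow of its row.

```python
import numpy as np


def build_Q(N, lam1, lam2, lam, mu, gamma):
    """Generator of {K(t), I(t), J(t)}; k = 0 idle, 1 for 1_1, 2 for 1_2 (our encoding).
    States are ordered: idle block, then 1_1 block, then 1_2 block, each by (i, j)."""
    idle = [(0, i, j) for i in range(N + 1) for j in range(N + 1) if (i, j) != (N, N)]
    busy1 = [(1, i, j) for i in range(N) for j in range(N + 1)]
    busy2 = [(2, i, j) for i in range(N + 1) for j in range(N)]
    S = idle + busy1 + busy2
    idx = {s: n for n, s in enumerate(S)}
    Q = np.zeros((len(S), len(S)))
    for s in S:
        k, i, j = s
        n1, n2 = N - i - (k == 1), N - j - (k == 2)   # normal type 1 / type 2 units
        op2 = min(N - n1, n2)                          # operating type 2 (priority rule)
        r1, r2 = n1 * lam1, op2 * lam2 + (n2 - op2) * lam
        if k == 0:   # idle: a failure starts service at once; retrials may succeed
            moves = [((1, i, j), r1), ((2, i, j), r2),
                     ((1, i - 1, j), gamma if i > 0 else 0.0),
                     ((2, i, j - 1), gamma if j > 0 else 0.0)]
        else:        # busy: failures join the orbit; repair completes at rate mu
            moves = [((k, i + 1, j), r1), ((k, i, j + 1), r2), ((0, i, j), mu)]
        for target, rate in moves:
            if rate > 0:                               # zero-rate targets may lie outside S
                Q[idx[s], idx[target]] += rate
    np.fill_diagonal(Q, -Q.sum(axis=1))                # diagonal = minus total outflow
    return S, Q


# Example: N = 3 gives 3N^2 + 4N = 39 states.
S, Q = build_Q(3, 0.10, 0.16, 0.008, 1.0, 1.2)
```

With $m_0=N^2+2N$ idle states and $m_1=N(N+1)$ states of type $1_1$, one can verify that `Q[:m0, :m0]` (that is, $Q_1$) is diagonal, that the blocks of `Q[m0:, m0:]` (that is, $Q_4$) linking $1_1$ and $1_2$ states are zero, and that the last $N$ diagonal entries of $Q_1$ reproduce $Q_N$.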
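Let $\pi$ be the steady-state probability vector of the process, determined by $\pi Q=\mathbf{0}$ and $\pi\mathbf{e}=1$, where $\mathbf{e}$ is a column vector of ones, and let $P(k,i,j)$ denote the steady-state probability of state $(k,i,j)$. The system is up when at least $N$ components are normal, that is, when $i+j\le N$ in an idle state and $i+j\le N-1$ in a busy state. The steady-state performance indices are as follows.
• The steady-state availability of the system is $A(\infty)=\sum_{i+j\le N}P(0,i,j)+\sum_{i+j\le N-1}\big[P(1_1,i,j)+P(1_2,i,j)\big]$.
• The mean number of failed type 1 components in the orbit is $E(N_1)=\sum_{(k,i,j)} i\,P(k,i,j)$.
• The mean number of failed type 2 components in the orbit is $E(N_2)=\sum_{(k,i,j)} j\,P(k,i,j)$.
• The probability that the repairman is idle is $P_I=\sum_{(i,j)}P(0,i,j)$.
• The probability that the repairman is repairing a failed type 1 component is $P_{b_1}=\sum_{(i,j)}P(1_1,i,j)$.
• The probability that the repairman is repairing a failed type 2 component is $P_{b_2}=\sum_{(i,j)}P(1_2,i,j)$.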
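Given $Q$, the steady-state probabilities solve $\pi Q=\mathbf{0}$ with $\pi\mathbf{e}=1$, and the indices above are weighted sums of $\pi$. The paper uses a matrix-analytic method; the self-contained Python sketch below instead solves the linear system directly with `numpy.linalg.solve` after replacing one balance equation by the normalization condition, which is adequate for small $N$. The names `build_Q`, `stationary` and `performance`, and the state encoding (k = 0, 1, 2 for 0, 1_1, 1_2), are hypothetical.

```python
import numpy as np


def build_Q(N, lam1, lam2, lam, mu, gamma):
    """Generator of {K(t), I(t), J(t)}; k = 0 idle, 1 for 1_1, 2 for 1_2 (our encoding)."""
    idle = [(0, i, j) for i in range(N + 1) for j in range(N + 1) if (i, j) != (N, N)]
    busy1 = [(1, i, j) for i in range(N) for j in range(N + 1)]
    busy2 = [(2, i, j) for i in range(N + 1) for j in range(N)]
    S = idle + busy1 + busy2
    idx = {s: n for n, s in enumerate(S)}
    Q = np.zeros((len(S), len(S)))
    for s in S:
        k, i, j = s
        n1, n2 = N - i - (k == 1), N - j - (k == 2)   # normal type 1 / type 2 units
        op2 = min(N - n1, n2)                          # operating type 2 (priority rule)
        r1, r2 = n1 * lam1, op2 * lam2 + (n2 - op2) * lam
        if k == 0:
            moves = [((1, i, j), r1), ((2, i, j), r2),
                     ((1, i - 1, j), gamma if i > 0 else 0.0),
                     ((2, i, j - 1), gamma if j > 0 else 0.0)]
        else:
            moves = [((k, i + 1, j), r1), ((k, i, j + 1), r2), ((0, i, j), mu)]
        for target, rate in moves:
            if rate > 0:
                Q[idx[s], idx[target]] += rate
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return S, Q


def stationary(Q):
    """Solve pi Q = 0, pi e = 1 by replacing the last balance equation."""
    A = Q.T.copy()
    A[-1, :] = 1.0
    b = np.zeros(Q.shape[0])
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def performance(N, lam1, lam2, lam, mu, gamma):
    """Steady-state indices A, E(N1), E(N2), P_I, P_b1, P_b2."""
    S, Q = build_Q(N, lam1, lam2, lam, mu, gamma)
    pi = stationary(Q)
    r = {"A": 0.0, "EN1": 0.0, "EN2": 0.0, "PI": 0.0, "Pb1": 0.0, "Pb2": 0.0}
    for (k, i, j), p in zip(S, pi):
        if i + j + (k != 0) <= N:      # at least N normal components: system up
            r["A"] += p
        r["EN1"] += i * p              # failed type 1 components in the orbit
        r["EN2"] += j * p              # failed type 2 components in the orbit
        r[("PI", "Pb1", "Pb2")[k]] += p
    return r


print(performance(2, 0.10, 0.16, 0.008, 1.0, 1.2))
```

The call at the end evaluates the indices at the default parameters of the numerical section; its output can be compared with the value $A(\infty)=0.982$ reported there.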

Cost-benefit ratio
Cost is an important index to evaluate in practical engineering and has attracted the attention of many scholars (see, for example, Kuo et al., 2014; Zhao & Guo et al., 2018; Meena et al., 2019; Kumar & Jain, 2020; Wu et al., 2021; Gao & Wang, 2021). Many factors affect the total system cost, such as the component failure rates, the retrial rate, the repair rate and the number of standby components. Meena et al. (2019) and Kumar and Jain (2020) both considered the costs of the repairman and of failed components. The former constructed a total cost function per unit time with the repair rate and the vacation rate as decision variables, whereas the latter constructed an expected total cost function at time t with respect to the repair rate. However, the benefit of the system should be considered alongside keeping the total cost low, so cost-benefit analysis is an important part of system design. Based on the cost models in Meena et al. (2019) and Kumar and Jain (2020), we develop a cost-benefit ratio optimization model, in which the cost-benefit ratio is the ratio of the expected total cost per unit time to the availability under steady-state conditions. This work mainly studies the influence of the decision variable $\mu$ on the cost-benefit ratio of the system and finds the value of $\mu$ that minimizes it.
The per-unit-time cost elements used to construct the cost model are defined as follows:
• $C_{h_1}$: cost incurred per failed type 1 component in the orbit;
• $C_{h_2}$: cost incurred per failed type 2 component in the orbit;
• $C_n$: cost of the repairman when idle;
• $C_{b_1}$: cost of the repairman when busy repairing a failed type 1 component;
• $C_{b_2}$: cost of the repairman when busy repairing a failed type 2 component;
• $C_{r_1}$: cost of the repair job for failed type 1 components performed at rate $\mu$;
• $C_{r_2}$: cost of the repair job for failed type 2 components performed at rate $\mu$.
Using the above cost elements and the corresponding system performance indices, the expected total cost per unit time, $TC(\mu)$, is formulated. Taking $TC(\mu)$ as the numerator and $A(\infty)$ as the denominator of the cost-benefit ratio, the optimization model of the cost-benefit ratio is
$$\min\; CB(\mu)=\frac{TC(\mu)}{A(\infty)}\quad\text{subject to}\quad \mu>0.$$
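The optimization model can be sketched as follows. Because the text does not spell out $TC(\mu)$ term by term, the Python sketch assumes, as a hypothesis, the linear form $TC(\mu)=C_{h_1}E(N_1)+C_{h_2}E(N_2)+C_nP_I+C_{b_1}P_{b_1}+C_{b_2}P_{b_2}+(C_{r_1}+C_{r_2})\mu$. It uses the default parameters and cost values of the numerical section and replaces the MATLAB optimization with a plain grid search over $\mu$. Under this assumed cost function, the result need not coincide with the reported $\mu^*=0.676$ and minimum ratio 120.524. The names `build_Q`, `indices`, `cost_benefit`, `optimise` and `COSTS` are hypothetical.

```python
import numpy as np

# Default cost elements of the numerical section (per unit time).
COSTS = dict(Ch1=70, Ch2=80, Cn=30, Cb1=50, Cb2=55, Cr1=40, Cr2=45)


def build_Q(N, lam1, lam2, lam, mu, gamma):
    """Generator of {K(t), I(t), J(t)}; k = 0 idle, 1 for 1_1, 2 for 1_2 (our encoding)."""
    idle = [(0, i, j) for i in range(N + 1) for j in range(N + 1) if (i, j) != (N, N)]
    busy1 = [(1, i, j) for i in range(N) for j in range(N + 1)]
    busy2 = [(2, i, j) for i in range(N + 1) for j in range(N)]
    S = idle + busy1 + busy2
    idx = {s: n for n, s in enumerate(S)}
    Q = np.zeros((len(S), len(S)))
    for s in S:
        k, i, j = s
        n1, n2 = N - i - (k == 1), N - j - (k == 2)   # normal type 1 / type 2 units
        op2 = min(N - n1, n2)                          # operating type 2 (priority rule)
        r1, r2 = n1 * lam1, op2 * lam2 + (n2 - op2) * lam
        if k == 0:
            moves = [((1, i, j), r1), ((2, i, j), r2),
                     ((1, i - 1, j), gamma if i > 0 else 0.0),
                     ((2, i, j - 1), gamma if j > 0 else 0.0)]
        else:
            moves = [((k, i + 1, j), r1), ((k, i, j + 1), r2), ((0, i, j), mu)]
        for target, rate in moves:
            if rate > 0:
                Q[idx[s], idx[target]] += rate
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return S, Q


def indices(N, lam1, lam2, lam, mu, gamma):
    """Steady-state A, E(N1), E(N2), P_I, P_b1, P_b2 from pi Q = 0, pi e = 1."""
    S, Q = build_Q(N, lam1, lam2, lam, mu, gamma)
    A = Q.T.copy()
    A[-1, :] = 1.0                                     # normalisation replaces one equation
    b = np.zeros(len(S))
    b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    r = {"A": 0.0, "EN1": 0.0, "EN2": 0.0, "PI": 0.0, "Pb1": 0.0, "Pb2": 0.0}
    for (k, i, j), p in zip(S, pi):
        if i + j + (k != 0) <= N:                      # system up
            r["A"] += p
        r["EN1"] += i * p
        r["EN2"] += j * p
        r[("PI", "Pb1", "Pb2")[k]] += p
    return r


def cost_benefit(mu, N=2, lam1=0.10, lam2=0.16, lam=0.008, gamma=1.2, c=COSTS):
    r = indices(N, lam1, lam2, lam, mu, gamma)
    # ASSUMPTION: linear cost form; the paper's exact TC(mu) may differ.
    tc = (c["Ch1"] * r["EN1"] + c["Ch2"] * r["EN2"] + c["Cn"] * r["PI"]
          + c["Cb1"] * r["Pb1"] + c["Cb2"] * r["Pb2"] + (c["Cr1"] + c["Cr2"]) * mu)
    return tc / r["A"]


def optimise(grid=np.linspace(0.05, 5.0, 496)):
    """Grid search for the repair rate minimising CB(mu) = TC(mu) / A(inf)."""
    values = [cost_benefit(m) for m in grid]
    best = int(np.argmin(values))
    return grid[best], values[best]


mu_star, cb_star = optimise()
print(f"mu* ~ {mu_star:.2f}, minimum CB ~ {cb_star:.3f}")
```

A grid with step 0.01 is coarse but transparent; a bounded scalar minimizer could replace it once the exact form of $TC(\mu)$ is fixed.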

Numerical results
In real manufacturing systems, failures of machines or production equipment can reduce production output; this can be mitigated by repair facilities and standby machines. Our model applies, for example, to a retrial machine repair system with warm standby units consisting of two type 1 units, two type 2 units and a single repair server (that is, N = 2). The system works normally only if at least two units are in operation. We take $\lambda_1=0.10$, $\lambda_2=0.16$, $\lambda=0.008$, $\mu=1.0$ and $\gamma=1.2$ as the default parameter values (they can be estimated by the Bayesian approach in Section 3.3). Using the theoretical results of Section 3, the system steady-state availability and cost-benefit ratio are analyzed numerically. With the default values, we obtain $A(\infty)=0.982$.

Numerical analysis for steady-state availability
To analyze the effect of the system parameters on $A(\infty)$, we vary them in pairs, pairing $\lambda_1$ or $\lambda_2$ with one of the remaining parameters. The results are shown in Figures 2-5. Figure 2 shows a three-dimensional plot of $A(\infty)$ as $\lambda_1$ varies from 0.04 to 0.14 and $\lambda_2$ from 0.16 to 0.26. Figure 2 shows that $A(\infty)$ decreases as $\lambda_1$ or $\lambda_2$ increases, because the system breaks down more easily as the failure rate of type 1 or type 2 units increases. The effect of $\lambda_2$ on $A(\infty)$ is smaller than that of $\lambda_1$: since type 1 units are used preferentially, type 2 units spend much of their time in warm standby, where they fail at rate $\lambda$ rather than $\lambda_2$. In Figure 3, $A(\infty)$ decreases as $\lambda$ increases from 0.006 to 0.016. In contrast, Figures 4 and 5 show that $A(\infty)$ increases as $\mu$ increases from 1.0 to 2.0 and as $\gamma$ increases from 1.2 to 2.2, respectively, because higher repair and retrial rates return failed units to the normal state faster. Therefore, increasing them improves the steady-state availability of the system. In addition, Figures 2-5 show that $\lambda_1$ and $\mu$ affect $A(\infty)$ significantly, $\lambda_2$ affects it moderately, and $\lambda$ and $\gamma$ affect it only slightly.

Numerical analysis for cost-benefit ratio
This subsection investigates the effect of the repair rate on the system cost-benefit ratio for the retrial machine repair problem with two types of units. The default cost parameters are $C_{h_1}=\$70$, $C_{h_2}=\$80$, $C_n=\$30$, $C_{b_1}=\$50$, $C_{b_2}=\$55$, $C_{r_1}=\$40$ and $C_{r_2}=\$45$, and the system cost-benefit ratio can then be computed numerically as a function of the repair rate $\mu$. Figure 6 shows the effect of $\mu$ on several performance indices of the system: $E(N_1)$, $E(N_2)$, $P_{b_1}$ and $P_{b_2}$ decrease as $\mu$ increases, whereas $P_I$ increases with $\mu$. As shown in Figure 7, the cost-benefit ratio first decreases and then increases as $\mu$ increases, so there is an optimal value $\mu^*$ that minimizes it. Using MATLAB, we obtain the optimal repair rate $\mu^*=0.676$ and the minimum cost-benefit ratio 120.524. Furthermore, the system availability is 0.951 when the cost-benefit ratio is minimized.

Conclusions
This paper proposes a warm standby repairable retrial system model with non-identical components and operating priority. The steady-state performance indices of the system are analyzed using Markov process theory and the matrix-analytic method. The basic idea of estimating the system parameters by a Bayesian approach with gamma prior distributions is provided. An optimization model that minimizes the system cost-benefit ratio with the repair rate as the decision variable is constructed. Numerical examples demonstrate how the system parameters affect the steady-state availability and how the repair rate affects the cost-benefit ratio of the system. The main numerical findings are as follows. (i) The steady-state availability is more sensitive to the failure rate of type 1 components than to that of type 2 components. (ii) The optimization analysis yields the optimal repair rate, which minimizes the cost-benefit ratio of the system. These results can serve as a reference for decision-makers when selecting parameter values in system design. The main limitations of the model are the exponential distribution assumptions and the equal numbers of type 1 and type 2 components. One direction for future work is to let the failure and repair times follow phase-type distributions; another is to allow different numbers of type 1 and type 2 components to make the model more general.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the National Natural Science Foundation of China [Grant Number 72071175, 72001070].