Reliability Comparative Evaluation of Active Redundancy vs. Standby Redundancy

Redundancy is a commonly applied technique for improving the reliability and availability of safety critical systems and operational impact systems in the railroad and mass transit industry. This paper introduces two basic but different types of parallel redundancy, active redundancy and standby redundancy, and studies them according to the mechanism built into the system. The pros and cons of each type are discussed. The Markov model technique is used to derive the Mean Time Between Failure (MTBF) for both active and standby redundancy, and the two are then compared from a reliability point of view.


Introduction
In the railroad and mass transit industry, safety critical functions and operational impact systems require redundancy to enhance system safety and strengthen system reliability and availability. Redundancy is defined as the existence of more than one means of accomplishing a given task or function in a system.
It should be noted that redundancy is not without penalties. Although it reduces system mission failures, redundancy increases system logistics failures. It also increases weight, space requirements, complexity, cost, and design time. The increase in complexity results in an increase in unscheduled maintenance. Thus, system safety and mission reliability are gained at the expense of adding one or more items to the unscheduled maintenance chain. The increase in unscheduled maintenance may be counteracted by reliability improvement techniques such as design simplification, derating, and the use of more reliable components.
When redundancy is incorporated in a system design, the "checkability" or diagnostic coverage must also be considered. The status of some items may not be checkable before the mission starts. Such items are then assumed to be functional at the beginning of the mission, so pre-mission failures of redundant items can go undetected. If it is not known that the redundant elements are operational before the mission starts, the purpose of redundancy can be defeated, because a mission may start without the designed redundancy (a reliability loss).
The two basic types of commonly applied redundancy are active redundancy and standby redundancy. Active redundancy does not require external components or devices to perform detection, decision and switching when an element or path in the redundant structure fails. The redundant elements are always in operation, share the system load, and automatically pick up the load of a failed element. Active redundancy is also called full-on redundancy or load-sharing redundancy in other papers. Fig. 1 shows an active redundant system configuration.
Standby redundancy requires external elements or devices to detect a failure, make a decision and switch to another element or path as a replacement for the failed element or path. Standby units can be operating (hot standby) or inactive (cold standby). Hot standby and active redundancy can be considered identical if the switching device is perfect. Fig. 2 shows a standby redundant system configuration.
Military Standard (2005) introduced the concepts of active redundancy and standby redundancy. Mok et al. (2013) presented types of redundancy, including active redundancy and standby redundancy. Mohammad et al. (2013) presented load-sharing systems using a k-out-of-n structure. Active redundancy is a 1-out-of-2 load-sharing system.
In reliability engineering practice, a decision to use redundant design techniques to improve system reliability and availability usually raises a fundamental question: which type of redundancy, active or standby, is more appropriate for achieving the required system reliability and availability? In this paper, we perform a reliability analysis that compares active redundancy with standby redundancy using the Markov model technique. The conclusions are summarized at the end of the paper.

Markov Model
The term "Markov model", named after the Russian mathematician Andrei Markov, originally referred to mathematical models in which the future state of a system depends only on its current state and not on its past history. This memoryless characteristic is the main Markov property. The other characteristic of a Markov model is stationarity. A stationary system is one in which the probabilities that govern the transitions from state to state remain constant with time (i.e. constant failure rate or repair rate). For any given system, a Markov model consists of a list of the possible states of that system, the possible transition paths between those states, and the rate parameters of those transitions.
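As an illustration of this definition, the sketch below stores a Markov model as a number of states plus a table of constant transition rates and builds the transition-rate matrix from it. The function name, the three-state layout and the rate values are hypothetical and are not taken from the paper.

```python
def generator_matrix(n_states, rates):
    """Build the transition-rate matrix of a stationary Markov model.

    rates maps (from_state, to_state) -> constant transition rate,
    with states numbered from 0.
    """
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in rates.items():
        q[i][j] = rate
    for i in range(n_states):
        # The diagonal entry is minus the total rate of leaving state i,
        # so every row of the matrix sums to zero.
        q[i][i] = -sum(q[i])
    return q


# Hypothetical rates per hour, chosen only for illustration.
LAM = 1e-3  # unit failure rate
MU = 1e-1   # unit repair rate

# State 0: both units good, state 1: one unit failed, state 2: system failed.
example_rates = {(0, 1): LAM, (1, 0): MU, (1, 2): LAM}
Q = generator_matrix(3, example_rates)

if __name__ == "__main__":
    for row in Q:
        print(row)
```

The failed state has no outgoing transitions here, so its row is all zeros; this makes it an absorbing state, which is what an MTBF calculation needs.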
The Markov model is a useful and powerful reliability analysis tool for evaluating redundant systems that have constant failure and repair rates. Klion (1977) introduced Markov approaches for full-on operation and standby operation. Military Standard (2005) introduced Markov theory. Jackson (2013) presented Markov analysis with non-constant hazard rates. Dakic (2015) presented the Markov model as one of the deductive methods among reliability quantification methods and techniques. In this paper, we use the Markov model to derive the reliability parameter Mean Time Between Failure (MTBF) for the active redundant system and the standby redundant system respectively, and then compare the two systems on that basis.

Reliability Evaluation for Active Redundancy
To analyze an active redundant system with the Markov model, a state transition diagram is shown in Fig. 3. In this diagram, state one is the initial state, in which unit A and unit B are both operating properly. State two is the state in which one unit has failed and the remaining unit is still working to keep the system operational (success). The system fails only if both unit A and unit B fail to meet the system operational requirement; state three is reached when units A and B have both failed. An assumption used in developing the state transition diagram is that unit A and unit B cannot change states simultaneously. In Fig. 3, λ is the unit failure rate and μ is the unit repair rate.
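For state one: The probability of being in state one at time t+Δt is equal to the probability of being in state one at time t and not transitioning out during Δt, plus the probability of being in state two at time t and being repaired back to state one during Δt. With both units operating, either unit can fail, so state one is left at rate 2λ, and the failed unit in state two is repaired at rate μ. This can be written as
$$P_1(t+\Delta t) = P_1(t)\,(1 - 2\lambda\,\Delta t) + P_2(t)\,\mu\,\Delta t \tag{1}$$
Rearranging by moving P1(t) from the right-hand side to the left-hand side, dividing both sides of equation (1) by Δt and letting Δt → 0, we obtain equation (2)
$$\frac{dP_1(t)}{dt} = -2\lambda\,P_1(t) + \mu\,P_2(t) \tag{2}$$
By integrating equation (2) from 0 to ∞, we obtain
$$P_1(\infty) - P_1(0) = -2\lambda \int_0^{\infty} P_1(t)\,dt + \mu \int_0^{\infty} P_2(t)\,dt$$
where the boundary conditions are P1(∞) = 0 and P1(0) = 1.
Note that the boundary value is equal to one for P1(0) and P3(∞), and zero for all other state probabilities at t = 0 and t = ∞.
T1 is defined as the expected time in state one and T2 as the expected time in state two, that is, the integrals of P1(t) and P2(t) from 0 to ∞. The integrated equation then becomes
$$-1 = -2\lambda\,T_1 + \mu\,T_2, \qquad\text{so}\qquad T_1 = \frac{1 + \mu\,T_2}{2\lambda}$$
For state two: The probability of being in state two at time t+Δt is equal to the probability of being in state one at time t and transitioning to state two in Δt, plus the probability of being in state two at time t and not transitioning out during Δt. State two is left either by failure of the remaining unit (rate λ) or by repair of the failed unit (rate μ). This can be written as
$$P_2(t+\Delta t) = P_1(t)\,2\lambda\,\Delta t + P_2(t)\,\bigl(1 - (\lambda+\mu)\,\Delta t\bigr) \tag{6}$$
Rearranging by moving P2(t) from the right-hand side to the left-hand side, dividing both sides of equation (6) by Δt and letting Δt → 0, we obtain equation (7)
$$\frac{dP_2(t)}{dt} = 2\lambda\,P_1(t) - (\lambda+\mu)\,P_2(t) \tag{7}$$
By integrating equation (7) from 0 to ∞ with P2(0) = P2(∞) = 0, we obtain
$$2\lambda\,T_1 - (\lambda+\mu)\,T_2 = 0 \tag{10}$$
Substituting T1 in equation (10) gives
$$T_2 = \frac{1}{\lambda}, \qquad T_1 = \frac{\lambda+\mu}{2\lambda^2}$$
Here, the success of the system is defined by state one and state two, and state three is the failed condition. Consequently, the MTBF is defined as the sum of the expected times in state one and state two. Mathematically, this can be written as
$$\mathrm{MTBF} = T_1 + T_2 = \frac{3\lambda+\mu}{2\lambda^2} \tag{12}$$
If the system is not maintained or is non-repairable, then setting μ = 0 in equation (12) simplifies it to
$$\mathrm{MTBF} = \frac{3}{2\lambda} \tag{13}$$
International Journal of Mathematical, Engineering and Management Sciences, Vol. 1, No. 3, 122-129, 2016, ISSN: 2455-7749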
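The calculation above can be checked numerically. The sketch below is a minimal example, assuming the rates on Fig. 3 are 2λ from state one to state two, λ from state two to state three, and μ from state two back to state one (the figure is not reproduced here, so these rates are an assumption). It solves the two integrated state equations for T1 and T2 by Cramer's rule. The function names and the sample values λ = 0.001 and μ = 0.1 per hour are illustrative and not taken from the paper.

```python
def expected_state_times(rate_12, rate_23, mu):
    """Solve the integrated state equations for (T1, T2) by Cramer's rule.

        -rate_12 * T1 + mu * T2             = -1   (state one)
         rate_12 * T1 - (rate_23 + mu) * T2 =  0   (state two)
    """
    a11, a12, b1 = -rate_12, mu, -1.0
    a21, a22, b2 = rate_12, -(rate_23 + mu), 0.0
    det = a11 * a22 - a12 * a21
    t1 = (b1 * a22 - a12 * b2) / det
    t2 = (a11 * b2 - b1 * a21) / det
    return t1, t2


def active_mtbf(lam, mu=0.0):
    """MTBF of the active pair: expected time in state one plus state two."""
    # Assumed rates: two operating units give 2*lam out of state one,
    # the single surviving unit gives lam out of state two.
    t1, t2 = expected_state_times(2.0 * lam, lam, mu)
    return t1 + t2


if __name__ == "__main__":
    lam, mu = 1e-3, 1e-1  # illustrative rates per hour
    print("repairable MTBF    :", active_mtbf(lam, mu))
    print("non-repairable MTBF:", active_mtbf(lam))
    print("closed form check  :", (3 * lam + mu) / (2 * lam ** 2))
```

Solving the linear system directly, rather than coding the closed form, keeps the sketch usable if the assumed rates are changed.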

Reliability Evaluation for Standby Redundancy
Standby redundancy is more complicated than active redundancy because a switching device is needed to detect the failed primary unit and turn on the standby unit. A failure of the switching device has different consequences for system operation depending on how it fails. If the switching device operates properly, it detects the failed primary unit and turns on the standby unit, and the system operates until the standby unit fails. If the switching device fails while the primary unit is operating, the system operates until the primary unit fails. If the switching device fails in a way that forces a switch to the standby unit while the primary unit is still capable of operating, the standby unit is turned on and the system operates until the standby unit fails. If the switching device fails, while the primary unit is still operating, in such a way that neither the primary nor the standby unit can operate, the system fails.
Considering the complexity introduced by the switching device, in this paper we assume that the switching device is always operating until the system fails. In other words, the failure of the switching device is not taken into account in the reliability analysis performed below.
To analyze a standby redundant system with the Markov model, a state transition diagram is again used, as illustrated in Fig. 4. In this diagram, state one is the initial state, in which unit A is operating as the primary unit and unit B is on standby and not operating. State two is the state in which unit A has failed, the switching device has detected the failure of primary unit A, and it has turned on standby unit B to keep the system operational (success). State three is the state in which primary unit A and standby unit B have both failed. In Fig. 4, λ is the unit failure rate and μ is the unit repair rate.
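For state one: The probability of being in state one at time t+Δt is equal to the probability of being in state one at time t and not transitioning out during Δt, plus the probability of being in state two at time t and being repaired back to state one during Δt. Only the primary unit is operating in state one, so state one is left at rate λ, and the failed unit in state two is repaired at rate μ. This can be written as
$$P_1(t+\Delta t) = P_1(t)\,(1 - \lambda\,\Delta t) + P_2(t)\,\mu\,\Delta t \tag{14}$$
Rearranging by moving P1(t) from the right-hand side to the left-hand side, dividing both sides of equation (14) by Δt and letting Δt → 0, we obtain
$$\frac{dP_1(t)}{dt} = -\lambda\,P_1(t) + \mu\,P_2(t) \tag{15}$$
By integrating equation (15) from 0 to ∞, we obtain
$$P_1(\infty) - P_1(0) = -\lambda \int_0^{\infty} P_1(t)\,dt + \mu \int_0^{\infty} P_2(t)\,dt$$
where the boundary conditions are P1(∞) = 0 and P1(0) = 1.
Note that the boundary value is equal to one for P1(0) and P3(∞), and zero for all other state probabilities at t = 0 and t = ∞.
T1 is defined as the expected time in state one and T2 as the expected time in state two, that is, the integrals of P1(t) and P2(t) from 0 to ∞. The integrated equation then becomes
$$-1 = -\lambda\,T_1 + \mu\,T_2, \qquad\text{so}\qquad T_1 = \frac{1 + \mu\,T_2}{\lambda}$$
For state two: The probability of being in state two at time t+Δt is equal to the probability of being in state one at time t and transitioning to state two in Δt, plus the probability of being in state two at time t and not transitioning out during Δt. State two is left either by failure of the standby unit (rate λ) or by repair of the failed unit (rate μ). This can be written as
$$P_2(t+\Delta t) = P_1(t)\,\lambda\,\Delta t + P_2(t)\,\bigl(1 - (\lambda+\mu)\,\Delta t\bigr) \tag{19}$$
Rearranging by moving P2(t) from the right-hand side to the left-hand side, dividing both sides of equation (19) by Δt and letting Δt → 0, we obtain equation (20)
$$\frac{dP_2(t)}{dt} = \lambda\,P_1(t) - (\lambda+\mu)\,P_2(t) \tag{20}$$
By integrating equation (20) from 0 to ∞ with P2(0) = P2(∞) = 0, we obtain
$$\lambda\,T_1 - (\lambda+\mu)\,T_2 = 0 \tag{23}$$
Substituting T1 in equation (23) gives
$$T_2 = \frac{1}{\lambda}, \qquad T_1 = \frac{\lambda+\mu}{\lambda^2}$$
Here, the success of the system is defined by state one and state two, and state three is the failed condition. Consequently, the MTBF is defined as the sum of the expected times in state one and state two. Mathematically, this can be written as
$$\mathrm{MTBF} = T_1 + T_2 = \frac{2\lambda+\mu}{\lambda^2} \tag{25}$$
If the system is not maintained or is non-repairable, then setting μ = 0 in equation (25) simplifies it to
$$\mathrm{MTBF} = \frac{2}{\lambda}$$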
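The standby result can be cross-checked by simulation rather than algebra. The sketch below is a rough Monte Carlo check, assuming a perfect switching device, a cold standby unit that cannot fail while idle, and rates on Fig. 4 of λ from state one to state two, λ from state two to state three, and μ from state two back to state one (assumptions, since the figure is not reproduced here). The function names, the seed, the run count and the sample rates are illustrative only.

```python
import random


def simulate_standby_mtbf(lam, mu, runs=20000, seed=1):
    """Estimate the standby system MTBF by simulating the three-state model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        t = 0.0
        while True:
            # State one: primary unit operating, standby idle.
            # The state is left when the primary unit fails.
            t += rng.expovariate(lam)
            # State two: the standby unit carries the load while the
            # failed unit is under repair (no repair at all if mu == 0).
            time_to_fail = rng.expovariate(lam)
            time_to_repair = rng.expovariate(mu) if mu > 0 else float("inf")
            if time_to_fail < time_to_repair:
                t += time_to_fail  # state three: both units have failed
                break
            t += time_to_repair    # repair finished first: back to state one
        total += t
    return total / runs


def standby_mtbf_closed_form(lam, mu=0.0):
    """Closed-form value assumed for comparison: (2*lam + mu) / lam**2."""
    return (2.0 * lam + mu) / lam ** 2


if __name__ == "__main__":
    lam, mu = 1e-2, 1e-1  # illustrative rates per hour
    print("simulated  :", simulate_standby_mtbf(lam, mu))
    print("closed form:", standby_mtbf_closed_form(lam, mu))
```

The simulated value is only an estimate and will scatter around the closed-form value by an amount that shrinks as the number of runs grows.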

Comparison and Conclusion
Based on the reliability assessment and analysis performed above, both active redundancy and standby redundancy improve system mission reliability and availability and prolong system operating time. For repairable or maintained systems, the reliability improvement is significant compared to non-repairable or unmaintained systems. Mathematically, the reliability improvements from active redundancy and standby redundancy are very close based on the calculated MTBF comparison; more precisely, the calculated MTBF for standby redundancy is slightly better than that for active redundancy. However, the switching device increases the complexity of the standby redundant system, and a failure of the switching device will degrade the mission reliability and availability of that system. Therefore, system engineers should weigh the many factors involved, including cost, complexity, maintainability, space, checkability, the failure rates of the units and the switching device, failure consequences and safety impact, and decide which redundancy technique is more appropriate for achieving the intended system mission reliability requirement based on an analysis of the tradeoffs involved.
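To put numbers on this comparison, the sketch below tabulates both MTBF values and their ratio for a few repair rates. It assumes the closed-form results MTBF = (3λ + μ)/(2λ²) for active redundancy and MTBF = (2λ + μ)/λ² for standby redundancy with a perfect switching device. These follow from the three-state models under the transition rates assumed for Figs. 3 and 4. The function names and sample rates are illustrative and not taken from the paper.

```python
def active_mtbf(lam, mu=0.0):
    """Assumed closed form for the active redundant pair."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def standby_mtbf(lam, mu=0.0):
    """Assumed closed form for the standby pair with a perfect switch."""
    return (2.0 * lam + mu) / lam ** 2


def mtbf_ratio(lam, mu=0.0):
    """Standby MTBF divided by active MTBF for the same unit rates."""
    return standby_mtbf(lam, mu) / active_mtbf(lam, mu)


if __name__ == "__main__":
    lam = 1e-3  # illustrative unit failure rate per hour
    print("mu        active MTBF   standby MTBF  ratio")
    for mu in (0.0, 1e-3, 1e-2, 1e-1, 1.0):
        print(f"{mu:<9g} {active_mtbf(lam, mu):<13.1f} "
              f"{standby_mtbf(lam, mu):<13.1f} {mtbf_ratio(lam, mu):.3f}")
```

Under these assumptions the ratio is 4/3 when there is no repair and approaches 2 as the repair rate becomes much larger than the failure rate. A switching device with a non-zero failure rate, which this sketch does not model, would reduce the standby figure.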