Reliability Calculation for Dormant k-out-of-n Systems with Periodic Maintenance

This paper introduces a redundancy calculation for dormant k-out-of-n systems. A dormant failure is a failure that, by its nature, cannot be detected when it occurs. A dormant failure therefore becomes a blind spot in design for reliability and maintainability. The most common approach to detecting a dormant failure is to carry out a scheduled periodic inspection, test or maintenance activity. Scheduled periodic maintenance is applied to prevent and reduce unexpected dormant failures that could lead to safety consequences or costly corrective maintenance. This paper introduces a methodology for calculating a reliability parameter, such as the Mean Time Between Failure (MTBF), for dormant k-out-of-n redundant systems. The mathematical relationship between the effective MTBF and the scheduled periodic inspection/maintenance interval is also elaborated. Case studies illustrate how to apply the developed reliability calculation methodology in mass transit train reliability and safety design.


Introduction
In a k-out-of-n redundant system, unit failures can be classified into two categories: detectable and non-detectable. A detectable failure is a failure that is detected and/or annunciated when it occurs. In mass transit train subsystems and units, failures of electrical, hydraulic and pneumatic subsystems are mostly detectable and are usually monitored by the train on-board health monitoring system. When such a failure occurs, the on-board health monitoring system detects the degradation or abnormal condition and annunciates an alarm to the train operation control center, which then takes corrective action to manage the failure. A non-detectable failure is a failure that is not detected and/or annunciated when it occurs. A non-detectable failure is also called a passive failure or dormant failure in some standards and documentation. The most effective approach to detecting a passive or dormant failure is to carry out a scheduled periodic inspection or test. Failure Mode, Effects and Criticality Analysis (FMECA) or Failure Mode and Effects Analysis (FMEA) can be used to identify the detection method for each failure mode. If a failure mode is identified as a dormant failure, then a scheduled periodic inspection or test is required as a mitigating action to detect it. In this paper, a reliability calculation method is introduced to establish the relationship between the effective Mean Time Between Failure (MTBF) and the scheduled periodic inspection interval for the k-out-of-n redundant system. The study indicates that the shorter the scheduled periodic inspection interval, the greater the effective MTBF; conversely, the longer the interval, the smaller the effective MTBF.
This paper will start with the introduction of the reliability calculation methodology for the k-out-of-n redundant system which is periodically maintained. Then the study proceeds to apply the developed calculation methodology in the brake discount calculation for the safe stopping distance analysis in the Sao Paulo Monorail. The conclusion and boundary are summarized at the end of this paper.
Klion (1977) introduced periodically maintained systems. The Military Standard (1980) introduced failure mode, effects and criticality analysis. The IEEE standard (1999) introduced safe train separation and the typical safe braking model. Vintr et al. (2003) presented preventive maintenance optimization on the basis of operating data analysis. The Military Standard (2005) introduced the binomial distribution. Tutt et al. (2009, 2012) presented risk-informed preventive maintenance optimization. Babishin et al. (2016) presented maintenance inspection optimization of k-out-of-n redundant systems. Guo et al. (2016) presented optimization of the preventive maintenance interval for aircraft indicators.

Reliability Calculation Methodology for the Redundant Systems Periodically Maintained
In this section, a redundancy calculation approach is introduced for a system that is periodically inspected and maintained. Mass transit trains encompass various electrical and mechanical subsystems. Most failures that occur in the electrical subsystems are detectable and are annunciated by an alarm through the train on-board health monitoring system. However, some failures that occur in the mechanical subsystems are dormant and cannot be detected. For example, a brake caliper stuck in the release position is considered a dormant failure because it cannot be detected until the next scheduled inspection. For subsystems with a potential dormant failure, a maintenance team visits the subsystem at every predetermined interval and repairs all failures that have occurred.
Define T as the predetermined maintenance interval (the unattended period of operation), f(t) as the failure density function, and R(t) as the reliability function. The probability that the system is still operating at the end of T is

$$R(T) = 1 - \int_0^T f(t)\,dt \qquad (1)$$

If the system is still operating at T, then its operating time in the cycle is T. If the system fails at some t in (0, T), then its operating time is t. Therefore, the average uninterrupted operating time of a system in (0, T), $T_{AU}$, is given by

$$T_{AU} = \int_0^T t\,f(t)\,dt + T\,R(T) = \int_0^T R(t)\,dt \qquad (2)$$

A system may fail before the first cycle (0, T) is completed, or it may not fail until after the Nth cycle is completed. Suppose a large number of such systems (X) are in the field and these X systems are to be maintained over a long time period. Then:

R(T) = proportion of systems surviving the first cycle with no failure
R(T)^2 = proportion of systems surviving the second cycle with no failure
R(T)^3 = proportion of systems surviving the third cycle with no failure

and so on. A system enters cycle j+1 with probability R(T)^j, and each cycle entered contributes $T_{AU}$ on average. The series $1 + R(T) + R(T)^2 + \dots$ is a geometric series whose ratio R(T) is between 0 and 1, so the sum of its first N terms is

$$\sum_{j=0}^{N-1} R(T)^j = \frac{1 - R(T)^N}{1 - R(T)} \qquad (3)$$

As N gets arbitrarily large, the above sum converges to

$$\frac{1}{1 - R(T)}$$

The average uninterrupted operating time to the first failure, $T_{FF}$, is therefore

$$T_{FF} = \frac{T_{AU}}{1 - R(T)} = \frac{\int_0^T R(t)\,dt}{1 - R(T)} \qquad (4)$$

where T represents the unattended period of operation or predetermined maintenance interval (i.e., every T hours a maintenance team visits the system and repairs all unit failures). Fig. 1 shows that a system is restored to its original condition following preventive maintenance, i.e., "as good as new".
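The relationship between the per-cycle operating time and the time to first failure can be checked numerically. The following is a minimal sketch, assuming the time to first failure takes the form T_FF = (integral of R(t) from 0 to T) / (1 - R(T)); the function names, the Simpson integration helper and the failure rate value are illustrative assumptions, not part of the original method.

```python
import math


def simpson(func, a, b, steps=1000):
    """Composite Simpson's rule for the integral of func over [a, b]."""
    if steps % 2:  # Simpson's rule needs an even number of sub-intervals
        steps += 1
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def effective_mtbf(reliability, interval):
    """T_FF = (integral of R(t) over one cycle) / (1 - R(T))."""
    t_au = simpson(reliability, 0.0, interval)  # average operating time per cycle
    return t_au / (1.0 - reliability(interval))


LAMBDA = 1e-5  # hypothetical constant failure rate, failures per hour


def single_unit(t):
    """Reliability of one unit with an exponential time to failure."""
    return math.exp(-LAMBDA * t)


for interval in (720.0, 2160.0):
    t_ff = effective_mtbf(single_unit, interval)
    print(f"T = {interval:6.0f} h -> T_FF = {t_ff:.1f} h")
```

For a single exponential unit, both intervals should give a value close to 1/λ = 100,000 hours, since the integral and the denominator scale together. Any other reliability function of t can be passed in place of `single_unit`.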

Reliability Calculation Methodology for k-out-of-n Systems Periodically Maintained
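For a k-out-of-n system of n identical units, each with a constant failure rate λ, the system operates as long as at least k units operate, and the system reliability follows the binomial distribution:

$$R(t) = \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{n-i} \qquad (5)$$

Substitute equation (5) into equation (4) to obtain:

$$MTBF = \frac{\int_0^T \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{n-i}\,dt}{1 - \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda T}\left(1 - e^{-\lambda T}\right)^{n-i}} \qquad (6)$$

In practice, after extensive reliability calculations over long time periods, we have observed that the numerator of equation (6) is approximately equal to T:

$$\int_0^T \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{n-i}\,dt \approx T$$

Therefore equation (6) can be expressed as the following closed-form equation (7):

$$MTBF \approx \frac{T}{1 - \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda T}\left(1 - e^{-\lambda T}\right)^{n-i}} \qquad (7)$$

The advantage of equation (7) over equation (6) is that it avoids the lengthy integral calculation and gives an approximate result directly. For the exponential distribution, equation (7) can also be expressed as an effective failure rate:

$$\lambda_{eff} = \frac{1}{MTBF} \approx \frac{1 - \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda T}\left(1 - e^{-\lambda T}\right)^{n-i}}{T}$$

It is understood that MTBF is the reciprocal of the failure rate. For a single unit (k = n = 1), equation (6) reduces exactly to MTBF = 1/λ for any T, which indicates that the inspection interval does not change the failure rate of a single-unit configuration.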
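The quality of the closed-form approximation can be examined by comparing it with the integral form. The sketch below assumes the binomial k-out-of-n reliability with identical exponential units, R(t) = Σ_{i=k..n} C(n,i) e^{-iλt} (1 - e^{-λt})^{n-i}, and uses hypothetical input values (k = 2, n = 3, λ = 1E-5 fph, T = 1000 hours) chosen only for illustration; the function names are not from the original.

```python
import math


def k_out_of_n_reliability(k, n, lam, t):
    """Probability that at least k of n identical exponential units survive to t."""
    p = math.exp(-lam * t)  # single-unit survival probability
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def mtbf_closed_form(k, n, lam, interval):
    """Closed-form approximation: the integral in the numerator is replaced by T."""
    return interval / (1.0 - k_out_of_n_reliability(k, n, lam, interval))


def mtbf_integral(k, n, lam, interval, steps=2000):
    """Integral form, with the numerator evaluated by the trapezoidal rule."""
    h = interval / steps
    area = 0.5 * (k_out_of_n_reliability(k, n, lam, 0.0)
                  + k_out_of_n_reliability(k, n, lam, interval))
    area += sum(k_out_of_n_reliability(k, n, lam, j * h)
                for j in range(1, steps))
    return area * h / (1.0 - k_out_of_n_reliability(k, n, lam, interval))


# Hypothetical inputs, for illustration only.
K, N, LAM, T = 2, 3, 1e-5, 1000.0
approx = mtbf_closed_form(K, N, LAM, T)
exact = mtbf_integral(K, N, LAM, T)
print(f"closed form: {approx:.4e} h")
print(f"integral   : {exact:.4e} h")
print(f"relative difference: {(approx - exact) / exact:.2e}")
```

Because R(t) is at most 1, the integral can never exceed T, so the closed form slightly overestimates the MTBF. The gap stays small while λT is small and grows as the maintenance interval becomes comparable to the unit MTBF.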

Determining Effective MTBF for 1-out-of-2 Redundant Systems
In Table 6, it should be noted that the shorter the maintenance interval (T), the greater the effective MTBF; conversely, the longer the maintenance interval (T), the smaller the effective MTBF. The following example demonstrates this concept.
Consider a repairable two-unit redundant system in which both units have an identical constant failure rate $\lambda = 10^{-5}$ failures per hour. If the periodic maintenance interval is one month, then T = 1 month = 24 × 30 = 720 hours. Substituting T = 720 and $\lambda = 10^{-5}$ into equation (10) gives the effective MTBF for the one-month interval, and repeating the substitution with T = 3 × 720 = 2160 hours gives the value for a three-month interval. The calculation indicates that if the maintenance interval is stretched from one month to three months, the effective MTBF is shortened to roughly one third.
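The one-month versus three-month comparison can be reproduced with a few lines of code. This is a minimal sketch that assumes the effective MTBF of a 1-out-of-2 system is approximated by T / (1 - e^{-λT})^2, i.e. the interval divided by the probability that both units have failed by the end of it; whether this matches equation (10) term for term is an assumption, and the function name is illustrative.

```python
import math


def mtbf_one_out_of_two(lam, interval):
    """Approximate effective MTBF for two identical units in parallel."""
    # Probability that a single unit fails within one maintenance interval.
    q = -math.expm1(-lam * interval)
    # The system is down at inspection only if both units have failed.
    return interval / (q * q)


LAM = 1e-5                     # failures per hour, from the example
ONE_MONTH = 24 * 30            # 720 hours
THREE_MONTHS = 3 * ONE_MONTH   # 2160 hours

mtbf_1 = mtbf_one_out_of_two(LAM, ONE_MONTH)
mtbf_3 = mtbf_one_out_of_two(LAM, THREE_MONTHS)
print(f"T =  720 h: effective MTBF = {mtbf_1:.4e} h")
print(f"T = 2160 h: effective MTBF = {mtbf_3:.4e} h")
print(f"ratio (three months / one month) = {mtbf_3 / mtbf_1:.3f}")
```

For small λT the denominator behaves like (λT)^2, so the effective MTBF scales approximately as 1/T, which is why tripling the interval cuts the MTBF to about one third.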

Case Study: Brake Discount Calculation in Safe Braking Model
Train collision is one of the major safety concerns in the mass transit industry. Automatic Train Protection (ATP) is a dedicated system that prevents one train from colliding with another train on the same line by maintaining a safe separation between trains. The safe separation (braking) distance analysis shall be based on braking capacity (dependent on weight), the gradient at the location concerned, the maximum possible speed of the trains using the section, the allowance for system reaction and a credible margin. The ATP profile shall be governed by the safe braking model shown in Fig. 2 and shall ensure that under no circumstances (including failures) the movement authority limit will be exceeded by an ATP-equipped train. With respect to the safe braking model, a reliability engineer is required to analyze the brake failure case and determine the quantity of brakes to be discounted. For the monorail platform developed by Bombardier, a train is composed of two cars and has four brake axles; each brake axle is equipped with a single passive caliper and disc pair. Fig. 3 shows the brake axle configuration. Each brake axle has a failure rate $\lambda = 4.86 \times 10^{-6}$ failures per hour. The train mission time is 13.5 hours per day. The brake supplier has recommended a three-month preventive maintenance interval for the brake axles.

Case Study 1: One of Four Brake Axles Fails, Three of Four Brake Axles Are Working
In this case study, we consider the situation in which one of the four brake axles fails and the remaining three brake axles are still working normally.
Substitute k = 3 (number of brake axles working in a train), n = 4 (total number of brake axles in a train), λ = 4.86E-6 (failure rate of a brake axle, unit: failures per hour) and T = 3 × 30 × 13.5 = 1215 (three-month maintenance interval × 30 days per month × daily mission time, unit: hours) into equation (7):

$$MTBF \approx \frac{1215}{1 - \sum_{i=3}^{4} \binom{4}{i} e^{-i\lambda T}\left(1 - e^{-\lambda T}\right)^{4-i}} \approx 5.888 \times 10^{6}\ \text{hours}, \qquad \lambda T = 4.86 \times 10^{-6} \times 1215$$

Taking the reciprocal of the calculated MTBF gives a failure rate of 1.6983E-7 fph. In the mass transit reliability regime, the threshold for an improbable event is $10^{-9}$ fph: if the failure rate is lower than $10^{-9}$ fph, it can be assumed that the failure will not be experienced in the thirty-year lifetime. Because the failure rate for this case, 1.6983E-7 fph, is greater than $10^{-9}$ fph, it can be concluded that one brake axle could fail in the monorail's thirty-year lifetime. Therefore, one brake axle failure shall be considered in the safe braking model failure case.
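The substitution above can be checked numerically. The following sketch assumes the closed-form expression MTBF ≈ T / (1 - Σ_{i=k..n} C(n,i) e^{-iλT} (1 - e^{-λT})^{n-i}); the function name is illustrative, and the denominator is computed as the complementary binomial sum over i < k, which is algebraically identical and avoids subtracting two nearly equal numbers.

```python
import math


def system_failure_probability(k, n, lam, interval):
    """Probability that fewer than k of n units are working at the end of the interval."""
    q = -math.expm1(-lam * interval)  # single-unit failure probability in one interval
    p = 1.0 - q                       # single-unit survival probability
    # Complement of sum_{i=k}^{n} C(n,i) p^i q^(n-i), summed directly over i < k.
    return sum(math.comb(n, i) * p**i * q ** (n - i) for i in range(k))


LAM = 4.86e-6        # failures per hour per brake axle
T = 3 * 30 * 13.5    # 1215 operating hours between maintenance visits
K, N = 3, 4          # values substituted in this case study
THRESHOLD = 1e-9     # improbable-event threshold, failures per hour

mtbf = T / system_failure_probability(K, N, LAM, T)
rate = 1.0 / mtbf
print(f"effective MTBF = {mtbf:.4e} h")
print(f"failure rate   = {rate:.4e} fph")
print(f"above threshold: {rate > THRESHOLD}")
```

With these inputs the failure rate should agree with the 1.6983E-7 fph quoted in the text to the displayed precision.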

Case Study 2: Two of Four Brake Axles Fail, Two of Four Brake Axles Are Working
In this case study, we consider the situation in which two of the four brake axles fail and the remaining two brake axles are still working.
Substitute k = 2 (number of brake axles working in a train), n = 4 (total number of brake axles in a train), λ = 4.86E-6 (failure rate of a brake axle, unit: failures per hour) and T = 3 × 30 × 13.5 = 1215 (three-month maintenance interval × 30 days per month × daily mission time, unit: hours) into equation (7):

$$MTBF \approx \frac{1215}{1 - \sum_{i=2}^{4} \binom{4}{i} e^{-i\lambda T}\left(1 - e^{-\lambda T}\right)^{4-i}} \approx 1.495 \times 10^{9}\ \text{hours}, \qquad \lambda T = 4.86 \times 10^{-6} \times 1215$$

Taking the reciprocal of the calculated MTBF gives a failure rate of 6.6889E-10 fph. This failure rate is lower than $10^{-9}$ fph, so it can be concluded that two brake axles failing within the same maintenance interval [0, 1215 hours] is not expected to be experienced in the monorail's thirty-year lifetime. Therefore, the situation in which two of the four brake axles fail at the same time is not considered in the safe braking model failure case.
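The same check can be run for this case. As before, this is a sketch under the assumption that the closed-form expression is MTBF ≈ T / (1 - Σ_{i=k..n} C(n,i) e^{-iλT} (1 - e^{-λT})^{n-i}); the helper name `effective_failure_rate` is illustrative and not from the original.

```python
import math


def effective_failure_rate(k, n, lam, interval):
    """Reciprocal of the closed-form effective MTBF for a k-out-of-n system."""
    q = -math.expm1(-lam * interval)  # single-unit failure probability in one interval
    p = 1.0 - q                       # single-unit survival probability
    # Probability that fewer than k of the n units are still working.
    unreliability = sum(math.comb(n, i) * p**i * q ** (n - i) for i in range(k))
    return unreliability / interval


LAM = 4.86e-6        # failures per hour per brake axle
T = 3 * 30 * 13.5    # 1215 operating hours between maintenance visits
THRESHOLD = 1e-9     # improbable-event threshold, failures per hour

rate_k2 = effective_failure_rate(2, 4, LAM, T)
print(f"k = 2, n = 4: failure rate = {rate_k2:.4e} fph")
print(f"effective MTBF = {1.0 / rate_k2:.4e} h")
print(f"below threshold: {rate_k2 < THRESHOLD}")
```

The result should agree with the 6.6889E-10 fph quoted in the text and fall below the threshold, which is the basis for excluding this case from the safe braking model.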

Conclusion and Boundary
The purpose of this paper is to determine the mathematical relationship between a reliability parameter, the Mean Time Between Failure (MTBF) or failure rate, and the maintenance interval (unattended period of operation) for k-out-of-n redundant systems. The formula and methodology developed in this paper can be used to approximate the MTBF of a k-out-of-n redundant system that is periodically maintained. The approach can also be applied in reliability calculations for systems with potential dormant failures. As described in the paper, the approach is limited to applications of the exponential distribution.