Reliability analysis of the products subject to competing failure processes with unbalanced data

Junxing LI, Yongbo ZHANG, Zhihua WANG, Huimin FU, Lei XIAO


Considering the degradation and catastrophic failure modes simultaneously, a general reliability analysis model is presented for competing failure processes with unbalanced data. For the degradation process with highly unbalanced data, we develop a linear random-effects degradation model whose parameters can be estimated by a simple least squares method. Furthermore, to fully utilize the degradation information, we treat the last measured times of the degradation units that have only one or two measured time points as zero-failure (right-censored) data of the catastrophic failure mode. The incomplete data set is then composed of zero-failure data and catastrophic failure data. To analyze the incomplete data, interval statistics are first defined. The best linear unbiased estimators of the catastrophic failure parameters are obtained based on the Gauss-Markov theorem. Then, the reliability function of the competing failure processes is given, and the corresponding two-sided confidence intervals of the reliability are obtained with a bootstrap procedure. Finally, a practical application case is examined with the proposed method, and the results demonstrate its validity and reasonableness.
Keywords: reliability evaluation, competing failure model, unbalanced data, interval statistics.

Introduction
Competing failure involving performance degradation and catastrophic failure can be found in many products [8, 22]. During the working span, the product fails as soon as any one of the failure modes occurs. Performance degradation failure, also termed soft failure, is due to aging degradation that drives the performance value to an unacceptable level. Compared with degradation failure, catastrophic failure is more severe because the product may not function at all once it occurs [15]. For example, a semiconductor device's failure may be due to electrical malfunctions or mechanical fatigue of I/O connectors (e.g., solder joints). The failure of the insulation system of a DC motor can be attributed to turn failure, phase failure, or ground failure. Failures of ball bearing assemblies are attributed to either race or ball failures [23]. Because competing failure is so common, it is important to study the reliability of products with competing failure modes.
Reliability analysis for products that experience only degradation has been extensively studied in the literature. Lu et al. [20] presented a general mixed-effects path model and used a two-stage approach to estimate the parameters of the normal distribution. Subsequently, Lu and Meeker [21] used a simple degradation model to compare degradation analysis and traditional failure-time analysis in terms of asymptotic efficiency, and the results showed that degradation analysis provided more precise estimates. Bae and Kvam [2] developed a nonlinear random-effect model to describe the degradation of vacuum fluorescent displays. Furthermore, Bae et al. [3] investigated the link between a chosen mixed-effects model and the resulting lifetime model and pointed out that the degradation model implies the lifetime distribution. In addition, stochastic process formulations, such as the Markov chain, the Wiener process and the Gamma process, have attracted considerable attention from researchers in degradation analysis. Among them, the Wiener process is one of the most prominent degradation models and has been studied rather extensively. Tseng et al. [29] used a Wiener process to describe the degradation of the light intensity of LED lamps. Whitmore and Schenkelberg [31] presented a time-scale transformation Wiener process to analyze the reliability of self-regulating heating cables.
A variety of reliability models for competing failure modes have been developed. Zuo et al. [37] presented a mixture model which can be used to model both catastrophic failures and degradation failures. This mixture model also shows engineers how to design experiments to collect both hard failure data and soft failure data. Huang et al. [11, 12] extended the reliability analysis of electronic devices with multiple competing failure modes, derived the probability that a product fails by a specific failure mode, and then predicted the probability of the dominant failure mode. Li et al. [18] proposed a reliability evaluation model of multi-state degraded systems subject to multiple competing failure processes and assumed that these processes were independent. Jiang et al. [13] presented a reliability and maintenance model for systems subject to competing failure processes, which included a soft failure caused by continuous degradation due to a shock process and a hard failure caused by instantaneous stress. Song et al. [25] developed a reliability model for complex multi-component systems in which each component experiences multiple competing failure processes due to simultaneous exposure to degradation and shock loads. Wang et al. [30] established a competing failure model for aircraft engines based on the data fusion method. Wu et al. [32] investigated reliability and quality problems when the competing risks data are progressive type-I interval censored with binomial removals. Tang et al. [28] studied a replacement problem for a system subject to the competing risks of soft and sudden failures.
Before statistical analysis, the competing failure modes are usually assumed to be either independent or dependent. Recently, reliability modeling for products with multiple independent competing failure modes has been investigated by several researchers. Huang et al. [12] presented an extended method of reliability analysis for an electronic device with two failure modes: solder/Cu pad interface fracture (a catastrophic failure) and light intensity degradation (a degradation failure). They assumed that the two failure modes were mutually independent because they were caused by different stresses. Cha et al. [6] used an improved method to analyze the reliability of this electronic device and also considered the competing failure modes independent. Li et al. [18] developed models for evaluating the reliability of multi-state degraded systems with multiple competing failure modes, which were assumed independent. Applications of such systems can also be found in the Space Shuttle computer complex, electric generator power systems, and so on. Bocchetti et al. [5] proposed a competing risk model to assess the reliability of the cylinder liners of a marine Diesel engine, and the two failure modes of the cylinder liners (wear and thermal crack) were considered independent. Furthermore, in engineering practice, competing failure modes that each have a different root cause can be considered independent. For example [9, 16], a semiconductor device failure may be due to electrical malfunctions or mechanical fatigue of I/O connectors (e.g., solder joints). Therefore, we assume in this paper that the competing failure modes are independent of each other.
In practice, the observed degradation data are often highly unbalanced. Here unbalanced means that the number and times of measurements are not identical for the degradation units in a given population of products. Because of this unbalanced nature, the degradation data cannot be rationally analyzed with the traditional models. Many researchers have studied this problem. Zhou et al. [35] presented a nonparametric degradation modeling framework for making inference on the evolution of degradation signals that are observed sparsely or over short time intervals. Rao [24] and Swamy [27] analyzed the linear random-effects regression model and gave parameter estimation approaches. Zhuang et al. [36] proposed a linear mixed-effects model and estimated the parameters with repeated-measurement data and with unbalanced data, respectively. Yuan et al. [33] presented an advanced nonlinear mixed-effects model for modeling and predicting degradation in nuclear piping systems. That model offers considerable improvement by reducing the variance associated with the degradation of a specific unit, which leads to more realistic estimates of risk. It is widely believed that regression is the most convenient and important tool for analyzing unbalanced performance degradation data.
Furthermore, some degradation units may be inspected at only one or two time points, such as unit 2 and unit 3 in Fig. 1. These units make the analysis more challenging because of the sparse measurements, and their data may be discarded because no degradation path can be fitted to them. Therefore, to fully utilize the degradation information, we treat the last observation times of these units, as marked in Fig. 1, as zero-failure (right-censored) data of the catastrophic failure mode. The incomplete data then consist of the zero-failure data from the degradation units and the failure data from the catastrophic failure mode, as shown in Fig. 2. Kaplan et al. [14] proposed the Kaplan-Meier estimation method to analyze the reliability for incomplete data. Amster [1] developed an average rank method to estimate the parameters of the life distribution. Lawless [17] used the maximum likelihood method to analyze incomplete data. Lin [19] used the Expectation Maximization algorithm to compute the non-parametric maximum likelihood estimate. In this paper, we define the interval statistic and propose a non-parametric estimation method to analyze the incomplete data, from which the best linear unbiased estimates of the distribution parameters can be obtained. In addition, to the best of our knowledge, most studies of competing failure analysis, such as Zuo et al. [37], Li et al. [18], and Bocchetti et al. [5], have not considered two-sided confidence intervals for the reliability, which are an important index in reliability evaluation. To remedy this deficiency, we develop a bootstrap (simulation) procedure to derive two-sided confidence intervals for the reliability under competing failure.
Fig. 1. Zero-failure data from degradation units
Fig. 2. Incomplete data
In this paper, we propose a generalized reliability analysis model for the competing failure mode under the hypotheses that i) the product fails when the first of the competing failure mechanisms reaches a failure state; ii) each failure mode has a known life distribution model; iii) product failure results from the competition between degradation failure and catastrophic failure. A linear random-effect model is presented for analyzing the highly unbalanced measurement data from performance degradation failure, and a least squares method for parameter estimation is developed for the situation where the degradation and catastrophic failures are independent. For the catastrophic failure mode, the concept of interval statistics is introduced; by combining the catastrophic failure data with the last measured time points of the degradation units that have only one or two measured time points, a reliability model based on the Weibull distribution is proposed. Moreover, two-sided confidence intervals of the reliability for the competing failure mode are given based on the bootstrap method.
The rest of this paper is organized as follows. Section 2 introduces the reliability models for performance degradation, catastrophic failure and competing failure. Sections 3 and 4 present the parameter estimation theory for the performance degradation model and the catastrophic failure model, respectively. Section 5 gives the steps for estimating the reliability confidence intervals of the competing failure mode. Section 6 contains an engineering example that demonstrates the proposed method. Section 7 gives the summary and conclusions.

The performance degradation model
A product is considered to have failed by performance degradation when its degradation reaches the failure level $D_f$. Among the existing modeling approaches, a widely used one is the linear random-effect model. Its modeling procedure is as follows.

1. Assume that $n$ units are put on test, that $n_q$ units experience catastrophic failure and that $m$ units experience performance degradation failure, where $n_q + m = n$. For each degradation unit, the measurement times are random, so the number and timing of the measurements may differ from unit to unit (Table 1). In Table 1, δ = 0 means that only performance degradation occurs, and δ = 1 means that only catastrophic failure occurs.
2. Based on the properties of the linear random-effect degradation model, the model is expressed in terms of the following quantities. $\beta_i = (\beta_{i1}, \beta_{i2})^T$ is the random-effect parameter vector of the $i$th unit, $\rho$ denotes the correlation of $\beta_{i1}$ and $\beta_{i2}$, and $\varepsilon_i$ denotes the measurement error. The $\beta_i$ and $\varepsilon_{ij}$ are assumed to be mutually independent.
Let $\Theta = (b_1, b_2, \sigma_1, \sigma_2, \rho, \sigma)$ denote the vector of the unknown parameters, where $b_1, b_2$ and $\sigma_1, \sigma_2$ are the means and standard deviations of the random coefficients and $\sigma$ is the standard deviation of the measurement error. Then, a simple least squares method can be developed to estimate the unknown parameters $\Theta$ in the proposed degradation model.
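For orientation, one common way to write a two-coefficient linear random-effects model with the symbols used in this section is sketched below. The straight-line form in time and the symbols $y_{ij}$ (the $j$th measurement of unit $i$) and $t_{ij}$ (its measurement time) are assumptions made for illustration; the sketch is not a transcription of the paper's own equations.

```latex
\[
\begin{aligned}
y_{ij} &= \beta_{i1} + \beta_{i2}\, t_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^{2}), \\
\beta_i &= (\beta_{i1}, \beta_{i2})^{T} \sim N(b, \Sigma),
\qquad b = (b_1, b_2)^{T},
\qquad \Sigma =
\begin{pmatrix}
\sigma_1^{2} & \rho\,\sigma_1 \sigma_2 \\
\rho\,\sigma_1 \sigma_2 & \sigma_2^{2}
\end{pmatrix}.
\end{aligned}
\]
```

Under this reading, the six unknowns are the two means, the two standard deviations, the correlation of the random coefficients, and the measurement-error standard deviation.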
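Without loss of generality, we assume that the degradation measurements increase over time. Thus, the distribution of the time-to-failure $T$ can be defined as the probability that the degradation has reached the failure level $D_f$ by time $t$:
$$F_d(t) = P(T \le t) = P\big(y(t) \ge D_f\big),$$
where $y(t)$ denotes the degradation level at time $t$.
Finally, the reliability function at a given time $t$ can be defined as:
$$R_d(t) = 1 - F_d(t) = P\big(y(t) < D_f\big).$$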
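A minimal numerical sketch of such a degradation reliability function is given here. It assumes a straight-line path $\beta_{i1} + \beta_{i2} t$ with bivariate normal random coefficients, ignores the measurement error, and approximates the first-passage probability by the probability that the path is still below the threshold at time $t$. The function name and argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def degradation_reliability(t, b1, b2, s1, s2, rho, d_f):
    """Approximate P(beta1 + beta2 * t < d_f) for bivariate normal (beta1, beta2).

    b1, b2  : assumed means of intercept and slope
    s1, s2  : assumed standard deviations of intercept and slope
    rho     : assumed correlation between intercept and slope
    d_f     : failure threshold
    """
    mean = b1 + b2 * t
    # Variance of beta1 + beta2 * t under the bivariate normal assumption.
    var = s1 ** 2 + 2.0 * rho * s1 * s2 * t + (s2 * t) ** 2
    if var <= 0.0:
        raise ValueError("path variance must be positive")
    return NormalDist().cdf((d_f - mean) / sqrt(var))


if __name__ == "__main__":
    # Purely illustrative parameter values, not taken from the case study.
    for hours in (1.0, 4.0, 8.0):
        print(hours, degradation_reliability(hours, 0.0, 1.0, 0.1, 0.2, 0.0, 4.0))
```

When the mean path equals the threshold, this approximation returns 0.5, which is a quick sanity check on the inputs.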

The catastrophic failure model
For a degradation unit that has only one or two measurement time points and has not experienced catastrophic failure, the last measured time point can be regarded as zero-failure data, such as $t_{32}$ and $t_{51}$ in Table 1. The incomplete catastrophic failure data then consist of zero-failure data and catastrophic failure data. This treatment is justified for three reasons. First, the competing failure modes are assumed to be independent because they have different root causes. Second, the catastrophic failure has not occurred up to the last test time point of the individual degradation unit. Third, the corresponding performance degradation value is still far from the predefined failure level.
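For the $n_q$ units with catastrophic failures, the corresponding failure times are observed, and the last measured times of the sparsely measured degradation units form the zero-failure data set. The incomplete data set is the union of the catastrophic failure times and the zero-failure data. We assume that the catastrophic failure time follows a Weibull distribution with shape parameter $\alpha$ and scale parameter $\eta$. Thus, the cumulative distribution function can be defined as:
$$F_{TC}(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\alpha}\right], \quad t > 0,$$
and the reliability function of the catastrophic failure can be represented by:
$$R_{TC}(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\alpha}\right].$$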

The competing failure model
The reliability analysis presented in this paper is based on the assumption that the degradation failure mode and the catastrophic failure mode are independent of each other. Thus the reliability function of the competing failure for an operating time $t$ is expressed as:
$$R(t) = P(t_{TC} > t,\; t_d > t) = R_{TC}(t)\, R_d(t),$$
where $t_{TC}$ and $t_d$ denote the catastrophic time-to-failure and the degradation time-to-failure, respectively.
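The product form implied by the independence assumption can be sketched in a few lines. This is an illustration only: the two-parameter Weibull form with shape `alpha` and scale `eta` follows the notation of the text, while the function names and the idea of passing the degradation reliability as a callable are assumptions.

```python
from math import exp


def weibull_reliability(t, alpha, eta):
    """Two-parameter Weibull survival function with shape alpha and scale eta."""
    if t < 0.0:
        raise ValueError("time must be non-negative")
    return exp(-((t / eta) ** alpha))


def competing_reliability(t, degradation_reliability, alpha, eta):
    """Reliability under two independent competing modes.

    degradation_reliability is any callable t -> R_d(t); independence lets
    the joint survival probability factor into the product of the two.
    """
    return degradation_reliability(t) * weibull_reliability(t, alpha, eta)


if __name__ == "__main__":
    # Hypothetical constant degradation reliability, for illustration only.
    print(competing_reliability(3.0, lambda t: 0.5, alpha=2.0, eta=3.0))
```

Keeping the degradation term as a callable means any fitted degradation model can be plugged in without changing the combination step.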

Parameter estimation of performance degradation model
In this section, we discuss a simple least squares method for estimating the unknown parameters in the degradation model. First, the measurements of each unit are collected in vector form. The linear model of performance degradation can then be rewritten in matrix form, where $I_{n_i^*}$ is an identity matrix.
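Based on least squares theory, the sum of squared errors of the performance degradation model can be written down for each unit. Setting its partial derivatives with respect to the coefficients to zero gives the least squares estimates, from which the unbiased estimate of the mean of the random coefficients is obtained. An estimator of the error variance $\sigma_i^2$ for degradation unit $i$ is the residual sum of squares of that unit divided by its number of measurements minus $p$, where $p$ is the dimension of $\beta_i$.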
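The per-unit step can be sketched as an ordinary least squares fit of a straight line, assuming two coefficients per unit (so $p = 2$). The unweighted average used for the mean of the random coefficients is a simplifying assumption; the paper's estimator may weight units differently. All function names are hypothetical.

```python
def fit_unit(times, values):
    """Least squares line for one unit: returns (intercept, slope, error_variance).

    The error variance is the residual sum of squares divided by n - 2,
    so at least three measurements are required.
    """
    n = len(times)
    if n != len(values):
        raise ValueError("times and values must have the same length")
    if n < 3:
        raise ValueError("need at least 3 points to estimate the error variance")
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, values))
    return intercept, slope, rss / (n - 2)


def mean_coefficients(fits):
    """Unweighted average of per-unit (intercept, slope) estimates."""
    k = len(fits)
    return (sum(f[0] for f in fits) / k, sum(f[1] for f in fits) / k)


if __name__ == "__main__":
    # Made-up unbalanced data: each unit has its own measurement times.
    units = [([0.0, 1.0, 2.0, 5.0], [1.0, 3.1, 4.9, 11.2]),
             ([0.5, 2.0, 3.5], [1.8, 5.2, 7.9])]
    print(mean_coefficients([fit_unit(t, y) for t, y in units]))
```

The requirement of at least three points mirrors the reason units with only one or two measurements cannot contribute to the degradation fit and are instead treated as zero-failure data.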
It can be proved that this yields an unbiased estimate of the error variance. In [36], the authors discussed the unbiased estimation of the variance-covariance matrix $\Sigma$ for the linear mixed-effects model, so we derive the unbiased estimate of the variance-covariance matrix of the random coefficients based on [36].
Theorem 2. For the interval statistics ..., where $i = 0, 1, 2, \dots, N$.
Theorem 3. For the statistics ..., the joint probability density function of the $i$th interval statistic $X_i^o$ and the $j$th order statistic $X_j$ takes one form if $i < j$ and another if $i \ge j$, where $i, j = 0, 1, 2, \dots, N$.
The proofs of Theorems 1-3 are detailed in Appendix A.

Parameter estimation of catastrophic failure model
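According to the above discussion, the incomplete data $t$ follow a Weibull distribution. Let $t^* = \ln t$, $n' = n_p + n_q$, $\sigma = 1/\alpha$ and $\mu = \ln \eta$, so that $t^*$ follows the extreme value distribution:
$$F(t^*) = 1 - \exp\left[-\exp\left(\frac{t^* - \mu}{\sigma}\right)\right].$$
The transformed catastrophic failure times can then be written as a linear model in $\mu$ and $\sigma$. Thus, the residual sum of squares $Q$ can be obtained, and the best linear unbiased estimates of the parameters $\mu$ and $\sigma$ of the extreme value distribution follow from the Gauss-Markov theorem.
Then, the parameters $\alpha$ and $\eta$ of the Weibull distribution can be estimated by:
$$\hat{\alpha} = \frac{1}{\hat{\sigma}}, \qquad \hat{\eta} = \exp(\hat{\mu}).$$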
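The change of variables between the Weibull and extreme value parameterizations can be checked numerically with a short sketch. It uses only the relations $\sigma = 1/\alpha$ and $\mu = \ln \eta$ stated above; the function names are hypothetical, and the best linear unbiased estimation step itself is not reproduced here.

```python
from math import exp, log


def weibull_from_extreme_value(mu, sigma):
    """Map extreme value location/scale to Weibull (shape alpha, scale eta)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return 1.0 / sigma, exp(mu)


def extreme_value_from_weibull(alpha, eta):
    """Inverse map: Weibull (alpha, eta) to extreme value (mu, sigma)."""
    if alpha <= 0.0 or eta <= 0.0:
        raise ValueError("alpha and eta must be positive")
    return log(eta), 1.0 / alpha


def extreme_value_reliability(t, mu, sigma):
    """Survival probability of t computed on the log scale t* = ln t."""
    return exp(-exp((log(t) - mu) / sigma))


if __name__ == "__main__":
    mu, sigma = extreme_value_from_weibull(2.0, 3.0)
    # Both parameterizations should give the same survival probability.
    print(extreme_value_reliability(1.5, mu, sigma), exp(-((1.5 / 3.0) ** 2.0)))
```

Working on the log scale turns the Weibull fit into a location-scale problem, which is what makes linear estimation applicable.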

Estimation of confidence intervals for R(t)
Based on the methods given in Sections 3 and 4, we can obtain the estimates $\hat{b}$, $\hat{\Sigma}$, $\hat{\sigma}$, $\hat{\alpha}$ and $\hat{\eta}$ of the competing failure model parameters. The reliability $R(t)$ of the competing failure can be calculated by substituting the estimates into $R(t; \hat{b}, \hat{\Sigma}, \hat{\sigma}, \hat{\alpha}, \hat{\eta})$. There are many methods to construct confidence intervals for a point on a distribution function.
One should note that it is nearly impossible to estimate the standard error of $R(t)$ directly, and we cannot select an appropriate distribution for the reliability function. It is therefore difficult to construct confidence intervals for $R(t)$. The bootstrap method is often used to construct confidence intervals or assess standard errors when no approach is available that is both tractable and sufficiently accurate. Accordingly, we develop the following bootstrap procedure to construct pointwise confidence intervals for $R(t)$.
1. Estimate the degradation model parameters $b$, $\Sigma$ and the catastrophic model parameters $\alpha$, $\eta$ by using the methods in Sections 3 and 4, respectively, giving the estimates $\hat{b}$, $\hat{\Sigma}$, $\hat{\sigma}$, $\hat{\alpha}$ and $\hat{\eta}$.
2-6. Generate simulated realizations $\tilde{t}_j$ of the catastrophic failure time from Eq. (34). The incomplete data then consist of the $n_q$ simulated realizations $\tilde{t}_j$ and the $n_p$ zero-failure data from the degradation units. Generate simulated degradation measurements, where $\tilde{\varepsilon}_{ij}$ are pseudo measurement errors generated from a zero-mean normal distribution, and compute the corresponding degradation failure times.
7. Compute the estimate $\hat{R}_d^*(t)$ from the simulated empirical distribution.
8. Compute the estimate $\hat{R}^*(t)$ of the competing reliability by substituting $\hat{R}_d^*(t)$ and $\hat{R}_{TC}^*(t)$ into Eq. (7) for any desired value of $t$.
10. Repeat steps 2-9 $B$ times to obtain the bootstrap estimates of $R(t)$.
11. Sort the estimates in increasing order for each desired time $t$.
12. Following [10], determine the lower and upper bounds of the pointwise $1 - \alpha$ confidence intervals for $R(t)$ from the sorted estimates.
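The final sorting and bound-selection steps can be sketched with a plain percentile interval. This is an assumption for illustration: the exact rule taken from [10] is not recoverable here and may differ (for example, it may include a bias correction). The function name and the `level` argument are hypothetical.

```python
import math


def percentile_interval(estimates, level=0.90):
    """Two-sided percentile interval from bootstrap estimates at one time t.

    Sorts the B estimates and returns the order statistics at ranks
    ceil(B * tail) and ceil(B * (1 - tail)), where tail = (1 - level) / 2.
    """
    if not estimates:
        raise ValueError("need at least one bootstrap estimate")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    ordered = sorted(estimates)
    b = len(ordered)
    tail = (1.0 - level) / 2.0
    # round() guards against floating-point noise before taking the ceiling.
    lower_rank = math.ceil(round(b * tail, 9))
    upper_rank = math.ceil(round(b * (1.0 - tail), 9))
    lower_idx = min(max(lower_rank - 1, 0), b - 1)
    upper_idx = min(max(upper_rank - 1, 0), b - 1)
    return ordered[lower_idx], ordered[upper_idx]


if __name__ == "__main__":
    # Stand-in values; in practice these would be the B bootstrap
    # reliability estimates computed at a fixed time t.
    fake_estimates = [i / 1000.0 for i in range(1000, 0, -1)]
    print(percentile_interval(fake_estimates, level=0.90))
    print(percentile_interval(fake_estimates, level=0.80))
```

Applying the function separately at each time point on a grid gives pointwise bands of the kind described in the procedure.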

Case study
The reliability evaluation method presented in this paper for products with competing failure is illustrated by an engineering example based on a well-known data set given in [26]. The data (Table 2) contain information about 33 cylinder liners of 8 cylinder SULZER RTA58 engines which were tested. A liner's failure is the competing result of wear failure and thermal crack failure. The wear failure mode can be treated as a performance degradation process and the thermal crack failure mode can be treated as a catastrophic failure. In this paper, we assume that the two failure modes are independent. In Table 2, δ = 1 and δ = 0 represent catastrophic failure and performance degradation failure, respectively. The measured time points and time intervals of the degradation units are listed in Table 2.

Reliability of the performance degradation failure mode
From Table 2, we observe that the data from 11 units ($i$ = 5, 7, 12, 13, 14, 17, 24, 25, 26, 28 and 30) can be assigned to the degradation failure mode. A cylinder liner is defined to have failed if the wear exceeds the degradation threshold $D_f = 4$ mm. However, the 14th, 17th, 24th, 26th and 30th degradation units have only one or two performance degradation measurements. We therefore treat the last measured times of these units as zero-failure (right-censored) data of the catastrophic failure mode. The degradation data thus consist of the remaining 6 units ($i$ = 5, 7, 12, 13, 25 and 28).
To test the normality assumption, we give the quantile-quantile (Q-Q) plot for the degradation data in Fig. 3, which shows that the plot of the quantiles of the degradation data versus the theoretical quantiles of a normal distribution is close to linear. In addition, we perform the Shapiro-Wilk (S-W) goodness-of-fit test. The S-W test also supports the normality assumption of the random-effects model for the degradation data, with a p-value of 0.73.
For each degradation unit, the estimates $\hat{\beta}_{i1}$ and $\hat{\beta}_{i2}$ of the degradation parameters can be obtained with the least squares method given in Section 3. We then use these estimates to test the assumptions required by the degradation model. For the random-effects degradation model, we assume that the random coefficients are normally distributed. To check the normality, we first give the P-P plots of $\hat{\beta}_{i1}$ and $\hat{\beta}_{i2}$, as shown in Fig. 4 and Fig. 5.
The sample points will be approximately linear if the estimates are normally distributed, which can be checked in Fig. 4 and Fig. 5. Then, we apply the proposed random-effects degradation model to fit the data. Based on the simple least squares method described in Section 3, the parameters of the degradation model can be estimated. To demonstrate the goodness of fit, the estimated mean degradation path is compared with the degradation samples. The results are depicted in Fig. 6, which shows the goodness of fit of the degradation model.
For further illustration, the 100$p$th percentile of the performance degradation at a given time $t$ can be computed from the fitted model. From Fig. 7, it can be observed that most wear data lie under the 10th percentile curve obtained from the proposed model. The standardized residuals are plotted over time in Fig. 8, which shows that the proposed degradation model is appropriate for describing the degradation data. Thus, we can obtain the reliability function $\hat{R}_d(t)$ of the performance degradation failure by substituting the estimated parameters into Eq. (4).

Reliability of the catastrophic failure mode
The incomplete data set consists of 22 catastrophic failure units and 5 performance degradation units ($i$ = 14, 17, 24, 26, 30). The incomplete data, ordered by the lifetimes of the two failure modes, are listed in Table 3. The last measured time points of the 30th, 26th and 17th degradation units can be considered as the values of the 1st, 9th and 13th interval statistics, and the last measured time points of the 24th and 14th degradation units can both be considered as values of the 10th interval statistic.
The catastrophic failure times are assumed to follow a Weibull distribution. This assumption is justified by the theoretical consideration that fatigue life data are often adequately described by the Weibull distribution, and it is supported by a graphical analysis. In particular, the data are plotted on Weibull paper in Fig. 9, which shows that the points roughly follow a straight line and gives no obvious evidence that the catastrophic failure data do not fit a Weibull distribution.
The estimated catastrophic failure model parameters for the incomplete data are listed in Table 4. In comparison with the conventional approach, the estimates of the Weibull distribution parameters increase significantly. The estimated shape parameter $\hat{\alpha}$ increases from 3.1914 to 10.4123. The reason is that the incomplete-data estimation theory used here combines the catastrophic failure data with the last test time points of the degradation units that have only one or two inspection time points, so the sample size is enlarged and the population properties can be described more accurately. Meanwhile, as the amount of life information increases, the estimated scale parameter $\hat{\eta}$ also increases. In addition, the estimated MTTF improves by a factor of two because the test information is fully used.
According to the results listed in Table 4, the reliability of the catastrophic failure at a given time $t$ can be obtained for both sets of estimates: those from the traditional method [7] and those from the method in this paper. Plotting the reliabilities versus $t$, it can be observed that the catastrophic failure of the cylinder liner is the dominant failure mode. After the first 12000 h of operation, the reliability of the catastrophic failure mode begins to decrease significantly. This result can be explained by the fact that the probability of a thermal crack grows as the wear loss increases.
Following Section 5, we obtain the competing failure model estimate $\hat{R}(t)$ with pointwise two-sided 90% and 80% confidence intervals, as shown in Fig. 11. The confidence intervals are obtained by bootstrap simulation with $B = 5000$ and $N_B = 10000$.

Conclusions
The conclusions drawn from this research are as follows:
1. Considering degradation and catastrophic failures, a general reliability analysis model for the competing failure mode has been presented.
2. Unlike previous studies, which assume that the degradation data are repeated measurements, this paper presents a linear random-effect model for highly unbalanced measurement data and develops a least squares method for parameter estimation in the situation where the degradation and catastrophic failures are independent.
3. For the catastrophic failure mode, we propose a reliability model based on the Weibull distribution. By combining the catastrophic failure data with the last measured time points of the degradation units that have only one or two measured time points, we obtain the estimates of the catastrophic failure model based on interval statistics theory. This method makes full use of the test information and improves the accuracy of estimation.
4. Based on the bootstrap method, we obtain two-sided confidence intervals of the competing failure model for reliability assessment.
5. A practical application case was examined by applying the proposed methods to the competing failure data of cylinder liners. The results show that the degradation and catastrophic failure models presented in this paper are feasible and reasonable in practical applications.
However, the performance degradation failure and the catastrophic failure in some products depend on each other. In addition, more than two failure modes have been found in some products. Therefore, future work will focus on competing failure models for products with more than two dependent failure modes.

Proof of Theorem 1:
For $N$ order statistics, there are $N + 1$ corresponding interval statistics. Define a small interval around a point $x$ and assume that one interval statistic $X_i^o$ falls in this interval; the probability of this event then follows by counting how the remaining statistics are distributed on either side of the interval. Similarly, Theorems 2 and 3 can be proved.
... actual degradation path for unit $i$. Then, ... and the incomplete data to estimate the parameters of the competing model, giving ...
From Fig. 4 and Fig. 5, it can be observed that the estimated values of both $\hat{\beta}_{i1}$ and $\hat{\beta}_{i2}$ perform quite well. To further test the normality of the degradation model parameters, S-W goodness-of-fit tests are performed. For the random-effects model, the S-W test fails to reject the null hypothesis that $\hat{\beta}_{i1}$ and $\hat{\beta}_{i2}$ are normally distributed, with p-values of 0.57 and 0.29, respectively.

Fig. 3. Q-Q plot of the degradation data

Reliability of the competing failure
Following Eq. (7), we can obtain the estimates of the reliabilities $\hat{R}(t)$, $\hat{R}_d(t)$ and $\hat{R}_{TC}(t)$ by substituting the estimates $\hat{b}$, $\hat{\Sigma}$, $\hat{\alpha}$ and $\hat{\eta}$. Fig. 10 depicts the product's reliability $\hat{R}(t)$ under the competing failure model together with the reliabilities $\hat{R}_d(t)$ and $\hat{R}_{TC}(t)$ versus $t$.

Fig. 10. Reliability of the competing failure, performance degradation and catastrophic failure
..., and $i$ interval statistics are in the interval $(-\infty, x)$. Thus, there are $N - i$ interval statistics in the interval ...

Table 1. Product competing failure data

Definition of interval statistics
$X_i$ follows a distribution function $F(x)$.

Table 2. Performance degradation and catastrophic failure data of cylinder liners

Table 3. Incomplete data consisting of catastrophic failures and degradation data

Table 4. Comparison between the estimated results from traditional method and this paper