Discrimination between accelerated life models via Approximate Bayesian Computation

Accelerated life testing (ALT) is widely used in high‐reliability product estimation to get relevant information about an item's performance and its failure mechanisms. To analyse the observed ALT data, reliability practitioners need to select a suitable accelerated life model based on the nature of the stress and the physics involved. A statistical model consists of (i) a lifetime distribution that represents the scatter in product life and (ii) a relationship between life and stress. In practice, several accelerated life models could be used for the same failure mode, and the choice of the best model is far from trivial. For this reason, an efficient selection procedure to discriminate between a set of competing accelerated life models is of great importance for practitioners. In this paper, accelerated life model selection is approached by using the Approximate Bayesian Computation (ABC) method, with a likelihood‐based approach used for comparison purposes. To demonstrate the efficiency of the ABC method in calibrating and selecting accelerated life models, an extensive Monte Carlo simulation study is carried out using different distances to measure the discrepancy between the empirical and simulated times-to-failure data. Then, the ABC algorithm is applied to real accelerated fatigue life data in order to select the most likely model among five plausible models. It is demonstrated that the ABC method outperforms the likelihood‐based approach in terms of reliability predictions, mainly at the lower percentiles that are particularly useful in reliability engineering and risk assessment applications. Moreover, it is shown that ABC can mitigate the effects of model misspecification through an appropriate choice of the distance function.


INTRODUCTION
Designing components with high reliability and long lifetime is a challenging task for industry because the necessary testing time under normal use conditions is often excessive. Such circumstances lead practitioners to consider accelerated life testing (ALT) as the most suitable technique to overcome this issue, through the extrapolation of failure data collected at severe stress levels down to the nominal stress level. 1 Roughly speaking, the components are subjected to higher stress conditions than those under which they were designed to operate, forcing them to fail more rapidly. To make this extrapolation valid, it is commonly assumed that the failure modes and failure mechanisms remain the same at all the applied stress levels, and that the failure times at every stress level are governed by the same parametric lifetime distribution. Accelerated life models are then formulated by expressing the parameters of the lifetime distribution through relationships, called acceleration models, which describe their dependence on the stress levels. Statistical physics-based models explain the relationship between the applied stress levels and the system failure time by combining the parameters of the failure physics with the statistical parameters to obtain realistic models. 1-3 A detailed description of ALT can be found in Meeker and Escobar. 4 In ALT, researchers are broadly divided between two assumptions. (1) The stress independence of the distribution shape parameter is commonly accepted because the resulting models are typically easier to use; since extrapolation is made assuming that the failure mechanism remains the same at all the accelerated stress levels, such a constraint is sometimes valid from a practical standpoint, with the tested units failing in the same manner across different loads. Carlsson et al. 5 made this assumption for the service life assessment of solar thermal components, and Gaertner et al. 6 for the life prediction at normal use conditions of cathode ray tubes. (2) The second assumption is that the shape parameter does depend on stress; more generally, when more than one parameter of the lifetime distribution depends on the stress, each can be related to the stress by an acceleration model. In the literature, some authors suggested checking the validity of the stress independence assumption by a graphical procedure: Whitman et al. 7 showed that the slopes of the probability lines for the different stress levels are nearly parallel. Also, Barlow et al. 8 showed that ALT data of Kevlar/Epoxy strands follow a Weibull distribution with a stress-dependent shape parameter. In the same perspective, Giuseppe et al. 9 studied the estimation of fatigue reliability of structural components via a two-parameter Birnbaum-Saunders (BS) model with stress-dependent scale and shape parameters.
Selecting a suitable accelerated life model is still quite challenging for practitioners. In practice, several competing models could potentially fit the data reasonably well, given the physics involved. Birnbaum and Saunders 10 showed that most two-parameter lifetime distributions fit fatigue data well and produce similar fitting qualities in the central region of the reliability curve. Nevertheless, large discrepancies can be seen at the lower and upper percentiles when comparing these distributions. This is a challenging issue for industries interested in low-percentile predictions. Box and Draper 11 stated that 'All models are wrong but some are useful'; this means that there is no true model for all situations, only a model that performs better than the other competing models for a specific goal. Hence, the challenging task consists in selecting the 'most useful' model and estimating its parameters from ALT observations. This is of paramount importance for making good reliability predictions at normal use conditions, especially for the lower lifetime percentiles that matter for early failure prediction, the avoidance of costly product failure claims and warranty analysis. 12 Thus, models should be compared in order to select the one with the highest performance. A comprehensive review of reliability model selection was presented by Settani et al. 13 Several classical methods have been proposed in the literature to deal with model selection; the most widely used are the weighted least squares method, [14][15][16] the graphical analysis method, 17,18 the likelihood ratio test (see Kalbfleisch et al. 19 ) and information criteria such as the Akaike Information Criterion (AIC) 20 and the Bayesian Information Criterion (BIC). 21 Ghaly et al. 22 used the likelihood ratio test to compare the exponentiated inverted Weibull and inverted Weibull distributions on a real-life data set. These approaches require the estimation of the model parameters that maximize the likelihood function and an additional term to penalise overly complex models. Their major limitation is that they treat the competing models separately, regardless of the uncertainties in the parameter estimates, which can lead to an inaccurate selection of the 'best' model, as stated by Gupta et al. 23 In contrast to the classical approaches, Santana et al. 24 proposed a novel Bayesian formulation based on residual evaluation to examine the benefits of uncertainty evaluation in the model selection procedure. Other Bayesian approaches discriminate between the competing models by evaluating goodness-of-fit indices; the idea is to select the model that presents the best point estimate of the indices under evaluation. 25 Ling et al. 26 proposed an optimization method for parameter estimation and model selection based on minimizing the Kolmogorov-Smirnov distance between the observed probability of failure and the expected probability of failure associated with the competing distributions.
Further methods based on Bayes theory have been proposed in the literature for model selection. The Reversible-Jump Markov-Chain Monte Carlo (RJ-MCMC) algorithm is one of the most widely used Bayesian methods for model selection. 27 Several studies have addressed the strengths and weaknesses of RJ-MCMC and other trans-dimensional setups of the algorithm. 28,29 Indeed, their implementation requires the construction of across-model proposals between the state spaces. The change of model dimension introduces an additional difficulty, since the definition of the between-model transition matrices may no longer be intuitive; this can slow the exploration of the state space. In some circumstances, the proposed between-model jumps tend to be rejected and the chain becomes trapped in one model, even though that model has low posterior probability. 30 Bayes factors (BFs) 31 have long been considered the standard tools for performing Bayesian model comparison. However, they provide only a relative comparison of competing models, not the absolute values of their posterior probabilities, even though, from a decision-making point of view, a BF can be perceived as a surrogate for the posterior probability of the null hypothesis. The main drawback of BFs is their sensitivity to the choice of priors. In fact, the approach proposed by Jeffreys 31 is not available for comparing models with improper priors, since these are defined only up to a multiplicative constant, leaving the BF undetermined. 32 This issue was addressed by the intrinsic Bayes factors (IBFs) proposed by Berger et al. 33 The IBF uses part of the data as a training sample to convert the non-informative priors into proper posterior distributions; the IBFs are then computed with the remainder of the data, using the posterior distribution estimated from the training data as a prior. Nevertheless, the main limitation of IBFs is that they handle only pairwise comparisons, which makes discriminating among numerous models cumbersome.
So far, the common limitation of the classical Bayesian approaches is that they require the definition of a likelihood function to measure the level of agreement between the observed and simulated data, which is particularly problematic for high-dimensional models and small samples. In some circumstances, the likelihood function may not be available in closed form and may encounter numerical problems such as non-convergence or convergence to wrong roots, as raised in Refs. 34-37. Another aspect concerns the effects of model misspecification on the desired estimates of some life characteristics. Balakrishnan et al. 38 showed that the effect of model misspecification between Gamma and Weibull distributions should not be ignored for one-shot device testing data. The authors evaluated the bias, mean square errors (MSEs), coverage probabilities and average widths of confidence intervals for the mean lifetime as well as the reliability at a given mission time, and suggested implementing a specification test to improve the estimation accuracy. Furthermore, Khakifirooz et al. 39 considered generalised Gamma, Lognormal and Weibull distributions for studying model misspecification effects under accelerated life testing with censored data. Castilla et al. 40 studied the effect of model misspecification between Gamma, Weibull, Lognormal and BS lifetime distributions on the design of optimal constant-stress accelerated life tests for one-shot devices.
In this work, in order to overcome the above-mentioned issues, a likelihood-free method, also called Approximate Bayesian Computation (ABC), which has been widely used in the literature (see Beaumont et al. 41 ) to deal with model selection and parameter estimation, is adopted. Contrary to classical Bayesian inference, which requires writing down a likelihood function, ABC algorithms relax the need for an explicit likelihood. In a nutshell, ABC consists in approximating the posterior distributions by generating samples from the prior distributions of the model parameters, evaluating the level of agreement between the ALT data and simulated data through an appropriate distance metric, and propagating good particles until the convergence condition is satisfied. Nevertheless, a well-known problem of the ABC algorithm in its basic form is the low acceptance rate of samples, which decreases dramatically along the iterations and increases the computational time exponentially, as reported by Bonassi et al. 42 To overcome this shortcoming and alleviate the computational burden, an ABC algorithm based on an elegant ellipsoidal Nested Sampling technique (ABC-NS) and a re-weighting scheme introduced in Ref. 43 is used. It promises drastic speedups compared to other variants 43 and provides a good alternative for parameter estimation, as shown in Ref. 44. Additionally, it has been demonstrated that ABC-NS can mitigate the effects of model misspecification through an appropriate choice of the distance function, which can be tailored according to the user's interest.
The paper is organized as follows. In Section 2, the principle of Bayesian inference and ABC is set out with a brief description of the algorithm implementation. Section 3 introduces the accelerated life models used in this study. In Section 4, a simulation study is carried out to demonstrate the performance of the proposed likelihood-free Bayesian procedure compared to a likelihood-based approach. In Section 5, the procedure is applied to real fatigue life data. Finally, Section 6 provides some concluding remarks with an emphasis on future research challenges.

Bayes formalism
Bayesian inference has proven very efficient to deal with model selection and parameter estimation in many disciplines. It consists of assessing the relative plausibility of a set of competing models $\mathbf{M} = \{\mathcal{M}_j,\ j = 1, \dots, M\}$ as well as the parameters $\boldsymbol{\theta}$ within each model (for simpler presentation a subscript on $\boldsymbol{\theta}$ is omitted) using a combination of one's prior knowledge and a set of available data, $\mathcal{D}$. Both levels of inference (i.e. parameter estimation and model selection) can be addressed via the sequential application of Bayes' theorem:

$$\mathbb{P}(\boldsymbol{\theta} \mid \mathcal{D}, \mathcal{M}_j) = \frac{\mathbb{P}(\mathcal{D} \mid \boldsymbol{\theta}, \mathcal{M}_j)\,\mathbb{P}(\boldsymbol{\theta} \mid \mathcal{M}_j)}{\mathbb{P}(\mathcal{D} \mid \mathcal{M}_j)} \qquad (1)$$

$$\mathbb{P}(\mathcal{M}_j \mid \mathcal{D}) = \frac{\mathbb{P}(\mathcal{D} \mid \mathcal{M}_j)\,\mathbb{P}(\mathcal{M}_j)}{\mathbb{P}(\mathcal{D})} \qquad (2)$$

Evaluation of Equation (1) requires the definition of the prior, $\mathbb{P}(\boldsymbol{\theta} \mid \mathcal{M}_j)$, and the likelihood, $\mathbb{P}(\mathcal{D} \mid \boldsymbol{\theta}, \mathcal{M}_j)$. The prior is a subjective probability distribution which describes our prior belief about the parameter values. The likelihood encodes the information contained in the data according to model $\mathcal{M}_j$ with parameters $\boldsymbol{\theta}$. The denominator of Equation (1) is a normalising constant which ensures that $\mathbb{P}(\boldsymbol{\theta} \mid \mathcal{D}, \mathcal{M}_j)$ integrates to unity. Successful evaluation of Equation (1) gives the posterior parameter distribution, which describes the probability of the parameter vector, $\boldsymbol{\theta}$, given the data, $\mathcal{D}$, and the chosen model structure, $\mathcal{M}_j$.
With regard to Equation (2), $\mathbb{P}(\mathcal{M}_j)$ is the prior probability of $\mathcal{M}_j$ that expresses the modeller's judgement on the initial relative plausibility of $\mathcal{M}_j \in \mathbf{M}$ before observing the data $\mathcal{D}$. The prior model probability $\mathbb{P}(\mathcal{M}_j)$ can be specified depending on the existing prior knowledge about the credibility of model $\mathcal{M}_j$, or it can be given a uniform probability, $\mathbb{P}(\mathcal{M}_j) = 1/M$, if no such information is available. $\mathbb{P}(\mathcal{D})$ is a normalising constant, and $\mathbb{P}(\mathcal{D} \mid \mathcal{M}_j)$ is equal to the evidence term in the denominator of Equation (1). $\mathbb{P}(\mathcal{M}_j \mid \mathcal{D})$ is a distribution describing the relative probability of the different competing model structures conditional on the data, $\mathcal{D}$.

Likelihood-free method
To deploy ABC methods, we need to be able to simulate data from the model and require a distance function, $\Delta(\mathcal{D}, \mathcal{D}^\star)$, where $\mathcal{D}^\star$ is the simulated data generated through the model. This distance function provides a measure of discrepancy between the observed and the simulated data. After calculating $\Delta(\mathcal{D}, \mathcal{D}^\star)$, we accept the particle if $\Delta(\mathcal{D}, \mathcal{D}^\star) \le \varepsilon$, where $\varepsilon$ is the tolerance threshold. This leads to a modification of the original Bayes theorem (see Equation 1; the model index is dropped for simplicity):

$$\mathbb{P}\left(\boldsymbol{\theta} \mid \Delta(\mathcal{D}, \mathcal{D}^\star) \le \varepsilon\right) \propto \mathbb{P}\left(\Delta(\mathcal{D}, \mathcal{D}^\star) \le \varepsilon \mid \boldsymbol{\theta}\right)\,\mathbb{P}(\boldsymbol{\theta})$$

For the prior probability distribution, $\mathbb{P}(\boldsymbol{\theta})$, we will assume a uniform prior over the possible parameter set, $\boldsymbol{\Theta}$. By being able to produce simulations from the model, we are able to perform inference for the parameters of interest given the data $\mathcal{D}$. The tolerance threshold $\varepsilon$ determines the level of approximation: as $\varepsilon \to 0$, $\mathbb{P}(\boldsymbol{\theta} \mid \Delta(\mathcal{D}, \mathcal{D}^\star) \le \varepsilon) \to \mathbb{P}(\boldsymbol{\theta} \mid \mathcal{D})$.
As already mentioned, the ABC method can address both levels of inference. Contrary to classical Bayesian inference, the ABC algorithm brings a straightforward solution to approximate $\mathbb{P}(\mathcal{M}_j \mid \mathcal{D})$ (see Refs. 45,46). Briefly, a model $\mathcal{M}_j$ is sampled from the prior, and it is stored only if the discrepancy between the simulated data and the observed data is less than a prefixed threshold. The posterior probability $\mathbb{P}(\mathcal{M}_j \mid \mathcal{D})$ is then approximated by the acceptance frequency of $\mathcal{M}_j$. When no prior information is available on the preference among the models, one assumes that the prior on the models is uniform. As a result, comparing $\mathbb{P}(\mathcal{M}_j \mid \mathcal{D})$ amounts to comparing $\mathbb{P}(\mathcal{D} \mid \mathcal{M}_j)$, and the model with the largest value of $\mathbb{P}(\mathcal{D} \mid \mathcal{M}_j)$ should be selected. In the literature, $\mathbb{P}(\mathcal{D} \mid \mathcal{M}_j)$ is known as the Bayesian evidence or the marginal likelihood under model $\mathcal{M}_j$, and it has also been widely used for model selection (e.g. see Refs. 47,48). The most elementary implementation of ABC is ABC rejection sampling (ABC-RS) 49,50 (see Algorithm 1). This algorithm requires a huge computational time to converge because it samples blindly over the model and parameter spaces. To overcome this issue, many variants have been proposed in the literature. 51,52
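The ABC-RS scheme described above can be sketched in a few lines of Python. The toy competing models (Weibull versus Lognormal lifetimes), the prior ranges and the crude mean-based distance below are illustrative assumptions chosen to keep the sketch self-contained; they are not the models or distances used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(data, models, priors, distance, eps, n_accept):
    """Basic ABC rejection sampling over models and parameters.

    models : list of simulators, each mapping (theta, n) -> simulated data
    priors : list of samplers, each returning one parameter draw
    """
    accepted = []  # pairs (model index, parameter draw)
    while len(accepted) < n_accept:
        j = rng.integers(len(models))          # uniform prior over the models
        theta = priors[j]()                    # draw from the parameter prior
        d_star = models[j](theta, len(data))   # simulate a candidate data set
        if distance(data, d_star) <= eps:      # keep the pair if close enough
            accepted.append((j, theta))
    # posterior model probabilities approximated by acceptance frequencies
    counts = np.bincount([j for j, _ in accepted], minlength=len(models))
    return accepted, counts / n_accept

# illustrative example: discriminate Weibull from Lognormal lifetimes
data = rng.weibull(1.5, 200) * 10.0
models = [lambda th, n: rng.weibull(th[0], n) * th[1],
          lambda th, n: rng.lognormal(th[0], th[1], n)]
priors = [lambda: (rng.uniform(0.5, 3.0), rng.uniform(1.0, 20.0)),
          lambda: (rng.uniform(0.0, 4.0), rng.uniform(0.1, 2.0))]
dist = lambda d, d_star: abs(np.mean(d) - np.mean(d_star))  # crude summary distance

accepted, post = abc_rejection(data, models, priors, dist, eps=0.5, n_accept=500)
```

Because the sampler draws blindly from the priors, most simulations are rejected; this is precisely the inefficiency that the ABC-NS variant described next is designed to remove.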

Steps of the ABC-NS algorithm
The iterative process is given in Algorithm 2 for completeness (see Ref. 43 for more details). The algorithm starts by selecting a model from the candidate models, supposed to be equally probable. A proposal particle is then sampled from the prior over the model parameters, the data are simulated using the probability model, and the pair (model, particle) is kept if the constraint $\Delta(\mathcal{D}, \mathcal{D}^\star) \le \varepsilon_1$ is satisfied (here, $\mathcal{D}$ denotes the observed data, $\mathcal{D}^\star$ the simulated data and $\Delta(\cdot)$ the function measuring the discrepancy). In this study, the initial tolerance threshold is chosen to target an acceptance rate of roughly 50%. The simulations are repeated until one gets $N$ particles distributed over the candidate models ($N$ is equal to 1000 for all the simulations). Then, weights are calculated for each particle and normalised within each model using the kernel shown in step 10. The first iteration of the ABC-NS algorithm is thus similar to the ABC-RS algorithm, but with a large tolerance threshold value to speed up the algorithm. The subsequent tolerance threshold is defined from the discrepancy values ranked in descending order (highest on top, see step 12) as the $(\beta_0 N)$th value, where $\beta_0$ is equal to 0.3; the dropped particles represent 30% of the total number of particles. After that, the weights of the remaining particles are normalised (see step 14) and a weight of zero is assigned to the dropped particles. From the remaining particles, a number $\alpha_0 N$ are randomly selected and propagated to the next population ($\alpha_0$ is equal to 0.6). In this way, the algorithm can visit any part of the parameter space and cannot be trapped in a local minimum, which ensures a good exploration of the model and parameter spaces. The remaining particles associated with each model are then enclosed in an ellipsoid whose mass centre $\boldsymbol{\mu}$ and covariance matrix $\mathbf{C}$ are estimated from the remaining particles only; one denotes this ellipsoid by $\mathcal{E}^{(j)} = (\boldsymbol{\mu}^{(j)}, \mathbf{C}^{(j)})$.
The aim of the elliptical bound is to restrict sampling to the most interesting part of the parameter space, which improves the acceptance ratio and the efficiency through the iterations. The volume of the generated ellipsoid can be enlarged by a factor $f_0$, as mentioned in step 17, to ensure that the particles on the borders fall inside. It is worth noting that ellipsoidal sampling was first proposed in Ref. 53 to improve the efficiency of the nested sampling algorithm, which has been widely used for Bayesian inference. Finally, the population is replenished by re-sampling particles inside the enlarged ellipsoid of each model (see step 18), following the scheme in Ref. 54. In the subsequent steps, the threshold is updated adaptively in the same way and sample selection is subjected to an increasingly stringent threshold. The priors on the models are also updated whenever one of the competing models is eliminated. Through the populations, as $\varepsilon \to 0$, a larger number of particles are selected for the most likely model(s) and the parameter samples better reflect the true posterior distribution. Several stopping criteria could be used to stop the algorithm; in this work, the algorithm is stopped when the difference between two successive tolerance threshold values falls below a pre-specified value.
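The ellipsoidal replenishment step can be sketched as follows. This is a simplified reading of step 18, not the exact ABC-NS implementation of Ref. 43: the ellipsoid is centred at the survivors' mass centre, shaped by their covariance, scaled so that the farthest survivor lies on its boundary, and then enlarged by the factor $f_0$ before drawing new particles uniformly inside it:

```python
import numpy as np

rng = np.random.default_rng(1)

def resample_in_ellipsoid(particles, n_new, f0=1.1):
    """Draw n_new particles uniformly inside the enlarged bounding
    ellipsoid of the surviving particles (sketch of ABC-NS step 18)."""
    mu = particles.mean(axis=0)                      # ellipsoid mass centre
    cov = np.cov(particles, rowvar=False)            # survivors' covariance
    inv = np.linalg.inv(cov)
    diffs = particles - mu
    # squared Mahalanobis radius of the farthest survivor
    r2 = np.max(np.einsum("ij,jk,ik->i", diffs, inv, diffs))
    # Cholesky factor mapping the unit ball onto the enlarged ellipsoid
    L = np.linalg.cholesky(cov * r2 * f0)
    d = particles.shape[1]
    z = rng.normal(size=(n_new, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)    # uniform on the unit sphere
    u = rng.uniform(size=(n_new, 1)) ** (1.0 / d)    # radius for uniformity in the ball
    return mu + (z * u) @ L.T

# example: 700 surviving 2-D particles, replenish 300 new ones
survivors = rng.normal([1.5, 10.0], [0.2, 1.0], size=(700, 2))
new = resample_in_ellipsoid(survivors, 300)
```

By construction, every re-sampled particle lies inside the enlarged ellipsoid, so the proposal concentrates on the region currently supported by the surviving particles.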
The hyper-parameters applied in this study are those suggested in Ref. 43 through a sensitivity analysis. To sum up, the ABC-NS hyper-parameters are selected so as to guarantee the best trade-off between computational time and statistical efficiency, by ensuring relatively high acceptance rates over the iterations (≥50%) and precise estimates.

In fact, the stopping criterion threshold value is selected by the user according to the required precision of the posterior estimates. Ben Abdessalem et al. 43 showed that, for a fixed stopping criterion threshold, the ellipsoid enlargement factor, $f_0$, only slightly impacts the acceptance rate of the ABC-NS approach over the populations; furthermore, the effect of $f_0$ on the posterior estimates is negligible. As for the initial tolerance threshold value $\varepsilon_1$, it is calibrated for each case study by running a few preliminary numerical simulations; it should be set large in order to explore the whole parameter space. Finally, the best trade-off between $\beta_0$ and $\alpha_0$ is, respectively, 0.3 and 0.6, as these values ensure a stable convergence and a high acceptance rate.
At the last population, the algorithm produces a Markov chain on $(\mathcal{M}, \boldsymbol{\theta})$ for which the marginal distribution is $p(\mathcal{M} \mid \mathcal{D})$. Supposing that the pairs $(\mathcal{M}^{(i)}, \boldsymbol{\theta}^{(i)})$, $i = 1, \dots, N$, are obtained at the last population, the posterior probability of model $\mathcal{M}_j$, $j = 1, \dots, M$, can then be estimated by

$$\hat{\mathbb{P}}(\mathcal{M}_j \mid \mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(\mathcal{M}^{(i)} = \mathcal{M}_j\right)$$

Selection of distance functions
To implement the ABC-NS algorithm, one needs to select an effective distance function. Various types of distances have been proposed in the literature to measure the discrepancy between the empirical data (times to failure) and the data simulated by the reliability model. In this section, the computational forms of the distances used in this study are given. 55

• The Cramer-von Mises distance

The Cramer-von Mises (CM) distance measures the distance between the cumulative distribution function (CDF) of the candidate distribution and the dataset's cumulative histogram. It is defined by

$$W^2 = n \int_{-\infty}^{+\infty} \left[F(x) - F_n(x)\right]^2 \Psi(x)\, \mathrm{d}F(x)$$

where $F(x)$ is the predicted cumulative probability from the theoretical CDF and $F_n(x)$ is the empirical cumulative distribution function (ECDF) of the observed data. $\Psi(x)$ is a weight function such that $\Psi(x) = 1$. For a given ordered sample, $x_{(1)}, x_{(2)}, \dots, x_{(n)}$, from a statistical distribution, the distance can be written as follows:

$$W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[z_i - \frac{2i-1}{2n}\right]^2$$

where $z_i = F(x_{(i)})$ is the predicted value for the $i$th observation. In the following, the same symbols keep the same meaning.

• The Anderson-Darling distance

The Anderson-Darling (AD) distance was first introduced by Anderson and Darling 56 to place more weight on the tails of the distribution. 57 It is defined by the following equation:

$$A^2 = n \int_{-\infty}^{+\infty} \left[F(x) - F_n(x)\right]^2 \Psi(x)\, \mathrm{d}F(x) \qquad (7)$$

where $\Psi(x) = \left[F(x)\left(1 - F(x)\right)\right]^{-1}$ is a non-negative weight function. Equation (7) can be written for a finite data sample as

$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \left[\ln z_i + \ln\left(1 - z_{n+1-i}\right)\right]$$

Two modified versions of the AD distance, called ADL and ADR, which give more weight to the left and right tail, respectively, have been tested. They are computed by the following formulas:

$$\mathrm{ADL} = -\frac{3n}{2} + 2\sum_{i=1}^{n} z_i - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \ln z_i$$

$$\mathrm{ADR} = \frac{n}{2} - 2\sum_{i=1}^{n} z_i - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \ln\left(1 - z_{n+1-i}\right)$$

Similarly, the AD2L and AD2R metrics, which assign even larger weights to the left and right tails of the distribution, are considered. They are computed by the following formulas:

$$\mathrm{AD2L} = 2\sum_{i=1}^{n} \ln z_i + \frac{1}{n} \sum_{i=1}^{n} \frac{2i-1}{z_i}$$

$$\mathrm{AD2R} = 2\sum_{i=1}^{n} \ln\left(1 - z_i\right) + \frac{1}{n} \sum_{i=1}^{n} \frac{2i-1}{1 - z_{n+1-i}}$$
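The finite-sample forms of the CM and AD distances can be sketched directly from the formulas above. The helper below assumes the model CDF values $z_i = F(x_{(i)})$ are already computed from the ordered observed failure times; the Weibull example in the trailing comment is illustrative:

```python
import numpy as np

def cm_distance(z):
    """Cramer-von Mises distance from ordered model CDF values z_i = F(x_(i))."""
    n = len(z)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((z - (2 * i - 1) / (2 * n)) ** 2)

def ad_distance(z):
    """Anderson-Darling distance, placing more weight on both tails."""
    n = len(z)
    i = np.arange(1, n + 1)
    # z[::-1] supplies the reversed-order terms z_{n+1-i}
    return -n - np.sum((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1]))) / n

# usage: for a candidate Weibull(eta, beta) model and observed failure times t,
# z = 1 - np.exp(-(np.sort(t) / eta) ** beta)
```

In the ABC setting, these functions replace the likelihood: the candidate parameters determine the theoretical CDF, and the resulting distance value is compared against the tolerance threshold.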

Choice of the parametric lifetime distributions
Building an accelerated life model consists in choosing a parametric lifetime distribution and an acceleration model. In practice, the ALT data (times-to-failure) may be represented by several lifetime distributions. In this study, three parametric distributions, namely the Weibull, BS and Lognormal distributions, will be used. It should be noted that a candidate distribution is chosen for one of the following reasons: (i) there is a physical/statistical argument that theoretically matches a failure mechanism to a life distribution model, (ii) a particular model has previously been used successfully and confirmed by experience for the same or a similar failure mechanism, or (iii) a convenient model provides a good empirical fit to all the failure data.
In this study, our choices are motivated by the failure mechanism which is induced by fatigue. The specifications of the parametric distributions used in this study with some relevant life characteristics are defined below.
• The two-parameter Weibull distribution This is a popular distribution for modelling lifetime data because of its flexibility across multiple types of failure mechanisms and the ease of computing probabilities without numerical integration (Cao et al. 58 ). The probability density function and the reliability function of the two-parameter Weibull distribution are given, respectively, by

$$f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta - 1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \qquad R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$$

where $\eta$ and $\beta$ are the scale and shape parameters, respectively, and $t$ is the lifetime of the component. The percentile, denoted $t_p$, is the time by which a specified fraction $p$ of the population fails; it is given by

$$t_p = F^{-1}(p) = \eta \left[-\ln(1 - p)\right]^{1/\beta}$$

where $F^{-1}(\cdot)$ is the inverse CDF. The mean time to failure (MTTF) is the expected life $\mathbb{E}(T)$, defined by

$$\mathrm{MTTF} = \eta\, \Gamma\left(1 + \frac{1}{\beta}\right)$$

where $\Gamma(\cdot)$ denotes the gamma function. • The two-parameter Lognormal distribution The Lognormal distribution is widely used for modelling failure times of materials and structures induced by fatigue. When $T$ has a Lognormal distribution with parameters $\mu$ and $\sigma^2$, then $Y = \ln(T)$ has a normal distribution with mean $\mu$ and variance $\sigma^2$. 4 The probability density function of the Lognormal distribution is

$$f(t) = \frac{1}{\sigma t}\, \phi\left(\frac{\ln t - \mu}{\sigma}\right)$$

where $\phi(\cdot)$ is the standard normal pdf. The percentile is given by the following equation:

$$t_p = \exp\left[\mu + \sigma\, \Phi^{-1}(p)\right]$$

where $\Phi^{-1}(\cdot)$ is the inverse standard normal CDF. The MTTF is defined by

$$\mathrm{MTTF} = \exp\left(\mu + \frac{\sigma^2}{2}\right)$$

• The two-parameter Birnbaum-Saunders distribution The BS model is based on a physical argument of cumulative damage that produces fatigue in materials under cyclical stress; failure follows from the initiation and growth of a dominant crack in the material. The probability density function and the reliability function of the BS model are given, respectively, by

$$f(t) = \frac{1}{2\sqrt{2\pi}\,\alpha\beta} \left[\left(\frac{\beta}{t}\right)^{1/2} + \left(\frac{\beta}{t}\right)^{3/2}\right] \exp\left[-\frac{1}{2\alpha^2}\left(\frac{t}{\beta} + \frac{\beta}{t} - 2\right)\right], \qquad R(t) = 1 - \Phi\left[\frac{1}{\alpha}\left(\sqrt{\frac{t}{\beta}} - \sqrt{\frac{\beta}{t}}\right)\right]$$

where $\alpha$ and $\beta$ are the shape and scale parameters, respectively, $t$ is the lifetime of the component and $\Phi(\cdot)$ is the CDF of the standard normal distribution.
By definition, the BS random variable $T \sim \mathrm{BS}(\alpha, \beta)$ is linked to the standard normal random variable $Z \sim \mathcal{N}(0, 1)$ by the following relationship:

$$T = \beta \left[\frac{\alpha Z}{2} + \sqrt{\left(\frac{\alpha Z}{2}\right)^2 + 1}\right]^2$$

The percentile and the MTTF are given, respectively, by the following equations:

$$t_p = \beta \left[\frac{\alpha\, \Phi^{-1}(p)}{2} + \sqrt{\left(\frac{\alpha\, \Phi^{-1}(p)}{2}\right)^2 + 1}\right]^2, \qquad \mathrm{MTTF} = \beta \left(1 + \frac{\alpha^2}{2}\right)$$
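The percentile and MTTF expressions of the three distributions translate directly into code. The sketch below uses only the Python standard library and the standard parameterisations given above ($\eta$, $\beta$ for Weibull; $\mu$, $\sigma$ for Lognormal; $\alpha$, $\beta$ for BS):

```python
from math import exp, gamma, log, sqrt
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # inverse standard normal CDF, Phi^{-1}

def weibull_percentile(p, eta, beta):
    return eta * (-log(1 - p)) ** (1 / beta)

def weibull_mttf(eta, beta):
    return eta * gamma(1 + 1 / beta)

def lognormal_percentile(p, mu, sigma):
    return exp(mu + sigma * _Z(p))

def lognormal_mttf(mu, sigma):
    return exp(mu + sigma ** 2 / 2)

def bs_percentile(p, alpha, beta):
    z = _Z(p)
    return beta * (alpha * z / 2 + sqrt((alpha * z / 2) ** 2 + 1)) ** 2

def bs_mttf(alpha, beta):
    return beta * (1 + alpha ** 2 / 2)
```

A quick sanity check: the median of the Lognormal is exp(mu) (since Phi^{-1}(0.5) = 0) and the median of the BS distribution is its scale parameter beta, both of which follow immediately from the formulas above.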

Choice of the acceleration model
This is a key element of ALT, as it expresses the relationship between the life characteristics and the applied stress. Combined with the life distribution, it provides a single model, known as the accelerated life model, on which the ALT data analysis can be based. It has been assumed in this study that the parameters of the selected distributions are expressed as functions of stress through the inverse power law (IPL) relationship, which is commonly used for non-thermal accelerating stresses. The choice is also motivated by the failure mechanism and the nature of the applied stress. The IPL is given by Equation (25):

$$\psi(x) = \delta_0\, x^{-\delta_1} \qquad (25)$$

where $\psi(x)$ is the life characteristic, $x = s/s_0$ is the ratio of the severe stress to the operating stress, and $\delta_0$ and $\delta_1$ are the IPL parameters of the acceleration model estimated from the ALT data. From Equation (25), a linear extrapolation can be obtained by applying a logarithmic transformation to the accelerating variables as follows:

$$\ln \psi(x) = \ln \delta_0 - \delta_1 \ln x \qquad (26)$$

Following this, the dependence of the scale parameter of the Weibull distribution on the $i$th stress level can be expressed as follows:

$$\eta_i = \delta_0\, x_i^{-\delta_1} \qquad (27)$$

In the same way, the shape parameter can be linked to the applied stress by the same formula (with its own parameters $\delta_0'$ and $\delta_1'$):

$$\beta_i = \delta_0'\, x_i^{-\delta_1'} \qquad (28)$$

The same IPL has been used for the BS and Lognormal distributions. In this study, for illustrative purposes, only five competing accelerated life models are considered. The models are obtained by coupling the IPL to the two-parameter Weibull, BS and Lognormal distributions, as listed in Table 1.
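A minimal sketch of the IPL relationship, assuming the power-law parameterisation of Equation (25); the numeric values in the example are those of the later simulation study ($s_0$ = 200 MPa, $\delta_0$ = 10, $\delta_1$ = 5):

```python
def ipl_life(stress, s0, delta0, delta1):
    """Life characteristic at a given stress via the inverse power law,
    with x = stress / s0 the accelerated-to-operating stress ratio."""
    return delta0 * (stress / s0) ** (-delta1)

# scale parameter at the three accelerated levels of the simulation study
etas = [ipl_life(s, 200.0, 10.0, 5.0) for s in (320.0, 360.0, 400.0)]
```

At the operating stress, x = 1 and the life characteristic reduces to delta0; at higher stress the life characteristic shrinks, which is the acceleration effect the test exploits.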

Formulation of the log-likelihood function of the accelerated life models
The likelihood-based approach uses the log-likelihood functions of the different competing models. One assumes that the components are tested under only three accelerated stress levels. Hence, $t_{ij}$ ($i = 1, 2, 3$ and $j = 1, \dots, n_i$) stands for the ALT data in the $i$th group, where $n_i$ is the number of failures observed at stress level $s_i$. Let LH denote the log-likelihood function of $\mathcal{M}_1$, the Weibull model with IPL-dependent scale parameter $\eta_i = \delta_0\, x_i^{-\delta_1}$. It is given by the following equation 59 :

$$\mathrm{LH} = \sum_{i=1}^{3} \sum_{j=1}^{n_i} \left[\ln \beta - \ln \eta_i + (\beta - 1) \ln\left(\frac{t_{ij}}{\eta_i}\right) - \left(\frac{t_{ij}}{\eta_i}\right)^{\beta}\right]$$

The log-likelihood functions of models $\mathcal{M}_2$, $\mathcal{M}_3$ and $\mathcal{M}_4$ are given by analogous expressions. 9
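As a concrete illustration, the log-likelihood of a Weibull model whose scale parameter follows the IPL (and whose shape is stress-independent) can be coded as follows; the function name and data layout are illustrative conventions, not the paper's implementation:

```python
import numpy as np

def loglik_weibull_ipl(params, data, stresses, s0):
    """Log-likelihood of a Weibull model with IPL-dependent scale and
    stress-independent shape. data[i] holds the failure times observed
    at stress level stresses[i]."""
    beta, delta0, delta1 = params
    ll = 0.0
    for s_i, t in zip(stresses, data):
        t = np.asarray(t)
        eta_i = delta0 * (s_i / s0) ** (-delta1)  # IPL on the scale parameter
        z = t / eta_i
        # sum of log Weibull densities at this stress level
        ll += np.sum(np.log(beta / eta_i) + (beta - 1) * np.log(z) - z ** beta)
    return ll
```

Maximizing this function over (beta, delta0, delta1), for example with a numerical optimizer, gives the ML estimates used as the comparison baseline in the simulation study.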

SIMULATION STUDY
In this section, a simulation study is conducted to investigate the performance of the ABC method for model selection and parameter estimation of accelerated life models. First, a comparison between the different distances in terms of bias and MSE under different sample sizes is given. Then, a qualitative comparison in terms of reliability predictions of some relevant life characteristics is presented. Finally, a comparison of the probability of correct selection (PCS) when very close competing models are considered under different sample sizes is given. Let us consider a three-level constant-stress ALT without censoring. The accelerated data are collected under three stress levels: $s_1 = 320$ MPa, $s_2 = 360$ MPa and $s_3 = 400$ MPa. The operating stress level $s_0$ is supposed to be 200 MPa. It is assumed that the lifetime of an item at use conditions follows a Weibull distribution with parameters $\beta = 1.5$, $\delta_0 = 10$ and $\delta_1 = 5$. The IPL is assumed between the scale parameter and the stress level only, for illustrative purposes. In this manner, the lifetime data for each stress level can be obtained by combining the inverse CDF of the parametric Weibull distribution with the IPL, following Equations (34) and (35):

$$t_{ij} = \eta_i \left[-\ln(1 - u)\right]^{1/\beta} \qquad (34)$$

$$\eta_i = \delta_0\, x_i^{-\delta_1} \qquad (35)$$

where $u$ is a uniform random number in the interval [0, 1]. Sample sizes of $n$ = 5, 10, 20, 30, 50, 100 and 200 have been considered to investigate the effect of the number of times-to-failure data. In this study, the model parameters were given uniform priors spanning a range of one order of magnitude above and below crude estimates obtained by the least squares method, 18 as illustrated in Table 2 for $\mathcal{M}_1$.
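The data-generation mechanism of Equations (34) and (35) can be sketched as follows, assuming the IPL parameterisation used above; the seed and the function name are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_alt_data(n, stresses, s0=200.0, beta=1.5, delta0=10.0, delta1=5.0):
    """Generate uncensored constant-stress ALT failure times by inverting
    the Weibull CDF with an IPL-dependent scale parameter."""
    data = []
    for s in stresses:
        eta = delta0 * (s / s0) ** (-delta1)               # Equation (35)
        u = rng.uniform(size=n)                            # u ~ U[0, 1]
        data.append(eta * (-np.log(1 - u)) ** (1 / beta))  # Equation (34)
    return data

samples = generate_alt_data(50, [320.0, 360.0, 400.0])
```

Each call returns one list of failure-time arrays per stress level; repeating the call (with fresh random draws) provides the replicated data sets used to estimate the bias, MSE and DEF of each distance.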
The ABC-NS algorithm is used here to deal with parameter estimation. It is implemented with the following hyper-parameters:
• the initial threshold $\varepsilon_1$ is set to 1000;
• the number of particles per population $N$ is set to 1000;
• the proportions of dropped and surviving particles ($\beta_0$, $\alpha_0$) are set to 30% and 60%, respectively;
• the ellipsoid enlargement factor $f_0$ is set to 1.1;
• the stopping criterion threshold value is set to $10^{-6}$.
The quality of the estimates from the different distances is studied by considering the bias, the MSE and the deficiency (DEF) given by The DEF is a natural measure of joint efficiencies of the estimatorŝ,̂0 and̂1. DEF is formulated as follows: By replicating the experiment 1000 times, we can obtain the estimated MSEs, bias and the deficiencies (ˆ) for the considered distances and sample sizes. From Table 3, the following observations can be drawn: • As expected, the MSE of all the estimators become smaller as the sample sizes increase. The numerical results indicate that the ML estimator behave slightly better than the ABC in terms of the MSE. • The likelihood-free method provides overall good estimates mainly when the AD and ADR are used showing an ABC estimator to be competitive with the ML estimator. • By analysing the bias which is the average tendency of the simulated data to over-or under-estimate the real parameters (positive values indicate a model under-estimation and negative values indicate an over-estimation), it is clear that the LH metric provides the best estimates, seconded by the ADR and the AD distances. • From the same table, one can see that the latter distances provide an estimates close to the ML estimates.
This means that these distances could be used as robust alternative estimators when, for instance, the likelihood function is difficult to formulate or to maximise, a situation commonly encountered in practice, mainly when one deals with small sample sizes.

Estimation of the reliability lifetime characteristics
We now focus on the estimation of some important life characteristics widely used in reliability engineering applications, among them the MTTF and the percentile B100p (0 ≤ p ≤ 1), the time by which 100p% of the product population will fail, which is used as the warranty life. Generally, the percentile life is p = 0.1 or p = 0.01 for airplane parts and p = 0.05 or p = 0.10 for machine parts. 60 In Table 4, these life characteristics computed using the true parameters with the assumed ALT model are given. The objective here is to compare the predictive capability of the ABC method using the considered distance functions against the likelihood-based method. Furthermore, the effect of the sample size on the predicted lifetimes is investigated. From the results summarized in Figures 1 and 2, the following observations can be drawn:
• The differences between the percentiles estimated by the different approaches and distances can be significant.
• The AD distance outperforms the other distances in obtaining precise estimates of the lower extreme percentiles (B1, B5, B10), mainly for small sample sizes.
• Another important point to be highlighted is that, regardless of the sample size, the CM and LH distances always overestimate the lower percentiles (B1, B5 and B10) by several orders of magnitude.
• For the LH distance, the estimates present the highest MSE for extremely small sample sizes (n = 5). However, the MSE using LH decreases for large sample sizes and then performs better than the other distance functions.
• Only AD2L and AD2R yield conservative estimates of the percentiles, as they ensure a negative bias for the lower percentiles. In the same way, one can see that the other distances always provide non-conservative predictions, as the biases are greater than 0.
• It is clear that the MSE of each low percentile for the considered distances tends to zero for large sample sizes (for instance, the log(MSE) of B1 estimated with the AD2L distance for n = 5 is 25% greater than that estimated for n = 200 using the same distance function).
• For large sample sizes, the MSE deviations between the considered distances start to widen, contrary to small sample sizes. Indeed, AD2L yields the highest values (as can be seen for n = 20, n = 30, n = 50, n = 100 and n = 200).
• For a given sample size, the MSE magnitudes increase for larger percentiles regardless of the distance function used.
• To conclude, it is recommended to use the ABC method coupled with the AD distance to precisely estimate the lower lifetime percentiles, as it presents the smallest bias and MSE, mainly for small sample sizes. Furthermore, the LH distance presents the least precise predictions of the low percentiles regardless of the sample size, although this overestimation is less pronounced for large sample sizes.
In the same way, we shall compare the MTTF life characteristic. Figure 3 displays the log-transformed bias and MSE of the estimated MTTF using different distances and for different sample sizes. From Figure 3, it is obvious that for small sample sizes (n = 5), the predictions provided by the LH distance are slightly better, as the bias is closer to 0 and the MSE value is the lowest. However, as the sample size increases, the advantage of the ML method becomes less pronounced. Across the different sample sizes, little difference was observed between the distance functions, except for AD2L, which provides the worst predictions since it presents the highest bias and MSE values for the MTTF. Furthermore, it is important to mention that for n = 200, the AD2R distance outperforms the LH distance by providing a better prediction of the MTTF with roughly the same magnitude of MSE.
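For reference, under the Weibull assumption the percentiles and the MTTF discussed above have closed forms. The sketch below evaluates B1, B5, B10 and the MTTF; the use-level scale value eta0 is illustrative, not taken from the paper:

```python
from math import gamma, log

beta = 1.5          # shape parameter of the simulation study
eta0 = 65.0         # assumed Weibull scale at the use stress level (illustrative)

def B(p):
    """100p-th lifetime percentile of Weibull(beta, eta0):
    B_100p = eta0 * (-ln(1 - p))**(1/beta)."""
    return eta0 * (-log(1.0 - p)) ** (1.0 / beta)

def mttf():
    """Mean time to failure of Weibull(beta, eta0): eta0 * Gamma(1 + 1/beta)."""
    return eta0 * gamma(1.0 + 1.0 / beta)

b1, b5, b10 = B(0.01), B(0.05), B(0.10)
```

Note that B(1 − e⁻¹) returns eta0 itself, the usual "characteristic life" interpretation of the Weibull scale.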

Proportion of correct selection
In this subsection, we compute the proportion of correct selection (PCS) using different sample sizes and all the distances by considering very close accelerated life models and nested ALT models. The idea here is to compare the ability of each distance to recover the 'parent model'. Since it is not possible to consider all possible models in a single study, the choice is made to generate the simulated data sets from M1 and to consider only M1, M2, M3 and M5 (already presented in Section 3.2) for the discrimination studies. This choice is made, first, to keep a reasonable computational time and, second, to present the results for the main scenarios: (1) discrimination between similar models (M1 vs. M3 and M1 vs. M5) and (2) discrimination between nested models (M1 vs. M2). The computation of the PCS matrices was performed using 1000 replications.
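The PCS estimation procedure itself is simple Monte Carlo. The sketch below illustrates it with two hypothetical stand-in candidates (a Weibull parent versus a lognormal alternative, compared by maximized likelihood) rather than the paper's actual ALT models, and fewer replications for speed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def pcs(n, reps=200):
    """Proportion of replications in which the parent model (Weibull) beats a
    lognormal alternative in a maximum-likelihood comparison."""
    wins = 0
    for _ in range(reps):
        t = rng.weibull(1.5, n) * 10.0                    # data from the parent
        c, _, sc = stats.weibull_min.fit(t, floc=0)       # ML fit, parent model
        ll_w = stats.weibull_min.logpdf(t, c, 0, sc).sum()
        s, _, sc = stats.lognorm.fit(t, floc=0)           # ML fit, alternative
        ll_l = stats.lognorm.logpdf(t, s, 0, sc).sum()
        wins += ll_w > ll_l
    return wins / reps

pcs_small, pcs_large = pcs(10), pcs(100)
```

As in the paper's study, the PCS improves markedly as the sample size grows.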

Discrimination between close ALT models
The matrices exhibiting the PCS for the M1 versus M3 and M1 versus M5 discrimination studies are shown in Figures 4 and 5, respectively.
One can clearly see, from both figures, that the PCS increases with the sample size regardless of the distance function. It is apparent from both discrimination studies that the ABC approach using the ADL and AD2L distances performs best for small sample sizes compared with the likelihood-based distance, exhibiting the highest proportion of 'parent model' selection. This is quite interesting, as these distances could be used to mitigate the risk of model misspecification when inferring and discriminating very close accelerated life models from limited observed failure times.
It is important to mention that the evolution of the M1 versus M5 PCS across sample sizes shares the same trend as that of the M1 versus M3 PCS matrix.
Furthermore, from both figures, one can see that the ADR and AD2R distances exhibit the lowest PCS in both cases. This can be explained by the fact that these distances give more weight to the right-tail region, where the available data are limited (the three competing ALT models are right-skewed), so it is difficult to discriminate between overlapping models. One can conclude that these distances are not suitable for discriminating between very close accelerated life models.
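The tail-weighting behaviour of this family of distances can be made concrete by viewing them all as quadratic EDF statistics n·∫(Fn − F)²·w(F) dF with different weight functions w. The weights below are the usual Anderson-Darling-family choices; whether they match the paper's exact definitions is an assumption, and the brute-force integration on a truncated probability grid is for illustration only:

```python
import numpy as np

# Weight functions of the Anderson-Darling family of quadratic EDF distances:
WEIGHTS = {
    "CM":   lambda p: np.ones_like(p),        # Cramer-von Mises: uniform weight
    "AD":   lambda p: 1.0 / (p * (1.0 - p)),  # both tails emphasised
    "ADL":  lambda p: 1.0 / p,                # left tail emphasised
    "ADR":  lambda p: 1.0 / (1.0 - p),        # right tail emphasised
    "AD2L": lambda p: 1.0 / p**2,             # left tail, second order
    "AD2R": lambda p: 1.0 / (1.0 - p)**2,     # right tail, second order
}

def edf_distance(sample, ppf, weight, m=20001):
    """n * integral of (Fn(x) - F(x))^2 * w(F(x)) dF(x), evaluated on a
    truncated probability grid via the model quantile function `ppf`."""
    n = len(sample)
    p = np.linspace(1e-4, 1.0 - 1e-4, m)
    x = ppf(p)                                              # model quantiles
    Fn = np.searchsorted(np.sort(sample), x, side="right") / n  # empirical CDF
    return n * np.sum((Fn - p) ** 2 * weight(p)) * (p[1] - p[0])

# Example: a Weibull sample scored against the true and a mis-scaled model
rng = np.random.default_rng(4)
sample = rng.weibull(1.5, 100) * 10.0
true_ppf = lambda p: 10.0 * (-np.log(1.0 - p)) ** (1.0 / 1.5)
wrong_ppf = lambda p: 20.0 * (-np.log(1.0 - p)) ** (1.0 / 1.5)
d_true = edf_distance(sample, true_ppf, WEIGHTS["AD"])
d_wrong = edf_distance(sample, wrong_ppf, WEIGHTS["AD"])
```

With w(p) = 1/(1 − p)² (AD2R), discrepancies at high quantiles dominate the integral, which is exactly why this distance struggles when the right tail is data-poor.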

Discrimination between nested ALT models
In the following study, we investigate the performance of the ABC approach and of information criteria (IC) in recovering the correct model when nested ALT models are considered, for different sample sizes. The PCS matrices are estimated for different tolerance thresholds (Δ) using both the ABC algorithm and the IC.
F I G U R E 1 Bias of the low percentiles predicted using different distances and for different sample sizes.

F I G U R E 2
Mean squared errors of low percentiles predicted using different distances and for different sample sizes.

F I G U R E 3
Mean squared error (MSE) and bias of the predicted MTTF using different distances and for different sample sizes.

F I G U R E 4
Proportion of correct selection between the M1 and M3 accelerated life models when the data are generated from model M1.
Two popular IC have been used: the AIC and the BIC. They are given by

AIC = −2 ln(L(θ̂)) + 2k,

where k stands for the number of model parameters and L(θ̂) denotes the likelihood function evaluated at the estimated parameter values, and

BIC = −2 ln(L(θ̂)) + k ln(n),

where n represents the number of observed data.
F I G U R E 5 Proportion of correct selection for M1 and M5 accelerated life model discrimination when the data are generated from model M1.

F I G U R E 6
Proportion of correct selection for M1 and M2 accelerated life model discrimination for Δ = 10⁻².
It should be mentioned that for a set of candidate models, the selected model is the one that presents the lowest IC value.
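Once a model has been fitted by maximum likelihood, both criteria are one-liners. A minimal sketch on synthetic Weibull data (the data and the scipy-based fit are illustrative, not the paper's models):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
t = rng.weibull(1.5, 80) * 10.0                  # synthetic failure times

def information_criteria(loglik, k, n):
    """AIC = -2 ln L + 2k ;  BIC = -2 ln L + k ln(n)."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * np.log(n)

c, _, scale = stats.weibull_min.fit(t, floc=0)   # ML fit of shape and scale
loglik = stats.weibull_min.logpdf(t, c, 0, scale).sum()
aic, bic = information_criteria(loglik, k=2, n=len(t))
```

Since ln(n) > 2 whenever n > e² ≈ 7.4, the BIC penalises extra parameters more heavily than the AIC for any realistic sample size, which is consistent with its stronger preference for the simpler nested model observed below.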
Hereafter, the LH distance is not considered in the PCS matrix since the competing models do not have the same dimensionality. Hence, for the same data sets generated from model M1, the estimated PCS matrix for Δ ≤ 10⁻² is illustrated in Figure 6. Similarly, a PCS matrix was estimated for Δ ≤ 10⁻⁶; however, it is not presented in this paper since the proportions of parent model selection tend towards zero for all the considered distances regardless of the sample size. The following observations can be drawn:
• Overall, we notice that, for small tolerance thresholds, all the considered distances largely favour the complex model (M2).
• The proportions of M2 selection obtained for Δ = 10⁻² are less pronounced than those for Δ = 10⁻⁶. This shows that tightening the tolerance threshold reduces the selection probability of the simple model.
• Increasing the sample size for a given tolerance threshold favours the selection of M2 regardless of the considered distance and tolerance threshold. This can be explained by the fact that M2 is more flexible and offers a better fit to the data.
• Conversely to the ABC algorithm, the AIC and the BIC ensure the selection of the simpler model (i.e., M1) regardless of the tolerance threshold. The PCS for the IC increases slightly for large sample sizes. Indeed, for n = 5, the AIC and BIC present almost the same PCS values, 77.5% and 78.5%, respectively, and for n = 200, the PCS increases to 86% and 86.1%, respectively.
• Independently of the considered tolerance threshold, the IC penalise complex models and lead to the selection of the simple model. This can be justified by the fact that models with a low number of parameters generalise better.
To sum up, the ABC-NS algorithm coupled with appropriate distances offers, depending on the user's objective, the flexibility of selecting the complex model and inferring its parameters to better fit the data at hand (low tolerance threshold) or of generalising to unseen data (large tolerance threshold).

REAL DATA APPLICATION
In this section, the ABC-NS algorithm is employed to discriminate between five competing accelerated life models (see Table 1) using real data. The data set consists of 304 fatigue failure times of 6061-T6 aluminium coupons tested under three stress levels: s1 = 31,000 psi with n1 = 101 observations, s2 = 26,000 psi with n2 = 102 observations and s3 = 21,000 psi with n3 = 101 observations. All tests were run until failure (see Ref. 61 for more details). As in the first part, the likelihood-based approach and the likelihood-free method are compared for model selection. For the ABC-NS algorithm, vague uniform priors based on crude estimates from a least squares method are employed. The estimates are obtained, for each model, through a preliminary analysis using the data collected at the three stress levels. The priors on the parameters of the five considered accelerated life models are given in Table 5. Furthermore, equal prior probabilities are assumed for the five competing models, that is, P(Mi) = 1/5. The other hyperparameters of the ABC-NS algorithm are kept the same as in the simulation study.

Results
In this subsection, we investigate the performance of the ABC-NS algorithm using the AD distance for illustrative purposes. This choice is motivated by the fact that this distance provides the best predictions at low lifetime percentiles and is very efficient when very close models are in competition. The posterior probabilities over the populations are shown in Figure 7.
A few important observations are worth mentioning. First, the parsimony principle is well embedded in the ABC-NS algorithm: the algorithm first tries to select the simpler models (M1, M3 and M5); then, when these are no longer able to fit the data well, it switches to the more complex models (M2 and M4). Second, one can see that M1 is eliminated first, then M5, followed by M3. After that, the ABC-NS algorithm oscillates between M2 and M4 and finally converges to model M2. This model is selected around the 100th population with a posterior probability equal to 1. The algorithm continues to run until reaching the stopping condition; in other words, from the 100th to the 136th population, it merely refines the posterior estimates.

F I G U R E 8 Histograms of the parameters of the selected model M2 with the AD distance.

F I G U R E 9 Acceptance rate over the populations.

Figure 8 displays the marginal distributions of the selected model parameters. As one can see, they are sharply peaked with low variability; the value of the distance function at the last population is AD = 3.142. Figure 9 shows the acceptance rate over the populations, computed by dividing the number of particles needed to replenish each population by the total number of simulated particles. One can clearly see that the ABC-NS algorithm ensures relatively high acceptance rates. It is worth mentioning that for the first populations, the acceptance rate decreases because the algorithm explores large model and parameter spaces; it then steadily rises as the sampling spaces tighten and the least likely models are eliminated.
For comparison purposes, the algorithm is now employed with the other distance functions in order to select the adequate model(s) and to compare the fitting qualities. Table 6 gives the selected model for each distance. Again, model M2 is selected as the most likely model to fit the accelerated data when using the CM, ADL and AD2L distances (rank 1 in bold in the table). However, for the ADR and AD2R distances, the most likely model is M4 (rank 1 in bold in the table). These results show that the selected model may vary depending on the chosen distance function, which measures the level of agreement between the empirical and the simulated data and also expresses the user's objective.

TA B L E 6 Model ranking for the considered distance functions.

Table 7 shows that the differences between the percentiles estimated by the different selected models and distances are significant. One important point to be highlighted is that the AD2L and ADL distances tend to be more 'pessimistic' in predicting lifetime at the left tail. This is due to the fact that both distances give more weight to the left-tail observations, which raises the risk of over-fitting. In other words, they provide a good fitting quality in the left-tail region where observed data are available, but they are poor in terms of prediction (for unseen data). Conversely, ADR and AD2R probably provide over-estimated values of the lower percentiles, as one can see from the same table. It is obvious that these distances are not recommended for predicting the 'safe life' at lower percentiles. Another reason for the large difference in lifetime predictions is that both distances select M4 as the most likely model: although the selected models are very close, they yield quite different values.
It is obvious from the obtained results that the choice of the model, but also of the distance used to infer the accelerated life model, plays a crucial role in the prediction of tail percentiles. Based on the obtained results, and for the purpose of specifying a safe life, one can say that M2 provides the best compromise between a good fitting quality and a better generalisation (i.e., prediction of unseen data). Finally, the disparity between the MTTF values predicted with the different distances is less pronounced, as one can see from Table 7. Table 8 gives the AIC and BIC estimates for the competing models. One can see that the most likely model under both criteria is M2, which presents the lowest IC values (in bold in the table). In addition, M4 provides an acceptable fit to the data, as the difference between the IC values is not significant. The results obtained from the likelihood-based approach are thus coherent with those of the likelihood-free method: M2 and M4 are the most likely models to fit the data (values in bold in the table). The advantage of the likelihood-free method is that it can be tailored, through an appropriate choice of the distance function, to a specific user objective.

CONCLUDING REMARKS
In this paper, a comparison between likelihood-free and likelihood-based approaches has been conducted for accelerated life model selection and parameter estimation. Several distance functions have been tested for the first approach to gain more insight. The ability of each approach to precisely estimate the model parameters, to predict some lower percentiles and to discriminate between very close competing accelerated life models has been investigated. It has been shown that each approach has its own strengths and drawbacks. The likelihood-based approach is a good estimator but presents some weaknesses in predicting lower percentiles and in discriminating between very close models when the available data are limited. The likelihood-free approach coupled with an appropriate distance (selected based on the user's goal) can be a good alternative to overcome these issues, as has been demonstrated. In addition, the ABC method is a single-step procedure, as all the models are considered simultaneously and the least likely models are ruled out progressively over the populations. The presented results indicate that the likelihood-free Bayesian approach, applied here for the first time to discriminate between accelerated life models, is a promising option for practical applications regarding the reliability of systems and components, mainly when good predictions of the lower lifetime percentiles are of major concern. The method can be tailored to specific needs through an appropriate selection of the distance function or metric. The overall conclusion is that the likelihood-free method may be of considerable value in engineering reliability problems for selecting a 'useful' model. It is flexible in the sense that different distance functions can be used, and it can be considered a promising alternative to overcome the drawbacks of likelihood-based approaches.
As demonstrated, it has a good prediction capability, especially for the lower percentiles. Although only complete data have been discussed here, similar results can be obtained for censored data; this work is in progress and will be reported later. The improvement of the ABC-NS algorithm using reinforcement learning is another avenue under investigation to further gain in efficiency.

A C K N O W L E D G E M E N T S
We would like to thank the University of Angers and Liebherr Aerospace Toulouse for providing facilities throughout the project. This article has benefited from the contributions of Mr Anis Ben Abdessalem, Mr Laurent Saintis, Prof Bruno Castanier and Mr Rodrigue Sohoin. We thank the journal and the reviewers for the thoughtful comments and constructive suggestions and for the attention given to this work. This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data and the codes that support the findings of this study are available upon request.