Parametric conditional variance estimation in location-scale models with censored data

Abstract: Suppose the random vector (X,Y ) satisfies the regression model Y = m(X)+σ(X)ε, where m(·) = E(Y |·), σ2(·) = Var(Y |·) belongs to some parametric class {σθ(·) : θ ∈ Θ} and ε is independent of X. The response Y is subject to random right censoring and the covariate X is completely observed. A new estimation procedure is proposed for σθ(·) when m(·) is unknown. It is based on nonlinear least squares estimation extended to conditional variance in the censored case. The consistency and asymptotic normality of the proposed estimator are established. The estimator is studied via simulations and an important application is devoted to fatigue life data analysis. MSC 2010 subject classifications: Primary 62N01, 62N02; secondary 62N05.


Introduction
Study of the conditional variance with censored data involves an increasing interest among scientists.Indeed, domains like Medicine, Economics, Astronomy or Finance are closely concerned by this topic.In financial time series for instance, volatility (conditionally on time) often represents the quantity of interest and in this context, censoring can appear, by example in [20], when limitations are imposed on asset prices to mitigate their fluctuations.Therefore, although the methodology proposed in this paper enlarges beyond the following topic, we are here interested in the relationship between fatigue life of metal, ceramic or composite materials and applied stress.This important input to design-forreliability processes is motivated by the need to develop and present quantitative fatigue-life information used in the design of jet engines.Indeed, according to the air speed that enters an aircraft engine, the fan, the compressor and the turbine rotate at different speeds and therefore are submitted to different stresses.Moreover, fatigue life may be censored since failures may result from impurities or vacuums in the studied materials, or no failure may occur at all due to time constraints of the experiments.In particular, a frequently asked question in this imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 context is to know whether or not the variability of fatigue life depends on the applied stress.Furthermore, in case of heteroscedasticity, a parametric shape for this (conditional) variability should be provided.We therefore consider the general heteroscedastic regression model where , known upto a parameter vector θ ∈ Θ with true unknown value θ 0 , Θ is a compact subset of IR d , and ε is independent of the (one-dimensional) covariate X.In the context displayed above, a discussion can therefore be lead about the constancy of σ θ0 (•) (σ θ0 (•) = θ 0 for a one-dimensional θ 0 ) and its parametric refinements to be possibly brought to fit available information.We do not consider any parametric form for m(•) and the distribution of ε.Indeed, since our objective is to estimate a parametric form for σ θ0 (X) (keeping in mind that a further step is to develop goodness-of-fit tests), we want the procedure to be free of any misspecification of other quantities; these could indeed seriously influence the quality of the results on the variance itself.Suppose also that Y is subject to random right censoring, i.e. instead of observing Y , we only observe (Z, ∆), where Z = min(Y, C), ∆ = I(Y ≤ C) and the random variable C represents the censoring time, which is independent of Y , conditionally on X.Let (Y i , C i , X i , Z i , ∆ i ) (i = 1, . . ., n) be n independent copies of (Y, C, X, Z, ∆).Since X is complete, we thus observe The aim of this paper is more specifically to extend classical least squares procedures to take into account censored data when estimating σ θ0 (•).If a lot of work has been devoted to polynomial estimation of the regression function for censored data (see e.g.[5] for a literature overview), much less work is achieved for the estimation of the conditional variance.In fact, model (1.1) was already considered in fatigue curve analysis ( [13,15]) but with parametric forms for imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 m(•) and the distribution of ε.As explained above, we want to avoid using such parametric influences (see also Section 5).In the same idea, [6] developed a methodology to estimate a parametric curve for m(•) without any assumed parametric shape for the conditional standard deviation and the residuals distribution.[3] proposed a goodness-of-fit test for any scale function but only adapted to a subfamily of tested parametric functions.
We thus propose a new estimation method for θ 0 .The idea of the method is as follows.First, we construct for each observation a new square of the multiplicative error term that is nonparametrically estimated.Then, θ 0 is estimated by minimizing the least squares criterion for completely observed data (and parametric conditional variance estimation), applied to the so-obtained new squared errors.The procedure involves different choices of bandwidth parameters for kernel smoothing.
The paper is organized as follows.In the next section, the estimation procedure is described in detail.Section 3 summarizes the main asymptotic results, including the asymptotic normality of the estimator.In Section 4 we present the results of a simulation study and Section 5 is devoted to a deep analysis of data from a study on the relationship between fatigue life of metal and applied stress.The Appendix contains the proofs of the main results of Section 3.

Notations and description of the method
As outlined in the introduction, the idea of the proposed method consists of first estimating unknown squares of multiplicative error terms of the type ε2 (X) = σ 2 θ0 (X)ε 2 , and second of applying a standard least squares procedure on the so-obtained artificial squared errors.It follows that for continuous distributions F (y|x) = P (Y ≤ y|x) and G(y|x) =

It is easily seen that
for any location function m 0 (•) and scale function σ 0 (•) and where E 0 i = (Z i − m 0 (X i ))/σ 0 (X i ) (i = 1, . . ., n). m 0 and σ 0 are now chosen in such a way that they can be estimated consistently.As is well known (see by example [19]), the right tail of the distribution F (y|•) cannot be estimated in a consistent way due to the presence of right censoring.Therefore, we work with the following choices of m 0 and σ 0 : where F −1 (s|x) = inf{y; F (y|x) ≥ s} is the quantile function of Y given x and J(s) is a given score function satisfying 1 0 J(s) ds = 1.When J(s) is chosen appropriately (namely put to zero in the right tail, there where the quantile function cannot be estimated in a consistent way due to the right censoring), m 0 (x) and σ 0 (x) can be estimated consistently.Now, replace the distribution imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 F (y|x) in (2.1) by the Beran estimator ( [1]), defined by (in the case of no ties): where , K is a kernel function and {a n } a bandwidth sequence, and define as estimators for m 0 (x) and σ 02 (x).Next, let denote the Kaplan-Meier-type estimator ( [9]) of F 0 ε (in the case of no ties), where Ê0 i = (Z i − m0 (X i ))/σ 0 (X i ), Ê0 (i) is the i-th order statistic of Ê0 1 , . . ., Ê0 n and ∆ (i) is the corresponding censoring indicator.This estimator has been studied in detail by [18].Finally, m(x) is estimated by the method of [7] applied to the estimation of a conditional mean: where T < τ H 0 ε (τ F = inf{y : F (y) = 1} for any distribution F ) is a truncation point that has to be introduced to avoid any inconsistent part of F 0 ε (y).However, when τ F 0 ε ≤ τ G 0 ε , the bound T can be chosen arbitrarily close to τ F 0 ε .This leads to the following estimator of ε2 * i : and E 0T = E 0 ∧ T .As before, these coefficients θ T 01 , . . ., θ T 0d can be made arbitrarily close to θ 01 , . . ., θ 0d , provided τ pending on x.In this case, it can be shown that m 0 (x) + σ 0 (x)τ H 0 ε ≥ τ H(•|x) for any value of x such that consistent areas of (2.5) can be substantially larger than for local estimators (see [7] for a complete discussion).
Remark 2.2 (Conditional scale function) Some researchers in (nonparametric) survival analysis often criticize the idea of estimating quantities which use the whole support of Y given X = x.The ultimate consequence in this paper is that we need to define the vector of pseudo-parameters θ T 0 instead of θ 0 .A possible solution to avoid that problem is to only estimate parts of F (•|X = x) by truncating inconsistent areas (as it is the case for m0 (x) and σ0 (x)) or by simply considering quantiles of this conditional distribution.The methodology proposed in this paper can be easily extended to this type of estimation.The idea is thus to reduce the area where F (•|X = x) is estimated by defining a conditional scale function.For the sake of accuracy, let's take the simple example of the estimation of the conditional quantile of order s (0 < s < 1) of This can be defined as where F W |X (•|X) denotes the conditional distribution of W given X.F −1 Y |X (s|X = x) can be estimated with the method proposed in [7] (similarly to mT (•) in this paper and with the same objective of enabling to estimate consistently -with F 0 ε (•)-the conditional quantile for each value of x).This leads to the estimator (to make the following formula more readable, we omit here the theoretical bound T which 'cuts' inconsistent parts of F 0 ε (•)) where ( F 0 ε ) −1 (s) = inf{y : F 0 ε (y) ≥ s}.Finally, the resulting estimated quantiles F −1 W |X (s|X i ), i = 1, . . ., n, can be introduced in a least squares problem of the type (2.7): where σ θ (•) now denotes the corresponding parametric quantile function.Since the concept of variance is more widely used in other domains (see Section 5) and often preferred to this type of scale function, and since we also want to highlight the benefit of using F 0 ε (•) to improve consistency (see Remark 2.1), we decided to study the variance function.However, as it can be seen here, it is easy to extend the methodology to another scale function.

Asymptotic results
We start by showing the convergence in probability of θT n and of the least squares criterion function.This will allow us to develop an asymptotic representation for θT nj − θ T 0j (j = 1, . . ., d), which in turn will give rise to the asymptotic normality of these estimators.The assumptions and notations used in the results below, as well as the proofs of the two first results, are given in the Appendix.

Practical implementation
The estimator θT n depends on a number of parameters: the score function J, the bandwidth a n and the cut off point T that can be chosen in a data driven way.The function J is computed as in [5], i.e., J(s , where b = min 1≤i≤n F (+∞|X i ) (the region where the Beran estimators F (•|X 1 ), . . ., F (•|X n ) are inconsistent is not used and we exploit to a maximum the 'consistent' region), while the point T can be chosen equal to the last order statistic Ê0 (n) of the estimated residuals Ê0 1 , . . ., Ê0 n (in this way, all the Kaplan-Meier jumps of the integral (2.6) are considered).When Ê0 (n) is censored, it is redefined as uncensored.
To choose the bandwidth parameter, we could minimize (with respect to a n ) an asymptotic expression of where θT n (a n ) denotes θT n determined with bandwidth parameter a n .However, that would involve complicated expressions with too many unknown quantities.
We therefore prefer to use the following bootstrap procedure.This is based on the method proposed by [11].
) calculated with a pilot bandwidth g n asymptotically larger than the original a n . Step imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 Step 4. Define bn (a n ) the estimator of the variance vector of parameters based on the bandwidth parameter a n and the obtained resample From this, (4.1) can be approximated by where θ T n (g n ) is the estimator of θ T 0 based on the initial sample and the bandwidth g n .We now select the value of a n that minimizes IM SE * (a n ).The same bootstrap procedure is also used to approximate the distribution of θT n , instead of using the asymptotic distribution of Theorem 3.3, which is hard to estimate in practice.Bootstrap confidence intervals illustrate this in Section 5.

Simulations
We now study the finite sample behavior of the newly proposed estimator compared to a similar methodology but replacing F 0 ε ( •− m0 (Xi) σ0 (Xi) ) by F (•|X i ).More precisely, the new squared errors are in this case replaced by θn is obtained by minimization over θ ∈ Θ of the expression We are primarily interested in the behavior of the estimator bias, variance and mean squared error (MSE).The simulations are carried out for samples of size n = 100 and/or 200, B = 500 and the results are obtained by using R = 1000 simulations.
We work with a biquadratic kernel function . The bandwidth a n is selected by minimizing the expression (4.2) over a grid of 21 possible bandwidths depending on the covariate support.The step of the grid is added to its largest value to obtain the pilot bandwidth g n .For small values of a n , the window [x − a n , x + a n ] at a point x might not contain any X i (i = 1, . . ., n) for which the corresponding Y i is uncensored (and in that case estimation of F (•|x) is impossible).We enlarge the window in that case such that it contains at least one uncensored data point in its interior.It might also happen that the bandwidth a n at a point x is larger than the distance from x to both the left and right endpoints of the interval.In such cases, the bandwidth is redefined as the maximum of these two distances.We did not consider the boundary issue in this paper.The estimator of [18], F 0 ε (•), indeed involves m0 (•) and σ0 (•), and these can suffer from bias increasing close to the boundaries of the support of X.In the complete data case, many methods have been developed to handle this problem (see [2] for an overview of existing methods, including a new one).However, if these methods often enable to obtain a smaller bias of the studied estimators, the resulting variance is also larger, which often does not markedly lead to better mean squared errors (see also [5] for an application of boundary kernels in a similar context).Since our final objective is a parametric estimator (not a nonparametric one) based a least squares procedure using estimated (artificial) squared errors, we deemed the influence of boundary corrections too weak to be applied in this context.
In the first setting, we generate i.i.d.data from the normal heteroscedastic regression model where β 0 = 1.25, β 1 = 0.8, β 2 = 1, γ 0 = 1 and γ 1 = 0.1, 0.25, 0.5 or 0.75.X has a uniform distribution on the unit interval and the error term ε is a standard normal random variable.The censoring variable C satisfies C = α 0 exp(α 1 X + α 2 X 2 ) + γ 1 ε * for certain choices of α 0 , α 1 , α 2 and where ε * has a standard normal distribution.It is easy to see that, under this model,  than for θn .Obviously, the sample size increase is accompanied by an improvement of the results (bias, variance and MSE) independently of the considered estimator.This improvement seems however more important for θT n .Figure 1 also illustrates the previous results.For each value of the covariable, the dashed curves represent the 5% and 95% quantiles of the empirical distribution of the estimated conditional variance (constructed with the R estimations at this value).The solid curve represents the median of this empirical distribution while the true curve is dash-dotted.The grey (respectively black) color imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 represents these curves for θn (respectively θT n ).From Figure 1, we remark that the median curve obtained with θT n is globally closer to the true curve than with θn .However, quantile curves are more distant for σ θT n (•) than for σ θn (•).These effects obviously decrease when n increases.
In the second setting, we generate i.i.d.data from the normal heteroscedastic regression model where β 0 = 10, β 1 = −2, β 2 = 70, γ 0 = 1 and γ 1 = −0.23.X has a uniform distribution on [75, 150] and ε is a standard normal random variable.These digits are initially motivated by the type of models met in fatigue life data analysis (see Section 5 in [6]).The censoring variable C satisfies C = α 0 + α 1 log(X − α 2 ) + exp(η 0 + η 1 log X)ε * for certain choices of α 0 , α 1 , α 2 , η 0 , η 1 and where ε * has a standard normal distribution.α 0 is chosen in such a way that some increasing in the different censoring percentages is obtained.Moreover, conditional mean of the censoring variable is differently sloping while its conditional variance varies as well.The smoothing parameter is chosen as the minimizer of (4.2) among a grid of values varying between 11.25 and 22.50 by step of 0.5625, and the value of the pilot bandwidth is 23.0625.Other simulations (not reported here) showed that the final results were not very sensible to the choice of these digits.
The great advantage of θT n = (γ T 0n , γT 1n ) (with respect to θn = (γ 0n , γ1n )) is its apparently small bias.If its variance is often larger, the impact on MSE is relatively moderate.In Table 2, these important characteristics are observed as well.Moreover, when the censoring percentage increases, θT n seems to deteriorate less than θn (feature also observed on a small scale in Table 4.2).For this range of censoring percentages and these models, the inconsistency in the right tail of the Beran estimator ( [1]) combined with its local property (especially when constructing ε2 * i for censored data) has a large impact on the estimation of  σ θn (•).Obviously, ε2 * i is also deteriorated, in particular, by the decrease of the upper bound in the integrals of m0 (x) and σ0 (x).This effect seems to be however slighter.The above characteristics also appear on Figure 2 which is constructed similarly to Figure 1.

Data analysis
As mentionned in Section 1, we are here interested in the relationship between fatigue life of metal, ceramic or composite materials and applied stress.From a long time, an important question in fatigue analysis is to understand how the variability of fatigue life given the stress (or the strain) depends on stress (or strain).Several authors addressed this problem, among others, [13,14,15] who studied the number of cycles before failure of nickel-base superalloys as functions of the strain or the pseudostress (Young's modulus times strain).By example, [14] considered model (1.1) with the following form for the conditional standard deviation of the logarithm of the number of cycles before failure: (5.1) However, those authors assumed parametric forms for both m(X) and the error distribution.
We present, in this section, a data set of n = 246 specimens of a nickel-base superalloy provided by [16] and studied by [15].For these data, we consider imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 model (1.1)where Y is the logarithm of the number of cycles before failure and X is the corresponding strain (see Figure 3).A quick graphical check enables to easily observe that the above variance model does not correctly fit the data.
As suggested by [15], we only fitted the model (5.1) on the 115 observations for which strain is below 0.007.In addition, since we are interested by the conditional variance shape (whether constant or not), it seems appealing to consider the left part of the data.
The bandwidth is chosen by the bootstrap procedure explained in Section  with their corresponding standard deviations σ θT n (x) and σ θn (x).Solid curves correspond to mT (x) (black) and m(x) (grey) while dashed curves correspond to mT (x) ± σ θT n (x) (black) and m(x) ± σ θn (x) (grey).Note that the smoothing parameters for mT (x) and m(x) are chosen by the bootstrap method proposed by [7] (with the same pilot bandwidth as above).This graph clearly exhibits a decreasing standard deviation.
Next, confidence intervals are provided for both methods in Table 5.To correctly approximate the distribution of our estimators, a double bootstrap procedure is proposed: in each resample (B = 1000), an optimal value for the bandwidth parameter is obtained by a second bootstrap stage (using B = 1000 bootstrap samples for each resample).The first stage resamples are generated with the procedure of Section 4.1 and are also used to obtain the optimal bandwidth for the initial sample.Next, this procedure together with expression (4.strap methods are developed.We can observe that the intervals lengths using bootstrap for γT 0,n and γT 1,n are larger than the ones using bootstrap for γ0,n and γ1,n .0 is never included in the confidence interval for γ 1 except when using the estimated distribution of γT 1,n with the basic bootstrap procedure.Again, that suggests a non constant variance. In Figures 5 and 6, we construct for a grid of values of x, basic and percentile bootstrap confidence intervals (two-sided, 95%) for σ 2 respectively.For these two estimators, the optimal bandwidth is obtained with the method proposed by [7] adapted to the variance case (with the same pilot bandwidth as above).Under the assumed model, once again, the standard deviation seems to be far from a constant, whether for parametric or nonparametric estimations.The proposed parametric model however better fits the data.Since (5.1) is of the form γ 0 x γ1 , goodness-of-fit tests for a conditional standard deviation or for any other scale function can be considered as equivalent (see [3]).
The test proposed in this last paper could thus be applied.However, other testing procedures specifically adapted to the conditional standard deviation can be studied as well, for example, using the artificial data points (2.6).The method proposed by [3] to test a constant conditional variance and the form (5.1) leads to p−values equal to 0.000 and 0.018 respectively.The null hypothesis with the model (5.1) cannot thus be rejected at the 1 percent level even though the fit is not perfect.
Proof of Theorem 3.2.For some θ 1n between θT n and θ T 0 We have such that R 22n is a sum of i.i.d.random vectors with zero mean (by definition of θ T 0 ).For each component j of R 21n , we use Lemma A.2 while for R 1n , we write imsart-generic ver.2014/10/16 file: main250214.texdate: January 12, 2017 Using assumption (A8) and Lemme A.1, we have that each component of R 11n is o P (1).Again using condition (A8), = 2Ω + o P (1).
The result now follows.
where For A c (X, Z, ∆), write Using Proposition 4.5 and Corollary 3.2 of [18] together with an order one Taylor development and the fact that sup y≤T |yf 0 ε (y)| < ∞, coefficients of Âc (X, Z) in the two first terms of A c (X, Z, ∆) are Now, using Lemma A.2 of [7] and Lemma A.1 of [5], where R n1 (X, Z, ∆) is bounded by where R n2 (X, Z, ∆) is bounded by (A.5).To treat the terms where both Ê0T and E 0T are involved (i.e. the second term on the right hand side of (A.4) and the third, fourth and fifth terms on the right hand side of (A.6)), we need to introduce the sum used in the statement of Lemma A.2.More precisely, for the second term of (A.4), we have (1 and Ê0 T i

Fig 1 .
Fig 1.Comparison of the estimated conditional variances obtained with θn (grey curves) and θT n (black curves) for model (4.3).Dashed curves : 5% and 95% quantiles of the empirical distribution based on the estimated conditional variances; solid curve: median of this empirical distribution; dash-dotted curve: true conditional variance curve.

Fig 2 .
Fig 2. Comparison of the estimated conditional variances obtained with θn (grey curves) and θT n (black curves) for model (4.4).Dashed curves : 5% and 95% quantiles of the empirical distribution based on the estimated conditional variances; solid curve: median of this empirical distribution; dash-dotted curve: true conditional variance curve.

−Fig 3 .
Fig 3. Fatigue life data.Scatter plot of the logarithm of fatigue life versus the logarithm of strain for specimens of a nickel-base superalloy.

Figure 4
Figure 4 represents the conditional means estimates mT (x) and m(x) together

2 )
are adapted to compute each (first stage) resample optimal bandwidth (the pilot bandwidth is kept constant throughout the entire methodology).The confidence intervals are two-sided and their level is 0.95.Both basic and percentile bootimsart-generic ver.2014/10/16 file: main250214.texdate:January 12, 2017
, each estimated with the B resamples.On the left panel, confidence intervals do not always contain σ 2 θn (x) (especially for small values of x) while σ 2 θT n (x) is always included in the corresponding bounds.

Fig 7 .
Fig 7. Fatigue life data.Scatter plot of the residuals of a nonparametric fit versus the strain with m * (x) = mT (x) for the left panel and m * (x) = m(x) for the right panel.Solid black and grey curves correspond to σ θT n (x) and σ θn (x) respectively; dash-dotted black and grey curves represent the nonparametric σT (x) and σ(x) respectively for x ∈ [0.0036; 0.0070].
). Practically, the point Tx is chosen as the largest data point in the window defined by the bandwidth parameter.When this data point is censored, it is redefined as uncensored.Then, the resulting estimator imsart-generic ver.2014/10/16 file: main250214.texdate:January12, 2017

Table 4 .
2 summarizes the simulation results for different values of α 0 , α 1 , α 2 and γ 1 .For fixed value of γ 1 , α 0 , α 1 and α 2 are chosen in such a way that some variation in the different percentages of censoring (in % and denoted CP in the tables) is obtained.The censoring proportion is computed as the average of P (∆ = 0|x) for an equispaced grid of values of x.The smoothing parameter is chosen as the minimizer of (4.2) among a grid of values varying between 0.15 and 0.30 by step of 0.0075, and the value of the pilot bandwidth is 0.3075.Main observations from Table4.2 are listed in the next paragraph.For a fixed sample size, the censoring percentage increase seems to induce a deterioration of the results, whatever the considered estimation method.Moreover, a more important heteroscedasticity (through the choice of γ 1 ) implies imsart-generic ver.2014/10/16 file: main250214.texdate:January12, 2017

Table 3
Fatigue life data.Confidence intervals for γ 0 and γ 1 where strain domain is restricted to values below 0.007.The first line (for γ 0 and γ 1 ) is obtained with the estimated θn distribution and the second line with the estimated θT n distribution.