Smoothed rank-based procedure for censored data

Abstract: A smoothed rank-based procedure is developed for the accelerated failure time model to overcome computational issues. The proposed estimator is based on an EM-type procedure coupled with induced smoothing. The proposed iterative approach converges provided the initial value is based on a consistent estimator, and the limiting covariance matrix can be obtained from a sandwich-type formula. The consistency and asymptotic normality of the proposed estimator are also established. Extensive simulations show that the new estimator is not only computationally less demanding but also more reliable than existing estimators.


Introduction
The accelerated failure time (AFT) model is a useful alternative to the popular Cox regression model for survival data analysis, especially when the key assumptions of the Cox model, such as proportional hazards, are not guaranteed to hold in practice. The direct interpretability of the AFT model in terms of life expectancy is more appealing to practitioners than the indirect interpretation based on hazards in the Cox model. For more discussion, see Wei (1992), among others. Buckley and James (1979) proposed solving normal equations with censored responses replaced by their expected life lengths based on the Kaplan-Meier estimate. Simulation studies conducted by Miller and Halpern (1982) showed that the Buckley-James estimator is reliable. However, its asymptotic properties are difficult to establish; see Lai and Ying (1991). In addition, the iteration involved can become trapped in a loop and thus may never converge. Nonconvergence occurs more frequently when censoring is more severe. Stare et al. (2000) suggested that the Buckley-James estimator is safe to use only when censoring is below 20%. Ritov (1990) and Tsiatis (1990), among others, studied rank estimators based on the linear rank statistics for censored data proposed by Prentice (1978). A Gehan-type weight function is commonly used, and is a natural starting point for rank estimators, as it leads to a monotone estimating function. Although other weighting schemes could yield more efficient estimates, the monotonicity of their associated estimating functions is not ensured, and the uniqueness of the estimates is not guaranteed. Jin et al. (2003) developed an iterative procedure for approximating possibly non-monotone weighted log-rank estimating functions around the true values of the regression parameters. The algorithm starts with a Gehan-type estimator and solves a monotone estimating function with updated weights at each step.
The covariance matrix of the estimated regression coefficients is then estimated by using a resampling approach to approximate their distributions. Although this numerical procedure can be conveniently implemented, for large datasets the high computational burden of the resampling techniques is undesirable.
To overcome difficulties in estimating the covariance matrix, usually caused by non-smooth estimating functions, Brown and Wang (2005) developed an induced smoothing technique, resulting in continuously differentiable objective or estimating functions, which makes the standard numerical methods applicable. This induced smoothing technique provides a natural method for determining kernel bandwidths when replacing discontinuous functions in estimating equations with smoothed distribution kernels. Recently, Chiou et al. (2013) developed a fast and efficient rank-based inference procedure for the AFT model in the context of case-cohort studies by adopting the induced smoothing approach.
Consider the linear regression model $\log T_i = X_i^T\beta_0 + \varepsilon_i$, $i = 1, \ldots, n$, where the errors $\varepsilon_i$ are independent and identically distributed.
Denote $X_i - X_j$ by $d_{ij}$ and write $e_i(\beta) = Y_i - X_i^T\beta$. The corresponding rank estimating function is
$$S_n(\beta) = n^{-1}\sum_{i<j} d_{ij}\,\mathrm{sign}\{e_i(\beta) - e_j(\beta)\}. \qquad (1)$$
The smoothed rank estimating function derived by Brown and Wang (2005) is
$$\tilde S_n(\beta) = n^{-1}\sum_{i<j} d_{ij}\,\bigl[2\Phi\{(e_i(\beta) - e_j(\beta))/r_{ij}\} - 1\bigr], \qquad r_{ij}^2 = d_{ij}^T \Gamma d_{ij},$$
in which $\Gamma$ is the $p \times p$ covariance matrix of $\hat\beta$, the root of the smoothed rank estimating function, and $\Phi(\cdot)$ denotes the standard normal cumulative distribution function. $\Gamma$ can be estimated with the sandwich formula $\hat\Gamma = D^{-1}\,\mathrm{cov}\{\tilde S_n(\beta_0)\}\,(D^T)^{-1}$, where $D = E\{\tilde S_n'(\beta)\}|_{\beta_0}$. Here, $D$ and $\mathrm{cov}\{\tilde S_n(\beta_0)\}$ can be replaced by their estimates $\hat D = \tilde S_n'(\hat\beta)$ and $\widehat{\mathrm{cov}}\{\tilde S_n(\beta_0)\}$, respectively. Starting from an initial matrix $\Gamma^{(0)}$ based on an initial estimate $\beta^{(0)}$, we obtain $\beta^{(k+1)}$ by iterating between solving $\tilde S_n(\beta) = 0$ using $\Gamma^{(k)}$ and updating $\Gamma^{(k+1)}$. The unique estimator of $\beta_0$ and its estimated covariance matrix $\hat\Gamma$ are then the limits of the sequences $\beta^{(k)}$ and $\Gamma^{(k)}$, $k \geq 1$. The smoothed rank estimating function with variable kernel bandwidths has an appealing rationale. The consistent estimator $\hat\beta$ to be obtained can be stochastically expressed as $\beta_0 + \Gamma^{1/2} Z$ with $Z \sim N(0, I_p)$, a random perturbation of the true parameter value $\beta_0$. This leads us to consider $\tilde S(\beta_0) = E\{S(\beta_0 + \Gamma^{1/2} Z)\}$, an average of random perturbations of $S$. Since monotonicity of the estimating equations is needed for the induced smoothing method, application to AFT models was made by Brown and Wang (2006) only for the Gehan estimating equations.
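As a concrete illustration of the induced smoothing idea, the following minimal sketch (our own code, not the authors'; the function name is hypothetical) evaluates a smoothed Gehan-type score in which each pairwise indicator is replaced by a normal c.d.f. with scale r_ij² = d_ij' Γ d_ij:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def smoothed_gehan_score(beta, Y, X, Gamma):
    """Smoothed Gehan-type rank score: the indicator I{e_i(beta) > e_j(beta)}
    is replaced by Phi{(e_i - e_j)/r_ij}, with r_ij^2 = d_ij' Gamma d_ij."""
    n, p = X.shape
    e = Y - X @ beta
    score = np.zeros(p)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = X[i] - X[j]                      # d_ij
            r = sqrt(max(d @ Gamma @ d, 1e-12))  # induced-smoothing scale
            score += d * norm_cdf((e[i] - e[j]) / r)
    return score / n**2
```

Because the sum runs over ordered pairs, the terms for (i, j) and (j, i) combine into the centered form 2Φ(z) − 1, so the score is approximately centered at zero at the true β.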
In this article, we investigate the induced smoothing approach for the AFT model together with the EM algorithm, i.e., iterating between imputing the censored lifetimes using their expected conditional survival times and solving the smoothed rank estimating equation for "complete data." To overcome the non-smoothness and non-monotonicity caused by imputing the censored lifetimes based on the Kaplan-Meier estimate of the survival function, we first propose in Section 2 an estimate of the conditional expectation of the residuals using kernel smoothing, which is smooth not only in the residuals but, more importantly, also in the regression coefficients β. The choices of the kernel function and bandwidths are discussed in Section 3. Using the consistent Gehan estimator as the initial value, we develop in Section 4 an iteration procedure consisting of imputing censored residuals with their smoothed estimates and solving smoothed monotone rank estimating equations established via the induced smoothing approach. In light of the smoothness of the estimating functions in each iteration, we can obtain the limiting covariance matrix by standard procedures. Theoretical justifications for the local asymptotic linearity of the resulting estimating functions and the asymptotic normality of the proposed estimators are provided. Simulation results and two real examples are presented in Sections 5 and 6 to demonstrate the usefulness of the proposed method.

Smoothed estimates of conditional mean residuals
Suppose that there is a random sample of n subjects. Let T_i and C_i be the failure time and censoring time, respectively, for the ith subject, and let X_i be the associated p × 1 vector of covariates. It is usually assumed that T_i and C_i are independent conditional on X_i. The observed data may be written as triplets (T̃_i, δ_i, X_i), 1 ≤ i ≤ n, where T̃_i = min(T_i, C_i) and δ_i = I(T_i ≤ C_i) is the censoring indicator. The usual accelerated failure time model takes the form
$$\log T_i = X_i^T \beta_0 + \varepsilon_i,$$
where β_0 is the true value of a p × 1 vector of regression coefficients, and {ε_i; 1 ≤ i ≤ n} are independent error terms following a common distribution F. Denote by {Y_i = log(T̃_i), 1 ≤ i ≤ n} the observed responses. For fixed β, in the presence of censoring, the observed residual ẽ_i = Y_i − X_i^T β is no longer the actual residual for the ith subject. Note that ẽ_i = min(e_i, c_i), where e_i = log T_i − X_i^T β and c_i = log C_i − X_i^T β; thus, whenever the observed lifetime is censored, the corresponding residual is also censored. Buckley and James (1979) suggested replacing the censored residual ẽ_i with the conditional expectation of e_i, which can be expressed as
$$m(\tilde e_i;\beta) = E(e_i \mid e_i > \tilde e_i) = \tilde e_i + \frac{\int_{\tilde e_i}^{\infty} S(u)\,du}{S(\tilde e_i)}, \qquad (2)$$
where S = 1 − F. In the Buckley-James estimator for the AFT model, F is replaced by its Kaplan-Meier estimator F̂ and the censored observation Y_i by
$$\hat Y_i = \delta_i Y_i + (1-\delta_i)\{\hat m(\tilde e_i;\beta) + X_i^T\beta\}. \qquad (3)$$
The estimated conditional mean lifetime in (3) is piecewise linear in β, with jumps where there is a change in the ranks of {ẽ_i(β), 1 ≤ i ≤ n}. When imputing in the subsequent iteration procedures for updating β, the discontinuous estimate of m(ẽ_i; β) can lead to non-uniqueness of the β estimates, indicated by iterates oscillating between two values. To deal with the non-smoothness caused by applying the Kaplan-Meier estimate F̂, smoothed estimates of F can be used instead in (2). Numerous smoothed estimates for lifetime survival functions have been proposed and studied in the literature.
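The Kaplan-Meier-based imputation of censored residuals described in (2) can be sketched numerically as follows (a simplified illustration with our own function names; ties and the mass left beyond a censored largest residual are handled crudely):

```python
import numpy as np

def km_survival(e, delta):
    """Kaplan-Meier estimate of the residual survival function S(t) = P(e > t).
    Returns the ordered residuals and S evaluated just after each of them."""
    order = np.argsort(e)
    e_s, d_s = e[order], delta[order]
    n = len(e)
    at_risk = n - np.arange(n)              # risk-set size at each ordered residual
    surv = np.cumprod(1.0 - d_s / at_risk)
    return e_s, surv

def bj_impute(e, delta):
    """Replace each censored residual by its estimated conditional mean
    E(e_i | e_i > etilde_i) under the Kaplan-Meier estimate (Buckley-James)."""
    e_s, surv = km_survival(e, delta)
    prev = np.concatenate(([1.0], surv[:-1]))
    mass = prev - surv                      # KM probability mass (zero at censored points)
    out = e.copy()
    for i in range(len(e)):
        if delta[i] == 1:
            continue
        tail = e_s > e[i]
        denom = mass[tail].sum()
        if denom > 0:
            out[i] = (e_s[tail] * mass[tail]).sum() / denom
        # if no mass lies beyond e[i] (largest observation censored), keep e[i]
    return out
```

The piecewise-linear, jumpy behavior discussed above shows up here directly: the imputed value changes abruptly whenever the ordering of the residuals changes with β.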
They could be adopted here to obtain a smoothed estimate of the residual distribution. However, care is needed because the smoothness property is required not only in the residuals but, more importantly, also in the regression coefficients β. Bearing this in mind, we start by considering the kernel estimator for the hazard rate function α(t) developed by Ramlau-Hansen (1983).
Let E_1 < E_2 < · · · be the ordered successive jump residuals for a given β. The kernel estimator of the hazard rate, derived by smoothing the increments of the Nelson-Aalen estimator, may be written in terms of residuals as
$$\hat\alpha(e) = \frac{1}{h_1}\sum_{i=1}^n \delta_i\, k\!\left(\frac{e-\tilde e_i}{h_1}\right) \frac{1}{Y(\tilde e_i)}, \qquad (4)$$
where Y(e) = n − Σ_{i=1}^n I(ẽ_i < e) = n{1 − F_n(e)} and F_n(e) = n^{-1} Σ_{i=1}^n I(ẽ_i < e). Here, the kernel function k is a bounded function that vanishes outside [−1, 1] and has integral 1, and the bandwidth h_1 is a positive parameter. Assuming a continuous underlying distribution of ẽ_i, i = 1, …, n, further kernel smoothing applied to the at-risk process Y(e) gives Ŷ(e) = n{1 − F̂_n(e)}, where F̂_n(e) = n^{-1} Σ_{i=1}^n K{(e − ẽ_i)/h_2} and K(t) = ∫_{−∞}^t k(u) du; see Azzalini (1981), among others. Here, the bandwidth parameters h_1 and h_2 are to be determined among uncensored subjects and among all subjects, respectively. Substituting Ŷ(e) for Y(e) in α̂(e), we may further derive
$$\tilde\alpha(e) = \frac{1}{h_1}\sum_{i=1}^n \delta_i\, k\!\left(\frac{e-\tilde e_i}{h_1}\right) \frac{1}{\hat Y(\tilde e_i)}.$$
A kernel-based estimator Ã(e) for the cumulative hazard function A(e) = ∫_{−∞}^e α(u) du is then Ã(e) = ∫_{−∞}^e α̃(u) du. Note that Ã(e) is continuous in e, and thus the conditional survival distribution of the residual errors can subsequently be estimated by S̃(e) = exp{−Ã(e)} for a given β and covariates x. Let {ẽ_(i); 1 ≤ i ≤ n} be the ordered residuals of {ẽ_i; 1 ≤ i ≤ n}. Substituting the kernel estimator S̃(e) into (2) for a censored residual ẽ_i leads to a smoothed estimator for m(ẽ_i; β),
$$\tilde m(\tilde e_i;\beta) = \tilde e_i + \frac{\int_{\tilde e_i}^{\infty} \tilde S(u)\,du}{\tilde S(\tilde e_i)},$$
where the integral is evaluated over the ordered residuals ẽ_(j), …, ẽ_(n) and j is the rank of ẽ_i (j = 1, 2, …, n), i.e., ẽ_i = ẽ_(j); for convenience, the contribution beyond ẽ_(n) is truncated by setting S̃(ẽ_(n+1)) = 0. It is obvious that m̃(·) is smooth in both the residuals and the regression coefficients β. Under the conditions required in Ramlau-Hansen (1983) for showing the consistency of α̂(t) in (4), we have the consistency results in Theorem 1.
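To make the construction of S̃ concrete, here is a minimal numerical sketch (our own code; the trapezoidal integration of the hazard and the floor on the at-risk term are implementation choices, not from the paper):

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def epan_cdf(u):
    """Cumulative Epanechnikov kernel K(t) = (2 + 3t - t^3)/4 on [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.25 * (2 + 3*u - u**3)

def smoothed_survival(grid, e, delta, h1, h2):
    """S~(t) = exp{-A~(t)}, where the hazard is estimated as in
    Ramlau-Hansen (1983) with the at-risk process itself kernel-smoothed."""
    n = len(e)
    # kernel-smoothed F^_n evaluated at each residual (at-risk denominator)
    Fhat = epan_cdf((e[:, None] - e[None, :]) / h2).mean(axis=1)
    atrisk = np.maximum(1.0 - Fhat, 1.0 / n)  # guard against a zero denominator
    # alpha~ on the grid: (n h1)^{-1} sum_i delta_i k((t - e_i)/h1) / (1 - F^_n(e_i))
    alpha = (delta / atrisk) @ epan((grid[None, :] - e[:, None]) / h1) / (n * h1)
    # cumulative hazard by the trapezoidal rule, then S~ = exp(-A~)
    A = np.concatenate(([0.0],
                        np.cumsum(0.5 * (alpha[1:] + alpha[:-1]) * np.diff(grid))))
    return np.exp(-A)
```

Unlike the Kaplan-Meier step function, this estimate moves continuously as β (and hence the residuals) changes, which is exactly the property needed for the iterations in Section 4.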

Choice of bandwidths and kernel functions
If smoothing the at-risk process is regarded as smoothing a survival distribution for complete data, the optimal choice of h_2 is of order n^{−1/(2q−1)}, while the optimal choice of h_1 for estimating the kernel density in α̃(e) is of order N_1^{−1/(2q+1)}, where q is the order of the kernel and N_1 is the number of uncensored subjects; see Azzalini (1981) for a detailed discussion. For such choices of h_1 and h_2, the uniform convergence results of Giné and Nickl (2009) apply, and thus the convergence in Theorem 1 follows.
As required in estimating the conditional expectation of the regression residuals, the kernel function k must be bounded, vanish outside [−1, 1], and integrate to 1. Although any kernel function satisfying these requirements can be used, and we expect the choice of kernel to be a secondary matter compared with selecting the bandwidths, as shown in Andersen et al. (1992), we suggest the Epanechnikov kernel for convenience. This also makes K in the denominator of α̃(e) the cumulative Epanechnikov kernel,
$$K(t) = \begin{cases} 0, & t < -1,\\ (2 + 3t - t^3)/4, & -1 \le t \le 1,\\ 1, & t > 1. \end{cases}$$
Correspondingly, in all the simulation studies in Section 5 and the examples in Section 6, the optimal bandwidth (40√π)^{1/5} σ̂_1 N_1^{−1/5} was used for h_1, and h_2 was set to 1.3 σ̂_2 N_2^{−1/3}.
Here σ_1 and σ_2 are estimates of the standard deviation of e = log T − β^T X among the uncensored subjects and among all subjects, respectively. To accommodate heavy-tailed and skewed errors, they can be estimated, following Silverman (1986), by
$$\hat\sigma = \min\{\hat s,\; (Q_3 - Q_1)/1.34\},$$
where ŝ, Q_1, and Q_3 are the sample standard deviation and the first and third sample quartiles, computed among the N_1 uncensored subjects when estimating σ_1 and among all N_2 subjects when estimating σ_2. The scale parameter (40√π)^{1/5} σ_1 in h_1 is obtained by referring to a normal underlying distribution and using the Epanechnikov kernel in the optimal smoothing parameter suggested in Silverman (1986), which minimizes the approximate mean integrated squared error (AMISE) for the kernel density. When determining the scale parameter for h_2, we recommend 1.3σ_2 for general use for ease of computation, which was shown in Azzalini (1981) to be a reliable choice against various underlying distributions for the Epanechnikov kernel. Although plug-in or cross-validation methods may approximate the AMISE-minimizing bandwidth more accurately (see Altman and Leger (1995), among others), their high computational burden makes them difficult to implement in the estimating procedure proposed later.
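The two bandwidth rules above can be written as a small helper (our own sketch; the n^{−1/3} rate attached to the 1.3σ̂₂ scale for h₂ is our reading of the 2q − 1 order with a second-order kernel):

```python
import numpy as np

def robust_sigma(x):
    """Silverman's robust scale: min(sample sd, IQR/1.34)."""
    s = np.std(x, ddof=1)
    q1, q3 = np.percentile(x, [25, 75])
    return min(s, (q3 - q1) / 1.34)

def bandwidths(e, delta):
    """h1 from the AMISE-optimal rule for the Epanechnikov kernel among the
    N1 uncensored residuals; h2 from the 1.3*sigma rule among all N2 subjects."""
    unc = e[delta == 1]
    N1, N2 = len(unc), len(e)
    h1 = (40 * np.sqrt(np.pi))**0.2 * robust_sigma(unc) * N1**(-0.2)
    h2 = 1.3 * robust_sigma(e) * N2**(-1.0/3.0)
    return h1, h2
```

The robust scale guards the rule against heavy-tailed or skewed residual distributions, as intended by the min(ŝ, IQR/1.34) construction.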

Smoothed rank procedure for the AFT model
In this section, we propose an EM-type procedure for the AFT model, iterating between estimating the kernel smoothed conditional expectations of the censored residuals developed in Section 2 and solving the ordinary rank-based estimating equations for complete data, with censored residuals being imputed by their estimates obtained in the expectation step. The induced smoothing technique is also introduced into this framework to facilitate estimating the covariance matrices required in the iteration procedure.
With the censored residuals imputed by the smoothed estimates of their conditional expectations and substituted into (1), the ordinary rank-based estimating function for complete data, an estimating function can be defined as
$$U(\beta) = n^{-1}\sum_{i<j} d_{ij}\,\mathrm{sign}\{e_i^*(\beta) - e_j^*(\beta)\},$$
where e_i^*(β) = ẽ_i(β) + V_i and V_i = (1 − δ_i){m̃(ẽ_i; β) − ẽ_i} for censored residuals. We may approximate U(β) with the smoothed Ũ(β) using the induced smoothing technique as follows:
$$\tilde U(\beta) = n^{-1}\sum_{i<j} d_{ij}\,\bigl[2\Phi\{(e_i^*(\beta) - e_j^*(\beta))/r_{ij}\} - 1\bigr], \qquad r_{ij}^2 = d_{ij}^T \Gamma d_{ij},$$
which is reasonable as long as ||X_i|| is bounded almost surely by a non-random constant for 1 ≤ i ≤ n. Note that Ũ(β) is asymptotically equivalent to U(β), as presented in Lemma 1, although unconditionally it is not established by directly taking an overall average of U(β) after perturbing β everywhere it appears inside the estimating function. In the iteration procedure developed later, where {V_i; 1 ≤ i ≤ n} have been pre-fixed from the previous estimation step, Ũ(β) is naturally obtained as if there were no censored data.
In light of the continuity of Ũ(β) everywhere in β, Ũ(β) is locally linear at β_0 by applying a Taylor expansion.
Suppose the root of Ũ(β) is β̂_DSR. We show in the Appendix that β̂_DSR is a consistent estimator of β_0 and that n^{1/2}(β̂_DSR − β_0) is asymptotically normally distributed as N(0, nΓ). Denote by x_(i) the covariate vector corresponding to the ordered residual ẽ_(i). Here Γ = D^{−1} cov{U(β_0)}(D^T)^{−1} with D = E{Ũ′(β)}|_{β_0}, and D can be estimated by D̂ = Ũ′(β̂), where β̂ is an estimate of β_0 obtained from a consistent estimator. Because of the asymptotic equivalence of Ũ(β_0) and U(β_0), the asymptotic covariance matrix cov{U(β_0)} needs to be estimated next. This can be approached by applying the Hájek projection method to
$$\hat U(\beta_0) = \sum_{i=1}^n E\{U(\beta_0) \mid \tilde e_i, \delta_i, X_i\},$$
which is asymptotically equivalent to U(β_0), as obtained from Lemma 1. It follows that cov{U(β_0)} can be estimated by the empirical covariance of the estimated projection terms. The proposed estimating function Ũ(β) is smooth but not necessarily monotone in β; i.e., β̂_DSR is not necessarily a unique root of Ũ(β). However, in light of the local linearity of Ũ(β), we propose to approximate β̂_DSR with the following iterative algorithm.
Step 1: Initial calculation. Use the induced smoothing method in Brown and Wang (2006) for the AFT model to obtain the Gehan rank estimator β̂_G, and let β̂^(0) = β̂_G.
Step 2: Imputation. Given β̂^(m), compute the kernel-smoothed conditional expectations of the censored residuals from Section 2 and form the imputed lifetimes Y_i + V_i, where V_i = (1 − δ_i){m̃(ẽ_i; β̂^(m)) − ẽ_i}.
Step 3: Updating regression estimates. Solve for β̂^(m+1) from Ũ(β) = 0 as if {Y_i + V_i; 1 ≤ i ≤ n} were completely observed lifetimes. To solve the estimating equation, the numerical algorithm proposed in Brown and Wang (2005) is embedded in Step 3, consisting of: 1) using β̂^(m) and Γ^(m) as initial values; 2) updating β̂^(m) by solving Ũ(β) = 0 with Γ fixed at the current Γ^(m); 3) updating (10) and (14) using the updated β̂^(m), and hence Γ^(m); 4) repeating 2) and 3) until convergence. The updated β̂^(m) and Γ^(m) after convergence are denoted β̂^(m+1) and Γ^(m+1).
Step 4: Continuation. Repeat Steps 2 and 3 until convergence.
Starting from an initial value obtained from a consistent estimator, such as the Gehan rank estimator β̂_G, we show in the Appendix that the mth-step estimator β̂^(m) is a consistent estimator of β_0 and is asymptotically normally distributed. Provided that β̂^(m−1) at the (m − 1)th step is a fixed and consistent estimator of β_0, so that the imputed lifetimes Y_i + V_i, i = 1, …, n, and Γ^(m−1) are given, β̂^(m) is the unique root of Ũ{β; β̂^(m−1)}, the estimating function in (9) at the mth step, since this conditional function, as an ordinary rank-based estimating function for complete data, is smooth and monotone. Starting from a fixed initial value obtained from a consistent estimator, by induction β̂^(m) is a unique estimator at each step of the iteration. Evidently, β̂^(m) approaches the usual smoothed rank estimator when the censoring vanishes. When the hazard α(e) is nondecreasing in the residual e, we further show that the matrix B − D is nonnegative definite, which implies from (15) that as m tends to infinity, (I − B^{−1}D)^m tends to a zero matrix, and β̂^(m) uniquely estimates β_0 as long as β̂^(0) is consistent. The covariance matrix of β̂^(m) approaches Γ̂ = D^{−1} cov{Ũ_n(β_0)}(D^T)^{−1} after sufficiently many iteration steps m, which in turn approaches Γ as n → ∞.
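A toy implementation of the loop for a single covariate may clarify the structure. This is our own simplified sketch, not the authors' algorithm: Kaplan-Meier imputation stands in for the kernel-smoothed step of Section 2, the smoothing scale is fixed at Γ = n⁻¹, the initial value is a naive β = 0 rather than the Gehan estimator, and the monotone smoothed Gehan equation is solved by bisection.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def impute(Y, delta, X, beta):
    """E-step: replace censored residuals by Kaplan-Meier conditional means
    and return the imputed responses Y_i + V_i."""
    e = Y - X * beta
    order = np.argsort(e)
    es, ds = e[order], delta[order]
    n = len(e)
    surv = np.cumprod(1.0 - ds / (n - np.arange(n)))
    prev = np.concatenate(([1.0], surv[:-1]))
    mass = prev - surv                      # KM mass at each ordered residual
    estar = e.copy()
    for i in range(n):
        if delta[i] == 1:
            continue
        tail = es > e[i]
        den = mass[tail].sum()
        if den > 0:
            estar[i] = (es[tail] * mass[tail]).sum() / den
    return X * beta + estar

def score(beta, Ystar, X, gamma):
    """Smoothed Gehan score for complete data; nonincreasing in beta (p = 1)."""
    e = Ystar - X * beta
    n = len(e)
    s = 0.0
    for i in range(n):
        for j in range(n):
            d = X[i] - X[j]
            if d == 0.0:
                continue
            r = sqrt(max(d * gamma * d, 1e-12))
            s += d * norm_cdf((e[i] - e[j]) / r)
    return s / n**2

def dsr_1d(Y, delta, X, n_outer=5):
    """EM-type iteration: impute, then solve the monotone smoothed equation."""
    n = len(Y)
    beta, gamma = 0.0, 1.0 / n              # naive start; scale of order 1/n
    for _ in range(n_outer):
        Ystar = impute(Y, delta, X, beta)
        lo, hi = beta - 5.0, beta + 5.0     # bracket; score is decreasing in beta
        for _ in range(30):
            mid = 0.5 * (lo + hi)
            if score(mid, Ystar, X, gamma) > 0:
                lo = mid
            else:
                hi = mid
        beta = 0.5 * (lo + hi)
    return beta
```

The key point mirrored here is that, with the imputed responses held fixed, the inner equation is smooth and monotone, so each M-step has a unique root.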

Simulations
To examine the finite-sample performance of the proposed rank procedure, we considered the same model simulated by Jin et al. (2006a). Specifically, the failure times were generated from the model
$$\log T = \beta_1 X_1 + \beta_2 X_2 + \epsilon,$$
where X_1 follows a Bernoulli distribution with success probability 0.5 and X_2 independently follows the normal distribution N(0, 0.5²). Four error distributions for ε — the standard normal distribution; the standard logistic distribution; the extreme-value distribution; and a mixture of N(0, 1) and N(0, 9) with mixing probabilities (0.9, 0.1), denoted 0.9N(0, 1) + 0.1N(0, 9) — and one Weibull distribution with hazard rate 1/(2√t) for exp(ε), denoted Weibull(0.5, 1), were investigated. We considered random samples of 100 and 200 subjects. The censoring times were generated from the uniform distribution Unif[0, c], where c was chosen to yield a 25% or a 50% censoring rate in each scenario.
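The censoring-rate calibration described above can be sketched as follows (our own helper; the true coefficient values and the pilot-sample bisection are illustrative assumptions, not taken from Jin et al. (2006a)):

```python
import numpy as np

def calibrate_c(pilot_T, target):
    """Find c so that P(C < T) ~ target when C ~ Uniform[0, c]:
    the censoring rate equals E[min(T, c)]/c, which is decreasing in c."""
    lo, hi = 1e-6, 1e6
    for _ in range(80):
        c = np.sqrt(lo * hi)                  # bisection on the log scale
        rate = np.minimum(pilot_T, c).mean() / c
        if rate > target:
            lo = c
        else:
            hi = c
    return np.sqrt(lo * hi)

def simulate_aft(n, target_cens, rng, b1=1.0, b2=1.0):
    """One dataset: X1 ~ Bernoulli(0.5), X2 ~ N(0, 0.5^2),
    log T = b1*X1 + b2*X2 + eps with standard normal errors (illustrative)."""
    def draw(m):
        X1 = rng.binomial(1, 0.5, m)
        X2 = rng.normal(0.0, 0.5, m)
        return X1, X2, np.exp(b1 * X1 + b2 * X2 + rng.normal(size=m))
    _, _, pilot = draw(20000)                 # pilot sample used only to tune c
    c = calibrate_c(pilot, target_cens)
    X1, X2, T = draw(n)
    C = rng.uniform(0.0, c, n)
    Y = np.log(np.minimum(T, C))
    delta = (T <= C).astype(int)
    return np.column_stack([X1, X2]), Y, delta
```

Writing the censoring rate as E[min(T, c)]/c follows from P(T > C) = ∫₀^c P(T > u) du / c, which makes the one-dimensional search for c straightforward.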
In addition to the proposed estimator (DSR), we included the profile likelihood estimator (PLE) proposed by Zeng and Lin (2007), the log-rank (Log-rank) and least squares (LSQ) estimators proposed by Jin et al. (2003, 2006a), and the Gehan estimator for efficiency comparisons. The log-rank and least squares estimators are asymptotically efficient under the extreme-value and normal error distributions, respectively. The profile likelihood method demonstrated a substantial efficiency gain for ε following Weibull(2, 1) in the simulations conducted by Zeng and Lin (2007). We did not include this case, as comparisons for the L-shaped Weibull(2, 1) error might be more informative after further transformations for the LSQ, Log-rank, and DSR methods than direct comparisons. Note that the results for the Gehan estimator asymptotically also represent the performance of its smoothed version proposed by Brown and Wang (2006), as well as the one proposed by Heller (2007), which is essentially the same as Brown and Wang (2006), as noted in Johnson and Strawderman (2009).
The results of the simulations based on 1000 repetitions for n = 100 and n = 200 are summarized in Table 1. In Table 1, we find that the proposed parameter estimators of β_1 and β_2 are virtually unbiased. The standard errors of the PLE, Log-rank, and LSQ estimators are very similar to those obtained for the same scenarios in Zeng and Lin (2007) and Jin et al. (2003, 2006a), respectively, though not presented here. The induced smoothing procedure for estimating variances appears to reflect the true variation, and the confidence intervals had proper coverage probabilities. The Epanechnikov kernel and the bandwidths used worked well, and the approximation of the conditional mean residuals appeared adequate for producing reasonable estimates.
With the normal error and the 25% censoring rate, the proposed estimator was slightly less efficient than the least squares estimator, but more efficient than the other two estimators. For the 50% censoring rate, it outperformed all the other estimators, including the least squares estimator. Under the extreme-value error, the proposed estimator was less efficient than the log-rank estimator, but much more efficient than the other two estimators. Under the logistic and mixture normal error distributions, the proposed estimators were more efficient than the other three estimators. For the Weibull case, the relative performance of all the methods was, as expected, consistent with that for the extreme-value distribution, because a log-Weibull distribution of ε is theoretically equivalent to a scaled extreme-value distribution. In addition, the least squares method also performed well for the logistic distribution, besides the normal, but poorly for the other error distributions. The Log-rank regression is most efficient for distributions that scale to the extreme-value distribution; however, its performance was generally the worst in all the other scenarios. In general, the PLE estimator performed well when the censoring rate was 25%. However, when the censoring rate was increased to 50%, it performed no better than the least squares and log-rank estimators. The Gehan estimator performed well for the normal and logistic distributions at the 25% censoring rate, but rather poorly at the 50% censoring rate. In summary, the doubly smoothed estimator appeared to retain relatively high efficiency across different underlying distributions. The efficiency gains were particularly substantial under severe censoring.
With a convergence criterion of 0.001, the proposed numerical procedure converged very rapidly at the 25% censoring rate, usually within 5 iterative steps. For the scenarios with a 50% censoring rate, convergence was always achieved within 10 to 50 steps, depending on the sample size. Table 2 reports the computing time of the rank-based procedures, in seconds per replicate averaged over all the simulations, when running on a personal computer with an Intel 2.33 GHz CPU and 2 GB of RAM. For the 25% censoring rate, the computational speed of the newly proposed estimator was almost the same as that of the profile likelihood estimator. Both were much faster than the other two estimators based on resampling techniques. When the censoring rate was 50%, the DSR method was slower than the profile likelihood and Gehan estimators. However, the other procedures, except for the DSR, converged even faster than at the 25% censoring rate when n was fixed. This is because, in all the estimating functions of the other estimators, the censoring indicator appears as a multiplier, so the computational complexity of those estimators decreases as the censoring rate increases. The DSR method, in contrast, always attempts to recover the censored information, which implies a heavier computational requirement when the censoring rate is higher. This perhaps, from a different perspective, explains why the performance of the DSR method under heavy censoring was better than the others in most scenarios, and even very close to the estimators that are asymptotically efficient for the underlying distributions.

Two examples
We first applied the proposed doubly smoothed method to the Mayo primary biliary cirrhosis (PBC) study described in Fleming and Harrington (1991), which has been widely investigated, for example, in Jin et al. (2006a) and Zeng and Lin (2007). The data contain information about the survival times and prognostic factors for 418 patients. Jin et al. (2006a) fitted the accelerated failure time model with five covariates - age, log(albumin), log(bilirubin), edema, and log(protime) - using rank and least squares estimators. We fitted the same model using the proposed method, for which the Epanechnikov kernel was used in estimating the conditional mean residuals, separately with the bandwidths a_1 = (40√π)^{1/5} σ̂_1 N_1^{−1/5} and a_2 = σ̂_1 N_1^{−1/7} for h_1. The results are presented in Table 3, together with the estimates from the other methods, cited directly from Zeng and Lin (2007) and Jin et al. (2003, 2006a) for ease of comparison. The parameter estimates and the variance estimates produced by the proposed method appear similar for the two choices of bandwidth. The new estimating method led to numerically smaller absolute effects for age, log(albumin), and log(bilirubin) than the other methods. For edema and log(protime), our estimates are comparable with those from the Gehan and least squares regressions. As the censoring rate in this dataset is as high as approximately 61.5%, the new estimates could provide a useful alternative explanation of the model, in view of the relatively good performance of the method under severe censoring shown in the simulations.
The second example is the well-known Stanford heart transplant dataset analyzed by Miller and Halpern (1982) using the Cox and Buckley-James regression methods. Survival times for a total of 184 heart-transplant patients were collected, along with their ages at the time of the first transplant and T5 mismatch scores. Following Miller and Halpern (1982), the 27 patients who did not have T5 mismatch scores and the 5 patients with survival times of less than 10 days were excluded from our analysis. Of the remaining 152 patients, 55 were censored as of February 1980. The dataset is available as "stanford2" in the "survival" package in R. A quadratic model of the base-10 logarithm of survival time in age, rather than a linear model, was considered here for a better fit, as reported in Miller and Halpern (1982).
The AFT model is usually used when the proportional hazards assumption required by the Cox model is questionable. In this regard, many existing methods for assessing the adequacy of the proportional hazards assumption could be applied first. Our perspective is that, even when the Cox model is valid, the AFT model can be a reliable alternative that provides a different interpretation of the covariate effects. In light of this, instead of checking the proportional hazards assumption, we explored graphically the appropriateness of the log-linear specification assumed in the AFT model. Kaplan-Meier estimates of the log survival time conditional on age at the quantiles τ = 0.01, 0.25, 0.5, and 0.75 were plotted against age. Larger quantiles were not included because these quantiles are available at relatively few ages around the observed data points. Meanwhile, at each age, the convention of redefining the largest log survival time as uncensored whenever it is not was adopted. Quadratic models of these quantiles in age, fitted by least squares, were also plotted. Figure 1 demonstrates that the assumed quadratic age model fits the data well, and the patterns of the age effect at different quantiles appear similar to one another, indicating that a homogeneous AFT model is adequate for the data.
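The quantile-versus-age diagnostic can be reproduced with a short helper (our own sketch using numpy only; the age binning and the quadratic least-squares fit mirror the plot described):

```python
import numpy as np

def km_quantile(t, delta, q):
    """Kaplan-Meier q-th quantile: smallest observed time with S(t) <= 1 - q.
    The largest time is treated as uncensored, as the convention in the text."""
    order = np.argsort(t)
    ts, ds = t[order], delta[order].astype(float).copy()
    ds[-1] = 1.0                               # redefine the largest time as an event
    n = len(t)
    surv = np.cumprod(1.0 - ds / (n - np.arange(n)))
    idx = np.searchsorted(-surv, -(1.0 - q), side="left")
    return ts[min(idx, n - 1)]

def quantile_vs_age(age, log10_t, delta, q, n_bins=8):
    """Median (or other quantile) of log10 survival per age bin,
    followed by a quadratic least-squares fit in age."""
    edges = np.quantile(age, np.linspace(0, 1, n_bins + 1))
    mids, qs = [], []
    for k in range(n_bins):
        sel = (age >= edges[k]) & (age <= edges[k + 1])
        if sel.sum() >= 5:
            mids.append(age[sel].mean())
            qs.append(km_quantile(log10_t[sel], delta[sel], q))
    coef = np.polyfit(mids, qs, deg=2)         # quadratic fit in age
    return np.array(mids), np.array(qs), coef
```

In practice one would plot the binned quantiles against age with the fitted quadratic curve overlaid, one curve per τ, as in the figure described above.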
To better interpret the age effect and to avoid possible multicollinearity between age and age², we further centered age at its mean value of 42, following Wei et al. (1990). The parameter estimates for centered age and age², using the proposed DSR method with a convergence criterion of 10^{−6}, are −0.033 and −0.0014, with corresponding 95% confidence intervals (−0.051, −0.016) and (−0.0028, −0.00014), respectively. Compared with the results from the Gehan (−0.036 and −0.0017) and log-rank (−0.038 and −0.0016) methods reported in Wei et al. (1990), the DSR estimates also show negative linear and quadratic age effects, though both are smaller in magnitude.

Discussion
In this article, we have proposed an induced smoothing rank regression method for censored data, where censored observations are imputed using kernel-smoothed estimates of the conditional expectations of the lifetimes. The original purpose is to recover the efficiency of the rank procedure for censored data following the logistic distribution, as in common Wilcoxon-type procedures. The simulation results also showed that the proposed estimator generally outperformed the other existing estimators, particularly under severe censoring. The kernel estimation used is based on the convolution formulation for the hazard function in Ramlau-Hansen (1983). Other smoothed estimators, such as those proposed in Chaubey and Sen (2008), could also be adopted here. We could even consider smoothing the mean residual lifetime directly, as in Abdous and Berred (2005), among others. However, the kernel estimator we proposed is convenient for demonstrating smoothness in the regression coefficients and for establishing the asymptotic properties of the final regression estimators.
The proposed method could also be extended to clustered data settings. To this end, we can consider induced smoothing of the weighted rank regression suggested by Wang and Zhao (2008), with censored observations imputed by suitably smoothed estimates of the conditional expectations of the lifetimes, adjusted for correlations. Another potential application is developing a smoothed estimating equation for quantile regression (Wang et al. (2009)), where both smoothing and imputation are needed for censored data. This is of particular interest in the presence of heteroscedasticity.
Suppose that the distribution function of ẽ_i = min(e_i, c_i), i = 1, …, n, is continuous. From Theorem 1 in Giné and Nickl (2009) and the conditions assumed on h_1 and h_2, it can be shown that sup_{e∈E} |F̂_n(e) − F_n(e)| = o_{a.s.}(h_1). By the continuous mapping theorem, this also means that sup_{e∈E} |{1 − F̂_n(e)}^{−1} − {1 − F_n(e)}^{−1}| = o_{a.s.}(h_1). It follows that |α̃(e) − α̂(e)| →_P 0, noting that k is a bounded kernel.
Observing that the counting-process arguments remain applicable in terms of the residuals, even though arbitrarily large negative values may be observed, we further obtain that α̂(e) →_P α(e) as n → ∞, under the conditions required for showing the consistency of α̂(e) in Ramlau-Hansen (1983). It follows that α̃(e) →_P α(e). By dominated convergence again, we have |Ã(e) − A(e)| →_P 0 as n → ∞. The consistency of S̃(e) follows immediately. Following the derivation of Lemma 1 in James and Smith (1984), we can then show that sup_{e∈E} |m̃(e; β) − m(e; β)| →_P 0.
Let h = β̂_DSR − β_0 and write β̂_DSR = β_0 + h. Owing to the continuity of Ũ(β) in β, generalizations of the mean value theorem give, almost surely,
$$\tilde U(\beta_0 + h) - \tilde U(\beta_0) = D_h\, h, \qquad D_h = \int_0^1 J_{\tilde U}(\beta_0 + th)\,dt, \quad 0 \le t \le 1,$$
where J_Ũ(·) is the Jacobian matrix of Ũ(·). Since Ũ(β̂_DSR) = 0, the triangle inequality and (A5) further give that ||h|| ≤ ||D_h^{−1}|| · ||Ũ(β_0)|| → 0 in probability. Assuming that the Jacobian matrix J_Ũ(·) is non-singular for 0 ≤ t ≤ 1, this shows that β̂_DSR is a consistent estimator of β_0. Note that this consistency result states nothing about the uniqueness of β̂_DSR.
The asymptotic covariance matrix of β̂^(m) is determined by the variation of both β̂_G and Ũ(β_0). However, it approximately equals Γ = D^{−1} cov{Ũ(β_0)}(D^T)^{−1} when m is sufficiently large. This can be seen by first showing that B − D is nonnegative definite under the condition that the hazard α(e) is nondecreasing in e. We find that, for any i = 1, …, n,
$$V_i = (1-\delta_i)\{m(e_i;\beta) - e_i\} = (1-\delta_i)\int_{e_i}^{\infty} S(t)\,dt \Big/ S(e_i)$$
is nonincreasing in e_i. It follows that, for any vector γ_{p×1} and v_i = ∂V_i/∂β, the corresponding projection is nondecreasing in γ^T x_i. This implies that B − D is nonnegative definite, and therefore (I − B^{−1}D)^m approaches a zero matrix as m becomes sufficiently large.
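The monotonicity claim can be verified directly from the mean residual life of the residual distribution; the following standard derivation, consistent with the expression for V_i above, shows why a nondecreasing hazard makes m(e; β) − e nonincreasing:

```latex
g(e) \;=\; m(e;\beta) - e \;=\; \frac{\int_e^{\infty} S(t)\,dt}{S(e)},
\qquad
g'(e) \;=\; \frac{-S(e)^2 + \alpha(e)S(e)\int_e^{\infty} S(t)\,dt}{S(e)^2}
       \;=\; \alpha(e)\,g(e) - 1 .
% Using S(t)/S(e) = \exp\{-\int_e^t \alpha(u)\,du\} and the monotonicity of \alpha:
g(e) \;=\; \int_e^{\infty} \exp\Bigl\{-\int_e^t \alpha(u)\,du\Bigr\}\,dt
     \;\le\; \int_e^{\infty} e^{-\alpha(e)(t-e)}\,dt \;=\; \frac{1}{\alpha(e)},
% hence
g'(e) \;=\; \alpha(e)\,g(e) - 1 \;\le\; 0 .
```

That is, a nondecreasing hazard (IFR) implies a decreasing mean residual life, which is exactly the condition used above for B − D to be nonnegative definite.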