Non-linear wavelet-based density estimators under random censorship

https://doi.org/10.1016/S0378-3758(02)00366-X

Abstract

We provide an asymptotic expansion for the mean integrated squared error (MISE) of non-linear wavelet-based density estimators with randomly censored data. Our technique relies on a result of Stute (Ann. Statist. 23 (1995) 422) that approximates Kaplan–Meier integrals by an average of i.i.d. random variables up to a remainder of a certain order. We show that this MISE expansion, when the underlying survival density function and censoring distribution function are only piecewise smooth, is the same as the analogous expansion for kernel density estimators. However, for kernel estimators this MISE expansion holds only under additional smoothness assumptions.

Introduction

The mathematical theory of wavelets and its applications in statistics have made wavelets a well-known technique for non-parametric curve estimation; see, e.g., Meyer (1990), Daubechies (1992), Chui (1992), Mallat (1989), Donoho and Johnstone (1994), Donoho (1995, 1996) and Kerkyacharian and Picard (1992, 1993). For a systematic discussion of wavelets and their applications, see the recent monograph by Härdle et al. (1998). The major advantage of the wavelet method is its adaptation to erratic behavior of the density and its local adaptation to the degree of smoothness of the unknown density. Wavelet estimators typically achieve optimal convergence rates over exceptionally large function spaces. They handle discontinuities in the target function very well, and consequently they enjoy a very good convergence rate even if smoothness conditions are imposed only in a piecewise sense.

Hall and Patil (1995) were the first to demonstrate explicitly that, in the absence of censoring, discontinuities of the density have a negligible effect on the performance of non-linear wavelet density estimators. The mean integrated squared error (MISE) of the kernel estimator of a density function f has the form
$$\mathrm{MISE} \sim c_1(nh)^{-1} + c_2h^{2r},$$
where "∼" means that the ratio of the left- and right-hand sides converges to 1 as n → ∞, n denotes the sample size, h is the bandwidth of the kernel estimator, r is the order of the kernel, and $c_1$ and $c_2$ are constants depending on both the kernel and the unknown density. The first term derives from the variance and the second from the squared bias. This expansion for kernel estimators generally fails if the underlying density function does not have r derivatives (Hall and Patil, 1995, p. 906). By contrast, the MISE expansion of non-linear wavelet estimators remains valid when the density function is only piecewise smooth, and it even has the same constants $c_1$ and $c_2$. Patil (1997) obtained similar results for non-linear wavelet hazard rate estimators with complete data.
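The variance/bias trade-off in the kernel expansion can be made concrete with a short sketch. Minimizing $c_1(nh)^{-1} + c_2h^{2r}$ in h gives $h^* = \{c_1/(2rc_2n)\}^{1/(2r+1)}$, and the minimized MISE then decays like $n^{-2r/(2r+1)}$. The constants below are arbitrary illustrative values, not derived from any particular kernel or density.

```python
import numpy as np

def kernel_amise(h, n, r, c1, c2):
    """Leading-order kernel MISE: c1 (n h)^{-1} + c2 h^{2r}."""
    return c1 / (n * h) + c2 * h ** (2 * r)

def optimal_bandwidth(n, r, c1, c2):
    """Minimizer of kernel_amise in h.

    Setting the derivative -c1 / (n h^2) + 2 r c2 h^{2r-1} to zero gives
    h* = (c1 / (2 r c2 n))^{1 / (2r + 1)}.
    """
    return (c1 / (2 * r * c2 * n)) ** (1.0 / (2 * r + 1))

# Illustrative constants only (hypothetical, not from the paper)
c1, c2, r = 0.3, 0.05, 2
for n in (100, 1_000, 10_000):
    h_star = optimal_bandwidth(n, r, c1, c2)
    print(n, h_star, kernel_amise(h_star, n, r, c1, c2))
```

Because both terms scale identically at $h^*$, multiplying n by 10 shrinks the minimized MISE by exactly $10^{-2r/(2r+1)}$.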

In industrial life-testing, medical follow-up research and other studies, observation of the failure event may be prevented by the prior occurrence of a censoring event, so only some of the observations are actual failure times. Formally, let $X_1, X_2, \ldots, X_n$ be i.i.d. survival times with common distribution function F and density function f, and let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. censoring times with common distribution function G. It is assumed that $X_i$ is independent of $Y_i$ for every i. In the randomly right-censored model, rather than observing the variables of interest $X_1, X_2, \ldots, X_n$, one observes $Z_i = \min(X_i, Y_i) = X_i \wedge Y_i$ and $\delta_i = I(X_i \le Y_i)$, $i = 1, 2, \ldots, n$, where I(A) denotes the indicator function of the set A.
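A minimal simulation of the observation scheme $(Z_i, \delta_i)$ may help fix ideas. The exponential choices for F and G (and their rates) are assumptions made only for illustration. For independent exponentials, $P(X \le Y) = \lambda_X/(\lambda_X + \lambda_Y)$, which gives a quick check on the censoring proportion.

```python
import numpy as np

def simulate_censored(n, lam_x, lam_y, seed=0):
    """Draw (Z_i, delta_i) from the random right-censorship model.

    Survival times X_i ~ Exp(rate lam_x) and censoring times Y_i ~ Exp(rate lam_y)
    are an illustrative assumption; any continuous F and G would do.
    """
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0 / lam_x, size=n)  # survival times X_i
    y = rng.exponential(scale=1.0 / lam_y, size=n)  # censoring times Y_i
    z = np.minimum(x, y)                            # Z_i = min(X_i, Y_i)
    delta = (x <= y).astype(int)                    # delta_i = I(X_i <= Y_i)
    return z, delta

z, delta = simulate_censored(10_000, lam_x=1.0, lam_y=0.5)
# Here P(X <= Y) = 1 / (1 + 0.5) = 2/3, so about two thirds are uncensored
print(delta.mean())
```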

Antoniadis et al. (1999) describe a wavelet method for estimating density and hazard rate functions from randomly right-censored data. The method divides the time axis into a dyadic number of intervals and counts the number of events within each interval. The event counts and the survival function of the observations are then separately smoothed over time with linear wavelet smoothers. They establish asymptotic normality of the estimator and obtain the best possible asymptotic MISE convergence rate under the assumption that the survival time density f is r times continuously differentiable and the censoring density g is continuous.
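The binning step described above can be sketched as follows. This covers only the counting of uncensored events in $2^J$ equal-width intervals of a window [0, T]; the window, the function name and the omission of the subsequent linear wavelet smoothing are simplifying assumptions, not a reproduction of Antoniadis et al.'s full procedure.

```python
import numpy as np

def dyadic_event_counts(z, delta, T, J):
    """Count uncensored events (delta == 1) in each of 2**J equal-width bins
    partitioning [0, T]; observations beyond T are ignored (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    counts, edges = np.histogram(z[delta], bins=2 ** J, range=(0.0, T))
    return counts, edges

counts, edges = dyadic_event_counts(
    [0.1, 0.3, 0.6, 0.9, 1.5], [1, 0, 1, 1, 1], T=1.0, J=1
)
print(counts)  # [1 2]
```

The censored observation at 0.3 and the event at 1.5 (outside [0, T]) do not contribute to any bin.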

The objective of this paper is to propose a non-linear wavelet estimator of a density function with censored data and to derive a result similar to the main result, Theorem 2.1, of Hall and Patil (1995). One consequence of this extension is that the MISE has the analogous expansion
$$\mathrm{MISE} \sim k_1n^{-1}p + k_2p^{-2r},$$
where n denotes the sample size, p is a smoothing parameter depending on n (a wavelet analog of the inverse bandwidth $h^{-1}$ of kernel estimators), and $k_1$ and $k_2$ are constants depending on the wavelet, the unknown density and the censoring distribution.
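As a worked illustration of what the wavelet expansion implies, treat $k_1, k_2 > 0$ as fixed and p as a continuous variable, and balance only the two leading terms (a heuristic calculation, not a statement from the paper):

```latex
\frac{d}{dp}\Big(k_1 n^{-1} p + k_2 p^{-2r}\Big) = k_1 n^{-1} - 2r\,k_2\,p^{-2r-1} = 0
\quad\Longrightarrow\quad
p^{*} = \Big(\frac{2r\,k_2\,n}{k_1}\Big)^{1/(2r+1)},

k_2 (p^{*})^{-2r} = \frac{k_1 p^{*}}{2rn}
\quad\Longrightarrow\quad
\mathrm{MISE}(p^{*}) \sim \Big(1 + \frac{1}{2r}\Big) k_1 n^{-1} p^{*} \asymp n^{-2r/(2r+1)}.
```

This mirrors the kernel case with $p \leftrightarrow h^{-1}$: the optimal resolution grows like $n^{1/(2r+1)}$ and the MISE attains the familiar $n^{-2r/(2r+1)}$ rate.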

Wu and Wells (1999) studied hazard rate estimation by non-linear wavelet methods in the left-truncation and right-censoring model. They applied counting process techniques and obtained an analogous expansion. Their wavelet-based estimator of the hazard rate function is defined over a bounded interval [ι, τ], chosen so that the size of the risk set satisfies some additional conditions.

In this paper, we apply the method of Stute (1995), which approximates Kaplan–Meier integrals by an average of i.i.d. random variables up to a remainder of small order. We provide an MISE expansion similar to that of Hall and Patil (1995) for the density function over (−∞, T], for any fixed $T < \tau_H$, where $\tau_H = \inf\{x : H(x) = 1\} \le \infty$ is the least upper bound of the support of H, the distribution function of $Z_1$.
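Restricting attention to (−∞, T] with $T < \tau_H$ keeps $1 - G$ bounded away from zero there, which matters because the variance term in Theorem 3.1 involves $\int f_1/(1-G)$. The sketch below evaluates that integral, assuming (our reading, since the definition of $f_1$ is not shown in these snippets) that $f_1$ is f restricted to (−∞, T], and taking an Exp(λ) survival density with an Exp(μ) censoring distribution purely for illustration. Without censoring (μ = 0) the integral reduces to F(T); censoring inflates it.

```python
import numpy as np

def variance_constant(lam, mu, T, n_grid=200_000):
    """Midpoint-rule approximation of int_0^T f(x) / (1 - G(x)) dx, assuming
    f is the Exp(lam) density and G the Exp(mu) cdf (illustrative choices)."""
    dx = T / n_grid
    x = (np.arange(n_grid) + 0.5) * dx  # midpoints of [0, T]
    f = lam * np.exp(-lam * x)          # survival density f(x)
    surv_g = np.exp(-mu * x)            # 1 - G(x)
    return float(np.sum(f / surv_g) * dx)

def variance_constant_exact(lam, mu, T):
    # Closed form lam / (mu - lam) * (exp((mu - lam) T) - 1), valid for mu != lam
    return lam / (mu - lam) * np.expm1((mu - lam) * T)

T = 2.0
no_censoring = variance_constant(1.0, 0.0, T)  # equals F(T) = 1 - exp(-T)
with_censoring = variance_constant(1.0, 0.5, T)
print(no_censoring, with_censoring, variance_constant_exact(1.0, 0.5, T))
```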

In the next section, we review the elements of the wavelet transform and define non-linear wavelet-based density estimators. The main results are stated in Section 3, and their proofs appear in Sections 4 and 5.

Section snippets

Notations and estimators

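This section contains some facts about wavelets that will be used in the sequel. Let φ(x) and ψ(x) be father and mother wavelets with the following properties: φ and ψ are bounded and compactly supported; $\int\varphi^2 = \int\psi^2 = 1$; $\mu_k \equiv \int y^k\psi(y)\,dy = 0$ for $0 \le k \le r-1$; and $\mu_r = r!\,\kappa \ne 0$, where $\kappa = (r!)^{-1}\int y^r\psi(y)\,dy$. Let
$$\varphi_j(x) = p^{1/2}\varphi(px - j), \qquad \psi_{ij}(x) = p_i^{1/2}\psi(p_ix - j), \qquad x \in \mathbb{R},$$
for arbitrary $p > 0$, $-\infty < j < \infty$ and $p_i = p2^i$, $i \ge 0$. Then
$$\int\varphi_{j_1}\varphi_{j_2} = \delta_{j_1j_2}, \qquad \int\psi_{i_1j_1}\psi_{i_2j_2} = \delta_{i_1i_2}\delta_{j_1j_2}, \qquad \int\varphi_{j_1}\psi_{ij_2} = 0,$$
where $\delta_{ij}$ denotes the Kronecker delta, i.e. $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. For more on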
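To make the rescaled basis concrete, the following sketch uses the Haar pair as a stand-in for the general φ and ψ above (an assumption for illustration; Haar is the simplest case, with r = 1). It builds $\varphi_j$ and $\psi_{ij}$ for a non-dyadic p, checks the orthonormality relations numerically with a midpoint rule, and computes the moments: $\mu_0 = 0$ and $\mu_1 = -1/4 \ne 0$.

```python
import numpy as np

def haar_phi(x):
    # Haar father wavelet: indicator of [0, 1)
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def haar_psi(x):
    # Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1)
    return haar_phi(2 * x) - haar_phi(2 * x - 1)

def phi_j(x, p, j):
    # phi_j(x) = p^{1/2} phi(p x - j)
    return np.sqrt(p) * haar_phi(p * x - j)

def psi_ij(x, p, i, j):
    # psi_ij(x) = p_i^{1/2} psi(p_i x - j) with p_i = p * 2**i
    p_i = p * 2 ** i
    return np.sqrt(p_i) * haar_psi(p_i * x - j)

# Midpoint-rule grid; every function used below vanishes outside [-2, 6)
DX = 1e-4
X = np.arange(-2.0, 6.0, DX) + DX / 2

def inner(f, g):
    # Approximate L2 inner product on the grid
    return float(np.sum(f * g) * DX)

p = 1.5  # arbitrary resolution parameter, chosen only for illustration
checks = {
    "<phi_0, phi_0>": inner(phi_j(X, p, 0), phi_j(X, p, 0)),
    "<phi_0, phi_1>": inner(phi_j(X, p, 0), phi_j(X, p, 1)),
    "<psi_12, psi_12>": inner(psi_ij(X, p, 1, 2), psi_ij(X, p, 1, 2)),
    "<psi_00, psi_10>": inner(psi_ij(X, p, 0, 0), psi_ij(X, p, 1, 0)),
    "<phi_0, psi_21>": inner(phi_j(X, p, 0), psi_ij(X, p, 2, 1)),
}
for name, value in checks.items():
    print(name, round(value, 3))

# Moments of psi on its support [0, 1): mu_0 = 0, mu_1 = -1/4
Y = np.arange(0.0, 1.0, DX) + DX / 2
mu0 = float(np.sum(haar_psi(Y)) * DX)
mu1 = float(np.sum(Y * haar_psi(Y)) * DX)
print(mu0, mu1)
```

For Haar, $\kappa = \mu_1/1! = -1/4$; smoother compactly supported wavelets (for example Daubechies wavelets) yield larger r at the cost of wider support.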

Main results

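We assume that the smoothing parameters p, q and δ satisfy the following condition:
(SP): $p \to \infty$, $q \to \infty$, $p_q\delta^2 \to 0$, $p^{2r+1}\delta^2 \to \infty$, $\delta \ge C(n^{-1}\ln n)^{1/2}$, where $C > C_0 \equiv 2\{r(2r+1)^{-1}\sup f_1(1-G)^{-1}\}^{1/2}$.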
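The threshold δ can be illustrated with a small sketch. It takes δ at the boundary of the threshold condition, $\delta = C(n^{-1}\ln n)^{1/2}$ with C slightly larger than $C_0$, and assumes the keep-or-kill (hard) thresholding rule of Hall and Patil (1995) for the empirical detail coefficients; the estimator's definition is not shown in these snippets. The value `sup_ratio`, standing in for $\sup f_1(1-G)^{-1}$ (unknown in practice), and the coefficients are hypothetical.

```python
import numpy as np

def threshold_level(n, r, sup_ratio, factor=1.1):
    """delta = C * sqrt(ln(n) / n) with C = factor * C0 > C0, where
    C0 = 2 * sqrt(r / (2r + 1) * sup_ratio).  sup_ratio is a hypothetical
    stand-in for sup f_1 / (1 - G)."""
    if factor <= 1.0:
        raise ValueError("factor must exceed 1 so that C > C0")
    c0 = 2.0 * np.sqrt(r / (2 * r + 1) * sup_ratio)
    return factor * c0 * np.sqrt(np.log(n) / n)

def hard_threshold(coeffs, delta):
    # Keep-or-kill rule: keep b_ij when |b_ij| > delta, otherwise set it to zero
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > delta, coeffs, 0.0)

delta = threshold_level(n=1000, r=2, sup_ratio=1.5)
detail = [0.5, -0.05, 0.02, -0.3]  # made-up empirical detail coefficients
print(delta, hard_threshold(detail, delta))
```

Because δ shrinks like $(n^{-1}\ln n)^{1/2}$, more detail coefficients survive as n grows, which is how the estimator adapts locally to discontinuities.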

Theorem 3.1

In addition to the conditions on φ and ψ stated in Section 2, assume that the rth derivative $f^{(r)}$ is continuous on (−∞, ∞) and is bounded, monotone on (−∞, −u) for a sufficiently large positive u, and that the censoring distribution function G is continuous. Also assume that condition (SP) holds. Then
$$E\int(\hat f_1 - f_1)^2 - \Big\{n^{-1}p\int\frac{f_1}{1-G} + p^{-2r}\kappa^2(1-2^{-2r})^{-1}\int\big(f_1^{(r)}\big)^2\Big\} = o(n$$

Proofs

The proof of the above theorem follows the lines of Hall and Patil (1995), combined with the result of Stute (1995), which approximates the Kaplan–Meier integral $\int\phi\,d\hat F_n$ by an average of i.i.d. random variables with a sufficiently small error. This permits a more traditional and direct approach to density estimation with censored data than the martingale approach used, e.g., in Wu and Wells (1999). We begin with some lemmas.
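A Kaplan–Meier integral $\int\phi\,d\hat F_n$ is a weighted sum $\sum_i W_{in}\,\phi(Z_{(i)})$, where the weights are the jumps of the Kaplan–Meier estimator at the ordered observations. The sketch below uses the product form of these weights familiar from Stute's work,
$W_{in} = \frac{\delta_{[i:n]}}{n-i+1}\prod_{j=1}^{i-1}\big(\frac{n-j}{n-j+1}\big)^{\delta_{[j:n]}}$,
with $\delta_{[i:n]}$ the censoring indicator attached to $Z_{(i)}$. It assumes no ties among the Z values, and the helper names are hypothetical. With no censoring every weight equals 1/n, so the integral reduces to a sample mean.

```python
import numpy as np

def stute_km_weights(z, delta):
    """Kaplan-Meier weights W_in attached to the ordered observations Z_(i).
    Assumes no ties among the z values (continuous data)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(z)
    order = np.argsort(z)
    d = delta[order]                         # concomitants delta_[i:n]
    i = np.arange(1, n + 1)                  # ranks 1..n
    factors = ((n - i) / (n - i + 1.0)) ** d  # ((n-j)/(n-j+1))^{delta_[j:n]}
    # Product over j = 1..i-1: cumulative product shifted by one position
    prefix = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w = d / (n - i + 1.0) * prefix
    return z[order], w

def km_integral(phi, z, delta):
    """Kaplan-Meier integral of phi: sum_i W_in * phi(Z_(i))."""
    z_sorted, w = stute_km_weights(z, delta)
    return float(np.sum(w * phi(z_sorted)))

z_obs = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
d_obs = np.array([1, 0, 1, 0, 1])
z_sorted, w = stute_km_weights(z_obs, d_obs)
print(z_sorted, w)
print(km_integral(lambda t: t, z_obs, d_obs))
```

In this example censored points receive zero weight, and the weights sum to less than one because the largest observation is censored; the missing mass is the Kaplan–Meier survival probability beyond the last observation.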

Lemma 4.1

Let b̂j and b̂ij be defined as in Eqs.

Acknowledgements

The author expresses his deep gratitude to his advisor, Professor Hira L. Koul, for his constant advice, valuable suggestions and careful reading, which greatly improved the presentation of this paper. The author also appreciates the constructive suggestion from Professor Winfried Stute on Lemma 4.1 and is very grateful to a referee for pointing out errors and typos and for providing many insightful comments.

Cited by (21)

  • On the block thresholding wavelet estimators with censored data

    2008, Journal of Multivariate Analysis
    Citation Excerpt :

    They obtain the estimator’s asymptotic normality and asymptotic mean integrated squared error (MISE). Li [16] considers a nonlinear wavelet estimator of a single density function with randomly censored data and derives its mean integrated squared error. The objective of this paper is to propose block thresholding wavelet estimators with censored data for the density functions which belong to a large function class and investigate their asymptotic convergence rates.

  • On the minimax optimality of wavelet estimators with censored data

    2007, Journal of Statistical Planning and Inference
    Citation Excerpt :

    Antoniadis et al. (1999) provided a wavelet method for the estimation of density and hazard rate functions from randomly right-censored data. They obtained the estimator's asymptotic normality and best possible asymptotic mean integrated squared error (MISE) convergence rate for a fixed density function f. Li (2003) considered a non-linear wavelet estimator of density functions with randomly censored data and showed that its MISE, when the underlying curve is only piecewise smooth, has the same expansion as an analogous kernel estimator. However, that MISE expansion usually fails for the kernel estimators, if an additional smooth assumption is not imposed on the underlying density function.

Research partly supported by the NSF Grant DMS 0071619.