Non-linear wavelet-based density estimators under random censorship
Introduction
The mathematical theory of wavelets and its applications in statistics have made wavelet methods a well-known technique for non-parametric curve estimation; see, e.g., Meyer (1990), Daubechies (1992), Chui (1992), Mallat (1989), Donoho and Johnstone (1994), Donoho et al. (1995, 1996) and Kerkyacharian and Picard (1992, 1993). For a systematic discussion of wavelets and their applications, see the recent monograph by Härdle et al. (1998). The major advantage of the wavelet method is that it adapts locally both to erratic behavior and to the degree of smoothness of the unknown density. Wavelet estimators typically achieve the optimal convergence rates over exceptionally large function spaces. They handle discontinuities in the target function very well, and consequently they enjoy a very good convergence rate even if smoothness conditions are imposed only in a piecewise sense.
Hall and Patil (1995) first explicitly demonstrated that, in the case of no censorship, the discontinuities of densities have a negligible effect on the performance of the non-linear wavelet density estimators. The mean integrated squared error (MISE) of the kernel estimator of a density function f has the form $$\mathrm{MISE} \sim c_1 (nh)^{-1} + c_2 h^{2r},$$ where “∼” means that the ratio of the left- and right-hand sides converges to 1 as n→∞, n denotes the sample size, h is the bandwidth of the kernel estimator, r is the order of the kernel, and c1 and c2 are constants depending on both the kernel and the unknown density. The first term derives from the variance and the second from the squared bias. This expansion for kernel estimators generally fails if the underlying density function does not have r derivatives (Hall and Patil, 1995, p. 906). However, the MISE expansion of the non-linear wavelet estimators remains valid for density functions that are only piecewise smooth, and even has the same constants c1 and c2. Patil (1997) provided similar results for non-linear wavelet hazard rate estimators with complete data.
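As a worked illustration of the variance and squared-bias trade-off, the following short derivation assumes the standard two-term form c1(nh)^{-1} + c2 h^{2r} with fixed positive constants c1 and c2. The symbol h* for the minimizing bandwidth is introduced here only for illustration and is not notation from the paper.

```latex
% Assumption: M(h) is the leading two-term part of the kernel MISE, with c_1, c_2 > 0 fixed.
\begin{align*}
M(h) &= c_1 (nh)^{-1} + c_2 h^{2r}, \\
M'(h) &= -c_1 n^{-1} h^{-2} + 2r\, c_2 h^{2r-1} = 0
  \quad\Longrightarrow\quad
  h_*^{\,2r+1} = \frac{c_1}{2r\, c_2\, n}, \\
h_* &= \left(\frac{c_1}{2r\, c_2}\right)^{1/(2r+1)} n^{-1/(2r+1)}, \\
M(h_*) &= (2r+1)\, c_2\, h_*^{\,2r} \;\propto\; n^{-2r/(2r+1)} .
\end{align*}
```

The third line uses c1(n h*)^{-1} = 2r c2 h*^{2r}, which follows directly from the first-order condition, so at the optimum the variance term is 2r times the squared-bias term.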
In industrial life-testing, medical follow-up research and other studies, the observation of the occurrence of the failure event may be prevented by the previous occurrence of the censoring event. So only part of the observations are real failure times. Formally, let X1,X2,…,Xn be i.i.d. survival times with a common distribution function F and density function f. Also let Y1,Y2,…,Yn be i.i.d. censoring times with a common distribution function G. It is assumed that Xi is independent of Yi for every i. Rather than observing X1,X2,…,Xn, the variables of interest, in the randomly right-censored model, one observes Zi=min(Xi,Yi)=Xi∧Yi and δi=I(Xi⩽Yi), i=1,2,…,n, where I(A) denotes the indicator function of the set A.
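The observation scheme just described can be simulated in a few lines. This is a minimal sketch, assuming exponential survival and censoring times purely for illustration; the function name `simulate_right_censored` and its rate parameters are hypothetical and do not come from the paper.

```python
import random


def simulate_right_censored(n, failure_rate=1.0, censoring_rate=0.5, seed=0):
    """Return (z, delta) with z_i = min(x_i, y_i) and delta_i = I(x_i <= y_i)."""
    rng = random.Random(seed)
    z, delta = [], []
    for _ in range(n):
        # Assumption: X_i ~ F and Y_i ~ G are both exponential, drawn independently.
        x = rng.expovariate(failure_rate)    # survival time (not observed directly)
        y = rng.expovariate(censoring_rate)  # censoring time (not observed directly)
        z.append(min(x, y))                  # observed time Z_i
        delta.append(1 if x <= y else 0)     # 1 = actual failure, 0 = censored
    return z, delta


z, delta = simulate_right_censored(1000)
uncensored_fraction = sum(delta) / len(delta)
```

Under these assumed rates the probability that an observation is uncensored is 1.0/(1.0+0.5) = 2/3, so roughly a third of the simulated failure times are hidden behind censoring.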
Antoniadis et al. (1999) describe a wavelet method for the estimation of density and hazard rate functions from randomly right-censored data. The method is based on dividing the time axis into a dyadic number of intervals and then counting the number of events within each interval. The number of events and the survival function of the observations are then separately smoothed over time via linear wavelet smoothers. They establish asymptotic normality of the estimator and obtain the best possible asymptotic MISE convergence rate under the assumption that the survival time density function f is r-times continuously differentiable and the censoring density g is continuous.
The objective of this paper is to propose a non-linear wavelet estimator of a density function with censored data and to derive a result similar to the main result, Theorem 2.1, of Hall and Patil (1995). One consequence of this extension is that the MISE has the analogous expansion $$\mathrm{MISE} \sim k_1 n^{-1} p + k_2 p^{-2r},$$ where n denotes the sample size, p is the smoothing parameter depending on n (a wavelet analog of the inverse bandwidth h^{-1} for kernel estimators), and k1 and k2 are constants depending on the wavelet, the unknown density and the censoring distribution.
Wu and Wells (1999) studied hazard rate estimation by non-linear wavelet methods in the left truncation and right censoring model. They applied counting process techniques and obtained an analogous expansion. Their wavelet-based estimator of the hazard rate function is defined over a bounded interval [ι,τ], which is chosen such that the size of the risk population satisfies some additional conditions.
In this paper, we apply the method of Stute (1995), which approximates Kaplan–Meier integrals by an average of i.i.d. random variables plus a remainder of sufficiently small order. We provide an MISE expansion similar to that of Hall and Patil (1995) for the density function over (−∞,T], for any fixed T<τH, where τH is the least upper bound for the support of H, the distribution function of Z1.
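To make the notion of a Kaplan–Meier integral concrete, the sketch below computes the jumps of the standard product-limit (Kaplan–Meier) estimator at the ordered observations and uses them as weights in a sum. It is an illustration of the standard weights, not code or notation from the paper; the function names are hypothetical, and ties are broken by placing uncensored observations first, which is an assumed convention here.

```python
def kaplan_meier_weights(z, delta):
    """Return (z_sorted, weights): ordered observations and the Kaplan-Meier jump at each."""
    n = len(z)
    # Sort by observed time; at tied times, uncensored observations come first.
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    z_sorted = [z[i] for i in order]
    d_sorted = [delta[i] for i in order]
    weights = []
    survival = 1.0  # Kaplan-Meier survival just before the current ordered observation
    for i in range(1, n + 1):
        d = d_sorted[i - 1]
        # Jump of the estimator: zero at censored points, survival / (number at risk) otherwise.
        weights.append(survival * d / (n - i + 1))
        if d:
            survival *= (n - i) / (n - i + 1)
    return z_sorted, weights


def kaplan_meier_integral(func, z, delta):
    """Approximate the integral of func with respect to F by a Kaplan-Meier weighted sum."""
    z_sorted, weights = kaplan_meier_weights(z, delta)
    return sum(w * func(t) for t, w in zip(z_sorted, weights))
```

With no censoring every weight equals 1/n, so the weighted sum reduces to an ordinary sample mean. With censoring, the mass of each censored point is redistributed to the later observations, which is why estimated coefficients of the form "integral of a function with respect to F" remain usable under random censorship.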
In the next section, we give the elements of the wavelet transform and introduce the non-linear wavelet-based density estimators. The main results are described in Section 3, while their proofs appear in Sections 4 and 5.
Notations and estimators
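This section contains some facts about wavelets that will be used in the sequel. Let $\phi(x)$ and $\psi(x)$ be father and mother wavelets, having the properties: $\phi$ and $\psi$ are bounded and compactly supported; $\int\phi^2=\int\psi^2=1$, $\mu_k=0$ for $0\le k\le r-1$ and $\mu_r=r!\,\kappa\neq 0$, where $\mu_k=\int y^k\psi(y)\,dy$. Let $$\phi_j(x)=p^{1/2}\phi(px-j),\qquad \psi_{ij}(x)=p_i^{1/2}\psi(p_i x-j)$$ for arbitrary $p>0$, $-\infty<j<\infty$ and $i\ge 0$, where $p_i=p2^i$. Then $$\int\phi_{j_1}\phi_{j_2}=\delta_{j_1j_2},\qquad \int\psi_{i_1j_1}\psi_{i_2j_2}=\delta_{i_1i_2}\delta_{j_1j_2},\qquad \int\phi_{j_1}\psi_{ij_2}=0,$$ where $\delta_{ij}$ denotes the Kronecker delta, i.e. $\delta_{ij}=1$ if $i=j$ and $0$ otherwise. For more on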
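The orthonormality of rescaled and translated wavelets can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the Haar pair (which has only r = 1 vanishing moment) as a stand-in for a general compactly supported wavelet, and it assumes the usual rescaling φ_j(x) = p^{1/2} φ(px − j) and ψ_ij(x) = p_i^{1/2} ψ(p_i x − j) with p_i = p·2^i. All function names are hypothetical.

```python
import math


def haar_father(x):
    """Haar father wavelet: indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def phi(j, p):
    """Rescaled father wavelet phi_j(x) = sqrt(p) * phi(p*x - j)."""
    return lambda x: math.sqrt(p) * haar_father(p * x - j)


def psi(i, j, p):
    """Rescaled mother wavelet psi_ij(x) = sqrt(p_i) * psi(p_i*x - j), with p_i = p * 2**i."""
    p_i = p * 2 ** i
    return lambda x: math.sqrt(p_i) * haar_mother(p_i * x - j)


def inner_product(f, g, lower=-2.0, upper=2.0, cells=4096):
    """Midpoint-rule approximation of the integral of f*g over [lower, upper]."""
    h = (upper - lower) / cells
    return h * sum(f(lower + (k + 0.5) * h) * g(lower + (k + 0.5) * h)
                   for k in range(cells))


p = 2.0  # assumed resolution level, chosen dyadic so the grid aligns with the breakpoints
norm_phi = inner_product(phi(0, p), phi(0, p))            # unit norm
cross_phi = inner_product(phi(0, p), phi(1, p))           # different translations
cross_psi = inner_product(psi(0, 0, p), psi(1, 0, p))     # different resolution levels
cross_mixed = inner_product(phi(0, p), psi(1, 1, p))      # father against mother
```

Because the Haar functions are piecewise constant and the grid is dyadic, the midpoint rule is exact here up to floating-point rounding: the norm is 1 and the three cross terms vanish.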
Main results
We assume that the smoothing parameters and δ satisfy the following condition (SP): … where …
Theorem 3.1. In addition to the conditions on φ and ψ stated in Section 2, assume that the rth derivative f^(r) is continuous on (−∞,∞) and is bounded, monotone on (−∞,−u) for a sufficiently large positive u and the censoring distribution function G is continuous. Also assume condition (SP) holds. Then …
Proofs
The proof of the above theorem follows the lines of Hall and Patil (1995), combined with Stute (1995), which establishes an approximation of the Kaplan–Meier integral by an average of i.i.d. random variables with a sufficiently small error. This allows a more traditional and direct approach to the density estimation problem for censored data than the martingale approach used, e.g., in Wu and Wells (1999). We begin with some lemmas.
Lemma 4.1. Let and be defined as in Eqs.
Acknowledgements
The author expresses his deep gratitude to his advisor, Professor Hira L. Koul, for his constant advice, valuable suggestions and careful reading, which greatly improved the presentation of this paper. The author also appreciates the constructive suggestions from Professor Winfried Stute on Lemma 4.1 and is very grateful to one referee for pointing out errors and typos and providing many insightful comments.
References (18)
Kerkyacharian and Picard (1992). Density estimation in Besov space. Statist. Probab. Lett.
Kerkyacharian and Picard (1993). Density estimation by kernel and wavelet methods, optimality in Besov space. Statist. Probab. Lett.
Patil (1997). Nonparametric hazard rate estimation by orthogonal wavelet methods. J. Statist. Plann. Inference
Antoniadis et al. (1999). Density and hazard rate estimation for right-censored data by using wavelet methods. J. Roy. Statist. Soc. B
Chui (1992). Wavelets: A Tutorial in Theory and Applications
Daubechies (1992). Ten Lectures on Wavelets
Donoho and Johnstone (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika
Donoho et al. (1995). Wavelet shrinkage: asymptopia? J. Roy. Statist. Soc. Ser. B
Donoho et al. (1996). Density estimation by wavelet thresholding. Ann. Statist.
Cited by (21)
Nonparametric regression estimates with censored data based on block thresholding method
2013, Journal of Statistical Planning and Inference
Wavelet based estimation for the derivative of a density by block thresholding under random censorship
2012, Journal of the Korean Statistical Society
Wavelet estimation of conditional density with truncated, censored and dependent data
2011, Journal of Multivariate Analysis
A Berry-Esseen type bound in kernel density estimation for strong mixing censored samples
2009, Journal of Multivariate Analysis
On the block thresholding wavelet estimators with censored data
2008, Journal of Multivariate Analysis
Citation excerpt: They obtain the estimator’s asymptotic normality and asymptotic mean integrated squared error (MISE). Li [16] considers a nonlinear wavelet estimator of a single density function with randomly censored data and derives its mean integrated squared error. The objective of this paper is to propose block thresholding wavelet estimators with censored data for the density functions which belong to a large function class and investigate their asymptotic convergence rates.
On the minimax optimality of wavelet estimators with censored data
2007, Journal of Statistical Planning and Inference
Citation excerpt: Antoniadis et al. (1999) provided a wavelet method for the estimation of density and hazard rate functions from randomly right-censored data. They obtained the estimator's asymptotic normality and best possible asymptotic mean integrated squared error (MISE) convergence rate for a fixed density function f. Li (2003) considered a non-linear wavelet estimator of density functions with randomly censored data and showed that its MISE, when the underlying curve is only piecewise smooth, has the same expansion as an analogous kernel estimator. However, that MISE expansion usually fails for the kernel estimators, if an additional smooth assumption is not imposed on the underlying density function.
1 Research partly supported by NSF Grant DMS 0071619.