Kernel regression uniform rate estimation for censored data under $\alpha$-mixing condition

In this paper, we study the behavior of a kernel estimator of the regression function in the right censored model with $\alpha$-mixing data. The uniform strong consistency of the estimate over a real compact set is established, along with a rate of convergence. Some simulations are carried out to illustrate the behavior of the estimate in different examples for finite sample sizes.

The model involves pairs $(T_i, \delta_i)_{i=1,\dots,n}$ where only $T_i = Y_i \wedge C_i$ and $\delta_i = \mathbb{1}_{\{Y_i \le C_i\}}$ are observed. Let $X$ be an $\mathbb{R}^d$-valued random vector, let $(X_i)_{i \ge 1}$ be a sequence of copies of the random vector $X$ and denote by $X_{i,1}, \dots, X_{i,d}$ the coordinates of $X_i$. The study we perform below is then on the set of observations $(T_i, \delta_i, X_i)_{i \ge 1}$. In regression analysis one expects to identify, if any, the relationship between the $Y_i$'s and the $X_i$'s. This means looking for a function $m^*(X)$ describing this relationship that realizes the minimum of the mean squared error criterion. It is well known that this minimum is achieved by the regression function $m(x)$ defined on $\mathbb{R}^d$ by
$$ m(x) = \mathbb{E}(Y \mid X = x). $$
There is a wide range of literature on nonparametric estimation of the regression function and many nonlinear smoothers, including kernel, spline, local polynomial and orthogonal methods. For an overview of methods and results, from both the theoretical and application points of view and covering the independent and dependent cases, we refer the reader to Collomb [13], Silverman [45], Härdle [25], Wahba [48], Wand and Jones [47], Masry and Fan [37], Cai [8] and Cai and Ould-Saïd [9]. In the uncensored case, the behavior of nonparametric estimators built upon mixing sequences has been extensively studied. The consistency has been investigated by many authors. Without exhaustivity, we quote Robinson [39, 40], Collomb [13], Roussas [42] and Laïb [29]. Some other types of dependence structure have been considered: we refer to Yakowitz [49] for Markov chains, Delecroix [18] and Laïb and Ould-Saïd [30] for ergodic processes, Hall and Hart [24] for long-range memory processes, and Cai and Roussas [11] for associated random variables. Collomb and Härdle [15] obtained the uniform convergence with rates and some other asymptotic results for a family of kernel robust estimators under a $\varphi$-mixing condition, whereas Gonzalez-Manteiga et al.
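The optimality of $m$ under the mean squared error criterion follows from a classical decomposition, recalled here for completeness: for any measurable $g$ with $\mathbb{E}\,g(X)^2 < \infty$, conditioning on $X$ makes the cross term vanish, so

```latex
\mathbb{E}\big[(Y - g(X))^2\big]
  = \mathbb{E}\big[(Y - m(X))^2\big] + \mathbb{E}\big[(m(X) - g(X))^2\big]
  \;\ge\; \mathbb{E}\big[(Y - m(X))^2\big],
```

with equality if and only if $g(X) = m(X)$ almost surely.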
[22] developed a nonparametric test, based on kernel smoothers, to decide whether some covariates could be suppressed in a multidimensional nonparametric regression study. Under the $\alpha$-mixing condition, the uniform strong convergence of the Nadaraya-Watson estimator is treated in Doukhan [19], Bosq [4] and Liebscher [34]. Roussas [42] established the consistency with rate of the regression estimator under some summability requirement. Techniques used in nonparametric regression estimation are closely related to density estimation; in this case, kernel estimators have been extensively studied: see for example Roussas [41], Tran [44], Vieu [46] and Liebscher [33, 35]. Cai and Roussas [10] established the strong convergence of the kernel density estimator with rate in the multidimensional case, while Tae and Cox [43] established the same result with a slight difference in the rate. Andrews [1] provides a comprehensive set of results concerning uniform almost sure convergence. Masry [36] derived sharp rates for the same kind of convergence, but confined attention to the case of bounded regression. Our goal is to establish the strong uniform convergence with rate for the kernel regression estimate under the $\alpha$-mixing condition in random censorship models. For this kind of model, Cai [5, 7] established the asymptotic properties of the Kaplan-Meier estimator. The strong convergence of a hazard rate estimator was examined by Lecoutre and Ould-Saïd [31], while Liebscher [35] derived a uniform rate for the strong convergence of kernel density and hazard rate estimators. His result represents an improvement of that given in Cai [6]. The consistency results concerning the nonparametric estimates of the conditional survival function, introduced by Beran [2] and Dabrowska [16, 17] in the iid case, were extended by Lecoutre and Ould-Saïd [32] to the strong mixing case.
We point out that, for the independent case, the behavior of the regression function estimator under the censorship model has been extensively studied. We can quote Carbonez et al. [12], Köhler et al. [28] and Guessoum and Ould-Saïd [23]. However, few papers deal with the regression function under censoring in the dependent case. To this end, we are interested in extending the result of Guessoum and Ould-Saïd [23] from the iid to the dependent case. The paper is organized as follows: In Section 2 we give some definitions and notations for the regression function under the censorship model and for the strong mixing process. Section 3 is devoted to the assumptions and the main result. In Section 4, some simulations are drawn to lend further support to our theoretical results. The proofs, with auxiliary results, are relegated to Section 5.

Definition of estimators
Suppose that $\{Y_i, i \ge 1\}$ and $\{C_i, i \ge 1\}$ are two independent sequences of stationary random variables. We want to estimate $m(x) = \mathbb{E}(Y \mid X = x)$, which can be written as
$$ m(x) = \frac{r_1(x)}{\ell(x)}, \qquad r_1(x) = \int_{\mathbb{R}} y\, f_{\cdot,\cdot}(x, y)\, dy, \qquad (1) $$
with $f_{\cdot,\cdot}(x, y)$ being the joint density of $(X, Y)$ and $\ell(\cdot)$ the density function of the covariates. Now, it is well known that the kernel estimator of the regression function $m(\cdot)$ under the censorship model (see, e.g., Carbonez et al. [12]) is given by
$$ \widetilde{m}_n(x) = \sum_{i=1}^{n} W_{ni}(x)\, \frac{\delta_i T_i}{\bar{G}(T_i)}, \qquad (2) $$
where $\bar{G}$ is the survival function of the rv $C$ and
$$ W_{ni}(x) = \frac{K_d\!\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^{n} K_d\!\left(\frac{x - X_j}{h_n}\right)} $$
are the Nadaraya-Watson weights, $K_d$ is a probability density function (pdf) defined on $\mathbb{R}^d$ and $h_n$ a sequence of positive numbers converging to 0 as $n$ goes to infinity. Then (2) can be written
$$ \widetilde{m}_n(x) = \frac{\widetilde{r}_{1,n}(x)}{\ell_n(x)}, \qquad \widetilde{r}_{1,n}(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} \frac{\delta_i T_i}{\bar{G}(T_i)}\, K_d\!\left(\frac{x - X_i}{h_n}\right), \qquad \ell_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K_d\!\left(\frac{x - X_i}{h_n}\right). \qquad (3) $$
In practice, $G$ is usually unknown, so we replace it by the corresponding Kaplan-Meier [27] estimator (KME) $G_n$, defined through
$$ \bar{G}_n(t) = \prod_{i=1}^{n} \left( 1 - \frac{1 - \delta_{(i)}}{n - i + 1} \right)^{\mathbb{1}_{\{T_{(i)} \le t\}}}, $$
where $T_{(1)} \le \dots \le T_{(n)}$ are the order statistics of the $T_i$'s and $\delta_{(i)}$ is the indicator associated with $T_{(i)}$. The properties of the KME for dependent variables can be found in Cai [5, 7]. Then a feasible estimator of $m(x)$ is given by
$$ m_n(x) = \frac{r_{1,n}(x)}{\ell_n(x)}, \qquad r_{1,n}(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} \frac{\delta_i T_i}{\bar{G}_n(T_i)}\, K_d\!\left(\frac{x - X_i}{h_n}\right), $$
where $r_{1,n}(x)$ is an estimator of $r_1(x)$ and $\ell_n(x)$ (defined in (3)) an estimator of $\ell(x)$.
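As an illustration of these definitions, the feasible estimator $m_n$ can be sketched in a few lines. The following Python code is our own sketch (function names are ours, not from the paper): it implements the KME of $\bar{G}$ from the order statistics, then the ratio $r_{1,n}/\ell_n$ for $d = 1$ with a Gaussian kernel.

```python
import numpy as np

def km_censoring_survival(T, delta):
    """Kaplan-Meier estimator of the censoring survival function G_bar.
    For the censoring variable C, the 'event' indicator is 1 - delta."""
    order = np.argsort(T)
    t_sorted = T[order]
    d_sorted = delta[order]
    n = len(T)
    # factor (1 - (1 - delta_(i)) / (n - i + 1)) for the i-th order statistic
    factors = 1.0 - (1.0 - d_sorted) / (n - np.arange(1, n + 1) + 1.0)
    surv = np.cumprod(factors)

    def G_bar(t):
        # product over the indices i with T_(i) <= t (equals 1 before T_(1))
        idx = np.searchsorted(t_sorted, t, side="right")
        return surv[idx - 1] if idx > 0 else 1.0

    return G_bar

def m_n(x, X, T, delta, h):
    """Feasible censored regression estimate m_n(x) = r_{1,n}(x) / l_n(x),
    with a Gaussian kernel (d = 1)."""
    G_bar = km_censoring_survival(T, delta)
    g = np.array([max(G_bar(t), 1e-10) for t in T])  # guard the denominator
    w = np.exp(-0.5 * ((x - X) / h) ** 2)            # kernel weights
    r1 = np.sum(delta * T / g * w)
    ell = np.sum(w)
    return r1 / ell
```

Without censoring (all $\delta_i = 1$), $\bar{G}_n \equiv 1$ and `m_n` reduces to the classical Nadaraya-Watson estimator, which gives a quick sanity check of the code.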
In what follows, we define the endpoints of $F$ and $G$ by $\tau_F = \sup\{y,\ \bar{F}(y) > 0\}$ and $\tau_G = \sup\{y,\ \bar{G}(y) > 0\}$, and we assume that $\tau_F < \tau_G$ throughout the study. We point out that, since $Y$ can be a lifetime, we can suppose it bounded. We put $\|t\| = \sum_{j=1}^{d} |t_j|$ for $t \in \mathbb{R}^d$. In order to define the $\alpha$-mixing property, we introduce the following notations. Denote by $\mathcal{F}_i^k(Z)$ the $\sigma$-algebra generated by $\{Z_j,\ i \le j \le k\}$.
Let $\{Z_i,\ i = 1, 2, \dots\}$ denote a sequence of rv's. Given a positive integer $n$, set
$$ \alpha(n) = \sup\Big\{ |\mathbb{P}(A \cap B) - \mathbb{P}(A)\,\mathbb{P}(B)| :\ A \in \mathcal{F}_1^k(Z),\ B \in \mathcal{F}_{k+n}^{\infty}(Z),\ k \in \mathbb{N}^* \Big\}. $$
The sequence is said to be $\alpha$-mixing (strong mixing) if the mixing coefficient $\alpha(n) \to 0$ as $n \to \infty$. There exist many processes fulfilling the strong mixing property. We quote, here, the usual ARMA processes, which are geometrically strongly mixing, i.e., there exist $\rho \in (0, 1)$ and $a > 0$ such that, for any $n \ge 1$, $\alpha(n) \le a \rho^n$ (see, e.g., Jones [26]). The threshold models, the EXPAR models (see Ozaki [38]), the simple ARCH models (see Engle [20]), their GARCH extension (see Bollerslev [3]) and the bilinear Markovian models are geometrically strongly mixing under some general ergodicity conditions. We suppose that the sequences $\{Y_i, i \ge 1\}$ and $\{C_i, i \ge 1\}$ are $\alpha$-mixing with coefficients $\alpha_1(n)$ and $\alpha_2(n)$, respectively. Cai ([7], Lemma 2) showed that $\{T_i, i \ge 1\}$ is then strongly mixing, with coefficient bounded by a constant multiple of $\alpha_1(n) + \alpha_2(n)$. From now on, we suppose that $\{(T_i, \delta_i, X_i),\ i = 1, \dots, n\}$ is strongly mixing. Now we are in a position to give our assumptions and main result.
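A geometrically mixing process of the kind just mentioned is easy to simulate. The sketch below is our own illustration (the $\sqrt{1-\rho^2}$ scaling is our normalization): it generates a stationary AR(1) process, whose marginal law stays standard Gaussian.

```python
import numpy as np

def simulate_ar1(n, rho, rng):
    """Stationary AR(1): X_i = rho * X_{i-1} + sqrt(1 - rho^2) * eps_i,
    started from its stationary N(0, 1) law; geometrically strongly mixing."""
    x = np.empty(n)
    x[0] = rng.standard_normal()           # stationary start
    eps = rng.standard_normal(n)
    s = np.sqrt(1.0 - rho ** 2)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + s * eps[i]
    return x

rng = np.random.default_rng(0)
X = simulate_ar1(100_000, 0.9, rng)
# Stationarity check: sample mean close to 0, sample variance close to 1.
```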

Assumptions and main result
We will make use of the following assumptions, gathered here for easy reference:
A1) The bandwidth $h_n$ satisfies: $\lim_{n \to +\infty} n h_n^d = +\infty$ and $\lim_{n \to +\infty} h_n^{\mu} \log\log n = 0$, where $0 < \mu < d$.
A2) The kernel $K_d$ is bounded and satisfies conditions i)-iii), where iii) involves the Hölder exponent $\gamma$ used in the proof of Lemma 5.3.
A3) The mixing coefficient $\alpha$ is such that $\alpha(n) = O(n^{-\nu})$ for some $\nu > p$.
A4) The function $r_1(\cdot)$ defined in (1) is continuously differentiable.
Remark 3.1 Assumption A1 is very common in functional estimation, in both the independent and dependent cases. However, it must be reinforced by Assumptions A3 and A7, which allow the covariance terms to be computed and ensure the convergence of the series appearing in the proof of Lemma 5.3. Assumptions A2, A4, A5 and A6 are needed in the study of the bias term of $r_{1,n}(x)$, which is the kernel estimator of $r_1(x)$. We point out that we do not require $K_d$ to be symmetric, as in Guessoum and Ould-Saïd [23]. Assumption A8 is needed for the convergence of the kernel density estimator. Finally, the boundedness of $Y$ is assumed only to simplify the proof; it can be dropped by using truncation methods as in Laïb and Ould-Saïd [30].
In the sequel, the letter $C$ denotes a generic positive constant. Our main result is given in the following theorem, which concerns the rate of the almost sure uniform convergence of the regression function estimator.
Theorem 3.1 Under Assumptions A1-A8, we have, for $n$ large enough,
$$ \sup_{x \in \mathcal{C}} |m_n(x) - m(x)| = O\!\left( \max\left\{ h_n,\ \sqrt{\frac{\log n}{n h_n^d}} \right\} \right) \quad \text{a.s.} $$
This is the optimal rate obtained by Liebscher [34] in the uncensored case.
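Assuming the two contributions to the rate are the bias term $O(h_n)$ of Lemma 5.1 and the fluctuation term of Lemma 5.3, the bandwidth that balances them is obtained by solving

```latex
h_n \asymp \left( \frac{\log n}{n h_n^d} \right)^{1/2}
\;\Longleftrightarrow\;
h_n^{d+2} \asymp \frac{\log n}{n}
\;\Longleftrightarrow\;
h_n \asymp \left( \frac{\log n}{n} \right)^{1/(d+2)},
```

which for $d = 1$ gives $h_n \asymp (\log n / n)^{1/3}$, the choice used in the simulations of Section 4.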

Simulation study
First, we consider the strong mixing bidimensional process generated by
$$ X_i = \rho X_{i-1} + \sqrt{1 - \rho^2}\,\epsilon_i, \qquad Y_i = \rho X_i + \sqrt{1 - \rho^2}\,\epsilon_{i+1}, \qquad i \ge 1, \qquad (4) $$
where $0 < \rho < 1$, $(\epsilon_i)_i$ is a white noise with standard Gaussian distribution and $X_0$ is a standard Gaussian rv independent of $(\epsilon_i)_i$. We also simulate $n$ iid rv's $C_i$, exponentially distributed with parameter $\lambda = 1.5$. It is clear that the process $(X_n, Y_n, C_n)$ is stationary and strongly mixing; in fact the process $(X_n)$ is an AR(1) and, given $X_1 = x$, we have $Y_1 = \rho x + \sqrt{1 - \rho^2}\,\epsilon_2$, hence $Y_1 \hookrightarrow \mathcal{N}(\rho x, 1 - \rho^2)$. In all cases we took $\rho = 0.9$. We compute our estimator based on the observed data $(X_i, T_i, \delta_i)$, $i = 1, \dots, n$, choosing a Gaussian kernel $K$. In this case, we have $m(x) = \mathbb{E}(Y_1 \mid X_1 = x) = \rho x$. In all cases we took $h_n$ satisfying A1 and A7, that is $h_n = O\big((\log n / n)^{1/3}\big)$. We notice that the quality of fit increases with $n$ (see Figure 1).
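The mechanics of this simulation can be sketched as follows. The Python code below is our own sketch: for simplicity it uses the oracle estimator $\widetilde{m}_n$ with the true censoring survival $\bar{G}(t) = e^{-1.5 t}$ for $t > 0$ (and $\bar{G}(t) = 1$ for $t \le 0$) instead of its Kaplan-Meier estimate, and evaluates it at one point against the target $m(x) = \rho x$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, lam = 20_000, 0.9, 1.5
s = np.sqrt(1.0 - rho ** 2)

# Stationary AR(1) covariate and linear response with Gaussian white noise
X = np.empty(n)
X[0] = rng.standard_normal()
eps = rng.standard_normal(n)
for i in range(1, n):
    X[i] = rho * X[i - 1] + s * eps[i]
Y = rho * X + s * rng.standard_normal(n)

# Exponential(1.5) censoring: observe T = min(Y, C) and delta = 1{Y <= C}
C = rng.exponential(1.0 / lam, n)
T = np.minimum(Y, C)
delta = (Y <= C).astype(float)

# Oracle survival of C: exp(-lam * t) for t > 0, and 1 for t <= 0
G_bar = np.where(T > 0, np.exp(-lam * T), 1.0)

def m_tilde(x, h=0.1):
    """Censored Nadaraya-Watson estimate weighted by delta * T / G_bar(T)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(delta * T / G_bar * w) / np.sum(w)
```

The key identity behind the weighting is $\mathbb{E}[\delta T / \bar{G}(T) \mid X] = m(X)$, so at $x = 0.5$ the estimate should be close to $\rho x = 0.45$ for large $n$.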
We also consider two nonlinear cases, obtained by replacing the linear link in (4):
$$ Y_i = \sin\!\left(\frac{\pi}{2} X_i\right) + \sqrt{1 - \rho^2}\,\epsilon_{i+1}, \qquad (5) $$
$$ Y_i = \frac{5}{12}\left( \rho X_i + \sqrt{1 - \rho^2}\,\epsilon_{i+1} \right)^2 - 2. \qquad (6) $$
Then we have $m(x) = \sin(\frac{\pi}{2} x)$ for (5) and $m(x) = \frac{5}{12}\rho^2 x^2 + \frac{5}{12}(1 - \rho^2) - 2$ for (6). Figures 2 and 3 show that the quality of fit for the nonlinear models is as good as in the linear model.
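The regression function stated for the quadratic case can be verified directly: if $Y = \frac{5}{12}(\rho x + \sqrt{1-\rho^2}\,\epsilon)^2 - 2$ with $\epsilon \sim \mathcal{N}(0,1)$, then $\mathbb{E}[Y] = \frac{5}{12}\rho^2 x^2 + \frac{5}{12}(1-\rho^2) - 2$. A quick Monte Carlo check (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, x = 0.9, 1.0
eps = rng.standard_normal(1_000_000)

# Conditional response given X = x under the quadratic model
y = (5.0 / 12.0) * (rho * x + np.sqrt(1.0 - rho ** 2) * eps) ** 2 - 2.0

mc_mean = y.mean()
# E[(rho*x + s*eps)^2] = rho^2 x^2 + (1 - rho^2), hence:
theory = (5.0 / 12.0) * rho ** 2 * x ** 2 + (5.0 / 12.0) * (1.0 - rho ** 2) - 2.0
```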

Proofs
We split the proof of Theorem 3.1 into the following lemmata.
Lemma 5.1 Under Assumptions A1, A2 i) and A4, for $n$ large enough,
$$ \sup_{x \in \mathcal{C}} \big| \mathbb{E}\big(\widetilde{r}_{1,n}(x)\big) - r_1(x) \big| = O(h_n). $$
Proof of Lemma 5.1: Observe that, by the independence of $C$ and $(X, Y)$,
$$ \mathbb{E}\!\left[ \frac{\delta_1 T_1}{\bar{G}(T_1)} \,\Big|\, X_1 \right] = m(X_1). $$
Then we have from (3)
$$ \mathbb{E}\big(\widetilde{r}_{1,n}(x)\big) = \frac{1}{h_n^d}\, \mathbb{E}\!\left[ m(X_1)\, K_d\!\left( \frac{x - X_1}{h_n} \right) \right] = \int_{\mathbb{R}^d} K_d(t)\, r_1(x - h_n t)\, dt. $$
A Taylor expansion gives
$$ r_1(x - h_n t) = r_1(x) - h_n\, t^{\top} \nabla r_1(x'), $$
where $x'$ is between $x - h_n t$ and $x$. Then
$$ \sup_{x \in \mathcal{C}} \big| \mathbb{E}\big(\widetilde{r}_{1,n}(x)\big) - r_1(x) \big| \le C\, h_n \int_{\mathbb{R}^d} \|t\|\, K_d(t)\, dt. $$
Assumptions A1, A2 i) and A4 then give the result. Now, we introduce the following lemma (see Ferraty and Vieu [21], Proposition A.11 ii), p. 237).
Lemma 5.2 Let $\{U_i, i \in \mathbb{N}\}$ be a sequence of real, centered and bounded random variables, with strong mixing coefficient $\alpha(n) = O(n^{-\nu})$, $\nu > 1$. Then, for each $\varepsilon > 0$ and each $q > 1$,
$$ \mathbb{P}\!\left( \Big| \sum_{i=1}^{n} U_i \Big| > \varepsilon \right) \le C \left( 1 + \frac{\varepsilon^2}{q\, S_n^2} \right)^{-q/2} + C\, n\, q^{-1} \left( \frac{q}{\varepsilon} \right)^{\nu + 1}, \qquad S_n^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} |\operatorname{Cov}(U_i, U_j)|. $$
Lemma 5.3 Under the assumptions of Theorem 3.1,
$$ \sup_{x \in \mathcal{C}} \big| \widetilde{r}_{1,n}(x) - \mathbb{E}\big(\widetilde{r}_{1,n}(x)\big) \big| = O\!\left( \sqrt{\frac{\log n}{n h_n^d}} \right) \quad \text{a.s.} $$
Proof of Lemma 5.3: Since $\mathcal{C}$ is a compact set, it admits a covering $S$ by a finite number $\upsilon_n$ of balls $B(x_k^*, a_n)$, $k = 1, \dots, \upsilon_n$, where $a_n$ verifies $a_n^{d\gamma} = h_n^{d(\gamma + \frac{1}{2})} n^{-\frac{d}{2}}$ ($\gamma$ is the same as in Assumption A2 iii)). Since $\mathcal{C}$ is bounded, there exists a constant $M > 0$ such that $\upsilon_n \le M a_n^{-d}$. Now we set, for $x \in \mathcal{C}$,
$$ \Delta_i(x) = \frac{1}{n h_n^d} \left[ \frac{\delta_i T_i}{\bar{G}(T_i)}\, K_d\!\left( \frac{x - X_i}{h_n} \right) - \mathbb{E}\!\left( \frac{\delta_i T_i}{\bar{G}(T_i)}\, K_d\!\left( \frac{x - X_i}{h_n} \right) \right) \right]. $$
It is obvious that
$$ \sup_{x \in \mathcal{C}} \Big| \sum_{i=1}^{n} \Delta_i(x) \Big| \le \max_{1 \le k \le \upsilon_n} \sup_{x \in B(x_k^*, a_n)} \Big| \sum_{i=1}^{n} \big( \Delta_i(x) - \Delta_i(x_k^*) \big) \Big| + \max_{1 \le k \le \upsilon_n} \Big| \sum_{i=1}^{n} \Delta_i(x_k^*) \Big|. $$
On the other hand, let $U_i = n h_n^d \Delta_i(x_k^*)$. In order to apply Lemma 5.2, we have to calculate $S_n^2$. It is clear that
$$ S_n^2 = n \operatorname{Var}(U_1) + \sum_{i \ne j} |\operatorname{Cov}(U_i, U_j)| =: n \operatorname{Var}(U_1) + S_n^{2*}. $$
Then, to evaluate $S_n^{2*}$, the idea is to introduce a sequence of integers $w_n$, which we make precise below; we use (7) for close $i$ and $j$ and (8) otherwise. That is,
$$ S_n^{2*} \le C\, n\, h_n^{2d}\, w_n + C\, n^2 \alpha(w_n). $$
Assumption A3 and the right part of Assumption A7 allow us to control the term $n^2 \alpha(w_n)$, and we finally obtain a bound for $S_n^2$. Then, for $\varepsilon > 0$, applying Lemma 5.2 yields the exponential bound (9). Replacing $\varepsilon$ by $\varepsilon_0 \sqrt{n h_n^d \log n}$, for an arbitrary $\varepsilon_0 > 0$, in (9), we get the two terms $J_1$ and $J_2$ of (11). From the left part of Assumption A7, and for an appropriate choice of $\theta$, $J_2$ is the general term of a convergent series. In the same way, $J_1 \le n^{\varsigma - C \varepsilon_0^2}$, and we can choose $\varepsilon_0$ such that $J_1$ is the general term of a convergent series. Finally, applying the Borel-Cantelli lemma to (11) gives the result.
Remark 5.1 We point out that the parameter $\theta$ of Assumption A7 can be chosen so as to ensure the convergence of the series appearing in the proof of Lemma 5.3.
Furthermore, under A2 i) and A8 and using a Taylor expansion, we get $\sup_{x \in \mathcal{C}} |\mathbb{E}(\ell_n(x)) - \ell(x)| = O(h_n)$, which permits us to conclude. The kernel estimator $\ell_n(x)$ is almost surely bounded away from 0 because of Lemma 5.4 and the second part of Assumption A8. Then (13), in conjunction with Lemmas 5.1, 5.3, 5.4 and 5.5, concludes the proof.