A strong uniform convergence rate of a kernel conditional quantile estimator under random left-truncation and dependent data

In this paper we study some asymptotic properties of the kernel conditional quantile estimator for randomly left-truncated data that exhibit some kind of dependence. We extend the result obtained by Lemdani, Ould-Saïd and Poulin [16] in the iid case, and obtain the uniform strong convergence rate of the estimator under a strong mixing hypothesis.


Introduction
Let $Y$ and $T$ be two real random variables (rv) with unknown cumulative distribution functions (df) $F$ and $G$ respectively, both assumed to be continuous. Let $X$ be a real-valued random covariable with df $V$ and continuous density $v$. Under random left-truncation (RLT), the rv of interest $Y$ is interfered with by the truncation rv $T$, in such a way that $Y$ and $T$ are observed only if $Y \ge T$. Such data occur in astronomy and economics (see Woodroofe [31], Feigelson and Babu [7], Wang et al. [30], Tsai et al. [29]) and also in epidemiology and biometry (see, e.g., He and Yang [12]). If there were no truncation, we could think of the observations as $(X_j, Y_j, T_j)$, $1 \le j \le N$, where the sample size $N$ is deterministic but unknown. Under RLT, however, some of these vectors are missing and, for notational convenience, we denote by $(X_i, Y_i, T_i)$, $1 \le i \le n$ ($n \le N$), the observed subsequence of the $N$-sample subject to $Y_i \ge T_i$. As a consequence of truncation, the size $n$ of the actually observed sample is a binomial rv with parameters $N$ and $\mu := \mathbb{P}(Y \ge T) > 0$. By the strong law of large numbers we have, as $N \to \infty$,
$$\mu_n := \frac{n}{N} \to \mu, \quad \mathbb{P}\text{-a.s.}$$
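The sampling scheme can be illustrated by a small simulation. This is only a sketch under assumptions that are not in the paper: the pairs are drawn iid (no dependence), $Y$ and $T$ are exponential with rates 1 and 2, and the function name `simulate_rlt` is hypothetical. With these choices $\mu = 2/3$, and the ratio $n/N$ is computable only because the simulation knows $N$.

```python
import numpy as np

def simulate_rlt(N, rng):
    """Draw an N-sample of (Y, T) and keep only the pairs with Y >= T."""
    y = rng.exponential(scale=1.0, size=N)   # variable of interest (assumed law)
    t = rng.exponential(scale=0.5, size=N)   # truncation variable (assumed law)
    keep = y >= t                            # a pair is observed only if Y >= T
    return y[keep], t[keep]

rng = np.random.default_rng(0)
N = 200_000                      # unknown in practice, known in a simulation
y_obs, t_obs = simulate_rlt(N, rng)
n = y_obs.size                   # random size of the observed sample
mu_hat = n / N                   # ratio n/N, which approaches mu as N grows
mu_true = 2.0 / 3.0              # P(Y >= T) for exponential rates 1 and 2
print(n, mu_hat, mu_true)
```

In real truncated data $N$ is not available, which is why the paper later relies on an estimator of $\mu$ built from the observed pairs only.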
Now we consider the joint df $F(\cdot,\cdot)$ of the random vector $(X, Y)$ related to the $N$-sample and suppose it is of class $C^1$. The conditional df of $Y$ given $X = x =: (x_1, \dots, x_d)^t$, that is $F(y|x) = \mathbb{E}\left[1_{\{Y \le y\}} \mid X = x\right]$, may be rewritten as
$$F(y|x) = \frac{F_1(x, y)}{v(x)},$$
where $F_1(x, \cdot)$ is the first derivative of $F(x, \cdot)$ with respect to $x$. For all fixed $p \in (0, 1)$, the $p$th conditional quantile of $F$ given $X = x$ is defined by
$$q_p(x) := \inf\{y \in \mathbb{R} : F(y|x) \ge p\}.$$
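The definition of $q_p(x)$ as a generalized inverse can be made concrete on a grid. The sketch below is illustrative only: the helper `generalized_inverse` and the toy model ($Y$ given $X = x$ exponential with mean $1 + x$) are assumptions introduced here, and the grid is assumed to be sorted with nondecreasing df values.

```python
import numpy as np

def generalized_inverse(y_grid, cdf_values, p):
    """Return inf{y in y_grid : cdf(y) >= p}, or nan if level p is never reached.

    Assumes y_grid is sorted increasingly and cdf_values is nondecreasing.
    """
    y_grid = np.asarray(y_grid, dtype=float)
    cdf_values = np.asarray(cdf_values, dtype=float)
    hits = np.nonzero(cdf_values >= p)[0]
    return y_grid[hits[0]] if hits.size else float("nan")

# Toy conditional law (assumption): Y | X = x is exponential with mean 1 + x.
x = 0.5
y_grid = np.linspace(0.0, 20.0, 200_001)
cdf = 1.0 - np.exp(-y_grid / (1.0 + x))
q_median = generalized_inverse(y_grid, cdf, 0.5)
exact = (1.0 + x) * np.log(2.0)   # closed-form conditional median of the toy law
print(q_median, exact)
```

The grid value differs from the closed-form median by at most one grid step, which is the price of approximating the infimum on a finite grid.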
It is well known that the quantile function gives a good description of the data (see Chaudhuri et al. [5]) and is robust to heavy-tailed error distributions and outliers; in particular, the conditional median function $q_{1/2}(x)$, for asymmetric distributions, can provide a useful alternative to ordinary regression based on the mean. The nonparametric estimation of conditional quantiles was first considered in the case of complete data (no truncation). Roussas [24] showed the convergence and asymptotic normality of kernel estimates of the conditional quantile under Markov assumptions. For independent and identically distributed (iid) rv's, Stone [27] proved the weak consistency of kernel estimates. The uniform consistency was studied by Schlee [26] and Gannoun [9]. The asymptotic normality was established by Samanta [25]. Mehra et al. [20] proposed and discussed certain smooth variants (based on single as well as double kernel weights) of the standard conditional quantile estimator, proved their asymptotic normality and found an almost sure (a.s.) convergence rate, whereas Xiang [32] gave the asymptotic normality and a law of the iterated logarithm for a new kernel estimator. In the dependent case, the convergence of nonparametric quantile estimators was proved by Gannoun [10] and Boente and Fraiman [1]. In the RLT model, Gürler et al. [11] gave a Bahadur-type representation for the quantile function and asymptotic normality. Its extension to time series analysis was obtained by Lemdani et al. [15]. The aim of this paper is to establish a strong uniform convergence rate for the kernel conditional quantile estimator with randomly left-truncated data under α-mixing conditions, whose definition is given below. We thereby extend the result obtained by Lemdani et al. [16] in the iid case. First, let $\mathcal{F}_i^k(Z)$ denote the σ-field of events generated by $\{Z_j,\ i \le j \le k\}$.
imsart-generic ver. 2008/01/24 file: ejs_2008_306.tex date: October 7, 2008
For easy reference, let us recall the following definition.
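Given a positive integer $n$, set
$$\alpha(n) := \sup\left\{ |\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| : A \in \mathcal{F}_1^k(Z),\ B \in \mathcal{F}_{k+n}^{\infty}(Z),\ k \in \mathbb{N}^* \right\}.$$
The sequence $(Z_j)_{j \ge 1}$ is said to be α-mixing (strongly mixing) if the mixing coefficient $\alpha(n) \to 0$ as $n \to \infty$.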
Among the various mixing conditions used in the literature, α-mixing is reasonably weak and has many practical applications (see, e.g., Doukhan [6] or Cai [3,4] for more details). In particular, Masry and Tjøstheim [18] proved that both ARCH processes and nonlinear additive AR models with exogenous variables, which are particularly popular in finance and econometrics, are stationary and α-mixing.
The rest of the paper is organized as follows. In Section 2, we recall the definition of the kernel conditional quantile estimator with randomly left-truncated data. Assumptions and main results are given in Section 3. Section 4 is devoted to an application to prediction. Finally, the proofs of the main results are postponed to Section 5, together with some auxiliary results and their proofs.

Definition of the estimator
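In the sequel, the letters $C$ and $C'$ are used indiscriminately as generic constants. Note also that, since $N$ is unknown and $n$ is known (although random), our results will not be stated with respect to the probability measure $\mathbb{P}$ (related to the $N$-sample) but will involve the conditional probability $\mathbf{P}$ (related to the $n$-sample). Also $\mathbb{E}$ and $\mathbf{E}$ will denote the expectation operators related to $\mathbb{P}$ and $\mathbf{P}$, respectively. Finally, we denote by a superscript $(*)$ any df that is associated with the observed sample.
The estimation of the conditional df is based on the choice of weights. For complete data, the well-known Nadaraya-Watson weights are given by
$$W_{i,N}(x) = \frac{K\left(\frac{x - X_i}{h_N}\right)}{\sum_{j=1}^{N} K\left(\frac{x - X_j}{h_N}\right)},$$
which are measurable functions of $x$ depending on $X_1, \dots, X_N$, with the convention $0/0 = 0$. The kernel $K$ is a measurable function on $\mathbb{R}^d$ and $(h_N)$ is a nonnegative sequence which tends to zero as $N$ tends to infinity. The regression estimator based on the $N$-sample is then given by
$$r_N(x) = \sum_{i=1}^{N} W_{i,N}(x)\, Y_i = \frac{(N h_N^d)^{-1} \sum_{i=1}^{N} Y_i K\left(\frac{x - X_i}{h_N}\right)}{v_N(x)},$$
where $v_N$ is a well-known kernel estimator of $v$ based on the $N$-sample. As $N$ is unknown, $v_N(\cdot)$ cannot be calculated, and therefore neither can $r_N(\cdot)$. On the other hand, based on the $n$-sample, the kernel estimator
$$v_n^*(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h_n}\right) \qquad (5)$$
is an estimator of the conditional density $v^*(x)$ (given $Y \ge T$); see Ould-Saïd and Lemdani [21]. Under the RLT sampling scheme, the conditional joint distribution (Stute [28]) of $(Y, T)$ becomes
$$\mathbf{P}(Y \le y, T \le t) = \mathbb{P}(Y \le y, T \le t \mid Y \ge T) = \frac{1}{\mu} \int_{-\infty}^{y} G(t \wedge u)\, dF(u),$$
where $t \wedge u := \min(t, u)$. The marginal distributions and their empirical versions are defined by
$$F^*(y) = \frac{1}{\mu} \int_{-\infty}^{y} G(u)\, dF(u), \qquad F_n^*(y) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{Y_i \le y\}},$$
$$G^*(t) = \frac{1}{\mu} \int_{-\infty}^{\infty} G(t \wedge u)\, dF(u), \qquad G_n^*(t) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{T_i \le t\}},$$
where $1_A$ denotes the indicator function of the set $A$.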
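For complete data, the Nadaraya-Watson weights and the convention $0/0 = 0$ can be sketched as follows. The Epanechnikov kernel and the function names are illustrative choices made here, not taken from the paper, and the covariable is univariate.

```python
import numpy as np

def epanechnikov(u):
    """A bounded, compactly supported probability density (illustrative kernel)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nw_weights(x, X, h, kernel=epanechnikov):
    """Nadaraya-Watson weights at the point x for a univariate covariable."""
    k = kernel((x - np.asarray(X, dtype=float)) / h)
    total = k.sum()
    if total == 0.0:
        return np.zeros_like(k)   # convention 0/0 = 0: no observation near x
    return k / total

def nw_regression(x, X, Y, h):
    """Weighted average of the responses with the weights above."""
    return float(np.dot(nw_weights(x, X, h), Y))

w = nw_weights(0.1, [0.0, 0.1, 0.2], h=0.5)
print(w, w.sum())
print(nw_regression(0.1, [0.0, 0.2], [1.0, 3.0], h=0.5))
```

The weights sum to one whenever at least one observation falls in the kernel window around $x$, and are all zero otherwise.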
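In the sequel we use the following consistent estimator of $\mu$:
$$\mu_n = \frac{G_n(y)\,[1 - F_n(y-)]}{C_n(y)},$$
for any $y$ such that $C_n(y) \ne 0$, where $F_n(y-)$ denotes the left-limit of $F_n$ at $y$ and $C_n(y) = n^{-1} \sum_{i=1}^{n} 1_{\{T_i \le y \le Y_i\}}$. Here $F_n$ and $G_n$ are the product-limit estimators (Lynden-Bell [17]) of $F$ and $G$, respectively, i.e.,
$$F_n(y) = 1 - \prod_{i : Y_i \le y} \left[ \frac{n C_n(Y_i) - 1}{n C_n(Y_i)} \right] \quad \text{and} \quad G_n(y) = \prod_{i : T_i > y} \left[ \frac{n C_n(T_i) - 1}{n C_n(T_i)} \right].$$
He and Yang [13] proved that $\mu_n$ does not depend on $y$, so that its value can be obtained for any $y$ such that $C_n(y) \ne 0$. Furthermore, they showed (see their Corollary 2.5) its $\mathbb{P}$-a.s. consistency.
Suppose now that one observes the $n$ triplets $(X_i, Y_i, T_i)$ among the $N$ ones and, for any df $L$, denote the left and right endpoints of its support by $a_L := \inf\{x : L(x) > 0\}$ and $b_L := \sup\{x : L(x) < 1\}$, respectively. Then under the current model, as discussed by Woodroofe [31], $F$ and $G$ can be estimated completely only if
$$a_G \le a_F, \quad b_G \le b_F \quad \text{and} \quad \int_{a_F}^{\infty} \frac{dF}{G} < \infty.$$
In order to estimate the marginal density $v$ we have to take the truncation into account, and the estimator
$$v_n(x) = \frac{\mu_n}{n h_n^d} \sum_{i=1}^{n} \frac{1}{G_n(Y_i)} K\left(\frac{x - X_i}{h_n}\right) \qquad (7)$$
is considered in Ould-Saïd and Lemdani [21]. Note that in this formula and the forthcoming ones, the sum is taken only over those $i$ such that $G_n(Y_i) \ne 0$. Then, adapting Ould-Saïd and Lemdani's weights, we get the following estimator of the conditional df of $Y$ given $X = x$:
$$F_n(y|x) = \frac{\sum_{i=1}^{n} G_n^{-1}(Y_i)\, K\left(\frac{x - X_i}{h_n}\right) H\left(\frac{y - Y_i}{h_n}\right)}{\sum_{i=1}^{n} G_n^{-1}(Y_i)\, K\left(\frac{x - X_i}{h_n}\right)} = \frac{F_{1,n}(x, y)}{v_n(x)}, \qquad (8)$$
where $H$ is a distribution function defined on $\mathbb{R}$, and
$$F_{1,n}(x, y) = \frac{\mu_n}{n h_n^d} \sum_{i=1}^{n} \frac{1}{G_n(Y_i)} K\left(\frac{x - X_i}{h_n}\right) H\left(\frac{y - Y_i}{h_n}\right) \qquad (9)$$
is an estimator of $F_1(x, y)$. As the latter is continuous, it is clearly better to define a smooth estimator by using a continuous function $H(\cdot)$ instead of a step function $1_{\{\cdot\}}$. We point out here that the estimators (8) and (9) have already been defined in Lemdani et al. [16]. Then a natural estimator of the $p$th conditional quantile $q_p(x)$ is given by
$$q_{p,n}(x) := \inf\{y \in \mathbb{R} : F_n(y|x) \ge p\},$$
which satisfies $F_n(q_{p,n}(x)|x) = p$.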
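The construction of this section can be sketched numerically for $d = 1$. Everything below is an illustration under assumptions that are not in the paper: the function names are hypothetical, the Epanechnikov kernel and its integral play the roles of $K$ and $H$, the same bandwidth is used for both, the data are simulated iid with $Y = 1 + X + E$ ($E$ standard exponential) and $T$ uniform on $(0, 3)$, and the infimum is approximated on a grid that is assumed to reach the level $p$.

```python
import numpy as np

def epanechnikov(u):
    """Kernel K: a bounded, compactly supported probability density."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def epanechnikov_cdf(u):
    """Smooth df H, here the integral of the Epanechnikov kernel."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def risk_count(s, Y, T):
    """n * C_n(s) = #{i : T_i <= s <= Y_i}, evaluated at each s."""
    s = np.atleast_1d(s)[:, None]
    return np.sum((T[None, :] <= s) & (s <= Y[None, :]), axis=1)

def lynden_bell_G(s, Y, T):
    """G_n(s) = product over {i : T_i > s} of (1 - 1/(n C_n(T_i)))."""
    factors = 1.0 - 1.0 / risk_count(T, Y, T)
    return np.array([np.prod(factors[T > si]) for si in np.atleast_1d(s)])

def survival_left(s, Y, T):
    """1 - F_n(s-) = product over {i : Y_i < s} of (1 - 1/(n C_n(Y_i)))."""
    factors = 1.0 - 1.0 / risk_count(Y, Y, T)
    return np.array([np.prod(factors[Y < si]) for si in np.atleast_1d(s)])

def mu_n(y, Y, T):
    """Estimator of mu at a point y; requires C_n(y) != 0."""
    c = risk_count(y, Y, T)[0] / Y.size
    return lynden_bell_G(y, Y, T)[0] * survival_left(y, Y, T)[0] / c

def conditional_df(y_grid, x, X, Y, T, h):
    """F_n(y|x) on y_grid; mu_n cancels in the ratio F_{1,n} / v_n."""
    G = lynden_bell_G(Y, Y, T)
    ok = G > 0                                   # sum only over G_n(Y_i) != 0
    w = epanechnikov((x - X[ok]) / h) / G[ok]
    smooth = epanechnikov_cdf((y_grid[:, None] - Y[ok][None, :]) / h)
    return smooth @ w / w.sum()

def conditional_quantile(p, y_grid, x, X, Y, T, h):
    """Grid approximation of inf{y : F_n(y|x) >= p}; assumes level p is reached."""
    F = conditional_df(y_grid, x, X, Y, T, h)
    return y_grid[np.argmax(F >= p)]

# Simulated truncated sample (all distributional choices are assumptions).
rng = np.random.default_rng(1)
N = 2000
X_all = rng.uniform(0.0, 1.0, N)
Y_all = 1.0 + X_all + rng.exponential(1.0, N)
T_all = rng.uniform(0.0, 3.0, N)
keep = Y_all >= T_all
X, Y, T = X_all[keep], Y_all[keep], T_all[keep]

y_grid = np.linspace(0.0, 20.0, 2001)
F_hat = conditional_df(y_grid, 0.5, X, Y, T, h=0.2)
q_med = conditional_quantile(0.5, y_grid, 0.5, X, Y, T, h=0.2)
print(q_med, 1.5 + np.log(2.0))                  # estimate vs. true conditional median
print(mu_n(2.0, Y, T), mu_n(3.0, Y, T), keep.mean())
```

In this toy model the true conditional median at $x = 0.5$ is $1.5 + \log 2$. Evaluating `mu_n` at two different points gives the same value up to rounding, in line with the result of He and Yang [13] quoted above.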

Assumptions and main results
In what follows, we focus our attention on the case of a univariate covariable ($d = 1$) and write $X$ for the covariable and $K$ for $K_1$. Assume that $0 = a_G < a_F$ and $b_G \le b_F$. We consider two real numbers $a$ and $b$ such that $a_F < a < b < b_F$.
Let $\Omega$ be a compact subset of $\Omega_0 = \{x \in \mathbb{R} : v(x) > 0\}$ and $\gamma := \inf_{x \in \Omega} v(x) > 0$. We introduce some assumptions, gathered below for easy reference, that are needed to state our results.
(K1) $K$ is a positive-valued, bounded probability density, Hölder continuous with exponent $\beta > 0$ and satisfying [...]
(K2) [...] which is positive, bounded and has compact support. It is also Hölderian with exponent $\beta$.
(K3) i) $H^{(1)}$ and $K$ are second-order kernels, [...]
[...] for some constant $C$ not depending on $(i, j)$.
(D3) The joint conditional density of $(X_i, Y_i, X_j, Y_j)$, denoted by $f^*(\cdot, \cdot, \cdot, \cdot)$, exists and satisfies, for some constant $C$, [...]
[...] The joint density $f(\cdot, \cdot)$ is bounded and twice continuously differentiable.
(D5) The marginal density $v(\cdot)$ is locally Lipschitz continuous over $\Omega_0$.
The bandwidth $h_n =: h$ satisfies: [...]
Our first result, stated in Proposition 3.1, is the uniform almost sure convergence, with rate, of the conditional df estimator defined in (8). The second result deals with the strong uniform convergence, with rate, of the kernel conditional quantile estimator $q_{p,n}(\cdot)$, and is given in the following theorem.

Applications to prediction
It is well known from robustness theory that the median is more robust than the mean; therefore the conditional median, $\mu(x) = q_{1/2}(x)$, is a good alternative to the conditional mean as a predictor of a variable $Y$ given $X = x$. Note that the estimator of $\mu(x)$ is given by $\mu_n(x) = q_{1/2,n}(x)$. Using these considerations and Section 2, we want to predict the unobserved rv $Y_{n+1}$ (which corresponds to some modality of our problem) from the available data $X_1, \dots, X_n$. Given a new value $X_{n+1}$, we can predict the corresponding response $Y_{n+1}$ by $\hat{Y}_{n+1} = \mu_n(X_{n+1}) = q_{1/2,n}(X_{n+1})$.
Applying the above theorem, we have the following corollary: [...] a.s. as $n \to \infty$.

Proofs
We need some auxiliary results and notation to prove our results. The first lemma gives the uniform convergence, with rate, of the estimator $v_n^*(x)$ defined in (5).
Proof. We have [...] We begin by studying the variability term $I_{1n}$. The idea consists in using an exponential inequality that takes into account the α-mixing structure. The compact set $\Omega$ can be covered by a finite number $l_n$ of intervals of length $\omega_n = (n^{-1} h^{1+2\beta})^{1/(2\beta)}$, where $\beta$ is the Hölder exponent. Let $I_k := I(x_k, \omega_n)$, $k = 1, \dots, l_n$, denote these intervals, centered at points $x_k$. Since $\Omega$ is bounded, there exists a constant $C$ such that $\omega_n l_n \le C$. For any $x$ in $\Omega$, there exists an interval $I_k$ containing $x$, so that $|x - x_k| \le \omega_n$. We start by writing [...] Firstly, we have under assumption (K1), [...] Hence, by (H1) and for $n$ large enough, we get $S_{1n} = o_{\mathbf{P}}(1)$. We now turn to the term $S_{2n}$ in (13). Under (K1), the rv's $U_i = nh\Delta_i(x_k)$ are centered and bounded. The use of the well-known Fuk-Nagaev inequality (see Rio [23, formula 6.19b, page 87]), slightly modified in Ferraty and Vieu [8, Proposition A.11-ii), page 237], allows one to get, for all $\varepsilon > 0$ and $r > 1$, [...] where [...] Put
$$r = (\log n)^{1+\delta}, \text{ where } \delta > 0, \quad \text{and} \quad \varepsilon = \varepsilon_0 \sqrt{\frac{\log n}{nh}}, \text{ for some } \varepsilon_0 > 0. \qquad (16)$$
We have [...] Note that under (M3), it is easy to see that the following modified version (H′2) of (H2) holds, [...] where $\eta$ satisfies [...] and $\nu$ and $\beta$ are the same as in (M3).
Then, from the left-hand side of (H′2) [...] Hence, for any $\eta$ as in (17), $Q_{1n}$ is bounded by the general term of a convergent series. Before we focus on $Q_{2n}$, we have to study the asymptotic behavior of $s_n^2$ [...] $=: s_n^{var} + s_n^{cov}$. First, by (K3 ii), (D1) and a change of variable, we obtain [...] On the other hand, a change of variable, (K1), (M1) and (D2) lead to [...] Note also that these covariances can be controlled by means of the usual Davydov covariance inequality for mixing processes (see Rio [23]) [...] To evaluate $s_n^{cov}$, we use the technique developed by Masry [18]. Taking $\varphi_n = \lceil (n^{-1} h_n)^{-1/\nu} \rceil$ (where $\lceil \cdot \rceil$ denotes the smallest integer greater than the argument), we can write [...] First, applying the upper bound (19) to the first covariance term in (21), we get [...] For the second term, thanks to (20) we get [...] According to the right-hand side of (H′2), using (M3), (22) and (23), we get [...] Finally, (18) and (24) lead directly to $s_n^2 = O(nh)$. This is enough to study the quantity $Q_{2n}$, since for $\varepsilon$ and $r$ as in (16), a Taylor expansion of $\log(1 + x)$ allows us to write [...] By using (H′2) and (M3), the latter can be made the general term of a convergent series. Hence $\sum_{n \ge 1} (Q_{1n} + Q_{2n}) < \infty$, and therefore, by the Borel-Cantelli lemma, we have [...]
On the other hand, the bias term $I_{2n}$ does not depend on the mixing structure. We prove its convergence by using a change of variable and a Taylor expansion (see Lemma 6.1 in Lemdani et al. [16]). We get, under (K3) and (D1), [...] Hence, replacing $I_{1n}$ and $I_{2n}$ in (12), we get the result.
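The length $\omega_n$ of the covering intervals used in the proof above can be motivated by a short computation. This is only a sketch: it assumes that the discretization term compares $v_n^*(x)$ with $v_n^*(x_k)$ for $|x - x_k| \le \omega_n$, and it writes $C_K$ (a name introduced here) for the Hölder constant of $K$ in (K1).

```latex
\begin{align*}
|v_n^*(x) - v_n^*(x_k)|
  &\le \frac{1}{nh} \sum_{i=1}^{n}
       \left| K\!\left(\frac{x - X_i}{h}\right) - K\!\left(\frac{x_k - X_i}{h}\right) \right|
   \le \frac{C_K}{h} \left(\frac{\omega_n}{h}\right)^{\beta} \\
  &= C_K \, \frac{\left(n^{-1} h^{1+2\beta}\right)^{1/2}}{h^{1+\beta}}
   = \frac{C_K}{\sqrt{nh}} .
\end{align*}
```

With this choice the discretization error is of order $(nh)^{-1/2}$, hence negligible with respect to $\sqrt{\log n/(nh)}$, while the number of intervals satisfies $l_n \le C \omega_n^{-1}$.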
The following lemma is Lemma 4.2 in Ould-Saïd and Tatachak [22], which states a rate of convergence for $\mu_n$ under the α-mixing hypothesis; it is interesting in itself and similar to the rate established in the iid case by He and Yang [13].
Proof. See Lemma 4.2 in Ould-Saïd and Tatachak [22].
Adapting (9), define
$$\tilde{F}_{1,n}(x, y) = \frac{\mu}{nh} \sum_{i=1}^{n} \frac{1}{G(Y_i)} K\left(\frac{x - X_i}{h}\right) H\left(\frac{y - Y_i}{h}\right).$$
Proof. The proof is analogous to that of Lemma 5.1. We give only the main lines. As $\Omega$ and $[a, b]$ are compact sets, they can be covered by finite numbers $l_n$ and $d_n$ of intervals $I_1, \dots, I_{l_n}$ and $J_1, \dots, J_{d_n}$, of lengths $\omega_n$ (as in Lemma 5.1) and $\lambda_n = (n^{-1} h^{2\beta})^{1/(2\beta)}$ and with centers $x_1, \dots, x_{l_n}$ and $y_1, \dots, y_{d_n}$, respectively. Since $\Omega$ and $[a, b]$ are bounded, there exist two constants $C_1$ and $C_2$ such that $l_n \omega_n \le C_1$ and $d_n \lambda_n \le C_2$. Hence for any $(x, y) \in \Omega \times [a, b]$, there exist $x_k$ and $y_j$ such that $|x - x_k| \le \omega_n$ and $|y - y_j| \le \lambda_n$. Thus we have the following decomposition [...] Firstly, concerning $J_{1n}$ and $J_{5n}$, assumptions (K1) and (K2) yield [...] Hence, by (H1) we get [...] Similarly, we obtain for $J_{2n}$ and $J_{4n}$ [...] Again, by (H1) we get [...] As to $J_{3n}$, for all $\varepsilon > 0$ we have [...] Set, for any $i \ge 1$, [...] Under (K1) and (K2), the rv's $V_i := nh\Psi_i(x_k, y_j)$ are centered and bounded by $2\mu M_0 M_1 / G(a_F) =: C < \infty$. Then, applying again the Fuk-Nagaev inequality, we obtain that, for all $\varepsilon > 0$ and $r > 1$, [...] where [...] By taking $\varepsilon$ and $r$ as in (16), we get [...] Then, using (H1) and (H2) we get [...] Hence, under the condition on $\beta$ and for any $\eta$ as in (H2), $J_{31n}$ is the general term of a convergent series.
Let us now examine the term $J_{32n}$. First, we have to calculate [...] We have [...] Remark that [...] Then [...] Under (K3 ii) and (D1), we have $V_1 = O(h)$. An analogous development gives that [...] By using (H2) and (M3), the latter can be made the general term of a summable series. Thus $\sum_{n \ge 1} (J_{31n} + J_{32n}) < \infty$. Then, by the Borel-Cantelli lemma, the first term of (29) goes to zero a.s. and, for $n$ large enough, we have [...]
Proof. The bias terms do not depend on the mixing structure. The proof of Lemma 5.5 is similar to that of Lemma 6.2 in Lemdani et al. [16], hence it is omitted.
The next lemma gives the uniform convergence, with rate, of the estimator $v_n(x)$ defined in (7).
Proof. Adapting (7), define
$$\tilde{v}_n(x) = \frac{\mu}{nh} \sum_{i=1}^{n} \frac{1}{G(Y_i)} K\left(\frac{x - X_i}{h}\right).$$
We have [...] For the first term, using a framework analogous to that of Lemma 5.3, we get [...] In addition, by using the same approach as for $I_{1n}$ in the proof of Lemma 5.1, we can show that, for $n$ large enough, [...] Finally, by a change of variable and a Taylor expansion we get, under (K3) and (D5), [...] which yields that $L_{3n} = O(h^2)$, $\mathbf{P}$-a.s. as $n \to \infty$.
In conjunction with Lemmas 5.3-5.6, we conclude the proof.
We now embark on the proof of Theorem 3.1.
The consistency of $q_{p,n}(x)$ then follows immediately from Proposition 3.1 in conjunction with the inequality
$$|F(q_{p,n}(x)|x) - F(q_p(x)|x)| \le \sup_{a \le y \le b} |F_n(y|x) - F(y|x)|.$$
For the second part, a Taylor expansion of $F(\cdot|\cdot)$ in a neighborhood of $q_p$ implies that
$$F(q_{p,n}(x)|x) - F(q_p(x)|x) = (q_{p,n}(x) - q_p(x))\, f(\tilde{q}_p(x)|x),$$
where $\tilde{q}_p$ lies between $q_p$ and $q_{p,n}$ and $f(\cdot|x)$ is the conditional density of $Y$ given $X = x$. Then, from the behavior of $F(q_{p,n}(x)|x) - F(q_p(x)|x)$ as $n$ goes to infinity, it is easy to obtain asymptotic results for the sequence $(q_{p,n}(x) - q_p(x))$. By (37) we have
$$\sup_{x \in \Omega} |q_{p,n}(x) - q_p(x)|\, |f(\tilde{q}_p(x)|x)| \le \sup_{x \in \Omega} \sup_{a \le y \le b} |F_n(y|x) - F(y|x)|.$$
The result follows from (D4) and Proposition 3.1. Here we point out that, if $f(q_p(x)|x) = 0$ for some $x \in \Omega$, we should increase the order of the Taylor expansion to obtain the consistency of $q_{p,n}(x)$ (with an adapted rate).
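The step from the conditional df to the quantile can be spelled out as follows. This is a sketch under two assumptions stated in the text: $F_n(q_{p,n}(x)|x) = p$ (Section 2) and, by continuity of $F(\cdot|x)$, $F(q_p(x)|x) = p$; it also assumes that $q_{p,n}(x)$ lies in $[a, b]$.

```latex
\begin{align*}
|F(q_{p,n}(x)|x) - F(q_p(x)|x)|
  &= |F(q_{p,n}(x)|x) - p| \\
  &= |F(q_{p,n}(x)|x) - F_n(q_{p,n}(x)|x)| \\
  &\le \sup_{a \le y \le b} |F_n(y|x) - F(y|x)| .
\end{align*}
```

Combined with the Taylor expansion, this shows that whenever the conditional density is bounded away from zero near $q_p(x)$, uniformly in $x \in \Omega$, the quantile estimator inherits the convergence rate of the conditional df estimator.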