Nonparametric Multivariate L1-median Regression Estimation with Functional Covariates

In this paper, a nonparametric estimator is proposed for the L1-median of a multivariate conditional distribution when the covariates take values in an infinite dimensional space. The multivariate approach is better suited to predicting the components of a random vector simultaneously than predicting each of them separately. Using the well-known Nadaraya-Watson estimator to estimate the conditional L1-median function, we establish the strong consistency of this estimator as well as its asymptotic normality. We also present some simulations and show how to build conditional confidence ellipsoids for the multivariate L1-median regression in practice. A numerical study on real chemometric data is carried out to compare the multivariate L1-median regression with the vector of marginal median regressions, both when the covariate X is a curve and when X is a random vector.


Introduction
In statistics, researchers are often interested in how a response variable Y is related to an explanatory variable X. Studying the behaviour of Y given a new value of the explanatory variable X is an important task in nonparametric statistics. For instance, the regression function gives the mean value of Y given X = x. Other characteristics of the conditional distribution, such as the conditional median, conditional quantiles or the conditional mode, may be of great interest in practice. Furthermore, it is widely acknowledged that quantiles are more robust to outliers than the regression function.
Conditional quantiles have been widely studied when the explanatory variable X lies in a finite dimensional space; there are many references on this topic (see, for instance, Gannoun et al. (2003a)).
During the last decade, thanks to progress in computing tools, there has been an increasing number of examples from different fields of applied sciences for which the data are curves. For instance, some random variables can be observed at several different times. Such variables, known in the literature as functional variables (of time, for instance), allow us to consider the data as curves. The books by Bosq (2000) and Ramsay and Silverman (2005) give an interesting description of the available procedures dealing with functional observations, whereas Ferraty and Vieu (2006) present a completely nonparametric point of view. These functional approaches mainly rely on generalizing multivariate statistical procedures to functional spaces and have proved useful in various areas such as chemometrics (Hastie and Mallows (1993) and Quintela-del Río and Francisco-Fernández (2011)), economics (Kneip and Utikal (2001)), climatology (Besse et al. (2000)), biology (Kirkpatrick and Heckman (1989)), geoscience (Quintela-del Río and Francisco-Fernández (2011)) or hydrology (Chebana and Ouarda (2011)). They are generally more appropriate than longitudinal data models or time series analysis when there are many measurement points for each curve (Rice (2004)).
In the univariate case (i.e. $Y \in \mathbb{R}$ and X a functional covariate), among the many papers dealing with the nonparametric estimation of conditional quantiles, one may cite Cardot et al. (2005), who introduced univariate quantile regression with a functional covariate, and Ferraty et al. (2005), who estimated the conditional quantile by inverting the conditional cumulative distribution function. Ezzahrioui and Ould-Saïd (2008) established almost complete convergence and asymptotic normality for independent and identically distributed (i.i.d.) data as well as under an α-mixing condition. Dabo-Niang and Laksaci (2012) established convergence in $L^p$-norm. In the same framework, Laksaci et al. (2009) estimated the conditional quantile nonparametrically by adapting the $L^1$-norm method. Recently, Quintela-del Río and Vieu (2011) used the approach proposed by Ferraty et al. (2005) to predict future stratospheric ozone concentrations and to estimate return levels of extreme values of tropospheric ozone.
Over the past decades, researchers have shown increasing interest in multivariate location parameters such as multivariate quantiles, in order to find suitable analogues of univariate quantiles, which are used to construct descriptive statistics and robust estimators of location. In contrast to the univariate case, the observations $Y_i$ lying in $\mathbb{R}^d$ (with $d \geq 2$) have no natural total order. Consequently, several multivariate quantile-type definitions have been formulated. The pioneering paper of Haldane (1948) considered a multivariate extension of the median defined as an M-estimator (also called the spatial or L1-median). The reader is referred to Serfling (2002) for a historical review and comparisons. Chaudhuri (1996) and Koltchinskii (1997) defined the geometric quantile, an extension of quantiles to the multivariate setting based on norm minimization and on the geometry of multivariate data clouds.
In contrast, relatively little attention has been paid to multivariate conditional quantiles ($Y \in \mathbb{R}^d$ and $X \in \mathbb{R}^s$) and their large-sample properties. Cadre (2001) defined the conditional L1-median and established its uniform consistency on compact subsets of $\mathbb{R}^s$. Recently, De Gooijer et al. (2006) introduced a multivariate conditional quantile notion, which extends the definition of unconditional quantiles by Abdous and Theodorescu (1992), to predict tails from bivariate time series. Cheng and De Gooijer (2007) generalized the notion of geometric quantiles, defined by Chaudhuri (1996), to the conditional setting. They established a Bahadur-type linear representation of the u-th geometric conditional quantile estimator as well as its asymptotic normality in the i.i.d. case.
The purpose of this paper is to add some new results to the nonparametric estimation of the conditional L1-median when Y is a random vector with values in $\mathbb{R}^d$ while the covariate X takes its values in some infinite dimensional space $\mathcal{F}$. As far as we know, this problem has not been studied in the literature before, and the results obtained here are believed to be novel. Moreover, our motivation for studying this type of robust estimator comes from its interest in some practical applications. Note also that it is preferable to predict all components of a random vector simultaneously, in order to take into account the correlation between them, rather than predicting each component separately. For instance, at EDF (the French electricity company), the estimation of the minimum and the maximum of the electricity power demand is an important research issue for both economic and security reasons. An underestimation of the maximum consumed quantity of electricity (especially in winter) may require importing electricity from other European countries at high prices, while an overestimation of this maximum may have a negative effect on the electricity distribution network. The estimation of the minimum power demand is important for the same reasons. Notice that the minimum and the maximum of the electricity power demand are strongly correlated, so it is more appropriate to predict these variables simultaneously rather than separately. On the other hand, weather variables, such as temperature curves, can play a key role in explaining the minimum and the maximum of power demand. Owing to its robustness, the conditional L1-median may be used to solve this prediction problem with a temperature curve as covariate.
The paper is organized as follows. Section 2 introduces the notation and the form of the new estimator. Section 3 presents the main results concerning the asymptotic behaviour of the estimator, including consistency, asymptotic normality and evaluation of the bias term; an estimate of the conditional confidence region is then deduced. Section 4 is devoted to a simulation study giving an example of the estimated confidence region. An application to real chemometric data is presented in Section 5, where we compare three approaches for predicting a random vector: L1-median regression, the vector of marginal conditional medians and the non-functional multivariate median. The proofs of the results of Section 3 are relegated to the Appendix.

Notations and definitions
Let us consider a random pair (X, Y), where X and Y are two random variables defined on the same probability space $(\Omega, \mathcal{A}, P)$. We suppose that Y is $\mathbb{R}^d$-valued and that X is a functional random variable (f.r.v.) taking its values in some infinite dimensional vector space $\mathcal{F}$ equipped with a semimetric $d(\cdot,\cdot)$. Let x be a fixed point in $\mathcal{F}$ and let $F(\cdot \mid x)$ be the conditional cumulative distribution function of Y given X = x. The conditional L1-median of Y given X = x is defined as

$$\mu(x) = \arg\min_{u \in \mathbb{R}^d} \int_{\mathbb{R}^d} \big(\|y - u\| - \|y\|\big)\, dF(y \mid x). \tag{1}$$

The general definition (1) does not assume the existence of the first order moment of Y. However, when Y has a finite expectation, $\mu(x)$ becomes a minimizer over $u \in \mathbb{R}^d$ of $E(\|Y - u\| \mid X = x)$. Notice that the existence and the uniqueness of $\mu(x)$ are guaranteed, for $d \geq 2$, provided that the conditional distribution $F(\cdot \mid x)$ is not supported on a single straight line (see Theorem 2.17 of Kemperman (1987)). Hence, uniqueness holds whenever Y has an absolutely continuous conditional distribution on $\mathbb{R}^d$ with $d \geq 2$. Without loss of generality, we suppose in the sequel that $E\|Y\| < \infty$. Therefore, for any fixed $x \in \mathcal{F}$, the conditional L1-median $\mu(x)$ may be viewed as

$$\mu(x) = \arg\min_{u \in \mathbb{R}^d} G^x(u), \qquad G^x(u) := E\big(\|Y - u\| \mid X = x\big), \tag{2}$$

where the function $G^x$ is assumed to be differentiable and uniformly bounded with respect to u.
We now introduce some further definitions and notation. Denote by $A^t$ the transpose of a matrix A, and let $\|A\| = \sqrt{\operatorname{tr}(A^t A)}$ be the trace (Frobenius) norm. For any $y \in \mathbb{R}^d$, the function $y \mapsto \|y\|$ is differentiable everywhere except at $y = 0_{\mathbb{R}^d}$; one may then define its derivative as $U(y) = y/\|y\|$ when $y \neq 0$, with the convention $U(y) = 0$ whenever $y = 0$. For any $y \neq u$, define

$$M(y, u) = \frac{1}{\|y - u\|}\Big(I_d - U(y - u)\, U^t(y - u)\Big),$$

where $I_d$ is the $d \times d$ identity matrix. We denote by $\nabla_u G^x(u)$ the gradient of $G^x(u)$ and by $H^x(u)$ its Hessian matrix (both with respect to u). According to Koltchinskii (1997), it is easy to see that

$$\nabla_u G^x(u) = -E\big(U(Y - u) \mid X = x\big), \qquad H^x(u) = E\big(M(Y, u) \mid X = x\big). \tag{3}$$

Notice that $H^x(u)$ is bounded whenever $E(\|Y - u\|^{-1} \mid X = x) < \infty$. According to (1) and (3), the conditional L1-median may then be implicitly defined as a zero with respect to u of the equation

$$E\big(U(Y - u) \mid X = x\big) = 0.$$

To build our estimator, let $(X_i, Y_i)_{i=1,\dots,n}$ be a sample of pairs that are independent and identically distributed as (X, Y). Let us denote by

$$w_{n,i}(x) = \frac{\Delta_i(x)}{\sum_{j=1}^n \Delta_j(x)}$$

the so-called Nadaraya-Watson weights, where $\Delta_i(x) = K\big(d(x, X_i)/h\big)$, K is a kernel function and $h := h_n$ is a sequence of positive real numbers that decreases to zero as n tends to infinity. A kernel estimator of the function $G^x(u)$ is given by

$$G^x_n(u) = \sum_{i=1}^n w_{n,i}(x)\, \|Y_i - u\| = \frac{G^x_{n,2}(u)}{G^x_{n,1}}$$

when the denominator is not equal to 0, where

$$G^x_{n,1} = \frac{1}{n E(\Delta_1(x))} \sum_{i=1}^n \Delta_i(x), \qquad G^x_{n,2}(u) = \frac{1}{n E(\Delta_1(x))} \sum_{i=1}^n \Delta_i(x)\, \|Y_i - u\|.$$

A kernel estimate of $\nabla_u G^x(u)$ may be defined by

$$\nabla_u G^x_n(u) = -\sum_{i=1}^n w_{n,i}(x)\, U(Y_i - u).$$

According to statement (2), the estimator $\mu_n(x)$ of the conditional L1-median may be viewed as a minimizer over u of the function $G^x_n(u)$, that is

$$\mu_n(x) = \arg\min_{u \in \mathbb{R}^d} \sum_{i=1}^n w_{n,i}(x)\, \|Y_i - u\|,$$

or as a zero with respect to u of the equation $\nabla_u G^x_n(u) = 0$. Similarly to Fact 2.1.1 in Chaudhuri (1996) and Remark 2.3 in Cheng and De Gooijer (2007), the existence of the estimator $\mu_n(x)$ is guaranteed by the fact that the function $u \mapsto \sum_{i=1}^n w_{n,i}(x)\|Y_i - u\|$ tends to infinity as $\|u\| \to \infty$. Since this function is also continuous with respect to u, it attains its minimum on $\mathbb{R}^d$, so that a minimizer $\mu_n(x)$ exists.
Next comes the question of uniqueness. Since $\mathbb{R}^d$ equipped with the Euclidean norm is a strictly convex Banach space for $d \geq 2$, it follows from Theorem 2.17 of Kemperman (1987) that, unless all the data points $Y_1, \dots, Y_n$ fall on a straight line in $\mathbb{R}^d$, the function $u \mapsto \sum_{i=1}^n w_{n,i}(x)\|Y_i - u\|$ is strictly convex. This guarantees the uniqueness of the minimizer $\mu_n(x)$ in $\mathbb{R}^d$ for any $d \geq 2$.
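The estimator $\mu_n(x)$ can be computed with a few lines of NumPy. The sketch below is not the authors' implementation. It assumes curves discretized on a common equispaced grid, a Gaussian kernel, and, as a hypothetical choice of semimetric, the L2 distance between curves approximated by the trapezoidal rule. The minimization uses plain Weiszfeld fixed-point iterations for the weighted spatial median, which is simpler than the Vardi and Zhang (2000) algorithm used later in the paper. Points that coincide with the current iterate are skipped, in line with the convention U(0) = 0. All function names are illustrative.

```python
import numpy as np


def gaussian_kernel(t):
    """Gaussian kernel K(t) = (2*pi)^(-1/2) * exp(-t^2 / 2)."""
    return np.exp(-0.5 * np.asarray(t) ** 2) / np.sqrt(2.0 * np.pi)


def l2_semimetric(x, curves, grid):
    """Hypothetical semimetric: L2 distance between x and each row of `curves`
    (trapezoidal rule on an equispaced grid)."""
    sq = (curves - x) ** 2
    dt = grid[1] - grid[0]
    integral = dt * (sq.sum(axis=1) - 0.5 * (sq[:, 0] + sq[:, -1]))
    return np.sqrt(integral)


def nw_weights(x, curves, grid, h):
    """Nadaraya-Watson weights w_{n,i}(x) = Delta_i(x) / sum_j Delta_j(x)."""
    delta = gaussian_kernel(l2_semimetric(x, curves, grid) / h)
    total = delta.sum()
    if total == 0.0:
        raise ValueError("all kernel weights vanish; increase h")
    return delta / total


def weighted_spatial_median(Y, w, tol=1e-10, max_iter=10_000):
    """Minimise u -> sum_i w_i ||Y_i - u|| by Weiszfeld fixed-point iterations."""
    u = np.average(Y, axis=0, weights=w)  # start from the weighted mean
    for _ in range(max_iter):
        r = np.linalg.norm(Y - u, axis=1)
        keep = (r > 1e-12) & (w > 0)      # convention U(0) = 0: drop points at u
        if not keep.any():
            return u
        c = w[keep] / r[keep]
        u_new = c @ Y[keep] / c.sum()
        if np.linalg.norm(u_new - u) < tol:
            return u_new
        u = u_new
    return u


# Usage on synthetic data (illustrative only): estimate mu_n(x) at x = 0.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 100)
curves = np.cumsum(rng.normal(scale=0.1, size=(200, 100)), axis=1)
Y = rng.normal(size=(200, 2))
w = nw_weights(np.zeros(100), curves, grid, h=0.5)
mu_hat = weighted_spatial_median(Y, w)
```

At convergence the weighted sum of unit vectors $\sum_i w_{n,i}(x) U(Y_i - \mu_n(x))$ is numerically zero, which is the empirical counterpart of the estimating equation above.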

Further notations and hypotheses
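Let x be a given point in $\mathcal{F}$ and $V_x$ a neighbourhood of x. Denote by $B(x, h) = \{x' \in \mathcal{F} : d(x, x') \leq h\}$ the ball of center x and radius h, and by $F_x(h) = P(X \in B(x, h))$ the small ball probability. Our hypotheses are gathered here for easy reference.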

Comments on the Hypotheses
The above conditions are fairly mild. Condition (H1) is standard in the context of functional nonparametric estimation. Contrary to the real and vector cases (for which the strict positivity of the explanatory variable's density is generally assumed), the concentration hypothesis (H2)(i) acts directly on the distribution of the functional random variable rather than on its density function. The idea of writing the small ball probability $F_x(h)$ as the product of a function g(x) of x alone and a function φ(h) of h alone was adopted by Masry (2005), who reformulated the assumption of Gasser et al. (1998). This assumption has been used by many authors, with g(x) interpreted as a probability density and φ(h) as a volume parameter. In the finite-dimensional case $\mathcal{F} = \mathbb{R}^s$, for instance, $F_x(h) \approx g(x)\, V_s h^s$ when X has a continuous density g, where $V_s$ is the volume of the unit ball in $\mathbb{R}^s$. Furthermore, in infinite dimensions, there exist many examples fulfilling the decomposition in assumption (H2)(i) (see Ferraty et al. (2007) and Ezzahrioui and Ould-Saïd (2008) for more details). The function $\tau_0(\cdot)$, introduced in assumption (H2)(ii), plays a determinant role in the asymptotic properties, in particular in the order of the conditional bias and in the asymptotic variance term.
Conditions (H3) and (H4) are mild smoothness assumptions on the functionals $G^{(\cdot)}(u)$ and $H^{(\cdot)}(u)$, together with continuity assumptions on certain second-order moments. An assumption similar to (H3)(iii) was made in Cheng and De Gooijer (2007) (see condition 6 in their paper). Condition (H5) is used to evaluate the bias term.

Almost sure consistency
The following result states the almost sure (a.s.) convergence (with rate) of the functional estimator $G^x_n(u)$. This result plays an instrumental role in proving the almost sure consistency of $\mu_n(x)$ for a fixed $x \in \mathcal{F}$. Proposition 3.1. Assume that conditions (H1)-(H2), (H3)(i) and (H4)(i) hold true and Then, we have Notice that condition (11) is standard when dealing with the uniform consistency of the density function over the whole space (see, for instance, Corollary 2.2 of Bosq (1996)).
We now give our first result on the conditional L1-median estimator $\mu_n(x)$.

Asymptotic normality
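To state the asymptotic normality of our estimator, some notation is required. Set $\mu(x) =: \mu = (\mu_1, \dots, \mu_d)^t$ and $\mu_n(x) =: \mu_n = (\mu_{n,1}, \dots, \mu_{n,d})^t$. By the definition of $\mu_n$, we have

$$\nabla_u G^x_n(\mu_n) = -\frac{\sum_{i=1}^n \Delta_i(x)\, U(Y_i - \mu_n)}{\sum_{i=1}^n \Delta_i(x)} = 0. \tag{13}$$

Obviously, equation (13) is satisfied when the numerator is null. For each $j \in \{1, \dots, d\}$, Taylor's expansion applied to the real-valued function $\partial G^x_n / \partial u_j$ implies the existence of $\xi_n(j) = (\xi_{n,1}(j), \dots, \xi_{n,d}(j))^t$, lying on the segment between $\mu$ and $\mu_n$, such that

$$0 = \frac{\partial G^x_n}{\partial u_j}(\mu_n) = \frac{\partial G^x_n}{\partial u_j}(\mu) + \sum_{k=1}^d \frac{\partial^2 G^x_n}{\partial u_j \partial u_k}\big(\xi_n(j)\big)\, (\mu_{n,k} - \mu_k), \tag{15}$$

where, for all $u \in \mathbb{R}^d$ and $x \in \mathcal{F}$, the matrix of second derivatives of $G^x_n$ is $H^x_n(u) = \sum_{i=1}^n w_{n,i}(x)\, M(Y_i, u)$. Denoting by $H^x_n(\xi_n)$ the $d \times d$ matrix whose j-th row is the j-th row of $H^x_n(\xi_n(j))$, (15) can then be rewritten as

$$H^x_n(\xi_n)\, (\mu_n - \mu) = -\nabla_u G^x_n(\mu). \tag{16}$$

Equation (16) plays a key role in deriving the conditional bias and the asymptotic distribution of the conditional L1-median estimator $\mu_n$.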
Proposition 3.2. Under assumptions (H1)-(H3), (H4)(i) and condition (10), $H^x_n(\xi_n) - H^x(\mu) = o_P(1)$.

Using Remark 4 and Lemma 5.3 of Chaudhuri (1992), we know that both the matrix $H^x(\mu)$ and its inverse exist whenever $d \geq 2$. It follows from this result, combined with Proposition 3.2 and (16), that $H^x_n(\xi_n)$ is invertible for n large enough (with probability tending to one). One may then write, for large n,

$$\mu_n - \mu = -\big[H^x_n(\xi_n)\big]^{-1} \nabla_u G^x_n(\mu).$$

The following proposition gives the order of the conditional bias term.

Proposition 3.3. Under assumptions (H1), (H2) and (H5), and provided that $g(x) > 0$ and $\left|\int_0^1 (sK(s))'\, \tau_0(s)\, ds\right| < \infty$, we have:

The theorem below gives the asymptotic normality of our estimator.
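The quantities entering (16) can be evaluated directly for given weights $w_{n,i}(x)$. The sketch below (plain NumPy, illustrative function names) computes $G^x_n(u)$, $\nabla_u G^x_n(u) = -\sum_i w_{n,i}(x) U(Y_i - u)$ and $H^x_n(u) = \sum_i w_{n,i}(x) M(Y_i, u)$. The usage lines compare them with central finite differences, a quick way to check the signs in (3) and (15).

```python
import numpy as np


def G_n(Y, w, u):
    """Kernel estimate G^x_n(u) = sum_i w_i ||Y_i - u||."""
    return float(np.sum(w * np.linalg.norm(Y - u, axis=1)))


def grad_G_n(Y, w, u):
    """Kernel estimate of grad_u G^x(u) = -E(U(Y - u) | X = x), with U(0) = 0."""
    diff = Y - u
    r = np.linalg.norm(diff, axis=1)
    U = np.zeros_like(diff)
    nz = r > 0
    U[nz] = diff[nz] / r[nz, None]
    return -(w[:, None] * U).sum(axis=0)


def hess_G_n(Y, w, u):
    """Kernel estimate H^x_n(u) = sum_i w_i M(Y_i, u), where
    M(y, u) = (I_d - U(y - u) U(y - u)^t) / ||y - u||; terms with Y_i = u are skipped."""
    d = Y.shape[1]
    H = np.zeros((d, d))
    for yi, wi in zip(Y, w):
        r = np.linalg.norm(yi - u)
        if r == 0.0:
            continue
        e = (yi - u) / r
        H += wi * (np.eye(d) - np.outer(e, e)) / r
    return H


# Finite-difference check on random data (illustrative only).
rng = np.random.default_rng(1)
Y = rng.normal(size=(30, 3))
w = rng.random(30)
w /= w.sum()
u = np.array([0.1, -0.2, 0.3])
eps = 1e-6
num_grad = np.array([(G_n(Y, w, u + eps * e) - G_n(Y, w, u - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
num_hess = np.array([(grad_G_n(Y, w, u + eps * e) - grad_G_n(Y, w, u - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
```

Because each $M(Y_i, u)$ is positive semi-definite with null direction $U(Y_i - u)$, the weighted sum is positive definite as soon as the $Y_i$ are not collinear, which is what makes the inverse in the display above usable in practice.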
Theorem 3.3. Suppose that assumptions (H1)-(H5) and condition (10)(i) hold. If $(n\phi(h))^{\delta/2} \to \infty$ for some $\delta > 0$, then: (ii) If, in addition, we impose the following stronger conditions on the bandwidth $h_n$:

Remark 3.4. (i) Notice that the constants $M_1$ and $M_2$ are strictly positive. Indeed, making use of condition (H1) and the fact that the function $\tau_0(\cdot)$ is nondecreasing, it suffices to perform a simple integration by parts. Also, since the conditional distribution of Y given X = x is absolutely continuous, $\Sigma^x(\mu)$ is a positive definite matrix.
(ii) Whenever $\mathcal{F} = \mathbb{R}^s$, $s \geq 1$, and the probability density of the random variable X, say $g_s(\cdot)$, is continuous, one has $F_x(h) \approx g_s(x) V_s h^s$, so that (H2)(i) holds with $g(x) = g_s(x)$ and $\phi(h) = V_s h^s$, where $V_s$ is the volume of the unit ball of $\mathbb{R}^s$. In that case, the asymptotic variance expression takes the form In that case, the central limit theorem has the form given in the above theorem, with convergence rate $(nh_n^s)^{1/2}$. Notice that in the infinite dimensional case, the function φ(h) could decrease to zero exponentially fast as $h \to 0$, and the convergence rate is then effectively $(n\phi(h))^{1/2}$. This fact may be used to address the curse of dimensionality (see Masry (2005) for details). As an example in an infinite dimensional setting, consider the random process defined by It is well known (see Liptser and Shiryayev (1972)) that the distribution $\nu_X$ of X is absolutely continuous with respect to the Wiener measure $\nu_W$ and admits a Radon-Nikodym density f(x). In this case, hypothesis (H2)(i) is satisfied with $\phi(h) = \frac{4}{\pi}\exp\left(-\frac{\pi^2}{8h^2}\right)$ (see Laïb and Louani (2011) for details). The convergence rate in Theorem 3.3 being O(n

Observe now in Theorem 3.3 that the limiting variance contains the unknown function g(x), and that the normalization depends on the function φ, which is not explicitly identifiable. To make this result operational in practice, we have to estimate the quantities Σ, H and $\tau_0$. For this purpose, we estimate the conditional variance matrix Σ. Making use of the decomposition of $F_x(u)$ in (H2)(i), one may estimate $\tau_0(u)$ by

$$\tau_n(u) = \frac{F_{x,n}(uh)}{F_{x,n}(h)}, \qquad F_{x,n}(t) = \frac{1}{n}\,\#\{i : d(x, X_i) \leq t\}.$$

Subsequently, for a given kernel K, the quantities $M_1$ and $M_2$ are estimated by $M_{1,n}$ and $M_{2,n}$, respectively, replacing $\tau_0$ by $\tau_n$ in their expressions.
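A sketch of these plug-in quantities, under assumptions that are not spelled out in the text above: (a) the kernel is the quadratic kernel $K(s) = \frac{3}{2}(1 - s^2)$ on [0, 1]; (b) $M_j = K^j(1) - \int_0^1 (K^j)'(s)\,\tau(s)\,ds$ for j = 1, 2, the form commonly used in functional kernel regression; and (c) the conditional variance matrix is estimated by the hypothetical plug-in $\sum_i w_{n,i}(x)\, U(Y_i - u)U^t(Y_i - u)$. The integral in (b) uses the trapezoidal rule, and all names are illustrative.

```python
import numpy as np


def quad_kernel(s):
    """Assumed kernel K(s) = 1.5 (1 - s^2) on [0, 1], zero elsewhere."""
    s = np.asarray(s, dtype=float)
    return np.where((s >= 0) & (s <= 1), 1.5 * (1.0 - s ** 2), 0.0)


def quad_kernel_deriv(s):
    """Derivative K'(s) = -3 s on [0, 1]."""
    s = np.asarray(s, dtype=float)
    return np.where((s >= 0) & (s <= 1), -3.0 * s, 0.0)


def small_ball_prob(dist, t):
    """Empirical F_{x,n}(t) = #{i : d(x, X_i) <= t} / n."""
    return float(np.mean(dist <= t))


def tau_n(dist, h, s):
    """tau_n(s) = F_{x,n}(s h) / F_{x,n}(h), evaluated at each s."""
    denom = small_ball_prob(dist, h)
    if denom == 0.0:
        raise ValueError("no curve within distance h of x")
    return np.array([small_ball_prob(dist, si * h) for si in np.atleast_1d(s)]) / denom


def m_constants(tau, n_grid=2001):
    """Assumed form M_j = K^j(1) - int_0^1 (K^j)'(s) tau(s) ds, j = 1, 2."""
    s = np.linspace(0.0, 1.0, n_grid)
    t = tau(s)
    k, dk = quad_kernel(s), quad_kernel_deriv(s)

    def trap(f):
        return (s[1] - s[0]) * (f.sum() - 0.5 * (f[0] + f[-1]))

    m1 = quad_kernel(1.0) - trap(dk * t)
    m2 = quad_kernel(1.0) ** 2 - trap(2.0 * k * dk * t)  # (K^2)' = 2 K K'
    return float(m1), float(m2)


def sigma_hat(Y, w, u):
    """Hypothetical plug-in for Sigma^x(u): sum_i w_i U(Y_i - u) U(Y_i - u)^t."""
    diff = Y - u
    r = np.linalg.norm(diff, axis=1, keepdims=True)
    U = np.divide(diff, r, out=np.zeros_like(diff), where=r > 0)
    return (w[:, None, None] * U[:, :, None] * U[:, None, :]).sum(axis=0)


# Usage (illustrative): M_{1,n}, M_{2,n} from simulated distances d(x, X_i).
rng = np.random.default_rng(3)
dist = rng.random(500)
h = 0.3
m1_n, m2_n = m_constants(lambda s: tau_n(dist, h, s))
```

Two sanity checks follow from integration by parts: with $\tau \equiv 1$ one gets $M_1 = K(0)$ and $M_2 = K^2(0)$, while with $\tau(s) = s$ one gets $M_1 = \int_0^1 K$ and $M_2 = \int_0^1 K^2$.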
Corollary 3.5 below, which is a slight modification of Theorem 3.3, gives a form of our results that is useful in practice.
Corollary 3.5. Assume that the conditions of Theorem 3.3 hold true and that K and $K^2$ are integrable functions. If, in addition, $nF_x(h) \to \infty$ and $h^\beta (nF_x(h))^{1/2} \to 0$ as $n \to \infty$, where β is specified in condition (H3), then, for any $x \in \mathcal{F}$ such that $g(x) > 0$, we have

Simulation example
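Let us consider a bivariate vector $Y = (Y_1, Y_2) \in \mathbb{R}^2$ and let X(t) be a Brownian motion trajectory defined on [0, 1]. The eigenfunctions of the covariance operator of X are known to be (see Ash and Gardner (1975)), for j = 1, 2, ...,

$$f_j(t) = \sqrt{2}\, \sin\big((j - 1/2)\pi t\big), \qquad t \in [0, 1],$$

associated with the eigenvalues $\lambda_j = \big((j - 1/2)\pi\big)^{-2}$.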
Let $(f_1(t))_{t \in [0,1]}$ (resp. $(f_2(t))_{t \in [0,1]}$) be the first (resp. second) eigenfunction, corresponding to the largest (resp. second largest) eigenvalue of the covariance operator of X. It is well known that $f_1$ and $f_2$ are orthogonal by construction, i.e. $\langle f_1, f_2 \rangle := \int_0^1 f_1(t) f_2(t)\, dt = 0$. We then model the dependence between Y and X by the following model: where the error term is a standard normal random variable.

Figure 1: 200 simulated couples $(X_i, Y_i)_{i=1,\dots,200}$. Left: the covariates $X_i$; right: the associated vectors $Y_i$.
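A short numerical check of the Karhunen-Loève facts used in this simulation. It assumes Brownian paths simulated by cumulative sums of Gaussian increments on the 100 equispaced points $t_k = k/100$ of (0, 1]; this grid convention and the Riemann-sum inner product are our choices. The check verifies that $f_1$ and $f_2$ are approximately orthonormal on the grid and that the leading eigenvector of the empirical covariance matrix is close to $f_1$. The model linking Y to X is not reproduced here.

```python
import numpy as np


def brownian_paths(n, n_points=100, seed=0):
    """Simulate n Brownian trajectories on the grid t_k = k / n_points, k = 1..n_points."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_points + 1) / n_points
    increments = rng.normal(scale=np.sqrt(1.0 / n_points), size=(n, n_points))
    return t, np.cumsum(increments, axis=1)


def kl_eigenfunction(j, t):
    """j-th eigenfunction of the Brownian covariance operator on [0, 1]."""
    return np.sqrt(2.0) * np.sin((j - 0.5) * np.pi * t)


def inner(f, g, t):
    """Riemann-sum approximation of <f, g> = int_0^1 f(t) g(t) dt."""
    return float(np.sum(f * g) * (t[1] - t[0]))


t, X = brownian_paths(5000)
f1, f2 = kl_eigenfunction(1, t), kl_eigenfunction(2, t)
cov = np.cov(X, rowvar=False)            # empirical covariance of (X(t_1), ..., X(t_100))
eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
v1 = eigvec[:, -1]                       # leading eigenvector
cosine = abs(v1 @ f1) / (np.linalg.norm(v1) * np.linalg.norm(f1))
```

Scaled by the grid step, the largest eigenvalue of the empirical covariance matrix approximates $\lambda_1 = 4/\pi^2 \approx 0.405$.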
We simulated n = 200 and n = 700 independent realizations $(X_i, Y_i)$, i = 1, ..., n. To handle the Brownian random functions $X_i(t)$, each sample path was discretized at 100 equispaced points in [0, 1]. In Figure 1, we plot 200 simulated couples $(X_i, Y_i)_{i=1,\dots,200}$ generated as described above: the left box contains the covariates $X_i$ and the right one the associated vectors $Y_i = (Y_i^1, Y_i^2)$. We aim to assess, for a fixed curve X = x, the finite-sample performance of the asymptotic conditional confidence ellipsoid given by (18). For that, we first have to estimate $\mu(x)$. Three parameters must be fixed in this step: the kernel K, the bandwidth h and the semimetric $d(\cdot,\cdot)$, which measures the similarity between curves.

Choice of the kernel: there are many possible kernel functions. Specialists in nonparametric estimation agree that the exact form of the kernel does not greatly affect the final estimate, compared with the choice of the bandwidth. In this section, the Gaussian kernel $K(u) = (2\pi)^{-1/2}\exp(-u^2/2)$, $u \in \mathbb{R}$, is used.

Choice of the bandwidth $h_n$: the bandwidth determines the smoothness of the estimator. The choice of the bandwidth has been widely studied in the nonparametric literature. Recently, Rachdi and Vieu (2007) proposed a data-driven criterion for choosing this smoothing parameter, formulated as a functional version of cross-validation, and Antoniadis et al. (2009) treated the same problem in the context of time series prediction. In the following, the bandwidth $h_n$ is selected by the L1 cross-validation method:

Choice of the semimetric $d(\cdot,\cdot)$: because of the roughness of our covariate curves, we chose a semimetric computed with functional principal component analysis with dimension q = 2.
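A sketch of the two tuning steps, under stated assumptions. The semimetric uses the distance between the first q = 2 principal component scores of the discretized curves; quadrature weights are omitted, which on an equispaced grid only rescales all distances by a constant that is absorbed into h. The cross-validation criterion is a hypothetical leave-one-out form, $h \mapsto \sum_i \|Y_i - \mu_n^{(-i)}(X_i)\|$, since the exact criterion is not reproduced above. The Weiszfeld helper is repeated so that the block runs on its own, and all names are illustrative.

```python
import numpy as np


def fpca_semimetric(X, q=2):
    """Pairwise distances between the first q principal component scores of the rows of X."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    _, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    V = eigvec[:, ::-1][:, :q]               # leading q eigenvectors
    scores = Xc @ V
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def weighted_l1_median(Y, w, tol=1e-9, max_iter=5000):
    """Weiszfeld iterations for argmin_u sum_i w_i ||Y_i - u||."""
    u = np.average(Y, axis=0, weights=w)
    for _ in range(max_iter):
        r = np.linalg.norm(Y - u, axis=1)
        keep = (r > 1e-12) & (w > 0)
        if not keep.any():
            break
        c = w[keep] / r[keep]
        u_new = c @ Y[keep] / c.sum()
        if np.linalg.norm(u_new - u) < tol:
            return u_new
        u = u_new
    return u


def loo_l1_cv(D, Y, h_grid):
    """Hypothetical leave-one-out L1 criterion with a Gaussian kernel on D[i, j] / h."""
    scores = []
    for h in h_grid:
        total = 0.0
        for i in range(len(Y)):
            k = np.exp(-0.5 * (D[i] / h) ** 2)
            k[i] = 0.0                           # leave observation i out
            if k.sum() == 0.0:
                total = np.inf
                break
            total += np.linalg.norm(Y[i] - weighted_l1_median(Y, k / k.sum()))
        scores.append(total)
    return h_grid[int(np.argmin(scores))], np.array(scores)


# Usage on synthetic curves (illustrative only).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
a = rng.normal(size=(40, 2))
X = a[:, [0]] * np.sin(np.pi * t) + a[:, [1]] * np.cos(np.pi * t)
Y = a + 0.1 * rng.normal(size=(40, 2))
D = fpca_semimetric(X, q=2)
h_grid = [1.0, 2.0, 4.0, 8.0]
h_best, cv_scores = loo_l1_cv(D, Y, h_grid)
```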
In Figure 2, we plot the 95% confidence ellipses of $\mu(x)$ for $x = 0_{\mathcal{F}}$. We can see from Figure 2 that the lengths of the major and minor axes of the confidence ellipse decrease as the sample size n increases. Similar results were obtained for other sample sizes n and other curves x.
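A sketch of how a 95% confidence ellipse such as those in Figure 2 can be computed when d = 2. It assumes that the pivot of Corollary 3.5 has the form $V_n = (M_1/\sqrt{M_2})\sqrt{nF_x(h)}\,\Sigma^{-1/2} H (\mu_n - \mu)$, approximately $N(0, I_2)$. This is our reading of the normalization $V^x_n$ in the Appendix, with $T^x_n = \Sigma^{-1/2}H$ taken as an assumption. Then $\|V_n\|^2$ is approximately $\chi^2_2$, whose $(1-\alpha)$ quantile is $-2\log\alpha$, and the ellipse is a level set of a quadratic form. Names are illustrative.

```python
import numpy as np


def confidence_ellipse(H, Sigma, m1, m2, n_small_ball, alpha=0.05):
    """Semi-axis lengths and axis directions of the set
    {mu : (m1^2/m2) * n_small_ball * (mu_n - mu)^t H Sigma^{-1} H (mu_n - mu) <= c},
    where c = -2 log(alpha) is the chi-square(2) quantile of level 1 - alpha."""
    if H.shape != (2, 2):
        raise ValueError("this sketch handles the bivariate case d = 2 only")
    A = (m1 ** 2 / m2) * n_small_ball * H @ np.linalg.solve(Sigma, H)
    A = 0.5 * (A + A.T)                      # symmetrise against rounding
    c = -2.0 * np.log(alpha)                 # P(chi2_2 <= c) = 1 - exp(-c / 2)
    eigval, eigvec = np.linalg.eigh(A)
    return np.sqrt(c / eigval), eigvec


def in_ellipse(mu, mu_n, H, Sigma, m1, m2, n_small_ball, alpha=0.05):
    """True if mu lies inside the (1 - alpha) confidence ellipse centred at mu_n."""
    diff = np.asarray(mu_n, dtype=float) - np.asarray(mu, dtype=float)
    stat = (m1 ** 2 / m2) * n_small_ball * diff @ H @ np.linalg.solve(Sigma, H @ diff)
    return bool(stat <= -2.0 * np.log(alpha))
```

Since the semi-axes scale as $(nF_x(h))^{-1/2}$, quadrupling the effective sample size halves both axes. This is consistent with the shrinking ellipses observed as n increases.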

Application to chemometric data prediction
The purpose of this section is to apply our method based on multivariate L1-median regression to some real chemometric data and to compare our results with those obtained by other definitions of the conditional median studied in the literature. For that, we used a sample of spectrometric data available on the website http://lib.stat.cmu.edu/datasets/tecator. We have a sample of n = 215 pieces of meat, and for each unit i we observe one discretized spectrometric curve $X_i(\lambda)$, which corresponds to the absorbance measured on a grid of 100 wavelengths (i.e. $X_i(\lambda) = (X_i(\lambda_1), X_i(\lambda_2), \dots, X_i(\lambda_{100}))$). Figure 3 plots the spectrometric curves. Moreover, for each unit i, we have at hand its moisture content ($Y_1$), fat content ($Y_2$) and protein content ($Y_3$), obtained by analytical chemical processing. Let us denote by $Y = (\text{moisture}, \text{fat}, \text{protein})^t := (Y_1, Y_2, Y_3)^t$ the vector of chemical contents of the meat. Given a new spectrometric curve $X_{new}(\lambda)$, our purpose is to predict simultaneously the corresponding vector of chemical contents Y using the multivariate L1-median regression. Obtaining a spectrometric curve is less expensive (in terms of time and cost) than the analytical chemistry needed to determine the percentages of chemical contents, so predicting the whole vector Y from the spectrometric curve is an important economic challenge.
Let us consider the 215 observations $(X_1(\lambda), Y_1), \dots, (X_{215}(\lambda), Y_{215})$, split into two samples: a learning sample (160 observations) and a test sample (55 observations). We compare the following three methods, based on multivariate conditional medians, to predict the vector of chemical contents Y of the test sample. In all three approaches, we choose the quadratic kernel K defined by:

(i) Non-functional approach (NF). This method is based on the definition of the conditional spatial median studied by Gannoun et al. (2003b) and Cheng and De Gooijer (2007). This approach does not consider the covariate X as a function but as a vector of dimension 100, while the response variable Y is a vector. For each i = 1, ..., 160 in the learning sample, the i-th vector $Y_i$ is predicted as follows:

$$\hat\mu^{NF}(X_i) = \arg\min_{u \in \mathbb{R}^3} \sum_{j=1}^n w^{NF}_{n,j}(X_i)\, \|Y_j - u\|,$$

where

$$w^{NF}_{n,j}(X_i) = \frac{K\big(\|X_i - X_j\|/h_n\big)}{\sum_{k=1}^n K\big(\|X_i - X_k\|/h_n\big)}$$

are the so-called Nadaraya-Watson weights.
For the choice of the bandwidth $h_n$, Cheng and De Gooijer (2007) gave the exact expression of the optimal bandwidth that minimizes the asymptotic mean squared error. In this case, $h_n$ is of order $n^{-1/104 + \epsilon}$, where $\epsilon > 0$ is a sufficiently small constant.

(ii) Vector Coordinate Conditional Median (VCCM)
This approach treats the covariate X as functional. For each i = 1, ..., 160 in the learning sample, we predict each component of the response vector $Y_i$ by its one-dimensional conditional median. We then obtain the vector of coordinate conditional medians (VCCM), defined as

$$\hat\mu^{VCCM}(X_i) = \big(\hat\mu_1(X_i), \hat\mu_2(X_i), \hat\mu_3(X_i)\big)^t,$$

where each component $\hat\mu_j(X_i) = \hat F_j^{-1}(1/2 \mid X_i)$ is the one-dimensional conditional median estimator.
$\hat F_j(\cdot \mid X_i)$ is the estimator of the conditional distribution function of the component $Y_j$ given $X = X_i$. Ferraty and Vieu (2006, p. 56) proposed a Nadaraya-Watson-type kernel estimator of the conditional distribution function $F_j(\cdot \mid X = X_i)$ when the covariate takes values in some infinite dimensional space. This estimator is given by To apply this approach, we used Ferraty and Vieu's R routine funopare.quantile.lcv to estimate $\hat\mu_j(X_i)$. The optimal bandwidth is chosen by cross-validation over the k nearest neighbours (see Ferraty and Vieu (2006), p. 102, for more details).
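A sketch of the (VCCM) predictor for given weights. For simplicity, it replaces the kernel-smoothed conditional distribution function of Ferraty and Vieu (2006) by its unsmoothed version, $\hat F_j(y \mid X_i) = \sum_k w_k \mathbb{1}\{Y_k^j \leq y\}$. Its generalized inverse at 1/2 is then a weighted univariate median. This simplification and the function names are our assumptions, and the funopare.quantile.lcv routine itself is not reproduced.

```python
import numpy as np


def conditional_cdf(y, yj, w):
    """Unsmoothed estimate F_j(y | x) = sum_k w_k * 1{Y_k^j <= y}."""
    return float(np.sum(w * (yj <= y)))


def conditional_median_1d(yj, w):
    """Generalised inverse of the step CDF at 1/2: smallest y with F_j(y | x) >= 1/2."""
    order = np.argsort(yj)
    cum = np.cumsum(w[order])
    idx = int(np.searchsorted(cum, 0.5))     # first index where cum >= 0.5
    return float(yj[order][min(idx, len(yj) - 1)])


def vccm(Y, w):
    """Vector of coordinate-wise conditional medians, one component at a time."""
    return np.array([conditional_median_1d(Y[:, j], w) for j in range(Y.shape[1])])
```

Each coordinate is handled independently of the others, which is exactly why this predictor cannot exploit the strong correlations between moisture, fat and protein.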

(iii) Conditional Multivariate Median (CMM)
The approach that we propose here treats the covariate X as a curve and the response Y as a vector. For each i = 1, ..., 160 in the learning sample, we take

$$\hat\mu(X_i) = \arg\min_{u \in \mathbb{R}^3} \sum_{j=1}^n w_{n,j}(X_i)\, \|Y_j - u\|,$$

with the Nadaraya-Watson weights $w_{n,j}$ of Section 2 computed with the semimetric d. To estimate the conditional multivariate median $\mu(X_i)$, we adapted the algorithm proposed by Vardi and Zhang (2000) to the conditional case and used the function spatial.median from the R package ICSNP. As in the previous approach, the optimal bandwidth is chosen by cross-validation over the k nearest neighbours.
A common evaluation procedure: we adapted to the multivariate case the algorithm proposed by Attouch et al. (2009) and Ferraty and Vieu (2006, p. 103) in order to obtain the optimal smoothing parameter $h_n$ for each $X_i$ in the test sample.
Step 1. We compute the kernel estimators $\hat\mu(X_j)$ (resp. $\hat\mu_k(X_j)$) for all j, using the learning sample.
Step 2. For each $X_i$ in the test sample, we set $i^* = \arg\min_{j=1,\dots,160} d(X_i, X_j)$.
Step 3. For each i = 161, ..., 215, we compute $\hat\mu(X_i)$ and $\hat\mu_k(X_i)$; the bandwidth used for each curve $X_i$ in the test sample is the one obtained for its nearest curve $X_{i^*}$ in the learning sample.

Because the spectrometric curves presented in Figure 3 are very smooth, we can choose as semimetric $d(\cdot,\cdot)$ the L2 distance between the second derivatives of the curves. This choice was also made by Attouch et al. (2009) and Ferraty et al. (2007) for the same spectrometric curves. Both the (CMM) and (NF) methods take into account the covariance structure between the components of the vector Y. Indeed, the correlation coefficients between $Y_1$ = moisture, $Y_2$ = fat and $Y_3$ = protein are $\rho_{1,2} = -0.988$, $\rho_{1,3} = 0.814$ and $\rho_{2,3} = -0.860$. Moisture, fat and protein contents in meat are thus strongly correlated, so it is more appropriate to predict these variables simultaneously rather than each one separately. To compare the (CMM), (NF) and (VCCM) methods, we use the following criteria:

We can conclude from Table 1 that our method is more appropriate than (VCCM) for predicting meat components. Indeed, the (VCCM) approach predicts each component of Y separately using the conditional univariate median; it ignores the dependence between the components of Y and does not take their correlation structure into account. The non-functional approach gives the largest prediction errors, because of the dimension of the covariate (100 in this case). This problem is well known in nonparametric estimation as the curse of dimensionality. Taking into account the functional nature of the covariate seems to be necessary in such a case.
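A sketch of Steps 2 and 3 of the evaluation procedure, combined with the semimetric based on second derivatives. Assumptions: the curves are discretized on a common grid, second derivatives are approximated by applying `np.gradient` twice, and the L2 norm is a Riemann sum. The bandwidths of the learning curves (Step 1) are taken as given, and function names are illustrative.

```python
import numpy as np


def second_derivative(curves, grid):
    """Finite-difference second derivative of each row of `curves`."""
    return np.gradient(np.gradient(curves, grid, axis=1), grid, axis=1)


def deriv2_distances(x, curves, grid):
    """L2 distances between the second derivative of x and those of each row of `curves`."""
    d2 = second_derivative(np.vstack([x, curves]), grid)
    dt = grid[1] - grid[0]
    return np.sqrt(((d2[1:] - d2[0]) ** 2).sum(axis=1) * dt)


def transfer_bandwidths(test_curves, learn_curves, learn_h, grid):
    """Step 2: i* = argmin_j d(X_i, X_j); Step 3: reuse the bandwidth of X_{i*}."""
    nearest = np.array([int(np.argmin(deriv2_distances(x, learn_curves, grid)))
                        for x in test_curves])
    return nearest, np.asarray(learn_h)[nearest]


# Usage (illustrative): adding a constant or a linear trend leaves the
# second derivative, hence the semimetric, unchanged.
grid = np.linspace(0.0, 1.0, 101)
learn = np.vstack([np.sin(2 * np.pi * grid), grid ** 2, grid ** 3])
test = np.vstack([grid ** 2 + 5.0, grid ** 3 + 0.1 * grid])
nearest, h_used = transfer_bandwidths(test, learn, [0.1, 0.2, 0.3], grid)
```

Because a semimetric built on second derivatives ignores vertical shifts and linear trends, curves that differ only by a baseline offset, as is common for absorbance spectra, are treated as neighbours.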

Concluding remarks
In this paper, we have introduced a kernel-based estimator for the L1-median of a multivariate conditional distribution when the covariates take values in an infinite-dimensional space. Predictions based on least squares estimates of regression parameters are highly sensitive to outlying points; the conditional L1-median, being robust, is therefore a natural alternative for prediction. We have shown that our estimator is well adapted to predicting a multivariate response vector. Indeed, in contrast to the vector coordinate conditional median method, the multivariate conditional L1-median takes into account the interdependence of the coordinates of the response vector. Asymptotic results, namely almost sure consistency and asymptotic normality, have been established under some regularity conditions. This work can be extended in several directions. For instance, the same type of theoretical results could be obtained in a non-independent framework (e.g. under mixing dependence). Furthermore, it is well known that quantiles are very useful tools to detect outliers and to model the dependence of the covariates in the lower and upper tails of the response distribution. In future work, we aim to generalize our study to multivariate quantile regression when the covariates take values in some infinite dimensional space.

Appendix: Proofs
In order to prove our results, we have to introduce some further notation. Let and define the bias of $G^x_n(u)$ as $B^x_n(u) = G^x_{n,2}(u) - G^x(u)$.
Consider now the following quantities It is then clear that the following decomposition holds Since $G^x_{n,1}$ is independent of u, it follows from decomposition (21) that The proof of Proposition 3.1 is split into several lemmas, given hereafter, which establish respectively the almost sure (a.s.) convergence of $G^x_{n,1}$ to 1 and that of $B^x_n(u)$, $R^x_n(u)$ and $Q^x_n(u)$ (with rate) to zero.
We start with the following technical lemma, whose proof may be found in Ferraty et al. (2007).
Lemma 5.1. Assume that conditions (H1) and (H2) hold true. For any real numbers $j \geq 1$ and $k \geq 1$, as $n \to \infty$, we have The lemma below gives the convergence rate of the quantity $G^x_{n,1}$.
Lemma 5.2. Under assumptions (H1)-(H2) and condition (10)(i), we have Proof of Lemma 5.2. Let us denote by . To apply the exponential inequality given by Corollary A.8(i) of Ferraty and Vieu (2006) in Appendix A, we first have to show that for all $m \geq 2$ there exists a positive constant $C_m$ such that $E|L^m_{n,1}(x)| \leq C_m a^{2(m-1)}$. We have Then, using Lemma 5.1, we get E (|L n, Therefore, we have $a^2 = (\phi(h))^{-1}$. Now, for all $\epsilon > 0$, we have The desired result follows from the Borel-Cantelli lemma by choosing $\epsilon = \epsilon_0 \sqrt{\log n/(n\phi(h))}$, where $\epsilon_0$ is a large enough positive constant.
The following lemma describes the uniform asymptotic behavior of the conditional bias term B x n (u) as well as that of R x n (u) and Q x n (u) with respect to u.
(ii) If, in addition, (H1)-(H2) hold true and condition (10) is satisfied, we have Proof of Lemma 5.3. Recall that Conditioning on X and using the definition of $G^x(u)$ and condition (H3)(i), one has The latter quantity is independent of u, which leads to sup The statement (24) follows from (23) combined with Lemma 5.2.
Proof of Lemma 5.4. For $u \in \mathbb{R}^d$ and $r > 0$, let $S(u, r) = \{v \in \mathbb{R}^d : \|v - u\| \leq r\}$ be the closed ball of radius r centered at u. Let $[-n^\gamma, n^\gamma]^d$, for $1/2 < \gamma < 2$, be a hypercube of $\mathbb{R}^d$. Divide $[-n^\gamma, n^\gamma]$ into $k_n$ subintervals, each of length $b_n = [2n^\gamma/k_n]$ (where [t] is the integer part of t). Since the set $S(0, n^\gamma) = \{u : \|u\| \leq n^\gamma\}$ is compact, it can be covered by $k_n^d$ bounded hypercubes of the form $S_{n,j} := S(u_j, b_n) = \{u : \|u - u_j\| \leq b_n\}$, $j = 1, \dots, k_n^d$.
Observe now that If we denote by $\alpha_n = \sqrt{n\phi(h)/\log n}$ the convergence rate, one gets by Lemma 5.2 The choice $k_n^d = [\alpha_n n^\gamma \log n]$ implies that $\alpha_n (I_{n,1} + I_{n,3}) = o(1)$.
In order to evaluate the term $I_{n,2}$, let us denote by Then, we have In order to apply an exponential-type inequality, we have to give an upper bound for $E(|Z_{n,1}(x)|^m)$. It follows from the above inequality that On the other hand, we have, for any $k \geq 2$, Using the first part of condition (H4)(i), which implies that $G^x_k(u_j)$ is bounded uniformly in j, one may write . Next, applying Lemma 5.1, one may write where $C_m$ is a positive real constant depending on m. Because $\phi(h)$ tends to zero as n goes to infinity, it follows that Now, applying Corollary A.8(i) of Ferraty and Vieu (2006) $k_n^d$ times with $a^2 = (\phi(h))^{-1}$, we obtain, by choosing $\epsilon = \epsilon_n = 3\epsilon_0 \sqrt{v_n}$, where $v_n = (a^2 \log n)/n = \log n/(n\phi(h)) \to 0$ as $n \to \infty$, that $P(|I_{n,2}| \geq \epsilon) \leq 2k_n^d \exp\big(-\epsilon_0^2 \log n \cdots\big)$. One may choose $\epsilon_0$ large enough such that $\sum_n P(|I_{n,2}| \geq \epsilon) < \infty$.
We conclude by the Borel-Cantelli lemma and (26) that in view of the above result. Now, we have The last term in (27) is zero for large n since, conditioning on X, one may write $\alpha_n |G$ whenever $\gamma > 1/2$ and condition (11) is satisfied. Moreover, we have, for any $\epsilon > 0$, To treat $J_{n,1}$, denote by

The event A n (ω) is nonempty if and only if there exists at least
whenever $\gamma > 1$, which implies that $J_{n,1} = o_{a.s.}(1)$ by the Borel-Cantelli lemma.
Proof of Theorem 3.2. From the definitions of $\mu(x)$ and $\mu_n(x)$ and the existence and uniqueness of these quantities, we have: It then follows that Moreover, since for any fixed $x \in \mathcal{F}$ the function $G^x(\cdot)$ is uniformly continuous and $\mu(x)$ is the unique minimizer of $G^x(\cdot)$, we have, for any $\epsilon > 0$, which means that for every $\epsilon > 0$ there exists a number $\eta(\epsilon) > 0$ such that $G^x(u) > G^x(\mu(x)) + \eta(\epsilon)$ for every u such that $\|\mu(x) - u\| \geq \epsilon$. This implies that the event $\{\|\mu(x) - \mu_n(x)\| > \epsilon\}$ is included in the event $\{G^x(\mu_n(x)) > G^x(\mu(x)) + \eta(\epsilon)\}$. Using inequality (29), we get similarly to the proof of Proposition 3.1. Statement (12) then follows from an application of the Borel-Cantelli lemma.
Proof of Proposition 3.2. To prove Proposition 3.2, it suffices to see that Concerning the first term, observe that where Using Theorem 3.2 and the triangle inequality, we can easily see that Combining the Markov and Cauchy-Schwarz inequalities and making use of assumption (H3)(iii), we can easily prove that . Then we conclude that $A_n = o_P(1)$. For the second term $B_n$ of inequality (32), we have, by the triangle inequality and the fact that $\|U(Y_i - \theta)\| = 1$ whenever $Y_i \neq \theta$, that ; we can then conclude, by using Theorem 3.2, that Finally, using the same arguments as above (for the term $A_n$), we get $B_n = o_P(1)$, which allows us to conclude that $H^x_n(\xi_n(i)) - H^x_n(\mu) = o_P(1)$. We now turn to the second term on the right-hand side of (31). Write .
We have to show that each term $K_{n,i}$ (i = 1, 2) is asymptotically negligible. We have $\|K_{n,1}\|^2 = \operatorname{tr}(K^t_{n,1} K_{n,1}) =$ where $(Z_{k,j})_{1 \leq k,j \leq d}$ is the general term of the matrix $K^t_{n,1} K_{n,1}$, which can be written as Using assumption (H3)(iv), Lemma 5.1 and Corollary A.8 of Ferraty and Vieu (2006), we can easily prove that $Z_{k,j} = o_P(1)$ for all $1 \leq k, j \leq d$. To handle $K_{n,2}$, observe that in view of condition (H3)(ii).
Proof of Lemma 5.6. Let us denote by By the Cramér-Wold device, Lemma 5.6 can be proved by finding the limit distribution of the sequence of real random variables $n^{-1/2} \sum_{i=1}^n \ell^t A_i$, for all $\ell \in \mathbb{R}^d$ satisfying $\ell \neq 0$.
Because the random variables $\ell^t A_1, \dots, \ell^t A_n$ are i.i.d. with zero mean and asymptotic variance the result may be obtained by applying the Lyapunov central limit theorem. For this purpose, we have to prove the following Lindeberg condition: It is easy to see that: Moreover, using the $C_r$ and Jensen inequalities, we obtain It then follows, by hypothesis (H4)(ii) and Lemma 5.1, that Finally, since $(\ell^t \Sigma^x(\mu) \ell)^{-(2+\delta)/2}$ is finite, it follows that because $n\phi(h) \to \infty$ as $n \to \infty$. This implies the Lindeberg condition, which completes the proof of the lemma.
The following lemma gives the analytic expression of the matrix $\Sigma^x(\mu)$. and $V^x_n(\mu_n) = \frac{M_{1,n}}{\sqrt{M_{2,n}}}\sqrt{nF_{x,n}(h)}\; T^x_n(\mu_n)(\mu_n - \mu)$.