Kernel regression with functional response

Abstract: We consider the kernel regression estimate when both the response variable and the explanatory one are functional. The rates of uniform almost complete convergence are stated as a function of the small ball probability of the predictor and as a function of the entropy of the set on which uniformity is obtained.


Introduction
The regression model is a main tool for examining the relationship between a response variable and an explanatory one. We are interested in estimating the nonparametric regression when both variables (response and explanatory) are functional. The study of statistical models adapted to such kinds of data has received a lot of attention in the recent literature (see [14], [2] and [7] for recent monographs and [5] for a handbook on statistics in infinite-dimensional spaces). Since the early paper by [6], the literature on nonparametric regression (see the survey by [8]) has become rather extensive when the response variable is scalar, but there are very few advances in this direction when the response is functional (see however [3] and [11] for earlier references). Various advances in this field have already been obtained under some linearity assumption on the regression operator (see for instance [13] and [9] in the standard i.i.d. case; see also [2] in the specific time series context).
This work presents some asymptotic properties of a kernel-type regression estimator when both the response and the explanatory variables are functional. Precisely, we state the uniform almost complete convergence rate of this doubly functional kernel estimate. As far as we know, our result is the first one stating uniform asymptotic results in nonparametric doubly functional regression problems. As usual in functional statistics, the topological structures on the infinite-dimensional spaces play a prominent role, and we present the rates of convergence in such a way as to highlight these topological effects. Firstly, the problems linked with the high (high because infinite) dimensionality of the explanatory variable are dealt with by means of small ball probability considerations (and this is directly linked with the topological structure). Secondly, the uniformity of the convergence is obtained by means of entropy notions (which are, once again, direct topological considerations). Finally, the type of the Banach space in which the response variable takes its values also acts directly on the rates of convergence. Section 2 is dedicated to some probability tools for functional variables valued in a Banach space. The doubly functional model and its estimate are presented in Section 3, and the uniform rates of convergence are stated therein. Some technical proofs are deferred to the Appendix.
Before closing this introduction, it is worth stressing that, even if we have deliberately chosen to present a short theoretical paper, there exists a wide scope of applied scientific fields for which our approach could be of interest. In the near future, our work will concentrate on the implementation of this doubly functional nonparametric method. To fix ideas, one can for instance find examples in Biometrics, Genetics or Environmetrics in [9], [13] and [10] respectively.

Some probability tools for functional variables
We present two general tools for random variables valued in Banach spaces. The topological complexity of the Banach space B will appear through the following notion: B is said to be of type p if there exists a strictly positive constant c such that, for any finite sequence $Z_1, \dots, Z_n$ of independent B-random variables such that $E\|Z_i\|^p < \infty$ and $EZ_i = 0$, we have
$$E\Big\|\sum_{i=1}^n Z_i\Big\|^p \le c \sum_{i=1}^n E\|Z_i\|^p.$$
Remark 1. Clearly, $\mathbb R^k$ ($k \ge 1$), or more generally any Hilbert space, is a Banach space of type 2.
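As a small numerical sanity check of Remark 1 (an illustration, not part of the original argument), the following Python sketch, assuming NumPy is available, takes $B = \mathbb R^3$ with the Euclidean norm and Rademacher-signed vectors $Z_i = \varepsilon_i v_i$, which are independent and centred. Averaging over all $2^n$ sign patterns gives $E\|\sum_i Z_i\|^2$ exactly, and it coincides with $\sum_i E\|Z_i\|^2$, i.e. the type 2 inequality holds with c = 1. The vectors and the helper name `exact_second_moment` are hypothetical choices.

```python
import itertools
import numpy as np

def exact_second_moment(vectors: np.ndarray) -> float:
    """Exact E||sum_i eps_i v_i||^2 by averaging over all 2^n Rademacher sign patterns."""
    n = vectors.shape[0]
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        s = np.asarray(signs) @ vectors   # one realisation of sum_i Z_i, a point of R^k
        total += float(s @ s)             # squared Euclidean norm
    return total / 2 ** n

rng = np.random.default_rng(0)
v = rng.normal(size=(8, 3))               # 8 fixed vectors in R^3 (illustrative)
lhs = exact_second_moment(v)              # E||Z_1 + ... + Z_8||^2
rhs = float(np.sum(v * v))                # sum_i E||Z_i||^2 = sum_i ||v_i||^2
print(lhs, rhs)                           # equal up to rounding: type 2 with c = 1
```

The cross terms $E[\varepsilon_i\varepsilon_j]\langle v_i, v_j\rangle$ vanish for $i \ne j$; this orthogonality argument is exactly what makes every Hilbert space of type 2.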
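In addition, and as is usual in nonparametric statistics, one needs some kind of exponential inequality to obtain rates of convergence. We will make use later in this paper of the following Bernstein-type inequality for sums of Banach-valued random elements.

Lemma 2.1. Let $Z_1, \dots, Z_n$ be independent B-random variables. If
$$\sum_{i=1}^n E\|Z_i\|^m \le \frac{m!}{2}\, l^2\, b^{m-2}, \quad \forall m \ge 2,$$
for some positive constants $b = b(n)$ and $l = l(n)$, then
$$\forall t > 0, \quad P\Big(\big|\|S_n\| - E\|S_n\|\big| \ge t\Big) \le 2\exp\Big(-\frac{t^2}{2l^2 + 2bt}\Big),$$
where $S_n = \sum_{i=1}^n Z_i$. This inequality can be found on page 49 of [2], where a deeper discussion of B-random variables can also be found.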
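The following Python sketch compares an empirical tail of $\|S_n\| - E\|S_n\|$ with a Bernstein bound, assuming the inequality takes the classical form $P(|\|S_n\| - E\|S_n\|| \ge t) \le 2\exp(-t^2/(2l^2 + 2bt))$ under the moment condition $\sum_i E\|Z_i\|^m \le \frac{m!}{2} l^2 b^{m-2}$. It uses $Z_i = \varepsilon_i v_i$ in $\mathbb R^3$ with unit vectors $v_i$, so that $\sum_i E\|Z_i\|^m = n$ and the condition holds with $l^2 = n$, $b = 1$. $E\|S_n\|$ is replaced by its Monte Carlo mean; the sizes, seed and function name are hypothetical.

```python
import numpy as np

def bernstein_bound(t: float, l2: float, b: float) -> float:
    """Assumed Bernstein-type bound 2 exp(-t^2 / (2 l^2 + 2 b t)), capped at 1."""
    return min(1.0, 2.0 * float(np.exp(-t**2 / (2.0 * l2 + 2.0 * b * t))))

rng = np.random.default_rng(1)
n, k, n_rep = 50, 3, 20_000
v = rng.normal(size=(n, k))
v /= np.linalg.norm(v, axis=1, keepdims=True)    # unit vectors, so ||Z_i|| = 1
eps = rng.choice([-1.0, 1.0], size=(n_rep, n))   # Rademacher signs: Z_i = eps_i v_i, E Z_i = 0
norms = np.linalg.norm(eps @ v, axis=1)          # ||S_n|| in each replication
dev = np.abs(norms - norms.mean())               # E||S_n|| replaced by its Monte Carlo mean

# sum_i E||Z_i||^m = n <= (m!/2) l^2 b^(m-2) for every m >= 2 with l^2 = n, b = 1.
l2, b = float(n), 1.0
results = []
for t in (10.0, 15.0, 20.0):
    empirical = float(np.mean(dev >= t))
    bound = bernstein_bound(t, l2, b)
    results.append((t, empirical, bound))
    print(f"t={t}: empirical tail {empirical:.5f}, bound {bound:.5f}")
```

The bound is conservative in this bounded toy case; its value in the proofs lies in the exponential decay in $t^2$, which can then be balanced against the number of balls in an entropy cover.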

The doubly functional nonparametric setting
Let us consider a sample of independent pairs $(X_1, Y_1), \dots, (X_n, Y_n)$ identically distributed as $(X, Y)$, a random pair valued in $\mathcal F \times B$, where $(\mathcal F, d)$ is a semi-metric space and $(B, \|\cdot\|)$ is a Banach space of type $p \in (1, 2]$. Recall that a semi-metric (sometimes called a pseudo-metric) is just a metric violating the property $[d(x, y) = 0] \Rightarrow [x = y]$. We are interested in the regression operator $m(x) = E(Y \mid X = x)$, which maps $\mathcal F$ into B. Nonparametric estimates of the operator m are constructed from local weighting ideas, as for instance the following doubly functional kernel estimate:
$$\widehat m(x) = \frac{\sum_{i=1}^n Y_i\, K\big(h^{-1} d(x, X_i)\big)}{\sum_{i=1}^n K\big(h^{-1} d(x, X_i)\big)},$$
where K is a kernel function and $h = h_n$ is a sequence of positive real numbers which goes to zero as n goes to infinity. We stay here with this simple nonparametric smoother, but alternative ones could be introduced, such as the local functional one previously studied by [1] for scalar Y.
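A minimal Python sketch of the doubly functional kernel estimate $\sum_i K(h^{-1}d(x, X_i))\,Y_i / \sum_i K(h^{-1}d(x, X_i))$, assuming curves observed on a common regular grid, the $L^2$ distance (approximated by a Riemann sum) as the semi-metric d, and a triangular kernel $K(u) = 1 - u$ on [0, 1). These choices, the simulated model and all function names are hypothetical illustrations, not the authors' implementation. Each response $Y_i$ is stored as a row of values, so the estimate at a new curve x is a weighted average of response curves.

```python
import numpy as np

def triangular_kernel(u: np.ndarray) -> np.ndarray:
    """K(u) = 1 - u on [0, 1), 0 elsewhere: bounded, Lipschitz, with K(1) = 0."""
    return np.where((u >= 0.0) & (u < 1.0), 1.0 - u, 0.0)

def l2_semimetric(x: np.ndarray, curves: np.ndarray, dt: float) -> np.ndarray:
    """Riemann-sum approximation of the L2 distance between x and each row of `curves`."""
    return np.sqrt(np.sum((curves - x) ** 2, axis=1) * dt)

def kernel_regression(x, X, Y, h, dt):
    """sum_i K(d(x, X_i)/h) Y_i / sum_i K(d(x, X_i)/h), with Y_i stored as rows of Y."""
    w = triangular_kernel(l2_semimetric(x, X, dt) / h)
    if w.sum() == 0.0:
        raise ValueError("no X_i within distance h of x; increase h")
    return (w @ Y) / w.sum()      # weighted average of the response curves

# Hypothetical simulated model: X_i(t) = a_i sin(2 pi t), Y_i(s) = a_i^2 s + noise.
rng = np.random.default_rng(2)
n = 500
t = np.linspace(0.0, 1.0, 101)    # grid for the predictor curves
s = np.linspace(0.0, 1.0, 50)     # grid for the response curves
a = rng.uniform(0.0, 1.0, size=n)
X = a[:, None] * np.sin(2 * np.pi * t)[None, :]
Y = (a ** 2)[:, None] * s[None, :] + rng.normal(scale=0.05, size=(n, s.size))

x0 = 0.5 * np.sin(2 * np.pi * t)  # new functional predictor; here m(x0)(s) = 0.25 s
m_hat = kernel_regression(x0, X, Y, h=0.1, dt=t[1] - t[0])
print(np.max(np.abs(m_hat - 0.25 * s)))
```

The bandwidth h plays the role of $h_n$. If no $X_i$ lies within distance h of x, the denominator vanishes, and the sketch raises an error rather than returning an undefined value.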

The general hypotheses
Now $S_{\mathcal F}$ is a fixed subset of $\mathcal F$, and for $\eta > 0$ we consider the following $\eta$-neighborhood of $S_{\mathcal F}$:
$$S_{\mathcal F}^{\eta} = \{x \in \mathcal F : \exists x' \in S_{\mathcal F},\ d(x, x') \le \eta\}.$$
We will use the notation

The model consists in assuming that the probability distribution of X is such that there exists a non-decreasing function φ such that:

while the joint distribution of (X, Y) has to satisfy:

where $r! = r(r-1)\cdots(r-[r]+1)$, $[r]$ being the largest integer smaller than r. We also need the following technical conditions on the kernel K:

(H4) The kernel function has to be such that: (i) K is a bounded and Lipschitz continuous function with support [0, 1), and if K(1) = 0 it also has to fulfill, jointly with φ(·), the conditions:

While (H1)-(H4) are standard conditions for obtaining pointwise rates of convergence, the next assumptions are directly linked with our wish to have uniform rates over the set $S_{\mathcal F}$. These conditions bring out the topological complexity of the set $S_{\mathcal F}$, which acts through the Kolmogorov entropy of $S_{\mathcal F}$, defined for any $\epsilon > 0$ by
$$\psi_{S_{\mathcal F}}(\epsilon) = \log N_\epsilon(S_{\mathcal F}),$$
where $N_\epsilon(S_{\mathcal F})$ is the minimal number of open balls in $\mathcal F$ of radius $\epsilon$ which are necessary to cover $S_{\mathcal F}$. Of course, these conditions also impose joint restrictions on the small ball probability function φ introduced in (H1). Assume that:

It is important to stress that, despite its rather intricate form, this set of assumptions is not overly restrictive. Except for (H3), all the other conditions are related to the explanatory variable X, and they have been discussed in various previous papers. The reader can look, for instance, at Chapter 13 in [7] to see how all these conditions can be shown to hold under a suitable topological structure on the space $\mathcal F$ (that is, under a suitable choice of the semi-metric d). A detailed discussion is beyond the scope of this paper, because it is definitely not linked with the functional nature of the response (which is the main point that we wish to address) and also because such a discussion already appears in various other papers.
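To make Kolmogorov's entropy more concrete, the following Python sketch (an illustration under stated assumptions, not part of the paper) builds a greedy cover of a finite set of points by open balls of radius $\epsilon$ centred at points of the set. Since this is one particular cover, its size is an upper bound on $N_\epsilon$ for that finite set, and its logarithm an upper bound on the entropy. The finite family of curves $x_a(t) = a\sin(2\pi t)$, $a \in [0, 1]$, is a hypothetical stand-in for $S_{\mathcal F}$ under the $L^2$ semi-metric; for such a one-parameter family the entropy grows like $\log(1/\epsilon)$.

```python
import numpy as np

def greedy_cover_count(points: np.ndarray, eps: float) -> int:
    """Number of open eps-balls in a greedy cover of the rows of `points`.

    Balls are centred at points of the set, so this is an upper bound on the
    minimal covering number N_eps of the (finite) set.
    """
    uncovered = np.ones(len(points), dtype=bool)
    count = 0
    while uncovered.any():
        centre = points[np.argmax(uncovered)]           # first uncovered point
        dist = np.linalg.norm(points - centre, axis=1)   # Euclidean distances
        uncovered &= dist >= eps                         # remove the open ball
        count += 1
    return count

# Toy check on the real line: 0.0, 0.1, ..., 1.0 covered by open balls of radius 0.25.
line = (np.arange(11) * 0.1)[:, None]
line_count = greedy_cover_count(line, 0.25)

# Hypothetical stand-in for S_F: curves a*sin(2 pi t); scaling by sqrt(dt) makes the
# Euclidean norm a Riemann-sum approximation of the L2 distance.
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
a = np.linspace(0.0, 1.0, 201)
curves = a[:, None] * np.sin(2 * np.pi * t)[None, :] * np.sqrt(dt)
entropies = {e: float(np.log(greedy_cover_count(curves, e))) for e in (0.2, 0.1, 0.05, 0.025)}
print(line_count, entropies)
```

Halving $\epsilon$ roughly doubles the number of balls for this family, so the entropy increases by about $\log 2$ at each step, consistent with logarithmic growth in $1/\epsilon$.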
Finally, the only condition which is specific to the functional response Y is the rather mild conditional moment assumption (H3).
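The generalized factorial appearing in (H3), $r! = r(r-1)\cdots(r-[r]+1)$ for real $r \ge 1$, can be computed as in the Python sketch below. It assumes $[r]$ is the largest integer strictly smaller than r; with the floor convention instead, integer values of r only gain an extra factor 1, so the results coincide. The function name is hypothetical.

```python
import math

def generalized_factorial(r: float) -> float:
    """r! = r (r - 1) ... (r - [r] + 1) for real r >= 1, [r] = largest integer < r."""
    if r < 1:
        raise ValueError("r must be >= 1")
    n_factors = math.ceil(r) - 1      # [r]: largest integer strictly smaller than r
    result = 1.0
    for j in range(n_factors):        # factors r, r - 1, ..., r - [r] + 1
        result *= r - j
    return result

print(generalized_factorial(4), generalized_factorial(2.5))  # 24.0 3.75
```

For integer r this reduces to the usual factorial (for example $4! = 24$), while for non-integer r it interpolates between consecutive factorials.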

Uniform rates of convergence
The following theorem states the rate of convergence of $\widehat m$, uniformly over the set $S_{\mathcal F}$. The asymptotics are stated in terms of almost complete convergence (denoted by a.co.), which is known to imply both almost sure convergence and convergence in probability (see, among others, Section A-1 in [7]). The topological structure on the space $\mathcal F$ acts directly on these rates through the functions φ and $\psi_{S_{\mathcal F}}$, while the topological complexity of the space B acts through its type p. As discussed before, this is the first result of this kind in the regression setting when both X and Y are functional. In the simpler special case when Y is real, this result was given in [4]. To fix ideas and to highlight the wide generality of the apparently highly technical assumptions (H1)-(H7), a few special cases are considered in Remark 2, while Remark 3 presents an interesting direct consequence of this general result.
Proof. We consider the decomposition

The denominator $\widehat f(x)$ does not involve the functional response, and therefore the following results stated in [4] remain true:

So finally, Theorem 3.1 will follow as soon as the two following lemmas are proved. Their proofs are deferred to the Appendix.

Remark 2.
For any Hilbert space, p = 2 (see Remark 1), so the rate (3.2) becomes:

In the special case when B is the Euclidean space $\mathbb R^k$, when $S_{\mathcal F}$ is compact and when X has a density with respect to the Lebesgue measure, (3.2) becomes the usual multivariate nonparametric rate:

Remark 3.
Uniform consistency allows one to replace a fixed x with a random element X. Indeed, as soon as $P(X \in S_{\mathcal F}) = 1$, one gets

Appendix A: Proof of technical lemmas
In the following, we will denote $K_i(x) = K\big(h^{-1} d(x, X_i)\big)$. First of all, one has

The result (A.1) is obvious when K(1) > 0 and can be extended to continuous kernels K satisfying (H4), as shown in Lemma 4.4, page 44, of [7]. From now on, C denotes a generic nonnegative real constant, and we take $\epsilon = \log n / n$.
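The role of (A.1), namely that $E K_1(x)$ is of the same order as the small ball probability φ(h) (assumed here to take the usual two-sided form $C\varphi(h) \le E K_1(x) \le C'\varphi(h)$), can be illustrated in a toy case. In this Python sketch, a hypothetical scalar stand-in for the functional setting, X is uniform on [0, 1], x = 0.5 and d is the absolute difference, so $\varphi(h) = P(|X - x| \le h) = 2h$ for $h \le 1/2$, and $K(u) = 1 - u$ on [0, 1) satisfies K(1) = 0. Then $E K_1(x) = 2h\int_0^1 K(v)\,dv = h$, so the ratio $E K_1(x)/\varphi(h)$ stays near 1/2 whatever h is.

```python
import numpy as np

def triangular_kernel(u: np.ndarray) -> np.ndarray:
    """K(u) = 1 - u on [0, 1), 0 elsewhere (so K(1) = 0)."""
    return np.where((u >= 0.0) & (u < 1.0), 1.0 - u, 0.0)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=1_000_000)   # toy scalar predictor, X ~ U[0, 1]
x = 0.5
d = np.abs(X - x)                            # d(x, X_i)

ratios = {}
for h in (0.2, 0.1, 0.05):
    phi_h = 2.0 * h                          # small ball probability P(d(x, X) <= h)
    mean_K = float(np.mean(triangular_kernel(d / h)))   # Monte Carlo estimate of E K_1(x)
    ratios[h] = mean_K / phi_h
print(ratios)   # all close to 1/2, the integral of K over [0, 1]
```

A kernel with K(1) = 0 gives little weight to points near the boundary of the ball; in this toy case the ratio still stays bounded away from zero, which is the kind of behaviour the extra condition in (H4) is meant to guarantee jointly with φ.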
Note that condition (H5) implies that, for n large enough,

in such a way that (H6) implies both that

Proof of Lemma 3.2. One has

Hence, we get

Thus, with hypotheses (H1), (H2) and (A.1), we have, for n large enough,

This last inequality completes the proof, since C does not depend on x.

F. Ferraty et al.
We now consider the following decomposition:

We first deal with the terms $G_1$ and $G_3$, which are the simplest ones in the sense that they are not linked with the functional nature of Y, so that previous results for scalar responses Y can be used to treat them. The term $G_2$ needs more specific attention.
i) Study of the term $G_1$. We get directly from (H1) and (A.1):

In a first attempt, assume that K(1) = 0 (i.e. K is Lipschitz on [0, 1]) in order to get:

Clearly, we get from (H3):

which implies that

Moreover, by using the second result in (A.2) together with the definition of $\epsilon$, we have, for n large enough:

These last two results lead directly to

So, by applying Corollary A.8 in [7] with $a^2 = \epsilon/(h\,\varphi(h))$, one gets

Finally, applying (A.3) again for m = 1, one gets

Now, using (H6) together with the second part of (A.2) and with the definition of $\epsilon$, we get:

The proof of (A.4) for the case K(1) > C > 0 (i.e. K Lipschitz on [0, 1)) is not presented here, since one can proceed exactly as in Lemma 6 in [4] by splitting $G_1$ again into three terms.

ii) Study of the term $G_3$. By definition of $G_3$ we have:

the first inequality coming from the contractive property of the expectation operator (see [2], page 29). So we finally have $G_3 \le E G_1$ which, combined with (A.4), leads directly to

iii) Study of the term $G_2$. This part is the most technical because it directly involves the functional response Y. This is the main specificity of our work, and so $G_2$ cannot be treated by the same techniques as if Y were real (as for $G_1$ and $G_3$). The proof will use the exponential inequality for Banach-space-valued random variables (see Lemma 2.1). Let:

It is clear that, $\forall \eta > 0$,

Choosing now

we have

To apply the inequality of Lemma 2.1, one must evaluate the quantities

Using condition (H3), we have for all $j \le m$:

leading finally, by using the result (A.1) and the boundedness of K (see (H4)), to
$$E\|Y_1 K_1(x)\|^j \le C\, j!\, \varphi(h). \qquad (A.9)$$
Now we use Newton's binomial expansion and we get:

where $C_{k,m} = \frac{m!}{k!(m-k)!}$. We get by (A.9):

Note that we have used the fact that

Because of (A.10), we are now in a position to apply Lemma 2.1, by taking

and $t = \eta_0 \sqrt{\psi_{S_{\mathcal F}}(\epsilon)/(n\varphi(h))}$, and we arrive at

the second inequality coming from the first result in (A.2).
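The Newton binomial step rests on the elementary bound $\|a - b\|^m \le (\|a\| + \|b\|)^m = \sum_{k=0}^m C_{k,m}\|a\|^k\|b\|^{m-k}$, valid in any normed space by the triangle inequality. Applying it to centred terms, for instance with $a = Y_1K_1(x)$ and $b = E(Y_1K_1(x))$, is an assumption of this sketch. The Python code below checks the bound on random vectors of $\mathbb R^3$ (hypothetical test data) and confirms that `math.comb` matches $C_{k,m} = m!/(k!(m-k)!)$.

```python
import math
import numpy as np

def binomial_bound(a: np.ndarray, b: np.ndarray, m: int) -> float:
    """sum_{k=0}^{m} C_{k,m} ||a||^k ||b||^(m-k), which equals (||a|| + ||b||)^m."""
    na, nb = float(np.linalg.norm(a)), float(np.linalg.norm(b))
    return sum(math.comb(m, k) * na**k * nb ** (m - k) for k in range(m + 1))

rng = np.random.default_rng(4)
ok = []
for _ in range(1000):
    a, b = rng.normal(size=3), rng.normal(size=3)
    m = int(rng.integers(1, 8))              # moment order m in {1, ..., 7}
    lhs = float(np.linalg.norm(a - b)) ** m  # ||a - b||^m
    ok.append(lhs <= binomial_bound(a, b, m) * (1.0 + 1e-12))  # small tolerance for rounding
print(all(ok))

# C_{k,m} = m! / (k! (m - k)!) agrees with math.comb
assert all(math.comb(m, k) == math.factorial(m) // (math.factorial(k) * math.factorial(m - k))
           for m in range(8) for k in range(m + 1))
```

Combined with moment bounds of the form (A.9) for each power k, this expansion is what turns the individual moment estimates into the summed moment condition required by Lemma 2.1.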
Therefore, by using (A.11), (A.7) and (A.6), we have:

Because of (H7), there exists some β > 1 such that