Superoptimal Rate of Convergence in Nonparametric Estimation for Functional Valued Processes

We consider the nonparametric estimation of the generalised regression function for continuous time processes with irregular paths when the regressor takes values in a semimetric space. We establish the mean-square convergence of our estimator with the same superoptimal rate as when the regressor is real valued.


Introduction
Since the pioneering works of [1,2], the nonparametric estimation of the regression function has been widely studied for real and vectorial regressors (see, e.g., [3][4][5][6][7][8]) and, more recently, the case when the regressor takes values in a semimetric space of infinite dimension has been addressed. Interest in this type of explanatory variable has increased quickly since the foundational work of Ramsay and Silverman (1997), who proposed efficient methods for linear modelling (see [9] for a reissue of this work or [10,11] for other developments on this topic). Later, fully nonparametric methods were proposed (e.g., [12][13][14][15]), but the increased generality comes at a price in terms of convergence rate: in the regression estimation framework, it is well known that the efficiency of a nonparametric estimator decreases quickly when the dimension of the regressor grows. This problem, known as the "curse of dimensionality," is due to the sparsity of data in high dimensional spaces. However, when studying continuous time processes with irregular paths, it has been shown in [16] that even when the regressor is R-valued, we can estimate the regression function with the parametric rate of convergence $O(1/\sqrt{T})$. This kind of superoptimal rate of convergence for nonparametric estimators is always obtained under hypotheses on the joint probability density functions of the process which are very similar to those introduced by [17]. Since there is no equivalent of the Lebesgue measure on an infinite-dimensional Hilbert space, the definition of a density is less natural in the infinite-dimensional framework and the classical techniques cannot be applied. Under hypotheses about probabilities of small balls, we show that we can reach superoptimal rates of convergence for nonparametric estimation of the regression function when the regressor takes values in an infinite-dimensional space.
Notations and assumptions are presented in Section 2. Section 3 introduces our estimator and the main result. We comment on hypotheses and results and give some examples of processes fulfilling our hypotheses in Section 4. A numerical study can be found in Section 5. The proofs are postponed to Section 6.

Problem and Assumptions
Let $\{X_t, Y_t\}_{t \in [0,\infty)}$ be a measurable continuous time process defined on a probability space $(\Omega, \mathcal{F}, P)$ and observed for $t \in [0, T]$, where $Y_t$ is real valued and $X_t$ takes values in a semimetric vectorial space H equipped with the semimetric $d(\cdot, \cdot)$. We suppose that the law of $(X_t, Y_t)$ does not depend on $t$ and that there exists a regular version of the conditional probability distribution of $Y_t$ given $X_t$ (see [18][19][20] for conditions giving the existence of the conditional probability). Throughout this paper, C denotes a compact set of H. Let Ψ be a real valued Borel function defined on R and consider the generalized regression function

$$r(x) := \mathbb{E}\left(\Psi(Y_0) \mid X_0 = x\right). \quad (1)$$

We aim to estimate $r$ from $\{X_t, Y_t\}_{t \in [0,T]}$. We gather hereafter the assumptions that are needed to establish our result.

Estimator and Result
We define the generalized regression function estimate by

$$\hat{r}_T(x) = \frac{\int_0^T \Psi(Y_t)\, K\!\left(d(x, X_t)/h_T\right) dt}{\int_0^T K\!\left(d(x, X_t)/h_T\right) dt},$$

where $K(u) = I_{[0,1]}(u)$ is the indicator function on [0, 1] and $h_T$ is a bandwidth decreasing to 0 when $T \to \infty$. Note that this estimator is the same as the one defined in [21, page 130], with the semimetric $d$ used in place of the simple difference used in the real case.
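As an illustration only, the estimator described above can be approximated numerically once the paths are recorded on an equally spaced time grid. The sketch below assumes that the estimator is the usual ratio of two time integrals with the uniform kernel, so that the time step cancels and the estimate reduces to the average of Ψ(Y) over the instants at which the regressor lies within distance h of the target point. The function names `l2_distance` and `uniform_kernel_regression`, and the choice of the L2 distance as semimetric, are hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_distance(f, g, grid):
    """L2 distance between two curves sampled on a common grid (trapezoidal rule)."""
    diff2 = (np.asarray(f) - np.asarray(g)) ** 2
    return float(np.sqrt(np.sum(0.5 * (diff2[1:] + diff2[:-1]) * np.diff(grid))))


def uniform_kernel_regression(x, X_path, Y_path, h, dist, psi=lambda y: y):
    """Riemann-sum approximation of a ratio-of-integrals kernel estimate.

    Assumptions (not from the paper): observations sit on an equally spaced
    time grid, so the time step cancels between numerator and denominator.

    x      : target point (here, a sampled curve)
    X_path : sequence of regressor values, one per time instant
    Y_path : array of real responses, one per time instant
    h      : bandwidth
    dist   : semimetric, called as dist(x, X_t)
    psi    : real valued transformation of the response (identity by default)
    """
    # Uniform kernel on [0, 1]: weight 1 when dist(x, X_t) / h <= 1, else 0.
    weights = np.array([1.0 if dist(x, X_t) <= h else 0.0 for X_t in X_path])
    total = weights.sum()
    if total == 0.0:
        return float("nan")  # no observation falls in the ball of radius h
    return float(np.sum(weights * psi(np.asarray(Y_path))) / total)


# Toy usage: three constant curves on [-1, 1] observed at three instants.
grid = np.linspace(-1.0, 1.0, 201)
X_path = [a * np.ones_like(grid) for a in (0.0, 0.1, 1.0)]
Y_path = np.array([1.0, 3.0, 100.0])
estimate = uniform_kernel_regression(
    np.zeros_like(grid), X_path, Y_path, h=0.2,
    dist=lambda f, g: l2_distance(f, g, grid),
)
```

In this toy example only the first two curves fall within distance 0.2 of the zero function, so the estimate is the average of their two responses. Returning NaN when the ball is empty is a design choice of the sketch, not a convention stated in the paper.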
Theorem 1 explores the performance of $\hat{r}_T(x)$ in terms of mean-square error.
We can compare this rate of convergence with the one obtained for discrete time processes in [14], which is, with our notations,

Note that, with infinite-dimensional variables, the small ball probability can decrease to zero, when ℎ tends to zero, at an exponential rate, so that ℎ has to tend to zero at a logarithmic rate.

Comments and Examples
(H1) is a very classical Hölderian condition on the true regression function, but, in the infinite-dimensional framework, this condition depends on the semimetric used.
The assumption on small balls probabilities given in (H2)-(i) is widely used in nonparametric estimation for functional data (see, e.g., the monograph [22]). However, we want to point out the fact that if we define equivalence classes using the semidistance , we can construct a quotient space on which is a distance and if this quotient space is infinitedimensional, then this condition can be satisfied only very locally in that for any point of our compact C, we can find, for any > 0, a point and a positive number ℎ < such that ( , ) < and ( 0 ∈ B( , ℎ)) ≤ 1 (ℎ): in that case, we could not extend our hypothesis to every point in an open ball (see [23] for a result on the consequences of a similar hypothesis on every point in a ball).
The most specific and restrictive assumption is (H2)-(ii), which is an adaptation to infinite-dimensional processes of the conditions on the density function introduced in [17] for real valued processes and transposed in [21, pages 135-136] to the estimation of the regression function with a vectorial regressor. Note that when H = R and (ℎ) = ℎ , the rate of convergence obtained in Theorem 5.3 in [21, page 136] is the same as the one we obtain here, and the condition I2 used implies (H2)-(ii). On the other hand, processes can meet (H2)-(ii) and infringe the condition in [21], especially when the vectorial process does not admit a density. For real valued processes, a slightly different version of the Castellana and Leadbetter hypothesis on the joint density is given in [24] where it is shown that this hypothesis is satisfied for a wide class of diffusion processes, including the Ornstein-Uhlenbeck Process: these processes are also examples of the range of applications of our result. Real continuoustime fractional ARMA processes studied in [25] are given as examples in [26]. Depending on the choice of the impulse response functions, a vector composed of such processes can fulfil (H2)-(ii) for any : using the notations of [25], if (( 1, ), . . . , ( , )) are independent processes complying with conditions of Proposition 4 in [25] with > −1/2 − 1/ and > 0, then the vectorial process (( 1, ), . . . , ( , )) meets (H2)(ii). For processes valued in infinite-dimensional spaces, we can also give the example of hidden processes: let ( ) be a nonobserved process valued in R , for which conditions of Theorem 5.3 in [21, page 136] hold for every in a compact A, let Γ be an unknown function from R to a space H (that can be infinite-dimensional) equipped with a semimetric , and let ( ) = (Γ( )) be an observed process. 
If there exist two positive constants ( , ) such that for any > , ( ) does not satisfy the assumptions usually imposed to vectorial processes to obtain a superoptimal rate.
There are two conditions in (H3). The condition | ( | , )| ≤ 1 (| − |) is less restrictive than imposing that the regressor and the noise are independent. | ( is a weak condition on the decay of dependence as the distance between observations increases, and ( ) may not be -mixing. Note that we do not impose to ( ) to be an irregular path process.
At last, it is much less restrictive to impose (H4) than to suppose that Ψ( ) is bounded. In particular, this assumption allows us to consider the model where is a bounded function, ( ) is a square integrable process, and and ( ) are independent. On a given space, we can define many semidistances and hypotheses (H1)-(H2,) as well as the estimator itself, depending largely on the choice of this semidistance: the importance of this choice is widely discussed in [22] and a method to choose the semimetric for independent variables is proposed in [27], but this method does not ensure that (H1) holds. Actually, we can obtain a semimetric such that ( , ) = 0 ( ) = ( ). It would be of interest to develop a data driven method adapted to continuous time processes to select the semimetric.
In the statement of our theorem, we impose that ℎ = T −1/ where is an unknown parameter so that the adaptation to continuous time processes of the method developed in [28] to choose the bandwidth would be interesting but is not in theory necessary in our framework. In point of fact, and it is what was very surprising when Castellana and Leadbetter first obtained a superoptimal rate of convergence, the bound for the variance of the estimator does not depend on ℎ and we can choose ℎ = − log( ) which will always satisfy ℎ = T −1/ for large enough: even if this choice has no reason to be optimal, it leads to the claimed superoptimal rate of convergence.
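The claim that this bandwidth choice eventually satisfies any polynomial constraint can be checked in one line. The computation below reads the bandwidth as $h_T = T^{-\log T}$ and uses a generic exponent $a > 0$, which is a placeholder and not the paper's notation.

```latex
\[
T^{-\log T} \le T^{-a}
\iff -(\log T)^{2} \le -a \log T
\iff \log T \ge a
\iff T \ge e^{a},
\qquad T > 1,\ a > 0 .
\]
```

So, whatever the value of the unknown exponent, the inequality holds for every $T \ge e^{a}$, without any knowledge of $a$ being needed to define the bandwidth.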
Recently, results have been obtained in the case where the response is valued in a Banach space, which can be infinite-dimensional (see [29,30]). Note that, as long as Ψ is a real valued Borel function, there is no need to change our proofs to obtain our result if the response is valued in a Banach space. However, in the case where Ψ itself takes values in a Banach space, we could not easily adapt our proofs, and obtaining a superoptimal rate would involve very different techniques; this would be an interesting extension for further work.

Simulations
We chose L 2 ([−1, 1]) endowed with its natural norm as the functional space and simulated our process as follows.
At first we simulated an Ornstein-Uhlenbeck process solution of the stochastic differential equation where denotes a Wiener process. Here, we took = 0.0005.
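For readers who want to reproduce this step, a minimal sketch of an Ornstein-Uhlenbeck simulation follows. It assumes the standard form $dZ_t = -\theta Z_t\,dt + \sigma\,dW_t$ started from its stationary law and uses the exact Gaussian transition over one time step. The function name `simulate_ou`, the parameters `theta` and `sigma`, and their default values are hypothetical; the default step 0.0005 mirrors the value quoted above on the assumption that it denotes the time step.

```python
import numpy as np


def simulate_ou(T, delta=0.0005, theta=1.0, sigma=1.0, seed=0):
    """Simulate a stationary Ornstein-Uhlenbeck path on [0, T].

    Assumed model (not taken from the paper): dZ = -theta * Z dt + sigma dW.
    The exact transition Z_{t+delta} | Z_t is Gaussian, so the recursion
    below introduces no discretization bias.
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / delta))
    decay = np.exp(-theta * delta)
    # Standard deviation of the Gaussian innovation over one step.
    step_sd = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    z = np.empty(n + 1)
    # Stationary start: Z_0 ~ N(0, sigma^2 / (2 * theta)).
    z[0] = rng.normal(0.0, sigma / np.sqrt(2.0 * theta))
    noise = rng.normal(size=n)
    for k in range(n):
        z[k + 1] = decay * z[k] + step_sd * noise[k]
    times = delta * np.arange(n + 1)
    return times, z


times, z = simulate_ou(T=1.0)
```

The exact transition is used here rather than an Euler scheme because it stays stable for any step size; with a step as small as the one above, the two approaches would give very similar paths.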
For any square integrable function on [−1, 1], we define the function and set where = − −1 and is a Wiener process independent of .
In order to obtain a panel of 20 points (in $L^2([-1, 1])$) where we can evaluate the regression function, we did a first simulation with $T = 10$ and set $C := (X_{k/2},\ k \in \{1, 2, \ldots, 20\})$. Once obtained, C is considered as a deterministic set. We represent these functions in Figure 1.
Remark. We check here that the simulated processes fulfil our hypotheses.   The Ornstein-Uhlenbeck process satisfies the part of Condition I2 on the regressor's density in [21, page 136]. Moreover, Γ is a bijection from R to Im(Γ), and it can be shown that, for some constant , there exist 0 < < such that for any 0 < ℎ < and any ∈ Γ −1 (C), the two following implications are correct: which implies that (H2)(i)-(ii) are fulfilled when taking (ℎ) = ℎ.
Since ( ) and ( ) are independent and Cov( , ) = 0 if | − | > 1, (H3) is satisfied. Finally, the model used in the simulation corresponds to the choice of the identity function for Ψ in (1), where ( ) is an unbounded process and (⋅) is not a bounded function. However, (⋅) is bounded on Im(Γ) and so (H4) is fulfilled.
We simulated the paths of the process ( , ) ∈[0, ] for different values of . Figure 2 represents the path of the process ( ) for ∈ [0, 1].
We estimated the regression function at each point in C, for different values of $T$, and compared our results to those obtained when studying a discrete time functional process, that is, when we observe $(X_t, Y_t)$ only for $t \in \mathbb{N}$ and use the estimator defined in [12] with the indicator function as the kernel: it corresponds to an infinite-dimensional version of the Nadaraya-Watson estimator with a uniform kernel. When working with the discrete time process, we used the data-driven way of choosing the bandwidth proposed in [28]. When working with the continuous time process, which is observed on a very thin grid, for $T = 50$ we chose the same bandwidth as the one used for the discrete time process and, for $T > 50$, we supposed $r$ to be Lipschitz (i.e., with a Hölder exponent equal to 1, which is the case here) and used the bandwidth $h_T = h_{50}(50/T)$. In Table 1, we give the mean square error evaluated on the functions of the panel for $T = 50$, $500$, and $2000$.

We can see that, for $T = 50$, we already have a smaller mean square error with the estimator using the continuous time process, and when $T$ increases, the mean square error seems to decrease much more quickly when working with the continuous time process. However, the continuous time approach takes much more time and much more memory; we had to split the calculation into several parts and delete intermediate results to avoid saturating memory.
In Figures 3 and 4, the abscissa gives the value of the true regression function applied to each function of our panel and the ordinate gives the estimated value of the regression function. We represent on the left the results for the continuous time estimator and on the right the results for the discrete time estimator.

Proofs
Proof of Lemma 2. Observe that, for any $x \in C$,
Hence,
This ends the proof of Lemma 2.
Proof of Lemma 3. For any $x \in C$, by Fubini's Theorem, we have
Upper Bound of the Covariance Term. In order to simplify the notations, we set The triangle inequality and Jensen's inequality yield
Upper Bound for . Using (H2)-(ii), we have
Upper Bound for . Owing to (H1), we have Δ , | | ≤ Δ , sup ∈B( ,ℎ ) | ( ) − ( )| ≤ Δ , ℎ . It follows from this inequality and (H2)-(i) that
Upper Bound for . By similar techniques to those in the bound for and (H3), we obtain On the other hand, by (H2)-(ii), Hence, Therefore, setting the obtained upper bounds for , , and yield
Final Bound. Combining (24) and (38) and using (H2)-(i), we have We finally obtain Putting the obtained upper bounds for , , and together, we have $\sup_{x \in C} \mathbb{E}\left(\hat{r}_T(x) - r(x)\right)^2 = O(1/T)$.
Theorem 1 is proved.