Nonparametric relative recursive regression

Abstract: In this paper, we consider the problem of estimating a regression function recursively based on the minimization of the Mean Squared Relative Error (MSRE), in the presence of outliers and when the response variable of the model is positive. We construct an alternative estimator of the regression function using a stochastic approximation method. The bias, the variance, and the Mean Integrated Squared Error (MISE) are computed explicitly, and the asymptotic normality of the proposed estimator is proved. Moreover, we compare the performance of our proposed estimators with that of two classical kernel regression estimators, first through a simulation study and then on a real malaria dataset.


Introduction
Nonparametric regression provides a useful diagnostic tool for data analysis. A standard mathematical formulation is to estimate the link between a response Y and a covariate X by means of a function r(x) which achieves the minimum of the mean squared error (MSE), based on a random sample (X_1, Y_1), . . . , (X_n, Y_n) from an unknown joint density f(·, ·), where the covariates (X_i), for i ∈ {1, . . . , n}, take values in the finite-dimensional space R^d. Nonparametric regression methods have attracted much attention among statisticians in the last several decades, and a large literature now exists. When multiple covariates are involved, multivariate nonparametric regression has proved to be very useful in practice. [45,46] have shown that local regression estimators achieve optimal rates of convergence, and Cleveland and Devlin [6] have demonstrated that they are very useful in modeling data. [34] derived the asymptotic properties of the multivariate local linear and local quadratic estimators. [49] studied multivariate plug-in bandwidth selection, and Herman et al. [14] proposed plug-in approaches for the bivariate convolution kernel estimator. Eubank [10], [48], and [12] described thin plate smoothing splines. Often, in nonparametric estimation, the least squares and the least absolute deviation criteria are used to construct predictors. However, in many practical situations the MSRE is a more appropriate measure of performance than the two previous criteria; see [18] for some models in software engineering, [4] for some examples in medicine, or [5] for some financial applications. Let us underline that the classical estimation procedure (based on the MSE) relies on a restrictive condition, namely homoscedasticity. It gives the same weight to all observations, which is inadequate when the data contain outliers.
Although relative error is not widely studied in the statistical literature, there are methods designed for relative-error performance. We can cite the work of [28], which studies an estimation method minimizing the sum of absolute relative residuals. Similarly, [11] developed an estimation method designed to reduce the absolute relative error. Moreover, [18] studied the asymptotic properties of estimators obtained by minimizing the sum of squared relative errors. [15] introduced and studied local constant and local linear nonparametric regression estimators when it is appropriate to assess performance in terms of the mean squared relative error of prediction. [51] established the connection between relative-error estimators and M-estimation in the linear model. [2] considered the case of spatial data. [7] considered the case where the explanatory variables are of functional type, [1] investigated functional nonparametric regression estimation when the response is subject to left-truncation by another random variable, [43] considered recursive estimation of the regression function for functional data, while [8] studied M-estimation in functional nonparametric regression when the response variable is subject to left-truncation by another random variable. Relative error is sometimes a more meaningful measure of the performance of a predictor than absolute error; generally, this occurs when the range of predicted values is large.
In this paper, we construct an alternative kernel regression estimator using a recursive method, by considering the problem of estimating the regression function based on the minimization of the MSRE. We address recursive kernel estimators, where recursive means that the estimator calculated from the first n observations, say f_n, is a function only of f_{n−1} and the n-th observation. The Robbins-Monro algorithm was originally proposed by [32] and further developed, investigated, and applied in many different situations (see, among many others, [9, 19-21, 25, 26, 30, 31, 33, 37, 39-42, 47]).
As is well known, such a recursive property works well within the framework of data streams. Streaming data are massive data arriving in streams; if they are not processed immediately or stored, they are lost forever. The sample data are obtained by means of an observational mechanism that allows for a rapid increase in the sample size over time. In recent years, data streams have become an increasingly important area of research. Common data streams include Twitter activity, the Facebook news stream, Internet packet data, stock market activity, credit card transactions, and Internet and phone usage. In those situations, the data arrive so rapidly that it is impossible for the user to store them all on disk (as in a traditional database) and then interact with them at a time of their choosing. Consequently, when dealing with such big data, traditional nonparametric techniques rapidly require a lot of computation time and therefore become useless in practice. The development of methods for processing and analyzing these data streams effectively and efficiently has thus become a challenging problem in statistics and computational science. This is why we consider the regression estimation problem in the context of data streams in this paper. The proposed recursive estimator shows good theoretical properties from the point of view of the relative mean squared error.
The general idea of the proposed recursive methods is described in Section 2. Asymptotic MSRE properties of the recursive regression estimator are given and discussed in Section 3. A simulation study is presented in Section 4. In Section 5, we consider a real Malaria dataset. We conclude the paper in Section 6, whereas the technical details are deferred to Section 7.

Presentation of estimates
Given independent and identically distributed (i.i.d.) observations (X_1, Y_1), . . . , (X_n, Y_n) with joint density function f(x, y), let f denote the probability density of X. In regression analysis, our interest is the prediction of Y given X, which consists in finding a function η(X) which satisfies problem (1).
In order to construct a stochastic algorithm for the estimation of the regression function r(x) at a point x, we define an algorithm that searches for the zero of the function h : y → r(x) − y. Since x is a fixed point, the value r(x) is the unique solution of the equation h(y) = 0 in the unknown y.
A typical kernel-based estimator can then be obtained through the Robbins-Monro procedure (see [32]). However, the use of the previous loss function as a measure of prediction performance may not be suitable in some situations; in particular, the presence of outliers can lead to unreasonable results, since all observations receive the same weight. To overcome this limitation, we propose to estimate the function r through an alternative loss function.
In relative regression analysis, r(x) is obtained by minimizing the mean squared relative error (MSRE), i.e., r(x) is the solution of the corresponding optimization problem. It is clear that this criterion is a more meaningful measure of prediction performance than the least squares error, in particular when the range of predicted values is large. Moreover, the solution of this problem can be expressed as the ratio of the first two conditional inverse moments of Y given X: as shown by [29], the ratio E[Y^{-1} | X = x] / E[Y^{-2} | X = x] is the best MSRE predictor of Y given X. Thus, we can estimate r(x) by estimating this ratio.

In order to construct a stochastic algorithm for the estimation of the function ϕ at a point x, we define an algorithm searching for the zero of the function h : y → ϕ(x) − y. Following the Robbins-Monro procedure (see [32]), this algorithm is defined by choosing an initial value ϕ_0(x) ∈ R and setting, for all n ≥ 1, an update driven by W_n(x), an observation of the function h at the point ϕ_{n−1}(x), where the stepsize (γ_n) is a sequence of positive real numbers that goes to zero. The resulting estimator ϕ_n recursively estimates the function ϕ at the point x, with Π_n = ∏_{i=1}^{n} (1 − γ_i). Following similar steps, we can estimate the function ψ recursively at the point x.

Our proposal in this paper is then the estimator (4). The purpose of this paper is the study of the properties of the proposed relative recursive regression estimator (4) and its comparison with the direct analogue of the well-known Nadaraya-Watson estimator, introduced separately by [27] and [50], and defined in (5). The latter estimator was proposed in [15]; the strong consistency and the asymptotic normality of this estimator under weak dependence conditions are given in [22], while the case of censored data was considered in [16].
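The construction above can be sketched numerically. The following is a minimal illustration, not the paper's exact algorithm: the non-recursive estimator is the ratio of kernel-weighted inverse moments of Y, and the recursive version updates that ratio one observation at a time with stepsize γ_n = n^{-1}; the grid, the kernel, and the bandwidth exponent are illustrative assumptions.

```python
import numpy as np

def relative_nw(x, X, Y, h):
    """Non-recursive relative-error regression at x: ratio of the kernel-weighted
    first and second inverse moments of Y (direct analogue of Nadaraya-Watson)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)  # Gaussian kernel weights
    return np.sum(w / Y) / np.sum(w / Y ** 2)

class RecursiveRelativeRegression:
    """Robbins-Monro style recursive estimator on a fixed evaluation grid:
    each new observation updates the numerator and denominator terms."""
    def __init__(self, grid, a=0.2, gamma0=1.0):
        self.grid = np.asarray(grid, dtype=float)
        self.num = np.zeros_like(self.grid)   # recursive estimate of the Y^{-1} term
        self.den = np.zeros_like(self.grid)   # recursive estimate of the Y^{-2} term
        self.n, self.a, self.gamma0 = 0, a, gamma0
    def update(self, xn, yn):
        self.n += 1
        h = self.n ** (-self.a)               # bandwidth h_n in GS(-a), illustrative
        g = self.gamma0 / self.n              # stepsize gamma_n = n^{-1}
        k = np.exp(-0.5 * ((self.grid - xn) / h) ** 2) / h
        self.num = (1 - g) * self.num + g * k / yn
        self.den = (1 - g) * self.den + g * k / yn ** 2
    def estimate(self):
        return self.num / self.den
```

As a sanity check, if Y is constant equal to c, both estimators return exactly c, since the numerator is c times the denominator pointwise.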

Assumptions and main results
We define the following class of regularly varying sequences.
Definition 1. Let γ ∈ R and let (v_n)_{n≥1} be a nonrandom positive sequence. We say that (v_n) ∈ GS(γ) if

lim_{n→∞} n (1 − v_{n−1}/v_n) = γ. (6)

Condition (6) was introduced by [13] to define regularly varying sequences (see also [3]) and by [24] in the context of stochastic approximation algorithms. Note that the acronym GS stands for Galambos and Seneta. Typical sequences in GS(γ) are, for b ∈ R, n^γ (log n)^b, n^γ (log log n)^b, and so on.
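The GS condition is easy to check empirically. The snippet below evaluates n(1 − v(n−1)/v(n)) at a large n for the typical sequences listed above; the function name and the chosen exponents are illustrative. For v_n = n^γ the quantity is very close to γ already, while the (log n)^b factor adds a slowly vanishing correction of order b/log n.

```python
import math

def gs_index(v, n=10**6):
    """Empirical value of n * (1 - v(n-1)/v(n)); for (v_n) in GS(gamma)
    this quantity tends to gamma as n grows."""
    return n * (1.0 - v(n - 1) / v(n))

# v_n = n^gamma is in GS(gamma); the index converges at rate O(1/n)
idx_plain = gs_index(lambda n: n ** -0.2)
# v_n = n^gamma * log n is also in GS(gamma), but the log factor
# contributes a correction of order 1/log(n), so convergence is slow
idx_log = gs_index(lambda n: n ** -0.2 * math.log(n))
```

This illustrates why bandwidths (h_n) ∈ GS(−a) behave, to first order, like n^{−a} in the asymptotic analysis.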
In this section, we investigate the asymptotic properties of the proposed estimator (4). The assumptions to which we shall refer include the following: (iv) the function ψ(x) > 0, and the inverse moments of the response variable are bounded, i.e., E[Y^{−m} | X = x] ≤ C < ∞ for all relevant orders m.

Discussion of the assumptions
It is interesting to underline the intuition behind the use of a bandwidth (h_n) belonging to GS(−a): the ratio h_{n−1}/h_n is then equal to 1 + a/n + o(1/n). With such a bandwidth, and using assumption (A ) on the bandwidth and on the stepsize, Lemma 2 ensures that the bias and the variance depend only on h_n and not on the whole sequence h_1, . . . , h_n; the MISE then also depends only on h_n, which is helpful for deducing an optimal bandwidth. Moreover, to help the reader follow the main results of this paper, we underline that under assumption (A ) an explicit asymptotic equivalent of Π_n is available.

Assumptions (A ) and (A ) are regularity conditions which allow us to evaluate the bias term and the variance term of the estimator (4). Moreover, (A ) includes some technical conditions imposed for brevity of the proofs and to obtain a convergence rate; some work in progress plans to consider less restrictive conditions. Assumption (A ) (iii) is usual in the framework of stochastic approximation algorithms; it implies in particular that the limit of (nγ_n)^{−1} is finite. For simplicity, we introduce the corresponding notations below.

Results on the relative recursive regression estimator (4)

In this section, we make explicit the choice of the bandwidth (h_n) through a plug-in method, which consists in considering an asymptotically unbiased estimator of the unknown quantities appearing in the expression of the theoretical bandwidth, and hence in the expression of the corresponding MISE. Our first result is the following proposition, which gives the bias and the variance of r_n.
The two regimes a ∈ [α/(d + 4), 1) and a ∈ (0, α/(d + 4)) lead to different leading terms; the bias and the variance of the estimator r_n defined by the stochastic approximation algorithm (4) therefore heavily depend on the choice of the stepsize (γ_n).
We now state the following theorem, which gives the weak convergence rate of the estimator r_n defined in (4).
where →_D denotes convergence in distribution, N the Gaussian distribution, and →_P convergence in probability.
Let us now consider the case where the bandwidth h_n is chosen so that lim_{n→∞} γ_n^{−1} h_n^{d+4} = 0 (which corresponds to undersmoothing). Then the proposed estimator satisfies a central limit theorem. Let Φ denote the distribution function of N(0, 1), and let t_{α/2} be such that Φ(t_{α/2}) = 1 − α/2 (where α ∈ (0, 1)). Then an approximate asymptotic confidence interval for r(x), with level 1 − α, follows, where V̂(x) is the empirical estimator of V(x): we estimate r by r_n and f by f_n, where f_n is the recursive kernel density estimator. In order to obtain a theoretical expression of the bandwidth (h_n), we state the following proposition, which gives the MISE of the estimator r_n.
If a > α/(d + 4), the corresponding expression of the MISE follows. The following corollary is a direct consequence of the previous proposition, together with the corresponding MISE. We can observe that the proposed optimal bandwidth depends on the unknown quantities V and B. To overcome this problem, we follow the plug-in method proposed in [36], which leads to the kernel estimators below, where K_b^{(j)} is the j-th derivative of a kernel K_b and b_n the associated bandwidth. Following the approach proposed in [36,37], we showed that b_n and b'_n should belong to suitable classes GS(−β) and GS(−β′), respectively; in practice, we use (13) with the corresponding choices of β and β′, where (see [35]) s is the sample standard deviation, and Q_1 and Q_3 denote the first and third quartiles, respectively. Then, we have the following corollary.
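The role of s, Q_1, and Q_3 in the plug-in step can be illustrated by a Silverman-type robust scale rule. This is a generic sketch, not the paper's exact formula: the constants 1.349 and 1.06 and the function names are standard rule-of-thumb choices assumed here for illustration.

```python
import numpy as np

def robust_scale(X):
    """Robust spread estimate min(s, (Q3 - Q1)/1.349), where s is the sample
    standard deviation; 1.349 is the normal-distribution IQR/sigma ratio."""
    X = np.asarray(X, dtype=float)
    s = X.std(ddof=1)
    q1, q3 = np.percentile(X, [25, 75])
    return min(s, (q3 - q1) / 1.349)

def pilot_bandwidth(X, beta, c=1.06):
    """Pilot bandwidth b_n = c * scale * n^(-beta), with beta the GS exponent
    chosen for the plug-in step (the constant c is illustrative)."""
    return c * robust_scale(X) * len(X) ** (-beta)
```

Taking the minimum of the two spread estimates protects the pilot bandwidth against heavy tails and outliers, in the same spirit as the relative-error criterion itself.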
and the corresponding MISE follows.

Results on the relative non-recursive regression estimator
Let us state the following lemma, which gives the bias and the variance of the relative non-recursive regression estimator (5); its proof follows easily from that of Theorem 1.

Lemma 1 (Bias and variance of r_n). Let Assumptions (A ), (A ) (ii), and (A ) hold, and assume that f is continuous at x. Then the bias and the variance of the non-recursive estimator follow from Lemma 1. Let us now consider the case where the bandwidth h_n is chosen so that lim_{n→∞} n h_n^{d+4} = 0 (which corresponds to undersmoothing). Then the non-recursive estimator r_n satisfies a central limit theorem. Let Φ denote the distribution function of N(0, 1), and let t_{α/2} be such that Φ(t_{α/2}) = 1 − α/2 (where α ∈ (0, 1)). Then an approximate asymptotic confidence interval for r(x), with level 1 − α, follows, where V̂(x) is the empirical estimator of V(x): we estimate r by r_n and f by f_n, where f_n is the non-recursive kernel density estimator. To minimize the asymptotic MISE of r_n, the bandwidth (h_n) must then be chosen accordingly, with the corresponding MISE. Since (16) depends on the unknown quantities V(x) R(K) and B(x), we consider the corresponding kernel estimators. Following similar steps as in [36,37], we showed that b_n and b'_n should belong to suitable classes GS(−β) and GS(−β′), respectively; in practice, we use (13) with the corresponding choices of β and β′. The corresponding MISE then follows.

Simulations
In order to compare the three estimators, we consider three sample sizes and the following model: Z = µ(X) + ε and Y = exp(Z), which ensures that the response variable is strictly positive. We use the standard normal kernel K(z_1, . . . , z_d) = (2π)^{−d/2} exp(−(z_1^2 + · · · + z_d^2)/2).
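A minimal version of this simulation design can be sketched as follows. The mean function, noise level, sample size, and bandwidth below are illustrative stand-ins, not the paper's exact settings; note that with lognormal noise the MSRE-optimal target E[Y^{-1}|X]/E[Y^{-2}|X] equals exp(µ(x) − 3σ²/2), which serves as ground truth here.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 0.1                       # illustrative noise level

def mu(x):
    return 0.5 + x                # illustrative linear mean function

def simulate(n):
    X = rng.uniform(0.0, 1.0, n)
    Z = mu(X) + rng.normal(0.0, sigma, n)
    return X, np.exp(Z)           # Y = exp(Z) is strictly positive

def relative_nw(x, X, Y, h):
    """Relative-error regression: ratio of inverse-moment kernel sums."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(w / Y) / np.sum(w / Y ** 2)

X, Y = simulate(2000)
h = 2000 ** (-1.0 / 5.0)          # rough n^(-1/(d+4)) rate with d = 1
grid = np.linspace(0.3, 0.7, 5)   # interior points, away from boundary effects
est = np.array([relative_nw(x, X, Y, h) for x in grid])
truth = np.exp(mu(grid) - 1.5 * sigma ** 2)
```

With Z | X ~ N(µ(X), σ²), one has E[Y^{-1}|X] = exp(−µ + σ²/2) and E[Y^{-2}|X] = exp(−2µ + 2σ²), so their ratio is exp(µ − 3σ²/2), slightly below the conditional median exp(µ).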

Computational cost
In order to give some comparative elements with respect to the direct analogue of the well-known Nadaraya-Watson estimator (5), including computational costs, we consider an initial sample of size n_0 (a fixed fraction of n) and suppose that we then receive an additional sample of size n − n_0. One can check that it follows from (2) that the recursive estimator can be updated sequentially from the new observations alone. It is clear that the use of the proposed estimator (4) can considerably improve the computational cost.

In the tables, non-recursive corresponds to the estimator (5) using the plug-in bandwidth selection (17); Recursive 1 corresponds to the estimator (4) using the proposed plug-in bandwidth selection (14) and the stepsize (γ_n) = n^{−1}; Recursive 2 corresponds to the estimator (4) using the proposed plug-in bandwidth selection (14) and the stepsize (γ_n) = h_n / Σ_{k=1}^{n} h_k; the reported results use model 1, where µ(X) is linear in x, X is uniform, and ε is Gaussian. From Table 1, we conclude that: 1. the proposed relative regression estimators (4) and (5) are close to the true regression function; 2. the three estimators, namely our two relative recursive regression estimators (4) with (γ_n) = n^{−1} and (γ_n) = h_n / Σ_{k=1}^{n} h_k respectively, and the non-recursive estimator (5), performed very well, and none of the three can be claimed to be best in all cases; 3. the estimators get closer to the true regression function as the sample size increases.
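The computational advantage of the sequential update can be made concrete by counting kernel evaluations. The skeleton below is an illustrative sketch (class and function names are our own): the recursive estimator pays O(grid size) work per new observation regardless of how many observations came before, whereas refreshing a non-recursive estimator revisits all n stored points.

```python
import numpy as np

class RecursiveCounter:
    """Recursive estimator skeleton that counts kernel evaluations:
    each new observation costs O(grid size) work, independent of n."""
    def __init__(self, grid):
        self.grid = np.asarray(grid, dtype=float)
        self.num = np.zeros_like(self.grid)
        self.den = np.zeros_like(self.grid)
        self.n = 0
        self.kernel_evals = 0
    def update(self, xn, yn):
        self.n += 1
        h, g = self.n ** -0.2, 1.0 / self.n   # illustrative bandwidth and stepsize
        k = np.exp(-0.5 * ((self.grid - xn) / h) ** 2) / h
        self.kernel_evals += self.grid.size
        self.num = (1 - g) * self.num + g * k / yn
        self.den = (1 - g) * self.den + g * k / yn ** 2

def batch_refresh_cost(n, grid_size):
    """The non-recursive estimator revisits all n stored points per refresh."""
    return n * grid_size
```

Starting from n_0 points and receiving n − n_0 more, the recursive estimator thus performs only (n − n_0) times the grid size in new kernel evaluations, while a single non-recursive refresh costs n times the grid size.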

Feasibility in terms of confidence intervals
The aim of this subsection is to compare the performance of the non-recursive relative regression estimator (5) with that of the recursive estimator (4) from the confidence interval point of view. We build two intervals: for i = 1, ψ_n = r_n is the non-recursive estimator (5), with the corresponding constant C(ψ_n) and variance estimator V_{1,n}(x); for i = 2, ψ_n = r_n is the recursive estimator (4) with the choice (γ_n) = γ_0 n^{−1}, with the corresponding constant C(ψ_n) and variance estimator V_{2,n}(x). It follows from (15) and (12) that both confidence intervals I_{1,n} and I_{2,n} have the same asymptotic level (equal to 1 − α), whereas the interval based on the recursive estimator has a smaller length. Table 2 gives the empirical levels (#{r(x) ∈ I_{i,n}}/N) for different values of d, σ², and the sample size n, at fixed evaluation points x. Table 2 shows that the recursive estimator with the choice (γ_n) = h_n / Σ_{k=1}^{n} h_k outperforms the non-recursive estimator and the recursive one with the choice (γ_n) = n^{−1}: its intervals attain higher empirical levels. The reported results use model 2, where µ(X) involves a logarithmic transformation, X is uniform, and ε is Gaussian.
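The empirical levels reported in Table 2 are obtained by Monte Carlo repetition. The sketch below shows the generic mechanics in a deliberately simplified setting: a normal-theory interval for a mean stands in for I_{i,n}, and all names and parameters are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def empirical_level(ci_fn, truth, n, N=500, seed=0):
    """Monte-Carlo empirical level #{truth in CI}/N of an interval
    procedure ci_fn(sample) -> (lo, hi), over N replications."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(N):
        sample = rng.normal(truth, 1.0, n)   # toy data-generating process
        lo, hi = ci_fn(sample)
        hits += int(lo <= truth <= hi)
    return hits / N

def mean_ci(sample, z=1.96):
    """Normal-theory 95% interval for a mean, a stand-in for I_{i,n}."""
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(sample.size)
    return m - z * se, m + z * se
```

For a well-calibrated procedure, the returned proportion fluctuates around the nominal level 1 − α; comparing two procedures with the same empirical level but different interval lengths is exactly the comparison made between I_{1,n} and I_{2,n}.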

Real dataset
We considered a dataset of families in Senegal, comprising children between 2 and 19 years old living in two villages of the Niakhar area (Diohine and Toucar). We measured the Plasmodium falciparum Parasite Load (PL) from thick blood smears obtained by finger-prick during two different seasons, regularly over a three-year observation period (2001-2003); the number of measurements per child ranged from 1 to 15. For more details see [23]; these data were also used in [38] in a parametric context.
We had the following variables: 1. family identification: a factor; 2. child identification: a factor; 3. PL: the Parasite Load (strictly positive, since the children considered have a positive PL); 4. infection: a factor with two levels (infected or not infected); 5. year: a factor with three levels (2001, 2002, and 2003); 6. number of measurements per child: a factor; 7. age: age of the child in years, between 2 and 19; 8. season: a factor with two levels (July-October and October-March); 9. village: a factor with two levels (Diohine and Toucar). Figure 3 shows that the parasite load density can be higher in some specific age classes. Moreover, one can observe that the three considered estimators give quite similar age classes; the differences between the estimators are not significant.

Conclusion
In this paper we proposed the recursive relative regression estimator given in (4). The proposed estimators asymptotically follow a normal distribution, and we derived the corresponding asymptotic confidence intervals. The recursive estimators (using the plug-in bandwidth selection (14) with the stepsizes (γ_n) = n^{−1} and (γ_n) = h_n / Σ_{k=1}^{n} h_k) were then compared with the non-recursive one proposed by [15], using the plug-in bandwidth selection developed in Subsection 3.2 (see (17)). We showed that, for particular choices of the stepsize, the proposed estimators can in some situations give better results than the non-recursive approach in terms of estimation error. The simulation study confirms the nice features of our proposed recursive estimators.
In conclusion, the proposed estimators allowed us to obtain good results. A future research direction would be to extend our findings to the α-mixing framework, see [17]. Another direction is to investigate relative regression estimation based on a transformation of the data, in the case when the response variable is not positive, see [44].

Proofs
Throughout this section we use the following notation. Let us first state the following technical lemma.
Lemma 2. Let (v_n) ∈ GS(v*), (γ_n) ∈ GS(−α), and m > 0 be such that m − v*ξ > 0, where ξ is defined in (7). Then the stated equivalences hold; moreover, for every positive sequence (α_n) such that lim_{n→+∞} α_n = 0, and all δ ∈ R, the corresponding bounds hold. Lemma 2 is applied widely throughout the proofs. Let us underline that it is its application which requires Assumption (A )(iii) on the limit of (nγ_n) as n goes to infinity.

Proof of Theorem 1
Let us first note that, for x such that ψ_n(x) ≠ 0, we have the decomposition (18). It follows from (18) that the asymptotic behavior of r_n(x) − r(x) can be deduced from that of B_n(x). Moreover, the following lemma follows from Proposition 1 of [25].
We now prove (28). In view of (19), and using classical computations based on (A ), the application of Lemma 2 ensures the required bounds; the convergence in (28) then follows from the application of Lyapounov's theorem.

Proof of Proposition 1
Following similar steps as in the proof of Proposition 2 of [25], we prove Proposition 1.