On the Adaptive Nadaraya-Watson Kernel Estimator for the Discontinuity in the Presence of Jump Size

In this paper, we study an adaptive Nadaraya-Watson kernel estimator in order to examine the bias on both sides of the discontinuity, in the presence of a jump, in the regression discontinuity model. We propose a modified adaptive Nadaraya-Watson kernel estimator of the jump size and derive its asymptotic normality and variance. We also compare the asymptotic normality and the Mean Integrated Squared Error (MISE) of the adaptive Nadaraya-Watson kernel estimator with those of the Nadaraya-Watson kernel estimator. The results of a simulation study show that the adaptive Nadaraya-Watson estimator performs better than the Nadaraya-Watson kernel estimator.


Introduction
Over the last few years, nonparametric regression has become an important tool for data smoothing. The most commonly used estimates of nonparametric regression functions, including kernel estimates, are based on the smoothness of the regression function. In many applications, however, the function to be estimated has discontinuities or threshold points. For example, Gao et al. [1] note that, when studying the impact of advertising, the time at which the campaign takes place can be modeled by the location of a jump point, and the magnitude of its effect is measured by the jump size. Ignoring the jump point leads to serious errors when drawing inferences about the process under study. See also Yin [2] for estimating the locations of discontinuity points of a regression function.
The regression discontinuity design is useful when there is a threshold point in the treatment or in the probability of treatment assignment. Under weak smoothness assumptions, treatment assignment is effectively random near the threshold point, so the regression discontinuity model mainly uses information very close to the threshold. In the regression discontinuity model, the lack of smoothness is not the only issue; the size of the discontinuity is also important. It is therefore useful to estimate the conditional expectation at the boundary point from each side and to examine the difference between the two boundary estimates (see Porter [3]).
Demir and Toktamis [4] considered an adaptive Nadaraya-Watson kernel estimator of the regression function. In this paper, we likewise use the adaptive Nadaraya-Watson estimator, instead of the Nadaraya-Watson estimator, to estimate the jump size. The main drawback of the Nadaraya-Watson estimator of the jump size in the regression discontinuity model is its poor asymptotic bias behavior, whereas the adaptive Nadaraya-Watson estimator overcomes this problem.
Variance estimation for the regression discontinuity model was developed by Härdle [5]. Pagan and Ullah [6] also gave consistent estimators of the variance for the left-hand limit, $\sigma^2_-$, and the right-hand limit, $\sigma^2_+$. Similarly, Silverman [7] showed that the density at the discontinuity, $f_0(\bar u)$, can be estimated consistently by kernel density estimation.
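As a minimal sketch of the kernel density estimation step mentioned above, the function below computes a fixed-bandwidth estimate of the density at a single point, such as the cut-off $\bar u$. The Gaussian kernel, the function names and the evenly spaced "sample" in the usage lines are illustrative assumptions, not taken from [7].

```python
import math


def gaussian_kernel(t: float) -> float:
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def kde_at(point: float, sample: list[float], h: float) -> float:
    """Fixed-bandwidth estimate f_hat(point) = (1 / (n h)) * sum_i K((point - U_i) / h)."""
    if h <= 0.0 or not sample:
        raise ValueError("need a positive bandwidth and a non-empty sample")
    n = len(sample)
    return sum(gaussian_kernel((point - u) / h) for u in sample) / (n * h)


# Usage: evenly spaced points on (0, 1) mimic a Uniform(0, 1) design,
# whose true density at an interior cut-off such as 0.5 equals 1.
grid = [(i + 0.5) / 1000 for i in range(1000)]
f0_hat = kde_at(0.5, grid, h=0.05)
```

Because 0.5 lies many bandwidths away from the edges of $[0, 1]$, `f0_hat` is essentially equal to the true value 1; near the edges of the support a fixed-bandwidth estimate would be biased downward.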
In this paper, we propose the adaptive Nadaraya-Watson kernel estimator for estimating $\sigma^2_+$ and $\sigma^2_-$ for the jump size in the regression discontinuity model, and we derive the asymptotic properties of the estimators of $\sigma^2_+$ and $\sigma^2_-$. The marginal, and hence optimal, rate of convergence for estimating the treatment effect in the regression discontinuity model is derived theoretically using Stone's [8] definition. Properties such as the MSE, the MISE and the bias on the left- and right-hand sides of the discontinuity are also derived. The performance of the proposed regression discontinuity estimator is compared numerically with that of the existing one.
The rest of the paper is organized as follows. Section 2 presents the Nadaraya-Watson kernel estimator of the jump size in the regression discontinuity model, together with some technical lemmas. Section 3 introduces the proposed adaptive Nadaraya-Watson kernel estimator and its assumptions. A simulation study is performed in Section 4, and real-life data are used in Section 5 to assess the performance of the proposed estimator. The derivations for Sections 3 and 4 are given in the Appendix.

Nadaraya Watson Estimator and Technical Lemmas
Previous work uses the Nadaraya-Watson kernel estimator of the jump size in the regression discontinuity model. The connection between the regression discontinuity model and the treatment effect is discussed in Hahn et al. [9]. Trochim [10] distinguished two different regression discontinuity designs, depending on how the treatment assignment is connected with the observed variable. The treatment assignment is given by the indicator variable $d \in \{0, 1\}$, defined as $d = 1(u \ge \bar u)$, where $u$ is the observed variable with known threshold point $\bar u$. Let $y_0$ and $y_1$ be the potential outcomes corresponding to the two treatment assignments; as usual, $Y = d y_1 + (1 - d) y_0$ is the observed outcome. Under the smoothness assumption that $E(y_j \mid u)$ is continuous at $\bar u$ for $j = 0, 1$, the expected causal effect of the treatment can be identified at the discontinuity point.
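Assuming the sharp assignment $d = 1(u \ge \bar u)$ and the continuity of $E(y_j \mid u)$ at $\bar u$ for $j = 0, 1$, the identification argument can be written out as a short worked step (a standard derivation, included for illustration):

```latex
\begin{aligned}
E(Y \mid u) &= E(y_1 \mid u) && \text{for } u \ge \bar u \quad (d = 1),\\
E(Y \mid u) &= E(y_0 \mid u) && \text{for } u < \bar u \quad (d = 0),\\
\lim_{u \downarrow \bar u} E(Y \mid u) - \lim_{u \uparrow \bar u} E(Y \mid u)
  &= E(y_1 \mid u = \bar u) - E(y_0 \mid u = \bar u)
   = E(y_1 - y_0 \mid u = \bar u).
\end{aligned}
```

So the jump in $E(Y \mid u)$ at $\bar u$ equals the average treatment effect at the threshold, which is the quantity measured by the jump size $\alpha$.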
The following lemmas are useful in obtaining the main results of this paper (see Porter [3]).
Lemma A1. Suppose the kernel $k$ is bounded, symmetric, zero outside a bounded set $M$, and Lipschitz, and ...
Lemma A2. Suppose the kernel $k$ is bounded, symmetric, zero outside a bounded set $M$, and Lipschitz, ...
Lemma A3. Suppose the kernel $k$ is bounded, symmetric, and zero outside the bounded set $[-M, M]$. On $N_1$, $f_0$ is continuously differentiable for $x \ne \bar x$, $x \in N_1$, and $m$ is continuous at $\bar x$ with finite right- and left-hand derivatives. Then ...

Proposed Estimator and Assumptions
The main purpose of our study is to minimize the bias on the left- and right-hand sides of the discontinuity by using the adaptive Nadaraya-Watson kernel estimator of the jump size.
We also obtain the optimal rate of convergence through a simulation study.
Consider the random-design regression model
$$y_i = m(u_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where $m$ is an unknown regression function on the compact interval $[0, 1]$ and the observation errors $\varepsilon_i$ are independent and identically distributed with mean 0 and variance $\sigma^2$. In the discontinuity model, $m$ has a cut-off point $\bar u$ with $0 < \bar u < 1$.
Usually, the regression function is written in terms of a continuous function $m(u)$ defined on $[0, 1]$ and a jump size $\alpha$, defined as
$$\alpha = m_1(\bar u) - m_2(\bar u),$$
where $\bar u$ is the cut-off point, $m_1(\bar u) = \lim_{u \downarrow \bar u} m(u)$ is the limit to the right of the discontinuity and $m_2(\bar u) = \lim_{u \uparrow \bar u} m(u)$ is the limit to the left. In the estimator $\hat\alpha(\bar u)$ of the jump size, $d = 1(u_i \ge \bar u)$, and $I(A)$ denotes the indicator of an event $A$; $u_i$ is a random variable, and $h\lambda_i^*$ is the adaptive bandwidth, which controls the size of the local neighborhood on average. The first term of $\hat\alpha(\bar u)$ is a weighted average whose weights depend on the distance from the discontinuity, $(\bar u - U_i)$, and $\alpha$ is the jump size of the discontinuity model. The cut-off point $\bar u$ makes it possible to observe the average difference in the potential outcomes between points on either side of the discontinuity. The key point of our estimation is that, in all cases, the causal effect is identified from an expression that involves only the size of the discontinuity in the conditional expectation.
The assumptions needed to derive the limiting distribution of the estimator are:
a) Smoothness on either side of the discontinuity. On a compact interval $M$ containing $\bar y$, unequal left- and right-hand derivatives of $m$ are allowed; $q_0$ is $l_q$ times continuously differentiable and bounded away from zero, and $m(y)$ is $l_m$ times continuously differentiable for $y \in M \setminus \{\bar y\}$.
b) The left- and right-hand derivatives of $m$ up to order $l_q$ are equal at the cut-off point $\bar y$. This condition has no effect on the limiting distribution of the adaptive Nadaraya-Watson estimator, but it plays an important role in the asymptotic bias of the subsequent estimator.
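The conditional variance $\sigma^2(y) = \operatorname{var}(\varepsilon \mid y)$ is continuous for $y \ne \bar y$, and its left- and right-hand limits exist at $\bar y$.
c) For some $\xi > 0$, $E(|\varepsilon|^{2+\xi} \mid y)$ is uniformly bounded on the compact interval $M$.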
d) The marginal density $f(y)$ of $y$ is continuous on the compact interval $M$.
To construct the adaptive Nadaraya-Watson kernel estimator, we use the kernel estimators $\hat f_r(u)$ and $\hat f_r(u, y)$ of the marginal and joint densities to estimate the regression function, which yields the adaptive Nadaraya-Watson kernel estimator with varying bandwidths. This idea is formalized in Theorem 1, which gives the limiting distribution of the adaptive Nadaraya-Watson estimator at a boundary point.
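The step from the density estimators $\hat f_r(u)$ and $\hat f_r(u, y)$ to a Nadaraya-Watson form with varying bandwidths can be sketched as follows. We assume here (this is not stated explicitly in the text) a product kernel that uses the same local bandwidth $h\lambda_i^*$ in both coordinates, with $\int k(t)\,dt = 1$ and $\int t\,k(t)\,dt = 0$:

```latex
\begin{aligned}
m(u) &= E(Y \mid u) = \frac{\int y\, f(u, y)\, dy}{f(u)}, \\
\hat f_r(u) &= \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h\lambda_i^*}\,
  k\!\left(\frac{u - U_i}{h\lambda_i^*}\right), \qquad
\hat f_r(u, y) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{(h\lambda_i^*)^2}\,
  k\!\left(\frac{u - U_i}{h\lambda_i^*}\right) k\!\left(\frac{y - Y_i}{h\lambda_i^*}\right), \\
\int y\, \frac{1}{h\lambda_i^*}\, k\!\left(\frac{y - Y_i}{h\lambda_i^*}\right) dy
  &= \int \left(Y_i + h\lambda_i^* t\right) k(t)\, dt = Y_i,
  \qquad t = \frac{y - Y_i}{h\lambda_i^*}, \\
\hat m(u) &= \frac{\int y\, \hat f_r(u, y)\, dy}{\hat f_r(u)}
  = \frac{\sum_{i=1}^{n} (h\lambda_i^*)^{-1}\, k\!\left(\frac{u - U_i}{h\lambda_i^*}\right) Y_i}
         {\sum_{i=1}^{n} (h\lambda_i^*)^{-1}\, k\!\left(\frac{u - U_i}{h\lambda_i^*}\right)}.
\end{aligned}
```

The factor $1/n$ cancels in the ratio, and each observation keeps its own bandwidth $h\lambda_i^*$, which is what distinguishes this estimator from the fixed-bandwidth Nadaraya-Watson estimator.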
Theorem 1. Suppose Assumptions 1(a), 2(a) and 3 hold, with $l_q$ any positive integer and $l_r$ any negative integer. If ...
Remarks.
The observations used in the first and second terms of the difference defining $\hat\alpha$ are independent. For the adaptive Nadaraya-Watson estimator at an interior point, the asymptotic bias is written in a form that underscores its dependence on the rate at which the adaptive bandwidth approaches zero. When $p = 0$ we are "undersmoothing" and the asymptotic bias is zero. When $h\lambda_i^* \sim n^{-4/5}$, $\hat\alpha$ achieves its fastest rate of convergence, $n^{-4/5}$, and the asymptotic bias term must then be taken into account. From this, we see that the bias of $\hat\alpha$ is of order $o(h\lambda_i^*)$. Moreover, the theorem shows that higher-order bias-reducing kernels do not affect the order of the asymptotic bias, and that whether or not the left- and right-hand derivatives at the cut-off point are equal does not affect it either.
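The boundary bias behind these remarks is described in the proof of Theorem 1 as an upward bias just to the right of the cut-off and a downward bias just to the left. The sketch below is a minimal, hypothetical implementation of a jump estimator of the kind described in Section 3: a right-side minus left-side kernel-weighted mean in which observation $i$ has its own bandwidth $h\lambda_i$. We assume (this is not stated in the text) that the local factors follow a Silverman-style adaptive-kernel recipe, $\lambda_i = (\tilde f(u_i)/g)^{-\alpha}$ with a Gaussian pilot density $\tilde f$, its geometric mean $g$, and sensitivity $\alpha = 0.5$ as in the derivation. All function names are hypothetical.

```python
import math


def epanechnikov(t: float) -> float:
    """Epanechnikov kernel, zero outside [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def gaussian(t: float) -> float:
    """Standard normal kernel, used here only for the pilot density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def local_bandwidth_factors(u, pilot_h, sensitivity=0.5):
    """lambda_i = (f_pilot(u_i) / g) ** (-sensitivity), g = geometric mean of f_pilot(u_i).

    Assumed Silverman-style construction: sparse regions get wider bandwidths.
    """
    n = len(u)
    pilot = [sum(gaussian((ui - uj) / pilot_h) for uj in u) / (n * pilot_h) for ui in u]
    log_g = sum(math.log(p) for p in pilot) / n
    return [math.exp(-sensitivity * (math.log(p) - log_g)) for p in pilot]


def adaptive_nw_jump(u, y, cutoff, h, lam, kernel=epanechnikov):
    """Right-side minus left-side kernel mean, with bandwidth h * lam_i for observation i."""
    num_r = den_r = num_l = den_l = 0.0
    for ui, yi, li in zip(u, y, lam):
        bw = h * li
        w = kernel((ui - cutoff) / bw) / bw
        if ui >= cutoff:  # d_i = 1(u_i >= cutoff)
            num_r += w * yi
            den_r += w
        else:
            num_l += w * yi
            den_l += w
    if den_r == 0.0 or den_l == 0.0:
        raise ValueError("no observation with positive weight on one side of the cut-off")
    return num_r / den_r - num_l / den_l


# Usage (fixed bandwidth, lambda_i = 1): noise-free m(u) = 2u with NO jump.
grid = [(i + 0.5) / 100_000 for i in range(100_000)]
demo = adaptive_nw_jump(grid, [2.0 * x for x in grid], 0.5, 0.1, [1.0] * len(grid))
# demo is about 0.15 = 0.75 * 2 * 0.1 instead of 0: an O(h) boundary bias.
```

With the Epanechnikov kernel, each one-sided mean of a line with slope 2 is shifted by $0.375 \cdot 2 \cdot h$ away from $m(\bar u)$ (up on the right, down on the left), so the two shifts add rather than cancel. This is the fixed-bandwidth boundary bias that the adaptive estimator is meant to improve on.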
Theorem 2. Suppose Assumptions 1(a), 2(a) and 3 hold, with $l_q$ and $l_m$ positive integers and $\xi \ge 2$. If ...

Simulation Study
We implement a practical strategy for the adaptive Nadaraya-Watson and Nadaraya-Watson kernel estimators to provide simulation evidence on their finite-sample performance. Since our objective is to estimate the discontinuity at a particular point, we select the bandwidth by unbiased cross-validation, proposed by Hall and Schucany [11] for density estimation. The simulation study compares the performance of the Nadaraya-Watson and adaptive Nadaraya-Watson kernel estimators, using a regression discontinuity function in which $u_i$ is uniformly distributed on $[0, 1]$. Following Härdle [5], the error term $\varepsilon_i$ is normally distributed with mean 0 and variance 0.1. We generate samples of size 50, 100, 250, 500 and 1000 for both the fixed and the adaptive bandwidth, using the Epanechnikov and Gaussian kernels. For each setting, we compute the mean squared error (MSE), the mean integrated squared error (MISE), the bandwidth and the estimated jump size of the proposed model. The number of replications is 1000 for each sample size.
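The Monte Carlo loop behind such a study can be sketched as follows for the fixed-bandwidth Nadaraya-Watson jump estimator. Because the simulated regression function is not reproduced here, the design $y = u + \alpha\,1(u \ge 0.5) + \varepsilon$ with $\alpha = 1$ is a hypothetical stand-in; only the Uniform(0, 1) design and the error variance 0.1 come from the text. Function names and default values are assumptions.

```python
import math
import random


def epanechnikov(t: float) -> float:
    """Epanechnikov kernel, zero outside [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def nw_jump(u, y, cutoff, h):
    """Fixed-bandwidth Nadaraya-Watson jump estimate: right kernel mean minus left kernel mean."""
    num_r = den_r = num_l = den_l = 0.0
    for ui, yi in zip(u, y):
        w = epanechnikov((ui - cutoff) / h)
        if ui >= cutoff:
            num_r += w * yi
            den_r += w
        else:
            num_l += w * yi
            den_l += w
    if den_r == 0.0 or den_l == 0.0:
        raise ValueError("empty kernel window on one side of the cut-off")
    return num_r / den_r - num_l / den_l


def monte_carlo_mse(n, h, reps=200, jump=1.0, cutoff=0.5, seed=1):
    """MSE of the jump estimate under the hypothetical design y = u + jump * 1(u >= cutoff) + eps."""
    rng = random.Random(seed)  # fixed seed for reproducible replications
    sd = math.sqrt(0.1)        # error variance 0.1, as in the simulation design
    sq_err = 0.0
    for _ in range(reps):
        u = [rng.random() for _ in range(n)]  # u_i ~ Uniform(0, 1)
        y = [ui + jump * (ui >= cutoff) + rng.gauss(0.0, sd) for ui in u]
        sq_err += (nw_jump(u, y, cutoff, h) - jump) ** 2
    return sq_err / reps
```

Calling `monte_carlo_mse(n, h)` for each sample size in the design gives an MSE curve per bandwidth rule; the adaptive estimator can be compared by swapping `nw_jump` for a version with per-observation bandwidths.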
Here $\alpha$ is a sensitivity parameter with $0 < \alpha < 1$; we take $\alpha = 0.5$. Using the transformation $t = (y - Y_i)$ and simplifying, then substituting into equation (5.1), we obtain the adaptive Nadaraya-Watson kernel regression estimator. Simplifying this estimator and substituting it into equation (5.3) gives the estimated jump size.
Proof of Theorem 1.
Let $q$ denote a generic positive constant, and recall that $M$ is a compact interval. Let $M_0$ be a compact interval such that $\bar u \in \operatorname{int}(M_0)$ and $M_0 \subset \operatorname{int}(M)$, and, by Assumption 1(a), suppose that the support of the kernel $k$ is $[-M, M]$. Observations just to the right of the discontinuity are more likely to be greater than the intercept $m(\bar u) + \alpha$, giving an upward-biased estimate; similarly, an average of observations just to the left of the discontinuity would give a downward-biased estimate of $m(\bar u)$. Substituting $y_i = m(u_i) + \varepsilon_i$ and rearranging, then multiplying both sides by $nh\lambda_i^*$, we obtain an expression for $nh\lambda_i^*(\hat\alpha - \alpha)$; after simplification, we have
(5.4)
Now consider the denominator of the first term, $\frac{1}{n}\sum_{j=1}^{n} \ldots$. Taking the variance of both sides gives (5.5). Using the transformation in equation (5.4), simplifying, and applying Chebyshev's inequality, we obtain the limit of this denominator. Similarly, for the second term of equation (5.3) we consider its denominator: taking the variance of both sides gives (5.6), and the same transformation, simplification and application of Chebyshev's inequality yield its limit. Since $n$ is large, we verify Liapunov's condition and apply the central limit theorem (CLT) to obtain the asymptotic variance; applying Liapunov's CLT to the second term in the same way and simplifying gives the corresponding result. Finally, we consider the bias of the estimator, again using Chebyshev's inequality. Substituting the values obtained for the denominators of equation (5.3) into equation (5.7) and simplifying completes the proof of Theorem 1.
Proof of Theorem 2. We first set up the optimization. In the kernel density estimate at $y$, $\hat q_+(y)$ denotes the density estimate from the data to the right of the discontinuity and $\hat q_-(y)$ the estimate from the data to the left. Moreover, $\hat m(y_i) = \sum_j v_{ij}\, y_j$ is the consistent adaptive Nadaraya-Watson kernel estimator of $m(y_i)$.
We assume that $q_-$ is $l_q$ times continuously differentiable and bounded away from zero, that $\delta^2(y_i) = \operatorname{var}(z_i \mid y_i)$ is uniformly bounded near $y_0$, and that the limits $\delta^2_+(y_0)$ and $\delta^2_-(y_0)$ exist and are finite, with $s(y) = m(y)\,q_0(y)$. Suppose $\hat\alpha$ is a consistent estimator of the jump size $\alpha$.
We then define the left- and right-hand variance estimators and consider the numerator of equation (5.8), which gives (5.9). We divide equation (5.9) into parts $(a + b + c)$ and treat each in turn: the first part is handled with Lemmas A1 and A2, the second with Lemma A3, and the third with Lemma A3 and an approximation. Since the asymptotic properties of the estimator have already been established, we use its asymptotic variance. Hence, by Chebyshev's inequality for uniform convergence, and since $(\hat\alpha - \alpha) = o_p(\ldots)$, we obtain the first consistency result. For the second, we consider the numerator of equation (5.10), which gives (5.11), divide equation (5.11) into parts and treat each in turn. Using Lemmas A1 and A2, we have
$$\sup_{y \in M_0} |d\,\hat q_-(y)| \le \sup_{y \in M_0} d\,|\hat q_-(y) - E\hat q(y)| + \sup \ldots$$
Using an approximation and the asymptotic variance established above, we obtain the corresponding result. This result requires a more stringent moment condition for the limiting distribution of the variance estimator, and the above consistency theorem does not require continuity of the derivatives of $m$ at the discontinuity.
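The one-sided variances of Theorem 2 are what enter the asymptotic variance of the jump estimator. The sketch below is a simplified, hypothetical local-constant version: on each side of the cut-off it computes a kernel-weighted variance of $y$ around the one-sided kernel mean, and it plugs the results into the first-order variance approximation for a fixed-bandwidth one-sided Nadaraya-Watson jump estimator, $\operatorname{Var}(\hat\alpha) \approx \delta_0(\sigma^2_+ + \sigma^2_-)/(n h f_0(\bar u))$ with $\delta_0 = \int_0^1 k^2 / (\int_0^1 k)^2$. It does not reproduce equations (5.8) through (5.11), does not use $\hat\alpha$ to adjust residuals, and is a fixed-bandwidth benchmark rather than the adaptive result. Function names are hypothetical.

```python
def epanechnikov(t: float) -> float:
    """Epanechnikov kernel, zero outside [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def one_sided_variances(u, y, cutoff, h, kernel=epanechnikov):
    """Kernel-weighted variances of y just right (plus) and just left (minus) of the cut-off."""
    right, left = [], []
    for ui, yi in zip(u, y):
        w = kernel((ui - cutoff) / h)
        if w > 0.0:
            (right if ui >= cutoff else left).append((w, yi))
    result = []
    for pairs in (right, left):
        if not pairs:
            raise ValueError("no observation with positive weight on one side of the cut-off")
        total = sum(w for w, _ in pairs)
        mean = sum(w * yi for w, yi in pairs) / total
        # Local-constant sketch: any slope of m inside the window inflates this value.
        result.append(sum(w * (yi - mean) ** 2 for w, yi in pairs) / total)
    return result[0], result[1]


def delta0(kernel=epanechnikov, support=1.0, steps=100_000):
    """delta_0 = int_0^s k(t)^2 dt / (int_0^s k(t) dt)^2, by the midpoint rule."""
    step = support / steps
    pts = [(i + 0.5) * step for i in range(steps)]
    k1 = sum(kernel(t) for t in pts) * step
    k2 = sum(kernel(t) ** 2 for t in pts) * step
    return k2 / k1 ** 2


def plug_in_variance(n, h, f0, sigma2_plus, sigma2_minus, kernel=epanechnikov):
    """First-order approximation Var(alpha_hat) ~ delta_0 (s2+ + s2-) / (n h f0)."""
    return delta0(kernel) * (sigma2_plus + sigma2_minus) / (n * h * f0)
```

For the Epanechnikov kernel, $\int_0^1 k^2 = 0.3$ and $\int_0^1 k = 0.5$, so $\delta_0 = 1.2$; with $n = 1000$, $h = 0.25$, $f_0(\bar u) = 1$ and $\sigma^2_\pm = 0.1$ the approximation gives $1.2 \cdot 0.2 / 250 = 0.00096$.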

Assumption 1.
Choice of the kernel: a) The kernel $k$ is a bounded, symmetric, Lipschitz function with $\int k(v)\,dv = 1$.
b) For some integer $r \ge 3$, $\int k(v)\,v^j\,dv = 0$ for $1 \le j \le r - 1$. Let $q_0$ be the marginal density function of $y$, and let $m(y)$ denote the conditional expectation of $z$ given $y$ minus the discontinuity, so that $m(y) = E(z \mid y) - \alpha\,1[y \ge \bar y]$.
Assumption 2.

Table 2 shows that the adaptive bandwidth, jump size, MSE and MISE are smaller than in Table 1 and decrease as the sample size increases; the rate of convergence of the adaptive Nadaraya-Watson estimator is also faster than that of the Nadaraya-Watson estimator. Tables 4 and 5 show that the MSE of the proposed estimator is smaller than that of the existing estimator, and that the proposed adaptive Nadaraya-Watson estimator converges faster than the existing Nadaraya-Watson estimator.
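One simple way to quantify a rate-of-convergence comparison like the one between Tables 4 and 5 is to regress $\log(\text{MSE})$ on $\log n$ and read off the slope, assuming $\text{MSE} \approx C n^{-r}$. The function name and the synthetic power-law inputs used to check it are illustrative assumptions; they are not the values reported in the tables.

```python
import math


def empirical_rate(sample_sizes, mse_values):
    """Least-squares slope of log(MSE) on log(n); returns r such that MSE ~ C * n**(-r)."""
    if len(sample_sizes) != len(mse_values) or len(sample_sizes) < 2:
        raise ValueError("need at least two (n, MSE) pairs of equal length")
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(m) for m in mse_values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return -sxy / sxx  # larger r means faster convergence
```

Applied to the MSE columns of two estimators over the same sample sizes (50, 100, 250, 500, 1000), the estimator with the larger fitted `r` converges faster; with only five sample sizes the fitted slope is a rough summary rather than a formal rate.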
5. Real Life Data
The data are Gross Domestic Product at current prices. Values are based on GDP in national currency converted to U.S. dollars using market exchange rates (yearly average). Exchange rate projections are provided by country economists for the group of other emerging market and developing countries, and exchange rates for advanced economies are established in the WEO assumptions for each WEO exercise. The data are summarized in Table 3. (Source of data: International Monetary Fund, World Economic Outlook Database, April 2015.)
6. Conclusion
In this study, we conclude that the adaptive Nadaraya-Watson kernel estimator gives better results than the Nadaraya-Watson estimator in the discontinuity model. We showed this by comparing the mean squared error and the rate of convergence.
Appendix
Proof. To construct the adaptive Nadaraya-Watson estimator, we express $m(\bar u)$ in terms of the joint probability density function $f(u, y)$. We have

Table 1.
Nadaraya-Watson Estimator with Epanechnikov as Density Function

Table 2.
Adaptive Nadaraya-Watson Estimator with Epanechnikov as Density Function

Table 3.
Real Data: Year and Gross Domestic Product Current Prices

Table 4.
Nadaraya-Watson Estimator with Epanechnikov as Density Function (Existing Model)

Table 5.
Adaptive Nadaraya-Watson Estimator with Epanechnikov as Density Function (Proposed Model)