Difference-based ridge-type estimator of parameters in restricted partial linear model with correlated errors

In this article, a generalized difference-based ridge estimator is proposed for the parameter vector of a partial linear model when the errors are dependent. It is assumed that some additional linear constraints may hold on the whole parameter space. Its mean squared error matrix is compared with that of the generalized restricted difference-based estimator. Finally, the performance of the new estimator is illustrated by a simulation study and a numerical example.

component. Wang et al. (2007) presented higher-order differences for optimal efficiency in estimating the linear part by using a special class of difference sequences.
In this article we use the ridge regression concept presented by Hoerl and Kennard (1970) to overcome multicollinearity in the regression problem. Multicollinearity denotes the existence of nearly linear dependency among the column vectors of the design matrix $X$ in the linear model $y = X\beta + \varepsilon$, where $y$ is an $n \times 1$ vector of observed responses, $X$ is the observed $n \times p$ matrix of independent variables, assumed to have full rank $p$, $\beta$ is an unknown parameter vector, and $\varepsilon$ is an error vector with $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I_n$. Multicollinearity may lead to wide confidence intervals for individual parameters, may produce estimates with wrong signs, and so on.
The condition number of the matrix $X$ is a measure of the presence of multicollinearity; however, it does not reveal the structure of the linear dependency among the column vectors $X_1, X_2, \ldots, X_p$. The best way of examining the existence and structure of multicollinearity is to inspect the eigenvalues of $X'X$. If $X'X$ is ill-conditioned with a large condition number, a ridge regression estimator can be used to estimate $\beta$ [see e.g., Swamy et al. (1978); Sarkar (1992); Shi (2001); Zhong and Yang (2007); Zhang and Yang (2007); Tabakan and Akdeniz (2010); Akdeniz and Tabakan (2009); Roozbeh et al. (2010); Duran and Akdeniz (2012); Hu (2005) and Hu et al. (2015)]. In this paper, we examine a biased estimation technique to be followed when the matrix $X'X$ appears to be ill-conditioned in the partial linear model. We suppose that the condition number of the parametric component is large, which indicates that a biased estimation procedure is desirable.
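As a rough illustration of the diagnostic described above, the following Python sketch computes the eigenvalues of $X'X$ and a condition number for two small simulated design matrices. The helper name `collinearity_diagnostics`, the simulated data, and the choice of $\lambda_{\max}/\lambda_{\min}$ as the condition number are assumptions made for illustration only; other conventions (for example, the square root of this ratio) are also in use.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return the eigenvalues of X'X (ascending) and the ratio lambda_max / lambda_min."""
    eigvals = np.linalg.eigvalsh(X.T @ X)   # X'X is symmetric, so eigvalsh applies
    return eigvals, eigvals[-1] / eigvals[0]

rng = np.random.default_rng(0)
z = rng.standard_normal((100, 3))

# Well-conditioned design: three independent columns.
X_ok = z
# Nearly collinear design: the third column is almost the sum of the first two.
X_bad = np.column_stack([z[:, 0], z[:, 1], z[:, 0] + z[:, 1] + 1e-3 * z[:, 2]])

eig_ok, cond_ok = collinearity_diagnostics(X_ok)
eig_bad, cond_bad = collinearity_diagnostics(X_bad)
print("condition number, independent columns:", cond_ok)
print("condition number, near-collinear columns:", cond_bad)
print("eigenvalues, near-collinear columns:", eig_bad)
```

A near-zero smallest eigenvalue in the second design signals the near-linear dependency, and the corresponding eigenvector would indicate which columns are involved.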
The rest of the paper is organized as follows. In section "The model and differencing-based estimator", the model and the differencing methodology are given. Section "Generalized difference-based ridge estimator" contains the definition of the generalized difference-based ridge estimator, and some comparison results are given in section "MSEM-superiority of the generalized difference-based ridge estimator $\hat\beta_{GRD}(k)$ over the generalized restricted difference-based estimator $\hat\beta_{GRD}$". These results are applied to a simulation study in section "Exemplary simulation", and a numerical example illustrating the theoretical result is given in section "A numerical example". Some concluding remarks are given in section "Conclusions".

The model and differencing-based estimator
In this section we use a difference-based method to estimate the linear regression coefficient vector $\beta$. This method has been used by many authors to remove the nonparametric component in the partially linear model (Yatchew 1997, 2000, 2003). Consider the following partially linear model

$$y = X\beta + f + \varepsilon, \tag{3}$$

where $f = (f(t_1), \ldots, f(t_n))'$ and $f$ is an unknown smooth function with a bounded first derivative.

Now we present the differencing method. Let $d = (d_0, \ldots, d_m)$ be an $(m + 1)$-vector, where $m$ is the order of differencing and $d_0, \ldots, d_m$ are differencing weights satisfying the conditions

$$\sum_{j=0}^{m} d_j = 0, \qquad \sum_{j=0}^{m} d_j^2 = 1. \tag{4}$$

Now, we denote the $(n - m) \times n$ differencing matrix $D$ whose elements satisfy Eq. (4) as follows:

$$D = \begin{pmatrix}
d_0 & d_1 & \cdots & d_m & 0 & \cdots & 0 \\
0 & d_0 & d_1 & \cdots & d_m & \cdots & 0 \\
\vdots & & \ddots & \ddots & & \ddots & \vdots \\
0 & \cdots & 0 & d_0 & d_1 & \cdots & d_m
\end{pmatrix}. \tag{5}$$

This and related matrices are given, for example, in Yatchew (2003). We can then apply the differencing matrix to model (3), which leads to direct estimation of the parametric effect. In particular, take

$$Dy = DX\beta + Df + D\varepsilon. \tag{6}$$

Since the data have been reordered so that the values of $t$ are close, the application of the differencing matrix $D$ in model (6) can remove the nonparametric effect in large samples (Yatchew 2003), so the presence of $Df$ can be ignored. Thus, we may write Eq. (6) as

$$Dy \approx DX\beta + D\varepsilon \tag{7}$$

or

$$\tilde y \approx \tilde X\beta + \tilde\varepsilon, \tag{8}$$

where $\tilde y = Dy$, $\tilde X = DX$ and $\tilde\varepsilon = D\varepsilon$.
So, we can see that $\tilde\varepsilon$ is an $(n - m)$-vector of disturbances distributed with

$$E(\tilde\varepsilon) = 0, \qquad E(\tilde\varepsilon\tilde\varepsilon') = \sigma^2 DD'.$$

For arbitrary differencing coefficients satisfying Eq. (4), Yatchew (1997) defines a simple differencing estimator of the parameter $\beta$ in a partial linear model:

$$\hat\beta_{diff} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y. \tag{9}$$

Hence, differencing allows one to perform inference on $\beta$ as if there were no nonparametric component $f(\cdot)$ in model (3) (Yatchew 2003). Once $\beta$ is estimated, a variety of nonparametric techniques can be applied to estimate $f(\cdot)$ as if $\beta$ were known.
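The differencing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `differencing_matrix` and `difference_estimator`, the simulated data, the smooth function $\sin(2\pi t)$, and the first-order weights $d = (1, -1)/\sqrt{2}$ are all assumptions chosen so that the weights sum to zero and have unit squared norm.

```python
import numpy as np

def differencing_matrix(n, d):
    """Build the (n - m) x n banded matrix whose rows carry the weights d_0, ..., d_m."""
    m = len(d) - 1
    D = np.zeros((n - m, n))
    for i in range(n - m):
        D[i, i:i + m + 1] = d
    return D

def difference_estimator(y, X, D):
    """Least squares on the differenced data: (X~'X~)^{-1} X~'y~ with X~ = DX, y~ = Dy."""
    y_t, X_t = D @ y, D @ X
    beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return beta

# Hypothetical data: the observations are ordered by t before differencing.
rng = np.random.default_rng(1)
n = 400
t = np.sort(rng.uniform(size=n))
X = rng.standard_normal((n, 2))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)

d = np.array([1.0, -1.0]) / np.sqrt(2.0)   # first-order weights (illustrative choice)
D = differencing_matrix(n, d)
beta_hat = difference_estimator(y, X, D)
print(beta_hat)
```

Because neighbouring values of $t$ are close, the differenced smooth component is small, and the estimate should land near the coefficients used to simulate the data.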
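In order to account for the parameter $\beta$ in Eq. (3), we propose the modified estimator of $\sigma^2$, defined as

$$\hat\sigma^2 = \frac{\tilde y'(I - P)\tilde y}{\operatorname{tr}\left(D'(I - P)D\right)}, \tag{10}$$

where $P$ is the projection matrix, defined as

$$P = \tilde X(\tilde X'\tilde X)^{-1}\tilde X'. \tag{11}$$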

Generalized difference-based ridge estimator
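In this section we discuss the following partially linear model:

$$y = X\beta + f + \varepsilon, \tag{12}$$

with $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 V$. Using the method presented in section "The model and differencing-based estimator", $\tilde\varepsilon = D\varepsilon$ is an $(n - m)$-vector of disturbances distributed with

$$E(\tilde\varepsilon) = 0, \qquad E(\tilde\varepsilon\tilde\varepsilon') = \sigma^2 DVD' = \sigma^2 V_D, \tag{13}$$

where we write $V_D = DVD'$. It is well known that, adopting the linear model (12), the unbiased estimator of $\beta$ is the following generalized difference-based estimator:

$$\hat\beta_{GD} = C_D^{-1}\tilde X'V_D^{-1}\tilde y, \qquad C_D = \tilde X'V_D^{-1}\tilde X, \tag{14}$$

and a modified estimator of $\sigma^2$ is defined through a projection matrix $P$, analogously to the one in section "The model and differencing-based estimator".

It is observed from Eq. (14) that the properties of the generalized difference-based estimator of $\beta$ depend heavily on the characteristics of the information matrix $C_D$. If $C_D$ is ill-conditioned, then $\hat\beta_{GD}$ has large sampling variances. Moreover, some of the regression coefficients may be statistically insignificant or have the wrong sign, and meaningful statistical inference becomes difficult for the researcher. As a remedy, we consider the linear constraint

$$R\beta = 0 \tag{17}$$

for a given $q \times p$ matrix $R$ with rank $q < p$. Subject to the linear restriction (17), the generalized restricted difference-based estimator is given by

$$\hat\beta_{GRD} = \hat\beta_{GD} - C_D^{-1}R'(RC_D^{-1}R')^{-1}R\hat\beta_{GD}, \tag{18}$$

which can also be written as $\hat\beta_{GRD} = W\tilde X'V_D^{-1}\tilde y$ with

$$W = C_D^{-1} - C_D^{-1}R'(RC_D^{-1}R')^{-1}RC_D^{-1}. \tag{19}$$

Now we propose a generalized difference-based ridge estimator, which is defined as

$$\hat\beta_{GRD}(k) = (I + kW)^{-1}\hat\beta_{GRD}, \qquad k \ge 0. \tag{20}$$

Since $RW = 0$, it is easy to see that both $\hat\beta_{GRD}$ and $\hat\beta_{GRD}(k)$ satisfy the restriction $R\beta = 0$. It is also clear that for $k = 0$ we obtain $\hat\beta_{GRD}(0) = \hat\beta_{GRD}$.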
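To make the construction concrete, here is a hedged Python sketch of the three estimators. It assumes the ridge-type estimator has the form $(I + kW)^{-1}\hat\beta_{GRD}$ with $W = C_D^{-1} - C_D^{-1}R'(RC_D^{-1}R')^{-1}RC_D^{-1}$, which is the form consistent with the superiority condition checked later in the paper; the function name `restricted_ridge`, the simulated data, the identity covariance and the restriction matrix `R` are hypothetical.

```python
import numpy as np

def restricted_ridge(y_t, X_t, V_D, R, k):
    """Return (beta_GD, beta_GRD, beta_GRD(k), W) for differenced data y_t, X_t.

    Assumed form of the ridge-type estimator: (I + k W)^{-1} beta_GRD.
    """
    Vinv_X = np.linalg.solve(V_D, X_t)          # V_D^{-1} X~
    C = X_t.T @ Vinv_X                          # C_D = X~' V_D^{-1} X~
    C_inv = np.linalg.inv(C)
    beta_gd = C_inv @ (Vinv_X.T @ y_t)          # generalized difference-based estimator
    A = C_inv @ R.T                             # C_D^{-1} R'
    W = C_inv - A @ np.linalg.solve(R @ A, A.T)
    beta_grd = beta_gd - A @ np.linalg.solve(R @ A, R @ beta_gd)
    beta_grd_k = np.linalg.solve(np.eye(C.shape[0]) + k * W, beta_grd)
    return beta_gd, beta_grd, beta_grd_k, W

# Hypothetical differenced data with n - m = 60 rows and p = 4 regressors.
rng = np.random.default_rng(2)
X_t = rng.standard_normal((60, 4))
y_t = X_t @ np.array([1.0, -1.0, 0.5, -0.5]) + 0.1 * rng.standard_normal(60)
V_D = np.eye(60)                                # illustrative covariance structure
R = np.array([[1.0, 1.0, 1.0, 1.0]])            # hypothetical restriction: coefficients sum to zero

beta_gd, beta_grd, beta_ridge, W = restricted_ridge(y_t, X_t, V_D, R, k=0.5)
print("R @ beta_GRD    =", R @ beta_grd)
print("R @ beta_GRD(k) =", R @ beta_ridge)
```

Both restricted estimators should satisfy the restriction up to rounding error, and setting `k=0` should reproduce the restricted estimator exactly.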
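MSEM-superiority of the generalized difference-based ridge estimator $\hat\beta_{GRD}(k)$ over the generalized restricted difference-based estimator $\hat\beta_{GRD}$
In this section, our aim is to examine the difference of the mean squared error matrices (MSEM) of the two estimators $\hat\beta_{GRD}(k)$ and $\hat\beta_{GRD}$. Let $b^*$ be an estimator of $\beta$ in the model $y = X\beta + \varepsilon$. The MSEM of $b^*$ is defined as

$$MSEM(b^*, \beta) = E\left[(b^* - \beta)(b^* - \beta)'\right]. \tag{21}$$

If we denote the covariance matrix of an estimator $b^*$ by $V(b^*)$, then (21) is equivalent to

$$MSEM(b^*, \beta) = V(b^*) + Bias(b^*)Bias(b^*)', \tag{22}$$

where $Bias(b^*) = E(b^*) - \beta$. The scalar-valued mean squared error (MSE) is given by

$$MSE(b^*, \beta) = \operatorname{tr}\left[MSEM(b^*, \beta)\right].$$

Using Eq. (20), we obtain

$$Bias(\hat\beta_{GRD}(k)) = -k(I + kW)^{-1}W\beta \tag{23}$$

and

$$Var(\hat\beta_{GRD}(k)) = \sigma^2(I + kW)^{-1}W(I + kW)^{-1}. \tag{24}$$

Thus, taking $k = 0$,

$$Var(\hat\beta_{GRD}) = \sigma^2 W. \tag{25}$$

Then, for $k > 0$, the difference $Var(\hat\beta_{GRD}) - Var(\hat\beta_{GRD}(k))$ can be expressed as

$$Var(\hat\beta_{GRD}) - Var(\hat\beta_{GRD}(k)) = \sigma^2 k^2 (I + kW)^{-1}W\left(W + \frac{2}{k}I\right)W(I + kW)^{-1}. \tag{26}$$

Since $W$ is a nonnegative definite matrix [see Shi (2001)], we can conclude that $Var(\hat\beta_{GRD}) - Var(\hat\beta_{GRD}(k))$ is a nonnegative definite matrix. It is of interest to know under which conditions $\hat\beta_{GRD}(k)$ is better than $\hat\beta_{GRD}$. For this, we investigate the difference $\Delta = MSEM(\hat\beta_{GRD}, \beta) - MSEM(\hat\beta_{GRD}(k), \beta)$; when $\Delta$ is a nonnegative definite matrix, $\hat\beta_{GRD}(k)$ is preferred to $\hat\beta_{GRD}$. For the MSEM of the generalized difference-based ridge estimator $\hat\beta_{GRD}(k)$, from (23) and (24), we obtain

$$MSEM(\hat\beta_{GRD}(k), \beta) = \sigma^2(I + kW)^{-1}W(I + kW)^{-1} + k^2(I + kW)^{-1}W\beta\beta'W(I + kW)^{-1}. \tag{27}$$

Since $\hat\beta_{GRD}$ is an unbiased estimator of $\beta$ under the restriction (17), we have

$$MSEM(\hat\beta_{GRD}, \beta) = \sigma^2 W. \tag{28}$$

Now from (27) and (28), we may write the difference as

$$\Delta = k^2(I + kW)^{-1}W\left[\sigma^2\left(W + \frac{2}{k}I\right) - \beta\beta'\right]W(I + kW)^{-1}.$$

Then, using the theorem of Farebrother (1976), we can conclude that $\Delta$ is nonnegative definite, so that $\hat\beta_{GRD}(k)$ is MSEM-superior to $\hat\beta_{GRD}$, if

$$\beta'\left(W + \frac{2}{k}I\right)^{-1}\beta \le \sigma^2.$$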
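The comparison in this section can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the ridge-type estimator is $(I + kW)^{-1}\hat\beta_{GRD}$, so that the superiority condition reads $\beta'(W + \tfrac{2}{k}I)^{-1}\beta \le \sigma^2$, and it builds a hypothetical $W$, $\beta$ and $\sigma^2$ from simulated quantities. The function names `superiority_condition` and `msem_difference` are invented for this example.

```python
import numpy as np

def superiority_condition(beta, W, k, sigma2):
    """Return (q, q <= sigma2) with q = beta' (W + (2/k) I)^{-1} beta, for k > 0."""
    p = len(beta)
    q = float(beta @ np.linalg.solve(W + (2.0 / k) * np.eye(p), beta))
    return q, q <= sigma2

def msem_difference(beta, W, k, sigma2):
    """MSEM(beta_GRD) - MSEM(beta_GRD(k)) under the assumed form (I + kW)^{-1} beta_GRD."""
    S = np.linalg.inv(np.eye(len(beta)) + k * W)
    bias = -k * S @ W @ beta                     # bias of the ridge-type estimator
    return sigma2 * (W - S @ W @ S) - np.outer(bias, bias)

# Hypothetical ingredients: a random information matrix and a sum-to-zero restriction.
rng = np.random.default_rng(3)
p = 4
A = rng.standard_normal((30, p))
C_inv = np.linalg.inv(A.T @ A)
R = np.ones((1, p))
B = C_inv @ R.T
W = C_inv - B @ np.linalg.solve(R @ B, B.T)
beta = np.array([0.3, -0.3, 0.2, -0.2])          # satisfies R beta = 0

q, holds = superiority_condition(beta, W, k=0.1, sigma2=1.0)
delta = msem_difference(beta, W, k=0.1, sigma2=1.0)
min_eig = np.linalg.eigvalsh((delta + delta.T) / 2).min()
print("quadratic form:", q, "condition holds:", holds)
print("smallest eigenvalue of the MSEM difference:", min_eig)
```

When the condition holds, the smallest eigenvalue of the difference should be nonnegative up to rounding error, which is what nonnegative definiteness requires.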

Exemplary simulation
In this section, we study the MSE of the proposed estimator. Our sampling experiment consists of different combinations of k and n. In this paper, we simulate the response from the following model:

$$y_i = \sum_{j=1}^{p} x_{ij}\beta_j + f(t_i) + \varepsilon_i, \qquad f(t) = \sqrt{t(1 - t)}\,\sin\left(\frac{2.1\pi}{t + 0.05}\right),$$

where $f$ is called the Doppler function, for $t_i = (i - 0.5)/n$ and $i = 1, \ldots, n$. The explanatory variables are generated by the following equation (Liu 2003):

$$x_{ij} = (1 - \gamma^2)^{1/2} z_{ij} + \gamma z_{i(p+1)}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$

where $z_{ij}$ and $z_{i(p+1)}$ are independent standard normal pseudo-random numbers and $\gamma$ is specified so that the correlation between any two explanatory variables is given by $\gamma^2$. In this paper, we consider n = 200 and p = 4.
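A small Python sketch of this simulation design follows. It assumes the commonly used form of the Doppler function, $f(t) = \sqrt{t(1-t)}\sin(2.1\pi/(t + 0.05))$, and the generator $x_{ij} = (1 - \gamma^2)^{1/2} z_{ij} + \gamma z_{i(p+1)}$; the value $\gamma = 0.99$, the seed and the helper names `doppler` and `collinear_design` are illustrative choices, not values taken from the paper.

```python
import numpy as np

def doppler(t):
    """Doppler test function (assumed form)."""
    return np.sqrt(t * (1.0 - t)) * np.sin(2.1 * np.pi / (t + 0.05))

def collinear_design(n, p, gamma, rng):
    """Generate x_ij = sqrt(1 - gamma^2) z_ij + gamma z_i(p+1); pairwise correlation is gamma^2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - gamma ** 2) * z[:, :p] + gamma * z[:, [p]]

rng = np.random.default_rng(4)
n, p, gamma = 200, 4, 0.99                       # gamma is an illustrative value
t = (np.arange(1, n + 1) - 0.5) / n              # design points t_i = (i - 0.5) / n
X = collinear_design(n, p, gamma, rng)
f = doppler(t)

corr = np.corrcoef(X, rowvar=False)
print("target correlation:", gamma ** 2)
print("sample correlations:\n", corr)
```

Larger values of $\gamma$ push the pairwise correlations toward one, which is how the degree of multicollinearity can be varied across experiments.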
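In this article we use the third-order differencing coefficients $d_0 = 0.8582$, $d_1 = -0.3832$, $d_2 = -0.2809$, $d_3 = -0.1942$, for which $m = 3$ and the conditions in Eq. (4) hold up to rounding. Now, we define the $(200 - 3) \times 200$ differencing matrix as follows:

$$D = \begin{pmatrix}
0.8582 & -0.3832 & -0.2809 & -0.1942 & 0 & \cdots & 0 \\
0 & 0.8582 & -0.3832 & -0.2809 & -0.1942 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 0.8582 & -0.3832 & -0.2809 & -0.1942
\end{pmatrix}.$$

A restriction matrix $R$ is specified for the linear restriction (17). Let GRD denote the generalized restricted difference-based estimator and GRDR denote the generalized restricted difference-based ridge estimator; the estimated MSEs of GRD and GRDR are given in Figs. 1, 2 and 3.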
From Figs. 1 and 3, we see that when k is small, the new estimator is better than the generalized restricted difference-based estimator in the mean squared error sense. Moreover, as the multicollinearity increases, the new estimator continues to perform well.

A numerical example
In this section, we consider a numerical example to illustrate the theoretical result presented in section "MSEM-superiority of the generalized difference-based ridge estimator $\hat\beta_{GRD}(k)$ over the generalized restricted difference-based estimator $\hat\beta_{GRD}$". The data set was presented by Yatchew (2003), later discussed by Tabakan and Akdeniz (2010), and comes from a survey of 81 municipal electricity distributors in Ontario, Canada, in 1993.
The partial linear model is a simple semiparametric generalization of the Cobb-Douglas model. We consider a simple variant of the Cobb-Douglas model for the cost of distributing electricity:

$$tc = f(cust) + \beta_1 wage + \beta_2 pcap + \beta_3 puc + \beta_4 kWh + \beta_5 life + \beta_6 lf + \beta_7 kmwire + \varepsilon, \tag{34}$$

where tc stands for the log of total cost per customer, cust denotes the log of the number of customers, wage is the log of the wage rate, pcap stands for the log price of capital, puc is a dummy variable for the public utility commissions that deliver additional services and may benefit from economies of scope, kWh is the log of kilowatt hours per customer, life denotes the log of the remaining life of distribution assets, lf is the log of the load factor and kmwire is the log of kilometers of distribution wire per customer (Tabakan and Akdeniz 2010). It is easy to see that (34) contains both a nonparametric effect and parametric effects.
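Since $V$ is seldom known, an estimate of $V$ can be used. Trenkler (1984) gave an estimate of $V$ as

$$V = \sigma_\mu^2 \begin{pmatrix}
1 + \rho^2 & \rho & 0 & \cdots & 0 \\
\rho & 1 + \rho^2 & \rho & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & \rho & 1 + \rho^2 & \rho \\
0 & \cdots & 0 & \rho & 1 + \rho^2
\end{pmatrix}, \tag{35}$$

where the terms of the error vector follow the MA(1) process $\varepsilon_i = \mu_i + \rho\mu_{i-1}$, with uncorrelated innovations $\mu_i$ of mean zero and variance $\sigma_\mu^2$. A restriction matrix $R$ is specified for the linear restriction (17). In this section, we take $\rho = 0.3$ and $\sigma_\mu^2 = 0.1$, and the matrix $V$ is estimated by (35). The condition number is computed to be 2365.158, suggesting the presence of severe collinearity.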
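The MA(1) covariance structure can be written down directly. The following Python sketch assumes the tridiagonal form implied by $\varepsilon_i = \mu_i + \rho\mu_{i-1}$ with innovation variance $\sigma_\mu^2$, namely $\sigma_\mu^2(1 + \rho^2)$ on the diagonal and $\sigma_\mu^2\rho$ on the first off-diagonals; the helper name `ma1_covariance` is hypothetical, and the size 81 simply mirrors the number of distributors in the data set.

```python
import numpy as np

def ma1_covariance(n, rho, sigma2_mu):
    """Covariance matrix of an MA(1) error vector e_i = mu_i + rho * mu_{i-1}."""
    V = np.zeros((n, n))
    idx = np.arange(n)
    V[idx, idx] = 1.0 + rho ** 2                 # variance of each error term
    V[idx[:-1], idx[1:]] = rho                   # lag-one covariance above the diagonal
    V[idx[1:], idx[:-1]] = rho                   # lag-one covariance below the diagonal
    return sigma2_mu * V

V = ma1_covariance(81, rho=0.3, sigma2_mu=0.1)
print(V[:3, :3])
print("smallest eigenvalue:", np.linalg.eigvalsh(V).min())
```

The differenced covariance used by the generalized estimators would then be obtained as `D @ V @ D.T` for the chosen differencing matrix `D`.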
Now we check the condition of Theorem 4.1:

$$\beta'\left(W + \frac{2}{k}I\right)^{-1}\beta = 0.0578 < \sigma^2. \tag{37}$$

That is to say, our numerical example satisfies the condition of Theorem 4.1. This also suggests that our method is meaningful in practice.

Conclusions
In this article, we present a new generalized difference-based ridge estimator that can be applied in the presence of multicollinearity in a partial linear model. Its MSEM is compared analytically with that of the generalized restricted difference-based estimator. It is shown that for small values of the ridge parameter k, the new estimator is MSEM-superior to the generalized restricted difference-based estimator over an interval depending on the design points and the unknown parameter.