A Good Choice of Ridge Parameter with Minimum Mean Squared Error

In this paper, the problem of estimating the regression parameters is considered in a multiple regression model Y = Xα + u when multicollinearity is present. Two suggested methods of finding the ridge regression parameter k are investigated and evaluated in terms of Mean Squared Error (MSE) by simulation techniques. A number of factors that may affect the properties of these methods have been varied. Results of a simulation study indicate that, with respect to the MSE criterion, the suggested estimators perform better than both the ordinary least squares (OLS) estimator and the other estimators discussed here.

Citation: Iguernane M (2016) A Good Choice of Ridge Parameter with Minimum Mean Squared Error. J Biom Biostat 7: 289. doi:10.4172/21556180.1000289

$$\mathrm{MSE}(\hat{\alpha}(k)) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k_i)^2} + \sum_{i=1}^{p} \frac{k_i^2 \alpha_i^2}{(\lambda_i + k_i)^2}, \qquad (8)$$


Introduction
In multiple regression it is known that parameter estimates based on the minimum residual sum of squares have a high probability of being unsatisfactory if the prediction vectors, X, are multicollinear.
In fact, the question of multicollinearity is not one of existence, but of degree. When the prediction vectors are far from being orthogonal, i.e. when strong multicollinearities exist in X, Hoerl and Kennard [1] suggested ridge regression to deal with the problem of estimating the regression parameters. However, if the degree of multicollinearity in X is not strong, then the data are near-orthogonal. In this situation, various estimators (called shrunken estimators, introduced by Stein [2]) are known to dominate the OLS estimator, as is shown in many simulation studies comparing shrunken estimators among themselves and with the OLS estimator; see Vinod [3] and Gunst and Mason [4].
To discuss the multicollinearity problem, let us consider the standard multiple linear regression model

$$Y = X\alpha + u, \qquad (1)$$

where Y (T × 1) consists of the observations on the dependent variable, X (T × p) is the matrix of observations on the explanatory variables, and u (T × 1) is the residual vector. Obviously, we have

$$\hat{\alpha} = (X'X)^{-1} X'Y, \qquad (2)$$

where $\hat{\alpha}$ is the OLS estimator for α. If we denote X′X by Q and $\sigma_u^2 = \sigma^2$, then the variance of $\hat{\alpha}$ is given by

$$\mathrm{Var}(\hat{\alpha}) = \sigma^2 Q^{-1}. \qquad (3)$$

Denoting the eigenvalues of Q by $\lambda_i$, it can be shown that

$$\sum_{i=1}^{p} \mathrm{Var}(\hat{\alpha}_i) = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i}. \qquad (4)$$

In case of severe multicollinearity, Q becomes almost singular, which means that one or more of the eigenvalues $\lambda_i$ are close to zero.
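As a rough numerical illustration of how a near-zero eigenvalue of Q inflates the OLS variance, the following NumPy sketch compares a near-orthogonal design with a collinear one. It relies on the standard fact that the total variance of the OLS estimator is σ² Σ 1/λ_i, the trace of σ² Q⁻¹. The data, the seed, σ² = 1 and the helper name `total_ols_variance` are assumptions made for illustration only.

```python
import numpy as np

def total_ols_variance(X, sigma2=1.0):
    """Return the eigenvalues of Q = X'X and sigma2 * sum(1 / lambda_i)."""
    Q = X.T @ X
    eigvals = np.linalg.eigvalsh(Q)  # Q is symmetric, so eigvalsh applies
    return eigvals, sigma2 * np.sum(1.0 / eigvals)

rng = np.random.default_rng(0)
T = 50
z = rng.standard_normal((T, 3))

# Near-orthogonal design: three independent columns.
X_good = z

# Collinear design: the third column is almost a copy of the first.
X_bad = z.copy()
X_bad[:, 2] = X_bad[:, 0] + 0.01 * rng.standard_normal(T)

lam_good, var_good = total_ols_variance(X_good)
lam_bad, var_bad = total_ols_variance(X_bad)

print("smallest eigenvalue (near-orthogonal):", lam_good.min())
print("smallest eigenvalue (collinear):      ", lam_bad.min())
print("total OLS variance (near-orthogonal): ", var_good)
print("total OLS variance (collinear):       ", var_bad)
```

In the collinear design one eigenvalue is close to zero, and its reciprocal dominates the total variance.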
The effect on the OLS estimator is obvious from Eq. (4). The variance of the estimator becomes large and the estimates are strongly correlated. A further effect of multicollinearity is that the parameter estimates tend to become "too large", which is shown by the fact that

$$E(\hat{\alpha}'\hat{\alpha}) = \alpha'\alpha + \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i}. \qquad (5)$$

Measuring the severity of multicollinearity is no straightforward matter, except when the number of explanatory variables is just two. In this case, the simple correlation coefficient between the explanatory variables contains all the relevant information. When the number of explanatory variables is larger than two, the pairwise correlation coefficients are still important. However, it is possible for the multicollinearity to be quite severe even if all simple correlation coefficients are only moderately large. The fundamental reason for all the trouble is that the matrix Q is almost singular. Thus, the value of the determinant of this matrix is an indicator of the severity of the problem.
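The point that moderate pairwise correlations can hide severe multicollinearity can be checked with a small example. In the sketch below, a fourth variable is built as the sum of three independent ones plus a little noise, so no pairwise correlation is very large, yet the correlation matrix is almost singular. The construction, sample size and seed are illustrative assumptions, and the determinant is taken on the correlation matrix (Q for standardized variables, up to scale).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Three independent standard normal variables.
x = rng.standard_normal((n, 3))

# A fourth variable that is nearly an exact linear combination of the others.
x4 = x.sum(axis=1) + 0.05 * rng.standard_normal(n)
X = np.column_stack([x, x4])

# Correlation matrix of the four explanatory variables.
R = np.corrcoef(X, rowvar=False)

# Largest pairwise correlation in absolute value (off-diagonal entries only).
off_diag = R[~np.eye(4, dtype=bool)]
max_pairwise = np.abs(off_diag).max()

# A determinant close to zero signals near-singularity.
det_R = np.linalg.det(R)

print("largest pairwise |correlation|:", max_pairwise)
print("determinant of correlation matrix:", det_R)
```

Here the determinant reveals the problem that the individual correlation coefficients do not.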
In an effort to circumvent the problems caused by multicollinearity, Hoerl and Kennard [1] proposed a biased estimator, usually called the ridge regression estimator, defined as follows:

$$\hat{\alpha}(k) = (X'X + kI)^{-1} X'Y, \qquad k \geq 0. \qquad (6)$$

It is always possible to find a value of k for which

$$\mathrm{MSE}(\hat{\alpha}(k)) < \mathrm{MSE}(\hat{\alpha}). \qquad (7)$$

This result was obtained by Hoerl and Kennard [1]. It means that there is always a value of k that leads to a smaller MSE than that of the OLS estimator. Many different techniques for estimating the ridge parameter k have been proposed (see, for example, Hoerl, Kennard and Baldwin [5], Gibbons [6], Kibria [7], Khalaf and Shukur [8], Khalaf [9], Khalaf [10], Khalaf and Iguernane [11]).
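A minimal sketch of the ridge estimator in its usual form, (X′X + kI)⁻¹X′Y, is given below. The function name `ridge_estimator`, the simulated data and the chosen values of k are illustrative assumptions. With k = 0 the estimator reduces to OLS, and increasing k shrinks the length of the coefficient vector.

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimate (X'X + k I)^{-1} X'y for a given ridge parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.standard_normal(40)

# OLS estimate for comparison.
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

for k in (0.0, 1.0, 10.0):
    a_k = ridge_estimator(X, y, k)
    print(f"k = {k:5.1f}  squared length of estimate = {a_k @ a_k:.4f}")
```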
The plan of this paper is as follows. In Section 2, the two proposed methods to estimate the ridge regression parameter k are described. The simulation is described in Section 3, and its results are given in Section 4. Finally, a summary and conclusions are presented in Section 5.

The Proposed Estimators
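Hoerl and Kennard proved that the value $k_i$ which minimizes the MSE in Eq. (8) is

$$k_i = \frac{\sigma^2}{\alpha_i^2}, \qquad (9)$$

where σ² is the error variance and $\alpha_i$ is the i-th element of α. Replacing σ² and $\alpha_i$ by their estimates gives

$$\hat{k}_i = \frac{\hat{\sigma}^2}{\hat{\alpha}_i^2}. \qquad (10)$$

For this estimator, we use the acronym HK.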
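A short derivation indicates why $k_i = \sigma^2 / \alpha_i^2$ is the minimizing value. It is a sketch under the assumption that the MSE has the canonical form $\sigma^2 \sum_i \lambda_i / (\lambda_i + k_i)^2 + \sum_i k_i^2 \alpha_i^2 / (\lambda_i + k_i)^2$ with all $\lambda_i > 0$. Each term of the sum depends on a single $k_i$, so the terms can be minimized separately:

```latex
\begin{align*}
f_i(k_i) &= \frac{\sigma^2 \lambda_i + k_i^2 \alpha_i^2}{(\lambda_i + k_i)^2}, \\
\frac{\partial f_i}{\partial k_i}
  &= \frac{2 k_i \alpha_i^2 (\lambda_i + k_i) - 2 \left( \sigma^2 \lambda_i + k_i^2 \alpha_i^2 \right)}{(\lambda_i + k_i)^3}
   = \frac{2 \lambda_i \left( k_i \alpha_i^2 - \sigma^2 \right)}{(\lambda_i + k_i)^3}.
\end{align*}
```

The derivative is negative for $k_i < \sigma^2 / \alpha_i^2$ and positive for $k_i > \sigma^2 / \alpha_i^2$, so $f_i$ attains its minimum at $k_i = \sigma^2 / \alpha_i^2$.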
Based on Eq. (10), we will review some methods as follows:
(1) Hoerl et al. [5] proposed a different estimator of k by taking the harmonic mean of $\hat{k}_i$ in Eq. (10). That is,

$$\hat{k}_{HKB} = \frac{p\,\hat{\sigma}^2}{\hat{\alpha}'\hat{\alpha}}, \qquad (11)$$

where $\hat{\alpha}$ is the OLS estimator of α. For this estimator, we use the acronym HKB.
(2) From the Bayesian point of view, Lawless and Wang [12] suggested an estimator of k. The estimator corresponding to HKB is given by

$$\hat{k}_{LW} = \frac{p\,\hat{\sigma}^2}{\hat{\alpha}' X'X \hat{\alpha}}. \qquad (12)$$

For this estimator, we use the acronym LW.
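The HKB and LW ridge parameters can be computed directly from an OLS fit, as sketched below. The sketch makes several assumptions that are not stated in the text: σ̂² is taken as the residual sum of squares divided by n − p, the individual values are read as k̂_i = σ̂²/α̂_i² with the coefficients in the coordinates of X as given, and the function name `ridge_parameters` and the simulated data are hypothetical. The last lines check that the harmonic mean of the k̂_i coincides with the HKB value.

```python
import numpy as np

def ridge_parameters(X, y):
    """Return (k_i, k_HKB, k_LW) computed from the OLS fit of y on X."""
    n, p = X.shape
    alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha_hat
    sigma2_hat = resid @ resid / (n - p)      # assumed variance estimate

    k_i = sigma2_hat / alpha_hat**2           # individual values sigma^2 / alpha_i^2
    k_hkb = p * sigma2_hat / (alpha_hat @ alpha_hat)

    fitted = X @ alpha_hat                    # alpha' X'X alpha = ||X alpha||^2
    k_lw = p * sigma2_hat / (fitted @ fitted)
    return k_i, k_hkb, k_lw

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 4))
y = X @ np.ones(4) + rng.standard_normal(30)

k_i, k_hkb, k_lw = ridge_parameters(X, y)
harmonic_mean = len(k_i) / np.sum(1.0 / k_i)

print("k_HKB:", k_hkb)
print("harmonic mean of k_i:", harmonic_mean)
print("k_LW:", k_lw)
```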
New estimator MI 2, by using Eq. (12), which gives the following estimator;

The Simulation
In this section, we present a simulation to illustrate the performance of the ridge regression estimator based on the suggested estimators, compared with the OLS estimator and the ridge regression estimators based on HK, HKB and LW. The properties of these estimators are compared in terms of the MSE criterion. Among these five methods, we prefer the one that gives the smallest MSE.
Following Muniz and Kibria [13], the explanatory variables are generated from pseudo-random numbers $z_{ij}$ drawn from the standard normal distribution, and the dependent variable is then determined from the regression model.
Three factors can affect the properties of the estimators: the sample size (n), the degree of correlation between the explanatory variables, and the error variance. In other words, we study the consequences of varying n, the degree of correlation and the error variance.
To investigate the effect of the sample size on the properties of all estimators under consideration, we used samples of size 10, 20, 70 and 150, which cover both small and large samples.
Two models are used: one with a 6-factor structure and another with an 8-factor structure. Since our primary interest lies in investigating how well the proposed approaches minimize the MSE, different degrees of correlation between the variables included in the two models have been used. We choose these values equal to 0.
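A simulation of this kind can be sketched in a few lines of NumPy. The sketch is illustrative only and rests on assumptions: the generating scheme x_ij = (1 − ρ²)^{1/2} z_ij + ρ z_{i,p+1} is one commonly used in ridge simulation studies and is assumed here rather than taken from the text, and p = 6, n = 20, ρ = 0.99, σ = 1, the 500 replications, the true coefficient vector and all function names are hypothetical choices. The HKB parameter of Eq. (11) serves as a representative ridge estimator, and the MSE is estimated as the average squared distance between the estimate and the true coefficients.

```python
import numpy as np

def make_design(n, p, rho, rng):
    """Correlated regressors: sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1} (assumed scheme)."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def simulate_mse(n=20, p=6, rho=0.99, sigma=1.0, reps=500, seed=0):
    """Monte Carlo estimate of the MSE of OLS and of ridge with the HKB parameter."""
    rng = np.random.default_rng(seed)
    X = make_design(n, p, rho, rng)
    alpha = np.ones(p) / np.sqrt(p)           # assumed true coefficients, alpha'alpha = 1
    Q = X.T @ X

    sq_err_ols = 0.0
    sq_err_ridge = 0.0
    for _ in range(reps):
        y = X @ alpha + sigma * rng.standard_normal(n)

        # OLS fit and residual variance estimate.
        a_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ a_ols
        s2 = resid @ resid / (n - p)

        # HKB ridge parameter and the corresponding ridge estimate.
        k = p * s2 / (a_ols @ a_ols)
        a_ridge = np.linalg.solve(Q + k * np.eye(p), X.T @ y)

        sq_err_ols += np.sum((a_ols - alpha) ** 2)
        sq_err_ridge += np.sum((a_ridge - alpha) ** 2)

    return sq_err_ols / reps, sq_err_ridge / reps

mse_ols, mse_ridge = simulate_mse()
print("estimated MSE, OLS:        ", mse_ols)
print("estimated MSE, ridge (HKB):", mse_ridge)
```

Varying `n`, `rho` and `sigma` in `simulate_mse` corresponds to varying the three factors discussed above.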

The Results
In this section we present the results of our simulation concerning the properties of our suggested estimators, and of the other estimators, for choosing the ridge regression parameter k when multicollinearity exists among the columns of the design matrix.
It is known that the goodness and accuracy of an estimator are quantified through the MSE criterion. We now compare the MSE among the different methods used to estimate the ridge regression parameter k. A small MSE indicates good performance of the respective method. In what follows, we go through Tables 1 and 2.
Our suggested estimators MI 1 and MI 2 produce a small MSE relative to all the ridge parameters under consideration in both models, in particular MI 1, when the sample size and the correlation coefficient are large. We also note that the HKB estimator performs well compared with the OLS estimator and the other ridge estimators.
In Model (2), MI 1 is clearly better than all other estimators, especially when n is large, followed by HKB.

Discussions and Conclusions
In this paper, we studied the properties of two proposed approaches for choosing the ridge parameter k when multicollinearity exists among the explanatory variables: modifications of the estimators of Hoerl et al. [5], given by (11), and of Lawless and Wang [12], defined by (12).
The investigation was carried out by simulation, in which, in addition to the level of multicollinearity, the number of observations and the error variance were varied. For each combination, we used an adequate number of replications. Our suggested methods, given by Eqs. (13) and (14), were evaluated by comparing their MSEs with those of HK, HKB and LW. We found that our suggested methods outperform the others in almost all cases, especially when the error variance and the sample size are large. The results also indicate that all methods produced a smaller MSE than OLS, which performs worst in all cases with regard to the MSE criterion.