The Comparison Between Different Approaches to Overcome the Multicollinearity Problem in Linear Regression Models

In the presence of multi-collinearity, parameter estimation based on the ordinary least squares procedure is unsatisfactory. In 1970, Hoerl and Kennard introduced an alternative method known as the ridge regression estimator. In this estimator, the ridge parameter plays an important role in estimation, and statisticians have proposed various methods for selecting this biasing constant. Another popular method for dealing with the multi-collinearity problem is the principal component method. In this paper, we employ simulation to compare the performance of the principal component estimator with several types of ordinary ridge regression estimators that differ in the value of the biasing constant (ridge parameter). The mean square error (MSE) is used as the criterion for assessing the performance of these estimators.


Introduction
Consider the linear regression model

$y = X\beta + \varepsilon$ (1)

where $y$ is an $n \times 1$ vector of the response variable, $X$ is an $n \times p$ matrix of explanatory variables with $n > p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$ vector of unobservable random errors with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$. The aim of regression analysis is to estimate the numerical values of the linear model parameters. Recently, biased estimators of the regression parameters have attracted the attention of many researchers, because the ordinary least squares procedure is unable to provide reasonable point estimates when the matrix of explanatory variables suffers from multi-collinearity. Throughout the paper we refer to the ridge regression estimators and the principal component estimators as alternatives to the ordinary least squares estimator for multi-collinear data. Both the ridge regression and the principal component estimators allow a small amount of bias in order to achieve a major reduction in variance relative to ordinary least squares.

The Case of Multi-collinearity
The problem of multi-collinearity occurs when there exists an exact or an approximate linear relationship among two or more explanatory variables. Accordingly, two types of multi-collinearity may be faced in regression analysis: exact and near multi-collinearity. During regression calculations, an exact linear relationship causes a division by zero, which in turn aborts the calculations. When the relationship is not exact, the division by zero does not occur and the calculations are not aborted. Nevertheless, the results are distorted because the division is done by a very small quantity. Therefore, determining whether multi-collinearity is a problem is one of the first steps in regression analysis. Multi-collinearity can be thought of as a situation where two or more explanatory variables in the data set move together; as a consequence, it is impossible to use this data set to decide which of the explanatory variables is producing the observed change in the response variable. Some multi-collinearity is nearly always present, but the important point is whether it is serious enough to cause appreciable damage to the regression analysis. Indicators of multi-collinearity include a low determinant of the information matrix X'X, a very high correlation among two or more explanatory variables, a very high correlation among two or more estimated coefficients, very small (near zero) eigenvalues of the correlation matrix of the explanatory variables, and a very large condition number.
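The indicators listed above can be computed directly from the predictor matrix. The following is a minimal sketch in Python with NumPy, not part of the original study: the function name `collinearity_diagnostics`, the toy data, and the choice to define the condition number as the square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix are all assumptions made for illustration.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return simple multi-collinearity indicators for an (n, p) predictor matrix X."""
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix of the predictors
    eigvals = np.linalg.eigvalsh(R)       # eigenvalues of the symmetric matrix R, ascending
    return {
        "det_corr": float(np.linalg.det(R)),                    # near 0 under collinearity
        "max_abs_corr": float(np.max(np.abs(R - np.eye(p)))),   # largest off-diagonal correlation
        "min_eigenvalue": float(eigvals[0]),                    # near 0 under collinearity
        "condition_number": float(np.sqrt(eigvals[-1] / eigvals[0])),
    }

# Hypothetical data: x2 is nearly a copy of x1, while x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
x3 = rng.normal(size=100)
report = collinearity_diagnostics(np.column_stack([x1, x2, x3]))
print(report)
```

In this toy example the near-duplicate pair drives the determinant and the smallest eigenvalue toward zero and inflates the condition number, which is the pattern the indicators are meant to reveal.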

The Class of Shrinkage Estimators
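Applying the singular value decomposition technique, we can decompose the matrix X as follows [1]

$X = H \Lambda^{1/2} G'$ (2)

where $H$ is an $(n \times p)$ matrix satisfying $H'H = I_p$, $\Lambda^{1/2}$ is a $(p \times p)$ diagonal matrix of the ordered singular values of $X$, and $G$ is a $(p \times p)$ orthogonal matrix. The class of shrinkage estimators can then be written as

$b_{SH} = G \Delta C = \sum_{j=1}^{p} g_j \delta_j c_j$ (3)

where $g_j$ is the $j$th column of the matrix $G$, $\delta_j$ is the $j$th diagonal element of the diagonal matrix of shrinkage factors $\Delta$, $0 \le \delta_j \le 1$, $j = 1, 2, \ldots, p$, and $c_j$ is the $j$th element of the vector of uncorrelated components $C$.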
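The decomposition can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: the shrinkage class is taken to have the form b = G Δ C with C = G' b_OLS, the data are random toy values, and the helper name `shrinkage_estimator` is hypothetical. It uses the thin SVD returned by `numpy.linalg.svd`, whose singular values play the role of the diagonal of Λ^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))          # toy predictor matrix (assumption)
y = rng.normal(size=n)               # toy response (assumption)

# Thin SVD: H is n x p, s holds the singular values of X in descending order, Gt is G'.
H, s, Gt = np.linalg.svd(X, full_matrices=False)
G = Gt.T
lam = s**2                           # eigenvalues of X'X

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
C = G.T @ b_ols                      # uncorrelated components of b_OLS
C_alt = (H.T @ y) / s                # the same vector written as Lambda^{-1/2} H'y

def shrinkage_estimator(delta):
    """Member of the shrinkage class, b = G diag(delta) C (assumed form)."""
    return G @ (delta * C)

# With every shrinkage factor equal to 1 the estimator reduces to OLS.
print(np.allclose(shrinkage_estimator(np.ones(p)), b_ols))
```

Choosing shrinkage factors strictly between 0 and 1 pulls each uncorrelated component toward zero, which is how the biased estimators discussed in the following sections trade bias for variance.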

Ordinary Ridge Regression Estimators
The most popular method proposed for dealing with the multi-collinearity problem is ordinary ridge regression. The ordinary ridge regression method is a modification of the ordinary least squares method that allows biased estimators of the regression coefficients.
The ridge estimators depend crucially upon an exogenous parameter, say k, called the ridge parameter or the biasing parameter of the estimator. For any $k \ge 0$, the corresponding ordinary ridge estimator, denoted by $b_{RR}$, is defined as

$b_{RR} = (X'X + kI_p)^{-1} X'y$ (4)

where $k$ is a constant selected by the statistician according to some intuitively plausible criteria put forward by Hoerl and Kennard [2]. It can be shown that the ridge regression estimator given in equation (4) is a member of the class of shrinkage estimators. By using matrix algebra and the singular value decomposition approach we get

$b_{RR} = G (\Lambda + kI_p)^{-1} \Lambda^{1/2} H'y = G \Delta C, \quad \Delta = (\Lambda + kI_p)^{-1} \Lambda$

Equivalently, the shrinkage factors $\delta_j$, $j = 1, 2, \ldots, p$, of the ridge estimator have the form

$\delta_j = \dfrac{\lambda_j}{\lambda_j + k}$

where $\lambda_j$ is the $j$th diagonal element (eigenvalue) of the diagonal matrix $\Lambda$, and $k$ is the ridge parameter.
The mean square error of the ordinary ridge regression estimator can easily be shown to be [2]

$MSE(b_{RR}) = \sigma^2 \sum_{j=1}^{p} \dfrac{\lambda_j}{(\lambda_j + k)^2} + k^2 \beta' (X'X + kI_p)^{-2} \beta$

The first term is the sum of the variances (total variance) of the parameter estimates, and the second term is the squared bias introduced when $b_{RR}$ is used instead of $b_{OLS}$.
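The equivalence between the direct ridge formula and its shrinkage form, and the two terms of the mean square error, can be illustrated with a short sketch. It assumes the standard forms b_RR = (X'X + kI)^{-1} X'y and δ_j = λ_j / (λ_j + k); the function names, the toy data and the chosen value of k are hypothetical.

```python
import numpy as np

def ridge_direct(X, y, k):
    """Ordinary ridge estimator (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_via_shrinkage(X, y, k):
    """Same estimator written as G diag(delta) C with delta_j = lam_j / (lam_j + k)."""
    H, s, Gt = np.linalg.svd(X, full_matrices=False)
    lam = s**2
    delta = lam / (lam + k)
    C = (H.T @ y) / s                 # Lambda^{-1/2} H'y
    return Gt.T @ (delta * C), delta

def ridge_mse_terms(X, beta, sigma2, k):
    """Return (total variance, squared bias) of the ridge estimator for known beta and sigma^2."""
    lam, G = np.linalg.eigh(X.T @ X)
    alpha = G.T @ beta
    variance = sigma2 * np.sum(lam / (lam + k) ** 2)
    bias_sq = k**2 * np.sum(alpha**2 / (lam + k) ** 2)
    return variance, bias_sq

# Toy data (assumption): three predictors with known coefficients.
rng = np.random.default_rng(2)
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(40, 3))
y = X @ beta + rng.normal(size=40)

b_direct = ridge_direct(X, y, k=0.7)
b_shrunk, delta = ridge_via_shrinkage(X, y, k=0.7)
print(np.allclose(b_direct, b_shrunk))
```

Calling `ridge_mse_terms` for increasing values of k shows the variance term falling while the squared bias grows from zero, which is the trade-off that motivates the choice of k.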

Choice of Ridge Parameter
The ordinary ridge regression estimators do not provide a unique solution to the multicollinearity problem, but rather a family of solutions. These solutions depend upon the ridge parameter (the value of k). No explicit optimum value can be found for k. Yet, several stochastic choices have been proposed for this ridge parameter. Some of these choices may be summarized as follows. Hoerl and Kennard (1970) suggested a graphical method called the ridge trace to select the value of the ridge parameter k. When viewing the ridge trace, the analyst picks the value of k for which the regression coefficients have stabilized.
Often, the regression coefficients vary widely for small values of k and then stabilize. We select the smallest value of k (which introduces the smallest bias) beyond which the regression coefficients appear to remain constant. Hoerl, Kennard and Baldwin (1975) proposed another method that selects a single value of k, given as [3]

$\hat{k}_{HKB} = \dfrac{p S^2}{b_{OLS}' b_{OLS}}$ (8)

where $p$ is the number of explanatory variables, $S^2$ is the OLS estimator of $\sigma^2$ and $b_{OLS}$ is the OLS estimator of the vector of regression coefficients $\beta$. Lawless and Wang (1976) proposed selecting the value of k by using the formula [4]

$\hat{k}_{LW} = \dfrac{p S^2}{b_{OLS}' X'X b_{OLS}}$ (9)

Assuming that the vector of regression coefficients has a certain prior distribution, Srivastava followed a Bayesian approach to estimate the ridge parameter [5]; the resulting expression involves tr(X'X), the trace of the matrix X'X.
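As an illustration, the two closed-form choices attributed to Hoerl, Kennard and Baldwin and to Lawless and Wang can be computed as follows. This is a sketch under assumptions: it takes the usual forms k = p s² / (b'b) and k = p s² / (b'X'Xb), estimates σ² with the residual sum of squares divided by n − p (a model without intercept), and uses hypothetical function names and toy data.

```python
import numpy as np

def ols_fit(X, y):
    """Return the OLS coefficients and the residual variance estimate s^2 = RSS / (n - p)."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    return b, float(resid @ resid) / (n - p)

def k_hkb(X, y):
    """Hoerl-Kennard-Baldwin choice, assumed form p * s^2 / (b'b)."""
    b, s2 = ols_fit(X, y)
    return X.shape[1] * s2 / float(b @ b)

def k_lw(X, y):
    """Lawless-Wang choice, assumed form p * s^2 / (b'X'Xb)."""
    b, s2 = ols_fit(X, y)
    fitted = X @ b
    return X.shape[1] * s2 / float(fitted @ fitted)

# Toy collinear data (assumption): the second predictor nearly duplicates the first.
rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=50), rng.normal(size=50)])
y = X @ np.array([1.0, 1.0, 1.0]) + 2.0 * rng.normal(size=50)

print(k_hkb(X, y), k_lw(X, y))
```

The Lawless and Wang denominator weights each component of b_OLS by the corresponding eigenvalue of X'X, so the two rules can differ considerably when the eigenvalues are very unequal.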
Hazim Mansoor Gorgees and Fatimh Assim Mahdi (2017) proposed a new method for selecting the ridge parameter by employing the concept of condition number [6].
The suggested estimator, denoted by $\hat{k}_{CN}$, is defined in terms of CN, where CN refers to the condition number, that is, the ratio of the largest to the smallest singular value of the matrix of explanatory variables X.

Principal Components Regression
Ridge regression was offered as a technique for overcoming the multicollinearity problem. An alternative procedure, known as the principal components approach, was first proposed by Harold Hotelling (1933). To give a clear picture of this approach, let us begin with the case of two predictors $x_1$ and $x_2$. If these predictors are correlated, then the matrix X will not be orthogonal; consequently, this will complicate the interpretation of the effects of $x_1$ and $x_2$ on the response variable y. From the geometric point of view, suppose we rotate the coordinate axes so that in the new system the predictors are orthogonal. Moreover, let us make the rotation so that the first axis lies in the direction of the greatest variation in the data and the second axis lies in the direction of the second greatest variation in the data. These rotated directions ($z_1$ and $z_2$, say, in our two-predictor case) are simply linear combinations of the original predictors. We now illustrate how these directions can be calculated. Using the singular value decomposition $X = H \Lambda^{1/2} G'$, we have

$X'X = G \Lambda G'$

Since G is an orthogonal matrix, the general linear regression model $y = X\beta + \varepsilon$ can be rewritten as

$y = XGG'\beta + \varepsilon = Z\alpha + \varepsilon$

where $Z = XG$ and $\alpha = G'\beta$. Hence:

$Z'Z = G'X'XG = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ are the eigenvalues of X'X. The columns of G are the eigenvectors of X'X, and the columns of Z are the principal components of X, which are orthogonal to each other.
Thus, the procedure creates a set of artificial variables $z_j$ from the original $x_j$ via the linear transformation $Z = XG$ in such a way that the $z_j$ vectors are orthogonal to each other. The $z_j$ corresponding to the largest value of $\lambda_j$ is called the first principal component, and it explains the largest proportion of the variation in the standardized dataset. Subsequent $z_j$ explain smaller and smaller proportions until all the variation is explained. Typically, one does not use all the $z_j$ but follows some type of selection rule. No universal rule exists for selecting the components. Some statisticians use the rule that only eigenvalues greater than 1 are of interest. Others suggest retaining components until some arbitrarily large proportion (perhaps 0.75 or more) of the variance has been explained. The OLS estimator of $\alpha$ is given as

$\hat{\alpha} = (Z'Z)^{-1} Z'y = \Lambda^{-1} Z'y$

Assuming that the first $q$ ($q \le p$) principal components are selected, the reduced estimator can be written as

$\hat{\alpha}_q = (Z_q' Z_q)^{-1} Z_q' y = \Lambda_q^{-1} Z_q' y$

where $Z_q = X G_q$, $G_q$ denotes the matrix of the first q eigenvectors of X'X, and $\Lambda_q$ is the diagonal matrix containing the first q eigenvalues of X'X.
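The construction of the principal components and of the reduced estimator can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' implementation: the function names, the toy data and the 0.75 threshold used in the example are hypothetical, and the back-transformation to the original coefficients is taken to be G_q times the reduced estimator, which relies on the orthogonality of G.

```python
import numpy as np

def principal_components(X):
    """Eigenvalues (descending) and eigenvectors of X'X."""
    lam, G = np.linalg.eigh(X.T @ X)      # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return lam[order], G[:, order]

def pcr(X, y, q):
    """Principal components regression keeping the first q components."""
    lam, G = principal_components(X)
    G_q = G[:, :q]
    Z_q = X @ G_q                         # retained principal components
    alpha_q = (Z_q.T @ y) / lam[:q]       # Lambda_q^{-1} Z_q'y, since Z_q'Z_q is diagonal
    return alpha_q, G_q @ alpha_q         # reduced estimator and coefficients in original variables

def components_for_proportion(lam, proportion=0.75):
    """Smallest q whose leading eigenvalues explain at least the given share of variation."""
    cumulative = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(cumulative, proportion) + 1)

# Toy collinear data (assumption).
rng = np.random.default_rng(4)
x1 = rng.normal(size=60)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=60), rng.normal(size=60)])
y = X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=60)

lam, G = principal_components(X)
q = components_for_proportion(lam, 0.75)
alpha_q, b_pc = pcr(X, y, q)
print(q, b_pc)
```

Keeping all p components reproduces the OLS estimator, so any difference between the two comes entirely from the components that are dropped.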
To find the principal component estimator of the regression coefficients in terms of the original variables, we can solve $\alpha = G'\beta$ for $\beta$ to get $\beta = G\alpha$, since G is an orthogonal matrix. The principal component estimator is therefore $b_{PC} = G_q \hat{\alpha}_q$.

The Simulation Results
To exhibit multi-collinearity in the simulated data, we use different degrees of correlation between the variables included in the model. Specifically, we take the correlation values $\rho = 0.75$, $0.80$ and $0.95$, and four predictor variables are generated. Since the performance of the different estimators is influenced by the sample size, we use three types of samples: small (n = 20), medium (n = 50 and 80) and large (n = 200). The standard deviations of the error terms are taken as $\sigma = 10$, $25$ and $30$. Ordinary ridge estimates are computed using the different ridge parameters given in equations (8) to (11), and the principal components regression estimates are computed using equations (12) to (15). The mean square error (MSE) is used as the criterion for assessing the performance of the stated methods. The experiment is repeated 1000 times, and the results are presented in Tables (1), (2) and (3).
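A simulation of this kind might be organized as in the sketch below. It is not the authors' code and does not reproduce their tables. Several ingredients are assumptions: the predictors are generated with the common scheme x_ij = sqrt(1 − ρ²) z_ij + ρ z_i,p+1 (which gives pairwise correlation ρ²), the predictors are centered and scaled to unit length, the true coefficient vector is a unit vector with equal entries, only the ridge estimator with the Hoerl-Kennard-Baldwin parameter is included, and the number q of retained components is fixed in advance.

```python
import numpy as np

def make_predictors(rng, n, p, rho):
    """Correlated predictors (assumed scheme), centered and scaled so X'X is a correlation matrix."""
    z = rng.normal(size=(n, p + 1))
    X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
    X = X - X.mean(axis=0)
    return X / np.sqrt((X**2).sum(axis=0))

def simulate(n=50, p=4, rho=0.95, sigma=10.0, reps=1000, q=2, seed=0):
    """Estimated MSE of OLS, ridge (HKB parameter) and principal components estimators."""
    rng = np.random.default_rng(seed)
    X = make_predictors(rng, n, p, rho)
    beta = np.ones(p) / np.sqrt(p)            # assumed true coefficients
    XtX = X.T @ X
    lam, G = np.linalg.eigh(XtX)
    lam, G = lam[::-1], G[:, ::-1]            # descending eigenvalues
    G_q = G[:, :q]
    totals = {"OLS": 0.0, "RR-HKB": 0.0, "PCR": 0.0}
    for _ in range(reps):
        y = X @ beta + sigma * rng.normal(size=n)
        Xty = X.T @ y
        b_ols = np.linalg.solve(XtX, Xty)
        resid = y - X @ b_ols
        s2 = resid @ resid / (n - p)
        k = p * s2 / (b_ols @ b_ols)          # HKB ridge parameter
        b_rr = np.linalg.solve(XtX + k * np.eye(p), Xty)
        b_pc = G_q @ ((G_q.T @ Xty) / lam[:q])
        for name, b in (("OLS", b_ols), ("RR-HKB", b_rr), ("PCR", b_pc)):
            totals[name] += float(np.sum((b - beta) ** 2))
    return {name: total / reps for name, total in totals.items()}

results = simulate(reps=200)
print(results)
```

Looping `simulate` over the sample sizes, correlation levels and error standard deviations stated above would produce one estimated MSE per method and configuration, which is the layout a comparison table requires.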

Conclusions
Using the mean square error (MSE) as the criterion of performance, in our paper "An Alternative Approach for Selecting Ridge Parameter for Ordinary Ridge Regression Estimator", International Journal of Science and Research (IJSR), we compared the performance of different types of ordinary ridge regression estimators as well as the generalized ridge regression estimator, and we found that the proposed method was the best in the sense of MSE. In this paper we introduced another well-known method of estimation, the principal component method, and compared it with several different types of ridge regression estimators.

Let $C = \Lambda^{-1/2} H'y = G' b_{OLS}$ be the vector of uncorrelated components of $b_{OLS}$. That these components are uncorrelated can be seen from the variance-covariance matrix of C, which is easily shown to equal the diagonal matrix $\sigma^2 \Lambda^{-1}$.

Haitham J. for Pure & Appl. Sci. Vol. 31 (1) 2018