Comparing the Principal Regression Analysis Method with Ridge Regression Analysis in Overcoming Multicollinearity on Human Development Index (HDI) Data in Regency/City of East Java in 2018

East Java Province is one of the densely populated provinces. Its population density does not hamper the welfare of its people, which can be seen through the Human Development Index (HDI), covering life expectancy, education, and the economy. According to the Central Statistics Agency (BPS) of East Java, the HDI increased from 2017 to 2018, but the data exhibit multicollinearity. The researchers compared Principal Component Regression Analysis with Ridge Regression Analysis for overcoming this multicollinearity. The preferred method is the one with a small MSE value and a large R² value. Based on the results and discussion, Principal Component Regression Analysis has an MSE of 2.51 and an R² of 91.0%, while Ridge Regression Analysis has an MSE of 0.00071 and an R² of 97.7%. The MSE of Ridge Regression is therefore smaller than the MSE of Principal Component Regression, and the R² of Ridge Regression is larger than the R² of Principal Component Regression. So, the best method for overcoming the multicollinearity problem in the regency/city Human Development Index (HDI) data of East Java in 2018 is Ridge Regression Analysis.


Introduction
Many of the problems encountered in everyday life cannot be separated from the factors that influence them. Such problems may be economic, social, cultural, political, religious, or health-related, or may concern population density. The population density of an area is closely tied to the welfare of the people who live in it. One densely populated area is East Java Province. The level of welfare in East Java Province can be seen through the Human Development Index (HDI), which covers life expectancy, education, and the economy.
According to the East Java Central Statistics Agency (BPS), the HDI of East Java Province increased by 0.72% from 2017 to 2018. Surabaya City has the highest HDI, 81.74, while Sampang has the lowest, 61.00. However, the regency/city Human Development Index (HDI) data of East Java Province in 2018 exhibit multicollinearity. Multicollinearity is a strong relationship between one independent variable and another independent variable. The independent variables, or factors, that influence the regency/city HDI in East Java Province in 2018 include the percentage of health complaints, the percentage of poverty, the percentage of expected years of schooling, the percentage of mean years of schooling, and the percentage of gross domestic product.
Given the problems above, Principal Component Regression Analysis and Ridge Regression Analysis will be used to overcome the multicollinearity. (Mangaraj, Aparajita [4]) discuss the analysis and construction of a general Human Development Index (HDI) model. According to (Sen, Zhang, Nagarajaiah, Sun [6]), multicollinearity problems can be overcome with Principal Component Regression Analysis. Multicollinearity can also be overcome with Ridge Regression Analysis (Gorski, Jakubowska [2]). The advantage of the principal component approach is that high correlations can be reduced without eliminating the original variables. However, it can only show the characteristics of the independent variables. The strength of Ridge Regression Analysis is that the ridge estimator, although biased, is more stable and has a relatively small variance. The ridge regression coefficient estimates are obtained by accepting this bias.
The purpose of this study is to find the best method for dealing with multicollinearity in the Human Development Index (HDI) data of regencies/cities in East Java Province in 2018. This article compares Principal Component Regression Analysis with Ridge Regression Analysis, where the selection criteria are a small Mean Square Error (MSE) and a large coefficient of determination (R²). The research is expected to identify the best method for overcoming multicollinearity in the regency/city HDI data of East Java Province in 2018 and to produce significant results.
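For reference, the two selection criteria can be written out as below. This is a sketch under the usual ANOVA-table definitions, which the article does not state explicitly. The symbols n (number of observations), p (number of regressors in the fitted model), y_i (observed HDI), \hat{y}_i (fitted HDI), and \bar{y} (mean HDI) are notation assumed here.

```latex
\begin{align*}
\mathrm{SSE} &= \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, &
\mathrm{SST} &= \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2, \\
\mathrm{MSE} &= \frac{\mathrm{SSE}}{n - p - 1}, &
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.
\end{align*}
```

Because MSE is expressed in squared units of the response, comparing two MSE values presumes that both models are evaluated on the same scale of the dependent variable.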

Method
This article uses a quantitative research method, which is systematic, detailed, and structured. The study also uses tables to present the results obtained from the data. The data used in a quantitative approach are population or sample data, collected with research instruments and analyzed statistically to test a predetermined hypothesis.
The Human Development Index data used in this study are secondary data from the Central Statistics Agency of East Java Province in 2018 [1]. The data set consists of 38 observations. Factors that influence the Human Development Index (HDI) include life expectancy, education, and the economy. Each of these HDI factors can be broken down into further aspects (Yektiningsih [10]).
This study uses two types of variables, independent and dependent. The dependent variable is the regency/city Human Development Index (HDI) in East Java Province in 2018. The independent variables are the percentage of health complaints, the percentage of poverty, the percentage of expected years of schooling, the percentage of mean years of schooling, and the percentage of gross domestic product. The analysis was carried out with Minitab 16 and Microsoft Excel.
Before Principal Component Regression Analysis and Ridge Regression Analysis are applied, the data must first be tested against the assumptions of multiple linear regression. The assumptions to be checked are normality, no heteroscedasticity, no autocorrelation, and no multicollinearity. Once the data have been tested against these assumptions and found to exhibit multicollinearity, the problem is addressed with Principal Component Regression Analysis and Ridge Regression Analysis.
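One common way to check the multicollinearity assumption is the variance inflation factor (VIF). The sketch below is illustrative only: it uses NumPy rather than the Minitab workflow of this study, the data are synthetic (not the BPS data), and the function name `vif` is a hypothetical helper. A VIF above 10 is a frequently used rule of thumb for problematic multicollinearity, not a threshold taken from this article.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    # Correlation matrix of the predictors (columns are variables).
    R = np.corrcoef(X, rowvar=False)
    # The diagonal of the inverse correlation matrix holds the VIFs.
    return np.diag(np.linalg.inv(R))

# Synthetic illustration (assumption): x3 is almost exactly x1 + x2,
# so all three predictors are strongly collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=38)
x2 = rng.normal(size=38)
x3 = x1 + x2 + rng.normal(scale=0.05, size=38)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print("VIF per predictor:", vifs)
```

Each VIF equals 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors, so values far above 1 indicate that a predictor is nearly a linear combination of the others.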
The first procedure, Principal Component Regression Analysis, consists of standardizing the independent variables, determining the eigenvalues and eigenvectors, regressing the dependent variable on the new independent variables (the principal component scores), reversing the transformation, and carrying out the ANOVA test of the principal component analysis ([2], [3], [5], [9]).
The second procedure is Ridge Regression Analysis. Its steps are transforming the Y and X variables through centering and rescaling, forming a matrix from the transformed variables, determining the value of the bias constant c, determining the coefficient estimates at the value of c selected from the VIF values, carrying out the ridge regression ANOVA test, and examining the relationship among the ridge regression parameter estimates ([3], [6], [7], [8]).
Based on Table 1, only the fourth assumption of multiple linear regression, the absence of multicollinearity, is not met. This unmet assumption will be addressed with two methods, Principal Component Regression Analysis and Ridge Regression Analysis. The results of the two methods will then be compared to find the best method for overcoming the multicollinearity problem.
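The principal component regression steps listed above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the Minitab procedure used in the study: the function name `pcr_fit`, the number of retained components `k`, and the synthetic data are all hypothetical.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression keeping the first k components.

    Returns the intercept and coefficients on the ORIGINAL predictor scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std                          # 1. standardize predictors
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # 2. eigenvalues, eigenvectors
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    V = eigvecs[:, order[:k]]
    W = Z @ V                                     # uncorrelated component scores
    A = np.column_stack([np.ones(len(y)), W])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None) # 3. regress y on the scores
    beta = (V @ gamma[1:]) / std                  # 4. reverse the transformation
    intercept = gamma[0] - mean @ beta
    return intercept, beta

# Synthetic illustration (assumption): three collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=38)
x2 = rng.normal(size=38)
x3 = x1 + x2 + rng.normal(scale=0.05, size=38)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=38)

intercept, beta = pcr_fit(X, y, k=2)
print("intercept:", intercept)
print("coefficients:", beta)
```

Dropping the components with the smallest eigenvalues is what removes the near-linear dependence among the predictors; keeping all components would simply reproduce ordinary least squares.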

Regression Analysis of Principal Components. According to (Sriningsih, Hatidja, Prang [8]), the Principal Component Regression Analysis process starts with standardizing the independent variables, then determines the eigenvalues and eigenvectors, regresses the dependent variable on the new independent variables (the principal component scores), reverses the transformation, and carries out the ANOVA test of the principal component analysis. The process produces new independent variables that are no longer correlated. From the results of the Principal Component Regression model, the transformation is reversed back to the original variables (Setiani [7]).
The ANOVA test shows the MSE value and the R² value of the Ridge Regression. The MSE and R² values of the Ridge Regression Analysis method can be seen in Table 3.
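The ridge regression steps described in the Method section (centering and rescaling, forming the correlation-form matrices, choosing the bias constant c with the help of the VIF values, and computing MSE and R²) can be sketched as below. This is an illustration under stated assumptions rather than the study's Minitab computation: the function name `ridge_fit`, the candidate values of c, and the synthetic data are hypothetical, and the estimator is the standard correlation-form ridge estimator (r_XX + cI)⁻¹ r_XY.

```python
import numpy as np

def ridge_fit(X, y, c):
    """Correlation-form ridge regression with bias constant c.

    Returns intercept and coefficients on the original scale, plus ridge VIFs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean, y_std = y.mean(), y.std(ddof=1)
    # Centering and rescaling so that Xs'Xs is the correlation matrix.
    Xs = (X - x_mean) / (np.sqrt(n - 1) * x_std)
    ys = (y - y_mean) / (np.sqrt(n - 1) * y_std)
    rxx, rxy = Xs.T @ Xs, Xs.T @ ys
    inv = np.linalg.inv(rxx + c * np.eye(p))
    b_std = inv @ rxy                      # standardized ridge coefficients
    vif = np.diag(inv @ rxx @ inv)         # VIFs at this value of c
    beta = b_std * y_std / x_std           # back to the original scale
    intercept = y_mean - x_mean @ beta
    return intercept, beta, vif

# Synthetic illustration (assumption): three collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=38)
x2 = rng.normal(size=38)
x3 = x1 + x2 + rng.normal(scale=0.05, size=38)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=38)

for c in (0.0, 0.01, 0.05, 0.1):           # hypothetical candidate values
    _, _, v = ridge_fit(X, y, c)
    print(f"c = {c:<5} largest VIF = {v.max():.2f}")

intercept, beta, _ = ridge_fit(X, y, c=0.05)
resid = y - (intercept + X @ beta)
sse = resid @ resid
mse = sse / (len(y) - X.shape[1] - 1)
r2 = 1.0 - sse / np.sum((y - y.mean()) ** 2)
print("MSE:", mse, "R^2:", r2)
```

In this formulation c = 0 reproduces ordinary least squares, and increasing c lowers every VIF at the cost of some bias, which is the trade-off behind selecting c from the VIF values.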

Discussion of Results
Based on the research results, the Ridge Regression method yields a general model for overcoming the multicollinearity problem. This model indicates that the increase in the Human Development Index (HDI) of East Java in 2018 is associated with a decrease in the level of health complaints and the percentage of poverty, and with an increase in the percentage of expected years of schooling, the percentage of mean years of schooling, and the percentage of gross domestic product.
The Principal Component Regression method likewise yields a general model for overcoming the multicollinearity problem. In the Ridge Regression model, the Human Development Index (HDI) increases when the level of health complaints and poverty decrease and when the expected years of schooling, the mean years of schooling, and gross domestic product increase. The Principal Component Regression model differs in one respect: in it, only the level of health complaints still increases together with HDI. So, the better model of the two is the Ridge Regression model.
In terms of the calculations, the criteria are the smallest MSE and the largest coefficient of determination (R²). The MSE of Ridge Regression, 0.0007, is smaller than the MSE of Principal Component Regression, 2.51. The R² of Ridge Regression, 97.7%, is larger than the R² of Principal Component Regression, 91.0%. Both criteria for selecting the best method, a small MSE and a large R², are therefore met. So, it can be concluded that the best method for overcoming the multicollinearity problem in the regency/city Human Development Index (HDI) data of East Java in 2018 is Ridge Regression Analysis.

Conclusion
Based on the results and discussion, the MSE of Ridge Regression is smaller than the MSE of Principal Component Regression, and the R² of Ridge Regression is larger than the R² of Principal Component Regression. Both criteria for selecting the best method, a small MSE and a large R², are therefore met. So, it can be concluded that the best method for overcoming the multicollinearity problem in the regency/city Human Development Index (HDI) data of East Java in 2018 is Ridge Regression Analysis. A drawback of previous research is that its test of the multicollinearity assumption was not strong enough and that it used only one method to overcome the problem. When more than one method is used, the methods can be compared to see which is more accurate and reliable. The author hopes for a more in-depth study of the Human Development Index (HDI), in particular of which further aspects of each main HDI factor are needed to explain an increase in HDI in an area.