VIOLATION OF THE ASSUMPTION OF HOMOSCEDASTICITY AND DETECTION OF HETEROSCEDASTICITY

Original scientific paper

Abstract: In this paper, it is assumed that the assumption of homoskedasticity is violated in a certain classical linear regression model, and this is checked with several methods. The model describes the dependence of savings on income. The hypothesis was examined using simulated data. The aim of this paper is to develop a methodology for testing a model for the presence of heteroskedasticity. We used the graphical method in combination with four tests (Goldfeld-Quandt, Glejser, White and Breusch-Pagan). The methodology used in this paper showed that the assumption of homoskedasticity was violated, i.e. that heteroskedasticity is present.


Introduction
Econometrics is a discipline that determines the connections between economic phenomena and confirms or refutes economic theory, starting from mathematical equations and forming econometric models suitable for testing. Regression analysis is one of the most commonly used tools in econometrics for describing the relationships between economic phenomena. One of the classical assumptions of linear regression is homoskedasticity. Homoskedasticity implies that the variance of the random error is constant and equal for all observations. When the random errors of the classical linear regression model are not homoskedastic, they are heteroskedastic (Mladenović & Petrović, 2017).
The main goal of the paper is to show how the linear regression model behaves when the assumption of homoskedasticity is violated and how this violation is detected. The basic contribution of the paper is that it gathers in one place a developed method of detecting violations of homoskedasticity, i.e. the existence of heteroskedasticity, in linear regression models. This paper presents a methodology for detecting heteroskedasticity in linear regression models by combining a graphical method with four tests.
Djalic and Terzic / Decis. Mak. Appl. Manag. Eng. 4 (1) (2021), 1-18
After the introduction, a review of the literature is given, after which the basics of heteroskedasticity are presented. In this part of the paper, the Goldfeld-Quandt, Glejser, White and Breusch-Pagan tests are presented. At the end of the paper, concluding remarks are made and recommendations for further research are given. Aue et al. (2017) state that heteroskedasticity is a common characteristic of financial time series and most often refers to model development using autoregressive conditional heteroskedastic and generalized autoregressive conditional heteroskedastic processes. Ferman & Pinto (2019) formed a model of inference that adjusts differences-in-differences with few treated and many control groups in the presence of heteroskedasticity. Charpentier et al. (2019) developed the Gini-White test, which shows greater power against heteroskedasticity than the ordinary White test in cases when outlying observations affect the data. Moussa (2019) analyzes cases in which heteroskedasticity is the result of individual effects or idiosyncratic errors, or both. Linton & Xiao (2019) study the efficient estimation of nonparametric regression in the presence of heteroskedasticity and conclude that in many popular nonparametric regression models their method has a lower asymptotic variance than the usual unweighted procedures. A large number of authors pay attention to heteroskedasticity and develop models for solving particular problems (Baum & Schaffer, 2019; Brüggemann et al., 2016; Lütkepohl & Netšunajev, 2017; Cattaneo et al., 2018; Ou et al., 2016; Sato & Matsuda, 2017). Taşpınar et al. (2019) investigate the finite-sample properties of the heteroskedasticity-robust generalized method of moments estimator (RGMME), i.e. they develop a robust spatial econometric model with an unknown form of heteroskedasticity. Crudu et al. (2017) propose new inference procedures for instrumental variables models in the presence of many, potentially weak instruments that are robust to heteroskedasticity. Lütkepohl & Velinov (2016) compare models of long-term restrictions that are widely used to identify structural shocks in vector autoregressive (VAR) analysis based on heteroskedasticity. Harris & Kew (2017) test adaptive hypotheses for a fractional differencing parameter in a parametric ARFIMA model with unconditional heteroskedasticity of unknown form. In the case of heteroskedasticity, there are occasionally precise theoretical reasons for assuming that the errors have different variances for different values of the independent variable. Very often, however, the arguments for the presence of heteroskedasticity are less well defined, and sometimes there is only a vague suspicion that the assumption of homoskedasticity is too strong (Barreto & Howland, 2006). It is important to note that heteroskedasticity is a common occurrence in spatial samples due to the nature of data collection. Obvious sources of heteroskedasticity are associated with different dimensions of the regions in the study area and unequal concentrations of population and economic activity in rural and urban areas (Arbia, 2006). Baum & Lewbel (2019) provide advice and guidance to researchers who wish to use tests for heteroskedasticity.

Methodology
The simplest form of linear regression, which shows a linear relationship between two phenomena, is the simple linear regression:

Y_i = α + βX_i + ε_i,

where ε_i is the random error made during linear regression, and α and β are unknown parameters. To estimate the unknown parameters, we use a sample. For n fixed values of the independent variable, the corresponding values of the dependent variable are determined. In this way, n pairs (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) are obtained, which form the model of the simple linear regression sample:

y_i = a + bx_i + e_i.

The assumption of homoskedasticity for the random error ε_i is:

Var(ε_i) = σ², i = 1, 2, ..., n.

When this assumption is violated, that is, when the random errors of the classical linear regression model do not satisfy this property, they are heteroskedastic.
If the assumption of homoskedasticity (Jovičić, 2011):

Var(ε_i) = σ², i = 1, 2, ..., n,

is not met, but instead the variances differ across observations:

Var(ε_i) = σ_i², i = 1, 2, ..., n,

(Mladenović & Petrović, 2017), it can be said that the errors are heteroskedastic, or that there is heteroskedasticity in the model. Figure 1 presents a model in which heteroskedasticity of the error is assumed. The growth of savings with increasing income is shown, where the variance of savings differs across income levels. The variance is not constant, but increases with the growth of income, which corresponds to real economic relations (Mladenović & Petrović, 2017). Heteroskedasticity can also be caused by specification errors. For example, omitting an important regressor, whose influence is then absorbed by the error, can produce a different error variance for different observations. Similarly, a wrong functional form of the model can lead to heteroskedasticity of the error. As data collection techniques advance, providing more representative samples for statistical processing, errors and their dispersions decrease, and this too can be a reason for the occurrence of heteroskedasticity.
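As an illustration of such a model, the savings-income relationship can be simulated with errors whose standard deviation grows with income. All numerical values below (sample size, parameters, the error-scale rule) are illustrative assumptions, not the paper's actual simulation settings:

```python
import random

random.seed(42)

# Hypothetical savings-income model Y_i = alpha + beta*X_i + eps_i, where the
# error sd grows with income (here proportional to x^1.5, purely for
# illustration), so Var(eps_i) is not constant: heteroskedasticity.
n = 30
alpha_true, beta_true = 50.0, 0.10
income = [1000.0 + 200.0 * i for i in range(n)]                  # x_i
savings = [alpha_true + beta_true * x + random.gauss(0.0, 0.0005 * x ** 1.5)
           for x in income]                                      # y_i

# Ordinary least squares for simple regression: b = S_xy / S_xx, a = ybar - b*xbar.
xbar = sum(income) / n
ybar = sum(savings) / n
s_xx = sum((x - xbar) ** 2 for x in income)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(income, savings))
b = s_xy / s_xx
a = ybar - b * xbar
residuals = [y - (a + b * x) for x, y in zip(income, savings)]
print(f"a = {a:.2f}, b = {b:.4f}")
```

Plotting these residuals against income reveals the fan shape typical of heteroskedasticity: the spread of the residuals widens as income grows.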

Consequences of heteroskedasticity
The presence of heteroskedasticity in the model of the dependence of savings on income can be represented by the point scatter diagram in Figure 2 (Mladenović & Petrović, 2017). Estimates of the unknown parameters using the ordinary least squares method are determined from the condition that the residual sum of squares, Σe_i², is minimal. In that case, all squared residuals have the same weight, i.e. they give the same information when forming the necessary estimates. This condition is not precise enough for the sample presented in Figure 2. Data that are far from the sample regression line provide less useful information about its position than those that are closer to it. Higher residual values in absolute terms correspond to more distant data, and these residuals dominate the total residual sum of squares. Therefore, it is realistic to expect that the application of the ordinary least squares method does not provide estimates with desirable statistical properties.
Suppose that in the model (Mladenović & Petrović, 2017):

Y_i = α + βX_i + ε_i

there is heteroskedasticity:

Var(ε_i) = σ_i², i = 1, 2, ..., n.

The estimate b of the parameter β, obtained using the ordinary least squares method, is unbiased, because the corresponding proof does not use the assumption of a constant variance of the random error.
To determine the variance of the estimate b, we start from the expression:

b − β = Σ(x_i − x̄)ε_i / Σ(x_i − x̄)²,

based on which the variance is:

Var(b) = E[Σ(x_i − x̄)ε_i]² / [Σ(x_i − x̄)²]².

In this expectation, all terms of the form E(ε_i ε_j), i ≠ j, are equal to zero, so the expression for the variance of the estimate b reduces to:

Var(b) = Σ(x_i − x̄)² σ_i² / [Σ(x_i − x̄)²]².

Under homoskedasticity, the variance of the estimate b in the simple linear regression model is given by the familiar expression:

Var(b) = σ² / Σ(x_i − x̄)².

When the existence of heteroskedasticity is neglected, the estimate of the variance of b is obtained by the formula:

s_b² = s² / Σ(x_i − x̄)², where s² = Σe_i² / (n − 2).

When the variance of the random error grows in parallel with the explanatory variable, the estimate s_b² underestimates the actual variance of the estimate b. This arises because the estimate of the random error variance, s², underestimates the actual random error variance of the initial model.
Thus, the properties of the estimates of the parameters obtained by applying the ordinary least squares method in the presence of heteroskedasticity are:
1. The estimates are unbiased.
2. The estimates do not have minimal variance, that is, they are inefficient.
3. The estimate of the variance of the random error underestimates, in most cases, the actual variance. Therefore, the estimate of the variance of the slope estimate, s_b², also underestimates the variance Var(b).
4. Confidence intervals and tests based on the estimate of the variance of the random error are unreliable.
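Property 3 can be checked numerically: repeatedly simulating a model whose error variance grows with x (a hypothetical setup, not the paper's) and comparing the empirical variance of the slope estimate b across replications with the average of the naive estimate s_b² = s²/Σ(x_i − x̄)² shows the naive formula underestimating the true sampling variance:

```python
import random

random.seed(7)

def simulate_once(n=30):
    """One heteroskedastic simple-regression draw; returns (b, naive s_b^2)."""
    xs = [1000.0 + 200.0 * i for i in range(n)]
    # Error sd grows with x (illustrative assumption), so Var(eps_i) is not constant.
    ys = [50.0 + 0.10 * x + random.gauss(0.0, 0.0005 * x ** 1.5) for x in xs]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / s_xx
    a = ybar - b * xbar
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)       # naive estimate of the (assumed constant) error variance
    return b, s2 / s_xx      # naive s_b^2, valid only under homoskedasticity

reps = 3000
draws = [simulate_once() for _ in range(reps)]
b_mean = sum(b for b, _ in draws) / reps
empirical_var = sum((b - b_mean) ** 2 for b, _ in draws) / (reps - 1)
mean_naive = sum(sb2 for _, sb2 in draws) / reps
print(f"empirical Var(b) = {empirical_var:.3e}, mean naive s_b^2 = {mean_naive:.3e}")
```

Across replications the naive formula comes out smaller than the empirical variance of b, which is exactly the underestimation described in property 3.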

Testing of heteroskedasticity
The true nature of heteroskedasticity is usually unknown, so the choice of the appropriate test depends on the nature of the data. However, since the variation of the error around its mean typically depends on the values of the independent variables, all tests rely on examining whether the error variance is some function of the regressors. Several methods for testing the existence of heteroskedasticity are presented below.

Graphic method
One of the simplest methods for examining the existence of heteroskedasticity consists in visually inspecting the residuals of the estimated model. It is common to form a point scatter diagram of the residuals e_i, or of their absolute values |e_i|, against the independent variable x_i, since the variance of the random error is unobservable and the residuals serve as its empirical counterpart. Based on such point scatter diagrams, we can conclude whether heteroskedasticity exists and, if so, in what form it occurs, i.e. how the variance of the random error is generated. Figure 3 presents some of the possible point scatter diagrams (Mladenović, 2011). The first graph corresponds to a model in which there is no systematic dependence between the variance of the random error and the independent variable x_i; in such a model, random errors can be considered homoskedastic. The other graphs show regularities in the position of the points on the scatter diagram, suggesting possible heteroskedasticity. The second graph indicates a linear dependence, while the third and fourth graphs represent a dependence expressed in square form, in the sense that the variance of the random error is correlated with x_i².
Graphic methods are only a means of preliminary analysis. In order to get a more precise answer to the question of whether heteroskedasticity is present or not, it is necessary to use appropriate tests.
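A numerical stand-in for the scatter diagram is the sample correlation between |e_i| and x_i, which summarizes the pattern the plots reveal. The data-generating process below is an illustrative assumption, not the paper's simulation:

```python
import math
import random

random.seed(42)

# Illustrative heteroskedastic sample: error sd grows with x.
n = 30
xs = [1000.0 + 200.0 * i for i in range(n)]
ys = [50.0 + 0.10 * x + random.gauss(0.0, 0.0005 * x ** 1.5) for x in xs]

# Fit OLS and form the residuals.
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar
res = [y - a - b * x for x, y in zip(xs, ys)]

def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((ui - mu) ** 2 for ui in u))
    sv = math.sqrt(sum((vi - mv) ** 2 for vi in v))
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (su * sv)

# A clearly positive correlation between |e_i| and x_i mirrors the fan-shaped
# scatter diagram that suggests heteroskedasticity.
r = corr([abs(e) for e in res], xs)
print(f"corr(|e|, x) = {r:.3f}")
```
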

Goldfeld-Quandt test
One of the earliest tests, which is very simple and often used, is the Goldfeld-Quandt test (Kalina & Peštová, 2017). It tests the null hypothesis of a constant random error variance against the alternative that the variance of the random error is an increasing function of the independent variable. It is assumed that the random error is non-autocorrelated and normally distributed. The test procedure is as follows (Mladenović & Petrović, 2017): the observations are ordered by ascending value of the independent variable; two separate regressions are then estimated, one on the first group of observations (small values of x) and one on the last group (large values of x), yielding the residual sums of squares Σe_1i² and Σe_2i². Under homoskedasticity, these two sums are close, so their quotient is close to the value 1. On the contrary, the existence of heteroskedasticity results in a higher value of the residual sum of squares in the subsample with large values of x. If each subsample contains m observations and k is the number of parameters to be estimated in the model, each residual sum of squares has m − k degrees of freedom, and the Goldfeld-Quandt test statistic:

F = Σe_2i² / Σe_1i²

has an F distribution with (m − k, m − k) degrees of freedom under the null hypothesis. If the calculated value of the F statistic is higher than the corresponding critical value at a given level of significance, we conclude that there is heteroskedasticity in the model.
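The Goldfeld-Quandt procedure can be sketched as follows. The data-generating process is an illustrative assumption; with 15 observations per group and k = 2, the 5% critical value F(13, 13) = 2.58 matches the one used later in the paper:

```python
import random

random.seed(42)

# Illustrative heteroskedastic sample (error sd grows with x).
n = 30
data = sorted(
    (x, 50.0 + 0.10 * x + random.gauss(0.0, 0.0005 * x ** 1.5))
    for x in [1000.0 + 200.0 * i for i in range(n)]
)  # step 1: order observations by ascending x

def rss(sample):
    """Fit simple OLS on (x, y) pairs and return the residual sum of squares."""
    m = len(sample)
    xbar = sum(x for x, _ in sample) / m
    ybar = sum(y for _, y in sample) / m
    b = sum((x - xbar) * (y - ybar) for x, y in sample) / \
        sum((x - xbar) ** 2 for x, _ in sample)
    a = ybar - b * xbar
    return sum((y - a - b * x) ** 2 for x, y in sample)

# Step 2: separate regressions on the low-x and high-x halves (no central
# observations are omitted here, for simplicity).
rss_low = rss(data[:15])
rss_high = rss(data[15:])

# Step 3: F statistic; under homoskedasticity it is near 1.
F = rss_high / rss_low
print(f"F = {F:.2f}")
F_CRIT = 2.58  # 5% critical value of F(13, 13), as used in the paper
print("heteroskedasticity" if F > F_CRIT else "homoskedasticity not rejected")
```
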

Glejser test
The application of this test does not require a priori knowledge of the nature of heteroskedasticity; it is discovered during the testing. The test procedure is as follows (Im, 2000): the initial regression is estimated and the residuals e_i are formed; the absolute residuals are then regressed on a power of the independent variable,

|e_i| = γ_0 + γ_1 x_i^h + v_i,

for different values of the parameter h; the statistical significance of the estimate of the parameter γ_1 is tested using the t-test; and the coefficients of determination obtained for different values of the parameter h are compared. The statistical significance of the estimate of γ_1 leads to the conclusion that there is heteroskedasticity. The very character of heteroskedasticity is determined according to the regression with the highest value of the coefficient of determination.
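The Glejser auxiliary regressions can be sketched as follows. The data-generating process and the candidate powers h = 1, 1/2, −1 are illustrative assumptions:

```python
import random

random.seed(42)

# Illustrative heteroskedastic sample: error sd grows with x.
n = 30
xs = [1000.0 + 200.0 * i for i in range(n)]
ys = [50.0 + 0.10 * x + random.gauss(0.0, 0.0005 * x ** 1.5) for x in xs]

def ols(us, vs):
    """Simple OLS of v on u with intercept; returns (intercept, slope, R^2)."""
    m = len(us)
    ubar, vbar = sum(us) / m, sum(vs) / m
    slope = sum((u - ubar) * (v - vbar) for u, v in zip(us, vs)) / \
        sum((u - ubar) ** 2 for u in us)
    inter = vbar - slope * ubar
    ss_res = sum((v - inter - slope * u) ** 2 for u, v in zip(us, vs))
    ss_tot = sum((v - vbar) ** 2 for v in vs)
    return inter, slope, 1.0 - ss_res / ss_tot

# Step 1: initial regression and absolute residuals |e_i|.
a, b, _ = ols(xs, ys)
abs_res = [abs(y - a - b * x) for x, y in zip(xs, ys)]

# Step 2: regress |e_i| on x_i^h for candidate powers h and compare R^2;
# the regression with the largest R^2 characterizes the heteroskedasticity.
results = {h: ols([x ** h for x in xs], abs_res)[2] for h in (1.0, 0.5, -1.0)}
best_h = max(results, key=results.get)
print({h: round(r2, 3) for h, r2 in results.items()}, "best h:", best_h)
```
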

White test
The test is based on the comparison of the variances of the estimators obtained by the ordinary least squares method under homoskedasticity and under heteroskedasticity. If the null hypothesis is correct, the two estimated variances would differ only due to sampling fluctuations. The null hypothesis of homoskedasticity of the random error is tested against the broadly set alternative hypothesis that the variance of the random error depends on the explanatory variables, their squares and cross-products, i.e. the variation of the residuals under the combined action of the regressors is examined.
The White test consists of the following steps (White, 1980):
Step 1: The model is estimated by ordinary least squares and the residuals e_i are formed.
Step 2: The auxiliary regression of the squared residuals e_i² on the regressors, their squares and their cross-products is estimated.
Step 3: Based on the coefficient of determination R² from the auxiliary regression, the test statistic nR² is calculated; under the null hypothesis it has a χ² distribution with degrees of freedom equal to the number of regressors in the auxiliary regression.
Step 4: If the calculated value of the test statistic is greater than the tabular value, i.e. if the coefficient of determination in the auxiliary regression of the squared residuals is high enough, the homoskedasticity hypothesis is rejected.
The White test is not sensitive to deviations of the errors from normality and it is simple, so it is often used to test the existence of heteroskedasticity. When there are multiple regressors, the introduction of the squares and all cross-products in the auxiliary regression can mean a large loss in the number of degrees of freedom. That is why the White test is often performed without the cross-products.
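The steps above can be sketched on simulated data. With a single regressor, the version below uses only x² in the auxiliary regression, a simplified variant (no cross-products) that leaves one degree of freedom, matching the χ²(1) critical value 3.841 used later in the paper; the data-generating process is an illustrative assumption:

```python
import random

random.seed(42)

# Illustrative heteroskedastic sample: error sd grows with x.
n = 30
xs = [1000.0 + 200.0 * i for i in range(n)]
ys = [50.0 + 0.10 * x + random.gauss(0.0, 0.0005 * x ** 1.5) for x in xs]

def ols(us, vs):
    """Simple OLS of v on u with intercept; returns (intercept, slope, R^2)."""
    m = len(us)
    ubar, vbar = sum(us) / m, sum(vs) / m
    slope = sum((u - ubar) * (v - vbar) for u, v in zip(us, vs)) / \
        sum((u - ubar) ** 2 for u in us)
    inter = vbar - slope * ubar
    ss_res = sum((v - inter - slope * u) ** 2 for u, v in zip(us, vs))
    ss_tot = sum((v - vbar) ** 2 for v in vs)
    return inter, slope, 1.0 - ss_res / ss_tot

# Step 1: estimate the model and form the squared residuals.
a, b, _ = ols(xs, ys)
e2 = [(y - a - b * x) ** 2 for x, y in zip(xs, ys)]

# Step 2: auxiliary regression of e_i^2 on the single regressor x_i^2
# (simplified White variant without cross-products).
_, _, r2_aux = ols([x * x for x in xs], e2)

# Steps 3-4: statistic n*R^2 ~ chi^2(1) under homoskedasticity.
stat = n * r2_aux
print(f"n*R^2 = {stat:.2f}")
CHI2_CRIT = 3.841  # 5% critical value of chi^2 with 1 degree of freedom
print("heteroskedasticity" if stat > CHI2_CRIT else "homoskedasticity not rejected")
```
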

Breusch-Pagan test
This test is based on the idea that the estimates of the regression coefficients obtained by the least squares method should not differ significantly from the maximum likelihood estimates if the homoskedasticity hypothesis is true (Halunga et al., 2017). The null hypothesis of homoskedasticity of the random error is tested against the broadly set alternative hypothesis that a number of factors influence the variance of the random error. For simplicity, assume that the test examines the influence of the explanatory variable X_i in a simple regression. The testing procedure is as follows (Mladenović & Nojković, 2017): the residuals e_i of the estimated model are formed and the average squared residual σ̂² = Σe_i²/n is calculated; the standardized squared residuals g_i = e_i²/σ̂² are then regressed on X_i; and half of the explained sum of squares from this auxiliary regression is used as the test statistic, which under the null hypothesis has a χ² distribution with one degree of freedom. The heteroskedasticity hypothesis is accepted when the value of the calculated statistic is greater than the critical value of the χ² distribution with one degree of freedom.

Simulation results

After the coefficients were obtained, the analysis of variance of the model was performed (Table 3). The coefficient of determination was R² = 0.669. Figure 4 graphically shows the simulation model; the sample regression line Ŷ is shown by a full line. The graph clearly shows that the scatter is higher for higher values of the independent variable X and that the sample line Ŷ slightly deviates from the line Y. After the graphical representation of the model, it can be assumed that certain deviations exist, so we test for heteroskedasticity with the previously described tests. Figure 5, graph (a), clearly shows the relationship between the residuals and the independent variable X (the larger X, the larger the residuals), while diagram (b) shows the dependence of the squared residuals on X (a dependence in square form).
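The Breusch-Pagan procedure described in this section can be sketched on simulated data; the data-generating process is an illustrative assumption, and 3.841 is the 5% critical value of the χ² distribution with one degree of freedom:

```python
import random

random.seed(42)

# Illustrative heteroskedastic sample: error sd grows with x.
n = 30
xs = [1000.0 + 200.0 * i for i in range(n)]
ys = [50.0 + 0.10 * x + random.gauss(0.0, 0.0005 * x ** 1.5) for x in xs]

# Estimate the model by ordinary least squares and form the residuals.
xbar, ybar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / s_xx
a = ybar - b * xbar
res = [y - a - b * x for x, y in zip(xs, ys)]

# Step 1: average squared residual, sigma_hat^2 = sum(e_i^2) / n.
sigma2_hat = sum(e * e for e in res) / n

# Step 2: regress the standardized squared residuals g_i = e_i^2 / sigma_hat^2 on x_i.
g = [e * e / sigma2_hat for e in res]
gbar = sum(g) / n
slope_g = sum((x - xbar) * (gi - gbar) for x, gi in zip(xs, g)) / s_xx
ess = slope_g ** 2 * s_xx      # explained sum of squares of the auxiliary regression

# Step 3: the statistic ESS / 2 has a chi^2(1) distribution under homoskedasticity.
bp = ess / 2.0
print(f"Breusch-Pagan statistic = {bp:.2f}")
print("heteroskedasticity" if bp > 3.841 else "homoskedasticity not rejected")
```
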

Goldfeld-Quandt test
After ordering the observations in ascending order of magnitude of X, two linear regression models were estimated (for the first 15 and the last 15 observations). As the critical value of the F distribution with 13 and 13 degrees of freedom at a significance level of 0.05 is 2.58, this test shows that heteroskedasticity is present (the value of the test statistic is higher than the critical value).

Glejser test
Three linear regression models, corresponding to different values of the parameter h in the Glejser regression, are tested. The estimated parameters that stand next to the regressors are statistically significant. All parameters are suitable for testing the hypothesis of heteroskedasticity, and based on the coefficient of determination the first regression is preferred, because its coefficient of determination is the largest. This test also shows the presence of heteroskedasticity.

White test
An auxiliary linear regression of the squared residuals was estimated. The critical value of the χ² distribution with one degree of freedom at a significance level of 0.05 is 3.841, and since the calculated statistic exceeds it, it is also concluded that heteroskedasticity is present.

Discussion
After testing, it is clear that all four tests show the presence of heteroskedasticity in the given model. The Goldfeld-Quandt test statistic is higher than the corresponding critical value of 2.58 at the given level of significance (0.05), so we can conclude that heteroskedasticity is present in the model. In the Glejser test, the parameter γ_1 is tested and the coefficients of determination obtained for different values of the parameter h are compared. In this model (Table 5), all parameters are suitable for testing the hypothesis of heteroskedasticity, and based on the coefficient of determination the first regression is preferred, because its coefficient of determination is the largest. This test also shows the presence of heteroskedasticity. The White test shows that the calculated value of the test statistic (18.21) is greater than the tabular value, so we can conclude that heteroskedasticity is present. In the Breusch-Pagan test, the value of the calculated statistic is 20.33, which is greater than the critical value 3.841 of the χ² distribution with one degree of freedom, so we again conclude that heteroskedasticity is present.

Conclusion
One of the classical assumptions of linear regression is homoskedasticity, and when it is violated, heteroskedasticity occurs. Graphical methods and heteroskedasticity tests are used to detect heteroskedasticity, although it is not possible to say with certainty which test is the best. In this paper, we explained and applied the graphical method and four tests (the Goldfeld-Quandt, Glejser, White and Breusch-Pagan tests). The review of the literature shows that many authors have addressed this issue and used various tests to detect heteroskedasticity.
The tests were applied to simulated data. The graphical method and all four applied tests confirm the presence of heteroskedasticity, so we can conclude that all four tests performed well and that the assumption of the existence of heteroskedasticity in the model is confirmed.
Future researchers are left with the question of solving heteroskedasticity, i.e. the question of removing heteroskedasticity from the model. When eliminating heteroskedasticity, care must be taken over which method can be used, depending on the form of σ_i².
Author Contributions: Each author has participated and contributed sufficiently to take public responsibility for appropriate portions of the content.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflicts of interest.