Testing under a special form of heteroscedasticity

ABSTRACT In the presence of heteroscedasticity, conventional standard errors (which assume homoscedasticity) can be biased up or down. The most common form of heteroscedasticity leads to conventional standard errors that are too small. In this study, we discuss the conditions under which conventional standard errors are too large. In such settings, standard tests of heteroscedasticity may fail and leave the heteroscedasticity undetected. This is particularly problematic because using heteroscedasticity robust standard errors yields power gains when testing for the causal effect in such settings.


I. Introduction
In the June 2011 issue of the American Economic Review, Vikesh Amin commented on an article by Dorothe Bonjour et al. published in December 2003, also in the American Economic Review. Bonjour et al. (2003) estimated the private return to education using a dataset containing 428 female monozygotic twins. One of their main findings was an estimated return to one additional year of education of 7.7%, which is statistically significant at the 5% level. Amin (2011) replicated their results and performed similar estimations excluding outliers. He found that, once these extreme values were removed, many of Bonjour et al.'s within-twin pair estimates became smaller in magnitude and were either significant only at less stringent significance levels or insignificant. In this study, we show that the inference in Amin (2011) is mostly incorrect due to the presence of a special form of heteroscedasticity. The correct standard errors turn out to be around 15% lower, leading to different policy conclusions. In contrast to Amin (2011), we find a significant positive return to education for most of the within-twin pair regressions.
In Section II, we provide a theoretical background for the situation in which an upward bias in conventional standard errors occurs. There we also discuss the difficulties in using standard tests for heteroscedasticity in such settings. We then propose a heteroscedasticity test with better power properties. Section III presents the results of a series of Monte Carlo simulations based on data exhibiting this special form of heteroscedasticity. In Section IV, we use three test procedures to test for heteroscedasticity in Bonjour et al.'s dataset. The Koenker variant of the Breusch-Pagan test and the White test do not reject the hypothesis of homoscedasticity, as expected given the special form of heteroscedasticity present. However, our proposed test rejects the null hypothesis in favour of the special form of heteroscedasticity. Also in Section IV, we present the within-twin pair regressions using the appropriate standard errors. Section V concludes.

II. Inference problems
In the presence of heteroscedasticity, conventional standard errors (which assume homoscedasticity) can be biased up or down. The most common form of heteroscedasticity, where the residual variance rises with increasing regressor values, usually leads to conventional standard errors that are too small. When Wald tests based on these standard errors are insignificant, heteroscedasticity robust standard errors do not change inference. On the other hand, inference is conservative in a setting with upward-biased conventional standard errors, and using heteroscedasticity robust standard errors may change inference in this case. Following Angrist and Pischke (2010), consider the bivariate regression model

$$y_i = \alpha + \beta x_i + e_i. \qquad (1)$$

Let $\sigma^2_{\hat\beta}$ denote the true sampling variance of the estimator $\hat\beta$ and let $[\sigma^2_{\hat\beta}]_{\mathrm{conv}}$ denote the corresponding conventional sampling variance, which assumes homoscedasticity. Then, in large samples,

$$\sigma^2_{\hat\beta} - [\sigma^2_{\hat\beta}]_{\mathrm{conv}} = \frac{\mathrm{Cov}[e_i^2, (x_i - \bar{x})^2]}{N\,[\mathrm{Var}(x_i)]^2}. \qquad (2)$$

An upward bias in conventional standard errors occurs if there is a negative covariance between the squared residual $e_i^2$ and the squared deviation of $x_i$ from its mean $\bar{x}$. In this case, the further $x_i$ lies from $\bar{x}$, the smaller the conditional variance of the residual, $\mathrm{Var}[e_i|x_i] = E[e_i^2|x_i]$. When $\mathrm{Cov}[e_i^2, (x_i - \bar{x})^2] < 0$, the scatter plot of $e_i$ against the regressor $x_i$ often resembles an ellipse. That is why we refer to this form of heteroscedasticity as elliptical heteroscedasticity.
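The decomposition referred to as Equation (2), which ties the gap between robust and conventional variances to $\mathrm{Cov}[e_i^2, (x_i - \bar{x})^2]$, can be checked numerically. What follows is a minimal sketch, assuming an HC0 (White) sandwich estimator for the robust variance and a residual variance with divisor $N$ for the conventional one. With these sample-moment conventions the identity holds exactly; degrees-of-freedom corrections would make it hold only approximately. The function name `variance_gap` and the simulated elliptical data are hypothetical.

```python
import numpy as np


def variance_gap(x, y):
    """Return (robust, conventional, rhs) for the OLS slope in y = alpha + beta*x + e.

    robust:       HC0 sandwich variance of beta_hat
    conventional: homoscedastic variance, residual variance with divisor N
    rhs:          Cov[e^2, (x - xbar)^2] / (N * Var(x)^2), using 1/N sample moments
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xd = x - x.mean()
    sxx = xd @ xd
    beta_hat = xd @ (y - y.mean()) / sxx
    e = y - y.mean() - beta_hat * xd            # OLS residuals
    e2 = e ** 2
    robust = np.sum(xd ** 2 * e2) / sxx ** 2
    conventional = e2.mean() / sxx
    cov = np.mean(e2 * xd ** 2) - e2.mean() * np.mean(xd ** 2)
    rhs = cov / (n * np.mean(xd ** 2) ** 2)
    return robust, conventional, rhs


# Hypothetical elliptical data: the error variance shrinks as x moves away from its mean.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
e = np.exp(-0.5 * x ** 2) * rng.normal(size=500)
y = 1.0 + 0.04 * x + e
robust, conventional, rhs = variance_gap(x, y)
print(robust - conventional, rhs)   # the two numbers agree up to floating-point rounding
```

Because the covariance in the numerator is negative for this kind of data, the robust variance comes out below the conventional one, which is the upward bias in conventional standard errors described above.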
If the data exhibit elliptical heteroscedasticity, the usual Wald tests for hypotheses about $\beta$ in the bivariate regression model using conventional standard errors have an actual size smaller than the nominal size. Policy conclusions based on estimates with conventional standard errors are thus conservative. Conversely, Wald tests using heteroscedasticity robust standard errors are asymptotically size-correct and therefore yield valid policy conclusions. Furthermore, in this case heteroscedasticity robust Wald tests yield power gains compared with tests using conventional standard errors.
By exploiting Equation (2), we can test specifically for elliptical heteroscedasticity in the classical bivariate regression model.
To derive our elliptical heteroscedasticity test, consider the regression

$$e_i^2 = \delta_0 + \delta_1 (x_i - \bar{x})^2 + v_i, \qquad (3)$$

where the squared residuals $e_i^2$ are obtained from the regression of $y_i$ on $x_i$.
Under elliptical heteroscedasticity, we know that $\mathrm{Cov}[e_i^2, (x_i - \bar{x})^2] < 0$ and therefore

$$\delta_1 = \frac{\mathrm{Cov}[e_i^2, (x_i - \bar{x})^2]}{\mathrm{Var}[(x_i - \bar{x})^2]} < 0. \qquad (4)$$

Hence, by exploiting this knowledge, we can test specifically for elliptical heteroscedasticity.
Our elliptical heteroscedasticity test conducts a one-sided Wald test of $H_0: \delta_1 \geq 0$ against $H_a: \delta_1 < 0$. In words, the hypotheses are $H_0$: no elliptical heteroscedasticity and $H_a$: elliptical heteroscedasticity.
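The one-sided test can be sketched as follows. This is a minimal illustration under stated assumptions: the squared OLS residuals are regressed on a constant and $(x_i - \bar{x})^2$, $\delta_1$ is tested with its conventional OLS standard error, and the p-value comes from the standard normal distribution. The text does not specify which standard error or reference distribution the test uses, so these choices and the name `elliptical_het_test` are hypothetical.

```python
import math

import numpy as np


def elliptical_het_test(x, y):
    """One-sided test of H0: delta_1 >= 0 against Ha: delta_1 < 0.

    Illustrative assumptions: conventional OLS standard error for delta_1 and a
    standard normal reference distribution.
    Returns (delta_1_hat, t_statistic, one_sided_p_value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ones = np.ones_like(x)
    # Step 1: squared residuals from the regression of y on x
    X = np.column_stack([ones, x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    # Step 2: regress e^2 on a constant and (x - xbar)^2
    Z = np.column_stack([ones, (x - x.mean()) ** 2])
    d, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    v = e2 - Z @ d
    n, k = Z.shape
    s2 = v @ v / (n - k)
    se_d1 = math.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    t = d[1] / se_d1
    # Reject for large negative t: p = P(Z <= t) = Phi(t)
    p = 0.5 * math.erfc(-t / math.sqrt(2.0))
    return d[1], t, p


# Hypothetical elliptical data
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.04 * x + np.exp(-0.5 * x ** 2) * rng.normal(size=500)
print(elliptical_het_test(x, y))
```

Unlike a two-sided or general test, the rejection region sits only in the lower tail of $\hat\delta_1$, which concentrates power on the elliptical alternative.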

III. Monte Carlo simulations
To illustrate the testing issues arising from elliptical heteroscedasticity, we run a series of Monte Carlo simulations. The design of our Monte Carlo simulations is based on the following data generating process. We chose the model so that the shape of the resulting y-x scatter plot resembles Panel A, Figure 1 in Amin (2011). For values of $a$ between 0.15 and 0.3, the y-x scatter plot is most similar to Panel A. The operator $[\cdot]$ rounds $x_i^*$ to the nearest integer. Hence, $x_i$ is an integer, just as the within-twin difference in estimated years of schooling in Bonjour et al. (2003). Furthermore, also in accordance with the within-twin difference in estimated years of schooling, the values of $x_i$ are centred around the mean $\bar{x}$. The structure of the error term $e_i$ implies that $\mathrm{Cov}[e_i^2, (x_i - \bar{x})^2] < 0$ if $a > 0$. The larger the parameter $a$, the more negative the covariance between $e_i^2$ and $(x_i - \bar{x})^2$, and therefore the stronger the upward bias caused by elliptical heteroscedasticity. For $a = 0$, the error term is homoscedastic. The number of observations is set to $N = 214$, as in Bonjour et al.'s dataset, and additionally to $N = 2140$. The number of replications is 10 000.
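Purely as an illustration, a data generating process with the properties listed above can be sketched as follows. It is an assumed stand-in, not the authors' specification: $x_i^*$ is drawn from $N(0, 2^2)$, clipped to $[-5, 5]$ and rounded, and the conditional error variance is $\sigma^2 (1 - a x_i^2 / 26)$ for $0 \le a \le 1$. This keeps the error homoscedastic for $a = 0$ and makes $\mathrm{Cov}[e_i^2, x_i^2] = -\sigma^2 a \, \mathrm{Var}(x_i^2)/26$ more negative as $a$ grows, but its scale of $a$ is not comparable to the range 0.15 to 0.3 quoted above. The function name `simulate_elliptical` and the values of `beta`, `sigma`, `sd_x` and `x_max` are hypothetical.

```python
import numpy as np


def simulate_elliptical(a, n, beta=0.04, sigma=0.5, sd_x=2.0, x_max=5, rng=None):
    """Hypothetical DGP with elliptical heteroscedasticity for 0 <= a <= 1.

    x_i = [x_i*]: a N(0, sd_x^2) draw clipped to [-x_max, x_max] and rounded to
    the nearest integer (np.rint rounds exact halves to even).
    Var[e_i | x_i] = sigma^2 * (1 - a * x_i^2 / (x_max^2 + 1)), which is constant
    for a = 0, stays positive for a <= 1, and implies
    Cov[e_i^2, x_i^2] = -sigma^2 * a * Var(x_i^2) / (x_max^2 + 1) < 0 for a > 0.
    Returns (x, y, e).
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("this sketch assumes 0 <= a <= 1")
    rng = np.random.default_rng() if rng is None else rng
    x = np.rint(np.clip(rng.normal(0.0, sd_x, size=n), -x_max, x_max))
    sd = sigma * np.sqrt(1.0 - a * x ** 2 / (x_max ** 2 + 1.0))
    e = sd * rng.normal(size=n)
    return x, beta * x + e, e


rng = np.random.default_rng(42)
for a in (0.0, 0.5, 1.0):
    x, y, e = simulate_elliptical(a, 2140, rng=rng)
    # sample Cov[e^2, (x - xbar)^2]
    print(a, np.cov(e ** 2, (x - x.mean()) ** 2)[0, 1])
```

A variance that is linear in $x_i^2$ was chosen so that the covariance falls linearly in $a$; an exponential scaling such as $\exp(-a x_i^2)$ would also give elliptical patterns, but its covariance need not be monotone in $a$.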
In each simulation, we evaluate the size and power of three different tests for heteroscedasticity: the Koenker (1981) variant of the Breusch-Pagan (1979) test, which drops the assumption of normality, with $x$ as the independent variable; the White (1980) test; and our elliptical heteroscedasticity test introduced in Section II. In addition, we compare the size of Wald tests for the parameter of interest in the causal model, testing $H_0: \beta = 0.04$ against $H_a: \beta \neq 0.04$ in the regression of $y_i$ on $x_i$ using robust and conventional standard errors. Figure 1, Panel A shows power plots for the heteroscedasticity tests. The simulation with $a = 0$ gives the actual size of each test. While the rejection frequencies of the Breusch-Pagan and White tests are close to the nominal significance level of $\alpha = 5\%$ for $N = 214$, the actual size of the elliptical heteroscedasticity test is above this value, at 11.9%. However, the actual size of the latter test approaches the nominal significance level as the number of observations grows: the simulation with $N = 2140$ yields an actual size of 7.7% for the elliptical heteroscedasticity test.
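For comparison with the elliptical test, the two general tests can be sketched as $N R^2$ (LM) statistics from auxiliary regressions of the squared OLS residuals. For the Koenker version of the Breusch-Pagan test the auxiliary regressors are a constant and $x_i$ (one degree of freedom). For the White test with a single regressor they are a constant, $x_i$ and $x_i^2$ (two degrees of freedom). This is a textbook formulation rather than the authors' code. The function names are hypothetical, and the chi-square tail probabilities use the closed forms available for 1 and 2 degrees of freedom.

```python
import math

import numpy as np


def _n_r_squared(target, Z):
    """N * R^2 from an OLS regression of target on Z (Z must include a constant)."""
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return len(target) * max(r2, 0.0)


def koenker_bp_and_white(x, y):
    """Return (lm_bp, p_bp, lm_white, p_white) for a bivariate regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2                                        # squared OLS residuals
    lm_bp = _n_r_squared(e2, X)                                  # e^2 on (1, x)
    lm_white = _n_r_squared(e2, np.column_stack([X, x ** 2]))    # e^2 on (1, x, x^2)
    p_bp = math.erfc(math.sqrt(lm_bp / 2.0))                     # chi2(1) upper tail
    p_white = math.exp(-lm_white / 2.0)                          # chi2(2) upper tail
    return lm_bp, p_bp, lm_white, p_white


# Hypothetical elliptical data with a regressor symmetric around its mean
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
y = 0.04 * x + np.exp(-0.5 * x ** 2) * rng.normal(size=2000)
print(koenker_bp_and_white(x, y))
```

When the regressor is symmetric around its mean, $e_i^2$ is nearly uncorrelated with $x_i$ itself under elliptical heteroscedasticity. A test that is linear in $x_i$ therefore has little to detect, while the $x_i^2$ term in the White regression picks up the pattern.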
For $a > 0$, Figure 1, Panel A displays the power of each test. The rejection frequencies of the White test and our elliptical heteroscedasticity test increase with stronger elliptical heteroscedasticity, i.e. with increasing values of $a$. Compared with the elliptical heteroscedasticity test, the White test performs worse in detecting heteroscedasticity, although the difference in power shrinks for larger values of $a$. In contrast to the elliptical heteroscedasticity test, the White test does not have elliptical heteroscedasticity as the alternative hypothesis, but rather heteroscedasticity in general. This less specific formulation of $H_a$ may explain the White test's worse performance. The Breusch-Pagan test has considerably smaller rejection frequencies than the two other tests throughout the whole range of $a > 0$. For both $N = 214$ and $N = 2140$, it does not reach a power of 5% for any positive value of $a$. This result is related to the fact that the basic specification of the Breusch-Pagan test is designed to detect linear forms of heteroscedasticity, whereas elliptical heteroscedasticity implies a nonlinear form of heteroscedasticity. Figure 1, Panel B displays the size of the Wald tests of $H_0: \beta = 0.04$ for different values of $a$ using conventional and robust standard errors. Under homoscedasticity ($a = 0$), the sizes of both test versions are close to the nominal significance level of 5% for $N = 214$ and equal to $\alpha = 5\%$ for $N = 2140$. In the presence of heteroscedasticity ($a > 0$), the Wald tests using robust standard errors also yield a size of around 5%. The size of the Wald tests using conventional standard errors, however, decreases with increasing $a$. At $a = 0.5$, the Wald test with conventional standard errors has an actual size of 0.1% for both $N = 214$ and $N = 2140$. Hence, t-tests with conventional standard errors reject a correct null hypothesis too rarely for $a > 0$. This is due to the upward bias in conventional standard errors in this case.
Similarly, the Wald tests with conventional standard errors lose power with increasing $a$. Conversely, the heteroscedasticity robust Wald tests become more powerful as $a$ gets larger.
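The size comparison can be reproduced in miniature. What follows is a minimal Monte Carlo sketch whose design choices are assumptions, not the authors' specification. $x_i$ is a $N(0, 4)$ draw clipped to $[-5, 5]$ and rounded, and $\mathrm{Var}[e_i|x_i] = \sigma^2 (1 - a x_i^2 / 26)$ for $0 \le a \le 1$. The true slope is 0.04, so rejections of $H_0: \beta = 0.04$ measure size. Robust standard errors are HC0, and 2 000 replications replace the paper's 10 000. The heteroscedasticity in this stand-in is milder than in the paper's design, so the fall in the conventional test's size is smaller, but it goes in the same direction. The name `wald_sizes` is hypothetical.

```python
import math

import numpy as np


def wald_sizes(a, n, reps=2000, beta=0.04, sigma=0.5, seed=0):
    """Rejection rates of the true H0: beta = 0.04 with conventional vs HC0 standard errors.

    Hypothetical DGP: x = [N(0, 4) clipped to [-5, 5]],
    Var[e | x] = sigma^2 * (1 - a * x^2 / 26), valid for 0 <= a <= 1.
    """
    rng = np.random.default_rng(seed)
    crit = 1.959963984540054           # two-sided 5% standard normal critical value
    rej_conv = rej_rob = 0
    for _ in range(reps):
        x = np.rint(np.clip(rng.normal(0.0, 2.0, size=n), -5, 5))
        e = sigma * np.sqrt(1.0 - a * x ** 2 / 26.0) * rng.normal(size=n)
        y = beta * x + e
        xd = x - x.mean()
        sxx = xd @ xd
        b = xd @ (y - y.mean()) / sxx
        res = y - y.mean() - b * xd                     # OLS residuals
        se_conv = math.sqrt(res @ res / (n - 2) / sxx)  # homoscedastic standard error
        se_rob = math.sqrt(np.sum(xd ** 2 * res ** 2) / sxx ** 2)  # HC0 standard error
        rej_conv += abs(b - beta) / se_conv > crit
        rej_rob += abs(b - beta) / se_rob > crit
    return rej_conv / reps, rej_rob / reps


for a in (0.0, 0.5, 1.0):
    print(a, wald_sizes(a, 214))
```

HC0 is used only because it is the simplest sandwich estimator; small-sample variants such as HC2 or HC3 would bring the robust test's size closer to 5% at $N = 214$.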

IV. Empirical illustration: returns to education
Amin (2011) excluded up to four twin-pair outliers. These were determined on the basis of the absolute between-twin difference in hourly wages. Figure 1 in Amin (2011) illustrates which data points he removed. Its Panel A already suggests that the data exhibit the elliptical heteroscedasticity discussed in Section II, which leads to an upward bias in conventional standard errors.
To test for the presence of heteroscedasticity, we performed the three tests outlined in Section II for all within-twin pair OLS and IV regressions in columns 3, 4, 7 and 8 of Table 1 in Amin (2011). In all regressions, the dependent variable is the within-twin difference in log hourly wages. The regressor of interest is the within-twin difference in self-reported education. In the IV regressions, this variable is instrumented by the within-twin difference in the co-twin's report of the other twin's education. The regressions in columns 7 and 8 include as covariates the within-twin differences in marital status, current job tenure, part-time status and whether a person lives in London or the southeast of the UK. Table 1 provides the p-values for the Koenker variant of the Breusch-Pagan test with the within-twin difference in estimated years of schooling as the only independent variable, the White test and our proposed elliptical heteroscedasticity test from Section II. In the regressions including covariates, we partialled them out before testing. The elliptical heteroscedasticity test rejects for all regression specifications at least at the 10% level. In contrast, the Breusch-Pagan and White tests do not reject the hypothesis of homoscedasticity in any regression. We attribute this to the lower power of more general tests in detecting elliptical heteroscedasticity, as discussed in Section II. Based on our proposed elliptical heteroscedasticity test, there is evidence for the presence of elliptical heteroscedasticity in the data. Hence, conventional standard errors are biased and may lead to false policy conclusions. Table 1 also shows the estimated causal effect of education on wages. In addition to the replications using conventional standard errors, Table 1 reports robust standard error estimates and the corresponding significance levels. In all but two regressions, the robust standard error is smaller than the conventional one.
This result is in line with the suspicion that elliptical heteroscedasticity is present in the data, which causes an upward bias in conventional standard errors. It also supports the conclusions from our elliptical heteroscedasticity test. In many regressions where the causal effect is insignificant using conventional standard errors, it becomes significant at the 5% or 10% level when using robust standard errors. With conventional standard errors, 13 out of the 20 regressions yield an insignificant parameter estimate. In contrast, only in 3 out of the 20 regressions do we fail to find a return to education significantly different from 0 when (correctly) using robust standard errors.
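The covariates in the regressions of columns 7 and 8 are partialled out before testing. One common way to do this is Frisch-Waugh-Lovell residualization, sketched below under the assumption that it matches the authors' procedure, which the text does not spell out. The outcome and the regressor of interest are each regressed on the covariates plus a constant, and the residuals are kept. The OLS slope of the residualized outcome on the residualized regressor then equals the coefficient from the full regression, and the residualized pair can be used in place of $y_i$ and $x_i$ in a bivariate heteroscedasticity test. The covariate matrix is simulated, and `partial_out` is a hypothetical helper. For the IV specifications the instrument would also need to be residualized, which this sketch does not cover.

```python
import numpy as np


def partial_out(W, *arrays):
    """Residualize each array on the covariates W plus a constant (OLS)."""
    W1 = np.column_stack([np.ones(len(W)), W])
    out = []
    for v in arrays:
        coef, *_ = np.linalg.lstsq(W1, v, rcond=None)
        out.append(v - W1 @ coef)
    return out


rng = np.random.default_rng(5)
n = 214
W = rng.normal(size=(n, 3))                 # simulated stand-ins for the covariates
x = W @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
y = 0.08 * x + W @ np.array([0.3, 0.0, -0.4]) + rng.normal(size=n)

# Full regression of y on (1, x, W): coefficient on x
full = np.column_stack([np.ones(n), x, W])
beta_full = np.linalg.lstsq(full, y, rcond=None)[0][1]

# FWL: slope from regressing residualized y on residualized x
ry, rx = partial_out(W, y, x)
beta_fwl = (rx @ ry) / (rx @ rx)
print(beta_full, beta_fwl)                  # identical up to floating-point rounding
```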

V. Conclusion
In this study, we discuss the conditions under which conventional standard errors are upward biased. In such settings, standard tests of heteroscedasticity may fail and leave the heteroscedasticity undetected. When Wald tests based on downward-biased conventional standard errors are insignificant, heteroscedasticity robust standard errors do not change inference. On the other hand, inference is conservative in a setting with upward-biased conventional standard errors. We discuss the power gains from using robust standard errors in this case, as well as potential problems with standard heteroscedasticity tests. In Monte Carlo simulations, we show that our proposed heteroscedasticity test has considerably higher power against elliptical heteroscedasticity than the Breusch-Pagan and White tests, although its actual size exceeds the nominal level in small samples. In our application, only this test detects the heteroscedasticity, and using the appropriate standard errors then leads to different test decisions.

Disclosure statement
No potential conflict of interest was reported by the authors.