On the Detection of Heteroscedasticity by Using CUSUM Range Distribution

Introduction
The problem of detecting heteroscedasticity among uncorrelated error terms has been a topic of interest for many years in statistical, econometric and financial analysis. Morgan (1939), Pitman (1939) and Wilks (1946) were among the pioneers who investigated heteroscedasticity in the variability of the error terms. Later, Valavanis (1959), Cacoullos (1965, 2001), Goldfeld & Quandt (1973), Chen & Gupta (1997) and Kanbur et al. (2009, 2010), among many others, studied heteroscedasticity in various contexts. In regression, the residuals can be tested for heteroscedasticity by using tests such as the Pitman-Morgan t-test or the Breusch-Pagan test, which regresses the squared residuals on the independent variables. However, the Breusch-Pagan test is very sensitive to departures from error normality, and hence the more robust Koenker-Bassett test (also known as the generalized Breusch-Pagan test) is preferred. For testing group-wise heteroscedasticity, the Goldfeld-Quandt test and the Levene test are commonly used.
Kanbur et al. (2007) developed a nonparametric test for heteroscedasticity of the residual errors by using the CUSUM ranks. However, that test relies on bootstrap-based numerical simulations. Generally speaking, nonparametric tests are less powerful than parametric tests when the parametric assumptions hold. So, in this paper, we develop a theoretical (parametric) test based on the CUSUM Range. This test is theory based and so does not require the bootstrap to detect heteroscedasticity. We use the same data as Kanbur et al. (2007) for the comparison.

Methodology
We analyze the residuals of a simple linear regression in order to check for possible heteroscedasticity in the error variability. Here, the response variable is the Cepheid Luminosity and the predictor is the natural logarithm of the Cepheid Periodicity. The Cepheids are pulsating stars from the nearby galaxy LMC (Large Magellanic Cloud).
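The model is $L_i = \alpha + \beta \ln P_i + \epsilon_i$ for i = 1, 2, ..., n, where L represents the luminosity, P the periodicity and ϵ the noise in the data. We use the least squares method to estimate the unknowns. Let $C(j) = \sum_{i=1}^{j} (L_i - a - b \ln P_i)$ for j = 1, 2, ..., n, where a and b are the least squares estimates of α and β respectively.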
As noted in Koen et al. (2007), when there are no deviations from linearity (or, in other words, when the error variability is homoscedastic), C(j) is a sum of uncorrelated random variates and hence is a random walk. On the other hand, when there is a violation, C(j) will not be a random walk. Here, we will use the range statistic $R = \max_{1 \le j \le n} C(j) - \min_{1 \le j \le n} C(j)$.
www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3;2015
Next, we present a table of the possible values of the range statistic. This table is very helpful in constructing the probability distribution for the Range Statistic. Note that n represents the sample size.
Table 3. n = 4
Remark: Based on the probability weight pattern, we have developed a formula for the probability weight, where n equals the number of observations and i equals the number of error terms in the CUSUM Range.
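The residual, CUSUM and range computations described above can be sketched in a few lines. This is a minimal illustration only: it assumes the CUSUM is the plain (unnormalized) partial sum of the least squares residuals in the order the observations are indexed, and the toy data, the seed and the helper names `ols_fit`, `cusum` and `cusum_range` are hypothetical and not taken from the paper.

```python
import random


def ols_fit(x, y):
    """Least squares intercept a and slope b for the line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def cusum(residuals):
    """Partial sums C(1), ..., C(n) of the residuals."""
    out, total = [], 0.0
    for e in residuals:
        total += e
        out.append(total)
    return out


def cusum_range(residuals):
    """Range of the CUSUM path: max_j C(j) - min_j C(j)."""
    c = cusum(residuals)
    return max(c) - min(c)


# Hypothetical toy data: the log-periods and luminosities are made up.
rng = random.Random(0)
log_p = [rng.uniform(0.0, 1.5) for _ in range(10)]
lum = [17.0 - 3.0 * x + rng.gauss(0.0, 0.022) for x in log_p]

a, b = ols_fit(log_p, lum)
resid = [y - a - b * x for x, y in zip(log_p, lum)]
r = cusum_range(resid)
print(f"a = {a:.4f}, b = {b:.4f}, CUSUM range = {r:.4f}")
```

Because the fitted line includes an intercept, the residuals sum to zero, so the last CUSUM value C(n) is zero up to rounding error.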
Next, we derive the distribution of the Range (R). Note that, because the error terms are exchangeable (being independent and identically distributed), the range R can be written in the following form.
In the general case, given that there are i error terms,
Remark: Overall, the CUSUM Range is a mixture distribution.
Next, we present the CUSUM Range distribution.
Lemma 1. Let R represent the CUSUM Range. Then,
Proof. See Appendix Section.
Remark: The CUSUM Range is a mixture of folded normal variates. Its density is given by
Also, the conditional density function is given by
Result 1: The conditional expected value for the CUSUM Range is given by
Result 2: The conditional second moment for the CUSUM Range is given by
As we noted earlier, when the error variability is homogeneous, the Range follows a mixture of folded normal distributions. Hence its conditional expected value and its conditional second moment satisfy equations (12) and (13).
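For orientation, the standard folded normal formulas can be written out explicitly. The sketch below rests on an assumption that is not confirmed by the text: that, conditional on i, the range has the form $R = |\epsilon_1 + \cdots + \epsilon_i|$ with independent $N(0, \sigma_\epsilon^2)$ errors, so that R given i is a folded normal variate with scale $\sigma_\epsilon \sqrt{i}$. The symbol $w_i(n)$ is a hypothetical placeholder for the probability weights of the mixture and is left unspecified. These expressions are an illustration under those assumptions and are not claimed to match equations (12) and (13) term by term.

```latex
\begin{align*}
f_{R \mid i}(r) &= \frac{2}{\sigma_\epsilon \sqrt{2 \pi i}}
                   \exp\!\left( -\frac{r^2}{2 i \sigma_\epsilon^2} \right),
                   \qquad r \ge 0, \\
E[R \mid i]     &= \sigma_\epsilon \sqrt{\frac{2 i}{\pi}}, \\
E[R^2 \mid i]   &= i \, \sigma_\epsilon^2, \\
f_R(r)          &= \sum_{i} w_i(n) \, f_{R \mid i}(r).
\end{align*}
```

Under this reading, both conditional moments are fixed by $\sigma_\epsilon$ and i alone, which is why a departure of the empirical moments from these values points to a violation of homogeneity.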
In the next section, we present the numerical results.

Numerical Results
In this section, we present the numerical results based on simulated data and on actual data.

Simulated Data
The following table presents the conditional expected values given by (12), evaluated for the simulated data, alongside the simulation-based empirical estimates of the conditional expected values.
Case 1: No change in error variability (n = 10, $\sigma_\epsilon = 0.022$)
A random sample of n = 10 error terms was simulated from a normal distribution with mean 0 and standard deviation $\sigma_\epsilon = 0.022$. This simulation was repeated several times to compute the empirical estimate.
Case 2: Change in error variability (n = 20, $\sigma_{1\epsilon} = 0.022$, $\sigma_{2\epsilon} = 0.045$)
A random sample of 20 error terms was simulated several times, with 5 of these error terms following a normal distribution with mean 0 and standard deviation $\sigma_{1\epsilon} = 0.022$, and the other 15 error terms following a normal distribution with mean 0 and standard deviation $\sigma_{2\epsilon} = 0.045$. As we can see from the simulation results, there is a difference between the formula-based conditional expected values and the simulation-based empirical estimates. This difference is consistent with the change in error variability that was built into this simulation.
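The two simulation cases can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the number of replications `n_rep` and the seed are hypothetical, the CUSUM is taken as the plain partial sum of the simulated error terms, and in Case 2 the 5 low-variance errors are assumed to come first, followed by the 15 high-variance errors. A third run with 20 homogeneous errors is added only as a like-for-like baseline for Case 2.

```python
import math
import random


def cusum_range(errors):
    """Range (max minus min) of the partial sums of the error terms."""
    total, lo, hi = 0.0, math.inf, -math.inf
    for e in errors:
        total += e
        lo = min(lo, total)
        hi = max(hi, total)
    return hi - lo


def mean_range(sigmas, n_rep, rng):
    """Average CUSUM range over n_rep samples; sigmas[k] is the sd of error k."""
    acc = 0.0
    for _ in range(n_rep):
        errors = [rng.gauss(0.0, s) for s in sigmas]
        acc += cusum_range(errors)
    return acc / n_rep


rng = random.Random(2015)  # hypothetical seed
n_rep = 5000               # hypothetical number of replications

case1 = mean_range([0.022] * 10, n_rep, rng)                 # Case 1
case2 = mean_range([0.022] * 5 + [0.045] * 15, n_rep, rng)   # Case 2
baseline20 = mean_range([0.022] * 20, n_rep, rng)            # n = 20, no change

print(f"Case 1 mean range:            {case1:.4f}")
print(f"Case 2 mean range:            {case2:.4f}")
print(f"n = 20 homogeneous baseline:  {baseline20:.4f}")
```

Comparing Case 2 with the homogeneous n = 20 baseline isolates the effect of the variance change from the effect of the larger sample size.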
Note: The graph in blue represents the cumulative probability distribution based on the CUSUM Range from the simulated error terms. The graph in green is for the residual errors with homogeneous variance and the graph in red is for the residual errors with heterogeneous variance.

A Real Application
We can use the concept of error homogeneity (heterogeneity) to check whether the relationship between two quantitative variables is linear or not. For example, when the relationship is actually not linear (or when there is a change point) then the residual error variance will not be homogeneous under the assumption of linearity. This is the concept that we are about to use to check for a possible linear relationship between the Cepheid Period and the Luminosity in the next example.
Example (based on Actual Data)
Here, we consider an astrophysical data set of 1779 observations on the Cepheid Period-Luminosity relationship. In order to check whether this relationship is linear, we drew several random samples of size n = 10 and several random samples of size n = 20 from this data set. As in the previous simulation, we computed the empirical (cumulative) distribution and the theoretical (cumulative) distribution of the CUSUM Range at selected values of x, as indicated below. The Goodness-of-Fit test and the Kolmogorov-Smirnov test clearly indicate a difference between the empirical distribution and the theoretical distribution (which assumes a linear relationship between the predictor and the response variable). This indicates that, in this astrophysical data, the relationship is not linear.
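The kind of distributional comparison used in this example can be sketched with a Kolmogorov-Smirnov distance. Note the substitution: instead of the theoretical CUSUM Range distribution, the sketch uses a simulated homoscedastic reference sample, and it compares simulated error terms rather than the Cepheid data. The number of replications, the seed and the helper names are hypothetical; the constant 1.358 is the usual large-sample 5% coefficient for the two-sample Kolmogorov-Smirnov test.

```python
import bisect
import math
import random


def cusum_range(errors):
    """Range (max minus min) of the partial sums of the error terms."""
    total, lo, hi = 0.0, math.inf, -math.inf
    for e in errors:
        total += e
        lo, hi = min(lo, total), max(hi, total)
    return hi - lo


def ecdf(sample):
    """Return the empirical CDF: x -> fraction of sample values <= x."""
    s = sorted(sample)
    return lambda x: bisect.bisect_right(s, x) / len(s)


def ks_two_sample(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in list(a) + list(b))


def simulate_ranges(sigmas, n_rep, rng):
    """CUSUM ranges of n_rep simulated samples with the given error sds."""
    return [cusum_range([rng.gauss(0.0, s) for s in sigmas]) for _ in range(n_rep)]


rng = random.Random(1779)  # hypothetical seed
n_rep = 2000               # hypothetical number of replications

reference = simulate_ranges([0.022] * 20, n_rep, rng)              # homoscedastic
same = simulate_ranges([0.022] * 20, n_rep, rng)                   # homoscedastic
changed = simulate_ranges([0.022] * 5 + [0.045] * 15, n_rep, rng)  # variance change

d_same = ks_two_sample(reference, same)
d_changed = ks_two_sample(reference, changed)
crit = 1.358 * math.sqrt(2.0 / n_rep)  # approximate 5% critical value, equal sample sizes

print(f"KS distance, same variance:    {d_same:.4f}")
print(f"KS distance, changed variance: {d_changed:.4f}")
print(f"approximate 5% critical value: {crit:.4f}")
```

A distance well above the critical value signals that the observed ranges do not follow the reference distribution, which is the logic applied to the Cepheid samples above.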

Discussion and Conclusion
This paper gives a mathematical justification of how the CUSUM Range distribution is affected by residual error heteroscedasticity. Unlike earlier papers on error heteroscedasticity, this paper presents a very simple method for detecting it. The method uses only the assumption of normality of the error terms, which is a standard assumption in linear models and, to a large extent, in time series models. Moreover, this research confirms, as in previous papers, that the Cepheid Period-Luminosity relationship is not linear.