Estimation and Testing in One-Way ANOVA when the Errors are Skew-Normal

We consider one-way analysis of variance (ANOVA) model when the error terms have skewnormal distribution. We obtain the estimators of the model parameters by using the maximum likelihood (ML) and the modified maximum likelihood (MML) methodologies (see, Tiku 1967). In the ML method, iteratively reweighting algorithm (IRA) is used to solve the likelihood equations. The MML approach is a non-iterative method used to obtain the explicit estimators of model parameters. We also propose new test statistics based on these estimators for testing the equality of treatment effects. Simulation results show that the proposed estimators and the tests based on them are more efficient and robust than the corresponding normal theory solutions. Also, real data is analysed to show the performance of the proposed estimators and the tests.


Introduction
Consider the following one-way ANOVA model, y ij = µ + α i + ij , i = 1, 2, . . . , a; j = 1, 2, . . . , n where, y ij are the responses corresponding to jth observation in the ith treatment, µ is the overall mean, α i is the effect of ith treatment and ij are the independent and identically distributed (iid) random error terms.
In general, normality assumption is made for the random error terms and the well known least squares (LS) method is used for estimating model parameters. However, in the literature, there are numerous studies pointing out that nonnormal distributions are more prevalent than normal distribution , in practice, see for example, (Pearson 1932, Geary 1947, Huber 1981, Tan & Tiku 1999. It is known that LS estimators of the parameters and the test statistics based on them lose their efficiency when the normality assumption is not satisfied, (see, Tukey 1960). That is why there is great interest in studying the effect of nonnormality on the F statistics used for testing the main effects and the interaction in the framework of experimental design; see, for example, (Geary 1947, Srivastava 1959, Donaldson 1968, Spjotvoll & Aastveit 1980, Tan & Tiku 1999, Senoglu & Tiku 2001). The following conclusions have been drawn from these studies. For numerous non-normal distributions: i. Type I error of the F statistic is not much different than that for a normal distribution. This is essentially due to the central limit theorem.
ii. Power of the F test is considerably lower than that for a normal distribution. This is essentially due to the inefficiency of the sample mean.
See Senoglu & Tiku (2001) and the references therein. These conclusions are particularly true for non normal distributions having skewness in different directions (Senoglu & Tiku 2002). Therefore, it is necessary to obtain new F statistics whose distribution provides satisfactory approximations to the percentage points of the null distribution when the distribution of the error terms is non-normal (see condition i). The proposed test should also maintain higher power than the classical F test based on LS estimators (see condition ii).
There are various ways of analyzing non-normal data, such as Box-Cox normalizing transformation and nonparametric methods. However, in this study, we adopt the parametric ML and MML methods where original data are used rather than transformed data. In the ML method, the likelihood equations are solved iteratively by using the iteratively reweighting algorithm (IRA). However, in the MML method, the explicit estimators of model parameters are obtained by approximating the likelihood equations.
In this study, we assume that the distribution of the error terms in one-way ANOVA model in (1) is Azzalini's skew-normal (Azzalini 1985(Azzalini , 1986) and obtain the ML and the MML estimators of the model parameters. We then propose new 77 test statistics based on these estimators. To the best of our knowledge, there is no previous work assuming SN (λ) as an error distribution in the context of ANOVA. The reason for choosing the SN (λ) as an error distribution is that it includes the normal distribution as well as plausible alternatives thereof with different levels of skewness and kurtosis. Therefore, SN (λ) distribution is considered to be an extension of normal distribution. This provides us flexibility for modeling the data with normal − like shape but with skewness and heavy tails. It is also useful for modeling the data having normal distribution with outliers and contamination. Its mathematical tractability is another reason for using SN (λ) in this study.
The rest of the paper is organized as follows. In Section 2, SN (λ) distribution is introduced. The ML and the MML estimators are derived in Section 3 and Section 4, respectively. Efficiencies of the ML and the MML estimators are compared via Monte Carlo simulation study in Section 5. New test statistics for testing the equality of treatment effects are proposed in Section 6. Power comparisons and robustness properties of these tests are also given in this section. A real life example is analyzed in Section 7 to present the application of the proposed estimators and the tests based on them. Our conclusions are presented.

Skew-Normal Distribution
The probability density function (pdf) of the SN (λ) distribution is given by where φ(z) and Φ(z) are the pdf and the cumulative distribution function (cdf) of the standard normal distribution, respectively. λ is the skewness parameter, it is also known as the shape parameter since it regulates the shape of the distribution. If a random variable Z has a skew-normal distribution with parameter λ then it is denoted by Z ∼ SN (λ). Some extensions of this distribution con found in Martínez-Flórez, Vergara-Cardozo & González (2013) and Pereira, Marques & da Costa (2012).
It may be noted that for λ=0, SN (λ) reduces to the well known standard normal distribution N (0, 1). When λ → ∞, SN (λ) converges to the half-normal distribution, h(z) is strongly unimodal for fixed λ. It is right skewed for λ > 0 and left skewed for λ < 0. SN (λ) distribution has also the following properties: Azzalini 2005). To better understand the shape of the SN (λ) distribution, see the coefficients of skewness (γ 1 ) and the kurtosis (γ 2 ) for some representative values of λ given in Table 1.
It is clear from Table 1 that the skewness of the distribution increases as the skewness parameter λ increases (in absolute value). Skewness of the SN (λ) distribution takes values in the interval (−0.995, 0.995) and the maximum value of its kurtosis is 3.869. Here, it should be noted that the skewness values corresponding to the positive λ values are exactly the same, but with opposite sign, with 78 Nuri Celik, Birdal Senoglu & Olcay Arslan the skewness values corresponding to the negative λ values. Therefore, in Table  1, we just reproduce the skewness values corresponding to the positive λ values for brevity. It can also be seen that the SN (λ) and the normal distribution are indistinguishable for λ < 3.
Here and in many other studies, we consider a more general form of the distribution given in (2) by performing a change of location and scale: Based on this linear transformation, pdf of the random variable Y is obtained as shown below, where, µ ∈ R is the location parameter and σ ∈ R + is the scale parameter. If the random variable Y has SN (λ) distribution with the parameters µ, σ and λ, then it is denoted by Y ∼ SN (µ, σ, λ). The expected value and the variance of SN (µ, σ, λ) distribution are given by, respectively.
Here, it should be noted that the skewness parameter is assumed to be known throughout the study. Since the ML method gives doubtful estimates when we estimate the location, the scale and the shape parameters simultaneously unless large samples ( n > 250 or so) are available, (see, Bowman & Shenton 2001, Kantar & Senoglu 2008. See also, the Introduction of Acitas, Kasap, Senoglu & Arslan (2013). However, the sample size is much smaller than 250 in the context of experimental design. Therefore, in this study, we only estimate the location and the scale parameters for a better fitting. In spite of the fact that the shape parameter 79 is assumed to be known, in practice, we must identify its value. Shape parameters can be identified by using various techniques, such as Q-Q plots, goodness-of fit tests etc. The algorithm given in Acitas et al. (2013, p. 417) can also be used for the identification of the shape parameter, see also Islam & Tiku (2004). Suppose that the value of the shape parameter in skew-normal distribution might be somewhat misspecified by using these techniques. Then the question arises what effect will it have on the efficiencies of the location and the scale estimators. The answer is that this does not adversely affect the efficiencies of the estimators since the estimators obtained in this study are robust to plausible deviations of the true model.
To obtain the ML estimators of the unknown parameters in model (1), the loglikelihood function is maximized with respect to the unknown parameters µ, α i and σ. Here By differentiating the log-likelihood function with respect to the unknown parameters and equating them to zero we obtain the following likelihood equations Solutions of these equations are the ML estimators. These equations have no explicit solutions; therefore we resort to iterative methods.
If we appropriately reorganize the likelihood equations in (8) and define the weight function w ij as below the likelihood equations can be written as follows: wherē We use IRA which is very popular in robustness studies to compute the ML estimates of the parameters. It can be shown that IRA is an expectation-maximization (EM) type algorithm so that its convergence is guaranteed (see, Arslan & Genc 2009). Also Arrellano-Valle, Bolfarine & Lachos (2005)  i (i = 1, 2, . . . , a) and σ (0) for µ i and σ, respectively.
ii. Compute the weights w Here, m is the number of iterations and takes the values 1, 2, 3, . . .
iii. Find new estimates of the parameters by using the following updating equa- It should be noted that LS estimates are used as initial estimates for this algorithm. However, some other robust estimates can also be used as initial estimates.

Modified Maximum Likelihood Estimator
In this section, we use the MML methodology originated by Tiku (1967) to obtain the explicit estimators of the model parameters by approximating the likelihood equations appropriately. This methodology is used to alleviate the computational difficulties encountered in solving the likelihood equations given above. MML methodology proceeds as follows: Let be the order statistics obtained by arranging y ij (i = 1, 2, . . . , a; j = 1, 2, . . . , n) in ascending order. The likelihood equations in (8) can be written in terms of the order statistics as shown below, since complete sums are invariant to ordering It should be noted that the last two terms of ∂ ln L ∂σ are obtained by simply multiplying the terms of ∂ ln L ∂µ by z i (j) . z i (j) is the loading factor and instrumental in yielding an estimator which is always real and positive. Then, we linearize the intractable terms in (11) by using the first two terms of Taylor series expansion around the expected values of the standardized order statistics, i.e., t (j) = E(z i (j) ), j = 1, 2, . . . , n. This linearization yields g(z i(j) ) = α j − γ j z i(j) , i = 1, 2, . . . , a; j = 1, 2 . . . , n where and The exact values of t (j) are not available, however, we use their approximate values obtained from the equation, (see, Tiku & Akkaya 2004). Here, we use the property: If F (z j ) ∼ U (0, 1) then F (z (j) ) ∼ Beta(j, n − j + 1) with the expected value j n+1 , (j = 1, 2 . . . , n). Incorporating equation (12) into the likelihood equations in (11), we obtain the modified likelihood equations ∂ ln L * ∂µ , ∂ ln L * ∂αi and ∂ ln L * ∂σ . The solutions of these modified likelihood equations are the following MML estimatorŝ wherê The divisor N in the expression forσ was replaced by N (N − a) as a bias correction. MML estimators have the following properties: i. They are the functions of sample observations and are easy to compute.
ii. They are asymptotically equivalent to the ML estimators. Therefore, under regularity conditions, they are asymptotically fully efficient, i.e., they are unbiased and minimum variance bound (MVB) estimators.
iii. Even for small sample sizes, they are highly efficient.
iv. They are robust.
It should be noted that weights β (j) in (12) have half-umbrella ordering, i.e., they are a decreasing sequence of positive numbers in the direction of the long tail. Therefore, weights β (j) given to the extreme residuals deplete the dominant effect of long tail and outliers. This is a very important property for achieving robustness, see for example Tiku & Akkaya (2004). On the other hand, in LS method, all e (j) receive the same weight. This exposes the LS estimators to the dominant effect of long tail and outliers making them nonrobust.

Comparison of Estimators
In this section, we compare the ML, MML and LS estimators of the model parameters in terms of means, variances and mean square errors (MSE) for some representative values of the skewness parameter λ. All the simulations are based on [100, 000/n] Monte Carlo runs. In the simulation study, we use a = 3, 5, n = 5, 10, 15, 20 and α = 0.05, however, we just reproduce the results for a = 3 for the sake of brevity. Without loss of generality, we choose the following setting in our simulation: µ i (µ + α i ) = 0(i = 1, 2, . . . , 1) and σ = 1.
Here, it should be noted that we are interested in λ values satisfying the property 0.4 < [P (X > E(X))] < 0.6 in the context of experimental design. We, therefore use λ values satisfying the mentioned condition, i.e. we take −1 < λ < 1 from now on. Simulation results are given in Table 2.
From Table 2, it is seen that both the ML and the MML estimators are more efficient than the LS estimators of µ i and σ when the skewness parameter λ is close to 1. When the skewness parameter λ is close to 0 all the three estimators have similar efficiencies as expected. Because, SN (λ) distribution reduces to normal distribution when λ is equal to 0; in that case, algebraic forms of the ML and the MML estimators are exactly the same with the corresponding LS estimators of the unknown parameters.
It is interesting to note that relative efficiencies (REs) of the ML and the MML estimators decrease as the sample size n increases. Robustness: In this study, we use the following definition of robustness. An estimator is called robust if it is fully efficient under the assumed model and maintains high efficiency under the plausible alternatives of the assumed model, (see, Tiku & Akkaya 2004). Assume, for illustration, that the true model in the simulation study is taken to be SN (0, 1, 1). We use the following sample models to represent a large number of plausible alternatives.

Sample Models:
Model (1) 90SN (0, 1, 1) + 0.10N (0, 1). Table 3 are the simulated values of the means, variances and MSEs for the ML, the MML and the LS estimators of the model parameters µ i (i = 1, 2, . . . , a) and σ under the alternative models. We simply reproduce the results for µ 1 since they are all similar. We also give the REs of the ML and the MML estimators with respect to the LS estimators. It can be seen that the ML and the MML estimators are robust owing to the reason mentioned at the end of the Section 4.

Hypothesis Testing
In one-way ANOVA, our aim is to compare the equality of treatment effects, in other words, to test the following null hypothesis against the alternative hypothesis H 1 : at least one α i = 0.
Traditionally, for testing the null hypothesis given in (15) the following test statistics based on the LS estimators are used

85
As mentioned earlier, power of F LS is very sensitive to non-normality and to data anomalies. Therefore, in this paper, we propose the following test statistics based on the ML and the MML estimators as an alternative to the test statistic given in (16),  (18) respectively. Table 4 shows that central F distribution with a − 1 and N − a degrees of freedom provides accurate approximations to the distributions of F M L and F M M L even for small n values. We now compare the power of the F M L and F M M L tests with the traditional F LS test by simulating the probabilities for some representative values of λ. It should be noted that all the observations are divided by their standard errors. A constant d is added to the observations in the first and third treatments and a constant 2d is subtracted from the observations in the second treatment. Simulation results showing the power comparisons of the proposed tests with the LS based test are given in Table 5.
From Table 5 it is clear that power of F LS , F M L and F M M L are very similar when λ is close to 0. When λ approaches 1, F M L and F M M L seem more powerful than the F LS , but the differences are not very attractive. This is not surprising due to the fact that the quadratic form of a skew-normal distributed random variable has the chi-square distribution (Azzalini 1985, Gupta & Huang 2002. Table 5: Power values of the FLS, FML and FMML tests: a = 3,n = 10; α = 0.050. Robustness: We use the following definitions of robustness formulated by Box (1953). See also Box & Tiao (1964), Tiku, Tan & Balakrishnan (1986).

Nuri Celik, Birdal Senoglu & Olcay Arslan
Criterion robustness: If the Type I error of a test is not substantially higher under plausible alternatives than that attained under an assumed model, the test is said to have criterion robustness.
Efficiency robustness: If the power of a test is the highest possible (or nearly so) under an assumed model but stays high for all plausible models, the test is said to have efficiency robustness.
In this section, our aim is to identify the affect of deviations from an assumed model on the Type I error and the power of the proposed tests. For this purpose, we use the sample models given in Section 5. These simulated values of the power of the proposed tests and the F LS test are given in Table 6. Table 6: Values of the power for the alternatives to SN (0, 1, 1): a = 3, n = 10; α = 0.050.
(1)  It is clear from Table 6 that the power of the F M L and F M M L tests are much higher than the corresponding F LS test for all the sample models, i.e., Model (1) through Model (4). For d = 0, the values represent Type I errors. Then it is said that proposed tests have criterion robustness as well as the efficiency robustness.

Application
Consider the data given in Montgomery (2005); pertaining to the relationship between the radio frequency power setting and the etch rate for plasma. This is an example of a one-way ANOVA with 4 levels of the factor and 5 replicates. The data is given in Table 7. To identify the distribution of the error terms, we use the Q-Q plot technique, one of the well-known and widely used graphical techniques. The Q-Q plot of normal distribution is shown in Figure 1. On the other hand, among the Q-Q plots of the residuals obtained for various different values of the skewness parameter λ, SN (µ, σ, λ = 1) adequately models the residuals, since the observations do not deviate very much from the straight line, see Figure 2. When we take the skewness parameter λ as 1, parameter estimates and calculated F values are obtained as shown in Table 8).
The ML and the MML estimates of µ i are very close to the LS estimate of µ i with smaller standard errors. All the three tests are consistent in rejecting the null hypothesis, H 0 : there is no difference between the radio frequency powers.

Nuri Celik, Birdal Senoglu & Olcay Arslan
Figure 2: Q-Q plot of the residuals for SN (λ = 1). However, the p values for F M L and F M M L are much smaller than the p value of the F LS . This is due to the smaller standard errors of the ML and the MML estimators. Therefore, their results are more reliable than normal theory results.

Conclusion
Traditionally, LS estimators and the tests based on them are used in the context of experimental design. However, efficiencies of the LS estimators are low when the usual normality assumption is not satisfied. They are also not robust to departures from normality.
In this paper, we derived estimators of the model parameters in one-way ANOVA by using the ML and the MML methodologies. New test statistics based on these estimators were proposed for testing the equality of the treatment effects when the distribution of the error terms is skew-normal. SN (λ) distribution covers the normal and normal-like distributions with different skewness and kurtosis values. Therefore, it provides very flexible and simple alternative model for the normal distribution in most practical problems.

89
Simulation studies show that the ML and the MML estimators and the tests based on them are more efficient and robust than the corresponding LS versions thereof.
It can also be seen that there is no significant difference between the methodologies based on ML and MML even for small sample sizes. The methodology based on ML is somewhat preferable than the methodology based on MML in terms of efficiency and power. On the other hand, the methodology based on MML is computationally feasible and less time consuming.