The comparison of nonparametric statistical tests for interaction effects in factorial design

Department of Statistics, Faculty of Science, Kasetsart University, Bangkok, Thailand 10900

CHRONICLE
Article history: Received October 9, 2018; Received in revised format: October 18, 2018; Accepted November 16, 2018; Available online November 16, 2018

ABSTRACT
Correct application of the classical factorial F-test depends on the assumptions of normality and homogeneity of variance. If these assumptions are violated, the type I error rate will be inflated and the power of the test will be decreased. Therefore, nonparametric statistical tests have been proposed to analyze the interaction effects in factorial designs. A simulation was conducted to investigate the effect of non-normality on the type I error rate and power of the classical factorial F-test and five nonparametric tests, namely rank transformation (FR), Winsorized mean (FW), modified mean (FM), adjusted rank transform (ART) and adjusted median transform (AMT), using SAS 9.4 with 1,000 replications. The study used 2×2 factorial designs with 3, 4 and 6 replications, giving sample sizes of 12, 16 and 24, respectively, and 3×3 factorial designs with 3 replications, giving a sample size of 27, all studied at the 0.05 level of significance. The results show that when the normality assumption is satisfied, all six statistical tests control the type I error in all situations. The ART test cannot control the type I error rate for the 3×3 factorial design with a sample size of 27 when the normality assumption is violated. In terms of power, the F-test provided the highest power when the normality assumption is met. The ART and AMT tests provided approximately the same power. When the normality assumption is not met, the AMT test (sample size 12) and the ART test (sample size 16 or 24) can be used effectively to analyse the interaction effect between factors A and B in 2×2 factorial designs. Moreover, the results showed that the power of all six statistical tests tended to increase as the sample size increased.


Introduction
Factorial design is used to study the effect of factors on a characteristic of interest. It is important to recall that the significance of the main effects and the significance of the interactions are independent. An interaction is the effect that a combination of two or more factors has on the expected value of the response variable. From the parametric perspective, the main effects and interactions are tested with the analysis of variance (ANOVA) model. The valid application of the ANOVA F-test depends on the assumptions that the observations are independent, the errors are normally distributed, and the observations have homogeneous variance. In practice, violations of these assumptions are commonly reported in many studies, such as O'Gorman (2001). If these assumptions are not met, the type I error rate will deviate from the nominal level and the power of the test will decrease. Therefore, nonparametric approaches should be considered as alternatives to the classical factorial F-test. The purpose of this study is to compare the classical factorial F-test and five nonparametric tests, namely rank transformation (FR), Winsorized mean (FW), modified mean (FM), adjusted rank transform (ART) and adjusted median transform (AMT), for testing the interaction effects in factorial designs by considering their ability to control type I error and their power when the normality assumption is not satisfied.

Simulation
A simulation study was conducted to investigate the effect of non-normality on the type I error rates and power of the classical factorial F-test (F), rank transformation (FR), Winsorized mean (FW), modified mean (FM), adjusted rank transform (ART) and adjusted median transform (AMT) tests for 2×2 and 3×3 interaction effects in factorial designs. The model for this study is

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad (1)$$

where $Y_{ijk}$ is the experimental response, $\mu$ is the general mean, $\alpha_i$ is the main effect of factor A, $\beta_j$ is the main effect of factor B, $(\alpha\beta)_{ij}$ is the interaction effect between factors A and B, and $\varepsilon_{ijk}$ are random error terms. Data were generated using SAS 9.4 with 1,000 replications under the following scope:

1. Determine distributions of observations as: (i) Normal distribution with mean 0 and variance 1, (ii) Chi-square distribution with 5 degrees of freedom, (iii) t distribution with 2 degrees of freedom.

All five nonparametric statistics and the classical factorial F-statistic were computed. In each situation, it was determined whether $H_0$ would be rejected for the interaction effect at the 0.05 significance level, and this was repeated 1,000 times. The probability of type I error and the percentage power of the test were approximated as follows:

$$\text{Probability of type I error} = \frac{\text{number of rejections of } H_0 \text{ when } H_0 \text{ is true}}{1000}, \qquad (2)$$

$$\text{Percentage of power of the test} = \frac{\text{number of rejections of } H_0 \text{ when } H_0 \text{ is not true}}{1000} \times 100. \qquad (3)$$

To assess the ability to control type I error, Bradley's (1978) criterion was applied.
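The data-generation step and the two empirical rates can be sketched in Python as follows. This is only an illustration, not the SAS program used in the study: the function names, the distribution labels, the choice of zero for the general mean and main effects, and the `interaction` argument (the effect sizes used for the power study are not stated here) are all assumptions.

```python
import math
import random

def draw_error(dist, rng):
    """Draw one error term from one of the three distributions in the study."""
    if dist == "normal":   # N(0, 1)
        return rng.gauss(0.0, 1.0)
    if dist == "chisq5":   # chi-square(5) is Gamma(shape=5/2, scale=2)
        return rng.gammavariate(2.5, 2.0)
    if dist == "t2":       # t(2) is Z / sqrt(chi-square(2) / 2)
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(1.0, 2.0)
        return z / math.sqrt(chi2 / 2.0)
    raise ValueError(f"unknown distribution: {dist}")

def generate_cells(a, b, r, dist, rng, interaction=None):
    """Simulate a balanced a x b layout with r replications per cell.

    Assumption: mu, alpha_i and beta_j are all 0. `interaction[i][j]` is a
    hypothetical (alpha*beta)_ij effect; None means H0 (no interaction) holds.
    """
    return [[[(interaction[i][j] if interaction else 0.0) + draw_error(dist, rng)
              for _ in range(r)]
             for j in range(b)]
            for i in range(a)]

def rejection_rate(n_reject, n_sim=1000):
    """Empirical type I error: share of simulations rejecting a true H0."""
    return n_reject / n_sim

def power_percent(n_reject, n_sim=1000):
    """Empirical power in percent: share of simulations rejecting a false H0."""
    return n_reject / n_sim * 100

rng = random.Random(2018)                      # fixed seed for reproducibility
cells = generate_cells(2, 2, 3, "t2", rng)     # one 2x2 data set, n = 12
```

Here `cells[i][j]` holds the r observations at level i of factor A and level j of factor B; a full simulation would generate 1,000 such data sets per situation and count the rejections of each test.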

Statistical Tests
The statistical tests for interaction effects between two factors in this study are examined next.

Classical factorial F-test (F)
The total corrected sum of squares for the two-way factorial F-test can be written as

$$SS_{Total} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(Y_{ijk} - \bar{Y}_{...}\right)^2,$$

where $Y_{ijk}$ denotes the observation from replication k (k = 1, ..., r) at level i of factor A (i = 1, ..., a) and level j of factor B (j = 1, ..., b), and $\bar{Y}_{...}$ denotes the general mean.
The sums of squares for the two-way factorial design are calculated as follows:

$$SS_{Total} = SS_{Cell} + SS_{Error}, \qquad SS_{AB} = SS_{Cell} - SS_{A} - SS_{B},$$

where $SS_{Total}$ denotes the total sum of squares, $SS_{AB}$ is the sum of squares for the interaction of factors A and B, $SS_{Cell}$ is the sum of squares for cells or sub-groups, $SS_{A}$ is the sum of squares for factor A, $SS_{B}$ is the sum of squares for factor B and $SS_{Error}$ is the error sum of squares.
The F statistic is computed as

$$F = \frac{MS_{AB}}{MS_{Error}},$$

where $MS_{AB} = SS_{AB}/DF_{AB}$ denotes the mean square for interaction and $MS_{Error} = SS_{Error}/DF_{Error}$ denotes the mean square for error. The F statistic follows an F-distribution with $DF_{AB} = (a-1)(b-1)$ degrees of freedom for the interaction and $DF_{Error} = ab(r-1)$ degrees of freedom for the error term (Montgomery, 1997).
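As a minimal sketch of the interaction F statistic for a balanced design, the sums of squares can be computed directly from the cell, row, column and general means. The function name `interaction_f` and the nested-list layout `cells[i][j]` are illustrative choices, not part of the original study, and the data are made up.

```python
def interaction_f(cells):
    """Return (F, DF_AB, DF_Error) for the A x B interaction.

    cells[i][j] is the list of r observations at level i of factor A and
    level j of factor B; the design is assumed to be balanced.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    n = a * b * r
    grand = sum(y for row in cells for cell in row for y in cell) / n
    mean_a = [sum(y for cell in cells[i] for y in cell) / (b * r) for i in range(a)]
    mean_b = [sum(y for i in range(a) for y in cells[i][j]) / (a * r) for j in range(b)]
    mean_cell = [[sum(cells[i][j]) / r for j in range(b)] for i in range(a)]

    ss_a = b * r * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * r * sum((m - grand) ** 2 for m in mean_b)
    ss_cell = r * sum((mean_cell[i][j] - grand) ** 2
                      for i in range(a) for j in range(b))
    ss_ab = ss_cell - ss_a - ss_b
    ss_error = sum((y - mean_cell[i][j]) ** 2
                   for i in range(a) for j in range(b) for y in cells[i][j])

    df_ab = (a - 1) * (b - 1)
    df_error = a * b * (r - 1)
    return (ss_ab / df_ab) / (ss_error / df_error), df_ab, df_error

# Made-up 2x2 data with r = 3 replications per cell.
cells = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [16, 17, 18]]]
print(interaction_f(cells))  # (27.0, 1, 8)
```

For these data $SS_{Cell} = 378$, $SS_{A} = 243$, $SS_{B} = 108$ and $SS_{Error} = 8$, so $SS_{AB} = 27$ and $F = 27/1 = 27$ on (1, 8) degrees of freedom. The statistic would then be compared with the 0.05 critical value of the F-distribution.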

Rank transformation test (FR)
The rank transformation was introduced by Conover and Iman (1976). The procedure is simply the usual parametric procedure applied to the ranks of the data. Conover and Iman (1981) stated that the rank transformation procedure is robust and powerful for testing interaction in two-way layouts when replication effects are present. Olejnik and Algina (1985) recommended the rank transformation as an alternative to the factorial F-test, especially when the normality assumption is not met. The steps of FR are: (i) rank all observations ($Y_{ijk}$) by assigning 1 to the smallest and n to the largest; if ties are present, the average rank is assigned to all tied observations; each observation is then replaced by its rank; (ii) apply the classical factorial F-test to the ranks. Therefore, the corrected total sum of squares can be written as

$$SS_{Total(R)} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(R_{ijk} - \bar{Y}_{R...}\right)^2,$$

where $R_{ijk}$ is the rank of $Y_{ijk}$ and $\bar{Y}_{R...}$ denotes the general rank mean.
The sums of squares for the main effects, interaction effect and error for the rank transformation procedure are computed in the same way as for the classical factorial F-test. The rank transformation test statistic is computed as

$$F_{R} = \frac{RMS_{AB}}{RMS_{Error}},$$

where $RMS_{AB}$ denotes the mean square for interaction and $RMS_{Error}$ the mean square error, both computed from the ranked observations.
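Step (i) of the rank transformation, including the average-rank rule for ties, can be sketched as below. The function names and the nested-list layout `cells[i][j]` are illustrative assumptions; the ranked cells would then be passed to an ordinary two-way ANOVA routine.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n (largest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def rank_transform(cells):
    """Replace every observation in cells[i][j] by its rank in the pooled sample."""
    flat = [y for row in cells for cell in row for y in cell]
    ranks = iter(average_ranks(flat))
    return [[[next(ranks) for _ in cell] for cell in row] for row in cells]

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Ranking is done over all n observations at once, not within cells, which is what distinguishes FR from the within-cell steps used by the Winsorized and modified mean tests.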

Winsorized mean test (FW)
The Winsorized mean procedure was studied by Wilcox (1996). It is a robust estimator of the population mean when there are outliers in the sample. The Winsorized mean is computed after the k smallest observations are replaced by the (k+1)st smallest observation and the k largest observations are replaced by the (k+1)st largest observation. The steps of the Winsorized mean approach, which here uses k = 1, are: (i) rank all observations in each treatment combination; (ii) in each treatment combination, replace the smallest observation (position 1) by the second smallest (position 2) and the largest observation (position r) by the second largest (position r-1); for example, if treatment combination a1b1 has the observations 15, 17, 18, 19, 20, the result is 17, 17, 18, 19, 19; (iii) compute the sums of squares using the general Winsorized mean in place of the general arithmetic mean; (iv) apply the classical factorial F-test to the Winsorized data. Therefore, the corrected total sum of squares can be written as

$$SS_{Total(W)} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(Y^{W}_{ijk} - \bar{Y}_{W...}\right)^2,$$

where $Y^{W}_{ijk}$ is a Winsorized observation and $\bar{Y}_{W...}$ denotes the general Winsorized mean. The sums of squares for the main effects, interaction effect and error for the Winsorized mean procedure are computed in the same way as for the classical factorial F-test. Thus, the test statistic for the Winsorized mean is computed as

$$F_{W} = \frac{WMS_{AB}}{WMS_{Error}},$$

where $WMS_{AB}$ is the mean square for interaction and $WMS_{Error}$ is the mean square error, both computed from the Winsorized data.
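Step (ii) can be illustrated with a short function that Winsorizes one treatment combination by a single observation at each end. The name `winsorize_cell` is illustrative, and leaving cells with fewer than three observations unchanged is an assumption made only to keep the sketch safe.

```python
def winsorize_cell(obs):
    """Replace the smallest value by the second smallest and the largest
    value by the second largest within one treatment combination."""
    s = sorted(obs)
    if len(s) < 3:      # nothing sensible to replace (assumption of this sketch)
        return s
    s[0] = s[1]
    s[-1] = s[-2]
    return s

def winsorize_cells(cells):
    """Apply winsorize_cell to every cells[i][j] of a factorial layout."""
    return [[winsorize_cell(cell) for cell in row] for row in cells]

print(winsorize_cell([15, 17, 18, 19, 20]))  # [17, 17, 18, 19, 19]
```

The function returns the observations in sorted order, which is harmless here because the order of replications within a cell does not affect the sums of squares.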

Modified mean test (FM)
Mendeş and Yiğit (2013) presented the modified mean procedure. Within each treatment combination, the ranked data are divided into two sets, Set 1 and Set 2, and the arithmetic means of the two sets, $\bar{Y}_{Set1}$ and $\bar{Y}_{Set2}$, are calculated. The smallest observation is then replaced by $\bar{Y}_{Set1}$ and the largest observation by $\bar{Y}_{Set2}$. The modified mean test is obtained as follows: (i) rank all observations in each treatment combination; (ii) calculate the smallest adjusted average ($EK_{ij}$) and the largest adjusted average ($EB_{ij}$), where $EK_{ij}$ denotes the average of the observations that are lower than $\bar{Y}_{ij}$ and $EB_{ij}$ denotes the average of the observations that are greater than $\bar{Y}_{ij}$; (iii) in each treatment combination, replace the smallest observation by $EK_{ij}$ and the largest observation by $EB_{ij}$. Afterwards, the means of the modified data set are calculated. The sums of squares for the main effects, interaction effect and error for the modified mean procedure are computed in the same way as for the classical factorial F-test. Therefore, the corrected total sum of squares can be written as

$$SS_{Total(M)} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(Y^{M}_{ijk} - \bar{Y}_{M...}\right)^2,$$

where $Y^{M}_{ijk}$ is a modified observation and $\bar{Y}_{M...}$ denotes the general modified mean. The test statistic for the modified mean is computed as

$$F_{M} = \frac{MMS_{AB}}{MMS_{Error}},$$

where $MMS_{AB}$ denotes the mean square for interaction and $MMS_{Error}$ denotes the mean square error, both computed from the modified observations.
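The replacement step of the modified mean procedure can be sketched as follows. The sketch assumes that the reference value for splitting a treatment combination is its arithmetic mean, so that EK is the mean of the observations below the cell mean and EB is the mean of those above it; the function name `modify_cell` and the handling of a cell whose observations are all equal are also assumptions.

```python
def modify_cell(obs):
    """Replace the smallest observation by EK and the largest by EB, where
    EK (EB) is the mean of the observations below (above) the cell mean."""
    s = sorted(obs)
    cell_mean = sum(s) / len(s)
    lower = [y for y in s if y < cell_mean]
    upper = [y for y in s if y > cell_mean]
    if not lower or not upper:   # all observations equal: nothing to modify
        return s
    s[0] = sum(lower) / len(lower)    # EK
    s[-1] = sum(upper) / len(upper)   # EB
    return s

print(modify_cell([15, 17, 18, 19, 20]))  # [16.0, 17, 18, 19, 19.0]
```

For the observations 15, 17, 18, 19, 20 the cell mean is 17.8, so EK = (15 + 17)/2 = 16 and EB = (18 + 19 + 20)/3 = 19. Unlike Winsorizing, the replacement values depend on all observations on each side of the mean.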

Adjusted rank transform test (ART)
ART is based on the rank transformation introduced by Conover and Iman (1981). Wobbrock et al. (2011) presented the aligned rank transform for nonparametric factorial data. The method consists of aligning the observations before assigning ranks and then analysing the adjusted data with the classical F-test.
The main idea of ART is to remove the unwanted effects from the response variable in order to study one effect at a time. Kelley and Sawilowsky (1997) found good results for the adjusted rank transform test and indicated that the test aligned by means had superior power compared with the classical F-test when the distribution is heavy tailed or skewed. The procedure of the adjusted rank transform test is: (i) from each observation, subtract the average of all observations in level i of factor A ($\bar{Y}_{i..}$) and the average of all observations in level j of factor B ($\bar{Y}_{.j.}$), so that the adjusted value is $Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.}$; (ii) rank all adjusted values; if ties are present, the average rank is assigned to all tied observations; then replace the observations by their ranks; (iii) using the ranks, compute the sums of squares for the main effects, interaction effect and error in the same way as for the classical factorial F-test.
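Steps (i) and (ii) of the adjusted rank transform can be sketched as follows for a balanced layout. The function names and the `center` parameter are illustrative; the adjustment implemented is the one described above (subtracting the factor A level mean and the factor B level mean), and the data are made up.

```python
from statistics import mean

def align(cells, center=mean):
    """Subtract the level-i center of factor A and the level-j center of
    factor B from every observation in cells[i][j]."""
    a, b = len(cells), len(cells[0])
    center_a = [center([y for cell in cells[i] for y in cell]) for i in range(a)]
    center_b = [center([y for i in range(a) for y in cells[i][j]]) for j in range(b)]
    return [[[y - center_a[i] - center_b[j] for y in cells[i][j]]
             for j in range(b)]
            for i in range(a)]

def aligned_ranks(cells, center=mean):
    """Align the data, then rank all adjusted values with average ranks for ties."""
    aligned = align(cells, center)
    pool = [y for row in aligned for cell in row for y in cell]

    def midrank(v):
        less = sum(x < v for x in pool)
        equal = sum(x == v for x in pool)
        return less + (equal + 1) / 2   # average of ranks less+1 .. less+equal

    return [[[midrank(y) for y in cell] for cell in row] for row in aligned]

cells = [[[1, 2], [3, 4]],
         [[5, 6], [9, 10]]]
print(align(cells))          # [[[-5.0, -4.0], [-6.0, -5.0]], [[-6.0, -5.0], [-5.0, -4.0]]]
print(aligned_ranks(cells))  # [[[4.5, 7.5], [1.5, 4.5]], [[1.5, 4.5], [4.5, 7.5]]]
```

Ties are detected here by exact equality of the adjusted values; with non-integer data, rounding the adjusted values before ranking may be needed so that floating-point noise does not break genuine ties.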

Adjusted median transform test (AMT)
AMT is also based on the rank transformation introduced by Conover and Iman (1981). The AMT procedure is developed from the idea of the ART, using the median instead of the mean, following Sawilowsky (1990), who recommended studying alignments other than the mean for the aligned rank transform test for interaction. The procedure of the adjusted median transform test is: (i) from each observation, subtract the median of all observations in level i of factor A ($\tilde{Y}_{i..}$) and the median of all observations in level j of factor B ($\tilde{Y}_{.j.}$), so that the adjusted value is $Y_{ijk} - \tilde{Y}_{i..} - \tilde{Y}_{.j.}$; (ii) rank all adjusted values; if ties are present, the average rank is assigned to all tied observations; then replace the observations by their ranks; (iii) using the ranks, compute the sums of squares for the main effects, interaction effect and error in the same way as for the classical factorial F-test.
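The median alignment in step (i) can be sketched as below; the adjusted values would then be ranked and analysed exactly as in the mean-aligned procedure. The function name `median_align` and the data, which contain one deliberately extreme observation, are illustrative assumptions.

```python
from statistics import median

def median_align(cells):
    """Subtract the median of level i of factor A and the median of level j
    of factor B from every observation in cells[i][j]."""
    a, b = len(cells), len(cells[0])
    med_a = [median([y for cell in cells[i] for y in cell]) for i in range(a)]
    med_b = [median([y for i in range(a) for y in cells[i][j]]) for j in range(b)]
    return [[[y - med_a[i] - med_b[j] for y in cells[i][j]]
             for j in range(b)]
            for i in range(a)]

# Made-up 2x2 data; the last observation is an outlier.
cells = [[[1, 2], [3, 4]],
         [[5, 6], [9, 100]]]
print(median_align(cells))  # [[[-5.0, -4.0], [-6.0, -5.0]], [[-6.0, -5.0], [-5.0, 86.0]]]
```

In this example the level medians (2.5 and 7.5 for factor A, 3.5 and 6.5 for factor B) are the same as they would be if the outlier 100 were replaced by 10, so only the outlying observation itself receives an extreme adjusted value. Level means would instead be pulled towards the outlier and shift every adjusted value in its row and column.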

The ability to control type I error
Table 1 shows the empirical type I error rates of the classical factorial F-test and the five nonparametric tests, namely rank transformation (FR), Winsorized mean (FW), modified mean (FM), adjusted rank transform (ART) and adjusted median transform (AMT), for two-way factorial designs at the 0.05 significance level. The results show that for the 2×2 factorial design, all five nonparametric tests and the classical factorial F-test control the type I error for all distributions. Thus, all six statistical tests are robust to violation of the normality assumption. The results for the 3×3 factorial design show that when the normality assumption is violated, ART does not control the type I error rate. However, all six statistical tests still control the type I error rate for the t distribution; that is, all six tests remain robust when the distribution is symmetric or does not deviate much from normality. Furthermore, increasing the number of replications helped keep the type I error rates at the nominal level. When the number of levels of factors A and B increased, the ability of the ART test to control type I error tended to decrease.

Power of the test
The power of the tests is assessed from the results in Table 2.

Conclusion and Discussion
O'Gorman (2001) showed that some nonparametric tests could be used in place of the classical F-test when the normality assumption is not satisfied. However, the performance of these nonparametric tests may differ with the experimental conditions, such as the distribution, the number of factors and the number of replications. In general, the parametric factorial F-test is recommended if the normality assumption is not violated, because it provides the greatest power and holds the type I error rate at the nominal level. In this study, the results have shown that the classical F-test controlled the type I error rate and had the highest power when the normality assumption was satisfied. However, one can conclude that the shape of the distribution did not affect the ability to control type I error much, whereas the number of levels of factors A and B and the number of replications did. As the number of levels of factors A and B or the number of replications increased, the ability of the ART test to control type I error tended to decrease. Regarding power, the F-test provided the highest power when the normality assumption was satisfied; if the assumption of normality is in doubt, the AMT and ART tests are recommended. The ART test is an alternative nonparametric test for the interaction effect between factors A and B in 2×2 factorial designs when the sample size is 16 or 24 and the distribution of the errors is Chi-square. The AMT test is recommended for testing the interaction in 3×3 factorial designs when the sample size is 27. Sample size affected the power of the test: as the sample size increased, the power of all six statistical tests tended to increase.

2. Determine replications according to levels of factors as: (i) 2×2 factorial designs: replications of 3, 4 and 6, making sample sizes of 12, 16 and 24, respectively; (ii) 3×3 factorial designs: 3 replications, making a sample size of 27. Note: Only balanced designs (equal numbers of replications in each cell) are considered.
3. Determine the significance level at 0.05.
4. The effect of treatment is fixed to test the hypothesis $H_0: (\alpha\beta)_{ij} = 0$ for all i and j against $H_1: (\alpha\beta)_{ij} \neq 0$ for at least one pair (i, j).

According to Bradley's (1978) criterion, the actual type I error rate of a test has to be in the range 0.025-0.075 when testing at the 0.05 level. In this study, a test is considered to control type I error if its empirical type I error rate falls within the interval [0.025, 0.075]. Power is compared only among the tests that control type I error; among these, the test with the highest power is taken to be the most effective.
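The decision rule just described can be written as a short sketch: first screen the tests with the [0.025, 0.075] interval, then pick the most powerful of those that pass. The function names and the numbers in the example are purely illustrative and are not taken from the tables of this study.

```python
def controls_type1(rate, lower=0.025, upper=0.075):
    """Bradley's criterion at the 0.05 level: the empirical type I error
    rate must fall inside [0.025, 0.075]."""
    return lower <= rate <= upper

def most_effective(results):
    """results maps a test name to (empirical type I error rate, power in %).

    Returns the most powerful test among those that control type I error,
    or None if no test passes the criterion.
    """
    eligible = {name: power
                for name, (rate, power) in results.items()
                if controls_type1(rate)}
    return max(eligible, key=eligible.get) if eligible else None

# Hypothetical values for illustration only.
example = {"F": (0.048, 61.0), "ART": (0.081, 75.0), "AMT": (0.060, 70.0)}
print(most_effective(example))  # AMT
```

In the hypothetical example the second test has the highest power but is excluded because its type I error rate lies outside the interval, so the third test is selected.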
Table 2 shows that for the 2×2 factorial design, the classical F-test and the FW test provided approximately the same power, while the ART and AMT tests provided approximately the same power. The classical F-test provided the highest power for all numbers of replications when the normality assumption holds. When the distributions are Chi-square and t, the AMT test provided the highest power when the sample size is 12 and the ART test provided the highest power when the sample size is 16 or 24. For the 3×3 factorial design, the classical F-test and the FW test provided approximately the same power, and both have the highest power when the normality assumption is satisfied. When the distributions are Chi-square and t, the ART test provided the highest power. Moreover, the results show that the power of all six statistical tests tended to increase as the sample size increased.
Note: - means the statistical test does not have the ability to control type I error. ** means the statistical test has the ability to control type I error. (1) means the statistical test has the ability to control type I error and has the highest power.