Diagonal and unscaled Wald-type tests in general factorial designs

In this paper, asymptotic and permutation testing procedures are developed for general factorial designs without assuming homoscedasticity or a particular error distribution. The one-way layout and crossed and hierarchically nested designs are contained in our general framework. The new test statistics are modifications of the Wald-type statistic in which the weight matrix is a certain diagonal matrix. Asymptotic properties of the new solutions are also investigated. In particular, the consistency of the tests under fixed alternatives and the asymptotic validity of the permutation procedures are proved in many cases. Simulation studies show that, for small sample sizes, some of the proposed methods perform comparably to, and in certain situations even better than, the Wald-type permutation test of Pauly et al. (2015). Illustrative real data examples of the use of the tests in practice are also given.


Introduction
Let us consider the following general factorial design introduced by Pauly et al. [20]. We consider independent observations

$$X_{ij} = \mu_i + \varepsilon_{ij}, \quad j = 1, \dots, n_i, \; i = 1, \dots, d, \qquad (1.1)$$

where, for each i, the errors $\varepsilon_{i1}, \dots, \varepsilon_{in_i}$ are independent and identically distributed with $E(\varepsilon_{i1}) = 0$ and $0 < \sigma_i^2 = \mathrm{Var}(\varepsilon_{i1}) < \infty$. The total sample size is denoted by $N = \sum_{i=1}^d n_i$. In order to derive asymptotic results, we make the following assumption:

$$\frac{n_i}{N} \to \kappa_i \in (0, 1), \quad i = 1, \dots, d, \quad \text{as } N \to \infty. \qquad (1.2)$$

It is worth noting that different variances, sample sizes and error distributions are allowed as long as assumption (1.2) holds. Let $I_n$ be the $n \times n$ identity matrix and let $1_n$ be the $n \times 1$ vector of ones. In matrix notation, (1.1) can be written as $X = \mathrm{diag}(1_{n_1}, \dots, 1_{n_d})\mu + \varepsilon$, where $X = (X_{11}, \dots, X_{dn_d})^\top$, $\mu = (\mu_1, \dots, \mu_d)^\top$ and $\varepsilon = (\varepsilon_{11}, \dots, \varepsilon_{dn_d})^\top$ with $E(\varepsilon) = 0$ and $\mathrm{Cov}(\varepsilon) = \mathrm{diag}(\sigma_1^2 I_{n_1}, \dots, \sigma_d^2 I_{n_d}) > 0$. Our general design allows a factorial structure within the components of the vector μ, obtained by splitting the index i into subindices $i_1, i_2, \dots$. In this way, we can consider, for example, the one-way layout and crossed and hierarchically nested designs (see [20] and Section 5 for more details and some real data examples).
To formulate a general hypothesis testing problem, we need a contrast matrix H, i.e., $H1 = 0$, where 1 is the column vector of ones of the appropriate size. We are interested in testing the null hypothesis $H_0: H\mu = 0$. This hypothesis is equivalent to $H_0: T\mu = 0$, where $T = H^\top (HH^\top)^- H$ is the unique projection matrix ($M^-$ denotes a generalized inverse of M), which is symmetric and idempotent. For instance, in the one-way layout where factor A has a levels, the centring matrix $T = P_a = I_a - (1/a)1_a 1_a^\top$ is used for testing the hypothesis of no treatment effect $H_0^A: \mu_1 = \cdots = \mu_a$. Other examples of contrast matrices H and projection matrices T are given in [20] and in Section 5.
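The construction of T can be checked numerically. The Python (NumPy) sketch below is only illustrative: the helper names are hypothetical, and the Moore-Penrose inverse is used as one admissible choice of generalized inverse (T does not depend on this choice). For successive-difference contrasts in a one-way layout with a = 4 it reproduces the centring matrix $P_4$.

```python
import numpy as np


def sym_pinv(G, tol=1e-10):
    """Moore-Penrose inverse of a symmetric matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(G)
    inv_w = np.zeros_like(w)
    keep = np.abs(w) > tol * np.abs(w).max()   # treat tiny eigenvalues as exact zeros
    inv_w[keep] = 1.0 / w[keep]
    return (U * inv_w) @ U.T


def projection_from_contrast(H):
    """Projection T = H' (H H')^- H onto the row space of the contrast matrix H."""
    return H.T @ sym_pinv(H @ H.T) @ H


def centring_matrix(a):
    """P_a = I_a - (1/a) 1_a 1_a'."""
    return np.eye(a) - np.ones((a, a)) / a


# Successive differences mu_1 - mu_2, mu_2 - mu_3, mu_3 - mu_4 (each row sums to zero).
H = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
T = projection_from_contrast(H)
print(np.allclose(T, centring_matrix(4)))             # same projection as P_4
print(np.allclose(T, T.T), np.allclose(T @ T, T))     # symmetric and idempotent
```

Using $H = P_4$ itself (for which $HH^\top$ is singular) gives the same T, which illustrates that only the row space of H matters.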
For testing $H_0$, there exist many inference methods for normal or homoscedastic models (see [20] and the references therein). However, such assumptions are often not met or are difficult to check in practice. To avoid these limitations, Pauly et al. [20] proposed testing procedures based on the Wald-type statistic (WTS)

$$Q_N(T) = N \bar X_\cdot^\top T (T \hat V_N T)^+ T \bar X_\cdot,$$

where $\bar X_\cdot = (\bar X_{1\cdot}, \dots, \bar X_{d\cdot})^\top$ is the vector of the means $\bar X_{i\cdot} = n_i^{-1} \sum_{j=1}^{n_i} X_{ij}$, $\hat V_N = N \cdot \mathrm{diag}(\hat\sigma_1^2/n_1, \dots, \hat\sigma_d^2/n_d)$, $\hat\sigma_i^2 = (n_i - 1)^{-1} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot})^2$ is the empirical variance of $X_i = (X_{i1}, \dots, X_{in_i})^\top$, $i = 1, \dots, d$, and $M^+$ denotes the Moore-Penrose inverse of M [11]. The asymptotic WTS test is given by $\varphi_N = 1\{Q_N(T) > \chi^2_{\mathrm{rank}(T), 1-\alpha}\}$, where $\chi^2_{p,\alpha}$ denotes the α-quantile of the $\chi^2_p$-distribution. This test is asymptotically exact, but it requires large sample sizes to keep the nominal type I error level.
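A minimal Python sketch of the WTS and its asymptotic $\chi^2$ test, assuming NumPy and SciPy are available; it is not the implementation in the R package GFD [8], and the function names are hypothetical. The Moore-Penrose inverse is computed from an eigendecomposition, with eigenvalues below an assumed relative tolerance of $10^{-10}$ treated as zero, because $T\hat V_N T$ is singular whenever rank(T) < d.

```python
import numpy as np
from scipy.stats import chi2


def sym_pinv(M, tol=1e-10):
    """Moore-Penrose inverse of a symmetric matrix (small eigenvalues treated as zero)."""
    w, U = np.linalg.eigh(M)
    inv_w = np.zeros_like(w)
    keep = np.abs(w) > tol * np.abs(w).max()
    inv_w[keep] = 1.0 / w[keep]
    return (U * inv_w) @ U.T


def wald_type_statistic(samples, T):
    """Q_N(T) = N xbar' T (T V_N T)^+ T xbar for a list of 1-D samples."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])   # sigma_i^2 hat, divisor n_i - 1
    V = N * np.diag(s2 / n)                                # V_N hat
    Tx = T @ xbar
    return N * Tx @ sym_pinv(T @ V @ T) @ Tx


def wts_test(samples, T, alpha=0.05):
    """Asymptotic WTS test: reject if Q_N(T) exceeds the chi^2_{rank(T), 1-alpha} quantile."""
    q = wald_type_statistic(samples, T)
    df = np.linalg.matrix_rank(T)
    return q, chi2.sf(q, df), q > chi2.ppf(1 - alpha, df)


x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
T2 = np.eye(2) - np.ones((2, 2)) / 2          # P_2
q, p, reject = wts_test([x1, x2], T2)
print(round(q, 4), round(p, 4), reject)
```

For d = 2 and $T = P_2$, $Q_N(T)$ reduces to the squared Welch statistic $(\bar X_{1\cdot} - \bar X_{2\cdot})^2/(\hat\sigma_1^2/n_1 + \hat\sigma_2^2/n_2)$, which gives a convenient sanity check.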
Pauly et al. [20] proposed a permutation test based on the WTS to overcome this problem. They applied a modified permutation principle [12,18], which can be used in situations where the data are not exchangeable. The test retains finite-sample exactness under exchangeability. Moreover, it is asymptotically exact and consistent even when the data are not exchangeable. In simulations, the Wald-type permutation statistic (WTPS) test tends to yield accurate test decisions for small sample sizes in many cases, but it is more or less liberal for extremely skewed distributions (such as the log-normal) with unequal variances. Here, small sample sizes means from a few to somewhat over a dozen observations per sample; what counts as small of course depends on the number of groups.
In the Wald-type statistic proposed by Pauly et al. [20], the Moore-Penrose inverse of $T\hat V_N T$ is used as the so-called weight matrix. However, the weight matrix can be chosen in other ways, resulting in modified Wald-type statistics. The choice of the weight matrix may be significant for the performance of a test (see, for example, Duchesne and Francq [7]). Smaga [23] considered asymptotic and permutation tests based on a modified WTS in which the Moore-Penrose inverse is replaced by a {2}-inverse, i.e., a matrix G satisfying $GMG = G$, the second of the relations defining the Moore-Penrose inverse [7,9]. Under some assumptions, these testing procedures are also asymptotically valid, but they are consistent for a smaller class of fixed alternatives than the WTS and WTPS tests. For extremely skewed distributions, heteroscedastic designs and small sample sizes, the methods based on {2}-inverses seem to be a more conservative replacement for the WTPS test. However, they may perform worse under symmetric distributions, i.e., they may be more conservative or more liberal than the WTPS test.
The testing procedures of Pauly et al. [20] and Smaga [23] are constructed without assuming equal sample sizes, equal variances or a particular error distribution. However, neither is uniformly better than the other, and in general they do not perform satisfactorily for extremely skewed distributions, heteroscedastic designs and small sample sizes. In this paper, we propose new inference methods based on ideas from tests for high-dimensional data. More precisely, we consider modified Wald-type statistics in which the weight matrix is a certain diagonal matrix, which may or may not be related to $T\hat V_N T$. In this way, the singularity problem of the matrix $T\hat V_N T$ is circumvented. We consider asymptotic and permutation methods to approximate the null distribution of a test statistic. Simulation studies show that some of our new solutions perform comparably to, or in certain scenarios even better than, existing competitors.
The remainder of the paper is organized as follows. New testing procedures are introduced in Sections 2 and 3. Their properties are also given there. Section 4 contains a Monte Carlo simulation study providing an idea of the size control and power of the tests. Illustrative real data examples are presented in Section 5. Some conclusions are given in Section 6. Proofs are outlined in the Appendix.

Diagonal Wald-type test
For high-dimensional data with low sample sizes, Hotelling's $T^2$ test suffers from a singularity problem in the estimation of the covariance matrix and is therefore not valid in that setting. Several remedies for this problem have been proposed in the literature. One of them is to assume a diagonal covariance matrix. This idea was first considered by Wu et al. [29] and further investigated by Dong et al. [6], Park and Nag Ayyala [19], Srivastava [25], Srivastava and Du [26], and Srivastava et al. [27]. Here, we use this idea to handle the singularity of the matrix $T\hat V_N T$. Specifically, we propose the test statistic

$$Q^D_N(T) = N \bar X_\cdot^\top T \{\mathrm{diag}(T \hat V_N T)\}^{-1} T \bar X_\cdot,$$

where diag(M) denotes the diagonal matrix with the diagonal entries of the square matrix M, and the vector $\bar X_\cdot$ and the matrix $\hat V_N$ were defined in Section 1.
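Because only the diagonal of $T\hat V_N T$ is inverted, no generalized inverse is needed. A minimal NumPy sketch of $Q^D_N(T)$ (hypothetical function names; the simulated samples are for illustration only):

```python
import numpy as np


def group_summaries(samples):
    """Return N, the vector of group means and V_N hat = N diag(sigma_i^2 hat / n_i)."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])
    return N, xbar, N * np.diag(s2 / n)


def diagonal_wald_statistic(samples, T):
    """Q^D_N(T) = N xbar' T {diag(T V_N T)}^{-1} T xbar."""
    N, xbar, V = group_summaries(samples)
    d_inv = 1.0 / np.diag(T @ V @ T)     # requires every diagonal entry of T V_N T > 0
    Tx = T @ xbar
    return N * np.sum(d_inv * Tx ** 2)   # quadratic form with a diagonal weight matrix


rng = np.random.default_rng(0)
samples = [rng.normal(0.0, s, size=m) for s, m in [(1.0, 6), (2.0, 8), (0.5, 7)]]
T3 = np.eye(3) - np.ones((3, 3)) / 3
print(diagonal_wald_statistic(samples, T3))
```

For d = 2 and $T = P_2$, the diagonal weighting gives $Q^D_N(T) = 2(\bar X_{1\cdot} - \bar X_{2\cdot})^2/(\hat\sigma_1^2/n_1 + \hat\sigma_2^2/n_2)$, i.e., twice the WTS value in that special case.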
We refer to methods based on $Q^D_N(T)$ as diagonal Wald-type tests. The null hypothesis is rejected for large values of $Q^D_N(T)$. For small sample sizes, the sample variances in $\hat V_N$ may not be reliable estimators, which may have a negative effect on testing procedures that use this matrix. Since the tests based on $Q^D_N(T)$ use only the diagonal elements of $T\hat V_N T$, the negative effect of unreliable estimation seems to be smaller for these tests than for the WTS and WTPS tests, which use the whole matrix. On the other hand, in contrast to the WTS and WTPS procedures, the diagonal Wald-type tests do not use the information contained in the off-diagonal elements of $T\hat V_N T$. However, our results (see Section 4) indicate that the negative effect of the unreliably estimated off-diagonal elements outweighs the information they carry, which favours the diagonal Wald-type tests.
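We should check that the statistic $Q^D_N(T)$ is well defined. By the definition of the estimator $\hat V_N$, the ith diagonal element of $T\hat V_N T$ is of the form

$$T_i \hat V_N T_i^\top = N \sum_{j=1}^d t_{ij}^2 \frac{\hat\sigma_j^2}{n_j},$$

where $T_i = (t_{i1}, \dots, t_{id})$ is the ith row of the projection matrix T. Since $\hat\sigma_j^2 > 0$, $j = 1, \dots, d$, with probability one, this element equals zero if and only if $T_i = 0_d$. Because T is symmetric, $T_i = 0_d$ would mean that the ith column of T vanishes as well, i.e., that $T\mu$ does not depend on $\mu_i$ at all; this is excluded as long as every group mean enters the hypothesis. Hence, $\{\mathrm{diag}(T\hat V_N T)\}^{-1}$ exists with probability one and the diagonal Wald-type statistic is well defined.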
First, we construct the asymptotic test based on the diagonal Wald-type statistic. The asymptotic null distribution of $Q^D_N(T)$ is given in the following theorem. Throughout the paper, $\xrightarrow{d}$ and $\xrightarrow{P}$ denote convergence in distribution and in probability, respectively.
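By Theorem 2.1, the asymptotic null distribution of the diagonal Wald-type statistic is a central $\chi^2$-type mixture distribution (see [30]). This distribution can be approximated by a scaled $\chi^2$-distribution, i.e., by the distribution of $g_D \chi^2_{f_D}$, with $g_D$ and $f_D$ chosen so that the first two moments coincide [2,31]. The asymptotic critical value is then given by $k_{D,\alpha} = g_D \chi^2_{f_D, 1-\alpha}$. Since $g_D \chi^2_{f_D}$ has expected value $g_D f_D$ and variance $2 g_D^2 f_D$, and the limiting variable $Y^\top \{\mathrm{diag}(TVT)\}^{-1} Y$ with $Y \sim N_d(0, TVT)$ has expected value $\mathrm{tr}(A)$ and variance $2\,\mathrm{tr}(A^2)$, where $A = \{\mathrm{diag}(TVT)\}^{-1} TVT$, it is easy to see that

$$g_D = \frac{\mathrm{tr}(A^2)}{\mathrm{tr}(A)}, \qquad f_D = \frac{\{\mathrm{tr}(A)\}^2}{\mathrm{tr}(A^2)}.$$

Since $g_D$ and $f_D$ involve unknown quantities, they have to be estimated. The matrix V is estimated by its consistent estimator $\hat V_N$, so we consider the simple plug-in estimators $\hat g_D$, $\hat f_D$ and $\hat k_{D,\alpha} = \hat g_D \chi^2_{\hat f_D, 1-\alpha}$, obtained by replacing V with $\hat V_N$. The consistency of these estimators, as well as the consistency of the test with critical region $\varphi^D_N = 1\{Q^D_N(T) > \hat k_{D,\alpha}\}$, is established in the following theorem.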
Moreover, under the alternative hypothesis $H_1: T\mu \neq 0$, the test $\varphi^D_N$ is consistent, i.e., $P(Q^D_N(T) > \hat k_{D,\alpha}) \to 1$ as $N \to \infty$. Theorem 2.2 shows that, like the WTS test, the asymptotic test based on $Q^D_N(T)$ is consistent against all fixed alternatives. Unfortunately, the asymptotic diagonal Wald-type test is also similar to the WTS test in that it requires large sample sizes to obtain a satisfactory approximation (see Section 4). Following Pauly et al. [20], we consider a permutation test based on $Q^D_N(T)$ to improve the small-sample behavior of the diagonal Wald-type test.
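The moment-matching step can be sketched as follows. The sketch assumes that the limit in Theorem 2.1 is the law of $Y^\top\{\mathrm{diag}(TVT)\}^{-1}Y$ with $Y \sim N_d(0, TVT)$, so that its mean and variance are $\mathrm{tr}(A)$ and $2\,\mathrm{tr}(A^2)$ with $A = \{\mathrm{diag}(TVT)\}^{-1}TVT$; V is replaced by $\hat V_N$. Function names are hypothetical, and SciPy's `chi2.ppf` is used because it accepts non-integer degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def diagonal_asymptotic_test(samples, T, alpha=0.05):
    """Diagonal Wald-type test with the moment-matched critical value g * chi^2_{f, 1-alpha}."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])
    M = T @ (N * np.diag(s2 / n)) @ T          # T V_N T, plug-in for T V T
    A = M / np.diag(M)[:, None]                # {diag(T V_N T)}^{-1} T V_N T (row scaling)
    tr_A, tr_A2 = np.trace(A), np.trace(A @ A)
    g, f = tr_A2 / tr_A, tr_A ** 2 / tr_A2     # match mean g*f and variance 2*g^2*f
    Tx = T @ xbar
    q = N * np.sum(Tx ** 2 / np.diag(M))       # Q^D_N(T)
    crit = g * chi2.ppf(1 - alpha, f)
    return q, g, f, crit, q > crit


rng = np.random.default_rng(1)
samples = [rng.normal(0.0, s, size=m) for s, m in [(1.0, 10), (2.0, 12), (0.5, 9), (1.5, 11)]]
T4 = np.eye(4) - np.ones((4, 4)) / 4
q, g, f, crit, reject = diagonal_asymptotic_test(samples, T4)
print(g * f)   # tr(A) = d, because every diagonal entry of A equals 1
```

Since every diagonal entry of A equals one, $\hat g_D \hat f_D = \mathrm{tr}(\hat A) = d$; for d = 2 one obtains $\hat g_D = 2$ and $\hat f_D = 1$.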

Diagonal Wald-type permutation test
Let π be a random permutation of N indices (uniformly distributed on the symmetric group of order N) that is independent of all other random variables. Then $X^\pi = \pi(X_{11}, \dots, X_{dn_d}) = (X^\pi_{11}, \dots, X^\pi_{dn_d})$ denotes the permuted vector of observations. Let $\bar X^\pi_\cdot = (\bar X^\pi_{1\cdot}, \dots, \bar X^\pi_{d\cdot})^\top$ be the vector of the means and $\hat V^\pi_N = N \cdot \mathrm{diag}(\hat\sigma^2_{1,\pi}/n_1, \dots, \hat\sigma^2_{d,\pi}/n_d)$ the empirical covariance matrix of $\sqrt N \bar X^\pi_\cdot$, both computed from the permuted observations. The value of $Q^D_N(T)$ computed from the permuted observations is

$$Q^{D,\pi}_N(T) = N (\bar X^\pi_\cdot)^\top T \{\mathrm{diag}(T \hat V^\pi_N T)\}^{-1} T \bar X^\pi_\cdot.$$

A diagonal Wald-type permutation test is obtained by comparing $Q^D_N(T)$ with the $(1-\alpha)$-quantile of the conditional distribution of $Q^{D,\pi}_N(T)$ given the observed data X. Here, the permutation distribution is the empirical distribution of the test statistic recomputed over all permutations of the data. The asymptotic conditional permutation distribution of $Q^{D,\pi}_N(T)$ is presented in the following theorem.

Remark 2.1. Under the assumptions of Corollary 2.1, one would like the asymptotic conditional permutation distribution of $Q^{D,\pi}_N(T)$ to always approximate the asymptotic null distribution of $Q^D_N(T)$, since this property is desirable for resampling procedures (see [4,13,14,20,23,24]). Pauly et al. [20] proved that the conditional distribution of the Wald-type permutation statistic (WTPS) $Q^\pi_N(T) = N (\bar X^\pi_\cdot)^\top T (T \hat V^\pi_N T)^+ T \bar X^\pi_\cdot$ always approximates the null distribution of $Q_N(T)$. Unfortunately, such a result does not hold for the diagonal Wald-type statistic: the asymptotic conditional distribution of $Q^{D,\pi}_N(T)$ coincides with the asymptotic unconditional null distribution of $Q^D_N(T)$ only for homoscedastic designs. Although these distributions differ in heteroscedastic designs, simulations suggest that they are quite close to each other when the variances are not extremely different. As we will see in Section 4, for small sample sizes, the permutation test based on $Q^{D,\pi}_N(T)$ behaves quite well (even better than the $Q^\pi_N(T)$ test) in the case of different variances.
For these reasons, it seems sensible to consider this test as a possible testing procedure in the general framework introduced in Section 1.
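In practice, the conditional permutation distribution is usually approximated by a finite number of random permutations rather than all N! of them. The sketch below (NumPy only; the function names, the number of permutations and the seed are assumptions) shuffles the pooled observations, splits them back into groups of the original sizes, recomputes $Q^D_N(T)$ and reports the Monte Carlo p-value. The statistic is passed as a function, so the same routine could be reused for other statistics.

```python
import numpy as np


def diagonal_wald_statistic(samples, T):
    """Q^D_N(T) = N xbar' T {diag(T V_N T)}^{-1} T xbar."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])
    Tx = T @ xbar
    diag_M = np.diag(T @ (N * np.diag(s2 / n)) @ T)
    return N * np.sum(Tx ** 2 / diag_M)


def permutation_test(samples, T, statistic, n_perm=999, seed=2024):
    """Monte Carlo approximation of the permutation p-value of `statistic`."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(samples)
    cuts = np.cumsum([len(x) for x in samples])[:-1]    # group boundaries in the pooled vector
    q_obs = statistic(samples, T)
    exceed = 0
    for _ in range(n_perm):
        permuted = np.split(rng.permutation(pooled), cuts)  # same group sizes, shuffled labels
        exceed += int(statistic(permuted, T) >= q_obs)
    # Counting the observed statistic as one permutation keeps the p-value in (0, 1].
    return q_obs, (exceed + 1) / (n_perm + 1)


shifted = [np.linspace(0.1, 0.5, 5) + shift for shift in (0.0, 10.0, 20.0)]
T3 = np.eye(3) - np.ones((3, 3)) / 3
q, p = permutation_test(shifted, T3, diagonal_wald_statistic)
print(q, p)
```

Rejecting when the p-value is at most α is equivalent to comparing $Q^D_N(T)$ with the Monte Carlo $(1-\alpha)$-quantile of the permutation distribution, up to the discreteness of that distribution.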

Diagonal Wald-type tests based on standardized test statistic
Although the $Q^{D,\pi}_N(T)$ test works quite well for small sample sizes, we consider another procedure, which controls the nominal type I error level better than this test under very different variances, while both procedures have very similar empirical power (see Section 4). More precisely, we consider a standardized version of $Q^D_N(T)$ (see Pauly et al. [21]), namely … .
To simplify the analysis, we assume normality of the observations, i.e., $\varepsilon_{i1} \sim N(0, \sigma_i^2)$, $i = 1, \dots, d$. Then … $\xrightarrow{P} I_d$ (see [20]). Therefore, we consider the following test statistic: … . Following Zhang [30], we approximated the distribution of $Q^{D,s}_N(T)$ by a sequence of standardized $\chi^2$-distributions. Unfortunately, the $Q^{D,s}_N$ test performs similarly to or even worse than the $Q^D_N$ test for small sample sizes, i.e., it is too liberal (see Section 4). This conclusion is based on simulation results not included in the paper but available from the author. The reason is that the distribution of $Q^{D,s}_N$ converges too slowly to its asymptotic distribution. Therefore, we do not consider this test in the paper; its construction and asymptotic properties are given in Appendix B for completeness. Instead, we consider only a permutation test based on $Q^{D,s}_N(T)$. This test performs better than the $Q^{D,\pi}_N(T)$ test for very small sample sizes or under very different variances. The asymptotic validity of this test in the homoscedastic case is shown in the following theorem, which follows immediately from Theorems 2.1 and 2.3 and from $\hat V_N \xrightarrow{P} V$ and $\hat V^\pi_N \xrightarrow{P} \sigma^2 D$ as $N \to \infty$ (see Lemma 2 in the supplement to Pauly et al. [20]), where $\sigma^2$ is given by (3.1). Although we assumed normality to construct the standardized diagonal Wald-type statistic (more precisely, to establish the expected value and variance of $Q^{D,V_N}_N(T)$), the asymptotic properties of the tests based on it are proved without this assumption.
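The defining formula of the standardized statistic is not reproduced in this version of the text. As a hypothetical reconstruction only, the sketch below centres $Q^D_N(T)$ by $\mathrm{tr}(\hat A)$ and scales it by $\{2\,\mathrm{tr}(\hat A^2)\}^{1/2}$, with $\hat A = \{\mathrm{diag}(T\hat V_N T)\}^{-1} T\hat V_N T$ (the normal-theory mean and variance of the quadratic form, with $\hat V_N$ plugged in), and recomputes it on permuted data. It should not be read as the author's exact definition; the names and tuning values are assumptions.

```python
import numpy as np


def standardized_diagonal_statistic(samples, T):
    """ASSUMED form: (Q^D_N - tr(A)) / sqrt(2 tr(A^2)), A = {diag(T V_N T)}^{-1} T V_N T."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])
    M = T @ (N * np.diag(s2 / n)) @ T
    A = M / np.diag(M)[:, None]
    Tx = T @ xbar
    q = N * np.sum(Tx ** 2 / np.diag(M))
    return (q - np.trace(A)) / np.sqrt(2.0 * np.trace(A @ A))


def permutation_test(samples, T, statistic, n_perm=999, seed=2024):
    """Monte Carlo permutation p-value; the statistic is recomputed on each permuted split."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(samples)
    cuts = np.cumsum([len(x) for x in samples])[:-1]
    q_obs = statistic(samples, T)
    exceed = sum(int(statistic(np.split(rng.permutation(pooled), cuts), T) >= q_obs)
                 for _ in range(n_perm))
    return q_obs, (exceed + 1) / (n_perm + 1)


x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
T2 = np.eye(2) - np.ones((2, 2)) / 2
stat2 = standardized_diagonal_statistic([x1, x2], T2)

shifted = [np.linspace(0.1, 0.5, 5) + shift for shift in (0.0, 10.0, 20.0)]
T3 = np.eye(3) - np.ones((3, 3)) / 3
q, p = permutation_test(shifted, T3, standardized_diagonal_statistic)
print(stat2, p)
```

Under this assumed form, for d = 2 the statistic equals $(t^2 - 1)/\sqrt 2$, where $t^2$ is the squared Welch statistic.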
Apart from the diagonal Wald-type testing procedures, in the next section, we also consider the unscaled Wald-type tests where we use the identity matrix as a weight matrix.

Unscaled Wald-type test
Another idea for improving the performance of the WTS is to remove the Moore-Penrose inverse of $T\hat V_N T$ from it, i.e., to consider the unscaled Wald-type statistic $Q^U_N(T) = N \bar X_\cdot^\top T \bar X_\cdot$. Bai and Saranadasa [1] were the first to apply this idea to Hotelling's $T^2$ statistic. Chen and Qin [3] and Zhang and Xu [32] extended this method to high-dimensional data. Recently, Duchesne and Francq [7] and Pauly et al. [21] used this idea for multivariate hypothesis testing and for analyzing high-dimensional one-sample repeated measures designs, respectively.
The following result gives the asymptotic distribution of the unscaled Wald-type statistic under $H_0$.
… and $Z_{U,1}, \dots, Z_{U,r}$ are independent standard normal variables.
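To construct the asymptotic test based on $Q^U_N(T)$, we use the approximation by a scaled $\chi^2$-distribution, as in Section 2; let $k_{U,\alpha} = g_U \chi^2_{f_U, 1-\alpha}$ denote the resulting asymptotic critical value. In much the same way as in the proof of Lemma 2.1, we obtain that the limiting variable $\sum_{i=1}^r \lambda_{U,i} Z^2_{U,i}$ has expected value $\mathrm{tr}(TVT)$ and variance $2\,\mathrm{tr}\{(TVT)^2\}$, so that $g_U = \mathrm{tr}\{(TVT)^2\}/\mathrm{tr}(TVT)$ and $f_U = \{\mathrm{tr}(TVT)\}^2/\mathrm{tr}\{(TVT)^2\}$; the estimators $\hat g_U$, $\hat f_U$ and $\hat k_{U,\alpha}$ are obtained by replacing V with $\hat V_N$. The following result gives the properties of these estimators and of the asymptotic test based on $Q^U_N(T)$. Its proof is similar to that of Theorem 2.2 and is therefore omitted.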
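The unscaled test requires only $\mathrm{tr}(T\hat V_N T)$ and $\mathrm{tr}\{(T\hat V_N T)^2\}$. A minimal sketch with the plug-in estimators $\hat g_U$ and $\hat f_U$, assuming NumPy and SciPy (hypothetical function names):

```python
import numpy as np
from scipy.stats import chi2


def unscaled_asymptotic_test(samples, T, alpha=0.05):
    """Unscaled Wald-type test Q^U_N(T) with critical value g_U * chi^2_{f_U, 1-alpha}."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])
    M = T @ (N * np.diag(s2 / n)) @ T           # T V_N T
    tr_M, tr_M2 = np.trace(M), np.trace(M @ M)
    g, f = tr_M2 / tr_M, tr_M ** 2 / tr_M2      # moment matching as for the diagonal test
    Tx = T @ xbar
    q = N * Tx @ Tx                             # equals N xbar' T xbar, T symmetric idempotent
    crit = g * chi2.ppf(1 - alpha, f)
    return q, g, f, crit, q > crit


x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
T2 = np.eye(2) - np.ones((2, 2)) / 2
q, g, f, crit, reject = unscaled_asymptotic_test([x1, x2], T2)
print(q, g, f, crit, reject)
```

For d = 2 and $T = P_2$, $Q^U_N(T)/\hat g_U$ equals the squared Welch statistic and $\hat f_U = 1$, so in this special case the unscaled and WTS asymptotic tests coincide.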
The finite-sample behavior of the asymptotic unscaled Wald-type test is much better than that of the WTS test and of the test based on $Q^D_N(T)$. However, it tends to be conservative in some situations. To overcome this problem, our first idea was to consider a permutation test based on $Q^{U,\pi}_N(T) = N (\bar X^\pi_\cdot)^\top T \bar X^\pi_\cdot$. The following theorem establishes the asymptotic conditional permutation distribution of $Q^{U,\pi}_N(T)$.
… $\sum_{i=1}^r \lambda_{U,\pi,i} Z^2_{U,\pi,i}$ as $N \to \infty$ in probability, where $\lambda_{U,\pi,1}, \dots, \lambda_{U,\pi,r}$ are the nonzero eigenvalues of the matrix $\sigma^2 TDT$, …

Unfortunately, Theorems 3.1 and 3.3 indicate that the $Q^{U,\pi}_N(T)$ test may not work well. First, the asymptotic permutation distribution of $Q^{U,\pi}_N(T)$ depends on the vector μ. Second, this distribution rarely coincides with the asymptotic null distribution of $Q^U_N(T)$; for example, they coincide when $\sigma_1^2 = \cdots = \sigma_d^2$ and $\mu_1 = \cdots = \mu_d$. In fact, the simulations in Section 4 suggest that the unscaled Wald-type permutation test is too conservative or too liberal in certain cases. We therefore tried to improve this test further. For this purpose, we consider the standardized version of $Q^U_N(T)$, i.e., …, similarly as in Section 2. Under the normality assumption, the theorem on the moments of quadratic forms (see, for instance, [15], p. 55) shows that … .
According to simulation results not included in the paper but available from the author, the approximation of $Q^{U,s}_N(T)$ by a sequence of standardized $\chi^2$-distributions (see Appendix B) results in a test whose finite-sample behavior is similar to that of the $Q^U_N(T)$ test, and which is even slightly more conservative (see Section 4). Therefore, we do not consider it here. However, the permutation test based on $Q^{U,s}_N(T)$ behaves very well for small sample sizes (better than the $Q^U_N(T)$ and $Q^{U,\pi}_N(T)$ tests). The permutation version of $Q^{U,s}_N(T)$ is denoted by $Q^{U,s,\pi}_N(T)$. Theoretically, it is also better than the $Q^{U,\pi}_N(T)$ test in the sense of the following theorem (compare with Corollary 2.1, Remark 2.1 and Theorem B.2 in Appendix B). This result is obtained similarly to Theorem 2.4.
… as $N \to \infty$, where $\lambda_{U,1}, \dots, \lambda_{U,r}$, V and $Z_{U,1}, \dots, Z_{U,r}$ are as in Theorem 3.1. 2. If $r = \mathrm{rank}(T)$, then the permutation distribution of $Q^{U,s,\pi}_N(T)$ conditioned on the observed data X weakly converges to … as $N \to \infty$ in probability, where $\lambda^*_{U,\pi,1}, \dots, \lambda^*_{U,\pi,r}$ are the nonzero eigenvalues of the matrix TDT, and D and $Z_{U,\pi,1}, \dots, Z_{U,\pi,r}$ are as in Theorem 3.3.

Almost all of the new solutions have very similar asymptotic properties, at least under homoscedastic designs. However, as we will see in the next section, they behave very differently in finite samples.
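A permutation test based on a standardized unscaled statistic can be sketched in the same way. Here it is assumed, as a hypothetical reconstruction, that $Q^{U,s}_N(T)$ centres $Q^U_N(T)$ by $\mathrm{tr}(T\hat V_N T)$ and scales it by $[2\,\mathrm{tr}\{(T\hat V_N T)^2\}]^{1/2}$; these are the mean and variance of $Q^U_N(T)$ under $H_0$ for normal data when $\hat V_N$ is replaced by $V_N = N\,\mathrm{diag}(\sigma_1^2/n_1, \dots, \sigma_d^2/n_d)$. The exact definition used in the paper may differ; the names and tuning values are assumptions.

```python
import numpy as np


def standardized_unscaled_statistic(samples, T):
    """ASSUMED form: (Q^U_N - tr(T V_N T)) / sqrt(2 tr((T V_N T)^2))."""
    n = np.array([len(x) for x in samples], dtype=float)
    N = n.sum()
    xbar = np.array([np.mean(x) for x in samples])
    s2 = np.array([np.var(x, ddof=1) for x in samples])
    M = T @ (N * np.diag(s2 / n)) @ T
    Tx = T @ xbar
    q = N * Tx @ Tx                               # Q^U_N(T)
    return (q - np.trace(M)) / np.sqrt(2.0 * np.trace(M @ M))


def permutation_test(samples, T, statistic, n_perm=999, seed=2024):
    """Monte Carlo permutation p-value; group sizes are kept, observations are reshuffled."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(samples)
    cuts = np.cumsum([len(x) for x in samples])[:-1]
    q_obs = statistic(samples, T)
    exceed = sum(int(statistic(np.split(rng.permutation(pooled), cuts), T) >= q_obs)
                 for _ in range(n_perm))
    return q_obs, (exceed + 1) / (n_perm + 1)


x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
T2 = np.eye(2) - np.ones((2, 2)) / 2
stat2 = standardized_unscaled_statistic([x1, x2], T2)

shifted = [np.linspace(0.1, 0.5, 5) + shift for shift in (0.0, 10.0, 20.0)]
T3 = np.eye(3) - np.ones((3, 3)) / 3
q, p = permutation_test(shifted, T3, standardized_unscaled_statistic)
print(stat2, p)
```

Recomputing the centring and scaling on every permuted sample is what distinguishes this test from the $Q^{U,\pi}_N(T)$ test, whose permutation distribution depends on μ.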

Simulation experiments
In this section, a simulation study is carried out to evaluate the finite-sample performance of the testing procedures proposed in Sections 2 and 3, in particular their size control and power. The new methods are compared with the WTS and WTPS tests of Pauly et al. [20]. The simulation experiments, as well as the illustrative examples of Section 5, were performed in the R programming language [22]. For the asymptotic and permutation tests proposed by Pauly et al. [20], we used the functions implemented in the R package GFD [8].

Simulation design
Similarly to Pauly et al. [20] and Smaga [23], we restrict our simulation study to the one-way layout. We consider a factor A with a = 8 levels. The data were generated from (1.1), i.e., … and $\sigma_3 = (2.2, 2.1, 1.9, 1.7, 1.5, 1.3, 1.1, 1)^\top$. These settings include the so-called positive pairing (increasing sample sizes combined with increasing variances) and negative pairing (increasing sample sizes combined with decreasing variances) (see, for example, [20]). The behavior of the procedures under these two settings is a major criterion for assessing their accuracy.
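The Monte Carlo logic of such a size study can be sketched as follows. This is not the author's simulation code: only normal errors are used, $\sigma_3$ is taken from the text and read as a vector of standard deviations, the sample sizes are hypothetical (increasing, so that combined with $\sigma_3$ they give a negative pairing), and only the two asymptotic tests are included to keep the sketch short. The number of replications is far smaller than in a real study.

```python
import numpy as np
from scipy.stats import chi2


def sym_pinv(M, tol=1e-10):
    """Moore-Penrose inverse of a symmetric matrix (tiny eigenvalues treated as zero)."""
    w, U = np.linalg.eigh(M)
    inv_w = np.zeros_like(w)
    keep = np.abs(w) > tol * np.abs(w).max()
    inv_w[keep] = 1.0 / w[keep]
    return (U * inv_w) @ U.T


def empirical_size(n, sigma, n_rep=500, alpha=0.05, seed=7):
    """Monte Carlo rejection rates under H0 (all mu_i = 0, normal errors) of
    the asymptotic WTS test and the asymptotic unscaled Wald-type test."""
    rng = np.random.default_rng(seed)
    a = len(n)
    T = np.eye(a) - np.ones((a, a)) / a                 # P_a: no treatment effect
    crit_wts = chi2.ppf(1 - alpha, np.linalg.matrix_rank(T))
    sizes = np.asarray(n, dtype=float)
    N = sizes.sum()
    rejections = np.zeros(2)
    for _ in range(n_rep):
        samples = [rng.normal(0.0, s, size=m) for s, m in zip(sigma, n)]
        xbar = np.array([x.mean() for x in samples])
        s2 = np.array([x.var(ddof=1) for x in samples])
        M = T @ (N * np.diag(s2 / sizes)) @ T            # T V_N T
        Tx = T @ xbar
        q_wts = N * Tx @ sym_pinv(M) @ Tx
        q_u = N * Tx @ Tx
        tr_M, tr_M2 = np.trace(M), np.trace(M @ M)
        crit_u = (tr_M2 / tr_M) * chi2.ppf(1 - alpha, tr_M ** 2 / tr_M2)
        rejections += [q_wts > crit_wts, q_u > crit_u]
    return rejections / n_rep


sigma_3 = (2.2, 2.1, 1.9, 1.7, 1.5, 1.3, 1.1, 1.0)    # from the text, read as standard deviations
n_hyp = (5, 6, 7, 8, 9, 10, 11, 12)                    # hypothetical sizes (negative pairing)
print(empirical_size(n_hyp, sigma_3))
```

Permutation tests fit into the same loop by replacing the fixed critical values with a permutation p-value per replication, at a correspondingly higher computational cost.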

Simulation results
Now, we discuss the empirical sizes and powers of the tests under consideration. The following observations are confirmed by multiple comparisons of the tests using the Nemenyi post hoc test [5,17], whose results are available from the author (see [10] for a similar statistical comparison of tests). For readability, in the remainder of the paper we omit the dependence of the test statistics on the projection matrix T, e.g., we write $Q_N$ instead of $Q_N(T)$. Figure 1 and Tables 3-5 in Appendix D show the empirical sizes of the testing procedures considered in Sections 2 and 3. We immediately observe that the asymptotic tests based on $Q_N$ and $Q^D_N$ do not keep the preassigned type I error level in almost all settings, except for the Laplace model with $n = n_l + 25 \cdot 1_8$, $l = 1, 2$. They are usually more or less liberal, even for sample sizes larger than those considered. Therefore, these methods cannot be recommended. On the other hand, the asymptotic unscaled Wald-type test usually controls the nominal type I error level quite accurately under symmetric and moderately skewed distributions (the $\chi^2_{10}$-distribution in our simulations). However, for extremely skewed distributions (the log-normal and $\chi^2_3$-log-normal models) and sometimes also for the Laplace distribution, this method seems to be conservative. When the sample sizes or standard deviations are equal, the unscaled permutation test based on $Q^{U,\pi}_N$ works quite well, and its behavior seems to be only slightly worse than that of the best of the other permutation tests. Nevertheless, this procedure is conservative in the case of positive pairing, and it does not maintain the nominal type I error level in the case of negative pairing. Our simulations confirm the results of Pauly et al. [20] and Smaga [23] about the WTPS test, which seems to be an adequate testing procedure except for extremely skewed distributions with unequal variances, where it tends to over-reject the null hypothesis considerably. In all investigated situations, the tests based on $Q^{U,s,\pi}_N$, $Q^{D,\pi}_N$ and $Q^{D,s,\pi}_N$ demonstrate the most accurate control of the preassigned type I error level.

Size control
Summarising the above results, we conclude the following. The $Q_N$ and $Q^D_N$ tests are much more liberal than the other testing procedures in almost all scenarios. Under the null hypothesis, the test based on $Q^U_N$ seems to perform as well as the permutation tests under the normal and $\chi^2_{10}$-distributions. However, the asymptotic unscaled Wald-type procedure is conservative in the other settings. The permutation tests, except possibly the test based on $Q^{U,\pi}_N$, behave quite similarly under symmetric and $\chi^2_{10}$-distributions. This is also true under extremely skewed distributions when the standard deviations are equal. However, under extremely skewed distributions and unequal variances, the WTPS test and possibly the test based on $Q^{U,\pi}_N$ do not keep the nominal type I error level, in contrast to the other permutation procedures.
Since the asymptotic unconditional null and conditional permutation distributions of the diagonal and unscaled test statistics do not in general coincide for heteroscedastic designs (see Sections 2 and 3), … and $Q^U_N$ tests seem to work best. However, under the log-normal distribution, the first two tests are too liberal and the last one is conservative. Nevertheless, the liberality of the $Q^{D,s,\pi}_N$ and $Q^{U,s,\pi}_N$ tests decreases with increasing sample size much faster than that of the WTPS and diagonal permutation tests. Therefore, under very different variances, the $Q^{D,s,\pi}_N$ and $Q^{U,s,\pi}_N$ tests may have an advantage over the other permutation tests.

Power
Figure 2 and Tables 8-9 in Appendix D present the empirical powers of the tests under consideration. The $Q_N$ and $Q^D_N$ tests are too liberal in all settings, and the tests based on $Q^\pi_N$ and $Q^{U,\pi}_N$ are too liberal in some of the cases discussed above; their empirical powers are therefore not really comparable (for the latter two tests, only in those cases). However, they are included for illustration and completeness. In fact, owing to their extremely liberal behavior, the $Q_N$ and $Q^D_N$ tests appear to have the best power in most cases.
Under homoscedastic settings, the empirical powers of the asymptotic unscaled Wald-type test are comparable with those of the permutation tests for symmetric and $\chi^2_{10}$-distributions, while for the other distributions they are a few percent smaller. Moreover, the empirical powers of the permutation procedures are very similar. However, the $Q^{U,\pi}_N$ and $Q^{U,s,\pi}_N$ tests may be more powerful under symmetric and $\chi^2_{10}$-distributions, while the opposite is true under skewed ones. Under heteroscedastic designs and symmetric and $\chi^2_{10}$-distributions, the $Q^U_N$ and $Q^{U,s,\pi}_N$ tests have almost identical power. For $\sigma = \sigma_2$, this power is much lower than that of the other permutation tests, except the $Q^{U,\pi}_N$ test. However, the situation may change slightly for $\sigma = \sigma_3$ and some alternatives.
Under extremely skewed distributions and heteroscedastic settings, the $Q^{U,s,\pi}_N$ test is evidently more powerful than the unscaled Wald-type test. However, for $\sigma = \sigma_2$ it still has less power than the permutation tests based on $Q^\pi_N$, $Q^{D,\pi}_N$ and $Q^{D,s,\pi}_N$. The empirical powers of the unscaled Wald-type permutation test are very similar to those of the $Q^{U,s,\pi}_N$ test for $\sigma = \sigma_2$ and $n_l = n_1$. Moreover, … For $\sigma = \sigma_2$ and symmetric and $\chi^2_{10}$-distributions, the $Q^\pi_N$, $Q^{D,\pi}_N$ and $Q^{D,s,\pi}_N$ tests have similar power, possibly except for $\mu = \mu_1$, where the WTPS test may be slightly more powerful. Since the $Q^\pi_N$ test tends to over-reject the null hypothesis considerably under extremely skewed distributions, its empirical powers there are greater than those of the diagonal Wald-type permutation tests for $\sigma = \sigma_2$. Interestingly, under negative pairing, the $Q^{D,\pi}_N$ and $Q^{D,s,\pi}_N$ tests are more powerful than the WTPS test (even for extremely skewed distributions, where the WTPS test is too liberal). There seem to be no significant differences between the empirical powers of the tests based on $Q^{D,\pi}_N$ and $Q^{D,s,\pi}_N$ in any scenario.

Summary
In summary, the diagonal Wald-type permutation procedures seem to perform best in terms of size control and power for small sample sizes (m ≤ 10). The WTPS test and the test based on $Q^{U,s,\pi}_N$ also work well in many scenarios, but they are too liberal or have less power in some cases. For the convenience of the reader, Table 1 indicates the recommended test statistics for different scenarios and small sample sizes, based on our simulation studies. The $Q_N$ and $Q^D_N$ tests are not considered there, since they do not keep the type I error level for small sample sizes. For large sample sizes, all testing procedures seem to work equally well, but the permutation tests may be time-consuming.
In the simulations of this section, we considered only a = 8. A comparison of the tests under different numbers of levels and observations is presented in Appendix C as a supplement to the above considerations.

Real data illustrative examples
In this section, we express three known experimental designs in terms of the general framework of factorial designs presented in Section 1 (see also Section 4 in [20]). For each design, an application of the tests to certain real data examples is also given.

One-way layout
In the one-way layout where factor A has a levels, $X = (X_{11}, \dots, X_{an_a})^\top$ and $\mu = (\mu_1, \dots, \mu_a)^\top$, we are interested in testing the hypothesis of no treatment effect $H_0: \mu_1 = \cdots = \mu_a$. This hypothesis is equivalent to $H_0: T\mu = 0$, where $T = P_a = I_a - (1/a)1_a 1_a^\top$. As an example, we consider the startup data from the R package GFD [8]. This dataset contains the startup costs (in thousands of dollars) of companies of five types. The type of company is treated as a factor with five levels (a = 5): bakery, gifts, pets, pizza and shoes ($n_1 = 11$, $n_2 = 10$, $n_3 = 16$, $n_4 = 13$, $n_5 = 10$). We would like to check statistically whether the type of company has an effect on the startup costs. To this end, we applied all of the tests considered in Section 4, obtaining the results given in Table 2. For α = 5%, all of the tests reject the null hypothesis that the mean startup costs of the five types of companies are equal. However, the p-values differ from one another: those of the $Q_N$ and $Q^D_N$ tests are the smallest, while those of the unscaled Wald-type permutation tests are the largest (compare with the results of Section 4).

Two-way cross-classification design
In the cross-classification with two factors A (with a levels) and B (with b levels), we have $X = (X_{111}, \dots, X_{abn_{ab}})^\top$ and $\mu = (\mu_{11}, \dots, \mu_{ab})^\top$. We consider the following hypotheses: no main effect of A, i.e., $H_0^A: \bar\mu_{1\cdot} = \cdots = \bar\mu_{a\cdot}$; no main effect of B, i.e., $H_0^B: \bar\mu_{\cdot 1} = \cdots = \bar\mu_{\cdot b}$; and no interaction between A and B. To express these hypotheses in the model of Section 1, we use the contrast matrices $P_a \otimes (1/b)1_b^\top$, $(1/a)1_a^\top \otimes P_b$ and $P_a \otimes P_b$, respectively, where ⊗ denotes the Kronecker product of matrices. For illustrative purposes, we use the batteries data ([16], Table 5-1, p. 176). In this dataset, the life (in hours) of batteries is compared across three material types and three operating temperatures (low: 15 °F, medium: 70 °F, high: 125 °F). For each material type, twelve batteries are randomly selected and then randomly allocated to the three temperature levels. The research questions concern possible differences in the mean life of the batteries for different material types and operating temperature levels. To answer these questions, the new tests and those of Pauly et al. [20] were applied (the p-values are given in Table 2). All testing procedures, except possibly the $Q^{U,\pi}_N$ test, reject the null hypotheses; hence, we conclude that there are significant differences in mean battery life between the three material types and between the three temperature levels, and that there is an interaction between them. As we observed in Section 4, the unscaled Wald-type permutation test is conservative in some situations, which is evident here.

Table 2. P-values (as percentages) of the tests for the startup, batteries and curdies data.
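The Kronecker-product construction of the three contrast matrices can be checked numerically. The sketch (NumPy; hypothetical helper names) uses a = b = 3 as in the batteries example and orders μ lexicographically with the B index running fastest, as in $\mu = (\mu_{11}, \dots, \mu_{ab})^\top$; the resulting projections have ranks a − 1 = 2, b − 1 = 2 and (a − 1)(b − 1) = 4.

```python
import numpy as np


def sym_pinv(G, tol=1e-10):
    """Moore-Penrose inverse of a symmetric matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(G)
    inv_w = np.zeros_like(w)
    keep = np.abs(w) > tol * np.abs(w).max()
    inv_w[keep] = 1.0 / w[keep]
    return (U * inv_w) @ U.T


def projection_from_contrast(H):
    """T = H' (H H')^- H."""
    return H.T @ sym_pinv(H @ H.T) @ H


def centring_matrix(k):
    """P_k = I_k - (1/k) 1_k 1_k'."""
    return np.eye(k) - np.ones((k, k)) / k


a, b = 3, 3                                             # material types x temperatures
H_A = np.kron(centring_matrix(a), np.ones((1, b)) / b)  # P_a (x) (1/b) 1_b'
H_B = np.kron(np.ones((1, a)) / a, centring_matrix(b))  # (1/a) 1_a' (x) P_b
H_AB = np.kron(centring_matrix(a), centring_matrix(b))  # P_a (x) P_b
for name, H in [("A", H_A), ("B", H_B), ("AB", H_AB)]:
    T = projection_from_contrast(H)
    print(name, H.shape, np.linalg.matrix_rank(T))
```

For the main effect of A, the projection works out to $P_a \otimes (1/b)1_b 1_b^\top$, i.e., centring of the row means of μ arranged as an a × b table.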

Nested hierarchical design
We can also consider nested hierarchical designs. As an example, we present a design with two fixed factors. Suppose that factor A (categories) has a levels and factor B (subcategories) has $b_i$ levels within level i of factor A. The vector of observations is $X = (X_{111}, \dots, X_{1b_1n_{1b_1}}, \dots, X_{ab_a1}, \dots, X_{ab_an_{ab_a}})^\top$ and $\mu = (\mu_{11}, \dots, \mu_{1b_1}, \dots, \mu_{a1}, \dots, \mu_{ab_a})^\top$. … : $B\mu = 0$, respectively, where $Q = \mathrm{diag}((1/b_1)1_{b_1}, \dots, (1/b_a)1_{b_a})$ and $B = \mathrm{diag}(P_{b_1}, \dots, P_{b_a})$. Here, we consider the curdies data, containing the numbers of flatworms (Dugesia) sampled in two seasons at different sites in the Curdies River in Western Victoria. This dataset is available in the R package GFD [8]. Season is factor A with levels "summer" and "winter", while site is factor B with levels 1 to 6, nested within A. The total number of observations is 36. To test the hypotheses $H_0^A$ and $H_0^{B(A)}$ for these data, we used the testing procedures under consideration. … So, the site in the river does not have a significant effect on the number of flatworms.
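Because $B = \mathrm{diag}(P_{b_1}, \dots, P_{b_a})$ is already symmetric and idempotent with $B1 = 0$, it is its own projection matrix, so $T = B$ for $H_0^{B(A)}$. A small NumPy check, with hypothetical $b_1 = b_2 = 3$ (the site layout of the curdies data is not restated here); the helper names are our own, and `scipy.linalg.block_diag` could replace the hand-written helper.

```python
import numpy as np


def centring_matrix(k):
    """P_k = I_k - (1/k) 1_k 1_k'."""
    return np.eye(k) - np.ones((k, k)) / k


def block_diag(*blocks):
    """Minimal block-diagonal assembly of square blocks."""
    size = sum(blk.shape[0] for blk in blocks)
    out = np.zeros((size, size))
    start = 0
    for blk in blocks:
        k = blk.shape[0]
        out[start:start + k, start:start + k] = blk
        start += k
    return out


b = [3, 3]                                   # hypothetical b_i within each of a = 2 categories
B = block_diag(*[centring_matrix(k) for k in b])
print(np.allclose(B, B.T), np.allclose(B @ B, B), np.linalg.matrix_rank(B))
```

The rank of B is $\sum_i (b_i - 1)$, the number of degrees of freedom for the subcategories within categories.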

Concluding remarks
Compared to other test statistics, the Wald-type statistic has the advantage of being applicable in general factorial designs without assuming homoscedasticity or a particular error distribution. We have proposed asymptotic and permutation tests based on modified Wald-type statistics in which the weight matrix is a certain diagonal matrix, i.e., diagonal and unscaled Wald-type tests. The new methods do not perform equally well when sample sizes are small.
From extensive simulation studies, we conclude that the diagonal and standardized unscaled Wald-type permutation tests perform best. These tests are comparable with or superior to the WTPS test of Pauly et al. [20] in finite samples. Interestingly, the best of our new solutions perform even better than the WTPS test for small sample sizes and heteroscedastic designs, where the new permutation methods are not, in general, asymptotically valid. Thus, resampling procedures may perform well in finite samples even without being asymptotically valid. Except possibly for the unscaled Wald-type permutation test, the new permutation methods are asymptotically valid under homoscedastic designs, which means that they have the same asymptotic properties as the corresponding tests based on the asymptotic null distributions of their test statistics.
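To make the diagonal Wald-type permutation test more concrete, here is a minimal Python sketch for the one-way layout with $T = P_d$. It assumes that the diagonal statistic has the form $Q_N^D = N\,\overline{X}_\cdot^\top T\{\operatorname{diag}(T\widehat{V}_N T)\}^{-1}T\overline{X}_\cdot$ with $\widehat{V}_N = N\operatorname{diag}(\widehat{\sigma}_1^2/n_1, \dots, \widehat{\sigma}_d^2/n_d)$, which is consistent with the limiting eigenvalues stated in Appendix B but is not copied from Section 2, and that the permutation distribution is obtained by recomputing the statistic on random permutations of the pooled sample. The function names and the simulated data are hypothetical; this is not the implementation in the GFD package.

```python
import numpy as np


def diag_wald_stat(groups, T):
    """Diagonal Wald-type statistic N * xbar' T {diag(T V_hat T)}^{-1} T xbar (assumed form)."""
    n = np.array([len(g) for g in groups])
    N = n.sum()
    xbar = np.array([g.mean() for g in groups])
    s2 = np.array([g.var(ddof=1) for g in groups])
    v_hat = np.diag(N * s2 / n)              # estimate of V = diag(sigma_i^2 / kappa_i)
    weights = 1.0 / np.diag(T @ v_hat @ T)   # {diag(T V_hat T)}^{-1}, kept as a vector
    tx = T @ xbar
    return N * np.sum(weights * tx**2)


def permutation_test(groups, T, n_perm=2000, seed=0):
    """Return Q_N^D and the proportion of permuted statistics >= the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(groups)
    cuts = np.cumsum([len(g) for g in groups])[:-1]  # split points keep the n_i fixed
    observed = diag_wald_stat(groups, T)
    exceed = 0
    for _ in range(n_perm):
        permuted = np.split(rng.permutation(pooled), cuts)
        exceed += int(diag_wald_stat(permuted, T) >= observed)
    return observed, exceed / n_perm


# Hypothetical heteroscedastic one-way layout with d = 3 groups.
rng = np.random.default_rng(1)
groups = [rng.normal(0.0, s, size=k) for s, k in [(1.0, 10), (2.0, 15), (3.0, 20)]]
d = len(groups)
T = np.eye(d) - np.ones((d, d)) / d  # P_d: hypothesis of equal means
stat, p_value = permutation_test(groups, T)
print(f"Q_N^D = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

Storing $\{\operatorname{diag}(T\widehat{V}_N T)\}^{-1}$ as a vector of weights avoids any matrix inversion, which is the computational appeal of a diagonal weight matrix compared with the Moore-Penrose inverse in the classical Wald-type statistic.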
In the case of very different variances across samples, the new tests and the WTPS test of Pauly et al. [20] may fail to maintain the nominal type I error level, at least under extremely skewed distributions and small sample sizes. When the number of observations is small, the sample variances may not be reliable estimators, which has a negative effect on the behavior of the tests considered. Therefore, improved variance estimation may result in better performance of the testing procedures [6,28]. This seems to be an interesting direction for future research.

… $TVT$. Hence, by the theorem on the moments of quadratic forms (see, for instance, [15], p. 55), we conclude that …

Proof of Theorem 2.2. The consistency of the estimators follows immediately from the consistency of $\widehat{V}_N$ for $V$. Under $H_1: T\mu = \mathbf{a} \neq \mathbf{0}$, we have $T\overline{X}_\cdot \xrightarrow{P} \mathbf{a}$ as $N \to \infty$. From the proof of Theorem 2.1, it follows that …

Proof of Theorem 2.3. From Lemma 1 in the supplement to Pauly et al. [20], it follows that … in the sense of conditional convergence in distribution given $X$, where $\sigma^2$ is given by (3.1). Moreover, Lemma 2 in that supplement shows that $T\widehat{V}_N^{\pi}T \xrightarrow{P} \sigma^2 TDT$ as $N \to \infty$.
The rest of the proof runs as in the proof of Theorem 2.1. … [20]. From the continuous mapping theorem and the representation theorem for quadratic forms in normal variables, it follows that … [20] and Smaga [23]. It follows that the results we mainly use, i.e., (A.1), the consistency of $\widehat{V}_N$, and Lemmas 1 and 2 in the supplement to Pauly et al. [20] (see Remark 8.1 in this supplement), hold under these assumptions, which are weaker than the existence of fourth moments. We present the results under the stronger assumptions, since we want to be consistent with the results of Pauly et al. [20] and Smaga [23].

Appendix B: Asymptotic standardized diagonal and unscaled Wald-type tests
In this section, we briefly present the constructions of the asymptotic tests based on the standardized diagonal and unscaled Wald-type statistics. Their asymptotic properties are also investigated.
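Theorem 2.4 shows that, under the null hypothesis $H_0: T\mu = \mathbf{0}$, if $r = \operatorname{rank}(T)$, then $Q_N^D \xrightarrow{d} \sum_{l=1}^{r} \lambda_{D,l} Z_{D,l}^2$ as $N \to \infty$, where $\lambda_{D,1}, \dots, \lambda_{D,r}$ are the nonzero eigenvalues of the matrix $\{\operatorname{diag}(TVT)\}^{-1}TVT$, $V = \operatorname{diag}(\sigma_1^2/\kappa_1, \dots, \sigma_d^2/\kappa_d)$, and $Z_{D,1}, \dots, Z_{D,r}$ are independent standard normal random variables. By Zhang [30], the asymptotic null distribution of the standardized diagonal Wald-type statistic can be approximated by that of $(\chi^2_{h_{D,s}} - h_{D,s})/\sqrt{2h_{D,s}}$. The value $h_{D,s}$ is selected such that the first three moments of these variables coincide. The asymptotic critical value is then given by $k_{D,s,\alpha} = (\chi^2_{h_{D,s},1-\alpha} - h_{D,s})/\sqrt{2h_{D,s}}$, where $\chi^2_{h,1-\alpha}$ denotes the $(1-\alpha)$-quantile of the $\chi^2_h$ distribution. Similarly, under $H_0$, $Q_N^U \xrightarrow{d} \sum_{l=1}^{r} \lambda_{U,l} Z_{U,l}^2$ as $N \to \infty$, where $\lambda_{U,1}, \dots, \lambda_{U,r}$ are the nonzero eigenvalues of the matrix $TVT$, with $V$ as above, and $Z_{U,1}, \dots, Z_{U,r}$ are independent standard normal random variables. The asymptotic null distribution of the standardized unscaled Wald-type statistic can be approximated by that of $(\chi^2_{h_{U,s}} - h_{U,s})/\sqrt{2h_{U,s}}$.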
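The moment-matching step can be made explicit. For $S = \sum_{l=1}^{r} \lambda_l Z_l^2$, the $k$-th cumulant is $2^{k-1}(k-1)!\sum_l \lambda_l^k$, so matching the third cumulant of the standardized $S$ with that of $(\chi^2_h - h)/\sqrt{2h}$ gives $h = (\sum_l \lambda_l^2)^3/(\sum_l \lambda_l^3)^2$. The Python sketch below computes this $h$ and the standardized critical value $(\chi^2_{h,1-\alpha} - h)/\sqrt{2h}$ for given eigenvalues; in practice the $\lambda_{D,l}$ or $\lambda_{U,l}$ would have to be replaced by estimates based on $\widehat{V}_N$, which we do not reproduce here. The eigenvalues in the example are hypothetical, and the back-transformation to the scale of $S$ is only used for a Monte Carlo check.

```python
import numpy as np
from scipy.stats import chi2


def zhang_df(lam):
    """h = (sum lam^2)^3 / (sum lam^3)^2: matches the skewness of sum lam_l Z_l^2."""
    lam = np.asarray(lam, dtype=float)
    return np.sum(lam**2) ** 3 / np.sum(lam**3) ** 2


def standardized_critical_value(lam, alpha=0.05):
    """Upper-alpha critical value of (chi2_h - h) / sqrt(2 h)."""
    h = zhang_df(lam)
    return (chi2.ppf(1 - alpha, h) - h) / np.sqrt(2 * h)


def mixture_critical_value(lam, alpha=0.05):
    """Back-transform to the scale of S: E S = sum lam, Var S = 2 sum lam^2."""
    lam = np.asarray(lam, dtype=float)
    k = standardized_critical_value(lam, alpha)
    return lam.sum() + np.sqrt(2 * np.sum(lam**2)) * k


lam = np.array([2.0, 1.0, 0.5])  # hypothetical nonzero eigenvalues
crit = mixture_critical_value(lam)
rng = np.random.default_rng(0)
s = (rng.standard_normal((200_000, lam.size)) ** 2) @ lam  # draws of S
print(f"h = {zhang_df(lam):.3f}, critical value = {crit:.3f}, "
      f"Monte Carlo rejection rate = {np.mean(s > crit):.4f}")
```

When all eigenvalues are equal to one, $h = r$ and the approximation reduces to the exact $\chi^2_r$ quantile; moreover $h$ is invariant to rescaling the eigenvalues, which is why only the standardized statistic needs to be compared with the approximating distribution.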

Appendix C: Additional simulation studies
The simulations presented here were performed to analyse how the power of the tests depends jointly on the sample sizes and the number of levels of the treatment factor. The simulation design was very similar to that of Subsection 4.1. We considered the normal, $\chi^2_{10}$ and log-normal models, with $n = (m+5)\mathbf{1}_a$, $\sigma = \mathbf{1}_a$, $m = 5, 10, 15, 20$ and $a = 6, 8, 10, 12$, i.e., from 10 to 25 observations per level. We chose balanced and homoscedastic settings to give a fair picture of the behavior of the power under increasing numbers of levels and observations. To investigate the type I error level (resp. power) of the tests, we considered $\mu = \mathbf{0}_a$ (resp. the alternative $\mu_1 = (1, \mathbf{0}_{a-1}^\top)^\top$).
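To illustrate how such a design can be set up, the following Python sketch estimates empirical sizes and powers in the normal model with $n = (m+5)\mathbf{1}_a$ and $\sigma = \mathbf{1}_a$, under $\mu = \mathbf{0}_a$ and $\mu_1 = (1, \mathbf{0}_{a-1}^\top)^\top$. For simplicity it uses only a Wald-type statistic, assumed to have the usual form $Q_N = N\,\overline{X}_\cdot^\top T(T\widehat{V}_N T)^{+}T\overline{X}_\cdot$ with $T = P_a$, together with its asymptotic $\chi^2_{a-1}$ critical value. It does not reproduce the permutation tests, the other error models, or the number of replications behind Tables 10 and 11; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def wald_stat(groups, T):
    """Wald-type statistic N * xbar' T (T V_hat T)^+ T xbar (assumed form)."""
    n = np.array([len(g) for g in groups])
    N = n.sum()
    xbar = np.array([g.mean() for g in groups])
    v_hat = np.diag(N * np.array([g.var(ddof=1) for g in groups]) / n)
    tx = T @ xbar
    return N * tx @ np.linalg.pinv(T @ v_hat @ T) @ tx


def rejection_rate(mu, m, n_sim=2000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate of the chi^2_{a-1} Wald-type test in the
    normal model with n = (m + 5) 1_a and sigma = 1_a."""
    a = len(mu)
    rng = np.random.default_rng(seed)
    T = np.eye(a) - np.ones((a, a)) / a   # P_a: hypothesis of equal means
    crit = chi2.ppf(1 - alpha, df=a - 1)  # rank(P_a) = a - 1
    rejections = 0
    for _ in range(n_sim):
        groups = [rng.normal(mu_i, 1.0, size=m + 5) for mu_i in mu]
        rejections += int(wald_stat(groups, T) > crit)
    return rejections / n_sim


results = {}
for a in (6, 12):
    size = rejection_rate(np.zeros(a), m=5)                   # mu = 0_a
    power = rejection_rate(np.r_[1.0, np.zeros(a - 1)], m=5)  # mu_1 = (1, 0_{a-1})
    results[a] = (size, power)
    print(f"a = {a}: empirical size = {size:.3f}, empirical power = {power:.3f}")
```

Extending the loop over $m = 5, 10, 15, 20$ and $a = 6, 8, 10, 12$ gives the full grid described above for this single statistic.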
The resulting empirical sizes and powers are given in Tables 10 and 11 in Appendix D, respectively. For given sample sizes, the $Q_N$ and $Q_N^D$ tests become increasingly liberal as the number of levels grows. The empirical sizes of the other testing procedures are very similar for all numbers of observations and levels. Moreover, they are quite close to the significance level, except those of the $Q_N^U$ test in the case of the extremely skewed (log-normal) distribution. The empirical powers of the $Q_N$ and $Q_N^D$ tests usually remain at the same level for different values of $a$, although they slightly increase or decrease in some cases as the number of levels increases. This can be explained by the unacceptable behavior of these tests under the null hypothesis. The empirical powers of the other testing procedures decrease as the number of levels increases, which seems natural, as most of these tests maintain the nominal type I error level. This decrease, however, usually becomes smaller as the number of observations increases. Of course, for a given number of levels, the empirical powers of these testing procedures increase quite fast as the number of observations increases.

Appendix D: Simulation results tables
Tables 3–11 contain the results of the simulation studies considered in Section 4 and Appendix C.

Table 3: Empirical sizes (as percentages) of the tests obtained in the normal and Laplace models.

Table 4: Empirical sizes (as percentages) of the tests obtained in the $\chi^2_{10}$ and log-normal models.

Table 6: Empirical sizes (as percentages) of the tests obtained for $\sigma = \sigma_4$ and $n_l = n_2$.