Nonparametric Simultaneous Test Procedures

In this research, we propose several nonparametric simultaneous test procedures for location and scale parameters. We construct the test statistics by combining the p-values of linear rank tests through a suitable combining function. We obtain overall p-values by applying the permutation principle. We compare the efficiency of the combining functions through empirical powers obtained in a simulation study. Finally, we discuss some interesting aspects of our procedure as concluding remarks.


Introduction
To improve test performance, some nonparametric testing procedures use several nonparametric test statistics simultaneously. One of the best-known is the versatile test (see Fleming, Harrington & O'Sullivan 1987), which combines several tests of the same hypothesis with a suitable combining function and obtains an overall p-value. Park (2011) considered several versatile tests built from a group of quantile test statistics. Alternatively, one may reduce the scope of the null hypothesis by splitting it into several sub-null hypotheses, each corresponding to an aspect of interest of the underlying distribution, and then taking their intersection. Pesarin (2001) initiated this approach for nonparametric testing problems and named it the multi-aspect test. Taking this approach, Marozzi (2004) combined the permutation t and median tests, and Marozzi (2007) also added the Wilcoxon test to the combination, to address the two-sample location problem. Salmaso & Solari (2005) considered the multi-aspect test for case-control studies, and Brombin, Salmaso, Ferronato & Galzignato (2011) applied it to biomedical data. Furthermore, the multi-aspect test has been very useful in addressing the scale problem (see Marozzi 2012a, 2012b, 2012c, 2012d).
Simultaneous use of several nonparametric tests can also be applied to testing location and scale parameters concurrently in the two-sample case. To discuss this approach more concretely, let $F_1$ and $F_2$ be the distribution functions of the populations underlying the samples, and let $\delta$ and $\eta$ be the location (translation) and scale parameters, respectively. We assume that $F_1$ and $F_2$ satisfy the location-scale model

$$F_2(x) = F_1\left(\frac{x-\delta}{\eta}\right) \tag{1}$$

for all $x \in (-\infty, \infty)$, for some $\delta \in (-\infty, \infty)$ and $\eta \in (0, \infty)$. For simultaneous tests on $\delta$ and $\eta$ under model (1), the null hypothesis can be expressed as

$$H_0: \delta = 0 \ \text{and} \ \eta = 1. \tag{2}$$

By performing a reasonable nonparametric test for each individual sub-null hypothesis and combining the results with a chosen combining function, one obtains an overall p-value from the null distribution of the combined test statistic. This can be called a nonparametric simultaneous test procedure for the location-scale problem. Lepage (1971) initiated this topic by combining the Wilcoxon rank sum and Ansari-Bradley (Ansari & Bradley 1960) statistics for the location and scale parameters, respectively, using a quadratic form as the combining function. Lepage (1973) reported exact critical points and significance levels for selected sample sizes. Duran, Tsai & Lewis (1976) derived the asymptotic relative efficiency of Lepage's test with respect to another simultaneous test that uses Mood's statistic (Mood 1954) for the scale parameter. Lepage's procedure has been reviewed and discussed extensively by Podgor & Gastwirth (1994) and Zhang (2006), and modified in various ways (Murakami 2007, Rublik 2009, Neuhäuser, Leuchs & Ball 2011, Marozzi 2012a). Marozzi (2013) reviewed and compared these tests by obtaining empirical powers through an extensive simulation study. We note that all the reviewed simultaneous test statistics use a quadratic form to combine the two individual test statistics for the sub-null hypotheses, except those of Marozzi (2012a), which follow the combining function approach. A rather different test is the Cucconi test (Marozzi 2009), which is not a quadratic form combining a location statistic with a scale statistic but is based on squared ranks and squared contrary-ranks.
To obtain an overall p-value and complete a simultaneous test, one must derive the null distribution of the chosen combining function. For this purpose, one may establish asymptotic normality using large-sample approximation theory. Nowadays, however, it is common to apply the permutation principle (Good 2000), a resampling method. The permutation principle was proposed by Fisher (1932), but only recently, with the rapid growth of computing power, has it come into wide use. In passing, we note that the permutation principle yields an exact test (Good 2000).
In this research, we propose nonparametric simultaneous test procedures for (2) in the two-sample problem, using several combining functions. In particular, we exclude the quadratic form as a combining function so as to accommodate various types of alternatives. The rest of this paper is organized as follows. In Section 2, we construct the test statistics for the simultaneous test from the p-values of the individual partial tests for the sub-null hypotheses $H_{01}: \delta = 0$ and $H_{02}: \eta = 1$, using the Fisher (1932), Liptak (1958) and Tippett (1931) combining functions. We then obtain overall p-values for any chosen combining function by applying the permutation principle. In Section 3, we investigate the performance of our procedure and compare it with other tests by obtaining empirical powers through a simulation study. For this, we consider two cases separately: both sub-alternatives one-sided, and both two-sided. To investigate our procedure, we use the Wilcoxon rank sum test for location and Mood's and Ansari-Bradley's tests for scale. Finally, in Section 4, we offer concluding remarks on some interesting features of nonparametric simultaneous tests, the use of limiting distributions for the combining functions, and the bootstrap method for obtaining overall p-values.

Formulation of Nonparametric Simultaneous Tests
Let $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ be two independent random samples from populations with distribution functions $F_1$ and $F_2$, respectively. We assume that each $F_i$, $i = 1, 2$, is unknown but continuous and that $F_1$ and $F_2$ satisfy relation (1). We are interested in testing (2), which requires a simultaneous test procedure for both the location and scale parameters. For this purpose, we use the nonparametric multi-aspect testing approach: we first choose suitable linear rank tests for the sub-null hypotheses $H_{01}: \delta = 0$ and $H_{02}: \eta = 1$, and then combine the results of the two individual tests to obtain an overall p-value. Randles & Wolfe (1979) have extensively studied and summarized linear rank tests of hypotheses on location and scale parameters for two-sample problems. Let $L$ and $S$ be the linear rank statistics for testing $H_{01}: \delta = 0$ and $H_{02}: \eta = 1$, respectively, such as

$$L = \sum_{j=1}^{n_1} a(R_{1j}), \qquad S = \sum_{j=1}^{n_1} b(R_{1j}),$$

where $a$ and $b$ are score functions and $R_{1j}$ is the rank of $X_{1j}$ in the combined sample. For example, one may choose the Wilcoxon rank sum statistic, with $a(r) = r$, for $L$ and Mood's statistic, with $b(r) = (r - (N+1)/2)^2$ and $N = n_1 + n_2$, for $S$. Without loss of generality, we assume that both $L$ and $S$ are in standardized form. To construct the test statistics for testing (2), let $\lambda_1$ and $\lambda_2$ be the p-values for testing $H_{01}: \delta = 0$ and $H_{02}: \eta = 1$ based on $L$ and $S$, respectively. To obtain an overall p-value for simultaneously testing (2), we use a suitable combining function of the two individual p-values $\lambda_1$ and $\lambda_2$. Pesarin (2001) reviews and summarizes several useful combining functions in detail. Below, we introduce three combining functions, which will be used in the simulation study.
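As a minimal Python sketch of the two example statistics, the code below standardizes the Wilcoxon and Mood statistics of the first sample with their exact null means and variances. It assumes continuous data with no ties. The function name `standardized_wilcoxon_mood` is hypothetical; this is not the authors' SAS/IML code.

```python
import math

def standardized_wilcoxon_mood(x1, x2):
    """Return the standardized Wilcoxon (L) and Mood (S) statistics of sample 1.

    Assumes continuous data, i.e. no ties in the pooled sample.
    """
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    # Rank of each observation in the combined sample (1 = smallest).
    rank_of = {v: i + 1 for i, v in enumerate(sorted(list(x1) + list(x2)))}
    r1 = [rank_of[v] for v in x1]

    # Wilcoxon rank sum: score a(r) = r, with its null mean and variance.
    w = sum(r1)
    w_mean = n1 * (N + 1) / 2
    w_var = n1 * n2 * (N + 1) / 12

    # Mood: score b(r) = (r - (N + 1)/2)^2, with its null mean and variance.
    m = sum((r - (N + 1) / 2) ** 2 for r in r1)
    m_mean = n1 * (N * N - 1) / 12
    m_var = n1 * n2 * (N + 1) * (N * N - 4) / 180

    return (w - w_mean) / math.sqrt(w_var), (m - m_mean) / math.sqrt(m_var)
```

Under the model $F_2(x) = F_1((x-\delta)/\eta)$, a shift $\delta > 0$ tends to make $L$ negative and a scale increase $\eta > 1$ tends to make $S$ negative. With a standard normal approximation, the one-sided partial p-values would then be the lower tails $\Phi(L)$ and $\Phi(S)$.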
(1) The Fisher omnibus combining function (Fisher 1932) is

$$C_F = -2 \sum_{i=1}^{2} \log \lambda_i.$$

We note that if the two partial test statistics are independent and continuous, then $C_F$ asymptotically follows a $\chi^2$ distribution with 4 degrees of freedom under (2).
(2) The Liptak combining function (Liptak 1958) is

$$C_L = \Phi^{-1}(1-\lambda_1) + \Phi^{-1}(1-\lambda_2),$$

where $\Phi^{-1}$ is the inverse of the standard normal distribution function.
(3) The Tippett combining function (Tippett 1931) is

$$C_T = \max_{1 \le i \le 2} (1-\lambda_i).$$

For all three combining functions, large values are evidence against (2).

To complete the simultaneous test for (2), we must obtain the null distribution of the chosen combining function in order to compute an overall p-value. We achieve this by applying the permutation principle. Although the permutation principle yields an exact test, enumerating all permutations is computationally prohibitive, so we take a Monte Carlo approach in the resampling phase, which yields an approximate result.
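The three combining functions and the Monte Carlo permutation step can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation. The partial p-values come from the standard normal approximation to the standardized Wilcoxon and Mood statistics. They use the lower tail, which matches the one-sided sub-alternatives $\delta > 0$ and $\eta > 1$ under $F_2(x) = F_1((x-\delta)/\eta)$. The Monte Carlo p-value uses the common (count + 1)/(B + 1) convention, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution
EPS = 1e-12         # keeps p-values strictly inside (0, 1) for log and inv_cdf

def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)

def partial_pvalues(x1, x2):
    """Lower-tail normal-approximation p-values (lambda_1, lambda_2) of the
    standardized Wilcoxon and Mood statistics of sample 1 (no ties assumed)."""
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    rank_of = {v: i + 1 for i, v in enumerate(sorted(list(x1) + list(x2)))}
    r1 = [rank_of[v] for v in x1]
    L = (sum(r1) - n1 * (N + 1) / 2) / math.sqrt(n1 * n2 * (N + 1) / 12)
    m = sum((r - (N + 1) / 2) ** 2 for r in r1)
    S = (m - n1 * (N * N - 1) / 12) / math.sqrt(n1 * n2 * (N + 1) * (N * N - 4) / 180)
    return _clamp(PHI.cdf(L)), _clamp(PHI.cdf(S))

def fisher(l1, l2):
    """C_F = -2 (log l1 + log l2)."""
    return -2.0 * (math.log(l1) + math.log(l2))

def liptak(l1, l2):
    """C_L = Phi^{-1}(1 - l1) + Phi^{-1}(1 - l2)."""
    return PHI.inv_cdf(1.0 - l1) + PHI.inv_cdf(1.0 - l2)

def tippett(l1, l2):
    """C_T = max(1 - l1, 1 - l2)."""
    return max(1.0 - l1, 1.0 - l2)

def permutation_pvalue(x1, x2, combine, n_perm=5000, seed=None):
    """Monte Carlo permutation p-value of the combined statistic (large = significant)."""
    rng = random.Random(seed)
    observed = combine(*partial_pvalues(x1, x2))
    pooled = list(x1) + list(x2)
    n1 = len(x1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment to the two groups, without replacement
        if combine(*partial_pvalues(pooled[:n1], pooled[n1:])) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

For example, `permutation_pvalue(x1, x2, tippett, n_perm=5000)` gives the Monte Carlo p-value of the Tippett test. Because the combined statistic is a fixed function of the data, its permutation distribution is valid however the partial p-values inside it are computed.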
In the next section, we perform a simulation study to compare the efficiency of the tests based on the combining functions listed above with that of other well-known tests. In the sequel, the Fisher, Liptak and Tippett tests denote the tests using $C_F$, $C_L$ and $C_T$, respectively. We also include the Lepage (1971), Pettitt (1976) and Neuhäuser et al. (2011) tests in the comparison. The Lepage, Pettitt and Neuhäuser test statistics take a quadratic form, which discards the direction of each departure; hence these three tests cannot handle one-sided sub-alternatives, whereas ours can. For this reason, we carry out the simulation study in two parts, under one-sided and two-sided alternatives separately.

A Simulation Study
In this section, we investigate the performance of the tests by estimating their empirical powers through a simulation study. We first compare the empirical powers of our proposed tests under the one-sided alternative, and then compare the Lepage, Pettitt and Neuhäuser tests with ours under the two-sided alternative. We consider six distributions: normal, Cauchy, half-Cauchy, exponential, uniform and double exponential, each with unit variance except the Cauchy and half-Cauchy distributions, which have no finite variance. For our procedure and the Lepage test, we use the Wilcoxon test for location and the Mood and Ansari-Bradley tests for scale. In view of the null hypothesis (2), we let the pair $(\delta, \eta)$ vary from $(0, 1)$ to $(1, 2)$ in increments of 0.2 for each parameter; $(0, 1)$ corresponds to the null hypothesis (2). The sample sizes were $(10, 10)$, $(10, 20)$ and $(20, 10)$, and the nominal significance level is 0.05 in all cases. The simulation was conducted with SAS/IML on a PC. All results in the tables are based on 10,000 Monte Carlo replications; within each replication, the permutation distribution of each test statistic was approximated with 5,000 random permutations. During the revision of this paper, a referee brought to our attention a paper by Marozzi (2014) that discusses the optimal number of permutations for a given Monte Carlo simulation size. In light of that discussion, the 5,000 permutations used here may be considerably more than the recommended number. The results are summarized in the odd-numbered Tables 1 through 11 for the one-sided alternative and in the even-numbered Tables 2 through 12 for the two-sided alternative.
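The data-generating step of this design can be sketched in Python as below. It relies on the fact that if $Y \sim F_1$, then $\delta + \eta Y$ has distribution function $F_1((x-\delta)/\eta)$, which is $F_2$ under the location-scale model. Several choices here are our assumptions: the helper names, the Laplace scale $1/\sqrt{2}$ and the uniform range $(-\sqrt{3}, \sqrt{3})$ (both chosen to give unit variance), and the reading of the $(\delta, \eta)$ grid as six diagonal pairs rather than a full $6 \times 6$ grid. The original study used SAS/IML.

```python
import math
import random

def draw_standardized(dist, rng):
    """One draw from the named distribution, scaled to unit variance where a variance exists."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "cauchy":                 # standard Cauchy (no variance)
        return math.tan(math.pi * (rng.random() - 0.5))
    if dist == "half-cauchy":            # |standard Cauchy| (no variance)
        return abs(math.tan(math.pi * (rng.random() - 0.5)))
    if dist == "exponential":            # Exp(1) has variance 1
        return rng.expovariate(1.0)
    if dist == "uniform":                # U(-sqrt 3, sqrt 3): variance (2 sqrt 3)^2 / 12 = 1
        return rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))
    if dist == "double-exponential":     # Laplace(0, b) has variance 2 b^2, so b = 1/sqrt 2
        return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) / math.sqrt(2.0)
    raise ValueError(f"unknown distribution: {dist!r}")

def draw_two_samples(dist, n1, n2, delta, eta, rng):
    """X_1j ~ F_1 and X_2j = delta + eta * Y_j with Y_j ~ F_1, so that X_2j ~ F_2."""
    x1 = [draw_standardized(dist, rng) for _ in range(n1)]
    x2 = [delta + eta * draw_standardized(dist, rng) for _ in range(n2)]
    return x1, x2

# Assumed reading of the design: six diagonal pairs (0, 1), (0.2, 1.2), ..., (1, 2).
PARAMETER_PAIRS = [(round(0.2 * k, 1), round(1.0 + 0.2 * k, 1)) for k in range(6)]
SAMPLE_SIZES = [(10, 10), (10, 20), (20, 10)]
```

Each generated pair of samples would then be passed to a permutation test of the combined statistic, and the empirical power estimated as the rejection rate at level 0.05 over the replications.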
The procedures based on the Mood and Ansari-Bradley tests for scale yielded almost identical results for both the one-sided and two-sided alternatives. For the one-sided alternative, the Liptak and Fisher tests perform well for the normal, Cauchy, uniform and double exponential distributions, while the Tippett test performs better for the half-Cauchy and exponential distributions. For the two-sided alternative, the Tippett, Liptak and Fisher tests perform well for the normal, Cauchy and double exponential distributions, while the Pettitt test attains higher power than any other test for the exponential and uniform distributions. The Neuhäuser test performs very well for the uniform distribution but fails to attain the nominal significance level for the half-Cauchy and exponential distributions, which are skewed; it should therefore be applied with caution when the underlying distribution is skewed. The Lepage test performs poorly in all cases. In summary, for the one-sided alternative, the Liptak and Fisher tests seem suitable for symmetric distributions, while the Tippett test may be preferable for skewed ones.
In general, the Lepage and Pettitt tests appear somewhat conservative, since their empirical type I error rates are below the nominal significance level of 0.05, whereas those of our tests tend to be slightly above it. However, whether or not a test is conservative does not appear to affect its power. Finally, we note that power may depend on whether the two sample sizes are equal.
Concluding Remarks

The Liptak combining function used in this study is a special case of the more general form

$$C = \sum_{i=1}^{k} \alpha_i \Psi^{-1}\big(G_i(T_i)\big),$$

where $\Psi^{-1}$ is the inverse of an arbitrary distribution function $\Psi$ and the $\alpha_i$'s are arbitrary weights. Here $T_i$ is the summarized test statistic for the $i$th parameter and $G_i$ is the null distribution function of $T_i$, $i = 1, \ldots, k$. Choosing $\Phi$ for $\Psi$ lets one use standard normal distribution theory and tables for the combined statistic. The choice $\alpha_1 = \alpha_2 = 1$ means that we treat the location and scale aspects as equally important in this study. When one aspect is considered more important than the other, this can be reflected by allocating more weight to it. For more on the Liptak combining function, see van Zwet & Oosterhoff (1967).
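A small Python sketch of the weighted form follows. To drive it with partial p-values, it assumes $G_i(T_i) = 1 - \lambda_i$, that is, large $T_i$ is significant and the null distribution is continuous. The function names and the logistic alternative for $\Psi^{-1}$ are illustrative assumptions. With $\Psi = \Phi$ and independent uniform partial p-values, the combined statistic is normal with mean 0 and variance $\sum_i \alpha_i^2$, and the sketch uses that fact for a reference p-value.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def weighted_liptak(pvalues, weights, psi_inv=_STD_NORMAL.inv_cdf):
    """C = sum_i alpha_i * Psi^{-1}(G_i(T_i)), with G_i(T_i) taken as 1 - lambda_i."""
    if len(pvalues) != len(weights):
        raise ValueError("need exactly one weight per partial p-value")
    return sum(a * psi_inv(1.0 - lam) for lam, a in zip(pvalues, weights))

def weighted_liptak_normal_pvalue(pvalues, weights):
    """Reference p-value for Psi = Phi, assuming independent U(0,1) partial p-values,
    so that C ~ N(0, sum_i alpha_i^2) under the null hypothesis."""
    c = weighted_liptak(pvalues, weights)
    sd = math.sqrt(sum(a * a for a in weights))
    return 1.0 - NormalDist(0.0, sd).cdf(c)

def logit(u):
    """Inverse of the standard logistic distribution function (an alternative Psi^{-1})."""
    return math.log(u / (1.0 - u))
```

With weights `[1, 1]` this reproduces $C_L$; weights `[2, 1]` give the location aspect twice the weight of the scale aspect.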
The simultaneous test for location and scale parameters can be regarded as another application of the multi-aspect test, with location and scale as the two aspects, when one considers the null hypothesis $H_0: F_1 = F_2$. Note, however, that the multi-aspect test applies several partial tests to the same parameter, whereas the simultaneous test selects one partial test for each sub-null hypothesis under the location-scale model (1). For example, if $F_1$ and $F_2$ may differ only in location, with the same variance, one should use a multi-aspect test that applies the Wilcoxon and two-sample t-tests simultaneously. On the other hand, if one cannot be sure which parameter may differ, or assumes that both may differ, it is appropriate to apply a simultaneous test using the Wilcoxon and Mood tests for the location and scale parameters, respectively. Therefore, compared with a multi-aspect test, a simultaneous test loses power when $F_1$ and $F_2$ differ only in location, and vice versa.
To obtain overall p-values in the simulation study, we applied the permutation principle twice in nested loops, which demands heavy computation. To alleviate this burden, one may obtain p-values from asymptotic normality based on large-sample approximation theory. For convenience, assume that $L$ and $S$ are in standardized form under (2), that is, each has mean 0 and variance 1. It is well known that the limiting null distributions of $L$ and $S$ are standard normal (Randles & Hogg 1971). We also note that $L$ and $S$ are uncorrelated when $S$ is the Ansari-Bradley or Mood statistic (Duran et al. 1976). Since $L$ and $S$ are jointly asymptotically normal, being uncorrelated makes them asymptotically independent, so $\Phi^{-1}(1-\lambda_1)$ and $\Phi^{-1}(1-\lambda_2)$ are asymptotically independent standard normal variables. One may then conclude that the limiting null distribution of $C_L = \Phi^{-1}(1-\lambda_1) + \Phi^{-1}(1-\lambda_2)$ is normal with mean 0 and variance 2. For the null distribution of $C_T$, see Pesarin & Salmaso (2010) for a detailed discussion. For Fisher's combining function, the same argument shows that the limiting null distribution is $\chi^2$ with 4 degrees of freedom.
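Under these limiting distributions the overall p-values have closed forms, sketched below in Python. Two details go beyond the text. First, the $\chi^2_4$ survival function is written as $e^{-x/2}(1 + x/2)$, its exact closed form for 4 degrees of freedom. Second, the Tippett p-value $1 - t^2$ assumes that $1 - \lambda_1$ and $1 - \lambda_2$ are independent and uniform on $(0, 1)$ in the limit; this is one simple special case, not the general treatment in Pesarin & Salmaso (2010). The function names are hypothetical.

```python
import math
from statistics import NormalDist

def asymptotic_pvalue_liptak(c_l):
    """P(C_L >= c_l) when C_L ~ N(0, 2)."""
    return 1.0 - NormalDist(0.0, math.sqrt(2.0)).cdf(c_l)

def asymptotic_pvalue_fisher(c_f):
    """P(C_F >= c_f) when C_F ~ chi-square with 4 degrees of freedom."""
    if c_f <= 0.0:
        return 1.0
    return math.exp(-c_f / 2.0) * (1.0 + c_f / 2.0)

def asymptotic_pvalue_tippett(c_t):
    """P(C_T >= c_t) for C_T = max(U_1, U_2) with independent U_i ~ U(0, 1),
    whose distribution function is t^2 on [0, 1]."""
    t = min(max(c_t, 0.0), 1.0)
    return 1.0 - t * t
```

These replace the inner permutation loop with a single evaluation per data set, at the cost of relying on large-sample approximations.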
In this study, however, we constructed the test statistics from p-values rather than directly from the linear rank statistics. This requires some additional computation but applies directly to any type of alternative, since each p-value is obtained with respect to its corresponding sub-alternative. Therefore, our procedure can be applied without modification even when the two sub-alternatives are of different types, for example two-sided for the location parameter but one-sided for the scale parameter. To clarify this point, consider an example. Suppose that we apply the Wilcoxon rank sum test for testing $H_{01}: \delta = 0$ against $H_{11}: \delta \neq 0$ and the Mood test for testing $H_{02}: \eta = 1$ against $H_{12}: \eta > 1$. We obtain the two p-values $\lambda_1$ and $\lambda_2$ from the given data using the null distributions of $L$ and $S$. For any combining function $C$ discussed in Section 2, we can then obtain an overall p-value using the permutation principle, completing the procedure for testing (2) against $H_1: \{\delta \neq 0\} \cup \{\eta > 1\}$.

Revista Colombiana de Estadística 38 (2015) 107-121
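The example's partial p-values can be sketched as follows. The sketch assumes standardized statistics and uses the standard normal approximation, whereas the text speaks only of the distributions of $L$ and $S$. Under the model $F_2(x) = F_1((x-\delta)/\eta)$, $\eta > 1$ spreads the second sample out, so the first sample's ranks concentrate in the middle and Mood's statistic for the first sample tends to be small. The one-sided p-value is therefore a lower tail. The function names are hypothetical.

```python
from statistics import NormalDist

_PHI = NormalDist()

def mixed_partial_pvalues(L, S):
    """Partial p-values for H11: delta != 0 (two-sided, Wilcoxon L) and
    H12: eta > 1 (one-sided, Mood S), both standardized statistics of sample 1."""
    lam1 = 2.0 * (1.0 - _PHI.cdf(abs(L)))  # two-sided: a shift in either direction counts
    lam2 = _PHI.cdf(S)                      # one-sided: only small S (sample 1 concentrated) counts
    return lam1, lam2

def liptak(lam1, lam2):
    """C_L = Phi^{-1}(1 - lambda_1) + Phi^{-1}(1 - lambda_2); large values are significant."""
    return _PHI.inv_cdf(1.0 - lam1) + _PHI.inv_cdf(1.0 - lam2)
```

For instance, `liptak(*mixed_partial_pvalues(L, S))` gives the observed combined statistic. The same two steps would be repeated on each permuted data set to build the reference distribution for the overall p-value.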
In this paper, we adopted the permutation framework. Another popular resampling framework is the bootstrap (Efron 1979, Shao & Tu 1995). The difference between the two is that the bootstrap resamples with replacement, whereas the permutation method resamples without replacement from the original pooled sample. This difference can be substantial in some cases (Good 2000). Bootstrap tests are more flexible than permutation tests because they can also be used when the null hypothesis is not a hypothesis of invariance, in particular in certain cases where the exchangeability condition is not satisfied. On the other hand, they are data-dependent without being strictly conditional procedures. In other words, for finite sample sizes, the inferential interpretation of bootstrap tests is not completely clear, because they are neither conditional nor unconditional procedures. For a more detailed discussion, see Pesarin & Salmaso (2010).
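The distinction can be made concrete with Python's standard `random` module, where `Random.sample` draws without replacement and `Random.choices` draws with replacement. The helper names below are hypothetical. Each helper returns one resampled pair of groups drawn from the pooled sample.

```python
import random

def permutation_resample(x1, x2, rng):
    """Permutation: reassign the pooled observations to the groups without replacement,
    so every observation appears exactly once."""
    pooled = list(x1) + list(x2)
    shuffled = rng.sample(pooled, len(pooled))
    return shuffled[:len(x1)], shuffled[len(x1):]

def bootstrap_resample(x1, x2, rng):
    """Bootstrap: draw each group with replacement from the pooled sample,
    so observations may repeat or be left out."""
    pooled = list(x1) + list(x2)
    return rng.choices(pooled, k=len(x1)), rng.choices(pooled, k=len(x2))
```

Because a permutation only relabels the pooled data, the pooled multiset is unchanged. A bootstrap resample generally does not preserve it.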

Table 1: Power estimates for the normal distribution (one-sided).

Table 2: Power estimates for the normal distribution (two-sided).

Table 9: Power estimates for the uniform distribution (one-sided).

Table 10: Power estimates for the uniform distribution (two-sided).

Table 11: Power estimates for the double exponential distribution (one-sided).

Table 12: Power estimates for the double exponential distribution (two-sided).