Improving power of multivariate combination-based permutation tests

Statistics and Computing

Abstract

Developing powerful hypothesis testing procedures for comparing multivariate populations is a common and relevant topic from both the methodological and the practical point of view. In this connection, the NonParametric Combination (NPC) permutation methodology provides a flexible and effective framework for many multivariate testing problems (Pesarin and Salmaso in Permutation tests for complex data: theory, applications and software, 2010a). The goal of this paper is to propose some specific procedures aimed at improving the power of NPC tests in the context of the additive linear model. An extensive simulation study shows that the improved-in-power NPC tests are good alternatives to traditional multivariate tests such as Hotelling's \(T^{2}\) and multivariate rank-based tests, especially in the case of heavy-tailed distributions. Moreover, the NPC methodology offers several advantages: it provides solutions that are robust with respect to the true underlying random error distribution, and it is not affected by the loss of degrees of freedom that occurs when the number of variables grows while the number of observations is kept fixed. Indeed, unlike traditional methods, when the number of informative variables increases its power monotonically increases as well (the so-called finite-sample consistency property of NPC tests, Pesarin and Salmaso in J. Nonparametr. Stat. 22(5):669–684, 2010b).

References

  • Bersimis, S., Psarakis, S., Panaretos, J.: Multivariate statistical process control charts: an overview. Qual. Reliab. Eng. Int. 23, 517–543 (2007)

  • Blair, R.C., Higgins, J.J., Karniski, W., Kromrey, J.D.: A study of multivariate permutation tests which may replace Hotelling’s \(T^{2}\) test in prescribed circumstances. Multivar. Behav. Res. 29, 141–163 (1994)

  • Brombin, C., Salmaso, L.: Multi-aspect permutation tests in shape analysis with small sample size. Comput. Stat. Data Anal. 53(12), 3921–3931 (2009)

  • Cliff, N.: A test that combines frequencies and quantitative information. J. Mod. Appl. Stat. Methods 10(1), 2–7 (2011)

  • Cox, D.R., Hinkley, D.V.: Theoretical Statistics. Chapman & Hall, London (1974)

  • Folks, J.L.: Combinations of independent tests. In: Krishnaiah, P.R., Sen, P.K. (eds.) Handbook of Statistics, vol. 4, pp. 113–121. North-Holland, Amsterdam (1984)

  • Goggin, M.L.: The “Too few cases/too many variables” problem in Implementation Research. West. Polit. Q. 39, 328–347 (1986)

  • Gupta, S.D., Perlman, M.D.: Power of the noncentral F-test: effect of additional variates on Hotelling’s \(T^{2}\)-test. J. Am. Stat. Assoc. 69(345), 174–180 (1974)

  • Hoeffding, W.: The large-sample power of tests based on permutations of observations. Ann. Math. Stat. 23, 169–192 (1952)

  • Hotelling, H.: Multivariate quality control—illustrated by the air testing of sample bombsights. In: Eisenhart, C., Hastay, M.W., Wallis, W.A. (eds.) Techniques of Statistical Analysis, pp. 111–184. McGraw-Hill, New York (1947)

  • Oja, H., Randles, R.H.: Multivariate nonparametric tests. Stat. Sci. 19(4), 598–605 (2004)

  • Pesarin, F., Salmaso, L.: Permutation Tests for Complex Data: Theory, Applications and Software. Wiley, Chichester (2010a)

  • Pesarin, F., Salmaso, L.: Finite-sample consistency of combination-based permutation tests with application to repeated measures designs. J. Nonparametr. Stat. 22(5), 669–684 (2010b)

  • Puri, M.L., Sen, P.K.: Nonparametric Methods in Multivariate Analysis. Wiley, New York (1971)

  • Salmaso, L., Solari, A.: Multiple aspect testing for case-control designs. Metrika 12, 1–10 (2005)

  • Tsai, M.-T., Sen, P.K.: On inadmissibility of Hotelling \(T^{2}\)-tests for restricted alternatives. J. Multivar. Anal. 89, 87–96 (2004)

Author information

Corresponding author

Correspondence to Luigi Salmaso.

Appendix: Extension of finite-sample consistency to non-associative statistics

In order to extend finite-sample consistency to non-associative statistics, let us briefly introduce the notion of conditional (permutation) unbiasedness for any statistic of the form \(T^{*} (\boldsymbol{\delta})=S(\mathbf{Y}_{1}^{*}(\boldsymbol{\delta}))- S(\mathbf{Y}_{2}^{*})\). To this end, and with clear meaning of the symbols, let us observe that:

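  • \(T^{o}(0)=S(\mathbf{Z}_{1})-S(\mathbf{Z}_{2})\), i.e. the null observed value of statistic T.

  • \(T^{o}(\boldsymbol{\delta})=S(\mathbf{Z}_{1}+\boldsymbol{\delta})-S(\mathbf{Z}_{2})=S(\mathbf{Z}_{1})+D_{S}(\mathbf{Z}_{1},\boldsymbol{\delta})-S(\mathbf{Z}_{2})=T^{o}(0)+D_{S}(\mathbf{Z}_{1},\boldsymbol{\delta})\), where \(D_{S}(\mathbf{Z}_{1},\boldsymbol{\delta})\geq 0\).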

  • \(T^{*} (0)=S(\mathbf{Z}_{1}^{*} )- S(\mathbf{Z}_{2}^{*})\), i.e. the value of T in the permutation \(\mathbf{u}^{*} = u_{1}^{*},\dots,u_{n}^{*}\).

  • \(T^{*} (\boldsymbol{\delta})=S(\mathbf{Z}_{1}^{*} +\boldsymbol{\delta}^{*} )- S(\mathbf{Z}_{2}^{*} )=T^{*} (0)+D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*} )- D_{S}(\mathbf{Z}_{2}^{*})\).

  • \(D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*} )\geq D_{S}(\mathbf{Z}_{2}^{*},0)=0=D_{S}(\mathbf{Z}_{2},0)\), because effects \(\boldsymbol{\delta}_{2i}^{*}\) coming from first group are non-negative.

  • \(D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*} )\leq D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta})\) point-wise, because in \(D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*})\) there are non-negative effects assigned to units coming from group 2; e.g., suppose \(n_{1}=3\), \(n_{2}=3\), and \(\mathbf{u}^{*}=(3,5,4,1,2,6)\), then \((\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*} )=[(Z_{13},\boldsymbol{\delta}_{13}), (Z_{22},0),(Z_{21},0)]\), and so

    $$\bigl(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}\bigr)= \bigl[(Z_{13},\boldsymbol{\delta}_{13}),(Z_{22}, \boldsymbol{\delta}_{11}),(Z_{21},\boldsymbol{ \delta}_{12})\bigr], $$

    or

    $$\bigl(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}_{1} \bigr)=\bigl[(Z_{13},\boldsymbol{\delta}_{13}),(Z_{22}, \boldsymbol{\delta}_{12}),(Z_{21},\boldsymbol{ \delta}_{11})\bigr]; $$

    it is to be emphasized that \(Y(u_{i}^{*})=Z(u_{i}^{*})+\boldsymbol{\delta}(u_{i}^{*})\) if \(u_{i}^{*} \leq n_{1}\), that is units coming from first group maintain their effects, whereas the rest of effects are randomly assigned to units coming from second group.

  • \(D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta})\mathop{ =} \limits^{d} D_{S}(\mathbf{Z}_{1},\boldsymbol{\delta})\), because \(\mathrm{Pr}\{\mathbf{Z}_{1}^{*} |\mathsf{X}_{/\mathbf{Y}(0)} \}= \mathrm{Pr}\{\mathbf{Z}_{1} |\mathsf{X}_{/\mathbf{Y}(0)}\}\) (see Pesarin and Salmaso 2010a).
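The bookkeeping in the example with \(n_{1}=n_{2}=3\) can be traced with a short Python sketch. It only reproduces which permuted units land in the first group and which of them keep an effect; the helper names `label` and `permuted_first_group` are hypothetical and are not part of the original paper.

```python
# Illustrative sketch of the example n1 = n2 = 3, u* = (3, 5, 4, 1, 2, 6).
# Helper names are hypothetical; only the labelling logic follows the text.
n1, n2 = 3, 3
u_star = [3, 5, 4, 1, 2, 6]


def label(u):
    """Return the (group, within-group index) of pooled unit u (1-based)."""
    return (1, u) if u <= n1 else (2, u - n1)


def permuted_first_group(u_star):
    """List (error label, effect label) for the units permuted into group 1.

    Units coming from group 1 keep their own effect; units coming from
    group 2 carry no effect, written as "0".
    """
    out = []
    for u in u_star[:n1]:
        g, i = label(u)
        effect = f"delta_{g}{i}" if g == 1 else "0"
        out.append((f"Z_{g}{i}", effect))
    return out


first_group = permuted_first_group(u_star)
print(first_group)
```

Only the unit with original label 3 carries its effect into the permuted first group, which is why \(D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*})\) cannot exceed the value obtained when every unit in that group receives a non-negative effect.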

Thus \(D_{S}(\mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*} )- D_{S}(\mathbf{Z}_{2}^{*} )\leq D_{S}(\mathbf{Z}_{1},\boldsymbol{\delta})\) in permutation distribution and so

$$\begin{aligned} \lambda_{T} \bigl(\mathbf{Y}(\boldsymbol{\delta})\bigr) & = \Pr \bigl\{ T\bigl(\mathbf{Y}^{*} (\boldsymbol{\delta})\bigr)\geq T\bigl( \mathbf{Y}(\boldsymbol{\delta})\bigr)|\mathsf{X}_{/\mathbf{Y}(\boldsymbol{\delta})}\bigr\} \\ & = \Pr \bigl\{ T^{*} (0)+D_{S}\bigl( \mathbf{Z}_{1}^{*},\boldsymbol{\delta}^{*} \bigr)- D_{S}\bigl(\mathbf{Z}_{2}^{*} \bigr)\\ &\quad {}- D_{S}(\mathbf{Z}_{1},\boldsymbol{\delta})\geq T^{o} (0)|\mathsf{X}_{/\mathbf{Y}(0)}\bigr\} \\ & \leq \Pr \bigl\{ T^{*} (0)\geq T^{o} (0)| \mathsf{X}_{/\mathbf{Y}(0)} \bigr\} = \lambda_{T}\bigl(\mathbf{Y}(0)\bigr), \end{aligned}$$

which establishes the dominance in permutation distribution of \(\lambda_{T}(\mathbf{Y}(\boldsymbol{\delta}))\) with respect to \(\lambda_{T}(\mathbf{Y}(0))\), uniformly for all data sets \(\mathbf{Y}\in\mathcal{Y}^{n}\), for all underlying distributions P, and for all associative and non-associative statistics \(T=S(\mathbf{Y}_{1})-S(\mathbf{Y}_{2})\).
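As a numerical illustration of this dominance, the following Python sketch computes the exact permutation p-value of \(T=S(\mathbf{Y}_{1})-S(\mathbf{Y}_{2})\) with S taken to be the sample median, a non-associative statistic. The function name `perm_p_value`, the integer-valued errors, the group sizes and the constant shift `delta` are assumptions made for the example; they are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def perm_p_value(y1, y2, stat=np.median):
    """Exact permutation p-value of T = S(Y1) - S(Y2); large T is significant."""
    pooled = np.concatenate([y1, y2])
    n1, n = len(y1), len(pooled)
    t_obs = stat(y1) - stat(y2)
    count = 0
    total = 0
    # Enumerate every way of choosing which pooled units form group 1.
    for idx in combinations(range(n), n1):
        mask = np.zeros(n, dtype=bool)
        mask[list(idx)] = True
        t_star = stat(pooled[mask]) - stat(pooled[~mask])
        count += int(t_star >= t_obs)
        total += 1
    return count / total


rng = np.random.default_rng(0)
# Hypothetical integer-valued errors keep all comparisons exact.
z1 = rng.integers(-20, 21, size=5).astype(float)
z2 = rng.integers(-20, 21, size=5).astype(float)
delta = 7.0  # assumed constant non-negative effect on group 1

p_null = perm_p_value(z1, z2)
p_shift = perm_p_value(z1 + delta, z2)
print(p_null, p_shift)
```

With a constant non-negative shift and a monotone S, every permutation that reaches the observed value under the alternative also reaches it under the null, so `p_shift` cannot exceed `p_null` for the same errors.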

These results allow us to prove the following:

Theorem

Suppose that in a two-sample problem there are \(p\geq 1\) non-homoscedastic variables \(\mathbf{Y}=(Y_{1},\dots,Y_{p})\), the observed data set is \(\mathbf{Y}(\boldsymbol{\delta})=(\delta_{k}+\sigma_{k}Z_{i1k}, i=1,\dots,n_{1}, \sigma_{k}Z_{i2k}, i=1,\dots,n_{2}; k=1,\dots,p)\) and the hypotheses are

$$\begin{aligned} & H_{0}: \Bigl[ \mathbf{Y}_{1}\mathop{ =} \limits ^{d} \mathbf{Y}_{2} \Bigr] = [ \boldsymbol{\delta} = \mathbf{0} ]\quad \mathrm{against}\\ & H_{1}: \Bigl[ \textbf{Y}_{1}\mathop{ >} \limits^{d} \textbf{Y}_{2} \Bigr] = [ \boldsymbol{\delta} > \mathbf{0} ], \end{aligned}$$

where \(\boldsymbol{\delta}=(\delta_{1},\dots,\delta_{p})'\). For the testing purpose consider the statistic

$$T_{Md}^{*}(\boldsymbol{\delta})=1/p \sum _{ k \leq p}\bigl[\tilde{Y}_{1k}^{*}( \delta_{k})- \tilde{Y}_{2k}^{*} \bigr]/S_{k}, $$

where \(\tilde{Y}_{ji}^{*}(\boldsymbol{\delta})=\mathsf{M}d[Y_{ijk}^{*}(\boldsymbol{\delta})/S_{Mk}, k=1,\dots,p]\), i=1,…,n j , j=1,2, is the median vector of p scale-free variables specific to i-th subject, and \(S_{Mk} = \mathit{MAD}_{k} =\mathsf{M}d[|Y_{\mathit{ijk}} - \tilde{Y}_{k}|, i=1,\dots,n_{j}, j=1,2]\) is the median of absolute deviations from the median specific to the variable Y k .

In this setting, the test based on \(T_{Md}^{*} (\delta)\) is conditionally and unconditionally finite-sample consistent as long as p diverges and \(\mathsf{M}d(\mathbf{Y}_{1}(\boldsymbol{\delta}))>0\), without requiring the existence of any positive moment for the p variables.

Proof

For non-associative statistics, the uniform stochastic ordering of the significance level functions with respect to \(\boldsymbol{\delta}\) and \(\mathbf{Y}\) applies, that is, for \(\boldsymbol{\delta}'>\boldsymbol{\delta}\)

$$\mathrm{Pr}\bigl\{ \lambda_{T} [\mathbf{Y}( \boldsymbol{\delta} ')]\leq \alpha \bigr\} \geq \mathrm{Pr}\bigl\{ \lambda_{T} [\mathbf{Y}( \boldsymbol{\delta} )]\leq \alpha\bigr\} , $$

hence, with reference to the finite-sample consistency of the second order combined test using the medians

$$\begin{aligned} T''^{obs} & = 1/p \sum _{ k \leq p}\bigl[\tilde{Y}_{1k}( \delta_{k})- \tilde{Y}_{2k}(0)\bigr]/S_{k}\\ & = 1/p \sum _{ k \leq p}\bigl[\tilde{Y}_{1k}(0)+ \delta_{k} - \tilde{Y}_{2k}(0)\bigr]/S_{k}\\ & = 1/p \sum_{ k \leq p} \bigl[\tilde{Y}_{1k}(0)- \tilde{Y}_{2k}(0)\bigr]/S_{k} +1/p \sum _{ k \leq p} \delta_{k} /S_{k}. \end{aligned}$$

It should be noted that the quantity \(1/p \sum_{ k \leq p} [\tilde{Y}_{1k}(0)- \tilde{Y}_{2k}(0)]/S_{k}\) is nothing other than the arithmetic mean of p sample differences, which are all measurable, given that all p variables involved are non-degenerate by assumption (i.e. \(S_{k}>0\), \(k=1,\dots,p\)), and which, provided that \(\min(n_{1},n_{2})\) is not too small, are all finite. For instance, for the Pareto distribution with parameter \(\gamma\geq[\min(n_{1},n_{2})/2]\), where \([\cdot]\) denotes the integer part of \((\cdot)\), the first moment \(E_{Y}(Y,\gamma)\) is finite; note that \(E_{Y}(Y,\gamma)\) does not exist for \(\gamma\leq 1\). Thus, by the law of large numbers for sequences of dependent variables, as p diverges this quantity converges weakly to a constant, not necessarily null. The induced standardized global noncentrality \(1/p \sum_{k\leq p}\delta_{k}/S_{k}\), which is itself a mean of non-negative and measurable quantities, converges, if it converges at all, to a positive quantity, but it may also be left free to diverge. □
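The behaviour described in the theorem can be explored with a small Monte Carlo sketch in Python. It is only an illustration under stated assumptions: the statistic is taken in its variable-wise form, the mean over k of the difference of group medians divided by the pooled MAD of variable k; errors are independent standard Cauchy (no finite moments); and the sample sizes, the effect `delta`, the number of simulations and the use of random rather than exhaustive permutations are all hypothetical choices. The function names `t_md`, `perm_p_value` and `estimated_power` are not from the paper.

```python
import numpy as np


def t_md(y, n1):
    """Mean over variables of MAD-scaled differences of group medians.

    y has shape (n, p); its first n1 rows are treated as group 1.
    """
    pooled_median = np.median(y, axis=0)
    # The pooled MAD does not change when rows are permuted.
    mad = np.median(np.abs(y - pooled_median), axis=0)
    diff = np.median(y[:n1], axis=0) - np.median(y[n1:], axis=0)
    return np.mean(diff / mad)


def perm_p_value(y, n1, n_perm, rng):
    """Monte Carlo permutation p-value; whole rows (subjects) are permuted."""
    t_obs = t_md(y, n1)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        hits += int(t_md(y[rng.permutation(len(y))], n1) >= t_obs)
    return hits / (n_perm + 1)


def estimated_power(p, delta=1.0, n1=8, n2=8, alpha=0.05,
                    n_sim=60, n_perm=99, seed=0):
    """Rejection rate at level alpha with p Cauchy variables shifted by delta."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        y = rng.standard_cauchy(size=(n1 + n2, p))
        y[:n1] += delta  # the same effect on every variable of group 1
        rejections += int(perm_p_value(y, n1, n_perm, rng) <= alpha)
    return rejections / n_sim


power_by_p = {p: estimated_power(p) for p in (1, 20)}
print(power_by_p)
```

With the number of observations held fixed, the estimated rejection rate for p = 20 should be clearly larger than for p = 1, in line with the finite-sample consistency argument. The exact figures depend on the seed and on the small number of simulations used here.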

Corain, L., Salmaso, L. Improving power of multivariate combination-based permutation tests. Stat Comput 25, 203–214 (2015). https://doi.org/10.1007/s11222-013-9426-0

  • DOI: https://doi.org/10.1007/s11222-013-9426-0
