Spatial-Sign based High-Dimensional Location Test

In this paper, we consider the problem of testing the mean vector in high-dimensional settings. We propose a new robust, scalar-invariant test based on spatial signs. The proposed test statistic is asymptotically normal under elliptical distributions. Simulation studies show that our test is robust and efficient across a wide range of distributions.


Introduction
Assume $X_1, \ldots, X_n$ is an independent sample from a $p$-variate distribution $F(x - \theta)$ located at the $p$-variate center $\theta$. We consider the one-sample testing problem $H_0: \theta = 0$ versus $H_1: \theta \neq 0$.
One typical test statistic is Hotelling's $T^2$. However, it cannot be applied when $p > n - 1$ because the sample covariance matrix is singular. Recently, many efforts have been devoted to solving this problem, such as Bai and Saranadasa (1996), Srivastava and Du (2008), Srivastava (2009), Chen and Qin (2010) and Park and Ayyala (2013). They established the asymptotic normality of their test statistics under the assumption of a diverging factor model (Bai and Saranadasa 1996). Although this data structure generates a rich collection of $X$, it is not easily met in practice. Moreover, the multivariate $t$ distribution and mixtures of multivariate normal distributions do not satisfy the diverging factor model. This motivates us to construct a robust test procedure.
Multivariate signs and ranks are often used to construct robust test statistics in the multivariate setting. In particular, multivariate sign tests enjoy many desirable properties. First, these test statistics are distribution-free under mild assumptions, or asymptotically so. Second, they require neither stringent parametric assumptions nor any moment conditions. Third, they have high asymptotic relative efficiency with respect to the classic Hotelling's $T^2$ test, especially under heavy-tailed distributions. However, the classic spatial-sign test also cannot work in high-dimensional settings because the scatter matrix cannot be estimated. Recently, without estimating the scatter matrix, Wang, Peng and Li (2014) proposed a high-dimensional nonparametric test based on the direction of $X_i$, i.e. $X_i/\|X_i\|$. Although it is workable and robust in high-dimensional settings, it discards all information about the scales of the different variables and is therefore not scalar-invariant. In practice, different components may have completely different physical or biological readings, so their scales are unlikely to be identical. Srivastava (2009) and Park and Ayyala (2013) proposed two scalar-invariant tests under different assumptions on the correlation matrix. As noted above, they are not robust under heavy-tailed distributions. In this paper, we propose a new robust test based on spatial signs. We show that it is scalar-invariant and asymptotically normal under some mild conditions. The asymptotic relative efficiency of our test with respect to the test of Park and Ayyala (2013) is the same as that of the classic spatial-sign test with respect to Hotelling's $T^2$ test. Simulation comparisons show that our procedure has good size and power for a wide range of dimensions, sample sizes and distributions. All proofs are given in the appendix.
Robust High-Dimensional Test

The proposed test statistic
Let $\{X_1, \ldots, X_n\}$ be an independent and identically distributed (i.i.d.) random sample from a $p$-variate elliptical distribution with density function $\det(\Sigma)^{-1/2} g(\|\Sigma^{-1/2}(x - \theta)\|)$, where $\theta$ is the symmetry center and $\Sigma$ is the positive definite symmetric $p \times p$ scatter matrix. The spatial sign function is defined as $U(x) = \|x\|^{-1} x I(x \neq 0)$. In the traditional fixed-$p$ setting, the so-called "inner centering and inner standardization" sign-based statistic $Q_n^2$ is usually used (cf. Section 6 of Oja 2010). It is built from $\bar{U} = \frac{1}{n}\sum_{i=1}^n \hat{U}_i$, where $\hat{U}_i = U(S^{-1/2} X_i)$ and $S$ is Tyler's scatter matrix. $Q_n^2$ is affine-invariant and can be regarded as a nonparametric counterpart of Hotelling's $T^2$ test statistic that uses the spatial signs instead of the original observations $X_i$. However, when $p > n$, $Q_n^2$ is not defined because the matrix $S^{-1/2}$ is not available in high-dimensional settings. Motivated by Hettmansperger and Randles (2002), we suggest finding a pair of diagonal matrix $D$ and vector $\theta$ for each sample that simultaneously satisfy
$$\frac{1}{n}\sum_{j=1}^n U(\epsilon_j) = 0 \quad \text{and} \quad \frac{p}{n}\,\mathrm{diag}\Big\{\sum_{j=1}^n U(\epsilon_j) U(\epsilon_j)^T\Big\} = I_p, \qquad (2)$$
where $\epsilon_j = D^{-1/2}(X_j - \theta)$. The pair $(D, \theta)$ can be viewed as a simplified version of the Hettmansperger-Randles (HR) estimator that does not consider the off-diagonal elements of $S$. We can adapt the recursive algorithm of Hettmansperger and Randles (2002) to solve (2), iterating its three steps until convergence. The resulting estimators of the location and the diagonal matrix are denoted by $\hat{\theta}$ and $\hat{D}$. The sample mean and sample variances may be used as initial estimators.
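The recursive scheme can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the second estimating equation has the diagonal form $\frac{p}{n}\mathrm{diag}\{\sum_j U(\epsilon_j)U(\epsilon_j)^T\} = I_p$, and that the three steps are (i) standardize the residuals, (ii) update $\theta$, (iii) update $D$, in the style of Hettmansperger and Randles (2002). The function names, the trace normalization of $D$, the tolerance and the iteration cap are choices made for this sketch only.

```python
import numpy as np


def spatial_sign(x):
    """Row-wise spatial sign U(x) = x / ||x||, with U(0) = 0."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    return np.where(norms > 0, x / safe, 0.0)


def diagonal_hr_estimator(X, max_iter=1000, tol=1e-10):
    """Fixed-point sketch for a location theta and a diagonal scatter D.

    Assumed steps (not taken verbatim from the paper):
      1. eps_j = D^{-1/2} (X_j - theta)
      2. theta <- theta + D^{1/2} sum_j U(eps_j) / sum_j ||eps_j||^{-1}
      3. D     <- p * D^{1/2} diag{ mean_j U(eps_j) U(eps_j)^T } D^{1/2}
    Returns theta (length p) and the diagonal of D (length p).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    theta = X.mean(axis=0)        # initial location: sample mean
    d = X.var(axis=0, ddof=1)     # initial diagonal: sample variances
    # U is unchanged when D is multiplied by a positive constant, so the
    # scale of D is not identified; fix it here by setting trace(D) = p.
    d = d * p / d.sum()
    for _ in range(max_iter):
        eps = (X - theta) / np.sqrt(d)                      # step 1
        norms = np.linalg.norm(eps, axis=1)
        U = spatial_sign(eps)
        inv_norm_sum = np.sum(1.0 / norms[norms > 0])
        theta_new = theta + np.sqrt(d) * U.sum(axis=0) / inv_norm_sum  # step 2
        d_new = p * d * np.mean(U ** 2, axis=0)             # step 3 (diagonal only)
        d_new = d_new * p / d_new.sum()
        step = max(np.max(np.abs(theta_new - theta)), np.max(np.abs(d_new - d)))
        theta, d = theta_new, d_new
        if step < tol:
            break
    return theta, d


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Components with very different scales, shifted by a common location.
    X = rng.standard_normal((200, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.5
    theta_hat, d_hat = diagonal_hr_estimator(X)
    print("theta_hat:", theta_hat)
    print("diag(D_hat) relative to its first entry:", d_hat / d_hat[0])
```

Because only the diagonal of $D$ is updated, each iteration costs $O(np)$ operations and never requires inverting a $p \times p$ matrix, which is what keeps the scheme usable when $p > n$.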
Then we define the test statistic $R_n$, where $\hat{D}_{ij}$ is the diagonal matrix estimator computed from the leave-two-out sample $\{X_k\}_{k \neq i,j}$.

Asymptotic results
We need the following conditions for the asymptotic analysis. Condition (C1) is the same as condition (4). To obtain consistency of the diagonal matrix estimator, the dimension must diverge faster than the sample size.
The following theorem establishes the asymptotic null distribution of $R_n$.

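Theorem 1 Under Conditions (C1)-(C3) and $H_0$, as $(p, n) \to \infty$, $R_n/\sigma_n \to N(0, 1)$ in distribution, where $\sigma_n^2 = \frac{2}{n(n-1)p^2}\mathrm{tr}(R^2)$.
We estimate the trace term in $\sigma_n^2$ by $\widehat{\mathrm{tr}(R^2)}$, which is built from $(\hat{\theta}_{ij}, \hat{D}_{ij})$, the spatial median and diagonal matrix estimators computed from the leave-two-out sample $\{X_k\}_{k \neq i,j}$. By Proposition 2 in Feng et al. (2014), $\widehat{\mathrm{tr}(R^2)}/\mathrm{tr}(R^2) \to 1$ as $p, n \to \infty$. Consequently, a ratio-consistent estimator of $\sigma_n^2$ under $H_0$ is $\hat{\sigma}_n^2 = \frac{2}{n(n-1)p^2}\widehat{\mathrm{tr}(R^2)}$. We then reject the null hypothesis at significance level $\alpha$ if $R_n/\hat{\sigma}_n > z_\alpha$, where $z_\alpha$ is the upper $\alpha$ quantile of $N(0, 1)$.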
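The rejection rule above amounts to a one-sided normal test once $R_n$ and the trace estimate are available. The sketch below only illustrates that final step. It assumes that the value of $R_n$ and the estimate of $\mathrm{tr}(R^2)$ have been computed elsewhere, and the function and argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ss_test_decision(r_n, tr_r2_hat, n, p, alpha=0.05):
    """Apply the rule: reject H0 when R_n / sigma_hat_n > z_alpha.

    r_n       : value of the test statistic R_n (assumed precomputed)
    tr_r2_hat : estimate of tr(R^2) (assumed precomputed)
    Returns the standardized statistic, the critical value z_alpha,
    and the reject / do-not-reject decision.
    """
    # sigma_hat_n^2 = 2 * tr_r2_hat / (n (n - 1) p^2)
    sigma_hat = sqrt(2.0 * tr_r2_hat / (n * (n - 1) * p ** 2))
    z = r_n / sigma_hat
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha quantile of N(0, 1)
    return z, z_alpha, z > z_alpha


if __name__ == "__main__":
    # Purely illustrative inputs, not values from the paper.
    z, z_alpha, reject = ss_test_decision(r_n=0.01, tr_r2_hat=332.0, n=50, p=200)
    print(f"z = {z:.3f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```

The test is one-sided because the rule rejects only for large values of $R_n$, so the matching p-value would be `1 - NormalDist().cdf(z)`.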
Next, we consider the asymptotic distribution of $R_n$ under the alternative hypothesis.
Theorems 1 and 2 allow us to compare the proposed test with existing work in terms of limiting efficiency. To obtain an explicit expression for comparison, we assume that $\lambda_{\max}(p^{-1}R) = o(n^{-1})$. Under this assumption, the asymptotic power of the proposed test under the local alternative takes an explicit form. In comparison, Park and Ayyala (2013) derived the asymptotic power of their test (abbreviated as PA hereafter) in terms of $\tilde{D}$ and $\tilde{R}$, the variance and correlation matrices of $X_i$, respectively. The asymptotic relative efficiency (ARE) of $R_n$ with respect to the PA test is then obtained by using $\mathrm{tr}(\tilde{R}^2) = \mathrm{tr}(R^2)$ and $\tilde{D} = p^{-1}E(\|\epsilon\|^2)D$. Similar to the proof of Theorem 1, under Condition (C3), we can show that $c_0 = E(\|\epsilon\|^{-1})(1 + o(1))$.
For the comparison with the test of Wang, Peng and Li (2014) (abbreviated as WPL), we have $\mathrm{tr}(B^2) = p^{-2}\delta^2\,\mathrm{tr}(R^2)$. Thus, our spatial-sign (SS) test has the same power as the WPL test in this case. However, their test is not scalar-invariant. To appreciate the effect of scalar-invariance, we consider the following representative case. Let $\Sigma$ be a diagonal matrix whose first $p/2$ diagonal elements are all $\tau_1^2$ and whose remaining diagonal elements are all $\tau_2^2$. The mean shifts only on the first half of the components, i.e. $\mu_i = \zeta$ for $i = 1, \ldots, p/2$, and the other components are zero. In this setting it is difficult to calculate the explicit form of $\beta_{WPL}$ for arbitrary $\tau_1^2, \tau_2^2$, so we only consider two special cases. If $\tau_1^2 \gg \tau_2^2$, ARE($R_n$, WPL) has a positive lower bound of $1/\sqrt{2}$. However, if $\tau_2^2 \gg \tau_1^2$, then ARE($R_n$, WPL) $= \tau_2^2/(\sqrt{2}\tau_1^2)$, which could be very large. This property shows the necessity of a scalar-invariant test.
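To give a feel for the magnitudes in the two special cases, the short snippet below evaluates the stated expression $\tau_2^2/(\sqrt{2}\tau_1^2)$ for a few variance ratios and prints it next to the stated lower bound $1/\sqrt{2}$. The function name and the chosen ratios are illustrative assumptions. The formula is only meaningful in the regime $\tau_2^2 \gg \tau_1^2$.

```python
from math import sqrt

# Stated lower bound of ARE(R_n, WPL) in the regime tau_1^2 >> tau_2^2.
ARE_LOWER_BOUND = 1.0 / sqrt(2.0)


def are_when_tau2_dominates(tau1_sq, tau2_sq):
    """Evaluate tau_2^2 / (sqrt(2) * tau_1^2), the stated ARE for tau_2^2 >> tau_1^2."""
    return tau2_sq / (sqrt(2.0) * tau1_sq)


if __name__ == "__main__":
    print(f"lower bound when tau_1^2 >> tau_2^2: {ARE_LOWER_BOUND:.4f}")
    for ratio in (10.0, 100.0, 1000.0):
        value = are_when_tau2_dominates(1.0, ratio)
        print(f"tau_2^2 / tau_1^2 = {ratio:7.1f}  ->  ARE = {value:.2f}")
```

The efficiency gain grows linearly in the variance ratio, so the noisier the components that carry no signal, the more is gained by standardizing each component before taking signs.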

Simulation
Here we report a simulation study designed to evaluate the performance of the proposed SS test. All simulation results are based on 2,500 replications. The variety of multivariate distributions and parameters is too large to allow a comprehensive, all-encompassing comparison, so we choose certain representative examples for illustration. The following scenarios are considered first.
(II) Multivariate normal distribution with different component variances.
Here we consider the correlation matrix $R = (0.5^{|i-j|})_{1 \le i,j \le p}$. Two sample sizes, $n = 50, 100$, and three dimensions, $p = 200, 400, 1000$, are considered. For power comparison, under $H_1$, we consider two patterns of allocation for $\mu$. One is the dense case, in which the first 50% of the components of $\mu$ are zero. The other is the sparse case, in which the first 95% of the components of $\mu$ are zero. To make the power comparable among the configurations of $H_1$, we set $\eta := \|\mu\|^2/\mathrm{tr}^2(\Sigma) = 0.03$ throughout the simulation. The nonzero components of $\mu$ are all equal.
Next, we will show that $J_{n1} = o_p(\sigma_n)$.