Testing for independence: Saddlepoint approximation to associated permutation distributions

: One of the most popular class of tests for independence be- tween two random variables is the general class of rank statistics which are invariant under permutations. This class contains Spearman’s coeﬃcient of rank correlation statistic, Fisher-Yates statistic, weighted Mann statistic and others. Under the null hypothesis of independence these test statis- tics have a permutation distribution that is usually approximated by using asymptotic normal theory to determine p-values for these tests. In this note we suggest using a saddlepoint approach that is almost exact and needs no simulations in order to calculate the p-value for tests in this class.


Introduction
Suppose we observe N independent pairs of random variables (X 1 , Y 1 ), (X 2 , Y 2 ), . . . , (X N , Y N ) and we wish to test the null hypothesis H 0 that the two variables X i and Y i are independent for each i. Rearrange all N pairs of observations according to the magnitude of their first coordinate into the sequence (X d1 , Y d1 ), (X d2 , Y d2 ), . . . , (X dN , Y dN ) in such a way that X d1 < X d2 < · · · < X dN . Then put R i equal to the rank of Y di among the observations Y d1 , Y d2 , . . . , Y dN . Under the assumption of independence and assuming no ties, all N ! orderings (R 1 , . . . , R N ) are equally likely with probability 1/N !. If we are willing to assume that the two variables have a positive association, the {R i } should reveal an upward trend, with large values tending to occur on the right of the sequence and low values on the left. An appropriate test statistic that reflects this idea is with small values of D indicating significance. The statistic D is related to the well known Spearman's coefficient of rank correlation statistic, S p , with the relation S p = 1 − 6D/N (N 2 − 1), see Gibbons and Chakraborti (2003). It is also related to the weighted Mann statistic, D ′ , by D ′ = 1 6 N (N 2 − 1) − 1 2 D. Expanding (1), D can be written as which gives an equivalent simple statistic Hajek, Sidak and Sen (1999). The statistic V ′ is equivalent to a general class of rank statistics whose null distributions are invariant under permutations. This class can be written as which contains the Fisher-Yates normal score test with f N (i) = EU are an ordered sample of N observations from the standardized normal distribution; the van der Waerden test statistic with f N (i) = Φ −1 ( i N+1 ), where Φ is the standard normal distribution function; and the quadrant test statistic with f N (i) = sign(i − N+1 2 ), Hajek, Sidak and Sen (1999).
This paper is concerned with approximating p-values for tests in the class (3) based on their permutation distributions. The approximation is based on the saddlepoint approximation rather than normal approximation or simulation. Generally, the saddlepoint approximations are accurate up to O(N −3/2 ) when considered over sets of bounded central tendency, in contrast to O(N −1/2 ) for the central limit theorem.
The presented method is a useful tool when accuracy and speed are needed. For example, the need to evaluate a very large number of p-values is increasingly common with modern genetic data. Testing the association between haplotype scores and a trait is a basic problem in genomic studies. In this situation, the asymptotic distribution is not easily available since the haplotype scores are not directly observed but are estimated from genotype data. Permutations are typically used to obtain p-values for large number of tests, but these can be computationally infeasible in large problems. For more details see Seaman and Müller-Myhsok (2005), Lin (2005) and Kustra et al. (2008).
Saddlepoint approximations to randomization distributions were introduced by Daniels (1958) and further developed by Robinson (1982) and Davison and Hinkley (1988). Booth and Butler (1990) showed that various randomization and resampling distributions are the same as certain conditional distributions and that the double saddlepoint approximation attains accuracy comparable to the single saddlepoint approach. Abd-Elfattah and Butler (2007) and Abd-Elfattah and Butler (2009) used the double saddlepoint approximation to calculate the p-values and confidence intervals for two different problems. The first paper treats the two-sample problem when both treatment and control observations are subject to right censoring. The test statistics are the class of linear rank tests. The second paper deals with extensions of the first paper to three or more treatment levels and considers tests of trend. Both of these papers deal with a survival time data, that is a time to event, which usually is subject to censoring. The current paper concerns nonparametric association with regression type data in which each subject has (x, y) values and we are interested in testing independence of X and Y . Bingyi (1998) also used the double saddlepoint approximation to approximate the p-values of the two sample permutation tests.
Our approximation is also a double saddlepoint approximation which requires no simulations. The following lemma reformulates the class (3) to more appropriate simple form to use the double saddlepoint.
Lemma 1. The class of statistics (3) can be written in an equivalent form as Proof. Simple algebra.
For example, if R 1 = 2 is arithmetical rank so that Z 2 = η 1 and N i=1 iZ i has a 2 in its first component for R 1 .
Section 2 presents the saddlepoint approximation approach. A real data example is illustrated in section 3 along with a simulation study to show the performance of the saddlepoint method. An application of the method to the Cuzick (1982) test statistic in case of interval censoring is discussed in section 4.

Saddlepoint approximation for tests of independence
Under the null hypothesis H 0 of independence, the permutation distribution of V places a uniform distribution on the set of N × 1 indicator vectors {Z i }. This distribution may be constructed from a corresponding set of i.i.d. N × 1 vectors of M ultinomial(1, θ 1 , θ 2 , . . . , θ N ) indicator vectors ζ 1 , ζ 2 , . . . , ζ N . The permutation distribution over all one way designs for which The dependence in the statistic can be removed by using the (N − 1) × 1 vectors Z − i and ζ − i , the first N − 1 components of Z i and ζ i respectively, then Assuming any probability vector {θ 1 , θ 2 , . . . , θ N } for the Multinomial distribution, the conditional distribution of T (ζ − 1 , ζ − 2 , . . . , ζ − N ) given the sufficient statistic N i=1 ζ − i is the required permutation distribution. Let P be a random variable with the required permutation distribution and v 0 the observed value of V. The required p-value is defined as the mid-p-value which is pr(P > v 0 ) + pr(P = v 0 )/2 = mid-p(v 0 ) and is approximated from the Skovgaard (1987) saddlepoint procedure as the conditional tail probability . . , 1) T } The mid-p-value is approximated from the double saddlepoint procedure using the joint cumulant generating function for ( and 1 − is (N −1)×1 vector of ones. In these expressions, K ′′ is the N ×N Hessian matrix and K ′′ ss is the ∂ 2 /∂s∂s T portion at (0, 0), see Skovgaard (1987) and Butler (2005). The saddlepoint t ,ŝ solves By using θ i = 1/N, the denominator saddlepoint equations have an explicit solution asŝ 0 = 0 and this simplifies the calculations.
The saddlepoint expression in (5) uses the saddlepoint approximation as if T (ζ − ), and consequently P, were continuous random variables. In the permutation setting however, P is discrete and not even a lattice distribution for which a continuity correction would be available. The reason that this continuous formula can and should be used is that it provides the most accurate approximation for the mid-p-value. Pierce and Peters (1992), Davison and Wang (2002), and Butler (2005, §6.1.4) discuss reasons for this accuracy.
These calculations can be summarized as follow; based on the test statistic under consideration, we calculate r ij and Q, then solving the saddlepoint equations yields the saddlepoint t ,ŝ . Substituting the saddlepoint values at (6) givesŵ andû and finally we use (5) to get the saddlepoint p-value.
The saddlepoint method requires solving system of N nonlinear saddlepoint equations. The Newton's method has been used through this paper. The IMSL routine DNEQBF is computationally faster. The DNEQBF routine uses Broyden's (1965) update of Newton's method with the finite-difference method to approximate the initial Jacobian matrix. When the sample size increases, this routine will be very useful to solve a large number of nonlinear equations. A comprehensive treatment of methods for solving nonlinear systems of equations can be found in Dennis and Schnabel (1996), also see Burden and Faires (2003) for methods and related softwares. Nayak (1988) gives the failure times of both transmission (X) and transmission pumps (Y ) on 15 caterpillar tractors as shown in Table 1.

Example and simulation study
To test the independence of failure times of X and Y , the test statistic (2) is used with L = (1, . . . , N ), and Q = L N N i=1 R i = N 2 (N + 1)/2. The true Table 1 Failure times of transmissions by Nayak (1988 )   X  1641  5556  5421  3168  1534  6367  9460  6679  6142  5995  3953  6922  4210  5161  4732  Y  850  1607  2225  3223  3379  3832  3871  4142  4300  4789  6310  6311  6378 6449 6949 (simulated) mid-p-value was calculated by using 10 6 permutations of the computed test statistic. The simulated mid-p-value is then the proportion of such generations exceeding the observed statistic plus half the proportion of times that attain v 0 . The p-value of the saddlepoint approach is compared to the normal p-value calculated using the test statistic (v ′ − E(v ′ ))/ V ar(v ′ ). The true mid-p-value and the saddlepoint approximated p-value were 0.2768 and 0.2763, respectively, while the normal p-value was 0.2693. A small simulation study has carried out to assess the performance of the saddlepoint method. Consider the general linear model of dependence . . . , N where all the variables X ′ i , Y ′ i and e i are mutually independent and their distributions do not depend on i, and λ is a real non-negative parameter. This general dependence model was considered by Hajek, Sidak and Sen (1999) and Cuzick (1982). In this model the null hypothesis H 0 of independence is equivalent to λ = 0, whereas for λ > 0 the variables X i and Y i are dependent. Data sets are generated from this model using logistic, extreme value and uniform distributions for X ′ i , Y ′ i and e i respectively. For each value of λ = 0.0, 0.5 and sample sizes (10,30,50,70), 1000 data sets were generated and the true, saddlepoint and normal p-values were calculated using the test statistic (2). Table 2 shows the proportion of the 1000 data sets for which the saddlepoint p-value was closer, in absolute error, to the true mid-p-value than the normal p-value "Sad. Prop.", "Abs. Err. Sad." is the average absolute error of the saddlepoint p-value from the true mid-p-value, and "Rel. Abs. Err. Sad." is the average relative absolute error of the saddlepoint p-value from the true mid-p-value, and the remaining listings are the same assessments for the normal approximation.
The saddlepoint p-value was more accurate in 90.8% of the overall cases as compared to the normal approximation. The average absolute saddlepoint error was less than 10 −3 with average relative error typically less than 0.1%.

Extension
The problem of testing the independence between two variables under random censoring has attracted the attention of many authors, see O'Brien (1978), Wei (1980), Oakes (1982), and Gieser and Randles (1997). The saddlepoint approach of section 2 is applicable to test statistics that deal with censored data. When one of the two variables is subject to interval censoring, say the first, and the second random variable is observed, Cuzick (1982) presents a linear log-rank test statistic to test the independence of two vectors in the form N i=1 ξ i R 2i , where {ξ i } are given scores and {R 2i } are the ranks of the observed values of the second random variable. In the linear form (4), taking L = (ξ 1 , ξ 2 , . . . , ξ N ) and f N (R i ) = R 2i , the saddlepoint method is applicable. For example, Cuzick gives a survival times for 20 patients for the analysis of the relation between hemoglobin at presentation and survival in some medical clinic. The normal p-value using his asymptotic approach was 0.0505 while the true mid-p-value and the saddlepoint p-value are 0.0516 and 0.0512, respectively. Generally the saddlepoint approximation can be applied to any linear rank test that takes the form (3).