A new approach for approximating the p-value of a class of bivariate sign tests

Bivariate data are frequently encountered in many applied fields, including econometrics, engineering, physiology, biology, and medicine. For bivariate analysis, a wide range of non-parametric and parametric techniques can be applied, and non-parametric procedures impose fewer requirements than parametric ones. In this paper, the saddlepoint approximation method is used to approximate the exact p-values of some non-parametric bivariate tests. The saddlepoint approximation approximates the mass or density function and the cumulative distribution function of a random variable based on its moment generating function. It is proposed in this article as an alternative to the asymptotic normal approximation. The two methods are compared through a Monte Carlo simulation study and three numerical examples based on real bivariate data sets. In general, the results of the simulation study show the superiority of the proposed method over the asymptotic normal approximation.

Sign testing is a common method of testing symmetry. Many statisticians have generalized the univariate sign test in order to obtain a corresponding test in the bivariate case. Work on this problem began with Hodges 1 and Blumen 2. After that, many studies appeared with the aim of providing and developing sign tests for the bivariate case. In this regard, we can point out the contributions of Chatterjee 3, Kohnen 4, Dietz 5, Brown and Hettmansperger 6, Oja and Nyblom 7, and Brown and Hettmansperger 8. Brown et al. 9 discussed the concepts of the bivariate sign test and bivariate medians. Larocque et al. 10 introduced an affine-invariant modification of the Wilcoxon signed-rank test for bivariate location problems. The advantage of this test over the test of Jan and Randles 11 is that its asymptotic null distribution holds without assuming elliptical symmetry. Samawi 12 introduced a bivariate sign test for the one-sample bivariate location problem using a bivariate ranked set sample. Ghute and Shirke 13 developed a nonparametric control chart for monitoring changes in the location of a bivariate process; the proposed chart is based on Bennett's 14 bivariate sign test.
The p-value plays an important role in hypothesis tests because it determines whether the null hypothesis is rejected. Therefore, approximating the exact p-value with high accuracy is a challenge in many statistical tests. In this context, the saddlepoint approximation method is suggested to approximate the exact p-value of a class of bivariate tests of $H_0: F(x, y) = F(y, x)$ for all $x, y$, whose statistics take the general linear form of Eq. (1), where $U_i$ is the vector of score functions based on the observations of the sample, $\beta_i$ is the vector of indicators, which consists of a sequence of ones and zeroes, and $C$ is a constant vector possibly depending on the observations of the sample.
The saddlepoint approximation is, at its core, a method for approximating a density function. Daniels 15 first proposed the general application of the saddlepoint approximation to density functions. Building on Daniels' proposal, Lugannani and Rice 16 approximated the cumulative distribution function in the univariate case. Skovgaard 17 provided a double saddlepoint approximation for conditional distributions. A saddlepoint approximation for a bivariate distribution function was introduced by Wang 18. Abd-Elfattah 19 introduced an accurate and easy approximation for the distribution function of a bivariate class of random sum distributions using the saddlepoint approximation technique. Abd-Elfattah 20 approximated the exact permutation distribution of a class of two-sample bivariate tests using the saddlepoint approximation technique. Abd-Elfattah 21 used the saddlepoint approximation to approximate the distribution function of the bivariate symmetry test statistic under competing risk data. Abd El-raheem and Abd-Elfattah 22,23 approximated the exact permutation distribution of a class of two-sample tests for cluster data under two different randomization designs. For more recent articles on the saddlepoint approximation method, see Kamal et al. 24,25. Finally, a number of important and basic references highlight the importance and applications of saddlepoint approximations in many branches of statistics, namely Booth and Butler 26, Strawderman 27, Butler 28, Abd-Elfattah and Butler 29, and Kwok and Zheng 30.
As mentioned earlier, our goal is to approximate the mid-p value of a class of bivariate sign tests. The focus here is on the mid-p value rather than the ordinary p-value because the ordinary p-value is too conservative in comparison with the mid-p value; see Abd-Elfattah 20. This class of bivariate sign tests is presented in detail in the "Bivariate sign tests" section. The bivariate saddlepoint approximation is applied to approximate the mid-p value of this class of tests in the "Bivariate saddlepoint approximations" section. The "Illustrative examples and simulation studies" section compares the performance of the saddlepoint approximation and the asymptotic normal method using numerical examples and simulation studies.
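To make the difference between the ordinary p-value and the mid-p value concrete, the following minimal sketch uses the univariate sign test, where the number of positive signs is Binomial(n, 1/2) under the null hypothesis. The function name `sign_test_pvalues` and the values n = 10 and s = 8 are illustrative assumptions, not taken from the paper.

```python
from math import comb

def sign_test_pvalues(n, s):
    """Upper-tail p-value and mid-p value for a univariate sign test.

    Under H0 the number of positive signs S is Binomial(n, 1/2).
    """
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    p_equal = pmf[s]                  # P(S = s)
    p_greater = sum(pmf[s + 1:])      # P(S > s)
    ordinary = p_greater + p_equal    # P(S >= s)
    mid = p_greater + 0.5 * p_equal   # P(S > s) + P(S = s) / 2
    return ordinary, mid

# Illustrative values only: 8 positive signs out of 10.
ordinary, mid = sign_test_pvalues(n=10, s=8)
print(f"ordinary p = {ordinary:.4f}, mid-p = {mid:.4f}")
# prints: ordinary p = 0.0547, mid-p = 0.0327
```

Because the mid-p value counts only half of the probability of the observed outcome, it is never larger than the ordinary p-value, which is the sense in which the ordinary p-value is the more conservative of the two.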

Bivariate sign tests
This section presents two of the most frequently used bivariate sign tests for the one-sample problem. We then formulate the statistics of these two tests in a general linear form, which makes it possible to obtain a highly accurate approximation of the exact p-value of such bivariate sign tests using the bivariate saddlepoint approximation method.

Bivariate sign test of Blumen 2
The bivariate sign test was introduced by Blumen 2 to test the hypothesis that the medians of two variables have specified values. The test was constructed to be independent of the correlation between the two variables. Let $(x_i, y_i)$, $i = 1, \ldots, n$, represent the bivariate sample points. To perform Blumen's bivariate sign test, consider the $n$ axes created by drawing a line through each $(x_i, y_i)$ and the origin, and number the axes according to their angle, measured counterclockwise from the positive end of the horizontal axis. Let $\gamma_i = +1$ $(-1)$ if the data point associated with the $i$th axis lies above (below) the horizontal axis. The center of gravity is calculated from the points at which the standardized vectors intersect the unit circle. Blumen's test statistic is given in Eq. (2) in terms of two components $l_1$ and $l_2$. If the null hypothesis is true, then $l_1$ and $l_2$ are approximately independent normal variables with mean zero and variance $n/2$.
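The preprocessing step described above (ordering the axes by angle and attaching a sign to each) can be sketched as follows. This is only an illustration of that step, not of the full test statistic; the function name `blumen_axes` and the sample points are hypothetical, and points lying exactly on the horizontal axis are assumed not to occur.

```python
import math

def blumen_axes(points):
    """Order the axes through the origin and attach a sign to each.

    points: iterable of (x, y) pairs with y != 0 (assumption of this sketch).
    Returns a list of (axis_angle, gamma) sorted counterclockwise from the
    positive end of the horizontal axis, with axis_angle in [0, pi).
    """
    axes = []
    for x, y in points:
        if y == 0:
            raise ValueError("points on the horizontal axis are not handled here")
        # A line through the origin is identified by an angle in [0, pi).
        angle = math.atan2(y, x) % math.pi
        # +1 if the point lies above the horizontal axis, -1 if below.
        gamma = 1 if y > 0 else -1
        axes.append((angle, gamma))
    axes.sort(key=lambda a: a[0])  # number the axes counterclockwise
    return axes

# Hypothetical sample points, chosen only to exercise the function.
sample = [(2.0, 1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, -2.0)]
signs = [gamma for _, gamma in blumen_axes(sample)]
print(signs)  # prints: [1, -1, -1, 1]
```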
Let $\alpha_i = (\gamma_i + 1)/2$, so that $\alpha_i \in \{0, 1\}$ and $\gamma_i = 2\alpha_i - 1$. Substituting into Eq. (2), the statistics $l_1$ and $l_2$ can be rewritten in the bivariate sign statistic form, that is, in the form of Eq. (1).

Bivariate sign test of Brown et al. 9
Brown et al. 9 introduced another idea for a bivariate symmetry test. Let $z_1, \ldots, z_n$ be a sample drawn at random from a bivariate distribution. By symmetry, Brown et al. 9 mean that $z_i - \mu$ and $\mu - z_i$ are identically distributed, where $\mu$ is the center of symmetry; this defines the null hypothesis of bivariate symmetry. The observed data can be represented in terms of $\gamma_i$ and $r_i$, where $\gamma_i = 1$ or $-1$ if $z_i$ is above or below the horizontal axis, respectively, and $r_i$ is the $i$th radius. Under the null hypothesis, $P(\gamma_i = 1) = P(\gamma_i = -1) = 1/2$. Let $z^T = (z_1, z_2)$ and $\dot{z}^T = (-z_2, z_1)$; the test is based on the gradient vector at the origin (divided by $n$), denoted by $q$. After some simplification, the statistic $q$ can be written in terms of $w^T = (w_1, w_2)$, using the convention $x_{n+i} = -x_i$. The statistic $q$ is asymptotically normal with mean $\mu = E(q \mid H_0) = 0$ and covariance matrix $\sigma^2$. With indicators $\alpha_i \in \{0, 1\}$, the statistic $q$ takes the same form as the linear statistic in Eq. (1).
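The change of variables from signs $\gamma_i \in \{-1, +1\}$ to indicators $\alpha_i = (\gamma_i + 1)/2 \in \{0, 1\}$ can be checked numerically. The sketch below uses arbitrary two-component scores $c_i$ (an assumption; they are not the score vectors of either test) and verifies that $\sum_i \gamma_i c_i = \sum_i \alpha_i (2 c_i) - \sum_i c_i$, which is how a sum of signed scores becomes a linear function of 0/1 indicators plus a constant.

```python
import random

def sign_form(scores, gammas):
    """Componentwise sum of gamma_i * c_i, with gamma_i in {-1, +1}."""
    return tuple(sum(g * c[k] for g, c in zip(gammas, scores)) for k in range(2))

def indicator_form(scores, gammas):
    """The same quantity written with alpha_i = (gamma_i + 1) / 2 in {0, 1}."""
    alphas = [(g + 1) // 2 for g in gammas]
    weights = [(2 * c[0], 2 * c[1]) for c in scores]             # doubled scores
    const = tuple(sum(c[k] for c in scores) for k in range(2))   # sum of scores
    return tuple(
        sum(a * w[k] for a, w in zip(alphas, weights)) - const[k]
        for k in range(2)
    )

rng = random.Random(0)
n = 8
# Arbitrary illustrative scores and signs (not data from the paper).
scores = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
gammas = [rng.choice((-1, 1)) for _ in range(n)]

lhs = sign_form(scores, gammas)
rhs = indicator_form(scores, gammas)
print(all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs)))  # prints: True
```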

Bivariate saddlepoint approximations
The permutation distribution of the general form of the bivariate sign statistic in Eq. (1) is supported on $2^n$ equally likely configurations of the indicators. This distribution can be derived from the set $\{\beta_1, \ldots, \beta_n\}$ of independent and identically distributed Bernoulli(1/2) random variables. The bivariate sign statistic in Eq. (1) can be written as two sign statistics, $b_1$ and $b_2$, so that $B = (b_1, b_2)$. Let $B_0 = (\tau, \upsilon)$ be the observed value of $B$. The mid-p value of the statistic $B$ at $B_0$, denoted mid-$p(B_0)$, is the probability that $B$ exceeds $B_0$ plus half the probability that $B$ equals $B_0$. It can be approximated using the saddlepoint approximation of the bivariate CDF developed by Wang 18. Wang's formula approximates the bivariate cumulative distribution function and generalizes the approximation of Lugannani and Rice 16 for the univariate cumulative distribution function. Both formulas approximate the intractable integrals that arise when calculating different forms of probabilities, and both depend entirely on the cumulant generating function (CGF).
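For small $n$, the permutation distribution can be enumerated directly, which shows what the saddlepoint method is approximating. The sketch below rests on two assumptions that are not taken from the paper: the statistic is written as $B = \sum_i \beta_i U_i$ with no constant vector, and "B exceeds $B_0$" is read componentwise (both components at least as large as the observed ones, excluding equality in both). The function name `exact_mid_p` and the scores are hypothetical.

```python
from itertools import product

def exact_mid_p(scores, observed, tol=1e-12):
    """Exact mid-p value of B = sum_i beta_i * U_i by full enumeration.

    Each of the 2^n indicator vectors has probability 2^-n.
    Assumptions of this sketch: no constant vector is included, and
    "B exceeds B0" means b1 >= tau and b2 >= upsilon without equality in both.
    """
    n = len(scores)
    tau, ups = observed
    greater = equal = 0
    for beta in product((0, 1), repeat=n):
        b1 = sum(b * u[0] for b, u in zip(beta, scores))
        b2 = sum(b * u[1] for b, u in zip(beta, scores))
        if abs(b1 - tau) <= tol and abs(b2 - ups) <= tol:
            equal += 1
        elif b1 >= tau - tol and b2 >= ups - tol:
            greater += 1
    return (greater + 0.5 * equal) / 2 ** n

# Toy scores with n = 2, so the four configurations can be checked by hand.
toy_scores = [(1.0, 0.0), (0.0, 1.0)]
print(exact_mid_p(toy_scores, (1.0, 1.0)))  # prints: 0.125
print(exact_mid_p(toy_scores, (0.0, 0.0)))  # prints: 0.875
```

The cost of the enumeration doubles with every additional observation, which is why an accurate approximation is useful for moderate and large samples.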
Consider the joint CGF of $b_1$ and $b_2$. Since $B_0 = (\tau, \upsilon)$ is the observed value of the statistic $B$, assume for fixed $(\tau, \upsilon)$ that there exists a unique solution $(t_0, u_0)$ of the saddlepoint equation of the joint CGF, and that $t = t_0$ solves the corresponding equation for $K_1(t)$, where $K_1(t)$ is the CGF of $b_1$; similarly, $K_2(u)$ denotes the CGF of $b_2$.
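As an illustration of what solving a saddlepoint equation involves, the sketch below assumes the statistic has the form $(b_1, b_2) = \sum_i \beta_i (a_i, c_i)$ with independent Bernoulli(1/2) indicators and no constant vector. Under that assumption the joint CGF is $K(t, u) = \sum_i \log\{(1 + e^{t a_i + u c_i})/2\}$, and the saddlepoint solves $\nabla K(t, u) = (\tau, \upsilon)$. The function names, the scores, and the target point are all hypothetical, and a damped Newton iteration is one possible solver, not necessarily the one used by the authors.

```python
import math

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def joint_cgf(t, u, scores):
    """K(t, u) = sum_i log((1 + exp(t*a_i + u*c_i)) / 2)."""
    total = 0.0
    for a, c in scores:
        z = t * a + u * c
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - math.log(2.0)
    return total

def cgf_gradient(t, u, scores):
    """First partial derivatives of K with respect to t and u."""
    g1 = g2 = 0.0
    for a, c in scores:
        p = _sigmoid(t * a + u * c)
        g1 += a * p
        g2 += c * p
    return g1, g2

def solve_saddlepoint(scores, target, iters=100, tol=1e-10):
    """Damped Newton iteration for grad K(t, u) = target, started at (0, 0)."""
    t = u = 0.0
    for _ in range(iters):
        g1, g2 = cgf_gradient(t, u, scores)
        r1, r2 = g1 - target[0], g2 - target[1]
        res = math.hypot(r1, r2)
        if res < tol:
            break
        # Hessian of K at the current point.
        h11 = h12 = h22 = 0.0
        for a, c in scores:
            p = _sigmoid(t * a + u * c)
            w = p * (1.0 - p)
            h11 += w * a * a
            h12 += w * a * c
            h22 += w * c * c
        det = h11 * h22 - h12 * h12
        dt = -(h22 * r1 - h12 * r2) / det
        du = -(h11 * r2 - h12 * r1) / det
        # Halve the step until the residual norm decreases.
        step = 1.0
        while step > 1e-8:
            n1, n2 = cgf_gradient(t + step * dt, u + step * du, scores)
            if math.hypot(n1 - target[0], n2 - target[1]) < res:
                break
            step *= 0.5
        t, u = t + step * dt, u + step * du
    return t, u

# Hypothetical scores and observed value, chosen inside the support of B.
demo_scores = [(1.0, 0.2), (0.5, -1.0), (-0.7, 0.9), (1.2, 0.4), (-0.3, -0.8), (0.8, 1.1)]
demo_target = (2.0, 1.0)
t0, u0 = solve_saddlepoint(demo_scores, demo_target)
print(cgf_gradient(t0, u0, demo_scores))  # close to (2.0, 1.0)
```

The solution exists and is unique only when the target lies strictly inside the range of attainable values of the statistic and the scores are not all collinear, which matches the uniqueness assumption made in the text.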
To evaluate the approximation in Eq. (9), some auxiliary functions are required.
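The bivariate formula of Wang 18 is not reproduced here. As a simpler illustration of the kind of auxiliary quantities involved, the sketch below implements the univariate Lugannani and Rice 16 tail approximation, which the bivariate formula generalizes, for $X = \sum_i \beta_i a_i$ with Bernoulli(1/2) indicators. The weights $a_i = 1, \ldots, 12$, the cutoff 60, and the function names are assumptions made for the example; the continuous form of the formula is used and compared with the exact mid-p value obtained by enumeration.

```python
import math
from itertools import product

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lr_upper_tail(weights, x):
    """Continuous Lugannani-Rice approximation to the upper tail of
    X = sum_i beta_i * a_i at x, with beta_i ~ Bernoulli(1/2)."""
    lo_support = sum(min(a, 0.0) for a in weights)
    hi_support = sum(max(a, 0.0) for a in weights)
    if not lo_support < x < hi_support:
        raise ValueError("x must lie strictly inside the support of X")

    def k0(s):  # CGF K(s)
        return sum(max(s * a, 0.0) + math.log1p(math.exp(-abs(s * a)))
                   - math.log(2.0) for a in weights)

    def k1(s):  # K'(s), increasing in s
        return sum(a * _sigmoid(s * a) for a in weights)

    def k2(s):  # K''(s)
        return sum(a * a * _sigmoid(s * a) * (1.0 - _sigmoid(s * a)) for a in weights)

    # Solve the saddlepoint equation K'(s) = x by bisection.
    lo, hi = -1.0, 1.0
    while k1(lo) > x:
        lo *= 2.0
    while k1(hi) < x:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < x:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    if abs(s) < 1e-8:
        return 0.5  # x is the mean; this distribution is symmetric about it
    w = math.copysign(math.sqrt(2.0 * (s * x - k0(s))), s)
    u = s * math.sqrt(k2(s))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    upper_normal = 0.5 * math.erfc(w / math.sqrt(2.0))
    return upper_normal + phi * (1.0 / u - 1.0 / w)

def exact_mid_p_upper(weights, x):
    """Exact P(X > x) + P(X = x) / 2 by enumerating all 2^n indicator vectors."""
    greater = equal = 0
    for beta in product((0, 1), repeat=len(weights)):
        total = sum(b * a for b, a in zip(beta, weights))
        if total > x:
            greater += 1
        elif total == x:
            equal += 1
    return (greater + 0.5 * equal) / 2 ** len(weights)

weights = list(range(1, 13))   # illustrative integer weights
approx = lr_upper_tail(weights, 60)
exact = exact_mid_p_upper(weights, 60)
print(f"saddlepoint = {approx:.4f}, exact mid-p = {exact:.4f}")
```

The quantities `w` and `u` play the role of the auxiliary functions: both are computed from the CGF and its derivatives at the saddlepoint, which is why the method needs nothing beyond the CGF.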

Illustrative examples and simulation studies
Three published real data sets are considered in this part to compare the accuracy of the saddlepoint and normal approximations. Extensive Monte Carlo simulation studies are also carried out to evaluate the accuracy of the saddlepoint approach compared with that of the traditional asymptotic method.

Examples
The precision of different approaches to approximating the exact p-value of bivariate sign tests can be illustrated using numerical examples. Three published real data sets are therefore used to compare the saddlepoint approximation and normal approximation methods.
For data set 1, ten adult sons and their fathers participated in a study to assess eye refractions. Positive refractions indicated long-sightedness, while negative refractions indicated near-sightedness. The sons were part of a large group collected in northern Finland for infants born in 1966. Data set 1 is presented in Table 1. More details about this data set can be found in Rantakallio 31. Several authors have used the data in Table 1 to illustrate procedures for bivariate sign tests; see, for example, Brown et al. 9.
Data set 2 comes from a small study of twelve cotton textile workers conducted by Merchant et al. 32 to determine the effects of cotton dust exposure. Before and after each participant's 6-h exposure to cotton dust, several factors were measured for each worker, including the change in closing volume and white blood cell count. Dietz 5 used data set 2 to illustrate the bivariate sign test proposed in that work. This data set was included in Table 1 of Dietz 5.
Data set 3 is from Samawi et al. 33. These data represent the bilirubin levels in jaundiced infants staying in the neonatal intensive care unit. Physicians are interested in jaundice because it may have a significant influence on hearing and neurological development and is a risk factor for death. It would be extremely beneficial to physicians if they could test the hypothesis that males and females have the same median bilirubin level when weight groups are matched. The data were collected from five hospitals in Jordan and were limited to births in the first six months of 1997. Samawi et al. 33 took fifteen pairs of male and female patients from the hospital records.
Table 2 shows the mid p-values for the three data sets for the Blumen 2 and Brown et al. 9 bivariate sign tests. The asymptotic normal p-values and saddlepoint p-values are also displayed in Table 2. In the remainder of this article, we refer to the Blumen 2 and Brown et al. 9 tests as test 1 and test 2, respectively. The simulated mid p-value (Sim) is based on $10^6$ permutations of the indicators $\{\beta_i\}$ and is computed as the proportion of cases in which $B$ exceeds $B_0$ plus half the proportion of cases in which $B$ equals $B_0$.
In all three data sets, the saddlepoint approximation was closer than the normal approximation to the simulated mid p-value.
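The simulated mid p-value described above can be sketched as a plain Monte Carlo loop over random indicator vectors. The sketch makes the same kind of assumptions as any illustration outside the paper: the statistic is taken as $B = \sum_i \beta_i U_i$ without a constant vector, "exceeds" is read componentwise, and the function name, scores, seed, and number of draws are hypothetical (the paper uses $10^6$ draws; fewer are used here to keep the example fast).

```python
import random

def simulated_mid_p(scores, observed, n_draws=100_000, seed=0, tol=1e-12):
    """Monte Carlo estimate of the mid-p value of B = sum_i beta_i * U_i.

    Assumptions of this sketch: beta_i are independent Bernoulli(1/2),
    no constant vector is included, and "B exceeds B0" means both
    components are at least the observed ones, without equality in both.
    """
    rng = random.Random(seed)
    tau, ups = observed
    greater = equal = 0
    for _ in range(n_draws):
        b1 = b2 = 0.0
        for a, c in scores:
            if rng.random() < 0.5:   # beta_i = 1 with probability 1/2
                b1 += a
                b2 += c
        if abs(b1 - tau) <= tol and abs(b2 - ups) <= tol:
            equal += 1
        elif b1 >= tau - tol and b2 >= ups - tol:
            greater += 1
    # Proportion of exceedances plus half the proportion of ties.
    return (greater + 0.5 * equal) / n_draws

# Toy scores for which the exact value is 0.875 (three exceedances and
# one tie among the four equally likely configurations).
toy = [(1.0, 0.0), (0.0, 1.0)]
estimate = simulated_mid_p(toy, (0.0, 0.0), n_draws=20_000)
print(round(estimate, 2))
```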

Monte Carlo simulation study
Monte Carlo simulation studies are used to show the accuracy of the saddlepoint approximation over a wide range of simulated data from different bivariate distributions and different sample sizes. A total of 1000 bivariate data sets of sizes $n = 20$, 30, 40, and 60 are generated from the bivariate exponential, bivariate logistic, bivariate normal, and bivariate Poisson distributions. For the normal and Poisson distributions, three cases are considered for the correlation coefficient between the two variables: weak, moderate, and strong. For the bivariate exponential and logistic distributions, the data are generated assuming independence between the two variables. For the four distributions, the quantities "Sad.P.", "E.Sad.", and "E.Nor." are presented in Tables 3, 4, 5 and 6, where "Sad.P." is the proportion of the 1000 data sets for which the saddlepoint p-value is closer to the simulated exact mid p-value than the normal p-value, "E.Sad." is the average relative absolute error of the saddlepoint approximation, and "E.Nor." is the average relative absolute error of the normal approximation. The estimated type I error and power of the considered tests at the 0.05 significance level are displayed in Tables 7 and 8, respectively. Tables 3, 4, 5 and 6 show that the average relative absolute error of the proposed approximation is smaller than that of the normal approximation in all the cases considered. Moreover, we can note that the convergence percentage of suggested approximation to the simulated exact p-values was not in any case less
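The three summary quantities reported in the simulation tables can be computed as in the following sketch. It assumes that the relative absolute error of an approximation is |approximate − exact| / exact, which is a common definition but is not spelled out in the text; the function name and the numbers are placeholders, not results from the paper.

```python
def compare_approximations(exact, saddle, normal):
    """Return (Sad.P., E.Sad., E.Nor.) for paired lists of p-values.

    Sad.P. : proportion of data sets where the saddlepoint p-value is
             closer to the simulated exact mid p-value than the normal one.
    E.Sad. : average relative absolute error of the saddlepoint p-values.
    E.Nor. : average relative absolute error of the normal p-values.
    Relative absolute error is taken as |approx - exact| / exact (assumption).
    """
    n = len(exact)
    closer = sum(
        abs(s - e) < abs(m - e) for e, s, m in zip(exact, saddle, normal)
    )
    e_sad = sum(abs(s - e) / e for e, s in zip(exact, saddle)) / n
    e_nor = sum(abs(m - e) / e for e, m in zip(exact, normal)) / n
    return closer / n, e_sad, e_nor

# Placeholder p-values for two hypothetical data sets.
sad_p, e_sad, e_nor = compare_approximations(
    exact=[0.10, 0.20], saddle=[0.11, 0.19], normal=[0.13, 0.15]
)
print(sad_p, round(e_sad, 3), round(e_nor, 3))  # prints: 1.0 0.075 0.275
```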

Table 1. Refraction values for ten sons and their fathers.

Table 2. Simulated, saddlepoint and normal p-values for the three data sets.

Table 3. Results of the comparison between the saddlepoint method and the normal approximation method based on simulated data from the bivariate normal distribution with correlation coefficient ρ.

Table 4. Results of the comparison between the saddlepoint method and the normal approximation method based on simulated data from the bivariate logistic distribution.

Table 5. Results of the comparison between the saddlepoint method and the normal approximation method based on simulated data from the bivariate exponential distribution.

Table 6. Results of the comparison between the saddlepoint method and the normal approximation method based on simulated data from the bivariate Poisson distribution with correlation coefficient ρ.

Relative absolute errors of the saddlepoint approximation and the normal approximation for Test 1 with sample size n = 20 generated from the bivariate Poisson distribution.