A distribution-free test of independence based on a modified mean variance index

Cui and Zhong (2019) (Computational Statistics & Data Analysis, 139, 117–133) proposed a test based on the mean variance (MV) index to test independence between a categorical random variable Y with R categories and a continuous random variable X. They ingeniously proved the asymptotic normality of the MV test statistic when R diverges to infinity, which brings many merits to the MV test, including making independence testing more convenient when R is large. This paper considers a new test called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem for martingale differences is used to show that the asymptotic null distribution of the standardized IPC test statistic is also normal when R diverges, so that the IPC test shares many merits with the MV test. As an application of this theoretical finding, the IPC test is extended to test independence between continuous random variables. The finite sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.


Introduction
As a fundamental task in statistical inference and data analysis, testing independence of random variables has been explored for decades in the literature. Many approaches have been proposed, depending on the types of the random variables involved. For instance, to test independence between two categorical random variables, contingency table analysis and the Pearson chi-square test can be used. If both variables are continuous, there are also many important tests, such as Hoeffding (1948), Rosenblatt (1975), Csörgö (1985) and Zhou and Zhu (2018), among others. Testing independence between random vectors has also received much attention in recent years, for instance, Székely et al. (2007), Rizzo (2009), Heller et al. (2012), Zhu et al. (2017), Pfister et al. (2018) and Xu et al. (2020).
It is also important to test independence between a continuous variable and a categorical variable. Suppose X is a continuous variable with support R_X and Y ∈ {1, . . . , R} is a categorical variable with R categories. We are interested in the following test of hypothesis:

H_0: X and Y are independent, versus H_1: X and Y are not independent. (1)
Or, equivalently,

H_0: F(x) = F_r(x) for any x ∈ R_X and r = 1, . . . , R, versus H_1: F(x) ≠ F_r(x) for some x ∈ R_X and some r,

where F(x) = P(X ≤ x), p_r = P(Y = r), and F_r(x) = P(X ≤ x | Y = r), r = 1, . . . , R. Thus, testing independence between X and Y is equivalent to testing the equality of conditional distributions, which is known as the k-sample problem in the literature (see, e.g., Jiang et al., 2015). Recently, Cui and Zhong (2019) proposed the mean variance (MV) test, based on a new measure of dependence between X and Y, the MV index (Cui et al., 2015), to test hypothesis (1). The MV index is defined as

MV(X | Y) = E_X[Var_Y{F(X | Y)}] = Σ_{r=1}^R p_r ∫ {F_r(x) − F(x)}² dF(x),

where F(x | Y) = P(X ≤ x | Y). Given a sample {(X_i, Y_i), i = 1, . . . , n} of size n, the MV test statistic is

MV_n(X, Y) = (1/n) Σ_{i=1}^n Σ_{r=1}^R p̂_r {F_rn(X_i) − F_n(X_i)}²,

where F_n(x), p̂_r and F_rn(x) are the empirical counterparts of F(x), p_r and F_r(x), respectively. An important theoretical finding of Cui and Zhong (2019) is that, when the number of categories of Y is allowed to diverge with the sample size, the standardized MV test statistic is asymptotically standard normal. Cui and Zhong (2019) argued that this finding has many appealing merits. For instance, it makes it convenient to obtain any critical value of the MV test from an approximating normal distribution when R is large.
For any fixed x ∈ R_X, dividing the integrand of the MV test statistic by F_n(x){1 − F_n(x)} leads to the Pearson chi-square test statistic

χ²_n(x) = Σ_{r=1}^R Σ_{l=1}^2 {n_lr(x)/n − n_l+(x) n_+r / n²}² / {n_l+(x) n_+r / n²},

which is widely used in practice to test independence between the indicator function I(X ≤ x) and Y. Here n_lr(x) (l = 1, 2, r = 1, . . . , R) are the counts in a 2 × R contingency table (Table 1) determined in the following way:

n_1r(x) = |{(X_i, Y_i): X_i ≤ x and Y_i = r}|, for r = 1, . . . , R,

where |A| denotes the cardinality of a set A, and n_l+(x) = Σ_{r=1}^R n_lr(x), n_+r = Σ_{l=1}^2 n_lr(x), for l = 1, 2, r = 1, . . . , R. As the Pearson chi-square test is more widely used in testing independence, we can imitate the MV test statistic by taking the integral of χ²_n(x) with respect to F_n(x), and propose the following test statistic:

IPC_n(X, Y) = ∫ χ²_n(x) dF_n(x) = (1/n) Σ_{i=1}^n Σ_{r=1}^R Σ_{l=1}^2 {n_lr(X_i)/n − n_l+(X_i) n_+r / n²}² / {n_l+(X_i) n_+r / n²}. (4)

We call IPC_n(X, Y) the integral Pearson chi-square (IPC) statistic, and n IPC_n(X, Y) the IPC test statistic. It is not difficult to see that the IPC test statistic is essentially a re-establishment of the k-sample Anderson-Darling test statistic proposed by Scholz and Stephens (1987). The reader is referred to He et al. (2019) and Ma et al. (2022) for some recent work on this statistic. The asymptotic null distribution of the IPC test statistic when R is fixed was established in Scholz and Stephens (1987). The promising performance of the k-sample Anderson-Darling statistic (the IPC test statistic) has been verified by many subsequent works in the literature and a variety of applications in practice. However, to the best of our knowledge, its theoretical property when the number of categories of Y is diverging remains unknown. The main goal of this paper is to fill this gap.
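To make the construction concrete, the test statistic T_n = n IPC_n(X, Y) can be computed directly from the 2 × R contingency counts at the thresholds x = X_i. The following Python sketch is our own illustration (the function name is ours; the paper's experiments use R packages); it sums the Pearson chi-square contributions over the observed thresholds and adopts the 0/0 = 0 convention discussed in Section 2:

```python
import numpy as np

def ipc_statistic(x, y):
    """IPC test statistic T_n = n * IPC_n(X, Y): the Pearson chi-square
    quantity of the 2 x R table at threshold x, integrated with respect
    to the empirical CDF F_n (i.e. averaged over the observed X_i),
    with the convention 0/0 = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(x)
    cats, counts = np.unique(y, return_counts=True)   # categories and n_{+r}
    total = 0.0
    for xi in x:
        below = x <= xi                               # indicators I(X_j <= x_i)
        n1 = int(below.sum())                         # n_{1+}(x_i)
        n2 = n - n1                                   # n_{2+}(x_i)
        for r, nr in zip(cats, counts):
            n1r = int((below & (y == r)).sum())       # n_{1r}(x_i)
            for nlr, nl in ((n1r, n1), (nr - n1r, n2)):
                e = nl * nr / n                       # expected cell count
                if e > 0:                             # 0/0 := 0 convention
                    total += (nlr - e) ** 2 / e
    return total / n
```

Under independence, values of this statistic concentrate around R − 1, in line with the null distribution results of Sections 2 and 3.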
In analogy to the MV test, we find that the IPC test also enjoys an appealing property: the asymptotic null distribution of the standardized IPC test statistic when R is diverging is a standard normal distribution. This important theoretical finding allows the IPC test to share many distinguished merits with the MV test. Our work, together with Cui and Zhong (2019), establishes a solid theoretical foundation and empirical evidence for independence testing between a continuous variable and a categorical variable with a diverging number of categories. As an application of this theoretical finding, we also extend the IPC test to test independence between two continuous random variables. The approach slices one of the variables on its support to obtain a categorical variable, to which the IPC test can then be applied. (Table 1 displays the empirical bivariate distribution for a fixed x.) We allow the slicing scheme to become finer as the sample size increases, which enables us to obtain satisfactory test power. The slicing technique is widely used across many statistical fields, such as feature screening (Mai & Zou, 2015b; Yan et al., 2018; Zhong et al., 2021) and the k-sample test (Jiang et al., 2015). It has also been used for testing independence; for instance, it is common practice to slice two univariate variables into categorical variables and apply the Pearson chi-square test to test their independence. Please refer to Zhang et al. (2022) for more recent developments on sliced independence tests. Our research enriches the application of the slicing technique in the field of independence testing. The proposed approach also provides a computationally tractable way to compute the p-value efficiently. Simulation studies show that the proposed test has satisfactory power in many scenarios. The rest of the paper is organized as follows. Section 2 introduces some preliminaries of the IPC test. Section 3 presents the main results, including the asymptotic null distribution of the test statistic when R diverges with the sample size. Simulation studies of the proposed test and a real data application are included in Section 4. Section 5 concludes the paper. Due to limited space, all technical proofs of the theorems are given in the Appendix.

Preliminaries
Let X be a continuous random variable with support R_X, and let Y ∈ {1, . . . , R} be a categorical variable with R categories. Motivated by the IPC statistic in (4), we define the following IPC index between X and Y:

IPC(X, Y) = Σ_{r=1}^R p_r ∫ {F_r(x) − F(x)}² / [F(x){1 − F(x)}] dF(x). (5)
The IPC statistic is a natural estimator of the IPC index. Note that n_l+(X_i) in the denominator on the right-hand side of the first equality of (4) equals zero when X_i is the largest or smallest observation among {X_i}_{i=1}^n. One solution is to follow Mai and Zou (2015a) and consider the Winsorized empirical CDF at a predefined pair of numbers (a, b). The Winsorization causes bias in estimating the IPC index; such bias vanishes automatically if we let a → 0 and b → 1 as n → ∞, but how to properly choose a and b is beyond the scope of this paper. At the same time, we notice that if X_i is the largest or smallest observation, the corresponding numerator in the first equality of (4) is also zero. Therefore, we hereafter set 0/0 = 0, following the common practice in the literature (see, for example, He et al., 2019; Ma et al., 2022), to avoid confusion. Then we have the following lemmas.
Lemma 2.1 shows that IPC n (X, Y) is a consistent estimate of the IPC index.
Lemma 2.2: 0 ≤ IPC(X, Y) < 1 and IPC(X, Y) = 0 if and only if X and Y are independent.
According to Lemma 2.2, the IPC index is an effective measure of dependence between a continuous variable and a categorical variable. Thus we can construct a test of independence via the IPC statistic.
Let T_n = n IPC_n(X, Y). Note that T_n is essentially the k-sample Anderson-Darling test statistic proposed by Scholz and Stephens (1987), so we can directly derive the asymptotic null distribution of T_n: under H_0, T_n converges in distribution to Σ_{j=1}^∞ [j(j + 1)]^{−1} χ²_j(R − 1) as n → ∞, where the χ²_j(R − 1) are independent chi-square random variables with R − 1 degrees of freedom (Theorem 2.3).
Though Theorem 2.3 gives an explicit form of the asymptotic null distribution, the exact distribution of Σ_{j=1}^∞ [j(j + 1)]^{−1} χ²_j(R − 1) is not accessible, since it is a sum of infinitely many chi-square random variables. To address this issue, a widely adopted approach is to approximate Σ_{j=1}^∞ [j(j + 1)]^{−1} χ²_j(R − 1) by the truncated sum D_N = Σ_{j=1}^N [j(j + 1)]^{−1} χ²_j(R − 1) for some large N. However, as a chi-square-type mixture, the cumulative distribution function of D_N has no known closed form. In practice, one usually generates many samples from D_N and uses the empirical distribution as a surrogate for the true distribution. One can also use a permutation test or the bootstrap to compute the p-value of the IPC test. Though these numerical methods are valid, they do make the IPC test less convenient for independence testing.
Lemma 2.1 states that IPC_n(X, Y) converges in probability to IPC(X, Y), which is a new result not discussed in Scholz and Stephens (1987). Furthermore, we have a sharper result on the convergence rate.
Theorem 2.4: Under the conditions of Lemma 2.1, for any ε > 0, the exponential bound in (8) holds as n → ∞. Here C_1 is a positive constant, and C_2 > 0 depends only on min_{1≤r≤R} p_r.
Theorem 2.4 follows directly from Theorem 3.2 in Section 3.1. The probability inequality in (8) allows us to give a lower bound on the power of the test with a finite sample size. Specifically, according to Theorem 2.3, we compute the critical value C_α for a given significance level α > 0; the power under H_1 can then be bounded from below via (8). According to Lemma 2.2, we have IPC(X, Y) > 0 under H_1. Therefore, the power of the test converges to 1 as the sample size increases to infinity. In other words, the IPC test of independence is a consistent test. We conclude this section by introducing two recent works related to the IPC index. The application of dependence measures in marginal feature screening has received increasing attention. Recently, He et al. (2019) proposed a novel feature screening procedure based on the IPC index (which they referred to as the AD index) for ultrahigh-dimensional discriminant analysis where the response is a categorical variable with a fixed number of classes. The theoretical guarantee of the IPC statistic in He et al. (2019) focused primarily on a concentration inequality rather than the asymptotic distribution. They showed that the proposed screening method is more competitive than many other existing methods, and the promising numerical performance of the IPC statistic was further confirmed by Ma et al. (2022). In particular, the slicing technique used in Ma et al. (2022) is further considered in this article to develop a method for testing independence between two continuous random variables. The details are postponed to Section 3.2.

Main results
In this section, we allow the number of categories of Y to approach infinity with the sample size n, and consider the properties of the IPC test. Research on categorical variables with a diverging number of categories has received increasing attention in the literature. For instance, Cui et al. (2015) established the sure screening property of the MV index for discriminant analysis with a diverging number of response classes; in their setting, the number of categories R is allowed to approach infinity at a slow rate relative to n. Ni and Fang (2016) proposed an entropy-based feature screening procedure for ultrahigh-dimensional multiclass classification, also allowing the number of response classes to diverge. Readers are referred to Ni et al. (2017), Yan et al. (2018), Ni et al. (2020) and Ma et al. (2022), among others, for more examples.
Here, we emphasize that it is also important to study the test of independence between a continuous variable and a categorical variable with a diverging number of categories. One of its applications is to provide a feasible approach for testing independence between a continuous variable and a categorical variable taking infinitely many values. To be specific, suppose Y is a categorical variable taking infinitely many values (e.g., a Poisson variable) and X is a continuous variable. To test independence between X and Y, we can define a new variable Y′ = Y ∧ R for some R, where a ∧ b = min(a, b). The IPC test is then applied to test independence between X and Y′, which gives us important information about whether X and Y are independent. A natural question is how to choose an appropriate R. A reasonable approach is to allow R to go to infinity with the sample size n so as to obtain satisfactory test power. This is one of the reasons that motivate us to study the asymptotic properties of the IPC statistic when R is diverging.
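The truncation step above can be sketched in a few lines. This is a minimal illustration of our own (the variable names and the shifted Poisson example are ours), collapsing all categories at or above R into a single one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Y: a categorical variable taking infinitely many values, e.g. a
# shifted Poisson count on {1, 2, ...}.
y = rng.poisson(3.0, size=2000) + 1

# Y' = Y ^ R = min(Y, R): categories R, R+1, ... are merged, so Y'
# takes values in {1, ..., R} and the IPC test can be applied to (X, Y').
R = 8
y_trunc = np.minimum(y, R)
```

Categories below R are left untouched; only the upper tail of Y is merged, so independence of X and Y′ is informative about independence of X and Y.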

Asymptotic properties when R is diverging
In the following, we establish the large sample properties of the IPC statistic when R is diverging with the sample size n. To avoid any ambiguity, in Section 3.1, we actually consider a sequence of problems indexed by k, k = 1, 2, . . .. For each k, Y k ∈ {1, . . . , R k } denotes the categorical variable with R k categories, p r,k = P(Y k = r), for r = 1, . . . , R k , X k denotes the continuous variable, and {(X ki , Y ki ): i = 1, 2, . . . , n k } is a random sample with sample size n k from (X k , Y k ). The following theorem shows the asymptotic normality of the standardized test statistic if X k and Y k are independent for any k = 1, 2, . . ..
and R_k → ∞ as n_k → ∞, and X_k and Y_k are independent for k = 1, 2, . . ., we have

[T_{n_k} − (R_k − 1)] / √{2(π²/3 − 3)(R_k − 1)} → N(0, 1) in distribution, as k → ∞. (9)
for some 0 < η < 3/4 − 2γ; namely, we allow the number of categories to go to infinity with the sample size n at a relatively slow rate. Cui and Zhong (2019) gave a similar result for the MV test with R diverging.
Let V(R) = Σ_{j=1}^∞ χ²_j(R − 1)/[j(j + 1)] denote the asymptotic null distribution in Theorem 2.3, where R is fixed. A direct application of Theorem 3.1 is that we can use a normal distribution with mean R − 1 and variance 2(π²/3 − 3)(R − 1) to approximate the asymptotic null distribution of the IPC test (i.e., V(R)) when R is large. Denote W(R) = N(R − 1, 2(π²/3 − 3)(R − 1)). To gain more insight into the connection between W(R) and V(R), note that the mean and variance of V(R) are also R − 1 and 2(π²/3 − 3)(R − 1), respectively. This result is a distinguished merit of the IPC test: it reduces the computational cost, since the critical value of W(R) is much easier to calculate than that of V(R).
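The two routes to a critical value can be contrasted directly. The sketch below (Python, our own construction; function names are ours) computes the 1 − α quantile of W(R) in closed form, and approximates the quantile of V(R) by truncating the chi-square series at a large number of terms and simulating:

```python
import numpy as np
from statistics import NormalDist

VAR_COEF = 2 * (np.pi**2 / 3 - 3)   # Var V(R) = VAR_COEF * (R - 1)

def critical_value_W(R, alpha=0.05):
    """1 - alpha quantile of W(R) = N(R - 1, 2*(pi^2/3 - 3)*(R - 1))."""
    return NormalDist(R - 1, (VAR_COEF * (R - 1)) ** 0.5).inv_cdf(1 - alpha)

def critical_value_V(R, alpha=0.05, n_terms=200, n_draws=100_000, seed=0):
    """1 - alpha quantile of V(R) = sum_{j>=1} chi2_j(R - 1) / (j(j+1)),
    approximated by truncating the series at n_terms and simulating."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, n_terms + 1)
    weights = 1.0 / (j * (j + 1))
    draws = rng.chisquare(R - 1, size=(n_draws, n_terms)) @ weights
    return float(np.quantile(draws, 1 - alpha))
```

For R around 20 the two 95% critical values agree closely, with the simulated V(R) quantile slightly larger, consistent with the right skew of the chi-square-type mixture discussed below.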
To further check the validity of using W(R) as a surrogate for V(R) to compute the critical value of the IPC test when R is large, we compare the empirical quantiles of the IPC test statistic with the theoretical quantiles of the normal distribution W(R) in (9) and the asymptotic null distribution V(R) in (7). We generate Y ∈ {1, . . . , R} with equal probabilities and X independently from U(0, 1). We consider R = 10, 15, . . . , 35. For each R, we let n = 40 × R and repeat the simulation 1000 times to obtain 1000 values of the IPC test statistic T_n. We report the 90% and 95% quantiles of the 1000 T_n's (denoted by empirical quantile in Table 2), as these two quantiles are the most widely used in hypothesis testing. The 90% and 95% quantiles of V(R) (denoted by theoretical quantile 1) and W(R) (denoted by theoretical quantile 2) are also computed. The results are gathered in Table 2. The empirical quantiles are close to the theoretical quantiles of W(R) even when R = 10, which further supports our proposed method of using the approximating normal distribution to calculate the critical value of the IPC test when R is relatively large. Looking further into the results in Table 2, we see that the empirical quantiles of T_n are almost systematically smaller than the quantiles of V(R) (with the exception of the 95% quantile when R = 35), while larger than the quantiles of W(R) (in both cases by a very small amount). Note that the asymptotic distribution V(R) can be viewed as a chi-square-type mixture. Such a mixture follows an asymmetric, positively skewed (right-skewed) distribution, in which the left tail is shorter and the right tail is longer. Specifically, its skewness is positive and tends to zero as R goes to infinity, while the normal distribution W(R) is symmetric with skewness 0.
Since V(R) is a better approximation to the exact distribution of T_n, it makes sense that the 90% and 95% quantiles of both the empirical distribution of T_n and of V(R) are slightly larger than those of W(R). It is also interesting that the empirical quantiles of T_n fall between the quantiles of V(R) and those of W(R). This may indicate that the skewness of the exact distribution of T_n is smaller than that of V(R).
We further compare the empirical null distribution with W(R). We still generate Y ∈ {1, . . . , R} with equal probabilities and X independently from U(0, 1), and consider four scenarios: (a) R = 5, n = 100 × R = 500; (b) R = 10, n = 80 × R = 800; (c) R = 20, n = 40 × R = 800; (d) R = 50, n = 30 × R = 1500. We run the simulation 100,000 times for each scenario to obtain 100,000 values of the IPC test statistic T_n. We then compare the empirical distribution of the standardized IPC test statistic [T_n − (R − 1)]/√{2(π²/3 − 3)(R − 1)} with the standard normal distribution N(0, 1) in Figure 1. In scenario (a), where R = 5 is small, the empirical density curve of the standardized IPC test statistic deviates to some extent from the normal density function, even though the sample size n = 500 is large. When R = 5, the empirical density is also positively skewed, with more mass clustered toward the left and a slightly longer right tail. However, the empirical density curve matches the standard normal density curve very well as R increases, for example in scenario (c) with R = 20. This further emphasizes that R should be large enough (say, larger than 10) for the normal approximation in Theorem 3.1 to be adequate.
The following theorem allows us to bound the deviation of the IPC statistic when R is diverging, which is parallel to Theorem 3.1 in Ma et al. (2022).
for some 0 ≤ η < 1/2, and suppose there exists a positive constant c_1 such that c_1/R_k ≤ p_r,k for r = 1, . . . , R_k, k = 1, 2, . . .. Then for any ε ∈ (0, 1), the bound in (10) holds, where C_1 is a positive constant and C_2 > 0 depends only on c_1. The condition c_1/R_k ≤ p_r,k for r = 1, . . . , R_k, which is also used in Cui et al. (2015) and Cui and Zhong (2019), requires that the proportion of each category of Y_k cannot be too small. Indeed, the condition can be relaxed so that c_1 is allowed to tend to 0 at a slow rate. Specifically, if we assume c_1 = o(n_k^{−τ}) for some 0 < τ < 1/2 − η, then the probability in (10) still converges to zero, but at a relatively slower rate. Note that Theorem 2.4 is a special case of Theorem 3.2 with η = 0, i.e., R_k fixed, in which case the condition on p_r,k is automatically satisfied.

Extension of the IPC test
A natural application of Theorem 3.1 is to extend the IPC test to test independence between two continuous variables via the slicing technique. Consider two continuous random variables X and Z. Without loss of generality, assume that the supports of X and Z are R. Given a positive integer R, we define a partition of the support of Z,

S = {[q_{r−1}, q_r): r = 1, . . . , R}, where −∞ = q_0 < q_1 < · · · < q_{R−1} < q_R = ∞. (11)

Each interval [q_{r−1}, q_r) is called a slice in the literature (Mai & Zou, 2015b; Yan et al., 2018), and a new random variable can accordingly be defined as Y_S = r if and only if q_{r−1} ≤ Z < q_r, for r = 1, . . . , R. The IPC test can then be applied to test independence between X and Y_S. If the distribution of Z is known, we suggest a uniform slicing that partitions Z such that each slice has equal probability 1/R; in practice, taking the q_r to be empirical quantiles of Z is regarded as an intuitive uniform slicing scheme (Yan et al., 2018). Obviously, it is important to choose an appropriate R for testing independence. If R is too large, the sample size in each slice is too small, making the estimate of the IPC index inaccurate; if R is too small, much information about Z may be lost, making the test power poor. In the slicing literature (Mai & Zou, 2015b; Yan et al., 2018; Zhong et al., 2021), a common choice is R = ⌊log n⌋, where ⌊x⌋ denotes the integer part of x. According to Theorem 3.1, we can also choose R < n^{1/4}. In practice, we recommend choosing R = ⌊n/k⌋ for some 20 ≤ k ≤ 50, so that the sample size in each slice is about 20 to 50.
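The empirical uniform slicing scheme, together with the recommended choice of R, can be sketched as follows (Python; the function names and the rule-of-thumb default k = 30 are our own choices within the 20 to 50 range suggested above):

```python
import numpy as np

def uniform_slice(z, R):
    """Map a continuous sample z to categories 1..R using empirical
    quantiles as interior cut points q_1 < ... < q_{R-1}, with q_0 = -inf
    and q_R = +inf: Y_S = r iff q_{r-1} <= z < q_r."""
    cuts = np.quantile(z, np.arange(1, R) / R)         # interior cut points
    return np.searchsorted(cuts, z, side="right") + 1  # categories 1..R

def choose_R(n, k=30):
    """Rule of thumb from the text: R = floor(n/k) with 20 <= k <= 50,
    so that each slice holds roughly k observations."""
    return max(2, n // k)

rng = np.random.default_rng(0)
z = rng.normal(size=900)
R = choose_R(len(z))        # 30 slices of about 30 observations each
y_s = uniform_slice(z, R)
```

Since the cut points are empirical quantiles, the slices are approximately balanced, which keeps the per-slice conditional CDF estimates stable.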

Comparison with the MV test
In this subsection, we discuss the advantages of the IPC test over the MV test. As explained in Cui and Zhong (2019), the MV index can be considered as a weighted average of Cramér-von Mises distances between F_r(x), the conditional distribution of X given Y = r, and F(x), the unconditional distribution function of X. Note that the IPC index can be viewed as a modification of the MV index obtained by adding the weight function {F(x)(1 − F(x))}^{−1}. This weight function is large when F(x) is near 0 or 1, and smaller near F(x) = 1/2. Hence, the IPC test puts more emphasis on the difference between F_r(x) and F(x) in the tails of F(x); accordingly, the IPC test is more sensitive to tail differences among the conditional distributions. In the following, we consider the test of independence between a continuous random variable and a categorical variable with a relatively large number of classes (i.e., R large) and the test of independence between two continuous random variables, and further illustrate the IPC test's sensitivity to differences in the tails of the conditional distributions through numerical simulations.
1. When R is large or is allowed to diverge. In this case, we recommend using a normal distribution to approximate the IPC test's null distribution, by Theorem 3.1. It is not surprising that, for large R, the IPC test retains its sensitivity to tail differences when a normal distribution is used instead of V(R) to calculate the p-value. The following example illustrates this point.
Conditionally on Y = r, generate X = BW + (1 − B)V_r, where B ∼ Binomial(1, p), W and V_r are independent, W ∼ N(0, 1) and V_r ∼ N(10 + r, 1). To gain some intuition about this simulation setting, set p = 0.8; we draw the conditional distributions of X given Y = 1 and Y = 5 in Figure 2. It is easy to see that the conditional distributions differ from each other only in their right tails. We choose the sample size n = 400 and p = 0.7, 0.75, 0.8, 0.85, 0.9. We apply the IPC test and the MV test, computing the p-values of both tests from their approximating normal distributions. The empirical powers of the two tests, based on 500 replicates at the significance level α = 0.05, are presented in Table 3. To further examine the robustness of the IPC test against heavy tails, we also consider W ∼ t(1) in the above setting; the resulting empirical powers are also shown in Table 3. A larger p means that the differences among the conditional distributions occur in a more extreme right-tail region, making the dependence between X and Y more difficult to detect. We can see from Table 3 that the IPC test is significantly more powerful than the MV test when p < 0.9. When p = 0.9, neither the IPC test nor the MV test has sufficient statistical power to detect the dependence between X and Y. The simulation confirms that the IPC test has better power against tail differences among the conditional distributions. In Example 4.1 we compare with other existing methods to further validate the IPC test's sensitivity to tail differences.
2. Testing independence between continuous random variables. We follow the notation of Section 3.2. Let X and Z be two continuous random variables. It is natural to expect the IPC test to be more powerful than the MV test in detecting tail differences among the conditional distributions of X given Z. Consider a straightforward extension of the IPC index in (5) and define the following index between X and Z:

IPC(X, Z) = ∫∫ {F(x | Z = z) − F(x)}² / [F(x){1 − F(x)}] dF(x) dF_Z(z), (12)

where F(· | Z = z) is the conditional distribution of X given Z = z, and F(x) and F_Z(z) are the distributions of X and Z, respectively. Given a positive integer R and a corresponding uniform slicing scheme S defined as in (11), with Y_S the induced categorical variable, Ma et al. (2022) showed under certain mild conditions that IPC(X, Y_S) → IPC(X, Z) as R → ∞.
From (12), again, we gain the insight that the IPC test of independence puts more emphasis on the difference between F(x | Z = z) and F(x) in the tails of F(x). We use a toy example to further illustrate this point. Generate Z ∼ Unif(4, 6), and generate X = BW + 5(1 − B)Z, where B ∼ Binomial(1, p). We again consider two settings for W: (i) W ∼ N(0, 1) and (ii) W ∼ t(1). We choose the sample size n = 400 and p = 0.7, 0.75, 0.8, 0.85, 0.9, follow the steps in Section 3.2, and choose R = 20 to conduct the test of independence. Table 4 presents the empirical powers of the IPC and MV tests based on 500 replicates at the significance level α = 0.05. The IPC test outperforms the MV test in all these settings. Note that when p = 0.8 the MV test is almost powerless, whereas the IPC test still has reasonably acceptable power.

Numerical studies
In this section, we assess the finite-sample performance of the IPC test by comparing it with several powerful methods proposed in recent years: the MV test (Cui & Zhong, 2019), the distance correlation (DC) test (Székely et al., 2007), the HHG test (Heller et al., 2012, 2016) and the Hilbert-Schmidt independence criterion (HSIC) test (Gretton et al., 2005, 2007; Pfister et al., 2018). The R packages energy, HHG, and dHSIC are used to implement the DC, HHG, and HSIC tests, respectively. Note that the DC test cannot be applied directly to a categorical variable, so in our simulations we transform a categorical variable with R categories into a random vector of R − 1 binary dummy variables and apply dcov.test to this dummy vector instead of the original data. For the DC, HHG, and HSIC tests, a permutation test with K = 200 permutations is used to calculate the p-value.
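The dummy-variable transformation used for the DC test can be sketched as follows (a Python illustration of our own; in the simulations themselves the encoding would be done in R before calling dcov.test):

```python
import numpy as np

def dummy_code(y, R):
    """Transform a categorical variable y in {1, ..., R} into a vector of
    R - 1 binary dummy variables (category R serves as the baseline)."""
    y = np.asarray(y)
    return np.column_stack([(y == r).astype(float) for r in range(1, R)])

y = np.array([1, 2, 3, 1])
d = dummy_code(y, R=3)   # rows: (1, 0), (0, 1), (0, 0), (1, 0)
```

Dropping one category avoids redundancy: the R-th dummy is determined by the other R − 1, so the encoded vector carries exactly the information in y.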
Example 4.1: In this example, we evaluate the performance of the IPC test in the large-R case. Let R = 15, and we consider the following two cases.
Let n = 400. In Model 1.2, we uniformly slice Y into a categorical variable with R = 15 classes in order to apply the IPC and MV tests. We let p vary from 0 to 1 in both models, and compute the p-value of the IPC test using the asymptotic distribution in Theorem 3.1. The empirical power of each test based on 500 simulations at the significance level α = 0.05 is shown in Figure 3. Note that, when p = 1, X is independent of Y in both models. We deliberately report the corresponding results, i.e., the type I error rates of each test, in Table 5. The type I error rates of the IPC test (and the other tests) are close to the nominal significance level α = 0.05, which further supports Theorem 3.1. Figure 3 clearly shows that the IPC test outperforms its competitors, and the power differences between the IPC test and the MV test exceed 0.25 when p = 0.6 in both models. Looking further into the models considered in this example, in both Model 1.1 and Model 1.2 the conditional distributions of X given Y differ from each other only in their right tails when p > 0.5. A larger p means that the conditional distribution functions differ from each other in a more extreme tail region, and when p = 1, X and Y are independent. Thus it is more difficult to detect the dependence between X and Y for larger p < 1. As a result, we can see from Figure 3 that the power of each test decreases as p grows. Among the tests considered, the DC test and the HSIC test perform the worst in both models; their powers rapidly decrease to near 0 as p increases to 0.4. The IPC test and the MV test perform better than the other tests. Furthermore, the IPC test has significantly higher power than the MV test when p is between 0.6 and 0.8 in both models. This further supports our observation in Section 3.3 that the IPC test is more sensitive to tail differences.

Example 4.2:
This example considers a Poisson regression model. Let Z ∼ Poisson(u), where u = exp(0.8X_1 − 0.8X_2 + log 4), (X_1, X_2)ᵀ ∼ N(0, Σ) with Σ = (0.5^{|i−j|})_{1≤i,j≤2}. Let Y = Z if Z ≤ 8, and Y = 9 otherwise. As a consequence, Y is a categorical variable with 10 categories. Consider n = 100, 150, . . . , 300. We apply the testing methods to test independence between Y and X_1, and between Y and X_2. The asymptotic normal distribution in Theorem 3.1 is used to compute the p-value of the IPC test. The empirical powers of each test based on 500 replications are summarized in Table 6. The IPC test has the best power performance in all settings, while the HHG test and the HSIC test perform poorly when the sample size n ≤ 150.
The power of the IPC test is only slightly higher than that of the MV test, but significantly higher than those of the HHG and HSIC tests. The DC test has moderate performance: inferior to the MV test, but better than HSIC.

Example 4.3:
In this example, we evaluate the power of the IPC test for testing independence between continuous variables. Simulations are carried out with sample size n = 400, and we choose R = 15 to implement the IPC test. Generating Z ∼ Unif(−2, 2), the following alternatives are considered. (a) Linear: X = Z/2 + 12γε, where γ is a noise parameter ranging from 0 to 1, and ε ∼ Unif(−2, 2) is independent of Z. To conduct the IPC test and the MV test, we uniformly slice Z into a categorical variable Y with R = 15 classes. The coefficients in all of the above are chosen so that a full range of powers can be observed as γ varies from 0 to 1. In addition to the test methods mentioned before, in this example we also compare with the modified Blum-Kiefer-Rosenblatt (MBKR) test (Zhou & Zhu, 2018), which applies to testing independence between continuous variables. Figure 4 presents the empirical power of each test based on 500 simulations at the significance level α = 0.05. We see from the figure that the IPC test performs excellently when the relationship has an oscillatory nature (the W-shaped and sinusoid alternatives). It is also better than the other competitors for the step function, and performs comparably to the MBKR test for the quadratic function. However, the IPC test performs poorly compared with the other tests for some smooth alternatives, namely the linear and the ellipse: for the linear function, the MBKR test performs best and the IPC test is comparable to HSIC; for the ellipse, the HHG test has the highest power and the DC test performs the poorest, while the IPC test's performance is moderate.
We give an intuitive explanation for the excellent performance of the IPC test in detecting oscillatory relationships. Denote by X | Y = r the random variable following the conditional distribution of X given Y = r. A simple calculation shows that if X and Z have an oscillatory relationship, then the variances of X | Y = r differ from each other substantially. By comparison, if X and Z have a linear relationship, then Var{X | Y = 1} = · · · = Var{X | Y = 15}. Consequently, the IPC test has higher power when there is an oscillatory relationship between X and Z.
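This intuition can be checked numerically; a small Monte Carlo sketch (assuming numpy; the equal-width slicing and the specific sinusoid X = sin(πZ) + ε are our illustrative choices, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(2)
n, R = 200000, 15
z = rng.uniform(-2, 2, size=n)
eps = rng.normal(0, 0.1, size=n)

# Slice Z in (-2, 2) into R equal-width classes labeled 1..R
labels = np.minimum(((z + 2) / 4 * R).astype(int) + 1, R)

def conditional_variances(x, labels, R):
    """Sample variance of X within each slice of Z."""
    return np.array([x[labels == r].var() for r in range(1, R + 1)])

v_linear = conditional_variances(z / 2 + eps, labels, R)             # linear X = Z/2 + eps
v_sine = conditional_variances(np.sin(np.pi * z) + eps, labels, R)   # oscillatory

# The conditional variances are nearly constant across slices in the
# linear case, but vary strongly in the oscillatory case.
spread_linear = v_linear.max() - v_linear.min()
spread_sine = v_sine.max() - v_sine.min()
```

The spread of conditional variances is an order of magnitude larger under the sinusoid than under the linear model, matching the explanation above.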

Real data application
Example 4.4: We consider a data set from AIDS Clinical Trials Group Protocol 175 (ACTG175), which is available from the R package speff2trial. Many researchers have studied this data set, such as Tsiatis et al. (2008), Zhang et al. (2008), Lu et al. (2013) and Zhou et al. (2020). The data set contains 2139 HIV-infected subjects, all of whom were randomized to four treatment groups with equal probability: zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, and ddI monotherapy. In addition to the indicator of which treatment group each subject was assigned to, the data set contains many other important variables, such as the CD4 count at 20 ± 5 weeks post-baseline (CD420), the CD4 count at baseline (CD40), and the history of intravenous drug use.
In this study, to obtain more detailed results, we only consider the subjects in the ZDV+zalcitabine group (524 subjects) in the following analysis. The goal of our study is to check whether the treatment effect in the ZDV+zalcitabine group depends on other covariates. Following Hammer et al. (1996) and Tsiatis et al. (2008), we use the change from baseline to 20 ± 5 weeks in CD4 cell count, i.e., CD420 − CD40, to measure the treatment effect. The covariates of interest are listed below: history of intravenous drug use (0 = no, 1 = yes), gender (0 = female, 1 = male), antiretroviral history (0 = naive, 1 = experienced), age, and CD8 count at baseline (CD80). Thus the first three covariates are categorical, and the last two are continuous. Let X = CD420 − CD40; there are then five candidate variables Y. The null hypotheses are listed as follows.
• H_0^1: X is independent of Y with Y = history of intravenous drug use;
• H_0^2: X is independent of Y with Y = gender;
• H_0^3: X is independent of Y with Y = antiretroviral history;
• H_0^4: X is independent of Y with Y = age;
• H_0^5: X is independent of Y with Y = CD80.
We apply the IPC, MV, DC, HHG and HSIC tests to these five hypotheses. The permutation test with K = 1000 permutations is used to compute the p-values of the DC, HHG and HSIC tests. For H_0^4 and H_0^5, we follow the approach in Section 3.2 and slice Y into a categorical variable with 15 classes to implement the IPC test and the MV test. Table 7 summarizes the p-values of each test. At the significance level α = 0.05, all the tests reject H_0^3, H_0^4 and H_0^5, and accept H_0^2. That is, the treatment effect in the ZDV+zalcitabine group depends on antiretroviral history, age and CD80, but not on gender. Regarding the history of intravenous drug use, the IPC, DC, HHG and HSIC tests declare statistical dependence between this variable and the treatment effect. However, the MV test has a p-value larger than 0.05 and thus cannot reject H_0^1. We draw the empirical conditional distributions of X given Y = 0 and Y = 1, as well as the side-by-side boxplots, in Figure 5, where Y = history of intravenous drug use. We see that the conditional distributions of X differ across the values of Y. However, the difference is relatively small and occurs mainly in the right tails. According to the discussion in Section 3.3, the IPC test is more powerful in such cases. Also, the categories of Y are very unbalanced, with #{Y = 0} = 448 and #{Y = 1} = 76, making it more difficult for the MV test to detect the dependence between X and Y.
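The permutation scheme used for the DC, HHG and HSIC tests can be sketched generically as follows (a minimal illustration, assuming numpy; the absolute Pearson correlation serves here as a stand-in statistic, not the actual DC/HHG/HSIC statistic):

```python
import numpy as np

def perm_pvalue(x, y, stat, K=1000, rng=None):
    """Permutation p-value: shuffle y to break dependence, recompute the statistic."""
    rng = rng or np.random.default_rng()
    t_obs = stat(x, y)
    t_perm = np.array([stat(x, rng.permutation(y)) for _ in range(K)])
    # The +1 correction keeps the p-value away from exactly zero.
    return (1 + np.sum(t_perm >= t_obs)) / (K + 1)

abs_corr = lambda x, y: abs(np.corrcoef(x, y)[0, 1])  # stand-in statistic

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y_dep = x + rng.normal(size=200)   # a dependent pair
p_dep = perm_pvalue(x, y_dep, abs_corr, K=500, rng=rng)
```

With a strongly dependent pair, the permutation p-value is essentially 1/(K + 1).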

Discussion
In this paper, we studied the IPC test of independence between a continuous variable X and a categorical variable Y. When the number of categories of Y is fixed, the IPC test statistic is in essence the k-sample Anderson-Darling test statistic, whose theoretical properties were studied in Scholz and Stephens (1987). Our work focused on two aspects. First, we derived the convergence rate of the IPC statistic to the IPC index, from which a lower bound on the power of the test at a given significance level with a finite sample size can be derived. Second, we showed that the standardized test statistic has an asymptotic normal distribution when the number of categories R diverges to infinity with the sample size. The IPC test thereby shares a notable merit with the MV test: its critical values can be easily obtained from an approximating normal distribution when R is relatively large. As an application, we extended the IPC test to test independence between two continuous random variables by uniformly slicing one continuous variable into a discrete variable. By allowing more slices as the sample size increases, the IPC test can gain more power. The proposed test was compared with the DC, HHG, HSIC and MV tests in many simulation experiments, and the results showed that the IPC test performs better in many scenarios. It is also possible to consider other slicing schemes for independence testing of continuous variables; we leave this for future research.
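For fixed R, the connection to the k-sample Anderson-Darling test means a closely related statistic is available in standard software; for instance, SciPy's `scipy.stats.anderson_ksamp` (note this computes a normalized k-sample Anderson-Darling statistic, not the IPC statistic itself):

```python
import numpy as np
from scipy.stats import anderson_ksamp

rng = np.random.default_rng(4)
# Samples of X within each of R = 3 categories of Y; here the third group
# has a shifted distribution, so the statistic should be large.
groups = [rng.normal(0, 1, 150), rng.normal(0, 1, 150), rng.normal(2, 1, 150)]

res = anderson_ksamp(groups)
# res.statistic: normalized k-sample Anderson-Darling statistic
# res.significance_level: approximate p-value (floored/capped by scipy)
```

Under dependence as strong as this, the reported significance level hits scipy's lower cap.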

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by National Natural Science Foundation of China [Grant numbers 12271286, 11931001 and 11771241].

Appendix. Proof of theorems
This appendix contains the technical proofs of Lemma 2.2 and Theorem 3.1. Lemma 2.1 and Theorem 2.4 are direct corollaries of Theorem 3.2, and the proof of Theorem 3.2 follows from Lemma 4 in Ma et al. (2022), and thus their proofs are omitted.

A.1 Notations and preliminaries
Recall that the IPC index of (X, Y), where X is a continuous random variable with support R_X and Y ∈ {1, . . . , R} is a categorical variable with R categories, is defined as
IPC(X, Y) = Σ_{r=1}^R (1/p_r) ∫ {F(x, r) − p_r F(x)}² / [F(x){1 − F(x)}] dF(x),
where F(x, r) = P(X ≤ x, Y = r) and p_r = P(Y = r). Given a sample (X_i, Y_i), i = 1, . . . , n, the IPC statistic IPC_n(X, Y) is defined by replacing F(x, r), p_r and F(x) with their empirical counterparts. We first provide a proof of Lemma 2.2.

Proof of Lemma 2.2.:
It is obvious that IPC(X, Y) = 0 if and only if X and Y are independent. Noticing that Σ_{r=1}^R p_r = 1 and Σ_{r=1}^R F(x, r) = F(x), the conditional distribution functions F(x | Y = r) = F(x, r)/p_r satisfy Σ_r p_r F(x | Y = r) = F(x); since each F(x | Y = r) lies in [0, 1], the variance bound for [0, 1]-valued random variables gives Σ_r p_r {F(x | Y = r) − F(x)}² ≤ F(x){1 − F(x)}, with strict inequality on a set of positive F-measure. Hence we have IPC(X, Y) < 1.
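The bounding step can be displayed as follows (a sketch, using the fact that a [0, 1]-valued random variable with mean μ has variance at most μ(1 − μ)):

```latex
\sum_{r=1}^{R} p_r \bigl\{ F(x \mid Y = r) - F(x) \bigr\}^2
  = \operatorname{Var}\bigl\{ F(x \mid Y) \bigr\}
  \le F(x)\bigl\{ 1 - F(x) \bigr\},
```

so dividing by F(x){1 − F(x)} and integrating with respect to dF(x) yields IPC(X, Y) ≤ 1, with strictness following from the non-degeneracy of F(x | Y).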
Next, we give some preparations for the proof of Theorem 3.1. For a given constant C > 0, let F_{n,C} denote the truncated version of F_n used in the denominator. Then we have the following lemmas.

A.2 Proof of Theorem 3.1
To avoid any ambiguity, Theorem 3.1 considers a sequence of problems indexed by (n k , R k , p 1,k , . . . , p R k ,k ), k = 1, 2, . . . , where the sample size n k → ∞, the number of categories R k → ∞, and let Y k = Y(R k ) denote the categorical variable with R k categories and p r,k = P(Y(R k ) = r), r = 1, . . . , R k . From now on, we shall omit the subscript unless specifically mentioned. Moreover, in Section A.2, we should keep in mind that X and Y are independent.

A.2.1 Architecture of the proof
Our aim here is to provide a general overview of the proof of Theorem 3.1. At a high level, the structure is fairly simple. To make it clear, we divide the proof into three parts.
(1) First, given a positive constant C, we substitute F_{n,C}(x), F̂_{n,C}(x) and p_r for F_n(x), F̂_n(x) and p̂_r in the denominator of the IPC statistic, thereby obtaining IPC_{n,C}(X, Y). We then prove that the difference between n IPC_n(X, Y)/√R and n IPC_{n,C}(X, Y)/√R is asymptotically negligible.
(2) Second, we take C = 6, substitute F_{n,6}(x) and F̂_{n,6}(x), and define IPC_{n,6}(X, Y). Under the condition √R/min_{1≤r≤R} p_r = o(n^{3/8}), we show that n IPC_{n,6}(X, Y)/√R is close to n IPC_n(X, Y)/√R; combined with the first part of the proof, this yields the intermediate approximation.
(3) Finally, consider the decomposition into J_{1n} and J_{2n}. We show that {J_{1n} − (R − 1)}/√(2(π²/3 − 3)(R − 1)) →_P 0, and that J_{2n}/√(2(π²/3 − 3)(R − 1)) can be viewed as the sum of a martingale difference sequence. Then, by the well-developed central limit theory for martingale differences (Hall & Heyde, 1980), we complete the proof.
Combined with Lemmas A.1 and A.2, the proof of part 1 is not difficult. The proofs of parts 2 and 3 follow from Cui and Zhong (2018) and Cui and Zhong (2019) with small modifications.

A.2.2 Part 1
We summarize the conclusion we want to prove in part 1 into the following lemma.

Lemma A.3: For a fixed constant C, let
For simplicity, write IPC_n = IPC_n(X, Y) and IPC_{n,C} = IPC_{n,C}(X, Y). Then, if √R/min_{1≤r≤R} p_r = o(n^{1/2}) and X and Y are independent, we have IPC_{n,C} = {1 + O_p(n^{−1/2} √R/min_{1≤r≤R} p_r)} IPC_n. Next, let X_(1) ≤ X_(2) ≤ · · · ≤ X_(n) be the order statistics of X_1, . . . , X_n. Since X is continuous, there are no ties among X_1, . . . , X_n almost surely.

A.2.3 Part 2
Recall the definition of IPC_{n,6}(X, Y). The following lemma is what we want to prove in part 2.
Next, we follow the proof of Lemma A.1 in Cui and Zhong (2019). By the DKW inequality, the deviation of F̂_n(x, r) from F(x, r) can be controlled uniformly in x, so it suffices to establish the corresponding moment bounds. Without loss of generality, let F(x) be the uniform distribution function on (0, 1), since we can apply the transformation X′ = F(X) to the continuous random variable X; the required pointwise inequalities for x, y ∈ (0, 1) can then be verified directly.
Moreover, expanding each moment as a fourfold sum, e.g.
E{f̄_n(X_1)² f̄_n(y)²} = (1/n⁴) Σ_{i,j} Σ_{k,l} E{f_{i,n}(X_1, r) f_{j,n}(X_1, r) f_{k,n}(y, r) f_{l,n}(y, r)},
and analogously for E{f̄_n(x)² f̄_n(X_2)²} and E{f̄_n(X_1)² f̄_n(X_2)²}, each expectation is of order O(n^{−11/4}) up to factors involving p_r − p_r².

A.2.4 Part 3
Now, we complete the proof of Theorem 3.1. Combining the previous bounds, where C is a constant, it only remains to show that
∫₀¹ ∫₀¹ (x ∧ y − xy)² / [x(1 − x)y(1 − y)] dx dy = π²/3 − 3.
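This constant can be checked numerically; a minimal sketch in pure Python (a midpoint-rule approximation of the double integral, our own verification, not part of the proof):

```python
import math

def ad_kernel_sq(x, y):
    # Integrand (min(x, y) - x*y)^2 / (x(1-x) y(1-y)) over (0, 1)^2
    return (min(x, y) - x * y) ** 2 / (x * (1 - x) * y * (1 - y))

def midpoint_2d(f, n=800):
    """Midpoint-rule approximation of the integral of f over (0, 1)^2.

    Midpoints never touch the boundary, so the removable singularities
    at x, y in {0, 1} cause no trouble."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            total += f(x, (j + 0.5) * h)
    return total * h * h

approx = midpoint_2d(ad_kernel_sq, n=800)
target = math.pi ** 2 / 3 - 3  # approximately 0.28987
```

The numerical value agrees with π²/3 − 3 to several decimal places.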
Let F_i = σ{(X_1, Y_1), . . . , (X_i, Y_i)} be the σ-field generated by the random variables {(X_1, Y_1), . . . , (X_i, Y_i)}, i = 1, . . . , n. We see that Σ_{i=2}^n Z_{ni} is the sum of a martingale difference sequence with E(Z_{ni}) = 0 and Var(Σ_{i=2}^n Z_{ni}) = (1 − 1/n){1 + O(n^{−1/8})} → 1. The conditions in Hall and Heyde (1980) are verified by direct moment calculations, which reduce to fourfold sums over j < k of terms of the form E{f_{j,n}(x, r) f_{k,n}(y, s) f_{j,n}(x′, t) f_{k,n}(y′, q)}, weighted by factors (p_r δ_{rs} − p_r p_s)(p_t δ_{tq} − p_t p_q)/√(p_r p_s p_t p_q) and kernels (x ∧ y − xy)(x′ ∧ y′ − x′y′) integrated over dx dy dx′ dy′, where C, C′ and C″ are constants. By the central limit theorem for martingale differences (Hall & Heyde, 1980), the asymptotic normality follows as n → ∞. This completes the proof.