A Bartlett-type correction for likelihood ratio tests with application to testing equality of Gaussian graphical models

This work defines a new correction for the likelihood ratio test for a two-sample problem within the multivariate normal context. This correction applies to decomposable graphical models, where testing equality of distributions can be decomposed into lower dimensional problems.


Introduction
Testing the equality of distributions in a two sample problem can conveniently be done resorting to the likelihood ratio test (LRT) statistic, W n = − 2 log Λ n , where Λ n is the likelihood ratio.In Wilks (1938), it is shown that for samples coming from p-variate normal distributions, W n is asymptotically distributed as a chi-square with f = p(p + 3)/2 degrees of freedom.It is well known (Muirhead, 1982) that the quality of the asymptotic approximation might be poor in finite sample problems, even at moderate sample sizes.However, convergence to the asymptotic distribution can be improved by multiplying the LRT statistic by a constant (Van der Vaart, 1998).Under the low-dimensional setting, where the number of variables p is considered fixed and n is large, the correction factor ρ proposed in Muirhead (1982) improves the convergence rate, but when the value of p is close to n or increases with it, this correction is unable to provide an improvement.In the high-dimensional setting, where p is assumed to increase with n, Jiang and Qi (2015) proposed a standardization of the LRT statistic that allows to resort to the central limit theorem and, therefore, to switch to a normal approximation.This solution, however, proves to be inaccurate for small p, given the asymmetry of the LRT statistic.
In a recent work, He et al. (2021) studied the phase transition boundary, d in what follows, which characterizes the approximation accuracy by establishing the necessary and sufficient condition for the chi-square approximation to hold when p increases with n.The authors showed that the chi-square approximation holds if and only if p/n d 0, with d = 1/2 for the raw LRT statistic and d = 2/3 for its ρ-corrected version.
In this paper, we propose a new multiplicative correction factor, δ n hereafter, defined to be the ratio between the degrees of freedom of the asymptotic chi-square approximation and an approximation of the expected value of the LRT statistic, under the null hypothesis, as a function of p and n.We prove that its phase transition boundary d is equal to 1, so that the chi-square approximation holds in all situations in which p/n 0 .We show the usefulness of our proposal in the context of Gaussian graphical models (GGM).Here, the problem of testing equality of two distributions Markov with respect to a decomposable graph can be broken up into testing equality of lower dimensional Gaussian distributions.According to the structure of the graph, these lower dimensional problems can lead to very different values of the p/n ratio.Hence, it becomes crucial to rely on an approximation that guarantees a good finite sample accuracy even in extreme cases, where p is close to n.

A quick tour of the state of art
Consider two p-dimensional multivariate normal distributions, N p μ j , Σ j , j = 1, 2, and the problem of testing their equality based on two independent random samples of size n j .In detail, consider the hypothesis of equality of distributions H 0 : μ 1 = μ 2 , Σ 1 = Σ 2 vs .H a : H 0 is not true . (1) The LRT for testing (1), derived in Wilks (1938), can be written as where n = n 1 + n 2 , Σ and Σ j , j = 1, 2 are the maximum likelihood estimates of the covariance matrices under the null and alternative hypotheses, respectively, and det(Σ) denotes the determinant of Σ.Under the null hypothesis in (1), the LRT statistic W n = − 2 log Λ n , has an asymptotic chi-square distribution, with f = p p + 3 /2 degrees of freedom.
In settings where p is fixed and n is allowed to grow, a first correction of the statistic W n was proposed by Bartlett (1937), based on a rescaling aimed at making its mean exactly equal to the mean of the asymptotic chi-square distribution, i.e., equal to f.The corrected statistic, W n B say, takes the following form where E H 0 W n is the expected value of W n under the null hypothesis; see for example Van der Vaart (1998).Later, Muirhead (1982) proposed a version of Bartlett correction that leverages on an expansion of the correction factor, leading to the following correction (3) The author showed that the resulting corrected statistic, W n ρ say, where W n ρ = − 2ρ log Λ n , has a chi-square limit, with an improved approximation rate with respect to W n .Both corrections, however, fail when p and n grow at comparable rates.
Recent studies have considered the problem when the dimension p changes with the sample size n.In these settings, Jiang and Yang (2013) and Jiang and Qi (2015) established the following result based on the central limit theorem (CLT): where μ n and σ n > 0 are functions of both n and p and are the asymptotic mean and standard deviation of log Λ n , respectively.The use of the central limit theorem has the advantage of being appropriate in a high dimensional setting; however, it is less accurate when p is small, due to the asymmetric shape of the LRT distribution.

Our proposal
In this section, we propose a Bartlett-type correction of the LRT statistic, under the assumption that p changes with the sample size n.This correction replaces the denominator of ( 2) with a function of the approximated mean given in Eq. ( 4).In a two sample problem, the term μ n defined by Jiang and Qi (2015) is where n′ j = n j − 1 and r x = ( − log(1 − p/x)) 1/2 , for x > p, and n = n 1 + n 2 .Let μ w n = − 2μ n , we define the adjusted statistic T n as where f = p p + 3 /2 are the degrees of freedom of the chi-square asymptotic null distribution of W n .We now prove that T n is asymptotically chi-square distributed.
Theorem 1.Let p = p n n ∈ N be a sequence of integers 1 ≤ p n < n j − 1.Under H 0 , for T n defined as in ( 6), min j = 1, 2 n j ∞ and p/n 0, we have that and the phase transition boundary of T n is d = 1.

Proof. See Appendix A. □
In Theorem 1, the condition n j > p + 1 is assumed to ensure the existence of the LRT.Moreover, the condition p/n 0 defines the phase transition of the adjusted statistic, as introduced in He et al. ( 2021), which represents the boundary in which the chi-square approximation starts to fail as p increases and characterizes the approximation accuracy.
This boundary is an improvement over W n and W n ρ , whose approximations hold for p/n d 0, with d = 1/2 and d = 2/3, respectively.

Simulation study
In this section we present a simulation study to compare the performances of the LRT statistics based on four different approximations: the classic chi-square approximation, the ρ-adjusted approach of Muirhead (1982), the CLT approach of Jiang and Qi (2015) and our proposed δ-adjusted approach.
We study how the correction acts considering a fixed sample size and letting the dimension p change.Data are drawn from a multivariate normal distribution, with fixed covariance matrix and mean vector and we set n 1 = n 2 = 50 and p = 2, 30, 40.For each scenario, five thousand simulations are run.Results are shown in Fig. 1.For each value of p we plot the histograms of the empirical distribution of the four statistics, namely W n , W n ρ , T n and W n clt , and compare them with the chi-square distribution with p p + 3 /2 degrees of freedom in the first three cases and a standard normal in the last case.The top row of Fig. 1 shows how the statistic W n departs from the theoretical χ 2 distribution as p grows.This is expected and motivates the need of an adjustment when dealing with testing problems in which the dimension grows with n.In fact, if 50 observations might be enough for testing a problem of dimension 2, this is not the case for other values of p, especially when p and n have comparable values.The second row shows the results for the statistic corrected with ρ.Note that, also in this case, the approximation to the χ 2 fails as p approaches the group sample size, n j .With respect to the previous case, however, the departure from the chi-square distribution occurs for higher values of p.The third row highlights the problem of applying the CLT when p is small.For example, when p = 2 the approximation to the normal distribution fails, while it improves as p increases.This approach works well also for values of p very close to n j .The bottom row shows the accuracy of the approximation of the proposed adjusted statistic T n .Note that this correction leads to a good approximation regardless of the dimension of the testing problem, as long as p/n 0, and could be used as a unique tool for correcting W n at different values of p and n.
Finally, we run some simulations to examine the phase transition boundary in Theorem 1, under the null hypothesis.We consider p = n 1 ε , n 1 = n 2 , n = j = 1 2 n j and n j ∈ 100, 500, 1000 and finally ε ∈ 6/24, …, 23/24, 23.5/24 .⋅ denotes the rounding to the nearest integer function.We plot the empirical type-I error rate (over 1000 simulations) versus ε, for each chi-square approximation: W n , W n ρ and T n .Results are plotted in Fig. 2. The first two panels confirm the results in He et al. ( 2021), while the one on the right hand side shows how the phase transition boundary of the adjusted statistic T n is close to 1.The particular case with ε exactly equal to one is excluded, to ensure the identifiability of the covariance matrix.

Testing equality of distributions in Gaussian graphical models
In the remaining sections of the paper, we assume the reader is familiar with the basic theory of (decomposable) undirected graphical models, as presented for instance in Lauritzen (1996); see also Whittaker (1990).We adopt a standard terminology and a rather intuitive notation: we let G = V , E denote an undirected graph, with V a finite set of nodes and E = v, t : v ≠ t; v, t ∈ V a finite set of edges between vertices.We denote its cliques, separators and residuals by C, S and R, respectively.
Our proposal finds a natural application in the context of decomposable graphical models.One prominent advantage of decomposable graphs is that their cliques can be arranged so as to satisfy the running intersection property (RIP), and the joint probability distribution of the associated random vectors factorizes accordingly.In detail, if a graph G = V , E decomposes into k, say, cliques, let C i , i = 1, …, k, be a sequence of cliques satisfying the RIP and S i = C i ∩ C i − 1 and R i = C i \C i − 1 , i = 2, …, k the set of corresponding separators and residuals, respectively.Then, the probability distribution of the random vector X V factorizes as Lauritzen (1996) for an exhaustive explanation.
Such factorization renders tractable inference in the setting of large-scale graphical models, where the dimension p of the problem is higher that the available sample size n.Even when p < n, using the information on the graphical structure allows us both to improve the power of detecting a difference between the two distributions under study (the size of the model is reduced by constraints on the covariance matrix), and to localize that difference, thanks to the modular nature of graphical models (Djordjilović and Chiogna, 2022).This potential has fed the increasing prominence of graph-theoretic representations of probability distributions in fields such as statistical and quantum physics, bioinformatics, signal processing, econometrics and information theory.In our problem setting, this factorization assumes a crucial role as it allows to decompose the global problem of testing equality of distribution in two samples into a sequence of local tests of equality of distributions defined on a smaller set of variables, as follows with S 1 = ∅ and R 1 = C 1 .Hence, to test the global hypothesis H, one can test the k local hypotheses H i , i = 1, …, k of equality of the conditional distributions of X R i X S i .In the case of strong meta Markov models (Lauritzen, 1996;Edwards, 2000), as is the Gaussian case, Djordjilović and Chiogna (2022) showed that the local hypotheses H i , i = 1, …, k, are independent and that the LRT statistic for testing H also decomposes into k LRT statistics, one for testing each local hypothesis.Specifically, the LRT, W n , factorizes as where W n A , A ⊆ V , represents the LRT for the hypothesis of equality of distributions for X A , namely As proved in Theorem 1 of Djordjilović and Chiogna (2022), the …, k, in the right-hand side of ( 8) are all asymptotically independent and chi-square distributed, with f C 1 and f C i − f S i , i = 2, …, k, degrees of freedom, respectively, being f C i and f S i the degrees of freedom associated to the marginal test on the cliques and the separators, respectively.It is worth noting that, since B , the only quantities needed to compute W n are the observed values of the LRT on the marginal distributions defined over cliques and separators.It is easy to see that Here, Σ A is the maximum likelihood estimate of Σ A , the block submatrix corresponding to the nodes in A in the null covariance matrix Σ = Σ 1 = Σ 2 ; and Σ A j are the maximum likelihood estimates of Σ A j , the block submatrices corresponding to the nodes in A of Σ j , j = 1, 2.Moreover, each W n A has a chi-square limit with f A = p A p A + 3 /2 degrees of freedom, where p A is the cardinality of the set A. One remarkable side effect of the decomposition is that the dimension of each local problem is determined by the cardinality of the set of variables on which it is defined, so that, for a fixed sample size n, dimensionality regimes of local problems vary as a function of their cardinality.Local problems for which p ≪ n might coexist with problems for which p ≈ n.
Our proposal naturally steps in this context, providing a convenient solution able to accommodate such variety of situations.The extension of our correction to the test statistics of the kind W n C S does not represent an obstacle, resulting indeed to be straightforward.In The corrected statistics for the tests relative to the decomposition (7) simply become

Simulation in the graphical setting
In this section, we present a simulation study aimed at showing the performances of our corrected LRTs versus ordinary LRTs when working with Gaussian graphical models.For a real data application, see the Supplementary Material.We consider a p-variate Gaussian graphical model Markov with respect to a graph with p = 14 nodes and k = 4 cliques (see Supplementary Material for a representation of the graph).We consider a RIP-respecting sequence C 1 C 2 , C 3 , C 4 of with cardinalities C 1 = 8, C 2 = 5, C 3 = 3, C 4 = 2, giving rise to the following cardinalities for the corresponding sequence of separators: S 2 = 2, S 3 = 1, S 4 = 1.We generate data assuming that differences between the two conditions are attributable to nodes 1 and 2, located in C 1 .In particular, in one condition the means of the two elected nodes is set to be 1.5 times greater than the means of the same nodes in the other condition, while the variances are decreased by 50%.It follows that the null hypothesis of equality of distribution for X C 1 is false, since C 1 includes the two altered nodes.All remaining null hypotheses of equality of distribution for X R i X S i , i = 2, 3, 4, are true, thanks to the Markov properties of the graph.We run 10,000 simulations assuming n 1 = n 2 ∈ 10, 50, 100, 250 .For each sample, we compute the following statistics: The nominal Type I error rate is set to be α = 0.05.
Results are reported in Table 1 (see also the Supplementary Material for a simulation under the global null).Row 1 of Table 1 shows the empirical power of the test, while rows 2-4 show the empirical Type I error rates.For what concerns W n , note that for small sample sizes, the empirical Type I error rate is significantly higher than the nominal one, due to a large number of false rejections.This happens for all the local problems, but, for a fixed sample size, the number of false rejections largely depends on the dimension of the problem.As expected, this behavior decreases as the sample size increases, and asymptotically, the distribution of W n can be approximated with a chi-square.On the other hand, the adjusted statistic T n reaches the nominal size of the test for each considered sample size, regardless of the dimension of the local problems.The power of the test based on the adjusted statistic T n on the clique C 1 increases with the sample size.The high power observed for W n should not be misleading, as it highly depends on the false rejections due to the approximation issues already highlighted in Section 4. The adjusted statistic meets the expectations, being able to identify the altered clique, while controlling the Type I error of the remaining local tests.

Conclusions
In this paper, we proposed an adjusted LRT, which leads to valid inference at different dimensionality regimes.Our proposal overcomes some weaknesses of alternative corrections reported in the literature, that occur at small sample sizes and, in particular, when dimension p is close to n.We showed that the phase transition boundary of the LRT statistic corrected following our proposal is d = 1, indicating that the only condition needed to work is p/n 0. Simulations confirmed that the adjusted test statistic is well approximated by a chi-square distribution both for small and large values of p.
In the context of decomposable Gaussian graphical models, where the problem of testing equality of two networks breaks down into a sequence of problems defined on smaller sets of variables, our correction can help tackling the possibly high heterogeneity resulting from the decomposition in terms of dimensionality regimes.Our simulation study showed that the size of the test was reached for different configurations of p and n and, in the presence of a difference in two conditions, the adjusted statistic is able to detect it, still controlling the Type I error in the other cliques.Chi-square approximation of W n , W n ρ and T n .Empirical type-I error rate for n j ∈ 100, 500, 1000 , j = 1, 2 over 1000 simulations.The vertical dotted lines represents the phase transition boundaries for the three statistics: 1/2, 2/3 and 1, respectively.The horizontal dashed line represents the nominal significance level, 0.05.

Fig. 1 .
Fig. 1.Simulation results with n 1 = n 2 = 50 and p = 2, 30, 40.From the top to the bottom row: empirical distribution of W n , W n ρ , W n clt , and T n .The solid line in the first, second, and fourth rows shows the nominal χ 2 distribution, with 5, 495 and 860 degrees of freedom (from left to right) respectively.The solid line in the third row, corresponding to the W n clt statistic, shows the standard normal distribution.

Table 1
Power and Type I error computed for each term of the decomposition.Proportion of rejected tests out of 10 thousand simulations, for different sample sizes, with significance level