Mixtures of g-Priors for Analysis of Variance Models with a Diverging Number of Parameters

We consider Bayesian approaches for the hypothesis testing problem in analysis-of-variance (ANOVA) models. With the aid of the singular value decomposition of the centered design matrix, we reparameterize the ANOVA models with linear constraints for uniqueness into a standard linear regression model without any constraint. We derive the Bayes factors based on mixtures of g-priors and study their consistency properties with a growing number of parameters. It is shown that two commonly used hyper-priors on g (the Zellner-Siow prior and the beta-prime prior) yield inconsistent Bayes factors due to the presence of an inconsistency region around the null model. We propose a new class of hyper-priors to avoid this inconsistency problem. Simulation studies on the two-way ANOVA models are conducted to compare the performance of the proposed procedures with that of some existing ones in the literature.


Introduction
In the field of applied statistics, analysis-of-variance (ANOVA) is a collection of statistical models commonly used to test hypotheses about the presence of a group (treatment) effect. It has been widely recognized as an important tool to formulate evidence favoring certain theoretical positions and disfavoring others in various areas of application, such as agriculture (VanLeeuwen, 1997), biology (Lazic, 2008), ecology (Qian and Shen, 2007), and psychological studies (Rouder et al., 2012).
We deal with applications of hypothesis testing in multi-way ANOVA designs, which researchers employ to assess main effects and their interactions in experimental designs. Suppose that $Y = [y_1, \dots, y_n]'$ is a random sample of size $n$ drawn from a normal distribution with mean vector $\boldsymbol{\mu} = [\mu_1, \dots, \mu_n]'$ and covariance matrix $\sigma^2 I_n$, where $\boldsymbol{\mu}$ and $\sigma^2$ are both unknown, and $I_n$ is the $n \times n$ identity matrix. The corresponding model can be written as
$$Y \mid \boldsymbol{\mu}, \sigma^2 \sim N_n(\boldsymbol{\mu}, \sigma^2 I_n), \tag{1}$$
where $N_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes the multivariate normal distribution of dimension $n$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. The cell mean $\boldsymbol{\mu}$ can be further decomposed as $\boldsymbol{\mu} = \mathbf{1}_n \alpha + X\boldsymbol{\beta}$, where $\mathbf{1}_n$ is an $n \times 1$ vector of ones, $X$ represents an $n \times p$ design matrix, and $\boldsymbol{\beta}$ is a $p$-dimensional vector of unknown regression coefficients. The model can be re-expressed as
$$Y \sim N_n(\mathbf{1}_n \alpha + X\boldsymbol{\beta}, \sigma^2 I_n). \tag{2}$$
The columns of the design matrix $X$ are often referred to as the factors in the experiment; its entries are ones and zeros that describe how effect parameters map onto observations. The regression coefficients $\boldsymbol{\beta}$ can be viewed as level-specific parameters. Draper and Smith (1998) developed a dummy coding, available as model.matrix(·) in the R language (R Development Core Team, 2011), to construct $X$. Model (2) may not be identifiable, because there are a total of $p + 1$ parameters that determine the $p$ cell means, and thus some linear constraints are usually imposed for uniqueness. We here consider the sum-to-zero linear constraints proposed by Fujikoshi (1993). As a result, the intercept becomes the grand mean and each regression coefficient represents the deviation from the grand mean. The regression coefficient of the last level is equal to minus the sum of the other regression coefficients. Later on, we will justify that any linear constraint can be adopted without affecting the main results in this paper.
In the ANOVA models, we are interested in the model selection problem, in which we would like to select a model that is compatible with the observed data. This problem is equivalent to selecting a submodel of (2), whose mean is of the form
$$\boldsymbol{\mu} = \mathbf{1}_n \alpha + X_\gamma \boldsymbol{\beta}_\gamma, \tag{3}$$
where $X_\gamma$ is an $n \times k$ submatrix of $X$ and $\boldsymbol{\beta}_\gamma$ is a $k \times 1$ vector of unknown regression coefficients. Various procedures have been proposed for the above problem, ranging from frequentist ones such as p-values and the Akaike information criterion (AIC) to Bayesian methods. From a frequentist viewpoint, researchers routinely report p-values as a measure of evidence for competing positions, even though a number of critiques of p-values have been raised in the literature; see Rouder et al. (2012) for detailed comments on the topic. Recently, a growing chorus of researchers has advocated the use of Bayesian procedures as evidence; see, for example, Maruyama (2012), Rouder et al. (2012), Wetzels et al. (2012), Wang and Sun (2013), among others. Bayesian approaches to inference have many advantages over frequentist ones; see Berger and Pericchi (2001) for a detailed discussion.
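For the sake of simplicity, we describe the Bayesian formulation for comparing model (3) with the null model ($\boldsymbol{\mu} = \mathbf{1}_n \alpha$), which does not include any of the predictors. This formulation can be easily adjusted for other model comparison problems. Since the Bayesian approaches to model selection and hypothesis testing are conceptually the same (Guo and Speckman, 2009), we consider the hypothesis testing problem of the form
$$M_1: \boldsymbol{\mu} = \mathbf{1}_n \alpha \quad \text{versus} \quad M_\gamma: \boldsymbol{\mu} = \mathbf{1}_n \alpha + X_\gamma \boldsymbol{\beta}_\gamma. \tag{4}$$
Bayesian inference for this problem is based on the posterior model probabilities. The Bayes factor for comparing $M_\gamma$ to $M_1$ in (4) can be expressed as the ratio of the two marginal likelihood functions
$$\mathrm{BF}[M_\gamma : M_1] = \frac{p(Y \mid M_\gamma)}{p(Y \mid M_1)},$$
where the marginal likelihood of $Y$ given $M_\gamma$ is
$$p(Y \mid M_\gamma) = \int\!\!\int\!\!\int p(Y \mid \alpha, \boldsymbol{\beta}_\gamma, \sigma^2)\, \pi_\gamma(\alpha, \boldsymbol{\beta}_\gamma, \sigma^2)\, d\alpha\, d\boldsymbol{\beta}_\gamma\, d\sigma^2$$
and the marginal likelihood of $Y$ given $M_1$ is
$$p(Y \mid M_1) = \int\!\!\int p(Y \mid \alpha, \sigma^2)\, \pi_1(\alpha, \sigma^2)\, d\alpha\, d\sigma^2,$$
where $\pi_\gamma(\alpha, \boldsymbol{\beta}_\gamma, \sigma^2)$ and $\pi_1(\alpha, \sigma^2)$ are the joint prior densities for the unknown parameters under $M_\gamma$ and $M_1$, respectively. For instance, $\mathrm{BF}[M_\gamma : M_1] > 100$ provides "decisive" evidence against $M_1$. Note that the logarithm of the Bayes factor can be interpreted as the weight of evidence provided by the data; see Kass and Raftery (1995).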
We need to specify priors for the unknown parameters $\alpha$, $\boldsymbol{\beta}_\gamma$, and $\sigma^2$. We choose a 'noninformative' prior for the common parameters $\alpha$ and $\sigma^2$ that appear in both models and place a partially conjugate normal prior on $\boldsymbol{\beta}_\gamma$, which appears only in the alternative model. We consider Zellner's g-prior because it leads to a simple expression of the marginal likelihood. The choice of $g$ is crucial because it controls the amount of information in Zellner's g-prior. A nice review of mixtures of g-priors and of different fixed values of $g$ was provided by Liang et al. (2008). In this paper, we follow the suggestion of Ley and Steel (2012) and adopt a hyper-prior on $g$ to reflect its uncertainty and to allow the data to determine the inference on $g$. This hyper-prior must be proper: the null model does not involve $g$, so an improper prior would leave the Bayes factor defined only up to an arbitrary normalizing constant.
Since the seminal work of Zellner and Siow (1980), the Zellner-Siow (ZS) prior has been widely adopted for the unknown parameters in normal linear regression models. Liang et al. (2008) considered three families of hyper-priors for g (the ZS prior, the hyper-g prior, and the hyper-g/n prior) and studied Bayes factor consistency associated with these priors when the model dimension p is fixed. Here, consistency means that the true model (hypothesis) will eventually be detected if enough data is provided, assuming that it exists. Later on, Maruyama and George (2011) proposed the beta-prime prior on g, which yields an analytic Bayes factor without integral representation. They also proved that consistency holds with this prior under the same scenario. Ley and Steel (2012) conducted Monte Carlo simulations to compare the performance of various mixtures of g-priors in the literature. Wetzels et al. (2012) recently generalized mixtures of g-priors to the ANOVA models, but they did not examine the performance of the Bayes factors under these priors from either a theoretical or a practical point of view. Rouder et al. (2012) developed a set of Bayes factors based on multivariate generalizations of the Cauchy prior, but also did not establish the consistency of these Bayes factors, even though they emphasized consistency as a desirable theoretical property for a proposed procedure. Bayarri et al. (2012) considered consistency as one of the desired criteria that a model selection prior should satisfy in the context of normal linear regression models, whereas there is not much discussion of this topic in ANOVA designs. This paper fills the gap by studying Bayes factor consistency under various mixtures of g-priors in the ANOVA models.
Linear models with a growing number of parameters have received considerable attention in the literature. This is often referred to as the 'large p, large n' regime, in which there is a sizable number of predictors compared to the sample size. Consistency under this regime has been studied in linear models; see, for example, Moreno et al. (2010), Girón et al. (2010), Johnson and Rossell (2012), Wang and Sun (2014), among others. However, such studies are still quite scant in ANOVA settings (for exceptions, see Maruyama, 2012 and Wang and Sun, 2013). The question we address here is whether consistency still holds under this regime for the proposed Bayes factors in the ANOVA models. Specifically, we consider Bayes factor consistency when the model dimension increases with the sample size at rate $k = O(n^b)$, where $0 \le b \le 1$. As $n$ approaches infinity, we focus on the following two asymptotic scenarios:
Scenario 1: $k = O(n^b)$ with $0 \le b < 1$.
Scenario 2: $k = O(n^b)$ with $b = 1$, that is, $k$ growing proportionally to $n$.
We first consider consistency of the Bayes factors under two commonly used hyper-priors on g: the Zellner-Siow prior (Zellner and Siow, 1980) and the beta-prime prior (Maruyama and George, 2011). The first prior is often considered as a default choice for the unknown parameters in the literature; see, for example, Bayarri and García-Donato (2007), Wetzels et al. (2012), Rouder et al. (2012). The second one results in a closed-form Bayes factor, which can thus be calculated as easily as in the case of a fixed value of g. It deserves mentioning that the proposed results based on the second prior generalize some existing ones for the one-way/two-way ANOVA models studied by Maruyama (2012) and Wang and Sun (2013).
We then show that under very general conditions, the Bayes factors under the above two priors are consistent under the null model and are inconsistent under the alternative model when k grows proportionally to n. Although we can explicitly characterize the inconsistency regions in terms of a pseudo-distance between the two competing models, the inconsistency regions could lead to the rejection of the alternative hypothesis when it is true. Of particular note is that the inconsistency region under the Zellner-Siow prior is slightly larger than the one under the beta-prime prior. This finding may justify that the latter outperforms the former from a theoretical viewpoint.
To avoid the inconsistency regions mentioned above, we propose a new family of hyper-priors on g, which is very flexible and requires little or no a priori input. The Bayes factor based on the proposed prior not only has a simple expression involving a one-dimensional integral, but is also consistent whichever model is true and is not vulnerable to the Jeffreys-Lindley paradox or the information paradox. More importantly, we demonstrate that the proposed results in the ANOVA models are also valid for the hypothesis testing problem in linear models with a growing number of parameters. The study is relevant to researchers from both theoretical and practical viewpoints. From a theoretical perspective, it justifies the asymptotic behavior of the proposed procedures for choosing the true model when it exists. From a practical perspective, it provides a guideline for choosing appropriate mixtures of g-priors in applications.
The remainder of the paper is organized as follows. In Section 2, we use the two-way ANOVA model to illustrate how the singular value decomposition (SVD) can be implemented to convert the ANOVA model with constraints into a linear regression model without constraint. In Section 3, we specify priors for the unknown parameters and derive the Bayes factors based on various mixtures of g-priors. In Section 4, we investigate Bayes factor consistency under various mixtures of g-priors when $k = O(n^b)$ with $0 \le b \le 1$. In Section 5, simulation studies are conducted to evaluate the performance of the considered priors. Finally, some concluding remarks are presented in Section 6, with additional material and proofs given in the Supplementary Material (Wang, 2016).

Model reparameterization
In this section, we discuss the general development of Bayesian analysis for multi-way ANOVA models. As mentioned in Section 1, we cannot directly implement Zellner's g-prior for the regression coefficients, because the design matrix in (2) does not have full column rank. To overcome this difficulty, we impose the sum-to-zero constraints for uniqueness and then consider the SVD of the centered design matrix. Specifically, we reparameterize the ANOVA model with constraints for uniqueness into a standard linear regression model without any constraint. For the sake of simplicity, we choose the two-way unbalanced ANOVA model to illustrate the implementation of the full-rank model reparameterization.
Consider a factorial design with two treatment factors A and B having $a$ and $b$ levels, respectively, with a total of $ab$ factorial cells. Suppose that $y_{ijk}$ is the $k$th observation in the $(i, j)$th cell defined by the $i$th level of factor A and the $j$th level of factor B, satisfying the following model
$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \; k = 1, \dots, n_{ij}, \tag{6}$$
where the $\mu_{ij}$'s represent the cell means (expected values), and the residual errors $\varepsilon_{ijk}$ are assumed to be independent random variables, each having a normal distribution with mean zero and unknown variance $\sigma^2$. The value of $n_{ij}$ represents the number of observations in the $(i, j)$th cell. The total number of observations is $n = \sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij}$. Model (6) is balanced when $n_{11} = \cdots = n_{ab} = m$.
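We decompose the cell mean $\mu_{ij}$ into the form $\mu_{ij} = \mu + \alpha_i + \tau_j + \gamma_{ij}$ for $i = 1, \dots, a$ and $j = 1, \dots, b$, where $\mu$ is the grand mean, $\alpha_i$ and $\tau_j$ represent the $i$th and $j$th main effects of factors A and B, respectively, and $\gamma_{ij}$ is their $(i, j)$th interaction. Model (6) can be rewritten as
$$y_{ijk} = \mu + \alpha_i + \tau_j + \gamma_{ij} + \varepsilon_{ijk}. \tag{7}$$
The above model is not identifiable because the parameters $(\mu, \alpha_i, \tau_j, \gamma_{ij})$ cannot be uniquely defined. Following the work of Fujikoshi (1993), we impose the following sum-to-zero linear constraints
$$\sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \tau_j = 0, \quad \sum_{i=1}^{a} \gamma_{ij} = 0 \ (j = 1, \dots, b), \quad \sum_{j=1}^{b} \gamma_{ij} = 0 \ (i = 1, \dots, a). \tag{8}$$
Model (7) can be rewritten compactly in matrix form as
$$Y = \mathbf{1}_n \mu + Z_A \boldsymbol{\alpha} + Z_B \boldsymbol{\tau} + Z_C \boldsymbol{\gamma} + \boldsymbol{\varepsilon} = \mathbf{1}_n \mu + X_F \boldsymbol{\beta}_F + \boldsymbol{\varepsilon}, \tag{9}$$
where $\boldsymbol{\alpha} = [\alpha_1, \dots, \alpha_a]'$, $\boldsymbol{\tau} = [\tau_1, \dots, \tau_b]'$, $\boldsymbol{\gamma} = [\gamma_{11}, \dots, \gamma_{ab}]'$, $\boldsymbol{\beta}_F = [\boldsymbol{\alpha}', \boldsymbol{\tau}', \boldsymbol{\gamma}']'$, and $\boldsymbol{\varepsilon} = [\varepsilon_{111}, \dots, \varepsilon_{ab n_{ab}}]'$ follows the multivariate normal distribution with mean vector $\mathbf{0}_n$ and covariance matrix $\sigma^2 I_n$. We follow the notation of Searle et al. (1992, pp. 212-213) and let $Z_A$, $Z_B$ and $Z_C$ be matrices of orders $n \times a$, $n \times b$, and $n \times ab$, respectively. They are given by
$$Z_A = \{_d\, \mathbf{1}_{n_{i\cdot}}\}_{i=1}^{a}, \quad Z_B = \{_c\, \{_d\, \mathbf{1}_{n_{ij}}\}_{j=1}^{b}\}_{i=1}^{a}, \quad Z_C = \{_d\, \{_d\, \mathbf{1}_{n_{ij}}\}_{j=1}^{b}\}_{i=1}^{a},$$
where the use of $c$ and $d$ within the braces indicates that the corresponding partitioned matrices are of the column and diagonal types, respectively. By using the following useful products (Searle et al., 1992)
$$\mathbf{1}_n' Z_A = [n_{1\cdot}, \dots, n_{a\cdot}], \quad \mathbf{1}_n' Z_B = [n_{\cdot 1}, \dots, n_{\cdot b}], \quad \mathbf{1}_n' Z_C = [n_{11}, \dots, n_{1b}, n_{21}, \dots, n_{ab}],$$
it can be readily verified that
$$\bigl(X_F - [n_{1\cdot}, \dots, n_{a\cdot}, n_{\cdot 1}, \dots, n_{\cdot b}, n_{11}, \dots, n_{1b}, n_{21}, \dots, n_{ab}] \otimes \mathbf{1}_n / n\bigr)' \mathbf{1}_n = \mathbf{0}_{a+b+ab},$$
where $X_F = [Z_A, Z_B, Z_C]$ is an $n \times (a + b + ab)$ matrix and $\otimes$ stands for the Kronecker product. We thus treat $X_F - [n_{1\cdot}, \dots, n_{a\cdot}, n_{\cdot 1}, \dots, n_{\cdot b}, n_{11}, \dots, n_{1b}, n_{21}, \dots, n_{ab}] \otimes \mathbf{1}_n / n$ as the centered matrix of $X_F$.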
We consider the SVD of this centered matrix, namely,
$$X_F - [n_{1\cdot}, \dots, n_{a\cdot}, n_{\cdot 1}, \dots, n_{\cdot b}, n_{11}, \dots, n_{1b}, n_{21}, \dots, n_{ab}] \otimes \mathbf{1}_n / n = U \boldsymbol{\Sigma} V',$$
where $U$ and $V$ are $n \times (ab - 1)$ and $(a + b + ab) \times (ab - 1)$ orthogonal matrices, respectively (that is, $U'U = V'V = I_{ab-1}$), and $\boldsymbol{\Sigma}$ is an $(ab - 1) \times (ab - 1)$ diagonal matrix with positive diagonal entries. Model (9) is therefore equivalent to the standard linear regression model with unconstrained regression coefficient $\boldsymbol{\beta}^*_F$ given by
$$Y = \mathbf{1}_n \alpha + U \boldsymbol{\beta}^*_F + \boldsymbol{\varepsilon},$$
where $\alpha = \mu + [n_{1\cdot}, \dots, n_{a\cdot}, n_{\cdot 1}, \dots, n_{\cdot b}, n_{11}, \dots, n_{ab}]\, \boldsymbol{\beta}_F / n$ and $\boldsymbol{\beta}^*_F = \boldsymbol{\Sigma} V' \boldsymbol{\beta}_F$.
In the two-way ANOVA model, we are usually concerned with five realistic models that include main effects and interactions, the simplest of which is the null model $M_1$: no effect of factor A and no effect of factor B, i.e., $\boldsymbol{\alpha} = \mathbf{0}_a$, $\boldsymbol{\tau} = \mathbf{0}_b$, $\boldsymbol{\gamma} = \mathbf{0}_{ab}$. With the help of the above full-rank parametrization, each resulting model $M_\gamma$ can be rewritten as a standard linear model.
We have used the two-way unbalanced ANOVA model to illustrate how the SVD can be implemented to reparameterize the model with constraints into a standard linear regression model without constraint. In a similar way, we can generalize the previous developments to the hypothesis testing problem (4) in multi-way ANOVA models and obtain
$$M_1: Y = \mathbf{1}_n \alpha + \boldsymbol{\varepsilon} \quad \text{versus} \quad M_\gamma: Y = \mathbf{1}_n \alpha + U \boldsymbol{\beta}^*_\gamma + \boldsymbol{\varepsilon}, \tag{10}$$
where $\boldsymbol{\beta}^*_\gamma$ is a $k$-dimensional vector of regression coefficients and $U$ is an $n \times k$ orthogonal matrix such that $U'U = I_k$. It should be noted that the value of $k$ is determined by the levels of the factors in the alternative model under consideration. One advantage of such a reparameterization is that it avoids the difficulty of directly implementing Zellner's g-prior for the alternative model in (4), whose design matrix $X_\gamma$ does not have full column rank.
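The reparameterization described above can be sketched numerically. The following is a minimal illustration, assuming NumPy is available and that observations are ordered by cell; the function name `reparameterize` and the example cell counts are hypothetical and not part of the paper. It builds the indicator matrix $[Z_A, Z_B, Z_C]$ for a two-way layout, centers its columns, and keeps the left singular vectors associated with the nonzero singular values.

```python
import numpy as np

def reparameterize(cell_counts):
    """Return (X_F, U, s, Vt) for a two-way layout.

    cell_counts[i][j] is n_ij, the number of observations in cell (i, j);
    every cell is assumed to be non-empty.
    """
    counts = np.asarray(cell_counts, dtype=int)
    a, b = counts.shape
    rows = []
    for i in range(a):
        for j in range(b):
            row = np.zeros(a + b + a * b)
            row[i] = 1.0                   # indicator of level i of factor A
            row[a + j] = 1.0               # indicator of level j of factor B
            row[a + b + i * b + j] = 1.0   # indicator of cell (i, j)
            rows.extend([row] * counts[i, j])
    X_F = np.array(rows)
    X_c = X_F - X_F.mean(axis=0)           # subtract the column means
    U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s[0]))   # numerical rank, expected ab - 1
    return X_F, U[:, :rank], s[:rank], Vt[:rank]

# Hypothetical unbalanced 2 x 3 design with n = 17 observations.
X_F, U, s, Vt = reparameterize([[3, 2, 4], [1, 5, 2]])
print(U.shape)  # n rows and ab - 1 columns
```

The retained columns of `U` are orthonormal and orthogonal to the vector of ones, so they can serve as the full-rank design matrix of the unconstrained regression model.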
It is worth noting that Bayarri and García-Donato (2007) proposed a Bayes factor based on the Zellner-Siow prior for testing general hypotheses in normal linear models, which does not require the design matrix to be of full rank. The Bayes factor has a simple expression involving a one-dimensional integral. They generalized the Bayes factor to a variety of problems in which the null hypotheses are given by general linear restrictions. In the two-way ANOVA models, we observe that the constraints in (8) can be expressed in the form $C\boldsymbol{\beta}_F = \mathbf{0}$ for a suitable constraint matrix $C$. This observation shows that the model reparameterization based on the SVD in this paper is equivalent to the full-rank factorization technique described in Proposition 5 of Bayarri and García-Donato (2007). Consequently, we may conclude from Theorem 1 of Bayarri and García-Donato (2007) that any linear constraint can be adopted for uniqueness without affecting our Bayes factors and their corresponding theoretical properties in this paper.

Mixtures of g-priors
Bayesian analysis begins with prior specifications for the unknown model parameters.
For our hypothesis testing problem (10), we need to specify priors for the unknown parameters $\alpha$, $\sigma^2$, and $\boldsymbol{\beta}^*_\gamma$. We can regard $\alpha$ and $\sigma^2$ as 'common' parameters that appear in both models $M_1$ and $M_\gamma$ and thus specify the right-Haar prior
$$\pi(\alpha, \sigma^2) \propto \frac{1}{\sigma^2}.$$
One may refer to Berger and Pericchi (1996) for an asymptotic justification of the use of the same (even noninformative) prior on the common parameters. A more solid justification has recently been provided by Bayarri et al. (2012) based on group invariance and predictive matching arguments. Since $U'U = I_k$, we assign Zellner's g-prior on $\boldsymbol{\beta}^*_\gamma$,
$$\boldsymbol{\beta}^*_\gamma \mid \sigma^2, g \sim N_k(\mathbf{0}_k, g\sigma^2 I_k),$$
where $\mathbf{0}_k$ is a $k \times 1$ vector of zeros. The scaling factor $g$ controls the amount of information in the prior, so its choice is critical and will be discussed later. The resulting Bayes factor for comparing $M_\gamma$ and $M_1$ is given by
$$\mathrm{BF}[M_\gamma : M_1] = \frac{(1 + g)^{(n-k-1)/2}}{[1 + g(1 - R^2)]^{(n-1)/2}}, \tag{13}$$
where $R^2$ is the coefficient of determination of $M_\gamma$ in (10).
A large part of the literature deals with the choice of $g$, which can be categorized into two types: a fixed value of $g$ and a hyper-prior on $g$. George and Foster (2000) commented that fixed values of $g$ may cause some undesirable behavior: large values would favor the null model, while small values result in the prior dominating the likelihood. We thus choose a proper hyper-prior on $g$, denoted by $\pi(g)$, which yields the Bayes factor
$$\mathrm{BF}[M_\gamma : M_1] = \int_0^\infty (1 + g)^{(n-k-1)/2} [1 + g(1 - R^2)]^{-(n-1)/2}\, \pi(g)\, dg. \tag{14}$$
We observe that a number of hyper-priors on $g$ are obtained in the literature by specifying priors on the shrinkage factor $g/(1 + g)$, mainly because their properties are often evaluated in terms of this shrinkage factor; see Ley and Steel (2012). In the following subsections, we consider two commonly used hyper-priors on $g$ and propose a new type of hyper-prior to avoid the inconsistency issue encountered by the existing procedures.
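As an illustration of how a mixture Bayes factor of this kind can be evaluated, the sketch below integrates the fixed-$g$ Bayes factor $(1+g)^{(n-k-1)/2}[1+g(1-R^2)]^{-(n-1)/2}$ against a proper hyper-prior using only the Python standard library. The function names, the substitution $g = u/(1-u)$, the midpoint rule, and the grid size are choices made here for illustration, not the paper's procedure; the hyper-g prior with $a = 3$ is used purely as an example of a proper $\pi(g)$.

```python
import math

def log_bf_fixed_g(g, r2, n, k):
    """Log Bayes factor of M_gamma versus M_1 for a fixed value of g."""
    return (0.5 * (n - k - 1) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))

def log_bf_mixture(r2, n, k, log_prior, m=20000):
    """Log of the integral of BF(g) * pi(g) over g in (0, infinity).

    Uses g = u / (1 - u), dg = du / (1 - u)^2, a midpoint rule on (0, 1),
    and a log-sum-exp accumulation to avoid overflow for large n.
    """
    h = 1.0 / m
    terms = []
    for i in range(m):
        u = (i + 0.5) * h
        g = u / (1.0 - u)
        jacobian = -2.0 * math.log1p(-u)
        terms.append(log_bf_fixed_g(g, r2, n, k) + log_prior(g) + jacobian)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms)) + math.log(h)

def log_hyper_g(g, a=3.0):
    """Log density of the hyper-g prior (a - 2)/2 * (1 + g)^(-a/2)."""
    return math.log((a - 2.0) / 2.0) - 0.5 * a * math.log1p(g)

# Hypothetical inputs: n = 50 observations, k = 2 coefficients, R^2 = 0.3.
print(log_bf_mixture(0.3, 50, 2, log_hyper_g))
```

Any other proper hyper-prior can be plugged in by passing a different `log_prior` function; working on the log scale keeps the computation stable when $n$ is large.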

The Zellner-Siow prior
Inspired by Jeffreys' (1961) comments on using the Cauchy prior for testing a univariate normal mean, Zellner and Siow (1980) proposed the multivariate Cauchy distribution for the regression coefficients in normal linear models. This prior is equivalent to Zellner's g-prior with an inverse gamma distribution on $g$,
$$\pi_{ZS}(g) = \frac{(n/2)^{1/2}}{\Gamma(1/2)}\, g^{-3/2} \exp\left(-\frac{n}{2g}\right),$$
which corresponds to the following prior for the shrinkage factor $\vartheta = g/(1 + g)$:
$$\pi(\vartheta) = \sqrt{\frac{n}{2\pi}}\, \vartheta^{-3/2} (1 - \vartheta)^{-1/2} \exp\left(-\frac{n(1 - \vartheta)}{2\vartheta}\right), \quad 0 < \vartheta < 1.$$
The Zellner-Siow prior is often adopted to derive 'default' Bayes factor approaches in the Bayesian literature; see, for example, Wetzels et al. (2012), Rouder et al. (2012). The Bayes factor under this prior has a simple expression that can be easily calculated by one-dimensional numerical integration. Alternatively, we may employ Monte Carlo approximation by generating samples from an inverse gamma distribution (Bayarri and García-Donato, 2007) or the Laplace approximation with a change of variables (Liang et al., 2008).

The beta-prime prior
Maruyama and George (2011) recently assigned the beta-prime prior on $g$,
$$\pi_{BP}(g) = \frac{g^{t} (1 + g)^{-s-t-2}}{B(s + 1, t + 1)}, \quad s > -1, \; t > -1,$$
which is obtained by specifying a beta distribution with parameters $s + 1$ and $t + 1$ for $1 - \vartheta$. Later on, Maruyama (2012) derived several expressions of the Bayes factors based on this prior in the one-way/two-way ANOVA models. In this paper, we propose a unified expression of these Bayes factors, which can be implemented in the multi-way ANOVA models. Simple algebra shows that the Bayes factor can be unified as
$$\mathrm{BF}_{BP} = \frac{B(t + 1, s + k/2 + 1)}{B(s + 1, t + 1)}\; {}_2F_1\!\left(\frac{n - 1}{2},\, t + 1;\, s + t + \frac{k}{2} + 2;\, R^2\right), \tag{16}$$
where ${}_2F_1$ is the Gaussian hypergeometric function defined as
$${}_2F_1(\alpha, \beta; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma - \beta)} \int_0^1 t^{\beta - 1} (1 - t)^{\gamma - \beta - 1} (1 - zt)^{-\alpha}\, dt$$
with $\gamma > \beta > 0$. The ${}_2F_1$ in (16) can be numerically evaluated by using subroutines in the Cephes library (http://www.netlib.org/cephes). With a particular choice of $t$, we derive an explicit Bayes factor summarized in the following theorem, with its proof in Appendix A of the Supplementary Material.
Theorem 1. With $t = (n - k - 5)/2 - s$, the Bayes factor for the hypothesis testing problem (10) is given by
$$\mathrm{BF}_{BP} = \frac{\Gamma(k/2 + s + 1)\, \Gamma((n - k - 1)/2)}{\Gamma(s + 1)\, \Gamma((n - 1)/2)}\, (1 - R^2)^{-(n-k-3)/2 + s}. \tag{17}$$
Such a closed-form expression is unavailable for other choices of $t$. Following the suggestions of Maruyama and George (2011), we recommend the choice of $s \in (-1, -1/2]$ for practical applications. Of particular note is that when the number of parameters under $M_\gamma$ is bounded, these Bayes factors are asymptotically equivalent to the Bayesian information criterion (BIC), as summarized in the following theorem.

Theorem 2. When k is fixed and n is sufficiently large, the Bayes factors in (15) and (17) can be approximated by
Proof. The proof follows directly from Stirling's formula for the gamma function (in the Supplementary Material) and some algebraic manipulations.

Remark 1.
In the context of linear regression models, Moreno et al. (2015) also showed that the intrinsic Bayes factor is asymptotically equivalent to the BIC, illustrating that the three Bayes factors have the same asymptotic behavior as the BIC when k is bounded and n is sufficiently large.
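For the beta-prime prior with $t = (n - k - 5)/2 - s$, integrating the fixed-$g$ Bayes factor against the prior density gives the closed form $\Gamma(k/2+s+1)\Gamma((n-k-1)/2) / [\Gamma(s+1)\Gamma((n-1)/2)] \cdot (1-R^2)^{-(n-k-3)/2+s}$. The sketch below evaluates this expression on the log scale with the standard library; the function name and the default $s = -0.75$ (one value inside the recommended interval $(-1, -1/2]$) are illustrative assumptions.

```python
import math

def log_bf_beta_prime(r2, n, k, s=-0.75):
    """Log of the closed-form beta-prime Bayes factor with t = (n-k-5)/2 - s.

    Requires -1 < s < (n - k - 3) / 2 so that the implied t exceeds -1.
    """
    if not (-1.0 < s < (n - k - 3) / 2.0):
        raise ValueError("need -1 < s < (n - k - 3) / 2")
    log_const = (math.lgamma(k / 2.0 + s + 1.0)
                 + math.lgamma((n - k - 1) / 2.0)
                 - math.lgamma(s + 1.0)
                 - math.lgamma((n - 1) / 2.0))
    # (1 - R^2) raised to the power -(n - k - 3)/2 + s, on the log scale.
    return log_const - ((n - k - 3) / 2.0 - s) * math.log1p(-r2)

# Hypothetical inputs: n = 100 observations, k = 5 coefficients.
for r2 in (0.05, 0.2, 0.5):
    print(r2, log_bf_beta_prime(r2, 100, 5))
```

Because only log-gamma evaluations are needed, this Bayes factor is as cheap to compute as one with a fixed value of $g$.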

The proposed prior
In the g-prior framework, Ley and Steel (2012) commented that as the sample size $n$ increases, we should account for the fact that the information accrues with the sample size and that the inverse of the information matrix is of order $1/n$. They suggested placing the prior distribution on $g/n$, instead of $g$. Equivalently, we should choose a beta distribution on the shrinkage factor $\upsilon = g/(n + g)$. This suggestion results in the hyper-g/n prior given by
$$\pi(g) = \frac{a - 2}{2n} \left(1 + \frac{g}{n}\right)^{-a/2}, \tag{19}$$
which can be viewed as a modification of the hyper-g prior, $\pi_{HG}(g) = \frac{a - 2}{2} (1 + g)^{-a/2}$; see Liang et al. (2008) for details about the use of the hyper-g/n prior.
When the model dimension $k$ grows with $n$, it seems natural to take the accrual of information with both $n$ and $k$ into consideration. A possible way is to specify prior distributions on $g/r$ rather than $g$ or $g/n$, where $r = n/k$. This motivates us to choose a beta distribution with parameters $\alpha + 1$ and $\beta + 1$ for $1 - \phi$, where $\phi = g/(r + g)$ is a modified shrinkage factor used in place of $\upsilon = g/(n + g)$. The prior is given by
$$\pi(\phi) = \frac{\phi^{\beta} (1 - \phi)^{\alpha}}{B(\alpha + 1, \beta + 1)}, \quad 0 < \phi < 1,$$
which leads to the following hyper-prior on $g$:
$$\pi(g) = \frac{1}{r B(\alpha + 1, \beta + 1)} \left(\frac{g}{r}\right)^{\beta} \left(1 + \frac{g}{r}\right)^{-\alpha - \beta - 2}. \tag{20}$$
The prior in (20) is the Pearson type VI distribution (Pearson, 1895) with shape parameters $\alpha > -1$, $\beta > -1$, and scale parameter $r > 0$. This prior has a density whose right tail behaves like $(g/r)^{-(\alpha + 2)}$, thus providing very fat tails for small values of $\alpha$.
We suggest the choice of $\alpha \in (-1, -1/2]$. The Bayes factor under this prior has a simple expression involving a one-dimensional integral and can thus be easily computed by numerical integration techniques. Alternatively, we can use the Monte Carlo method by generating observations $t_1, \dots, t_N$ from the Pearson type VI distribution with the parameters $\alpha$, $\beta$, and $r$. The Bayes factor can be approximated by
$$\mathrm{BF}[M_\gamma : M_1] \approx \frac{1}{N} \sum_{i=1}^{N} (1 + t_i)^{(n-k-1)/2} [1 + t_i (1 - R^2)]^{-(n-1)/2}.$$
Note that Pearson type VI random variables can be easily generated by using rpearsonVI(·) in the R package PearsonDS; see Becker and Klößner (2013).
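The Monte Carlo approximation can be sketched with the Python standard library instead of R. The code below assumes the hyper-prior density is proportional to $(g/r)^{\beta}(1+g/r)^{-\alpha-\beta-2}$, the form consistent with the stated right-tail behavior $(g/r)^{-(\alpha+2)}$; under that assumption $1 - g/(r+g)$ follows a beta distribution with parameters $\alpha+1$ and $\beta+1$, so draws of $g$ are obtained by transforming beta variates. The function name, defaults, and seed are hypothetical.

```python
import math
import random

def mc_log_bf(r2, n, k, alpha=-0.75, beta=0.0, draws=50000, seed=0):
    """Monte Carlo estimate of the log Bayes factor under the proposed prior.

    Assumes 1 - phi ~ Beta(alpha + 1, beta + 1) with phi = g / (r + g)
    and r = n / k, then averages the fixed-g Bayes factor over the draws.
    """
    rng = random.Random(seed)
    r = n / k
    logs = []
    for _ in range(draws):
        one_minus_phi = rng.betavariate(alpha + 1.0, beta + 1.0)
        one_minus_phi = max(one_minus_phi, 1e-300)   # guard against underflow
        g = r * (1.0 - one_minus_phi) / one_minus_phi
        logs.append(0.5 * (n - k - 1) * math.log1p(g)
                    - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))
    top = max(logs)                                   # log-sum-exp average
    return top + math.log(sum(math.exp(v - top) for v in logs) / draws)

# Hypothetical inputs: n = 200 observations, k = 20 coefficients, R^2 = 0.25.
print(mc_log_bf(0.25, 200, 20))
```

For fixed $n$, $k$ and $R^2$ the averaged quantity is a bounded function of $g$, so the estimator has finite variance even though the prior on $g$ has very heavy tails.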
We consider several choices of the hyperparameters to illustrate the flexibility of the proposed prior in (20). We start from the hyper-g/r prior with $\alpha = a/2 - 2 > -1$ (i.e., $a > 2$) and $\beta = 0$,
$$\pi(g) = \frac{a - 2}{2r} \left(1 + \frac{g}{r}\right)^{-a/2}, \tag{21}$$
which is similar to the hyper-g/n prior in (19) proposed by Liang et al. (2008). The main difference between the two priors is that the prior in (21) depends on $k$ through $r$. This is a key feature for studying the asymptotic behavior of the Bayes factors under the scenario in which $k$ grows with $n$.
The proposed prior in (20) corresponds to the horseshoe prior (Carvalho et al., 2010), developed in a different setting, if we choose $\alpha = \beta = -1/2$. It is therefore of interest to study the effects of the shrinkage factor $\phi$ under this U-shaped prior with its spike around 0, which provides very strong shrinkage and thus induces zero regression coefficients in the shrinkage-prior framework. We thus consider the proposed prior with
$$\pi(g) = \frac{1}{\pi r} \left(\frac{g}{r}\right)^{-1/2} \left(1 + \frac{g}{r}\right)^{-1}. \tag{22}$$
The proposed prior in (20) is also related to a natural 'generic' prior distribution for the shrinkage factor $\phi$, namely the uniform prior on the interval $[0, 1]$, which is obtained with $\alpha = \beta = 0$. The prior is given by
$$\pi(g) = \frac{1}{r} \left(1 + \frac{g}{r}\right)^{-2}. \tag{23}$$
The uniform prior on $\phi$ is often used as a reference in Bayesian analysis.
Finally, we propose a general family of hyper-priors on $g$, given in (24), which is indexed by a function $f_r(\cdot)$ satisfying $\int_0^\infty f_r(t)\, dt = 1$. The Bayes factor under the prior in (24) is given in (25). It should be noted that the Bayes factor under the prior in (20) is just a special case of the one in (25) for a particular choice of $f_r(t)$.
We have considered three different types of hyper-priors on $g$ in the ANOVA models: the Zellner-Siow prior, the beta-prime prior, and the newly proposed prior. It is therefore of interest to compare the performance of the Bayes factors under these priors from both theoretical and practical points of view.

Bayes factor consistency
In this section, we consider the asymptotic behavior of the Bayes factors under various mixtures of g-priors. We focus on the information paradox in Section 4.1, and the model selection consistency in Section 4.2.

Information paradox of mixtures of g-priors
Suppose that, for the hypothesis testing problem in (10), model $M_\gamma$ accounts for an overwhelming amount of the variability of the data compared to model $M_1$. In this setting, with both $n$ and $k$ fixed, $R^2$ should approach 1 (or equivalently, the usual F-statistic approaches infinity). One would expect the Bayes factor in (13) to go to infinity as the information against $M_1$ accumulates, yet for a fixed choice of $g$ it converges to the constant $(1 + g)^{(n-k-1)/2}$. This phenomenon is often called the information paradox; see, for example, Jeffreys (1961), Wang and Sun (2013), among others.
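The information paradox for a fixed $g$ can be checked numerically. The short sketch below assumes the fixed-$g$ Bayes factor has the form $(1+g)^{(n-k-1)/2}[1+g(1-R^2)]^{-(n-1)/2}$ and shows that its logarithm increases toward, but never exceeds, the finite ceiling $\tfrac{1}{2}(n-k-1)\log(1+g)$ as $R^2 \to 1$. The function names and the values of $n$, $k$, and $g$ are hypothetical.

```python
import math

def log_bf_fixed_g(g, r2, n, k):
    """Log Bayes factor of M_gamma versus M_1 for a fixed value of g."""
    return (0.5 * (n - k - 1) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))

def log_bf_ceiling(g, n, k):
    """Limit of the fixed-g log Bayes factor as R^2 tends to 1."""
    return 0.5 * (n - k - 1) * math.log1p(g)

n, k, g = 30, 3, 30.0  # hypothetical sample size, dimension, and fixed g
for r2 in (0.9, 0.99, 0.999999):
    print(r2, log_bf_fixed_g(g, r2, n, k), log_bf_ceiling(g, n, k))
```

However strong the evidence in the data, the fixed-$g$ Bayes factor stays below this ceiling, which is the behavior that a suitable hyper-prior on $g$ is meant to remove.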
Theorem 3. With both $n$ and $k$ fixed satisfying $n > k + 1$, and $R^2 \to 1$, the Bayes factor in (15) avoids the information paradox, and the Bayes factor in (17) also avoids the information paradox when $-1 < s < (n - k - 3)/2$.
Proof. The proof of the result for the Bayes factor in (15) follows directly from Theorem 2 of Liang et al. (2008) and is thus omitted for simplicity. When $R^2 \to 1$, the Bayes factor in (17) approaches infinity for $-(n - k - 3)/2 + s < 0$, indicating that it avoids the information paradox given that $-1 < s < (n - k - 3)/2$.
We here present a general condition under which the proposed prior in (24) resolves the information paradox.
Theorem 4. With both $n$ and $k$ fixed satisfying $n > k + 1$, and $R^2 \to 1$, the Bayes factor in (25) avoids the information paradox whenever
$$\int_0^\infty (1 + rt^2)^{(n-k-1)/2} f_r(t)\, dt = \infty.$$
Proof. The integrand of the Bayes factor in (25) is a monotonically increasing function of $R^2$. By the monotone convergence theorem, the Bayes factor tends to $\int_0^\infty (1 + rt^2)^{(n-k-1)/2} f_r(t)\, dt$ as $R^2 \to 1$. Thus, the nonintegrability of $(1 + rt^2)^{(n-k-1)/2} f_r(t)$ is a necessary and sufficient condition for resolving the information paradox.
It deserves mentioning that the above three different mixtures of g-priors may fail to resolve the information paradox with a minimal sample size. For instance, the information paradox exists in the one-way ANOVA model with a fixed-effect having 2 levels with 1 observation in each level.

On consistency of mixtures of g-priors
We study Bayes factor consistency associated with various mixtures of g-priors when $k = O(n^b)$ for $0 \le b \le 1$. Consistency means that the true hypothesis will be selected if enough data is provided, assuming that one of the hypotheses is true. According to Fernández et al. (2001), the Bayes factor is said to be consistent if
$$\operatorname*{plim}_{n \to \infty} \mathrm{BF}[M_\gamma : M_1] = \infty \ \text{ if } M_\gamma \text{ is the true model}, \qquad \operatorname*{plim}_{n \to \infty} \mathrm{BF}[M_\gamma : M_1] = 0 \ \text{ if } M_1 \text{ is the true model},$$
where 'plim' stands for convergence in probability. As $n \to \infty$, the asymptotic behavior of the Bayes factor depends on the pseudo-distance between the two competing models. We denote the pseudo-distance from $M_\gamma$ to $M_1$ in (10) by $\delta_n$; it is defined in terms of $\boldsymbol{\beta}^*_\gamma$, the vector of regression coefficients of the alternative model. For simplicity of notation, we assume that under the alternative model, the limit of the distance exists and is denoted by $\delta = \lim_{n \to \infty} \delta_n$.
Theorem 5. We consider the Bayes factors BF BP in (17) and BF ZS in (15) for the hypothesis testing problem given by (4) in the ANOVA models.
Proof. See Appendix B of the Supplementary Material.
Part (a) of Theorem 5 shows that under Scenario 1, the two Bayes factors asymptotically choose the true model, whichever model is true. However, Part (b) indicates that under Scenario 2, they fail to be consistent under $M_\gamma$ due to the presence of a small inconsistency region around $M_1$, and that we can characterize the inconsistency region with the pseudo-distance between the two models. Figure 1 shows that as $r$ increases, the two regions approach each other and will eventually disappear when $r$ tends to infinity. In other words, the boundaries defined by equations (26) and (27) are both decreasing convex functions of $r$, satisfying $\lim_{r \to \infty} \xi(r) = 0$ and $\lim_{r \to \infty} Q(r, \tau) = 0$, with $\tau$ being the solution of equation (27). We observe from Figure 1 that the inconsistency region of $\mathrm{BF}_{BP}$ is narrower than that of $\mathrm{BF}_{ZS}$. We may thus conclude that $\mathrm{BF}_{BP}$ outperforms $\mathrm{BF}_{ZS}$ from a theoretical point of view.
In the context of the one-way ANOVA model, Theorem 5 seems to be in contradiction with Theorem 3.2 of Berger et al. (2003), which states that under Scenario 2 with known $\sigma^2$, the Bayes factor under the general form of the prior (Equation 14 of Berger et al., 2003) is always consistent under the alternative model if $\delta > 0$. Under the same scenario, Girón et al. (2010) also observed a similar contradictory result for the Bayes factor based on the intrinsic prior, whose inconsistency region is given by $\delta < \kappa(r)$, where $\kappa(r)$ is a decreasing function of $r$ satisfying $\lim_{r \to \infty} \kappa(r) = 0$. The main reason for this contradiction is that the prior of the general form studied by Berger et al. (2003) incorporates the prior with its variance tending to 0, indicating that the mass of the prior is getting less and less to any neighborhood of the null model. Thus, the inconsistency region around the null model will disappear as the variance of the prior approaches 0. We observe that the asymptotic behavior of $\mathrm{BF}_{ZS}$ is quite similar to that of the intrinsic Bayes factor, because the intrinsic prior can be represented as a scaled mixture of g-priors with a beta mixing distribution; see Lemma 1 of Womack et al. (2014). We refer the interested reader to Section 4 of Girón et al. (2010) for a further discussion of the inconsistency region of the intrinsic Bayes factor.
Figure 1: The inconsistency region (the area between the curve and the x-axis) for the Bayes factor under Scenario 2 when sampling from $M_\gamma$.
Theorem 6. We consider the Bayes factor BF HGr in (25) for the hypothesis testing problem given by (4)
This theorem shows that the Bayes factor under the proposed prior in (24) is always consistent, whichever model is true, when k = O(n b ) with 0 ≤ b ≤ 1.
As an illustration, we compare the performance of the Bayes factors under the various mixtures of g-priors in the one-way ANOVA model. To mimic the asymptotic scenario, let n = 1,000 and let the treatment factor have 100 levels, so that the average number of observations per level is r = 10. Suppose in particular that we take δ ∈ {0.2, 0.3, 0.35}. Numerical results are presented in Table 1. As one would expect, the alternative model M γ is more favorable than the null model M 1 with the large sample size. When δ = 0.2, BF BP and BF ZS are both in favor of M 1 . When δ = 0.3, BF BP provides strong evidence against M 1 , whereas BF ZS is still in favor of M 1 because δ = 0.3 < 0.3086. These results are quite reasonable because the two values of δ fall in the corresponding inconsistency regions. As expected, when δ = 0.35, they both correctly support M γ . This simple example confirms the results of Theorem 5 and shows that BF BP has a smaller inconsistency region than BF ZS when the model dimension grows proportionally to the sample size n. It also confirms the results of Theorem 6.
where p γ is the number of non-zero regression coefficients and R 2 γ is the ordinary coefficient of determination of model M γ . By replacing k and R 2 with p γ and R 2 γ , respectively, the Bayes factor in (14) in the ANOVA models is the same as the one proposed by Liang et al. (2008) in linear models. This connection shows that the proposed Bayes factors and their attractive properties (Theorems 5 and 6) remain valid in the context of linear regression models under the same asymptotic scenarios if we assume the existence of the pseudo-distance δ n from M γ to M 1 , defined in terms of the vector of regression coefficients β γ of model M γ . Equivalently, when the sample size and the number of parameters approach infinity, the limit of this distance exists and is given by δ = lim n→∞ δ n > 0 when sampling from M γ .
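The connection with Liang et al. (2008) can be made concrete with a small numerical sketch. The code below is an illustration, not the computation used in this paper: it assumes the fixed-g Bayes factor of Liang et al. (2008), (1 + g)^((n − 1 − p γ)/2) / [1 + g(1 − R 2 γ)]^((n − 1)/2), and the Zellner-Siow mixing density in their parameterization, g ∼ Inverse-Gamma(1/2, n/2). The function names, the integration grid, and the trapezoid rule are all hypothetical choices made for this sketch.

```python
import math


def log_bf_fixed_g(r2, n, p, g):
    """Log Bayes factor of M_gamma against the null model for a fixed g.

    Assumed closed form (Liang et al., 2008):
    (1 + g)^((n - 1 - p) / 2) / (1 + g (1 - R^2))^((n - 1) / 2).
    """
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


def log_bf_zellner_siow(r2, n, p, lo=-10.0, hi=50.0, steps=12000):
    """Approximate log Bayes factor when g ~ Inverse-Gamma(1/2, n/2).

    Integrates over u = log g with the trapezoid rule, working on the log
    scale (log-sum-exp) to avoid overflow. The grid limits are assumptions.
    """
    h = (hi - lo) / steps
    log_const = 0.5 * math.log(n / 2.0) - math.lgamma(0.5)
    log_terms = []
    for i in range(steps + 1):
        u = lo + i * h
        g = math.exp(u)
        # log of BF(g) * pi(g) * dg/du, with pi(g) the inverse-gamma density
        log_f = (log_bf_fixed_g(r2, n, p, g)
                 + log_const - 0.5 * u - n / (2.0 * g))
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        log_terms.append(log_f + math.log(weight))
    top = max(log_terms)
    total = sum(math.exp(t - top) for t in log_terms)
    return top + math.log(total) + math.log(h)
```

A call such as `log_bf_zellner_siow(r2, n=1000, p=99)` returns the approximate log Bayes factor for a one-way layout with 100 levels. Positive values favor M γ and negative values favor the null model.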

Simulation study
In this section, we conduct simulation studies to compare the performance of the Bayes factors under the various mixtures of g-priors in the two-way ANOVA models. We consider the three special choices of the proposed prior: the HGr1 prior in (21), the HGr2 prior in (22) with a = 3, and the HGr3 prior in (23). Along with these priors, we also consider two commonly used mixtures of g-priors: the Zellner-Siow (ZS) prior and the beta-prime (BP) prior with s = −1/2.
As an illustration, we mainly focus on the model comparison between M 5 and M 1 described in Section 2. The results from other model comparisons are similar and are thus omitted for simplicity. Without loss of generality, let μ = 0 and σ 2 = 1; see Min and Sun (2015). We simulate the data from the sampling models as follows. For the two-way ANOVA model (9) in Section 2, α is generated from N (0 a , gσ 2 I a ), τ from N (0 b , gσ 2 I b ), γ from N (0 ab , gσ 2 I ab ), and ε from N (0 n , σ 2 I n ), where g ∈ {0, 0.05, 0.2}. When g = 0, the data are from the null model M 1 ; when g > 0, the data are from the full model M 5 . To mimic the two asymptotic scenarios in the Introduction, we consider two different choices of {m, a, b} in model (6)
[...] BF ZS in terms of the RF of the correct model. These findings exactly match the claims in Theorems 5 and 6. Even when both k and r are large (not shown here), BF ZS and BF BP could still be negative occasionally and support the null model for g = 0.05 (i.e., the sampling model is M γ ). This undesirable situation occurs mainly because the two Bayes factors place substantial weight on the null model.
Numerical results in the two tables also show that BF ZS and BF BP behave similarly and are both strongly biased toward the null model (g = 0), leading to a smaller Type I error than the proposed Bayes factors. Consequently, they perform more poorly than the proposed Bayes factors when sampling from the alternative model. This is the price they pay for their smaller Type I error. We observe that the proposed Bayes factors display more balanced Type I and Type II error probabilities than the ones based on the ZS prior and the BP prior. We have also conducted simulation studies with other choices of (a, b, m) for the hypothesis testing problem in other ANOVA settings. The numerical findings are in good agreement with those mentioned above and are thus not shown here for simplicity.
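The data-generating scheme described at the start of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported results: it assumes a balanced layout in which a and b are the numbers of levels of the two factors and m is the number of replicates per cell, and it assumes the additive form y = μ + α i + τ j + γ ij + ε for model (9). The function names are hypothetical. The second function computes the R 2 of the full model M 5 against M 1 , using the fact that a two-way model with interactions fits the cell means.

```python
import random


def simulate_two_way(a, b, m, g, mu=0.0, sigma2=1.0, seed=None):
    """Draw a balanced two-way layout with m replicates per cell.

    Returns a list of (i, j, y) tuples, where i and j index the levels of
    the two factors. With g = 0 all effects vanish (null model).
    """
    rng = random.Random(seed)
    sd_effect = (g * sigma2) ** 0.5
    sd_noise = sigma2 ** 0.5
    # Effects drawn as described in the text: variance g * sigma^2 each
    alpha = [rng.gauss(0.0, sd_effect) for _ in range(a)]
    tau = [rng.gauss(0.0, sd_effect) for _ in range(b)]
    gamma = [[rng.gauss(0.0, sd_effect) for _ in range(b)] for _ in range(a)]
    data = []
    for i in range(a):
        for j in range(b):
            for _ in range(m):
                eps = rng.gauss(0.0, sd_noise)
                data.append((i, j, mu + alpha[i] + tau[j] + gamma[i][j] + eps))
    return data


def r_squared_full(data):
    """R^2 of the full cell-means model relative to the intercept-only model."""
    ys = [y for _, _, y in data]
    grand = sum(ys) / len(ys)
    sst = sum((y - grand) ** 2 for y in ys)
    cells = {}
    for i, j, y in data:
        cells.setdefault((i, j), []).append(y)
    sse = 0.0
    for values in cells.values():
        mean = sum(values) / len(values)
        sse += sum((y - mean) ** 2 for y in values)
    return 1.0 - sse / sst
```

Repeating `simulate_two_way` over many seeds for each g and recording how often a given Bayes factor selects the sampling model gives a Monte Carlo estimate of the frequency with which the correct model is chosen.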

Concluding remarks
We have studied Bayes factor consistency under various mixtures of g-priors for the hypothesis testing problem in the multi-way ANOVA models with a diverging number of parameters. It has been shown that the Bayes factors based on the ZS prior and the BP prior are not always consistent due to the presence of an inconsistency region around the null model when k grows proportionally to n. The Bayes factor based on the proposed family of hyper-priors avoids this undesirable inconsistency problem. Simulation results show that the proposed Bayes factors perform well and yield satisfactory results in terms of balancing Type I and Type II error probabilities under different simulation settings. Among the three special choices of the proposed prior, we prefer HGr1 because its overall performance is superior to that of the other two when sampling from the null model, while all three behave similarly when sampling from the alternative model. In ongoing work, we are further studying the performance of the Bayes factors with other choices of f r (t) given by (24).
The robust prior (Bayarri et al., 2012) has recently received much attention in the context of normal linear regression models, because it was originally developed based on a number of theoretical arguments. It includes the hyper-g prior and the hyper-g/n prior (Liang et al., 2008) as particular cases and is also closely connected with the ZS prior studied in this paper. These observations motivate us to study Bayes factor consistency under the robust prior in the ANOVA models with a growing number of parameters, which is currently under investigation and will be reported elsewhere.

Supplementary Material
Supplementary material for "Mixtures of g-priors for analysis of variance models with a diverging number of parameters" (DOI: 10.1214/16-BA1011SUPP; .pdf).