Abstract
Model selection for normal linear regression models with grouped covariates is considered under a class of Zellner’s \(g\)-priors. The marginal likelihood function is derived under the proposed priors, and a simplified closed-form expression is given assuming the commutativity of the projection matrices from the design matrices. As an illustration, the marginal likelihood functions of the balanced \(q\)-way ANOVA models, either solely with main effects or with all interaction effects, are calculated using the closed-form expression. The performance of the proposed priors in model comparison problems is demonstrated by simulation studies on two-way ANOVA models and by two real data studies.
Additional information
Most of the work was done while Xiaoyi Min was a graduate student at the University of Missouri. This research was supported in part by NSF Grants DMS-1007874, SES-1024080, SES-1260806, and NIH Grant R01DA016750.
Appendices
Appendix A: Proof of Theorem 1
Note that
If we write \(\tilde{\varvec{\beta }} = (\varvec{X}'\varvec{X}+\varvec{M})^{-1}\varvec{X}'\varvec{y},\)
where \(\varvec{R}\) is defined by (11). Therefore,
This proves (10).
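As a reading aid, the completing-the-square step behind this argument can be sketched as follows. The sketch assumes that the quadratic form integrated over \(\varvec{\beta }\) is \((\varvec{y}-\varvec{X}\varvec{\beta })'(\varvec{y}-\varvec{X}\varvec{\beta })+\varvec{\beta }'\varvec{M}\varvec{\beta }\) and that \(\varvec{R}\) in (11) equals \(\varvec{I}_n-\varvec{X}(\varvec{X}'\varvec{X}+\varvec{M})^{-1}\varvec{X}'\), the form used in Appendix B; both are assumptions about the setup of Theorem 1.

$$\begin{aligned} (\varvec{y}-\varvec{X}\varvec{\beta })'(\varvec{y}-\varvec{X}\varvec{\beta })+\varvec{\beta }'\varvec{M}\varvec{\beta }&= (\varvec{\beta }-\tilde{\varvec{\beta }})'(\varvec{X}'\varvec{X}+\varvec{M})(\varvec{\beta }-\tilde{\varvec{\beta }}) +\varvec{y}'\varvec{y}-\tilde{\varvec{\beta }}'(\varvec{X}'\varvec{X}+\varvec{M})\tilde{\varvec{\beta }}\\&= (\varvec{\beta }-\tilde{\varvec{\beta }})'(\varvec{X}'\varvec{X}+\varvec{M})(\varvec{\beta }-\tilde{\varvec{\beta }}) +\varvec{y}'\varvec{R}\varvec{y}. \end{aligned}$$

The second line uses \((\varvec{X}'\varvec{X}+\varvec{M})\tilde{\varvec{\beta }}=\varvec{X}'\varvec{y}\), so that \(\varvec{y}'\varvec{y}-\tilde{\varvec{\beta }}'(\varvec{X}'\varvec{X}+\varvec{M})\tilde{\varvec{\beta }}=\varvec{y}'\varvec{R}\varvec{y}\).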
Appendix B: Proof of Theorem 2
To prove Theorem 2, we first derive some of the necessary results in the following lemma.
Lemma 2
Suppose that (13) holds.
(a) Both \(\varvec{P}_{\varvec{\gamma }}\) in (14) and \(\varvec{A}_{\varvec{\gamma }}\) in (15) are projection matrices.
(b) For any \(\varvec{\gamma }\ne \varvec{\gamma }^*\subseteq \{0,1,\ldots ,m\}\), we have \(\varvec{A}_{\varvec{\gamma }}\varvec{A}_{\varvec{\gamma }^*}=\mathbf {0}\).
(c) We have the expression for the determinant,
$$\begin{aligned} \left| \varvec{I}_n+\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j}\right| = \prod _{\varvec{\gamma }\in \varvec{\Gamma }}{\left( 1+ \sum _{j \in \varvec{\gamma }}{g_j}\right) ^{p_{\varvec{\gamma }}}}. \end{aligned}\tag{22}$$
(d) We have the expression for the inverse,
$$\begin{aligned} \left[ \varvec{I}_n+\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j} \right] ^{-1} = \varvec{I}_n+\sum _{\varvec{\gamma }\in \varvec{\Gamma }} {u_{\varvec{\gamma }}(\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }}}, \end{aligned}\tag{23}$$
where \(u_{\varvec{\gamma }}\) is defined as in (18).
(e) \(u_{\varvec{\gamma }}\) defined in (18) satisfies the following property: for any \(\varvec{\gamma }_0\in \varvec{\Gamma }\),
$$\begin{aligned} \sum _{\emptyset \ne \varvec{\gamma }\subseteq \varvec{\gamma }_0}{u_{\varvec{\gamma }}} =-1+\frac{1}{1+\sum _{j\in \varvec{\gamma }_0}{g_j}}. \end{aligned}\tag{24}$$
Proof
Parts (a) and (b) are easy. For Part (c), the identity matrix \(\varvec{I}_n\) can be decomposed as \(\varvec{I}_n=\sum _{\varvec{\gamma }\subseteq \{0,\ldots ,m\}}{\varvec{A}_{\varvec{\gamma }}}\). Since \((\varvec{I}_n-\varvec{P}_0)\varvec{P}_j=\sum _{\varvec{\gamma }\in \varvec{\Gamma }:j\in \varvec{\gamma }}{\varvec{A}_{\varvec{\gamma }}}\), we know that
$$\begin{aligned} \varvec{I}_n+\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j} =\sum _{\varvec{\gamma }\notin \varvec{\Gamma }}{\varvec{A}_{\varvec{\gamma }}} +\sum _{\varvec{\gamma }\in \varvec{\Gamma }}{\left( 1+\sum _{j\in \varvec{\gamma }}{g_j}\right) \varvec{A}_{\varvec{\gamma }}}. \end{aligned}$$
\(\forall \varvec{\gamma }\subseteq \{0,\ldots ,m\}\), \(\varvec{A}_{\varvec{\gamma }}\) is idempotent and symmetric, whose eigenvalues are \(p_{\varvec{\gamma }}\) \(1\)’s and \((n-p_{\varvec{\gamma }})\) \(0\)’s. Therefore, there is an \(n\times p_{\varvec{\gamma }}\) matrix \(\varvec{B}_{\varvec{\gamma }}\) (if \(p_{\varvec{\gamma }}=0\), we let \(\varvec{B}_{\varvec{\gamma }}\) be a null matrix) such that \(\varvec{A}_{\varvec{\gamma }}=\varvec{B}_{\varvec{\gamma }}\varvec{B}_{\varvec{\gamma }}'\) and \(\varvec{B}_{\varvec{\gamma }}'\varvec{B}_{\varvec{\gamma }}=\varvec{I}_{p_{\varvec{\gamma }}}\). Note that for \(\varvec{\gamma }^*\ne \varvec{\gamma }\), \(\varvec{B}_{\varvec{\gamma }}'\varvec{B}_{\varvec{\gamma }^*}=\mathbf {0}_{p_{\varvec{\gamma }}\times p_{\varvec{\gamma }^*}}\). Further, if \(\varvec{\gamma }\in \varvec{\Gamma }\), write \(\varvec{C}_{\varvec{\gamma }}=\sqrt{1+\sum _{j \in \varvec{\gamma }}{g_j}}\varvec{B}_{\varvec{\gamma }};\) if \(\varvec{\gamma }\notin \varvec{\Gamma }\), define \(\varvec{C}_{\varvec{\gamma }}=\varvec{B}_{\varvec{\gamma }}.\) We then combine all \(\varvec{C}_{\varvec{\gamma }}\)’s side-by-side into an \(n\times n\) matrix \(\varvec{C}\) and get \(\bigg |\varvec{I}_n+\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j}\bigg |=|\varvec{C}\varvec{C}'|=|\varvec{C}'\varvec{C}|\), and (22) follows by noting that \(\varvec{C}'\varvec{C}\) is a block diagonal matrix with the diagonal parts being \((1+\sum _{j \in \varvec{\gamma }}{g_j})\varvec{I}_{p_{\varvec{\gamma }}}\) if \(\varvec{\gamma }\in \varvec{\Gamma }\), and \(\varvec{I}_{p_{\varvec{\gamma }}}\), otherwise.
For Part (d), note that
$$\begin{aligned} (\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }}\,(\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }^*} =(\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }\cup \varvec{\gamma }^*} \quad \text {for any } \varvec{\gamma },\varvec{\gamma }^*\in \varvec{\Gamma }. \end{aligned}$$
For (23) to hold, the product of \((\varvec{I}_n+\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j})\) and \((\varvec{I}_n+\sum _{\varvec{\gamma }\in \varvec{\Gamma }}{u_{\varvec{\gamma }} (\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }}})\) must equal \(\varvec{I}_n\), so the coefficient of each term \((\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }}\) in the product must be zero. We prove (23) by induction on \(|\varvec{\gamma }|\). For \((\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\{j\}}\), we have \(g_j+u_{\{j\}}+g_ju_{\{j\}}=0.\) This implies that \(u_{\{j\}}=-g_j/(1+g_j)\), so (18) holds when \(|\varvec{\gamma }|=1\). For \((\varvec{I}_n-\varvec{P}_0)\varvec{P}_{\varvec{\gamma }}\) with \(|\varvec{\gamma }|=k\ge 2\), if (18) holds for \(|\varvec{\gamma }|=k-1\), we have \(u_{\varvec{\gamma }}+\sum _{j\in \varvec{\gamma }}{g_j(u_{\varvec{\gamma }}+u_{\varvec{\gamma }\setminus \{j\}})}=0\), which implies that
$$\begin{aligned} u_{\varvec{\gamma }}=-\frac{\sum _{j\in \varvec{\gamma }}{g_j u_{\varvec{\gamma }\setminus \{j\}}}}{1+\sum _{j\in \varvec{\gamma }}{g_j}}. \end{aligned}$$
The conclusion (18) also holds for \(|\varvec{\gamma }|=k\). Thus, (23) and (18) are proved.
For Part (e), without loss of generality, we only prove (24) for \(\varvec{\gamma }_0=\{1,\ldots ,k\}\). In fact,
By the induction, we have
The lemma is proved. \(\square \)
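Parts (c), (d) and (e) of Lemma 2 can be checked numerically on a small example. The sketch below is illustrative only and rests on several assumptions that are not restated in this appendix: \(\varvec{A}_{\varvec{\gamma }}=\prod _{j\in \varvec{\gamma }}\varvec{P}_j\prod _{j\notin \varvec{\gamma }}(\varvec{I}_n-\varvec{P}_j)\) over \(j\in \{0,\ldots ,m\}\), \(\varvec{\Gamma }\) is the set of nonempty subsets of \(\{1,\ldots ,m\}\), \(p_{\varvec{\gamma }}\) is the rank of \(\varvec{A}_{\varvec{\gamma }}\), and \(u_{\varvec{\gamma }}\) is obtained by inverting (24), which may differ in form from (18). The balanced two-way layout, its sizes, and the values of \(g_j\) are hypothetical.

```python
import itertools
import numpy as np

def proj(X):
    """Orthogonal projector onto the column space of X."""
    return X @ np.linalg.pinv(X)

def subsets(items, min_size=0):
    """All subsets of `items` with at least `min_size` elements."""
    items = list(items)
    for k in range(min_size, len(items) + 1):
        for c in itertools.combinations(items, k):
            yield frozenset(c)

# Hypothetical balanced two-way layout: a x b cells, r replicates per cell.
a, b, r = 3, 2, 2
n = a * b * r
fa = np.repeat(np.arange(a), b * r)          # level of factor A per row
fb = np.tile(np.repeat(np.arange(b), r), a)  # level of factor B per row

I = np.eye(n)
P = {
    0: proj(np.ones((n, 1))),             # intercept
    1: proj(np.eye(a)[fa]),               # main effect A
    2: proj(np.eye(b)[fb]),               # main effect B
    3: proj(np.eye(a * b)[fa * b + fb]),  # cell means (interaction)
}
m = 3
g = {1: 0.7, 2: 1.9, 3: 0.4}              # arbitrary positive g_j
Gamma = list(subsets(range(1, m + 1), min_size=1))  # assumed index set

def g_sum(gamma):
    return sum(g[j] for j in gamma)

def P_gamma(gamma):
    """Product of P_j over j in gamma (the projectors commute here)."""
    out = I
    for j in gamma:
        out = out @ P[j]
    return out

def A_gamma(gamma):
    """Assumed form of A_gamma: P_j for j in gamma, I - P_j otherwise."""
    out = I
    for j in range(m + 1):
        out = out @ (P[j] if j in gamma else I - P[j])
    return out

def u(gamma):
    """u_gamma from Moebius inversion of (24); an assumption about (18)."""
    return sum((-1) ** (len(gamma) - len(t)) / (1 + g_sum(t))
               for t in subsets(gamma))

Q0 = I - P[0]
S = I + sum(g[j] * Q0 @ P[j] for j in range(1, m + 1))

# Part (c): determinant formula (22), with p_gamma = trace of A_gamma.
p = {gam: int(round(np.trace(A_gamma(gam)))) for gam in Gamma}
det_lhs = np.linalg.det(S)
det_rhs = np.prod([(1 + g_sum(gam)) ** p[gam] for gam in Gamma])

# Part (d): inverse formula (23).
inv_lhs = np.linalg.inv(S)
inv_rhs = I + sum(u(gam) * Q0 @ P_gamma(gam) for gam in Gamma)

# Part (e): largest deviation from identity (24) over all gamma_0 in Gamma.
e_gap = max(abs(sum(u(t) for t in subsets(g0, min_size=1))
                - (-1 + 1 / (1 + g_sum(g0)))) for g0 in Gamma)

print(np.isclose(det_lhs, det_rhs), np.allclose(inv_lhs, inv_rhs), e_gap < 1e-12)
```

In this layout only \(\{3\}\), \(\{1,3\}\) and \(\{2,3\}\) give a nonzero \(\varvec{A}_{\varvec{\gamma }}\), so the product in (22) reduces to three factors.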
Now we are ready to prove Theorem 2. To calculate \(\varvec{R}\), we write
where \(\varvec{X}^*\) is defined in (8). Then \(\varvec{R}= \varvec{I}_n-\varvec{X}\varvec{D}'[\varvec{D}(\varvec{X}'\varvec{X})\varvec{D}'+\varvec{D}\varvec{M}\varvec{D}']^{-1}\varvec{D}\varvec{X}'\). We get \([\varvec{D}(\varvec{X}'\varvec{X})\varvec{D}'+\varvec{D}\varvec{M}\varvec{D}']^{-1}= \mathrm{diag}( (\varvec{X}_0'\varvec{X}_0)^{-1}, ~ (\varvec{X}^{*'}(\varvec{I}_n-\varvec{P}_0)\varvec{X}^*+\varvec{M}_1)^{-1})\). Define \(\tilde{\varvec{X}}=(\varvec{I}_n-\varvec{P}_0)\varvec{X}^*,\) then \(\varvec{R}=(\varvec{I}_n-\varvec{P}_0)-\tilde{\varvec{X}}(\tilde{\varvec{X}}'\tilde{\varvec{X}}+\varvec{M}_1)^{-1}\tilde{\varvec{X}}'\). Use the fact that for invertible matrices \(\varvec{\Phi }\) and \(\varvec{\Delta }\), \((\varvec{\Phi }+\varvec{\omega }\varvec{\Delta }\varvec{\omega }')^{-1}=\varvec{\Phi }^{-1}-\varvec{\Phi }^{-1}\varvec{\omega }(\varvec{\Delta }^{-1}+\varvec{\omega }'\varvec{\Phi }^{-1}\varvec{\omega })^{-1}\varvec{\omega }'\varvec{\Phi }^{-1}\), and define \(\varvec{O}=\tilde{\varvec{X}}\varvec{M}_1^{-1}\tilde{\varvec{X}}'\), then
$$\begin{aligned} \tilde{\varvec{X}}(\tilde{\varvec{X}}'\tilde{\varvec{X}}+\varvec{M}_1)^{-1}\tilde{\varvec{X}}' =\varvec{O}-\varvec{O}(\varvec{I}_n+\varvec{O})^{-1}\varvec{O} =\varvec{I}_n-(\varvec{I}_n+\varvec{O})^{-1}. \end{aligned}\tag{26}$$
Also,
$$\begin{aligned} \varvec{O}=\tilde{\varvec{X}}\varvec{M}_1^{-1}\tilde{\varvec{X}}' =\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j}. \end{aligned}\tag{27}$$
Applying (27) and (23) to (26), we get \(\tilde{\varvec{X}}(\tilde{\varvec{X}}'\tilde{\varvec{X}}+\varvec{M}_1)^{-1}\tilde{\varvec{X}}' \!=\!-\sum _{\varvec{\gamma }\in \varvec{\Gamma }}{u_{\varvec{\gamma }}(\varvec{I}_n\!-\!\varvec{P}_0)\varvec{P}_{\varvec{\gamma }}}\). This proves (17). Next, we calculate \(|\varvec{X}'\varvec{X}+\varvec{M}|\). For \(\varvec{D}\) defined in (25), \(|\varvec{X}'\varvec{X}+\varvec{M}|=|\varvec{D}\varvec{X}'\varvec{X}\varvec{D}'+\varvec{D}\varvec{M}\varvec{D}'| = |\varvec{X}_0'\varvec{X}_0|\,|\tilde{\varvec{X}}' \tilde{\varvec{X}}+ \varvec{M}_1|\). Using the identity \(|\varvec{\omega }\varvec{\Delta }\varvec{\omega }'+\varvec{\Phi }|=|\varvec{\Delta }|\,|\varvec{\Phi }|\,|\varvec{\Delta }^{-1}+\varvec{\omega }'\varvec{\Phi }^{-1}\varvec{\omega }|,\) we have
$$\begin{aligned} |\tilde{\varvec{X}}'\tilde{\varvec{X}}+\varvec{M}_1| =|\varvec{M}_1|\,|\varvec{I}_n+\tilde{\varvec{X}}\varvec{M}_1^{-1}\tilde{\varvec{X}}'|. \end{aligned}$$
By Part (c) of Lemma 2,
$$\begin{aligned} |\varvec{X}'\varvec{X}+\varvec{M}| =|\varvec{X}_0'\varvec{X}_0|\,|\varvec{M}_1|\,\left| \varvec{I}_n+\sum _{j=1}^{m}{g_j(\varvec{I}_n-\varvec{P}_0)\varvec{P}_j}\right| =|\varvec{X}_0'\varvec{X}_0|\,|\varvec{M}_1|\prod _{\varvec{\gamma }\in \varvec{\Gamma }}{\left( 1+\sum _{j\in \varvec{\gamma }}{g_j}\right) ^{p_{\varvec{\gamma }}}}. \end{aligned}\tag{28}$$
The conclusion (16) follows by plugging (28) into (10). The theorem is proved.
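The two matrix identities used in this proof, the inversion identity for \((\varvec{\Phi }+\varvec{\omega }\varvec{\Delta }\varvec{\omega }')^{-1}\) and the determinant identity for \(|\varvec{\omega }\varvec{\Delta }\varvec{\omega }'+\varvec{\Phi }|\), can be sanity-checked numerically. The following is a minimal sketch on random matrices, not the authors' computation: the dimensions, the seed, and the helper `random_spd` are hypothetical, and `Xt` is only a random stand-in for \(\tilde{\varvec{X}}\) rather than a centered design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3  # hypothetical sizes: n observations, p columns

def random_spd(k):
    """Random symmetric positive-definite k x k matrix (hypothetical helper)."""
    B = rng.standard_normal((k, k))
    return B @ B.T + k * np.eye(k)

Phi = random_spd(p)                   # plays the role of M_1
Delta = random_spd(n)
omega = rng.standard_normal((p, n))   # plays the role of X-tilde'

Phi_inv = np.linalg.inv(Phi)
inner = np.linalg.inv(Delta) + omega.T @ Phi_inv @ omega

# Inversion identity quoted in the proof.
woodbury_lhs = np.linalg.inv(Phi + omega @ Delta @ omega.T)
woodbury_rhs = Phi_inv - Phi_inv @ omega @ np.linalg.inv(inner) @ omega.T @ Phi_inv

# Determinant identity quoted in the proof.
det_lhs = np.linalg.det(omega @ Delta @ omega.T + Phi)
det_rhs = np.linalg.det(Delta) * np.linalg.det(Phi) * np.linalg.det(inner)

# Special case Delta = I_n: with O = Xt M_1^{-1} Xt',
# Xt (Xt'Xt + M_1)^{-1} Xt' = I_n - (I_n + O)^{-1}.
Xt = omega.T
O = Xt @ Phi_inv @ Xt.T
hat_lhs = Xt @ np.linalg.inv(Xt.T @ Xt + Phi) @ Xt.T
hat_rhs = np.eye(n) - np.linalg.inv(np.eye(n) + O)

print(np.allclose(woodbury_lhs, woodbury_rhs),
      np.isclose(det_lhs, det_rhs),
      np.allclose(hat_lhs, hat_rhs))
```

The special case is the one the proof relies on: it turns the \(p\)-dimensional inverse involving \(\varvec{M}_1\) into an \(n\times n\) expression in \(\varvec{O}\), to which Lemma 2 applies.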
Appendix C: Proof of Theorem 3
For each \(\varvec{\gamma }\in \varvec{\Gamma }\), define \(\xi _{\varvec{\gamma }}=\xi (\varvec{\gamma })=\bigcap _{\xi \in \varvec{\gamma }}{\xi }\). We first show that for any \(\varvec{\gamma }\in \varvec{\Gamma }\),
In fact, by the definition of \(\xi (\varvec{\gamma })\), \(\forall \tau \in \varvec{\gamma },~\xi (\varvec{\gamma }) \subseteq \tau \). On the other hand, if \(\exists \tau \supseteq \xi (\varvec{\gamma }) ~ \mathrm{s.t.}~\tau \notin \varvec{\gamma }\), then \((\varvec{I}_n-\varvec{P}_{\tau })\varvec{P}_{\xi _{\varvec{\gamma }}}=\mathbf {0}_{n\times n}\) in \(\varvec{A}_{\varvec{\gamma }}\). So (29) holds. Therefore, \(\forall \varvec{A}_{\varvec{\gamma }}\ne \mathbf {0}\),
so \(p_{\varvec{\gamma }}=\sum _{\tau \subseteq \xi _{\varvec{\gamma }}} {(-1)^{|\xi _{\varvec{\gamma }}|-|\tau |}p_{\tau }}\). In (16),
Next, we need to calculate \(\varvec{R}\) in this case. In (17), \(\varvec{P}_{\varvec{\gamma }}=\prod _{\xi \in \varvec{\gamma }}{\varvec{P}_{\xi }} =\varvec{P}_{\xi _{\varvec{\gamma }}}\). Therefore, in (17),
The theorem will be proved given the following lemma.
Lemma 3
For nonempty \(\tau \subseteq \{1,2,\ldots ,q\}\), define \(U'_{\tau }=\sum _{\varvec{\gamma }\in \varvec{\Gamma }:\xi (\varvec{\gamma })=\tau }{u_{\varvec{\gamma }}}\). Then we have
Proof
For (30), note that for any \(\varvec{\gamma }\in \varvec{\Gamma }\), \(\xi (\varvec{\gamma })\supseteq \tau \Leftrightarrow \varvec{\gamma }\subseteq \{\tau ^*:\tau ^*\supseteq \tau \}\). Therefore, using Lemma 2(e), we get
$$\begin{aligned} \sum _{\tau ^*\supseteq \tau }{U'_{\tau ^*}} =\sum _{\varvec{\gamma }\in \varvec{\Gamma }:\xi (\varvec{\gamma })\supseteq \tau }{u_{\varvec{\gamma }}} =\sum _{\emptyset \ne \varvec{\gamma }\subseteq \{\tau ^*:\tau ^*\supseteq \tau \}}{u_{\varvec{\gamma }}} =-1+\frac{1}{1+\sum _{\tau ^*\supseteq \tau }{g_{\tau ^*}}}. \end{aligned}$$
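For (31), we use mathematical induction. If \(\tau =\{1,\ldots ,q\}\), (31) is exactly (30). If \(\tau =\{1,\ldots ,q-1\}\), from (30), we have \(U'_{\{1,\ldots ,q\}}+U'_{\{1,\ldots ,q-1\}} =-1+1/(1+g_{\{1,\ldots ,q-1\}}+g_{\{1,\ldots ,q\}})\). Since \(U'_{\{1,\ldots ,q\}}=-1+1/(1+g_{\{1,\ldots ,q\}})\) by the first case, this implies that
$$\begin{aligned} U'_{\{1,\ldots ,q-1\}} =\frac{1}{1+g_{\{1,\ldots ,q-1\}}+g_{\{1,\ldots ,q\}}} -\frac{1}{1+g_{\{1,\ldots ,q\}}}. \end{aligned}$$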
This proves that (31) holds for \(\tau \) with \(|\tau |=q-1\). Clearly, (30) implies a recursive formula,
$$\begin{aligned} U'_{\tau } =-1+\frac{1}{1+\sum _{\tau ^*\supseteq \tau }{g_{\tau ^*}}} -\sum _{\tau ^*\supsetneq \tau }{U'_{\tau ^*}}. \end{aligned}$$
Suppose (31) holds for \(\tau \) with \(|\tau |=k+1\), then
Therefore, (31) holds for \(\tau \) with \(|\tau |=k\). Repeating this procedure recursively, we can show that (31) holds for any nonempty \(\tau \subsetneq \{1,\ldots ,q\}\). The lemma is proved. \(\square \)
About this article
Cite this article
Min, X., Sun, D. Bayesian model selection for a linear model with grouped covariates. Ann Inst Stat Math 68, 877–903 (2016). https://doi.org/10.1007/s10463-015-0518-9
DOI: https://doi.org/10.1007/s10463-015-0518-9