Bayesian variable selection for logistic mixed model with nonparametric random effects
Introduction
In longitudinal studies, logistic mixed models (Drum and McCullagh, 1993, Noortgate and Boeck, 2005) are widely used for clustered binary data to study the relationship between the response and covariates. Random effects are generally incorporated to account for subject-specific variation and are routinely assumed to follow a normal distribution with mean zero. However, this assumption might not be realistic, and one might question the validity of inferences on the mixed effects when it is violated. Moreover, a flexible specification for the random effects that allows multimodality or skewness might provide insight into heterogeneity and even reveal a failure to include important covariates in the model. Such concerns have motivated many nonparametric approaches for the random effects. Zhang and Davidian (2001) approximated the random effects distribution by the seminonparametric approach of Gallant and Tauchen (1987). Chen et al. (2002) further extended it to generalized linear mixed models (GLMMs). Other frequentist approaches have also been proposed, such as Lai and Shih (2003) and Ghidey et al. (2004). Alternatively, many Bayesian nonparametric approaches using the Dirichlet process (DP) (Ferguson, 1973) and DP mixtures (DPM) have been proposed; readers can refer to Bush and MacEachern (1996), Kleinman and Ibrahim (1998), Ishwaran and Takahara (2002), among many others. However, these methods do not address uncertainty about which predictors to include in the mixed effects of the model.
Typically, a random effect is included when the corresponding coefficient is expected to vary among subjects. However, a practical problem is how to decide which predictors have coefficients varying among subjects. Standard approaches such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), generalized information criterion (GIC) and Bayes factor (BF) generally compare a small number of models by enumeration. Such methods do not work well when the number of potential predictors is large, because the number of possible models increases exponentially with the number of predictors. For example, with fixed effects and random effects, the total number of possible models is . When , the total number of models is well above one million.
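The exponential growth can be made concrete with a small sketch. It assumes that each candidate fixed effect and each candidate random effect can be included or excluded independently, so the count is a power of two; the function name and the values p = q = 10 are hypothetical and are not taken from the article.

```python
def count_candidate_models(p: int, q: int) -> int:
    """Number of models when each of p fixed effects and q random
    effects is either in or out (assumption: no hierarchy constraints)."""
    return 2 ** (p + q)


# Hypothetical example: 10 candidate fixed and 10 candidate random effects.
n_models = count_candidate_models(10, 10)
print(n_models)  # 1048576
```

Under a constraint such as "a random effect may enter only if its fixed effect is selected", the count would be smaller, but it would still grow exponentially in the number of predictors.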
Unlike in the linear mixed effects (LME) model (Laird and Ware, 1982), the likelihood in logistic mixed models has a rather complicated form because of the random effects. Likelihood-based inference requires integration over the dimensions of the random effects, which is often intractable even with a simple normal distribution. For this reason, researchers have proposed Laplace and other approximation approaches, for example, Schall (1991) and Breslow and Clayton (1993). However, such approaches may result in biased estimates of the fixed effects (Breslow and Lin, 1995, Lin and Breslow, 1996). To resolve this difficulty, some Bayesian and Monte Carlo methods have been developed to circumvent the intensive integration. Zeger and Karim (1991) used Gibbs sampling for the random effects. McCulloch (1997) and Booth and Hobert (1999) used Monte Carlo EM algorithms to maximize the likelihood.
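To illustrate the integration problem, the following minimal sketch evaluates the marginal log-likelihood of one subject in a logistic model with a single normal random intercept, using Gauss-Hermite quadrature. The function name, the argument names and the restriction to one random intercept are assumptions made for illustration; this is not the approach taken in the article.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def marginal_loglik_subject(y, eta_fixed, sigma, n_nodes=30):
    """log of the integral over b of prod_j Bernoulli(y_j | expit(eta_fixed_j + b))
    times the N(0, sigma^2) density, approximated by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)        # rule for the weight exp(-x^2)
    b = np.sqrt(2.0) * sigma * nodes           # change of variables b = sqrt(2)*sigma*x
    eta = eta_fixed[None, :] + b[:, None]      # shape (n_nodes, n_obs)
    # Bernoulli log-likelihood at each node: y*eta - log(1 + exp(eta))
    loglik = (y[None, :] * eta - np.logaddexp(0.0, eta)).sum(axis=1)
    return np.log(np.sum(weights * np.exp(loglik)) / np.sqrt(np.pi))


# Hypothetical subject with four binary observations.
y = np.array([1.0, 0.0, 1.0, 1.0])
eta_fixed = np.array([0.2, -0.4, 0.1, 0.8])
print(marginal_loglik_subject(y, eta_fixed, sigma=1.0))
```

With a q-dimensional random effect, a tensor-product rule of this kind needs n_nodes to the power q evaluations per subject, which is one way to see why the integration becomes impractical as the dimension grows.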
For mixed effects models, it is desirable to accommodate uncertainty about which predictors to include in the model. Bayesian methods can accommodate such flexibility and avoid cumbersome integration by using MCMC algorithms. In addition, one can easily draw inferences from the variable selection results, for example, posterior inclusion probabilities of the mixed effects and posterior model probabilities. Kuo and Mallick (1998) and George and McCulloch (1993, 1997) used Bayesian variable selection for the general linear model. Chen and Dunson (2003) used the Cholesky decomposition for the random effects. Kinney and Dunson (2007) extended the approach to the logistic mixed model. Bondell et al. (2010) proposed a penalized joint likelihood with an adaptive penalty for joint selection of both fixed and random effects. Ibrahim et al. (2010) used maximum penalized likelihood estimation for fixed and random effects selection. However, none of these approaches allows a flexible specification for the random effects. With a nonparametric specification of uncentered random effects, the mean of the random effects is generally not zero, which causes an identifiability problem with the fixed effects and ultimately incurs bias. Cai and Dunson (2010) proposed a nonparametric random effects model without addressing the potential bias, although they might adopt the approach of Yang and Dunson (2010), Yang et al. (2010) and Li et al. (2011) to reduce it. However, interpretation under variable selection is then difficult, in particular when a fixed effect is selected but the corresponding random effect is not. For this reason, Yang (2010) used centered Dirichlet process mixture models for the random effects. To the best of the author's knowledge, no method has been proposed for GLMMs that simultaneously addresses joint selection of mixed effects, flexible prior specification and bias control.
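As an illustration of how selection results can be read off MCMC output, the sketch below summarizes a matrix of sampled inclusion indicators (one row per iteration, one column per candidate effect). The function name and the toy draws are hypothetical; the article's own sampler and output format are not shown in this excerpt.

```python
import numpy as np


def summarize_selection(indicator_draws):
    """Return posterior inclusion probabilities, the visited models
    (most frequent first), and their estimated posterior probabilities."""
    draws = np.asarray(indicator_draws, dtype=int)
    inclusion_prob = draws.mean(axis=0)                # share of draws with effect included
    models, counts = np.unique(draws, axis=0, return_counts=True)
    order = np.argsort(-counts)                        # most visited model first
    model_prob = counts[order] / draws.shape[0]
    return inclusion_prob, models[order], model_prob


# Toy draws for three candidate effects over four iterations.
toy = [[1, 0, 1], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
incl, models, probs = summarize_selection(toy)
print(incl)       # [1.   0.25 0.75]
print(models[0])  # [1 0 1]
print(probs[0])   # 0.75
```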
In this article, we address variable selection for the logistic mixed model with nonparametric random effects. The article is organized as follows: Section 2 describes the logistic mixed model. Section 3 describes the approach to joint selection of fixed and random effects and the posterior inference. Sections 4 and 5 present a simulation study and a real data example, respectively. A final discussion concludes the article.
General description
Suppose there are subjects in a study and each subject has repeated observations for . Let denote the predictor for subject at observation , a vector of dimension , let be the corresponding binary response variable, is a predictor vector of dimension . Then the logistic mixed model is denoted as: where is the fixed effect coefficient vector, is the th random effect, is the logistic link
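For orientation, a logistic mixed model of this kind is commonly written as below. The symbols are assumptions, since the article's own notation is not recoverable from this excerpt: $i$ indexes subjects, $j$ indexes observations within a subject, $y_{ij}$ is the binary response, $\mathbf{x}_{ij}$ and $\mathbf{z}_{ij}$ are the fixed-effect and random-effect predictor vectors, $\boldsymbol{\beta}$ is the fixed effect coefficient vector and $\boldsymbol{\zeta}_i$ is the random effect of subject $i$.

```latex
\[
  \operatorname{logit} \Pr\left(y_{ij} = 1 \mid \mathbf{x}_{ij}, \mathbf{z}_{ij},
  \boldsymbol{\beta}, \boldsymbol{\zeta}_i\right)
  = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\boldsymbol{\zeta}_i,
  \qquad
  \operatorname{logit}(\pi) = \log\frac{\pi}{1 - \pi}.
\]
```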
Posterior inference
We briefly outline the Gibbs sampler for posterior sampling; details are provided in the Appendix. With initial values assigned to the parameters, the posterior sampling proceeds as follows:
1. Given the data and the current values of , sample from the full conditional posterior distribution.
2. Update , for from the full conditional posterior distributions given the data and the current values of .
3. Sample from the posterior Gamma distribution given the data and
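The cycle of conditional updates above can be sketched on a much simpler model. The code below is a toy Gaussian random-intercept model with unit error variance, not the article's logistic model with nonparametric random effects, and it assumes that the Gamma step updates the precision of the random intercepts; which quantity the article's Gamma step targets is not recoverable from this excerpt. All names and prior values are hypothetical.

```python
import numpy as np


def gibbs_random_intercept(y, n_iter=2000, a0=1.0, b0=1.0, seed=0):
    """Toy Gibbs sampler for y[i, j] = mu + b[i] + e[i, j], e ~ N(0, 1),
    b[i] ~ N(0, 1/tau), tau ~ Gamma(a0, rate=b0), flat prior on mu."""
    rng = np.random.default_rng(seed)
    n, m = y.shape
    mu, b, tau = 0.0, np.zeros(n), 1.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Step 1: mu | b, y ~ N(mean(y - b), 1 / (n * m))
        mu = rng.normal((y - b[:, None]).mean(), 1.0 / np.sqrt(n * m))
        # Step 2: b_i | mu, tau, y ~ N(sum_j (y_ij - mu) / (m + tau), 1 / (m + tau))
        prec = m + tau
        b = rng.normal((y - mu).sum(axis=1) / prec, 1.0 / np.sqrt(prec))
        # Step 3: tau | b ~ Gamma(a0 + n / 2, rate = b0 + sum(b^2) / 2)
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * np.sum(b ** 2)))
        draws[t] = mu, tau
    return draws  # columns: mu, tau
```

In a logistic model the first two full conditionals are not available in closed form, so each step would need an additional device such as a Metropolis update or data augmentation; the sketch only shows how the three blocks are cycled.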
Simulation
To evaluate the performance of our proposed algorithms, we conduct the following simulation. We generate data with 200 subjects, each subject with 20 observations. There are four covariates in , that is, . The first element is fixed as one and the other three elements are generated from the uniform distribution . We set the design matrix , . The random effect is generated from a mixture of three multivariate normal distributions
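A data-generating step of this shape might look like the sketch below. Only the number of subjects (200), the number of observations per subject (20), the four covariates with a leading one, and the three-component normal mixture come from the text; the uniform range, the choice of the random-effect design equal to the fixed-effect design, the coefficient values and the mixture weights, means and covariance are all hypothetical placeholders.

```python
import numpy as np


def simulate_logistic_mixed(n_subjects=200, n_obs=20, seed=0):
    rng = np.random.default_rng(seed)
    p = 4
    # First covariate fixed at one; the other three uniform on (-1, 1) (assumed range).
    x = np.ones((n_subjects, n_obs, p))
    x[:, :, 1:] = rng.uniform(-1.0, 1.0, size=(n_subjects, n_obs, p - 1))
    z = x  # assumption: random-effect design equals fixed-effect design
    beta = np.array([1.0, -1.0, 0.5, 0.0])  # hypothetical fixed effects
    # Hypothetical three-component multivariate normal mixture for the random effects.
    weights = np.array([0.3, 0.4, 0.3])
    means = np.array([[-1.5, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 0.0],
                      [1.5, 0.0, 0.0, 0.0]])
    cov = 0.25 * np.eye(p)
    comp = rng.choice(3, size=n_subjects, p=weights)
    b = means[comp] + rng.multivariate_normal(np.zeros(p), cov, size=n_subjects)
    # Linear predictor: x_ij' beta + z_ij' b_i, then Bernoulli responses.
    eta = x @ beta + np.einsum("ijk,ik->ij", z, b)
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, prob)
    return x, z, b, y


x, z, b, y = simulate_logistic_mixed()
print(x.shape, b.shape, y.shape)  # (200, 20, 4) (200, 4) (200, 20)
```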
Application
We take a subset of the ICPSR (Inter-University Consortium for Political and Social Research) data set collected for the World Values Survey (WVS 1981–2004), available at the following website (http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/04531). The WVS was designed to enable cross-national, cross-cultural comparison of values and norms on a wide variety of topics and to monitor changes in values and attitudes across the globe. The survey was conducted by researchers in over 80 societies,
Discussion
Mixed effects models have received considerable attention in the literature for their flexibility in characterizing heterogeneity across clusters. In this article, we propose a logistic mixed model for variable selection with nonparametric random effects. Compared to previous work such as Chen and Dunson (2003), Kinney and Dunson (2007), Cai and Dunson (2010) and Bondell et al. (2010), our approach has several advantages: easy implementation, an efficient algorithm and bias reduction due to
Acknowledgments
The author thanks his colleagues and the three anonymous reviewers for their comments and critical reading of the manuscript.
References (53)
- et al. (1996). Nonparametric regression using Bayesian variable selection. Journal of Econometrics.
- et al. (2010). Semiparametric Bayes hierarchical models with mean and variance constraints. Computational Statistics and Data Analysis.
- et al. (1997). Bayesian tests and model diagnostics in conditionally independent hierarchical models. Journal of the American Statistical Association.
- et al. (1973). Ferguson distributions via Polya urn schemes. Annals of Statistics.
- et al. (2010). Joint variable selection for fixed and random effects in linear mixed effects models. Biometrics.
- et al. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B.
- et al. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association.
- et al. (1995). Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika.
- et al. (1996). A semiparametric Bayesian model for randomised block designs. Biometrika.
- Cai, B., Dunson, D.B., 2010, Variable selection in nonparametric random effects models, Technical...
- Random effects selection in linear mixed models. Biometrics.
- A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics.
- Nonparametric Bayes conditional distribution modeling with variable selection. Journal of the American Statistical Association.
- Monte Carlo methods for Bayesian analysis of survival data using mixtures of Dirichlet priors. Journal of Computational and Graphical Statistics.
- REML estimation with exact covariance in the logistic mixed model. Biometrics.
- Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association.
- Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association.
- Asymptotic behavior of Bayes estimates. Annals of Mathematical Statistics.
- On the asymptotic behavior of Bayes estimates in the discrete case. Annals of Mathematical Statistics.
- A Bayesian analysis of some nonparametric problems. Annals of Statistics.
- Prior distributions on spaces of probability measures. Annals of Statistics.
- Nonlinear Models for Repeated Measurement Data.
- Prior distributions for variance parameters in hierarchical models. Bayesian Analysis.
- Variable selection via Gibbs sampling. Journal of the American Statistical Association.
- Approaches for Bayesian variable selection. Statistica Sinica.
- Smooth random effects distribution in a linear mixed model. Biometrics.