Sequential Monte Carlo on large binary sampling spaces

Schäfer, Christian; Chopin, Nicolas

doi:10.1007/s11222-011-9299-z

Published: 29 November 2011

Statistics and Computing, Volume 23, pages 163–184 (2013)

Christian Schäfer^1,2 &
Nicolas Chopin^1,3

Abstract

A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces.
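As a minimal illustration of "calibrating the proposal using past simulations" on a binary space, the sketch below fits the simplest binary family, a vector of independent Bernoulli components, to a weighted particle sample by matching weighted marginal means. The function names and the clipping constant `eps` (which keeps every binary vector reachable) are our own assumptions, not taken from the paper.

```python
import math
import random

def fit_independent_bernoulli(particles, weights, eps=0.01):
    """Weighted marginal means of binary particles, clipped to [eps, 1 - eps]
    so the fitted proposal gives every binary vector positive probability."""
    total = sum(weights)
    d = len(particles[0])
    means = []
    for i in range(d):
        m = sum(w * x[i] for x, w in zip(particles, weights)) / total
        means.append(min(max(m, eps), 1.0 - eps))
    return means

def sample_independent_bernoulli(means, rng):
    """Draw one binary vector with independent components x_i ~ Bernoulli(means[i])."""
    return [1 if rng.random() < p else 0 for p in means]

def log_pmf_independent_bernoulli(x, means):
    """Log-probability of binary vector x under the product-Bernoulli proposal."""
    return sum(math.log(p) if xi else math.log(1.0 - p) for xi, p in zip(x, means))

# Usage with a hypothetical weighted sample of three particles in {0,1}^3.
rng = random.Random(0)
particles = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
weights = [0.5, 0.25, 0.25]
means = fit_independent_bernoulli(particles, weights)  # [0.75, 0.25, 0.75]
proposal_draw = sample_independent_bernoulli(means, rng)
```

Moment matching is the natural fit for this family because the weighted mean of each coordinate is its maximum-likelihood estimate; the family cannot, however, represent any dependence between coordinates.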

A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo.

Basic versions of Sequential Monte Carlo are easily implemented with proposals that draw binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family that takes correlations into account, analogously to the multivariate normal distribution on continuous spaces.
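The abstract does not name the correlated family the paper settles on. Purely to illustrate what "taking correlations into account" can mean, the sketch below assumes a chain of logistic conditionals: the success probability of each component is a logistic function of the components drawn before it, parametrised by a hypothetical lower-triangular matrix `A` whose diagonal holds the intercepts. This is our assumption, not a statement of the authors' model; its appeal here is that sampling is sequential and the probability mass function is exact (a product of conditionals), both of which an importance-sampling or Metropolis-Hastings step requires.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def conditional_prob(A, x, i):
    """P(x_i = 1 | x_0..x_{i-1}) = sigmoid(A[i][i] + sum_{j<i} A[i][j] * x_j).
    Assumed parametrisation: A is lower triangular, diagonal = intercepts."""
    return sigmoid(A[i][i] + sum(A[i][j] * x[j] for j in range(i)))

def sample_logistic_conditionals(A, rng):
    """Draw x component by component from the chain of conditionals."""
    x = []
    for i in range(len(A)):
        x.append(1 if rng.random() < conditional_prob(A, x, i) else 0)
    return x

def log_pmf_logistic_conditionals(A, x):
    """Exact log-probability of x: sum of the log conditional probabilities."""
    total = 0.0
    for i in range(len(A)):
        p = conditional_prob(A, x, i)
        total += math.log(p) if x[i] else math.log(1.0 - p)
    return total

# Hypothetical parameters for d = 3.
A = [[0.0, 0.0, 0.0],
     [2.0, -1.0, 0.0],   # x_1 is more likely to be 1 when x_0 = 1
     [0.0, 1.5, -1.0]]   # x_2 is more likely to be 1 when x_1 = 1
rng = random.Random(1)
draws = [sample_logistic_conditionals(A, rng) for _ in range(5)]
```

Setting every off-diagonal entry of `A` to zero recovers the independent Bernoulli model, so this family strictly extends the simple proposals described above.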

We review models for binary data and adapt one of them to Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.
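To make the contrast with Markov chain exploration concrete, here is a minimal tempered SMC sketch on {0,1}^d, written under our own assumptions: a fixed linear tempering schedule from the uniform distribution to the target, multinomial resampling when the effective sample size falls below N/2, and independent Metropolis-Hastings moves whose proposal is an independent-Bernoulli family recalibrated on the particles at every step. None of these choices is claimed to match the paper's algorithm, and `toy_log_target` is a hypothetical stand-in for a posterior on the model space.

```python
import math
import random

def smc_binary(log_target, d, n_particles=500, n_steps=20, n_moves=3, seed=0):
    """Tempered SMC sketch on {0,1}^d targeting pi_rho(x) ∝ exp(rho * log_target(x)),
    with rho increasing from 0 (uniform) to 1 (target) on a fixed linear schedule."""
    rng = random.Random(seed)
    particles = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    rhos = [k / n_steps for k in range(n_steps + 1)]
    for rho_prev, rho in zip(rhos, rhos[1:]):
        # 1. Reweight by the incremental tempering factor pi_rho / pi_rho_prev.
        log_w = [lw + (rho - rho_prev) * log_target(x) for lw, x in zip(log_w, particles)]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # 2. Multinomial resampling when the effective sample size drops below N/2.
        ess = sum(w) ** 2 / sum(wi * wi for wi in w)
        if ess < n_particles / 2:
            particles = [list(x) for x in rng.choices(particles, weights=w, k=n_particles)]
            log_w = [0.0] * n_particles
            w = [1.0] * n_particles
        # 3. Calibrate an independent-Bernoulli proposal on the weighted particles.
        total = sum(w)
        means = [min(max(sum(wi * x[i] for wi, x in zip(w, particles)) / total, 0.01), 0.99)
                 for i in range(d)]

        def log_q(x):
            return sum(math.log(p) if xi else math.log(1.0 - p) for xi, p in zip(x, means))

        # 4. Independent Metropolis-Hastings moves, each leaving pi_rho invariant.
        for _ in range(n_moves):
            for k, x in enumerate(particles):
                y = [1 if rng.random() < p else 0 for p in means]
                log_alpha = rho * (log_target(y) - log_target(x)) + log_q(x) - log_q(y)
                if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
                    particles[k] = y
    return particles, log_w

def toy_log_target(x):
    """Hypothetical target: rewards neighbouring bits that agree, plus a bonus if x[0] = 1."""
    return 3.0 * x[0] + 2.0 * sum(1 for a, b in zip(x, x[1:]) if a == b)

particles, log_w = smc_binary(toy_log_target, d=8)
```

The returned `log_w` are unnormalised log-weights of the final particles. Replacing the independent proposal with a family that models correlations is exactly the kind of change the abstract argues matters in high dimensions; in this sketch only `means`, `log_q` and the proposal draw would change.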

Author information

Authors and Affiliations

Centre de Recherche en Économie et Statistique, 3 Avenue Pierre Larousse, 92240, Malakoff, France
Christian Schäfer & Nicolas Chopin
CEntre de REcherches en MAthématiques de la DEcision, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775, Paris, France
Christian Schäfer
Ecole Nationale de la Statistique et de l’Administration, 3 Avenue Pierre Larousse, 92240, Malakoff, France
Nicolas Chopin

Corresponding author

Correspondence to Christian Schäfer.

About this article

Cite this article

Schäfer, C., Chopin, N. Sequential Monte Carlo on large binary sampling spaces. Stat Comput 23, 163–184 (2013). https://doi.org/10.1007/s11222-011-9299-z

Received: 01 February 2011
Accepted: 07 November 2011
Published: 29 November 2011
Issue Date: March 2013
DOI: https://doi.org/10.1007/s11222-011-9299-z
