Sequential Monte Carlo on large binary sampling spaces

Statistics and Computing

Abstract

A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces.
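As a rough illustration of what "adaptive" means here, the following sketch runs plain adaptive importance sampling on {0,1}^d: draw from the current proposal, weight the draws against the target, then refit the proposal from the weighted sample. It is not the Sequential Monte Carlo algorithm of the paper. The function name, the product-Bernoulli proposal, the clipping bounds and the toy target are all assumptions made for the example.

```python
import numpy as np

def adaptive_importance_sampling(log_target, d, n=2000, rounds=10, seed=0):
    """Toy adaptive sampler on {0,1}^d with a product-Bernoulli proposal."""
    rng = np.random.default_rng(seed)
    p = np.full(d, 0.5)  # initial proposal: uniform on {0,1}^d
    for _ in range(rounds):
        # Simulate from the current proposal.
        x = (rng.random((n, d)) < p).astype(int)
        # Log-probability of each draw under the proposal.
        log_q = (x * np.log(p) + (1 - x) * np.log1p(-p)).sum(axis=1)
        # Self-normalised importance weights (target may be unnormalised).
        log_w = np.array([log_target(row) for row in x]) - log_q
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Recalibrate the proposal from past simulations; clipping keeps
        # every state reachable (the bounds are an arbitrary choice).
        p = np.clip(w @ x, 0.02, 0.98)
    return p, x, w

# Hypothetical target with independent components and known marginals m.
m = np.array([0.9, 0.1, 0.5, 0.8, 0.2])
theta = np.log(m / (1 - m))
p, x, w = adaptive_importance_sampling(lambda z: float(z @ theta), d=5)
print(np.round(p, 2))
```

Because this toy target factorises, the fitted marginal probabilities should settle near `m`. Targets with dependent components are where the choice of parametric family starts to matter.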

A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo.
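To make the sampling space concrete: each model is a binary vector gamma whose entries indicate which covariates enter the regression, and the target assigns a posterior weight to each such vector. The sketch below scores a model with a BIC approximation to the marginal likelihood and an independent inclusion prior. Both choices are assumptions made for illustration and are not taken from the abstract, and the function name `log_posterior_bic` is hypothetical.

```python
import numpy as np

def log_posterior_bic(gamma, X, y, prior_incl=0.5):
    """Unnormalised log posterior of model gamma (BIC approximation, an assumption)."""
    gamma = np.asarray(gamma, dtype=bool)
    n = len(y)
    # Design matrix: intercept plus the covariates selected by gamma.
    Z = np.column_stack([np.ones(n), X[:, gamma]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    k = Z.shape[1]
    bic = n * np.log(rss / n) + k * np.log(n)
    # Independent Bernoulli prior on inclusion indicators (an assumption).
    log_prior = gamma.sum() * np.log(prior_incl) + (~gamma).sum() * np.log1p(-prior_incl)
    return -0.5 * bic + log_prior

# Synthetic data: only covariates 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(size=200)

for gamma in ([0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0]):
    print(gamma, round(log_posterior_bic(gamma, X, y), 1))
```

With d covariates there are 2^d such vectors, so exhaustive scoring is only feasible for small d. This is why a sampler on the model space is needed when d is around a hundred.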

Raw versions of Sequential Monte Carlo are easily implemented using binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family that takes correlations into account, analogously to the multivariate normal distribution on continuous spaces.
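The limitation of independent components can be seen in a small experiment. A product-Bernoulli family has only the marginal means as parameters, so fitting it to a strongly correlated sample reproduces the means but discards the dependence. The data below are synthetic and the helper names are hypothetical. The sketch only illustrates the problem and does not implement the correlated family proposed in the paper.

```python
import numpy as np

def fit_product_bernoulli(x, w=None):
    """Weighted marginal means: the only parameters of an independent-components family."""
    return np.average(x, axis=0, weights=w)

def sample_product_bernoulli(p, n, rng):
    """Draw n binary vectors whose components are independent Bernoulli(p_i)."""
    return (rng.random((n, len(p))) < p).astype(int)

rng = np.random.default_rng(1)

# Hypothetical "target" sample: the second component copies the first,
# except for a 5% chance of being flipped.
a = rng.random(5000) < 0.5
flip = rng.random(5000) < 0.05
target = np.column_stack([a, a ^ flip]).astype(int)

# Fit the independent family and simulate from it.
p_hat = fit_product_bernoulli(target)
proposal = sample_product_bernoulli(p_hat, 5000, rng)

corr_target = np.corrcoef(target.T)[0, 1]
corr_proposal = np.corrcoef(proposal.T)[0, 1]
print(f"target correlation   {corr_target:.2f}")
print(f"proposal correlation {corr_proposal:.2f}")
```

The fitted proposal matches both marginal means but its components are uncorrelated by construction. About half of its draws therefore land on the configurations (0,1) and (1,0), which the target visits only about 5% of the time, and the mismatch grows with the dimension.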

We provide a review of models for binary data and make one of them work in the context of Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.

Author information

Correspondence to Christian Schäfer.

About this article

Cite this article

Schäfer, C., Chopin, N. Sequential Monte Carlo on large binary sampling spaces. Stat Comput 23, 163–184 (2013). https://doi.org/10.1007/s11222-011-9299-z

  • DOI: https://doi.org/10.1007/s11222-011-9299-z
