Mixture Models

Bayesian Essentials with R

Part of the book series: Springer Texts in Statistics (STS)

Abstract

This chapter covers a class of models in which a rather simple distribution is made more complex and less informative by a mechanism that mixes together several known or unknown distributions. Such a representation is naturally called a mixture of distributions. Inference about the parameters of the mixture components and their weights is called mixture estimation, while recovering the original distribution of each observation is called classification (or, more exactly, unsupervised classification, to distinguish it from the supervised classification discussed in Chap. 8).
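As a minimal illustration of this representation, the following R sketch (all numerical settings are illustrative, not taken from the chapter) simulates observations from a two-component normal mixture along with the latent allocations that classification seeks to recover:

    # Simulate from the mixture p*N(mu1, 1) + (1-p)*N(mu2, 1),
    # keeping the latent component labels z that classification targets
    set.seed(42)
    n  <- 500
    p  <- 0.3                      # weight of the first component
    mu <- c(-2, 3)                 # component means (illustrative)
    z  <- rbinom(n, 1, 1 - p) + 1  # z = 1 w.p. p, z = 2 w.p. 1 - p
    x  <- rnorm(n, mean = mu[z], sd = 1)
    hist(x, breaks = 40, freq = FALSE,
         main = "Sample from a two-component normal mixture")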

Notes

  1.

    This is not a definition in the mathematical sense since all densities can formally be represented that way. We thus stress that the model itself must be introduced that way. This point is not to be mistaken for a requirement that the variable z be meaningful for the data at hand. In many cases, for instance the probit model, the missing variable representation remains formal.
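    In generic notation (ours, not necessarily the chapter's), this missing-variable representation writes a density as a marginal over the latent variable z,
    \[ g(x) \,=\, \sum_{z} \Pr(Z = z)\, f(x \mid z), \]
    which indeed holds formally for any density g; hence the representation must be part of the model specification itself.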

  2.

    We will see later that the missing structure of a mixture need not actually be simulated; for more complex missing-variable structures such as hidden Markov models (introduced in Chap. 7), however, this completion cannot be avoided.

  3.

    The Frenchman Alphonse Bertillon is also the father of scientific police investigation. For instance, he originated the use of fingerprints in criminal investigations.

  4.

    To get a better understanding of this second mode, consider the limiting setting where p = 0.5. In that case, the likelihood has two equivalent modes, \((\mu_1,\mu_2)\) and \((\mu_2,\mu_1)\). As p moves away from 0.5, this second mode becomes lower and lower relative to the other, but it never disappears.
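    A quick way to see both modes is to evaluate the mixture log-likelihood on a grid of \((\mu_1,\mu_2)\) values; the R sketch below (simulated data, illustrative settings) does so for the symmetric case p = 0.5:

    # Log-likelihood of p*N(mu1, 1) + (1-p)*N(mu2, 1) over a grid
    loglik <- function(mu1, mu2, x, p)
      sum(log(p * dnorm(x, mu1) + (1 - p) * dnorm(x, mu2)))
    set.seed(1)
    p <- 0.5
    x <- c(rnorm(100, -2), rnorm(100, 2))
    g <- seq(-4, 4, length.out = 80)
    ll <- outer(g, g, Vectorize(function(a, b) loglik(a, b, x, p)))
    contour(g, g, ll, xlab = expression(mu[1]), ylab = expression(mu[2]))
    # For p = 0.5 the two modes are exact mirror images across the
    # diagonal; moving p away from 0.5 lowers one without removing it.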

  5.

    In non-Bayesian statistics, the EM algorithm is certainly the most widely used numerical method, even though it only applies to (real or artificial) missing-variable models.

  6.

    Historically, missing-variable models were among the first settings in which the Gibbs sampler was used, completing the missing variables by simulation under the name of data augmentation (see Tanner, 1996, and Robert and Casella, 2004, Chaps. 9 and 10).
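    As a minimal sketch of data augmentation in this setting (known weight, unit variances, and N(0, 10) priors on the means; these assumptions are ours, not the book's exact algorithm), a Gibbs sampler alternates between completing the labels and updating the means:

    # Data-augmentation Gibbs sampler for p*N(mu1, 1) + (1-p)*N(mu2, 1)
    set.seed(7)
    p <- 0.3; tau2 <- 10                     # known weight, prior variance
    x <- c(rnorm(150, -2), rnorm(350, 3)); n <- length(x)
    mu <- c(0, 0); niter <- 1000
    out <- matrix(NA, niter, 2)
    for (t in 1:niter) {
      # 1. Complete the missing labels given the current means
      w1 <- p * dnorm(x, mu[1])
      w2 <- (1 - p) * dnorm(x, mu[2])
      z  <- 1 + (runif(n) > w1 / (w1 + w2))  # z = 1 or 2
      # 2. Update each mean from its conditional normal posterior
      for (j in 1:2) {
        nj <- sum(z == j); sj <- sum(x[z == j])
        v  <- 1 / (nj + 1 / tau2)            # conditional variance
        mu[j] <- rnorm(1, v * sj, sqrt(v))   # conditional mean v * sj
      }
      out[t, ] <- mu
    }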

  7.

    That this is a natural estimate of the model, compared with the “plug-in” density using the estimates of the parameters, will be explained more clearly in Sect. 6.5.

  8.

    In practice, the Gibbs sampler never leaves the vicinity of a given mode when the attraction of that mode is strong enough, for instance when the number of observations is large.

  9.

    While this resolution seems intuitive enough, there is still considerable debate in academic circles on whether or not label switching should be observed in an MCMC output and, if it should, on which substitute for the posterior mean ought to be used.
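    One common (and, as noted, debated) post-processing remedy is to impose an identifiability constraint after the fact, e.g. relabelling each draw so that \(\mu_1 < \mu_2\); a short sketch, reusing the out matrix from the sampler above:

    # Reorder the two mean draws within each iteration so mu1 < mu2,
    # then average; 'out' is the niter x 2 matrix of Gibbs draws
    relabelled <- t(apply(out, 1, sort))
    colMeans(relabelled)  # component-wise means after relabelling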

  10.

    This section may be skipped by most readers, as it only addresses the very specific issue of handling improper priors in mixture estimation.

  11.

    By nature, ill-posed problems are not precisely defined. They cover classes of models such as inverse problems, where recovering the parameters from the data is exceedingly complex. They are not to be confused with nonidentifiable problems, though.

References

  • Chib, S. (1995). Marginal likelihood from the Gibbs output. J. American Statist. Assoc., 90:1313–1321.

  • Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer-Verlag, New York.

  • Gelfand, A. and Dey, D. (1994). Bayesian model choice: asymptotics and exact calculations. J. Royal Statist. Society Series B, 56:501–514.

  • Green, P. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika, 82:711–732.

  • Hjort, N., Holmes, C., Müller, P., and Walker, S. (2010). Bayesian Nonparametrics. Cambridge University Press, Cambridge.

  • Marin, J.-M. and Robert, C. (2007). Bayesian Core. Springer-Verlag, New York.

  • Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York, second edition.

  • Tanner, M. (1996). Tools for Statistical Inference: Observed Data and Data Augmentation Methods. Springer-Verlag, New York, third edition.

Copyright information

© 2014 Springer Science+Business Media New York

Cite this chapter

Marin, J.-M. and Robert, C.P. (2014). Mixture Models. In: Bayesian Essentials with R. Springer Texts in Statistics. Springer, New York. https://doi.org/10.1007/978-1-4614-8687-9_6
