MML Clustering of Continuous-Valued Data Using Gaussian and t Distributions

Conference paper in AI 2002: Advances in Artificial Intelligence (AI 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2557)

Abstract

Clustering, also known as mixture modelling or intrinsic classification, is the problem of identifying and modelling components (or clusters, or classes) in a body of data. We consider here the application of the Minimum Message Length (MML) principle to a clustering problem involving Gaussian and t distributions. Earlier work on MML clustering dealt with the multinomial and Gaussian distributions (Wallace and Boulton, 1968) and, in addition, the von Mises circular and Poisson distributions (Wallace and Dowe, 1994, 2000). Our current work extends this by generalising the Gaussian distribution to the more general t distribution. Point estimation of the t distribution is performed using the MML approximation proposed by Wallace and Freeman (1987). We also compare the MML estimates of the t distribution with those of the Maximum Likelihood (ML) method in terms of their Kullback-Leibler (KL) distances. Within each component, our application also performs model selection to decide whether a particular group of data is best modelled as a Gaussian or a t distribution. The proposed modelling method is then applied to several artificially generated datasets, and the results are compared with those obtained using MML clustering of Gaussian distributions alone. Our modelling method compares quite well to an alternative clustering program (EMMIX), which uses various modelling criteria such as the Akaike Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC).
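To make the KL comparison mentioned in the abstract concrete, the following is a minimal sketch of how the KL distance from a "true" t distribution to an estimated one might be evaluated numerically. The abstract does not say how the authors computed these distances, so everything here is an assumption for illustration: the function names `t_logpdf` and `kl_t`, the location-scale parameterisation `(mu, sigma, nu)`, and the use of trapezoidal integration over a truncated range.

```python
import math


def t_logpdf(x, mu, sigma, nu):
    """Log-density of a location-scale Student t distribution.

    mu is the location, sigma the scale, nu the degrees of freedom.
    (Hypothetical helper; the parameter names are illustrative.)
    """
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def kl_t(true_params, est_params, half_width=100.0, steps=100_000):
    """Approximate KL(true || est) between two t distributions.

    Uses the trapezoidal rule on [mu - half_width*sigma, mu + half_width*sigma],
    where mu and sigma belong to the true distribution. The truncation is an
    assumption of this sketch and loses accuracy for very heavy tails
    (small nu).
    """
    mu, sigma, _ = true_params
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        lp = t_logpdf(x, *true_params)
        lq = t_logpdf(x, *est_params)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(lp) * (lp - lq)
    return total * h
```

Under these assumptions, a call such as `kl_t((0.0, 1.0, 5.0), (0.1, 1.1, 4.0))` returns a non-negative number that is zero only when the two parameter triples coincide; a smaller value for one estimator's output than for another's would indicate an estimate closer to the true distribution in the KL sense.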

References

  1. Akaike H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 6 (1974) 716–723

  2. Baxter R.A. and Oliver J.J.: Finding overlapping components with MML. Statistics and Computing, 10 (2000) 5–16

  3. Boulton D.M.: The information criterion for intrinsic classification. Ph.D. Thesis, Dept. Computer Science, Monash University Clayton 3800 Australia (1975)

  4. Chaitin G.J.: On the length of programs for computing finite sequences. Journal of the Association for Computing Machinery, 13 (1966) 547–569

  5. Conway J.H. and Sloane N.J.A.: Sphere Packings, Lattices and Groups. 3rd edn. Springer-Verlag, London (1998)

  6. Everitt B.S. and Hand D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981)

  7. Figueiredo M.A.T. and Jain A.K.: Unsupervised Learning of Finite Mixture Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3) (2002) 381–396

  8. Hunt L.A. and Jorgensen M.A.: Mixture model clustering using the multimix program. Australian and New Zealand Journal of Statistics, 41(2) (1999) 153–171

  9. Lebedev N.N.: Special functions and their applications. Prentice-Hall, NJ (1965)

  10. Liu C. and Rubin D.B.: ML Estimation of t distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 5 (1995) 19–39

  11. McLachlan G.J. and Basford K.E.: Mixture Models. Marcel Dekker, NY (1988)

  12. McLachlan G.J. and Peel D.: Finite Mixture Models. John Wiley, NY USA (2000)

  13. McLachlan G.J., Peel D., Basford K.E. and Adams P.: The EMMIX software for the fitting of mixtures of Normal and t-components. Journal of Statistical Software, 4 (1999)

  14. Schwarz G.: Estimating the dimension of a model. Annals of Statistics, 6 (1978) 461–464

  15. Solomonoff R.J.: A formal theory of inductive inference. Information and Control, 7 (1964) 1–22, 224–254

  16. Titterington D.M., Smith A.F.M. and Makov U.E.: Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Chichester (1985)

  17. Wallace C.S.: An improved program for classification. Proceedings of the Ninth Australian Computer Science Conference (ACSC-9), 8, Monash University Australia (1986) 357–366

  18. Wallace C.S. and Boulton D.M.: An information measure for classification. Computer Journal, 11(2), (1968) 185–194

  19. Wallace C.S. and Dowe D.L.: MML estimation of the von Mises concentration parameter. Technical Report TR 93/193, Dept. of Computer Science, Monash University Clayton 3800 Australia (1993)

  20. Wallace C.S. and Dowe D.L.: Intrinsic classification by MML - the Snob program. In Zhang C. et al. (Eds.), Proc. 7th Australian Joint Conference on Artificial Intelligence. World Scientific, Singapore (1994) 37–44

  21. Wallace C.S. and Dowe D.L.: Minimum Message Length and Kolmogorov Complexity. Computer Journal, 42(4) (1999) 270–283, Special issue on Kolmogorov Complexity

  22. Wallace C.S. and Dowe D.L.: MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 10(1) (2000) 73–83

  23. Wallace C.S. and Freeman P.R.: Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society Series B, 49(3) (1987) 240–265

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agusta, Y., Dowe, D.L. (2002). MML Clustering of Continuous-Valued Data Using Gaussian and t Distributions. In: McKay, B., Slaney, J. (eds) AI 2002: Advances in Artificial Intelligence. AI 2002. Lecture Notes in Computer Science, vol 2557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36187-1_13

  • DOI: https://doi.org/10.1007/3-540-36187-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00197-3

  • Online ISBN: 978-3-540-36187-9

  • eBook Packages: Springer Book Archive
