Methods for merging Gaussian mixture components

  • Regular Article
Advances in Data Analysis and Classification

Abstract

The problem of merging Gaussian mixture components is discussed for situations in which a Gaussian mixture is fitted but the mixture components are not separated enough from each other to be interpreted as “clusters”. The problem of merging Gaussian mixtures is not statistically identifiable; therefore, merging algorithms have to be based on subjective cluster concepts. Cluster concepts based on unimodality and on misclassification probabilities (“patterns”) are distinguished. Several hierarchical merging methods are proposed for the different cluster concepts, based on the ridgeline analysis of modality of Gaussian mixtures, the dip test, the Bhattacharyya dissimilarity, a direct estimator of misclassification, and the strength of predicting pairwise cluster memberships. The methods are compared in a simulation study and applied to two real datasets. A new method for visualising the separation of Gaussian mixture components, the ordered posterior plot, is also introduced.
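To make one of the listed criteria concrete, the following is a minimal sketch of the standard closed-form Bhattacharyya distance between two Gaussian components, together with a helper that finds the closest pair of components as a candidate for the next hierarchical merge. The abstract does not state how the article turns this quantity into a merging rule or a stopping criterion, so the function names `bhattacharyya_distance` and `closest_pair` and the "merge the closest pair first" logic are illustrative assumptions, not the article's algorithm.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    cov = 0.5 * (cov1 + cov2)  # average covariance matrix
    diff = mu1 - mu2

    # Mahalanobis-type term with respect to the average covariance.
    mahal = diff @ np.linalg.solve(cov, diff)

    # Log-determinant term, computed with slogdet for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)

    return 0.125 * mahal + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))


def closest_pair(means, covs):
    """Return (i, j, distance) for the pair of components with the smallest
    Bhattacharyya distance. Hypothetical helper: it only nominates a merge
    candidate and does not decide whether merging should happen."""
    best = None
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            d = bhattacharyya_distance(means[i], covs[i], means[j], covs[j])
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


# Illustrative one-dimensional components with unit variance (made-up values).
means = [[0.0], [0.5], [5.0]]
covs = [[[1.0]], [[1.0]], [[1.0]]]
i, j, d = closest_pair(means, covs)
print(i, j, d)
```

A small distance indicates strongly overlapping components, which are natural merge candidates under a misclassification-based cluster concept. Any cutoff for stopping the merging would have to be chosen by the user and is deliberately left out of this sketch.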

Author information

Corresponding author

Correspondence to Christian Hennig.

About this article

Cite this article

Hennig, C. Methods for merging Gaussian mixture components. Adv Data Anal Classif 4, 3–34 (2010). https://doi.org/10.1007/s11634-010-0058-3

  • DOI: https://doi.org/10.1007/s11634-010-0058-3
