Holistic Assessment of Structure Discovery Capabilities of Clustering Algorithms

Höppner, Frank; Jahnke, Maximilian

doi:10.1007/978-3-030-46150-8_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Existing cluster validity indices, used for example to determine the optimal number of clusters, often share the bias of the clustering algorithm they were introduced for. We suggest an efficient and holistic assessment of the structure discovery capabilities of clustering algorithms based on three criteria. We determine the robustness or stability of cluster assignments and interpret it as the confidence of the clustering algorithm in its result. This information is then used to label the data and to evaluate whether the stability assessment is consistent with the notion of a cluster as an area of dense and separated data. The resulting criteria of stability, structure and consistency provide interpretable means to judge the capabilities of clustering algorithms, including the judgment of a clustering tendency, without the typical biases of prominent indices.
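
The abstract does not spell out how the stability of cluster assignments is computed, so the following is only a minimal sketch of one common way to quantify it, not the authors' procedure. It assumes a clustering algorithm has been run several times (for example with different initializations) and builds a co-association matrix: the fraction of runs in which two points share a cluster. The function names `co_association` and `point_stability`, and the per-point score itself, are illustrative assumptions.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of runs in which each pair of points shares a cluster.

    `labelings` is a list of label vectors, one per clustering run.
    Only label equality is used, so permuted labels do not matter.
    """
    n = len(labelings[0])
    runs = len(labelings)
    coassoc = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        coassoc[i][j] = coassoc[j][i] = together / runs
    return coassoc


def point_stability(coassoc):
    """Hypothetical per-point score in [0.5, 1.0].

    A pair contributes 1.0 when all runs agree (always together or
    always apart) and 0.5 when the runs are split evenly.
    """
    n = len(coassoc)
    return [
        sum(max(c, 1.0 - c) for j, c in enumerate(row) if j != i) / (n - 1)
        for i, row in enumerate(coassoc)
    ]


# Three hand-written runs over four points (illustrative data only).
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels permuted
    [0, 0, 0, 1],  # the third point switches cluster in this run
]
stability = point_stability(co_association(runs))
print([round(s, 3) for s in stability])  # [0.889, 0.889, 0.667, 0.889]
```

In this toy input the third point changes its cluster in one of the three runs and therefore receives the lowest score, which matches the reading of stability as the confidence of the algorithm in its own assignment.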

Author information

Authors and Affiliations

Department of Computer Science, Ostfalia University of Applied Sciences, 38302, Wolfenbüttel, Germany
Frank Höppner & Maximilian Jahnke

Corresponding author

Correspondence to Frank Höppner.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

Cite this paper

Höppner, F., Jahnke, M. (2020). Holistic Assessment of Structure Discovery Capabilities of Clustering Algorithms. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_14

DOI: https://doi.org/10.1007/978-3-030-46150-8_14
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46149-2
Online ISBN: 978-3-030-46150-8
eBook Packages: Computer Science, Computer Science (R0)