Abstract
This lecture starts with two remarks:
1. Many people, papers, books claim “the number of clusters has to be given in advance”
2. Even good texts do not provide a formal definition of “cluster” or “clustering”
Starting with remark 1, we show that any formal definition of “clustering” uniquely determines the number of clusters. For remark 2, we prove that relations called “similarities” give a natural definition of “clustering”. We also show that every clustering is defined by some similarity, although that similarity may be given only implicitly. Finding the clusters given by a similarity is the same problem as the “maximal clique problem”; hence it is NP-complete. It is also clear that the very old technique of “reduction of partial automata” is the same problem. Algorithms are known in both areas, and for maximal cliques there are even good heuristics. Returning to remark 1: the problem that people really address is the case where the desired similarity is not given explicitly, but only through some of its properties. Among these properties may be the number of clusters. Very many of the known algorithms have the following structure:
1. Generate some similarity P_i.
2. Is it “good”?
   Yes: stop here and use P_i for clustering.
   No: generate P_{i+1} and continue.
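The generate-and-test structure above can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's method: candidate similarities are produced by a growing distance threshold on numbers, "good" means "yields the wanted number of clusters", and clusters are counted as connected components, which is a simplification. All function names here are hypothetical.

```python
def threshold_similarity(points, eps):
    """Pairs (a, b) with a < b that count as similar at threshold eps (assumed rule)."""
    return {(a, b) for a in points for b in points if a < b and abs(a - b) <= eps}


def count_components(points, pairs):
    """Number of connected components of the graph (points, pairs), via union-find."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in pairs:
        parent[find(a)] = find(b)
    return len({find(p) for p in points})


def generate_and_test(points, thresholds, wanted_clusters):
    """Try candidate similarities in order; return the first one judged "good"."""
    for i, eps in enumerate(thresholds):
        pairs = threshold_similarity(points, eps)                # generate P_i
        if count_components(points, pairs) == wanted_clusters:   # is it "good"?
            return i, eps, pairs                                 # yes: stop here
    return None                                                  # no candidate passed


points = [1, 2, 3, 10, 11, 20]
result = generate_and_test(points, [0.5, 1, 5, 8], wanted_clusters=3)
```

In this sketch the number of clusters enters only through the acceptance test; it is a property demanded of the similarity rather than an input to the clustering itself.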
Many techniques are used to generate new similarities: distances, probabilities, and also fuzzy theory, among others. Finally, we show that using “representatives” for clustering is not the same as using similarities.
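The correspondence between clusters and maximal cliques mentioned in the abstract can be made concrete with a small sketch. It assumes that a similarity is a symmetric relation supplied as a predicate, and it uses the standard Bron-Kerbosch enumeration without pivoting, which is a swapped-in textbook technique and not necessarily what the paper uses. The names `maximal_cliques` and `similar` are hypothetical.

```python
def maximal_cliques(points, similar):
    """Enumerate the maximal cliques of the graph whose edges join similar points."""
    # Adjacency sets; the predicate is symmetrised and self-loops are ignored.
    adj = {
        p: {q for q in points if q != p and (similar(p, q) or similar(q, p))}
        for p in points
    }
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(points), set())
    return cliques


# Assumed example: numbers are similar when they differ by at most 1.
clusters = maximal_cliques([1, 2, 3, 10, 11], lambda a, b: abs(a - b) <= 1)
```

For this example the enumeration yields the three clusters {1, 2}, {2, 3} and {10, 11}. The count follows from the relation alone, and point 2 belongs to two clusters, so clusters read this way may overlap.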
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reusch, B. (2008). How Many Clusters Are There? – An Essay on the Basic Notions of Clustering. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85563-7_7
DOI: https://doi.org/10.1007/978-3-540-85563-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85562-0
Online ISBN: 978-3-540-85563-7
eBook Packages: Computer Science, Computer Science (R0)