How Many Clusters Are There? – An Essay on the Basic Notions of Clustering

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5177)

Abstract

This lecture starts with two remarks:

  1. Many people, papers, books claim “the number of clusters has to be given in advance”

  2. Even good texts do not provide a formal definition of “cluster” or “clustering”

Starting with remark 1, we show that any formal definition of “clustering” uniquely determines the number of clusters. For remark 2, we prove that relations called “similarities” give a natural definition of “clustering”. We also show that every clustering is defined by some similarity, although that similarity may be given only implicitly. Finding the clusters given by a similarity is the same problem as the “maximal clique problem”, and hence it is NP-complete. It is equally clear that the very old technique of “reduction of partial automata” solves the same problem. Algorithms are known in both areas, and for maximal cliques even good heuristics exist. Returning to remark 1: the problem that people really address is the case where the desired similarity is not given explicitly, but only through some of its properties, which may include the number of clusters. Very many of the known algorithms have the following structure:

  1. Generate some similarity P_i

  2. Is it “good”?

    Yes: Stop here and use P_i for clustering

    No: generate P_{i+1} and continue.
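The generate-and-test structure above can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's algorithm: it assumes that a similarity is a reflexive, symmetric relation on a finite set, that the clusters it defines are the maximal sets of pairwise similar elements (the maximal cliques mentioned in the abstract), that candidate similarities are generated by thresholding a distance, and that “good” means reaching a desired number of clusters. The names `maximal_cliques`, `threshold_similarity`, and `search` are hypothetical.

```python
def maximal_cliques(points, similar):
    """Enumerate all maximal sets of pairwise similar points.

    Plain Bron-Kerbosch recursion without pivoting; worst-case exponential,
    which is consistent with the hardness remark in the abstract.
    """
    neighbours = {
        p: {q for q in points if q != p and similar(p, q)} for p in points
    }
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))  # nothing can extend it: maximal
            return
        for v in list(candidates):
            expand(current | {v},
                   candidates & neighbours[v],
                   excluded & neighbours[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(points), set())
    return cliques


def threshold_similarity(eps):
    """Assumed generator: two numbers are similar if they differ by at most eps."""
    return lambda a, b: abs(a - b) <= eps


def search(points, thresholds, wanted):
    """Try one candidate similarity per threshold until the test succeeds."""
    for eps in thresholds:
        clusters = maximal_cliques(points, threshold_similarity(eps))
        if len(clusters) == wanted:      # the assumed notion of "good"
            return eps, clusters
    return None                          # no candidate passed the test


points = [1, 2, 3, 10, 11, 20]
print(search(points, thresholds=[0.5, 1, 2], wanted=3))
```

Note that each candidate similarity fixes its own number of clusters; the loop only chooses among similarities. With the threshold 1, the points 1, 2, 3 fall into two overlapping clusters, {1, 2} and {2, 3}, because 1 and 3 are not similar to each other.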

Many techniques are used to generate new similarities: distances, probabilities, and also fuzzy theory, among others. Finally, we show that using “representatives” for clustering is not the same as using similarities.
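One way to see that the two approaches can differ is sketched below; the abstract does not give the paper's own argument, so this is an assumed illustration only. It assumes that clustering by representatives means assigning each point to its nearest representative, and that a similarity is a threshold on distance. Nearest-representative assignment always induces a transitive “same cluster” relation (a partition), whereas a similarity need not be transitive. The function names are hypothetical.

```python
def assign_to_representatives(points, representatives):
    """Assign every point to its nearest representative (assumed rule)."""
    clusters = {r: set() for r in representatives}
    for p in points:
        nearest = min(representatives, key=lambda r: abs(p - r))
        clusters[nearest].add(p)
    return clusters


def is_transitive(points, related):
    """Check whether a binary relation on points is transitive."""
    return all(
        related(a, c)
        for a in points for b in points for c in points
        if related(a, b) and related(b, c)
    )


points = [1, 2, 3, 4]

# Clustering by representatives: a partition into disjoint blocks.
by_rep = assign_to_representatives(points, representatives=[1.5, 3.5])
block_of = {p: r for r, block in by_rep.items() for p in block}
same_block = lambda a, b: block_of[a] == block_of[b]

# Clustering by similarity: "differ by at most 1" is not transitive.
similar = lambda a, b: abs(a - b) <= 1

print(by_rep)
print(is_transitive(points, same_block), is_transitive(points, similar))
```

Here 1 is similar to 2 and 2 is similar to 3, but 1 is not similar to 3, so no assignment of points to disjoint blocks reproduces this similarity exactly.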

Editor information

Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reusch, B. (2008). How Many Clusters Are There? – An Essay on the Basic Notions of Clustering. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85563-7_7

  • DOI: https://doi.org/10.1007/978-3-540-85563-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85562-0

  • Online ISBN: 978-3-540-85563-7

  • eBook Packages: Computer Science, Computer Science (R0)
