On clusterings: Good, bad and spectral

Authors:
Ravi Kannan (Yale University, New Haven, Connecticut)
Santosh Vempala (M.I.T., Cambridge, Massachusetts)
Adrian Vetta (M.I.T., Cambridge, Massachusetts)

Journal of the ACM, Volume 51, Issue 3, pp. 497–515. https://doi.org/10.1145/990308.990313

Published: 01 May 2004

Abstract

We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have poly-logarithmic worst-case guarantees under the new measure. The main result of the article is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees; another finds a "good" clustering, if one exists.
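
The abstract does not spell out the two criteria, so the following is only a minimal sketch of how a bicriteria score might be computed. It assumes the two quantities are (1) the minimum conductance over the clusters, each taken as an induced subgraph, and (2) the fraction of total edge weight that runs between clusters. The function names, the edge-dictionary representation, and the choice to measure volume inside the induced subgraph are all illustrative assumptions, not the article's definitions. Conductance is found by brute force over all cuts, so this is only usable on tiny graphs.

```python
from itertools import combinations

def cut_weight(w, S, T):
    """Total weight of edges with one endpoint in S and the other in T (S, T disjoint)."""
    return sum(w.get(frozenset((u, v)), 0.0) for u in S for v in T)

def cluster_conductance(w, cluster):
    """Brute-force minimum conductance over all cuts of the subgraph induced by `cluster`.

    Assumption: the volume of a side is the sum of its nodes' weighted degrees
    inside the induced subgraph.
    """
    nodes = sorted(cluster)
    if len(nodes) < 2:
        return float("inf")  # a singleton has no internal cut
    all_nodes = set(nodes)
    degree = {u: cut_weight(w, {u}, all_nodes - {u}) for u in nodes}
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    # Keep `first` on side S so each cut is enumerated once; r < len(rest) keeps T non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            S = {first, *extra}
            T = all_nodes - S
            denom = min(sum(degree[u] for u in S), sum(degree[u] for u in T))
            if denom == 0:
                return 0.0  # one side is disconnected from the rest of the cluster
            best = min(best, cut_weight(w, S, T) / denom)
    return best

def bicriteria_quality(w, clustering):
    """Return (alpha, eps): minimum cluster conductance and inter-cluster weight fraction."""
    alpha = min(cluster_conductance(w, c) for c in clustering)
    label = {u: i for i, c in enumerate(clustering) for u in c}
    crossing = sum(wt for e, wt in w.items() if len({label[u] for u in e}) > 1)
    return alpha, crossing / sum(w.values())

# Hypothetical example: two unit-weight triangles joined by a single edge (2, 3).
edges = {frozenset(e): 1.0 for e in
         [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
clusters = [{0, 1, 2}, {3, 4, 5}]
alpha, eps = bicriteria_quality(edges, clusters)
print(alpha, eps)
```

In this toy graph, splitting along the bridge edge gives tightly connected clusters (high `alpha`) while cutting only one of the seven edges (low `eps`), which is the kind of trade-off a bicriteria measure is meant to expose.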

Index Terms

On clusterings: Good, bad and spectral
1. Information systems
  1. Information retrieval
  2. Information storage systems
2. Theory of computation
  1. Design and analysis of algorithms

Published in

Journal of the ACM Volume 51, Issue 3
May 2004
153 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/990308

Copyright © 2004 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Clustering
graph algorithms
spectral methods
Qualifiers
- article
Article Metrics
- Total Citations: 577
- Total Downloads: 4,890
- Downloads (Last 12 months): 97
- Downloads (Last 6 weeks): 21