On clusterings: Good, bad and spectral

Published: 01 May 2004

Abstract

We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have poly-logarithmic worst-case guarantees under the new measure. The main result of the article is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees; another finds a "good" clustering, if one exists.

Published in

Journal of the ACM, Volume 51, Issue 3
May 2004
153 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/990308

Copyright © 2004 ACM

Publisher

Association for Computing Machinery
New York, NY, United States