Abstract
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have poly-logarithmic worst-case guarantees under the new measure. The main result of the article is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees; another finds a "good" clustering, if one exists.
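As a rough illustration of what a bicriteria measure of this kind can look like, the sketch below scores a clustering of a weighted graph by two numbers: the minimum conductance over its clusters (larger is better) and the fraction of total edge weight that runs between clusters (smaller is better). The abstract does not spell out these definitions, so they are assumptions made here for illustration. The names `conductance`, `inter_cluster_fraction`, `W`, `alpha`, and `eps` are hypothetical. The conductance routine is brute force and only suitable for tiny graphs.

```python
from itertools import combinations


def weight_between(w, s, t):
    """Sum of w[i][j] over i in s and j in t (w is a symmetric matrix)."""
    return sum(w[i][j] for i in s for j in t)


def conductance(w, cluster):
    """Brute-force conductance of the subgraph induced by `cluster`.

    Assumed definition: minimise w(S, T) / min(a(S), a(T)) over all splits
    of the cluster into S and T, where a(S) is the total weight from S to
    the whole cluster. Exponential time, so only for tiny examples.
    """
    nodes = list(cluster)
    if len(nodes) < 2:
        return float("inf")  # convention assumed here for singletons
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    # Keep `first` on side S so that each split is enumerated once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            s = [first, *extra]
            t = [v for v in nodes if v not in s]
            smaller = min(weight_between(w, s, nodes),
                          weight_between(w, t, nodes))
            if smaller == 0:
                return 0.0  # one side carries no weight at all
            best = min(best, weight_between(w, s, t) / smaller)
    return best


def inter_cluster_fraction(w, clusters):
    """Fraction of total edge weight joining vertices in different clusters."""
    n = len(w)
    label = {v: k for k, c in enumerate(clusters) for v in c}
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(w[i][j] for i, j in pairs)
    crossing = sum(w[i][j] for i, j in pairs if label[i] != label[j])
    return crossing / total


# Toy graph: two unit-weight triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0

clusters = [[0, 1, 2], [3, 4, 5]]
alpha = min(conductance(W, c) for c in clusters)  # quality inside clusters
eps = inter_cluster_fraction(W, clusters)         # weight lost between clusters
print(alpha, eps)
```

On this toy graph, splitting into the two triangles gives `alpha = 1.0` at the cost of `eps = 1/7`, while keeping all six vertices in one cluster gives `eps = 0` but conductance only `1/7`. Reporting both numbers makes that trade-off visible, which a single score would hide.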