ABSTRACT
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have remained only loosely related. In this paper, we give an explicit theoretical connection between them. We show the generality of the weighted kernel k-means objective function, and derive the spectral clustering objective of normalized cut as a special case. Given a positive definite similarity matrix, our results lead to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut. This has important implications: a) eigenvector-based algorithms, which can be computationally prohibitive, are not essential for minimizing normalized cuts, b) various techniques, such as local search and acceleration schemes, may be used to improve the quality as well as speed of kernel k-means. Finally, we present results on several interesting data sets, including diametrical clustering of large gene-expression matrices and a handwriting recognition data set.
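The connection stated above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the usual form of the equivalence: each point is weighted by its degree in the similarity graph, and the kernel is K = D^{-1} A D^{-1}, where A is a positive definite similarity matrix and D is its diagonal degree matrix. The function names (`ncut_kernel`, `weighted_kernel_kmeans`, `normalized_cut`) are hypothetical and chosen only for this example.

```python
import numpy as np


def ncut_kernel(A):
    """Assumed mapping from a similarity matrix A to (kernel, weights).

    Weights are the node degrees; the kernel is D^{-1} A D^{-1}.
    """
    deg = A.sum(axis=1)
    K = A / np.outer(deg, deg)
    return K, deg


def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means on kernel matrix K with point weights w."""
    labels = np.asarray(labels).copy()
    n = K.shape[0]
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            wc = w[mask]
            s = wc.sum()
            if s == 0:
                continue  # empty cluster: leave its distances at infinity
            # -2 * sum_j w_j K_ij / s, for every point i
            cross = K[:, mask] @ wc / s
            # sum_{j,l} w_j w_l K_jl / s^2, a constant for cluster c
            within = wc @ K[np.ix_(mask, mask)] @ wc / s**2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels


def normalized_cut(A, labels, k):
    """Sum over non-empty clusters of cut(cluster, rest) / volume(cluster)."""
    deg = A.sum(axis=1)
    total = 0.0
    for c in range(k):
        mask = labels == c
        vol = deg[mask].sum()
        if vol > 0:
            total += A[np.ix_(mask, ~mask)].sum() / vol
    return total
```

Under these assumptions, calling `weighted_kernel_kmeans(*ncut_kernel(A), labels, k)` on a positive definite `A` should never increase `normalized_cut(A, labels, k)` from one batch reassignment to the next, and no eigenvectors are computed at any point.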
Index Terms
- Kernel k-means: spectral clustering and normalized cuts