Kernel k-means: spectral clustering and normalized cuts

Authors:
Inderjit S. Dhillon, University of Texas at Austin, Austin, TX
Yuqiang Guan, University of Texas at Austin, Austin, TX
Brian Kulis, University of Texas at Austin, Austin, TX

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2004, Pages 551–556. https://doi.org/10.1145/1014052.1014118

Published: 22 August 2004

ABSTRACT

Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have remained only loosely related. In this paper, we give an explicit theoretical connection between them. We show the generality of the weighted kernel k-means objective function, and derive the spectral clustering objective of normalized cut as a special case. Given a positive definite similarity matrix, our results lead to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut. This has important implications: a) eigenvector-based algorithms, which can be computationally prohibitive, are not essential for minimizing normalized cuts, b) various techniques, such as local search and acceleration schemes, may be used to improve the quality as well as speed of kernel k-means. Finally, we present results on several interesting data sets, including diametrical clustering of large gene-expression matrices and a handwriting recognition data set.
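
The abstract does not spell out the update rule, so the following is only a minimal sketch of a Lloyd-style weighted kernel k-means iteration, under stated assumptions. It assumes a kernel (similarity) matrix `K`, non-negative point weights `w`, and an initial labeling, and it assigns each point to the cluster whose weighted mean is nearest in feature space, with distances computed from kernel entries alone. The function name `weighted_kernel_kmeans` and its parameters are hypothetical and do not come from the paper. The specific kernel and weights under which this objective coincides with the normalized cut are derived in the paper itself and are not reproduced here.

```python
import numpy as np


def weighted_kernel_kmeans(K, w, labels, n_clusters, max_iter=100):
    """Hypothetical helper: batch weighted kernel k-means on a kernel matrix.

    K          : (n, n) symmetric kernel matrix (assumed positive semidefinite)
    w          : (n,) non-negative point weights
    labels     : (n,) initial cluster index for each point
    n_clusters : number of clusters
    """
    K = np.asarray(K, dtype=float)
    w = np.asarray(w, dtype=float)
    labels = np.asarray(labels).copy()
    n = K.shape[0]
    diag = np.diag(K)

    for _ in range(max_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            wc = w[members]
            s = wc.sum()
            if s == 0:
                continue  # empty cluster: its distances stay at infinity
            # <phi(a_i), m_c> = sum_j w_j K_ij / s, for every point i
            cross = K[:, members] @ wc / s
            # ||m_c||^2 = sum_{j,l} w_j w_l K_jl / s^2
            within = wc @ K[np.ix_(members, members)] @ wc / s**2
            # ||phi(a_i) - m_c||^2 expanded in kernel entries
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

With a linear kernel (`K = X @ X.T`) and unit weights, this sketch reduces to ordinary k-means on the rows of `X`, which gives a quick sanity check before trying other kernels or weights.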

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval

Published in
KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2004
874 pages
ISBN: 1581138881
DOI: 10.1145/1014052
General Chairs: Won Kim (Cyber Database Solutions), Ronny Kohavi (Amazon.com)
Program Chairs: Johannes Gehrke (Cornell University), William DuMouchel (AT&T Labs Research)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph partitioning
kernel k-means
spectral clustering
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%