ABSTRACT
Is there an optimal dimensionality reduction for k-means, one that reveals the prominent cluster structure hidden in the data? We propose SUBKMEANS, which extends the classic k-means algorithm. The goal of this algorithm is twofold: find a good k-means-style clustering partition and transform the clusters into a common subspace that is optimal for the cluster structure. Our solution pursues these two goals simultaneously. The dimensionality of this subspace is found automatically, so the algorithm comes without the burden of additional parameters; at the same time, the subspace helps to mitigate the curse of dimensionality. The SUBKMEANS optimization algorithm is intriguingly simple and efficient. It is easy to implement, can readily be adapted to the situation at hand, and is compatible with many existing extensions and improvements of k-means.
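To make the alternating scheme concrete, below is a minimal NumPy sketch of a SUBKMEANS-style iteration: points are assigned with ordinary k-means distances in a rotated cluster space, while the orthogonal basis V and the subspace dimensionality m are recovered from an eigendecomposition of the within-cluster scatter minus the full-data scatter. This is our illustrative reading of the abstract's alternating optimization, not the authors' reference implementation; the names (subkmeans, scatter, V, m) are ours.

```python
import numpy as np

def scatter(X):
    """Scatter matrix of X around its mean."""
    D = X - X.mean(axis=0)
    return D.T @ D

def subkmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random orthogonal basis
    m = d // 2                                         # initial cluster-space dimensionality
    centers = X[rng.choice(n, size=k, replace=False)]  # simple random init
    S_D = scatter(X)                                   # scatter of the full dataset
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: ordinary k-means, but in the m-dimensional cluster space.
        P = V[:, :m]
        d2 = (((X @ P)[:, None, :] - (centers @ P)[None, :, :]) ** 2).sum(axis=2)
        new = d2.argmin(axis=1)
        if np.array_equal(new, labels):                # converged: assignments stable
            break
        labels = new
        # Update step: centers in full space, then rotation and dimensionality.
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else X[rng.integers(n)] for j in range(k)])
        S_W = sum(scatter(X[labels == j]) for j in range(k) if (labels == j).any())
        evals, V = np.linalg.eigh(S_W - S_D)           # eigenvalues in ascending order
        m = max(1, int((evals < 0).sum()))             # negative eigenvalues span the cluster space
    return labels, V[:, :m], m
```

Note how the dimensionality m falls out of the eigendecomposition (the count of negative eigenvalues) rather than being supplied as a parameter, matching the abstract's claim that the subspace dimensionality is found automatically; the returned projection V[:, :m] can then be used, for instance, to visualize the clustered data in the learned subspace.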