ABSTRACT
The combination of clustering with Deep Learning has gained much attention in recent years. Unsupervised neural networks like autoencoders can autonomously learn the essential structures of a data set, an idea that can be combined with clustering objectives to learn relevant features automatically. Unfortunately, such approaches are often based on a k-means framework, from which they inherit various assumptions, like spherically shaped clusters. Another assumption, also found in approaches outside the k-means family, is that the number of clusters is known a priori. In this paper, we present the novel clustering algorithm DipDECK, which estimates the number of clusters while simultaneously optimizing a Deep Learning-based clustering objective, and which can cluster complex data sets without being restricted to spherically shaped clusters. Our algorithm works by heavily overestimating the number of clusters in the embedded space of an autoencoder and then, based on Hartigan's Dip-test (a statistical test for unimodality), analysing the resulting micro-clusters to determine which of them to merge. Extensive experiments demonstrate the benefits of our method: (1) we achieve competitive results while learning the clustering-friendly representation and the number of clusters simultaneously; (2) our method is robust with respect to its parameters, stable in performance, and allows for more flexibility in the cluster shapes; (3) we outperform relevant competitors in estimating the number of clusters.
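To make the merge criterion concrete, the following is a minimal sketch in plain Python of the dip-based merge loop described above. It is not the authors' implementation: it assumes an already-trained autoencoder embedding `Z` (the method interleaves merging with further training of the autoencoder, which this sketch omits), it relies on the third-party `diptest` package for Hartigan's Dip-test, and the parameter values `k_init=35` and `p_threshold=0.9` are illustrative assumptions.

```python
# Sketch of the dip-based merge step, NOT the authors' reference
# implementation. Assumes Z is an (n, d) numpy array holding the
# autoencoder embedding of the data; autoencoder retraining between
# merges is omitted. Uses the third-party `diptest` package
# (pip install diptest) for Hartigan's Dip-test.
import numpy as np
from sklearn.cluster import KMeans
import diptest


def dip_merge_clusters(Z, k_init=35, p_threshold=0.9):
    """Overcluster Z into k_init micro-clusters, then greedily merge the
    pair whose 1-d projection onto the line connecting their centers
    looks most unimodal, until every remaining pair looks multimodal."""
    labels = KMeans(n_clusters=k_init, n_init=10).fit_predict(Z)
    while True:
        ids = np.unique(labels)
        centers = np.array([Z[labels == i].mean(axis=0) for i in ids])
        best = None  # (p_value, cluster_a, cluster_b)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                # Project the union of both micro-clusters onto the
                # line connecting their centers -> a 1-d sample.
                direction = centers[b] - centers[a]
                mask = np.isin(labels, (ids[a], ids[b]))
                proj = Z[mask] @ direction
                _, pval = diptest.diptest(proj)  # high p-value: unimodal
                if best is None or pval > best[0]:
                    best = (pval, ids[a], ids[b])
        if best is None or best[0] < p_threshold:
            return labels  # every remaining pair looks multimodal
        labels[labels == best[2]] = best[1]  # merge b into a
```

The greedy choice of the most unimodal-looking pair mirrors the bottom-up strategy sketched in the abstract: starting from a deliberate overestimate, micro-clusters are only ever merged, never split, so the estimated number of clusters decreases monotonically until the Dip-test rejects unimodality for every remaining pair.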
REFERENCES
- Horst Bischof, Aleš Leonardis, and Alexander Selb. 1999. MDL principle for robust vector quantisation. Pattern Analysis & Applications, Vol. 2, 1 (1999), 59--72.
- Christian Böhm, Christos Faloutsos, Jia-Yu Pan, and Claudia Plant. 2006. Robust information-theoretic clustering. In SIGKDD. 65--75.
- Theofilos Chamalis and Aristidis Likas. 2018. The Projected Dip-Means Clustering Algorithm. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence.
- Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. 2018. Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718 (2018).
- Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
- L. Duan, C. Aggarwal, S. Ma, and S. Sathe. 2019. Improving Spectral Clustering with Deep Embedding and Cluster Estimation. In ICDM. 170--179.
- Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In SIGKDD, Vol. 96. 226--231.
- Yu Feng and Greg Hamerly. 2007. PG-means: learning the number of clusters in data. In Advances in Neural Information Processing Systems. 393--400.
- Kamran Ghasedi Dizaji, Amirhossein Herandi, Cheng Deng, Weidong Cai, and Heng Huang. 2017. Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of the IEEE International Conference on Computer Vision. 5736--5745.
- Xifeng Guo, Long Gao, Xinwang Liu, and Jianping Yin. 2017. Improved Deep Embedded Clustering with Local Structure Preservation. In IJCAI.
- Greg Hamerly and Charles Elkan. 2004. Learning the k in k-means. In Advances in Neural Information Processing Systems. 281--288.
- J. A. Hartigan and P. M. Hartigan. 1985. The Dip Test of Unimodality. Ann. Statist., Vol. 13, 1 (1985), 70--84. https://doi.org/10.1214/aos/1176346577
- Sebastian Houben, Johannes Stallkamp, Jan Salmen, Marc Schlipsing, and Christian Igel. 2013. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In IJCNN. IEEE, 1--8.
- Jonathan J. Hull. 1994. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, 5 (1994), 550--554.
- Zhuxi Jiang, Yin Zheng, Huachun Tan, Bangsheng Tang, and Hanning Zhou. 2017. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. In IJCAI. 1965--1972. https://doi.org/10.24963/ijcai.2017/273
- Argyris Kalogeratos and Aristidis Likas. 2012. Dip-means: an incremental clustering method for estimating the number of clusters. In Advances in Neural Information Processing Systems.
- Yann LeCun. 1987. Modèles connexionnistes de l'apprentissage (connectionist learning models). PhD thesis. Université P. et M. Curie (Paris 6).
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE, Vol. 86, 11 (1998), 2278--2324.
- Samuel Maurus and Claudia Plant. 2016. Skinny-dip: clustering in a sea of noise. In SIGKDD. 1055--1064.
- Dominik Mautz, Claudia Plant, and Christian Böhm. 2019. Deep embedded cluster tree. In ICDM. IEEE, 1258--1263.
- Tom Monnier, Thibault Groueix, and Mathieu Aubry. 2020. Deep Transformation-Invariant Clustering. In NeurIPS.
- Sudipto Mukherjee, Himanshu Asnani, Eugene Lin, and Sreeram Kannan. 2019. ClusterGAN: Latent Space Clustering in Generative Adversarial Networks. In AAAI. 4610--4617.
- Dan Pelleg and Andrew W. Moore. 2000. X-Means: Extending K-Means with Efficient Estimation of the Number of Clusters. In ICML.
- B. Schelling, L. Bauer, S. Behzadi Soheil, and C. Plant. 2020. Utilizing Structure-rich Features to improve Clustering. In ECML-PKDD.
- B. Schelling and C. Plant. 2018. Dip Transformation: Enhancing the Structure of a Dataset and Thereby Improving Clustering. In ICDM.
- Nguyen Xuan Vinh, Julien Epps, and James Bailey. 2010. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. Journal of Machine Learning Research (2010).
- Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747 (2017).
- Junyuan Xie, Ross B. Girshick, and Ali Farhadi. 2016. Unsupervised Deep Embedding for Clustering Analysis. In ICML.
- Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, and Mingyi Hong. 2017. Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering. In ICML.
- Jianwei Yang, Devi Parikh, and Dhruv Batra. 2016. Joint Unsupervised Learning of Deep Representations and Image Clusters. In CVPR.
- Lihi Zelnik-Manor and Pietro Perona. 2005. Self-Tuning Spectral Clustering. In Advances in Neural Information Processing Systems.