Abstract
Finding useful patterns in data is a long-studied topic, and one of the most researched problems in this area is identifying cluster structure within a dataset. This paper introduces a new data clustering method, Data Points Clustering via Gumbel Softmax (DPCGS), and demonstrates its suitability for clustering datasets of data points. We evaluate the efficiency and clustering quality of DPCGS through several experiments, which show that, depending on the dataset, our method can identify statistically relevant clustering structures. We also present a performance comparison on the Wine, Wheat Seeds, Iris, and Wisconsin Breast Cancer datasets, benchmarking DPCGS against established and recently proposed clustering algorithms: BIRCH, K-Means, Affinity Propagation, Agglomerative Clustering, Mini-batch K-Means, and Nested Mini-batch K-Means. DPCGS outperforms most of these previously and recently proposed clustering algorithms.
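The authors' full DPCGS implementation is linked below; as a minimal illustration of the Gumbel-softmax reparameterization (Jang et al.) that the method builds on, the following NumPy sketch draws differentiable "soft one-hot" cluster assignments for a set of points. The temperature value and the uniform per-cluster logits here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gumbel_softmax(logits, temperature=1.0, rng=None, eps=1e-12):
    """Draw a differentiable soft one-hot sample from the categorical
    distribution defined by `logits` over the last axis."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1); eps guards log(0)
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u + eps) + eps)
    y = (logits + gumbel) / temperature
    # numerically stable softmax over the last axis
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Soft assignment of 5 points to 3 clusters (uniform logits, illustrative)
logits = np.zeros((5, 3))
assignments = gumbel_softmax(logits, temperature=0.5,
                             rng=np.random.default_rng(0))
hard_labels = assignments.argmax(axis=-1)  # one cluster label per point
```

Lowering the temperature pushes each row of `assignments` toward a hard one-hot vector while keeping the sampling step differentiable, which is what allows cluster assignments to be learned by gradient descent.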
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Source Code Available
The code for this research is available on github at: https://github.com/deepakacharyab/data_points_cluster_gumbel_softmax
Cite this article
Acharya, D.B., Zhang, H. Data Points Clustering via Gumbel Softmax. SN COMPUT. SCI. 2, 311 (2021). https://doi.org/10.1007/s42979-021-00707-4