Abstract
Gaussian mixture model (GMM) clustering has been extensively studied for its effectiveness and efficiency. Despite its promising performance in various applications, it cannot effectively handle absent features in the data, which are not uncommon in practice. In this article, unlike existing approaches that first impute the missing values and then perform GMM clustering on the imputed data, we propose to integrate imputation and GMM clustering into a unified learning procedure. Specifically, the missing entries are filled using the current GMM clustering result, and the imputed data are then used for the next round of GMM clustering. These two steps alternately negotiate with each other until an optimum is reached; in this way, the imputed data best serve the GMM clustering. A two-step alternating algorithm with proven convergence is carefully designed to solve the resulting optimization problem. Extensive experiments on eight UCI benchmark datasets validate the effectiveness of the proposed algorithm.
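The alternating procedure described above can be sketched as follows. This is a minimal illustration of the impute-then-cluster loop, not the authors' exact formulation: it initializes missing entries with column means, fits a GMM, re-imputes each missing value with the posterior-weighted mixture of component means, and repeats. The function name and iteration count are illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_cluster_incomplete(X, n_components=2, n_iters=10, seed=0):
    """Alternate between imputing missing entries and GMM clustering.

    X: array with np.nan marking absent features.
    Returns (cluster labels, imputed data).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 0: initialize missing entries with column means.
    col_means = np.nanmean(X, axis=0)
    X_imp = np.where(missing, col_means, X)

    for _ in range(n_iters):
        # Step 1: fit a GMM on the current imputed data.
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        gmm.fit(X_imp)
        # Step 2: refill each missing value with the posterior-weighted
        # combination of component means, so the clustering result
        # drives the imputation for the next round.
        resp = gmm.predict_proba(X_imp)   # (n_samples, n_components)
        filled = resp @ gmm.means_        # expected value per sample
        X_imp = np.where(missing, filled, X)
    return gmm.predict(X_imp), X_imp
```

In a full treatment the two steps would optimize a single shared objective with a convergence guarantee, as the article proposes; the sketch above simply alternates them a fixed number of times.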
Index Terms
- Gaussian Mixture Model Clustering with Incomplete Data