ABSTRACT
This paper proposes a method that reduces the intra-cluster distance and increases the inter-cluster distance in the k-means problem with missing data. Filling in the missing data, calculating the inter-cluster distances, and clustering are integrated into a single objective function, which is solved iteratively. Finally, the method is applied to four UCI datasets, and the results show that it performs well.
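To make the iterative scheme concrete, here is a minimal sketch of a loop that alternates between clustering and filling in missing values. It is not the paper's objective function: it uses the simpler, well-known alternation of k-means steps with re-imputation from the assigned centroid, and it omits the inter-cluster distance term entirely. The function name `kmeans_impute`, the use of NaN to mark missing entries, the column-mean starting fill, and the random initialization are all assumptions made for illustration.

```python
import numpy as np


def kmeans_impute(X, k, n_iter=50, seed=0):
    """Alternate k-means updates with re-imputation of missing (NaN) entries.

    Illustrative sketch only; assumes every column has at least one
    observed value and that k does not exceed the number of rows.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Starting fill: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Random initial centers drawn from the filled data (fancy indexing copies).
    centers = filled[rng.choice(len(filled), size=k, replace=False)]
    labels = np.zeros(len(filled), dtype=int)

    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        dists = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)

        # Update step: each center becomes the mean of its members.
        # An empty cluster keeps its previous center.
        for j in range(k):
            members = filled[new_labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

        # Imputation step: missing entries take the value of the assigned
        # center; observed entries are always restored from X.
        new_filled = np.where(missing, centers[new_labels], X)

        converged = (np.array_equal(new_labels, labels)
                     and np.allclose(new_filled, filled))
        labels, filled = new_labels, new_filled
        if converged:
            break

    return labels, centers, filled


# Hypothetical toy data: two groups of points with a few missing coordinates.
X_demo = np.array([
    [0.0, 0.0], [0.1, np.nan], [np.nan, 0.2],
    [10.0, 10.0], [10.1, np.nan], [np.nan, 9.9],
])
demo_labels, demo_centers, demo_filled = kmeans_impute(X_demo, k=2)
```

Each of the three steps can only lower (or keep) the total squared distance between the filled points and their assigned centers, which is why a loop of this kind settles down. Like ordinary k-means, the result depends on the initial centers, so several seeds are usually worth trying.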