Abstract
Clustering is an unsupervised process, since there are no predefined classes and no examples that would indicate grouping properties in the data set. Most clustering algorithms behave differently depending on the features of the data set and the initial assumptions used to define groups. Therefore, in most applications the resulting clustering scheme requires some evaluation of its validity. Evaluating and assessing the results of a clustering algorithm is the main subject of cluster validity. In this paper we present a review of cluster validity and its methods. More specifically, Part I of the paper discusses cluster validity approaches based on external and internal criteria.
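As a rough illustration of what an external criterion looks like, the sketch below computes the Rand statistic, a standard pair-counting measure that compares a clustering against an independently given reference partition. The abstract does not say which indices the paper covers, so the choice of the Rand statistic and the function name `rand_index` are assumptions made for this example only.

```python
from itertools import combinations


def rand_index(labels_ref, labels_found):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two points
    in the same group, or both put them in different groups.
    `rand_index` is an illustrative name, not taken from the paper.
    """
    if len(labels_ref) != len(labels_found):
        raise ValueError("both labelings must cover the same points")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_ref)), 2):
        same_in_ref = labels_ref[i] == labels_ref[j]
        same_in_found = labels_found[i] == labels_found[j]
        if same_in_ref == same_in_found:
            agreements += 1
        pairs += 1

    # With fewer than two points there are no pairs to disagree on.
    return agreements / pairs if pairs else 1.0


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    found = [1, 1, 0, 0]  # same grouping, different label names
    print(rand_index(reference, found))
```

The measure depends only on which points are grouped together, not on the label values, so relabeling the clusters leaves the score unchanged. An internal criterion, by contrast, would be computed from the data and the clustering alone, with no reference partition.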