A novel cluster validity index for fuzzy C-means algorithm

  • Methodologies and Application
Soft Computing

Abstract

Determining the number of clusters is a central problem in many clustering applications. To address it, this paper proposes a new clustering approach that combines an improved morphology similarity distance with a novel cluster validity index. An optimized morphology similarity distance, based on the standard Euclidean distance and the ReliefF algorithm, is used to construct the validity index, which balances intra-cluster consistency against inter-cluster consistency. Combining the proposed index with fuzzy C-means yields a new algorithm, named OMS-OSC. Experiments on artificial and real-world data sets show that the algorithm not only performs well but also detects the correct number of clusters.
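The general scheme described above, running fuzzy C-means for several candidate cluster numbers and keeping the one with the best validity index, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the formula of the proposed index or of the optimized morphology similarity distance, so the sketch substitutes the well-known Xie-Beni index and plain Euclidean distance. All function names, the farthest-point initialization, and the parameter defaults are hypothetical choices made for this example, not the OMS-OSC algorithm itself.

```python
import numpy as np


def sq_dists(X, centers):
    """Squared Euclidean distances, shape (n_points, n_centers)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def memberships(X, centers, m):
    """Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))."""
    d2 = np.maximum(sq_dists(X, centers), 1e-12)  # guard against zero distance
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def farthest_point_init(X, c):
    """Deterministic start (an assumption): greedily pick mutually distant points."""
    centers = [X[0]]
    for _ in range(1, c):
        nearest = sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[np.argmax(nearest)])
    return np.array(centers)


def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6):
    """Plain fuzzy C-means; returns (centers, U) with U of shape (n, c)."""
    centers = farthest_point_init(X, c)
    U = memberships(X, centers, m)
    for _ in range(n_iter):
        W = U ** m
        # Each center is the membership-weighted mean of all points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni index (a stand-in, NOT the paper's index); lower is better."""
    compactness = ((U ** m) * sq_dists(X, centers)).sum()
    sep = sq_dists(centers, centers)
    np.fill_diagonal(sep, np.inf)  # ignore each center's distance to itself
    return compactness / (X.shape[0] * sep.min())


def select_cluster_number(X, candidates, m=2.0):
    """Run FCM for each candidate c and keep the c with the lowest index value."""
    scores = {}
    for c in candidates:
        centers, U = fuzzy_c_means(X, c, m=m)
        scores[c] = xie_beni(X, centers, U, m=m)
    return min(scores, key=scores.get), scores
```

Calling `select_cluster_number(X, range(2, 6))` on a float array `X` of shape `(n_points, n_features)` returns the selected cluster number together with the index value for every candidate. Swapping in a different validity index only requires replacing `xie_beni`, which is where an index such as the one proposed in the paper would plug in.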

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61573157, 61561024 and 61562038), the Natural Science Foundation of Guangdong Province of China (Grant No. 2014A030313454), and the Key Project of Natural Statistical Science and Research (Grant No. 2015LZ30).

Author information

Corresponding author

Correspondence to Kangshun Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Yang, S., Li, K., Liang, Z. et al. A novel cluster validity index for fuzzy C-means algorithm. Soft Comput 22, 1921–1931 (2018). https://doi.org/10.1007/s00500-016-2453-y

  • DOI: https://doi.org/10.1007/s00500-016-2453-y
