Modified Fuzzy Gap statistic for estimating preferable number of clusters in Fuzzy k-means clustering
Data sets
To evaluate the various internal measures, we analyzed two artificial data sets, the same data sets with added noise, and two experimentally obtained data sets: a leukemia data set and a yeast data set.
Artificial data sets
We prepared two artificial data sets: C4D3 (4 clusters, 3-dimensional; shown in Fig. 1) and C4D5 (4 clusters, 5-dimensional; data not shown).
Leukemia data set
We used the acute leukemia gene expression data set that Golub et al. (28) analyzed by hierarchical clustering. These data, obtained from 38 patients,
Estimation results for number of clusters in C4D3 and C4D5 data sets
We prepared two artificial data sets (C4D3, shown in Fig. 1, and C4D5), each having 4 clusters. When the MFGS and the Gap statistic were applied to the C4D3 data set, the values of the modified Gap(k) (MFGap(k)) and Gap(k), calculated using Eq. 6, increased rapidly until the number of clusters (k) reached 4 (see Fig. 2A, B). By the definition of the MFGS and the Gap statistic (Eq. 7), each point in these plots has an error bar (s_k), whereas the plots for PC, FHV, and XB have none (Fig. 2C–E). Since the minimum k
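Eq. 6 and Eq. 7 are not reproduced in this excerpt, so the following is only a rough sketch of how a Gap value, its error bar, and the resulting choice of k fit together. It assumes the standard Gap statistic of Tibshirani et al., not the modified fuzzy variant (MFGS): Gap(k) is the mean of log W_k over B reference data sets minus log W_k on the observed data, s_k is the standard deviation of the reference values scaled by sqrt(1 + 1/B), and the selected k is the smallest one with Gap(k) >= Gap(k+1) - s_{k+1}. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
import math
import statistics


def gap_with_error(log_wk, ref_log_wk):
    """Return (Gap(k), s_k) for a single k.

    log_wk     : log within-cluster dispersion on the observed data
    ref_log_wk : the same quantity on each of B reference (null) data sets
    """
    b = len(ref_log_wk)
    gap = statistics.fmean(ref_log_wk) - log_wk
    # Population standard deviation over the B reference data sets.
    sd = statistics.pstdev(ref_log_wk)
    s_k = sd * math.sqrt(1.0 + 1.0 / b)
    return gap, s_k


def select_k(ks, gaps, errors):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; None if no k qualifies."""
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - errors[i + 1]:
            return ks[i]
    return None


# Made-up illustrative values (assumption, not results from the paper):
# Gap(k) rises until k = 4 and then flattens.
ks = [1, 2, 3, 4, 5, 6]
gaps = [0.10, 0.35, 0.60, 0.95, 0.93, 0.94]
errors = [0.02, 0.02, 0.03, 0.03, 0.03, 0.03]
best_k = select_k(ks, gaps, errors)
print(best_k)
```

Indices such as PC, FHV, and XB yield a single value per k with no resampling step, which is why no s_k, and hence no error bar, is available for them.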
References (36)
- et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell (1998)
- et al. A preprocessing method for inferring genetic interaction from gene expression data using Boolean algorithm. J. Biosci. Bioeng. (2004)
- et al. Application of bioinformatics for DNA microarray data to bioscience, bioengineering and medical field. J. Biosci. Bioeng. (2006)
- Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. (1987)
- et al. A cluster validation index for GK cluster analysis based on relative degree of sharing. Inf. Sci. (2004)
- et al. Knowledge-assisted recognition of cluster boundaries in gene expression data. Artif. Intell. Med. (2005)
- et al. Validity index for crisp and fuzzy clusters. Pattern Recognit. (2004)
- A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recogn. Lett. (2007)
- et al. The transcriptional program of sporulation in budding yeast. Science (1998)
- et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell (1998)
- An integrated comprehensive workbench for inferring genetic networks: voyagene. J. Bioinform. Comput. Biol.
- Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA
- Systematic determination of genetic network architecture. Nat. Genet.
- Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA
- Analysis of expression profile using fuzzy adaptive resonance theory. Bioinformatics
- Fuzzy C-means method for clustering microarray data. Bioinformatics
- Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol.
- Transcriptional regulation and function during the human cell cycle. Nat. Genet.