Elsevier

Pattern Recognition

Volume 60, December 2016, Pages 531-542

Fuzzy based affinity learning for spectral clustering

https://doi.org/10.1016/j.patcog.2016.06.011

Highlights

  • A fuzzy-set based affinity graph construction method is proposed.

  • The model is capable of capturing and combining subtle similarity information distributed over discriminative feature subspaces.

  • Experiments on different kinds of data show the superiority of the proposed approach compared to other state-of-the-art methods.

Abstract

Spectral clustering makes use of the spectral-graph structure of an affinity matrix to partition data into disjoint meaningful groups. It requires robust and appropriate affinity graphs as input in order to form clusters with the desired structures. Constructing such affinity graphs is a nontrivial task due to the ambiguity and uncertainty inherent in raw data. Most existing spectral clustering methods adopt the Gaussian kernel as the similarity measure and employ all available features to construct affinity matrices with the Euclidean distance, which is often not an accurate representation of the underlying data structures, especially when the number of features is large. In this paper, we propose a novel unsupervised approach, named Axiomatic Fuzzy Set-based Spectral Clustering (AFSSC), which generates more robust affinity graphs by identifying and exploiting discriminative features. Specifically, our model captures and combines subtle similarity information distributed over discriminative feature subspaces to reveal the latent data distribution more accurately and thereby improve data clustering. We demonstrate the efficacy of the proposed approach on several kinds of data; the results show its superiority over other state-of-the-art methods.

Introduction

Unsupervised data analysis using clustering algorithms provides a useful tool for exploring data structures. Clustering methods [1], [2] have been studied in many contexts and disciplines, such as data mining, document retrieval, image segmentation and pattern classification. The aim of clustering is to group patterns on the basis of similarity (or dissimilarity) criteria, where groups (or clusters) are sets of similar patterns. Traditional clustering approaches such as k-means and Gaussian mixture models, which are based on estimating explicit models of the data, provide high-quality results when the data is distributed according to the assumed models. However, when the data appears in more complex or unknown forms, these methods tend to fail. An alternative approach that has been shown to handle such structured data is spectral clustering. It does not require estimating an explicit model of the data distribution; instead, a spectral analysis of the pairwise similarities is conducted.

Spectral clustering normally involves two steps: constructing an affinity graph based on an appropriate metric, and establishing an appropriate way to “cut” the graph. Plenty of approaches exist to address the graph-cut problem, such as minimal cut [3], ratio cut [4] and normalized cut [5]. For constructing the affinity graph, there are three popular approaches: (1) The ε-neighborhood graph: this graph is constructed by connecting all points whose pairwise distances are smaller than a pre-set constant ε. (2) The k-nearest-neighbor graph: here the goal is to connect vertices vi and vj if vj is among the k-nearest neighbors of vi. (3) The fully connected graph: here all vertices are connected, and the edges are weighted by the positive similarities between each pair of vertices. According to von Luxburg [6], all three types of affinity graph are regularly used in spectral clustering, and there is no theoretical analysis of how the choice of affinity graph influences the performance of spectral clustering.
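The three graph constructions above can be sketched with NumPy. This is a minimal illustration (the naive O(n²) pairwise-distance computation, not an optimized implementation):

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def epsilon_graph(X, eps):
    """(1) epsilon-neighborhood graph: 0/1 edges between points closer than eps."""
    W = (pairwise_distances(X) < eps).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_graph(X, k):
    """(2) k-nearest-neighbor graph, symmetrized so W is a valid affinity matrix."""
    D = pairwise_distances(X)
    W = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]  # skip column 0 (the point itself)
    for i, neighbors in enumerate(idx):
        W[i, neighbors] = 1.0
    return np.maximum(W, W.T)  # connect i and j if either is a k-NN of the other

def fully_connected_graph(X, sigma):
    """(3) fully connected graph with Gaussian weights exp(-d^2 / (2 sigma^2))."""
    D = pairwise_distances(X)
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```

The symmetrization in `knn_graph` corresponds to the usual undirected variant; a "mutual" k-NN graph would use `np.minimum` instead.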

The crucial problem in constructing the fully connected graph is defining the pairwise similarity. The notion of data similarity is often intimately tied to a specific metric function, typically the ℓ2-norm (i.e. the Euclidean metric) measured with respect to the whole feature space. However, defining the pairwise similarity for effective spectral clustering is fundamentally a challenging problem [7] given complex data that are often high-dimensional and heterogeneous, when no prior knowledge or supervision is available. Trusting all available features blindly for measuring pairwise similarities and constructing data graphs is susceptible to unreliable or noisy features [8], particularly so for real-world visual data, e.g. images and videos, where signals can be intrinsically inaccurate and unstable owing to uncontrollable sources of variation such as changes in illumination, context, occlusion and background clutter [9]. Moreover, confining the notion of similarity to the ℓ2-norm metric implicitly imposes an unrealistic assumption on complex data structures that do not necessarily possess Euclidean behavior [8].

In this paper, our aim is to deduce robust pairwise similarities so as to construct more meaningful affinity graphs, yielding performance improvements for spectral clustering. To achieve this goal, we first formulate a unified and generalized data-distance inference framework based on AFS fuzzy theory [10] with two innovations: (1) Instead of using the complete feature space as a whole, the proposed model is designed to avoid indistinctive features via fuzzy membership functions, yielding similarity graphs that better express the underlying semantic structure in the data; this significantly reduces the number of features used in the clustering process. (2) The Euclidean assumption for data-similarity inference is relaxed using the fuzzy logic operations defined in AFS. The data distance is then put into the Gaussian kernel to enforce locality. It is worth mentioning that the distinctive features used to represent samples may differ from one sample to another, i.e., every sample can have its own feature subspace. Accordingly, the measured distance depends on the pairwise feature subspace. A similar idea was presented in [11], which states that different similarities can be induced from a given sample pair if distinct propositions are taken or different questions are asked about data commonalities. In our proposed model, the assumption is that there is no single optimal feature subspace that works well for all samples; each sample pair has its own best feature subspace for distance measurement. In terms of AFS clustering, we propose a new method to solve the similarity matrix instead of using the transitive closure, which needs additional evaluation criteria to obtain a clustering result. Extensive experiments demonstrate that the proposed method is superior to both the original spectral clustering and AFS clustering when the number of features is large.
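The per-pair feature-subspace idea can be illustrated with a toy sketch. Note that this is not the authors' AFS formulation: the selection rule (keep, for each pair, the `n_keep` standardized features on which the pair differs least) and the parameter `n_keep` are illustrative assumptions, standing in for the fuzzy-set machinery described later.

```python
import numpy as np

def pairwise_subspace_affinity(X, n_keep, sigma):
    """Toy sketch of pair-specific feature subspaces (illustrative, not AFS):
    standardize features, then for every pair (i, j) measure squared distance
    over only the n_keep features where the pair differs least, and pass it
    through a Gaussian kernel to enforce locality."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # z-score each feature
    n = Z.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diff2 = (Z[i] - Z[j]) ** 2
            d2 = np.sort(diff2)[:n_keep].sum()  # pair-specific feature subspace
            W[i, j] = W[j, i] = np.exp(-d2 / (2.0 * sigma ** 2))
    return W
```

The point of the sketch is only that the distance for pair (i, j) is computed in a subspace chosen for that pair, rather than in one global feature space.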

The rest of this paper is organized as follows. Section 2 presents some previous work on spectral clustering. The main ideas of AFS theory are described in Section 3. In Section 4 we propose a novel approach for generating robust affinity graphs. Experimental results on UCI datasets, USPS handwritten digits and face images are presented in Section 5, and we conclude our work in Section 6.

Section snippets

Related work

A large amount of work has been conducted on spectral clustering [5], [12], [13], [14], [15], [16]. Generally, existing approaches for improving spectral clustering performance fall into two paradigms: (1) improving data grouping while the affinity graph is fixed [5], [12], [15]; for example, Xiang and Gong [15] proposed to identify informative and relevant eigenvectors of a data affinity matrix. (2) Constructing appropriate affinity graphs so as to improve the clustering

AFS theory

The proposed affinity-matrix construction approach is built on AFS theory, which was originally proposed in [10] and then extensively developed in [24], [27], [28], etc. In AFS theory, fuzzy sets (via their membership functions) and their logic operations are determined algorithmically according to the distribution of the original data and the semantics of the fuzzy sets. The AFS framework enables the membership functions and fuzzy logic operations to be created based on information within a

Feature descriptions for samples

Given a set of data points X in R^n and a feature set F = {f_1, f_2, ..., f_l}, a set of fuzzy terms M = {m_{i,j} | 1 ≤ i ≤ l, 1 ≤ j ≤ k_i} can be defined, where m_{i,1}, m_{i,2}, ..., m_{i,k_i} are the fuzzy terms associated with the feature f_i in F. Usually we set k_i = 3, meaning that each feature f_i is associated with three fuzzy terms m_{i,1}, m_{i,2}, m_{i,3}, representing the semantic concepts “large”, “medium” and “small” respectively. However, if a certain feature f_i is a Boolean parameter, k_i is set to 2 and only two fuzzy terms m_{i,1}, m_{i,2} are defined for
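The three fuzzy terms per numeric feature can be sketched as follows. AFS derives membership functions from the data distribution; the empirical-rank construction below is a simple stand-in for illustration, not the paper's exact definition:

```python
import numpy as np

def fuzzy_terms(x):
    """Illustrative membership functions for the fuzzy terms of one numeric
    feature (rank-based stand-in, not the AFS construction):
      "large"  = fraction of samples <= v,
      "small"  = fraction of samples >= v,
      "medium" = peaks near the median, clipped to [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    large = np.array([(x <= v).sum() for v in x]) / n
    small = np.array([(x >= v).sum() for v in x]) / n
    medium = np.clip(2.0 * np.minimum(large, small), 0.0, 1.0)
    return small, medium, large
```

For a Boolean feature, only two terms would be used, with memberships degenerating to the 0/1 indicator of each value.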

Experimental settings

The proposed AFSSC method and the widely used NJW spectral clustering (SC) [12], self-tuning spectral clustering (STSC) [13] and AFS clustering (AFS) [26] methods are applied to the same data sets from UCI, USPS handwritten digits, and the CMU-PIE and Yale face image sets. In the experiments, for SC, the value of σ is obtained by search and the one giving the best result is picked, as suggested in [12]. With STSC and AFSSC, M varies from 1 to 100 (including 7 as suggested in

Conclusion

In this paper, a novel generalized and unsupervised approach to constructing more robust and meaningful data affinity graphs for improving spectral clustering has been presented. Instead of defaulting to the Euclidean distance, we adopt a fuzzy-theoretic definition of data similarity and derive affinity graphs via fuzzy membership functions. Furthermore, rather than blindly trusting all available variables, affinity graphs are derived through capturing and combining subtle pairwise

Qilin Li received the BSc degree in Computer Science from Sun Yat-Sen University, China, in 2013. Currently, he is doing his MPhil degree in Curtin University, Australia. His research interests include pattern recognition, computer vision and machine learning.

References (41)

  • L. Hagen et al.

    New spectral methods for ratio cut partitioning and clustering

    IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

    (1992)
  • J. Shi et al.

    Normalized cuts and image segmentation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • U. Von Luxburg

    A tutorial on spectral clustering

    Stat. Comput.

    (2007)
  • X. Zhu, C.C. Loy, S. Gong, Constructing robust affinity graphs for spectral clustering, in: 2014 IEEE Conference on...
  • S. Gong, C.C. Loy, T. Xiang, Security and surveillance, in: Visual Analysis of Humans, Springer, 2011, pp....
  • D. Lin, An information-theoretic definition of similarity, in: ICML, vol. 98, 1998, pp....
  • A.Y. Ng et al.

On spectral clustering: analysis and an algorithm

    Adv. Neural Inf. Process. Syst.

    (2002)
  • L. Zelnik-Manor, P. Perona, Self-tuning spectral clustering, in: Advances in Neural Information Processing Systems,...
  • C. Fowlkes et al.

Spectral grouping using the Nyström method

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2004)
  • H.-C. Huang, Y.-Y. Chuang, C.-S. Chen, Affinity aggregation for spectral clustering, in: 2012 IEEE Conference on...
    Yan Ren received the BSc degree in Mathematics and Applied Mathematics from Liaoning Normal University, Dalian, China, in 2004, the MSc degree in Applied Mathematics from Dalian Maritime University, Dalian, China, in 2007, and the PhD degree in Control Theory and Control Engineering from Dalian University of Technology, Dalian, China, in 2011. She is currently a lecturer in School of Automation at Shenyang Aerospace University. Her current research interests include AFS theory and its applications, knowledge discovery and representations, and face image analysis.

    Ling Li obtained her Bachelor of Computer Science from Sichuan University, China, Master of Electrical Engineering from China Academy of Post and Telecommunication, and PhD of Computer Engineering from Nanyang Technological University (NTU), Singapore. She worked as an Assistant Professor and subsequently an Associate Professor in the School of Computer Engineering in NTU. She is now an Associate Professor in the Department of Computing at Curtin University in Perth, Australia. Her research interest is mainly in computer graphics and vision, and artificially intelligent beings. She has given a number of keynotes addresses in international conferences and published over 100 referred research papers in international journals and conferences.

    Wanquan Liu received the BSc degree in Applied Mathematics from Qufu Normal University, PR China, in 1985, the MSc degree in Control Theory and Operation Research from Chinese Academy of Science in 1988, and the PhD degree in Electrical Engineering from Shanghai Jiaotong University, in 1993. He once held the ARC Fellowship, U2000 Fellowship and JSPS Fellowship and attracted research funds from different resources over 2 million dollars. He is currently an Associate Professor in the Department of Computing at Curtin University and is in editorial board for seven international journals. His current research interests include large-scale pattern recognition, signal processing, machine learning, and control systems.
