
Pattern Recognition

Volume 42, Issue 9, September 2009, Pages 2020-2028

Clustering with r-regular graphs

https://doi.org/10.1016/j.patcog.2008.11.022

Abstract

In this paper, we present a novel graph-based clustering method in which we decompose a (neighborhood) graph into disjoint r-regular graphs and then refine the result by optimizing the normalized cluster utility. We solve the r-regular graph decomposition using linear programming. However, this simple decomposition suffers from inconsistent edges when clusters are not well separated. We optimize the normalized cluster utility in order to eliminate inconsistent edges or to merge similar clusters into a group, following the principle of the minimal K-cut. The method is especially useful in the presence of noise and outliers. Moreover, it detects the number of clusters within a pre-specified range. Numerical experiments with synthetic and UCI data sets confirm the useful behavior of the method.

Introduction

Clustering is a fundamental technique in exploratory data analysis that arises in many applications such as data mining, image segmentation, information retrieval, and bioinformatics [9]. Although many successful clustering algorithms exist, clustering remains a difficult problem because the notion of a cluster depends strongly on the context as well as on the purpose of clustering. The main unsolved and controversial issues in clustering can be summarized as follows [10]. First, selecting an appropriate measure of similarity or dissimilarity between data points is data-dependent; there are no guidelines that allow us to choose the best one among the diverse measures available. Second, widely accepted conceptual definitions of a cluster exist, but it is difficult to derive from them an operational definition leading to a concrete algorithm. The operational definition is also related to selecting the optimal number of clusters, which is usually unknown and difficult to estimate in real-world applications. Third, the quality of clustering is very sensitive to background noise and outliers, so for robust clustering it is important to remove them or to discriminate them from actual clusters. Fourth, many clustering algorithms that iteratively minimize an error cost suffer from getting trapped in local minima. Last, hyperparameters such as the kernel width in spectral clustering are not easy to tune manually [20].

In this paper, we present a novel r-regular graph-based clustering algorithm that alleviates the aforementioned problems. In Section 2, we review graph-based clustering algorithms. In Section 3, we give an operational definition of a cluster in terms of an underlying graph structure and the dissimilarity between vertices; the optimal clusters are determined by maximizing the normalized cluster utility. We emphasize that the structure of r-regular graphs plays an important role both in eliminating inconsistent edges and in calculating the normalized cluster utility. In Section 4, we explain how to decompose a graph into disjoint r-regular graphs. Our proposed clustering algorithm, which incorporates the decomposed r-regular graph into the definition of a cluster, is presented in Section 5. Numerical experimental results with synthetic and UCI data sets are provided in Section 6, showing the useful behavior of our method. Finally, conclusions are drawn and discussions are given in Section 7.


Related work

In this section, we give an overview of graph-based partitional clustering algorithms related to our work; extensive literature surveys on clustering can be found in [11], [27]. In general, graph-based partitional clustering algorithms consist of three steps, summarized below:

  (1) Construct an underlying graph to capture the geometric structure among data points.

  (2) Remove inconsistent edges according to a rule.

  (3) Identify clusters from the resulting connected subgraphs.
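The three generic steps above can be sketched in code. This is a minimal illustration, not the paper's algorithm: the graph here is a k-nearest-neighbor graph, and the inconsistency rule (pruning edges longer than a fixed multiple of the mean edge length) is one simple choice among many; the parameters `k` and `factor` are assumptions of this sketch.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def graph_based_clustering(X, k=5, factor=2.0):
    """Generic three-step graph-based partitional clustering sketch:
    (1) build a k-nearest-neighbor graph over the data points;
    (2) remove 'inconsistent' edges, here crudely defined as edges
        longer than `factor` times the mean edge length;
    (3) report connected components of the pruned graph as clusters."""
    n = X.shape[0]
    D = cdist(X, X)                              # pairwise Euclidean distances
    rows, cols, vals = [], [], []
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:      # step (1): k nearest neighbors
            rows.append(i); cols.append(j); vals.append(D[i, j])
    vals = np.array(vals)
    keep = vals <= factor * vals.mean()          # step (2): prune long edges
    A = csr_matrix((vals[keep],
                    (np.array(rows)[keep], np.array(cols)[keep])),
                   shape=(n, n))
    A = A.maximum(A.T)                           # symmetrize (undirected graph)
    n_clusters, labels = connected_components(A, directed=False)  # step (3)
    return n_clusters, labels
```

On two well-separated groups of points, the pruned k-nearest-neighbor graph falls apart into one connected component per group, which is exactly the failure-prone behavior the paper's refinement step is designed to make robust.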

Various graph

Clusters, dissimilarity, and cluster validation

We present an operational definition of a cluster based on an underlying graph structure, leading to a novel dissimilarity measure between data points, which enables us to define an optimization criterion that quantitatively validates the quality of clustering.

We consider an undirected weighted graph G = (V, E), where V = {v1, v2, …, vn} is a set of vertices (nodes) and E = {eij} is a set of edges, with each edge eij weighted by the Euclidean distance between vi and vj. The weighted adjacency matrix of a graph G,
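Although the section is truncated here, the weighted adjacency matrix just introduced is straightforward to compute: entry (i, j) is the Euclidean distance between vi and vj, so the matrix is symmetric with a zero diagonal. A minimal sketch (the three sample points are an illustration only):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Weighted adjacency matrix of the complete graph G = (V, E):
# entry (i, j) holds the Euclidean distance between v_i and v_j.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
W = squareform(pdist(X))      # symmetric, zero diagonal
```

Here W[0, 1] = 5 (the 3-4-5 triangle) and W[0, 2] = 10, so larger weights indicate greater dissimilarity, consistent with the distance-based edge weights defined above.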

r-regular graph decomposition

It follows from the arguments described in Section 3 that the essential requirements for the underlying graph of a data set S include: (1) all vertices should have the same degree; (2) vertices in a neighborhood should be connected; and (3) connected subgraphs should be mutually disconnected when they are well separated. In this section, we present a method for constructing an underlying graph satisfying these three requirements. The main task involves a decomposition of a complete graph into
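The paper's linear-programming formulation is not reproduced in this excerpt. As an illustration only, here is a minimal LP relaxation under the assumption that the decomposition keeps a minimum-total-weight edge set subject to every vertex having degree exactly r (equivalently, maximizing the total weight of eliminated edges, as stated in the conclusions); the function name and the use of a fractional relaxation are assumptions of this sketch, not the authors' formulation.

```python
import itertools
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import pdist, squareform

def r_regular_lp(X, r):
    """LP relaxation of r-regular subgraph selection on the complete
    graph: one variable x_e in [0, 1] per edge, degree-r equality
    constraint at every vertex, minimizing total kept edge weight
    (i.e., maximizing the weight of eliminated edges)."""
    n = X.shape[0]
    W = squareform(pdist(X))
    edges = list(itertools.combinations(range(n), 2))
    c = np.array([W[i, j] for i, j in edges])    # kept-weight objective
    A_eq = np.zeros((n, len(edges)))             # degree constraints
    for e, (i, j) in enumerate(edges):
        A_eq[i, e] = 1.0
        A_eq[j, e] = 1.0
    b_eq = np.full(n, float(r))
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, 1.0))
    return edges, res.x                          # fractional edge choices
```

For four points at the corners of a unit square with r = 2, the relaxation keeps the four unit-length sides (total weight 4) and eliminates the two longer diagonals, producing a 2-regular cycle, which matches the intuition that the decomposition discards the longest edges first.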

r-regular graph clustering

We present the r-regular graph clustering algorithm, which consists of two parts: (1) the r-regular graph decomposition and (2) a refinement step in which inconsistent edges are eliminated and noise clusters are merged by maximizing the normalized cluster utility. Before we describe the detailed clustering algorithm, we introduce two user-specified parameters that resolve the ambiguities of noise clusters and control the level of resolution.

Definition 7

A noise cluster is a connected

Numerical experiments

We evaluated clustering performance in terms of classification accuracy using several labeled data sets (labels are hidden from the clustering algorithms). Experiments were done with two synthetic data sets with background noise, and one synthetic data set without noise. We also used six UCI data sets [2], including iris, Wisconsin original breast cancer (WBC), Wisconsin diagnostic breast cancer (WDBC), ionosphere, and handwritten digit data (Table 1). In Table 1, n is the
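Scoring a clustering against hidden labels requires mapping clusters to classes before computing accuracy, since cluster indices are arbitrary. The excerpt does not say how the authors performed this mapping; a standard choice, sketched below as an assumption, is a maximum-weight bipartite matching between clusters and classes (the Hungarian algorithm).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    """Classification accuracy of a clustering: match each cluster to a
    class by maximum-weight bipartite matching on the contingency
    table, then score the fraction of correctly mapped points."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # C[i, j] = number of points in cluster i whose true class is j
    C = np.array([[np.sum((cluster_labels == c) & (true_labels == t))
                   for t in classes] for c in clusters])
    row, col = linear_sum_assignment(-C)   # negate to maximize matches
    return C[row, col].sum() / true_labels.size
```

The matching handles the case, relevant to this paper, where the number of detected clusters differs from the number of classes: unmatched clusters simply contribute no correct points.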

Conclusions

We have presented a novel graph-based clustering algorithm composed of the r-regular graph decomposition followed by further optimization in the framework of the maximum normalized cluster utility. Inspired by perceptual grouping, the r-regular graph decomposition determined a disjoint union of r-regular graphs in such a way that the sum of the weights of the edges eliminated during the decomposition was maximized. The r-regular graph decomposition captured the proximity between data

Acknowledgments

J.K. Kim was supported by a Microsoft Research Asia fellowship. This work was supported by the National Core Research Center for Systems Bio-Dynamics and the KOSEF Basic Research Program (Grant R01-2006-000-11142-0).

About the Author—JONG KYOUNG KIM received the B.S. degree in chemistry, the B.S. degree in computer science in 2004, and the M.S. degree in computer science in 2006, from Pohang University of Science and Technology, Pohang, Korea. He is studying for the Ph.D. degree in computer science at the same university. His research interests include statistical machine learning and bioinformatics.

References (28)

  • E. Hartuv et al.

    A clustering algorithm based on graph connectivity

    Information Processing Letters

    (2000)
  • R. Urquhart

    Graph theoretical clustering based on limited neighborhood sets

    Pattern Recognition

    (1982)
  • N. Ahuja

    Dot pattern processing using Voronoi neighborhoods

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1982)
  • C.L. Blake, C.J. Merz, UCI repository of machine learning databases,...
  • U. Brandes et al.

    Experiments on graph clustering algorithms

  • J.-S. Cherng et al.

    A hypergraph based clustering algorithm for spatial data sets

  • B.S. Everitt

    Cluster Analysis

    (1974)
  • P. Foggia et al.

    Assessing the performance of a graph-based clustering algorithm

  • K.C. Gowda et al.

    Agglomerative clustering using the concept of mutual nearest neighborhood

    Pattern Recognition

    (1978)
  • A.K. Jain et al.

    Algorithms for Clustering Data

    (1988)
  • A.K. Jain et al.

    Statistical pattern recognition: a review

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2000)
  • A.K. Jain et al.

    Data clustering: a review

    ACM Computing Surveys

    (1999)
  • R.A. Jarvis et al.

    Clustering using a similarity measure based on shared near neighbors

    IEEE Transactions on Computers

    (1973)
  • R. Kannan et al.

    On clustering: good, bad and spectral

    Journal of the ACM

    (2004)

About the Author—SEUNGJIN CHOI received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Notre Dame, Indiana, in 1996. He was a Visiting Assistant Professor in the Department of Electrical Engineering at University of Notre Dame, Indiana, during the Fall semester of 1996. He was with the Laboratory for Artificial Brain Systems, RIKEN, Japan, in 1997 and was an Assistant Professor in the School of Electrical and Electronics Engineering, Chungbuk National University from 1997 to 2000. He is currently an Associate Professor of Computer Science at Pohang University of Science and Technology, Korea. His primary research interests include statistical machine learning, probabilistic graphical models, Bayesian learning, kernel machines, manifold learning, independent component analysis, and pattern recognition.
