
Pattern Recognition

Volume 86, February 2019, Pages 265-280

Clustering as physically inspired energy minimization

https://doi.org/10.1016/j.patcog.2018.09.008

Highlights

  • We map the energy model of statistical physics onto the clustering problem more completely than prior work, and our method can be fully unsupervised.

  • We draw a direct analogy with the energy model used in computer vision, which lets clustering borrow methods from the vision field under this mapping.

  • We propose a local density estimation method for data points that handles datasets of arbitrary shape and topology.

  • We show that the energy model of spectral clustering methods (such as Normalized-cut [22]) is incomplete compared with our energy model.

Abstract

We formulate density-based clustering as energy minimization, using both a binary/pairwise energy term and a unary/data energy term (the latter largely ignored by previous clustering methods). The binary energy is defined in terms of inhomogeneity in local point density. While most previous methods use only the binary/pairwise energy, the unary/data energy represents the natural tendency of a given point to belong to a given cluster, which is also crucial for clustering. Since our energy is the sum of a unary (data) term and a binary (pairwise or smoothness) term, we can draw a direct analogy with the energy model used in computer vision and, under this mapping, carry over its machinery (such as the optimization algorithms) to clustering. This correspondence provides an entirely new viewpoint on the clustering problem: many mature methods and algorithms already exist in the vision field and can be adopted readily for clustering. During energy optimization, a sequence of energy minima recursively partitions the points, yielding a hierarchical embedding of clusters that are increasingly homogeneous in density. Disjoint clusters with the same density are identified separately. Our clustering method is fully unsupervised, unlike most existing methods (such as those listed below). It needs no user input parameters (e.g., number of segments, bandwidth parameter, cutoff distance/scale), except that one can specify the homogeneity criterion, i.e., the degree of acceptable fluctuation in density within a cluster (which is target-related), or let it be set automatically in a hierarchical way. We conduct experiments on both synthetic datasets and real-image tasks. Results on synthetic datasets show that our method handles clusters of different shapes, sizes and densities.
We evaluate our approach with the commonly used energy optimization algorithms from vision, such as ICM, LBP, Graph-cut and the mean-field algorithm, as well as an integer programming algorithm. We also compare against several commonly used clustering algorithms, such as k-means, fuzzy c-means and DBSCAN, as well as a recent state-of-the-art clustering method reported in [40]. Our experiments on real-image tasks further validate the method. In addition, we show that the family of commonly used spectral graph clustering algorithms (such as Normalized-cut) uses only the binary energy term while ignoring the unary energy term; their energy model is therefore incomplete compared with ours.

Introduction

This paper presents an approach to detecting clusters characterized by homogeneous density inside and a discontinuity of density across their borders. We define an energy function that captures density variation and minimize it to find clusters of the data points. This contrasts with many methods summarized in the literature [1], [2], [3], [4], as well as more recent ones [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. The recent survey [1] classifies clustering algorithms into traditional and modern ones. It divides the traditional algorithms into nine classes, based on: (1) partition, (2) hierarchy, (3) fuzzy theory, (4) distribution, (5) density, (6) graph theory, (7) grid, (8) fractal theory, (9) model. It divides the modern clustering methods into ten classes, based on: (1) kernel, (2) ensemble, (3) swarm intelligence, (4) quantum theory, (5) spectral graph theory, (6) affinity propagation, (7) density and distance, (8) spatial data, (9) data stream, (10) large-scale data. For example, the popular k-means method [20] is based on partition, the mean-shift method [21] and DBSCAN [25] are based on density, and the Normalized-cut method [22] is based on spectral graph theory.

Our approach applies ideas from the analysis of particle interactions in statistical physics: it associates an energy function with the network of data points and performs graphical-model energy optimization using the Ising model [23] from physics. Desired clusters then correspond to energy minima. The energy consists of a unary/data term and a binary/smoothness term capturing interactions among data points. Previous work [5], [6], [7], [8], [9], [10] has also considered statistical-physics concepts for the clustering problem, but only partially. For example, Rose et al. [5] use an energy but define it heuristically (e.g., based on squared distances between points) instead of following the Ising model; in [6], the energy function contains no unary term, which led the authors to conclude that the energy is symmetric under a global permutation of all labels. However, as we discuss later, every data point has a natural tendency to belong to a certain cluster; for example, a point located in a high-density area is more likely to belong to a higher-density cluster, and vice versa. With the unary energy term included, we can quantify this natural tendency automatically.
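To make the unary-plus-binary structure concrete, such an Ising-style energy can be sketched in a few lines of Python. All names and numbers below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def total_energy(labels, unary, pairs, beta=1.0):
    """Ising-style energy: a unary (data) cost for each point's label,
    plus a pairwise (smoothness) penalty when neighboring points on the
    graph carry different labels. Names here are illustrative only."""
    # Unary term: cost of assigning each point its current label.
    e_unary = sum(unary[i, labels[i]] for i in range(len(labels)))
    # Binary term: penalize label disagreement along graph edges.
    e_pair = sum(beta for (i, j) in pairs if labels[i] != labels[j])
    return e_unary + e_pair

# Toy example: 4 points on a chain graph, 2 candidate labels.
unary = np.array([[0.1, 0.9],
                  [0.2, 0.8],
                  [0.8, 0.2],
                  [0.9, 0.1]])
pairs = [(0, 1), (1, 2), (2, 3)]
labels = np.array([0, 0, 1, 1])
print(total_energy(labels, unary, pairs))  # 0.6 unary + 1.0 for edge (1, 2) -> 1.6
```

Minimizing such an energy over all labelings is exactly the problem that ICM, LBP, Graph-cut and mean-field algorithms from vision address.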

A key issue our method accounts for when applying energy optimization to the clustering problem is that the number of clusters is not known in advance. In addition, we avoid defining clustering affinity among data points purely by their mutual distances, since proximity alone does not guarantee that two points belong to the same cluster (Fig. 1).

The contributions of this work as well as the essential difference with the previous “energy based” clustering schemes are as follows:

  • 1.

    We map the energy model of statistical physics onto the clustering problem more completely than prior work; in particular, we emphasize the importance of the unary energy term in clustering. As a result, our method can be fully unsupervised, which is not true of most previous methods, where key input parameters are needed, such as the number of segments, bandwidth parameters, or a cutoff distance/scale.

    In our work, we do not merely introduce or emphasize the unary term in clustering; more importantly, we combine the unary and binary energy terms in the principled way defined by statistical physics, rather than heuristically. Our entire scheme is therefore systematic and principled, and it corresponds directly to the energy formulation used in computer vision, which opens the door to borrowing mature concepts, algorithms and solutions from vision for the clustering problem, as stated below.

  • 2.

    Since our energy model consists of a unary (data) term and a binary (pairwise or smoothness) term, we can draw a direct analogy with the energy model used in vision and, under this mapping, borrow its concepts, ideas and schemes (such as the optimization algorithms) for clustering. This correspondence has been largely ignored in the clustering field, so the mapping provides an entirely new viewpoint on the clustering problem.

  • 3.

    We propose a local density estimation method for data points that can handle datasets of arbitrary shape and topology (such as shapes with holes inside).

  • 4.

    We point out that the spectral clustering methods (such as Normalized-cut [22]) use only the binary/pairwise energy term while leaving out the unary/data energy term; therefore, their energy model is incomplete compared with ours.

The organization of this paper is as follows. In Section 2, we present our method, including determination of the point local density and its homogeneity, and the energy function with its unary and binary terms. In Section 3, we show that the energy model considered in the graph spectral clustering method (such as Normalized-cut) is incomplete compared with ours. Section 4 presents the experimental results we obtain on both synthetic datasets and real-image tasks, as well as the comparison with other related methods. Finally we conclude in Section 5.

Section snippets

Energy optimization based clustering

We draw an analogy with the image segmentation problem to formulate the energy-based objective function. In image segmentation, the property of each node (image pixel or superpixel) that distinguishes it from other nodes is the pixel intensity, and adjacent pixels with similar intensities are more likely to belong to the same segment of the image. Analogously, in the clustering problem, the property of each node (data point) that distinguishes it from other nodes is the point local …
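The point local density that plays the role of pixel intensity can be estimated in many ways. As a hedged stand-in for the paper's estimator (which is specifically designed for arbitrary shapes and topologies), a simple k-nearest-neighbor density gives the flavor of the idea:

```python
import numpy as np

def knn_density(points, k=5):
    """Crude local density estimate: inverse of the mean distance to the
    k nearest neighbors. This is a common illustrative stand-in, NOT the
    estimator proposed in the paper."""
    pts = np.asarray(points, dtype=float)
    # All pairwise Euclidean distances.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    # For each point, average distance to its k nearest neighbors
    # (column 0 is the point itself at distance 0, so skip it).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    return 1.0 / knn.mean(axis=1)

# A dense cluster near the origin vs a sparse one far away.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(30, 2))
sparse = rng.normal(5.0, 1.0, size=(30, 2))
rho = knn_density(np.vstack([dense, sparse]), k=5)
print(rho[:30].mean() > rho[30:].mean())  # dense points get higher density
```

Density estimated this way then serves as the node attribute across which the energy model measures inhomogeneity.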

The connections with Graph spectral based clustering approach

In this section, we relate our algorithm to the graph spectral method [34], which shares some important characteristics with ours. Both are based on a graphical-model structure connecting all the points, and both consider interactions among them via an edge-weight or adjacency matrix. We show that the graph spectral method (such as Normalized-cut [22]) uses only the binary or smoothness term of the energy model and ignores the unary or data term; therefore, their energy …
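Schematically, the contrast can be written as follows; the notation here is illustrative, not the paper's exact formulation. A full energy over a labeling $\ell$ has both terms, whereas the standard Normalized-cut objective is built from pairwise affinities $w_{ij}$ only, with no per-point data term:

```latex
E(\ell) \;=\; \underbrace{\sum_i D_i(\ell_i)}_{\text{unary/data}}
\;+\; \underbrace{\sum_{(i,j)\in\mathcal{E}} w_{ij}\,[\ell_i \neq \ell_j]}_{\text{binary/smoothness}},
\qquad
\mathrm{Ncut}(A,B) \;=\; \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
\;+\; \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}.
```

Since $\mathrm{cut}$ and $\mathrm{assoc}$ are sums of $w_{ij}$ alone, Ncut corresponds to the binary term only.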

Experiments on synthetic dataset

We first evaluate our method on some synthetic datasets. We evaluate the performance of our clustering method and compare it with other commonly used and best-performing methods (Sections 4.1.1 and 4.1.2). We also compare the performance of the various specific energy optimization algorithms within our scheme (Section 4.1.3). It should be noted that most previous methods require some heuristic user input for clustering (such as the cluster number, a distance scale or a minimum …
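As one example of the user input required by the baselines mentioned above, plain k-means needs the cluster count k supplied in advance. A minimal sketch of Lloyd's algorithm (illustrative code, not the paper's experimental implementation):

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means, the classic baseline: note that the
    cluster count k must be given by the user, in contrast to the
    unsupervised method proposed in the paper."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    # Initialize centers from k distinct data points.
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        for c in range(k):
            if (labels == c).any():
                centers[c] = pts[labels == c].mean(axis=0)
    return labels, centers

# Two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(4, 0.2, (40, 2))])
labels, centers = kmeans(pts, k=2)
# Each blob should receive a single, distinct label.
print(labels[:5], labels[40:45])
```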

Conclusion

In this work, we propose a clustering method based on physics-inspired graphical-model energy optimization. The method is unsupervised, with no user input parameters needed. The energy formulation uses the Ising model as its basis, and we emphasize the use of the unary/data energy (in addition to the commonly used binary/pairwise energy), which represents the natural tendency of a node to belong to a certain cluster. We perform hierarchical clustering, where the …

Acknowledgments

The author would like to give special thanks to Prof. Mark Hasegawa-Johnson at the University of Illinois at Urbana-Champaign for his kind and detailed editing of the paper content in many sections. The same thanks go to the anonymous reviewer as well. We also thank Mr. Marcus Edvall at Tomlab Optimization for continued support with the software license that facilitated our research. The support of the Office of Naval Research under grant N00014-16-1-2314 is gratefully acknowledged.

Huiguang Yang received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. He received his B.S. in Environmental Engineering from Tsinghua University, Beijing, China, and his M.S. in Atmospheric Sciences from the University of Illinois at Urbana-Champaign. His research in Electrical and Computer Engineering focuses on image processing and computer vision, such as image segmentation and object recognition. He is now a researcher at Samsung Research Institute China, Xian (SRCX).

References (49)

  • D. Xu et al., A comprehensive survey of clustering algorithms, Ann. Data Sci. (2015)
  • A.K. Jain et al., Data clustering: a review, ACM Comput. Surv. (CSUR) (1999)
  • A.K. Jain, Data clustering: 50 years beyond k-means, Pattern Recognit. Lett. (2010)
  • P. Berkhin, Survey of clustering data mining techniques, in: Grouping Multidimensional Data, pp. 25–71 (2002)
  • K. Rose et al., Statistical mechanics and phase transitions in clustering, Phys. Rev. Lett. (1990)
  • M. Blatt et al., Superparamagnetic clustering of data, Phys. Rev. Lett. (1996)
  • A. Noack, Energy-based clustering of graphs with nonuniform degrees, in: Graph Drawing, Lecture Notes in Computer Science, vol. 3843, pp. 309–320 (2005)
  • A. Noack, Energy models for graph clustering, J. Graph Algorithms Appl. (2007)
  • Y. Fu et al., Application of statistical mechanics to NP-complete problems in combinatorial optimization, J. Phys. A (1986)
  • R. Albert et al., Statistical mechanics of complex networks, Rev. Mod. Phys. (2002)
  • J.E. Besag, On the statistical analysis of dirty pictures, J. R. Stat. Soc. Ser. B (1986)
  • J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 2nd ed. (1988)
  • P.F. Felzenszwalb, Efficient belief propagation for early vision, in: CVPR (2004)
  • J. Yedidia et al., Generalized belief propagation, in: Advances in Neural Information Processing Systems (2000)
  • Y. Boykov et al., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • V. Kolmogorov et al., What energy functions can be minimized via graph cuts?, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • M. Saito et al., Application of the mean field methods to MRF optimization in computer vision, in: CVPR (2012)
  • H.J. Kappen et al., Mean field theory for graphical models, in: Advanced Mean Field Methods: Theory and Practice, chapter 4 (2001)
  • Tomlab optimization website,...
  • J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Fifth Berkeley Symp. Math. Stat. Probab. (1967)
  • D. Comaniciu et al., Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • E. Ising, Beitrag zur Theorie des Ferromagnetismus, Z. Phys. (1925)
  • C. Wang et al., Markov random field modeling, inference & learning in computer vision & image understanding: a survey, Comput. Vision Image Understanding (2013)


    Narendra Ahuja received his Ph.D. from the University of Maryland, College Park, in 1979. He is Donald Biggar Willet Professor in Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign. His fields of professional interest are next generation cameras, 3D computer vision, video analysis, image analysis, pattern recognition, human computer interaction, image processing, image synthesis, and robotics.
