Clustering as physically inspired energy minimization
Introduction
This paper presents an approach to detecting clusters characterized by homogeneous density in their interior and a discontinuity of density across their borders. We define an energy function that captures density variation and minimize it to find the clusters of the data points. This is in contrast with many methods summarized in the literature [1], [2], [3], [4], as well as the more recent ones in [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. A recent survey [1] classifies clustering algorithms into traditional and modern ones. It divides the traditional algorithms into the following nine classes, based on: (1) partition, (2) hierarchy, (3) fuzzy theory, (4) distribution, (5) density, (6) graph theory, (7) grid, (8) fractal theory, and (9) model. It divides the modern clustering methods into the following ten classes, based on: (1) kernel, (2) ensemble, (3) swarm intelligence, (4) quantum theory, (5) spectral graph theory, (6) affinity propagation, (7) density and distance, (8) spatial data, (9) data stream, and (10) large-scale data. For example, the popular k-means method [20] is based on partition, the mean-shift method [21] and DBSCAN [25] are based on density, and the Normalized-cut method [22] is based on spectral graph theory.
Our approach applies ideas from the analysis of particle interactions in statistical physics: it associates an energy function with the network of data points and performs graphical-model energy optimization using the Ising model [23] from physics. The desired clusters are then obtained as energy minima. The energy consists of a unary (data) term and a binary (smoothness) term capturing interactions among data points. Previous work [5], [6], [7], [8], [9], [10] has also considered using concepts from statistical physics for the clustering problem, but only partially. For example, Rose et al. [5] use an energy but define it heuristically (e.g., based on squared distance between points) instead of following the Ising model; in [6], the energy function does not contain the unary term, which led the authors to conclude that the energy function is symmetric under a global permutation of all labels. However, as we will discuss later, every data point has a natural tendency to belong to a certain cluster; for example, a data point located in a high-density area is more likely to belong to the cluster with higher density, and vice versa. With the inclusion of the unary energy term, we can quantify this natural tendency automatically.
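As a concrete illustration (a minimal sketch, not the paper's exact formulation), the two-term energy can be written as a sum of unary costs and pairwise disagreement penalties over graph edges. The function name, the binary label set {0, 1}, and the weighting parameter `beta` are our own assumptions for the sketch:

```python
import numpy as np

def total_energy(labels, unary, adjacency, beta=1.0):
    """Illustrative Ising-style energy: a unary (data) term plus a
    binary (smoothness) term over the graph of data points.

    labels    : length-n sequence of cluster labels in {0, 1}
    unary     : (n, 2) array; unary[i, k] = cost of assigning label k to point i
    adjacency : (n, n) symmetric edge-weight matrix
    """
    n = len(labels)
    # Unary term: each point's individual cost for its assigned label.
    e_unary = sum(unary[i, labels[i]] for i in range(n))
    # Pairwise term: penalize disagreeing labels on strongly connected points.
    e_pair = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                e_pair += adjacency[i, j]
    return e_unary + beta * e_pair
```

Minimizing such an energy trades off each point's individual tendency (unary term) against label smoothness over strongly connected neighbors (pairwise term), in the spirit of the Ising model.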
A key issue our method accounts for when applying the energy-optimization scheme to the clustering problem is that the number of clusters is not known in advance. In addition, we avoid defining clustering affinity among the data points based purely on their mutual distances, since smaller distance does not guarantee a higher likelihood of belonging to the same cluster (Fig. 1).
The contributions of this work, as well as the essential differences from previous “energy-based” clustering schemes, are as follows:
- 1.
We map the energy model of statistical physics onto the clustering problem more completely; in particular, we emphasize the importance of the unary energy term in clustering. As a result, our method can be totally unsupervised, which is not true for most previous methods, where key input parameters are needed, such as the number of segments, bandwidth parameters, or a cutoff distance/scale.
In our work, we do not merely introduce or emphasize the unary term in clustering; more importantly, we define the unary and binary energy terms in the principled way prescribed by statistical physics, not in some heuristic way. Our entire scheme is therefore systematic and principled, and it corresponds directly to the energy formulation used in computer vision, which opens the door to borrowing mature concepts, algorithms, and solutions from computer vision for the clustering problem, as stated below.
- 2.
Since our energy model consists of a unary (data) term and a binary (pairwise or smoothness) term, we can draw a direct analogy with the energy model used in vision and borrow concepts, ideas, and schemes (such as the optimization algorithms) from the vision field under this mapping. This correspondence has been largely ignored in the clustering field, so the mapping provides an entirely new viewpoint for handling the clustering problem.
- 3.
We propose a method for estimating the local density at each data point which can handle datasets with arbitrary shapes and topologies (such as shapes with holes inside).
- 4.
We point out that the spectral clustering methods (such as Normalized-cut [22]) use only the binary/pairwise energy term while leaving out the unary/data energy term; therefore, their energy model is incomplete compared with ours.
The organization of this paper is as follows. In Section 2, we present our method, including determination of the point local density and its homogeneity, and the energy function with its unary and binary terms. In Section 3, we show that the energy model considered in the graph spectral clustering method (such as Normalized-cut) is incomplete compared with ours. Section 4 presents the experimental results we obtain on both synthetic datasets and real-image tasks, as well as the comparison with other related methods. Finally we conclude in Section 5.
Section snippets
Energy optimization based clustering
We draw an analogy with the image segmentation problem to formulate the energy-based objective function. In image segmentation, the property of each node (image pixel or superpixel) that distinguishes it from other nodes (pixels) is the pixel intensity, and adjacent pixels with similar intensities are more likely to belong to the same segment of the image. Analogously, in the clustering problem, the property of each node (data point) that distinguishes it from other nodes is the point local
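The snippet above cuts off at the notion of a point's local density. As a hedged stand-in for the paper's own estimator (whose details lie in the truncated text), a common k-nearest-neighbor density estimate can be sketched as follows; the function name and the inverse-mean-distance rule are our assumptions:

```python
import numpy as np

def knn_local_density(points, k=5):
    """Generic k-nearest-neighbor local density estimate: density at a
    point is taken as the inverse of the mean distance to its k nearest
    neighbors. (The paper's own estimator additionally handles arbitrary
    shapes and topologies; this is only a common baseline.)
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero self-distance, so skip it.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    return 1.0 / knn_dist.mean(axis=1)
```

Points deep inside a dense cluster get a high density value, while isolated points get a low one, giving each node the distinguishing property the analogy with pixel intensity requires.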
The connections with Graph spectral based clustering approach
In this section, we relate our algorithm to the graph spectral method [34], because it shares some important characteristics with ours. Both are based on a graphical-model structure connecting all the points, and both consider interactions among points via an edge-weight (adjacency) matrix. We show that the graph spectral method (such as Normalized-cut [22]) uses only the binary or smoothness term in the energy model and ignores the unary or data term; therefore, their energy
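To make the contrast concrete, here is a minimal spectral bipartition sketch in the spirit of Normalized-cut [22]. Note that the computation consumes only the pairwise affinity matrix W; no unary (data) term appears anywhere. The function name and the sign-based thresholding of the Fiedler vector are our simplifications, not the exact algorithm of [22]:

```python
import numpy as np

def ncut_bipartition(W):
    """Spectral bipartition from the pairwise affinity matrix W alone.

    Solves the relaxed normalized-cut problem via the second-smallest
    eigenvector of the symmetric normalized Laplacian, then thresholds
    its sign to obtain a two-way split.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(d)) - d_inv_sqrt @ W @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    # Second-smallest eigenvector (Fiedler vector) defines the partition.
    fiedler = d_inv_sqrt @ vecs[:, 1]
    return fiedler > 0
```

Every quantity above derives from W, i.e., from the binary/pairwise term only, which is precisely the incompleteness argued in this section.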
Experiments on synthetic dataset
We first evaluate our method on some synthetic datasets. We evaluate the performance of our clustering method and compare it with other commonly used and best-performing methods (Sections 4.1.1 and 4.1.2). We also compare the performance of various specific energy-optimization algorithms within our scheme (Section 4.1.3). It should be noted that most previous methods require some heuristic user input for the clustering (such as cluster number, distance scale or minimum
Conclusion
In this work, we propose a clustering method based on physics-inspired graphical-model energy optimization. The method is unsupervised, with no user input parameters needed. The energy formulation uses the Ising model as its basis, and we emphasize the use of the unary/data energy (in addition to the commonly used binary/pairwise energy) in the energy model, which represents the natural tendency of a node to belong to a certain cluster. We perform hierarchical clustering, where the
Acknowledgments
The author would like to give special thanks to Prof. Mark Hasegawa-Johnson at the University of Illinois at Urbana-Champaign for his kind and detailed editing of the paper content in many sections. The same thanks go to the anonymous reviewers as well. We also want to thank Mr. Marcus Edvall at Tomlab Optimization for continuous support with the software license that facilitated our research. The support of the Office of Naval Research under grant N00014-16-1-2314 is gratefully acknowledged.
References (49)
- A comprehensive survey of clustering algorithms. Ann. Data Sci. (2015)
- Data clustering: a review. ACM Comput. Surv. (CSUR) (1999)
- Data clustering: 50 years beyond k-means. Pattern Recognit. Lett. (2010)
- Survey of clustering data mining techniques. In Grouping Multidimensional Data, pp. 25–71 (2002)
- Statistical mechanics and phase transitions in clustering. Phys. Rev. Lett. (1990)
- Superparamagnetic clustering of data. Phys. Rev. Lett. (1996)
- Energy-based clustering of graphs with nonuniform degrees. In Graph Drawing, Lecture Notes in Computer Science, vol. 3843, pp. 309–320 (2005)
- Energy models for graph clustering. J. Graph Algorithms Appl. (2007)
- Application of statistical mechanics to NP-complete problems in combinatorial optimization. J. Phys. A (1986)
- Statistical mechanics of complex networks. Rev. Mod. Phys. (2002)
- On the statistical analysis of dirty pictures. J. R. Stat. Soc. Ser. B
- Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (2nd ed.)
- Efficient belief propagation for early vision. In CVPR
- Generalized belief propagation. In Advances in Neural Information Processing Systems
- Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell.
- What energy functions can be minimized via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell.
- Application of the mean field methods to MRF optimization in computer vision. In CVPR
- Mean field theory for graphical models. In Advanced Mean Field Methods: Theory and Practice, chapter 4
- Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Symp. Math. Stat. Probab.
- Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell.
- Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell.
- Beitrag zur Theorie des Ferromagnetismus. Z. Phys.
- Markov random field modeling, inference & learning in computer vision & image understanding: a survey. Comput. Vision Image Understanding
Huiguang Yang received his Ph.D. from Electrical and Computer Engineering, University of Illinois, Urbana-Champaign. He received his B.S. in Environmental Engineering, Tsinghua University, Beijing, China, and received his M.S. in Atmospheric Sciences, University of Illinois, Urbana-Champaign. His research in Electrical and Computer Engineering focuses on image processing and computer vision, such as image segmentation and object recognition. He is now a researcher at Samsung Research Institute China, Xian (SRCX).
Narendra Ahuja received his Ph.D. from the University of Maryland, College Park, in 1979. He is Donald Biggar Willet Professor in Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign. His fields of professional interest are next generation cameras, 3D computer vision, video analysis, image analysis, pattern recognition, human computer interaction, image processing, image synthesis, and robotics.