
Pattern Recognition

Volume 86, February 2019, Pages 265-280

Clustering as physically inspired energy minimization

https://doi.org/10.1016/j.patcog.2018.09.008

Highlights

  • We map the energy model of statistical physics onto the clustering problem more completely than prior work, and our method can be fully unsupervised.

  • We draw a direct analogy with the energy model used in computer vision, which lets clustering borrow methods from the vision field under this mapping.

  • We propose a local density estimation method for data points that handles datasets of arbitrary shape and topology.

  • We show that the energy model of spectral clustering methods (such as Normalized-cut [22]) is incomplete compared with our energy model.

Abstract

We formulate density-based clustering as energy minimization, using both a binary/pairwise energy term and a unary/data energy term (the latter largely ignored by previous clustering methods). The binary energy is defined in terms of inhomogeneity in local point density. While most previous methods use only the binary/pairwise energy, the unary/data energy represents the natural tendency of a given point to belong to a given cluster, which is also crucial for clustering. Since our energy is the sum of a unary (data) term and a binary (pairwise or smoothness) term, we can draw a direct analogy with the energy model used in computer vision and, under this mapping, carry over its machinery (such as the optimization algorithms) to clustering. This correspondence provides an entirely new viewpoint on the clustering problem: many mature methods and algorithms already exist in the vision field and can be adopted readily for clustering. During energy optimization, a sequence of energy minima recursively partitions the points, yielding a hierarchical embedding of clusters that are increasingly homogeneous in density. Disjoint clusters with the same density are identified separately. Our clustering method is fully unsupervised, unlike most existing methods (such as those listed below). It needs no user input parameters (e.g., number of segments, bandwidth parameter, cutoff distance/scale), except that one can specify the homogeneity criterion, i.e., the degree of acceptable fluctuation in density within a cluster (which is target-related), or let it be set automatically in a hierarchical way. We conduct experiments on both synthetic datasets and real-image tasks. Results on synthetic datasets show that our method handles clusters of different shapes, sizes and densities.
We evaluate our approach with the commonly used energy optimization algorithms from vision, such as ICM, LBP, Graph-cut and the mean-field algorithm, as well as an integer programming algorithm. We also compare against several commonly used clustering algorithms, such as k-means, fuzzy c-means and DBSCAN, as well as a recent state-of-the-art clustering method reported in [40]. Our experiments on real-image tasks further validate the method. In addition, we show that the family of commonly used spectral graph clustering algorithms (such as Normalized-cut) uses only the binary energy term while ignoring the unary energy term; their energy model is therefore incomplete compared with ours.

Introduction

This paper presents an approach to detecting clusters characterized by homogeneous density inside and a discontinuity of density across their borders. We define an energy function that captures density variation and minimize it to find clusters of the data points. This contrasts with many methods summarized in the literature [1], [2], [3], [4], as well as more recent ones [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. The recent survey [1] classifies clustering algorithms into traditional and modern ones. It divides the traditional algorithms into nine classes, based on: (1) partition, (2) hierarchy, (3) fuzzy theory, (4) distribution, (5) density, (6) graph theory, (7) grid, (8) fractal theory, (9) model. It divides the modern clustering methods into ten classes, based on: (1) kernel, (2) ensemble, (3) swarm intelligence, (4) quantum theory, (5) spectral graph theory, (6) affinity propagation, (7) density and distance, (8) spatial data, (9) data stream, (10) large-scale data. For example, the popular k-means method [20] is based on partition, the mean-shift method [21] and DBSCAN [25] are based on density, and the Normalized-cut method [22] is based on spectral graph theory.

Our approach applies ideas from the analysis of particle interactions in statistical physics: it associates an energy function with the network of data points and performs graphical-model energy optimization using the Ising model [23] from physics. Desired clusters then correspond to energy minima. The energy consists of a unary/data term and a binary/smoothness term capturing interactions among data points. Previous work [5], [6], [7], [8], [9], [10] has also considered statistical-physics concepts for the clustering problem, but only partially. For example, Rose et al. [5] use an energy but define it heuristically (e.g., based on squared distances between points) instead of following the Ising model; in [6], the energy function contains no unary term, which led the authors to conclude that the energy is symmetric under a global permutation of all labels. However, as we discuss later, every data point has a natural tendency to belong to a certain cluster; for example, a point located in a high-density area is more likely to belong to a higher-density cluster, and vice versa. With the unary energy term included, we can quantify this natural tendency automatically.
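To make the unary-plus-binary structure concrete, such an Ising-style energy can be sketched in a few lines of Python. All names and numbers below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def total_energy(labels, unary, pairs, beta=1.0):
    """Ising-style energy: a unary (data) cost for each point's label,
    plus a pairwise (smoothness) penalty when neighboring points on the
    graph carry different labels. Names here are illustrative only."""
    # Unary term: cost of assigning each point its current label.
    e_unary = sum(unary[i, labels[i]] for i in range(len(labels)))
    # Binary term: penalize label disagreement along graph edges.
    e_pair = sum(beta for (i, j) in pairs if labels[i] != labels[j])
    return e_unary + e_pair

# Toy example: 4 points on a chain graph, 2 candidate labels.
unary = np.array([[0.1, 0.9],
                  [0.2, 0.8],
                  [0.8, 0.2],
                  [0.9, 0.1]])
pairs = [(0, 1), (1, 2), (2, 3)]
labels = np.array([0, 0, 1, 1])
print(total_energy(labels, unary, pairs))  # 0.6 unary + 1.0 for edge (1, 2) -> 1.6
```

Minimizing such an energy over all labelings is exactly the problem that ICM, LBP, Graph-cut and mean-field algorithms from vision address.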

A key issue our method accounts for when applying energy optimization to the clustering problem is that the number of clusters is not known in advance. In addition, we avoid defining clustering affinity among data points purely by their mutual distances, since proximity alone does not guarantee that two points belong to the same cluster (Fig. 1).

The contributions of this work as well as the essential difference with the previous “energy based” clustering schemes are as follows:

  • 1.

    We map the energy model of statistical physics onto the clustering problem more completely than prior work; in particular, we emphasize the importance of the unary energy term in clustering. As a result, our method can be fully unsupervised, which is not true of most previous methods, where key input parameters are needed, such as the number of segments, bandwidth parameters, or a cutoff distance/scale.

    In our work, we do not merely introduce or emphasize the unary term in clustering; more importantly, we combine the unary and binary energy terms in the principled way defined by statistical physics, rather than heuristically. Our entire scheme is therefore systematic and principled, and it corresponds directly to the energy formulation used in computer vision, which opens the door to borrowing mature concepts, algorithms and solutions from vision for the clustering problem, as stated below.

  • 2.

    Since our energy model consists of a unary (data) term and a binary (pairwise or smoothness) term, we can draw a direct analogy with the energy model used in vision and, under this mapping, borrow its concepts, ideas and schemes (such as the optimization algorithms) for clustering. This correspondence has been largely ignored in the clustering field, so the mapping provides an entirely new viewpoint on the clustering problem.

  • 3.

    We propose a local density estimation method for data points that can handle datasets of arbitrary shape and topology (such as shapes with holes inside).

  • 4.

    We point out that the spectral clustering methods (such as Normalized-cut [22]) use only the binary/pairwise energy term while leaving out the unary/data energy term; therefore, their energy model is incomplete compared with ours.

The organization of this paper is as follows. In Section 2, we present our method, including determination of the point local density and its homogeneity, and the energy function with its unary and binary terms. In Section 3, we show that the energy model considered in the graph spectral clustering method (such as Normalized-cut) is incomplete compared with ours. Section 4 presents the experimental results we obtain on both synthetic datasets and real-image tasks, as well as the comparison with other related methods. Finally we conclude in Section 5.

Section snippets

Energy optimization based clustering

We draw an analogy with the image segmentation problem to formulate the energy-based objective function. In image segmentation, the property of each node (image pixel or superpixel) that distinguishes it from other nodes is the pixel intensity, and adjacent pixels with similar intensities are more likely to belong to the same segment of the image. Analogously, in the clustering problem, the property of each node (data point) that distinguishes it from other nodes is the point local …
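The point local density that plays the role of pixel intensity can be estimated in many ways. As a hedged stand-in for the paper's estimator (which is specifically designed for arbitrary shapes and topologies), a simple k-nearest-neighbor density gives the flavor of the idea:

```python
import numpy as np

def knn_density(points, k=5):
    """Crude local density estimate: inverse of the mean distance to the
    k nearest neighbors. This is a common illustrative stand-in, NOT the
    estimator proposed in the paper."""
    pts = np.asarray(points, dtype=float)
    # All pairwise Euclidean distances.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    # For each point, average distance to its k nearest neighbors
    # (column 0 is the point itself at distance 0, so skip it).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    return 1.0 / knn.mean(axis=1)

# A dense cluster near the origin vs a sparse one far away.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(30, 2))
sparse = rng.normal(5.0, 1.0, size=(30, 2))
rho = knn_density(np.vstack([dense, sparse]), k=5)
print(rho[:30].mean() > rho[30:].mean())  # dense points get higher density
```

Density estimated this way then serves as the node attribute across which the energy model measures inhomogeneity.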

The connections with Graph spectral based clustering approach

In this section, we relate our algorithm to the graph spectral method [34], which shares some important characteristics with ours. Both are based on a graphical-model structure connecting all the points, and both consider interactions among them via an edge-weight or adjacency matrix. We show that the graph spectral method (such as Normalized-cut [22]) uses only the binary or smoothness term of the energy model and ignores the unary or data term; therefore, their energy …
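Schematically, the contrast can be written as follows; the notation here is illustrative, not the paper's exact formulation. A full energy over a labeling $\ell$ has both terms, whereas the standard Normalized-cut objective is built from pairwise affinities $w_{ij}$ only, with no per-point data term:

```latex
E(\ell) \;=\; \underbrace{\sum_i D_i(\ell_i)}_{\text{unary/data}}
\;+\; \underbrace{\sum_{(i,j)\in\mathcal{E}} w_{ij}\,[\ell_i \neq \ell_j]}_{\text{binary/smoothness}},
\qquad
\mathrm{Ncut}(A,B) \;=\; \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
\;+\; \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}.
```

Since $\mathrm{cut}$ and $\mathrm{assoc}$ are sums of $w_{ij}$ alone, Ncut corresponds to the binary term only.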

Experiments on synthetic dataset

We first evaluate our method on some synthetic datasets. We evaluate the performance of our clustering method and compare it with other commonly used and best-performing methods (Sections 4.1.1 and 4.1.2). We also compare the performance of the various specific energy optimization algorithms within our scheme (Section 4.1.3). It should be noted that most previous methods require some heuristic user input for clustering (such as the cluster number, a distance scale or a minimum …
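As one example of the user input required by the baselines mentioned above, plain k-means needs the cluster count k supplied in advance. A minimal sketch of Lloyd's algorithm (illustrative code, not the paper's experimental implementation):

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means, the classic baseline: note that the
    cluster count k must be given by the user, in contrast to the
    unsupervised method proposed in the paper."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    # Initialize centers from k distinct data points.
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        for c in range(k):
            if (labels == c).any():
                centers[c] = pts[labels == c].mean(axis=0)
    return labels, centers

# Two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(4, 0.2, (40, 2))])
labels, centers = kmeans(pts, k=2)
# Each blob should receive a single, distinct label.
print(labels[:5], labels[40:45])
```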

Conclusion

In this work, we propose a clustering method based on physics-inspired graphical-model energy optimization. The method is unsupervised, with no user input parameters needed. The energy formulation uses the Ising model as its basis, and we emphasize the use of the unary/data energy (in addition to the commonly used binary/pairwise energy), which represents the natural tendency of a node to belong to a certain cluster. We perform hierarchical clustering, where the …

Acknowledgments

The author would like to give special thanks to Prof. Mark Hasegawa-Johnson at the University of Illinois at Urbana-Champaign for his kind and detailed editing of the paper content in many sections. The same thanks go to the anonymous reviewer as well. We also thank Mr. Marcus Edvall at Tomlab Optimization for continued support with the software license that facilitated our research. The support of the Office of Naval Research under grant N00014-16-1-2314 is gratefully acknowledged.

Huiguang Yang received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. He received his B.S. in Environmental Engineering from Tsinghua University, Beijing, China, and his M.S. in Atmospheric Sciences from the University of Illinois at Urbana-Champaign. His research in Electrical and Computer Engineering focuses on image processing and computer vision, such as image segmentation and object recognition. He is now a researcher at Samsung Research Institute China, Xian (SRCX).

References (49)

  • D. Xu et al., A comprehensive survey of clustering algorithms, Ann. Data Sci. (2015)
  • A.K. Jain et al., Data clustering: a review, ACM Comput. Surv. (CSUR) (1999)
  • A.K. Jain, Data clustering: 50 years beyond k-means, Pattern Recognit. Lett. (2010)
  • P. Berkhin, Survey of clustering data mining techniques, in: Grouping Multidimensional Data, pp. 25–71 (2002)
  • K. Rose et al., Statistical mechanics and phase transitions in clustering, Phys. Rev. Lett. (1990)
  • M. Blatt et al., Superparamagnetic clustering of data, Phys. Rev. Lett. (1996)
  • A. Noack, Energy-based clustering of graphs with nonuniform degrees, in: Graph Drawing, Lecture Notes in Computer Science, vol. 3843, pp. 309–320 (2005)
  • A. Noack, Energy models for graph clustering, J. Graph Algorithms Appl. (2007)
  • Y. Fu et al., Application of statistical mechanics to NP-complete problems in combinatorial optimization, J. Phys. A (1986)
  • R. Albert et al., Statistical mechanics of complex networks, Rev. Mod. Phys. (2002)
  • J.E. Besag, On the statistical analysis of dirty pictures, J. R. Stat. Soc. Ser. B (1986)
  • J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 2nd ed. (1988)
  • P.F. Felzenszwalb, Efficient belief propagation for early vision, in: CVPR (2004)
  • J. Yedidia et al., Generalized belief propagation, in: Advances in Neural Information Processing Systems (2000)
  • Y. Boykov et al., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • V. Kolmogorov et al., What energy functions can be minimized via graph cuts?, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • M. Saito et al., Application of the mean field methods to MRF optimization in computer vision, in: CVPR (2012)
  • H.J. Kappen et al., Mean field theory for graphical models, in: Advanced Mean Field Methods: Theory and Practice, chapter 4 (2001)
  • Tomlab optimization website,...
  • J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Fifth Berkeley Symp. Math. Stat. Probab. (1967)
  • D. Comaniciu et al., Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • E. Ising, Beitrag zur Theorie des Ferromagnetismus, Z. Phys. (1925)
  • C. Wang et al., Markov random field modeling, inference & learning in computer vision & image understanding: a survey, Comput. Vision Image Understanding (2013)


    Narendra Ahuja received his Ph.D. from the University of Maryland, College Park, in 1979. He is Donald Biggar Willet Professor in Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign. His fields of professional interest are next generation cameras, 3D computer vision, video analysis, image analysis, pattern recognition, human computer interaction, image processing, image synthesis, and robotics.
