Monte Carlo methods for optimizing the piecewise constant Mumford–Shah segmentation model

Natural images are represented in a computer as pixels on a square grid, and neighboring pixels are generally highly correlated. This representation maps naturally onto a statistical physics framework on a square lattice. In this paper, we develop an effective use of statistical mechanics to solve the image segmentation problem, a long-standing problem in image processing. Our Monte Carlo method, which combines several advanced techniques including block-spin transformation, Eden clustering and simulated annealing, seeks the solution of the celebrated Mumford–Shah image segmentation model. The advantage of our method is particularly prominent in the case of multiphase segmentation. Our results verify that statistical physics can be a very efficient approach to image processing.


Introduction
Effective statistical physics theories, such as the renormalization group, scaling laws, theories of phase transitions and advanced Monte Carlo methods, have been developed based on isotropic lattice models. There have been efforts to extend these theories to complex, non-isotropic lattice models by casting image processing problems in a statistical physics framework [1]-[3]. Generally, an image consists of discrete pixels arranged on a square grid, which maps naturally onto the familiar concept of a square lattice in statistical physics. This line of thought holds much potential for new breakthroughs, as it differs radically from the approaches of traditional image processing; within it, theories of statistical mechanics may be refined and applied to image processing.
One of the earliest prominent papers to use the Monte Carlo method for image processing is that of Geman and Geman in 1984 [4]. The focus of computer vision has since moved away from Monte Carlo techniques because they were generally believed to be too slow computationally. However, Monte Carlo techniques have improved significantly in the last two decades; sampling methods have experienced speedups of one or more orders of magnitude [5,6]. We believe that Monte Carlo techniques have advanced enough to deserve a bigger role in computer vision once again.
In this paper, we illustrate an effective use of Monte Carlo methods by formulating the image segmentation problem in a statistical physics framework. Image segmentation, whereby an image is partitioned into segments, is one of the most fundamental tasks of image processing [7]. Segmentation yields a simpler representation of the original image, which is often used for higher-order cognitive tasks such as object recognition. As an example, figure 1 shows a river scene segmented into three regions: the river, land and sky. Details of the segmentation procedure are given later in this paper.
There are many thousands of publications on image segmentation, but the image segmentation problem is generally considered unsolved. In 1989, Mumford and Shah [8] developed a segmentation model that has become the hallmark of image segmentation. Many variants of this model have been developed [9]-[14] to segment almost any kind of image. A popular variant approximates the image with a cartoon-like piecewise constant intensity image. We consider the energy functional F, defined as

F(C, c_1, ..., c_q) = Σ_{i=1}^{q} ∫_{Ω_i} (u(x) − c_i)^2 dx + µ|C|,   (1)

where q is the number of constant intensities, commonly called the number of phases of the segmentation; u(x) is the original image; C is a set of curves that partition the image into exclusive segments Ω_i, i = 1, . . . , q; c_i is a set of constants; µ is a tuning parameter; and |C| represents the length of the curves measured in Euclidean distance. The actual calculation of |C| will be explained later in equation (2). The first term accounts for the deviation from the constant intensities c_i, and the second term accounts for the regularization of the curves partitioning different segments. The objective is to seek the partitioning curve C and the constants c_i, i = 1, . . . , q, that minimize the functional F.

We use figure 1 to illustrate the physical meaning of F more clearly. The original image shows three regions of approximately constant intensity: the sky, river and land. Minimizing the energy functional yields three constants c_1 (sky), c_2 (river) and c_3 (land), and a curve C that partitions the image into three regions. In this case, c_1, c_2 and c_3 are the average pixel intensity values of the sky, river and land, respectively. We also point out that the number of segmentation phases is given a priori.

Much effort has been devoted to solving the Mumford-Shah model. The more popular approaches are level set, threshold dynamics and graph cut [15]-[21]. One of the most cited of all proposed Mumford-Shah solutions is that of Chan and Vese [15].
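For a fixed partition, the fidelity term of the functional is minimized by taking each c_i to be the mean intensity of its segment, as in the river example above. A minimal NumPy sketch of this step (the function name and toy data are ours, for illustration only):

```python
import numpy as np

def optimal_constants(u, labels, q):
    """For a fixed partition, the fidelity term is minimized by taking
    each c_i to be the mean intensity of segment i."""
    return np.array([u[labels == i].mean() if np.any(labels == i) else 0.0
                     for i in range(q)])

u = np.array([[10., 10., 200.],
              [10., 12., 198.]])
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])     # a two-phase partition of the toy image
c = optimal_constants(u, labels, q=2)
# c[0] is the mean of the dark segment, c[1] of the bright one
```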
However, this method suffers from the local-minimum trap, although techniques have been developed to obviate this problem [22,23]. Furthermore, in most approaches, the number of segmentation phases is always 2^k, where k is the number of level-set functions, which limits the freedom to choose an arbitrary number of phases.
Some attempts to treat multiple phases were reported in [24]. We organize the rest of the paper as follows. In section 2, we describe the statistical physics representation of the Mumford-Shah segmentation model. The optimization process with Monte Carlo is explained in section 3. We present numerical results in section 4 and a comparison with other methods in section 5. The final section is devoted to the discussion.

Figure 2(c) caption: for Eden clustering, only pixels at the region boundary (shown in light gray) are randomly chosen. In this figure, the pixel in dark gray is chosen by chance and its label is changed to 0.

Statistical physics representation of the Mumford-Shah model
We rewrite the Mumford-Shah model in a statistical physics framework, in particular as a q-state Potts model [25] on a discretized square lattice:

F = Σ_x (u(x) − c_{S(x)})^2 + µ Σ_{<x,y>} (1 − δ_{S(x),S(y)}),   (2)

where u(x) is the pixel intensity at pixel location x; c_i is the set of constants described in equation (1); S(x) ∈ {1, . . . , q} is the Potts spin value at x; δ is the Kronecker delta; and µ is the tuning parameter described in equation (1). The first term sums over all pixel (lattice) points and corresponds to the first term in equation (1). The second term sums over all nearest-neighbor pairs <x,y> on the lattice and corresponds to the length term |C| in equation (1). This representation is also called a Markov random field in the image processing literature. In the q-state Potts model representation, we superimpose the image u(x) with a lattice containing Potts spins S(x) ∈ {1, . . . , q}.

One can immediately observe several advantages of the Potts representation of the Mumford-Shah model. First, the number of segmentation phases, q, can be chosen arbitrarily. Second, the second term in equation (2) has the same form as the Potts model. Third, the machinery of statistical mechanics and advanced Monte Carlo methods can be applied directly to solve the Mumford-Shah model.

Monte Carlo optimization

We use a combination of statistical physics and Monte Carlo methods to develop a very fast solution of the Mumford-Shah model. Our method combines single spin flips, block-spin transformation, Eden clustering and simulated annealing. Block-spin transformation originates from renormalization group theory [26], where the lattice is coarse grained at several levels; its purpose is to speed up the numerical computations. In two dimensions, each block-spin transformation reduces the number of pixels fourfold: four neighboring pixels are replaced by one pixel carrying their average intensity value. In three dimensions, eight pixels are averaged over.
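The discrete energy of equation (2) is straightforward to evaluate on a label array. A minimal NumPy sketch, assuming the image and the Potts spins are stored as 2D arrays (function and variable names are ours, and we label spins 0..q−1 rather than 1..q):

```python
import numpy as np

def mumford_shah_potts(u, S, c, mu):
    """Equation (2): fidelity term over all pixels plus mu times the
    number of unlike nearest-neighbour pairs (the discrete length |C|)."""
    fidelity = np.sum((u - c[S]) ** 2)
    # count nearest-neighbour disagreements, horizontal and vertical
    boundary = np.sum(S[:, 1:] != S[:, :-1]) + np.sum(S[1:, :] != S[:-1, :])
    return fidelity + mu * boundary

u = np.array([[0., 0.],
              [0., 1.]])
S = np.array([[0, 0],
              [0, 1]])
c = np.array([0., 1.])
E = mumford_shah_potts(u, S, c, mu=0.5)  # fidelity 0, two unlike bonds
```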
Monte Carlo optimization is performed across all levels of the block-spin transformation, starting from the coarsest level. Figure 3 demonstrates this process. Take, for example, the top row with 12 × 12 pixels and the middle row with 24 × 24 pixels. Optimization is first performed on the 12 × 12 pixel image; after a suboptimal segmentation is obtained, the algorithm switches to the 24 × 24 pixel image. The 12 × 12 segmented result, resized to 24 × 24 pixels, is kept as the initial condition for segmenting the 24 × 24 pixel image.
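One level of the block-spin transformation, together with the resizing of a coarse segmentation to serve as the finer level's initial condition, might look as follows (a sketch under our own naming; nearest-neighbour repetition is one simple choice for the resizing step):

```python
import numpy as np

def block_spin(u):
    """One block-spin transformation: each 2x2 block of pixels becomes
    a single pixel carrying the average of the four intensities."""
    h, w = u.shape
    return u[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_labels(S):
    """Resize a coarse segmentation to the next finer grid so it can be
    used as the initial condition there (nearest-neighbour repetition)."""
    return np.repeat(np.repeat(S, 2, axis=0), 2, axis=1)

img = np.arange(16, dtype=float).reshape(4, 4)
coarse = block_spin(img)            # 4x4 -> 2x2, fourfold fewer pixels
labels = np.array([[0, 1],
                   [1, 1]])
init = upsample_labels(labels)      # 2x2 -> 4x4 initial condition
```

Each call to block_spin reduces the pixel count fourfold, matching the two-dimensional case described above.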
It is important that the Mumford-Shah model (equation (2)) remains invariant under the block-spin transformation. A theorem given by Law et al [22] ensures this invariance and gives the rule for setting the parameter µ across different scales: essentially, µ → µ/2 across each level of block-spin transformation.
At the later stages of segmentation, only spins near the segment boundaries have a high chance of flipping. Hence, we employ the method of Eden clustering [27] to take advantage of this observation. For this Monte Carlo move, we choose a pixel at a segment boundary with equal probability and perform a single spin flip on it. As a parameter of our simulation, we assign an Eden clustering probability p_Eden and, at each Monte Carlo step, perform Eden clustering with probability p_Eden and a single spin flip with probability 1 − p_Eden. Although Eden clustering violates detailed balance, we performed an extensive check and found that, to within error bars, it does not affect the final minimized Mumford-Shah functional value. Figure 2 illustrates an Eden clustering move.
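As we read the description above, an Eden clustering move chooses a boundary pixel uniformly at random and proposes a new label for it; proposing the label of a randomly chosen neighbour is our own assumption for the sketch below, and all names are ours:

```python
import numpy as np

def boundary_pixels(S):
    """Coordinates of pixels with at least one unlike nearest neighbour."""
    diff = np.zeros(S.shape, dtype=bool)
    h_mismatch = S[:, 1:] != S[:, :-1]
    v_mismatch = S[1:, :] != S[:-1, :]
    diff[:, 1:] |= h_mismatch
    diff[:, :-1] |= h_mismatch
    diff[1:, :] |= v_mismatch
    diff[:-1, :] |= v_mismatch
    return np.argwhere(diff)

def eden_move(S, rng):
    """Pick a boundary pixel uniformly at random and propose for it the
    label of a randomly chosen neighbour (our assumed flip rule)."""
    cand = boundary_pixels(S)
    i, j = cand[rng.integers(len(cand))]
    nbrs = [(i + di, j + dj)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= i + di < S.shape[0] and 0 <= j + dj < S.shape[1]]
    ni, nj = nbrs[rng.integers(len(nbrs))]
    return (i, j), S[ni, nj]   # proposed site and proposed new label

rng = np.random.default_rng(0)
S = np.array([[0, 0, 1],
              [0, 0, 1]])
site, new_label = eden_move(S, rng)
```

Restricting proposals to the boundary set is what makes the move effective late in the run, when interior spins almost never flip.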
Simulated annealing can be used to escape local minima, which enhances ergodicity. Annealing is achieved by adjusting either the temperature T or the Mumford-Shah parameter µ. For initial conditions, all simulations were performed with each pixel assigned to a random initial phase. An outline of our algorithm is as follows (steps 5a and 5b are performed, whichever comes first):

6. Repeat steps 4 and 5 until a predetermined number of Monte Carlo steps is reached.

Numerical results

Figure 1 shows the segmentation result for a river scene. The number of phases was set to three (q = 3), demonstrating one of the advantages of the q-state Potts representation; three-phase segmentation cannot be achieved by most level-set approaches. The equivalent of 1800 sweeps through the lattice (1800 × 608 × 816 single Monte Carlo steps) was performed. Annealing was carried out by starting with T = 800 and setting T → 0.99T every ten sweeps through the lattice.

Figure 3 shows the segmentation of a brain MRI image (http://www.umassmed.edu/bmp/faculty/ross.cfm) at different levels of the block-spin transformation. The segmentation identifies important structures, such as the tumor in the middle, even at the very coarse grained level of 12 × 12 pixels (four levels of block-spin transformation). The segmentation iteratively refines itself as more details of the original image are presented.

Figure 4 demonstrates the robustness of our method to numerical instabilities and local minimum traps. The pencil drawing of the house and boat (608 × 816 pixels) is very complex and requires six phases to segment. It was reported that the level-set method [15] faces a serious local minimum problem even for four-phase segmentation [22]. One notices that the segmented image is visually a good representation of the original image. Zoomed-in images of the house in the top middle of the image show details of the segmentation; the house and background are very well segmented.
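The single-spin-flip Metropolis update, combined with the annealing schedule quoted above (start at T = 800, T → 0.99T every ten sweeps), can be sketched as follows. This is our own minimal reconstruction, not the paper's code; it omits block-spin transformation and Eden clustering:

```python
import numpy as np

def metropolis_sweep(u, S, c, mu, T, rng):
    """One sweep of single-spin-flip Metropolis updates for equation (2).
    The energy change involves only the chosen pixel's fidelity term
    and its (up to four) nearest-neighbour bonds."""
    h, w = S.shape
    q = len(c)
    for _ in range(h * w):
        i, j = rng.integers(h), rng.integers(w)
        new = rng.integers(q)
        if new == S[i, j]:
            continue
        nbrs = [S[i + di, j + dj]
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < h and 0 <= j + dj < w]
        dE = (u[i, j] - c[new]) ** 2 - (u[i, j] - c[S[i, j]]) ** 2
        dE += mu * (sum(n != new for n in nbrs) - sum(n != S[i, j] for n in nbrs))
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            S[i, j] = new

rng = np.random.default_rng(1)
u = np.array([[0., 0., 255., 255.],
              [0., 0., 255., 255.]])
c = np.array([0., 255.])
S = rng.integers(len(c), size=u.shape)  # random initial phase per pixel
T, mu = 800.0, 10.0
for sweep in range(50):
    metropolis_sweep(u, S, c, mu, T, rng)
    if (sweep + 1) % 10 == 0:
        T *= 0.99   # the annealing schedule used in the paper
```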
As a comparison, we present the result of segmentation with µ = 0 (bottom right of figure 4); at µ = 0, segmentation results in single-pixel noise.

Figure 5 shows the effectiveness of block-spin transformation and Eden clustering. As the arrows indicate, convergence is much faster with the block-spin transformation, which was performed as described in figure 3. Different symbols represent various values of the Eden clustering probability: p_Eden = 0 (circles), p_Eden = 0.2 (squares), p_Eden = 0.5 (diamonds) and p_Eden = 0.8 (triangles). The inset shows an enlarged view of the block-spin transformation runs toward the end of the simulation. The annealing parameters used to generate these plots are the same as those used in figure 3. Ten independent simulations started from different random initial conditions were performed to generate each data point in figure 5, and we obtained very small error bars (where not shown, they are smaller than the symbol size). Hence, our Monte Carlo method is very robust to initial conditions and statistical fluctuations.

Figure 6 shows two-phase segmentation results for different tuning parameters µ. Segmentation converges quickly and shows fine details for small µ (µ = 100). For large µ, the segment boundaries are much shorter and convergence is slower; slower convergence for larger µ is also observed in the level-set method [15]. We found, as in the rightmost image of figure 6 (µ = 1 × 10^6), that annealing in µ can enhance convergence.

Comparisons with other methods
The graph-cut method can guarantee optimization to the global minimum for two-phase segmentation [19]-[21]. We can therefore use graph-cut as a benchmark to check how well our Monte Carlo method optimizes a two-phase image. We found that the optimized Mumford-Shah functional value for the two-phase tree-and-sky image (figure 7) is 6.3294 × 10^8 using graph-cut and 6.3417 × 10^8 using our method, with the parameter µ set to 1000. Thus, our Monte Carlo method optimized the image to within 0.19% of the global minimum. The tree-and-sky image segmented by graph-cut is shown in figure 7(a), and the graph-cut segmentation looks visually similar to the Monte Carlo segmentation (figure 7(b)).

We also use graph-cut to compare our results for more than two phases. First, we compared our segmentation algorithm with the graph-cut method of El-Zehiry et al [21]; in this case, graph-cut does not guarantee finding the global minimum. Figure 8 shows Monte Carlo (a, c, e) and graph-cut (b, d, f) segmentations of the brain MRI, breast cancer cells and zebra fish intestine images. For the brain MRI image, both methods can segment the tumor, but graph-cut could not segment out the gray matter (the dark gray area on the surface of the brain), whereas the Monte Carlo method can distinguish between the white matter and the gray matter. The Mumford-Shah functional with µ = 1000 is 4.539 × 10^7 for graph-cut and 9.353 × 10^6 for Monte Carlo; the value obtained by graph-cut is about 5 times higher. For the breast cancer cells image, the Mumford-Shah functional energy for graph-cut is 3.720 × 10^8.

We remark here on the Chan-Vese algorithm [15]. Similar test images of the brain MRI, breast cancer cells and zebra fish intestine were used by Law et al [22]. Since the test images in [22] are resized to different sizes, we do not make a direct comparison of energies here. But Law et al showed unambiguously that the Chan-Vese algorithm [15] cannot handle four-phase segmentation well.
As another graph-cut approach, Bae and Tai [28] recently applied graph-cut optimization to multiphase image segmentation. They constructed a graph in a way similar to Ishikawa [29] and Ishikawa and Geiger [30]. They claimed that their algorithm can efficiently minimize the energy and that the computation time is dramatically reduced compared with the gradient descent solution of the level-set approach; the acceleration is about 200-400 times, depending on the image. We compared the performance of our Monte Carlo method with the graph-cut method of Bae and Tai [28]. As preliminary work, we tested the brain MRI image of figure 3 with four phases. The Monte Carlo method with block-spin transformation reached a minimum energy only 0.6% higher than that of graph-cut [28], and the average computation times differ by less than 5%. We should note, however, that graph-cut depends strongly on the initial conditions and cannot reach the minimum solution from some of them, whereas Monte Carlo is robust to initial conditions and can start from random ones. Moreover, there is a restriction on the form of the energy functional when the graph-cut method of Bae and Tai [28] is applied: the second term in equation (2) must be replaced by µ Σ_{<x,y>} |S(x) − S(y)|. Thus, our Monte Carlo method is very efficient for multiphase segmentation, and its performance is comparable to that of the sophisticated graph-cut method, while retaining the advantages of generality and robustness. A detailed comparison is left for future work.
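The difference between the two boundary terms is easy to see numerically: the Potts penalty of equation (2) is label-blind, while the term µ Σ |S(x) − S(y)| required by the graph construction grows with the label difference. A small sketch (function names ours):

```python
import numpy as np

def potts_boundary(S):
    """Boundary term of equation (2): counts unlike neighbour pairs."""
    return int(np.sum(S[:, 1:] != S[:, :-1]) + np.sum(S[1:, :] != S[:-1, :]))

def linear_boundary(S):
    """Boundary term sum of |S(x) - S(y)|: the form the graph construction
    requires; it grows with the label difference rather than counting it."""
    return int(np.abs(S[:, 1:] - S[:, :-1]).sum()
               + np.abs(S[1:, :] - S[:-1, :]).sum())

S = np.array([[0, 2],
              [0, 2]])
# potts_boundary(S) == 2 but linear_boundary(S) == 4:
# the linear penalty depends on how far apart the labels are.
```

For two phases the two terms coincide, which is why the restriction only matters in the multiphase setting discussed above.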
The level-set-based method developed by Song and Chan [17] is similar in essence to ours. Their method was claimed to be very fast, but it does not guarantee a globally minimal solution. The method of Song and Chan performs a raster scan over the image, flipping a pixel if the energy decreases as a result. Our method holds several advantages: Eden clustering is much faster than a raster scan, and our block-spin transformations effectively speed up the segmentation as well as enhance ergodicity. If the same annealing scheme were imposed on the method of Song and Chan, it should perform comparably to the p_Eden = 0 plot in the inset of figure 5 (circles).
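A raster scan pass in the spirit of Song and Chan, where each pixel is greedily set to its locally optimal label, can be sketched as follows (a zero-temperature sketch under our own naming, not their published code):

```python
import numpy as np

def raster_scan_pass(u, S, c, mu):
    """One deterministic raster-order pass: each pixel is set to the label
    minimizing its local energy (a greedy, zero-temperature update)."""
    h, w = S.shape
    changed = 0
    for i in range(h):
        for j in range(w):
            nbrs = [S[i + di, j + dj]
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= i + di < h and 0 <= j + dj < w]
            costs = [(u[i, j] - c[k]) ** 2 + mu * sum(n != k for n in nbrs)
                     for k in range(len(c))]
            best = int(np.argmin(costs))
            if best != S[i, j]:
                S[i, j] = best
                changed += 1
    return changed

u = np.array([[0., 0.],
              [255., 255.]])
c = np.array([0., 255.])
S = np.zeros_like(u, dtype=int)
n = raster_scan_pass(u, S, c, mu=1.0)   # flips the bottom row to phase 1
```

Being purely greedy, such a pass can only descend in energy, which is why it is prone to the local minima that annealing is designed to escape.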
In addition, our method is robust to statistical fluctuations and does not require a special initial condition. We are able to treat images on which other methods have problems, and our method can be generalized trivially to three-dimensional images without sacrificing computational efficiency.

Discussions
In this paper, we showed that a statistical physics representation for image processing can be effective when advanced Monte Carlo methods are utilized. We emphasize that the advantage of our method is prominent in the case of multiphase segmentation. The performance of the present method is shown in the movie given in the Supporting information, and Java applets for sample images are available at http://ccmp1.phys.se.tmu.ac.jp/ccmp/demo/. Although the graphics in the Java applet take time, the net computation speed is much faster; timings for all figures are presented in appendix B. In future work, we would like to extend our statistical physics approach to include topological dependence [31]-[33] and the subspace Mumford-Shah model [9]-[11]. Eden clustering can be used to include a topological dependence constraint developed recently to segment crowded objects [31,32].