A regularized k-means and multiphase scale segmentation

We propose a data clustering model derived from a variational approach. This new clustering model, a regularized k-means, is an extension of the classical k-means model. It uses the sum-of-squares error to assess fidelity, and the number of data in each cluster as a regularizer. The model automatically gives a reasonable number of clusters through the choice of a single parameter. We explore various properties of this clustering model and present different numerical results.
 
This model is motivated by an application to scale segmentation. A typical Mumford-Shah-based image segmentation is driven by the intensity of objects in a given image; in this paper we consider image segmentation using additional scale information. Using the scale of objects, one can classify objects in a given image further than is possible with the intensity value alone. The scale of an object is not a local value, so the procedure for scale segmentation needs to be separated into two steps: multiphase segmentation and scale clustering. The first step requires a reliable multiphase segmentation, for which we apply an unsupervised model, and for the second step we apply the regularized k-means for fast automatic data clustering. Various numerical results are presented to validate the model.


1.
Introduction. Image segmentation is used to partition images and facilitates the identification of certain objects or features in them. Image segmentation and active contours are widely studied, and various extensions have been proposed in different settings such as [10,25,33,39]. Since the work by Mumford and Shah [25] and Chan and Vese's successful level set implementation [3], numerous extensions have been explored and various properties have been studied in variational settings. The Chan-Vese model made it possible to identify objects without a sharp boundary, and its main driving force is the intensity difference of the objects. This model has been extended to incorporate more complicated settings such as multi-channel [2], texture [31], and logic models [30].
We explore the connections between data clustering and image segmentation, focusing on automatically deciding the number of clusters. Inspired by the variational setting of image segmentation, we propose a one-dimensional data clustering model. This new regularized k-means algorithm is an extension of the classical k-means model, and it explores the connections between data clustering and image segmentation from the modeling point of view. Image segmentation is closely related to data clustering; their connections have been explored by several authors, e.g. a subspace approach and a graph-cut approach for clustering are adapted for image segmentation in [17] and [33] respectively.
The main motivation of this work is the unsupervised multiphase image segmentation model presented in [32]. Unlike two-phase image segmentation, when multiple phases are to be found, the results become more sensitive to the choice of regularization parameters and to the initial condition (in the case of iterative methods). In many applications [4,14,19,36,40], the number of phases is predetermined (or a reasonable estimate is given a priori), which parallels the case of the classical k-means algorithm. The phase balancing model proposed in [32] addresses the issue of automatically choosing the number of phases K through the minimization of the energy functional

E({χ_i}, {c_i}, K) = Σ_{i=1}^{K} [ P(χ_i)/|χ_i| + μ ∫_{χ_i} |u_o − c_i|^2 dx ].  (1)

The notations P(A) and |A| represent the one-dimensional Hausdorff measure and the Lebesgue measure of a phase A respectively. Note that in addition to the phases {χ_i} and the intensity averages {c_i}, the number of phases K is also an unknown; only the observed image u_o is given. By minimizing the functional, a reasonable number of phases is found as the image is segmented. As a numerical method, a fast algorithm was proposed which demonstrated stable results. The model has one parameter μ that allows different choices for the number of phases; by presetting it to μ = 1 (for the intensity range [0, 255]), unsupervised segmentation is possible. Motivated by this ability to automatically find a reasonable number of phases, we propose a one-dimensional data clustering model which is reduced from the main idea of this phase balancing model (1). We also consider multiphase scale segmentation in this paper, where objects are classified by scale in addition to intensity. A typical Mumford-Shah-based image segmentation is driven by the intensity difference, despite the fact that the size of the objects can also provide a meaningful classification. The scale of an object is not a local value, so scale classification must be preceded by a dependable multiphase segmentation.
We apply the unsupervised model (1), and then further classify the objects according to the size of each connected component using the new regularized k-means model.
In this paper, we present a regularized k-means model, explore its properties, and present a numerical algorithm. In section 2, the details of the regularized k-means model are presented. Comparisons with the classical k-means are discussed in subsection 2.1, and related work on the regularized k-means is reviewed in subsection 2.2. The proposed model shows a preference for finding bigger clusters and stability against outliers. In section 3, a fast algorithm for the regularized k-means is proposed for stable realizations of the model, while its stopping criterion gives some insight into how the number of clusters is determined. In section 4, the main process of multiphase scale segmentation is presented, and various numerical results in subsection 4.1 show how scale clustering can improve segmentation results. In section 5, numerical experiments showing the performance of the regularized k-means are presented, followed by concluding remarks in section 6.

2.
A regularized k-means model. For a given data set D = {d_1, d_2, ..., d_n} ⊂ R^1, the classical k-means problem formalized by MacQueen [22] is to find a set of k clusters which minimizes the following energy:

Σ_{i=1}^{k} Σ_{d_j ∈ I_i} |d_j − c_i|^2.  (2)

Here I_i represents the i-th cluster and c_i is the average over I_i for i = 1, ..., k. The number of clusters k is typically given a priori or determined by some heuristic steps. When the number of clusters k is given, there are many algorithms to compute a solution. Most classical algorithms [9,12,20,22] are based on two basic steps: (1) assign a point to its nearest center; (2) update the cluster centers. These algorithms differ in how the two steps are interlaced. Ward's method [41] uses an agglomerative hierarchical procedure which locally optimizes the objective for each k. Some more recent algorithms include deterministic annealing [29], cutting-plane [28], and genetic algorithms [16]. The k-means problem assumes a fixed k. However, in many practical cases, k is unknown. If k is allowed to vary, then a trivial solution exists: simply let each datum form its own cluster. Therefore, some additional constraints or a priori knowledge about the clusters are necessary to make the problem meaningful. We propose to add a regularization term so that the number of clusters is determined via minimization of the energy. We propose the following regularized k-means energy, which is to be minimized for a given one-dimensional data set:

E = λ Σ_{i=1}^{k} f(n_i) + Σ_{i=1}^{k} Σ_{d_j ∈ I_i} |d_j − c_i|^2.  (3)

Here k is the number of clusters, n_i = |I_i| is the number of data in the cluster I_i, and c_i is the average of the data in I_i. In this paper, we use f(t) = 1/t for regularization (different functions can be applied, as long as f(t) is convex and satisfies f(s + t) < f(s) + f(t) for all s, t > 0).
The first term regularizes the solution via the size of each cluster. The second term measures the spread (intra-cluster dissimilarity) of the clusters. In (3), if λ is set to zero and the number of clusters k is not given, the minimum of the second term is achieved when each data point is its own cluster, i.e. k = n. To regularize, the size of the clusters is incorporated into the first term, which prefers to have all the data in one cluster. This regularization favors large clusters and bounds the number of clusters so that k < n. Together, the proposed model (3) maintains the tightness of the clusters while avoiding over-fitting by having too many small clusters.
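To make the trade-off in the energy (3) concrete, here is a minimal Python sketch (names are illustrative, not from the paper) that evaluates the regularized k-means energy of a given partition, assuming f(t) = 1/t as in the text:

```python
# Sketch of the regularized k-means energy (3), assuming f(t) = 1/t.
# A clustering is given as a list of 1-D clusters (lists of numbers).

def regularized_kmeans_energy(clusters, lam):
    """E = lam * sum_i f(n_i) + sum_i sum_{d in I_i} (d - c_i)^2, with f(t) = 1/t."""
    energy = 0.0
    for cluster in clusters:
        n_i = len(cluster)
        c_i = sum(cluster) / n_i                        # cluster average
        energy += lam / n_i                             # regularizer f(n_i) = 1/n_i
        energy += sum((d - c_i) ** 2 for d in cluster)  # sum-of-squares fidelity
    return energy

# Two tight, well-separated groups: the two-cluster partition has a much
# smaller fidelity term, so it beats the single cluster for moderate lam.
data = [0.1, 0.11, 0.12, 0.8, 0.81, 0.82]
one = regularized_kmeans_energy([data], lam=0.1)
two = regularized_kmeans_energy([data[:3], data[3:]], lam=0.1)
```

For very large λ the inequality reverses, which is exactly the over-fitting control the text describes: the regularizer eventually outweighs the fidelity gain of splitting.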
2.1. Analysis of the regularized k-means compared to the classical k-means. In the classical k-means model (2), since a fixed number k is given, the distribution of data strongly affects the clustering result. In particular, the model prefers to have a similar distribution (squared error) |d_j − c_i|^2 among the k clusters. To illustrate this effect, we consider the distribution γ_i in each cluster:

γ_i^min = min_{d_j ∈ I_i} |d_j − c_i|^2,  γ_i^max = max_{d_j ∈ I_i} |d_j − c_i|^2.

Here γ_i^min and γ_i^max represent the minimum and maximum distribution within the cluster I_i respectively. Let us assume there exist γ*_i ∈ [γ_i^min, γ_i^max] for i = 1, 2, ..., k such that

Σ_{d_j ∈ I_i} |d_j − c_i|^2 = γ*_i n_i,  so that the energy (2) becomes Σ_{i=1}^{k} γ*_i n_i.

This setting shows that, to minimize such an energy, the result will try to balance the values γ*_i n_i among the k clusters. This is due to having a fixed k and minimizing the summation term. Each cluster will prefer to have a similar value of γ*_i n_i; i.e., a cluster I_1 with many data will be distributed closer to its center c_1, compared to another cluster I_2 with a smaller number of data, which can be distributed farther from its center c_2, as long as γ*_1 n_1 and γ*_2 n_2 are similar. Therefore, the clustering result is heavily influenced by the choice of k and strongly depends on how the data are distributed (via the values of |d_j − c_i|^2).
The proposed model is an improvement over the classical k-means. In a similar setting as above, the proposed model becomes

λ Σ_{i=1}^{k} f(n_i) + Σ_{i=1}^{k} γ*_i n_i.

This model has the same fitting term, and it gives some balance among the distributions of the clusters. The regularization term prefers bigger clusters (a bigger number for each n_i) and stabilizes the process. The fact that k is chosen by the minimizer of the functional makes this method flexible, and outliers are not forced into a cluster due to a preset number of clusters. (Flexibility in the choice of λ is further discussed in Section 5.)

2.1.1. Stability of bigger clusters. With a regularization term such as f(t) = 1/t, a clustering result favors fewer and bigger clusters. Figure 1 is an example showing the stability of the regularized k-means compared to the classical k-means. The size of the data set is n = 10,000, among which 70% are drawn from the normal distribution N(0.2, 0.06^2), 15% from N(0.8, 0.02^2), and the other 15% from N(0.5, 0.02^2). The graphs represent the data in histogram form. Plot (a) is the result of the regularized k-means, and plot (b) is the result of the classical 3-means algorithm. The big cluster on the left is better classified by the regularized k-means: the majority of it belongs to one cluster, while the classical 3-means separates it into two. Also, with the classical 3-means, the two smaller clusters on the right are identified as one cluster, while the regularized k-means separates them.
2.1.2. Stability against outliers. Since the classical k-means algorithm depends strongly on the distribution of data, it can become unstable when outliers or additional data are introduced. With the regularized k-means this effect is reduced, thanks to the flexibility of automatically adjusting the number of clusters. Additional clusters can be created to deal with the outliers, so that the clustering of the remaining data is largely unaffected. Figure 2 (the size of the data set is n = 10,000) illustrates this: (a) shows a result of the regularized k-means with λ = 0.1, which automatically gave three clusters. This result is exactly the same as using the classical 3-means. When additional data are introduced to (a), the classical 3-means gives (b) as a result, where the intervals are all shifted. We can observe a couple of effects: (i) for a clearly separated data set, the classical k-means and the regularized k-means can give exactly the same results; (ii) the regularized k-means can handle the data change automatically with a fixed parameter λ.

2.1.3.
Effect of the regularization term. The regularization term we use has two basic properties. First, it decreases with k, in the sense that if two clusters of sizes n_1 and n_2 respectively are merged, then the regularization term decreases:

f(n_1 + n_2) = 1/(n_1 + n_2) < 1/n_1 + 1/n_2 = f(n_1) + f(n_2).

The regularization term is minimized when all data points are assigned to one cluster. Second, clusters of equal sizes are preferred: for any n_1 and n_2 such that n_1 + n_2 = n, we have

1/n_1 + 1/n_2 ≥ 4/n,

with equality if and only if n_1 = n_2 = n/2. Therefore, the "small k effect" will eventually overrule the "balancing effect" when the regularization parameter is large enough. Intuitively speaking, decreasing k by one induces a relatively large increase in the sum-of-squares error. Thus, before λ is large enough to compensate for such an increase in the fitting error, the balancing effect takes place to fine-tune the clusters (although the changes may not be very significant). We illustrate these properties in Figure 3. In (a), a data set with two "true" clusters (red and blue) and the optimal 2-means clustering (obtained by enumerating all contiguous clusterings with k = 2) are shown. The true blue and red clusters have 100 and 200 points respectively. Due to the difference between the spreads of the two clusters, there are 14 points that are "misclassified" by the 2-means method. In (b), the optimal clusterings of the regularized k-means with respect to λ = 20, 30, ..., 180 are shown. We remark that when λ ≤ 10, the optimal clustering has k ≥ 3; when λ ≥ 180, the optimal clustering has k = 1. We observe that as λ increases from 20 to 80, the number of misclassified points decreases due to the tendency of the regularization term to balance the size of the clusters. When 80 ≤ λ ≤ 170, where the sizes of the two clusters are 100 and 200 respectively, the balancing effect ceases. This is because the left-most points of the red cluster are quite far from the center of the blue cluster; thus, further balancing would cause a relatively large increase to the sum-of-squares error.
When λ ≥ 180, the small k effect takes over so that a single big cluster becomes the optimal choice. The graph (c) depicts the number of misclassified points for 20 ≤ λ ≤ 170.
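Both properties of the regularizer f(t) = 1/t stated above can be checked numerically. The following sketch (illustrative, not from the paper) verifies the merging inequality over a range of sizes and confirms that, for a fixed total n, a balanced split minimizes f(n_1) + f(n_2):

```python
# Numeric check of the two properties of the regularizer f(t) = 1/t:
# (1) merging two clusters decreases the regularization term;
# (2) for a fixed total size, the balanced split minimizes it.

def f(t):
    return 1.0 / t

# Property 1: f(n1 + n2) < f(n1) + f(n2) for all n1, n2 > 0.
merge_ok = all(f(n1 + n2) < f(n1) + f(n2)
               for n1 in range(1, 50) for n2 in range(1, 50))

# Property 2: among splits n1 + (n - n1) = n, the minimizer of
# f(n1) + f(n - n1) is the balanced split n1 = n / 2.
n = 100
best_split = min(range(1, n), key=lambda n1: f(n1) + f(n - n1))
```

This mirrors the two competing effects in the text: merging always lowers the regularizer (the "small k effect"), while among partitions with the same k, balanced sizes are preferred (the "balancing effect").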

2.2.
Related work on regularized k-means. Appending a regularization term to the k-means objective has been studied by many authors for different purposes. In [15,18,24], an entropy of the cluster membership is added to achieve fuzzy clustering; the fuzziness is controlled through the regularization parameter. The algorithm in [18] is agglomerative: it finds a sequence of nested clusterings corresponding to a decreasing sequence of regularization parameters, and the final number of clusters is then chosen by an external cluster validation measure. In [27], an entropy of the cluster sizes is added so that the number of clusters k is part of the objective to optimize. In [38], the number of outliers is added to make the k-means problem less sensitive to outliers.
If we take a broader view that the k-means problem is a special instance of mixture modeling [6, p.526], then appending a regularization term is equivalent to introducing a prior distribution to the model parameters in the Bayesian framework, under some mild assumptions on the distributions. In particular, several prior distributions of k have been considered. They include the Laplace-empirical criterion, Bayesian inference criterion, minimum description length, minimum message length, Bayesian information criterion, see [23,8] and the references therein.
In our approach, we do not regularize k directly. Instead, we regularize the size of the clusters via the term Σ_{i=1}^{k} 1/n_i, which favors a smaller k. A similar approach is considered in [27], where an entropy of the cluster sizes is used; we consider those f's that stem from geometric concepts in variational image segmentation. The regularization term in [27] is Σ_{i=1}^{k_max} f(n_i), where k_max is a predetermined upper bound on k and f(n_i) = −(n_i/n) ln(n_i/n). There, n_i can take the value zero to account for empty clusters. The regularization term used in our model does not allow n_i = 0; empty clusters are simply dropped from the expression. This implies that moving one point from a large cluster to a new cluster induces a relatively large increase in our regularization term (namely f(1) = 1). Thus our model has a strong resistance to the formation of small spurious clusters.
Modifying the standard k-means to balance the clusters has been studied in different contexts, e.g. vehicle routing [13] and scheduling in wireless sensor networks [37]. These algorithms are domain specific and take into consideration some special structure of the data. The model proposed in [42] assumes that a set of class labels is provided. The goal is to construct clusters such that each cluster contains a similar number of data from each class and the cluster sizes are balanced. Each of the two objectives is achieved by adding a sum-of-squares error term.

3.
A regularized k-means algorithm. In variational settings, a typical approach to finding a minimizer of a functional is to consider its Euler-Lagrange form and apply a gradient descent method. For the proposed regularized k-means model, the number of clusters k is also an unknown, so considering a gradient direction would require tedious computation of topological derivatives. Therefore, we propose to directly consider the change in the energy for each data point to find a solution, as in [11,32,34]. This is also possible since we are working in a discrete setting of the data.

A regularized k-means algorithm
• Input the data set D and λ. Let k = 1 and assume all the data are in one cluster.
• Iterate: compute the following for each datum d_j, j = 1, ..., n:
  1. For each datum d_j ∈ I_i, compute ∆E_il as in (4) for l = 1, ..., k + 1.
  2. Move d_j to the cluster I_l with the smallest negative ∆E_il, creating a new cluster I_{k+1} if l = k + 1; otherwise, keep d_j in I_i.
• Stop when no datum changes its cluster.

The main idea of [11,32,34] is to consider the change in the energy directly, so that each pixel is moved to the phase which locally minimizes the energy compared to the previous phase. Each iteration of this greedy algorithm is very fast, due to the simple form of the change in the energy functional. As in [32], we also add the option of creating a new cluster, to allow the number of clusters k to change.
It is important to note that this algorithm starts with a simple initial condition: all data are assumed to be in one cluster and the number of clusters is set to k = 1. The model dynamically adjusts the number of clusters during the iteration of the algorithm. From the proposed model (3), we consider the energy difference ∆E_il for each pair (i, l), which represents the energy change when a datum d_j ∈ D is moved from cluster I_i to I_l. For each datum d_j ∈ D, the energy change is computed by

∆E_il = λ [ f(n_i − 1) − f(n_i) + f(n_l + 1) − f(n_l) ] + (n_l/(n_l + 1)) |d_j − c_l|^2 − (n_i/(n_i − 1)) |d_j − c_i|^2.  (4)

If ∆E_il > 0, then the datum d_j will not be moved to I_l, since that would increase the energy. If ∆E_il is negative, then it is better to move d_j to cluster I_l. A new cluster I_{k+1} with n_{k+1} = 1 is created when the following is the smallest negative value among ∆E_il for l = 1, 2, ..., k + 1:

∆E_{i,k+1} = λ [ f(n_i − 1) − f(n_i) + f(1) ] − (n_i/(n_i − 1)) |d_j − c_i|^2.
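The greedy scheme above can be sketched in a few dozen lines. The following Python sketch is illustrative only: it assumes f(t) = 1/t and the standard incremental form of the sum-of-squares change when one datum leaves or joins a cluster, and it keeps the per-cluster size and sum so each candidate move costs O(1):

```python
# A minimal sketch of the greedy regularized k-means, assuming f(t) = 1/t.
# Cluster statistics (size, sum) are maintained incrementally.

def regularized_kmeans(data, lam, max_iter=100):
    labels = [0] * len(data)                  # start with k = 1: one big cluster
    size = [len(data)]
    total = [sum(data)]
    for _ in range(max_iter):
        moved = False
        for j, d in enumerate(data):
            i = labels[j]
            if size[i] == 1:
                continue                      # keep clusters nonempty
            c_i = total[i] / size[i]
            # energy change from removing d out of its current cluster I_i
            out = (lam * (1.0 / (size[i] - 1) - 1.0 / size[i])
                   - size[i] / (size[i] - 1) * (d - c_i) ** 2)
            # candidate targets: every other cluster, plus a new singleton
            best_l, best_delta = i, 0.0
            for l in range(len(size)):
                if l == i:
                    continue
                c_l = total[l] / size[l]
                delta = (out + lam * (1.0 / (size[l] + 1) - 1.0 / size[l])
                         + size[l] / (size[l] + 1) * (d - c_l) ** 2)
                if delta < best_delta:
                    best_l, best_delta = l, delta
            delta_new = out + lam * 1.0       # new singleton cluster: f(1) = 1
            if delta_new < best_delta:
                best_l, best_delta = len(size), delta_new
                size.append(0)
                total.append(0.0)
            if best_l != i:                   # apply the best negative move
                size[i] -= 1; total[i] -= d
                size[best_l] += 1; total[best_l] += d
                labels[j] = best_l
                moved = True
        if not moved:
            break                             # stop: no datum changed cluster
    return labels

# Two well-separated groups should come out as two clusters.
data = [0.10, 0.11, 0.12, 0.13, 0.80, 0.81, 0.82, 0.83]
labels = regularized_kmeans(data, lam=0.01)
```

Note how the sketch starts from k = 1 and lets new clusters appear only when creating a singleton is the most energy-decreasing move, matching the description above.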
A necessary condition for creating a new cluster can be computed by considering the sign of ∆E. The algorithm will add a new cluster as long as the following holds:

|d_j − c_i|^2 > ((n_i − 1)/n_i) λ [ f(1) + f(n_i − 1) − f(n_i) ].  (5)

As n_i increases, |d_j − c_i|^2 needs to be bigger to create a new cluster; e.g., when one cluster has many data points, it attracts more points, unless a point is far from the center c_i. This inequality also shows that λ should be chosen depending on the squared error (d_j − c_i)^2 and n_i (or f(n_i)). For example, with f(t) = 1/t and large n_i, the condition is approximately

λ < |d_j − c_i|^2.  (6)

4.
Multiphase scale segmentation. We propose image segmentation using the scale of objects. One can achieve a better classification of objects by utilizing intensity and size information together. Figure 4 shows a motivation for this approach. The original image (a) has many disks with different sizes and intensities. Intensity based segmentation gives results such as (b) and (c), showing two out of five phases and indicating the locations of the black regions and of one of the darker gray regions respectively. The segmentation is driven by the intensity, so each phase has many different sized objects mixed together. For some medical applications, it may be meaningful to cluster the objects according to their size in addition to their intensity.
For the notion of size, we use the scale term

S(A) = P(A)/|A|,

where P(A) denotes the perimeter of a set A and |A| denotes the two-dimensional area of A. This term is inversely proportional to the size of the object. The same quantity is used for automatic parameter scaling in the unsupervised multiphase segmentation model [32], and its inverse is used in the context of total variation (TV) denoising in [35]. The term S is related to the Cheeger set problem, where the objective is to find a nonempty set A ⊂ Ω of finite perimeter which minimizes min_{A ⊂ Ω} S(A). This is widely studied in the calculus of variations; some references include [1,7].
We propose to use scale information for a better classification. Using scale for object classification has been considered in different contexts, and a good literature review of recent approaches can be found in [21]. Unlike intensity based image segmentation, scale is not a local value: each pixel is not aware of the scale of the connected component it belongs to until each connected component is identified. Therefore, scale segmentation and intensity based segmentation cannot be performed simultaneously, and a stable multiphase segmentation is necessary as a pre-processing step. Any multiphase segmentation method that partitions an image into separate objects can be applied. We apply the phase balancing multiphase segmentation model (1) from [32], since it is stable and able to handle the sensitivity issues of multiphase segmentation, such as initial condition dependence and a pre-assigned number of phases. After the multiphase segmentation step, the image domain Ω is partitioned into phases χ_a according to their intensity values, i.e. Ω = ∪_{a=1}^{l} χ_a and χ_a ∩ χ_b = ∅ for a ≠ b. Each phase χ_a may contain many separate connected components, and we further label each connected component χ_{a,b}, such that χ_a = ∪_b χ_{a,b}. The data set D consists of the scale values S(χ_{a,b}) of the objects in the image. The size of D, n, is the number of connected components (objects) in the image. We only consider the case n < ∞, i.e. there are finitely many objects, which is the case in the discrete setting. A grouping of the objects is then found by minimizing the energy (3). Figure 5 shows an example where scale segmentation can further improve the segmentation result. Image (b) uses intensity segmentation, where the light green regions are all identified together. By further clustering the scale values with the regularized k-means algorithm, the clover is clearly identified in image (c); the large clover has a different scale value compared to that of the narrow leaves.
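As a rough illustration of how the data set D of scale values might be assembled on a pixel grid, the following sketch labels the 4-connected components of one binary phase and computes S = P/|A|, with the perimeter approximated by counting boundary edges. The discretization of P and the flood-fill labeling are choices assumed here for illustration; the paper does not specify them.

```python
# Sketch: scale value S(A) = P(A)/|A| for each connected component of a
# binary phase on a pixel grid. 4-connected flood fill; perimeter counted
# as the number of edges between a component pixel and the outside.

def scale_values(phase):
    """phase: 2-D list of 0/1. Returns the list of S = P/|A| per component."""
    rows, cols = len(phase), len(phase[0])
    visited = [[False] * cols for _ in range(rows)]
    values = []
    for r0 in range(rows):
        for c0 in range(cols):
            if phase[r0][c0] == 1 and not visited[r0][c0]:
                # flood fill one connected component
                stack, pixels = [(r0, c0)], []
                visited[r0][c0] = True
                while stack:
                    r, c = stack.pop()
                    pixels.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and phase[rr][cc] == 1 and not visited[rr][cc]):
                            visited[rr][cc] = True
                            stack.append((rr, cc))
                area = len(pixels)
                comp = set(pixels)
                perim = sum(1 for r, c in pixels
                            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                            if (r + dr, c + dc) not in comp)
                values.append(perim / area)
    return values

# A 1x1 object has S = 4/1; a 3x3 block has S = 12/9. Smaller objects get
# larger scale values, matching the inverse-size behavior of S in the text.
img = [[1, 0, 0, 0, 0],
       [0, 0, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 1, 1, 1]]
vals = scale_values(img)
```

The resulting list of S values, gathered over all phases, would then be fed to the regularized k-means as the data set D.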
Figure 6 is an example using a color blind test image. Since there are some color differences in color-blind test images, an intensity based segmentation method can identify the letters across several of the multiple phases. Figure 6 also shows, in histogram form, the scale values clustered using the regularized k-means algorithm. We discarded values bigger than 1.5 to disregard noisy small objects. The phase images (d), (e) and (f) correspond to the first three intervals in the histogram, and the other three intervals give even smaller objects. In the next example, using intensity based image segmentation, the image is separated according to the different intensities: Figure 8 (b) and (c) show two phases (not showing the background phase).
Using the regularized k-means in addition, we can further distinguish the objects according to their scales, as shown in Figure 9 (b). Figure 9 (a) shows the scale value distribution in histogram form, where the red bars represent the cluster intervals found by the regularized k-means. Since scale segmentation is a two step process, the scale clustering result is influenced by the multiphase segmentation result. If two objects of the same shape are touching and are identified as one connected component by the multiphase segmentation, then the two objects will be identified as one big object, whose scale value can be different from that of each separate object.
Using scale information is different from using a shape prior for image segmentation, but it is flexible as a general classification. It may not be ideal for identifying a specific shape, such as finding a particular triangle, but scale segmentation will find similar sizes and will distinguish elongated shapes from rounded compact objects.
This approach of scale clustering can be related to histogram segmentation such as [5,26]. A clear difference is that we are proposing a data clustering algorithm, and a histogram clustering result is not directly tied to performance in object identification.
Note: we considered some variations of the model that include the scale as well as the intensity in one segmentation model, with an additional term involving s_i, the scale values of the connected components. This model and its variations were not successful, due to the simple but important fact that the scale value is a non-local term. For example, while one region is growing, it can get stuck at the size where many of the other objects are, and over-segment one connected component. In addition, since each pixel is not aware of its own location, finding the scale value s_i became very unstable. One can consider adding the scale term to the model; however, most likely it will require a two-step algorithm separating scale computation and intensity segmentation.

5.
Numerical experiments of the regularized k-means model. In this section, we further present the effects of the regularized k-means algorithm (3). The data sets are generated either from probability distributions or from histograms of images. The graphs are plotted in the form of probability distributions, and different colors indicate the different clusters found by the regularized k-means algorithm (3).
[Different λ effect] The first example shows how different choices of λ affect the results. Figure 10 (a) shows a given data set of size n = 20,000, derived from a histogram of a general image. As the parameter λ is changed from 0.1 to 0.15 to 0.3, the algorithm gives four, three, and two clusters respectively. The given data are reasonably separated between the region with a high peak and the other regions. This example shows the effect of the parameter λ.
[λ vs n] The stopping criterion (5) and the λ term (6) show that the parameter can depend on the size of the data set. We experimented with different sizes of data sets; for data sets of size between n ≈ 100 and n ≈ 100,000, the parameter λ stayed the same and stable. We kept the same distribution and changed the size of the data: 25% of the data are drawn from the normal distribution N(0.3, 0.1^2), 25% from N(0.6, 0.02^2), and the other 50% from N(0.8, 0.08^2). With the same value λ = 0.1, the regularized k-means gives three clusters consistently. Figure 11 (a) shows one such case with n = 10,000. When a significantly smaller (or larger) number of data is used, λ can be adjusted (Figure 11).

[λ vs k] The regularized k-means algorithm does not require any prior knowledge of the number k; however, the results are influenced by the choice of λ. It is important to explore how sensitive the clustering result is to this choice of λ. Figure 12 shows graphs of the choice of λ versus the number of clusters k. The graph is a decreasing step function (since k is an integer). Notice there are large flat intervals with the same k values, which shows the stability of the choice of λ. The shape of this graph is related to the distribution of the data, which we also experimented with. Consider a data set generated with N(0.3, 0.05^2) and N(0.7, 0.05^2) in different proportions, with the parameter fixed to λ = 0.1. Figure 13 shows the case when only 1% of the data is in the interval around 0.3 and the rest is around 0.7; the algorithm finds the two intervals separately. Down to 1%, the algorithm seems stable and consistently finds two clusters. Around 1% with λ = 0.1, it can sometimes also give three clusters, further separating the big cluster into two (b).
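The λ versus k step function can be reproduced exactly on a small data set by enumerating all contiguous clusterings of the sorted data, in the same spirit as the enumeration used for Figure 3. This sketch (illustrative; brute force, so only for small n) shows the optimal k decreasing as λ grows:

```python
# Exact minimizer of the regularized k-means energy (3), f(t) = 1/t, on
# sorted 1-D data, by enumerating contiguous clusterings (brute force).

from itertools import combinations

def sse(seg):
    c = sum(seg) / len(seg)
    return sum((d - c) ** 2 for d in seg)

def best_k(data, lam):
    """Return the number of clusters of the minimum-energy contiguous clustering."""
    data = sorted(data)
    n = len(data)
    best = (float("inf"), 1)
    for k in range(1, n + 1):
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0,) + cuts + (n,)
            segs = [data[a:b] for a, b in zip(bounds, bounds[1:])]
            energy = sum(lam / len(s) + sse(s) for s in segs)
            best = min(best, (energy, k))
    return best[1]

# Three small groups: as lam grows, the optimal k steps down 3 -> 2 -> 1.
data = [0.1, 0.12, 0.14, 0.5, 0.52, 0.9, 0.92, 0.94]
ks = [best_k(data, lam) for lam in (0.001, 0.5, 2.0)]
```

This brute-force check is independent of the greedy algorithm, so it can also serve as a reference when verifying that the fast algorithm reaches the global minimum on small inputs.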
[distribution vs λ] We tested data sets with two clusters, of uniform and of normal distribution, around [0.4 − δ, 0.4 + δ] and [0.6 − δ, 0.6 + δ], with varying δ. The two intervals have the same number of data. For each δ, a value of λ is found which gives two clusters as the result. These experiments show that the parameter λ depends on the distribution of the data: as the intervals get wider and closer to each other, a bigger λ is needed to identify the two clusters. Figure 14 (a) shows one case of the uniform distribution with δ = 0.09, and (b) the graph of δ (x-axis) versus λ (y-axis) for the uniform distribution. (c) shows the case of the normal distribution with δ = 0.1; (d) and (e) are the graphs of δ (x-axis) versus λ (y-axis) for the normal distribution. The graph (d) has a jump around δ ≈ 0.06, since that is where the two clusters start to merge and look like one cluster, as in (c). The graph (e) is the case of two normal distributions at [0.3 − δ, 0.3 + δ] and [0.7 − δ, 0.7 + δ] (data not shown); in this case the graph (e) is more uniform compared to (d). (Note that these values of λ are not definite values, but one among many possible values, as illustrated in the λ vs k case in Figure 12.) Furthermore, the range of λ varies only a little for uniformly distributed data (b), with y-axis range [0.01, 0.022], while for normally distributed data the value of λ is larger: the range of the graph in (d) is [0, 0.14] and of (e) is [0.04, 0.24]. This indicates that for completely separated clusters, the choice of λ can be more sensitive, compared to the normal distribution case.
6. Concluding remarks. We propose a data clustering model derived from a variational approach. The model regularizes the classical k-means using the number of data in each cluster; we consider f(n_i) = 1/n_i, which stems from geometric concepts in variational image segmentation. A fast greedy algorithm, applied directly to each data point, efficiently obtains a solution. The proposed regularized k-means algorithm is stable and automatically gives a reasonable number of clusters. If the data are scaled to [0, 1], λ = 0.1 is a reasonable choice. The algorithm is stable with respect to the size of the data from n ≈ 100 to n ≈ 100,000, and the λ vs k relation shows a stable choice of k for a wide range of λ. Various experiments were performed to show the effects of the model.
For this type of NP-hard problem, we are not guaranteed convergence to the global minimum. Variations in the minimum found by the algorithm are illustrated in Figure 15. This experiment is similar to Figure 12, except that for each λ we randomly permuted the order in which the data D are input; the data set itself is the same and only the input order is permuted. The oscillations around the transitions indicate that the algorithm may get stuck in a local minimum. Even so, the result is reasonable, since the oscillations occur only around the transitions between two neighboring intervals, where both solutions make sense, and they are only between two neighboring integers, showing the stability of the algorithm.
Moreover, we explored using scale information for multiphase image segmentation. As shown in the experiments, this allows objects to be identified more clearly. This application is different from using shape priors; it clusters objects according to scale. Combining intensity and scale, one can also consider a vectorial version of the clustering algorithm.

Figure 15. The graph of λ versus the number of clusters k. This experiment is similar to that of Figure 12, except that a random reordering of the input data is used. The graph is a decreasing step function, but shows some oscillations around the transitions. This indicates the possibility of being stuck in a local minimum. However, the result remains reasonable, since the oscillations occur only around the transitions and only between two neighboring integers, showing the stability of the model.