Mean-Based Breakpoint Selection on Circular Histogram

A circular histogram represents the statistical distribution of circular data; the H component histogram of the HSI color model is a typical example. When the H component is used to segment a color image, a feasible approach is to transform the circular histogram into a linear histogram and then apply mature gray-image thresholding methods to the linear histogram to select the threshold value. Thus, a reasonable selection of the breakpoint at which the circular histogram is linearized is the key. In this paper, based on the angles mean on the circular histogram and the line mean on the linear histogram, a simple breakpoint selection criterion is proposed, and the suitable range of this method is analyzed. Compared with the existing breakpoint selection criteria based on the Lorenz curve and the cumulative distribution entropy, the proposed method has a simpler expression, requires less calculation, and does not depend on the direction of rotation.


Introduction
The data obtained from actual observations can be expressed in various measurement spaces, and the angles' space, which shows angle change, is one of them. Data processing in the angles' space belongs to a branch of statistics: direction (circular) statistics [1,2]. Angle-based data are called direction data, and angles are commonly expressed as unit vectors. Unlike scale-based measurements, direction data have inherent periodic (cyclic) characteristics, which give them many unique and novel properties in modeling and statistical processing. Data that show the angle change of a single variable are called circular data [2], and one way to display them visually is as a point on the unit circle or a unit vector in the plane. A typical example of circular data is the H component in the HSI color model of a color image [3]. The HSI color model is a mathematical image model proposed by the American colorist H. A. Munsell in 1915; it uses H (hue), S (saturation), and I (intensity) to describe color characteristics. The HSI color model differs from the commonly used RGB color model. The three components R (red), G (green), and B (blue) of the RGB color model are linearly dependent, but the three components H, S, and I of the HSI color model are linearly independent. Since the HSI color model represents the colors of human perception well, color image segmentation in HSI color space has achieved good results [4][5][6][7][8][9][10].
As a typical example of circular data, the hue (H) represents the basic colors of the image (H ∈ [0, 2π)); because of its periodicity, it can be expressed as a circular histogram. At first, some scholars used the hue histogram to segment color images without considering its periodicity [11,12]. Tseng et al. [13] were the first to propose a thresholding method on the circular histogram for color image segmentation, so that the periodicity of the hue is not lost. Wu et al. [14] gave an iterative Otsu's algorithm based on the circular histogram, but this method is not optimal and its convergence cannot be guaranteed. Dimov et al. [15] gave a method for optimal thresholding and multithresholding of the circular histogram through a symmetry constraint on threshold point pairs, but the calculation is very complicated. Utilizing the cyclic characteristic of a circular histogram, Lai and Posin [16] showed theoretically that when the circular histogram is expanded into a linear histogram and the Otsu method is adopted, only half of the points on the circle need to be searched to obtain the optimal threshold point pair, which reduces the time complexity from O(N^2) to O(N). However, this method is not general and can only be applied to two-class thresholding.
Lai and Posin's research [16] shows that it is feasible to first expand the circular histogram into a linear histogram and then use mature linear histogram thresholding methods (such as Otsu's [17], fuzzy entropy [18], and context-sensitive [19] thresholding techniques) to obtain the threshold of a circular histogram. Choosing a suitable breakpoint at which to linearize the circular histogram therefore becomes the key. For this reason, we have previously proposed two breakpoint selection criteria. One is the criterion based on the Lorenz curve [20]; we discussed the relation between the area difference and the expansion direction and gave the optimal breakpoint selection criterion in the anticlockwise or clockwise direction. The other is the criterion based on the cumulative distribution entropy [21]; we built a circular histogram expansion model based on the cumulative distribution entropy and discussed the optimal breakpoint selection criteria under different expansion directions.
These two circular histogram expansion methods overcome the randomness of breakpoint selection. However, the computational complexity of the Lorenz curve and of the cumulative distribution entropy is relatively high, so selecting the optimal breakpoint takes considerable time.
Circular statistics, as a particular branch, generally deals with data composed of angles or directions. Because of the obvious periodicity of such data, it is necessary to distinguish between circular data and linear data. In circular statistics, the angle mean represents the average angle of a set of data on a circle; it is a circular statistical invariant on the circular histogram and does not change when the circular histogram is rotated. By contrast, the line mean represents the average of a linear set of data and is a linear statistical invariant on the linear histogram. Based on these two invariants, this paper proposes a simple breakpoint selection criterion that minimizes the distance between the angle mean on the circle and the point on the circle corresponding to the line mean of the expanded linear histogram. The proposed criterion can quickly and reasonably find the breakpoint that keeps the distribution unchanged after the circular histogram is expanded.
This paper proposes a fast method for breakpoint selection on the circular histogram, which addresses the low efficiency of expanding a circular histogram into a linear histogram. It is organized as follows. Section 2 describes the angle mean of the circular histogram and the line mean of the linear histogram and gives the optimal breakpoint selection criterion. In Section 3, the suitable range of the proposed method is given by comparing it with the Lorenz curve-based and cumulative distribution entropy-based optimal breakpoint selection criteria. Section 4 summarizes the paper.

Criteria for Selection of Breakpoint in Circular Histogram
In this section, we use the H component histogram of the HSI color model to explain the circular histogram. Figure 1(a) shows the H component diagram in the HSI color model. The H component (H ∈ [0, 2π)) represents the periodic change of color in the anticlockwise direction. For example, red is 0, green is 2π/3, and blue is 4π/3. To take the periodic changes of the H component into account, a circular histogram is used to represent the statistical distribution of the H component (Figure 1(b)). When the H component is used for color image segmentation, a feasible approach is to transform the circular histogram into a linear histogram and then apply threshold segmentation methods to the linear histogram to select the threshold. The linearized histograms produced from the same circular histogram at different cutting points may carry different distribution information. Figure 2 shows the result of expanding the circular histogram (Figure 1(b)) at two different points. Although they are derived from the same circular distribution, the two linearized distributions are not similar. To keep the distribution of the linearized histogram as consistent as possible with the distribution of the circular histogram, a new breakpoint selection method is given below.
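The effect of the cutting point can be seen on a toy example. The sketch below is illustrative only: the helper name `linearize` and the six-bin histogram are assumptions, and the expansion is taken to run anticlockwise from the breakpoint bin.

```python
import numpy as np

def linearize(hist, t0):
    """Expand a circular histogram into a linear one that starts at bin t0.

    Assumption: bins are read anticlockwise, so bin t0 becomes position 0,
    bin t0 + 1 becomes position 1, and so on, wrapping around the circle.
    """
    hist = np.asarray(hist)
    return np.roll(hist, -t0)

# Toy circular histogram with a single mode that wraps around bin 0.
circ = np.array([5, 1, 0, 0, 2, 7])

cut_at_0 = linearize(circ, 0)  # the mode is split across both ends
cut_at_4 = linearize(circ, 4)  # the mode stays in one piece

print(cut_at_0)
print(cut_at_4)
```

Cutting at bin 0 splits the single mode into two apparent clusters, whereas cutting at bin 4 keeps it contiguous, which is the situation the breakpoint criterion is meant to favor.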

Angles Mean and Linearized Mean of Circular Histogram.
(1) The angles mean μ_L on the circular histogram is given by definition (2), where μ′ = tan⁻¹(S/C). This average direction (μ_L ∈ [0, 2π)) is a statistic that describes the location of the circular histogram. It does not depend on the starting point or on the rotation direction, and it reflects the center of the circular histogram [1,2]. The red line in Figure 3 represents the angles mean of the circular histogram. For clarity, Figure 3 uses a rose diagram to illustrate the circular histogram.
Suppose the circular histogram {h(t)} (Figure 1) is expanded at the breakpoint t_0 into a linear histogram with line mean μ_L(t_0). The corresponding point of the line mean μ_L(t_0) on the circular histogram is given by equation (4).
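The two means can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the paper's exact formulas: bin t is taken to lie at angle 2πt/L, the angles mean is computed with the standard circular-statistics form atan2(S, C) (which applies the quadrant correction to tan⁻¹(S/C)), the line mean is the probability-weighted mean position in the histogram expanded anticlockwise at t_0, and its point on the circle is taken to be (t_0 + line mean) mod L. The function names are hypothetical.

```python
import numpy as np

def angle_mean(hist):
    """Angles mean of a circular histogram, in radians within [0, 2*pi)."""
    hist = np.asarray(hist, dtype=float)
    L = hist.size
    theta = 2 * np.pi * np.arange(L) / L   # assumed angle of each bin
    p = hist / hist.sum()
    C = np.sum(p * np.cos(theta))
    S = np.sum(p * np.sin(theta))
    return np.arctan2(S, C) % (2 * np.pi)  # quadrant-corrected tan^-1(S/C)

def line_mean_on_circle(hist, t0):
    """Angle on the circle of the line mean of the histogram expanded at t0."""
    hist = np.asarray(hist, dtype=float)
    L = hist.size
    p = np.roll(hist, -t0) / hist.sum()    # expanded (linear) histogram
    mu_line = np.sum(np.arange(L) * p)     # line mean, in bin units
    return 2 * np.pi * ((t0 + mu_line) % L) / L

# Two equal spikes at 350 and 10 degrees (L = 360): the circular centre is 0.
h = np.zeros(360)
h[350] = h[10] = 1.0

print(angle_mean(h))                # an angle equivalent to 0 on the circle
print(line_mean_on_circle(h, 0))    # cut at 0: the line mean lands at pi
print(line_mean_on_circle(h, 180))  # cut at 180: the line mean lands at 0
```

Cutting at bin 0 places the line mean on the opposite side of the circle from the angles mean, while cutting at bin 180 makes the two coincide.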

Breakpoint Selection Criteria. The goal of linearizing the circular histogram is to preserve the original distribution completely. The angles mean μ_L is a circular invariant of the circular histogram, and the line mean μ_L(t_0) is a linear invariant of the linear histogram expanded at the breakpoint t_0. To find the optimal breakpoint, we therefore require the point on the circle corresponding to μ_L(t_0) to be as close as possible to μ_L, so that the linear histogram preserves the distribution. Both are points on the circle. Because of periodicity, the distance between them differs from the Euclidean distance, and what matters is the difference in the directions of the two values. This difference can be measured by the cosine of the angle between them [1,2], which gives the distance D of equation (5). The value of D depends only on the two angles, and D ∈ [0, 2]: when the two angles are the same, D = 0; when the two directions are opposite, D = 2.
The mean-based selection criterion for the optimal breakpoint t_0^* is to choose the breakpoint t_0 that minimizes the distance D. Obviously, when the circular histogram is expanded in the clockwise direction, the corresponding value of the line mean on the circular histogram is the same as the value obtained from equation (4). Therefore, the proposed method does not depend on the expansion direction of the circular histogram.
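A small sketch of a cosine-based distance with the stated properties is shown below. It assumes the standard circular-statistics form 1 − cos(α − β), which lies in [0, 2], equals 0 for identical directions, and equals 2 for opposite directions; the function name is hypothetical.

```python
import math

def circular_distance(alpha, beta):
    """Cosine-based distance between two angles (radians), in [0, 2].

    Assumed form: 1 - cos(alpha - beta).
    """
    return 1.0 - math.cos(alpha - beta)

same = circular_distance(0.3, 0.3)          # identical directions
opposite = circular_distance(0.0, math.pi)  # opposite directions
# Angles on either side of the 0 / 2*pi seam are close on the circle,
# even though their numeric difference is almost 2*pi.
wrapped = circular_distance(0.1, 2 * math.pi - 0.1)

print(same, opposite, wrapped)
```

Unlike an absolute difference of the two angle values, this measure treats angles just below 2π and just above 0 as neighbors, which is why it suits the periodic setting.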
It is important to emphasize that the idea behind the mean-based breakpoint selection criterion differs from that of the existing Lorenz curve-based and cumulative distribution entropy-based breakpoint selection criteria [20,21]. The mean-based method uses the invariants of circular statistics and linear statistics. The Lorenz curve-based and cumulative distribution entropy-based methods use the cumulative distribution information of each linearized histogram and therefore depend on whether the rotation is counterclockwise or clockwise. The algorithm for breakpoint selection on the circular histogram is very simple and easy to implement; it is given in Algorithm 1.

Experiment Results and Analysis
The experiment is divided into two parts to evaluate the proposed method. The experiments are performed using Python 3.8 on a PC with a 2.50 GHz Intel Core CPU and 8 GB RAM, under the Windows 10 operating system. Among circular models, the von Mises distribution (also known as the circular normal distribution) is the most important distribution; its status is equivalent to that of the normal distribution among linear distributions. Many theories and applications in circular statistics are discussed in terms of the von Mises distribution [1,2]. Therefore, the first part shows the results of selecting the breakpoint for different types of artificial von Mises distributions and discusses the influence of the parameters b (the mean direction) and κ (the concentration parameter) of the bimodal von Mises distribution [1,2] on the proposed method. In the second part, the proposed mean-based breakpoint selection criterion is compared with the existing breakpoint selection criteria, namely the methods based on the Lorenz curve [20] (Lorenz-based), the cumulative distribution entropy [21] (CDFE-based), and the artificial bee colony [22] (ABC-based), on the H component circular histograms of 8 images from the Berkeley dataset. For convenience, the quantization level L of the H component in the experiments is 360.
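An artificial bimodal histogram of the kind used in the first part can be generated as in the sketch below. It assumes the usual von Mises density exp(κ cos(θ − b)) / (2π I_0(κ)), mean directions given in degrees, and an equal-weight mixture; the mixing weight `w` and the function names are assumptions, not taken from the paper.

```python
import numpy as np

def von_mises_pdf(theta, b, kappa):
    """von Mises density with mean direction b (radians) and concentration kappa."""
    return np.exp(kappa * np.cos(theta - b)) / (2 * np.pi * np.i0(kappa))

def bimodal_histogram(b1_deg, b2_deg, kappa1, kappa2, L=360, w=0.5):
    """Discretized two-component von Mises mixture on L bins, normalized to 1.

    Assumption: w is the weight of the first component (equal weights by default).
    """
    theta = 2 * np.pi * np.arange(L) / L
    pdf = (w * von_mises_pdf(theta, np.deg2rad(b1_deg), kappa1)
           + (1 - w) * von_mises_pdf(theta, np.deg2rad(b2_deg), kappa2))
    return pdf / pdf.sum()

# Mixture of g(t; 0, 10) and g(t; 120, 10) on 360 bins.
hist = bimodal_histogram(0, 120, 10, 10)
print(hist.sum(), hist[0], hist[60], hist[120])
```

Varying `b2_deg` between 0 and 180 and the two concentration parameters reproduces the kind of parameter sweep discussed for the artificial histograms.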
To illustrate more specifically the linearization effect of the mean-based method on mixture distributions with the same κ, Figure 6 shows the relation between the percentage of the broken distribution (see the red box in Figure 5(f)) and the mean direction difference (|b_2 − b_1|). Because of the symmetry of the circle, the mean direction difference is only taken from 0 to 180; the remaining values are equivalent to their symmetrical counterparts. When the mean direction difference approaches 180, the proportion of the broken distribution increases suddenly, but the maximum does not exceed 0.5%. When the mean direction difference is less than 150, the linearization effect is consistently good.
Similarly, Figures 7-9 show the relation between the percentage of the broken distribution and the mean direction difference when (κ_1, κ_2) is (5, 10), (10, 15), and (10, 20), respectively. The maximum percentage of the broken distribution is 20% in Figure 7 and 14% in Figure 8, which shows that the lower the overall concentration parameter, the worse the overall linearization effect.
In Figure 9, the maximum percentage of the broken distribution is 16%. Figures 8 and 9 show that a small difference between the concentration parameters of the two distributions is conducive to the linearization of the circular histogram.
In Figures 7-9, the percentage of the broken distribution for the different concentration parameters is positively related to the mean direction difference, and it increases exponentially around 180. This exponential increase greatly reduces the linearization effect near 180.

Figure 5: Different mixture bimodal histograms and corresponding mean-based linearized histograms: (a-c) circular histograms of the mixture distributions of g(t; 0, 10) with g(t; 60, 10), g(t; 120, 10), and g(t; 180, 10), respectively, and (d-f) linearized histograms of (a-c) with the mean-based breakpoint selection method, respectively.
Input: H-histogram Hist, hue magnitude L
Output: the optimal breakpoint t*
  t* ← 0
  Calculate the distance D(t*) with Hist according to equation (5)
  for t = 1 : L do
    Rotate the histogram to the right or left by t
    Calculate the distance D(t) with the rotated Hist according to equation (5)
    if D(t) < D(t*) then
      D(t*) ← D(t)
      t* ← t
    end if
  end for
  return t*
ALGORITHM 1: The breakpoint selection with mean.
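The exhaustive search of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it assumes bin t lies at angle 2πt/L, an angles mean of atan2(S, C), a line mean mapped back to the circle as (t + line mean) mod L, and a distance of the form 1 − cos(·). The function name `select_breakpoint` is hypothetical.

```python
import numpy as np

def select_breakpoint(hist):
    """Return (t_star, d_star): the breakpoint minimizing the mean-based distance."""
    hist = np.asarray(hist, dtype=float)
    L = hist.size
    p = hist / hist.sum()
    theta = 2 * np.pi * np.arange(L) / L

    # Angles mean of the circular histogram (rotation invariant).
    mu_circ = np.arctan2(np.sum(p * np.sin(theta)), np.sum(p * np.cos(theta)))

    positions = np.arange(L)
    best_t, best_d = 0, np.inf
    for t in range(L):
        # Line mean of the histogram expanded at bin t, in bin units.
        mu_line = np.sum(positions * np.roll(p, -t))
        # Point on the circle that corresponds to the line mean.
        mu_on_circle = 2 * np.pi * ((t + mu_line) % L) / L
        # Cosine-based distance between the two directions.
        d = 1.0 - np.cos(mu_on_circle - mu_circ)
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d

# A single mode centred on bin 0, so it wraps around the 0 / 360 seam.
L = 360
mode = np.exp(10 * np.cos(2 * np.pi * np.arange(L) / L))
t_star, d_star = select_breakpoint(mode)
print(t_star, d_star)
```

For this wrapped mode the search should return a breakpoint on the far side of the circle from the mode, so the mode is not split. Each candidate costs O(L) here, giving O(L^2) overall; since rotating by one bin changes the line mean by a simple update, an O(L) variant is possible but is not shown.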

Mathematical Problems in Engineering
In summary, the results of Figures 6-9 show that the mean-based method is suitable for situations where the target is not far from the background in the circular histogram. The distance between the mean directions should generally not exceed 5π/6.

Real Circular Histograms.
To further illustrate the scope and effect of the mean-based breaking method, the H component circular histograms of 8 color images in the Berkeley dataset are selected for breakpoint selection, and the results are compared with those of the Lorenz-based [20], CDFE-based [21], and ABC-based [22] breakpoint selection criteria. The linearization results for the 8 images can be seen in Figures 10-17.
The variance and kurtosis have also been computed to compare the effects of the 4 algorithms more fully. Tables 1 and 2 report the variance and kurtosis obtained with the existing Lorenz-based, CDFE-based, and ABC-based techniques and with the proposed mean-based technique. Equation (7) defines the variance. Variance represents the dispersion of the data distribution: when the data are relatively scattered, the variance is large, and when the data are relatively concentrated, the variance is small. Equation (8) defines the kurtosis. The kurtosis is never lower than 1 and never higher than the number of data. The greater the kurtosis, the steeper the distribution. In equations (7) and (8), σ² is the variance, K is the kurtosis, X_i is the ith value of the variable X, X̄ is the average of the variable X, L is the number of values of the variable X, and p_i is the probability of the ith value of the variable X.
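The two metrics can be sketched as below. The sketch assumes the usual probability-weighted moment definitions, σ² = Σ (X_i − X̄)² p_i and K = Σ (X_i − X̄)⁴ p_i / σ⁴, with X_i taken as the bin position in the linearized histogram; these forms are consistent with the stated lower bound of 1 for the kurtosis but are assumptions rather than the paper's exact equations. The function name and toy histogram are hypothetical.

```python
import numpy as np

def variance_and_kurtosis(linear_hist):
    """Variance and (non-excess) kurtosis of a linearized histogram."""
    p = np.asarray(linear_hist, dtype=float)
    p = p / p.sum()                     # p_i: probability of bin i
    x = np.arange(p.size)               # X_i: assumed to be the bin position
    mean = np.sum(x * p)
    var = np.sum((x - mean) ** 2 * p)
    kurt = np.sum((x - mean) ** 4 * p) / var ** 2
    return var, kurt

# The same toy circular histogram cut at two different breakpoints.
circ = np.array([5, 1, 0, 0, 2, 7])
var_split, kurt_split = variance_and_kurtosis(np.roll(circ, 0))   # mode split
var_whole, kurt_whole = variance_and_kurtosis(np.roll(circ, -4))  # mode intact

print(var_split, kurt_split)
print(var_whole, kurt_whole)
```

In this toy case the cut that keeps the mode intact gives the smaller variance, matching the reading that a more concentrated linearized distribution has a smaller variance.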
As we can see from Figures 10 and 11, when the distribution is unimodal (bimodal coincidence), the breaking results of Lorenz-based, CDFE-based, ABC-based, and mean-based methods are appropriate, which guarantees the integrity of the distribution.
It can be seen from the H component histograms in Figures 12 and 13 that the distance between the centers of the two distributions of the circular histogram is small. In terms of maintaining the integrity of the circular distribution, the CDFE-based method and the mean-based method show the better effect. The variance and kurtosis of the mean-based method are the best in Tables 1 and 2.
Comparing the results of Figures 14(c)-14(f), the CDFE-based method shows the best effect, followed by the Lorenz-based method. The distance between the centers of the two distributions of the circular histogram (Figure 14) is large, and the result is consistent with the conclusion obtained from the artificial circular histogram analysis. When the centers of the two distributions are far apart, the linearization effect of the mean-based method deteriorates.
From Figure 15, we can see that the linearization effect of the CDFE-based and mean-based methods is better than that of the Lorenz-based and ABC-based methods. The linearization result of the Lorenz-based method does not maintain the integrity of the circular distribution; a small part becomes the right part of the linearized histogram. The linearization result of the ABC-based method completely destroys the distribution. In the complex distributions of Figures 16 and 17, most color types appear, and their frequencies differ. It can be seen from Tables 1 and 2 that the mean-based method has a slight advantage over the ABC-based, CDFE-based, and Lorenz-based methods in variance and kurtosis.
On the whole, the linearization results of the 8 circular histograms show that the mean-based method is superior to the CDFE-based, Lorenz-based, and ABC-based methods when the difference between the center positions of the target and the background is not particularly large. The mean-based method is effective in suitable scenarios. Judging from the averages of the metrics over the 8 images shown in Tables 1 and 2, the mean-based method is the best of the four methods. Table 3 shows the time spent on the linearization of the above 8 images by the Lorenz-based, CDFE-based, ABC-based, and mean-based methods, respectively. The mean-based method has a great advantage in speed. Compared with the Lorenz-based method, it is about 7 times faster on average. Compared with the CDFE-based and ABC-based methods, the improvement is even greater: it is about 71 times faster on average.

Conclusions
For the linearization of circular histograms, we propose a new method to select the breakpoint. The new method uses a simple mean operation to give the optimal breakpoint selection criterion. We discuss the scenarios in which this breakpoint selection criterion is applicable. Experiments show that the new method can guarantee the linearization effect of the circular histogram in suitable scenarios, reduce the computational complexity of the breakpoint selection, and provide a better way to linearize the circular histogram. In the future, we will explore a new breakpoint selection method, built on the mean-based method of this paper, that has a better linearization effect when the centers of the two distributions are far apart.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.