Fuzzy Clustering Algorithm with Histogram Based Initialization for Remotely Sensed Imagery

The paper presents a histogram-based initialization of the Fuzzy C-Means (FCM) clustering algorithm for remote sensing image analysis. A well-known drawback of FCM clustering is its sensitivity to the choice of initial cluster centers. To overcome this drawback, the proposed algorithm first determines the optimal initial cluster centers by maximizing a histogram-based weight function. Using these initial cluster centers, the given image is then segmented by fuzzy clustering. The major contribution of the proposed method is the automatic initialization of the cluster centers, which enhances clustering performance. Moreover, it requires no experimentally tuned parameters. Experiments are performed on remote sensing images, and the cluster validity indices Davies-Bouldin, Partition index, Xie-Beni, Partition Coefficient, and Partition Entropy are computed and compared with those of prominent methods such as FCM, K-Means, and automatic histogram-based FCM. The experimental outcomes show that the proposed method is effective for remote sensing image segmentation.


Introduction
Remote sensing images are used to extract land cover information, which is useful for many applications based on Geographical Information Systems (GIS), such as creating and updating maps, infrastructure development, disaster planning, and military operations. Segmentation plays a crucial role in extracting land cover information and in image analysis and understanding. Several clustering algorithms, such as ISODATA [1], K-Means [2], Expectation-Maximization [3], K-Nearest Neighbor [4], FCM [5], and their variants, have been proposed for image segmentation.
Clustering is the process of labeling a set of observed input data vectors or image pixels such that samples with the same label are homogeneous and differ from samples with other labels. There are four broad categories of clustering methods [6]:
• hierarchical,
• based on graph theory,
• decomposition of a density function,
• minimization of an objective function.
In this paper, we focus on clustering methods based on the minimization of an objective function, which can be further divided into two main strategies:
• hard clustering schemes,
• soft or fuzzy clustering schemes.
Hard clustering methods assign each data point or pixel to exactly one cluster, so the results are crisp. However, crisp clustering causes difficulties in remote sensing images because of their limited spatial resolution, poor contrast, the complexity of the ground surface, and diverse disturbances or spectral variation [7]. Soft clustering methods, on the other hand, are based on fuzzy set theory [8], [9] and use the concept of partial membership, which has been widely used in data clustering and image segmentation. FCM [11] is one of the most popular and successful algorithms for image segmentation because it is robust to ambiguity and retains much more information than hard segmentation methods [12].
FCM clustering was originally developed by Dunn [10] and further improved by Bezdek [11]. In conventional FCM, the initial cluster centers are selected randomly, which may degrade clustering performance. To improve the accuracy of fuzzy partitioning, many researchers have proposed various improvements. Kim et al. [13] proposed initializing the FCM cluster centers using dominant colors, but the clustering performance of their method degrades on images with cluttered scenes. Tian et al. [14] presented an automatic K-means initialization algorithm based on histogram analysis for medical CT images; its initialization is accurate and close to the ground truth. Zhong et al. [15] proposed a two-stage automatic fuzzy clustering method based on the Adaptive Multi-Objective Differential Evolution (AFCMDE) algorithm: the optimization stage finds the optimal number of clusters, and the classification stage initializes the cluster centers and classifies the image using FCM clustering. In another study, Shang et al. [16] proposed the Clone Kernel Spatial FCM (CKS_FCM) algorithm, in which the cluster centers are initialized by an immune clone algorithm and robustness to noise is enhanced by incorporating spatial information. In 2014, Ghaffarian [17] proposed Automatic Histogram based Fuzzy C Means (AHFCM) clustering for remote sensing imagery, in which the number of clusters and their initial values are determined by slope analysis of the histogram and a band fusion principle. However, the number of clusters is controlled by two threshold parameters with no defined way to set their values, so they must be chosen from experience or by trial and error.
A common drawback of the FCM algorithm is the selection of initial cluster centers. Remote sensing images contain considerable noise and many very similar objects, so the choice of initial cluster centers controls both the clustering outcome and the convergence speed of the algorithm. To address this drawback, this paper proposes a novel and robust histogram-based initialization of the FCM clustering algorithm for remote sensing imagery. The proposed algorithm first preprocesses the input image and predicts the initial cluster centers by iteratively maximizing a weight parameter of the brightness values. Finally, the image is segmented by FCM clustering initialized with these cluster centers.
The paper is organized as follows: Sec. 2 describes the proposed method; Sec. 3 discusses the cluster validation indexes and the results obtained from the algorithm; and Sec. 4 concludes the paper.

Proposed Method
The flow chart of the proposed method is shown in Fig. 1. It comprises preprocessing, automatic initialization of cluster centers and clustering using an FCM algorithm.

[Fig. 1: Flow chart of the proposed method; recoverable stages: Input image, Extract green band]

Preprocessing
First, the green band (shown in Fig. 2) is extracted from the given visible color input image (Fig. 2(a)) because it contains more information owing to the Rayleigh effect [18]. Histogram equalization (Fig. 2(d)) is then applied to increase the contrast and dynamic range.
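The preprocessing step can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' MATLAB implementation: it assumes an 8-bit image stored in R, G, B channel order and uses classic global (CDF-based) histogram equalization. The function names are hypothetical.

```python
import numpy as np


def extract_green_band(rgb: np.ndarray) -> np.ndarray:
    """Return the green channel of an H x W x 3 image stored in R, G, B order."""
    if rgb.ndim != 3 or rgb.shape[2] < 3:
        raise ValueError("expected an H x W x 3 RGB image")
    return rgb[:, :, 1]


def equalize_histogram(band: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization of an 8-bit single-band image via its CDF."""
    band = band.astype(np.uint8)
    hist = np.bincount(band.ravel(), minlength=levels)  # frequency of each level
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]   # CDF at the darkest level that actually occurs
    total = cdf[-1]
    if total == cdf_min:        # single-valued image: nothing to stretch
        return band.copy()
    # Look-up table that spreads the occupied levels over [0, levels - 1].
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[band]


def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Green-band extraction followed by histogram equalization."""
    return equalize_histogram(extract_green_band(rgb))
```

A look-up table is used so that equalization costs a single pass over the pixels; the darkest occupied level maps to 0 and the brightest to 255.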

Initialization of the Cluster Centers
In FCM, the initialization of the cluster centers strongly determines the convergence speed and segmentation accuracy of the algorithm. In conventional FCM, the cluster centers are initialized randomly, which may cause convergence to local optima. In the proposed method, the initial cluster centers are computed as follows. The preprocessed image consists of brightness values $B_p$, $p = 0, 1, 2, \dots, M$, with frequencies of occurrence $f_p$. The brightness value with the maximum frequency of occurrence in the preprocessed image is taken as the first initial cluster center:

$$V_1 = B_{p^*}, \qquad p^* = \arg\max_{p} f_p, \qquad c = 1 \qquad (1)$$

where $c$ is the number of cluster centers computed up to a given iteration, $V$ is the initial cluster center, and $M$ is the number of distinct brightness values in the image. To find the next initial cluster center, the weight parameter $W_p$ is computed for each brightness value. $W_p$ is a function of the difference between the brightness value and the previously computed cluster centers, and of the brightness value's frequency of occurrence. The proposed initialization thus considers both the occurrence frequency of each brightness value and the distribution of the initial cluster centers; therefore, it converges efficiently to the optimal cluster centers. The weight parameter $W_p$ is computed as in Eq. (2). The brightness value that maximizes this weight parameter is taken as the next initial cluster center:

$$V_{c+1} = B_{q^*}, \qquad q^* = \arg\max_{p} W_p \qquad (3)$$

If the number of already computed cluster centers $c$ is less than the user-specified number of cluster centers $C$, the algorithm iteratively computes a new cluster center using Eq. (2) and Eq. (3). This procedure yields all $C$ initial cluster centers.
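The greedy initialization loop described above can be sketched as below. Because the exact form of Eq. (2) is not available here, the sketch substitutes a hypothetical weight, $W_p = f_p \cdot \min_j |B_p - V_j|$, which matches the verbal description (frequency of occurrence combined with distance to the previously chosen centers) but should not be read as the authors' formula. The function name is likewise illustrative.

```python
import numpy as np


def histogram_init_centers(band: np.ndarray, n_clusters: int) -> np.ndarray:
    """Greedy histogram-based choice of initial FCM centers (illustrative sketch).

    ASSUMPTION: the weight W_p = f_p * min_j |B_p - V_j| is a hypothetical
    stand-in for the paper's Eq. (2); it favours brightness values that are
    both frequent and far from every center chosen so far.
    """
    values, freqs = np.unique(band.ravel(), return_counts=True)  # B_p and f_p
    values = values.astype(float)
    freqs = freqs.astype(float)
    if not 1 <= n_clusters <= values.size:
        raise ValueError("n_clusters must be between 1 and the number of distinct values")

    # First center: the most frequent brightness value.
    centers = [values[np.argmax(freqs)]]
    while len(centers) < n_clusters:
        # Distance of every brightness value to its nearest existing center.
        dist = np.min(np.abs(values[:, None] - np.asarray(centers)[None, :]), axis=1)
        weights = freqs * dist  # hypothetical W_p; zero for values already chosen
        # Next center: the brightness value that maximizes W_p.
        centers.append(values[np.argmax(weights)])
    return np.sort(np.asarray(centers))
```

Since any value already chosen has zero distance and hence zero weight, the loop never selects the same brightness value twice, and the result is deterministic for a given image.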

Automatically Initialized FCM Clustering
Finally, the preprocessed green image band is clustered by FCM [11], initialized with the cluster centers computed above. The final segmented image is shown in Fig. 2(f), and the key steps of the proposed algorithm are summarized in Alg. 1.
Algorithm 1 Proposed Fuzzy based Algorithm.
where N is the number of data points or pixels in a given image.
4: Update cluster centers: The cluster centers are updated as:
$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}, \qquad j = 1, \dots, C$$
5: Update membership matrix: The membership matrix is updated by replacing V with v (the cluster centers computed in step 4) in Eq. (4).
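The iteration in Alg. 1 follows the standard FCM membership and center updates of Bezdek [11]. The sketch below runs them on scalar pixel values starting from a given set of initial centers, with the settings reported in the experiments (m = 2, at most 100 iterations, threshold $10^{-4}$). The stopping rule (largest change in any membership below the threshold) and the function name `fcm` are assumptions of this sketch.

```python
import numpy as np


def fcm(x, init_centers, m=2.0, max_iter=100, tol=1e-4):
    """Fuzzy C-means on scalar pixel values, started from given centers.

    Returns (centers, memberships of shape N x C, iterations run).
    ASSUMPTION: stop when the largest change in any membership is < tol.
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    x = np.asarray(x, dtype=float).ravel()            # N pixel values
    v = np.asarray(init_centers, dtype=float).copy()  # C centers
    tiny = np.finfo(float).eps                         # guards against d = 0

    def memberships(centers):
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), computed in normalized form
        d = np.abs(x[:, None] - centers[None, :]) + tiny
        inv = d ** (-2.0 / (m - 1.0))
        return inv / inv.sum(axis=1, keepdims=True)

    u = memberships(v)  # initial membership matrix from the initial centers
    for it in range(1, max_iter + 1):
        um = u ** m
        v = (um.T @ x) / um.sum(axis=0)   # v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        u_new = memberships(v)
        delta = np.max(np.abs(u_new - u))
        u = u_new
        if delta < tol:
            break
    return v, u, it
```

Hard labels for display can be taken as `u.argmax(axis=1)`, reshaped to the size of the preprocessed band.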

Experimental Results
In this section, the performance of the proposed method is examined and compared with three algorithms: standard FCM [11], K-Means [2], and AHFCM [19]. All experiments were implemented in MATLAB 2018 on a Windows 7 computer with an Intel Pentium CPU at 2.10 GHz and 4.0 GB of RAM.

1) Davies-Bouldin (DB) Index:
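DB index is defined as the ratio of the sum of within-cluster scatter to the between-cluster separation for C clusters, as shown in Eq. (7):
$$DB = \frac{1}{C}\sum_{j=1}^{C}\max_{k \neq j}\frac{S_j + S_k}{\lVert v_j - v_k \rVert}, \qquad S_j = \frac{1}{N_j}\sum_{x \in \text{cluster } j}\lVert x - v_j \rVert \qquad (7)$$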

2) Partition (SC) Index:
SC is defined as the sum of the ratios of cluster compactness to cluster separation. In other words, it is the sum of the individual cluster validities, normalized by the fuzzy cardinality of each cluster:
$$SC = \sum_{j=1}^{C}\frac{\sum_{i=1}^{N}u_{ij}^{m}\lVert x_i - v_j\rVert^{2}}{n_j\sum_{k=1}^{C}\lVert v_k - v_j\rVert^{2}}$$

3) Xie-Beni (XB) Index:
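XB is defined as the ratio of the total variation to the minimum distance between clusters:
$$XB = \frac{\sum_{j=1}^{C}\sum_{i=1}^{N}u_{ij}^{m}\lVert x_i - v_j\rVert^{2}}{N\,\min_{j \neq k}\lVert v_j - v_k\rVert^{2}}$$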

4) Partition Coefficient (PC) Index:
PC measures the amount of overlap between clusters and is defined as:
$$PC = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C}u_{ij}^{2}$$

5) Partition Entropy (PE) Index:
PE measures the fuzziness of the clusters:
$$PE = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C}u_{ij}\log u_{ij}$$
where $C$ is the number of clusters, $N$ is the number of data points or pixels, $u_{ij}$ is the membership degree of the $i$th pixel in the $j$th cluster, $x$ is the data point or image pixel value, $v$ is the cluster center, $m$ is the fuzzy exponent, $N_j$ and $N_k$ are the numbers of pixels in the $j$th and $k$th clusters, respectively, and $n_j$ is the fuzzy cardinality of the $j$th cluster, equal to $n_j = \sum_{i=1}^{N}u_{ij}$. Lower values of DB, SC, and XB indicate better clustering. Likewise, a PE value approaching zero and a PC value approaching one indicate better partitioning. In all experiments, the exponent m is set to 2, the maximum number of iterations to 100, and the termination threshold to $10^{-4}$.
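The five indices can be computed with a short NumPy sketch that follows the standard definitions of PC, PE, XB, SC, and DB for scalar (gray-level) data. The function name `validity_indices` is hypothetical, and evaluating DB on hard labels obtained from the largest membership is an assumption of this sketch, since DB is defined for crisp clusters of sizes $N_j$. At least two distinct cluster centers are required.

```python
import numpy as np


def validity_indices(x, v, u, m=2.0):
    """PC, PE, XB, SC and DB for scalar data (textbook definitions, sketch).

    x: N pixel values, v: C centers, u: N x C membership matrix.
    ASSUMPTION: DB is evaluated on hard labels (argmax of u).
    """
    x = np.asarray(x, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    u = np.asarray(u, dtype=float)
    n, c = u.shape
    d2 = (x[:, None] - v[None, :]) ** 2           # ||x_i - v_j||^2, N x C
    um = u ** m
    center_d2 = (v[:, None] - v[None, :]) ** 2    # ||v_j - v_k||^2, C x C
    center_d = np.sqrt(center_d2)
    np.fill_diagonal(center_d, np.inf)            # exclude k == j from min/max

    pc = np.sum(u ** 2) / n
    # Clipping avoids log(0); entries with u_ij = 0 still contribute 0.
    pe = -np.sum(u * np.log(np.clip(u, 1e-12, None))) / n
    xb = np.sum(um * d2) / (n * np.min(center_d) ** 2)

    n_fuzzy = u.sum(axis=0)                        # fuzzy cardinality n_j
    sc = np.sum(np.sum(um * d2, axis=0) / (n_fuzzy * center_d2.sum(axis=0)))

    labels = u.argmax(axis=1)
    scatter = np.array([
        np.mean(np.abs(x[labels == j] - v[j])) if np.any(labels == j) else 0.0
        for j in range(c)
    ])                                             # S_j: mean distance to center
    db = np.mean(np.max((scatter[:, None] + scatter[None, :]) / center_d, axis=1))
    return {"PC": pc, "PE": pe, "XB": xb, "SC": sc, "DB": db}
```

For a perfectly crisp, compact partition the sketch returns PC = 1 and PE, XB, SC, DB = 0, which matches the "better partitioning" limits stated above.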

Results and Discussion
The proposed clustering algorithm has been evaluated on the NWPU-RESISC45 dataset [23], a benchmark for Remote Sensing Image Scene Classification provided by Northwestern Polytechnical University (NWPU). The dataset contains 31,500 images covering 45 scene classes, with 700 images per class. Each image is 256 × 256 pixels in the Red-Green-Blue (RGB) color space, with spatial resolution varying from about 30 m to 0.2 m per pixel; the images were obtained from Google Earth.
A large number of experiments have been performed to evaluate the proposed clustering algorithm. The test images that yielded notable outcomes are shown in Fig. 3: the first row (Fig. 3(a), Fig. 3(b), Fig. 3(c), and Fig. 3(d)) shows the input images, the second row (Fig. 3(e), Fig. 3(f), Fig. 3(g), and Fig. 3(h)) shows the segmentation results of FCM, the third row (Fig. 3(i), Fig. 3(j), Fig. 3(k), and Fig. 3(l)) shows the results of K-Means, the fourth row shows the results of AHFCM, and the fifth row shows the results of the proposed method. The performance of the proposed algorithm is examined for different numbers of clusters on images from diverse domains. As shown in Fig. 3, the first column compares the clustering outcomes on a cloudy image clustered into six classes, and the second column evaluates the methods on a dense residential image clustered into four classes. The third and fourth columns evaluate them on a lake image clustered into five classes and on a mountain image clustered into seven classes, respectively. The results reveal that the proposed method yields more compact and homogeneous clusters than its counterparts.
Tab. 1 lists the cluster validity indices SC, XB, DB, PC, and PE defined above, computed for standard FCM, K-Means, AHFCM, and the proposed clustering algorithm on images #1 to #4. As seen from Tab. 1, for each image, all indices show that the proposed clustering algorithm outperforms the randomly initialized conventional FCM, K-Means, and the automatically initialized AHFCM algorithm. Taking the DB index as an example, the proposed algorithm achieves average improvements of 12 %, 11 %, and 28 % over the conventional FCM, K-Means, and AHFCM algorithms, respectively.

Computational Complexity
Fig. 4 shows the time required by the proposed method and by AHFCM to cluster the test images. The elapsed time of the proposed method is lower than that of AHFCM, which indicates that the initialization of the cluster centers governs the number of iterations and hence the time required for clustering.

Conclusion
In this study, we proposed a histogram-based initialization of the FCM clustering algorithm for remote sensing images. The green band of a given image is enhanced by histogram equalization, and the initial cluster centers are then computed automatically by maximizing the weight parameter. Experiments on the NWPU-RESISC45 dataset [23] demonstrate the effectiveness of the proposed algorithm. Both the qualitative and the quantitative assessments show that the proposed method performs better than the comparable clustering algorithms FCM, K-Means, and AHFCM. The experimental analysis shows that the initial preprocessing and the automatic initialization of FCM clustering play a significant role in a successful clustering process.
In the future, further research will be conducted on automatic determination of the number of clusters.
Her teaching and research interests include image processing, video signal processing, and wireless networks. She is a member of the Institute of Electrical and Electronics Engineers (IEEE). She has published more than 100 papers in various prestigious journals and conferences.