Fuzzy Clustering Methods in Multispectral Satellite Image Segmentation

A segmentation method for thematic processing of multispectral satellite images based on fuzzy clustering and preliminary nonlinear filtering is presented. Three fuzzy clustering algorithms, namely Fuzzy C-means, Gustafson-Kessel, and Gath-Geva, have been used. Experimental results obtained with these algorithms, with and without preliminary nonlinear filtering, on multispectral Landsat images confirm that segmentation based on fuzzy clustering provides good visual discrimination of different land cover types. Implementations of the Fuzzy C-means, Gustafson-Kessel, and Gath-Geva algorithms have linear computational complexity in the initial number of clusters and the image size for a single iteration step. They lend themselves to parallel implementation. Preliminary processing of the source channels with a nonlinear filter provides clearer cluster discrimination and, as a consequence, clearer segment outlines.


INTRODUCTION
It is known that forests and wetlands are the main factors preventing the decline of biodiversity on Earth under the aggressive conditions of human activity. The main problems are agricultural expansion and deforestation. Deforestation has two main causes: agricultural expansion and accidental events. Yet forestry and agriculture are inseparable and condemned to work hand in hand. A significant part of the forest is damaged by fire, pests, and irrational agricultural policy that changes the groundwater level and, as a result, leads to disease and die-off. It is now possible to detect forest areas at an early stage of damage using high-spatial-resolution multispectral images received from satellites. This technology, introduced about 40 years ago to monitor the Earth's surface, is now an effective instrument for ecological and agricultural monitoring of regions such as forests and wetlands and for preventing accidents. Multispectral satellite images provide information in both visible and invisible spectral bands about vegetation, water temperature, and land cover.
Multidimensional cluster analysis and segmentation are basic procedures in the thematic processing of multispectral images received from remote sensing satellites. There are many clustering and segmentation methods, each with its own benefits and shortcomings. Fuzzy clustering methods form a special class among them.

METHOD DESCRIPTION
Three clustering algorithms based on fuzzy methods were developed and used as part of the segmentation software: fuzzy c-means (FCM) [1] and its variants, the Gustafson-Kessel clustering algorithm [2,3] and the Gath-Geva algorithm [4].
FCM is a data clustering method that allows one data object to be a member of two or more clusters. The method, developed in [5] and improved in [6], is based on minimization of the following objective function:

$$J_m = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \, \| x_j - s_i \|^2 \tag{1}$$

where $m$ is a real number greater than 1, $\mu_{ij}$ is the degree of membership of $x_j$ in cluster $i$, $x_j$ is the $j$-th multidimensional data object, $s_i$ is the multidimensional center of cluster $i$, and $\|\cdot\|$ is any norm expressing the similarity between a measured data object and a cluster center. Fuzzy clustering is carried out through iterative optimization of $J_m$, updating the membership matrix $\mu_{ij}$ and the cluster centers $s_i$ with the following algorithm:
1. The data set $X = (x_j)$, $x_j = (x_{j1}, x_{j2}, \ldots, x_{jp})^T$, $j = 1, 2, \ldots, n$, is given, where $n$ is the number of data objects and $p$ is the dimension of the data set. Choose the number of clusters $c$, $1 < c < n$.
2. Initialize the partition matrix $\mu_{ij}$ using a random number generator in the range [0, 1], normalized so that

$$\sum_{i=1}^{c} \mu_{ij} = 1, \quad j = 1, 2, \ldots, n. \tag{2}$$

3. Calculate the cluster centers $s_i = (s_{i1}, s_{i2}, \ldots, s_{ip})^T$ ($i = 1, 2, \ldots, c$):

$$s_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} x_j}{\sum_{j=1}^{n} \mu_{ij}^{m}} \tag{3}$$

4. Calculate the error. If the error is small enough, stop iterating; otherwise go to step 5.
5. Calculate the new partition matrix $\mu_{ij}$, where $d_{ij}^2$ is the similarity measure:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij}^2 / d_{kj}^2 \right)^{1/(m-1)}} \tag{5}$$

and then go to step 3.
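The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the segmentation software itself: the function name `fcm`, the parameters `eps`, `max_iter`, and `seed`, the small lower bound on the squared distance, and in particular the stopping criterion (largest change of any membership between iterations, checked after the update) are assumptions, since the source does not preserve its error formula.

```python
import random


def fcm(data, c, m=2.0, eps=1e-6, max_iter=100, seed=0):
    """Fuzzy c-means on a list of p-dimensional points.

    Returns (centers, mu), where mu[i][j] is the membership of point j
    in cluster i. The squared Euclidean distance is used.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])

    # Step 2: random partition matrix, each column normalized to sum to 1.
    mu = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(mu[i][j] for i in range(c))
        for i in range(c):
            mu[i][j] /= total

    centers = []
    for _ in range(max_iter):
        # Step 3: cluster centers as membership-weighted means.
        centers = []
        for i in range(c):
            w = [mu[i][j] ** m for j in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[j] * data[j][k] for j in range(n)) / wsum for k in range(p)]
            )

        # Step 5: new partition matrix from the squared distances.
        new_mu = [[0.0] * n for _ in range(c)]
        for j in range(n):
            d2 = [
                max(sum((data[j][k] - centers[i][k]) ** 2 for k in range(p)), 1e-12)
                for i in range(c)
            ]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            for i in range(c):
                new_mu[i][j] = inv[i] / total

        # Step 4 (assumed form): largest change of any membership value.
        error = max(
            abs(new_mu[i][j] - mu[i][j]) for i in range(c) for j in range(n)
        )
        mu = new_mu
        if error < eps:
            break
    return centers, mu


# Two well-separated groups of 2-D points.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, mu = fcm(data, c=2)
```

For a multispectral image, each element of `data` would be the vector of channel values of one pixel, so one iteration costs on the order of `c * n * p` operations.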
The fuzzy c-means, Gustafson-Kessel, and Gath-Geva clustering algorithms differ in the definition of the distance function between the objects to be classified. Fuzzy c-means uses the simple Euclidean distance

$$d_{ij}^2 = (x_j - s_i)^T (x_j - s_i), \tag{6}$$

which produces hyperspherical clusters. The Gustafson-Kessel (relation (7)) and Gath-Geva (relation (8)) distances produce ellipsoidal clusters and a more comprehensive partitioning of multidimensional data. In those relations $F_i$ is the covariance matrix of cluster $i$ and $\alpha_i$ is calculated according to relation (9). The covariance matrix is used to form non-spherical clusters, which are more suitable for partitioning multidimensional data. This also leads to a visible difference in the convergence, efficiency, and performance of the algorithms when processing multispectral images.
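For reference, the forms of the Gustafson-Kessel and Gath-Geva distances most commonly given in the fuzzy clustering literature are sketched below. They are an assumption about what relations (7)-(9) contained, not a transcription of them. In particular, the cluster volume constant $\rho_i$ (usually set to 1), the exponent used in the covariance estimate, and any constant factor in the Gath-Geva distance vary between presentations.

```latex
% Fuzzy covariance matrix of cluster i (assumed standard estimate)
F_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} (x_j - s_i)(x_j - s_i)^T}
           {\sum_{j=1}^{n} \mu_{ij}^{m}}

% Gustafson-Kessel distance, cf. relation (7)
d_{ij}^2 = (x_j - s_i)^T \left[ (\rho_i \det F_i)^{1/p} F_i^{-1} \right] (x_j - s_i)

% Gath-Geva distance, cf. relation (8)
d_{ij}^2 = \frac{\sqrt{\det F_i}}{\alpha_i}
           \exp\!\left( \tfrac{1}{2} (x_j - s_i)^T F_i^{-1} (x_j - s_i) \right)

% Per-cluster weight, cf. relation (9)
\alpha_i = \frac{1}{n} \sum_{j=1}^{n} \mu_{ij}
```

Both distances require $F_i^{-1}$ and $\det F_i$, which is why the conditioning of the covariance matrix matters, and the exponential in the Gath-Geva form is the natural source of overflow.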
The distance functions of the Gustafson-Kessel and Gath-Geva algorithms use the covariance matrix to take into account the different metrics (scales) of different dimensions. For large images the covariance matrix may be ill-conditioned, which causes a heavy loss of discriminating power for Gustafson-Kessel and overflows for Gath-Geva when fixed-precision arithmetic based on built-in numerical types is used. Standard normalization methods and arbitrary-precision (multi-precision) arithmetic are sufficient to avoid these problems. An overflow may also occur when $d_{ij}^2$ becomes small. To prevent this, the distance function is bounded from below by some smallest value $d_{\min}$.
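The lower bound on the distance can be illustrated with the membership update for a single pixel. This is a minimal sketch: the function name `memberships` and the value `1e-12` for `d_min` are assumptions, as the source does not state the bound it used.

```python
def memberships(d2, m=2.0, d_min=1e-12):
    """Memberships of one pixel in all clusters from squared distances d2."""
    # Bound every squared distance from below so that a pixel lying exactly
    # on a cluster center does not cause a division by zero or an overflow.
    clamped = [max(d, d_min) for d in d2]
    inv = [d ** (-1.0 / (m - 1.0)) for d in clamped]
    total = sum(inv)
    return [w / total for w in inv]


# The first distance is exactly zero: the pixel coincides with center 0.
mu = memberships([0.0, 1.0, 4.0])
```

Without the clamp, the zero distance would raise `ZeroDivisionError` in Python; with it, the pixel simply receives a membership very close to 1 in the nearest cluster.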
Because of noise in the source signal, some spatial granularity of the segments occurs. To reduce it, weak nonlinear filtering with the algorithm of [7] was applied to the source channels. That algorithm can be used both for edge extraction and for nonlinear filtering. It does not reduce the sharpness of brightness transitions in a channel, so the boundaries of land cover types remain clear. As a result, the boundaries of spatial segments after clustering also remain distinct. The algorithm works with brightness values and pixel coordinates simultaneously. To form homogeneous areas or to find edges in a gray-scale image, a circular mask is used. Usually the mask radius is 3.4 pixels, which gives a mask of 37 pixels. The mask is placed at each point of the image, and the brightness of each pixel of the mask is compared with the brightness of the central pixel of the mask (relation (10)), where $I(r_0)$ is the brightness of the mask center, $I(r)$ is the brightness of a mask pixel, and $t$ is the brightness threshold. The results of the comparisons are summed according to relation (11), where $n$ is the number of pixels in the USAN (Univalue Segment Assimilating Nucleus). That sum is then to be minimized, hence the name SUSAN (Smallest USAN).
The parameter $t$ is the maximum noise level that is ignored. Then $n$ is compared with its threshold value $g = 3n_{\max}/4$, where $n_{\max}$ is the maximum value that $n$ can take.
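The mask construction and the USAN count can be sketched as follows. This is a minimal illustration under stated assumptions: the comparison is taken to be the hard threshold $|I(r) - I(r_0)| \le t$, the central pixel is counted as part of the mask, and the edge response $g - n$ for $n < g$ follows the usual description of the SUSAN edge detector. The function names are hypothetical, and the smoothing variant of the filter actually applied to the channels is not shown.

```python
def circular_mask_offsets(radius=3.4):
    """Return (dy, dx) offsets of all pixels within `radius` of the center."""
    reach = int(radius)
    return [
        (dy, dx)
        for dy in range(-reach, reach + 1)
        for dx in range(-reach, reach + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def usan_area(image, y, x, t, offsets):
    """Count mask pixels whose brightness is within t of the center pixel."""
    height, width = len(image), len(image[0])
    center = image[y][x]
    n = 0
    for dy, dx in offsets:
        yy, xx = y + dy, x + dx
        # Mask pixels falling outside the image are simply skipped.
        if 0 <= yy < height and 0 <= xx < width:
            if abs(image[yy][xx] - center) <= t:
                n += 1
    return n


def edge_response(n, n_max):
    """Assumed SUSAN edge response: g - n below the threshold g, else 0."""
    g = 3 * n_max / 4
    return g - n if n < g else 0.0


offsets = circular_mask_offsets()                 # radius 3.4 -> 37 pixels
flat = [[100] * 9 for _ in range(9)]              # uniform area
step = [[0] * 4 + [200] * 5 for _ in range(9)]    # vertical step edge

n_flat = usan_area(flat, 4, 4, t=15, offsets=offsets)
n_step = usan_area(step, 4, 4, t=15, offsets=offsets)
```

In the uniform area the USAN fills the whole mask, so the response is zero. On the step edge only the mask pixels on the same side as the center are counted, the USAN shrinks below $g$, and the response becomes positive.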

EXPERIMENTAL RESULTS
The fuzzy clustering algorithms were tested on multispectral Landsat images. Fig. 1 shows channels 2-5, which were chosen for processing. The Landsat channels shown in Fig. 1 have different dispersion and were equalized with the netpbm utilities to make them more clearly visible. The channel data fed into the clustering software remain untouched, because the clustering algorithm itself takes into account the real distribution of the channel signal level. The raw output of the developed segmentation software is a 2-D vector field over $R^N$, where $N$ is the number of clusters. Each component of that field is a strictly increasing function of the probability that the pixel is a member of one of the $N$ clusters.
In the presented images one can see several well-discriminated land cover types: open water, forests, wetlands, bushes, and agricultural areas. The results of segmentation into 15 clusters using fuzzy c-means without any preliminary filtering show that the segment boundaries on the cluster map are fuzzy and in some places break up into a sand-like grain. To reduce this effect one can use a filter that smooths slowly varying areas without blurring clearly discriminated edges between land cover types. The most appropriate filter with such behaviour is the nonlinear one described in [7]. Fig. 2 demonstrates the smoothing properties of that filter for brightness thresholds of 7, 15, and 20. Besides the brightness threshold, the filter has a spatial parameter, the distance threshold, which was left at its default value, corresponding to a radius of about 3.4, and was not changed across experiments. As one can see, the smoothing strength of the filter grows with the brightness threshold. At the same time some of the edges remain sharp. The number of sharp edges, and consequently the cluster granularity, depends on the brightness threshold. Thus we obtain an additional degree of freedom in controlling the segmentation process, which can dramatically improve the segmentation results.
Nonlinear filtering lowers the intra-class covariance but practically does not affect the inter-class covariance, so cluster discrimination does not suffer. Smoothing the channels before segmentation eliminates the fine granularity of the output segment map and merges small clusters with large ones. In practice this means that some real small objects disappear, for example bushes in wetland areas. Nevertheless, a generalized segment map is sometimes necessary. The SUSAN filter allows the threshold to be varied over a wide range, producing smoothing from negligibly small to very strong. Figs. 3-6 show examples of segmentation with the Gustafson-Kessel and Gath-Geva clustering algorithms using nonlinear filtering with different brightness thresholds. The number of clusters was chosen taking into account the real diversity of land cover in the territory under investigation. In fact, many experiments with different numbers of clusters were carried out, and the number of clusters was fixed as soon as the segmentation began to look stable.

Fig. 3 - Segmentation results using the Gustafson-Kessel (a) and Gath-Geva (b) algorithms with 20 clusters and no filtering.
To allow visual evaluation, the results were transformed into a scalar field using a simple maximal probability solver and presented as a colour map of spatial partitions. As a result every pixel became a member of exactly one cluster, and all pixels of the same colour are members of the same cluster. No intelligent algorithm was used to colour the segment map, so the same areas have different colours in different pictures. As one can see, increasing the threshold increases the segment area when the target number of clusters is fixed. However, it should be pointed out that the general structure of the segment arrangement and its relation to land cover types do not change significantly.
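The maximal probability solver amounts to taking, for each pixel, the index of the largest component of the output vector. A minimal sketch, assuming the field is stored as nested lists and that every component is the same increasing function of its probability, so that the largest component corresponds to the most probable cluster (the function name `hard_labels` is hypothetical):

```python
def hard_labels(membership_field):
    """Turn a field of per-cluster scores into a map of cluster indices.

    membership_field[y][x] is a list of N scores, one per cluster.
    Ties are resolved in favour of the lowest cluster index.
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in membership_field
    ]


field = [
    [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]],
    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]],
]
label_map = hard_labels(field)
```

Each label can then be mapped to an arbitrary colour, which is why the same area may be coloured differently from one picture to another.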

ACCELERATION OF CALCULATIONS
To increase usability and computational efficiency, the software can be run on massively parallel systems and linked with GIS. MPI (the Message Passing Interface) was chosen as the basis for parallel processing. The image processing system was tested on the SKIF K-1000 supercomputer. The experiments involved from 1 to 64 computing units and colour images of 2000x2000 and 1000x1000 pixels. The dependence of the system running time on the number of computing units involved is shown in the plots in Fig. 7.
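One reason a single iteration parallelizes well is that the sums in the cluster-center update split over pixels: each computing unit can accumulate partial sums for its own part of the image, and only those small sums need to be exchanged. The sketch below illustrates the idea in plain Python without MPI. The helper names `partial_sums` and `combine` are hypothetical and say nothing about how the actual software is organized; with MPI, the combining step would correspond to a reduction across processes.

```python
def partial_sums(pixels, memberships, m=2.0):
    """Numerators and denominators of the center update for one chunk.

    pixels: list of p-dimensional feature vectors held by this worker.
    memberships: memberships[i][j] of local pixel j in cluster i.
    """
    c, p = len(memberships), len(pixels[0])
    num = [[0.0] * p for _ in range(c)]
    den = [0.0] * c
    for i in range(c):
        for j, x in enumerate(pixels):
            w = memberships[i][j] ** m
            den[i] += w
            for k in range(p):
                num[i][k] += w * x[k]
    return num, den


def combine(parts):
    """Add up the partial sums of all chunks and divide to get the centers."""
    c, p = len(parts[0][1]), len(parts[0][0][0])
    num = [[0.0] * p for _ in range(c)]
    den = [0.0] * c
    for part_num, part_den in parts:
        for i in range(c):
            den[i] += part_den[i]
            for k in range(p):
                num[i][k] += part_num[i][k]
    return [[num[i][k] / den[i] for k in range(p)] for i in range(c)]


pixels = [[0.0, 1.0], [1.0, 0.0], [4.0, 5.0], [5.0, 4.0]]
memberships = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]

# Centers computed from the whole data set at once.
whole = combine([partial_sums(pixels, memberships)])

# Two "workers", each holding half of the pixels and the matching columns.
halves = combine([
    partial_sums(pixels[:2], [row[:2] for row in memberships]),
    partial_sums(pixels[2:], [row[2:] for row in memberships]),
])
```

The two results agree up to floating-point rounding, and the amount of data exchanged per iteration depends only on the number of clusters and channels, not on the image size.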

CONCLUSION
Segmentation results for multispectral Landsat images obtained with the fuzzy clustering methods Fuzzy C-means, Gustafson-Kessel, and Gath-Geva, with and without preliminary nonlinear filtering, show that segmentation based on fuzzy clustering provides good visual discrimination of the land cover types that occur in such complex cases as wetland, water-meadow, and bush areas. The discrimination quality of the segmented images was tested and approved by land-improvement specialists using data from a ground-based expedition. Nonlinear filtering with guided smoothing is a convenient preprocessing instrument for semi-automatic segmentation of complex land covers.
Implementations of the Fuzzy C-means and Gustafson-Kessel algorithms have linear computational complexity in the initial number of clusters and the image size for a single iteration step. All the algorithms lend themselves to parallel implementation on MPP computers. Preliminary processing of the source channels with a nonlinear filter provides clearer cluster discrimination and, as a consequence, clearer segment outlines, and gives controlled generalization of the output segment maps.
In fact, the segmentation software returns its results as a 2-D vector field $v_i$, which is a strictly increasing function of the probability $p_i$ that a pixel is a member of the $i$-th cluster. With a more complex tool than the simple maximal probability solver, for example a maximum likelihood one, it is possible to improve the segmentation results significantly and to avoid a large part of the manual processing of multispectral data.