Superpixel segmentation and machine learning classification algorithm for cloud detection in remote‐sensing images

Cloud detection is a fundamental yet challenging topic in remote-sensing image processing. The authors propose a method that combines multi-dimensional feature extraction with superpixel segmentation and uses a voting-based clustering ensemble to capture the whole target shape. To further distinguish clouds, snow-covered lands, and bright buildings in remote-sensing images, they first apply an Otsu threshold to select high grey-level sub-regions, then extract the descriptors of these sub-regions and feed them into a softmax regression classifier. The authors evaluate the method on GF-1 remote-sensing images. The results demonstrate the effectiveness of the proposed method.


Introduction
Cloud detection is a fundamental yet challenging topic in remote-sensing image processing. With the rapid development of remote-sensing technology, more and more high-spatial-resolution remote-sensing images are becoming available [1]. They contain rich visual information that can describe surface appearance in great detail. As such, especially for optical remote-sensing images, how to detect targets in complex scenes accurately, efficiently, and robustly is a pressing question in remote-sensing image analysis.
Two main approaches to cloud detection have been discussed in the literature. The first treats cloud detection as a segmentation problem and separates cloud regions from the rest of the image. De Wildt (2007) used multiple thresholds to separate clouds and other high-brightness objects from the image background [1]. Zhang and Xiao (2014) put forward a multi-step scheme that refines clouds based on a partition of RGB colour aerial photographs, but it is not robust to the parameter settings of the segmentation process [2]. Liu S et al. (2014) applied a superpixel segmentation method to detect clouds in all-sky images, but the detection relies on a calculated threshold, and the problem of distinguishing clouds from similar targets (e.g. snow-covered lands) remains unsolved [3]. The second approach applies machine learning, using convolutional neural networks and other classifiers to label every pixel in the image as cloudy or non-cloudy. Rossi et al. (2011) used SVD to extract features from blurred images and applied an SVM for cloud detection, but the scheme requires the QuickBird and Landsat 7 satellite images to be co-registered [4]. Vivone et al. (2014) introduced a new penalty term into the classic maximum a posteriori probability-MRF (MAP-MRF) to improve classification accuracy [5]. Kun Yuan et al. (2017) proposed an edge-aware network and an easy-to-hard training strategy [6], which achieved good accuracy and better convergence on 300 × 300 test images, although tests on larger images are still needed. However, the aforementioned methods face three main problems: (1) the complex structures of neural networks consume considerable computing time; (2) the difficulty of separating cloudy regions from bright non-cloudy regions (e.g. snow-covered lands) has not been specifically addressed; and (3) segmentation and classification are implemented separately.
In this paper, we propose a method that segments remote-sensing images, extracts features of high grey-level superpixel sub-regions, and classifies and detects cloudy areas using a machine learning algorithm. Specifically, a six-dimensional feature vector that includes texture is proposed and applied to SLIC, which improves segmentation quality. The fusion of superpixel sub-regions is aided by a voting-based clustering ensemble method. The Otsu threshold is used to detect clouds, snow, and other bright objects, which are the key parts to be classified; the descriptors of the selected superpixels are then extracted and fed into a softmax regression classifier.

Local binary pattern histogram
Local binary patterns (LBP) is a widely used texture operator first proposed by Ojala et al. [7]. Due to the use of multi-scale filters, LBP extraction is efficient and can achieve invariance to scaling and rotation. The idea of the LBP operator is to assign each pixel a code that depends on the grey levels of its neighbourhood. By comparing the grey level of the centre pixel $I_c$ at coordinates $(x_c, y_c)$ with those of its neighbours $I_n$, the texture feature can be described as

$$\mathrm{LBP}(x_c, y_c) = \sum_{n=0}^{p-1} s(I_n - I_c)\, 2^n, \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}$$

where $p$ is the number of neighbouring pixels. In general, we consider a 3 × 3 box, in which the centre has eight neighbours ($p = 8$). The LBP values lie in the range 0 to 255 for each pixel, and a histogram of these values forms the LBP descriptor. To reduce the length of the feature vector, we use a simple rotation-invariant descriptor, i.e. a uniform LBP, which keeps only the patterns whose circular bit string contains at most two 0-1 or 1-0 transitions.
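As a minimal sketch of the LBP coding described above, the following pure-Python functions compute the 8-neighbour code of a pixel and a uniform-pattern histogram for a small greyscale patch. The function names, the clockwise neighbour order, and the single shared bin for non-uniform codes are illustrative assumptions, not details taken from the paper, and the sketch does not add rotation invariance.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of the pixel at (r, c); img is a list of rows."""
    centre = img[r][c]
    # Neighbours in a fixed clockwise order starting at the top-left corner
    # (the ordering is an assumption; any fixed order gives a valid code).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for n, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:   # s(I_n - I_c) = 1
            code |= 1 << n                  # weight 2^n
    return code


def transitions(code, p=8):
    """Number of 0-1 / 1-0 changes in the circular p-bit pattern."""
    bits = [(code >> n) & 1 for n in range(p)]
    return sum(bits[n] != bits[(n + 1) % p] for n in range(p))


def uniform_lbp_histogram(img):
    """Normalised histogram: one bin per uniform code plus one shared bin."""
    uniform = [c for c in range(256) if transitions(c) <= 2]
    bin_of = {c: k for k, c in enumerate(uniform)}
    hist = [0] * (len(uniform) + 1)
    # Border pixels are skipped because they lack a full 3 x 3 neighbourhood.
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = lbp_code(img, r, c)
            hist[bin_of.get(code, len(uniform))] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

For p = 8 there are 58 uniform codes, so the descriptor has 59 entries instead of 256. In a superpixel setting the histogram would be accumulated over the pixels of one superpixel rather than over a rectangular patch.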

Superpixel sub-regions segmentation
Superpixels provide a convenient primitive from which local image features can be computed. They capture redundancy in the image and greatly reduce the complexity of subsequent image processing tasks. Superpixel segmentation can take full advantage of the multi-dimensional features of the image to improve the accuracy, adaptability, and robustness of remote-sensing image segmentation. We propose an improved Simple Linear Iterative Clustering (SLIC) method, which clusters pixels in a combined six-dimensional space of colour, texture, and spatial distance to efficiently generate compact and nearly uniform superpixels. The idea is to convert the colour image into the CIELAB colour space and extract a six-dimensional feature vector for each pixel, combining the CIELAB colour components, the pixel coordinates, and a texture component. Suppose the original image has $N$ pixels and the number of expected superpixels is $K$; then each superpixel contains approximately $N/K$ pixels, and the distance between neighbouring cluster centres is approximately $S = \sqrt{N/K}$. If a cluster centre falls on an edge in the image, we move it to the position with the smallest gradient value in a 3 × 3 window. Each cluster centre is labelled $i$.
We then estimate the similarity between each pixel and the cluster centres near it. After this step, we assign the label of the most similar cluster centre to the pixel, and let the clusters converge over multiple iterations. The expressions for the six-dimensional clustering distance and similarity use the following quantities: $d_{lab}$ is the difference in CIELAB colour space between two pixels; $d_{xy}$ is their spatial distance; $d_{lap}$ is the texture unit histogram distance; $D_s$ is the weighted similarity of the six-dimensional features; $i$ denotes the cluster centre; $j$ denotes a pixel in the $2S \times 2S$ neighbourhood of the cluster centre $i$; and $m$ is the balance parameter of the colour similarity component, which can be taken in the range [1, 20]. A larger $m$ places more emphasis on spatial proximity and yields more compact clusters. $t$ is the balance parameter of the texture similarity component and is used to weight the three features (i.e. colour, space, and texture) in the similarity calculation (see Fig. 1).
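The seeding step described above can be sketched as follows. This is a minimal illustration that assumes a precomputed gradient-magnitude image `grad` (a list of rows) and uses the standard SLIC grid spacing of the square root of N/K; the function name and the half-step grid offset are assumptions made for the example.

```python
import math


def init_cluster_centres(grad, K):
    """Place about K seeds on a regular grid with spacing S = sqrt(N / K),
    then move each seed to the lowest-gradient pixel in its 3 x 3 window.
    grad is a list of rows of gradient magnitudes."""
    h, w = len(grad), len(grad[0])
    S = max(1, int(round(math.sqrt(h * w / K))))
    centres = []
    for r in range(S // 2, h, S):
        for c in range(S // 2, w, S):
            best = (r, c)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < h and 0 <= cc < w
                    if inside and grad[rr][cc] < grad[best[0]][best[1]]:
                        best = (rr, cc)   # strictly lower gradient wins
            centres.append(best)
    return S, centres
```

For the 909 × 1605 image and K = 200 used in the experiments, this spacing works out to roughly 85 pixels.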
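To make the roles of m and t concrete, the sketch below assigns a pixel to the most similar cluster centre inside that centre's 2S × 2S window. The additive form `d_lab + (m / S) * d_xy + t * d_tex` is an assumption modelled on standard SLIC with an extra texture term, and the Euclidean texture distance is likewise assumed; neither is confirmed as the paper's exact expression. The dictionary layout and function names are hypothetical.

```python
import math


def six_dim_distance(centre, pixel, S, m=10.0, t=1.0):
    """Weighted distance between a cluster centre i and a pixel j.

    Each argument is a dict with keys 'lab' (l, a, b), 'xy' (x, y) and
    'tex' (a texture histogram). The additive combination below is an
    ASSUMPTION modelled on standard SLIC, not the paper's exact formula.
    """
    d_lab = math.dist(centre["lab"], pixel["lab"])   # colour difference
    d_xy = math.dist(centre["xy"], pixel["xy"])      # spatial distance
    d_tex = math.dist(centre["tex"], pixel["tex"])   # assumed Euclidean
    return d_lab + (m / S) * d_xy + t * d_tex


def assign_label(pixel, centres, S, m=10.0, t=1.0):
    """Label of the most similar centre whose 2S x 2S window holds the pixel."""
    best_label, best_d = None, math.inf
    for label, centre in enumerate(centres):
        dx = abs(centre["xy"][0] - pixel["xy"][0])
        dy = abs(centre["xy"][1] - pixel["xy"][1])
        if dx > S or dy > S:        # outside this centre's search window
            continue
        d = six_dim_distance(centre, pixel, S, m, t)
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```

With t = 0 the assignment reduces to colour and spatial proximity only; increasing t lets a matching texture histogram outweigh a shorter spatial distance.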

Clustering ensemble
The segmentation based on the improved six-dimensional SLIC produces uniform small-sized blocks, which may lose the complete shape information of targets in real remote-sensing images. In our work, to recover the whole shape of the target, connectivity is considered both in the last step of the improved six-dimensional SLIC and by repeating the clustering with different SLIC superpixel sizes and applying a voting strategy. The voting strategy improves the accuracy of the similarity estimated between labels, while the use of distance clustering preserves computational speed. The voting-based clustering ensemble therefore yields a more general algorithm. A similarity matrix is calculated as

$$A_i = \frac{\text{number of times being a cluster boundary}}{\text{number of clustering runs } H}$$

where $H$ is the total number of clustering runs. $A_i$ increases by $1/H$ each time the cluster boundary is kept in the same position. If $A_i$ is larger than 0.5, we ensemble the two clusters.
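The vote accumulation can be sketched as below. Here `label_maps` is a hypothetical list of H label images obtained from clustering runs with different superpixel sizes, and a pixel is counted as lying on a boundary when its right or lower neighbour carries a different label; that boundary test is an assumption made for the example.

```python
def boundary_votes(label_maps):
    """Fraction of the H segmentations in which each pixel lies on a cluster
    boundary (its right or lower neighbour carries a different label)."""
    H = len(label_maps)
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    A = [[0.0] * cols for _ in range(rows)]
    for labels in label_maps:
        for r in range(rows):
            for c in range(cols):
                right = c + 1 < cols and labels[r][c] != labels[r][c + 1]
                below = r + 1 < rows and labels[r][c] != labels[r + 1][c]
                if right or below:
                    A[r][c] += 1.0 / H     # each run contributes 1/H
    return A
```

Comparing each entry of the returned map with 0.5 then gives the majority decision used by the ensemble.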

Otsu thresholding
The grey levels of clouds, lands, and oceans differ from each other in remote-sensing images. Based on this, we can set a threshold to identify superpixels with high grey values, which are similar to each other and more difficult to tell apart; examples include clouds, snow-covered lands, and other bright objects such as buildings. The threshold separates bright targets from the rest of the image and reduces the number of test samples passed to the classifier. It is determined using the Otsu method by maximising the between-class variance of the grey-level histogram.
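In the Otsu method [8], we search for the threshold $t$ that minimises the intra-class variance, which is a weighted sum of the variances of the two classes:

$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t)$$

where $\omega_0$ and $\omega_1$ are the probabilities of class occurrence, $\sigma_0^2$ and $\sigma_1^2$ are the class variances, and $\mu_0$ and $\mu_1$ are the class means.
Otsu shows that minimising the intra-class variance is equivalent to maximising the inter-class variance [8]:

$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = \omega_0(t)\,\omega_1(t)\,\left[\mu_0(t) - \mu_1(t)\right]^2$$

where $\sigma^2$ is the total variance of the grey levels. Then, the optimal threshold $t^*$ can be estimated as

$$t^* = \arg\max_t \, \sigma_b^2(t)$$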
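A minimal sketch of the threshold search, assuming the input is a grey-level histogram given as a list of counts indexed by grey level. The function name and the convention that class 0 holds levels up to and including the threshold are choices made for this example.

```python
def otsu_threshold(hist):
    """Threshold t* maximising the between-class variance of a grey-level
    histogram; class 0 holds levels <= t, class 1 the levels above t."""
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0          # pixel count of class 0
    sum0 = 0        # grey-level sum of class 0
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if var_between > best_var:  # ties keep the lowest threshold
            best_t, best_var = t, var_between
    return best_t
```

Superpixels whose mean grey level exceeds the returned threshold would then be kept as bright candidates for classification.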

Softmax regression
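Softmax regression is applied to classify the test superpixels that remain after the clustering ensemble and the Otsu filter. For each test superpixel, the LBP feature is used as the descriptor. Softmax regression generalises logistic regression to multiclass problems while keeping its high efficiency. In our method, we aim to classify clouds, snow-covered lands, and bright buildings after selecting the target superpixels with the Otsu threshold. The label $y$ takes on three different values. Thus, in our training set $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, we have $y^{(i)} \in \{1, 2, 3\}$. We can estimate the probability of the class label taking on each of the three possible values, and our hypothesis $h_\theta(x)$ is

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 3 \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{3} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ e^{\theta_3^T x^{(i)}} \end{bmatrix}$$

Here $\theta_1, \theta_2, \theta_3 \in \mathbb{R}^{n+1}$ are the parameters of our model. Notice that the term $1 / \sum_{j=1}^{3} e^{\theta_j^T x^{(i)}}$ normalises the distribution, so that it sums to one. The cost function of softmax regression is given by Bishop as below [9]:

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{3} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{3} e^{\theta_l^T x^{(i)}}} \right]$$

Note that the indicator function is used here, which is defined as 1{true statement} = 1 and 1{false statement} = 0.
We resort to an iterative optimisation algorithm (e.g. gradient descent) to solve for the minimum of $J(\theta)$. Taking derivatives, the gradient is (see Fig. 2):

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right]$$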
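As a minimal sketch of this optimisation, the code below trains a softmax classifier with batch gradient descent on the averaged cross-entropy cost. It assumes 0-based class labels (0, 1, 2 rather than 1, 2, 3), no regularisation term, and a fixed learning rate; the function names and default hyperparameters are illustrative only.

```python
import math


def softmax_probs(theta, x):
    """p(y = j | x; theta) for every class j; theta is a list of weight rows."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in theta]
    top = max(scores)                         # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)                             # normalising term
    return [e / z for e in exps]


def train_softmax(X, y, n_classes, lr=0.5, epochs=2000):
    """Batch gradient descent on the softmax cost; labels are 0-based here."""
    X = [[1.0] + list(x) for x in X]          # prepend the intercept term
    m, n = len(X), len(X[0])
    theta = [[0.0] * n for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            p = softmax_probs(theta, xi)
            for j in range(n_classes):
                err = (1.0 if yi == j else 0.0) - p[j]   # indicator minus p
                for k in range(n):
                    grad[j][k] -= xi[k] * err / m
        for j in range(n_classes):
            for k in range(n):
                theta[j][k] -= lr * grad[j][k]
    return theta


def predict(theta, x):
    """Index of the most probable class for a feature vector x."""
    p = softmax_probs(theta, [1.0] + list(x))
    return p.index(max(p))
```

In the pipeline described here, each row of `X` would be the LBP descriptor of one selected superpixel, and the three classes would stand for cloud, snow-covered land, and bright building.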

Our method
Our method combines an efficient multi-dimensional features-based superpixel segmentation method with a robust machine learning classifier to solve target-detection problems in remote-sensing images. A six-dimensional feature vector that includes texture is proposed and applied to SLIC, which improves the segmentation quality. A voting-based clustering ensemble method helps to fuse the superpixel sub-regions. The Otsu threshold is used to detect clouds, snow, and other bright parts, which are the key parts to be classified. Then, the descriptors of the selected superpixels are extracted and fed into a softmax regression classifier (Fig. 3).

Experiments and results
In our study, we used GF-1 satellite remote-sensing images. The GF series is a family of satellites developed and launched by China. The GF-1 satellite has two panchromatic wide coverage cameras with 2 and 8 m resolution and four 16 m resolution multispectral cameras, with swath widths of 60 and 800 km, respectively. GF-1 remote-sensing images are widely used in target detection, resource detection, ecological research, survey mapping, military activities, etc. Fig. 4 shows the result of the six-dimensional features-based SLIC segmentation. The original image is 909 × 1605 pixels, and the number of expected superpixels is 200. The clouds, lands, oceans, and islands are segmented. Fig. 5 compares the original ground remote-sensing images, the SLIC segmentation, and the improved six-dimensional SLIC segmentation, with zoomed-in details of the detected island and cloud. Our proposed method performs better in detecting boundaries and has higher accuracy in thin-cloud detection than the other methods.
The cluster ensemble result is shown in Fig. 6. We changed the size of the SLIC superpixels and repeated the clustering 10 times. Three of the results are shown in the left panels, and the fused cloud image obtained after the voting-based cluster ensemble is shown in the right panels. Cloud areas of various sizes are all successfully detected. Fig. 7 shows the grey histogram, and Fig. 8 shows the segmented cloud test samples collected after applying the Otsu method.
The training dataset of clouds, snow, and bright buildings is drawn from GF-1 and Google Earth images. Accuracy and the Kappa coefficient are calculated to evaluate the softmax classification performance. The Kappa coefficient is above 0.8.
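For reference, overall accuracy and Cohen's Kappa coefficient can be computed from true and predicted labels as sketched below. The function name and the 0-based integer labels are assumptions of this example, and any labels passed to it here are made up for illustration rather than taken from the experiments.

```python
def accuracy_and_kappa(y_true, y_pred, n_classes):
    """Overall accuracy and Cohen's kappa from two lists of 0-based labels."""
    n = len(y_true)
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1                  # rows: truth, columns: prediction
    observed = sum(confusion[k][k] for k in range(n_classes)) / n
    # Chance agreement from the row and column marginals.
    expected = sum(
        sum(confusion[k]) * sum(row[k] for row in confusion)
        for k in range(n_classes)
    ) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

Kappa corrects the observed agreement for the agreement expected by chance, which makes it more informative than raw accuracy when the classes are unbalanced.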

Conclusion
In this paper, we propose a superpixel segmentation and machine learning classification method for cloud detection in remote-sensing images. The six-dimensional features-based superpixel segmentation method achieves high accuracy, and the clustering ensemble maintains the whole shape of targets. Descriptors are extracted from the high grey-level superpixels selected with the Otsu threshold and fed into a softmax regression classifier to detect clouds, snow-covered land, and bright buildings. Remote-sensing images from the GF-1 satellite are used in our experiments. The results show that our method performs well in both segmentation and classification, providing a fast and robust algorithm for cloud detection in remote-sensing images.