Discriminative Feature Learning Constrained Unsupervised Network for Cloud Detection in Remote Sensing Imagery

Cloud detection is a significant preprocessing step for increasing the exploitability of remote sensing imagery, one that faces various levels of difficulty due to the complexity of underlying surfaces, insufficient training data, and redundant information in high-dimensional data. To solve these problems, we propose an unsupervised network for cloud detection (UNCD) on multispectral (MS) and hyperspectral (HS) remote sensing images. The UNCD method enforces discriminative feature learning to obtain the residual error between the original input and the background in deep latent space, based on the observation that clouds are sparse and can be modeled as sparse outliers in remote sensing imagery. First, a compact representation of the original imagery is obtained by a latent adversarial learning constrained encoder. Meanwhile, the majority class with sufficient samples (i.e., background pixels) is more accurately reconstructed by the decoder than the clouds with limited samples. An image discriminator is used to prevent the generalization of out-of-class features caused by latent adversarial learning. To further highlight the background information in the deep latent space, a multivariate Gaussian distribution is introduced. In particular, the residual error, with clouds highlighted and background samples suppressed, is used for cloud detection in the deep latent space. To evaluate the performance of the proposed UNCD method, experiments were conducted on both MS and HS datasets captured by various sensors, including Landsat 8, GaoFen-1 (GF-1), and GaoFen-5 (GF-5), over various scenes, and the results demonstrate its state-of-the-art performance.
Landsat 8 was launched on 11 February 2013 from Vandenberg Air Force Base, California, in a mission initially known as the Landsat Data Continuity Mission (LDCM); GF-1 was launched by China; and GF-5 captures hyperspectral observations as part of the Chinese Key Projects of the High-Resolution Earth Observation System. The overall accuracy (OA) values for Images I and II from the Landsat 8 dataset were 0.9526 and 0.9536, respectively, and the OA values for Images III and IV from the GF-1 wide field of view (WFV) dataset were 0.9957 and 0.9934, respectively. Hence, the proposed method outperformed the other considered methods.


Introduction
Remote sensing imaging technology such as multispectral (MS) imaging and hyperspectral (HS) imaging can perceive targets or natural phenomena remotely [1][2][3]. Motivated by the challenges outlined above, we propose a discriminative feature learning constrained unsupervised network for cloud detection (UNCD) in MS and HS images. The proposed UNCD method relies on an important observation: clouds are sparse and can be modeled as sparse outliers [28][29][30]. Adversarial feature learning is conducted on an unsupervised neural network, namely, an autoencoder (AE) [31,32], to extract a compact representation of the original input image in the deep latent space. An image discriminator is introduced to correct image features in order to avoid the generalization of out-of-class features caused by the adversarial feature learning. With sufficient training samples, the background can be reconstructed more faithfully than the clouds. In addition, a multivariate Gaussian distribution is adopted to extract a discriminative feature matrix of the background in the latent space; hence, if the dataset contains clouds, the encoder will encourage the learning of the background distribution of the dataset. As a consequence, the residual error between these two lower-dimensional representations is beneficial to cloud detection. We conducted experiments on both MS and HS images and evaluated the performance of the proposed UNCD framework in terms of detection accuracy and generalization.
The contributions of this paper are fourfold: 1. A novel UNCD method is proposed to address the issue of insufficient training data in remote sensing images, especially hyperspectral data, in the field of cloud detection. To the best of our knowledge, in this paper, such an unsupervised adversarial feature learning model is utilized for the first time for MS and HS cloud detection. 2. Latent adversarial learning is introduced such that the AE focuses on extracting a compact representation of the input image in the latent space. 3. An image discriminator is used to prevent the generalization of out-of-class features. 4. A multivariate Gaussian distribution is adopted to extract a discriminative feature matrix of the background in the latent space, and the residual error between the low-dimensional representations of the original and background pixels is beneficial to cloud detection.
The remainder of this paper is organized as follows. In Section 2, AE and adversarial learning models are briefly described. Section 3 introduces the proposed UNCD framework. The experimental results are presented in Section 4, and a discussion of these results follows in Section 5. Finally, Section 6 presents the conclusions of this study.

Generative Adversarial Network
A generative adversarial network (GAN) is a highly efficient generative model proposed by Goodfellow et al. [33] in 2014. GANs consist of a generator G and a discriminator D, which are trained against each other. The generator G takes as input random data z with probability distribution p_z(z) and outputs data x_fake = G(z). The discriminator D receives two inputs: real data x with probability distribution p_data(x) and generated data G(z). The discriminator is trained to identify the real data. Via successive adversarial learning, G generates increasingly realistic data. The objective of this adversarial learning process can be expressed as [33]:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))].

Due to its advantages in terms of generation and adversarial learning performance, we constructed a GAN model based on an encoder and a decoder for the improved production of discriminative features.
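To make the minimax objective concrete, the following is a minimal numerical sketch of the GAN value function V(D, G). The linear "generator" and logistic "discriminator" here are illustrative stand-ins, not the networks used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    # Logistic discriminator: D(x) in (0, 1).
    return 1.0 / (1.0 + np.exp(-x @ w))

def gan_value(x_real, z, w_d, w_g):
    # V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
    x_fake = z @ w_g                       # G(z): a linear "generator" for illustration
    d_real = discriminator(x_real, w_d)    # D(x) on real samples
    d_fake = discriminator(x_fake, w_d)    # D(G(z)) on generated samples
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

x_real = rng.normal(1.0, 0.5, size=(64, 3))   # samples from p_data(x)
z = rng.normal(0.0, 1.0, size=(64, 2))        # samples from p_z(z)
w_d = rng.normal(size=3)
w_g = rng.normal(size=(2, 3))
v = gan_value(x_real, z, w_d, w_g)            # D maximizes v; G minimizes it
```

Because both log terms operate on probabilities in (0, 1), the value function is always negative; training alternates gradient ascent on D with gradient descent on G.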

Variational Autoencoders
An AE is an unsupervised neural network that updates its parameters by minimizing the distance between the input data and the reconstructed data. Many extensions based on AEs have been proposed and broadly used, one of which is the variational AE (VAE) [32]. The VAE conducts variational inference to match the distribution of the hidden code vector of the AE with a predefined prior distribution. From the perspective of the probabilistic graph model, the encoder and decoder of the VAE can be interpreted as a probabilistic encoder and a probabilistic decoder, denoted as q_φ(z|x) and p_θ(x|z), respectively [32]. To match the distribution of the hidden code vector z with a predefined prior distribution, variational inference is performed to optimize the variational lower bound on the marginal log-likelihood of each observation. Thus, the objective function of the VAE can be expressed as:

Loss_VAE = KL(q_φ(z|x_i) ‖ p(z)) − E_{q_φ(z|x_i)}[log p_θ(x_i|z)],

where x_i refers to a sample from the training dataset. The first term is the KL-divergence term, which represents the difference between the distribution of the extracted latent feature samples and the Gaussian prior distribution; the smaller this difference, the closer the distribution of the extracted latent features is to the Gaussian distribution. The second term is the AE's reconstruction error, which represents the expectation of the reconstruction error, namely, the gap between the input and the output; the smaller its expected value, the closer the output is to the input. The hidden code vector z is sampled from the posterior distribution q_φ(z|x). Both the encoder and the decoder originally leverage the sigmoid function as the nonlinear activation function.
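The two-term objective above can be sketched numerically. The closed-form KL term below assumes a diagonal Gaussian posterior N(mu, diag(exp(log_var))) against a standard normal prior; `mu`, `log_var`, `x`, and `x_hat` are illustrative placeholders, not quantities from the paper.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    # Closed-form KL between N(mu, diag(exp(log_var))) and N(0, I), per sample.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    # Reconstruction term: squared error between input and output.
    rec = np.sum((x - x_hat) ** 2, axis=1)
    return np.mean(kl + rec)

x = np.array([[0.2, 0.4], [0.1, 0.9]])
# With a perfect reconstruction and q(z|x) = N(0, I), both terms vanish.
loss_perfect = vae_loss(x, x, np.zeros((2, 3)), np.zeros((2, 3)))
```

The zero-loss case illustrates the optimum the VAE objective pulls toward: latent codes matching the prior and outputs matching the inputs.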
Recently, the rectified linear unit (ReLU) [34] has been widely used as the nonlinear activation function.

Proposed Method

Figure 1 illustrates the overall framework of our proposed UNCD method. Let Y = {y_i}_{i=1}^{M×N}, y_i ∈ R^{L×1}, denote an input remote sensing image with M × N spectral vectors, where each spectral vector contains L dimensions. First, the underlying characteristics are extracted by an encoder E and a decoder De trained via an adversarial approach, thereby yielding a compact representation Z = {z_i}_{i=1}^{M×N}, z_i ∈ R^{l×1}, l < L, of the redundant remote sensing images. Based on the observation that clouds are sparse and modeled as sparse outliers, some powerful constraints are imposed on the unsupervised neural network for the extraction of a discriminative feature matrix E(De(Z)) of the background, which has the same dimension as Z. Due to the reconstruction ability of AEs and the generation capability of GANs, it is possible to obtain features of the full image and the background in the deep latent space, respectively. As a consequence, Z and E(De(Z)) are compact representations of the original image and background, respectively. Specifically, the residual error between Z and E(De(Z)) is beneficial to cloud detection. More details of each step are described as follows.

Constructing the Residual Error in the Latent Space
The proposed UNCD method relies on two assumptions for cloud detection: that the background can be distinguished from the clouds in the latent feature space, and that most of a remote sensing scene is background, while clouds are sparse. When a remote sensing image Y that contains clouds is input into the network, the network is expected to generate a compact representation of Y and to reconstruct a corresponding background via an unsupervised approach. Consequently, the residual error in the deep latent feature space, in which the clouds are enhanced and the background is suppressed, is defined as:

∆Z = Z − E(De(Z)).

The encoder learns a mapping from the input Y into the deep latent space, which produces a feature matrix Z that preserves the essential information, including the clouds and background. Then, Z is input into the decoder De to reconstruct Y. Since sufficient training samples are available for most of the background, while clouds are sparse with limited training samples in a remote sensing scene, the decoder produces a small reconstruction error for the background region but a relatively large reconstruction error for the clouds. Adversarial learning terms and physical constraints regarding the characteristics of cloud-contaminated images are imposed on the network to enhance its discriminative feature learning capability.
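The residual-error idea can be illustrated with a toy example. Here the "background reconstruction" simply projects every pixel onto the background mean, which is a crude stand-in for the trained encoder/decoder pair, not the paper's network; the point is only that a model fitted to the dominant class reproduces background well and sparse outliers poorly.

```python
import numpy as np

rng = np.random.default_rng(1)

Z = rng.normal(0.0, 0.05, size=(100, 4))   # latent features, mostly background
Z[:3] += 5.0                                # three sparse "cloud" outliers

background_model = Z.mean(axis=0)           # crude background representation
# Residual error |Z - background|: small for background, large for clouds.
residual = np.abs(Z - background_model).sum(axis=1)

detected = np.argsort(residual)[-3:]        # pixels with the largest residual
```

The three injected outliers dominate the residual by roughly two orders of magnitude, which is exactly the separation the UNCD residual ∆Z exploits.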

Adversarial Feature Learning Term
Adversarial feature learning is used in the latent space to extract distinctive features. As shown in Figure 1, the encoder E_0 acts as a generator to produce latent feature variables z_i ∼ q_{E_0}(z|y_i), which strives to fool the latent discriminator D_z presented with z_i ∼ p(z_i) sampled from the prior distribution. The latent discriminator D_z aims to distinguish between input that comes from the encoder, z_i ∼ q_{E_0}(z|y_i), and input from the prior distribution, z_i ∼ p(z_i). As a consequence, the generator (encoder) E_0 attempts to minimize this objective against the latent discriminator D_z, which attempts to maximize it, i.e., min_{E_0} max_{D_z} Loss_adv_z, where Loss_adv_z is denoted as:

Loss_adv_z = E_{z_i∼p(z_i)}[log D_z(z_i)] + E_{y_i}[log(1 − D_z(E_0(y_i)))].

Here, we set the prior distribution p(z_i) as a multivariate Gaussian distribution to extract feature variables that are more beneficial to cloud detection, considering that clouds are sparse and the majority of the scene is background.

Adversarial Image Learning Term
The adversarial feature learning in the previous part may generate out-of-class features due to the capability of GANs to generate new variants. To generate features of the original remote sensing images, we constrain the network with the image discriminator D_I to avoid the generation of out-of-class features, as illustrated in Figure 1. Via this approach, the decoder De and the image discriminator D_I are again combined into a GAN; in the training phase, the decoder De acts as a generator that tries to ensure that the generated image can fool the image discriminator D_I. The image discriminator D_I aims to distinguish whether its input comes from the generator De or the real input. The optimization problem is min_{De} max_{D_I} Loss_adv_I, where Loss_adv_I can be expressed as:

Loss_adv_I = E_{y_i∼p_data(y)}[log D_I(y_i)] + E_{z_i}[log(1 − D_I(De(z_i)))].

The first term indicates that the discriminator attempts to maximize its output on real samples, pushing it closer to 1. The second term indicates that, by optimizing the discriminator, its output on generated samples is pushed closer to zero. At the same time, the generator is optimized to make the output of the discriminator on the generated samples closer to 1.

Latent Representation of the Background
The Gaussian distribution imposed on the network focuses on reconstructing the majority class, namely, the background, for which relatively sufficient samples are available. Consequently, the generated image De(Z), which has the same size as the input, is closer to the background than to the clouds. However, due to the strong representational capability of the AE, it can still reconstruct the sparse pixels (i.e., the clouds). To further reduce the proportion of clouds in the reconstructed image De(Z), we impose a representation consistency constraint in the deep latent feature space; that is, we enforce the satisfaction of a multivariate Gaussian distribution by E(De(Z)) via an unsupervised approach that better accords with the background characteristics:

Loss_Gauss = Σ_i KL(N(µ_i, σ_i²) ‖ N(0, I)),

where E(De(z_i))_l = µ_i + σ_i ε_l and ε_l ∼ N(0, I). The encoded output of the second encoder is represented by E(De(z_i)), and the output of the first decoder is denoted by De(z_i). During the training of encoder1, only the parameters of the encoder are optimized, while the parameters of the decoder De are kept fixed, as illustrated in Figure 1. E(De(z_i))_l is the projection of the reconstructed spectral vector De(z_i) in the low-dimensional latent space, which is the enhanced background representation obtained by imposing the representation consistency constraint. Thus, the distinctive residual error for distinguishing between the background and the clouds in the low-dimensional latent space can be calculated via Equation (3), and an example is presented in Figure 2.
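The reparameterization E(De(z_i))_l = µ_i + σ_i ε_l can be sketched as follows; the `mu` and `sigma` values below are illustrative placeholders, not parameters learned by the paper's network.

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.3, -0.1, 0.7])          # illustrative per-feature means
sigma = np.array([0.05, 0.05, 0.05])     # illustrative per-feature std devs

def sample_background(mu, sigma, n):
    # Reparameterization: mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.normal(size=(n, mu.size))
    return mu + sigma * eps

samples = sample_background(mu, sigma, 10000)
# Sample statistics approach the imposed Gaussian parameters.
```

Drawing the background representation this way keeps the sampling step differentiable with respect to µ and σ, which is what allows the consistency constraint to be trained by gradient descent.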

Reconstruction Loss
In a traditional AE network, the input data are encoded by the hidden layers such that they can be decoded correctly at the output. The parameters are selected to minimize the following cost function:

Loss_r = (1/(M × N)) Σ_{i=1}^{M×N} ‖y_i − ŷ_i‖²,

where y_i denotes an input spectral vector with L dimensions and ŷ_i denotes the corresponding reconstructed spectral vector. Finally, the combination of the aforementioned losses yields Loss = Loss_adv_z + Loss_adv_I + Loss_Gauss + Loss_r.
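A small sketch of the reconstruction term and the combined objective; the adversarial and Gaussian terms are zero-valued placeholders here, standing in for the losses defined in the previous subsections.

```python
import numpy as np

def reconstruction_loss(Y, Y_hat):
    # Mean squared reconstruction error over all spectral vectors.
    return np.mean(np.sum((Y - Y_hat) ** 2, axis=1))

Y = np.array([[0.1, 0.2], [0.3, 0.4]])
Y_hat = Y + 0.1                          # a reconstruction that is off by 0.1 per band

loss_r = reconstruction_loss(Y, Y_hat)
loss_adv_z, loss_adv_i, loss_gauss = 0.0, 0.0, 0.0   # placeholder adversarial/Gaussian terms
# Combined objective: Loss = Loss_adv_z + Loss_adv_I + Loss_Gauss + Loss_r.
total = loss_adv_z + loss_adv_i + loss_gauss + loss_r
```

In the actual network, all four terms are minimized jointly by SGD, so the reconstruction pressure and the distributional constraints shape the latent space simultaneously.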
The model is trained and updated via the stochastic gradient descent (SGD) algorithm. When the model loss converges, the parameters, including the weight matrices and the biases, are obtained. According to Figure 2, the compact representation of the original image, Z, contains features of both the clouds and the background regions, whereas E(De(Z)) represents the background information well, with the clouds restricted. As a consequence, the residual error ∆Z obtained from the pixel-wise difference enhances the clouds while suppressing the background in the lower-dimensional feature space, which facilitates the detection of clouds, as shown in Figure 3. Subsequently, an adaptive weighting method proposed in our previous work [35,36] is applied to each dimension of the residual error ∆Z to discard redundant information and construct a comprehensive map, where the structure tensor (ST) is utilized. The ST of the ith dimension of the residual error ∆Z can be defined as [35,36]:

S^i = [ (∆Z^i_x)², ∆Z^i_x ∆Z^i_y ; ∆Z^i_x ∆Z^i_y, (∆Z^i_y)² ],

where ∆Z^i_x = ∂∆Z^i/∂x and ∆Z^i_y = ∂∆Z^i/∂y represent the derivatives of the ith dimension of the residual error ∆Z along the x and y directions, respectively. Since the structure tensor S^i of the ith dimension is a positive semi-definite matrix, it can be decomposed into [35,36]:

S^i = λ^i_1 η^i_1 (η^i_1)^T + λ^i_2 η^i_2 (η^i_2)^T,

where λ^i_1 and λ^i_2 are the non-negative eigenvalues of the ith dimension, η^i_1 and η^i_2 are the corresponding eigenvectors, and λ^i_1 denotes the larger eigenvalue. As discussed in our previous work [35,36], the larger eigenvalue can represent the response intensity of each pixel in the corresponding dimension of ∆Z. Therefore, the weight vector for each dimension of ∆Z is calculated from the larger eigenvalue. The larger the edge intensity of the ith dimension, the more structural information the ith dimension of ∆Z contains; consequently, the ith dimension should occupy a larger proportion.
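The structure-tensor weighting can be demonstrated on synthetic bands. This sketch aggregates the tensor over the whole band (a simplifying assumption for illustration) and shows that the larger eigenvalue λ_1 responds to edge structure while staying at zero on a flat band.

```python
import numpy as np

def larger_eigenvalue(band):
    # Structure tensor S = [[Ix*Ix, Ix*Iy], [Ix*Iy, Iy*Iy]] summed over the band.
    ix = np.gradient(band, axis=1)          # derivative along x
    iy = np.gradient(band, axis=0)          # derivative along y
    s = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    # eigvalsh returns eigenvalues in ascending order; take the largest.
    return np.linalg.eigvalsh(s)[-1]

flat = np.zeros((8, 8))                     # no structure -> lambda_1 = 0
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0                           # strong vertical edge

lam_flat = larger_eigenvalue(flat)
lam_edge = larger_eigenvalue(edge)          # much larger than lam_flat
```

Normalizing these λ_1 values across the l dimensions of ∆Z yields the per-dimension weights: dimensions with more edge structure contribute more to the comprehensive map.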
If the larger eigenvalues of all the dimensions are the same, all the dimensions will have the same proportion, which is the same as the result obtained with an averaging function. In summary, the weighted residual error ∆Ẑ is calculated as [35,36]:

∆Ẑ = Σ_{i=1}^{l} (λ^i_1 / Σ_{j=1}^{l} λ^j_1) ∆Z^i,

where l is the dimension of ∆Z. To further exploit the high correlations between adjacent pixels, we utilize a guided filter [37] to assign neighboring pixels to the same object for both the background and the clouds. Finally, we use an iterative optimization step to further increase the detection accuracy; namely, we multiply the initial detection map by the residual error ∆Z and repeat the detection procedure until the following stopping rule is satisfied:

‖C^f_D − C^{f−1}_D‖ ≤ ε,

where C^{f−1}_D and C^f_D represent the (f − 1)th and fth detection maps, respectively, and ε is a small threshold. According to this stopping rule, if the (f − 1)th and fth detection maps are highly similar, the iterative optimization terminates.
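The iterative refinement loop can be sketched as follows. The threshold `tau`, the max-normalization, and the 1-D toy residual are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def iterate_detection(residual, tau=1e-4, max_iter=50):
    # Start from the residual itself as the initial detection map.
    c_prev = residual / residual.max()
    for _ in range(max_iter):
        c = c_prev * residual               # re-weight the map by the residual
        c = c / c.max()
        if np.linalg.norm(c - c_prev) < tau:   # stopping rule: maps nearly identical
            return c
        c_prev = c
    return c_prev

residual = np.array([0.1, 0.2, 0.9, 1.0])   # clouds carry the largest residuals
final_map = iterate_detection(residual)
```

Each pass suppresses low-residual (background) pixels multiplicatively while the strongest cloud response stays pinned at 1, so the contrast between clouds and background sharpens until the map stabilizes.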

Landsat 8 Dataset
The Landsat 8 dataset is available online at https://www.usgs.gov/land-resources/nli/landsat/spatial-procedures-automated-removal-cloud-and-shadow-sparcs-validation and is widely used to evaluate cloud detection methods due to its wide coverage and high resolution. This dataset consists of 80 MS remote sensing images, with details described in [38]. The Landsat 8 dataset was collected globally using two instruments, the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS); we use the data from both, and all channels are used for cloud detection. The spatial resolution of each Landsat image is 30 m; that is, each pixel represents information on an area of 900 m². Each image contains 1000 × 1000 pixels in the spatial domain and ten bands in the spectral domain. The corresponding reference maps are shown in Figure 4.

GF-1 WFV Dataset
The GaoFen-1 (GF-1) satellite was launched by China; Gaofen means "high resolution" in Chinese. Detailed information on the GF-1 satellite and wide field of view (WFV) images is available online at http://sendimage.whu.edu.cn/en/resources/mfc-validation-data/. The GF-1 WFV images considered here are Class 2A products produced via relative radiation correction and system geometry correction. The Class 1A data are the original digital product after regular radiation calibration, while the Class 2A data are generated after system geometry correction, where all pixels are re-sampled to 16-m resolution with 10-bit data. The false-color images and the corresponding reference maps are shown in Figure 5.

GF-5 Hyperspectral Dataset
The GaoFen-5 (GF-5) satellite captures hyperspectral observations as part of the Chinese Key Projects of the High-Resolution Earth Observation System. It carries six payloads: an Advanced Hyperspectral Imager (AHSI), a Visual and Infrared Multispectral Imager (VIMI), an Atmospheric Infrared Ultra-Spectral Sounder (AIUS), a Greenhouse Gases Monitoring Instrument (GMI), an Environmental Monitoring Instrument (EMI), and a Directional Polarization Camera (DPC). The spectral coverage of these sensors ranges from ultraviolet to long-wave infrared bands. Two HS images captured by the GF-5 AHSI sensor were used to evaluate the performance of the proposed method. Each image contains 430 × 430 pixels in the spatial domain and 180 bands in the spectral domain. The false-color images are shown in Figure 6. The reference maps for this dataset have not yet been published.

Experimental Results
We comprehensively evaluated the proposed method on three real datasets, which included MS and HS images captured by various imaging sensors over various scenes. First, we describe the datasets. Then, we introduce the compared methods and evaluation criteria. Third, we investigate the performance of the proposed UNCD method, both qualitatively and quantitatively. Finally, we analyze the impact of the network structure on the detection performance.

Experimental Setting
Due to insufficient samples in remote sensing imagery, we fixed the depth of our UNCD to 2 to avoid overfitting. Inspired by [39], the number of hidden nodes was fixed to √L + 1, where L is the number of bands of the input remote sensing image. The leaky ReLU (LReLU) with a slope of 0.2 was used as the nonlinear activation function, which compresses negative inputs while retaining part of their information. As the number of epochs increases, both the detection performance and the computational cost increase; thus, we set the number of epochs to 1000 as a trade-off. The batch size was fixed to the number of pixels in the spatial domain, namely, M × N for each remote sensing image. The learning rate was set to 0.01. The parameters introduced above were set to these default values in our experiments and can be tuned by the user for optimal results.
We conducted experiments on 8 NVIDIA Tesla K80 graphics cards on a system running Python 3.6.0 and TensorFlow 1.10.0. All compared methods were implemented in MATLAB R2017a.

Compared Methods and Evaluation Criterion
To evaluate the performance of the proposed UNCD model, several representative cloud detection methods that are frequently cited in the literature, namely, K-means, PRS, SVM, PCANet, and SL, were employed for comparison in terms of both visual effects and quantitative evaluations. The K-means method is a typical unsupervised method. The PRS method, also unsupervised, yields satisfactory results on several remote sensing images. Since the PRS method is only applicable to RGB images, we combined band 4 (red), band 3 (green), and band 2 (blue) into RGB images. The SVM, PCANet, and SL methods are supervised learning methods for cloud detection.
To comprehensively evaluate the cloud detection results, the three commonly used evaluation criteria, namely, the area under the curve (AUC) [40] of the receiver operating characteristic (ROC), the overall accuracy (OA), and the kappa coefficient (Kappa), were employed.
The area under the curve (AUC) [40] is a widely used objective evaluation index that can identify general trends in detection performance; the larger the AUC of the ROC curve, the better the performance. The OA is defined as [41]:

OA = (TP + TN) / (TP + TN + FP + FN),

where true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) represent the number of correctly detected cloud pixels, the number of correctly detected non-cloud pixels, the number of false-alarm pixels, and the number of missed cloud pixels, respectively. The kappa coefficient reflects the agreement between a final cloud detection image and the ground-truth map. Compared with the OA, the kappa coefficient can more objectively reflect the accuracy of the results; the larger its value, the higher the accuracy of the result. The kappa coefficient is calculated as [41]:

Kappa = (OA − P_e) / (1 − P_e), where P_e = [(TP + FP)(TP + FN) + (FN + TN)(FP + TN)] / (TP + TN + FP + FN)².
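The OA and kappa computations can be sketched directly from the confusion counts; the pixel counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def overall_accuracy(tp, tn, fp, fn):
    # OA = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

def kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    # Expected agreement by chance, from the marginal totals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (oa - pe) / (1.0 - pe)

tp, tn, fp, fn = 90, 890, 10, 10   # hypothetical pixel counts
oa = overall_accuracy(tp, tn, fp, fn)   # 0.98
k = kappa(tp, tn, fp, fn)
```

Note how the class imbalance affects the two metrics: with 90% background pixels, OA is 0.98 even though 10% of the clouds were missed, while kappa (about 0.89 here) discounts the agreement expected by chance.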

Landsat 8 Dataset Results
The reference maps and the visual cloud detection results obtained by the competing methods for two images from the Landsat 8 dataset are shown in Figures 4 and 5. The reference maps were published together with the dataset, with annotations provided by the dataset provider. Image I is an example of a case with thin clouds, and Image II is an example of a case with thick clouds. On these two images, we compared our method with K-means, PRS, SVM, PCANet, and SL, implementing the compared methods via their publicly released codes. According to Figures 4 and 5, the proposed UNCD method produced the smallest visual difference between the reference map and the detection map among all compared methods. As illustrated in Figures 4 and 5, the K-means method yielded many noise-like detection results. The PRS method realized minimal improvement over the K-means method for Image I, leaving thin clouds undetected over large areas. The SL method generated some detection mistakes due to the varying thickness of the clouds. The PCANet and SVM methods outperformed the K-means, PRS, and SL methods. In particular, the proposed UNCD method could accurately distinguish clouds from complex background scenes, because it extracts the spectral features of the image while utilizing the spatial background information between adjacent pixels. In comparison to these methods, the advantage of the proposed UNCD method is its robust detection performance for clouds of different scales in different scenes. The objective evaluations, including the AUC, OA, and Kappa values obtained by all the considered algorithms, are reported in Table 1 and are consistent with the visual observations.
Concretely, the AUCs obtained by the proposed UNCD method were 0.9543 and 0.9637 for Images I and II, respectively, which are much higher than those obtained by the second-best approach in each case, 0.8485 (PRS method) and 0.8848 (K-means method). The OA and Kappa obtained by the proposed UNCD method were also the highest, and much higher than those obtained by the second-best method.

GF-1 WFV and GF-5 Dataset Results
The reference maps and detection maps obtained by the compared methods are shown in Figures 6 and 7. The detection results obtained by the proposed method were similar to the reference maps, indicating a satisfactory detection performance. It is apparent that the UNCD method achieved better detection results than the K-means, PRS, PCANet, SVM, and SL methods. For the GF-1 WFV dataset, Table 2 lists the corresponding quantitative metrics for all competing methods. From Table 2, the AUC, OA, and Kappa values obtained by the proposed UNCD method were the best among the compared methods. The OA scores on Images III and IV (0.9957 and 0.9934) obtained by the proposed UNCD method were higher than those obtained by the second-best methods (0.9835 for SVM on Image III and 0.9690 for SL on Image IV). The GF-5 dataset was used to evaluate the feasibility of our proposed UNCD method on real hyperspectral data. The cloud detection results are displayed in Figures 8 and 9. Image V in Figure 8 contains large clouds, while Image VI in Figure 9 contains many small clouds. It can be observed that our method was capable of discriminating clouds of different sizes from the background pixels. The K-means method introduced many false positives, and PRS and PCANet failed to detect all the clouds. By contrast, the SVM and SL methods performed better but still produced some detection errors; for example, the buildings in Image V were detected as clouds by the SVM and SL methods.

Component Analysis
This section analyzes the effects of the significant processing components on the detection performance on each dataset. Since the reference maps of the GF-5 hyperspectral dataset are not publicly available, the objective evaluations could not be obtained for this dataset. Therefore, four remote sensing images coming from the Landsat 8 dataset and the GF-1 WFV dataset were used to evaluate the effect of each component objectively. Three comparison experiments were conducted. In the first experiment, only the AE was considered, which is a basic model for joint encoder-decoder training. The second considered AE with additional adversarial training (a latent feature discriminator). The third utilized the proposed method. The AUC, OA, and Kappa values were calculated as reported in Table 3. The better the detection performance, the higher the AUC, OA, and Kappa values. The AUC values were 0.9254, 0.9025, 0.9436, and 0.9462 for four images when only using AE. When applying the adversarial feature learning, the AUC values were improved to 0.9477, 0.9379, 0.9506, and 0.9689, respectively. When the multivariate Gaussian distribution was introduced, the method yielded the best AUC values, which reached 0.9543, 0.9637, 0.9676, and 0.9860, respectively. Similarly, the other two indicators (i.e., OA and Kappa) also increased. These results demonstrate that each component of the proposed UNCD method has a positive influence on the cloud detection performance.

Discussion
According to the values of AUC, OA, and Kappa in Tables 1 and 2 and the visual observations in Figures 4-9 for the various types of datasets, the proposed UNCD method performed the best among the compared state-of-the-art methods in terms of both objective evaluation results and visual observations. The superior performance of the proposed method is due to the latent adversarial learning constrained encoder, the image discriminator, and the multivariate Gaussian distribution in the network architecture. It can be concluded from the component analysis experiments that each part had a positive impact on the detection results. While the proposed UNCD method yielded promising results in cloud detection, the experiments identified several areas for improvement in future work. It is worthwhile to further exploit the data characteristics of remote sensing images in order to optimize unsupervised networks and improve the cloud detection performance. In addition, the network architecture can be enhanced by adding further loss functions and constraints.

The OA values for Images I-IV obtained by the proposed UNCD method were 0.9526, 0.9536, 0.9957, and 0.9934, respectively; our method outperformed the second-best method by 1.44%, 5.23%, 1.24%, and 1.96%, respectively. While the performance of our method was the best among the considered methods, there is still room for improvement: in difficult scenes, detection accuracy remained limited and some clouds were missed, a problem shared by the compared methods. Moreover, the proposed UNCD method is devoted to cloud detection only; in the future, we will extend it to cloud shadow detection. At the same time, to make the method more universal, we will conduct additional experiments on datasets acquired by additional sensors.

Conclusions
In this paper, we proposed a discriminative feature learning constrained unsupervised network for cloud detection (UNCD) in remote sensing imagery. The introduced latent discriminator, image discriminator, and multivariate Gaussian distribution exploit the fact that clouds are sparse and can be modeled as outliers, and together realize a discriminative residual map between the original input and the background. Based on the strong correlation between adjacent pixels, a guided filter is employed on the residual map to obtain an initial detection map. To further improve the detection performance, an iterative optimization algorithm is introduced that terminates automatically once the stopping condition is satisfied. Extensive experimental results on several datasets demonstrate that the proposed UNCD not only realizes a more favorable detection performance but also generalizes better to different datasets compared with other state-of-the-art methods. In particular, the OA values for Images III and IV from the GF-1 WFV dataset were 0.9957 and 0.9934, respectively, signifying that our algorithm performs better than the other considered algorithms. In future work, we will expand the datasets used in the experiments to further improve the performance of the algorithm.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: