Synthetic Aperture Radar Image Change Detection via Siamese Adaptive Fusion Network

Synthetic aperture radar (SAR) image change detection is a critical yet challenging task in the field of remote sensing image analysis. The task is non-trivial due to the following challenges. First, the intrinsic speckle noise of SAR images inevitably degrades the neural network because of error gradient accumulation. Second, the correlation among feature maps at different levels or scales is difficult to capture through simple summation or concatenation. To this end, we propose a siamese adaptive fusion network for SAR image change detection. More specifically, a two-branch CNN is used to extract high-level semantic features of multitemporal SAR images. In addition, an adaptive fusion module is designed to adaptively combine multiscale responses in convolutional layers, so that complementary information is exploited and feature learning for change detection is further improved. Moreover, a correlation layer is designed to further explore the correlation between multitemporal images. The resulting robust feature representation is then classified by a fully-connected layer with softmax. Experimental results on four real SAR datasets demonstrate that the proposed method outperforms several state-of-the-art methods. Our code is available at https://github.com/summitgao/SAR_CD_SAFNet.


I. INTRODUCTION
REMOTE sensing image change detection aims to find the changes between two multitemporal images acquired over the same area at different times [1]. It provides valuable information for many applications, such as target detection [2], natural resource supervision [3], and agricultural development [4]. When a natural disaster suddenly occurs, a robust change detection algorithm can efficiently detect subtle changes, so that local governments can quickly take measures to reduce the loss of life and property [5] [6]. Therefore, change detection has attracted extensive research attention. Synthetic aperture radar (SAR) images are produced by an active system that transmits a signal toward the ground and then receives the reflected signal. Different objects exhibit different scattering and polarization characteristics, which benefits accurate interpretation [7]. The SAR sensor has all-weather, day-and-night imaging capability: it can penetrate smoke, cloud, and haze to acquire high-quality images [8]. Especially when a natural disaster occurs, SAR images compensate for the shortcomings of other data sources, such as LiDAR, optical, and multispectral data. Therefore, SAR imagery is a very useful data source for change detection.
Several pioneering efforts have been devoted to the SAR change detection task. These methods can be classified into two broad categories: supervised and unsupervised approaches [9]. Supervised approaches usually outperform unsupervised ones because they learn from labeled samples. However, it is generally difficult to collect high-quality labeled samples or to acquire prior knowledge of the study region. Therefore, supervised approaches are commonly combined with unsupervised ones: the unsupervised approaches generate reliable samples, which eliminates the tedious task of manual labeling.
In this paper, we focus on unsupervised SAR image change detection. Because of speckle noise, it is very challenging to identify changes accurately in SAR images. To address this, researchers designed a framework with three steps: image preprocessing, difference image (DI) generation, and DI classification [10]. The first step involves denoising and coregistration. Denoising reduces speckle noise to some extent, but it may degrade geometric details. Coregistration with subpixel accuracy is critical for generating a robust DI. In the second step, the ratio method [11] is commonly used to generate the DI, and many efficient operators have been proposed, such as the log-ratio operator [12], the Gauss-ratio operator [13], and the neighborhood-based ratio operator [14]. In the final step, thresholding, expectation maximization, and clustering methods are generally used for classification. In [15], the fuzzy local information c-means (FLICM) clustering algorithm was proposed to provide robustness to noisy images. In [16], changed and unchanged pixels were clustered by fuzzy c-means (FCM) based on a Markov random field (MRF) energy function. Gao et al. [17] proposed using PCANet to further classify the preclassification results, an approach that performs well at suppressing speckle noise. In addition, many other methods have been employed for change detection, such as the extreme learning machine (ELM) [18] and support vector machines (SVM) [19].

arXiv:2110.09049v1 [eess.IV] 18 Oct 2021
However, these methods find it difficult to capture high-level semantic features for change detection automatically and effectively.
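As a minimal illustration of the DI generation step, the sketch below computes a log-ratio difference image with NumPy. Taking the logarithm turns the multiplicative speckle model into an additive one, and the absolute value makes the DI symmetric in the two dates. The function name `log_ratio_di` and the small constant `eps` (added only to avoid log(0)) are illustrative assumptions, not part of the cited operators' definitions.

```python
import numpy as np

def log_ratio_di(img1, img2, eps=1e-6):
    """Log-ratio difference image |log(I2 + eps) - log(I1 + eps)|.

    img1, img2: coregistered SAR intensity images of the same shape.
    eps: small constant (an assumption of this sketch) that avoids log(0).
    """
    i1 = np.asarray(img1, dtype=np.float64)
    i2 = np.asarray(img2, dtype=np.float64)
    if i1.shape != i2.shape:
        raise ValueError("images must be coregistered and have the same shape")
    # log(I2 / I1) converts multiplicative speckle into an additive term
    return np.abs(np.log(i2 + eps) - np.log(i1 + eps))


# Toy pair: the right half of the second image doubles in intensity.
img1 = np.full((4, 4), 10.0)
img2 = img1.copy()
img2[:, 2:] *= 2.0
di = log_ratio_di(img1, img2)
```

Unchanged pixels map to values near zero and the doubled pixels map to about log 2, so a threshold or a clustering method can then separate the two classes.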
Recently, deep learning-based methods have become increasingly popular and have shown their superiority in the remote sensing community [20]. Motivated by these successes, many deep learning methods have been applied to remote sensing image analysis [21]-[24]. Some researchers have designed deep models for the change detection task. In these models, unsupervised clustering methods are first employed to preclassify the input SAR images, and training samples are selected from the reliable labels of the preclassification. These samples serve as prior knowledge and are fed into a deep model for training [25]. Finally, the trained model interprets the input images, and the final change map is obtained. Gong et al. [26] first presented a deep learning-based SAR change detection method, in which stacked restricted Boltzmann machines (RBMs) are used for feature extraction. Liu et al. [27] presented a change detection framework based on a convolutional coupling network, which is symmetric, with each side consisting of one convolutional layer and several coupling layers. Some recent breakthroughs in change detection were achieved by convolutional neural networks (CNNs). Mou et al. [28] proposed a change detection framework that combines a CNN and a recurrent neural network into an end-to-end network. Gao et al. [29] detected changes in sea ice SAR images via transferred deep learning. In [30], a general end-to-end 2-D CNN named GETNET was designed for hyperspectral image change detection. Liu et al. [31] presented a locally restricted CNN framework for SAR change detection, in which the original CNN is improved with a local spatial constraint. In [32], a noise modeling-based unsupervised fully convolutional network (FCN) framework was presented for hyperspectral image (HSI) change detection and was shown to learn powerful features. However, a single-branch neural network inevitably suffers from the accumulation of error gradients.
In other words, unstable feature representation limits its performance.
It is non-trivial to build an effective SAR change detection model because of the following two challenges. 1) Unstable feature representation. The intrinsic speckle noise of SAR images inevitably degrades the neural network because of error gradient accumulation, which affects change detection performance and makes the network unstable. Moreover, it is difficult to make full use of the complementary information between different levels of features. Therefore, the extracted features may not describe the changes between multitemporal SAR images well. 2) Insufficient feature correlation. Feature representations from the two images are used to represent the changes, but their correlation may not be well explored through fusion operations such as summation and concatenation.
To solve the above challenges, we establish a deep Siamese Adaptive Fusion Network (SAFNet) for SAR image change detection, which learns a stable feature representation based on a siamese architecture. Specifically, we employ two-branch networks to exploit high-level semantic features, which are rarely considered in SAR image change detection tasks. Furthermore, a correlation layer is designed to integrate the features from the two branches. An attention mechanism is introduced to adaptively choose features among different scales. Finally, we introduce conditionally parameterized convolutions to enhance feature representation. Extensive experiments on four real SAR datasets demonstrate the superiority of the proposed SAFNet over state-of-the-art methods. We have also released our code to facilitate further research.
The main contributions of the proposed SAFNet are summarized as follows:
• We explore the SAR change detection task via a well-designed siamese neural network. The two-branch network independently extracts the features of multitemporal images, and the discrimination of the features is then improved through similarity measurement.
• For stable feature representation, an adaptive fusion module is used to combine the outputs of different layers. Since features from different layers contain complementary information, an attention-based fusion mechanism is introduced to exploit such information and improve feature representation. The multiscale responses from convolutional layers are thus adaptively fused.
• To avoid the loss of correlation caused by traditional feature integration methods such as summation and concatenation, a correlation layer is designed for feature integration.
The rest of this paper is organized as follows. Section II describes the proposed method in detail. Section III presents experimental results on real multitemporal SAR images to validate the proposed method. Finally, Section IV concludes the paper and discusses possible future work.

II. METHODOLOGY
Given two coregistered SAR images $I_1$ and $I_2$ captured at different times, we aim to generate a change map that represents the changes between the two images. The proposed change detection method consists of two steps. First, high-level semantic features of the multitemporal images are extracted through two-branch networks, and a similarity measure is used to optimize the feature extraction process. Second, a correlation layer is employed to integrate the features for classification and change map generation.

A. Feature Extraction and Reliable Sample Generation
In this work, the spatial neighborhood information of each pixel is analyzed for label prediction, thus detecting the changed regions (pixels with label "1") in the image. Given two multitemporal SAR images $I_1$ and $I_2$, we first obtain pixel-wise patch pairs $(x_i^{t_1}, x_i^{t_2}, y_i)$, where $x_i^{t_1} \in \mathbb{R}^{r\times r}$ and $x_i^{t_2} \in \mathbb{R}^{r\times r}$ are the image patches centered at the $i$-th pixel in $I_1$ and $I_2$, respectively, and $y_i$ is the ground truth label. The spatial information around the $i$-th pixel is thus fed into the CNN for feature extraction.
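The patch-pair construction above can be sketched in NumPy as follows. This is a minimal illustration, not the released SAFNet code: the function name `extract_patch_pairs` is hypothetical, $r$ is assumed odd, and border pixels are handled by symmetric padding, which the text does not specify.

```python
import numpy as np

def extract_patch_pairs(img1, img2, labels, r):
    """Build pixel-wise patch pairs (x_i^{t1}, x_i^{t2}, y_i).

    Each patch is r x r and centered on pixel i. Border pixels are handled by
    symmetric padding, which is an assumption of this sketch.
    """
    if r % 2 != 1:
        raise ValueError("r must be odd so that each patch has a center pixel")
    pad = r // 2
    p1 = np.pad(np.asarray(img1, dtype=np.float32), pad, mode="symmetric")
    p2 = np.pad(np.asarray(img2, dtype=np.float32), pad, mode="symmetric")
    h, w = np.shape(img1)
    x1 = np.empty((h * w, r, r), dtype=np.float32)
    x2 = np.empty((h * w, r, r), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            n = i * w + j                      # row-major pixel index
            x1[n] = p1[i:i + r, j:j + r]       # patch from I1 centered at (i, j)
            x2[n] = p2[i:i + r, j:j + r]       # patch from I2 centered at (i, j)
    return x1, x2, np.asarray(labels).reshape(-1)


img1 = np.arange(30, dtype=np.float32).reshape(5, 6)
img2 = img1 + 100
y = np.zeros((5, 6), dtype=np.int64)
x1, x2, yv = extract_patch_pairs(img1, img2, y, r=3)
```

The center element `x1[n, r // 2, r // 2]` of every patch is the pixel being classified, so each pair carries that pixel's neighborhood from both dates.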
Traditional change detection models usually analyze a difference image generated by the log-ratio or neighborhood-ratio operator, so their results depend heavily on the quality of the difference image. In particular, performance deteriorates when the SAR images are seriously corrupted by speckle noise. In this paper, the feature representation is extracted directly from the original SAR images by a two-branch CNN model, which is less sensitive to speckle noise, so that a stable feature representation can be achieved. The proposed SAFNet consists of two branches ($S_1$, $S_2$) that accept patch pairs as input. The feature extraction step is expressed as:
$$F_1^i = S_1(x_i^{t_1}), \quad F_2^i = S_2(x_i^{t_2}), \quad F_{merge}^i = C(F_1^i, F_2^i),$$
where $F_1^i$ denotes the features of image patch $x_i^{t_1}$ extracted by $S_1$, $F_2^i$ denotes the features of image patch $x_i^{t_2}$ extracted by $S_2$, and $C(\cdot)$ denotes the correlation analysis. $C(\cdot)$ is implemented by a group convolution operation, in which $F_1^i$ and $F_2^i$ serve as the input and the kernels, respectively. The final fused features $F_{merge}^i$ are then classified by a fully-connected layer with softmax. During network training, a similarity measure and a classification loss are used to optimize SAFNet. The pseudo-label samples are generated by the FCM algorithm in an unsupervised manner [18]. We randomly select a certain proportion of samples from the pseudo-label set for training and use the rest for testing. Pseudo-label samples are generated as follows:
1: The FCM algorithm is performed on the DI to generate a changed cluster $\Omega_c^1$ and an unchanged cluster $\Omega_u^1$. Here, the number of pixels in the changed cluster is $T_c^1$, and the number of changed pixels is limited to at most $T_c^1 \cdot \theta$, with $\theta = 1.2$.
2: The FCM algorithm is performed again on the DI to generate five clusters $\Omega_1^2, \ldots, \Omega_5^2$, where $\Omega_1^2$ has the highest probability of being changed, and so on. The number of clusters is …, and the remaining clusters are denoted as $\Omega_u$. Thus, a preclassification map consisting of $[\Omega_c, \Omega_i, \Omega_u]$ is generated.
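The two-stage FCM preclassification can be sketched as below. This is a hedged illustration under stated assumptions: `fcm_1d` is a minimal fuzzy c-means written for this example (fuzzifier m = 2), the label encoding (1 changed, 0 unchanged, -1 intermediate) is arbitrary, and because the exact rule for the intermediate class $\Omega_i$ is incomplete above, the greedy rule used here (clusters after $\Omega_1^2$ become intermediate while the cumulative pixel count stays within $T_c^1 \cdot \theta$) is a hypothetical simplification, not the authors' procedure.

```python
import numpy as np

def fcm_1d(values, n_clusters, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means on 1-D data; returns (centers, memberships)."""
    x = np.asarray(values, dtype=np.float64).reshape(-1)
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0, keepdims=True)             # memberships sum to 1 per pixel
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)       # membership-weighted means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        # u_jn = 1 / sum_k (d_jn / d_kn)^(2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
    return centers, u


def pseudo_labels(di, theta=1.2, seed=0):
    """Two-stage FCM preclassification: 1 = changed, 0 = unchanged, -1 = intermediate."""
    x = np.asarray(di, dtype=np.float64).reshape(-1)
    # Step 1: two clusters; the cluster with the larger center is "changed".
    c2, u2 = fcm_1d(x, 2, seed=seed)
    t_c1 = np.sum(u2.argmax(axis=0) == c2.argmax())
    limit = theta * t_c1                          # upper limit T_c^1 * theta
    # Step 2: five clusters ordered from most to least likely changed.
    c5, u5 = fcm_1d(x, 5, seed=seed)
    hard = u5.argmax(axis=0)
    order = np.argsort(-c5)
    labels = np.zeros(x.size, dtype=np.int8)      # default: unchanged (Omega_u)
    labels[hard == order[0]] = 1                  # Omega_1^2 -> changed (Omega_c)
    count = np.sum(hard == order[0])
    for k in order[1:]:
        count += np.sum(hard == k)
        if count <= limit:                        # hypothetical rule for Omega_i
            labels[hard == k] = -1
    return labels.reshape(np.shape(di))


di = np.concatenate([np.linspace(0.0, 0.3, 900), np.linspace(1.8, 2.2, 100)])
labels = pseudo_labels(di)
```

In such a scheme only pixels labeled 1 or 0 would serve as reliable training samples, while intermediate pixels are left for the trained network to classify.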

B. Siamese Adaptive Fusion Network
In this paper, we propose SAFNet to compare image patches from multitemporal SAR images; it achieves robust feature discrimination through an adaptive fusion (AF) module and a correlation layer. The framework of the proposed SAFNet is illustrated in Fig. 1.
Image patches centered at the selected sample pixels are extracted from $I_1$ and $I_2$, respectively, and the two groups of patches are fed into the two branches as patch pairs. Network optimization in SAFNet relies on weight sharing and residual learning. Each branch of SAFNet consists of three CondConv blocks and an AF module, and the outputs of the two branches are merged by the correlation layer and a fully-connected layer. SAFNet is described in detail below.
1) The CondConv Blocks: Although feature extraction can be greatly improved by aggregating multiple convolutional branches, the computational cost increases dramatically [33]. To address this problem, conditional computation activates only a portion of the entire network [34]. In [35], efficient inference was achieved by conditionally parameterized convolutions.
The structure of a typical conditionally parameterized convolution (CondConv) block is shown in Fig. 2. It generates several groups of convolution kernels through a routing function and initialization weights; the details of CondConv are described in [35]. ConvBN is the combination of three operations: convolution, batch normalization, and activation.

Fig. 2. Detailed components of the CondConv block (panels: ConvBN, CondConv, and the route function). ConvBN denotes three operations: convolution, batch normalization, and activation. The conditionally parameterized convolution (CondConv) combines the initialization weights with the routing weights from the route function, so feature extraction capacity increases while only one convolution kernel is computed.
The routing weights $W_r$ generated by the routing function are calculated as:
$$W_r = \delta\left(W_f * \mathrm{GAP}(X)\right),$$
where $X$ is the input feature map, GAP denotes global average pooling, $\delta$ is the sigmoid function, and $*$ is the convolution operation. $W_f$ denotes the matrix of the fully-connected layer, which maps the global feature to $k$ routing weights. The output of CondConv with residual learning [36] is then defined as:
$$Y = X + (\alpha_1 W_1 + \cdots + \alpha_k W_k) * X,$$
where $[\alpha_1, \cdots, \alpha_k]$ are the routing weights in $W_r$ and $W_1, \ldots, W_k$ are the $k$ convolution kernels. Consequently, CondConv is mathematically equivalent to a linear mixture of multiple convolutions.
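The routing and mixing steps can be sketched in NumPy to make the "linear mixture" claim concrete. This is a simplified sketch, not the SAFNet implementation: it uses a naive single-sample convolution, omits batch normalization and activation (the ConvBN parts), and the names `conv2d_same`, `routing_weights`, and `condconv_residual` are hypothetical. The final check shows that convolving with the mixed kernel equals mixing the outputs of the individual expert kernels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Naive 'same' 2-D convolution (cross-correlation, as in CNN libraries).

    x: (C_in, H, W); w: (C_out, C_in, kh, kw) with odd kh, kw.
    """
    c_out, _, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))   # zero padding
    h, wd = x.shape[1:]
    y = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + kh, j:j + kw]       # (C_in, kh, kw)
            y[:, i, j] = np.einsum("oikl,ikl->o", w, patch)
    return y


def routing_weights(x, w_f):
    """alpha = sigmoid(W_f . GAP(x)); w_f: (k, C_in)."""
    return sigmoid(w_f @ x.mean(axis=(1, 2)))


def condconv_residual(x, experts, w_f):
    """Y = X + (sum_j alpha_j W_j) * X; experts: (k, C, C, kh, kw)."""
    alpha = routing_weights(x, w_f)
    mixed = np.tensordot(alpha, experts, axes=1)    # one kernel, (C, C, kh, kw)
    return x + conv2d_same(x, mixed), alpha


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))                  # C = 4 channels, 6 x 6 map
experts = rng.standard_normal((3, 4, 4, 3, 3))      # k = 3 expert kernels
w_f = rng.standard_normal((3, 4))                   # GAP(x) -> 3 routing logits
y, alpha = condconv_residual(x, experts, w_f)
```

Only one convolution with the mixed kernel is computed per input, which is where the efficiency of conditional parameterization comes from.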
2) Feature Fusion by Adaptive Fusion Module: Feature fusion performs well in many image classification and object detection tasks. It is an effective way to exploit complementary information; however, it may also introduce redundant features that degrade discrimination. In the proposed SAFNet, an adaptive feature fusion mechanism is introduced to extract the complementary information among different CondConv blocks. As illustrated in Fig. 1, multiple CondConv blocks are used to capture features at different levels, including low-level, mid-level, and high-level features.
The important parts of the features at each level are given higher weights through a non-mutually-exclusive attention vector, which makes the final fused features more discriminative.
The output features of the three levels of CondConv blocks (i.e., CondConvBlock1, CondConvBlock2, and CondConvBlock3 in Table I) are denoted as $F_1$, $F_2$, and $F_3$, respectively, with one CondConv block extracting the features at each level. In this paper, $F_1$, $F_2$, and $F_3$ contain 16, 32, and 64 feature maps, respectively. Since they contain different numbers of feature maps, dimension matching is essential for feature fusion. To achieve this, 64 kernels of size 1 × 1 with different strides are used to convolve $F_1$, $F_2$, and $F_3$. After this convolution, the feature maps from all three levels have 64 channels and the same spatial size for fusion.
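A 1 × 1 convolution without padding is simply a spatial subsampling by the stride followed by a per-pixel channel mixing, so dimension matching can be sketched with one `einsum`. The channel counts (16, 32, 64) come from the text, but the spatial sizes (28, 14, 7) and the strides (4, 2, 1) below are hypothetical values chosen for illustration, and `conv1x1` is not a function from the SAFNet code.

```python
import numpy as np

def conv1x1(x, w, stride):
    """1 x 1 convolution without padding: subsample with the stride, then mix
    channels. x: (C_in, H, W); w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x[:, ::stride, ::stride])


rng = np.random.default_rng(0)
# Hypothetical spatial sizes chosen for illustration only.
f1 = rng.standard_normal((16, 28, 28))
f2 = rng.standard_normal((32, 14, 14))
f3 = rng.standard_normal((64, 7, 7))
w1 = rng.standard_normal((64, 16))     # 64 kernels of size 1 x 1 for F1
w2 = rng.standard_normal((64, 32))     # 64 kernels of size 1 x 1 for F2
w3 = rng.standard_normal((64, 64))     # 64 kernels of size 1 x 1 for F3
d1, d2, d3 = conv1x1(f1, w1, 4), conv1x1(f2, w2, 2), conv1x1(f3, w3, 1)
```

After this step all three maps share the shape (64, 7, 7) and can be summed element-wise.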
Recent computer vision models adaptively encode informative context from long-range regions to select the information most critical for the current task. Hu et al. [37] proposed the squeeze-and-excitation network (SENet) to explore the channel relationships of features. In [38], the selective kernel network (SKNet) adaptively selects the kernel size through an attention mechanism. Inspired by these methods, the adaptive fusion (AF) module is designed to merge feature maps from different levels, where an attention mechanism is introduced to emphasize important features while suppressing unnecessary ones. As illustrated in Fig. 3, three feature maps $F_1$, $F_2$, and $F_3$ are obtained from multiple CondConv blocks. Since not all features are essential for the final classification, the attention mechanism is introduced to adaptively choose features from suitable scales. First, the input feature maps are fused by element-wise summation after dimension matching:
$$F = D(F_1) + D(F_2) + D(F_3),$$
where $F \in \mathbb{R}^{w\times w\times c}$ is the fused feature and $D(\cdot)$ is the dimension matching function. Then global average pooling (GAP) is employed to capture the global information of $F$:
$$F_s = \mathrm{GAP}(F) = \frac{1}{w \times w}\sum_{i=1}^{w}\sum_{j=1}^{w} F(i, j),$$
so that the information is aggregated in $F_s \in \mathbb{R}^{c\times 1}$. After that, $F_s$ is further squeezed into a compact feature $F_z \in \mathbb{R}^{\frac{c}{\gamma}\times 1}$ by a fully-connected layer:
$$F_z = \sigma(W F_s),$$
where $\sigma$ is the ReLU activation and $W \in \mathbb{R}^{\frac{c}{\gamma}\times c}$ is the weighting matrix ($\gamma = 8$ in this paper). This reduces the model complexity significantly.
A fully-connected layer and soft attention (a softmax layer) are used to adaptively select features from suitable scales, guided by the compact feature descriptor $F_z$. Let $a, b, c \in \mathbb{R}^{c\times 1}$ represent the soft attention vectors obtained by the softmax layer, and let $a_i$ be the $i$-th element of $a$, and likewise $b_i$ and $c_i$. Owing to the softmax layer, $a_i + b_i + c_i = 1$. Finally, the feature map $F_v$ is obtained by applying the attention weights to the different scales:
$$F_v = a \cdot D(F_1) + b \cdot D(F_2) + c \cdot D(F_3),$$
where $\cdot$ denotes channel-wise multiplication. $F_v$ is used to generate the final features of one branch of SAFNet, denoted by $feat$.
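The AF computation (sum, GAP, squeeze, per-scale attention, weighted sum) can be sketched as follows for a single sample. The inputs are assumed to be already dimension-matched; the per-scale matrices `fc_a`, `fc_b`, `fc_c` are a hypothetical SKNet-style parameterization of the "fully-connected layer and soft attention" described above, and the function names are illustrative.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_fusion(d1, d2, d3, w, fc_a, fc_b, fc_c):
    """AF-style fusion of three dimension-matched maps of shape (c, w, w).

    w: (c // gamma, c) squeeze matrix; fc_a, fc_b, fc_c: (c, c // gamma)
    per-scale matrices (a hypothetical SKNet-style parameterization).
    """
    f = d1 + d2 + d3                             # element-wise summation
    f_s = f.mean(axis=(1, 2))                    # GAP -> (c,)
    f_z = np.maximum(w @ f_s, 0.0)               # ReLU(W F_s) -> (c / gamma,)
    logits = np.stack([fc_a @ f_z, fc_b @ f_z, fc_c @ f_z])   # (3, c)
    a, b, c = softmax(logits, axis=0)            # a_i + b_i + c_i = 1 per channel
    f_v = a[:, None, None] * d1 + b[:, None, None] * d2 + c[:, None, None] * d3
    return f_v, (a, b, c)


rng = np.random.default_rng(0)
channels, gamma, size = 64, 8, 7
d1, d2, d3 = (rng.standard_normal((channels, size, size)) for _ in range(3))
w = rng.standard_normal((channels // gamma, channels))
fc_a, fc_b, fc_c = (rng.standard_normal((channels, channels // gamma)) for _ in range(3))
f_v, (a, b, c) = adaptive_fusion(d1, d2, d3, w, fc_a, fc_b, fc_c)
```

Because the softmax is taken across the three scales for every channel, $F_v$ is a per-channel convex combination of the inputs; feeding the same map three times returns that map unchanged.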
3) Correlation Layer: The two branches extract semantic features from the multitemporal SAR images, and the changes are then predicted after feature integration. Simple integration operations such as summation and concatenation tend to ignore the correlation between the two feature sets. Therefore, a feature correlation operation is developed for feature integration. The features from the AF modules are denoted as $F_{v1}$ and $F_{v2}$, and they are integrated through the correlation layer as shown in Fig. 4. The integrated features are computed by:
$$F_c = F_{v1} * F_{v2},$$
where $*$ denotes the convolution operation (conv), and $F_c \in \mathbb{R}^{b\times 1\times 1\times c}$ is the final feature representation. Because the integration is implemented as a convolution, it participates in parameter optimization during end-to-end training. For the correlation layer, $F_{v1} \in \mathbb{R}^{b\times w\times w\times c}$ and $F_{v2} \in \mathbb{R}^{b\times w\times w\times c}$ are first reshaped to $F_{v1} \in \mathbb{R}^{1\times k\times w\times w}$ and $F_{v2} \in \mathbb{R}^{k\times 1\times w\times w}$, respectively, where $b$ is the batch size and $k = b \times c$ denotes the number of kernels. The group convolution strategy is applied, with the number of groups set to $k$.
Fig. 3. Detailed components of the adaptive fusion module.
Table I shows the implementation details of the proposed SAFNet (a typical branch). The input image patch is resized to 28 × 28 pixels. In each branch of the network, three levels of CondConv blocks are implemented. Conv1 and Conv2 are the transitional convolution operators between the different levels of CondConv blocks. Each CondConv block contains two convolutions, as illustrated in Fig. 2. After feature extraction, the AF module is employed to emphasize meaningful features. As mentioned before, SAFNet contains two branches, so it generates a feature pair $F_{v1}$ and $F_{v2}$, from which $feat_0$ and $feat_1$ are computed.
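The reshape-plus-group-convolution trick can be sketched in NumPy to show what the correlation layer computes. With a $w \times w$ kernel sliding over a $w \times w$ input without padding, each of the $k = b \times c$ groups produces a single value: the sum of the element-wise product of matching channels of $F_{v1}$ and $F_{v2}$ (CNN "convolution" is cross-correlation). The function name `correlation_layer` and the explicit loop over groups are illustrative; a deep learning framework would use a grouped convolution instead.

```python
import numpy as np

def correlation_layer(fv1, fv2):
    """Correlation of two branch outputs via a grouped 'valid' convolution.

    fv1, fv2: (b, w, w, c). fv1 becomes a 1 x k x w x w input and fv2 becomes
    k x 1 x w x w kernels (k = b * c, groups = k).
    """
    b, w, _, c = fv1.shape
    k = b * c
    inp = fv1.transpose(0, 3, 1, 2).reshape(1, k, w, w)
    ker = fv2.transpose(0, 3, 1, 2).reshape(k, 1, w, w)
    out = np.empty((1, k, 1, 1))
    for g in range(k):                    # one group per (sample, channel) pair
        # a w x w kernel on a w x w input with no padding gives a 1 x 1 output
        out[0, g, 0, 0] = np.sum(inp[0, g] * ker[g, 0])
    return out.reshape(b, c).reshape(b, 1, 1, c)


rng = np.random.default_rng(0)
fv1 = rng.standard_normal((2, 5, 5, 8))
fv2 = rng.standard_normal((2, 5, 5, 8))
fc = correlation_layer(fv1, fv2)
```

Setting the number of groups to $k$ keeps samples and channels independent, so each sample's features are correlated only with its own partner from the other branch.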

C. SAFNet Optimization and Change Map Generation
After $feat_0$ and $feat_1$ are obtained, the Euclidean distance is used to measure the feature similarity of the two branches. To learn a metric under which patch pairs are more discriminative, the contrastive loss is employed:
$$L_2 = \frac{1}{2}(1 - y)\, D^2 + \frac{1}{2}\, y \left[\max(0, m - D)\right]^2, \quad D = D(feat_0 - feat_1),$$
where $feat_0$ and $feat_1$ denote the feature maps extracted by the two branches of the convolutional network, respectively, $m$ is a margin, set to 1 in this paper, and $D(feat_0 - feat_1)$ is the Euclidean distance between $feat_0$ and $feat_1$. Here, $y$ is the ground truth label that indicates the similarity of the image pair: $y = 0$ indicates high similarity, i.e., no change in the patch pair, and $y = 1$ indicates that the land cover has changed.
To obtain the final classification label in the change detection task, the cross-entropy loss is used to optimize the network. The features obtained by the AF modules are integrated by the correlation operation, and the label is predicted through the fully-connected layer and softmax layer. The loss between the label prediction and the ground truth is formulated as:
$$L_1 = -\left[y_t \log y + (1 - y_t) \log (1 - y)\right],$$
where $y_t$ is the ground truth and $y$ is the predicted probability of the changed class output by the last fully-connected layer and softmax. $L_1$ supervises the learning of the fused feature between the multitemporal SAR images. The final loss is the combination of $L_1$ and $L_2$:
$$L = L_1 + \lambda L_2,$$
where $\lambda$ is the weight of the contrastive loss $L_2$.
In the experiments, $\lambda$ is empirically set to 0.5. Image patch pairs centered at the selected samples are used to optimize the parameters of SAFNet. As with most CNN models, $L$ is optimized using the backpropagation algorithm. Note that $L_2$ can also be considered a regularization term for $L_1$ that constrains the training process. Finally, the trained SAFNet is used for prediction: each test sample is assigned a label by a feedforward pass through SAFNet ("0" denotes unchanged and "1" denotes changed), and the final change map is then obtained.
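The combined objective $L = L_1 + \lambda L_2$ can be sketched for a mini-batch in NumPy. This sketch assumes the standard margin-based contrastive loss with a factor of 1/2 and a two-class softmax cross-entropy, both averaged over the batch; the function names are hypothetical, and a real implementation would compute gradients with a deep learning framework rather than NumPy.

```python
import numpy as np

def contrastive_loss(feat0, feat1, y, margin=1.0):
    """Batch mean of (1-y) D^2 / 2 + y max(0, m - D)^2 / 2, D = Euclidean distance.

    y = 0: unchanged (similar) pair; y = 1: changed pair.
    """
    d = np.linalg.norm(feat0 - feat1, axis=1)
    return np.mean((1 - y) * 0.5 * d ** 2 + y * 0.5 * np.maximum(0.0, margin - d) ** 2)


def cross_entropy(logits, y):
    """Softmax cross-entropy for two-class logits of shape (n, 2)."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_p[np.arange(len(y)), y])


def total_loss(logits, feat0, feat1, y, lam=0.5, margin=1.0):
    """L = L1 + lambda * L2, with lambda = 0.5 as in the experiments."""
    return cross_entropy(logits, y) + lam * contrastive_loss(feat0, feat1, y, margin)


feat0 = np.array([[0.0, 0.0], [0.0, 0.0]])
feat1 = np.array([[3.0, 4.0], [3.0, 4.0]])   # Euclidean distance 5 for both pairs
logits = np.zeros((2, 2))                    # uniform prediction
y = np.array([0, 1])
loss = total_loss(logits, feat0, feat1, y, lam=0.5)
```

With a distance of 5, the unchanged pair is penalized (12.5) while the changed pair already exceeds the margin and costs nothing, so the contrastive term pulls unchanged pairs together and pushes changed pairs apart only until they reach the margin.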

III. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, the experimental data and evaluation criteria are introduced first. Then we investigate the factors that may influence the performance of SAFNet. Finally, the effectiveness of SAFNet is evaluated by comparison with several state-of-the-art methods.

A. Experiment Data and Evaluation Criteria
To verify the effectiveness of SAFNet for SAR image change detection, experiments are conducted on four real SAR datasets. The first is the San Francisco dataset, acquired by the ERS-2 SAR sensor (spatial resolution 30 m) over the city of San Francisco. The original image is 7749 × 7713 pixels, and we select a typical region of 256 × 256 pixels to evaluate the proposed method. The images were captured in August 2003 and May 2004, respectively. The ground truth image was created by integrating prior information with photo interpretation. The dataset is shown in Fig. 5. The second is the Ottawa dataset, shown in Fig. 6. The two images were acquired by the Radarsat sensor in May 1997 and August 1997, respectively. The dataset is provided by the National Defense Research and Development Canada and shows the changes in areas affected by floods. These images were registered by the automatic registration algorithm from A.U.G. Signals Ltd., which is available through distributed computing at www.signalfusion.com. The Ottawa dataset is 290 × 350 pixels with a spatial resolution of 10 m. The available ground truth was created by integrating prior information and photo interpretation. The third dataset, the Yellow River dataset, was selected from two large SAR images collected in the Yellow River Estuary area of China. Both images were taken by Radarsat-2 (spatial resolution 8 m) in June 2008 and June 2009, respectively. The original images are 7666 × 7692 pixels, which is too large to show details, so two typical regions (306 × 291 pixels and 289 × 257 pixels) are used in the experiments to verify the effectiveness of the proposed SAFNet. The changed regions correspond to newly cultivated farmland. The dataset is shown in Fig. 7. In particular, this dataset is corrupted by noise with different characteristics.
Specifically, one image in the dataset is a single-look image while the other is a four-look image. Therefore, it is very challenging to perform change detection on this dataset.
Both quantitative and qualitative analyses are performed on the four datasets. The qualitative analysis is a visual comparison between the change map generated by each method and the ground truth image. For the quantitative assessment, false positives (FP), false negatives (FN), overall errors (OE), the percentage of correct classification (PCC), and the Kappa coefficient (KC) are used. FP denotes the number of unchanged pixels that are mistakenly detected as changed. FN is the number of changed pixels that are incorrectly detected as unchanged. OE denotes the total number of incorrectly detected pixels, i.e., the sum of FP and FN. The PCC is computed by:
$$PCC = \frac{N_u + N_c - FP - FN}{N_u + N_c},$$
where $N_u$ denotes the total number of unchanged pixels in the ground truth image and $N_c$ denotes the total number of changed pixels. The KC is computed by:
$$KC = \frac{PCC - PRE}{1 - PRE},$$
where
$$PRE = \frac{(N_c - FN + FP) \cdot N_c + (N_u - FP + FN) \cdot N_u}{(N_u + N_c)^2}. \tag{16}$$
Here, the value of KC depends on FP and FN individually rather than on OE alone, so KC reflects the balance between FP and FN to some extent. Because it takes more detailed information about the errors into account, KC is a more persuasive indicator of change detection results than PCC or OE.
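The five criteria can be computed from a binary change map and the ground truth as sketched below. The function name is illustrative; the formulas follow the PCC and KC definitions above, with PCC expressed as a fraction (multiply by 100 for a percentage).

```python
import numpy as np

def change_detection_metrics(pred, gt):
    """FP, FN, OE, PCC, and KC for binary maps (1 = changed, 0 = unchanged)."""
    pred = np.asarray(pred).astype(bool).ravel()
    gt = np.asarray(gt).astype(bool).ravel()
    fp = int(np.sum(pred & ~gt))        # unchanged pixels detected as changed
    fn = int(np.sum(~pred & gt))        # changed pixels detected as unchanged
    oe = fp + fn
    n_c = int(gt.sum())                 # changed pixels in the ground truth
    n_u = gt.size - n_c                 # unchanged pixels in the ground truth
    n = n_u + n_c
    pcc = (n - oe) / n
    pre = ((n_c - fn + fp) * n_c + (n_u - fp + fn) * n_u) / n ** 2
    kc = (pcc - pre) / (1 - pre)
    return {"FP": fp, "FN": fn, "OE": oe, "PCC": pcc, "KC": kc}


gt = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
pred = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
m = change_detection_metrics(pred, gt)
```

In this toy case one false positive and one false negative give PCC = 0.8 but KC = 0.375: with only two changed pixels, a high PCC is easy to reach by chance, and KC discounts that chance agreement.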
B. Parameter Analysis
1) Analysis of the Sample Image Size: We first investigate the change detection performance for different values of r, the size of the input patch pairs employed in SAFNet. We set r to 7, 9, 11, 13, and 15 to show the relationship between r and PCC on the four datasets. As shown in Fig. 8, the PCC values are poor when r = 7, which suggests that deep learning methods cannot extract robust features from such small image patches. On the Ottawa dataset, the proposed SAFNet obtains the best performance when r = 9, and the change detection results tend to deteriorate when r > 9. On the San Francisco and Yellow River I datasets, SAFNet achieves the best PCC values when r = 13, and the PCC values tend to worsen when r > 13. This indicates that large image patches make it difficult to describe the change at the center pixel. Therefore, in our implementation, we set r = 9 and r = 11 on the Ottawa and Yellow River II datasets, respectively, and r = 13 on the San Francisco and Yellow River I datasets.
2) Analysis of the Number of Training Samples: Next, we discuss the relationship between the number of training samples $N_t$ and the experimental results, since training samples are essential. Training samples are randomly selected from the pseudo-label set, with $N_t$ ∈ {1%, 2%, 3%, 4%, 5%, 6%}. From Fig. 9, we observe that it is difficult to obtain excellent classification performance with few training samples, because insufficient training samples lead to over-fitting. The PCC values generally decrease as $N_t$ is reduced, and the curve declines markedly when $N_t$ < 3%. When $N_t$ > 3%, the PCC values tend to be stable. Since more training samples reduce efficiency and may affect generalization to a certain extent, we choose $N_t$ = 4% on the Yellow River, San Francisco, and Ottawa datasets.
3) Analysis of the Hyperparameter: In this paper, the similarity measure and the classification loss are combined to jointly optimize the model parameters, and a hyperparameter λ balances the two. Here, λ is set to 0.01, 0.1, 0.5, and 1. Table II shows that when λ = 1, the model has difficulty focusing on the classification prediction, which determines the final results. When λ > 0.5, the PCC values begin to decrease gradually because the similarity measurement is reduced. This shows that the similarity measure improves discrimination to a certain extent. Therefore, λ = 0.5 is selected as the most appropriate value.

4) Ablation Experiment:
We present an ablation experiment that evaluates the importance of the model components. Table III shows the relationship among the AF module, the correlation layer, and the PCC values. The full model (SAFNet with the AF module and the correlation layer) achieves better performance than the ablated variants. Specifically, it improves PCC by 0.09%, 0.58%, 0.49%, and 0.33% on the four datasets, respectively, compared with the model without the AF module. This is because the AF module adaptively combines the multiscale responses from the convolutional layers, which shows that the AF module effectively alleviates the problem of unstable feature representation. In addition, Table III reveals that the PCC value of the full model is higher than that of the model without the correlation layer, because the correlation layer better integrates the features from the two branches for classification. In summary, the proposed SAFNet benefits from both the AF module and the correlation analysis.

C. Experimental Results and Discussion
To verify the effectiveness of the proposed SAFNet, several existing change detection methods are used for comparison, i.e., PCAKM [39], NBRELM [18], GaborPCANet [17], RMG-FDA [40], ResNet [36], DBN [26], DCNet [41], and ESCNet [42]. PCAKM uses PCA filters for feature extraction, and the extracted features are then classified by k-means clustering. In NBRELM, the neighborhood-based ratio operator is used for DI generation and feature extraction, and the extracted features are then classified by the extreme learning machine (ELM). GaborPCANet uses PCANet to extract discriminative features; PCANet is a simple deep learning network whose convolution filters are PCA filters. In RMG-FDA, a graph-based method is used as the classifier to identify changed pixels. In ResNet, ResNet-18 is employed for pixel-wise classification. DBN applies a deep belief network to the SAR change detection task. DCNet is a deep neural network that cascades multiple channel-weighting-based residual blocks. In ESCNet, two weight-sharing superpixel sampling networks and a siamese neural network based on U-Net are used to mine the information between multitemporal images. For PCAKM, the neighborhood patch size is set to S = 3, and the number of principal components selected by PCA is set to h = 5. In GaborPCANet, f = √2, V = 5, U = 8, and k_max = 2π are used for feature extraction with PCANet. For NBRELM and GaborPCANet, the neighborhood patch size is set to 7 × 7. In RMG-FDA, the thresholds for frequency-domain analysis are set to SmoothVal = 0.1 and t_sal = 0.3, and 6 graphs are used in the random multi-graphs algorithm.
1) Results on the San Francisco Dataset: Fig. 10 provides a visual comparison of the change maps generated by different methods on the San Francisco dataset.

2) Results on the Ottawa Dataset: Fig. 11 illustrates the change maps generated by different methods on the Ottawa dataset. […] 0.47% in PCC. In ESCNet's results, the FP value is the lowest while the FN value is high, because many subtle change regions are eliminated as noise. In addition, the PCC of SAFNet is slightly higher than that of DBN and DCNet, because SAFNet retains details better, which results in lower FN values. In general, the proposed SAFNet is superior to the compared state-of-the-art methods on the Ottawa dataset.

3) Results on the Yellow River Dataset: The change maps generated by different methods on the Yellow River dataset are shown in Fig. 12 and Fig. 13, and the corresponding evaluation criteria are listed in Table VI and Table VII. The Yellow River datasets are severely corrupted by speckle noise, which makes it challenging for traditional techniques to obtain satisfactory results. For the Yellow River I dataset, the change maps of PCAKM and GaborPCANet contain many noisy regions, so both methods suffer from very high FP values. For NBRELM, many changed pixels are missed, and therefore the FN value is relatively high. Although the noise interference is suppressed, much important change information is ignored, which results in a high FN value. The results in Fig. 12 (f)-(i) are better, which indicates that deep learning-based methods can exploit contextual information more effectively. In particular, ESCNet and SAFNet suppress the noise at the bottom of the change map.
For the Yellow River II dataset, a large number of noisy regions can be observed in Fig. 13 (a)-(c). In addition, much change information is missed at the bottom of the changed regions. As a result, the change detection results are seriously deteriorated. For RMG-FDA, noise is effectively suppressed by frequency-domain analysis, but much change information is lost. In contrast, the change maps in Fig. 13 (f)-(h), generated by deep learning-based methods, are closer to the ground truth image. However, DCNet misses the changed information on the left of its change map, so its FN value is relatively high. Generally speaking, the proposed SAFNet […]

Based on the above experiments on four real SAR datasets, the proposed SAFNet outperforms several traditional shallow classification models. Besides, by employing a dual network trained with a similarity loss and a classification loss to extract high-level semantic features, SAFNet achieves better performance than other deep learning-based methods on all four datasets. Furthermore, the AF module improves change detection performance by adaptively fusing multi-level features for classification. Overall, SAFNet has a strong feature-learning capacity, which makes it a powerful and useful tool for SAR image change detection.
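One plausible way to realize adaptive fusion of multi-level features is selective-kernel-style attention, in which a softmax over the layers produces per-channel fusion weights. The NumPy forward pass below is a hypothetical sketch under that assumption, not the AF module as published. It assumes the K feature maps have already been resized to a common shape (C, H, W), and the arrays `w_squeeze` and `w_branch` are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(features, w_squeeze, w_branch):
    """features: (K, C, H, W) same-shaped responses from K layers/scales.
    w_squeeze: (C, d) and w_branch: (K, d, C) are hypothetical learned weights."""
    u = features.sum(axis=0)                       # (C, H, W) element-wise sum of responses
    s = u.mean(axis=(1, 2))                        # (C,) global average pooling
    z = np.maximum(s @ w_squeeze, 0.0)             # (d,) compact descriptor with ReLU
    logits = np.einsum("d,kdc->kc", z, w_branch)   # (K, C) one score per layer and channel
    a = softmax(logits, axis=0)                    # weights over the K layers sum to 1
    fused = np.einsum("kc,kchw->chw", a, features) # per-channel weighted sum of layers
    return fused, a

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 5, 5))              # K=3 layers, C=4 channels, 5x5 maps
fused, a = adaptive_fusion(feats, rng.normal(size=(4, 2)), rng.normal(size=(3, 2, 4)))
```

Because the weights of each channel sum to one across layers, the fused map is a convex combination of the layer responses; a layer can therefore be emphasized or suppressed channel by channel instead of being summed or concatenated with fixed weights.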

D. Noise Robustness Analysis
Speckle noise degrades the quality of SAR images. Because it is multiplicative, it entangles noise with signal. Therefore, we further compare and analyze the noise robustness of the proposed method in this experiment. First, speckle noise is added to the original SAR images as I_n = I · n, where I is the original SAR image and n is the noise. As shown in Table VIII, Var = 40 denotes Gaussian noise with a variance of 40, and so on, while Ray = π/2 denotes Rayleigh-distributed noise with parameter π/2. A larger variance means more severe image degradation. Fig. 14 shows examples of the noise-polluted San Francisco dataset. Gaussian noise reduces the signal-to-noise ratio (SNR) of the image and thus degrades its visual quality. Rayleigh-distributed noise aggravates the interference of multiplicative noise, which is difficult to filter out.
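The noise-injection step I_n = I · n can be sketched as follows. This section does not state the mean of n, how the Gaussian variance is scaled, or how the Rayleigh parameter maps onto a library argument, so the sketch makes explicit assumptions: n has mean 1 in the Gaussian case, the Rayleigh parameter is passed as NumPy's `scale`, and the result is clipped to an 8-bit range. The function name `add_multiplicative_noise` is hypothetical.

```python
import numpy as np

def add_multiplicative_noise(image, kind, param, rng=None):
    """Return I_n = I * n with n drawn independently per pixel (hypothetical helper).

    kind="gaussian": n ~ N(1, param), param is the variance (mean 1 is an assumption).
    kind="rayleigh": n ~ Rayleigh(scale=param) (mapping the paper's parameter to
    NumPy's scale argument is an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=float)
    if kind == "gaussian":
        n = rng.normal(loc=1.0, scale=np.sqrt(param), size=img.shape)
    elif kind == "rayleigh":
        n = rng.rayleigh(scale=param, size=img.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return np.clip(img * n, 0.0, 255.0)  # assumption: 8-bit intensity range

rng = np.random.default_rng(0)
img = np.full((8, 8), 100.0)
noisy_ray = add_multiplicative_noise(img, "rayleigh", np.pi / 2, rng)
```

Since the noise multiplies the signal, bright regions receive larger absolute perturbations than dark ones, which is what makes this kind of noise harder to filter out than additive noise.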
We can observe from Table VIII that as the noise level increases, the performance of the change detection methods deteriorates rapidly. In particular, when Var > 60, the PCC values drop sharply. In addition, the results become worse under the interference of Rayleigh-distributed noise. Although the original images suffer from different levels of noise, SAFNet still reaches acceptable PCC values in most cases. In conclusion, the performance of SAFNet declines only slightly. This is because a stable feature representation is obtained directly from the input images, so the noise interference is suppressed to some extent.

IV. CONCLUSIONS
In the past few years, deep learning methods for SAR image change detection have attracted considerable attention. However, their discriminative feature representation still needs improvement. In this paper, features from different levels are adaptively fused by the AF module, so that meaningful features are emphasized and irrelevant ones are suppressed. In addition, the correlation layer is introduced to further integrate the features from the two-branch networks. Besides, feature extraction in the two-branch networks is guided by a similarity measure: in the projection space, unchanged pixel pairs remain similar, while changed pixel pairs are pushed far away from each other. After the correlation layer, the classification loss is employed to optimize the whole network. Therefore, a distinctive feature representation is achieved for change detection. Compared with state-of-the-art methods, SAFNet exhibits superior performance in terms of both quantitative metrics and visual comparison.
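Similarity-guided training of a two-branch network is commonly implemented with a contrastive loss. The sketch below uses the classic margin-based contrastive loss as an assumption, since the exact similarity loss of SAFNet is not specified in this section; the function name `contrastive_loss` and the default margin are hypothetical. Unchanged pairs (label 0) are pulled together, and changed pairs (label 1) are pushed apart until their feature distance exceeds the margin.

```python
import numpy as np

def contrastive_loss(f1, f2, changed, margin=1.0):
    """f1, f2: (N, D) features from the two branches; changed: (N,) 1 if the pair changed."""
    d = np.linalg.norm(np.asarray(f1, float) - np.asarray(f2, float), axis=1)  # per-pair distance
    y = np.asarray(changed, dtype=float)
    pull = (1.0 - y) * d ** 2                    # unchanged pairs: penalize any distance
    push = y * np.maximum(margin - d, 0.0) ** 2  # changed pairs: penalize distance below margin
    return float(np.mean(pull + push) / 2.0)

f1 = np.zeros((2, 2))
f_near = np.array([[0.0, 0.0], [3.0, 4.0]])  # unchanged pair identical, changed pair 5 apart
loss = contrastive_loss(f1, f_near, [0, 1])
```

In this example the loss is zero: the unchanged pair coincides and the changed pair is already farther apart than the margin. In a full model, such a term would be added to the classification loss computed after the correlation layer.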
With the development of high-resolution Earth observation technology, change detection will serve high-resolution, long time-series, and large-scale scenarios in the future. In addition, multisource image change detection also deserves extensive attention.