SAR Target Detection Based on Domain Adaptive Faster R-CNN with Small Training Data Size

Abstract: It is expensive and time-consuming to obtain a large number of labeled synthetic aperture radar (SAR) images. When the training data size is small, the results of target detection on SAR images using deep network approaches are usually not ideal. In this study, considering that optical remote sensing images are much easier to label than SAR images, we assume that a large number of labeled optical remote sensing images and a small number of labeled SAR images of similar scenes are available, propose to transfer knowledge from optical remote sensing images to SAR images, and develop a domain adaptive Faster R-CNN for SAR target detection with small training data size. In the proposed method, in order to make full use of the label information and realize more accurate knowledge transfer, an instance-level domain adaptation constraint is used rather than a feature-level domain adaptation constraint. Specifically, a generative adversarial network (GAN) constraint is applied as the domain adaptation constraint in the adaptation module after the proposals of Faster R-CNN to achieve instance-level domain adaptation and learn the transferable features. The experimental results on the measured SAR image dataset show that the proposed method has higher detection accuracy than the traditional Faster R-CNN in the task of SAR target detection with small training data size.


Introduction
Synthetic aperture radar (SAR) is an active Earth observation system characterized by all-weather, all-time operation, high resolution and strong penetration. Therefore, SAR can play a role that other remote sensing means cannot. Automatic target recognition (ATR) of SAR images has become a key technology for processing massive SAR image data. A typical SAR image ATR system is usually divided into three stages: detection, discrimination and recognition. SAR image target detection is important for SAR ATR because the performance of the detection stage directly affects the accuracy of the subsequent processing.
The existing SAR image target detection methods can be divided into two types: non-learning target detection algorithms and learning-based target detection algorithms. Constant false alarm rate (CFAR) [1] is a traditional non-learning target detection algorithm that is widely used in SAR systems. With the rapid development of deep learning, learning-based deep convolutional neural networks (CNNs) have been widely and successfully applied in target detection [2][3][4][5][6] and achieve better target detection performance than non-learning algorithms. One example is faster regions with convolutional neural network (Faster R-CNN) [3], which combines a region proposal network based on CNN with a Fast R-CNN detector. Although the SAR target detection methods based on Faster R-CNN can achieve satisfying performance, most of them require a large number of labeled training samples for model learning, which means that their performance drops significantly when the training dataset is small.
Some studies expand the training samples by using artificially generated samples to deal with the small training data size problem. For example, several data generation methods based on generative adversarial network (GAN) models are proposed in [7][8][9][10] to generate realistic infrared images from optical images. Using data from the source domain to assist the task in the target domain of interest is also the idea behind transfer learning. In particular, when the data distribution is assumed to change across the source and target domains while the learning task remains the same, this kind of transfer learning is called domain adaptation [11]. Domain adaptation [12,13] has been studied to improve the performance in the target domain with the help of a source domain that contains abundant samples. The purpose of domain adaptation is to map the data in the source domain and the target domain, which follow different distributions, to the same feature space, and to find a measurement criterion that makes them as close as possible in this space. Then the data in the target domain can be classified by directly using the classifier trained on the source domain [14,15]. Early domain adaptation methods minimize the maximum mean discrepancy (MMD) [12,13,16] to reduce domain discrepancy. Recent works [14,15,17,18] [19]. In target detection research, Inoue et al. achieved a two-step weakly supervised domain adaptation framework using conventional pixel-level domain adaptation methods and pseudo labeling [20]. Chen et al. proposed a domain adaptive Faster R-CNN model to learn domain-invariant features by aligning feature distributions at the image level and instance level through adversarial learning [21]. He et al. and Chen et al. proposed an importance-weighted GRL to re-weight source or interpolation samples [22,23].
In this study, a SAR target detection method based on domain adaptive Faster R-CNN is proposed. It addresses the low detection accuracy caused by the small training data size of SAR images by transferring knowledge from optical remote sensing images in the source domain to SAR images in the target domain through domain adaptive learning. A large number of labeled optical remote sensing images and a small number of labeled SAR images can capture similar scenes, such as parking lots, which contain targets of interest with similar high-level semantic features in both the source and target domains. In the proposed method, the domain adaptation constraint is added after the proposals of Faster R-CNN, which realizes an instance-level constraint and makes full use of label information to achieve accurate domain adaptation learning. More specifically, a GAN constraint is utilized as the domain adaptation constraint in the adaptation module to achieve domain adaptation and learn the transferable features. With the help of label information, the proposed model can accurately narrow the differences between the high-level features of the targets of interest in the target domain and those in the source domain via domain adaptive learning, so as to assist the learning of the target detection model in the target domain. The proposed model can alleviate the overfitting problem caused by the small training data size of SAR images and effectively improve the SAR image target detection performance.
The remainder of this letter is arranged as follows. Section 2 presents the proposed domain adaptive Faster R-CNN framework. Section 3 shows the experimental results and analysis based on the measured SAR data and optical remote sensing data. Section 4 discusses the domain adaptation performance of our method. Finally, Section 5 concludes this letter.

Domain Adaptive Faster R-CNN
SAR target detection methods face the challenge of performance degradation when labeled data for model training are lacking. However, optical remote sensing images are much easier to label, so abundant labels can be obtained to learn features that perform well for detection. In this paper, we aim to transfer the abundant information of optical remote sensing images with a large number of labeled samples to SAR images with a small number of labeled samples to help SAR target detection. As mentioned in the introduction, although there are great differences between optical remote sensing images and SAR images, the targets of interest captured in them have similar high-level semantic features. Thus, it is possible to learn the transferable high-level features between the optical and SAR images via adaptation methods. Let $x_s$ denote a sample from the source domain dataset, and let $x_t$ denote a sample from the target domain dataset, which contains $N_t$ samples.
The source domain has abundant samples to learn features that depict the source domain data well, while the target domain cannot achieve promising performance with small training data size. We first train the source domain detection model with the source domain data, and then try to transfer the knowledge from the source domain to the target domain. First, we initialize the target domain detection model with the source domain detection model. For the source and target domain data, i.e., the optical images and SAR images, which capture similar scenes, we use the distributions over the high-level representations as the bridge for cross-domain data. Moreover, we assume that the marginal distributions over the high-level features across the two domains should be similar. Specifically, to reduce the domain shift, we use the GAN constraint [24] in the domain adaptation module:
$$\min_{F_t} \max_{D} L_{DA} = \mathbb{E}_{x_s}\left[\log D(F_s(x_s))\right] + \mathbb{E}_{x_t}\left[\log\left(1 - D(F_t(x_t))\right)\right]$$
where $F_t$ denotes the feature extraction module of the target domain, $\mathbb{E}_{x_t}[\log(1 - D(F_t(x_t)))]$ is the expectation that the discriminator $D$ judges $F_t(x_t)$ to be a feature belonging to the target domain, $F_s$ denotes the feature extraction module of the source domain, and $\mathbb{E}_{x_s}[\log D(F_s(x_s))]$ is the expectation that the discriminator $D$ judges $F_s(x_s)$ to be a feature belonging to the source domain.
In our model, $F_t$ and $D$ are iteratively updated. Specifically, $D$ tries to maximize the loss $L_{DA}$ and $F_t$ tries to minimize it. The feature extraction module of the target domain $F_t$ and the discriminator $D$ compete in a two-player minimax game: the discriminator tries to distinguish features of the source domain from features of the target domain, and $F_t$ tries to fool the discriminator. As a result, our model can force $F_t(x_t)$ to be indistinguishable from $F_s(x_s)$ and reduce the feature distance.
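The behavior of the adversarial loss can be illustrated with a small numerical sketch. It assumes the standard GAN form implied by the term definitions in the text, $L_{DA} = \mathbb{E}_{x_s}[\log D(F_s(x_s))] + \mathbb{E}_{x_t}[\log(1 - D(F_t(x_t)))]$, and takes the discriminator outputs as plain lists of probabilities. The function name `domain_adaptation_loss` and the example values are hypothetical and are not taken from the paper's implementation.

```python
import math

def domain_adaptation_loss(d_source, d_target, eps=1e-12):
    """Empirical L_DA from discriminator outputs (assumed standard GAN form).

    d_source: D(F_s(x_s)) for a batch of source features, each in (0, 1)
    d_target: D(F_t(x_t)) for a batch of target features, each in (0, 1)
    """
    # Mean of log D(F_s(x_s)): large when source features are judged "source".
    src_term = sum(math.log(p + eps) for p in d_source) / len(d_source)
    # Mean of log(1 - D(F_t(x_t))): large when target features are judged "target".
    tgt_term = sum(math.log(1.0 - p + eps) for p in d_target) / len(d_target)
    return src_term + tgt_term

# Hypothetical discriminator outputs, for illustration only.
# Case 1: D separates the two domains well, so L_DA is close to its maximum of 0.
separated = domain_adaptation_loss([0.99, 0.98], [0.01, 0.02])
# Case 2: F_t has fooled D, which outputs 0.5 everywhere, so L_DA = 2 * log(0.5).
fooled = domain_adaptation_loss([0.5, 0.5], [0.5, 0.5])
```

In this sketch, a discriminator update would push the value toward the first case, while an update of $F_t$ would push it toward the second, which is the alternating behavior described above.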
Then, the objective function of the target domain Faster R-CNN can be given as:
$$\min_{F_t, FC} \; \alpha \left( L_{RPN} + L_{cls\_reg} \right) + \beta L_{DA}, \quad L_{RPN} = L_{RPN\_reg} + L_{RPN\_cls}, \quad L_{cls\_reg} = L_{reg} + L_{cls}$$
where $FC$ denotes the detection module of the target domain; $L_{RPN\_reg}$ and $L_{RPN\_cls}$ are respectively the anchor-location and anchor-classification loss terms of the RPN for the target proposals; $L_{reg}$ and $L_{cls}$ are respectively the box-regression and box-classification probability loss terms of the ROI for the target; and $\alpha$ and $\beta$ are the weight coefficients of $L_{RPN} + L_{cls\_reg}$ and $L_{DA}$, respectively.

Results
To validate the effectiveness of the proposed method, some experiments are conducted in this section. The description of the datasets is firstly presented. Then the experimental results and analysis on the measured data are shown via a domain adaptation scenario.

Description of the Datasets
One SAR image dataset and one optical remote sensing image dataset are adopted to conduct some experiments to verify the effectiveness of our method. In the following, the SAR image dataset is the miniSAR dataset [25], and the optical image dataset is the Toronto dataset [26].
The miniSAR dataset is acquired by U.S. Sandia National Laboratories with the resolution of 0.1 m × 0.1 m. In detail, miniSAR dataset contains nine images with size 1638 × 2510 pixels, seven of which are used for training and the other two for testing. Two SAR images of the miniSAR dataset are shown in Figure 2. In addition, the Toronto dataset covers the city of Toronto with a color depth of 24 bits per pixel (RGB) and 0.15 m spatial resolution. The original image has the size of 11,500 × 7500 pixels, thus the original images are segmented into several subareas. Twelve subarea images are selected as the training set and 11 subarea images are selected as test images, respectively. In Figure 3, we present two images of the Toronto dataset. As we can see from Figures 2 and 3, there is some artificial clutter, including roads and buildings, and some natural clutter such as grasslands and trees.
It is not suitable to directly use the original images as the input of the network, because the original images are very large. In the following experiments, the original training images of the two datasets are cropped into multiple 300 × 300 sub-images, and these sub-images are then used for network training. Moreover, the original test images are also cropped into 300 × 300 sub-images by an overlapping sliding window, where the overlap between adjacent windows is set to 100 pixels. The detection results of these test sub-images are then mapped back to the original test SAR image. Finally, the final detection results are obtained by employing the non-maximum suppression (NMS) deduplication algorithm.
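The test-time tiling and deduplication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 100-pixel setting is interpreted as the overlap between adjacent 300 × 300 windows (so the stride is 200 pixels), the last window is shifted to sit flush with the image border, and the IoU threshold of 0.5 is a hypothetical default. All function names are introduced here for illustration.

```python
def window_origins(length, tile=300, overlap=100):
    """Start offsets of tiles along one image axis (assumed overlap interpretation)."""
    stride = tile - overlap
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        origins.append(length - tile)  # final tile flush with the image border
    return origins

def to_image_coords(box, x0, y0):
    """Shift a sub-image box (x1, y1, x2, y2) back into original image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs: keep the best box, drop its duplicates."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept

# Tile offsets along the 1638-pixel side of a miniSAR image.
origins = window_origins(1638)

# Hypothetical example: the same vehicle detected in two overlapping tiles.
dets = [
    (to_image_coords((210, 50, 250, 90), 0, 0), 0.90),   # from the tile at x0 = 0
    (to_image_coords((12, 50, 52, 90), 200, 0), 0.80),   # from the tile at x0 = 200
]
merged = nms(dets)
```

Because neighboring tiles overlap, a target near a tile border is typically detected more than once; mapping the boxes to image coordinates first and applying NMS afterwards leaves one detection per target.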

Evaluation Criteria
The detection performance of different methods can be quantitatively evaluated via precision, recall, and F1-score [27], which are calculated as:
$$\text{precision} = \frac{TP}{TP + FP}, \quad \text{recall} = \frac{TP}{TP + FN}, \quad \text{F1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
where TP is the number of correctly detected targets, FP is the number of false alarms, and FN is the number of missing alarms. The precision and recall measure the fraction of true positives over all detected results and the fraction of true positives over the ground truths, respectively. The F1-score, the harmonic mean of precision and recall, serves as the main reference for evaluating the detection performance comprehensively.
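As a short worked example of these criteria, the sketch below computes the three scores from detection counts. The function name `detection_scores` and the counts used are hypothetical and chosen only for illustration; they are not results from Table 1.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F1-score from detection counts.

    tp: correctly detected targets, fp: false alarms, fn: missing alarms.
    Zero denominators are mapped to 0.0 to keep the function total.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missing alarms.
precision, recall, f1 = detection_scores(tp=90, fp=10, fn=30)
# precision = 90 / 100 = 0.90, recall = 90 / 120 = 0.75, F1 = 1.35 / 1.65
```

The F1-score lies between the two inputs and is pulled toward the lower one, which is why it penalizes a detector that trades many missing alarms for few false alarms, or the reverse.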

Performance Comparison
In this subsection, the proposed domain adaptive Faster R-CNN method is compared with three conventional SAR target detection methods: the Gaussian CFAR [1], Faster R-CNN [3], and the Faster R-CNN initialized with the source domain detection model. Except for the CFAR method, which does not require training samples, the number of training samples for each method in the target domain is first set to 110. The visual target detection results on the miniSAR dataset of the proposed method and the three comparison methods are shown in Figure 4. To make the experimental results clear and intuitive, the correctly detected target chips are represented as green rectangles, false alarms as red rectangles and missing alarms as blue rectangles. Since the performance of the CFAR methods depends on the detection threshold, all the following results of the SAR target detection methods are the best ones obtained over different thresholds.
As shown in Figure 4a,b, the Gaussian CFAR cannot locate targets accurately and has a lot of false alarms. Since the ground scene in the miniSAR data is quite complex, containing vehicles as the targets to be detected and a lot of buildings, trees, roads and grasslands as clutter, it is hard for the CFAR methods to obtain a satisfying result by selecting a suitable clutter statistical model. Figure 4c,d show the detection results of Faster R-CNN. Compared with the unsupervised CFAR method, Faster R-CNN has better detection performance thanks to labeled information and a deep convolutional network. Figure 4e,f show the detection results of Faster R-CNN initialized with the source domain detection model, which indicate that a simple transfer method provides little help for the SAR detection task. Figure 4g,h show the detection results of our method. Both the missing alarms and false alarms in the results are reduced compared with the abovementioned methods because our method makes use of the knowledge transferred from optical images. For quantitative evaluation and analysis, the missing alarms and the false alarms of all the methods are counted, together with the correctly detected target chips, to verify the performance. Further, to quantitatively analyze the overall SAR target detection results, we calculate the precision, recall and F1-score shown in Table 1 as the evaluation criteria. Compared with the other SAR target detection methods, the proposed domain adaptive Faster R-CNN correctly detects more targets. Meanwhile, the missing alarms and the false alarms of the proposed method are both the fewest. It can be clearly seen from the quantitative evaluation criteria that our domain adaptive Faster R-CNN achieves the highest precision, recall and F1-score.
Specifically, the proposed method is at least 2.81%, 2.44% and 2.61% higher in terms of precision, recall and F1-score than the other compared SAR target detection methods. It can be concluded that the performance of the proposed domain adaptive Faster R-CNN method is significantly superior to that of the other three SAR target detection methods.
Further, the experimental results of each method when the number of training samples in the target domain is reduced to 11 are also given in Table 1. The proposed method yields a nearly 6% improvement in F1-score compared to Faster R-CNN. It can be found that our method brings a greater performance improvement when fewer target domain training samples are available. Then, the experimental results of Faster R-CNN, Faster R-CNN + parameter transferring and the proposed method with different numbers of SAR training samples in the target domain are presented in Figure 5. As shown in Figure 5, the more SAR training samples are used, the better the performance of the original Faster R-CNN and of the other two methods. Moreover, compared to the results of the original Faster R-CNN, both the parameter transferring and the domain adaptation module contribute to the better detection performance of the proposed method. In particular, when fewer SAR training samples are used, the superiority of the domain adaptation module is more obvious. Thus the detection performance of the proposed method is much better than that of the original Faster R-CNN.

Discussion
In order to yield an intuitive visualization of the domain adaptation results, t-distributed stochastic neighbor embedding (t-SNE) is employed to map features from the multidimensional space to a two-dimensional space, so that the features of the Toronto and miniSAR samples can be visualized. Since the miniSAR and Toronto datasets respectively contain 110 and 2600 training samples, we randomly choose 110 samples from the Toronto dataset and use all 110 samples from the miniSAR dataset to show the two-dimensional t-SNE results of the features for the 220 samples. The features of the 220 samples from the two datasets are respectively obtained via the source and target base networks and are then jointly mapped to a two-dimensional space with t-SNE. The projection results are presented in Figure 6. As we can see from Figure 6a, before adaptation, the features of the Toronto and miniSAR samples are clearly separable, which shows that there is great domain discrepancy between the two datasets. In Figure 6b, after feature-level adaptation via our method, the feature distribution of the miniSAR samples is similar to that of the Toronto samples, which illustrates that our method can effectively align the feature distributions and reduce the domain shift.

Conclusions
This paper presents a novel SAR target detection method based on domain adaptive Faster R-CNN, which takes advantage of the domain adaptation to solve the problem of small training data size. The proposed approach is built on the Faster R-CNN model. The instance level domain adaptation is incorporated to learn the transferable features that bridge the cross-domain discrepancy between the optical remote sensing images and SAR images. In this way, the optical remote sensing images are utilized to help learn the features of SAR images that are useful for detection. Detailed experimental comparisons are given to confirm the effectiveness of the proposed method.