Region Matching of SAR Images Using Blocks for Target Recognition

A synthetic aperture radar (SAR) target recognition method based on image blocking and matching is proposed. The test SAR image is first separated into four blocks, which are analyzed and matched separately. For each block, the monogenic signal is employed to describe its time-frequency distribution and local details with a feature vector. The sparse representation-based classification (SRC) is used to classify the four monogenic feature vectors and produce the reconstruction error vectors. Afterwards, a random weight matrix with a rich set of weight vectors is used to linearly fuse the feature vectors and all the results are analyzed in a statistical way. Finally, a decision value is designed based on the statistical analysis to determine the target label. The proposed method is tested on the moving and stationary target acquisition and recognition (MSTAR) dataset and the results confirm the validity of the proposed method.


Introduction
High-resolution synthetic aperture radar (SAR) images provide basis for efficient and accurate intelligence interpretation [1]. e moving and stationary target acquisition and recognition (MSTAR) dataset provided a benchmark for the research and evaluation of SAR target recognition algorithms [2,3]. e resolution of SAR images in this dataset reaches 0.3 m, which can be effectively used for the classification of vehicle targets such as tanks, armored vehicles, and cannons. With nearly 30 years of developments, the SAR target recognition methods on the MSTAR dataset have made great progress in performance. However, these researches also revealed the shortcomings of current methods for the extended operating conditions (EOCs). EOCs in SAR target recognition may be caused by variations of target configurations, backgrounds, sensors, etc. [4]. As a result, the test samples to be recognized may have notable differences with the established training samples. Hence, the recognition problems under the standard operating condition (SOC) are not challenging and more focus should be imposed on solving the nuisance cases under EOCs [5,6].
SAR target recognition methods usually combine feature extraction and classifier design. e two steps are closely coupled to improve the recognition performance. In terms of feature extraction, a rich set of features has been applied into SAR target recognition, which can be generally summarized as geometrical, transformation, and electromagnetic ones. e geometrical features depict target shapes including region, contour, and shadow. In [7][8][9][10], the Zernike and Chebyshev moments were used as basic features to describe the target region. In [11][12][13], the target regions in SAR images are directly matched with the support of morphological operations. In [14][15][16], the target contour or outline was adopted for target recognition. e transformation features are usually extracted based on the pixel distribution in SAR images. Typical algorithms include the projection ones such as principal component analysis (PCA) [17] and nonnegative matrix factorization (NMF) [18] and the decomposition ones such as monogenic signal [19,20] and empirical mode decomposition (EMD) [21]. e electromagnetic features describe the backscattering characteristics of the targets such as the attributed scattering centers (ASC) and polarization [22][23][24][25]. In the classification stage, a decision is made on the features extracted from the test sample. For the transformation features with uniform forms and dimensions, traditional classifiers such as K-nearest neighbor (KNN) [17], support vector machine [26,27] (SVM), and sparse representation classification (SRC) [27][28][29] can be directly employed for classification. For the irregularly arranged and inconsistent features such as target contour points and scattering centers, it is necessary to employ some specially designed classification strategies, such as the similarity measure for the scattering center sets designed in [22][23][24][25]. In recent years, the deep learning models have been also widely used in SAR target recognition like the convolutional neural networks (CNN) [30][31][32]. e deep learning models are directly trained and learned based on the original images, avoiding the traditional manual feature extraction process. e research results verified the effectiveness of the deep learning models for SAR target recognition under the premise of sufficient training samples. For EOCs, the relevant training samples are very limited, which leads to poor adaptability of the deep learning methods for SAR target recognition.
is paper proposes a SAR target recognition method based on image blocking and matching. e original image is separated into several blocks and the target label is determined by comparing and analyzing each block. Under EOCs, the target in SAR images may have local changes caused by noises, occlusions, etc. But in essence, the corrupted test sample can still share high similarities with the corresponding sample from the actual training class. In this sense, by observing and evaluating the local differences and consistency between SAR images, EOCs can be overcome with high effectiveness. e proposed method divides the SAR image into 4 blocks of equal area with the target center as the reference point. For each block, the monogenic signal is employed for feature extraction and a unified feature vector is constructed. According to the properties of monogenic signal, the constructed features can effectively reflect the spectral characteristics and local distribution of the target. For the feature vector constructed from each block, SRC is used for as the basic classifier and the reconstruction error vector of different training classes can be obtained. For the results of 4 blocks, a random weight matrix with massive weight vectors is developed to linearly fuse them. For the correct class, the blocks with low reconstruction errors account for the majority, so its corresponding reconstruction errors from the four blocks have smaller mean and variance. On the contrary, for the wrong class, the mean value of the four reconstruction errors tends to be relatively large and also the variance because of the randomness. Based on statistical analysis, a decision value is defined as the measure to determine the target label. In the experiments, the proposed method is investigated on the MSTAR dataset under different scenarios. e experimental results show its significant superiority over the compared methods under both SOC and EOCs.

SAR Image Blocking
e previous researches showed that EOCs in SAR images are mostly related to the local variations of the target. For example, in the case of configuration variation, the test target only has some local structural differences with the reference one in the training set, which can also be reflected in local pixel distribution and geometric structure in the SAR image. erefore, it is meaningful to fully investigate the local changes of the target as for handling the EOCs. Traditional methods were generally developed based on overall SAR images for feature extraction and classification. In this case, the local changes may cause variations of global feature changes. As a result, the idea of global feature matching may lose some precision for target classification. As a remedy, this study divides the original SAR image into several blocks and then analyzes the target characteristics by each of them separately. Finally, a reliable classification result can be achieved based on the joint analysis of the results from different blocks.
Specifically, the image blocking algorithm used can be implemented mainly in two steps. First, the original image is centralized and the target centroid is adopted as the reference point for the following blocks. Afterwards, the original image is divided along the range and cross range directions to obtain 4 subimages. Figure 1 shows the blocking result of a SAR image from the MSTAR dataset. Each subimage is processed independently. Hence, when a certain subimage has some local variations, its classification result has little influence on other subimages. It is beneficial to obtain the true correlation between the test sample and the training classes, thus improving the classification accuracy.

Feature Extraction
For each subimage from the blocking stage, the traditional target recognition procedure with feature extraction and classification is employed. e monogenic signal is used for feature extraction for those subimages [19,20]. Denote z � (x, y) T as the coordinates in 2D space; f(z) is the image or matrix to be processed. e monogenic signal corresponding to f(z) is calculated as follows: where f R (z) represents the Riesz transform of f(z); i and j are the imagery units along two dimensions of the image. A further decomposition is conducted with three types of components, i.e., local amplitude, local phase, and local orientation, as follows: where f x (z) and f y (z) are resulted from the i-imaginary and j-imaginary parts of f M (z), respectively. Generally, the target recognition methods based on monogenic signal are developed on the three components because they can comprehensively describe the target chrematistics. A(z) reflects the local amplitudes, which describes the intensity distribution. φ(z) and θ(z) depict the structural and geometric properties of the target, respectively. is study constructs a feature vector based on the monogenic components with reference to [15], in which the special parameters were determined for monogenic decomposition and the results are reorganized to a concatenated vector.

SRC.
For the extracted monogenic features, SRC is adopted as the classifier [27][28][29]. e idea of sparse representation assumes that the test sample can be linearly reconstructed by the training samples from the same class. Φ k � [x k,1 , . . . , x k,n k ] ∈ R d×n k (k � 1, . . . , C) is constructed as a local dictionary with n k d-dimensional samples from the kth class; the test sample y is represented as follows: where α k � [α k,1 , . . . , α k,n k ] T ∈ R n k comprises of the linear coefficients.
When the test sample is from an unknown class, the linear representation should be performed on all the potential classes. So, SRC usually conducts the representation over the global dictionary as follows: where Φ � [Φ 1 , . . . , Φ C ] ∈ R d×n is the global dictionary which comprises of samples from C training classes; α � [α 1 , . . . , α C ] T ∈ R n is the global coefficient vector to be solved; ε is the error tolerance. As a nondeterministic polynomial (NP) hard problem, the optimization task in equation (4) is complex to be solved.
ere are two main ways to handle this issue in previous works. One is replacing the ℓ 0 norm by ℓ 1 norm to formulate a convex optimization objective function for smooth solution. Another is using the greedy algorithms, such as the orthogonal matching pursuit (OMP), to obtain an approaching result.

Decision Fusion with Random Weight
Matrix. For the classification results from different subimages, they should be combined and fused to reach a final decision. Although there are different information fusion algorithms in previous works, the linear weighing fusion is a simple but suitable one for this study. Furthermore, to handle the possible instability of a fixed weight vector, the random weight matrix W is designed with multiple choices of weight vectors, in which the elements in each row are subject to the following constraint: For different weight vectors in the weight matrix, disproportionate importance is imposed on different subimages. With a rich set of weight vectors, the complex situations in different subimages can be comprehensively analyzed. e fusion process with the random weight matrix is performed as follows: Here, RV k denotes a row vector related with the kth training class, containing N elements corresponding to reconstruction errors of the N subimages. FV k corresponds to the fused error vector at different choices of random weight vectors. en, for C different training classes, there are C fused vectors denoted as FV 1 , FV 2 , . . ., FV C .
When the test sample is actually from the kth class, the fused errors in FV k tend to be small. Otherwise, these errors are probably at high levels. In addition, the errors at different weight vectors may vary intensively and disorderly. ese statistical phenomena can be used to evaluate the true relationship between the test sample and training classes. At first, the mean and variance of FV k are calculated as μ k � mean(FV k ) and σ 2 k � Var(FV k ). en, a similarity measure is developed as follows to properly evaluate the relation between the test sample and kth training class: Accordingly, with lower μ k and σ 2 k , a higher S k can be achieved, which indicates a higher similarity. After obtaining the similarities between the test sample and different classes, the target label can be determined as follows: indentity(Y) � max k S k , k � 1, 2, . . . , C.
(8) Figure 2 shows the basic implementation process of the proposed method. e image blocking algorithm is used to process all training samples, and a single feature vector is extracted for each subimage based on the monogenic signal. Afterwards, the dictionaries of different subimages are constructed. For the test sample, the same blocking algorithm is used for processing and feature extraction. en, the corresponding four monogenic feature vectors are obtained. Computational Intelligence and Neuroscience SRC is used to classify the feature vectors of the 4 blocks, and the reconstruction error vectors are obtained. Finally, the 4 error vectors are fused using the random weight matrix and the target label of the test sample is determined.

Basics of MSTAR Dataset.
e experiments are designed and conducted based on the MSTAR dataset, a popular and authoritative data source for the evaluation of SAR target recognition algorithms. Ten typical targets shown as Figure 3 are measured with thousands of 0.3 m-resolution SAR images, which is suitable to be used for target identification. With the support of the rich set of SAR images, various conditions or situations can be designed for experimental validations.
To objectively evaluate the performance of the proposed, we also drawn several previous methods in this field for comparison. e first one used Zernike moments of the whole target as features, which were classified by SVM for decision [7]. e second one adopted the monogenic signal and the resulted three types of features were classified by joint sparse representation [20]. e third one employed the ASCs as features and developed a matching algorithm [23]. e fourth one developed a novel CNN architecture, namely, all fully convolutional neural network (A-ConvNet), for SAR target recognition [31], which is chosen as a representation for deep learning-based algorithms. e following tests are conveyed under both SOC and EOCs to provide comprehensive evaluation of the proposed method.

Condition 1: SOC.
As explained in the former texts, SOC is a simple but representative case in SAR target recognition. Table 1 establishes the setup for SOC based on the MSTAR dataset. e training and test samples with 2°depression angle variance are assumed to be highly alike. Figure 4 displays the recognition results of the proposed method with a confusion matrix. As shown, the x and y labels correspond to the 10 targets and the diagonal elements mark the recognition rates of different classes. We define the average recognition rate of the 10 classes as P av � (N C /N T ), in which N C denotes the number of the correctly-classified samples and N T is the total number of all test samples. Correspondingly, the P av of the proposed method is computed as 99.48%. Table 2 summarizes the P av s of all the methods. In comparison with the Zernike method, the blocking of the whole image and decision fusion significantly enhance the final result. Compared with the monogenic method, the joint use of the blocks further improves the recognition performance. e A-ConvNet method ranks second in these methods, validating the high effectiveness of deep learning models when the test samples share high similarities with the training ones.

Condition 2: Configuration Variants.
For the ground targets, it is usual to see their variants for different scenarios. e 10 targets in the MSTAR dataset also have configuration variants and some are chosen as shown in Table 3 to establish the experimental setup. For the BMP2 and T72 targets, their test samples include more configurations than the training sets. e use of BTR70 in this case is mainly causing confusion, thus enhancing the difficulty of the recognition problem. Table 4 lists the P av s of different methods for comparison. With the highest performance, the proposed method maintains the best robustness under configuration variants. e ASC matching method ranks second among the five methods. As local descriptors, the structural modifications caused by configuration variants can be well sensed by the ASC parameters. Compared with the Zernike and monogenic methods, the blocking and fusion strategy in the proposed method contributes to higher recognition performance.

Condition 3: Depression Angle
Variances. When the test samples and the training samples come from two depression angles with large differences, their similarity also decreases sharply, enhancing the difficulty of the recognition problem. Table 5 establishes the experimental setup for configuration variants. e training samples of the three targets are from 17°depression angle, but the corresponding test samples are from 30°and 45°, respectively. Figure 5 shows the  Figure 2: Procedure of implementation of the proposed method. 4 Computational Intelligence and Neuroscience performance of all the methods at the two depression angles for comparison. First, the performance at 30°depression angle is much higher than that at 45°, which shows that the large depression angle change causes intensive influences on the recognition results. Second, at both depression angles, the proposed method achieves the highest P av s because the   Computational Intelligence and Neuroscience blocking patches could better deal with the image variations caused by depression angle changes. In the reference methods, the ASC matching method obtains the best performance due to the robustness of features.

Condition 4: Noise Corruption.
When the test sample is measured with a low signal-to-noise ratio (SNR), it is assumed to have many differences with the one from a high SNR. e original MSTAR images were mainly acquired from high SNRs. To test the method under noise corruption, we first simulate the noisy test sets based on the original test samples. e energy of the original SAR image is used as the reference and the additive Gaussian noises are generated according to the desired SNR [24]. Finally, these noises are added into the SAR images to obtain the noisy image. Based on the noisy test sets at different SNRs, the performance of all the methods is obtained, as shown in Figure 6. It is noticeable that the noises have significant influences on the recognition performance of all the methods. In comparison, the proposed method achieves the highest P av s at different noise levels, validating its superior noise robustness. e blocking algorithm divides the whole image into several patches. e noises in one parch will not affect the other ones. erefore, the noise interferences can be relieved to some extent. In addition, the monogenic features have some robustness to noises. So, the overall noise robustness of the proposed method can be further enhanced.

Condition 5: Partial Occlusion.
e possible occlusion case is also considered in the experiments. For example, when there is a building or obstacles between the target and SAR sensor along the radar view direction, some parts of the target may be occluded and will not be reflected in the measured SAR image. According to the previous works, the directional occlusion model is adopted in this experiment [24]. A certain proportion of the target region is removed from the original image to generate the occluded sample. Based on the simulated test sets at different occlusion levels, the performance of all the methods is obtained, as shown in Figure 7. Similar to the case of noise corruption, the directional occlusions decrease the recognition performance. With the highest P av s at different occlusion levels, the good robustness of the proposed method is validated. With the blocked patches, the occlusions          Computational Intelligence and Neuroscience 7 in one of them will not affect the remaining ones. In this sense, the occlusions can be better handled to make sure the fused decision is more accurate.

Conclusion
e paper proposes a SAR image target recognition method based on block matching. e original SAR image is processed in 4 blocks, and each subblock reflects local characteristics in different directions. e monophonic signal is used to describe the spectral characteristics and local features of each subblock and construct a feature vector. e monomorphic feature vectors of the 4 subblocks are classified by SRC to obtain the reconstruction error vector. Based on the random weight matrix, the reconstruction error vectors of the 4 subblocks are weighted and fused.
rough statistical analysis of the fusion results under multiple sets of weight vectors, the decision variables are designed to obtain sample categories. e experiment sets 4 test conditions in the MSTAR dataset, including standard operating conditions and extended operating conditions. e experimental results show that this method has significant performance advantages compared with the existing methods.
Data Availability e dataset can be accessed upon request to the corresponding author.

Conflicts of Interest
e authors declare no conflicts of interest.