Combination of Joint Representation and Adaptive Weighting for Multiple Features with Application to SAR Target Recognition

For the synthetic aperture radar (SAR) target recognition problem, a method combining multifeature joint classification and adaptive weighting is proposed, with innovations in the fusion strategy. Zernike moments, nonnegative matrix factorization (NMF), and the monogenic signal are employed as feature extraction algorithms, describing the original SAR images with three corresponding feature vectors. The three types of features are jointly represented under the joint sparse representation model. For the reconstruction error vectors from the different features, an adaptive weighting algorithm is used for decision fusion. That is, the weights are obtained adaptively under the framework of linear fusion to achieve a good fusion result. Finally, the target label is determined according to the fused error vector. Experiments are conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset under the standard operating condition (SOC) and four extended operating conditions (EOCs), i.e., configuration variants, depression angle variances, noise interference, and partial occlusion. The results verify the effectiveness and robustness of the proposed method.


Introduction
Synthetic aperture radar (SAR) target recognition has been studied since the 1990s [1]. Existing SAR target recognition methods can be categorized from different aspects. From the aspect of target description, the methods can be categorized as template-based and model-based ones. In the former, the references for the test sample are SAR images collected under different conditions (e.g., azimuths, depression angles, and backgrounds), called training samples [2][3][4]. In the latter, the target characteristics are generated by models, including CAD and scattering center models [5][6][7][8][9][10]. From the aspect of the decision engine, the methods are distinguished as feature-based and classifier-based ones. The former employ or design specific features for SAR images so that their discrimination can be exploited. The latter adopt or develop suitable classifiers for SAR target recognition so that the overall performance can be improved.
According to previous works, the features used in SAR target recognition cover geometric, transformation, and electromagnetic ones. The geometric shape features describe the target area and contour distributions [11][12][13][14][15][16][17][18][19][20][21], such as Zernike moments and outline descriptors. The transformation features can be further divided into two subcategories: projection features and decomposition features. The former aim to find the optimal projection directions by learning from training samples, so that the high dimension of the original images can be reduced efficiently. Typical algorithms for projection features include principal component analysis (PCA) [22], nonnegative matrix factorization (NMF) [23], etc. The latter decompose the original image through a series of signal bases to obtain different layers of descriptors. Representative algorithms for decomposition features include wavelet decomposition [24], monogenic signal [25,26], bidimensional empirical mode decomposition (BEMD) [27], etc. The electromagnetic features focus on the radar backscattering characteristics of targets, e.g., the attributed scattering center [28][29][30][31][32].
Classifiers are usually applied after feature extraction to make the final decisions. The classifiers in previous SAR target recognition methods were mainly inherited from the traditional pattern recognition field, such as K nearest neighbor (KNN) [22], support vector machine (SVM) [33,34], sparse representation-based classification (SRC) [34][35][36], and joint sparse representation [37][38][39]. In recent years, with the development of deep learning theory, models represented by convolutional neural networks (CNN) [40][41][42][43] have also been applied to SAR target recognition with high effectiveness.
Considering the properties of different types of features, multifeature SAR target recognition methods were developed to combine their strengths. These methods can generally be divided into parallel fusion, hierarchical fusion, and joint fusion. Parallel fusion classifies different features independently and then fuses their decisions [44,45]. Hierarchical fusion classifies different features sequentially, and a reliable decision at an earlier stage makes the remaining stages unnecessary [43,46,47]. Joint fusion mainly makes use of multitask learning algorithms, which classify different features in the same framework, such as joint sparse representation [39]. Based on the previous works, this paper proposes a SAR target recognition method that combines the joint representation of multiple features with adaptive weighting. Three types of features, i.e., Zernike moments, NMF, and monogenic signal, are used to describe the target characteristics in SAR images, reflecting the target shape, pixel distribution, and time-frequency properties, respectively. In this sense, the three features have good complementarity.
The joint sparse representation model [48,49] is used to represent the three features, exploiting their inner correlation to improve the representation accuracy. In the traditional decision-making mechanism based on joint sparse representation, the reconstruction errors of different tasks are directly added, and the decision is then made according to the minimum error. In practice, different tasks have different discrimination capabilities and should therefore receive different weights, so the idea of equal weights has certain shortcomings. As a remedy, this paper uses the adaptive weighting algorithm proposed in [50] to obtain the optimal weights for different features. For the reconstruction error vectors of the different types of features, the adaptive weights are solved and used for linear fusion. Finally, the target label of the test sample is decided based on the fused error vector. In the experiments, tests and verifications are carried out on the moving and stationary target acquisition and recognition (MSTAR) dataset. The results of typical experimental setups show the effectiveness and robustness of the proposed method.

Zernike Moments.
Moment features are useful for describing the shape and outline distribution of a region. The well-known Hu moments remain effective for images with a relatively low noise level. However, for SAR images with strong noise, rotations, and translations, the Hu moments may lose their adaptability. The Zernike moments maintain high rotation invariance and noise robustness, which makes them more suitable for describing the regional features of SAR images [11][12][13].
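For an input image expressed as $f(r, \theta)$ in polar coordinates, the Zernike moments are calculated as follows:
$$Z_{nl} = \frac{n+1}{\pi} \int_{0}^{2\pi} \int_{0}^{1} V_{nl}^{*}(r, \theta)\, f(r, \theta)\, r\, dr\, d\theta, \qquad (1)$$
where $n = 0, 1, \ldots, \infty$; $l = 0, \pm 1, \ldots$; $n - |l|$ is even and $|l| \le n$. The Zernike polynomials $V_{nl}(r, \theta) = R_{nl}(r) e^{il\theta}$ are a complete set of orthogonal complex-valued functions on the unit disk $x^2 + y^2 \le 1$, which satisfy the following constraint:
$$\int_{0}^{2\pi} \int_{0}^{1} V_{nl}^{*}(r, \theta)\, V_{mk}(r, \theta)\, r\, dr\, d\theta = \frac{\pi}{n+1}\, \delta_{nm}\, \delta_{lk}. \qquad (2)$$
Based on the Zernike moments, rotation invariants can be generated. When the image is rotated by an angle $\alpha$, the moment of the rotated image, $Z_{nl}^{\alpha}$, differs from $Z_{nl}$ only by a phase factor, so its magnitude is unchanged:
$$Z_{nl}^{\alpha} = Z_{nl}\, e^{-il\alpha}, \qquad |Z_{nl}^{\alpha}| = |Z_{nl}|. \qquad (3)$$
Before calculating the Zernike moments of an image, it is necessary to place the center of the image at the origin of the coordinates and map the pixels to the inside of the unit circle. Based on the principle of Zernike moments, the moments of any order can be obtained. In comparison, the higher-order moments contain more detailed information about the objects in the image. With reference to [11], this paper selects the Zernike moments at the 6th, 7th, 8th, 9th, 10th, and 11th orders (i.e., [n, m] = {[0, 0, 1, 1, 1, 1, 6, 7, 8, 9, 10, 11]}) to construct a feature vector, which describes the target area in the SAR image.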
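The computation described above can be sketched numerically. The following is a minimal illustration, not the implementation used in the paper: it assumes a square image, maps pixel centres into the unit disk, and replaces the integral with a sum over pixels. The function names `radial_poly` and `zernike_moment` are hypothetical.

```python
import numpy as np
from math import factorial

def radial_poly(n, l, r):
    """Zernike radial polynomial R_nl(r), valid for n - |l| even and |l| <= n."""
    l = abs(l)
    out = np.zeros_like(r, dtype=float)
    for s in range((n - l) // 2 + 1):
        coef = ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + l) // 2 - s)
                   * factorial((n - l) // 2 - s)))
        out += coef * r ** (n - 2 * s)
    return out

def zernike_moment(img, n, l):
    """Discrete approximation of Z_nl for a square image (assumption)."""
    size = img.shape[0]
    # Pixel centres mapped to (-1, 1), so the image centre sits at the origin.
    coords = (2 * np.arange(size) + 1 - size) / size
    x, y = np.meshgrid(coords, coords)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = r <= 1.0                      # keep pixels inside the unit disk
    basis_conj = radial_poly(n, l, r) * np.exp(-1j * l * theta)
    pixel_area = (2.0 / size) ** 2         # area element replacing r dr dtheta
    return (n + 1) / np.pi * np.sum(img[inside] * basis_conj[inside]) * pixel_area

# The magnitude |Z_nl| is the rotation-invariant quantity used as a feature.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
print(abs(zernike_moment(image, 6, 2)), abs(zernike_moment(np.rot90(image), 6, 2)))
```

A 90° rotation maps the sampling grid onto itself, so the two printed magnitudes agree up to floating-point error. For arbitrary rotation angles, interpolation makes the invariance only approximate.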

NMF.
NMF provides an efficient way to reduce data dimension. Different from traditional PCA, NMF introduces a nonnegativity constraint, and the resulting projection matrix better preserves the valid information, as validated in previous works [23].
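For an input matrix $D \in \mathbb{R}^{n \times m}$, NMF decomposes it as follows:
$$D \approx WH, \qquad W \in \mathbb{R}^{n \times r},\ H \in \mathbb{R}^{r \times m},\ W \ge 0,\ H \ge 0. \qquad (4)$$
The reconstruction error is employed to evaluate the decomposition precision, which is defined as the squared Euclidean distance as follows:
$$E(W, H) = \| D - WH \|_F^2 = \sum_{i, u} \left( D_{iu} - (WH)_{iu} \right)^2. \qquad (5)$$
The above objective function can be minimized by iterative updates as follows:
$$H_{au} \leftarrow H_{au} \frac{(W^T D)_{au}}{(W^T W H)_{au}}, \qquad W_{ia} \leftarrow W_{ia} \frac{(D H^T)_{ia}}{(W H H^T)_{ia}}, \qquad (6)$$
where $0 \le a < r$, $0 \le u < m$, and $0 \le i < n$.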

Scientific Programming
With the solution of the matrix $W$, its transpose $W^T$ is used as the projection matrix for feature extraction. With reference to [23], this paper employs NMF to obtain an 80-dimensional feature vector for an input SAR image.
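The NMF procedure described in this section can be illustrated with a minimal sketch. It uses the standard multiplicative update rules for the squared Euclidean objective. The function names, the random initialization, the iteration count, and the small constant `eps` that guards against division by zero are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def nmf(D, r, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative n x m matrix D as W (n x r) times H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = D.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ D) / (W.T @ W @ H + eps)
        W *= (D @ H.T) / (W @ H @ H.T + eps)
    return W, H

def reconstruction_error(D, W, H):
    """Squared Euclidean distance between D and its factorization."""
    return float(np.sum((D - W @ H) ** 2))

# Columns of D play the role of vectorized training images (toy data here).
rng = np.random.default_rng(1)
D = rng.random((20, 5)) @ rng.random((5, 30))
W, H = nmf(D, r=5)
feature = W.T @ D[:, 0]   # projection of one sample with the transpose of W
print(feature.shape, reconstruction_error(D, W, H))
```

With `r = 80` and columns holding vectorized SAR images, the same projection would give an 80-dimensional feature vector per image.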

Monogenic Signal.
As a 2D extension of the traditional analytic signal, the monogenic signal has been successfully applied to feature extraction of SAR images [25,26]. Denote the input image as $f(z)$, where $z = (x, y)^T$ represents the pixel location.
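The monogenic signal is calculated as follows:
$$f_M(z) = f(z) - (i, j)\, f_R(z), \qquad (7)$$
where $f_R(z)$ denotes the Riesz transform of $f(z)$, and $i$ and $j$ are imaginary units along different directions. Based on $f_M(z)$, three monogenic features can be generated to describe the local amplitude, local phase, and local orientation:
$$A(z) = \sqrt{f(z)^2 + |f_R(z)|^2}, \qquad \varphi(z) = \operatorname{atan2}\left( |f_R(z)|, f(z) \right), \qquad \theta(z) = \arctan\left( \frac{f_y(z)}{f_x(z)} \right), \qquad (8)$$
where $f_x(z)$ and $f_y(z)$ are the $i$-imaginary and $j$-imaginary components of the monogenic signal, respectively.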
As reported in previous works, the monogenic features reveal the time-frequency properties of the original SAR image, including intensity distribution, structural information, and geometric information. With reference to [25], this paper reorganizes the three features into a single vector, called the monogenic feature vector.
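The three monogenic features can be sketched with a frequency-domain Riesz transform. This is a simplified illustration under stated assumptions: no band-pass (e.g., log-Gabor) filtering is applied, the Riesz kernels are taken as $-\mathrm{i}u/|u|$ and $-\mathrm{i}v/|u|$, and `np.arctan2` is used for the orientation to avoid division by zero, so its range differs from a plain arctangent of the ratio. The function name `monogenic_features` is hypothetical.

```python
import numpy as np

def monogenic_features(f):
    """Return local amplitude, local phase, and local orientation of a 2D image."""
    rows, cols = f.shape
    u = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies
    v = np.fft.fftfreq(rows)[:, None]   # vertical frequencies
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                  # avoid division by zero at the DC term
    F = np.fft.fft2(f)
    # Two components of the Riesz transform (the i- and j-imaginary parts).
    f_x = np.real(np.fft.ifft2(F * (-1j * u / radius)))
    f_y = np.real(np.fft.ifft2(F * (-1j * v / radius)))
    riesz_mag = np.hypot(f_x, f_y)
    amplitude = np.sqrt(f ** 2 + riesz_mag ** 2)
    phase = np.arctan2(riesz_mag, f)
    orientation = np.arctan2(f_y, f_x)
    return amplitude, phase, orientation

# Sanity check on a horizontal cosine wave: the local amplitude is constant.
x = np.arange(64)
wave = np.tile(np.cos(2 * np.pi * 4 * x / 64), (32, 1))
amplitude, phase, orientation = monogenic_features(wave)
print(amplitude.min(), amplitude.max())
```

A monogenic feature vector in the sense of the text could then be formed by downsampling and concatenating the three maps. The exact reorganization follows [25] and is not reproduced here.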

Joint Sparse Representation.
The joint sparse representation is an extended version of traditional sparse representation, which handles several related problems simultaneously [48,49]. As the inner correlations of the different sparse representations are exploited, the overall reconstruction precision can be improved. The multiple features from the same SAR image are related and therefore suitable for joint sparse representation. In the following, the basic process of jointly representing multiple features is described. Assume there are $M$ different features from the sample $y$, denoted as $[y^{(1)}, \cdots, y^{(M)}]$. A general form of joint sparse representation is as follows:
$$\min_{A}\ g(A) = \sum_{l=1}^{M} \left\| y^{(l)} - D^{(l)} a^{(l)} \right\|_2, \qquad (9)$$
where $D^{(l)}$ is the global dictionary corresponding to the $l$th feature and $A = [a^{(1)}, \ldots, a^{(M)}]$ is the matrix formed by the coefficient vectors of the different features.
Minimizing the objective function in equation (9) is equivalent to solving the sparse representation problems of the different features separately. In this sense, it can hardly make use of the inner correlations of different features. As a remedy, the joint sparse representation model in previous works imposes the $\ell_0/\ell_1$ norm on the coefficient matrix $A$, with a new objective function as follows:
$$\min_{A}\ g(A) + \eta \left\| A \right\|_{\ell_0/\ell_1}, \qquad (10)$$
where $\eta$ is the regularization factor. During the solution of equation (10), the coefficient vectors of different features tend to share a similar pattern because of the $\ell_0/\ell_1$ norm constraint. Therefore, the inner correlations among different features can be exploited. It has been reported and validated that simultaneous orthogonal matching pursuit (SOMP) [48] and multitask compressive sensing [49] are suitable for solving the problem. With the solution of the coefficient matrix $A = [a^{(1)}, \ldots, a^{(M)}]$, the reconstruction errors of the different training classes can be calculated to determine the target label as follows:
$$\mathrm{identity}(y) = \arg\min_{i} \sum_{l=1}^{M} \left\| y^{(l)} - D_i^{(l)} a_i^{(l)} \right\|_2, \qquad (11)$$
where $D_i^{(l)}$ is the local dictionary of the $l$th feature with regard to the $i$th class and $a_i^{(l)}$ is the corresponding coefficient vector.
Equation (11) gives the basic decision-making mechanism of the traditional joint sparse representation model applied to classification. In essence, this is a linear weighted fusion with equal weights. That is, the contributions of different features to the final recognition are assumed to be the same. In practice, however, different features are often not equally effective for recognition, so this requires special consideration. Fusion by linear weighting is an effective method for processing multisource information, and its core element is a principled determination of the weights of the different components. To this end, this paper adopts the adaptive weight determination algorithm proposed in [50].
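The joint representation and the equal-weight decision described above can be sketched as follows. This is a simplified greedy solver in the spirit of SOMP, not the exact solver of [48] or [49]. It assumes that every feature dictionary has unit-norm columns, that column `j` of each dictionary comes from the same training sample, and that the Euclidean norm is used for the errors. The function names `somp` and `class_errors` are hypothetical.

```python
import numpy as np

def somp(ys, Ds, sparsity):
    """Greedy joint selection: all features share one support of atom indices."""
    residuals = [y.copy() for y in ys]
    support, coefs = [], []
    for _ in range(sparsity):
        # Sum the correlations over features so one shared atom is selected.
        score = sum(np.abs(D.T @ r) for D, r in zip(Ds, residuals))
        if support:
            score[support] = -np.inf    # never reselect an atom
        support.append(int(np.argmax(score)))
        # Least-squares refit of each feature on the shared support.
        coefs = [np.linalg.lstsq(D[:, support], y, rcond=None)[0]
                 for D, y in zip(Ds, ys)]
        residuals = [y - D[:, support] @ c for D, y, c in zip(Ds, ys, coefs)]
    return support, coefs

def class_errors(ys, Ds, labels, support, coefs, n_classes):
    """(M, C) array: error of feature l when rebuilt from atoms of class i only."""
    support = np.asarray(support)
    selected_labels = np.asarray(labels)[support]
    errors = np.zeros((len(ys), n_classes))
    for l, (y, D, c) in enumerate(zip(ys, Ds, coefs)):
        for i in range(n_classes):
            mask = selected_labels == i
            errors[l, i] = np.linalg.norm(y - D[:, support[mask]] @ c[mask])
    return errors

# Toy example: 3 features, 10 training samples, 2 classes.
rng = np.random.default_rng(0)
labels = np.array([0] * 5 + [1] * 5)
Ds = []
for dim in (12, 80, 30):
    D = rng.standard_normal((dim, 10))
    Ds.append(D / np.linalg.norm(D, axis=0))
ys = [D[:, 7].copy() for D in Ds]           # test sample equals training atom 7
support, coefs = somp(ys, Ds, sparsity=2)
errors = class_errors(ys, Ds, labels, support, coefs, n_classes=2)
print(int(np.argmin(errors.sum(axis=0))))   # equal-weight decision
```

Summing the rows of `errors` before taking the minimum corresponds to the equal-weight decision discussed above. Keeping the per-feature rows separate is what makes a later weighted fusion possible.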
To simplify the description, the reconstruction error vectors of two types of features (denoted as $d_i^1$ and $d_i^2$) are taken as an example to describe the fusion process.

Step 4. The fused reconstruction error $f_i$ ($i = 1, 2, \ldots, C$) is obtained by linear fusion with the adaptive weights. The target label is decided as the class with the minimum error, $k = \arg\min_i f_i$.
The above algorithm analyzes the distributions of the individual reconstruction error vectors while comparing their characteristics. The resulting weights therefore reflect the importance of the different components better than traditional empirical weights such as equal ones. For the reconstruction error fusion of the three types of features in this paper, the same idea is adopted, and the specific algorithm can be found in [50].
Figure 1 shows the basic process of the proposed method. The three types of features each produce a reconstruction error vector over the training classes under the joint sparse representation model. The final reconstruction error vector is obtained using the adaptive weighted fusion algorithm. Finally, the target label of the test sample is determined according to the minimum error.
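The final fusion and decision step can be sketched as below. The weights are passed in as an argument because the weight-solving steps follow [50] and are not reproduced here. The function name `fuse_and_decide` and the normalization of the weights to sum to one are assumptions for illustration.

```python
import numpy as np

def fuse_and_decide(errors, weights):
    """Linearly fuse per-feature error vectors and pick the minimum-error class.

    errors  : (M, C) array, one reconstruction error vector per feature
    weights : length-M array of nonnegative weights (supplied by the caller)
    """
    errors = np.asarray(errors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()     # assumed normalization
    fused = weights @ errors              # fused error f_i for each class i
    return int(np.argmin(fused)), fused

# Two features, three classes (illustrative numbers only).
errors = [[0.2, 0.5, 0.9],
          [0.7, 0.3, 0.8]]
print(fuse_and_decide(errors, [0.5, 0.5])[0])   # equal weights
print(fuse_and_decide(errors, [0.8, 0.2])[0])   # first feature trusted more
```

In this toy case the two weightings select different classes, which illustrates why the choice of weights matters. Equal weights give the same decision as directly adding the errors.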

Experimental Setup.
The MSTAR dataset is used to test and analyze the proposed method. Figure 2 shows the 10 types of typical vehicle targets included in the dataset, e.g., tanks, armored vehicles, and trucks. For each target, the MSTAR dataset provides samples over a relatively complete azimuth range at several depression angles. Table 1 shows the basic training and test sets used in the subsequent experiments, which are from two depression angles of 17° and 15°. Accordingly, the test and training sets have only a small depression angle difference, and their overall similarity is relatively high. Such a situation is generally considered a standard operating condition (SOC). On this basis, simulation algorithms, including noise addition and occlusion generation, are used to obtain test samples under extended operating conditions (EOCs). Furthermore, samples from different target configurations and depression angles can be employed to set up EOCs such as configuration variants and depression angle variances. Based on the above conditions, the performance of the proposed method can be investigated and verified comprehensively.
Some reference methods selected from published works are used for comparison, including three using single features and three using multiple features. The former three use single features, i.e., Zernike moments [11], NMF [23], and monogenic signal [25], which are consistent with the features in the proposed method. The latter three use decision fusion strategies, including parallel fusion [45], hierarchical fusion [46], and joint classification [39], in which the three classified features are the same as in the proposed method. In particular, the joint classification reference method only performs joint sparse representation and directly adds the reconstruction errors of the different features, with no adaptive weighting.
With the training and test sets in Table 1, the proposed method is tested under SOC. Figure 3 presents the recognition results of the proposed method in the form of a confusion matrix. According to the correspondence between the horizontal and vertical coordinates, the diagonal elements mark the correct recognition rates of the different categories. Define the average recognition rate $P_{av}$ as the proportion of correctly classified test samples in the entire test set. The $P_{av}$ of the proposed method is calculated to be 99.38%. Table 2 shows the $P_{av}$ values of all the methods, obtained with the same process on the same platform. The proposed method achieves the highest $P_{av}$, which validates its effectiveness. Compared with the three methods using single features, the multifeature methods achieve obvious advantages, reflecting the complementarity between different features. Among the four multifeature methods, joint classification has an advantage over parallel fusion and hierarchical fusion, mainly because the inner correlations of different features are exploited. In comparison with the joint classification method, the proposed one further enhances the overall recognition performance by introducing adaptive weights, verifying the effectiveness of the proposed strategy.
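The average recognition rate can be computed directly from a confusion matrix of sample counts. This is a minimal sketch with made-up counts, assuming that rows index the true classes and columns the predicted classes. The function name is hypothetical.

```python
import numpy as np

def average_recognition_rate(confusion_counts):
    """Fraction of all test samples that lie on the diagonal (correct decisions)."""
    confusion_counts = np.asarray(confusion_counts, dtype=float)
    return float(np.trace(confusion_counts) / confusion_counts.sum())

# Illustrative two-class counts, not results from the paper.
counts = [[48, 2],
          [1, 49]]
print(average_recognition_rate(counts))
```

When a confusion matrix is reported as per-class percentages rather than counts, the rows first need to be weighted by the number of test samples per class.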

EOC1-Configuration Variants.
For different military applications, the same target may have different models (configurations). When the test sample and the training samples come from different models, the target recognition problem becomes more difficult. Table 3 shows the training and test sets under the condition of model difference, where the test samples and training samples of BMP2 and T72 are from different models. Table 4 shows the recognition results of the proposed method for the different models. The recognition rates indicate that the similarity between the test models and the reference models in the training set varies from model to model. Table 5 compares the $P_{av}$ values of the different methods under this condition. The performance advantage of the multifeature methods over the single-feature methods is still obvious. In the framework of joint classification, the proposed method makes full use of the classification advantages of different features under model differences by introducing adaptive weights, thereby improving the final recognition performance.

EOC2-Depression Angle Variances.
The test sample to be classified may come from a different depression angle than the training samples. Considering the sensitivity of SAR images to view angles, it is difficult to correctly classify test samples from different depression angles. Table 6 sets up the training and test samples with different depression angles, in which the test set includes samples from 30° and 45° depression angles. Figure 4 shows the $P_{av}$ values of the different methods at the two depression angles. The large depression angle difference (45°) causes significant performance degradation for all the methods. At both depression angles, the $P_{av}$ values of the proposed method are the highest, verifying its robustness. Based on multifeature joint representation, the proposed method adaptively obtains the weights of different features, so their effectiveness under depression angle variances can be better utilized.

EOC3-Noise Corruption.
In practice, the obtained test samples of noncooperative targets are contaminated by varying degrees of noise. As the difference in signal-to-noise ratio (SNR) between the test and training samples increases, their correlation decreases. Based on the basic training and test sets in Table 1, this paper uses noise simulation [31] to construct test sets with different SNRs from the original test samples, including -10 dB, -5 dB, 0 dB, 5 dB, and 10 dB. The proposed and reference methods are tested under the different noise levels, and the statistical results are shown in Figure 5. Noise corruption has a significant impact on recognition performance. Nevertheless, the proposed method maintains the highest $P_{av}$ at each SNR, verifying its robustness. Similar to the results under SOC, the multifeature methods still clearly outperform the single-feature ones. Among the three single-feature methods, the ones using Zernike moments and monogenic features are more robust than the one using NMF features, which reflects the different sensitivities of different features to noise interference. By effectively fusing the different features, the proposed method achieves better robustness to noise corruption.

EOC4-Partial Occlusion.
Occlusion situations are also very common in practical applications. As a result, part of the ground target cannot be illuminated by the radar waves and returns no echoes. Using a similar idea to the noise simulation, this paper constructs test sets with different occlusion levels based on the test samples in Table 1, according to the target occlusion model in [31]. The recognition results of the proposed and reference methods are shown in Figure 6. The comparison shows that the proposed method maintains the highest $P_{av}$ at the different occlusion levels, reflecting its robustness.
The proposed method comprehensively exploits multiple types of features and uses adaptive weights to emphasize the features that are more effective under occlusion. Therefore, the final recognition results are improved.
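The two simulated conditions (noise corruption and partial occlusion) can be illustrated with a rough sketch. It does not reproduce the models of [31]: it assumes additive white Gaussian noise scaled to the mean image power, and an occlusion that replaces a fraction of target pixels, taken from the left side, with the background mean. The function names, the `target_mask` input, and the `level` parameter are hypothetical.

```python
import numpy as np

def add_noise(image, snr_db, rng=None):
    """Add white Gaussian noise so the image-to-noise power ratio is snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return image + rng.normal(0.0, np.sqrt(noise_power), image.shape)

def occlude(image, target_mask, level):
    """Replace a fraction `level` of target pixels (leftmost first) with background."""
    out = np.asarray(image, dtype=float).copy()
    target_mask = np.asarray(target_mask, dtype=bool)
    rows, cols = np.nonzero(target_mask)
    order = np.argsort(cols, kind="stable")   # occlusion advances from the left
    k = int(round(level * rows.size))
    background = out[~target_mask].mean()
    out[rows[order[:k]], cols[order[:k]]] = background
    return out

# Toy scene: a bright square target on a dark background.
scene = np.zeros((20, 20))
scene[5:15, 5:15] = 1.0
mask = scene > 0
noisy = add_noise(scene, snr_db=0, rng=np.random.default_rng(0))
occluded = occlude(scene, mask, level=0.3)
print(noisy.shape, int((occluded[mask] == 0).sum()))
```

Test sets at several SNRs or occlusion levels would then be built by applying these functions to every original test sample.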

Conclusion
This paper applies joint sparse representation and adaptive weighting to SAR target recognition. For the reconstruction error vectors of the three types of features resulting from the joint sparse representation, the corresponding weights are determined adaptively, reflecting the different contributions of the features to the final classification result. Based on the MSTAR dataset, experiments were carried out to test the recognition performance under a typical SOC and four representative EOCs. The proposed method achieves a $P_{av}$ of 99.38% for 10-class targets under SOC and outperforms the reference methods under the different EOCs. The experimental results show the high effectiveness and robustness of the proposed method, which has certain advantages and potential in practical use.
Data Availability
The dataset used in this paper is publicly available.