Physically Explainable CNN for SAR Image Classification

Integrating the special electromagnetic characteristics of Synthetic Aperture Radar (SAR) into deep neural networks is essential for enhancing the explainability and physics awareness of deep learning. In this paper, we first propose a novel physically explainable convolutional neural network for SAR image classification, namely physics guided and injected learning (PGIL). It comprises three parts: (1) explainable models (XM) to provide prior physics knowledge, (2) a physics guided network (PGN) to encode the knowledge into physics-aware features, and (3) a physics injected network (PIN) to adaptively introduce the physics-aware features into the classification pipeline for label prediction. A hybrid Image-Physics SAR dataset format is proposed for evaluation, and experiments are conducted on both Sentinel-1 and Gaofen-3 SAR data. The results show that the proposed PGIL substantially improves the classification performance in the case of limited labeled data compared with the counterpart data-driven CNN and other pre-training methods. Additionally, the physics explanations are discussed to indicate the interpretability and the physical consistency preserved in the predictions. We believe the proposed method will promote the development of physically explainable deep learning in the SAR image interpretation field.


Introduction
Synthetic Aperture Radar (SAR), as an active microwave sensing technology, can work in all-day, all-weather conditions. Different from optical remote sensing images, which are close to the visual understanding system of human eyes, SAR images reflect the electromagnetic characteristics of objects and terrain. In order to understand SAR images in a more comprehensive way, artificial intelligence approaches should pay close attention not only to the visual information, but also to the physical properties of SAR.
SAR image classification is a basic task, aiming to assign a semantic label to each SAR image patch. Some conventional theory-driven approaches were explored to extract hand-crafted features based on the expertise of SAR, e.g., the statistical-model-based methods Leng et al. (2020); Gao et al. (2017) and the physical-model-based methods Leng et al. (2019). These model-based approaches have strong interpretability, yet the feature selection and classifier design are time-consuming and lack flexibility. As a comparison, data-driven deep learning approaches can build an end-to-end system to learn hierarchical features automatically and predict the semantic labels simultaneously without human intervention, superior to the pure model-based methods on SAR image classification tasks Huang et al. (2017); Chen et al. (2016).
Nevertheless, the current data-driven solutions for SAR image classification still face several challenges. The first is the contradiction between data-hungry deep learning approaches and the expensive cost of manual annotation for SAR. At present, some pre-training related methods are popular for tackling this issue, such as transfer learning and self-supervised learning. Transfer learning methods utilize models pre-trained on other data domains (like natural images, optical remote sensing imagery, etc.) via fine-tuning Huang et al. (2017), domain adaptation Huang et al. (2020b), meta-learning Fu et al. (2022), etc. Self-supervised learning usually takes the current-domain data without annotations to optimize a designed contrastive or pretext task, obtaining a pre-trained model for the downstream classification Wen et al. (2021); Ren et al. (2021).
Despite the good performance of transfer learning and self-supervised learning on SAR image classification, the predictions of most deep models lack physical explanation. In consideration of the special physical characteristics underlying SAR images, it is important to develop hybrid approaches that blend deep learning algorithms with physical models in the SAR domain to keep the predictions consistent with physics and expert knowledge, which remains a big challenge in current research. Early feature fusion methods that combine CNN features with handcrafted descriptions of SAR images were proposed Zhang et al. (2020); Wang et al. (2021); Sun et al. (2020), but limitations remain in providing explanations and physical insights for SAR. More advanced hybrid approaches are required to embed the prior scientific knowledge from physical models into deep neural networks de Bézenac et al. (2019), particularly in the SAR image classification domain, where the relevant studies are at an initial stage with a rising trend Huang et al. (2020a).
To meet the above challenges in SAR image classification, we propose a novel physically explainable CNN that blends the data-driven and model-driven approaches to achieve impressive generalization with limited labeled data and physically explainable predictions.

Figure 1: A surrogate task is built based on the physical information of SAR, h_phy, derived from the explainable model. Thus, the prior knowledge of the physical model is embedded as a feature representation, which is successively injected into the main classification task to learn the semantic label y.

Our motivations are two-fold, as depicted in Fig. 1. Firstly, we intend to build a physics-inspired data-driven model for SAR image classification, as in Daw et al. (2020); Park and Park (2019); Svendsen et al. (2018) from other research fields, which embeds the prior knowledge of the physical model into the neural network. Secondly, inspired by self-supervised learning, where semantic feature embeddings are learned without supervision, we set a surrogate task based on the physical model to leverage unlabeled SAR images and further support the main classification task. The proposed method, namely physics guided and injected learning (PGIL), is composed of three modules: explainable models (XM), a physics guided network (PGN), and a physics injected network (PIN). For evaluation, a hybrid Image-Physics dataset format is proposed, equipped with both SAR image patches and the corresponding physical scattering mechanisms. Sufficient experiments are conducted mainly on the Sentinel-1 sea-ice classification dataset and also on Gaofen-3 SAR data to demonstrate the effectiveness of each module in the proposed method. The results show that the proposed PGIL exhibits remarkable generalization performance compared with the counterpart CNN architecture with supervised learning, transfer learning, and self-supervised learning in the case of limited labeled data. More importantly, the physical explanations are discussed to demonstrate how the prior knowledge constrains the network during training and how the physics consistency is maintained in the predictions.
The contributions are summarized as follows: 1. A novel physically explainable deep learning method is proposed for SAR image classification that deeply integrates the data-driven and theory-driven approaches. 2. By establishing a novel surrogate task based on explainable physical models, an unsupervised physics guided network is optimized to learn general features aware of the prior knowledge. 3. A hybrid Image-Physics dataset format is proposed for evaluation, which combines the image and physics information of SAR in a concise way. 4. We analyze the physics awareness of the features, the good generalization with limited labeled data, and the explainability as well as the physics consistency of the predictions through sufficient experiments and discussions.
The rest of this paper is organized as follows. Section 2 reviews the background knowledge of physical models applied in this paper. Section 3 presents the physics guided and injected learning (PGIL) neural network for SAR image classification. The experiments and discussions are given in Section 4. Finally, Section 5 provides the conclusions.

Background
In this section, we introduce the background of the explainable theory-driven models for SAR applied in the proposed method.
The first is the target decomposition model for PolSAR data, which represents the target scattering by several basic scattering mechanisms. One of the well-known methods is the Cloude-Pottier decomposition for full-polarized SAR Cloude et al. (1997), with the entropy H and the angle α calculated from the coherency matrix. The H/α plane is separated into nine zones to depict different scattering characteristics of full-pol SAR data, as shown in Fig. 2(a). The scattering mechanism classification result can be obtained via the complex Wishart classifier proposed in Jong-Sen Lee et al. (1999). Afterwards, the Cloude-Pottier decomposition model was extended to dual-polarized SAR images Ji and Wu (2015). In this paper, we employ the Cloude-Pottier decomposition for both full-pol and HH/HV dual-pol SAR data Cloude et al. (1997); Ji and Wu (2015).

Figure 2: (a) The H/α plane Cloude et al. (1997) demonstrating the scattering mechanisms for full-polarized SAR data. (b) The time-frequency analysis model in the HDEC-TFA method Huang et al. (2021a), where the backscattering variations of the sub-band scattering pattern S(x_0, y_0) in different range and azimuth bandwidths of SAR targets are characterized.

Note that the bandpass filters are illustrated in Fig. 2(b), and the scattering mechanism classification results are obtained by SNAP software.
The polarimetric decomposition is no longer available for single-channel SAR image data. The second model we introduce in this paper is therefore based on time-frequency analysis for single-polarized SAR data, as shown in Fig. 2(b). The 2-dimensional short-time Fourier transform based time-frequency analysis on complex-valued high-resolution SAR data characterizes the backscattering intensity variations of targets in different range and azimuth bandwidths, denoted as the sub-band scattering pattern Huang et al. (2021a). Given a specific target at position (x_0, y_0) and a segment centered at (x_0, y_0), the sub-band scattering pattern of the target is defined as

S(x_0, y_0) = {σ(f_i, f_j)},   (1)

where σ(f_i, f_j) denotes the backscattering intensity of the segment filtered by B(f_i, f_j), and B(f_i, f_j) represents a series of bandpass filters centered on frequency pairs {(f_i, f_j)} in both the range and azimuth directions. The details can be found in Huang et al. (2021a). Fig. 2(b) gives some examples of the extracted sub-band scattering patterns for different targets. A learning-based HDEC-TFA method was proposed in Huang et al. (2021a) to classify the scattering patterns. These two physical models will be applied in our proposed method.
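As a rough illustration of the sub-band scattering pattern described above, the sketch below splits the 2-D spectrum of a complex segment into ideal rectangular sub-bands, a stand-in for the bandpass filters B(f_i, f_j), and records the mean intensity of each filtered segment. This is a minimal numpy sketch, not the authors' implementation; the function name and the uniform band split are illustrative assumptions.

```python
import numpy as np

def subband_scattering_pattern(segment, n_range=3, n_azimuth=3):
    """Split the 2-D spectrum of a complex SAR segment into rectangular
    sub-bands (a stand-in for the bandpass filters B(f_i, f_j)) and record
    the mean backscattering intensity of each filtered segment."""
    spec = np.fft.fftshift(np.fft.fft2(segment))
    h, w = spec.shape
    rows = np.array_split(np.arange(h), n_range)    # range sub-bands
    cols = np.array_split(np.arange(w), n_azimuth)  # azimuth sub-bands
    pattern = np.zeros((n_range, n_azimuth))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            band = np.zeros_like(spec)
            band[np.ix_(r, c)] = spec[np.ix_(r, c)]     # apply B(f_i, f_j)
            sub = np.fft.ifft2(np.fft.ifftshift(band))  # filtered segment
            pattern[i, j] = np.mean(np.abs(sub) ** 2)   # intensity sigma(f_i, f_j)
    return pattern

# a segment with a flat spectrum yields a nearly uniform 3x3 pattern
rng = np.random.default_rng(0)
seg = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
S = subband_scattering_pattern(seg)
```

By Parseval's theorem, the band intensities of this sketch sum to the mean intensity of the original segment, so a target with anisotropic backscattering produces an uneven pattern.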

Overview
In most patch-wise SAR image classification methods, the processed SAR amplitude images, denoted as x, are considered, rather than the original complex product s, to predict the image label y. Thus, the intrinsic electromagnetic characteristics of SAR are not considered but desired. To this end, our proposed physics guided and injected learning (PGIL), as summarized in Fig. 1, leverages the visually friendly image data and the prior knowledge underlying the physical model. The basic motivation is to embed the physics knowledge into the neural network effectively.
Three main modules are included in PGIL, namely the explainable models (XM), the physics guided network (PGN), and the physics injected network (PIN). XM offers prior knowledge of the physical model. PGN converts the prior knowledge into a feature embedding, which is successively fused in PIN for label prediction. The overall framework is depicted in Fig. 4.
XM acts on the complex SAR image data s, from which the explainable descriptor h_phy is obtained to represent the physical scattering properties of the SAR image:

h_phy = f_XM(s).   (2)

h_phy plays a major role in establishing the surrogate task of PGN for optimization, and is therefore referred to as the physics guided signal. PGN follows an unsupervised learning manner and outputs a feature embedding aware of the prior physical knowledge, namely the physics-aware features f_phy. The mapping function is written as:

f_phy = f_PGN(x).   (3)

Finally, PIN is proposed to complete the main classification task, into which the physics-aware features are injected, denoted as

ŷ = f_PIN(x, f_phy).   (4)

Explainable Models
As introduced in Section 2, the physics-based H/α-Wishart Cloude et al. (1997); Ji and Wu (2015) and HDEC-TFA Huang et al. (2021a) models, serving as a part of XM, are adopted to obtain the scattering mechanisms of the targets in the SAR image, denoted as P(s). The discrete physical scattering labels P(s) either depict different zones in the H/α plane Cloude et al. (1997); Ji and Wu (2015), or refer to different targets with diverse scattering variation patterns (Fig. 2(b)) Huang et al. (2021a).
Compared with the SAR image label y, P(s) is too physics-specific to offer semantic information. With a view to optimizing PGN to obtain a feature embedding that is aware of the physics knowledge as well as semantically distinctive, we additionally employ Latent Dirichlet Allocation (LDA) on P(s) to output the topic mixture h_phy, denoted as h_phy = f_LDA(P(s)).
In LDA topic modeling, a document formulated with the bag-of-words representation is characterized by a distribution over latent topics, and each topic is represented by a distribution over the words in the vocabulary. A corpus is gathered from the dataset to train the LDA model in an unsupervised manner, and the generative process is explainable. The details can be found in the related literature Blei et al. (2003); Rasiwasia and Vasconcelos (2013). We redefine the essential variables of LDA on the basis of SAR scattering characteristics as follows.
• word vector: Randomly crop an 8×8 area from P(s) and calculate its normalized histogram as the word vector.
• vocabulary: With randomly generated word vectors, apply the k-means algorithm Hartigan and Wong (1979) to obtain the vocabulary with V cluster centers.
• document: For each SAR image patch, the set of scattering word vectors is gathered by tiling P(s) with a step size of 4. The document is given by the frequency of each word in the vocabulary.

• corpus: The corpus collected for training the LDA model is formed as a matrix of size D × V, where D is the number of documents.
Finally, h_phy is obtained as the topic mixture (namely Bag of Topics, BoT) of P(s), where the j-th element of h_phy denotes the score of the j-th topic. Generally, the elements of h_phy sum to 1.
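The word and document construction above can be sketched as follows. This is a toy numpy illustration under stated assumptions: the function names are hypothetical, the k-means vocabulary is replaced by randomly sampled word vectors for brevity, and the final LDA fit is only indicated in a comment.

```python
import numpy as np

def word_vectors(phy, n_classes, win=8, step=4):
    """Tile the scattering-label map P(s) and return the normalized
    histogram of each win x win window as a 'word vector'."""
    h, w = phy.shape
    words = []
    for i in range(0, h - win + 1, step):
        for j in range(0, w - win + 1, step):
            patch = phy[i:i + win, j:j + win]
            hist = np.bincount(patch.ravel(), minlength=n_classes)
            words.append(hist / hist.sum())
    return np.array(words)

def document(words, vocabulary):
    """Assign each word vector to its nearest vocabulary entry (the
    k-means centers) and return the normalized word-frequency document."""
    d2 = ((words[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return np.bincount(idx, minlength=len(vocabulary)) / len(idx)

# toy example: a 2-class scattering map and a 4-entry vocabulary
rng = np.random.default_rng(0)
phy = rng.integers(0, 2, size=(32, 32))
W = word_vectors(phy, n_classes=2)
vocab = W[rng.choice(len(W), size=4, replace=False)]  # stand-in for k-means centers
doc = document(W, vocab)
# fitting LDA on a corpus of such documents (e.g. with
# sklearn.decomposition.LatentDirichletAllocation) then yields the
# T-dimensional topic mixture h_phy for each patch
```

Each document is a normalized bag of words, so its entries sum to 1, matching the property of the BoT vector described above.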

Physics Guided Network
The role of PGN lies in embedding the prior physics knowledge in a neural network, so as to extract physics-aware features with the semantic discrimination beneficial to classification. The optimization of PGN is motivated by the pretext task setting in self-supervised learning Misra and Maaten (2020). Tȃnase et al. Tȃnase et al. (2017) pointed out that the topic semantics of scattering properties are close to the human semantics used for basic land-cover types. Correspondingly, we propose to build a surrogate task under the following assumption: the SAR image features and the topic mixture of the physical scattering labels should share common attributes at the semantic level. In other words, the physics descriptor h_phy can be partly represented by high-level deep features extracted from the SAR image x.
We apply the first three residual blocks of ResNet-18 He et al. (2016) as the SAR image feature extractor, denoted as f_I(x). Given the weak relationship between the physics-specific topics h_phy and the image-specific features, we design the physics mapping layer (PML), denoted as f_PML, to narrow the knowledge gap. The PML is composed of a convolution module and a fully-connected layer, mapping the image representations to the physics topic space:

z = f_PML(f_I(x)),   (5)

where z ∈ ℝ^T and T is the topic number. Writing z_j and h_j for the j-th elements of z and h_phy, the following objective function describes the soft semantic relations between them:

L_soft = Σ_{j∈Ω} (z_j − h_j)²,   (6)

where Ω denotes the topics that can be represented by features from the SAR vision domain. Equa. (6) is a relaxed constraint, where only the related semantics are considered to be similar. As a comparison, the hard constraint is

L_hard = Σ_{j=1}^{T} (z_j − h_j)²,   (7)

where the unrelated semantics (j ∉ Ω) are additionally required to be highly different, with their responses suppressed toward the near-zero topic scores.
We choose the soft constraint in our method considering the semantic gap between the SAR physics knowledge and the visual perception of image data. The follow-up experiments will discuss the differences between the soft and hard constraints.
In order to simplify the gradient descent optimization, we modify Equa. (6) as

L_PGN = Σ_{j=1}^{T} a_j (z_j − h_j)²,   (8)

where z_j and h_j denote the j-th elements of the mapped image features and of the topic mixture h_phy, respectively, and a_j is an activation term with a value of 1 or 0. We choose the locations where h_j ⩾ ε as activated ones (a_j = 1) with a probability of p, otherwise they are deactivated (a_j = 0). The parameter ε filters the remarkable attributes in h_phy, that is, only the significant semantic topics are considered as possibly related. The probability p decides that only a part of the significant semantic topics is selected as related.
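A minimal sketch of this activation-masked objective, with a hypothetical function name and numpy in place of the actual training code: topics whose score reaches the threshold are kept as candidates, each candidate is activated with probability p, and only activated topics contribute to the squared error.

```python
import numpy as np

def pgn_loss(z, h_phy, eps=0.1, p=0.9, rng=None):
    """Activation-masked squared error between the mapped image features z
    and the topic mixture h_phy: topics with h_phy[j] >= eps are kept as
    candidates, and each candidate is activated with probability p."""
    if rng is None:
        rng = np.random.default_rng()
    significant = h_phy >= eps                       # eps filters remarkable attributes
    a = significant & (rng.random(h_phy.shape) < p)  # random subset of candidates
    return float(np.sum(a * (z - h_phy) ** 2))

h = np.array([0.5, 0.3, 0.15, 0.05])   # toy BoT vector (sums to 1)
z = np.array([0.4, 0.3, 0.0, 0.9])
loss = pgn_loss(z, h, eps=0.1, p=1.0)  # p=1: every significant topic is active
# only the first three topics (h_j >= 0.1) contribute:
# (0.4-0.5)^2 + 0 + (0.0-0.15)^2 = 0.0325
```

Note how the last topic, despite the largest mismatch, is ignored because its score falls below the threshold; this is the relaxation that tolerates the semantic gap between physics topics and image features.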

Physics Injected Network
The physics injected network (PIN) is designed to inject the physics-aware features obtained from the unsupervised PGN into the traditional deep neural network. The injected features provide abundant prior information for the deep network training and are adapted to satisfy the classification task as far as possible.
Deciding which layer of PGN provides the physics-aware features to be injected is crucial. In our work, we select the output of ResBlk-3 as the physics-aware features, that is, f_phy = f_I(x). This decision will be discussed in the following experiments.
We propose a simple injection strategy that adds the transformed physics-aware features to the mid- and high-level layers of the traditional classification network successively. The blue modules in Fig. 4 form the conventional ResNet-18 classification network, denoted as f_R18. The transform layers, shown in the red module in Fig. 4 and denoted as f_T, are designed to convert the physics-aware features to the same size as the destination. The transform module is composed of a 1×1 convolution layer and, if needed, an upsampling layer, for channel and feature size transformation respectively. The final output of the physics injected neural network is written as ŷ = f_R18(x; f_T(f_phy)), i.e., f_R18 evaluated with the transformed features f_T(f_phy) added at its intermediate layers. For classification, the cross-entropy (CE) softmax loss function is widely used. Here, we denote the CE loss as

L_CE = CE(y, ŷ),   (9)

where ŷ = f_R18(x; f_T(f_PGN(x))). In order to make the physics-aware features more adaptive to the classification task, we add the soft constraint in Equa. (8) as a regularization term with a small weight and fine-tune the PGN slightly during supervised classification training. The total loss function is written as

L = L_CE + λ L_PGN.   (10)
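The transform-and-add step can be sketched in numpy terms, since a 1×1 convolution is just a matrix product over the channel axis. The function name, shapes, and nearest-neighbour upsampling choice are illustrative assumptions, not the actual network code.

```python
import numpy as np

def transform_and_inject(f_phy, dest, weight):
    """Sketch of a transform layer f_T: a 1x1 convolution (a matmul over
    the channel axis) maps the physics-aware features to the destination
    channel count, nearest-neighbour upsampling matches the spatial size,
    and the result is added to the destination feature map."""
    # 1x1 conv: (C_out, C_in) weights applied at every spatial location
    t = np.einsum('oc,chw->ohw', weight, f_phy)
    # nearest-neighbour upsampling to the destination spatial size
    sh = dest.shape[1] // t.shape[1]
    sw = dest.shape[2] // t.shape[2]
    t = t.repeat(sh, axis=1).repeat(sw, axis=2)
    return dest + t  # injection by addition

f_phy = np.ones((8, 4, 4))   # physics-aware features (8 channels, 4x4)
dest = np.zeros((16, 8, 8))  # a mid-level layer of the classifier
w = np.full((16, 8), 0.125)  # hypothetical 1x1 conv weights
out = transform_and_inject(f_phy, dest, w)
```

Because the injection is additive, the classification branch keeps its original topology, and the transform weights let training scale the physics contribution up or down per channel.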

Experiments
In this section, we first introduce the hybrid Image-Physics data format and the experimented datasets. Then, we conduct several experiments with sufficient discussions to prove the effectiveness of our proposed method.

Dataset and Experimental Setup
Most SAR image classification datasets, like OpenSARUrban Zhao et al. (2020), only provide the processed SAR amplitude images for better visual understanding. For the purpose of leveraging the underlying physics knowledge in SAR while avoiding the large storage space required for the complex data s, we propose the hybrid Image-Physics (Img-Phy) data format, which integrates x and P(s) in a concise way, to support the proposed PGIL method.
We mainly evaluate our method on a sea-ice classification dataset acquired by Sentinel-1, as shown in Fig. 5 (Left). The Sentinel-1 Interferometric Wide (IW) SAR data in the polar region are downloaded 1, including both the single-look complex (SLC) and the multi-looked Ground Range Detected High resolution (GRDH) products Lee et al. (2004), which are also given in Fig. 3.

Table 1
The sea-ice classification dataset of Sentinel-1.

Seven sea-ice types are annotated with a patch size of 256×256 on the GRDH image (HH channel), serving as x. Besides, the dual-polarized SLC data are processed with the SNAP software 2 to obtain the H/α labels, serving as P(s). To ensure that P(s) covers almost the same area as x, some essential operations, such as multi-looking and ground range projection, are required. Owing to the different pixel spacings of the processed SLC and GRDH data, the Phy data are no longer square and are stored as a matrix of size about 187×139. For better visualization, Fig. 5 shows the resized square Phy patches in RGB format, where each color represents a scattering label.
For a better evaluation, especially in case of limited labeled data, we randomly select 45, 35, 25, 15, and 5 samples from each class for training, denoted as train-45, train-35, train-25, train-15, and train-5, respectively. The test set is fixed, as shown in Table 1.
In addition, a Gaofen-3 SAR scene image covering a wide urban area is experimented on 3, as shown in Fig. 5 (Right). Seven land cover and land use classes are annotated. The corresponding Phy data are obtained with both HDEC-TFA Huang et al. (2021a) on the single-channel (HH) image and H/α-Wishart Lee et al. (2004) on the full-polarized data. In the following experiments, we will discuss the physics guided learning results with different Phy data. When using the H/α-Wishart result of the polarimetric SAR as the Phy data, the obtained P(s) has 9 classes of physical scattering characteristics, corresponding to the nine zones in the H/α plane shown in Fig. 2(a). For the HDEC-TFA result of the single-channel (HH) SAR image, the number of scattering classes is set to 15 according to Huang et al. (2021a). In the topic modeling for physics guided signal generation, the vocabulary size V and the topic number T of the LDA model are set to 500 and 175, respectively. Note that the topic number is a critical parameter of the algorithm, which will be discussed in the following experiments. To determine the activation term in Equation (8), we set ε and p to 0.1 and 0.9, respectively. The following discussions will illustrate the parameter setting strategy.
The physics guided learning is optimized by stochastic gradient descent (SGD) with a fixed learning rate of 0.05, and the momentum is set to 0.9 by default. All Img-Phy pairs in the dataset are fed into the PGN for training, lasting 200 epochs in total. The physics injected learning only takes the annotated data to train the classification network f_R18 and the transform layers f_T. The initial learning rate is set to 0.001, and the cosine annealing strategy is applied to decrease the learning rate to 10^-8 in the last 3 of 50 epochs in total. The soft constraint regularization term in Equa. (10) is weighted by λ set to 0.1.

Table 3
The SVM performance (Overall Accuracy / F1-score (%)) of physics-aware features in the case of different topic numbers. Parameter setting: ε = 0.1, p = 0.9.
All experiments are conducted on a workstation running a 64-bit Linux operating system, with 64 GB of RAM and an NVIDIA RTX 3090 graphics card with 24 GB of GDDR6X VRAM clocked at 1700 MHz.

Unsupervised Physics Guided Learning
The PGN learns physics-aware features from all Img-Phy pairs. To evaluate the discriminative ability of the physics-aware features f_phy in the semantic domain, we train a support vector machine (SVM) on f_phy to predict y. In this section, we first discuss how the topic number T and the activation strategy affect the discriminative physics-aware feature learning, based on the sea-ice classification dataset of Sentinel-1. Then, the characteristics of the physics-aware features, as well as the differences between the hard and soft constraints, are analyzed. At last, we additionally evaluate the effectiveness of the proposed physics guided learning approach on the Gaofen-3 SAR data covering a wide urban area.

Hyperparameter Discussion
Firstly, we set ε and p to 0.1 and 0.9, respectively, and discuss the topic number T. The classification results are shown in Table 3, where the highest overall accuracy and F1-score are marked in red. We check six different values of T (25, 50, 100, 150, 175, 200) and train the SVM classifier on 5 different training sets. It can be observed from Table 3 that a larger T almost always leads to a better result. Since the fully-connected layer in the PML module is determined by the topic number, a large T would introduce plenty of parameters and increase the computational load. Consequently, we choose T = 175 as a better trade-off.

The topic number T decides the shared attribute space ℝ^T to which the image representation is mapped, and our assumption is proposed to build a bridge between z ∈ ℝ^T and h_phy. A larger T ensures more fine-grained physics attributes, so that the soft constraint in Equa. (6) can be more precise. We calculate the sparsity of the BoT vector h_phy, defined as 1 − ||h_phy||_0 / T, where || ⋅ ||_0 denotes the L0 norm. Fig. 6 plots the sparsity of the BoT representation in the case of different topic numbers. We find that fine-grained physics attributes lead to a more sparse BoT encoding. The BoT sparsity is also highly related to the physics-aware feature performance. Intuitively, a sparsity greater than 0.985 can be regarded as a good choice, with a topic number no less than 150.

Next, we discuss how to fix the activated physics attributes in Equa. (8). The activation is determined by ε and p, where ε filters the prominent attributes in h_phy as candidates, and p randomly selects the potential attributes to calculate the constraint. Table 4 shows the SVM classification results for different ε, with T = 175 and p = 0.9. When ε equals 0, some unconsidered attributes (with scores close to 0) may be included, guiding the network to learn insignificant features. It is better to set ε to a small value greater than 0, e.g., 0.1 in our case. Table 5 discusses the values of p in the context of ε = 0.1 and T = 100. p is the probability of randomly selecting the potential attributes in h_phy; increasing p from 0.5 to 1.0 makes the constraint more rigid. The results show that it is better to choose a greater value of p, which demonstrates that a majority of the remarkable attributes should be considered. Here, we choose p = 0.9 in our experiments.

To summarize, the topic number T is the most important hyperparameter in PGN learning, and it can be determined by the sparsity of the BoT representation. The activation controls a relaxed selection of physics attributes, decided by relatively casual values of ε and p. We recommend setting ε to 0.1 or 0.2, and p to 0.9 or 0.8, respectively.
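The BoT sparsity measure used in the discussion above can be computed directly (a small sketch; the function name and the numerical tolerance are implementation assumptions):

```python
import numpy as np

def bot_sparsity(h_phy, tol=1e-12):
    """Sparsity of a BoT vector: 1 - ||h_phy||_0 / T, where the L0 norm
    counts the (numerically) non-zero topic scores and T is the topic number."""
    l0 = np.count_nonzero(np.abs(h_phy) > tol)
    return 1.0 - l0 / h_phy.size

h = np.zeros(175)        # T = 175 topics
h[:3] = [0.6, 0.3, 0.1]  # only three active topics
s = bot_sparsity(h)      # 1 - 3/175, roughly 0.983
```

With T = 175, a patch whose scattering content concentrates on a handful of topics easily exceeds the 0.985 sparsity guideline mentioned above.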

The Physics-Aware Features Discussion
The PGN is comprised of the ResNet-13 backbone and the 2-layer PML module. The ResNet-13 backbone f_I, extracting hierarchical features from x, is basically image-specific, with the higher-level features becoming closer to semantic meaning. The PML module transforms f_I(x) to the physics attribute space ℝ^T, building the semantic relation between the image representation and the physics knowledge. The physics-aware features are expected to be discriminative in the classification semantic domain and also to exhibit physics awareness.
We analyze the outputs of different layers in PGN to demonstrate the semantic discrimination and the physics awareness of the features. The SVM classification results are used to indicate the semantic discrimination, as shown in the first row of Table 6. The features from ResBlk-3 reach the highest classification accuracy of 84.18%, followed by an overall accuracy of 82.39% for the features from the convolution layer in the PML module. The results demonstrate how the feature discrimination at the semantic level changes across the physics guided neural network. Note that the physics BoT encoding only achieves 61.53% in classification, which indicates that h_phy is highly physics-specific and that the semantic gap between h_phy and y truly exists. Even so, with the designed objective function, h_phy can successfully guide the PGN to learn discriminative features close to the semantics of y. Intuitively, the first row in Fig. 7 displays the annotated labels in different colors, indicating the feature discrimination at the semantic level. We can observe that the BoT encodings are confused in understanding the semantic labels, since SAR images of the same class may have different physics attributes. After the physics guided learning, the feature discrimination at the semantic level improves from the PML-fc layer to the ResBlk-3 layer. Due to the lower level of ResBlk-2, the features in Fig. 7(a) are not as discriminative as those of ResBlk-3 in Fig. 7(b).
Additionally, we demonstrate the physics awareness of the features by visualization and quantitative metrics, shown in the second row of Fig. 7 and in Table 6, respectively. Given the BoT encodings of the physics attributes, we apply a k-means algorithm to cluster the BoT representations into classes, used as the color identification in the second row of Fig. 7. Thus, (f)(g)(h)(i)(j) indicate the feature discrimination at the physics level, that is, samples with the same color have similar physics attributes. We can observe how the features become physics-aware with the help of the PML module. Table 6 also lists the silhouette coefficient of the features, which reflects the separation between clusters. The silhouette coefficient takes values between -1 and 1, and a higher silhouette score indicates better discrimination of the physics information in the feature space, that is, stronger physics awareness of the features. It gradually decreases from the physics topic space to the image feature space, but still keeps a positive value of 0.0351 at ResBlk-3. As a comparison, the silhouette score of the ResBlk-3 features of the traditional CNN learning model is -0.0085, indicating physics unawareness. Hence, we assert that the PGN is able to learn physics-aware features.
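For reference, the silhouette coefficient used above follows the standard definition, which a small numpy sketch makes explicit (equivalent to what, e.g., scikit-learn's silhouette_score reports; the toy data are illustrative):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)),
    where a(i) is the mean intra-cluster distance of sample i and b(i) is
    its mean distance to the nearest other cluster."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    scores = []
    for i in range(n):
        same = labels == labels[i]
        a = d[i, same & (np.arange(n) != i)].mean()
        b = min(d[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# two well-separated toy clusters give a score close to 1
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
y = np.array([0, 0, 0, 1, 1, 1])
s = silhouette(X, y)
```

Applied to feature embeddings with the k-means physics clusters as labels, a positive score means samples sit closer to their own physics cluster than to any other, which is the sense in which the ResBlk-3 features above are called physics-aware.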

Hard and Soft Constraint Discussion
We discuss the soft and hard constraint objective functions defined in Equa. (6) and (7). The soft constraint only emphasizes that the common semantics from the physics and vision domains be highly similar, while the hard constraint additionally restricts the specific ones to be different. The SVM classification results in Table 6 indicate that the soft constraint guides the PGN to learn more general features, which achieve better performance in classification. The listed silhouette coefficients of the hard constraint in Table 6 are larger than those of the soft constraint, which demonstrates that the features learned with the hard constraint are more physics-specific but less semantically discriminative. The discussion explains the semantic gap between the physics attributes and the image features of SAR, encouraging us to find a trade-off in learning the physics-aware features.

Table 6
The SVM classification results (OA / F1-score) and the physics awareness analysis of features (quantified by the silhouette score) from different layers in PGN, under the soft and hard constraints in Equa. (6) and (7).

Generalization Analysis of PGN
Besides the polarimetric characteristics derived from the H/α-Wishart method Jong-Sen Lee et al. (1999), we additionally take the Gaofen-3 SAR data, as shown in Fig. 5 (right), to demonstrate the effectiveness and generalization ability of the proposed PGN. In our previous work Huang et al. (2021a), we verified that the HDEC-TFA method can automatically discover the time-frequency properties of SAR targets in high-resolution SAR images, especially for man-made targets with characteristic scattering behaviors. It is based on the physical meaning of time-frequency analysis on complex SAR data, which reveals the scattering variation over different azimuth angles and range bandwidths. We apply different physics information to PGN to illustrate that our proposed method can be integrated with various physical models. The SAR image of an urban city with dense buildings and man-made targets is more complicated than that of the polar area. With very limited annotation, it is difficult for supervised CNN training to learn generalized and discriminative features. Table 7 shows that the CNN only achieves an accuracy and F1-score of 52.43% and 48.51%, respectively, with severe overfitting. The SVM classification is utilized to evaluate the physics-aware features learned by PGN with different Phy data. As recorded in Table 7, the classification result of the physics-aware features guided by Phy-1 (HDEC-TFA) signals improves the accuracy by 16.33% over the CNN training result, and is 6.22% better than that guided by the H/α-Wishart scattering characteristics. It also verifies the effectiveness of the HDEC-TFA learning approach in extracting significant physics properties when no polarimetric information is available. Fig. 8 visualizes the annotated ground truth and the test results of the CNN and of PGN with Phy-1 and Phy-2 on two classes, high-density residential area and airport. The scattering in the high-density residential area is strong and extremely complicated, so that visual interpretation is difficult.
However, the physical characteristics in this area, as shown in Fig. 3, are more distinctive than in other regions, especially in the HDEC-TFA case. Consequently, PGN with Phy-1 (HDEC-TFA) achieves the best prediction result on the high-density residential area, with the fewest false negative samples. There are 4 airports in the SAR image, as shown in Fig. 8, of which only 10 annotated samples in Airport-1 are used for training. The PGN with Phy-1 can successfully predict the remaining 3 unseen airports with only 3 false negative samples, far superior to the traditional data-driven CNN model.
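The evaluation protocol above (a frozen feature extractor scored by an SVM) can be sketched in a few lines. This is a hedged toy example on synthetic feature vectors standing in for the PGN outputs; the class layout, dimensions, and split are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: evaluating frozen physics-aware features with an SVM,
# as done for the PGN features in Table 7. All data here is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

# Two synthetic, well-separated classes standing in for PGN feature vectors.
n_per_class, dim = 100, 64
feats = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(n_per_class, dim)),
    rng.normal(loc=3.0, scale=0.5, size=(n_per_class, dim)),
])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# Shuffle and split: the features stay frozen, only the SVM is trained.
idx = rng.permutation(len(labels))
train, test = idx[:150], idx[150:]
clf = SVC(kernel="rbf").fit(feats[train], labels[train])
pred = clf.predict(feats[test])

acc = accuracy_score(labels[test], pred)
f1 = f1_score(labels[test], pred, average="macro")
print(f"OA={acc:.2%}  F1={f1:.2%}")
```

Because the classifier is shallow and the features are fixed, this protocol directly measures how discriminative the unsupervised features are, rather than how well a deep network can be fitted.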

Supervised Physics Injected Learning
In this section, we discuss the effectiveness of the PIN module on the sea-ice classification dataset in Table 1 (train-45 as the training data). The physics-aware representation is treated as an off-the-shelf feature from PGN to be injected. The baseline is the traditional CNN method, that is, training with only labeled amplitude SAR images. We test both the training-from-scratch strategy and the transfer learning strategy using the SAR image pre-trained model proposed in Huang et al. (2021b). The classification results are shown in Table 8, where the retraining accuracy is only 74.55% and the transfer learning result is 82.57%. This shows that the limited labeled data is insufficient to train a very deep neural network from scratch. After the physics-aware feature injection, however, the retraining results improve by more than 10%, as shown in Table 8.
Table 8 The ablation study of physics injected learning. (OA (%))

We discuss the different locations where the physics-aware features are injected, including ResBlk-2, ResBlk-3, and ResBlk-4, denoted as inj-2, inj-3, and inj-4 in Table 8, respectively. Among the single-injection strategies, the injection at ResBlk-3 reaches the best performance, marked in blue.
The results indicate that the obtained physics-aware feature carries abstract meaning but still has a semantic gap with the semantic features of the target task. We also find that multi-layer injection improves the classification the most in both the retraining and transfer learning cases.
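The injection step discussed above can be sketched as a simple fusion: the physics-aware vector is tiled over the spatial grid of a ResBlock's feature map, concatenated along the channel axis, and projected back with a 1x1 convolution. The following NumPy sketch illustrates that fusion only; the shapes, the 1x1 projection, and the function name are illustrative assumptions, not the paper's exact PIN architecture.

```python
import numpy as np

def inject_physics(feat_map, phys_vec, w):
    """Fuse a physics-aware vector into a CNN feature map (illustrative).

    feat_map : (C, H, W) feature map from a ResBlock (e.g. ResBlk-3).
    phys_vec : (P,) physics-aware feature from the frozen PGN.
    w        : (C, C + P) weights of a 1x1 convolution projecting the
               concatenated channels back to C channels.
    """
    C, H, W = feat_map.shape
    # Tile the physics vector over every spatial location.
    phys_map = np.broadcast_to(phys_vec[:, None, None], (phys_vec.size, H, W))
    fused = np.concatenate([feat_map, phys_map], axis=0)  # (C+P, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", w, fused)             # (C, H, W)

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 8, 8))   # toy ResBlk feature map
phys = rng.normal(size=(4,))         # toy physics-aware feature vector
w = rng.normal(size=(16, 20)) * 0.1  # toy 1x1 conv weights
out = inject_physics(feat, phys, w)
print(out.shape)  # (16, 8, 8)
```

Because the output keeps the original channel count and spatial size, the same fusion can in principle be inserted after any ResBlock, which is what the inj-2/inj-3/inj-4 comparison varies.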
The unsupervised PGN training takes about 4.45 h, with a batch size of 300 and 200 training epochs. Afterwards, the supervised PIN training takes only 13.25 minutes, with a batch size of 100 and 100 training epochs.

Ablation Study
In order to demonstrate the effectiveness of each module in the proposed method, we conduct a detailed ablation study of each module on the sea-ice dataset (train-45 as the training data).
As shown in Table 9, the baseline is set as retraining the CNN model from scratch, achieving an overall accuracy of 74.55%. Table 9 shows the ablation experiments, excluding each of the above-mentioned parts separately. It is clear that the XM and PGN play the most important roles in the proposed method. Generating an appropriate physics guided signal ℎ in XM remarkably affects the quality of the injected features learned by PGN, with the accuracy increasing from 80.61% to 84.96%. The PGN module makes the injected knowledge more related to the target task: directly injecting the BoT representation in Equa. (5) only achieves an accuracy of 78.78% on average, while the proposed PGN contributes a 6.18% improvement. The PIN learning further fuses the physics knowledge and vision features together, yielding an improvement of about 2.37%. The SAL part, which makes the physics-aware features more adaptive to the target task, slightly improves the result.
Additionally, we compare the proposed PGIL with some self-supervised learning methods from the computer vision field Wu et al. (2018); Chen et al. (2020) and for PolSAR data Ren et al. (2021), since they all establish pretext tasks for unsupervised learning of feature embeddings. NPID Wu et al. (2018) learned the optimal feature via instance-level discrimination, while SimCLR Chen et al. (2020) conducted contrastive learning based on data augmentation, both focusing on image contents. MI-SSL Ren et al. (2021) was proposed for PolSAR land cover classification, learning discriminative high-level features between multi-modal representations of PolSAR data. In order to adapt the MI-SSL method to our case, we changed the SSL input from the multi-modal features of full-polarized SAR data to our Img-Phy pairs. The results are listed in Table 10. Although NPID and SimCLR perform well in natural image classification, such as on ImageNet, their results on our SAR dataset are inferior to those of the proposed PGIL.

Interpretability Discussion
In this section, we use the sea-ice classification case to demonstrate the physics explainability of the proposed method. The explainability lies in the following two aspects. Firstly, the topic modeling of the physics information provides explainable representations for each SAR image patch. Secondly, the PGN and PIN maintain the explainable physics consistency of the features, leading to reasonable results and preventing overfitting during automatic training.

Physics Explanation of ℎ
The PGN optimization is driven by the physics guided signals ℎ, denoted as ℎ = Θ(𝑠(𝑥)) = {𝜃_1, 𝜃_2, ..., 𝜃_K} in Equa. (5). The LDA topic modeling explains the physical scattering characteristics 𝑠(𝑥) as a combination of topics. Each latent topic is represented by a set of specific words, that is, the physical scattering characteristic distribution in a small area of the SAR image (defined as a "word" in LDA). The assigned weight 𝜃_k(𝑥) describes the probability of the SAR image patch belonging to topic k. This benefits the understanding of the hidden semantic structure among the scattering labels of a large-scale SAR image area at an aggregate level.
Fig. 10 presents the averaged physics topic distribution of the training data for some selected sea-ice classes, with topic number K = 175. Since, as discussed previously, most SAR patches have highly sparse physics topic representations, the classes whose distributions concentrate in fewer topics, such as iceberg and glaciers shown in Fig. 10, are more characteristic. The iceberg is mainly represented by topic-73 weighted 0.6 and topic-119 weighted 0.13. The word distribution of each topic is also given in Fig. 10, where each word can be explained by its physical scattering properties. In this sea-ice hybrid Img-Phy dataset, word-6 and word-7 are mostly random surface and Bragg surface scattering, respectively, and word-50 is a mixture of these two scattering properties. We also list the four dominant topics and the topic weights of water bodies and floating ice in Fig. 10. The result indicates that the physics attributes of water bodies and floating ice are similar at the semantic level. This inference follows from the semantic definition of floating ice, any form of ice found floating in water; that is to say, a SAR image patch with floating ice probably includes water bodies.
Additionally, there are similar topic combinations in water bodies, floating ice, and young ice, shown in the zoom-in region of Fig. 10. Young ice has another specific topic, topic-62, indicating that this class could have two kinds of representative physics attributes.
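The BoT encoding described above can be reproduced in miniature with off-the-shelf LDA: each SAR patch becomes a "document" of scattering-label "words", and the fitted model yields the per-patch topic weights 𝜃_k. The following toy example uses synthetic word counts and a tiny vocabulary and topic number, purely to illustrate the mechanism (the paper uses K = 175).

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Toy "documents": each row counts scattering-label words in one SAR patch.
# Patches 0-24 mostly use words 0-4, patches 25-49 mostly words 5-9,
# mimicking two characteristic scattering regimes (e.g. iceberg vs. water).
docs = np.vstack([
    rng.multinomial(60, np.r_[np.full(5, 0.18), np.full(5, 0.02)], size=25),
    rng.multinomial(60, np.r_[np.full(5, 0.02), np.full(5, 0.18)], size=25),
])

lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(docs)  # per-patch topic weights; each row sums to 1

print(theta.shape)  # (50, 3)
```

Each row of `theta` is the sparse topic distribution discussed above, and `lda.components_` gives the per-topic word distributions that make the topics physically interpretable.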

Explanation of PGIL Results
Since the PGN is trained with the explainable BoT physics encoding, the above inferred information can explain the discrimination of the physics-aware features in Fig. 7(b). With the more characteristic topic distributions of glaciers and icebergs, their physics-aware features are the most discriminative, and the SVM classification shows that the two classes achieve the best F1-scores of 0.909 and 0.911, respectively, as shown in Fig. 9. In addition, the previous analysis explains the similar feature distributions of water bodies and floating ice shown in Fig. 7(b), and the features of young ice have two different characteristics, one of which is close to water bodies and floating ice. As a result, we can observe in the confusion matrix that most misclassified test samples of water bodies are predicted as young ice and floating ice, numbering 57 and 37, respectively. Among the 11 false negative samples in the floating ice class, 9 are classified as water bodies and 2 as young ice.
Two cases are presented in Fig. 11 to demonstrate how the PIN preserves the physics consistency of features. We visualize the features of CNN supervised training, the physics-aware features from the unsupervised PGN, and the features after injection in PIN by t-SNE Maaten and Hinton (2008).
As shown in Fig. 11(a), the CNN features of instances 996 and 496 in the glacier class are far away from the majority due to their different image contents. Based on their similar physics attributes, the middle figure in Fig. 11(a) shows that the two samples are close to most glacier data in the physics-aware features. In the visualization of the features after physics injected learning, we can observe that samples 496 and 996 are still close to each other due to their similar visual representation, and they also maintain the closeness to samples 494, 793, and 1193 that they had in the physics-aware features. As a result, we infer that physics injected learning can preserve the physics consistency during network training. Another example concerns samples 5013 and 4224, shown in Fig. 11(b). They have similar image features in CNN training but differ from the other first year ice samples. The physics-aware features in the middle figure reveal that samples 5013 and 4224 have their own physics characteristics: sample 5013 has physics properties similar to samples 4628, 5606, and 5906, and sample 4224 is very close to sample 3925 from the old ice class. PIN continues this kind of constraint, as shown in the right figure of Fig. 11(b). In short, the proposed PGN and PIN represent SAR images from a more comprehensive perspective than traditional supervised CNN learning.
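The three feature spaces compared in Fig. 11 are each projected to 2-D with t-SNE before plotting. A minimal sketch of that visualization step, with synthetic vectors standing in for the CNN, PGN, or PIN features (dimensions, sample count, and perplexity are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-ins for high-dimensional features of 30 samples
# drawn from two loose clusters.
features = np.vstack([
    rng.normal(0.0, 1.0, size=(15, 128)),
    rng.normal(4.0, 1.0, size=(15, 128)),
])

# Project to 2-D for plotting; perplexity must stay below the sample count.
emb = TSNE(n_components=2, perplexity=5, init="random",
           random_state=0).fit_transform(features)
print(emb.shape)  # (30, 2)
```

Tracking individual sample indices (such as 996 and 496 above) across the three resulting embeddings is what allows the closeness relations to be compared before and after injection.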

The Inspiration from Explainability
In addition, the explainability discussed above can inspire improvements to our deep learning algorithm in future work. For example, the above analysis indicates that floating ice has a physics BoT representation similar to that of water bodies, so their physics-aware features are not well discriminated at the semantic level, as shown in Fig. 11. On the contrary, they can be discriminated based on their visual contents. The final PIN result shows that injecting physics-aware features could not improve the recognition of water bodies and floating ice, because their physics knowledge is not as helpful as that of other classes. We can see in Fig. 9 that the true positive samples of water bodies and floating ice in the PGIL result are fewer than those in the CNN training result. This inspires us to rethink the physics injected learning strategy: the constraint of physics consistency should be relaxed for such classes. In future work, we will further improve the physics injected learning method in this direction to achieve better results.

Conclusion
In this paper, we propose a novel physics guided and injected learning neural network for SAR image classification with limited labeled data, to explore the potential of physically explainable deep learning. The three components of PGIL are the explainable model, the physics guided network, and the physics injected network. The prior knowledge in the explainable models is encoded into physics-aware features via unsupervised PGN learning, and is then injected into the classification pipeline through PIN, supervised by limited labeled data. The hybrid Img-Phy dataset format is proposed for evaluation, and abundant experiments are conducted on Sentinel-1 and Gaofen-3 SAR data. The results demonstrate the semantic discrimination and physics awareness of the features learned by PGN, as well as their good generalization ability. Additionally, we discuss the interpretability of the guided signals in the established surrogate task to show that the predictions are physically constrained. The advantages of the proposed PGIL are: (1) the unsupervised PGN is a plug-and-play module that can be integrated into any deep learning framework for physics-aware feature injection; (2) the physics knowledge injection preserves the physics consistency of the predictions and prevents overfitting in the case of limited labeled data; (3) the results are, to a certain extent, explainable with the help of distinct physical models and expertise, which inspires us to further improve the deep learning model in the right direction. The sea-ice dataset and source code are publicly available at https://github.com/Alien9427/XAI4SAR-PGIL.