Supervised Contrastive Learning for RFF Identification With Limited Samples

The radio-frequency fingerprint (RFF), which arises from hardware imperfections, is a promising feature for securing communication. With the development of deep learning (DL), DL-based RFF identification methods have made excellent and promising achievements. However, existing DL-based methods require a large number of samples for model training. With a limited number of samples, RFF identification is generally less effective, and methods that rely on an auxiliary data set often need the auxiliary data set and the target data set to have similar data distributions. To address the data-hungry problem in the absence of auxiliary data sets, in this article, we propose a supervised contrastive learning (SCL)-based RFF identification method using data augmentation and virtual adversarial training (VAT), which is called "SCACNN." First, we analyze the causes of RFF and model the RFF identification problem with an augmented data set. A nonauxiliary data augmentation method, which consists of rotation, flipping, adding Gaussian noise, and shifting, is proposed to acquire an extended data set. Second, a novel similarity radio-frequency fingerprinting encoder (SimRFE), which is based on the convolutional, long short-term memory, and fully connected deep neural network (CLDNN), is used to map the RFF signal to the feature coding space. Finally, several secondary classifiers are employed to identify the RFF feature coding. The simulation results show that the proposed SCACNN has a greater identification ratio than the other classical RFF identification methods. Moreover, the proposed SCACNN achieves an identification accuracy of 92.68% with only 5% of the samples.


Yang Peng, Changbo Hou, Yibin Zhang, Yun Lin, Member, IEEE, Guan Gui, Senior Member, IEEE, Haris Gacanin, Fellow, IEEE, Shiwen Mao, Fellow, IEEE, and Fumiyuki Adachi, Life Fellow, IEEE

I. INTRODUCTION
WITH the development of wireless communication technology, the Internet of Things (IoT) has been widely used in numerous fields, such as smart home, intelligent driving, and intelligent transportation [1], [2]. In addition, the Internet of Everything (IoE) represents an inevitable trend of current industries. Moreover, the wide deployment of fifth-generation (5G) wireless communications has brought the benefits of low latency and high speed, which will further promote the development of these industries. However, the rapid increase of communication and IoT devices has also brought about many security issues [3], [4], [5], [6]. Traditional authentication and identification techniques are usually based on cryptography, which may be at risk when facing malicious users with huge computing resources [7]. Therefore, an effective authentication and identification method for IoT devices is needed.
In recent years, great achievements have been made in device authentication and identification based on physical layer features. Specific emitter identification (SEI) is a technology that relies on the characteristics of radio-frequency (RF) signals [8], [9]. RF fingerprinting (RFF) identification is an effective method in SEI. Specifically, the differences in RF signals can be considered as fingerprints of the RF signals, with the ability to uniquely and stably represent the emitters [10], [11]. In addition, RFF comes from the imperfection of hardware, which is inevitable in the manufacturing process [13]. RFF identification consists of data preprocessing, feature extraction, and identification. The data preprocessing usually contains power normalization, synchronization, and data cleaning. It aims at normalizing the RF signal and mitigating the impact of nonstandard RF signals on feature extraction.
As mentioned above, the differences between RF signals are caused by the hardware manufacturing process and are usually too small to distinguish easily. Therefore, feature extraction is the principal step of RFF identification, which is employed to reduce the dimension of the RF signal and "amplify" the differences. Traditional feature extraction for RFF identification usually employs statistical processing technologies. Then, classic machine learning methods, such as the support vector machine (SVM), multilayer perceptron (MLP), and linear Bayesian classifier, are employed for the identification [14], [15], [16]. Most of the traditional RFF identification methods can achieve near-perfect performance. However, signal-processing-based feature extraction is usually based on prior knowledge of the devices, which is device specific and cannot be generalized to other signals. Moreover, traditional feature extraction usually requires manual feature design with expert knowledge, which is impractical in dynamic communication scenarios.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Deep learning (DL) has achieved great performance in computer vision (CV), natural language processing (NLP), and intelligent communication. Furthermore, DL shows excellent abilities in data mining and feature extraction, and it has been employed in channel state information (CSI) feedback [17], [18], resource allocation [20], beamforming [21], [22], and automatic modulation classification (AMC) [23], [24], [25], [26]. Meanwhile, the DL-based RFF identification method performs feature extraction and identification at the same time. Compared with traditional methods, it can also obtain more robust and effective RFF features of RF signals using a neural network [27], [28], [29], [30], [31], [32], [33], [34]. However, the neural network is a data-driven model, which depends on the availability of historical data sets. The data sets used in most DL-based RFF identification methods are large enough to support neural network fitting. Nevertheless, in real-world scenarios, a large amount of labeled data may not be available, which leads to a sharp degradation of the performance of DL-based RFF identification methods. Although this problem can be addressed with few-shot learning, few-shot learning still requires a massive auxiliary data set with a similar data distribution to obtain good weights for the neural network. In this article, we focus on RFF identification with nonauxiliary, limited samples, that is, with a data set that is too small to support neural network fitting on its own.
Compared with the softmax-based RFF identification methods, the metric-learning-based methods have better performance with limited samples. Meanwhile, supervised contrastive learning is an improved form of metric learning. In this article, a supervised contrastive-learning-based RFF identification method is proposed, which contains data augmentation, a similarity RFF encoder (SimRFE), and a secondary classifier. Specifically, data augmentation uses prior knowledge to extend the raw data set with simple mathematical operations, which does not require extra auxiliary data sets. Then, unlike the traditional softmax-based RFF identification method, we use a SimRFE to map the original RF signal to a high-dimensional feature space and constrain the feature coding with a supervised contrastive loss (SCL). Finally, several secondary classifiers based on neural networks, traditional machine learning, and distance similarity measurement are used to identify the RFF feature coding. The main contributions of this article are summarized as follows.
1) Different feature embedding methods are employed to improve the performance of RFF identification with limited samples, including a data augmentation method and a model-based virtual adversarial training (VAT) method.
2) We propose an SCL-based RFF identification method, which has a better feature clustering performance. The optimal hyperparameters of the proposed method are also provided to balance the training cost with the identification performance.
3) Several typical machine-learning-based secondary classifiers are employed for identification of RFF feature codings. A simple distance similarity measurement-based classifier is also proposed for identification with a simple mathematical operation.
The remainder of this article is organized as follows. In Section II, we introduce related RFF identification methods and DL with limited samples. The RFF signal is modeled and the RFF identification problem is described in Section III. In Section IV, we present the proposed SCL-based RFF identification methods. Various simulation results are provided to show the performance of our proposed system in Section V. Finally, we conclude our work in Section VI.

II. RELATED WORKS
In this section, we introduce the traditional RFF identification methods with artificial RFF feature coding, the DL-based RFF identification methods, and three feasible schemes for DL-based RFF identification with limited samples.

1) Traditional RFF Identification Methods:
Traditional RFF identification methods usually consist of feature extraction and identification. The feature extraction aims at acquiring artificial RFF features via prior knowledge, which also reduces the dimension of the high-dimensional raw RF samples. Then, the artificial RFF features are identified with a traditional machine learning method. Cobb et al. [14] divided the RF signal into slices and obtained the instantaneous amplitude, phase, and frequency of the RF signal slices as characteristics according to the definition of I/Q signals. Then, the statistical characteristics, including standard deviation, variance, kurtosis, and skewness of those three instantaneous characteristics, were used to represent the RFF features. Multiple discriminant analysis (MDA) and a linear Bayesian classifier were employed to reduce the dimension of the RFF features and identify them, respectively. Tu et al. [16] employed conventional signal processing, such as the dual-tree complex wavelet transform (DT-CWT), short-time Fourier transform (STFT), and Wigner-Ville distribution (WVD), to extract the RFF features. Robust principal component analysis (RPCA) and SVM were employed. Furthermore, constellation error [35], [36], IQ imbalance [37], and modulation shape [38] have all been utilized for RFF identification.
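The statistical feature extraction described for [14] can be sketched as follows. This is a minimal illustration rather than the implementation of [14]: the function names, the number of slices, the use of the phase difference as instantaneous frequency, and the synthetic test tone are all assumptions made here.

```python
import numpy as np

def instantaneous_features(iq):
    """Return instantaneous amplitude, phase, and frequency of a complex I/Q slice."""
    amplitude = np.abs(iq)
    phase = np.unwrap(np.angle(iq))
    # Assumption: frequency is taken as the first difference of the unwrapped phase.
    frequency = np.diff(phase)
    return amplitude, phase, frequency

def moments(x):
    """Standard deviation, variance, skewness, and (non-excess) kurtosis of a 1-D array."""
    sigma = x.std()
    centered = x - x.mean()
    skewness = np.mean(centered**3) / sigma**3
    kurtosis = np.mean(centered**4) / sigma**4
    return np.array([sigma, sigma**2, skewness, kurtosis])

def statistical_fingerprint(iq, n_slices=4):
    """Concatenate the four statistics of the three characteristics over all slices."""
    feats = []
    for chunk in np.array_split(iq, n_slices):
        for series in instantaneous_features(chunk):
            feats.append(moments(series))
    return np.concatenate(feats)

# Hypothetical test signal: a noisy complex tone standing in for a measured burst.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)
iq = np.exp(1j * 0.05 * t) + 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
features = statistical_fingerprint(iq)
print(features.shape)  # 4 slices x 3 characteristics x 4 statistics
```

The resulting vector would then be passed to a dimension-reduction step and a classical classifier, as in the methods cited above.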
2) DL-Based RFF Identification Methods: Unlike the traditional RFF identification methods, the DL-based RFF identification methods benefit from powerful feature extraction and model fitting capabilities, and avoid the complex feature extraction operation. Usually, the RF signal is in the form of I/Q signals, although several methods convert I/Q signals into images. Existing DL-based RFF identification methods can be divided into time-series signal-based and image-form signal-based methods.
The time-series data set mainly consists of I/Q signals, spectra, or signal components in the form of time series. Merchant et al. [27] employed a multilayer neural network to identify RFF signals. Specifically, the authors estimate the ideal signal and then subtract the ideal signal from the measured signal to get the error signal, which can be considered as the RFF feature. Then, the CNN is used to identify the error signal. Yu et al. [28] proposed a denoising autoencoder to reduce the influence of noise, which achieved considerable improvements at all SNR levels. Yu et al. [29] also proposed a multisampling convolutional neural network (MSCNN) to identify RF devices. The MSCNN extracts multiscale RFF features from the selected region of interest (ROI). The MSCNN achieves a 97% identification ratio for 54 CC2530 ZigBee devices under line-of-sight (LOS) scenarios with SNR = 30 dB. Ding et al. [31] employed a supervised dimensionality reduction method to compress the dimensions of the bispectrum and then adopted a convolutional neural network to identify specific emitters.
In addition, recurrent neural networks (RNNs) have been used to handle time series. Roy et al. [32] studied the performance of long short-term memory (LSTM), gated recurrent unit (GRU), and convolutional LSTM (ConvLSTM) for RFF identification. RF signals collected from eight USRP B210 devices were used, and an accuracy of over 92% was achieved.
The common image-form signals include the spectrogram and the constellation diagram. Shen et al. [33], [39] used the STFT to convert the I/Q signal into a spectrogram. Then, a metric-learning-based neural network was used to extract RFF features. Note that Peng et al. [40] used the difference constellation trace figure (DCTF) to replace I/Q signals, which focused on the carrier frequency offset between the transmitter and receiver. Tu et al. slid a window function over the constellation diagrams. Then, the number of sampling points in the window was counted to generate the contour stella image [41]. Peng et al. [42] converted the I/Q signal into a heat constellation trace figure (HCTF), which used the area of the Voronoi diagram to calculate the heat of the sampling point trace. A slice integration cooperation was also employed to improve the performance. The simulation results showed that all of the RFF identification methods based on the constellation diagram were effective.

B. DL With Limited Samples
In order to achieve good performance with limited samples, we start with data set preprocessing and feature extraction. For example, data augmentation extends the raw data set, while a better feature extraction model can deal with limited data more effectively.
1) Data Augmentation: Huang et al. [43] proposed a data augmentation method for AMC, which contained Rotation, Flip, and Gaussian. Cai et al. studied the performance of data augmentation for small sample communication device recognition. The RFF samples were augmented with different operations, which consist of noise disturbance, amplitude and time-delay transformation, frequency offset, and phase shift transformation [44]. VAT was also used to augment the RFF samples during the training process. The simulation results demonstrated the feasibility of these methods.
2) Metric Model and Contrastive Model: As can be seen in Fig. 1, the metric model and the contrastive model are similar in structure. Both of them extract features by measuring the similarity of feature vectors, which can be considered as "learning to compare." The Siamese neural network is a widely used metric learning method, which extracts features with a weight-shared "encoder" and measures similarity with the Euclidean distance [45]. Unlike metric learning, contrastive learning is usually used in semisupervised learning. Chen et al. [46] proposed a simple framework for contrastive learning of visual representations (SimCLR), which is a classical contrastive model. In addition, van den Oord et al. [47] modified the loss function in SimCLR for supervised representation learning. The simulation results showed that the supervised contrastive loss has a strong ability of feature representation. Wu et al. [48] and Sun [49] introduced the Siamese network for SEI, which achieved great performance in both open set identification and few-shot learning.
3) Meta Model: Meta learning is also a very effective method in the face of limited samples, which uses auxiliary data sets to build multiple subtasks for the most generalized model. Yang et al. [50] first introduced model-agnostic meta-learning for SEI. The RF signals from 20 ZigBee devices were used as the auxiliary data set, while the generalizability of the model was demonstrated by its performance on a limited UAV data set.
Considering that the meta model has complex pretraining and unstable gradients [51], the metric model and the contrastive model are more suitable for RFF identification. Those models are easy to implement, since they are similar to the conventional DL-based model. The supervised contrastive loss (SCL) in [47] can be regarded as an improved form of the triplet loss, which is also a typical metric loss function. Therefore, we employ the supervised contrastive model in this article.

A. RFF Signal Modeling
As in [27], the collected I/Q-modulated RF signal is described as
$$r_{\mathrm{meas}}(t) = r_{\mathrm{ideal}}(t) + r_{\mathrm{error}}(t) \tag{1}$$
where $r_{\mathrm{meas}}(t)$ is the collected signal at the receiver, $r_{\mathrm{ideal}}(t)$ is the ideal signal, and $r_{\mathrm{error}}(t)$ is the error signal, which is considered as the RFF. However, as can be seen in Fig. 2, the RF signal is generated by passing the baseband signal through a digital-to-analog converter (DAC), a bandpass filter, an oscillator, and a power amplifier (PA). Each of these modules introduces some RF signal error. Therefore, (1) is not sufficient to describe the complex RFF signal, and the specific details of $r_{\mathrm{error}}(t)$ are also worth studying. The baseband signal can be described as
$$r_{\mathrm{base}}(t) = r_I^1(t) + j\,r_Q^1(t) \tag{2}$$
where $r_I^1(t)$ and $r_Q^1(t)$ are the baseband signals of the I and Q channels, respectively. Then, $r_{\mathrm{base}}$ is passed into the DAC, and the signal after the DAC is described as
$$r_{\mathrm{DC}}(t) = \left[r_I^1(t) + \mu_I\right] + j\left[r_Q^1(t) + \mu_Q\right] \tag{3}$$
where $\mu_I$ and $\mu_Q$ are the direct current (dc) offsets of the I and Q channels, respectively. Then, the bandpass filter is used to filter $r_{\mathrm{DC}}$, and the filtered signal is described as
$$r_{\mathrm{BF}}(t) = \left[r_I^1(t) + \mu_I\right] \otimes h_I^{\mathrm{BF}}(t) + j\left[r_Q^1(t) + \mu_Q\right] \otimes h_Q^{\mathrm{BF}}(t) \tag{4}$$
where $h_I^{\mathrm{BF}}(t)$ is the filter transfer function of the I channel, $h_Q^{\mathrm{BF}}(t)$ is the filter transfer function of the Q channel, and $\otimes$ denotes the convolution operation. When $r_{\mathrm{BF}}(t)$ is passed into the upconverter, the RF signal is corrupted by the gain imbalance. In addition, the oscillator introduces a phase offset and a frequency offset in the carrier. The phase difference between the I channel and the Q channel is not exactly equal to $\pi/2$, which also introduces errors in $r_{\mathrm{meas}}(t)$. In the resulting signal $r_{\mathrm{up}}(t)$, $\lambda$ is the gain imbalance of the upconverters, $\varphi$ is the quadrature offset error, and $\omega$ and $\phi$ are the frequency and phase offsets, respectively. Then, $r_{\mathrm{in}}$ is written in terms of $X_I$ and $X_Q$.
The nonlinearity of the PA plays a significant role in $r_{\mathrm{error}}$, and it can be described with the baseband nonlinear model. In this article, the memory effects of the PA are ignored. The AM-PM conversion of the PA can also be ignored when the carrier frequency is much higher than the baseband bandwidth. The Saleh model [52] and the complex coefficient polynomial model [53] are employed to model the nonlinearity of the PA.
1) Saleh Model: The AM-AM conversion of the PA in the Saleh model is written in terms of $\alpha_a$ and $\beta_a$, which are the fitting parameters of the PA, and $A(t) = |r_{\mathrm{in}}(t)|$, which is the amplitude of the RF signal; $\omega_{\mathrm{in}}$ and $\phi_{\mathrm{in}}$ are the frequency and the phase of $r_{\mathrm{in}}$, respectively.
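As a rough illustration of the AM-AM behaviour, the sketch below applies the commonly cited Saleh AM-AM curve, $\alpha_a A / (1 + \beta_a A^2)$, to a complex baseband signal while leaving the phase untouched (AM-PM ignored, as stated above). The default parameter values are illustrative textbook values and are not taken from this article; the function names and the test tone are likewise assumptions.

```python
import numpy as np

def saleh_am_am(amplitude, alpha_a=2.1587, beta_a=1.1517):
    """Saleh AM-AM curve: output amplitude as a function of input amplitude."""
    return alpha_a * amplitude / (1.0 + beta_a * amplitude**2)

def apply_pa(r_in, alpha_a=2.1587, beta_a=1.1517):
    """Memoryless AM-AM distortion of a complex baseband signal; the phase is kept."""
    amplitude = np.abs(r_in)
    return saleh_am_am(amplitude, alpha_a, beta_a) * np.exp(1j * np.angle(r_in))

# Hypothetical constant-envelope input with amplitude 0.8.
t = np.arange(256)
r_in = 0.8 * np.exp(1j * 0.1 * t)
r_out = apply_pa(r_in)

# Small inputs see a nearly linear gain of alpha_a; larger inputs are compressed.
small_signal_gain = saleh_am_am(1e-6) / 1e-6
compressed_gain = np.abs(r_out[0]) / 0.8
print(small_signal_gain, compressed_gain)
```

Because $\alpha_a$ and $\beta_a$ differ slightly between physical amplifiers, the amount of compression at a given input level is device specific, which is why the PA contributes to the fingerprint.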

2) Complex Coefficient Polynomial:
The AM-AM conversion of the PA in the complex coefficient polynomial model is given in (10), where $H(\cdot)$ means the Hilbert transform. Considering the model of the RFF signal, $r_{\mathrm{error}}(t)$ in (1) is a function of $r_{\mathrm{ideal}}$. Therefore, there will be a residual component of $r_{\mathrm{ideal}}(t)$, which will influence the performance of RFF identification. We rewrite (1) as (11), where $G_r$ denotes the model of the RFF signal, $\theta_i = \{\mu_i, h_i^{\mathrm{BF}}, \omega_i, \varphi_i, \phi_i, w_i^{\mathrm{PA}}\}$ represents the parameters that belong to $y_i$, and $y_i$ is the real label of the RF device.

B. Problem Description
RFF identification is based on the imperfection of the transmitter, which is represented by $\theta_i$ in (11). In principle, $\theta_i$ can be fitted with $r_{\mathrm{meas}}(t)$ and $r_{\mathrm{ideal}}(t)$. However, $r_{\mathrm{ideal}}(t)$ may be hard to obtain, and it is impracticable to obtain $\theta_i$ when the modeling error is considered. Thus, most RFF identification methods extract a feature vector of $\theta_i$ from $r_{\mathrm{meas}}(t)$ rather than finding its specific value.
1) RFF Identification Problem: We use $r_n$ to represent $r_{\mathrm{meas}}(t)$, which is the $n$th RFF sample collected from an RF device. Let $y_n$ represent the real category of the $n$th RFF sample. The data set of RFF signals is denoted as $D_{\mathrm{raw}} = \{r_n, y_n\}$. RFF identification can be described as a feature extraction problem, in which $h_n$ is the feature of $r_n$, $f_{\mathrm{RFF}}(\cdot)$ is the feature extractor, and $D_h$ is the data set of features. $r^n_{\mathrm{ideal}}(t)$ is the disturbing component of $r_n$, and $h_n$ should reduce its influence. This requirement is expressed with $P(r^n_{\mathrm{ideal}}(t))$, the marginal probability distribution, and $P(h_n, r^n_{\mathrm{ideal}}(t))$, the joint probability distribution. Then, $h_n$ is considered as the RFF feature. The softmax-based identification methods of the RFF feature $h_n$ are written as
$$\min \; \mathbb{E}_{(h_n, y_n) \sim D_h} \, L(\hat{y}_n, y_n) \tag{14}$$
where $\hat{y}_n$ is the predicted category of $h_n$, and $L(\cdot)$ is used to evaluate the difference between $\hat{y}_n$ and $y_n$. The frequently used loss function $L(\cdot)$ here is the cross-entropy loss function, in which $\mathrm{logit}_n^d$ represents the logical value. When $\hat{y}_n = y_d$, $\mathrm{logit}_n^d = 0$; otherwise, $\mathrm{logit}_n^d = 1$. $y_d$ is the $d$th category, and $P_n^d$ is the probability that $r_n$ belongs to $y_d$. In addition, the metric- and contrastive-based identification methods are also written as a minimization problem, in which $h_N$ is the set of all RFF features, $h_c$ is the metric or contrastive RFF feature, which usually consists of positive samples and negative samples, and $y_c$ is its category. The contrastive loss function in the Siamese network is a typical $L(\cdot)$, in which $d(h_n, h_c)$ is the Euclidean distance between $h_n$ and $h_c$, $\max(\cdot)$ takes the larger of two elements, and margin is the ideal extra-class distance. When $y_n = y_c$, $\mathrm{logic}_n^c = 0$; otherwise, it is equal to 1.
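The two families of objectives can be contrasted with a short numerical sketch. The code below is not the article's formulation: it assumes the usual one-hot form of the cross-entropy loss and the standard Siamese contrastive loss without scaling constants, with an indicator `different` that is 0 for a same-class pair and 1 otherwise (matching the convention stated for $\mathrm{logic}_n^c$). All function and variable names are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy; probs is (N, D) with rows summing to 1, labels is (N,) ints."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def pairwise_contrastive_loss(h_n, h_c, different, margin=1.0):
    """Siamese-style contrastive loss on feature pairs.

    different[i] is 0 when the pair shares a class and 1 otherwise.
    """
    d = np.linalg.norm(h_n - h_c, axis=-1)
    pull = (1 - different) * d**2                           # same class: shrink the distance
    push = different * np.maximum(margin - d, 0.0) ** 2     # other class: enforce the margin
    return float(np.mean(pull + push))

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
ce = cross_entropy(probs, labels)

h_n = np.array([[0.0, 0.0], [0.0, 0.0]])
h_c = np.array([[0.0, 0.0], [3.0, 4.0]])
# First pair: different classes but identical features, so it is penalised by margin**2.
# Second pair: different classes and distance 5 > margin, so it contributes nothing.
loss = pairwise_contrastive_loss(h_n, h_c, different=np.array([1, 1]))
print(ce, loss)
```

The softmax objective scores each sample against fixed class outputs, whereas the pairwise objective only compares features with each other, which is why it can remain informative when few samples per class are available.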
2) Limited Samples RFF Identification Problem: In general, the more samples there are in $D_h$, the better the performance of the neural network. Thus, data augmentation is employed to acquire an extended data set without using an auxiliary data set. The extended data set is described as $\dot{D}_{\mathrm{raw}} = D_{\mathrm{raw}} \cup D_{\mathrm{aug}} = \{\dot{r}_n, \dot{y}_n\}$ and $D_{\mathrm{aug}} = f_{\mathrm{ag}}(D_{\mathrm{raw}}) = \{\tilde{r}_n, \tilde{y}_n\}$, where $f_{\mathrm{ag}}(\cdot)$ is the data augmentation function, and $\tilde{r}_n$ and $\tilde{y}_n$ are an augmented sample and its label. $f_{\mathrm{ag}}$ usually contains simple mathematical operations, such as adding Gaussian noise, introducing channels, etc. Then, $\dot{h}_n$ is considered as the augmented RFF feature. The identification of the augmented RFF feature $\dot{h}_n$ is again written as a minimization problem, where $\hat{\dot{y}}_n$ is the predicted category of $\dot{h}_n$.
Setting regularization constraints on the identification model is also a feasible way to solve this problem. Regularization training methods based on adversarial samples are popular; they construct virtual samples by adding perturbations, where $h_n^{\mathrm{adv}}$ is the feature of the RFF signal with perturbations, and $r_{\mathrm{adv}}$ is the perturbation, which depends on the RFF identification model. The detailed procedure of the regularization training method is given in Section IV.

A. Framework of Proposed RFF Identification Method
As shown in Fig. 3, the proposed system is divided into two parts: 1) data preprocessing and 2) model fitting. RF signals from different signal generators are collected. However, the collected signals are not sufficient to achieve a high identification ratio, and there is no auxiliary data set that can be used. Therefore, the raw data set is augmented to obtain an extended data set. In order to adapt to SCL, the augmented data set is converted to a minibatch data set. In the model fitting part, the minibatch data set is input into the SimRFE. Then, SCL and VAT are used to obtain robust and specific features, which should be normalized by the module normalization (MN). Finally, the classifier is employed to obtain the predicted category.

B. Data Augmentation
The measured signal above is described in terms of $r_I^2(t)$ and $r_Q^2(t)$, which denote the measured signals of the I channel and Q channel in the receiver, respectively. Here, $r_R$ is the complete set of RF signals, and $P(r_R, y_i)$ is the marginal probability distribution of the complete set, which belongs to $y_i$. Data augmentation can be considered as acquiring the extended data set with prior knowledge. The extended data set should preserve labels, which is described as $\tilde{y}_n = y_n$, where $\tilde{y}_n$ is the label of an augmented sample. In addition, data augmentation aims to generate a different sample space distribution. Therefore, the main idea of augmentation is to acquire label-preserving extended samples that are least similar to the raw samples. However, it is unrealistic to measure the label preservation between $D_{\mathrm{raw}}$ and $D_{\mathrm{aug}}$. In this article, all the extended data sets are based on transformations of the raw data. Four simple data transformations are employed: 1) Rotation; 2) Flipping; 3) Gaussian [43]; and 4) Shifting.
1) Rotation: Considering the modulation of the RF signal used, the RF signals in the I channel and Q channel share the same generation method, except for the different symbols (the difference of carrier frequency can be removed by phase shifting). Therefore, it is reasonable to exchange the I channel and Q channel to acquire more samples, where $r_s$ denotes an extended RF sample in the receiver and $\phi$ is the rotation angle. Unlike the rotation angle in CV, $\phi$ acts on the values of the in-phase and quadrature components. Thus, $\phi$ affects the quality of the extended data set. In order to avoid superposition of the in-phase and quadrature components, the ideal $\phi$ values are $\pi/2$, $\pi$, and $3\pi/2$. When $\phi$ moves away from the ideal angles, the result of augmentation becomes unpredictable. However, Rotation also introduces problems for RFF identification. As can be seen in Fig. 2, when the angle is $\pi/2$ or $3\pi/2$, the difference between the I and Q channels becomes interference. As can be seen in (11), the augmented sample $\tilde{r}_n$ and the raw sample $r_n$ have different $\mu_i$, $h_i^{\mathrm{BF}}$, and $\phi_i$. Therefore, Rotation acquires the extended data set at the cost of losing RFF features.
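A minimal sketch of Rotation is given below, assuming the usual 2-D rotation matrix applied to the stacked I and Q channels, as in the augmentation of [43]. The array layout (row 0 is I, row 1 is Q) and the function name are assumptions made for illustration.

```python
import numpy as np

def rotate_iq(sample, phi):
    """Rotate a (2, L) sample whose rows are the I and Q channels by angle phi."""
    rotation = np.array([[np.cos(phi), -np.sin(phi)],
                         [np.sin(phi),  np.cos(phi)]])
    return rotation @ sample

rng = np.random.default_rng(0)
sample = rng.standard_normal((2, 8))

# The three ideal angles only exchange and/or negate the channels.
rotated = {phi: rotate_iq(sample, phi) for phi in (np.pi / 2, np.pi, 3 * np.pi / 2)}

# At pi/2 the new I channel is -Q and the new Q channel is I.
quarter_turn = rotated[np.pi / 2]
print(np.allclose(quarter_turn[0], -sample[1]), np.allclose(quarter_turn[1], sample[0]))
```

For any other angle, each output channel is a weighted sum of both input channels, which is the superposition that the ideal angles avoid.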
2) Flipping: As can be seen above, the I channel and Q channel can be exchanged to acquire more data samples. Thus, $r_I$ and $r_Q$ can also be flipped. Flipping can be considered as setting the initial symbol to its opposite one. Flipping augments the data set with a simple operation, and the augmented sample $\tilde{r}_n$ and the raw sample $r_n$ will have different $\mu_i$.
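Flipping can be sketched as a sign change of one or both channels. This reading (negating a channel rather than reversing it in time) is an assumption based on the description of flipping as setting the symbol to its opposite; the function name and flags are hypothetical.

```python
import numpy as np

def flip_iq(sample, flip_i=True, flip_q=False):
    """Negate the I and/or Q channel of a (2, L) sample (rows: I, Q)."""
    signs = np.array([[-1.0 if flip_i else 1.0],
                      [-1.0 if flip_q else 1.0]])
    return signs * sample

rng = np.random.default_rng(1)
sample = rng.standard_normal((2, 8))

flipped_i = flip_iq(sample, flip_i=True, flip_q=False)
flipped_q = flip_iq(sample, flip_i=False, flip_q=True)
flipped_both = flip_iq(sample, flip_i=True, flip_q=True)

# Negating both channels coincides with a rotation by pi.
print(np.allclose(flipped_both, -sample))
```

Because a dc offset added in the transmitter is negated together with the signal, a flipped sample no longer carries the original $\mu_i$, which matches the remark above.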

3) Adding Gaussian Noise:
The RF signal is affected by noise in transmission, and the SNR of the RF signal is not constant. Thus, Gaussian noise can be added to the RF signal to acquire the extended data set, where $N_s$ is a stochastic time series that obeys the Gaussian distribution $N(\mu, \sigma)$. However, it is difficult to find the optimal values of $\mu$ and $\sigma$. Let the power of $N_s$ be within $[a, b]$ dB. If the SNR of $r_s(t)$ is out of the range, the extended data set will be corrupted, which will be counterproductive to RFF identification. In addition, if $N_s$ is too small, the result of data augmentation may be poor. Here, $\mu$ is set to 0, while $\sigma$ is set to 0.0005, 0.001, and 0.002, respectively.
4) Shifting: Inspired by Shifting in CV, $r_s$ can be considered as an $L \times H$ image, where $L$ is the length of the RF signal and $H = 2$ is the height. Then, Shifting is applied to $r_s$, where $i_k$ and $q_k$ denote the $k$th sampling point of the I channel and Q channel, respectively. Unlike CV, RFF identification does not rely on the information of the signal itself, which makes it free from the risk of cutting off the area of interest. However, when the RF signal is composed of a constant frame structure, Shifting may decrease the accuracy of RFF identification. In addition, as can be seen in (4), $r_{\mathrm{BF}}(t)$ is obtained by the convolution operation. Therefore, the shifted data will have a different $h_i^{\mathrm{BF}}$. In addition, $r_{\mathrm{up}}$ and $r_{\mathrm{out}}$ are generated from $r_{\mathrm{BF}}(t)$. Therefore, the augmented sample $\tilde{r}_n$ after Shifting only shares the same $\mu_i$ with $r_n$.
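The last two transformations, and the way they extend a labeled data set, can be sketched as follows. The noise levels are the $\sigma$ values quoted above; the circular shift, the shift length `k`, the array layout `(N, 2, L)`, and all function names are assumptions for illustration, since the exact shifting rule is not restated here.

```python
import numpy as np

def add_gaussian_noise(sample, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to both channels."""
    return sample + rng.normal(0.0, sigma, size=sample.shape)

def shift(sample, k):
    """Circularly shift both channels of a (2, L) sample by k sampling points."""
    return np.roll(sample, k, axis=1)

def augment(samples, labels, sigmas=(0.0005, 0.001, 0.002), k=16, seed=0):
    """Return the raw samples plus noisy and shifted copies, with labels preserved."""
    rng = np.random.default_rng(seed)
    out_x, out_y = list(samples), list(labels)
    for x, y in zip(samples, labels):
        for sigma in sigmas:
            out_x.append(add_gaussian_noise(x, sigma, rng))
            out_y.append(y)
        out_x.append(shift(x, k))
        out_y.append(y)
    return np.stack(out_x), np.array(out_y)

# Hypothetical raw data set: 3 samples of length 128 from 2 devices.
rng = np.random.default_rng(2)
raw_x = rng.standard_normal((3, 2, 128))
raw_y = np.array([0, 1, 1])
ext_x, ext_y = augment(raw_x, raw_y)
print(ext_x.shape, ext_y.shape)
```

Each raw sample contributes four extra samples here (three noise levels and one shift), and every copy keeps the label of the sample it was derived from.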
Here, the self-collected data set Device 1 is taken as an example. The data distributions of the raw data set and the augmented data sets are shown in Fig. 4. Only adding Gaussian noise causes a significant change. The amplitude and frequency are also normalized to avoid interference. Considering that Shifting does not change the amplitude of the raw data set, it is only shown in the data distributions of frequency. The data distributions of the phase are also analyzed, and the distributions of those methods are quite similar.

C. Virtual Adversarial Training
VAT is a classical regularization training method, which can improve the generalization of the model by using virtual adversarial samples. Unlike traditional adversarial training methods, the virtual perturbations are obtained through the gradient, where $g_n$ is the gradient of the distance between the raw RFF signal and the RFF signal with perturbations, $d_n$ is a random unit vector obeying a Gaussian distribution, $\xi$ and $\epsilon$ are tiny numbers used to control the intensity of the perturbations, and $\mathrm{KL}(\cdot)$ is the Kullback-Leibler (KL) divergence, in which $f_{\mathrm{adv}}(\cdot)$ is the RFF identification model. Then, the KL divergence is added to the loss function. Considering the

D. SCL-Based RFF Identification Method
As mentioned above, SCL evolved from the normalized temperature-scaled cross-entropy (NT-Xent) loss in semisupervised contrastive learning (SimCLR). The NT-Xent loss is mainly used in SimCLR. The main steps of SimCLR are as follows. First, a huge number of unlabeled samples are augmented, and each augmented sample and its raw sample are considered to have the same label. Second, all the samples are mapped to the feature space by an encoder. Finally, the feature coding is constrained by the NT-Xent loss to increase the similarity between the raw sample and the augmented sample.
When the NT-Xent loss is applied to supervised learning, it is incapable of using the label information. Thus, the SCL is proposed, which is described as
$$L = \sum_{i} L_i = \sum_{i} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \tag{29}$$
where $A(i)$ is the set of all the samples except the $i$th sample, $P(i) \equiv \{p \in A(i) : y_p = y_i\}$ is the set of positive samples in $A(i)$, and $\tau$ is the temperature. In order to distinguish it from the hidden layer features of other metric and contrastive models, $z$ is used here to represent the feature coding. The detailed procedure of SCL is introduced in Algorithm 1. It can be seen that when there is only one positive sample, SCL is the same as the NT-Xent loss. Meanwhile, if there is only one class of negative examples, SCL can be regarded as a variant of the triplet loss. In this article, we propose two modules for RFF identification with SCL.
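A compact NumPy sketch of a supervised contrastive loss built from the sets $A(i)$ and $P(i)$ and the temperature $\tau$ is shown below. It follows the widely used form in which each anchor averages the log-ratio over its positives; averaging over anchors (instead of summing), skipping anchors without positives, and the function name are choices made here, not details taken from Algorithm 1.

```python
import numpy as np

def supervised_contrastive_loss(z, y, tau=0.1):
    """Supervised contrastive loss for feature codings z (N, D) with labels y (N,)."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y)
    n = len(y)
    logits = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)                 # A(i): every sample except i
    # Log-sum-exp over A(i), computed stably by subtracting the row maximum.
    masked = np.where(not_self, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    positives = (y[:, None] == y[None, :]) & not_self  # P(i): same label, not i itself
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0                                # anchors without positives are skipped
    summed = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-summed[has_pos] / n_pos[has_pos]))

# Two tight clusters of unit-norm codings.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_matched = supervised_contrastive_loss(z, np.array([0, 0, 1, 1]), tau=0.5)
loss_mismatched = supervised_contrastive_loss(z, np.array([0, 1, 0, 1]), tau=0.5)
print(loss_matched, loss_mismatched)
```

The loss is small when samples with the same label are close in the coding space and large when the labels cut across the clusters, which is the clustering behaviour that the secondary classifiers rely on.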

1) Module Normalization Layer:
The MN layer in this article is described as As can be seen in (29), the similarity of the feature codings is measured by the value of their dot product. The modulus (norm) of the feature coding therefore affects the performance of SCL. Thus, a gradient analysis focused on the modulus of the feature coding is carried out Then, the partial derivative of $L_i$ with respect to $z_j$ is described in three scenarios (32) where $N(i) \equiv \{n \in A(i) : y_n \neq y_i\}$ is the set of negative samples in $A(i)$. The partial derivative of each scenario is as follows.
When $i = j$, when $z_j$ belongs to the positive samples, when $z_j$ belongs to the negative samples, where Then, $\partial L/\partial z_j$ is given by and $\partial L/\partial z$ is described as where $J_z$ is the Jacobian, all of whose elements fall under one of the three scenarios above. When the modulus of $z$ is too large, $J_z$ becomes more sparse. Thus, back propagation is slowed down or even "killed." In addition, in practice, the loss is limited by the numeric type. If the modulus is unconstrained, the $\exp(z_i \cdot z_a/\tau)$ term can grow without bound and overflow.
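As a hedged sketch of the three scenarios, assume the per-anchor loss takes the standard supervised contrastive form of [47], $L_i = -\frac{1}{|P(i)|}\sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p/\tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a/\tau)}$, and treat each $z_j$ as a free variable. The shorthand $P_{ij}$ for the softmax weight is introduced here for illustration and is not notation taken from the article. Direct differentiation then gives:

```latex
P_{ij} = \frac{\exp(z_i \cdot z_j/\tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a/\tau)},
\qquad
\frac{\partial L_i}{\partial z_j} =
\begin{cases}
\dfrac{1}{\tau}\left[\displaystyle\sum_{a \in A(i)} P_{ia}\, z_a
  - \dfrac{1}{|P(i)|}\sum_{p \in P(i)} z_p\right], & j = i,\\[2.5ex]
\dfrac{1}{\tau}\left(P_{ij} - \dfrac{1}{|P(i)|}\right) z_i, & j \in P(i),\\[2.5ex]
\dfrac{1}{\tau}\, P_{ij}\, z_i, & j \in N(i).
\end{cases}
```

Under this form, a large modulus of $z$ inflates the dot products, so the weights $P_{ij}$ concentrate on a single dominant index and the negative-sample terms $P_{ij} z_i/\tau$ for the remaining indices approach zero, which is consistent with the sparsity remark above.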
2) Minibatch Generator: The minibatch in SCL [47] follows the practice of semisupervised learning in computer vision (CV). Therefore, the minibatch in RFF identification needs improvement. As shown in Algorithm 1 and (37), the samples in different minibatches are independent of each other. As a result, samples $x_i$ with the same label $y$ may be mapped to feature codings $z_i$ with different distributions, as can be seen in the left plot of Fig. 5. Thus, this article employs a minibatch generator to constrain the $z_i$ of the same class $y$ across different minibatches, as shown in the right plot of Fig. 5. With the "anchor," the samples of different minibatches are linked together. The details of the minibatch generator are given in Algorithm 2.
As shown in Algorithm 2 and Fig. 6, the last sample in each minibatch is chosen as the mark signal, or in other words, the anchor. The mark interval λ is employed to adjust the number of anchors between minibatches.
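The linking idea can be sketched as a small Python generator. This is one possible reading rather than a reproduction of Algorithm 2: it assumes that the trailing samples of a minibatch are reused at the start of the next one, and the parameter `n_anchors` is a hypothetical stand-in, since the exact role of the mark interval λ is not reproduced here.

```python
import numpy as np

def anchored_minibatches(x, y, batch_size, n_anchors=1, seed=0):
    """Yield (x_batch, y_batch) pairs in which every batch after the first
    starts with the last `n_anchors` samples of the previous batch."""
    if not 0 < n_anchors < batch_size:
        raise ValueError("n_anchors must satisfy 0 < n_anchors < batch_size")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))          # shuffled sample indices
    anchors = order[:0]                      # no anchors before the first batch
    start = 0
    while start < len(order):
        take = batch_size - len(anchors)     # number of fresh samples needed
        idx = np.concatenate([anchors, order[start:start + take]])
        yield x[idx], y[idx]
        anchors = idx[-n_anchors:]           # carried over to link the batches
        start += take
```

Because consecutive minibatches share at least one sample, the feature codings learned in one minibatch are tied to those of the next, which is the effect the anchor is meant to provide.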

E. Proposed SimRFE
Contrary to the traditional softmax-based supervised RFF identification method, the SCL-based method consists of the SimRFE and the secondary classifier. The SimRFE shares the same feature extraction layers with the traditional method, but the last fully connected layers are different. In this article, two neural networks are employed to analyze the performance of the SCL-based method. One of them is considered a weak neural network, and the other is considered a competent neural network, in order to study the stability of the proposed method. The structure of the neural networks can be seen in Fig. 7.
The neural network in Fig. 7(a) is a typical CNN, which is modified from [27]. The neural network in Fig. 7(b) is a convolution, long short-term memory, and fully connected deep neural network (CLDNN) [54]. Its specific parameters are similar to those of the CNN. The CLDNN can be considered a combination of CNN and LSTM. The CNN can reduce the frequency variance of RF signals, and the LSTM is employed to handle the intermediate features, which contain a large number of time features. In order to acquire the intermediate features at different time scales, the outputs of the first and second convolution layers are concatenated. Then, the features are passed into the fully connected layers with the same structure. The last layer of both neural networks depends on the loss function. Adam is employed as the optimizer, and the initial learning rate is 0.001.

F. Secondary Classifier
Contrary to the softmax-based RFF identification method, the SCL-based and triplet loss-based methods need a secondary classifier to perform RFF identification with the feature codings. Thus, this article employs several classification models, including DL, machine learning, and simple mathematics.
1) DL: The softmax-based CNN and fully connected neural network are used here, which are denoted as DL1 and DL2, respectively. Both neural networks share part of their structure, which is shown in Table I. The CNN is composed of a 1-D convolution layer with a 1 × 4 kernel and two dense layers, while the fully connected neural network consists of the last two dense layers, which are the same as those in the CNN.
2) Machine Learning: Considering the cost of neural networks, this article also employs a traditional machine learning method as the secondary classifier, i.e., the SVM, which is a classical supervised learning method.
3) Distance Similarity Measurement: Both of the secondary classifiers above need model fitting and complex parameter passing. Considering the core of SCL, this article also implements a classification index based on the dot product. The details are given in Algorithm 3, where $I(i, j) \equiv \{i \in I : y_i = y_j\}$ is the set of all the samples belonging to the jth category, and $\hat{z}_j$ is the "standard" feature coding of the jth category. The dot classifier is based on simple mathematical operations, without model fitting or complex parameter passing. Then, the column with the maximum value in the ith row indicates the category of the ith sample, i.e.,
$$\hat{y}_i = \arg\max R_{dp}. \tag{38}$$
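The dot-product classifier can be illustrated with a short NumPy sketch. Algorithm 3 is not reproduced here, so two points are assumptions: the "standard" coding of each category is taken to be the normalized mean of its training codings, and all codings are L2-normalized before the dot product. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Scale each row to unit Euclidean norm."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def fit_prototypes(z_train, y_train):
    """One 'standard' coding per category (assumed: normalized class mean)."""
    classes = np.unique(y_train)
    z_train = l2_normalize(np.asarray(z_train, dtype=np.float64))
    protos = np.stack([z_train[y_train == c].mean(axis=0) for c in classes])
    return classes, l2_normalize(protos)

def dot_classify(z_test, classes, protos):
    """Pick, for each test coding, the category with the largest dot product."""
    r_dp = l2_normalize(np.asarray(z_test, dtype=np.float64)) @ protos.T
    return classes[np.argmax(r_dp, axis=1)]   # row-wise arg max over categories
```

No parameters are fitted beyond the per-category means, which is why this scheme has a low operation time compared with a trained classifier.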

G. Comparison Methods
In this article, an SCL-based RFF identification method with data augmentation and VAT, named "SCACNN," is employed to solve the limited-sample RFF identification problem. The other RFF identification methods used for comparison are as follows.
1) Direct and DA-Aid Methods: The direct methods are the classical RFF identification methods. Here, two different RFF identification architectures are used, softmax-based [27], [31] and triplet-based [34], both trained on $D_{raw}$. Besides, the DA-aid methods are based on $D_{raw} \cup D_{aug}$ [43]. It is worth noting that all of the network structures are replaced with a suitable structure for a fair comparison.
2) SiameseNet Method: The SiameseNet method is an effective contrastive learning architecture and is also widely used in RFF identification. Here, the SiameseNet method [48], [49] uses $D_{raw}$ for training, and its specific implementation steps have been introduced in Section II. In addition, the CNN of the SiameseNet is also replaced with a suitable structure to adapt to the limited-sample RFF identification in this article.
3) Instantaneous Feature: The instantaneous feature method is a traditional RFF identification method, which extracts the statistical features of I/Q samples. First, the instantaneous amplitude, phase, and frequency are extracted, and the raw I/Q samples are divided into slices. Then, the variance, skewness, and kurtosis of the instantaneous components of each slice are used to represent the I/Q signal [16].
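The instantaneous-feature pipeline can be sketched in NumPy as follows. The details are assumptions for illustration: the frequency is taken as the first difference of the unwrapped phase, the slices have equal length, kurtosis is the non-excess fourth standardized moment, and numerically constant slices are mapped to zeros. The function names and `n_slices` default are hypothetical and not taken from [16].

```python
import numpy as np

def _var_skew_kurt(v, eps=1e-24):
    """Variance, skewness, and (non-excess) kurtosis of a 1-D array."""
    d = v - v.mean()
    m2 = np.mean(d ** 2)
    if m2 < eps:                       # numerically constant slice
        return 0.0, 0.0, 0.0
    return m2, np.mean(d ** 3) / m2 ** 1.5, np.mean(d ** 4) / m2 ** 2

def instantaneous_features(iq, n_slices=4):
    """Return an array of shape (n_slices, 3 components, 3 statistics).

    Components: amplitude, phase, frequency. Statistics: variance,
    skewness, kurtosis.
    """
    iq = np.asarray(iq, dtype=np.complex128)
    amp = np.abs(iq)
    phase = np.unwrap(np.angle(iq))
    freq = np.diff(phase, prepend=phase[0]) / (2 * np.pi)  # cycles per sample
    feats = np.empty((n_slices, 3, 3))
    for c, comp in enumerate((amp, phase, freq)):
        for s, seg in enumerate(np.array_split(comp, n_slices)):
            feats[s, c] = _var_skew_kurt(seg)
    return feats
```

Flattening the result gives a fixed-length vector of `9 * n_slices` values per signal, which can then be passed to any conventional classifier.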

A. Data Set and Experiment Setup
In this article, to verify the universality of our proposed SCACNN, two data sets are used: 1) an RFF data set collected by ourselves and 2) the public RFF data set [55]. The first data set is collected by the spectrum analyzer BB60C. The signal generator VSG25A with different PAs is used to generate the RF signals. The signal of each device is collected under LOS transmission, in an approximately noise-free environment. In addition, our data set does not contain a special signal structure; it is only a simple 16QAM modulated signal transmitted by a single antenna. The public data set collects the RF signals from different Wi-Fi devices, each emulated by a USRP X310. A fixed USRP B210 is used to collect the RF signals. The parameters in both data sets are set to the same values to avoid the influence of other features in the signal. The detailed settings of the data sets are shown in Table II.
The empirical cumulative distribution function (ECDF) and the probability distribution are distribution measurement methods. As can be seen in Fig. 8, all of the devices have similar distributions. More specifically, Device 3 and Device 6 may be easily classified. However, the other devices need appropriate RFF signal identification methods. Both of the above data sets are divided into three sets: 1) a training data set; 2) a validation data set; and 3) a test data set. The training data set consists of 60% of the samples, the validation data set takes 20%, and the test data set takes the rest. With the complete training and validation data sets, RFF identification achieves high accuracy. Then, a small number of samples from the complete training data set are randomly selected to simulate the case of RFF identification with a limited data set. Notably, considering the difference between the data sets, the data ratio mentioned below is a relative value, i.e., 5% of the samples in data set A is not equivalent to the same data ratio of samples in data set B.
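The split and the limited-sample selection can be sketched as follows. This is an illustrative reading only: the data ratio is assumed to be applied to the training set, the selection is not stratified by device, and the function names and seeds are hypothetical.

```python
import numpy as np

def split_indices(n, seed=0, train=0.6, val=0.2):
    """Shuffle n sample indices and split them 60/20/20 by default."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])           # the rest is the test set

def limited_subset(train_idx, ratio, seed=0):
    """Randomly keep a fraction `ratio` of the training indices."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(ratio * len(train_idx))))
    return rng.choice(train_idx, size=k, replace=False)
```

For example, with 1000 samples and a data ratio of 5%, this keeps 30 of the 600 training samples, while the validation and test sets are left untouched.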
Data preprocessing is carried out with MATLAB. The loss functions and neural networks are built with Python and TensorFlow. A PC with a GeForce RTX 2070 is used to train the neural networks in this article.

B. Performance of Different RFF Identification Methods
The performance of different RFF identification methods with 5% of the RFF samples is shown in Fig. 9. The proposed SCACNN is compared with several classical RFF identification methods, and it has the best performance in all of the SNR environments.
From the perspective of the RFF identification model architecture, the contrastive loss architecture shows better performance than the Softmax architecture, except for the DA-SoftmaxNet. This indicates that the contrastive loss architecture does well in dealing with the limited samples problem, or in other words, the contrastive loss architecture has a stronger feature extraction capability than the Softmax one. In addition, the DA-aid also shows a substantial performance improvement. However, considering the basic performance of SoftmaxNet, the DA-SoftmaxNet in [43] is unable to solve the limited samples problem. Besides, the instantaneous feature method also shows poor performance, which indicates that traditional RFF feature extraction also faces the limited samples problem.

TABLE III OPERATION TIME OF DIFFERENT SECONDARY CLASSIFIERS

C. Performance of Different Hyperparameters
Many hyperparameters can influence the performance of the proposed SCACNN. Here, the different secondary classifiers, the length of $z_i$, and the use of the MN layer are considered. As can be seen in Fig. 10(a), the secondary classifiers of the SCACNN show a negligible difference, and the same is true for the classifiers of the triplet loss-based method. The Dot scheme in Fig. 10(a) refers to the distance similarity measurement.
The Dot scheme shows the worst performance in all cases, while DL1 and DL2 perform similarly and are slightly less effective than the SVM. However, even the difference between the Dot scheme and the SVM is within 1%. From the perspective of computational cost, the operation time of the secondary classifiers is shown in Table III. The Dot scheme has the least computational cost, while DL1 and DL2 take a thousand times as long. Even the SVM takes twenty times as long. Finally, considering both the performance and the operation time, the SVM and the Dot scheme are both viable secondary classifiers. Here, in order to facilitate comparison with other contrastive architecture models, the SVM is selected as the representative secondary classifier in this article.
Considering the performance of the proposed method with different sizes, only the data sets with 10% and 5% of the samples are shown. In Fig. 10(b), it can be seen that the accuracy of the proposed method increases with the length of $z_i$. More specifically, the performance with 10% of the samples shows a relatively constant rate of increase, while the performance with 5% has a sharp increase between lengths 8 and 16. The longer the feature coding, the richer the feature distribution space. However, a longer feature coding also increases the cost of computation. Notably, the length of $z_i$ grows as a power of 2, so the actual growth rate is even slower. The feature coding with a length of 16 has the optimal performance in both the 10% and 5% data sets.

D. Performance of Different Data Augmentation
The performance of the SCL-based RFF identification method with different data augmentation methods is also studied. The accuracy of RFF identification with different data augmentation methods in the 20-dB environment is shown in Table IV. As can be seen, all the proposed data augmentation methods improve the accuracy of RFF identification. Rotation achieves the greatest improvement in all the cases, while Flipping and Shifting are slightly inferior to Rotation. In view of the loss functions, the SCL-based RFF identification method achieves the best performance with all the augmentation methods. What is more, the accuracy improvement of the SCL-based method also far exceeds that of the others. The SCL-based method with Rotation augmentation achieves an accuracy of up to 88.88% with 5% of the samples and 96.75% with 10% of the samples.
The performance of the SCL-based RFF identification method in different SNR environments is shown in Fig. 11. When the SNR is lower than 10 dB, the improvement from data augmentation is small. Rotation shows better performance when the SNR reaches 10 dB.
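The four augmentation operations (rotation, flipping, adding Gaussian noise, and shifting) can be illustrated on complex I/Q samples with the NumPy sketch below. The concrete definitions are assumptions for illustration and are not taken from the article: rotation is a constellation rotation by a fixed angle, flipping mirrors the signal about the I or Q axis, shifting is a circular shift, and the angles, noise SNR, and shift length in `augment` are hypothetical defaults.

```python
import numpy as np

def rotate(iq, angle_rad):
    """Rotate the I/Q constellation by a fixed angle."""
    return iq * np.exp(1j * angle_rad)

def flip(iq, axis="I"):
    """Mirror about the I axis (conjugate) or about the Q axis."""
    return np.conj(iq) if axis == "I" else -np.conj(iq)

def add_gaussian_noise(iq, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    power = np.mean(np.abs(iq) ** 2)
    sigma = np.sqrt(power / 10 ** (snr_db / 10) / 2)   # per real dimension
    noise = rng.standard_normal(iq.shape) + 1j * rng.standard_normal(iq.shape)
    return iq + sigma * noise

def shift(iq, k):
    """Circularly shift the samples by k positions."""
    return np.roll(iq, k)

def augment(iq, label, rng, snr_db=20.0):
    """Build augmented views of one signal; every view keeps the raw label."""
    views = [rotate(iq, a) for a in (np.pi / 2, np.pi, 3 * np.pi / 2)]
    views += [flip(iq, "I"), flip(iq, "Q")]
    views.append(add_gaussian_noise(iq, snr_db, rng))
    views.append(shift(iq, len(iq) // 4))
    return np.stack(views), np.full(len(views), label)
```

None of these operations needs an auxiliary data set, since every view is derived from the raw sample itself and inherits its label.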

E. Ablation Experiment
In order to demonstrate the effectiveness of each component in SCACNN, ablation experiments were carried out in this article. As shown in Table V, the SCACNN has the best performance in all SNR environments. Both the DA and the VAT bring a significant performance improvement. The SCL + VAT can improve the accuracy by about 3% when the SNR reaches 5 dB. However, the SCL + VAT also reaches its upper bound when the SNR is 15 dB. The SCL + DA has an adequate improvement when the SNR reaches 10 dB. Different from the SCL + VAT, the SCL + DA has a large improvement when the SNR is 20 dB, which indicates that the DA can provide additional RFF sample information. Finally, the SCACNN combines the advantages of DA and VAT, and its performance improves with the increase of SNR.
Some specific experimental results for the minibatch generator and the MN layer are also given in Fig. 12. Method 1 is the initial SCL-based method, Method 2 is the initial SCL + minibatch generator, Method 3 is the initial SCL + MN layer, and Method 4 is the proposed improved SCL-based method. The MN layer has a large impact on the result: Method 3 and Method 4 show great stability. Meanwhile, the minibatch generator can improve the performance of the SCL-based method: Method 2 and Method 4 are better than Method 1 and Method 3, respectively. It is worth noting that the minibatch generator also slightly reduces the stability, which can be ignored when compared with the improvement in the identification ratio.

F. Performance of Different Scenarios
The proposed SCL-based RFF identification method is not only suited to the data set collected by ourselves and the CLDNN, but also has strong generalizability. Here, the performance of our proposed method with the weak neural network and with the public data set is presented. As can be seen in Table VI, the SCL-based RFF identification method also outperforms the softmax-based method in the 20-dB SNR environment. From the view of the neural network, compared with the competent neural network, the weak neural network takes more samples to achieve the same accuracy. In addition, the performance of the proposed SCL-based method decreases less than that of the softmax-based method. Thus, the proposed SCL-based RFF method is more robust to the fitting ability of neural networks. The secondary classifiers show similar performance as before, and the SVM again achieves the best performance.
The public data set consists of the Wi-Fi signals generated by USRP X310, which are collected with a USRP B210 [55]. The data set contains Wi-Fi signals that are collected twice at different distances. In this article, 12 of the Wi-Fi signals collected at a distance of 2 ft are used. The performance of RFF identification with the public data set is shown in Table VII. Here, the performance of the benchmark and of the method augmented with Rotation is shown. The SCL-based method shows an excellent improvement when compared with the softmax-based method. Rotation also improves the accuracy of the SCL-based method in all the cases. However, the accuracy of the softmax-based method has a negligible increase after Rotation when the data ratio is 5%. Unlike the performance with our data set, Shifting achieves a poor performance, which may be because the public data set is composed of a specific data structure. This would also appear with the ADS-B data set [56].

TABLE VI ACCURACY OF DIFFERENT LOSS FUNCTIONS WITH THE WEAK NEURAL NETWORK

G. Feature Visualization
The feature clustering results of different RFF identification methods with 5% of the samples are shown in Fig. 13. The proposed SCACNN has the best clustering result. The SCACNN has a clear clustering boundary, while the results of TripletNet and SiameseNet are less well separated. In addition, the data augmentation brings a greater improvement to the SCL-based method than to the TripletNet and SiameseNet. It is worth noting that the VAT shrinks the intraclass distances and expands the interclass distances between feature codings.

VI. CONCLUSION
In this article, we proposed a novel supervised contrastive loss-based RFF identification method, which showed excellent performance in the case of using limited samples. A data augmentation method and a model-based regularization constraint were employed to improve the performance of the trained model. Specifically, the proposed method consists of data preprocessing and model fitting. The raw RFF data set is converted into an augmented minibatch data set. Then, SimRFE maps the RFF samples to the feature space with SCL and VAT, while different secondary classifiers are employed for RFF identification. Although the distance similarity measurement showed a slightly inferior performance, it is still valuable in future work for its simplicity. The simulation results indicated that the proposed SCACNN is effective. Feature visualization also demonstrated that the ability of SCACNN was better than that of TripletNet and SiameseNet. In addition, the feature visualization indicates that it is necessary to further limit the intraclass distances of the features. It is worth noting that the current study still has some open problems. The augmented data set contains redundant signals, which increases the cost of training. Thus, an augmented data set evaluation algorithm is needed to choose the representative samples. In addition, the "anchor" in each minibatch is selected randomly; when outlier samples are selected, the performance of the RFF identification system will be affected. Hence, these questions are worth addressing in our future work.

Fig. 1. Similar structure of the metric model and the contrastive model.

Fig. 2. Model of the RFF causes in the transmitter.

Fig. 3. Framework of the proposed DL-based RFF identification method using SCL.

TABLE II PARAMETERS SETTING OF THE RFF DATA SET

TABLE IV ACCURACY OF RFF IDENTIFICATION WITH DIFFERENT DATA AUGMENTATION METHODS

TABLE V ABLATION EXPERIMENT OF DATA AUGMENTATION

TABLE VII ACCURACY OF DIFFERENT LOSS FUNCTIONS WITH PUBLIC DATA SET