Open set recognition algorithm based on Conditional Gaussian Encoder

To address the problem that existing Closed Set Recognition (CSR) methods mistakenly identify unknown jamming signals as a known class, a Conditional Gaussian Encoder (CG-Encoder) for 1-dimensional signal Open Set Recognition (OSR) is designed. The network retains the original form of the signal as much as possible, and a deep neural network is used to extract useful information. CG-Encoder adopts a residual network structure, and a new Kullback-Leibler (KL) divergence is defined. In the training phase, the known classes are approximated to different Gaussian distributions in the latent space and the discrimination between classes is increased, improving recognition performance on the known classes. In the testing phase, a specific and effective OSR algorithm flow is designed. Simulation experiments are carried out on 9 jamming types. The results show that the CSR and OSR performance of CG-Encoder is better than that of the other three kinds of network structures. When the openness is at its maximum, the open set average accuracy of CG-Encoder is more than 70%, which is about 30% higher than the worst algorithm and about 20% higher than the next best. When the openness is at its minimum, the average accuracy of OSR is more than 95%.


Background and motivation
In order to ensure communication quality, corresponding anti-jamming measures should be adopted according to the type of jamming [1]. The anti-jamming effect depends on accurate recognition of the jamming.

Discriminative model
The authors of [19] proposed the OpenMax model, replacing the SoftMax activation layer in the neural network with an OpenMax layer that estimates the probability that input images come from unknown classes; this was the first solution for an open set deep network. Prakhya et al. [20] explored open set text categorization along the lines of the OpenMax model. Shu et al. [21] replaced the SoftMax layer with a 1-vs-rest layer and proposed the deep open classifier (DOC) model for text classification. Kardan et al. [22] proposed the COOL (Competitive Overcomplete Output Layer) neural network and demonstrated its effectiveness on high-dimensional images. Dhamija et al. [23] addressed the OSR problem by combining SoftMax with the novel Entropic Open-Set and Objectosphere losses. Shu et al. [24] proposed a joint open classification model that determines whether a pair of samples belong to the same class, where the sub-model can be used as a distance function for clustering to discover hidden classes among rejected samples. These models are suited to computer vision, text classification, and the like, and cannot be directly applied to noisy signal recognition.

Generative model
Different from the discriminative model, generative methods use GANs [25], auto-encoders [26] and flow-based models [27] to generate unknown or known samples that help the classifier learn the decision boundary between known and unknown samples. Ge et al. [28] proposed the G-OpenMax algorithm, a direct extension of OpenMax that uses a conditional generative network to synthesize unknown classes. The algorithm provides explicit probability estimates for the generated unknown classes, enabling the classifier to locate decision margins using knowledge of both known and generated unknown classes. Unlike G-OpenMax, Neal et al. [29] introduced a data set augmentation technique called OSRCI, which uses the VAEGAN architecture to generate synthetic open set examples that are close to, but not part of, any known class. Similar to [29], Jo et al. [30] used GAN techniques to generate pseudo data as unknown class data to further enhance the robustness of the classifier to unknown classes. Yoshihashi et al. [31] proposed Classification-Reconstruction Open Set Recognition (CROSR), which reconstructs from latent representations, enabling robust unknown class detection without compromising the classification accuracy of known classes. Oza and Patel [32] proposed the C2AE model, a class conditional auto-encoder with novel training and testing methods that derives the decision boundary from EVT-modeled reconstruction errors. The Variational Auto-Encoder (VAE) [33] has also been combined with clustering [34], one-class [35] and Gaussian mixture model (GMM) [36] algorithms for OSR: the posterior distribution q(z|x) in the latent space is trained to approximate a prior distribution p(z), which enables the VAE to correctly describe the known data, and deviating samples are identified as unknown. Xin et al. [26] provided the VAE with a kind of conditional Gaussian distribution learning, which can detect unknown samples and classify known samples by forcing different latent features to approach different Gaussian models. Zhang et al. [27] proposed a joint embedding space consisting of a classifier and a flow-based density estimator. However, these generative models cannot achieve an ideal OSR effect on noisy signals.
In contrast, the CG-Encoder proposed in this paper not only classifies known jamming types, but also detects unknown jamming accurately.

Contributions and structure of the paper
The contributions of this paper are mainly as follows:
• To our knowledge, we are the first to study open set recognition of communication jamming signals.
• We propose a new classification model called CG-Encoder. Compared with previous methods based on convolutional neural networks, the proposed method not only achieves better classification results, but can also be used for unknown detection.
• We propose a novel unknown detection method based on the probability density function. The proposed algorithm is superior to other detection methods for unknown signals.
• We conduct experiments on nine common classes of communication jamming, and the results show that our method outperforms existing methods and achieves new state-of-the-art performance.
The rest of the paper is organized as follows. Section 2 briefly introduces the Variational Auto-Encoder (VAE) and the deep residual structure. Section 3 discusses the open set recognition algorithm based on CG-Encoder in detail. Section 4 gives the algorithm simulations and performance analysis. Finally, Section 5 concludes the paper.

Variational auto-encoder (VAE)
VAE [33] is generally composed of two neural networks: an encoder and a decoder. The parameters, input and output of the encoder are φ, the sample x and the latent representation z, respectively. The parameters, input and output of the decoder are θ, z and the probability distribution of the samples. The loss function of VAE is

L(θ, φ; x) = D_KL(q_φ(z|x) ‖ p_θ(z)) − E_{q_φ(z|x)}[log p_θ(x|z)]   (1)

where D_KL(q_φ(z|x) ‖ p_θ(z)) is the KL-divergence between the approximate posterior distribution q_φ(z|x) and the prior distribution p_θ(z), and the second term represents the reconstruction error.
In general, q_φ(z|x) is taken to be a Gaussian distribution with a diagonal covariance matrix:

q_φ(z|x) = N(z; μ, σ²I)   (2)

where the mean μ and the standard deviation σ are the outputs of the encoding multilayer perceptrons (MLPs). z is defined as:

z = μ + σ ⊙ ε,  ε ~ N(0, I)   (3)

where ⊙ is the element-wise product. The KL-divergence can then be calculated as:

D_KL = −(1/2) Σ_{j=1}^{J} (1 + log σ_j² − μ_j² − σ_j²)   (4)

where J is the dimensionality of z. By minimizing L(θ, φ; x), the VAE is trained not only to reconstruct the input accurately, but also to force the posterior q_φ(z|x) toward the prior.

Deep residual structure
ResNet [18] is a deep residual structure constructed from the basic block shown in Figure 1, which is defined as:

y = relu(F(W, x) + x)   (5)

where x and y are the input and output vectors, the function F(W, x) represents the residual mapping to be learned, and relu is the nonlinear operation. The structure in Figure 1 has two layers, so F(W, x) = W₂ relu(W₁ x). The form of the residual function F is variable, and the trunk of the basic block can stack more layers. When the input and output dimensions differ, the shortcut applies a linear projection W_s:

y = relu(F(W, x) + W_s x)   (6)

For simplicity, the notation above refers to fully connected layers. In fact, the function F(W, x) can represent multiple convolution layers, in which case the elements are added channel by channel on the two feature maps.
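As a minimal NumPy sketch (with hypothetical toy shapes), the reparameterization of Eq. (3), the KL term of Eq. (4), and the residual basic block with identity shortcut (Eq. (5)) or projection shortcut (Eq. (6)) can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, sigma):
    # Eq. (3): z = mu + sigma ⊙ eps, eps ~ N(0, I)
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_to_standard_normal(mu, sigma):
    # Eq. (4): KL between N(mu, diag sigma^2) and N(0, I)
    return -0.5 * np.sum(1.0 + np.log(sigma**2) - mu**2 - sigma**2)

def relu(v):
    return np.maximum(v, 0.0)

def basic_block(x, W1, W2, Ws=None):
    # Two-layer residual block: F(W, x) = W2 @ relu(W1 @ x).
    # Identity shortcut (Eq. (5)) when shapes match, otherwise a
    # linear projection shortcut Ws (Eq. (6)).
    F = W2 @ relu(W1 @ x)
    shortcut = x if Ws is None else Ws @ x
    return relu(F + shortcut)

mu, sigma = np.zeros(4), np.ones(4)
z = reparameterize(mu, sigma)
kl = kl_to_standard_normal(mu, sigma)   # 0 when posterior equals the prior
x = rng.standard_normal(8)
y = basic_block(x, rng.standard_normal((8, 8)), rng.standard_normal((8, 8)))
y2 = basic_block(x, rng.standard_normal((8, 8)), rng.standard_normal((4, 8)),
                 Ws=rng.standard_normal((4, 8)))
```

Note that the KL term vanishes exactly when the posterior matches the standard normal prior, which is what drives the latent space toward the assumed distribution during training.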

Open set recognition algorithm based on CG-Encoder
In communication jamming recognition, the input sample is the 1-dimensional sequence x = x₀ + n, where x₀ is the jamming signal with a sampling length of l and n is Gaussian white noise of the same length. If the inputs were reconstructed by the usual VAE model, the reconstruction would be affected by the noise, and the reconstruction loss could not be used as the criterion for unknown detection. Therefore, this paper uses only the encoder network to learn the latent feature distribution of the classes, and judges whether a test sample is known or unknown from its probability density value, thereby realizing unknown detection.

Design of the CG-Encoder structure
As shown in Figure 2, the structural block diagram of the jamming signal OSR method (CG-Encoder) consists of three modules: Encoder, Classifier, and Detector.
Encoder is a 1-dimensional residual network consisting of 33 1-dimensional convolution layers (including 16 basic residual blocks), two 1-dimensional pooling layers and two fully connected layers. Its input is x; its outputs are the mean μ and variance σ² obtained by the two parallel fully connected layers, respectively. The nonlinear function softplus is used to ensure that all components of the variance are greater than 0. The residual blocks drawn with solid shortcuts in Figure 2 have identical input and output dimensions, and Eq. (5) is used to calculate their output; for the dotted shortcuts the dimensions differ, and the input is mapped linearly using Eq. (6).
The convolution layer parameters denote the convolution kernel size, the type of convolution layer, the number of convolution kernels, and the change of the sequence length through that layer. For example, the first layer parameters are {7 × 1 conv1d, 64, /2}, which means that the layer uses 1-dimensional convolution (conv1d) with a kernel size of 7 × 1 and 64 kernels, and that the jamming sequence length is halved after the layer. The pooling layers are a max pool and an adaptive pool, respectively. The former halves the signal sequence length, while the latter accepts input sequences of any length and produces a fixed output length, here set to 1.
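The length bookkeeping implied by the "/2" annotations can be checked with the standard 1-D convolution output-length formula (the stride/padding pairing below is our assumption, chosen so that the 7 × 1 kernel halves the length):

```python
def conv1d_out_len(l, kernel, stride, padding):
    # standard 1-D convolution output-length formula
    return (l + 2 * padding - kernel) // stride + 1

l = 1024
l = conv1d_out_len(l, kernel=7, stride=2, padding=3)  # first conv layer: 1024 -> 512
l = conv1d_out_len(l, kernel=3, stride=2, padding=1)  # max pool (/2): 512 -> 256
```

The adaptive pool at the end then maps whatever length remains to a fixed output length of 1, which is why the network accepts variable-length inputs.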
Classifier is a fully connected layer with SoftMax as the activation function. Its input is z obtained by Eq. (3) and its output is a known class label.
Detector is modeled on the information hidden in the latent representation z. During testing, the detector is viewed as a binary classifier: when its output is 1, x is recognized as unknown jamming; when its output is 0, x is recognized as belonging to class y.

Closed set training phase
In the training phase, the KL divergence (Eq. (4)) is modified as follows:

D_KL = −(1/2) Σ_{j=1}^{J} (1 + log σ_j² − (μ_j − μ_j^{(y)})² − σ_j²)   (7)

where μ^{(y)} denotes the latent-space mean assigned to class y, so that the latent features of different known classes are forced to approach different Gaussian distributions. CG-Encoder has no decoder compared with VAE, so the loss function discards the reconstruction error in Eq. (1) and adds the classification loss

L_c = −(1/num) Σ_{i=1}^{num} log( exp(W_{y_i}ᵀ z_i + b_{y_i}) / Σ_{j=1}^{K} exp(W_jᵀ z_i + b_j) )   (8)

where num is the batch size, K is the number of known classes, z_i is the latent feature of the i-th sample, y_i is the class label corresponding to x_i, and W_j and b_j are the weight and bias of class j.
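The classification loss described by the where-clause (softmax cross-entropy over the latent features) can be sketched in NumPy as follows (toy shapes; the helper name is ours):

```python
import numpy as np

def classification_loss(Z, y, W, b):
    # Softmax cross-entropy: Z (num, d) latent features, y (num,) labels,
    # W (K, d) per-class weights, b (K,) per-class biases.
    logits = Z @ W.T + b
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

# uniform logits give the maximum-entropy loss log(K)
loss = classification_loss(np.zeros((4, 8)), np.arange(4),
                           np.zeros((4, 8)), np.zeros(4))
```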

The loss function of CG-Encoder is

L = D_KL + λ L_c   (9)

where λ is a constant. The parameters of CG-Encoder are optimized by minimizing the loss function L, and the training method is consistent with common closed set training. During training, the latent vectors z of correctly classified training samples are saved for later use in open set testing.

Establishment of multivariate gaussian model
According to the class labels of the training samples, the latent vectors z are divided into K sets, namely {z₁}, {z₂}, …, {z_K}, each containing the latent representations of a single class. The mean vector and covariance matrix of the K multivariate Gaussian distribution models are obtained from

μ_k = (1/N_k) Σ_{z∈{z_k}} z   (10)

Σ_k = (1/N_k) Σ_{z∈{z_k}} (z − μ_k)(z − μ_k)ᵀ   (11)

where N_k is the number of latent vectors in {z_k}. The probability density of a latent vector z under the class-k model is

p_k(z) = (2π)^{−n/2} |Σ_k|^{−1/2} exp( −(1/2)(z − μ_k)ᵀ Σ_k^{−1} (z − μ_k) )   (12)

where n is the dimension of the latent space.
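A NumPy sketch of fitting the class-wise mean and covariance and evaluating the multivariate Gaussian density of Eq. (12) (helper names are ours, data is a synthetic stand-in):

```python
import numpy as np

def fit_gaussian(Z):
    # class-wise mean vector and covariance matrix from the latent set Z (N, n)
    mu = Z.mean(axis=0)
    diff = Z - mu
    return mu, diff.T @ diff / len(Z)

def gaussian_pdf(z, mu, cov):
    # multivariate Gaussian density of Eq. (12); n is the latent dimension
    n = len(mu)
    diff = z - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

rng = np.random.default_rng(0)
Z = rng.standard_normal((5000, 2))              # stand-in for one class's latents
mu_k, cov_k = fit_gaussian(Z)
p_center = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))  # density at the mean
```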

Threshold setting
Because the signal distribution provides effective information for unknown detection, the probability density values of all latent vectors in the K sets {z₁}, {z₂}, …, {z_K}, namely {p₁}, {p₂}, …, {p_K}, are calculated according to Eq. (12) and arranged in descending order within each set. In a manner similar to Reference [26], the threshold τ_k of each class is set below the probability densities of the first 98% and above those of the last 2%.
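Equivalently, the per-class threshold is the 2% quantile of the in-class density values; a sketch (`class_threshold` is our name):

```python
import numpy as np

def class_threshold(densities, keep=0.98):
    # threshold below the top `keep` fraction of in-class densities and
    # above the remaining (1 - keep) fraction, cf. the 98% / 2% rule above
    return np.quantile(densities, 1.0 - keep)

densities = np.linspace(1.0, 100.0, 100)        # stand-in for one class's {p_k}
tau = class_threshold(densities)
```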

Open set test algorithm
The specific steps of the algorithm are as follows: for a test sample x_t, the Encoder outputs its latent vector z_t, and the Classifier gives the candidate class y with the largest SoftMax output; the probability density of z_t under the class-y Gaussian model is then calculated by Eq. (12). If this density is below the threshold of class y, the Detector detects x_t as unknown; otherwise, x_t is assigned its class y through the Classifier.
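A compact sketch of this test flow (function and variable names are ours; the SoftMax probabilities and fitted class models are passed in):

```python
import numpy as np

def open_set_predict(z_t, softmax_probs, class_models, thresholds):
    # The Classifier proposes the most likely known class; the Detector then
    # compares the density of z_t under that class's Gaussian model (Eq. (12))
    # with the class threshold. Returns -1 for "unknown", else the class index.
    y = int(np.argmax(softmax_probs))
    mu, cov = class_models[y]
    diff = z_t - mu
    n = len(mu)
    density = np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / (
        (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(cov)))
    return -1 if density < thresholds[y] else y

models = [(np.zeros(2), np.eye(2))]             # one known class, toy model
near = open_set_predict(np.zeros(2), np.array([1.0]), models, [1e-3])
far = open_set_predict(np.array([10.0, 10.0]), np.array([1.0]), models, [1e-3])
```

A sample near the class mean passes the density check and keeps its classifier label, while a sample far from every class model falls below the threshold and is rejected as unknown.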

Algorithm simulation and performance analysis
The Adam optimizer with initial learning rate of 0.001 is used, and the batch size is fixed to 256; the dimension n of latent representation z is 32, parameter λ is set to 100.

Datasets
In order to test the performance of the proposed OSR method, simulation experiments are carried out on 9 kinds of jamming signals: single-tone jamming, multi-tone jamming, periodic Gaussian pulse jamming, frequency hopping jamming, linear sweep-frequency jamming, quadratic sweep-frequency jamming, BPSK modulation jamming, noise frequency modulation jamming and QPSK modulation jamming. The jamming-to-noise ratio (JNR) ranges from -10 to 18 dB in steps of 2 dB. The additive noise is Gaussian white noise in the signal band. The sampling frequency is 10 MHz, the number of sampling points is l, and the size of a jamming sample is expressed as 1 × l; the parameters of each jamming type are shown in Table 1. Figure 3 shows the time-domain waveforms of the above 9 jamming signals randomly generated with JNR = 10 dB and l = 1024.

The following entries of Table 1 describe the remaining jamming types:

quadratic sweep-frequency (jam6): The frequency varies quadratically with time; the other parameters are the same as jam5.

BPSK modulation (jam7): The information symbols are a 32-bit 0/1 random sequence, the symbol period is 3.2 μs, and the modulation signal is sinusoidal.

noise frequency modulation (jam8): The frequency modulation coefficient is between 0.125 and 0.933, and the carrier signal parameters are the same as jam1.

QPSK modulation (jam9): The information symbols are a 32-bit 0/1 random sequence, the symbol period is 3.2 μs, the I-channel modulation signal is sinusoidal, and the Q-channel modulation signal is cosine.
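As an illustrative sketch of one dataset sample (the amplitude convention and the 1 MHz tone frequency are our assumptions; only the 10 MHz sampling rate, length 1024, and the JNR definition come from the text), a single-tone jamming sample with additive white Gaussian noise at a target JNR can be generated as:

```python
import numpy as np

def single_tone_with_awgn(l=1024, fs=10e6, f0=1e6, jnr_db=10.0, seed=0):
    # x = x0 + n, with JNR = 10*log10(P_jam / P_noise)
    rng = np.random.default_rng(seed)
    t = np.arange(l) / fs
    x0 = np.cos(2 * np.pi * f0 * t)               # single-tone jamming x0
    p_noise = np.mean(x0 ** 2) / 10 ** (jnr_db / 10)
    return x0 + rng.standard_normal(l) * np.sqrt(p_noise)

x = single_tone_with_awgn()                       # one 1 x 1024 sample at JNR = 10 dB
```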

Performance analysis of CSR and OSR
The CSR and OSR performance of the CG-Encoder algorithm and the following three algorithms is simulated and analyzed for JNR from -10 to 18 dB.
(1) CNN [12]. The network structure of this algorithm is similar to CG-Encoder, except that there are no shortcuts and only one fully connected layer is connected after the convolution layers to obtain the latent vector z. The threshold for unknown detection is the confidence value that keeps 98% of the correctly classified training samples as known: a test sample whose confidence exceeds the threshold is judged known, otherwise unknown. This model can be regarded as a traditional CNN.
(2) ResNet [18]. The network structure of this algorithm is similar to CG-Encoder, but only one fully connected layer is connected after the convolution layers to obtain the latent vector z. The unknown detection algorithm is the same as for CNN. This model can be regarded as a common ResNet structure.
(3) ResNet+G [26]. The network structure of this algorithm is similar to CG-Encoder, except that the posterior distributions of all classes approximate a single multivariate Gaussian distribution. The open set testing phase is the same as Section 3.3.3. This model can be regarded as a ResNet that learns one multivariate Gaussian model, hence the name ResNet+G.

CSR performance analysis
CSR is recognition over known classes only, without the unknown detector. The number of sampling points is set to 1024. The training set contains classes jam1-jam8, each with 2000 samples at each JNR, 240000 samples in total. The testing set also contains jam1-jam8, with 2000 samples per class at each JNR. Accuracy is used to measure the performance of the algorithms. The experimental results of the four algorithms are shown in Figure 4.

It can be seen from Figure 4 that the closed set recognition accuracy of the four algorithms increases with JNR. At JNR > -10 dB the accuracy is higher than 88%, and it approaches 1 at JNR > 0 dB, so all four networks recognize the known classes well. At low JNR, CNN is slightly inferior to the other three networks; at high JNR, ResNet+G is slightly inferior to the other three. ResNet outperforms CNN, which shows that the shortcut in the residual structure improves recognition of known classes. ResNet outperforms ResNet+G, which indicates that approximating the posterior distributions of all classes with a single Gaussian model reduces the separation between classes. CG-Encoder performs on a par with ResNet, which illustrates that approximating the latent distributions of different classes with different Gaussian models does not harm, and can improve, CSR performance.

OSR performance analysis
The training set of OSR is consistent with CSR, and jam9 is added to the testing set as the unknown class to verify the unknown detection performance of the four algorithms. The experimental results of OSR are shown in Figure 5.
In the case of OSR, the open set recognition accuracy increases with JNR. The CG-Encoder algorithm performs best, which demonstrates the OSR effectiveness of the algorithm for noisy jamming signals. At JNR = -10 to 0 dB the accuracy is low, indicating that noise strongly affects OSR performance. At JNR > 5 dB the accuracy changes little and the performance of each algorithm is stable.
At JNR > 0 dB, the OSR performance ordering of the network structures is CG-Encoder > ResNet > CNN > ResNet+G, with CG-Encoder about 2%, 4% and 10% higher in average accuracy than the other three algorithms, respectively. CG-Encoder > ResNet+G shows that approximating the latent distributions of different classes with different Gaussian models not only makes the known classes more separable, but also improves the division between known and unknown classes. ResNet > CNN shows that the shortcut improves accuracy on known classes and thus indirectly benefits OSR performance. CNN > ResNet+G indicates that when the latent distributions of all classes are fitted by one distribution, the unknown class also approaches that distribution; even with a residual structure, ResNet+G does not perform better than an ordinary CNN.

Visual analysis of latent space features
In order to better observe the latent space features of the samples, the dimension of the latent representation z is set to 2, and the four network models are retrained to visualize the latent space each network learned on the 2-dimensional plane, as shown in Figure 6. As can be seen from Figure 6(c), the ResNet+G network makes the unknown class almost coincide with jam4 and jam8, which are then difficult to distinguish. As can be seen from Figure 6(d), the CG-Encoder algorithm completely separates the known classes, an effect the former three algorithms cannot achieve. Although the unknown class is close to jam4, they overlap only slightly, and the Detector can effectively detect the unknown jamming.

Openness of OSR
Openness is related to the number of training classes N_train and the number of test classes N_test. Following Reference [13], the openness is

O = 1 − sqrt( 2 N_train / (N_train + N_test) )

According to this formula, the larger N_train is, the smaller O is and the less unknown information there is. Figure 5 shows that the OSR performance of the four algorithms is relatively stable at JNR > 0 dB, so the average accuracy over JNR = 0 to 18 dB is used to analyze the openness behaviour of the four algorithms. As shown in Figure 7, the horizontal axis represents the degree of openness; to be more intuitive, the pair N_train vs N_test is used instead of the corresponding O value. On the whole, the OSR performance of the four algorithms increases with the number of known classes, indicating that the less unknown information, the better the OSR performance. The CG-Encoder algorithm proposed in this paper has the best recognition effect under every openness. When N_train = 2 and N_test = 9, the OSR average accuracy of CG-Encoder is more than 70%, about 30% and 20% higher than CNN and ResNet+G, respectively. When the openness is minimal, the OSR average accuracy of CG-Encoder reaches more than 95%. When N_train < 5, ResNet+G is better than the CNN and ResNet algorithms, while when N_train ≥ 5 its recognition performance is worse than that of the ordinary CNN algorithm, which indicates that the more known classes there are, the more confusion between classes is caused by approximating the posterior distributions with a single distribution, since more per-class features need to be learned.
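The openness values and their monotonic behaviour can be checked numerically, using the commonly used form O = 1 − sqrt(2·N_train/(N_train + N_test)) from Reference [13]:

```python
import numpy as np

def openness(n_train, n_test):
    # O = 1 - sqrt(2 * N_train / (N_train + N_test))
    return 1.0 - np.sqrt(2.0 * n_train / (n_train + n_test))

# with N_test fixed at 9, openness shrinks as more classes become known
values = [openness(n, 9) for n in range(2, 9)]
```

The maximum-openness setting of the experiments (N_train = 2, N_test = 9) gives O ≈ 0.40, and O falls to 0 when every test class is also a training class.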

Conclusion
In order to solve the problem that existing jamming signal recognition algorithms mistakenly recognize unknown classes as known classes with a certain probability, a CG-Encoder network structure suitable for 1-dimensional signal OSR is constructed based on ResNet and a multivariate Gaussian model. This paper not only defines a reasonable loss function for training the network, but also designs a specific OSR procedure. For nine types of jamming, simulation experiments are carried out at JNR = -10 to 18 dB. The CSR and OSR performance of CNN, ResNet, ResNet+G and CG-Encoder is compared, and the results show that CG-Encoder achieves the best overall performance.