An automatic modulation classification network for IoT terminal spectrum monitoring under zero-sample situations

Relying on powerful computing resources, a large number of internet of things (IoT) sensors are deployed in various locations to sense the environment we live in. However, the proliferation of IoT devices has led to the misuse of spectrum resources, and many IoT devices occupy frequency bands without permission. As a consequence, spectrum regulation has become an essential part of IoT development. Automatic modulation classification (AMC) is a spectrum monitoring task that senses the electromagnetic space and is carried out under non-cooperative communication. Deep learning (DL)-based methods are generally data-driven and require large amounts of training data. However, in some non-cooperative communication scenarios, it is challenging to collect wireless signal data directly. How can a DL-based algorithm complete the inference task under zero-sample conditions? In this paper, a signal zero-shot learning network (SigZSLNet) is proposed for AMC under zero-sample situations. Semantic descriptions and the corresponding semantic vectors are designed to generate the feature vectors of the modulated signals, and the generated feature vectors serve as the training data of the zero-sample classes. The experimental results demonstrate the effectiveness of the proposed SigZSLNet: the accuracy with one unseen class and with two unseen classes exceeds 90% and 76%, respectively. We also visualize the generated feature vectors and the intermediate layer outputs of the model.

valuable resource in wireless networks, and the growing number of IoT end devices may lead to spectrum abuse. Regulating the spectrum therefore becomes an important task, and it takes place under non-cooperative communication. Consequently, it is imperative to classify the modulation types of received signals under non-cooperative communication conditions. Automatic modulation classification (AMC), which identifies the modulation type of a received signal automatically, supports signal recognition, threat assessment, and spectrum monitoring [10].
Driven by the urgent need for spectrum regulation in IoT, AMC has attracted much attention in recent years. Conventional modulation recognition algorithms can be divided into likelihood-based (LB) methods and feature-based (FB) methods [11]. LB methods determine the modulation mode of the received signal by comparing likelihoods and infer the label through a Bayesian decision rule. FB methods regard AMC as a pattern recognition problem; they are less sensitive to model uncertainties but yield suboptimal classification accuracy. These algorithms extract statistical features [12][13][14], instantaneous time features [15], and wavelet features [16] from the original modulated signal, and these features are used as input to the machine learning algorithm …
Data augmentation based on Generative Adversarial Networks (GAN) [29] is a possible way to address missing data for some classes: the generator produces data to fool the discriminator, and the discriminator attempts to accurately distinguish real data from generated data. The work in [30] addressed the unstable convergence of GAN and provided an efficient approximation of the Wasserstein distance in WGAN. WGAN-GP [31] replaces weight clipping with a gradient penalty, avoiding the vanishing gradients that clipping can cause. Note that GAN-based data augmentation methods still require data of the corresponding classes for training and augmentation, so under zero-sample conditions they are no longer applicable to the data imbalance problem. For classes whose data are entirely missing, zero-sample learning is usually approached through expert-set linguistic descriptions, transfer learning, or matrix transformations [32,33]. However, linear matrix transformation methods have particularly low inference accuracy in complex data scenarios. Furthermore, no expert semantic dataset exists for modulated signals.
As a result, methods based on semantic space mapping have no semantics available for modulation types.
In this paper, to better regulate the spectrum usage of IoT devices, a novel signal zero-shot learning network (SigZSLNet) is proposed to address, for the first time, the zero-sample issue described above in AMC. The mapping relations between classes are established by semantic space mapping, and expert-set linguistic descriptions are constructed for the modulation modes. A GAN drives the generation of the unseen classes, which enriches the training set. The contributions of this article are summarized as follows.
(1) Semantic descriptions are designed for different modulated signals based on their properties, and the semantic vectors are obtained from the generation module of the semantic attribute vector.
(2) A WGAN-GP-based method generates feature vectors of the modulated signals under the guidance of the semantic vectors. A complete training dataset, containing both real and synthetic feature vectors, is then constructed with the WGAN-GP module.
(3) The complete training dataset is fed to the classifier to obtain the AMC estimate.
Experimental results indicate that the proposed SigZSLNet effectively solves the zero-sample issue for AMC.
The remainder of this paper is organized as follows. Section II introduces the motivation and the model description. Section III presents the experiments on the various groups and analyzes the results in detail. Section IV discusses the results. Finally, Section V concludes the article.

Motivation
With the rapid development of 5G technology, more and more sensors and devices are connected to the internet, with data transmitted over wired and wireless networks, satellite networks, and so on. As the number of participating end devices grows, the information received by the central server becomes increasingly complex. As the wireless bands become more crowded, monitoring and managing wireless signals becomes both critical and challenging. As shown in Fig. 1, this paper considers wireless spectrum management scenarios in which the receiver has no prior knowledge of the modulated signal at the transmitter. Accordingly, the classes of the modulated signal at the receiver may differ from those in the pre-collected data, and the details of the received signal must also be learned. The task of non-cooperative communication is to recognize the modulation types of the wireless signal in these scenarios. The classes whose training data are collected in advance are called seen classes, and the classes for which no training data are available are called unseen classes. With $m$ classes in total, of which $k$ are unseen, the classes available at the receiver are $C_{seen} = \{c_1, c_2, \ldots, c_{m-k}\}$, and the unavailable classes are

$$C_{unseen} = \{c_{m-k+1}, \ldots, c_m\}. \quad (1)$$

The goal is to infer the labels of the test data with a model trained only on the seen classes:

$$L = f(T_{seen}, T_{unseen} \mid C_{seen}), \quad (2)$$

where $T_{seen}$ and $T_{unseen}$ are the test signal data of the seen and unseen classes, respectively, and $L$ is the label of the test data.
In general, common DL-based methods [21] can infer the labels of received data given an adequate training set. To make DL-based methods work under zero-sample conditions, this paper focuses on how to accurately generate the missing data of $C_{unseen}$ with WGAN-GP and how to perform AMC using both the seen data and the generated data.

Model description
The overall process of the proposed method is shown in Fig. 2. It consists of three subsystems: semantic vector generation, feature generation, and classification.
The first subsystem has two inputs: the wireless signal (seen classes only) and the semantic description (seen and unseen classes). In step 1, the signal data and the semantic descriptions of the seen classes are used to train the generation module of the semantic attribute vector, which then produces the semantic vectors of the unseen classes from their semantic descriptions. In step 2, the original wireless signals of the seen types are passed through the pre-trained convolutional neural network to yield the CNN feature vectors of the seen classes. In step 3, the semantic vectors from step 1 and the CNN features from step 2 are used to generate CNN features for the unseen types. Finally, in step 4, the CNN features of the seen classes and the generated feature vectors of the unseen classes form the training set used by the classification subsystem.
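The seen/unseen split and the step-4 assembly of the training set can be illustrated with a minimal Python sketch. The helper names `split_classes` and `build_training_set` are hypothetical, and choosing 4ASK and 64QAM as the unseen classes only mirrors the Group 4 example discussed in the results; it is not a reproduction of Table 4.

```python
def split_classes(classes, unseen):
    """Return (C_seen, C_unseen), preserving the original class order."""
    unknown = set(unseen) - set(classes)
    if unknown:
        raise ValueError(f"unseen classes not in the class list: {sorted(unknown)}")
    c_seen = [c for c in classes if c not in unseen]
    c_unseen = [c for c in classes if c in unseen]
    return c_seen, c_unseen


def build_training_set(seen_features, unseen_features):
    """Step 4: merge real seen-class features with synthetic unseen-class features.

    Each argument maps a class name to a list of feature vectors; the result is
    a flat list of (feature_vector, label) pairs.
    """
    overlap = set(seen_features) & set(unseen_features)
    if overlap:
        raise ValueError(f"classes marked both seen and unseen: {sorted(overlap)}")
    pairs = []
    for source in (seen_features, unseen_features):
        for label, vectors in source.items():
            pairs.extend((vector, label) for vector in vectors)
    return pairs


GROUP = ["BPSK", "QPSK", "2ASK", "4ASK", "16QAM", "32QAM", "64QAM"]
c_seen, c_unseen = split_classes(GROUP, unseen={"4ASK", "64QAM"})
print(c_seen)    # ['BPSK', 'QPSK', '2ASK', '16QAM', '32QAM']
print(c_unseen)  # ['4ASK', '64QAM']
```

Keeping the two dictionaries disjoint makes it explicit that every unseen class is represented only by generated features during training.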
The generation module of the semantic attribute vector is shown in Fig. 3, and the network architecture of the proposed SigZSLNet is shown in Fig. 4. As Fig. 4 shows, SigZSLNet contains three modules: the CNN module, the WGAN-GP module, and the classifier module.

Semantic generation
As shown in Fig. 3, the generation module of the semantic attribute vector contains a convolutional part and an encoding part: the convolutional part extracts features from the signal data, and the convolutional encoding part encodes the semantic descriptions and extracts semantic features. The layers of the semantic generation module and the output dimension of each layer are listed in Table 1. Each modulation type is associated with a description of its properties. These property descriptions are quantified with one-hot encoding in this paper, and the quantification results form the specific semantic vectors that guide the WGAN-GP to generate specific data. After the loss function is optimized, the module yields a semantic feature vector that can be fed to the generator to generate the specific data.
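The one-hot quantization step can be illustrated with a short Python sketch. Because Table 3 is not reproduced in this text, the attribute vocabulary below (modulation family and order) is a hypothetical stand-in for the paper's property descriptions, and the sketch stops at the one-hot vector; the learned convolutional encoding that turns it into a semantic vector is not modeled.

```python
# Hypothetical attribute vocabulary; the paper's Table 3 attributes are not used here.
ATTRIBUTES = {
    "family": ["PSK", "ASK", "QAM"],
    "order": [2, 4, 16, 32, 64],
}

DESCRIPTIONS = {
    "BPSK": {"family": "PSK", "order": 2},
    "QPSK": {"family": "PSK", "order": 4},
    "2ASK": {"family": "ASK", "order": 2},
    "4ASK": {"family": "ASK", "order": 4},
    "16QAM": {"family": "QAM", "order": 16},
    "32QAM": {"family": "QAM", "order": 32},
    "64QAM": {"family": "QAM", "order": 64},
}


def one_hot_description(description):
    """Concatenate one one-hot slot per attribute into a single vector."""
    vector = []
    for name, values in ATTRIBUTES.items():
        slot = [0] * len(values)
        slot[values.index(description[name])] = 1
        vector.extend(slot)
    return vector


print(one_hot_description(DESCRIPTIONS["QPSK"]))  # [1, 0, 0, 0, 1, 0, 0, 0]
print(one_hot_description(DESCRIPTIONS["4ASK"]))  # [0, 1, 0, 0, 1, 0, 0, 0]
```

The point of the encoding is that an unseen class such as 4ASK still receives a vector from its description alone, and that vector shares components with seen classes (here the family slot with 2ASK and the order slot with QPSK).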
The relation between the semantic description and the signal is optimized with the joint embedding loss [34], which is minimized over the $N$ training pairs:

$$\min \frac{1}{N}\sum_{n=1}^{N} \left[ \ell_s(s_n, t_n, y_n) + \ell_t(s_n, t_n, y_n) \right],$$

where one encoder maps the input signals and $\phi(t)$ is the encoder of the input attribute description. The algorithm of the semantic generation module is given in Algorithm 1, where $C_{seen}$ is the signal data of the seen classes, $D_{seen}$ is the semantic description of $C_{seen}$, and $D_{unseen}$ is the semantic description of the unseen classes. $S_{seen}$ and $S_{unseen}$ are the output semantic vectors of the seen and unseen classes, respectively.
Fig. 3 The generation module of the semantic attribute vector. The left part is the convolutional module, and the right part is the convolutional encoding module.
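The text does not spell out $\ell_s$ and $\ell_t$. A minimal NumPy sketch, assuming the hinge-based form of the structured joint embedding with compatibility $F(s, t) = g(s)^\top \phi(t)$ (where $g$ is a hypothetical name for the signal encoder) and approximating the expectation over a class's signals by the batch mean, could look like this. The encoder outputs are passed in as precomputed arrays; this is an illustration of the loss family, not the authors' implementation.

```python
import numpy as np


def joint_embedding_loss(sig_emb, txt_emb, labels):
    """Hinge-based joint embedding loss (assumed form), averaged over the batch.

    sig_emb: (N, d) encoded signals g(s_n)
    txt_emb: (K, d) encoded class descriptions phi(t), one row per class
    labels:  (N,) integer labels y_n in [0, K); every class must occur in the batch
    """
    labels = np.asarray(labels)
    n, num_classes = len(labels), txt_emb.shape[0]
    delta = 1.0 - np.eye(num_classes)[labels]         # (N, K): 0 at y_n, 1 elsewhere

    # l_s: signal s_n should score its own description above every other one.
    scores = sig_emb @ txt_emb.T                        # F(s_n, t_y) for all y
    own_s = scores[np.arange(n), labels][:, None]
    l_s = np.maximum(0.0, delta + scores - own_s).max(axis=1)

    # l_t: description t_{y_n} should score s_n above the mean signal of each class.
    class_means = np.stack([sig_emb[labels == k].mean(axis=0) for k in range(num_classes)])
    t_n = txt_emb[labels]                               # (N, d)
    scores_t = t_n @ class_means.T                      # F(mean signal of y, t_n)
    own_t = np.sum(sig_emb * t_n, axis=1)[:, None]      # F(s_n, t_n)
    l_t = np.maximum(0.0, delta + scores_t - own_t).max(axis=1)

    return float(np.mean(l_s + l_t))


txt = np.array([[5.0, 0.0], [0.0, 5.0]])
sig = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(joint_embedding_loss(sig, txt, y))        # 0.0: pairs already well separated
print(joint_embedding_loss(sig, txt[::-1], y))  # 12.0: descriptions swapped
```

Both terms vanish once every signal prefers its own description by a unit margin and vice versa, which is the behavior the joint embedding is meant to enforce.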

CNN module
The CNN module is used to obtain the feature vectors. It consists of five convolutional layers and one fully connected layer. Each convolutional layer includes a convolution with a stride of 1 × 1 and a kernel of 1 × 8, the activation function $f_{ReLU}(x) = \max(0, x)$, and max pooling with a stride of 1 × 2 and a kernel of 1 × 2. The fully connected layer outputs a 128-dimensional vector. The details are given in Table 2. The main role of the CNN module is to extract spatial features of the modulated signal data. Because any CNN that extracts such features can fill this role, an open-source pre-trained CNN model can be used as the CNN module of the proposed SigZSLNet. The pre-trained weights of the model proposed in [22] are available on GitHub. We therefore use this model as the CNN module of SigZSLNet, taking the output of its first fully connected layer as the feature vector of the signal.
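To see how the stated kernel and pooling sizes shrink the time axis before the fully connected layer, the following Python sketch tracks the length through the five convolution plus pooling stages. The frame length of 1024 samples and the padding mode are assumptions (the text specifies neither), so both "same" and "valid" padding are shown.

```python
def conv_pool_length(length, kernel=8, pool=2, padding="same"):
    """Time-axis length after one conv (stride 1) and one max-pool (stride = pool)."""
    if padding == "valid":
        length = length - kernel + 1
    elif padding != "same":
        raise ValueError("padding must be 'same' or 'valid'")
    return length // pool


def cnn_output_length(length, num_layers=5, **kwargs):
    """Lengths at the input and after each of the num_layers conv+pool stages."""
    lengths = [length]
    for _ in range(num_layers):
        length = conv_pool_length(length, **kwargs)
        lengths.append(length)
    return lengths


print(cnn_output_length(1024))                   # [1024, 512, 256, 128, 64, 32]
print(cnn_output_length(1024, padding="valid"))  # [1024, 508, 250, 121, 57, 25]
```

The flattened input to the 128-unit fully connected layer is then the final length times the (unspecified) number of channels times the I/Q dimension.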

GAN module
The main purpose of the GAN module is to generate the feature vectors of the unseen (missing) classes. It contains a generator (G) and a discriminator (D). The generator consists of two fully connected layers, with 256 neurons in the middle layer and 128 outputs, and uses leaky_relu as the activation function in each layer. The discriminator also consists of two fully connected layers, with 256 neurons; the first layer uses leaky_relu, while the second layer has no activation function. Table 2 shows the detailed structure of WGAN-GP. The input to the generator is the semantic attribute vector of the modulated signal produced by the semantic generation module. The input to the discriminator is a 128-dimensional feature vector, either a real one from the CNN module or a synthetic one from the generator. The generator and the discriminator play a min-max game: the generator tries to produce, from the semantic attribute vector, data that the discriminator judges as real, while the discriminator tries to distinguish the real feature data from the CNN module from the synthetic data produced by the generator. After several epochs, the generator produces feature vectors that the discriminator can hardly distinguish from real ones, so the generated feature data can substitute for the real data. Training uses the Wasserstein GAN with gradient penalty (WGAN-GP) objective proposed by Gulrajani et al. [35]:

$$L = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim p_r}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big],$$

where $x \sim p_r$ denotes the real data, $\tilde{x} \sim p_g$ denotes the data generated by the generator, and $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated samples. The last term is the gradient penalty, and $\lambda$ is the penalty coefficient, with the default $\lambda = 10$.
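The critic side of the WGAN-GP objective can be sketched in NumPy for a discriminator shaped as described (256 leaky_relu units, then a linear output). Because this critic is a two-layer network, its input gradient can be written analytically, which avoids needing an autodiff library for the gradient penalty. The random weights, the leaky_relu slope of 0.2, and the function names are assumptions for illustration; an actual implementation would use a framework's automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, HIDDEN, SLOPE, LAMBDA = 128, 256, 0.2, 10.0   # SLOPE is an assumed leaky_relu slope

# Critic D(x) = w2 . leaky_relu(W1 x + b1) + b2: 256 hidden units, linear output.
W1 = rng.normal(0, 0.05, (HIDDEN, FEAT))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(0, 0.05, HIDDEN)
b2 = 0.0


def critic(x):
    h = x @ W1.T + b1
    return np.where(h > 0, h, SLOPE * h) @ w2 + b2


def critic_input_grad(x):
    """Analytic gradient of D with respect to each input row, shape (N, FEAT)."""
    h = x @ W1.T + b1
    dact = np.where(h > 0, 1.0, SLOPE)               # derivative of leaky_relu
    return (dact * w2) @ W1


def critic_loss(real, fake):
    """E[D(fake)] - E[D(real)] + lambda * E[(||grad D(x_hat)||_2 - 1)^2]."""
    eps = rng.uniform(size=(len(real), 1))
    x_hat = eps * real + (1 - eps) * fake            # points on real-fake segments
    grad_norm = np.linalg.norm(critic_input_grad(x_hat), axis=1)
    penalty = LAMBDA * np.mean((grad_norm - 1.0) ** 2)
    return np.mean(critic(fake)) - np.mean(critic(real)) + penalty


def generator_loss(fake):
    """The generator minimizes -E[D(G(s, z))]."""
    return -np.mean(critic(fake))
```

In training, `real` would hold CNN features of seen classes and `fake` the generator's output for the same semantic vectors; the critic and generator losses are minimized alternately.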
Algorithm 2 describes the generation process of the missing signal classes, where $f_G(\cdot)$ denotes the generator, $f_D(\cdot)$ the discriminator, $f_{CNN}(\cdot)$ the CNN part, and $y_{unseen}$ the generated feature vectors of the unseen classes.
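The inference side of this generation process (feeding each unseen class's semantic vector plus Gaussian noise through the generator) can be sketched as follows. The semantic and noise dimensions, the 0.2 leaky_relu slope, the omitted biases, and the identity-row placeholder semantic vectors are all assumptions, and random weights stand in for a trained $f_G$.

```python
import numpy as np

rng = np.random.default_rng(0)
SEM_DIM, NOISE_DIM, HIDDEN, FEAT, SLOPE = 8, 16, 256, 128, 0.2   # assumed sizes


def leaky_relu(x):
    return np.where(x > 0, x, SLOPE * x)


# Random weights stand in for a trained generator f_G (biases omitted for brevity).
G1 = rng.normal(0, 0.05, (SEM_DIM + NOISE_DIM, HIDDEN))
G2 = rng.normal(0, 0.05, (HIDDEN, FEAT))


def f_G(semantic, noise):
    """Two fully connected layers with leaky_relu, 256 hidden units, 128 outputs."""
    z = np.concatenate([semantic, noise], axis=1)
    return leaky_relu(leaky_relu(z @ G1) @ G2)


def synthesize_unseen(s_unseen, n_per_class):
    """Return (y_unseen, labels) with n_per_class synthetic features per unseen class."""
    features, labels = [], []
    for label, s in s_unseen.items():
        semantic = np.tile(s, (n_per_class, 1))                # repeat the class's S_unseen row
        noise = rng.normal(size=(n_per_class, NOISE_DIM))      # Gaussian noise input
        features.append(f_G(semantic, noise))
        labels.extend([label] * n_per_class)
    return np.vstack(features), labels


# Placeholder semantic vectors; in SigZSLNet these come from the semantic generation module.
s_unseen = {"4ASK": np.eye(SEM_DIM)[1], "64QAM": np.eye(SEM_DIM)[2]}
y_unseen, labels = synthesize_unseen(s_unseen, n_per_class=50)
print(y_unseen.shape)  # (100, 128)
```

The noise input is what lets one semantic vector yield many distinct feature vectors for the same class.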

Classifier module
The classifier module contains a fully connected layer and a Softmax layer. Specifically, the fully connected layer has 128 neurons. PReLU is used as the activation function of the fully connected layer, and Softmax is the decision function of the last layer; they are formulated, respectively, as

$$f_{PReLU}(x) = \begin{cases} x, & x > 0, \\ \alpha x, & x \le 0, \end{cases}$$

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}},$$

where $\alpha \in (0, 1)$ is a variable learned by backpropagation, which adjusts it to the most appropriate slope.
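The PReLU activation and Softmax output described above can be written in a few lines of NumPy. The sketch assumes the Softmax layer is a dense layer mapping the 128 hidden units to one logit per class, followed by Softmax; the weight arrays passed to `classifier_forward` are placeholders.

```python
import numpy as np


def prelu(x, alpha):
    """PReLU: identity for x > 0, learnable slope alpha for x <= 0."""
    return np.where(x > 0, x, alpha * x)


def softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def classifier_forward(features, W1, b1, alpha, W2, b2):
    """128-unit PReLU layer followed by a Softmax output over the classes."""
    hidden = prelu(features @ W1 + b1, alpha)
    return softmax(hidden @ W2 + b2)


print(prelu(np.array([-2.0, 3.0]), 0.25))          # [-0.5  3. ]
print(softmax(np.array([0.0, np.log(3.0)])))       # [0.25 0.75]
```

Subtracting the maximum logit does not change the Softmax output but prevents overflow in `np.exp` for large logits.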
The training stage focuses on training the classifier to classify both seen and unseen classes. First, the semantic attribute vectors are generated from the semantic descriptions, giving the semantic vectors of the seen and unseen categories. Then, WGAN-GP is trained with the seen classes and their semantic vectors. Consequently, synthetic feature vectors for the unseen categories are generated from the semantic vectors of the unseen categories, and the classifier is trained on the real seen-class features together with the synthetic unseen-class features by solving

$$\min_{\theta} \; \sum h\big(y, f(x_{seen}; \theta)\big) + \sum h\big(y, f(x_{unseen}; \theta)\big),$$

where $h$ is the cross entropy, defined as $h(p, q) = -\sum_i p_i \log(q_i)$; $y$ denotes the true label and $f$ the classifier; $\theta$ is the weight of the fully connected layer; $x_{seen}$ is the CNN feature of the seen classes; and $x_{unseen}$ denotes the synthetic feature of the unseen classes. The test stage determines the category to which the received modulated signal data belong. In an end-to-end AMC system, the modulated signals received by the receiver may belong to either the seen or the unseen classes. The feature vector of each signal is extracted by the CNN module and then passed to the classifier module to obtain the inference result.
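The cross-entropy objective over real seen-class and synthetic unseen-class features can be illustrated with a toy NumPy example. For brevity it trains a linear Softmax model (no hidden PReLU layer) by full-batch gradient descent on 2-D Gaussian clusters; the cluster centers, learning rate, and epoch count are assumptions, and the third cluster merely stands in for generator output.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(p, q):
    """h(p, q) = -sum_i p_i log q_i, averaged over samples."""
    return float(-np.mean(np.sum(p * np.log(q + 1e-12), axis=1)))


def train_classifier(x, y, num_classes, lr=0.1, epochs=300):
    """Full-batch gradient descent on the cross-entropy of a linear Softmax model."""
    x = np.hstack([x, np.ones((len(x), 1))])            # append a bias column
    theta = np.zeros((x.shape[1], num_classes))
    onehot = np.eye(num_classes)[y]
    history = []
    for _ in range(epochs):
        probs = softmax(x @ theta)
        history.append(cross_entropy(onehot, probs))
        theta -= lr * x.T @ (probs - onehot) / len(x)   # gradient of mean cross-entropy
    return theta, history


def predict(theta, x):
    x = np.hstack([x, np.ones((len(x), 1))])
    return np.argmax(x @ theta, axis=1)


rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x_seen = np.vstack([centers[k] + rng.normal(size=(50, 2)) for k in (0, 1)])  # "real" seen
x_syn = centers[2] + rng.normal(size=(50, 2))       # stands in for generated unseen features
x = np.vstack([x_seen, x_syn])
y = np.concatenate([np.repeat([0, 1], 50), np.full(50, 2)])

theta, history = train_classifier(x, y, num_classes=3)
print(round(history[0], 4))   # 1.0986, i.e. ln(3) for the all-zero initial weights
```

Because the unseen class appears in training only through synthetic samples, the classifier's decision region for it is only as good as the generator's features, which matches the dependence on generation quality discussed in the results.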

Dataset and settings
The experimental dataset of in-phase and quadrature (IQ) samples is generated on the MATLAB 2019b platform. The dataset covers seven modulation modes, Group = {BPSK, QPSK, 2ASK, 4ASK, 16QAM, 32QAM, 64QAM}, with SNRs ranging from 0 to 10 dB in steps of 1 dB. For each class, there are 400 modulated signals per SNR, of which 300 are used for training and the remaining 100 for testing. In total, the training set contains 23,100 modulated signals and the test set contains 7,700. The semantic description of each modulation mode is shown in Table 3, where "statistical peaks" denotes the number of peaks of the modulated signal.
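The set sizes follow directly from the stated protocol, as this short Python check shows (7 classes × 11 SNR levels × 300 or 100 signals per cell). The helper name `split_sizes` is hypothetical.

```python
CLASSES = ["BPSK", "QPSK", "2ASK", "4ASK", "16QAM", "32QAM", "64QAM"]
SNRS_DB = list(range(0, 11))        # 0 to 10 dB in 1 dB steps -> 11 levels
PER_SNR, TRAIN_PER_SNR = 400, 300


def split_sizes(classes, snrs, per_snr, train_per_snr):
    """Return (train, test) counts for a per-(class, SNR) split."""
    cells = len(classes) * len(snrs)
    return cells * train_per_snr, cells * (per_snr - train_per_snr)


train, test = split_sizes(CLASSES, SNRS_DB, PER_SNR, TRAIN_PER_SNR)
print(train, test)  # 23100 7700
```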
Four groups of seen classes are considered in the experiments, as presented in Table 4. The attribute feature vector is normalized because normalization can accelerate training and help prevent overfitting. Gaussian noise is also added to the input of the generator, because Gaussian data are easier to map onto the CNN feature distribution. In addition, the harmonic mean is used to reflect the inference ability; it is defined as

$$H = \frac{2 \times s \times u}{s + u},$$

where $s$ denotes the average classification accuracy of the seen classes and $u$ that of the unseen classes. SigZSLNet is implemented in TensorFlow. Table 5 compares the performance of SigZSLNet across the groups. One class is missing in Group 1 and Group 2, while two classes are missing in Group 3 and Group 4. The recognition accuracy in Table 5 is the average of five experiments. The convolutional part of a pre-trained model can be employed as the CNN part of the proposed SigZSLNet; in the experiments, we use the ResNet pre-trained on rml2018.a [22] as the CNN module. As shown in Table 5, the average classification accuracy of SigZSLNet exceeds 76%. In particular, the accuracy of Group 1 and Group 2 exceeds 85%, which indicates that SigZSLNet can effectively perform AMC under zero-sample situations. Compared with Group 1 and Group 2, however, the performance of Group 3 and Group 4 declines: as the number of missing categories increases, the quality of the data generated for the missing categories decreases.
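The harmonic mean rewards balanced performance on seen and unseen classes more than the arithmetic mean does. A minimal Python sketch, using hypothetical accuracies rather than values from Table 5:

```python
def harmonic_mean(s, u):
    """H = 2*s*u / (s + u) for seen accuracy s and unseen accuracy u."""
    if s + u == 0:
        return 0.0
    return 2 * s * u / (s + u)


# Hypothetical accuracies, not results from Table 5.
print(round(harmonic_mean(0.9, 0.6), 4))   # 0.72 (the arithmetic mean would be 0.75)
print(harmonic_mean(0.95, 0.0))            # 0.0: ignoring the unseen classes scores zero
```

A model that classifies only the seen classes well therefore cannot hide poor unseen-class accuracy behind a high seen-class score.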

Results
The reason for the above result is the limited generation ability of the two fully connected layers in the generator. Each class of experimental data has its own embedding space, and it is difficult for a generative model to map several different distributions at the same time. In addition, the modulated signal is highly susceptible to the signal-to-noise ratio (SNR). Under low SNR conditions, the signal is strongly disturbed, so its features, and hence the features learned by the generator, are not distinctive. In the experiments, we mixed low-SNR and high-SNR data, and the model was unable to learn a valid feature representation. Consequently, the recognition accuracy decreases as the quality of the generated data falls. Overall, Table 5 suggests that the proposed SigZSLNet is an AMC scheme better suited to real scenarios. The test accuracy under various SNRs is compared in Fig. 5, where the settings of Groups 1-4 are the same as in Table 5. As Fig. 5 shows, the recognition accuracy first improves as the SNR rises and then declines slightly when the SNR exceeds 6 dB, which indicates that SNR is an important factor in the recognition accuracy of AMC. In particular, the average classification accuracy of every group exceeds 85% when the SNR is between 6 dB and 9 dB, which is attributed to the GAN balancing the quality of the generated feature vectors. In addition, as discussed for Table 5, the recognition accuracy of SigZSLNet gradually deteriorates as the number of unseen classes rises. In Fig. 6, 50 synthetic feature vectors and 50 real feature vectors of randomly selected 2ASK signals are visualized. As Fig. 6 shows, although the synthetic and real feature vectors differ noticeably, they are highly similar, especially in the trend of the waveform variation.
The generated feature vectors therefore closely resemble the real feature vectors, which suggests that SigZSLNet can generate the feature vectors of zero-sample classes accurately.
As a supplement, Fig. 7 compares the confusion matrices of the groups at SNR = 6 dB, where the BPSK signal is missing in Group 1 and present in Groups 2-4. Figure 7 indicates that, for the BPSK signal, the classification accuracy of Group 1 is clearly higher than that of Groups 2-4, which shows that the synthetic feature vectors generated by the GAN module are of higher quality than the real feature vectors extracted by the CNN module. This indicates that the proposed SigZSLNet can effectively fill in the missing classes and improve the accuracy of AMC under zero-sample situations. Regarding computational complexity, the generator and the discriminator each consist of two fully connected layers, and the WGAN module and the classifier together contain 91,009 parameters, which makes the network converge easily. In training, the total number of training samples is 20,900, and each epoch takes 0.628 seconds. Fig. 8 visualizes the feature vectors before they are input to the classifier module: the 128-dimensional features are reduced to two-dimensional coordinates by the t-SNE algorithm [36]. Figure 8a-d shows the feature vectors of the seen and unseen classes of Group 4 at different SNRs. As described in the previous experimental results, the classification accuracy of the classifier is highest at an SNR of about 6 dB. In the dimensionality-reduction visualization at 6 dB, samples of the same class cluster together, and the classes are easy to distinguish. When SNR ∈ {0 dB, 4 dB, 10 dB}, the reduced-dimensional features of BPSK and QPSK are mixed together and hard to distinguish. However, the generated data of the unseen categories 4ASK and 64QAM remain distinguishable.
We conclude that the algorithm achieves its highest classification accuracy at an SNR of 6 dB because, at the other SNRs, the CNN feature vectors of BPSK and QPSK are easily confused. This difficulty in distinguishing BPSK from QPSK results from the low-quality feature vectors output by the pre-trained CNN.

Discussion
Simulation results show that the proposed SigZSLNet can generate data for missing modulation types to make up for their absence from the training set, addressing the zero-sample problem in AMC. In the groups with one missing category and with two missing categories, the accuracy exceeded 76% in the seven-class classification task. However, the simulation results also show that the generated modulated signals lack diversity: they correspond to a single SNR only. The model therefore needs further improvement in SNR diversity so that it can generate modulated signals over a range of SNRs. In addition, the open-source CNN model, trained on only 24 categories of modulated signals, has limited robustness, which limits its performance in our experiments. In future work, we will collect rich modulated signal data to build a dataset analogous to ImageNet [37] and COCO [38], and use it to train more robust pre-trained models.

Conclusion
The increasing number of IoT devices means that more traffic will occupy the scarce available spectrum in the future, so regulating and recognizing the observed signals becomes extremely important. However, in complex electromagnetic environments, some classes of modulated signals cannot be collected in advance to train the classifier, which calls for a way to recognize signals under zero-sample conditions. In this paper, we propose SigZSLNet, which implements AMC under zero-sample conditions for the first time. Based on the semantic feature vectors, the feature vectors of the missing modulated signals are generated with WGAN-GP, which greatly improves the classification accuracy of the unseen classes.
The experimental results on the various groups validate the effectiveness of the proposed SigZSLNet. When one modulation type is missing, the proposed method obtains an average accuracy of over 85% on the missing category, corresponding to an accuracy of over 76% for classification over the seen and unseen classes. When two modulation types are missing, it obtains an average accuracy of more than 69% on the missing categories, corresponding to an accuracy of more than 76% for the seven-class classification task. We visualized the generated feature vectors for comparison, and we also visualized the clustering of each category with the PCA and t-SNE algorithms to demonstrate the validity of the generated data. In conclusion, the experimental results show that the proposed method effectively solves the AMC task of spectrum resource management for IoT terminal devices.

Authors' contributions
Q.Z was in charge of the major theoretical analysis, algorithm design, experimental simulation, and paper writing. XJ.J and Q.Z contributed to paper writing. FP.Z and RH.Z contributed to the theoretical analysis and gave suggestions on the organization. All authors read and approved the final manuscript.

Funding
This research was carried out solely by the authors and received no external funding.

Availability of data and materials
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.